# dataquality
Here are 23 public repositories matching this topic...
Library for Semi-Automated Data Science

Topics: python, data-science, machine-learning, scikit-learn, artificial-intelligence, interoperability, hyperparameter-optimization, hyperparameter-tuning, ibm-research, automl, automated-machine-learning, dataquality, hyperparameter-search, ibm-research-ai, pipeline-tests, pipeline-testing

Updated Jun 16, 2020 - Python
Columbus - a powerful monitoring tool

Updated Nov 10, 2018 - Java
Huemul BigDataGovernance is a framework that runs on top of Spark, Hive and HDFS. It enables the implementation of a corporate single-source-of-truth data strategy based on Data Governance best practices. Using the library, you can implement tables with Primary Key and Foreign Key checks on insert and update, plus validation of nulls, text lengths, numeric and date minimums/maximums, unique values and default values. It also lets you classify fields by the applicability of ARCO rights to ease compliance with GDPR-style data protection laws, and record their security levels and whether any kind of encryption is applied. Additionally, it supports more complex validation rules on the same table.
Topics: data, spark, hive, hadoop, bigdata, cloudera, data-warehouse, data-engineering, parquet, chile, hortonworks, data-engineer, gdpr, datamart, spark-sql, dataquality, data-governance, huemul-bigdatagovernance, trabaja-sobre-spark, huemul

Updated Jun 20, 2020 - Scala
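The column checks that Huemul describes (nulls, text lengths, numeric min/max, and so on) can be sketched with a small rule table. This is an illustrative sketch in plain Python, not Huemul's actual Scala/Spark API; the `validate_row` helper and the rule-dictionary shape are assumptions of this example.

```python
def validate_row(row, rules):
    """Return a list of rule violations for one record (dict).

    `rules` maps column name -> dict of checks: not_null,
    max_length, min, max. All names here are illustrative.
    """
    errors = []
    for col, rule in rules.items():
        value = row.get(col)
        if rule.get("not_null") and value is None:
            errors.append(f"{col}: null not allowed")
            continue
        if value is None:
            continue  # nullable column, nothing else to check
        max_len = rule.get("max_length")
        if max_len is not None and len(str(value)) > max_len:
            errors.append(f"{col}: longer than {max_len}")
        lo, hi = rule.get("min"), rule.get("max")
        if lo is not None and value < lo:
            errors.append(f"{col}: below minimum {lo}")
        if hi is not None and value > hi:
            errors.append(f"{col}: above maximum {hi}")
    return errors

rules = {
    "name": {"not_null": True, "max_length": 10},
    "age": {"not_null": True, "min": 0, "max": 120},
}
print(validate_row({"name": "Ada", "age": 150}, rules))
# → ['age: above maximum 120']
```

A real Spark implementation would express the same rules as column expressions evaluated across the whole DataFrame rather than row by row.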
Open source clients for working with Data Culpa Validator services from data pipelines

Updated May 25, 2020 - Python
Tutorial and examples of Data Quality in a Big Data System

Updated Apr 25, 2017
Java client package for Quadient Data Services

Updated Feb 7, 2019 - Java
A Practical Approach for Population Data Quality Assessment

Updated Jan 16, 2020 - SAS
Data Stream Quality Control with Apache Kafka

Updated May 17, 2020 - Python
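Stream quality control of the kind this entry names usually means validating each record as it is consumed and routing failures to a dead-letter topic. A minimal sketch, assuming JSON-encoded sensor readings; the topic names and the `check_record` helper are illustrative, and the Kafka wiring (using the third-party kafka-python client) is shown only as comments:

```python
import json

def check_record(raw):
    """Validate one JSON-encoded reading; return (ok, payload_or_reason)."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if "sensor_id" not in msg:
        return False, "missing sensor_id"
    if not isinstance(msg.get("value"), (int, float)):
        return False, "value is not numeric"
    return True, msg

# Wiring sketch (not executed here; topic names are assumptions):
# from kafka import KafkaConsumer, KafkaProducer
# consumer = KafkaConsumer("readings", bootstrap_servers="localhost:9092")
# producer = KafkaProducer(bootstrap_servers="localhost:9092")
# for record in consumer:
#     ok, _ = check_record(record.value)
#     target = "readings.clean" if ok else "readings.deadletter"
#     producer.send(target, record.value)
```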
A *nix DataProfiler for deep analysis of raw tabular file data quality.

Updated Jun 17, 2017 - Awk
This program runs daily to check sensor and probe data quality.

Updated Sep 8, 2017 - PigLatin
NOW-QUAL: Vaccine coverage survey near-time data monitoring and cleaning standard development template

Updated Mar 31, 2019 - Stata
This repository provides our generic test protocol for the integration test of ASS.

Updated Oct 10, 2019
CSV Data Validator is a tool to validate CSV files. It parses each CSV file and validates the data against its .hdr (CSV metadata) file before ingestion into the Data Lake. It checks data file availability for every daily load and validates the data against its metadata: file size, checksum, delimiter, record count, etc. It ensures that landed data conforms before giving the go-ahead for ingestion into the Data Lake, and generates complete stats or an error log.

Topics: metadata, quality, checksum, delimiter, size, csv-parser, metadata-parser, opencsv, dataquality, datavalidator, quality-check, datavalidation, univocity

Updated Jan 6, 2019 - Java
Examples of using Quadient Data Services using Postman

Updated Jul 26, 2018