Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Updated May 26, 2023 - Go
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
A light-weight, flexible, and expressive statistical data testing library
Jupyter notebook and datasets from the pandas Q&A video series
General Assembly's 2015 Data Science course in Washington, DC
Simple tools for data cleaning in R
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Prepping tables for machine learning
Schema-Inspector is a simple JavaScript object sanitization and validation module.
Powerful product analytics for data teams, with full control over data & models.
Easy to use Python library of customized functions for cleaning and analyzing data.
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Data Science Feature Engineering and Selection Tutorials
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling to supercharge model performance.
A domain-specific probabilistic programming language for scalable Bayesian data cleaning
Exploratory data analysis
An R package for data screening