# data-cleaning

Here are 1,911 public repositories matching this topic...

johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Updated May 26, 2023
Go

cleanlab / cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Updated May 26, 2023
Python

unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

testing schema validation data-validation pandas-dataframe assertions pandas testing-tools data-processing dataframes data-cleaning hypothesis-testing data-verification pandas-validation data-check data-assertions dataframe-schema pandas-validator

Updated May 26, 2023
Python

justmarkham / pandas-videos

Jupyter notebook and datasets from the pandas Q&A video series

python data-science tutorial jupyter-notebook pandas data-analysis data-cleaning

Updated May 16, 2022
Jupyter Notebook

justmarkham / DAT8

General Assembly's 2015 Data Science course in Washington, DC

Updated Oct 6, 2022
Jupyter Notebook

hi-primus / optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

data-science machine-learning spark bigdata data-transformation pyspark data-extraction data-analysis data-wrangling dask data-exploration data-preparation data-cleaning data-profiling data-cleansing big-data-cleaning data-cleaner cudf dask-cudf

Updated May 22, 2023
Python

sfirke / janitor

Simple tools for data cleaning in R

data-science r excel spss tidyverse pivot-tables data-analysis data-cleaning dirty-data tabulations

Updated May 26, 2023
R

data-forge / data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

visualization nodejs javascript linq json data csv pandas data-visualization data-analysis data-wrangling data-management data-manipulation data-cleaning data-munging data-cleansing data-forge

Updated May 1, 2023
TypeScript

skrub-data / skrub

Prepping tables for machine learning

data-science data machine-learning data-analysis data-preprocessing data-preparation data-cleaning dirty-data

Updated May 22, 2023
Python

schema-inspector / schema-inspector

Schema-Inspector is a simple JavaScript object sanitization and validation module.

javascript sanitization validation data-cleaning

Updated Dec 22, 2022
JavaScript

objectiv / objectiv-analytics

Powerful product analytics for data teams, with full control over data & models.

Updated Jan 13, 2023
Python

data-cleaning / validate

Professional data validation for the R environment

r validation data-cleaning

Updated May 1, 2023
R

akanz1 / klib

Easy to use Python library of customized functions for cleaning and analyzing data.

python data-science data-visualization feature-selection data-analysis klib data-preprocessing data-cleaning

Updated Jan 14, 2023
Python

msamogh / nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

machine-learning torch pytorch data-preprocessing preprocessing data-processing data-cleaning data-pipeline

Updated Sep 22, 2022
Python

jim-schwoebel / voicebook

🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).

visualization security data machine-learning server voice python3 voice-recognition generation transcription voice-control data-cleaning voice-assistant encryption-decryption voice-recording voice-activity-detection wake-word-detection featurization voice-computing

Updated Dec 8, 2022
Python

rasgointelligence / feature-engineering-tutorials

Data Science Feature Engineering and Selection Tutorials

python data-science machine-learning tutorial jupyter notebook scikit-learn exploratory-data-analysis tutorials pandas feature-selection xgboost feature-engineering features data-cleaning pandas-profiling sweetviz pyrasgo

Updated May 24, 2023
Jupyter Notebook

encord-team / encord-active

The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling to supercharge model performance.

python data-science data machine-learning computer-vision deep-learning data-validation annotations ml object-detection data-cleaning active-learning data-quality data-centric mlops noisy-labels model-quality label-errors label-quality

Updated May 26, 2023
Python

probcomp / PClean

A domain-specific probabilistic programming language for scalable Bayesian data cleaning

probabilistic-programming bayesian-inference data-cleaning probabilistic-graphical-models data-cleansing

Updated May 25, 2022
Julia

ajaymache / data-analysis-using-python

Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle

data-science exploratory-data-analysis eda data-visualization kaggle-competition data-analytics data-analysis data-wrangling data-cleaning kaggle-dataset data-cleansing data-science-python data-analysis-python kaggle-used-cars-dataset

Updated Jan 2, 2019
Jupyter Notebook

ekstroem / dataMaid

An R package for data screening

reproducible-research data-cleaning data-screening

Updated Jan 25, 2022
HTML
