# cleaning-data
Here are 149 public repositories matching this topic...
A framework for cleaning Chinese dialog data
Updated May 14, 2021 - Python
An open-source Python package for cleaning raw text data
Updated Dec 29, 2021 - Python
Time-series Data Preprocessing Studio in Jupyter notebook.
Updated Jan 23, 2019 - Jupyter Notebook
Implementation of the paper "Identifying Mislabeled Data using the Area Under the Margin Ranking": https://arxiv.org/pdf/2001.10528v2.pdf
deep-learning
paper
pytorch
noise-detection
pytorch-implementation
cleaning-data
mislabel-identification
Updated Feb 6, 2020 - Jupyter Notebook
Simple, automatic data cleaning in one line of code. It performs one-hot encoding, casts date and time columns to the datetime dtype, detects binary columns, safely converts non-numeric columns to numeric dtypes, cleans dirty or empty values, normalizes values, and removes unwanted columns. Get your data ready for model training and fitting quickly.
Updated May 22, 2021 - Python
Data cleaning tool.
Updated Apr 20, 2021 - JavaScript
Short notes from the author for anyone who wants to learn the process a data scientist follows, from initial data collection through making predictions with a trained model. The notes are based on what the author has learned and implemented. Enjoy!
data-science
exploratory-data-analysis
statistical-methods
data-visualization
python3
statistical-analysis
supervised-learning
data-analysis
unsupervised-learning
cleaning-data
Updated Sep 29, 2020 - Jupyter Notebook
A fast framework for text pre-processing (text cleaning, vocabulary reduction, feature extraction, and vectorization), implemented with parallel processing using a custom number of processes.
nlp
natural-language-processing
word2vec
vocabulary
python3
reduction
spacy
feature-extraction
preprocess
glove
vectorization
tfidf
stages
parallel-processing
cleaning-data
Updated Jan 21, 2022 - Python
A simple tool for cleaning image datasets at a glance.
computer-vision
annotation
interface
tool
image-dataset
binary-classification
annotation-tool
cleaning-data
cleaning-dataset
Updated Dec 16, 2020 - TypeScript
Udacity Data Analyst Nanodegree - Project IV
python
json
numpy
csv-files
pandas
requests
report
data-wrangling
tweepy
data-analyst
visualizations
udacity-data-analyst-nanodegree
cleaning-data
tsv-files
juypter-notebook
assessing-data
Updated Jun 26, 2020 - HTML
NodeJS wrapper for the email-validator.net API
javascript
typescript
validation
data-validation
email
verification
email-marketing
email-validation
node-js
node-module
email-verification
data-quality
cleaning
cleaning-data
byteplant
email-cleaning
Updated Feb 12, 2022 - TypeScript
Udacity Natural Language Processing Nanodegree.
nlp
pipeline
machine-translation
scraping
speech-recognition
preprocessing
tokenization
stemming
hmm-model
lemmatization
wrangling
cleaning-data
partsofspeechtagger
Updated Aug 8, 2020 - HTML
We have all been in a situation where we go to a doctor in an emergency and find that the consultation fees are too high. As data scientists, we all should do better. What if you have data that records important details about a doctor and you get to build a model to predict the doctor’s consulting fee?
Updated Oct 1, 2020 - Jupyter Notebook
Practice Repository for Natural Language tasks!
nlp
spacy
preprocessing
tokenization
stemming
lemmatization
nltk-library
text-blob
cleaning-data
partofspeech-tagger
Updated Jun 9, 2020 - Jupyter Notebook
Clean your data frame in one readable function
Updated Sep 9, 2021 - R
The main aim is to clean the data with the pandas library.
Updated Feb 13, 2020 - Jupyter Notebook
Learn how much energy Singapore saves per year by recycling plastics, paper, glass, and ferrous and non-ferrous metals
Updated Dec 1, 2021 - Jupyter Notebook
A complete collection of commonly used code snippets in Python
mysql
natural-language-processing
data-mining
sql
mongodb
image-processing
pandas
python3
text-processing
string-matching
automations
cleaning-data
Updated Jun 28, 2021 - Python
Introducing you to the fundamentals of the quintessential Python data analysis library, pandas, and its core data structures – the Series and DataFrame objects.
dataframes
data-selection
cleaning-data
pandas-series
preprocessing-data
fundamental-programming-tools
the-pandas-library
the-pandas-documentation
collecting-data
Updated Nov 8, 2021 - Jupyter Notebook
Data analysis and forecasting applied to World Happiness
Updated Jan 15, 2022 - Jupyter Notebook
The aim of this project is to find mobile app profiles that are profitable for the App Store and Google Play markets.
Updated Jan 23, 2020 - Jupyter Notebook
An analysis and exploration of 4 years of Trump's tweets, including data cleaning, sentiment analysis and topic categorization with LDA and NMF.
Updated Jul 10, 2021 - Jupyter Notebook
In this project, I will demonstrate my skill in cleaning data with R.
Updated Jun 18, 2021
Project No. 4 in the Udacity Data Analyst Nanodegree Winter 2019-2020. Using Python, we’ll gather data from a variety of sources, assess its quality and tidiness, then clean it. We’ll document our wrangling efforts in a Jupyter Notebook, plus showcase them through analyses and visualizations using Python and SQL.
Updated May 31, 2021 - Jupyter Notebook
Interactive polynomial fit with a smooth union between original data and fit
Updated Sep 25, 2019 - Python
This file contains advanced Excel formulas and functions, such as VLOOKUP with MATCH, INDEX and MATCH, OFFSET and COUNTA, logical operators, MAX IF, formula formatting, text functions, date and time functions, and data cleaning, which I worked through to take my Excel skills to a higher level.
Updated Feb 4, 2021
Different types of Linear Regression using sklearn performed on KC House Data
machine-learning
linear-regression
data-visualization
sklearn-library
cleaning-data
house-sales-prediction
regression-sklearn
Updated Oct 26, 2019 - Jupyter Notebook
Unsupervised Machine Learning: Cryptocurrency Analysis. Several models are applied to cryptocurrency data to discover patterns and groups. The analysis produces a report on which cryptocurrencies are on the trading market and how they could be grouped to create a classification system for potential new investments in the cryptocurrency market.
Updated Nov 1, 2021 - Jupyter Notebook
This repository collects various minor data analysis projects and study materials for learning data visualization and programming with MATLAB. Topics range from crime against women, sentiment analysis, digital signal analysis, and student academic performance data to temperature and humidity analysis in Finland.
data-science
plots
sentiment-analysis
data-visualization
dataset
classification
data-analysis
imdb-dataset
cleaning-data
snaphu
Updated Jan 15, 2022 - Jupyter Notebook
Background
This thread is born out of the discussion in #968, in an effort to make the documentation more beginner-friendly and easier to understand.
One of the subtasks mentioned in that thread was to go through the function docstrings and add a minimal working example to each of the public functions in pyjanitor.
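As a rough illustration of what a docstring with a minimal working example might look like, here is a sketch using a doctest-style `Examples` section. The function `snake_case_label` is a hypothetical helper invented for this illustration; it is not part of pyjanitor, and the docstring layout is an assumption rather than the project's agreed convention.

```python
import doctest


def snake_case_label(label: str) -> str:
    """Convert a column label to lowercase snake_case.

    Hypothetical helper for illustration only; not a pyjanitor function.

    Examples:
        >>> snake_case_label("Total Sales")
        'total_sales'
        >>> snake_case_label("  Order ID ")
        'order_id'
    """
    # Trim surrounding whitespace, lowercase, then join words with underscores.
    return "_".join(label.strip().lower().split())


if __name__ == "__main__":
    # Run the docstring examples as tests and print the summary counts.
    print(doctest.testmod())
```

Writing the example in doctest form means it can be executed as a test, so the documentation is less likely to drift out of sync with the function's behavior.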
Criteria reiterated here for the benefit of discussion: