# datacleaning
Here are 271 public repositories matching this topic...
Aylr commented on Dec 28, 2020
Describe the bug
Data Docs columns shrink to a width of one character when the batch comes from a long query.
To Reproduce
Steps to reproduce the behavior:
- make a batch from a long query string
- run validation
- render result to data docs
- See screenshot
[Screenshot: Data documentation compiled by Great Expectations]
DataPrep — The easiest way to prepare data in Python
data-science
connector
exploratory-data-analysis
eda
apis
data-exploration
cleaning
dataprep
datacleaning
dataconnector
apiwrapper
webconnector
datapreparation
Updated Feb 21, 2022 - Python
mokeeqian commented on Dec 12, 2021
Does HyperGBM's make_experiment return the best model?
How does it handle parameter tuning? That is, what is its search space (e.g., for XGBoost)?
A natural language processing project in which sentiment analysis is performed by classifying positive versus negative tweets with machine learning classification models, using text mining, text analysis, data analysis, and data visualization.
nlp
machine-learning
sentiment-analysis
cross-validation
eda
data-visualization
wordcloud
classification
data-analysis
bag-of-words
hashtags
evaluation-metrics
count-vectorizer
datacleaning
Updated May 14, 2019 - Jupyter Notebook
This repository contains data and code used to get and clean data from https://github.com/CSSEGISandData/COVID-19 and https://www.worldometers.info/coronavirus/
Updated Nov 1, 2020 - Jupyter Notebook
Data and code for scraping and cleaning data on COVID-19 in India from https://www.mohfw.gov.in/ and https://www.covid19india.org/
Updated Oct 1, 2020 - Jupyter Notebook
An open-source Python package for cleaning raw text data.
Updated Dec 29, 2021 - Python
portfolio
data
machine-learning
time-series
plotly
ml
naive-bayes-classifier
folium
nlp-machine-learning
datacleaning
Updated Sep 2, 2018 - HTML
Predicts home prices in Bangalore. Built with Flutter, Flask, and Jupyter Notebook.
python
data-science
linear-regression
exploratory-data-analysis
jupyter-notebook
flutter
flask-api
datacleaning
Updated May 11, 2021 - Jupyter Notebook
This repo contains four different projects, with various machine learning models built for Kaggle competitions. The projects also cover exploratory data analysis, data cleaning, data visualization, data munging, feature selection, etc.
data-science
machine-learning
exploratory-data-analysis
kaggle
kaggle-competition
data-analysis
house-price-prediction
data-munging
datavisualization
datacleaning
datamunging
diabetes-prediction
creditcardfrauddetection
bankloanprediction
Updated Jun 30, 2021 - Jupyter Notebook
Examples for Optimus, a data cleansing library for big data.
Updated Oct 24, 2017
A basic machine learning model, built in a Python Jupyter notebook, that classifies a set of tweets into two categories: racist/sexist and non-racist/sexist.
python
training
data
analysis
anaconda
machine
projects
datascience
deeplearning
predictive-modeling
datacleaning
textclassification
anaconda3
textpreprocessing
jupyt
predictiveanalytics
Updated Jun 24, 2019 - Jupyter Notebook
All Kaggle datasets and the R code
Updated Oct 10, 2020 - HTML
This repository contains the complete WHO data on the 2003 outbreak, along with the code used to web-scrape, munge, and clean it.
Updated May 24, 2020 - Jupyter Notebook
mde: Missing Data Explorer
data-science
r
statistics
exploratory-data-analysis
rstats
data-analysis
replace
missing-data
missingness
r-package
missing
data-exploration
data-cleaning
recode
omit
datacleaner
r-stats
datacleaning
missing-values
missing-value-treatment
Updated Feb 10, 2022 - R
Spark-lean, an interactive PySpark-based Data Cleaning Library
Updated Nov 22, 2018 - Python
A package to aid with data cleaning using pandas.
Updated Feb 2, 2022 - Python
Course material from multiple sources
Updated Jul 13, 2020 - Jupyter Notebook
Updated Nov 15, 2021 - Jupyter Notebook
correlation
random-forest
decision-tree
k-fold
gradient-boosting
stepwise
datacleaning
skewness
datamodeling
Updated May 18, 2020 - R
Machine Learning Project on Imbalanced Data in R
machine-learning
support-vector-machine
feature-engineering
naive-bayes-algorithm
hypothesis-testing
smote
oversampling
imbalanced-learning
xgboost-algorithm
undersampling
datacleaning
dataexploration
Updated Feb 24, 2018 - R
Analyze sales data from more than 16,500 games.
Updated May 6, 2020 - HTML
Text Preprocessing
Updated Nov 12, 2017 - Python
Samples for Azure Databricks Orientation
python
json
json-schema
azure
pandas-dataframe
pandas
seaborn
pyspark
matplotlib
azure-storage
databricks
pyodbc
pyspark-notebook
pyspark-tutorial
databricks-notebooks
datacleaning
azuresqldb
matplotlib-pyplot
seaborn-plots
azureblobstorage
Updated Nov 3, 2020 - HTML
Updated Jun 21, 2018 - Jupyter Notebook
My fictitious firm, GDSMC Global, is a security consultancy focused on helping governments around the world understand, predict, and stop terrorist attacks. Our goal is to help individual nation-states deploy security resources more effectively to reduce the likelihood of successful terrorism in the future, and to understand the likely future costs of terrorism so that resources can be set aside in advance to rebuild after inevitable and unfortunate attacks. Although governments can submit their own internal security data to us for study, our models are built using the Global Terrorism Database (GTD), maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism at the University of Maryland (http://start.umd.edu/gtd/).
visualization
consulting
r
deployment
imputation
logistic-regression
predictive-modeling
business-solutions
cleaning
correlation-matrices
datacleaning
Updated Feb 28, 2018 - R
Sometimes the cell UI gets URL boundaries wrong.
To Reproduce
Create a project and put the following string in a cell: