cleanlab Examples

This repo contains code examples that demonstrate how to use cleanlab with specific real-world models/datasets, how its underlying algorithms work, how to get better results via advanced functionality, and how to train certain models used in some cleanlab tutorials.

To quickly learn how to run cleanlab on your own data, first check out the quickstart tutorials before diving into the examples below.

Table of Contents

| # | Example | Description |
|---|---------|-------------|
| 1 | find_label_errors_iris | Find label errors introduced into the Iris classification dataset. |
| 2 | classifier_comparison | Use CleanLearning to train 10 different classifiers on 4 dataset distributions with label errors. |
| 3 | hyperparameter_optimization | Hyperparameter optimization to find the best settings of CleanLearning's optional parameters. |
| 4 | simplifying_confident_learning | Straightforward implementation of the Confident Learning algorithm with raw numpy code. |
| 5 | visualizing_confident_learning | See how cleanlab estimates parameters of the label error distribution (noise matrix). |
| 6 | find_tabular_errors | Handle mislabeled tabular data to improve an XGBoost classifier. |
| 7 | cnn_mnist | Find label errors in MNIST image data with a Convolutional Neural Network. |
| 8 | huggingface_keras_imdb | CleanLearning for text classification with a Keras model, a pretrained BERT backbone, and a TensorFlow Dataset. |
| 9 | fasttext_amazon_reviews | Find label errors in the Amazon Reviews text dataset using a cleanlab-compatible FastText model. |
| 10 | multiannotator_cifar10 | Iteratively improve consensus labels and a trained classifier from data labeled by multiple annotators. |
| 11 | active_learning_multiannotator | Improve model performance by iteratively collecting additional labels from annotators. This active learning pipeline allows for examples labeled in batches by multiple annotators. |
| 12 | outlier_detection_cifar10 | Train AutoML for image classification and use it to detect out-of-distribution images. |
| 13 | multilabel_classification | Find label errors in an image tagging dataset (CelebA) using a PyTorch model you can easily train for multi-label classification. |
| 14 | entity_recognition | Train a Transformer model for Named Entity Recognition and produce out-of-sample pred_probs for cleanlab.token_classification. |
| 15 | transformer_sklearn | How to use KerasWrapperModel to make any Keras model sklearn-compatible, demonstrated here for a BERT Transformer. |
| 16 | cnn_coteaching_cifar10 | Train a Convolutional Neural Network on noisily labeled CIFAR-10 image data using cleanlab with co-teaching. |
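
To give a feel for what example 4 (simplifying_confident_learning) is about, here is a minimal pure-Python sketch of the thresholding idea behind Confident Learning. It is a simplified illustration, not cleanlab's implementation and not the code in the notebook. The function names (class_thresholds, confident_label, find_likely_label_issues) and the toy data are hypothetical. It assumes labels are integers 0..K-1, that every class appears at least once, and that pred_probs are out-of-sample predicted probabilities from some classifier.

```python
from typing import List, Optional


def class_thresholds(labels: List[int], pred_probs: List[List[float]]) -> List[float]:
    """Per-class threshold: the mean predicted probability of class k,
    averaged over the examples whose given label is k."""
    num_classes = len(pred_probs[0])
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    # Assumption: every class occurs at least once, so no count is zero.
    return [s / c for s, c in zip(sums, counts)]


def confident_label(probs: List[float], thresholds: List[float]) -> Optional[int]:
    """Most probable class among those whose probability reaches its
    class threshold, or None if no class is confidently predicted."""
    candidates = [k for k, p in enumerate(probs) if p >= thresholds[k]]
    if not candidates:
        return None
    return max(candidates, key=lambda k: probs[k])


def find_likely_label_issues(labels: List[int], pred_probs: List[List[float]]) -> List[int]:
    """Indices of examples confidently predicted as a class other than
    their given label (the off-diagonal entries of the confident joint)."""
    thresholds = class_thresholds(labels, pred_probs)
    issues = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        guess = confident_label(probs, thresholds)
        if guess is not None and guess != label:
            issues.append(i)
    return issues


# Toy data (made up for illustration): example 2 is labeled 0,
# but the model confidently predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]

print(find_likely_label_issues(labels, pred_probs))  # prints [2]
```

The per-class thresholds let the method adapt to classes the model is systematically more or less confident about, rather than applying a single global cutoff. The example notebooks cover the full algorithm, including estimation of the noise matrix.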

Instructions

To run the latest example notebooks, run the commands below, which install the required libraries in a virtual environment.

$ python -m pip install virtualenv
$ python -m venv cleanlab-examples  # creates a new venv named cleanlab-examples
$ source cleanlab-examples/bin/activate
$ python -m pip install -r requirements.txt

Alternatively, you can install only the dependencies required for a specific example by running pip install -r requirements.txt inside that example's subfolder (each example's subfolder contains its own requirements.txt file).

We recommend running the examples with the latest stable cleanlab release (pip install cleanlab). However, the notebooks in the master branch of this repository are assumed to correspond to the master branch of cleanlab, so some very recently added examples may instead require the developer version of cleanlab (pip install git+https://github.com/cleanlab/cleanlab.git). To see the examples corresponding to a specific version of cleanlab, check out the Tagged Releases of this repository (e.g. the examples for cleanlab v2.1.0 are here).
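
Because the notebooks are tied to particular cleanlab versions, it can help to confirm which version is installed in the active environment before running them. The snippet below is a small sketch using only the Python standard library (importlib.metadata, Python 3.8+). The helper name installed_version is hypothetical and is not part of cleanlab.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if the
    package is not installed in the current environment."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("cleanlab")
if version is None:
    print("cleanlab is not installed in this environment")
else:
    print(f"cleanlab {version} is installed")
```

Compare the reported version against the Tagged Release of this repository you are using.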

Running all examples

You may run the notebooks individually, or run the bash script below, which executes and saves each notebook (for examples 1-7). Before running the script for the first time, you need to create a Jupyter kernel named cleanlab-examples. Make sure you have already created and activated the virtual environment (steps provided above) before running the following command to create the kernel.

$ python -m ipykernel install --user --name=cleanlab-examples

Bash script to run all notebooks:

$ bash ./run_all_notebooks.sh

Older Examples

To run older versions of cleanlab, see the Tagged Releases of this repository for the corresponding older versions of the example notebooks (e.g. the examples for cleanlab v2.0.0 are here).

See the contrib folder for examples from cleanlab v1, which may be helpful for reproducing results from the Confident Learning paper.

License

Copyright (c) 2017-2023 Cleanlab Inc.

All files listed above and contained in this folder (https://github.com/cleanlab/examples) are part of cleanlab.

cleanlab is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

cleanlab is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License in LICENSE.
