
data-curation

Here are 36 public repositories matching this topic...

FastDup is a tool for gaining insights from large image collections. It can find anomalies, duplicate and near-duplicate images, and clusters of similar images, and it can learn the normal behavior of a collection and the temporal interactions between images. It can be used for smart subsampling to obtain a higher-quality dataset, for outlier removal, and for novelty detection that flags new information to be sent for tagging. FastDup scales to millions of images running on CPU only.

  • Updated Nov 2, 2022
  • Python
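FastDup's own API is not described on this page. To illustrate what "near-duplicate" detection means, here is a minimal sketch of one common technique: average hashing with a Hamming-distance threshold. It is not FastDup's algorithm. The function names, the `max_distance` threshold, and the tiny 4x4 grayscale "images" (plain lists of brightness values) are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def average_hash(pixels):
    """Hypothetical helper: one bit per pixel, 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(a, b):
    """Number of bit positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def find_near_duplicates(images, max_distance=1):
    """Return name pairs whose hashes differ in at most max_distance bits (assumed threshold)."""
    hashes = {name: average_hash(px) for name, px in images.items()}
    pairs = []
    for a, b in combinations(sorted(hashes), 2):
        if hamming(hashes[a], hashes[b]) <= max_distance:
            pairs.append((a, b))
    return pairs


# Toy 4x4 grayscale images (hypothetical data).
img_a = [[10, 10, 200, 200]] * 4
img_b = [[30, 30, 220, 220]] * 4   # same layout as img_a, just brighter
img_c = [[200, 200, 10, 10]] * 4   # mirrored layout, a different image

print(find_near_duplicates({"a": img_a, "b": img_b, "c": img_c}))
```

Because each bit is computed relative to the image's own mean, a uniformly brightened copy (`img_b`) gets the same hash as the original, while the mirrored image differs in every bit. Real tools typically replace raw pixel hashes with learned embeddings and approximate nearest-neighbor search so that comparisons scale beyond all-pairs checks.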

One of the biggest barriers to widespread machine learning adoption is the difficulty of collecting a 'good' dataset. There is a general consensus that a 'good' dataset is a big dataset, but we believe that we can do better. The VennData project was therefore created to develop tools that guide the collection, curation, augmentation, and validation of data.

  • Updated Dec 13, 2020
  • Jupyter Notebook
