Data-centric declarative deep learning framework
Data Lake for Deep Learning. Multi-modal Vector Database for LLMs/LangChain. Store, query, version, & visualize datasets. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Modern columnar data format for ML implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.
A curated, but incomplete, list of data-centric AI resources.
The open source active learning toolkit to find failure modes in your computer vision models, prioritize data to label next, and drive data curation to improve model performance.
DataCLUE: A data-centric NLP benchmark and toolkit
Vue form with Laravel-inspired validation and a simply enjoyable error messages API. (Form API, Validator API, Rules API, Error Messages API)
A data-centric annotation tool for your Named Entity Recognition projects
[ICLR'23] Implementation of "Empowering Graph Representation Learning with Test-Time Graph Transformation"
An observer is a wrapper over JSON data that provides an interface for detecting when the data changes, with a focus on performance and memory efficiency.
Code for a top 5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI
From local functions to cloud deployed pipelines
Sample notebooks that use the Openlayer Python API
The code for our paper "Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach" (arXiv preprint 2209.06995).
Open-source Data Backend written in Java and based on PostgreSQL & GraphQL.
Quickly set up an image labelling web application for manually tagging images for machine learning tasks.
Data-Oriented Microservices Architecture Framework using DDS
Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)
Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)
ndn-hydra: A Python-coded NDN distributed repository with five focused attributes: resiliency, scalability, usability, efficiency, and security.