Best-of Python
This curated list contains 390 awesome open-source projects with a total of 1.4M stars grouped into 28 categories. All projects are ranked by a project-quality score, which is calculated from various metrics automatically collected from GitHub and different package managers. If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome!
Contents
- Data Serialization 16 projects
- Data Containers & Dataframes 30 projects
- Data Structures 15 projects
- Data Validation 15 projects
- Algorithms & Design Patterns 4 projects
- Date & Time Utilities 9 projects
- File & Path Utilities 10 projects
- Compatibility 7 projects
- Cryptography 7 projects
- Infrastructure & DevOps 19 projects
- Process Utilities 4 projects
- Asynchronous Programming 7 projects
- Configuration 9 projects
- CLI Development 18 projects
- Development Tools 1 project
- Data Caching 6 projects
- GUI Development 10 projects
- Computer & Machine Vision 1 project
- Machine Learning & Data Engineering 1 project
- Text Data 12 projects
- Web Development 1 project
- Database Clients 64 projects
- Data Loading & Extraction 30 projects
- Data Pipelines & Streaming 43 projects
- File Formats 3 projects
- Code Inspection 4 projects
- General Utilities 15 projects
- Python Implementations 6 projects
- Others 21 projects
Explanation
- 🥇 🥈 🥉 Combined project-quality score
- ⭐️ Star count from GitHub
- 🐣 New project (less than 6 months old)
- 💤 Inactive project (6 months no activity)
- 💀 Dead project (12 months no activity)
- 📈 📉 Project is trending up or down
- ➕ Project was recently added
- ❗️ Warning (e.g. missing/risky license)
- 👨💻 Contributors count from GitHub
- 🔀 Fork count from GitHub
- 📋 Issue count from GitHub
- ⏱️ Last update timestamp on package manager
- 📥 Download count from package manager
- 📦 Number of dependent projects
- Pandas related project
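The activity markers in the legend can be read as thresholds on a project's age and its last activity. The following is only a minimal sketch of that reading: the function names, the 30-day month, and the precedence of the markers are assumptions made for illustration, and the tooling that generates this list may compute them differently.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: one "month" is approximated as 30 days.
MONTH = timedelta(days=30)


def parse_timestamp(text: str) -> datetime:
    """Parse a DD.MM.YYYY timestamp as shown in the list, e.g. '29.06.2022'."""
    return datetime.strptime(text, "%d.%m.%Y")


def activity_marker(created: datetime, last_activity: datetime,
                    now: datetime) -> Optional[str]:
    """Return the legend marker for a project, or None if no marker applies.

    Hypothetical helper: dead is checked before inactive, and inactivity
    before newness; that ordering is an assumption.
    """
    idle = now - last_activity
    if idle >= 12 * MONTH:
        return "💀"  # dead: 12 months without activity
    if idle >= 6 * MONTH:
        return "💤"  # inactive: 6 months without activity
    if now - created < 6 * MONTH:
        return "🐣"  # new: less than 6 months old
    return None


now = parse_timestamp("30.06.2022")
print(activity_marker(parse_timestamp("01.01.2015"),
                      parse_timestamp("01.12.2021"), now))
```

The example dates are made up; only the timestamp format is taken from the entries in this list.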
Data Serialization
protobuf (🥇 49 · ⭐ 55K) - Protocol Buffers - Google's data interchange format. BSD-3
- GitHub (👨💻 980 · 🔀 14K · 📥 38M · 📦 240K · 📋 5K - 17% open · ⏱️ 29.06.2022): git clone https://github.com/protocolbuffers/protobuf
- PyPi (📥 91M / month · 📦 15K · ⏱️ 24.06.2022): pip install protobuf
- Conda (📥 8.9M · ⏱️ 10.05.2022): conda install -c conda-forge protobuf
- npm (📥 4.6M / month · 📦 2.6K · ⏱️ 21.04.2022): npm install google-protobuf
flatbuffers (🥇 41 · ⭐ 18K) - FlatBuffers: Memory Efficient Serialization Library. Apache-2
- GitHub (👨💻 560 · 🔀 2.8K · 📥 64K · 📦 2.5K · 📋 2K - 7% open · ⏱️ 28.06.2022): git clone https://github.com/google/flatbuffers
- PyPi (📥 9M / month · 📦 210 · ⏱️ 10.05.2021): pip install flatbuffers
- Conda (📥 380K · ⏱️ 04.03.2022): conda install -c conda-forge flatbuffers
- npm (📥 360K / month · 📦 220 · ⏱️ 25.02.2022): npm install flatbuffers
marshmallow (🥈 38 · ⭐ 6.1K) - A lightweight library for converting complex objects to and from.. MIT
ultrajson (🥈 36 · ⭐ 3.7K) - Ultra fast JSON decoder and encoder written in C with Python bindings. BSD-3
simplejson (🥈 35 · ⭐ 1.5K) - simplejson is a simple, fast, extensible JSON encoder/decoder for.. MIT
jsonpickle (🥈 34 · ⭐ 1K) - Python library for serializing any arbitrary object graph into JSON... BSD-3
orjson (🥉 33 · ⭐ 3.4K) - Fast, correct Python JSON library supporting dataclasses, datetimes,.. Apache-2
msgpack (🥉 33 · ⭐ 1.6K) - MessagePack serializer implementation for Python msgpack.org[Python]. Apache-2
cloudpickle (🥉 33 · ⭐ 1.2K) - Extended pickling support for Python objects. BSD-3
pysimdjson (🥉 26 · ⭐ 540) - Python bindings for the simdjson project. MIT
python-rapidjson (🥉 26 · ⭐ 450) - Python wrapper around rapidjson. MIT
Data Containers & Dataframes
General-purpose data containers as well as utilities & extensions for pandas.
h5py (🥇 41 · ⭐ 1.8K) - HDF5 for Python -- The h5py package is a Pythonic interface to the HDF5.. BSD-3
Bottleneck (🥈 32 · ⭐ 760) - Fast NumPy array functions written in C. BSD-2
Vaex (🥈 31 · ⭐ 7.1K) - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization.. MIT
datasketch (🥉 29 · ⭐ 1.7K) - MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog,.. MIT
Pandaral·lel (🥉 26 · ⭐ 2.2K) - A simple and efficient tool to parallelize Pandas.. BSD-3 jupyter
StaticFrame (🥉 26 · ⭐ 300) - Immutable and grow-only Pandas-like DataFrames with a more explicit.. MIT
Pandas Summary (🥉 20 · ⭐ 420) - A library for managing, validating, summarizing, and.. Apache-2 
Show 8 hidden projects...
- Blaze (🥈 31 · ⭐ 3.1K · 💀 ) - NumPy and Pandas interface to Big Data. BSD-3
- Arctic (🥉 29 · ⭐ 2.7K) - Arctic is a high performance datastore for numeric data. ❗️LGPL-2.1
- sklearn-pandas (🥉 29 · ⭐ 2.6K · 💀 ) - Pandas integration with sklearn. ❗️Zlib sklearn
- pandasql (🥉 27 · ⭐ 1.1K · 💀 ) - sqldf for pandas. MIT
- bcolz (🥉 27 · ⭐ 940 · 💀 ) - A columnar data container that can be compressed. BSD-3
- pickleDB (🥉 22 · ⭐ 670 · 💀 ) - pickleDB is an open source key-value store using Python's json module. BSD-3
- Bounter (🥉 18 · ⭐ 930 · 💀 ) - Efficient Counter that uses a limited (bounded) amount of memory.. MIT
- fletcher (🥉 18 · ⭐ 220 · 💀 ) - Pandas ExtensionDType/Array backed by Apache Arrow. MIT
Data Structures
pyrsistent (🥇 35 · ⭐ 1.7K) - Persistent/Immutable/Functional data structures for Python. MIT
python-sortedcontainers (🥈 29 · ⭐ 2.6K · 💤 ) - Python Sorted Container Types: Sorted List, Sorted.. Apache-2
sqlitedict (🥈 28 · ⭐ 850) - Persistent dict, backed by sqlite3 and pickle, multithread-safe. Apache-2
python-benedict (🥈 28 · ⭐ 400) - dict subclass with keylist/keypath support, normalized I/O.. MIT
ordered-set (🥈 28 · ⭐ 160) - A mutable set that remembers the order of its entries. One of Python's.. MIT
immutables (🥉 27 · ⭐ 960) - A high-performance immutable mapping type for Python. Apache-2
python-box (🥉 25 · ⭐ 2K) - Python dictionaries with advanced dot notation access. MIT
Show 4 hidden projects...
- addict (🥈 29 · ⭐ 2.2K · 💀 ) - The Python Dict that's better than heroin. MIT
- anytree (🥉 26 · ⭐ 710 · 💀 ) - Python tree data library. Apache-2
- munch (🥉 26 · ⭐ 590 · 💀 ) - A Munch is a Python dictionary that provides attribute-style access (a.. MIT
- cleverdict (🥉 17 · ⭐ 87) - A JSON-friendly data structure which allows both object attributes and.. MIT
Data Validation
jsonschema (🥇 40 · ⭐ 3.7K) - An implementation of the JSON Schema specification for Python. MIT
voluptuous (🥈 33 · ⭐ 1.7K) - CONTRIBUTIONS ONLY: Voluptuous, despite the name, is a Python data.. BSD-3
schematics (🥉 30 · ⭐ 2.5K · 💤 ) - Python Data Structures for Humans. BSD-3
validators (🥉 30 · ⭐ 600) - Python Data Validation for Humans. MIT
strictyaml (🥉 26 · ⭐ 1.1K) - Type-safe YAML parser and validator. MIT
dirty-equals (🥉 16 · ⭐ 480 · 🐣 ) - Doing dirty (but extremely useful) things with equals. MIT
Show 4 hidden projects...
- cerberus (🥈 32 · ⭐ 2.8K · 💀 ) - Lightweight, extensible data validation library for Python. ISC
- python-email-validator (🥉 27 · ⭐ 590) - A robust email syntax and deliverability validation.. ❗️CC0-1.0
- valideer (🥉 20 · ⭐ 260 · 💀 ) - Lightweight data validation and adaptation Python library. MIT
- dataklasses (🥉 6 · ⭐ 720 · 🐣 ) - A different spin on dataclasses. ❗Unlicensed
Algorithms & Design Patterns
transitions (🥇 30 · ⭐ 4.4K) - A lightweight, object-oriented finite state machine implementation.. MIT
algorithms (🥉 28 · ⭐ 21K) - Minimal examples of data structures and algorithms in Python. MIT
Show 1 hidden projects...
Date & Time Utilities
python-dateutil (🥈 37 · ⭐ 1.8K) - Useful extensions to the standard Python datetime features. Apache-2
dateparser (🥈 34 · ⭐ 2.1K) - python parser for human readable dates. BSD-3
parsedatetime (🥉 28 · ⭐ 650 · 💤 ) - Parse human-readable date/time strings. Apache-2
File & Path Utilities
filesystem_spec (🥈 36 · ⭐ 440) - A specification that python filesystems should adhere to. BSD-3
pyfilesystem2 (🥈 31 · ⭐ 1.7K) - Python's Filesystem abstraction layer. MIT
scandir (🥉 30 · ⭐ 500) - Better directory iterator and faster os.walk(), now in the Python 3.5.. BSD-3
Show 3 hidden projects...
Compatibility
typing (🥈 34 · ⭐ 1.3K) - Python static typing home. Hosts the documentation and a user help.. Python-2.0
dataclasses (🥉 27 · ⭐ 540) - A backport of the dataclasses module for Python 3.6. Apache-2
futures (🥉 27 · ⭐ 220) - Backport of the concurrent.futures package to Python 2.6 and 2.7. Python-2.0
Show 2 hidden projects...
- pathlib2 (🥉 29 · ⭐ 69) - Backport of pathlib aiming to support the full stdlib Python API. MIT
- contextlib2 (🥉 26 · ⭐ 34) - contextlib2 is a backport of the standard library's contextlib.. ❗️psfrag
Cryptography
cryptography (🥇 44 · ⭐ 4.9K) - cryptography is a package designed to expose cryptographic.. BSD-3
pycryptodomex (🥈 38 · ⭐ 2K) - A self-contained cryptographic library for Python. BSD-3
asn1crypto (🥉 34 · ⭐ 270) - Python ASN.1 library with a focus on performance and a pythonic API. MIT
Infrastructure & DevOps
ansible (🥇 48 · ⭐ 54K) - Ansible is a radically simple IT automation platform that makes your.. ❗️GPL-3.0
docker-compose (🥈 41 · ⭐ 26K · 📉 ) - Define and run multi-container applications with Docker. Apache-2
paramiko (🥈 41 · ⭐ 7.7K) - The leading native Python SSHv2 protocol library. ❗️LGPL-2.1
pulumi (🥈 40 · ⭐ 13K) - Pulumi - Universal Infrastructure as Code. Your Cloud, Your Language,.. Apache-2
kubernetes (🥈 37 · ⭐ 4.9K) - Official Python client library for kubernetes. Apache-2
pyinfra (🥉 30 · ⭐ 1.5K) - pyinfra automates infrastructure super fast at massive scale. It can be.. MIT
pypyr (🥉 21 · ⭐ 410) - pypyr task-runner cli & api for automation pipelines. Automate anything.. Apache-2
Show 5 hidden projects...
- sshtunnel (🥉 28 · ⭐ 980 · 💀 ) - SSH tunnels to remote server. MIT
- storm (🥉 25 · ⭐ 3.9K · 💀 ) - Manage your SSH like a boss. MIT
- fabtools (🥉 25 · ⭐ 1.3K · 💀 ) - Tools for writing awesome Fabric files. BSD-2
- parallel-ssh (🥉 25 · ⭐ 1K) - Asynchronous parallel SSH client library. ❗️LGPL-2.1
- wssh (🥉 17 · ⭐ 1.3K · 💀 ) - SSH to WebSockets Bridge. MIT
Process Utilities
pexpect (🥇 36 · ⭐ 2.2K) - A Python module for controlling interactive programs in a pseudo-terminal. ISC
supervisor (🥈 35 · ⭐ 7.4K) - Supervisor process control system for UNIX. ❗️Repoze Public License
ptyprocess (🥉 26 · ⭐ 170 · 💤 ) - Run a subprocess in a pseudo terminal. ISC
Asynchronous Programming
anyio (🥈 30 · ⭐ 920) - High level asynchronous concurrency and networking framework that works on.. MIT
Show 1 hidden projects...
Configuration
python-dotenv (🥇 36 · ⭐ 5.1K) - Reads key-value pairs from a .env file and can set them as.. BSD-3
omegaconf (🥈 33 · ⭐ 1.2K) - Flexible Python configuration system. The last one you will ever need. BSD-3
python-decouple (🥉 32 · ⭐ 2.1K) - Strict separation of config from code. MIT
gin-config (🥉 27 · ⭐ 1.6K) - Gin provides a lightweight configuration framework for Python. Apache-2
Show 2 hidden projects...
CLI Development
rich (🥇 43 · ⭐ 38K) - Rich is a Python library for rich text and beautiful formatting in the terminal. MIT
python-prompt-toolkit (🥈 37 · ⭐ 7.8K) - Library for building powerful interactive command.. BSD-3
python-fire (🥈 36 · ⭐ 23K) - Python Fire is a library for automatically generating command.. Apache-2
argcomplete (🥉 32 · ⭐ 1.1K) - Python and tab completion, better together. Apache-2
asciimatics (🥉 31 · ⭐ 3.1K) - A cross platform package to do curses-like operations, plus.. Apache-2
wcwidth (🥉 30 · ⭐ 280 · 💤 ) - Python library that measures the width of unicode strings rendered to.. MIT
ConfigArgParse (🥉 28 · ⭐ 600 · 💤 ) - A drop-in replacement for argparse that allows options to.. MIT
questionary (🥉 27 · ⭐ 890) - Python library to build pretty command line user prompts Easy to use.. MIT
Show 5 hidden projects...
- docopt (🥈 36 · ⭐ 7.6K · 💀 ) - Pythonic command line arguments parser, that will make you smile. MIT
- blessings (🥉 29 · ⭐ 1.3K · 💀 ) - A thin, practical wrapper around terminal capabilities in Python. MIT
- clint (🥉 25 · ⭐ 75 · 💀 ) - Python Command-line Application Tools. ISC
- bashplotlib (🥉 22 · ⭐ 1.7K · 💀 ) - plotting in the terminal. MIT
- docopt-ng (🥉 21 · ⭐ 87) - Humane command line arguments parser. Now with maintenance, typehints,.. MIT
Development Tools
Data Caching
cachetools (🥇 33 · ⭐ 1.4K) - Extensible memoizing collections and decorators. MIT
pylibmc (🥈 28 · ⭐ 450 · 💤 ) - A Python wrapper around the libmemcached interface from TangentOrg. BSD-3
Show 1 hidden projects...
- cached-property (🥈 30 · ⭐ 650 · 💀 ) - A decorator for caching properties in classes. BSD-3
GUI Development
kivy (🥇 40 · ⭐ 15K) - Open source UI framework written in Python, running on Windows, Linux, macOS,.. MIT
PySimpleGUI (🥈 38 · ⭐ 10K) - PySimpleGUI is a Python package that enables Python.. ❗️LGPL-3.0
DearPyGui (🥈 32 · ⭐ 8.1K) - Dear PyGui: A fast and powerful Graphical User Interface Toolkit for.. MIT
Eel (🥈 30 · ⭐ 5K · 💤 ) - A little Python library for making simple Electron-like HTML/JS GUI apps. MIT
Gooey (🥉 28 · ⭐ 16K) - Turn (almost) any Python command line program into a full GUI application.. MIT
Show 2 hidden projects...
- Phoenix (🥉 27 · ⭐ 1.8K) - wxPython's Project Phoenix. A new implementation of wxPython,.. ❗️wxWindows
- enaml (🥉 27 · ⭐ 1.2K) - Declarative User Interfaces for Python. ❗Unlicensed
Computer & Machine Vision
Machine Learning & Data Engineering
Text Data
chardet (🥇 35 · ⭐ 1.7K · 📈 ) - Python character encoding detector. ❗️LGPL-2.1
- GitHub (👨💻 48 · 🔀 240 · 📋 140 - 46% open · ⏱️ 29.06.2022): git clone https://github.com/chardet/chardet
- PyPi (📥 62M / month · 📦 39K · ⏱️ 25.06.2022): pip install chardet
- Conda (📥 17M · ⏱️ 25.06.2022): conda install -c conda-forge chardet
- npm (📥 8 / month · 📦 1 · ⏱️ 20.08.2017): npm install @pypi/chardet
phonenumbers (🥈 34 · ⭐ 3K) - Python port of Google's libphonenumber. Apache-2
python-slugify (🥈 30 · ⭐ 1.2K) - Returns unicode slugs. MIT
inflect (🥉 29 · ⭐ 670) - Correctly generate plurals, ordinals, indefinite articles; convert numbers.. MIT
pyahocorasick (🥉 28 · ⭐ 730) - Python module (C extension and plain python) implementing Aho-.. BSD-3
Show 4 hidden projects...
- awesome-slugify (🥉 22 · ⭐ 460 · 💀 ) - Python flexible slugify function. ❗️GPL-3.0
- price-parser (🥉 20 · ⭐ 230 · 💀 ) - Extract price amount and currency symbol from a raw text.. BSD-3
- coolname (🥉 19 · ⭐ 93 · 💀 ) - Random Name and Slug Generator. BSD-2
- millify (🥉 15 · ⭐ 65 · 💀 ) - Convert long numbers into a human-readable format in Python. MIT
Web Development
Database Clients
Libraries for connecting to, operating, and querying databases.
SQLAlchemy (🥇 44 · ⭐ 5.6K) - The Database Toolkit for Python. MIT
azure-storage-blob (🥇 42 · ⭐ 2.9K) - This repository is for active development of the Azure SDK.. MIT
google-cloud-storage (🥇 40 · ⭐ 3.9K) - Google Cloud Client Library for Python. Apache-2
elasticsearch (🥇 40 · ⭐ 3.7K) - Official Elasticsearch client library for Python. Apache-2
kafka-python (🥈 38 · ⭐ 4.9K) - Python client for Apache Kafka. Apache-2
MongoEngine (🥈 37 · ⭐ 3.8K) - A Python Object-Document-Mapper for working with MongoDB. MIT
python-bigquery (🥈 37 · ⭐ 450) - Google BigQuery API client library. Apache-2
confluent-kafka-python (🥈 36 · ⭐ 2.8K) - Confluent's Kafka Python Client. Apache-2
SQLAlchemy-Utils (🥈 36 · ⭐ 940) - Various utility functions and datatypes for SQLAlchemy. BSD-3
libcloud (🥈 35 · ⭐ 1.9K) - Apache Libcloud is a Python library which hides differences between.. Apache-2
Elasticsearch DSL (🥈 33 · ⭐ 3.5K) - High level Python client for Elasticsearch. Apache-2
AWS Data Wrangler (🥈 33 · ⭐ 2.9K) - Pandas on AWS - Easy integration with Athena, Glue,.. Apache-2 
mysqlclient (🥈 33 · ⭐ 2.1K) - MySQL database connector for Python (with Python 3 support). ❗️GPL-2.0
pandas-gbq (🥈 33 · ⭐ 310) - Google BigQuery connector for pandas. BSD-3
s3transfer (🥈 33 · ⭐ 140) - Amazon S3 Transfer Manager for Python. Apache-2
Prometheus Client (🥈 32 · ⭐ 2.8K) - Prometheus instrumentation library for Python.. Apache-2
PyPika (🥈 32 · ⭐ 1.7K) - PyPika is a python SQL query builder that exposes the full richness.. Apache-2
tortoise-orm (🥉 31 · ⭐ 2.9K) - Familiar asyncio ORM for python, built with relations in mind. Apache-2
Cassandra Driver (🥉 31 · ⭐ 1.3K) - DataStax Python Driver for Apache Cassandra. Apache-2
cx-Oracle (🥉 31 · ⭐ 820) - Python interface to Oracle Database now superseded by python-oracledb. BSD-3
neo4j-driver (🥉 31 · ⭐ 720) - Neo4j Bolt driver for Python. Apache-2
py2neo (🥉 30 · ⭐ 1.1K) - Py2neo is a comprehensive Neo4j driver library and toolkit for Python. Apache-2
dataset (🥉 29 · ⭐ 4.2K) - Easy-to-use data handling for SQL data stores with support for implicit.. MIT
redis-py-cluster (🥉 29 · ⭐ 1.1K) - Python cluster client for the official redis cluster. Redis.. MIT
sqlmodel (🥉 25 · ⭐ 7.6K) - SQL databases in Python, designed for simplicity, compatibility,.. MIT pydantic
prisma (🥉 24 · ⭐ 700) - Prisma Client Python is an auto-generated and fully type-safe database.. Apache-2
ODMantic (🥉 21 · ⭐ 580) - Async ODM (Object Document Mapper) for MongoDB based on python type hints. ISC
aioprometheus (🥉 19 · ⭐ 120) - A Prometheus Python client library for asyncio-based applications. MIT
psycopg3 (🥉 18 · ⭐ 710) - New generation PostgreSQL database adapter for the Python.. ❗️LGPL-3.0
- GitHub (👨💻 24 · 🔀 63 · 📋 170 - 13% open · ⏱️ 15.06.2022): git clone https://github.com/psycopg/psycopg
Show 12 hidden projects...
- psycopg2 (🥈 37 · ⭐ 2.6K) - PostgreSQL database adapter for the Python.. ❗️BSD-3-Clause-Attribution
- Records (🥉 30 · ⭐ 6.9K · 💀 ) - SQL for Humans. ISC
- pyodbc (🥉 30 · ⭐ 2.4K · 📉 ) - Python ODBC bridge. ❗️MIT-0
- google-cloud-bigtable (🥉 30 · ⭐ 34) - Google Cloud Bigtable API client library. Apache-2
- HappyBase (🥉 27 · ⭐ 590 · 💀 ) - A developer-friendly Python library to interact with Apache HBase. MIT
- mongo-connector (🥉 26 · ⭐ 1.8K · 💀 ) - MongoDB data stream pipeline tools by YouGov (adopted.. Apache-2
- pyhdb (🥉 24 · ⭐ 300 · 💀 ) - SAP HANA Connector in pure Python. Apache-2
- PyMODM (🥉 22 · ⭐ 340 · 💀 ) - A Pythonic, object-oriented interface for working with MongoDB. Apache-2
- db.py (🥉 21 · ⭐ 1.2K · 💀 ) - db.py is an easier way to interact with your databases. BSD-2
- gsheets-db-api (🥉 19 · ⭐ 180 · 💀 ) - A Python DB-API and SQLAlchemy dialect to Google Spreadsheets. MIT
- lazydata (🥉 16 · ⭐ 630 · 💀 ) - Lazydata: Scalable data dependencies for Python projects. Apache-2
- SuperSQLite (🥉 15 · ⭐ 700 · 💀 ) - A supercharged SQLite library for Python. MIT
Data Loading & Extraction
Libraries for loading, collecting, and extracting data from a variety of data sources and formats.
Datasets (🥇 42 · ⭐ 14K) - The largest hub of ready-to-use datasets for ML models with fast,.. Apache-2
xmltodict (🥈 36 · ⭐ 4.8K) - Python module that makes working with XML feel like you are working.. MIT
python-magic (🥈 36 · ⭐ 2.1K) - A python wrapper for libmagic. MIT
smart-open (🥈 32 · ⭐ 2.5K) - Utils for streaming large files (S3, HDFS, gzip, bz2...). MIT
csvkit (🥈 31 · ⭐ 5K) - A suite of utilities for converting to and working with CSV, the king of.. MIT
snorkel (🥈 30 · ⭐ 5.2K) - A system for quickly generating training data with weak supervision. Apache-2
gdown (🥈 30 · ⭐ 2.1K · 📈 ) - Download a large file from Google Drive (curl/wget fails because of the.. MIT
pandas-datareader (🥈 29 · ⭐ 2.4K) - Extract data from a wide range of Internet sources into a.. BSD-3
Intake (🥈 29 · ⭐ 780) - Intake is a lightweight package for finding, investigating, loading and.. BSD-2
rows (🥉 23 · ⭐ 800) - A common, beautiful interface to tabular data, no matter the format. ❗️LGPL-3.0
img2dataset (🥉 23 · ⭐ 730) - Easily turn large sets of image urls to an image dataset. Can.. MIT
deepdish (🥉 22 · ⭐ 240 · 💤 ) - Flexible HDF5 saving/loading and other data science tools from the.. BSD-3
csvs-to-sqlite (🥉 16 · ⭐ 700 · 💤 ) - Convert CSV files into a SQLite database. Apache-2
Show 8 hidden projects...
- PDFMiner (🥈 29 · ⭐ 4.8K · 💀 ) - Python PDF Parser (Not actively maintained). Check out pdfminer.six. MIT
- tabulator-py (🥉 27 · ⭐ 220 · 💀 ) - Python library for reading and writing tabular data via streams. MIT
- messytables (🥉 24 · ⭐ 380 · 💀 ) - Tools for parsing messy tabular data. This is now superseded by.. MIT
- Singer (🥉 23 · ⭐ 1K · 💀 ) - Standard for moving data between databases, web APIs, files,.. ❗️AGPL-3.0
- pyexcel-xlsx (🥉 23 · ⭐ 100 · 💀 ) - A wrapper library to read, manipulate and write data in xlsx.. BSD-3
- borb (🥉 19 · ⭐ 2.7K) - borb is a library for reading, creating and manipulating PDF files.. ❗Unlicensed
- excalibur (🥉 19 · ⭐ 1.2K · 💀 ) - A web interface to extract tabular data from PDFs. MIT
- Upgini (🥉 17 · ⭐ 46 · 🐣 ) - Free automated data enrichment library for machine learning searches.. BSD-3
Data Pipelines & Streaming
Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.
Airflow (🥇 45 · ⭐ 27K) - Platform to programmatically author, schedule, and monitor workflows. Apache-2
- GitHub (👨💻 2.4K · 🔀 11K · 📥 310K · 📋 5.9K - 14% open · ⏱️ 30.06.2022): git clone https://github.com/apache/airflow
- PyPi (📥 6.9M / month · 📦 480 · ⏱️ 04.06.2022): pip install apache-airflow
- Conda (📥 650K · ⏱️ 16.06.2022): conda install -c conda-forge airflow
- Docker Hub (📥 76M · ⭐ 360 · ⏱️ 17.06.2022): docker pull apache/airflow
Celery (🥇 45 · ⭐ 20K · 📉 ) - Asynchronous task queue/job queue based on distributed message passing. BSD-3
luigi (🥇 38 · ⭐ 16K · 📈 ) - Luigi is a Python module that helps you build complex pipelines of.. Apache-2
Great Expectations (🥈 37 · ⭐ 6.8K) - Always know what to expect from your data. Apache-2
Kedro (🥈 35 · ⭐ 7.3K) - A Python framework for creating reproducible, maintainable and modular.. Apache-2
dbt (🥈 35 · ⭐ 5.1K) - dbt enables data analysts and engineers to transform their data using the.. Apache-2
Activeloop (🥈 31 · ⭐ 4.7K) - Dataset format for AI. Build, manage, query & visualize datasets.. MPL-2.0
streamparse (🥉 27 · ⭐ 1.5K) - Run Python in Apache Storm topologies. Pythonic API, CLI.. Apache-2
Optimus (🥉 27 · ⭐ 1.2K) - Agile Data Preparation Workflows made easy with Pandas, Dask,.. Apache-2 spark
dbnd (🥉 27 · ⭐ 230) - DBND is an agile pipeline framework that helps data engineering teams.. Apache-2
PyFunctional (🥉 26 · ⭐ 2K) - Python library for creating data pipelines with chain functional.. MIT
whylogs (🥉 26 · ⭐ 1.6K) - Open standard for end-to-end data and ML monitoring for any scale in.. Apache-2
BatchFlow (🥉 21 · ⭐ 180) - BatchFlow helps you conveniently work with random or sequential.. Apache-2
bodywork-core (🥉 20 · ⭐ 350) - ML pipeline orchestration and model deployments on.. ❗️AGPL-3.0
spark-deep-learning (🥉 19 · ⭐ 1.9K) - Deep Learning Pipelines for Apache Spark. Apache-2 spark
- GitHub (👨💻 17 · 🔀 460 · 📦 22 · 📋 100 - 74% open · ⏱️ 21.03.2022): git clone https://github.com/databricks/spark-deep-learning
Databolt Flow (🥉 18 · ⭐ 940 · 💤 ) - Python library for building highly effective data science.. MIT
Mara Pipelines (🥉 17 · ⭐ 1.9K) - A lightweight opinionated ETL framework, halfway between plain.. MIT
RasgoQL (🥉 15 · ⭐ 260 · 🐣 ) - Write python locally, execute SQL in your data warehouse. ❗️AGPL-3.0
Show 8 hidden projects...
- mrjob (🥈 31 · ⭐ 2.6K · 💀 ) - Run MapReduce jobs on Hadoop or Amazon Web Services. Apache-2
- faust (🥈 30 · ⭐ 6.2K · 💀 ) - Python Stream Processing. BSD-3
- bonobo (🥉 24 · ⭐ 1.5K · 💀 ) - Extract Transform Load for Python 3.5+. Apache-2
- dpark (
🥉 22 · <g-emoji class="g-emoji" alias="star" fallback-src="https://github.githubassets.co