List of libraries, tools and APIs for web scraping and data processing.
Makefile
Updated Oct 18, 2018
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
A curated list of awesome curated lists of many topics.
Updated Oct 21, 2018
Extract Transform Load for Python 3.5+
Python
Updated Oct 27, 2018
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processin…
Updated Jul 25, 2018
Source code accompanying the book Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Jupyter Notebook
Updated Oct 26, 2018
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Python
Updated Jun 12, 2018
Skip bad items in your PyTorch DataLoader, use Transforms as Filters, and more!
Python
Updated Oct 15, 2018
A list about Apache Kafka
Updated Jun 8, 2018
Fluence is a decentralized data processing engine
Scala
Updated Oct 26, 2018
Manipulating VASP files with Python.
Python
Updated May 13, 2018
Collection of Data Processing Agreement (DPA) and GDPR compliance resources
CSS
Updated Aug 31, 2018
Preliminary-round solution for the IJCAI-18 Alimama search ad conversion prediction contest
Jupyter Notebook
Updated Apr 22, 2018
Python Adaptive Signal Processing
Python
Updated Mar 26, 2018
Machine Learning notebooks for refreshing concepts.
Jupyter Notebook
Updated Oct 5, 2018
A blazing fast exporter for your Elasticsearch data.
C++
Updated Oct 4, 2018
CBRAIN is a flexible Ruby on Rails framework for accessing and processing large data on high-performance computing…
Ruby
Updated Oct 26, 2018
Docker image packaging for Apache Storm
Shell
Updated Oct 3, 2018
Some class materials for a data processing course using PySpark
Python
Updated Feb 14, 2018
Enterprise backend as a service
Java
Updated Oct 16, 2018
VIP is a Python package/library for angular, reference star and spectral differential imaging for exoplanet/disk dete…
Python
Updated Oct 24, 2018
The MDSplus data management system
A package manager built for the command-line JSON processor jq.
Shell
Updated Jun 4, 2016
⇔ wq's io library, an interoperability tool for importing and exporting tabular and time series data, e.g. from citiz…
Data pipelining service
Java
Updated Aug 31, 2017
Create a serverless, event-driven application with Apache OpenWhisk on IBM Cloud Functions that executes code in resp…
Little utility to decode Metastock files and write them in text format
Python
Updated Dec 9, 2010
Deep learning tools for predicting oil well data
Python
Updated Apr 5, 2018
Rheem - a cross-platform data processing system
Java
Updated Sep 11, 2018
Apache Beam portability demo
Java
Updated Aug 28, 2017