A curated list of awesome big data frameworks, resources and other awesomeness.
Updated Feb 12, 2019
Python clone of Spark, a MapReduce-like framework in Python
Python
Updated Jan 23, 2019
Out-of-core DataFrames for Python: visualize and explore big tabular data at a billion rows per second.
Python
Updated Apr 29, 2019
Distributed Big Data Orchestration Service
Java
Updated Apr 30, 2019
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Jupyter Notebook
Updated Sep 6, 2017
C# and F# language binding and extensions to Apache Spark
C#
Updated Apr 25, 2019
Unify Big Data and Machine Learning.
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Lightweight real-time big data streaming engine over Akka
Scala
Updated Apr 8, 2019
An online movie recommender using Spark, Python Flask, and the MovieLens dataset
Jupyter Notebook
Updated Nov 24, 2018
🚚 Agile Data Science Workflows made easy with PySpark
Multiprocess image web crawler for Google and Naver (Selenium)
Python
Updated Feb 26, 2019
A distributed data synchronization tool based on Flink
Java
Updated Mar 18, 2019
Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax
Java
Updated Apr 28, 2019
Fast topic modeling platform
C++
Updated Apr 27, 2019
A batch scheduler for Kubernetes for high-performance workloads, e.g., AI/ML, big data, HPC
D3 library for building circular graphs
Jigsaw七巧板 provides a set of web components based on Angular5+. The main purpose of Jigsaw is to help the application …
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
An open source platform for managing and analyzing biomedical big data
Ruby
Updated Apr 30, 2019
Asynchronous HBase client for Node.js using REST
CoffeeScript
Updated Mar 13, 2019
A book about running Elasticsearch
Updated Apr 18, 2019
ROOT I/O in pure Python and NumPy.
Python
Updated Apr 30, 2019
Code snippets for solving common big data problems on various platforms. Inspired by Rosetta Code.
Scala
Updated Apr 24, 2019
An end-to-end machine learning and data mining framework on Hadoop
Java
Updated Apr 23, 2019
A collection of pentest tools and resources targeting Hadoop environments
Python
Updated Jul 1, 2017
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Updated Apr 27, 2019
Skills a big data architect should master
Updated Jan 16, 2019
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Java
Updated Dec 11, 2018
Data Lineage Tracking and Visualization tool for Apache Spark™
Scala
Updated Apr 30, 2019