A curated list of awesome big data frameworks, resources and other awesomeness.
Updated Feb 12, 2019
A Python clone of Spark, a MapReduce-like framework in Python
Python
Updated Jan 23, 2019
Distributed Big Data Orchestration Service
Java
Updated Mar 22, 2019
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Jupyter Notebook
Updated Sep 6, 2017
Out-of-Core DataFrames for Python: visualize and explore big tabular data at a billion rows per second.
Python
Updated Mar 21, 2019
C# and F# language bindings and extensions for Apache Spark
C#
Updated Dec 24, 2018
Unify Big Data and Machine Learning.
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Jupyter Notebook
Updated Nov 24, 2018
🚚 Agile Data Science Workflows made easy with Python and Spark.
Google, Naver multiprocess image web crawler (Selenium)
Python
Updated Feb 26, 2019
A distributed data synchronization tool based on Flink
Java
Updated Mar 18, 2019
Fast topic modeling platform
C++
Updated Mar 16, 2019
Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax
Java
Updated Mar 14, 2019
d3 library to build circular graphs
Jigsaw七巧板 provides a set of web components based on Angular5+. The main purpose of Jigsaw is to help the application …
An open source platform for managing and analyzing biomedical big data
Ruby
Updated Mar 21, 2019
Asynchronous HBase client for Node.js using REST
CoffeeScript
Updated Mar 13, 2019
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
ROOT I/O in pure Python and Numpy.
Python
Updated Mar 21, 2019
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Scala
Updated Mar 13, 2019
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Updated Mar 22, 2019
A collection of pentest tools and resources targeting Hadoop environments
Python
Updated Jul 1, 2017
An end-to-end machine learning and data mining framework on Hadoop
Java
Updated Mar 22, 2019
A book about running Elasticsearch
Updated Mar 18, 2019
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Java
Updated Dec 11, 2018
Skills a big data architect should master
Updated Jan 16, 2019
Data Lineage Tracking and Visualization tool for Apache Spark™
Scala
Updated Mar 22, 2019
🐳 big data study
Updated Apr 16, 2017
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spa…
Java
Updated Mar 9, 2019
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Jupyter Notebook
Updated Sep 6, 2017