Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spar…
#big-data
Repositories 1,387
The Patterns Behind Scalable, Reliable, and Performant Large-Scale Systems
system-design
backend
scalability
site-reliability-engineering
sre
interview
architecture
devops
site-reliability
design-patterns
back-end
back-end-development
interview-questions
design-systems
awesome-list
microservices
distributed-systems
design-system
tech
big-data
Updated Apr 30, 2019
A realtime, decentralized, offline-first, mutable graph protocol to sync the web.
machine-learning
artificial-intelligence
big-data
blockchain
p2p
peer-to-peer
decentralized
graph
cryptography
crypto
offline-first
realtime
iot
crdt
protocol
database
end-to-end
encryption
dweb
dapp
JavaScript
Updated Apr 27, 2019
The official home of the Presto distributed SQL query engine for big data
Java
Updated May 1, 2019
A tool for managing Apache Kafka.
Scala
Updated Apr 30, 2019
ClickHouse is a free analytic DBMS for big data.
Kubernetes Chinese Guide / Cloud Native Application Architecture Practice Handbook - https://jimmysong.io/kubernetes-handbook
The most widely used Python to C compiler
Alluxio, formerly Tachyon, unifies data at memory speed
alluxio
distributed-storage
big-data
memory-speed
hadoop
spark
virtual-file-system
presto
tensorflow
Java
Updated May 1, 2019
Stream Framework is a Python library that allows you to build news feeds, activity streams and notification systems …
Python
Updated Apr 2, 2019
Open Source Fast Scalable Machine Learning Platform For Smarter Applications: Deep Learning, Gradient Boosting & XGBo…
h2o
machine-learning
data-science
deep-learning
big-data
ensemble-learning
gbm
random-forest
naive-bayes
pca
opensource
distributed
multi-threading
java
python
r
hadoop
spark
gpu
automatic
Java
Updated May 1, 2019
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, reg…
Reproducible Data Science at Scale!
Moloch is an open source, large scale, full packet capturing, indexing, and database system.
JavaScript
Updated Apr 30, 2019
Open Source In-Memory Data Grid
Java
Updated May 1, 2019
BigDL: Distributed Deep Learning Library for Apache Spark
Scala
Updated Apr 30, 2019
Vespa is an engine for low-latency computation over large data sets.
Bare bone examples of machine learning in TensorFlow
tensorflow
tensorflow-tutorials
distributed-computing
simple
big-data
linear-regression
tensorflow-examples
tensorflow-exercises
Python
Updated Mar 14, 2017
An easy-to-use, self-service open BI reporting and BI dashboard platform.
JavaScript
Updated Mar 18, 2019
A large-scale entity and relation database supporting aggregation of properties
data-science
data-visualization
dashboard
data-engineering
d3
d3js
chart
data
yaml
csv
json
gist
github-gist
big-data
business-intelligence
data-driven
just-dashboard
JavaScript
Updated Apr 29, 2019
MySQL performance monitoring and analysis.
Java
Updated Jan 9, 2019
A search engine which can hold 100 trillion lines of log data.
Go
Updated May 22, 2017
Distributed Big Data Orchestration Service
big-data
bigdata
orchestration
configuration
configuration-management
java
spring-boot
distributed-systems
netflixoss
cloud
netflix-oss
microservice
microservices
Java
Updated Apr 30, 2019
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
spark
python
pyspark
data-analysis
mllib
ipython-notebook
notebook
ipython
data-science
machine-learning
big-data
bigdata
Jupyter Notebook
Updated Sep 6, 2017
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine L…
Java
Updated Dec 26, 2018
Resource scheduling and cluster management for AI
TrailDB is an efficient tool for storing and querying series of events
C
Updated Oct 31, 2018
JavaScript
Updated May 1, 2019