Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spar…
#hadoop
Repositories 1,683
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, work…
Python
Updated Apr 28, 2019
Deeplearning4j, ND4J, DataVec and more - deep learning & linear algebra for Java/Scala with GPUs + Spark
The official home of the Presto distributed SQL query engine for big data
Java
Updated May 1, 2019
Alluxio, formerly Tachyon, Unify Data at Memory Speed
alluxio
distributed-storage
big-data
memory-speed
hadoop
spark
virtual-file-system
presto
tensorflow
Java
Updated May 1, 2019
Open Source Fast Scalable Machine Learning Platform For Smarter Applications: Deep Learning, Gradient Boosting & XGBo…
h2o
machine-learning
data-science
deep-learning
big-data
ensemble-learning
gbm
random-forest
naive-bayes
pca
opensource
distributed
multi-threading
java
python
r
hadoop
spark
gpu
automatic
Java
Updated May 1, 2019
Hue is an open source SQL Cloud Assistant for developing and accessing SQL/Data Apps.
Python
Updated May 1, 2019
BigDL: Distributed Deep Learning Library for Apache Spark
Scala
Updated Apr 30, 2019
Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White
Makefile
Updated Oct 13, 2017
A large-scale entity and relation database supporting aggregation of properties
AI on Hadoop
Java
Updated Apr 18, 2019
A pandas-like deferred expression system, with first-class SQL support (Impala, PostgreSQL, SQLite, ...)
Resource scheduling and cluster management for AI
Hadoop, Docker, Kafka, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, D…
nagios-plugins
zookeeper
hadoop
hbase
cloudera
hbase-client
jenkins
travis-ci
hortonworks
ambari
cassandra
elasticsearch
docker
kafka
solr
redis
rabbitmq
consul
datastax
kubernetes
Python
Updated Apr 24, 2019
LizardFS is an Open Source Distributed File System licensed under GPLv3.
c-plus-plus
gplv3
nas
macosx
linux
posix
distributed-systems
distributed-computing
fault-tolerance
high-performance
high-availability
snapshot
qos
erasure-coding
replication
replicas
geo-replication
hsm
hierarchical-storage
hadoop
C++
Updated Apr 29, 2019
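A Spark-based movie recommendation system, including a web crawler project, a website, an admin back-end system, and the Spark recommendation system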
Java
Updated Apr 1, 2019
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on b…
Java
Updated Apr 24, 2019
cerndb / dist-keras Archived
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
machine-learning
deep-learning
apache-spark
data-parallelism
distributed-optimizers
keras
optimization-algorithms
tensorflow
data-science
hadoop
Python
Updated Jul 25, 2018
Upserts And Incremental Processing on Big Data
Java
Updated May 1, 2019
DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr / SolrCloud, Prest…
hadoop
hbase
cassandra
solr
solrcloud
kafka
consul
zookeeper
apache-drill
nifi
docker-image
dockerhub
docker
rabbitmq-cluster
nagios-plugins
spark
presto
rabbitmq
linux
kubernetes
Shell
Updated Mar 31, 2019
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Java
Updated Apr 26, 2019
A massive-data computing product for production environments. Documentation:
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
Updated Jan 30, 2019
Official home of Presto, the distributed SQL query engine for big data
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing workflows on va…
Kafka Connect HDFS connector
Ytk-learn is a distributed machine learning library which implements most popular machine learning algorithms (GBDT…
Java
Updated May 10, 2018
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Java
Updated Apr 25, 2018
Lightweight, simple structured NoSQL database for Android
android
nosql
sql
data
local
saver
shared
preferences
path
uri
simple
cassandra
firebase
mongo
db
mongodb
hadoop
cassandra-database
elastic
Java
Updated Oct 1, 2018