# apache-spark
Here are 930 public repositories matching this topic...
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Updated May 26, 2019 - Scala
Interactive and Reactive Data Science using Scala and Spark.
Updated Jun 2, 2020 - JavaScript
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Topics: python, scala, apache-spark, pytorch, keras-tensorflow, bigdl, distributed-deep-learning, deep-neural-network, analytics-zoo
Updated Sep 5, 2020 - Jupyter Notebook
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Updated Mar 17, 2020 - Java
suhsteve commented Aug 19, 2020
APIs
SparkSession
- Python: def getActiveSession(cls)
- Scala: def executeCommand(runner: String, command: String, options: Map[String, String]): DataFrame
DataFrame
- Python: def transform(self, func)
- Python: def tail(self, num)
  Scala: def tail(n: Int): Array[T]
- Scala: def printSchema(level: Int): Unit
- Scala: def explain(mode: String): U
Apache Spark docker image
Updated Aug 15, 2020 - Dockerfile
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Topics: kubernetes, spark, apache-spark, kubernetes-operator, kubernetes-controller, kubernetes-crd, google-cloud-dataproc
Updated Aug 31, 2020 - Go
PySpark + Scikit-learn = Sparkit-learn
Updated Oct 24, 2017 - Python
(Deprecated) Scikit-learn integration package for Apache Spark
Updated Dec 3, 2019 - Python
A curated list of awesome Apache Spark packages and resources.
Updated Jul 16, 2020
Topics: data-science, machine-learning, spark, apache-spark, bigdata, data-transformation, pyspark, data-extraction, data-analysis, data-wrangling, dask, data-exploration, data-preparation, data-profiling, data-cleansing, big-data-cleaning, data-cleaner, cudf
Updated Aug 26, 2020 - Jupyter Notebook
C# and F# language binding and extensions to Apache Spark
Topics: streaming, spark, apache-spark, csharp, fsharp, bigdata, dataset, spark-streaming, eventhubs, mapreduce, dataframe, rdd, dstream, mobius, kafka-streaming, near-real-time
Updated Nov 1, 2019 - C#
R interface for Apache Spark
Updated Sep 4, 2020 - R
A Cluster Computing System for Processing Large-Scale Spatial Data
Topics: apache-spark, geospatial, spatial-analysis, spatial-index, spatial-queries, cluster-computing, spatial-join, spatial-sql
Updated Sep 3, 2020 - Java
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Updated Jan 24, 2017 - Scala
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Topics: python, airflow, spark, apache-spark, scheduler, s3, data-engineering, data-lake, warehouse, redshift, data-migration, livy, etl-framework, apache-airflow, emr-cluster, etl-pipeline, etl-job, data-engineering-pipeline, airflow-dag, goodreads-data-pipeline
Updated Mar 9, 2020 - Python
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Topics: data-science, machine-learning, spark, apache-spark, deep-learning, hadoop, tensorflow, keras, keras-models, optimization-algorithms, data-parallelism, distributed-optimizers
Updated Jul 25, 2018 - Python
Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
Updated Jan 8, 2020 - Scala
A command-line tool for launching Apache Spark clusters.
Updated Aug 3, 2020 - Python
REST web service for the true real-time scoring (<1 ms) of R, Scikit-Learn and Apache Spark models
Updated Aug 5, 2020 - Java
Papers and reading material related to streaming systems
Topics: streaming, apache-spark, storm, stream-processing, spark-streaming, dataflow, flink, heron, drizzle, millwheel, s4, streaming-engine, spe, stream-processing-engine
Updated Mar 31, 2018
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Topics: python, vagrant, data-science, data, machine-learning, airflow, kafka, spark, apache-spark, analytics, machine-learning-algorithms, python3, amazon-ec2, python-3, apache-kafka, amazon-web-services, predictive-analytics, agile-data, data-syndrome, agile-data-science
Updated Jul 29, 2020 - Jupyter Notebook
A list about Apache Kafka
Topics: infrastructure, kafka, apache-spark, stream-processing, apache-kafka, kafka-streams, data-processing, data-pipeline, streaming-data
Updated Dec 22, 2019
The Internals of Spark Structured Streaming
Updated Nov 16, 2019
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Updated Sep 14, 2015 - Shell
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Updated Sep 3, 2020 - Scala
MLflow seems to have a length limit of 5000 when setting tags (see below).