Popular repositories
- dist-keras Public archive
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
- hdfs-metadata Public
Tool for gathering block and replica metadata from HDFS. It also builds a heat map showing how replicas are distributed across disks and nodes.
- SparkPlugins Public
Code and examples of how to write and deploy Apache Spark Plugins with Spark 3.x. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spa…
- spark-dashboard Public
This repo provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using container technology.
- SparkDLTrigger Public
Code and tests for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"
- Hadoop-Profiler Public
Hadoop Profiler, or hprofiler, is a tool that analyzes on- and off-CPU workloads in distributed computing environments.
Repositories
- SparkTraining Public
Training material for the CERN course on Apache Spark: https://sparktraining.web.cern.ch/
- spark-dashboard Public
This repo provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using container technology.
- sparkMeasure Public
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics.
- SparkPlugins Public
Code and examples of how to write and deploy Apache Spark Plugins with Spark 3.x. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
- SparkDLTrigger Public
Code and tests for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"