@cerndb

CERN Database Group

Popular repositories

  1. dist-keras Public archive

    Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

    Python 620 172

  2. Tool for gathering block and replica metadata from HDFS. It also builds a heat map showing how replicas are distributed across disks and nodes.

    Java 56 18

  3. Code and examples of how to write and deploy Apache Spark Plugins with Spark 3.x. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spa…

    Scala 54 9

  4. This repo provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using container technology.

    Dockerfile 52 16

  5. Code and tests for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"

    Jupyter Notebook 25 10

  6. Hadoop Profiler, or hprofiler, is a tool which is able to analyze on- and off-CPU workloads on distributed computing environments.

    Shell 24 10

Repositories

  • SparkTraining Public

    Training material for the CERN course on Apache Spark: https://sparktraining.web.cern.ch/

    Jupyter Notebook 1 0 0 0 Updated Oct 13, 2022
  • spark-dashboard Public

    This repo provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using container technology.

    Dockerfile 52 Apache-2.0 16 1 0 Updated Oct 5, 2022
  • sparkMeasure Public

    This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics.

    Scala 11 Apache-2.0 3 0 0 Updated Oct 5, 2022
  • SparkPlugins Public

    Code and examples of how to write and deploy Apache Spark Plugins with Spark 3.x. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.

    Scala 54 Apache-2.0 9 2 0 Updated Aug 26, 2022
  • hadoop-xrootd Public

    Mirror of CERN db/hadoop-xrootd. Hadoop-XRootD Filesystem Connector

    Java 6 Apache-2.0 3 3 1 Updated Aug 11, 2022
  • zkpolicy Public

    ZooKeeper Policy Audit Tool (aka zkPolicy) for checking and enforcing ACLs on ZNodes.

    Java 4 MIT 1 1 0 Updated Apr 29, 2022
  • SparkDLTrigger Public

    Code and tests for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"

    Jupyter Notebook 25 Apache-2.0 10 0 0 Updated Feb 24, 2022
  • storage-api Public

    Unified RESTful interface for managing CERN's data storage back-ends

    Python 7 GPL-3.0 2 1 2 Updated Jan 31, 2022
  • cern-sso-python Public

    Python Re-implementation of the cern-get-sso-cookie functionality

    Python 10 6 1 0 Updated Jan 11, 2022
  • hbase-packet-inspector Public

    Analyzes network traffic of HBase RegionServers

    Clojure 1 Apache-2.0 5 0 0 Updated Nov 5, 2021
