DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Colossal-AI: A Unified Deep Learning System for Big Model Era
A GPipe implementation in PyTorch
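GPipe's core idea is to split each batch into micro-batches and stream them through the pipeline stages, so different stages work on different micro-batches at the same time. A minimal sketch of that forward-pass schedule in plain Python (the function name and return shape are illustrative, not from any GPipe library):

```python
# GPipe-style pipeline schedule: split a batch into micro-batches and
# stream them through the stages, so stage s works on micro-batch m
# while stage s-1 already processes micro-batch m+1.
def gpipe_schedule(n_microbatches, n_stages):
    """Return a list of ticks; each tick lists (stage, microbatch)
    pairs that can run concurrently in the forward pass."""
    ticks = []
    for t in range(n_microbatches + n_stages - 1):
        # Stage s handles micro-batch t - s at tick t (if it exists yet).
        tick = [(s, t - s) for s in range(n_stages)
                if 0 <= t - s < n_microbatches]
        ticks.append(tick)
    return ticks
```

With 3 micro-batches and 2 stages, the pipeline fills at tick 1 (both stages busy) and drains at tick 3, instead of one stage idling while the other runs the whole batch.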
Paddle distributed training examples (飞桨分布式训练示例, "PaddlePaddle distributed training examples"): ResNet, BERT, GPT, MoE; DataParallel, ModelParallel, PipelineParallel, HybridParallel, AutoParallel, ZeRO Sharding, Recompute, GradientMerge, Offload, AMP, DGC, LocalSGD, Wide&Deep
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
A curated list of awesome projects and papers for distributed training or inference
WIP: Veloce is a low-code, Ray-based parallelization library for efficient machine learning computation on heterogeneous hardware.
PyTorch implementation of 3D U-Net with model parallelism across 2 GPUs, for large models
Development of Project HPGO | Hybrid Parallelism Global Orchestration
Torch Automatic Distributed Neural Network (TorchAD-NN) training library. Built on top of TorchMPI, this module automatically parallelizes neural network training.
Model parallelism for NN architectures with skip connections (e.g., ResNets, UNets)
Distributed TensorFlow (model parallelism) example repository
A decentralized and distributed framework for training DNNs
A simple graph partitioning algorithm written in Go, designed for partitioning neural networks across multiple devices, where crossing a device boundary incurs an added cost.
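Partitioning a network graph under a per-device boundary cost can be approximated greedily: assign each node to the device that already holds most of its neighbors, so fewer edges cross the costly boundary. A minimal two-device sketch in Python (the repo above is in Go; this function and its signature are illustrative, not taken from it):

```python
# Greedy bipartition of a network graph: place each node on the device
# where most of its already-assigned neighbors live, so fewer edges
# cross the (costly) device boundary.
def greedy_partition(nodes, edges, capacity):
    """nodes: node ids; edges: (u, v) pairs; capacity: max nodes per device.
    Returns (assignment dict, number of cut edges)."""
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    assignment, counts = {}, [0, 0]
    for n in nodes:
        # Score each device by how many of n's neighbors it already holds.
        scores = [sum(1 for m in neighbors[n] if assignment.get(m) == d)
                  for d in (0, 1)]
        # Prefer the higher-scoring device; break ties toward the emptier one.
        d = max((0, 1), key=lambda i: (scores[i], -counts[i]))
        if counts[d] >= capacity:
            d = 1 - d  # fall back when the preferred device is full
        assignment[n] = d
        counts[d] += 1
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return assignment, cut
```

On a chain of four layers with capacity 2 per device, this keeps the first two layers together and the last two together, cutting only the single middle edge.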
Materials from an internship at ALCF during the summer of 2019
An MPI-based distributed model parallelism technique for MLP
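MPI-based model parallelism for an MLP typically assigns contiguous layers to different ranks, with each rank computing its shard of the forward pass and sending activations to the next rank. A minimal sketch in pure Python (ranks are simulated sequentially instead of using mpi4py; all names here are illustrative, not from the repo above):

```python
# Layer-wise model parallelism for an MLP, simulated: each "rank" owns a
# contiguous slice of the layer list and hands its activations onward.
def linear_relu(x, w, b):
    # y[j] = relu(sum_i x[i] * w[i][j] + b[j])
    y = [sum(xi * wij for xi, wij in zip(x, col)) + bj
         for col, bj in zip(zip(*w), b)]
    return [max(0.0, v) for v in y]

def model_parallel_forward(x, layers, n_ranks):
    """Split `layers` (a list of (weights, bias) pairs) contiguously
    across n_ranks ranks and run the forward pass end to end."""
    per_rank = (len(layers) + n_ranks - 1) // n_ranks
    activations = x
    for rank in range(n_ranks):
        shard = layers[rank * per_rank:(rank + 1) * per_rank]
        for w, b in shard:  # this rank applies only its own layers
            activations = linear_relu(activations, w, b)
        # In real MPI code this is where rank would send `activations`
        # to rank + 1 (e.g. a point-to-point send) instead of looping on.
    return activations
```

Each rank stores only its own weights, which is the point of model parallelism: the full parameter set never has to fit on one device.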
A fully distributed hyperparameter optimization tool for PyTorch DNNs
This project focuses on parallelising pre-processing, measurement, and machine learning in the cloud, along with evaluation and analysis of cloud performance.