Making big AI models cheaper, easier, and more scalable
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
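For context, a hedged sketch of the usual DeepSpeed entry point follows; the toy model and the "ds_config.json" path are placeholders, not taken from this listing.

```python
# Hedged sketch of the typical DeepSpeed entry point; the toy model and the
# "ds_config.json" path are placeholders, not taken from this listing.
import torch
import deepspeed

net = torch.nn.Linear(1024, 1024)  # stand-in for a large model

# deepspeed.initialize wraps the model in an engine that applies whatever the
# JSON config enables (ZeRO sharding, mixed precision, gradient accumulation).
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=net,
    model_parameters=net.parameters(),
    config="ds_config.json",
)

x = torch.randn(8, 1024).to(model_engine.device)
loss = model_engine(x).sum()
model_engine.backward(loss)  # engine-managed backward (loss scaling, etc.)
model_engine.step()          # optimizer step plus partition bookkeeping
```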
A GPipe implementation in PyTorch
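A minimal sketch of GPipe-style pipeline parallelism, assuming the torchgpipe package; the layer sizes, balance, and chunk count below are illustrative placeholders.

```python
# Minimal sketch assuming the torchgpipe package; layer sizes, balance, and
# chunk count are illustrative placeholders.
import torch
import torch.nn as nn
from torchgpipe import GPipe

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# balance=[2, 1] puts the first two modules on one GPU and the last on another;
# chunks=4 splits each mini-batch into four micro-batches that flow through
# the pipeline concurrently.
model = GPipe(model, balance=[2, 1], chunks=4)

out = model(torch.randn(32, 1024).to(model.devices[0]))
```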
Paddle Distributed Training Examples (飞桨分布式训练示例), covering ResNet, BERT, GPT, MoE, data parallelism, model parallelism, pipeline parallelism, hybrid parallelism, auto parallelism, ZeRO sharding, recompute, gradient merge, offload, AMP, DGC, LocalSGD, and Wide&Deep
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
A curated list of awesome projects and papers for distributed training or inference
Work in progress. Veloce is a low-code, Ray-based parallelization library for efficient machine learning computation across heterogeneous resources.
PyTorch implementation of 3D U-Net with model parallel in 2GPU for large model
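The underlying idea, sketched here with placeholder layers rather than that repository's actual U-Net, is to place each half of the network on its own GPU and copy activations across the boundary in forward().

```python
# Illustrative sketch (not the linked repository's code) of splitting one
# large model across two GPUs: each half lives on its own device and
# activations are moved across the boundary inside forward().
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU()).to("cuda:0")
        self.decoder = nn.Conv3d(16, 1, 3, padding=1).to("cuda:1")

    def forward(self, x):
        h = self.encoder(x.to("cuda:0"))
        return self.decoder(h.to("cuda:1"))  # hand the activations to the second GPU

out = TwoGPUNet()(torch.randn(1, 1, 16, 64, 64))  # small volume for a demo
```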
Adaptive Tensor Parallelism for Foundation Models
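Tensor parallelism itself means splitting individual weight matrices across devices; a minimal illustration (not this project's API) shards a linear layer column-wise across two GPUs.

```python
# Illustrative sketch of column-wise tensor parallelism (not this project's
# API): one weight matrix is sharded across two GPUs, each GPU computes its
# output slice, and the slices are concatenated.
import torch

x = torch.randn(8, 512)
W = torch.randn(512, 1024)

W0, W1 = W.chunk(2, dim=1)                  # column-wise shards of the weight
y0 = x.to("cuda:0") @ W0.to("cuda:0")       # each GPU computes its slice
y1 = x.to("cuda:1") @ W1.to("cuda:1")
y = torch.cat([y0.cpu(), y1.cpu()], dim=1)  # gather the output slices

print((y - x @ W).abs().max())  # matches the unsharded matmul up to float error
```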
Development of Project HPGO | Hybrid Parallelism Global Orchestration
Description of Legion (2021), a framework for efficient fused-layer cost estimation
A decentralized and distributed framework for training DNNs
Torch Automatic Distributed Neural Network (TorchAD-NN) training library. Built on top of TorchMPI, this module automatically parallelizes neural network training.
Model parallelism for NN architectures with skip connections (e.g., ResNets, U-Nets)
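Skip connections complicate naive model parallelism because the skipped tensor has to follow the activations to the device where it is consumed; a small sketch with placeholder layers (not the linked repository's code):

```python
# Sketch of the extra step a skip connection needs under naive model
# parallelism: the skipped tensor must also be moved to the device where it
# is consumed. Placeholder layers, not the linked repository's code.
import torch
import torch.nn as nn

class SplitResidualBlock(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.f = nn.Linear(dim, dim).to("cuda:0")
        self.g = nn.Linear(dim, dim).to("cuda:1")

    def forward(self, x):
        x = x.to("cuda:0")
        h = self.g(self.f(x).to("cuda:1"))
        return h + x.to("cuda:1")  # the skip tensor crosses the boundary too
```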
Distributed TensorFlow (model parallelism) example repository
A simple graph partitioning algorithm written in Go. Designed for partitioning neural networks across multiple devices, where crossing a device boundary incurs an added communication cost.
An MPI-based distributed model parallelism technique for MLP
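The general pattern, sketched here with mpi4py and NumPy rather than that repository's code, assigns one layer per rank and passes activations as point-to-point messages; such a script would be launched with something like "mpiexec -n 2 python <script>.py".

```python
# Hedged sketch with mpi4py and NumPy (not the repository's code): rank 0 owns
# the first layer, rank 1 the second, and activations travel via point-to-point
# messages. Launch with e.g. "mpiexec -n 2 python <script>.py".
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
batch, d_in, d_hidden, d_out = 32, 64, 128, 10

if rank == 0:
    W1 = np.random.randn(d_in, d_hidden)
    x = np.random.randn(batch, d_in)
    h = np.maximum(x @ W1, 0.0)      # first layer + ReLU on rank 0
    comm.Send(h, dest=1, tag=0)      # ship the activations to rank 1
elif rank == 1:
    W2 = np.random.randn(d_hidden, d_out)
    h = np.empty((batch, d_hidden))
    comm.Recv(h, source=0, tag=0)    # receive activations from rank 0
    y = h @ W2                       # second layer on rank 1
    print("output shape:", y.shape)
```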
Materials from an internship at ALCF during the summer of 2019
A fully distributed hyperparameter optimization tool for PyTorch DNNs