# model-parallelism

Here are 14 public repositories matching this topic...
A GPipe implementation in PyTorch
Updated Jul 29, 2020 · Python
Paddle Distributed Training Extended (PaddlePaddle distributed training extension package)
Topics: benchmark, cloud, lightning, elastic, large-scale, data-parallelism, paddlepaddle, model-parallelism, distributed-algorithm, pipeline-parallelism, fleet-api, paddlecloud
Updated Aug 24, 2020 · Shell
Torch Automatic Distributed Neural Network (TorchAD-NN) training library. Built on top of TorchMPI, this module automatically parallelizes neural network training.
Topics: machine-learning, neural-network, torch7, openmpi, data-parallelism, model-parallelism, distributed-machine-learning
Updated Feb 28, 2018 · Lua
Development of Project HPGO | Hybrid Parallelism Global Orchestration
Topics: rust, machine-learning, tensorflow, pytorch, data-parallelism, model-parallelism, distributed-training, pipedream, gpipe, pipeline-parallelism
Updated Aug 24, 2020 · Rust
Distributed TensorFlow (model parallelism) example repository
Updated Jul 13, 2019 · Python
PyTorch implementation of a 3D U-Net with model parallelism across two GPUs, for large models
Updated Aug 9, 2020 · Python
A simple graph partitioning algorithm written in Go, designed for partitioning neural networks across multiple devices where crossing device boundaries incurs an added cost.
Updated Jun 17, 2020 · Go
Materials from an internship at ALCF during the summer of 2019
Updated Aug 15, 2019 · Python
A decentralized and distributed framework for training DNNs
Updated Aug 25, 2019 · Python
Mesh TensorFlow: Model Parallelism Made Easier
Updated Dec 18, 2018 · Python
An MPI-based distributed model parallelism technique for MLPs
Updated Jun 10, 2020 · C
Performance test of MNIST handwriting classification using MXNet + TF
Topics: python, mxnet, tensorflow, keras, mnist, classification, gluon, multi-gpu, model-parallelism, horovod, multi-gpu-training, mirrored-strategy
Updated Jan 31, 2020 · Python
Hi,
I have tried both loss.backward() and model_engine.backward(loss) in my code and observed several subtle differences. For one, retain_graph=True does not work with model_engine.backward(loss). This is creating a problem for me, since the autograd buffers are not retained between backward passes. Please look into this if you can.
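For context, here is a minimal plain-PyTorch sketch of the behavior the comment relies on: passing retain_graph=True to the first backward call keeps the autograd buffers alive so the same graph can be traversed a second time. This uses only stock loss.backward(); whether and how the engine's backward method (model_engine is presumably a DeepSpeed-style engine wrapper) forwards such a flag is exactly what the comment is asking about, so nothing below should be read as that engine's API.

```python
import torch

# A trivial graph: y = x^2, so dy/dx = 2x.
x = torch.tensor(2.0, requires_grad=True)
y = x * x

# First backward pass. retain_graph=True tells autograd NOT to free
# the intermediate buffers, so the graph can be walked again.
y.backward(retain_graph=True)
first = x.grad.item()   # 2 * 2.0 = 4.0

# Second backward pass over the same graph. Without retain_graph on
# the first call, this would raise a RuntimeError about freed buffers.
# Gradients accumulate into x.grad.
y.backward()
second = x.grad.item()  # 4.0 + 4.0 = 8.0
```

If the engine's backward path drops or ignores this flag, the second traversal fails, which would explain the buffer behavior described above.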