# model-parallelism

Here are 14 public repositories matching this topic...
A GPipe implementation in PyTorch
Updated Jul 29, 2020 · Python
Paddle Distributed Training Extended (PaddlePaddle distributed training extension package)
Topics: benchmark, cloud, lightning, elastic, large-scale, data-parallelism, paddlepaddle, model-parallelism, distributed-algorithm, pipeline-parallelism, fleet-api, paddlecloud
Updated Aug 24, 2020 · Shell
Torch Automatic Distributed Neural Network (TorchAD-NN) training library. Built on top of TorchMPI, this module automatically parallelizes neural network training.
Topics: machine-learning, neural-network, torch7, openmpi, data-parallelism, model-parallelism, distributed-machine-learning
Updated Feb 28, 2018 · Lua
Development of Project HPGO | Hybrid Parallelism Global Orchestration
Topics: rust, machine-learning, tensorflow, pytorch, data-parallelism, model-parallelism, distributed-training, pipedream, gpipe, pipeline-parallelism
Updated Aug 24, 2020 · Rust
Distributed TensorFlow (model parallelism) example repository
Updated Jul 13, 2019 · Python
PyTorch implementation of a 3D U-Net with model parallelism across two GPUs, for large models
Updated Aug 9, 2020 · Python
A simple graph partitioning algorithm written in Go, designed for partitioning neural networks across multiple devices where crossing device boundaries incurs an added cost.
Updated Jun 17, 2020 · Go
Materials from an internship at ALCF during the summer of 2019
Updated Aug 15, 2019 · Python
A decentralized and distributed framework for training DNNs
Updated Aug 25, 2019 · Python
Mesh TensorFlow: Model Parallelism Made Easier
Updated Dec 18, 2018 · Python
An MPI-based distributed model parallelism technique for MLPs
Updated Jun 10, 2020 · C
Performance test of MNIST handwriting classification using MXNet + TF
Topics: python, mxnet, tensorflow, keras, mnist, classification, gluon, multi-gpu, model-parallelism, horovod, multi-gpu-training, mirrored-strategy
Updated Jan 31, 2020 · Python
Hi,
I have tried both loss.backward() and model_engine.backward(loss) in my code and observed several subtle differences. For one, retain_graph=True does not work with model_engine.backward(loss). This is creating a problem for me, since the autograd buffers are not retained between backward passes. Please look into this if you can.
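For context, here is a minimal plain-PyTorch sketch of the behavior the comment relies on: passing retain_graph=True to the first backward call keeps the autograd buffers alive so the same graph can be traversed a second time. This uses only stock loss.backward(); whether and how the engine's backward method (model_engine is presumably a DeepSpeed-style engine wrapper) forwards such a flag is exactly what the comment is asking about, so nothing below should be read as that engine's API.

```python
import torch

# A trivial graph: y = x^2, so dy/dx = 2x.
x = torch.tensor(2.0, requires_grad=True)
y = x * x

# First backward pass. retain_graph=True tells autograd NOT to free
# the intermediate buffers, so the graph can be walked again.
y.backward(retain_graph=True)
first = x.grad.item()   # 2 * 2.0 = 4.0

# Second backward pass over the same graph. Without retain_graph on
# the first call, this would raise a RuntimeError about freed buffers.
# Gradients accumulate into x.grad.
y.backward()
second = x.grad.item()  # 4.0 + 4.0 = 8.0
```

If the engine's backward path drops or ignores this flag, the second traversal fails, which would explain the buffer behavior described above.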