[BUG] GPT-J InferenceEngine Initialization Failure: RuntimeError
bug
Something isn't working
#1946
opened May 10, 2022 by
joehoover
[BUG] ZeRO 1 and ZeRO 2 produces different losses
bug
Something isn't working
#1945
opened May 10, 2022 by
szhengac
[BUG] Error on loading saved optimizer after training (zero-3)
bug
Something isn't working
#1940
opened May 6, 2022 by
base-y
Can't run ZeRO on single GPU with windows11[BUG]
bug
Something isn't working
#1928
opened Apr 30, 2022 by
Sagivfer
[BUG] No Model Parameters Specified with Latest Release
bug
Something isn't working
#1926
opened Apr 29, 2022 by
jmwoloso
[BUG] Floating Point Exception (core dump) at launch_attn_softmax_v2<float>
bug
Something isn't working
#1925
opened Apr 29, 2022 by
codertimo
[BUG] OpenMPI backend doesn't support custom mpi launch args
bug
Something isn't working
#1924
opened Apr 29, 2022 by
flyhighzy
[BUG] Multi-Node Address in Use Error
bug
Something isn't working
#1923
opened Apr 28, 2022 by
Sanger2000
[BUG] attention_mask is overwritten by dummy tensor at DeepSpeedSelfAttentionFunction
bug
Something isn't working
#1912
opened Apr 26, 2022 by
codertimo
[BUG] Autotuner is not launching experiments with correct hostfile setting
bug
Something isn't working
#1904
opened Apr 20, 2022 by
grzywada
[REQUEST] torch equivalent api model.no_sync()
enhancement
New feature or request
#1902
opened Apr 20, 2022 by
tangzhy
[REQUEST] SSG GPU support
enhancement
New feature or request
#1898
opened Apr 19, 2022 by
LifeIsStrange
[BUG] DeepSpeed zero_to_fp32.py script ignores some layers while creating FP32 checkpoints from DS ZeRO checkpoints.
bug
Something isn't working
#1896
opened Apr 19, 2022 by
rohitdwivedula
[BUG] Zero3 Checkpointing doesn't include HF T5's token embeddings
bug
Something isn't working
#1893
opened Apr 15, 2022 by
m3rlin45
[REQUEST] cpu offload needs a max cpu memory config + pointers to cgroups/cpu oom handlers
enhancement
New feature or request
#1891
opened Apr 12, 2022 by
stas00
Instructions for building the AIO op using libaio from conda
enhancement
New feature or request
#1890
opened Apr 12, 2022 by
stas00
[REQUEST] removing the requirement for all layers to always execute in sync
enhancement
New feature or request
#1888
opened Apr 12, 2022 by
stas00
Loading fp16 model checkpoints with MoE layers
bug
Something isn't working
#1876
opened Mar 31, 2022 by
joeljang