Issues: microsoft/DeepSpeed
Issues list
[REQUEST] sync FusedAdam with the upstream
enhancement
New feature or request
#3006
opened Mar 13, 2023 by
stas00
[BUG] Assertion error while using pipeline parallelism
bug
Something isn't working
training
#3003
opened Mar 13, 2023 by
x54-729
[BUG] Peft Training with Zero.Init() and Zero3 will increase GPU memory every forward step
bug
Something isn't working
training
#3002
opened Mar 13, 2023 by
dumpmemory
[REQUEST] Support Deepspeed inference for Fairseq Transformer LM model
enhancement
New feature or request
inference
#3001
opened Mar 13, 2023 by
krishnanNuance
[BUG] NVMe Offloading OOMs While Zero Stage 3 Offloading Runs
bug
Something isn't working
training
#3000
opened Mar 13, 2023 by
stanleyshly
[BUG] Zero Offload Is Significantly Slower Than Normal Training
bug
Something isn't working
training
#2998
opened Mar 13, 2023 by
stanleyshly
[BUG] deepspeed_stage_3 used in pytorch_lightning: initialization consumes a huge amount of CPU memory, which grows with the number of GPUs
bug
Something isn't working
training
#2997
opened Mar 12, 2023 by
linyubupa
13B model training OOM with 8x48G machine and limited CPU RAM
bug
Something isn't working
training
#2996
opened Mar 11, 2023 by
lavaaa7
[BUG] Does bf16 support Zero stage 1 with pipeline?
bug
Something isn't working
training
#2994
opened Mar 10, 2023 by
lyj201002
Should DeepSpeedEngine.save_checkpoint be called only under main_process?
bug
Something isn't working
training
#2993
opened Mar 10, 2023 by
better629
[BUG] ZeRO-Offload GPU -> CPU datatype
bug
Something isn't working
training
#2986
opened Mar 9, 2023 by
taehyunzzz
[BUG] DeepSpeed Inference reports Signal code: Integer divide-by-zero when Seq length is 4096 for GPT2
bug
Something isn't working
inference
#2985
opened Mar 9, 2023 by
zhen-jia
[BUG] Sudden increase in CPU memory usage when engine.step() using zero-3
bug
Something isn't working
training
#2984
opened Mar 9, 2023 by
4AKker
Change request regarding the use of CUDA_VISIBLE_DEVICES in deepspeed/launcher/runner.py
enhancement
New feature or request
#2980
opened Mar 9, 2023 by
JY-Ren
AttributeError: 'PipelineEngine' object has no attribute 'flatten'
bug
Something isn't working
training
#2978
opened Mar 9, 2023 by
suiyan538
[BUG] Training with CPU offload raises OOM error on WSL2
training
#2977
opened Mar 9, 2023 by
lpty
[BUG] export CUDA_VISIBLE_DEVICES=0,1,6,7 does not work
bug
Something isn't working
training
#2976
opened Mar 9, 2023 by
xu-song
[REQUEST] Custom partition option for PipelineModule
enhancement
New feature or request
#2974
opened Mar 9, 2023 by
sgunasekar
Multi-node training reports "stop_waiting response required" and "connection reset by peer"
bug
Something isn't working
training
#2973
opened Mar 8, 2023 by
maxmaier59
[BUG] 'StableDiffusionPipeline' object has no attribute 'children'
bug
Something isn't working
inference
#2968
opened Mar 8, 2023 by
stevensu1977
Does DeepSpeed automatically partition the model across multiple GPUs when stage=3 is specified?
#2966
opened Mar 8, 2023 by
superzhangmch
[REQUEST] How to deploy multi-node training without a hostfile?
enhancement
New feature or request
#2958
opened Mar 7, 2023 by
SefaZeng
DeepSpeed on a T4 GPU server: error when running Stable Diffusion model inference
bug
Something isn't working
inference
#2957
opened Mar 7, 2023 by
qingyuan18