Running Intel Caffe multi-node with MLSL on AMD CPUs stops at Iteration 0 #19
Comments

Hi @Tron-x, could you please specify how you launch IntelCaffe over OpenMPI?

Hi @mshiryaev, when I use OpenMPI, I launch IntelCaffe with a case such as:

Hi @Tron-x

When I run Intel Caffe on multiple nodes (four nodes) with MLSL on AMD CPUs, something goes wrong: training stops at Iteration 0. When I run on a single node, it works fine.


When I run htop on every node:
My run command is: ./scripts/run_intelcaffe.sh --hostfile /opt/caffe/mpd.hosts --network tcp --netmask enp3s0f0 --caffe_bin /opt/caffe/build/tools/caffe --solver /opt/caffe/models/intel_optimized_models/multinode/alexnet_4nodes/solver.prototxt
I think something is wrong with MLSL. My MLSL version is

because when I run with my own OpenMPI, it works fine.