distributed-training

Could you please train GhostNet?
(I don't have the ImageNet dataset.)

We would like to forward a particular 'key' column, which is part of the features, so that it appears alongside the predictions. This would let us identify which set of features a particular prediction belongs to. Here is an example of the predictions output when using tensorflow.contrib.estimator.multi_class_head:

{"classes": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
 "scores": [0.068196

I have the same hardware environment and the same network, but I could not reproduce your results; mine are almost half of yours. Are there any best practices or experience you can share? Thanks very much! For BytePS with 1 instance and 8 GPUs, I get similar test results.

Currently the allocator triggers its allocation policy at a fixed time interval (default 60s). This is useful for periodically re-optimizing the resource allocations, but new jobs also need to wait for the next allocation cycle to start. When there are enough resources available for the new job, it should be possible to immediately schedule it.

Possible implementation:

Change `sched/alloca
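The short-circuit idea in the excerpt above can be sketched roughly as follows. This is only an illustration under simplifying assumptions (a single GPU count as the resource, first-fit placement) and is not the project's actual scheduler code; `Job`, `Allocator`, `submit`, `tick`, and `finish` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int  # GPUs requested (hypothetical single-resource model)


class Allocator:
    """Toy allocator: periodic cycle plus an immediate fast path."""

    def __init__(self, total_gpus, interval=60.0):
        self.free_gpus = total_gpus
        self.interval = interval      # seconds between full allocation cycles
        self.next_cycle = interval    # time of the next full cycle
        self.pending = []             # jobs waiting for the next cycle
        self.running = {}             # name -> Job

    def _start(self, job):
        self.free_gpus -= job.gpus
        self.running[job.name] = job

    def submit(self, job, now):
        # Short-circuit: if enough resources are free, start right away
        # instead of waiting for the next allocation cycle.
        if job.gpus <= self.free_gpus:
            self._start(job)
            return True
        self.pending.append(job)
        return False

    def finish(self, name):
        # Return a finished job's GPUs to the free pool.
        job = self.running.pop(name)
        self.free_gpus += job.gpus

    def tick(self, now):
        # Periodic cycle: does nothing until the interval has elapsed.
        if now < self.next_cycle:
            return []
        self.next_cycle = now + self.interval
        started, waiting = [], []
        for job in self.pending:
            if job.gpus <= self.free_gpus:
                self._start(job)
                started.append(job.name)
            else:
                waiting.append(job)
        self.pending = waiting
        return started
```

In this sketch, a job that fits is started by `submit` at once, while a job that does not fit waits in `pending` until a later `tick` finds enough free GPUs. A real allocator would also re-optimize existing allocations during the periodic cycle, which this toy version omits.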

Here are 43 public repositories matching this topic...

PaddlePaddle / Paddle

rwightman / pytorch-image-models

Feature request : ghostnet

tensorflow / adanet

Allow one to forward features to predictions

bytedance / byteps

How did you get the horovod & bytePS performance

tensorlayer / hyperpose

determined-ai / determined

learning-at-home / hivemind

awslabs / deeplearning-cfn

wenwei202 / terngrad

dougsouza / pytorch-sync-batchnorm-example

lsds / KungFu

maudzung / YOLO3D-YOLOv4-PyTorch

synxlin / deep-gradient-compression

awslabs / dynamic-training-with-apache-mxnet-on-aws

bryanyzhu / Video-Tutorial-CVPR2020

bindog / pytorch-model-parallel

Accenture / mercury

bytedance / ps-lite

petuum / adaptdl

Short-circuit allocation when new job is immediately schedulable

Add BERT training/fine-tuning example

Support new torchtext data loading

Azure / DistributedDeepLearning

aws-samples / TensorFlow-in-SageMaker-workshop

graykode / horovod-ansible

richardkxu / distributed-pytorch

jiankaiwang / distributed_training

Shenggan / DeepCell-Keras

hysts / pytorch_yolov3

ZJU-OpenKS / OpenKS

asprenger / distributed-training-patterns

erfannoury / cifar-tf

valayDave / metaflow-kube-demo
