bert

Here are 1,267 public repositories matching this topic...

transformers

sgugger commented Jan 22, 2021

To get the full speed-up of FP16 training, every tensor passed through the model should have all of its dimensions be a multiple of 8. In the new PyTorch examples, when using dynamic padding, the tensors are padded to the length of the longest sentence in the batch, but that length is not necessarily a multiple of 8.

The examples should be improved to pass along the option pad_to_multiple_of=8.
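The padding behavior described above can be sketched in plain Python. This is a minimal, hypothetical helper (not the transformers implementation, which exposes this via the pad_to_multiple_of argument of its data collators): it pads every sequence in a batch to the length of the longest one, rounded up to the next multiple of 8.

```python
import math

def pad_batch(batch, pad_id=0, multiple=8):
    """Pad token-id sequences to the longest sequence in the batch,
    with the target length rounded up to a multiple of `multiple`
    so FP16 tensor cores can run at full speed."""
    longest = max(len(seq) for seq in batch)
    # Round the batch's max length up to the next multiple of 8.
    target = math.ceil(longest / multiple) * multiple
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]

# A batch whose longest sequence has 10 tokens is padded to 16, not 10.
padded = pad_batch([[1, 2, 3], list(range(10))])
```

With dynamic padding alone, this batch would be padded to length 10; rounding up to 16 keeps every tensor dimension a multiple of 8, which is the condition the comment describes for full FP16 throughput.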

haystack
