An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
Updated Feb 25, 2022 - Python
As with the other notes, we need "tl;dr" notes for the GPT-3 paper. Please use the TEMPLATE.md format and follow the instructions in README.md.
Describe the bug
Setting "text-gen-type": "interactive" results in an IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [4], [3]. Other generation types work.
To Reproduce
Steps to reproduce the behavior: