language-model
Here are 819 public repositories matching this topic...
chooses 15% of tokens
The paper states: "Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy."
This reads as if exactly 15% of the tokens are guaranteed to be chosen.
However, in https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, each token independently has a 15% chance of going through the follow-up procedure, so the actual fraction of selected tokens varies around 15%.
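A minimal sketch of that per-token procedure (the names and vocabulary here are illustrative, not the repo's actual code): each token is independently selected with probability 0.15, and a selected token becomes [MASK] 80% of the time, a random token 10% of the time, and stays unchanged 10% of the time.

```python
import random

MASK = "[MASK]"
VOCAB = ["my", "dog", "is", "hairy", "cat", "runs"]  # toy vocabulary

def mask_tokens(tokens, p=0.15, seed=None):
    """Per-token masking in the style of BERT-pytorch's dataset.py:
    every token is *independently* selected with probability p
    (not a fixed 15% count per sentence)."""
    rng = random.Random(seed)
    out, labels = [], []
    for tok in tokens:
        if rng.random() < p:              # selected for prediction
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                out.append(MASK)          # 80%: replace with [MASK]
            elif r < 0.9:
                out.append(rng.choice(VOCAB))  # 10%: random token
            else:
                out.append(tok)           # 10%: keep the original token
        else:
            out.append(tok)
            labels.append(None)           # not predicted
    return out, labels
```

Over a long input, the fraction of selected tokens converges to 15%, but for any single short sentence it can be 0%, 15%, or more — which is exactly the difference from "chooses 15% of tokens for sure."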
PositionalEmbedding
_handle_duplicate_documents and _drop_duplicate_documents in the ElasticSearch document store will always report self.index as the index with the conflict, which is obviously incorrect.
Edit: Upon further investigation, this is actually a lot worse. Using multiple indices with the ElasticSearch DocumentStore is completely broken, because self.index is used in `_handle_duplicate_documents`.
Issue to track tutorial requests:
- Deep Learning with PyTorch: A 60 Minute Blitz - #69
- Sentence Classification - #79
Fast Tokenizer for DeBERTA-V3 and mDeBERTa-V3
Motivation
DeBERTa V3 is an improved version of DeBERTa. With the V3 version, the authors also released a multilingual model "mDeBERTa-base" that outperforms XLM-R-base. However, DeBERTa V3 currently lacks a FastTokenizer implementation, which makes it impossible to use with some of the example scripts (they require a FastTokenizer).