language-model
Here are 665 public repositories matching this topic...
The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed):

```rust
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        // The behavior is hardcoded to Isolated rather than being configurable.
        // (The punctuation predicate name is reconstructed from context; the
        // original snippet was truncated at `s.spl`.)
        pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
    }
}
```
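To make the difference concrete, here is a minimal pure-Python sketch of what the two SplitDelimiterBehavior variants mean (the function name and the string flags "isolated" / "removed" are illustrative, not the library's API): "removed" drops the matched delimiter, while "isolated" keeps it as its own piece.

```python
import re

def split_with_behavior(text, delimiter_pattern, behavior):
    """Illustrative sketch of SplitDelimiterBehavior semantics:
    'removed'  -> delimiters are dropped from the output
    'isolated' -> each delimiter becomes its own piece"""
    pieces = []
    last = 0
    for m in re.finditer(delimiter_pattern, text):
        if m.start() > last:
            pieces.append(text[last:m.start()])
        if behavior == "isolated":
            pieces.append(m.group())  # keep the delimiter as a standalone piece
        last = m.end()
    if last < len(text):
        pieces.append(text[last:])
    return pieces

# Punctuation-style split (Isolated): punctuation survives as its own token.
print(split_with_behavior("Hello, world!", r"[,!]", "isolated"))
# -> ['Hello', ',', ' world', '!']

# Whitespace-style split (Removed): the delimiter disappears.
print(split_with_behavior("Hello world", r"\s+", "removed"))
# -> ['Hello', 'world']
```

Exposing the behavior as a constructor argument on Punctuation, the way Split already does, would make the pre-tokenizer flexible without changing its default.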
chooses 15% of tokens
The paper says:
"Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy."
This reads as if exactly 15% of the tokens are chosen.
But in https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68,
every single token independently has a 15% chance of going through the follow-up masking procedure.
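The two readings really do differ, as this small sketch shows (function names are mine, not from either codebase): sampling exactly 15% of positions always selects the same count, while an independent 15% coin flip per token only matches that count in expectation.

```python
import random

def choose_exact(tokens, ratio=0.15):
    """Reading 1 (the paper's wording): pick exactly ratio * len(tokens)
    positions at random."""
    k = int(len(tokens) * ratio)
    return set(random.sample(range(len(tokens)), k))

def choose_bernoulli(tokens, ratio=0.15):
    """Reading 2 (what dataset.py does): each token independently has a
    `ratio` chance of entering the masking procedure."""
    return {i for i in range(len(tokens)) if random.random() < ratio}

random.seed(0)
toks = ["tok"] * 10000
print(len(choose_exact(toks)))      # always exactly 1500
print(len(choose_bernoulli(toks)))  # roughly 1500, but varies from call to call
```

For long training corpora the difference washes out on average, but per sequence the Bernoulli version can select noticeably more or fewer than 15% of tokens.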
PositionalEmbedding
Create a suite of tools to easily manipulate SQuAD-format data. It would be useful to have tools for things such as merging annotations, converting SQuAD format to a pandas DataFrame and vice versa, and easier functions to remove samples / paragraphs / annotations.
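As a starting point, flattening the nested SQuAD JSON into one row per question-answer pair is the core of the SQuAD-to-DataFrame direction. A minimal sketch (the function and column names here are assumptions, not an existing API — the flat rows can be fed straight into a DataFrame constructor):

```python
def squad_to_rows(squad):
    """Flatten SQuAD-format data (data -> paragraphs -> qas -> answers)
    into one flat dict per QA/answer pair."""
    rows = []
    for article in squad["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                # Unanswerable questions may have an empty answers list;
                # emit a single row with empty answer fields in that case.
                for ans in qa.get("answers") or [{}]:
                    rows.append({
                        "title": article.get("title", ""),
                        "context": para["context"],
                        "id": qa["id"],
                        "question": qa["question"],
                        "answer_text": ans.get("text", ""),
                        "answer_start": ans.get("answer_start", -1),
                    })
    return rows

sample = {"data": [{"title": "t", "paragraphs": [{
    "context": "My dog is hairy.",
    "qas": [{"id": "q1", "question": "What is hairy?",
             "answers": [{"text": "My dog", "answer_start": 0}]}]}]}]}

rows = squad_to_rows(sample)
print(rows[0]["answer_text"])  # -> My dog
```

The reverse direction (rows back to SQuAD JSON) is mostly a group-by on title and context, which is why a round-trippable pair of helpers would make merging and filtering annotations much easier.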
Hi, I am interested in using the DeBERTa model that was recently implemented here and incorporating it into FARM so that it can also be used in open-domain QA settings through Haystack.
I am just wondering why there is only a slow tokenizer implemented for DeBERTa, and whether there are plans to create the fast tokenizer as well.