language-model
Here are 665 public repositories matching this topic...
The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed):

```rust
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        // The behavior is hardcoded to Isolated rather than being configurable.
        // (The punctuation predicate name is reconstructed from context; the
        // original snippet was truncated at `s.spl`.)
        pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
    }
}
```
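To make the difference concrete, here is a minimal pure-Python sketch of what the two SplitDelimiterBehavior variants mean (the function name and the string flags "isolated" / "removed" are illustrative, not the library's API): "removed" drops the matched delimiter, while "isolated" keeps it as its own piece.

```python
import re

def split_with_behavior(text, delimiter_pattern, behavior):
    """Illustrative sketch of SplitDelimiterBehavior semantics:
    'removed'  -> delimiters are dropped from the output
    'isolated' -> each delimiter becomes its own piece"""
    pieces = []
    last = 0
    for m in re.finditer(delimiter_pattern, text):
        if m.start() > last:
            pieces.append(text[last:m.start()])
        if behavior == "isolated":
            pieces.append(m.group())  # keep the delimiter as a standalone piece
        last = m.end()
    if last < len(text):
        pieces.append(text[last:])
    return pieces

# Punctuation-style split (Isolated): punctuation survives as its own token.
print(split_with_behavior("Hello, world!", r"[,!]", "isolated"))
# -> ['Hello', ',', ' world', '!']

# Whitespace-style split (Removed): the delimiter disappears.
print(split_with_behavior("Hello world", r"\s+", "removed"))
# -> ['Hello', 'world']
```

Exposing the behavior as a constructor argument on Punctuation, the way Split already does, would make the pre-tokenizer flexible without changing its default.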
chooses 15% of tokens
The paper says:
"Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy."
This reads as if exactly 15% of the tokens are chosen.
But in https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68,
every single token independently has a 15% chance of going through the follow-up masking procedure.
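The two readings really do differ, as this small sketch shows (function names are mine, not from either codebase): sampling exactly 15% of positions always selects the same count, while an independent 15% coin flip per token only matches that count in expectation.

```python
import random

def choose_exact(tokens, ratio=0.15):
    """Reading 1 (the paper's wording): pick exactly ratio * len(tokens)
    positions at random."""
    k = int(len(tokens) * ratio)
    return set(random.sample(range(len(tokens)), k))

def choose_bernoulli(tokens, ratio=0.15):
    """Reading 2 (what dataset.py does): each token independently has a
    `ratio` chance of entering the masking procedure."""
    return {i for i in range(len(tokens)) if random.random() < ratio}

random.seed(0)
toks = ["tok"] * 10000
print(len(choose_exact(toks)))      # always exactly 1500
print(len(choose_bernoulli(toks)))  # roughly 1500, but varies from call to call
```

For long training corpora the difference washes out on average, but per sequence the Bernoulli version can select noticeably more or fewer than 15% of tokens.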
PositionalEmbedding
Create a suite of tools to easily manipulate SQuAD-format data. It would be useful to have tools for things such as merging annotations, converting SQuAD format to a pandas DataFrame and vice versa, and easier functions to remove samples / paragraphs / annotations.
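As a starting point, flattening the nested SQuAD JSON into one row per question-answer pair is the core of the SQuAD-to-DataFrame direction. A minimal sketch (the function and column names here are assumptions, not an existing API — the flat rows can be fed straight into a DataFrame constructor):

```python
def squad_to_rows(squad):
    """Flatten SQuAD-format data (data -> paragraphs -> qas -> answers)
    into one flat dict per QA/answer pair."""
    rows = []
    for article in squad["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                # Unanswerable questions may have an empty answers list;
                # emit a single row with empty answer fields in that case.
                for ans in qa.get("answers") or [{}]:
                    rows.append({
                        "title": article.get("title", ""),
                        "context": para["context"],
                        "id": qa["id"],
                        "question": qa["question"],
                        "answer_text": ans.get("text", ""),
                        "answer_start": ans.get("answer_start", -1),
                    })
    return rows

sample = {"data": [{"title": "t", "paragraphs": [{
    "context": "My dog is hairy.",
    "qas": [{"id": "q1", "question": "What is hairy?",
             "answers": [{"text": "My dog", "answer_start": 0}]}]}]}]}

rows = squad_to_rows(sample)
print(rows[0]["answer_text"])  # -> My dog
```

The reverse direction (rows back to SQuAD JSON) is mostly a group-by on title and context, which is why a round-trippable pair of helpers would make merging and filtering annotations much easier.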
Hi, I am interested in using the DeBERTa model that was recently implemented here and incorporating it into FARM so that it can also be used in open-domain QA settings through Haystack.
I am just wondering why there is only a slow tokenizer implemented for DeBERTa, and whether there are plans to create the fast tokenizer as well.