# bert

Here are 1,608 public repositories matching this topic...

transformers
stas00
stas00 commented Jun 22, 2021

huggingface/transformers#12276 introduced a new --log_level feature, which now allows users to set their desired log level via CLI or TrainingArguments.

run_translation.py was used as a "model" for other examples.

Now we need to replicate this in all the other Trainer-based examples under examples/pytorch/. The 3 changes are

  1. importing datasets
  2. using `training
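The CLI-to-logger flow behind a `--log_level` flag can be sketched with the standard library. This is an illustrative sketch only: the real feature is wired through `TrainingArguments`, and the `parse_log_level` helper below is a hypothetical name, not part of transformers.

```python
import argparse
import logging

# Map the string choices a user can pass on the CLI to logging constants.
# (Hypothetical helper; transformers routes --log_level via TrainingArguments.)
LOG_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}

def parse_log_level(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--log_level", choices=LOG_LEVELS, default="warning")
    args = parser.parse_args(argv)
    return LOG_LEVELS[args.log_level]

# Configure the root logger from the parsed flag.
level = parse_log_level(["--log_level", "debug"])
logging.basicConfig(level=level)
```

The point is simply that the desired level is parsed once and then applied globally, which is what each example script needs to replicate.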
tokenizers
david-waterworth
david-waterworth commented Feb 27, 2021

The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).

```rust
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.spl
```
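To make the distinction between the two behaviors concrete, here is a plain-Python sketch of their semantics. This is not the tokenizers API (the real behavior lives in the Rust crate); it just shows what Isolated vs Removed means for the delimiter characters.

```python
import string

def split_with_behavior(text, is_delim, behavior):
    """Sketch of SplitDelimiterBehavior semantics (not the tokenizers API):
    split `text` wherever `is_delim(ch)` is true.
    - "isolated": each delimiter becomes its own piece.
    - "removed": delimiters are dropped entirely.
    """
    pieces, current = [], ""
    for ch in text:
        if is_delim(ch):
            if current:
                pieces.append(current)
                current = ""
            if behavior == "isolated":  # keep the delimiter as its own token
                pieces.append(ch)
            # "removed": drop the delimiter
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces

is_punct = lambda ch: ch in string.punctuation
split_with_behavior("hello, world!", is_punct, "isolated")
# -> ['hello', ',', ' world', '!']
split_with_behavior("hello, world!", is_punct, "removed")
# -> ['hello', ' world']
```

The request in the issue amounts to letting Punctuation take the behavior as a parameter, the way Split already does, instead of hard-coding the "isolated" branch.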
haystack
brandenchan
brandenchan commented Jun 14, 2021

Many users in our community have been asking for easier ways to return the output of intermediate nodes. I can see that this could be very useful for debugging and also for qualitative evaluation.

I think this feature would be very useful, though the exact design is not yet fully clear.
