bert
Here are 1,606 public repositories matching this topic...
The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (while Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed):
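    impl PreTokenizer for Punctuation {
        fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
            // The delimiter behavior is hard-coded here rather than configurable.
            pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
        }
    }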
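The sketch below makes the difference between delimiter behaviors concrete on plain strings. It does not use the tokenizers crate. The `Behavior` enum and the `split_with` and `is_punct` functions are hypothetical stand-ins that mirror four variants of `SplitDelimiterBehavior`: `Removed`, `Isolated`, `MergedWithPrevious` and `MergedWithNext`. Punctuation is assumed to mean ASCII punctuation only.

```rust
/// Hypothetical mirror of the delimiter behaviors discussed above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Behavior {
    Removed,
    Isolated,
    MergedWithPrevious,
    MergedWithNext,
}

/// Split `s` on characters matching `is_delim`, handling each delimiter
/// according to `behavior`. Empty pieces are dropped.
pub fn split_with(s: &str, is_delim: impl Fn(char) -> bool, behavior: Behavior) -> Vec<String> {
    let mut pieces: Vec<String> = Vec::new();
    let mut current = String::new();
    for c in s.chars() {
        if !is_delim(c) {
            current.push(c);
            continue;
        }
        match behavior {
            Behavior::Removed => {
                // Drop the delimiter; it only ends the current piece.
                pieces.push(std::mem::take(&mut current));
            }
            Behavior::Isolated => {
                // The delimiter becomes a piece of its own.
                pieces.push(std::mem::take(&mut current));
                pieces.push(c.to_string());
            }
            Behavior::MergedWithPrevious => {
                // The delimiter closes the current piece.
                current.push(c);
                pieces.push(std::mem::take(&mut current));
            }
            Behavior::MergedWithNext => {
                // The delimiter opens the next piece.
                pieces.push(std::mem::take(&mut current));
                current.push(c);
            }
        }
    }
    pieces.push(current);
    pieces.retain(|p| !p.is_empty());
    pieces
}

/// ASCII-only punctuation test (assumption: a real pre-tokenizer may also
/// treat Unicode punctuation as a delimiter).
pub fn is_punct(c: char) -> bool {
    c.is_ascii_punctuation()
}
```

`Behavior::Isolated` reproduces what Punctuation does today. If punctuation were allowed to take a behavior parameter the way Split does, callers could instead choose `Removed` or one of the merge variants.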
Chooses 15% of tokens
The paper states:
"Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy."
This means that exactly 15% of the tokens will be chosen.
However, in https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, each token independently has a 15% chance of going through the follow-up procedure, so the number of chosen tokens only averages 15% and varies from sentence to sentence.
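The two readings can be contrasted in a short sketch. This is not the BERT-pytorch code.

- `select_exact` draws exactly round(15% × n) distinct positions using a partial Fisher-Yates shuffle.
- `select_bernoulli` flips an independent 15% coin per token, so its count fluctuates around 15%.

A tiny xorshift generator stands in for a real RNG to keep the example dependency-free. All names are illustrative.

```rust
/// Minimal xorshift64 PRNG (illustration only, not for production use).
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // Scramble the seed and force it odd so the state is never zero.
        XorShift64(seed.wrapping_mul(0x9E37_79B9_7F4A_7C15) | 1)
    }
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Uniform float in [0, 1) built from the top 53 bits.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
    /// Integer in [0, bound); modulo bias is ignored for simplicity.
    pub fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Paper reading: pick exactly round(n * rate) distinct positions.
pub fn select_exact(n: usize, rate: f64, rng: &mut XorShift64) -> Vec<usize> {
    let k = (((n as f64) * rate).round() as usize).min(n);
    let mut idx: Vec<usize> = (0..n).collect();
    // Partial Fisher-Yates: the first k slots become a uniform random sample.
    for i in 0..k {
        let j = i + rng.below(n - i);
        idx.swap(i, j);
    }
    let mut chosen = idx[..k].to_vec();
    chosen.sort_unstable();
    chosen
}

/// Per-token reading: each position is chosen independently with probability `rate`.
pub fn select_bernoulli(n: usize, rate: f64, rng: &mut XorShift64) -> Vec<usize> {
    (0..n).filter(|_| rng.next_f64() < rate).collect()
}
```

With n = 100, `select_exact` always returns 15 positions. `select_bernoulli` returns 15 only on average, which is the discrepancy raised above.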
PositionalEmbedding
Many users in our community have been asking for an easier way to return the outputs of intermediate nodes. This would be very useful for debugging and also for qualitative evaluation.
I think this feature would be very useful, though the exact design is not yet fully clear.
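One possible shape for such a feature is sketched below under stated assumptions. A hypothetical `Pipeline` of named nodes records each node's output alongside the final result, but only when `debug` is enabled. The names `Pipeline`, `Node`, `run` and `RunOutput` are illustrative and do not correspond to any existing API.

```rust
/// A named processing step (hypothetical).
pub struct Node {
    pub name: String,
    pub f: Box<dyn Fn(f64) -> f64>,
}

/// Final output plus, optionally, every intermediate output in execution order.
pub struct RunOutput {
    pub result: f64,
    pub intermediates: Vec<(String, f64)>,
}

pub struct Pipeline {
    pub nodes: Vec<Node>,
}

impl Pipeline {
    pub fn run(&self, input: f64, debug: bool) -> RunOutput {
        let mut value = input;
        let mut intermediates = Vec::new();
        for node in &self.nodes {
            value = (node.f)(value);
            if debug {
                // Only pay the recording cost when the caller asks for it.
                intermediates.push((node.name.clone(), value));
            }
        }
        RunOutput { result: value, intermediates }
    }
}
```

An opt-in flag keeps the default path unchanged. Keying the recorded outputs by node name makes them easy to inspect during debugging or qualitative evaluation.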
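Changing the pretrained model download location
I suggest adding a note to the README explaining how to change the environment variable that sets the pretrained model path.
I previously ran out of disk space, and this was exactly the cause.
But not everyone will debug the source code to find out which system variable the framework reads.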
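The lookup that such a README note would describe can be sketched as follows. The variable name `MODEL_CACHE_DIR` and the default path are hypothetical placeholders, since the issue does not say which variable the framework reads. The environment lookup is passed in as a closure so the logic can be tested without touching the process environment. In real use it would be `|k| std::env::var(k).ok()`.

```rust
use std::path::PathBuf;

/// Hypothetical environment variable that overrides the download location.
pub const CACHE_ENV_VAR: &str = "MODEL_CACHE_DIR";

/// Resolve where pretrained models are stored: the environment variable wins
/// if it is set and non-blank, otherwise fall back to `default_dir`.
pub fn resolve_cache_dir(lookup: impl Fn(&str) -> Option<String>, default_dir: &str) -> PathBuf {
    match lookup(CACHE_ENV_VAR) {
        Some(dir) if !dir.trim().is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(default_dir),
    }
}
```

Documenting this single override point would let users point downloads at a disk with enough space without reading the source.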
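Training dataset question
Hello, judging from the code, the training data is Restaurants_Train.xml.seg. Where can this file be downloaded, or was it generated from the XML files of SemEval-2014 Task 4? If it was generated afterwards, is the data-generation code available?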
Add better error message to HubertForCTC, Wav2Vec2ForCTC if labels are bigger than vocab size
Motivation
Following this issue: huggingface/transformers#12264, it is clear that an error should be thrown if any of the labels are >= self.config.vocab_size (valid label ids run from 0 to vocab_size - 1); otherwise silent errors can sneak into the training script.
So w
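A minimal sketch of the requested check follows. It is in the same spirit as the request but is not the transformers implementation. Every non-padding label must be a valid vocabulary index, i.e. lie in `0..vocab_size`. The function name `validate_ctc_labels` is hypothetical. Skipping negative values (such as the common `-100` padding value) as ignored positions is an assumption.

```rust
/// Return an error naming the first out-of-range label, so a bad label
/// fails loudly instead of silently corrupting the CTC loss.
pub fn validate_ctc_labels(labels: &[i64], vocab_size: usize) -> Result<(), String> {
    for (pos, &label) in labels.iter().enumerate() {
        // Negative values are treated as padding and skipped (assumption).
        if label >= 0 && label as u64 >= vocab_size as u64 {
            return Err(format!(
                "label {} at position {} is out of range: label values must be < vocab_size ({})",
                label, pos, vocab_size
            ));
        }
    }
    Ok(())
}
```

Reporting the offending value and its position makes the mismatch between the tokenizer vocabulary and the model config easy to spot.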