# bert

Here are 1,608 public repositories matching this topic...

transformers
stas00
stas00 commented Jun 22, 2021

huggingface/transformers#12276 introduced a new --log_level feature, which now allows users to set their desired log level via CLI or TrainingArguments.

run_translation.py was used as a "model" for other examples.

Now we need to replicate this in all the other Trainer-based examples under examples/pytorch/. The 3 changes are

  1. importing datasets
  2. using `training
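The CLI-to-logger flow behind a `--log_level` flag can be sketched with the standard library. This is an illustrative sketch only: the real feature is wired through `TrainingArguments`, and the `parse_log_level` helper below is a hypothetical name, not part of transformers.

```python
import argparse
import logging

# Map the string choices a user can pass on the CLI to logging constants.
# (Hypothetical helper; transformers routes --log_level via TrainingArguments.)
LOG_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}

def parse_log_level(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--log_level", choices=LOG_LEVELS, default="warning")
    args = parser.parse_args(argv)
    return LOG_LEVELS[args.log_level]

# Configure the root logger from the parsed flag.
level = parse_log_level(["--log_level", "debug"])
logging.basicConfig(level=level)
```

The point is simply that the desired level is parsed once and then applied globally, which is what each example script needs to replicate.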
tokenizers
david-waterworth
david-waterworth commented Feb 27, 2021

The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).

```rust
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.spl
```
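To make the distinction between the two behaviors concrete, here is a plain-Python sketch of their semantics. This is not the tokenizers API (the real behavior lives in the Rust crate); it just shows what Isolated vs Removed means for the delimiter characters.

```python
import string

def split_with_behavior(text, is_delim, behavior):
    """Sketch of SplitDelimiterBehavior semantics (not the tokenizers API):
    split `text` wherever `is_delim(ch)` is true.
    - "isolated": each delimiter becomes its own piece.
    - "removed": delimiters are dropped entirely.
    """
    pieces, current = [], ""
    for ch in text:
        if is_delim(ch):
            if current:
                pieces.append(current)
                current = ""
            if behavior == "isolated":  # keep the delimiter as its own token
                pieces.append(ch)
            # "removed": drop the delimiter
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces

is_punct = lambda ch: ch in string.punctuation
split_with_behavior("hello, world!", is_punct, "isolated")
# -> ['hello', ',', ' world', '!']
split_with_behavior("hello, world!", is_punct, "removed")
# -> ['hello', ' world']
```

The request in the issue amounts to letting Punctuation take the behavior as a parameter, the way Split already does, instead of hard-coding the "isolated" branch.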
haystack
brandenchan
brandenchan commented Jun 14, 2021

Many users in our community have been asking for easier ways to return the output of intermediate nodes. I can see that this could be very useful for debugging and also for qualitative evaluation.

I think this feature would be very useful, though the exact design is not yet fully clear.
