#

bert

Here are 1,606 public repositories matching this topic...

transformers
tokenizers
david-waterworth commented Feb 27, 2021

The Split class accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).

impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.spl
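
To make the difference between the two behaviors concrete, here is a minimal, std-only sketch. It is not the tokenizers crate's implementation: the `DelimiterBehavior` enum and the `split_on_punctuation` function are hypothetical names introduced only for illustration, and "punctuation" is assumed to mean ASCII punctuation.

```rust
// Illustrative sketch only; these names are hypothetical and are not
// part of the tokenizers crate.
#[derive(Clone, Copy)]
enum DelimiterBehavior {
    /// Drop the delimiter from the output.
    Removed,
    /// Emit the delimiter as its own piece.
    Isolated,
}

/// Split `text` on ASCII punctuation, handling each delimiter
/// according to `behavior`. Empty pieces are never emitted.
fn split_on_punctuation(text: &str, behavior: DelimiterBehavior) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        if ch.is_ascii_punctuation() {
            // Flush whatever was accumulated before the delimiter.
            if !current.is_empty() {
                pieces.push(std::mem::take(&mut current));
            }
            // Only the Isolated behavior keeps the delimiter.
            if let DelimiterBehavior::Isolated = behavior {
                pieces.push(ch.to_string());
            }
        } else {
            current.push(ch);
        }
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}

fn main() {
    let text = "Hello,world!";
    // prints ["Hello", ",", "world", "!"]
    println!("{:?}", split_on_punctuation(text, DelimiterBehavior::Isolated));
    // prints ["Hello", "world"]
    println!("{:?}", split_on_punctuation(text, DelimiterBehavior::Removed));
}
```

Passing the behavior as a parameter, as Split does, is what would let a caller choose between these two outputs instead of having one of them hard-coded.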
haystack
brandenchan commented Jun 14, 2021

Many users in our community have been asking for easier ways to return the output of intermediate nodes. I can see that this could be very useful for debugging and for qualitative evaluation.

I think this feature would be very useful, though the exact design is not yet fully clear.
