language-model

Here are 710 public repositories matching this topic...

transformers
stas00
stas00 commented Jun 12, 2021

Let's use this Issue to track performance issues and enhancement requests, so it's easier to prioritize the work.

This is for PyTorch transformers.

I will also label it as a Good Difficult Issue in case someone is ready for the challenging but rewarding experience of figuring things out. If you do want to take on the challenge, comment in the corresponding Issue/PR that resonates with you s…

tokenizers
david-waterworth
david-waterworth commented Feb 27, 2021

The Split class accepts a SplitDelimiterBehavior, which is really useful. The Punctuation pre-tokenizer, however, always uses SplitDelimiterBehavior::Isolated (while Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).

impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.spl
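
To make the difference between delimiter behaviors concrete, here is a minimal standalone Rust sketch. It does not use the tokenizers crate: the Behavior enum and the split_with function are hypothetical names that only mirror four of the variant names of SplitDelimiterBehavior (Contiguous is omitted), so the real library's edge-case handling may differ.

```rust
/// Hypothetical stand-in mirroring some variant names of
/// tokenizers' SplitDelimiterBehavior; this is NOT the crate's own type.
#[derive(Clone, Copy, Debug)]
enum Behavior {
    Removed,
    Isolated,
    MergedWithPrevious,
    MergedWithNext,
}

/// Splits `text` at every char matching `is_delim`, handling each
/// delimiter according to `behavior`. Empty pieces are never emitted.
fn split_with<F: Fn(char) -> bool>(text: &str, is_delim: F, behavior: Behavior) -> Vec<String> {
    let mut pieces: Vec<String> = Vec::new();
    let mut current = String::new();
    for c in text.chars() {
        if !is_delim(c) {
            current.push(c);
            continue;
        }
        match behavior {
            // Drop the delimiter; just close the current piece.
            Behavior::Removed => {
                if !current.is_empty() {
                    pieces.push(std::mem::take(&mut current));
                }
            }
            // Close the current piece and emit the delimiter on its own.
            Behavior::Isolated => {
                if !current.is_empty() {
                    pieces.push(std::mem::take(&mut current));
                }
                pieces.push(c.to_string());
            }
            // Attach the delimiter to the end of the current piece.
            Behavior::MergedWithPrevious => {
                current.push(c);
                pieces.push(std::mem::take(&mut current));
            }
            // Close the current piece; the delimiter starts the next one.
            Behavior::MergedWithNext => {
                if !current.is_empty() {
                    pieces.push(std::mem::take(&mut current));
                }
                current.push(c);
            }
        }
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}

fn main() {
    // Treat ASCII punctuation as the delimiter set (an assumption for this demo).
    let is_punct = |c: char| c.is_ascii_punctuation();
    for behavior in [
        Behavior::Removed,
        Behavior::Isolated,
        Behavior::MergedWithPrevious,
        Behavior::MergedWithNext,
    ] {
        println!("{:?}: {:?}", behavior, split_with("Hello, world!", is_punct, behavior));
    }
}
```

With Isolated, "Hello, world!" splits into ["Hello", ",", " world", "!"]; with Removed, the punctuation disappears and only ["Hello", " world"] remain. Being able to pass the behavior in, instead of hard-coding Isolated, is essentially what the issue asks for.
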

haystack
brandenchan
brandenchan commented Jun 14, 2021

Many users in our community have been asking for an easier way to return the output of intermediate nodes. I can see this being very useful for debugging and also for qualitative evaluation.

I think this feature is worth adding, though its exact design is not yet clear.
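
Since the design is still open, here is one possible shape as a minimal Rust sketch. It assumes a pipeline that runs its nodes strictly in sequence and, when a debug flag is set, records each node's output together with the node's name. Pipeline, Node, run and the debug flag are hypothetical names for illustration only; this is not Haystack's API (Haystack itself is a Python library).

```rust
/// Hypothetical pipeline node: a name plus a function over a text payload.
struct Node {
    name: String,
    run: Box<dyn Fn(&str) -> String>,
}

/// Hypothetical pipeline that runs its nodes strictly in sequence.
struct Pipeline {
    nodes: Vec<Node>,
}

impl Pipeline {
    /// Runs every node in order and returns the final output. When `debug`
    /// is true, also returns (node name, node output) pairs in execution order.
    fn run(&self, input: &str, debug: bool) -> (String, Option<Vec<(String, String)>>) {
        let mut current = input.to_string();
        let mut trace = if debug { Some(Vec::new()) } else { None };
        for node in &self.nodes {
            current = (node.run)(current.as_str());
            // Copy this node's output into the trace only in debug mode.
            if let Some(steps) = trace.as_mut() {
                steps.push((node.name.clone(), current.clone()));
            }
        }
        (current, trace)
    }
}

fn main() {
    let pipeline = Pipeline {
        nodes: vec![
            Node { name: "Lowercase".into(), run: Box::new(|s: &str| s.to_lowercase()) },
            Node { name: "Trim".into(), run: Box::new(|s: &str| s.trim().to_string()) },
        ],
    };
    let (output, trace) = pipeline.run("  Hello World  ", true);
    println!("final output: {:?}", output);
    for (name, out) in trace.unwrap_or_default() {
        println!("{name}: {out:?}");
    }
}
```

Returning the trace as an Option keeps the normal (non-debug) path free of extra copies; a real implementation would also have to decide how to key outputs when nodes branch or run more than once.
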

Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)

  • Updated Jun 16, 2021
