Issues: huggingface/tokenizers
CVE-2023-22466 vuln of the component tokio which is used to compiled in tokenizers.
#1184
opened Mar 20, 2023 by
wqh17101
NodeJS: Can't initialize BertWordPiece using fromFile or init from vocabFile
#1180
opened Mar 15, 2023 by
douglasqian
Feature request: list-like behaviour for Sequence
good first issue
#1175
opened Mar 8, 2023 by
davidgilbertson
GPT-2 tokeniser's decoder is incorrect and doesn't roundtrip
#1164
opened Feb 21, 2023 by
hauntsaninja
text is listed as optional in api but tokenizer throws error when missing
#1159
opened Feb 4, 2023 by
epinnock
Is there any support for 'google/tapas-mini-finetuned-wtq' tokenizer?
#1114
opened Nov 26, 2022 by
memetrusidovski
Adding treat_whitespace_as_suffix as a new feature to sentencepiece?
#1112
opened Nov 20, 2022 by
Smu-Tan
How can I keep the initial input vocab and incremental add the new tokens during re-training a tokenizer?
#1109
opened Nov 17, 2022 by
henryxiao1997
tokenizers 0.13.2 does not compile when default features are turned off
#1104
opened Nov 10, 2022 by
jneuff
Difference between PreTrainedTokenizerFast Python and Node SentencePieceBPETokenizer
#1100
opened Nov 8, 2022 by
loretoparisi
Can not get package to build with Python 3.11 on a minimal linux environment
#1092
opened Nov 1, 2022 by
ZetiMente