Pull requests: huggingface/tokenizers
feat: add progress_format option for machine-readable JSON output
#1921
opened Dec 26, 2025 by
podarok
6 tasks done
Use unicode-normalization instead of unicode-normalization-alignments
#1912
opened Dec 14, 2025 by
IvanIsCoding
Providing byte level offsets for effective alignment in Cross-Tokenizer On-Policy Distillation
Feature Request
#1880
opened Oct 30, 2025 by
JqzChandler
Add a multithreaded tokenizer test as well as 3.14 and 3.14t CI
#1864
opened Sep 12, 2025 by
ngoldbaum
feat: allow BPETrainer to be seeded with a set of initial tokens
#1862
opened Sep 6, 2025 by
henrycharlesworth
Fix unsigned integer underflow issue with truncation
#1859
opened Sep 1, 2025 by
maxdebayser
Adding multiprocessing for sentencepiece_extractor
#1804
opened Jun 19, 2025 by
AamodThakur
Expose Encoding attributes via the buffer protocol interface
#1789
opened Jun 4, 2025 by
mariosasko
Pre-tokenizers that support multi-word/non-whitespace BPE in single pass
#1753
opened Mar 22, 2025 by
mjbommar