huggingface / tokenizers
Pull requests: 6 Open, 113 Closed
Python - Make Tokenizer, parts and Encoding pickable
#273 opened May 19, 2020 by n1t0 • Draft • 0 of 1 tasks

Add Serialization
#272 opened May 19, 2020 by n1t0 • 0 of 4 tasks • 1 comment

Allow pre-tokenized inputs to encode/encode_batch
#249 opened Apr 25, 2020 by n1t0 • Approved • 0 of 2 tasks • 12 comments

Added Rust code to train from word count (not made available in Python)
#183 opened Mar 4, 2020 by chrhad

Expose train with word counts API
#90 opened Jan 19, 2020 by vilunov • 1 comment

TokenizerBuilder
#27 opened Jan 2, 2020 by epwalsh • 4 comments