Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
-
Updated
Sep 5, 2022 - Python
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Unsupervised text tokenizer focused on computational efficiency
Explains nlp building blocks in a simple manner.
Fast and customizable text tokenization library with BPE and SentencePiece support
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Machine Learning for Phishing Website Detection
Learning BPE embeddings by first learning a segmentation model and then training word2vec
High performance unsupervised text tokenization for Ruby
Sentiment-based classification for stock article title using PhoBert
Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)
R package for Byte Pair Encoding based on YouTokenToMe
GPT3 encoder & decoder tool written in Swift
Natural Language EnCoder-Decoder: word, char, bpe etc
Byte Pair Encoding (BPE)
BPE (Byte-Pair Encoding) Encoder Decoder for OpenAI's GPT-2 / GPT-3 Implemented In Pure PHP, Zero Dependency, Multi Byte Supported.
Auto summarization from BPE tokenization
Golang BPE (Bytes Pair Encoding) algorithm implementation.
Add a description, image, and links to the bpe topic page so that developers can more easily learn about it.
To associate your repository with the bpe topic, visit your repo's landing page and select "manage topics."