word-segmentation
Here are 82 public repositories matching this topic...
Almost all models we use now (see the list in #298) are trained privately by different contributors, with code in notebooks or scripts that may be private, or open source but difficult to follow.
To make PyThaiNLP more transparent and more customizable by users, we should try to put training scripts or instructions (pointers are fine) somewhere in the repo.
Known scripts/notebooks and data
For Juman++ to be widely usable, we want a documented and stable C API and the option of a dynamically linked library.
That library should probably use -fvisibility=hidden with explicit visibility attributes on exported symbols on Unix, and __declspec(dllimport/dllexport) on Windows.
The minimal API should cover:
- Loading a model using a config file
- Analyzing a sentence
- Accessing
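One payoff of a stable, dynamically linked C API is that any language with a C FFI can drive Juman++ directly. The sketch below illustrates the consumption side from Python via ctypes. The `jumanpp_*` names are hypothetical (the issue only lists the desired capabilities, not function signatures); as a runnable stand-in, the example calls libc, which exports a stable C ABI in exactly the way the proposed library would.

```python
# Minimal sketch, assuming a hypothetical libjumanpp.so with a C ABI.
# As a runnable stand-in we call libc, which any Unix Python can load.
import ctypes

libc = ctypes.CDLL(None)  # on Unix, exposes the process's libc symbols
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
assert libc.strlen(b"jumanpp") == 7  # calling an exported C symbol works

# With the proposed library, usage would look analogous (all names hypothetical):
# jpp = ctypes.CDLL("libjumanpp.so")
# model = jpp.jumanpp_load(b"jumanpp.conf")       # load a model via a config file
# rc = jpp.jumanpp_analyze(model, "...".encode()) # analyze a sentence
```

Keeping non-exported symbols hidden (via -fvisibility=hidden) matters here: it shrinks the dynamic symbol table and guarantees that FFI consumers can only reach the documented entry points.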
It would be worth providing a tutorial on how to train a simple cross-language classification model using sentencepiece. Given a training set and a chosen model (say, a simple Word2Vec plus softmax, an LSTM, etc.), how do you use the trained sentencepiece model (vocabulary/codes) to feed that model for training and inference?
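The pipeline being asked about can be sketched as: the sentencepiece model turns raw text into subword ids, and those ids (padded to a fixed length) become the input sequence for whatever classifier you chose. The snippet below is a self-contained toy illustration: with the `sentencepiece` package installed, the encoding step would be `sp = spm.SentencePieceProcessor(model_file="xx.model")` followed by `sp.encode(text, out_type=int)`; here a greedy longest-match stand-in over a hand-made vocabulary is used so the sketch runs without the library, and the vocabulary and id values are invented for illustration.

```python
# Toy sketch of text -> subword ids -> fixed-length classifier input.
# toy_encode stands in for sentencepiece's encode(); the vocab is invented.

def toy_encode(text, vocab):
    """Stand-in encoder: greedy longest-match over a subword vocabulary."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            ids.append(vocab["<unk>"])  # no piece matched: unknown id
            i += 1
    return ids

def pad(ids, max_len, pad_id=0):
    """Truncate/pad so every example in a batch has the same length."""
    return (ids[:max_len] + [pad_id] * max_len)[:max_len]

vocab = {"<unk>": 1, "seg": 2, "ment": 3, "ation": 4, "word": 5, " ": 6}
ids = toy_encode("word segmentation", vocab)   # -> [5, 6, 2, 3, 4]
batch = [pad(ids, 8)]                          # ready for an embedding layer
```

From here, training and inference both run text through the same sentencepiece model, so the classifier's embedding table is indexed by a shared subword vocabulary; that sharing is what makes the setup work across languages.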