Unsupervised text tokenizer focused on computational efficiency
Updated Feb 13, 2020 - C++
The Transaction.md file doesn't contain enough details about its actual behavior.
morphology_han-readings.py passes the input "北京大学生物系主任办公室内部会议" and prints out
{'hanReadings': [['Bei3-jing1-Da4-xue2'], null, ['zhu3-ren4'], ['ban4-gong1-shi4'], ['nei4-bu4'], ['hui4-yi4']]}
The second element of the list, null, should be ['Sheng1-wu4'], i.e., "Biology."
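One way to spot this kind of gap automatically is to scan the result for null entries. The following is only a minimal sketch: the helper name missing_reading_indices is hypothetical and not part of morphology_han-readings.py, and the printed output has been re-encoded here as valid JSON (double quotes) so that it can be parsed.

```python
import json

# Output copied from the report above, re-encoded as valid JSON (an assumption:
# the original print mixes single quotes with a JSON-style null).
raw = (
    '{"hanReadings": [["Bei3-jing1-Da4-xue2"], null, ["zhu3-ren4"], '
    '["ban4-gong1-shi4"], ["nei4-bu4"], ["hui4-yi4"]]}'
)


def missing_reading_indices(response):
    """Return the positions in response['hanReadings'] whose reading is missing.

    Hypothetical helper for illustration; a missing reading is a JSON null,
    which json.loads turns into None.
    """
    readings = response.get("hanReadings", [])
    return [index for index, reading in enumerate(readings) if reading is None]


response = json.loads(raw)
print(missing_reading_indices(response))  # prints [1]
```

The zero-based index 1 corresponds to the second element, the one the report expects to hold ['Sheng1-wu4'].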
The OSX build notes contain the following line:
brew install automake berkeley-db4 libtool boost --c++11 miniupnpc openssl pkg-config protobuf python3 qt libevent
However, the "boost --c++11" part of this command is no longer valid, so the line needs to be updated.