language-model
Here are 530 public repositories matching this topic...
PositionalEmbedding
The position embedding in BERT is not the same as in the original Transformer. Why not use the form used in BERT?
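To make the difference concrete, here is a minimal sketch, assuming the question contrasts the fixed sinusoidal encoding of the original Transformer with the learned per-position table used in BERT. The function names, the table sizes, and the 0.02 initialisation scale are illustrative assumptions, not code from the repository, and no training step is shown.

```python
import math
import random


def sinusoidal_position_encoding(max_len, d_model):
    """Fixed encoding: even dims use sin, odd dims use cos of the same angle."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def learned_position_table(max_len, d_model, seed=0):
    """Learned variant: one randomly initialised vector per position.

    In a real model these values would be trainable parameters; here they
    are only initialised (the 0.02 scale is an illustrative assumption).
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(max_len)]


fixed = sinusoidal_position_encoding(max_len=8, d_model=4)
learned = learned_position_table(max_len=8, d_model=4)
```

Both tables have one row per position and are added to the token embeddings in the same way. The practical difference is that the fixed table is computed from a formula, while the learned table is limited to the maximum length it was trained with.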
spaCy has customizable word-level tokenizers with rules for multiple languages. I think porting them to Rust would be a nice addition to this package. Customizable, uniform word-level tokenization across platforms (web client, server) and languages would be beneficial. Currently, I don't know of any clean way to write bindings for spaCy's Cython code, or whether it is even possible.
Spacy Tokenizer Code
https:
Rust documentation
Hi,
When we tokenize the following sentence, which contains a URL, with spaCy:
import spacy
nlp = spacy.load('en_core_web_lg')
doc = nlp("I like the link http://www.idph.iowa.gov/ohds/oral-health-center/coordinator")
list(doc)
we get
[I, like, the, link, http://www.idph.iowa.gov, /, ohds, /, oral, -, health, -, center, /, coordinator]
But if we use the Spacy transformer tokenizer:
I think the filenames on lines 4-9 of models.sh should refer to kaldi-generic-en-tdnn_f-r20190609*, which is what line 3 downloads.
Consider this code that downloads models and tokenizers to disk and then uses BertTokenizer.from_pretrained to load the tokenizer from disk.
ISSUE: BertTokenizer.from_pretrained() does not seem to be compatible with Python's native pathlib module.
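One possible workaround, sketched here under the assumption that the loader accepts a plain string path, is to convert the pathlib.Path to str before passing it on. The helper name to_path_str and the directory models/bert-base-uncased are hypothetical, and the call to BertTokenizer.from_pretrained is shown only as a comment so that the snippet runs without the transformers package installed.

```python
import os
from pathlib import Path


def to_path_str(path):
    """Return a plain string for a str or any os.PathLike such as pathlib.Path."""
    return os.fspath(path)


# Hypothetical local directory holding the downloaded tokenizer files.
model_dir = Path("models") / "bert-base-uncased"
model_dir_str = to_path_str(model_dir)

# With transformers installed and the files present, the string form could be
# passed in place of the Path object, for example:
#     tokenizer = BertTokenizer.from_pretrained(model_dir_str)
```

os.fspath leaves ordinary strings unchanged, so the same helper can be applied whether the caller supplies a str or a Path.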