# corpus
Here are 481 public repositories matching this topic...
Large Scale Chinese Corpus for NLP
nlp
news
wiki
text-classification
word2vec
corpus
dataset
question-answering
chinese
chinese-nlp
language-model
bert
chinese-corpus
pretrain
chinese-dataset
Deep Learning and deep reinforcement learning research papers and some codes
-
Updated
Jun 26, 2020
Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:
-
Updated
Feb 10, 2020 - Python
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
benchmark
tensorflow
nlu
glue
corpus
transformers
pytorch
dataset
chinese
pretrained-models
language-model
albert
bert
roberta
chineseglue
-
Updated
Jun 10, 2020 - Python
Search all Chinese NLP datasets, with commonly used English NLP datasets included
nlp
qa
sentiment-analysis
text-classification
match
machine-translation
text-similarity
corpus
knowledge-graph
chinese
text-summarization
datasets
ner
machine-reading-comprehension
-
Updated
Mar 1, 2020 - Python
Open data in the insurance domain for machine learning tasks (insurance industry corpus)
machine-learning
natural-language-processing
insurance
chatbot
corpus
dataset
question-answering
natural-language-understanding
qasystem
insuranceqa-corpus-zh
-
Updated
Jul 13, 2018 - Python
Chatbot in 200 lines of code using TensorLayer
-
Updated
Oct 6, 2019 - Python
koheiw
commented
Apr 12, 2020
In #1925, we noticed that normalizing hyphens in the corpus constructor causes inconsistency.
If a corpus is an object that keeps the original texts, we should probably do this in the token constructor rather than in the corpus constructor.
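The separation proposed in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's actual API: `Corpus`, `make_corpus`, `tokenize`, and the set of hyphen variants are all hypothetical names and choices. The corpus stores each text unchanged, and hyphen normalization happens only when tokens are built.

```python
import re
from dataclasses import dataclass

# Assumption: "hyphen normalization" means mapping Unicode hyphen-like
# characters to the ASCII hyphen-minus. This subset is illustrative only.
HYPHEN_VARIANTS = "\u2010\u2011\u2012\u2013\u2212"
_HYPHEN_RE = re.compile(f"[{HYPHEN_VARIANTS}]")


@dataclass(frozen=True)
class Corpus:
    """Hypothetical container that stores texts exactly as given."""
    texts: tuple


def make_corpus(texts):
    # No normalization here: the corpus preserves the original strings.
    return Corpus(texts=tuple(texts))


def tokenize(corpus, normalize_hyphens=True):
    """Split each text on whitespace, normalizing hyphens at this stage only."""
    result = []
    for text in corpus.texts:
        if normalize_hyphens:
            text = _HYPHEN_RE.sub("-", text)
        result.append(text.split())
    return result
```

With this split, the original text can always be recovered from the corpus, and callers who need the raw characters in their tokens can pass `normalize_hyphens=False`.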
WeChat official accounts corpus
nlp
natural-language-processing
corpus
linguistics
weixin
chinese-nlp
corpora
weixin-data
wei-xin
yu-liao
yu-liao-ku
-
Updated
Jan 7, 2019
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
natural-language-processing
information-retrieval
corpus
language-detection
embeddings
named-entity-recognition
normalizer
spell-check
persian-language
stemmer
dependency-parser
persian-nlp
part-of-speech-tagger
morphological-analysis
persian-stemmer
shallow-parser
-
Updated
Jun 19, 2020
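A collection of high-quality Chinese pre-trained models: state-of-the-art large models, fastest small models, and specialized similarity models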
text-classification
corpus
dataset
chinese
semantic-similarity
pretrained-models
sentence-classification
albert
bert
sentence-analysis
distillation
sentence-pairs
roberta
-
Updated
Jul 8, 2020 - Python
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
nlp
language
translation
corpus
literature
corpus-linguistics
corpus-tools
multi-language-support
corpus-processing
-
Updated
Jul 5, 2020 - Python
A dataset of millions of news articles scraped from a curated list of data sources.
nlp
machine-learning
natural-language-processing
database
corpus
artificial-intelligence
dataset
fakenews
-
Updated
Jan 25, 2020
Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
-
Updated
Sep 11, 2019 - HTML
WP2TXT extracts plain text from a Wikipedia dump file (encoded in XML and compressed with Bzip2), stripping all MediaWiki markup and other metadata.
-
Updated
Jan 10, 2018 - Ruby
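The general idea behind this kind of extraction can be sketched with the Python standard library. This is not WP2TXT's implementation (WP2TXT is a Ruby tool); the function names `iter_page_texts` and `strip_markup` are hypothetical, and the markup cleanup handles only internal links and bold/italic quotes. Real MediaWiki markup needs a proper parser.

```python
import bz2
import io
import re
import xml.etree.ElementTree as ET


def _local(tag):
    # Drop any XML namespace prefix such as "{http://...}text".
    return tag.rsplit("}", 1)[-1]


def strip_markup(wikitext):
    """Very rough cleanup: internal links and bold/italic quotes only."""
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", wikitext)
    # Remove runs of two or more apostrophes (bold/italic markers).
    return re.sub(r"'{2,}", "", text)


def iter_page_texts(stream):
    """Yield (title, plain_text) pairs from a bzip2-compressed XML dump.

    `stream` may be a filename or a binary file object.
    """
    with bz2.open(stream, "rb") as xml_file:
        title = None
        # iterparse streams the document, so the dump is never fully in memory.
        for _, elem in ET.iterparse(xml_file, events=("end",)):
            name = _local(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                yield title, strip_markup(elem.text or "")
            elif name == "page":
                elem.clear()  # release the finished page's subtree
```

A small in-memory dump is enough to try it: compress an XML string with `bz2.compress`, wrap it in `io.BytesIO`, and iterate over `iter_page_texts`.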
Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.
-
Updated
Jun 25, 2018 - Python
PTT Gossiping board Chinese question-answering corpus
chatbot
dialog
corpus
dataset
question-answering
chinese-nlp
ptt
chinese-corpus
chinese-chatbot
chinese-dataset
chatbot-corpus
-
Updated
Sep 9, 2019 - Jupyter Notebook
Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
machine-learning
natural-language-processing
corpus
pytorch
speech-synthesis
dataset
prosody
bert
sequence-labeling
-
Updated
Oct 30, 2019 - Python
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
python
nlp
c-plus-plus
library
corpus
linguistics
pattern-recognition
computational-linguistics
text-processing
ngram
ngrams
skipgram
-
Updated
May 6, 2020 - C++
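To make the n-gram and skipgram terminology in the Colibri core description concrete, here is a minimal plain-Python sketch. It does not use Colibri core's API; the functions `ngrams` and `skipgrams` and the `GAP` placeholder are hypothetical, and only fixed-size gaps of one token each are modelled.

```python
from itertools import combinations

GAP = "{*}"  # hypothetical placeholder marking a one-token gap


def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, max_gaps=1):
    """Return n-token patterns with up to `max_gaps` inner tokens replaced by GAP.

    The first and last token of each pattern are always kept, so a pattern
    needs n >= 3 to contain a gap.
    """
    patterns = set()
    for gram in ngrams(tokens, n):
        inner_positions = range(1, n - 1)
        for k in range(1, max_gaps + 1):
            for gap_positions in combinations(inner_positions, k):
                patterns.add(tuple(
                    GAP if i in gap_positions else tok
                    for i, tok in enumerate(gram)
                ))
    return sorted(patterns)
```

For the tokens of "to be or not", `ngrams(tokens, 2)` yields the three bigrams, and `skipgrams(tokens, 3)` yields the patterns `to {*} or` and `be {*} not`. This sketch favours clarity over the speed and memory efficiency that a dedicated library aims for.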
Running command git clone -q https://github.com/gunthercox/chatterbot-corpus.git 'C:\Users\user\AppData\Local\Temp\pip-install-2w96myp1\chatterbot-corpus'
ERROR: Error [WinError 2] The system cannot find the file specified while executing command git clone -q https://