# corpus
Here are 416 public repositories matching this topic...
Deep learning and deep reinforcement learning research papers and some code
Updated Jun 26, 2020
Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:
Updated Feb 10, 2020 - Python
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
benchmark
tensorflow
nlu
glue
corpus
transformers
pytorch
dataset
chinese
pretrained-models
language-model
albert
bert
roberta
chineseglue
Updated Jul 15, 2020 - Python
Search all Chinese NLP datasets, with commonly used English NLP datasets included
nlp
qa
sentiment-analysis
text-classification
match
machine-translation
text-similarity
corpus
knowledge-graph
chinese
text-summarization
datasets
ner
machine-reading-comprehension
Updated Mar 1, 2020 - Python
Open data in the insurance domain for machine learning tasks; an insurance industry corpus
machine-learning
natural-language-processing
insurance
chatbot
corpus
dataset
question-answering
natural-language-understanding
qasystem
insuranceqa-corpus-zh
Updated Jul 13, 2018 - Python
WeChat official accounts corpus
nlp
natural-language-processing
corpus
linguistics
weixin
chinese-nlp
corpora
weixin-data
wei-xin
yu-liao
yu-liao-ku
Updated Jan 7, 2019
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
natural-language-processing
information-retrieval
corpus
language-detection
embeddings
named-entity-recognition
normalizer
spell-check
persian-language
stemmer
dependency-parser
persian-nlp
part-of-speech-tagger
morphological-analysis
persian-stemmer
shallow-parser
Updated Jul 21, 2020
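A collection of high-quality Chinese pre-trained models: state-of-the-art large models, fastest small models, and models specialized for similarity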
text-classification
corpus
dataset
chinese
semantic-similarity
pretrained-models
sentence-classification
albert
bert
sentence-analysis
distillation
sentence-pairs
roberta
Updated Jul 8, 2020 - Python
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
nlp
language
translation
corpus
literature
corpus-linguistics
corpus-tools
multi-language-support
corpus-processing
Updated Jul 5, 2020 - Python
A dataset of millions of news articles scraped from a curated list of data sources.
nlp
machine-learning
natural-language-processing
database
corpus
artificial-intelligence
dataset
fakenews
Updated Jan 25, 2020
Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
Updated Sep 11, 2019 - HTML
WP2TXT extracts plain-text data from a Wikipedia dump file (encoded in XML and compressed with bzip2), stripping all MediaWiki markup and other metadata.
Updated Jan 10, 2018 - Ruby
Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.
Updated Jul 13, 2020 - Python
Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
machine-learning
natural-language-processing
corpus
pytorch
speech-synthesis
dataset
prosody
bert
sequence-labeling
Updated Oct 30, 2019 - Python
Updated Jun 17, 2020 - Python
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
python
nlp
c-plus-plus
library
corpus
linguistics
pattern-recognition
computational-linguistics
text-processing
ngram
ngrams
skipgram
Updated May 6, 2020 - C++
PubMed 200k RCT dataset: a large dataset for sequential sentence classification.
Updated Jul 22, 2018
KH Coder: for Quantitative Content Analysis or Text Mining
Updated Jul 26, 2020 - Perl