corpus-data
Here are 88 public repositories matching this topic.
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
nlp
data-science
natural-language-processing
big-data
ai
nlu
extract-information
information-extraction
datascience
dataset
papers
corpus-data
nlp-apis
nlp-resources
openie
relation-extraction
natural-language-understanding
open-information-extraction
literature-review
oie-systems
Updated May 20, 2020
Utilities for Processing the Switchboard Dialogue Act Corpus
dialogue
corpus
corpus-data
corpus-tools
switchboard
dialogues
corpus-processing
dialogue-data
switchboard-corpus
dialogue-act
Updated Sep 10, 2019 - Python
Scraper
Updated Dec 21, 2018 - Python
Reading the data from OPIEC, an Open Information Extraction corpus
nlp
natural-language-processing
wiki
wikipedia
corpus
information-extraction
dataset
corpora
corpus-data
nlp-resources
wikipedia-dump
corpus-tools
natural-language-understanding
open-information-extraction
dataset-interface
wikipedia-corpus
corpus-processing
nlp-datasets
Updated Jun 12, 2019 - Java
Generic corpus-cleaning script made with the tm package
Updated Oct 3, 2018 - R
Datasets with text data for use in NLP, text analysis, information extraction, and ML research.
nlp
data-science
machine-learning
text-mining
news
politics
text-classification
pandas-dataframe
sklearn
corpus
text-analysis
journalism
pytorch
data-journalism
dataset
political-science
india
corpus-data
nlg-dataset
nlp-datasets
Updated Feb 1, 2019 - Jupyter Notebook
Korean ASR Corpus generated from TEDx talks
Updated Jan 11, 2019
Utilities for Processing the Meeting Recorder Dialogue Act Corpus
Updated Sep 10, 2019 - Python
nlp
natural-language-processing
big-data
wiki
wikipedia
bigdata
information-extraction
text-processing
corpus-linguistics
corpus-data
nlp-apis
nlp-resources
corpus-generator
corpus-builder
corpus-tools
natural-language-understanding
open-information-extraction
wikipedia-corpus
corpus-processing
nlp-datasets
Updated Jun 12, 2019 - Java
Vietnamese Wikipedia Corpus
Updated May 18, 2017 - Python
GermaParl: Corpus of Plenary Protocols of the German Bundestag (TEI Format)
Updated Nov 6, 2017
Data from a corpus of written Hawaiian
frequency
corpus
hawaii
n-grams
corpora
stoplist
ngram
stopwords
corpus-linguistics
hawaiian
hawaiian-language
corpus-data
ulukau
olelo-hawaii
hawaiian-electronic-library
frequency-list
bigrams
Updated Jun 27, 2016
Tunisian Sentiment Analysis Corpus.
Updated Feb 17, 2017
Golden Arabic corpus built to test Assem's arabicstemmer and other Arabic stemmers
Updated Aug 24, 2018 - Python
Repository dedicated to a collection of resources and helper material for Urdu language processing tasks
corpus
open-data
awesome-list
corpus-data
research-paper
urdu
urdu-nlp
urdu-text-processsing
urdu-language
Updated Oct 24, 2019
Simple bs4-based web crawler for building a corpus for statistical machine translation
nlp
natural-language-processing
translation
machine-translation
amharic
corpus-linguistics
corpus-data
amharic-corpus
ethiopian-languages
Updated Dec 14, 2017 - Python
A Lightweight Tool for Annotating Discourse Relations and Sentence Reordering
nlp
natural-language-processing
annotation
corpus
corpus-linguistics
corpus-data
corpus-tools
annotating-discourse-relations
Updated May 16, 2020 - JavaScript
My public domain speech index
Updated Sep 19, 2019
Build an n-way multilingual corpus
Updated Jan 30, 2017 - Python
This repository contains the DFKI Product Corpus, a dataset of 174 documents annotated for product and company named entities, and the relation CompanyProvidesProduct.
nlp
natural-language-processing
corpus
information-extraction
english
dataset
named-entity-recognition
corpus-data
ner
relation-extraction
Updated May 19, 2020
This is project material for the Antisemitism Datathon and Hackathon 2020 at Indiana University
python
machine-learning
social-media
twitter
tensorflow
pytorch
spacy
nltk
corpus-data
flair
hatespeech
antisemitism
Updated May 17, 2020
A collection of Indonesian-language corpus documents containing external plagiarism detection test cases that follow the PAN CLEF standard (http://www.uni-weimar.de/medien/webis/events/pan-11).
corpus
corpus-data
indonesian-language
bahasa-indonesia
plagiarism-detection
plagiarism-detector
plagiarism-provider
korpus-plagiarisme-indonesia
plagiarism-evaluation
Updated Aug 8, 2016 - Python
Updated Sep 18, 2017 - R
Updated Mar 9, 2019 - JavaScript
An annotated corpus of discussion forum threads from Massive Open Online Courses.
taxonomy
annotations
learning-analytics
crowdsourcing
discourse-analysis
computational-linguistics
intervention
corpus-data
moocs
coursera-discussion-forums
education-data
education-technology
discussion-forum-data
mooc-forum
transactivity
discussion-forum-threads
Updated Jul 6, 2019 - Perl