Language Machines
ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules …
-
frogdata
Data files for Frog (mandatory).
-
foliatest
Test suite for libfolia
-
frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
-
mbt
MBT: Memory-based tagger generation and tagging. MBT is a memory-based tagger-generator and tagger in one.
-
frogtests
Unit tests for Frog
-
mbttests
Unit tests for Mbt
-
foliautils
Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++), written by Ko van der Sloot (CLST, Radboud University)
-
travistest
Small program to test Travis CI issues, such as OS X and Clang OpenMP support.
-
PICCL
A set of workflows for corpus building through OCR, post-correction and normalisation
-
toad
Toad: Trainer Of All Data, the Frog training collection
-
ticcutils
Ticcutils, a generic utility library shared by our software.
-
wikinerdata
Forked from lidejong/wikinerdata. Script to collect data from Wikipedia and automatically annotate the linked named entities with their named entity type.
-
ticcltools
Tools for TICCL
-
dimbl
Distributed Tilburg Memory Based Learner
-
clariah-plus-tasks
An overview of CLARIAH-PLUS tasks at CLST, Radboud University, Nijmegen
-
timbl
TiMBL implements several memory-based learning algorithms.
-
timblserver
TiMBL implements several memory-based learning algorithms. This is the server part.
-
uctodata
Datafiles for the tokenizer ucto.
-
bp-som
BP-SOM: A hybrid of back-propagation learning in multi-layered perceptrons and self-organizing maps
-
LamaEvents
Lama Events is a calendar application listing events in the near future. The events are detected and selected by a fully automatic procedure in the Dutch Twitter stream.
-
homebrew-lamachine
Forked from fbkarsdorp/homebrew-lamachine. Brew formulas for installing NLP software developed by the Language Machines research group.
-
timbltests
Unit tests for Timbl
-
LuigiNLP
A workflow system for Natural Language Processing.