huggingface / transformers

Open

DeBERTa V3 Fast Tokenizer

ikergarcia1996 commented Dec 10, 2021

🚀 Feature request

Fast Tokenizer for DeBERTa-V3 and mDeBERTa-V3

Motivation

DeBERTa V3 is an improved version of DeBERTa. With the V3 version, the authors also released a multilingual model, "mDeBERTa-base", that outperforms XLM-R-base. However, DeBERTa V3 currently lacks a FastTokenizer implementation, which makes it impossible to use with some of the example scripts (They require a Fa

Make `CLIPFeatureExtractor` accept batch of images as `torch.Tensor`.

Open

Encapsulate all forward passes of integration tests with "with torch.no_grad()"

explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

python nlp data-science machine-learning natural-language-processing ai deep-learning neural-network text-classification cython artificial-intelligence spacy named-entity-recognition neural-networks nlp-library tokenization entity-linking

Updated Dec 31, 2021
Python

bharathgs / Awesome-pytorch-list

A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.

python nlp data-science machine-learning natural-language-processing awesome facebook computer-vision deep-learning neural-network cv tutorials pytorch awesome-list utility-library probabilistic-programming papers nlp-library pytorch-tutorials pytorch-model

Updated Dec 30, 2021

FudanNLP / fnlp

Toolkit for Chinese natural language processing

java nlp-library fudannlp fnlp

Updated Dec 16, 2018
Java

fastnlp / fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

natural-language-processing deep-learning text-classification chinese-nlp text-processing nlp-parsing nlp-library

Updated Dec 6, 2021
Python

xavier-zy / Awesome-pytorch-list-CNVersion

Awesome-pytorch-list: translation work in progress...

python nlp machine-learning facebook computer-vision deep-learning neural-network cv tutorials pytorch utility-library probabilistic-programming papers nlp-library pytorch-tutorials pytorch-models data-sicence awsome-pytorch-list cnversion

Updated Jul 26, 2021
Jupyter Notebook

deepset-ai / FARM

🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

nlp deep-learning pytorch question-answering transfer-learning pretrained-models language-models ner nlp-library bert nlp-framework roberta xlnet-pytorch germanbert

Updated Nov 23, 2021
Python

chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Updated Oct 7, 2021
Python

thunlp / OpenPrompt

An Open-Source Framework for Prompt-Learning.

nlp natural-language-processing ai deep-learning prompt pytorch transformer prompt-toolkit nlp-library nlp-machine-learning prompts natural-language-understanding pre-trained-model pre-trained-language-models prompt-based-tuning prompt-learning

Updated Dec 27, 2021
Python

undertheseanlp / underthesea

Underthesea - Vietnamese NLP Toolkit

nlp natural-language-processing vietnamese nlp-library vietnamese-nlp

Updated Jan 1, 2022
Python

atilika / kuromoji

Kuromoji is a self-contained and very easy to use Japanese morphological analyzer designed for search

japanese nlp-library part-of-speech-tagger morphological-analyser

Updated Aug 18, 2021
Java

MilaNLProc / contextualized-topic-models

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.

nlp embeddings transformer topic-modeling nlp-library nlp-machine-learning bert neural-topic-models text-as-data topic-coherence multilingual-topic-models multilingual-models

Updated Dec 31, 2021
Python

mocobeta / janome

Japanese morphological analysis engine written in pure Python

python japanese-language nlp-library

Updated Nov 21, 2021
Python

PyThaiNLP / pythainlp

Thai Natural Language Processing in Python.

python natural-language-processing thai-language thai soundex nlp-library word-segmentation thai-nlp hacktoberfest thai-nlp-library thai-soundex

Updated Dec 29, 2021
Python

wyounas / homer

Homer, a text analyser in Python, can help make your text clearer, simpler, and more useful for your readers.

python nlp python-library python-script text-analysis python3 nlp-library

Updated Oct 3, 2021
Python

ikawaha / kagome

Self-contained Japanese Morphological Analyzer written in pure Go

japanese tokenizer segmentation korean japanese-language nlp-library hacktoberfest pos-tagging morphological-analysis

Updated Nov 23, 2021
Go

WorksApplications / Sudachi

A Japanese Tokenizer for Business

segmentation nlp-library pos-tagging morphological-analysis

Updated Dec 28, 2021
Java

NorskRegnesentral / skweak

skweak: A software toolkit for weak supervision applied to NLP tasks

python data-science natural-language-processing weak-supervision spacy nlp-library nlp-machine-learning distant-supervision training-data

Updated Dec 25, 2021
Python

cbaziotis / ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, 330 million English tweets).

nlp tokenizer text-processing semeval nlp-library word-segmentation spelling-correction tokenization text-segmentation spell-corrector word-normalization

Updated Feb 8, 2021
Python

pemistahl / lingua

👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

nlp natural-language-processing natural-language kotlin-library language-detection android-library java-library nlp-library nlp-machine-learning language-recognition language-processing language-identification language-classification

Updated Dec 16, 2021
Kotlin

proycon / pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).