natural-language-processing
Natural language processing (NLP) is a field of computer science concerned with the interactions between computers and human language. In the 1950s, Alan Turing published an article proposing a measure of machine intelligence, now called the Turing test. More recently, techniques such as deep learning have produced strong results in language modeling, parsing, and other natural-language tasks.
Here are 7,314 public repositories matching this topic...
Change tensor.data to tensor.detach(), per
pytorch/pytorch#6990 (comment):
tensor.detach() is more robust than tensor.data.
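A small sketch of why .detach() is the safer choice: both views share storage with the original tensor, but only .detach() participates in autograd's in-place version checks, so a mutation that would corrupt gradients raises an error instead of passing silently.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.sigmoid()  # sigmoid saves its output for the backward pass

# Both share storage with y and are excluded from the autograd graph:
via_data = y.data        # old style: in-place edits go unnoticed
via_detach = y.detach()  # preferred: in-place edits are version-checked

# Mutating through .detach() and then calling backward() makes autograd
# raise, instead of silently producing wrong gradients (which is what
# the same mutation through .data would do):
via_detach.zero_()
caught = None
try:
    y.sum().backward()
except RuntimeError as exc:
    caught = type(exc).__name__
print(caught)  # RuntimeError
```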
Not high-priority at all, but it would be more sensible for such a tutorial/testing utility corpus to be implemented elsewhere, maybe under /test/ or some other data- or doc-related module, rather than in gensim.models.word2vec.
Originally posted by @gojomo in RaRe-Technologies/gensim#2939 (comment)
While setting train_parameters to False, we may very often also want to disable dropout/batchnorm; in other words, to run the pretrained model in eval mode.
We've made a small modification to PretrainedTransformerEmbedder that allows specifying whether the token embedder should be forced into eval mode during the training phase.
Do you think this feature might be handy? Should I open a PR?
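The idea can be sketched in plain PyTorch (hypothetical class and parameter names, not the AllenNLP API): override Module.train so the frozen embedder subtree is always switched back to eval mode, keeping its dropout and batchnorm inactive while the rest of the model trains.

```python
import torch.nn as nn

class FrozenEmbedderModel(nn.Module):
    """Sketch: keep a pretrained embedder in eval mode even while the
    surrounding model trains (names here are illustrative only)."""

    def __init__(self, embedder: nn.Module, head: nn.Module,
                 eval_mode_embedder: bool = True):
        super().__init__()
        self.embedder = embedder
        self.head = head
        self.eval_mode_embedder = eval_mode_embedder
        for p in self.embedder.parameters():
            p.requires_grad = False  # analogous to train_parameters=False

    def train(self, mode: bool = True):
        super().train(mode)  # sets the training flag recursively
        if self.eval_mode_embedder:
            self.embedder.eval()  # override for the embedder subtree
        return self

    def forward(self, x):
        return self.head(self.embedder(x))

model = FrozenEmbedderModel(
    embedder=nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.5)),
    head=nn.Linear(8, 2),
)
model.train()
print(model.head.training, model.embedder.training)  # True False
```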
Hello,
It seems that when a cached file is saved by calling dataset.map for preprocessing, it gets the user's permissions and none of the user's group permissions. Since we share data files across members of our team, this causes a bit of an issue: we have to continually reset the permissions of the files. Do you know of any way around this, or a way to set the permissions correctly?
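One generic workaround (not a datasets API, just standard POSIX behavior): relax the process umask before the preprocessing step so files written while it is active keep their group read/write bits, then restore the old mask afterwards.

```python
import os
import stat
import tempfile

# Sketch: with umask 0o002, newly created files get mode 0o664
# (group-readable and group-writable) instead of the common 0o644.
old_umask = os.umask(0o002)
try:
    cache_dir = tempfile.mkdtemp()
    path = os.path.join(cache_dir, "cache.arrow")  # stand-in cache file
    with open(path, "w") as f:
        f.write("preprocessed data")
finally:
    os.umask(old_umask)  # don't leak the relaxed mask to later code

mode = stat.S_IMODE(os.stat(path).st_mode)
print(bool(mode & stat.S_IRGRP), bool(mode & stat.S_IWGRP))  # True True
```

A `chmod -R g+rw` over the cache directory after the run achieves the same effect, at the cost of a second pass over the files.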
Hello spoooopyyy hackers
This is a Hacktoberfest only issue!
This is also data-sciency!
The Problem
Our English dictionary contains words that aren't English, and does not contain common English words.
Examples of non-common words in the dictionary:
"hlithskjalf",
"hlorrithi",
"hlqn",
"hm",
"hny",
"ho",
"hoactzin",
"hoactzine
Recently the HF trainer was extended to support full fp16 eval via
--fp16_full_eval. I'd have expected it to be either equal to or faster than eval with the fp32 model, but surprisingly I noticed a 25% slowdown when using it. This may or may not impact DeepSpeed as well, which also runs eval in fp16, but we can't compare it to a baseline, since it only runs fp16.
I wonder if someone would like to investigate.
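A first step in investigating could be a synthetic micro-benchmark (a toy model, not the HF trainer) comparing eval-mode forward passes in fp32 vs. half precision; it falls back to bfloat16 on CPUs that lack fp16 matmul kernels, since half precision is only reliably fast on suitable hardware.

```python
import copy
import time
import torch

# Pick a half-precision dtype the current hardware can actually run.
half_dtype = torch.float16
try:
    torch.nn.Linear(4, 4).to(half_dtype)(torch.randn(2, 4, dtype=half_dtype))
except RuntimeError:
    half_dtype = torch.bfloat16  # CPU fallback

model_fp32 = torch.nn.Sequential(
    torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 256)
).eval()
model_half = copy.deepcopy(model_fp32).to(half_dtype)

x32 = torch.randn(64, 256)
x16 = x32.to(half_dtype)

def bench(model, x, iters=50):
    # Time eval-mode forward passes without autograd bookkeeping.
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
    return time.perf_counter() - start

bench(model_fp32, x32)  # warm-up
t32 = bench(model_fp32, x32)
t16 = bench(model_half, x16)
print(f"fp32 {t32:.4f}s  half {t16:.4f}s  ratio {t16 / t32:.2f}x")
```

A ratio above 1.0 here would reproduce the reported slowdown in isolation; the real comparison would also need to account for the fp16-to-fp32 casts the trainer inserts around loss computation.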