# natural-language-processing

Natural language processing (NLP) is a field of computer science concerned with the interactions between computers and human language. In the 1950s, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More recently, techniques such as deep learning have produced strong results in language modeling, parsing, and many other natural-language tasks.

Here are 7,275 public repositories matching this topic...

transformers
stas00 commented Mar 11, 2021

It looks like the Trainer's --label_smoothing_factor feature doesn't handle fp16 well. The problem shows up with the DeepSpeed ZeRO-3 integration I'm working on, since it evaluates in fp16, but it can also be reproduced with the recently added --fp16_full_eval Trainer option.

To reproduce:

```
export BS=16; rm -r output_dir; PYTHONPATH=src USE_TF=0 CUDA_VISIBLE_DEVICES=0 python examples/seq2seq/run_seq2
```
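The failure mode can be illustrated outside the Trainer: a naive label-smoothed loss computed directly in fp16 overflows in the softmax and turns into NaN, while the same computation in fp32 stays finite. Below is a minimal NumPy sketch, assuming a simplified loss; `smoothed_nll` is a hypothetical name, not the Trainer's actual implementation.

```python
import numpy as np

def smoothed_nll(logits, target, eps=0.1):
    # Naive label-smoothed negative log-likelihood, computed entirely
    # in the dtype of `logits` -- deliberately no log-sum-exp trick.
    exp = np.exp(logits)
    logp = np.log(exp / exp.sum())
    # (1 - eps) weight on the gold label, eps spread uniformly over classes
    return -((1 - eps) * logp[target] + eps * logp.mean())

logits = np.array([12.0, 0.0, -2.0])

loss32 = smoothed_nll(logits.astype(np.float32), target=0)  # finite
# exp(12) exceeds the fp16 max (65504), so the softmax becomes inf/inf -> nan
loss16 = smoothed_nll(logits.astype(np.float16), target=0)
```

In fp32 the loss is well-behaved; in fp16 the unstabilized `exp` overflows and the smoothed term averages over `-inf`/`nan` log-probabilities, which is consistent with the eval-time NaNs described above.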
gensim
rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

  • Updated Mar 16, 2021
  • Python
mahnerak commented Jan 2, 2021

When setting train_parameters to False, we often also want to disable dropout/batch norm, i.e. run the pretrained model in eval mode.
We've made a small modification to PretrainedTransformerEmbedder that allows specifying whether the token embedder should be forced into eval mode during the training phase.

Do you think this feature might be handy? Should I open a PR?
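The idea can be sketched in plain PyTorch. `EvalOnlyWrapper` below is a hypothetical name for illustration, not the actual AllenNLP change: it freezes the wrapped module's parameters (mirroring train_parameters=False) and forces the module back into eval mode whenever the parent model switches to train mode, so dropout and batch-norm statistics stay fixed.

```python
import torch
from torch import nn

class EvalOnlyWrapper(nn.Module):
    """Keep a frozen sub-module in eval mode even while training."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module
        for p in self.module.parameters():
            p.requires_grad = False  # mirrors train_parameters=False

    def train(self, mode: bool = True):
        # nn.Module.train() recursively sets train mode on children,
        # so undo that for the wrapped module afterwards.
        super().train(mode)
        self.module.eval()
        return self

    def forward(self, *args, **kwargs):
        return self.module(*args, **kwargs)

model = nn.Sequential(EvalOnlyWrapper(nn.Dropout(p=0.5)), nn.Linear(4, 2))
model.train()  # parent is in train mode, wrapped dropout stays in eval mode
```

With this in place, calling `model.train()` leaves the wrapped dropout acting as the identity, which is the behavior the proposal asks for.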

Ciphey
