
Natural language processing

Natural language processing (NLP) is a field of computer science that studies the interactions between computers and human language. In the 1950s, Alan Turing published an article proposing a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and other natural-language tasks.

Here are 8,352 public repositories matching this topic...

thomasboris
thomasboris commented Dec 21, 2019

I am going through the GPT2 example in the docs. Is there a mistake in the "Using the past" code? The main loop to generate text is:

for i in range(100):
    print(i)
    output, past = model(context, past=past)  # reuse the cached attention state
    token = torch.argmax(output[0, :])        # greedy next-token choice
    generated += [token.tolist()]
    context = token.unsqueeze(0)              # feed only the new token back in

At the first iteration the output tensor has shape [1,
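The comment is cut off above, but the loop's pattern can be sketched end-to-end with a stub model (hypothetical; it only imitates GPT-2's logits-plus-cache return shape, not its behavior):

```python
import numpy as np

VOCAB = 5

def stub_model(context, past=None):
    """Return fake logits over VOCAB tokens and an updated cache."""
    past = (past or []) + [list(context)]      # cache grows one step per call
    logits = np.zeros(VOCAB)
    logits[(context[-1] + 1) % VOCAB] = 1.0    # deterministic "next token"
    return logits, past

generated = [0]
context = [0]
past = None
for _ in range(4):
    logits, past = stub_model(context, past)
    token = int(np.argmax(logits))             # greedy choice
    generated.append(token)
    context = [token]                          # feed only the new token back in
print(generated)  # -> [0, 1, 2, 3, 4]
```

The point of the cache is that each iteration passes a single token plus the accumulated `past`, rather than re-encoding the whole sequence.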

engrsfi
engrsfi commented Dec 10, 2019

Google has started using BERT in its search engine. I imagine it creates embeddings for the query, then computes some similarity measure against the candidate websites/pages, and finally ranks them in the search results.

I am curious how they create embeddings for the documents (the candidate websites/pages), if at all. Or am I interpreting it wrong?
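The retrieval scheme the comment guesses at can be sketched with toy vectors (the embeddings here are made up; a real system would produce them with BERT or a similar encoder):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical pre-computed document embeddings.
docs = {"page_a": np.array([1.0, 0.0]), "page_b": np.array([0.6, 0.8])}
query = np.array([0.0, 1.0])

# Rank candidate pages by similarity to the query embedding.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # -> ['page_b', 'page_a']
```

In practice document embeddings would be computed offline and stored in an index, with only the query encoded at search time.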

HendricButz
HendricButz commented Nov 17, 2019

I got a CoNLL-U file from my university where the head column is filled with .
Processing such a file with the cli.convert method results in an int cast error in
https://github.com/explosion/spaCy/blob/master/spacy/cli/converters/conllu2json.py line 73
in the read_conllx method (head = (int(head) - 1) if head != "0" else id).
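One defensive workaround (an assumption, not spaCy's actual fix) is to guard the cast so non-numeric head values fall back to attaching the token to itself, mirroring the existing `head == "0"` branch:

```python
def parse_head(head, token_id):
    """Parse a CoNLL-U head column value into a 0-based head index.

    Non-numeric heads (e.g. placeholder values) fall back to the
    token's own id, the same treatment the root ("0") receives.
    """
    if not head.lstrip("-").isdigit():
        return token_id
    return int(head) - 1 if head != "0" else token_id

print(parse_head("3", 5))  # -> 2
print(parse_head("_", 5))  # -> 5
print(parse_head("0", 5))  # -> 5
```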

In the format documentation on https://universaldependencie

gensim
ProfJanetDavis
ProfJanetDavis commented Sep 9, 2019

A semicolon ends a complete sentence and should be treated like a period when tagging.

Here's my surprising finding, replicated on Observable:
[screenshot of the surprising match, 2019-09-09]
The match above is surprising because "mine" starts a new syntactically complete sentence. It is not syntactic
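The requested behavior, treating ";" as a sentence boundary just like ".", can be sketched library-agnostically (a minimal illustration, not the library's actual segmenter):

```python
import re

def split_clauses(text):
    """Split text into clauses, treating ';' like '.' for segmentation."""
    return [c.strip() for c in re.split(r'[.;]', text) if c.strip()]

print(split_clauses("The gold is yours; mine is the glory."))
# -> ['The gold is yours', 'mine is the glory']
```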

pombredanne
pombredanne commented Aug 28, 2019

The latest versions of Python are stricter about invalid escape sequences in regexes.
For instance, with 3.6.8 there are 10+ warnings like this one:

...
lib/python3.6/site-packages/nltk/featstruct.py:2092: DeprecationWarning: invalid escape sequence \d
    RANGE_RE = re.compile('(-?\d+):(-?\d+)')

The regex(es) should be updated to silence these warnings.
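The usual fix is to make the pattern a raw string, which silences the warning without changing the regex (a sketch of the change, not the actual nltk patch):

```python
import re

# Raw string: \d reaches the regex engine without triggering a
# DeprecationWarning for an invalid string escape.
RANGE_RE = re.compile(r'(-?\d+):(-?\d+)')

match = RANGE_RE.match('-3:7')
print(match.groups())  # -> ('-3', '7')
```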

kenkaigu
kenkaigu commented Aug 5, 2019

My feature request is to include an option on a button made from the choice skill to redirect to an external URL...

Here's a detailed explanation including screenshots

https://help.botpress.io/t/how-to-redirect-to-an-external-url-while-using-a-button-made-in-choice-skill/1791

This option would be really beneficial for choice-skill buttons, since at the moment you can only add an ext

DeNeutoy
DeNeutoy commented Jan 15, 2020

#3602 switched our docs from restructured text to markdown, which is a big improvement. However, there are some left over traces of rst formatting in the docstrings. It would be great if we could comb through these and update them.

  • commands
  • common
  • data
  • interpret
  • models
  • modules
  • nn
  • predictors
  • tools
  • training

How to help

ParkerD559
ParkerD559 commented Oct 12, 2019

Despite the documentation here stating:

You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the tokens property.

This fails:

from textblob import TextBlob
from nltk.tokenize import TweetTokenizer

blob = TextBlob("I don't work!", tokenizer=TweetTokenizer())

ychong
ychong commented Feb 8, 2018

Hi, I would like to propose a better implementation for 'test_indices':

We can remove the unneeded np.array casting:

Cleaner/New:
test_indices = list(set(range(len(texts))) - set(train_indices))

Old:
test_indices = np.array(list(set(range(len(texts))) - set(train_indices)))
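A quick check (on toy data, for illustration) that the two variants select the same indices:

```python
import numpy as np

texts = ["a", "b", "c", "d", "e"]
train_indices = [0, 2]

# Proposed: plain list, no np.array casting.
new = list(set(range(len(texts))) - set(train_indices))
# Old: same set difference, wrapped in an array.
old = np.array(list(set(range(len(texts))) - set(train_indices)))

print(sorted(new))           # -> [1, 3, 4]
print(sorted(old.tolist()))  # -> [1, 3, 4]
```

The np.array wrapper only matters if downstream code relies on array indexing; for plain iteration the list is sufficient.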

woshiyyya
woshiyyya commented Jun 23, 2019

I notice that in run_classifier.py, line 409, your code shuffles the input examples.

In the prediction phase, however, we need the inputs in their original order for online submission.

You may want to add a flag here:

tf.logging.info("Create new tfrecord {}.".format(output_file))
writer = tf.python_io.TFRecordWriter(output_file)
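The suggested flag could look like this (a hypothetical sketch, independent of the TensorFlow writer code above):

```python
import random

def order_examples(examples, shuffle=True, seed=0):
    """Return examples shuffled for training, or in original order
    for prediction (when shuffle=False)."""
    order = list(examples)
    if shuffle:
        random.Random(seed).shuffle(order)
    return order

print(order_examples([1, 2, 3], shuffle=False))  # -> [1, 2, 3]
```

With `shuffle=False` at prediction time, the written records keep the submission order.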