
Natural language processing

Natural language processing (NLP) is a field of computer science that studies the interactions between computers and human language. In the 1950s, Alan Turing published an article proposing a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and other natural-language tasks.

Here are 8,352 public repositories matching this topic...

thomasboris
thomasboris commented Dec 21, 2019

I am going through the GPT2 example in the docs. Is there a mistake in the "Using the past" code? The main loop to generate text is:

for i in range(100):
    print(i)
    output, past = model(context, past=past)  # reuse the cached attention state
    token = torch.argmax(output[0, :])        # greedy next-token choice
    generated += [token.tolist()]
    context = token.unsqueeze(0)              # feed only the new token back in

At the first iteration the output tensor has shape [1,
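The comment is cut off above, but the loop's pattern can be sketched end-to-end with a stub model (hypothetical; it only imitates GPT-2's logits-plus-cache return shape, not its behavior):

```python
import numpy as np

VOCAB = 5

def stub_model(context, past=None):
    """Return fake logits over VOCAB tokens and an updated cache."""
    past = (past or []) + [list(context)]      # cache grows one step per call
    logits = np.zeros(VOCAB)
    logits[(context[-1] + 1) % VOCAB] = 1.0    # deterministic "next token"
    return logits, past

generated = [0]
context = [0]
past = None
for _ in range(4):
    logits, past = stub_model(context, past)
    token = int(np.argmax(logits))             # greedy choice
    generated.append(token)
    context = [token]                          # feed only the new token back in
print(generated)  # -> [0, 1, 2, 3, 4]
```

The point of the cache is that each iteration passes a single token plus the accumulated `past`, rather than re-encoding the whole sequence.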

engrsfi
engrsfi commented Dec 10, 2019

Google has started using BERT in its search engine. I imagine it creates embeddings for the query, then computes some similarity measure against the candidate websites/pages, and finally ranks them in the search results.

I am curious how they create embeddings for the documents (the candidate websites/pages), if at all. Or am I interpreting it wrong?
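The retrieval scheme the comment guesses at can be sketched with toy vectors (the embeddings here are made up; a real system would produce them with BERT or a similar encoder):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical pre-computed document embeddings.
docs = {"page_a": np.array([1.0, 0.0]), "page_b": np.array([0.6, 0.8])}
query = np.array([0.0, 1.0])

# Rank candidate pages by similarity to the query embedding.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # -> ['page_b', 'page_a']
```

In practice document embeddings would be computed offline and stored in an index, with only the query encoded at search time.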

HendricButz
HendricButz commented Nov 17, 2019

I got a CoNLL-U file from my university where the head column is filled with .
Processing such a file with the cli.convert method results in an int cast error in
https://github.com/explosion/spaCy/blob/master/spacy/cli/converters/conllu2json.py line 73
in the read_conllx method (head = (int(head) - 1) if head != "0" else id).
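One defensive workaround (an assumption, not spaCy's actual fix) is to guard the cast so non-numeric head values fall back to attaching the token to itself, mirroring the existing `head == "0"` branch:

```python
def parse_head(head, token_id):
    """Parse a CoNLL-U head column value into a 0-based head index.

    Non-numeric heads (e.g. placeholder values) fall back to the
    token's own id, the same treatment the root ("0") receives.
    """
    if not head.lstrip("-").isdigit():
        return token_id
    return int(head) - 1 if head != "0" else token_id

print(parse_head("3", 5))  # -> 2
print(parse_head("_", 5))  # -> 5
print(parse_head("0", 5))  # -> 5
```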

In the format documentation on https://universaldependencie

gensim
ProfJanetDavis
ProfJanetDavis commented Sep 9, 2019

A semicolon ends a complete sentence and should be treated like a period when tagging.

Here's my surprising finding, replicated on Observable:
[screenshot of the surprising match, 2019-09-09]
The match above is surprising because "mine" starts a new syntactically complete sentence. It is not syntactic
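The requested behavior, treating ";" as a sentence boundary just like ".", can be sketched library-agnostically (a minimal illustration, not the library's actual segmenter):

```python
import re

def split_clauses(text):
    """Split text into clauses, treating ';' like '.' for segmentation."""
    return [c.strip() for c in re.split(r'[.;]', text) if c.strip()]

print(split_clauses("The gold is yours; mine is the glory."))
# -> ['The gold is yours', 'mine is the glory']
```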

pombredanne
pombredanne commented Aug 28, 2019

The latest versions of Python are stricter about invalid escape sequences in regexes.
For instance, with 3.6.8 there are 10+ warnings like this one:

...
lib/python3.6/site-packages/nltk/featstruct.py:2092: DeprecationWarning: invalid escape sequence \d
    RANGE_RE = re.compile('(-?\d+):(-?\d+)')

The regex(es) should be updated to silence these warnings.
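The usual fix is to make the pattern a raw string, which silences the warning without changing the regex (a sketch of the change, not the actual nltk patch):

```python
import re

# Raw string: \d reaches the regex engine without triggering a
# DeprecationWarning for an invalid string escape.
RANGE_RE = re.compile(r'(-?\d+):(-?\d+)')

match = RANGE_RE.match('-3:7')
print(match.groups())  # -> ('-3', '7')
```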

kenkaigu
kenkaigu commented Aug 5, 2019

My feature request is to include an option on a button made from the choice skill to redirect to an external URL...

Here's a detailed explanation including screenshots

https://help.botpress.io/t/how-to-redirect-to-an-external-url-while-using-a-button-made-in-choice-skill/1791

This option would be really beneficial for choice-skill buttons, since at the moment you can only add an ext

DeNeutoy
DeNeutoy commented Jan 15, 2020

#3602 switched our docs from restructured text to markdown, which is a big improvement. However, there are some left over traces of rst formatting in the docstrings. It would be great if we could comb through these and update them.

  • commands
  • common
  • data
  • interpret
  • models
  • modules
  • nn
  • predictors
  • tools
  • training

How to help

ParkerD559
ParkerD559 commented Oct 12, 2019

Despite the documentation here stating:

You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the tokens property.

This fails:

from textblob import TextBlob
from nltk.tokenize import TweetTokenizer

blob = TextBlob("I don't work!", tokenizer=TweetTokenizer())

ychong
ychong commented Feb 8, 2018

Hi, I would like to propose a better implementation for 'test_indices':

We can remove the unneeded np.array casting:

Cleaner/New:
test_indices = list(set(range(len(texts))) - set(train_indices))

Old:
test_indices = np.array(list(set(range(len(texts))) - set(train_indices)))
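A quick check (on toy data, for illustration) that the two variants select the same indices:

```python
import numpy as np

texts = ["a", "b", "c", "d", "e"]
train_indices = [0, 2]

# Proposed: plain list, no np.array casting.
new = list(set(range(len(texts))) - set(train_indices))
# Old: same set difference, wrapped in an array.
old = np.array(list(set(range(len(texts))) - set(train_indices)))

print(sorted(new))           # -> [1, 3, 4]
print(sorted(old.tolist()))  # -> [1, 3, 4]
```

The np.array wrapper only matters if downstream code relies on array indexing; for plain iteration the list is sufficient.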

woshiyyya
woshiyyya commented Jun 23, 2019

I notice that in run_classifier.py, line 409, your code shuffles the input examples.

In the prediction phase, however, we need the inputs in their original order for online submission.

You may want to add a flag here:

tf.logging.info("Create new tfrecord {}.".format(output_file))
writer = tf.python_io.TFRecordWriter(output_file)
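The suggested flag could look like this (a hypothetical sketch, independent of the TensorFlow writer code above):

```python
import random

def order_examples(examples, shuffle=True, seed=0):
    """Return examples shuffled for training, or in original order
    for prediction (when shuffle=False)."""
    order = list(examples)
    if shuffle:
        random.Random(seed).shuffle(order)
    return order

print(order_examples([1, 2, 3], shuffle=False))  # -> [1, 2, 3]
```

With `shuffle=False` at prediction time, the written records keep the submission order.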