
natural-language-processing

Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article, "Computing Machinery and Intelligence", that proposed a measure of intelligence now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

Here are 5,064 public repositories matching this topic...

engrsfi commented Dec 10, 2019

Google has started using BERT in its search engine. I imagine it creates embeddings for the search query, computes some kind of similarity measure against the candidate websites/pages, and then ranks them in the search results.

I am curious how they create embeddings for the documents (the candidate websites/pages), if at all. Or am I interpreting this wrong?
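
The pipeline guessed at above (embed the query, score each candidate by similarity, rank by score) can be sketched in plain Python. This is only an illustration of similarity-based ranking, not a description of how Google actually uses BERT. The tiny vectors and the names `cosine_similarity` and `rank_documents` are hypothetical; in practice the vectors would come from an encoder such as BERT.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scores = {doc_id: cosine_similarity(query_vec, vec)
              for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-dimensional "embeddings" standing in for encoder output.
query = [1.0, 0.0, 1.0]
documents = {
    "page_a": [0.9, 0.1, 0.8],
    "page_b": [0.0, 1.0, 0.0],
    "page_c": [0.5, 0.5, 0.5],
}
ranking = rank_documents(query, documents)
```

Here the document embeddings are assumed to be computed ahead of time, so only the query needs to be embedded when a search is made.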

ines commented Sep 29, 2019

I was going through the existing enhancement issues again and thought it'd be nice to collect ideas for spaCy plugins and related projects. There are always people in the community who are looking for new things to build, so here's some inspiration. For existing plugins and projects, check out the spaCy universe.

If you have questions about the projects I suggested,

gensim
michaeljneely commented Jan 26, 2020

Describe the bug

Calling Predictor.get_gradients() returns an empty dictionary

To Reproduce
I am replicating the binary sentiment classification task described in the paper 'Attention is not Explanation' (Jain and Wallace 2019 - https://arxiv.org/pdf/1902.10186.pdf).

My first experiment is on the Stanford Sentiment TreeBank Dataset. I need to measure the correlation between th

rasa
ghost commented Aug 9, 2017

Line 3756: should be attrs instead of attr

Current: if self.classes.issubset(set([s.lower() for s in e.attr.get("class", [])])) is False:

Should be: if self.classes.issubset(set([s.lower() for s in e.attrs.get("class", [])])) is False:
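
To illustrate why the attribute name matters, here is a minimal sketch of the same class-subset check. The `Element` class and the `has_classes` helper are hypothetical stand-ins, not the library's actual code. The only assumption carried over from the report is that the element stores its HTML attributes in a dictionary called `attrs`.

```python
class Element:
    """Hypothetical stand-in for a parsed HTML element."""

    def __init__(self, attrs):
        # HTML attributes live in `attrs`; there is no `attr` attribute.
        self.attrs = attrs


def has_classes(element, required):
    """True if every required class name is present, ignoring case."""
    present = {s.lower() for s in element.attrs.get("class", [])}
    return set(required).issubset(present)


e = Element({"class": ["Nav", "Active"]})
matches_nav = has_classes(e, {"nav"})
matches_footer = has_classes(e, {"footer"})
```

With this stand-in, reading `e.attr` instead of `e.attrs` raises an `AttributeError`, because no such attribute exists.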

ParkerD559 commented Oct 12, 2019

Despite the documentation here stating:

You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the tokens property.

This fails:

from textblob import TextBlob
from nltk.tokenize import TweetTokenizer

blob = TextBlob("I don't work!", tokenizer=T
ludwig
BenMacKenzie commented Mar 12, 2019

The diagram in the documentation suggests yes, but num_fc_layers and fc_layers are not listed as available parameters, as they are for, e.g., parallel cnn or stacked cnn.

Based on a few experiments, it does not seem to be supported. However, I am using the RNN encoder inside a sequence combiner, so possibly this is causing problems.

For example, this does not seem to add any fc_layers:

co

mzchtx commented Jul 24, 2019

Environment

  • system: centos6u3
  • python: 2.7.14
  • paddle: 1.5.1.post97
  • nccl: 2.2.13

Problem

When training ResNet50 with FP16, adding label_smooth raises an error. The code is as follows:

out = ResNet50().net(input=image, class_dim=class_dim)

epsilon = 0.1
one_hot_label = fluid.layers.one_hot(input=label, depth=class_dim)
smooth_label = fluid.layers.label_smooth(label=one_hot_label, epsilon=epsilon)
cost = fluid.laye
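
For readers unfamiliar with the operation involved, label smoothing with a uniform prior can be sketched in plain Python as below. This is an illustration of the general technique under that assumption, not Paddle's implementation, and it does not reproduce the FP16 error. The helper names mirror the calls in the snippet but are defined here from scratch.

```python
def one_hot(label, depth):
    """Encode an integer class label as a one-hot list of floats."""
    return [1.0 if i == label else 0.0 for i in range(depth)]


def label_smooth(one_hot_label, epsilon):
    """Blend the one-hot target with a uniform distribution over classes."""
    num_classes = len(one_hot_label)
    return [(1.0 - epsilon) * p + epsilon / num_classes
            for p in one_hot_label]


# Class 2 out of 4 classes, smoothed with epsilon = 0.1 as in the report.
smoothed = label_smooth(one_hot(2, 4), epsilon=0.1)
```

The smoothed target still sums to 1. The true class keeps most of the probability mass and the remainder is spread evenly over all classes.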
heatherbshapiro commented Aug 14, 2019

Description

There is a README for the repo metrics subfolder under tools. It needs a general review to make sure it is accurate.

Other Comments

Principles of NLP Documentation
Each landing page at the folder level should have a ReadMe which explains -
○ Summary of what this folder offers.
○ Why and how it benefits users
○ As applicable - Documentation of using it, brief d

Created by Alan Turing

Wikipedia