natural-language-processing
Natural language processing (NLP) is a field of computer science that studies interactions between computers and human language. In the 1950s, Alan Turing published an article proposing a measure of intelligence, now called the Turing test. More recently, techniques such as deep learning have produced strong results in language modeling, parsing, and other natural-language tasks.
Here are 5,064 public repositories matching this topic...
Google has started using BERT in its search engine. I imagine it creates embeddings for the query, then computes some kind of similarity measure against the candidate websites/pages, and finally ranks them in the search results.
I am curious how they create embeddings for the documents (the candidate websites/pages), if at all. Or am I interpreting it wrong?
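For what it's worth, the ranking step being described can be sketched with plain cosine similarity over hypothetical precomputed vectors. This is speculation about the pipeline shape, not Google's actual system:

```python
import math

def cosine(u, v):
    # Cosine similarity between a query embedding and a document embedding.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(query_vec, doc_vecs):
    # Rank candidate documents by similarity to the query.
    # `doc_vecs` are hypothetical pre-computed document embeddings.
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine(query_vec, doc_vecs[i]),
                  reverse=True)
```

In practice the document side is the open question the comment raises: whether pages get their own BERT-style embeddings offline, or whether query and page are scored jointly at query time.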
What makes HanLP different from the majority of OSS projects?
One of the most important factors would be the large-scale professional corpora, and knowing the correct way to make use of them.
Having some unique pretrained LMs ready before releasing the beta version would be a cool idea. Don't you think so?
Updated Mar 4, 2020 - Python
I was going through the existing enhancement issues again and thought it'd be nice to collect ideas for spaCy plugins and related projects. There are always people in the community who are looking for new things to build, so here's some inspiration.
If you have questions about the projects I suggested,
Tutorial: Similarity Queries
https://radimrehurek.com/gensim/auto_examples/core/run_similarity_queries.html#sphx-glr-auto-examples-core-run-similarity-queries-py
Notice the document order in the tutorial:
documents = [
"Human machine interface for lab abc computer applications",
"A survey of user opinion of computer system response time",
"The EPS user interface management
This output is unexpected: PorterStemmer returns the capitalized 'In' unchanged.
>>> from nltk.stem import PorterStemmer
>>> porter = PorterStemmer()
>>> porter.stem('In')
'In'
More details on https://stackoverflow.com/q/60387288/610569
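A simple workaround, assuming the stemmer's rules only fire on lowercase input (which is consistent with the behavior reported above), is to normalize case before stemming:

```python
def stem_token(stemmer, token):
    # Workaround sketch: Porter's suffix rules assume lowercase input,
    # so lowercase the token first. `stemmer` is any object with a
    # .stem() method, e.g. nltk's PorterStemmer.
    return stemmer.stem(token.lower())
```

Whether to restore the original casing afterwards depends on the downstream task.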
Train a simple NER tagger for Swedish, trained for instance on this dataset.
For this task, we need to adapt the NLPTaskDataFetcher for the appropriate Swedish dataset and train a simple model using Swedish word embeddings. How to train a model is [illustrated here](https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_TRAI
Describe the bug
Calling Predictor.get_gradients() returns an empty dictionary
To Reproduce
I am replicating the binary sentiment classification task described in the paper 'Attention is not Explanation' (Jain and Wallace 2019 - https://arxiv.org/pdf/1902.10186.pdf).
My first experiment is on the Stanford Sentiment TreeBank Dataset. I need to measure the correlation between th
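The paper's correlation measure between importance scorings (e.g. attention weights vs. gradient-based attributions) is Kendall's tau; a naive pure-Python sketch, without tie handling:

```python
from itertools import combinations

def kendall_tau(a, b):
    # Rank correlation between two importance scorings of the same
    # tokens. Naive O(n^2) version; libraries like scipy provide a
    # faster, tie-aware implementation (scipy.stats.kendalltau).
    pairs = list(combinations(range(len(a)), 2))
    conc = sum(1 for i, j in pairs if (a[i] - a[j]) * (b[i] - b[j]) > 0)
    disc = sum(1 for i, j in pairs if (a[i] - a[j]) * (b[i] - b[j]) < 0)
    return (conc - disc) / len(pairs)
```

That said, the blocking bug here is upstream: with `Predictor.get_gradients()` returning an empty dict, there are no gradient attributions to correlate in the first place.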
There are a couple of comments in RasaHQ/rasa#5266 that are valid and still open. We should address those.
In general:
- add better descriptions to our model parameters, explaining what they are and when to modify them
- have proper suggestions on how to configure certain model parameters
- explain when to use what component and policy
There are too many courses on the list; a rating on each course would help many people.
We can use GitHub issues to rate each course.
Line 3756: should be attrs instead of attr
Current: if self.classes.issubset(set([s.lower() for s in e.attr.get("class", [])])) is False:
Should be: if self.classes.issubset(set([s.lower() for s in e.attrs.get("class", [])])) is False:
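The fix makes sense given that bs4 tags expose attributes via `.attrs` (a dict); `.attr` does not exist, which is why the original line fails. The corrected check, isolated as a plain-dict sketch:

```python
def has_all_classes(attrs, required):
    # `attrs` plays the role of a bs4 Tag's .attrs dict; the "class"
    # value is a list of class names. Returns True when every required
    # class appears (case-insensitively) on the element.
    return set(required).issubset({s.lower() for s in attrs.get("class", [])})
```

Note the original code also compares `... is False` rather than using `not ...`; that works for a genuine `bool`, but `not` would be the more idiomatic form.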
As per the StanfordCoreNLP documentation for CoreLabel, the functions after() and before() should return the whitespace strings between the token and the next/previous token, respectively.
However, they always return an empty string, even when there is whitespace, when the tokenizer option **normalizeOth
Prerequisites
Please fill in by replacing [ ] with [x].
- [ ] Are you running the latest bert-as-service?
- [ ] Did you follow the installation and the usage instructions in README.md?
- [ ] Did you check the [FAQ list in README.md](https://github.com/hanxiao/bert-as-se
Despite the documentation here stating:
You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the tokens property.
This fails:
from textblob import TextBlob
from nltk.tokenize import TweetTokenizer
blob = TextBlob("I don't work!", tokenizer=T
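For reference, TextBlob's `tokenizer` argument expects an nltk-style object exposing a `tokenize` method (that interface is the assumption here). A minimal stand-in that can be tried in place of TweetTokenizer while the bug is open:

```python
import re

class SimpleTokenizer:
    # Minimal nltk-style tokenizer: any object with a tokenize() method.
    # Hypothetical stand-in for debugging; not part of TextBlob or NLTK.
    def tokenize(self, text):
        # Keep contractions together, split punctuation off as tokens.
        return re.findall(r"[\w']+|[^\w\s]", text)
```

If this stand-in also fails when passed to the TextBlob constructor, that would point at the constructor path rather than the tokenizer object.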
The diagram in the documentation suggests yes, but num_fc_layers and fc_layers are not listed as available parameters, as they are for e.g. parallel cnn or stacked cnn.
It does not seem to be supported based on a few experiments; however, I am using the RNN encoder inside a sequence combiner, so possibly that is causing problems.
for example, this does not seem to add any fc_layers:
co
Using the data points (0,3), (1,4), (2,5), illustrate the fact that "c" (the y-intercept) is locked at zero. Flip the sign and the resulting line tracks the points.
https://github.com/lazyprogrammer/machine_learning_examples/blob/master/best_fit_line.py
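For reference, ordinary least squares *with* an intercept recovers the line through those points exactly (y = x + 3). A minimal closed-form sketch, no external libraries:

```python
def fit_line(xs, ys):
    # Ordinary least squares with an intercept term:
    #   m = cov(x, y) / var(x)
    #   c = mean(y) - m * mean(x)
    # A fit that forces c = 0 cannot recover (0,3), (1,4), (2,5).
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    c = my - m * mx
    return m, c
```

Fitting the three points above yields slope 1 and intercept 3, which is what the report says the script fails to produce.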
Environment
- system: centos6u3
- python: 2.7.14
- paddle: 1.5.1.post97
- nccl: 2.2.13
Problem
Training ResNet50 with FP16 raises an error when label_smooth is added; the code is as follows:
out = ResNet50().net(input=image, class_dim=class_dim)
epsilon = 0.1
one_hot_label = fluid.layers.one_hot(input=label, depth=class_dim)
smooth_label = fluid.layers.label_smooth(label=one_hot_label, epsilon=epsilon)
cost = fluid.laye
The integration of responsive voice uses an older version of my package https://github.com/OpenJarbas/py_responsivevoice.
We should either pin an older version of the package / log a warning, or update the TTS engine to support all the new voices.
Description
There is a readme for the repo metrics subfolder under tools. It needs a general review to make sure it is accurate.
Other Comments
Principles of NLP Documentation
Each landing page at the folder level should have a ReadMe which explains:
- Summary of what this folder offers.
- Why and how it benefits users.
- As applicable: documentation of using it, brief d
I guess I should be re-sampling tokenizations of the training data with SP before each epoch, but it would be nice to see a canonical implementation of this in $FRAMEWORK.
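Absent a canonical version, the loop shape being described can be sketched framework-agnostically. Here `sample_tokenize` is a hypothetical stand-in for a sampling tokenizer (e.g. a SentencePiece model with sampled encoding enabled):

```python
def resample_tokenizations(corpus, sample_tokenize, epochs):
    # Subword-regularization sketch: re-tokenize the training data
    # before each epoch so the model sees different segmentations of
    # the same sentences across epochs.
    for _ in range(epochs):
        yield [sample_tokenize(sent) for sent in corpus]
```

Each yielded list would then feed one epoch of training instead of a single fixed tokenization computed up front.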
Created by Alan Turing
- Wikipedia
Please provide a barebones "pick up and go" GPT-2 colab notebook for text generation, just like gpt-2-simple does