natural-language-processing
Natural language processing (NLP) is a field of computer science that studies interactions between computers and human language. In the 1950s, Alan Turing published an article proposing a measure of intelligence, now called the Turing test. More recently, techniques such as deep learning have produced strong results in language modeling, parsing, and other natural-language tasks.
Here are 5,064 public repositories matching this topic...
Google has started using BERT in its search engine. I imagine it creates embeddings for the query, then computes some kind of similarity measure against the candidate websites/pages, and finally ranks them in the search results.
I am curious how they create embeddings for the documents (the candidate websites/pages), if at all. Or am I interpreting it wrong?
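For what it's worth, the ranking step being described can be sketched with plain cosine similarity over hypothetical precomputed vectors. This is speculation about the pipeline shape, not Google's actual system:

```python
import math

def cosine(u, v):
    # Cosine similarity between a query embedding and a document embedding.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(query_vec, doc_vecs):
    # Rank candidate documents by similarity to the query.
    # `doc_vecs` are hypothetical pre-computed document embeddings.
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine(query_vec, doc_vecs[i]),
                  reverse=True)
```

In practice the document side is the open question the comment raises: whether pages get their own BERT-style embeddings offline, or whether query and page are scored jointly at query time.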
What makes HanLP different from the majority of OSS projects?
One of the most important factors would be the large-scale professional corpora, and knowing the correct way to make use of them.
Having some unique pretrained LMs ready before releasing the beta version would be a cool idea. Don't you think so?
Updated Mar 4, 2020 - Python
I was going through the existing enhancement issues again and thought it'd be nice to collect ideas for spaCy plugins and related projects. There are always people in the community who are looking for new things to build, so here's some inspiration.
If you have questions about the projects I suggested,
Tutorial: Similarity Queries
https://radimrehurek.com/gensim/auto_examples/core/run_similarity_queries.html#sphx-glr-auto-examples-core-run-similarity-queries-py
Notice the document order in the tutorial:
documents = [
"Human machine interface for lab abc computer applications",
"A survey of user opinion of computer system response time",
"The EPS user interface management
This output is unexpected: PorterStemmer returns the capitalized 'In' unchanged.
>>> from nltk.stem import PorterStemmer
>>> porter = PorterStemmer()
>>> porter.stem('In')
'In'
More details on https://stackoverflow.com/q/60387288/610569
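A simple workaround, assuming the stemmer's rules only fire on lowercase input (which is consistent with the behavior reported above), is to normalize case before stemming:

```python
def stem_token(stemmer, token):
    # Workaround sketch: Porter's suffix rules assume lowercase input,
    # so lowercase the token first. `stemmer` is any object with a
    # .stem() method, e.g. nltk's PorterStemmer.
    return stemmer.stem(token.lower())
```

Whether to restore the original casing afterwards depends on the downstream task.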
Train a simple NER tagger for Swedish, trained for instance on this dataset.
For this task, we need to adapt the NLPTaskDataFetcher for the appropriate Swedish dataset and train a simple model using Swedish word embeddings. How to train a model is [illustrated here](https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_TRAI
Describe the bug
Calling Predictor.get_gradients() returns an empty dictionary
To Reproduce
I am replicating the binary sentiment classification task described in the paper 'Attention is not Explanation' (Jain and Wallace 2019 - https://arxiv.org/pdf/1902.10186.pdf).
My first experiment is on the Stanford Sentiment TreeBank Dataset. I need to measure the correlation between th
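The paper's correlation measure between importance scorings (e.g. attention weights vs. gradient-based attributions) is Kendall's tau; a naive pure-Python sketch, without tie handling:

```python
from itertools import combinations

def kendall_tau(a, b):
    # Rank correlation between two importance scorings of the same
    # tokens. Naive O(n^2) version; libraries like scipy provide a
    # faster, tie-aware implementation (scipy.stats.kendalltau).
    pairs = list(combinations(range(len(a)), 2))
    conc = sum(1 for i, j in pairs if (a[i] - a[j]) * (b[i] - b[j]) > 0)
    disc = sum(1 for i, j in pairs if (a[i] - a[j]) * (b[i] - b[j]) < 0)
    return (conc - disc) / len(pairs)
```

That said, the blocking bug here is upstream: with `Predictor.get_gradients()` returning an empty dict, there are no gradient attributions to correlate in the first place.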
There are a couple of comments in RasaHQ/rasa#5266 that are valid and still open. We should address those.
In general:
- add better descriptions to our model parameters, explaining what they are and when to modify them
- have proper suggestions on how to configure certain model parameters
- explain when to use what component and policy
There are too many courses on the list; a rating on each course would help many people.
We can use GitHub issues to rate each course.
Line 3756: should be attrs instead of attr
Current: if self.classes.issubset(set([s.lower() for s in e.attr.get("class", [])])) is False:
Should be: if self.classes.issubset(set([s.lower() for s in e.attrs.get("class", [])])) is False:
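The fix makes sense given that bs4 tags expose attributes via `.attrs` (a dict); `.attr` does not exist, which is why the original line fails. The corrected check, isolated as a plain-dict sketch:

```python
def has_all_classes(attrs, required):
    # `attrs` plays the role of a bs4 Tag's .attrs dict; the "class"
    # value is a list of class names. Returns True when every required
    # class appears (case-insensitively) on the element.
    return set(required).issubset({s.lower() for s in attrs.get("class", [])})
```

Note the original code also compares `... is False` rather than using `not ...`; that works for a genuine `bool`, but `not` would be the more idiomatic form.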
As per the StanfordCoreNLP documentation for CoreLabel, the functions after() and before() should return the whitespace strings between the token and the next/previous token, respectively.
However, they always return an empty string, even when there is whitespace, when the tokenizer option **normalizeOth
Prerequisites
Please fill in by replacing [ ] with [x].
- [ ] Are you running the latest bert-as-service?
- [ ] Did you follow the installation and the usage instructions in README.md?
- [ ] Did you check the [FAQ list in README.md](https://github.com/hanxiao/bert-as-se
Despite the documentation here stating:
You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the tokens property.
This fails:
from textblob import TextBlob
from nltk.tokenize import TweetTokenizer
blob = TextBlob("I don't work!", tokenizer=T
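For reference, TextBlob's `tokenizer` argument expects an nltk-style object exposing a `tokenize` method (that interface is the assumption here). A minimal stand-in that can be tried in place of TweetTokenizer while the bug is open:

```python
import re

class SimpleTokenizer:
    # Minimal nltk-style tokenizer: any object with a tokenize() method.
    # Hypothetical stand-in for debugging; not part of TextBlob or NLTK.
    def tokenize(self, text):
        # Keep contractions together, split punctuation off as tokens.
        return re.findall(r"[\w']+|[^\w\s]", text)
```

If this stand-in also fails when passed to the TextBlob constructor, that would point at the constructor path rather than the tokenizer object.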
The diagram in the documentation suggests yes, but num_fc_layers and fc_layers are not listed as available parameters, as they are for e.g. parallel cnn or stacked cnn.
It does not seem to be supported based on a few experiments; however, I am using the RNN encoder inside a sequence combiner, so possibly that is causing problems.
for example, this does not seem to add any fc_layers:
co
Using the data points (0,3), (1,4), (2,5), illustrate the fact that "c" (the y-intercept) is locked at zero. Flip the sign and the resulting line tracks the points.
https://github.com/lazyprogrammer/machine_learning_examples/blob/master/best_fit_line.py
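For reference, ordinary least squares *with* an intercept recovers the line through those points exactly (y = x + 3). A minimal closed-form sketch, no external libraries:

```python
def fit_line(xs, ys):
    # Ordinary least squares with an intercept term:
    #   m = cov(x, y) / var(x)
    #   c = mean(y) - m * mean(x)
    # A fit that forces c = 0 cannot recover (0,3), (1,4), (2,5).
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    c = my - m * mx
    return m, c
```

Fitting the three points above yields slope 1 and intercept 3, which is what the report says the script fails to produce.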
Environment
- system: centos6u3
- python: 2.7.14
- paddle: 1.5.1.post97
- nccl: 2.2.13
Problem
Training ResNet50 with FP16 raises an error when label_smooth is added; the code is as follows:
out = ResNet50().net(input=image, class_dim=class_dim)
epsilon = 0.1
one_hot_label = fluid.layers.one_hot(input=label, depth=class_dim)
smooth_label = fluid.layers.label_smooth(label=one_hot_label, epsilon=epsilon)
cost = fluid.laye
The integration of responsive voice uses an older version of my package https://github.com/OpenJarbas/py_responsivevoice.
We should either pin an older version of the package / log a warning, or update the TTS engine to support all the new voices.
Description
There is a readme for the repo metrics subfolder under tools. It needs a general review to make sure it is accurate.
Other Comments
Principles of NLP Documentation
Each landing page at the folder level should have a ReadMe which explains:
- Summary of what this folder offers.
- Why and how it benefits users.
- As applicable: documentation of using it, brief d
I guess I should be re-sampling tokenizations of the training data with SP before each epoch, but it would be nice to see a canonical implementation of this in $FRAMEWORK.
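Absent a canonical version, the loop shape being described can be sketched framework-agnostically. Here `sample_tokenize` is a hypothetical stand-in for a sampling tokenizer (e.g. a SentencePiece model with sampled encoding enabled):

```python
def resample_tokenizations(corpus, sample_tokenize, epochs):
    # Subword-regularization sketch: re-tokenize the training data
    # before each epoch so the model sees different segmentations of
    # the same sentences across epochs.
    for _ in range(epochs):
        yield [sample_tokenize(sent) for sent in corpus]
```

Each yielded list would then feed one epoch of training instead of a single fixed tokenization computed up front.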
Created by Alan Turing
- Wikipedia
Please provide a barebones "pick up and go" GPT-2 colab notebook for text generation, just like gpt-2-simple does