📚 A practical approach to machine learning to enable everyone to learn, explore and build.
Updated Jan 13, 2020 - Jupyter Notebook
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact. In the 1950s, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
AiLearning: Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP)
This is a documentation bug. In the TransfoXL documentation, the tokenization example is wrong. The snippet goes:
import torch
from transformers import TransfoXLTokenizer

tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
...
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)  # Batch size 1
This code outputs
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification via one unified interface.
I was going through the existing enhancement issues again and thought it'd be nice to collect ideas for spaCy plugins and related projects. There are always people in the community who are looking for new things to build, so here's some inspiration.
If you have questions about the projects I suggested,
Oxford Deep NLP 2017 course
Your new Mentor for Data Science E-Learning.
The usage example in the word2vec.py doc-comment regarding KeyedVectors uses inconsistent paths and thus doesn't work.
If vectors were saved to a tm
:book: A curated list of resources dedicated to Natural Language Processing (NLP)
In the README.md of the stanford-tensorflow-tutorials/assignments/chatbot/ directory, the hyperlink https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/ is no longer working.
The same problem is duplicated in the chatbot.py comments.
When normalizing this text:
"the guest-singer mr. smith who was supposed to show up at at seven thirty didn't."
The output received is
the guest singer Mr. Smith who was supposed to show up at at 7 30 did not
Expected output
the guest singer Mr. Smith who was supposed to show up at at 7 30 did not.
Sample code
var doc = nlp("the guest-singer mr. sm
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
The latest versions of Python are stricter about escape sequences in regexes.
For instance with 3.6.8, there are 10+ warnings like this one:
...
lib/python3.6/site-packages/nltk/featstruct.py:2092: DeprecationWarning: invalid escape sequence \d
RANGE_RE = re.compile('(-?\d+):(-?\d+)')
The regex(es) should be updated to silence these warnings.
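The warnings go away once the pattern is written as a raw string; a minimal sketch of the fix for the quoted line (same pattern, only the string literal changes):

```python
import re

# Python 3.6+ flags unrecognized backslash escapes like "\d" in normal
# string literals with a DeprecationWarning. Prefixing the pattern with
# r"" makes it a raw string, which silences the warning without changing
# the compiled regex.
RANGE_RE = re.compile(r'(-?\d+):(-?\d+)')

print(RANGE_RE.match('3:-7').groups())  # → ('3', '-7')
```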
My feature request is to include an option, on a button made from the choice skill, to redirect to an external URL...
Here's a detailed explanation including screenshots.
This option would be really beneficial for choice-skill buttons since, at the moment, you can only add an ext
Train a simple NER tagger for Swedish, for instance over this dataset.
For this task, we need to adapt the NLPTaskDataFetcher to the appropriate Swedish dataset and train a simple model using Swedish word embeddings. How to train a model is [illustrated here](https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_TRAI
Question
Right now, the installation section tells you how to install, but not how to upgrade an installed version. We should add that info to the documentation. Inspired by this forum post.
As per the StanfordCoreNLP documentation for CoreLabel, the functions after() and before() should return the whitespace strings between the token and the next/previous token, respectively.
However, they always return an empty string, even when there is some whitespace, when the tokenizer option **normalizeOth
Despite the documentation here stating:
You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the tokens property.
This fails:
from textblob import TextBlob
from nltk.tokenize import TweetTokenizer
blob = TextBlob("I don't work!", tokenizer=T
bert-as-service
All kinds of text classification models and more with deep learning
Hi, I would like to propose a better implementation for 'test_indices':
We can remove the unneeded np.array cast.
Cleaner/New:
test_indices = list(set(range(len(texts))) - set(train_indices))
Old:
test_indices = np.array(list(set(range(len(texts))) - set(train_indices)))
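A quick self-contained check (with a toy texts list and train split standing in for the real data) confirms the two expressions select the same indices, so the np.array cast can be dropped:

```python
import numpy as np

texts = ["a", "b", "c", "d", "e"]  # toy stand-in for the real corpus
train_indices = [0, 2]             # toy train split

# Old version with the extra cast, and the proposed plain-list version.
old = np.array(list(set(range(len(texts))) - set(train_indices)))
new = list(set(range(len(texts))) - set(train_indices))

# Both select exactly the indices not in the train split.
assert sorted(new) == sorted(old.tolist())
print(sorted(new))  # → [1, 3, 4]
```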
Natural Language Processing Tutorial for Deep Learning Researchers
Hi, can the batchify method only batch one doc per file, not two docs in the same file? Why isn't the EOD flag used to distinguish different docs in data_utils.py?
Large Scale Chinese Corpus for NLP
TensorFlow 2.x tutorials and examples, including CNN, RNN, GAN, auto-encoder, Faster R-CNN, GPT, and BERT examples. Introductory example code and hands-on tutorials for TF 2.0.
Extract Keywords from sentence or Replace keywords in sentences.
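The extract/replace idea can be sketched in plain Python; this is a naive stand-in (real keyword libraries use a trie so matching cost doesn't grow with vocabulary size), and the example phrases and mappings are made up:

```python
# Map each keyword phrase to its canonical replacement.
keywords = {"big apple": "New York", "machine learning": "ML"}

def extract_keywords(text, mapping):
    # Return the canonical form of every phrase found in the text.
    return [repl for phrase, repl in mapping.items() if phrase in text]

def replace_keywords(text, mapping):
    # Replace each phrase with its canonical form.
    for phrase, repl in mapping.items():
        text = text.replace(phrase, repl)
    return text

s = "I love the big apple and machine learning"
print(extract_keywords(s, keywords))  # → ['New York', 'ML']
print(replace_keywords(s, keywords))  # → 'I love the New York and ML'
```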
Google has started using BERT in its search engine. I imagine it creates embeddings for the query, then computes some similarity measure against the candidate websites/pages, and finally ranks them in the search results.
I am curious how they create embeddings for the documents (the candidate websites/pages), if at all. Or am I interpreting it wrong?
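The ranking idea described in the question can be illustrated with a toy sketch, assuming generic dense embeddings compared by cosine similarity (this is not Google's actual pipeline; the vectors and page names below are made up):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for a query embedding and two candidate-page embeddings.
query = np.array([0.9, 0.1, 0.0])
pages = {
    "page_a": np.array([0.8, 0.2, 0.1]),  # close to the query
    "page_b": np.array([0.1, 0.9, 0.3]),  # far from the query
}

# Rank candidate pages by similarity to the query, highest first.
ranked = sorted(pages, key=lambda p: cosine(query, pages[p]), reverse=True)
print(ranked)  # → ['page_a', 'page_b']
```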