📚 A practical approach to machine learning.
Updated Jan 27, 2020 - Jupyter Notebook
Natural language processing (NLP) is a field of computer science that studies the interactions between computers and human language. In the 1950s, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have achieved strong results in language modeling, parsing, and many other natural-language tasks.
AiLearning: Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP)
Google has started using BERT in its search engine. I imagine it creates an embedding for the search query, then computes some similarity measure against the candidate websites/pages, and finally ranks them in the search results.
I am curious how they create embeddings for the documents (the candidate websites/pages), if at all. Or am I interpreting this wrong?
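The ranking idea being guessed at above can be sketched as follows. Everything here is illustrative: the toy vectors stand in for BERT embeddings, and cosine similarity is one plausible similarity measure, not Google's actual pipeline.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings; in the scenario described, these would come from BERT.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "page_a": [0.8, 0.2, 0.1],
    "page_b": [0.1, 0.9, 0.3],
}

# Rank candidate pages by similarity to the query embedding.
ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                reverse=True)
```

The open question in the post still stands: this sketch assumes each document already has a precomputed embedding, which is exactly the part being asked about.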
What makes HanLP different from the majority of OSS projects?
One of the most important factors is its large-scale professional corpora, and the correct way of making use of them.
Shipping a unique pretrained LM before releasing the beta version would be a cool idea. Don't you think so?
I got a CoNLL-U file from my university in which the head column is filled with .
Processing such a file with the cli.convert method results in an int cast error at
https://github.com/explosion/spaCy/blob/master/spacy/cli/converters/conllu2json.py line 73,
in the read_conllx method (head = (int(head) - 1) if head != "0" else id).
In the format documentation on https://universaldependencie
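A possible guard for the failing cast could look like the sketch below. The helper name and the None convention are assumptions for illustration, not spaCy's actual fix; the point is only to check that HEAD is numeric before calling int().

```python
def parse_head(head, token_id):
    # CoNLL-U files can carry non-numeric HEAD values (e.g. "_");
    # guard before calling int() instead of crashing.
    if not head.lstrip("-").isdigit():
        return None  # no usable head for this token
    # Mirrors the expression from read_conllx: 0 means "root",
    # which here maps back to the token's own id.
    return (int(head) - 1) if head != "0" else token_id
```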
Oxford Deep NLP 2017 course
Your new Mentor for Data Science E-Learning.
The usage example in the word2vec.py doc-comment regarding KeyedVectors uses inconsistent paths and thus doesn't work.
If vectors were saved to a tm
:book: A curated list of resources dedicated to Natural Language Processing (NLP)
In the README.md of the stanford-tensorflow-tutorials/assignments/chatbot/ directory, the hyperlink https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/ is currently broken.
The same problem is duplicated in the chatbot.py comments.
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
The latest versions of Python are stricter about escape sequences in regex patterns.
For instance, with 3.6.8 there are 10+ warnings like this one:
...
lib/python3.6/site-packages/nltk/featstruct.py:2092: DeprecationWarning: invalid escape sequence \d
RANGE_RE = re.compile('(-?\d+):(-?\d+)')
The regex(es) should be updated to silence these warnings.
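The standard fix is to make each pattern a raw string so the backslash reaches the regex engine without triggering a string-escape warning:

```python
import re

# Before (DeprecationWarning under recent Python): '(-?\\d+):(-?\\d+)'
# After: the r prefix passes backslashes through to the regex engine.
RANGE_RE = re.compile(r'(-?\d+):(-?\d+)')

match = RANGE_RE.match('-3:7')
```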
My feature request is to include an option, on a button made from the choice skill, to redirect to an external URL...
Here's a detailed explanation, including screenshots.
This option would be really beneficial for choice-skill buttons since, at the moment, you can only add an ext
#3602 switched our docs from reStructuredText to Markdown, which is a big improvement. However, there are some leftover traces of RST formatting in the docstrings. It would be great if we could comb through these and update them.
Right now, the installation section tells you how to install, but not how to upgrade an installed version. We should add that info to the documentation. Inspired by this forum post.
I love the cool visualization of NER results at http://corenlp.run/
I want to change some parts of it but don't know whether there are any API docs.
For example, to highlight only certain kinds of NER labels.
Thanks for your reply~
Despite the documentation here stating:
You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the tokens property.
This fails:
from textblob import TextBlob
from nltk.tokenize import TweetTokenizer
blob = TextBlob("I don't work!", tokenizer=TweetTokenizer())
bert-as-service
All kinds of text classification models and more with deep learning
Hi, I would like to propose a better implementation for test_indices:
We can remove the unneeded np.array casting.
Cleaner/New:
test_indices = list(set(range(len(texts))) - set(train_indices))
Old:
test_indices = np.array(list(set(range(len(texts))) - set(train_indices)))
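On toy data (the texts and split below are illustrative), the plain set-difference list already yields the held-out indices; the extra np.array round-trip adds nothing:

```python
texts = ["t0", "t1", "t2", "t3", "t4"]
train_indices = [0, 2]

# Proposed version: everything not in train_indices, as a plain list.
test_indices = list(set(range(len(texts))) - set(train_indices))
```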
Natural Language Processing Tutorial for Deep Learning Researchers
I noticed that in run_classifier.py, line 409, your code shuffles the input examples.
In the predicting phase, however, we actually need an ordered input sequence for online submission.
You may want to add a flag here.
tf.logging.info("Create new tfrecord {}.".format(output_file))
writer = tf.python_io.TFRecordWriter(output_file)
TensorFlow 2.x version's Tutorials and Examples, including CNN, RNN, GAN, Auto-Encoders, FasterRCNN, GPT, BERT examples, etc. (TF 2.x introductory example code and hands-on tutorials.)
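The shuffle-flag suggestion above can be sketched as follows. The --do_shuffle name is hypothetical, and stdlib argparse stands in for TensorFlow's flags module purely for illustration:

```python
import argparse
import random

parser = argparse.ArgumentParser()
# Hypothetical flag: shuffle only when training, so that prediction
# output stays aligned with the input order for online submission.
parser.add_argument("--do_shuffle", action="store_true")
args = parser.parse_args(["--do_shuffle"])

examples = list(range(5))
if args.do_shuffle:
    random.shuffle(examples)  # training path: mix examples
# prediction path: with the flag absent, order is left untouched
```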
Large Scale Chinese Corpus for NLP (大规模中文自然语言处理语料)
Extract keywords from sentences or replace keywords in sentences.
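A toy stand-in for the extract/replace idea, using naive whitespace matching (the library itself uses a trie-based matcher for speed over large keyword sets; both function names here are illustrative):

```python
def extract_keywords(text, keywords):
    # Return every whitespace-delimited token that is a known keyword.
    return [w for w in text.split() if w in keywords]

def replace_keywords(text, mapping):
    # Swap each keyword token for its replacement; leave others as-is.
    return " ".join(mapping.get(w, w) for w in text.split())
```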
Created by Alan Turing
I am going through the GPT-2 example in the docs. Is there a mistake in the "Using the past" code? The main loop to generate text is:
At the first iteration, the tensor output has shape [1,