
bigrams

Here are 78 public repositories matching this topic...

ez-sherlock commented Mar 21, 2022

Right now the tokenize() function splits the text whenever a '.' character is found. Most of the time this is a correct way to split a file into sentences, but abbreviations such as Dr., Mr., Mrs., etc. sometimes appear in the middle of a sentence, and the sentence is then split right there. I want to enhance the regex so that it does not split sentences on abbreviations.

Labels: enhancement, help wanted, good first issue
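One possible approach to the abbreviation problem in the issue, sketched in Python under assumptions: the project's actual tokenize() is not shown, so `split_sentences` and the `ABBREVIATIONS` set below are hypothetical. Rather than one large regex, the sketch finds candidate boundaries (`.`, `!` or `?` followed by whitespace) and skips any boundary whose last token is a known abbreviation. This avoids Python's restriction that look-behind patterns must be fixed-width.

```python
import re

# Hypothetical, lowercase list of abbreviations that should NOT end a sentence.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "st.", "etc.", "e.g.", "i.e."}


def split_sentences(text: str) -> list[str]:
    """Split on '.', '!' or '?' followed by whitespace, except after abbreviations."""
    sentences = []
    start = 0
    # Candidate boundary: terminal punctuation immediately followed by whitespace.
    for match in re.finditer(r"[.!?](?=\s)", text):
        end = match.end()
        # Last whitespace-delimited token before the boundary, e.g. "Dr."
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # abbreviation: the sentence keeps going
        sentences.append(text[start:end].strip())
        start = end
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith met Mr. Jones. They talked about bigrams! Did Mrs. Lee join?"))
```

A known trade-off of this design: an abbreviation that genuinely ends a sentence (for example "etc." as the last word) will not trigger a split, so the abbreviation list should be kept short and domain-specific.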

The goal of this script is to implement three language models to perform sentence completion, i.e. given a sentence with a missing word, to choose the correct word from a list of candidates. To use a language model for this problem, fill the gap with one candidate word at a time and then ask the language model which version of the sentence is the most probable.

  • Updated Apr 6, 2019
  • Python
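The candidate-scoring procedure described above can be sketched with a single bigram model. This is an illustration under assumptions: the repository's three models are not specified, so the add-one (Laplace) smoothed bigram model, the `<s>`/`</s>` padding tokens, the `___` gap marker and all function names here are hypothetical choices.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized, padded sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a whole sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def complete(template, candidates, unigrams, bigrams):
    """Fill the '___' gap with each candidate and keep the most probable sentence."""
    return max(candidates,
               key=lambda w: log_prob(template.replace("___", w), unigrams, bigrams))


corpus = ["the cat sat on the mat", "the dog sat on the rug", "a cat sat on a mat"]
uni, bi = train_bigram(corpus)
print(complete("the ___ sat on the mat", ["cat", "dog", "mat"], uni, bi))
```

Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow on long sentences; smoothing keeps unseen bigrams such as "mat sat" from driving a candidate's score to negative infinity.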

Predicting the next word with Natural Language Processing. Being able to predict which word comes next in a sentence is crucial when writing on portable devices that don't have a full-size keyboard. However, the same techniques used in texting applications can be applied to a variety of other applications, for example: genomics (segmenting DNA sequences), speech recognition, automatic language translation, or even, as one student in the course suggested, music sequence prediction.

  • Updated Mar 26, 2019
  • HTML
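A minimal next-word predictor of the kind described above can be built from bigram frequencies alone. The sketch below is hypothetical (the listed repository is tagged HTML and its method is not shown); it assumes simple lowercase whitespace tokenization and suggests the k words most often seen after the current word.

```python
from collections import Counter, defaultdict


def build_next_word_model(text):
    """Map each word to a Counter of the words observed directly after it."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model


def predict_next(model, word, k=3):
    """Return up to k most frequent successors of `word`; [] if it was never seen."""
    # .get avoids inserting an empty entry into the defaultdict for unseen words.
    successors = model.get(word.lower(), Counter())
    return [w for w, _ in successors.most_common(k)]


model = build_next_word_model("I love NLP I love Python I like tea")
print(predict_next(model, "I", k=2))
```

On a phone keyboard, the returned list would correspond to the suggestion bar; real systems typically back off to unigram frequencies when the current word has no observed successors.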
