An implementation of the CYK parser for a grammar in which each non-terminal's production rules carry probabilities. The probabilities are used to break ties between ambiguous parses and to assign an overall probability to the whole sentence.
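As a rough illustration of the idea, here is a minimal sketch of probabilistic CYK (Viterbi-style). It is not the project's actual code: the function name `pcyk`, the dictionary layout of the grammar, and the toy grammar are all assumptions made for this example. The grammar is assumed to be in Chomsky Normal Form, with `lexical` mapping a word to `(non-terminal, probability)` pairs and `binary` mapping a pair of right-hand-side symbols to `(non-terminal, probability)` pairs.

```python
from collections import defaultdict


def pcyk(words, lexical, binary, start="S"):
    """Return (probability, tree) of the most probable parse, or (0.0, None).

    Hypothetical grammar layout (an assumption of this sketch):
      lexical[word]  -> list of (A, p) for rules A -> word with probability p
      binary[(B, C)] -> list of (A, p) for rules A -> B C with probability p
    """
    n = len(words)
    # best[(i, j)][A] = (probability, backpointer) for the span words[i:j]
    best = defaultdict(dict)

    # Spans of length 1: apply the lexical rules.
    for i, word in enumerate(words):
        for a, p in lexical.get(word, []):
            if p > best[(i, i + 1)].get(a, (0.0, None))[0]:
                best[(i, i + 1)][a] = (p, word)

    # Longer spans: try every split point and every matching binary rule.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for b, (pb, _) in best[(i, k)].items():
                    for c, (pc, _) in best[(k, j)].items():
                        for a, p in binary.get((b, c), []):
                            cand = p * pb * pc
                            # Ambiguity is resolved here: keep only the
                            # highest-probability analysis of A over [i, j).
                            if cand > best[(i, j)].get(a, (0.0, None))[0]:
                                best[(i, j)][a] = (cand, (k, b, c))

    if start not in best[(0, n)]:
        return 0.0, None

    def build(i, j, a):
        """Follow backpointers to rebuild the best tree as nested tuples."""
        _, back = best[(i, j)][a]
        if isinstance(back, str):  # lexical rule: backpointer is the word
            return (a, back)
        k, b, c = back
        return (a, build(i, k, b), build(k, j, c))

    return best[(0, n)][start][0], build(0, n, start)


# Toy grammar, invented for this example only.
toy_lexical = {
    "she": [("NP", 0.5)],
    "fish": [("NP", 0.5)],
    "eats": [("V", 1.0)],
}
toy_binary = {
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}

prob, tree = pcyk("she eats fish".split(), toy_lexical, toy_binary)
print(prob)  # 0.25
print(tree)  # ('S', ('NP', 'she'), ('VP', ('V', 'eats'), ('NP', 'fish')))
```

The sentence probability returned here is that of the single best parse. Summing over all parses instead of taking the maximum would give the total probability of the sentence under the grammar; which of the two the project reports is not stated above.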
A parallel program that parses a string of symbols. The inputs are a context-free grammar G in Chomsky Normal Form and a string of symbols. The program prints yes if the string can be derived by the rules of the grammar and no otherwise.
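One way such a recognizer could be structured is sketched below. This is an assumption-laden illustration, not the project's implementation: the name `cyk_accepts`, the grammar encoding, and the use of a thread pool are all choices made for this example. It relies on the fact that all table cells for spans of the same length depend only on shorter spans, so they can be computed independently. In CPython, threads mainly show the structure of the parallelism rather than deliver a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def cyk_accepts(symbols, unary, binary, start="S", workers=4):
    """Return True if `symbols` is derivable from `start` under a CNF grammar.

    Hypothetical grammar layout (an assumption of this sketch):
      unary[a]       -> set of non-terminals A with a rule A -> a
      binary[(B, C)] -> set of non-terminals A with a rule A -> B C
    """
    n = len(symbols)
    if n == 0:
        return False  # this sketch does not handle a rule S -> epsilon

    # table[i][j] = set of non-terminals deriving symbols[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, sym in enumerate(symbols):
        table[i][i + 1] = set(unary.get(sym, ()))

    def fill(i, length):
        """Compute one cell; it reads only cells for strictly shorter spans."""
        j = i + length
        cell = set()
        for k in range(i + 1, j):
            for b in table[i][k]:
                for c in table[k][j]:
                    cell |= binary.get((b, c), set())
        return i, j, cell

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for length in range(2, n + 1):
            starts = range(n - length + 1)
            # Cells of equal span length are independent, so map them in
            # parallel; results are written back before the next length.
            for i, j, cell in pool.map(fill, starts, [length] * len(starts)):
                table[i][j] = cell

    return start in table[0][n]


# Toy CNF grammar for { a^n b^n : n >= 1 }, invented for this example:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
toy_unary = {"a": {"A"}, "b": {"B"}}
toy_binary = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

print("yes" if cyk_accepts("aabb", toy_unary, toy_binary) else "no")  # yes
print("yes" if cyk_accepts("aab", toy_unary, toy_binary) else "no")   # no
```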
NLP implementations, including information-theoretic measures of distributional similarity, text preprocessing with shell commands, a Naive Bayes text categorization model, and Cocke-Younger-Kasami (CYK) parsing.