Use the GUM treebank for training: https://github.com/UniversalDependencies/UD_English-GUM/blob/master/en_gum-ud-train.conllu
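To get started, you will need to read the training data out of the CoNLL-U file. A minimal sketch of a reader is below; the function name `read_conllu` is my own, and it assumes the standard 10-column CoNLL-U format, where column 2 is the word form and column 4 is the universal POS tag:

```python
def read_conllu(path):
    """Yield sentences as lists of (word, upos) pairs from a CoNLL-U file."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                       # blank line ends a sentence
                if sentence:
                    yield sentence
                    sentence = []
            elif line.startswith("#"):         # comment/metadata line
                continue
            else:
                cols = line.split("\t")
                # skip multiword-token ranges (e.g. "1-2") and empty nodes ("1.1")
                if "-" in cols[0] or "." in cols[0]:
                    continue
                sentence.append((cols[1], cols[3]))  # FORM, UPOS
    if sentence:
        yield sentence
```

Any CoNLL-U library (e.g. `conllu` on PyPI) would also work, but the format is simple enough to parse by hand.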
HMMs are well described in chapter 8, section 8.4, of Jurafsky and Martin's Speech and Language Processing: https://web.stanford.edu/~jurafsky/slp3/8.pdf
Undergrads and graduates: use the equations in section 8.4.3 to implement the emission and transition probabilities. If you want to add smoothing, see equation 3.23 in chapter 3 of the book for both the transition and emission probabilities. Don't forget to add the start-of-sentence <s> token when computing the transition probabilities.
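Concretely, training amounts to counting tag bigrams and (tag, word) pairs and normalizing. Here is a minimal sketch under my own naming (`train_hmm`, `p_trans`, `p_emit` are not from the book); `k=0` gives the unsmoothed maximum-likelihood estimates of 8.4.3, and `k>0` applies add-k smoothing in the style of equation 3.23:

```python
from collections import defaultdict

def train_hmm(sentences, k=0.0):
    """Estimate HMM transition and emission probabilities from tagged
    sentences; k=0 is plain MLE, k>0 adds add-k smoothing."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    tags, vocab = set(), set()
    for sent in sentences:
        prev = "<s>"                               # start-of-sentence token
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    T, V = len(tags), len(vocab)

    def p_trans(prev, tag):                        # P(tag | prev_tag)
        counts = trans[prev]
        return (counts[tag] + k) / (sum(counts.values()) + k * T)

    def p_emit(tag, word):                         # P(word | tag)
        counts = emit[tag]
        return (counts[word] + k) / (sum(counts.values()) + k * V)

    return p_trans, p_emit, tags
```

Returning the probabilities as closures keeps the sketch short; a table-based implementation would work just as well.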
Implement a greedy tagger. At each step, select the tag that maximizes the product of the transition probability (given the previous tag) and the emission probability, then move on; earlier choices are never revised. You don't have to implement the Viterbi algorithm for this part. Think greedy!
Implement the Viterbi tagger as given in section 8.4.5. You need to implement the backpointer table in order to output the best tag sequence.
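A minimal sketch of Viterbi with backpointers, under the same assumed `p_trans`/`p_emit` interface as above; it works in log space to avoid underflow on long sentences (the book's 8.4.5 presents the same recurrence in probability space):

```python
import math

def viterbi_tag(words, p_trans, p_emit, tags):
    """Viterbi decoding: V[i][t] is the best log-prob of any tag sequence
    ending in tag t at position i; back[i][t] records the best predecessor."""
    tags = list(tags)
    lp = lambda p: math.log(p) if p > 0 else float("-inf")
    # initialization: transition out of the start token <s>
    V = [{t: lp(p_trans("<s>", t)) + lp(p_emit(t, words[0])) for t in tags}]
    back = [{}]
    for i, word in enumerate(words[1:], start=1):
        V.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: V[i - 1][p] + lp(p_trans(p, t)))
            V[i][t] = V[i - 1][prev] + lp(p_trans(prev, t)) + lp(p_emit(t, word))
            back[i][t] = prev
    # termination: pick the best final tag, then follow backpointers
    seq = [max(tags, key=lambda t: V[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return seq[::-1]
```

Comparing this output against your greedy tagger on a few sentences is a good sanity check: Viterbi's sequence score should never be worse.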
Reading: see appendix section A.4 for worked-out examples of the Viterbi algorithm.
Don't hesitate to contact me with questions about your code. Best of luck.
Test your tagger on the test dataset here: https://github.com/UniversalDependencies/UD_English-GUM/blob/master/en_gum-ud-test.conllu
What are the accuracy and F-scores of your tagger? You can use sklearn's metrics module to compute them.
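For example, if you flatten the gold and predicted tags across all test sentences into two parallel lists, sklearn computes both metrics directly (the tag lists below are illustrative toy data):

```python
from sklearn.metrics import accuracy_score, f1_score

# gold and predicted tags, flattened across all test sentences (toy data)
gold = ["DET", "NOUN", "VERB", "DET", "NOUN"]
pred = ["DET", "NOUN", "VERB", "DET", "VERB"]

acc = accuracy_score(gold, pred)
macro_f1 = f1_score(gold, pred, average="macro")
print(f"accuracy={acc:.3f} macro-F1={macro_f1:.3f}")
```

Since POS tags are multi-class labels, you must pick an averaging mode for F1; `average="macro"` weights every tag equally, while `average="weighted"` weights tags by frequency.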