All the questions in the exam are based on NLTK chapter 6: https://www.nltk.org/book/ch06.html. You need to be familiar with the chapter to do well in the exam. Your response...


All the questions in the exam are based on NLTK chapter 6: https://www.nltk.org/book/ch06.html. You need to be familiar with the chapter to do well in the exam.



Your response has to be submitted individually. You need to answer both parts to pass the exam. You should submit both code and a report (about 2-3 pages).





Part A (60 points):
Classifier. This question is based on question 2 in chapter 6. I will reproduce the question below for easy access.



Using the Naive Bayes example described in this chapter, and any features you can think of, build the best name gender classifier you can. Begin by splitting the Names Corpus into three subsets: 10% of the words for the test set, 10% of the words for the dev-test set, and the remaining 80% of the words for the training set. Then, starting with the example name gender classifier, make incremental improvements. Use the dev-test set to check your progress. Once you are satisfied with your classifier, check its final performance on the test set. How does the performance on the test set compare to the performance on the dev-test set? Is this what you'd expect?
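The 10% / 10% / 80% split described above can be sketched as follows. This is a minimal illustration, not the chapter's code: `split_names`, its parameters, and the toy data are hypothetical names introduced here, and a fixed seed is assumed so the split is reproducible. In the actual assignment the input list would be built from the Names Corpus, as the chapter does with `names.words('male.txt')` and `names.words('female.txt')`.

```python
import random

def split_names(labeled_names, test_frac=0.10, devtest_frac=0.10, seed=0):
    """Shuffle (name, gender) pairs and split them into train / dev-test / test.

    Hypothetical helper: the fractions follow the assignment text,
    the seed is an assumption made for reproducibility.
    """
    items = list(labeled_names)
    random.Random(seed).shuffle(items)  # shuffle a copy, leave the input untouched
    n_test = int(len(items) * test_frac)
    n_dev = int(len(items) * devtest_frac)
    test = items[:n_test]
    devtest = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, devtest, test

# Toy stand-in for the Names Corpus (assumption: 100 made-up entries).
toy = [(f"name{i}", "female" if i % 2 else "male") for i in range(100)]
train_names, devtest_names, test_names = split_names(toy)
print(len(train_names), len(devtest_names), len(test_names))  # 80 10 10
```

Shuffling before slicing matters because the corpus files list names alphabetically and by gender, so an unshuffled slice would not be representative.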




Expectations


As a baseline you can use the gender_features() or gender_features2() functions from the chapter to generate the feature set. But you need to improve upon these feature set generators. Think of what other features can be added. For instance, character n-grams are good features. I expect to see a function that yields a different feature set from these functions. If your code is based on these functions alone, then you did not do anything new and won't get any points.
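One possible shape for such a feature generator, using the character n-gram idea mentioned above, is sketched here. The function name `gender_features_ngrams`, the feature keys, and the choice of prefixes, suffixes, and padded bigrams are all assumptions made for illustration; whether they actually beat the baseline has to be checked on the dev-test set. The sketch assumes a non-empty name.

```python
def gender_features_ngrams(name, max_n=3):
    """Hypothetical feature extractor based on character n-grams.

    Returns a dict in the same style as the chapter's feature functions,
    so it can be fed to an NLTK classifier. Assumes `name` is non-empty.
    """
    name = name.lower()
    features = {
        "first_letter": name[0],
        "last_letter": name[-1],
        "length": len(name),
    }
    # Prefixes and suffixes of length 2..max_n.
    for n in range(2, max_n + 1):
        features[f"suffix_{n}"] = name[-n:]
        features[f"prefix_{n}"] = name[:n]
    # Character bigrams, with ^ and $ marking the start and end of the name.
    padded = f"^{name}$"
    for i in range(len(padded) - 1):
        features[f"has_bigram({padded[i:i + 2]})"] = True
    return features

feats = gender_features_ngrams("Anna")
print(feats["suffix_2"], feats["suffix_3"], feats["prefix_2"])  # na nna an
```

Only bigrams that occur are recorded, which keeps the feature dict small; adding many features can also cause overfitting, a risk the chapter discusses for gender_features2().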


You should explain how your system is different from the code given in the chapter. You need to report the accuracy, precision, recall, and F-score of the baseline system and your system. You can use sklearn's precision, recall and F-score to compute the metrics.
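To make the four reported metrics concrete, here is a small hand-rolled computation. It is a sketch for checking the arithmetic, not a replacement for sklearn: `binary_metrics` is a hypothetical helper, the choice of "female" as the positive class is an assumption, and the gold and predicted labels are made up. sklearn's `precision_recall_fscore_support` with `average='binary'` and a matching `pos_label` should give the same precision, recall, and F-score.

```python
def binary_metrics(gold, predicted, positive="female"):
    """Accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper; `positive` names the class treated as positive.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
gold = ["female", "female", "male", "male", "female"]
pred = ["female", "male", "male", "female", "female"]
metrics = binary_metrics(gold, pred)
print(metrics)
```

Here accuracy is 3/5 and precision, recall, and F1 are all 2/3. Reporting the same metrics for the baseline and the improved system on the same dev-test set keeps the comparison fair.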



Note: The development dataset is mainly for improving (tuning) your system. You should not tune your feature generator function on the test dataset.







Part B (40 points): Apart from describing what you did, your report should answer the following questions.


How is your feature set generator function different from gender_features() or gender_features2()?


Does your system perform better or worse than the baseline classifier system in the chapter?


Is there a difference in performance between the development dataset and the test dataset?


Does your system show better performance if you use a different classifier? You can use one of the classifiers described in the chapter.


Can you provide five names that your system made mistakes on?
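For the last question, misclassified names can be collected with a loop similar to the error analysis in the chapter. The sketch below is an assumption-laden illustration: `find_errors`, `toy_features`, and `toy_classify` are hypothetical names, and the toy classifier is a hand-written rule standing in for a trained model. With NLTK, the `classify` method of a trained classifier could be passed as the `classify` argument.

```python
def find_errors(classify, feature_fn, labeled_names):
    """Return sorted (gold, guess, name) triples for misclassified names.

    `classify` is any callable mapping a feature dict to a label.
    """
    errors = []
    for name, gold in labeled_names:
        guess = classify(feature_fn(name))
        if guess != gold:
            errors.append((gold, guess, name))
    return sorted(errors)

# Toy stand-ins (assumptions): a last-letter feature and a rule-based classifier.
def toy_features(name):
    return {"last_letter": name[-1].lower()}

def toy_classify(features):
    return "female" if features["last_letter"] in "aeiy" else "male"

devtest = [("Anna", "female"), ("Joshua", "male"), ("Megan", "female"),
           ("Peter", "male"), ("Kyle", "male")]
errors = find_errors(toy_classify, toy_features, devtest)
for gold, guess, name in errors[:5]:
    print(f"correct={gold:<8} guess={guess:<8} name={name}")
```

Errors should be inspected on the dev-test set while tuning; names from the test set are only suitable for the final report, after the feature generator is frozen.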



Answered 1 day after Mar 17, 2021


Sandeep Kumar answered on Mar 19 2021
The project began by splitting the Names Corpus into three subsets: 10% of the words for the test set, 10% of the words for the dev-test set, and the remaining 80% of the words for the training set. Then, starting with the example name gender classifier, incremental improvements were made.
The gender_features2() function was used....