Text categorization: given the following document-term matrix (each value in the matrix is the frequency of a specific term in that document):

        T1  T2  T3  T4  T5  T6  T7  T8
doc1     2   0   4   3   0   1   0   2
doc2     0   2   4   0   2   3   0   0
doc3     4   0   1   3   0   1   0   1
doc4     0   1   0   2   0   0   1   0
doc5     0   0   2   0   0   4   0   0
doc6     1   1   0   2   0   1   1   3
doc7     2   1   3   4   0   2   0   2

Assume that the documents have been manually assigned to two pre-specified categories as follows: Class_1 = {Doc1, Doc2, Doc5}, Class_2 = {Doc3, Doc4, Doc6, Doc7}.

(a) Use the Naïve Bayes multinomial model and the Naïve Bayes Bernoulli model, respectively, to calculate how Doc 8 and Doc 9, given below, will be classified. Use add-one smoothing for the conditional probabilities in the calculation.

        T1  T2  T3  T4  T5  T6  T7  T8
doc8     3   1   0   4   1   0   2   1
doc9     0   0   3   0   1   5   0   1

(b) Redo the classification using the K-Nearest-Neighbor approach for document categorization with K = 3 to classify the two new documents. Show calculation details. Note: there is no need to normalize the vectors; use raw tf*idf for the weight of each term and cosine similarity for computing similarities.

(c) Redo the classification using the Rocchio-based vector space model to determine how Doc 8 and Doc 9 will be classified. As in (b), use non-normalized vectors, raw tf*idf for the weight of each term, and cosine similarity.
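
The hand calculations asked for in (a), (b), and (c) can be cross-checked with a short script. The following is only a sketch under stated assumptions, not a model answer: the function names (multinomial_nb, bernoulli_nb, knn, rocchio) are made up for illustration; idf is assumed to be ln(N/df) computed from the seven training documents only; the Bernoulli model is assumed to smooth as (df + 1) / (N_c + 2); and Rocchio is taken in its simplest form, where each class is represented by the centroid of its tf*idf vectors. A course may define idf or the Rocchio prototype differently, so adjust accordingly.

```python
import math
from collections import Counter

# Term frequencies (T1..T8) copied from the problem statement.
train = {
    "doc1": [2, 0, 4, 3, 0, 1, 0, 2],
    "doc2": [0, 2, 4, 0, 2, 3, 0, 0],
    "doc3": [4, 0, 1, 3, 0, 1, 0, 1],
    "doc4": [0, 1, 0, 2, 0, 0, 1, 0],
    "doc5": [0, 0, 2, 0, 0, 4, 0, 0],
    "doc6": [1, 1, 0, 2, 0, 1, 1, 3],
    "doc7": [2, 1, 3, 4, 0, 2, 0, 2],
}
labels = {"doc1": "Class_1", "doc2": "Class_1", "doc5": "Class_1",
          "doc3": "Class_2", "doc4": "Class_2", "doc6": "Class_2",
          "doc7": "Class_2"}
new_docs = {"doc8": [3, 1, 0, 4, 1, 0, 2, 1],
            "doc9": [0, 0, 3, 0, 1, 5, 0, 1]}
classes = sorted(set(labels.values()))
V = 8            # vocabulary size
N = len(train)   # number of training documents

def members(c):
    """Training vectors that belong to class c."""
    return [train[d] for d in train if labels[d] == c]

def multinomial_nb(x):
    """Multinomial NB, add-one smoothing: (count + 1) / (total + V)."""
    scores = {}
    for c in classes:
        docs = members(c)
        counts = [sum(d[t] for d in docs) for t in range(V)]
        total = sum(counts)
        score = math.log(len(docs) / N)  # log prior
        for t in range(V):
            score += x[t] * math.log((counts[t] + 1) / (total + V))
        scores[c] = score
    return max(scores, key=scores.get)

def bernoulli_nb(x):
    """Bernoulli NB on term presence; assumed smoothing (df + 1) / (N_c + 2)."""
    scores = {}
    for c in classes:
        docs = members(c)
        score = math.log(len(docs) / N)  # log prior
        for t in range(V):
            p = (sum(1 for d in docs if d[t] > 0) + 1) / (len(docs) + 2)
            # Absent terms contribute (1 - p), unlike the multinomial model.
            score += math.log(p if x[t] > 0 else 1 - p)
        scores[c] = score
    return max(scores, key=scores.get)

# Assumption: idf = ln(N / df), with df taken from the training set only.
df = [sum(1 for d in train.values() if d[t] > 0) for t in range(V)]
idf = [math.log(N / df[t]) for t in range(V)]

def tfidf(x):
    """Raw (non-normalized) tf*idf vector."""
    return [x[t] * idf[t] for t in range(V)]

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn(x, k=3):
    """Majority vote among the k most cosine-similar training documents."""
    q = tfidf(x)
    ranked = sorted(train, key=lambda d: cosine(q, tfidf(train[d])),
                    reverse=True)
    votes = Counter(labels[d] for d in ranked[:k])
    return votes.most_common(1)[0][0]

def rocchio(x):
    """Assign x to the class whose tf*idf centroid is most similar."""
    q = tfidf(x)
    def centroid(c):
        vecs = [tfidf(d) for d in members(c)]
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    return max(classes, key=lambda c: cosine(q, centroid(c)))

for name, x in new_docs.items():
    print(name, multinomial_nb(x), bernoulli_nb(x), knn(x), rocchio(x))
```

Under these assumptions, all four functions should assign doc8 to Class_2 and doc9 to Class_1. Because cosine similarity is invariant to scaling, the base of the logarithm used for idf does not change the K-Nearest-Neighbor or Rocchio ranking.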