Text categorization: given the following document-term matrix (each value is the frequency of a specific term in that document):

       T1  T2  T3  T4  T5  T6  T7  T8
doc1    2   0   4   3   0   1   0   2
doc2    0   2   4   0   2   3   0   0
doc3    4   0   1   3   0   1   0   1
doc4    0   1   0   2   0   0   1   0
doc5    0   0   2   0   0   4   0   0
doc6    1   1   0   2   0   1   1   3
doc7    2   1   3   4   0   2   0   2

Assume that the documents have been manually assigned to two pre-specified categories as follows: Class_1 = {Doc1, Doc2, Doc5}, Class_2 = {Doc3, Doc4, Doc6, Doc7}

(a) Use the Naïve Bayes Multinomial Model and the Naïve Bayes Bernoulli Model, respectively, to calculate how Doc 8 and Doc 9 given below will be classified. Please use add-one smoothing for the conditional probabilities in the calculation.

       T1  T2  T3  T4  T5  T6  T7  T8
doc8    3   1   0   4   1   0   2   1
doc9    0   0   3   0   1   5   0   1
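
A minimal Python sketch for checking part (a) by machine is given here. It is not a reference solution, and it rests on a few assumptions that the question does not spell out: the vocabulary is exactly the eight terms T1..T8, the class priors are the fraction of training documents in each class, the multinomial estimate is (term count in class + 1) / (total term count in class + 8), and the Bernoulli estimate is (documents in class containing the term + 1) / (documents in class + 2), with any nonzero frequency counted as "present". All function and variable names are illustrative.

```python
import math

# Term frequencies for T1..T8, copied from the question.
TRAIN = {
    "Class_1": [
        [2, 0, 4, 3, 0, 1, 0, 2],  # doc1
        [0, 2, 4, 0, 2, 3, 0, 0],  # doc2
        [0, 0, 2, 0, 0, 4, 0, 0],  # doc5
    ],
    "Class_2": [
        [4, 0, 1, 3, 0, 1, 0, 1],  # doc3
        [0, 1, 0, 2, 0, 0, 1, 0],  # doc4
        [1, 1, 0, 2, 0, 1, 1, 3],  # doc6
        [2, 1, 3, 4, 0, 2, 0, 2],  # doc7
    ],
}
DOC8 = [3, 1, 0, 4, 1, 0, 2, 1]
DOC9 = [0, 0, 3, 0, 1, 5, 0, 1]
N_DOCS = sum(len(docs) for docs in TRAIN.values())
N_TERMS = 8  # assumed vocabulary size (T1..T8)


def multinomial_log_scores(doc):
    """log P(c) + sum_t tf_t * log P(t|c), with add-one smoothing."""
    scores = {}
    for label, docs in TRAIN.items():
        term_totals = [sum(col) for col in zip(*docs)]
        denom = sum(term_totals) + N_TERMS
        score = math.log(len(docs) / N_DOCS)
        for tf, total in zip(doc, term_totals):
            score += tf * math.log((total + 1) / denom)
        scores[label] = score
    return scores


def bernoulli_log_scores(doc):
    """log P(c) + sum over ALL terms of log P(present/absent | c)."""
    scores = {}
    for label, docs in TRAIN.items():
        doc_freqs = [sum(1 for v in col if v > 0) for col in zip(*docs)]
        score = math.log(len(docs) / N_DOCS)
        for tf, df in zip(doc, doc_freqs):
            p = (df + 1) / (len(docs) + 2)  # smoothed P(term present | class)
            score += math.log(p if tf > 0 else 1 - p)
        scores[label] = score
    return scores


def classify(scores):
    """Return the label with the highest log score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for name, doc in (("doc8", DOC8), ("doc9", DOC9)):
        print(name, "multinomial:", classify(multinomial_log_scores(doc)))
        print(name, "Bernoulli:", classify(bernoulli_log_scores(doc)))
```

Working in log space avoids multiplying many small probabilities. Note that the Bernoulli model also multiplies in a factor for every absent term, which is the main place hand calculations tend to go wrong.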

(b) Redo the classification using the K-Nearest-Neighbor approach for document categorization, with K = 3, to classify the two new documents Doc 8 and Doc 9. Show calculation details. Note: there is no need to normalize the vectors; use raw tf*idf as the weight of each term and cosine similarity for computing similarities.
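
The K = 3 procedure in part (b) can be sketched in Python as follows. This is only one reading of the question: it assumes idf = ln(N / df) computed over the seven training documents only, and applies that same idf to Doc 8 and Doc 9. The log base does not change the cosine values, since it rescales every weight by the same constant. The names `idf_vector`, `tfidf`, `cosine`, and `knn_classify` are illustrative.

```python
import math
from collections import Counter

# (name, class label, term frequencies for T1..T8), copied from the question.
TRAIN = [
    ("doc1", "Class_1", [2, 0, 4, 3, 0, 1, 0, 2]),
    ("doc2", "Class_1", [0, 2, 4, 0, 2, 3, 0, 0]),
    ("doc3", "Class_2", [4, 0, 1, 3, 0, 1, 0, 1]),
    ("doc4", "Class_2", [0, 1, 0, 2, 0, 0, 1, 0]),
    ("doc5", "Class_1", [0, 0, 2, 0, 0, 4, 0, 0]),
    ("doc6", "Class_2", [1, 1, 0, 2, 0, 1, 1, 3]),
    ("doc7", "Class_2", [2, 1, 3, 4, 0, 2, 0, 2]),
]
DOC8 = [3, 1, 0, 4, 1, 0, 2, 1]
DOC9 = [0, 0, 3, 0, 1, 5, 0, 1]


def idf_vector(train):
    """Assumed idf: ln(N / df), with df counted over the training set."""
    n = len(train)
    columns = zip(*(tf for _, _, tf in train))
    return [math.log(n / sum(1 for v in col if v > 0)) for col in columns]


IDF = idf_vector(TRAIN)


def tfidf(tf):
    """Raw tf * idf weights, no length normalization."""
    return [f * w for f, w in zip(tf, IDF)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_classify(tf, k=3):
    """Return (majority label, k nearest neighbors as (sim, name, label))."""
    query = tfidf(tf)
    ranked = sorted(
        ((cosine(query, tfidf(vec)), name, label) for name, label, vec in TRAIN),
        reverse=True,
    )
    neighbors = ranked[:k]
    votes = Counter(label for _, _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors


if __name__ == "__main__":
    for name, doc in (("doc8", DOC8), ("doc9", DOC9)):
        label, neighbors = knn_classify(doc)
        print(name, "->", label)
        for sim, neighbor, neighbor_label in neighbors:
            print(f"  {neighbor} ({neighbor_label}): cosine = {sim:.4f}")
```

Printing the three neighbors with their similarities gives the "calculation details" the question asks for in a form that is easy to compare against a hand-built similarity table.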

(c) Redo the classification using the Rocchio-based vector space model to determine how Doc 8 and Doc 9 will be classified. As in (b), use non-normalized vectors, raw tf*idf for the weight of each term, and cosine similarity.
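
A hedged Python sketch of part (c) follows. It takes the simplest reading of "Rocchio-based": each class is represented by the centroid (mean) of its training tf*idf vectors, and a new document goes to the class whose centroid has the higher cosine similarity. As an assumption, idf = ln(N / df) is computed over all seven training documents. The names `centroid` and `rocchio_classify` are illustrative.

```python
import math

# Term frequencies for T1..T8, copied from the question.
TRAIN = {
    "Class_1": [
        [2, 0, 4, 3, 0, 1, 0, 2],  # doc1
        [0, 2, 4, 0, 2, 3, 0, 0],  # doc2
        [0, 0, 2, 0, 0, 4, 0, 0],  # doc5
    ],
    "Class_2": [
        [4, 0, 1, 3, 0, 1, 0, 1],  # doc3
        [0, 1, 0, 2, 0, 0, 1, 0],  # doc4
        [1, 1, 0, 2, 0, 1, 1, 3],  # doc6
        [2, 1, 3, 4, 0, 2, 0, 2],  # doc7
    ],
}
DOC8 = [3, 1, 0, 4, 1, 0, 2, 1]
DOC9 = [0, 0, 3, 0, 1, 5, 0, 1]

ALL_DOCS = [tf for docs in TRAIN.values() for tf in docs]
# Assumed idf: ln(N / df) over the whole training set.
IDF = [
    math.log(len(ALL_DOCS) / sum(1 for v in col if v > 0))
    for col in zip(*ALL_DOCS)
]


def tfidf(tf):
    """Raw tf * idf weights, no length normalization."""
    return [f * w for f, w in zip(tf, IDF)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


CENTROIDS = {
    label: centroid([tfidf(tf) for tf in docs]) for label, docs in TRAIN.items()
}


def rocchio_classify(tf):
    """Return (best label, {label: cosine similarity to class centroid})."""
    query = tfidf(tf)
    sims = {label: cosine(query, c) for label, c in CENTROIDS.items()}
    return max(sims, key=sims.get), sims


if __name__ == "__main__":
    for name, doc in (("doc8", DOC8), ("doc9", DOC9)):
        label, sims = rocchio_classify(doc)
        print(name, "->", label, {k: round(v, 4) for k, v in sims.items()})
```

Because cosine similarity ignores vector length, using the sum of the class vectors instead of the mean would give the same decision; the mean is used here only because it matches the usual definition of a centroid.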



Jun 06, 2022