Submit a Jupyter notebook named __Assignment4.ipynb.
You are given a CSV file called tweets.csv, which contains a sample collection of tweets from Donald Trump and Justin Trudeau. The file has two columns: the first column identifies the person who tweeted, and the second column contains the tweet text.
1. Separate the tweets from Justin Trudeau and Donald Trump. [10 points]
2. Train a word2vec model (using the CBOW method) on the tweets by Trump and on the tweets by Trudeau separately, giving one model per person. [60 points]
a. Use the gensim package to train the word2vec models.
b. Use casual_tokenize from the NLTK package to tokenize the tweets, and convert all tokens to lowercase.
c. Set the word vector length to 10.
3. After training the two word2vec models: [30 points]
a. Display the vocabulary for each model.
b. Select a few tokens from each model and display the most similar words.
c. Select a few tokens from each model and display their vectors.