Problems


CS301-101/103 Midterm Correction Exam/Project - Fall 2021

THIS TEST IS OPTIONAL. IF YOU SUBMIT IT BY THE DEADLINE, YOUR NEW MIDTERM GRADE WILL BE THE AVERAGE OF THE EXISTING MIDTERM EXAM GRADE AND THIS MIDTERM EXAM GRADE. THIS MEANS THAT YOUR GRADE MAY GO UP OR DOWN. YOUR EXAM REQUIRES THE SUBMISSION OF TWO ARTIFACTS: A PDF OF THE SCANNED HANDWRITTEN OR TYPED ANSWERS TO ALL QUESTIONS EXCEPT Q7, AND A URL OF THE NOTEBOOK SOLUTION TO Q7. NO COMMUNICATION IS ALLOWED, AND THOSE FOUND TO HAVE SIMILAR ANSWERS WILL BE CONTACTED AND REPORTED.

You are applying for a position at AWS, and your interviewer wants to know more about your sentiment classification skills on products sold by amazon.com. All reviews are classified by a labeling subcontractor (e.g. AWS Mechanical Turk) as having positive (y = 1) or negative (y = 0) sentiment. The interviewer asks you to explain all the steps of designing a sentiment classifier based on Logistic Regression.

Question 1 - Feature Engineering (10 points)

In this step you outline the following as potential features (this is a limited example - we can have many more features, as in the programming exercise below):

1. x1: the count of positive words in the review
2. x2: the count of negative words in the review

Write the posterior probability expressions for the problem you are given to solve:

P(y = 1 | x)
P(y = 0 | x)

Question 2 - Decision Boundary (10 points)

Write the expression for the decision boundary, assuming that P(y = 1) = P(y = 0).

Question 3 - Training: Loss function (10 points)

Write the expression of the loss, as a function of w, that makes sense to use in this problem:

L_CE =

NOTE: The loss will include the function sigma(z) = 1 / (1 + e^(-z)).

Question 4 - Training: Gradient (10 points)

Write the expression of the gradient of the loss with respect to the parameters - show all your work:

grad_w L_CE =

Question 5 - Naive Bayes (10 points)

The interviewer was impressed with your answers up to now, but to make your life difficult (it's AWS, after all) asks how Logistic Regression differs from a method you have never seen before: Naive Bayes. You are given this Wikipedia page https://en.wikipedia.org/wiki/Naive_Bayes_classifier and you are told to focus your attention on the posterior probability differences between the two models.

NOTE: Write down the posterior in the log domain to make it amenable to implementation (see the subsequent question). Also, if you copy the Wikipedia answer you will be granted exactly 0 points.

Question 6 - Imbalanced dataset (10 points)

You are now told that in the dataset P(y = 0) >> P(y = 1). Comment on whether the accuracy of Logistic Regression will be affected by such imbalance.

Question 7 - Online Learning (40 points)

The interviewer was impressed with your answers and wants to test your programming skills. You are given access to thousands of Amazon product reviews from https://www.kaggle.com/snap/amazon-fine-food-reviews/version/2. Please note that the number of reviews on Kaggle may be too many for the free version of Colab, in which case you are advised to sample a subset of reviews.

You are asked to develop a system that demonstrates the ability to update a sentiment prediction model using constantly arriving data over the wire (online learning). To do so, split the dataset into T groups. The t-th group arrives at your classification system before the (t+1)-th group; the group index t therefore has time semantics.

1. Use each arriving group to train a logistic regressor that classifies sentiment.
2. Update the model after observing each arriving group.
3. Report the harmonic mean of precision (p) and recall (r), i.e. the F1 score, calculated as shown below, using a test dataset that is 20% of each group. Plot the F1 score vs. the index t.

F1 = 2 / (p^(-1) + r^(-1))

5 bonus points: Your code includes hyperparameter optimization of the learning rate and mini-batch size. You can learn about cross-validation, which is a splitting strategy for tuning models, at https://scikit-learn.org/stable/modules/cross_validation.html.
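A minimal sketch of one way to set up the Q7 pipeline is shown below. It assumes the Kaggle file Reviews.csv with columns Text and Score (Score > 3 treated as positive sentiment), uses scikit-learn's SGDClassifier with logistic loss as an online logistic regressor, and updates it with partial_fit as each group arrives. The file name, column names, thresholding, T = 10, and hyperparameter values are illustrative assumptions, not part of the required solution.

# Sketch: online logistic-regression sentiment training with per-group F1 reporting (Q7).
# Assumed: reviews are in Reviews.csv with columns "Text" and "Score", Score > 3 means y = 1.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

df = pd.read_csv("Reviews.csv").sample(50_000, random_state=0)   # sample a subset for free Colab
y = (df["Score"] > 3).astype(int).to_numpy()                      # assumed label rule: Score > 3 is positive

# Stateless vectorizer, so each arriving group can be transformed independently.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
X = vectorizer.transform(df["Text"])

# Online logistic regressor: loss="log_loss" makes SGDClassifier a logistic regression
# ("log" in older scikit-learn versions).
clf = SGDClassifier(loss="log_loss", learning_rate="constant", eta0=0.01)

T = 10                                                            # number of arriving groups (assumed)
groups = np.array_split(np.arange(X.shape[0]), T)                 # the t-th group arrives before the (t+1)-th
f1_per_group = []
for t, idx in enumerate(groups):
    # 80/20 split inside each group: train on 80%, report F1 on the held-out 20%.
    tr, te = train_test_split(idx, test_size=0.2, random_state=t)
    clf.partial_fit(X[tr], y[tr], classes=[0, 1])                 # update the model with the new group
    f1_per_group.append(f1_score(y[te], clf.predict(X[te])))

plt.plot(range(1, T + 1), f1_per_group, marker="o")
plt.xlabel("group index t")
plt.ylabel("F1 score")
plt.show()

The bonus hyperparameter search (learning rate eta0 and mini-batch size) could be layered on top of this loop, for example by re-running it over a small grid of values and keeping the setting with the best average F1.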

Answer To: Problems

Pritam Kumar answered on Nov 28 2021
Setup:

y = 1: positive sentiment
y = 0: negative sentiment
x1: count of positive words
x2: count of negative words

Question 1 - Posterior probabilities

The posterior probability expressions for logistic regression are

P(y = 1 | x) = 1 / (1 + exp(-(w1 . x + b1)))

where w1 is the vector of parameters for class 1 and b1 is the offset, and

P(y = 0 | x) = 1 / (1 + exp(-(w2 . x + b2)))

where w2 is the vector of parameters for class 2 and b2 is the offset. (For binary logistic regression a single parameter set suffices, since P(y = 0 | x) = 1 - P(y = 1 | x).)

Question 2 - Decision boundary

Assuming P(y = 1) = P(y = 0), assign class 1 (positive sentiment) if P(y = 1 | x) > 0.5 and class 0 (negative sentiment) if P(y = 1 | x) < 0.5. The decision boundary is therefore

P(y = 1 | x) = 0.5, i.e. w . x + b = 0.

Question 3 - Loss function

The cross-entropy loss is

L_CE = -[ y log sigma(w . x + b) + (1 - y) log(1 - sigma(w . x + b)) ]

where sigma(z) = 1 / (1 + e^(-z)).
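As a quick numerical check of the expressions above, the following minimal Python sketch evaluates the posterior and the cross-entropy loss for one review. The weights, offset, and word counts are made-up illustrative numbers, not fitted parameters.

# Numerical check of the posterior and cross-entropy expressions above.
# The weights, bias, and word counts below are illustrative values, not fitted ones.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -1.2])   # weights for x1 (positive-word count) and x2 (negative-word count)
b = -0.1                    # offset / bias
x = np.array([3.0, 1.0])    # a review with 3 positive words and 1 negative word

p1 = sigmoid(w @ x + b)     # P(y = 1 | x)
p0 = 1.0 - p1               # P(y = 0 | x); the two posteriors sum to 1
print(p1, p0)

# Cross-entropy loss for a single labeled example (here y = 1):
y = 1
loss = -(y * np.log(p1) + (1 - y) * np.log(1 - p1))
print(loss)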