Question in Lab 4 PDF
Lab 4 – Using Cross-Validation to Evaluate Your Past Models

Due date: 11:59 PM Tuesday, June 2nd

In this lab you will use k-fold cross-validation on the models you built in the past two labs in order to test how they are likely to perform on data they have never seen before. This lab is a chance to practice cross-validation, but also to improve your models based on the feedback from me and the TA regarding the strength and/or logic of your past models. If they weren't done very well, you have permission to recreate totally different or better models on a different data set.

You have full permission, of course, to change your data set from the one you used in previous labs. Please send me a one- or two-line description of your proposed set and a screenshot of the Excel file so I get a sense of what it is. In the final project you will be using as many tools as possible to explore the same and/or related data sets (typically the Covid data sets, but not restricted to these), so I do recommend staying with a topic similar to the one you have been working with as much as possible, to help you with the final project. However, it can also be beneficial to experiment with new data and see the various tools in different contexts, so trying something new can be an advantage as well.

Some suggestions on content:
• Choose the top 3 models (or so) you have created in past labs.
• Evaluate their test errors with cross-validation techniques, comparing against similar models of lesser and greater complexity.
• Use other approximations as well if available (Cp, BIC, AIC) and comment on similarities or differences between the results these approximations give and those of the CV process.
• Comment on what you believe to be the strengths of your model(s).

Hints on process:
• Give a brief recap of your models and explain again why they are important. Make sure you take time to improve old ones or try something new if you did not score well on a previous lab. You will still be marked on whether the models and your interpretations of them make sense. You don't have to go as in depth this time around, but the models should still make sense, you should still use a sufficient number of data points (I suggest at least 100, but really at least 1,000 is appropriate given all the data we have access to), and so on (see rubric for more details).
• The rest of the lab is more technical than the others: we want to see that:
o you have executed cross-validation properly and understand the difference between test and training sets;
o you understand how to perform k-fold cross-validation properly, and that it is the preferred method of calculating realistic (although imperfect) test error rates;
o you may want to use other estimates of test error such as Cp, AIC, and BIC, but you realize these are not actual test errors, just training errors that have been modified to reflect what the true test errors likely are;
o you use and understand the one-standard-error rule to choose the "best" model (a sketch of k-fold CV with the 1SE rule follows below).

Note: there is a 0 (ZERO) tolerance policy on cheating and plagiarism. If any student is found duplicating all or part of another student's assignment, they will be sent to the AIO (Academic Integrity Officer). The AIO will then begin the student discipline process as they see fit. This may include failing the assignment or the course.
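The core technique the lab asks for, k-fold cross-validation combined with the one-standard-error rule, can be sketched as follows. This is a minimal illustration in Python (the lab does not prescribe a language); the synthetic data set, the 10-fold split, and the polynomial model family are all assumptions for demonstration, not part of the assignment.

```python
# Minimal sketch: 10-fold CV across models of increasing complexity,
# then the one-standard-error (1SE) rule to pick the "best" model.
# The data, fold count, and model family here are illustrative assumptions.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))           # at least 100 points, per the lab
y = 3 - 0.5 * X[:, 0] + rng.normal(0, 1, 200)   # noisy decreasing trend

kf = KFold(n_splits=10, shuffle=True, random_state=0)
results = []
for degree in range(1, 7):                      # candidate models, low to high complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # one MSE per fold; scikit-learn reports negated MSE, so flip the sign
    mse = -cross_val_score(model, X, y, cv=kf, scoring="neg_mean_squared_error")
    results.append((degree, mse.mean(), mse.std() / np.sqrt(kf.get_n_splits())))

# 1SE rule: take the SIMPLEST model whose mean CV error is within one
# standard error of the best (lowest) mean CV error.
best_mean, best_se = min((m, se) for _, m, se in results)
chosen = min(d for d, m, _ in results if m <= best_mean + best_se)
print(f"1SE choice: polynomial degree {chosen}")
```

The 1SE rule prefers parsimony: when several models have statistically indistinguishable CV error, it keeps the simplest one. Training-error-based estimates such as Cp, AIC, and BIC (available, for example, on statsmodels OLS results) can then be computed on the same candidates to confirm the CV ranking, remembering that they are modified training errors, not actual test errors.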
You will be graded according to the following rubric (each criterion scored 0 to 3):

Robustness of Models
0 – Only 1 or 2 of the categories under score 3 completed at all, or all 3 completed insufficiently well.
1 – 1–3 of the categories under score 3 completed only reasonably well.
2 – At least 2 of the categories under score 3 done quite well.
3 – The student's models are: 1. based on a sufficient number of data points (at the very least 100, but ideally at least 500 or 1,000); 2. interpreted well; 3. helpful to the reader in understanding the data better, possibly even of practical use to decision makers.

Model performance vs. model complexity
0 – The student hasn't illustrated that they understand the idea of complexity vs. performance.
1 – The student's performance vs. complexity graph contains significant logical or structural errors.
2 – The student has stopped short of creating enough different models, so their performance vs. complexity graph has too few points and is of limited usefulness.
3 – The student has taken a model that shows promise, tried various combinations of possible predictors, and created a proper performance vs. complexity graph.

Choosing the "best" model
0 – Little to no understanding of the 1SE rule.
1 – Understanding and use of the 1SE rule is weak.
2 – The student seems to have understood the 1SE rule, but the data looks doubtful.
3 – The student has understood and utilized the one-standard-error rule properly.

Methods
0 – The student has very little understanding of the purpose of CV or test error estimation.
1 – The student does the process mainly correctly, but their language shows they are unsure of what CV is or what Cp, AIC, BIC, etc. are estimating.
2 – The student does not perform k-fold CV quite correctly, and somewhat confuses the test error estimate calculated with CV with those created by modifying training error rates.
3 – The student understands that k-fold CV is best and depends on it the most. They realize Cp, AIC, BIC, etc. are estimates based on modifications of calculated training error and use these to confirm their k-fold CV results.

Lab 2 – Data Analytics
Professor: Scott Flemming
Student: Sleiman Yammine (B00819918)

Which of the two most impacted provinces in Canada is flattening the Covid-19 curve more efficiently?

We have seen Covid-19 cases begin to decrease drastically around the world and cities slowly beginning to reopen, especially in Canada. According to the Chief Public Health Officer of Canada, Dr. Theresa Tam, an estimated 50% of all Covid-19 cases in Canada have fully recovered; however, which provinces are doing the most for their people, and which aren't? Link: 50% Recovered in Canada

In the first pie chart, we can see the case distribution among major provinces (those with more than 1,000 cases). Each percentage represents the share of Canada's total cases located in that province. We can see that the two major hotspots in Canada are Ontario and Quebec. For the sake of demonstration, the information below covers these two most affected provinces, analyzing their respective data and comparing them to each other, all while asking the question: which of these two provinces is acting more efficiently to flatten the curve of new cases?

A. Infectivity
I. Quebec: total infected – 41,420 cases; population (2020) – 8.45 million
II. Ontario: total infected – 23,147 cases; population (2020) – 14.57 million

In Graph 1, we can see that cases per capita in Quebec are higher than in Ontario, suggesting Ontario has been making stronger political policies regarding lockdowns; however, let us see how Quebec performs in recovery.
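As a quick arithmetic check on the per-capita claim, the totals quoted above can be normalized by population. This small sketch uses only the figures given in the text:

```python
# Quick check of the per-capita claim, using only the totals quoted above.
cases = {"Quebec": 41_420, "Ontario": 23_147}
population = {"Quebec": 8.45e6, "Ontario": 14.57e6}

for province in cases:
    per_million = cases[province] / population[province] * 1e6
    print(f"{province}: {per_million:,.0f} cases per million")
```

Quebec's roughly 4,900 cases per million versus Ontario's roughly 1,590 per million is consistent with Quebec's higher per-capita curve in Graph 1.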
(Source: https://www.citynews1130.com/2020/05/17/50-percent-of-canada-covid-cases-recovered/)

[Image: https://1.bp.blogspot.com/-9pUP1WxAelU/XsLhmkM12bI/AAAAAAAAEOo/dPNnp4XfhgEp7PFuk0gRo-n8Gl8LgLozACLcBGAsYHQ/s1600/Pie%2Bchart%2Bcases.png]
Graph 1 – Cumulative Covid-19 Cases (Ontario vs. Quebec)

B. Recovered Cases
I. Quebec: total recovered – 11,039 cases; percentage recovered = 26.65%
II. Ontario: total recovered – 16,641 cases; percentage recovered = 71.89%

In Graph 2, we can see that recoveries per capita in Quebec are lower than in Ontario. Can we suggest that Ontario has a more efficient health care system, or is it a result of better social distancing laws within the province?

[Image: https://1.bp.blogspot.com/-fselFLLEZ9E/XsLl3si86tI/AAAAAAAAEPE/Akk0KawlhBcLR-TaIcJ6PAWA5arrCIkywCLcBGAsYHQ/s1600/OnVsQc-Cases.png]
Graph 2 – Cumulative Covid-19 Recoveries (Ontario vs. Quebec)

C. Linear Models – Ontario
From Graph 1 and Graph 2, we can begin to see that Ontario is doing a much better job; however, is Ontario truly flattening the curve by decreasing its number of new cases per day?

[Image: https://1.bp.blogspot.com/-6F2mUtO0gFg/XsLl3p8SfKI/AAAAAAAAEPM/kMStslCZ8mMTu6P4B84inZZMijKYTSOIgCPcBGAYYCw/s1600/OnVsQu-Rec.png]
Graph 3 – Ontario Covid-19 New Cases

In Graph 3, we see the new cases per day from March until May. We notice that the number of new cases decreases as we move forward in time. Compared to a simple linear fit (blue line), the goodness of fit is 66.8%, indicating that the trend is decreasing over time (good news!). Looking closer at the red and green lines, which attempt to fit the data more appropriately in a decreasing manner, the goodness of fit is 74.5% and 81.87% respectively. This indicates that cases are in fact decreasing, and Ontario is successfully flattening the curve.

D. Linear Models – Quebec
From Graph 1 and Graph 2, we can begin to question whether Quebec is actually flattening the curve. Let's take a closer look.

[Image: https://1.bp.blogspot.com/-uJLJZQcXOC4/XsLl3TqcuPI/AAAAAAAAEPM/He0SHmYWZZYZYVZtwNH-Rx7zPMNRsmd3gCPcBGAYYCw/s1600/Onatario%2BCases%2Bflattening%2Bthe%2Bcurve.png]
Graph 4 – Quebec Covid-19 New Cases

In Graph 4, we see the new cases per day from March until May. We notice that the new cases (much higher than Ontario's) decrease slightly as we move forward in time, but not enough. Comparing this to a simple linear fit (blue line), the goodness of fit is 72.3%, indicating that the trend is slowly decreasing over time; the fact that a straight line fits Quebec's data more closely than Ontario's suggests that Ontario is performing better. Now, if we look at the red and green lines, which attempt to
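To make the goodness-of-fit figures above concrete, here is a minimal sketch of fitting a straight line to a daily new-case series and reporting R², the "goodness of fit" quoted in Sections C and D. The new_cases array below is illustrative dummy data, not the actual Ontario or Quebec counts.

```python
# Sketch: fit a straight line to daily new cases and report R-squared.
# new_cases is illustrative dummy data, NOT the actual provincial series.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

days = np.arange(60).reshape(-1, 1)   # roughly March through May
rng = np.random.default_rng(1)
new_cases = 600 - 5 * days[:, 0] + rng.normal(0, 60, 60)

fit = LinearRegression().fit(days, new_cases)
r2 = r2_score(new_cases, fit.predict(days))
print(f"slope = {fit.coef_[0]:.1f} new cases/day, R^2 = {r2:.1%}")
```

A negative slope with a reasonable R² supports a "flattening" claim; the red and green curves discussed in the text would be higher-order fits evaluated the same way, and Lab 4's cross-validation is the natural tool for comparing them fairly.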