Lab 2 – Using Regression Models on your Data

Due date: 11:59PM Sunday May 16th

In Lab 1 you explored basic aspects of your data. Now that you have had some practice understanding the use and interpretation of regression models, it is time to apply such models to your data of interest. You have full permission, of course, to change your data set from the one you used in Lab 1 if you find there is little of interest to write about. A reminder also that you are not expected to have all the answers, just very good questions and observations at this point.

A convincing argument will likely include:
· 5 images or so*
· 3 or more regression models*

*Of course, the number of images and models depends on your argument and on how in-depth you go into each image and model. If you have only 1 very sophisticated model and go very much in depth with many images, perhaps 1 model is sufficient. Otherwise, perhaps you have 3 models with only 1 or 2 images each that help with the point you are trying to make. See the rubric below for the grading scheme.

Hints on process:
· State your “thesis” clearly in both the introduction and the conclusion of your lab. What are you trying to prove? (For example, perhaps I suggest that Nova Scotia did a good job of containing COVID compared to other provinces and some other countries. If this is your thesis statement, state it plainly up front, then prove it in the body of your writing, and repeat it in the conclusion.)
· Take time to explain the models and the conclusions that can be drawn from them.
· Do all the technical work you need to do, but put this evidence in the appendix at the end of the document.
· Use the body of the report to show pictures of interesting data, introduce model summaries, explain which variables are significant, and attempt to interpret the data.
· Interpretation: by this I mean explain why your results are interesting or important.
  · Example: a model may show that asthma affects 0 to 19 year-old males in Cumberland county more than other counties or other age ranges. This is an interesting and important result that our client (the Municipal and Provincial Government) would be interested in knowing. It may affect their policy on medicare in that county.
· Include your RStudio script as an appendix to your submission; you can also include it as a “.r” file in the assignment dropbox if that is helpful for you.

Note: there is a 0 (ZERO) tolerance policy on cheating and plagiarism. If any student is found duplicating all or part of the assignment from another source (internet, another student, etc.), they will be sent to the AIO (Academic Integrity Officer). The AIO will then begin the process of student discipline as they see fit. This may include failing the assignment or the course.

Students will be given a grade from 0-3 in the following areas. An excellent report (3/3) will contain the following elements:

1. Images: The student has identified an interesting result and plotted / displayed it well to illustrate the point. The student uses the appropriate plots (e.g. boxplots, bar charts, or scatter plots, each when appropriate). The student also describes each plot in words as necessary (x variables, y variable, etc.) and interprets it appropriately.
2. Models: The regression models have been applied to the data properly, and the significance of variables etc. has been interpreted properly (e.g. the student does not see a p-value of .2 as significant and claim that there is evidence that such a predictor (x) influences the response variable (y)).
   a. See the “lessons learned from previous years” document on BS. Common pitfalls are shown there. The best lesson is a model that attempts to predict the tonnage of recyclables from the total tonnage of garbage in a municipality, which looks like this:
   b. Recyc = -6,425 + .739*Garbage
   c. This DOES NOT mean that 73.9% of recyclables are garbage. This model means that there is a correlation between recyclables and garbage, and that as garbage increases by 1 tonne, recyclables are predicted to increase by .739 tonnes.
3. Interpretations: The student has interpreted the results well. They understand the limitations of what can be concluded, but also the strength of the model and what can be safely claimed. E.g. a student understands that an R^2 value of 40% is not overly strong but shows some evidence that there is a relationship between the response and predictors, and that their model deserves further exploration. They also understand the difference between correlation and causation: e.g. just because ice cream sales increase when shark attacks increase does not mean that ice cream attracts sharks. It could, but it could also mean that hot weather causes both of these things to increase.

NOTE: depending on the type of data you use, a 40% R^2 value could be very bad or could be very good. It all depends on the degree of control the person gathering the data had over the data. If it is a highly controlled scientific experiment, then 40% is likely bad. If it is publicly available data that was not part of an experiment, then 40% is possibly quite good.

You will be graded according to the following rubric. Each criterion is graded 0, 1, 2, or 3:

Plots – detailed and easy to read
  0: Many errors, so that the reader must work very hard to guess what the author may be saying.
  1: Errors are common on the author’s plots, making it difficult to interpret and/or read values from the graphic.
  2: Generally, plots are easy to read with only a few small errors or 1-2 larger errors.
  3: Tick marks, scale, labels etc. are done with very few errors (only 1 or 2 very small errors).

Plots: interpretation
  0: Mostly inappropriate or simplistic plots and interpretations. Little insight shown.
  1: Plots and interpretations are “shot from the hip”; they show promise, but there is too little detail and deep understanding is not shown.
  2: Student didn’t always use the best plots for the context. Interpretations are solid, but could have gone further.
  3: Student has used the proper plots for the data and for the question they are asking. They have interpreted them well and show interesting insights that are easy to read.

Data: proper values, comparisons
  0: The student has practically no concern for making sure that the values and variables they compare or contrast are fair to compare.
  1: Students compare various things without thinking about whether they are fair comparisons (apples to apples) or not.
  2: Students compare values that make sense at a gut level, but under closer inspection they should have been more precise.
  3: Students compare values/performance properly (e.g. using per capita measures instead of total # incidents when appropriate).

Data: sufficient data, data quality
  0: Data is quite poor, or interpreted so poorly that doing any work with it seems meaningless.
  1: Obvious issues with the data (too few points, or of poor quality/confusing character).
  2: There starts to be doubt about the quality of the data and the # of data points in various plots or regression models.
  3: There are sufficient data points so that claims made in the report are valid (ideally >= 1,000). If there are far fewer points than this, students justify why they still used this data set.

Regression Models: comparing relevant variables to one another
  0: Very little insight to be gained from the created models.
  1: Student models are haphazard, and some relations that are shown are somewhat simplistic and give little insight.
  2: Student models are basic, but they show some meaningful insights and relations.
  3: Students put together models that make sense and seek to relate different variables to one another in ways that are not simplistic but are good ideas for exploring and show promise of insight.

Regression Models: interpretation
  0: Claims made about models have large errors in logic or understanding. Very little if any work done in trying to get R^2 higher in their models.
  1: Many or significant interpretations are not correct. Student is confused about what the regression model means. Few attempts to drive up R^2 values.
  2: Interpretations are imprecise but generally correct. How this model may help a client is not obvious. Their models seem promising, but they didn’t go further to refine their models and boost R^2.
  3: Student has accurately interpreted their models: they understand the meaning of the coefficients and relate their model to real life and to what they or someone else may be able to do with this data. Students have experimented with many predictors in creative ways (shown either in the body of the text or in the appendix) to push up R^2 values as much as possible.

The Report tells a good “story”
  0: Very little done with regards to telling any recognizable story in terms of finding a real-life “question”, “problem”, or assumption, digging in to find insights, and then clarifying in the end how a system or system behaviour really operates (as opposed to the initial assumptions).
  1: The student’s work reads like many unrelated facts, however interesting, which do not in the end seem to relate much to one another.
  2: The work is related, but the story doesn’t hold together very well; it is more like a collection of related facts or ideas.
  3: The thought process is clear, and arguments flow logically. The reader can see how and why the author has put the data together and how different questions relate. Interesting observations are made by the author.
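
Illustration of the coefficient interpretation for the Recyc model under item 2 (Models) above. This is a minimal sketch and not part of the lab requirements. It uses plain Python only to keep the arithmetic visible (the lab itself is done in R). The function name predict_recyc and the garbage tonnages are hypothetical; only the equation Recyc = -6,425 + .739*Garbage comes from the handout.

```python
# Model from the handout: Recyc = -6,425 + .739 * Garbage (both in tonnes).
INTERCEPT = -6425.0
SLOPE = 0.739


def predict_recyc(garbage_tonnes):
    """Predicted tonnes of recyclables for a given tonnage of garbage."""
    return INTERCEPT + SLOPE * garbage_tonnes


# Hypothetical garbage tonnages, chosen only for illustration.
for garbage in (10_000, 20_000, 40_000):
    recyc = predict_recyc(garbage)
    # What one extra tonne of garbage adds to the prediction: always the slope.
    change = predict_recyc(garbage + 1) - recyc
    # Predicted recyclables as a share of garbage: differs at every tonnage.
    share = recyc / garbage
    print(f"garbage={garbage}  recyc={recyc:.0f}  "
          f"change per tonne={change:.3f}  share={share:.1%}")
```

The change per extra tonne of garbage is .739 at every tonnage, while predicted recyclables as a share of garbage moves from roughly 10% to roughly 58% across these inputs. That is why the coefficient should be read as a change per tonne and not as a percentage.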