Data Analysis Exam 1
Due 5:00 pm October 8

Do not wait until the last minute to submit your test. We will not accept late submissions.

1 Instructions

1. This exam is a week-long take-home data analysis exam.
2. You are allowed to use your notes as well as other reference books you feel you might need. You should use the statistical software R to perform your analysis.
3. You are NOT allowed to consult with any person other than your professor and your teaching assistant.
4. You are expected to comply with the CMU policy on academic integrity. Unauthorized help will result in failing the exam, and possibly more severe disciplinary action.
5. Submit two files: one is the PDF of your report and the other is your code (either a text file or R Markdown file).
6. Clearly comment your code so that it is clear which parts of your code go with which parts of your report.
7. Do not submit Word files; they will not be graded.
8. There are many correct approaches to data analysis. You should not think that there is a single, correct approach to analyzing these data. The important thing is to explain clearly what you are doing and why.
9. Your report is limited to a maximum of 7 pages (including graphs and tables). Make sure your report is clear and easy to read.

2 Data and Research Problem

The goal of this project is to predict species richness (number of species) for some islands, based on a few covariates. The data can be found on Canvas under Files. The file is called: PlantData.txt. The first row has the variable names. The variables are:

Variable Name    Description
NR               Native plant species richness
Area             area in hectares
Latitude         latitude in degrees North
Elev             elevation in meters above sea level
Dist             distance from mainland in km
Soil             number of soil types
Years            years since isolation
Deglac           years since deglaciation
Human.pop        human population

The main outcome is native plant species richness, which is the count of the number of different plant species.

The researcher has several research questions and goals:

(1) The investigator hypothesizes that native species richness (NR) can be predicted from Area, Latitude, Elev, Dist, Soil, Years, Deglac, Human.pop.
(2) The investigator hypothesizes that the most important predictors are Area, Elevation and Soil types.
(3) The investigator hypothesizes that better models will be obtained if transformations are applied to some covariates.

You should analyze these data and address the three points above. You can use any methods you have learned in the course. Summarize your analysis in a report.

3 The Report

Remember, the report is limited to a maximum of 7 pages. Your report should have the following sections:

1. Introduction. Briefly describe the data and the research problem.
2. EDA (Exploratory Data Analysis). Provide any graphical displays or numerical summaries for the variables that you think are useful. Describe your results.
3. Modeling. Start by building any multiple linear regression models you think are appropriate.
4. Diagnostics and model selection. Use diagnostics to evaluate your models. Take any actions you think are justified, such as transformations, removing outliers, or removing variables. Explain what decisions you make and why you made them. Include a few plots if you think they are important.
5. Final Models. Summarize your final models: report the parameter estimates, standard errors, confidence intervals, and p-values. Interpret the fitted models in the context of the problem. If you have several models, then compare them. Note that there are several ways to compare different regression models: (i) partial F-tests, (ii) residuals and diagnostics and (iii) cross-validation.
6. Discussion. What are your final conclusions? Mention any limitations of your analysis.

4 Suggestions for Writing a Clear Report

1. Your writing should be well-organized, free of grammatical errors, and written in complete sentences.
2. All numerical results or summaries should be reported with appropriate measures of uncertainty attached when applicable.
3. Figures and tables should be easy to read, with informative captions, axis labels and legends.
4. Make sure your code is organized, commented and that it is easy for others to read and understand.
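
The model comparisons named under Final Models (partial F-tests and cross-validation) could be sketched in R roughly as follows. This is only an illustration under stated assumptions, not a recommended analysis: the data frame `plants` is simulated so the block runs on its own, its values are made up, and the choice of `log(Area)` and of which variables enter each model is hypothetical. With the real file, the commented `read.table` call (using `header = TRUE` because the first row holds the variable names) would take the place of the simulation.

```r
# Hypothetical sketch: comparing two nested linear models.
# The data below are SIMULATED stand-ins so the block is self-contained;
# they say nothing about the real islands in PlantData.txt.
set.seed(1)
n <- 40
plants <- data.frame(
  Area = exp(rnorm(n, mean = 5, sd = 2)),
  Elev = runif(n, min = 10, max = 500),
  Soil = rpois(n, lambda = 3) + 1,
  Dist = runif(n, min = 1, max = 100)
)
plants$NR <- rpois(n, lambda = exp(1 + 0.3 * log(plants$Area)))

# With the real data one would instead read the file, for example:
# plants <- read.table("PlantData.txt", header = TRUE)

# Two nested candidate models (an illustrative choice, not a prescription)
small <- lm(NR ~ log(Area) + Elev + Soil, data = plants)
full  <- lm(NR ~ log(Area) + Elev + Soil + Dist, data = plants)

# (i) Partial F-test: does adding Dist improve on the smaller model?
ftest <- anova(small, full)
print(ftest)

# (iii) Leave-one-out cross-validation via the hat-matrix identity:
# for least squares, the deleted residual is e_i / (1 - h_ii).
loocv <- function(fit) mean((residuals(fit) / (1 - hatvalues(fit)))^2)
cv <- c(small = loocv(small), full = loocv(full))
print(cv)
```

The partial F-test only applies when one model is nested in the other, whereas the cross-validated error can also rank non-nested models, for instance two fits that transform different covariates but keep the response on the same scale.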
Oct 08, 2021