SHA571: Understanding and Visualizing Data, Cornell University: Course Project

Is this feasible to complete by 10 pm eastern time today?


SHA571: Understanding and Visualizing Data
Cornell University

Understanding and Visualizing Data Course Project

Instructions:
This project guides you through the process of applying a data decision framework to an important decision-making situation in your organization or career. Once you have completed all three parts of the project, submit this project document and any supporting documents to your instructor for grading. A submit button can be found on the Part Three assignment page. Information about the grading rubric is available on any of the course project assignment pages online. Do not hesitate to contact your instructor if you have any questions about the project.

Part One – Draft a Data-Collection Plan
In this part of the project, you will identify a situation that requires you to make a decision, identify what data you will use to inform the decision, and draft a plan to collect the data you need. In order to satisfy this part of the project, you produce a plan that answers the questions listed in the following three steps.

Step 1: Identify a decision that requires, or would be enhanced by, data analysis.
· What is the situation?
· What are the parameters or options in the decision?
· Who are the key stakeholders?
· How do you hope or expect data to help illuminate your decision?

Step 2: Identify data that will help you better understand the situation.
· What are the key performance indicators for your situation?
· What defines the range of cases you will consider?
· What are the variables you will consider?
· Is each variable categorical or quantitative?
· What purpose does each variable have in informing your decision?

Step 3: Develop a data-gathering plan. (100-250 words)
· Where will the data come from?
· Is it observational or experimental data?
· Who will collect it?
· How much data will you need (sample size)?
· How will you assure that it is representative of the population?
· What steps will you take to mitigate potential bias?

Part Two – Identify Data Summaries and Visualizations
In this part of the project, you will identify the statistical summaries and visualizations you believe will best help you make your decision. In order to complete this part of the project, you need to answer the following questions.
· What summary statistics will you use to inform your decision?
· Are you interested in statistics that are sensitive to or resistant to outliers, and why?
· What visualizations will you use to inform your decision? Be sure to indicate what variables and what scale you are using for each visualization (e.g. histogram showing frequency of page hits per hour over a 24-hour period).
· If you have data related to your project, create at least one of the visualizations you have listed above and include a copy of it here.
Note: Though your work will only be seen by those grading the course and will not be used or shared outside the course, you should take care to obscure any information you feel might be of a sensitive or confidential nature.

Part Three – Data and Your Decision
Part three is the culmination of your project. In this part of the project, you will make a determination about whether the data you have or planned to have is going to be sufficient to make a good decision. If you have data related to your project, you will create an interactive dashboard to help you analyze the data. If not, you will create a mock-up of the dashboard. In order to complete this part of the project, you need to answer the following questions and present either a working dashboard or a mockup of a dashboard.

Part a: Questions about your data-model-insight framework
· What are you attempting to model with your data?
· What are the KPIs for the situation you are trying to understand?
· What is the relationship between your variables and the KPIs?
· What are the limitations of your model?
· Do you feel your model, as defined, is “good enough” to inform your decision? Why or why not?

Part b: Your Project Dashboard
If you have data for your project, you may be able to put together a dashboard in Excel that is a good working model. If so, include an Excel workbook as part of your project submission. As an alternative, you may turn in a mockup that indicates what elements you would like your dashboard to include. In either case, your dashboard or mockup should include
· a readout of KPI’s, clearly labeled as such
· one or more visualizations
· summary statistics, as needed
Indicate how the elements of the dashboard are connected. In a mockup, this could be arrows drawn between different elements on the dashboard. For a dashboard with actual data, you might include this information as text.

© 2016 eCornell. All rights reserved. All other copyrights, trademarks, trade names, and logos are the sole property of their respective owners.
Answered Same Day, Aug 20, 2021


Pooja answered on Aug 21 2021
Table of Contents
Part 1 - Data collection plan
Part 2 - Data summary and Visualization
Part 3 – Modelling
    Analysis
    Dashboard
Appendix
    Part 1
    Part 2
    Part 3
References
Part 1 - Data collection plan
Suicide is a leading cause of death in the United States. It ranks as the number 2 cause of death for the age groups 10-14, 15-24, and 25-34 years, and as number 4 for the age groups 35-44 and 45-54. With the help of data analysis, I want to analyze the trend in suicide rates on the basis of gender. The situation of concern is the increase in the suicide rate from 4.1 per 100,000 in 2001 to 26.1 per 100,000 in 2017. The data can help predict the total number of suicides in the future. The data can also show whether there is a significant difference in the average number of suicides between males and females.
A data set of suicide rates in the United States will help to understand the situation. The data is filtered to the years 2000 through 2015. The variables of concern are year, sex, and suicides/100k pop. The quantitative variables are year and suicides/100k pop. The categorical variable is sex, which takes the values male or female. The sex variable will be used to check whether the average number of suicides is greater for males or for females. The variable time (coded as 1 for January 2000) will be used to build a regression model that can predict suicides/100k pop in the future (Chambers, 2017).
The dataset is obtained from a secondary source, kaggle.com. It is observational data, as the values are recorded from each unit without any intervention. Government agencies of various countries collect the data, which is then compiled and published on kaggle.com. The data for the years 2000-2015 should be a good representation of the population. Considering 12 months for each of the 16 years, a sample size of 192 is appropriate. Bias is mitigated by including participants of various age groups for both males and females (Garvan, 2001).
Part 2 - Data summary and Visualization
There is an increasing trend in the average number of suicides per 100000 population from the year 2000 until 2015. 
    suicides/100k pop
    Statistic             Value
    Mean                  12.94541667
    Standard Error        0.843362193
    Median                7.115
    Mode                  0.34
    Standard Deviation    11.68596934
    Sample Variance       136.5618794
    Kurtosis              -0.598106571
    Skewness              0.767366793
    Range                 42.13
    Minimum               0.27
    Maximum               42.4
    Sum                   2485.52
    Count                 192
The average number of suicides/100k pop is about 13, with a high standard deviation of 11.7. The positive skewness value (0.77), along with the histogram, indicates that the distribution of suicides is slightly skewed to the right: there are very few years/months with a high suicide rate (Draper & Smith, 1998).
As evident from the box plot, there are no outliers in this data set.
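Several entries in the summary table above follow from one another, which gives a quick way to sanity-check the output. The sketch below is illustrative only: it takes the reported values as given (the underlying data is not reproduced here) and recomputes the mean, standard error, sample variance, and range from the other entries. The dictionary and function names are hypothetical.

```python
import math

# Values copied from the descriptive-statistics table above.
reported = {
    "mean": 12.94541667,
    "standard_error": 0.843362193,
    "standard_deviation": 11.68596934,
    "sample_variance": 136.5618794,
    "range": 42.13,
    "minimum": 0.27,
    "maximum": 42.4,
    "sum": 2485.52,
    "count": 192,
}


def derived_statistics(r):
    """Recompute the statistics that follow from other table entries."""
    return {
        "mean": r["sum"] / r["count"],
        "standard_error": r["standard_deviation"] / math.sqrt(r["count"]),
        "sample_variance": r["standard_deviation"] ** 2,
        "range": r["maximum"] - r["minimum"],
    }


derived = derived_statistics(reported)
for name, value in derived.items():
    print(f"{name}: reported {reported[name]:.6f}, recomputed {value:.6f}")
```

If the recomputed values agree with the reported ones to several decimal places, the table is at least internally consistent. This says nothing about whether the data itself is correct.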
    Row Labels     Average of suicides/100k pop
    female         4.593020833
    male           21.2978125
    Grand Total    12.94541667
The average number of suicides for males is much greater than that for females. On average there are about 21 suicides per 100,000 population for males, compared with only about 5 suicides per 100,000 population for females.
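The comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the group means from the pivot table. The group sizes (96 rows per sex) are an assumption, chosen because the reported grand total equals the simple average of the two group means, and all names are hypothetical.

```python
# Group means copied from the pivot table above.
mean_female = 4.593020833
mean_male = 21.2978125
reported_grand_mean = 12.94541667

# ASSUMPTION: the 192 rows split evenly, 96 per sex (not stated in the text).
n_female = 96
n_male = 96


def weighted_mean(means, sizes):
    """Combine group means into an overall mean, weighting by group size."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)


grand_mean = weighted_mean([mean_female, mean_male], [n_female, n_male])
gap = mean_male - mean_female      # absolute difference per 100,000
ratio = mean_male / mean_female    # how many times higher the male rate is

print(f"grand mean: {grand_mean:.6f} (reported {reported_grand_mean})")
print(f"male minus female: {gap:.2f} per 100,000")
print(f"male / female ratio: {ratio:.2f}")
```

Under the equal-size assumption the weighted mean reproduces the reported grand total, and the raw gap of about 16.7 per 100,000 is close to the male coefficient reported in Part 3.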
Part 3 – Modelling
Analysis
The regression model is given by: suicides/100k pop = 3.93 + 0.0067*time + 16.73*male.
About 51% of the variation in suicides is explained by time and male. This percentage is considered fair, and it suggests that the model could be improved by adding other significant variables (Chatfield, 2018).
The null hypothesis is that the model is not significant; the alternative hypothesis is that the model is significant. With F = 100.16 and p < 0.05, the null hypothesis is rejected at the 5% level of significance. There is sufficient evidence to conclude that the model is significant (Montgomery, Peck, & Vining, 2012).
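The overall F statistic and the share of variation explained are two views of the same fit, so one can be recovered from the other. The sketch below assumes an ordinary least squares model with an intercept, k = 2 predictors (time and male), and n = 192 observations (the Count reported in Part 2). These settings are assumptions, since the regression output itself is not shown here, and the function names are hypothetical. It uses the identity F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).

```python
def f_from_r_squared(r_squared, n_obs, n_predictors):
    """Overall F statistic implied by R^2 for a model with an intercept."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


def r_squared_from_f(f_stat, n_obs, n_predictors):
    """Invert the identity above to recover R^2 from the F statistic."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (df_model * f_stat) / (df_model * f_stat + df_resid)


# ASSUMPTIONS: n = 192 observations, k = 2 predictors (time, male).
N_OBS = 192
N_PREDICTORS = 2

implied_r2 = r_squared_from_f(100.16, N_OBS, N_PREDICTORS)
round_trip_f = f_from_r_squared(implied_r2, N_OBS, N_PREDICTORS)

print(f"R^2 implied by F = 100.16: {implied_r2:.4f}")
print(f"F recovered from that R^2: {round_trip_f:.2f}")
```

Under these assumptions the implied R^2 is roughly 0.51, which agrees with the 51% figure quoted above.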
With each 1-month increase in time, the suicide rate increases by 0.0067 suicides per 100,000 population. However, with t = 0.63 and p > 0.05, this coefficient is not statistically significant.
For males, the suicide rate is 16.7 suicides per 100,000 population higher than for females. This coefficient is significant, with t = 14.15 and p < 0.05.
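To make the coefficient interpretation concrete, the fitted equation can be wrapped in a small function. This is a minimal sketch using the rounded coefficients quoted above. The dummy coding (male = 1 for males, 0 for females) is an assumption consistent with the interpretation given, time is the index described in Part 1 (1 = January 2000), and the function name is hypothetical.

```python
# Rounded coefficients from the fitted model reported above.
INTERCEPT = 3.93
COEF_TIME = 0.0067
COEF_MALE = 16.73


def predict_rate(time, male):
    """Predicted suicides per 100k pop.

    time: period index, with 1 = January 2000 (as defined in Part 1).
    male: ASSUMED dummy coding, 1 for male and 0 for female.
    """
    if male not in (0, 1):
        raise ValueError("male must be 0 (female) or 1 (male)")
    return INTERCEPT + COEF_TIME * time + COEF_MALE * male


# Predictions at the first and last periods of the sample (time = 1 and 192).
for t in (1, 192):
    female_rate = predict_rate(t, 0)
    male_rate = predict_rate(t, 1)
    print(f"time={t}: female {female_rate:.2f}, male {male_rate:.2f}")
```

Because the model has no interaction term, the predicted male-female gap is the same 16.73 at every time point, and the time effect over the whole sample is small (about 0.0067 × 191 ≈ 1.3 per 100,000).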
A limitation of regression analysis is that it assumes normality of the residuals and equality of error variances. These assumptions are violated here, as seen in the normal probability plot and the time residual plot: the points in the normal probability plot do not follow the expected straight-line pattern, and the points in the time residual plot are not randomly scattered.
Dashboard
Appendix
Part 1
    year    time    sex    suicides/100k...