Filename: Pennstate2
Extension: .txt
Summary: Data from n = 205 students in a statistics class for students in the social and behavioral sciences. The survey was done in the Spring semester of 2000. (Source: Mind on Statistics (Utts and Heckard)) The dataset has been modified for this homework.

Variable names in order from left to right:

Column  Name    Description
A       Gender  Male or Female
B       Tattoo  Does student have a tattoo (Yes or No)
C       CDs     Categorized number of the student's estimate of how many music CDs he/she owns (BelowAvg = 0 to 30, Average = 31 to 80, AboveAvg = more than 80)
D       Height  Self-reported height in inches
E       EarPrc  Number of ear piercings
Classify the variables
1. Create a table of variable definitions for all of the variables in the Pennstate2 data file.
Univariate descriptive statistics and visualizations
2. For each qualitative variable: a) provide a table containing the frequency, relative frequency, cumulative frequency, and totals, b) generate a bar chart and a pie chart, and c) discuss one frequency and one relative frequency in the context of the variable.
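For reference, the columns of the frequency table in part a) are related as in the minimal Python sketch below. The assignment itself is completed in Excel; the function name frequency_table and the toy Tattoo values are illustrative assumptions, not values from Pennstate2.

```python
from collections import Counter

def frequency_table(values, order=None):
    """Return (rows, totals): each row is (category, frequency,
    relative frequency, cumulative frequency)."""
    counts = Counter(values)
    categories = order if order is not None else sorted(counts)
    n = len(values)
    rows, running = [], 0
    for cat in categories:
        freq = counts[cat]          # Counter gives 0 for unseen categories
        running += freq             # cumulative frequency so far
        rows.append((cat, freq, freq / n, running))
    totals = ("Total", n, 1.0)      # relative frequencies sum to 1
    return rows, totals

# Toy values for illustration only; NOT taken from Pennstate2.
tattoo = ["No", "No", "Yes", "No", "Yes", "No", "No", "No"]
rows, totals = frequency_table(tattoo, order=["No", "Yes"])
for cat, freq, rel, cum in rows:
    print(f"{cat:<6}{freq:>5}{rel:>8.3f}{cum:>5}")
print(f"{totals[0]:<6}{totals[1]:>5}{totals[2]:>8.3f}")
```

For an ordinal variable such as CDs, pass the categories in their natural order (BelowAvg, Average, AboveAvg) so that the cumulative frequency is meaningful.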
3. Observe the pie chart and bar chart for an ordinal variable. Which is a better visualization of the variable? Explain. [You should only complete this exercise if there is an ordinal variable in the data set.]
4. For each quantitative variable: a) use the Descriptive Statistics analysis tool to generate a table of descriptive statistics (including the margin of error for the 95% confidence interval), b) generate a box plot and a histogram (with bin frequencies), c) identify the shape of the distribution, and d) comment on the most appropriate measure of central tendency and the most appropriate measure of dispersion and interpret the associated values in the context of the variable.
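The margin of error requested in part a) is a critical value multiplied by the standard error of the mean. The Python sketch below shows that arithmetic under stated assumptions: it uses the normal critical value (about 1.96) because the standard library has no t distribution, so a t-based margin would be slightly wider for small groups. The function name describe and the toy heights are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def describe(values, confidence=0.95):
    """Basic descriptive statistics plus a normal-approximation margin of error."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)                   # standard error of the mean
    # Assumption: z critical value instead of a t critical value.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "se": se,
        "margin": margin,
        "ci": (mean - margin, mean + margin),
    }

# Toy heights in inches for illustration only; NOT taken from Pennstate2.
heights = [64, 66, 68, 70, 72]
stats = describe(heights)
print(f"mean = {stats['mean']}, margin of error = {stats['margin']:.3f}")
```

The 95% confidence interval is then the mean plus or minus the margin of error.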
Dr. Kimberly Gardner 2019 Assessment 1
Multivariate descriptive statistics and visualizations
5. Stratify the variable EarPrc by the variable Tattoo. Complete the following: a) make a table of the descriptive statistics for each group and include the margin of error for the 95% confidence interval, b) generate a side-by-side box-and-whisker plot, c) interpret the 95% confidence interval for each group, and discuss whether there is evidence that the group means differ significantly.
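One informal way to approach part c) is to compute the interval for each group and check whether the intervals overlap. The Python sketch below illustrates this under assumptions: the helper names are hypothetical, the data are toy values rather than Pennstate2 values, and the intervals use the normal critical value. Intervals that do not overlap suggest a difference between the means; overlapping intervals are inconclusive rather than proof of no difference.

```python
import math
import statistics
from collections import defaultdict
from statistics import NormalDist

def group_intervals(groups, values, confidence=0.95):
    """Split values by group label and return {label: (lower, upper)}."""
    by_group = defaultdict(list)
    for label, value in zip(groups, values):
        by_group[label].append(value)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # assumption: z, not t
    result = {}
    for label, vals in by_group.items():
        mean = statistics.mean(vals)
        margin = z * statistics.stdev(vals) / math.sqrt(len(vals))
        result[label] = (mean - margin, mean + margin)
    return result

def overlap(a, b):
    """True if intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Toy values for illustration only; NOT taken from Pennstate2.
tattoo = ["Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"]
ear_prc = [2, 3, 4, 3, 3, 0, 1, 0, 1, 1, 0]
intervals = group_intervals(tattoo, ear_prc)
for label, (lo, hi) in intervals.items():
    print(f"{label}: ({lo:.3f}, {hi:.3f})")
print("Intervals overlap:", overlap(intervals["Yes"], intervals["No"]))
```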
6. Select a reasonable variable for stratifying the variable Height, and explain your selection. Complete the following: a) make a table of the descriptive statistics for each group and include the margin of error for the 95% confidence interval, b) generate a side-by-side box-and-whisker plot, c) interpret the 95% confidence interval for each group, and discuss whether there is evidence that the group means differ significantly.
7. Generate a contingency table for Gender by CDs. Provide the following: a) a frequency table, b) a relative frequency table, c) a row frequencies table, d) a column frequencies table, e) an interpretation of the value in the second row, first column of each table in the context of the data.
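The four tables differ only in the denominator: the grand total for relative frequencies, the row total for row frequencies, and the column total for column frequencies. A minimal Python sketch of that relationship follows. The function name contingency and the toy data are illustrative assumptions, and the sketch assumes every row and column level occurs at least once so that no total is zero.

```python
from collections import Counter

def contingency(rows, cols, row_levels, col_levels):
    """Return (freq, relative, row_freq, col_freq) as nested lists."""
    counts = Counter(zip(rows, cols))
    freq = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    n = sum(sum(row) for row in freq)            # grand total
    row_tot = [sum(row) for row in freq]
    col_tot = [sum(col) for col in zip(*freq)]
    relative = [[x / n for x in row] for row in freq]
    row_freq = [[x / rt for x in row] for row, rt in zip(freq, row_tot)]
    col_freq = [[x / ct for x, ct in zip(row, col_tot)] for row in freq]
    return freq, relative, row_freq, col_freq

# Toy values for illustration only; NOT taken from Pennstate2.
gender = ["Female", "Female", "Female", "Male", "Male", "Male", "Male", "Female"]
cds = ["BelowAvg", "Average", "Average", "AboveAvg",
       "Average", "BelowAvg", "AboveAvg", "AboveAvg"]
freq, relative, row_freq, col_freq = contingency(
    gender, cds, ["Female", "Male"], ["BelowAvg", "Average", "AboveAvg"]
)
# Second row, first column: Male students with BelowAvg CDs.
print(freq[1][0], relative[1][0], row_freq[1][0], col_freq[1][0])
```

In the toy data, the second-row, first-column cell reads differently in each table: a count of male students with BelowAvg CDs, their share of all students, their share of male students, and their share of BelowAvg students.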
8. Generate a stacked bar chart and a 100% stacked bar chart for the row frequencies of Gender by CDs. Provide the following: a) a description of the differences between the graphics, b) a determination of which graph is a better visualization of the relationship between the variables and why.
Random number generation and simple random sampling
9. Generate a simple random sample of 45 observations using the RAND function. Display your sample with the sorted random numbers in the table. Generate the descriptive statistics for ONE quantitative variable and ONE qualitative variable. Comment on how the statistics compare to those of the overall data.
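The RAND approach amounts to attaching a uniform random number to every observation, sorting by that number, and keeping the first 45 rows. The Python sketch below mirrors that procedure. The function name simple_random_sample and the seed are hypothetical, and row numbers 1 to 205 stand in for the actual student records.

```python
import random

def simple_random_sample(records, k, seed=None):
    """Attach a uniform random key to each record, sort by key, keep the first k."""
    rng = random.Random(seed)                      # seed only for reproducibility
    keyed = [(rng.random(), rec) for rec in records]
    keyed.sort(key=lambda pair: pair[0])           # sort by the random number
    return keyed[:k]                               # (random number, record) pairs

# Row numbers stand in for the 205 observations; no real data are used here.
population = list(range(1, 206))
sample = simple_random_sample(population, 45, seed=2019)
for key, row in sample[:5]:
    print(f"{key:.4f}  row {row}")
```

Because every observation receives its own independent random number, each set of 45 rows is equally likely to be selected, and no row can appear twice.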
Follow the guidelines provided on the course D2L page for preparing and submitting your paper. Points will be deducted for not following instructions and for unscholarly papers.