- You must pick one column of categorical data (words) and one column of quantitative data (numerical measurements) from the data set below. You will try to determine whether the categorical data is related to the quantitative data by pasting the data into StatKey and performing an ANOVA hypothesis test. The F-test statistic, critical value, and P-value will all be calculated with randomized simulation in StatKey. You must include pictures of the three printouts from StatKey: (1) the Original Sample printout showing the sample sizes, means, standard deviations, and the F-test statistic; (2) the randomized StatKey simulation showing 5% in the top of the right tail and the critical value below it; (3) the randomized StatKey simulation showing the F-test statistic below the right tail and the P-value proportion in the top of the right tail.
- Read the Project Directions carefully and follow them. The directions walk you through the project step by step.
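StatKey carries out the randomized simulation for you, so no programming is needed for this project. For readers curious about what the simulation is doing, here is a minimal Python sketch of the idea, under stated assumptions. It is not StatKey's actual code. The function names `f_statistic` and `randomization_test` are made up for this illustration, and the three small groups of numbers are invented (they are not from the survey data). The sketch computes the F-test statistic from the original groups, shuffles the quantitative values among the groups many times to build a simulated F-distribution, and then reads off an approximate critical value (the start of the top 5% of the right tail) and an approximate P-value (the proportion of simulated F values at or above the original F).

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: variance between groups divided by variance within groups."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)  # number of groups, total sample size
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def randomization_test(groups, reps=1000, alpha=0.05, seed=0):
    """Simulate F under the null by shuffling values among the groups."""
    rng = random.Random(seed)          # fixed seed only so the sketch is repeatable
    observed = f_statistic(groups)     # F from the original sample
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    simulated = []
    for _ in range(reps):
        rng.shuffle(pooled)            # break any link between group and value
        regrouped, start = [], 0
        for size in sizes:             # deal the shuffled values back into groups
            regrouped.append(pooled[start:start + size])
            start += size
        simulated.append(f_statistic(regrouped))
    simulated.sort()
    # Approximate critical value: about alpha of the simulated F values lie at or above it.
    cut = min(round((1 - alpha) * reps), reps - 1)
    critical_value = simulated[cut]
    # Approximate P-value: proportion of simulated F values at or above the original F.
    p_value = sum(f >= observed for f in simulated) / reps
    return observed, critical_value, p_value


# Invented example data (NOT from the survey): three groups of five values each.
groups = [
    [12, 15, 11, 14, 13],
    [18, 21, 17, 20, 19],
    [14, 16, 13, 17, 15],
]
f_stat, critical_value, p_value = randomization_test(groups)
print("F-test statistic:", round(f_stat, 3))
print("Approximate critical value:", round(critical_value, 3))
print("Approximate P-value:", p_value)
```

Because the shuffling is random, a different `seed` gives a slightly different critical value and P-value, which is why simulated results are approximate. The F-test statistic itself depends only on the original data and does not change.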
TEACHOUT MATH 140 PROJECT: ANOVA Randomization Hypothesis Test

Plagiarism Warning: Do not copy an online article or another math 140 student’s work. This is plagiarism. Cheating will go on your permanent record. Further discipline will be determined by the COC dean of students. A plagiarism checker may be used if I suspect cheating. I have also had problems with students simply copying online ANOVA articles as their project. Do not do this either; it is also plagiarism. Your project answers should be intro level and use the given data and the simple, straightforward language used during lectures and in the textbook. Online articles are usually written by people in advanced statistics and will not answer these questions appropriately.

Grading Rubric: The project is worth 100 points. There are 25 things to put on your project, listed below in bold as “Put these on the Project Report”. Some of them are pictures found on StatKey. Others are answers to questions or sentences to write. Each of these is worth 4% of the project grade (4 points each).

Key Note about Assumptions (Conditions): This is a census of students in the Fall 2015 semester. Even though it is not a random sample, we will assume that the sample data represents the population of COC statistics students. We will also assume that individual observations from this data are independent of each other, even though in reality this might not be the case.

Part I: Pick your two columns of data

Open the Math 075/140 Combined Survey Data Fall 2015 (Data Link). This data was taken from statistics students (math 140) and pre-stat students (math 075) in the Fall 2015 semester. There are 36 columns of data to choose from. Pick one categorical data set (column of words) and one quantitative data set (numerical measurement data) from the 075/140 combined survey data.

Put these on the Project Report:
1. What categorical variable did you pick? For example: The type of sandwich left out to spoil.
2. What quantitative variable did you pick? For example: The number of ants.

Part II: Write the Null Hypothesis and Alternative Hypothesis and choose your Claim

Test the claim that there is a relationship between your categorical variable and your quantitative variable. Use the following null and alternative hypotheses. The number of options in your categorical data will determine the number of groups in your ANOVA test. If your categorical data has two options, then the null will be µ1 = µ2. If it has three options, then the null will be µ1 = µ2 = µ3. If it has six options, then the null will be µ1 = µ2 = µ3 = µ4 = µ5 = µ6.

Here is an example null and alternative hypothesis. Your null and alternative hypotheses should have symbolic notation with “µ” and also the relationship statement. Do not just copy the null and alternative below. You should not say that the categorical and quantitative variables are related; you should say, for example, that the type of sandwich is related to the number of ants, if those were the variables you picked. Also, the claim could be either statement. Which do you think is true? Do you think the two columns of data you picked are related or not? That is the claim. Be sure to label which statement is the claim. This is a right-tailed test. (Remember, all ANOVA tests are right-tailed.)

H0: µ1 = µ2 = µ3 = µ4 = … (The type of sandwich is NOT RELATED to the number of ants.)
HA: at least one is ≠ (The type of sandwich is RELATED to the number of ants.) CLAIM

Put these on the Project Report:
3. Write your null hypothesis H0 as seen in the example above. Be sure to include the symbolic notation with “µ” AND the relationship statement with the two variables you picked.
4. Write your alternative hypothesis HA as seen in the example above. Be sure to include the symbolic notation with “µ” AND the relationship statement with the two variables you picked.
5. Is your claim that the categorical and quantitative variables are related (HA) or not related (H0)?

Data Link: http://teachoutcoc.org/files/math_075_140_combined_survey_data_fall_2015.xlsx

Part III: Paste your Data into StatKey and Find your F-test statistic

Copy and paste your categorical column of data and your quantitative column of data next to each other in Excel. The categorical column should be on the left. The quantitative column should be on the right. Now highlight both columns without the titles, right-click, and copy. Do NOT copy the titles. Go to “ANOVA for Difference in Means” under the “More Advanced Randomization Tests” menu (StatKey Link). Click on “Edit Data” and paste the two columns of data (categorical on the left and quantitative on the right) into StatKey. Do NOT paste the titles. If you do paste the data with the titles, delete the titles. Uncheck the box for “header row” and push OK.

Put these on the Project Report:
6. Copy and paste a picture of the “Original Sample Statistics” printout into your Project Report. You can find this on the top right of the StatKey page. It should show the F-test statistic from your data, and the sample size, mean, and standard deviation for each of your groups. It should look like the following, but the numbers will be different. Do NOT copy and paste the “randomization sample” by mistake. Only copy the one that says “original sample”. The F from “original sample” is your one and only F-test statistic for your data. You do not need to copy the ANOVA table either.
   a. [Example picture: Original Sample printout for the ants/sandwich data]
7. Assumptions #1 and #2: Is the sample data either random or representative of the population, and are the individuals independent? Put this answer: “The data is not random, but we are assuming the data represents the population and that individuals are independent.”
8. Assumption #3: In the Original Sample printout, are all of the sample sizes at least 30? Answer “Yes, all of the sample sizes are 30 or higher.” OR “No. At least one of the sample sizes is below 30.” In the example Original Sample printout above, all of the sample sizes were 8, so the ant/sandwich data would not pass this assumption.
9. Assumption #4: In the Original Sample printout, there should not be any sample standard deviation that is more than twice as large as any other sample standard deviation. Is this the case for your data? Answer “Yes, the sample standard deviations are close.” OR “No, there is at least one sample standard deviation that is more than twice as large as the others.” In the example Original Sample printout above, the sample standard deviations (9.3, 14.6, 10.8, 13.9) are all close, so this would pass the assumption.
10. What is the F-test statistic from your data? (This is listed in the Original Sample printout. For example, in the printout above the F-test statistic was 5.627.)
11. Write a sentence explaining the F-test statistic. Here is an example: “The ratio of the variance between the groups to the variance within the groups is 5.627.”

StatKey Link: http://www.lock5stat.com/StatKey/advanced_1_quant_1_cat/advanced_1_quant_1_cat.html

Part IV: Create a Randomized Simulation and Calculate the Critical Value

You will now create two simulated F-distribution printouts: one to find the critical value and tail, and another to find the P-value. Click on “Generate 1000 Samples” a few times to create the simulated F-distribution.

Directions for calculating the Critical Value and tail: Click on “Right-Tail”. Change the right-tail proportion (upper box in the distribution) to 0.05. This corresponds to a 5% significance level. The number at the bottom is the Critical Value and the start of the right tail.

Put these on the Project Report:
12. Copy and paste a picture of the simulated F-distribution that has 0.05 in the upper box of the right tail and the Critical Value in the lower box of the right tail into your Project Report. It should look like the picture below, but the critical value at the bottom will be different.
   a. [Example picture: simulated F-distribution with 0.05 in the upper box of the right tail and the Critical Value in the lower box]
13. What is the critical value corresponding to 0.05 in the tail? (In the example above, the Critical Value is 3.503.)
14. Does the real F-test statistic listed on “Original Sample Statistics” fall in the right tail determined by the Critical Value, or does the F-test statistic NOT fall in the right tail? In the ants/sandwich example, the test statistic was 5.627, so it would fall in the tail that starts at the Critical Value of 3.503.
15. Does the sample data significantly disagree with the null hypothesis (F-test stat in tail), or does the sample data not significantly disagree with the null hypothesis (F-test stat not in tail)?
16. Is the variance between the groups significantly higher than the variance within the groups (F-test stat in tail), or is the variance between the groups not significantly different from the variance within the groups (F-test stat not in tail)?

Part V: Use the Randomized Simulation to Calculate the P-Value

Directions for calculating the P-value: Use the same distribution you have already created. Click on “Right-Tail”. In the lower box, where your Critical Value was, type in the actual F-test statistic listed under “Original Sample Statistics”. Once you put the F-test statistic into the bottom box of the right tail, the number in the upper box is the P-value.

Put these on the Project Report:
17. Copy and paste a picture of the simulated F-distribution that has the F-test statistic in the lower box of the right tail and the P-value in the upper box of the right tail into your Project Report. It should look like the following picture, but the test statistic and P-value will be different.