Individual R Assignment #9:
1) Using phone.xlsx in Bb, we would like to test the effect of the type of phone on errors, using alpha = 0.05. Do all the tests needed to find the rankings of phones based on errors.
2) Using supplier.xlsx in Bb, we would like to test the effect of supplier on strength, using alpha = 0.05. Do all the tests needed to find the rankings of suppliers based on strength.
3) Using design.xlsx in Bb, we would like to test the effect of design on strength, using alpha = 0.05. Do all the tests needed to find the rankings of designs based on strength.
Sheet1 (columns: strength, design)
design1: 5, 4, 6, 22, 3, 4, 5, 6
design2: 7, 8, 10, 8, 9, 8, 9, 6
design3: 5, 6, 5, 7, 5, 6, 7, 8

anovaLogPhone (columns: errors, phone)
phone1: 2, 1, 2, 3, 4, 3, 2, 1, 10
phone2: 6, 7, 8, 9, 8, 7, 8, 5, 7
phone3: 2, 3, 4, 3, 2, 3, 4, 5, 3

supplier (columns: strength, supplier)
A: 30.5, 29.4, 31.3, 28.4, 29.8, 32.9, 29.3, 29.2, 29.3, 28.3
B: 22.6, 23.7, 19.6, 23.8, 27.1, 27.1, 25.3, 24.0, 21.2, 24.5
C: 27.7, 18.6, 20.8, 25.1, 17.7, 19.6, 24.1, 20.8, 24.7, 22.9
D: 21.5, 20.0, 21.1, 22.7, 16.0, 25.4, 19.9, 22.6, 17.5, 20.4
E: 20.6, 18.0, 19.0, 22.1, 13.2, 19.2, 24.0, 17.2, 19.9, 18.0

Statistics for Business and Economics
BUS 525: Quantitative Methods for Business Research
Content: ANOVA test and Kruskal-Wallis test

ANOVA or Kruskal Wallis test

ANOVA stands for ANalysis Of VAriance. It is a test to compare more than two means. The name comes from the idea that, by analyzing the variances of the data, we can compare means.

For the ANOVA test, we use the terms Factor, Factor level and Dependent variable.

Example 1: If we would like to study the effect of price ($10, $15, and $20) on sales of a product, then
Factor = Price
Factor level = $10, $15, and $20
Dependent variable = sales of the product
Note: This problem compares three means of sales based on three price levels.

Example 2: If we would like to study the effect of the design of golf balls (Design 1, Design 2, Design 3 and Design 4) on the flying distances (in yards) of the balls, then
Factor = Design
Factor level = Design 1, Design 2, Design 3 and Design 4
Dependent variable = flying distance in yards
Note: This problem compares four means of flying distances in yards based on four designs.
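In R, these terms map onto a long-format data frame. It has one numeric column for the dependent variable and one factor column whose levels are the factor levels. The sketch below assumes you type in the anovaLogPhone sheet listed above by hand. The name `phone_df` is only for illustration, and in class you would read the saved CSV file instead.

```r
# Dependent variable: errors; factor: phone (three factor levels).
# Values typed in from the anovaLogPhone sheet, in sheet order.
errors <- c(2, 1, 2, 3, 4, 3, 2, 1, 10,   # phone1
            6, 7, 8, 9, 8, 7, 8, 5, 7,    # phone2
            2, 3, 4, 3, 2, 3, 4, 5, 3)    # phone3
phone <- factor(rep(c("phone1", "phone2", "phone3"), each = 9))

phone_df <- data.frame(errors = errors, phone = phone)

levels(phone_df$phone)   # the factor levels
table(phone_df$phone)    # number of observations per factor level
```

If you read the data with `read.csv()`, be aware that R 4.0.0 and later load text columns as character vectors by default. Convert the grouping column with `factor()` before you call `levels()`.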
Validity conditions for using the ANOVA test

The data for each factor level come from a normal distribution (called the normality condition). The Shapiro (Shapiro-Wilk) test can be used to check this condition. The number of Shapiro tests equals the number of factor levels: Example 1 needs 3 Shapiro tests and Example 2 needs 4.
H0: Data 1 is from a normal distribution
H1: Data 1 is not from a normal distribution

The variances of the populations are the same (called the homogeneity of variances condition). The Levene test can be used to check this condition. Only one Levene test is needed, regardless of the number of factor levels.
H0: All variances of the populations are the same
H1: Not all variances of the populations are the same

Procedure for ANOVA:
First, check these two validity conditions. If both the normality condition (checked by the Shapiro tests) and the homogeneity of variances condition (checked by the Levene test) are satisfied, then the ANOVA test can be used to compare means.
If either condition is not met, apply a log transformation. Then check the normality and homogeneity of variances conditions again using the log-transformed data.
If both conditions are met with the log-transformed data, then ANOVA on the log-transformed data can be used to compare the means of the log-transformed data.
If either condition is still not met with the log-transformed data, the ANOVA test can't be used. Instead, the Kruskal-Wallis test on the original data can be used to compare medians.
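The decision procedure above can be sketched in base R, together with the KW test and Dunn post test described next. This is a minimal sketch under stated assumptions, not the course's official code.

- `levene_p()`, `choose_test()` and `dunn_sketch()` are hypothetical helper names.
- `levene_p()` computes the median-centred Levene test, which is the form `car::leveneTest()` uses by default, without needing the car package.
- The log step is only tried when all values are positive.
- `dunn_sketch()` reports two-sided, Bonferroni-adjusted p-values, which may not match the default output of the dunn.test package.
- R's built-in PlantGrowth data stands in for the class files.

```r
# Median-centred Levene test: a one-way ANOVA on |y - group median|.
levene_p <- function(y, g) {
  dev <- abs(y - ave(y, g, FUN = median))
  anova(lm(dev ~ g))[["Pr(>F)"]][1]
}

# TRUE when every factor level passes its Shapiro test and Levene passes.
conditions_met <- function(y, g, alpha = 0.05) {
  shapiro_p <- tapply(y, g, function(x) shapiro.test(x)$p.value)
  all(shapiro_p > alpha) && levene_p(y, g) > alpha
}

# Hypothetical helper following the decision steps on the slide.
choose_test <- function(y, g, alpha = 0.05) {
  g <- factor(g)
  if (conditions_met(y, g, alpha)) return("ANOVA on original data")
  # Assumption: the log step is only attempted when all values are positive.
  if (all(y > 0) && conditions_met(log(y), g, alpha)) return("ANOVA on log data")
  "Kruskal-Wallis on original data"
}

# Hand-rolled Dunn post test: rank-based z tests with a tie correction,
# two-sided p-values, adjusted for multiple comparisons (Bonferroni default).
dunn_sketch <- function(y, g, adjust = "bonferroni") {
  g <- factor(g)
  r <- rank(y)                                # pooled ranks; ties get average ranks
  N <- length(y)
  n <- sapply(split(r, g), length)            # group sizes
  mean_rank <- sapply(split(r, g), mean)      # mean rank per group
  tie_sizes <- table(r)                       # how many values share each rank
  tie_term <- sum(tie_sizes^3 - tie_sizes) / (12 * (N - 1))
  pairs <- combn(levels(g), 2)                # every pair of factor levels
  z <- apply(pairs, 2, function(p) {
    se <- sqrt((N * (N + 1) / 12 - tie_term) * (1 / n[[p[1]]] + 1 / n[[p[2]]]))
    (mean_rank[[p[1]]] - mean_rank[[p[2]]]) / se
  })
  data.frame(comparison = paste(pairs[1, ], pairs[2, ], sep = " - "),
             z = z,
             p_adj = p.adjust(2 * pnorm(-abs(z)), method = adjust))
}

# Usage on R's built-in PlantGrowth data (a stand-in, not the class data)
choose_test(PlantGrowth$weight, PlantGrowth$group)
kruskal.test(weight ~ group, data = PlantGrowth)   # KW test on original data
dunn_sketch(PlantGrowth$weight, PlantGrowth$group)
```

On PlantGrowth, `choose_test()` points to ANOVA. The KW and Dunn calls are included only to show the syntax. Note that `shapiro.test()` needs between 3 and 5000 observations per factor level.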
H0 and H1 for the ANOVA test:
Example 1:
H0: All means of sales based on the three price levels are the same (there is no effect of price on sales)
H1: Not all means of sales based on the three price levels are the same (there is an effect of price on sales)
Example 2:
H0: All means of flying distances of balls based on the 4 designs are the same (there is no effect of design on flying distances)
H1: Not all means of flying distances of balls based on the 4 designs are the same (there is an effect of design on flying distances)

Post ANOVA test:
If the conclusion of the ANOVA test is "not rejecting H0", the implication is that there is no evidence that not all means of ………. are the same. No post ANOVA test is required, since there is no evidence that they are different. Right!
If the conclusion of the ANOVA test is "rejecting H0", the implication is that there is evidence that not all means of ………. are the same. So a post ANOVA test should be done to see how they differ. Sensible?
One post ANOVA test we could use is the Tukey test. The Tukey test will be explained with examples.

Kruskal-Wallis (KW) test and its post test:
If the ANOVA test can't be done, the KW test could be done instead. H0 and H1 for the KW test are:
H0: All medians of …………… are the same
H1: Not all medians of …………. are the same
If the conclusion of the KW test is "not rejecting H0", the implication is that there is no evidence that not all medians of ………. are the same. No post KW test is required, since there is no evidence that they are different.
If the conclusion of the KW test is "rejecting H0", the implication is that there is evidence that not all medians of ………. are the same. So a post KW test should be done to see how they differ.
One post KW test we could use is the Dunn test (dunn.test). It will be explained using examples.

Example 1: Please download the data file anovaCatFood.xlsx from Bb.
Then save it as catfood.csv in our class folder. We would like to study the effect of the type of food (kd (kidney), sp (shrimp), cl (chicken liver), sm (salmon) and bf (beef)) on the weight gains of cats in lb.
Factor = type of food
Factor level = kd, sp, cl, sm and bf
Thus, the number of factor levels = 5
Dependent variable = weight gains of cats in lb.

Check the validity conditions: normality and homogeneity of variances.
First, check normality using a Shapiro test for each food. There are 5 Shapiro tests to do, but one generic set of H0 and H1 is enough:
H0: The data are from a normal distribution
H1: The data are not from a normal distribution
Then do the following:
Output of levels:
Output of names:

Do the Shapiro test for each food. Note that the square brackets [ ] extract a subset of the data (here, the weight gains for one food).
All p-values (0.5545, 0.4439, 0.1558, 0.664, 0.4146) are larger than 0.05, so do not reject H0 in any of the Shapiro tests. There is no evidence that any of the data are not from a normal distribution, so the normality condition is met.

Do the Levene test for homogeneity of variances:
H0: All variances of the 5 populations are the same
H1: Not all variances of the 5 populations are the same
The Levene test requires the car package. Go to the Packages tab in the lower-right window and check the box for the car package. You need to do this every time you start a new R session. If you could not install the car package, see page 15 in the slides.
Output:
The p-value (0.09467) is larger than 0.05, so do not reject H0. There is no evidence that not all variances are the same, so the homogeneity of variances condition is met.
Now we know that both conditions (normality and homogeneity of variances) are met, so the ANOVA test can be done.
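The per-food steps above can be sketched as follows: list the levels, pull out one food's values with square brackets, run one Shapiro test per level, then run the Levene test. catfood.csv is not reproduced here, so R's built-in PlantGrowth data (weight by group) stands in for it. With your own file, replace `pg`, `weight` and `group` with your data frame and column names. The car call runs only if that package is installed.

```r
pg <- PlantGrowth                  # stand-in for catfood.csv: weight by group
levels(pg$group)                   # names of the factor levels

# Square brackets keep only the values that belong to one level
ctrl_weight <- pg$weight[pg$group == "ctrl"]
shapiro.test(ctrl_weight)

# One Shapiro test per level, collected into a named vector of p-values
shapiro_p <- sapply(levels(pg$group), function(lv) {
  shapiro.test(pg$weight[pg$group == lv])$p.value
})
shapiro_p
all(shapiro_p > 0.05)              # TRUE means no Shapiro test rejects H0 at 0.05

# Levene test with the car package, only if it is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(weight ~ group, data = pg))
}
```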
Alternatives for the Levene test:
If you could not install the car package, install one of the following two alternatives: DescTools or lawstat (case sensitive). Make sure you use the correct function name and syntax for each package.

Do the ANOVA test:
H0: All means of weight gains of the 5 foods are equal
H1: Not all means of weight gains of the 5 foods are equal
Outputs:
The p-value (9.15e-10) is less than 0.05, so reject H0. There is evidence that not all means of weight gains of the 5 foods are equal. Thus we need to do a post ANOVA test, the Tukey test, to rank the means of the 5 foods.

Now let us do the Tukey test:
Outputs:
The Tukey test consists of multiple pairwise tests. Since there are 5 food types, there are 10 pairs. Can you see the 10 pairs? That is, there are 10 tests. The first test is
H0: Mean weight gain of cl (chicken liver) = mean weight gain of bf (beef)
H1: Mean weight gain of cl (chicken liver) is not the same as mean weight gain of bf (beef)
The p-value for this test is 0.0000005.

Three pairs have p-values larger than 0.05: kidney - chicken liver, shrimp - chicken liver, and shrimp - kidney. Their conclusion is "not rejecting H0". The remaining pairs are "rejecting H0". Do you see that?
Thus, the implications of the Tukey test are:
Kidney and chicken liver are not significantly different
Shrimp and chicken liver are not significantly different
Shrimp and kidney are not significantly different
The remaining 7 pairs are significantly different.
Using these outputs, we would like to see the rankings of foods based on mean weight gains.
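The ANOVA and Tukey steps above can be sketched in base R. The same sketch covers the ranking procedure described next: sort the sample means, underline the "not rejecting H0" pairs, and circle the distinct groups. PlantGrowth again stands in for the cat-food data.

`rank_groups()` is a hypothetical helper that rests on one assumption. It walks the levels in ascending order of sample means. It chains a level into the current group when that level is linked to any member of the group by a "not rejecting H0" pair. Otherwise it starts a new group.

```r
pg <- PlantGrowth                              # stand-in data: weight by group

# One-way ANOVA and its p-value
fit <- aov(weight ~ group, data = pg)
summary(fit)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Tukey test: one row per pair, named "level-level"
tk_tab <- TukeyHSD(fit)$group
not_rejected <- rownames(tk_tab)[tk_tab[, "p adj"] > 0.05]

# Sample means in ascending order
means <- sort(tapply(pg$weight, pg$group, mean))

# Hypothetical helper: chain levels (in ascending order of sample means)
# into groups, using the "not rejecting H0" pairs as the underlines.
rank_groups <- function(ordered_levels, not_rejected_pairs) {
  linked <- function(a, b) {
    paste(a, b, sep = "-") %in% not_rejected_pairs ||
      paste(b, a, sep = "-") %in% not_rejected_pairs
  }
  groups <- list(ordered_levels[1])
  for (lv in ordered_levels[-1]) {
    current <- groups[[length(groups)]]
    if (any(vapply(current, linked, logical(1), b = lv))) {
      groups[[length(groups)]] <- c(current, lv)   # underline reaches lv: same group
    } else {
      groups[[length(groups) + 1]] <- lv           # no underline: new, higher group
    }
  }
  groups
}

rank_groups(names(means), not_rejected)

# The cat-food pairs from the slides give the three groups found there
rank_groups(c("Beef", "Salmon", "Chicken liver", "Shrimp", "Kidney"),
            c("Kidney-Chicken liver", "Shrimp-Chicken liver", "Shrimp-Kidney"))
```

On PlantGrowth, the pairs trt1 - ctrl and trt2 - ctrl are not rejected, but trt2 - trt1 is rejected. The chain therefore links all three levels into one group, which is the "we can't rank them" situation from the last practice example. `TukeyHSD()` names each pair as "later level-earlier level", so the helper checks both orders.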
To see the ranking, first find the sample mean of the data for each food:
Outputs:
Thus the ascending order of the sample means is
Beef < Salmon < Chicken liver < Shrimp < Kidney
Remember that this order is based on the sample data. Our goal is to rank the means of the populations, not the means of the sample data. To do that, we need to combine the above sample order with the Tukey outputs.

Combining information:
First, list the foods in ascending order.
Beef Salmon Chicken liver Shrimp Kidney
Second, underline the pairs whose conclusion is "not rejecting H0" (here, the underlines join chicken liver, shrimp and kidney).
Beef Salmon Chicken liver Shrimp Kidney
Next, make groups that are distinct.
(Beef) (Salmon) (Chicken liver Shrimp Kidney)
There are three distinct groups: the first for beef, the second for salmon, and the third for chicken liver, shrimp and kidney.

Group 1 = Beef, Group 2 = Salmon, Group 3 = Chicken liver, Shrimp, Kidney
Since there are three distinct groups, we can rank them:
Group 1 < Group 2 < Group 3
Thus, the final implications of the Tukey test are:
Group 3 (chicken liver, shrimp and kidney) has the highest mean weight gain.
Group 2 (salmon) is next.
Group 1 (beef) is the lowest.
Also, we can't rank among chicken liver, shrimp and kidney based on these data. (Note: do not say that they have the same means. That statement is too strong and might be incorrect. We just say that we can't rank them. Do you get the idea?)

More practice interpreting Tukey outputs, using a made-up example:
Suppose the ascending order of the sample means is B < C < D < A. Suppose also that the Tukey test says the conclusions for pairs (B - C and D - A) are "not rejecting H0" and the rest of the pairs are "rejecting H0". Then we underline B with C and D with A:
B C D A
Thus we circle them as (B C) (D A), so there are two distinct groups.
Thus, the final implications are:
The group (A and D) has higher means than the group (B and C).
We can't rank between A and D.
We can't rank between B and C.

One more practice interpreting Tukey outputs:
Suppose the ascending order of the sample means is B < C < D < A. Suppose also that the Tukey test says the conclusions for pairs (B - C, C - D, and D - A) are "not rejecting H0" and the rest of the pairs are "rejecting H0". Then the underlines connect B with C, C with D, and D with A:
B C D A
Thus we circle them as (B C D A), so there is only one distinct group.
Thus, the final implication is that we can't rank among A, B, C, and D. Do not say that they are the same!

Look at another example, using anovalogcollege.xlsx. Please save it as anovalogcollege.csv in our class folder. Let us assume that we would like to see the effect of college (college1, college2 and college3) on amounts of accidents.
Factor = college (or type of college)
Factor level = college1, college2,