Let me know if you can or can't help. Please be honest with me; I cannot afford to get less than an A on this assignment, otherwise I'm getting a full refund.
Test Items (50 points total)

(1) R has probability functions available for use (Kabacoff, Section 5.2.3). Using one distribution to approximate another is not uncommon.

(1)(a) (4 points) The Poisson distribution may be used to approximate the binomial distribution if n > 20 and np < 7. Estimate the following binomial probabilities using dpois() and ppois() with probability p = 0.05 and n = 100. Then, estimate the same probabilities using dbinom() and pbinom(). Show the numerical results of your calculations.

i. The probability of exactly 0 successes.
ii. The probability of fewer than 6 successes.

(1)(b) (2 points) Generate side-by-side barplots using par(mfrow = c(1,2)) or grid.arrange(). The left barplot will show Poisson probabilities for outcomes ranging from 0 to 10. The right barplot will show binomial probabilities for outcomes ranging from 0 to 10. Use p = 0.05 and n = 100. Title each plot, present in color, and assign names to the bars; i.e. x-axis value labels.

(1)(c) For this problem, refer to Section 5.2 of Business Statistics. A discrete random variable has outcomes 0, 1, 2, 3, 4, 5, 6. The corresponding probabilities, in sequence with the outcomes, are: 0.215, 0.230, 0.240, 0.182, 0.130, 0.003, 0.001. In other words, the probability of obtaining "0" is 0.215.

i. (2 points) Calculate the expected value and variance for this distribution using the general formulas for the mean and variance of a discrete distribution. To do this, you will need to use the integer values from 0 to 6 as outcomes along with the corresponding probabilities. Round your answers to 2 decimal places.
ii. (2 points) Use the cumsum() function and plot the cumulative probabilities versus the corresponding outcomes. Determine the value of the median for this distribution and show it on this plot.
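For (1)(a), a minimal sketch of the comparison, assuming p = 0.05 and n = 100 as stated above, so that the Poisson parameter is lambda = np = 5:

# Poisson approximation: lambda = n * p = 100 * 0.05 = 5
lambda <- 100 * 0.05

# i. probability of exactly 0 successes
dpois(0, lambda = lambda)             # Poisson approximation
dbinom(0, size = 100, prob = 0.05)    # exact binomial

# ii. probability of fewer than 6 successes, i.e. P(X <= 5)
ppois(5, lambda = lambda)             # Poisson approximation
pbinom(5, size = 100, prob = 0.05)    # exact binomial

For (1)(c)i, the general formulas E(X) = sum of x*p(x) and Var(X) = sum of (x - E(X))^2 * p(x) translate directly; the object names here are illustrative, not part of the assignment:

outcomes <- 0:6
probs <- c(0.215, 0.230, 0.240, 0.182, 0.130, 0.003, 0.001)
mu <- sum(outcomes * probs)                # expected value
sigma2 <- sum((outcomes - mu)^2 * probs)   # variance
round(c(mean = mu, variance = sigma2), 2)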
“faithful”="" for="" which="" the="" waiting="" time="" exceeds="" 70="" minutes="" and="" the="" eruptions="" are="" less="" than="" 3.0="" minutes.="" list="" and="" show="" any="" such="" observations="" in="" a="" distinct="" color="" on="" a="" scatterplot="" of="" all="" eruption="" (vertical="" axis)="" and="" waiting="" times="" (horizontal="" axis).="" include="" a="" horizontal="" line="" at="" eruption="3.0," and="" a="" vertical="" line="" at="" waiting="" time="70." add="" a="" title="" and="" appropriate="" text.="" ii.="" (1="" point)="" what="" does="" the="" plot="" suggest="" about="" the="" relationship="" between="" eruption="" time="" and="" waiting="" time?="" answer:="" (enter="" your="" answer="" here.)="" (2)(b)="" (3="" points)="" past="" research="" indicates="" that="" the="" waiting="" times="" between="" consecutive="" eruptions="" are="" not="" independent.="" this="" problem="" will="" check="" to="" see="" if="" there="" is="" evidence="" of="" this.="" form="" consecutive="" pairs="" of="" waiting="" times.="" in="" other="" words,="" pair="" the="" first="" and="" second="" waiting="" times,="" pair="" the="" third="" and="" fourth="" waiting="" times,="" and="" so="" forth.="" there="" are="" 136="" resulting="" consecutive="" pairs="" of="" waiting="" times.="" form="" a="" data="" frame="" with="" the="" first="" column="" containing="" the="" first="" waiting="" time="" in="" a="" pair="" and="" the="" second="" column="" with="" the="" second="" waiting="" time="" in="" a="" pair.="" plot="" the="" pairs="" with="" the="" second="" member="" of="" a="" pair="" on="" the="" vertical="" axis="" and="" the="" first="" member="" on="" the="" horizontal="" axis.="" one="" way="" to="" do="" this="" is="" to="" pass="" the="" vector="" of="" waiting="" times="" -="" faithful$waiting="" -="" to matrix(),="" specifying="" 2="" columns="" for="" our="" matrix,="" with="" values="" organized="" by="" row;="" i.e. byrow="TRUE." 
(2)(c)="" (2)="" test="" the="" hypothesis="" of="" independence="" with="" a="" two-sided="" test="" at="" the="" 5%="" level="" using="" the="" kendall="" correlation="" coefficient.="" (3)="" performing="" hypothesis="" tests="" using="" random="" samples="" is="" fundamental="" to="" statistical="" inference.="" the="" first="" part="" of="" this="" problem="" involves="" comparing="" two="" different="" diets.="" using="" “chickweight”="" data="" available="" in="" the="" base="" r,="" “datasets”="" package,="" execute="" the="" following="" code="" to="" prepare="" a="" data="" frame="" for="" analysis.="" #="" load="" "chickweight"="" dataset="" data(chickweight)="" #="" create="" t="" |="" f="" vector="" indicating="" observations="" with="" time="=" 21="" and="" diet="=" "1"="" or="" "3"="" index=""><- chickweight$time="=" 21="" &="" (chickweight$diet="=" "1"="" |="" chickweight$diet="=" "3")="" #="" create="" data="" frame,="" "result,"="" with="" the="" weight="" and="" diet="" of="" those="" observations="" with="" "true"="" "index""="" values="" result="">-><- subset(chickweight[index,="" ],="" select="c(weight," diet))="" #="" encode="" "diet"="" as="" a="" factor="" result$diet="">-><- factor(result$diet)="" str(result)="" ##="" classes="" 'nfngroupeddata',="" 'nfgroupeddata',="" 'groupeddata'="" and="" 'data.frame':="" 26="" obs.="" of="" 2="" variables:="" ##="" $="" weight:="" num="" 205="" 215="" 202="" 157="" 223="" 157="" 305="" 98="" 124="" 175="" ...="" ##="" $="" diet="" :="" factor="" w/="" 2="" levels="" "1","3":="" 1="" 1="" 1="" 1="" 1="" 1="" 1="" 1="" 1="" 1="" ...="" the="" data="" frame,="" “result”,="" has="" chick="" weights="" for="" two="" diets,="" identified="" as="" diet="" “1”="" and="" “3”.="" use="" the="" data="" frame,="" “result,”="" to="" complete="" the="" following="" item.="" (3)(a)="" (2="" points)="" display="" two="" side-by-side="" vertical="" boxplots="" using="" par(mfrow="c(1,2))." 
one="" boxplot="" would="" display="" diet="" “1”="" and="" the="" other="" diet="" “3”.="" (3)(b)="" (2="" points)="" use="" the="" “weight”="" data="" for="" the="" two="" diets="" to="" test="" the="" null="" hypothesis="" of="" equal="" population="" mean="" weights="" for="" the="" two="" diets.="" test="" at="" the="" 95%="" confidence="" level="" with="" a="" two-sided="" t-test.="" this="" can="" be="" done="" using t.test() in="" r.="" assume="" equal="" variances.="" display="" the="" results="" of="" t.test().="" working="" with="" paired="" data="" is="" another="" common="" statistical="" activity.="" the="" “chickweight”="" data="" will="" be="" used="" to="" illustrate="" how="" the="" weight="" gain="" from="" day="" 20="" to="" 21="" may="" be="" analyzed.="" use="" the="" following="" code="" to="" prepare="" pre-="" and="" post-data="" from="" diet="=" “3”="" for="" analysis.="" #="" load="" "chickweight"="" dataset="" data(chickweight)="" #="" create="" t="" |="" f="" vector="" indicating="" observations="" with="" diet="=" "3"="" index="">-><- chickweight$diet="=" "3"="" #="" create="" vector="" of="" "weight"="" for="" observations="" where="" diet="=" "3"="" and="" time="=" 20="" pre="">-><- subset(chickweight[index,="" ],="" time="=" 20,="" select="weight)$weight" #="" create="" vector="" of="" "weight"="" for="" observations="" where="" diet="=" "3"="" and="" time="=" 21="" post="">-><- subset(chickweight[index,="" ],="" time="=" 21,="" select="weight)$weight" #="" the="" pre="" and="" post="" values="" are="" paired,="" each="" pair="" corresponding="" to="" an="" individual="" chick.="" cbind(pre,="" post)="" ##="" pre="" post="" ##="" [1,]="" 235="" 256="" ##="" [2,]="" 291="" 305="" ##="" [3,]="" 156="" 147="" ##="" [4,]="" 327="" 341="" ##="" [5,]="" 361="" 373="" ##="" [6,]="" 225="" 220="" ##="" [7,]="" 169="" 178="" ##="" [8,]="" 280="" 290="" ##="" [9,]="" 250="" 272="" ##="" [10,]="" 295="" 321="" (3)(c)="" (2="" points)="" present="" a="" scatterplot="" of="" the="" variable="" “post”="" as="" a="" function="" of="" the="" variable="" “pre”.="" include="" a="" diagonal="" line="" with="" zero="" intercept="" and="" slope="" equal="" to="" one.="" title="" and="" label="" the="" variables="" in="" this="" scatterplot.="" (3)(d)="" (4="" points)="" calculate="" and="" present="" a="" one-sided,="" 95%="" confidence="" interval="" for="" the="" average="" weight="" gain="" from="" day="" 20="" to="" day="" 21.="" write="" the="" code="" for="" the="" paired="" t-test="" and="" for="" determination="" of="" the="" confidence="" interval="" endpoints.="" **do="" not="" use="" *t.test()**,="" although="" you="" may="" check="" your="" answers="" using="" this="" function.="" present="" the="" resulting="" test="" statistic="" value,="" critical="" value,="" p-value="" and="" confidence="" interval.="" (4)="" statistical="" inference="" depends="" on="" using="" a="" sampling="" distribution="" for="" a="" statistic="" in="" order="" to="" make="" confidence="" statements="" about="" unknown="" population="" parameters.="" the="" central="" limit="" theorem="" is="" used="" to="" justify="" use="" of="" the="" normal="" distribution="" as="" a="" sampling="" distribution="" for="" statistical="" inference.="" using="" nile="" river="" flow="" data="" from="" 1871="" to="" 1970,="" this="" problem="" demonstrates="" sampling="" distribution="" convergence="" to="" normality.="" use="" the="" code="" below="" to="" prepare="" the="" data.="" refer="" to="" 
this="" example="" when="" completing="" (4)(c)="" below.="" data(nile)="" m="">-><- mean(nile)="" std="">-><- sd(nile)="" x="">-><- seq(from = 400, to = 1400, by = 1) hist(nile, freq = false, col = "darkblue", xlab = "flow", main = "histogram of nile river flows, 1871 to 1970") curve(dnorm(x, mean = m, sd = std), col = "orange", lwd = 2, add = true) (4)(a) (2 points) using nile river flow data and the “moments” package, calculate skewness and kurtosis. present a qq plot and boxplot of the flow data side-by-side using qqnorm(), qqline() and boxplot(); par(mfrow = c(1, 2)) may be used to locate the plots side-by-side. add features to these displays as you choose. library(moments) (4)(b) (4 points) using set.seed(124) and the nile data, generate 1000 random samples of size n = 16, with replacement. for each sample drawn, calculate and store the sample mean. this can be done with a for-loop and use of the sample() function. label the resulting 1000 mean values as “sample1”. repeat these steps using set.seed(127) - a different “seed” - and samples of size n = 64. label these 1000 mean values as “sample2”. compute and present the means, sample standard deviations and sample variances for “sample1” and “sample2” in a table with the first row for “sample1”, the second row for “sample2” and the columns labled for each statistic. (4)(c) (4 points) present side-by-side histograms of “sample1” and “sample2” with the normal density curve superimposed. to prepare comparable histograms, it will be necessary to use “freq = false” and to maintain the same x-axis with “xlim = c(750, 1050)”, and the same y-axis with “ylim = c(0, 0.025).” to superimpose separate density functions, you will need to use the mean and standard deviation for each “sample” - each histogram - separately. (5) this problem deals with contingency table analysis. this is an example of categorical data analysis (see kabacoff, pp. 145-151). the “warpbreaks” dataset gives the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn. there are 54 observations on 3 variables: breaks (numeric, the number of breaks), wool (factor, type of wool: a or b), and tension (factor, low l, medium m and high h). these data have been studied and used for example elsewhere. for the purposes of this problem, we will focus on the relationship between breaks and tension using contingency table analysis. (5)(a)(3 points) warpbreaks is part of the “datasets” package and may be loaded via data(warpbreaks). load “warpbreaks” and present the structure using str(). calculate the median number of breaks for the entire dataset, disregarding “tension” and “wool”. define this median value as “median_breaks”. present a histogram of the number of breaks with the location of the median indicated. create a new variable “number” as follows: for each value of “breaks”, classify the number of breaks as either strictly below “median_breaks”, or the alternative. 
(5) This problem deals with contingency table analysis. This is an example of categorical data analysis (see Kabacoff, pp. 145-151). The "warpbreaks" dataset gives the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn. There are 54 observations on 3 variables: breaks (numeric, the number of breaks), wool (factor, type of wool: A or B), and tension (factor, low L, medium M and high H). These data have been studied and used as an example elsewhere. For the purposes of this problem, we will focus on the relationship between breaks and tension using contingency table analysis.

(5)(a) (3 points) warpbreaks is part of the "datasets" package and may be loaded via data(warpbreaks). Load "warpbreaks" and present the structure using str(). Calculate the median number of breaks for the entire dataset, disregarding "tension" and "wool". Define this median value as "median_breaks". Present a histogram of the number of breaks with the location of the median indicated. Create a new variable, "number", as follows: for each value of "breaks", classify the number of breaks as either strictly below "median_breaks", or the alternative. Convert the "above"|"below" classifications to a factor, and combine with the dataset.
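A sketch of (5)(a), assuming the variable names "median_breaks" and "number" given above; values equal to the median are grouped with "above" because the item asks for "strictly below ... or the alternative":

data(warpbreaks)
str(warpbreaks)

median_breaks <- median(warpbreaks$breaks)

hist(warpbreaks$breaks, col = "lightgray", xlab = "Number of Breaks",
     main = "Warp Breaks per Loom")
abline(v = median_breaks, col = "red", lwd = 2)

# classify each value of "breaks" relative to the median, then combine with the dataset
number <- factor(ifelse(warpbreaks$breaks < median_breaks, "below", "above"))
warpbreaks$number <- number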