Test Items (50 points total) (1) R has probability functions available for use (Kabacoff, Section 5.2.3). Using one distribution to approximate another is not uncommon. (1)(a) (4 points) The Poisson...

Let me know if you can or can't help. Please be honest with me: I cannot afford to get less than an A on this assignment; otherwise I will be getting a full refund.


Test Items (50 points total)

(1) R has probability functions available for use (Kabacoff, Section 5.2.3). Using one distribution to approximate another is not uncommon.
(1)(a) (4 points) The Poisson distribution may be used to approximate the binomial distribution if n > 20 and np < 7. Estimate the following binomial probabilities using dpois() and ppois() with probability p = 0.05, and n = 100. Then, estimate the same probabilities using dbinom() and pbinom(). Show the numerical results of your calculations.
i. The probability of exactly 0 successes.
ii. The probability of fewer than 6 successes.
(1)(b) (2 points) Generate side-by-side barplots using par(mfrow = c(1,2)) or grid.arrange(). The left barplot will show Poisson probabilities for outcomes ranging from 0 to 10. The right barplot will show binomial probabilities for outcomes ranging from 0 to 10. Use p = 0.05 and n = 100. Title each plot, present in color and assign names to the bars; i.e. x-axis value labels.
(1)(c) For this problem refer to Section 5.2 of Business Statistics. A discrete random variable has outcomes: 0, 1, 2, 3, 4, 5, 6. The corresponding probabilities in sequence with the outcomes are: 0.215, 0.230, 0.240, 0.182, 0.130, 0.003, 0.001. In other words, the probability of obtaining “0” is 0.215.
i. (2 points) Calculate the expected value and variance for this distribution using the general formula for mean and variance of a discrete distribution. To do this, you will need to use integer values from 0 to 6 as outcomes along with the corresponding probabilities. Round your answer to 2 decimal places.
ii. (2 points) Use the cumsum() function and plot the cumulative probabilities versus the corresponding outcomes. Determine the value of the median for this distribution and show on this plot.

(2) Conditional probabilities appear in many contexts and in particular are used by Bayes’ Theorem. Correlations are another means for evaluating dependency between variables. The dataset “faithful” is part of the “datasets” package and may be loaded with the statement data(faithful). It contains 272 observations of 2 variables; waiting time between eruptions (in minutes) and the duration of the eruption (in minutes) for the Old Faithful geyser in Yellowstone National Park.
(2)(a) (2 points) Load the “faithful” dataset and present summary statistics and a histogram of waiting times. Additionally, compute the empirical conditional probability of an eruption less than 3.0 minutes, if the waiting time exceeds 70 minutes.
i. (2 points) Identify any observations in “faithful” for which the waiting time exceeds 70 minutes and the eruptions are less than 3.0 minutes. List and show any such observations in a distinct color on a scatterplot of all eruption (vertical axis) and waiting times (horizontal axis). Include a horizontal line at eruption = 3.0, and a vertical line at waiting time = 70. Add a title and appropriate text.
ii. (1 point) What does the plot suggest about the relationship between eruption time and waiting time?
Answer: (Enter your answer here.)
(2)(b) (3 points) Past research indicates that the waiting times between consecutive eruptions are not independent. This problem will check to see if there is evidence of this. Form consecutive pairs of waiting times. In other words, pair the first and second waiting times, pair the third and fourth waiting times, and so forth. There are 136 resulting consecutive pairs of waiting times. Form a data frame with the first column containing the first waiting time in a pair and the second column with the second waiting time in a pair. Plot the pairs with the second member of a pair on the vertical axis and the first member on the horizontal axis. One way to do this is to pass the vector of waiting times - faithful$waiting - to matrix(), specifying 2 columns for our matrix, with values organized by row; i.e. byrow = TRUE.
(2)(c) (2) Test the hypothesis of independence with a two-sided test at the 5% level using the Kendall correlation coefficient.

(3) Performing hypothesis tests using random samples is fundamental to statistical inference. The first part of this problem involves comparing two different diets. Using “ChickWeight” data available in the base R, “datasets” package, execute the following code to prepare a data frame for analysis.
# load "ChickWeight" dataset
data(ChickWeight)
# Create T | F vector indicating observations with Time == 21 and Diet == "1" OR "3"
index <- ChickWeight$Time == 21 & (ChickWeight$Diet == "1" | ChickWeight$Diet == "3")
# Create data frame, "result," with the weight and Diet of those observations with "TRUE" "index" values
result <- subset(ChickWeight[index, ], select = c(weight, Diet))
# Encode "Diet" as a factor
result$Diet <- factor(result$Diet)
str(result)
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 26 obs. of 2 variables:
## $ weight: num 205 215 202 157 223 157 305 98 124 175 ...
## $ Diet : Factor w/ 2 levels "1","3": 1 1 1 1 1 1 1 1 1 1 ...
The data frame, “result”, has chick weights for two diets, identified as diet “1” and “3”. Use the data frame, “result,” to complete the following item.
(3)(a) (2 points) Display two side-by-side vertical boxplots using par(mfrow = c(1,2)). One boxplot would display diet “1” and the other diet “3”.
(3)(b) (2 points) Use the “weight” data for the two diets to test the null hypothesis of equal population mean weights for the two diets. Test at the 95% confidence level with a two-sided t-test. This can be done using t.test() in R. Assume equal variances. Display the results of t.test().
Working with paired data is another common statistical activity. The “ChickWeight” data will be used to illustrate how the weight gain from day 20 to 21 may be analyzed. Use the following code to prepare pre- and post-data from Diet == “3” for analysis.
# load "ChickWeight" dataset
data(ChickWeight)
# Create T | F vector indicating observations with Diet == "3"
index <- ChickWeight$Diet == "3"
# Create vector of "weight" for observations where Diet == "3" and Time == 20
pre <- subset(ChickWeight[index, ], Time == 20, select = weight)$weight
# Create vector of "weight" for observations where Diet == "3" and Time == 21
post <- subset(ChickWeight[index, ], Time == 21, select = weight)$weight
# The pre and post values are paired, each pair corresponding to an individual chick.
cbind(pre, post)
##       pre post
##  [1,] 235  256
##  [2,] 291  305
##  [3,] 156  147
##  [4,] 327  341
##  [5,] 361  373
##  [6,] 225  220
##  [7,] 169  178
##  [8,] 280  290
##  [9,] 250  272
## [10,] 295  321
(3)(c) (2 points) Present a scatterplot of the variable “post” as a function of the variable “pre”. Include a diagonal line with zero intercept and slope equal to one. Title and label the variables in this scatterplot.
(3)(d) (4 points) Calculate and present a one-sided, 95% confidence interval for the average weight gain from day 20 to day 21. Write the code for the paired t-test and for determination of the confidence interval endpoints. **Do not use *t.test()***, although you may check your answers using this function. Present the resulting test statistic value, critical value, p-value and confidence interval.

(4) Statistical inference depends on using a sampling distribution for a statistic in order to make confidence statements about unknown population parameters. The Central Limit Theorem is used to justify use of the normal distribution as a sampling distribution for statistical inference. Using Nile River flow data from 1871 to 1970, this problem demonstrates sampling distribution convergence to normality. Use the code below to prepare the data. Refer to this example when completing (4)(c) below.
data(Nile)
m <- mean(Nile)
std <- sd(Nile)
x <- seq(from = 400, to = 1400, by = 1)
hist(Nile, freq = FALSE, col = "darkblue", xlab = "Flow", main = "Histogram of Nile River Flows, 1871 to 1970")
curve(dnorm(x, mean = m, sd = std), col = "orange", lwd = 2, add = TRUE)
(4)(a) (2 points) Using Nile River flow data and the “moments” package, calculate skewness and kurtosis. Present a QQ plot and boxplot of the flow data side-by-side using qqnorm(), qqline() and boxplot(); par(mfrow = c(1, 2)) may be used to locate the plots side-by-side. Add features to these displays as you choose.
library(moments)
(4)(b) (4 points) Using set.seed(124) and the Nile data, generate 1000 random samples of size n = 16, with replacement. For each sample drawn, calculate and store the sample mean. This can be done with a for-loop and use of the sample() function. Label the resulting 1000 mean values as “sample1”. Repeat these steps using set.seed(127) - a different “seed” - and samples of size n = 64. Label these 1000 mean values as “sample2”. Compute and present the means, sample standard deviations and sample variances for “sample1” and “sample2” in a table with the first row for “sample1”, the second row for “sample2” and the columns labeled for each statistic.
(4)(c) (4 points) Present side-by-side histograms of “sample1” and “sample2” with the normal density curve superimposed. To prepare comparable histograms, it will be necessary to use “freq = FALSE” and to maintain the same x-axis with “xlim = c(750, 1050)”, and the same y-axis with “ylim = c(0, 0.025).” To superimpose separate density functions, you will need to use the mean and standard deviation for each “sample” - each histogram - separately.

(5) This problem deals with contingency table analysis. This is an example of categorical data analysis (see Kabacoff, pp. 145-151). The “warpbreaks” dataset gives the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn. There are 54 observations on 3 variables: breaks (numeric, the number of breaks), wool (factor, type of wool: A or B), and tension (factor, low L, medium M and high H). These data have been studied and used for example elsewhere. For the purposes of this problem, we will focus on the relationship between breaks and tension using contingency table analysis.
(5)(a) (3 points) warpbreaks is part of the “datasets” package and may be loaded via data(warpbreaks). Load “warpbreaks” and present the structure using str(). Calculate the median number of breaks for the entire dataset, disregarding “tension” and “wool”. Define this median value as “median_breaks”. Present a histogram of the number of breaks with the location of the median indicated. Create a new variable “number” as follows: for each value of “breaks”, classify the number of breaks as either strictly below “median_breaks”, or the alternative. Convert the “above”|“below” classifications to a factor, and combine with the...
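For item (3)(d) above, the paired t-test can be worked by hand from the ten pre/post values listed in the question. The following is a minimal sketch, assuming the one-sided alternative is "mean gain greater than zero" and that the requested interval is therefore a lower bound; the variable names are illustrative, and t.test() is used only as a check, as the question permits.

```r
# Paired weights copied from the cbind(pre, post) output in the question
pre  <- c(235, 291, 156, 327, 361, 225, 169, 280, 250, 295)
post <- c(256, 305, 147, 341, 373, 220, 178, 290, 272, 321)

d  <- post - pre            # per-chick weight gain from day 20 to day 21
n  <- length(d)
se <- sd(d) / sqrt(n)       # standard error of the mean gain

t_stat  <- mean(d) / se                                  # test statistic, H0: mean gain = 0
t_crit  <- qt(0.95, df = n - 1)                          # one-sided 5% critical value
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)    # assumed alternative: mean gain > 0
lower   <- mean(d) - t_crit * se                         # interval is (lower, Inf)

print(c(t = t_stat, critical = t_crit, p.value = p_value, lower = lower))

# Check only: the built-in paired test under the same assumed alternative
check <- t.test(post, pre, paired = TRUE, alternative = "greater")
```

If the intended alternative were two-sided or "less than", the critical value, p-value and interval endpoints would change accordingly.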
Answered Same Day, Aug 02, 2020

Answer To: Test Items (50 points total) (1) R has probability functions available for use (Kabacoff, Section...

Shaziya answered on Aug 06 2020
Rwork.docx
(1) R has probability functions available for use (Kabacoff, Section 5.2.3). Using one distribution to approximate another is not uncommon.
(1)(a) (4 points) The Poisson distribution may be used to approximate the binomial distribution if n > 20 and np < 7. Estimate the following binomial probabilities using *dpois()* and *ppois()* with probability p = 0.05, and n = 100. Then, estimate the same probabilities using *dbinom()* and *pbinom()*. Show the numerical results of your calculations.
Solution:
With n = 100 and p = 0.05, the Poisson approximation uses lambda = np = 5, and the binomial calls use prob = 0.05.
i) The probability of exactly 0 successes
> dpois(0, lambda = 5)
[1] 0.006737947
> dbinom(0, size = 100, prob = 0.05)
[1] 0.005920529
ii) The probability of fewer than 6 successes, i.e. P(X <= 5)
> ppois(5, lambda = 5)
[1] 0.6159607
> pbinom(5, size = 100, prob = 0.05)
[1] 0.6159991
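As a cross-check on (1)(a), the sketch below tabulates both requested probabilities under each distribution, assuming the Poisson rate is set to lambda = n * p = 5, the usual choice for this approximation. The data frame and its column names are illustrative.

```r
# Parameters from the problem statement
n <- 100
p <- 0.05
lambda <- n * p   # assumed Poisson rate for the approximation: np = 5

# "Exactly 0" uses the mass function; "fewer than 6" means X <= 5, so use the CDF at 5
comparison <- data.frame(
  event    = c("P(X = 0)", "P(X < 6)"),
  poisson  = c(dpois(0, lambda), ppois(5, lambda)),
  binomial = c(dbinom(0, size = n, prob = p), pbinom(5, size = n, prob = p))
)
comparison$abs_diff <- abs(comparison$poisson - comparison$binomial)
print(comparison)
```

The abs_diff column shows how closely the Poisson values track the binomial ones for these parameters.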
(1)(b) (2 points) Generate side-by-side barplots using *par(mfrow = c(1,2))* or *grid.arrange()*. The left barplot will show Poisson probabilities for outcomes ranging from 0 to 10. The right barplot will show binomial probabilities for outcomes ranging from 0 to 10. Use p = 0.05 and n = 100. Title each plot, present in color and assign names to the bars; i.e. x-axis value labels.
> x <- 0:10
> par(mfrow = c(1, 2))
> barplot(dpois(x, lambda = 5), names.arg = x, col = "steelblue", main = "Poisson (lambda = 5)", xlab = "Number of successes", ylab = "Probability")
> barplot(dbinom(x, size = 100, prob = 0.05), names.arg = x, col = "darkorange", main = "Binomial (n = 100, p = 0.05)", xlab = "Number of successes", ylab = "Probability")
> par(mfrow = c(1, 1))
(1)(c) i) (2 points) Calculate the expected value and variance for this distribution using the general formula for mean and variance of a discrete distribution. To do this, you will need to use integer values from 0 to 6 as outcomes along with the corresponding probabilities. Round your answer to 2 decimal places.
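The mean is the sum of each outcome times its probability:
0(0.215) + 1(0.230) + 2(0.240) + 3(0.182) + 4(0.130) + 5(0.003) + 6(0.001) = 1.797, which rounds to 1.80.
The variance is the sum of each squared deviation from the mean times its probability, which rounds to 1.79.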
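The general formulas for a discrete distribution, E[X] = sum(x * p) and Var[X] = sum((x - E[X])^2 * p), can be applied directly to the outcomes and probabilities given in the question. This is a minimal sketch; the names mu and sigma2 are illustrative.

```r
# Outcomes and probabilities as stated in the question
# (note: as given, the probabilities sum to 1.001 rather than exactly 1)
x <- 0:6
p <- c(0.215, 0.230, 0.240, 0.182, 0.130, 0.003, 0.001)

mu     <- sum(x * p)            # expected value
sigma2 <- sum((x - mu)^2 * p)   # variance about the mean

print(round(c(mean = mu, variance = sigma2), 2))
```

Calling mean(p) or var(p) on the probability vector would not give these quantities, since those functions summarize the probabilities themselves rather than the distribution they describe.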
ii) (2 points) Use the *cumsum()* function and plot the cumulative probabilities versus the corresponding outcomes. Determine the value of the median for this distribution and show on this plot.
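The cumulative probabilities from cumsum() are 0.215, 0.445, 0.685, 0.867, 0.997, 1.000 and 1.001 for outcomes 0 through 6. The cumulative probability first reaches 0.5 at outcome 2, so the median is 2, and this value is marked on the plot.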
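One way to produce the cumulative plot and locate the median is sketched below, assuming the median is taken as the smallest outcome whose cumulative probability is at least 0.5. The plot styling choices are illustrative.

```r
# Outcomes and probabilities as stated in the question
x <- 0:6
p <- c(0.215, 0.230, 0.240, 0.182, 0.130, 0.003, 0.001)

cum_p <- cumsum(p)

# Assumed definition: smallest outcome with cumulative probability >= 0.5
median_x <- x[which(cum_p >= 0.5)[1]]

plot(x, cum_p, type = "s", col = "blue", lwd = 2,
     xlab = "Outcome", ylab = "Cumulative probability",
     main = "Cumulative distribution with median")
points(x, cum_p, pch = 19, col = "blue")
abline(h = 0.5, lty = 2, col = "gray40")      # the 0.5 probability level
abline(v = median_x, lty = 2, col = "red")    # the median outcome
```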
(2)(a) (2 points) Load the “faithful” and present summary statistics and a histogram of waiting times. Additionally, compute the empirical conditional probability of an eruption less than 3.0 minutes, if the waiting time exceeds 70 minutes.
Solution:
require(stats); require(graphics)
f.tit <- "faithful data: Eruptions of Old Faithful"
ne60 <- round(e60 <- 60 * faithful$eruptions)
all.equal(e60, ne60) # relative diff. ~ 1/10000
table(zapsmall(abs(e60 - ne60))) # 0, 0.02 or 0.04
faithful$better.eruptions <- ne60 / 60
te <- table(ne60)
te[te >= 4] # (too) many multiples of 5 !
plot(names(te), te, type = "h", main = f.tit, xlab = "Eruption time (sec)")
plot(faithful[, -3], main = f.tit,
xlab = "Eruption time (min)",
ylab = "Waiting time to next eruption (min)")
lines(lowess(faithful$eruptions, faithful$waiting, f = 2/3, iter = 3),
col = "red")
Histogram for waiting times:
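The summary statistics, histogram and empirical conditional probability requested in (2)(a) can be sketched as follows. The conditional probability is computed as the share of observations with eruptions under 3.0 minutes among those whose waiting time exceeds 70 minutes; the variable names are illustrative.

```r
data(faithful)

# Summary statistics for both variables
print(summary(faithful))

# Histogram of waiting times
hist(faithful$waiting, col = "lightblue",
     main = "Old Faithful: waiting time between eruptions",
     xlab = "Waiting time (min)")

# Empirical P(eruptions < 3.0 | waiting > 70)
long_wait      <- faithful$waiting > 70
short_eruption <- faithful$eruptions < 3.0
cond_prob <- sum(long_wait & short_eruption) / sum(long_wait)
print(cond_prob)
```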
i. (2 points) Identify any observations in “faithful” for which the waiting time exceeds 70 minutes and the eruptions are less than 3.0 minutes. List and show any such observations in a distinct color on a scatterplot of all eruption (vertical axis) and waiting times (horizontal axis). Include a horizontal line at eruption = 3.0, and a vertical line at waiting time = 70. Add a title and appropriate text.
ii. (1 point) What does the plot suggest about the relationship between eruption time and waiting time?
Solution: The plot suggests a positive relationship between eruption time and waiting time: longer waiting times tend to go with longer eruptions.
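The scatterplot described in part i, with the flagged observations highlighted and the two reference lines drawn, might look like the sketch below. The colors, legend text and the use of the Pearson correlation as a numeric summary are illustrative choices, not part of the original answer.

```r
data(faithful)

# Observations with waiting > 70 minutes and eruptions < 3.0 minutes
flagged <- faithful$waiting > 70 & faithful$eruptions < 3.0
print(faithful[flagged, ])

plot(faithful$waiting, faithful$eruptions,
     col = ifelse(flagged, "red", "gray40"), pch = 19,
     xlab = "Waiting time (min)", ylab = "Eruption duration (min)",
     main = "Old Faithful: eruption duration vs. waiting time")
abline(h = 3.0, lty = 2)   # eruption = 3.0
abline(v = 70, lty = 2)    # waiting time = 70
legend("topleft", pch = 19, col = c("red", "gray40"),
       legend = c("waiting > 70 and eruptions < 3.0", "all other observations"))

# Numeric summary of the direction of the relationship
r <- cor(faithful$waiting, faithful$eruptions)
print(r)
```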
(2)(b) (3 points) Past research indicates that the waiting times between consecutive eruptions are not independent. This problem will check to see if there is evidence of this. Form...
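The pairing of consecutive waiting times, and the Kendall test asked for in (2)(c), can be sketched as follows. The column names first and second are illustrative, and exact = FALSE is an assumption made here so that the normal approximation is used for the p-value when the data contain ties.

```r
data(faithful)

# Fill a 2-column matrix by row: row k holds waiting times 2k-1 and 2k
pairs_mat <- matrix(faithful$waiting, ncol = 2, byrow = TRUE)
pairs_df  <- data.frame(first = pairs_mat[, 1], second = pairs_mat[, 2])

plot(pairs_df$first, pairs_df$second, pch = 19, col = "darkgreen",
     xlab = "First waiting time in pair (min)",
     ylab = "Second waiting time in pair (min)",
     main = "Consecutive pairs of waiting times")

# Two-sided test of independence using Kendall's tau
kendall <- cor.test(pairs_df$first, pairs_df$second,
                    method = "kendall", alternative = "two.sided",
                    exact = FALSE)
print(kendall)

# Decision at the 5% level
reject_independence <- kendall$p.value < 0.05
print(reject_independence)
```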