Need help with a coding assignment in RStudio
Math 211 R Assignment #5
Due Thursday, November 18

Instructions:
• In RStudio, go to File and open a new R script.
• Save the file as A5_LastnameFirstinitial.R. If your last name is longer than 8 letters, you can stop at 8 characters.
• Always write your code in the Source panel (not the Console panel). View your results in the Console panel or graphs in the Plots panel. Note that different people may write different code for the same question. One version is not necessarily more correct than another. One may be better written than another, but I will not mark it as wrong as long as I am able to generate the needed result.
• If you make a mistake, correct your code in the Source panel so that the submission does not contain code you don't want me to grade. Basically, your submission should be a clean, working file. Do not submit any errors, practice attempts, etc.
• Before answering or writing the code for each question, start by commenting the question number. For example, before answering question 1, write # Q1. Then go to the next line and start answering the question. If an answer is a sentence, the first line will be # Question number. The next line will be # Answer in sentence form. If the question calls for code, the first line will be # Question number. The next line will be the code.
• Upload your file in MOM. Be sure your file has the extension .R or .r.

For this assignment, we are going to see how the Central Limit Theorem works. We will consider the real estate data from the city of Ames, Iowa. The details of every real estate transaction in Ames are recorded by the City Assessor's office. Our particular focus will be all residential home sales in Ames between 2006 and 2010. This collection represents our population of interest.
Download the real estate data from Ames, Iowa by entering the following code:
• download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
• load("ames.RData")
(There are lots of quotation marks here. If the code does not work, you may have to play around with the quotation marks.)

This data set has 82 variables. We are only going to be focusing on one variable, the Sale Price of homes in Ames, Iowa.

Questions:
1. Rename the variable SalePrice as price.
2. Draw a histogram of price. Label the axes and give a title to your histogram so that it is understandable to a general audience.
3. From the histogram, what kind of distribution is price?
4. Set the seed first. Then take any sample size you want from price by using the code: sample(price, size), although I suggest using a small sample size for easy viewing. Call that sample_size_size. For example, if your chosen sample size is 25, then your vector will be called sample_size_25. (This part is optional: You may want to call out sample_size_size to view what samples R generates. Try the code sample( ) several times using different seeds or without seeding to see the samples vary.)

We will now take different sample sizes of price and calculate the sample means. For uniformity, call the variable sample_means_size. You can follow the directions below or refer to Chapter 21 (Samples and Distributions) of the R Guide. Any word in red and italicized means you enter your own value. Any word in green is the code function name.

5. Use sample size = 5. Begin each code as follows: # 5a, # 5b, # 5c…
a. Do each of the following in order.
• set.seed(any integer)
• sample_means_size <- rep(NA, number of repetitions) # I suggest 1000 or more repetitions. Play around with the number of repetitions.
• for (i in 1:number of repetitions) { sample_means_size[i] <- mean(sample(price, size)) } # This loop takes the mean of the samples and puts them in the ith entry of vector, sample_means_size, each time.
b. Do a histogram of sample_means_size. Be sure to label the axes. Add the title: Sample Size of size.
c. What kind of distribution is sample_means_size?
d. Calculate the mean of sample_means_size.
e. Calculate the standard deviation of sample_means_size.
f. Calculate the standard deviation of price divided by the square root of the sample size.
6. Repeat all of a through f in #5 using size = 10. Start your code with # 6a, # 6b, # 6c …
7. Repeat all of a through f in #5 using size = 30. Start your code with # 7a, # 7b, # 7c …
8. Repeat all of a through f in #5 using size = 50. Start your code with # 8a, # 8b, # 8c …
9. Calculate the mean of price.
10. As the sample size increases from 5 to 50, how does the mean of sample_means_size compare with the mean of price?
11. As the sample size increases from 5 to 50, are your answers to #5 through #8, (e) and (f), getting closer?
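For anyone answering, here is a rough sketch of how the loop in question 5 and the comparisons in questions 5 through 11 might fit together. It is not the expected submission, only one possible reading of the steps. Several things in it are assumptions: `price` is a simulated right-skewed stand-in so the code runs without the download (with the real data it would presumably be `price <- ames$SalePrice` after `load("ames.RData")`, assuming the loaded object is named `ames`), the helper `simulate_sample_means` and the `results` table are hypothetical names, and the seeds are arbitrary.

```r
# Stand-in population (assumption): simulated right-skewed values so this
# sketch runs offline. With the real data, replace these two lines with
#   load("ames.RData"); price <- ames$SalePrice   # object name `ames` assumed
set.seed(211)
price <- rlnorm(3000, meanlog = 12, sdlog = 0.4)

# Hypothetical helper: draw `reps` samples of `size` values from x and
# return the vector of their means (same loop as in question 5a).
simulate_sample_means <- function(x, size, reps = 1000) {
  sample_means <- rep(NA, reps)
  for (i in 1:reps) {
    sample_means[i] <- mean(sample(x, size))
  }
  sample_means
}

sizes <- c(5, 10, 30, 50)
results <- data.frame(size = sizes,
                      mean_of_means = NA_real_,   # part d
                      sd_of_means = NA_real_,     # part e
                      sd_over_sqrt_n = NA_real_)  # part f

for (k in seq_along(sizes)) {
  set.seed(1)  # arbitrary seed, reset so each run is reproducible
  sample_means <- simulate_sample_means(price, sizes[k])
  results$mean_of_means[k] <- mean(sample_means)
  results$sd_of_means[k] <- sd(sample_means)
  results$sd_over_sqrt_n[k] <- sd(price) / sqrt(sizes[k])
}

# Part b, shown for the last size only (50): histogram with labels and title.
hist(sample_means,
     main = "Sample Size of 50",
     xlab = "Sample mean of price",
     ylab = "Frequency")

print(results)       # compare columns (e) and (f) across sizes (question 11)
print(mean(price))   # question 9, to compare with mean_of_means (question 10)
```

The assignment asks for each size to be written out separately under its own # 5a, # 6a, … comments, so the loop over `sizes` here is only a compact way to see the pattern. Note that `sample()` draws without replacement by default, so the standard deviation of the sample means can come out slightly below sd(price) / sqrt(size).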