High School Intro to R Homework


---
title: "HW 5"
subtitle: 'Introduction to R'
output:
  pdf_document: default
  html_notebook: default
---

Run this code to load the R objects for this homework:

```{r}
load( "HW5 R Objects.Rdata" )
```

# Problem 1: Sum of Squares

In this problem, we'll calculate the same value in 3 different ways.

The sum of the squares of the first $n$ positive integers is:

$$ 1^2 + 2^2 + 3^2 + \ldots + n^2\ =\ \frac{ n \times (n+1) \times (2n + 1)}{6} $$

For instance, suppose $n = 5$. Then the sum of the squares of the first $n = 5$ positive integers is:

$$ 1^2 + 2^2 + 3^2 + 4^2 + 5^2\ =\ 1 + 4 + 9 + 16 + 25\ =\ 55 $$

The right-hand side of the formula is:

$$ \frac{5 \times (5 + 1) \times (2 \times 5 + 1)}{6}\ =\ \frac{5 \times 6 \times 11}{6}\ =\ 55 $$

## Part (a)

Use the formula to calculate the sum of the squares of the first 70 positive integers. Report your result using a `cat()` statement.

**Solution**

## Part (b)

Use a vectorized approach to calculate the sum of the squares of the first 20 positive integers. Report your result using a `cat()` statement.

**Solution**

## Part (c)

Use a `for` loop to calculate the sum of the squares of the first 20 positive integers. Report your result using a `cat()` statement.

**Solution**

\newpage

End of problem 1

\newpage

# Problem 2: Removing -9 and -99

## Part (a)

Construct a stripchart of the values in `problem.2.data`. Notice that the data contains some values that are -9 and some values that are -99.

**Solution**

## Part (b)

Replace the values in `problem.2.data` that are equal to -9 with the special value `NA`. Write a short sentence explaining which locations had a -9 value.

**Solution**

## Part (c)

Now replace the values in `problem.2.data` that are equal to -99 with the special value `NA`. Write a short sentence explaining which locations had a -99 value.

At the end of this problem, the vector `problem.2.data` should have the same values as it originally did, except that both -9 and -99 values have been replaced with `NA` values.

**Solution**

## Part (d)

Now create a new stripchart of the data in `problem.2.data`. Since the -9 and -99 values have been converted to `NA` values, you should not see them in the graph.

**Solution**

\newpage

End of problem 2

\newpage

# Problem 3: Temperature Conversion

Let $F$ denote a temperature measurement in degrees Fahrenheit, and $C$ denote a temperature measurement in degrees Centigrade. Then we have:

$$ F\ =\ \frac{9}{5} \cdot C + 32 $$

Conversely, we also have:

$$ C\ =\ \frac{5}{9} \cdot (F - 32) $$

## Part (a)

The vector `problem.3.a.data` consists of a sequence of temperature measurements, recorded in degrees Fahrenheit. Using vectorized operations, convert these measurements to degrees Centigrade. Then report the sample mean of this data using a `cat()` statement, rounding to 5 decimal places.

**Solution**

## Part (b)

The vector `problem.3.b.data` consists of a sequence of temperature measurements, recorded in degrees Centigrade. Using vectorized operations, convert these measurements to degrees Fahrenheit. Then report the sample maximum and sample minimum of this data using a separate `cat()` statement for each value, rounding to 5 decimal places.

**Solution**

\newpage

End of problem 3

\newpage

# Problem 4: Graphing the Logistic Function

The *logistic* function is defined as:

$$ f(x)\ =\ \frac{ e^x }{1 + e^x} $$

Draw a graph of this function:

* First, draw the function curve itself, with the $x$-axis ranging from -6 to +6 and the $y$-axis ranging from 0 to 1.5. Use a solid line for the curve, and choose a nice color.
* Draw the horizontal reference line $y = 0$ from $x = -6$ to $x = +6$.
* Draw a vertical reference line $x = 0$ from $y = 0$ to $y = 1.5$.
* Draw the horizontal asymptote $y = 1$ from $x = -6$ to $x = +6$.

**Solution**

\newpage

End of problem 4

\newpage

# Problem 5: Filtering Extreme Values

Systolic blood pressures are typically in the range of about 120 to 140, although in extreme cases they can be as high as 180.

## Part (a)

The variable `problem.5.data` contains numeric values representing systolic blood pressures. Unfortunately, due to data entry errors, there are some values in this dataset that are too large to represent a valid systolic blood pressure.

For the first part of this problem, construct a histogram of the data in `problem.5.data`.

**Solution**

## Part (b)

Using the histogram that you drew, remove the unusual values in the data. You'll have to determine which values are too large, although this should be clear. Don't set these values to `NA` -- instead, actually create a vector which has these values removed. The result should be a vector that has a length that is less than the original vector.

When you've finished this filtering operation, report the length of the filtered vector, along with the sample mean of the values in the filtered vector. Use a separate `cat()` statement for each value, rounding to 5 decimal places.

**Solution**

## Part (c)

Now create a histogram for the filtered data from part (b). Make sure you include a main title as well as titles for the horizontal and vertical axes, select a nice color, and choose the number of breaks.

**Solution**

\newpage

End of problem 5

\newpage

# Problem 6: Grouping Categories

## Part (a)

The variable `problem.6.data` contains data on support requests at each of the four offices of WiDgT. Create a table of these values, summarizing the number of requests for each of the offices, and display this table directly (i.e.\ you don't need to do anything like a `cat()` statement).

**Solution**

## Part (b)

Create a pie chart using the tabulated data from part (a). Be sure to give your pie chart a main title, and to choose nice colors for the pie slices.

**Solution**

## Part (c)

Group the categories "Boston" and "Salt Lake City" together into a category named "Domestic". Group the categories "London" and "Shanghai" together into a category named "International".
Then construct and display a table summarizing the total number of requests for these two grouped categories.

**Solution**

## Part (d)

Display the grouped data from part (c) not as a table of raw counts, but instead as a table of the relative proportions. Round the values to 2 decimal places.

**Solution**

## Part (e)

Now create a pie chart using the grouped data from part (c).

\newpage

End of problem 6

\newpage

# Problem 7: Stratified Boxplot

So far, we've seen 3 ways to repair data:

* If missing data is represented by a value such as -9 or -99, we can convert that to `NA`.
* We can convert outliers to `NA`.
* We can convert the value `Missing` in a factor to `NA`.

In this problem, we will put all of these ideas to work, and then make a nice graph at the end.

## Part (a)

The variable `problem.7.data.vector` contains numeric data representing sales. Find all the locations where there is a -9, and replace these with the special value `NA`. Also, find all the locations where there is a 99999, and replace these with the special value `NA`. Save this repaired vector in a variable. Then write one or two sentences and tell us which locations had a -9, and which locations had a 99999.

At the end of this part, you should have constructed a vector which has all the values of `problem.7.data.vector`, except that -9 and 99999 values have been converted to `NA` values.

**Solution**

## Part (b)

The variable `problem.7.data.factor` contains factor data representing one of the four office locations of WiDgT, as well as the category "Missing". Replace the "Missing" values with the special value `NA`. Then summarize this categorical data using a table.

**Solution**

## Part (c)

Using the numeric vector that you created in part (a) and the factor that you created in part (b), create a vertical stratified boxplot. Include a main title and a title for the vertical axis, and choose nice colors for the boxes.

**Solution**

\newpage

End of problem 7

\newpage

# Problem 8: Smiley Face

In this problem, we will graph a sequence of points that will make a nice design.

The variable `problem.8.x.data` contains the $x$-coordinates for a sequence of points. The variable `problem.8.y.data` contains the $y$-coordinates for the same sequence of points.

## Part (a)

First, create an empty plot with no data. The $x$ values should range from -3 to 3, and the $y$ values should range from 0 to 4. You don't have to give your graph a main title, and the $x$- and $y$-axis titles can just be empty strings, i.e.\ just use "".

Then graph the sequence of points by making a single call to the `points()` function. I suggest using solid circular points, and you should explicitly select a nice color for the points.

**Solution**

## Part (b)

Write a `for` loop that iterates over the two vectors, taking corresponding values of the `problem.8.x.data` vector and the `problem.8.y.data` vector and plotting a single point at that location:

* As before, first create an empty plot with no data. The $x$ values should range from -3 to 3, and the $y$ values should range from 0 to 4. You don't have to give your graph a main title, and the $x$- and $y$-axis titles can just be empty strings, i.e.\ just use "".
* Then graph the sequence of points by iterating over the two vectors. (Hint: you can do this by iterating with an index, and then using positive integer indexing to select the elements from the two vectors.)
    - In the first iteration of the `for` loop, plot a point at the location where the $x$-coordinate is the first value of the `problem.8.x.data` vector and the $y$-coordinate is the first value of the `problem.8.y.data` vector.
    - In the second iteration of the `for` loop, plot a point at the location where the $x$-coordinate is the second value of the `problem.8.x.data` vector and the $y$-coordinate is the second value of the `problem.8.y.data` vector.
    - In the third iteration of the `for` loop, plot a point at the location where the $x$-coordinate is the third value of the `problem.8.x.data` vector and the $y$-coordinate is the third value of the `problem.8.y.data` vector.

Your `for` loop should iterate over all the points, so make sure you get the upper limit right. (Hint: the $x$ and $y$ data vectors must have the same number of values, so you can just calculate the
Answered Same Day, Mar 16, 2021


Kshitij answered on Mar 17 2021
HW 5
Introduction to R
Run this code to load the R objects for this homework:
#load( "HW5 R Objects.Rdata" )
load("~/Downloads/hw5-r-objects-2fnbrp2g.rdata")
Problem 1: Sum of Squares
In this problem, we’ll calculate the same value in 3 different ways.
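The sum of the squares of the first $n$ positive integers is:
$$ 1^2 + 2^2 + 3^2 + \ldots + n^2\ =\ \frac{ n \times (n+1) \times (2n + 1)}{6} $$
For instance, suppose $n = 5$. Then the sum of the squares of the first $n = 5$ positive integers is:
$$ 1^2 + 2^2 + 3^2 + 4^2 + 5^2\ =\ 1 + 4 + 9 + 16 + 25\ =\ 55 $$
The right-hand side of the formula is:
$$ \frac{5 \times (5 + 1) \times (2 \times 5 + 1)}{6}\ =\ \frac{5 \times 6 \times 11}{6}\ =\ 55 $$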
Part (a)
Use the formula to calculate the sum of the squares of the first 70 positive integers. Report your result using a cat() statement.
Solution
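n <- 70
# Left-hand side: add the squares one at a time (avoid naming a variable `sum`)
total <- 0
for (i in 1:n) {
  total <- total + i^2
}
cat("LHS =", total, "\n")
## LHS = 116795
# Right-hand side: the closed-form formula
rhs <- (n * (n + 1) * (2 * n + 1)) / 6
cat("RHS =", rhs, "\n")
## RHS = 116795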
Part (b)
Use a vectorized approach to calculate the sum of the squares of the first 20 positive integers. Report your result using a cat() statement.
Solution
vec <- (1:20)^2
cat(sum(vec))
## 2870
Part (c)
Use a for loop to calculate the sum of the squares of the first 20 positive integers. Report your result using a cat() statement.
Solution
n <- 20
# Accumulate in `total` rather than `sum`, which would shadow base R's sum()
total <- 0
for (i in 1:n) {
  total <- total + i^2
}
cat("By loop =", total, "\n")
## By loop = 2870
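As a cross-check, the three approaches from parts (a) to (c) can be wrapped in small functions and compared for several values of n. This is only a sketch; the function names below are illustrative and are not part of the homework objects.

```r
# Three ways to compute 1^2 + 2^2 + ... + n^2 (function names are illustrative).
sum_sq_formula <- function(n) n * (n + 1) * (2 * n + 1) / 6

sum_sq_vector <- function(n) sum((1:n)^2)

sum_sq_loop <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2
  }
  total
}

# Print the three results side by side for a few values of n.
for (n in c(5, 20, 70)) {
  cat("n =", n, ":", sum_sq_formula(n), sum_sq_vector(n), sum_sq_loop(n), "\n")
}
```

All three functions should agree: 55 for n = 5, 2870 for n = 20, and 116795 for n = 70.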
End of problem 1
Problem 2: Removing -9 and -99
Part (a)
Construct a stripchart of the values in problem.2.data. Notice that the data contains some values that are -9 and some values that are -99.
Solution
stripchart(problem.2.data)
Part (b)
Replace the values in problem.2.data that are equal to -9 with the special value NA. Write a short sentence explaining which locations had a -9 value.
Solution
problem.2.data[problem.2.data == -9] <- NA
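The problem also asks which locations held the sentinel values, and that has to be checked before they are overwritten. A minimal sketch using `which()` follows; the vector `x` is a made-up stand-in, since the real `problem.2.data` lives in the .Rdata file.

```r
# Hypothetical stand-in for problem.2.data; the real values come from the .Rdata file.
x <- c(12, -9, 15, -99, 18, -9, 21)

# which() returns the positions where the condition is TRUE.
loc_minus9  <- which(x == -9)
loc_minus99 <- which(x == -99)
cat("Locations with -9:", loc_minus9, "\n")
cat("Locations with -99:", loc_minus99, "\n")

# Replace both sentinel values with NA in one step.
x[x %in% c(-9, -99)] <- NA
```

For this made-up vector, the -9 values sit at positions 2 and 6 and the -99 value at position 4. The same two `which()` calls on `problem.2.data` would supply the locations for the written sentence.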
Part (c)
Now replace the values in problem.2.data that are equal to -99 with the special value NA. Write a short sentence explaining which locations had a -99 value.
At the end of this problem, the vector problem.2.data should have the same values as it originally did, except that both -9 and -99 values have been replaced with NA values.
Solution
problem.2.data[problem.2.data == -99] <- NA
Part (d)
Now create a new stripchart of the data in problem.2.data. Since the -9 and -99 values have been converted to NA values, you should not see them in the graph.
Solution
stripchart(problem.2.data)
End of problem 2
Problem 3: Temperature Conversion
Let $F$ denote a temperature measurement in degrees Fahrenheit, and $C$ denote a temperature measurement in degrees Centigrade. Then we have:
$$ F\ =\ \frac{9}{5} \cdot C + 32 $$
Conversely, we also have:
$$ C\ =\ \frac{5}{9} \cdot (F - 32) $$
Part (a)
The vector problem.3.a.data consists of a sequence of temperature measurements, recorded in degrees Fahrenheit. Using vectorized operations, convert these measurements to degrees Centigrade. Then report the sample mean of this data using a cat() statement, rounding to 5 decimal places.
Solution
C<- 5/9 * (problem.3.a.data -32)
meanTemp<-round(mean(C),5)
cat(meanTemp)
## 18.54857
Part (b)
The vector problem.3.b.data consists of a sequence of temperature measurements, recorded in degrees Centigrade. Using vectorized operations, convert these measurements to degrees Fahrenheit. Then report the sample maximum and sample minimum of this data using a separate cat() statement for each value, rounding to 5 decimal places.
Solution
# F = 9/5 * C + 32: multiply first, then add 32
fahrenheit <- 9/5 * problem.3.b.data + 32
cat(round(max(fahrenheit), 5), "\n")
cat(round(min(fahrenheit), 5), "\n")
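Operator placement matters in these conversions: the formula is 9/5 * C + 32, so the 32 is added after scaling, whereas the inverse subtracts 32 before scaling. A quick sanity check on known reference temperatures might look like the sketch below. The helper names `c_to_f` and `f_to_c` are illustrative only.

```r
# Helper names are illustrative, not part of the homework objects.
c_to_f <- function(celsius) 9/5 * celsius + 32
f_to_c <- function(fahrenheit) 5/9 * (fahrenheit - 32)

temps_c <- c(0, 37, 100)   # freezing point, body temperature, boiling point
temps_f <- c_to_f(temps_c)
cat(temps_f, "\n")

# Converting back should recover the original values (up to floating point).
stopifnot(isTRUE(all.equal(f_to_c(temps_f), temps_c)))
```

Freezing and boiling should map to 32 and 212 degrees Fahrenheit. A result far outside that range for ordinary inputs usually points to misplaced parentheses.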
End of problem 3
Problem 4: Graphing the Logistic Function
The logistic function is defined as:
$$ f(x)\ =\ \frac{ e^x }{1 + e^x} $$
Draw a graph of this function:
· First, draw the function curve itself, with the $x$-axis ranging from -6 to +6 and the $y$-axis ranging from 0 to 1.5. Use a solid line for the curve, and choose a nice color.
· Draw the horizontal reference line $y = 0$ from $x = -6$ to $x = +6$.
· Draw a vertical reference line $x = 0$ from $y = 0$ to $y = 1.5$.
· Draw the horizontal asymptote $y = 1$ from $x = -6$ to $x = +6$.
Solution
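logistic <- function(x) exp(x) / (1 + exp(x))
x <- seq(-6, 6, length.out = 200)  # fine grid so the curve looks smooth
# Plot against x (not the index) with the requested axis ranges
plot(x, logistic(x), type = "l", lty = 1, col = "cyan", xlim = c(-6, 6), ylim = c(0, 1.5), xlab = "x", ylab = "f(x)")
segments(-6, 0, 6, 0, col = "red")            # horizontal reference line y = 0
segments(0, 0, 0, 1.5, col = "blue")          # vertical reference line x = 0
segments(-6, 1, 6, 1, col = "gray", lty = 2)  # horizontal asymptote y = 1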
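Before trusting the picture, it can help to check a few properties of the function numerically: it should pass through 0.5 at x = 0, increase steadily, and stay strictly between 0 and 1. The sketch below also compares it against base R's `plogis()`, which computes the same logistic function. The variable names are illustrative.

```r
logistic <- function(x) exp(x) / (1 + exp(x))

x <- seq(-6, 6, by = 0.5)
y <- logistic(x)

cat("f(0)  =", logistic(0), "\n")
cat("f(6)  =", round(logistic(6), 5), "\n")
cat("f(-6) =", round(logistic(-6), 5), "\n")

# plogis() is base R's logistic distribution function; it should agree.
stopifnot(isTRUE(all.equal(y, plogis(x))))
```

The value at x = 6 is already very close to 1, which is why the asymptote y = 1 sits just above the right end of the curve.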
End of problem 4
Problem 5: Filtering Extreme Values
Systolic blood pressures are typically in the range of about 120 to 140, although in extreme cases they can be as high as 180.
Part (a)
The variable problem.5.data contains numeric values representing systolic blood pressures. Unfortunately, due to data entry errors, there are some values in this dataset that are too large to represent a valid systolic blood pressure.
For the first part of this problem, construct a histogram of the data in problem.5.data.
Solution
hist(problem.5.data)
Part (b)
Using the histogram that you drew, remove the unusual values in the data. You’ll have to determine which values are too large, although this should be clear. Don’t set these values to NA – instead, actually create a vector which has these values removed. The result should be a vector that has a length that is less than the original vector.
When you’ve finished this filtering operation, report the length of the filtered vector, along with the sample mean of the values in the filtered vector. Use a separate cat() statement for each value, rounding to 5 decimal places.
Solution
data<-problem.5.data[problem.5.data < 250]
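Part (b) also asks for the length and mean of the filtered vector. A sketch of that reporting step, run on a made-up vector `bp` rather than the real `problem.5.data`, with the cutoff of 250 taken from the line above:

```r
# Hypothetical stand-in for problem.5.data; 999 and 1200 mimic data entry errors.
bp <- c(118, 125, 131, 140, 999, 122, 1200, 136)

# Keep only plausible readings; logical indexing drops the rest entirely.
bp_filtered <- bp[bp < 250]

# Report the length and the mean with separate cat() statements.
cat(length(bp_filtered), "\n")
cat(round(mean(bp_filtered), 5), "\n")
```

Note that `cat()` prints at most 7 significant digits by default, so a value rounded to 5 decimal places may display with fewer decimals.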
Part (c)
Now create a histogram for the filtered data from part (b). Make sure you include a main title as well as titles for the horizontal and vertical axes, select a nice color, and choose the number of breaks.
Solution
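hist(data, main = "Systolic Blood Pressure (Filtered)", xlab = "Systolic blood pressure",
     ylab = "Frequency", col = "lightblue", breaks = 20)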
End of problem 5
Problem 6: Grouping Categories
Part (a)
The variable problem.6.data contains data on support requests at each of the four offices of WiDgT. Create a table of these values, summarizing the number of requests for each of the offices, and display this table directly (i.e. you don’t need to do anything like a cat() statement).
Solution
table(problem.6.data)
## problem.6.data
##         Boston         London Salt Lake City       Shanghai
##            136            102            277            184
Part (b)
Create a pie chart using the tabulated data from part (a). Be sure to give your pie chart a main title, and to choose nice colors for the pie slices.
Solution
pie(table(problem.6.data), main="Pie Chart City Count", col=rainbow(4))
Part (c)
Group the categories “Boston” and “Salt Lake City” together into a category named “Domestic”. Group the categories “London” and “Shanghai” together into a category named “International”. Then construct and display a table summarizing the total number of requests for these two grouped categories.
Solution
table<-as.data.frame(table(problem.6.data))
table$Category<-c("Domestic","International","Domestic","International")
table
##   problem.6.data Freq      Category
## 1         Boston  136      Domestic
## 2         London  102 International
## 3 Salt Lake City  277      Domestic
## 4       Shanghai  184 International
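The data frame above labels each office but does not yet total the two groups. One way to collapse the categories, sketched here under the assumption that the data is a factor of office names, is to assign a named list to `levels()`. The factor `offices` below is rebuilt from the counts printed above and is only a stand-in for `problem.6.data`.

```r
# Rebuild a factor with the office counts shown in the table above.
offices <- factor(rep(c("Boston", "London", "Salt Lake City", "Shanghai"),
                      times = c(136, 102, 277, 184)))

# Assigning a named list to levels() merges the old levels into new ones.
region <- offices
levels(region) <- list(Domestic = c("Boston", "Salt Lake City"),
                       International = c("London", "Shanghai"))

# Tabulate the two grouped categories.
grouped_counts <- table(region)
grouped_counts
```

With these counts the grouped table has 413 Domestic requests (136 + 277) and 286 International requests (102 + 184).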
Part (d)
Display the grouped data from part (c) not as a table of raw counts, but instead as a table of the relative proportions. Round the values to 2 decimal places.
Solution
x <- table(problem.6.data)
piepercent<- round(100*x/sum(x), 1)
pie(x, labels = piepercent, main = "City pie chart",col = rainbow(length(x)))
legend("topright", names(x), cex = 0.8, fill = rainbow(length(x)))
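Since part (d) asks for a table of relative proportions of the grouped data, a short sketch of that calculation may be useful. It starts from the office counts printed in part (a); the names `grouped_counts` and `grouped_props` are illustrative.

```r
# Grouped totals built from the office counts in part (a).
grouped_counts <- c(Domestic = 136 + 277, International = 102 + 184)

# Divide by the grand total and round to 2 decimal places.
grouped_props <- round(grouped_counts / sum(grouped_counts), 2)
grouped_props
```

This gives 0.59 for Domestic and 0.41 for International, out of 699 requests in total.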
Part (e)
Now create a pie chart using the grouped data from part (c).
x <- table
pie(x$Freq, labels = piepercent, main = "City pie chart",col =...