1. The data set Xdata.RData contains a data frame, Xdata, that has n = 100 observations (rows) and five variables named X1, X2, X3, X4, and Y. This is a simulated data set generated from the following model:



The error term ε is a normal random variable with mean 0 and standard deviation σ. In the simulation the following parameter values were used: β0 = 0, β1 = 1, β2 = 1, β3 = 1, β4 = 1, and σ = 0.5.



How successfully can the iterative process described above identify the model that generated the data? To answer this question, ask what you should expect to see for the Box-Cox parameter λ, and transformations of the predictor variables. Do you get something similar from the data? Can you identify the model coefficients reasonably well?



2. Use the Boston Housing R data (BHD0.RData) to give a 95% prediction interval for the median home value of a census tract that has the following characteristics: NOX = .65, RM = 5.5, AGE = 80, and LSTAT = .16. Use a logarithm transformation with MEDV, and assume that the predictors are correctly treated without using transformations.




3. For the prostate R data, fit a model with lpsa as the response and the other variables as predictors. Answer the following questions:



(a) Check for outliers.



(b) Check for influential points.



(c) Check the structure of the relationship between the predictors and the response.




4. Use the fat R data, fitting the model described in Section 4.2.



> data(fat,package="faraway")


> lmod



(a) Compute the condition numbers and variance inflation factors. Comment on the degree of collinearity observed in the data.



(b) Cases 39 and 42 are unusual. Refit the model without these two cases and recompute the collinearity diagnostics. Comment on the differences observed from the full data fit.



(c) Fit a model with brozek as the response and just age, weight and height as predictors. Compute the collinearity diagnostics and compare to the full data fit.



(d) Compute a 95% prediction interval for brozek for the median values of age, weight and height.



(e) Compute a 95% prediction interval for brozek for age=40, weight=200 and height=73. How does the interval compare to the previous prediction?



(f) Compute a 95% prediction interval for brozek for age=40, weight=130 and height=73. Are the values of predictors unusual? Comment on how the interval compares to the previous two answers.




5. Ankylosing spondylitis is a chronic form of arthritis. A study was conducted to determine whether daily stretching of the hip tissues would improve mobility. The R data are found in hips. The flexion angle of the hip before the study is a predictor and the flexion angle after the study is the response.



(a) Plot the data using different plotting symbols for the treatment and the control status.



(b) Fit a model to determine whether there is a treatment effect.



(c) Compute the difference between the flexion before and after and test whether this difference varies between treatment and control. Contrast this approach to your previous model.

Answered 1 day after Dec 07, 2021

Subhanbasha answered on Dec 08 2021
Question 1:
Ans:
The data were simulated in R by specifying values for the coefficients of the predictors.
After trying a number of candidate models, the model identified as having generated the data is
Y= sqrt(X1^2)+sqrt(X2^2)+sqrt(X3^2)+sqrt(X4^2)
The estimated coefficients equal the simulation values when rounded to a single digit, so the values given by the fitted model are approximately equal to the simulated results.
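As a rough illustration of what to expect from the Box-Cox step, the sketch below simulates a stand-in data set and profiles λ. The generating model is not reproduced in the question text, so the plain linear form, the uniform predictor ranges, and the names sim, fit, and lambda_hat are assumptions made for illustration only. Only n = 100, the coefficient values, and σ = 0.5 come from the question.

```r
library(MASS)  # boxcox() lives in MASS, which ships with standard R installs

set.seed(1)
n <- 100

# ASSUMPTION: hypothetical stand-in for Xdata with positive uniform predictors
sim <- data.frame(X1 = runif(n, 1, 5), X2 = runif(n, 1, 5),
                  X3 = runif(n, 1, 5), X4 = runif(n, 1, 5))

# ASSUMPTION: a plain linear model; coefficients and sigma are from the question
sim$Y <- with(sim, 0 + 1 * X1 + 1 * X2 + 1 * X3 + 1 * X4 + rnorm(n, sd = 0.5))

# y = TRUE stores the response in the fit so boxcox() does not need to refit
fit <- lm(Y ~ X1 + X2 + X3 + X4, data = sim, y = TRUE)

# Profile the Box-Cox log-likelihood over a grid of lambda values
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

print(round(coef(fit), 2))  # compare with the simulation values 0, 1, 1, 1, 1
print(lambda_hat)           # a value near 1 would suggest leaving Y untransformed
```

With the real Xdata, the same two checks (the maximizing λ and the rounded coefficients) are what the question asks to compare against the known simulation settings.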
Question 2:
Ans: The data are stored in an .RData file, so load() is used to bring them into the R session. The code is as follows:
# Reading data: load() restores the saved objects into the workspace
# (the data frame is assumed to be named BHD, as in the model call below)
load("BHD0.RData")
# Regression model with a log-transformed response
bhd_lm <- lm(log(MEDV) ~ ., data = BHD)
The code above reads the data and fits a regression model. The data contain five variables: MEDV is the response and the remaining four variables are the predictors. The response is log-transformed before the model is fitted. The fitted model is then used to predict for the new census tract.
# Prediction interval
testing <- data.frame(NOX=0.65,RM=5.5,AGE=80,LSTAT=0.16)
pred_lm1 <- predict(bhd_lm, newdata = testing)
predict(bhd_lm, newdata = testing, interval = 'prediction')
In the step above we create a data frame holding the new tract's characteristics and pass it to predict(), which returns the predicted value for that input. The predicted value is 2.953969. Because the model was fitted to log(MEDV), this value is on the log scale and must be exponentiated to return to the original scale of MEDV.
The prediction interval is
     fit      lwr     upr
2.953969 2.661829 3.24611
On the log scale the predicted value is 2.953969 with a 95% prediction interval of (2.661829, 3.24611). Exponentiating gives a predicted MEDV of about 19.18 with a 95% prediction interval of about (14.32, 25.69). That is, a new census tract with these characteristics is expected to have a median home value in this range with 95% probability; it is not a guarantee that the value cannot fall outside it.
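The back-transformation from the log scale can be sketched in a couple of lines. The numbers are the fit, lwr, and upr values reported above; the names pi_log and pi_medv are introduced here for illustration.

```r
# Prediction interval on the log(MEDV) scale, as reported by predict()
pi_log <- c(fit = 2.953969, lwr = 2.661829, upr = 3.24611)

# exp() undoes the log transformation applied to the response
pi_medv <- exp(pi_log)
print(round(pi_medv, 2))
```

Because exp() is monotone, the endpoints of the interval map directly to the original scale. With the fitted model in hand, the same result should come from exp(predict(bhd_lm, newdata = testing, interval = "prediction")).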
Question 3:
Ans: The prostate data are available in R through the faraway package. We fit a regression model with lpsa as the response and all other variables as predictors.
# Load the data
data(prostate, package = "faraway")
# Model building
pro_lm <- lm(lpsa ~ ., data = prostate)
# Summary of the model
summary(pro_lm)
The model summary reports the coefficient estimates along with several indicators of model performance.
a).
Ans: Several plots can be used to check for outliers in the data or model. Here we use the normal Q-Q plot from the regression diagnostics.
# Diagnostic plots
plot(pro_lm)
In the normal Q-Q plot, observations 39, 85, and 69 lie far from the rest of the data at the upper and lower ends, so they are flagged as outliers in the model. For a better fit, they should be removed before the model is refitted.
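A visual check can be backed up with a formal test on the studentized residuals. The sketch below is one common way to do that, using a Bonferroni-adjusted t critical value. It runs on the built-in mtcars data as a stand-in so that it is self-contained; the model formula and the names stud, crit, and outliers are assumptions for illustration, not part of the original answer.

```r
# ASSUMPTION: mtcars stands in for the prostate data so the sketch runs anywhere
fit <- lm(mpg ~ wt + hp, data = mtcars)

stud <- rstudent(fit)    # externally studentized residuals
n <- nrow(mtcars)
p <- length(coef(fit))   # number of estimated coefficients

# Two-sided Bonferroni critical value at overall level 0.05
crit <- qt(1 - 0.05 / (2 * n), df = n - p - 1)

# Cases whose studentized residual exceeds the adjusted threshold
outliers <- names(stud)[abs(stud) > crit]

print(head(sort(abs(stud), decreasing = TRUE), 3))  # largest residuals
print(crit)
print(outliers)          # empty if no case exceeds the threshold
```

Replacing fit with pro_lm and mtcars with prostate would apply the same test to cases such as 39, 85, and 69.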
b).
Ans:
The diagnostic plots show several points that do not follow the pattern of the rest of the data and lie far from the fitted model, so there appear to be several influential points.
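Influence can also be quantified with Cook's distance and leverage. The sketch below again uses mtcars as a stand-in data set; the 4/n and 2p/n cutoffs are common rules of thumb rather than formal tests, and the names cd, lev, flag_cook, and flag_lev are assumptions for illustration.

```r
# ASSUMPTION: mtcars stands in for the prostate data so the sketch runs anywhere
fit <- lm(mpg ~ wt + hp, data = mtcars)

cd  <- cooks.distance(fit)  # effect of deleting each case on the fitted values
lev <- hatvalues(fit)       # leverage: diagonal of the hat matrix

n <- nrow(mtcars)
p <- length(coef(fit))

# Rule-of-thumb cutoffs (heuristics, not formal tests)
flag_cook <- names(cd)[cd > 4 / n]
flag_lev  <- names(lev)[lev > 2 * p / n]

print(head(sort(cd, decreasing = TRUE), 3))  # most influential cases
print(flag_cook)
print(flag_lev)
```

A case flagged on both lists is worth refitting without, to see how much the coefficients move.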
c).
The model output is as follows:
Residuals:
    Min      1Q  Median      3Q     Max
-1.7331 -0.3713 -0.0170  0.4141  1.6381
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.669337   1.296387   0.516  0.60693
lcavol       0.587022   0.087920   6.677 2.11e-09 ***
lweight      0.454467   0.170012   2.673  0.00896 **
age         -0.019637   0.011173  -1.758  0.08229 .
lbph         0.107054   0.058449   1.832  0.07040 .
svi          0.766157   0.244309   3.136  0.00233 **
lcp         -0.105474   0.091013  -1.159  0.24964
gleason      0.045142   0.157465   0.287  0.77503
pgg45        0.004525   0.004421   1.024  0.30886
From the output above, most predictors have a positive relationship with the response: age and lcp have negative coefficients, while the other predictors have positive coefficients. Only lcavol, lweight, and svi are significant at the 5% level. The R-squared value of the model is 0.6234, which means the predictors explain 62.34% of the variation in the response; this is a moderate fit rather than a strong one.
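The coefficient table alone does not show whether each predictor enters linearly. One way to check the structure is with partial residuals, sketched below on mtcars as a stand-in data set. The model formula and the names pr and slope_wt are assumptions for illustration.

```r
# ASSUMPTION: mtcars stands in for the prostate data so the sketch runs anywhere
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Partial residuals: one column per model term (residual plus that term's fit)
pr <- residuals(fit, type = "partial")

# Regressing a term's partial residuals on the predictor recovers its coefficient;
# curvature in a plot of pr[, "wt"] against wt would suggest a transformation
slope_wt <- unname(coef(lm(pr[, "wt"] ~ mtcars$wt))[2])
print(c(partial = slope_wt, model = unname(coef(fit)["wt"])))

# For the plots themselves, uncomment:
# termplot(fit, partial.resid = TRUE, terms = "wt")
```

A roughly straight band of points in each partial residual plot supports the linear form; a bend points to a predictor that may need transforming.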
Question 4:
The fat data set is available in R through the faraway package. We use it to fit the regression model as follows.
# data
data(fat, package = "faraway")
# Model building
lmod <- lm(brozek ~ age + weight + height + neck + chest + abdom + hip + thigh + knee + ankle + biceps + forearm + wrist, data = fat)
Here brozek is the response variable and the remaining variables in the formula are the predictors.
No transformation is applied to any variable in the model. The model summary is as follows:
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -15.29255   16.06992  -0.952  0.34225
age           0.05679    0.02996   1.895  0.05929 .
weight       -0.08031    0.04958  -1.620  0.10660
height       -0.06460    0.08893  -0.726  0.46830
neck         -0.43754    0.21533  -2.032  0.04327 *
chest        -0.02360    0.09184  -0.257  0.79740
...
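For part (a), condition numbers and variance inflation factors can be computed with base R alone. The sketch below shows the calculation on mtcars as a stand-in data set; the model formula and the names x, cond, and vif are assumptions for illustration.

```r
# ASSUMPTION: mtcars stands in for the fat data so the sketch runs anywhere
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Predictor matrix without the intercept column
x <- model.matrix(fit)[, -1]

# Condition numbers: sqrt(largest eigenvalue / each eigenvalue) of X'X
e <- eigen(crossprod(x))
cond <- sqrt(e$values[1] / e$values)

# VIFs: diagonal of the inverse correlation matrix, equal to 1 / (1 - R_j^2)
vif <- diag(solve(cor(x)))

print(round(cond, 1))
print(round(vif, 2))
```

Large condition numbers or VIFs well above 1 point to collinearity among the predictors. Substituting lmod for fit gives the diagnostics for the fat data model.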