1. The dataset prostate comes from a study on 97 men with prostate cancer who were due to receive a radical prostatectomy. Fit a model with lpsa as the response and lcavol as the predictor. Record the residual standard error and the R². Now add lweight, svi, lbph, age, lcp, pgg45 and gleason to the model one at a time. For each model record the residual standard error and the R². Plot the trends in these two statistics.
2. Using the prostate data, plot lpsa against lcavol. Fit the regressions of lpsa on lcavol and lcavol on lpsa. Display both regression lines on the plot. At what point do the two lines intersect?
3. An experiment was conducted to examine factors that might affect the height of leaf springs in the suspension of trucks. The data may be found in truck. The five factors in the experiment are set to − and + but it will be more convenient for us to use −1 and +1. This can be achieved for the first factor by: truck$B
Repeat for the other four factors.
(a) Fit a linear model for the height in terms of the five factors. Report on the value of the regression coefficients.
(b) Fit a linear model using just factors B, C, D and E and report the coefficients. How do these compare to the previous question? Show how we could have anticipated this result by examining the X matrix.
(c) Construct a new predictor called A which is set to B+C+D+E. Fit a linear model with the predictors A, B, C, D, E and O. Do coefficients for all six predictors appear in the regression summary? Explain.
(d) Extract the model matrix X from the previous model. Attempt to compute β̂ from (XᵀX)⁻¹Xᵀy. What went wrong and why?
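For question 2, a minimal sketch in R. Both least-squares lines pass through the point of means (x̄, ȳ), so that is where they intersect; the only exception is |r| = 1, when the two lines coincide. The helpers two_line_intersection and plot_two_lines are hypothetical names, and the prostate part only runs if the faraway package is installed.

```r
# Sketch: fit y on x and x on y, draw both least-squares lines and
# return the point where they cross. Helper names are hypothetical.
two_line_intersection <- function(x, y) {
  fit_yx <- lm(y ~ x)   # line 1: y = a1 + b1 * x
  fit_xy <- lm(x ~ y)   # line 2: x = a2 + b2 * y
  a1 <- coef(fit_yx)[[1]]; b1 <- coef(fit_yx)[[2]]
  a2 <- coef(fit_xy)[[1]]; b2 <- coef(fit_xy)[[2]]
  # Substitute line 1 into line 2: x = a2 + b2 * (a1 + b1 * x).
  # Because b1 * b2 = r^2, there is a unique solution whenever |r| < 1.
  x_star <- (a2 + b2 * a1) / (1 - b1 * b2)
  y_star <- a1 + b1 * x_star
  c(x = x_star, y = y_star)
}

plot_two_lines <- function(x, y, xlab = "x", ylab = "y") {
  fit_yx <- lm(y ~ x)
  fit_xy <- lm(x ~ y)
  plot(x, y, xlab = xlab, ylab = ylab)
  abline(fit_yx, col = "blue")                      # regression of y on x
  # Rewrite x = a2 + b2 * y as y = -a2 / b2 + (1 / b2) * x before drawing
  abline(a = -coef(fit_xy)[[1]] / coef(fit_xy)[[2]],
         b = 1 / coef(fit_xy)[[2]], col = "red")    # regression of x on y
  pt <- two_line_intersection(x, y)
  points(pt[["x"]], pt[["y"]], pch = 19)            # mark the crossing point
  invisible(pt)
}

if (requireNamespace("faraway", quietly = TRUE)) {
  data("prostate", package = "faraway")
  pt <- plot_two_lines(prostate$lcavol, prostate$lpsa,
                       xlab = "lcavol", ylab = "lpsa")
  print(pt)                                             # intersection
  print(c(mean(prostate$lcavol), mean(prostate$lpsa)))  # point of means
}
```

Solving the two equations directly, rather than reading the crossing point off the plot, makes it easy to confirm numerically that the intersection equals the pair of sample means.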

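For question 3, a minimal sketch of the reasoning in parts (a) to (d). It does not use the truck data. Instead it builds a simulated ±1 design, assumed here to be an orthogonal 2^(5−1) layout with O = B×C×D×E and made-up effect sizes, so the behaviour is easy to see. to_pm1 is a hypothetical helper for the −/+ to −1/+1 recoding; with the real data it would be applied to each of the five factor columns. Whether the truck design is orthogonal is exactly what inspecting crossprod(X) in part (b) checks.

```r
# Hypothetical helper: recode a "-"/"+" factor to -1/+1
to_pm1 <- function(v) ifelse(as.character(v) == "-", -1, 1)

# SIMULATED design (an assumption, not the truck data): B, C, D, E form a
# full 2^4 factorial in -1/+1 coding and O = B*C*D*E, so all five columns
# are orthogonal to each other and to the intercept.
design <- expand.grid(B = c(-1, 1), C = c(-1, 1), D = c(-1, 1), E = c(-1, 1))
design$O <- with(design, B * C * D * E)
set.seed(7)
design$height <- 7.6 + 0.11 * design$B - 0.09 * design$C + 0.05 * design$E +
  rnorm(nrow(design), sd = 0.05)   # made-up effects for illustration

# (a)/(b): with orthogonal columns, dropping O leaves the other estimates unchanged
fit_all  <- lm(height ~ B + C + D + E + O, data = design)
fit_four <- lm(height ~ B + C + D + E, data = design)
X <- model.matrix(fit_all)
print(crossprod(X))   # X^T X: zero off-diagonals mean orthogonal columns

# (c): A is an exact linear combination of B, C, D and E, so lm() reports NA
# for the term that is redundant given those entered before it (here E)
design$A <- with(design, B + C + D + E)
fit_alias <- lm(height ~ A + B + C + D + E + O, data = design)
print(coef(fit_alias))

# (d): X^T X is singular, so the textbook inverse does not exist
Xa <- model.matrix(fit_alias)
y  <- design$height
beta_hat <- tryCatch(
  solve(t(Xa) %*% Xa) %*% t(Xa) %*% y,
  error = function(e) conditionMessage(e)
)
print(beta_hat)                                   # an error message, not estimates
print(c(columns = ncol(Xa), rank = qr(Xa)$rank))  # 7 columns but rank 6
```

If XᵀX is diagonal, each least-squares coefficient depends only on its own column, so removing O cannot change the estimates for B, C, D and E. In (c) and (d) the columns of X are linearly dependent, so XᵀX has no inverse; lm() sidesteps this with a pivoted QR decomposition and marks the aliased coefficient as NA.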



Subhanbasha answered on Nov 16 2021


Data Analysis
Introduction:
    In this analysis we carry out regression analysis in R, one of the most widely used statistical software environments across many domains. Regression tells us whether there is a relationship between the response and the predictor variables and, if there is, lets us use that relationship to make predictions. Many industries use regression in the same way, to forecast the performance of their business and make better-informed decisions.
    We build the model step by step: we start with a single predictor and then add the remaining predictors one at a time. This shows how much each added predictor contributes to explaining the response, and it also reveals when a predictor does not behave as expected. At each step we focus on two statistics, the residual standard error and R². The residual standard error estimates the typical distance between the observed values of the response and the model's fitted values, in the units of the response, so it tells us how well the model fits and whether changes to the model might improve it.
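A minimal sketch of how the residual standard error is computed: RSE = √(RSS / (n − p)), where RSS is the residual sum of squares, n the number of observations and p the number of estimated coefficients (including the intercept). It uses R's built-in cars data purely for illustration (an assumption; cars is not part of this analysis) and checks the hand calculation against summary()$sigma, the value the models in this analysis record.

```r
# Residual standard error by hand: sqrt(RSS / (n - p)).
# 'cars' is a built-in R data set used only for illustration.
fit <- lm(dist ~ speed, data = cars)

rss <- sum(residuals(fit)^2)        # residual sum of squares
n   <- nrow(cars)                   # number of observations
p   <- length(coef(fit))            # coefficients estimated, incl. intercept
rse_manual  <- sqrt(rss / (n - p))  # n - p = residual degrees of freedom
rse_summary <- summary(fit)$sigma   # the value summary() reports as sigma

print(c(manual = rse_manual, summary = rse_summary))
```

Because the divisor is n − p rather than n, adding a predictor that removes very little RSS can make the RSE go up.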
As mentioned above, R² is also a key measure of model performance. It gives the proportion (often quoted as a percentage) of the variation in the response variable that is explained by the predictor variables, which indicates how useful the predictors are for making predictions.
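A minimal sketch of the R² calculation: R² = 1 − RSS/TSS, where TSS = Σ(yᵢ − ȳ)² is the total variation of the response around its mean. As in a plain illustration, it uses R's built-in cars data (an assumption, not the prostate data) and compares the hand calculation with summary()$r.squared and, since there is only one predictor, with the squared correlation.

```r
# R^2 by hand on the built-in 'cars' data (illustration only).
fit <- lm(dist ~ speed, data = cars)
y   <- cars$dist

rss <- sum(residuals(fit)^2)   # variation left unexplained by the model
tss <- sum((y - mean(y))^2)    # total variation around the mean
r2_manual  <- 1 - rss / tss
r2_summary <- summary(fit)$r.squared

# With a single predictor, R^2 also equals the squared correlation
r2_cor <- cor(cars$speed, cars$dist)^2

# Shown as percentages, rounded to one decimal place
print(round(100 * c(manual = r2_manual, summary = r2_summary, cor2 = r2_cor), 1))
```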
All of the steps above were carried out in R, which was used to generate the outputs.
Analysis – using prostate data:
Question 1:
    The prostate data set is included in faraway, a free add-on package for R. The data come from a study of 97 men with prostate cancer who were due to receive a radical prostatectomy.
First we fit a model with two variables: the response (dependent) variable is lpsa and the predictor (independent) variable is lcavol.
The model with these two variables is fitted in R as follows:
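# Load the prostate data from the faraway package
library(faraway)
prostate_df <- prostate
# Model1: lpsa regressed on lcavol
model1 <- lm(lpsa ~ lcavol, data = prostate_df)
# Residual standard error and R square
summ1 <- summary(model1)
R_sq1 <- summ1$r.squared
Res1 <- summ1$sigma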
The residual standard error and R² are 0.79 and 0.54 respectively (rounded to two decimal places).
In the next step we add the lweight variable to the existing model. The fitted model and its output are as follows.
# Model2
model2 <- lm(lpsa~lcavol+lweight, data=prostate_df)
# Residual Standard error and R square
summ2 <- summary(model2)
R_sq2 <- summ2$r.squared
Res2 <- summ2$sigma
The residual standard error and R² are 0.75 and 0.58 respectively (rounded to two decimal places).
Compared with the previous model, which used only one predictor, the residual standard error has decreased and R² has increased, which indicates that this model performs better than the previous one. The variable lweight therefore explains some of the remaining variation in the response, so it is a useful variable in the model. Note that R² can never decrease when a predictor is added to a least-squares model, so the fall in the residual standard error, which accounts for the extra parameter, is the more informative of the two signals.
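The claim that R² cannot fall when a predictor is added can be checked with a small simulation. This is a minimal sketch on simulated data (not the prostate data): the extra predictor noise is unrelated to the response by construction, yet the larger model's R² is still at least as high, while the residual standard error and adjusted R² take the extra parameter into account.

```r
set.seed(1)
n     <- 97                      # same size as the prostate data; values are simulated
x     <- rnorm(n)
y     <- 1 + 2 * x + rnorm(n)
noise <- rnorm(n)                # unrelated to y by construction

small <- summary(lm(y ~ x))
big   <- summary(lm(y ~ x + noise))

# R^2 of the larger model is never lower, even though 'noise' is useless;
# RSE and adjusted R^2 account for the extra parameter and may move either way.
print(rbind(
  small = c(R2 = small$r.squared, adjR2 = small$adj.r.squared, RSE = small$sigma),
  big   = c(R2 = big$r.squared,   adjR2 = big$adj.r.squared,   RSE = big$sigma)
))
```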
In the next step we add the svi variable to the existing model. The fitted model and its output are as follows.
# Model3
model3 <- lm(lpsa~lcavol+lweight+svi, data=prostate_df)
# Residual Standard error and R square
summ3 <- summary(model3)
R_sq3 <- summ3$r.squared
Res3 <- summ3$sigma
The residual standard error and R² are 0.72 and 0.63 respectively (rounded to two decimal places).
Compared with the previous model, which used two predictors, the residual standard error has decreased again and R² has increased, which indicates better performance than the previous model. The variable svi therefore explains some of the remaining variation in the response, so it is a useful variable in the model.
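Whether an added predictor is useful can also be tested formally with a partial F-test comparing the two nested models, a standard technique offered here as an alternative to comparing RSE and R² by eye. This is a minimal sketch on simulated data; x1, x2 and the 0/1 indicator z (standing in for a binary predictor such as svi) are hypothetical. With the prostate models the analogous call would be anova(model2, model3).

```r
set.seed(42)
n  <- 97
x1 <- rnorm(n)
x2 <- rnorm(n)
z  <- rbinom(n, size = 1, prob = 0.2)            # simulated 0/1 indicator
y  <- 1 + 0.7 * x1 + 0.3 * x2 + 1.0 * z + rnorm(n, sd = 0.7)

m_small <- lm(y ~ x1 + x2)
m_big   <- lm(y ~ x1 + x2 + z)

# Partial F-test: does adding z reduce the RSS by more than chance would?
cmp <- anova(m_small, m_big)
print(cmp)
p_value <- cmp[["Pr(>F)"]][2]

# For a single added term, F equals the square of that term's t statistic
t_z <- summary(m_big)$coefficients["z", "t value"]
print(c(F = cmp$F[2], t_squared = t_z^2))
```

Unlike a rise in R², a small p-value here indicates that the reduction in RSS is larger than adding an irrelevant predictor would typically produce.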
In the next step we add the lbph variable to the existing model. The fitted model and its output are as follows.
# Model4
model4 <- lm(lpsa~lcavol+lweight+svi+lbph, data=prostate_df)
# Residual Standard error and R square
summ4 <- summary(model4)
R_sq4 <- summ4$r.squared
Res4 <- summ4$sigma
The residual standard error and R² are 0.71 and 0.64 respectively (rounded to two decimal places).
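Rather than writing out each model by hand, all eight nested models in question 1 can be fitted in a loop, and the two statistics plotted against the predictor added at each step, as the question asks. This is a minimal sketch: nested_fit_stats and plot_fit_trends are hypothetical helper names, and the prostate part only runs if the faraway package is installed.

```r
# Fit nested models, adding predictors one at a time, and record RSE and R^2.
nested_fit_stats <- function(data, response, predictors) {
  stats <- data.frame(
    k     = seq_along(predictors),
    added = predictors,
    rse   = NA_real_,
    r2    = NA_real_,
    stringsAsFactors = FALSE
  )
  for (k in seq_along(predictors)) {
    # Formula with the first k predictors, e.g. lpsa ~ lcavol + lweight
    f <- reformulate(predictors[seq_len(k)], response = response)
    s <- summary(lm(f, data = data))
    stats$rse[k] <- s$sigma
    stats$r2[k]  <- s$r.squared
  }
  stats
}

# Two side-by-side panels: RSE and R^2 against the predictor added at each step.
plot_fit_trends <- function(stats) {
  op <- par(mfrow = c(1, 2), mar = c(6, 4, 2, 1))
  on.exit(par(op))                      # restore graphics settings afterwards
  labs <- as.character(stats$added)
  plot(stats$k, stats$rse, type = "b", xaxt = "n",
       xlab = "", ylab = "Residual standard error")
  axis(1, at = stats$k, labels = labs, las = 2)
  plot(stats$k, stats$r2, type = "b", xaxt = "n",
       xlab = "", ylab = "R-squared")
  axis(1, at = stats$k, labels = labs, las = 2)
}

if (requireNamespace("faraway", quietly = TRUE)) {
  data("prostate", package = "faraway")
  preds <- c("lcavol", "lweight", "svi", "lbph", "age", "lcp", "pgg45", "gleason")
  prostate_stats <- nested_fit_stats(prostate, "lpsa", preds)
  print(prostate_stats, digits = 2)
  plot_fit_trends(prostate_stats)
}
```

Building the formulas with reformulate() keeps the predictor order in one vector, so the order in which variables are added can be changed in a single place.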
Compared with the previous model, which used three predictors, the residual...
