*** FOR THE ASSIGNMENT, I JUST NEED ASSISTANCE WITH QUESTIONS 3, 4, 5. I ALREADY DID 1 AND 2 AND I CAN FINISH THE REST ***
Option #1: Linear Regression Model
In this Critical Thinking Assignment, you will install R Markdown, explore and summarize a dataset as well as create a linear regression model. Your assignment submission will be an R Markdown generated Word document.
Install R Markdown. Create a new R Markdown file by performing the following steps.
- Open R Studio
- Select File | New | R Markdown
- Use Module 3 CT Option 1 as the Title
- Use your name as the Author
- Select the Word output format
- Delete all default content after the R Setup block of code, which is all content from line 12 through the end of the file.
Explore the Boston housing data in the BostonHousing.csv file by performing the following steps.
- Apply what you learned in Modules 1 and 2 about data exploration by selecting and running appropriate data exploration functions. Run at least five functions.
- For your assignment submission, copy your commands into your R Markdown file.
- Include R comments on all your code.
- Separate sections of R code by using appropriate R Markdown headings.
- Fit a multiple linear regression model to the median house price (MEDV) as a function of CRIM, CHAS, and RM by following the process under Example: Predicting the Price of Used Toyota Corolla Cars in section 6.3.
- Use the R code example shown in Figure 6.3.
Hint: You will need to remove the categorical variable CAT..MEDV prior to fitting the multiple linear regression model.
- Create a scatter plot with the plot() function with the following attributes.
- Use MEDV as the y-axis
- Use the most significant attribute as the x-axis.
- Use the abline() function to add the fitted regression line to the scatter plot. Pass the model's intercept coefficient as the intercept (a) argument and the coefficient of the most significant attribute as the slope (b) argument.
- For your assignment submission, copy your commands into your R Markdown file.
- Include R comments on all your code.
- Separate sections of R code by using appropriate R Markdown headings.
- Use the R Markdown Knit drop-down menu to select Knit to Word to create the Word document for your assignment submission.
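A rough sketch of the exploration and modeling steps above (assuming BostonHousing.csv sits in your working directory and uses the standard column names MEDV, CAT..MEDV, CRIM, CHAS, and RM; adjust the names to match your file):

```r
# Load the Boston housing data (assumes BostonHousing.csv is in the working directory)
housing.df <- read.csv("BostonHousing.csv")

# --- Data exploration: at least five functions ---
dim(housing.df)      # number of rows and columns
head(housing.df)     # first six rows
str(housing.df)      # variable names and types
summary(housing.df)  # quartiles and means for each variable
colMeans(housing.df[sapply(housing.df, is.numeric)])  # column means

# --- Regression: MEDV as a function of CRIM, CHAS, and RM ---
# Remove the categorical variable CAT..MEDV before fitting
housing.df <- housing.df[, !(names(housing.df) %in% c("CAT..MEDV"))]
housing.lm <- lm(MEDV ~ CRIM + CHAS + RM, data = housing.df)
summary(housing.lm)

# --- Scatter plot with fitted line ---
# RM is often the most significant predictor here; check your own
# summary() output and substitute a different attribute if needed.
plot(housing.df$RM, housing.df$MEDV, xlab = "RM", ylab = "MEDV")
abline(a = coef(housing.lm)["(Intercept)"], b = coef(housing.lm)["RM"])
```

This is a sketch, not a finished submission; it still needs R Markdown headings and comments in your own words.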
Your assignment submission must be one Word document that meets the following requirements:
- Is an R Markdown generated Word document containing all R code used in this assignment, appropriate R comments on code, and appropriate R Markdown headings.
- Does not include a cover page.
- Does not include an abstract.
- Includes a one-page description of what you did and what you learned. Add this description to the end of the R Markdown document as a new page. This page must conform to APA guidelines in the CSU Global Writing Center.
**************************************************
"6.3 Estimating the Regression Equation and Prediction
Once we determine the predictors to include and their form, we estimate the coefficients of the regression formula from the data using a method called ordinary least squares (OLS). This method finds values $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$ that minimize the sum of squared deviations between the actual outcome values ($Y$) and their predicted values based on that model ($\hat{Y}$).
To predict the value of the outcome variable for a record with predictor values $x_1, x_2, \ldots, x_p$, we use the equation

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p \qquad (6.2)$$"
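As a small numeric illustration of this prediction equation (the coefficient and predictor values below are made up, not from any dataset in this document):

```r
# Hypothetical fitted coefficients (for illustration only)
b0 <- 2.5           # intercept
b  <- c(1.2, -0.4)  # coefficients for x1, x2
x  <- c(3, 5)       # predictor values for a new record

# Predicted outcome: yhat = b0 + b1*x1 + ... + bp*xp
yhat <- b0 + sum(b * x)
yhat  # 2.5 + 1.2*3 - 0.4*5 = 4.1
```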
"Predictions based on this equation are the best predictions possible in the sense that they will be unbiased (equal to the true values on average) and will have the smallest mean squared error compared to any unbiased estimates if we make the following assumptions:
The noise ε (or equivalently, Y) follows a normal distribution.
The choice of predictors and their form is correct (linearity).
The records are independent of each other.
The variability in the outcome values for a given set of predictors is the same regardless of the values of the predictors (homoskedasticity).
An important and interesting fact for the predictive goal is that even if we drop the first assumption and allow the noise to follow an arbitrary distribution, these estimates are very good for prediction, in the sense that among all linear models, as defined by equation (6.1), the model using the least squares estimates, $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$, will have the smallest mean squared errors. The assumption of a normal distribution is required in explanatory modeling, where it is used for constructing confidence intervals and statistical tests for the model parameters.
Even if the other assumptions are violated, it is still possible that the resulting predictions are sufficiently accurate and precise for the purpose they are intended for. The key is to evaluate predictive performance of the model, which is the main priority. Satisfying assumptions is of secondary interest and residual analysis can give clues to potential improved models to examine.
Example: Predicting the Price of Used Toyota Corolla Cars
A large Toyota car dealership offers purchasers of new Toyota cars the option to buy their used car as part of a trade-in. In particular, a new promotion promises to pay high prices for used Toyota Corolla cars for purchasers of a new car. The dealer then sells the used cars for a small profit. To ensure a reasonable profit, the dealer needs to be able to predict the price that the dealership will get for the used cars. For that reason, data were collected on all previous sales of used Toyota Corollas at the dealership. The data include the sales price and other information on the car, such as its age, mileage, fuel type, and engine size. A description of each of these variables is given in Table 6.1. A sample of this dataset is shown in Table 6.2. The total number of records in the dataset is 1000 cars (we use the first 1000 cars from the dataset ToyotaCorolla.csv). After partitioning the data into training (60%) and validation (40%) sets, we fit a multiple linear regression model between price (the outcome variable) and the other variables (as predictors) using only the training set. Table 6.3 shows the estimated coefficients. Notice that the Fuel Type predictor has three categories (Petrol, Diesel, and CNG). We therefore have two dummy variables in the model: Fuel_TypePetrol (0/1) and Fuel_TypeDiesel (0/1); the third, for CNG (0/1), is redundant given the information on the first two dummies. Including the redundant dummy would cause the regression to fail, since the redundant dummy will be a perfect linear combination of the other two; R’s “lm” routine handles this issue automatically.
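The dummy coding described above can be inspected directly. A small sketch with a made-up factor (not the actual dataset) shows how R encodes a three-level factor as two dummies plus a baseline:

```r
# A small factor with three levels, mimicking Fuel_Type
fuel <- factor(c("Petrol", "Diesel", "CNG", "Petrol"))

# model.matrix() shows the encoding lm() uses: the first level
# (CNG, alphabetically) becomes the baseline, and only two dummy
# columns are created for the remaining levels
mm <- model.matrix(~ fuel)
colnames(mm)  # "(Intercept)" "fuelDiesel" "fuelPetrol"
```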
Table 6.1 Variables in the Toyota Corolla Example

| Variable | Description |
|----------|-------------|
| Price | Offer price in Euros |
| Age | Age in months as of August 2004 |
| Kilometers | Accumulated kilometers on odometer |
| Fuel Type | Fuel type (Petrol, Diesel, CNG) |
| HP | Horsepower |
| Metallic | Metallic color? (Yes = 1, No = 0) |
| Automatic | Automatic (Yes = 1, No = 0) |
| CC | Cylinder volume in cubic centimeters |
| Doors | Number of doors |
| QuartTax | Quarterly road tax in Euros |
| Weight | Weight in kilograms |"
"Table 6.2 Prices and Attributes for Used Toyota Corolla Cars (selected rows and columns only)

| Price | Age | Kilometers | Fuel Type | HP | Metallic | Automatic | CC | Doors | Quart Tax | Weight |
|-------|-----|------------|-----------|----|----------|-----------|------|-------|-----------|--------|
| 13500 | 23 | 46986 | Diesel | 90 | 1 | 0 | 2000 | 3 | 210 | 1165 |
| 13750 | 23 | 72937 | Diesel | 90 | 1 | 0 | 2000 | 3 | 210 | 1165 |
| 13950 | 24 | 41711 | Diesel | 90 | 1 | 0 | 2000 | 3 | 210 | 1165 |
| 14950 | 26 | 48000 | Diesel | 90 | 0 | 0 | 2000 | 3 | 210 | 1165 |
| 13750 | 30 | 38500 | Diesel | 90 | 0 | 0 | 2000 | 3 | 210 | 1170 |
| 12950 | 32 | 61000 | Diesel | 90 | 0 | 0 | 2000 | 3 | 210 | 1170 |
| 16900 | 27 | 94612 | Diesel | 90 | 1 | 0 | 2000 | 3 | 210 | 1245 |
| 18600 | 30 | 75889 | Diesel | 90 | 1 | 0 | 2000 | 3 | 210 | 1245 |
| 21500 | 27 | 19700 | Petrol | 192 | 0 | 0 | 1800 | 3 | 100 | 1185 |
| 12950 | 23 | 71138 | Diesel | 69 | 0 | 0 | 1900 | 3 | 185 | 1105 |
| 20950 | 25 | 31461 | Petrol | 192 | 0 | 0 | 1800 | 3 | 100 | 1185 |
| 19950 | 22 | 43610 | Petrol | 192 | 0 | 0 | 1800 | 3 | 100 | 1185 |
| 19600 | 25 | 32189 | Petrol | 192 | 0 | 0 | 1800 | 3 | 100 | 1185 |
| 21500 | 31 | 23000 | Petrol | 192 | 1 | 0 | 1800 | 3 | 100 | 1185 |
| 22500 | 32 | 34131 | Petrol | 192 | 1 | 0 | 1800 | 3 | 100 | 1185 |
| 22000 | 28 | 18739 | Petrol | 192 | 0 | 0 | 1800 | 3 | 100 | 1185 |
| 22750 | 30 | 34000 | Petrol | 192 | 1 | 0 | 1800 | 3 | 100 | 1185 |
| 17950 | 24 | 21716 | Petrol | 110 | 1 | 0 | 1600 | 3 | 85 | 1105 |
| 16750 | 24 | 25563 | Petrol | 110 | 0 | 0 | 1600 | 3 | 19 | 1065 |
| 16950 | 30 | 64359 | Petrol | 110 | 1 | 0 | 1600 | 3 | 85 | 1105 |
| 15950 | 30 | 67660 | Petrol | 110 | 1 | 0 | 1600 | 3 | 85 | 1105 |
| 16950 | 29 | 43905 | Petrol | 110 | 0 | 1 | 1600 | 3 | 100 | 1170 |
| 15950 | 28 | 56349 | Petrol | 110 | 1 | 0 | 1600 | 3 | 85 | 1120 |
| 16950 | 28 | 32220 | Petrol | 110 | 1 | 0 | 1600 | 3 | 85 | 1120 |
| 16250 | 29 | 25813 | Petrol | 110 | 1 | 0 | 1600 | 3 | 85 | 1120 |
| 15950 | 25 | 28450 | Petrol | 110 | 1 | 0 | 1600 | 3 | 85 | 1120 |
| 17495 | 27 | 34545 | Petrol | 110 | 1 | 0 | 1600 | 3 | 85 | 1120 |
| 15750 | 29 | 41415 | Petrol | 110 | 1 | 0 | 1600 | 3 | 85 | 1120 |
| 11950 | 39 | 98823 | CNG | 110 | 1 | 0 | 1600 | 5 | 197 | 1119 |
Table 6.3 Linear regression model of price vs. car attributes
Code for fitting a regression model

car.df <- read.csv("ToyotaCorolla.csv")
# use first 1000 rows of data
car.df <- car.df[1:1000, ]
# select variables for regression
selected.var <- c(3, 4, 7, 8, 9, 10, 12, 13, 14, 17, 18)
# partition data
set.seed(1)  # set seed for reproducing the partition
train.index <- sample(c(1:1000), 600)
train.df <- car.df[train.index, selected.var]
valid.df <- car.df[-train.index, selected.var]
# use lm() to run a linear regression of Price on all the predictors in the
# training set (it will automatically turn Fuel_Type into dummies).
# use . after ~ to include all the remaining columns in train.df as predictors.
car.lm <- lm(Price ~ ., data = train.df)
# use options() to ensure numbers are not displayed in scientific notation.
options(scipen = 999)
summary(car.lm)
Partial Output
> summary(car.lm)
Call:
lm(formula = Price ~ ., data = train.df)
Residuals:
Min 1Q Median 3Q Max
-8212.5 -839.2 -14.3 831.5 7270.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1774.877829 1643.744823 -1.080 0.2807
Age_08_04 -135.430875 4.875906 -27.776 < 0.0000000000000002 ***
KM -0.019003 0.002341 -8.116 0.00000000000000283 ***
Fuel_TypeDiesel 1208.339159 534.431400 2.261 0.0241 *
Fuel_TypePetrol 2425.876714 520.587979 4.660 0.00000391697679667 ***
HP 38.985537 5.587183 6.978 0.00000000000811621 ***
Met_Color 84.792715 126.883452 0.668 0.5042
Automatic 306.684154 289.433138 1.060 0.2898
CC 0.031966 0.099075 0.323 0.7471
Doors -44.157742 64.056530 -0.689 0.4909
Quarterly_Tax 16.677343 2.602668 6.408 0.00000000030287017 ***
Weight 12.667487 1.536587 8.244 0.00000000000000109 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 1406 on 588 degrees of freedom
Multiple R-squared: 0.8567, Adjusted R-squared: 0.854
F-statistic: 319.6 on 11 and 588 DF, p-value: < 0.00000000000000022
"The regression coefficients are then used to predict prices of individual used Toyota Corolla cars based on their age, mileage, and so on. Table 6.4 shows a sample of predicted prices for 20 cars in the validation set, using the estimated model. It gives the predictions and their errors (relative to the actual prices) for these 20 cars. Below the predictions, we have overall measures of predictive accuracy. Note that the mean error (ME) is $−40 and RMSE = $1321. A histogram of the residuals (Figure 6.1) shows that most of the errors are between ± $2000. This error magnitude might be small relative to the car price, but should be taken into account when considering the profit. Another observation of interest is the large positive residuals (under-predictions), which may or may not be a concern, depending on the application. Measures such as the mean error and error percentiles are used to assess the predictive performance of a model and to compare models."
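The ME and RMSE measures discussed above can be computed directly from validation-set residuals. A self-contained sketch with synthetic data (made up for illustration, not the Toyota Corolla dataset) shows the mechanics:

```r
# Synthetic data: a linear relationship plus noise (illustration only)
set.seed(1)
df <- data.frame(x = 1:100)
df$y <- 3 + 2 * df$x + rnorm(100, sd = 5)

# 60% training / 40% validation partition, as in the example
train.index <- sample(1:100, 60)
train.df <- df[train.index, ]
valid.df <- df[-train.index, ]

# Fit on the training set, predict on the validation set
fit <- lm(y ~ x, data = train.df)
pred <- predict(fit, valid.df)

# Mean error (bias) and root mean squared error on the validation set
resid.valid <- valid.df$y - pred
ME <- mean(resid.valid)
RMSE <- sqrt(mean(resid.valid^2))
c(ME = ME, RMSE = RMSE)
```

A histogram of `resid.valid` (via `hist()`) would play the same role as Figure 6.1 for this toy model.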