
Option #1: House Price Prediction


In real estate, housing market prediction (forecasting) is crucial. Many factors may influence house prices. The datasets housing.training.csv and housing.testing.csv contain 25 quantitative explanatory variables describing many aspects of residential homes in Ames, IA.


The goal of this project is to predict house prices. To this end, we will be using regression analysis.



  1. In the Week 4 Portfolio Milestone, you examined the housing.training.csv dataset. Now, examine the housing.testing.csv dataset and perform the same tasks as in the Week 4 Portfolio Milestone. Using R, calculate the summary statistics (minimum, maximum, mean, median, and standard deviation) and create a histogram of sale price for each dataset. Comparing with the housing.training.csv dataset, describe the similarities and/or differences.

  2. Combine the two datasets housing.training.csv and housing.testing.csv. In R, this can be done by stacking the rows, for example with the base function rbind(). Create a histogram of sale prices for the combined dataset and compare it with the histograms from the training and testing datasets. Describe the similarities and differences.

  3. Using only the dataset housing.training.csv, fit a linear regression model using all the explanatory variables and SalePrice as the response variable.

  4. What are the significant factors? How do these variables relate to the sale price? Interpret your estimated model.

  5. Remove all the rows with missing values (NA) from the dataset housing.testing.csv. The function complete.cases() can be used. Using only the first 20 rows from housing.testing.csv, predict the sale price. The R function predict() can perform this task. You should have 20 predicted sale prices.

  6. Compare the predicted sale prices to the actual sale prices from the housing.testing.csv dataset (the first 20 rows). How good is your prediction?
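Tasks 2 to 6 can be chained in a short R workflow. The sketch below makes the following assumptions:

- It uses synthetic data with three hypothetical predictors, x1, x2 and x3, not the Ames variables.
- It introduces a hypothetical helper, make_data(), to generate that data.
- It stacks rows with base R rbind(), which requires both data frames to have the same columns.
- RMSE and MAPE are only one possible way to judge prediction quality.

```r
# Hypothetical synthetic stand-in for the housing data (NOT the Ames files):
# x1, x2, x3 are made-up predictors; SalePrice depends on them plus noise.
set.seed(7)
make_data <- function(n) {
  d <- data.frame(x1 = rnorm(n, 1500, 400),
                  x2 = sample(1:10, n, replace = TRUE),
                  x3 = rnorm(n, 1970, 25))
  d$SalePrice <- 20000 + 80 * d$x1 + 9000 * d$x2 + 300 * (d$x3 - 1970) +
    rnorm(n, sd = 20000)
  d
}
training <- make_data(300)
testing  <- make_data(200)
testing$x1[c(3, 11)] <- NA        # inject missing values to mimic real data

# Task 2: stack the rows of the two data sets
combined <- rbind(training, testing)

# Task 3: regress SalePrice on every other column, training data only
fit <- lm(SalePrice ~ ., data = training)

# Task 4: estimates, standard errors, t values and p-values
coef_table <- summary(fit)$coefficients
print(round(coef_table, 4))

# Task 5: drop incomplete rows, then predict the first 20 remaining rows
testing_complete <- testing[complete.cases(testing), ]
first20   <- head(testing_complete, 20)
predicted <- predict(fit, newdata = first20)

# Task 6: side-by-side comparison plus two summary error measures
comparison <- data.frame(actual    = round(first20$SalePrice),
                         predicted = round(predicted))
comparison$error <- comparison$actual - comparison$predicted
print(comparison)
rmse <- sqrt(mean((first20$SalePrice - predicted)^2))
mape <- 100 * mean(abs(first20$SalePrice - predicted) / first20$SalePrice)
cat("RMSE:", round(rmse), " MAPE (%):", round(mape, 1), "\n")
```

With the real files, replace training and testing with the data frames returned by read.csv(). Keep in mind that lm() silently drops rows containing NA by default, so the number of observations actually used in the fit is worth reporting.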


For each R output result, you may either type directly into a Word document or take a screenshot. If you take the screenshot, make sure that the current date is shown.


Ensure everything is clearly labeled. The report must be 10-12 pages long, including a title page and reference page (the report itself should be 8-10 pages). Cite 2-3 academic sources other than the textbook, course materials, or other information provided as part of the course materials. Follow APA format, according to the CSU Global Writing Center.

Answered by Suraj on May 08 2021 (question posted May 06, 2021).
Assignment
Topic: House Price Prediction Using Linear Regression
(Using R)
Submitted To:
Submitted By:
Date:
Introduction: Prediction is one of the main problem areas in research today, and many projects focus on predicting variables that matter in daily life. House price prediction is one such field, and an interesting one, because predicting prices is a central task in the housing market. This project is therefore about predicting house prices. We have two data sets, training and testing. Both contain the same variables but a different number of rows. The data sets contain 25 variables that may be important for predicting house price. All of them are quantitative, that is, measurable on a numeric scale.
To build the house price prediction model, we will use multiple regression analysis, one of the simplest and most widely used techniques for this kind of problem. Its coefficients are estimated by ordinary least squares (OLS). OLS chooses the intercept and the coefficients of the independent variables that minimize the sum of squared residuals, that is, the squared differences between the observed and fitted values. This makes multiple regression a natural and interpretable starting point for this analysis.

A general multiple regression model has the following form:
$$Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$$
where Y is the predicted house price, a is the intercept of the model, b1, b2, …, bn are the regression coefficients, and X1, X2, …, Xn are the values of the independent variables.
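To illustrate what OLS estimation of a, b1, …, bn means, the R sketch below fits the model in two ways. It uses synthetic data with two made-up predictors, x1 and x2, rather than the housing data. First it solves the normal equations directly, then it calls lm(). Both routes should give the same estimates. Shifting any coefficient away from the OLS estimate should increase the sum of squared residuals, which is the quantity OLS minimizes.

```r
# Synthetic illustration only: true a = 5, b1 = 2, b2 = -3
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 5 + 2 * x1 - 3 * x2 + rnorm(n)

# Design matrix: a column of 1s (for the intercept a) plus the predictors
X <- cbind(intercept = 1, x1, x2)

# OLS from the normal equations (X'X) b = X'y
b_ols <- drop(solve(crossprod(X), crossprod(X, y)))

# The same model fitted with lm()
fit <- lm(y ~ x1 + x2)
print(cbind(normal_equations = b_ols, lm = coef(fit)))

# Sum of squared residuals at the OLS estimate
ssr <- sum(residuals(fit)^2)

# Nudging b1 away from its estimate makes the fit worse
b_shift   <- b_ols + c(0, 0.1, 0)
ssr_shift <- sum((y - X %*% b_shift)^2)
cat("SSR at OLS:", ssr, " SSR after shifting b1:", ssr_shift, "\n")
```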
We will carry out the analysis in R, a statistical software environment that is widely used for data analysis.
Analysis:
1)
The analysis is given as follows:
Both data files are loaded into RStudio as follows:
df1 <- read.csv("C:/Users/Hp/Desktop/testing.csv")   # testing data set
df2 <- read.csv("C:/Users/Hp/Desktop/training.csv")  # training data set
The summary statistics for all 25 variables were calculated in R, and the output is given in image form as follows.
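A minimal R sketch of the Task 1 statistics for SalePrice follows. The housing files are not bundled with this text, so the sketch generates synthetic log-normal sale prices as hypothetical stand-ins for the training and testing data frames. With the real data, use the data frames read by read.csv() instead. The helper sale_price_stats() is a name introduced only for this illustration.

```r
# Hypothetical stand-ins for the two data sets (synthetic, not the Ames data)
set.seed(42)
training <- data.frame(SalePrice = round(rlnorm(300, meanlog = 12, sdlog = 0.4)))
testing  <- data.frame(SalePrice = round(rlnorm(200, meanlog = 12, sdlog = 0.4)))

# The five statistics required by Task 1, ignoring missing values
sale_price_stats <- function(x) {
  c(min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE))
}

# One row per data set, one column per statistic
stats_table <- rbind(training = sale_price_stats(training$SalePrice),
                     testing  = sale_price_stats(testing$SalePrice))
print(round(stats_table, 1))
```

Putting both data sets in one table makes the similarities and differences requested in Task 1 easy to read off row by row.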
Next, we make histograms of sale price for both data sets, training and testing. A histogram shows the distribution of a continuous variable, so it is a convenient way to compare the two data sets. Here, the variable of interest is sale price. The histograms for both data sets are given as follows:
The histogram of sale price for the testing data set is given as follows:
The histogram of sale price for the training data set is given as...
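The R sketch below draws the two histograms side by side. Under stated assumptions, it again uses synthetic sale prices as hypothetical stand-ins for the training and testing data. The design choice worth noting is the shared set of break points: with common bins, differences between the two shapes reflect the data rather than different binning.

```r
# Hypothetical stand-ins for the two data sets (synthetic, not the Ames data)
set.seed(42)
training <- data.frame(SalePrice = round(rlnorm(300, meanlog = 12, sdlog = 0.4)))
testing  <- data.frame(SalePrice = round(rlnorm(200, meanlog = 12, sdlog = 0.4)))

# Common break points spanning both data sets, so the bins are identical
breaks <- pretty(range(c(training$SalePrice, testing$SalePrice)), n = 30)

op <- par(mfrow = c(1, 2))   # two panels side by side
h_train <- hist(training$SalePrice, breaks = breaks,
                main = "Training data", xlab = "SalePrice")
h_test  <- hist(testing$SalePrice, breaks = breaks,
                main = "Testing data", xlab = "SalePrice")
par(op)                      # restore the previous plotting layout
```

The same call pattern works for the combined data set in Task 2, for example by passing rbind(training, testing)$SalePrice to hist() with the same breaks.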