
Business Analytics and Big Data (ACC73002)




Assignment 3 – Report (50%)



You are hired as a consultant to analyse the real estate market in non-capital cities and towns in the states A and B. Safe-As-House Real Estate, a large national real estate company, has collected samples of recent residential sales from a sample of non-capital cities and towns in these states. Use the data stored in to develop the best model to predict house and unit prices. Prepare a 1,500-word report.






Structure


· Introduction and background (5 marks)


· Data and empirical strategy (15 marks)


· Results and discussion (20 marks)


· Recommendations (10 marks)




Statistical component


· Provide descriptive analysis of the attributes of properties in State A and State B


· Develop a multiple regression model to predict house price.


· Develop a multiple regression model to predict unit price.






Please refer to slides and lecture on this assessment for further instructions.


Answered Same Day | Sep 23, 2021 | ACC73002 | Southern Cross University

Pritam answered on Sep 27 2021
INTRODUCTION
The purpose of this analysis is to build a model that predicts the price of a property from its attributes. For a consultant to the real estate market, it is essential that the model be as accurate as possible. The six samples, covering three regions in each of the two states, are sufficient to identify the model that predicts property prices best.
Data and Empirical Strategy
The data consist of two data sets, one for State A and one for State B. Each covers three regions: regional city, coastal city and coastal town. The variables are Price, Internal Area, number of bedrooms, number of bathrooms, number of garages and type of property. All of them are numerical except Type, which is categorical. The Excel sheet for State A contains two sets of data for each region, so only one set per region has been used for further analysis. Pre-analysis, that is, data manipulation and visualization, is the backbone of any analysis. Since the data require no manipulation here, the analysis starts with visualization. The statistical method used is multiple linear regression. Before fitting a multiple linear regression, the assumptions of the regression must be checked; backward elimination is then applied to arrive at the final model.
Results and Discussions:
The first step in the analysis is to visualize the data, because understanding the data is essential before any modelling. Some graphs for the different states and regions are given below.
Selected visualizations:
State A: Regional city:
State B: Regional city:

From the visualizations, it is clear that Internal Area is positively related to the other variables in both State A and State B, which raises a concern about multicollinearity. In these plots, prices also appear higher in State A than in State B.
State A: Coastal city:

State B: Coastal city:

Comparison based on regional city of State A and State B:
State A: Regional City:
    Row Labels     Average of Price $000   Average of Internal Area m^2   Average of Bedrooms
    House          368.34                  162.31                         3.54
    Unit           223.44                  95.55                          2.22
    Grand Total    320.8144                140.4104                       3.104

    Row Labels     Average of Bathrooms   Average of Garages
    House          1.52                   1.85
    Unit           1.07                   1.22
    Grand Total    1.376                  1.64
State B: Regional City:
    Row Labels     Average of Price $000   Average of Internal Area m^2   Average of Land/Total Area m^2
    House          418.51                  139.94                         763.46
    Unit           335.47                  104.16                         210.91
    Grand Total    400.86                  132.34                         646.04

    Row Labels     Average of Bedrooms   Average of Bathrooms   Average of Garages
    House          3.46                  1.54                   1.95
    Unit           2.71                  1.76                   1.35
    Grand Total    3.30                  1.59                   1.83
For regional cities, the average property price is clearly higher in State B than in State A (about $401,000 versus $321,000 overall). The other attributes, namely average internal area and the average numbers of bedrooms, bathrooms and garages, are almost the same in the two states, with no notable difference.
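The averages above come from Excel pivot tables. The same kind of summary can be sketched in code as follows. The rows in `sales` are made up purely for illustration and are not the assignment data, and the field names `type`, `price` and `area` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; these are NOT the assignment data.
sales = [
    {"type": "House", "price": 400.0, "area": 160.0},
    {"type": "House", "price": 350.0, "area": 150.0},
    {"type": "Unit", "price": 220.0, "area": 90.0},
    {"type": "Unit", "price": 240.0, "area": 100.0},
]

def average_by_type(rows, fields):
    """Average each field per property type, plus a grand total over all rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["type"]].append(row)
    table = {
        label: {f: mean(r[f] for r in members) for f in fields}
        for label, members in groups.items()
    }
    # The grand total averages over all rows, not over the group means.
    table["Grand Total"] = {f: mean(r[f] for r in rows) for f in fields}
    return table

summary = average_by_type(sales, ["price", "area"])
for label, averages in summary.items():
    print(label, averages)
```

Because the grand total is taken over all rows, it is weighted by the number of houses and units, which is also how a pivot table's Grand Total row behaves.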
Multiple Linear Regression model:
All variables are included in the first model. The VIF of each predictor is then checked, with VIF > 3 taken as the threshold for high multicollinearity, and the regression is refitted without the variables that exceed the threshold. Once all remaining predictors have a low VIF, those with insignificant p-values are removed as well. The final model therefore retains only the predictors with significant p-values.
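One possible reading of the VIF screening step is sketched below, assuming NumPy is available. It uses the identity that the VIFs are the diagonal of the inverse of the predictors' correlation matrix, and it drops the worst predictor one at a time until none exceeds 3. The data are synthetic and the function names `vif` and `drop_high_vif` are hypothetical; the later p-value step is not covered here.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def drop_high_vif(X, names, threshold=3.0):
    """Drop the predictor with the largest VIF until all VIFs are <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        factors = vif(X)
        worst = int(np.argmax(factors))
        if factors[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names

# Synthetic data for illustration only (NOT the assignment data).
rng = np.random.default_rng(0)
area = rng.normal(140, 30, size=200)
bedrooms = area / 40 + rng.normal(0, 0.05, size=200)  # nearly collinear with area
garages = rng.normal(1.6, 0.8, size=200)              # unrelated to the others
X = np.column_stack([area, bedrooms, garages])
names = ["internal_area", "bedrooms", "garages"]

print(dict(zip(names, np.round(vif(X), 2))))
print(drop_high_vif(X, names))
```

Dropping one variable at a time matters because removing a single collinear predictor usually lowers the VIF of its partner, so both rarely need to go.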
State A final model:
    Regression Statistics
    Multiple R           0.82
    R Square             0.68
    Adjusted R Square    0.67
    Standard Error       63.58
    Observations         125.00

    ANOVA
                  df      SS             MS           F       Significance F
    Regression    4.00    1031712.22     257928.06    63.81   0.00
    Residual      120     485039.21      4041.99
    Total         124     1516751.434

                 Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Sx     VIF
    Intercept    119.67         26.44            4.53     0.00      67.31       172.02
    Bedrooms     23.38          7.54             3.10     0.00      8.45        38.32       1.07   1.99
    Bathrooms    71.77          13.18            5.45     0.00      45.68       97.86       0.55   1.60
    Garages      30.69          7.44             4.13     0.00      15.96       45.42       0.85   1.22
    Type         -62.58         15.19            -4.12    0.00      -92.65      -32.51      0.47   1.57
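To illustrate how the fitted State A regional-city model would be used, the sketch below plugs one property into the regression equation. The coefficients are copied from the output above. The property itself (3 bedrooms, 2 bathrooms, 2 garages) is made up, and the coding of Type as 1 = unit and 0 = house is an assumption, since the report does not state it.

```python
# Coefficients copied from the State A regional-city output above (price in $000).
COEF = {
    "Intercept": 119.67,
    "Bedrooms": 23.38,
    "Bathrooms": 71.77,
    "Garages": 30.69,
    "Type": -62.58,
}

def predict_price(bedrooms, bathrooms, garages, is_unit):
    """Predicted price in $000. ASSUMPTION: Type is coded 1 = unit, 0 = house."""
    return (
        COEF["Intercept"]
        + COEF["Bedrooms"] * bedrooms
        + COEF["Bathrooms"] * bathrooms
        + COEF["Garages"] * garages
        + COEF["Type"] * is_unit
    )

# Hypothetical property: 3 bedrooms, 2 bathrooms, 2 garages.
house = predict_price(3, 2, 2, is_unit=0)
unit = predict_price(3, 2, 2, is_unit=1)
print(f"House: {house:.2f}  Unit: {unit:.2f}")
```

Under that coding assumption, the two predictions differ by exactly the Type coefficient: an otherwise identical unit is predicted to sell for about $62,580 less than a house.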
State B final model:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.78
    R Square             0.61
    Adjusted R Square    0.60
    Standard Error       80.37
    Observations         80.00

    ANOVA
                  df    SS          MS         F       Significance F
    Regression    2     791788.3    395894     61.28   0.00
    Residual      77    497412.8    6459.91
    Total         79    1289201

                         Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
    Intercept            70.35          32.72            2.15     0.03      5.20        135.50
    Internal Area m^2    2.17           0.29             7.57     0.00      1.60        2.74
    Garages              23.61          9.98             2.37     0.02      3.73        43.49
Of the two models, the one for State A fits the data better. The adjusted R-squared values are 0.67 for State A and 0.60 for State B. In other words, about 67% of the variance in price is explained by bedrooms, bathrooms, garages and type in the State A model, whereas in the State B model only about 60% is explained by the predictors internal area and garages.
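The R-squared figures quoted above can be cross-checked against the ANOVA tables. The sketch below recomputes R-squared as SS(regression) / SS(total) and adjusted R-squared as 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. All inputs are copied from the two regional-city outputs; the function and variable names are hypothetical.

```python
def r_squared(ss_reg, ss_total):
    """Share of total variation captured by the regression."""
    return ss_reg / ss_total

def adjusted_r_squared(r2, n, k):
    """Penalise R^2 for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Sums of squares, observations and predictor counts from the ANOVA tables above.
models = {
    "State A": {"ss_reg": 1031712.22, "ss_total": 1516751.434, "n": 125, "k": 4},
    "State B": {"ss_reg": 791788.3, "ss_total": 1289201.0, "n": 80, "k": 2},
}

results = {}
for name, m in models.items():
    r2 = r_squared(m["ss_reg"], m["ss_total"])
    adj = adjusted_r_squared(r2, m["n"], m["k"])
    results[name] = (r2, adj)
    print(f"{name}: R^2 = {r2:.2f}, adjusted R^2 = {adj:.2f}")
```

Rounded to two decimals, the recomputed values agree with the Excel output (0.68 and 0.67 for State A, 0.61 and 0.60 for State B).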
Comparison based on Coastal city of State A and State B:
State A: Coastal City:
    Row Labels     Average of Price $000   Average of Internal Area m^2   Average of Bedrooms
    House          610.88                  170.50                         3.97
    Unit           383.73                  92.95                          2.17
    Grand Total    494.58                  130.80                         3.05

    Row Labels     Average of Bathrooms   Average of Garages
    House          2.20                   1.98
    Unit           1.42                   1.08
    Grand Total    1.80                   1.52
State B: Coastal City:
    Row Labels     Average of Price $000   Average of Internal Area m^2   Average of Bedrooms
    House          538.10                  180.55                         3.84
    Unit           411.53                  88.02                          1.97
    Grand Total    500.63                  153.16                         3.29

    Row Labels     Average of Bathrooms   Average of Garages
    House          2.05                   2.51
    Unit           1.57                   1.59
    Grand Total    1.90                   2.24
From the tables, houses are on average more expensive in State A (610.88 versus 538.10, in $000), whereas units are on average more expensive in State B (411.53 versus 383.73), with the other attributes almost the same. Overall, the average price is slightly higher in State B than in State A (500.63 versus 494.58), and State B properties also have more internal area, bedrooms, bathrooms and garages on average.
State A final model:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.68
    R Square             0.46
    Adjusted R Square    0.45
    Standard Error       164.55
    Observations         125.00

    ANOVA
                  df     SS            MS           F       Significance F
    Regression    3      2824663.29    941554.43    34.77   0.00
    Residual      121    3276394.86    27077.64
    Total         124    6101058.15

                 Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Sx     VIF
    Intercept    241.19         59.18            4.08     0.00      124.02      358.35
    Bathrooms    127.90         25.45            5.03     0.00      77.52       178.29      0.74   1.63
    Garages      44.73          21.11            2.12     0.04      2.94        86.52       0.89   1.60
    Type         -87.55         36.44            -2.40    0.02      -159.70     -15.40      0.50   1.53
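The t Stat column in the State A coastal-city output is each coefficient divided by its standard error, which is what the p-values are based on. The sketch below recomputes it from the coefficients and standard errors copied from that output. The cutoff of 1.98 is an approximate two-sided 5% critical value for roughly 120 degrees of freedom, used here as an assumption in place of exact p-values.

```python
# (coefficient, standard error) pairs copied from the State A coastal-city output above.
estimates = {
    "Intercept": (241.19, 59.18),
    "Bathrooms": (127.90, 25.45),
    "Garages": (44.73, 21.11),
    "Type": (-87.55, 36.44),
}

# ASSUMPTION: approximate two-sided 5% critical value for about 120 degrees of freedom.
T_CRIT = 1.98

# t statistic = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_stats.items():
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name:<10} t = {t:6.2f}  ({verdict} at the 5% level)")
```

Garages has the smallest t statistic in absolute value (about 2.12), so it is the predictor closest to the cutoff in this model.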
State B final model:
    SUMMARY...