This is our final project this semester. It summarizes the most important statistical tests you learned in my class. It will also show you, once again, how we can apply our knowledge to business situations. This project will be of value to any real estate agency in your chosen city! I hope you will enjoy working on it and put in your best effort to make it really professional!
Make a copy of your data. Up to Step 7, use ALL your variables, even if you collected just a few data points for them.
Please make sure that you have an Age variable but NOT Year, and Lot Size in either acres or sq.ft. but NOT both.
Step 1. Start a new Word document with YourLastName_CityName as the name of the file and construct a title page with your city name, your name/section, a map with your city marked, etc.
Step 2. Define all your variables:
Table 1 Definitions and Types
In this table, present and define each variable in your data set.
In particular, explain your qualitative variables and the way you coded them. Please include the data type (qualitative or quantitative) and the level of measurement (nominal, ordinal, or interval/ratio). PLEASE construct your tables in Word.
Example:
Variable | Definition | Quantitative / Qualitative | Nominal / Ordinal / Interval/Ratio
Price | Property price in dollars (or thousands of $$ - whatever is true for your data) | … |
Sq. Ft. | Size of the house in sq.ft. | … |
Lot | Size of the lot in sq.ft. or acres | |
AC | Whether the property has air conditioning. Yes=1, No=0 | … |
****************************************************************************
Step 3. Calculate Descriptive Statistics using Excel, format the table (see below), and copy it to your Word file as a PICTURE:
Table 2. Descriptive Statistics Table
Format the table so it fits nicely in your paper. This step will involve some formatting of the table. (Hint: Excel repeats the names of the statistics for each variable. You can delete these and leave only the first column, but you will have to move the variables’ names one column to the right.) The table should take about ½ of the page.
Example (you should have ALL your variables here AND include your property number in the first column!!! Please do not use the landscape format):
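If you want to double-check a few numbers from Excel's Descriptive Statistics output, the calculation can be sketched in Python as below. This is only a minimal sketch, not a replacement for the Excel table: the price list is hypothetical, the helper names `excel_skew` and `describe` are made up for illustration, and only some of the statistics Excel reports are included. The skewness uses the same sample-adjusted formula as Excel's SKEW function.

```python
import statistics


def excel_skew(values):
    """Sample skewness using the same adjusted formula as Excel's SKEW."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation, like Excel's STDEV.S
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def describe(values):
    """Return a few of the statistics listed in Excel's Descriptive Statistics output."""
    return {
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Standard Deviation": statistics.stdev(values),
        "Skewness": excel_skew(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Count": len(values),
    }


# Hypothetical prices in thousands of dollars (not real data).
prices = [150, 180, 200, 220, 400]
summary = describe(prices)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

One expensive property pulls the mean above the median here, and the skewness comes out positive, which is the pattern to look for when you choose a central tendency measure in Step 4.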
****************************************************************************
Step 4. For each variable, interpret the descriptive statistics using the following table format. In the last column, please include a measure of dispersion that will allow us to compare the dispersion of all quantitative variables. Please mark the highest and the lowest dispersion.
PROVIDE THE VALUE OF THE SKEWNESS AND THE VALUE OF THE LIMIT!!! Please remember that we explain central tendency codes for nominal variables in the “Value” column.
Table 3 Central Tendency Table
Example:
Variable | Central Tendency Measure = Value | Why did you choose this measure? | Dispersion
Price | Mean = 209.8 | Quantitative and Skewness = … < or > … (provide the value of the skewness and the value of the limits: use a number from the little table on Green Notes) | xx% = highest
… | … | … | …
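A dispersion measure that can be compared across variables with different units has to be unit-free. The sketch below assumes the intended measure is the coefficient of variation (standard deviation as a percentage of the mean), which matches the "xx%" entry in the example but is my assumption here, so check it against your class notes. The data and the function name are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical columns (not real data).
data = {
    "Price": [150, 180, 200, 220, 400],
    "Sq. Ft.": [1400, 1500, 1600, 1700, 1800],
}
cv = {name: coefficient_of_variation(col) for name, col in data.items()}
highest = max(cv, key=cv.get)
lowest = min(cv, key=cv.get)
for name, value in cv.items():
    print(f"{name}: CV = {value:.1f}%")
print("highest:", highest, "| lowest:", lowest)
```

Note that the square footage has the larger standard deviation in raw units, yet the price is the more dispersed variable once each is scaled by its own mean.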
****************************************************************************
Step 5. Standardize all quantitative variables:
Make a copy of your data and calculate z-scores for all your quantitative variables (use the function STANDARDIZE). Use Conditional Formatting to mark unusual observations and outliers (use DIFFERENT COLORS). This is a good time to check whether your outliers are just data-entry errors!!! If so, correct your data and go back to Step 3.
Table 4. Unusual Observations and Outliers
The next page in your report should be a copy of the standardized variables worksheet (your raw data and standardized variables). This should be ONE page ONLY. Therefore, you will need to copy your table from Excel so it fits on one page (highlight your data in Excel, choose Copy, go back to Word, and choose Paste Special > Picture). Please make sure all your data are included (including your random number and ALL your qualitative variables). Also, please remember to label your columns. Data should be sorted by price from High to Low. (Please report z-scores with two decimals and remove z-scores for missing observations.) This table will be small :) Please make sure you have at least 17 columns in this table.
Step 6: Mark unusual observations and outliers with the highlighter on your printout. Use one color for outliers and a different color for unusual observations, and provide a LEGEND (!!!): unusual observations = …; outliers = ….
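The standardizing in Step 5 and the two-color marking in Step 6 can be sketched as follows. STANDARDIZE(x, mean, standard_dev) returns (x - mean) / standard_dev; the sketch assumes you pass in the sample standard deviation. The cutoffs of 2 for unusual observations and 3 for outliers are a common textbook convention that I am assuming here, not values taken from this handout, so use the ones from class. The data and function names are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize a column the way STANDARDIZE(x, mean, sd) does; None stays None."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample SD; an assumption about which SD you pass in
    return [None if v is None else round((v - mean) / sd, 2) for v in values]


def label(z, unusual_cutoff=2.0, outlier_cutoff=3.0):
    """Classify one z-score. The cutoffs are assumed defaults, not taken from the handout."""
    if z is None:
        return "missing"
    if abs(z) > outlier_cutoff:
        return "outlier"
    if abs(z) > unusual_cutoff:
        return "unusual"
    return "typical"


# Hypothetical prices in thousands of dollars; None marks a missing observation.
prices = [200, 210, 190, 205, 195, 200, 210, 190, 205, 195, None, 260]
zs = z_scores(prices)
labels = [label(z) for z in zs]
for price, z, tag in zip(prices, zs, labels):
    print(price, z, tag)
```

The missing observation gets no z-score, as the instructions for Table 4 require, and the z-scores are rounded to two decimals.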
****************************************************************************
Step 7. Formatting/Cleaning your data set for regression. Copy your original data (from Step 3, without the z-scores calculated) before working on it. Make sure to use the Age variable and not Year. In order to run a regression, you need to have all entries for all your variables. If you have not filled in every entry for a variable, you cannot use this variable or these properties. Please talk to me and we will decide what to do: depending on how many entries you are missing, we will either delete the whole variable or delete the properties we don’t have all the information about. Delete outliers that may strongly influence your results (|z| > 4). In bullet points, describe what was done.
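The cleaning rules in Step 7 can be sketched as a small filter. This is only an illustration under stated assumptions: it always deletes incomplete properties (the handout leaves the choice between deleting the variable and deleting the properties to a conversation with the instructor), and it recomputes z-scores on the complete rows rather than reusing the Step 5 values. The data set and the function name `clean_for_regression` are hypothetical.

```python
import statistics


def clean_for_regression(rows, quantitative, z_limit=4.0):
    """Drop rows with any missing entry, then rows with |z| > z_limit in a quantitative column."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    keep = list(complete)
    for col in quantitative:
        values = [r[col] for r in complete]
        mean, sd = statistics.mean(values), statistics.stdev(values)
        keep = [r for r in keep if abs((r[col] - mean) / sd) <= z_limit]
    return keep


# Hypothetical data set: 20 ordinary properties, one with a missing Age, one extreme price.
rows = [{"Property": i, "Price": 200 + i, "Age": 10 + i % 5} for i in range(1, 21)]
rows.append({"Property": 21, "Price": 210, "Age": None})  # incomplete entry
rows.append({"Property": 22, "Price": 2000, "Age": 12})   # extreme price

cleaned = clean_for_regression(rows, quantitative=["Price", "Age"])
removed = sorted({r["Property"] for r in rows} - {r["Property"] for r in cleaned})
print("removed properties:", removed)
```

Whatever you delete, list it in your bullet points so a reader can see exactly how the regression data differ from the original data.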
****************************************************************************
Step 8. In Excel, run a Correlation Table for all your variables (please remember to include labels!) and copy it to your Word document as Table 5 (as before, copy and Paste Special as Picture). Please round all your numbers to the nearest tenth (= one decimal place).
****************************************************************************
Step 9. Below the table, comment on possible multicollinearity issues (SEE CLASS SLIDES!!!). If none of your correlations is greater than 0.7 in absolute value, use the highest one you found to comment.
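The correlation table in Step 8 and the 0.7 check in Step 9 can be sketched as below. The sketch assumes that Price is the dependent variable and that multicollinearity is judged on pairs of independent variables only; follow the class slides if they say otherwise. The data are hypothetical and `pearson` is a helper written for this example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient, the statistic Excel's Correlation tool reports."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cleaned data (not real listings).
data = {
    "Price": [150, 180, 200, 220, 400],
    "Sq. Ft.": [1400, 1500, 1600, 1700, 2600],
    "Bedrooms": [2, 3, 3, 3, 5],
    "Age": [30, 12, 45, 8, 20],
}

# Pairwise correlations between independent variables only (Price is the dependent variable).
predictors = [name for name in data if name != "Price"]
pairs = {(a, b): pearson(data[a], data[b]) for a, b in combinations(predictors, 2)}
for (a, b), r in pairs.items():
    flag = "possible multicollinearity" if abs(r) > 0.7 else ""
    print(f"{a} vs {b}: {r:.1f} {flag}")
strongest = max(pairs, key=lambda p: abs(pairs[p]))
print("strongest pair:", strongest)
```

The comparison uses the absolute value, so a strong negative correlation is flagged the same way as a strong positive one, and `strongest` is the pair to comment on when nothing exceeds 0.7.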
****************************************************************************
Step 10. For the rest of this project, please refer to the Regression Class Notes handout for the format.
Choose the best model for your data: In Word, copy and paste the output from the 1st regression and the output from the best regression you ran (paste as picture); do NOT copy ALL your regressions to your Word document, but leave them in your Excel file. Describe all the regressions you have run in a table. Include all Adj. R Squares and p-values (as I did on the yellow handout). Name it Table 6. You may end up with just 1-2 variables! Remember to show the drop in Adj. R².
Table 6: Description of Regressions (Example)
Regression # | Excluded Variable | p-value | Adj. R squared
Model A | All included | N/A | 0.6565
Model B: BEST!!! | Pool | 0.9 | 0.6725
Model C | … | … | …
Below the Excel outputs, give the following information (follow the yellow handout):
Best regression process described;
Dependent variable (y) =
Independent variables (x) =
R² significance (if significant and WHY?):
Adj. R² interpretation:
The estimated regression equation (in words and numbers from the output; no y, x, or b_i notation):
Table 7. Significance and interpretation of coefficients table. (WORTH 25%!)
Example:
Variable | Coefficient | Significant? and Why? | Interpretation
Sq.Ft. | … | |
If the signs of the coefficients don’t seem to be correct, please check whether the effects are the result of multicollinearity (Table 5) or outliers, and comment in 1 sentence.
****************************************************************************
Step 11. Go back to the beginning of your report and write the Introduction (one page, double spaced): Present your city.
Based on all the statistical tests you ran, and referring to the tables in your report, summarize the important features of the real estate market in your city.
Write this as a highly paid executive presenting your findings to a real estate company. This should be really informative! Please, use the following format (in points):
1. Present your city (location, population, why did you choose this city?, etc.)
2. Source of data
3. What does the “typical” house look like (what are the central tendencies)? (Please refer to your Table 3 and use specific numbers!) Please remember to report proportions for binary variables (e.g. the proportion of houses with AC). Go back to Project 1 if you are confused.
4. Which features of houses are the most and the least dispersed?
5. Describe 1 outlier or unusual observation: Using descriptions from Realtor.com AND the fact that these properties are outliers, explain why these properties are or are not worth buying/investing in. (I will grade this on your understanding of outliers. Therefore, make sure that, while you are recommending the property, you make good use of the fact that one or more of the variables for this property is an outlier (be specific).) Please make sure to relate your explanation to the property price. (Do not use properties whose only outlier is in the Age category.) This should be 4-5 sentences.
6. What is the one house feature that is the most important in the valuation of properties in your city? Be specific (e.g. price increases by $... if …). (Refer to Table 7.) Note that the most important variables have THE LOWEST p-values! What are the other important variables?
7. Which features do not explain the variation in the prices of properties in your city (list the ones that you excluded from the regression AND the insignificant ones)?
8. Anything else? Is there a house you would like to buy?
****************************************************************************
Step 12. Submit your Excel file with regressions to the D2L Dropbox>Regression Excel and print out your whole report (also drop the Word file to Dropbox>Word). Please remember to mark your unusual observations and outliers with different colors, and provide a legend if you did not print in color!
Print out the grading template and go over it to make sure you included ALL required points. Please, turn in your printed report and grading template.
Please e-mail me ([email protected]) if you have any questions.
I HOPE YOU WILL REALLY LIKE THIS PROJECT!
HAVE FUN!!!