statistics report due asap
GROUP PROJECTS COURSE PROJECTS: Guidelines for your Written Report
--------------------------------------------------------
Your written project report is due NO LATER THAN NOON, Thursday, May 5. (You are STRONGLY ENCOURAGED to complete and submit it well before the final due date.)

The data file you’ll use is named “Bloomington home sales” (on D2L). It contains a complete listing of all homes sold in Bloomington from Summer 2015 through Summer 2016 (you do not need to make any edits to this data).

Your first task is to obtain (via StatCrunch) a random sample of 280 homes from this population list:
· In StatCrunch, click DATA - SAMPLE, then Select All your columns, and then make the entries as shown here

Your primary task on this assignment is to “tell the story” your sample data has to offer about selling prices of homes in Bloomington (e.g. how much did homes sell for in the past year, and what sort of homes tended to sell for more/less $$).

Your written report should include:

--- description of your sample findings (i.e. narrative based on graphs, patterns, tables, statistics, etc.).
(NOTE: In a more formal report, your supporting StatCrunch output would appear in an appendix, referenced by your narrative in the report; e.g. ‘as per the boxplots on page A6’...) However, if you find it more convenient, you may choose to intersperse your narrative with the relevant StatCrunch output (as you typically do on HW assignments). In any case, this “description/narrative” deserves the biggest chunk of your attention on the project.

--- illustrations of inferences that can be made to the population based on your sample results (i.e. you don’t need to provide every possible confidence interval, but by showing a few illustrations of such intervals, I can see that you know how to do them.)
You need to provide ONE of EACH of the following (properly interpreted and in logical context):
· confidence interval for a population mean
· confidence interval for a population proportion
· confidence interval for the difference between 2 population means
· confidence interval for the slope in a regression model
· confidence interval/prediction interval based on regression model

The contents of the data file:
C1: Sold Price ($$)
C2: Zip Code (where house is located)
C3: Cumulative # of days the house was on the market
C4: # of sq. ft. above ground
C5: # of sq. ft. below ground
C6: # of bedrooms
C7: # of years since the house was built
C8: size of the lot (in acres)
C9: # of bathrooms
C10: # of garage stalls
C11: # of fireplaces
C12: swimming pool (0 if no, 1 if yes)
C13: type of air conditioning
C14: foundation size (sq. ft.)

Your starting steps in analyzing your sample data:
· Look at the pattern of data for each of the variables (columns). Thus you’ll have several stem & leaf plots with Summary Statistics (for any interval-scale variables), and several Stat-Tables-Frequency (for any ‘categorical’ variables).
NOTE - although ‘Bedrooms’, ‘Garage Stalls’, and ‘# Fireplaces’ are technically ‘interval-scaled’, it will be better to treat them as ‘categorical’ for descriptive and graphing purposes, because there are few outcomes for those variables.

You’re doing this for several reasons:
-- just to get familiar with the sort of values you’re seeing for each variable
-- to get a sense of “what sort of houses” you have in your sample
-- to prepare for making any ‘edits’ you deem appropriate (although in this data, you should assume that any such editing has already been accomplished)

Provide a very brief verbal description (approx. 2-3 sentences for each variable) of the pattern of values for each of your variables in your written report (i.e. to give the reader some sense of “what sort of houses” make up your sample).
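
For anyone who wants to check their StatCrunch output by hand, the sampling step and the first two intervals in the list above can be sketched in Python. This is only an illustration under stated assumptions: the population below is made up (it is not the Bloomington file), the population size of 1500 is arbitrary, and the helper names `mean_ci` and `proportion_ci` are hypothetical. The intervals use the large-sample normal critical value, which with n = 280 should be very close to, but not identical to, a t-based interval.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Hypothetical stand-in for the population file: (sold_price, has_pool) per home.
# These values are made up for illustration; they are NOT the Bloomington data.
rng = random.Random(2016)
population = [
    (round(rng.gauss(250_000, 60_000)), 1 if rng.random() < 0.08 else 0)
    for _ in range(1500)
]

# Simple random sample of 280 homes, drawn without replacement.
sample = rng.sample(population, 280)
prices = [price for price, _ in sample]
pools = [pool for _, pool in sample]

z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence


def mean_ci(values, z_crit=z):
    """Large-sample interval for a population mean: x_bar +/- z * s / sqrt(n)."""
    n = len(values)
    margin = z_crit * stdev(values) / math.sqrt(n)
    return mean(values) - margin, mean(values) + margin


def proportion_ci(indicators, z_crit=z):
    """Large-sample interval for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    n = len(indicators)
    p_hat = sum(indicators) / n
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


price_low, price_high = mean_ci(prices)
pool_low, pool_high = proportion_ci(pools)
print(f"95% CI for mean sold price: ${price_low:,.0f} to ${price_high:,.0f}")
print(f"95% CI for proportion with a pool: {pool_low:.3f} to {pool_high:.3f}")
```

Whatever tool produces the numbers, each interval still needs a sentence of interpretation in context (e.g. what the interval says about the mean selling price of all homes sold in Bloomington, not just the 280 sampled).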
--------------------------------------------------------------------------------------------------------------------------------
Now that you’re familiar with your sample data, you’re ready to start the actual analysis, which will mimic your approach to the first part of HW #9:

· Provide a detailed verbal description of the pattern of values for your ‘Response variable’ of ‘Sold Price’ (i.e. based on your Stem & Leaf plot (or Histogram) and Summary Statistics). This is the variability you hope to ‘explain’ by virtue of relationships with some of your predictor variables.

· Graphically display how each of your ‘predictor variables’ might relate to your ‘response variable’. Thus you’ll have several scatter plots (for those predictors that are interval-scaled and ‘continuous’) and several sets of boxplots (for those predictors that are categorical/discrete). Provide a brief verbal description in your written report as to what each of your ‘relationship plots’ seems to be saying.
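
The two kinds of ‘relationship plots’ described above each rest on a small set of numbers, and it can help to see those computed directly. The Python sketch below is an illustration only: the 280 rows are randomly generated (they are not the Bloomington data), the helper names `five_number_summary` and `slope_with_ci` are hypothetical, the quartile convention may differ from the one StatCrunch uses, and the slope interval uses a normal critical value as a large-sample approximation to the usual t-based interval.

```python
import math
import random
from collections import defaultdict
from statistics import NormalDist, mean, median, quantiles

# Hypothetical sample rows: (sold_price, sq_ft_above_ground, bedrooms).
# Randomly generated for illustration only; NOT the Bloomington data.
rng = random.Random(5)
rows = []
for _ in range(280):
    sq_ft = rng.uniform(800, 3500)
    price = 60_000 + 95 * sq_ft + rng.gauss(0, 30_000)
    bedrooms = 2 + int((sq_ft - 800) // 900)
    rows.append((price, sq_ft, bedrooms))


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a boxplot is drawn from."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


def slope_with_ci(x, y, confidence=0.95):
    """Least-squares slope with a large-sample (normal-based) interval."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return slope, slope - z_crit * std_err, slope + z_crit * std_err


# Categorical predictor: one five-number summary of Sold Price per group.
by_bedrooms = defaultdict(list)
for price, _, bedrooms in rows:
    by_bedrooms[bedrooms].append(price)

for bedrooms in sorted(by_bedrooms):
    group = by_bedrooms[bedrooms]
    if len(group) >= 2:  # quartiles need at least two values
        low, q1, med, q3, high = five_number_summary(group)
        print(f"{bedrooms} bedrooms: min {low:,.0f}, Q1 {q1:,.0f}, "
              f"median {med:,.0f}, Q3 {q3:,.0f}, max {high:,.0f}")

# Continuous predictor: slope of Sold Price on sq. ft. above ground.
slope, slope_low, slope_high = slope_with_ci(
    [row[1] for row in rows], [row[0] for row in rows]
)
print(f"Estimated $ per extra sq. ft.: {slope:.1f} "
      f"(95% CI {slope_low:.1f} to {slope_high:.1f})")
```

The per-group summaries are what a set of side-by-side boxplots displays, and the slope is the trend a scatter plot suggests; the written description should still say in words what each plot seems to show.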