BUS708 Statistics and Data Analysis
Inferential Statistics Report
Assignment 2 (Assessment 4) – Individual Word Report – Trimester 3, 2019

1 OVERVIEW OF THE ASSIGNMENT

This assignment will test your skill in presenting and summarising data, as well as in making basic statistical inferences in a business context. You will use the results and any feedback given in Assignment 1 (Assessment 3, Excel Report) to produce a single report in a Word document. You will need to construct interval estimates, perform suitable hypothesis tests and regression analysis, and draw conclusions and make suggestions for management action. Your report should be written in a Word document and submitted to Turnitin following the requirements explained below.

2 TASK DESCRIPTION

There are two datasets involved in this assignment: Dataset 1 and Dataset 2, which are the same datasets used in Assignment 1 (Excel Report). All data processing should be performed in Excel or StatKey (http://www.lock5stat.com/StatKey). Specific instructions as to which tools should be used for each section will be given during tutorials.

Your task is to answer the research questions given in Section 2 to Section 6 below, using Dataset 1 or Dataset 2 as indicated in each section. To answer each question, you will need to first present the relevant numerical summary (summary statistics) and graphical display, and then perform a suitable statistical analysis to reach a statistical conclusion.

1. Section 1: Introduction
Provide a brief and clear introduction to the report (e.g. the objective of the report, the datasets involved, etc.). Find relevant articles (minimum one article, maximum three articles) and write a proper literature review which includes in-text citations.

2. Section 2: Is 40% a plausible value for the proportion of private rooms among Airbnb room types?
Using Dataset 1, first provide both a numerical summary and a graphical display that clearly shows the proportions of the different room types. Then construct a 95% confidence interval for the population proportion of private rooms. Finally, answer the research question using the confidence interval.

3. Section 3: After an iteration of outlier removal, is the price of a private room more than $70?
Using Dataset 1, perform one iteration of outlier detection on the price of private rooms using the method described in the lecture notes. After removing those outliers, describe the price distribution of private rooms using both a numerical and a graphical summary that shows the remaining outliers, if any. Perform a suitable hypothesis test to answer the research question above at the 5% level of significance.

4. Section 4: Is there a difference in availability in the next 365 days between different room types?
Using Dataset 1, describe the distribution of availability_365 for each room type. You need to provide both a numerical summary and a graphical display that shows the outliers, if any. Perform a suitable hypothesis test to answer the research question above. Use a 5% significance level.

5. Section 5: Can we predict the price of the accommodation using the longitude of the property?
Using Dataset 1, develop a regression model to predict the price of the Airbnb accommodation using the longitude of the property. Interpret the correlation coefficient, the coefficient of determination and the relevant p-values, and use them to answer the research question. Provide a suitable graphical display.

6. Section 6: Is there any relationship between gender and room type of accommodation?
Using Dataset 2, describe the relationship between a student’s gender and the room type of the accommodation they currently live in. You need to provide both a numerical summary and a graphical display. Perform a suitable hypothesis test to answer the research question above. Use a 5% significance level.

7. Section 7: Conclusion
Write a brief summary of all the findings in the previous sections and write a concluding statement. Suggest further research by discussing an interesting topic or research question related to the datasets that can be explored further.

3 SUBMISSION REQUIREMENT

Deadline to submit the report: Week 11, Sunday 2 Feb 2020, 23:59
You need to submit a Word document file to Turnitin which shows all computer outputs and discussion. You do not need to submit the dataset.

4 MARKING CRITERIA

Students are advised to read the marking rubric provided on Moodle as well as the detailed marking criteria based on this rubric.

5 DEDUCTION, LATE SUBMISSION AND EXTENSION

Late submission penalty: 5% of the total available marks per calendar day unless an extension is approved. This means 0.75 marks (out of 15 marks) per day.
For the extension application procedure, please refer to Section 3.3 of the Subject Outline. Please do NOT email the lecturer or tutor to seek an extension; you need to follow the procedure described in the Subject Outline.

6 PLAGIARISM

Please read Section 3.4 Plagiarism and Referencing, from the Subject Outline. Below is part of the statement:

“Students plagiarising run the risk of severe penalties ranging from a reduction through to 0 marks for a first offence for a single assessment task, to exclusion from KOI in the most serious repeat cases. Exclusion has serious visa implications.”

“Authorship is also an issue under Plagiarism – KOI expects students to submit their own original work in both assessment and exams, or the original work of their group in the case of a group project. All students agree to a statement of authorship when submitting assessments online via Moodle, stating that the work submitted is their own original work.
The following are examples of academic misconduct and can attract severe penalties: Handing in work created by someone else (without acknowledgement), whether copied from another student, written by someone else, or from any published or electronic source, is fraud, and falls under the general Plagiarism guidelines. Students who willingly allow another student to copy their work in any assessment may be considered to assisting in copying/cheating, and similar penalties may be applied.”

write your title here (e.g. Google play apps analysis)
BUS708 Assignment 2

Section 1: Introduction

This document serves as a sample template for Assignment 2, as well as general feedback for Assignment 1. You don’t have to use this template for Assignment 2, but if you prefer, you can edit this document and use it for Assignment 2. You can change the title, subtitle and section titles accordingly.

Some general feedback for Assignment 1
· Some students gave a very short description of dataset 1 and failed to explain what it is about (e.g. some characteristics of Google Play apps) and/or the source of the dataset (e.g. from Kaggle and originally provided by Lavanya Gupta).
· Quantitative variables in dataset 1 are: Rating, Review and Price. Install can arguably be either quantitative or categorical (in the original dataset it should be categorical, but quantitative is accepted as it is quite ambiguous). Size, Last Updated, Current Version and Android Version are all categorical (Size can potentially be quantitative if the units are all the same).
· The main reason that dataset 2 might be biased is not that it fails to cover the whole population (a random sample does not include the whole population, but it is not biased). Dataset 2 is most likely biased because it is not representative of the population (only from KOI or other institutions).
· Many students wrote reasonable comments, but many failed to answer the research question. You should have a concluding statement that answers the research question (e.g. “… hence, we conclude that most google play apps are free”, or “there seems to be a difference in prices among paid apps from the categories…”, or “the correlation coefficient indicates there is no linear relationship between Rating and Review”, etc.)
· Many graphs are still missing a title and axis labels.

Hints for Assignment 2
· Make sure you mention the objective of the report, or what the report is about, including a short description of the datasets. This can be one paragraph in Section 1.
· Write a proper literature review, including in-text citations. Some examples can be found at http://anglia.libguides.com/ld.php?content_id=14268350. Paraphrase the article; don’t just copy and paste its content. This can be another paragraph in Section 1.
· Make sure you explicitly answer the research question in each section.
· Check that your graphs are complete (title, labels or legends).
· Check and re-check the marking criteria to make sure you address all of them.

Section 2: Are most Google Play apps free?

In this section …

data presentation

Inferential statistics
The following ….
Sample size (n) = 4000
Sample proportion (phat) = 0.934
Standard Error (SE) = sqrt(phat × (1 − phat) / n) = 0.0039
Critical value = 1.96
95% Confidence Interval = 0.934 +/- (1.96)(0.0039) = (... , ….)
…..
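The Section 2 arithmetic above can also be checked in code. The following is only a minimal sketch, assuming Python is used for checking (the assignment itself specifies Excel or StatKey) and assuming the standard normal-approximation interval for a proportion. The function name `proportion_ci` is made up for illustration. With n = 4000 and phat = 0.934 it reproduces the standard error and critical value listed above.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion.

    Returns the standard error, the critical value and the (lower, upper) bounds.
    """
    # Standard error of the sample proportion: sqrt(phat * (1 - phat) / n)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Two-sided critical value from the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se
    return se, z, (p_hat - margin, p_hat + margin)


# Summary values taken from the Section 2 example in this template
se, z, (lower, upper) = proportion_ci(p_hat=0.934, n=4000)
print(f"SE = {se:.4f}, critical value = {z:.2f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```

To answer a question such as "is 40% a plausible value?", one would check whether the hypothesised proportion falls inside the printed interval.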
Section 3: Dfadfa

data presentation

Inferential statistics
Dfadfad
Sample size (n) = 230
Sample mean (Xbar) = 3.075
Sample standard deviation (s) = 1.783
Test-statistic
p-value = 0.0000062
From StatKey, Theoretical Distributions: t, with df = n-1 = 229

Section 4:

data presentation
Copy and paste your numerical summary and graph from Excel into this space. Make sure you have checked that they are correct.

Inferential statistics
You need to do a step-by-step ANOVA and copy and paste the ANOVA table from StatKey.

Section 5:

data presentation
You need to perform a regression analysis and paste some of the output here. You can use either Excel (Data > Data Analysis > Regression) or StatKey. Note that you will most likely need to make a new scatter plot, as the order of X and Y may be different.

Inferential statistics
Please refer to the marking criteria to see what inferences you need to make. Also make sure you draw a conclusion that answers the research question.

Section 6:
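The task description for Section 6 asks for a hypothesis test on the relationship between two categorical variables (gender and room type). One common choice for such data is a chi-square test of independence. The sketch below is only an illustration of what that test computes, assuming Python for checking (the assignment specifies Excel or StatKey). The counts in `hypothetical_counts` are invented and are not taken from Dataset 2, and the function name is made up for this example.

```python
def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for a two-way table.

    `observed` is a list of rows, each row a list of counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# HYPOTHETICAL counts for illustration only:
# rows = two gender groups, columns = three room types
hypothetical_counts = [
    [12, 18, 10],
    [15, 9, 16],
]
statistic, df = chi_square_independence(hypothetical_counts)
print(f"chi-square = {statistic:.3f}, df = {df}")
```

The p-value would then be read from a chi-square distribution with that df and compared with the 5% significance level.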