Assignment and charge
Overview

Over the past few weeks, you have learned how to use R to wrangle business data. This assignment gives you an opportunity to demonstrate your data-wrangling skills in R. Using the tidyverse package is recommended but not compulsory. Please read the entire assignment carefully, and make sure you understand the requirements, the submission format, and the marking rubrics before starting.

Academic Integrity

Plagiarism occurs when you use words, ideas, or work products attributable to another identifiable person or source:
• without attributing the work to the source from which it was obtained
• in a situation in which there is a legitimate expectation of original authorship
• in order to obtain some benefit, credit, or gain, which need not be monetary

Self-plagiarism refers to the re-submission of work as if it were original. You may not submit your own academic work for assessment when it has already been submitted for assessment at another time (including at another institution), without the express permission of the academic staff member who will assess it.

Assignment Requirements

Part 1 [30 marks]

The online hospitality company Airbnb has made a number of datasets publicly available. This part of the assignment uses the detailed listings data for Sydney, which is available at the link below. You must unzip the file after downloading it.

http://data.insideairbnb.com/australia/nsw/sydney/2022-06-06/data/listings.csv.gz

The data dictionary for this dataset can be found at
https://docs.google.com/spreadsheets/d/1iWCNJcSutYqpULSQHlNyGInUvHg2BoUGoNRIGa6Szc4/edit?usp=sharing

Write R code in an Rmd file to complete the following tasks.

1.1. Read the dataset into a dataframe. Keep only the columns listed below. Make sure the numeric columns have the correct data type. Display a summary of the dataframe.
id, name, description, host_name, neighbourhood_cleansed, property_type, room_type, accommodates, bathrooms_text, bedrooms, beds, amenities, price, number_of_reviews, number_of_reviews_ltm, review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value. (3 marks)

1.2. How many listings have a name that contains:
a. Beautiful (in upper, lower, or mixed case).
b. Both Quiet and Beautiful (in upper, lower, or mixed case).
(2 marks)

1.3. List the five neighbourhoods with the highest number of reviews in the last 12 months (list them along with the average review score rating). (3 marks)

1.4. Display the average price and average review score rating of each property type in Woollahra. (2 marks)

1.5. You are going to visit Sydney and want to find accommodation. Define four criteria based on:
a. the neighbourhood,
b. the maximum price,
c. the bathroom type, and
d. the required amenities.
You must adjust the defined criteria so that the result contains at least 10 and at most 20 listings for you to choose from. Display only the name and the columns related to the criteria. (5 marks)

1.6. Draw a bar chart showing the number of listings for each of the 10 hosts with the most listings. Write a short paragraph to describe the chart. (7 marks)

1.7. Draw a histogram and a boxplot to show the distribution of the listing prices of all types of private room. Redraw both charts with outliers removed. Write a short paragraph describing your insights into the price distribution. (8 marks)

Part 2 [30 marks]

The data file provided, census.xlsx, contains the number of bedrooms in occupied private dwellings for local government areas in Melbourne for the years 2011 and 2016.
You will see that the data is far from ready for analysis and needs to be 'wrangled'. In addition, a few errors have been deliberately introduced into the data, and these will need to be corrected. You are required to write R code to perform the following steps to produce clean data.

2.1. Read the 2011 dataset into a dataframe. Show the structure of the dataframe. (2 marks)

2.2. Investigate the first column and fix any inconsistent values. (2 marks)

2.3. You can see that the second column contains both a count and a percentage. Split the two values into two columns. Show the structure of the dataframe. (3 marks)

2.4. You can see that each suburb has 9 rows of data. Display the number of suburbs in the dataframe. (1 mark)

2.5. We are only interested in the count values. Remove the percentage column. Then transform the data so that each statistic is shown in its own column. (3 marks)

2.6. Add a year column showing the year of the data. Make sure each column has an appropriate type. Rename and reorder the columns to produce a dataframe with the columns in the order below. Show a summary of the dataframe.
region, year, br_count_0, br_count_1, br_count_2, br_count_3, br_count_4_or_more, br_count_unstated, av_per_dwelling, av_per_household (3 marks)

2.7. How many regions are there in the data? Remove any duplicate regions. (2 marks)

2.8. Define a function that takes a year, applies all the steps from 2.1 to 2.7, and returns a clean dataframe for that year. (4 marks)

2.9. Call the defined function to create two dataframes, one for 2011 and one for 2016. Then combine them into one dataframe. (2 marks)

2.10. Investigate the combined dataframe from 2.9, then report and fix any errors in the data. Show a summary of the combined dataframe. (4 marks)

2.11. How many houses had 2 or 3 bedrooms in 2016? (2 marks)

2.12. Which region has the largest decrease in the number of 3-bedroom houses from 2011 to 2016?
(2 marks)

Submission Guidelines

• You must submit A SINGLE file (.Rmd) containing all the code needed to answer all the questions of the two parts in the given order.
• The answer to each question must be presented in ONE code chunk.
• PUT the question number before the code chunk. DO NOT include the question description (to avoid a high Turnitin similarity score).
• When writing your code, keep the data files in the same directory as your notebook so that you DO NOT need to specify directories or file paths in your code. This allows us to run your code smoothly on our device.
• Marks will be deducted if your submission does not follow the guidelines.

2011 Occupied private dwellings (Count & Percentage)

Stat                                      Count/Percentage
Region                                    Banyule
None (includes bedsitters)                78/0.2
1 bedroom                                 1287/2.9
2 bedrooms                                8457/19.4
3 bedrooms                                21865/50
4 or more bedrooms                        11366/26
Number of bedrooms not stated             645/1.5
Average number of bedrooms per dwelling   3.1/--
Average number of people per household    2.6/--
Region                                    Bayside (C)
None (includes bedsitters)                97/0.3
1 bedroom                                 1054/3.2
2 bedrooms                                7939/23.9
3 bedrooms                                13731/41.3
4 or more bedrooms                        10031/30.1
Number of bedrooms not stated             419/1.3
Average number of bedrooms per dwelling   3.1/--
Average number of people per household    2.6/--
Region                                    Boroondara (C)
None (includes bedsitters)                347/0.6
1 bedroom                                 3286/5.7
2 bedrooms                                15436/27
3 bedrooms                                19853/34.7
4 or more bedrooms                        17578|30.7
Number of bedrooms not stated             731/1.3
Average number of bedrooms per dwelling   3/--
Average number of people per household    2.6/--
Region                                    Brimbank (C)
None (includes bedsitters)                180/0.3
1 bedroom                                 1009/1.7
2 bedrooms                                5689|9.7
3 bedrooms                                34431/58.6
4 or more bedrooms                        15854/27
Number of bedrooms not stated             1608/2.7
Average number of bedrooms per dwelling   3.2/--
Average number of people per household    2.9/--
Region                                    Cardinia (S)
None (includes bedsitters)                40/0.2
1 bedroom                                 400/1.6
2 bedrooms                                2554/10.2
3 bedrooms                                12115/48.4
4 or more bedrooms                        9599/38.3
Number of bedrooms not stated             330/1.3
Average number of bedrooms per dwelling   3.3/--
Average number of people per household    2.8/--
Region                                    Casey (C)
None (includes bedsitters)                167/0.2
1 bedroom                                 751/0.9
2 bedrooms                                5647/7
3 bedrooms                                39755/49.4
4 or more bedrooms                        32788:40.7
Number of bedrooms not stated             1362/1.7
Average number of bedrooms per dwelling   3.4/--
Average number of people per household    3/--
Region                                    Darebin (C)
None (includes bedsitters)                300/0.6
1 bedroom                                 4312/8.4
2 bedrooms                                16157/31.3
3 bedrooms                                22282/43.2
4 or more bedrooms                        7403/14.3
Number of bedrooms not stated             1169/2.3
Average number of bedrooms per dwelling   2.7/--
Average number of people per household    2.5/--
Region                                    Frankstone (C)
None (includes bedsitters)                146/0.3
1 bedroom                                 1383/2.9
2 bedrooms                                7326/15.5
3 bedrooms                                25003/52.7
4 or more bedrooms                        12633/26.7
Number of bedrooms not stated             911/1.9
Average number of bedrooms per dwelling   3.1/--
Average number of people per household    2.5/--
Region                                    Glen Eira (C)
None (includes bedsitters)                137/0.3
1 bedroom                                 4438/8.9
2 bedrooms                                14706/29.5
3 bedrooms                                19525/39.1
4 or more bedrooms                        10267/20.6
Number of bedrooms not stated             820/1.6
Average number of bedrooms per dwelling   2.8/--
Average number of people per household    2.5/--
Region                                    Greater Dandenong (C)
None (includes bedsitters)                240/0.5
1 bedroom                                 1864/4.1
2 bedrooms                                9788|21.5
3 bedrooms                                22848/50.2
4 or more bedrooms                        9323/20.5
Number of bedrooms not stated             1425/3.1
Average number of bedrooms per dwelling   2.9/--
Average number of people per household    2.8/--
Region                                    Hobsons Bay (C)
None (includes bedsitters)                105/0.3
1 bedroom                                 1099/3.5
2 bedrooms                                6473/20.8
3 bedrooms                                17397/55.9
4 or more bedrooms                        5483/17.6
Number of bedrooms not stated             581/1.9
Average number of bedrooms per dwelling   2.9/--
Average number of people per household    2.6/--
Region                                    Hume (C)
None (includes bedsitters)                152/0.3
1 bedroom                                 687/1.3
2 bedrooms                                3599/6.9
3 bedrooms                                29379/56.2
4 or more bedrooms                        17382/33.3
Number of bedrooms not stated             1045/2
Average number of bedrooms per dwelling   3.3/--
Average number of people per household    3.1/--
Region                                    Kingston (C) (Vic.)
None (includes bedsitters)                133/0.2
1 bedroom                                 1990/3.7
2 bedrooms                                12719/23.8
3 bedrooms                                24685/46.2
4 or more bedrooms                        12929/24.2
Number of bedrooms not stated             990/1.9
Average number of bedrooms per dwelling   3/--
Average number of people per household    2.5/--
Region                                    Knox (C)
None (includes bedsitters)                69/0.1
1 bedroom                                 865/1.7
2 bedrooms                                5958/11.4
3 bedrooms                                25506/49
4 or more bedrooms                        18992/36.5
Number of bedrooms not stated             666/1.3
Average number of bedrooms per dwelling   3.3/--
Average number of people per household    2.8/--
Region                                    Manningham (C)
None (includes bedsitters)                67/0.2
1 bedroom                                 491/1.3
2 bedrooms                                3661/9.5
3 bedrooms                                16747/43.4
4 or more bedrooms                        17190/44.5
Number of bedrooms not stated             465/1.2
Average number of bedrooms per dwelling   3.4/--
Average number of people per household    2.8/--
Region                                    Maribyrnong (C)
None (includes bedsitters)                185/0.7
1 bedroom                                 2260/8.5
2 bedrooms                                8873/33.6
3 bedrooms                                10986?41.5
4 or more bedrooms                        3549/13.4
Number of bedrooms not stated             594/2.2
Average number of bedrooms per dwelling   2.6/--
Average number of people per household    2.5/--
Region                                    Maroondah (C)
None (includes bedsitters)                61/0.2
1 bedroom                                 798/2.1
2 bedrooms                                8096/21.1
3 bedrooms                                17952/46.8
4 or more bedrooms                        10970/28.6
Number of bedrooms not stated             510/1.3
Average number of bedrooms per dwelling   3.1/--
Average number of people per household    2.6/--
Region                                    Melbourne (C)
None (includes