[Title of your report] Introduction: Provides clear and concise context for the report, introducing the purpose of the analyses that follow. As a guideline, one paragraph will be sufficient....

Answered 6 days after Mar 15, 2023 (University Of South Australia)


Mohd answered on Mar 20 2023
2023-03-20
library(readr)
library(magrittr)
library(dplyr)
finaldata <- read_csv("finaldata.csv", col_types = cols(Date = col_date(format = "%d/%m/%Y")))
Section 1: Questions
[I] Descriptive Statistics & Exploratory Analysis: The data are not always clean or presented in a usable form. Some columns are unnecessary, and some variables have incomplete entries. The dataset may also contain errors, which you must fix before you start analysing. You can do the data cleansing in R or Excel.
(a). Choose & filter a single house ‘Type’. Use this for the remainder of the assignment as completed in Project Part A. Create a subset dataset of size at least 250 with the continuous variables and ‘Postcode’ and ‘Year = 2018’. Hint: Use the na.omit function. For full marks, provide a screenshot of the first 30 row entries of the cleaned dataset in R. [2 marks]
# Drop every row that has a missing value, then keep houses only (Type "h")
finaldata <- na.omit(finaldata)
finaldata1 <- finaldata %>%
  filter(Type == "h")
head(finaldata1, 30)
## # A tibble: 30 × 21
##    Suburb   Address Rooms Type  Price Method SellerG Date       Dista…¹ Postc…²
##
##  1 Abbotsf… 25 Blo…     2 h    1.03e6 S      Biggin  2016-02-04     2.5    3067
##  2 Abbotsf… 5 Char…     3 h    1.46e6 SP     Biggin  2017-03-04     2.5    3067
##  3 Abbotsf… 55a Pa…     4 h    1.6 e6 VB     Nelson  2016-06-04     2.5    3067
##  4 Abbotsf… 124 Ya…     3 h    1.88e6 S      Nelson  2016-05-07     2.5    3067
##  5 Abbotsf… 98 Cha…     2 h    1.64e6 S      Nelson  2016-10-08     2.5    3067
##  6 Abbotsf… 10 Val…     2 h    1.10e6 S      Biggin  2016-10-08     2.5    3067
##  7 Abbotsf… 40 Nic…     3 h    1.35e6 VB     Nelson  2016-11-12     2.5    3067
##  8 Abbotsf… 16 Wil…     2 h    1.31e6 S      Jellis  2016-10-15     2.5    3067
##  9 Abbotsf… 42 Hen…     3 h    1.2 e6 S      Jellis  2016-07-16     2.5    3067
## 10 Abbotsf… 78 Yar…     3 h    1.18e6 S      LITTLE  2016-07-16     2.5    3067
## # … with 20 more rows, 11 more variables: Bedroom2, Bathroom,
## #   Car, Landsize, BuildingArea, YearBuilt,
## #   CouncilArea, Lattitude, Longtitude, Regionname,
## #   Propertycount, and abbreviated variable names ¹Distance, ²Postcode
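Because finaldata1 is a tibble, head(finaldata1, 30) still prints only the first 10 rows by default, and the subset keeps all 21 columns. A minimal sketch of narrowing the data to the numeric columns (Postcode included) and printing every requested row follows. The `toy` tibble and its values are made up for illustration and only stand in for finaldata.csv; the Year filter from the question is not attempted here.

```r
library(dplyr)

# Hypothetical stand-in for finaldata.csv (values invented for illustration)
toy <- tibble(
  Type     = c("h", "u", "h", "h", "t", "h"),
  Suburb   = c("A", "A", "B", "B", "C", "C"),
  Price    = c(1035000, 600000, NA, 1465000, 900000, 1600000),
  Rooms    = c(2, 2, 3, 3, 3, 4),
  Postcode = c(3067, 3067, 3067, 3067, 3068, 3067)
)

subset_h <- toy %>%
  filter(Type == "h") %>%                  # keep a single house type
  select(Postcode, where(is.numeric)) %>%  # Postcode first, then the other numeric columns
  na.omit()                                # drop rows with missing values

# n = 30 asks the tibble print method for up to 30 rows instead of 10
print(subset_h, n = 30)
```

On the real data the same pipeline would be applied to finaldata, and the row count should be checked against the 250-row minimum stated in the question.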
(b). Use R to produce histograms of all the possible continuous variables. [4 marks]
# Arrange the 11 histograms on a 3 x 4 grid
par(mfrow = c(3, 4))
hist(finaldata1$Rooms, col = "blue", main = "Rooms")
hist(finaldata1$Price, col = "blue", main = "Price")
hist(finaldata1$Distance, col = "blue", main = "Distance")
hist(finaldata1$Postcode, col = "blue", main = "Postcode")
hist(finaldata1$Bedroom2, col = "green", main = "Bedroom2")
hist(finaldata1$Bathroom, col = "green", main = "Bathroom")
hist(finaldata1$Car, col = "green", main = "Car")
hist(finaldata1$Landsize, col = "green", main = "Landsize")
hist(finaldata1$BuildingArea, col = "red", main = "BuildingArea")
hist(finaldata1$YearBuilt, col = "red", main = "YearBuilt")
hist(finaldata1$Propertycount, col = "red", main = "Propertycount")
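The repeated hist() calls can also be written as a loop over whichever columns are numeric, which avoids typing each title by hand. This is only a sketch on a made-up `toy` data frame (an assumption standing in for finaldata1), using base R graphics.

```r
# Hypothetical stand-in for finaldata1 (simulated values, illustration only)
set.seed(1)
toy <- data.frame(
  Rooms    = sample(1:6, 100, replace = TRUE),
  Price    = rlnorm(100, meanlog = 14, sdlog = 0.5),
  Distance = runif(100, min = 1, max = 40),
  Suburb   = sample(c("A", "B", "C"), 100, replace = TRUE)
)

# Names of the numeric columns; the character column Suburb is skipped
num_cols <- names(toy)[sapply(toy, is.numeric)]

# One panel per numeric column; par() returns the previous settings
old_par <- par(mfrow = c(1, length(num_cols)))
for (v in num_cols) {
  hist(toy[[v]], main = v, xlab = v, col = "blue")
}
par(old_par)  # restore the original layout
```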
(c). Use R to produce descriptive statistics for all the variables in part (a). [4 marks]
skimr::skim(finaldata1)
Data summary
Name                 finaldata1
Number of rows       4088
Number of columns    21

Column type frequency:
  character          7
  Date               1
  numeric            13

Group variables      None
Variable type: character
skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
Suburb                 0              1    3   18      0       280           0
Address                0              1    8   22      0      4037           0
Type                   0              1    1    1      0         1           0
Method                 0              1    1    2      0         5           0
SellerG                0              1    1   17      0       172           0
CouncilArea            0              1    4   17      0        31           0
Regionname             0              1   16   26      0         8           0
Variable type: Date
skim_variable  n_missing  complete_rate  min         max         median      n_unique
Date                   0              1  2016-02-04  2017-08-12  2016-11-27        51
Variable type: numeric
skim_variable  n_missing  complete_rate        mean         sd         p0        p25         p50         p75        p100  hist
Rooms                  0              1        3.31       0.85       1.00       3.00        3.00        4.00        8.00  ▂▇▆▁▁
Price                  0              1  1273016.20  720060.06  131000.00  785750.00  1100000.00  1555000.00  9000000.00  ▇▁▁▁▁
Distance               0              1       10.57       5.96       1.30       6.68        9.70       13.10       47.40  ▇▆▁▁▁
Postcode               0              1     3100.99      94.27    3002.00    3042.00     3079.00     3145.00     3977.00  ▇▁▁▁▁
Bedroom2               0              1        3.27       0.86       0.00       3.00        3.00        4.00        9.00  ▁▇▅▁▁
Bathroom               0              1        1.69       0.76       1.00       1.00        2.00        2.00        8.00  ▇▁▁▁▁
Car                    0              1        1.75       1.04       0.00       1.00        2.00        2.00       10.00  ▇▁▁▁▁
Landsize               0              1      513.57     324.69       0.00     305.00      541.00      665.00     8216.00  ▇▁▁▁▁
BuildingArea           0              1      165.50      96.20       1.00     112.00      143.00      194.00     3112.00  ▇▁▁▁▁
YearBuilt              0              1     1953.13      38.63    1196.00    1925.00     1955.00     1980.00     2018.00  ▁▁▁▁▇
Lattitude              0              1      -37.80       0.08     -38.16     -37.85      -37.79      -37.75      -37.46  ▁▂▇▂▁
Longtitude             0              1      144.99       0.11     144.54     144.91      145.00      145.06      145.53  ▁▃▇▁▁
Propertycount          0              1     7164.28    4242.28     389.00    3873.00     6380.00     9264.00    21650.00  ▆▇▃▂▁
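Several minimums in the numeric summary look like data-entry errors rather than real properties: a YearBuilt of 1196, a BuildingArea of 1, and a Landsize of 0 for a house. Since the question asks for errors to be fixed before analysis, one possible approach is to flag rows that fall outside plausible ranges. The sketch below uses a made-up `toy` data frame, and the cut-offs (1800, 10, 0) are assumptions chosen for illustration, not values given in the assignment.

```r
# Hypothetical rows (illustration only); the first row mimics the suspect minimums
toy <- data.frame(
  YearBuilt    = c(1196, 1925, 1955, 2018),
  BuildingArea = c(1, 112, 143, 194),
  Landsize     = c(0, 305, 541, 665)
)

# Assumed plausibility thresholds; adjust after inspecting the real data
suspect <- toy$YearBuilt < 1800 |
  toy$BuildingArea < 10 |
  toy$Landsize <= 0

toy[suspect, ]             # rows to review by hand
cleaned <- toy[!suspect, ] # rows kept for the analysis
```

Whether a flagged row is corrected or dropped is a judgment call that should be stated in the report.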
(d). Use R to produce boxplots describing the continuous variables side by side. This should be a picture of one plot. [2 marks]
# Arrange the boxplots on a 3 x 4 grid
par(mfrow = c(3, 4))
boxplot(finaldata1$Rooms, col = "blue", main = "Rooms")
boxplot(finaldata1$Price, col = "blue", main = "Price")
boxplot(finaldata1$Distance, col = "blue", main = "Distance")
boxplot(finaldata1$Postcode, col = "blue", main = "Postcode")
boxplot(finaldata1$Bedroom2, col = "green", main = "Bedroom2")
boxplot(finaldata1$Bathroom, col="green",main =...
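The question asks for the boxplots side by side in one plot. A grid of separate panels is one reading of that; another is a single boxplot() call with one box per variable. Because Price and Rooms sit on very different scales, the sketch below standardises each column first with scale(). That standardisation is a design choice made here, not something the assignment specifies, and the `toy` data frame is simulated for illustration.

```r
# Hypothetical stand-in for the numeric columns of finaldata1 (simulated values)
set.seed(1)
toy <- data.frame(
  Rooms    = sample(1:6, 100, replace = TRUE),
  Price    = rlnorm(100, meanlog = 14, sdlog = 0.5),
  Distance = runif(100, min = 1, max = 40)
)

# Centre each column at 0 and rescale to unit standard deviation
z <- scale(toy)

# A matrix passed to boxplot() is drawn as one box per column on shared axes
bp <- boxplot(z, col = c("blue", "green", "red"),
              main = "Standardised continuous variables",
              ylab = "z-score")
```

If the original units matter more than comparability, the unscaled columns can be passed instead, at the cost of the large-valued variables dominating the axis.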