7CS039 Assessment Statistics for AI & Data Science Assessment Overview Your assessment takes the form of an exploratory data analysis using R. You must include screenshots of your R code and you must...

Answered 2 days after Feb 08, 2021

Suraj answered on Feb 11 2021
Assignment
Topic: Statistics for AI & Data Science using R
Submitted By:
Submitted To:
Date: 11/02/2021
(i)
The first task is to load the CSV file into R and run the following commands to obtain the summary statistics:
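# Read the data set into a data frame (path as used on the author's machine)
df <- read.csv("C:/Users/Hp/Downloads/data.csv")
# Minimum, quartiles, median, mean and maximum of each numeric column
summary(df)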
The output is given as follows:
For the time variable, the minimum value is 10 days and the maximum is 5565 days. The mean survival time since the operation is 2153 days.
For the age variable, the minimum value is 4 years and the maximum is 95 years. The mean age at the time of the operation is 52.46 years.
The tumour thickness has a minimum value of 0.10 mm and a maximum of 17.42 mm. The average tumour thickness is 2.92 mm.
The 1st quartile is the value below which 25% of the observations fall, and the 3rd quartile is the value below which 75% of the observations fall.
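To make the quartile definition concrete, here is a minimal sketch on a hypothetical toy vector (the integers 1 to 100, not the assignment's data). It uses base R's quantile() with its default method and then checks what share of the observations lies below each quartile.

```r
# Hypothetical toy data, not taken from the assignment's data.csv
x <- 1:100

# 1st and 3rd quartiles using R's default quantile method (type 7)
q <- quantile(x, probs = c(0.25, 0.75))
q1 <- unname(q[1])
q3 <- unname(q[2])

# Share of observations that fall below each quartile
below_q1 <- mean(x < q1)
below_q3 <- mean(x < q3)

print(c(q1 = q1, q3 = q3, below_q1 = below_q1, below_q3 = below_q3))
```

On small samples the shares will only be close to 25% and 75%, because the quartiles are interpolated between observed values.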
(ii)
Useful plots for the three main variables (time, age and thickness) are given below.
The R code is as follows:
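# Scatter plots for each pair of variables
plot(df$time, df$age)
plot(df$time, df$thickness)
plot(df$age, df$thickness)
# Histograms showing the distribution of each variable
hist(df$time)
hist(df$age)
hist(df$thickness)
# Boxplot of time to check for outliers
boxplot(df$time)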
First, the histograms show the distribution of each variable:
The time variable is approximately normally distributed.
The age variable is also approximately normally distributed.
The thickness variable is positively skewed, that is, it has a long right tail.
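The visual reading of the histograms can be backed up with a number. The sketch below is one possible check, not part of the original answer: it defines a hypothetical helper, skewness(), that computes the moment-based sample skewness, and applies it to two made-up vectors. A value near 0 suggests a symmetric distribution and a positive value suggests a long right tail.

```r
# Moment-based sample skewness: 0 for a symmetric sample, > 0 for a long right tail.
# 'skewness' is a hypothetical helper defined here, not a base R function.
skewness <- function(x) {
  x <- x[!is.na(x)]                      # drop missing values
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5  # third central moment / variance^(3/2)
}

# Hypothetical toy vectors, not the assignment data
symmetric <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 1, 2, 2, 3, 10)

skew_sym <- skewness(symmetric)
skew_right <- skewness(right_skewed)

print(c(symmetric = skew_sym, right_skewed = skew_right))
```

With the data frame loaded earlier, the same helper could be called as skewness(df$thickness), assuming the column names used in the plotting code.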
To see the relationship between two variables, a scatter plot is appropriate. The scatter plots are drawn as follows:
A boxplot is a good way to detect outliers in the data set. The boxplot for the time variable is as follows:
Thus, the above plot shows that there is one outlier in the time variable.
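The points a boxplot draws as outliers can also be listed directly. The following is a minimal sketch on a hypothetical toy vector (not the assignment's time variable). It uses boxplot.stats(), which applies the same rule as boxplot(): values more than 1.5 times the hinge spread beyond the hinges are flagged.

```r
# Hypothetical toy data, not the assignment's time variable
x <- c(10, 12, 13, 14, 15, 16, 18, 60)

# boxplot.stats() returns the values that boxplot() would draw as outlier points
stats <- boxplot.stats(x)
outliers <- stats$out

# The same fences computed by hand from the lower and upper hinges
hinges <- fivenum(x)[c(2, 4)]
spread <- diff(hinges)
fences <- c(hinges[1] - 1.5 * spread, hinges[2] + 1.5 * spread)

print(outliers)
print(fences)
```

With the data frame loaded earlier, boxplot.stats(df$time)$out would list the flagged values for the time variable, assuming the column name used in the plotting code.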
(iii)
Regression analysis builds a model that predicts the value of a dependent variable from one or more independent variables. Here, we will...