Read in a data set and clean it up, describing your process as you go. Produce a minimum of four visual summaries. Apply an appropriate modelling strategy to help answer your proposed question....


  1. Read in a data set and clean it up, describing your process as you go.

  2. Produce a minimum of four visual summaries.

  3. Apply an appropriate modelling strategy to help answer your proposed question.

  4. Describe and diagnose your models.

  5. Explain how you’ve answered your questions with the data from your models and plots.

Answered Same Day | Sep 24, 2021 | Monash University

Answer To: Read in a data set and clean it up, describing your process as you go. Produce a minimum of four...

Subhanbasha answered on Sep 25 2021
R Notebook
# installing packages
# install.packages("devtools")
# install.packages("dplyr")
# install.packages("caret")
# install.packages("Metrics")
# devtools::install_github("Saraswathi-Analytics/R/SA")
# calling packages
library(devtools)
## Loading required package: usethis
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(SA)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(Metrics)
##
## Attaching package: 'Metrics'
## The following objects are masked from 'package:caret':
##
## precision, recall
Reading data, cleaning and procedure: We read the data into R from a CSV file and check its dimensions: 45,629 records and 17 features. The columns have several data types, so we convert them where needed to types that are suitable for modelling (for example, parsing the date column as a date).
The data contain NA values. Replacing them with zeros would hurt model accuracy, so we replace each NA with the mean of its column instead. We then aggregate the data by date and by country to get summary data sets, and plot the features to check the assumptions of regression.
If the assumptions are satisfied, we check the correlation between the features. The goal is to predict the number of positive cases (total_cases), using the other variables as independent variables. A correlation matrix shows which variables are most strongly related to the dependent variable, and we use it to select useful features for modelling.
We then build two or three models with different parameters, check the accuracy of each, and finally select the model with the best accuracy.
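The comparison step described above can be sketched as follows. This is a minimal illustration on simulated data, not the notebook's data: the data frame `toy`, its columns, and the helper `rmse_holdout()` are hypothetical, and root mean squared error on a held-out 30% is assumed here as the accuracy measure.

```r
# Simulated data standing in for the real set (hypothetical columns).
set.seed(1234)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 3 + 2 * toy$x1 + rnorm(n, sd = 0.5)   # y depends on x1 only

# 70/30 split, the same proportions the notebook uses for its own split.
idx   <- sample(seq_len(n), round(n * 0.7))
train <- toy[idx, , drop = FALSE]
test  <- toy[-idx, , drop = FALSE]

# Hypothetical helper: fit a formula on the training rows and
# return the RMSE of its predictions for y on the test rows.
rmse_holdout <- function(formula, train, test) {
  fit  <- lm(formula, data = train)
  pred <- predict(fit, newdata = test)
  sqrt(mean((test$y - pred)^2))
}

# Candidate models that differ only in their predictors.
candidates <- list(
  only_x2 = y ~ x2,
  only_x1 = y ~ x1,
  both    = y ~ x1 + x2
)
scores <- sapply(candidates, rmse_holdout, train = train, test = test)
print(round(scores, 3))

# The candidate with the lowest hold-out error.
best <- names(which.min(scores))
print(best)
```

Scoring every candidate on the same held-out rows keeps the comparison fair; a lower RMSE means predictions closer to the observed values.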
# reading data
df <- read.csv('covid-wongwqap-adbfl2oa.csv', header = TRUE, sep = ",")
# changing the date format
df$date <- as.Date(df$date, format = "%d-%m-%Y")
# first six records of the data
head(df)
## location date total_cases new_cases total_deaths new_deaths
## 1 Afghanistan 2019-12-31 0 0 0 0
## 2 Afghanistan 2020-01-01 0 0 0 0
## 3 Afghanistan 2020-01-02 0 0 0 0
## 4 Afghanistan 2020-01-03 0 0 0 0
## 5 Afghanistan 2020-01-04 0 0 0 0
## 6 Afghanistan 2020-01-05 0 0 0 0
## total_cases_per_million new_cases_per_million total_deaths_per_million
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## new_deaths_per_million population population_density median_age aged_65_older
## 1 0 38928341 54.422 18.6 2.581
## 2 0 38928341 54.422 18.6 2.581
## 3 0 38928341 54.422 18.6 2.581
## 4 0 38928341 54.422 18.6 2.581
## 5 0 38928341 54.422 18.6 2.581
## 6 0 38928341 54.422 18.6 2.581
## aged_70_older handwashing_facilities hospital_beds_per_thousand
## 1 1.337 37.746 0.5
## 2 1.337 37.746 0.5
## 3 1.337 37.746 0.5
## 4 1.337 37.746 0.5
## 5 1.337 37.746 0.5
## 6 1.337 37.746 0.5
# dimension of the data
dim(df)
## [1] 45629 17
# replacing NA values
df <- Fill_NA(df, replace = "MEAN")
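`Fill_NA()` comes from the `SA` package installed from GitHub above, and its internals are not shown in this notebook. Assuming that `replace = "MEAN"` substitutes each numeric column's mean for its missing values, a base R equivalent might look like the sketch below. The function `fill_na_mean()` and the toy data frame are hypothetical names used only for illustration.

```r
# Hypothetical base R stand-in for mean imputation.
# Numeric columns get their NA values replaced by the column mean;
# non-numeric columns are left untouched.
fill_na_mean <- function(data) {
  for (col in names(data)) {
    if (is.numeric(data[[col]])) {
      col_mean <- mean(data[[col]], na.rm = TRUE)
      data[[col]][is.na(data[[col]])] <- col_mean
    }
  }
  data
}

# Made-up values, not taken from the COVID data set.
toy <- data.frame(
  location      = c("A", "A", "B", "B"),
  median_age    = c(18.6, NA, 37.9, NA),
  hospital_beds = c(0.5, 0.5, NA, 3.8)
)
filled <- fill_na_mean(toy)
print(filled)
```

A global column mean gives every country the same fill value. Filling within each country would be an alternative design when the variable differs a lot between countries.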
# aggregating the data frame by date (mean across countries)
all_countries = aggregate( .~ date, data = df[,c(2,3,5),drop=FALSE], FUN = mean)
# plotting total cases
par(mfrow=c(1,2))
plot(all_countries$total_cases,
col = "blue", type = 'l', xlab = "Date",
xaxt = "n", ylab = "Total Cases",
main = "Avg No.of COVID cases")
lines(all_countries$total_deaths,col="red")
legend('topleft',
legend = c('Positive Cases',"Death Cases"),
fill = c('blue','red'),
col = c('blue','red'),
title = "COVID Cases")
In the above plot, the average number of deaths appears almost flat on this scale, while the average number of positive cases starts low and then increases rapidly. So the total number of positive cases rises steadily over the period covered.
# plotting total cases across the countries
tot_cases = aggregate( .~ location, data = df[,c(1,3,5),drop=FALSE], FUN = mean)
tot_cases <- tot_cases %>% arrange(total_cases) %>% as.data.frame() %>% tail(11)
barplot(total_cases ~ location, data = tot_cases[-11,],
col = rainbow(10),
main = "Top 10 Max COVID Cases")
The above plot shows the countries with the highest average number of positive cases. The United States has the highest count, followed by Brazil and then India. The remaining countries have far fewer positive cases.
# aggregating by mean
df1 = aggregate( .~ location, data = df[,c(1,12:ncol(df)),drop=FALSE], FUN = mean)
two_Countries<-df[grepl("India|Australia",df$location),]
# keep population, aged_65_older and hospital_beds_per_thousand
two_Countries<-two_Countries[,c(11,14,17)]
# correlation plot
corrplot::corrplot(cor(two_Countries))
This is a correlation plot of the three selected variables. It shows a negative correlation between population and both aged_65_older and hospital_beds_per_thousand: where population is higher, aged_65_older and hospital_beds_per_thousand tend to be lower.
There is a positive correlation between aged_65_older and hospital_beds_per_thousand: where aged_65_older is higher, hospital_beds_per_thousand tends to be higher as well.
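To make the reading of the signs concrete, here is a small sketch of `cor()` on made-up values (not the notebook's data). It assumes that each country contributes several rows in which the three country-level variables stay constant. Under that assumption, a subset with only two countries gives every variable just two distinct values, so each pairwise correlation comes out as +1 or -1, and the sign only says which country has the larger value.

```r
# Two hypothetical countries: 5 rows for the first, 3 for the second.
# Each variable is constant within a country.
toy <- data.frame(
  population    = rep(c(1000, 50),  times = c(5, 3)),
  aged_65_older = rep(c(6, 16),     times = c(5, 3)),
  hospital_beds = rep(c(0.5, 3.8),  times = c(5, 3))
)

cm <- cor(toy)
print(round(cm, 2))
```

With more countries in the subset the coefficients would fall between -1 and +1 and say more about the strength of each relationship.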
# histogram of the median age
hist(df1$median_age,xlab = "Median Age",
main = "Histogram of Median Age",
col = rainbow(7))
The above plot shows the distribution of median age across countries (df1 holds one row per location). Most countries fall in the 30-35 bin, and the fewest fall in the 45-50 bin. The histogram counts countries, not positive cases, so on its own it does not show which age groups are testing positive.
# Splitting the data into train and test
set.seed(1234)
Sample <- sample(1:nrow(df), round(nrow(df)*.7))
Train_df <- df[Sample,,drop=FALSE]
Test_df <- df[-Sample,,drop=FALSE]
# fitting a linear regression model on the training data, using all other columns as predictors
model1<-lm(total_cases~.,data=Train_df)
#summary of the model
summary(model1)
##
## Call:
## lm(formula = total_cases ~ ., data = Train_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3809908 -4875 93 4957 5430740
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.224e+13 2.298e+14 0.140 0.88841
## locationAlbania 2.989e+13 2.130e+14 0.140 0.88841
## locationAlgeria -2.196e+13 1.565e+14 -0.140 0.88841
## locationAndorra 6.477e+13 4.616e+14 0.140 0.88841
## locationAngola -1.809e+13 1.289e+14 -0.140 0.88841
## locationAnguilla 1.814e+14 1.293e+15 0.140 0.88841
## locationAntigua and Barbuda 1.051e+14 7.491e+14 0.140 0.88841
## locationArgentina -2.266e+13 1.615e+14 -0.140 0.88841
## locationArmenia 2.874e+13 2.048e+14 0.140 0.88841
## locationAruba 3.142e+14 2.239e+15 0.140 0.88841
## locationAustralia -3.034e+13 2.163e+14 -0.140 0.88841
## locationAustria 3.100e+13 2.209e+14 0.140 0.88841
## locationAzerbaijan 3.844e+13 2.740e+14 0.140 0.88841
## locationBahamas -8.842e+12 6.302e+13 -0.140 0.88841
## locationBahrain 1.115e+15 7.944e+15 0.140 0.88841
## locationBangladesh 7.172e+14 5.111e+15 0.140 0.88841
## locationBarbados 3.614e+14 2.576e+15 0.140 0.88841
## locationBelarus -4.481e+12 3.194e+13 -0.140 0.88841
## locationBelgium 1.903e+14 1.356e+15 0.140 0.88841
## locationBelize -2.251e+13 1.604e+14 -0.140 0.88841
## locationBenin 2.648e+13 1.887e+14 0.140 0.88841
## locationBermuda 7.432e+14 5.296e+15 0.140 0.88841
## locationBhutan -1.969e+13 1.403e+14 -0.140 0.88841
## locationBolivia -2.620e+13 1.867e+14 -0.140 0.88841
## locationBonaire Sint Eustatius and Saba 1.814e+14 1.293e+15 0.140 0.88841
## locationBosnia and Herzegovina 8.338e+12 5.942e+13 0.140 0.88841
## locationBotswana -2.985e+13 2.127e+14 -0.140 0.88841
## locationBrazil -1.741e+13 1.241e+14 -0.140 0.88841
## locationBritish Virgin Islands 9.097e+13 6.483e+14 0.140 0.88841
## locationBrunei 1.595e+13 1.137e+14 0.140 0.88841
## locationBulgaria 6.374e+12 4.542e+13 0.140 0.88841
## locationBurkina Faso 9.319e+12 6.641e+13 0.140 0.88841
## locationBurundi 2.184e+14 1.556e+15 0.140 0.88841
## locationCambodia 2.148e+13 1.531e+14 0.140 0.88841
## locationCameroon -2.095e+12 1.493e+13 -0.140 0.88841
## locationCanada -2.985e+13 2.127e+14 -0.140 0.88841
## locationCape Verde 4.808e+13 3.427e+14 0.140 0.88841
## locationCayman Islands 1.197e+14 8.532e+14 0.140 0.88841
## locationCentral African Republic -2.781e+13 1.982e+14 -0.140 0.88841
## locationChad -2.523e+13 1.798e+14 -0.140 0.88841
## locationChile -1.786e+13 1.273e+14 -0.140 0.88841
## locationChina 5.525e+13 3.937e+14 0.140 0.88841
## locationColombia -6.042e+12 4.306e+13 -0.140 0.88841
## locationComoros 2.269e+14 1.617e+15 0.140 0.88841
## locationCongo -2.312e+13 1.647e+14 -0.140 0.88841
## locationCosta Rica 2.468e+13 1.759e+14 0.140 0.88841
## locationCote d'Ivoire 1.302e+13 9.279e+13 0.140 0.88841
## locationCroatia 1.144e+13 8.151e+13 0.140 0.88841
## locationCuba 3.317e+13 2.364e+14 0.140 0.88841
## locationCuracao 1.826e+14 1.301e+15 0.140 0.88841
## locationCyprus 4.339e+13 3.092e+14 0.140 0.88841
## locationCzech Republic 4.903e+13 3.494e+14 0.140 0.88841
## locationDemocratic Republic of Congo -1.099e+13 7.829e+13 -0.140 0.88841
## locationDenmark 4.864e+13 3.466e+14 0.140 0.88841
## locationDjibouti -7.783e+12 5.547e+13 -0.140 0.88841
## locationDominica 2.615e+13 1.864e+14 0.140 0.88841
## locationDominican Republic 9.980e+13 7.112e+14 0.140 0.88841
## locationEcuador 7.416e+12 5.285e+13 0.140 0.88841
## locationEgypt 2.582e+13 1.840e+14 0.140 0.88841
## locationEl Salvador 1.501e+14 1.070e+15 0.140 0.88841
## locationEquatorial Guinea -5.467e+12 3.896e+13 -0.140 0.88841
## locationEritrea -5.994e+12 4.272e+13 -0.140 0.88841
## locationEstonia -1.386e+13 9.875e+13 -0.140 0.88841
## locationEthiopia 2.994e+13 2.134e+14 0.140 0.88841
## locationFaeroe Islands -1.132e+13 8.070e+13 -0.140 0.88841
## locationFalkland Islands 1.814e+14 1.293e+15 0.140 0.88841
## locationFiji -2.879e+12 2.052e+13 -0.140 0.88841
## locationFinland -2.150e+13 1.532e+14 -0.140 0.88841
## locationFrance 4.038e+13 2.878e+14 0.140 0.88841
## locationFrench Polynesia 1.357e+13 9.670e+13 0.140 0.88841
## locationGabon -2.759e+13 1.966e+14 -0.140 0.88841
## locationGambia 9.073e+13 6.466e+14 0.140 0.88841
## locationGeorgia 6.286e+12 4.480e+13 0.140 0.88841
## locationGermany 1.082e+14 7.710e+14 0.140 0.88841
## locationGhana 4.283e+13 3.053e+14 0.140 0.88841
## locationGibraltar 2.016e+15 1.437e+16 0.140 0.88841
## locationGreece 1.721e+13 1.227e+14 0.140 0.88841
## locationGreenland -3.216e+13 2.292e+14 -0.140 0.88841
## locationGrenada 1.556e+14 1.109e+15 0.140 0.88841
## locationGuam 1.479e+14 1.054e+15 0.140 0.88841
## locationGuatemala 6.127e+13 4.366e+14 0.140 0.88841
## locationGuernsey 1.814e+14 1.293e+15 0.140 0.88841
## locationGuinea -1.580e+12 1.126e+13 -0.140 0.88841
## locationGuinea-Bissau 6.972e+12 4.969e+13 0.140 0.88841
## locationGuyana -2.990e+13 2.131e+14 -0.140 0.88841
## locationHaiti 2.038e+14 1.453e+15 0.140 0.88841
## locationHonduras 1.682e+13 1.198e+14 0.140 0.88841
## locationHong Kong 4.138e+15 2.949e+16 0.140 0.88841
## locationHungary 3.177e+13 2.264e+14 0.140 0.88841
## locationIceland -3.023e+13 2.154e+14 -0.140 0.88841
## locationIndia 2.346e+14 1.672e+15 0.140 0.88841
## locationIndonesia 5.409e+13 3.855e+14 0.140 0.88841
## locationInternational 1.814e+14 1.293e+15 0.140 0.88841
## locationIran -2.720e+12 1.938e+13 -0.140 0.88841
## locationIraq 1.997e+13 1.423e+14 0.140 0.88841
## locationIreland 9.154e+12 6.524e+13 0.140 0.88841
## locationIsle of Man 5.536e+13 3.946e+14 0.140 0.88841
## locationIsrael 2.063e+14 1.470e+15 0.140 0.88841
## locationItaly 8.972e+13 6.394e+14 0.140 0.88841
## locationJamaica 1.259e+14 8.970e+14 0.140 0.88841
## locationJapan 1.738e+14 1.239e+15 0.140 0.88841
## locationJersey 1.814e+14 1.293e+15 0.140 0.88841
## locationJordan 3.250e+13 2.316e+14 0.140 0.88841
## locationKazakhstan -2.828e+13 2.016e+14 -0.140 0.88841
## locationKenya 1.949e+13 1.389e+14 0.140 0.88841
## locationKosovo 6.738e+13 4.802e+14 0.140 0.88841
## locationKuwait 1.053e+14 7.503e+14 0.140 0.88841
## locationKyrgyzstan -1.309e+13 9.326e+13 -0.140 0.88841
## locationLaos -1.464e+13 1.043e+14 -0.140 0.88841
## locationLatvia -1.375e+13 9.800e+13 -0.140 0.88841
## locationLebanon 3.200e+14 2.281e+15 0.140 0.88841
## locationLesotho 1.134e+13 8.081e+13 0.140 0.88841
## locationLiberia -3.137e+12 2.236e+13 -0.140 0.88841
## locationLibya -3.010e+13 2.145e+14 -0.140 0.88841
## locationLiechtenstein 1.082e+14 7.709e+14 0.140 0.88841
## locationLithuania -5.502e+12 3.921e+13 -0.140 0.88841
## locationLuxembourg 1.049e+14 7.474e+14 0.140 0.88841
## locationMacedonia 1.669e+13 1.190e+14 0.140 0.88841
## locationMadagascar -6.203e+12 4.421e+13 -0.140 0.88841
## locationMalawi 8.478e+13 6.042e+14 0.140 0.88841
## locationMalaysia 2.478e+13 1.766e+14 0.140 0.88841
## locationMaldives 8.294e+14 5.911e+15 0.140 0.88841
## locationMali -2.324e+13 1.656e+14 -0.140 0.88841
## locationMalta 8.292e+14 5.909e+15 0.140 0.88841
## locationMauritania -2.970e+13 2.117e+14 -0.140 0.88841
## locationMauritius 3.368e+14 2.400e+15 0.140 0.88841
## locationMexico 7.122e+12 5.076e+13 0.140 0.88841
## locationMoldova 4.102e+13 2.923e+14 0.140 0.88841
## locationMonaco 1.143e+16 8.146e+16 0.140 0.88841
## locationMongolia -3.107e+13 2.214e+14 -0.140 0.88841
## locationMontenegro -4.824e+12 3.438e+13 -0.140 0.88841
## locationMontserrat 1.814e+14 1.293e+15 0.140 0.88841
## locationMorocco 1.520e+13 1.083e+14 0.140 0.88841
## locationMozambique -9.890e+12 7.049e+13 -0.140 0.88841
## locationMyanmar 1.617e+13 1.153e+14 0.140 0.88841
## locationNamibia -3.042e+13 2.168e+14 -0.140 0.88841
## locationNepal 8.887e+13 6.334e+14 0.140 0.88841
## locationNetherlands 2.690e+14 1.917e+15 0.140 0.88841
## locationNew Caledonia -2.315e+13 1.650e+14 -0.140 0.88841
## locationNew Zealand -2.146e+13 1.529e+14 -0.140 0.88841
## locationNicaragua -1.632e+12 1.163e+13 -0.140 0.88841
## locationNiger -2.220e+13 1.582e+14 -0.140 0.88841
## locationNigeria 9.193e+13 6.551e+14 0.140 0.88841
## locationNorthern Mariana Islands 3.878e+13 2.764e+14 0.140 0.88841
## locationNorway -2.367e+13 1.687e+14 -0.140 0.88841
## locationOman -2.337e+13 1.665e+14 -0.140 0.88841
## locationPakistan 1.192e+14 8.493e+14 0.140 0.88841
## locationPalestine 4.288e+14 3.056e+15 0.140 0.88841
## locationPanama 4.212e+11 3.002e+12 0.140 0.88841
## locationPapua New Guinea -2.145e+13 1.529e+14 -0.140 0.88841
## locationParaguay -2.209e+13 1.574e+14 -0.140 0.88841
## locationPeru -1.735e+13 1.237e+14 -0.140 0.88841
## locationPhilippines 1.762e+14 1.256e+15 0.140 0.88841
## locationPoland 4.124e+13 2.939e+14 0.140 0.88841
## locationPortugal 3.433e+13 2.447e+14 0.140 0.88841
## locationPuerto Rico 1.907e+14 1.359e+15 0.140 0.88841
## locationQatar 1.024e+14 7.300e+14 0.140 0.88841
## locationRomania 1.819e+13 1.297e+14 0.140 0.88841
## locationRussia -2.701e+13 1.925e+14 -0.140 0.88841
## locationRwanda 2.609e+14 1.860e+15 0.140 0.88841
## locationSaint Kitts and Nevis 9.387e+13 6.690e+14 0.140 0.88841
## locationSaint Lucia 1.415e+14 1.008e+15 0.140 0.88841
## locationSaint Vincent and the Grenadines 1.347e+14 9.600e+14 0.140 0.88841
## locationSan Marino 2.976e+14 2.121e+15 0.140 0.88841
## locationSao Tome and Principe 9.385e+13 6.689e+14 0.140 0.88841
## locationSaudi Arabia -2.316e+13 1.651e+14 -0.140 0.88841
## locationSenegal 1.653e+13 1.178e+14 0.140 0.88841
## locationSerbia 1.533e+13 1.092e+14 0.140 0.88841
## locationSeychelles 9.120e+13 6.499e+14 0.140 0.88841
## locationSierra Leone 2.979e+13 2.123e+14 0.140 0.88841
## locationSingapore 4.657e+15 3.319e+16 0.140 0.88841
## locationSint Maarten (Dutch part) 6.841e+14 4.875e+15 0.140 0.88841
## locationSlovakia 3.478e+13 2.479e+14 0.140 0.88841
## locationSlovenia 2.855e+13 2.035e+14 0.140 0.88841
## locationSomalia -1.832e+13 1.306e+14 -0.140 0.88841
## locationSouth Africa -4.543e+12 3.238e+13 -0.140 0.88841
## locationSouth Korea 2.805e+14 1.999e+15 0.140 0.88841
## locationSouth Sudan 1.814e+14 1.293e+15 0.140 0.88841
## locationSpain 2.292e+13 1.633e+14 0.140 0.88841
## locationSri Lanka 1.703e+14 1.214e+15 0.140 0.88841
## locationSudan -1.846e+13 1.316e+14 -0.140 0.88841
## locationSuriname -3.010e+13 2.145e+14 -0.140 0.88841
## locationSwaziland 1.485e+13 1.059e+14 0.140 0.88841
## locationSweden -1.760e+13 1.254e+14 -0.140 0.88841
## locationSwitzerland 9.468e+13 6.748e+14 0.140 0.88841
## locationSyria 1.814e+14 1.293e+15 0.140 ...
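The note "6 not defined because of singularities" in the summary above means that `lm()` dropped some coefficients because their columns are exact linear combinations of other columns. One likely cause, offered here as an assumption rather than a diagnosis of this particular fit, is that variables such as `population` or `median_age` are constant within each `location`, so the location dummies already determine them. The sketch below reproduces that effect on simulated data; all names and values are hypothetical.

```r
set.seed(1)
toy <- data.frame(
  location = factor(rep(c("A", "B", "C"), each = 20)),
  day      = rep(1:20, times = 3)
)

# Country-level covariate: one fixed value per location.
age_by_location <- c(A = 20, B = 30, C = 45)
toy$median_age  <- unname(age_by_location[as.character(toy$location)])

# Simulated response with a day trend and a location effect.
toy$cases <- 5 * toy$day + 10 * as.integer(toy$location) + rnorm(60)

# median_age is a linear combination of the intercept and the
# location dummies, so its coefficient cannot be estimated.
fit <- lm(cases ~ location + day + median_age, data = toy)
print(coef(fit))    # median_age is reported as NA

# Without the location factor, every coefficient is estimable.
fit2 <- lm(cases ~ day + median_age, data = toy)
print(coef(fit2))
```

Whether to keep the `location` factor or the country-level covariates depends on the question: the factor absorbs all between-country differences, while the covariates try to explain them.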