

Coding Homework 2



For the following exercises, either use the heart failure data set again or any data set of your choice. Make sure to submit some proof of your code along with your answers.




Exercise 1





Use all available variables to predict death (or your binary outcome of choice if you use your own data set). Split the data into training and testing sets (75%/25% split) before answering the following questions.



1. Fit and plot a decision tree with the training data. Which predictors seem to be the most important based on how the tree looks?


2. What is the ROC AUC of the model on the training set? On the testing set?
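
For question 2, one possible way to obtain the ROC AUC without extra packages is the rank-based (Mann-Whitney) formula applied to the tree's predicted probabilities. The sketch below is only an illustration under stated assumptions: the data are simulated stand-ins (the columns age and ef and the helper auc_rank are hypothetical names, not part of the assignment or the heart failure data set), and the seed is arbitrary.

```r
library(rpart)  # ships with standard R distributions

# Rank-based (Mann-Whitney) ROC AUC: the probability that a random positive
# case gets a higher score than a random negative case (ties count as 1/2).
# `labels` must be 0/1. auc_rank is a hypothetical helper name.
auc_rank <- function(scores, labels) {
  r <- rank(scores)  # average ranks handle tied scores
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated stand-in data (assumption: NOT the heart failure data set)
set.seed(1)
n <- 400
sim <- data.frame(age = rnorm(n, 60, 10), ef = rnorm(n, 38, 12))
p_death <- plogis(-1 + 0.05 * (sim$age - 60) - 0.08 * (sim$ef - 38))
sim$death <- factor(rbinom(n, 1, p_death), levels = c(0, 1))

# 75%/25% split on randomly chosen row indices
idx <- sample(seq_len(n), size = floor(0.75 * n))
train <- sim[idx, ]
test <- sim[-idx, ]

fit <- rpart(death ~ ., data = train, method = "class")

# Column "1" of the probability matrix is the predicted probability of death
p_train <- predict(fit, newdata = train, type = "prob")[, "1"]
p_test <- predict(fit, newdata = test, type = "prob")[, "1"]

auc_train <- auc_rank(p_train, as.integer(train$death == "1"))
auc_test <- auc_rank(p_test, as.integer(test$death == "1"))
print(c(train = auc_train, test = auc_test))
```

A training AUC well above the testing AUC would suggest the tree is overfitting. A dedicated ROC package could be used instead of the hand-written helper if one is available.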




Exercise 2





Using the training set, use LASSO to perform variable selection with death as the outcome variable (or your binary outcome of choice if you use your own data set). What are the selected variables? Do these variables match what you were expecting?
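
As a rough illustration of LASSO variable selection for a binary outcome, the sketch below uses cv.glmnet() from the glmnet package, which is assumed to be installed. The training data are simulated stand-ins with hypothetical column names (age, ef, noise1, noise2), and the choice of lambda.1se rather than lambda.min is an assumption made for this example, not a requirement of the exercise.

```r
library(glmnet)  # assumed installed: install.packages("glmnet")

# Simulated stand-in training data (assumption: NOT the heart failure data set)
set.seed(2)
n <- 300
train <- data.frame(
  age = rnorm(n, 60, 10),
  ef = rnorm(n, 38, 12),
  noise1 = rnorm(n),
  noise2 = rnorm(n)
)
p_death <- plogis(0.08 * (train$age - 60) - 0.1 * (train$ef - 38))
train$death <- rbinom(n, 1, p_death)

# glmnet needs a numeric predictor matrix; model.matrix also expands factors
x <- model.matrix(death ~ ., data = train)[, -1]  # drop the intercept column
y <- train$death

# alpha = 1 gives the LASSO penalty; lambda is chosen by cross-validation
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Variables with non-zero coefficients at the chosen lambda are "selected"
coefs <- as.matrix(coef(cv_fit, s = "lambda.1se"))
selected <- setdiff(rownames(coefs)[coefs[, 1] != 0], "(Intercept)")
print(selected)
```

Using s = "lambda.min" instead typically keeps more variables, so the selected set depends on that choice and on the random cross-validation folds.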




Bonus





1. Attempt to make a decision tree that has a better ROC AUC on the testing set than the one you made in Exercise 1. You can try playing around with the optional minsplit (the minimum number of observations that must exist in a node in order for a split to be attempted), minbucket (the minimum number of observations in any terminal leaf node), maxdepth (the maximum depth of any node of the final tree), and/or cp (complexity parameter) arguments of the rpart() function to do this. If you were successful, why do you think you were? If you were not successful, explain the thought process behind your attempt.


2. Create a prediction model that you have not already made in either homework (e.g., random forest [ranger], neural network [nnet], SVM [kernlab], gradient boosted trees [xgboost]) to predict an outcome of your choice with a data set of your choice (including the Alzheimer’s data). Why did you choose to make this model? If you searched for help with the implementation of your chosen model, how difficult was this process for you?
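
For the first bonus item, the tuning arguments can be passed to rpart() through rpart.control(). The sketch below tries a small, arbitrary grid of maxdepth and cp values and records the testing-set ROC AUC for each. It is a minimal illustration only: the data are simulated stand-ins, the grid values are hypothetical, and auc_rank is a hypothetical helper implementing the rank-based AUC formula.

```r
library(rpart)  # ships with standard R distributions

# Rank-based (Mann-Whitney) ROC AUC for 0/1 labels (hypothetical helper)
auc_rank <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated stand-in data (assumption: NOT the heart failure data set)
set.seed(3)
n <- 400
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$death <- factor(rbinom(n, 1, plogis(sim$x1 - 0.5 * sim$x2)), levels = c(0, 1))
idx <- sample(seq_len(n), size = floor(0.75 * n))
train <- sim[idx, ]
test <- sim[-idx, ]

# Hypothetical grid of tuning values; minsplit and minbucket are held fixed
grid <- expand.grid(maxdepth = c(2, 4, 6), cp = c(0.001, 0.01, 0.05))
grid$auc_test <- NA_real_

for (i in seq_len(nrow(grid))) {
  ctrl <- rpart.control(minsplit = 20, minbucket = 7,
                        maxdepth = grid$maxdepth[i], cp = grid$cp[i])
  fit <- rpart(death ~ ., data = train, method = "class", control = ctrl)
  p <- predict(fit, newdata = test, type = "prob")[, "1"]
  grid$auc_test[i] <- auc_rank(p, as.integer(test$death == "1"))
}

# Settings sorted from highest to lowest testing-set AUC
print(grid[order(-grid$auc_test), ])
```

Choosing the settings by their testing-set AUC makes that AUC optimistic, so cross-validating within the training set would be a safer way to compare settings.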


Answered 1 day after Oct 09, 2021


Suraj answered on Oct 10 2021
The decision tree model, using all predictor variables with death as the outcome variable, is implemented in RStudio. As the question requires, the data are split into 75% for training and 25% for testing.
The R code is as follows:
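1)
# Load the data (adjust the working directory to wherever the CSV file lives)
getwd()
setwd("C:/Users/Hp/Desktop")
df <- read.csv("heart_clinic.csv")
head(df)
dim(df)

# Shuffle the rows so the split is random
# (the seed value is arbitrary; it only makes the split reproducible)
set.seed(123)
shuffle_index <- sample(1:nrow(df))
df <- df[shuffle_index, ]

# Return the first `size` fraction of rows (train = TRUE)
# or the remaining rows (train = FALSE)
create_train_test <- function(data, size = 0.75, train = TRUE) {
  n_row <- nrow(data)
  total_row <- floor(size * n_row)  # whole number of training rows
  train_sample <- seq_len(total_row)
  if (train) {
    return(data[train_sample, ])
  } else {
    return(data[-train_sample, ])
  }
}

# Use the same 0.75 cutoff for both calls so the test set is exactly the
# remaining 25% (the original used 0.8 here, which silently dropped rows)
data_train <- create_train_test(df, 0.75, train = TRUE)
data_test <- create_train_test(df, 0.75, train = FALSE)
dim(data_train)
dim(data_test)

# install.packages("rpart.plot")  # run once if the package is not installed
library(rpart)
library(rpart.plot)

# Fit a classification tree with death as the outcome and plot it
fit <- rpart(death ~ ., data = data_train, method = "class")
rpart.plot(fit, extra = 106)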
pred<-predict(fit,...