The dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals. It includes over 50 features representing diabetic patient and hospital outcomes. A detailed description of all the attributes is provided in Table 1 of Beata Strack et al.’s paper “Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records,” BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014 (https://www.hindawi.com/journals/bmri/2014/781670/). The dependent variable is “Readmitted”: whether, and how soon, a patient will be readmitted. In the original dataset, it represents days to inpatient readmission and includes three possible values (>30, ...
Tasks: (the sentences in bold indicate the things you need to include in your Word document; you don’t need to submit anything else)

  1. Download and install the WEKA tool from http://www.cs.waikato.ac.nz/ml/weka/

  2. Download the sample data (diabetes.zip) from the D2L site and unzip it. You will find a training dataset, diabetic_training.csv, and a test dataset, diabetic_test.csv.

  3. Use Weka -> Explorer -> Preprocess -> Open file to open the training dataset (diabetic_training.csv), remove the two variables “encounter_id” and “patient_nbr” (these IDs shouldn’t be used in modeling), and then save it as diabetic_training.arff. Then open the test dataset (diabetic_test.csv), again remove “encounter_id” and “patient_nbr”, and save it as diabetic_test.arff.
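Task 3 is meant to be done in the Weka GUI. For readers who prefer a script, the minimal Python sketch below performs roughly equivalent preprocessing under stated assumptions: it drops `encounter_id` and `patient_nbr`, treats blank cells and `?` as missing, and writes a basic ARFF file. The function name `csv_to_arff` and its type rule (every non-missing value parses as a number means `numeric`, otherwise nominal) are illustrative assumptions; Weka's own CSV loader may infer types differently, so the GUI route remains the reference.

```python
import csv
import io

# ID columns the task says must not be used for modeling
DROP = {"encounter_id", "patient_nbr"}
# Assumption: blank cells and "?" both mark a missing value
MISSING = {"", "?"}


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _quote(value):
    # Quote values containing characters that ARFF treats specially
    if any(ch in value for ch in " ,{}%'\"\t\\"):
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return value


def csv_to_arff(src, dst, relation, drop=DROP):
    """Copy a CSV (file-like src) to a minimal ARFF (file-like dst), minus the `drop` columns."""
    reader = csv.reader(src)
    header = next(reader)
    keep = [i for i, name in enumerate(header) if name not in drop]
    names = [header[i] for i in keep]
    rows = [[row[i] for i in keep] for row in reader if row]

    dst.write(f"@relation {_quote(relation)}\n\n")
    for j, name in enumerate(names):
        present = [r[j] for r in rows if r[j] not in MISSING]
        if all(_is_number(v) for v in present):
            kind = "numeric"  # also used when every value is missing
        else:
            # Nominal: distinct values in first-seen order
            kind = "{" + ",".join(_quote(v) for v in dict.fromkeys(present)) + "}"
        dst.write(f"@attribute {_quote(name)} {kind}\n")

    dst.write("\n@data\n")
    for r in rows:
        dst.write(",".join("?" if v in MISSING else _quote(v) for v in r) + "\n")


# Small in-memory demonstration (made-up rows, not from the real dataset)
demo_src = io.StringIO(
    "encounter_id,patient_nbr,race,age,time_in_hospital\n"
    "1,10,Caucasian,[70-80),3\n"
    "2,11,?,[60-70),5\n"
)
demo_dst = io.StringIO()
csv_to_arff(demo_src, demo_dst, "diabetic_training")
print(demo_dst.getvalue())
```

For the real files, pass `open("diabetic_training.csv", newline="")` as `src` and `open("diabetic_training.arff", "w")` as `dst`. Note that code-valued columns such as `admission_type_id` come out as `numeric` under this rule, even though they are categorical in meaning.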


-------- You don’t need to submit anything for the above tasks -------------------------------

  4. Open diabetic_training.arff. Use three variable selection methods: 1) a filter-based method based on “information gain”, 2) a filter-based method with “Chi-squared attribute evaluation”, and 3) a wrapper-based method with the J48 decision tree. Please try to combine the results you obtain from these three methods.


In your Word document, show me the outputs of these three variable selection methods. We want to select 10 variables. Please combine the variables selected by the different methods, show the 10 variables you think should be selected, and briefly explain how you combined the results of the different methods.
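Both filter methods score each attribute on its own against the class. As a rough sketch (not Weka's implementation), information gain is H(class) minus H(class | attribute), and the chi-squared statistic sums (observed - expected)² / expected over the attribute-by-class contingency table. The sketch assumes the attributes are already nominal; Weka discretizes numeric attributes before applying these evaluators, and its handling of missing values may differ. All function names here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(class) - H(class | feature) for one nominal attribute."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature-by-class contingency table."""
    n = len(labels)
    f_counts, y_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for x, fx in f_counts.items():
        for y, fy in y_counts.items():
            expected = fx * fy / n
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat


def rank(columns, labels, score):
    """Attribute names sorted from highest to lowest score."""
    return sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)


# Toy example: f1 determines the class, f2 is independent of it
labels = ["yes", "yes", "no", "no"]
columns = {"f1": ["a", "a", "b", "b"], "f2": ["a", "b", "a", "b"]}
print(rank(columns, labels, information_gain))  # ['f1', 'f2']
print(rank(columns, labels, chi_squared))  # ['f1', 'f2']
```

Taking `rank(columns, labels, information_gain)[:10]` would give a top-10 list comparable in spirit to Weka's Ranker output.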

  5. Remove the variables that haven’t been selected from diabetic_training.arff and save it as diabetic_training2.arff. Then open diabetic_test.arff, again remove the variables that haven’t been selected, and save it as diabetic_test2.arff. (You don’t need to submit anything for this task.)

  6. Open diabetic_training2.arff. Then we fit three models:



  • Use the training dataset (diabetic_training2.arff) with 10 variables to fit a neural network model (MultilayerPerceptron in Weka). Please let Weka automatically split your training data into training vs. validation (70% vs. 30%), and show me the results, including recall, precision, F1-score and accuracy.
    Warning: it will take quite some time to fit a neural network model.

  • Using the training dataset (diabetic_training2.arff), fit an SVM model (SMO in Weka). Please let Weka automatically split your training data into training vs. validation (70% vs. 30%). Please show me the results, including recall, precision, F1-score and accuracy.


  • Using the training dataset (diabetic_training2.arff), fit a logistic regression model (Logistic in Weka, rather than SimpleLogistic). Please let Weka automatically split your training data into training vs. validation (70% vs. 30%). Please show me the results, including recall, precision, F1-score and accuracy.
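All the requested metrics derive from the confusion matrix Weka prints at the end of each run (rows are actual classes, columns are predicted classes). The sketch below shows the arithmetic, with averages weighted by each class's number of actual instances (our understanding of Weka's "Weighted Avg." row), plus a conceptual 70/30 shuffle-and-split. It is an illustration only: Weka's percentage split uses its own random-number generator, so the exact partition will differ, and the function names and the example matrix are made up.

```python
import random


def percentage_split(rows, train_fraction=0.7, seed=1):
    """Shuffle a copy of rows with a fixed seed, then cut it into train/validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def classification_report(matrix, labels):
    """Accuracy, per-class (precision, recall, F1, support) and support-weighted averages.

    matrix[i][j] counts instances whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    k = len(labels)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(k)) / total
    per_class = {}
    for c, label in enumerate(labels):
        tp = matrix[c][c]
        predicted = sum(matrix[i][c] for i in range(k))
        actual = sum(matrix[c])
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = (precision, recall, f1, actual)
    # Weight each class's precision/recall/F1 by its number of actual instances
    weighted = tuple(
        sum(per_class[label][m] * per_class[label][3] for label in labels) / total
        for m in range(3)
    )
    return accuracy, per_class, weighted


# Made-up confusion matrix for illustration only
acc, per_class, weighted = classification_report([[40, 10], [20, 30]], ["yes", "no"])
print(f"accuracy={acc:.3f}")
for label, (p, r, f1, n) in per_class.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} F1={f1:.3f} support={n}")
print("weighted precision/recall/F1 =", tuple(round(v, 3) for v in weighted))

train, validation = percentage_split(range(100))
print(len(train), len(validation))  # 70 30
```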



----When you fit these models, please just use the default hyperparameters; in real practice, however, you would need to tune the algorithm hyperparameters.--------------------------------------
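Tuning is outside the scope of this assignment, but a basic grid search shows the idea: evaluate every combination of candidate values on validation data and keep the best. In the sketch below, `evaluate` is a hypothetical stand-in for fitting a model with the given settings and returning a validation score (for example, weighted F1); the parameter names `C` and `exponent` are placeholders loosely modeled on SMO's complexity constant and polynomial-kernel exponent, and `toy_evaluate` is an invented scoring function.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Try every combination in `grid` (name -> candidate values); keep the best score."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # hypothetical: fit on training part, score on validation part
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy scoring function standing in for a real validation run
def toy_evaluate(params):
    return -(params["C"] - 1.0) ** 2 - (params["exponent"] - 2) ** 2


best, score = grid_search({"C": [0.1, 1.0, 10.0], "exponent": [1, 2]}, toy_evaluate)
print(best)  # {'C': 1.0, 'exponent': 2}
```

The test set should stay untouched during tuning; only the final, tuned model is assessed on it.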


  7. Recommend the best model among the three. Please briefly justify your recommendation.

  8. Now that you have selected the algorithm you want to use, fit and assess the final model using the training dataset (diabetic_training2.arff) and the test dataset (diabetic_test2.arff). Please show me the results, including recall, precision, F1-score and accuracy.


Answered Same Day, Dec 27, 2021


David answered on Dec 27 2021
Data Mining
[Diabetes Readmission]

Contents
Attribute selection
Filter-based method based on “information gain”
Filter-based method with Chi-squared evaluation
Wrapper-based method with the J48 decision tree
Selected Features
Modeling
SMO
Logistic
Neural Network
Best Model
Testing with Test data
Attribute selection
We conducted attribute selection using three methods:
Filter-based method based on “information gain”
Filter-based method with Chi-squared evaluation
Wrapper-based method with the J48 decision tree
The wrapper-based method took a long time to execute.
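A wrapper method is slow because it refits the classifier for every candidate subset it scores. The sketch below uses plain greedy forward selection (a simpler search than the BestFirst search Weka offers) and counts how many subsets get evaluated. `evaluate` is a hypothetical callback; in this assignment it would correspond to the cross-validated performance of J48 on the candidate subset, and `toy_evaluate` is an invented scorer.

```python
def forward_selection(features, evaluate, max_size=None):
    """Greedy forward selection; returns (subset, score, number of subsets evaluated)."""
    selected, remaining = [], list(features)
    best_score, evaluations = float("-inf"), 0
    while remaining and (max_size is None or len(selected) < max_size):
        candidates = []
        for feature in remaining:
            evaluations += 1  # each call would mean refitting J48 in a real wrapper
            candidates.append((evaluate(selected + [feature]), feature))
        score, feature = max(candidates)
        if score <= best_score:  # no single addition improves the subset: stop
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score, evaluations


# Toy scorer: rewards two "useful" features, lightly penalizes subset size
USEFUL = {"age", "num_medications"}


def toy_evaluate(subset):
    return len(USEFUL.intersection(subset)) - 0.01 * len(subset)


subset, score, n_evals = forward_selection(["race", "age", "gender", "num_medications"], toy_evaluate)
print(sorted(subset), n_evals)  # ['age', 'num_medications'] 9
```

With d attributes, each forward step can require up to d model fits, and with around 50 attributes and cross-validation inside every fit, the running time grows quickly.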
Selected Features
The top 10 features selected on the basis of these methods are:
1. race
2. gender
3. age
4. admission_type_id
5. discharge_disposition_id
6. time_in_hospital
7. num_lab_procedures
8. num_procedures
9. number_emergency
10. num_medications
Description:
These features were common to the selections of two of the algorithms, and they make sense in general because they are factors that plausibly influence whether a patient will be readmitted.
Although we believe diagnoses play an equally important role in readmission, we expect their effect to be partly captured by the medication features. The selected attributes also have little missing data compared with other personal attributes.
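The combination rule described above (keep features chosen by more than one method) can be written as a simple vote. The sketch below is one possible formalization, not necessarily the exact procedure used here: each method contributes its ranked list of selected features, features are ordered by vote count, ties are broken by the best position any method gave them, and the top `k` with at least `min_votes` votes are kept. The method names and feature lists in the example are placeholders, not the actual Weka outputs.

```python
from collections import Counter


def combine_by_votes(selections, k=10, min_votes=2):
    """Keep features chosen by at least `min_votes` methods, best-supported first.

    selections maps a method name to its ranked list of selected features.
    """
    votes = Counter(f for chosen in selections.values() for f in set(chosen))

    def best_position(feature):
        # Best (smallest) rank any method assigned to the feature
        return min(chosen.index(feature) for chosen in selections.values() if feature in chosen)

    ranked = sorted(votes, key=lambda f: (-votes[f], best_position(f), f))
    return [f for f in ranked if votes[f] >= min_votes][:k]


# Placeholder selections, not the actual Weka outputs
example = {
    "info_gain": ["a", "b", "c"],
    "chi_squared": ["b", "a", "d"],
    "wrapper_j48": ["c", "e"],
}
print(combine_by_votes(example))  # ['a', 'b', 'c']
```

If fewer than 10 features reach the vote threshold, lowering `min_votes` to 1 fills the list from the best-ranked single-method picks.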
Modeling
SMO

=== Summary ===
Correctly Classified Instances 1130 54.6686 %
Incorrectly Classified Instances 937 45.3314 %
Kappa statistic 0.092
Mean absolute error 0.4533
Root mean squared error 0.6733
Relative absolute error ...
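As a sanity check on the SMO summary, the instance counts reproduce the reported percentages: 1130 + 937 = 2067 validation instances. The mean absolute error (0.4533) and root mean squared error (0.6733) are also consistent with a two-class target and hard 0/1 predictions, in which case MAE equals the error rate and RMSE equals its square root. That reading is an assumption (it relies on Weka averaging errors over the class values and on SMO producing uncalibrated 0/1 predictions by default); the arithmetic below only confirms that the reported numbers agree with it.

```python
import math

# Counts copied from the SMO summary above
correct, incorrect = 1130, 937
total = correct + incorrect

accuracy_pct = 100 * correct / total
error_rate = incorrect / total
# Assumption: two classes + hard 0/1 predictions => MAE = error rate, RMSE = sqrt(error rate)
implied_mae = error_rate
implied_rmse = math.sqrt(error_rate)

print(total)  # 2067
print(f"{accuracy_pct:.4f} {100 * error_rate:.4f}")  # 54.6686 45.3314
print(f"{implied_mae:.4f} {implied_rmse:.4f}")  # 0.4533 0.6733
```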