1. Clean the Excel data set using RStudio.

- Use the forcats package to reduce the number of categories

- Use the mice package to impute missing values
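The imputation step could look roughly like the sketch below. The question does not say which columns contain missing values, so this uses the `nhanes` example data that ships with mice as a stand-in; the choice of predictive mean matching, `m = 5`, and the seed are assumptions, not requirements from the assignment.

```r
library(mice)

# Stand-in data: `nhanes` ships with the mice package and has missing values.
# In the assignment this would be the cleaned Excel data instead.
toy <- nhanes

# Create m = 5 imputed data sets with predictive mean matching ("pmm").
# printFlag = FALSE suppresses the iteration log; seed makes the run repeatable.
imp <- mice(toy, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Extract the first completed data set (missing values filled in).
toy_complete <- complete(imp, 1)

# Count of remaining missing values per column.
colSums(is.na(toy_complete))
```

On the real data, character columns would typically need converting to factors first, and a method other than "pmm" may be more suitable for categorical columns.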

The HOR_state and UnitST state columns need to be reduced to four regions: West, Midwest, South, and Northeast. The Branch categories need to be reduced to unrestricted and restricted:

Unrestricted - Armor, Air Defense Artillery, Ammunition, Aviation, Field Artillery, Infantry, Logistics, Mechanical Maintenance, Military Police, Special Forces.

Restricted - Adjacent General, Army Medical Specialist Corps, Army Nurse Corps, Behavioral Sciences, CBRN, Chaplain, Civil Affairs, CMF Immaterial, Corps of Engineers, Cyber, Dental Corps, Electronic Maintenance, Financial Management, Force Management, Health Services, Information Operations, Information Systems Engineer, Judge Advocate Generals Corps, Laboratory Sciences, Medical Corps, Military Intelligence, Nuclear & Counterproliferation, Operations Research/Systems Analysis, Personnel Special Reporting Codes, Preventative Medical Sciences, Psychological Operations, Public Affairs, Quartermaster Corps, Recruitment & Reenlistment, Research/Development/Acquisition, Signal Corps, Simulations Operations, Space Operations, Strategist Intelligence, Strategist, Systems Automation Officer, Telecommunications Systems Engineers, Transportation Corps, Veterinary Corps.
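One way to do both reductions with forcats is sketched below on toy vectors. The state codes are assumed to be two-letter abbreviations, only a few states per region are listed, and the variable names `unit_st`, `branch`, `region`, and `branch_grp` are hypothetical; the region mapping would have to be completed for the real HOR_state and UnitST columns.

```r
library(forcats)

# Hypothetical toy values standing in for data$UnitST and data$Branch.
unit_st <- factor(c("CA", "TX", "NY", "OH", "WA", "GA"))
branch  <- factor(c("Infantry", "Cyber", "Aviation", "Chaplain", "Armor", "Signal Corps"))

# Collapse states into regions. Only the states present in the toy vector are
# mapped here; extend each vector to cover every state in the real data.
region <- fct_collapse(unit_st,
  West      = c("CA", "WA"),
  Midwest   = c("OH"),
  South     = c("TX", "GA"),
  Northeast = c("NY")
)

# Unrestricted branches as listed in the question.
unrestricted <- c("Armor", "Air Defense Artillery", "Ammunition", "Aviation",
                  "Field Artillery", "Infantry", "Logistics",
                  "Mechanical Maintenance", "Military Police", "Special Forces")

# Step 1: lump every branch that is not unrestricted into "Restricted".
branch_grp <- fct_other(branch, keep = unrestricted, other_level = "Restricted")

# Step 2: merge the remaining (unrestricted) levels into a single level.
branch_grp <- fct_collapse(branch_grp,
  Unrestricted = intersect(unrestricted, levels(branch_grp))
)

table(region)
table(branch_grp)
```

Lumping "everything else" into Restricted avoids typing the long restricted list, but it also means any misspelled or unexpected branch name silently ends up as Restricted, so it is worth checking the original levels first.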

2. After the data has been cleaned, fit a random forest model with the unvac_pop column as the response variable and the Branch and UnitST columns as predictors, using the supporting material Word document as guidance.
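A minimal sketch of such a fit is shown below on simulated data. The supporting material is not reproduced here, so the choice of the randomForest package, the 70/30 split, `ntree = 200`, and the treatment of unvac_pop as a binary factor are all assumptions; the data frame `sim` is a hypothetical stand-in for the cleaned data.

```r
library(randomForest)

# Simulated stand-in for the cleaned data (hypothetical values).
set.seed(42)
n <- 500
sim <- data.frame(
  Branch = factor(sample(c("Unrestricted", "Restricted"), n, replace = TRUE)),
  UnitST = factor(sample(c("West", "Midwest", "South", "Northeast"), n, replace = TRUE))
)

# Assumed binary response (1 = unvaccinated), made to depend on Branch.
p <- ifelse(sim$Branch == "Restricted", 0.6, 0.3)
sim$unvac_pop <- factor(rbinom(n, 1, p))

# Hold out 30% of the rows for later evaluation.
train_idx <- sample(n, size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# A factor response makes randomForest() fit a classification forest.
rf_fit <- randomForest(unvac_pop ~ Branch + UnitST, data = train, ntree = 200)
print(rf_fit)
```

If unvac_pop is stored as numeric 0/1 in the real data, it needs converting with `factor()` first; otherwise randomForest fits a regression forest instead.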

3. Then estimate the AUC value of the random forest model, using the supporting material Word document as guidance.
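The AUC calculation itself can be illustrated with the pROC package, which is an assumption here since the supporting material is not shown. The labels and probabilities below are made-up toy values; with a fitted randomForest classifier (hypothetical name `fit`) and held-out data (hypothetical name `test_df`), the probabilities would come from `predict(fit, newdata = test_df, type = "prob")`.

```r
library(pROC)

# Hypothetical held-out labels (1 = positive class) and predicted
# probabilities of the positive class.
truth <- c(0, 0, 1, 1, 0, 1, 0, 1)
prob  <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70)

# levels = c(control, case); direction = "<" means controls are expected
# to have lower scores than cases.
roc_obj <- roc(response = truth, predictor = prob,
               levels = c(0, 1), direction = "<", quiet = TRUE)

# 14 of the 16 positive/negative pairs are ranked correctly, so AUC = 0.875.
auc_val <- as.numeric(auc(roc_obj))
auc_val
```

Computing the AUC on held-out rows rather than on the training rows gives a less optimistic estimate of how well the model discriminates.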

rdata-cleaning-244goba4.r supporting-material-exjhck2r-1-1arm5v2n.docx

Answered 2 days after Nov 08, 2022

Mohd answered on Nov 11 2022

2022-11-11
Importing the dataset
library(readxl)
# Read the workbook, declaring the type of each of the 34 columns explicitly
data <- read_excel("data.xlsx",
                   col_types = c("text",
                                 "text", "text", "text", "date", "date",
                                 "numeric", "numeric", "text", "text",
                                 "numeric", "text", "text", "text", "text",
                                 "text", "text", "numeric", "numeric",
                                 "text", "text", "text", "text", "date",
                                 "text", "text", "text", "text", "text",
                                 "numeric", "text", "text", "text", "text"))
First look at the data
skimr::skim(data)
Data summary
    Name                      data
    Number of rows            176485
    Number of columns         34
    _______________________
    Column type frequency:
      character               25
      numeric                 6
      POSIXct                 3
    ________________________
    Group variables           None
Variable type:...
