R programming and data calculation
Summer 2021 - Saiying (Tina) Ge

Part II. Short Answer

Solve each of the following problems.

1. See the dataset below about Titanic passengers. Room is the independent variable, representing the class of room the passenger stayed in (first, second, or third). Survive is the dependent variable, representing whether the passenger survived the accident. We wish to build a decision tree to determine whether a passenger survived, given the room class. Calculate the total entropy of the data and the information gain when the attribute Room is used as the root node of the decision tree.

Room    Survive
2nd     Yes
1st     No
3rd     Yes
1st     Yes
1st     Yes
3rd     No
2nd     No
3rd     No
1st     Yes
3rd     No
2nd     Yes
3rd     No

Note:
$\log_2 1 = 0$
$\log_2(1/2) = -1$
$\log_2(1/3) = -1.6$, $\log_2(2/3) = -0.6$
$\log_2(1/4) = -2$, $\log_2(3/4) = -0.4$
$\log_2(1/5) = -2.3$, $\log_2(2/5) = -1.3$, $\log_2(3/5) = -0.7$, $\log_2(4/5) = -0.3$

2. See the neural network below. The value of each input-layer node xi is shown in the figure. All the weights for the input layer and the hidden layer, w1ij and w2ij, are given in the table below. Node x1 represents gender, where female is 1 and male is 0. Node x2 represents year of study, where freshman is 1, sophomore is 2, junior is 3, and senior is 4. Node x3 represents GPA. The output layer represents whether the student will get an A in the new course he/she is taking. Use the neural network to predict whether a female sophomore student with a GPA of 3.0 will get an A in the course. Assume the neural network stops after one iteration of feedforward.

[Figure: neural network with input nodes x1 (Gender) = 1, x2 (School Year) = 2, x3 (GPA) = 3; hidden nodes h1, h2, h3; output nodes o1 (Get A = Yes) and o2 (Get A = No). Edges between the input and hidden layers carry the weights w1ij; edges between the hidden and output layers carry the weights w2ij.]

Weight          Weight
w111 = 0.7      w211 = 0.9
w112 = 0.6      w212 = -0.9
w113 = 0.7      w221 = 0.8
w121 = 0.8      w222 = -0.8
w122 = 0.6      w231 = 0.7
w123 = 0.8      w232 = -0.7
w131 = 0.9
w132 = 0.4
w133 = 0.9

3. See the dataset below.
There are four pieces of promotion and insurance information about customers used as evidence attributes: Flight Promotion, Magazine Promotion, Life Insurance, and Credit Insurance. Gender is the class variable. We wish to use Naïve Bayes to predict a customer's gender, given the information about his/her promotions and insurance. Find the conditional probability P(Male | Flight Promotion = No, Magazine Promotion = Yes, Life Insurance = No, Credit Insurance = Yes).

Flight Promotion    Magazine Promotion    Life Insurance    Credit Insurance    Gender
Yes                 No                    No                No                  Male
Yes                 Yes                   Yes               Yes                 Female
No                  No                    No                No                  Male
Yes                 Yes                   Yes               Yes                 Male
Yes                 No                    Yes               No                  Female
No                  No                    No                No                  Female
Yes                 No                    Yes               Yes                 Male
No                  Yes                   No                No                  Male
Yes                 No                    No                No                  Male
Yes                 Yes                   Yes               No                  Female

4. Part of the R code below has been removed. The missing parts are represented by numbered blanks. Fill in the code that completes the blanks. Write your code in the corresponding cell of the table on the answer sheet.

Assume we have a file called titanic.csv. There are five columns in the file: class, age, gender, fare, and survive. All the column names are in lower case.

• class: categorical variable, indicates the room class of the passenger (1st, 2nd, or 3rd).
• age: integer variable, age of the passenger.
• gender: categorical variable, gender of the passenger (male, female).
• fare: continuous variable, ticket price of the passenger.
• survive: categorical variable, survived or not (yes, no).

The code below loads data from the file titanic.csv in the current folder and saves it to the variable titanic. Missing values in the data file are represented by NA, and the loading process accounts for them. Once the data is loaded, the code keeps only complete records (records that do not contain any missing value) in the variable titanic. The code then plots age and class in a box plot, with class on the x-axis and age on the y-axis. The title of the plot is "Age vs Class", the label for the y-axis is "Age", and the label for the x-axis is "Class".
Finally, the code generates a scatter plot, with fare on the x-axis and age on the y-axis.

    titanic <- _____1_____("titanic.csv", _____2_____ = NA)
    titanic = titanic[_____3_____]
    boxplot(_____4_____, data = titanic, _____5_____ = "Age vs Class", _____6_____ = "Age", _____7_____ = "Class")
    _____8_____(age _____9_____, _____10_____)
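For readers reviewing the workflow described in problem 4, the following is a minimal sketch of one way to do it in base R. It is not the official answer key. The small data frame `toy` and the temporary file `csv_path` are hypothetical stand-ins for titanic.csv, with made-up values, so that the example runs on its own. The choice of `read.csv`, `complete.cases`, `boxplot`, and `plot` is an assumption about what the blanks are aiming at; other base R functions could satisfy the same description.

```r
# Illustrative only: a tiny made-up data set standing in for titanic.csv.
# These values are hypothetical and are not taken from the exam.
toy <- data.frame(
  class   = c("1st", "2nd", "3rd", "1st", "3rd", "2nd"),
  age     = c(29L, NA, 22L, 54L, 35L, 41L),
  gender  = c("female", "male", "male", "female", "male", "female"),
  fare    = c(211.3, 26.0, 7.25, NA, 8.05, 13.0),
  survive = c("yes", "no", "no", "yes", "no", "yes")
)
csv_path <- tempfile(fileext = ".csv")   # hypothetical file location
write.csv(toy, csv_path, row.names = FALSE)

# Load the file, treating the string "NA" as a missing value.
titanic <- read.csv(csv_path, na.strings = "NA")

# Keep only the rows that have no missing value in any column.
titanic <- titanic[complete.cases(titanic), ]

# Box plot of age grouped by class (class on the x-axis, age on the y-axis).
boxplot(age ~ class, data = titanic,
        main = "Age vs Class", ylab = "Age", xlab = "Class")

# Scatter plot with fare on the x-axis and age on the y-axis.
plot(age ~ fare, data = titanic)
```

In the formula interface, the variable on the left of `~` goes on the y-axis and the variable on the right goes on the x-axis, which is why `age ~ class` and `age ~ fare` match the axis layout described in the problem.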