Task 5 *. Complete the EntropyEvaluator class

Following the explanation given earlier about the calculation of entropy, complete the static EntropyEvaluator.evaluate(...) method. In this assignment, you need to use the binary logarithm (that is, the logarithm to base 2) for calculating entropy. For this, you can use the static EntropyEvaluator.log2(...) method provided to you.

Task 6. Complete the GainInfoItem class

Implement all the methods in GainInfoItem, except for the toString() method, which is already provided to you. Implementing GainInfoItem is an easy task.

Task 7 *. Complete the InformationGainCalculator class

Following the instructions given earlier about the calculation of information gain scores, complete the static InformationGainCalculator.calculateAndSortInformationGains(...) method. In the InformationGainCalculator class, you will also find a main(...) method. This main(...) method is provided in full; you do not need to make any changes to it. The name of the CSV file to process is given as a command-line parameter. Specifically, the following command line should be used to run A3:
Just these three tasks need to be done; everything else is already there.
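The explanations of entropy and information gain that Tasks 5 and 7 refer to are not reproduced in this excerpt. As a rough illustration only (the provided skeletons define the real signatures; here both methods are assumed to work on raw class counts, which is an assumption made for this sketch), a binary-logarithm entropy and the corresponding gain of a split could be computed like this:

```java
// Illustrative sketch only; the provided skeletons define the real signatures.
// Both methods below work on raw class counts, e.g. {9, 5} means 9 "yes" and
// 5 "no" -- this input format is an assumption made for this sketch.
public class EntropySketch {

    // Binary logarithm, mirroring the log2(...) helper provided in EntropyEvaluator.
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Entropy (in bits) of a class distribution: -sum(p_i * log2(p_i)).
    static double entropy(int[] classCounts) {
        int total = 0;
        for (int c : classCounts) {
            total += c;
        }
        double e = 0.0;
        for (int c : classCounts) {
            if (c > 0) {                       // 0 * log2(0) is taken to be 0
                double p = (double) c / total;
                e -= p * log2(p);
            }
        }
        return e;
    }

    // Information gain of a split: entropy before the split minus the
    // size-weighted average entropy of the resulting partitions.
    static double gain(int[] before, int[][] partitions) {
        int total = 0;
        for (int c : before) {
            total += c;
        }
        double weighted = 0.0;
        for (int[] part : partitions) {
            int size = 0;
            for (int c : part) {
                size += c;
            }
            weighted += (double) size / total * entropy(part);
        }
        return entropy(before) - weighted;
    }

    public static void main(String[] args) {
        // The weather-nominal dataset as a whole: 9 yes / 5 no -> entropy ~ 0.940 bits.
        int[] all = {9, 5};
        // Splitting it on outlook: sunny = 2 yes / 3 no, overcast = 4 yes / 0 no,
        // rainy = 3 yes / 2 no -> gain ~ 0.247 bits.
        int[][] byOutlook = {{2, 3}, {4, 0}, {3, 2}};
        System.out.println(entropy(all));
        System.out.println(gain(all, byOutlook));
    }
}
```

Run against the class counts of the weather-nominal dataset shown in Figure 1 below, this reproduces the well-known values of roughly 0.940 bits for the whole dataset and roughly 0.247 bits of gain for a split on outlook.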
ITI 1121. Introduction to Computing II
Winter 2021
Assignments 2 and 3
Mehrdad Sabetzadeh and Guy-Vincent Jourdan – Copyrighted material
(Last modified on March 11, 2021)

Deadline for Assignment 2: February 26, 2021, 11:30 pm
Deadline for Assignment 3: March 19, 2021, 11:30 pm
Extension: March 21, 2021, 11:30 pm

Learning objectives

• Inheritance
• Interfaces
• Abstract Methods
• Polymorphism
• Experimentation with Lists

Introduction

In Assignments 2 and 3, we will take one step further towards building decision trees. Since the two assignments are closely related, we provide a combined description of the two. Towards the end of this description, we specify what needs to be submitted for each of the two assignments. Please note that Assignment 2 and Assignment 3 have different deadlines (February 26 and March 19, respectively).

In these two assignments, we are going to consider all possible “splits” of the input data that we already read into a matrix (Assignment 1) and determine which split yields the best results. Before we explain what splitting means and how to measure the quality of a split, let us see an example of a decision tree and make our way from there. Consider the weather-nominal dataset shown in Figure 1.

outlook    temperature  humidity  windy  class
sunny      hot          high      FALSE  no
sunny      hot          high      TRUE   no
overcast   hot          high      FALSE  yes
rainy      mild         high      FALSE  yes
rainy      cool         normal    FALSE  yes
rainy      cool         normal    TRUE   no
overcast   cool         normal    TRUE   yes
sunny      mild         high      FALSE  no
sunny      cool         normal    FALSE  yes
rainy      mild         normal    FALSE  yes
sunny      mild         normal    TRUE   yes
overcast   mild         high      TRUE   yes
overcast   hot          normal    FALSE  yes
rainy      mild         high      TRUE   no

Figure 1: The weather-nominal dataset
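The description above refers to “splits” of the matrix built in Assignment 1 before defining them. As a rough illustration only (the class name, method name, and String[][] row representation below are hypothetical and not part of the provided skeleton), splitting a dataset on a nominal attribute simply groups its rows by the value that attribute takes:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "splitting" a row-based dataset on one attribute.
// The real assignment works on the matrix built in Assignment 1; the types used
// here (String[][] rows and an int column index) are assumptions for this sketch.
public class SplitSketch {

    // Groups the data rows by the value they take in the given column,
    // e.g. splitting weather-nominal on "outlook" yields the partitions
    // sunny, overcast and rainy.
    public static Map<String, List<String[]>> splitOn(String[][] rows, int column) {
        Map<String, List<String[]>> partitions = new LinkedHashMap<>();
        for (String[] row : rows) {
            partitions.computeIfAbsent(row[column], v -> new ArrayList<>()).add(row);
        }
        return partitions;
    }

    public static void main(String[] args) {
        String[][] data = {
            {"sunny", "hot", "high", "FALSE", "no"},
            {"overcast", "hot", "high", "FALSE", "yes"},
            {"sunny", "mild", "high", "FALSE", "no"}
        };
        // Splitting on column 0 (outlook) groups the rows into sunny (2 rows)
        // and overcast (1 row); prints [sunny, overcast].
        System.out.println(splitOn(data, 0).keySet());
    }
}
```

Each resulting partition can then be scored with an entropy calculation like the one sketched earlier, in order to decide which attribute gives the most informative split.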
[Figure 2 reproduces Figure 4.4, “Decision tree for the weather data”, from [1]: outlook = sunny → test humidity (high → no, normal → yes); outlook = overcast → yes; outlook = rainy → test windy (FALSE → yes, TRUE → no).]

Figure 2: A decision tree (borrowed from [1]) learned from the dataset of Figure 1

Our goal is to eventually (that is, in Assignment 4) be able to build decision trees like the one shown in Figure 2. A decision tree uses a tree model to explain possible consequences or provide predictions, given a set of known parameters. For instance, in our weather example, we want our decision tree to take outlook, temperature, humidity, and whether it is windy or not as parameters and predict the “class” attribute. This attribute indicates whether a certain sports tournament (say, football or tennis) is feasible to play, given the weather conditions of the day. Obviously, we want our prediction to be more accurate than a coin toss! For this, we need to train a model – in our context, a decision tree – based on the data that we have observed previously. In our weather example, the previously observed data would be what is shown in Figure 1.

The reason why the last column of the data in Figure 1 is called “class” is because we are dealing with a classification problem, with the possible outcomes being yes or no. Since there are two outcomes only, the problem is a binary classification problem. In the assignments for this course, we are concerned exclusively with binary classification. Furthermore, we assume that the “class” attribute is always the last attribute, irrespective of what the attribute is actually named. For example, in the weather-numeric dataset, shown in Figure 3, the last attribute is named “play”.
For this dataset, we take “play” (the last column) to have exactly the same role as “class”.

outlook    temperature  humidity  windy  play
sunny      85           85        FALSE  no
sunny      80           90        TRUE   no
overcast   83           86        FALSE  yes
rainy      70           96        FALSE  yes
rainy      68           80        FALSE  yes
rainy      65           70        TRUE   no
overcast   64           65        TRUE   yes
sunny      72           95        FALSE  no
sunny      69           70        FALSE  yes
rainy      75           80        FALSE  yes
sunny      75           70        TRUE   yes
overcast   72           90        TRUE   yes
overcast   81           75        FALSE  yes
rainy      71           91        TRUE   no

Figure 3: The weather-numeric dataset; the “play” attribute has the same role as “class” in the dataset of Figure 1

Semantically, the decision tree of Figure 2 is equivalent to the if-else block shown in Figure 4. The nice thing about our decision tree (and the corresponding if-else block) is that it is predictive and can project an outcome for weather conditions that have not been observed in the past. For example, our model has learned that “if the outlook is overcast, no matter what the other conditions are, we are good to play”. Interestingly, there are several
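Figure 4 itself is not included in this excerpt. As a rough sketch of the kind of if-else logic it stands for, derived from the tree described in Figure 2 (the class name, method name, and String-based parameters below are hypothetical, not part of the provided code):

```java
// Hypothetical sketch of the if-else logic that Figure 4 refers to, derived
// from the decision tree of Figure 2. Parameter names and types are assumptions.
public class WeatherRuleSketch {

    public static String classify(String outlook, String humidity, String windy) {
        if (outlook.equals("sunny")) {
            // Under a sunny outlook, the decision depends on humidity.
            return humidity.equals("high") ? "no" : "yes";
        } else if (outlook.equals("overcast")) {
            // Overcast days are always playable, regardless of the other attributes.
            return "yes";
        } else { // rainy
            // Under a rainy outlook, the decision depends on wind.
            return windy.equals("TRUE") ? "no" : "yes";
        }
    }

    public static void main(String[] args) {
        // A combination that never occurs in Figure 1 still gets a prediction.
        System.out.println(classify("overcast", "normal", "TRUE")); // yes
    }
}
```

Any combination of attribute values, including ones that never occur in Figure 1, falls into exactly one branch, which is what makes the tree (and the equivalent if-else block) predictive.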