CS 301 Fall 2022 Sample Exam Solution
Time: 2 hours 30 minutes. Total points: 45.
1. Multiple choice or single-answer questions

a. i. A fair coin; iii. A 6-sided fair die.
In information theory, entropy measures the amount of uncertainty in a random variable. For a
fair coin, each possible outcome (heads or tails) is equally likely, so the coin attains the
maximum entropy possible for two outcomes (log2 2 = 1 bit). Similarly, a fair 6-sided die has six
equally likely outcomes and attains the maximum entropy for six outcomes (log2 6 ≈ 2.58 bits).
An unfair coin has lower entropy because some outcomes are more likely than others, while a
fair 4-sided die, although its outcomes are equally likely, has lower entropy than the 6-sided die
simply because it has fewer possible outcomes (log2 4 = 2 bits).
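As a quick numerical check, here is a minimal Python sketch of Shannon entropy applied to the
four distributions discussed above (the entropy helper is defined here for illustration):

    import math

    def entropy(probs):
        """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit (maximum for 2 outcomes)
    print(entropy([0.9, 0.1]))   # unfair coin: ~0.47 bits, less than the fair coin
    print(entropy([1/6] * 6))    # fair 6-sided die: ~2.58 bits (maximum for 6 outcomes)
    print(entropy([0.25] * 4))   # fair 4-sided die: 2.0 bits, uniform but fewer outcomes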
b.
The accuracy expression using the 0.632 bootstrap for the given scenario is:
0.632 × 0.4 + 0.368 × 0.6 = 0.4736
The 0.632 bootstrap is a method for estimating the accuracy of a classifier. It involves
repeatedly sampling the training data with replacement, building a classifier on each sample,
and combining the accuracy on the held-out (out-of-bag) instances with the optimistic accuracy
on the training data itself. The 0.632 weight comes from the fact that a bootstrap sample of size
n contains, on average, about 63.2% of the distinct instances of the original training set: the
probability that a given instance is never drawn is (1 - 1/n)^n ≈ e^(-1) ≈ 0.368. In the given
scenario, the accuracy on the test data is 40% and the accuracy on the training data is close to
60%, so the 0.632 bootstrap estimates the overall accuracy of the classifier as
0.632 × 0.4 + 0.368 × 0.6 ≈ 47.4%.
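The arithmetic can be verified directly; a minimal Python sketch:

    # .632 bootstrap estimate: weight the out-of-sample (test) accuracy by 0.632
    # and the optimistic resubstitution (training) accuracy by 0.368.
    acc_test, acc_train = 0.4, 0.6   # values from the question
    print(0.632 * acc_test + 0.368 * acc_train)  # 0.4736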
c.
The given data demonstrates a linear relationship between Math Score and Age of the test
writers. The model equation for this relationship can be written as:
score = -0.8157 × Age + 79.56886
This equation represents a line with a slope of -0.8157 and a y-intercept of 79.56886. Given an
age, the equation can be used to predict the corresponding math score. For example, if a test
writer is 20 years old, the predicted math score is -0.8157 × 20 + 79.56886 ≈ 63.25.
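A minimal Python sketch of the fitted model, using the coefficients from the equation above:

    def predicted_score(age):
        """Fitted line from part c: score = -0.8157 * age + 79.56886."""
        return -0.8157 * age + 79.56886

    print(predicted_score(20))  # 63.25486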
d.
To classify a test point X(8,7) using k-nearest neighbor based classification with k=3 and
Manhattan distance, we first need to calculate the distance between the test point and each of
the 10 training points. The Manhattan distance between two points (x1, y1) and (x2, y2) is given
by |x1-x2| + |y1-y2|. Using this formula, the distances between X(8,7) and each of the training
points can be calculated as follows:
X1(1,1): |1-8| + |1-7| = 13
X2(2,2): |2-8| + |2-7| = 11
X3(2,2.5): |2-8| + |2.5-7| = 10.5
X4(3,7): |3-8| + |7-7| = 5
X5(9,9): |9-8| + |9-7| = 3
X6(8,9): |8-8| + |9-7| = 2
X7(3,3): |3-8| + |3-7| = 9
X8(9,9): |9-8| + |9-7| = 3
X9(9,10): |9-8| + |10-7| = 4
X10(9,5): |9-8| + |5-7| = 3
Next, we sort the distances in ascending order and select the k=3 closest training points. The
nearest neighbor is X6(8,9) at distance 2; X5(9,9), X8(9,9), and X10(9,5) are tied at distance 3,
so any two of them complete the k=3 neighborhood (note that X5 and X8 share the same
coordinates). Finally, we classify the test point by the majority class label among the 3 selected
points. Assuming, as in the exam dataset, that the points in this neighborhood carry the class
label "Male", the majority vote classifies the test point X(8,7) as "Male" under 3-nearest-neighbor
classification with Manhattan distance.
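The whole computation can be reproduced with a short Python sketch. The class labels below
are assumptions for illustration (the exam table is not reproduced here); the points near X are
marked "Male", matching the conclusion above:

    from collections import Counter

    def manhattan(p, q):
        """Manhattan (L1) distance between two 2-D points."""
        return abs(p[0] - q[0]) + abs(p[1] - q[1])

    # Training points from the question; the labels are hypothetical placeholders.
    train = [
        ((1, 1), "Female"), ((2, 2), "Female"), ((2, 2.5), "Female"),
        ((3, 7), "Female"), ((9, 9), "Male"),   ((8, 9), "Male"),
        ((3, 3), "Female"), ((9, 9), "Male"),   ((9, 10), "Male"),
        ((9, 5), "Male"),
    ]
    x = (8, 7)

    # Rank all training points by distance to x and take the k=3 nearest.
    ranked = sorted(train, key=lambda t: manhattan(x, t[0]))
    neighbors = ranked[:3]
    print([(p, manhattan(x, p)) for p, _ in neighbors])  # X6 at 2, then two tied at 3

    # Majority vote over the neighbor labels decides the class of x.
    print(Counter(label for _, label in neighbors).most_common(1)[0][0])  # "Male"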
e.
In a transaction database, an itemset is a set of items that occur together in a transaction. The
support of an itemset is the number of transactions that contain it. A closed itemset is an
itemset such that no proper superset of it has the same support. A maximal itemset is a
frequent itemset that is not a proper subset of any other frequent itemset.
Given the transaction database {<a1, …, a100>, <a1, …, a100>, <a1, …, a50>} and a minimum
support of 2, the closed itemsets and the corresponding support counts are:
<a1, …, a100>: 2 and <a1, …, a50>: 3
Both itemsets are closed because no proper superset of either has the same support: every
proper superset of <a1, …, a50> is contained in at most the two long transactions (support ≤ 2),
and <a1, …, a100> has no superset that appears in any transaction at all. The maximal itemset
is:
<a1, …, a100>: 2
It is the only maximal itemset because it is frequent and is not a proper subset of any other
frequent itemset.
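The closure and maximality claims can be checked with a short Python sketch over this
database; enumerating all 2^100 itemsets is infeasible, so only the two candidates named
above are tested:

    # Two transactions <a1, ..., a100> and one transaction <a1, ..., a50>.
    long_t = frozenset(f"a{i}" for i in range(1, 101))
    short_t = frozenset(f"a{i}" for i in range(1, 51))
    db = [long_t, long_t, short_t]

    def support(itemset):
        """Absolute support: number of transactions containing the itemset."""
        return sum(itemset <= t for t in db)

    print(support(short_t))            # 3: contained in all three transactions
    print(support(long_t))             # 2: contained only in the two long transactions
    print(support(short_t | {"a51"}))  # 2: adding any item drops the support,
                                       # so <a1, ..., a50> is closed
    # <a1, ..., a100> is maximal: it is frequent (support 2 >= minsup), and no
    # transaction contains any item beyond it, so no proper superset is frequent.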
f.
The following statements are true:
i. Supervised discretization could be obtained by applying information gain based criteria.
ii. Supervised discretization could be obtained by simply finding the pure intervals where the
class labels are the same.
iii. Discretization could be obtained by applying the Elbow method.
Discretization is the process of dividing a continuous variable into a set of discrete intervals or
bins, which simplifies the data and can make it easier to analyze; it is supervised when the
class labels are used to choose the intervals. The first method is supervised: it picks the cut
points that yield the highest information gain with respect to the class label (see the sketch
below). The second method is also supervised: it finds intervals in which all observations share
the same class label. The third method is unsupervised: it clusters the values and uses the
Elbow method to choose the number of intervals without consulting the class labels, so it yields
discretization, but not supervised discretization.
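As an illustration of the information gain based criterion from statement i, here is a minimal
Python sketch that scans the candidate cut points on a numeric attribute and keeps the one
with the highest gain; the values/labels data is hypothetical:

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a list of class labels, in bits."""
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def best_split(values, labels):
        """Return (gain, cut) for the cut point that maximizes information gain."""
        pairs = sorted(zip(values, labels))
        base = entropy(labels)
        best = (0.0, None)
        for i in range(1, len(pairs)):
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
            left = [l for v, l in pairs[:i]]
            right = [l for v, l in pairs[i:]]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
            if gain > best[0]:
                best = (gain, cut)
        return best

    # Hypothetical ages with pass/fail labels; the best cut lands where the labels change.
    print(best_split([18, 20, 22, 30, 35, 40],
                     ["pass", "pass", "pass", "fail", "fail", "fail"]))  # (1.0, 26.0)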
2.
To generate all association rules that relate weather conditions to the play outcome, we first
need to find all frequent itemsets in the dataset using the support threshold of 3. For example,
the itemset {Outlook=Sunny, Play=No} has a support of 3, since it appears in transactions 1, 2,
and 8. Similarly, the itemset {Outlook=Overcast, Play=Yes}...