A single file (.ipynb) is expected, along with the required output.
Note:
1. Report the values (output) in the text box when submitting the assignment.
2. Attach one (consolidated) code/notebook which is used to generate the values.

Q1. In this problem we will verify that ensembles indeed perform better than individual classifiers. Consider a set of K binary classifiers. For simplicity, we simulate the accuracy of each classifier using a probability value p. To build the ensemble, we repeat the following N times:
· Generate K random binary values in {0, 1}, with the probability of 1 being p. (Look at the function numpy.random.choice.)
· Take a majority vote to predict the class.
The accuracy is then the percentage of times (out of N) that we predict 1.

REASONING: Essentially, we are assuming that an individual classifier is correct with probability p. So, when we generate K random binary values, each value says whether the corresponding classifier is correct (1) or wrong (0). If the majority vote is 1, we have predicted the correct class; otherwise we have predicted the wrong class. Hence the accuracy is the percentage of times (out of N) that the majority vote is 1.

Report the accuracy when we substitute the following values:
• p=0.49, K=1000, N=1000
• p=0.51, K=1000, N=1000
• p=0.51, K=10, N=1000
• p=0.51, K=1000, N=10
• p=0.51, K=100, N=10000

Q2. In this problem we shall look at the decision tree and at fine-tuning it. Do the following:
· Use the dataset from sklearn.datasets.make_moons, with the following parameter values: random_state = 42, n_samples = 1000, noise = 0.4.
· Use the train_test_split function with test_size = 0.2 and random_state = 42 to split the dataset into train and test sets.
We only change two variables for this problem: max_leaf_nodes and min_samples_split. If nothing is specified, take the default values.
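The Q2 setup described above can be sketched as follows. This is a minimal sketch, not a reference solution: the helper name tree_test_accuracy is hypothetical, scoring on the test set is an assumption (the problem statement only says "accuracy"), and fixing random_state on the tree itself is an assumption added so that repeated runs give the same result.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Dataset and split, using the parameter values stated in Q2.
X, y = make_moons(n_samples=1000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def tree_test_accuracy(**params):
    """Fit a decision tree with the given hyperparameters and
    return its accuracy on the held-out test set (hypothetical helper)."""
    # Assumption: random_state=42 on the tree is not required by the
    # assignment; it only makes the fitted tree reproducible.
    clf = DecisionTreeClassifier(random_state=42, **params)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


# All other hyperparameters are left at their defaults.
baseline = tree_test_accuracy()
print("default tree, test accuracy:", baseline)
```

Any hyperparameter accepted by DecisionTreeClassifier can be passed as a keyword argument to the helper, for example tree_test_accuracy(max_leaf_nodes=2).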
Report the following values:
· Accuracy when max_leaf_nodes = 2
· Accuracy when max_leaf_nodes = 4
· Accuracy when min_samples_split = 30
· Using GridSearchCV, identify the best hyperparameters among the combinations where max_leaf_nodes is in {2, 3, ..., 99} and min_samples_split is in {2, 3, 4}. Report the best parameter combination and the accuracy corresponding to it.

Q3. Use LinearSVC to classify the following data points:

SNO   X1   X2   Y
1     3    4    0
2     2    2    0
3     4    4    0
4     1    4    0
5     2    1    1
6     4    3    1
7     4    1    1

Report the following:
• Value of .coef_
• Value of .intercept_
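For Q3, one way to fit the classifier and read off the two requested attributes is sketched below. It assumes default LinearSVC hyperparameters; random_state=0 is an assumption added for reproducibility and is not part of the problem statement. The exact numbers can vary with the scikit-learn version and solver settings, so none are shown here.

```python
import numpy as np
from sklearn.svm import LinearSVC

# The seven data points from the Q3 table: columns are X1, X2.
X = np.array([
    [3, 4],
    [2, 2],
    [4, 4],
    [1, 4],
    [2, 1],
    [4, 3],
    [4, 1],
])
y = np.array([0, 0, 0, 0, 1, 1, 1])

# Assumption: random_state=0 is chosen only so reruns are repeatable.
clf = LinearSVC(random_state=0)
clf.fit(X, y)

# For a binary problem, coef_ has shape (1, 2) and intercept_ has shape (1,).
print("coef_:", clf.coef_)
print("intercept_:", clf.intercept_)
```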