Make a multiclass classifier to predict wine quality with majority rules voting by
performing the following steps:
a) Using the data/winequality-white.csv and data/winequality-red.csv files, create a dataframe with the concatenated data and a column indicating
which wine type each row belongs to (red or white).
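Step a) can be sketched as follows. The two tiny frames here stand in for the CSVs, which you would load with pd.read_csv() (the UCI wine quality files are semicolon-delimited, so sep=';' is assumed):

```python
import pandas as pd

# Stand-ins for the real files; with the actual data you would use
# pd.read_csv('data/winequality-red.csv', sep=';') and likewise for white.
red = pd.DataFrame({'fixed acidity': [7.4, 7.8], 'quality': [5, 5]})
white = pd.DataFrame({'fixed acidity': [7.0, 6.3], 'quality': [6, 6]})

# Label each frame before concatenating so the wine type survives the merge.
wine = pd.concat(
    [red.assign(kind='red'), white.assign(kind='white')],
    ignore_index=True,
)
```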
b) Create training and test sets with 75% of the data in the training set. Stratify
on quality.
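A sketch of step b), using a small stand-in for the concatenated frame from step a):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the concatenated frame from step (a); the real `wine` frame
# holds every red and white row plus the wine type column.
wine = pd.DataFrame({
    'alcohol': [9.4, 9.8, 10.0, 9.5, 11.0, 10.2, 9.9, 10.4],
    'kind': ['red'] * 4 + ['white'] * 4,
    'quality': [5, 5, 6, 6, 5, 5, 6, 6],
})

X = wine.drop(columns='quality')
y = wine.quality

# 75% train / 25% test, stratified on the quality label so each split
# preserves the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
```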
c) Build a pipeline for each of the following models: random forest, gradient
boosting, k-NN, logistic regression, and Naive Bayes (GaussianNB). The
pipeline should use a ColumnTransformer object to standardize the numeric
data while one-hot encoding the wine type column (something like is_red and
is_white, each with binary values), and then build the model. Note that we will
discuss Naive Bayes in Chapter 11, Machine Learning Anomaly Detection.
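A sketch of step c). The numeric column list, the 'model' step name, and the small fit at the end are illustrative, not prescribed by the exercise; with the real data, numeric_cols would be every column of X_train except the wine type column:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def make_wine_pipeline(model, numeric_cols):
    """Standardize the numeric columns, one-hot encode 'kind', then fit `model`."""
    preprocessor = ColumnTransformer([
        ('scale', StandardScaler(), numeric_cols),
        ('encode', OneHotEncoder(), ['kind']),  # yields kind_red / kind_white dummies
    ])
    return Pipeline([('pre', preprocessor), ('model', model)])

numeric_cols = ['alcohol']  # illustrative subset of the chemistry columns
pipelines = {
    'rf': make_wine_pipeline(RandomForestClassifier(random_state=0), numeric_cols),
    'gb': make_wine_pipeline(GradientBoostingClassifier(random_state=0), numeric_cols),
    'knn': make_wine_pipeline(KNeighborsClassifier(n_neighbors=3), numeric_cols),
    'lr': make_wine_pipeline(LogisticRegression(max_iter=1000), numeric_cols),
    'nb': make_wine_pipeline(GaussianNB(), numeric_cols),
}

# Quick check on stand-in data.
X = pd.DataFrame({'alcohol': [9.0, 9.5, 12.0, 12.5],
                  'kind': ['red', 'red', 'white', 'white']})
y = [5, 5, 7, 7]
preds = pipelines['nb'].fit(X, y).predict(X)
```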
d) Run grid search on each pipeline except Naive Bayes (just run fit() on it) with
scoring='f1_macro' on the search space of your choosing to find the best
values for the following:
i) Random forest: max_depth
ii) Gradient boosting: max_depth
iii) k-NN: n_neighbors
iv) Logistic regression: C
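The grid-search pattern in step d) might look like the following. The search spaces are hypothetical, and stand-in data from make_classification keeps the sketch runnable; with the pipelines from step c), whose final step is named 'model', parameters are addressed as 'model__<param>':

```python
from sklearn.datasets import make_classification  # stand-in data for the sketch
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical search spaces, one per pipeline (Naive Bayes is excluded and
# would just get fit() called on it).
param_grids = {
    'rf': {'model__max_depth': [4, 8, None]},
    'gb': {'model__max_depth': [2, 3, 4]},
    'knn': {'model__n_neighbors': [5, 10, 15]},
    'lr': {'model__C': [0.1, 1, 10]},
}

# Demo of the pattern on one pipeline with stand-in multiclass data.
X, y = make_classification(
    n_samples=100, n_classes=3, n_informative=4, random_state=0
)
pipe = Pipeline([('scale', StandardScaler()), ('model', KNeighborsClassifier())])
search = GridSearchCV(pipe, param_grids['knn'], scoring='f1_macro', cv=5)
search.fit(X, y)
```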
e) Find the level of agreement between each pair of models using the
cohen_kappa_score() function from the metrics module in
scikit-learn. Note that you can easily get all the pairwise
combinations using the combinations() function from the itertools
module in the Python standard library.
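Step e) can be sketched with hard-coded stand-in predictions; in the exercise, these would come from each fitted model's predict() call on the test set:

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Stand-in predictions for the sketch; in the exercise, one entry per model.
predictions = {
    'rf':  [5, 6, 6, 7, 5],
    'gb':  [5, 6, 6, 7, 6],
    'knn': [5, 5, 6, 7, 5],
}

# combinations() yields each unordered pair of models exactly once.
for name_a, name_b in combinations(predictions, 2):
    kappa = cohen_kappa_score(predictions[name_a], predictions[name_b])
    print(f'{name_a} vs. {name_b}: {kappa:.3f}')
```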
f) Build a voting classifier from the five models using majority rules
(voting='hard'), weighting the Naive Bayes model half as much
as the others.
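A sketch of step f). Plain, untuned models stand in for the tuned pipelines from step d) (in the exercise, you would pass each grid search's best_estimator_), and toy data makes the sketch runnable end to end:

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier, RandomForestClassifier, VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

voter = VotingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(random_state=0)),
        ('gb', GradientBoostingClassifier(random_state=0)),
        ('knn', KNeighborsClassifier(n_neighbors=3)),
        ('lr', LogisticRegression(max_iter=1000)),
        ('nb', GaussianNB()),
    ],
    voting='hard',               # majority rules
    weights=[1, 1, 1, 1, 0.5],   # Naive Bayes counts half as much
)

# Toy data so the sketch runs end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] > 0).astype(int)
voter.fit(X, y)
```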
g) Look at the classification report for your model.
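For step g), with stand-in labels in place of y_test and the voting classifier's test-set predictions:

```python
from sklearn.metrics import classification_report

# Stand-in labels for the sketch; in the exercise these would be y_test and
# voter.predict(X_test).
y_true = [5, 5, 6, 6, 7, 7]
y_pred = [5, 6, 6, 6, 7, 5]

# Per-class precision, recall, and F1, plus macro/weighted averages.
report = classification_report(y_true, y_pred)
print(report)
```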
h) Create a confusion matrix using the confusion_matrix_visual() function
from the ml_utils.classification module.