Can wine type (red or white) help determine the quality score?
a) Using the data/winequality-white.csv and data/winequality-red.csv files, create
a dataframe with the concatenated data and a column indicating which wine type
each row belongs to (red or white).
b) Create a test and training set with 75% of the data in the training set. Stratify
on quality.
c) Build a pipeline using a ColumnTransformer object to standardize the numeric
data while one-hot encoding the wine type column (something like is_red and
is_white, each with binary values), and then train a random forest.
d) Run grid search on your pipeline, with a search space of your choosing,
to find the best value for the random forest's max_depth parameter using
scoring='f1_macro'.
e) Take a look at the feature importances from the random forest.
f) Look at the classification report for your model.
g) Plot a ROC curve for multiclass data using the plot_multiclass_roc()
function from the ml_utils.classification module.
h) Create a confusion matrix using the confusion_matrix_visual() function
from the ml_utils.classification module.