Proctored final exam for the Machine Learning course, which is part of a master's program in business analytics. The exam will last about 2 hours and will cover everything from the semester.
Quiz 2 (Individual) (Remotely Proctored)_ Machine Learning - DAT-5303 - BMSBA1.pdf
4/14/2021 Quiz 2 (Individual) (Remotely Proctored): Machine Learning - DAT-5303 - BMSBA1 https://mycourses.hult.edu/courses/3114676/quizzes/7501015

Quiz 2 (Individual) (Remotely Proctored)
Due: No due date
Points: 50
Questions: 10
Available: Jan 23 at 10:30am - Feb 13 at 2:05am (21 days)
Time Limit: 25 Minutes

Instructions
This quiz was locked Feb 13 at 2:05am.

Attempt History
Attempt 1 (latest): 25 minutes, 20 out of 50

Score for this quiz: 20 out of 50
Submitted Feb 12 at 1:59pm
This attempt took 25 minutes.

Question 1 (5 points): 0 / 5 pts
Which of the following model types are non-parametric (do NOT result in models that have coefficients)? Select all that apply.
k-Nearest Neighbors (KNN) [Correct Answer]
Lasso Regression [You Answered]
Classification Trees [Correct!]
Logistic Regression [You Answered]

Question 2 (5 points): 0 / 5 pts
Below is a list of parameters from scikit-learn’s DecisionTreeClassifier. Which of the following parameters should NOT be optimized using hyperparameter tuning?
Parameters for Decision Tree Classifier
DecisionTreeClassifier(*, criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort='deprecated', ccp_alpha=0.0)
max_depth
min_samples_split [You Answered]
random_state [Correct Answer]
min_impurity_decrease

Question 3 (5 points): 0 / 5 pts
You've just completed developing a classification model that is meant to predict whether or not a customer will buy a product (Yes = 1, No = 0). Your manager asks: “How well did we do in correctly predicting which customers will buy, based on the total number of times we predicted that customers will buy?” In classification modeling, this is a concept known as __________.
Hint: This concept is mathematically defined as follows: TP / (TP + FP), where TP is the number of true positives and FP is the number of false positives.
specificity [You Answered]
precision [Correct Answer]
accuracy
recall (sensitivity)

Question 4 (5 points): 5 / 5 pts
In the confusion matrix, when an event is CORRECTLY predicted to occur, it is known as a __________. In other words, something was predicted to occur, and it did occur.
true negative
true positive [Correct!]
false negative
false positive

Question 5 (5 points): 5 / 5 pts
A doctor is screening patients for cancer (1 = Patient Has Cancer, 0 = Patient Does Not Have Cancer). The cancer screening test has been designed to minimize incorrectly predicting that a patient does not have cancer when they actually do. In effect, the cancer screening test has been designed to minimize the occurrence of __________.
false negatives [Correct!]
true negatives
true positives
false positives

Question 6: 5 / 5 pts
Your friend is using the following machine learning algorithm:
class sklearn.linear_model.LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)
Which of the following problems is your friend most likely trying to address? Assume the algorithm is appropriate for the task at hand.
Trying to predict a continuous response variable; for example predicting the price of a house based on its features.
Trying to recommend a song on Spotify based on user characteristics.
Trying to segment customers using a clustering technique.
Trying to find items/events that often co-occur; for example grocery items that are usually bought together by a customer.
Trying to predict whether or not something will occur; for example a cell is benign or malignant, or whether a customer will churn or not. [Correct!]

Question 7: 0 / 5 pts
If we do not prune a decision tree by restricting its depth or its number of leaf nodes, the tree can become arbitrarily deep and complex. Unpruned trees are therefore prone to ___________________ and not generalizing well on new data.
error [You Answered]
Correct Answers: over-fitting, Over-fitting, overfitting, Overfitting, Over Fitting, OVER FITTING, OVER-FITTING, Over-Fitting, over fitting, OVERFITTING, Over fitting

Question 8: 5 / 5 pts
Which of the following is FALSE concerning the train_test_split method?
You can use integer values to set the precise number of samples you want to use as training and/or testing data.
None of the above [Correct!]
If you specify either the test_size or the train_size, the other argument is inferred. For example, the statement: X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=11, test_size=0.20) specifies that 20% of the data will be allocated to the testing set, so training set is inferred to be 80% of the data.
To specify different splits, you can set the sizes of the testing and training sets with the train_test_split method’s arguments test_size and train_size, respectively. You can use floating-point values from 0.0 through 1.0 to specify the percentages of the data to use for each.
Shuffle and Stratify are two parameters that can be controlled in train_test_split function.

Question 9: 0 / 5 pts
Which of the following statements is FALSE?
Hyperparameters are set after training and fitting your model. [Correct Answer]
In real-world machine learning studies, you will want to use hyperparameter tuning to choose hyperparameter values that produce the best possible predictions.
The k-Nearest Neighbors algorithm can be used for classification modeling.
The k-Nearest Neighbors algorithm can be used to estimate values for a continuous response variable.
The n_neighbors argument in the k-nearest neighbors algorithm is a hyperparameter of the algorithm. [You Answered]

Question 10: 0 / 5 pts
Combining and manipulating existing features in a dataset to create new ones is known as _____________________. This can improve the performance of predictive models on unseen data.
unsupervised learning
confusion matrix analysis
feature engineering [Correct Answer]
hyperparameter tuning [You Answered]
k-fold cross validation

Quiz Score: 20 out of 50

Machine Learning Scripts (Regression Modeling) (1).zip
Machine Learning Scripts (Regression Modeling)/Script 2 - Feature Engineering - Guided.ipynb
{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "run_control": { "frozen": true } }, "source": [ "
\n", "
\n", "\n", "
Script 2 | Feature Engineering
\n", "
DAT-5303 | Machine Learning
\n", "Chase Kusterer - Faculty of Analytics \n", "Hult International Business School \n", "\n", "
\n", "
\n", "\n", " \n", "In this session we will step further into our content on building machine learning models. As we move forward, make sure you are comfortable with the following framework. \n", "\n", "
\n", "\n", "
Basic Course Modeling Strategy for CONTINUOUS Response Variable:\n", "\n", "\n", "1. Prepare for Model Development \n", "Split dataset into training and testing sets \n", "2. Model Development in statsmodels \n", "Experiment with different variable combinations in linear regression (OLS) and analyze results \n", "3. (COMING SOON) Develop Candidate Models \n", "Take the model(s) with the highest predictive power and save their variables as a new dataset \n", "4. (COMING SOON) Prepare for Model Development on New Dataset \n", "Split new dataset into training and testing sets \n", "5. (COMING SOON) Model Tournament \n", "Experiment with different (regression) model types in scikit-learn \n", "\n", "(COMING SOON) The
MUST KNOW workflow of scikit-learn:\n", "* Instantiate\n", "* Fit\n", "* Predict\n", "* Score\n", "\n", " \n", "\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "run_control": { "frozen": true } }, "source": [ "
Part I: Review of Imports and Base Modeling
\n", "Let's warm up by building a simple linear regression (OLS) model to predict the sale price of a home (Sale_Price). This is known as developing a base model. Our first goal is to develop a detailed understanding of our features (i.e. variables) and their relationship with our
response variable (Sale_Price). After engineering a slew of new features, our goal will be to build a new model that predicts better than the base model." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "run_control": { "frozen": true } }, "source": [ "
Challenge 1\n", "Import the following packages:\n", "* pandas (as pd)\n", "* seaborn (as sns)\n", "* matplotlib.pyplot