Please find the instructions for this order in the uploaded file and answer all questions carefully. Give clear explanations for the short-answer questions and submit everything in a Word document. For the coding parts, submit your code as .py files. The uploaded PDF files are for your reference.
1.a. Choose one of the cleaned datasets at https://www.kaggle.com/annavictoria/ml-friendly-public-datasets and split it into training and test data. Write from scratch, and apply to this dataset, any ML algorithm that you have learned; you may implement it in Python. For this implementation you may use classes, modules, and functions from Python libraries such as NumPy for the math and linear algebra operations, but you may not use ML classes or functions directly. Then apply a second algorithm that you have learned to the same dataset; for this one you are free to implement it from scratch or to use ML classes and functions directly from ML packages. Which of the two algorithms fares better? Use as many evaluation metrics as possible to discuss their performance. Put all of your explanations in the Word document.

1.b. Derive an equation for accuracy in terms of specificity and sensitivity. The equation may also include quantities such as the number of true positives or false positives, in addition to accuracy, specificity, and sensitivity. Give an interpretation of the equation.

2.(a) Manually generate the decision tree (as far as possible) for the following subset of a larger dataset using the ID3 algorithm, showing the information-gain computation at each stage.

2.(b) Then generate the decision tree programmatically using Python. Submit the code and the decision tree it produces.

[Dataset for the "Play Tennis" decision tree: see the uploaded file.]

Reference article 1: Regularization of Linear Models with SKLearn
Robert Thas John, Coinmonks (Medium), Jul 11, 2018, 3 min read
https://medium.com/coinmonks/regularization-of-linear-models-with-sklearn-f88633a93a2

[Figure: an overfit model]

Linear models are usually a good starting point for training a model. However, many datasets do not exhibit a linear relationship between the independent variables and the dependent variable, so it is frequently necessary to create a polynomial model instead. Polynomial models, in turn, are prone to overfitting, and one way to reduce overfitting in polynomial models is regularization.

Let's start by building a baseline model to determine the required improvement. We will use the popular Boston Housing dataset, available on Kaggle at https://www.kaggle.com/c/boston-housing. Let's import the necessary libraries and load up our training dataset (regularization_imports.py):

#imports
import numpy as np
import pandas as pd
import math
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

#import training dataset
train_df = pd.read_csv('train.csv', index_col='ID')

#see the columns in our data
train_df.info()

# take a look at the head of the dataset
train_df.head()

Let's split our data into a training set and a validation set. We will hold out 30% of the data for validation and use a fixed random state to make the experiment reproducible (train_test_split_boston.py):

#create our X and y
X = train_df.drop('medv', axis=1)
y = train_df['medv']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.3)

Let's establish a baseline by training a linear regression model (linear_regression_boston.py):

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

print('Training score: {}'.format(lr_model.score(X_train, y_train)))
print('Test score: {}'.format(lr_model.score(X_test, y_test)))

y_pred = lr_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = math.sqrt(mse)
print('RMSE: {}'.format(rmse))

This model should give us a training score and a test score of about 72% (the score here is R², not classification accuracy), and an RMSE of about 4.587. The models we train next should outperform this baseline, with higher scores and a lower RMSE.

We need to engineer new features. Specifically, we will create polynomial features by taking the individual features and raising them to a chosen power; thankfully, scikit-learn has an implementation for this, so we don't need to do it manually. We also want to standardize our data so that every feature is on a comparable scale, which keeps the numbers reasonable when we raise them to a power. Finally, because we need to carry out the same operations on the training, validation, and test sets, we will introduce a pipeline, which lets us apply the same sequence of steps repeatedly. To summarize: we will scale the data, create polynomial features, and then train a linear regression model, as in the sketch below.
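The pipeline gist itself is not reproduced in this extract. A minimal sketch of a pipeline matching the description above is shown here; the step names and the polynomial degree (2) are assumptions rather than values taken from the article, so the exact scores quoted in the next paragraph may differ slightly.

# Illustrative sketch: scale -> polynomial features -> linear regression.
# Assumes X_train, y_train, X_test, y_test from the train/test split above.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression

steps = [
    ('scaler', StandardScaler()),              # standardize each feature
    ('poly', PolynomialFeatures(degree=2)),    # add squared and interaction terms
    ('model', LinearRegression())              # fit a linear model on the expanded features
]

poly_pipeline = Pipeline(steps)
poly_pipeline.fit(X_train, y_train)

print('Training score: {}'.format(poly_pipeline.score(X_train, y_train)))
print('Test score: {}'.format(poly_pipeline.score(X_test, y_test)))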
After running this code, we get a training score of about 94.75% but a test score of only 46.76%. This is a sign of overfitting. Overfitting is normally not desirable, but here it is exactly what we were hoping for, because it gives regularization something to improve. We will now apply regularization to our new data.

l2 Regularization or Ridge Regression

To understand ridge regression, recall what happens during gradient descent, when the model coefficients are trained: at each step the weights are updated according to an update rule that uses a learning rate and a gradient. Ridge regression adds a penalty to this update and, as a result, shrinks the size of the weights. It is implemented in scikit-learn as a class called Ridge. We will create a new pipeline, this time using Ridge, and specify the regularization strength by passing in a parameter, alpha. This can be really small, like 0.1, or as large as you want; the larger the value of alpha, the less variance the model will exhibit. A sketch of such a pipeline follows.
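The Ridge gist is likewise not included in this extract. The sketch below mirrors the pipeline above with Ridge swapped in; the alpha value is an arbitrary illustrative choice rather than the article's setting, so the scores reported in the next paragraph may not be reproduced exactly.

# Illustrative sketch: the same pipeline with Ridge (l2) regularization.
# Assumes X_train, y_train, X_test, y_test from the earlier train/test split.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

ridge_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', Ridge(alpha=10.0))   # larger alpha -> stronger shrinkage, less variance
])
ridge_pipeline.fit(X_train, y_train)

print('Training score: {}'.format(ridge_pipeline.score(X_train, y_train)))
print('Test score: {}'.format(ridge_pipeline.score(X_test, y_test)))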
By executing this code, we should get a training score of about 91.8% and a test score of about 82.87%. That is an improvement on our baseline linear regression model. Let's try something else.

l1 Regularization or Lasso Regression

By creating a polynomial model we created many additional features, and the question we need to ask ourselves is which of those features are relevant to our model and which are not. l1 regularization tries to answer this question by driving the coefficients of the least important features down to exactly 0, which eliminates them from the model. We will create a pipeline similar to the one above, but using Lasso. You can play around with the value of alpha, for example in the range 0.1 to 1. This should give us a training score of about 84.8% and a test score of about 83%, an even better model than the ones we trained earlier. At this point, you can evaluate the model by computing its RMSE as before. Don't forget to read the documentation for everything we used.

I hope you found this tutorial useful. Until next time.

Reference article 2: Ridge and Lasso Regression: L1 and L2 Regularization (A Complete Guide Using Scikit-Learn)
Saptashwa Bhattacharyya, Towards Data Science, Sep 26, 2018, 8 min read
https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b

Moving on from a very important unsupervised learning technique that I discussed last week (https://towardsdatascience.com/dive-into-pca-principal-component-analysis-with-python-43ded13ead21), today we will dig deep into supervised learning through linear regression, specifically two special linear regression models: Lasso and Ridge regression. Since I am using the term "linear", let's first clarify that linear models are among the simplest ways to predict an output as a linear function of the input features:

ŷ = w[0] x[0] + w[1] x[1] + … + w[n] x[n] + b        (1.1)
(Linear model with n features for output prediction)

Equation (1.1) above shows the linear model based on n features. Considering only a single feature, w[0] is the slope and b is the intercept. Linear regression looks for the w and b that minimize the cost function, which can be written as

Cost = Σ_{i=1..M} ( y_i − ŷ_i )²        (1.2)
(Cost function for the simple linear model; a small numerical illustration of computing this cost appears at the end of this introduction.)

In the equation above I have assumed the dataset has M instances and n features. Once we apply linear regression to a dataset divided into training and test sets, calculating the scores on both sets gives a rough idea of whether the model is suffering from over-fitting or under-fitting (and, if you are lucky, the chosen linear model can also be just right). If we have very few features and the score is poor on both the training and the test set, the problem is under-fitting. If, on the other hand, we have a large number of features and the test score is noticeably poorer than the training score, the problem is poor generalization, that is, over-fitting. Ridge and Lasso regression are simple techniques for reducing model complexity and preventing the over-fitting that can result from plain linear regression.
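As a quick numerical illustration of the cost in equation (1.2), the few lines of NumPy below (not from the original article; the data and parameter values are made up) compute it for a tiny dataset and an arbitrary candidate w and b:

import numpy as np

# Tiny made-up dataset: M = 3 instances, n = 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5]])
y = np.array([3.0, 2.5, 4.5])

# Arbitrary candidate parameters
w = np.array([0.8, 0.4])
b = 0.5

y_hat = X @ w + b                 # predictions from the linear model, eq. (1.1)
cost = np.sum((y - y_hat) ** 2)   # sum of squared residuals, eq. (1.2)
print('Cost:', cost)

Linear regression simply searches for the w and b that make this number as small as possible.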
Ridge Regression: In ridge regression, the cost function is altered by adding a penalty equivalent to the square of the magnitude of the coefficients:

Cost = Σ_{i=1..M} ( y_i − ŷ_i )² + λ Σ_{j} w[j]²        (1.3)
(Cost function for ridge regression)

This is equivalent to minimizing the cost function in equation (1.2) under the condition below:

Σ_{j} w[j]² < c, for some constant c
(Supplement 1: constraint on the ridge regression coefficients)

So ridge regression puts a constraint on the coefficients w. The penalty term, λ, regularizes the coefficients: if the coefficients take large values, the optimization function is penalized. Ridge regression therefore shrinks the coefficients, which helps to reduce model complexity and multi-collinearity. Going back to eq. (1.3), one can see that when λ → 0 the cost function becomes the linear regression cost function of eq. (1.2); so the lower the constraint (the smaller λ) on the features, the more closely the model will resemble a plain linear regression model. Let's see an example using the Boston house data; below is the code I used to depict linear regression as
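The extract ends mid-sentence here, and the article's own code is not included. As a stand-in, the sketch below fits plain linear regression and ridge regression on the Boston housing data and prints their test scores for several values of alpha (scikit-learn's name for λ), illustrating the λ → 0 behaviour described above. The data source (the Kaggle train.csv with target medv, as used in the first article) and the alpha values are assumptions, not details taken from this article.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge

# Assumed data source: the Kaggle Boston Housing train.csv used in the first article.
df = pd.read_csv('train.csv', index_col='ID')
X = df.drop('medv', axis=1)
y = df['medv']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.3)

lr = LinearRegression().fit(X_train, y_train)
print('Linear regression test R^2: {:.3f}'.format(lr.score(X_test, y_test)))

# As alpha (lambda) approaches 0, ridge behaves like plain linear regression;
# as alpha grows, the coefficients shrink and the model becomes simpler.
for alpha in (0.001, 1.0, 100.0):
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    print('Ridge alpha={:<7}: test R^2 = {:.3f}, sum of squared coefficients = {:.2f}'.format(
        alpha, ridge.score(X_test, y_test), (ridge.coef_ ** 2).sum()))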