

ICT112 Week 4 Lab
ICT707 Big Data Assignment

Big Data Assignment Marking Criteria

The Big Data Assignment comprises two parts:

· The first part is to create the algorithms in the tasks, namely Decision Tree, Gradient Boosted Tree and Linear regression, and then to apply them to the bike sharing dataset provided. Try to produce the output given in the task sections (also given in the Big-Data Assignment.docx provided on Blackboard).
· The second part is then to use the algorithms created in the first part and apply them to another dataset chosen from Kaggle (other than the bike sharing dataset provided).

Rubric

Model                  Criterion                              Bike sharing [provided]   Student-selected dataset [from Kaggle.com]
Decision Tree          Decision Tree                          5                         5
                       Decision Tree Categorical features     5                         5
                       Decision Tree Log                      5                         5
                       Decision Tree Max Bins                 5                         5
                       Decision Tree Max Depth                5                         5
Gradient Boosted Tree  Gradient Boosted Tree                  5                         5
                       Gradient boost tree iterations         5                         5
                       Gradient boost tree Max Bins           5                         5
Linear regression      Linear regression                      5                         5
                       Cross Validation: Intercept            5                         5
                       Cross Validation: Iterations           5                         5
                       Cross Validation: Step size            5                         5
                       Cross Validation: L1 Regularization    5                         5
                       Cross Validation: L2 Regularization    5                         5
                       Linear regression Log                  5                         5
Subtotal                                                      75                        75
Total mark                                                    150

What needs to be submitted for marking:

For the Decision tree section, a .py or .ipynb file for each of the following:
· Decision Tree
· Decision Tree Categorical features
· Decision Tree Log
· Decision Tree Max Bins
· Decision Tree Max Depth

For the Gradient boost tree section, a .py or .ipynb file for each of the following:
· Gradient boost tree
· Gradient boost tree iterations
· Gradient boost tree Max Bins

For the Linear regression section, a .py or .ipynb file for each of the following:
· Linear regression
· Linear regression Cross Validation
· Intercept
· Iterations
· Step size
· L1 Regularization
· L2 Regularization
· Linear regression Log

Each of the files submitted will be tested with the following datasets:
· bike sharing [which is provided on Blackboard]
· a dataset of the student's choice downloaded from Kaggle.com

[Hint] Write each algorithm so that it can take in a dataset name. For example:

raw_data = sc.textFile("/home/spark/data/hour.csv")

In this manner both datasets can be run with the same files.

Assignment

1. Utilising Python 3, build the following regression models:
   · Decision Tree
   · Gradient Boosted Tree
   · Linear regression
2. Select a dataset (other than the example dataset given in section 3) and apply the Decision Tree and Linear regression models created above. Choose a dataset from Kaggle: https://www.kaggle.com/datasets
3. Build the following in relation to the gradient boost tree and the dataset chosen in step 2:
   a) Gradient boost tree iterations (see Big-Data Assignment.docx section 6.1)
   b) Gradient boost tree Max Bins (see Big-Data Assignment.docx section 7.2)
4. Build the following in relation to the decision tree and the dataset chosen in step 2:
   a) Decision Tree Categorical features
   b) Decision Tree Log (see Big-Data Assignment.docx section 5.4)
   c) Decision Tree Max Bins (see Big-Data Assignment.docx section 7.2)
   d) Decision Tree Max Depth (see Big-Data Assignment.docx section 7.1)
5. Build the following in relation to the linear regression and the dataset chosen in step 2:
   a) Linear regression Cross Validation
      i. Intercept (see Big-Data Assignment.docx section 6.5)
      ii. Iterations (see Big-Data Assignment.docx section 6.1)
      iii. Step size (see Big-Data Assignment.docx section 6.2)
      iv. L1 Regularization (see Big-Data Assignment.docx section 6.4)
      v. L2 Regularization (see Big-Data Assignment.docx section 6.3)
   b) Linear regression Log (see Big-Data Assignment.docx section 5.4)
6. Follow the provided example of the Bike sharing data set and the guidelines in the sections that follow this section to develop the requirements given in steps 1, 3, 4 and 5.

Task 1

Task 1 comprises developing:
1. Decision Tree
   a) Decision Tree Categorical features
   b) Decision Tree Log (see Big-Data Assignment.docx section 5.4)
   c) Decision Tree Max Bins (see Big-Data Assignment.docx section 7.2)
   d) Decision Tree Max Depth (see Big-Data Assignment.docx section 7.1)

The output for this task and all the subtasks is based on the Bike sharing data set as input. Use the Bike sharing data set as input to test that the Decision Tree task and subtasks (i.e. steps 1 and 4 of the assignment) are working and producing the correct output before applying them to your selected data set.

Decision Tree

Output 1:
Feature vector length for categorical features: 57
Feature vector length for numerical features: 4
Total feature vector length: 61
Decision Tree feature vector: [1.0,0.0,1.0,0.0,0.0,6.0,0.0,1.0,0.24,0.2879,0.81,0.0]
Decision Tree feature vector length: 12
Decision Tree predictions: [(16.0, 54.913223140495866), (40.0, 54.913223140495866), (32.0, 53.171052631578945), (13.0, 14.284023668639053), (1.0, 14.284023668639053)]
Decision Tree depth: 5
Decision Tree number of nodes: 63
Decision Tree - Mean Squared Error: 11611.4860
Decision Tree - Mean Absolute Error: 71.1502
Decision Tree - Root Mean Squared Log Error: 0.6251

Output 2:
Decision Tree feature vector: [1.0,0.0,1.0,0.0,0.0,6.0,0.0,1.0,0.24,0.2879,0.81,0.0]
Decision Tree feature vector length: 12
Decision Tree predictions: [(16.0, 54.913223140495866), (40.0, 54.913223140495866), (32.0, 53.171052631578945), (13.0, 14.284023668639053), (1.0, 14.284023668639053)]
Decision Tree depth: 5
Decision Tree number of nodes: 63
Decision Tree - Mean Squared Error: 11611.4860
Decision Tree - Mean Absolute Error: 71.1502
Decision Tree - Root Mean Squared Log Error: 0.6251

Categorical features

Output:
Mapping of first categorical feature column: {'1': 0, '4': 1, '2': 2, '3': 3}
Categorical feature size mapping {0: 5, 1: 3, 2: 13, 3: 25, 4: 3, 5: 8, 6: 3, 7: 5}
Decision Tree Categorical Features - Mean Squared Error: 7912.5642
Decision Tree Categorical Features - Mean Absolute Error: 59.4409
Decision Tree Categorical Features - Root Mean Squared Log Error: 0.6192

Decision Tree Log

Output:
Decision Tree Log - Mean Squared Error: 14781.5760
Decision Tree Log - Mean Absolute Error: 76.4131
Decision Tree Log - Root Mean Squared Log Error: 0.6406

Decision Tree Max Bins

Output:

Decision Tree Max Depth

Output:

Task 2

Task 2 comprises developing:
1. Gradient boost tree
   a) Gradient boost tree iterations (see Big-Data Assignment.docx section 6.1)
   b) Gradient boost tree Max Bins (see Big-Data Assignment.docx section 7.2)
   c) Gradient boost tree Max Depth (see Big-Data Assignment.docx section 7.1)

Gradient Boosted Tree

Output:
GradientBoosted Trees predictions: [(16.0, 103.33972087713495), (40.0, 103.33972087713495), (32.0, 103.33972087713495), (13.0, 103.33972087713495), (1.0, 103.33972087713495)]
Gradient Boosted Trees - Mean Squared Error = 325939579.98366314
Gradient Boosted Trees - Mean Absolute Error = 1845603.969
Gradient Boosted Trees - Mean Root Mean Squared Log Error = 32155.5757154

Gradient boost tree iterations

Output:

Gradient boost tree Max Bins

Output:

Task 3

Task 3 comprises developing:
1. Linear regression model
   a) Linear regression Cross Validation
      i. Intercept (see Big-Data Assignment.docx section 6.5)
      ii. Iterations (see Big-Data Assignment.docx section 6.1)
      iii. Step size (see Big-Data Assignment.docx section 6.2)
      iv. L1 Regularization (see Big-Data Assignment.docx section 6.4)
      v. L2 Regularization (see Big-Data Assignment.docx section 6.3)
   b) Linear regression Log (see Big-Data Assignment.docx section 5.4)

Linear regression model

Output:
Mapping of first categorical feature column: {'1': 0, '4': 1, '2': 2, '3': 3}
Feature vector length for categorical features: 57
Feature vector length for numerical features: 4
Total feature vector length: 61
Linear Model feature vector: [1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.24,0.2879,0.81,0.0]
Linear Model feature vector length: 61

Output 2:
Linear Model predictions: [(16.0, 53.183375554478182), (40.0, 52.572149013454187), (32.0, 52.517786871472346), (13.0, 52.312352839640027), (1.0, 52.285323002218234)]
Linear Regression - Mean Squared Error: 46565.6666
Linear Regression - Mean Absolute Error: 148.3472
Linear Regression - Root Mean Squared Log Error: 1.4284

Linear regression Cross Validation

Output:
Training data size: 13869
Test data size: 3510
Total data size: 17379
Train + Test size : 17379

Intercept

Output:

Iterations

Output:

Step size

Output:

L1 Regularization

Output:

L2 Regularization

Output:

Linear regression Log

Output:
Linear Regression Log - Mean Squared Error: 50685.5559
Linear Regression Log - Mean Absolute Error: 155.2955
Linear Regression Log - Root Mean Squared Log Error: 1.5411

Regression Models

Regression models are concerned with target variables that can take any real value. The underlying principle is to find a model that maps input features to predicted target variables. Regression is also a form of supervised learning. Regression models can be used to predict just about any variable of interest.
A few examples include the following:
· Predicting stock returns and other economic variables
· Predicting loss amounts for loan defaults (this can be combined with a classification model that predicts the probability of default, while the regression model predicts the amount in the case of a default)
· Recommendations (the Alternating Least Squares factorization model from Chapter 5, Building a Recommendation Engine with Spark, uses linear regression in each iteration)
· Predicting customer lifetime value (CLTV) in a retail, mobile, or other business, based on user behavior and spending patterns

In the different sections of this chapter, we will do the following:
· Introduce the various types of regression models available in ML
· Explore feature extraction and target variable transformation for regression models
· Train a number of regression models using ML
· See how to make predictions using the trained model
· Investigate the impact on performance of various parameter settings for regression using cross-validation

Types of regression models

The core idea of linear models (or generalized linear models) is that we model the predicted outcome of interest (often called the target or dependent variable) as a function of a simple linear predictor applied to the input variables (also referred to as
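The expected outputs for Tasks 1 to 3 report three error measures: mean squared error, mean absolute error and root mean squared log error. The "Log" variants (Big-Data Assignment.docx section 5.4) train on a log-transformed target. The following is a minimal plain-Python sketch of these measures, written without Spark so it runs anywhere. It rests on three assumptions:

· Predictions are available as (actual, predicted) pairs, in the same order as the "predictions" lines above.
· RMSLE uses log(x + 1).
· The Log variants train on log(y) and map predictions back with exp(). This is a common approach, but the assignment text does not confirm it.

The helper names are illustrative only. In PySpark the same per-pair formulas can be applied with an RDD `map` followed by `mean()`.

```python
import math
from typing import Iterable, List, Tuple

Pair = Tuple[float, float]  # (actual, predicted), the order used in the "predictions" outputs


def mse(pairs: Iterable[Pair]) -> float:
    """Mean squared error."""
    errs = [(pred - actual) ** 2 for actual, pred in pairs]
    return sum(errs) / len(errs)


def mae(pairs: Iterable[Pair]) -> float:
    """Mean absolute error."""
    errs = [abs(pred - actual) for actual, pred in pairs]
    return sum(errs) / len(errs)


def rmsle(pairs: Iterable[Pair]) -> float:
    """Root mean squared log error, using log(x + 1) so zero values are allowed.

    Assumes every actual and predicted value is greater than -1.
    """
    errs = [(math.log1p(pred) - math.log1p(actual)) ** 2 for actual, pred in pairs]
    return math.sqrt(sum(errs) / len(errs))


def to_log_target(labels: Iterable[float]) -> List[float]:
    """Log-transform strictly positive targets before training (assumed 'Log' approach)."""
    return [math.log(y) for y in labels]


def from_log_scale(log_pairs: Iterable[Pair]) -> List[Pair]:
    """Map (actual, predicted) pairs on the log scale back to the original scale."""
    return [(math.exp(a), math.exp(p)) for a, p in log_pairs]


if __name__ == "__main__":
    # first five (actual, predicted) pairs from the Decision Tree output, rounded
    pairs = [(16.0, 54.91), (40.0, 54.91), (32.0, 53.17), (13.0, 14.28), (1.0, 14.28)]
    print("MSE:   %.4f" % mse(pairs))
    print("MAE:   %.4f" % mae(pairs))
    print("RMSLE: %.4f" % rmsle(pairs))
```

The five rounded pairs are only a smoke test. The metric values in the expected outputs are computed over the whole dataset, so they will not match these numbers.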
Answered Same Day · Sep 22, 2020 · ICT707 · University of the Sunshine Coast


Akash answered on Oct 08 2020
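The answer's notebook builds its binary (one-hot) feature vectors with `get_mapping` and `extract_feat`. It treats column indices 2 to 9 of the bike-sharing records as categorical and 10 to 13 as numeric. The assignment hints that one file should work for both datasets, so here is a plain-Python sketch (not the notebook's code) that takes the column indices as parameters instead of hard-coding them. `build_encoder` and its arguments are hypothetical names. The first sample row is the first record printed in the notebook; the second is made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Sequence[str]  # one CSV row, already split on ","


def get_mapping(records: Sequence[Record], idx: int) -> Dict[str, int]:
    """Assign each distinct value in column `idx` a 0-based index.

    Indices follow first appearance; Spark's distinct() + zipWithIndex()
    gives no such ordering guarantee, so the numbering may differ there.
    """
    mapping: Dict[str, int] = {}
    for rec in records:
        mapping.setdefault(rec[idx], len(mapping))
    return mapping


def build_encoder(
    records: Sequence[Record],
    cat_cols: Sequence[int],
    num_cols: Sequence[int],
) -> Tuple[Callable[[Record], List[float]], int]:
    """Return (encode, categorical_length) for the given column layout."""
    mappings = [get_mapping(records, i) for i in cat_cols]
    cat_len = sum(len(m) for m in mappings)

    def encode(rec: Record) -> List[float]:
        vec = [0.0] * cat_len
        offset = 0  # start of the current column's block in the one-hot part
        for col, mapping in zip(cat_cols, mappings):
            vec[offset + mapping[rec[col]]] = 1.0
            offset += len(mapping)
        # numeric columns are appended unchanged, converted to float
        return vec + [float(rec[c]) for c in num_cols]

    return encode, cat_len


if __name__ == "__main__":
    rows = [
        # first record of the bike-sharing data, as printed in the notebook
        "1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0,3,13,16".split(","),
        # hypothetical second record, made up for illustration
        "2,2011-01-01,1,0,1,1,0,6,0,2,0.22,0.27,0.80,0,8,32,40".split(","),
    ]
    encode, cat_len = build_encoder(rows, cat_cols=range(2, 10), num_cols=range(10, 14))
    print("categorical length:", cat_len)
    print("first vector:", encode(rows[0]))
```

A value that was not seen while the mappings were built raises `KeyError`, the same behaviour as the dictionary lookup in `extract_feat`. For a Kaggle dataset, build the mappings from the full data, as the notebook does, or reserve an explicit "unknown" slot per column.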
assign2/.ipynb_checkpoints/bike-checkpoint.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sc"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"path = \"/Users/priya/Desktop/Bike-Sharing-Dataset/bike.csv\"\n",
"data_df = sc.textFile(path)\n",
"data_count= data_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['1', '2011-01-01', '1', '0', '1', '0', '0', '6', '0', '1', '0.24', '0.2879', '0.81', '0', '3', '13', '16']\n"
]
}
],
"source": [
"data_rec = data_df.map(lambda x: x.split(\",\"))\n",
"first = data_rec.first()\n",
"print (first)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"17379\n"
]
}
],
"source": [
"print (data_count)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# We now have 17379 hourly records; the header row was already removed with a Unix command"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The command used was `sed 1d hour.csv > new_hour.csv`. We will ignore the record ID and raw date columns.\n",
"We will also ignore the casual and registered count target variables and focus on the\n",
"overall count variable, cnt (which is the sum of the other two counts)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The command below caches the parsed records so they can be reused."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PythonRDD[6] at RDD at PythonRDD.scala:48"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_rec.cache()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we convert each categorical variable into binary-vector (one-hot) form.\n",
"Let's define a function that extracts the value-to-index mapping from our dataset for a given column:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def get_mapping(rdd, idx):\n",
" return rdd.map(lambda fields: fields[idx]).distinct().zipWithIndex().collectAsMap()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function above first maps each record to the value in the given column and keeps the distinct values,\n",
"then uses the zipWithIndex transformation to build a key-value RDD\n",
"in which the key is the variable's value and the value is its index.\n",
"We can test our function on the third variable column (index 2)\n",
"by passing the parsed records RDD and the column index 2:\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mapping of first categorical feature column: {'1': 0, '4': 1, '2': 2, '3': 3}\n"
]
}
],
"source": [
"print(\"Mapping of first categorical feature column: %s\" % get_mapping(data_rec, 2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We apply the function above to each categorical column,\n",
"i.e. column indices 2 to 9 (inclusive)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"mappings = [get_mapping(data_rec, i) for i in range(2,10)]\n",
"catagorical_len = sum(map(len, mappings))\n",
"# numeric features: temp, atemp, hum, windspeed (column indices 10-13)\n",
"num_len = len(data_rec.first()[10:14])\n",
"total_length = num_len + catagorical_len"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now have a mapping for each variable,\n",
"and we can see how many values in total we need\n",
"for our binary vector representation:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature vector length for categorical features: 57\n",
"Feature vector length for numerical features: 4\n",
"Total feature vector length: 61\n"
]
}
],
"source": [
"print (\"Feature vector length for categorical features: %d\" % catagorical_len)\n",
"print (\"Feature vector length for numerical features: %d\" % num_len)\n",
"print (\"Total feature vector length: %d\" % total_length)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating the feature vector for the linear model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we use the mappings to convert the categorical features into binary-encoded features.\n",
"We import NumPy for linear algebra utilities and the MLlib LabeledPoint class to wrap our feature vectors and target variables."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.regression import LabeledPoint\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
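"def extract_feat(record):\n",
"    # one-hot encode the categorical columns (indices 2-9) using the precomputed mappings\n",
"    catagorical_vector = np.zeros(catagorical_len)\n",
"    j = 0\n",
"    steps = 0  # offset of the current column's block within the one-hot vector\n",
"    for fields in record[2:10]:  # all 8 categorical columns; record[2:9] would skip the last one (weathersit)\n",
"        mapp = mappings[j]\n",
"        idx = mapp[fields]\n",
"        catagorical_vector[idx + steps] = 1\n",
"        j = j + 1\n",
"        steps = steps + len(mapp)\n",
"    # numeric columns temp, atemp, hum, windspeed (indices 10-13) are used as-is\n",
"    number_vector = np.array([float(field) for field in record[10:14]])\n",
"    return np.concatenate((catagorical_vector, number_vector))"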
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def ex_label(record):\n",
" return float(record[-1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the extract_feat function, we iterate through each categorical column in the row of data.\n",
"At each step we look up the binary encoding for that variable\n",
"in the mappings we created previously.\n",
"The steps variable ensures that the nonzero feature index in the full feature vector is correct\n",
"(and is somewhat more efficient than, say, creating many smaller binary vectors and\n",
"concatenating them). The numeric vector is created directly by first converting the data\n",
"to floating-point numbers and wrapping these in a NumPy array. The resulting two vectors\n",
"are then concatenated. The ex_label function simply converts the last column variable\n",
"(the count) into a float. With our utility functions defined, we can proceed with extracting\n",
"feature vectors and labels from our data records:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"data = data_rec.map(lambda r: LabeledPoint(ex_label(r), extract_feat(r)))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Raw data: ['1', '0', '1', '0', '0', '6', '0', '1', '0.24', '0.2879', '0.81', '0', '3', '13', '16']\n",
"Label: 16.0\n",
"Linear Model feature vector:\n",
"[1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.24,0.2879,0.81,0.0]\n",
"Linear Model feature vector length: 61\n"
]
}
],
"source": [
"first_point = data.first()\n",
"print (\"Raw data: \" + str(first[2:]))\n",
"print (\"Label: \" + str(first_point.label))\n",
"print (\"Linear Model feature vector:\\n\" + str(first_point.features))\n",
"print (\"Linear Model feature vector length: \" + str(len(first_point.features)))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.regression import LinearRegressionWithSGD"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
}
],
"source": [
"linear_model = LinearRegressionWithSGD.train(data, iterations=10,step=0.1, intercept=False)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear Model predictions: [(16.0, 117.89250386724846), (40.0, 116.2249612319211), (32.0, 116.02369145779235), (13.0, 115.67088016754433), (1.0, 115.56315650834317)]\n"
]
}
],
"source": [
"true_vs_predicted = data.map(lambda p: (p.label, linear_model.predict(p.features)))\n",
"print (\"Linear Model predictions: \" + str(true_vs_predicted.take(5)))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear Model - Mean Squared Error: 30679.4539\n"
]
}
],
"source": [
"# mean squared error over all (actual, predicted) pairs, computed on the cluster\n",
"mse = true_vs_predicted.map(lambda tp: (tp[1] - tp[0]) ** 2).mean()\n",
"print (\"Linear Model - Mean Squared Error: %2.4f\" % mse)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"targets = data_rec.map(lambda r: float(r[-1])).collect()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"import pylab"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/anaconda3/lib/python3.6/site-packages/IPython/core/magics/pylab.py:160: UserWarning: pylab import has clobbered these variables: ['mean', 'pylab']\n",
"`%matplotlib` prevents importing * from pylab and numpy\n",
" \"\\n`%matplotlib` prevents importing * from pylab and numpy\"\n"
]
}
],
"source": [
"%pylab inline"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"hist(targets, bins=45, color='lightblue', density=True)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"\n",
"fig.set_size_inches(17, 11)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA6sAAAJCCAYAAAAm3lF7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3X+s3fdd3/HXG5uUUSgUYk1enBAXjEX4oYZd0k0VZaJp6woUV1Er3IkpTJWyTs0o6qaRDpRqQZVKkfjxRxiNWqPCKF5pjWQhs6yjLRtiob5pC51T7uqY0tzZWw3p6Dogwel7f9wTdHpzHR/XJz4fn/t4SFbO93s+32/eV0dV+vT3e763ujsAAAAwkq9Y9AAAAACwmVgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABjOzkUPsNm1117bN95446LHAAAA4Fnw0EMP/Vl377rYuuFi9cYbb8zq6uqixwAAAOBZUFV/Oss6twEDAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcGaK1ao6UFVrVXWqqu5+hnWvrqquqpWpfW+eHLdWVa+Yx9AAAAAst50XW1BVO5Lcl+RlSdaTnKiqY9398KZ1X5vkR5P8wdS+m5IcSvLtSf5ekv9cVd/a3U/O70cAAABg2cxyZfWWJKe6+3R3P5HkSJKDW6z7qSRvT/LXU/sOJjnS3Y93958kOTU5HwAAAFzQLLF6XZJHp7bXJ/v+VlXdnOT67v6tSz0WAAAANpslVmuLff23b1Z9RZKfS/IvL/XYqXPcWVWrVbV67ty5GUYCAABgmc0Sq+tJrp/a3pPkzNT21yb5jiQfrqpPJ/kHSY5NHrJ0sWOTJN19f3evdPfKrl27Lu0nAAAAYOnMEqsnkuyrqr1VdU02Hph07Kk3u/svuvva7r6xu29M8mCS27p7dbLuUFU9p6r2JtmX5CNz/ykAAABYKhd9GnB3n6+qu5I8kGRHksPdfbKq7k2y2t3HnuHYk1X13iQPJzmf5A2eBAwAAMDFVPfTvkK6UCsrK726urroMQAAAHgWVNVD3b1ysXWz3AYMAAAAV9RFbwMGAIBldHTt7FzPd/v+3XM9H2x3rqwCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMPZuegBAABgFkfXzi56BOAKcmUVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhrNz0QMAAMAyOLp2du7nvH3/7rmfE64WrqwCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxnplitqgNVtVZVp6rq7i3ef31VfaKqPl5Vv1dVN03231hVfzXZ//Gq+qV5/wAAAAAsn50XW1BVO5Lcl+RlSdaTnKiqY9398NSy93T3L03W35bkZ5McmLz3SHe/cL5jAwAAsMxmubJ6S5JT3X26u59IciTJwekF3f35qc3nJun5jQgAAMB2M0usXpfk0ant9cm+L1FVb6iqR5K8PcmPTr21t6o+VlW/W1Xfu9W/oKrurKrVqlo9d+7cJYwPAADAMpolVmuLfU
+7ctrd93X3Nyf58SQ/Odl9NskN3X1zkjcleU9VPW+LY+/v7pXuXtm1a9fs0wMAALCUZonV9STXT23vSXLmGdYfSfKqJOnux7v7zyevH0rySJJv/fJGBQAAYLuYJVZPJNlXVXur6pokh5Icm15QVfumNn8gyacm+3dNHtCUqnpBkn1JTs9jcAAAAJbXRZ8G3N3nq+quJA8k2ZHkcHefrKp7k6x297Ekd1XVrUn+JsnnktwxOfwlSe6tqvNJnkzy+u5+7Nn4QQAAAFgeF43VJOnu40mOb9p3z9TrN17guPcnef/lDAgAAMD2M8ttwAAAAHBFiVUAAACGI1YBAAAYjlgFAABgODM9YAkAALjyjq6dnev5bt+/e67ng2eTK6sAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADMfTgAEAmLt5P8UW2H5cWQUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDh7Fz0AAAAwJVxdO3sXM93+/7dcz0fTHNlFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGM1OsVtWBqlqrqlNVdfcW77++qj5RVR+vqt+rqpum3nvz5Li1qnrFPIcHAABgOV00VqtqR5L7krwyyU1JXjsdoxPv6e7v7O4XJnl7kp+dHHtTkkNJvj3JgSS/ODkfAAAAXNDOGdbckuRUd59Okqo6kuRgkoefWtDdn59a/9wkPXl9MMmR7n48yZ9U1anJ+f7bHGYHAGBOjq6dXfQIAF9illi9LsmjU9vrSV60eVFVvSHJm5Jck+T7p459cNOx131ZkwIAALBtzPKd1dpiXz9tR/d93f3NSX48yU9eyrFVdWdVrVbV6rlz52YYCQAAgGU2S6yuJ7l+antPkjPPsP5IklddyrHdfX93r3T3yq5du2YYCQAAgGU2S6yeSLKvqvZW1TXZeGDSsekFVbVvavMHknxq8vpYkkNV9Zyq2ptkX5KPXP7YAAAALLOLfme1u89X1V1JHkiyI8nh7j5ZVfcmWe3uY0nuqqpbk/xNks8luWNy7Mmqem82HsZ0PskbuvvJZ+lnAQAAYEnM8oCldPfxJMc37btn6vUbn+HYtyZ565c7IAAAANvPLLcBAwAAwBUlVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDh7Fz0AAAAXLqja2cXPQLAs8qVVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhzBSrVXWgqtaq6lRV3b3F+2+qqoer6o+q6neq6pum3nuyqj4++XNsnsMDAACwnHZebEFV7UhyX5KXJVlPcqKqjnX3w1PLPpZkpbv/sqr+eZK3J/
mhyXt/1d0vnPPcAAAALLFZrqzekuRUd5/u7ieSHElycHpBd3+ou/9ysvlgkj3zHRMAAIDtZJZYvS7Jo1Pb65N9F/K6JL89tf1VVbVaVQ9W1au2OqCq7pysWT137twMIwEAALDMLnobcJLaYl9vubDqh5OsJPm+qd03dPeZqnpBkg9W1Se6+5EvOVn3/UnuT5KVlZUtzw0AAMD2McuV1fUk109t70lyZvOiqro1yU8kua27H39qf3efmfzzdJIPJ7n5MuYFAABgG5glVk8k2VdVe6vqmiSHknzJU32r6uYk78hGqH52av/zq+o5k9fXJnlxkukHMwEAAMDTXPQ24O4+X1V3JXkgyY4kh7v7ZFXdm2S1u48l+ZkkX5PkN6oqST7T3bcl+bYk76iqL2YjjN+26SnCAAAA8DSzfGc13X08yfFN++6Zen3rBY77/STfeTkDAgAAsP3MchswAAAAXFEzXVkFAADY7Oja2bmf8/b9u+d+Tq5OrqwCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAw9m56AEAALaDo2tnFz0CwFXFlVUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIazc9EDAAAAPOXo2tm5nu/2/bvnej6uHFdWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYzU6xW1YGqWquqU1V19xbvv6mqHq6qP6qq36mqb5p6746q+tTkzx3zHB4AAIDldNFYraodSe5L8sokNyV5bVXdtGnZx5KsdPd3JXlfkrdPjv2GJG9J8qIktyR5S1U9f37jAwAAsIxmubJ6S5JT3X26u59IciTJwekF3f2h7v7LyeaDSfZMXr8iyQe6+7Hu/lySDyQ5MJ/RAQAAWFazxOp1SR6d2l6f7LuQ1yX57S/zWAAAAMjOGdbUFvt6y4VVP5xkJcn3XcqxVXVnkjuT5IYbbphhJAAAAJbZLFdW15NcP7W9J8mZzYuq6tYkP5Hktu5+/FKO7e77u3ulu1d27do16+wAAAAsqVli9USSfVW1t6quSXIoybHpBVV1c5J3ZCNUPzv11gNJXl5Vz588WOnlk30AAABwQRe9Dbi7z1fVXdmIzB1JDnf3yaq6N8lqdx9L8jNJvibJb1RVknymu2/r7seq6qeyEbxJcm93P/as/CQAAAAsjVm+s5ruPp7k+KZ990y9vvUZjj2c5PCXOyAAAADbzyy3AQMAAMAVJVYBAAAYjlgFAABgOGIVAACA4YhVAAAAhjPT04ABALaTo2tnFz0CwLbnyioAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMPZuegBAAAu19G1s4seAYA5c2UVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAA
AAhiNWAQAAGI5YBQAAYDhiFQAAgOHsnGVRVR1I8gtJdiR5Z3e/bdP7L0ny80m+K8mh7n7f1HtPJvnEZPMz3X3bPAYHAK5eR9fOLnoEAAZ30Vitqh1J7kvysiTrSU5U1bHufnhq2WeS/EiSf7XFKf6qu184h1kBAADYJma5snpLklPdfTpJqupIkoNJ/jZWu/vTk/e++CzMCAAAwDYzy3dWr0vy6NT2+mTfrL6qqlar6sGqetVWC6rqzsma1XPnzl3CqQEAAFhGs8RqbbGvL+HfcUN3ryT5x0l+vqq++Wkn676/u1e6e2XXrl2XcGoAAACW0Syxup7k+qntPUnOzPov6O4zk3+eTvLhJDdfwnwAAABsQ7PE6okk+6pqb1Vdk+RQkmOznLyqnl9Vz5m8vjbJizP1XVcAAADYykVjtbvPJ7kryQNJPpnkvd19sqrurarbkqSqvqeq1pO8Jsk7qurk5PBvS7JaVX+Y5ENJ3rbpKcIAAADwNDP9ntXuPp7k+KZ990y9PpGN24M3H/f7Sb7zMmcEAABgm5nlNmAAAAC4osQqAAAAwxGrAAAADEesAgAAMJyZHrAEAABwNTq6dnau57t9/+65no8Lc2UVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4exc9AAAwNiOrp1d9AgAbEOurAIAADAcsQoAAMBwxCoAAADD8Z1VAFiweX8n9Pb9u+d6PgBYBFdWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYzs5FDwAAV5uja2cXPcIzGn0+AJiFK6sAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMJydix4AAJ5NR9fOLnoEAODL4MoqAAAAw3FlFYChuBIKACSurAIAADAgV1aBbW3eV/Fu3797rucDANiuZrqyWlUHqmqtqk5V1d1bvP+SqvpoVZ2vqldveu+OqvrU5M8d8xocAACA5XXRWK2qHUnuS/LKJDcleW1V3bRp2WeS/EiS92w69huSvCXJi5LckuQtVfX8yx8bAACAZTbLldVbkpzq7tPd/USSI0kOTi/o7k939x8l+eKmY1+R5APd/Vh3fy7JB5IcmMPcAAAALLFZvrN6XZJHp7bXs3GldBZbHXvdjMcCXHV8BxYAYD5mubJaW+zrGc8/07FVdWdVrVbV6rlz52Y8NQAAAMtqllhdT3L91PaeJGdmPP9Mx3b3/d290t0ru3btmvHUAAAALKtZYvVEkn1VtbeqrklyKMmxGc//QJKXV9XzJw9WevlkHwAAAFzQRWO1u88nuSsbkfnJJO/t7pNVdW9V3ZYkVfU9VbWe5DVJ3lFVJyfHPpbkp7IRvCeS3DvZBwAAABc0ywOW0t3HkxzftO+eqdcnsnGL71bHHk5y+DJmBAAAYJuZ5TZgAAAAuKJmurIKABcy71/XAwCQuLIKAADAgMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADD2bnoAYBxHF07O9fz3b5/91zPBwDA9uHKKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAw9m56AGA5XV07excz3f7/t1zPR8AAOMSqwADE/wAwHblNmAAAACG48oqXKXmfcUNAICLc9fTlePKKgAAAMMRqwAAAAzHbcBwAW7xAACAxXFlFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACG41fXsBB+LQwAAP
BMXFkFAABgOGIVAACA4YhVAAAAhuM7q8BVY97fdQYAYFyurAIAADAcsQoAAMBwxCoAAADDEasAAAAMxwOWALYRD6kCAK4WrqwCAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMZ+csi6rqQJJfSLIjyTu7+22b3n9Okl9J8veT/HmSH+ruT1fVjUk+mWRtsvTB7n79fEZfnKNrZ+d6vtv3757r+QAAAK52F43VqtqR5L4kL0uynuREVR3r7oenlr0uyee6+1uq6lCSn07yQ5P3HunuF855bgAAAJbYLLcB35LkVHef7u4nkhxJcnDTmoNJ3j15/b4kL62qmt+YAAAAbCezxOp1SR6d2l6f7NtyTXefT/IXSb5x8t7eqvpYVf1uVX3vVv+CqrqzqlaravXcuXOX9AMAAACwfGaJ1a2ukPaMa84muaG7b07ypiTvqarnPW1h9/3dvdLdK7t27ZphJAAAAJbZLLG6nuT6qe09Sc5caE1V7UzydUke6+7Hu/vPk6S7H0rySJJvvdyhAQAAWG6zxOqJJPuqam9VXZPkUJJjm9YcS3LH5PWrk3ywu7uqdk0e0JSqekGSfUlOz2d0AAAAltVFnwbc3eer6q4kD2TjV9cc7u6TVXVvktXuPpbkXUl+tapOJXksG0GbJC9Jcm9VnU/yZJLXd/djz8YPAgAAwPKY6fesdvfxJMc37btn6vVfJ3nNFse9P8n7L3NGAAAAtplZbgMGAACAK0qsAgAAMByxCgAAwHDEKgAAAMMRqwAAAAxHrAIAADAcsQoAAMBwxCoAAADDEasAAAAMR6wCAAAwHLEKAADAcMQqAAAAwxGrAAAADGfnogcAAADYro6unZ37OW/fv3vu51wEV1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDhiFQAAgOGIVQAAAIYjVgEAABiOWAUAAGA4YhUAAIDhiFUAAACGI1YBAAAYjlgFAABgOGIVAACA4YhVAAAAhiNWAQAAGI5YBQAAYDgzxWpVHaiqtao6VVV3b/H+c6rqP0ze/4OqunHqvTdP9q9V1SvmNzoAAADL6qKxWlU7ktyX5JVJbkry2qq6adOy1yX5XHd/S5KfS/LTk2NvSnIoybcnOZDkFyfnAwAAgAua5crqLUlOdffp7n4iyZEkBzetOZjk3ZPX70vy0qqqyf4j3f14d/9JklOT8wEAAMAFzRKr1yV5dGp7fbJvyzXdfT7JXyT5xhmPBQAAgC+xc4Y1tcW+nnHNLMemqu5Mcudk8wtVtTbDXIt0bZI/W/QQzJ3PdTn5XJeTz3U5+VyXj890Oflcl9OV/Fy/aZZFs8TqepLrp7b3JDlzgTXrVbUzydcleWzGY9Pd9ye5f5aBR1BVq929sug5mC+f63LyuS4nn+ty8rkuH5/pcvK5LqcRP9dZbgM+kWRfVe2tqmuy8cCkY5vWHEtyx+T1q5N8sLt7sv/Q5GnBe5PsS/KR+YwOAADAsrroldXuPl9VdyV5IMmOJIe7+2RV3ZtktbuPJXlXkl+tqlPZuKJ6aHLsyap6b5KHk5xP8obufvJZ+lkAAABYErPcBpzuPp7k+KZ990y9/uskr7nAsW9N8tbLmHFEV80ty1wSn+ty8rkuJ5/rcvK5Lh+f6XLyuS6n4T7X2rhbFwAAAMYxy3dWAQAA4IoSq5eoqg5U1VpVnaqquxc9D5evqg5X1Wer6r8vehbmo6qur6oPVdUnq+
pkVb1x0TNx+arqq6rqI1X1h5PP9d8ueibmp6p2VNXHquq3Fj0L81FVn66qT1TVx6tqddHzMB9V9fVV9b6q+uPJf2f/4aJn4vJU1f7J/06f+vP5qvqxRc+VuA34klTVjiT/I8nLsvFreU4keW13P7zQwbgsVfWSJF9I8ivd/R2LnofLV1W7k+zu7o9W1dcmeSjJq/xv9epWVZXkud39har6yiS/l+SN3f3ggkdjDqrqTUlWkjyvu39w0fNw+arq00lWutvv41wiVfXuJP+1u985+U0hX93d/2fRczEfk975n0le1N1/uuh5XFm9NLckOdXdp7v7iSRHkhxc8Excpu7+L9l4ijVLorvPdvdHJ6//b5JPJrlusVNxuXrDFyabXzn5429cl0BV7UnyA0neuehZgAurqucleUk2fhNIuvsJobp0XprkkRFCNRGrl+q6JI9Oba/H/wGGoVXVjUluTvIHi52EeZjcKvrxJJ9N8oHu9rkuh59P8q+TfHHRgzBXneQ/Vd
VDVXXnoodhLl6Q5FySX57ctv/Oqnruoodirg4l+fVFD/EUsXppaot9/lYfBlVVX5Pk/Ul+rLs/v+h5uHzd/WR3vzDJniS3VJVb969yVfWDST7b3Q8tehbm7sXd/d1JXpnkDZOv3XB125nku5P8u+6+Ocn/S+IZLkticlv3bUl+Y9GzPEWsXpr1JNdPbe9JcmZBswDPYPKdxvcn+bXuPrroeZivyW1nH05yYMGjcPlenOS2yfcbjyT5/qr694sdiXno7jOTf342yW9m4+tUXN3Wk6xP3dXyvmzEK8vhlUk+2t3/e9GDPEWsXpoTSfZV1d7J3zwcSnJswTMBm0wexPOuJJ/s7p9d9DzMR1Xtqqqvn7z+O0luTfLHi52Ky9Xdb+7uPd19Yzb+u/rB7v7hBY/FZaqq504ecJfJbaIvT+Kp+1e57v5fSR6tqv2TXS9N4uGFy+O1GegW4GTjUj4z6u7zVXVXkgeS7EhyuLtPLngsLlNV/XqSf5Tk2qpaT/KW7n7XYqfiMr04yT9J8onJ9xuT5N909/EFzsTl253k3ZMnFX5Fkvd2t19zAmP6u0l+c+PvDrMzyXu6+z8udiTm5F8k+bXJhZvTSf7pgudhDqoF1kOPAAAATElEQVTqq7PxG0/+2aJnmeZX1wAAADActwEDAAAwHLEKAADAcMQqAAAAwxGrAAAADEesAgAAMByxCgAAwHDEKgAAAMMRqwAAAAzn/wPfSjq58IpfOgAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"log_targets = data_rec.map(lambda r: np.log(float(r[-1]))).collect()\n",
"\n",
"hist(log_targets, bins=40, color='lightblue', density=True)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"\n",
"fig.set_size_inches(16, 10)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"data_log = data.map(lambda lp: LabeledPoint(np.log(lp.label), lp.features))\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
}
],
"source": [
"model_log = LinearRegressionWithSGD.train(data_log, iterations=10, step=0.1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because we have transformed the target variable, the model's predictions will be on the log scale,\n",
"as will the target values of the transformed dataset. Therefore, to use the model and\n",
"evaluate its performance, we must first transform the log data back to the original scale by\n",
"taking the exponent of both the predicted and true values with NumPy's `exp` function,\n",
"as shown in the code below:"
]
},
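{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of this round trip (the counts below are made-up illustrative values, not taken from the bike sharing data), `np.exp` undoes `np.log`, so values modelled on the log scale can be mapped back to the original count scale. Floating-point rounding means the round trip is not always exact (for example, 16.0 comes back as 15.999999999999998 in the output further down), so the check uses `np.allclose`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical hourly rental counts (illustrative only)\n",
"counts = np.array([16.0, 40.0, 32.0])\n",
"\n",
"# A model trained on log(count) produces predictions on the log scale\n",
"log_counts = np.log(counts)\n",
"\n",
"# Taking the exponent maps log-scale values back to counts\n",
"recovered = np.exp(log_counts)\n",
"\n",
"print(np.allclose(recovered, counts))"
]
},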
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"true_vs_predicted_log = data_log.map(lambda p: (np.exp(p.label), np.exp(model_log.predict(p.features))))"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"17379\n",
"log - Mean Squared Error: 50685.5559\n",
"log - Mean Absolute Error: 155.2955\n",
"Root Mean Squared Log Error: 1.5411\n"
]
}
],
"source": [
"# Accumulate squared, absolute and squared-log errors over all (true, predicted) pairs\n",
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_log.collect():\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"# Mean squared error, mean absolute error and root mean squared log error\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"log - Mean Squared Error: %2.4f\" % t)\n",
"print(\"log - Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Non log-transformed predictions:\n",
"[(16.0, 117.89250386724846), (40.0, 116.2249612319211), (32.0, 116.02369145779235)]\n",
"Log-transformed predictions:\n",
"[(15.999999999999998, 28.080291845456212), (40.0, 26.959480191001784), (32.0, 26.65472562945802)]\n"
]
}
],
"source": [
"print (\"Non log-transformed predictions:\\n\" + str(true_vs_predicted.take(3)))\n",
"\n",
"print (\"Log-transformed predictions:\\n\" + str(true_vs_predicted_log.take(3)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tuning model parameters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To tune model parameters, we need to evaluate each setting on data the model was not trained on. One relatively easy way to do this is to take a random sample of, say, 20 percent of the data as the test set, and to define the training set as the elements of the original RDD that are not in the test set RDD."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Splitting the data into training and test sets for cross-validation (here, a single 80/20 hold-out split)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"train, test = data.randomSplit([0.8, 0.2], seed=12345)"
]
},
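{
"cell_type": "markdown",
"metadata": {},
"source": [
"`randomSplit` performs the 80/20 split in a single call. The sample-then-remainder approach described above can also be sketched with standard RDD operations. This is a sketch under assumptions: it reuses the `data` RDD of `LabeledPoint`s defined earlier, the names `train_alt` and `test_alt` are hypothetical (chosen so they do not overwrite `train` and `test`), and records are keyed by index so the subtraction does not rely on how `LabeledPoint` objects compare for equality."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Key every record by a unique index so test rows can be removed by key\n",
"data_with_idx = data.zipWithIndex().map(lambda kv: (kv[1], kv[0]))\n",
"\n",
"# Sample roughly 20 percent of the records, without replacement, as the test set\n",
"test_with_idx = data_with_idx.sample(False, 0.2, 42)\n",
"\n",
"# The training set is every record whose index was not sampled\n",
"train_with_idx = data_with_idx.subtractByKey(test_with_idx)\n",
"\n",
"# Drop the index keys to get back plain LabeledPoint RDDs\n",
"test_alt = test_with_idx.values()\n",
"train_alt = train_with_idx.values()\n",
"\n",
"print(\"Train + Test size : %d\" % (train_alt.count() + test_alt.count()))"
]
},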
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"train_size=train.count()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"test_size=test.count()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training data size: 13834\n"
]
}
],
"source": [
"print (\"Training data size: %d\" % train_size)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test data size: 3545\n"
]
}
],
"source": [
"print (\"Test data size: %d\" % test_size)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train + Test size : 17379\n"
]
}
],
"source": [
"print (\"Train + Test size : %d\" % (train_size + test_size))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can confirm that we now have two distinct datasets that add up to the original dataset in total:\n",
"\n",
"Training data size: 13834\n",
"\n",
"Test data size: 3545\n",
"\n",
"Total data size: 17379\n",
"\n",
"Train + Test size : 17379\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The impact of parameter settings for linear models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have prepared our training and test sets, we are ready to investigate the impact of different parameter settings on model performance. We will first carry out this evaluation for the linear model. We will create a convenience function to evaluate the relevant performance metric by training the model on the training set and evaluating it on the test set for different parameter settings.\n",
"\n",
"We will use the RMSLE evaluation metric, as it is the one used in the Kaggle competition with this dataset, and this allows us to compare our model results against the competition leaderboard to see how we perform.\n",
"\n",
"The evaluation function is defined here:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"def squared_log_error(pred, actual):\n",
" return (np.log(pred + 1) - np.log(actual + 1))**2"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"def evaluate(train, test, iterations, step, regParam, regType, intercept):\n",
"    # Train a linear regression model with SGD on the training set\n",
"    model = LinearRegressionWithSGD.train(train, iterations, step, regParam=regParam, regType=regType, intercept=intercept)\n",
"\n",
"    # Pair each true test label with the model's prediction\n",
"    tp = test.map(lambda p: (p.label, model.predict(p.features)))\n",
"\n",
"    # RMSLE: square root of the mean squared log error over the test set\n",
"    errors = [squared_log_error(pred, actual) for actual, pred in tp.collect()]\n",
"    rmsle = np.sqrt(sum(errors) / len(errors))\n",
"    return rmsle"
]
},
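{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the metric returned by `evaluate` is the root mean squared log error over the $n$ test points, with predictions $p_i$ and actual values $a_i$:\n",
"\n",
"$$\\mathrm{RMSLE} = \\sqrt{\\frac{1}{n}\\sum_{i=1}^{n}\\left(\\log(p_i + 1) - \\log(a_i + 1)\\right)^2}$$\n",
"\n",
"A vectorised NumPy version is sketched below. `rmsle` is a hypothetical helper name, and `np.log1p(x)` computes $\\log(x + 1)$. Like the function above, it returns `nan` if any prediction is below -1, because the logarithm is then undefined."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def rmsle(pred, actual):\n",
"    # Root mean squared log error; np.log1p(x) computes log(x + 1)\n",
"    pred = np.asarray(pred, dtype=float)\n",
"    actual = np.asarray(actual, dtype=float)\n",
"    return np.sqrt(np.mean((np.log1p(pred) - np.log1p(actual)) ** 2))\n",
"\n",
"# Toy check: perfect predictions give an RMSLE of 0\n",
"print(rmsle([16.0, 40.0, 32.0], [16.0, 40.0, 32.0]))"
]
},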
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Iterations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we saw when evaluating our classification models, we generally expect that a model trained with SGD will achieve better performance as the number of iterations increases, although the increase in performance will slow down as the number of iterations goes above some minimum number. Note that here, we will set the step size to 0.01 to better illustrate the impact at higher iteration numbers:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 5, 10, 20, 50, 100]\n",
"[2.9204455616016656, 2.0695085222669265, 1.79815897170536, 1.594156705081269, 1.43308397524522, 1.3878383528812235]\n"
]
}
],
"source": [
"params = [1, 5, 10, 20, 50, 100]\n",
"\n",
"metrics = [evaluate(train, test, param, 0.01, 0.0, 'l2', False) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
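{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rather than reading the best setting off the printed lists, it can be picked programmatically. This short sketch assumes the `params` and `metrics` lists from the cell above and selects the number of iterations with the lowest test RMSLE (`best_idx` is a hypothetical variable name):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Index of the lowest (best) RMSLE in the parameter sweep\n",
"best_idx = int(np.argmin(metrics))\n",
"\n",
"print(\"Best iterations: %d (test RMSLE %2.4f)\" % (params[best_idx], metrics[best_idx]))"
]
},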
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we will use the matplotlib library to plot a graph of the RMSLE metric against the number of iterations. We will use a log scale for the x axis to make the output easier to visualize:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEPCAYAAAC5sYRSAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xl4VOXdxvHvb7KybwmQsBiQRRCIQFgUqYhWqYJL3SqKBQVq5VWstrVv32ptaxf3jbqwiQu4VKkKVq0LuLMEJAiCAoqCQQiyL9mf9485YIhZJmGSM5ncn+vKxWTOM+fcSU5uTs7MPMecc4iISHQJ+B1ARETCT+UuIhKFVO4iIlFI5S4iEoVU7iIiUUjlLiIShSotdzNLNLMlZpZlZqvN7E9ljEkws2fNbL2ZLTaztJoIKyIioQnlyD0PGO6cSwdOAEaY2eBSY64CdjrnugD3AreHN6aIiFRFpeXugvZ5n8Z5H6Xf+XQu8Lh3+3ngNDOzsKUUEZEqCemcu5nFmNkKYBvwhnNucakh7YBNAM65QmA30CqcQUVEJHQhlbtzrsg5dwLQHhhoZr1KDSnrKP0H8xqY2UQzy/Q+JlY9roiIhMKqOreMmf0R2O+cu6vEfa8DtzrnPjKzWOBbINlVsPKkpCSXlpZWvdQiIvXUsmXLtjvnkisbF1vZADNLBgqcc7vMrAFwOj98wvRl4OfAR8CFwNsVFTtAWloamZmZlW1eRERKMLOvQhlXabkDKcDjZhZD8DTOc865+Wb2ZyDTOfcyMAN40szWAzuAn1Uzt4iIhEGl5e6cWwn0LeP+W0rczgUuCm80ERGpLr1DVUQkCqncRUSikMpdRCQKqdxFRKJQnSv3/MJi5i7fjK79KiJSvjpX7nOXb+aG57K4+7+f+x1FRCRihfI694hyyYAOZG3ezZQF60mIDXDtaV39jiQiEnHqXLmbGX89rxf5hcXc/cbnxMcG+MUpx/odS0QkotS5cgcIBIw7LuxDflExf391LQmxAcYO6eR3LBGRiFEnyx0gJmDcc3E6+YVF3DrvU+JjYxg9qKPfsUREIkKde0K1pLiYAA9e2o/hx7Xm/178hOeXbfY7kohIRKjT5Q4QHxvgocv6cXKXJH77fBYvZ2X7HUlExHd1vtwBEuNimDomgwFpLfnVsyt4bdUWvyOJiPgqKsodoEF8DDPHDuCEDs259umPeWvNVr8jiYj4JmrKHaBRQiyPjRtAj5Sm/PKp5bz7eY7fkUREfBFV5Q7QNDGOJ64cyLGtGzPxyUw+2vCd35FERGpd1JU7QPOG8Tx11UA6tGjIVY8vZdlXO/yOJCJSqyotdzPrYGYLzGyNma02s8lljGlmZvPMLMsbM65m4oauVeMEZk8YRNumiYyduZSsTbv8jiQiUmtCOXIvBG50zvUABgOTzKxnqTGTgE+dc+nAMOBuM4sPa9JqaN0kkdkTBtG8URxXzFzC6uzdfkcSEakVlZa7c26Lc265d3svsAZoV3oY0MTMDGhM8CLZhWHOWi0pzRowZ/xgGsXHMGbGEj7futfvSCIiNa5K59zNLI3gxbIXl1o0BegBZAOfAJOdc8VhyBcWHVo2ZM6EwcTFGKOnLWZDzj6/I4mI1KiQy93MGgMvANc75/aUWnwmsAJIBU4ApphZ0zLWMdHMMs0sMyendl+mmJbUiNnjBwOO0dMW8dV3+2t1+yIitSmkcjezOILFPts5N7eMIeOAuS5oPfAlcFzpQc65qc65DOdcRnJy8tHkrpYurRsze/xg8guLGT1tMZt3Hqj1DCIitSGUV8sYMANY45y7p5xhXwOneePbAN2BL8IVMpy6t23Ck1cNYm9uAZdNX8y3u3P9jiQiEnahHLkPAcYAw81shfdxlpldbWZXe2P+ApxkZp8AbwE3Oee211Dmo9arXTMev3Ig3+3LZ/T0ReTszfM7kohIWJlfF5rOyMhwmZmZvmz7kK
Ubd3DFjCV0bNmQpycOpmUj31+9KSJSITNb5pzLqGxcVL5DNVQD0loy4+cZbPxuP2NmLGb3gQK/I4mIhEW9LneAk7ok8eiY/qzbuo8rHlvC3lwVvIjUffW+3AGGdW/NPy/rx+pvdjPusaXsz4uI91+JiFSbyt3z455teODSviz/eifjH8/kYH6R35FERKpN5V7CWb1TuPeSE1j05XdMfDKT3AIVvIjUTSr3Us49oR23X9CH99Zt53/mLCe/MGJmURARCZnKvQwXZ3TgtvN68eaabUx+5mMKi1TwIlK3qNzLcfngY7h5ZE9eXfUtN/4ri6Jif94PICJSHbF+B4hkV53cifzCYm5/bS3xMQFuv6APgYD5HUtEpFIq90r8ctix5BUWcd+b64iPDXDbeb0ITrcjIhK5VO4hmHxaV/IKi3l44QbiYwPcMrKnCl5EIprKPQRmxm/P7E5eQTEzP/iShNgYbhrRXQUvIhFL5R4iM+PmkT3ILyrikXc2kBAb4Fc/7uZ3LBGRMqncq8DM+PM5vcgrKOb+t4Ln4Ced2sXvWCIiP6Byr6JAwPjHBX0oKCrmztc/IyE2wPihnf2OJSJyBJV7NcQEjLsuSie/qJjbXllDQmyAMSem+R1LROQwlXs1xcYEuP9nfckvXM7NL60mITaGiwd08DuWiAgQ2jVUO5jZAjNbY2arzWxyOeOGeZfgW21m74Q/auSJiwnwz8v6ckq3ZG6au5IXP/7G70giIkBo0w8UAjc653oAg4FJZtaz5AAzaw48BJzjnDseuCjsSSNUQmwMj47pz+BOrbjhuRW8snKL35FERCovd+fcFufccu/2XmAN0K7UsNHAXOfc1964beEOGskS42KYMTaD/se0YPIzH/PGp1v9jiQi9VyVJg4zszSgL7C41KJuQAszW2hmy8zsivDEqzsaxscyc+wAjm/XjEmzl7Pws3r1/5uIRJiQy93MGgMvANc75/aUWhwL9AfOBs4EbjazH7zDx8wmmlmmmWXm5OQcRezI1CQxjifGDaRrm8b84sllfLB+u9+RRKSeCqnczSyOYLHPds7NLWPIZuA159x+59x24F0gvfQg59xU51yGcy4jOTn5aHJHrGYN43jyqkGktWrE+MczWfLlDr8jiUg9FMqrZQyYAaxxzt1TzrCXgKFmFmtmDYFBBM/N10stG8Xz1PhBpDRPZNxjS1j+9U6/I4lIPRPKkfsQYAww3Hup4wozO8vMrjazqwGcc2uA14CVwBJgunNuVY2lrgOSmyQwZ/xgkpok8POZS1j1zW6/I4lIPWLO+XOFoYyMDJeZmenLtmvTN7sOcvEjH7E/v5BnJg7muLZN/Y4kInWYmS1zzmVUNk6X2ath7Zo34OkJg0mMjeGyaYtZv22v35FEpB5QudeCjq0aMmfCIMyM0dMWs3H7fr8jiUiUU7nXks7JjZkzYRCFxY7R0xaxaccBvyOJSBRTudeibm2a8NRVg9ifX8To6YvI3nXQ70giEqVU7rWsZ2pTnrhyILv2F3DZ9MVs25PrdyQRiUIqdx+kd2jOrCsHsHVPLqOnL2b7vjy/I4lIlFG5+6T/MS2ZOXYAm3ce4PLpi9l1IN/vSCISRVTuPhrcuRXTrsjgi+37GTNjCXtyC/yOJCJRQuXus6Fdk3nk8n6s/XYPY2cuYV9eod+RRCQKqNwjwPDj2vDgpf3I2rybK2ct5WB+kd+RRKSOU7lHiBG92nLfJSeQuXEHE57IJLdABS8i1adyjyCj0lO588J0PtiwnV8+tYy8QhW8iFSPyj3CXNC/PX89rzcLPsvh2jkfU1BU7HckEamDVO4RaPSgjtw6qif//XQr1z+7gkIVvIhUUazfAaRsY4d0Ir+omL/9Zy0JMQHuuiidQMD8jiUidYTKPYJN/NGx5BUUc/cbnxMfG+Bv5/dWwYtISFTuEe7a07qSV1jMlAXrSYgNcOs5xxO88qGISPlCuYZqBzNbYGZrzGy1mU2uYOwAMysyswvDG7
N+u/GMbkwY2onHP/qKv/1nDX5dPUtE6o5QjtwLgRudc8vNrAmwzMzecM59WnKQmcUAtwOv10DOes3M+P1ZPcgvLGbae1+SGBfDjWd09zuWiESwSsvdObcF2OLd3mtma4B2wKelhl4LvAAMCHdICRb8H0cdT35RMQ++vZ74mADXntbV71giEqGqdM7dzNKAvsDiUve3A84HhqNyrzGBgPHX83of8STrL0451u9YIhKBQi53M2tM8Mj8eufcnlKL7wNucs4VVfRkn5lNBCYCdOzYsepphUDAuOPCPuQXFfP3V9eSEBtg7JBOfscSkQgTUrmbWRzBYp/tnJtbxpAM4Bmv2JOAs8ys0Dn3YslBzrmpwFSAjIwMPStYTbExAe695ATyC4u5dd6nxMfGMHqQ/rMUke+F8moZA2YAa5xz95Q1xjnXyTmX5pxLA54Hrild7BJecTEBHhzdl1O7J/N/L37C88s2+x1JRCJIKNMPDAHGAMPNbIX3cZaZXW1mV9dwPqlAQmwMD1/enyHHJvHb57N4OSvb70giEiFCebXM+0DI75pxzo09mkBSNYlxMUy9oj9jH1vKr55dQXxMgBG92vodS0R8ponDokDD+Fhmjh1An/bNuPbp5by9dqvfkUTEZyr3KNE4IZZZ4wZyXNumXP3Uct5bl+N3JBHxkco9ijRrEMeTVw2kc1IjJjyRyaIvvvM7koj4ROUeZZo3jGf2+EF0aNGQK2ctZdlXO/yOJCI+ULlHoVaNE5g9fhBtmiYyduZSsjbt8juSiNQylXuUat00kTkTBtG8URxXzFzC6uzdfkcSkVqkco9iKc0aMGf8YBrFxzBmxhI+37rX70giUktU7lGuQ8uGzJ4wmNiAMXraYr7I2ed3JBGpBSr3eqBTUiPmTBiEc47R0xbz9XcH/I4kIjVM5V5PdGndhKfGDyK3sIhLpy3im10H/Y4kIjVI5V6P9EhpylNXDWJPbgGjpy3i2925fkcSkRqicq9nerVrxhNXDmT73jxGT19Ezt48vyOJSA1QuddDfTu24LFxA9myK5fLpy9mx/58vyOJSJip3OupgZ1aMuPnGWz8bj9jZixm94ECvyOJSBip3Ouxk7ok8eiY/qzbuo8rHlvC3lwVvEi0ULnXc8O6t2bK6L6s/mY34x5byv68Qr8jiUgYhHKZvQ5mtsDM1pjZajObXMaYy8xspffxoZml10xcqQlnHN+W+3/Wl+Vf72T845nkFhT5HUlEjlIoR+6FwI3OuR7AYGCSmfUsNeZL4BTnXB/gL3gXwZa64+w+Kdx9cTqLvvyOiU8uI69QBS9Sl1Va7s65Lc655d7tvcAaoF2pMR8653Z6ny4C2oc7qNS88/u25x8/7c27n+cwafZy8guL/Y4kItVUpXPuZpYG9AUWVzDsKuDV6kcSP10yoCN/Ofd43lyzjcnPfExhkQpepC4KudzNrDHwAnC9c25POWNOJVjuN5WzfKKZZZpZZk6OLgMXqcacmMYfzu7Bq6u+5cZ/ZVFU7PyOJCJVFBvKIDOLI1jss51zc8sZ0weYDvzEOVfm9d2cc1PxzsdnZGSoMSLY+KGdyS8q5o7XPiM+JsDtF/QhEDC/Y4lIiCotdzMzYAawxjl3TzljOgJzgTHOuc/DG1H8cs2wLuQVFHP/W+uIjw1w23m9CO4OIhLpQjlyHwKMAT4xsxXefb8HOgI45x4BbgFaAQ95v/yFzrmM8MeV2nb96V3JKyzmkXc2EB8b4JaRPVXwInVApeXunHsfqPC32Tk3HhgfrlASOcyMm0Z0J6+wiMc+2EhCbAw3jeiugheJcCGdc5f6zcy4ZWRP8r0j+MS4ANef3s3vWCJSAZW7hMTM+Mu5vcgrLOa+N4Pn4K8Z1sXvWCJSDpW7hCwQMG6/oA/5hd+/imb80M5+xxKRMqjcpUpiAsY9F6dTUFTMba+sISE2wJgT0/yOJSKlaFZIqbLYmAD3/6wvpx3XmptfWs1zSzf5HUlESlG5S7XExw
b452X9GNo1iZvmruTFj7/xO5KIlKByl2pLjIth6pgMBndqxQ3PreCVlVv8jiQiHpW7HJUG8TFM/3kG/Tq2YPIzH/PGp1v9jiQiqNwlDBolxPLYuAEc364Zk2YvZ+Fn2/yOJFLvqdwlLJokxvHEuIF0ad2YXzy5jA/Xb/c7kki9pnKXsGnWMI6nxg/imFYNuerxTJZu3OF3JJF6S+UuYdWyUTyzxw8mpXki4x5bypIvVfAiflC5S9glN0lgzvjBJDdJYPS0RUx/7wuc0/T9IrVJ5S41om2zRF6cNIThx7XmtlfWcPVTy9iTW+B3LJF6Q+UuNaZZgzgeHdOf/zurB2+u2caoB99ndfZuv2OJ1Asqd6lRZsaEH3XmmYmDyS0o4vyHPuTZpV/rNI1IDVO5S60YkNaSV64byoC0Ftz0wif8+l8rOZhf5HcskahVabmbWQczW2Bma8xstZlNLmOMmdkDZrbezFaaWb+aiSt1WVLjBJ64chDXDe/C3I83c/5DH/BFzj6/Y4lEpVCO3AuBG51zPYDBwCQz61lqzE+Art7HRODhsKaUqBETMG44ozuPjR3A1j25nDPlA81JI1IDKi1359wW59xy7/ZeYA3QrtSwc4EnXNAioLmZpYQ9rUSNYd1b88p1Q+napjGT5izn1pdXk19Y7HcskahRpXPuZpYG9AUWl1rUDig5qfdmfvgfgMgRUps34NmJJ3LlkE7M+nAjFz/6Ed/sOuh3LJGoEHK5m1lj4AXgeufcntKLy3jID14OYWYTzSzTzDJzcnKqllSiUnxsgFtG9eShy/qxfts+zn7gPU08JhIGIZW7mcURLPbZzrm5ZQzZDHQo8Xl7ILv0IOfcVOdchnMuIzk5uTp5JUqd1TuFl/9nCG2bJjJu1lLu/u9nFBXr5ZIi1RXKq2UMmAGscc7dU86wl4ErvFfNDAZ2O+f0LJlUSefkxvz7miFc2K89D769njEzFpOzN8/vWCJ1UihH7kOAMcBwM1vhfZxlZleb2dXemP8AXwDrgWnANTUTV6Jdg/gY7rwonTsu6MOyr3Zy9gPvafIxkWowv94pmJGR4TIzM33ZttQNn2bv4ZrZy9i08yC/PbM7E3/UmeAfkiL1l5ktc85lVDZO71CViNUztSkvX3syZ/Rsw99fXcvEJ5ex+6AmHxMJhcpdIlrTxDgeuqwft4zsyYK12xj54Hus+kaTj4lURuUuEc/MuPLkTjz7ixMpLHL89OEPmbNYk4+JVETlLnVG/2Na8Mp1QxnUqSW///cn3PBcFgfyC/2OJRKRVO5Sp7RsFM+scQP51endeHHFN5z3zw9Yv02Tj4mUpnKXOicmYEw+vStPXDmQ7fvyOWfK+7yc9YP3zInUayp3qbOGdk3mletOpkdKU657+mNueWkVeYWaI14EVO5Sx6U0a8AzEwczYWgnnvjoKy5+5CM27TjgdywR36ncpc6Liwnwf2f35JHL+/NFzn5GPvg+b63Z6ncsEV+p3CVqjOjVlvnXnUy75g246vFM7nhtLYVFmiNe6ieVu0SVY1o1Yu41J3HpwA48tHADl01fzLa9uX7HEql1KneJOolxMfz9p324+6J0sjbv4uwH3mfRF9/5HUukVqncJWpd0L89L04aQpOEWEZPW8RDC9dTrDnipZ5QuUtUO65tcPKxn/RO4Y7XPmPCE5nsOpDvdyyRGqdyl6jXOCGWKZf25U/nHM+763I4+4H3Wbl5l9+xRGqUyl3qBTPj5yel8dwvTgTgwoc/4slFX2nyMYlaKnepV/p2bMH8a0/mpC6tuPnFVUx+ZgX78zT5mESfUK6hOtPMtpnZqnKWNzOzeWaWZWarzWxc+GOKhE+LRvHM/PkAfnNmd+avzObcf37Auq17/Y4lElahHLnPAkZUsHwS8KlzLh0YBtxtZvFHH02k5gQCxqRTu/DU+EHsOpDPOVM+4N8fb/Y7lkjYVFruzrl3gYquUOyAJha8uGVjb6z+zpU64aRjk3jluqH0bt
eMXz2bxe///Qm5BZp8TOq+cJxznwL0ALKBT4DJzjm951vqjDZNE5kzYRBXn3IscxZ/zYWPfMhn3+o0jdRt4Sj3M4EVQCpwAjDFzJqWNdDMJppZppll5uTkhGHTIuERGxPgdz85jmlXZLB550F+cv+73PziKnbs12vipW4KR7mPA+a6oPXAl8BxZQ10zk11zmU45zKSk5PDsGmR8PpxzzYsuHEYYwYfw5wlXzPszgXMfP9LCjQBmdQx4Sj3r4HTAMysDdAd+CIM6xXxRYtG8fzp3F68Onko6R2a8+f5nzLivndZ8Nk2v6OJhMwqexOHmT1N8FUwScBW4I9AHIBz7hEzSyX4ipoUwIB/OOeeqmzDGRkZLjMz82iyi9Q45xxvr93Gba+s4cvt+xnWPZk/nN2TLq0b+x1N6ikzW+acy6h0nF/v0FO5S12SX1jM4x9u5IG31nGwoIgxJx7D9ad1o1nDOL+jST0TarnrHaoiIYiPDTDhR51Z8JthXJTRgcc/3Miwuxbw5EcbdUEQiUgqd5EqSGqcwN9/2pv51w6le9sm3PzSas5+4H3eX7fd72giR1C5i1RDz9SmPD1hMI9c3o8DBYVcPmMx4x/PZOP2/X5HEwFU7iLVZmaM6JXCG786hd+O6M5HG7bz43vf4W//WcOe3AK/40k9p3IXOUqJcTFcM6wLC349jPNOaMe0975g+F0LeXrJ1xTpyk/iE5W7SJi0bprInRel8/Kkk0lr1Yj/nfsJox7U9VvFHyp3kTDr3b4Z/7r6RB68tC+7Dxbws6mLuGb2MjbtOOB3NKlHVO4iNcDMGJWeyls3nsINP+7GgrU5nHbPO9zx2lr26eIgUgtU7iI1KDEuhutO68rbvz6Fs3un8NDCDQy/ayHPL9tMsc7HSw1SuYvUgpRmDbj3khOYe81JpDZvwK//lcX5D33Asq8qulSCSPWp3EVqUb+OLZj7y5O495J0vt2TywUPf8R1T39M9q6DfkeTKKNyF6llgYBxft/2LPj1MK4b3oXXV3/L8LsXcu8bn3MwX1eBkvBQuYv4pGF8LDec0Z23bjyF03u04f631jH87oW8tOIb/JrQT6KHyl3EZ+1bNGTK6H4894sTadU4nsnPrOCChz8ka9Muv6NJHaZyF4kQAzu15OVJJ3PHBX34esdBzv3nB9zw3Aq27sn1O5rUQSp3kQgSCBgXD+jAwt8M45fDjmV+1hZOvWshU95eR26BzsdL6FTuIhGocUIsN404jjdvOIWhXZO467+fc9rd7/DKyi06Hy8hqbTczWymmW0zs1UVjBlmZivMbLWZvRPeiCL1V8dWDXl0TAZzJgyiSWIsk+Ys55Kpi1j1zW6/o0mEC+XIfRYworyFZtYceAg4xzl3PHBReKKJyCEnHZvEK9cN5a/n92L9tn2MmvI+Nz2/kpy9eX5HkwhVabk7594FKnob3WhgrnPua2+8LhEvUgNiAsZlg45hwa+HcdWQTrywfDOn3rWQR9/ZQF6hzsfLkcJxzr0b0MLMFprZMjO7IgzrFJFyNGsQxx9G9uS/v/oRgzq15O+vruWMe9/ltVVbNH+8HBYbpnX0B04DGgAfmdki59znpQea2URgIkDHjh3DsGmR+qtzcmNmjB3Au5/n8Jf5n3L1U8tJapzAyD4pjOyTQr+OLQgEzO+Y4hML5Zl3M0sD5jvnepWx7HdAonPuVu/zGcBrzrl/VbTOjIwMl5mZWY3IIlJaYVEx//10K/Oysnl77TbyCotJbZbIyPRURvVJpVe7ppip6KOBmS1zzmVUNi4cR+4vAVPMLBaIBwYB94ZhvSISotiYAGf1TuGs3insyyvkTa/oH/vgS6a++wVprRoyKj2VUempdGvTxO+4UgsqPXI3s6eBYUASsBX4IxAH4Jx7xBvzG2AcUAxMd87dV9mGdeQuUvN2Hcjn9dXfMi9rCx9u2E6xg25tGjOqTyoj01PplNTI74hSRaEeuYd0WqYmqNxFalfO3jxeW7WFeV
lbWLIx+AK43u2aBc/Rp6fSrnkDnxNKKFTuIlKu7F0H+c8nW5i3csvhCcr6H9OCUX1SOKtPCq2bJPqcUMqjcheRkHz93QHmrcxmXlY2a7/dS8BgcOdWjEpPZcTxbWnRKN7viFKCyl1Eqmzd1r3MW7mF+VnZfLF9P7EB4+SuSYzqk8qPj29D08Q4vyPWeyp3Eak25xyfbtnDvKwtzMvK5ptdB4mPDTCsWzKj0lM5rUdrGsaH48V2UlUqdxEJC+ccH2/axbysbF5ZuYVte/NoEBfD6T3bMKpPCqd0TyYhNsbvmPWGyl1Ewq6o2LF04w7mZWXz6qpv2bE/nyaJsZx5fFtG9klhSJck4mI0k3hNUrmLSI0qKCrmww3fMT8rm9dWf8ve3EJaNIzjJ71TGNUnlYGdWhKj6Q/CTuUuIrUmr7CIdz/fzrysbN5cs5UD+UW0bpLAWb1TGJWeSr+OzTX9QZio3EXEFwfzi3h77bbgPDefbSO/sJh2zRswMj14RH98qua5ORoqdxHx3d7cAt7w5rl5b912CosdnZMaMbJP8Ii+q+a5qTKVu4hElJ37vXluVmbz0YbvKHZwXNsmjEpPZWSfFI5ppXluQqFyF5GItW1vLq9+8i3zsrLJ/GonAOntmzGyTypn90khVfPclEvlLiJ1Qvaug7yycgvzVmazcnPwwt8D0lowKj2Vn/RKIblJgs8JI4vKXUTqnI3b9zN/ZTbzsrbw2dbgPDcnHtuKUX1SGdGrLc0bap4blbuI1Gmfb93L/Kxs5q3cwpfePDc/6pbMqPQUTu/Rhib1dJ4blbuIRAXnHKuz9zAvK5v5K7fwza6DJMQGOLV7a4Z0TaJDiwa0b9GQ9i0akBgX/dMgqNxFJOoUF5eY5+aTLeTszTtieXKTBDq0aECHlsGy79Ci4eHbqc0bRMXUCGErdzObCYwEtpV1gewS4wYAi4BLnHPPV7ZhlbuIHI3iYkfOvjw27TjA5p0H2bTjAJt2erd3HiB7Vy5Fxd/3W8CgbdNE2rdsSAfvSP/wfwItG9K2aWKdmC4hnBfIngVMAZ6oYGMxwO3A66EGFBE5GoGA0aZpIm2aJpKR9sPlhUXFfLsnl007Dh4u/c3efwQfbtjOt3tyKXlsGxtRAfI+AAAK60lEQVQwUps3oEPLBrRv3pAOLY/8CyC5SUKdemdtpeXunHvXzNIqGXYt8AIwIAyZRESOWmxMwDsX35ATafWD5XmFRWzZlcumnQfYtOMgm3ceYNPO4L9vrd3G9n1HnvJJiA3Qziv6Q0f7JW+3aBgXUeV/1LPtm1k74HxgOCp3EakjEmJjSEtqRFpS2e+MPZhfxDe7Dhxx5H/oFFDW5l3sOlBwxPhG8TG0bxE84m9fxmmf2r6KVTgupXIfcJNzrqiy/7XMbCIwEaBjx45h2LSISM1oEB9Dl9ZN6NK67Plv9uYWlDjX7x35e38BfLThO/bnFx0xvmli7OGj/bO9uXVqUjjKPQN4xiv2JOAsMyt0zr1YeqBzbiowFYJPqIZh2yIivmiSGEePlDh6pDT9wTLnHLsOFBx+cvfwk747D7A+Zx/Zuw7WeL6jLnfnXKdDt81sFjC/rGIXEakvzIwWjeJp0Sie3u2b+ZKh0nI3s6eBYUCSmW0G/gjEATjnHqnRdCIiUi2hvFrm0lBX5pwbe1RpREQkLOr+27VEROQHVO4iIlFI5S4iEoVU7iIiUUjlLiIShVTuIiJRyLf53M0sB9gF7C5nSLMKliUB22siVw2r6GuK5G0dzbqq+thQx4cyrrIx0baPaf8K3/hI3r+Occ4lVzrKOefbBzC1mssy/cxdE19vJG/raNZV1ceGOj6UcZWNibZ9TPtX+MZHw/7l92mZedVcVlfV5tcUzm0dzbqq+thQx4cyrrIx0baPaf8K3/g6v3/5dlrmaJhZpgvhSiQi1aV9TGpSbexffh+5V9dUvwNI1NM+JjWpxvevOnnkLiIiFaurR+4iIlIBlbuISBRSuYuIRK
GoKHcza2Rmj5vZNDO7zO88El3MrLOZzTCz5/3OItHJzM7z+uslMzsjHOuM2HI3s5lmts3MVpW6f4SZfWZm683sd97dPwWed85NAM6p9bBS51Rl/3LOfeGcu8qfpFJXVXEfe9Hrr7HAJeHYfsSWOzALGFHyDjOLAf4J/AToCVxqZj2B9sAmb9iRlxwXKdssQt+/RKpjFlXfx/7gLT9qEVvuzrl3gR2l7h4IrPeOpPKBZ4Bzgc0ECx4i+GuSyFHF/Uukyqqyj1nQ7cCrzrnl4dh+XSvCdnx/hA7BUm8HzAUuMLOHib63lEvtKXP/MrNWZvYI0NfM/tefaBIlyuuwa4HTgQvN7OpwbKjSC2RHGCvjPuec2w+Mq+0wEnXK27++A8LyCyf1Xnn72APAA+HcUF07ct8MdCjxeXsg26csEn20f0lNq7V9rK6V+1Kgq5l1MrN44GfAyz5nkuih/UtqWq3tYxFb7mb2NPAR0N3MNpvZVc65QuB/gNeBNcBzzrnVfuaUukn7l9Q0v/cxTRwmIhKFIvbIXUREqk/lLiIShVTuIiJRSOUuIhKFVO4iIlFI5S4iEoVU7hHEzJyZPVni81gzyzGz+ZU87gQzO6uC5RlmdlRvbTazZDNbbGYfm9nQo1lXuJnZn83sdL9zVMTMZpnZhbWwnYvMbI2ZLSh1f+qh+egr21+qsc3mZnZNWdsS/6jcI8t+oJeZNfA+/zHwTQiPOwEo85fVzGKdc5nOueuOMttpwFrnXF/n3HuhPMCb3jQszKzceZCcc7c4594M17YiTRW/j1cB1zjnTi15p3Mu2zl36D+XcveXCjJUNA9Vc+BwuZfalvjFOaePCPkA9gF/Ay70Pn8CuAmY733eCJhJ8C3MHxOcjjYe+BrIAVYQnOj/VmAq8F9gDjCsxDoaA48BnwArgQuAGIJzT6/y7v9VqVwnlNpGA+BSb+wq4PZSX8OfgcXAySXu7wEsKfF5GrDSu32L9zWt8nIfenPdQu/78Q7wR+BLIM5b1hTYCMR52Q99zzYCfwKWe/mO8+5PBt7w7n8U+ApIKudn8FcgC1gEtPHuP7yNQ+O8f4d5+Z4DPgf+AVwGLPG2f2yJxz8CvOeNG+ndHwPc6X39K4FflFjvAu/n92kZOX/w/fe+j/uAz4A7S41P88aWtb/8YL/yHjMW+BfBmVbfJrjvvFXie3to3DPAQW99dx7alrcske/3t4+BU0usey7wGrAOuKPE92MW5eyL+qhCn/gdQB8lfhjBX8w+wPPeL8UKjizmvwGXe7ebeyXRyPtFmVJiPbcCy4AG3ucl13E7cF+JsS2A/sAbJe5rXka2w9sAUr2CSCY4s+jbwHneMgdcXM7XtwLo7N2+CfiDd7tliTFPAqO82wuBh0ose6zEdiYCd3u3Z3FkuV/r3b4GmO7dngL8r3d7hJezrHJ3JbZ/R4mMh7dx6GdV4nu7C0gBEgj+pfUnb9nkQ99r7/GvEfxruSvBCaQSva/j0DYSgEygk7fe/UCnMjJW9P1fCGSU8Zg0vi/cwz/LEParzYd+Pt62mnq3k4D1BGc5PLzuMrZ1I/CYd/s4L3eit+4vgGbe518RnFCr0n1RH6F96LRMhHHOrST4y3Ep8J9Si88AfmdmKwj+EicCHctZ1cvOuYNl3H86Ja704pzbSfCXrLOZPWhmI4A9lcQcACx0zuW44FwZs4EfecuKgBfKedxzwMXe7UuAZ73bp3rn8z8BhgPHl3jMsyVuT+f7qZ3HESz7ssz1/l1G8HsJcDLBI0ycc68BO8t5bD5w6DmOko+vyFLn3BbnXB6wgeBfTBA88iz5+Oecc8XOuXUEv+fHEfyZXuH9TBcDrQiWPwT/0vmyjO1V9P2vjor2qzecc4cuOGHA38xsJfAmwXnI21Sy7pMJ/oeNc24twRLv5i17yzm32zmXC3wKHEPV90UpR12bz72+eBm4i+DRW6sS9xtwgXPus5KDzWxQGevYX866jeDR6WHOuZ1mlg6cCU
wiWMBXVpCvrDmpD8l1zpV3qcNngX+Z2dzgZt06M0sEHiJ4tLnJzG4lWC4/+Dqccx+YWZqZnQLEOOeOuDZlCXnev0V8v49XlLmkAucdMpZ6fCHec1RmZgRPb5TeHkBxic+LOfJ3rPRETs7Lda1z7vWSC8xsGBX/DMOpov2qZIbLCP610N85V2BmGznyZ1XeustT8vtWBMRWY1+UcujIPTLNBP7snPuk1P2vA9d65YKZ9fXu3ws0CXHd/yU4Kx3eOlqYWRIQcM69ANwM9KtkHYuBU8wsyXuy71KC550r5JzbQPCX+Ga+PyI/VA7bzawxUNkTcU8AT1P+UXt53sf7q8G7unyLKj5+I8FTBhB8riOuio8HuMjMAmZ2LNCZ4Lnx14Ffmlmcl62bmTWqZD3V+v6XUHp/KW+/Kq0ZsM0r9lMJHmmXtb6S3iX4nwJm1o3gXwSflTOWauyLUg6VewRyzm12zt1fxqK/ECyVld4V1f/i3b8A6GlmK8yssiun3wa0MLNVZpYFnErwz+uF3p/ls4AKLyXnnNvijVlA8InH5c65l0L76ngWuJzgKRqcc7uAaQRPYbxI8Em9iswmWMxPh7i9Q/4EnGFmywlenHgLwVIK1TSChboEKH1EG6rPCJbwq8DV3umI6QRPSSz3fqaPUslf1Ef5/Ycf7i/l7VelzQYyzCyTYGGv9fJ8B3zg7VN3lnrMQ0CMd8rtWWCsd/qqPFXaF6V8mvJX6hTvteLnOufGVPFxCUCRc67QzE4EHnbOnVAjIUUigM65S51hZg8SPOquzhtwOgLPmVmA4JOmE8KZTSTS6MhdRCQK6Zy7iEgUUrmLiEQhlbuISBRSuYuIRCGVu4hIFFK5i4hEof8HpfozdFs5+LIAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"pyplot.xlabel('Number of iterations (log scale)')\n",
"pyplot.ylabel('RMSLE')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step size"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now perform a similar analysis for the step size, keeping the number of iterations fixed at 10 and using no regularization:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"params = [0.01, 0.025, 0.05, 0.1, 1.0]"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
}
],
"source": [
"metrics = [evaluate(train, test, 10, param, 0.0, 'l2', False) for param in params]"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.01, 0.025, 0.05, 0.1, 1.0]\n",
"[1.79815897170536, 1.432660677663247, 1.3921046531899715, 1.463373357714063, nan]\n"
]
}
],
"source": [
"print(params)\n",
"print(metrics)"
]
},
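The `nan` in the last position comes from the RMSLE computation rather than from the model training step itself. The notebook's `evaluate` function is defined earlier and is not reproduced here, so the following is only a minimal NumPy sketch that assumes RMSLE is computed with `log(x + 1)`, as is common for count targets such as bike rentals. The `rmsle` helper and the sample values are illustrative. It shows that a single prediction below -1 is enough to make the whole metric `nan`, which matches the 'invalid value encountered in log' warning in the output above.

```python
import numpy as np

def rmsle(predictions, actuals):
    """Root mean squared log error, assuming the log(x + 1) form common for counts."""
    predictions = np.asarray(predictions, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    log_errors = np.log(predictions + 1) - np.log(actuals + 1)
    return float(np.sqrt(np.mean(log_errors ** 2)))

actuals = [10.0, 20.0, 30.0]

# Reasonable predictions give a finite score.
print(rmsle([12.0, 18.0, 33.0], actuals))

# One prediction below -1 (as a diverged model can produce) poisons the metric:
# np.log(-4.0) is nan, so the mean and the square root are nan as well.
with np.errstate(invalid="ignore"):
    print(rmsle([12.0, -5.0, 33.0], actuals))
```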
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This shows why we avoided the default step size when we originally trained the linear model. The default is 1.0, which in this case produces `nan` for the RMSLE metric. Because the squared-error loss of linear regression is convex, this is not a case of SGD settling in a poor local minimum: a step size that is too large makes each update overshoot the minimum, so the weights diverge instead of converging. The resulting predictions are no longer valid inputs to the RMSLE computation (the 'invalid value encountered in log' warning above is what NumPy emits when it takes the logarithm of a negative number), so the metric itself comes out as `nan`.\n",
"\n",
"We can also see that with small step sizes and a relatively low number of iterations (10 here), the model performs noticeably worse: the RMSLE is about 1.80 at a step size of 0.01, compared with about 1.39 at 0.05. However, in the preceding Iterations section, we saw that for the lower step-size setting, a higher number of iterations will generally converge to a better solution."
]
},
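To see the overshooting effect in isolation, here is a small NumPy sketch of plain full-batch gradient descent on a least-squares problem. It is not the MLlib SGD trainer that `evaluate` uses; the synthetic data, the `gradient_descent` helper and the iteration count are illustrative assumptions. With features of variance about 25, gradient descent on this loss is only stable for step sizes below roughly 2/25 = 0.08, so a step size of 0.01 recovers the true weights while 1.0 diverges until the weights overflow to `inf`/`nan`.

```python
import numpy as np

def gradient_descent(X, y, step_size, iterations):
    """Full-batch gradient descent on 0.5 * mean squared error (illustration only)."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iterations):
        gradient = X.T @ (X @ w - y) / n
        w = w - step_size * gradient
    return w

# Hypothetical synthetic data: 3 features with standard deviation 5 (variance 25).
rng = np.random.default_rng(0)
X = rng.normal(scale=5.0, size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

results = {}
# Silence overflow/invalid warnings from the diverging run.
with np.errstate(over="ignore", invalid="ignore"):
    for step in [0.01, 1.0]:
        results[step] = gradient_descent(X, y, step, 300)
        print(f"step_size={step}: weights={results[step]}")
```

The small step size converges because every update shrinks the error; the large one multiplies the error by a factor greater than 1 on each update, which is the divergence described above.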
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Selecting the best parameter settings can be a computationally intensive process that involves training a model on many combinations of parameter settings and selecting the best outcome. Each instance of model training involves a number of iterations, so this process can be very expensive and time-consuming on very large datasets.\n",
"\n",
"The output is plotted here, again using a log scale for the step-size axis:"
]
},
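The cost grows multiplicatively with the size of the search grid. Below is a minimal sketch of an exhaustive grid search over two of the parameters discussed above. `toy_evaluate` is a hypothetical stand-in for the notebook's `evaluate` function: it trains plain gradient descent on fixed synthetic data and returns the training mean squared error. Even four iteration counts times four step sizes already means 16 separate training runs.

```python
import itertools
import numpy as np

def toy_evaluate(iterations, step_size):
    """Hypothetical stand-in for evaluate(): train gradient descent, return MSE."""
    rng = np.random.default_rng(42)  # same synthetic data on every call
    X = rng.normal(size=(100, 2))
    y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=100)
    w = np.zeros(2)
    for _ in range(iterations):
        w -= step_size * X.T @ (X @ w - y) / len(y)
    return float(np.mean((X @ w - y) ** 2))

iteration_grid = [1, 5, 10, 20]
step_grid = [0.01, 0.025, 0.05, 0.1]

# One full training run per (iterations, step size) combination.
scores = {}
for iters, step in itertools.product(iteration_grid, step_grid):
    scores[(iters, step)] = toy_evaluate(iters, step)

best = min(scores, key=scores.get)
print(f"{len(scores)} models trained; best (iterations, step size) = {best}")
```

On a real cluster each of those calls is a distributed training job, which is why the grid is usually kept small or searched coarsely first.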
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEOCAYAAACO+Hw9AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xl8FfW9//HXJwtbgARIWBM2ZV8SJcUqLriCKy4Etbe3m79Se7Xe1t3WpS7VumA322u9vdbb5WoJKu6gdcO6J5iEXRCVhDXsOyHh8/vjDHiMCQnkJJPkvJ+PRx6czHzPzOecQ95n5jsz3zF3R0RE4kdC2AWIiEjTUvCLiMQZBb+ISJxR8IuIxBkFv4hInFHwi4jEGQW/iEicUfCLiMQZBb+ISJxR8IuIxJmksAuoSXp6uvfv3z/sMkREWozCwsL17p5Rn7bNMvj79+9PQUFB2GWIiLQYZvZ5fduqq0dEJM4o+EVE4oyCX0QkztQr+M3sUTNbZ2bza5mfambPmVmxmS0ws+9Gzfu2mS0Nfr4dq8JFROTw1HeL/zFg4kHmXwEsdPdsYDwwzczamFlX4DbgGGAscJuZdTn8ckVEpKHqFfzuPgfYeLAmQCczM6Bj0LYSmAC84u4b3X0T8AoH/wIREZFGFqs+/oeAYcAqYB7wn+6+D+gDlEa1KwumNYq3lpZTtmlnYy1eRKRViFXwTwCKgN5ADvCQmXUGrIa2Nd7k18ymmlmBmRWUl5cfcgGbd1Zw+V8LuX5GCfv26T7CIiK1iVXwfxd4yiOWAZ8CQ4ls4WdFtcskslfwFe7+iLvnuntuRka9Lj77krQObfjZ2cN555MN/O39el/HICISd2IV/CuAUwHMrAcwBFgOzAbOMLMuwUHdM4JpjeLSsVmcODiDe15czKfrdzTWakREWrT6ns75OPAuMMTMyszsMjO73MwuD5rcCRxnZvOAV4Eb3H29u28M5n0Y/NwRTGsUZsZ9F40mOdG4Nr+YKnX5iIh8hbk3v3DMzc31hozVM/Ojlfz4H0XceOZQLj/piBhWJiLSPJlZobvn1qdtq7xyd1JObyaO6MmDL3/MkjXbwi5HRKRZaZXBb2bcdcFIOrVL4urpReyt2hd2SSIizUarDH6A9I5t+cUFo1iwaiu/e21Z2OWIiDQbrTb4ASaO7MmFR/Xh968vo6Rsc9jliIg0C606+AFuO3cEGR3bcvX0YnbvrQq7HBGR0LX64E/tkMy9k0ezbN12pr28JOxyRERC1+qDH+CkwRn82zF9+dO/PuWDTxvtMgIRkRYhLoIf4KdnDSOzS3uuzS9mx57KsMsREQlN3AR/StskHpicTemmndzz0qKwyxERCU3cBD/AMQO7cdm4AfztvRXM+fjQRwAVEWkN4ir4Aa6dMIQju3fk+hklbNm1N+xyRESaXNwFf7vkRKblZVO+fQ+3P7cg7HJERJpc3AU/QHZWGleMP4Kn5q5k9oI1YZcjItKk4jL4Aa48ZRAjenfmZ0/PY8P2PWGXIyLSZOI2+NskJTBtSjZbd1Xys6fn0xyHpxYRaQx1Br+ZPWpm68xsfi3zrzOzouBnvplVmVnXYN5nZjYvmHf4A+w3kqE9O/OT0wcza8Eani2u8Y6QIiKtTn22+B8DJtY2093vd/ccd88BbgLerHaXrZOD+fW6QUBTm3riQI7qm8YtM+ezduvusMsREWl0dQa/u88B6jvOwaXA4w2qqIklJhgPTsmhomof188oUZePiLR6MevjN7MORPYMnoya7MDLZlZoZlNjta5YG5Cewk1nDuPNj8t54sPSsMsREWlUsTy4ey7wdrVunnHufjRwJnCFmZ1Y25PNbKqZFZhZQXl5019V++9f78dxR3TjrucXUrpxZ5OvX0SkqcQy+C+hWjePu68K/l0HPA2Mre3J7v6Iu+e6e25GRkYMy6qfhATjvsmjMTOuzS9m3z51+YhI6xST4D
ezVOAk4JmoaSlm1mn/Y+AMoMYzg5qLzC4duPWc4bz/6Ub+/M5nYZcjItIokupqYGaPA+OBdDMrA24DkgHc/eGg2QXAy+6+I+qpPYCnzWz/ev7P3WfFrvTGkZebyewFa7hv1mJOGpzBkd07hl2SiEhMWXM8iyU3N9cLCsI77X/d1t2c8es59OuWwpOXH0tSYtxe5yYiLYSZFdb3tHklWg26d27HnZNGUly6mYff/CTsckREYkrBX4tzs3tzzuhe/ObVpSxYtSXsckREYkbBfxB3ThpJavs2XDO9mD2VVWGXIyISEwr+g+iS0oZfXjiKxWu28dtXl4ZdjohITCj463Da8B7kjcnkv974hLkrNoVdjohIgyn46+HWc4fTK7U9104vZleFunxEpGVT8NdDp3bJ3Dd5NMvX7+C+2YvDLkdEpEEU/PU07sh0vn1sP/789me888n6sMsRETlsCv5DcMOZQ+nfrQPX5ZewbffesMsRETksCv5D0KFNEtOmZLN6yy5+8cKisMsRETksCv5DNKZfV6aeeARPfFjK64vXhV2OiMghU/Afhp+cPojBPTpyw5MlbN5ZEXY5IiKHRMF/GNomJfLglBw27qjg1mcWhF2OiMghUfAfppF9Urnq1EE8W7yKF0pWh12OiEi9Kfgb4Ifjj2B0Zio3z5xH+bY9YZcjIlIvdQa/mT1qZuvMrMa7Z5nZdWZWFPzMN7MqM+sazJtoZkvMbJmZ3Rjr4sOWnJjAtLxsdlRUcdNT82iO9zYQEamuPlv8jwETa5vp7ve7e4675wA3AW+6+0YzSwR+T+RG68OBS81seAxqblYG9ejE9ROG8M9Fa3ly7sqwyxERqVOdwe/uc4CN9VzepXxxw/WxwDJ3X+7uFcATwKTDqrKZ++64AYzt35Xbn13Aqs27wi5HROSgYtbHb2YdiOwZPBlM6gOURjUpC6a1OokJxv15o6ly5/oZJeryEZFmLZYHd88F3nb3/XsHVkObWhPRzKaaWYGZFZSXl8ewrKbRr1sKPz1rGP9atp6/vb8i7HJERGoVy+C/hC+6eSCyhZ8V9XsmsKq2J7v7I+6e6+65GRkZMSyr6fzbMX05YVA6d7+wiM/W7wi7HBGRGsUk+M0sFTgJeCZq8ofAIDMbYGZtiHwxPBuL9TVXZsZ9k0eTlGhcm19M1T51+YhI81Of0zkfB94FhphZmZldZmaXm9nlUc0uAF529wObue5eCVwJzAYWAdPdvdVf5tortT23nzeCgs838T//Wh52OSIiX2HN8UBkbm6uFxQUhF3GYXN3fvDXQt5YUs7zVx3P4B6dwi5JRFo5Myt099z6tNWVu43AzLj7wlF0bJfE1dOL2Fu1L+ySREQOUPA3kvSObbn7gpHMX7mV37++LOxyREQOUPA3ookje3F+Tm8eem0Z88q2hF2OiAig4G90t583km4d23D19CJ2760KuxwREQV/Y0vtkMy9F41m6brt/OqVj8MuR0REwd8Uxg/pzqVj+/LIW8sp+Ky+wx6JiDQOBX8T+dnZw+iT1p5r8ovZWVEZdjkiEscU/E2kY9skHsjLZsXGndzz4uKwyxGROKbgb0JfH9iN740bwF/f+5y3lra8gehEpHVQ8Dex6yYMYWBGCtfPKGHr7r1hlyMicUjB38TaJSfy4JQc1m3bwx3PLQy7HBGJQwr+EORkpfEf449gRmEZryxcG3Y5IhJnFPwh+dEpgxjWqzM3PVXCxh0VYZcjInFEwR+SNkkJPDglmy279nLzzHm6XaOINBkFf4iG9erMj08bzIvz1vBcyeqwyxGROKHgD9kPThzIUX3TuGXmfNZu3R12OSISB+pzB65HzWydmc0/SJvxZlZkZgvM7M2o6Z+Z2bxgXsu9s0ojSkpMYFpeNnsqq7jxyRJ1+YhIo6vPFv9jwMTaZppZGvAH4Dx3HwHkVWtysrvn1PfOMPFoYEZHbpg4lNeXlDO9oDTsckSklasz+N19DnCwkcW+ATzl7iuC9u
tiVFtc+fax/Tl2YDfueG4hpRt3hl2OiLRisejjHwx0MbM3zKzQzL4VNc+Bl4PpU2OwrlYrIcG4b/JozIzrZhSzb5+6fESkccQi+JOAMcDZwATgFjMbHMwb5+5HA2cCV5jZibUtxMymmlmBmRWUl8fnODZZXTtwyznDeG/5Rv733c/CLkdEWqlYBH8ZMMvdd7j7emAOkA3g7quCf9cBTwNja1uIuz/i7rnunpuRkRGDslqmKblZnDwkg1++tJhPyreHXY6ItEKxCP5ngBPMLMnMOgDHAIvMLMXMOgGYWQpwBlDrmUESYWb88qLRtEtO5JrpxVRW7Qu7JBFpZepzOufjwLvAEDMrM7PLzOxyM7scwN0XAbOAEuAD4E/uPh/oAfzLzIqD6S+4+6zGeiGtSY/O7bjz/JEUlW7mj3OWh12OiLQySXU1cPdL69HmfuD+atOWE3T5yKE7d3QvZs9fw6//+TGnDO3OsF6dwy5JRFoJXbnbTJkZd54/ktT2yVw9vZiKSnX5iEhsKPibsa4pbbjnwtEsWr2V3766NOxyRKSVUPA3c6cP78HkMZn84Y1lfLRiU9jliEgroOBvAW49dzg9O7fjmvxidu+tCrscEWnhFPwtQOd2ydw3OZvl5Tu4f/aSsMsRkRZOwd9CHD8onW8d249H3/6U95ZvCLscEWnBFPwtyI1nDqVv1w5cm1/M9j2VYZcjIi2Ugr8F6dAmiWl52azcvItfvLAo7HJEpIVS8Lcwuf27MvWEgTz+wQreWKIRsEXk0Cn4W6CfnD6YwT06csOTJWzZuTfsckSkhVHwt0DtkhOZlpfDhu0V3Pasxr0TkUOj4G+hRmWmcuUpRzKzaBUvzVsddjki0oIo+FuwK04+klF9UvnZzPms374n7HJEpIVQ8LdgyYkJTJuSzfY9lfz0qXm463aNIlI3BX8LN7hHJ649YzAvL1zL0x+tDLscEWkB6nMjlkfNbJ2Z1XoU0czGm1mRmS0wszejpk80syVmtszMboxV0fJllx0/kNx+Xbjt2QWs3rIr7HJEpJmrzxb/Y8DE2maaWRrwB+A8dx8B5AXTE4HfE7nR+nDgUjMb3tCC5asSE4wH8rKprHKun1GiLh8ROag6g9/d5wAbD9LkG8BT7r4iaL//qqKxwDJ3X+7uFcATwKQG1iu16J+ewk/PHsZbS9fz9/dXhF2OiDRjsejjHwx0MbM3zKzQzL4VTO8DlEa1KwumSSP55jF9OWFQOne/uIjPN+wIuxwRaaZiEfxJwBjgbGACcIuZDQashra19kGY2VQzKzCzgvLy8hiUFX/MjHsvGk1ignFdfglV+9TlIyJfFYvgLwNmufsOd18PzCFyk/UyICuqXSawqraFuPsj7p7r7rkZGRkxKCs+9U5rz8/PHcEHn23k0X99GnY5ItIMxSL4nwFOMLMkM+sAHAMsAj4EBpnZADNrA1wCPBuD9UkdLjy6D6cP78H9Ly9h6dptYZcjIs1MfU7nfBx4FxhiZmVmdpmZXW5mlwO4+yJgFlACfAD8yd3nu3slcCUwm8gXwXR3X9BYL0S+YGbcfcEoUtokck1+MXur9oVdkog0I9YcT/3Lzc31goKCsMto8V6ct5r/+Ptcrj59MFedOijsckSkEZlZobvn1qetrtxtxc4a1YtJOb357atLmb9yS9jliEgzoeBv5W4/bwRdU9pw9fQi9lRWhV2OiDQDCv5WLq1DG+69aDQfr93Or15ZGnY5ItIMKPjjwMlDu3PJ17J4ZM4nFH5+sIuwRSQeKPjjxM3nDKd3WnuumV7MzorKsMsRkRAp+ONEx7ZJ3D85m8827OTelxaHXY6IhEjBH0eOPaIb3x3Xn/9993PeXrY+7HJEJCQK/jhz/YShDExP4foZJWzdvTfsckQkBAr+ONO+TSLTpmSzessu7nxuYdjliEgIFPxx6Ki+Xfjh+CPILyzjnwvXhl2OiDQxBX+cuurUQQzt2Ykbn5rHph0VYZcjIk1IwR+n2i
Yl8uCUHLbsquCWZ2q9nbKItEIK/jg2vHdnfnzaYJ4vWc1zxbXeKkFEWhkFf5z7wYkDyc5K45Zn5rNu6+6wyxGRJqDgj3NJiQlMy8tmV0UVNz01j+Y4TLeIxFZ9bsTyqJmtM7MaO4LNbLyZbTGzouDn1qh5n5nZvGC6Bthvpo7s3pEbJg7l1cXryC8oC7scEWlk9dnifwyYWEebt9w9J/i5o9q8k4Pp9bpBgITjO8f155gBXbnj+YWUbdoZdjki0ojqDH53nwNoSMdWLiHBeCAvG3fn+hkl7NunLh+R1ipWffzHmlmxmb1kZiOipjvwspkVmtnUGK1LGklW1w7cfM5w3vlkA3997/OwyxGRRhKL4J8L9HP3bOB3wMyoeePc/WjgTOAKMzuxtoWY2VQzKzCzgvLy8hiUJYfjkq9lMX5IBve8tIjl5dvDLkdEGkGDg9/dt7r79uDxi0CymaUHv68K/l0HPA2MPchyHnH3XHfPzcjIaGhZcpjMjHsvGk3bpESuyS+mSl0+Iq1Og4PfzHqamQWPxwbL3GBmKWbWKZieApwB6BLRFqBH53bcMWkEH63YzCNzloddjojEWFJdDczscWA8kG5mZcBtQDKAuz8MTAZ+aGaVwC7gEnd3M+sBPB18JyQB/+fusxrlVUjMnZfdm1nz1/CrVz7m5KEZDO3ZOeySRCRGrDlesJObm+sFBTrtP2wbtu9hwq/n0L1TO2ZeMY42SbreT6S5MrPC+p42r79kqVW3jm35xQWjWLh6Kw+9tjTsckQkRhT8clATRvTkwqP78Ps3PqG4dHPY5YhIDCj4pU63nTuC7p3ack1+Mbv3VoVdjog0kIJf6pTaPpl7LxrNsnXbeWD2krDLEZEGUvBLvZw4OINvfr0v//P2p7y/fEPY5YhIAyj4pd5uOnMYWV06cO2MYnbsqQy7HBE5TAp+qbeUtkk8kJdN2aZd3P3iorDLEZHDpOCXQzJ2QFe+f8JA/v7+Ct78WGMqibRECn45ZFefPjhy85YZJWzZuTfsckTkECn45ZC1S07kwSnZlG/fw+3PLQi7HBE5RAp+OSyjM9O48uQjeeqjlcyavybsckTkECj45bBdecqRjOjdmZ89PY/12/eEXY6I1JOCXw5bcmICD07JYdvuSm5+ej7NccA/EfkqBb80yJCenbj6jMHMWrCGZ4pWhV2OiNSDgl8a7PsnDGRMvy7c+sx81mzZHXY5IlIHBb80WGKCMS0vm71VzvVPlqjLR6SZqzP4zexRM1tnZjXeNtHMxpvZFjMrCn5ujZo30cyWmNkyM7sxloVL89I/PYWbzhrKnI/LefyD0rDLEZGDqM8W/2PAxDravOXuOcHPHQBmlgj8HjgTGA5cambDG1KsNG/fPKYf447sxl0vLGTFhp1hlyMitagz+N19DrDxMJY9Fljm7svdvQJ4Aph0GMuRFiIhwbhvcjaJZlw7o5h9+9TlI9IcxaqP/1gzKzazl8xsRDCtDxC9z18WTJNWrE9ae249dzgffLqRR9/+NOxyRKQGsQj+uUA/d88GfgfMDKZbDW1r3QQ0s6lmVmBmBeXlGvyrJZs8JpPThnXnvtlLWLZuW9jliEg1DQ5+d9/q7tuDxy8CyWaWTmQLPyuqaSZQ64ne7v6Iu+e6e25GRkZDy5IQmRl3XziKlDaJXDO9mMqqfWGXJCJRGhz8ZtbTzCx4PDZY5gbgQ2CQmQ0wszbAJcCzDV2ftAzdO7XjrvNHUVy2hf9645OwyxGRKEl1NTCzx4HxQLqZlQG3AckA7v4wMBn4oZlVAruASzxyInelmV0JzAYSgUfdXUM5xpGzR/di1oLe/ObVpZwyrDsjeqeGXZKIANYcL7bJzc31goKCsMuQGNi0o4Izfj2HbilteObKcbRNSgy7JJFWycwK3T23Pm115a40qi4pbbj3olEsXrON3/xzadjliAgKfmkCpwztwcW5WTz85icUfr4p7HJE4p6CX5rEzecMo1dqe67NL2ZXRV
XY5YjENQW/NIlO7ZK5P280n67fwb2zFoddjkhcU/BLkznuiHS+c1x/HnvnM975ZH3Y5YjELQW/NKkbJg5lQHoK1+WXsG333rDLEYlLCn5pUu3bJPJAXjart+zirucXhV2OSFxS8EuTG9OvCz846Qj+UVDKa4vXhl2OSNxR8EsofnzaIIb27MQNT85j046KsMsRiSsKfglF26REpk3JZtOOCm59ViN5iDQlBb+EZkTvVP7z1EE8V7yK50tqHbhVRGJMwS+h+uH4I8jOTOWWmfNZt2132OWIhOaz9TuYNX9Nk6xLwS+hSkpMYNqUHHZWVPHTp+bRHAcNFGksO/ZUMr2glCkPv8v4B97guhnFVFQ2/v0r6hyWWaSxHdm9I9dNGMJdLyxiRmEZeblZdT9JpIVydz74dCP5hWW8OG81OyuqIte2TBjCRUdn0iap8bfHFfzSLHxv3ABeXriWO55byHFHptMnrX3YJYnE1MrNu3iqsIwZc8v4fMNOUtokcu7o3uTlZjKmXxeC+1k1ifrciOVR4BxgnbuPPEi7rwHvARe7+4xgWhUwL2iywt3Pa3jJ0holJBgPTM5m4m/mcMOMEv7yvbEkJDTdH4JIY9i9t4rZC9aQX1DG25+sxx2+PrArV50yiDNH9aRDm3C2veuz1seAh4C/1NbAzBKBe4ncbSvaLnfPOezqJK707daBm88ezk+fnsff3v+cbx3bP+ySRA6Zu1NUupn8wjKeK17Ftt2V9Elrz49OGcTkozPp261D2CXWHfzuPsfM+tfR7EfAk8DXYlCTxLFLx2Yxa8Ea7nlxMScMymBAekrYJYnUy7ptu3l67kpmFJaxdN122iYlcObInuTlZnHswG7Nag+2wfsZZtYHuAA4ha8GfzszKwAqgV+6+8yGrk9aNzPjvotGc8av3uTa/GKm/+BYEpvRH4xItIrKfby2eC35BWW88XE5Vfuco/umcc+Fozh7dC86t0sOu8QaxaKD6dfADe5eVcPBib7uvsrMBgKvmdk8d/+kpoWY2VRgKkDfvn1jUJa0VD1T23HHpJH8+B9F/Omt5fzgpCPCLknkSxau2kp+YSnPFK1i444Kundqy/dPGMjkMZkc2b1j2OXVKRbBnws8EYR+OnCWmVW6+0x3XwXg7svN7A3gKKDG4Hf3R4BHIHKz9RjUJS3YpJzezJq/hmkvf8z4Id0Z0rNT2CVJnNu0o4JnilaSX1jGglVbaZOYwGnDu5M3JosTBqWTlNhyLotqcPC7+4D9j83sMeB5d59pZl2Ane6+x8zSgXHAfQ1dn8QHM+OuC0Yy4VdzuHp6ETOvGEdyC/rDktahsmofc5aWk19Qxj8XrWVvlTOid2d+fu5wJuX0oUtKm7BLPCz1OZ3zcWA8kG5mZcBtQDKAuz98kKcOA/5oZvuIXCH8S3df2OCKJW6kd2zLLy4YxeV/K+Sh15bxk9MHh12SxIll67aTX1jK03NXsm7bHrqmtOGbX+9H3pgshvfuHHZ5DVafs3oure/C3P07UY/fAUYdXlkiERNH9uTCo/rw0OvLOG1YD0ZlpoZdkrRSW3fv5fni1eQXlvLRis0kJhgnD8lg8pgsThnavUmuqG0qunJXmr3bzh3BO59s4OrpRTz3o+Npl5wYdknSSuzb57y7fAP5BaXMWrCG3Xv3Mah7R3561lDOP6oP3Tu1C7vERqHgl2YvtUMy904ezbcf/YAHX/mYn541LOySpIUr3biT/MIyniwsY+XmXXRql8RFR2eSl5tFdmZqkw6fEAYFv7QIJw3O4BvH9OW/31rOiN6dGT+kO6ntm+c50tI87ayo5MV5a5hRWMp7yzdiBscfmc71E4cwYUTPuNqTVPBLi/Gzs4bx3icb+M8nigA4IiOFnKwu5PRN46isNIb27NSiTqmTxufuFHy+ifyCUl4oWc2Oiir6devANacP5sIxmXE7GKCCX1qMlLZJPH/V8cz9fDNFpZsoKt3MG0vW8eTcMgDaJScwqk8qOV
lpB74Qeqe2a/W77fJVq7fs4qlg+IRP1++gQ5tEzhrVi7wxmYwd0DXu/09Yc7zxRW5urhcUFIRdhrQA7k7Zpl18VLqZohWRL4T5q7YeuJlF905tI18EfdPIyUpjdGYaHdtqe6c12r23ilcWriW/sIx/LS1nn8PYAV3JG5PJWaN6kdLKP3czK3T33Pq0bd3vhLR6ZkZW1w5kde3Aedm9gcj4KYtWb6WodPOBn5cXrgUgwWBQ904cFXwR5PRNY1D3ThoPqIVyd+at3EJ+QRnPFq9iy6699E5txxUnH8nkMZn066ZB/mqiLX6JC5t2VFBUtn+vIPKzZddeAFLaJDIqMzXSPZSVxlF90+jRuXWextdarN++h5kfrSS/oIwla7fRNimBCSN6kpebyXFHpMflF/mhbPEr+CUuuTufbdgZOVawYjMflW5m0eqt7K2K/D30Tm13oHsoJ6sLo/qk0r5N/Jz10RztrdrH64vXkV9YxuuL11G5z8nOSiNvTCbnZveO+7O81NUjUgczY0B6CgPSU7jgqEwg0ke8YFV0F9EmXpy3BoDEBGNoz07BF0Fkr2BgesdmNcZ6a7V4zVZmFJQxs2gl67dXkN6xLd87fgB5YzIZ1EOD9x0ObfGLHMT67Xu+1D1UXLqZbXsqAejULonszC++CHKy0ujWsW3IFbcOm3dW8GzxKvILypi3cgvJicapQ3uQl5vJiYMzNGBfDdTVI9JI9u1zlq/fzkdB91DRis0sWbuNqn2Rv6Osru0PHCvIyUpjRO/OcXVhUENU7XPeWlpOfmEZryxYS0XVPob16kzemEwm5fTWl2odFPwiTWhnRSXzV249cG3BRys2s3rLbgCSE43hvTpHnVLahf7dOsT9eeTRlpdvZ0ZhGU/NXcmarbtJ65DM+Tl9mDwmk5F9NChffSn4RUK2dutuPlrxxbGCkrIt7KyoAiCtQ/KBPYL9P2kdWua47odr+55KXiiJdOUUfL6JBIsMy5GXm8Wpw7rTNkl7SYdKwS/SzFTtcz5euy3yRRB8IXy8bhv7//wGpKd86VjB0J6dW9UwwBDpJnv/043kF5by0rw17NpbxcCMFPLGZHHh0X10Cm0DxTz4zexR4BxgnbuPPEi7rwHvARe7+4yPb7qyAAANcUlEQVRg2reBm4Mmd7n7/9a1PgW/xIPteyopKdsctWewmfJtewBok5TAyN6dvzQWUWaX9i2yi6hs006eLFzJjLmllG7cRae2SZyT3Zu83EyOykprka+pOWqM4D8R2A78pbbgN7NE4BVgN/Cou88ws65AAZH78jpQCIxx900HW5+CX+KRu7Nqy+4DQ098tGIz81ZuYU8w/ER6xzZR3UNdGJ2VSud2zfPc9V0VVcxesIb8wlLe+WQD7jDuyG7kjcliwoieuiaiEcT8PH53n2Nm/eto9iPgSeBrUdMmAK+4+8agsFeAicDj9VmvSDwxM/qktadPWnvOHt0LiFy0tGTNti+NRfTPReuC9nBkRscvjUU0pEd4I5S6O3NXbGZGYSnPF69m255Ksrq258enDuaiMX3I7NIhlLrkq2JyAZeZ9QEuAE7hy8HfByiN+r0smCYi9ZCcmMDIPqmM7JPKv3+9HwBbdu6luOyL7qFXg6tZAdonJzKqT+qXxiLqldq4Qw+v3bo7GAmzlE/Kd9A+OZEzR/Ukb0wWxwzoqovcmqFYXbn7a+AGd6+q1l9X0ydeY9+SmU0FpgL07ds3RmWJtD6pHZI5cXAGJw7OACJb2qUbd/FR0D1UVLqZP7/9GRVVkS6iHp3bHugeOqpvGqP6pDZ4pMo9lVW8umgd+QWlvPlxZCTM3H5duPeigZw9urdGQG3mYvXp5AJPBKGfDpxlZpVEtvDHR7XLBN6oaQHu/gjwCET6+GNUl0irZ2b07daBvt06MCknskO9p7KKRau3UbRiU6SbqHQzsxd8MULp4B5RI5RmdeHI7h3rNbDZ/JVbmFEYGT5h88699OzcjstPOoLJYzIZmN
GxUV+nxE69T+cM+vifP9hZPUG7x4J2+w/uFgJHB7PnEjm4u/Fgy9DBXZHY27ijguLSzQe+CIpWbGLr7sjwEx3bJjE6M/WLg8d90w7caHzjjorISJiFZSxavZU2SQmcMbwHk8dkcsKgjLgcCbM5ivnBXTN7nMiWe7qZlQG3AckA7v5wbc9z941mdifwYTDpjrpCX0QaR9eUNpw8tDsnD+0ORM6r/3TDji+NRfTInOVUBsNP9ElrT9+uHSj4fCN7q5zRmancOWkE52b3jrsLzlobXcAlIgfs3lvF/JVbIkNPlG5mefkOxh3RjbzcLIb01EiYzZmGZRaRw9IuOZHc/l3J7d817FKkEbWua8JFRKROCn4RkTij4BcRiTMKfhGROKPgFxGJMwp+EZE4o+AXEYkzCn4RkTjTLK/cNbNyYDOw5TCeng6sj21FUotUDu8zau6a6+sKq67GXm+slx+r5TVkOYf73IbkVz93z6hPw2YZ/ABm9oi7Tz2M5xXU97JlaZjD/Yyau+b6usKqq7HXG+vlx2p5DVlOc8+v5tzV81zYBUidWutn1FxfV1h1NfZ6Y738WC2vIctprv+HgGa8xX+4tMUvIi2VtvgP3yNhFyAicpiaJL9a3Ra/iIgcXGvc4hcRkYNQ8IuIxBkFv4hInImr4Dez883sv83sGTM7I+x6RETqy8wGmtn/mNmMhi6rxQS/mT1qZuvMbH616RPNbImZLTOzGw+2DHef6e7fB74DXNyI5YqIHBCj/Fru7pfFpJ6WclaPmZ0IbAf+4u4jg2mJwMfA6UAZ8CFwKZAI3FNtEd9z93XB86YBf3f3uU1UvojEsRjn1wx3n9yQelrMzdbdfY6Z9a82eSywzN2XA5jZE8Akd78HOKf6MszMgF8CLyn0RaSpxCK/YqnFdPXUog9QGvV7WTCtNj8CTgMmm9nljVmYiEgdDim/zKybmT0MHGVmNzVkxS1mi78WVsO0Wvuu3P23wG8brxwRkXo71PzaAMRkg7Wlb/GXAVlRv2cCq0KqRUTkUISWXy09+D8EBpnZADNrA1wCPBtyTSIi9RFafrWY4Dezx4F3gSFmVmZml7l7JXAlMBtYBEx39wVh1ikiUl1zy68WczqniIjERovZ4hcRkdhQ8IuIxBkFv4hInFHwi4jEGQW/iEicUfCLiMQZBX8LZWZuZn+N+j3JzMrN7Pk6npdjZmcdZH6umTVoWAszyzCz983sIzM7oSHLijUzu8PMTgtp3Z+ZWXoI673fzBaY2f31aNvfzL7RyPVcbmbfasx1yMG19LF64tkOYKSZtXf3XUSGdl1Zj+flALnAi9VnmFmSuxcABQ2s7VRgsbt/u75PMLNEd69q4Hr3LyspuDjmK9z91liso4X5AZDh7nvq0bY/8A3g/xqrGHd/uLGWLfWjLf6W7SXg7ODxpcDj+2eYWUpw84cPgy3vScFl4XcAF5tZkZldbGY/N7NHzOxl4C9mNn7/XoOZdTSzP5vZPDMrMbOLzCzRzB4zs/nB9J9EF2RmOcB9wFnBOtqb2aVB2/lmdm9U2+3BFvj7wLFR04eZ2QdRv/c3s5Lg8a3Ba5of1G3B9DfM7G4zexP4mZl9ambJwbzOwdZ2clD75GD6Z2Z2u5nNDeobGkzPMLNXgul/NLPPq2+pm9kPzey+qN+/Y2a/Cx7PNLPCYCt7avUPLXg986N+v9bMfh48PsLMZgXPfyuqprzgNReb2ZwalmnBlv3+z+XiYPqzQArw/v5pUc85KfiMioL/I52IDFt+QjDtJ8HnfX/wnpeY2Q+C5443szlm9rSZLTSzh83sK3liZr8M5peY2QPBtJ8Hr7l31PqLzKzKzPoF7/+TwTo/NLNx1ZcrDeTu+mmBP0Ru6jAamAG0A4qA8cDzwfy7gW8Gj9OI3PAhhcjdxx6KWs7PgUKgffB79DLuBX4d1bYLMAZ4JWpaWg21HVgH0BtYAWQQ2cN8DTg/mOfAlFpeXxEwMHh8A3Bz8LhrVJu/Au
cGj98A/hA1789R65kKTAsePwZMDh5/BvwoePwfwJ+Cxw8BNwWPJwZ1plerL4PIWOr7f38JOD66RqA9MB/oFrW+dCJb1fOjnnst8PPg8avAoODxMcBrweN5QJ+DvOcXAa8QuYlHj+A977X//0ot7/FzwLjgccfg8znw+Ue9d/vf+7ZE9gYHBO12AwODdb6y/32Nem5XYAlfjBCQFvV/7tpqba8gMmQBRPY29r+XfYFFYf+9tbYfbfG3YO5eQiRELuWrXTdnADeaWRGRUGxH5I+oJs96pLuoutOA30etbxOwHBhoZr8zs4nA1jrK/BrwhruXe6T75e/AicG8KuDJWp43HZgSPL4Y+Efw+GSLHD+YB5wCjIh6zj+iHv8J+G7w+LtEvghq8lTwbyGR9xLgeOAJAHefBWyq/iR3LweWm9nXzawbMAR4O5h9lZkVA+8RGX1xUC3r/hIz6wgcB+QHn9sfgV7B7LeBx8zs+0SCtrrjgcfdvcrd1wJvEnnvD+Zt4EEzu4pIKNfUPXYG8K2gnveBblGv5wOP3A6wisje5vHVnruVyJfDn8zsQmBnLa97HPD/gO8Fk04DHgrW+SzQOdgbkRhRH3/L9yzwAJEtsG5R0w24yN2XRDc2s2NqWMaOWpZtVBsf3N03mVk2MIHIVtoUvviDrW0Ztdnttffr/4NIAD4VWa0vNbN2wB+AXHcvDbpH2tX0Otz97aBL5SQg0d2/dK/TKPv7vav44u/hYDVXr3EKsBh42t3dzMYTCa5j3X2nmb1RrUaASr7czbp/fgKw2d1zqq/I3S8PPruzgSIzy/HI+Oz71bfm6GX+0sxeAM4C3rOaD3obkb2i2V+aGHmd1Qf6qv5/pdLMxhI55nMJkQHJTqm2nF7A/wDnufv2YHICkfevpo0RiQFt8bd8jwJ3uPu8atNnAz+K6gM/Kpi+Dajv1tPLRP5YCZbRJejrTnD3J4FbgKPrWMb7wElmlm6Re4xeSmRr9KDc/RMiYXwLX2zJ7w/I9cHWcV33Hf0LkS3R2rb2a/Mvgr0NMzuDSBdXTZ4CzifymvbXmApsCkJ/KPD1Gp63FuhukTsqtSW4zZ67bwU+NbO8YN0WfMliZke4+/seOTi9ni+P4w4wh8ixm0QzyyCyV/UBBxEsc56730ukC2coX/3/MRv4oX1xvGSwmaUE88ZaZEjhBCJ7Zf+qtvyOQKq7vwj8mMiJBdHzk4ns2d3g7h9Hzar+/+4rX4TSMAr+Fs7dy9z9NzXMuhNIBkqCA4l3BtNfB4YHB9MuruF50e4Cuuw/qAicTOTWcG8Eu+GPAQe9BZy7rw7avA4UA3Pd/Zn6vTr+AXyTSDjg7puB/ybS3z2TyHjmB/N3IqH9eB3tqrsdOMPM5gJnAquJBOKXBF1fC4F+7r4/ZGcBSRY5GH0nke6e6s/bS+Qg+/vA80T2GPb7N+Cy4P1eAEwKpt8fHLSdTyTki6st9mmgJJj+GnC9u6+p43X+OOqz3UXkOEUJUBkcRP4JkS6zhcDcYN1/5Is9o3eJHAyeD3wa1BCtE/B88F68Cfyk2vzjiHRH3R51gLc3cBWQGxwQXkiM7jolX9CwzNJqWeTsnUnu/u+H+Ly2QFXQVXEs8F81db/Es6Cr51p3b9SbgkvjUB+/tEoWObXyTCL914eqLzA96MKoAL4fy9pEwqYtfhGROKM+fhGROKPgFxGJMwp+EZE4o+AXEYkzCn4RkTij4BcRiTP/H9e3Sn2HGX8eAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"pyplot.xlabel('Step size (log scale)')\n",
"pyplot.ylabel('RMSLE')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# L2 regularization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Regularization penalizes model complexity by adding an extra term to the loss function; this term is a function of the model's weight vector. L2 regularization penalizes the L2 norm of the weight vector, while L1 regularization penalizes the L1 norm. In the following experiment we keep 10 iterations and a step size of 0.1, and vary the L2 regularization parameter."
]
},
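A sketch of the two penalty terms and their effect, using a generic formulation with regularization parameter λ (`reg_param` below): the L2 penalty is ½λ‖w‖₂² and the L1 penalty is λ‖w‖₁. Spark's internal scaling may differ, so treat this as an illustration rather than MLlib's exact objective; the helper names and synthetic data are hypothetical. For the L2 case, the minimizer of ½·(mean squared error) + ½λ‖w‖₂² has a closed form, which makes it easy to see the weights shrink as λ grows.

```python
import numpy as np

def l2_penalty(w, reg_param):
    """L2 (ridge) penalty: 0.5 * lambda * ||w||_2^2."""
    return 0.5 * reg_param * np.sum(w ** 2)

def l1_penalty(w, reg_param):
    """L1 (lasso) penalty: lambda * ||w||_1."""
    return reg_param * np.sum(np.abs(w))

def ridge_weights(X, y, reg_param):
    """Minimizer of 0.5 * mean((X w - y)^2) + 0.5 * lambda * ||w||^2.

    Setting the gradient X^T (X w - y) / n + lambda * w to zero gives
    (X^T X / n + lambda * I) w = X^T y / n.
    """
    n, d = X.shape
    A = X.T @ X / n + reg_param * np.eye(d)
    return np.linalg.solve(A, X.T @ y / n)

w = np.array([3.0, -4.0])
print(l2_penalty(w, 1.0))  # 0.5 * (9 + 16) = 12.5
print(l1_penalty(w, 1.0))  # 3 + 4 = 7.0

# Hypothetical data: larger lambda pulls the weights toward zero.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=100)
for reg_param in [0.0, 0.1, 1.0, 10.0]:
    weights = ridge_weights(X, y, reg_param)
    print(reg_param, weights, np.linalg.norm(weights))
```

Moderate shrinkage can reduce overfitting, but once λ is large the weights are pulled too far from the data-driven solution, which is consistent with the RMSLE rising again for the largest parameter values in the results below.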
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.0, 0.01, 0.1, 1.0, 5.0, 10.0, 20.0]\n",
"[1.463373357714063, 1.4627638795194882, 1.457389998406437, 1.414347928269498, 1.4006915016046428, 1.5458042588519074, 1.8520326400407603]\n"
]
}
],
"source": [
"params = [0.0, 0.01, 0.1, 1.0, 5.0, 10.0, 20.0]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, param, 'l2', False) for param in params]\n",
"\n",
"print(params)\n",
"print(metrics)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEOCAYAAABy7Vf3AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3X10HHd97/H3V8+yHm1JtmPJsWPy7KeEGEgCKSmhJFCgEB5CSqEhbXPb23Lvuee0l/Yc2rT0tkC5PYdLU5oG6phQMFBICVAeb9uQGychOGm0Vp6oiZ9WjmM5jtaSbOth9b1/zEheKavVWlrt7M5+Xufs0Wpndua7o9FnZn8z8xtzd0REJF6qoi5AREQKT+EuIhJDCncRkRhSuIuIxJDCXUQkhhTuIiIxpHAXEYkhhbuISAwp3EVEYkjhLiISQzVRzbizs9PXr18f1exFRMrSY489dszdu+YbL7JwX79+Pbt3745q9iIiZcnMDuQznpplRERiSOEuIhJDCncRkRhSuIuIxJDCXUQkhhTuIiIxpHAXESmiHzx5hP3HRpZ8Pgp3EZEiGZ1I83tffpydPz245PNSuIuIFMkzzw8xnna29rQv+bwU7iIiRZJIDgKwpadtyeelcBcRKZLeZIqOpjq62xuXfF4KdxGRIkkkB9nS04aZLfm8FO4iIkUwMjrB3qPDbClCezso3EVEiqKvP8Wkw9a1S9/eDgp3EZGiSCRTANpzFxGJk97kIN3tjXQ21xdlfgp3EZEiSCRTRTkFcorCXURkib00MsbB4yeL1iQDCncRkSWX6A/a27dqz11EJD4Sh4IrUzcp3EVE4qM3mWJDVxOtDbVFm6fCXURkiSWSg0XpLCyTwl1EZAkdSZ3m6NBoUc+UAYW7iMiS6p3uCVJ77iIisbEnmaKmyti4prWo81W4i4gsod7kIBeuaqGhtrqo85033M1su5kdNbO+OYa3mdm3zazXzJ40sw8VvkwRkfLj7uzpL+6VqVPy2XPfAdyQY/jvAk+5+1bgWuCvzaxu8aWJiJS3g8dPMnhyvOjt7ZBHuLv7A8DxXKMALRb0Pt8cjjtRmPJERMpX73RPkMXfc68pwDTuAL4FHAZagJvcfbIA0xURKWuJQ4PU11Rx0eqWos+7EAdUrweeANYAlwF3mFnWw8JmdpuZ7Taz3QMDAwWYtYhI6UokU1y6ppXa6uKfu1KIOX4IuNcDe4F9wMXZRnT3u9x9m7tv6+rqKsCsRURKU3rS6TucKvqVqVMKEe4HgesAzGwVcBHwXAGmKyJStvYeHebkWDqS9nbIo83dzHYSnAXTaWZJ4HagFsDd7wT+HNhhZnsAAz7i7seWrGIRkTIQ1ZWpU+YNd3e/eZ7hh4E3FawiEZEYSCQHaamvYUNnUyTz1xWqIiJLIJFMsam7jaoqi2T+CncRkQIbnUjz9PMn2LI2mvZ2ULiLiBTcM88PMZ72yM6UAYW7iEjBJaYPpmrPXUQkNnqTKTqa6uhub4ysBoW7iEiBJZKDbOlpI+hyKxoKdxGRAhoZnWDv0eHIzm+fonAXESmgvv4Ukw5bIzxTBhTuIiIFlZju5ld77iIisdGbHKS7vZHO5vpI61C4i4gUUCIZzW31ZlO4i4gUyEsjYxw8fjLyJhlQuIuIFEyiP2hv36o9dxGR+EgcCq5M3aRwFxGJj95kig1dTbQ21EZdisJdRKRQEsnBSDsLy6RwFxEpgCOp0xwdGi2JM2VA4S4iUhBR31ZvNoW7iEgBJJKD1FQZG9e0Rl0KoHAXESmIRDLFhataaKitjroUQOEuIrJo7k4imYq8s7BMCncRkUU68OJJUqfGS6a9HRTuIiKL1lsCt9WbTeEuIrJIiWSK+poqLlzVEnUp0xTuIiKLlEgOsnFNK7XVpROppVOJiEgZmkhP0td/oqTa20HhLiKyKHsHhjk1ni6pM2VA4S4isiiJQ6VxW73ZFO4iIo
vQmxykpb6G8zqaoi5lBoW7iMgiJJIpNve0UVVlUZcyg8JdRGSBRifSPHOk9A6mgsJdRGTBnn5+iPG0l9TFS1MU7iIiC5QowStTpyjcRUQWqPdQio6mOrrbG6Mu5WUU7iIiC7Snf5AtPW2YldbBVMgj3M1su5kdNbO+OYb/gZk9ET76zCxtZisKX6qISOkYGZ1g79HhkjyYCvntue8AbphroLt/yt0vc/fLgD8CfuzuxwtUn4hISerrTzHplNyVqVPmDXd3fwDIN6xvBnYuqiIRkTKQSJbmlalTCtbmbmbLCPbwv1GoaYqIlKre5CDd7Y10NtdHXUpWhTyg+jZgV64mGTO7zcx2m9nugYGBAs5aRKS4EslUSZ4COaWQ4f4+5mmScfe73H2bu2/r6uoq4KxFRIrnpZExDh4/WbJNMlCgcDezNuD1wH2FmJ6ISClL9Aft7VtLeM+9Zr4RzGwncC3QaWZJ4HagFsDd7wxHeyfwQ3cfWaI6RURKRuJQcGXqpnIOd3e/OY9xdhCcMikiEnu9yRQbuppobaiNupQ56QpVEZGzlEgOsrWE29tB4S4iclaOpE5zdGi0pM+UAYW7iMhZ6Z3uCVJ77iIisZFIDlJTZWxc0xp1KTkp3EVEzkIimeLCVS001FZHXUpOCncRkTy5O4lkqmQ7C8ukcBcRydOBF0+SOjVe8u3toHAXEclbbwnfVm82hbuISJ4SyRT1NVVcuKol6lLmpXAXEclTIjnIxjWt1FaXfnSWfoUiIiVgIj1JX/+JsmhvB4W7iEhe9g4Mc2o8XRZnyoDCXUQkL4lDpX1bvdkU7iIieehNDtJSX8N5HU1Rl5IXhbuISB4SyRSbe9qoqrKoS8mLwl1EZB6jE2meOVI+B1NB4S4iMq+nnx9iPO0lfVu92RTuIiLzSExdmbpWe+4iIrHReyhFZ3Mda9oaoi4lbwp3EZF5JJKDbOlpx6w8DqaCwl1EJKfh0Qn2DgyXRWdhmRTuIiI59PWncKfkb4g9m8JdRCSHRBl185tJ4S4ikkNvMkV3eyMdzfVRl3JWFO4iIjkkkoNl01lYJoW7iMgcjo+Mcej4qbK6MnWKwl1EZA7l2t4OCncRkTklkkE3v5u6Fe4iIrGRSA6yoauJ1obaqEs5awp3EZEs3J3eZKrszm+fonAXEcniyInTDAyNlmV7OyjcRUSy6i2z2+rNpnAXEckikRykpsrYuKY16lIWROEuIpJFIpniwlUtNNRWR13KgijcRURmcfeyvTJ1yrzhbmbbzeyomfXlGOdaM3vCzJ40sx8XtkQRkeI68OJJTpyeKNv2dshvz30HcMNcA82sHfgs8HZ33wi8pzCliYhEo7eMr0ydMm+4u/sDwPEco/wqcK+7HwzHP1qg2kREIpFIpqivqeLCVS1Rl7JghWhzvxBYbmb3m9ljZvbBAkxTRCQyieQgG9e0UltdvoclC1F5DXAF8MvA9cAfm9mF2UY0s9vMbLeZ7R4YGCjArEVECmsiPUlf/4mybm+HwoR7Evi+u4+4+zHgAWBrthHd/S533+bu27q6ugowaxGRwto7MMyp8XRZnykDhQn3+4BrzKzGzJYBrwGeLsB0RUSKLlHmV6ZOqZlvBDPbCVwLdJpZErgdqAVw9zvd/Wkz+z6QACaBz7v7nKdNioiUst7kIC31NZzX0RR1KYsyb7i7+815jPMp4FMFqUhEJEKJZIrNPW1UVVnUpSxK+R4KFhEpsNGJNM8cKf+DqaBwFxGZ9vTzQ4ynna1lfPHSFIW7iEho+p6pa7XnLiISG72HUnQ217GmrSHqUhZN4S4iEkokB9nS045ZeR9MBYW7iAgAw6MT7B0YLuvOwjIp3EVEgL7+FO6U7Q2xZ1O4i4iQcTBVe+4iIvHRm0zR3d5IR3N91KUUhMJdRATK/rZ6syncRaTiHR8Z49DxU7G4MnWKwl1EKl7c2ttB4S4iQiKZwgw2dyvcRURiI5EcZENnEy0NtVGXUjAKdxGpaO5Obz
IVm/PbpyjcRaSiHTlxmoGh0Vi1t4PCXUQqXO/UbfVi0BNkJoW7iFS0RHKQmirj0nNaoy6loBTuIlLRHvr5i1x8TgsNtdVRl1JQCncRqVi9hwZ54tAgN17eE3UpBadwF5GKdfeufTTX1/CebQp3EZFYeOHEab6TeJ73bOuJ1fntUxTuIlKRvvjwAdLu3HL1+qhLWRIKdxGpOKfH03z50YO88ZJVrOtoirqcJaFwF5GKc98T/RwfGePW154XdSlLRuEuIhXF3dn+4H4uXt3ClRtWRF3OklG4i0hFeejnL/LsC0Pc+rrzMLOoy1kyCncRqSjbH9xHR1Mdb9+6JupSlpTCXUQqxr5jI/zbs0d5/5XrYndF6mwKdxGpGF94aD81VcavXXlu1KUsOYW7iFSE1Klxvrb7EG/bsoaVLQ1Rl7PkFO4iUhH+afchTo6l+VCMT3/MpHAXkdhLTzo7HtrPq9evYHPMbsoxF4W7iMTej556geRLp7j1deujLqVoFO4iEnvbd+2jZ3kjv3Tp6qhLKZp5w93MtpvZUTPrm2P4tWaWMrMnwsefFL5MEZGF6etP8ei+49xy9Xqqq+J70dJsNXmMswO4A7gnxzj/z93fWpCKREQKaPuufSyrq+Y929ZGXUpRzbvn7u4PAMeLUIuISEEdHTrNt3sP854remhrjF+f7bkUqs39KjPrNbPvmdnGAk1TRGRRvvTIQcbTzi0VcvpjpnyaZebzOLDO3YfN7C3AN4ELso1oZrcBtwGce278rxATkeicHk/zpZ8c4LqLV3JeZzz7bM9l0Xvu7n7C3YfD598Fas2sc45x73L3be6+raura7GzFhGZ07d7D3NseIxbX1d5e+1QgHA3s9UW9ptpZq8Op/niYqcrIrJQ7s72Xfu5aFULV7+iI+pyIjFvs4yZ7QSuBTrNLAncDtQCuPudwLuB3zGzCeAU8D539yWrWERkHo88d5ynnz/BJ27cHOs+23OZN9zd/eZ5ht9BcKqkiEhJuHvXPpYvq+Udl3dHXUpkdIWqiMTKwRdP8qOnX+D9r4l/n+25KNxFJFZ2PLSfajM+cNW6qEuJlMJdRGJj6HTQZ/svbzmHVa3x77M9F4W7iMTGP+1OMjw6UTF9tueicBeRWEhPOl94eD9XrFvOZWvboy4ncgp3EYmFf3vmKAdePMmt2msHFO4iEhPbH9zHmrYGrt+4KupSSoLCXUTK3lOHT/Dwcy/ywavXU1OtWAOFu4jEwN279tFYW837XlVZfbbnonAXkbJ2bHiU+3oP864rumlfVhd1OSWjEF3+Lsh/vjDMDZ9+YMHvr62uoq6mitpqo66mmrpqC3+vom56WBX1U69l/JwxbsbP+uoqamuC958ZN3M+Z6ZfVUG36xIpZV/+yUHGJia55WodSM0UWbjX1VSxrmPZgt7rDuPpScbTztjEJCdOjTM2MclYepLx9CRjE8HP0Ykzv08WuCuz2mqbuXHI2BjU1ljWDcTMDUzGRqO6evo9dbM3LjUzN1ZT78m+0dKGRyrL6ESaLz5ygGsv6uL8lc1Rl1NSIgv3dR3L+PsPbCva/NKTPr0BGMsI/bGMn+OzNhBj4cZjrg3G9LD0JGMT/rJpjE5MMjw6MWsazujEJGMT6WDjlJ4kXeAtT03VzA1P/exvHrO/1czzzaa+ppoLVjazpaeNlRV+1Z+Uln9JPM/A0KhOf8wisnAvtuoqo7GumkZKryOh9KRnbCSybUx8xsZkdNY4MzdQzlg6feY9szdEGa+NjE5kvGcy67ef8fTMDc+q1nq29LSzpbuNzT1tbOlpZ0WT2jml+II+2/dx/spmrrkg6/2BKlrFhHspq64yqquqS7IHu8lJ5+R4mmePnKD3UIo9/SkSyUH+79MvMNVrf8/yRraEQb+lu41NPW20NlTWzYil+HYfeIm+/hP8xTs3VWyf7bko3CWnqiqjub6GK9at4Ip1K6ZfHzo9Tl//Cfb0D9KbTLEnmeK7e45MD9/Q2cTmnj
Y2d7exdW07G9e0sqxOq5sUzvYH99HWWMuNl/dEXUpJ0n+bLEhLQy1XvaKDqzJuYfbSyBh7+s/s3T+67zj3PXEYgCqD81c2B3v3Yehfck5rSX5bkdJ36PhJfvDkEf7L619BY53WoWwU7lIwy5vq+IULu/iFC8/c/Pzo0Gn2JFMkkkHo3//sUb7+WBIIDvxetLpluklnc3cbF61uoVZXGMo87nl4P2bGByu8z/ZcFO6ypFa2NHDdJQ1cd0nQ34e783zqNIlksHe/pz9oztn56CEgOEX20nNap/fut65t5xVdzVTr9E4JjYxO8JWfHuLNm1ZzTltj1OWULIW7FJWZsaa9kTXtjdywaTUQBP7B4yen9+57Dw3yjceS3PPwAQAaa6vZ1N06o0lnfUeTzuevUN94PMnQ6QlufZ1Of8xF4S6RMzPWdTSxrqOJt21dAwRn6Tx3bIREcnA69L/0kwP8w4OTALQ01LA5PB1za9ik07O8UWdNxNzkpHP3rv1ctradV567POpySprCXUpSVZVx/spmzl/ZzI2vDM6GmEhP8p9Hh9mTTNEbNulsf3Df9Ln4K5rq2NzdNqNJp9JvtRY39//sKPuOjfCZmy+PupSSp3CXslFTXcUl57RyyTmtvDfs/W90Is2zR4am2/ATyRSfvf/Y9FW/K1vqzxyw7WljS3cbHc31UX4MWYTtD+5ndWsDbw6b9GRuCncpa/U11WFbfDsQnDlxaizNU8+HZ+iEe/n/+szR6YuuutuDi66CsA9Cv61RF12VumePDPHg3mP8wfUX6YyqPCjcJXYa66qzXnT15OETM9rwv9d35qKr9R3LZhyw3dTdRlO9/j1KyY6H9lFfU8WvvvrcqEspC1p7pSK0NNRy5YYOrtxw5qKrwZNj4QVXQZPO7v3H+VZvcNGVGZzf1TzdlLNlbTuX6qKryBwfGePex/u58ZU9LFdfRnlRuEvFal9WxzUXdHHNBWcuuhoYGmVP/+B0k84DPzvGvY/3A0EfQBeuamFrRpPORatbqKtRE8FS2/noQUYnJrn1teujLqVsKNxFMnS11POGi1fxhovPXHR15MTpGQdsv//kEb7y0/Ciq+oqLjmnZbqHzC09bZzf1az7eBbQ2MQk9zy8n2su6OSCVS1Rl1M2FO4iOZgZ57Q1ck5bI9dvPHPR1aHjp0iEe/iJ5CDf/I/D/OMjB4HgoquNa1rDwA9C/zxddLVg3+t7nhdOjPKJG7dEXUpZUbiLnCUz49yOZZzbsYy3bpl50dWe6cBPsfPRg9y9K7joqrm+hk3drcEFV2GTztoVuuhqPu7O9gf3saGziddn9Fkk81O4ixRA5kVX77z8zEVXeweGSRxKkegfZE8yxd279jOWDgK/fVnt9EVXU006q1sbFPgZHj8YdCn957+yUd98zpLCXWSJ1FRXcfHqVi5ePfOiq58dGQ6adA6lSPSnuPPHz01fdNXVUs+W7jbWdTTR1VJPZ3MdXS3104+OpvqK6kRt+659tDbUTF+lLPlTuIsUUX1NdXATk5423v+a4LXT42mePHyCPclBEv3BWTqPPPciI2Ppl73fDDqa6uhsPhP4XbOed4Y/25fVlvW3gMODp/h+3xF+83Xn6ZqDBdASE4lYQ201V6xbzhXrZnaENTI6wbHhUQaGRqd/DgyNMjA8ysDQGAPDozw3MMLA0Oh0U0+m2mqjs7n+zIYgYyMwY+PQUk9TXXXJbQimegX94NXroy2kTCncRUpUU30NTfU1rOtoyjmeu3Pi1EQY+qPTPzM3CEdSp+nrT3FseJRJf/k0GmqrZmwAsn0zmHqtGBdynRybYOejB7l+4yq629Vn+0LMG+5mth14K3DU3TflGO9VwCPATe7+9cKVKCK5mBlty2ppW1bL+Subc46bnnReOjn28m8DU78Pj7Lv2AiP7jvOSyfHs06jpaFm5gYgyzeDrpZ6VjTVLbgPmHsf7yd1apxbX6s+2xcqnz33HcAdwD1zjWBm1cAngR8Upi
wRWQrVVWeaauYznp7kxeGx8NvAaY6FTUGZzUNPHz7BA0OjDI1OvOz9ZrB8Wd2s5qCMA8TNDXS2BMOXL6ubPhsm6LN9H1t62l7WVCX5mzfc3f0BM1s/z2gfBr4BvKoANYlICaitrmJ1WwOr2xqAtpzjnh5Pz2gSynac4MCBEY6eGGV04uXHB4KNTnCguKm+hp8PjPDpmy4rueMA5WTRbe5m1g28E3gD84S7md0G3AZw7rnq2U0kLhpqq1m7YhlrVyzLOZ67Mzw6kbEBGGNg6HTGcYLgm8Jrz+/gLZvPKVL18VSIA6qfBj7i7un5trLufhdwF8C2bduyHNYRkTgzM1oaamlpqGVDV+7jA7I4hQj3bcBXwmDvBN5iZhPu/s0CTFtERBZg0eHu7tOHs81sB/AdBbuISLTyORVyJ3At0GlmSeB2oBbA3e9c0upERGRB8jlb5uZ8J+butyyqGhERKQjdUUBEJIYU7iIiMaRwFxGJIYW7iEgMmXs01xKZ2QAwCKRmDWrL47VO4NjSVfcy2WpayvfnM36uceYalu/r2cYr5jJf7PI+22ks1fKea1ilr+OLXd65hlfC8l7n7vPfc9DdI3sAdy3kNWB31HUu5fvzGT/XOHMNy/f1Of4GRVvmi13eZzuNpVreOZZlRa/ji13euYZXyvLO5xF1s8y3F/FaMS12/mf7/nzGzzXOXMPyfb3cl/fZTmOplvdcwyp9HV/s8s41vFKW97wia5ZZDDPb7e7boq6jkmiZF5eWd3HFcXlHvee+UHdFXUAF0jIvLi3v4ord8i7LPXcREcmtXPfcRUQkB4W7iEgMKdxFRGIoduFuZu8ws8+Z2X1m9qao64k7M9tgZv9gZl+Pupa4MrMmM/tCuF6/P+p6KkEc1uuSCncz225mR82sb9brN5jZs2a218z+MNc03P2b7v5bwC3ATUtYbtkr0PJ+zt1/Y2krjZ+zXPY3Al8P1+u3F73YmDibZR6H9bqkwh3YAdyQ+YKZVQN/C7wZuBS42cwuNbPNZvadWY+VGW/9aPg+mdsOCre85ezsIM9lD/QAh8LR0kWsMW52kP8yL3uFuIdqwbj7A2a2ftbLrwb2uvtzAGb2FeBX3P3jwFtnT8OCm7l+Avieuz++tBWXt0Isb1mYs1n2QJIg4J+g9HbIysZZLvOniltd4ZXDitLNmb0WCFb07hzjfxh4I/BuM/vtpSwsps5qeZtZh5ndCVxuZn+01MXF3FzL/l7gXWb2d0R/2XzcZF3mcVivS2rPfQ6W5bU5r7xy988An1m6cmLvbJf3i4A2ooWRddm7+wjwoWIXUyHmWuZlv16Xw557Elib8XsPcDiiWiqBlnd0tOyLL7bLvBzC/afABWZ2npnVAe8DvhVxTXGm5R0dLfvii+0yL6lwN7OdwMPARWaWNLPfcPcJ4PeAHwBPA19z9yejrDMutLyjo2VffJW2zNVxmIhIDJXUnruIiBSGwl1EJIYU7iIiMaRwFxGJIYW7iEgMKdxFRGJI4b4EzMzN7IsZv9eY2YCZfWee911mZm/JMXybmS2qawUz6zKzn5jZf5jZNYuZVqGZ2cfM7I0FmtZ+M+ssxLQKOU0zu9jMngiX/yvmm76Zvd/MEuHjITPbupj5L8RCPreZfX4hvSua2S1mtmax05Hy6FumHI0Am8ys0d1PAb8E9OfxvsuAbcB3Zw8wsxp33w3sXmRt1wHPuPuv5/sGM6t294J0NRt+jolsw9z9TwoxjxL3DuA+d789z/H3Aa9395fM7M3AXcBrcr2hkH+vhQjn/5sLfPstQB9hFwCLmI64ux4FfgDDwF8C7w5/vwf4CPCd8PcmYDvBpc//QdDFaB1wEBgg6Nr1JuBPCf6Zfwh8Gbg2YxrNwN3AHiABvAuoJuizui98/X/MquuyWfNoBG4Ox+0DPjnrM3wM+AnwuozXLwEezfh9PZAIn/9J+Jn6wrqnLpK7P1wePwZuJwis2nBYK7AfqA1rn1pm+4E/Ax4P67
s4fL0L+FH4+t8DB4DOLH+D/VOvA78GPBp+5r8Pl9PvAH+VMf4twN/MNX7mNMO/378AveFnvSnL/C8DHgn/Nv8MLAfeAhwh2ND/e66a51ivlgP9Oda56b8XcEW4vB8juPrynHC8V4U1PQx8CujL+Px3ZEzvO8C1WZblN8NpPgnclmP+9xPsqLw9XI5PAM8C++ZaV4B3h9N5ljPr5/3AtvA9udbVvwj/Ho8Aq6LOgFJ4RF5AHB/hyrYF+DrQEK6o13ImmP8S+LXweTvwszAwZv+D/Wn4j9QY/p45jU8Cn84Yd3n4D/2jjNfas9Q2PQ9gDUHYdxF8i/s34B3hMAfeO8fnewLYED7/CPDR8PmKjHG+CLwtfH4/8NmMYXdnzOc24K/D5zuYGe4fDp//V+Dz4fM7gD8Kn98Q1jlnuBNsjL7NmY3JZ4EPhp95b8b43yMIpazjz5rmu4DPZby3Lcv8EwR73BCE3qcz/qa/P8dy3Z/ts2QM//2p5ZBl2PTfi2BD+RDQFf5+E7A9fN4HXB0+/wRnH+4rwp+N4bQ6sq0vZIRyxmtfA343j3Vl2+zpMP+6OvX+vyJcHyv9oTb3JeLuCYK92pt5eTPLm4A/NLMnCFbeBuDcOSb1LQ+admZ7Ixl3mnL3l4DngA1m9jdmdgNwYp4yXwXc7+4DHjSVfAn4hXBYGvjGHO/7GvDe8PlNwFfD578YtufvAd4AbMx4z1cznn+eM13Yfogg7LO5N/z5GMGyhCCAvwLg7t8HXprrw4WuI9jo/TRc3tcRbJgGgOfM7Eoz6wAuAnbNNf6sae4B3mhmnzSza9w9lTnQzNoINqw/Dl/6AmeW64KY2S8Cv0GwMc0m8+91EbAJ+FH4GT4K9JhZO9Di7g+F4315AaX8NzOb2kNeC1yQZf7Z6v+fwCl3n1pnc60r2eRaV8cINkYwc12paGpzX1rfAv43wR53R8brBrzL3Z/NHNnMsrWljswxbWNWP+setMtuBa4HfpcggG/NUV+2vqynnPa5222/CvyTmd0bzNb/08waCPZyt7n7ITP7U4KN1ss+h7vvMrP1ZvZ6giaPGfe0zDAa/kxzZl3NVXM2BnzB3bPdcOGrBMvoGeCf3d3DO3nNNf5U/T8zsysImlkBrP1WAAAC2klEQVQ+bmY/dPePnWVdeTOzLQQbxDd70M94Npl/LwOedPerZk1neY7ZTDDzBIuG2SOY2bUEOxVXuftJM7s/Y7w51xczuw54D2EY57GuZJ1MjmHjHu62M3NdqWjac19a24GPufueWa//APhwGCSY2eXh60NAS57T/iFBb3aE01gentFQ5e7fAP4YeOU80/gJ8Hoz67TgXpI3E7TT5uTuPyf4J/pjzuyRT/1zHjOzZoL201zuAXYy9177XB4k/NZgZm8iaI7K5V8J7sq1MnzPCjNbFw67l+AA582c+Ry5xid8bQ1w0t3/kWDjPWM5h3vyL2WcjfQB8liu2ZjZuWGdH3D3n+X5tmeBLjO7KpxGrZltDL/dDZnZleF478t4z37gMjOrMrO1BLefm60NeCkM9ouBK7OMM7v+dQRB/t6Mb6C51pW5/gcWtK5WMm3hlpC7J4H/k2XQnwOfBhJhwO8nuD/pv3Omuebj80z+fwF/a8Gd3NMEBx9/DtxtZlMb7Zy3B3P35y24hdi/E+wZfdfd78vnsxGE4aeA88JpDZrZ5wiaLPYTHCzL5UvhZ9iZ5/ym/Bmw08xuIvjnfp4gELJy96fM7KPAD8PlMk7wreZA+E3nKeBSd390vvEzJrsZ+JSZTYbDfyfLrH8duNPMlhE0l+V7J6VEOF0Imr9aCb71fTbcF5hw9225JuDuY2b2buAzYRNRDcH69iRB087nzGyEoElwqklpF8GB7qkDltnuP/x94LfNLEGwAXkkj89zS1j/P4f1H3b3t+RYV3YQLLdTwPQ3j0WuqxVJXf5KJMLw+RV3/8BZvq8eSLv7RLhn+nfuftmSFBlDZtbs7sPh8z8kOIvmv0
dcliwB7blL0ZnZ3wBvJmizPlvnAl8L96rHgN8qZG0V4JfDPeAagm8jt0RbjiwV7bmLiMSQDqiKiMSQwl1EJIYU7iIiMaRwFxGJIYW7iEgMKdxFRGLo/wMrlHM1Wt/pRQAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying levels of L2 regularization')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# L1 regularization"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]\n",
"[1.463373357714063, 1.4633409680931317, 1.4630506454349392, 1.4603658739928238, 1.4355688529629576, 1.7677660966171576, 4.800777158151935]\n"
]
}
],
"source": [
"params = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, param, 'l1', False) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying levels of L1 regularization')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using L1 regularization can encourage sparse weight vectors. Does this hold true in this case? We can find out by examining the number of entries in the weight vector that are zero, with increasing levels of regularization:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"L1 (1.0) number of zero weights: 4\n",
"L1 (10.0) number of zero weights: 33\n",
"L1 (100.0) number of zero weights: 58\n"
]
}
],
"source": [
"# Train three models with increasing L1 regularization (10 iterations, step size 0.1, no intercept)\n",
"model_l1 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=1.0, regType='l1', intercept=False)\n",
"\n",
"model_l1_10 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=10.0, regType='l1', intercept=False)\n",
"\n",
"model_l1_100 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=100.0, regType='l1', intercept=False)\n",
"\n",
"# Count the weights that the L1 penalty has driven to exactly zero\n",
"print (\"L1 (1.0) number of zero weights: \" + str(sum(model_l1.weights.array == 0)))\n",
"\n",
"print (\"L1 (10.0) number of zero weights: \" + str(sum(model_l1_10.weights.array == 0)))\n",
"\n",
"print (\"L1 (100.0) number of zero weights: \" + str(sum(model_l1_100.weights.array == 0)))"
]
},
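{
"cell_type": "markdown",
"metadata": {},
"source": [
"The growing number of zero weights is consistent with how MLlib applies the L1 penalty: its `L1Updater` is documented as applying a soft-thresholding (proximal) step after each gradient step, which sets small weights to exactly zero. The cell below is a minimal NumPy illustration of soft-thresholding on a made-up weight vector. It is a sketch of the idea, not MLlib's actual code, and the weights and shrinkage values are hypothetical. The zero count grows as the shrinkage grows, mirroring the effect of a larger `regParam`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def soft_threshold(w, shrinkage):\n",
"    # Shrink each weight towards zero by `shrinkage`; weights whose magnitude\n",
"    # is below `shrinkage` become exactly zero (the proximal step for an L1 penalty)\n",
"    return np.sign(w) * np.maximum(0.0, np.abs(w) - shrinkage)\n",
"\n",
"# Hypothetical weight vector, purely for illustration\n",
"w = np.array([0.05, -0.3, 1.2, -0.02, 0.6, -0.8])\n",
"\n",
"for shrinkage in [0.0, 0.1, 0.5, 1.0]:\n",
"    w_new = soft_threshold(w, shrinkage)\n",
"    print('shrinkage %.1f -> number of zero weights: %d' % (shrinkage, np.sum(w_new == 0)))"
]
},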
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intercept"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The final parameter option for the linear model is whether or not to use an intercept. An intercept is a constant bias term added to the model's prediction (equivalently, an extra weight on a feature that is always 1), and it effectively accounts for the mean value of the target variable. If the data is already centered (for example, standardized to zero mean), an intercept is not necessary; however, it often does not hurt to use one in any case."
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[False, True]\n",
"[1.414347928269498, 1.4431958566566532]\n"
]
}
],
"source": [
"params = [False, True]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, 1.0, 'l2', param) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"bar(params, metrics, color='lightblue')\n",
"pyplot.xlabel('Metrics without and with an intercept')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
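{
"cell_type": "markdown",
"metadata": {},
"source": [
"The intercept comparison above can be understood with a small least-squares experiment. The cell below is a NumPy sketch on synthetic data (the sample size, weights and the constant 10.0 are made up for illustration), and it uses ordinary least squares via `np.linalg.lstsq` rather than MLlib's SGD. Without an intercept the fit has to absorb the target's mean into the feature weights; appending a constant column of ones recovers the constant term as its weight; and centering both the features and the target removes the need for an intercept altogether."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Synthetic regression data with a known constant term of 10.0 (hypothetical values)\n",
"rs = np.random.RandomState(0)\n",
"X = rs.randn(200, 3)\n",
"true_w = np.array([1.5, -2.0, 0.5])\n",
"y = X.dot(true_w) + 10.0\n",
"\n",
"# 1) No intercept: the mean of y must be explained by the weights alone\n",
"w_no_int = np.linalg.lstsq(X, y, rcond=None)[0]\n",
"\n",
"# 2) Intercept: append a column of ones, whose weight is the intercept\n",
"X_int = np.hstack([X, np.ones((X.shape[0], 1))])\n",
"w_int = np.linalg.lstsq(X_int, y, rcond=None)[0]\n",
"\n",
"# 3) Centered data: subtract the means, so no intercept is needed\n",
"Xc = X - X.mean(axis=0)\n",
"yc = y - y.mean()\n",
"w_centered = np.linalg.lstsq(Xc, yc, rcond=None)[0]\n",
"\n",
"print('no intercept weights: ' + str(np.round(w_no_int, 3)))\n",
"print('with intercept (last entry = intercept): ' + str(np.round(w_int, 3)))\n",
"print('centered data weights: ' + str(np.round(w_centered, 3)))"
]
},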
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Decision Tree"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
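"As we have seen, decision tree models typically work on raw features: categorical features do not need to be converted into a binary (one-hot) vector encoding and can be used directly. We therefore create a separate function to extract the decision tree feature vector, which simply converts all the values to floats and wraps them in a NumPy array. Note that MLlib treats every feature as continuous unless it is listed in the `categoricalFeaturesInfo` map passed to `DecisionTree.trainRegressor`; with an empty map (`{}`), as used below, the categorical columns are split as ordered numeric values."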
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"def extract_features_dt(record):\n",
"    return np.array(list(map(float, record[2:14])))"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"def extract_label(record):\n",
"    return float(record[-1])"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"data_dt = data_rec.map(lambda r: LabeledPoint(extract_label(r),extract_features_dt(r)))\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Decision Tree feature vector: [1.0,0.0,1.0,0.0,0.0,6.0,0.0,1.0,0.24,0.2879,0.81,0.0]\n",
"Decision Tree feature vector length: 12\n"
]
}
],
"source": [
"first_point_dt = data_dt.first()\n",
"print (\"Decision Tree feature vector: \" + str(first_point_dt.features))\n",
"print (\"Decision Tree feature vector length: \" + str(len(first_point_dt.features)))"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.tree import DecisionTree"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Decision Tree predictions: [(16.0, 54.913223140495866), (40.0, 54.913223140495866), (32.0, 53.171052631578945), (13.0, 14.284023668639053), (1.0, 14.284023668639053)]\n",
"Decision Tree depth: 5\n",
"Decision Tree number of nodes: 63\n"
]
}
],
"source": [
"dt_model = DecisionTree.trainRegressor(data_dt,{})\n",
"preds = dt_model.predict(data_dt.map(lambda p: p.features))\n",
"actual = data_dt.map(lambda p: p.label)\n",
"true_vs_predicted_dt = actual.zip(preds)\n",
"print (\"Decision Tree predictions: \" + str(true_vs_predicted_dt.take(5)))\n",
"print (\"Decision Tree depth: \" + str(dt_model.depth()))\n",
"print (\"Decision Tree number of nodes: \" + str(dt_model.numNodes()))"
]
},
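{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tree above is trained with an empty `categoricalFeaturesInfo` map (`{}`), so MLlib treats all 12 features as continuous. The cell below is a sketch of passing the first eight features (`record[2:10]`, which in the hourly bike sharing layout appear to be season, year, month, hour, holiday, weekday, working day and weather) as categorical instead. MLlib expects a categorical feature with k categories to be indexed 0 to k-1, so each column's raw values are first re-indexed with `distinct().zipWithIndex()`. It assumes `data_rec` holds the split CSV fields, as in the cells above; the column ranges and the names `cat_mappings`, `extract_features_dt_cat` and `dt_model_cat` are assumptions for illustration. `maxBins` must be at least the largest number of categories (24 for an hour column)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from pyspark.mllib.regression import LabeledPoint\n",
"from pyspark.mllib.tree import DecisionTree\n",
"\n",
"# Assumption: fields 2..9 of each record are categorical, fields 10..13 are numeric\n",
"cat_cols = list(range(2, 10))\n",
"\n",
"# For each categorical column, map every distinct raw value to an index 0..k-1\n",
"cat_mappings = [data_rec.map(lambda r, c=c: r[c]).distinct().zipWithIndex().collectAsMap()\n",
"                for c in cat_cols]\n",
"\n",
"def extract_features_dt_cat(record):\n",
"    # Re-indexed categorical values first, then the raw numeric values\n",
"    cats = [float(m[record[c]]) for m, c in zip(cat_mappings, cat_cols)]\n",
"    nums = [float(v) for v in record[10:14]]\n",
"    return np.array(cats + nums)\n",
"\n",
"data_dt_cat = data_rec.map(lambda r: LabeledPoint(float(r[-1]), extract_features_dt_cat(r)))\n",
"\n",
"# Feature index -> number of categories, as expected by categoricalFeaturesInfo\n",
"cat_info = {i: len(m) for i, m in enumerate(cat_mappings)}\n",
"dt_model_cat = DecisionTree.trainRegressor(data_dt_cat, categoricalFeaturesInfo=cat_info, maxBins=32)\n",
"print('Categorical features info: ' + str(cat_info))\n",
"print('Decision Tree (categorical) depth: ' + str(dt_model_cat.depth()))"
]
},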
{
"cell_type": "markdown",
"metadata": {},
"source": [
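"We use the same approach as before to compute the mean squared error (MSE), mean absolute error (MAE) and root mean squared log error (RMSLE) for the decision tree model, using the `true_vs_predicted_dt` RDD:"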
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"17379\n",
"Mean Squared Error: 11611.4860\n",
"Mean Absolute Error: 71.1502\n",
"Root Mean Squared Log Error: 0.6251\n"
]
}
],
"source": [
"# Accumulate the per-record errors from the (actual, predicted) pairs\n",
"nn=[]     # squared errors\n",
"ab=[]     # absolute errors\n",
"s_log=[]  # squared log errors\n",
"for i in true_vs_predicted_dt.collect():\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"# Average them into MSE, MAE and RMSLE\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print (\"Mean Squared Error: %2.4f\" % t)\n",
"print(\"Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)"
]
},
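{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above collects every (actual, predicted) pair to the driver and accumulates the errors one at a time. As a sketch, the same three metrics can be computed with vectorized NumPy operations. The helper name `regression_metrics` is hypothetical, and it assumes the pairs fit in driver memory, as the loop already does; `np.log1p(x)` is `log(x + 1)`, matching the RMSLE formula used above. The helper also accepts a plain Python list of pairs, such as `true_vs_predicted_dt_log` built further down."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def regression_metrics(pairs):\n",
"    # pairs: iterable of (actual, predicted) tuples\n",
"    arr = np.array(list(pairs), dtype=float)\n",
"    actual, predicted = arr[:, 0], arr[:, 1]\n",
"    mse = np.mean((predicted - actual) ** 2)\n",
"    mae = np.mean(np.abs(predicted - actual))\n",
"    rmsle = np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))\n",
"    return mse, mae, rmsle\n",
"\n",
"mse, mae, rmsle = regression_metrics(true_vs_predicted_dt.collect())\n",
"print('Mean Squared Error: %2.4f' % mse)\n",
"print('Mean Absolute Error: %2.4f' % mae)\n",
"print('Root Mean Squared Log Error: %2.4f' % rmsle)"
]
},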
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Impact of training on log-transformed targets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
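"We perform the same analysis for the decision tree model: train it on log-transformed targets, convert the predictions back with `np.exp`, and recompute the error metrics:"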
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"data_dt_log = data_dt.map(lambda lp: LabeledPoint(np.log(lp.label), lp.features))\n",
"\n",
"dt_model_log = DecisionTree.trainRegressor(data_dt_log,{})\n",
"\n",
"preds_log = dt_model_log.predict(data_dt_log.map(lambda p: p.features))\n",
"\n",
"actual_log = data_dt_log.map(lambda p: p.label)"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"new=actual_log.zip(preds_log)"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(2.772588722239781, 3.6251613906330347),\n",
" (3.6888794541139363, 3.6251613906330347),\n",
" (3.4657359027997265, 1.985090627799027),\n",
" (2.5649493574615367, 1.985090627799027),\n",
" (0.0, 1.985090627799027)]"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new.take(5)"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"true_vs_predicted_dt_log=[]\n",
"for val in new.collect():\n",
" t,p=val[0],val[1]\n",
" x=np.exp(t),np.exp(p)\n",
" true_vs_predicted_dt_log.append(x)"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"17379\n",
"log - Mean Squared Error: 14781.5760\n",
"log - Mean Absolute Error: 76.4131\n",
"Root Mean Squared Log Error: 0.6406\n",
"Non log-transformed predictions:\n",
"[(16.0, 54.913223140495866), (40.0, 54.913223140495866), (32.0, 53.171052631578945)]\n"
]
}
],
"source": [
"# Same metrics as before, computed on the back-transformed (original-scale) values\n",
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_dt_log:\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"log - Mean Squared Error: %2.4f\" % t)\n",
"print(\"log - Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)\n",
"print(\"Non log-transformed predictions:\\n\" + str(true_vs_predicted_dt.take(3)))"
]
},
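{
"cell_type": "markdown",
"metadata": {},
"source": [
"Taking `np.log` of the label works here because the bike sharing counts appear to be at least 1 (the log-space label 0.0 shown above corresponds to a count of 1). A target that can be 0, which may happen in the student-selected Kaggle dataset, would give `-inf`. A minimal sketch, assuming `data_dt` is the RDD of `LabeledPoint`s used above, uses `np.log1p` for the forward transform and `np.expm1` to map labels and predictions back to the original scale; the `_log1p` variable names are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from pyspark.mllib.regression import LabeledPoint\n",
"from pyspark.mllib.tree import DecisionTree\n",
"\n",
"# log1p(y) = log(1 + y) stays finite for y = 0, unlike log(y)\n",
"data_dt_log1p = data_dt.map(lambda lp: LabeledPoint(np.log1p(lp.label), lp.features))\n",
"\n",
"dt_model_log1p = DecisionTree.trainRegressor(data_dt_log1p, {})\n",
"preds_log1p = dt_model_log1p.predict(data_dt_log1p.map(lambda p: p.features))\n",
"\n",
"# expm1 inverts log1p, so labels and predictions return to the original scale\n",
"true_vs_predicted_dt_log1p = (data_dt_log1p.map(lambda p: p.label)\n",
"                              .zip(preds_log1p)\n",
"                              .map(lambda tp: (float(np.expm1(tp[0])), float(np.expm1(tp[1])))))\n",
"print(true_vs_predicted_dt_log1p.take(3))"
]
},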
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cross validation for the decision tree (80/20 train/test split)"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"train_dt, test_dt = data_dt.randomSplit([0.8, 0.2], seed=12345)"
]
},
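{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above performs a single 80/20 hold-out split, so the test RMSLE depends on which records land in the 20%. A k-fold variant trains k models, each evaluated on a different held-out fold, and averages their RMSLE. The sketch below assumes `data_dt` is the RDD of `LabeledPoint`s used above; `cv_rmsle_dt` and its default arguments are hypothetical choices rather than part of the original assignment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from pyspark.mllib.tree import DecisionTree\n",
"\n",
"# Hypothetical helper: k-fold cross-validated test RMSLE for a DecisionTree regressor\n",
"def cv_rmsle_dt(data, k=5, maxDepth=5, maxBins=32, seed=12345):\n",
"    data = data.cache()  # every fold is derived from data, so cache it once\n",
"    # k disjoint folds of roughly equal size (randomSplit normalizes the weights)\n",
"    folds = data.randomSplit([1.0] * k, seed=seed)\n",
"    scores = []\n",
"    for i in range(k):\n",
"        test = folds[i]\n",
"        # Train on the union of the other k - 1 folds\n",
"        train = data.context.union([folds[j] for j in range(k) if j != i])\n",
"        model = DecisionTree.trainRegressor(train, {}, impurity='variance',\n",
"                                            maxDepth=maxDepth, maxBins=maxBins)\n",
"        preds = model.predict(test.map(lambda p: p.features))\n",
"        tp = test.map(lambda p: p.label).zip(preds)\n",
"        msle = tp.map(lambda ap: (np.log(ap[1] + 1) - np.log(ap[0] + 1)) ** 2).mean()\n",
"        scores.append(float(np.sqrt(msle)))\n",
"    return float(np.mean(scores)), scores\n",
"\n",
"mean_rmsle, fold_rmsle = cv_rmsle_dt(data_dt, k=5)\n",
"print('Per-fold RMSLE: %s' % fold_rmsle)\n",
"print('Mean 5-fold RMSLE: %2.4f' % mean_rmsle)"
]
},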
{
"cell_type": "markdown",
"metadata": {},
"source": [
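"To measure the impact of parameter settings on the decision tree, the helper below trains a model on the training split and returns its RMSLE on the test split:"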
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [],
"source": [
"def evaluate_dt(train, test, maxDepth, maxBins):\n",
"    # Train a regression tree with the given maximum depth and number of bins\n",
"    model = DecisionTree.trainRegressor(train, {}, impurity='variance', maxDepth=maxDepth, maxBins=maxBins)\n",
"\n",
"    preds = model.predict(test.map(lambda p: p.features))\n",
"    actual = test.map(lambda p: p.label)\n",
"    tp = actual.zip(preds)\n",
"\n",
"    # Root Mean Squared Log Error over the test set\n",
"    sq_log_errors = [(np.log(pred + 1) - np.log(act + 1))**2 for act, pred in tp.collect()]\n",
"    return np.sqrt(sum(sq_log_errors) / len(sq_log_errors))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tree depth"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We would generally expect training performance to improve as trees become more complex (that is, deeper). A lower maximum depth acts as a form of regularization, so, as with L1 or L2 regularization in linear models, there may be a tree depth that is optimal with respect to test-set performance.\n",
"\n",
"Here, we increase the maximum tree depth and observe the impact on test-set RMSLE, keeping the number of bins at the default of 32:"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 2, 3, 4, 5, 10, 20]\n",
"[1.0009455704281573, 0.9071380409401831, 0.8083991513814845, 0.7316093046671605, 0.6252775817287765, 0.43025139584509925, 0.4467589576168234]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEKCAYAAADpfBXhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xt8VPWd//HXJ/cQ7hAQIZCAKCIqQgC5qNS2Lri7gHepd1QEsd3t/tyt3bb+XNtt163tdrUgoiLiDe9KrRWveOEmAQRBLoZ7ACFyv4eQ7/5xTnQYJslAJjkzk/fz8ZgHM+d8Z+bNmck7J2dmvmPOOUREJLmkBB1ARERiT+UuIpKEVO4iIklI5S4ikoRU7iIiSUjlLiKShFTuIiJJSOUuIpKEVO4iIkkoLag7bt26tcvPzw/q7kVEEtKCBQu+cc7l1jQusHLPz8+nqKgoqLsXEUlIZrY+mnE6LCMikoRU7iIiSUjlLiKShFTuIiJJSOUuIpKEaix3M5tsZtvMbGkV683MHjKzYjNbYma9Yh9TRERORDR77lOAIdWsHwp09U+jgUdqH0tERGqjxnJ3zn0M7KhmyHBgqvPMBZqbWbtYBQy3dNNuHnh7Bfp6QBGRqsXimHt7YGPI5RJ/WZ1YuGEnj8xczZw12+vqLkREEl4syt0iLIu4W21mo82syMyKSktLT+rOri7Mo02TTB56/6uTur6ISEMQi3IvAfJCLncANkca6Jyb5JwrdM4V5ubWODVCRFnpqYy5qAtz1+xgnvbeRUQiikW5Twdu9N81cz6w2zm3JQa3W6Uf9etI68aZPPSB9t5FRCKJ5q2QzwNzgDPMrMTMbjWzMWY2xh/yFrAGKAYeA+6ss7Q+b++9M7OKt1O0rrrXekVEGiYL6l0nhYWFrjazQh4oK+eCBz7krPbNmDqqbwyTiYjELzNb4JwrrGlcwn5CtVFGGrdf2JmPV5WyaMPOoOOIiMSVhC13gBvO70SLRuk8/EFx0FFEROJKQpd7TmYat13QmQ9WbGNJya6g44iIxI2ELneAG/t3oll2Og+9r713EZFKCV/uTbLSuXVQAe8t38rSTbuDjiMiEhcSvtwBbhqQT5OsNP6sY+8iIkCSlHuz7HRuGVjA28u+ZvmWPUHHEREJXFKUO8Cogfk0ztTeu4gIJFG5N2+Uwc0D8nlr6RZWbd0bdBwRkUAlTbkD3DqogOz0VO29i0iDl1Tl3iIngxv75/OXJZsp3rYv6DgiIoFJqnIHuP2CArLSUhn/ofbeRaThSrpyb9U4kxv6d+KNzzex9pv9QccREQlE0pU7wO0XdCY9NUV77yLSYCVluec2yeS6fp14bdEmNmw/EHQcEZF6l5TlDnDHRZ1JTTHtvYtIg5S05d62aRYj++TxysISNu7Q3ruINCxJW+4AYwZ3IcWMRz5aHXQUEZF6ldTl3q5ZNlf36cBLRRvZvOtg0HFEROpNUpc7wNjBpwEwUXvvItKAJH25t2+ezZW9OzDts418vftQ0HFEROpF0pc7wJ2DT6PCOe29i0iD0SDKPa9lIy7v1Z7nP9vAtj3aexeR5Ncgyh1g3PdOo7zCMenjNUFHERGpcw2m3Du1ymF4z1N5Zt56SvceDjqOiEidajDlDt7ee1l5BY9/or13EUluUZW7mQ0xs5VmVmxm90RY38nM3jezJWY208w6xD5q7XXJbcw/nnsqU+esZ+f+sqDjiIjUmRrL3cxSgfHAUKA7MNLMuocNexCY6pw7B7gf+F2sg8bKuO+dxsEjR3lqzrqgo4iI1Jlo9tz7AsXOuTXOuTJgGjA8bEx34H3//IcR1seN09s24QdntmXK7HUcKCsPOo6ISJ2IptzbAxtDLpf4y0ItBq7wz18GNDGzVuE3ZGajzazIzIpKS0tPJm9MjB3chV0HjjDts401DxYRSUDRlLtFWObCLt8NXGRmi4CLgE3AcbvFzrlJzrlC51
xhbm7uCYeNld6dWtC3oCWPf7KGsvKKwHKIiNSVaMq9BMgLudwB2Bw6wDm32Tl3uXPuPOAX/rLdMUtZB8YO7sLm3YeYvnhzzYNFRBJMNOU+H+hqZgVmlgFcC0wPHWBmrc2s8rZ+DkyObczYG3x6Lt1OacLEj1ZTURH+h4iISGKrsdydc+XAXcAMYDnwonNumZndb2bD/GGDgZVmtgpoC/xnHeWNGTNj7OAuFG/bx3vLtwYdR0Qkpsy5YPZaCwsLXVFRUSD3Xan8aAWDH5xJbpNMXh07ALNILy+IiMQPM1vgnCusaVyD+oRquLTUFO64sDOLNuzis7U7go4jIhIzDbrcAa4qzKNVToa+ik9EkkqDL/es9FRGDSpg5spSvty8J+g4IiIx0eDLHeD68zvRODNNX+YhIklD5Q40y07nun4deXPJZjZsPxB0HBGRWlO5+0YNKiAtJYVJn2jvXUQSn8rd17ZpFlf0bs+LRSX6Mg8RSXgq9xCjL+zCkaMVPDlrbdBRRERqReUeoqB1Dpf2aMfTc9ez99CRoOOIiJw0lXuYMRd1Ye+hcp6btyHoKCIiJ03lHubsDs0YdFprHv90LYeOHA06jojISVG5RzB2cBdK9x7mtUWbgo4iInJSVO4RDOjSinM6NOPRj1ZzVNMBi0gCUrlHYGaMvagL67Yf4O2lXwcdR0TkhKncq3DJWafQuXUOj3xUTFDTIouInCyVexVSU4w7LurM0k17+LT4m6DjiIicEJV7NUac1562TTN5ZKamJBCRxKJyr0ZmWiq3DerM7NXbWbxxV9BxRESipnKvwch+HWmapemARSSxqNxr0DgzjZsG5PP2sq9ZXbov6DgiIlFRuUfhpgH5ZKSmMOmjNUFHERGJiso9Cq0bZ3JNnzxeXVTC17sPBR1HRKRGKvco3X5BZyocPPGp9t5FJP6p3KOU17IR/3hOO56bt4FdB8qCjiMiUi2V+wkYM7gL+8uO8vSc9UFHERGpVlTlbmZDzGylmRWb2T0R1nc0sw/NbJGZLTGzS2MfNXjdTmnKxd3a8OTsdew7XB50HBGRKtVY7maWCowHhgLdgZFm1j1s2C+BF51z5wHXAhNiHTRe/OT7Xdmxv4yH3v8q6CgiIlWKZs+9L1DsnFvjnCsDpgHDw8Y4oKl/vhmwOXYR40vPvOZc2yePyZ+uZdXWvUHHERGJKJpybw9sDLlc4i8LdR9wvZmVAG8BP45Jujj1b0O6kZOZxr1vLNWMkSISl6Ipd4uwLLzRRgJTnHMdgEuBp83suNs2s9FmVmRmRaWlpSeeNk60zMng34acwdw1O5i+OGn/SBGRBBZNuZcAeSGXO3D8YZdbgRcBnHNzgCygdfgNOecmOecKnXOFubm5J5c4TlzbpyPndGjGb/66nL2HjgQdR0TkGNGU+3ygq5kVmFkG3gum08PGbAC+D2BmZ+KVe+LumkchNcX49fAefLPvMH96Ty+uikh8qbHcnXPlwF3ADGA53rtilpnZ/WY2zB/2/4DbzWwx8Dxws2sAB6PPzWvOyL4dmTJ7HSu+3hN0HBGRb1lQHVxYWOiKiooCue9Y2rm/jIv/MJOubZrwwh3nYxbpJQoRkdgwswXOucKaxukTqrXUIieDnw3pxmfrdvDaok1BxxERAVTuMXF1YR4985rz27eWs/ugXlwVkeCp3GMgJcX4zYgebN9fxv+8uyroOCIiKvdY6dG+Gdf368TUOev4crNeXBWRYKncY+juS86geaMM7n1jKRUVSf9mIRGJYyr3GGrWKJ17hnajaP1OXtWLqyISIJV7jF3ZqwO9Ojbnd28tZ/cBvbgqIsFQucdYSorx6xE92HmgjD+8uzLoOCLSQKnc68BZpzbjxv75PDN3PUs37Q46jog0QCr3OvLTH55Oy5wMfqUXV0UkACr3OtIsO52fDz2TRRt28fKCkqDjiEgDo3KvQ5f3ak+f/Bb819sr2HWgLOg4ItKAqNzrkJlx//Ae7D54hN/P0IurIl
J/VO517Mx2Tbmpfz7PfbaBJSW7go4jIg2Eyr0e/PMPu9K6cSa/el0vropI/VC514OmWen84tIzWVyymxeKNtZ8BRGRWlK515PhPU+lX0FLHnh7BTv368VVEalbKvd6YuZ9cnXvoXL+e8aKoOOISJJTudej09s2YdTAfKbN30jxtr1BxxGRJKZyr2djLupCemoKT85aF3QUEUliKvd61qpxJiN6nsqrCzdp1kgRqTMq9wDcMrCAg0eOMm3+hqCjiEiSUrkH4Mx2TTm/c0uemr2O8qMVQccRkSSkcg/IqIEFbN59iHe+3Bp0FBFJQir3gHz/zLbktczmyVlrg44iIklI5R6Q1BTjpv75zF+3ky9K9IUeIhJbUZW7mQ0xs5VmVmxm90RY/z9m9rl/WmVmmiErClf3ySMnI1V77yISczWWu5mlAuOBoUB3YKSZdQ8d45z7qXOup3OuJ/Aw8GpdhE02TbPSuaowj78s2cy2vYeCjiMiSSSaPfe+QLFzbo1zrgyYBgyvZvxI4PlYhGsIbhqQT3mF45m5elukiMRONOXeHgidyrDEX3YcM+sEFAAfVLF+tJkVmVlRaWnpiWZNSgWtc7j4jDY8N289h8uPBh1HRJJENOVuEZZVNSn5tcDLzrmILeWcm+ScK3TOFebm5kabMendMrCAb/aV8ZfFW4KOIiJJIppyLwHyQi53ADZXMfZadEjmhA08rRWnt23M5E/X4py+zENEai+acp8PdDWzAjPLwCvw6eGDzOwMoAUwJ7YRk5+ZccvAAr7csofP1u4IOo6IJIEay905Vw7cBcwAlgMvOueWmdn9ZjYsZOhIYJrTrudJGdGzPc0bpWu2SBGJibRoBjnn3gLeClt2b9jl+2IXq+HJzkjlR307MvGj1WzccYC8lo2CjiQiCUyfUI0jN/TvhJkxdc66oKOISIJTuceRds2yGdrjFKbN38j+w+VBxxGRBKZyjzOjBhWw91A5rywsCTqKiCQwlXuc6dWxBefmNWfKrHVUVOi1aRE5OSr3ODRqYD5rvtnPR6v0KV4ROTkq9zg0tEc72jbNZLJmixSRk6Ryj0MZaSnccH4nPvnqG77aujfoOCKSgFTucWpk345kpKXw5Ox1QUcRkQSkco9TrRpnclnP9ry6sIRdB8qCjiMiCUblHsduGZTPoSMVTJu/sebBIiIhVO5xrNspTenfuRVTZ6+j/GhF0HFEJIGo3OPcqEEFbN59iBnLtgYdRUQSiMo9zl3crQ0dWzbSl2iLyAlRuce51BTjpgH5FK3fyZKSXUHHEZEEoXJPAFcXdqBxZprmeheRqKncE0CTrHSu7N2BN5dsZtueQ0HHEZEEoHJPEDcPyKe8wvHM3PVBRxGRBKByTxD5rXP4frc2PDtvA4eOHA06jojEOZV7ArllYAHb95fxl8Wbg44iInFO5Z5ABnRpxRltmzBZc72LSA1U7gnEzLj9ws4s37KHH09bpMMzIlKltKADyIm5old7tu87zH+9vYKSnQd57MbetGmSFXQsEYkz2nNPMGbGHRd14dHre7Pq672M+PMsvty8J+hYIhJnVO4J6pKzTuGlMf2pcHDlxNm896XmnhGR76jcE1iP9s14466BnNamMbc/XcTjn6zBOb3QKiJRlruZDTGzlWZWbGb3VDHmajP70syWmdlzsY0pVWnbNIsXRvdnaI9T+M1fl/Pvr33BEU0PLNLg1fiCqpmlAuOBHwIlwHwzm+6c+zJkTFfg58BA59xOM2tTV4HleNkZqfx5ZC/+2HoVf/6wmPXbDzDhul40b5QRdDQRCUg0e+59gWLn3BrnXBkwDRgeNuZ2YLxzbieAc25bbGNKTVJSjLv/7gz+ePW5FK3byeUTZrP2m/1BxxKRgERT7u2B0O95K/GXhTodON3MZpnZXDMbEquAcmIu79WBZ2/vx66DRxgxfhZzVm8POpKIBCCacrcIy8JftUsDugKDgZHA42bW/LgbMhttZkVmVlRaWnqiWSVKff
Jb8vqdA2nTJJMbnpjHC/M3BB1JROpZNOVeAuSFXO4AhE9uUgK84Zw74pxbC6zEK/tjOOcmOecKnXOFubm5J5tZotCxVSNeuXMA/bu04mevfMFv31rOUU1ZINJgRFPu84GuZlZgZhnAtcD0sDGvA98DMLPWeIdp1sQyqJy4plnpPHlzH27s34lJH6/hjqcXsP9wedCxRKQe1Fjuzrly4C5gBrAceNE5t8zM7jezYf6wGcB2M/sS+BD4V+ecDvbGgbTUFO4f3oP/GHYWH6zYypUT57B518GgY4lIHbOgPvRSWFjoioqKArnvhmrmym38+LlFZGWk8viNhZybd9zLIiIS58xsgXOusKZx+oRqAzL4jDa8cucAMtNSuPrROfx1yZagI4lIHVG5NzCnt23CG+MG0qN9M8Y9t5CH3/9KUxaIJCGVewPUqnEmz97Wj8vOa88f3l3Fv7y4mMPlmhteJJloPvcGKis9lT9efS5dcnN48J1VbNhxgEdv6E3rxplBRxORGNCeewNmZtx1cVcmXNeLpZt2M2L8LFZt3Rt0LBGJAZW7cOnZ7Xjxjv4cLq/gigmzmblSUwOJJDqVuwBwbl5z3hg3kLyWjRg1ZT5PzV4XdCQRqQWVu3zr1ObZvDSmPxd3a8v/n76Me99YSrnmhhdJSCp3OUZOZhqP3tCbOy7szNQ567llynz2HDoSdCwROUEqdzlOaorx80vP5IErzmbO6u1cMWE2G7YfCDqWiJwAlbtU6Zo+HXn61n5s23uYERNmMX/djqAjiUiUVO5Srf5dWvH6uIE0z07nusfm8erCkqAjiUgUVO5So4LWObx65wB6d2rBv7y4mN/PWEGF5oYXiWsqd4lK80YZTL21LyP75jH+w9WMe24hB8s0ZYFIvFK5S9TSU1P47WVn88u/P5O3l33NNZPmsHXPoaBjiUgEKnc5IWbGbRd05rEbCineto/hf57F0k27g44lImFU7nJSftC9LS+PGUCKwVUT5zBj2ddBRxKRECp3OWndT23K63cN5PRTmjDmmQU8+tFqzQ0vEidU7lIrbZpk8cLo87n07Hb87m8r+NkrSygr15QFIkHTfO5Sa1npqTx87Xl0yW3MQ+9/xfrtB5h4fW9a5GQEHU2kwdKeu8RESorxLz88nT9d05NFG3dx2YRZrC7dF3QskQZL5S4xNeK89jx/ez/2HirnsvGzmFX8TdCRRBoklbvEXO9OLXl93EBOaZbFTZM/47l5G4KOJNLgqNylTuS1bMQrYwcwqGtr/v21L/j1m19yVFMWiNQblbvUmSZZ6Tx+YyE3D8jniU/XMnpqEfsOlwcdS6RBULlLnUpLTeG+YWfx6xE9mLmqlCsfmU3JTs0NL1LXoip3MxtiZivNrNjM7omw/mYzKzWzz/3TbbGPKonshvM7MeWWPmzadZAR42ezaMPOoCOJJLUay93MUoHxwFCgOzDSzLpHGPqCc66nf3o8xjklCVzQNZfX7hxAo4xUrpk0l+mLNwcdSSRpRbPn3hcods6tcc6VAdOA4XUbS5LVaW2a8Pq4gfTs0JyfPL+IP723SlMWiNSBaMq9PbAx5HKJvyzcFWa2xMxeNrO8mKSTpNQyJ4Onb+vLFb068Kf3vuKfpn3OoSOaG14klqIpd4uwLHxX6y9AvnPuHOA94KmIN2Q22syKzKyotLT0xJJKUslMS+XBq87h34acwfTFmxn52FxK9x4OOpZI0oim3EuA0D3xDsAxB0udc9udc5U/mY8BvSPdkHNuknOu0DlXmJubezJ5JYmYGXcOPo2J1/di+ZY9jBg/ixVf7wk6lkhSiKbc5wNdzazAzDKAa4HpoQPMrF3IxWHA8thFlGQ3pEc7XrpjAOUVFVwxYTYfrtgWdCSRhFdjuTvnyoG7gBl4pf2ic26Zmd1vZsP8YT8xs2Vmthj4CXBzXQWW5HR2h2a8MW4QBbk53PrUfCZ/ulYvtIrUggX1A1RYWOiKiooCuW+JXwfKyvnpC58zY9
lWruvXkfuGnUV6qj5rJ1LJzBY45wprGqefGokrjTLSeOS63owd3IVn523glifns/vgkaBjiSQclbvEnZQU42dDuvH7K89h3trtXD5hFuu37w86lkhCUblL3LqqMI9nbu3H9v1ljBg/i3lrtgcdSSRhqNwlrvXr3IrX7xxIi5wMrn9iHi8Vbaz5SiKicpf4l986h9fGDqRfQSv+9eUl/NffVlChueFFqqVyl4TQrFE6T97Shx/168jEj1Yz9tkFHCjT3PAiVVG5S8JIT03hP0f04N5/6M67X27lqolz+Hr3oaBjicQllbskFDNj1KACHr+pkHXf7Gf4+E/5omR30LFE4o7KXRLSxd3a8sqdA0hLSeGqR2fz9tItQUcSiSsqd0lY3U5pyuvjBnJmu6aMeWYhE2YWa8oCEZ/KXRJabpNMnr/9fIadeyr//fZK7n5pCYfLNTe8SFrQAURqKys9lf+9tiddchvzP++tYuOOA0y8oTctczKCjiYSGJW7JAUz459+0JWC3BzufmkxI8bPYvLNhZzWpknQ0SQJlR+t4FB5BYeOHPVPEc6Xhy//7vLQs9vRu1OLOs2ocpekMuzcU+nQIpvRUxdw2YTZTLiuFxd01RfDJLvjyza0WI8t24NHjnK4xiKOtPy79eUn+SE6M8hOT6Vr28Z1Xu6a8leSUsnOA9z2VBFfbdvHfcPO4obzOwUdqUEJLduDZUc5HKkky711h8orqizb74q4+hI+2bJNMe+wXlZ6KllpKWSlp5KZnkpWegrZlcvTU8hK+255Vnqqvy7Fv14qmZXn/dvJzkj9dl1Wesq3181ITcEs0jeXRi/aKX+15y5JqUOLRrw8dgA/eX4Rv3p9Kau37eOXf38maQ10bvjKsj1Y5pVipLI9WFmYVZTtwbIKDpVHLtvwAo9l2WaFFGnT7PRjyvaYko1Qttlh1w8t2+z0VNJTrdZlG69U7pK0Gmem8diNhfz2reU88ela1m3fz8Mjz6NJVnrQ0Sg/WuGXaUXEsv22aKso22+vG6FsvQKviHnZZqf7pZkWoWzTj91TPa5QMyKXtbfuu/PJXLb1TeUuSS01xfjVP3SnS25j7n1jKVc8MpsnbupDXstGx4w7cvTYvdjQYgwt4fCyPXbd8WV7MKSUY1W23+2NHl+2zbLTv1seoWyzvz3scHzZZofepso24ancpUH4Ub+OdGrViLHPLGDo/35Cs+z0mJRtaoodU5KVZesdc02psmxDDxeEl212xrGlrLKVk6FylwZj4GmteW3cQCbOXM1R5459YSwtctmGvzAWWrbeMduGeQxf4p/KXRqULrmN+f1V5wYdQ6TOabdDRCQJqdxFRJKQyl1EJAmp3EVEkpDKXUQkCancRUSSkMpdRCQJqdxFRJJQYFP+mlkpsD6QO69Za+CboENUQ/lqJ97zQfxnVL7aqU2+Ts65Gr+kILByj2dmVhTNfMlBUb7aifd8EP8Zla926iOfDsuIiCQhlbuISBJSuUc2KegANVC+2on3fBD/GZWvduo8n465i4gkIe25i4gkoQZb7maWZ2YfmtlyM1tmZv8UYcxgM9ttZp/7p3vrOeM6M/vCv++iCOvNzB4ys2IzW2Jmveox2xkh2+VzM9tjZv8cNqbet5+ZTTazbWa2NGRZSzN718y+8v9tUcV1b/LHfGVmN9VTtt+b2Qr/8XvNzJpXcd1qnwt1nPE+M9sU8jheWsV1h5jZSv/5eE895nshJNs6M/u8iuvW6TasqlMCe/455xrkCWgH9PLPNwFWAd3DxgwG3gww4zqgdTXrLwX+BhhwPjAvoJypwNd4778NdPsBFwK9gKUhy/4buMc/fw/wQITrtQTW+P+28M+3qIdslwBp/vkHImWL5rlQxxnvA+6O4jmwGugMZACLw3+e6ipf2Po/APcGsQ2r6pSgnn8Nds/dObfFObfQP78XWA60DzbVCRsOTHWeuUBzM2sXQI7vA6udc4F/KM059zGwI2zxcOAp//xTwI
gIV/074F3n3A7n3E7gXWBIXWdzzr3jnCv3L84FOsTyPk9UFdsvGn2BYufcGudcGTANb7vHVHX5zPty2auB52N9v9GoplMCef412HIPZWb5wHnAvAir+5vZYjP7m5mdVa/BwAHvmNkCMxsdYX17YGPI5RKC+QV1LVX/QAW5/Sq1dc5tAe8HEGgTYUw8bMtReH+JRVLTc6Gu3eUfOppcxWGFeNh+FwBbnXNfVbG+3rZhWKcE8vxr8OVuZo2BV4B/ds7tCVu9EO9Qw7nAw8Dr9RxvoHOuFzAUGGdmF4attwjXqde3P5lZBjAMeCnC6qC334kIdFua2S+AcuDZKobU9FyoS48AXYCewBa8Qx/hAn8uAiOpfq+9XrZhDZ1S5dUiLKvV9mvQ5W5m6XgPwrPOuVfD1zvn9jjn9vnn3wLSzax1feVzzm32/90GvIb3p2+oEiAv5HIHYHP9pPvWUGChc25r+Iqgt1+IrZWHq/x/t0UYE9i29F88+wfgOucfgA0XxXOhzjjntjrnjjrnKoDHqrjvQJ+LZpYGXA68UNWY+tiGVXRKIM+/Blvu/vG5J4Dlzrk/VjHmFH8cZtYXb3ttr6d8OWbWpPI83gtvS8OGTQdu9N81cz6wu/LPv3pU5d5SkNsvzHSg8t0HNwFvRBgzA7jEzFr4hx0u8ZfVKTMbAvwMGOacO1DFmGieC3WZMfR1nMuquO/5QFczK/D/mrsWb7vXlx8AK5xzJZFW1sc2rKZTgnn+1dUrx/F+Agbh/dmzBPjcP10KjAHG+GPuApbhvfI/FxhQj/k6+/e72M/wC395aD4DxuO9S+ELoLCet2EjvLJuFrIs0O2H94tmC3AEb2/oVqAV8D7wlf9vS39sIfB4yHVHAcX+6ZZ6ylaMd6y18jk40R97KvBWdc+Fetx+T/vPryV4RdUuPKN/+VK8d4isrquMkfL5y6dUPu9CxtbrNqymUwJ5/ukTqiIiSajBHpYREUlmKncRkSSkchcRSUIqdxGRJKRyFxFJQir3JGVmzsyeDrmcZmalZvZmDdfrWdWsf/76QjN7qJbZcs1snpktMrMLanNb/u3lV84SGJrPzDLN7D1/FsBrzOwCf7a+z80su7b3W02ewWY24ETX1UGO+8wYoV+pAAAFi0lEQVTs7pO87jHPg9rclgQjLegAUmf2Az3MLNs5dxD4IbApiuv1xHv/7VvhK8wszTlXBNR2utTv433gJOppTc0s1Tl3tKZxYfnOA9Kdcz3925gIPOicezLK+zS8L7SpiDanbzCwD5h9Iuv87Vt+3DWCUeXzQBJEXX0YQqdgT3gF8lvgSv/yVLxPQr7pX84BJuN9snAR3sx1GcAGoBTvAxjX4E33Ogl4B3iOkGl8gcbAk3z3AZcr8KZ+nYL36b8vgJ+G5eoZdh/ZeJ9y/cK/zgNh/4f78SZfGhR2O73xPpAyB/g9/hSwlfnwJmcqBnb793MH3myCa/E+Gg7wr/7/fwnwH/6yfLzZ/Cb426UT3qcF5+DNlfMS0Ngfuw74D3/5F0A3//pf4/0i/Ry4ICTzcev8bfVH4EO8OVuOe1z866b6/8/KvHdU8bj/AlgJvIf3gZ+7/eVdgLeBBcAnQDd/+RRgor9sFd40CFU9DyYDM/Gmo/1JyPPor/5jsRS4Jujnvk7+cyHoADrV0QPrFeM5wMtAlv9DOpjvivm3wPX++eb+D3YOcDPw55Dbuc8vhGz/cuhtPAD8KWRsC7zSfTdkWfMI2b69D7xPEW4AcvH+kvwAGOGvc8DVVfz/lgAX+eePK/fw8/7lKXz3y+4SvF9ahnd48k28ucLzgQrgfH9ca+BjIMe//DP8+cLxyv3H/vk78T9tSDXzn4ev8zO9CaTW8LiMBn7pL8/E++ukIOy2e+P9kmkENMX75VZZ7u8DXf3z/YAPQu7/bX8bdMX71GdWFc+D2f59t8b7ZHI63i/0x0LGNYv0/9ap/k86LJPEnHNL/KlHR3L8n9eXAMNCjqNmAR2ruK
npzju0E+4HeHOIVN7fTjNbA3Q2s4fx9ujeqSFmH2Cmc64UwMyexSvZ14GjeJMwHcPMmuH90vjIX/Q03gRmJ+IS/7TIv9wYr9w2AOudNz8+eF+C0h2Y5U+Tk4G3F1+pcnKoBXgTV52Ml9x3h5yqelwuAc4xsyv95c38vGtDbucC4DXnz1FjZtP9fxsDA4CX/P8DeCVd6UXnHXr6yn/8ulWR86/OucPAYTPbBrTF+2XyoJk9gPeL9JMT/+9LXVC5J7/pwIN4e7GtQpYbcIVzbmXoYDPrF+E29ldx20bYtKR+wZ+L9+UD4/C+PGFUNfkiTXVa6ZCLfJz9uPs9CQb8zjn36DELvV+G+8PGveucG1nF7Rz2/z3Kyf88hd9fpMfF8P5KqGkyqUjbJQXY5fzXHqK4TlXb9nDI+aN43yC1ysx6482h8jsze8c5d38NGaUe6N0yyW8ycL9z7ouw5TOAH4fM2niev3wv3leEReMdvMnB8G+jhT+lb4pz7hXgV3hfiVadecBFZtbazFLx/sr4qLorOOd2AbvNbJC/6Loo84aaAYzy92oxs/ZmFulLFOYCA83sNH9cIzM7vYbbrm4b1rR9q3pcZgBj/SllMbPT/dkNQ30MXGZm2f4MiP8I3tTLwFozu8q/rvm/gCtdZWYpZtYFb4KtlVHkxL+tU4EDzrln8HYi6u17fKV6Kvck55wrcc79b4RVv8Y7ZrrEfxvhr/3lHwLdK98+WMPN/wZoYWZLzWwx8D28b4+Zad6XFE8Bfl5Dvi3+mA/xXpRb6JyLNCVquFuA8WY2B4h0yKhazrnKF4jnmNkXeK9NHFdm/uGim4HnzWwJXtlXddii0l/wSvbzCG/1rG4dVP24PA58CSz0lz9K2F8KzvuKtxfwXl95Be9F0krXAbf6j9Myjv0KvJV4v1D/hjez4iGifx6cDXzmP96/wHtOSBzQrJAiDZiZTcE7Vv5y0FkktrTnLiKShLTnLiKShLTnLiKShFTuIiJJSOUuIpKEVO4iIklI5S4ikoRU7iIiSej/AA9NjrE2HzfSAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"params = [1, 2, 3, 4, 5, 10, 20]\n",
"\n",
"# Test-set RMSLE for each maximum depth, keeping maxBins at 32\n",
"metrics = [evaluate_dt(train_dt, test_dt, param, 32) for param in params]\n",
"\n",
"print(params)\n",
"print(metrics)\n",
"\n",
"plt.plot(params, metrics)\n",
"plt.xlabel('Maximum tree depth')\n",
"plt.ylabel('Test RMSLE')\n",
"fig = plt.gcf()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Maximum bins"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we evaluate the impact of the number of bins (maxBins) on the decision tree, keeping the maximum depth at 5. As with tree depth, a larger number of bins lets the model consider more candidate split points for continuous features, making it more complex, which might help performance with larger feature dimensions. Beyond a certain point, more bins are unlikely to help and might even hurt test-set performance through over-fitting:"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[1.2692079792473667, 0.8059355903824542, 0.7446332199349833, 0.5969914946964172, 0.6252775817287765, 0.6252775817287765, 0.6252775817287765]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEKCAYAAADpfBXhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAHyFJREFUeJzt3Xt8VPWd//HXJ5kk5IICJqASENAgxQuoqWvrjdqtgq1i1aps+9Pd2tLuT9uuv1qLu7+qxd31svbX1q2Xsq61+utqrdUWXZVa62W3amuQi6CCgBcCClFABcI1n/3jnCGTYSYzJDNMzsn7+XjkwcyZ75z5nBzyzjffc873mLsjIiLxUlbqAkREpPAU7iIiMaRwFxGJIYW7iEgMKdxFRGJI4S4iEkMKdxGRGMoZ7mZ2p5mtNbNFWV6famYLzWy+mbWY2QmFL1NERPaE5bqIycxOAjYCd7v74RlerwM2ubub2ZHA/e4+rijViohIXhK5Grj7s2Y2qpvXN6Y8rQXyuuS1vr7eR43KuloREclg7ty577l7Q652OcM9H2b2eeA6YCjw2W7aTQemA4wcOZKWlpZCfLyISL9hZm/l064gB1Td/aFwKOYs4Npu2s1y92Z3b25oyPmLR0REeqigZ8u4+7PAwWZWX8j1iojInul1uJvZIWZm4eOjgUrg/d6uV0REei7nmLuZ3QtMAurNrBW4GqgAcPfbgXOAC81sO9AOnO+aR1hEpKTyOVtmWo7XbwBuKFhFIiLSa7pCVUQkhhTuIiIxFLlwX/LuR9w0ZwnrNm0rdSkiIn1W5ML9jfc28pOnlrHmwy2lLkVEpM+KXLhXVwbHgDdv21niSkRE+q7IhXttZTkAm7ftKHElIiJ9V+TCvSbsuW/aqp67iEg2EQz3oOfevl09dxGRbKIX7lVBuKvnLiKSXeTCvXbXAVX13EVEsolcuFdXJA+oqucuIpJN5MK9rMyorihXuIuIdCNy4Q5QW1XOpq0alhERySaS4V5dWU67eu4iIllFMtxrKxNs0gFVEZGsIhnuNZUacxcR6U5Ewz2hcBcR6UZEw10HVEVEuhPJcK+tUs9dRKQ7kQz36spyXaEqItKNnOFuZnea2VozW5Tl9S+a2cLw6zkzm1D4Mruq1QFVEZFu5dNzvwuY3M3rbwAnu/uRwLXArALU1a3kAdWODi/2R4mIRFLOcHf3Z4F13bz+nLuvD5++ADQWqLasOqf9Ve9dRCSTQo+5Xww8VuB17qamSrfaExHpTqJQKzKzTxGE+wndtJkOTAcYOXJkjz+r6632qnq8HhGRuCpIz93MjgTuAKa6+/vZ2rn7LHdvdvfmhoaGHn9eclhGN+wQEcms1+FuZiOBB4H/5e5Le19Sbsn7qOpWeyIimeUcljGze4FJQL2ZtQJXAxUA7n47cBWwH3CrmQHscPfmYhUMwZS/oJ67iEg2OcPd3afleP0rwFcKVlEeqit0qz0Rke5E8grVZM9dZ8uIiGQWyXBPjrlvUriLiGQU0XAPe+6aGVJEJKNIhnt1hYZlRES6E8lwLyuz8G5M6rmLiGQSyXCH8IYd6rmLiGQU4XBP0K5wFxHJKMLhrlvtiYhkE+lw1wFVEZHMIhvuwX1U1XMXEckksuGunruISHYRDvcEm9RzFxHJKMLhXq6zZUREsohsuNdWJTTlr4hIFpEN9+qKctq376Sjw0tdiohInxPZcE9O+9u+Xb13EZF0kQ33zml/dVBVRCRdhMM9Oe2veu4iIukiHO7JW+0p3EVE0kU23DtvtadhGRGRdDnD3czuNLO1ZrYoy+vjzOx5M9tqZpcXvsTMksMymvZXRGR3+fTc7wImd/P6OuCbwE2FKChfyWGZdvXcRUR2kzPc3f1ZggDP9vpad38R2F7IwnKpTZ4towOqIiK7ieyYe3WlxtxFRLLZq+FuZtPNrMXMWtra2nq1rs4Dquq5i4ik26vh7u6z3L
3Z3ZsbGhp6ta4BiXLMdEBVRCSTyA7LlJUZ1RXlbNat9kREdpPI1cDM7gUmAfVm1gpcDVQAuPvtZrY/0ALsA3SY2d8B4939w6JVHaqpTLBZc8uIiOwmZ7i7+7Qcr78LNBasoj1QW6Weu4hIJpEdloFg2l+NuYuI7C7S4V5bldDdmEREMoh0uNdUlmvKXxGRDCIf7pryV0Rkd5EO99rKBJu3q+cuIpIu0uFeU6Weu4hIJtEO98qExtxFRDKIeLiXs2V7Bzs7vNSliIj0KZEO9+S0v+26SlVEpItIh/uuaX91laqISBeRDndN+ysiklmkwz15qz0dVBUR6Sri4a6eu4hIJhEP96DnrnAXEekq0uG+a8xdB1RFRLqIdLjXVCTH3NVzFxFJFe1wD3vu7TqgKiLSRaTDvbZSPXcRkUwiHe4DKsow05i7iEi6SIe7mVFTUa6zZURE0kQ63AFqqhIalhERSZMz3M3sTjNba2aLsrxuZnazmS0zs4VmdnThy8yuprKczTqgKiLSRT4997uAyd28PgVoCr+mA7f1vqz81VQmNCwjIpImZ7i7+7PAum6aTAXu9sALwCAzO6BQBeZSq567iMhuCjHmPhxYmfK8NVy2GzObbmYtZtbS1tZWgI8Opv3dpFvtiYh0UYhwtwzLMt4ayd1nuXuzuzc3NDQU4KODc93bNSwjItJFIcK9FRiR8rwRWF2A9ealpqpcU/6KiKQpRLjPBi4Mz5o5DvjA3d8pwHrzEpwto567iEiqRK4GZnYvMAmoN7NW4GqgAsDdbwceBU4HlgGbgb8pVrGZ1FYmdEBVRCRNznB392k5XnfgkoJVtIdqKhNs2d7Bzg6nvCzT8L+ISP8T/StUd92NSb13EZGk6If7rml/Ne4uIpIU+XDXtL8iIruLfLhXh8MymzTtr4jILpEP92TPvX27eu4iIkmRD/fkmLt67iIinaIf7rvOllHPXUQkKfLhnhyWUbiLiHSKfLjrPHcRkd3FINzDUyE17a+IyC6RD/cBFWWYQbt67iIiu0Q+3M2M2krdJFtEJFXkwx2CC5k05i4i0ikW4V6rOd1FRLqIRbjXVCZ0QFVEJEVMwl3DMiIiqeIR7lUJDcuIiKSIRbjXqucuItJFLMK9urJcY+4iIiliEe61lQlN+SsikiKvcDezyWa2xMyWmdmMDK8fZGZPmtlCM3vazBoLX2p2NVXlmvJXRCRFznA3s3LgFmAKMB6YZmbj05rdBNzt7kcCM4HrCl1od2oqEmzd0cHODt+bHysi0mfl03M/Fljm7ivcfRtwHzA1rc144Mnw8VMZXi+q2irNDCkikiqfcB8OrEx53houS7UAOCd8/HlgoJnt1/vy8lOjOd1FRLrIJ9wtw7L08Y/LgZPNbB5wMrAK2K0bbWbTzazFzFra2tr2uNhsanSTbBGRLvIJ91ZgRMrzRmB1agN3X+3uZ7v7UcA/hMs+SF+Ru89y92Z3b25oaOhF2V3pVnsiIl3lE+4vAk1mNtrMKoELgNmpDcys3syS67oSuLOwZXavtkrDMiIiqXKGu7vvAC4F5gCvAve7+2Izm2lmZ4bNJgFLzGwpMAz4pyLVm1F1clhGB1RFRABI5NPI3R8FHk1bdlXK4weABwpbWv6SN8luV89dRASIyRWqOqAqItJVrMJdY+4iIoFYhLsOqIqIdBWLcK9KlFFmukJVRCQpFuFuZrrVnohIiliEOwTj7u3b1XMXEYEYhXttlXruIiJJsQn36grdak9EJCk24V5bVa6zZUREQrEJ95rKBJsU7iIiQKzCvZzNukJVRASIVbgnNCwjIhKKTbgHY+7quYuIQIzCvbqyXGPuIiKh2IR7bWWCbTs62LGzo9SliIiUXGzCfdfMkNvVexcRiVG4hzND6ipVEZH4hHttlW61JyKSFJtwr9Gt9kREdolRuOtWeyIiSbELd13IJCKSZ7ib2WQzW2Jmy8xsRobXR5rZU2Y2z8wWmtnphS+1e7rVnohIp5zhbm
blwC3AFGA8MM3Mxqc1+7/A/e5+FHABcGuhC81lcE0lAH9+4/29/dEiIn1OPj33Y4Fl7r7C3bcB9wFT09o4sE/4eF9gdeFKzE/DwCou/MRB/Pz5t3hu+Xt7++NFRPqUfMJ9OLAy5XlruCzVNcCXzKwVeBT4RqYVmdl0M2sxs5a2trYelNu9K6d8jDH1tXznVwv5cMv2gq9fRCQq8gl3y7DM055PA+5y90bgdOAeM9tt3e4+y92b3b25oaFhz6vNobqynB+cN4F3P9zCzIdfKfj6RUSiIp9wbwVGpDxvZPdhl4uB+wHc/XlgAFBfiAL31FEjB3PJpIN5YG4rcxa/W4oSRERKLp9wfxFoMrPRZlZJcMB0dlqbt4FPA5jZxwjCvfDjLnm69JQmDh++D3//4Mu8t3FrqcoQESmZnOHu7juAS4E5wKsEZ8UsNrOZZnZm2OzbwFfNbAFwL/DX7p4+dLPXVCbK+OF5E/lo6w5m/PplSliKiEhJJPJp5O6PEhwoTV12VcrjV4DjC1ta7zQNG8gVpx3KP/7nq/xqbivnNY/I/SYRkZiIzRWqmXz5+NEcN2YIMx9+hZXrNpe6HBGRvSbW4V5WZtz0hQkAfPtXC+jo0PCMiPQPsQ53gMbBNVx9xnj+/MY67vzjG6UuR0Rkr4h9uAOce0wjnxk/jBvnLGHpmo9KXY6ISNH1i3A3M647+wgGViW47Jfz2bZD91kVkXjrF+EOUF9XxXVnH8Hi1R9y85Ovl7ocEZGi6jfhDnDqYfvzhWMaufXpZbz09vpSlyMiUjT9KtwBrjpjPAfsW83/+eV8Nut+qyISU/0u3AcOqOAH503grXWbue7R10pdjohIUfS7cAc4bsx+XHz8aO554S2eWVqyKXBERIqmX4Y7wOWnHUrT0Dqu/PVCtmzXrflEJF76bbgPqCjn2rMOZ/UHW5j17IpSlyMiUlD9NtwhGJ6Zcvj+3Pb0ct79YEupyxERKZh+He4Af3/6x9jZ4dz4uA6uikh89PtwHzGkhotPHM2D81Yxf+WGUpcjIlIQ/T7cAS751CHU11Ux8+HFurGHiMSCwh2oq0pwxWmH8tLbG5i9IP32sCIi0aNwD517TCOHHbgP1z/2Gu3bdGqkiESbwj1UVmZcfcZhvKNTI0UkBhTuKY4dPYTPHnEAtz+znHc+aC91OSIiPZZXuJvZZDNbYmbLzGxGhtd/aGbzw6+lZhbZ005mTBnHTndufHxJqUsREemxnOFuZuXALcAUYDwwzczGp7Zx98vcfaK7TwT+FXiwGMXuDSOG1PDVE0fz0LxVmhZYRCIrn577scAyd1/h7tuA+4Cp3bSfBtxbiOJK5W8nHULDwCpmPvyKTo0UkUjKJ9yHAytTnreGy3ZjZgcBo4E/9L600kmeGjl/5QZ+O1+nRopI9OQT7pZhWbbu7AXAA+6e8VxCM5tuZi1m1tLW1ren2j3n6EaOGL4v1z/2mm7qISKRk0+4twIjUp43Atm6sxfQzZCMu89y92Z3b25oaMi/yhIoKzOuOmM87364hZ8+o1MjRSRa8gn3F4EmMxttZpUEAT47vZGZHQoMBp4vbIml8/FRQ/jskQfw02eXs3qDTo0UkejIGe7uvgO4FJgDvArc7+6LzWymmZ2Z0nQacJ/H7AjklVPG0eFwg2aNFJEISeTTyN0fBR5NW3ZV2vNrCldW39E4uIbpJ47hJ08t48JPjOKYgwaXuiQRkZx0hWoe/nbSwQwdWMXMR16hoyNWf5iISEwp3PNQW5XgisnjWLByA79dsKrU5YiI5KRwz9PZRw3nyMZ9ueGxJTo1UkT6PIV7nsrKjKs+F5waebtOjRSRPk7hvgeaRw3hjAkH8tNnlrNKp0aKSB+mcN9DM6aMA+CGx3RqpIj0XQr3PTR8UDVfO2kMsxesZu5b60pdjohIRgr3HvjayQczbJ9g1kidGikifZHCvQdqqxJ8d/I4FrR+wEPzdGqkiPQ9CvceOmvicCaMGMSNc15j01
adGikifYvCvYeSp0au+XArtz+zvNTliIh0oXDvhWMOGsyZEw5k1rMraF2/udTliIjsonDvpRlTxmEG1+vUSBHpQxTuvXTgoGqmn3Qwjyx8h5Y3dWqkiPQNCvcC+PrJY9h/nwF8X6dGikgfkdd87tK9msoE351yKJf9cgEPzlvFucc0lrqkgtrZ4aze0M62nR2lLkUkFgZVV7BfXVVRP0PhXiBTJwzn58+9xY2Pv8aUw/entip639qdHc7b6zazdM1HvL7mI5au2cjSNR+x4r1NbNuhYBcplK+ffPCuqUyKJXoJ1Eclb6h99q3PcdvTy7n8tENLXVJWOzuclckQXxsE+NI1G1netrFLiA8fVM3YYXWcPLaBgxvqqKrQKJ5IIRwytK7on6FwL6CjRw7mrIkHMuu/VnD+x0cwYkhNSevp6HBWrt+8qwf+ekqIb00L8aZhdZzYVM8hQ+sYO2wghwytoy6Cf32ISEA/vQV2xeRxPL74Xa5//DVu+auj98pndnQ4revbgx742o94PQzz5W0b2bK9M8QP3HcATcMG8smD92PssIE0DaujadhAhbhIDOX1U21mk4EfA+XAHe5+fYY25wHXAA4scPe/KmCdkXHgoGq+fvLB/Oj3r3PRJ9Zx7OghBVt3R4ezakP7rmGU18NhlWVrN9K+feeudgfsO4BDhtZx3JiDGBsGeNPQOgYOqChYLSLSt5l796fumVk5sBT4DNAKvAhMc/dXUto0AfcDp7j7ejMb6u5ru1tvc3Ozt7S09Lb+Pql9205O+cHT7FdXyexLTqCszPbo/ckQf31t50HNZWGIb97WGeL77zMg6H0PHdgZ4sPq2EchLhJbZjbX3Ztztcun534ssMzdV4Qrvg+YCryS0uarwC3uvh4gV7DHXXVlOTOmjONb983ngZdaOa95RMZ27mGIJ8fE13b2xlNDfOjAKsYOG8j5Hx/B2GFBkB8ydCD7VivERSSzfMJ9OLAy5Xkr8BdpbcYCmNkfCYZurnH3xwtSYUSdOeFA7nruTf5lzhKmHL4/H27Zseug5utrNrJ07UaWrfmITSkh3jCwirHD6jivuTPEm4YOZN8ahbiI7Jl8wj3TmEL6WE4CaAImAY3Af5nZ4e6+ocuKzKYD0wFGjhy5x8VGiVkwa+Tnb32Oo699gu07O79l9XVBiH+heUSXYZVBNZUlrFhE4iSfcG8FUscVGoHVGdq84O7bgTfMbAlB2L+Y2sjdZwGzIBhz72nRUXHUyMF873Pjeev9TTQNG8jY8DTDwbUKcREprnzC/UWgycxGA6uAC4D0M2F+A0wD7jKzeoJhmhWFLDSqLj5hdKlLEJF+KOclh+6+A7gUmAO8Ctzv7ovNbKaZnRk2mwO8b2avAE8B33H394tVtIiIdC/nqZDFEudTIUVEiiXfUyE1WYiISAwp3EVEYkjhLiISQwp3EZEYUriLiMSQwl1EJIZKdiqkmbUBb+VoVg+8txfK6Wu03f1Pf912bfeeO8jdG3I1Klm458PMWvI5nzNutN39T3/ddm138WhYRkQkhhTuIiIx1NfDfVapCygRbXf/01+3XdtdJH16zF1ERHqmr/fcRUSkB/psuJvZZDNbYmbLzGxGqespFjMbYWZPmdmrZrbYzL4VLh9iZk+Y2evhv4NLXWsxmFm5mc0zs0fC56PN7E/hdv/SzGJ3ZxMzG2RmD5jZa+F+/0R/2N9mdln4f3yRmd1rZgPiuL/N7E4zW2tmi1KWZdy/Frg5zLmFZnZ0oerok+FuZuXALcAUYDwwzczGl7aqotkBfNvdPwYcB1wSbusM4El3bwKeDJ/H0bcI7hOQdAPww3C71wMXl6Sq4vox8Li7jwMmEGx/rPe3mQ0Hvgk0u/vhBPdavoB47u+7gMlpy7Lt3ykEd61rIrgF6W2FKqJPhjtwLLDM3Ve4+zbgPmBqiWsqCnd/x91fCh9/RPCDPpxge38eNvs5cFZpKiweM2sEPgvcET434BTggbBJ7LbbzP
YBTgL+HcDdt4X3Go79/ia481u1mSWAGuAdYri/3f1ZYF3a4mz7dypwtwdeAAaZ2QGFqKOvhvtwYGXK89ZwWayZ2SjgKOBPwDB3fweCXwDA0NJVVjQ/Aq4AOsLn+wEbwrt/QTz3+xigDfhZOBx1h5nVEvP97e6rgJuAtwlC/QNgLvHf30nZ9m/Rsq6vhrtlWBbr03rMrA74NfB37v5hqespNjP7HLDW3eemLs7QNG77PQEcDdzm7kcBm4jZEEwm4RjzVGA0cCBQSzAkkS5u+zuXov2f76vh3gqMSHneCKwuUS1FZ2YVBMH+C3d/MFy8JvnnWfjv2lLVVyTHA2ea2ZsEw26nEPTkB4V/tkM893sr0OrufwqfP0AQ9nHf338JvOHube6+HXgQ+CTx399J2fZv0bKur4b7i0BTeCS9kuDAy+wS11QU4TjzvwOvuvv/S3lpNnBR+Pgi4Ld7u7Zicvcr3b3R3UcR7N8/uPsXCW6wfm7YLI7b/S6w0swODRd9GniFmO9vguGY48ysJvw/n9zuWO/vFNn272zgwvCsmeOAD5LDN73m7n3yCzgdWAosB/6h1PUUcTtPIPgzbCEwP/w6nWD8+Ung9fDfIaWutYjfg0nAI+HjMcCfgWXAr4CqUtdXhO2dCLSE+/w3wOD+sL+B7wOvAYuAe4CqOO5v4F6C4wrbCXrmF2fbvwTDMreEOfcywdlEBalDV6iKiMRQXx2WERGRXlC4i4jEkMJdRCSGFO4iIjGkcBcRiSGFe0SZmZvZPSnPE2bWlpxdsZv3TTSz07t5vdnMbu5lbQ3hTH/zzOzE3qwrXN+o5Ax7qfWZWZWZ/d7M5pvZ+WZ2Yjjr4Hwzq+7t53ZTzyQz+2Sx1p/lM+8o1eR54fZm/H9lZo+a2aC9XZPklsjdRPqoTcDhZlbt7u3AZ4BVebxvItAMPJr+gpkl3L2F4Bzs3vg08Jq7X5SzZednl7v7zlzt0uo7Cqhw94nhOm4HbnL3n+X5mUZww5qOnI27mgRsBJ7bw/f1mLt/ZW991p5w96wdBSmxUp/wr68eXyixEfhn4Nzw+d3Ad+m8GKgWuJPgat95BPN6VBJcKdhGcLHU+cA1BLf8+h3wH3S9oKgO+BnBxRULgXMIpmq9i+BClJeBy9Lqmpj2GdXAtLDtIuCGtG2YSTBR2glp6zkGWAA8D/wLsChcPgl4hGDipWUEE1DNB75GMBPfGwTTOAB8J9z+hcD3w2WjCGbevDX8vhwEnBp+zksEF9LUhW3fJLjw5qWw/nHh+98l+EU6Hzgxre5rCGb9+134/rOBG8P3P07wywjgqrC2ReH33wg6Wy8Ck8I21wH/FD5+mvACl/D7dgPBxFu/J5hF9WlgBXBm2OavgZ+k1PVIynpzvj9tmyYBzwIPEVxVejtQlvI9qk/5vv4bsDjc/uqwzTfD9y0E7iv1z05/+Sp5Afrq4Y4LfkCPJJibZEAYNJPoDOZ/Br4UPh5EcLVvbYYf+mvCH/LkD2LqOm4AfpTSdjBB6D6RsmxQhtp2fQbBJFFvAw1heP0BOCt8zYHzsmzfQuDk8PFu4Z7+OHx+F52/7E6lMzTLwnA7KQyhDuC4sF19GFy14fPvAleFj98EvhE+/t/AHSnfs8uz1H0N8N9ABcFc7ZuBKeFrD6Vs+5CU99wDnBE+PiwMyc8Q/PKpDJc/TWe4e9o6f5fyefPT90H4PDXcc74/bZsmAVsIriYtB55I+T6/SWe47wAmhsvvp/P/32rCK0/J8P9FX8X50ph7hLn7QoIfqmnsPsxyKjDDzOYTBMMAYGSWVc32YGgn3V8SXBqd/Lz1BL27MWb2r2Y2Gcg1g+XHgac9mDBqB/ALgpAF2EkwYVoXZrYvQQg8Ey66J71NHk4Nv+YR9LzHEdwQAeAtD+bOhuAGKeOBP4bfq4sIevNJyYnc5hJ8r/PxmAeTY71MEIaPh8tfTlnHp8LjEi8TTJp2GIC7LybY3oeBL3twP4N029LW+UzK5+VTY0
/e/2cP7q+wk+Dy+hMytHnD3eeHj1O/XwuBX5jZlwh+AcheoDH36JtNME/2JIL5K5IMOMfdl6Q2NrO/yLCOTVnWbaRNP+ru681sAnAacAlwHvDlburLNKVp0hbPPM6+2+f2gAHXuftPuywM5szflNbuCXeflmU9W8N/d5L/z8tWAHfvMLPtHnZZCf5iSJjZAIJhoWZ3X2lm1xD88k06AtgADMuy/vR1pn5essYddD1hYsAevj9d+v7ItH+2pjzeSTAkB8ENWU4CzgS+Z2aHeecc7lIk6rlH353ATHd/OW35HOAb4UFDzOyocPlHwMA81/074NLkEzMbbGb1BOOtvwa+RzBdbXf+BJxsZvXh7ROnAc909wYP7kz0gZkle4dfzLPeVHOAL4fz5GNmw80s0w0wXgCON7NDwnY1ZjY2x7r35HuYSTJo3wvrS86KiJmdTfBL+iTg5l6cifImMNHMysxsBMG4em8cG87SWkZwrOa/83lT2H6Euz9FcGOWQQTHcqTIFO4R5+6t7v7jDC9dSzCOujA8jfDacPlTwPjk6YM5Vv+PwGALbmi8APgUwV1ing6HMO4CrsxR3zthm6cIDpC+5O75TOv6N8AtZvY8kGnIqFvunjxA/Hw49PEAGQLZ3dsIxqfvNbOFBGE/LsfqHwY+H34P9/hUz/CX178RDIP8huAgKuEvzuuBi919KfATgvut9sQfCQ4uv0zwl91LPVxP0vNhbYvC9T6U5/vKgf8f7oN5BPdL3dDLWiQPmhVSRCSG1HMXEYkhhbuISAwp3EVEYkjhLiISQwp3EZEYUriLiMSQwl1EJIYU7iIiMfQ/fdBbbb6ySS4AAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"# Test-set RMSLE for each maxBins value, keeping the maximum depth at 5\n",
"metrics = [evaluate_dt(train_dt, test_dt, 5, param) for param in params]\n",
"\n",
"print(params)\n",
"print(metrics)\n",
"\n",
"plt.plot(params, metrics)\n",
"plt.xlabel('Maximum bins')\n",
"plt.ylabel('Test RMSLE')\n",
"fig = plt.gcf()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Boosted Tree"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.tree import GradientBoostedTrees, GradientBoostedTreesModel\n"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"data_gbt = data_rec.map(lambda r: LabeledPoint(extract_label(r),extract_features_dt(r)))"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [],
"source": [
"(trainingData, testData) = data_gbt.randomSplit([0.7, 0.3], seed=12345)"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Gradient Boosted Tree predictions: [(40.0, 18.20990171759985), (2.0, 18.223666887903477), (8.0, 127.79752806968237), (106.0, 120.2269624548493), (37.0, 133.7865565239979)]\n"
]
}
],
"source": [
"# Train a GBT regressor with 3 boosting iterations and otherwise default settings\n",
"model = GradientBoostedTrees.trainRegressor(trainingData,\n",
"                                            categoricalFeaturesInfo={}, numIterations=3)\n",
"preds = model.predict(testData.map(lambda p: p.features))\n",
"actual = testData.map(lambda p: p.label)\n",
"true_vs_predicted_GBT = actual.zip(preds)\n",
"print(\"Gradient Boosted Tree predictions: \" + str(true_vs_predicted_GBT.take(5)))"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5240\n",
"Gradient Boosted Tree - Mean Squared Error: 14147.5213\n",
"Gradient Boosted Tree - Mean Absolute Error: 82.3479\n",
"Root Mean Squared Log Error: 0.7971\n"
]
}
],
"source": [
"# Per-record squared, absolute and squared-log errors for the GBT model\n",
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_GBT.collect():\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"Gradient Boosted Tree - Mean Squared Error: %2.4f\" % t)\n",
"print(\"Gradient Boosted Tree - Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"# Test-set RMSLE for a GBT regressor; evaluation uses the global testData split.\n",
"# Note: this reuses the name evaluate_dt and replaces the decision-tree helper defined\n",
"# earlier, so re-run that cell before repeating the decision-tree parameter sweeps.\n",
"def evaluate_dt(trainingData, categoricalFeaturesInfo, loss, numIterations, maxDepth, maxBins):\n",
"\n",
"    model = GradientBoostedTrees.trainRegressor(trainingData, categoricalFeaturesInfo, loss=loss,\n",
"                                                numIterations=numIterations, maxDepth=maxDepth, maxBins=maxBins)\n",
"\n",
"    preds = model.predict(testData.map(lambda p: p.features))\n",
"    actual = testData.map(lambda p: p.label)\n",
"    tp = actual.zip(preds)\n",
"\n",
"    # Root Mean Squared Log Error over the test set\n",
"    sq_log_errors = [(np.log(pred + 1) - np.log(act + 1))**2 for act, pred in tp.collect()]\n",
"    return np.sqrt(sum(sq_log_errors) / len(sq_log_errors))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Boosted Tree Iterations"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[0.8199092268881405, 0.8191629830985493, 0.8176741701394266, 0.8147160855323975, 0.8089602867888349, 0.7979663301656902, 0.7864647907071756]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEOCAYAAACJlmBtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xl8VPW9//HXJzuQEJYk7JsQkE1B41o3FBC4tdrWKtS1VWmt2rp0sfdn77Xetvd6vdWqVStai3pVpNZa7FXBBVwqVYJsAQTCooQ17ASEkOTz+2NOYIhZBrKcTHg/H4955MyZ7/mez0xO5p3vOWfOmLsjIiKSEHYBIiLSPCgQREQEUCCIiEhAgSAiIoACQUREAgoEEREBFAgiIhJQIIiICKBAEBGRgAJBREQASAq7gCORlZXlvXv3DrsMEZG4Mnfu3C3unl1Xu7gKhN69e5Ofnx92GSIiccXMPoulnXYZiYgIoEAQEZGAAkFERIAYA8HMxpjZMjMrNLM7q3m8p5nNNLN5ZrbQzMYF80eZ2VwzWxT8PD9qmZOD+YVm9pCZWcM9LREROVJ1BoKZJQKPAGOBQcAEMxtUpdldwFR3Hw6MBx4N5m8BLnL3ocA1wLNRyzwGTARyg9uYejwPERGpp1hGCKcChe6+yt1LgSnAxVXaONA2mM4E1gO4+zx3Xx/MXwykmVmqmXUB2rr7bI98ZdszwCX1fC4iIlIPsZx22g1YG3W/CDitSpu7gRlmdgvQBhhZTT/fBOa5+34z6xb0E91nt1iLPlJzP9vGnv3ldGiTQsf0FDq0SSE1KbGxViciEpdiCYTq9u1X/SLmCcBkd/+tmZ0BPGtmQ9y9AsDMBgP3AqOPoE+CZScS2bVEz549Yyj3yx58u5D3lhcfNi89NYkObSLh0LFNCu2DnwfnpafQoU3qwXmtUxLRYQ4RacliCYQioEfU/e4Eu4SiXEdwDMDdZ5tZGpAFbDaz7sBfgavdfWVUn93r6JOgv0nAJIC8vLxqQ6Muv75kCJt27WPrnlK2BbetJaVs27OfrXtK2bhrH0s27GLrnlJKyyqq7SM1KSESDlWCIvrW8eDPVNq2SlKAiEhciSUQ5gC5ZtYHWEfkoPG3q7T5HLgAmGxmA4E0oNjM2gH/B/zc3f9R2djdN5jZbjM7HfgIuBp4uN7PpgY9OrSmR4fWdbZzd/aUlrOtpJSte/ZHgiMIkO1R01v3lLJ6SwnbSkrZU1pebV9JCfblUUebSJh0SE+hQ+uUw3ZhtW+dQmKCAkREwlNnILh7mZndDEwHEoGn3H2xmd0D5Lv7NOAO4Akzu43Irp9r3d2D5foBvzCzXwRdjnb3zcCNwGSgFfB6cAuVmZGemkR6ahI9O9YdIAD7DpQfGnXsCUYdJaVV5pWyZH1kBLLziwM1rBvatUo+OMLoEIxGDg+U1MNGJClJ+hiJiDQci5zkEx/y8vI83q9ldKC8gu17g8AoOXzUsa1yVBIVKNv3llJRw68oIzWJbu1b0TcnndycdHJzMsjtlE7vjm0UFiJykJnNdfe8utrF1cXtWoLkxARyMtLIyUiLqX1FhbPziwNRxz8ixz0qw2Tttr0UrNvJa4s2UJntiQlGr46tDwuJvtmRW6sUnV0lItVTIDRzCcGxiPZtUmptt+9AOSuLSyjcXMKKTcHPzbt5a+lmyoMhhhn0aB8Jin6d0umXnU5upwz65aSTnqpNQeRYp3eBFiItOZHBXTMZ3DXzsPmlZRWs2brnsJAo3FzC+yu2UFp+6Iyqrplp9OuUEYREZBdUv5x02rWuPYhEpOVQILRwKUkJ9O+UQf9OGYfNLyuvYO32L1ixaTcrNh8KixdWb+OLA4fOnMpKT43segpCom+wGyorPUWn1Yq0MAqEY1RSYgJ9strQJ6sNowcfml9R4azb8cXBgFixqYTC4hL++sk6du8vO9iuXevkYBQR2eVUGRqd26YpKE
TilAJBDpOQYAc/tzHi+JyD892dTbv2H9zltGJzCYWbSni9YAM79h46lTY9NSnqrKdISPTLzqB7+1Yk6HMWIs2aAkFiYmZ0zkyjc2YaZ+ce+mpWd2frntKokIjsgnp3eTEvzT10uaq05AT6ZleGREZkulM6vTq0JilRp8iKNAcKBKkXMyMrPZWs9FROP67jYY/t3HuAwuLIbqfK4xRz1mznlfmHrlKSEuy66hccxI4cq8igb3YbBYVIE1MgSKPJbJ3Myb06cHKvDofNL9lfxsrNh0KicPNuCtbv5LWCQ5+laNc6mVEDOzF2aGe+0i9LV6cVaQIKBGly6alJnNijHSf2aHfY/MrPUqzYVMKsZZt5o2Ajf55bREZqEucPzGHskM6c2z9HH64TaSS6dIU0W/vLyvmwcCtvFGxkxpKNbN97gFbJiZw3IJsxQzpz/vE5ZKQlh12mSLMX66UrFAgSF8rKK/h49TZeK9jA9MWbKN69n5TEBM7OzWLMkM6MGtRJH6ITqYECQVqsigpn7ufbeX3RRqYv3si6HV+QlGCc0bcjY4Z0ZvSgzmRnpIZdpkizoUCQY4K7s2jdTl4v2MgbBRtZvWUPZnBK7w6MHdKZMUM60yWzVdhlioRKgSDHHHdn2abdvL4oEg7LNu0GYFiPdowd0pmxQ7rE/D0XIi2JAkGOeauKSw6OHBat2wnAoC5tI+EwtDP9cjLq6EGkZVAgiERZu20v0xdv5PWCjcz9bDsA/XLSD+5WGtSlra7BJC2WAkGkBpt27YuEw6KNfLR6KxUOPTu0PhgOw3q0UzhIi6JAEInB1pL9vLlkE68XbOTDlVs4UO50yUzjwsGdGTukM3m9O5Coi/JJnFMgiByhnV8c4O2lkXB4b3kx+8sqyEpPYXQQDqcf15FkXV9J4pACQaQe9uwvY+ayzbxesJGZn25mb2k5ma2SGTWoE2OHdOasXF1fSeJHgwaCmY0BHgQSgSfd/b+qPN4TeBpoF7S5091fM7OOwEvAKcBkd785aplZQBfgi2DWaHffXFsdCgQJw74D5by3vJg3Cjby5tJN7N5XRnpqEucfH1xfaUA2rVN0WTBpvmINhDq3YjNLBB4BRgFFwBwzm+buS6Ka3QVMdffHzGwQ8BrQG9gH/AIYEtyqusLd9Q4vzVpaciKjB3dm9ODOlJZV8OHKLcH1lTYxbcF60pITOK9/DmOH6vpKEt9i+bfmVKDQ3VcBmNkU4GIgOhAcaBtMZwLrAdx9D/CBmfVrsIpFQpSSlMB5A3I4b0AOv7qkgo/XbOON4LMObyzeSEpiAmdVXl9pYCfat9H1lSR+xBII3YC1UfeLgNOqtLkbmGFmtwBtgJExrv9PZlYO/AX4lVez/8rMJgITAXr27BljtyKNLykxgTP7ZnFm3yzuvmgw89ZGrq/0esFG3vl0M4kJxhnHBddXGtyJnIy0sEsWqVWdxxDM7FvAhe5+fXD/KuBUd78lqs3tQV+/NbMzgD8CQ9y9Inj8WiCvyjGEbu6+zswyiATC/7r7M7XVomMIEg/cnYJ1u3hj8QZeL9jIquLg+kq9OvDds3pz4eDO+pyDNKkGO4ZAZETQI+p+d4JdQlGuA8YAuPtsM0sDsoAaDxK7+7rg524ze57IrqlaA0EkHpgZQ7tnMrR7Jj8ePYAVm0t4fdFG/rZgHd//308YOTCHey4eQtd2uuieNC+xnFQ9B8g1sz5mlgKMB6ZVafM5cAGAmQ0E0oDimjo0syQzywqmk4GvAgVHXr5I82Zm9O+UwY9G5jLj1nP4f+MG8o/CrYy6/12e+mA15RXxc9q3tHyxnnY6DvgdkVNKn3L3X5vZPUC+u08Lzix6AkgncoD5p+4+I1h2DZEDzinADmA08BnwHpAc9PkWcLu7l9dWh3YZSUuwdttefvG3AmYtK+bE7pn85htDGdw1M+yypAXTB9NEmjF35+8LN/DLV5ewfW8p15/Vhx+NzNXnGaRRxBoI+hy+SAjMjItO7Mrbt5
/LZXndefy9VYx+4D1mLav1s5kijUqBIBKizNbJ/Oc3TmDq984gNSmBa/80hx++MI/i3fvDLk2OQQoEkWbg1D4deO1HZ3PryFzeKNjIyPvf5cU5nxNPu3Ql/ikQRJqJ1KREbh3Zn9d+dDYDOmfws78sYvykf7KyuCTs0uQYoUAQaWb65aQz5YbTufebQ/l0427G/u59HnxrBfvLaj0JT6TeFAgizVBCgnH5KT156/ZzGTOkMw+8tZxxD77Px6u3hV2atGAKBJFmLDsjlYcmDGfyd05hf1kFlz0+m5+/vJCdew+EXZq0QAoEkThw3oAcZtx2DhPPOY6p+UVccP+7vLpgvQ46S4NSIIjEidYpSfzruIFMu/krdG2Xxi0vzOM7k+ewdtvesEuTFkKBIBJnBnfN5K8/+Ar/9tVBfLx6G6MfeI8n319FWXlF2KVJnFMgiMShxATju2f14c3bz+Ur/Tryq/9byiWP/oNFRTvDLk3imAJBJI51a9eKJ67O47ErTmLzrv1c/MgH/Mffl7Bnf1nYpUkcUiCIxDkzY+zQLrx1x7l8+7Se/PGD1Yx+4D3e+XRT2KVJnFEgiLQQbdOS+dUlQ/nLjWfQJjWR707O56bnPmHzrn1hlyZxQoEg0sKc3KsDf7/lbH48uj9vLt3EBfe/y3MffUaFvoxH6qBAEGmBUpISuPn8XKbfeg5Dumby//5awGWPz2bFpt1hlybNmAJBpAXrk9WG5284jf/51okUFpcw7qH3uX/GMvYd0HWR5MsUCCItnJlx6cndefv2c7nohK489E4h4x58n9krt4ZdmjQzCgSRY0TH9FTuv3wYz153KmUVzoQn/slP/ryA7XtKwy5NmgkFgsgx5uzcbKbfeg43nteXv85bx8j73+WVeet0XSSJLRDMbIyZLTOzQjO7s5rHe5rZTDObZ2YLzWxcML9jML/EzH5fZZmTzWxR0OdDZmYN85REpC6tUhL52ZjjefWWs+jRoTW3vjifq5/6mM+36rpIx7I6A8HMEoFHgLHAIGCCmQ2q0uwuYKq7DwfGA48G8/cBvwB+XE3XjwETgdzgNuZonoCIHL2BXdrylxvP5J6LBzPv8x2M/t27/OHdlRzQdZGOSbGMEE4FCt19lbuXAlOAi6u0caBtMJ0JrAdw9z3u/gGRYDjIzLoAbd19tkfGqc8Alxz90xCRo5WYYFx9Rm/evP0czu2fzX+9/ikXPfwB89fuCLs0aWKxBEI3YG3U/aJgXrS7gSvNrAh4Dbglhj6L6uhTRJpQl8xWPH5VHo9fdTI79h7g64/+g7unLaZE10U6ZsQSCNXt26969GkCMNnduwPjgGfNrLa+Y+kz0tBsopnlm1l+cXFxDOWKSH1cOLgzb95+Dlef3ounZ69h1P3vMmPxxrDLkiYQSyAUAT2i7ncn2CUU5TpgKoC7zwbSgKw6+uxeR58E/U1y9zx3z8vOzo6hXBGpr4y0ZH558RBevvFMMlslM/HZuXzv2Xw27tR1kVqyWAJhDpBrZn3MLIXIQeNpVdp8DlwAYGYDiQRCjf/Ou/sGYLeZnR6cXXQ18LejqF9EGtHwnu159Zaz+OmYAcxaVszI+9/l2dlrdF2kFqrOQHD3MuBmYDqwlMjZRIvN7B4z+1rQ7A7gBjNbALwAXBscLMbM1gD3A9eaWVHUGUo3Ak8ChcBK4PWGe1oi0lCSExP4wXn9mHHbOQzv2Y5f/G0x3/zDh3y6cVfYpUkDs3j6MEpeXp7n5+eHXYbIMcvdeWX+Ov7j70vZ9cUBJp5zHD+8IJe05MSwS5NamNlcd8+rq50+qSwiMTMzvj48cl2kS4Z349FZK7nwd+9RsE5f3dkSKBBE5Ii1b5PC/3zrRJ6//jQOlFVw7Z/msH7HF2GXJfWkQBCRo3Zmvyye/u6p7DtQzg3P5LO3VJ9ZiGcKBBGpl9xOGTw8YThLN+zijqkLdAZSHFMgiEi9jTg+h38dN5DXCzby4Nsrwi5HjlJS2AWISMtw3V
l9WLZxNw++vYLcTul89YSuYZckR0gjBBFpEGbGr74+hFN6t+eOqQtYWKSL48UbBYKINJjUpEQeu/JkstJTueGZfDbt0qUu4okCQUQaVFZ6Kk9ek8fufWVMfCaffQfKwy5JYqRAEJEGN7BLWx4cP5yF63byk5cW6us544QCQUQaxahBnfjJhQN4dcF6HplZGHY5EgOdZSQijebGc/uyYlMJ/zNjOf1y0hkzpEvYJUktNEIQkUZjZvznN4YyvGc7bntxAYvX65pHzZkCQUQaVVpyIo9fdTLtWidzw9P5bN6tM4+aKwWCiDS6nIw0nrg6j+17D/C9Z+fqzKNmSoEgIk1iSLdM7r/sROZ9voN/fXmRzjxqhhQIItJkxg7twu2j+vPyvHU8/t6qsMuRKnSWkYg0qVvO78eKzSXc+8an9M1OZ9SgTmGXJAGNEESkSZkZ9116AkO7ZXLrlHn6buZmRIEgIk0uLTmRJ67OIz0tieufzmdryf6wSxIUCCISkk5t05h0VR7Fu/fz/f+dS2lZRdglHfNiCgQzG2Nmy8ys0MzurObxnmY208zmmdlCMxsX9djPg+WWmdmFUfPXmNkiM5tvZvkN83REJJ6c2KMd933rROas2c5dr+jMo7DVeVDZzBKBR4BRQBEwx8ymufuSqGZ3AVPd/TEzGwS8BvQOpscDg4GuwFtm1t/dK09CHuHuWxrw+YhInPnaiV0p3LSbh94ppH+nDK4/+7iwSzpmxTJCOBUodPdV7l4KTAEurtLGgbbBdCawPpi+GJji7vvdfTVQGPQnInLQrSP7M3ZIZ37z2lJmLtscdjnHrFgCoRuwNup+UTAv2t3AlWZWRGR0cEsMyzoww8zmmtnEI6xbRFqQhATjt5edyPGd2/LD5+exYtPusEs6JsUSCFbNvKo7+iYAk929OzAOeNbMEupY9ivufhIwFrjJzM6pduVmE80s38zyi4uLYyhXROJR65Qknrwmj9TkRK5/Jp/te0rDLumYE0sgFAE9ou5359AuoUrXAVMB3H02kAZk1basu1f+3Az8lRp2Jbn7JHfPc/e87OzsGMoVkXjVtV0rJl19Mht27uPG5+ZyoFxnHjWlWAJhDpBrZn3MLIXIQeJpVdp8DlwAYGYDiQRCcdBuvJmlmlkfIBf42MzamFlG0L4NMBooaIgnJCLx7aSe7bn3m0P556pt/Pu0xTrzqAnVeZaRu5eZ2c3AdCAReMrdF5vZPUC+u08D7gCeMLPbiOwSutYjv8XFZjYVWAKUATe5e7mZdQL+amaVNTzv7m80xhMUkfjz9eHdWb6phMdmrWRApwyuObN32CUdEyye0jcvL8/z8/WRBZFjQUWFM/HZucxctpnJ3zmFs3O1y/homdlcd8+rq50+qSwizVJCgvG78cPIzUnnpuc+YVVxSdgltXgKBBFpttJTk3ji6jySExO4/ul8du49EHZJLZoCQUSatR4dWvOHq05m7fa93PT8J5TpzKNGo0AQkWbvlN4d+PXXh/JB4RZ+9X9Lwy6nxdIX5IhIXLgsrwcrNu3mifdXk9spnStO6xV2SS2ORggiEjfuHDuQ8wZk8+9/W8yHK3VdzIamQBCRuJGYYDw0YTi9s9rwg+c+Yc2WPWGX1KIoEEQkrrRNS+aP10ROqb/+mXx27dOZRw1FgSAicadXxzY8dsXJrNmyhx++MI/yivj5gG1zpkAQkbh0Rt+O3HPxEGYtK+Y/X9OZRw1BZxmJSNz69mk9Wb5pN09+EDnz6PJTeoZdUlzTCEFE4tpd/zKQs3OzuOuVAj5evS3scuKaAkFE4lpSYgK/n3ASPdq35vv/O5e12/aGXVLcUiCISNzLbJ3Mk9fkUVZewfVP51OyvyzskuKSAkFEWoTjstN59IqTKSwu4dYpOvPoaCgQRKTFOCs3i3+/aBBvLd3MfdOXhV1O3NFZRiLSolx1ei+WbdzNH95dSf9O6XzjpO5hlxQ3NEIQkRbFzLj7a4M547iO3PmXRcz9bH
vYJcUNBYKItDjJiQk8esVJdGmXxveezWfdji/CLikuKBBEpEVq3yaFP16Tx/4DFdzwdD57S3XmUV1iCgQzG2Nmy8ys0MzurObxnmY208zmmdlCMxsX9djPg+WWmdmFsfYpIlJf/XIyeOjbw/l04y5uf3EBFTrzqFZ1BoKZJQKPAGOBQcAEMxtUpdldwFR3Hw6MBx4Nlh0U3B8MjAEeNbPEGPsUEam3EQNy+NdxA3lj8UYeeGt52OU0a7GcZXQqUOjuqwDMbApwMbAkqo0DbYPpTGB9MH0xMMXd9wOrzaww6I8Y+hQRaRDXndWHFZtKePidQvrlpHPxsG5hl9QsxbLLqBuwNup+UTAv2t3AlWZWBLwG3FLHsrH0KSLSIMyM/7hkCKf27sBPX1rI/LU7wi6pWYolEKyaeVV3xE0AJrt7d2Ac8KyZJdSybCx9RlZuNtHM8s0sv7i4OIZyRUS+LCUpgceuPInsjFQmPpPPxp37wi6p2YklEIqAHlH3u3Nol1Cl64CpAO4+G0gDsmpZNpY+Cfqb5O557p6XnZ0dQ7kiItXrmJ7KH685hT37y7jhmXy+KC0Pu6RmJZZAmAPkmlkfM0shcpB4WpU2nwMXAJjZQCKBUBy0G29mqWbWB8gFPo6xTxGRBjegcwYPjh9Owfqd/PilBbjrzKNKdQaCu5cBNwPTgaVEziZabGb3mNnXgmZ3ADeY2QLgBeBaj1hMZOSwBHgDuMndy2vqs6GfnIhIdUYO6sTPxhzP/y3cwENvF4ZdTrNh8ZSOeXl5np+fH3YZItICuDt3/HkBL3+yjkevOIlxQ7uEXVKjMbO57p5XVzt9UllEjklmxm++PpSTerbj9qnzKVi3M+ySQqdAEJFjVlpyIo9flUeH1inc8Ew+m3cd22ceKRBE5JiWnZHKE9fksWPvASY+O5d9B47dM48UCCJyzBvcNZMHLh/G/LU7uPMvC4/ZM48UCCIiwJghnfnx6P68Mn89j85aGXY5odA3pomIBG4a0Y/lm0q4b/oycnPSGT24c9glNSmNEEREAmbGf196Aid2z+TWF+ezfNPusEtqUgoEEZEoacmJTLo6j5SkBH7z2tKwy2lSCgQRkSo6tU3je+f0Zday4mPqO5kVCCIi1bjmzF5kpafwwJvHzpfqKBBERKrROiWJ75/blw8Kt/DRqq1hl9MkFAgiIjW48vRe5GSk8ts3lx8Tn01QIIiI1CAtOZGbRvTj49Xb+Edhyx8lKBBERGox/tQedM1M4/43l7X4UYICQUSkFqlJidx8fi6ffL6DWctb9tf4KhBEROpw6cnd6d6+FQ+08GMJCgQRkTqkJCXwwwtyWVi0kzeXbAq7nEajQBARicE3hnejT1YbHnhrBRUVLXOUoEAQEYlBUmICP7ogl6UbdvHG4o1hl9MoFAgiIjG66MSu9MtJ54E3l1PeAkcJCgQRkRglJhi3jsxlxeYS/r5wfdjlNLiYAsHMxpjZMjMrNLM7q3n8ATObH9yWm9mOqMfuNbOC4HZ51PzJZrY6arlhDfOUREQaz7ghXTi+cwYPvrWCsvKKsMtpUHUGgpklAo8AY4FBwAQzGxTdxt1vc/dh7j4MeBh4OVj2X4CTgGHAacBPzKxt1KI/qVzO3ec3yDMSEWlECQnGbaP6s2rLHl6Z37JGCbGMEE4FCt19lbuXAlOAi2tpPwF4IZgeBLzr7mXuvgdYAIypT8EiImEbPagTQ7q15aG3V3CgBY0SYgmEbsDaqPtFwbwvMbNeQB/gnWDWAmCsmbU2syxgBNAjapFfm9nCYJdT6hFXLyISAjPj9lH9+XzbXl6aWxR2OQ0mlkCwaubVdHh9PPCSu5cDuPsM4DXgQyKjhtlAWdD258DxwClAB+Bn1a7cbKKZ5ZtZfnFxy/7YuIjEjxEDchjWox0Pv72C/WXlYZfTIGIJhCIO/6++O1DTjrPxHNpdBIC7/zo4RjCKSLisCOZv8Ij9wJ+I7Jr6Enef5O557p6XnZ0dQ7
kiIo3PzLhjdH/W79zH1Dlr614gDsQSCHOAXDPrY2YpRN70p1VtZGYDgPZERgGV8xLNrGMwfQJwAjAjuN8l+GnAJUBB/Z6KiEjTOqtfFqf0bs/vZxay70D8jxLqDAR3LwNuBqYDS4Gp7r7YzO4xs69FNZ0ATPHDr/yUDLxvZkuAScCVQX8Az5nZImARkAX8qv5PR0Sk6USOJQxg0679PPfR52GXU28WT1fuy8vL8/z8/LDLEBE5zLef+CfLN+3mvZ+OoHVKUtjlfImZzXX3vLra6ZPKIiL1dPuo/mwpKeXZ2Z+FXUq9KBBEROopr3cHzumfzR/eXUnJ/rK6F2imFAgiIg3g9lH92b73AJP/sTrsUo6aAkFEpAEM69GOkQNzmPTeKnZ+cSDsco6KAkFEpIHcOrI/u/aV8dQH8TlKUCCIiDSQId0yGTO4M099sJode0vDLueIKRBERBrQbaP6U1JaxqT3VoVdyhFTIIiINKABnTP46gldmfzhGraW7A+7nCOiQBARaWA/uiCXfQfKeTzORgkKBBGRBtYvJ51LhnXjmdlr2Lx7X9jlxEyBICLSCH54QS4Hyp1HZ64Mu5SYKRBERBpB76w2XHpSd57/6HM27Pwi7HJiokAQEWkkN5/fD8d5ZGZh2KXERIEgItJIenRozWV5PXhxzlqKtu8Nu5w6KRBERBrRzef3w8x4+O3mP0pQIIiINKIuma349qk9eemTItZs2RN2ObVSIIiINLIfjOhLcqLx0Dsrwi6lVgoEEZFGlpORxlWn9+KVeeso3FwSdjk1UiCIiDSB75/bl7TkRB58u/mOEhQIIiJNoGN6Ktee2Zu/L1zPso27wy6nWjEFgpmNMbNlZlZoZndW8/gDZjY/uC03sx1Rj91rZgXB7fKo+X3M7CMzW2FmL5pZSsM8JRGR5mniOcfRJiWJB95cHnYp1aozEMwsEXgEGAsMAiaY2aDoNu5+m7sPc/dhwMPAy8Gy/wKcBAwDTgN+YmZtg8XuBR5w91xgO3BdwzwlEZHmqV3rFL57Vh/eWLyRgnU7wy7nS2IZIZwKFLr7KncvBaYAF9fSfgLwQjA9CHjX3cvcfQ+wABhjZgacD7wUtHsauORonoCISDy57qw+tE1L4ncRiE6nAAANSklEQVRvNb9RQiyB0A1YG3W/KJj3JWbWC+gDvBPMWgCMNbPWZpYFjAB6AB2BHe5eVlefIiItSWarZCaecxxvLd3M/LU76l6gCcUSCFbNPK+h7XjgJXcvB3D3GcBrwIdERg2zgbIj6dPMJppZvpnlFxcXx1CuiEjzdu1X+tC+dTL3N7NjCbEEQhGR/+ordQfW19B2PId2FwHg7r8Oji+MIhIEK4AtQDszS6qrT3ef5O557p6XnZ0dQ7kiIs1bemoS3zu3L+8tLyZ/zbawyzkolkCYA+QGZwWlEHnTn1a1kZkNANoTGQVUzks0s47B9AnACcAMd3dgJnBp0PQa4G/1eSIiIvHk6jN6kZWe0qxGCXUGQrCf/2ZgOrAUmOrui83sHjP7WlTTCcCU4M2+UjLwvpktASYBV0YdN/gZcLuZFRI5pvDH+j8dEZH40DoliRvP68eHK7cye+XWsMsBwA5//27e8vLyPD8/P+wyREQaxL4D5Zx730x6dmjN1O+dQeQEzIZnZnPdPa+udvqksohISNKSE7lpRD/mrNnOB4Vbwi5HgSAiEqbLT+lB18w0fjtjOWHvsVEgiIiEKDUpkVsuyGX+2h3MXLY51FoUCCIiIbv05O706NCK+98Md5SgQBARCVlyYgI/PD+XgnW7mLFkU2h1KBBERJqBrw/vxnFZbXjgzeVUVIQzSlAgiIg0A0mJCfxoZC6fbtzNawUbQqlBgSAi0kx89YSu5Oak87u3VlAewihBgSAi0kwkJhi3juxP4eYSXl1Q0yXjGo8CQUSkGRk7pDPHd87gwbdXUFZe0aTrViCIiDQjCQnG7aP6s3rLHl6et65p192kaxMRkTqNGtSJod
0yeejtFZSWNd0oQYEgItLMmEVGCUXbv+CluUVNtl4FgohIM3TegGyG92zH799Zwf6y8iZZpwJBRKQZMjPuGDWA9Tv3MeXjtXUv0AAUCCIizdRX+nXk1D4deGRmIfsONP4oQYEgItJMVR5LSEowVm/Z0+jrS6q7iYiIhOX04zry7k9HkJzY+P+/a4QgItLMNUUYgAJBREQCCgQREQFiDAQzG2Nmy8ys0MzurObxB8xsfnBbbmY7oh77bzNbbGZLzewhM7Ng/qygz8rlchruaYmIyJGq86CymSUCjwCjgCJgjplNc/cllW3c/bao9rcAw4PpM4GvACcED38AnAvMCu5f4e759X8aIiJSX7GMEE4FCt19lbuXAlOAi2tpPwF4IZh2IA1IAVKBZCC874cTEZEaxRII3YDoj8kVBfO+xMx6AX2AdwDcfTYwE9gQ3Ka7+9KoRf4U7C76ReWuJBERCUcsgVDdG3VNX+UzHnjJ3csBzKwfMBDoTiREzjezc4K2V7j7UODs4HZVtSs3m2hm+WaWX1xcHEO5IiJyNGL5YFoR0CPqfnegpq/yGQ/cFHX/68A/3b0EwMxeB04H3nP3dQDuvtvMnieya+qZqh26+yRgUrB8sZl9FkPN1ckEdh7lsg2hsdbfEP3Wp48jXTbW9rG0q6tNFrAlxrriibblxumjJW/LvWJq5e613oiExioiu4JSgAXA4GraDQDWABY173LgraCPZOBt4KLgflbQJhl4Cfh+XbXU5wZMasz+w1p/Q/Rbnz6OdNlY28fSrq42QH6Yv/PGumlbbpw+tC173buM3L0MuBmYDiwFprr7YjO7x8y+FtV0AjDFg+oDLwErgUVBkCxw91eJHGCebmYLgfnAOuCJumqpp1cbuf+w1t8Q/danjyNdNtb2sbQL+3calrCft7blI2sfN9uyHf7+LdJymFm+u+eFXYdIfTXVtqxPKktLNinsAkQaSJNsyxohiIgIoBGCiIgEFAgiIgIoEEREJKBAkGOGmR1nZn80s5fCrkWkPszsEjN7wsz+ZmajG6pfBYLENTN7ysw2m1lBlflfumS7Ry7QeF04lYrU7gi35Vfc/QbgWiIfAG4QCgSJd5OBMdEzoi7ZPhYYBEwws0FNX5rIEZnMkW/LdwWPNwgFgsQ1d38P2FZl9pFesl0kdEeyLVvEvcDr7v5JQ9WgQJCWqNpLtptZRzP7AzDczH4eTmkiR6Smrx+4BRgJXGpm32+olcVytVOReFPtJdvdfSvQYH88Ik2gpm35IeChhl6ZRgjSEh3JJdtFmrMm3ZYVCNISzQFyzayPmaUQ+Z6OaSHXJHI0mnRbViBIXDOzF4DZwAAzKzKz62q6ZHuYdYrUpTlsy7q4nYiIABohiIhIQIEgIiKAAkFERAIKBBERARQIIiISUCCIiAigQIh7ZuZm9mzU/SQzKzazv9ex3DAzG1fL43lmVq+PxptZtpl9ZGbzzOzs+vTV0MzsHjMbGXYdtTGzyWZ2aROs51tmttTMZlaZ37XyuyPq2l6OYp3tzOwH1a1LwqNAiH97gCFm1iq4PwpYF8Nyw4Bq/8DNLMnd8939h/Ws7QLgU3cf7u7vx7JAcLnfBmFmNV6ry93/zd3faqh1NTdH+DpeB/zA3UdEz3T39e5eGUg1bi+11FDbtdLaAQcDocq6JCzurlsc34AS4DfApcH9Z4CfAX8P7rcBniLyEfh5RC4DnQJ8DhQD84l8wcbdwCRgBvA8cF5UH+nAn4BFwELgm0Aikeu3FwTzb6tS17Aq62gFTAjaFgD3VnkO9wAfAWdFzR8IfBx1vzewMJj+t+A5FQR1V37IclbwerwL/DuwGkgOHmsLrAGSg9orX7M1wC+BT4L6jg/mZwNvBvMfBz4Dsmr4HfwaWAD8E+gUzD+4jsp2wc/zgvqmAsuB/wKuAD4O1t83avk/AO8H7b4azE8E7gue/0Lge1H9zgx+f0uqqfNLr3/wOpYAy4D7qrTvHbStbnv50nYVLH
Mt8GfgVeAdItvO21GvbWW7KcAXQX/3Va4reCyNQ9vbPGBEVN8vA28AK4D/jno9JlPDtqjbEbyfhF2AbvX8BUb+mE8AXgr+kOZz+Jv5b4Arg+l2wRtLm+CP6/dR/dwNzAVaBfej+7gX+F1U2/bAycCbUfPaVVPbwXUAXYM3lWwiV9l9B7gkeMyBy2p4fvOB44LpnwF3BdMdoto8C1wUTM8CHo167E9R65kI/DaYnszhgXBLMP0D4Mlg+vfAz4PpMUGd1QWCR63/v6NqPLiOyt9V1Gu7A+gCpBIZ0f0yeOxHla91sPwbREbyuUQudJYWPI/KdaQC+UCfoN89QJ9qaqzt9Z8F5FWzTG8OvUkf/F3GsF0VVf5+gnW1DaazgEIiV/A82Hc167oD+FMwfXxQd1rQ9yogM7j/GZELv9W5LeoW2027jFoAd19I5A9qAvBalYdHA3ea2Xwif/hpQM8auprm7l9UM38kUd/K5O7bifxhHmdmD5vZGGBXHWWeAsxy92KPXJ/lOeCc4LFy4C81LDcVuCyYvhx4MZgeERyfWAScDwyOWubFqOknge8E098hEhDVeTn4OZfIawlwFpH/ZHH3N4DtNSxbClQes4levjZz3H2Du+8HVhIZmUHkP9zo5ae6e4W7ryDymh9P5Hd6dfA7/QjoSCQwIDKiWl3N+mp7/Y9GbdvVm+5e+UUvBvzGzBYCbxG5ln+nOvo+i0jI4+6fEnnj7x889ra773T3fcASoBdHvi1KDfR9CC3HNOB/iPyX2DFqvgHfdPdl0Y3N7LRq+thTQ99G5L/gg9x9u5mdCFwI3ETkTfu7tdRX3XXdK+1z9/IaHnsR+LOZvRxZra8wszTgUSL/1a41s7uJvCF96Xm4+z/MrLeZnQskuvth31cbZX/ws5xDfxe11RztgAf/mlZZvozgOJ2ZGZFdL1XXB1ARdb+Cw/8uq15szIO6bnH36dEPmNl51P47bEi1bVfRNVxBZFRysrsfMLM1HP67qqnvmkS/buVA0lFsi1IDjRBajqeAe9x9UZX504FbgjckzGx4MH83kBFj3zOIXHGRoI/2ZpYFJLj7X4BfACfV0cdHwLlmlhUc8JxAZD96rdx9JZE//F9w6D//yjeULWaWDtR1MPIZ4AVqHh3U5AOC0YmZjSayq+xIrCGyOwMix26Sj3B5gG+ZWYKZ9QWOI7Kvfzpwo5klB7X1N7M2dfRzVK9/lKrbS03bVVWZwOYgDEYQ+Y++uv6ivUckSDCz/kRGHstqaMtRbItSAwVCC+HuRe7+YDUP/QeRN6KFZlYQ3IfIwcdBZjbfzC6vo/tfAe3NrMDMFgAjiAz9ZwW7DCYDtX4lpbtvCNrMJHLw9RN3/1tsz44XgSuJ7D7C3XcATxDZvfIKkQObtXmOyJv5CzGur9IvgdFm9gmRLznfQOSNLFZPEHkT/hio+p9zrJYReeN+Hfh+sKvkSSK7Sz4JfqePU8dov56vP3x5e6lpu6rqOSDPzPKJvMl/GtSzFfhHsE3dV2WZR4HEYHfgi8C1wa61mhzRtig10+WvpcULzuW/2N2vOsLlUoFydy8zszOAx9x9WKMUKdIM6BiCtGhm9jCR/+6P5kNVPYGpZpZA5MDxDQ1Zm0hzoxGCiIgAOoYgIiIBBYKIiAAKBBERCSgQREQEUCCIiEhAgSAiIgD8f/7NRojchwi/AAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"metrics = [evaluate_dt(trainingData, {},'leastAbsoluteError', param,3, 32) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)\n",
"\n",
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying number of iterations')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[1.3129641027539716, 0.8623427672994901, 0.824921416579645, 0.7999850600613403, 0.8169310667181966, 0.8169291195629018, 0.8169291195629018]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEKCAYAAADpfBXhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAG7RJREFUeJzt3Xl0XOV5x/HvI81II294kSBgSxiIE+oQMESmtIHghIQa2rKENI5PSEiAuj0lS3OatuT0FChpS2mTNjvUocYJTUlcAq2TkgAhEE4bQixjY5vdCWAJQ6x4AYxk2ZKe/nHfkUaj2WyNPLpXv885Op7lzp336lo/vXrmve9r7o6IiCRLXa0bICIi1adwFxFJIIW7iEgCKdxFRBJI4S4ikkAKdxGRBFK4i4gkkMJdRCSBFO4iIgmUqtUbNzc3+/z582v19iIisbR+/fpfu3tLue1qFu7z58+no6OjVm8vIhJLZvZCJdupLCMikkAKdxGRBFK4i4gkkMJdRCSBFO4iIgmkcBcRSSCFu4hIAsUu3J9++TU+f+/T7NzbV+umiIhMWLEL91907+XLP95Kt8JdRKSo2IV7Jh01uXf/QI1bIiIycZUNdzNbZWY7zGxLkecvNLNNZrbRzDrM7MzqN3NYJlUPwL4Dg+P5NiIisVZJz301sLTE8/cDp7j7IuBy4JYqtKuoTEMI93713EVEiikb7u7+ELCrxPN73d3D3amAF9u2GrI9974DCncRkWKqUnM3s4vN7Cngf4h678W2WxFKNx3d3d2H9F7ZmrvKMiIixVUl3N39Lnc/EbgI+GyJ7Va6e7u7t7e0lJ2OuKBMOltzV89dRKSYqo6WCSWcE8ysuZr7zaVwFxEpb8zhbmZvNDMLt08DGoCdY91vMUNDIVWWEREpquxKTGZ2O7AEaDazLuBaIA3g7jcDlwAfNrMDQC+wLOcD1qobHgqpnruISDFlw93dl5d5/kbgxqq1qIy6OqMhVaehkCIiJcTuClWATKqOPpVlRESKime4p+tVlhERKUHhLiKSQDEN9zp6Fe4iIkXFNNzrdYWqiEgJMQ539dxFRIqJb7j3q+cuIlJMPMM9VadZIUVESohnuKssIyJSUkzDXaNlRERKiWm4a7SMiEgpsQz3JpVlRERKimW4N6br6esfZBwnnxQRibVYhnt2Tvc+DYcUESkonuGuOd1FREqKZ7iHpfY0YkZEpLCYhnvUbI2YEREpLKbhrrKMiEgpsQz3JoW7iEhJsQz3RpVlRERKimW4D5VltEi2iEhB8Qz3MBRSM0OKiBQWz3APZRkNhRQRKSym4Z79QFU1dxGRQmIZ7hotIyJSWizDXT13EZHSYhnujansUEj13EVEColluNfVGQ2pOg2FFBEpIpbhDtEi2fv2K9xFRAopG+5mtsrMdpjZliLPf9DMNoWvn5rZKdVv5mhaak9EpLhKeu6rgaUlnn8OONvdTwY+C6ysQrvKamqoV1lGRKSIVLkN3P0hM5tf4vmf5tz9GTBv7M0qL5PSOqoiIsVUu+Z+BfCDKu+zoEy6TmUZEZEiyvbcK2Vm7yQK9zNLbLMCWAHQ1tY2pvdrTKvnLiJSTFV67mZ2MnALcKG77yy2nbuvdPd2d29vaWkZ03tmFO4iIkWNOdzNrA24E/iQuz8z9iZVJpNSWUZEpJiyZRkzux1YAjSbWRdwLZAGcPebgWuAOcDXzAyg393bx6vBWZm0RsuIiBRTyWiZ5WWevxK4smotqlCTyjIiIkXF9wpVjZYRESkqxuGunruISDGxDffGdD19/YMMDnqtmyIiMuHENtyzS+319as0IyKSL77hntJqTCIixcQ23JsaQrhrOKSIyCixDfdsWUYjZkRERotvuKssIyJSVHzDPa1wFxEpJrbh3hjKMr0KdxGRUWIb7tmee59q7iIio8Q23JtUlhERKSq24T5Uc9dQSBGRUWIc7hoKKSJSTHzDXU
MhRUSKim+4h7KMRsuIiIwW23BvTKksIyJSTGzDva7OaEjV0aeeu4jIKLENd9BSeyIixcQ63LXUnohIYTEP93qNcxcRKSDe4Z6qp3e/wl1EJF+8wz1dxz4tsyciMkqsw71RH6iKiBQU63BvStdrKKSISAGxDneNlhERKSzm4a7RMiIihcQ73DVaRkSkoHiHe7pOH6iKiBQQ83Cv11BIEZECyoa7ma0ysx1mtqXI8yea2cNm1mdmn65+E4trTNezv3+QwUE/nG8rIjLhVdJzXw0sLfH8LuATwOeq0aCDkV1HtU+9dxGREcqGu7s/RBTgxZ7f4e7rgAPVbFglhpfaU91dRCTXYa25m9kKM+sws47u7u4x70+LZIuIFHZYw93dV7p7u7u3t7S0jHl/2Z67hkOKiIwU79EyQ4tkq+YuIpIr3uGusoyISEGpchuY2e3AEqDZzLqAa4E0gLvfbGZvADqAGcCgmf0psNDdXx23VgdD4a4PVEVERigb7u6+vMzzLwPzqtaig5CtufepLCMiMkIyyjLquYuIjJCIcO9VuIuIjBDzcM9exKSyjIhIrniHe0plGRGRQmId7k0NGgopIlJIrMO9MaWyjIhIIbEOdzOjMVWnRbJFRPLEOtwhGjGj0TIiIiMlINy11J6ISL4EhHu9au4iInniH+6pevXcRUTyxD/cG7RItohIvviHe0o1dxGRfPEP97TKMiIi+RIQ7uq5i4jkS0C4a7SMiEi++Ie7RsuIiIwS+3BvalC4i4jki324N6brNBRSRCRP7MM9k6pnf/8gg4Ne66aIiEwY8Q/3tOZ0FxHJl4Bw15zuIiL5EhDuWmpPRCRf7MO9SeEuIjJK7MNdZRkRkdFiH+6N+kBVRGSU2Id7JhXCfb/CXUQkK/7hni3LqOcuIjIkAeGe/UBVNXcRkayy4W5mq8xsh5ltKfK8mdmXzGyrmW0ys9Oq38ziNBRSRGS0Snruq4GlJZ4/D1gQvlYAN429WZVrUs9dRGSUsuHu7g8Bu0psciHwTY/8DJhpZkdXq4HlDA+FVM9dRCSrGjX3uUBnzv2u8NhhkS3L9CrcRUSGVCPcrcBjBadoNLMVZtZhZh3d3d1VeGtoTEWH0KdwFxEZUo1w7wJac+7PA7YX2tDdV7p7u7u3t7S0VOGtwcxoTGlOdxGRXNUI97XAh8OomTOAV9z9pSrst2LROqrquYuIZKXKbWBmtwNLgGYz6wKuBdIA7n4zcDdwPrAV6AE+Ol6NLaZJ4S4iMkLZcHf35WWed+CqqrXoEGTSdRoKKSKSI/ZXqEJUltFoGRGRYYkI90aVZURERkhEuGdSdfSpLCMiMiQZ4Z6u16yQIiI5EhHuGi0jIjJSIsJdo2VEREZKSLir5y4ikisx4a6hkCIiwxIR7o1pjZYREcmViHDPpOrZPzDIwGDByShFRCadZIR7mNO9T8MhRUSAhIR709BqTCrNiIhAQsJdi2SLiIyUqHDXiBkRkUhCwl2LZIuI5EpEuDcOlWVUcxcRgYSEeyYVRsuo5y4iAiQk3JsaQs9dQyFFRICEhHtGQyFFREZIRriHskzvfvXcRUQgKeGeVllGRCRXQsJdZRkRkVwJCXddoSoikisR4d6Yig5DQyFFRCKJCHczi5ba61dZRkQEEhLuEFZj0mgZEREgSeGe0jqqIiJZyQl3lWVERIYkKNzVcxcRyUpMuDcq3EVEhlQU7ma21MyeNrOtZnZ1geePNbP7zWyTmT1oZvOq39TSmtJ19OkiJhERoIJwN7N64KvAecBCYLmZLczb7HPAN939ZOB64IZqN7ScTLpe0w+IiASV9NxPB7a6+y/dfT/wbeDCvG0WAveH2w8UeH7cZVIaCikiklVJuM8FOnPud4XHcj0GXBJuXwxMN7M5+TsysxVm1mFmHd3d3YfS3qKi0TIKdxERqCzcrcBjnnf/08DZZrYBOBt4Eegf9SL3le7e7u7tLS0tB93YUqLRMqq5i4gApCrYpgtozbk/D9
ieu4G7bwfeC2Bm04BL3P2VajWyEhoKKSIyrJKe+zpggZkdZ2YNwAeAtbkbmFmzmWX39RlgVXWbWV4mXa/RMiIiQdlwd/d+4GPAPcCTwBp3f9zMrjezC8JmS4CnzewZ4Cjg78apvUVl0nXsHxhkYDC/YiQiMvlUUpbB3e8G7s577Jqc23cAd1S3aQcnd073qY0VHZaISGIl5grVTCq7GpPq7iIiyQn3oXVUVXcXEUleuKvnLiKSpHBXWUZEJCtB4Z7tuassIyKSwHBXz11EROEuIpJACQr3bM1dZRkRkcSEe1Pouf/q1X01bomISO0lJtznzZrCKfOO4J/ueZotLx7WOctERCacxIR7fZ3x9cvamT21gctXr2P7nt5aN0lEpGYSE+4AR07PsOoji+ndP8Dlq9fx2r4DtW6SiEhNJCrcAd78hul87dLT2LpjL1f9xwb6B/QBq4hMPokLd4CzFrTwtxedxEPPdHPN2sdx1zTAIjK5JHZu3A+c3sYLu3q46cFfcOzsKfzR2SfUukkiIodNYsMd4M/PfTPbdvVwww+eonX2FM5/69G1bpKIyGGRyLJMVl2d8fk/OIXT2mbyqe9s5NFtu2vdJBGRwyLR4Q7RtARf/3A7R83I8Iff6KBzV0+tmyQiMu4SH+4Ac6Y1cutHF9M/6Hzk1p/zSo+GSIpIsk2KcAc4oWUa//qht7FtVw9//O/r2a8Vm0QkwSZNuAOccfwc/vF9J/PwL3fymTs3a4ikiCRWokfLFHLxqfN4YWcPX/jRs8yfM4WPn7Og1k0SEam6SRfuAJ88ZwHbdvbw+fueYUZTmuWnt9GQmlR/xIhIwk3KcDczbrjkrbz86j6uXfs4X7z/WS4+dS7LFrfypqOm17p5IiJjZrWqO7e3t3tHR0dN3jtrYNB56Nlu1qzr5EdP/ooDA86i1pksW9zK759yDNMaJ+XvPhGZwMxsvbu3l91uMod7rp17+7hrw4t8Z10nz+7YS1O6nt89+WiWLW6l/dhZmFmtm3jYuTvdr/XRubuHF/fs0yRsIlXypqOmc9LcIw7ptZWGu7qmwZxpjVx51vFcceZxbOjcw5p1nXzvse3csb6L41um8v72Vt572lyOnJ6pdVOr6vW+fjp397BtZw/bdvXQtbuXbbt66NzVQ+fuHi1bKDIO/vjsEw453CulnnsJr/f18z+bX2LNuk46XthNfZ3xrhOPZFl7K0ve3EKqfuJ/CNs/MMhLr+wbCuttu3rYtqs3ur+rh52v7x+x/bTGFK2zp9A2u4m22VNoDV/zZjbpQ2eRKpmRSTNrasMhvVZlmSrbumMv/9nRyXcf7eLXe/dz5PRGLnnbPN7f3spxzVNr1i53Z0/Pgai3HcI7Cu6oB759Ty/9g8PnOFVnHDMzN7ij222zp9A6awozp6QnZQlKJC6qGu5mthT4IlAP3OLu/5D3fBvwDWBm2OZqd7+71D7jFu5ZBwYG+fFTO1izrpMHnt7BoMPpx83mA4tbOe+ko2lqqK/6e+47MEDX7l46d0fBvW1nNsh76drVw2t9/SO2nzO1IfS+h8O7dVYU5kcfkYnFXxwiUljVwt3M6oFngPcAXcA6YLm7P5GzzUpgg7vfZGYLgbvdfX6p/cY13HP96tV93LG+izUdnbyws4fpjSkuWHQMyxa38ta5R1TcAx4cdHaEDy6Hg3u4B/7yq/tGbN+YqhvqeQ/1wGc10TYnCvGpGuUjkljV/ED1dGCru/8y7PjbwIXAEznbODAj3D4C2H5wzY2no2ZkuOqdb+RPlpzAI8/tYs26qGzzrUe2ceIbprNscSsXLZrLrKkNvLbvwFCppGuo9p394LJ3xFw3ZnD0jAzzZk/hzAXNo3rgLdMbVToRkZIq6bm/D1jq7leG+x8CftPdP5azzdHAvcAsYCrwbndfX2q/Sei5F/LqvgOs3bidNR2dbOp6hYb6OqY21rM7bybK6ZnUcK07txc+q4m5s5poTFW/vCMi8VfNnnuhLmL+b4
TlwGp3/7yZ/RZwm5md5O4jxtGZ2QpgBUBbW1sFbx0/MzJpLj3jWC4941ie2P4qdz7aRc+BgREfWrbNnsIRU9K1bqqIJFgl4d4FtObcn8fosssVwFIAd3/YzDJAM7AjdyN3XwmshKjnfohtjo2Fx8xg4TELa90MEZmEKhk2sQ5YYGbHmVkD8AFgbd4224BzAMzsN4AM0F3NhoqISOXKhru79wMfA+4BngTWuPvjZna9mV0QNvsz4A/N7DHgduAjrsnSRURqpqIxc2HM+t15j12Tc/sJ4O3VbZqIiBwqXc0iIpJACncRkQRSuIuIJJDCXUQkgRTuIiIJVLMpf82sG3ihzGbNwK8PQ3MmGh335DNZj13HffCOdfeWchvVLNwrYWYdlcyhkDQ67slnsh67jnv8qCwjIpJACncRkQSa6OG+stYNqBEd9+QzWY9dxz1OJnTNXUREDs1E77mLiMghmLDhbmZLzexpM9tqZlfXuj3jxcxazewBM3vSzB43s0+Gx2eb2X1m9mz4d1at2zoezKzezDaY2ffD/ePM7JFw3N8J00wnipnNNLM7zOypcN5/azKcbzP7VPg/vsXMbjezTBLPt5mtMrMdZrYl57GC59ciXwo5t8nMTqtWOyZkuIdFub8KnAcsBJaHhbeTqB/4M3f/DeAM4KpwrFcD97v7AuD+cD+JPkk0lXTWjcC/hOPeTbQQTNJ8Efihu58InEJ0/Ik+32Y2F/gE0O7uJwH1RGtDJPF8ryYsXpSj2Pk9D1gQvlYAN1WrERMy3MlZlNvd9wPZRbkTx91fcvdHw+3XiH7Q5xId7zfCZt8ALqpNC8ePmc0Dfhe4Jdw34F3AHWGTxB23mc0A3gH8G4C773f3PUyC8000xXiTmaWAKcBLJPB8u/tDwK68h4ud3wuBb3rkZ8DMsCb1mE3UcJ8LdObc7wqPJZqZzQdOBR4BjnL3lyD6BQAcWbuWjZsvAH8BZNfanQPsCQvEQDLP+/FEq5TdGspRt5jZVBJ+vt39ReBzRKu2vQS8Aqwn+ec7q9j5Hbesm6jhXsmi3IliZtOA7wJ/6u6v1ro9483Mfg/Y4e7rcx8usGnSznsKOA24yd1PBV4nYSWYQkKN+ULgOOAYYCpRSSJf0s53OeP2f36ihnsli3InhpmliYL9W+5+Z3j4V9k/z8K/O4q9PqbeDlxgZs8Tld3eRdSTnxn+bIdknvcuoMvdHwn37yAK+6Sf73cDz7l7t7sfAO4Efpvkn++sYud33LJuooZ7JYtyJ0KoM/8b8KS7/3POU2uBy8Lty4D/PtxtG0/u/hl3n+fu84nO74/d/YPAA8D7wmZJPO6XgU4ze3N46BzgCRJ+vonKMWeY2ZTwfz573Ik+3zmKnd+1wIfDqJkzgFey5Zsxc/cJ+QWcDzwD/AL4q1q3ZxyP80yiP8M2ARvD1/lE9ef7gWfDv7Nr3dZx/B4sAb4fbh8P/BzYCvwn0Fjr9o3D8S4COsI5/y9g1mQ438DfAE8BW4DbgMYknm/gdqLPFQ4Q9cyvKHZ+icoyXw05t5loNFFV2qErVEVEEmiilmVERGQMFO4iIgmkcBcRSSCFu4hIAincRUQSSOEeU2bmZnZbzv2UmXVnZ1cs8bpFZnZ+iefbzexLY2xbS5jpb4OZnTWWfYX9zc/OsJfbPjNrNLMfmdlGM1tmZmeFWQc3mlnTWN+3RHuWmNlvj9f+i7znLbWaPC8cb8H/V2Z2t5nNPNxtkvJS5TeRCep14CQza3L3XuA9wIsVvG4R0A7cnf+EmaXcvYNoDPZYnAM85e6Xld1y+L3r3X2g3HZ57TsVSLv7orCPm4HPufutFb6nES1YM1h245GWAHuBnx7k6w6Zu195uN7rYLh70Y6C1FitB/zr65AvlNgL/D3wvnD/m8BfMnwx0FRgFdHVvhuI5vVoILpSsJvoYqllwHVES37dC/wHIy8omgbcSnRxxSbgEqKpWlcTXYiyGfhUXrsW5b1HE7A8bLsFuDHvGK
4nmijtzLz9vA14DHgY+CdgS3h8CfB9oomXthJNQLUR+COimfieI5rGAeDPw/FvAv4mPDafaObNr4Xvy7HAueF9HiW6kGZa2PZ5ogtvHg3tPzG8/mWiX6QbgbPy2n0d0ax/94bXvxf4x/D6HxL9MgK4JrRtS/j+G1Fnax2wJGxzA/B34faDhAtcwvftRqKJt35ENIvqg8AvgQvCNh8BvpLTru/n7Lfs6/OOaQnwEHAX0VWlNwN1Od+j5pzv69eBx8PxN4VtPhFetwn4dq1/dibLV80boK9DPHHRD+jJRHOTZELQLGE4mP8euDTcnkl0te/UAj/014Uf8uwPYu4+bgS+kLPtLKLQvS/nsZkF2jb0HkSTRG0DWkJ4/Ri4KDznwPuLHN8m4Oxwe1S4598O91cz/MvuXIZDsy6E2ztCCA0CZ4TtmkNwTQ33/xK4Jtx+Hvh4uP0nwC0537NPF2n3dcD/Ammiudp7gPPCc3flHPvsnNfcBvx+uP2WEJLvIfrl0xAef5DhcPe8fd6b834b889BuJ8b7mVfn3dMS4B9RFeT1gP35Xyfn2c43PuBReHxNQz//9tOuPKUAv9f9DU+X6q5x5i7byL6oVrO6DLLucDVZraRKBgyQFuRXa31qLST791El0Zn3283Ue/ueDP7spktBcrNYLkYeNCjCaP6gW8RhSzAANGEaSOY2RFEIfCT8NBt+dtU4NzwtYGo530i0YIIAC94NHc2RAukLAT+L3yvLiPqzWdlJ3JbT/S9rsQPPJocazNRGP4wPL45Zx/vDJ9LbCaaNO0tAO7+ONHxfg+43KP1DPLtz9vnT3Ler5I2Hsrrf+7R+goDRJfXn1lgm+fcfWO4nfv92gR8y8wuJfoFIIeBau7xt5ZonuwlRPNXZBlwibs/nbuxmf1mgX28XmTfRt70o+6+28xOAX4HuAp4P3B5ifYVmtI0a58XrrOPet9DYMAN7v6vIx6M5sx/PW+7+9x9eZH99IV/B6j856UPwN0HzeyAhy4r0V8MKTPLEJWF2t2908yuI/rlm/VWYA9wVJH95+8z9/2ybexn5ICJzEG+Pl/++Sh0fvpybg8QleQgWpDlHcAFwF+b2Vt8eA53GSfqucffKuB6d9+c9/g9wMfDh4aY2anh8deA6RXu+17gY9k7ZjbLzJqJ6q3fBf6aaLraUh4Bzjaz5rB84nLgJ6Ve4NHKRK+YWbZ3+MEK25vrHuDyME8+ZjbXzAotgPEz4O1m9saw3RQze1OZfR/M97CQbND+OrQvOysiZvZeol/S7wC+NIaRKM8Di8yszsxaierqY3F6mKW1juizmv+t5EVh+1Z3f4BoYZaZRJ/lyDhTuMecu3e5+xcLPPVZojrqpjCM8LPh8QeAhdnhg2V2/7fALIsWNH4MeCfRKjEPhhLGauAzZdr3UtjmAaIPSB9190qmdf0o8FUzexgoVDIqyd2zHxA/HEofd1AgkN29m6g+fbuZbSIK+xPL7P57wMXhe3jQQz3DL6+vE5VB/ovoQ1TCL85/AK5w92eArxCtt3oo/o/ow+XNRH/ZPXqI+8l6OLRtS9jvXRW+rh7493AONhCtl7pnjG2RCmhWSBGRBFLPXUQkgRTuIiIJpHAXEUkghbuISAIp3EVEEkjhLiKSQAp3EZEEUriLiCTQ/wPx770DyW9UeAAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"metrics = [evaluate_dt(trainingData, {},'leastAbsoluteError',10,3, param) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)\n",
"\n",
"plot(params, metrics)\n",
"pyplot.xlabel('Metrics for different maximum bins')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
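The per-record loops in the notebook above compute MSE, MAE and RMSLE by hand. The following is a minimal, Spark-free sketch of the same three formulas, assuming the input is an iterable of `(actual, predicted)` pairs such as the result of `true_vs_predicted_GBT.collect()`, and that every value is greater than -1 so that `log(x + 1)` is defined. The helper name `regression_metrics` is hypothetical and is not part of PySpark.

```python
import math
from typing import Dict, Iterable, Tuple


def regression_metrics(pairs: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Compute MSE, MAE and RMSLE from (actual, predicted) pairs.

    Hypothetical helper mirroring the notebook's per-record loop.
    Assumes every actual and predicted value is > -1 so log1p is defined.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one (actual, predicted) pair")
    # Sum of squared errors, absolute errors and squared log errors
    sq = sum((p - a) ** 2 for a, p in pairs)
    ab = sum(abs(p - a) for a, p in pairs)
    sq_log = sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in pairs)
    return {
        "mse": sq / n,                    # Mean Squared Error
        "mae": ab / n,                    # Mean Absolute Error
        "rmsle": math.sqrt(sq_log / n),   # Root Mean Squared Log Error
    }
```

`math.log1p(x)` computes the same quantity as `np.log(x + 1)`, with better precision for small `x`. On an RDD, the same means can be computed without collecting to the driver, for example `true_vs_predicted_GBT.map(lambda t: (t[1] - t[0]) ** 2).mean()`. MLlib's `pyspark.mllib.evaluation.RegressionMetrics` also reports mean squared and mean absolute error, but it expects `(prediction, observation)` pairs and does not report RMSLE.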
assign2/.ipynb_checkpoints/house-checkpoint.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"SparkContext\n",
"\n",
"Spark UI\n",
"\n",
"Version: v2.2.0\n",
"Master: local[*]\n",
"AppName: PySparkShell\n",
"\n",
" "
],
"text/plain": [
"<SparkContext master=local[*] appName=PySparkShell>"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sc"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.regression import LabeledPoint,LinearRegressionWithSGD\n",
"from pyspark.mllib.tree import DecisionTree\n",
"import numpy as np\n",
"import operator\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"\n",
"house_df = sc.textFile(\"/Users/Priya/Desktop/house/trainnoheader.csv\")\n",
"data_count= house_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"records = house_df.map(lambda x: x.split(\",\"))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"type_columns=[2,5,7,8,9,10,11,12,13,14,15,16,21,22,23,24,27,28,29,39,40,41,53,55,65,78,79]\n",
"type_columns_with_NA=[6,25,30,31,32,33,35,42,57,58,60,63,64,72,73,74]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"number_columns=[1,4,17,18,19,20,34,36,37,38,43,44,45,46,47,48,49,50,51,52,54,56,61,62,66,67,68,69,70,71,75,76,77]\n",
"number_columns_with_NA=[3,26,59]\n",
"number_columns_with_many_zeros=[26,34,36,37,38,44,45,62,66,67,68,69,70,71,75]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"saleprice_column=80"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def getMapOfColumn(idx):\n",
" return records.map(lambda fields:fields[idx]).distinct().zipWithIndex().collectAsMap()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def get_type_maps():\n",
" type_maps={}\n",
" for i in type_columns:\n",
" type_maps[i]=getMapOfColumn(i)\n",
" for i in type_columns_with_NA:\n",
" type_maps[i]=getMapOfColumn(i)\n",
" return type_maps"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"type_maps=get_type_maps()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{2: {'RL': 0, 'RH': 1, 'RM': 2, 'C (all)': 3, 'FV': 4}, 5: {'Pave': 0, 'Grvl': 1}, 7: {'Reg': 0, 'IR1': 1, 'IR2': 2, 'IR3': 3}, 8: {'Bnk': 0, 'Low': 1, 'Lvl': 2, 'HLS': 3}, 9: {'NoSeWa': 0, 'AllPub': 1}, 10: {'FR2': 0, 'CulDSac': 1, 'Inside': 2, 'Corner': 3, 'FR3': 4}, 11: {'Gtl': 0, 'Mod': 1, 'Sev': 2}, 12: {'CollgCr': 0, 'Mitchel': 1, 'NWAmes': 2, 'NAmes': 3, 'MeadowV': 4, 'Edwards': 5, 'ClearCr': 6, 'NPkVill': 7, 'Blmngtn': 8, 'SWISU': 9, 'Veenker': 10, 'Crawfor': 11, 'NoRidge': 12, 'Somerst': 13, 'OldTown': 14, 'BrkSide': 15, 'Sawyer': 16, 'NridgHt': 17, 'SawyerW': 18, 'IDOTRR': 19, 'Timber': 20, 'Gilbert': 21, 'StoneBr': 22, 'BrDale': 23, 'Blueste': 24}, 13: {'Norm': 0, 'Feedr': 1, 'PosN': 2, 'Artery': 3, 'RRAe': 4, 'RRNn': 5, 'PosA': 6, 'RRAn': 7, 'RRNe': 8}, 14: {'Norm': 0, 'Artery': 1, 'RRNn': 2, 'Feedr': 3, 'PosN': 4, 'PosA': 5, 'RRAe': 6, 'RRAn': 7}, 15: {'1Fam': 0, 'Duplex': 1, 'TwnhsE': 2, '2fmCon': 3, 'Twnhs': 4}, 16: {'1.5Fin': 0, '1.5Unf': 1, 'SLvl': 2, '2.5Unf': 3, '2.5Fin': 4, '2Story': 5, '1Story': 6, 'SFoyer': 7}, 21: {'Hip': 0, 'Shed': 1, 'Gable': 2, 'Gambrel': 3, 'Mansard': 4, 'Flat': 5}, 22: {'Metal': 0, 'Membran': 1, 'Roll': 2, 'CompShg': 3, 'WdShngl': 4, 'WdShake': 5, 'Tar&Grv': 6, 'ClyTile': 7}, 23: {'VinylSd': 0, 'WdShing': 1, 'Plywood': 2, 'BrkComm': 3, 'AsphShn': 4, 'CBlock': 5, 'MetalSd': 6, 'Wd Sdng': 7, 'HdBoard': 8, 'BrkFace': 9, 'CemntBd': 10, 'AsbShng': 11, 'Stucco': 12, 'Stone': 13, 'ImStucc': 14}, 24: {'VinylSd': 0, 'Wd Shng': 1, 'Plywood': 2, 'CmentBd': 3, 'AsphShn': 4, 'CBlock': 5, 'MetalSd': 6, 'HdBoard': 7, 'Wd Sdng': 8, 'BrkFace': 9, 'Stucco': 10, 'AsbShng': 11, 'Brk Cmn': 12, 'ImStucc': 13, 'Stone': 14, 'Other': 15}, 27: {'Fa': 0, 'Gd': 1, 'TA': 2, 'Ex': 3}, 28: {'Fa': 0, 'Po': 1, 'TA': 2, 'Gd': 3, 'Ex': 4}, 29: {'PConc': 0, 'CBlock': 1, 'BrkTil': 2, 'Wood': 3, 'Slab': 4, 'Stone': 5}, 39: {'GasW': 0, 'GasA': 1, 'Grav': 2, 'Wall': 3, 'OthW': 4, 'Floor': 5}, 40: {'Fa': 0, 'Po': 1, 'Ex': 2, 'Gd': 3, 'TA': 4}, 41: {'N': 0, 'Y': 1}, 53: {'Fa': 0, 'Gd': 1, 'TA': 2, 'Ex': 3}, 55: {'Typ': 0, 'Min2': 1, 'Maj2': 2, 'Min1': 3, 'Maj1': 4, 'Mod': 5, 'Sev': 6}, 65: {'N': 0, 'Y': 1, 'P': 2}, 78: {'WD': 0, 'New': 1, 'ConLw': 2, 'COD': 3, 'ConLD': 4, 'ConLI': 5, 'CWD': 6, 'Con': 7, 'Oth': 8}, 79: {'Normal': 0, 'AdjLand': 1, 'Family': 2, 'Abnorml': 3, 'Partial': 4, 'Alloca': 5}, 6: {'NA': 0, 'Pave': 1, 'Grvl': 2}, 25: {'None': 0, 'NA': 1, 'BrkFace': 2, 'Stone': 3, 'BrkCmn': 4}, 30: {'NA': 0, 'Fa': 1, 'Gd': 2, 'TA': 3, 'Ex': 4}, 31: {'NA': 0, 'Fa': 1, 'Po': 2, 'TA': 3, 'Gd': 4}, 32: {'Mn': 0, 'NA': 1, 'No': 2, 'Gd': 3, 'Av': 4}, 33: {'GLQ': 0, 'Rec': 1, 'NA': 2, 'ALQ': 3, 'Unf': 4, 'BLQ': 5, 'LwQ': 6}, 35: {'NA': 0, 'Rec': 1, 'GLQ': 2, 'Unf': 3, 'BLQ': 4, 'ALQ': 5, 'LwQ': 6}, 42: {'FuseF': 0, 'FuseA': 1, 'FuseP': 2, 'Mix': 3, 'NA': 4, 'SBrkr': 5}, 57: {'NA': 0, 'Fa': 1, 'Po': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}, 58: {'BuiltIn': 0, 'CarPort': 1, 'NA': 2, 'Basment': 3, '2Types': 4, 'Attchd': 5, 'Detchd': 6}, 60: {'Fin': 0, 'NA': 1, 'RFn': 2, 'Unf': 3}, 63: {'Fa': 0, 'NA': 1, 'Po': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}, 64: {'Fa': 0, 'NA': 1, 'Po': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}, 72: {'NA': 0, 'Fa': 1, 'Ex': 2, 'Gd': 3}, 73: {'NA': 0, 'MnPrv': 1, 'MnWw': 2, 'GdWo': 3, 'GdPrv': 4}, 74: {'NA': 0, 'Shed': 1, 'Othr': 2, 'Gar2': 3, 'TenC': 4}}\n"
]
}
],
"source": [
"print(type_maps)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def get_type_cnt(maps):\n",
" return sum([len(maps[i]) for i in maps])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature vector length for type features: 268\n",
"Feature vector length for numerical features: 33\n",
"Total feature vector length: 301\n",
"Total_dt feature vector length: 76\n"
]
}
],
"source": [
"type_cnt=get_type_cnt(type_maps)\n",
"number_cnt=len(number_columns)\n",
"total=type_cnt+number_cnt\n",
"\n",
"total_dt=len(type_columns)+len(type_columns_with_NA)+len(number_columns)\n",
"\n",
"print (\"Feature vector length for type features: %d\" % type_cnt)\n",
"print (\"Feature vector length for numerical features: %d\" % number_cnt)\n",
"print (\"Total feature vector length: %d\" % total)\n",
"print (\"Total_dt feature vector length: %d\" % total_dt)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
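"def extract_features(fields):\n",
"    # Linear models: one-hot encode each categorical column into its own block\n",
"    # of len(type_maps[i]) slots, then append the numerical columns as raw floats\n",
"    features = np.zeros(total)\n",
"    step = 0  # start offset of the current column's block\n",
"    for i in type_columns:\n",
"        features[step + int(type_maps[i][fields[i]])] = 1.0\n",
"        step += len(type_maps[i])\n",
"    for i in type_columns_with_NA:\n",
"        features[step + int(type_maps[i][fields[i]])] = 1.0\n",
"        step += len(type_maps[i])\n",
"    for i in number_columns:\n",
"        features[step] = float(fields[i])\n",
"        step += 1\n",
"    return features"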
]
},
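{
"cell_type": "markdown",
"metadata": {},
"source": [
"The offset arithmetic in `extract_features` can be checked on a tiny example. The sketch below uses hypothetical toy maps (`toy_maps`, `toy_cat_cols` and `toy_num_cols` are illustrative, not taken from the House Prices data). It follows the same layout: each categorical column owns a block of `len(map)` slots containing a single `1.0`, and the numerical columns are appended as raw floats."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical toy maps: column index -> {category: code}\n",
"toy_maps = {0: {'RL': 0, 'RM': 1, 'FV': 2}, 1: {'Pave': 0, 'Grvl': 1}}\n",
"toy_cat_cols = [0, 1]\n",
"toy_num_cols = [2]\n",
"\n",
"\n",
"def one_hot_row(fields):\n",
"    # Total width: one slot per category of every categorical column, plus one per numeric column\n",
"    width = sum(len(toy_maps[c]) for c in toy_cat_cols) + len(toy_num_cols)\n",
"    vec = np.zeros(width)\n",
"    offset = 0  # start of the current column's block\n",
"    for c in toy_cat_cols:\n",
"        vec[offset + toy_maps[c][fields[c]]] = 1.0\n",
"        offset += len(toy_maps[c])\n",
"    for c in toy_num_cols:\n",
"        vec[offset] = float(fields[c])\n",
"        offset += 1\n",
"    return vec\n",
"\n",
"\n",
"# 'RM' -> slot 1, 'Grvl' -> slot 3 + 1 = 4, the numeric value lands in slot 5\n",
"print(one_hot_row(['RM', 'Grvl', '8450']).tolist())"
]
},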
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"def extract_features_dt(fields):\n",
"    # Tree models: one ordinal code per categorical column (no one-hot blocks),\n",
"    # followed by the numerical columns as raw floats\n",
"    features = np.zeros(total_dt)\n",
"    step = 0\n",
"    for i in type_columns:\n",
"        features[step] = float(type_maps[i][fields[i]])\n",
"        step += 1\n",
"    for i in type_columns_with_NA:\n",
"        features[step] = float(type_maps[i][fields[i]])\n",
"        step += 1\n",
"    for i in number_columns:\n",
"        features[step] = float(fields[i])\n",
"        step += 1\n",
"    return features"
]
},
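{
"cell_type": "markdown",
"metadata": {},
"source": [
"`extract_features_dt` stores one ordinal code per categorical column instead of a one-hot block, which suits tree models. MLlib's `DecisionTree.trainRegressor` can be told which positions are categorical through its `categoricalFeaturesInfo` argument, a dict of `{feature_index: number_of_categories}`. A minimal sketch follows. It assumes the categorical columns occupy the first positions of the vector in the same order as the loops above. `categorical_features_info` is a hypothetical helper, shown here on toy maps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def categorical_features_info(maps, cat_cols):\n",
"    # Position j of the vector holds the ordinal code of column cat_cols[j],\n",
"    # so its arity is the number of distinct categories in that column's map\n",
"    return {j: len(maps[c]) for j, c in enumerate(cat_cols)}\n",
"\n",
"\n",
"# Hypothetical toy maps: column index -> {category: code}\n",
"toy_dt_maps = {2: {'RL': 0, 'RH': 1, 'RM': 2}, 5: {'Pave': 0, 'Grvl': 1}}\n",
"print(categorical_features_info(toy_dt_maps, [2, 5]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook the columns would presumably be `list(type_columns) + list(type_columns_with_NA)`. MLlib also requires `maxBins` to be at least the largest arity. Column 12 (`Neighborhood`) has 25 categories in `type_maps` above, which fits within the default `maxBins=32`."
]
},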
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"data=records.map(lambda fields: LabeledPoint(float(fields[saleprice_column]),extract_features(fields)))\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Label: 208500.0\n",
"Linear Model feature vector:\n",
"[1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,60.0,8450.0,7.0,5.0,2003.0,2003.0,706.0,0.0,150.0,856.0,856.0,854.0,0.0,1710.0,1.0,0.0,2.0,1.0,3.0,1.0,8.0,0.0,2.0,548.0,0.0,61.0,0.0,0.0,0.0,0.0,0.0,2.0,2008.0]\n",
"Linear Model feature vector length: 301\n"
]
}
],
"source": [
"first_point = data.first()\n",
"#print (\"Raw data: \" + str(first_point[1:]))\n",
"print (\"Label: \" + str(first_point.label))\n",
"print (\"Linear Model feature vector:\\n\" + str(first_point.features))\n",
"print (\"Linear Model feature vector length: \" + str(len(first_point.features)))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.regression import LinearRegressionWithSGD"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
}
],
"source": [
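"# Note: the features are unscaled (e.g. LotArea = 8450 next to 0/1 one-hot entries),\n",
"# so step=0.1 makes SGD diverge; see the predictions printed below\n",
"lrModel=LinearRegressionWithSGD.train(data, iterations=10, step=0.1, intercept=False)\n",
"true_vs_predicted=data.map(lambda p: (p.label, lrModel.predict(p.features)))"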
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear Model predictions: [(208500.0, -1.3111060925180484e+75), (181500.0, -1.4720767452081686e+75), (223500.0, -1.7050281430818638e+75), (140000.0, -1.4631365187530982e+75), (250000.0, -2.1369709269890862e+75)]\n"
]
}
],
"source": [
"print (\"Linear Model predictions: \" + str(true_vs_predicted.take(5)))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear Model - Mean Squared Error: 4519283835876382689242228853370308019420839092654378420329959275965062173707428040253979807151535809132649831467864490762707227557576984928707873341440.0000\n"
]
}
],
"source": [
"# Mean squared error over all (true, predicted) pairs\n",
"squared_errors = [(pred - actual) ** 2 for actual, pred in true_vs_predicted.collect()]\n",
"mean = sum(squared_errors) / len(squared_errors)\n",
"print (\"Linear Model - Mean Squared Error: %2.4f\" % mean)"
]
},
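{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first model's predictions (around -1e75) and its enormous MSE suggest that SGD diverged. The raw feature vector mixes one-hot 0/1 entries with values such as `8450.0` (lot area) and `2003.0` (year built), and `step=0.1` is far too large for that scale. The sketch below reproduces the effect on synthetic data. It uses plain full-batch gradient descent, not Spark's exact SGD update (which also decays the step size), and shows that standardising the features keeps the weights bounded. All function and variable names are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def gradient_descent(X, y, step, iterations):\n",
"    # Plain full-batch gradient descent on 0.5 * mean squared error (not Spark's SGD schedule)\n",
"    w = np.zeros(X.shape[1])\n",
"    for _ in range(iterations):\n",
"        grad = X.T @ (X @ w - y) / len(y)\n",
"        w -= step * grad\n",
"    return w\n",
"\n",
"\n",
"rng = np.random.default_rng(0)\n",
"# Synthetic features on very different scales (a lot-area-like column and a 1-10 rating)\n",
"X_raw = np.column_stack([rng.uniform(1000, 20000, 200), rng.uniform(1, 10, 200)])\n",
"y_demo = 10.0 * X_raw[:, 0] + 5000.0 * X_raw[:, 1]\n",
"\n",
"# Standardise each column to zero mean and unit variance; centring y removes the need for an intercept\n",
"X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)\n",
"\n",
"w_raw = gradient_descent(X_raw, y_demo, step=0.1, iterations=10)\n",
"w_std = gradient_descent(X_std, y_demo - y_demo.mean(), step=0.1, iterations=10)\n",
"\n",
"print('max |w| with raw features:', np.abs(w_raw).max())\n",
"print('max |w| with standardised features:', np.abs(w_std).max())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To try this on the RDD itself, MLlib's `pyspark.mllib.feature.StandardScaler` (for example with `withMean=True, withStd=True`) could be fitted on `data.map(lambda p: p.features)` and used to transform the features before calling `LinearRegressionWithSGD.train`. The step size would still need tuning."
]
},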
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"targets = records.map(lambda r: float(r[-1])).collect()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"import pylab"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/anaconda3/lib/python3.6/site-packages/IPython/core/magics/pylab.py:160: UserWarning: pylab import has clobbered these variables: ['mean', 'pylab']\n",
"`%matplotlib` prevents importing * from pylab and numpy\n",
" \"\\n`%matplotlib` prevents importing * from pylab and numpy\"\n"
]
}
],
"source": [
"%pylab inline"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure: density-normalised histogram of SalePrice (targets), 40 bins>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"hist(targets, bins=40, color='lightblue', density=True)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"\n",
"fig.set_size_inches(16, 10)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure: density-normalised histogram of log(SalePrice) (log_targets), 40 bins>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"log_targets = records.map(lambda r: np.log(float(r[-1]))).collect()\n",
"\n",
"hist(log_targets, bins=40, color='lightblue', density=True)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"\n",
"fig.set_size_inches(16, 10)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"data_log = data.map(lambda lp: LabeledPoint(np.log(lp.label), lp.features))\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
}
],
"source": [
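"# intercept defaults to False; with unscaled features this model also appears to diverge,\n",
"# so exp(prediction) underflows to 0.0 (see the outputs below)\n",
"model_log = LinearRegressionWithSGD.train(data_log, iterations=10, step=0.1)"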
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"true_vs_predicted_log = data_log.map(lambda p: (np.exp(p.label), np.exp(model_log.predict(p.features))))"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1460\n",
"log - Mean Squared Error: 39039267707.7658\n",
"log - Mean Absolute Error: 180921.1959\n",
"Root Mean Squared Log Error: 12.0307\n"
]
}
],
"source": [
"# exp() of the diverged model's predictions underflows to 0.0 (see the output of the next cell),\n",
"# so these metrics effectively compare SalePrice against 0\n",
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_log.collect():\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"log - Mean Squared Error: %2.4f\" % t)\n",
"print(\"log - Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)\n",
"\n"
]
},
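{
"cell_type": "markdown",
"metadata": {},
"source": [
"The accumulation loops above can be collapsed into vectorised NumPy operations. A minimal sketch follows; the helper name `regression_metrics` and the sample pairs are hypothetical. In this notebook it would presumably be called as `regression_metrics(true_vs_predicted_log.collect())`. `np.log1p(x)` computes `log(x + 1)`, so the RMSLE term matches the loop."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def regression_metrics(true_and_pred):\n",
"    # true_and_pred: sequence of (actual, predicted) pairs, e.g. the result of .collect()\n",
"    arr = np.asarray(true_and_pred, dtype=float)\n",
"    actual, pred = arr[:, 0], arr[:, 1]\n",
"    mse = np.mean((pred - actual) ** 2)\n",
"    mae = np.mean(np.abs(pred - actual))\n",
"    # np.log1p(x) computes log(x + 1); it is not finite for x <= -1\n",
"    rmsle = np.sqrt(np.mean((np.log1p(pred) - np.log1p(actual)) ** 2))\n",
"    return mse, mae, rmsle\n",
"\n",
"\n",
"# Made-up pairs for illustration only\n",
"demo_metrics = regression_metrics([(200000.0, 180000.0), (150000.0, 160000.0)])\n",
"print('MSE: %2.4f  MAE: %2.4f  RMSLE: %2.4f' % demo_metrics)"
]
},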
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Non log-transformed predictions:\n",
"[(208500.0, -1.3111060925180484e+75), (181500.0, -1.4720767452081686e+75), (223500.0, -1.7050281430818638e+75)]\n",
"Log-transformed predictions:\n",
"[(208500.00000000012, 0.0), (181499.99999999988, 0.0), (223500.0, 0.0)]\n"
]
}
],
"source": [
"print (\"Non log-transformed predictions:\\n\" + str(true_vs_predicted.take(3)))\n",
"\n",
"print (\"Log-transformed predictions:\\n\" + str(true_vs_predicted_log.take(3)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tuning model parameters"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"train, test = data.randomSplit([0.7, 0.3], seed=12345)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"train_size=train.count()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"test_size=test.count()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training data size: 1050\n"
]
}
],
"source": [
"print (\"Training data size: %d\" % train_size)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test data size: 410\n"
]
}
],
"source": [
"print (\"Test data size: %d\" % test_size)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train + Test size : 1460\n"
]
}
],
"source": [
"print (\"Train + Test size : %d\" % (train_size + test_size))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The impact of parameter settings for linear models"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"def evaluate(train, test, iterations, step, regParam, regType, intercept):\n",
"    # Train a linear model with SGD using the given hyperparameters\n",
"    model = LinearRegressionWithSGD.train(train, iterations, step, regParam=regParam, regType=regType, intercept=intercept)\n",
"\n",
"    # Pair each test label with the model's prediction\n",
"    tp = test.map(lambda p: (p.label, model.predict(p.features)))\n",
"\n",
"    squared_log_errors = []\n",
"    for actual, pred in tp.collect():\n",
"        # np.log returns nan (RuntimeWarning: invalid value) when pred + 1 < 0, and one nan makes the whole metric nan\n",
"        squared_log_errors.append((np.log(pred + 1) - np.log(actual + 1))**2)\n",
"    # Root Mean Squared Log Error over the test set\n",
"    rmsle = np.sqrt(sum(squared_log_errors) / len(squared_log_errors))\n",
"    return rmsle"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Iterations"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 5, 11, 15, 20, 50]\n",
"[16.401492085322918, 81.34883033703413, 176.05369822746945, 238.23038626017032, nan, nan]\n"
]
}
],
"source": [
"params = [1, 5, 11, 15, 20, 50]\n",
"\n",
"metrics = [evaluate(train, test, param, 0.1, 0.0, 'l2', False) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: metrics for varying number of iterations, log-scale x axis]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying number of iterations')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step size"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [],
"source": [
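"# NOTE: 0.1 appears twice and the values are not in ascending order; a sorted list without duplicates would give a cleaner sweep\n",
"params = [0.1, 0.020, 0.25, 0.1, 1.0]"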
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
}
],
"source": [
"metrics = [evaluate(train, test, 20, param, 0.0, 'l2', False) for param in params]"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.1, 0.02, 0.25, 0.1, 1.0]\n",
"[nan, nan, nan, nan, nan]\n"
]
}
],
"source": [
"print (params)\n",
"print (metrics)"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: metrics for varying values of step size, log-scale x axis; all metrics are nan, so no curve is drawn]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying values of step size')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# L2 regularization"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.0, 0.01, 0.1, 1.0, 5.0, 10.0, 20.0]\n",
"[nan, nan, nan, nan, nan, nan, nan]\n"
]
}
],
"source": [
"params = [0.0, 0.01, 0.1, 1.0, 5.0, 10.0, 20.0]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, param, 'l2', False) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: metrics for varying levels of L2 regularization, log-scale x axis; all metrics are nan, so no curve is drawn]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying levels of L2 regularization')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# L1 regularization"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]\n",
"[nan, nan, nan, nan, nan, nan, nan]\n"
]
}
],
"source": [
"params = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, param, 'l1', False) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
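{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `RuntimeWarning: invalid value encountered in log` above, together with the all-`nan` metrics, suggests that some predictions passed to the log-based error were below -1. Another possibility is that they were already `nan`, which can happen if SGD diverges. `log(x + 1)` is `-inf` at `x = -1` and `nan` below it, and a single `nan` makes the mean, and therefore the RMSLE, `nan`.\n",
"\n",
"The cell below is a small NumPy sketch for checking this; `rmsle_with_diagnostics` is a hypothetical helper, not part of the original notebook. It computes RMSLE only over predictions greater than -1 and reports how many were excluded. It could be applied to the collected (actual, predicted) pairs of the linear model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def rmsle_with_diagnostics(y_true, y_pred):\n",
"    # Hypothetical helper: RMSLE over pairs whose prediction is > -1, plus a count of excluded pairs.\n",
"    # np.log1p(p) == log(p + 1) is -inf at p == -1 and nan for p < -1 or p == nan.\n",
"    y_true = np.asarray(y_true, dtype=float)\n",
"    y_pred = np.asarray(y_pred, dtype=float)\n",
"    valid = y_pred > -1          # nan compares False, so nan predictions are excluded too\n",
"    n_invalid = int(np.sum(~valid))\n",
"    if not valid.any():\n",
"        return float('nan'), n_invalid\n",
"    diff = np.log1p(y_pred[valid]) - np.log1p(y_true[valid])\n",
"    return float(np.sqrt(np.mean(diff ** 2))), n_invalid\n",
"\n",
"# Toy usage with made-up values: the negative prediction is excluded and counted\n",
"score, n_bad = rmsle_with_diagnostics([200000.0, 150000.0], [190000.0, -5.0])\n",
"print(score, n_bad)"
]
},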
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"L1 (1.0) number of zero weights: 6\n",
"L1 (10.0) number of zero weights: 6\n",
"L1 (100.0) number of zero weights: 6\n"
]
}
],
"source": [
"# Positional arguments: 10 iterations, step size 0.1; only regParam changes between the three models\n",
"model_l1 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=1.0, regType='l1', intercept=False)\n",
"\n",
"model_l1_10 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=10.0, regType='l1', intercept=False)\n",
"\n",
"model_l1_100 = LinearRegressionWithSGD.train(train, 10, 0.1, regParam=100.0, regType='l1', intercept=False)\n",
"\n",
"# weights.array is the NumPy array behind the DenseVector; count entries that are exactly 0\n",
"print (\"L1 (1.0) number of zero weights: \" + str(sum(model_l1.weights.array == 0)))\n",
"\n",
"print (\"L1 (10.0) number of zero weights: \" + str(sum(model_l1_10.weights.array == 0)))\n",
"\n",
"print (\"L1 (100.0) number of zero weights: \" + str(sum(model_l1_100.weights.array == 0)))"
]
},
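{
"cell_type": "markdown",
"metadata": {},
"source": [
"All three L1 models above end up with the same number of exactly-zero weights. For background, the L1 updater that MLlib's SGD uses with `regType='l1'` does not just add a subgradient. After each gradient step it applies a soft-thresholding (proximal) step. That step shrinks every weight toward zero by `regParam` times the current step size, and any weight whose magnitude falls below that amount becomes exactly 0.\n",
"\n",
"The NumPy sketch below shows only that thresholding step, applied to made-up weights. It is an illustration of the technique, not Spark's own code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def soft_threshold(weights, shrinkage):\n",
"    # Proximal step for an L1 penalty: move each weight toward 0 by `shrinkage`\n",
"    # and clamp any weight with |w| <= shrinkage to exactly 0.\n",
"    w = np.asarray(weights, dtype=float)\n",
"    return np.sign(w) * np.maximum(0.0, np.abs(w) - shrinkage)\n",
"\n",
"weights = np.array([0.05, -0.3, 2.0, -0.01, 0.0])   # made-up weights\n",
"for shrinkage in [0.01, 0.1, 1.0]:\n",
"    shrunk = soft_threshold(weights, shrinkage)\n",
"    print(shrinkage, shrunk, 'zero weights:', int(np.sum(shrunk == 0)))"
]
},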
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intercept"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/akashsoni/spark/python/pyspark/mllib/regression.py:281: UserWarning: Deprecated in 2.0.0. Use ml.regression.LinearRegression.\n",
" warnings.warn(\"Deprecated in 2.0.0. Use ml.regression.LinearRegression.\")\n",
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in log\n",
" # This is added back by InteractiveShellApp.init_path()\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[False, True]\n",
"[nan, nan]\n"
]
}
],
"source": [
"params = [False, True]\n",
"\n",
"metrics = [evaluate(train, test, 10, 0.1, 1.0, 'l2', param) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)"
]
},
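{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `UserWarning` above notes that `LinearRegressionWithSGD` is deprecated in favour of `pyspark.ml.regression.LinearRegression`. The DataFrame-based API exposes the same settings under different names:\n",
"\n",
"- `maxIter` for the number of iterations\n",
"- `regParam` for the regularization strength\n",
"- `elasticNetParam` to choose L1 (1.0) or L2 (0.0)\n",
"- `fitIntercept` for the intercept\n",
"\n",
"The cell below is a sketch on a made-up toy DataFrame. By default this API standardizes features and uses a different solver than SGD, so its numbers will not match the cells above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.sql import SparkSession\n",
"from pyspark.ml.linalg import Vectors\n",
"from pyspark.ml.regression import LinearRegression\n",
"\n",
"spark = SparkSession.builder.getOrCreate()\n",
"\n",
"# Made-up toy data with a 'label' column and a 'features' vector column\n",
"df = spark.createDataFrame([\n",
"    (1.0, Vectors.dense([1.0, 0.0])),\n",
"    (2.0, Vectors.dense([2.0, 1.0])),\n",
"    (3.0, Vectors.dense([3.0, 0.5])),\n",
"    (4.0, Vectors.dense([4.0, 2.0])),\n",
"], ['label', 'features'])\n",
"\n",
"# An RDD of mllib LabeledPoints could be converted in a similar way, e.g.\n",
"# spark.createDataFrame(train.map(lambda lp: (lp.label, lp.features.asML())), ['label', 'features'])\n",
"\n",
"# elasticNetParam=1.0 -> pure L1 penalty; 0.0 -> pure L2 penalty\n",
"for fit_intercept in [False, True]:\n",
"    lr = LinearRegression(maxIter=10, regParam=0.1, elasticNetParam=1.0, fitIntercept=fit_intercept)\n",
"    model = lr.fit(df)\n",
"    print(fit_intercept, model.coefficients, model.intercept)"
]
},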
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEKCAYAAAAW8vJGAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAFX1JREFUeJzt3X+0ZWV93/H3xxkFfyQwwIDIOBkUEjtEi/UuXNrowh8gmCgspRGT1DHVktVoarR2BWsTAV2JmFisv5JM1EhtEjBaV6aSFBBFiU2VO4A/RkXGEcMIVSyUVbRi0W//2M+V89ycy71zz7lzufB+rXXW2fvZz977+5w7cz9n733PPqkqJEma86DVLkCSdN9iMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKmzfrULWI7DDjustmzZstplSNKasnPnzu9U1cbF+q3JYNiyZQuzs7OrXYYkrSlJvrGUfp5KkiR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUmcqwZDklCTXJ9md5Owxyw9IcnFb/pkkW+Yt35zkziSvnUY9kqTlmzgYkqwD3gWcCmwFXpxk67xuLwNur6pjgAuA8+ctvwD4m0lrkSRNbhpHDCcAu6tqT1X9ALgIOG1en9OAC9v0h4BnJQlAktOBPcCuKdQiSZrQNILhKOCmkfm9rW1sn6q6G7gDODTJw4HfAs6dQh2SpCmYRjBkTFstsc+5wAVVdeeiO0nOSjKbZPbWW29dRpmSpKVYP4Vt7AUePTK/Cbh5gT57k6wHDgJuA54MnJHkLcDBwI+SfL+q3jl/J1W1HdgOMDMzMz94JElTMo1guBo4NsnRwDeBM4FfmtdnB7AN+DvgDODjVVXA0+Y6JDkHuHNcKEiS9p+Jg6Gq7k7ySuBSYB3wvqraleQ8YLaqdgDvBT6QZDfDkcKZk+5XkrQyMrxxX1tmZmZqdnZ2tcuQpDUlyc6qmlmsn598liR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1phIMSU5Jcn2S3UnOHrP8gCQXt+WfSbKltZ+UZGeSL7TnZ06jHknS8k0cDEnWAe8CTgW2Ai9OsnVet5cBt1fVMcAFwPmt/TvA86rq8cA24AOT1iNJmsw0jhhOAHZX1Z6q+gFwEXDavD6nARe26Q8Bz0qSqrq2qm5u7buAA5McMIWaJEnLNI1gOAq4aWR+b2sb26eq7gbuAA6d1+eFwLVVddcUapIkLdP6KWwjY9pqX/okOY7h9NLJC+4kOQs4C2Dz5s37XqUkaUmmccSwF3j0yPwm4OaF+iRZDxwE3NbmNwEfAV5SVV9baCdVtb2qZqpqZuPGjVMoW5I0zjSC4Wrg2CRHJ3kIcCawY16fHQwXlwHOAD5eVZXkYOAS4HVV9ekp1CJJmtDEwdCuGbwSuBT4MvDBqtqV5Lwkz2/d3gscmmQ38Bpg7k9aXwkcA/x2kuva4/BJa5IkLV+q5l8OuO+bmZmp2dnZ1S5DktaUJDuramaxfn7yWZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSR2DQZLUMRgkSZ2pBEOSU5Jcn2R3kr
PHLD8gycVt+WeSbBlZ9rrWfn2S50yjHknS8k0cDEnWAe8CTgW2Ai9OsnVet5cBt1fVMcAFwPlt3a3AmcBxwCnAu9v2JEmrZBpHDCcAu6tqT1X9ALgIOG1en9OAC9v0h4BnJUlrv6iq7qqqrwO72/YkSatkGsFwFHDTyPze1ja2T1XdDdwBHLrEdSVJ+9E0giFj2mqJfZay7rCB5Kwks0lmb7311n0sUZK0VNMIhr3Ao0fmNwE3L9QnyXrgIOC2Ja4LQFVtr6qZqprZuHHjFMqWJI0zjWC4Gjg2ydFJHsJwMXnHvD47gG1t+gzg41VVrf3M9ldLRwPHAp+dQk2SpGVaP+kGquruJK8ELgXWAe+rql1JzgNmq2oH8F7gA0l2MxwpnNnW3ZXkg8CXgLuBV1TVDyetSZK0fBneuK8tMzMzNTs7u9plSNKakmRnVc0s1s9PPkuSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOgaDJKljMEiSOhMFQ5JDklye5Ib2vGGBfttanxuSbGttD0tySZKvJNmV5M2T1CJJmo5JjxjOBq6oqmOBK9p8J8khwBuAJwMnAG8YCZA/qKrHAU8E/mmSUyesR5I0oUmD4TTgwjZ9IXD6mD7PAS6vqtuq6nbgcuCUqvpeVX0CoKp+AFwDbJqwHknShCYNhiOq6haA9nz4mD5HATeNzO9tbT+W5GDgeQxHHZKkVbR+sQ5JPgY8csyi1y9xHxnTViPbXw/8BfD2qtpzL3WcBZwFsHnz5iXuWpK0rxYNhqp69kLLknwryZFVdUuSI4Fvj+m2FzhxZH4TcOXI/Hbghqp62yJ1bG99mZmZqXvrK0lavklPJe0AtrXpbcBfjelzKXBykg3tovPJrY0kbwIOAn5zwjokSVMyaTC8GTgpyQ3ASW2eJDNJ3gNQVbcBbwSubo/zquq2JJsYTkdtBa5Jcl2Sl09YjyRpQqlae2dlZmZmanZ2drXLkKQ1JcnOqppZrJ+ffJYkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVLHYJAkdQwGSVJnomBIckiSy5Pc0J43LNBvW+tzQ5JtY5bvSPLFSWqRJE3HpEcMZwNXVNWxwBVtvpPkEOANwJOBE4A3jAZIkhcAd05YhyRpSiYNhtOAC9v0hcDpY/o8B7i8qm6rqtuBy4FTAJI8AngN8KYJ65AkTcmkwXBEVd0C0J4PH9PnKOCmkfm9rQ3gjcBbge9NWIckaUrWL9YhyceAR45Z9Pol7iNj2irJ8cAxVfXqJFuWUMdZwFkAmzdvXuKuJUn7atFgqKpnL7QsybeSHFlVtyQ5Evj2mG57gRNH5jcBVwJPAZ6U5MZWx+FJrqyqExmjqrYD2wFmZmZqsbolScsz6amkHcDcXxltA/5qTJ9LgZOTbGgXnU8GLq2qP6yqR1XVFuDngK8uFAqSpP1n0mB4M3BSkhuAk9o8SWaSvAegqm5juJZwdXuc19okSfdBqVp7Z2VmZmZqdnZ2tcuQpDUlyc6qmlmsn598liR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUsdgkCR1UlWrXcM+S3Ir8I3VrmMfHQZ8Z7WL2M8c8wODY147fqqqNi7WaU0Gw1qUZLaqZla7jv
3JMT8wOOb7H08lSZI6BoMkqWMw7D/bV7uAVeCYHxgc8/2M1xgkSR2PGCRJHYNhipIckuTyJDe05w0L9NvW+tyQZNuY5TuSfHHlK57cJGNO8rAklyT5SpJdSd68f6vfN0lOSXJ9kt1Jzh6z/IAkF7fln0myZWTZ61r79Umesz/rnsRyx5zkpCQ7k3yhPT9zf9e+HJP8jNvyzUnuTPLa/VXziqgqH1N6AG8Bzm7TZwPnj+lzCLCnPW9o0xtGlr8A+HPgi6s9npUeM/Aw4Bmtz0OAq4BTV3tMC4xzHfA14DGt1s8BW+f1+XXgj9r0mcDFbXpr638AcHTbzrrVHtMKj/mJwKPa9M8C31zt8azkeEeWfxj4S+C1qz2eSR4eMUzXacCFbfpC4PQxfZ4DXF5Vt1XV7cDlwCkASR4BvAZ4036odVqWPeaq+l5VfQKgqn4AXANs2g81L8cJwO6q2tNqvYhh7KNGX4sPAc9KktZ+UVXdVVVfB3a37d3XLXvMVXVtVd3c2ncBByY5YL9UvXyT/IxJcjrDm55d+6neFWMwTNcRVXULQHs+fEyfo4CbRub3tjaANwJvBb63kkVO2aRjBiDJwcDzgCtWqM5JLTqG0T5VdTdwB3DoEte9L5pkzKNeCFxbVXetUJ3TsuzxJnk48FvAufuhzhW3frULWGuSfAx45JhFr1/qJsa0VZLjgWOq6tXzz1uutpUa88j21wN/Aby9qvbse4X7xb2OYZE+S1n3vmiSMQ8Lk+OA84GTp1jXSplkvOcCF1TVne0AYk0zGPZRVT17oWVJvpXkyKq6JcmRwLfHdNsLnDgyvwm4EngK8KQkNzL8XA5PcmVVncgqW8Exz9kO3FBVb5tCuStlL/DokflNwM0L9Nnbwu4g4LYlrntfNMmYSbIJ+Ajwkqr62sqXO7FJxvtk4IwkbwEOBn6U5PtV9c6VL3sFrPZFjvvTA/h9+guxbxnT5xDg6wwXXze06UPm9dnC2rn4PNGYGa6nfBh40GqPZZFxrmc4f3w091yYPG5en1fQX5j8YJs+jv7i8x7WxsXnScZ8cOv/wtUex/4Y77w+57DGLz6vegH3pwfDudUrgBva89wvvxngPSP9/gXDBcjdwK+O2c5aCoZlj5nhHVkBXwaua4+Xr/aY7mWszwW+yvCXK69vbecBz2/TBzL8Rcpu4LPAY0bWfX1b73ruo395Nc0xA/8e+O7Iz/U64PDVHs9K/oxHtrHmg8FPPkuSOv5VkiSpYzBIkjoGgySpYzBIkjoGgySpYzDcjyWpJB8YmV+f5NYkH11kveOTPPdels8kefs0ax2zj+fP3d0yyelJto4suzLJVL5vN8m/m8Z2Ftj2jUkOW8Z675kb72h9Sbbsj7vuJvnrdouSe+vz0iSPWula7qv7v78zGO7fvgv8bJKHtvmTgG8uYb3jGf6e+x9Isr6qZqvqX0+pxrGqakdVzd2G+3SGO5SuhBULhuWqqpdX1Zfa7H6vr6qeW1X/e5FuLwX26Rdz+6TwtOzz/rV0BsP9398AP9+mX8xwTyIAkjw8yfuSXJ3k2iSnJXkIwwd6XpTkuiQvSnJOku1JLgP+U5IT5446kjwiyZ+2++5/PskLk6xL8v4kX2ztrx4tqC3fk8HBSX6U5Olt2VVJjmnvCN+Z5KnA84Hfb/U8tm3mnyX5bJKvJnlaW/fAkVquTfKM1v7SJO8c2f9H2xjeDDy0bffP5r9wSf4wyWyG74o4d6T9xiTnJrmm7etxrf3QJJe1ff8xY+6rk+QXk/yHNv2qJHva9GOT/G2bvrIdlY2rb12SP2k1XTYS+qP7eF6G7wq4NsnHkhzR2s9pP+8r2+s/NtznjnTaEcqX5+8vyRkMH2D8s1bbQ5M8KcknM3z3wqUZbo8yN5bfTfJJ4FVJjkjykSSfa4+ntn6/0n6e1yX54yTrWvudSd7aXusrkmwct/9x49AEVvsTdj5W7gHcCTyB4fbABzJ8+vRE4KNt+e8Cv9KmD2b4xO
fDGd6NvXNkO+cAO4GHtvnRbZwPvG2k7wbgSQy32Z5rO3hMbf+N4VYRvwBczfDJ4AOAr7flP64BeD9wxsi6VwJvbdPPBT7Wpv8N8Kdt+nHA37dxzx/PR4ET516je3n95j7Fva7t8wlt/kbgN9r0r9M+4Q28HfidNv3zDJ/qPmzeNh8JXN2mP9TGfhSwDfi9kfHNzK+P4RPxdwPHt/kPzv385u1jA/d8be/LR16rc4D/3l7nw4D/BTx4zPo3tuUL7m9ejQ9u293Y5l8EvG+k37tHtn0x8Jsjr+tBwD8C/utcLcC7Ge6vRHsNf7lN/87Iv4kf79/H9B/eRO9+rqo+n+FurS8G/nre4pOB5+eeb5s6ENi8wKZ2VNX/HdP+bIZ7xszt7/b2LvgxSd4BXAJcNma9q4CnM9yX5veAfwl8kuEX5VL8l/a8k+EXGMDPAe9odXwlyTeAn17i9sb5xSRnMdxD50iG01mfH7P/F7Tpp89NV9UlSW6fv8Gq+p/tKOsnGG7G9udtvaeNbPPefL2qrhvZ95YxfTYBF7d37Q9huDfVnEtquP31XUm+DRzBcGO4Sfb3MwxfxnN5hjuLrgNuGVl+8cj0M4GXAFTVD4E7kvxzhjcTV7f1H8o9N2P80cj6/5mlvUaakKeSHhh2AH/AyGmkJgw3OTu+PTZX1ZcX2MZ3F2gP825NXMOX8fxjhnd1rwDeM2a9qxh+GZ7AEFgHMxyJfGqxwTRz9/b/IffcJXih+x3fTf9v/cDFNp7kaOC1wLOq6gkMATe63rj9w9Jup/13wK8y3Ddp7nV4CvDpJaw7+p0G8/c95x0M76wfD/zaAnXf2/r7ur8Au0b+HT2+qkZvs73Qv53R9S8cWf9nquqcBfp6D5/9wGB4YHgfcF5VfWFe+6XAbyQ//gaqJ7b2/wP8xBK3fRnwyrmZJBsy/CXOg6rqw8BvA/9kzHqfAZ4K/Kiqvs9wmuvXGH5RzrfUej4F/HKr46cZjn6uZzg1cnySByV5NP23p/2/JA8es62fZPiFdkc7R3/qPu7/VIZTOgv1e217vhZ4BnBXVd0xpu9C9d2bg7jnjwy27eO6SzX6M7ke2JjkKQBJHpzhexjGuQL4V63fuiQ/2drOSHJ4az8kyU+1/g8CzmjTvwT87Zj9a8oMhgeAqtpbVf9xzKI3Mpwf/nyGP4N8Y2v/BLC1Xdh70SKbfxOwIcOF5s8x/JI7CrgyyXUM1wdeN6amuxi+Cet/tKarGP6jzw8vGL5i8d+2i6mPHbN8zrsZLs5+geH0w0vbfj7NcDrlCwxHTteMrLO9jb+7+FxVn2P4pb2LIViX8m7+XODpSa5hOE339wv0u4rhNNKn2umUm7jnF958Y+tbxDnAXya5CvjOPqy3L94P/FH7Ga9j+OV9fvs3cB1D6I/zKuAZ7We0k+G21l9iuBvrZUk+z/DVr0e2/t8Fjkuyk+E01Hnz9+/F5+nz7qqS7rOS3FlVj1jtOh5oPGKQJHU8YpAkdTxikCR1DAZJUsdgkCR1DAZJUsdgkCR1DAZJUuf/A6flJmuHzJr7AAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"bar(params, metrics, color='lightblue')\n",
"pyplot.xlabel('Metrics without and with an intercept')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Decision Tree"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {},
"outputs": [],
"source": [
"def extract_features_dt(fields):\n",
"    # Decision-tree feature vector: each categorical column is encoded as its integer index\n",
"    # from type_maps (no one-hot expansion), followed by the raw numeric columns\n",
"    features = np.zeros(total_dt)\n",
"    step = 0\n",
"    for i in type_columns:\n",
"        features[step] = float(type_maps[i][fields[i]])\n",
"        step = step + 1\n",
"    for i in type_columns_with_NA:\n",
"        features[step] = float(type_maps[i][fields[i]])\n",
"        step = step + 1\n",
"    for i in number_columns:\n",
"        features[step] = float(fields[i])\n",
"        step = step + 1\n",
"    return features"
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {},
"outputs": [],
"source": [
"data_dt=records.map(lambda fields: LabeledPoint(float(fields[saleprice_column]),extract_features_dt(fields)))"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[LabeledPoint(208500.0, [0.0,0.0,0.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,3.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,3.0,2.0,0.0,3.0,5.0,0.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,60.0,8450.0,7.0,5.0,2003.0,2003.0,706.0,0.0,150.0,856.0,856.0,854.0,0.0,1710.0,1.0,0.0,2.0,1.0,3.0,1.0,8.0,0.0,2.0,548.0,0.0,61.0,0.0,0.0,0.0,0.0,0.0,2.0,2008.0]), LabeledPoint(181500.0, [0.0,0.0,0.0,2.0,1.0,0.0,0.0,10.0,1.0,0.0,0.0,6.0,2.0,3.0,6.0,6.0,2.0,2.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,3.0,3.0,3.0,3.0,5.0,3.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,20.0,9600.0,6.0,8.0,1976.0,1976.0,978.0,0.0,284.0,1262.0,1262.0,0.0,0.0,1262.0,0.0,1.0,2.0,0.0,3.0,1.0,6.0,1.0,2.0,460.0,298.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2007.0]), LabeledPoint(223500.0, [0.0,0.0,1.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,3.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,3.0,0.0,0.0,3.0,5.0,3.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,60.0,11250.0,7.0,5.0,2001.0,2002.0,486.0,0.0,434.0,920.0,920.0,866.0,0.0,1786.0,1.0,0.0,2.0,1.0,3.0,1.0,6.0,1.0,2.0,608.0,0.0,42.0,0.0,0.0,0.0,0.0,0.0,9.0,2008.0]), LabeledPoint(140000.0, [0.0,0.0,1.0,2.0,1.0,3.0,0.0,11.0,0.0,0.0,0.0,5.0,2.0,3.0,7.0,1.0,2.0,2.0,2.0,1.0,3.0,1.0,1.0,0.0,1.0,0.0,3.0,0.0,0.0,3.0,4.0,2.0,3.0,3.0,5.0,4.0,6.0,3.0,3.0,3.0,0.0,0.0,0.0,70.0,9550.0,7.0,5.0,1915.0,1970.0,216.0,0.0,540.0,756.0,961.0,756.0,0.0,1717.0,1.0,0.0,1.0,0.0,3.0,1.0,7.0,1.0,3.0,642.0,0.0,35.0,272.0,0.0,0.0,0.0,0.0,2.0,2006.0]), LabeledPoint(250000.0, [0.0,0.0,1.0,2.0,1.0,0.0,0.0,12.0,0.0,0.0,0.0,5.0,2.0,3.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,3.0,4.0,0.0,3.0,5.0,3.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,60.0,14260.0,8.0,5.0,2000.0,2000.0,655.0,0.0,490.0,1145.0,1145.0,1053.0,0.0,2198.0,1.0,0.0,2.0,1.0,4.0,1.0,9.0,1.0,3.0,836.0,192.0,84.0,0.0,0.0,0.0,0.0,0.0,12.0,2008.0]), LabeledPoint(143000.0, 
[0.0,0.0,1.0,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,3.0,0.0,0.0,2.0,2.0,3.0,1.0,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,3.0,2.0,0.0,3.0,5.0,0.0,5.0,3.0,3.0,3.0,0.0,1.0,1.0,50.0,14115.0,5.0,5.0,1993.0,1995.0,732.0,0.0,64.0,796.0,796.0,566.0,0.0,1362.0,1.0,0.0,1.0,1.0,1.0,1.0,5.0,0.0,2.0,480.0,40.0,30.0,0.0,320.0,0.0,0.0,700.0,10.0,2009.0]), LabeledPoint(307000.0, [0.0,0.0,0.0,2.0,1.0,2.0,0.0,13.0,0.0,0.0,0.0,6.0,2.0,3.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,3.0,4.0,3.0,4.0,0.0,3.0,5.0,4.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,20.0,10084.0,8.0,5.0,2004.0,2005.0,1369.0,0.0,317.0,1686.0,1694.0,0.0,0.0,1694.0,1.0,0.0,2.0,0.0,3.0,1.0,7.0,1.0,2.0,636.0,255.0,57.0,0.0,0.0,0.0,0.0,0.0,8.0,2007.0]), LabeledPoint(200000.0, [0.0,0.0,1.0,2.0,1.0,3.0,0.0,2.0,2.0,0.0,0.0,5.0,2.0,3.0,8.0,7.0,2.0,2.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,3.0,2.0,3.0,0.0,3.0,4.0,5.0,3.0,5.0,2.0,3.0,3.0,0.0,0.0,1.0,60.0,10382.0,7.0,6.0,1973.0,1973.0,859.0,32.0,216.0,1107.0,1107.0,983.0,0.0,2090.0,1.0,0.0,2.0,1.0,3.0,1.0,7.0,2.0,2.0,484.0,235.0,204.0,228.0,0.0,0.0,0.0,350.0,11.0,2009.0]), LabeledPoint(129900.0, [2.0,0.0,0.0,2.0,1.0,2.0,0.0,14.0,3.0,0.0,0.0,0.0,2.0,3.0,9.0,1.0,2.0,2.0,2.0,1.0,3.0,1.0,2.0,3.0,1.0,0.0,3.0,0.0,0.0,3.0,3.0,2.0,4.0,3.0,0.0,3.0,6.0,3.0,0.0,3.0,0.0,0.0,0.0,50.0,6120.0,7.0,5.0,1931.0,1950.0,0.0,0.0,952.0,952.0,1022.0,752.0,0.0,1774.0,0.0,0.0,2.0,0.0,2.0,2.0,8.0,2.0,2.0,468.0,90.0,0.0,205.0,0.0,0.0,0.0,0.0,4.0,2008.0]), LabeledPoint(118000.0, [0.0,0.0,0.0,2.0,1.0,3.0,0.0,15.0,3.0,1.0,3.0,1.0,2.0,3.0,6.0,6.0,2.0,2.0,2.0,1.0,2.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,0.0,3.0,5.0,3.0,5.0,2.0,4.0,3.0,0.0,0.0,0.0,190.0,7420.0,5.0,6.0,1939.0,1950.0,851.0,0.0,140.0,991.0,1077.0,0.0,0.0,1077.0,1.0,0.0,1.0,0.0,2.0,2.0,5.0,2.0,1.0,205.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,1.0,2008.0])]\n"
]
}
],
"source": [
"print(data_dt.take(10))"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Decision Tree feature vector: [0.0,0.0,0.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,3.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,3.0,2.0,0.0,3.0,5.0,0.0,5.0,2.0,3.0,3.0,0.0,0.0,0.0,60.0,8450.0,7.0,5.0,2003.0,2003.0,706.0,0.0,150.0,856.0,856.0,854.0,0.0,1710.0,1.0,0.0,2.0,1.0,3.0,1.0,8.0,0.0,2.0,548.0,0.0,61.0,0.0,0.0,0.0,0.0,0.0,2.0,2008.0]\n",
"Decision Tree feature vector length: 76\n"
]
}
],
"source": [
"first_point_dt = data_dt.first()\n",
"print (\"Decision Tree feature vector: \" + str(first_point_dt.features))\n",
"print (\"Decision Tree feature vector length: \" + str(len(first_point_dt.features)))"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.tree import DecisionTree"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Decision Tree predictions: [(208500.0, 190334.33561643836), (181500.0, 147907.61375661375), (223500.0, 190334.33561643836), (140000.0, 156058.38888888888), (250000.0, 307760.1111111111)]\n",
"Decision Tree depth: 5\n",
"Decision Tree number of nodes: 63\n"
]
}
],
"source": [
"dt_model = DecisionTree.trainRegressor(data_dt,{})\n",
"preds = dt_model.predict(data_dt.map(lambda p: p.features))\n",
"actual = data_dt.map(lambda p: p.label)\n",
"true_vs_predicted_dt = actual.zip(preds)\n",
"print (\"Decision Tree predictions: \" + str(true_vs_predicted_dt.take(5)))\n",
"print (\"Decision Tree depth: \" + str(dt_model.depth()))\n",
"print (\"Decision Tree number of nodes: \" + str(dt_model.numNodes()))"
]
},
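{
"cell_type": "markdown",
"metadata": {},
"source": [
"`DecisionTree.trainRegressor(data_dt, {})` passes an empty `categoricalFeaturesInfo`. As a result, every feature, including the index-encoded categorical columns, is treated as continuous and split on thresholds.\n",
"\n",
"The second argument can instead map a feature index to its number of categories `k`. In that case the feature must take the values `0, 1, ..., k-1`, and `maxBins` must be at least the largest `k`.\n",
"\n",
"The cell below is a self-contained toy sketch of that call on made-up data. For this notebook, the dictionary could probably be built from the lengths of the `type_maps` entries. That assumes those maps assign contiguous indices starting at 0, which has not been checked here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pyspark import SparkContext\n",
"from pyspark.mllib.regression import LabeledPoint\n",
"from pyspark.mllib.tree import DecisionTree\n",
"\n",
"sc = SparkContext.getOrCreate()\n",
"\n",
"# Made-up data: feature 0 is categorical with 3 levels (0, 1, 2); feature 1 is continuous\n",
"toy = sc.parallelize([\n",
"    LabeledPoint(100.0, [0.0, 1.5]),\n",
"    LabeledPoint(150.0, [1.0, 2.0]),\n",
"    LabeledPoint(300.0, [2.0, 2.5]),\n",
"    LabeledPoint(120.0, [0.0, 3.0]),\n",
"])\n",
"\n",
"categorical_info = {0: 3}   # {feature index: number of categories}\n",
"\n",
"toy_model = DecisionTree.trainRegressor(toy, categoricalFeaturesInfo=categorical_info,\n",
"                                        impurity='variance', maxDepth=2, maxBins=32)\n",
"print(toy_model.toDebugString())"
]
},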
{
"cell_type": "code",
"execution_count": 132,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1460\n",
"Mean Squared Error: 875573280.8278\n",
"Mean Absolute Error: 21582.1548\n",
"Root Mean Squared Log Error: 0.1736\n"
]
}
],
"source": [
"# Squared, absolute and squared-log errors for each (actual, predicted) pair.\n",
"# Note: the tree was trained on data_dt itself, so these are training-set errors.\n",
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_dt.collect():\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"Mean Squared Error: %2.4f\" % t)\n",
"print(\"Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)"
]
},
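{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above can also be written with vectorized NumPy operations. The cell below is a minimal sketch that assumes the (actual, predicted) pairs fit in driver memory, as they already do when collected above. `regression_metrics` is a hypothetical helper name. `np.log1p(x)` computes `log(x + 1)`, so the RMSLE term matches the loop."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def regression_metrics(pairs):\n",
"    # pairs: iterable of (actual, predicted) tuples; returns (MSE, MAE, RMSLE)\n",
"    arr = np.asarray(list(pairs), dtype=float)\n",
"    actual, predicted = arr[:, 0], arr[:, 1]\n",
"    mse = np.mean((predicted - actual) ** 2)\n",
"    mae = np.mean(np.abs(predicted - actual))\n",
"    rmsle = np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))\n",
"    return mse, mae, rmsle\n",
"\n",
"# Toy usage with made-up pairs; in this notebook it could be called on true_vs_predicted_dt.collect()\n",
"mse, mae, rmsle = regression_metrics([(100.0, 110.0), (200.0, 180.0)])\n",
"print('MSE: %.4f  MAE: %.4f  RMSLE: %.4f' % (mse, mae, rmsle))"
]
},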
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Impact of training on log-transformed targets"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {},
"outputs": [],
"source": [
"data_dt_log = data_dt.map(lambda lp: LabeledPoint(np.log(lp.label), lp.features))\n",
"\n",
"dt_model_log = DecisionTree.trainRegressor(data_dt_log,{})\n",
"\n",
"preds_log = dt_model_log.predict(data_dt_log.map(lambda p: p.features))\n",
"\n",
"actual_log = data_dt_log.map(lambda p: p.label)"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {},
"outputs": [],
"source": [
"new=actual_log.zip(preds_log)"
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(12.247694320220994, 12.147159998151047),\n",
" (12.109010932687042, 11.890912291269839),\n",
" (12.31716669303576, 12.147159998151047),\n",
" (11.84939770159144, 11.949554245993713),\n",
" (12.429216196844383, 12.515673640608348)]"
]
},
"execution_count": 136,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new.take(5)"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {},
"outputs": [],
"source": [
"true_vs_predicted_dt_log=[]\n",
"for val in new.collect():\n",
" t,p=val[0],val[1]\n",
" x=np.exp(t),np.exp(p)\n",
" true_vs_predicted_dt_log.append(x)"
]
},
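{
"cell_type": "markdown",
"metadata": {},
"source": [
"A regression tree predicts the mean label of the training rows in each leaf. When the tree is trained on `log(price)`, applying `np.exp` to that leaf value gives the geometric mean of the prices in the leaf. For positive prices, the geometric mean is never larger than the arithmetic mean.\n",
"\n",
"This is one plausible reason why the back-transformed model can score better on RMSLE but worse on MSE than the tree trained on raw prices. The cell below is a small NumPy illustration with made-up prices for a single leaf."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"leaf_prices = np.array([120000.0, 150000.0, 400000.0])   # made-up prices in one leaf\n",
"\n",
"raw_leaf_prediction = leaf_prices.mean()                  # tree trained on raw prices\n",
"log_leaf_prediction = np.exp(np.log(leaf_prices).mean())  # exp of tree trained on log prices\n",
"\n",
"print('arithmetic mean:', raw_leaf_prediction)\n",
"print('geometric mean :', log_leaf_prediction)"
]
},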
{
"cell_type": "code",
"execution_count": 138,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1460\n",
"log - Mean Squared Error: 1022580494.4448\n",
"log - Mean Absolute Error: 21569.5794\n",
"Root Mean Squared Log Error: 0.1610\n",
"Non log-transformed predictions:\n",
"[(208500.0, 190334.33561643836), (181500.0, 147907.61375661375), (223500.0, 190334.33561643836)]\n"
]
}
],
"source": [
"# Same metrics, computed on the exp-back-transformed predictions of the log-target tree\n",
"nn=[]\n",
"ab=[]\n",
"s_log=[]\n",
"for i in true_vs_predicted_dt_log:\n",
"    real,predict=i[0],i[1]\n",
"    value=(predict - real)**2\n",
"    value1=np.abs(predict - real)\n",
"    value2=(np.log(predict + 1) - np.log(real + 1))**2\n",
"    nn.append(value)\n",
"    ab.append(value1)\n",
"    s_log.append(value2)\n",
"value_len=len(nn)\n",
"print(value_len)\n",
"ss=sum(nn)\n",
"t=ss/value_len\n",
"ab_sum=sum(ab)\n",
"ab_mean=ab_sum/value_len\n",
"s_log_sum=sum(s_log)\n",
"s_log_mean=np.sqrt(s_log_sum/value_len)\n",
"print(\"log - Mean Squared Error: %2.4f\" % t)\n",
"print(\"log - Mean Absolute Error: %2.4f\" % ab_mean)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % s_log_mean)\n",
"print(\"Non log-transformed predictions:\\n\" + str(true_vs_predicted_dt.take(3)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cross-validation for the decision tree (80/20 train/test split)"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {},
"outputs": [],
"source": [
"train_dt, test_dt = data_dt.randomSplit([0.8, 0.2], seed=12345)"
]
},
{
"cell_type": "code",
"execution_count": 140,
"metadata": {},
"outputs": [],
"source": [
"def evaluate_dt(train, test, maxDepth, maxBins):\n",
"    # Train a regression tree on `train` (all features treated as continuous) and return its RMSLE on `test`\n",
"    model = DecisionTree.trainRegressor(train, {}, impurity='variance', maxDepth=maxDepth, maxBins=maxBins)\n",
"\n",
"    preds = model.predict(test.map(lambda p: p.features))\n",
"    actual = test.map(lambda p: p.label)\n",
"    true_vs_pred = actual.zip(preds)\n",
"\n",
"    squared_log_errors = [(np.log(pred + 1) - np.log(real + 1))**2\n",
"                          for real, pred in true_vs_pred.collect()]\n",
"    rmsle = np.sqrt(sum(squared_log_errors) / len(squared_log_errors))\n",
"    return rmsle"
]
},
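{
"cell_type": "markdown",
"metadata": {},
"source": [
"`evaluate_dt` scores a single 80/20 split. A k-fold estimate can be built from the same function: split the RDD into `k` random parts with `randomSplit`, then hold out each part in turn.\n",
"\n",
"The cell below is a sketch; `kfold_rdd_scores` is a hypothetical helper name, and it assumes `k >= 2`. Caching the input RDD first is advisable so that every fold is drawn from the same materialized data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def kfold_rdd_scores(rdd, k, evaluate_fn, seed=42):\n",
"    # Split `rdd` into k roughly equal random folds (k >= 2) and return one score per held-out fold.\n",
"    # evaluate_fn(train_rdd, test_rdd) must return a number.\n",
"    folds = rdd.randomSplit([1.0] * k, seed=seed)\n",
"    scores = []\n",
"    for i, test_fold in enumerate(folds):\n",
"        train_folds = [f for j, f in enumerate(folds) if j != i]\n",
"        train_rdd = train_folds[0]\n",
"        for f in train_folds[1:]:\n",
"            train_rdd = train_rdd.union(f)\n",
"        scores.append(evaluate_fn(train_rdd, test_fold))\n",
"    return scores\n",
"\n",
"# Possible usage in this notebook (maxDepth=5 and maxBins=32 chosen only for illustration):\n",
"# data_dt.cache()\n",
"# scores = kfold_rdd_scores(data_dt, 5, lambda tr, te: evaluate_dt(tr, te, 5, 32))\n",
"# print(scores, sum(scores) / len(scores))"
]
},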
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tree depth"
]
},
{
"cell_type": "code",
"execution_count": 141,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 2, 3, 4, 5, 10, 20]\n",
"[0.332943090421251, 0.2770328548990305, 0.25563569006835973, 0.24091676957589137, 0.212163652773227, 0.22209830650755852, 0.23462193469250922]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAEKCAYAAAD+XoUoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xl8HPV9//HXR5clWbctS7IkfIBPsIAgDIEACQXCYSBHk0BDkzRpIW1o0+aX/EJ+pPmlpLkgTX9pSxtImrNJCEeTGmMCCSGBJJhggi1jjE8OS76EbfmSbFnS5/fHjMRqvbLWtqRZ7byfj4ce2p35zu5nR6v3zn5n5jvm7oiISDzkRF2AiIiMHYW+iEiMKPRFRGJEoS8iEiMKfRGRGFHoi4jEiEJfRCRGFPoiIjGi0BcRiZG8qAtINnnyZJ8+fXrUZYiIjCvPPvvsa+5ePVy7jAv96dOns3z58qjLEBEZV8zslXTaqXtHRCRGFPoiIjGi0BcRiRGFvohIjCj0RURiRKEvIhIjCn0RkRjJmtDf03mYr/1iPS2tHVGXIiKSsTLu5KzjlZMD//yLdeTlGk0NFVGXIyKSkbJmS7+0MJ/6iiLWbtsXdSkiIhkra0IfYG5tqUJfROQosir0Z9eWsrF9P909fVGXIiKSkbIq9OfWltLT52x6bX/UpYiIZKSsCv05taUA6uIRERlCVoX+zMkl5OWYQl9EZAhZFfoFeTmcXF2i0BcRGUJWhT4EO3NfVOiLiKSUdaE/t7aUto4u9h08HHUpIiIZJ+tCf05NsDN33XZt7YuIJMu+0B84gkeHbYqIJMu60G+oLKJkQh5rt+2NuhQRkYyTdaFvZsyuKdHOXBGRFLIu9CHo4lm7fR/uHnUpIiIZJTtDv6aUjs7D7Nh3KOpSREQySnaGfm0ZoOEYRESSZWXoz9UYPCIiKaUV+mZ2uZmtNbMNZnZLivkfNrNVZrbCzH5jZvPD6Zea2bPhvGfN7OKRfgGpVE4sYErpBO3MFRFJMmzom1kucCdwBTAfuL4/1BP80N0XuPsZwO3AV8PprwFXu/sC4P3A90es8mEEO3N12KaISKJ0tvQXAhvcfZO7dwP3ANcmNnD3xHSdCHg4/Tl33xJOXw0UmtmEEy97eHNqSlm/fT+9fTqCR0SkXzqhXw9sTrjfGk4bxMw+YmYbCbb0/ybF47wTeM7djzikxsxuNLPlZra8vb09vcqHMae2lEM9fby888CIPJ6ISDZIJ/QtxbQjNp/d/U53Pxn4JPDpQQ9gdirwZeCmVE/g7ne7e7O7N1dXV6dR0vDmhkfwrFO/vojIgHRCvxVoTLjfAGwZoi0E3T9v679jZg3AT4D3ufvG4ynyeMyqKSHH0M5cEZEE6YT+M8AsM5thZgXAdcDixAZmNivh7lXA+nB6BfAQ8Cl3/+3IlJyewvxcpk+aqMM2RUQSDBv67t4D3Aw8AqwB7nX31WZ2m5ldEza72cxWm9kK4GMER+oQLncK8Pfh4ZwrzGzKyL+M1GbXBMMxiIhIIC+dRu6+FFiaNO0zCbc/OsRy/wj844kUeCLm1JbyyAvb6OrupaggN6oyREQyRlaekdtvbm0p7rBhh8bWFxGBLA/9/guqvKix9UVEgCwP/WmTJjIhL0c7c0VEQlkd+rk5xqyaEu3MFREJZXXoA8ypKdOx+iIioawP/bm1pbTvO8SuA91RlyIiErmsD/05GltfRGRA1of+6xdU0RE8IiJZH/rVpROoKM7XzlwREWIQ+mbGnJpS7cwVESEGoQ9BF8+6bfvo0wVVRCTmYhH6c2rLONDdS1tHV9SliIhEKiahryN4REQgJqE/u6YEQDtzRST2YhH6pYX51FcUaWeuiMReLEIfgp25OlZfROIuNqE/p7aUTe0H6O7pi7oUEZHIxCr0e/qcTa/pgioiEl+xCf25tWWAjuARkXiLTejPmDyRvBzTzlwRibXYhH
5BXg4nV5doS19EYi02oQ9Bv75CX0TiLHah39bRxb6Dh6MuRUQkEmmFvpldbmZrzWyDmd2SYv6HzWyVma0ws9+Y2fyEeZ8Kl1trZm8dyeKPVf/Y+ut0Zq6IxNSwoW9mucCdwBXAfOD6xFAP/dDdF7j7GcDtwFfDZecD1wGnApcD/x4+XiT6x+DRzlwRiat0tvQXAhvcfZO7dwP3ANcmNnD3xFNdJwL9YxhfC9zj7ofc/SVgQ/h4kaivKKJkQp769UUktvLSaFMPbE643wqck9zIzD4CfAwoAC5OWHZZ0rL1x1XpCDAzZteUaEtfRGIrnS19SzHtiKuRuPud7n4y8Eng08eyrJndaGbLzWx5e3t7GiUdvzm1Zazdtg93XVBFROInndBvBRoT7jcAW47S/h7gbceyrLvf7e7N7t5cXV2dRknHb25tKXu6DrNj36FRfR4RkUyUTug/A8wysxlmVkCwY3ZxYgMzm5Vw9ypgfXh7MXCdmU0wsxnALOD3J1728dPOXBGJs2H79N29x8xuBh4BcoFvuftqM7sNWO7ui4GbzewS4DCwG3h/uOxqM7sXeAHoAT7i7r2j9FrSMqem/ypae7lo9uh+qxARyTTp7MjF3ZcCS5OmfSbh9kePsuzngc8fb4EjrXJiAVNKJ2hLX0RiKVZn5PbTcAwiElexDP25taWs37Gf3j4dwSMi8RLL0J9TW0Z3Tx8v7zwQdSkiImMqnqE/sDNXXTwiEi+xDP1ZNSXkmA7bFJH4iWXoF+bnMn3SRNZu2zt8YxGRLBLL0AcdwSMi8RTr0H9lVydd3ZGeKyYiMqZiG/pza0txh/U7tLUvIvER29CfXaMxeEQkfmIb+tMmTaQwP0f9+iISK7EN/dwcY9YU7cwVkXiJbehDeASPLpIuIjES69CfW1tK+75D7DrQHXUpIiJjItah//rOXJ2kJSLxEOvQn1urMXhEJF5iHfrVpROoLM5X6ItIbMQ69M2MObWlOlZfRGIj1qEPMLe2jPXb99GnC6qISAzEPvTn1JZyoLuXto6uqEsRERl1sQ99DccgInES+9CfM3AEjw7bFJHsF/vQL5mQR0Nlkbb0RSQWYh/6EByvv07DMYhIDKQV+mZ2uZmtNbMNZnZLivkfM7MXzKzFzB4zs2kJ8243s9VmtsbM/sXMbCRfwEiYU1vKpvYDdPf0RV2KiMioGjb0zSwXuBO4ApgPXG9m85OaPQc0u3sTcD9we7jsecD5QBNwGnA2cNGIVT9CZteU0tPnbGzfH3UpIiKjKp0t/YXABnff5O7dwD3AtYkN3P1xd+8M7y4DGvpnAYVAATAByAe2j0ThI2lubRmg4RhEJPulE/r1wOaE+63htKF8CHgYwN2fAh4HtoY/j7j7muMrdfTMrJ5Ifq5pZ66IZL10Qj9VH3zK01fN7AagGbgjvH8KMI9gy78euNjMLkyx3I1mttzMlre3t6db+4jJz83h5OoSHbYpIlkvndBvBRoT7jcAW5IbmdklwK3ANe5+KJz8dmCZu+939/0E3wDOTV7W3e9292Z3b66urj7W1zAi5tSWsm67+vRFJLulE/rPALPMbIaZFQDXAYsTG5jZmcBdBIG/I2HWq8BFZpZnZvkEO3EzrnsHgp25bR1d7D14OOpSRERGzbCh7+49wM3AIwSBfa+7rzaz28zsmrDZHUAJcJ+ZrTCz/g+F+4GNwCpgJbDS3R8c6RcxEvrH1l+nfn0RyWJ56TRy96XA0qRpn0m4fckQy/UCN51IgWOlfziGla17aJ5eFXE1IiKjQ2fkhuorilhQX84Plr2iYZZFJGsp9ENmxl9cOJNNrx3g52sy7lQCEZERodBPcOVptTRUFnH3E5uiLkVEZFQo9BPk5ebw52+awbOv7Gb5y7uiLkdEZMQp9JO8++xGKorzuUtb+yKShRT6SYoL8vjTc6fxizXbNQCbiGQdhX4K73vjdPJzc/jmk9raF5HsotBPobp0Au98QwMP/KGN9n2Hhl9ARG
ScUOgP4S8umMHh3j6++7uXoy5FRGTEKPSHMLO6hEvn1fD9Za9w4FBP1OWIiIwIhf5R3HTRTPZ0Hebe5ZuHbywiMg4o9I/irGlVnDWtkv/8zUv09Or6uSIy/in0h3HThTNp3d3F0ue3RV2KiMgJU+gP45J5NcysnsjdT2zEXQOxicj4ptAfRk6O8RcXzOT5tr38buPOqMsRETkhCv00vP3MeiaXTNDQDCIy7in001CYn8sHzpvGE+vaWbNVF08XkfFLoZ+mG86dRnFBLt/Q1r6IjGMK/TRVFBfw7uZGFq/cwpaOrqjLERE5Lgr9Y/ChN83AgW//9qWoSxEROS4K/WPQWFXMlQvq+NHvN7P34OGoyxEROWYK/WN004Uz2X+ohx8+/WrUpYiIHDOF/jE6rb6c80+ZxLd/+xLdPRqaQUTGF4X+cbjxwpPZvvcQ/7OiLepSRESOSVqhb2aXm9laM9tgZrekmP8xM3vBzFrM7DEzm5Yw7yQze9TM1oRtpo9c+dG4cNZk5taW8o0nN9HXp6EZRGT8GDb0zSwXuBO4ApgPXG9m85OaPQc0u3sTcD9we8K87wF3uPs8YCGwYyQKj5KZceOFM1m3fT+/WjfuX46IxEg6W/oLgQ3uvsndu4F7gGsTG7j74+7eGd5dBjQAhB8Oee7+87Dd/oR249rVp0+lrryQu36tk7VEZPxIJ/TrgcSriLSG04byIeDh8PZsoMPM/tvMnjOzO8JvDuNefm4OHzx/Bk+/tIuVmzuiLkdEJC3phL6lmJayI9vMbgCagTvCSXnABcDHgbOBmcAHUix3o5ktN7Pl7e3taZSUGa5b2EjphDzu1tAMIjJOpBP6rUBjwv0GYEtyIzO7BLgVuMbdDyUs+1zYNdQD/BR4Q/Ky7n63uze7e3N1dfWxvobIlBbm8yfnnsTDz2/l1Z1Z0WslIlkundB/BphlZjPMrAC4Dlic2MDMzgTuIgj8HUnLVppZf5JfDLxw4mVnjg+eP4PcHOObv9HWvohkvmFDP9xCvxl4BFgD3Ovuq83sNjO7Jmx2B1AC3GdmK8xscbhsL0HXzmNmtoqgq+gbo/A6IlNTVsjbzqjn3uWb2XWgO+pyRESOyjLtEoDNzc2+fPnyqMs4Juu37+PSf36Cv7tkNh+9ZFbU5YhIDJnZs+7ePFw7nZE7AmbVlHLx3Cl876mXOXi4N+pyRESGpNAfITdeOJOdB7r5r2WvRF2KiMiQFPoj5JwZVbxlTjX/9Og6Xtl5IOpyRERSUuiPEDPjC+9YQF6u8Yn7WzQmj4hkJIX+CKorL+LvF83n9y/t4vvq5hGRDKTQH2HvOquBi2ZX86WHX9QJWyKScRT6I8zM+OI7FpCXY3zi/pXq5hGRjKLQHwVTK4r49KJ5PP3SLv7raXXziEjmUOiPknc3N3KhunlEJMMo9EeJmfGldywgx4z//YC6eUQkMyj0R9HUiiI+fdU8lm3axQ/UzSMiGUChP8rec3YjF8yazBcffpHNu9TNIyLRUuiPMjPjS+9sCrp5dNKWiERMoT8G6iuKuPWqeTy1aSc//P2rUZcjIjGm0B8j153dyJtOmcwXl65RN4+IREahP0aCbp4FmBmffKCFTLuOgYjEg0J/DDVUFvN/rpzH7zaqm0dEoqHQH2PXLwy6eb7w0Bpad6ubR0TGlkJ/jPWPzQNwywOr1M0jImNKoR+BxqpiPnXlPH6z4TV+9PvNUZcjIjGi0I/Inyw8ifNOnsTnH3pB3TwiMmYU+hHJyTG+/M4mHPjUf6ubR0TGhkI/Qv3dPE+uf417nlE3j4iMPoV+xN678CTeOHMSn39oDW0dXVGXIyJZLq3QN7PLzWytmW0ws1tSzP+Ymb1gZi1m9piZTUuaX2ZmbWb2byNVeLbIyTFu/+Mm+ty5RSdticgoGzb0zSwXuBO4ApgPXG9m85OaPQc0u3sTcD9we9L8zwG/PvFys1NjVTGfumIuT65/jcUrt0RdjohksXS29BcCG9x9k7
t3A/cA1yY2cPfH3b3/EJRlQEP/PDM7C6gBHh2ZkrPTe8+ZxozJE/mx+vZFZBSlE/r1QGIStYbThvIh4GEAM8sB/gn4xNGewMxuNLPlZra8vb09jZKyT06OsaipjmWbdtK+71DU5YhIlkon9C3FtJQdz2Z2A9AM3BFO+itgqbsfdfPV3e9292Z3b66urk6jpOy0qGkqfQ4PP7816lJEJEulE/qtQGPC/QbgiI5nM7sEuBW4xt37N1XfCNxsZi8DXwHeZ2ZfOqGKs9ic2lJmTSlhyUqFvoiMjnRC/xlglpnNMLMC4DpgcWIDMzsTuIsg8Hf0T3f397r7Se4+Hfg48D13P+LoH3nd1adP5ZlXdrF1jw7fFJGRN2zou3sPcDPwCLAGuNfdV5vZbWZ2TdjsDqAEuM/MVpjZ4iEeToaxqKkOd3ioRVv7IjLyLNOOC29ubvbly5dHXUakrvzakxTk5fDTj5wfdSkiMk6Y2bPu3jxcO52Rm4EWnV7His0duqyiiIw4hX4GWrRgKgAPrVIXj4iMrLyoC5AjnTSpmNMbylnSsoUPX3Ry1OWIyCg41NNL2+4uNu/uonV3J5t3dVFelM9fvnl0/+cV+hlqUdNUPr90DS+/doDpkydGXY6IHKOe3j627jnI5t2dtO7uonVXJ5t3d7F5V3B/+76DJO5Szc81zj9lskI/rq5qquPzS9ewpGULN188K+pyRCRJX5/Tvv/QQIhv3tXJ5nCLvbWjky0dB+ntez3VcwzqyotoqCzi/FMm01hVRGNlMY1VxTRUFlFTVkhuTqpzYUeWQj9DTa0oonlaJUtatir0RSLg7uzuPPx6qO/uDIM96I5p3d1Fd0/foGWqSyfQWFnEmY2VXHP64FCvKy+iIC/63agK/Qy2qKmOzz74Auu372NWTWnU5YhknX0HDydspfd3vby+5X6gu3dQ+4rifBori5lTU8ol82porCyioao4+F1ZTGF+bkSvJH0K/Qx25YI6/mHJCzzYspWPXarQFzlWBw/3DmyltyZspW/eFUzr6Dw8qP3EgtyBLfNzZ06iMSHQG6uKKC3Mj+iVjByFfgabUlbIOTOqWNKyhb+7ZBZmo9/fJzKeHO7tY2vHwYSul85BW+7JI9YW5OXQEIZ4U0N5GOpByDdWFVNZnJ/1/2cK/Qy3qGkqn/7p86zZuo/5U8uiLkdkTPX2OTv2HQy2zJNCvXV3F1v3dJGwr5TcHKOuvJDGymLeMqc6CPSEHabVJRPIGYOdpZlMoZ/hrjitlv+7eDVLWrYo9CXruDs7D3QP2kG6eVf/707aOro43Dt4qJiasgk0VhazcEbVQNdLf7DXlReSlxv9ztJMptDPcJNKJnDeyZNY0rKVT7x1TtZ/9ZTss6fr8BE7SBOPV+86PHhnadXEAhorizi1vpzLT6sb6HpprCxiakXRuNhZmskU+uPA1U1T+d8PtLCqbQ9NDRVRlyMySGd3z6Aul0HHq+/uZO/BnkHtSyfk0VBVzIzJE7lwdnUQ6gmHNk6coFgaTVq748BbT63l1p+u4sGVWxT6Mua6e/po6xh81Et/wLfu7uS1/d2D2hfm5wRHu1QWcda0yiNOQiovyv6dpZlMoT8OlBfnc8Gsah5q2cqnrpgX+x1RMrJ6+5yte7oGdb20JpyQtG3v4OEC8nKM+nDr/JJ5NQNh3v+7umSCQj2DKfTHiUVNdfzyxR08t3k3Z02rirocGUfcnfZ9hxJ2lL4+VMDmXV1s6eiiJ+EQGDOoKyukoaqYN548aWArvf9EpNoxGi5ARodCf5y4dH4NBXk5PLhyq0JfBnH3cGfp4K6XxNuHkoYLmFwygcaqIk5vrGBRU92g49WnVmTGcAEyOhT640RpYT5vmVPN0lVb+ftF87WlFTMHDvUM7BxNFer7Dw3eWVpelE9DZRGzppRy8dwpr3fBVBbTUFlMUYGOgIkrhf44sqhpKo+s3s4zL+/i3JmToi5HRtDBw720dXQNHtArYct9d9JwAU
X5uQM7SM+dOWngLNPGqqBvvSwLhguQ0aHQH0f+aN4UivJzWdKyRaE/zgyMrZ5iqIDNuzrZkTxcQG4O9ZXBMLynLagbNFRAY2URVRMLtLNUjotCfxwpLsjj4nlTeHjVNj579ak68zCD9PU5O/YdCgO984hhA7buST22emNVERfNrh50BExjZTFTSjVcgIwOhf44c3VTHQ+1bOWpTTu5YFZ11OXEhruz60D3EaM0bt7VSdvuLlo7jhxbfUrpBBqrimmeVvl610t4JExteSH5+tCWCCj0x5k3z5nCxIJclqzcqtAfYQcO9fDKzs7BR8Ak7DTtTBpbvbI4n8aqYubVlXHp/JqBcdUbq4qp13ABkqHSCn0zuxz4GpALfNPdv5Q0/2PAnwM9QDvwQXd/xczOAP4DKAN6gc+7+49HsP7YKczP5dL5Nfxs9TY+97bTdGjdcerq7uWFrXtoad3DqtY9tLTtYWP7/kEnIZVMyKOhsoiTJhVz3ikJx6tXBTtNSzRcgIxDw75rzSwXuBO4FGgFnjGzxe7+QkKz54Bmd+80s78EbgfeA3QC73P39WY2FXjWzB5x944RfyUxcvXpU/npii38dsNrvGXulKjLyXgHD/eyZutenm8LQ75tD+u27xsYkrembAIL6iu4umkqs2pKBnaaVsRgbHWJn3Q2VRYCG9x9E4CZ3QNcCwyEvrs/ntB+GXBDOH1dQpstZrYDqAYU+ifgglnVlBXm8WDLFoV+ku6ePtZu20dLW0ewBd8aBHz/GaeTJhbQ1FDOZafW0lRfzoKGcmrKCiOuWmTspBP69cDmhPutwDlHaf8h4OHkiWa2ECgANh5LgXKkgrwc3npqLT97fhsHD/fGtu/4cG8f67fvZ1Vbx8AW/Itb99HdG+xQrSjOZ0F9OTfNncmC+gqaGsqpKy/U1rvEWjqhn+o/xFNMw8xuAJqBi5Km1wHfB97v7n0plrsRuBHgpJNOSqMkWXT6VO57tpUn1rVz2am1UZcz6nr7nI3t+8M++A5a2vbwwpa9A8MLlBbmsaC+nD9703SawoBvqCxSwIskSSf0W4HGhPsNwJbkRmZ2CXArcJG7H0qYXgY8BHza3ZelegJ3vxu4G6C5uTnlB4oMdt7Jk6gszufBlq1ZF/p9fc5LOw8MdM+sauvg+ba9AxfbmFiQy6n15fzpudNY0FBOU0MF06qKdVy7SBrSCf1ngFlmNgNoA64D/iSxgZmdCdwFXO7uOxKmFwA/Ab7n7veNWNVCfm4Ol59Wx/+saKOru3fcjqXi7ry6q3Oge6alNQj4/rFkCvNzOHVqOe85u5GmhnKaGsqZMblEYw+JHKdhQ9/de8zsZuARgkM2v+Xuq83sNmC5uy8G7gBKgPvCr9Ovuvs1wLuBC4FJZvaB8CE/4O4rRv6lxM/Vp9fxo9+/yi9f3MFVTXVRlzMsd6d1d1cY7nvCo2k6Bq6sVJCXw/y6Mt5+Zn24BV/OKdUlOvNYZASZe2b1pjQ3N/vy5cujLmNc6O1zzv3iYzRPq+Q/bjgr6nIGcXe27T046Dj4Va0dAwOH5ecac2vLgnAPj6KZXVOqs1RFjpOZPevuzcO109kl41hujnHlabXc88xm9h/qifRkoR37Dib0wQe/X9t/aKDO2TWlXDa/dmALfk5tKRPyxmeXlMh4ptAf5xadPpXvPvUKj63ZzrVn1I/Jc+7cf4hVbYlb8HvYtvcgEAwkdsqUEi6aXU1TQ7AFP7+uLLaHlYpkGoX+OHfWSZXUlhXy4MqtoxL6ezoPB1vuCSc7tXV0DcyfWT2Rc2dWsaAhOExyfl0ZEzU8gUjG0n/nOJeTY1zVVMf3nnqZPV2HKS86/otn7Dt4mOfb9g462emVnZ0D86dNKubMkyp4/3nTWFBfwWn1ZZTqYh0i44pCPwssaqrjP3/zEo+u3sa7mhuHX4BgRMnVW/bS0tox0FWz6bUDA/MbKotoaijnurNPYkF9OQvqyykvVsCLjHcK/SxwRmMFDZVFLG
nZmjL0gxEl9w6cybqqdQ8bEkaUrCsvZEF9Oe94Qz0LGipYUF9O1cSCMX4VIjIWFPpZwMxY1DSVbz65ie17D7Jtz8GBQyRbWvewfsf+gas2TS6ZwOkN5VzVVEdTQzmn1ZczpVQDjonEhUI/SyxqquPrv97IOV94bGBa1cQCFtSXc+n8GhbUB8MV1JRN0Hg0IjGm0M8Sp04t468vPoWePh842am+QgOOichgCv0sYWb8r8vmRF2GiGQ4nfMuIhIjCn0RkRhR6IuIxIhCX0QkRhT6IiIxotAXEYkRhb6ISIwo9EVEYiTjLpdoZu3AK1HXcRSTgdeiLuIoVN+JUX0nRvWdmBOpb5q7Vw/XKONCP9OZ2fJ0rkMZFdV3YlTfiVF9J2Ys6lP3johIjCj0RURiRKF/7O6OuoBhqL4To/pOjOo7MaNen/r0RURiRFv6IiIxotBPYmaNZva4ma0xs9Vm9tEUbd5sZnvMbEX485kI6nzZzFaFz788xXwzs38xsw1m1mJmbxjD2uYkrJsVZrbXzP42qc2YrkMz+5aZ7TCz5xOmVZnZz81sffi7cohl3x+2WW9m7x/D+u4wsxfDv99PzKxiiGWP+l4Yxfo+a2ZtCX/DK4dY9nIzWxu+F28Zw/p+nFDby2a2Yohlx2L9pcyVSN6D7q6fhB+gDnhDeLsUWAfMT2rzZmBJxHW+DEw+yvwrgYcBA84Fno6ozlxgG8ExxJGtQ+BC4A3A8wnTbgduCW/fAnw5xXJVwKbwd2V4u3KM6rsMyAtvfzlVfem8F0axvs8CH0/j778RmAkUACuT/59Gq76k+f8EfCbC9ZcyV6J4D2pLP4m7b3X3P4S39wFrgPpoqzou1wLf88AyoMLM6iKo44+Aje4e6Ql37v4EsCtp8rXAd8Pb3wXelmLRtwI/d/dd7r4b+Dlw+VjU5+6PuntPeHcZ0DDSz5uuIdZfOhYCG9x9k7t3A/cQrPcRdbT6LLhm6LuBH43086brKLky5u9Bhf5RmNl04Ezg6RSz32hmK83sYTM7dUwLCzjwqJk9a2Y3pphfD2xOuN9KNB9e1zH0P1vU67DG3bdC8E8JTEnRJlPW4wcJvrmlMtx7YTTdHHY/fWuIrolMWH8XANvdff0Q88fKjnFPAAAIIklEQVR0/SXlypi/BxX6QzCzEuAB4G/dfW/S7D8QdFecDvwr8NOxrg84393fAFwBfMTMLkyan+qK6GN6qJaZFQDXAPelmJ0J6zAdmbAebwV6gB8M0WS498Jo+Q/gZOAMYCtBF0qyyNcfcD1H38ofs/U3TK4MuViKace9DhX6KZhZPsEf5gfu/t/J8919r7vvD28vBfLNbPJY1ujuW8LfO4CfEHyNTtQKNCbcbwC2jE11A64A/uDu25NnZMI6BLb3d3mFv3ekaBPpegx32i0C3uthB2+yNN4Lo8Ldt7t7r7v3Ad8Y4nmjXn95wDuAHw/VZqzW3xC5MubvQYV+krD/7z+BNe7+1SHa1IbtMLOFBOtx5xjWONHMSvtvE+zwez6p2WLgfeFRPOcCe/q/Ro6hIbewol6HocVA/5EQ7wf+J0WbR4DLzKwy7L64LJw26szscuCTwDXu3jlEm3TeC6NVX+I+orcP8bzPALPMbEb4ze86gvU+Vi4BXnT31lQzx2r9HSVXxv49OJp7rMfjD/Amgq9OLcCK8OdK4MPAh8M2NwOrCY5EWAacN8Y1zgyfe2VYx63h9MQaDbiT4MiJVUDzGNdYTBDi5QnTIluHBB8+W4HDBFtOHwImAY8B68PfVWHbZuCbCct+ENgQ/vzZGNa3gaAvt/99+PWw7VRg6dHeC2NU3/fD91YLQXjVJdcX3r+S4GiVjWNZXzj9O/3vuYS2Uay/oXJlzN+DOiNXRCRG1L0jIhIjCn0RkRhR6IuIxIhCX0QkRhT6IiIxotCPITNzM/t+wv08M2s3syXDLHfGUCMphvObzexfTrC2ajN72syeM7MLTuSxwseb3j/yYmJ9ZjbBzH
4Rjqz4HjO7IBz9cIWZFZ3o8x6lnjeb2XnHOm8U6vismX38OJcd9D44kceSsZcXdQESiQPAaWZW5O5dwKVAWxrLnUFw/PDS5Blmlufuy4ETHZr2jwhOpkl7+Fgzy3X33uHaJdV3JpDv7meEj/F14Cvu/u00n9MILkLUl26doTcD+4HfHcu8cP32HLFENIZ8H8g4MBonIugns38IguULwB+H979HcObnkvD+ROBbBGdTPkcwEmAB8CrQTnBiyXsIhta9G3gU+CEJwyUDJcC3ef3knXcSDLP7HYIzHlcBf5dU1xlJz1FEcFbvqnCZLye9htsIBq16U9LjnEVwss1TwB2Ew+3210cwqNUGYE/4PDcRjND4EsEp8gCfCF9/C/AP4bTpBKMj/nu4XqYRnB35FMFYQvcBJWHbl4F/CKevAuaGy28j+IBdAVyQUPMR88J19VXgcYJxbY74u4TL5oavs7/em4b4u98KrAV+QXAy08fD6ScDPwOeBZ4E5obTvwN8PZy2jmA4iKHeB98CfkUw7O/fJLyPHgr/Fs8D74n6va8fV+jH8ScMzCbgfqAw/Od9M68H9heAG8LbFeE//ETgA8C/JTzOZ8OgKArvJz7Gl4H/l9C2kiCMf54wrSJFbQPPQXDm5KtANcG30l8CbwvnOfDuIV5fC3BRePuI0E++Hd7/Dq9/CF5G8GFmBF2gSwjGa58O9AHnhu0mA08AE8P7nyQcs50g9P86vP1XhGdXcpQx6JPnhTUtAXKH+bvcCHw6nD6B4NvMjKTHPovgw6cYKCP40OsP/ceAWeHtc4BfJjz/z8J1MIvgTNfCId4HvwufezLBmdj5BB/030hoV57qdetnbH/UvRNT7t4SDvF6PUd+Tb8MuCahn7YQOGmIh1rsQRdRsksIxlnpf77dZrYJmGlm/0qwBfjoMGWeDfzK3dsBzOwHBOH7U6CXYPCqQcysnODD5NfhpO8TDPx2LC4Lf54L75cQhN6rwCseXJ8AgovTzAd+Gw4jVECw1d+vf1CtZwkG/Toe9/nrXVdD/V0uA5rM7I/D6eVhvS8lPM4FwE88HMPHzBaHv0uA84D7wtcAQXj3u9eDLqz14d9v7hB1PuTuh4BDZrYDqCH4kPmKmX2Z4AP2yWN/+TLSFPrxthj4CsFW76SE6Qa8093XJjY2s3NSPMaBIR7bSBr+NQz+0wkuCvERggtbfPAo9aUaUrbfQU/dj3/E8x4HA77o7ncNmhh8SB5Iavdzd79+iMc5FP7u5fj/15KfL9XfxQi+VQw3CFeq9ZIDdHi4byONZYZat4cSbvcSXPFrnZmdRTDGzBfN7FF3v22YGmWU6eidePsWcJu7r0qa/gjw1wmjYJ4ZTt9HcKm3dDxKMKga4WNUhkMn57j7A8DfE1ze7mieBi4ys8lmlkvwreTXR1vA3TuAPWb2pnDSe9OsN9EjwAfDrWDMrN7MUl3cYhlwvpmdErYrNrPZwzz20dbhcOt3qL/LI8BfhkP3YmazwxEjEz0BvN3MisJRJa+GYIhr4CUze1e4rIUfzP3eZWY5ZnYyweBka9Ook/CxpgKd7v5fBBsXY3adZhmaQj/G3L3V3b+WYtbnCPpkW8LDHT8XTn8cmN9/mOMwD/+PQKWZPW9mK4G3EFzt51cWXKD6O8Cnhqlva9jmcYKdgX9w91RDzyb7M+BOM3sKSNX1dFTu3r9j+ikzW0Ww7+OIkAu7nT4A/MjMWgg+BIbq/uj3IEH4rkhxSOrR5sHQf5dvAi8Afwin30XSNwsPLtX3Y4L9Nw8Q7Jzt917gQ+HfaTWDL2e4luCD9mGC0SoPkv77YAHw+/DvfSvBe0IiplE2RSQlM/sOQV/8/VHXIiNHW/oiIjGiLX0RkRjRlr6ISIwo9EVEYkShLyISIwp9EZEYUeiLiMSIQl9EJEb+P+ZLBP6j/4ukAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [1, 2, 3, 4, 5, 10, 20]\n",
"\n",
"metrics = [evaluate_dt(train_dt, test_dt, param, 32) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)\n",
"\n",
"plot(params, metrics)\n",
"pyplot.xlabel('Maximum tree depth')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
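The depth sweep above varies only the depth (third argument of `evaluate_dt`) and holds `maxBins` (fourth argument) at 32. Below is a minimal sketch of the same pattern as a reusable helper. It assumes an evaluation function that accepts the swept value as a keyword argument. `sweep` and `toy_eval` are hypothetical names and are not part of the notebook. `toy_eval` is only a stand-in so the sketch runs without Spark.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sweep(evaluate: Callable[..., float], name: str,
          values: Sequence, fixed: Dict[str, object]) -> List[Tuple[object, float]]:
    """Evaluate one hyperparameter over `values`, holding the others fixed."""
    results = []
    for value in values:
        kwargs = dict(fixed)
        kwargs[name] = value  # override only the swept parameter
        results.append((value, evaluate(**kwargs)))
    return results


# Hypothetical stand-in for evaluate_dt: the error is smallest at depth 5.
def toy_eval(max_depth: int, max_bins: int) -> float:
    return abs(max_depth - 5) / 10 + 1 / max_bins


depth_results = sweep(toy_eval, "max_depth", [1, 2, 3, 4, 5, 10, 20], {"max_bins": 32})
for depth, err in depth_results:
    print(depth, round(err, 4))
```

Keeping the (value, metric) pairs together avoids mismatching the two parallel lists when results are printed or plotted.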
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Maximum bins"
]
},
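`maxBins` caps how many candidate split points a tree considers for each continuous feature. The feature's values are discretised into at most `maxBins` bins, and splits are only tried at the bin boundaries. For categorical features, `maxBins` must also be at least the number of categories. Spark MLlib computes the continuous boundaries from quantiles of a sample of the data.

The sketch below is a simplified, assumed stand-in that uses exact NumPy quantiles. It shows why a small `maxBins` coarsens the possible splits. `candidate_thresholds` is a hypothetical helper, not a Spark function.

```python
import numpy as np


def candidate_thresholds(values, max_bins):
    """Return at most max_bins - 1 split thresholds taken at evenly spaced quantiles."""
    values = np.asarray(values, dtype=float)
    # Interior quantile levels, e.g. max_bins=4 -> 0.25, 0.5, 0.75
    levels = np.linspace(0.0, 1.0, max_bins + 1)[1:-1]
    thresholds = np.quantile(values, levels)
    return np.unique(thresholds)  # repeated values collapse to a single threshold


rng = np.random.default_rng(0)
temperature = rng.uniform(0.0, 1.0, size=1000)
for bins in [2, 4, 8, 32]:
    print(bins, len(candidate_thresholds(temperature, bins)))
```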
{
"cell_type": "code",
"execution_count": 142,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[0.22578199542260993, 0.22626606160811255, 0.20380255431723798, 0.2076920210675261, 0.212163652773227, 0.21000218813883056, 0.2228581552832826]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEKCAYAAAASByJ7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xl8VOX5///XlY0EwoRd2UFBEBBZwqLWuqHV2qq1VkVQVFxqq7baWm1rbav126r9VWu1VkHcQHD/iPtura0g+yoKIrJLkH3Jfv3+OCdhEhMyIcskmffz8ciDmTNnzlwnE+Y9933OuW9zd0RERJLiXYCIiDQMCgQREQEUCCIiElIgiIgIoEAQEZGQAkFERAAFgoiIhBQIIiICKBBERCSUEu8CqqNdu3beo0ePeJchItKozJkzZ7O7t69qvUYVCD169GD27NnxLkNEpFExsy9jWU9dRiIiAigQREQkpEAQERFAgSAiIiEFgoiIAAoEEREJKRBERARI8ED47KudvLX0q3iXISLSIDSqC9NqS1Gx89AHK/nbW59SVOwsu+000lISOhtFRBIvEFZt3s0vnlnAnC+30jErnQ3bc9mRW0C7zGbxLk1EJK4SJhDcnckzV/P/XvmE1GTjnvMG4TjXPbWAHXsVCCIiCREIG7fn8qvnFvLBZzkc27sdd54zkI5ZGby7LDh+sCO3MM4ViojEX0IEwk+fnMvS9Tu47awBjB3RDTMDIJKeCsCOvQXxLE9EpEFIiED401kDSE9Npme7FmWWRzLCQMhVIIiIJEQgHN4xUuHyfS0EdRmJiCT0uZaRjCAP1UIQEYkxEMzsVDP71MxWmNlNFTx+vZktNbOFZvaOmXUPlw8ys4/MbEn42HlRz3nUzL4ws/nhz6Da263YZKQmk5JkOoYgIkIMgWBmycD9wGlAP2C0mfUrt9o8INvdBwLPAneGy/cAF7l7f+BU4B4zaxX1vBvcfVD4M7+G+1JtZkYkI1UtBBERYmshDAdWuPtKd88HpgFnRq/g7u+5+57w7gygS7j8M3dfHt5eD2wCqpzXsz5F0lN0DEFEhNgCoTOwJur+2nBZZcYDr5VfaGbDgTTg86jFt4ddSXebWYVXhpnZFWY228xm5+TkxFBu9WSphSAiAsQWCFbBMq9wRbOxQDZwV7nlHYEngEvcvThc/GugLzAMaAPcWNE23f0hd8929+z27Wu/cRHJSGW7jiGIiMQUCGuBrlH3uwDry69kZqOA3wJnuHte1PII8Apws7vPKFnu7hs8kAc8QtA1Ve8i6ak6qCwiQmyBMAvobWY9zSwNOB+YHr2CmQ0GHiQIg01Ry9OAF4DH3f2Zcs/pGP5rwFnA4prsyIGKZKRo6AoREWK4MM3dC83sauANIBmY5O5LzOxWYLa7TyfoIsoEngmHhVjt7mcA5wLfBtqa2cXhJi8OzyiaYmbtCbqk5gM/rt1di41aCCIigZiuVHb3V4FXyy27Jer2qEqeNxmYXMljJ8ZeZt2JZKSSV1hMbkER6anJ8S5HRCRuEvpKZQhOOwXYqW4jEUlwCgQNcCciAigQNAS2iEhIgVA6wJ26jEQksSkQ1EIQEQEUCGTpGIKICKBAKD2orOErRCTRJXwgNEtJIi05SSOeikjCS/hACOZESFGXkYgkvIQPBNDwFSIioEAAoGVGqk47FZGEp0CgZNY0tRBEJLEpEEDzKotIg5WzM4/fv7iY3IKiOn8tBQIlxxDUZSQiDcvmXXlcMGEGT89ey4pNu+r89RQIoLOMRKTB+XpXHmMmzGTN1j1MungYAzpn1flrKhAIWgj54ZwIIiLxtmV3PmMmzmTV17t5eNwwjjq0bb28rgIBDV8hIg3Htj35jJ04k5WbdzNxXDbH9GpXb6+tQCBqTgQdRxCRONq+p4AxE2eyImcXEy7K5tje7ev19RUI7Js1TeMZiUi8bN9bwNiHZ7L8q108eOFQjj
usfsMAFAiAZk0TkfjakVvARQ/PZNnGHTwwdggn9OkQlzoUCGhOBBGJn525BYyb9DFLN+zgn2OGctLhB8WtFgUCmjVNROJjV14hFz8yi0Vrt3PfBUM4uV/8wgAgJa6v3kCohSAi9W13XiGXPPIx89ds477Rg/lO/4PjXZJaCADpqcmkpSTpGIKI1Is9+YVc8ugs5q7exr3nD+a0IzrGuyRAgVBKw1eISH3Ym1/EpY/OYvaqLdxz3iBOH9gwwgAUCKU0fIWI1LW9+UWMf2wWH3+xhbvPG8T3j+wU75LK0DGEkCbJEZG6lFtQxOWPz+ajlV/zt3OP5MxBneNd0jeohRCKaJIcEakjJWHw3883c9c5R/KDwV3iXVKFFAihrIxUdqqFICK1LK+wiB9PnsN/lm/mjrMHcs7QhhkGoEAoFUlP0dAVIlKr8gqLuGryXN7/NIe/nH0E5w7rGu+S9kuBECqZNc3d412KiDQB+YXF/HTKXN5dtonbfzCA84d3i3dJVVIghCLpqRQUObkFxfEuRUQauYKiYq5+ci5vf7KJ284awJgR3eNdUkwUCKF9w1eo20hEDlxBUTHXPDmPN5d+xa1n9ufCkY0jDECBUErDV4hITRUWFfPzafN5fclGfv/9flx0VI94l1QtCoSQhsAWkZooLCrm50/N55VFG7j59MO55Jie8S6p2mIKBDM71cw+NbMVZnZTBY9fb2ZLzWyhmb1jZt3D5YPM7CMzWxI+dl7Uc3qa2UwzW25mT5lZWu3tVvWVTJKj4StEpLqKip1fPLOAlxdu4Dff7ctlxx4S75IOSJWBYGbJwP3AaUA/YLSZ9Su32jwg290HAs8Cd4bL9wAXuXt/4FTgHjNrFT52B3C3u/cGtgLja7ozNaEWgogciKJi55fPLODF+eu58dS+XPHtQ+Nd0gGLpYUwHFjh7ivdPR+YBpwZvYK7v+fue8K7M4Au4fLP3H15eHs9sAlob2YGnEgQHgCPAWfVdGdqQscQRKS6ioqdXz27kBfmreOG7/ThquMbbxhAbIHQGVgTdX9tuKwy44HXyi80s+FAGvA50BbY5u4l/TOVbtPMrjCz2WY2OycnJ4ZyD4wmyRGR6igudm56biHPzV3L9Scfxk9P6BXvkmoslkCwCpZVePWWmY0FsoG7yi3vCDwBXOLuxdXZprs/5O7Z7p7dvn3dTTrdLCWZ9NQktRBEpErFxc5vXljEM3PW8rOTenPtSb3jXVKtiGW007VA9PXWXYD15Vcys1HAb4Hj3D0vankEeAW42d1nhIs3A63MLCVsJVS4zfoWSU/V8BUisl/Fxc7NLy5m2qw1XHNiL34+qmmEAcTWQpgF9A7PCkoDzgemR69gZoOBB4Ez3H1T1PI04AXgcXd/pmS5B+NDvAecEy4aB7xYkx2pDSXDV4iIVMTduWX6Yp6cuZqfHH8o1598GMEh0aahykAIv8FfDbwBfAI87e5LzOxWMzsjXO0uIBN4xszmm1lJYJwLfBu4OFw+38wGhY/dCFxvZisIjik8XHu7dWAi6Sk67VREKuTu/GH6EibPWM2Vxx3CDd/p06TCAGKcIMfdXwVeLbfslqjboyp53mRgciWPrSQ4g6nBiGSksmV3frzLEJEGxt259eWlPPbRl1x+bE9uOrVvkwsD0JXKZWjWNBEpz925/ZVPeOS/q7j0mJ785ruHN8kwAAVCGcG8yuoyEpGAu/OX15Yx8cMvuPjoHvzue003DECBUEZJC0FzIoiIu3PnG5/y4Acrueio7vz++/2adBiAAqGMSEYqhcXO3oKieJciInHk7vx/b37GA+9/zpgR3fjjGf2bfBiAAqGMfcNXqNtIJJHd/fZy7ntvBaOHd+W2MwckRBiAAqEMTZIjIn9/ezn3vrOc87K7cvtZR5CUlBhhAAqEMrIyNMCdSCK7793l3P32Z5wztAt/PjuxwgAUCGWUdBlp+AqRxPPP91fw1zc/4+zBnbnjhwMTLgxAgVCG5kQQSUwP/v
tz7nz9U84a1Im7fnQkyQkYBqBAKEOzpokknon/WcmfX1vG94/sxF8TOAxAgVBGS02SI5JQHv7wC/70yiecPrAjd597JCnJif2RmNh7X05aShIZqcnqMhJJAI/+9wtue3kppw04mHvOG5TwYQAKhG+IZGjEU5Gm7omPVvGHl5bynf4Hce/owaQqDAAFwjdE0jUngkhTNmXml/zuxSWc3O8g/jF6iMIgin4T5WiSHJGma+rHq/ntC4s5qW8H7r9gCGkp+giMpt9GOZokR6RpenrWGn79/CJO6NOef45VGFREv5Fy1EIQaXqemb2GG59fyHGHteeBsUNplpIc75IaJAVCOZokR6RpeW7OWn713EK+1asdD144lPRUhUFlFAjlZGWksiO3UHMiiDQB/zdvHb98dgFHH9qWCRdlKwyqoEAoJ5KRQlGxsztfcyKINGYvzl/H9U/PZ2TPtky8aJjCIAYKhHIiulpZpNF7eeF6rntqPsN6tOHhi7PJSFMYxEKBUI4GuBNp3F5dtIGfTZtPdvc2TLp4GM3TUuJdUqOhQChHs6aJNF6vL97ItVPnMbhrKyZdMowWzRQG1aFAKKd01jR1GYk0Km8u2cjVT85lYJcsHrlkGJkKg2pTIJRT2kJQl5FIo/H20q/46ZNz6d85i0cvHV46crFUjwKhnIim0RRpVN5btomfTJnL4R0jPH7p8NIvdVJ9CoRyWpZMkpOrYwgiDd37n27iyifmcNjBmTxx6YjSedHlwCgQyklNTqJ5WrJaCCIN3Aef5XDFE3Po1SGTyeNHkNVcYVBTCoQKaAhskYbtw+Wbufzx2RzaPpMpl42gVfO0eJfUJCgQKpCVkarTTkUaqP+t2Mxlj8+iZ7sWTLlsBK1bKAxqiwKhApGMFLary0ikwZmx8msufWwW3do0Z8plI2ijMKhVCoQKqMtIpOH5+IstXPLILLq0bs6Uy0bSNrNZvEtqchQIFdCcCCINy+xVW7j4kY/p1CqdJy8fQfuWCoO6oECogGZNE2k45ny5lXGTPubgSDpTLx9Jh5bp8S6pyVIgVCCSkcrO3AKKizUngkg8zVsdhEH7ls148vKRdIgoDOpSTIFgZqea2admtsLMbqrg8evNbKmZLTSzd8yse9Rjr5vZNjN7udxzHjWzL8xsfvgzqOa7Uzsi6akUO+zOVytBJF4WrNnGRQ9/TNvMNKZeMZKDsxQGda3KQDCzZOB+4DSgHzDazPqVW20ekO3uA4FngTujHrsLuLCSzd/g7oPCn/nVrr6OlA5wp6uVReJi0drtXPjwTFq1SGXq5SPpmJUR75ISQiwthOHACndf6e75wDTgzOgV3P09d98T3p0BdIl67B1gZy3VWy80SY5I/Cxet52xD88kkhGEQadWCoP6EksgdAbWRN1fGy6rzHjgtRhf//awm+luM6vwtAEzu8LMZpvZ7JycnBg3WzMa4E4kPpau38HYh2eS2SyFqZePpEvr5vEuKaHEEghWwbIKj7aa2Vggm6CbqCq/BvoCw4A2wI0VreTuD7l7trtnt2/fPobN1ty+IbDVZSRSXz7ZsIMxE2fQPDWZqZePpGsbhUF9iyUQ1gJdo+53AdaXX8nMRgG/Bc5w97yqNuruGzyQBzxC0DXVIGSphSBSrz7duJMxE2fSLCWZqVeMpFtbhUE8xBIIs4DeZtbTzNKA84Hp0SuY2WDgQYIw2BTLC5tZx/BfA84CFlen8LpUclBZw1eI1L3lX+3kggkzSE02pl4xku5tW8S7pIRV5Rxz7l5oZlcDbwDJwCR3X2JmtwKz3X06QRdRJvBM8PnOanc/A8DM/kPQNZRpZmuB8e7+BjDFzNoTdEnNB35c+7t3YEqm3tPVyiJ1a8WmXYyeMJOkJOPJy0fSs53CIJ5imnTU3V8FXi237Jao26P289xjK1l+Yow11ruU5CQym+lqZZG69HnOLkZPmAHA1MtHcmj7zDhXJLpSuRKR9BS1EETqyBebdzP6oRm4O1MvH0
GvDgqDhkCBUIlIRqoOKovUgVVhGBQVO1MuG0nvg1rGuyQJxdRllIg0BLZI7Vv99R5GT5hBflExT14+gj4HKwwaErUQKhHJ0DEEkdq0ZksQBnsLipg8fgR9D47EuyQpR4FQCbUQRGrP2q17OP+hGezKK2TKZSPo10lh0BApECqhYwgitWPdtr2MnjCDnbkFTLlsBP07ZcW7JKmEAqESkfQUduYVak4EkRrYsH0vox+awbY9BUy+bAQDOisMGjIFQiUiGam4wy7NiSByQDZuz2X0QzPYujufJ8aPYGCXVvEuSaqgQKhEyYin2/eo20ikujbtyOWCCTPYvCufx8YPZ1BXhUFjoECoxL4RTxUIItWxaWcu50+YwVc7cnns0mEM6dY63iVJjHQdQiVKZ03TqaciMcvZmccFE2aycXsuj106nKHd28S7JKkGtRAqoRaCSPVs3pXHmIkzWLd1L49cPIxhPRQGjY0CoRKaE0Ekdlt25zN24kxWb9nDpIuHMeKQtvEuSQ6AAqESmjVNJDZbd+dzwYQZfLF5Nw+PG8ZRhyoMGisdQ6hEZnrJMQS1EEQqs21PPmMmzmTl5t08PC6bY3q1i3dJUgNqIVQiOclo2UxDYItUZvueAsY+PJMVObuYcFE2x/aunznPpe4oEPYjGL5CXUYi5W3fW8CFk2by2cZdPHjhUI47TGHQFCgQ9qOlJskR+YYduQVcNOljPtmwgwfGDuGEPh3iXZLUEgXCfmiAuwO3Yftepi9YzycbdlBQVBzvcqSW7MwtYNykj1m6fjv/HDOUkw4/KN4lSS3SQeX9yMpIZc2WPfEuo9H54LMcrp02j23hsB9pKUn0Oagl/TtF6N8pQr9OWRzesSXN0/Tn15jsyivk4kdmsWjtdu4fM4ST+ykMmhr9j9yPSHoqO3XaaczcnQf+/Tl/feNTDjuoJQ+OHcrGHbksXb+DJet38MaSjUybtQaAJIOe7VrQv1NWGBTBv61bpMV5L6Qiu/MKueSRj5m/Zhv3jR7Md/ofHO+SpA4oEPYjmDVNXUax2JVXyC+fXsDrSzby/SM7cccPjyhtAZw5qDMQBMaG7bksWb+DJeu3s2T9DuZ8uZXpC9aXbqdTVjr9SkMiQv/OWXTKSsfM4rJfAnvyC7nk0VnMXb2Ne88fzGlHdIx3SVJHFAj7EUlPZWdeIUXFTnKSPpAq83nOLq58Yg5fbN7Nzacfzvhv9azwA9zM6NQqg06tMsp0N2zdnc/SDftCYsn6Hby77CtKpqJo3TyVflGtiP6dIvRsl6n3pB7szS9i/KOzmb1qC38/fzCnD1QYNGUKhP0oGQJ7V24hWc1T41xNw/Tmko384ukFpKUk8cT44Rx9aPUvTGrdIo1jerUrc1HTnvxClm3cyZL1O1gaBsWj/1tFfmFwgDojNZm+HVuW6W467KCWpKcm19q+JbrcgiIue3wWM7/4mrvPG8T3j+wU75KkjikQ9iNScrVyboECoZyiYueetz/jH++uYGCXLP41diidWmXU2vabp6UwpFvrMkMnFxQV83nOLpas21Ha7fTivPVMnrEagJQko1eHzDKtiX6dIqXDkEjscguKuPzx2fzv86/527lHlnb7SdOmQNiP0kly9hbQNc61NCTb9xTws6fm8f6nOZyb3YVbzxxQL9/MU5OT6HtwhL4HR/jh0GCZu7Nmy96o7qbtfLh8M8/PXVf6vG5tmu87JhEGRYdIep3X21jlFhRxxRNz+HDFZu4650h+MLhLvEuSeqJA2A8Ngf1Nyzbu4Mon5rB+215u/8EALhjeLa4HfM2Mbm2b061t8zIHO3N25pWGxNIwKF5bvLH08XaZzb4REt3aNCcpwY9L5BUWcdXkOXzwWQ53/nAg5wxVGCQSBcJ+aJKcsqYvWM+Nzy4kkpHCtCuOYmj3hjsTVvuWzTi+TweOj7qKdmduAZ9s2Fnm4PV/P1hJYXj0OrNZCv06RsIupyAoeh+USWpyYly/mVdYxE8mz+W9T3
P4y9lHcO4wtYsTjQJhP9RCCBQWFfOX15Yx8cMvGNajNfePGUKHlo2vy6VleirDe7ZheM99E7fkFRax/KtdZULi6dlr2JNfBEBachKHHZxJ/45Z9O8cBEXfgyO0aNa0/uvkFxbz0ynzeGfZJm7/wQDOH94t3iVJHDStv+paFtEkOXy9K4+rn5zHRyu/5uKje/Cb7x5OWkrT+cbcLCWZAZ2zGNA5q3RZUbGz6uvdpccklq7fwVuffMVTs4OL6uwbF9UFrYk2jfSiuoKiYq6ZOpe3P/mK287sz5gR3eNdksSJAmE/WjZLwSxxA2HBmm1cNXkOX+/O52/nHsnZQxKjPzk5yTi0fSaHts/kjPBUS3dn447cMmc4zf1yKy9FXVTXMSu9dGiOkqDo3CqjQV9UV1BUzLVT5/HGkq/44xn9ufCoHvEuSeJIgbAfSaVzIiTeMYSnZ63h5hcX0z6zGc9ddXSZb9CJyMzomJVBx6wMRkVdVLdtT37p0Bwl3U7vLttUelFdVkbqNw5eH9K+YVxUV1hUzM+nzee1xRu55Xv9GHd0j3iXJHGmQKhCoo14mldYxB9fWsqTM1fzrV7t+MfowRpfaD9aNU/j6F7tODrqorq9+UUs27ij9JjE0vXbeeyjL0svqktPDU6fjQ6JPgfX70V1hUXFXPf0Al5ZtIGbTz+cS7/Vs95eWxouBUIVIumpCXNQeeP2XK6aMod5q7fx4+MO5Ybv9GkQ32Qbm4y0ZAZ3a83gqIvqCouK+Txnd5nrJaYvWM+UmcFFdclJRq/2maUX0/XvlEW/ThGyMmr/orqiYucXzyzgpQXr+fVpfbns2ENq/TWkcVIgVCEY4K7pdxl9/MUWfjJlLnvyC3lgzBANYFbLUpKT6HNwS/oc3JKzhwTL3J21W/eWPQ328808P2/fRXVd22QEZzh1ioRnOWXRoWWzAz4uUVTs3PDMAl6cv55fndqHK487tDZ2T5qImALBzE4F/g4kAxPd/S/lHr8euAwoBHKAS939y/Cx14GRwIfu/r2o5/QEpgFtgLnAhe6eX+M9qmWR9FRWN+E5Edydx/63ij+98gld2zRn6uUj6H1Qy3iXlRDMjK5tmtO1TXNOHbAvgDfvyitzTGLp+h28viT6orq0siPCdsqiewwX1RUXOzc+t5Dn563jl6ccxk+O71Vn+yaNU5WBYGbJwP3AycBaYJaZTXf3pVGrzQOy3X2PmV0F3AmcFz52F9AcuLLcpu8A7nb3aWb2L2A88ECN9qYONOVjCLkFRfzm+UU8P28dow7vwN/OG6RxfxqAdpnNOO6w9mXmKd6VV8gnG3awZN2+1sTE/6ykoGjfRXWHd2xZ2tXUv1OE3h1alp4iXFzs3PT8Qp6ds5brRh3G1Sf2jsu+ScMWSwthOLDC3VcCmNk04EygNBDc/b2o9WcAY6Mee8fMjo/eoAXt3ROBC8JFjwF/oCEGQnpqkzzLaM2WPfx48hyWbtjB9ScfxtUn9Er4YRsassxmKQzr0YZhPb55Ud3SqNZE9EV1qcnGYQe1pF/HCLvyCnlt8UauPak3PxulMJCKxRIInYE1UffXAiP2s/544LUqttkW2ObuJZ+0a8PX+QYzuwK4AqBbt/q/ejKSkcKuvEIKi4pJaSJDGPxneQ7XTJ1HUbEzadwwTuirSdIbo7IX1QXDTBSXuaguCIp3l21iy558rjmxF9cpDGQ/YgmEir42eoUrmo0FsoHjamub7v4Q8BBAdnZ2hevUpZIulF15hbRq3rhPv3R3/vXvldz1xjJ6d2jJgxcOpUe7FvEuS2pRUpJxSPtMDmmfWTp/gbuzt6BIc1hLlWL5C1kLZUZ/7gKsL7+SmY0Cfgsc5+55VWxzM9DKzFLCVkKF22wI9g1f0bgDYVdeITc8s4DXFm/k9IEdufOHA5vceDxSMTNTGEhMYvkrmQX0Ds8KWgecz76+fwDMbDDwIHCqu2+qaoPu7mb2HnAOwZlG44AXq1l7vSiZJGd7Iz6wvD
Kc4vLznF389ruHc9mxFU9xKSKJrcpAcPdCM7saeIPgtNNJ7r7EzG4FZrv7dIIziTKBZ8IPmtXufgaAmf0H6AtkmtlaYLy7vwHcCEwzsz8RnKX0cO3vXs2VXBjUWC9Oe2vpV1z/1HxSU5KYPH5EmStqRUSixdSOdPdXgVfLLbsl6vao/Tz32EqWryQ4g6lBa6wjnhYXO/e8s5x731nOEZ2z+NeFQ+lci1NcikjTo47FKkQaYQth+54Cfv7UPN77NIdzhnbhT2fVzxSXItK4KRCqUHIMobEMXxE9xeVtZw1g7Ij4TnEpIo2HAqEKLdJSSLLG0UJ4acF6fvXsQlqmpzDtipEM7d6m6ieJiIQUCFVISjJapjfs4SsKi4q54/VlTPjPF2R3b80/xwyhQ6TxTXEpIvGlQIhBJKPhTpITPcXlRUd15+bT+zWpKS5FpP4oEGIQaaAthIVrt/HjJ4IpLv/6oyM5Z2hiTHEpInVDgRCDhjhJztOz13Dz/2mKSxGpPQqEGEQyUli1uWHMiZBfWMwfX1rClJmrOaZXW/4xeghtNMWliNQCBUIMGkoL4asduVw1eQ5zV2/jyuMO4YZT+jSZEVhFJP4UCDHIykiN+1hGs1YFU1zuzivk/guGcPpATXEpIrVLgRCDSEYqe/KLKCgqJrWev5G7O0/M+JJbX1pKl9YZTB4/gj4Ha4pLEal9CoQYlFytvDO3sF7763MLivjNC4t4fu46TuobTHFZMtieiEhtUyDEIHqAu/oKhLVbgykuF6/bwc9H9ebaE3triksRqVMKhBiUzJpWXweWP1y+mWumzqWw2Hl4XDYnHX5QvbyuiCQ2BUIMomdNq2sffJbDxY98TK8OmTx4YTY9NcWliNQTBUIMIhnhiKd13ELI2ZnH9U8voFeHTJ7/yTFkaopLEalH+sSJQWmXUR2eelpc7PzymQXszC1gymUjFAYiUu90VVMM6mOSnEn//YJ/f5bDzd/rp9NKRSQuFAgxaJGWHMyJUEfHEBav284dry/jlH4HMXZEtzp5DRGRqigQYmBmRDLqZviK3XmFXDN1Hu0ym3HnOQM1u5mIxI06qmNUV8NX/H76Er78ejdPXj6SVs01SJ2IxI9aCDGqizkRXpy/jmeUv2nrAAAPGUlEQVTnrOXqE3sz8pC2tbptEZHqUiDEqLZnTVv99R5++8Jisru35toTe9XadkVEDpQCIUa12UIoKCrmmmnzMIN7zh+kIaxFpEHQMYQY1eacCHe/9RkL1mzj/guG0KV181rZpohITemraYwiGSm1ctrpf1ds5oF/f87o4V01p4GINCgKhBhF0lPZW1BEfmHxAW/j6115XPfUfA5tn8kt3+tfi9WJiNScAiFGJVcr7zzAbiN354ZnF7JtbwH3nj+YjLTk2ixPRKTGFAgx2jfA3YF1Gz36v1W8u2wTvzmtL/06RWqzNBGRWqFAiFFNBrhbsn47f351GSf17cC4o3vUcmUiIrVDgRCjAx3gbk9+IddOnUer5qnc9aMjNTSFiDRYOu00RiUthOoOX3HrS0tZuXk3U8aPqNf5mEVEqksthBhlHcCsaS8vXM+0WWv4yfGHcnSvdnVVmohIrVAgxKi6s6at2bKHXz+/iMHdWvHzUYfVZWkiIrVCgRCjjNRkUpIspoPKhUXF/GzaPHC49/zBpGpoChFpBGL6pDKzU83sUzNbYWY3VfD49Wa21MwWmtk7ZtY96rFxZrY8/BkXtfz9cJvzw58OtbNLdaM6cyL8/Z3lzF29jdvPPoKubTQ0hYg0DlUeVDazZOB+4GRgLTDLzKa7+9Ko1eYB2e6+x8yuAu4EzjOzNsDvgWzAgTnhc7eGzxvj7rNrcX/qVCS96uErPvr8a+57bwU/GtqFM47sVE+ViYjUXCwthOHACndf6e75wDTgzOgV3P09d98T3p0BdAlvfwd4y923hCHwFnBq7ZRe/6pqIWzdnc91T82nZ9sW/OEMDU0hIo1LLIHQGVgTdX9tuK
wy44HXYnzuI2F30e+skhP0zewKM5ttZrNzcnJiKLfu7G8IbHfnV88tZMvufO4dPZgWzXRGr4g0LrEEQkUf1F7himZjCbqH7orhuWPc/Qjg2PDnwoq26e4PuXu2u2e3b98+hnLrzv4myZk840veWvoVN57WlwGds+q5MhGRmoslENYCXaPudwHWl1/JzEYBvwXOcPe8qp7r7uvCf3cCTxJ0TTVolbUQlm3cwW2vfMIJfdpz6TE96r8wEZFaEEsgzAJ6m1lPM0sDzgemR69gZoOBBwnCYFPUQ28Ap5hZazNrDZwCvGFmKWbWLnxuKvA9YHHNd6duVXQMYW9+Edc8OY9IuoamEJHGrcqObncvNLOrCT7ck4FJ7r7EzG4FZrv7dIIuokzgmfADcbW7n+HuW8zsNoJQAbg1XNaCIBhSw22+DUyo9b2rZZH0FHILiskrLKJZSjB89W2vLGX5pl08MX447TKbxblCEZEDF9ORT3d/FXi13LJbom6P2s9zJwGTyi3bDQytVqUNQPTwFe1bJvP64g08OXM1Vx53CMf2ju/xDRGRmtKpMNUQPeJpflExv3p2IUd2yeIXJ/eJc2UiIjWnQKiGkhFPt+7O587XP6XY4d7Rg0lL0dAUItL4KRCqoWSAuztf/5SPV23hnvMG0b1tizhXJSJSO/TVthpKWggfr9rC2UM6c9bg/V2fJyLSuCgQqqHkGEKPts259cwBca5GRKR2qcuoGjq0bMZVxx/KWYM6k6mhKUSkidGnWjWYGTee2jfeZYiI1Al1GYmICKBAEBGRkAJBREQABYKIiIQUCCIiAigQREQkpEAQERFAgSAiIiFzr3B65AbJzHKAL6tYrR2wuR7KaWi034lF+51Yarrf3d29yklbGlUgxMLMZrt7drzrqG/a78Si/U4s9bXf6jISERFAgSAiIqGmGAgPxbuAONF+Jxbtd2Kpl/1ucscQRETkwDTFFoKIiByAJhMIZnaqmX1qZivM7KZ411NXzKyrmb1nZp+Y2RIz+1m4vI2ZvWVmy8N/W8e71rpgZslmNs/MXg7v9zSzmeF+P2VmafGusS6YWSsze9bMloXv/VGJ8J6b2XXh3/liM5tqZulN8T03s0lmtsnMFkctq/D9tcC94WfdQjMbUlt1NIlAMLNk4H7gNKAfMNrM+sW3qjpTCPzC3Q8HRgI/Dff1JuAdd+8NvBPeb4p+BnwSdf8O4O5wv7cC4+NSVd37O/C6u/cFjiT4HTTp99zMOgPXAtnuPgBIBs6nab7njwKnlltW2ft7GtA7/LkCeKC2imgSgQAMB1a4+0p3zwemAWfGuaY64e4b3H1ueHsnwQdDZ4L9fSxc7THgrPhUWHfMrAtwOjAxvG/AicCz4SpNdb8jwLeBhwHcPd/dt5EA7znBrI4ZZpYCNAc20ATfc3f/ANhSbnFl7++ZwOMemAG0MrOOtVFHUwmEzsCaqPtrw2VNmpn1AAYDM4GD3H0DBKEBdIhfZXXmHuBXQHF4vy2wzd0Lw/tN9X0/BMgBHgm7yyaaWQua+Hvu7uuAvwKrCYJgOzCHxHjPofL3t84+75pKIFgFy5r06VNmlgk8B/zc3XfEu566ZmbfAza5+5zoxRWs2hTf9xRgCPCAuw8GdtPEuocqEvaZnwn0BDoBLQi6S8priu/5/tTZ331TCYS1QNeo+12A9XGqpc6ZWSpBGExx9+fDxV+VNBvDfzfFq746cgxwhpmtIugSPJGgxdAq7E6Apvu+rwXWuvvM8P6zBAHR1N/zUcAX7p7j7gXA88DRJMZ7DpW/v3X2eddUAmEW0Ds8+yCN4MDT9DjXVCfCfvOHgU/c/W9RD00HxoW3xwEv1ndtdcndf+3uXdy9B8H7+667jwHeA84JV2ty+w3g7huBNWbWJ1x0ErCUJv6eE3QVjTSz5uHffcl+N/n3PFTZ+zsduCg822gksL2ka6mmmsyFaWb2XYJvjMnAJHe/Pc4l1Qkz+xbwH2AR+/rSf0NwHOFpoB
vBf6QfuXv5g1RNgpkdD/zS3b9nZocQtBjaAPOAse6eF8/66oKZDSI4mJ4GrAQuIfhC16TfczP7I3Aewdl184DLCPrLm9R7bmZTgeMJRjX9Cvg98H9U8P6G4XgfwVlJe4BL3H12rdTRVAJBRERqpql0GYmISA0pEEREBFAgiIhISIEgIiKAAkFEREIKhARiZm5mT0TdTzGznJKRQ/fzvEHhab2VPZ5tZvfWsLb24QiW88zs2JpsK9xej5KRI6PrM7NmZva2mc03s/PM7NhwNM35ZpZR09fdTz3Hm9nRdbX9Sl5zYrwGeQz3t8K/KzN71cxa1XdNUrWUqleRJmQ3MMDMMtx9L3AysC6G5w0CsoFXyz9gZinhOdA1PQ/6JGCZu4+rcs19r53s7kVVrVeuvsFAqrsPCrfxL+Cv7v5IjK9pBKdrF1e5clnHA7uA/1XzeQfM3S+rr9eqDnev9MuFxJdaCInnNYIRQwFGA1NLHjCzFuG47LPCb+pnhld+3wqcF/Wt+g9m9pCZvQk8Hv1t0MwyzewRM1sUjtX+QwvmMHjUgjHtF5nZddEFhRdd3Ql8t+SbupmNDtddbGZ3RK27y8xuNbOZwFHltjPUzBaY2UfAT6OWH29mL5tZB2AyMCh8nSuBc4FbzGxKuO4N4f4vDC+KKmltfGJm/wTmAl3N7BQz+8jM5prZM+HYUpjZKjP7Y7h8kZn1tWAQwh8D14WvW6YFFP4+HzOzN8Pnn21md4bPf92CoUows1vC2haHv38LW3mzLLhYDzP7s5ndHt5+38yyo35vd5jZnLCFNDx8fKWZnRGuc7GZ3RdV18tR263y+RWImNkLZrbUzP5lZklRv6N2Ub/XCRa00t60sJVmZteGz1toZtMq2b7UNnfXT4L8EHxDHUgwFk46MJ/gm+vL4eP/j+CqT4BWwGcEA4pdDNwXtZ0/EIw6mRHej97GHcA9Ueu2BoYCb0Uta1VBbaWvQTCQ2WqgPUEr9l3grPAxB86tZP8WAseFt+8CFldQX+nt8P6jwDnh7VMI5q41gi9LLxMMO92D4KrwkeF67YAPgBbh/RuBW8Lbq4Brwts/ASZG/c5+WUndfwA+BFIJ5jrYA5wWPvZC1L63iXrOE8D3w9v9CYZBP5ngyt20cPn7BHMJlPzeorf5ZtTrzS//HoT3XwaOj/X55fbpeCCXYKTWZOCtqN/zqvB32IPgCuRB4fKn2ff3tx5oVtnfi37q5kcthATj7gsJ/iOO5ptdQKcAN5nZfIIPk3SCy+YrMt2DbqfyRhFMVlTyelsJhlo4xMz+YWanAlWNzjoMeN+DQc0KgSkEH8wARQQD+5VhZlkEHxz/Dhc9UX6dGJwS/swjaAn0JZiEBOBLD8aeh2Bion7Af8Pf1Tige9R2SgYcnEPwu47Fax4M4LaI4AP09XD5oqhtnGDBcZZFBIP79Qdw9yUE+/sScKkHc4KUl19um/+Oer1YajyQ53/swRwlRQQt0W9VsM4X7j4/vB39+1oITDGzsQShIfVAxxAS03SCceaPJ5hToIQBP3T3T6NXNrMRFWxjdyXbNsoNxevuW83sSOA7BF055wKX7qe+iob3LZHrFR83+MbrHgAD/uzuD5ZZGHT57C633lvuPrqS7ZSMq1NE7P/H8gDcvdjMCjz8akzQMkkxs3TgnwTf+NeY2R8IArvEEcA24KBKtl9+m9GvV1JjIWW7kdOr+fzyyr8fFb0/0WMQFQElB/ZPJ/gScAbwOzPr7/vmQJA6ohZCYpoE3Orui8otfwO4xswMwMwGh8t3Ai1j3PabwNUld8ystZm1A5Lc/TngdwRDN+/PTOC4sJ85maA18+/9PcGDGcS2WzD4H8CYGOuN9gZwadTxgM7hcYfyZgDHmFmvcL3mZnZYFduuzu+wIiUfzpvD+kpG+8TMziYI9m8D99qBn8GziuD4SpKZdSWYibAmhlswAnESwQB1H8bypHD9ru7+HsGESK2AzBrWIjFQICQgd1/r7n+v4K
HbCPqFF1pwyuZt4fL3gH7hAdHzqtj8n4DW4YHPBcAJBKNTvh92rzwK/LqK+jaE67wHLADmunssQxxfAtxvwUHlirqz9svd3wSeBD4Ku2WepYIPcXfPIehvn2pmCwkCom8Vm38J+EFFB5VjrG0bMIGgi+b/CIZ8JwzbvwDj3f0zglEwK3pvY/Ff4IvwNf5K0G1WEx+FtS0Ot/tCjM9LBiaH78E8gvmTt9WwFomBRjsVERFALQQREQkpEEREBFAgiIhISIEgIiKAAkFEREIKBBERARQIIiISUiCIiAgA/z8DU+h5A0EqJgAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"metrics = [evaluate_dt(train_dt, test_dt, 5, param) for param in params]\n",
"\n",
"print (params)\n",
"\n",
"print (metrics)\n",
"\n",
"plot(params, metrics)\n",
"pyplot.xlabel('Maximum bins')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
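Reading the best setting off two printed lists is error-prone. The sketch below picks the value with the lowest error, using the `maxBins` results printed above; lower is better because the metric is an error. `best_param` is a hypothetical helper.

```python
def best_param(params, metrics):
    """Return (param, metric) for the lowest error; ties go to the first value."""
    if len(params) != len(metrics):
        raise ValueError("params and metrics must have the same length")
    best_index = min(range(len(metrics)), key=metrics.__getitem__)
    return params[best_index], metrics[best_index]


bins = [2, 4, 8, 16, 32, 64, 100]
bin_metrics = [0.22578199542260993, 0.22626606160811255, 0.20380255431723798,
               0.2076920210675261, 0.212163652773227, 0.21000218813883056,
               0.2228581552832826]
print(best_param(bins, bin_metrics))  # lowest error at maxBins=8
```

These results come from a single random train/test split, and the gap between settings is small (about 0.02). Part of the difference may reflect the particular split rather than `maxBins` itself.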
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Boosted Tree"
]
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.mllib.tree import GradientBoostedTrees, GradientBoostedTreesModel\n"
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {},
"outputs": [],
"source": [
"def extract_label(record):\n",
" return float(record[-1])"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [],
"source": [
"data_gbt = records.map(lambda r: LabeledPoint(extract_label(r),extract_features_dt(r)))"
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {},
"outputs": [],
"source": [
"(traindata, testData) = data_gbt.randomSplit([0.7, 0.3])"
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Gradient BOOSTED predictions: [(307000.0, 265005.81171157816), (118000.0, 133611.14615160643), (279500.0, 188655.2935917073), (149000.0, 126758.74131085054), (139000.0, 129323.12870531864)]\n"
]
}
],
"source": [
"model = GradientBoostedTrees.trainRegressor(traindata,\n",
" categoricalFeaturesInfo={}, numIterations=3)\n",
"preds = model.predict(testData.map(lambda p: p.features))\n",
"actual = testData.map(lambda p: p.label)\n",
"true_vs_predicted_GBT = actual.zip(preds)\n",
"print(\"Gradient BOOSTED predictions: \" + str(true_vs_predicted_GBT.take(5)))"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"413\n",
"Mean Squared Error: 1852793539.9857\n",
"Mean Absolute Error: 29263.6311\n",
"Root Mean Squared Log Error: 0.2392\n"
]
}
],
"source": [
"# Accumulate squared, absolute and squared-log errors over the (actual, predicted) pairs\n",
"sq_errors = []\n",
"abs_errors = []\n",
"sq_log_errors = []\n",
"for real, predict in true_vs_predicted_GBT.collect():\n",
"    sq_errors.append((predict - real)**2)\n",
"    abs_errors.append(np.abs(predict - real))\n",
"    sq_log_errors.append((np.log(predict + 1) - np.log(real + 1))**2)\n",
"value_len = len(sq_errors)\n",
"print(value_len)\n",
"\n",
"mse = sum(sq_errors) / value_len\n",
"mae = sum(abs_errors) / value_len\n",
"rmsle = np.sqrt(sum(sq_log_errors) / value_len)\n",
"print(\"Mean Squared Error: %2.4f\" % mse)\n",
"print(\"Mean Absolute Error: %2.4f\" % mae)\n",
"print(\"Root Mean Squared Log Error: %2.4f\" % rmsle)"
]
},
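The loop above accumulates the three error measures one pair at a time. Below is a vectorised NumPy sketch of the same formulas. It assumes the `(actual, predicted)` pairs have already been collected to the driver as a list. `regression_errors` is a hypothetical helper, and the sample pairs are the first three predictions printed above.

```python
import numpy as np


def regression_errors(pairs):
    """Return (MSE, MAE, RMSLE) for a list of (actual, predicted) pairs."""
    actual, predicted = np.asarray(pairs, dtype=float).T
    diff = predicted - actual
    mse = np.mean(diff ** 2)
    mae = np.mean(np.abs(diff))
    # log1p(x) == log(x + 1); predictions below -1 give NaN, as in the loop version
    rmsle = np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))
    return mse, mae, rmsle


sample = [(307000.0, 265005.81171157816),
          (118000.0, 133611.14615160643),
          (279500.0, 188655.2935917073)]
mse, mae, rmsle = regression_errors(sample)
print("MSE: %2.4f  MAE: %2.4f  RMSLE: %2.4f" % (mse, mae, rmsle))
```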
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [],
"source": [
"def evaluate_dt(traindata, categoricalFeaturesInfo, loss, numIterations, maxDepth, maxBins):\n",
"    # Note: this redefines the decision-tree evaluate_dt used above. It trains a\n",
"    # gradient-boosted tree on `traindata` and returns the RMSLE on the global `testData` split.\n",
"    model = GradientBoostedTrees.trainRegressor(traindata, categoricalFeaturesInfo,\n",
"                                                loss=loss, numIterations=numIterations,\n",
"                                                maxDepth=maxDepth, maxBins=maxBins)\n",
"\n",
"    preds = model.predict(testData.map(lambda p: p.features))\n",
"    actual = testData.map(lambda p: p.label)\n",
"\n",
"    # Root mean squared log error over the (actual, predicted) pairs\n",
"    sq_log_errors = [(np.log(pred + 1) - np.log(act + 1))**2\n",
"                     for act, pred in actual.zip(preds).collect()]\n",
"    return np.sqrt(sum(sq_log_errors) / len(sq_log_errors))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Boosted Tree Iterations"
]
},
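`numIterations` is the number of trees the booster adds. With squared-error loss, each new tree is fitted to the current residuals and added with a learning rate (Spark's `learningRate`, default 0.1). In the sketch below, each added stump can only lower or keep the training error; on held-out data the error may flatten or rise as iterations grow.

The sketch is a toy NumPy illustration of the idea using one-split regression stumps, not Spark's implementation. `fit_stump` and `boost` are hypothetical names.

```python
import numpy as np


def fit_stump(x, r):
    """Best single split on 1-D feature x for residuals r (least squares)."""
    best = (np.inf, None, r.mean(), r.mean())
    for t in np.unique(x)[:-1]:  # candidate thresholds between observed values
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_val, right_val = best
    if t is None:  # constant feature: no split possible
        return lambda z: np.full(len(z), left_val)
    return lambda z: np.where(z <= t, left_val, right_val)


def boost(x, y, num_iterations, learning_rate=0.1):
    """Least-squares gradient boosting; returns training MSE after each iteration."""
    pred = np.full(len(y), y.mean())  # start from the mean prediction
    history = []
    for _ in range(num_iterations):
        residual = y - pred  # negative gradient of the squared loss
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * stump(x)
        history.append(np.mean((y - pred) ** 2))
    return history


rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = np.sin(x) + rng.normal(scale=0.2, size=200)
mse = boost(x, y, num_iterations=50)
print([round(float(mse[i]), 4) for i in (0, 9, 49)])
```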
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[0.25905666523741905, 0.2590563768733536, 0.25905580014870655, 0.25905464671334816, 0.259052339898376, 0.2590477264914201, 0.25904253676400585]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZYAAAEVCAYAAADD3MPgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xl8VdW5//HPkxEIcwiITGGUwQmJU6syaBW9FrTiFeqALS21am2rbdXbXm+rbX9F20tvq9SitFVbBbS24lCtCjgiEFRABCQMyiSEeSYkeX5/nBU8xHOSE0hyMnzfr1de7LP2Ws9a++SQJ2vtnb3N3REREakuKckegIiINCxKLCIiUq2UWEREpFopsYiISLVSYhERkWqlxCIiItVKiSUGM7vPzJaZ2SIz+4eZtY5Tb42ZLTaz980sP6r8FDObE/Y9a2Ytq2FMN5tZgZm5mbU71ngiIjWl0ScWMxtiZn8pV/wycKK7nwx8BNxZQYih7n6qu+dFlT0M3OHuJwH/AH5YDUN9C7gA+LgaYomI1JhGn1hicfd/u3txePkO0LmKIU4AXg/bLwNXAJhZapgNzQ+zoW9VYUzvufuaKo5DRKTWKbFU7uvAv+Lsc+DfZrbAzMZHlX8AjAjbVwJdwvY4YKe7nw6cDnzTzLrXwJhFRJImLdkDSBYzmwtkAs2Btmb2fth1u7u/FOr8GCgG/hYnzBfdfYOZtQdeNrNl7v46kWT0OzO7C5gBFIX6FwInm9mo8LoV0NvMPgEWxunja+4+/+iPVESkdjXaxOLuZ0LkHAtwvbtfH73fzMYClwLne5wbqrn7hvDvZjP7B3AG8Lq7LyOSRDCzPsB/lIUFvlOWuMo58ViPSUSkLtBSWAxmNhy4HRjh7vvi1MkysxZl20QSyQfhdfvwbwrwE+DB0Owl4Ntmlh729wltRUQaDCWW2O4HWhBZ3nrfzB4EMLPjzeyFUKcD8KaZLQTmAc+7+4th3xgz+whYBmwA/hzKHwY+BN41sw+AP5LgrNHMbjGzdUQuJFhkZg8f81GKiNQA023zRUSkOmnGIiIi1UqJRUREqlWjvCqsXbt2npubm+xhiIjUKwsWLNji7jmV1WuUiSU3N5f8/PzKK4qIyGFmltAtpbQUJiIi1UqJRUREqpUSi4iIVCslFhERqVZKLCIiUq2UWEREpFo1ysuNj1ZxSSkvf7iJ1s0yaJOVTttmGbRulkFGmvKziEgZJZYq2L7vEN/+27ufK8/KSD2cbNqEZNO2WXqkrFk6bbIyPttulkGbrAyyMlIxsyQchYhIzVJiqYLWzdJ54ZZz2bGviO37DrFtXxE79ka2I2WR7bXb9rFtbxG7DhTHjZWeaoeTTXTSad0sg7ZZZWVH7m/dLIPUFCUjEanblFiqID01hf7Ht0y4fnFJKTv3H4pKPIciySdGMlq9ZS/v7tvB9r1FFJfGv+N0q6bpMZNR2czoiGQUZlBN0lOr4/BFRBKixFKD0lJTyG6eSXbzzITbuDt7DhazoywJlSWgvUVsi0pQO/YVUbjnIB9t2sOOfUXsLSqJG7NJesrh80FtsmInpbZZGXRt24wubZuRnqpzRiJy9JRY6hgzo0WTdFo0SadL22YJtztYXPJZMtp75Axpx74itu39bIa0cceuSPn+Q5R/HE9qitG1bTN6tMuie7ssuudk0aNdc3rkZNG+RabOC4lIpZRYGojMtFQ6tEylQ8smCbcpLXV2HTjEtr1FbNtbxJqt+1i9ZQ+rt+xlVeFe3izYwsHi0sP1szJS6Z6TRfd2zeneLoueOSH5tMuiRZP0mjgsEamHEn0s7nDg/4BU4GF3/1W5/ZnAo8AgYCtwlbuvCfvuBMYBJcAt7v5SRTHNrDswFWgLvAtc6+5FZnY9cB+wPnR7v7s/HNp0JfLY3y6AA5eU9S/xpaRELiBo3SyDHjmQl9v2iP2lpc7GXQdYXbiXVVv2sKpwL6
u37OX9tdt5btGGI2Y77Zpn0iMn6/BMp0dOJPl0bdtMl2OLNDKVJhYzSwUeAL4ErAPmm9kMd/8wqto4YLu79zKz0cAE4Coz6w+MBgYAxwOvmFmf0CZezAnARHefGp41Pw74Q2gzzd1vjjHMR4FfuPvLZtYcKI1RR6ooJcXo1LopnVo35Zze7Y7Yd+BQCWu37WNlSDarQ+J5+cNNbN1bdLheaorRpU3TMLNp/lnyycniuJZNtLQm0gAlMmM5Ayhw91UAZjYVGAlEJ5aRwE/D9lPA/Rb5iTESmOruB4HVZlYQ4hErppktBYYBXw11HglxyxLL54TklebuLwO4+54EjkmOUZP0VHp3aEHvDi0+t2/nvkOsCktqq7fsZVVYWpuzaisHDn2W85ump0adx8miR9QyW6umWloTqa8SSSydgLVRr9cBZ8ar4+7FZrYTyA7l75Rr2ylsx4qZDexw9+IY9QGuMLPzgI+A77v7WqAPsMPMnga6A68Ad7h7/MukpEa1apbOwK5tGNi1zRHlpaXOpt0HWFUYSTZlS2wfrN/JvxZvpPSIpbWMw+dvypbVerTLomt2MzLTdPm0SF2WSGKJtVZR/g8t4tWJVx5r0b2i+gDPAk+4+0Ezu4HIbGYYkWM4FxgIfAJMA64HphwxQLPxwHiArl27xuhGalpKitGxVVM6tmrKF3sdubRWVFzKJ9v2Hj6PU/bvzGWFTM9f91kMg85tmh1OOj3LZjk5WXRs2YQU/QGpSNIlkljWETkpXqYzsCFOnXVmlga0ArZV0jZW+RagtZmlhVnL4fruvjWq/kNEzsWU9f1e1LLaP4GzKJdY3H0yMBkgLy8v/l8gSlJkpKXQq30LerX//NLargOHWF0YvawWWWabv2Yb+6L+fqdJegq52VnhPE64cq19c048viVp+tsckVqTSGKZD/QOV2utJ3Iy/qvl6swAxgJzgFHATHd3M5sBPG5m/0vk5H1vYB6RmcnnYoY2s0KMqSHmMwBm1tHdN4b+RgBLo8bXxsxy3L2QyCxGD7RvQFo2SeeULq05pUvrI8rdnU27Dh4+n1M2y1m6cTcvLdlESVhba9kkjSEntOf8fu0Z0qc9rZrp/I1ITao0sYRzJjcDLxG5NPhP7r7EzO4G8t19BpHZwWPh5Pw2IomCUG86kRP9xcBNZec+YsUMXd4OTDWznwPv8dnM4xYzGxHibCOy3IW7l5jZD4BXwwUDC4jMaKSBMzOOa9WE41o14Qs9P7+0tnb7PpZu3MXs5YXMWraZGQs3kJpi5HVrwwX9OjCsX3t65jRP0uhFGi7z8n963Qjk5eV5fr4mNY1JSanz/todzFy2iVeXbmbZp7sB6N4ui2F9I7OZ03Pb6nY2IhUwswXunldpPSUWaYzWbd/HzGWbeXXpZuas3EpRSSktmqQxuE8OF/TrwOA+ObTJykj2MEXqFCWWCiixSLS9B4t5Y8UWZi7bxMxlm9myp4gUg7xubTm/X2Q20zOnuf6YUxo9JZYKKLFIPKWlzsJ1O5i5bDOvLN3M0o27AOiW3Yzz+3Y4vGSm29RIY6TEUgElFknUhh37eXXZZmYu3cRbK7dSVFxKi8w0zuuTE7nK7IT2tNWSmTQSSiwVUGKRo7GvqJg3V2yJnJtZtpnC3QdJMTitaxvO7xeZzfRuryUzabiUWCqgxCLHqrTU+WDDTl5ZuplXl25iyYbIklmXtk0PL5md2T1bS2bSoCixVECJRarbpzsP8OqyTcxcuvnwc2yaZ6Zxbu92nN+vA0NPyKnSk0RF6iIllgoosUhN2l9UwlsFWyLnZpZtYtOug5jBwC6tDy+ZndChhZbMpN5RYqmAEovUFndnyYZdvLI0cinzonU7AejUumm4lLkDZ/Voqzs2S72gxFIBJRZJlk27Dhz+w8w3Cwo5cKiUZhmpUUtm7clpoSUzqZuUWCqgxCJ1wYFDJby9cguvLt3MzGWb2bjzAGZwSufWXNCvPcP6dqBfRy2ZSd2hxFIBJR
apa9ydDzfu4tVwldnCqCWzYX3bM6xfe87ukU2TdC2ZSfIosVRAiUXqus27DzArLJm9sWIL+w+V0DQ9lXN6t+OCfu0Z2rc97Vs0SfYwpZFRYqmAEovUJwcOlfDOqq2HZzMbdh4gNcW49qxu3HphH1o20fNlpHYosVRAiUXqK3dn2ae7+es7H/P4vE/Izsrkzov78pXTOulcjNS4RBOL/ixYpB4xM/p1bMkvLj+JZ28+hy5tm3Lbkwu58sE5fBj++l8k2ZRYROqpEzu14u83fIF7R53Mqi17ufT3b/DTGUvYuf9QsocmjZwSi0g9lpJi/GdeF2bdNoRrzurGo3PWcP5vZvNk/lpKSxvfMrfUDUosIg1Aq2bp3D3yRGbcfA5d2zbjh08tYtSDb/PB+p3JHpo0QgklFjMbbmbLzazAzO6IsT/TzKaF/XPNLDdq352hfLmZXVRZTDPrHmKsCDEzQvn1ZlZoZu+Hr2+UG0NLM1tvZvdX/W0QaRhO7NSKp274AveNOpmPt+5jxP1vctczH7Bzn5bHpPZUmljMLBV4ALgY6A+MMbP+5aqNA7a7ey9gIjAhtO0PjAYGAMOBSWaWWknMCcBEd+8NbA+xy0xz91PD18PlxnAP8FqCxy3SYKWkGFfmdWHmD4Zw3dm5/PWdjxn2m9lMn6/lMakdicxYzgAK3H2VuxcBU4GR5eqMBB4J208B51vk2seRwFR3P+juq4GCEC9mzNBmWIhBiHlZZQM0s0FAB+DfCRyPSKPQqmk6Px0xgGe/cw7d22Xxo78v4gotj0ktSCSxdALWRr1eF8pi1nH3YmAnkF1B23jl2cCOECNWX1eY2SIze8rMugCYWQrwG+CHCRyLSKMz4PhWPHnD2fzmylNYu20/X77/TX7yz8Xs2FeU7KFJA5VIYon1V1fl59Px6lRXOcCzQK67nwy8wmczpBuBF9x9bYy2nw3QbLyZ5ZtZfmFhYUVVRRocM+OKQZ2Z+YPBXP+FXB6f+wlDfz2bqfM+0fKYVLtEEss6oEvU687Ahnh1zCwNaAVsq6BtvPItQOsQ44i+3H2rux8M5Q8Bg8L22cDNZrYG+DVwnZn9qvxBuPtkd89z97ycnJwEDluk4WnZJJ3/+fIAnr/lXHq1b84dTy/m8j+8zaJ1O5I9NGlAEkks84He4WqtDCIn42eUqzMDGBu2RwEzPXKvmBnA6HDVWHegNzAvXszQZlaIQYj5DICZdYzqbwSwFMDdr3b3ru6eC/wAeNTdP3flmoh8pl/Hlkz/1tlMvOoUNuzYz8gH3uK//rGY7Xu1PCbHLq2yCu5ebGY3Ay8BqcCf3H2Jmd0N5Lv7DGAK8JiZFRCZqYwObZeY2XTgQ6AYuMndSwBixQxd3g5MNbOfA++F2AC3mNmIEGcbcP0xH71II2ZmXD6wMxf068BvX1nBX95ewwuLN/Kji/py1eldSE3Rvcfk6OgmlCICwLJPd3HXM0uYt3obJ3duxT0jT+SULq2TPSypQ3QTShGpkr7HtWTa+LP4v9Gn8unOA1w26S3ufHoR27Q8JlWkxCIih5kZI0/txKu3DWbcF7szPX8dw34zm7++8zElunpMEqTEIiKf06JJOj+5tD//+u659D2uBT/55wdc9sBbvPfJ9mQPTeoBJRYRiatPhxY88c2z+N2YgWzefYDLJ73N7U8tYuueg5U3lkZLiUVEKmRmjDjleF69bQjjz+vB399dx7DfvMZjc9ZoeUxiUmIRkYQ0z0zjvy7px7++ey4Djm/Jfz+zhJEPvMmCj7U8JkdSYhGRKundoQV/+8aZ/H7MQLbsLuKKP7zND59cyBYtj0mgxCIiVWZmfPmU43n1tsF8a3AP/vHeeob9ejaPztHymCixiMgxyMpM486L+/Hi987j5M6tueuZJXz592+y4ONtyR6aJJESi4gcs17tm/PYuDOYdPVpbN9XxBV/mMNt0xdSuFvLY42REouIVAsz45KTOvLKrYP59pCezFi4nmG/mc2f31
pNcUlpsocntUiJRUSqVVZmGrcP78uL3zuPU7u05mfPfsilv3+T+Wu0PNZYKLGISI3omdOcR79+Bg9ecxq79h/iygfncOu099m8+0CyhyY1TIlFRGqMmTH8xI68cttgbhrak+cWbeT8X7/GlDe1PNaQKbGISI1rlpHGDy/qy4vfO5eB3dpwz3OR5bG5q7Yme2hSA5RYRKTW9MhpziNfO50HrxnE7gPFXDX5Hb439T0279LyWEOixCIitSqyPHYcr9w6mO8M68ULiz9l2G9e4+E3VnFIy2MNghKLiCRF04xUbrvwBF76/nnk5bbh588v5cu/f1Mn9xsAJRYRSaru7bL48/Wn88drB/Hx1n2Mf3QBBw6VJHtYcgwSSixmNtzMlptZgZndEWN/pplNC/vnmllu1L47Q/lyM7uosphm1j3EWBFiZoTy682s0MzeD1/fCOWnmtkcM1tiZovM7KqjfztEJBnMjIsGHMfEq07h/bU7uP3vi3DXPcfqq0oTi5mlAg8AFwP9gTFm1r9ctXHAdnfvBUwEJoS2/YHRwABgODDJzFIriTkBmOjuvYHtIXaZae5+avh6OJTtA65z97I+fmtmrav0LohInTD8xI788KITeOb9DUyavTLZw5GjlMiM5QygwN1XuXsRMBUYWa7OSOCRsP0UcL6ZWSif6u4H3X01UBDixYwZ2gwLMQgxL6tocO7+kbuvCNsbgM1ATgLHJSJ10I1DenLZqcdz30vLefGDjckejhyFRBJLJ2Bt1Ot1oSxmHXcvBnYC2RW0jVeeDewIMWL1dUVY7nrKzLqUH6iZnQFkAPpVR6SeMjN+dcXJDOzamu9PW8gH63cme0hSRYkkFotRVn7xM16d6ioHeBbIdfeTgVf4bIYUGYBZR+Ax4Gvu/rlrFs1svJnlm1l+YWFhjG5EpK5okp7K5GvzaNMsnW88kq+/c6lnEkks64Do2UFnYEO8OmaWBrQCtlXQNl75FqB1iHFEX+6+1d3L7sH9EDCorLGZtQSeB37i7u/EOgh3n+zuee6el5OjlTKRui6nRSYPjz2dXQcO8c1H83WlWD2SSGKZD/QOV2tlEDkZP6NcnRnA2LA9CpjpkUs6ZgCjw1Vj3YHewLx4MUObWSEGIeYzcHhGUmYEsDSUZwD/AB519ycTP3QRqev6H9+SiVedyqL1O/nBkwt1pVg9UWliCec7bgZeIvLDfLq7LzGzu81sRKg2Bcg2swLgVuCO0HYJMB34EHgRuMndS+LFDLFuB24NsbJDbIBbwiXFC4FbgOtD+X8C5wHXR12KfOpRvh8iUsdcNOA4fnRRX55btJHfvVqQ7OFIAqwx/gaQl5fn+fn5yR6GiCTI3bntyYU8/e567v/qQC49+fhkD6lRMrMF7p5XWT395b2I1Hlmxv/7yknkdWvDbdMXsmjdjmQPSSqgxCIi9UJmWioPXjuIds0z+eaj+Xy6U1eK1VVKLCJSb7RrnsmU6/PYc6CYbz6az/4iXSlWFymxiEi90ve4lvxuzEA+2LCT2558n9LSxneeuK5TYhGReuf8fh34r4v78cLiT/ntqyuSPRwpJ63yKiIidc83zu3OR5t287tXV9AzJ4uRp5a/05Qki2YsIlIvmRk/v/xEzshtyw+fWsR7n2xP9pAkUGIRkXorMy2VP1xzGh1aZjL+sQVs2LE/2UMSlFhEpJ7Lbp7JlLGns7+ohG88ks++ouLKG0mNUmIRkXqvT4cW/P6rA1n26S6+P01XiiWbEouINAhDT2jPj/+jPy8t2cT/vvxRsofTqOmqMBFpML7+xVwKNu/m/lkF9GrfnMsG6kqxZNCMRUQaDDPjZyNO5KwebfnR3xex4GNdKZYMSiwi0qBkpKXwh6sH0bFVE771WD7rtu9L9pAaHSUWEWlw2mRlMGVsHgcPlfKNR/LZe1BXitUmJRYRaZB6tW/B/VefxkebdvPdqbpSrDYpsYhIgzW4Tw53XdqfV5Zu4t6Xlid7OI2GrgoTkQZt7B
dyWbF5Dw++tpJe7ZszalDnZA+pwdOMRUQaNDPjpyMG8IWe2fzX04vJX7Mt2UNq8BJKLGY23MyWm1mBmd0RY3+mmU0L++eaWW7UvjtD+XIzu6iymGbWPcRYEWJmhPLrzazQzN4PX9+IajM21F9hZmOP7q0QkYYqPTWFSVefRqc2TfnWYwtYu01XitWkShOLmaUCDwAXA/2BMWbWv1y1ccB2d+8FTAQmhLb9gdHAAGA4MMnMUiuJOQGY6O69ge0hdplp7n5q+Ho49NEW+B/gTOAM4H/MrE0V3wcRaeBaN8vg4bF5HCqJXCm2R1eK1ZhEZixnAAXuvsrdi4CpwMhydUYCj4Ttp4DzzcxC+VR3P+juq4GCEC9mzNBmWIhBiHlZJeO7CHjZ3be5+3bgZSJJTETkCD1zmjPp6kEUFO7hu0+8R4muFKsRiSSWTsDaqNfrQlnMOu5eDOwEsitoG688G9gRYsTq6wozW2RmT5lZlyqMT0QEgHN6t+OnX+7Pq8s2M+HFZckeToOUSGKxGGXl03y8OtVVDvAskOvuJwOv8NkMKZHxYWbjzSzfzPILCwtjNBGRxuLas3O57uxuTH59FdPnr628gVRJIollHdAl6nVnYEO8OmaWBrQCtlXQNl75FqB1iHFEX+6+1d0PhvKHgEFVGB/uPtnd89w9Lycnp5JDFpGG7q5L+3NOr3b8+J+Lmbtqa7KH06AkkljmA73D1VoZRE7GzyhXZwZQdjXWKGCmu3soHx2uGusO9AbmxYsZ2swKMQgxnwEws45R/Y0Alobtl4ALzaxNOGl/YSgTEYkrLTWFB756Gl3aNuOGvy7gk626Uqy6VJpYwvmOm4n8sF4KTHf3JWZ2t5mNCNWmANlmVgDcCtwR2i4BpgMfAi8CN7l7SbyYIdbtwK0hVnaIDXCLmS0xs4XALcD1oY9twD1EktV84O5QJiJSoVbN0pky9nRKHcY9Mp9dBw4le0gNgkUmCY1LXl6e5+fnJ3sYIlJHvL1yC9dNmcc5vdsxZezppKbEOnUrZrbA3fMqq6e/vBeRRu8LPdvxs5EDmL28kF++sLTyBlIh3StMRAS4+sxuFGzew5Q3V9OrfXPGnNE12UOqtzRjEREJfnxJPwb3yeG///kBc1bqSrGjpcQiIhKkpabw+68OJLddFt/+2wLWbNmb7CHVS0osIiJRWjZJZ8rYyPnpcY/MZ+d+XSlWVUosIiLldMvO4sFrBvHx1n3c/Pi7FJeUJntI9YoSi4hIDGf1yOYXl5/IGyu28PPndaVYVeiqMBGROK46vSsrNu3h4TdX07N9c649q1uyh1QvaMYiIlKBOy/px9ATcvjpjCW8uWJLsodTLyixiIhUIDXF+N2YgfTMyeLGvy1gVeGeZA+pzlNiERGpRIsmkXuKpaWm8I1H8tm5T1eKVUSJRUQkAV3aNuOP1w5i7fZ93Pj4Ag7pSrG4lFhERBJ0em5bfnn5SbxVsJW7n/0w2cOps3RVmIhIFVyZ14WCzXv44+ur6N2hOdednZvsIdU5SiwiIlX0o+F9WVm4h589+yG52Vmc10dPpY2mpTARkSpKTTF+O3ogvds356bH36Vgs64Ui6bEIiJyFJpnpvHw2Dwy01IY98h8tu8tSvaQ6gwlFhGRo9S5TeRKsY07DvDtvy2gqFhXioESi4jIMRnUrS2/uuIk3lm1jf+ZsYTG+Lj38hJKLGY23MyWm1mBmd0RY3+mmU0L++eaWW7UvjtD+XIzu6iymGbWPcRYEWJmlOtrlJm5meWF1+lm9oiZLTazpWZ2Z9XfBhGRo/eV0zpz45CePDHvE/7y9ppkDyfpKk0sZpYKPABcDPQHxphZ/3LVxgHb3b0XMBGYENr2B0YDA4DhwCQzS60k5gRgorv3BraH2GVjaQHcAsyN6vtKINPdTwIGAd+KTmwiIrXhBxeewIX9O3DPcx8ye/nmZA8nqRKZsZwBFLj7KncvAqYCI8vVGQk8ErafAs43MwvlU939oLuvBg
pCvJgxQ5thIQYh5mVR/dwD3AsciCpzIMvM0oCmQBGwK4HjEhGpNikpxsSrTuWE41ryncffY8Wm3ckeUtIkklg6AWujXq8LZTHruHsxsBPIrqBtvPJsYEeIcURfZjYQ6OLuz5Xr+ylgL7AR+AT4tbtvS+C4RESqVVbZlWLpqYx7JJ9tjfRKsUQSi8UoK392Kl6daik3sxQiS2y3xdh/BlACHA90B24zsx7lK5nZeDPLN7P8wsLCGGFERI5dp9ZNmXzdID7ddYAb/to4rxRLJLGsA7pEve4MbIhXJyxJtQK2VdA2XvkWoHWIEV3eAjgRmG1ma4CzgBnhBP5XgRfd/ZC7bwbeAvLKH4S7T3b3PHfPy8nRX8mKSM05rWsb7ht1MvNWb+Mn/1zc6K4USySxzAd6h6u1MoicjJ9Rrs4MYGzYHgXM9Mg7OQMYHa4a6w70BubFixnazAoxCDGfcfed7t7O3XPdPRd4Bxjh7vlElr+GWUQWkaSz7CjeCxGRajPy1E58Z1gvpuevY8qbq5M9nFpVaWIJ5ztuBl4ClgLT3X2Jmd1tZiNCtSlAtpkVALcCd4S2S4DpwIfAi8BN7l4SL2aIdTtwa4iVHWJX5AGgOfABkYT1Z3dflNDRi4jUoO9f0IeLTzyOX76wlJWN6AFh1timaAB5eXmen5+f7GGISCNQuPsg50yYyYhTjue+K09J9nCOiZktcPfPnWooT395LyJSg3JaZDL69C784731rN+xP9nDqRVKLCIiNWz84J4ATH5tZZJHUjuUWEREalin1k25fGAnps5fS+Hug8keTo1TYhERqQXfHtKTopJS/vRWw79CTIlFRKQW9MhpziUndeSxOR+zc9+hZA+nRimxiIjUkpuG9GLPwWIenbMm2UOpUUosIiK1pP/xLRnWtz1/ems1ew8WV96gnlJiERGpRTcN7cn2fYd4Yt4nyR5KjVFiERGpRYO6teWsHm156I1VHCwuSfZwaoQSi4hILbtpaC827TrI3xesT/ZQaoQSi4hILTunVztO6dyKB19bSXFJw7utvhKLiEgtMzNuHNqLT7bt47lFG5M9nGqnxCIikgRf6teBPh2aM2l2AaWlDetmwEosIiJJkJJi3DikFx9t2sPLSzdH5ZhwAAAUOElEQVQlezjVSolFRCRJLj25I13bNmPSrIIG9ZRJJRYRkSRJS03hhsE9WbhuJ28VbE32cKqNEouISBJdMagTHVpmcv+sFckeSrVRYhERSaLMtFS+eW4P3lm1jQUfb0v2cKqFEouISJJ99cyutGmWzgOzGsaDwBJKLGY23MyWm1mBmd0RY3+mmU0L++eaWW7UvjtD+XIzu6iymGbWPcRYEWJmlOtrlJm5meVFlZ1sZnPMbImZLTazJlV7G0REkqdZRhpf/2J3Zi7bzJINO5M9nGNWaWIxs1TgAeBioD8wxsz6l6s2Dtju7r2AicCE0LY/MBoYAAwHJplZaiUxJwAT3b03sD3ELhtLC+AWYG5UWRrwV+AGdx8ADAEa9sMORKTBue7sXJpnpjFpdv2ftSQyYzkDKHD3Ve5eBEwFRparMxJ4JGw/BZxvZhbKp7r7QXdfDRSEeDFjhjbDQgxCzMui+rkHuBc4EFV2IbDI3RcCuPtWd2+Yd3YTkQarVbN0rjmrGy8s3siqwj3JHs4xSSSxdALWRr1eF8pi1nH3YmAnkF1B23jl2cCOEOOIvsxsINDF3Z8r13cfwM3sJTN718x+lMAxiYjUOePO6U5GagoPvla/Zy2JJBaLUVb+L3ni1amWcjNLIbLEdluM/WnAOcDV4d/Lzez88pXMbLyZ5ZtZfmFhYYwwIiLJldMik9Gnd+Hpd9ezfsf+ZA/nqCWSWNYBXaJedwY2xKsTznm0ArZV0DZe+RagdYgRXd4COBGYbWZrgLOAGeEE/jrgNXff4u77gBeA08ofhLtPdvc8d8/LyclJ4LBFRGrf+ME9AXjo9VVJHsnRSySxzAd6h6u1MoicjJ9Rrs
4MYGzYHgXM9Mj9CWYAo8NVY92B3sC8eDFDm1khBiHmM+6+093buXuuu+cC7wAj3D0feAk42cyahYQ0GPjwKN4LEZGk69S6KZcP7MQT8z6hcPfBZA/nqFSaWML5jpuJ/ABfCkx39yVmdreZjQjVpgDZZlYA3ArcEdouAaYT+UH/InCTu5fEixli3Q7cGmJlh9gVjW878L9EktX7wLvu/nyib4CISF3z7SE9KSop5U9vrU72UI6KNaQbnyUqLy/P8/Pzkz0MEZG4bnr8XV5bXshbdwyjVdP0ZA8HADNb4O55ldXTX96LiNRBNw7pyZ6DxTz69ppkD6XKlFhEROqgAce3Yljf9vzprdXsKyquvEEdosQiIlJH3TS0J9v3HeLxuZ8keyhVosQiIlJHDerWlrN6tOWhN1ZxsLj+3FBEiUVEpA67aWgvNu06yNPvrk/2UBKmxCIiUoed06sdp3RuxR9mr6S4pDTZw0mIEouISB1mZtw4tBefbNvH84s3Jns4CVFiERGp477UrwN9OjTngVkFlJbW/b89VGIREanjUlKMG4f04qNNe3hl6aZkD6dSSiwiIvXApSd3pGvbZjwwq4C6fscUJRYRkXogLTWFGwb3ZOG6nbxVsDXZw6mQEouISD1xxaBOdGiZyQOzCpI9lAopsYiI1BOZaal889wezFm1lQUfb0/2cOJSYhERqUfGnNGVNs3SmVSHZy1KLCIi9UhWZhpf+2J3Xl22mQ837Er2cGJSYhERqWfGnp1L88w0Js2um7MWJRYRkXqmVbN0rjmrG88v3siqwj3JHs7nKLGIiNRD487pTkZqCg++tjLZQ/kcJRYRkXoop0Umo0/vwtPvrmf9jv3JHs4REkosZjbczJabWYGZ3RFjf6aZTQv755pZbtS+O0P5cjO7qLKYZtY9xFgRYmaU62uUmbmZ5ZUr72pme8zsB4kfvohI/TV+cE8AHnp9VZJHcqRKE4uZpQIPABcD/YExZta/XLVxwHZ37wVMBCaEtv2B0cAAYDgwycxSK4k5AZjo7r2B7SF22VhaALcAc2MMdSLwr0QOWkSkIejUuimXD+zEE/M+Ycueg8kezmGJzFjOAArcfZW7FwFTgZHl6owEHgnbTwHnm5mF8qnuftDdVwMFIV7MmKHNsBCDEPOyqH7uAe4FDkR3bmaXAauAJQkcj4hIg3HDkJ4UlZTypzdXJ3sohyWSWDoBa6NerwtlMeu4ezGwE8iuoG288mxgR4hxRF9mNhDo4u7PRXdsZlnA7cDPEjgWEZEGpWdOcy45qSOPzfmYnfsPJXs4QGKJxWKUlb+1Zrw61VJuZilElrpui7H/Z0SWziq85s7MxptZvpnlFxYWVlRVRKReuXFIT3YfLOaxOWuSPRQgscSyDugS9bozsCFeHTNLA1oB2ypoG698C9A6xIgubwGcCMw2szXAWcCMcAL/TODeUP494L/M7ObyB+Huk909z93zcnJyEjhsEZH6YcDxrRjWtz1T3lzNvqLiyhvUsEQSy3ygd7haK4PIyfgZ5erMAMaG7VHATI88MGAGMDpcNdYd6A3MixcztJkVYhBiPuPuO929nbvnunsu8A4wwt3z3f3cqPLfAr909/uP5s0QEamvbhrak+37DvHEvLWVV65hlSaWcL7jZuAlYCkw3d2XmNndZjYiVJsCZJtZAXArcEdouwSYDnwIvAjc5O4l8WKGWLcDt4ZY2SG2iIhUYFC3tpzZvS2TX1/JweKSpI7F6vqTyGpCXl6e5+fnJ3sYIiLV6o0VhVw7ZR7/7ysnMeaMrtUe38wWuHteZfX0l/ciIg3EOb3acXLnVjz42kqKS0qTNg4lFhGRBsLMuGloLz7euo/nF29M2jiUWEREGpAv9etAnw7NmTRrJaWlyTnVocQiItKApKQYNw7pxfJNu3ll6abkjCEpvYqISI259OSOdGnblAdmryQZF2gpsYiINDBpqSncMLgnC9fu4O2VW2u9fyUWEZEGaNSgzrRvkc
n9M2v/8cVKLCIiDVBmWirjz+vBnFVbWfDx9lrtW4lFRKSBGnNGV9o0S2fSrNqdtSixiIg0UFmZaXzti915ddlmPtywq9b6VWIREWnAxp6dS/PMNP7w2spa61OJRUSkAWvVLJ1rzurG84s2sHrL3lrpU4lFRKSBG3dOd9JTU3hwdu3MWpRYREQauJwWmYw+vQtPv7eODTv213h/SiwiIo3A+ME9cYfJr6+q8b6UWEREGoFOrZty+cBObNixv8Zv85JWeRUREWkIfvmVk0hPrfn5hGYsIiKNRG0kFVBiERGRapZQYjGz4Wa23MwKzOyOGPszzWxa2D/XzHKj9t0Zypeb2UWVxTSz7iHGihAzo1xfo8zMzSwvvP6SmS0ws8Xh32FVfxtERKS6VJpYzCwVeAC4GOgPjDGz/uWqjQO2u3svYCIwIbTtD4wGBgDDgUlmllpJzAnARHfvDWwPscvG0gK4BZgb1fcW4MvufhIwFngs8cMXEZHqlsiM5QygwN1XuXsRMBUYWa7OSOCRsP0UcL6ZWSif6u4H3X01UBDixYwZ2gwLMQgxL4vq5x7gXuBAWYG7v+fuG8LLJUATM8tM4LhERKQGJJJYOgFro16vC2Ux67h7MbATyK6gbbzybGBHiHFEX2Y2EOji7s9VMNYrgPfc/WACxyUiIjUgkcuNLUZZ+Yug49WJVx4rocWtb2YpRJbYro87SLMBRJbRLoyzfzwwHqBr167xwoiIyDFKZMayDugS9bozsCFeHTNLA1oB2ypoG698C9A6xIgubwGcCMw2szXAWcCMqBP4nYF/ANe5e8yb4bj7ZHfPc/e8nJycBA5bRESORiIzlvlAbzPrDqwncjL+q+XqzCBy4nwOMAqY6e5uZjOAx83sf4Hjgd7APCIzk8/FDG1mhRhTQ8xn3H0n0K6sMzObDfzA3fPNrDXwPHCnu7+VyEEvWLBgi5l9nEjdOFoRWe5LhprsuzpiH22MqrarSv1E6lZWpx2RX3waGn2Wqz9GQ/4sd0uolrtX+gVcAnwErAR+HMruBkaE7SbAk0ROzs8DekS1/XFotxy4uKKYobxHiFEQYmbGGM9sIC9s/wTYC7wf9dU+keM62i9gck3GT1bf1RH7aGNUtV1V6idSt7I6QH6yvuc1+aXPcvXH0GfZE7uli7u/ALxQruyuqO0DwJVx2v4C+EUiMUP5KiJXjVU0niFR2z8Hfl7hAVS/Z2u5v9rquzpiH22MqrarSv1E6ibze5pM+ixXf4xG/1m2kMFEpAJmlu/ueckeh8ixqo3Psm7pIpKYyckegEg1qfHPsmYsIiJSrTRjERGRaqXEIiIi1UqJRUREqpUSi8hRMLMeZjbFzJ6qvLZI3WVml5nZQ2b2jJnFvCVWVSmxiARm9icz22xmH5Qr/9yzgzxyZ+5xsSOJJFcVP8v/dPdvErkX41XV0b8Si8hn/kLkuUGHJfg8IpG65i9U/bP8k7D/mCmxiATu/jqRm6dGS+R5RCJ1SlU+yxYxAfiXu79bHf0rsYhULOazg8ws28weBAaa2Z3JGZpIlcR7DtZ3gAuAUWZ2Q3V0lNC9wkQasZjPCHL3rUC1/CcUqSXxPsu/A35XnR1pxiJSsUSeRyRSH9TaZ1mJRaRih59HZGYZRJ4dNCPJYxI5GrX2WVZiEQnM7AkiD6s7wczWmdk4dy8GbgZeApYC0919STLHKVKZZH+WdRNKERGpVpqxiIhItVJiERGRaqXEIiIi1UqJRUREqpUSi4iIVCslFhERqVZKLAKAmbmZPRb1Os3MCs3suUranWpml1SwP8/Mjul2EWaWY2Zzzew9Mzv3WGJVNzO728wuSPY4KmJmfzGzUbXQz5VmttTMZpUrP77suTWVfV6Oos/WZnZjrL4keZRYpMxe4EQzaxpefwlYn0C7U4GYPyjMLM3d8939lmMc2/nAMncf6O5vJNIg3CK8WphZ3Hvquftd7v5KdfVV11TxfRwH3OjuQ6ML3X2Du5cltriflw
rGUNE9DVsDhxNLub4kSZRYJNq/gP8I22OAJ8p2mFlWeHjQ/DBzGBluC3E3cJWZvW9mV5nZT81sspn9G3jUzIaUzXrMrLmZ/dnMFpvZIjO7wsxSw2/UH4Ty70cPyMxOBe4FLgl9NDWzMaHuB+F232V194QZxFzg7KjyfmY2L+p1rpktCtt3hWP6IIzbQvlsM/ulmb0G/NjMVptZetjX0szWmFl69GwglP3MzN4N4+sbynPM7OVQ/kcz+9jM2pV/88P4f2FmC83sHTPrEMqPmHGY2Z7w7xAze83MppvZR2b2KzO72szmhf57RoW/wMzeCPUuDe1Tzey+cPyLzOxbUXFnmdnjwOIY4/zc+29mdwHnAA+a2X3l6ueGurE+L5/7XIU215vZk2b2LPDv8Nl5Neq9LXt0wa+AniHefWV9hRhNoj5v75nZ0KjYT5vZi2a2wszujXo/4n4WpQrcXV/6AtgDnAw8BTQB3geGAM+F/b8ErgnbrYGPgCwiT527PyrOT4EFQNPwOjrGBOC3UXXbAIOAl6PKWscY2+E+gOOBT4AcInfnnglcFvY58J9xju99oEfYvh34SdhuG1XnMeDLYXs2MClq35+j+hkP/CZs/wUYFbbXAN8J2zcCD4ft+4E7w/bwMM52McboUf3fGzXGw32Ufa+i3tsdQEcgk8gM82dh33fL3uvQ/kUiv0j2JnIzwibhOMr6yATyge4h7l6ge4wxVvT+zwbyYrTJBT4o/71M4HO1ruz7E/pqGbbbAQVE7tZ7OHaMvm4D/hy2+4ZxNwmxVwGtwuuPidycsdLPor4S+9KMRQ5z90VE/mOOAV4ot/tC4A4ze5/ID5AmQNc4oWa4+/4Y5RcQ9YQ6d99O5D94DzP7vZkNB3ZVMszTgdnuXuiRex/9DTgv7CsB/h6n3XTgP8P2VcC0sD3UIudvFgPDgAFRbaZFbT8MfC1sf41Ioonl6fDvAiLvJUR+k58K4O4vAtvjtC0Cys5pRbevyHx33+juB4GVwL9D+eJy7ae7e6m7ryDynvcl8j29LnxP5wLZRBIPwDx3Xx2jv4re/6NR0efqZXcve1iVAb8MM81XiDxHpEMlsc8h8ssC7r6MSALpE/a96u473f0A8CHQjap/FiUOPY9FypsB/JrIb63ZUeUGXOHuy6Mrm9mZMWLsjRPbiPxWfpi7bzezU4CLgJuI/PD/egXji/VMiTIH3L0kzr5pwJNm9nSkW19hZk2ASUR+y15rZj8l8oPtc8fh7m+FZZbBQKq7H/Es8SgHw78lfPb/q6IxRzvk4Vflcu2LCcvWYakuI0Z/AKVRr0s58v93+ZsCehjXd9z9pegdZjaEir+H1amiz1X0GK4mMksa5O6HzGwNR36v4sWOJ/p9KwHSjuKzKHFoxiLl/Qm4293Lr62/BHwn6hzEwFC+G2iRYOx/E7m7KiFGm3CuIcXd/w78N3BaJTHmAoPNrJ1FTiyPAV6rrGN3X0nkB8h/89lMpOwH0xYzaw5UdtL3USLnneLNVuJ5kzBbMrMLiSwBVsUaIss0EHkscnoV2wNcaWYp4bxLD2A5ke/pt6POHfUxs6xK4hzV+x+l/Ocl3ueqvFbA5pBUhhKZYcSKF+11IgkJM+tDZCa0PE5djuKzKHEoscgR3H2du/9fjF33EPmBtiicHL0nlM8C+pedjK0k/M+BNuHk6EJgKJEljdlhKeQvQIWP+XX3jaHOLGAh8K67P5PY0TENuIbIshjuvgN4iMiy0T+JPK+iIn8jkhSeqKReeT8DLjSzd4GLgY1EfiAm6iEiP8znAeV/k0/UciIJ4F/ADWEJ6GEiy0Dvhu/pH6lkFeMY33/4/Ocl3ueqvL8BeWaWTyRZLAvj2Qq8FT5T95VrMwlIDcuc04Drw5JhPFX6LEp8um2+SIIscmXWSHe/tortMoESdy82s7OBP7j7qTUySJE6QOdYRBJgZr8nMts4mj/u6wpMN7MUIifov1mdYxOpazRjERGRaqVzLCIiUq
2UWEREpFopsYiISLVSYhERkWqlxCIiItVKiUVERKrV/weuMWy8DWomwwAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Vary the number of iterations (4th argument of evaluate_dt); all other arguments stay fixed.\n",
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"metrics = [evaluate_dt(traindata, {}, 'leastAbsoluteError', param, 3, 32) for param in params]\n",
"\n",
"print(params)\n",
"\n",
"print(metrics)\n",
"\n",
"plot(params, metrics)\n",
"\n",
"fig = matplotlib.pyplot.gcf()\n",
"pyplot.xlabel('Metrics for varying number of iterations')\n",
"pyplot.xscale('log')"
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 4, 8, 16, 32, 64, 100]\n",
"[0.24489669490739654, 0.26140602081099523, 0.2619618739499482, 0.25816082247564837, 0.25905551178812486, 0.25776353461608653, 0.25866605672527904]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAEKCAYAAAAMzhLIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xt8XWWd7/HPN0mTtE3btCTl0qQ3KEKVQiXDQZCRQVBATmGOjsDIjCgcPCpn5sjRIx6UmUGdEdBx9IgOqHgbFQVvFYuAUBRHYFooFGgFS0EaykAKtPRCdnP5nT/Ws9OV3Z1kt03a0nzfr9d+da9nPWutZ+2dPr/9XNZaigjMzMwGUrWnC2BmZns3BwozMxuUA4WZmQ3KgcLMzAblQGFmZoNyoDAzs0E5UJiZ2aAcKMzMbFAOFGZmNqiaSjJJOhX4AlANfC0iPlOy/hLgQqAb6ADeGxF/TOumA18DWoEATo+IpyR9F2gDuoD/AN4XEV2STgR+BjyZdv/jiLhisPI1NTXFzJkzKzkVMzNL7r///nUR0TxUviEDhaRq4BrgFKAdWCJpYUSsyGVbBrRFxBZJ7weuAs5O674NfDoibpfUAPSm9O8C56X33yMLNF9Jy3dHxBlDla1o5syZLF26tNLsZmYGSPpjJfkq6Xo6BlgVEasjYitwA3BmPkNELI6ILWnxXqAlFWIuUBMRt6d8m4r5ImJRJGQtipZKCmxmZrtXJYFiGrAmt9ye0gZyAXBLen8osF7SjyUtk3R1aqH0kTQG+Cvgl7nkN0h6SNItkl5bQRnNzGyEVDJGoTJpZW85K+k8snGHN+X2fwIwH3ga+AFwPvD13GZfBn4TEXen5QeAGRGxSdLpwE+BOWWOdRFwEcD06dMrOA0zM9sZlbQo2skGootagLWlmSSdDFwGLIiIQm7bZanbqpus0n99bpu/A5qBS4ppEfFyRGxK7xcBYyQ1lR4vIq6LiLaIaGtuHnIsxszMdlIlgWIJMEfSLEm1wDnAwnwGSfOBa8mCxPMl206WVKzJTwJWpG0uBN4KnBsRvbl9HSBJ6f0xqYwv7MzJmZnZrhuy6ykiuiVdDNxKNj32+oh4VNIVwNKIWAhcDTQAN6Y6/umIWBARPZI+DNyRKv/7ga+mXf8r8EfgnrRNcRrsO4D3S+oGXgHOCT9dycxsj9G+UAe3tbWFp8eame0YSfdHRNtQ+Xxl9jCICH614jn+fdW6PV0UM7NhV9GV2VZeRPDrxzv43G2P8/AzGxg7pprFHz6RAybV7+mimZkNG7codtK9q1/gndfew/nfWMJLW7by8bcdTk9v8NnbHtvTRTMzG1ZuUeygB9es53O3Pcbdf1jH/hPr+ORZr+PstlZqa6ro2FjgurtXc/5xM3ndtEl7uqhmZsPCgaJCK599mc/d9ji/WvkcU8bX8vG3Hc55x86gfsy2C80/eNIh3Hh/O5+8eQU3XHQsaTaXmdmrmgPFEJ7o2MTnb3+cm5c/y4T6Gj78lkM5//hZNNRt/9FNrB/Dh045lE/89BFuffQ5Tn3dAXugxGZmw8uBYgBrXtzCF+/4Az96oJ36MdVc/GeH8N9PmM2kcWMG3e7cP2nl2797in+6ZSUnHTaV2hoPA5nZq5sDRRkvd3bxti/eTWd3L+85fhbvP/FgmhrqKtq2prqKy952OOd/YwnfvucpLjxh9sgW1sxshPnnbhkdGwu83NnNP/75EXzijLkVB4miE18zlT89tJkv3PEHXty8dYRKaWa2ezhQlFHoym49VW4colIff9vhbC5084VfPT5cxTIz2yMcKMoodPcAUDdm5z+eQ/efwLnHTOff7nuaVc9vGq6imZntdg4UZXSmFkXdLg5Ef+iUQxk3ppp/XLRyOIplZrZHOFCUUWxR5K+R2BlNDXV88KRDuPP3z3P3HzqGo2hmZrudA0UZhe7haVEAnH/cTFqnjOXTv1hJT++r/0
69Zjb6OFCU0dmVxihqdq1FAVmr5NJTD+f3/7mRHy5dM/QGZmZ7GQeKMoazRQFw+hEH0DZjMp+77TE2FbqHZZ9mZruLA0UZxUCxq2MURZL4xBlzWbdpK19evGpY9mlmtrs4UJRR6Nr16bGljmxt5M/nT+Nrv32S9pe2DNt+zcxGmgNFGcPd9VT0kbe+hirBlb/0MyvM7NXDgaKMQlcPEtRWD+/Hc1DjWC46YTY/f2gt9//xpWHdt5nZSKmoJpR0qqTHJK2SdGmZ9ZdIWiFpuaQ7JM3IrZsu6TZJK1OemSl9lqT7JP1B0g8k1ab0urS8Kq2fORwnuiMK3b3U1VSNyPMk3vemg5k6oY5P/WIFEZ4ua2Z7vyEDhaRq4BrgNGAucK6kuSXZlgFtETEPuAm4Krfu28DVEXE4cAzwfEq/Evh8RMwBXgIuSOkXAC9FxCHA51O+3aqzq2dYpsaWM76uhg+/9TUse3o9P1/+7Igcw8xsOFXSojgGWBURqyNiK3ADcGY+Q0QsjojiCO29QAtACig1EXF7yrcpIrYo+6l+EllQAfgWcFZ6f2ZaJq1/s3bzo+IK3b3UD+NAdqm3v76FuQdO5Mpbft93zYaZ2d6qktpwGpC/Uqw9pQ3kAuCW9P5QYL2kH0taJunq1ELZD1gfEcWLCvL77DteWr8h5d9tsq6nkWlRAFRXiY+fcTjPrH+Fr//2yRE7jpnZcKgkUJT7NV+2c13SeUAbcHVKqgFOAD4M/AkwGzh/iH1WdDxJF0laKmlpR8fw3kcp63oa2XH+4w5u4pS5+/Plxavo2FgY0WOZme2KSmrDdqA1t9wCrC3NJOlk4DJgQUQUctsuS91W3cBPgdcD64BGSTVl9tl3vLR+EvBi6fEi4rqIaIuItubm5gpOo3JZ19PItSiKPnbaYRS6e/nn2z1d1sz2XpUEiiXAnDRLqRY4B1iYzyBpPnAtWZB4vmTbyZKKNflJwIrIpvssBt6R0t8N/Cy9X5iWSevvjN08PajQPfItCoDZzQ389Rtm8oMla1j57Msjfjwzs50xZG2YWgIXA7cCK4EfRsSjkq6QtCBluxpoAG6U9KCkhWnbHrJupzskPUzWrfTVtM1HgUskrSIbg/h6Sv86sF9KvwTYbjruSOvs6h3Wq7IH8zdvPoQJ9WP49C9Werqsme2VKnrWZ0QsAhaVpF2ee3/yINveDswrk76abEZVaXon8BeVlGukFLp7aBw7Zrccq3FcLf/r5Dn8w89XsPix5znpsP13y3HNzCrlK7PLKOzGFgXAecfOYHbTeD71i5V09fTutuOamVXCgaKMzu6Ru+CunDHVVfzf0w9ndcdmvnff07vtuGZmlXCgKKPQ1btbBrPz3nz4VI47eD8+/6vH2bCla7ce28xsMA4UZeyu6bF5kvj42+ay4ZUu/t+df9itxzYzG4wDRRm744K7cuYeNJF3Ht3Kt+55iqfWbd7txzczK6eiWU+jSUT03T12T/jfbz2Uny9fyz/dspJr/6ptj5RhuHV29bDy2ZdZ3r6Bh9as57HnNtLUUMfs5vHMbhrP7OYGZjWN54CJ9VRV7dbbeplZBRwoSmxNs47qdnPXU9HUCfV84MSD+extj3Pv6hc4dvZuvc3VLuvpDVY9v4mH1qznofb1LG/fwO//82W6erJrRJoa6jj8wAms21RgyVMvsmXrtpsijh1Tzcym8bkAMp7ZTQ3Mah7PxPrdM13ZzLbnQFFipJ5utyMuPGE237vvaT71ixUs/OAb99pf2RFB+0uv8OCa9SxvX89D7Rt45JkNfZX/hLoajmiZxIUnzObIlknMa2nkwEn1fc/5iAiee7nA6o5NrF63mdUdm3ly3SYeeWYDtzz8LL256w+bGur6gsesXCtk+pRx1O7B78psNHCgKNHZ97zsPdOiAKgfU81HTzuMv73hQX687BnecXTLHitLXsfGQl9AeCgFh5fSDK3amirmHjiRd7a1Mq9lEke2NjJrv/GDBjlJHDCpng
Mm1XPcIU391m3t7uXpFzfzRMdmnly3mdUdm3hy3WZuX/EcL2ze2pevukq0Th7L7OYGZjeNZ1ZqhcxuHs/UCXUj8vAps9HGgaJEoStrUdTv4V+pC448iG/8+1NcfevvOf2IAxhXu3u/qo2dXTz8zIa+cYXl7Rt4Zv0rAFQJ5kydwClz92deSyNHtTZy6P4ThvWXfW1NFYdMncAhUydst27Dli5Wr9uUAkgWSJ7o2MTvnlhHZ9e2CxbH11b3BY5ZqTVycHMDM5vG01DnP317ders6mHdpgIdG7NX65RxHH7gxBE9pv+3lOjretqDLQrIfm1/4ozDeftX7uHaX6/mQ6ccOmLHKnT3sPLZjSxvX5+6kTbwRMcmireemj5lHPOnN3L+cTM5srWR1x40kfF7sKKdNG4M86dPZv70yf3Se3uDZ1/u7Gt9rO7YzOp1m3ng6Zf4+fK15G+ltf/Eur4urPx4SMvksdQM87PSzYbS0xu8uHlrVvnngsC25c6+5Zc7u/tt+74/ne1Asbv1dT3tBf3eR8+YwhnzDuTa3zzBOce0cuCksbu8z57e4ImOTX3jCsvbN7Dy2fxgcy1HtjTyX+cdxJGt2bjClPG1u3zc3aGqSkxrHMu0xrGcMKf/rec7u3r44wtbeHLdJp7o2DYesujhZ1mfu8BxTLWYPmUcs5oaOLhkPKSpodZdWVaxiODlzu6SCr80AGSvFzcX+o3JFTXU1dA8oY7mhjoOO2AibzykNlsuvhrqaZm86/XCUBwoShRbFLv7gruBfPTUw7htxXNcfetj/PM7j9qhbYuDzcXZRw+tWc8jz2xgcxpsbqir4Yhpk3jvG2dxVEsj81obOSg32LwvqR9TzWsOmMBrDti+K+ulzVtZvW5TXwvkyY7NrF63id883tE3Cw5gQn1N33Te/HjIrKbxjK3dO/5ebOR1dvXQsbHA8yUV/roygWBr9/b3bhtTLZobsop+WmM9R7VO6lvOB4CmCbW7vct5IHtHKfYihe69p0UB0DplHBe8cRZfuesJzj9uJvNaGgfMu25TGmxes6Fv0PnFNPBbW13F4QdN5O1Ht3BkSyNHtk5idlPDXjujaneaPL6Wo8dP4egZU/ql9/QGa9e/whMd/cdD7lv9Aj9Z9ky/vAdNqi87HnJQ41iq/Rnv9bp7enlx89as8h/gl/+6tLyx0L3d9hLsN76WplThz24e39cSKFb+U1MAmDi25lX3Y8yBokRxMHtvCRQAHzjxYH64ZA2funklP3jfsUhiU6Gbh9uLASELDsXBZgnmTG3gzYdNZV5rI0e2TOKwAyZ6GukOqq4SrVPG0TplHCe+pv+6V7b28OS6bTOyVq/LWiM/ffAZNub6kGtrqpi537h+XVhZl1bDq6ZL79UqItjwSldlXT9btlLucTAT6rd1/cw9aCLNE+r6gkExfeqEOqaMr92nx7YcKEoUWxR7S9cTwIT6MVzylkO57CePcOG3lvL0i1tYlRtsbpk8lqOmN/Lu42Ywr6WR102b5Fk9I2xsbTVzD5rI3IP6DyJGBC9s3to3BlLszlr1/Cbu/P3zfWNBAI3jxmRdWGk6b7Fba8Z+4/aqv7+9zZat3QNW+P26fzYV+n3eRbU1VUxNFX7rlHG8fsbkMl0/2b/+HjKuTUrsDRfclXN2Wys3Lm3nwTXrObK1kbfNO5AjWxqZ1zKJ/Rrq9nTxLJFEU0NWCR0zq39XVndPL+0vvbLdeMhvV3Xwowfac/uAaY1jU+tjW1fW7OYGDtxHb3PS1dPLC5uKs346BwwEHRsLfWNseVWC/Rq2VfBz9p+wXddP8TWh7tXX9bOnOVCU2BsuuCunprqKn37weCLCf+SvUjXVVcxsGs/MpvGcdFj/dZsK3TyVuq9Wd2zqGw+5cemafhVj/ZgqZu6Xu71JbmrvpHF7121OenuD9f26fgYOAC8NcGv9SWPH9FX481oat/vFX+wGmjK+1mNBI8iBosTe2qIocpDYNzXU1fC6aZN43b
RJ/dIjgo6Nhe2uUF/57EZuffQ5enJzKvcbX7vdLU4Obh5P65Rxw/Ygrohg89ae/hX+xk46NhVYt3Hrdt1A3WXmfNaPqWLqhHqaJ2TXshwzawrNDfXb/fJvaqjdrQ8Qs4E5UJTouzJ7L2tR2OgkiakT65k6sZ43HNz/BpFbu3tZ89KW7cZD7vx9Bz9cuq0rq0rZ7LlZTdtusnhwCib7T8xuc1Lo7mHdpq19M3sGG/h9pWv7rp/qKtHUUNv3a//wA/NdP/2DwPjaav/geZWpKFBIOhX4AlANfC0iPlOy/hLgQqAb6ADeGxF/TOt6gIdT1qcjYkFKvxsoTmqfCvxHRJwl6UTgZ8CTad2PI+KKnTu9Hbc3XXBnNpjamioObm7g4OYGYP9+617u7OLJXCvkiTQect/qF/tV9ONqqxlTXcWGV8p3/UweN6avgn/99MaSAd9srn9zQx2Tx9Xuk2MnlhkyUEiqBq4BTgHagSWSFkbEily2ZUBbRGyR9H7gKuDstO6ViNjuSrGIOCF3jB+RBYeiuyPijB0+m2FQ6O6lSlDjP3p7FZtYP4YjWxs5srX/dTe9vcFzGzv7Wh+rOzbR0xtlB333G1/nKdUGVNaiOAZYFRGrASTdAJwJ9AWKiFicy38vcF6lBZA0ATgJeE+l24ykQncP9WPcNLZ9U1WVOHDSWA6cNJbjS+7YazaQSn4uTAPW5JbbU9pALgBuyS3XS1oq6V5JZ5XJ/+fAHRHxci7tDZIeknSLpNdWUMZhsyefbmdmtjeqpEVR7qd1mWsYQdJ5QBvwplzy9IhYK2k2cKekhyPiidz6c4Gv5ZYfAGZExCZJpwM/BeaUOdZFwEUA06dPr+A0KpM9L9sD2WZmRZX8dG4HWnPLLcDa0kySTgYuAxZERKGYHhFr07+rgbuA+blt9iPr2vpFLv/LEbEpvV8EjJG0XRs5Iq6LiLaIaGtubi5dvdMK3b3Uj3GLwsysqJIacQkwR9IsSbXAOcDCfAZJ84FryYLE87n0yZLq0vsm4HhyYxvAXwA3R0RnbpsDlAYIJB2TyvjCzpzczih09bpFYWaWM2TXU0R0S7oYuJVseuz1EfGopCuApRGxELgaaABuTHV8cRrs4cC1knrJKvzPlMyWOgfoN9UWeAfwfkndwCvAORHlbtc1Mjq7e6hzi8LMrE9F11GkLqBFJWmX596fPMB2vwOOGGS/J5ZJ+xLwpUrKNRIKXb3Uu0VhZtbHP51LFNyiMDPrxzViic4uT481M8tzjVii0O3psWZmeQ4UJQrdve56MjPLcY1YotPTY83M+nGgKJF1PfljMTMrco1YIrsy2y0KM7MiB4qc3t5gq28KaGbWj2vEnK096TGoHsw2M+vjGjGn7zGoHsw2M+vjQJFT6E6PQXWLwsysj2vEnM7UovD0WDOzbRwocootCj+PwsxsG9eIOYVutyjMzEo5UOR0dqUxCk+PNTPr4xoxp9ii8AV3ZmbbOFDk9M16covCzKyPa8ScvllPHsw2M+vjGjFnW4vCXU9mZkUOFDl9V2a7RWFm1qeiGlHSqZIek7RK0qVl1l8iaYWk5ZLukDQjt65H0oPptTCX/k1JT+bWHZXSJemL6VjLJb1+OE60EttmPblFYWZWVDNUBknVwDXAKUA7sETSwohYkcu2DGiLiC2S3g9cBZyd1r0SEUcNsPuPRMRNJWmnAXPS678AX0n/jrht11G4RWFmVlRJjXgMsCoiVkfEVuAG4Mx8hohYHBFb0uK9QMsulOlM4NuRuRdolHTgLuyvYg4UZmbbq6RGnAasyS23p7SBXADckluul7RU0r2SzirJ++nUvfR5SXU7ebxh09nVQ02VqKl2oDAzK6qkRlSZtCibUToPaAOuziVPj4g24C+Bf5F0cEr/GHAY8CfAFOCjO3I8SRelALS0o6OjgtMYWsEPLTIz204ltWI70JpbbgHWlmaSdDJwGbAgIgrF9IhYm/5dDdwFzE/Lz6bupQLwDbIuroqPFx
HXRURbRLQ1NzdXcBpDK3T3+KpsM7MSlQSKJcAcSbMk1QLnAAvzGSTNB64lCxLP59InF7uUJDUBxwMr0vKB6V8BZwGPpM0WAn+dZj8dC2yIiGd34RwrVuhyi8LMrNSQs54iolvSxcCtQDVwfUQ8KukKYGlELCTramoAbszqfZ6OiAXA4cC1knrJgtJncrOlviupmayr6UHgf6T0RcDpwCpgC/Ce4TnVoXV291LnFoWZWT9DBgqAiFhEVoHn0y7PvT95gO1+BxwxwLqTBkgP4IOVlGu4Fbp63KIwMyvhWjGn4BaFmdl2HChyOt2iMDPbjmvFnEJ3r2c9mZmVcKDI8XUUZmbbc62Y48FsM7PtuVbMyVoU7noyM8tzoMjJrsz2R2JmludaMaezyy0KM7NSDhQ5he4ePy/bzKyEa8Wkpzfo6gnq3aIwM+vHgSIpdKfHoLpFYWbWj2vFpNDlp9uZmZXjWjEpPgbVV2abmfXnQJH0dT25RWFm1o9rxaSzr+vJLQozszwHiqTYovAFd2Zm/blWTIpjFG5RmJn150CRdHZ5eqyZWTmuFRNPjzUzK8+1YuLpsWZm5VUUKCSdKukxSaskXVpm/SWSVkhaLukOSTNy63okPZheC3Pp3037fETS9ZLGpPQTJW3IbXP5cJzoUPq6ntyiMDPrZ8haUVI1cA1wGjAXOFfS3JJsy4C2iJgH3ARclVv3SkQclV4LcunfBQ4DjgDGAhfm1t2d2+aKHT6rneDBbDOz8ir5+XwMsCoiVkfEVuAG4Mx8hohYHBFb0uK9QMtQO42IRZEA/1HJNiPJ02PNzMqrpFacBqzJLbentIFcANySW66XtFTSvZLOKs2cupz+CvhlLvkNkh6SdIuk11ZQxl3mC+7MzMqrqSCPyqRF2YzSeUAb8KZc8vSIWCtpNnCnpIcj4onc+i8Dv4mIu9PyA8CMiNgk6XTgp8CcMse6CLgIYPr06RWcxuB8Cw8zs/IqqRXbgdbccguwtjSTpJOBy4AFEVEopkfE2vTvauAuYH5um78DmoFLcvlfjohN6f0iYIykptLjRcR1EdEWEW3Nzc0VnMbgCt291FZXUVVVLi6amY1elQSKJcAcSbMk1QLnAAvzGSTNB64lCxLP59InS6pL75uA44EVaflC4K3AuRHRm9vmAElK749JZXxh50+xMp1dPW5NmJmVMWTXU0R0S7oYuBWoBq6PiEclXQEsjYiFwNVAA3BjquOfTjOcDgeuldRLVuF/JiJWpF3/K/BH4J60zY/TDKd3AO+X1A28ApyTBrxHVKG711dlm5mVUckYRbELaFFJ2uW59ycPsN3vyKa/lltX9tgR8SXgS5WUazgVuno9kG1mVoZ/QieF7h63KMzMynDNmHS6RWFmVpYDRVLo7vHFdmZmZbhmTArdvZ71ZGZWhmvGpNDV464nM7MyHCgStyjMzMpzzZgUunv9LAozszIcKBJfmW1mVp5rxsRXZpuZleeaMSl09VDvwWwzs+04UCSdblGYmZXlmhHo7umlpzc8PdbMrAwHCrY9L9tXZpuZbc81I9mMJ/BjUM3MynGgYFuLwtNjzcy255qRfNeTWxRmZqUcKMh3PfnjMDMr5ZqRXNeTB7PNzLbjmpHsYjvAF9yZmZXhQIFbFGZmg6moZpR0qqTHJK2SdGmZ9ZdIWiFpuaQ7JM3IreuR9GB6Lcylz5J0n6Q/SPqBpNqUXpeWV6X1M3f9NAfn6bFmZgMbMlBIqgauAU4D5gLnSppbkm0Z0BYR84CbgKty616JiKPSa0Eu/Urg8xExB3gJuCClXwC8FBGHAJ9P+UaUp8eamQ2skprxGGBVRKyOiK3ADcCZ+QwRsTgitqTFe4GWwXYoScBJZEEF4FvAWen9mWmZtP7NKf+I8fRYM7OBVRIopgFrcsvtKW0gFwC35JbrJS2VdK+kYjDYD1gfEd1l9tl3vLR+Q8o/Yjw91sxsYDUV5Cn3az7KZpTOA9qAN+WSp0fEWkmzgTslPQ
y8PMg+KzqepIuAiwCmT58+cOkrsK3ryS0KM7NSlfyEbgdac8stwNrSTJJOBi4DFkREoZgeEWvTv6uBu4D5wDqgUVIxUOX32Xe8tH4S8GLp8SLiuohoi4i25ubmCk5jYIXu1KLwrCczs+1UUjMuAeakWUq1wDnAwnwGSfOBa8mCxPO59MmS6tL7JuB4YEVEBLAYeEfK+m7gZ+n9wrRMWn9nyj9iOrs8mG1mNpAhu54iolvSxcCtQDVwfUQ8KukKYGlELASuBhqAG9O489NphtPhwLWSesmC0mciYkXa9UeBGyR9imzW1NdT+teB70haRdaSOGeYznVAhe4eamuqGOExczOzV6VKxiiIiEXAopK0y3PvTx5gu98BRwywbjXZjKrS9E7gLyop13ApdPVS79aEmVlZrh3JWhR1nhprZlaWAwVZi8LjE2Zm5bl2JJse64vtzMzKc6Agu+DOLQozs/JcO5K1KBwozMzKc+1INpjtriczs/IcKHCLwsxsMK4dKY5RuEVhZlaOAwWpReH7PJmZleXakeKV2W5RmJmV40ABdHb3uEVhZjYA1474ymwzs8GM+toxIjw91sxsEKM+UHT1BL3hZ1GYmQ1k1NeOfU+382C2mVlZDhTpedn1Hsw2Mytr1NeOnV1uUZiZDWbUB4pii8LTY83Myhv1tWOhKwUKtyjMzMoa9YGisziY7RaFmVlZFdWOkk6V9JikVZIuLbP+EkkrJC2XdIekGSXrJ0p6RtKX0vIESQ/mXusk/Utad76kjty6C4fjRAeyrUXhQGFmVk7NUBkkVQPXAKcA7cASSQsjYkUu2zKgLSK2SHo/cBVwdm79J4FfFxciYiNwVO4Y9wM/zuX/QURcvBPns8M8PdbMbHCV/Iw+BlgVEasjYitwA3BmPkNELI6ILWnxXqCluE7S0cD+wG3ldi5pDjAVuHvHi7/rPD3WzGxwldSO04A1ueX2lDaQC4BbACRVAZ8DPjJI/nPJWhCRS3t76sa6SVJrBWXcaZ4ea2Y2uEoChcqkRZk0JJ0HtAFXp6QPAIsiYk25/Mk5wPdzyz8HZkbEPOBXwLcGONZFkpZKWtrR0THEKQysb3qsxyjMzMoacoyCrAWR/1XfAqwtzSTpZOAy4E02DTZpAAAONklEQVQRUUjJbwBOkPQBoAGolbQpIi5N2xwJ1ETE/cX9RMQLud1+FbiyXKEi4jrgOoC2traygasS27qe3KIwMyunkkCxBJgjaRbwDFkL4C/zGSTNB64FTo2I54vpEfGuXJ7zyQa887OmzqV/awJJB0bEs2lxAbCy4rPZCYUuT481MxvMkIEiIrolXQzcClQD10fEo5KuAJZGxEKyrqYG4EZJAE9HxIIKjv9O4PSStL+RtADoBl4Ezq/0ZHaGu57MzAZXSYuCiFgELCpJuzz3/uQK9vFN4JslabPL5PsY8LFKyjUcCl09SFBb7UBhZlbOqK8dO7uzp9ullpCZmZUY9YGi0NXjqbFmZoNwoOju9cV2ZmaDGPU1ZKdbFGZmgxr1gaKQxijMzKy8UV9DZl1PblGYmQ1k1AeKrOtp1H8MZmYDGvU1ZKG711dlm5kNYtTXkIVuD2abmQ3GgaLL02PNzAYz6mvITrcozMwGNeoDRaHL02PNzAYz6mtIT481MxvcqA8Unh5rZja4UV1DRoSvzDYzG8KoriG39qSHFrnrycxsQKM6UHR2+el2ZmZDGdU1ZKG7+LxstyjMzAYyugNFalHUu0VhZjagUV1DukVhZja0igKFpFMlPSZplaRLy6y/RNIKScsl3SFpRsn6iZKekfSlXNpdaZ8PptfUlF4n6QfpWPdJmrlrpzgwj1GYmQ1tyBpSUjVwDXAaMBc4V9LckmzLgLaImAfcBFxVsv6TwK/L7P5dEXFUej2f0i4AXoqIQ4DPA1dWfDY7qNCdup7cojAzG1AlP6WPAVZFxOqI2ArcAJyZzxARiyNiS1q8F2gprpN0NLA/cFuFZToT+FZ6fxPwZk
mqcNsdUuhKXU9uUZiZDaiSGnIasCa33J7SBnIBcAuApCrgc8BHBsj7jdTt9IlcMOg7XkR0AxuA/Soo5w4rtigcKMzMBlZJDVnu13yUzSidB7QBV6ekDwCLImJNmezviogjgBPS66925HiSLpK0VNLSjo6OIU6hvL7BbN891sxsQJUEinagNbfcAqwtzSTpZOAyYEFEFFLyG4CLJT0FfBb4a0mfAYiIZ9K/G4HvkXVx9TuepBpgEvBi6fEi4rqIaIuItubm5gpOY3vNE+o4/YgDmDx+zE5tb2Y2GtRUkGcJMEfSLOAZ4BzgL/MZJM0HrgVOzQ1KExHvyuU5n2zA+9IUABojYp2kMcAZwK9S1oXAu4F7gHcAd0ZE2RbMrjp6xhSOnjFlJHZtZrbPGDJQRES3pIuBW4Fq4PqIeFTSFcDSiFhI1tXUANyYhhqejogFg+y2Drg1BYlqsiDx1bTu68B3JK0ia0mcs3OnZmZmw0Ej9GN9t2pra4ulS5fu6WKYmb2qSLo/ItqGyufpPmZmNigHCjMzG5QDhZmZDcqBwszMBuVAYWZmg3KgMDOzQe0T02MldQB/HCJbE7BuNxRnb+PzHn1G67n7vHfcjIgY8tYW+0SgqISkpZXMF97X+LxHn9F67j7vkeOuJzMzG5QDhZmZDWo0BYrr9nQB9hCf9+gzWs/d5z1CRs0YhZmZ7ZzR1KIwM7OdMCoChaRTJT0maZWkS/d0eUaKpFZJiyWtlPSopL9N6VMk3S7pD+nfyXu6rCNBUrWkZZJuTsuzJN2XzvsHkmr3dBmHm6RGSTdJ+n363t8wGr5vSR9Kf+OPSPq+pPp98fuWdL2k5yU9kksr+/0q88VUzy2X9PrhKsc+HygkVQPXAKcBc4FzJc3ds6UaMd3A/46Iw4FjgQ+mc70UuCMi5gB3pOV90d8CK3PLVwKfT+f9Etnz3Pc1XwB+GRGHAUeSnf8+/X1Lmgb8DdmD0F5H9kybc9g3v+9vAqeWpA30/Z4GzEmvi4CvDFch9vlAQfaI1VURsToitgI3AGfu4TKNiIh4NiIeSO83klUa08jO91sp27eAs/ZMCUeOpBbgbcDX0rKAk4CbUpZ97rwlTQT+lOxhX0TE1ohYzyj4vskeujY2PS1zHPAs++D3HRG/YftHQQ/0/Z4JfDsy9wKNkg4cjnKMhkAxDViTW25Pafs0STOB+cB9wP4R8SxkwQSYuudKNmL+Bfg/QG9a3g9YHxHdaXlf/N5nAx3AN1KX29ckjWcf/74j4hngs8DTZAFiA3A/+/73XTTQ9ztidd1oCBQqk7ZPT/WS1AD8CPhfEfHyni7PSJN0BvB8RNyfTy6TdV/73muA1wNfiYj5wGb2sW6mclKf/JnALOAgYDxZt0upfe37HsqI/c2PhkDRDrTmlluAtXuoLCMuPYf8R8B3I+LHKfm5YhM0/fv8nirfCDkeWCDpKbKuxZPIWhiNqWsC9s3vvR1oj4j70vJNZIFjX/++TwaejIiOiOgCfgwcx77/fRcN9P2OWF03GgLFEmBOmhFRSzbotXAPl2lEpH75rwMrI+Kfc6sWAu9O798N/Gx3l20kRcTHIqIlImaSfb93RsS7gMXAO1K2ffG8/xNYI+k1KenNwAr28e+brMvpWEnj0t988bz36e87Z6DvdyHw12n207HAhmIX1a4aFRfcSTqd7BdmNXB9RHx6DxdpREh6I3A38DDb+ur/L9k4xQ+B6WT/yf4iIkoHyPYJkk4EPhwRZ0iaTdbCmAIsA86LiMKeLN9wk3QU2QB+LbAaeA/ZD8B9+vuW9A/A2WQz/ZYBF5L1x+9T37ek7wMnkt0h9jng74CfUub7TUHzS2SzpLYA74mIpcNSjtEQKMzMbOeNhq4nMzPbBQ4UZmY2KAcKMzMblAOFmZkNyoHCzMwG5UBhSApJ38kt10jqKN6FdZDtjkpTjwda3ybpi7tYtuZ0R9Blkk7YlX2l/c0s3okzXz5JdZJ+JelBSWdLOiHdnf
RBSWN39biDlOdESceN1P4HOObX9tSNMdP5lv27krRIUuPuLpMNrWboLDYKbAZeJ2lsRLwCnAI8U8F2RwFtwKLSFZJq0hzuXZ3H/Wbg9xHx7iFzbjt2dUT0DJWvpHzzgTERcVTax78Cn42Ib1R4TJFNN+8dMnN/JwKbgN/t4HY7LSIu3F3H2hERMeCPDtuz3KKwolvI7r4KcC7w/eIKSePTffGXpF/2Z6ar3K8Azs79Cv97SddJug34dv7Xo6QGSd+Q9HC6V/7blT0/4pvKninwsKQP5QuULia7Cji9+Mte0rkp7yOSrszl3STpCkn3AW8o2c/Rkh6SdA/wwVz6iZJuljQV+DfgqHSc9wHvBC6X9N2U9yPp/Jeni72KrZOVkr4MPAC0SnqLpHskPSDpxnTfLSQ9JekfUvrDkg5TduPG/wF8KB23X4spfZ7fknRb2v6/Sboqbf9LZbdrQdLlqWyPpM9fqVW4RNkFiEj6J0mfTu/vktSW+9yulHR/alEdk9avlrQg5Tlf0pdy5bo5t98hty9joqSfSFoh6V8lVeU+o6bc5/pVZa2625RadZL+Jm23XNINA+zfhltE+DXKX2S/aOeR3SuoHniQ7JfuzWn9P5Jd5QrQCDxOdiO284Ev5fbz92R38RyblvP7uBL4l1zeycDRwO25tMYyZes7BtkN4J4Gmslaw3cCZ6V1AbxzgPNbDrwpvb8aeKRM+frep+VvAu9I799C9lxikf24upns9t4zya6APzblawJ+A4xPyx8FLk/vnwL+Z3r/AeBruc/swwOU+++B3wJjyJ41sQU4La37Se7cp+S2+Q7wX9P715Ldav4UsiuVa1P6XWTPcih+bvl93pY73oOl30Favhk4sdLtS87pRKCT7M631cDtuc/5qfQZziS74vqolP5Dtv39rQXqBvp78WtkXm5RGAARsZzsP+i5bN+V9BbgUkkPklUy9WS3DyhnYWTdV6VOJnuAVPF4L5HdcmK2pP8n6VRgqDvd/glwV2Q3g+sGvktWYQP0kN0MsR9Jk8gqlF+npO+U5qnAW9JrGVnL4TCyh8MA/DGye/9D9rCoucC/p8/q3cCM3H6KN2m8n+yzrsQtkd347mGyivWXKf3h3D7+TNk4zsNkN0R8LUBEPEp2vj8H3hvZ81hKbS3Z569zx6ukjDuz/X9E9nyYHrKW6xvL5HkyIh5M7/Of13Lgu5LOIwsmtht4jMLyFpLd5/9Esuc5FAl4e0Q8ls8s6b+U2cfmAfYtSm55HBEvSToSeCtZl9A7gfcOUr5yt1Eu6ozy4xLbHXcnCPiniLi2X2LWdbS5JN/tEXHuAPsp3neoh8r/7xUAIqJXUlekn9JkLZkaSfXAl8laCGsk/T1ZIC86AlgP7D/A/kv3mT9esYzd9O+mrt/B7UuVfh/lvp/8PZp6gOKEgreR/ThYAHxC0mtj2zMobIS4RWF51wNXRMTDJem3Av9TkgAkzU/pG4EJFe77NuDi4oKkyZKagKqI+BHwCbJbZA/mPuBNqR+7mqz18+vBNojsiW8blN0wEeBdFZY371bgvbnxhmlpXKPUvcDxkg5J+cZJOnSIfe/IZ1hOsdJel8pXvHsqkv4bWcD/U+CL2vkZRU+Rjd9USWole2rkrjhG2d2cq8hu7PfbSjZK+VsjYjHZQ6oagYZdLItVwIHC+kREe0R8ocyqT5L1Oy9XNrX0kyl9MTA3DcSePcTuPwVMTgOuDwF/Rna3z7tSN803gY8NUb5nU57FwEPAAxFRya2k3wNco2wwu1y32KAi4jbge8A9qXvnJspU7hHRQdaf/31Jy8kCx2FD7P7nwJ+XG8yusGzrga+SdfX8lOy2+qQg/Bnggoh4nOyuouW+20r8O/BkOsZnybrfdsU9qWyPpP3+pMLtqoF/S9/BMrLnY6/fxbJYBXz3WDMzG5RbFGZmNigHCjMzG5QDhZmZDcqBwszMBuVAYWZmg3KgMDOzQTlQmJnZoBwozMxsUP8ft2GLOf
4daNEAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Vary the maximum number of bins (last argument of evaluate_dt); all other arguments stay fixed.\n",
"params = [2, 4, 8, 16, 32, 64, 100]\n",
"\n",
"metrics = [evaluate_dt(traindata, {}, 'leastAbsoluteError', 10, 3, param) for param in params]\n",
"\n",
"print(params)\n",
"\n",
"print(metrics)\n",
"\n",
"plot(params, metrics)\n",
"pyplot.xlabel('Metrics for different maximum bins')\n",
"fig = matplotlib.pyplot.gcf()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor":...
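Both parameter sweeps in the notebook call `evaluate_dt`, whose definition is not part of this excerpt, and print one error value per setting. Assuming that value is the root mean squared log error (RMSLE), a common choice for the bike sharing rental counts, the metric and the "pick the best setting" step can be sketched in plain Python as below. `rmsle` and `sweep` are hypothetical helper names, not functions from the notebook or from Spark.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared log error between two equal-length sequences.

    Assumes non-negative values such as rental counts, so log(x + 1) is defined.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((math.log(p + 1) - math.log(a + 1)) ** 2
                for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


def sweep(evaluate, params):
    """Evaluate every parameter value; return the metrics and the best value.

    `evaluate` maps one parameter value to an error (lower is better).
    """
    metrics = [evaluate(p) for p in params]
    _, best_param = min(zip(metrics, params))
    return metrics, best_param


if __name__ == "__main__":
    # Values printed by the maxBins sweep in the notebook, used as a lookup
    # table in place of a real call to evaluate_dt.
    printed = dict(zip(
        [2, 4, 8, 16, 32, 64, 100],
        [0.24489669490739654, 0.26140602081099523, 0.2619618739499482,
         0.25816082247564837, 0.25905551178812486, 0.25776353461608653,
         0.25866605672527904]))
    metrics, best = sweep(printed.get, list(printed))
    print(best)  # 2
```

With the maxBins output printed in the notebook, the lowest value (about 0.245) occurs at 2 bins. Inside the notebook itself, `evaluate` would be a wrapper such as `lambda b: evaluate_dt(traindata, {}, 'leastAbsoluteError', 10, 3, b)`.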