I need MATLAB code and a one-page short summary.
Gradient Descent for Univariate Linear Regression

In this lab you will implement the gradient descent algorithm to solve a linear regression problem.

Dataset

For this exercise we will use the student marks dataset, which contains the midterm marks and final marks of 100 students from the previous semester. Our goal is to create a linear regression model that predicts the final mark from the midterm mark. The dataset is available at:

https://raw.githubusercontent.com/tofighi/MachineLearning/master/datasets/student_marks.csv

Data preparation and EDA

The dataset in this lab contains just one feature, with no missing values. Take a look at the data and extract information such as the mean (μ) and standard deviation (σ).

Standardization

In machine learning we handle various types of data, e.g. audio signals and pixel values for image data, and this data can include multiple dimensions. Feature standardization makes the values of each feature in the data have zero mean (by subtracting the mean in the numerator) and unit variance. This method is widely used for normalization in many machine learning algorithms:

$x' = \frac{x - \bar{x}}{\sigma}$

Linear Regression

Let's suppose we want to model the above set of points with a line. To do this we'll use the standard line equation y = mx + b, where m is the line's slope and b is the line's y-intercept. To find the best line for our data, we need to find the best pair of slope m and y-intercept b values.

A standard approach to solving this type of problem is to define an error function (also called a cost function) that measures how "good" a given line is. This function takes an (m, b) pair and returns an error value based on how well the line fits our data. To compute this error for a given line, we iterate through each (x, y) point in our dataset and sum the squared distances between each point's y value and the candidate line's y value (computed at mx + b). It is conventional to square this distance to ensure that it is positive and to make our error function differentiable. Formally, this error function looks like:

Error (cost function)

$E = \mathrm{Error}(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2$

Lines that fit our data better (where "better" is defined by our error function) will result in lower error values. If we minimize this function, we will get the best line for our data. Since our error function has two parameters (m and b), we can visualize it as a two-dimensional surface. This is what it looks like for our dataset:

[Figure: the error surface E(m, b) for the student marks dataset.]

Each point in this two-dimensional space represents a line. The height of the function at each point is the error value for that line. You can see that some lines yield smaller error values than others (i.e., fit our data better). When we run gradient descent search, we will start from some location on this surface and move downhill to find the line with the lowest error.

To run gradient descent on this error function, we first need to compute its gradient. The gradient will act like a compass and always point us downhill. To compute it, we need to differentiate our error function. Since our function is defined by two parameters (m and b), we need to compute a partial derivative for each. These derivatives work out to be:

Gradient partial derivatives

$\frac{\partial E}{\partial m} = \frac{2}{N} \sum_{i=1}^{N} -x_i \bigl(y_i - (m x_i + b)\bigr)$

$\frac{\partial E}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} -\bigl(y_i - (m x_i + b)\bigr)$

We now have all the tools needed to run gradient descent. We can initialize our search at any pair of m and b values (i.e., any line) and let the gradient descent algorithm march downhill on our error function towards the best line.
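A minimal MATLAB sketch of the standardization step described above, assuming x is a vector of raw feature values; the function name standardize is my own choice, not prescribed by the lab:

```matlab
function xs = standardize(x)
    % Zero-mean, unit-variance scaling: x' = (x - mean(x)) / std(x)
    % ("standardize" is an assumed name; save as standardize.m)
    xs = (x - mean(x)) / std(x);
end
```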
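The error function and its partial derivatives also translate almost line for line into MATLAB. This is a sketch assuming x and y are column vectors of midterm and final marks; the function names computeError and computeGradient are hypothetical:

```matlab
function E = computeError(m, b, x, y)
    % Mean squared error of the line y = m*x + b over all N points:
    % E = (1/N) * sum_i (y_i - (m*x_i + b))^2
    E = mean((y - (m*x + b)).^2);
end

function [dm, db] = computeGradient(m, b, x, y)
    % Partial derivatives of E with respect to m and b
    N  = numel(x);
    r  = y - (m*x + b);           % residuals y_i - (m*x_i + b)
    dm = (2/N) * sum(-x .* r);    % dE/dm
    db = (2/N) * sum(-r);         % dE/db
end
```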
Each iteration will update m and b to a line that yields slightly lower error than the previous iteration. The direction to move in each iteration is computed from the two partial derivatives above, and we update the parameters using a learning rate (α):

Updating m and b using a learning rate α

$m_{\text{new}} = m_{\text{old}} - \alpha \frac{\partial E}{\partial m}$

$b_{\text{new}} = b_{\text{old}} - \alpha \frac{\partial E}{\partial b}$

The learning rate (α) controls how large a step we take downhill during each iteration. If we take too large a step, we may step over the minimum. However, if we take very small steps, many iterations will be required to reach the minimum.

Implementation steps

1. Use MATLAB or Python to implement functions for the above descriptions of standardization, the error (cost function), the gradient partial derivatives, and the update of m and b using a learning rate (one possible driver script is sketched at the end of this handout).
2. Initialize m = −0.5, b = 0, and α = 0.0001.
3. Show the data points from the student marks dataset in a figure where the x-axis is the midterm mark and the y-axis is the final mark.
4. Show the initial regression line (m = −0.5, b = 0) on the same figure.
5. Update b and m 100 times, and create another figure showing the regression line and the data points together.
6. Create a new graph showing the error at each iteration (from the initial point to iteration 100); the x-axis is the iteration number and the y-axis is the error.
7. Update b and m for 2000 iterations (each update is one iteration), and create another figure showing the regression line and the data points together.
8. Create a new graph showing the error at each iteration (from the initial point to iteration 2000); the x-axis is the iteration number and the y-axis is the error.
9. Use MATLAB's or Python's built-in linear regression (see the links below) to verify your results.
10. Perform the above steps once with standardized features and once without standardization.

Lab report

- Submit your Python Jupyter Notebook or MATLAB code in a zip file.
- Write a short report explaining your implementation steps, and investigate what happens if you change the learning rate to 0.1 (change it in your code, run it, and report the results). Also investigate the effect of standardization.
- No cover page is necessary.
- Submissions are accepted only by uploading to D2L; email submissions will be ignored.
- Late submissions receive a 10% penalty per day, up to 3 days. After 72 hours from the deadline, a mark of 0 will be assigned.

For verification:

https://www.mathworks.com/help/stats/regress.html
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
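As referenced in step 1 of the implementation steps, here is one possible MATLAB driver script tying the sketched helpers above together. It is a sketch under stated assumptions, not the prescribed solution: the CSV column order (midterm in column 1, final in column 2), the helper function names, and the plotting details are all my assumptions.

```matlab
% Hypothetical driver script (a sketch, not the prescribed solution).
% Assumes standardize / computeError / computeGradient from the
% sketches above are on the MATLAB path.

data = readmatrix('student_marks.csv');   % download from the URL above (R2019a+)
x = data(:, 1);                           % midterm marks (assumed column 1)
y = data(:, 2);                           % final marks   (assumed column 2)

x = standardize(x);                       % comment out to rerun without standardization

m = -0.5;  b = 0;  alpha = 0.0001;        % initial values given in the lab
numIters = 2000;                          % use 100 for the first experiment

E = zeros(numIters + 1, 1);
E(1) = computeError(m, b, x, y);          % error of the initial line (iteration 0)

for k = 1:numIters
    [dm, db] = computeGradient(m, b, x, y);
    m = m - alpha * dm;                   % m_new = m_old - alpha * dE/dm
    b = b - alpha * db;                   % b_new = b_old - alpha * dE/db
    E(k + 1) = computeError(m, b, x, y);
end

% Regression line and data points in one figure
figure; scatter(x, y); hold on;
plot(x, m*x + b, 'r-');
xlabel('Midterm mark'); ylabel('Final mark');

% Error at each iteration (iteration 0 is the initial line)
figure; plot(0:numIters, E);
xlabel('Iteration'); ylabel('Error');
```

Rerunning this sketch with alpha = 0.1, and with the standardize call commented out, covers the two investigations requested in the lab report; with unstandardized marks, a step as large as 0.1 will typically overshoot the minimum and make the error grow rather than shrink.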