Solutions should be complete descriptions and full sentences.. Ideally it would look like a report, with a clearly stated opening description and a clear conclusions. Calculations should be paired with comments. Answers should not be a list of 14 bullet points. Answer should be closer to a report or project.
COMP 3007 Final Take-home - Page 2 of 7 Due: TBD - See wall Part 1: The method of Gradient Descent 2.1 Goal The goal of this project is to find the zeroes of a certain class of function using (indirectly) an algorithm known as the method of gradient descent. 2.2 Background: Linear Algebra This project will require very basic linear algebra - no more than matrix multiplication. Consider the following matrix equation: f(x, y) = ( 3 1 2 −1 )( x y ) − ( −1 1 ) = ( 3x+ y + 1 2x− y − 1 ) If we define A := ( 3 1 2 −1 ) , ~x := ( x y ) , and ~b := ( −1 1 ) then we can write this expression as A~x−~b. Now observe that this expression is functionally equivalent to the following: f(x, y) = 〈3x+ y + 1, 2x− y − 1〉 Thus there are two ways to write the function f(x, y) - as a either a matrix equation, or as a typical multivariate function. Symbolab is a great tool if you are unfamiliar with matrix multiplication. Click here to see the matrix multiplication above done in symbolab. Also, while we can’t graph f(x, y) fully, we have some tools to analyze this function: link. The motivation behind this project comes from an important topic in linear algebra - solving systems of equations. For example, we might be interested in know for which x, y the following is true: 3x+ y = −1 2x− y = 1 This is equivalent to asking when f(x, y) = 〈0, 0〉 or equivalently A~x−~b = ~0. 2.3 Background: The Gradient Recall that for a multivariate function h(x, y) from Rn → R, we can construct the gradient of h, denoted ∇h. When n = 2, the gradient of h : R2 → R is: ∇h = ∇h(x, y) = 〈 ∂h ∂x , ∂h ∂y 〉 In particular, we saw this indirectly when taking taking a directional derivative of h(x, y). Notice that the output of h(x, y) is a single number, whereas ∇h is a vector with two components. There is another important fact about gradients which is illustrated in this link: https://www.desmos.com/calculator/sduapnehxo. This illustration shows the level sets (i.e. preimages!!) of the function h(x, y) = x2 + xy + y2. It also contains a movable point. I have also graphed the gradient of h(x, y) (which is a vector!) and translated it to this movable point. You can change which function we take the level sets of in the link, but you have to change both functions it in the top 2 lines. here is another example. https://www.symbolab.com/solver/step-by-step/%5Cbegin%7Bpmatrix%7D3%261%5C%5C%20%202%26-1%5Cend%7Bpmatrix%7D%5Cbegin%7Bpmatrix%7Dx%5C%5C%20%20y%5Cend%7Bpmatrix%7D-%5Cbegin%7Bpmatrix%7D-1%5C%5C%201%5Cend%7Bpmatrix%7D https://www.desmos.com/calculator/wovmgdw3om https://www.desmos.com/calculator/sduapnehxo https://www.desmos.com/calculator/azt4uaopy5 Rectangle Rectangle COMP 3007 Final Take-home - Page 3 of 7 Due: TBD - See wall 2.4 Gradient descent project example 2.4.1 Properties of the gradient (i) Experiment with the the desmos link given in section 2.3 above, trying out a variety of functions. For each function you choose, move the gradient along one or more of the level sets. What relationships do you observe between the gradient and the level sets of the function? (ii) Now, using any example you chose or the one provided, drag the gradient to a point on some level set, and zoom in to that point until the level set looks straight. What would happen on the surface above this point if we move slightly in the direction perpendicular to the gradient? (iii) Using the explanation you gave in the previous step, explain why the gradient points in the direction in which f(x, y) increases the fastest. 2.4.2 Set up (iv) Now recall the function f(x, y) given in section 2.2 above, f(x, y) = 〈3x+ y + 1, 2x− y − 1〉 = A~x−~b Why are we unable to take the gradient of f(x, y)? (v) Using the fact that the length of a vector is given by ||〈a, b〉|| = √ a2 + b2, compute ||f(x, y)||2 =: L(x, y). Note that your answer should be a function L : R2 → R. (vi) Now compute ∇||f(x, y)||2 = ∇L(x, y). Note that your answer should be a function ∇L : R2 → R2. (vii) Now compute AT (A~x−~b). Here AT is the transpose of A, so AT = ( 3 2 1 −1 ) . Again, if you are unfamiliar with matrix multiplication, the link above to symbolab is a great tool to carry out this calculation. Your answer should be a 2× 1 matrix (vector). (viii) Compare the results from the previous two steps. Use this to guess the general matrix formula for the gradient of the function ||A~x −~b||2. That is, complete the following equation: ∇||A~x−~b||2 =??? 2.4.3 Investigating the method of Gradient Descent (ix) Now consider the general formula for the method of gradient descent for a function g(x, y), or more formally g : R2 → R: ~an+1 = ~an − η∇g(~an) Rectangle Rectangle COMP 3007 Final Take-home - Page 4 of 7 Due: TBD - See wall Here η is a step size, and we must start with an initial location of ~a0 = ( x0 y0 ) . The method of gradient descent is an algorithm that generates a sequence of points that, under the appropriate conditions, will converge to the location of a local minimum of the function g. Give a short explanation for why this algorithm will find the location of the minimum. You might mention what problems one might encounter. For example, what can go wrong with η? Give as much detail as possible. It would be a great idea to include a picture to show what this process looks like. (x) Provided is Python code to implement the method of gradient descent for the function L(x, y), using the initial point ( .5 −.5 ) and a step size of η = .01. Summarize the results of the output. The code can be run here if you are not familiar with Python. (xi) Turning our attention back to the functions f(x, y) and L(x, y) = ||f(x, y)||2, explain why we can’t use the method of gradient descent on f(x, y), but why we can use it on L(x, y). (xii) Our goal with this project is to find the (x, y) pair(s) that make the function f(x, y) = 〈0, 0〉. Explain why the results of applying the method of gradient descent on L(x, y) gives us just that. (xiii) Use the results of the Python code to estimate a point (x, y) that makes f(x, y) = 0. Verify that your guess makes f(x, y) = 0. (xiv) (bonus) What would have changed if our function did not have a solution? What would have changed in the steps above and what would have remained the same? https://repl.it/join/sbpegyhx-brendtgerics Rectangle Rectangle COMP 3007 Final Take-home - Page 5 of 7 Due: TBD - See wall # Please note t ha t t h i s code i s minimal and i s on ly meant f o r demonstrat ion . #I t i s c o r r e c t mathemat ica l ly , but not what I would use in a r e a l s i t u a t i o n . # I t con ta ins no error checking , and the f unc t i on s cou ld ( shou ld ) #be wr i t t en in a g en e r a l i z e d f a sh i on . I encourage you to r ewr i t e t h i s #more f o rma l l y i f you are f am i l i a r wi th programming . import math import numpy as np def vecLength ( v ) : tempSum = 0.0 for x in v : tempSum += x [ 0 ]∗∗2 return math . s q r t (tempSum) def f (A, b , x , y ) : temp = np . matrix ( [ [ x ] , [ y ] ] ) return np . matrix (np . dot (A, temp)−b) def grad (A, b , x , y ) : vec = np . matrix ( [ [ x ] , [ y ] ] ) return np . matrix (2∗np . dot (np . t ranspose (A) , np . dot (A, vec)−b ) ) def main ( ) : eta = .01 A = np . matrix ( [ [ 3 , 1 ] , [ 2 , − 1 ] ] ) b = np . matrix ( [ [ − 1 ] , [ 1 ] ] ) x = −1 y = 1 step = 0 while vecLength ( grad (A, b , x , y ) ) > . 0 1 : s tep += 1 x = (np . matrix ( [ [ x ] , [ y ] ] ) − eta ∗grad (A, b , x , y ) ) . item (0) y = (np . matrix ( [ [ x ] , [ y ] ] ) − eta ∗grad (A, b , x , y ) ) . item (1) print ( ” (x , y ) = ( ” ,x , ” , ” ,y , ” ) ” ) print ( ”The magnitude squared ( l ength ) o f the output o f f ) p r i n t ( , i . e . , | | f (x , y ) | |ˆ2=L(x , y ) at s tep ” , step , ” i s ” , vecLength ( f (A, b , x , y ) )∗∗2) print ( ”The magnitude ( l ength ) o f the g rad i ent at s tep ” , step , ” i s ” , vecLength ( grad (A, b , x , y ) ) , ’ \n ’ ) main ( ) Rectangle