{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "c6ITdjT8wyGx"
},
"source": [
"Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n",
"\n",
"Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name and collaborators below:"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"id": "0G9m-EaXwyGx"
},
"outputs": [],
"source": [
"NAME = \"Yuxuan Guo\"\n",
"COLLABORATORS = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zo5reyZAwyGx"
},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "XMYdXcXEwyGx",
"nbgrader": {
"checksum": "63ab1fa9b6548ab3e3083741a0cdcddf",
"grade": false,
"grade_id": "cell-c2546ad8aa448250",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"# Homework 9: From Expressions to Optimization and Machine Learning\n",
"\n",
"Copyright Luca de Alfaro, 2019-2020. \n",
"License: [CC-BY-NC-ND](https://creativecommons.org/licenses/by-nc-nd/4.0/).\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "OrrPaQgBwyGx",
"nbgrader": {
"checksum": "91626c449f859e681933f439fd6f4f1f",
"grade": false,
"grade_id": "cell-5f3c38b7a6291dac",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Submission\n",
"\n",
"[Please submit to this Google Form](https://docs.google.com/forms/d/e/1FAIpQLSdYG6YhWxV5MIgd3yINEOcr2n4wsRv5F1UF2n6PVAIW6J-r9Q/viewform?usp=sf_link).\n",
"\n",
"Deadline: Thursday November 19, 11pm (check on Canvas for updated information)."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "RwH9WK5hwyGx",
"nbgrader": {
"checksum": "18f1a1706e1fc95532f0f0681a4720f5",
"grade": false,
"grade_id": "cell-dca15f51d39533b6",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Test Format\n",
"\n",
"The test contains 5 questions, for a total of 70 points. "
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "7B4SOFgTwyGx",
"nbgrader": {
"checksum": "a63b30f8d6cde2dad94f9a1d461fdd9e",
"grade": false,
"grade_id": "cell-9b6e16513f2274bb",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## ML in a nutshell\n",
"\n",
"Optimization, and machine learning, are intimately connected. At a very coarse level, ML works as follows. \n",
"\n",
"First, you come up somehow with a very complicated model $\\vec{y} = M(\\vec{x}, \\vec{\\theta})$, which computes an output $\\vec{y}$ as a function of an input $\\vec{x}$ and of a vector of parameters $\\vec{\\theta}$. In general, $\\vec{x}$, $\\vec{y}$, and $\\vec{\\theta}$ are vectors, as the model has multiple inputs, multiple outputs, and several parameters. The model $M$ needs to be complicated, because only complicated models can represent complicated phenomena; for instance, $M$ can be a multi-layer neural net with parameters $\\vec{\\theta} = [\\theta_1, \\ldots, \\theta_k]$, where $k$ is the number of parameters of the model. \n",
"\n",
"Second, you come up with a notion of _loss_ $L$, that is, how badly the model is doing. For instance, if you have a list of inputs $\\vec{x}_1, \\ldots, \\vec{x}_n$, and a set of desired outputs $\\vec{y}_1, \\ldots, \\vec{y}_m$, you can use as loss: \n",
"\n",
"$$\n",
"L(\\vec{\\theta}) = \\sum_{i=1}^n |\\!|\\vec{y}_i - M(\\vec{x}_i, \\vec{\\theta})|\\!| \\; .\n",
"$$\n",
"\n",
"Here, we wrote $L(\\vec{\\theta})$ because, once the inputs $\\vec{x}_1, \\ldots, \\vec{x}_n$ and the desired outputs $\\vec{y}_1, \\ldots, \\vec{y}_n$ are chosen, the loss $L$ depends on $\\vec{\\theta}$. \n",
"\n",
"Once the loss is chosen, you decrease it, by computing its _gradient_ with respect to $\\vec{\\theta}$. Remembering that $\\vec{\\theta} = [\\theta_1, \\ldots, \\theta_k]$,\n",
"\n",
"$$\n",
"\\nabla_\\vec{\\theta} L = \\left[ \\frac{\\partial L}{\\partial \\theta_1}, \\ldots, \n",
" \\frac{\\partial L}{\\partial \\theta_k} \\right] \\; .\n",
"$$\n",
"\n",
"The gradient is a vector that indicates how to tweak $\\vec{\\theta}$ to decrease the loss. You then choose a small _step size_ $\\delta$, and you update $\\vec{\\theta}$ via $\\vec{\\theta} := \\vec{\\theta} - \\delta \\nabla_\\vec{\\theta} L$. This makes the loss a little bit smaller, and the model a little bit better. If you repeat this step many times, the model will hopefully get (a good bit) better. \n",
"\n",
"## Autogradient\n",
"\n",
"The key to _pleasant_ ML is to focus on building the model $M$ in a way that is sufficiently expressive, and on choosing a loss $L$ that is helpful in guiding the optimization. The computation of the gradient is done automatically for you. This capability, called _autogradient_, is implemented in ML frameworks such as [Tensorflow](https://www.tensorflow.com), [Keras](https://keras.io), and [PyTorch](https://pytorch.org). \n",
"\n",
"It is possible to use these advanced ML libraries without ever knowing what is under the hood, and how autogradient works. Here, we will insted dive in, and implement autogradient. \n",
"\n",
"Building a model $M$ corresponds to building an expression with inputs $\\vec{x}$, $\\vec{\\theta}$. We will provide a representaton for expressions that enables both the calculation of the expression value, and the differentiation with respect to any of the inputs. This will enable us to implement autogradient. On the basis of this, we will be able to implement a simple ML framework. \n",
"\n",
"We say we, but we mean you. _You_ will implement it; we will just provide guidance."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "wuxU79AuwyGx",
"nbgrader": {
"checksum": "3cc78b9f5abc2f78826438d7f57a22f7",
"grade": false,
"grade_id": "cell-2a72086b83ae1784",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Expressions with autogradient\n",
"\n",
"Our main task will be to implement a class `Expr` that represents expressions with autogradient. \n",
"\n",
"### Implementing expressions\n",
"\n",
"We will have `Expr` be the abstract class of a generic expression, and `Plus`, `Multiply`, and so on, be derived classes representing expression with given top-level operators. The constructor takes the children node. The code for the constructor, and the code to create addition expressions, is as follows. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"deletable": false,
"editable": false,
"id": "n96_CWoewyGy",
"nbgrader": {
"checksum": "56716e3515f384375c85c413a6276bc1",
"grade": false,
"grade_id": "cell-64a6e1036e3a8a8d",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [],
"source": [
"class Expr(object):\n",
"\n",
" def __init__(self, *args):\n",
" \"\"\"Initializes an expression node, with a given list of children\n",
" expressions.\"\"\"\n",
" self.children = args\n",
" self.value = None # The value of the expression.\n",
" self.values = None # The values of the child expressions.\n",
"\n",
" def __add__(self, other):\n",
" \"\"\"Constructs the sum of two expressions.\"\"\"\n",
" return Plus(self, other)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "f2F8BRX6wyGy",
"nbgrader": {
"checksum": "96695aebd0b3794e8953533ba6f59042",
"grade": false,
"grade_id": "cell-d90eb0fa9962559c",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"The code for the `Plus` class, initially, is empty; no `Expr` methods are over-ridden."
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"deletable": false,
"editable": false,
"id": "J3eVYoYxwyGy",
"nbgrader": {
"checksum": "b9965ff93dc4824f9a9ce11d1e532aa8",
"grade": false,
"grade_id": "cell-80283830e855ede9",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [],
"source": [
"class Plus(Expr):\n",
" \"\"\"An addition expression.\"\"\"\n",
"\n",
" pass\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "TBO0wZg5wyGy",
"nbgrader": {
"checksum": "1af38fcf3d243ce198cc784bd8a9ffcb",
"grade": false,
"grade_id": "cell-9bb6b7656beaec0d",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"To construct expressions, we need one more thing. So far, if we write things like `2 + 3`, Python will just consider these as expressions involving numbers, and compute their value. To write _symbolic_ expressions, we need symbols, or variables. A variable is a type of expression that just contains a value as child, and that has an `assign` method to assign a value to the variable. The `assign` method can be used to modify the variable's content (without `assign`, our variables would be constants!). "
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"deletable": false,
"editable": false,
"id": "eYt7EcHmwyGy",
"nbgrader": {
"checksum": "77f8abba50f1e93af8a73de7bd4ce946",
"grade": false,
"grade_id": "cell-96c9be270a7105fd",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [],
"source": [
"class V(Expr):\n",
" \"\"\"This class represents a variable. The derivative rule corresponds\n",
" to d/dx x = 1, but note that it will not be called, since the children\n",
" of a variable are just numbers.\"\"\"\n",
"\n",
" def assign(self, v):\n",
" \"\"\"Assigns a value to the variable.\"\"\"\n",
" self.children = [v]\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "uztQHKWVwyGy",
"nbgrader": {
"checksum": "ad85ededb82da77030dbc0e4d60f57ed",
"grade": false,
"grade_id": "cell-47c1ead7e575e743",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"This suffices for creating expressions. Let's create one."
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"deletable": false,
"editable": false,
"id": "S3cdQSfnwyGy",
"nbgrader": {
"checksum": "9d611721b84b2da96ae0825eebdab99c",
"grade": false,
"grade_id": "cell-2c74422d022963de",
"locked": true,
"schema_version": 1,
"solution": false
},
"outputId": "beafb542-7c70-4ac8-dddc-c89a83b61298"
},
"outputs": [
{
"data": {
"text/plain": [
"<__main__.Plus at 0x2ae75c52da0>"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"e = V(3) + 4\n",
"e\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "sNSIJofWwyGy",
"nbgrader": {
"checksum": "f0025404593b37c016edd0487024821b",
"grade": false,
"grade_id": "cell-804c6ca38cd4f41e",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Computing the value of expressions\n",
"\n",
"We now have our first expression. To compute the expression value, we endow each expression with a method `op`, whose task is to compute the value `self.value` of the expression from the list of values `self.values` of the children. "
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "z7Vm_mB5wyGy",
"nbgrader": {
"checksum": "8bd87d4a757b18bfb06f5f2f054d6b58",
"grade": false,
"grade_id": "cell-b6aaba31539c892c",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"Let's implement the `compute` method for an expression. This method will: \n",
"\n",
"1. Loop over the children, and computes the list `self.values` of children values as follows: \n",
" * If the child is an expression (an instance of `Expr`), obtain its value by calling `compute` on it. \n",
" * If the child is not an instance of `Expr`, then the child must be a number, and we can use its value directly. \n",
"2. Call the method `op` of the expression, to compute `self.value` from `self.values`. \n",
"3. Return `self.value`. "
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
"deletable": false,
"editable": false,
"id": "tzL62aVKwyGy",
"nbgrader": {
"checksum": "445dae83548d8dbfbc688778c08c25c8",
"grade": false,
"grade_id": "cell-cd6e1cd62e23ccd6",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [],
"source": [
"class Expr(object):\n",
"\n",
" def __init__(self, *args):\n",
" \"\"\"Initializes an expression node, with a given list of children\n",
" expressions.\"\"\"\n",
" self.children = args\n",
" self.value = None # The value of the expression.\n",
" self.values = None # The values of the child expressions.\n",
" self.gradient = 0 # The value of the gradient.\n",
"\n",
" def op(self):\n",
" \"\"\"This operator must be implemented in subclasses; it should\n",
" compute self.value from self.values, thus implementing the\n",
" operator at the expression node.\"\"\"\n",
" raise NotImplementedError()\n",
"\n",
" def compute(self):\n",
" \"\"\"This method computes the value of the expression.\n",
" It first computes the value of the children expressions,\n",
" and then uses self.op to compute the value of the expression.\"\"\"\n",
" self.values = [e.compute() if isinstance(e, Expr) else e\n",
" for e in self.children]\n",
" # This computes self.value\n",
" self.op()\n",
" return self.value\n",
"\n",
" def __repr__(self):\n",
" return (\"%s:%r %r (g: %r)\" % (\n",
" self.__class__.__name__, self.children, self.value, self.gradient))\n",
"\n",
" # Expression constructors\n",
"\n",
" def __add__(self, other):\n",
" return Plus(self, other)\n",
"\n",
" def __radd__(self, other):\n",
" return Plus(self, other)\n",
"\n",
" def __sub__(self, other):\n",
" return Minus(self, other)\n",
"\n",
" def __rsub__(self, other):\n",
" return Minus(other, self)\n",
"\n",
" def __mul__(self, other):\n",
" return Multiply(self, other)\n",
"\n",
" def __rmul__(self, other):\n",
" return Multiply(other, self)\n",
"\n",
" def __truediv__(self, other):\n",
" return Divide(self, other)\n",
"\n",
" def __rtruediv__(self, other):\n",
" return Divide(other, self)\n",
"\n",
" def __pow__(self, other):\n",
" return Power(self, other)\n",
"\n",
" def __rpow__(self, other):\n",
" return Power(other, self)\n",
"\n",
" def __neg__(self):\n",
" return Negative(self)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "ZDhYKChqwyGy",
"nbgrader": {
"checksum": "3abcdb3ca5f0df789c6427e08a56383d",
"grade": false,
"grade_id": "cell-97de33b3461556ef",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"Let us give `op` for `Plus`, `Multiply`, and for variables via `V`, so you can see how it works."
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {
"deletable": false,
"editable": false,
"id": "RKHbN_mgwyGy",
"nbgrader": {
"checksum": "113dbe6e85da910fe6131731eb9fd7ff",
"grade": false,
"grade_id": "cell-1a1ed551215fcd72",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [],
"source": [
"class V(Expr):\n",
" \"\"\"This class represents a variable.\"\"\"\n",
"\n",
" def assign(self, v):\n",
" \"\"\"Assigns a value to the variable. Used to fit a model, so we\n",
" can assign the various input values to the variable.\"\"\"\n",
" self.children = [v]\n",
"\n",
" def op(self):\n",
" self.value = self.values[0]\n",
"\n",
" def __repr__(self):\n",
" return \"Variable: \" + str(self.children[0])\n",
"\n",
"\n",
"class Plus(Expr):\n",
"\n",
" def op(self):\n",
" self.value = self.values[0] + self.values[1]\n",
"\n",
"class Multiply(Expr):\n",
"\n",
" def op(self):\n",
" self.value = self.values[0] * self.values[1]\n"
]
},
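{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see `compute` in action, here is a quick sanity check of ours (not part of the assignment): values flow bottom-up, from the leaves through each node's `op`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A quick sanity check (ours, not graded): (2 * 3) + 4 = 10.\n",
"e = V(2) * V(3) + 4\n",
"print(e.compute())  # Expected: 10\n"
]
},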
{
"cell_type": "code",
"execution_count": 100,
"metadata": {
"deletable": false,
"editable": false,
"id": "0b2HUY3NwyGy",
"nbgrader": {
"checksum": "8ea2d3688e9f4369e1a47f11bb6b6a9f",
"grade": false,
"grade_id": "cell-7ef052946f3ba4c5",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [],
"source": [
"#@title Let's define testing helpers.\n",
"\n",
"def check_equal(x, y, msg=None):\n",
" if x != y:\n",
" if msg is None:\n",
" print(\"Error:\")\n",
" else:\n",
" print(\"Error in\", msg, \":\")\n",
" print(\" Your answer was:\", x)\n",
" print(\" Correct answer: \", y)\n",
" assert x == y, \"%r and %r are different\" % (x, y)\n",
"\n",
"def check_almost_equal(x, y, tolerance=0.001, msg=None):\n",
" if abs(x - y) > tolerance:\n",
" if msg is None:\n",
" print(\"Error:\")\n",
" else:\n",
" print(\"Error in\", msg, \":\")\n",
" print(\" Your answer was:\", x)\n",
" print(\" Correct answer: \", y)\n",
" assert abs(x - y) < tolerance, \"%r and %r are different\" % (x, y)\n",
"\n",
"def check_true(x, msg=None):\n",
" if not x:\n",
" if msg is None:\n",
" print(\"Error: assertion is false\")\n",
" else:\n",
" print(\"Error in\", msg, \": false\")\n",
" assert x\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "r1csvYprwyGy",
"nbgrader": {
"checksum": "0f81d6675a8de84e364059372e7d04cc",
"grade": false,
"grade_id": "cell-e3f69f9e4683f9ca",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Question 1\n",
"\n",
"We will have you implement also multiplication."
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {
"deletable": false,
"id": "aQIrzFeLwyGy",
"nbgrader": {
"checksum": "993abefb34333857c1c822ffbd00284f",
"grade": false,
"grade_id": "cell-3ee82ba0eb0fd073",
"locked": false,
"schema_version": 1,
"solution": true
}
},
"outputs": [],
"source": [
"### Exercise: Implement `Multiply`\n",
"\n",
"class Multiply(Expr):\n",
" \"\"\"A multiplication expression.\"\"\"\n",
" def op(self):\n",
" # YOUR CODE HERE\n",
" self.value = self.values[0] * self.values[1]\n",
" def derivate(self):\n",
" return [self.values[1], self.values[0]]"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {
"deletable": false,
"id": "DXUUaFVNwyGy",
"nbgrader": {
"checksum": "b6ce9c1e6fb8f3e3462f2b2621a35b4f",
"grade": false,
"grade_id": "cell-a68bf3e6ca6b9d7d",
"locked": false,
"schema_version": 1,
"solution": true
}
},
"outputs": [],
"source": [
"# Here you can write additional tests, to help debugging your code.\n",
"\n",
"# YOUR CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {
"deletable": false,
"editable": false,
"id": "MuulhHrgwyGy",
"nbgrader": {
"checksum": "c9e41e7030f6df0654ee03356eebc85f",
"grade": true,
"grade_id": "cell-65a16e57033790e7",
"locked": true,
"points": 5,
"schema_version": 1,
"solution": false
}
},
"outputs": [],
"source": [
"### 5 points: Tests for `Multiply`\n",
"\n",
"e = V(2) * 3\n",
"check_equal(e.compute(), 6)\n",
"\n",
"e = (V(2) + 3) * V(4)\n",
"check_equal(e.compute(), 20)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "A-2LqvBFwyGy",
"nbgrader": {
"checksum": "5b1e8ffba32aad3a851858059acb1727",
"grade": false,
"grade_id": "cell-5bd20aa6ea61312a",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Implementing autogradient\n",
"\n",
"The next step consists in implementing autogradient. Consider an expression $e = E(x_0, \\ldots, x_n)$, computed as function of its children expressions $x_0, \\ldots, x_n$. \n",
"\n",
"The goal of the autogradient computation is to accumulate, in each node of the expression, the gradient of the loss with respect to the node's value. For instance, if the gradient is $2$, we know that if we increase the value of the expression by $\\Delta$, then the value of the loss is increased by $2 \\Delta$. We accumulate the gradient in the field `self.gradient` of the expression. \n",
"\n",
"We say _accumulate_ the gradient, because we don't really do:\n",
"\n",
" self.gradient = ...\n",
"\n",
"Rather, we have a method `e.zero_gradient()` that sets all gradients to 0, and we then _add_ the gradient to this initial value of 0: \n",
"\n",
" self.gradient += ... \n",
"\n",
"We will explain later in detail why we do so; for the moment, just accept it. \n",
"\n",
"### Computaton of the gradient\n",
"\n",
"In the computation of the autogradient, the expression will receive as input the value $\\partial L / \\partial e$, where $L$ is the loss, and $e$ the value of the expression. The quantity $\\partial L / \\partial e$ is the gradient of the loss with respect to the expression value. \n",
"\n",
"With this input, the method `compute_gradient` of an Expr $e$ must do the following:\n",
"\n",
"* It must _add_ $\\partial L / \\partial e$ to the gradient `self.gradient` of the expression. \n",
"* It must call the method `derivate`. If the expression node has children $c_1, \\ldots, c_n$, the method `derivate` must return the list of derivatives of $e$ with respect to its children $x_1, \\ldots, x_n$:\n",
"\n",
" $$\n",
" \\left[ \\frac{\\partial e}{\\partial x_1}, \\ldots, \n",
" \\frac{\\partial e}{\\partial x_n} \\right]\n",
" $$\n",
"\n",
" The method `derivate` is implemented not directly in the class `Expr`, but for each specific operator, such as `Plus`, `Multiply`, etc: each operator knows how to compute the derivative of its output value with respect to its children (its arguments).\n",
"* It must propagate to each child $x_i$ the gradient \n",
"\n",
" $$\n",
" \\frac{\\partial L}{\\partial e} \\cdot \\frac{\\partial e}{\\partial x_i} \\; , \n",
" $$\n",
"\n",
" by calling the method `compute_gradient` of the child with argument $\\frac{\\partial L}{\\partial e} \\cdot \\frac{\\partial e}{\\partial x_i}$. "
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {
"deletable": false,
"editable": false,
"id": "QXG0EagrwyGy",
"nbgrader": {
"checksum": "d00aa523cacd07917d62cf725e98b347",
"grade": false,
"grade_id": "cell-6bbf652e5e782827",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [],
"source": [
"def expr_derivate(self):\n",
" \"\"\"This method computes the derivative of the operator at the expression\n",
" node. It needs to be implemented in derived classes, such as Plus,\n",
" Multiply, etc.\"\"\"\n",
" raise NotImplementedError()\n",
"\n",
"Expr.derivate = expr_derivate\n",
"\n",
"def expr_zero_gradient(self):\n",
" \"\"\"Sets the gradient to 0, recursively for this expression\n",
" and all its children.\"\"\"\n",
" self.gradient = 0\n",
" for e in self.children:\n",
" if isinstance(e, Expr):\n",
" e.zero_gradient()\n",
"\n",
"Expr.zero_gradient = expr_zero_gradient\n",
"\n",
"def expr_compute_gradient(self, de_loss_over_de_e=1):\n",
" \"\"\"Computes the gradient.\n",
" de_loss_over_de_e is the gradient of the output.\n",
" de_loss_over_de_e will be added to the gradient, and then\n",
" we call for each child the method compute_gradient,\n",
" with argument de_loss_over_de_e * d expression / d child.\n",
" The value d expression / d child is computed by self.derivate. \"\"\"\n",
" pass # We will write this later.\n",
"\n",
"Expr.compute_gradient = expr_compute_gradient\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "HkvCoUhpwyGy",
"nbgrader": {
"checksum": "3567a882d8279c93321e42cc8f559097",
"grade": false,
"grade_id": "cell-400c0a653dac915c",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"Let us endow our operators `V`, `Plus`, `Multiply` with the `derivate` method, so you can see how it works in practice."
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {
"deletable": false,
"editable": false,
"id": "ezp25ZgVwyGy",
"nbgrader": {
"checksum": "3ecd9ed5eb25cfafe4476dddc06e6287",
"grade": false,
"grade_id": "cell-9ce4f7e12b25abca",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [],
"source": [
"class V(Expr):\n",
" \"\"\"This class represents a variable. The derivative rule corresponds\n",
" to d/dx x = 1, but note that it will not be called, since the children\n",
" of a variable are just numbers.\"\"\"\n",
"\n",
" def assign(self, v):\n",
" \"\"\"Assigns a value to the variable. Used to fit a model, so we\n",
" can assign the various input values to the variable.\"\"\"\n",
" self.children = [v]\n",
"\n",
" def op(self):\n",
" self.value = self.values[0]\n",
"\n",
" def derivate(self):\n",
" return [1.] # This is not really used.\n",
"\n",
"\n",
"class Plus(Expr):\n",
" \"\"\"An addition expression. The derivative rule corresponds to\n",
" d/dx (x+y) = 1, d/dy (x+y) = 1\"\"\"\n",
"\n",
" def op(self):\n",
" self.value = self.values[0] + self.values[1]\n",
"\n",
" def derivate(self):\n",
" return [1., 1.]\n",
"\n",
"\n",
"class Multiply(Expr):\n",
" \"\"\"A multiplication expression. The derivative rule corresponds to\n",
" d/dx (xy) = y, d/dy(xy) = x\"\"\"\n",
"\n",
" def op(self):\n",
" self.value = self.values[0] * self.values[1]\n",
"\n",
" def derivate(self):\n",
" return [self.values[1], self.values[0]]\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "AsF1PJlSwyGy",
"nbgrader": {
"checksum": "57b4643288f6727e88a0f4c0e3056921",
"grade": false,
"grade_id": "cell-f0b4b32cdd79b712",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"Let us comment on some subtle points, before you get to work at implementing `compute_gradient`. \n",
"\n",
"**`zero_gradient`:** First, notice how in the implementation of `zero_gradient`, when we loop over the children, we check whether each children is an `Expr` via isinstance(e, Expr). In general, we have to remember that children can be either `Expr`, or simply numbers, and of course numbers do not have methods such as `zero_gradient` or `compute_gradient` implemented for them. \n",
"\n",
"**`derivate`:** Second, notice how `derivate` is not implemented in `Expr` directly, but rather, only in the derived classes such as `Plus`. The derivative of the expression with respect to its arguments depends on which function it is, obviously. \n",
"\n",
"For `Plus`, we have $e = x_0 + x_1$, and so: \n",
"\n",
"$$\n",
"\\frac{\\partial e}{\\partial x_0} = 1 \\qquad \\frac{\\partial e}{\\partial x_1} = 1 \\; ,\n",
"$$\n",
"\n",
"because $d(x+y)/dx = 1$. Hence, the `derivate` method of `Plus` returns\n",
"\n",
"$$\n",
"\\left[ \\frac{\\partial e}{\\partial x_0}, \\: \\frac{\\partial e}{\\partial x_1}\\right] \\; = \\; [1, 1] \\; .\n",
"$$\n",
"\n",
"For `Multiply`, we have $e = x_0 \\cdot x_1$, and so:\n",
"\n",
"$$\n",
"\\frac{\\partial e}{\\partial x_0} = x_1 \\qquad \\frac{\\partial e}{\\partial x_1} = x_0 \\; ,\n",
"$$\n",
"\n",
"because $d(xy)/dx = y$. Hence, the `derivate` method of `Plus` returns\n",
"\n",
"$$\n",
"\\left[ \\frac{\\partial e}{\\partial x_0}, \\: \\frac{\\partial e}{\\partial x_1}\\right] \\; = \\; [x_1, x_0] \\; .\n",
"$$\n",
"\n",
"\n",
"**Calling `compute` before `compute_gradient`:** Lastly, a very important point: when calling `compute_gradient`, we will assume that `compute` has _already_ been called. In this way, the value of the expression, and its children, are available for the computation of the gradient. Note how in `Multiply.derivate` we use these values in order to compute the partial derivatives. "
]
},
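{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the propagation concrete, here is a small worked example (ours). Take $e = x_0 \\cdot x_1$ with $x_0 = 3$ and $x_1 = 4$, and suppose $\\partial L / \\partial e = 1$ (the default argument). Then `derivate` returns $[4, 3]$, so `compute_gradient` propagates gradient $1 \\cdot 4 = 4$ to $x_0$, and $1 \\cdot 3 = 3$ to $x_1$: indeed, increasing $x_0$ by $\\Delta$ increases $e$, and hence $L$, by $4 \\Delta$."
]
},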
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"id": "WXzwuWUfwyGy",
"nbgrader": {
"checksum": "09b5f28a099687b50f4cb3a545264b4c",
"grade": false,
"grade_id": "cell-91edc38c6cf81c7a",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Question 2: Implementation of `compute_gradient`\n",
"\n",
"With these clarifications, we ask you to implement the `compute_gradient` method, which again must:\n",
"\n",
"* _add_ $\\partial L / \\partial e$ to the gradient `self.gradient` of the expression; \n",
"* compute $\\frac{\\partial e}{\\partial x_i}$ for each child $x_i$ by calling the method `derivate` of itself; \n",
"* propagate to each child $x_i$ the gradient $\\frac{\\partial L}{\\partial e} \\cdot \\frac{\\partial e}{\\partial x_i}$, by calling the method `compute_gradient` of the child $i$ with argument $\\frac{\\partial L}{\\partial e} \\cdot \\frac{\\partial e}{\\partial x_i}$.\n",
"\n",
"Note that you do not need to implement the method `derivate`. You just need to do the above. "
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {
"deletable": false,
"id": "7MXRU4w7wyGy",
"nbgrader": {
"checksum": "1035da9869ba6a4f2ee15a24c2594a1c",
"grade": false,
"grade_id": "cell-a283df46be3d85aa",
"locked": false,
"schema_version": 1,
"solution": true
}
},
"outputs": [],
"source": [
"### Exercise: Implementation of `compute_gradient`\n",
"\n",
"def expr_compute_gradient(self, de_loss_over_de_e=1):\n",
" \"\"\"Computes the gradient.\n",
" de_loss_over_de_e is the gradient of the output.\n",
" de_loss_over_de_e will be added to the gradient, and then\n",
" we call for each child the method compute_gradient,\n",
" with argument de_loss_over_de_e * d expression / d child.\n",
" The value d expression / d child is computed by self.derivate. \"\"\"\n",
" # YOUR CODE HERE\n",
" self.gradient += de_loss_over_de_e\n",
" dedi = self.derivate()\n",
" for i in self.children:\n",
" if (isinstance(i,Expr)):\n",
" i.compute_gradient( de_loss_over_de_e * dedi[self.children.index(i)] )\n",
"Expr.compute_gradient = expr_compute_gradient\n"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {
"deletable": false,
"id": "eSZlnIrPwyGy",
"nbgrader": {
"checksum": "3a6b7e80d91473bd04c655136f6e067a",
"grade": false,
"grade_id": "cell-51b917c41b68398a",
"locked": false,
"schema_version": 1,
"solution": true
}
},
"outputs": [],
"source": [
"# Here you can write additional tests, to help debugging your code.\n",
"\n",
"# YOUR CODE HERE"
]
},
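{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an informal check of ours (the graded tests follow), the gradient of a product should reproduce the `Multiply` derivative rule: for $y = x \\cdot z$, we expect $\\partial y / \\partial x = z$ and $\\partial y / \\partial z = x$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Informal check (ours, not graded): the gradient of a product.\n",
"vx = V(3)\n",
"vz = V(4)\n",
"y = vx * vz\n",
"check_equal(y.compute(), 12)\n",
"y.zero_gradient()\n",
"y.compute_gradient()\n",
"check_equal(vx.gradient, 4)\n",
"check_equal(vz.gradient, 3)\n"
]
},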
{
"cell_type": "code",
"execution_count": 108,
"metadata": {
"deletable": false,
"editable": false,
"id": "x7i0gNbpwyGy",
"nbgrader": {
"checksum": "98156e6ccc7f5c33ec28b884e2ca5a3a",
"grade": true,
"grade_id": "cell-7ac723567f722e2a",
"locked": true,
"points": 5,
"schema_version": 1,
"solution": false
}
},
"outputs": [],
"source": [
"### 5 Points: Tests for `compute_gradient` for a sum\n",
"\n",
"# First, the gradient of a sum.\n",
"vx = V(3)\n",
"vz = V(4)\n",
"y = vx + vz\n",
"check_equal(y.compute(), 7)\n",
"y.zero_gradient()\n",
"y.compute_gradient()\n",
"check_equal(vx.gradient, 1.)\n"
]
},
{
...