

# Deep Q-Networks in PyTorch




## ex6/requirements.txt

```
numpy
matplotlib
ipywidgets
jupyter
more_itertools
torch
tqdm
gym[box2d]
```

## ex6/notebook_dqn.ipynb

```javascript
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}
```

##### Before submitting, make sure you are adhering to the following rules, which help us grade your assignment. Assignments that do not adhere to these rules will be penalized.

 * Make sure your notebook only contains the exercises requested in the notebook, and that the written homework (if any) is delivered in class in printed form, i.e. do not submit your written homework as part of the notebook.
 * Make sure you are using Python 3. This notebook is already set up to use Python 3 (top right corner); do not change this.
 * If a method is provided with a specific signature, do not change the signature or the default values in any way.
 * Do not hard-code your solutions to the specific environments or hyper-parameters they are being run on; be as general as possible, which also means using ALL the arguments of the methods you are implementing.
 * Clean up your code before submitting, i.e. remove all print statements that you used to develop and debug (especially if they would clog up the interface by printing thousands of lines). Only output whatever is required by the exercise.
 * For technical reasons, plots should be contained in their own cells which run instantly, separate from cells which perform longer computations. This notebook is already formatted in such a way; please make sure this remains the case.
 * Make sure your notebook runs completely, from start to end, without raising any unintended errors. After you've made the last edit, use the option `Kernel -> Restart & Run All` to rerun the entire notebook. If you end up making ANY further edit, re-run everything again. Always assume any edit you make may have broken your code!

# Homework 6: Deep Q-Networks in PyTorch

In this assignment you will implement deep Q-learning using PyTorch.

```python
import copy
import math
import os
from collections import namedtuple

import gym
import ipywidgets as widgets
import matplotlib.pyplot as plt
import more_itertools as mitt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import tqdm

plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = [12, 4]
```

#### Environments

In this notebook, we will implement DQN and run it on four environments which have a continuous state space and a discrete action space. They are:

 * CartPole: balance a pole on a moving cart (https://gym.openai.com/envs/CartPole-v1/).
 * MountainCar: gather momentum to climb a hill (https://gym.openai.com/envs/MountainCar-v0/).
 * Acrobot: a two-link robot needs to swing and reach the area above a line (https://gym.openai.com/envs/Acrobot-v1/).
 * LunarLander: a spaceship needs to fly and land on the landing spot (https://gym.openai.com/envs/LunarLander-v2/).
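All four environments share gym's episode interface: `reset()` returns an initial state, and `step(action)` returns a `(state, reward, done, info)` tuple. As a minimal illustration of that interaction loop, here is a sketch using a hypothetical `StubEnv` stand-in (a made-up five-step environment, not one of the gym environments above), so it runs without gym installed:

```python
import random

class StubEnv:
    """Hypothetical stand-in for a gym environment: a fixed 5-step episode."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial state

    def step(self, action):
        self.t += 1
        state = float(self.t)
        reward = 1.0
        done = self.t >= 5  # episode terminates after 5 steps
        return state, reward, done, {}

def run_episode(env, policy):
    """Roll out one episode, returning the total undiscounted reward."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)
        state, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward

env = StubEnv()
print(run_episode(env, policy=lambda s: random.choice([0, 1])))  # -> 5.0
```

The same `while not done` loop structure appears in the `render` function below, except that the action comes from the policy being visualized and each step also calls `env.render()`.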
```python
envs = {
    'cartpole': gym.make('CartPole-v1'),
    'mountaincar': gym.make('MountainCar-v0'),
    'acrobot': gym.make('Acrobot-v1'),
    'lunarlander': gym.make('LunarLander-v2'),
}
```

These environments are particularly cool because they all include a graphical visualization which we can use to inspect our learned policies. Run the following cell and click the buttons to run the visualization with a random policy.

```python
def render(env, policy=None):
    """Graphically render an episode using the given policy.

    :param env: Gym environment
    :param policy: function which maps state to action. If None, the random
        policy is used.
    """
    if policy is None:

        def policy(state):
            return env.action_space.sample()

    state = env.reset()
    env.render()

    while True:
        action = policy(state)
        state, _, done, _ = env.step(action)
        env.render()

        if done:
            break

    env.close()
```

```python
# Jupyter UI

def button_callback(button):
    for b in buttons:
        b.disabled = True

    env = envs[button.description]
    render(env)
    env.close()

    for b in buttons:
        b.disabled = False

buttons = []
for env_id in envs.keys():
    button = widgets.Button(description=env_id)
    button.on_click(button_callback)
    buttons.append(button)

print('Click a button to run a random policy:')
widgets.HBox(buttons)
```

## Misc Utilities

Some are provided, some you should implement.
"\n", "### Smoothing\n", "\n", "In this homework, we'll do some plotting of noisy data, so here is the smoothing function which was also used in the previous homework." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def rolling_average(data, *, window_size):\n", " \"\"\"Smoothen the 1-d data array using a rollin average.\n", "\n", " Args:\n", " data: 1-d numpy.array\n", " window_size: size of the smoothing window\n", "\n", " Returns:\n", " smooth_data: a 1-d numpy.array with the same size as data\n", " \"\"\"\n", " assert
Nov 07, 2021