


  1. Complete the code for the Q-Training Algorithm section in your Jupyter Notebook. To successfully complete the code, you must do the following:


    1. Develop code that meets the given specifications:

      1. Complete the program for the intelligent agent so that it achieves its goal: The pirate should get the treasure.

      2. Apply a deep Q-learning algorithm to solve a pathfinding problem (a sketch of one possible training loop appears after this list).




    2. Create functional code that runs without error.


    3. Use industry-standard best practices, such as in-line comments, to enhance readability and maintainability.
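For reference only, the block below is a minimal sketch of the epsilon-greedy, experience-replay training loop that a deep Q-learning solution typically follows for this kind of maze problem. It relies on the provided GameExperience class (shown further down in this posting) and assumes a TreasureMaze-like environment object exposing free_cells, reset(), observe(), valid_actions(), and act() with the return values shown; those names and signatures are assumptions based on how the starter code is described, not confirmed by the excerpt, so treat this as an illustration rather than the required solution.

```python
import random
import numpy as np
from GameExperience import GameExperience

# Sketch of a deep Q-training loop (illustrative only).
# Assumes qmaze is a TreasureMaze-like environment and model is a compiled
# Keras network; the environment attribute/method names used here
# (free_cells, reset, observe, valid_actions, act, and the string statuses
# 'win'/'lose') are assumptions, not taken from the starter-code excerpt.
def qtrain_sketch(model, qmaze, n_epoch=15000, max_memory=1000, data_size=50, epsilon=0.1):
    experience = GameExperience(model, max_memory=max_memory)
    win_history = []  # 1 for a win, 0 for a loss; used for a simple stop rule

    for epoch in range(n_epoch):
        # Start each episode from a randomly chosen free cell.
        qmaze.reset(random.choice(qmaze.free_cells))
        envstate = qmaze.observe()  # flattened 1-D view of the maze
        game_over = False

        while not game_over:
            prev_envstate = envstate

            # Epsilon-greedy selection: explore occasionally, otherwise exploit
            # the current Q-value estimates.
            if np.random.rand() < epsilon:
                action = random.choice(qmaze.valid_actions())
            else:
                action = np.argmax(experience.predict(prev_envstate))

            # Apply the action and record the resulting transition.
            envstate, reward, game_status = qmaze.act(action)
            if game_status == 'win':
                win_history.append(1)
                game_over = True
            elif game_status == 'lose':
                win_history.append(0)
                game_over = True

            experience.remember([prev_envstate, action, reward, envstate, game_over])

            # Experience replay: refit the network on a small random batch of
            # stored episodes using the Bellman-style targets from get_data().
            inputs, targets = experience.get_data(data_size=data_size)
            model.fit(inputs, targets, epochs=8, batch_size=16, verbose=0)

        # Stop once the agent has won its last 100 games in a row.
        if len(win_history) > 100 and sum(win_history[-100:]) == 100:
            break
```

The early-stop check at the end is just one simple heuristic for deciding that the agent has learned a reliable path; a completed notebook would normally also report the win rate and elapsed time per epoch.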






TreasureHuntGame/GameExperience.py

```python
# This class stores the episodes, all the states that come in between the
# initial state and the terminal state. This is later used by the agent for
# learning by experience, called "exploration".
import numpy as np

class GameExperience(object):

    # model = neural network model
    # max_memory = number of episodes to keep in memory. The oldest episode is
    #              deleted to make room for a new episode.
    # discount = discount factor; determines the importance of future rewards
    #            vs. immediate rewards
    def __init__(self, model, max_memory=100, discount=0.95):
        self.model = model
        self.max_memory = max_memory
        self.discount = discount
        self.memory = list()
        self.num_actions = model.output_shape[-1]

    # Stores episodes in memory
    def remember(self, episode):
        # episode = [envstate, action, reward, envstate_next, game_over]
        # memory[i] = episode
        # envstate == flattened 1d maze cells info, including pirate cell (see method: observe)
        self.memory.append(episode)
        if len(self.memory) > self.max_memory:
            del self.memory[0]

    # Predicts the next action based on the current environment state
    def predict(self, envstate):
        return self.model.predict(envstate)[0]

    # Returns inputs and targets from memory, defaults to data size of 10
    def get_data(self, data_size=10):
        env_size = self.memory[0][0].shape[1]  # envstate 1d size (1st element of episode)
        mem_size = len(self.memory)
        data_size = min(mem_size, data_size)
        inputs = np.zeros((data_size, env_size))
        targets = np.zeros((data_size, self.num_actions))
        for i, j in enumerate(np.random.choice(range(mem_size), data_size, replace=False)):
            envstate, action, reward, envstate_next, game_over = self.memory[j]
            inputs[i] = envstate
            # There should be no target values for actions not taken.
            targets[i] = self.predict(envstate)
            # Q_sa = derived policy = max quality env/action = max_a' Q(s', a')
            Q_sa = np.max(self.predict(envstate_next))
            if game_over:
                targets[i, action] = reward
            else:
                # reward + gamma * max_a' Q(s', a')
                targets[i, action] = reward + self.discount * Q_sa
        return inputs, targets
```

TreasureHuntGame/TreasureHuntGame.ipynb (excerpt)

# Treasure Hunt Game Notebook

## Read and Review Your Starter Code

The theme of this project is a popular treasure hunt game in which the player needs to find the treasure before the pirate does. While you will not be developing the entire game, you will write the part of the game that represents the intelligent agent, which is a pirate in this case. The pirate will try to find the optimal path to the treasure using deep Q-learning.

You have been provided with two Python classes and this notebook to help you with this assignment. The first class, TreasureMaze.py, represents the environment, which includes a maze object defined as a matrix. The second class, GameExperience.py, stores the episodes, that is, all the states that come in between the initial state and the terminal state. This is later used by the agent for learning by experience, called "exploration". This notebook shows how to play a game. Your task is to complete the deep Q-learning implementation for which a skeleton implementation has been provided. The code blocks you will need to complete have #TODO as a header.

First, read and review the next few code and instruction blocks to understand the code that you have been given.

```python
from __future__ import print_function
import os, sys, time, datetime, json, random
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.layers.advanced_activations import PReLU
import matplotlib.pyplot as plt
from TreasureMaze import TreasureMaze
from GameExperience import GameExperience
%matplotlib inline
```

The following code block contains an 8x8 matrix that will be used as a maze object:

```python
maze = np.array([
    [ 1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.],
    [ 1.,  0.,  1.,  1.,  1.,  0.,  1.,  1.],
    [ 1.,  1.,  1.,  1.,  0.,  1.,  0.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.,  1.],
    [ 1.,  1.,  0.,  1.,  1.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  0.,  0.,  0.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  1.,  0.,  1.,  1.,  1.]
])
```

This helper function allows a visual representation of the maze object:

```python
def show(qmaze):
    plt.grid('on')
    nrows, ncols = qmaze.maze.shape
    ax = plt.gca()
    ax.set_xticks(np.arange(0.5, nrows, 1))
    ax.set_yticks(np.arange(0.5, ncols, 1))
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    canvas = np.copy(qmaze.maze)
    for row, col in qmaze.visited:
        canvas[row, col] = 0.6
    pirate_row, pirate_col, _ = qmaze.state
    canvas[pirate_row, pirate_col] = 0.3  # pirate cell
    canvas[nrows - 1, ncols - 1] = 0.9    # treasure cell
    img = plt.imshow(canvas, interpolation='none', cmap='gray')
    return img
```

The pirate agent can move in four directions: left, right, up, and down.

While the agent primarily learns by experience through exploitation, often, the agent can choose to explore the environment to find previously undiscovered paths.
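The import cell in the notebook excerpt above pulls in Sequential, Dense, and PReLU, which suggests that a small fully connected network maps the flattened maze state to one Q-value per action. As a non-authoritative sketch, one plausible way to build such a model is shown below; the layer sizes, the use of maze.size as the input width, and the choice of the Adam optimizer with mean squared error loss are assumptions for illustration, not part of the provided starter code.

```python
def build_model(maze):
    # Fully connected network: the input is the flattened maze (one value per
    # cell) and the output is one estimated Q-value for each of the four
    # actions (left, right, up, down).
    model = Sequential()
    model.add(Dense(maze.size, input_shape=(maze.size,)))
    model.add(PReLU())
    model.add(Dense(maze.size))
    model.add(PReLU())
    model.add(Dense(4))  # one output per action
    model.compile(optimizer='adam', loss='mse')
    return model
```

With the 8x8 maze defined above, maze.size is 64, so the network would take a 1x64 input vector, which matches the flattened envstate that GameExperience stores and replays.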