See attached file: Artificial Intelligence - Assessed Exercise (deadline: March 16, 5pm UK...

Question

See attached file.

Artificial Intelligence - Assessed Exercise (deadline: March 16, 5pm UK time)
(Workload: ∼ 20 hours; it should not take you much more than that!)

1 Problem Statement

Your task is to design, implement, evaluate and document three virtual agents which are (potentially) able to reach a goal in a custom Open AI Gym environment derived from FrozenLake. Thus, you will need to install and understand the workings of the Open AI Gym environment to be able to solve the task (hint: see AI (H) Lab 2).

The specific environment/problem under consideration is a grid-world with a starting position (S), obstacles (H) and a final goal (G). The task is to get from S to G. The environment is defined in uofgsocsai.py via the class LochLomondEnv, including documentation relating to the setting, specific states, parameters etc. An example of how to instantiate the environment and navigate it using random actions is provided in lochlomond_demo.py.

You must consider three agent types: a senseless/random agent, a simple agent and a reinforcement learning agent, based on the requirements listed in the following sections. Your agents will be tested against other instances of the same problem type, i.e., you cannot (successfully) hard-code the solution. You will have access to eight specific training instances of the environment, determined by the value of a single variable, problem_id.

Your agents and findings should be documented in a short (max 1000 word) technical report accompanied by the actual implementation/code and evaluation scripts.

2 Tasks

You should provide a design, implementation, evaluation and documentation of three different agents, each with its own set of specific requirements:

2.1 Task I: Senseless/Random agent [5% of marks]

You should provide a solution for an agent without sensory input which takes random actions.
You should initialise the environment as follows:

    env = LochLomondEnv(problem_id=problem_id, is_stochastic=True, reward_hole=0.0)

where you can use problem_id [0:7] to evaluate the performance over different instances of the same problem.

Purpose: This agent should be used as a naive baseline. Hint: A basic senseless/random agent is already partly provided in lochlomond_demo.py (albeit without computation of the performance measure...).

Requirements:
-Sensors: None (/random/full; it doesn't matter...)
-Action: Discrete
-State-space: No prior knowledge (i.e. it has not got a map)
-Rewards/goal: No prior knowledge (does not know where the goal is located)

Footnote links: https://gym.openai.com/envs/FrozenLake8x8-v0/ , https://gym.openai.com/

2.2 Task II: Simple Agent -- A* Search [10% of marks]

You should provide an agent based on a tree/graph-search and justify its use for solving the particular problem (using A* search with a suitable heuristic), assuming that the task environment is fully known and observable. You should initialise the environment as follows:

    env = LochLomondEnv(problem_id=[0-7], is_stochastic=False, reward_hole=0.0)

where you can use problem_id [0:7] to evaluate the performance over different instances of the same problem, and to fine-tune your agent to make sure it generalises. We recommend you use existing code (from e.g. the AIMA toolbox) to solve this part.

Purpose: This agent is used as an ideal baseline to find the optimal path under ideal circumstances. Hint: if you have attended the Lab sessions you will easily be able to reuse most of the code to solve this part (a parser that maps from env.desc to the lab 3 format will be made available).

Requirements:
-Sensors: Oracle (i.e. you're allowed to read the location of all objects in the environment, e.g. using env.desc)
-Actions: Discrete and noise-free.
-State-space: Fully observable a priori.
-Rewards/goal: Fully known a priori (you are allowed to inform the problem with the rewards and location of terminal states)

2.3 Key Task III: Reinforcement learning agent -- Q-Learning [85% of marks]

You should provide a reinforcement learning agent to solve the problem with minimal assumptions about the environment (see below). The choice of the RL agent is up to you, but we highly recommend using tabular Q-learning. Regardless, the choice should be well justified and meet the requirements listed below. You should instantiate the environment as follows:

    env = LochLomondEnv(problem_id=[0-7], is_stochastic=True, reward_hole=[YOUR-CHOICE])

where you can use problem_id [0:7] to evaluate the performance over different instances of the problem, and to fine-tune your agent to make sure it generalises. You are encouraged to write your own code, but use of 3rd party implementations (e.g. the AIMA toolbox) is allowed if you demonstrate a sufficient understanding of the algorithms in the accompanying documentation and cite appropriately.

Requirements:
-Sensors: Perfect information about the current state and thus available actions in that state; no prior knowledge about the state-space in general
-Action: Discrete and noisy. The requested action is only carried out correctly with a certain probability (as defined by uofgsocsai.py).
-State-space: No prior knowledge, but partially observable/learnable via the sensors/actions.
-Rewards: No prior knowledge, but partially observable via sensors/actions.

Notice: You can restart and replay the same instance of the problem multiple times and maintain the knowledge obtained across repetitions/episodes. The reward should in this case be reported as:
a) the average across all restarts/repetitions, and
b) the single best policy (typically after a long learning phase).
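To make the tabular Q-learning recommendation above concrete, here is a minimal sketch under stated assumptions. LochLomondEnv is not reproduced here, so the code uses a hypothetical stand-in, `TinyLake`: a 4x4 grid with noisy actions whose layout, slip probability and `reset()`/`step()` signatures are all invented for illustration and may differ from uofgsocsai.py. The hyperparameters (`alpha`, `gamma`, `epsilon`, episode count) are arbitrary starting values, not tuned ones.

```python
import random

# Hypothetical stand-in for LochLomondEnv: 4x4 grid, start at 0, goal at 15.
HOLES, GOAL, SIZE = {5, 7, 11, 12}, 15, 4
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up as (d_row, d_col)


class TinyLake:
    """Toy grid-world with noisy actions (illustrative only)."""

    def __init__(self, slip=0.2, reward_hole=0.0, seed=0):
        self.slip, self.reward_hole = slip, reward_hole
        self.rng = random.Random(seed)
        self.n_states, self.n_actions = SIZE * SIZE, len(MOVES)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # With probability `slip` the requested action is replaced by a random one.
        if self.rng.random() < self.slip:
            action = self.rng.randrange(self.n_actions)
        row, col = divmod(self.state, SIZE)
        d_row, d_col = MOVES[action]
        row = min(max(row + d_row, 0), SIZE - 1)  # walls: stay inside the grid
        col = min(max(col + d_col, 0), SIZE - 1)
        self.state = row * SIZE + col
        if self.state == GOAL:
            return self.state, 1.0, True
        if self.state in HOLES:
            return self.state, self.reward_hole, True
        return self.state, 0.0, False


def q_learning(env, episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1,
               max_steps=100, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    returns = []
    for _ in range(episodes):
        state, total = env.reset(), 0.0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)  # explore
            else:
                best = max(q[state])                   # exploit, random tie-break
                action = rng.choice(
                    [a for a, v in enumerate(q[state]) if v == best])
            nxt, reward, done = env.step(action)
            # Q-learning target: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state, total = nxt, total + reward
            if done:
                break
        returns.append(total)
    return q, returns


q_table, episode_returns = q_learning(TinyLake())
print("average return, all episodes:", sum(episode_returns) / len(episode_returns))
print("average return, last 100 episodes:", sum(episode_returns[-100:]) / 100)
```

The knowledge kept across episodes is the table `q`. The first printed figure corresponds to report item a); for item b) one would additionally evaluate the greedy policy read off `q_table` with exploration switched off.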
3 Submission

You should include the following three items in your submission:

3.1 Implementation & Code

Your implementation, containing the three different agents (along with any dependencies, except the Open AI Gym), should be uploaded to Moodle as a zip-file containing the source code. You must provide three separate and executable Python scripts/programs named run_random.py, run_simple.py and run_rl.py, which take the problem_id as their only required argument. Each script/program should include training/learning phases (including possible repetitions/episodes of the problem) and output a text file (named after the agent, e.g. "random") with any relevant information. Hint: A template will be provided via Moodle.

3.2 Experiment & Evaluation

An important aspect of RL is assessing and comparing the performance of agents and different policies. To document the behavior of your agents you should design a suitable set of (computer) experiments which produces a relevant set of graphs/tables to document the behavior of your RL agent (e.g. average performance measure vs number of episodes, etc.) and compares its performance against the baselines. The evaluation strategy should be implemented in a single Python script (a wrapper), run_eval.py, which runs your entire evaluation; that is, it should call your agents, collate the results and produce the figures/tables you have included in your report. The run_eval.py script should be submitted via Moodle alongside your implementation.

3.3 Short Report (1000 words)

You should document your work and results in a short technical report (aim for 1000 words, excluding figures, tables, captions, references and appendices). Appendices may be used to provide extra information to support the data and arguments in the main document, e.g., detailed simulation results, but should not provide crucial information required to understand the principle of the solution and the outcome.
You can include as many references as you see fit.

Requirements:
- Focus is on an efficient, yet accurate and precise, description of the implemented agents and learning algorithms in the given context of the task. We mark your understanding of these agents/learning algorithms and want to stress the importance of their evaluation.

The report should be submitted via Moodle as a pdf file alongside your implementation and evaluation scripts.

4 Marking Scheme

The assessment is based on the degree to which your submission (implementation, evaluation script and report) concisely, correctly and completely addresses the following aspects:

Analysis [15%]
Introduction/motivation and correct PEAS analysis (including task environment characterisation).

Method/design [15%]
Presentation of relevant theory and methods (for all your agents) with proper use of citations and justification for choosing specific methods (referencing your analysis).

Implementation [25%]
The code for all the agents (not in the report!) should be well documented, follow best practices in software development and follow the outlined naming convention. The report must contain a presentation of relevant aspects of the implementation.

Experiments / Evaluation
[20%] Evaluation script/program to reproduce the results (i.e. graphs/tables), adhering to the requirements.
[20%] Relevant presentation of the evaluation strategy, metrics and the obtained simulation results. A suitable presentation and comparison of the performance of the agent with other agents as evaluated across a suitable number of problem variations (e.g. using graphs/tables).

Discussion and conclusion [5%]
- Including critical reflections on the outcome/results

The weighting of the senseless, simple and RL agent, for all marking criteria, is 5, 10 and 85%, respectively.
So allocate your own time accordingly and remember that this exercise counts for 25% of your overall grade in AI, so it is very important.

Evaluation of Agents – Tips you should consider

Issue: A large choice of different models with many algorithmic choices to be made (even for a simple problem)...
For example:
• Which method/model?
• Parameters change the results: which setting is better?
• Initialisation changes the result (e.g. randomness in the environment): which is better?
• The number of iterations changes the results: has it converged?

For standard search-based methods, theoretical computer science has been able to prove complexity and optimality for many (e.g. A-star) under well-defined conditions.

In dealing with real-world AI problems we are often left with asymptotic bounds depending on some property of the problem. This effectively means that we often need to do a large amount of empirical evaluation (we can still do theoretical stuff asymptotically):

Aspects
• Completeness
  ▪ Will the procedure find a solution if there is one?
• Optimality
  ▪ Does it find the optimal solution, i.e. policy?
• Complexity
  ▪ Space: How much memory does the procedure require
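The points above about random initialisation and the number of iterations can be handled empirically by repeating each experiment and aggregating the results. The sketch below is one possible way to do that, not a prescribed method: the helper names `moving_average` and `summarise_runs` are hypothetical, and the `runs` data are made-up placeholder values, not real results from any agent.

```python
from statistics import mean, stdev


def moving_average(values, window):
    """Mean of each consecutive window; yields len(values) - window + 1 points."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def summarise_runs(runs):
    """Per-episode mean and sample standard deviation across equal-length runs."""
    per_episode = list(zip(*runs))          # regroup: one tuple per episode index
    means = [mean(ep) for ep in per_episode]
    stds = [stdev(ep) for ep in per_episode]
    return means, stds


# Placeholder 0/1 episode returns from three hypothetical repetitions
# (for example, different random seeds). These are NOT measured results.
runs = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]

means, stds = summarise_runs(runs)
smoothed = moving_average(means, window=2)
print("mean return per episode:", means)
print("std across runs per episode:", stds)
print("smoothed learning curve:", smoothed)
```

A flattening smoothed curve is one informal sign of convergence, and the spread across runs indicates how much of a difference between two parameter settings could be explained by randomness alone.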
