Please open the PDF file named "CS380 Assignment 4.pdf" for the assignment instructions and details. This assignment must be coded in Python. A zip folder with the Python base code is provided as well; you will need to build on it according to the assignment instructions.
Any code written must be original and will be checked with MOSS and other plagiarism detectors. Also make sure that any libraries you import are allowed under the assignment instructions.
CS380: Artificial Intelligence
Assignment 4: Reinforcement Learning

Frogger (20 pts)

Frogger [https://en.wikipedia.org/wiki/Frogger] is an arcade game from the 1980s that has been ported to many different home gaming systems and is still available to play online (e.g., https://froggerclassic.appspot.com). In the game, you control a frog that is trying to cross a busy road and a busy river to arrive at a home row at the top of the screen. On the road, the frog must avoid the moving cars; at the river, the frog must jump between the floating logs and turtles to avoid falling into the water. There is a 30-second timer during which the frog must complete its journey; otherwise it dies and is regenerated at the bottom of the screen. Each time the frog reaches the home row, it is awarded a number of points proportional to the time left on the timer.

In this assignment, you will develop a reinforcement-learning (RL) agent that plays Frogger. The agent will play within a simplified version of the game that runs in Python and includes many aspects of the original game. Here is a sample screenshot:

[Sample screenshot of the simplified Frogger game.]

This assignment, like others for this class, is an individual assignment that will be tested and graded in the usual way. As a bonus, we also plan to conduct a mini-tournament between all the submitted Frogger agents; the details will be announced later, but please keep in mind that your code will be embedded in this tournament, and thus all students must carefully follow the guidelines below to ensure that their agent can compete successfully in the tournament.

Setup

To start the assignment, please download the given code, which includes two main components: the frogger module that implements the game itself, and the agent module which houses the agent code. At the top level, there is a main.py file that accepts command-line arguments and runs the simulation.

The game module uses the Python Arcade library [https://arcade.academy], which provides a base engine on which to build the game itself. For this reason, you will first need to install this library on your machine (or, alternatively, a virtual machine if you prefer). This should be straightforward by entering:

> pip3 install arcade==2.3.15

You can test out the game by entering:

> python3 main.py --player=human

You should be able to control the frog with the arrow keys, and eventually quit the game by pressing 'q' or the escape key.

While we are hopeful that everyone will be able to run the full system with graphics, Python Arcade can be dependent on your specific setup. If you have issues, you might check their website [https://arcade.academy] for further installation instructions specific to your machine. Nevertheless, if you do not get the graphics working in the end, there is a backup plan: our game provides a way to print the state to the terminal with no separate window or graphics. To use text-only output, simply include the --output=text flag in the command line when running an agent, for example:

> python3 main.py --player=agent --steps=10 --output=text

(Unfortunately, the text-only output version of the game runs only with agent control, and does not allow for human control via the keyboard.) The full suite of command-line options will be detailed later in this document. Note that, even if you can run the graphics version, you may choose to use text-only output for faster training, since the text output is noticeably faster than the graphics when running at full speed.

For this assignment, all your own code and files must be enclosed within your agent module; you should not modify main.py or the frogger module (except if you need text output as noted above). As before, except for the Python Arcade library (and its dependencies), you may only use built-in standard libraries (e.g., math, random, etc.); you may NOT directly use any external libraries or packages, including those installed with Arcade (e.g., numpy).
Frogger Module

The frogger module includes all the code and other files (e.g., sprite images) needed to run the game. In general, you will not need to understand the details of this module; in fact, in many applications of AI/ML, the details of the underlying environment are unknown, opaque, or hidden behind a black-box gaming engine. However, it is important to understand the details of the API: the state information sent from the game to your agent, and the action information your agent needs to send back to the game.

When communicating with the agent, the API sends the game state to the agent with all the necessary information encoded in a string. The screen itself is encoded as a grid of characters where each character denotes a different texture. For example, the normal screen provided in the game can be represented with this string:

'EEEEEEEEEEEEEEEE|~~~KLLLM~~~KLLLM~~|TT~~TTTT~~~TT~~~~~|LLM~~~~KLLLLM~~~KL|SSSSSSSSSSSSSSSSSS|-----DD--------DD-|--AAA-------AAA---|------C---------C-|SSSSSSSSSSSSSSSSSS'

Here, the various characters represent different objects:

• E: grass in the home row (at the top of the screen)
• ~: water
• K, L, M: start, middle, end of a log
• T: turtle
• S: purple middle divider
• A, C, D: cars of different types
• -: road

In addition to the screen, the API string also includes a few other pieces of information: whether the frog has reached the home row; if it has, the score given for reaching the goal; and whether the timer has run out. You will not need to code the details of parsing this API string, however; this is provided for you in the agent base code, as described below.

Agent Module

The primary task in this assignment is to build out your agent module to implement an RL agent that plays Frogger. As a starting point, the given code includes two files, state.py and agent.py. The given code includes a sample agent that simply makes random moves. You can test out this agent by entering:

> python3 main.py --player=agent --steps=10

This command will run 10 steps of the simulation with the random agent and then terminate.

The State class in state.py parses the state API string and provides a number of useful variables: frog_x and frog_y for the position of the frog, at_goal which indicates whether the frog has reached the home row, score which gives the score achieved if at the goal, and is_done which indicates whether the frog is done for this episode (i.e., it either reached the goal or failed to reach the home row before time expired). There are additional methods to check for legal positions on the screen and to get the cell character at a position on the screen. You should not need to modify state.py; most importantly, since your agent will eventually be tied into the tournament code, you must ensure that the API string parsing remains the same so that your agent will work within the tournament.

The file agent.py is where you will need to build your RL agent, specifically an agent that implements Q-learning. The details are provided in the sections below.

State Representation

The Q_State class provided extends the basic State class to add additional functionality needed for Q-learning. First, the reward() method shows how we might calculate a reward for a given state, giving state.score as a reward if the frog has reached the goal, and a negative reward if the state is "done" and the timer has run out.

The _compute_key() method in the Q_State class is critical to learning: it reduces the complex game state to something manageable that the agent can learn and generalize from. In essence, if the Q-table is implemented as a Python dictionary, this key is the string that will serve to index the state into this dictionary. There are many possible options for this key. On the one hand, if the key represented the entire screen, Q-learning would need to learn actions for every possible screen, which is clearly not feasible. On the other hand, if the key represented only (for example) the cell in front of the frog, it would be missing a great deal of context. The sample method provided uses the three cells in front of the frog.
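To make the key idea concrete, here is a minimal sketch of what such a _compute_key() might look like. The helper names get(x, y) and is_legal(x, y), and the convention that smaller y values are closer to the top of the screen, are assumptions for illustration; adapt the sketch to the actual helpers provided in state.py.

    # inside the Q_State class (a sketch, not the provided implementation)
    def _compute_key(self):
        """Reduce the full game state to a small, generalizable key."""
        cells = []
        for dx in (-1, 0, 1):   # cells left-front, front, right-front of the frog
            x, y = self.frog_x + dx, self.frog_y - 1   # assumes smaller y is "up"
            cells.append(self.get(x, y) if self.is_legal(x, y) else '#')
        return ''.join(cells)   # e.g., '~K~' becomes the Q-table index

A key like this lets the agent generalize: every state in which the frog faces the same three cells maps to the same Q-table entry, no matter where the frog is on the screen.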
You should experiment with better options after you have implemented the basic Q-learning algorithm below.

Q-Learning

The Agent class in agent.py is where you will implement the actual learning algorithm. First, you should consider how to implement the Q-table itself, likely as a dictionary over state keys and secondarily over possible actions. Then, you will need to implement Q-learning within the choose_action() method. This method (which must keep this name for consistency with other agents) is the central method that receives the current state string and returns the action to be taken. It should construct an internal Q_State using the given state string, and then either train the Q-table (if training is on) or simply choose an action (if training is off). The possible actions, as included in state.py, are simply one of ['u', 'd', 'l', 'r', '_'] (for up, down, left, right, and none, respectively).

The critical piece of Q-learning to be implemented is the equation seen in lecture:

Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_a′ Q(s′, a′)]

The choose_action() method will be called for each update of the state, and thus you will need to keep track of the previous state and action in order to properly implement this equation. Also, please think about the "exploration vs. exploitation" aspect of this algorithm discussed in class: while the agent may want to take the best action most of the time, it may be a good idea to allow for some probability of performing some other action (e.g., a random action) to allow the agent to explore unseen states.

The Agent class also provides methods for saving and loading the Q-table. For simplicity, we recommend saving the Q-table on every step of the simulation, ensuring that all training is saved incrementally. You can also back up your Q-tables in these files, and choose the best trained agent as your final submitted agent.
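To tie these pieces together, here is a hedged sketch of how choose_action() might implement the update above with epsilon-greedy exploration. The names self.q, self.training, self.prev_state, self.prev_action, and state.key are assumptions layered on top of the given skeleton, not part of the provided API; the constants are arbitrary starting points to tune.

    import random

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate

    # inside your Agent class in agent.py (a sketch under the assumptions above)
    def choose_action(self, state_string):
        """Tabular Q-learning with epsilon-greedy action selection."""
        state = Q_State(state_string)
        actions = ['u', 'd', 'l', 'r', '_']
        # self.q is assumed to be a dict of {state_key: {action: value}}
        q_row = self.q.setdefault(state.key, {a: 0.0 for a in actions})

        # Update the previous (state, action) pair toward the observed reward
        # plus the discounted best value of the current state:
        # Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma max_a' Q(s',a')]
        if self.training and self.prev_state is not None:
            prev_row = self.q[self.prev_state.key]
            old = prev_row[self.prev_action]
            target = state.reward() + GAMMA * max(q_row.values())
            prev_row[self.prev_action] = (1 - ALPHA) * old + ALPHA * target

        # Exploration vs. exploitation: mostly exploit, occasionally explore.
        if self.training and random.random() < EPSILON:
            action = random.choice(actions)
        else:
            action = max(q_row, key=q_row.get)

        # Remember this step for the next update; reset at episode boundaries
        # so updates never cross from one episode into the next.
        if state.is_done:
            self.prev_state, self.prev_action = None, None
        else:
            self.prev_state, self.prev_action = state, action
        return action

In a real implementation you would also call the provided save method at the end of each step, per the incremental-saving recommendation above.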
Command-Line Interface

The code provided includes a command-line interface with many options. The general command is:

> python3 main.py [options]

The possible options are as follows (a combined usage example follows the list):

• --player=<player>: if player is 'human', the game runs with human input; otherwise, player should be 'agent' (default) and the game will run using the agent
• --screen=<screen>: screen should be one of 'medium' (default), 'hard', or 'easy' to denote environments of different difficulty
• --steps=<steps>: steps indicates the number of simulation steps to take; if not provided, the simulation will run until manually terminated (default)
• --train=<filename>: train provides the filename of the training file that contains the Q-table, to be saved in the train/ subdirectory; if not provided, the simulation runs without training (default)
• --speed=<speed>: speed can be either 'slow' for real-time simulation, or 'fast' for training purposes
• --restart=<row>: restart is a number between 1 and 8 indicating the row at which the simulation should restart the frog for each episode; this is useful for keeping the frog near the home row during initial training
• --output=<output>: if output is 'text', the game state is printed to the terminal with no graphics window, as described in the Setup section above
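Putting these together, a plausible training run on the easy screen at full speed with text output might look like the following (the Q-table filename 'my_q' is just an illustration; choose your own):

> python3 main.py --player=agent --screen=easy --train=my_q --speed=fast --output=text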