5. Markov Decision Process

Consider the following scenario. You are reading email, and you get an offer from the CEO of Marsomania Ltd., asking you to consider investing in an expedition that plans to dig for gold on Mars. You can either choose to invest, with the prospect of either getting money or being fooled, or you can instead ignore your emails and go to a party. Of course, your first thought is to model this as a Markov Decision Process, and you come up with the MDP as follows.

[Figure: MDP diagram with four states: Read Emails (E, R = 0), Get money (M, R = 10000), Be fooled (F, R = -100), and Have fun (H, R = 1). From E, the action Invest leads probabilistically to M or to F (the visible label .2 marks the transition into F), and the action Go to party leads to H. States M and H offer the action Stay; state F offers Go back, which returns to E with probability 1.]

Your MDP has four states: Read emails (E), Get money (M), Be fooled (F), or Have fun (H). The actions are denoted by fat arrows; the (probabilistic) transitions are indicated by thin arrows, annotated with the transition probabilities. The rewards depend only on the state; for example, the reward in state E is 0, and in state M it is 10,000.

What are the possible policies in this MDP?
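A (deterministic) policy assigns exactly one action to each state, so the set of policies is the Cartesian product of the per-state action sets. A minimal sketch that enumerates them, assuming the action sets read off the partially legible figure (only state E offers a real choice):

```python
from itertools import product

# Per-state action sets as read off the figure (an assumption where
# the diagram is ambiguous): only in E is there more than one option.
actions = {
    "E": ["Invest", "Go to party"],
    "M": ["Stay"],
    "F": ["Go back"],
    "H": ["Stay"],
}

def enumerate_policies(actions):
    """Yield every deterministic policy as a dict state -> action."""
    states = sorted(actions)
    for choice in product(*(actions[s] for s in states)):
        yield dict(zip(states, choice))

policies = list(enumerate_policies(actions))
for p in policies:
    print(p)
# Under the assumed action sets there are exactly 2 policies,
# differing only in the action chosen in state E.
```

Under these assumptions the answer is simply: pick Invest in E, or pick Go to party in E; all other states have a single forced action.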

Jun 11, 2022