True/False: For a given Markov decision process, in order to extract the optimal policy π∗, it is sufficient to know the transition function T(s,a,s′) and optimal value function V∗. If false, explain why this is false. If true, explain how to extract the policy.
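
For reference, policy extraction from V∗ is usually written as a one-step lookahead: π∗(s) = argmax_a Σ_s′ T(s,a,s′) [R(s,a,s′) + γ V∗(s′)]. The sketch below is a minimal illustration of that lookahead on a hypothetical two-state MDP. The reward function R, the discount factor gamma, and every state and action name are assumptions that do not appear in the question. They are included because, in the general case, the lookahead cannot be evaluated from T and V∗ alone.

```python
def extract_policy(states, actions, T, R, V, gamma):
    """One-step lookahead: in each state, pick the action with the highest expected value."""
    policy = {}
    for s in states:
        def q_value(a, s=s):
            # Expected immediate reward plus discounted value of the successor state.
            return sum(
                T.get((s, a, s2), 0.0) * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                for s2 in states
            )
        policy[s] = max(actions, key=q_value)
    return policy


# Hypothetical deterministic MDP (all names and numbers are illustrative assumptions).
states = ["A", "B"]
actions = ["stay", "move"]
T = {
    ("A", "stay", "A"): 1.0,
    ("A", "move", "B"): 1.0,
    ("B", "stay", "B"): 1.0,
    ("B", "move", "A"): 1.0,
}
R = {
    ("A", "stay", "A"): 0.0,
    ("A", "move", "B"): 1.0,
    ("B", "stay", "B"): 2.0,
    ("B", "move", "A"): 0.0,
}
gamma = 0.5

# Optimal values for this MDP: V*(B) = 2 / (1 - 0.5) = 4, V*(A) = 1 + 0.5 * 4 = 3.
V_star = {"A": 3.0, "B": 4.0}

policy = extract_policy(states, actions, T, R, V_star, gamma)
print(policy)  # {'A': 'move', 'B': 'stay'}
```

Note that `extract_policy` takes `R` and `gamma` as arguments alongside `T` and `V`. If the reward depended only on the current state, it would add the same constant to every action's score and drop out of the argmax, but that is a special case rather than something the question states.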