Nov 13, 2006
-------------
- Recap
  - finding values given a policy (see the policy-evaluation sketch below)
- Formulating games as RL problems
  - formulating Towers of Hanoi as an RL problem (see the state/action sketch below)
- Another game: the knight's move (knight's tour)
  - visit all states exactly once!
  - can you formulate it similarly?
  - Ans: NO!
  - Why?
    - because the rewards matrix is not "Markov"
    - the reward depends not just on the previous state but also on "how you got there"
      (the reward for landing on a square depends on whether it was already visited,
      which the board position alone does not record)
  - these games are considerably more difficult to implement!
- Writing out equations for the algorithms discussed thus far (see below)
  - V^pi as a function of pi
  - improved pi, using arg max on V
- Writing out the Bellman optimality equations (see below)
  - V*, independent of pi
  - pi*, using arg max on V*
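
For the recap item on finding values given a policy, here is a minimal sketch of
iterative policy evaluation on a small tabular MDP. The arrays P, R, the discount
gamma, and the function name are illustrative assumptions, not taken from the lecture:

    import numpy as np

    def evaluate_policy(P, R, policy, gamma=0.9, tol=1e-8):
        """Iteratively compute V^pi for a deterministic policy on a tabular MDP.

        P[s, a, s'] : transition probability, R[s, a] : expected reward,
        policy[s]   : action chosen in state s.  All names are illustrative.
        """
        n_states = P.shape[0]
        V = np.zeros(n_states)
        while True:
            # Bellman backup under the fixed policy pi
            V_new = np.array([
                R[s, policy[s]] + gamma * P[s, policy[s]] @ V
                for s in range(n_states)
            ])
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new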
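
For the Towers-of-Hanoi formulation, one possible state/action encoding is sketched
below, assuming states are tuples giving the peg of each disk and a reward of +1 is
given only on reaching the goal configuration; the disk count, reward scheme, and
helper names are assumptions for illustration:

    from itertools import product

    N_DISKS = 3          # assumed problem size
    PEGS = (0, 1, 2)

    # A state records, for each disk (smallest first), which peg it sits on.
    STATES = list(product(PEGS, repeat=N_DISKS))
    GOAL = (2,) * N_DISKS  # all disks moved to the last peg (assumed goal)

    def top_disk(state, peg):
        """Smallest disk currently on `peg`, or None if the peg is empty."""
        disks = [d for d, p in enumerate(state) if p == peg]
        return min(disks) if disks else None

    def legal_moves(state):
        """All (from_peg, to_peg) moves allowed by the Hanoi rules."""
        moves = []
        for src, dst in product(PEGS, PEGS):
            if src == dst:
                continue
            moving = top_disk(state, src)
            resting = top_disk(state, dst)
            if moving is not None and (resting is None or moving < resting):
                moves.append((src, dst))
        return moves

    def step(state, move):
        """Apply a move; reward is +1 only for reaching the goal state."""
        src, dst = move
        disk = top_disk(state, src)
        next_state = tuple(dst if d == disk else p for d, p in enumerate(state))
        reward = 1.0 if next_state == GOAL else 0.0
        return next_state, reward

Because the reward and transitions depend only on the current disk configuration,
this problem is Markov, unlike the knight's-tour game above.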
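
The equations for the algorithms discussed thus far, written out in standard MDP
notation (P^a_{ss'} for transition probabilities, R^a_s for expected rewards, gamma
for the discount factor; the notation is assumed, not copied from the lecture):

    % V^pi as a function of pi (Bellman expectation equation)
    V^{\pi}(s) = \sum_{a} \pi(s,a) \Big[ R^{a}_{s}
                 + \gamma \sum_{s'} P^{a}_{ss'} V^{\pi}(s') \Big]

    % improved policy pi', using arg max on V^pi (policy improvement step)
    \pi'(s) = \arg\max_{a} \Big[ R^{a}_{s}
              + \gamma \sum_{s'} P^{a}_{ss'} V^{\pi}(s') \Big]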
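
The Bellman optimality equations, in the same assumed notation:

    % V*, independent of any particular policy (Bellman optimality equation)
    V^{*}(s) = \max_{a} \Big[ R^{a}_{s}
               + \gamma \sum_{s'} P^{a}_{ss'} V^{*}(s') \Big]

    % pi*, recovered from V* via arg max
    \pi^{*}(s) = \arg\max_{a} \Big[ R^{a}_{s}
                 + \gamma \sum_{s'} P^{a}_{ss'} V^{*}(s') \Big]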