Nov 13, 2006
-------------
- Recap
  - finding values given a policy (see the policy-evaluation sketch below)
- Formulating games as RL problems
  - formulating Towers of Hanoi as an RL problem (see the state/action sketch below)
- Another game: the knight's move (knight's tour)
  - visit all states exactly once!
  - can you formulate it similarly?
  - Ans: NO!
  - Why?
    - because the rewards matrix is not "Markov"
    - the reward depends not just on the previous state but also on "how you got there"
      (the reward for landing on a square depends on whether it was already visited,
      which the board position alone does not record)
  - these games are considerably more difficult to implement!
- Writing out equations for the algorithms discussed thus far (see below)
  - V^pi as a function of pi
  - improved pi, using arg max on V
- Writing out the Bellman optimality equations (see below)
  - V*, independent of pi
  - pi*, using arg max on V*
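
For the recap item on finding values given a policy, here is a minimal sketch of
iterative policy evaluation on a small tabular MDP. The arrays P, R, the discount
gamma, and the function name are illustrative assumptions, not taken from the lecture:

    import numpy as np

    def evaluate_policy(P, R, policy, gamma=0.9, tol=1e-8):
        """Iteratively compute V^pi for a deterministic policy on a tabular MDP.

        P[s, a, s'] : transition probability, R[s, a] : expected reward,
        policy[s]   : action chosen in state s.  All names are illustrative.
        """
        n_states = P.shape[0]
        V = np.zeros(n_states)
        while True:
            # Bellman backup under the fixed policy pi
            V_new = np.array([
                R[s, policy[s]] + gamma * P[s, policy[s]] @ V
                for s in range(n_states)
            ])
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new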
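
For the Towers-of-Hanoi formulation, one possible state/action encoding is sketched
below, assuming states are tuples giving the peg of each disk and a reward of +1 is
given only on reaching the goal configuration; the disk count, reward scheme, and
helper names are assumptions for illustration:

    from itertools import product

    N_DISKS = 3          # assumed problem size
    PEGS = (0, 1, 2)

    # A state records, for each disk (smallest first), which peg it sits on.
    STATES = list(product(PEGS, repeat=N_DISKS))
    GOAL = (2,) * N_DISKS  # all disks moved to the last peg (assumed goal)

    def top_disk(state, peg):
        """Smallest disk currently on `peg`, or None if the peg is empty."""
        disks = [d for d, p in enumerate(state) if p == peg]
        return min(disks) if disks else None

    def legal_moves(state):
        """All (from_peg, to_peg) moves allowed by the Hanoi rules."""
        moves = []
        for src, dst in product(PEGS, PEGS):
            if src == dst:
                continue
            moving = top_disk(state, src)
            resting = top_disk(state, dst)
            if moving is not None and (resting is None or moving < resting):
                moves.append((src, dst))
        return moves

    def step(state, move):
        """Apply a move; reward is +1 only for reaching the goal state."""
        src, dst = move
        disk = top_disk(state, src)
        next_state = tuple(dst if d == disk else p for d, p in enumerate(state))
        reward = 1.0 if next_state == GOAL else 0.0
        return next_state, reward

Because the reward and transitions depend only on the current disk configuration,
this problem is Markov, unlike the knight's-tour game above.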
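
The equations for the algorithms discussed thus far, written out in standard MDP
notation (P^a_{ss'} for transition probabilities, R^a_s for expected rewards, gamma
for the discount factor; the notation is assumed, not copied from the lecture):

    % V^pi as a function of pi (Bellman expectation equation)
    V^{\pi}(s) = \sum_{a} \pi(s,a) \Big[ R^{a}_{s}
                 + \gamma \sum_{s'} P^{a}_{ss'} V^{\pi}(s') \Big]

    % improved policy pi', using arg max on V^pi (policy improvement step)
    \pi'(s) = \arg\max_{a} \Big[ R^{a}_{s}
              + \gamma \sum_{s'} P^{a}_{ss'} V^{\pi}(s') \Big]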
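
The Bellman optimality equations, in the same assumed notation:

    % V*, independent of any particular policy (Bellman optimality equation)
    V^{*}(s) = \max_{a} \Big[ R^{a}_{s}
               + \gamma \sum_{s'} P^{a}_{ss'} V^{*}(s') \Big]

    % pi*, recovered from V* via arg max
    \pi^{*}(s) = \arg\max_{a} \Big[ R^{a}_{s}
                 + \gamma \sum_{s'} P^{a}_{ss'} V^{*}(s') \Big]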