Reinforcement Learning

These notes cover reinforcement learning following Richard S. Sutton and Andrew G. Barto's Reinforcement Learning: An Introduction, 2nd edition. The subject studies agents that learn by interacting with an environment: they observe state, choose actions, receive rewards, and improve behavior to maximize long-run return. The early material builds the finite Markov decision process framework and tabular solution methods. The middle material replaces tables with function approximation. The closing material connects RL to psychology, neuroscience, applications, and open research directions.
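The interaction loop described above (observe state, choose an action, receive a reward, repeat) can be sketched in a few lines. This is a minimal illustration, not code from the book; the toy environment and its `reset`/`step` interface are my own simplified stand-ins.

```python
class TwoStateEnv:
    """A toy episodic environment (hypothetical, for illustration):
    the agent starts in state 0, and action 1 moves it right;
    reaching state 2 ends the episode with reward +1."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 advances toward the terminal state; action 0 stays put.
        if action == 1:
            self.state += 1
        reward = 1.0 if self.state == 2 else 0.0
        done = self.state == 2
        return self.state, reward, done


def run_episode(env, policy):
    """The agent-environment loop: the agent observes the state,
    the policy chooses an action, the environment returns the next
    state and a reward, until the episode terminates."""
    state = env.reset()
    total_return = 0.0
    done = False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_return += reward
    return total_return


# A fixed "always move right" policy earns the terminal reward.
ret = run_episode(TwoStateEnv(), policy=lambda s: 1)
```

Real environments and learning agents are far richer, but every tabular method in the early chapters plugs into a loop of exactly this shape.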

The organizing thread is value: how to predict return, how to improve policies from value estimates, and when direct policy optimization or planning is more appropriate. Read the pages in order if you want the Sutton-Barto progression from bandits to dynamic programming, Monte Carlo methods, temporal-difference learning, approximation, and policy gradients.
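As a concrete instance of "predicting return": the discounted return is the reward sum G = r_1 + γ·r_2 + γ²·r_3 + …, and it is naturally computed backward via the recursion G_t = r_{t+1} + γ·G_{t+1}. A minimal sketch (the function name is mine):

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G_0 = r_1 + gamma*r_2 + gamma^2*r_3 + ...
    by sweeping backward with G_t = r_{t+1} + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# With rewards [1, 1, 1] and gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # → 1.75
```

Value functions are expectations of exactly this quantity under a policy; the chapters below differ mainly in how that expectation is estimated and used.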

  1. Reinforcement Learning Problem and Finite MDPs
  2. Multi-armed Bandits
  3. Dynamic Programming
  4. Monte Carlo Methods
  5. Temporal-Difference Learning
  6. n-step Bootstrapping
  7. Planning and Learning with Tabular Methods
  8. On-policy Prediction with Approximation
  9. On-policy Control with Approximation
  10. Off-policy Methods with Approximation
  11. Eligibility Traces
  12. Policy Gradient Methods
  13. Psychology Connections
  14. Neuroscience Connections
  15. Applications and Frontiers