Reinforcement Learning
These notes cover reinforcement learning following Richard S. Sutton and Andrew G. Barto's Reinforcement Learning: An Introduction, 2nd edition. The subject studies agents that learn by interacting with an environment: they observe states, choose actions, receive rewards, and improve their behavior to maximize long-run return. The early material builds the finite Markov decision process framework and tabular solution methods. The middle material replaces tables with function approximation. The closing material connects RL to psychology, neuroscience, applications, and open research directions.
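The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not code from the book: the chain-world environment, its `step` function, and the random policy are all hypothetical stand-ins for whatever task the agent faces.

```python
import random

def step(state, action):
    """Hypothetical 5-state chain world: move left (-1) or right (+1);
    reward 1.0 for reaching the rightmost state, 0.0 otherwise."""
    next_state = max(0, min(4, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward

# Agent-environment loop: observe state, choose action, receive reward.
state = 0
episode_return = 0.0
for t in range(20):
    action = random.choice([-1, 1])   # a (bad) random policy; learning would improve this
    state, reward = step(state, action)
    episode_return += reward          # accumulate the (undiscounted) return

print(episode_return)
```

A learning agent would replace the random `choice` with a policy that it improves from experience; that improvement process is what the rest of the notes develop.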
The organizing thread is value: how to predict return, how to improve policies from value estimates, and when direct policy optimization or planning is more appropriate. Read the pages in order if you want the Sutton-Barto progression from bandits through dynamic programming, Monte Carlo methods, temporal-difference learning, and function approximation to policy gradients.
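Since predicting return is the thread that ties these pages together, here is the basic quantity being predicted, as a small sketch. The function name and example rewards are illustrative, not from the book's pseudocode; the backward recursion matches the standard definition of the discounted return.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted return G = r1 + gamma*r2 + gamma^2*r3 + ...,
    computed backward from the end of the reward sequence."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # → 1.75
```

Value functions estimate the expectation of this quantity from each state (or state-action pair); the tabular and approximate methods listed below are different ways of computing or learning that estimate.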
- Reinforcement Learning Problem and Finite MDPs
- Multi-armed Bandits
- Dynamic Programming
- Monte Carlo Methods
- Temporal-Difference Learning
- n-step Bootstrapping
- Planning and Learning with Tabular Methods
- On-policy Prediction with Approximation
- On-policy Control with Approximation
- Off-policy Methods with Approximation
- Eligibility Traces
- Policy Gradient Methods
- Psychology Connections
- Neuroscience Connections
- Applications and Frontiers