11.1 Model-based Reinforcement Learning

What if you could practice in your imagination before stepping into the real world? 🧩

In Lectures 5-10, we explored Model-free Reinforcement Learning.

Model-free Reinforcement Learning emphasizes learning directly from interactions with the environment without relying on a model of its dynamics.

Because such agents never estimate the environment dynamics, they need to sample many environment interactions before their value estimates and policies become reliable. The dynamics they forgo are summarized by the transition distribution

\[ P(s', r \mid s, a) \]

Without a model of the environment dynamics, exploration is blind: the agent cannot predict where an action will lead, so model-free methods can only learn from the rewards they actually observe.
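To make the idea of a learned dynamics model concrete, here is a minimal Python sketch (illustrative, not from the lecture) of a tabular model that estimates \(P(s', r \mid s, a)\) by counting observed transitions. The class and method names are assumptions chosen for this example.

```python
from collections import defaultdict
import random

class TabularModel:
    """Estimate P(s', r | s, a) by counting observed transitions (tabular case)."""

    def __init__(self):
        # counts[(s, a)][(s', r)] = number of times (s, a) was followed by (s', r)
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, s, a, r, s_next):
        """Record one real transition sampled from the environment."""
        self.counts[(s, a)][(s_next, r)] += 1

    def prob(self, s, a, s_next, r):
        """Empirical estimate of P(s', r | s, a)."""
        outcomes = self.counts[(s, a)]
        total = sum(outcomes.values())
        return outcomes[(s_next, r)] / total if total > 0 else 0.0

    def sample(self, s, a):
        """Sample an imagined outcome (s', r) from the learned model."""
        outcomes = self.counts[(s, a)]
        if not outcomes:
            raise KeyError(f"No transitions recorded for state-action pair ({s}, {a})")
        pairs, weights = zip(*outcomes.items())
        return random.choices(pairs, weights=weights)[0]
```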

Problem ⚠️

But what if agents could predict the outcomes of their actions without directly interacting with the environment? Could this lead to more efficient learning?
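One well-known way to do exactly this is Dyna-style planning: the agent replays imagined transitions drawn from its learned model to improve its value estimates without any new environment interaction. The sketch below is illustrative only; it reuses the `TabularModel` from the previous sketch, and the `Q` table, `actions` set, and hyperparameters `alpha` and `gamma` are placeholder assumptions.

```python
import random
from collections import defaultdict

def planning_updates(model, Q, n_updates, actions, alpha=0.1, gamma=0.99):
    """Dyna-Q-style planning: refine Q-values by replaying imagined transitions
    sampled from the learned model, with no new environment interaction."""
    observed = list(model.counts.keys())      # (s, a) pairs seen in real experience
    for _ in range(n_updates):
        s, a = random.choice(observed)        # revisit a previously seen state-action pair
        s_next, r = model.sample(s, a)        # imagine an outcome using the model
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Example usage (hypothetical): Q defaults to 0 for unseen (state, action) pairs.
# Q = defaultdict(float)
# planning_updates(model, Q, n_updates=50, actions=[0, 1])
```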

Real Life Example 🧠

Think about how we humans often plan.

For example, imagine you are about to graduate from GW. You could take two possible actions:

Action \(A_1\): Consulting Job. Reward \(R\): This path has consistently offered the highest immediate payoff; strong career growth and financial stability make it the best-known choice.

Action \(A_2\): PhD Program. Reward \(R\): So far a modest performer that you have not explored much, but with more investment it could reveal higher potential in research opportunities and long-term impact.

The outcomes of each choice are not immediately visible — you can’t just try both and “reset.” Instead, you have to simulate in your head what the future might look like based on your prior knowledge and expectations:

  • What career growth might the consulting job bring?
  • What opportunities could the PhD open up?
  • How long will each path take?

Unlike trial-and-error learning, where feedback comes directly from experience, here you are relying on a mental model of the world to forecast what might happen and make a choice.