DATS 6450 – Reinforcement Learning

Authors

Tyler Wallett

Amir Jafari

Published

December 20, 2024

Gridworld (Lectures 4-6)
MountainCar (Lecture 7)
Breakout (Lecture 8)
Highway (Lecture 8)
CartPole (Lecture 9)
Cheetah (Lecture 10)

Instructor Information

  • Name: Tyler Wallett
  • Term: Fall 2025
  • Class location: MON 114
  • Class hours: 06:10 PM - 08:40 PM
  • Office location: Samson Hall Room 310
  • Office hours: Mondays 2 - 4 PM
  • E-mail: twallett@gwu.edu
  • GitHub: twallett

Course Description

The aim of this course is to provide a comprehensive understanding of the reinforcement learning framework. The course will explore the key distinctions between reinforcement learning and other artificial intelligence learning paradigms, delve into relevant industry applications, and examine both classical and deep learning approaches. Additionally, the course will cover the taxonomy of reinforcement learning and offer hands-on experience through practical implementations using OpenAI Gymnasium (Brockman et al. 2016) and other learning environments.

The classical approach will focus on learning methods designed to find optimal solutions in tabular environments, whereas the deep learning approach will tackle the challenge of finding approximate optimal solutions in large or continuous environments through the use of deep learning architectures.

The course will introduce the taxonomy of reinforcement learning by focusing on model-free value-based and policy-based methods. Model-based reinforcement learning will be covered briefly, as it is allocated only one lecture.

The course concludes with a discussion of advanced topics, applications, and the outlook of reinforcement learning.

Learning Outcomes

  1. Implement reinforcement learning frameworks using NumPy and TensorFlow.
  2. Design decision-making systems using classical and deep learning architectures.
  3. Explain the reinforcement learning taxonomy.
  4. Identify reinforcement learning’s challenges, current research, and future outlook.

Resources

  • Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto (Web Link)
  • The Reinforcement Learning Course by Hugging Face (Web Link)
  • Spinning Up in Deep RL by OpenAI (Web Link)
  • OpenAI Gymnasium API documentation (Web Link)
  • TensorFlow Python API documentation (Web Link)

Software Requirements

  • Programming Language: Python.
pip install numpy tensorflow pygame gymnasium tqdm tensorboard
  • Cloud-based GPU Environment: Google Colab.

  • Version Control: GitHub.
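
One quick way to confirm the Python setup is to check that each required package can be located by the interpreter. The following is a minimal sketch, not part of the official course materials: the helper name `missing_packages` and the `REQUIRED` list are assumptions based on the install command in this section. `pickle` appears in the list only as an import check, since it ships with Python's standard library and does not need to be installed.

```python
import importlib.util

# Package names assumed from the install command in this section.
# pickle is part of the standard library, so it is only checked, never installed.
REQUIRED = ["numpy", "tensorflow", "pygame", "gymnasium", "pickle", "tqdm", "tensorboard"]


def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If the script reports missing packages, rerunning the install command for just those names is usually enough. The same check can be run in a Google Colab cell.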

Course Outline

Week | Topic | Quiz/Exams | Learning Objectives
Aug. 28, 2025 | Introduction | | • Why Should I Learn Reinforcement Learning? • What is Reinforcement Learning? • Where is Reinforcement Learning Applied? • How is Reinforcement Learning Structured?
Sep. 4, 2025 | Math Foundations | | • Set Theory • Axiomatic Probability • Conditioning • Independence • Random Variables • Expectation • Probability Distribution
Sep. 11, 2025 | Multi-Armed Bandits | Quiz 1 | • Multi-Armed Bandit Framework • \(\epsilon\)-Greedy • Upper Confidence Bound (UCB) • Thompson Sampling
Sep. 18, 2025 | Dynamic Programming | Quiz 2 | • OpenAI Gymnasium GridWorldEnv • Markov Chain • Markov Decision Processes (MDPs) • Iterative Policy Evaluation • Value Iteration
Sep. 25, 2025 | Monte Carlo | Quiz 3 | • OpenAI Gymnasium GridWorldEnv • Monte Carlo Prediction • Exploring Starts Monte Carlo • On-Policy Monte Carlo • Off-Policy Monte Carlo
Oct. 2, 2025 | Temporal Difference | Quiz 4 | • OpenAI Gymnasium GridWorldEnv • Temporal Difference (TD) Prediction • SARSA • Q-Learning • Double Q-Learning • (Optional) n-step Bootstrapping
Oct. 9, 2025 | Fall Break | |
Oct. 16, 2025 | Function Approximation | Exam 1 | • OpenAI Gymnasium MountainCar-v0 • Value Function Approximation (VFA) • On-Policy Function Approximation • Semi-gradient SARSA • Limitations of Off-Policy Function Approximation
Oct. 23, 2025 | Deep Q-Networks | Quiz 5 | • OpenAI Gymnasium ALE/Breakout-v5 • Deep Learning • Deep Q-Networks (DQN)
Oct. 30, 2025 | Policy Gradients I | Quiz 6 | • OpenAI Gymnasium CartPole-v1 and Pusher-v5 • Policy Gradient Theorem • Addressing Sparse Rewards • Action Selections • Vanilla Policy Gradient (VPG)
Nov. 6, 2025 | Policy Gradients II | Quiz 7 | • OpenAI Gymnasium HalfCheetah-v5 • Trust Regions • Monotonic Improvement • Proximal Policy Optimization (PPO)
Nov. 13, 2025 | Monte Carlo Tree Search | Exam 2 | • OpenAI Gymnasium CartPole-v0 • Model-based Reinforcement Learning • Monte Carlo Tree Search (MCTS)
Nov. 20, 2025 | Conclusion | | • Advanced Topics in Reinforcement Learning • Identify the Reinforcement Learning Application • Outlook of Reinforcement Learning
Nov. 27, 2025 | Thanksgiving Break | |
Dec. 4, 2025 | Final Project Presentation & Submission | |

Prerequisites

  • DATS 6101 - Introduction to Data Science

Assignments & Grading

Assignment | Points
Quizzes (5 best scores) | 25
Exam 1 | 25
Exam 2 | 25
Final Project | 25

Note: Average Learning Per Week

Students are expected to spend a minimum of 100 minutes on out-of-class work for every 50 minutes of direct instruction, for a minimum total of 2.5 hours a week per credit hour. A 3-credit course should therefore include 2.5 hours of direct instruction and a minimum of 5 hours of independent learning, for a total of 7.5 hours per week.

Note: Online Resources

For technical requirements and support, student services, obtaining a GWorld card, and state contact information, please check HERE.

Note: Classroom Recording

Class recordings will be made available to registered students on an individual basis, upon request. Please let me know in advance if you have any medical issues or emergencies that will prevent you from joining the class.

Tip: Virtual Academic Support

Writing and research consultations are available online. See HERE. Coaching, offered through the Office of Student Success, is available in a virtual format. See HERE. Academic Commons offers several short videos addressing different virtual learning strategies for the unique circumstances of the fall 2020 semester. See HERE. They also offer a variety of live virtual workshops to equip students with the tools they need to succeed in a virtual environment. See HERE.

Warning: Safety and Security

In an emergency: call GWPD at 202-994-6111 or 911. For situation-specific actions: review the Emergency Response Handbook HERE. In an active violence situation: Get Out, Hide Out, or Take Out. See HERE. Stay informed: safety.gwu.edu/stay-informed.