Final Project Overview
Instructions
The instructions for the Final Project are summarized below.
Using the Multi-Armed Bandit algorithms learned in Lecture 3, your task is to:
- Define the Reinforcement Learning Framework: Formulate a Multi-Armed Bandit problem.
  - Define the data.
  - Specify the action space \(\mathcal{A}\).
  - Specify the reward structure \(R\).
- Define the model: Implement at least two Reinforcement Learning algorithms to solve the problem (a minimal \(\epsilon\)-Greedy sketch is given after this list).
- Define the metrics: To evaluate performance, we will use:
  - Cumulative reward: Total return over a time horizon.
  - (Optional) Regret: The difference between the reward of the best fixed arm and the reward obtained by the algorithm.
  - (Optional) Stability score: Standard deviation of the reward over time.
  - (Optional) Adaptability: Performance when the environment shifts.
- Effectively communicate your findings in CV style:
Accomplished [X] as measured by [Y], by doing [Z].
Example: Improved asset allocation strategy stability as measured by lower reward variance across trials, by tuning \(\epsilon\) in an \(\epsilon\)-Greedy policy.
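As a starting point for the bandit track, here is a minimal, self-contained sketch of an \(\epsilon\)-Greedy agent on an assumed toy problem with five Gaussian arms; the arm means, \(\epsilon\), and horizon are illustrative placeholders, not part of the assignment. It also reports cumulative reward and the (optional) regret against the best fixed arm, \(\text{Regret}(T) = T\,\mu^* - \sum_{t=1}^{T} r_t\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: five arms with fixed Gaussian reward means (illustrative only).
true_means = np.array([0.1, 0.3, 0.5, 0.2, 0.4])
n_arms, horizon, epsilon = len(true_means), 10_000, 0.1

q_estimates = np.zeros(n_arms)   # running estimate of each arm's mean reward
pull_counts = np.zeros(n_arms)   # number of times each arm has been pulled
rewards = np.zeros(horizon)

for t in range(horizon):
    # epsilon-Greedy: explore a random arm with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        arm = int(rng.integers(n_arms))
    else:
        arm = int(np.argmax(q_estimates))

    reward = rng.normal(true_means[arm], 1.0)
    rewards[t] = reward

    # Incremental sample-mean update for the chosen arm.
    pull_counts[arm] += 1
    q_estimates[arm] += (reward - q_estimates[arm]) / pull_counts[arm]

cumulative_reward = rewards.sum()
regret = horizon * true_means.max() - cumulative_reward  # regret vs. the best fixed arm
print(f"Cumulative reward: {cumulative_reward:.1f}, regret: {regret:.1f}")
```

Running the same loop with a second policy (for example, a different \(\epsilon\) schedule) over the same horizon gives the two-algorithm comparison the task asks for.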
Using a Classical or Deep Reinforcement Learning algorithm learned in Lectures 5-11, your task is to:
- Define the Reinforcement Learning Framework: Formulate a Markov Decision Process (MDP).
  - Define the environment dynamics \(P(s', r \mid s, a)\).
  - Define the state space \(\mathcal{S}\).
  - Define the action space \(\mathcal{A}\).
  - Define the reward function \(R\).
  - Define the episode structure.
- Define the model: Implement at least one of the following Reinforcement Learning algorithms to solve the problem (a tabular Q-Learning sketch is given after this list):
  - On-Policy Monte Carlo
  - Off-Policy Monte Carlo
  - SARSA
  - Q-Learning
  - Double Q-Learning
  - n-step Bootstrapping
  - Semi-Gradient SARSA
  - Deep Q-Network (DQN)
  - Vanilla Policy Gradient (VPG)
  - Proximal Policy Optimization (PPO)
  - Monte Carlo Tree Search (MCTS)
- Define the metrics: To evaluate performance, we will use:
  - Cumulative reward: Total return over a time horizon.
  - (Optional) Stability score: Standard deviation of the reward over time.
- Effectively communicate your findings in CV style:
Accomplished [X] as measured by [Y], by doing [Z].
Example: Improved asset allocation strategy stability as measured by lower reward variance across trials, by tuning \(\epsilon\) in an \(\epsilon\)-Greedy policy.
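As a starting point for the MDP track, here is a minimal, self-contained sketch of tabular Q-Learning on an assumed toy chain environment; the environment, \(\alpha\), \(\gamma\), \(\epsilon\), and the episode count are illustrative placeholders, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy MDP: a chain of six states; the agent starts in state 0 and the
# episode ends with reward +1 when it reaches the rightmost state (illustrative only).
n_states, n_actions = 6, 2                # actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.99, 0.1
n_episodes = 500

def step(state, action):
    """One transition of the assumed chain environment."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done

Q = np.zeros((n_states, n_actions))
returns = []

for _ in range(n_episodes):
    state, done, episode_return = 0, False, 0.0
    while not done:
        # epsilon-Greedy behavior policy with random tie-breaking.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(rng.choice(np.flatnonzero(Q[state] == Q[state].max())))
        next_state, reward, done = step(state, action)
        # Q-Learning bootstraps from the greedy value of the next state.
        target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        episode_return += reward
        state = next_state
    returns.append(episode_return)

print("Mean return over the last 100 episodes:", np.mean(returns[-100:]))
```

Replacing the bootstrapped target with the value of the action actually taken in the next state turns this into SARSA, one of the other listed options.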
Presentation
Your team must clearly present all of the following points:
- Define the Reinforcement Learning Framework.
- Define the model.
- Define the metrics.
- Effectively communicate your findings.
- Conclusion.
Use clear and concise slides with visuals (e.g., charts, diagrams) to support your points. Ensure the presentation is structured logically and is easy to follow.
Here is an example from Capstone Group 11.
Submission
Each team must submit each of the following prior to the deadline:
- Code (.zip): A fully functional implementation of your Reinforcement Learning algorithms, with main.py, model.py, and env.py (a sketch of one possible layout is given at the end of this section).
- Report (.pdf): A detailed report summarizing your approach, findings, and conclusions.
- Presentation (.pdf): A concise and clear presentation of your work, including visuals and key takeaways.
Late submissions will incur a \(10\%\) penalty for the team.
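One possible (but not required) way to split responsibilities across the three code files is sketched below: env.py owns the environment dynamics, model.py the learning algorithm, and main.py the training loop and metric logging. The toy environment, agent, and interfaces here are illustrative assumptions, not a prescribed API.

```python
import random

# Illustrative sketch of one way the submitted files could be organized; the
# class names and interfaces below are assumptions, not a required structure.

# --- env.py: environment exposing reset() and step(action) -> (state, reward, done).
class CoinFlipEnv:
    """Toy single-step environment, included only to make the sketch runnable."""
    def reset(self):
        return 0                      # a single dummy state

    def step(self, action):
        reward = 1.0 if random.random() < 0.5 else 0.0
        return 0, reward, True        # episode ends after one step

# --- model.py: the learning algorithm (here a trivial placeholder agent).
class RandomAgent:
    def act(self, state):
        return random.randint(0, 1)

# --- main.py: training loop tying the environment, agent, and metrics together.
if __name__ == "__main__":
    env, agent, rewards = CoinFlipEnv(), RandomAgent(), []
    for _ in range(100):
        state, done, episode_return = env.reset(), False, 0.0
        while not done:
            state, reward, done = env.step(agent.act(state))
            episode_return += reward
        rewards.append(episode_return)
    print("Cumulative reward over 100 episodes:", sum(rewards))
```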
Grading
The grading criteria are summarized in the following table:
| Criteria | Description | Weight |
|---|---|---|
| Problem Formulation | Clear definition of the Reinforcement Learning framework. | 20% |
| Algorithm Implementation | Correct and efficient implementation of at least two Reinforcement Learning algorithms. | 30% |
| Performance Evaluation | Use of appropriate metrics to evaluate the algorithms’ performance. | 20% |
| Communication of Findings | Clear and concise presentation of results, including visuals and insights. | 20% |
| Code Quality and Documentation | Well-structured, readable, and documented code. | 10% |
(Optional) Capstone or Research
If you or your team would like to discuss this project or potential research opportunities in more detail, feel free to reach out to me at twallett@gwu.edu.