Final Project Ideas
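Build a system that learns which ads get clicked most often. Start with simulated ad data and compare strategies such as ε-greedy (usually show the best-known ad, occasionally show a random one) and Thompson Sampling (sample a plausible click rate for each ad from its posterior distribution and show the ad with the highest sample).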
What you’ll learn: How to balance exploring new options vs. using what works best.
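As a starting point, the two strategies can be compared on simulated clicks. The sketch below is a minimal illustration that assumes Bernoulli click rewards; the click-through rates in TRUE_CTRS, the round count, and the function names are made up for the demo and are not part of any library.

```python
import random

def simulate(policy, true_ctrs, n_rounds=5000, seed=0):
    """Run one bandit policy against simulated ads; return (total clicks, impressions per ad)."""
    rng = random.Random(seed)
    n_ads = len(true_ctrs)
    clicks = [0] * n_ads  # clicks observed per ad
    shows = [0] * n_ads   # impressions per ad
    total = 0
    for _ in range(n_rounds):
        ad = policy(clicks, shows, rng)
        reward = 1 if rng.random() < true_ctrs[ad] else 0
        clicks[ad] += reward
        shows[ad] += 1
        total += reward
    return total, shows

def epsilon_greedy(clicks, shows, rng, epsilon=0.1):
    # Pick a random ad with probability epsilon, or while any ad is still unseen
    if rng.random() < epsilon or 0 in shows:
        return rng.randrange(len(shows))
    rates = [c / s for c, s in zip(clicks, shows)]
    return rates.index(max(rates))

def thompson(clicks, shows, rng):
    # Beta(1 + clicks, 1 + misses) posterior per ad; sample once per ad, show the max
    samples = [rng.betavariate(1 + c, 1 + s - c) for c, s in zip(clicks, shows)]
    return samples.index(max(samples))

TRUE_CTRS = [0.02, 0.05, 0.10]  # hypothetical click-through rates
for name, policy in [("epsilon-greedy", epsilon_greedy), ("thompson", thompson)]:
    total, shows = simulate(policy, TRUE_CTRS)
    print(name, "clicks:", total, "impressions per ad:", shows)
```

Comparing the impressions per ad shows how each strategy splits traffic between exploring weak ads and exploiting the strongest one.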
Create a recommendation system that suggests articles based on context (time of day, device type, user history). Use contextual bandits, which condition each decision on these factors instead of learning a single best article for everyone.
What you’ll learn: How to incorporate user context into decision-making.
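The simplest form of a contextual bandit keeps a separate estimate for every (context, article) pair. The sketch below assumes a small discrete context of time of day and device; the article categories, the click probabilities in true_click_prob, and the class name are hypothetical. Methods such as LinUCB go further by sharing information across similar contexts, which this table-based version does not attempt.

```python
import random
from collections import defaultdict

ARTICLES = ["sports", "finance", "cooking"]  # hypothetical article categories

def true_click_prob(context, article):
    """Made-up ground truth: preferences depend on time of day and device."""
    time_of_day, device = context
    if time_of_day == "morning" and article == "finance":
        return 0.30
    if time_of_day == "evening" and article == "cooking":
        return 0.25
    if device == "mobile" and article == "sports":
        return 0.15
    return 0.05

class TabularContextualBandit:
    """Epsilon-greedy with a separate click-rate estimate per (context, article)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.values = defaultdict(float)  # running mean reward per (context, arm)

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[(context, a)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]

env_rng = random.Random(1)
bandit = TabularContextualBandit(ARTICLES)
for _ in range(20000):
    context = (env_rng.choice(["morning", "evening"]),
               env_rng.choice(["mobile", "desktop"]))
    article = bandit.choose(context)
    reward = 1 if env_rng.random() < true_click_prob(context, article) else 0
    bandit.update(context, article, reward)

for context in [("morning", "desktop"), ("evening", "mobile")]:
    best = max(ARTICLES, key=lambda a: bandit.values[(context, a)])
    print(context, "->", best)
```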
Build a pricing algorithm that adjusts product prices daily based on demand patterns. The system learns optimal pricing to maximize revenue while adapting to market changes.
What you’ll learn: How to handle changing environments and non-stationary problems.
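One common way to cope with a changing environment is to update estimates with a constant step size, so recent days count more than old ones. The sketch below assumes a fixed grid of candidate prices and a demand curve that shifts once, on SHIFT_DAY. Every number in it is invented for illustration.

```python
import random

PRICES = [8.0, 10.0, 12.0]  # hypothetical candidate prices
SHIFT_DAY = 1000            # day on which the made-up demand curve changes

def purchase_prob(price, day):
    """Made-up demand: customers become less price-sensitive after SHIFT_DAY."""
    if day < SHIFT_DAY:
        return {8.0: 0.60, 10.0: 0.40, 12.0: 0.20}[price]
    return {8.0: 0.70, 10.0: 0.65, 12.0: 0.60}[price]

def run(alpha=0.1, epsilon=0.2, n_days=3000, customers_per_day=200, seed=0):
    rng = random.Random(seed)
    estimates = {p: 0.0 for p in PRICES}  # estimated revenue per customer
    chosen = []
    for day in range(n_days):
        if rng.random() < epsilon:
            price = rng.choice(PRICES)               # explore
        else:
            price = max(PRICES, key=estimates.get)   # exploit
        sales = sum(rng.random() < purchase_prob(price, day)
                    for _ in range(customers_per_day))
        revenue_per_customer = price * sales / customers_per_day
        # Constant step size: an exponentially weighted average that forgets old data
        estimates[price] += alpha * (revenue_per_customer - estimates[price])
        chosen.append(price)
    return estimates, chosen

estimates, chosen = run()
print("most common price before the shift:", max(PRICES, key=chosen[500:1000].count))
print("most common price at the end:", max(PRICES, key=chosen[-500:].count))
```

A plain sample average would weight day 1 as heavily as yesterday, so it would react to the shift much more slowly. Trying both makes a useful first experiment.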
Automatically choose which banner ad to show on a website to get the most clicks. The system adapts in real-time as user preferences change throughout the day.
What you’ll learn: Real-time optimization and A/B testing with learning algorithms.
Simulate a food delivery app that learns which dishes to highlight based on factors like time of day, weather, and day of the week to increase orders.
What you’ll learn: Seasonal patterns and time-based optimization.
Build a small-scale recommendation system using RL to model long-term engagement, inspired by Google’s RecSim research, but on a synthetic dataset.
What you’ll learn: Sequential user interaction modeling and long-term recommendation optimization.
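To see why long-term engagement differs from maximizing the next click, it can help to start with a tiny hand-built user model before moving to a richer simulator. The sketch below uses tabular Q-learning on a made-up three-state "satisfaction" model. The states, rewards, and transitions are assumptions for illustration and do not use the RecSim API.

```python
import random

N_STATES = 3  # user satisfaction: 0 (low), 1, 2 (high)
CLICKBAIT, QUALITY = 0, 1

def step(satisfaction, action):
    """Toy user model (made up): return (engagement reward, next satisfaction)."""
    if action == CLICKBAIT:
        # Slightly more engagement now, but satisfaction drops
        return 0.5 * satisfaction + 0.3, max(satisfaction - 1, 0)
    # Less engagement now, but satisfaction rises
    return 0.5 * satisfaction, min(satisfaction + 1, N_STATES - 1)

def q_learning(n_steps=100000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    state = 0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max((CLICKBAIT, QUALITY), key=lambda a: q[state][a])
        reward, next_state = step(state, action)
        # Q-learning update toward reward + discounted best next value
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q

q = q_learning()
policy = [max((CLICKBAIT, QUALITY), key=lambda a: q[s][a]) for s in range(N_STATES)]
print("greedy action per satisfaction level (0=clickbait, 1=quality):", policy)
```

In this toy model clickbait always pays more on the current step, yet with a discount factor of 0.9 the learned policy should favor quality content because it keeps satisfaction, and therefore future engagement, high.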
Train a virtual car to drive on highways using the highway-env simulator. The agent learns lane changing, speed control, and safe overtaking maneuvers.
What you’ll learn: Continuous control and safety constraints in RL.
Train a simulated robot to pick up and place objects using robot environments from the Gymnasium ecosystem (for example, the Fetch tasks in Gymnasium-Robotics). Start with simple reaching tasks and progress to manipulation.
What you’ll learn: Robotics control and continuous action spaces.
Create a trading algorithm that learns to buy and sell stocks/ETFs using historical data. Focus on risk management and portfolio optimization strategies.
What you’ll learn: Financial applications of RL and risk-reward optimization.
Resource: FinRL documentation, https://finrl.readthedocs.io/en/latest/index.html
Build a web app where users rate AI-generated text responses. The system learns from human preferences to improve its outputs over time (simplified RLHF demo).
What you’ll learn: Human-in-the-loop learning and preference modeling.
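If the app collects ratings as pairwise choices ("which response is better?"), one standard way to turn them into scores is a Bradley-Terry model. The sketch below fits one score per response by gradient ascent. It assumes pairwise data, and the synthetic raters and their hidden true_quality values are made up for the demo. A full RLHF pipeline would instead train a neural reward model on such comparisons and then optimize the text generator against it.

```python
import math
import random

def fit_bradley_terry(comparisons, n_items, lr=1.0, epochs=300):
    """Fit one score per item from (winner, loser) pairs by gradient ascent
    on the Bradley-Terry log-likelihood: P(i beats j) = sigmoid(s_i - s_j)."""
    scores = [0.0] * n_items
    for _ in range(epochs):
        grads = [0.0] * n_items
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Gradient of log(p_win) is (1 - p_win) for the winner, the negative for the loser
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        scores = [s + lr * g / len(comparisons) for s, g in zip(scores, grads)]
    return scores

# Synthetic raters: hidden quality per response (made up for the demo)
rng = random.Random(0)
true_quality = [0.0, 1.0, 2.0, 3.0]
comparisons = []
for _ in range(2000):
    a, b = rng.sample(range(len(true_quality)), 2)
    p_a = 1.0 / (1.0 + math.exp(-(true_quality[a] - true_quality[b])))
    comparisons.append((a, b) if rng.random() < p_a else (b, a))

scores = fit_bradley_terry(comparisons, len(true_quality))
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print("learned ranking (best first):", ranking)
```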