Homework 3

Homework for Lecture 3: Multi-Armed Bandits 📝

Instructions:

- Show ALL work, neatly and in order.
- No credit will be given for answers without work.
- Submit a single PDF file including all solutions.
- DO NOT submit individual files or images.
- For coding questions, submit ONE .py file with comments.

Note

For this homework, you only need numpy, pandas, and the MinMaxScaler class from scikit-learn (sklearn).

Coding Exercise 1: Load Environments

Load the existing Bernoulli and Gaussian environments using the create_environment function with a random seed of \(123\).
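
The signature of the course-provided create_environment function is not shown in this handout, so the sketch below uses hypothetical stand-in environment classes only to illustrate what a Bernoulli and a Gaussian bandit environment typically look like; in your submission, replace them with calls to the actual create_environment function.

```python
import numpy as np

SEED = 123
N_ARMS = 10  # assumed number of arms; the course environments define the real value

rng = np.random.default_rng(SEED)

class BernoulliEnv:
    """Stand-in Bernoulli bandit: each arm pays 1 with its own success probability."""
    def __init__(self, n_arms, rng):
        self.probs = rng.uniform(size=n_arms)
        self.rng = rng
    def pull(self, arm):
        return int(self.rng.random() < self.probs[arm])

class GaussianEnv:
    """Stand-in Gaussian bandit: each arm pays a noisy reward around its own mean."""
    def __init__(self, n_arms, rng):
        self.means = rng.uniform(size=n_arms)
        self.rng = rng
    def pull(self, arm):
        return self.rng.normal(self.means[arm], 1.0)

# With the course code this would instead be something like (assumed signature):
# bernoulli_env = create_environment("bernoulli", seed=123)
# gaussian_env  = create_environment("gaussian",  seed=123)
bernoulli_env = BernoulliEnv(N_ARMS, rng)
gaussian_env = GaussianEnv(N_ARMS, rng)
```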

Coding Exercise 2: Recommendation Systems

Using the existing Epsilon-Greedy (\(\epsilon = 0.10\)), Upper Confidence Bound (UCB), and Thompson Sampling code, create a recommendation system.
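
As a minimal sketch of the recommendation loop, here is a simple Epsilon-Greedy agent interacting with the stand-in environment from Exercise 1. This is not the course's Epsilon-Greedy, UCB, or Thompson Sampling implementation; those classes would take the place of the hypothetical EpsilonGreedy below.

```python
import numpy as np

class EpsilonGreedy:
    """Minimal epsilon-greedy agent: explore with probability eps, otherwise exploit."""
    def __init__(self, n_arms, eps=0.10, seed=123):
        self.eps = eps
        self.counts = np.zeros(n_arms)
        self.values = np.zeros(n_arms)          # running mean reward per arm
        self.rng = np.random.default_rng(seed)

    def select_arm(self):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.values)))  # explore
        return int(np.argmax(self.values))                   # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental update of the running mean reward
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Recommendation loop (environment names follow the Exercise 1 sketch):
# agent = EpsilonGreedy(N_ARMS, eps=0.10)
# for _ in range(1_000):
#     arm = agent.select_arm()          # the "recommendation"
#     reward = bernoulli_env.pull(arm)
#     agent.update(arm, reward)
```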

Coding Exercise 3: MAB Performance

For 10,000 recommendations:

  1. Does Epsilon-Greedy (\(\epsilon = 0.10\)) perform better in the Bernoulli or Gaussian environment?
  2. Does UCB perform better in the Bernoulli or Gaussian environment?
  3. Does Thompson Sampling perform better in the Bernoulli or Gaussian environment?
  4. Which algorithm performs best in the Bernoulli environment?
  5. Which algorithm performs best in the Gaussian environment?

Hint: Check the performance of each MAB by observing the most frequently recommended arm.
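
One way to follow the hint is to record every recommended arm and compare the most frequent one with the environment's true best arm. The sketch below assumes the agent/environment interfaces from the earlier sketches (select_arm, update, pull), which are assumptions about the course code.

```python
import numpy as np

def run_mab(agent, env, n_rounds=10_000):
    """Run one agent in one environment and record the arm recommended each round."""
    history = np.empty(n_rounds, dtype=int)
    for t in range(n_rounds):
        arm = agent.select_arm()
        reward = env.pull(arm)
        agent.update(arm, reward)
        history[t] = arm
    return history

# Most frequently recommended arm vs. the environment's true best arm
# (names follow the earlier sketches):
# history = run_mab(EpsilonGreedy(N_ARMS, eps=0.10), bernoulli_env, 10_000)
# counts = np.bincount(history, minlength=N_ARMS)
# print("most recommended arm:", counts.argmax(),
#       "| true best arm:", bernoulli_env.probs.argmax())
```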

Coding Exercise 4: Random Seed Analysis

Using random seeds \(0\) through \(50\), with \(10,000\) recommendations per seed, do the algorithms perform the same?
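
A seed sweep can be organized as a helper that rebuilds the environment and agent for each seed and records the most frequently recommended arm. The sketch below is generic; the factory callables and agent/environment names in the usage comment follow the earlier sketches and are assumptions about the course code.

```python
import numpy as np

def seed_sweep(make_env, make_agent, seeds=range(51), n_rounds=10_000):
    """For each seed, rebuild the environment and agent, run the recommendation
    loop, and return the most frequently recommended arm per seed."""
    top_arms = []
    for seed in seeds:
        env = make_env(seed)
        agent = make_agent(seed)
        history = np.empty(n_rounds, dtype=int)
        for t in range(n_rounds):
            arm = agent.select_arm()
            agent.update(arm, env.pull(arm))
            history[t] = arm
        top_arms.append(int(np.bincount(history).argmax()))
    return top_arms

# Usage (assumed names from the earlier sketches):
# top = seed_sweep(lambda s: create_environment("bernoulli", seed=s),
#                  lambda s: EpsilonGreedy(N_ARMS, eps=0.10, seed=s))
# print(np.unique(top, return_counts=True))  # do all seeds agree on the same arm?
```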

Coding Exercise 5: Amazon Dataset Analysis

For the Amazon.csv advertisement dataset, repeat Coding Exercise 4. Which arm (ad) would you recommend (advertise)?
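
Loading and scaling the dataset might look like the sketch below. The column layout of Amazon.csv is not shown in this handout, so the sketch assumes one numeric reward/click column per advertisement (arm); adjust the preprocessing to the actual file.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Assumption: each column of Amazon.csv holds the reward (e.g., clicks) for one ad.
ads = pd.read_csv("Amazon.csv")

# Scale every column to [0, 1] so the rewards are on a comparable range.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(ads), columns=ads.columns)

# Each column is then treated as one arm: recommending arm j on round t pays
# scaled.iloc[t, j]. The seed sweep from Coding Exercise 4 can be reused to see
# which ad each algorithm recommends most often.
```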