Homework 10

Homework for Lecture 10: Advanced Policy Gradients 📝

Instructions:

- Show ALL work, neatly and in order.
- No credit for answers without work.
- Submit a single PDF file including all solutions.
- DO NOT submit individual files or images.
- For coding questions, submit ONE .py file with comments.

Note

For this homework, you only need gymnasium, numpy, tensorflow, os, pickle, tqdm, and tensorboard.

Coding Exercise 1: Proximal Policy Optimization (PPO) Clip

For the HalfCheetah-v5 environment, implement the update function for the Proximal Policy Optimization (PPO) Clip algorithm using the provided parameters.
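
As a starting point, the quantity that a PPO-Clip update optimizes can be sketched in plain numpy. This is only an illustration of the clipped surrogate objective, not the required TensorFlow update function: the function name `ppo_clip_loss`, its argument names, and the default `clip_ratio=0.2` are assumptions made for this example and are not the provided parameters.

```python
import numpy as np


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_ratio=0.2):
    """Clipped surrogate loss (to be minimized) for a batch of samples.

    All names and the default clip_ratio are hypothetical, for illustration only.
    """
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(new_log_probs - old_log_probs)

    # Unclipped surrogate term: r * A.
    unclipped = ratio * advantages

    # Clipped surrogate term: clip(r, 1 - eps, 1 + eps) * A.
    clipped = np.clip(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages

    # PPO-Clip maximizes the mean of the elementwise minimum of the two terms,
    # so the negative of that mean is returned as a loss.
    return -np.mean(np.minimum(unclipped, clipped))
```

In an actual update function, the same expression would typically be written with differentiable tensor operations so that gradients flow through the new log-probabilities, while the old log-probabilities and advantages are treated as constants.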