Project 5
Spring 2022
Due: T 12/6 @ 11:59 PM ET
9 points
Implement Russell and Norvig's grid world, depicted in Figure 17.1 of their book, as an environment, so that the agent does not have direct access to the transition model.
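As a starting point, the environment might look like the sketch below. It assumes the standard 4x3 grid from Figure 17.1: a wall at (2, 2), a +1 terminal at (4, 3), a -1 terminal at (4, 2), a living cost of -0.04, and stochastic moves that succeed with probability 0.8 and slip perpendicular with probability 0.1 each. The class and method names (`GridWorld`, `reset`, `step`) are placeholders, not the names the starter code uses; the key point is that the transition model lives inside `step`, so the agent only ever sees (next state, reward, done).

```python
import random

class GridWorld:
    """Sketch of Russell & Norvig's 4x3 grid world (Fig. 17.1).

    States are (col, row) with (1, 1) at the bottom left. The agent
    interacts only through reset() and step(); the transition
    probabilities are hidden inside the environment.
    """
    ACTIONS = ["up", "down", "left", "right"]
    DELTAS = {"up": (0, 1), "down": (0, -1),
              "left": (-1, 0), "right": (1, 0)}
    # Perpendicular slip directions for each intended action.
    PERP = {"up": ("left", "right"), "down": ("left", "right"),
            "left": ("up", "down"), "right": ("up", "down")}

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.terminals = {(4, 3): 1.0, (4, 2): -1.0}
        self.wall = (2, 2)
        self.reset()

    def reset(self):
        self.state = (1, 1)  # conventional start square
        return self.state

    def _move(self, state, action):
        dx, dy = self.DELTAS[action]
        nxt = (state[0] + dx, state[1] + dy)
        # Bumping into the wall or the boundary leaves the agent in place.
        if nxt == self.wall or not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3):
            return state
        return nxt

    def step(self, action):
        # 0.8 intended direction, 0.1 each perpendicular slip.
        r = self.rng.random()
        if r < 0.8:
            actual = action
        elif r < 0.9:
            actual = self.PERP[action][0]
        else:
            actual = self.PERP[action][1]
        self.state = self._move(self.state, actual)
        if self.state in self.terminals:
            return self.state, self.terminals[self.state], True
        return self.state, -0.04, False
```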
Implement Sutton and Barto's algorithm for Q-learning using an $\epsilon$-greedy strategy. You can find this algorithm in your lecture notes or on page 131 of Sutton and Barto's book Reinforcement Learning: An Introduction.
I put starter code for this project on Canvas (in the Files section) and on cs-class. You can retrieve it on cs-class by typing
cs-class-1$ cp ~maloofm/cosc570/p5.zip ./
Name: ____________________ NetID: ____________________

In accordance with the class policies and Georgetown's Honor Code, I certify that, with the exceptions of the course materials and those items noted below, I have neither given nor received any assistance on this project.
When you are ready to submit your project for grading, put your source files, Makefile, and honor statement in a zip file named submit.zip. Upload the zip file to Autolab using the assignment p5. Make sure you remove all debugging output before submitting.
Copyright © 2022 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.