Human-Reward Q-Learning Demo

Agent chooses an action. You choose the reward. The policy learns from you.
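
The loop described above can be sketched as a tabular Q-learning update in which the reward comes from a person rather than from the environment. This is a minimal illustration under stated assumptions, not the demo's actual code: the action names, the function names (`make_q_table`, `choose_action`, `update`), the epsilon-greedy selection, and the `alpha` and `gamma` defaults are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical action set for a grid world; the demo's real actions may differ.
ACTIONS = ["up", "down", "left", "right"]


def make_q_table():
    """Return a Q-table mapping state -> {action: value}, defaulting to 0.0."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q[state]
    return max(ACTIONS, key=lambda a: values[a])


def update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning step using a reward supplied by the human.

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    rng = random.Random(0)
    q = make_q_table()
    state, next_state = (0, 0), (0, 1)
    action = choose_action(q, state, epsilon=0.1, rng=rng)
    human_reward = 1.0  # in the demo, this number would come from the user
    new_value = update(q, state, action, human_reward, next_state)
    print(action, new_value)
```

Because the reward is whatever the person enters, the learned Q-values reflect that person's preferences rather than a fixed task objective.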

Q-values

Max Q per state (grid)

Color scale: low value to high value

All Q-values for all states

Learning Parameters