- Click **Next Action**. The agent chooses an action with ε-greedy Q-learning and moves in the grid.
- Look at the move and click a reward:
  - 👎 −1 = bad move
  - 😐 0 = neutral / no opinion
  - 👍 +1 = good move
- Your feedback is used as the reward signal to update the Q-table. Over time,
  the agent learns what you consider good or bad actions.
  This is a minimal example of human-in-the-loop reinforcement learning:
  the environment doesn't define the reward; you do.
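The loop above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch (state names, hyperparameters, and helper functions are assumptions, not the demo's actual code): an ε-greedy action choice over a Q-table, followed by the standard Q-learning update where the reward comes from the user's click rather than the environment.

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed hyperparameters

# Q-table: maps (state, action) to an estimated value, defaulting to 0.
Q = defaultdict(float)

def choose_action(state, epsilon=EPSILON):
    """Epsilon-greedy: explore with probability epsilon, else pick the best-known action."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, human_reward, next_state):
    """Standard Q-learning update; human_reward is the user's click (-1, 0, or +1)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (human_reward + GAMMA * best_next - Q[(state, action)])

# One interaction step: the agent moves from s to s_next and the user clicks 👍 (+1).
s, s_next = (0, 0), (0, 1)
a = choose_action(s)
update(s, a, +1, s_next)
```

The only change from textbook Q-learning is the source of the reward: instead of reading it from the environment's transition, the update waits for the human's rating of the move.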