Welcome to the Play-All-ToyText with Q-Learning project! 🚀
In this project, I've applied the Q-Learning algorithm to popular ToyText environments such as FrozenLake, CliffWalking, Blackjack, and Taxi.
The goal is to train agents using Q-Learning to optimize policies and maximize rewards in these environments.
In ToyText environments, agents learn to take actions by maximizing rewards in games such as:
- FrozenLake 🌊
- CliffWalking 🧗
- Blackjack 🃏
- Taxi 🚕
The objective of this project is to apply the Q-Learning algorithm to optimize agents' policies and achieve the highest possible rewards.
The Q-learning update for the Q-table is expressed as:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where:
- $Q(s, a)$: The current Q-value for performing action $a$ in state $s$.
- $\alpha$ (Learning Rate): A scalar that controls how much the new information influences the update. Values range from $0$ to $1$.
- $r$ (Reward): The immediate reward received after performing action $a$ in state $s$.
- $\gamma$ (Discount Factor): A scalar between $0$ and $1$ that determines the importance of future rewards. A higher $\gamma$ emphasizes long-term rewards.
- $\max_{a'} Q(s', a')$: The maximum Q-value for the next state $s'$ across all possible actions $a'$.
- $(s, a)$: The current state and action pair.
- $(s', a')$: The next state and an action available in it.
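
As a minimal sketch of the update described by the terms above, the following Python function applies one tabular Q-learning step: it computes the TD error $r + \gamma \max_{a'} Q(s', a') - Q(s, a)$ and adds $\alpha$ times that error to $Q(s, a)$. The function name `q_update`, the dictionary-based table, and the `terminated` flag (which drops the bootstrap term when an episode ends) are illustrative assumptions, not code taken from this project.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.99, terminated=False):
    """Apply one tabular Q-learning update in place and return the TD error.

    `q` maps (state, action) pairs to Q-values. All names here are
    hypothetical and chosen only for illustration.
    """
    # max_{a'} Q(s', a'); assumed to be zero once the episode has terminated.
    if terminated:
        best_next = 0.0
    else:
        best_next = max(q[(next_state, a)] for a in range(n_actions))

    # TD error: r + gamma * max_{a'} Q(s', a') - Q(s, a)
    td_error = reward + gamma * best_next - q[(state, action)]

    # Move Q(s, a) toward the target by a fraction alpha of the TD error.
    q[(state, action)] += alpha * td_error
    return td_error


# Worked example: Q(0, 1) starts at 0, and the best next-state value is 1.0.
q = defaultdict(float)
q[(1, 0)] = 0.5
q[(1, 1)] = 1.0
td = q_update(q, state=0, action=1, reward=1.0, next_state=1,
              n_actions=2, alpha=0.5, gamma=0.5)
print(td, q[(0, 1)])  # 1.5 0.75
```

In the worked example the TD error is $1.0 + 0.5 \cdot 1.0 - 0 = 1.5$, and with $\alpha = 0.5$ the entry $Q(0, 1)$ moves halfway toward the target, from $0$ to $0.75$.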

In this update:
- Temporal Difference (TD) Error: The difference between the target value (the reward plus the discounted best next-state value) and the current Q-value:
  $\text{TD Error} = r + \gamma \max_{a'} Q(s', a') - Q(s, a)$
- Q-value Update: The Q-value for the current state-action pair $(s, a)$ is updated by adding the TD error scaled by the learning rate $\alpha$. This balances learning from new experiences against relying on existing knowledge.
- Learning Dynamics:
  - The update incorporates both the immediate reward $r$ and the discounted future reward $\gamma \max_{a'} Q(s', a')$.
  - Over time, the Q-table converges to optimal values, assuming sufficient exploration and a properly tuned learning rate.
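
To illustrate the learning dynamics above, here is a hedged sketch of a complete training loop with epsilon-greedy exploration. The five-state corridor environment, the `step` and `train` helpers, and every hyperparameter value are hypothetical stand-ins, chosen so the example runs with only the standard library; they are not this project's ToyText setup.

```python
import random

# Hypothetical corridor: states 0..4, actions 0 = left, 1 = right.
N_STATES, N_ACTIONS = 5, 2
GOAL = N_STATES - 1


def step(state, action):
    """Toy deterministic environment (an assumption, not a ToyText task)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0  # reward only on reaching the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1,
          max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon,
            # otherwise act greedily, breaking ties at random.
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(N_ACTIONS) if q[state][a] == best])

            next_state, reward, done = step(state, action)

            # Target: immediate reward plus discounted best future value
            # (no bootstrap term once the episode has ended).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action for each non-terminal state.
greedy_policy = [max(range(N_ACTIONS), key=lambda a: q_table[s][a])
                 for s in range(GOAL)]
print(greedy_policy)
```

With these settings the greedy policy should prefer action `1` (right) in every non-terminal state, and the learned values should approach $\gamma^{k}$ for a state $k$ steps before the final move into the goal. Setting `epsilon=0` removes random exploration, which is a simple way to see why the convergence claim is stated under the assumption of sufficient exploration.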

| Environment | Demo | Plot (Results) |
|---|---|---|
| FrozenLake-v1 | | |
| CliffWalking-v0 | | |
| Blackjack-v1 | | |
| Taxi-v3 | | |