This repository contains a solution for the CartPole-v1 problem of the gymnasium library with Deep Reinforcement Learning.
The project focuses on two major algorithms, DQN and SARSA, and evaluates their performance in solving the CartPole-v1 problem.
The CartPole-v1 environment involves balancing a pole on a cart that moves along a frictionless track. The agent's task is to prevent the pole from falling by applying forces to the cart. Below are the key features and conditions:
| Property | Details |
|---|---|
| Goal | Keep the pole balanced for as long as possible. |
| Reward | +1 for every time step the pole is balanced. |
| Termination Conditions | Pole angle exceeds 12° or the cart position exceeds the track boundaries. |
| Maximum Episode Length | 500 time steps. |
**Observation Space** (4 continuous values):
- Cart Position (x)
- Cart Velocity (ẋ)
- Pole Angle (θ)
- Pole Angular Velocity (θ̇)

**Action Space** (2 discrete actions):
- 0: Push the cart to the left.
- 1: Push the cart to the right.
The task is episodic, and a well-trained agent aims to keep the pole balanced for the maximum reward of 500. For more information, visit the CartPole-v1 documentation.
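For reference, a minimal interaction loop with the environment (using a random policy rather than this repository's trained agents) looks like the sketch below:

```python
# Minimal gymnasium interaction loop for CartPole-v1 (random policy, for illustration only).
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)  # observation = [x, x_dot, theta, theta_dot]

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random choice: 0 = push left, 1 = push right
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # +1 for every step the pole stays up
    done = terminated or truncated      # truncated at 500 steps, terminated on failure

print(f"Episode return: {total_reward}")
env.close()
```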
- **Algorithm Implementation**:
  - Implement the DQN and SARSA algorithms and train an agent with each separately (a minimal Q-network and action-selection sketch follows after this list).
  - Compare the success or failure of each algorithm, focusing on convergence speed, reward maximization, and overall performance.
- **Evaluation Metrics**:
  - Plot graphs for (a plotting sketch follows after this list):
    - Rewards: the total reward earned by the agent per episode.
    - Loss (for DQN): the error in predicting future rewards.
    - Epsilon Decay: how the exploration-exploitation trade-off evolves during training.
- **Hyperparameter Tuning**:
  - Train the DQN model with at least three different sets of hyperparameters and report the results in a table.
  - Analyze the impact of these hyperparameters on the performance of the models.
- **Testing**:
  - Test functions are provided in the code, and all final weights are saved in the respective files for reproducibility.
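As a rough illustration of the building block both agents share, a Q-network plus epsilon-greedy action selection might look like the sketch below; the class and function names are placeholders, not the repository's actual code:

```python
# Illustrative Q-network and epsilon-greedy action selection for CartPole (names are placeholders).
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a 4-dimensional CartPole state to Q-values for the 2 actions."""
    def __init__(self, state_dim=4, n_actions=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def select_action(q_net, state, epsilon, n_actions=2):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily on Q-values."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())
```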
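Similarly, the three training curves could be plotted with a small matplotlib helper along these lines; the `rewards`, `losses`, and `epsilons` lists are assumed to be collected during training (names are illustrative):

```python
# Sketch of plotting per-episode rewards, DQN training loss, and the epsilon schedule.
import matplotlib.pyplot as plt

def plot_training_curves(rewards, losses, epsilons):
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    axes[0].plot(rewards)
    axes[0].set(title="Episode reward", xlabel="Episode", ylabel="Reward")
    axes[1].plot(losses)
    axes[1].set(title="Training loss (DQN)", xlabel="Update step", ylabel="Loss")
    axes[2].plot(epsilons)
    axes[2].set(title="Epsilon decay", xlabel="Episode", ylabel="Epsilon")
    fig.tight_layout()
    plt.show()
```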
Below are the hyperparameter tables for the DQN and SARSA algorithms across the three stages of experimentation:
![DQN Hyperparameters](https://private-user-images.githubusercontent.com/79360286/371162355-dd3bff16-063f-4122-b867-029d8fbd964e.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjIzNTUtZGQzYmZmMTYtMDYzZi00MTIyLWI4NjctMDI5ZDhmYmQ5NjRlLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWVmYTdjNjk4YTM1OWY2ZDBiZDA3ZmQ1MmI0OTM3ZmIwMThlNWNmYTg4MDgzNTNiMmZjMTM3YzNkZjQ2YTAwOTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.1vGqSkS_lHv6R3GOG_EQTjESjXXsJ6gSttu4UcuoaWI)
![SARSA Hyperparameters](https://private-user-images.githubusercontent.com/79360286/371162375-2723684d-ec79-4f15-9412-fb32de9eabad.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjIzNzUtMjcyMzY4NGQtZWM3OS00ZjE1LTk0MTItZmIzMmRlOWVhYmFkLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU2OTFlZWExYjk0MzUxZmQ5NTU4YTIzZWZjOGIxZjg2ZmE5YTQ2N2Q2NjMzOGViOWZmNDQxMjZhMzIxNTljYTEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.nVqcJvH1qUbQX0n6JaF87pJb8yHL9xDSMZiEQ_U9kJQ)
- **Convergence**:
  DQN consistently outperforms SARSA, converging faster and achieving higher rewards. SARSA, being an on-policy algorithm, makes less efficient use of experience than the off-policy DQN (the update-target sketch after the result plots makes the difference concrete).
  - DQN: reuses each experience multiple times via a replay buffer and bootstraps from the maximum Q-value over next actions, leading to better performance.
  - SARSA: bootstraps from the action actually taken by the current policy, which results in slower convergence and noisier performance.
- **Results Summary**:
  - DQN converges to optimal rewards, while SARSA struggles with noisy updates.
![DQN Rewards](https://private-user-images.githubusercontent.com/79360286/371164767-3faaa044-d5df-44c0-b620-e42cf1b1b69d.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjQ3NjctM2ZhYWEwNDQtZDVkZi00NGMwLWI2MjAtZTQyY2YxYjFiNjlkLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWExMmRkOTAxNjU5MTRhODMxZGVjN2RiMGExODNkZGFkNTgyYWM3MTFmZDczYTFiY2Y0ZDk4YTUwZWY1NTg3MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.N0SAISXHyZRiHoJ0ReQ86803c2S2j6c-RGl45txovVo)
![DQN Loss](https://private-user-images.githubusercontent.com/79360286/371164707-6c8fecdb-fd6f-4607-af14-76cda3b96809.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjQ3MDctNmM4ZmVjZGItZmQ2Zi00NjA3LWFmMTQtNzZjZGEzYjk2ODA5LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPThhM2M5NDM3NzJmMjQ2ZmVmYTg3NTY1YTI0NmJjMWYxMzAyNjBjN2U1MmVjY2Y2ZWQ0YWJkMjdkYzBkMTBmMTAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.QkSFEuCmSGOkxpgq6m42Up0ScgFlf6Bcw9qUeuMKjeQ)
![DQN Epsilon](https://private-user-images.githubusercontent.com/79360286/371164697-74ffc7fe-5a55-467d-ac04-0ad9cbe48945.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjQ2OTctNzRmZmM3ZmUtNWE1NS00NjdkLWFjMDQtMGFkOWNiZTQ4OTQ1LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTJmZTFkMTg1NzdhYmU5NWU5MzExMTkwNWY1NDJhOGMyMjRhZWQ5YzE4NWM2NjcwZTBjODhlMTRlYjViMjc1NTImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.WGAKGcrjiBCdcDY1pUi5iuyOeohd934edroV78vUft0)
![SARSA Rewards](https://private-user-images.githubusercontent.com/79360286/371165817-8e6f27aa-c492-4948-9a1b-0babc35681e9.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjU4MTctOGU2ZjI3YWEtYzQ5Mi00OTQ4LTlhMWItMGJhYmMzNTY4MWU5LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTRhNDQwNTYyZTUxM2E1MzViMTUwOTQ2MDlhNDkwNWZlN2M4ZjU0ZWU0MDVmMWNmZGU5ZTlhMjZhMGQ2N2I4NzcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.-pwhjNuv5SnLfmP0u4lFG2R44qLwq90ICq1Fk7_a25o)
![SARSA Loss](https://private-user-images.githubusercontent.com/79360286/371165841-309c2a9a-5abd-46f0-ac34-ee4f80b617fd.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjU4NDEtMzA5YzJhOWEtNWFiZC00NmYwLWFjMzQtZWU0ZjgwYjYxN2ZkLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTJkN2RhZDVkY2JjYWFlYTFlYWU1NDdmMGNkMzQ1YTViOTY1MmUzYmVlYzFhNDE3OWEzMWQ3YTBkY2IyMTFhYTAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.-fYy2xHTAX6NDD-RABD_oYLemyt9XapiYAJyDov5OIA)
![SARSA Epsilon](https://private-user-images.githubusercontent.com/79360286/371165849-427f8900-c147-4e88-a287-b72dae56b8c8.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjU4NDktNDI3Zjg5MDAtYzE0Ny00ZTg4LWEyODctYjcyZGFlNTZiOGM4LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJmNzk1ZDljM2JhNWM5YmNiNGZmNzk5OGE0NjlkNGYxMzkyOTFlYTIxMTVjOTcyMTIxNTVhY2VjMGZlZDcwOGQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Nz_BVthY4Al1wt2LC3arwQ9d_SMBIgTyXgC4pbk3raw)
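To make the DQN/SARSA difference concrete, here is a hedged sketch of the two bootstrap targets, assuming PyTorch tensors for a sampled batch; the tensor and network names are illustrative, not the repository's actual code:

```python
# Sketch contrasting the bootstrap targets of DQN (off-policy) and SARSA (on-policy).
# Assumed shapes: rewards, dones -> [B]; next_states -> [B, 4]; next_actions -> [B] (int64).
import torch

def dqn_target(target_net, rewards, next_states, dones, gamma=0.99):
    # Off-policy: bootstrap from the best next action, regardless of what was actually played.
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * max_next_q * (1.0 - dones)

def sarsa_target(q_net, rewards, next_states, next_actions, dones, gamma=0.99):
    # On-policy: bootstrap from the action the current policy actually took in the next state.
    with torch.no_grad():
        next_q = q_net(next_states).gather(1, next_actions.unsqueeze(1)).squeeze(1)
    return rewards + gamma * next_q * (1.0 - dones)
```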
- **Boltzmann Exploration**:
  Instead of using epsilon-greedy for exploration, Boltzmann (softmax) exploration was implemented. In this strategy, a temperature parameter controls the randomness of action selection (a minimal sketch appears after the tuning table below).
- **Parameters**:
  - Temperature: controls the exploration intensity.
  - Decay Rate: determines how fast the temperature decreases.
- **Results**:
  - Boltzmann temperature control led to faster convergence than epsilon-greedy.
  - Hyperparameter tuning further accelerated convergence, with the optimal parameters leading to early stopping at around 1500 episodes.
![Boltzmann Rewards](https://private-user-images.githubusercontent.com/79360286/371168874-b635d979-dd86-4c5f-904a-866411d76d91.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjg4NzQtYjYzNWQ5NzktZGQ4Ni00YzVmLTkwNGEtODY2NDExZDc2ZDkxLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ5MWZmZWY1NmQ4M2E0MzQ3NTBkNTkzNjI5MDgzMjM5OTg0MWU5NzFlMWYxNjljODc4ZjBmYzBmYjM0NmVmNzImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.1cz7hOZt9rlhlPJZJBfu1Exk_BBJdowvOZpmKEmSApE)
![Boltzmann Loss](https://private-user-images.githubusercontent.com/79360286/371168847-84308e0d-d8d9-4c79-9d7a-5155c85c8ea6.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjg4NDctODQzMDhlMGQtZDhkOS00Yzc5LTlkN2EtNTE1NWM4NWM4ZWE2LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTg5MjE4NjRkNzc4ZmQ1NTA0YWMyNTYyNTE0YzUwYWUyNmUwODc3NmUzYWQ3MjQ1NmU4NTcxOTM2NzkxNmZhM2QmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.0vCSCQ2GZupe--PMu4W1ffXKfEyN0JkCLPEJADmtG84)
![Boltzmann Temperature](https://private-user-images.githubusercontent.com/79360286/371169473-4ff84bc6-4f2d-4617-8f49-b82625a6ee83.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTgxMDksIm5iZiI6MTczOTQxNzgwOSwicGF0aCI6Ii83OTM2MDI4Ni8zNzExNjk0NzMtNGZmODRiYzYtNGYyZC00NjE3LThmNDktYjgyNjI1YTZlZTgzLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAzMzY0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTNlODcxNDE4ODhmY2E3Y2UxNzI3MDkxY2E0ZThjZmJiNzQwNDNjZjY1OThjYTI2ZGIzNTUzYjI0ODBhZjQzNzYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.7-ROeiDY3JaABn952zLLbJXSMMRXs5bWvuCkHl5yzO8)
| Parameter | Run 1 | Run 2 | Run 3 | Optimal |
|---|---|---|---|---|
| Learning Rate | 2.3e-2 | 2e-2 | 1e-3 | 2.3e-2 |
| Discount Factor | 0.9 | 0.98 | 0.96 | 0.93 |
| Update Frequency | 8 | 5 | 15 | 10 |
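For reference, a minimal sketch of Boltzmann (softmax) action selection with temperature decay is shown below; the network interface and the schedule values are assumptions for illustration, not the tuned settings from the table above:

```python
# Sketch of Boltzmann (softmax) exploration: actions are sampled in proportion to
# exp(Q(s, a) / temperature); high temperature -> near-uniform, low temperature -> near-greedy.
import torch
import torch.nn.functional as F

def boltzmann_action(q_net, state, temperature):
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    probs = F.softmax(q_values / temperature, dim=1)
    return int(torch.multinomial(probs, num_samples=1).item())

# Illustrative temperature schedule (values are assumptions, not the tuned ones above).
temperature, temperature_min, decay_rate = 1.0, 0.05, 0.995
for episode in range(2000):
    # ... run one episode, selecting actions with boltzmann_action(q_net, state, temperature) ...
    temperature = max(temperature_min, temperature * decay_rate)
```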
*Demo video: `Agent.s.game.play.MP4` (the trained agent playing an episode).*
This project demonstrates the advantages of DQN over SARSA in reinforcement learning tasks, particularly in environments that reward efficient exploration and experience reuse. Boltzmann exploration proved more effective than epsilon-greedy, especially when tuned correctly.
To run the code:
- Install the required libraries:
  `pip install gymnasium`
- Clone the repository:
  `git clone https://github.com/navidadkhah/CartPole-V1`
  `cd CartPole-V1`
- Run the training scripts for DQN and SARSA.
- Run the test functions using their corresponding saved weights.
This project is under the MIT License, and I’d be thrilled if you use and improve my work!