Snake PPO

A pedagogical Proximal Policy Optimization (PPO) project applied to the classic Snake game.

Overview

This repository demonstrates how a PPO-based agent learns to play Snake by collecting food and avoiding collisions. Over time, the agent discovers a strategy of quickly circling the reward before taking it, which reduces self-collisions, since its observation does not include the precise position of its entire tail.

Installation

  1. Clone this repository:
    git clone https://github.com/DREI-8/Snake_PPO.git
  2. Install the dependencies:
    pip install -r requirements.txt
  3. Run or modify main.ipynb to train or test the PPO agent.

PPO Algorithm

PPO is a policy gradient method designed to stabilize training by limiting updates to the policy. In this project:

  • We use clipping (controlled by "clip_epsilon") to avoid overly large updates.
  • We incorporate entropy regularization (controlled by "entropy_coef") to encourage exploration.
  • We apply Generalized Advantage Estimation (GAE) for more stable advantage computation.

The core training code is in the train function of PPOAgent, and the environment logic lives in SnakeEnv.
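
As a rough illustration of the three ingredients above, here is a minimal PyTorch sketch of GAE and the clipped loss. This is not the repository's actual train code; the tensor names, shapes, and default hyperparameter values are assumptions:

    import torch

    def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
        # Generalized Advantage Estimation over one rollout of length T.
        # `values` has length T + 1 (it includes a bootstrap value for the
        # state reached after the final step).
        T = rewards.shape[0]
        advantages = torch.zeros(T)
        gae = 0.0
        for t in reversed(range(T)):
            not_done = 1.0 - dones[t]  # a terminal step cuts the bootstrap
            delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
            gae = delta + gamma * lam * not_done * gae
            advantages[t] = gae
        returns = advantages + values[:T]  # targets for the value function
        return advantages, returns

    def ppo_loss(new_log_probs, old_log_probs, advantages, entropy,
                 clip_epsilon=0.2, entropy_coef=0.01):
        # Clipped surrogate objective with an entropy bonus (to be minimized).
        ratio = torch.exp(new_log_probs - old_log_probs)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1 - clip_epsilon, 1 + clip_epsilon) * advantages
        policy_loss = -torch.min(unclipped, clipped).mean()
        # Subtracting the entropy term encourages exploration.
        return policy_loss - entropy_coef * entropy.mean()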

Demo

Below is a demo of the trained agent playing. It occasionally circles the reward to avoid self-collisions, since it does not know its tail's exact position (see main.ipynb for details on the observation space):

Demo GIF placeholder

Results

The agent’s reward curve increases as it masters collecting food. Episode length rises at first as survival improves, then falls once the agent learns to trade longevity for quicker gains:

Reward Curves placeholder

Usage

  • Train the agent by running:
    # Inside main.ipynb
    agent.train(total_epochs=10000, steps_per_epoch=4096)
  • Test the trained agent (with optional rendering):
    agent.test_episode(render=True)

Explore main.ipynb for more details on experiments, and review the in-code comments for deeper understanding.
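
To make steps_per_epoch concrete, below is a hedged sketch of the kind of rollout loop a PPO trainer typically runs each epoch. SnakeEnv's exact reset/step signature here is an assumption, so refer to main.ipynb and the SnakeEnv source for the real interface:

    def collect_rollout(env, policy, steps_per_epoch):
        # Generic PPO-style data collection; SnakeEnv's actual reset()/step()
        # return values are assumptions, not the repository's code.
        observations, actions, rewards, dones = [], [], [], []
        obs = env.reset()
        for _ in range(steps_per_epoch):
            action = policy(obs)  # sample from the current policy
            next_obs, reward, done = env.step(action)
            observations.append(obs)
            actions.append(action)
            rewards.append(reward)
            dones.append(done)
            obs = env.reset() if done else next_obs
        return observations, actions, rewards, dones

Collecting a fixed number of steps per epoch (rather than a fixed number of episodes) keeps each PPO update batch the same size regardless of how long the snake survives.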
