Nokia original game:
- Max: Score / Max_Score
  - Eat the maximum number of apples
  - Until the game ends or is won (max apples = rows * cols - snake.init_len)
- Min: Steps / Score
  - Reach each apple in the minimum number of time-steps
This amounts to finding the shortest path in a time-dependent graph.
Partly stochastic: the next apple's location is unknown and random.
- State space: TODO
- Action space: TODO
- Transitions: deterministic (vs. probabilistic), i.e. P(s, a, Succ(s, a)) = 1 (see the sketch after this list)
- Rewards: TODO
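A minimal sketch of such a deterministic transition, assuming an illustrative `State` type and grid actions; collision and apple-respawn handling are omitted, and none of these names are the repo's actual API:

```python
from dataclasses import dataclass
from typing import Tuple

ACTIONS = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}

@dataclass(frozen=True)
class State:
    snake: Tuple[Tuple[int, int], ...]  # body cells, head first
    apple: Tuple[int, int]

def succ(state: State, action: str) -> State:
    """Deterministic successor: P(s, a, Succ(s, a)) = 1. The head advances one
    cell; the tail cell is kept only when the apple is eaten (growth). Apple
    respawn, the only random element, happens outside this function."""
    dr, dc = ACTIONS[action]
    head = (state.snake[0][0] + dr, state.snake[0][1] + dc)
    grew = head == state.apple
    body = state.snake if grew else state.snake[:-1]
    return State(snake=(head,) + body, apple=state.apple)
```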
The game can be modified for the competition:
| Parameter | Nokia | Snake Competition |
|---|---|---|
| Board size | _ rows x 15 cols | 30 rows x 30 cols |
| Snake starting length | 3 body parts | 1 body part |
| Snake starting position | Top center | Random |
| Snake starting direction | Top-down | n/a |
- The Game Completion Score (GCS) (see the computation sketch after this list)
  - Average score over the maximum score (score = number of apples eaten)
  - Objective: maximise (up to 100%)
- The Game Over Rate (GOR)
  - Number of game-overs / total games played
  - Objective: minimise
- The Performance Rate (PR)
  - Average (number of steps / score)
  - Objective: minimise
  - Max: n^2
- The Performance Rate at 0% and at 1% (PR0 and PR1)
  - PR0: the Performance Rate with a 0% game-over rate (all games won)
  - PR1: the Performance Rate with a game-over rate < 1%
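A hedged sketch of computing GCS, GOR, and PR from per-episode records; the record field names (`score`, `steps`, `game_over`) are assumptions, not the repo's data format:

```python
from statistics import mean

def competition_metrics(episodes, max_score):
    """episodes: list of dicts with keys "score", "steps", "game_over"."""
    gcs = mean(e["score"] / max_score for e in episodes)         # maximise
    gor = sum(e["game_over"] for e in episodes) / len(episodes)  # minimise
    # PR is only defined over episodes that scored at least one apple.
    pr = mean(e["steps"] / e["score"] for e in episodes if e["score"] > 0)
    return {"GCS": gcs, "GOR": gor, "PR": pr}
```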
- A. Operational Research
  - A1. Search
    - Pathfinding algorithms, such as Dijkstra and A* (see the sketch after this list)
    - To research: pathfinding in time-dependent graphs
  - A2. Optimization
    - Method: Linear Programming
  - A3. Genetic algorithms
    - NEAT algorithm with neural networks
- B. Machine Learning
  - B1. Supervised Learning / Deep Learning
    - Methods: Neural Networks (NN), Deep Neural Networks (DNN), Convolutional Neural Networks (CNN)
  - B2. Reinforcement Learning (RL)
    - Q-value iteration
    - Deep Q-Network (DQN)
    - Further: Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), Monte Carlo Tree Search
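A minimal A* sketch on the snake grid with a Manhattan-distance heuristic, treating the current snake body as blocked cells. All names here are illustrative; a time-dependent variant would additionally model tail cells freeing up as the snake advances:

```python
import heapq

def a_star(start, goal, blocked, rows=30, cols=30):
    """Shortest path from start to goal avoiding `blocked` cells. Returns the
    list of cells from start to goal, or None if the goal is unreachable."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]  # (f, g, position, path)
    seen = set()
    while frontier:
        _, g, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        r, c = pos
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in blocked and nxt not in seen):
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None
```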
Game features:
- General
  - Basic game logic
  - Human agent (keyboard)
  - Create GUI using pygame
  - Refactor code using OOP
  - Separate Engine and Agent logic
  - Refactor Session/Engine class for the OpenAI Gym interface Env() (see the sketch after this list)
  - Refactor Agent
- History
  - Record Episodes
  - Replay and Tests
  - Save env states and solutions
  - [ ]
- Benchmarking
  - [ ]
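A skeleton of the Env() refactor target, following the classic OpenAI Gym reset/step interface. The class name, board encoding, and delegation points are assumptions, not the repo's code:

```python
import gym
import numpy as np
from gym import spaces

class SnakeEnv(gym.Env):
    """Skeleton only: game logic would be delegated to the existing
    Session/Engine class."""

    def __init__(self, rows=30, cols=30):
        super().__init__()
        self.rows, self.cols = rows, cols
        self.action_space = spaces.Discrete(4)  # up, down, left, right
        # Cell encoding assumption: 0 = empty, 1 = snake, 2 = apple
        self.observation_space = spaces.Box(0, 2, (rows, cols), dtype=np.uint8)

    def reset(self):
        self.board = np.zeros((self.rows, self.cols), dtype=np.uint8)
        return self.board

    def step(self, action):
        # Delegate to the game engine here; placeholder returns a no-op.
        reward, done, info = 0.0, False, {}
        return self.board, reward, done, info
```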
A1. Pathfinding Algorithms:
- Add Greedy algorithm / agent
- Add A* algorithm / agent
- Add modes: Play and Benchmark
- Logging history
- Create snapshots of the game configuration/situation/..
- Replay of snapshots
- Rewind steps (manual) (see the history sketch after this list)
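One possible shape for snapshots and manual rewind: a stack of deep-copied states. Class and method names are illustrative assumptions:

```python
import copy

class History:
    """Stack of game-state snapshots supporting replay and manual rewind."""

    def __init__(self):
        self._stack = []

    def snapshot(self, state):
        self._stack.append(copy.deepcopy(state))

    def rewind(self, steps=1):
        """Drop the last `steps` snapshots; return the state now on top."""
        del self._stack[-steps:]
        return self._stack[-1] if self._stack else None
```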
A3. Evolution/Genetic algorithms selecting NNs:
- [ ]
B1. Supervised Learning / Deep Learning:
- Generating training data (see the sketch below)
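A hedged sketch of one way to generate training data: roll out an expert agent (e.g. the A* agent above) and record (observation, action) pairs. `generate_dataset`, `expert`, and the env interface are assumptions:

```python
def generate_dataset(env, expert, episodes=100, max_steps=2_000):
    """Roll out `expert` in `env`, recording (observation, action) pairs."""
    X, y = [], []
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(max_steps):  # cap rollout length
            action = expert(obs)    # e.g. next move along the A* path
            X.append(obs.copy())
            y.append(action)
            obs, _, done, _ = env.step(action)
            if done:
                break
    return X, y
```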
B2. Reinforcement Learning:
- Q-Learning (see the sketch after this list)
  - [ ]
- Deep Q-Learning
  - [ ]
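A minimal tabular Q-learning sketch against the Gym-style env above. Hashing the full flattened board as the state is intractable for a 30x30 grid, so a real agent would need a compact feature encoding; hyper-parameters and names are assumptions:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, eps=0.1,
               max_steps=10_000):
    """Tabular Q-learning; states are hashed as flattened board tuples."""
    n = env.action_space.n
    Q = defaultdict(lambda: [0.0] * n)
    for _ in range(episodes):
        s = tuple(env.reset().flatten())
        for _ in range(max_steps):
            if random.random() < eps:
                a = random.randrange(n)                   # explore
            else:
                a = max(range(n), key=lambda i: Q[s][i])  # exploit
            obs, r, done, _ = env.step(a)
            s2 = tuple(obs.flatten())
            # Standard Q-learning update rule
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break
    return Q
```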