- Support games with more than 2 players
- Speed/memory optimized: about 3000 rollouts/sec per CPU core, i.e. roughly 5 sec/game during self-play (800 rollouts per move) on a 2019 i5 without GPU. All in all, that is a 25x to 100x speed improvement over the initial repo, see details here.
- MCTS and game logic optimized thanks to Numba; NN inference now accounts for >70% of self-play time according to profiling
- Neural network inference speed, and especially latency, improved thanks to ONNX
- Batched MCTS for speed, no use of virtual loss
- Memory optimized with no performance impact, using zlib compression (see the sketch below)
- Algorithm improvements based on Accelerating Self-Play Learning in Go
- Playout Cap Randomization (sketched below)
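
For context, Playout Cap Randomization trades search depth for sample diversity: most moves get a cheap search, and only the occasional full-budget search produces training data. A minimal sketch of the idea (the caps and probability are illustrative values, not the ones used in this repo):

```python
import random

FULL_CAP, SMALL_CAP = 600, 100   # illustrative playout budgets
P_FULL = 0.25                    # fraction of moves searched with the full cap

def choose_playout_cap():
    """Playout Cap Randomization (KataGo): most moves get a cheap search that
    just drives the game forward; a random fraction gets the full search, and
    only those positions are recorded as training samples."""
    if random.random() < P_FULL:
        return FULL_CAP, True    # full search: record this position for training
    return SMALL_CAP, False      # cheap search: play the move, record nothing
```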
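The zlib memory optimization mentioned above boils down to compressing each self-play sample before it enters the replay buffer and decompressing it on batch sampling; a minimal sketch with hypothetical helper names (not the repo's actual API):

```python
import pickle
import zlib

# Self-play samples (board, policy, value) are highly redundant
# (mostly zeros), so zlib shrinks them well at negligible CPU cost.
def compress_sample(sample, level=6):
    return zlib.compress(pickle.dumps(sample), level)

def decompress_sample(blob):
    return pickle.loads(zlib.decompress(blob))
```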
- Improve MCTS strength
- Added Dirichlet noise as per the original DeepMind paper, using this pull request
- Automatic Dirichlet noise computation on each move (sketched after this list)
- First Play Urgency (FPU) based on parent value (article)
- Learning based on both Q and Z values (blog)
- Forced Playouts from the KataGo article
- Compute policy gradients properly when some actions are invalid, based on A Closer Look at Invalid Action Masking in Policy Gradient Algorithms and its repo (sketched after this list)
- Temperature strategy from SAI
- Optimize the number of MCTS rollouts per move, see GitHub ticket
- MCTS parameters tuning
- PC-PIMC or what I call "universes" (https://doi.org/10.3389/frai.2023.1014561)
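
The automatic Dirichlet noise above can be sketched like this: the noise is mixed into the root prior as in the AlphaZero paper, with alpha derived from the number of legal moves so that wide and narrow action spaces get comparable noise. The eps and alpha_scale values here are illustrative only:

```python
import numpy as np

def add_root_dirichlet_noise(prior, legal_mask, eps=0.25, alpha_scale=10.0):
    """Mix Dirichlet noise into the root prior, as in the AlphaZero paper:
    P(a) <- (1 - eps) * P(a) + eps * Dir(alpha),
    with alpha scaled by the number of legal moves."""
    legal = np.flatnonzero(legal_mask)
    alpha = alpha_scale / len(legal)
    noise = np.random.dirichlet([alpha] * len(legal))
    noisy = prior.copy()
    noisy[legal] = (1.0 - eps) * prior[legal] + eps * noise
    return noisy
```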
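And the invalid-action-masking fix amounts to masking logits *before* the softmax rather than zeroing probabilities after it; a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def masked_log_softmax(logits, legal_mask):
    """Invalid action masking: push the logits of illegal moves to a large
    negative value before the softmax. Illegal moves then get ~0 probability
    and, unlike renormalizing probabilities afterwards, the policy gradient
    flows only through legal moves."""
    masked = torch.where(legal_mask, logits, torch.full_like(logits, -1e9))
    return F.log_softmax(masked, dim=-1)
```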
- Improve NN strength
- Use blocks from MobileNetV3 for a good accuracy/speed trade-off
- Improve training speed using OneCycleLR and AdamW (see the training sketch below)
- Upgrade to KL-divergence loss instead of cross-entropy
- HyperParameters Optimization with Population-Based Training
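
To make the OneCycleLR/AdamW and KL-divergence items concrete, here is a minimal training-step sketch; `net`, `loader` and `EPOCHS` are placeholders, and the hyperparameters are illustrative, not this repo's settings:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(net.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=EPOCHS * len(loader))

for boards, target_pi, target_v in loader:
    log_pi, v = net(boards)  # assume the policy head outputs log-probabilities
    # KL(target || prediction) equals cross-entropy minus the target's
    # entropy; since MCTS targets are soft distributions (not one-hot),
    # the KL form gives a loss that actually reaches 0 at a perfect fit.
    loss_pi = F.kl_div(log_pi, target_pi, reduction='batchmean')
    loss_v = F.mse_loss(v.squeeze(-1), target_v)
    optimizer.zero_grad()
    (loss_pi + loss_v).backward()
    optimizer.step()
    scheduler.step()  # OneCycleLR steps once per batch, not per epoch
```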
What I tried but didn't work:
- MCTS: the advanced cpuct formula (using init and base constants, sketched below), surprise weight, and training with Z and Q values handled separately (not averaged) like this article
- NN: SGD optimizer, ReduceLROnPlateau scheduler
- NN architecture: Dropout, BatchNorm2d (BatchNorm works), the GARB article, and standard architectures like EfficientNet, ResNet, ResNet v2, Squeeze-Excitation, Inception, ResNeXt, ...
- Performance improvements: alternative memory allocators (TBB, TC, JE, ...)
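
For reference, the "advanced cpuct formula" above is the visit-dependent variant from the AlphaZero pseudocode (the constants are DeepMind's published defaults); it was tried here and dropped:

```python
import math

def cpuct(parent_visits, c_init=1.25, c_base=19652):
    """Visit-dependent exploration constant from the AlphaZero pseudocode:
    starts at c_init and grows slowly (logarithmically) with the parent's
    visit count. Tried here, but it did not beat a tuned constant cpuct."""
    return c_init + math.log((1 + parent_visits + c_base) / c_base)
```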
Other changes: parameters can be set on the command line (with new parameters such as a time limit) and improved prints (logging, tqdm, colored bars depending on current Arena results). Also outputs an Elo-like ranking (sketched below).
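
The ranking follows the usual Elo logic; a minimal sketch of the textbook update (shown for reference, the actual code may differ):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Textbook Elo update after one Arena game: score_a is 1.0 for a win,
    0.5 for a draw, 0.0 for a loss (k=32 is a common default)."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta
```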
Still todo:
- Play fully random moves in 1% of games to increase diversity
- Multiprocessing to use several cores during self-play
- KLD-thresholding (LeelaChessZero/lc0#721), see the sketch below
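
The KLD-thresholding idea (not implemented yet) is to stop the search early once the root visit distribution stops moving; a minimal sketch, assuming snapshots of root visit counts taken a fixed number of playouts apart (the threshold is illustrative):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two normalized visit distributions."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def search_converged(old_visits, new_visits, threshold=3e-4):
    """Stop MCTS early when the root visit distribution barely changed
    between two snapshots, freeing playouts for harder positions."""
    return kl_divergence(np.asarray(new_visits, dtype=float),
                         np.asarray(old_visits, dtype=float)) < threshold
```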