- Support games with more than 2 players
- Speed/memory optimized: about 3000 rollouts/sec per CPU core, i.e. roughly 5 sec/game during self-play (800 rollouts per move) on a 2019 i5 without GPU. All in all, that is a 25x to 100x speed improvement over the initial repo, see details here.
- MCTS and game logic optimized thanks to Numba; NN inference now accounts for >70% of self-play time according to profiling
- Neural network inference speed, and especially latency, improved thanks to ONNX
- Batched MCTS for speed, no use of virtual loss
- Memory optimized with no performance impact, using zlib compression (see the sketch below)
- Algorithm improvements based on Accelerating Self-Play Learning in Go
- Playout Cap Randomization (sketched below)
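
For context, Playout Cap Randomization trades search depth for sample diversity: most moves get a cheap search, and only the occasional full-budget search produces training data. A minimal sketch of the idea (the caps and probability are illustrative values, not the ones used in this repo):

```python
import random

FULL_CAP, SMALL_CAP = 600, 100   # illustrative playout budgets
P_FULL = 0.25                    # fraction of moves searched with the full cap

def choose_playout_cap():
    """Playout Cap Randomization (KataGo): most moves get a cheap search that
    just drives the game forward; a random fraction gets the full search, and
    only those positions are recorded as training samples."""
    if random.random() < P_FULL:
        return FULL_CAP, True    # full search: record this position for training
    return SMALL_CAP, False      # cheap search: play the move, record nothing
```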
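The zlib memory optimization mentioned above boils down to compressing each self-play sample before it enters the replay buffer and decompressing it on batch sampling; a minimal sketch with hypothetical helper names (not the repo's actual API):

```python
import pickle
import zlib

# Self-play samples (board, policy, value) are highly redundant
# (mostly zeros), so zlib shrinks them well at negligible CPU cost.
def compress_sample(sample, level=6):
    return zlib.compress(pickle.dumps(sample), level)

def decompress_sample(blob):
    return pickle.loads(zlib.decompress(blob))
```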
- Improve MCTS strength
- Added Dirichlet noise as per the original DeepMind paper, using this pull request
- Automatic Dirichlet noise computation on each move (sketched after this list)
- First Play Urgency (FPU) based on parent value (article)
- Learning based on both Q and Z values (blog)
- Forced Playouts from the KataGo article
- Compute policy gradients properly when some actions are invalid, based on A Closer Look at Invalid Action Masking in Policy Gradient Algorithms and its repo (sketched after this list)
- Temperature strategy from SAI
- Optimize the number of MCTS rollouts per move, see GitHub ticket
- MCTS parameters tuning
- PC-PIMC or what I call "universes" (https://doi.org/10.3389/frai.2023.1014561)
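
The automatic Dirichlet noise above can be sketched like this: the noise is mixed into the root prior as in the AlphaZero paper, with alpha derived from the number of legal moves so that wide and narrow action spaces get comparable noise. The eps and alpha_scale values here are illustrative only:

```python
import numpy as np

def add_root_dirichlet_noise(prior, legal_mask, eps=0.25, alpha_scale=10.0):
    """Mix Dirichlet noise into the root prior, as in the AlphaZero paper:
    P(a) <- (1 - eps) * P(a) + eps * Dir(alpha),
    with alpha scaled by the number of legal moves."""
    legal = np.flatnonzero(legal_mask)
    alpha = alpha_scale / len(legal)
    noise = np.random.dirichlet([alpha] * len(legal))
    noisy = prior.copy()
    noisy[legal] = (1.0 - eps) * prior[legal] + eps * noise
    return noisy
```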
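And the invalid-action-masking fix amounts to masking logits *before* the softmax rather than zeroing probabilities after it; a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def masked_log_softmax(logits, legal_mask):
    """Invalid action masking: push the logits of illegal moves to a large
    negative value before the softmax. Illegal moves then get ~0 probability
    and, unlike renormalizing probabilities afterwards, the policy gradient
    flows only through legal moves."""
    masked = torch.where(legal_mask, logits, torch.full_like(logits, -1e9))
    return F.log_softmax(masked, dim=-1)
```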
- Improve NN strength
- Use blocks from MobileNetV3 for a good accuracy/speed trade-off
- Improve training speed using OneCycleLR and AdamW (see the training sketch below)
- Upgrade to KL-divergence loss instead of cross-entropy
- HyperParameters Optimization with Population-Based Training
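
To make the OneCycleLR/AdamW and KL-divergence items concrete, here is a minimal training-step sketch; `net`, `loader` and `EPOCHS` are placeholders, and the hyperparameters are illustrative, not this repo's settings:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(net.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=EPOCHS * len(loader))

for boards, target_pi, target_v in loader:
    log_pi, v = net(boards)  # assume the policy head outputs log-probabilities
    # KL(target || prediction) equals cross-entropy minus the target's
    # entropy; since MCTS targets are soft distributions (not one-hot),
    # the KL form gives a loss that actually reaches 0 at a perfect fit.
    loss_pi = F.kl_div(log_pi, target_pi, reduction='batchmean')
    loss_v = F.mse_loss(v.squeeze(-1), target_v)
    optimizer.zero_grad()
    (loss_pi + loss_v).backward()
    optimizer.step()
    scheduler.step()  # OneCycleLR steps once per batch, not per epoch
```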
What I tried but didn't work:
- MCTS: the advanced cpuct formula (using init and base constants, sketched below), surprise weight, and training with Z and Q values handled separately (not averaged) like this article
- NN: SGD optimizer, ReduceLROnPlateau scheduler
- NN architecture: Dropout, BatchNorm2d (BatchNorm works), the GARB article, and standard architectures like EfficientNet, ResNet, ResNet v2, Squeeze-Excitation, Inception, ResNeXt, ...
- Performance improvements: alternative memory allocators (TBB, TC, JE, ...)
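
For reference, the "advanced cpuct formula" above is the visit-dependent variant from the AlphaZero pseudocode (the constants are DeepMind's published defaults); it was tried here and dropped:

```python
import math

def cpuct(parent_visits, c_init=1.25, c_base=19652):
    """Visit-dependent exploration constant from the AlphaZero pseudocode:
    starts at c_init and grows slowly (logarithmically) with the parent's
    visit count. Tried here, but it did not beat a tuned constant cpuct."""
    return c_init + math.log((1 + parent_visits + c_base) / c_base)
```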
Other changes: parameters can be set on the command line (with new parameters such as a time limit) and improved prints (logging, tqdm, colored bars depending on current Arena results). Also outputs an Elo-like ranking (sketched below).
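
The ranking follows the usual Elo logic; a minimal sketch of the textbook update (shown for reference, the actual code may differ):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Textbook Elo update after one Arena game: score_a is 1.0 for a win,
    0.5 for a draw, 0.0 for a loss (k=32 is a common default)."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta
```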
Still todo:
- Play fully random moves in 1% of games to increase diversity
- Multiprocessing to use several cores during self-play
- KLD-thresholding (LeelaChessZero/lc0#721), see the sketch below
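
The KLD-thresholding idea (not implemented yet) is to stop the search early once the root visit distribution stops moving; a minimal sketch, assuming snapshots of root visit counts taken a fixed number of playouts apart (the threshold is illustrative):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two normalized visit distributions."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def search_converged(old_visits, new_visits, threshold=3e-4):
    """Stop MCTS early when the root visit distribution barely changed
    between two snapshots, freeing playouts for harder positions."""
    return kl_divergence(np.asarray(new_visits, dtype=float),
                         np.asarray(old_visits, dtype=float)) < threshold
```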