MCTS All Move As First
Peter Shih edited this page Jun 20, 2017 · 4 revisions
- Background
  - Two sets of values:
    - Standard values, updated from MCTS episodes
    - AMAF values, updated for every move that appears later in the same episode, as if it had been played first
  - Basic idea:
    - The value of a move is often unrelated to the moves played elsewhere
      - Section 4.1 from reference [3]
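The two sets of values can be sketched as two statistics tables per node. This is only a minimal illustration, not the project's implementation: the names `Stats`, `Node`, and `backup` are hypothetical, and it assumes a single-player view in which every move in the episode belongs to the same player and shares one reward.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Stats:
    visits: int = 0
    total_reward: float = 0.0

    def update(self, reward: float) -> None:
        self.visits += 1
        self.total_reward += reward


def _stats_table():
    # Missing moves start with zeroed statistics.
    return defaultdict(Stats)


@dataclass
class Node:
    # Standard MCTS statistics: only the move actually chosen at this node.
    mc: dict = field(default_factory=_stats_table)
    # AMAF statistics: every move played at or after this node in the episode.
    amaf: dict = field(default_factory=_stats_table)


def backup(path, moves, reward):
    """path[i] is the node where moves[i] was chosen.

    `moves` may be longer than `path` because the tail of an episode
    is played outside the tree (assumption of this sketch).
    """
    for depth, node in enumerate(path):
        node.mc[moves[depth]].update(reward)
        # All-moves-as-first: credit each later move as if it was played here.
        for move in set(moves[depth:]):
            node.amaf[move].update(reward)
```

After `backup([root, child], ["a", "b", "c"], 1.0)`, the root holds a standard entry only for `"a"`, while its AMAF table holds entries for `"a"`, `"b"`, and `"c"`.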
- Variants
  - Alpha-AMAF: blend the two sets of values with a weight alpha
  - Cutoff-AMAF: use AMAF values in the first k iterations
  - RAVE: like alpha-AMAF, but each node has its own alpha value
  - Generalized AMAF: also use the AMAF values of a parent node
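As a rough sketch of how the first three variants differ, the functions below combine a standard mean and an AMAF mean. The function names are hypothetical. `cutoff_amaf` is one literal reading of the line above, and `rave_alpha` assumes the hand-selected schedule sqrt(k / (3n + k)) from the RAVE literature, where n is the node's visit count and k is a tunable equivalence parameter.

```python
import math


def alpha_amaf(mc_mean, amaf_mean, alpha):
    # alpha = 1 trusts only AMAF values, alpha = 0 only standard values.
    return alpha * amaf_mean + (1.0 - alpha) * mc_mean


def cutoff_amaf(mc_mean, amaf_mean, iteration, k):
    # Use AMAF values during the first k iterations, standard values afterwards.
    return amaf_mean if iteration < k else mc_mean


def rave_alpha(node_visits, k):
    # Per-node weight: starts at 1 and decays as the node gathers real visits.
    return math.sqrt(k / (3.0 * node_visits + k))


def rave(mc_mean, amaf_mean, node_visits, k):
    return alpha_amaf(mc_mean, amaf_mean, rave_alpha(node_visits, k))
```

With this schedule, a node with no visits relies entirely on its AMAF value, and a node with exactly k visits weights the two values equally.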
- Discussion
  - In Hearthstone, the value of a move STRONGLY DEPENDS on the board state
    - E.g., if there are many enemy minions, a strong AOE is a very good move.
  - We can key on (state, move) to reuse previous playout results
    - From previous playouts, we know that move A1 is good in state S1
    - So, if the current state is S1, we'd like to play A1 with higher probability
    - Especially when this node has not had enough playouts yet
  - However, the granularity of the state just mentioned should be carefully defined.
    - If the state is too fine-grained, the stored information can rarely be reused
    - If the state is too coarse-grained, the stored information might be misleading
  - Maybe we can use a policy network to guide the MCTS selection phase
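The granularity trade-off can be illustrated with a small (state, move) table. Everything here is a hypothetical sketch: the state is assumed to be a dict with an `enemy_minions` list of `(attack, health)` tuples, and the two abstractions (`"coarse"` and `"fine"`) are invented for illustration only.

```python
from collections import defaultdict


def abstract_state(state, granularity):
    """Map a full board state to a hashable key (hypothetical features)."""
    enemy = state["enemy_minions"]
    if granularity == "coarse":
        # Keeps only whether the enemy board is crowded.
        return ("many_enemies",) if len(enemy) >= 3 else ("few_enemies",)
    # "fine": keeps the exact stats of every enemy minion.
    return tuple(sorted(enemy))


class StateMoveTable:
    def __init__(self, granularity):
        self.granularity = granularity
        self.stats = defaultdict(lambda: [0, 0.0])  # key -> [visits, total reward]

    def record(self, state, move, reward):
        entry = self.stats[(abstract_state(state, self.granularity), move)]
        entry[0] += 1
        entry[1] += reward

    def estimate(self, state, move):
        key = (abstract_state(state, self.granularity), move)
        if key not in self.stats:
            return None  # nothing reusable for this abstract state
        visits, total = self.stats[key]
        return total / visits


s1 = {"enemy_minions": [(1, 1), (2, 1), (3, 2)]}
s2 = {"enemy_minions": [(1, 1), (2, 2), (3, 2), (4, 4)]}
coarse, fine = StateMoveTable("coarse"), StateMoveTable("fine")
for table in (coarse, fine):
    table.record(s1, "aoe", 1.0)

print(coarse.estimate(s2, "aoe"))  # 1.0: reused, although the boards differ
print(fine.estimate(s2, "aoe"))    # None: the exact board was never seen
```

The coarse table reuses the result on a different board, which helps early on but can mislead. The fine table never misleads here but also has nothing to offer for a board it has not seen exactly.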
- References
  - [1] All-Moves-As-First Heuristics in Monte-Carlo Go
  - [2] Generalized Rapid Action Value Estimation
  - [3] Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go