AlphaZero *does* perform planning.
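That's done through Monte Carlo tree search (MCTS), guided by a ConvNet with two outputs: a policy head that proposes good moves and a value head that evaluates positions. (The original AlphaGo used two separate networks for this; AlphaZero merges them into one.)
The amount of time spent exploring the tree is open-ended: the search can keep running for as long as you let it, and more search generally means better move choices.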
That's reasoning and planning.
RL, through self-play, is used to train the move proposer and the position evaluator.
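
To make the planning loop concrete, here is a minimal sketch of an AlphaZero-style search. It is not AlphaZero's actual code. It assumes a toy game (Nim: take 1 to 3 stones, whoever takes the last stone wins) and replaces the trained network with a hypothetical `stub_network` that returns uniform move priors and a neutral value. The names `Node`, `simulate`, `mcts`, and the `c_puct` default are illustrative choices.

```python
import math


class Node:
    """Search statistics for one position in the tree."""

    def __init__(self, prior):
        self.prior = prior      # probability the (stub) policy gave the move leading here
        self.visits = 0
        self.value_sum = 0.0    # summed values, from the view of the player to move here
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]


def stub_network(stones):
    """Assumed stand-in for the trained net: uniform priors, neutral value."""
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}, 0.0


def select_child(node, c_puct):
    """Pick the child maximizing a PUCT-style score: value plus prior-weighted exploration."""
    def score(child):
        explore = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        # child.q() is from the opponent's point of view, so negate it.
        return -child.q() + explore

    return max(node.children.items(), key=lambda kv: score(kv[1]))


def simulate(node, stones, c_puct):
    """One simulation; returns the value for the player to move at this node."""
    if stones == 0:
        value = -1.0  # the previous player took the last stone, so the mover has lost
    elif not node.children:
        # Leaf: expand it and use the network's value instead of a random rollout.
        priors, value = stub_network(stones)
        for move, prior in priors.items():
            node.children[move] = Node(prior)
    else:
        move, child = select_child(node, c_puct)
        value = -simulate(child, stones - move, c_puct)
    node.visits += 1
    node.value_sum += value
    return value


def mcts(stones, num_simulations=2000, c_puct=1.5):
    """Search from a position and return visit counts per move."""
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        simulate(root, stones, c_puct)
    return {move: child.visits for move, child in root.children.items()}


if __name__ == "__main__":
    counts = mcts(5)
    print(counts, "-> play", max(counts, key=counts.get))
```

Even with an uninformative stub network, the visit counts from 5 stones should concentrate on taking 1 stone, which leaves the opponent a losing pile of 4. `num_simulations` is the open-ended search budget: nothing in the loop caps it. In the real system the trained network's priors and values would replace the stub, so that far fewer simulations are wasted on bad moves.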
AlphaZero Planning Through MCTS and Neural Networks