Search Algorithms - AI
Monte Carlo Tree Search
Basically, this is a heuristic search algorithm that estimates the value of each game state by
randomly sampling different tree traversal paths. Starting at the root, it descends to leaf nodes and runs random playouts from them, building up an estimate of
the likelihood that a state will produce a win or a loss.
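The random-sampling idea above can be sketched in a few lines of Python. This is a minimal illustration, not a full MCTS: it estimates the win probability of a state purely by random playouts, using a toy Nim game (take 1 or 2 stones; whoever takes the last stone wins). The game and function names are invented for this example.

```python
import random

def random_playout(stones, my_turn=True):
    """Play random moves until the game ends; return True if the
    player whose turn it was at the start ends up winning."""
    while stones > 0:
        take = random.choice([1, 2]) if stones >= 2 else 1
        stones -= take
        if stones == 0:
            return my_turn  # whoever just moved took the last stone
        my_turn = not my_turn

def estimate_win_probability(stones, n_simulations=5000):
    """Estimate, by random sampling, how often the player to move
    wins from this state under random play by both sides."""
    wins = sum(random_playout(stones) for _ in range(n_simulations))
    return wins / n_simulations
```

With 1 stone the player to move always wins (probability 1.0); with 2 stones under random play the estimate hovers around 0.5, because the player wins only when they happen to take both stones.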
By using the UCB score formula, we ensure that we will eventually traverse different parts of the tree. The UCB score balances exploiting actions with a high average reward against exploring actions whose value is still uncertain, turning that uncertainty to our advantage.
The formula for UCB score is: UCB score = average reward + exploration bonus
where:
"average reward" is the average reward obtained for a given action in previous iterations. "exploration bonus" is a measure of the uncertainty or variability of the reward distribution for the action. It is usually calculated as a function of the number of times the action has been selected so far and/or the total number of iterations.
AND
exploration bonus = C * sqrt(ln(total iterations) / number of times action selected)
where:
"C" is a parameter that controls the level of exploration. It is usually set to a small positive value. "total iterations" is the total number of iterations or rounds so far. "number of times action selected" is the number of times the action has been selected so far.