Search Algorithms - AI


Basically this is a

        
Basically, this is a type of search algorithm (heuristic) that builds probabilities at each state of the game by randomly sampling different tree traversal paths. Starting at the top, it builds down leaf nodes to build up the likelihood that a state will produce a win or a loss.

By using the UCB score formula, we ensure that we will eventually traverse different parts of the tree. The UCB score takes into account a level of uncertainty that we use to our advantage.

The formula for UCB score is: UCB score = average reward + exploration bonus

where:

"average reward" is the average reward obtained for a given action in previous iterations. "exploration bonus" is a measure of the uncertainty or variability of the reward distribution for the action. It is usually calculated as a function of the number of times the action has been selected so far and/or the total number of iterations.

AND

exploration bonus = C * sqrt(ln(total iterations) / number of times action selected)

where:

"C" is a parameter that controls the level of exploration. It is usually set to a small positive value. "total iterations" is the total number of iterations or rounds so far. "number of times action selected" is the number of times the action has been selected so far.