Unexpected path dependence in alpha
I'm writing an artificial intelligence for the old Norse tafl family of board games (project here, source file at issue here). They're close enough to chess in broad strokes for knowledge of chess AI to apply here. (The variant in question is played on a 7x7 board with a radially symmetric starting position, white starting in the middle and black starting at the edge.) I'm running into a curious problem with how I've implemented alpha-beta search: the result of a search to a fixed depth, with no optimizations enabled besides alpha-beta pruning, changes depending on the order in which nodes are explored.
In the file at issue, the important methods are 'explore', 'exploreChildren', 'handleEvaluationResults', and 'generateSuccessorMoves'. 'explore' checks to see if there's a transposition table hit (disabled elsewhere for this test), evaluates the state if it's a victory or a leaf node, or calls exploreChildren. exploreChildren does the recursive searching on child nodes. generateSuccessorMoves generates (and optionally sorts) the moves exiting the current state. handleEvaluationResults determines whether a child evaluation has caused a cutoff.
So, I wrote a minimal test case: generateSuccessorMoves first does no sorting whatsoever, then simply shuffles the list of moves rather than sort it. The results of the search are not equivalent in result, nor in result considering symmetry, nor in value:
MAIN SEARCH
# cutoffs/avg. to 1st a/b a/b
Depth 0: 0/0 0/0
Depth 1: 0/22 0/1
Depth 2: 42/0 3/0
Finding best state...
Best move: d3-g3 with path...
d3-g3
e1-f1
e4-e1xf1
End of best path scored -477
Observed/effective branching factor: 23.00/9.63
Thought for: 72msec. Tree sizes: main search 893 nodes, continuation search: 0 nodes, horizon search: 0 nodes
Overall speed: 12402.77777777778 nodes/sec
Transposition table stats: Table hits/queries: 0/0 Table inserts/attempts: 0/0
1. move: d3-g3 value: -477
Using 5000msec, desired 9223372036854775807
Depth 3 explored 1093 states in 0.037 sec at 29540.54/sec
MAIN SEARCH
# cutoffs/avg. to 1st a/b a/b
Depth 0: 0/0 0/0
Depth 1: 0/21 0/2
Depth 2: 104/0 2/0
Finding best state...
Best move: d3-f3 with path...
d3-f3
d2-c2
d5-f5xf4
End of best path scored -521
Observed/effective branching factor: 23.00/10.30
Thought for: 37msec. Tree sizes: main search 1093 nodes, continuation search: 0 nodes, horizon search: 0 nodes
Overall speed: 29540.540540540544 nodes/sec
Transposition table stats: Table hits/queries: 0/0 Table inserts/attempts: 0/0
7. move: d3-f3 value: -521
This is an extreme case, obviously, but it's my understanding that alpha-beta in this situation (that is, without any feature besides 'alpha-beta pruning') should be stable no matter what the order of the search is—at the very least, it should return a node of the same value. Am I wrong? Am I doing something wrong?
First edit: although I suppose it's obvious from the description of this problem, it turns out that there is some as-yet unknown bug in my alpha-beta implementation. Further testing shows that it does not provide the same result as pure minimax.
Second edit: this is the pseudocode version of the alpha-beta search implemented in the file linked above.
explore(depth, maxDepth, alpha, beta)
// some tafl variants feature rules where one player moves more than once in a turn
// each game tree node knows whether it's maximizing or minimizing
var isMaximizing = this.isMaximizing()
var value = NO_VALUE
if(isTerminal(depth, maxDepth))
value = eval()
else
for(move in successorMoves)
if(cutoff) break
nodeValue = nextNode(move).explore(depth + 1, maxDepth, alpha, beta)
if(value == NO_VALUE) value = nodeValue
if(isMaximizing)
value = max(value, nodeValue)
alpha = max(alpha, value)
if(beta <= alpha) break
else
value = min(value, nodeValue)
beta = min(beta, value)
if(beta <= alpha) break
rootNode.explore(0, 5, -infinity, infinity)
It turns out it was my fault. I have a bit of code which recursively revalues the nodes above a certain node, for use in extension searches, and I was calling it in the wrong place (after exploring all the children of any node). That early back-propagation was causing incorrect alpha and beta values, and therefore early cutoffs.
链接地址: http://www.djcxy.com/p/56322.html上一篇: 同时使用minimax和AB修剪搜索游戏树。 那可能吗?
下一篇: 在alpha中出现意外的路径依赖