beta pruning yields wrong results

I'm trying to implement an abstract minimax algorithm with alpha beta pruning. The minimax part works great, but as soon as I add alpha beta pruning, the IA starts to behave really silly, even skipping obvious moves. I'm not sure what's going on.

This is how my recursive function looks like:

- (id<MMGameMove>)getBestMove:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    id<MMGameMove> bestMove = nil;
    NSArray *allMoves = [game allMoves];

    for (id<MMGameMove> move in allMoves)
    {
        //Take the move and evaluate the game's score
        id<MMGame> gameBoard = [game clone];
        move.player = player;
        [gameBoard saveMove:move];
        self.count++;

        if (depth == 0 || gameBoard.isOver)
        {
            move.rank = [gameBoard scoreForPlayer:self.playerId depth:depth];
        }
        else
        {
            MMPlayerSeed opponent = (player == self.playerId) ? self.opponentId : self.playerId;
            move.rank = [self getBestMove:gameBoard player:opponent depth:depth-1 alpha:alpha beta:beta].rank;
        }

        //If the new move is better than our previous move, take it
        BOOL minMove = (player == self.opponentId && move.rank <= beta);
        BOOL maxMove = (player == self.playerId && move.rank >= alpha);

        if (minMove || maxMove)
        {
            BOOL shouldPrune = NO;
            if (minMove)
            {
                beta = move.rank;
                if (alpha >= beta) {
                    shouldPrune = YES;
                }
            }
            else if (maxMove)
            {
                alpha = move.rank;
                if (alpha <= beta) {
                    shouldPrune = YES;
                }
            }

            bestMove = move;

            if (shouldPrune && depth < self.maxDepth) {
                break;
            }
        }
    }

    return bestMove;
}

My initial call looks like this:

[self getBestMove:game player:self.playerId depth:self.maxDepth alpha:INT_MIN beta:INT_MAX];

As far as I understand, for the same game state, alpha-beta pruning should give me the very same move as minimax does without it, but with this implementation this is clearly not the case.

EDIT 1

After the suggested modification there were another error and it was that I was pruning root nodes. I edited the code to reflect the correct answer. After doing this and running both minimax with and without alpha-beta pruning I can see now that both yield the same result and also I was able to check the better performance you got out of alpha beta addition.

EDIT 2

The code posted above is not working as expected actually. I followed xXliolauXx suggestion, but I still can't get it working. I get the right values at depth = 0 or game over, but it seems they are not passed back recursively to the corresponding root move. For example, I'm able to see that my heuristic returns -3 for a child of the first root move and 0 for the rest of its childs. So I'm expecting that the first root move reports -3 instead of 0, since that's the worst case scenario the computer might find if it does that move.

Here's my new code:

- (NSInteger)alphabeta:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    if (depth == 0 || game.isOver)
    {
        return [game scoreForPlayer:self.playerId depth:depth];
    }

    MMPlayerSeed opponent = (player == self.playerId) ? self.opponentId : self.playerId;

    for (id<MMGameMove> move in game.allMoves)
    {
        id<MMGame> gameCopy = [game clone];
        move.player = player;
        [gameCopy saveMove:move];
        self.count++;

        NSInteger score = [self alphabeta:gameCopy player:opponent depth:depth-1 alpha:alpha beta:beta];

        if (player == self.playerId)
        {
            if (depth == self.maxDepth)
            {
                move.rank = @(score);
                [self.rootMoves addObject:move];
            }

            alpha = MAX(alpha, score);

            if (beta < alpha)
            {
                break;
            }
        }
        else
        {
            beta = MIN(beta, score);

            if (beta < alpha)
            {
                break;
            }
        }
    }

    return (player == self.playerId) ? alpha : beta;
}

Note that I prune when beta < alpha while maximizing. Otherwise, it will always prune after scanning the first root move.

This is how I initiate the recursion:

[self alphabeta:game player:self.playerId depth:self.maxDepth alpha:-INFINITY beta:INFINITY];

EDIT 3

I think I got it. Rather than return alpha or beta, I return the best (or worst) score. I need to cleanup my code to make it more readable, but here's how it looks like right now:

- (NSInteger)alphabeta:(id<MMGame>)game player:(MMPlayerSeed)player depth:(NSInteger)depth alpha:(NSInteger)alpha beta:(NSInteger)beta
{
    if (depth == 0 || game.isOver)
    {
        return [game scoreForPlayer:self.playerId depth:depth];
    }

    MMPlayerSeed opponent;
    NSInteger bestScore;

    if (player == self.playerId)
    {
        opponent = self.opponentId;
        bestScore = -INFINITY;
    }
    else
    {
        opponent = self.playerId;
        bestScore = INFINITY;
    }

    for (id<MMGameMove> move in game.allMoves)
    {
        id<MMGame> gameCopy = [game clone];
        move.player = player;
        [gameCopy saveMove:move];
        self.count++;

        NSInteger score = [self alphabeta:gameCopy player:opponent depth:depth-1 alpha:alpha beta:beta];

        if (player == self.playerId)
        {
            bestScore = MAX(bestScore, score);
            alpha = MAX(alpha, bestScore);

            if (depth == self.maxDepth)
            {
                move.rank = @(score);
                [self.rootMoves addObject:move];
            }

            if (beta < alpha)
            {
                break;
            }
        }
        else
        {
            bestScore = MIN(bestScore, score);
            beta = MIN(beta, bestScore);

            if (beta < alpha)
            {
                break;
            }
        }
    }

    return bestScore;
}

The mistake seems to be in your pruning-possible-part (it's an implementation of a negamax-alpha beta, while you are using minimax-alphabeta).

To fix it, simply add an if, wether you are currently maximising or minimising (as you did at the point where you change alpha or beta).

When you are minimising, you prune whenever alpha >= beta, when maximising vice-versa.

After that, the code should work fine (if there aren't other mistakes ;) ).

链接地址: http://www.djcxy.com/p/56346.html

上一篇: 带Alpha Beta修剪功能的MiniMax适用于奥赛罗不起作用

下一篇: beta修剪产生错误的结果