Iterative Deepening Negamax with Alpha-Beta Pruning

I have a working negamax algorithm in my program. However, I need the program to find the best possible move within kMaxTimePerMove time. I did some research, and it seemed that using iterative deepening with my negamax algorithm would be the best way to do so. Right now, my function that starts the search looks like this:

// this is a global in the same scope as the alpha-beta functions, so they can check the elapsed time
clock_t tStart;

int IterativeDeepening(Board current_state)
{
    bool overtime = false;
    int depth = 0;
    tStart = clock();

    MoveHolder best_move(-1, kWorstEvaluation);

    while ((static_cast<double> (clock() - tStart)/CLOCKS_PER_SEC) < kMaxTimePerMove)
    {
        MoveHolder temp_move = AlphaBetaRoot(kWorstEvaluation, -best_move.evaluation_,++depth, current_state, overtime);          
        if (!overtime)
            best_move = temp_move;
    }

    return best_move.column_;
}

I think I should also reorder the previous best move to the front of the children list; however, I am holding off on implementing that until I get the basic version working. The actual Alpha-Beta functions look like this:

MoveHolder AlphaBetaRoot(int alpha, int beta, int remaining_depth, Board current_state, bool &overtime)
{
    MoveHolder best(-1, -1);
    if (overtime)
        return MoveHolder(0,0);

    std::vector<Board> current_children;
    current_state.GetBoardChildren(current_children);

    for (auto i : current_children)
    {
        best.evaluation_ = -AlphaBeta(-beta, -alpha, remaining_depth - 1, i, overtime);
        if ((static_cast<double> (clock() - tStart)/CLOCKS_PER_SEC) > kMaxTimePerMove)
        {
            overtime = true;
            return MoveHolder(0,0);
        }
        if (best.evaluation_ >= beta)
            return best;
        if (best.evaluation_ > alpha)
        {
            alpha = best.evaluation_;
            best.column_ = i.GetLastMoveColumn();
        }
    }
    return best;
}

int AlphaBeta(int alpha, int beta, int remaining_depth, Board current_state, bool &overtime)
{
    if (overtime)
        return 0;
    if ((static_cast<double> (clock() - tStart)/CLOCKS_PER_SEC) > kMaxTimePerMove)
    {
        overtime = true;
        return 0;
    }

    if (remaining_depth == 0 || current_state.GetCurrentResult() != kNoResult)
    {
        return current_state.GetToMove() * current_state.GetCurrentEvaluation();
    }


    std::vector<Board> current_children;
    current_state.GetBoardChildren(current_children);
    for (auto i : current_children)
    {
        int score = -AlphaBeta(-beta, -alpha, remaining_depth - 1, i, overtime);
        if (score >= beta)
        {
            return beta;
        }
        if (score > alpha)
        {
            alpha = score;
        }
    }
    return alpha;
}

When I try to debug, everything seems like it is working as expected. However, when I have the iterative deepening version play against the regular alpha-beta implementation, it consistently loses. At times it seems like it gets "stuck", and returns a terrible move.

As an example, if this program is "forced" to make a move the next turn, or else the opponent will win, it doesn't block the win. On that move, it reported that it was searching to a depth of 38. I am finding the algorithm extremely difficult to debug, because if I break the execution, it ruins the timing.

I'm not sure if I have implemented the algorithm incorrectly, or simply have a tricky bug in here. If someone could point me in the right direction, I would really appreciate it.


You are using -best_move.evaluation_ as the beta value for the search, where best_move is the best move from the previous depth. This isn't correct: suppose a move looks good at depth 2 but turns out to be bad at greater depths. This method will continue to consider it good, and will cause beta cutoffs on other moves that should not have happened.

You should search each iteration on (-infinity, infinity) to fix this. You can also use aspiration windows to limit the alpha-beta range.
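A minimal sketch of that loop, with aspiration windows layered on top: SearchToDepth here is a hypothetical stand-in for the real recursive search (it pretends the true score at each depth is depth * depth, clamped fail-hard to the window), and kWindow is an assumed margin you would tune to your evaluation scale. The point is the windowing logic: the first iteration uses the full window, later iterations try a narrow window around the previous score, and any fail-high or fail-low triggers a full-window re-search before the score is trusted.

```cpp
#include <climits>

// Stand-in for the real negamax search so the windowing logic can be run on
// its own: pretend the true score at each depth is depth * depth, and clamp
// it to the window the way a fail-hard alpha-beta search would.
int SearchToDepth(int depth, int alpha, int beta)
{
    int true_score = depth * depth;
    if (true_score <= alpha) return alpha;  // fail-low
    if (true_score >= beta)  return beta;   // fail-high
    return true_score;                      // exact score inside the window
}

// Open each iteration on the full (-infinity, infinity) window, or, as an
// optimization, a narrow "aspiration" window around the previous iteration's
// score; if the result lands on or outside a window edge, re-search that
// depth with the full window.
int IterativeDeepeningWithAspiration(int max_depth)
{
    const int kInfinity = INT_MAX / 2;
    const int kWindow = 5;  // assumed margin; tune to your evaluation scale
    int score = 0;

    for (int depth = 1; depth <= max_depth; ++depth)
    {
        int alpha = (depth == 1) ? -kInfinity : score - kWindow;
        int beta  = (depth == 1) ?  kInfinity : score + kWindow;

        int result = SearchToDepth(depth, alpha, beta);
        if (result <= alpha || result >= beta)
        {
            // Aspiration failed: the narrow-window result is only a bound,
            // so redo this depth with the full window.
            result = SearchToDepth(depth, -kInfinity, kInfinity);
        }
        score = result;
    }
    return score;
}
```

With the toy scores above, depths 3 and 4 fail high against the narrow window and are re-searched; the logic is the same when the real search drifts outside the window between depths.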

Note that since you do not use the previous iteration to improve move ordering on the next ones, iterative deepening will perform slightly worse than a plain fixed-depth search. Ideally you want move ordering to try the best move from a transposition table and/or the principal variation of the previous iteration first.
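As a minimal sketch of that ordering idea (assuming the root keeps a list of candidate move columns, shown here as plain ints rather than the question's Board children), rotating the previous iteration's best move to the front of the list looks like:

```cpp
#include <algorithm>
#include <vector>

// Before searching the root children at the next depth, move the column that
// was best at the previous depth to the front so it is examined first; the
// relative order of the remaining moves is preserved.
void PutPreviousBestFirst(std::vector<int> &moves, int previous_best_column)
{
    auto it = std::find(moves.begin(), moves.end(), previous_best_column);
    if (it != moves.end())
        std::rotate(moves.begin(), it, it + 1);
}
```

Searching the likely-best move first makes the (-infinity, infinity) window tighten quickly, so later siblings are cut off sooner and each deeper iteration costs far less.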
