Adverserial search troubles

2018-05-29 12:30:09

I'm writing a Connect4 game with an AI opponent using adversarial search techniques and I have somewhat run into a wall. I feel that I'm not far from a solution but that there's perhaps a problem where I'm switching perspectives (as in: the perspective of which participant I'm basing my evaluation scores on), missing a minus sign somewhere or something like that.

The problem is either that in the variations that I've tried that the AI chooses not to block the player when the player has three-in-a-row but otherwise the AI plays a perfect game, or that he prefers to block the player, even if he has the chance to win the game. It also seems to matter whether or not the search depth is an even or an uneven number, as the AI is pants-on-head retarded at a 6-ply search, which is pretty telling that something is wrong.

Search

The algorithm used is negamax with alpha-beta pruning and is implemented as follows:

private int Negamax(int depth, int alpha, int beta, Player player)
{
  Player winner;
  if (Evaluator.IsLeafNode(game, out winner))
  {
    return winner == player ? (10000 / depth) : (-10000 / depth);
  }

  if (depth == Constants.RecursionDepth)
  {
    return Evaluator.Evaluate(game, depth, player);
  }

  foreach (var move in moves)
  {
    int row;

    if (board.DoMove(move, player, out row))
    {
      var value = -Negamax(depth + 1, -beta, -alpha, (Player)1 - (int)player);

      board.UndoMove(move, row, player);

      if (value > alpha)
      {
        alpha = value;
        if (player == Player.AI)
        {
          bestColumn = move;
        }
      }

      if (alpha >= beta)
      {
        return alpha;
      }

    }
  }
  return alpha;
}

I don't suspect that the problem is in this function, but it might be.

Evaluation

I've based the evaluation function off of the fact that there are only 69 possible ways to get four-in-a-row on a 7x6 board. I have a look-up table of about 350 items that contains the hardcoded information for every column and row which win-combinations the row+column is a part of. For example, for row 0 and column 0, the table looks like this:

//c1r1
table[0][0] = new int[3];
table[0][0][0] = 21;
table[0][0][1] = 27;
table[0][0][2] = 61;

This means that column 0, row 0 is part of win-combination 21, 27 and 61.

I have a second table, that contains for both players how many stones it has in each of the win-combinations. When I do a move then I do the following:

public bool DoMove(int column, Player p, out int row)
{
  row = moves[column];

  if (row >= 0)
  {
    Cells[column + row * Constants.Columns] = p;

    moves[column]--;

    var combinations = this.Game.PlayerCombinations[p];

    foreach (int i in TerminalPositionsTable.Get(column,row))
    {
      combinations[i]++;
    }

    return true;
  }
  else
  {
    return false;
  }
}

The opposite is of course being done for UndoMove .

So after doing a move on column 0, row 0 by Player.Human , the table will be filled with a value of 1 at index 21, 27 and 61. If I do another move in a cell that is also part of win-combination 27, then the player combination table gets incremented at index 27 to 2.

I hope I have made that clear, as it's used in the evaluation function to very quickly determine how close a player is to scoring four-in-a-row.

The evaluation function, where I suspect the problem lies, is as follows:

public static int Evaluate(Game game, int depth, Player player)
{
  var combinations = game.PlayerCombinations[player];

  int score = 0;

  for (int i = 0; i < combinations.Length; i++)
  {
    switch (combinations[i])
    {
      case 1:
        score += 1;
        break;
      case 2:
        score += 5;
        break;
      case 3:
        score += 15;
        break;
    }
  }

  return score;
}

So I simply loop through the 69 possible win-combinations and add an amount to the score based on whether it's a single stone, two-in-a-row or three.

The part I'm still confused about in this whole adversarial search is whether I should care which player is doing a move? I mean, should I pass in the player like I do here, or should I always evaluate the board from the perspective of the AI player? I've tried many combinations of aiScore - humanScore , or just always look from the perspective of Player.AI , and such. But I've hit a dead end and every combination I've tried was pretty flawed.

So:

Is the logic of my evaluation solid in its basis?

When should I 'switch perspective'?

Any help would be much appreciated.

Update

I've implemented Brennan's suggestions below, and while it has definitely much improved, for some reason it doesn't block three-in-a-rows on any column but the two left and right-most, and only when the search-depth is uneven. The AI is unbeatable at even search depths, but only until depth 8 and up. Then it refuses to block again. This is pretty telling that I'm probably very close, but still have some crucial flaw.

Perhaps this has to do with me setting the column the AI should drop a stone in as Brennan commented, but I don't know when else to set it. Setting it only at depth 0 doesn't work.

Update 2

Edited the code as it looks like now with Brennan's changes.

Update 3

Created a Github repo with the full code. If you don't know how to work Git, just download a zip file from here.

It's a .NET 4.0 project, and running it will create log files of the negamax algorithm in your documents/logs directory. The solution also contains a test project, that contains a test for every board column whether or not the AI chooses to block the player when the player has three-in-a-row there.

This stuff makes my brain hurt, so I'm not positive that this answer is correct, but here goes.

In negamax, the score is always evaluated relative to the player currently on move. If it's white's move, then a high score is good for white. If it's black's move, then a high score is good for black. So if you have a leaf node, whether the score is +inf or -inf is determined not by whether the node is a win for white or black, but whether it's a win for the player you're currently evaluating. Replace this:

return winner == Player.AI ? (10000 / depth) : (-10000 / depth);

with this:

return winner == player ? (10000 / depth) : (-10000 / depth);

There is a similar problem in your evaluation function. Replace this:

return player == Player.AI ? score : -score;

with this:

return score;

Again, I'm not sure this is right. But I hope you try those two changes and let me know if it works. I'm very curious!

If it's not blocking certain combinations it sounds like you have a flaw in your table of possible wins.

I also see a problem in your evaluation function: It gives value to moves that have NO hope of winning. Suppose you have xoo.x, you're playing o. Your routine says it's worth 15 points to play here when in reality it's worth 0. Any win pattern that already contains tiles from both players is of no value to anyone.

I have found that when debugging this sort of thing the debugger is of little value as it doesn't let you see the big picture very well. Try writing to a log file each pattern it's checking--put an actual drawing in the log.

链接地址: http://www.djcxy.com/p/1652.html

上一篇: 以编程方式中断原始

下一篇: 对手寻找麻烦