Using minimax search for card games with imperfect information

2018-06-20 01:12:07

I want to use minimax search (with alpha-beta pruning), or rather negamax search, to make a computer program play a card game.

The card game actually consists of 4 players. So in order to be able to use minimax etc., I simplify the game to "me" against the "others". After each "move", you can objectively read the current state's evaluation from the game itself. When all 4 players have placed the card, the highest wins them all - and the cards' values count.

As you don't know how the distribution of cards between the other 3 players is exactly, I thought you must simulate all possible distributions ("worlds") with the cards that are not yours. You have 12 cards, the other 3 players have 36 cards in total.

So my approach is this algorithm, where player is a number between 1 and 3 symbolizing the three computer players that the program might need to find moves for. And -player stands for the opponents, namely all the other three players together.

private Card computerPickCard(GameState state, ArrayList<Card> cards) {
    int bestScore = Integer.MIN_VALUE;
    Card bestMove = null;
    int nCards = cards.size();
    for (int i = 0; i < nCards; i++) {
        if (state.moveIsLegal(cards.get(i))) { // if you are allowed to place this card
            int score;
            GameState futureState = state.testMove(cards.get(i)); // a move is the placing of a card (which returns a new game state)
            score = negamaxSearch(-state.getPlayersTurn(), futureState, 1, Integer.MIN_VALUE, Integer.MAX_VALUE);
            if (score > bestScore) {
                bestScore = score;
                bestMove = cards.get(i);
            }
        }
    }
    // now bestMove is the card to place
}

private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) {
    ArrayList<Card> cards;
    if (player >= 1 && player <= 3) {
        cards = state.getCards(player);
    }
    else {
        if (player == -1) {
            cards = state.getCards(0);
            cards.addAll(state.getCards(2));
            cards.addAll(state.getCards(3));
        }
        else if (player == -2) {
            cards = state.getCards(0);
            cards.addAll(state.getCards(1));
            cards.addAll(state.getCards(3));
        }
        else {
            cards = state.getCards(0);
            cards.addAll(state.getCards(1));
            cards.addAll(state.getCards(2));
        }
    }
    if (depthLeft <= 0 || state.isEnd()) { // end of recursion as the game is finished or max depth is reached
        if (player >= 1 && player <= 3) {
            return state.getCurrentPoints(player); // player's points as a positive value (for self)
        }
        else {
            return -state.getCurrentPoints(-player); // player's points as a negative value (for others)
        }
    }
    else {
        int score;
        int nCards = cards.size();
        if (player > 0) { // make one move (it's player's turn)
            for (int i = 0; i < nCards; i++) {
                GameState futureState = state.testMove(cards.get(i));
                if (futureState != null) { // wenn Zug gültig ist
                    score = negamaxSuche(-player, futureState, depthLeft-1, -beta, -alpha);
                    if (score >= beta) {
                        return score;
                    }
                    if (score > alpha) {
                        alpha = score; // alpha acts like max
                    }
                }
            }
            return alpha;
        }
        else { // make three moves (it's the others' turn)
            for (int i = 0; i < nCards; i++) {
                GameState futureState = state.testMove(cards.get(i));
                if (futureState != null) { // if move is valid
                    for (int k = 0; k < nCards; k++) {
                        if (k != i) {
                            GameState futureStateLevel2 = futureState.testMove(cards.get(k));
                            if (futureStateLevel2 != null) { // if move is valid
                                for (int m = 0; m < nCards; m++) {
                                    if (m != i && m != k) {
                                        GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m));
                                        if (futureStateLevel3 != null) { // if move is valid
                                            score = negamaxSuche(-player, futureStateLevel3, depthLeft-1, -beta, -alpha);
                                            if (score >= beta) {
                                                return score;
                                            }
                                            if (score > alpha) {
                                                alpha = score; // alpha acts like max
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
            return alpha;
        }
    }
}

This seems to work fine, but for a depth of 1 ( depthLeft=1 ), the program already needs to calculate 50,000 moves (placed cards) on average. This is too much, of course!

So my questions are:

Is the implementation correct at all? Can you simulate a game like this? Regarding the imperfect information, especially?

How can you improve the algorithm in speed and work load?

Can I, for example, reduce the set of possible moves to a random set of 50% to improve speed, while keeping good results?

I found UCT algorithm to be a good solution (maybe). Do you know this algorithm? Can you help me implementing it?

Minimax search as you've implemented it is the wrong approach for games where there is so much uncertainty. Since you don't know the card distribution among the other players, your search will spend an exponential amount of time exploring games that could not happen given the actual distribution of the cards.

I think a better approach would be to start with good rules for play when you have little or no information about the other players' hands. Things like:

If you play first in a round, play your lowest card since you have little chance of winning the round.

If you play last in a round, play your lowest card that will win the round. If you can't win the round, then play your lowest card.

Have your program initially not bother with search and just play by these rules and have it assume that all the other players will use these heuristics as well. As the program observes what cards the first and last players of each round play it can build up a table of information about the cards each player likely holds. Eg a 9 would have won this round, but player 3 didn't play it so he must not have any cards 9 or higher. As information is gathered about each player's hand the search space will eventually be constrained to the point where a minimax search of possible games could produce useful information about the next card to play.

I want to clarify details that the accepted answer doesn't really go into.

In many card games you can sample the unknown cards that your opponent could have instead of generating all of them. You can take into account information like short suits and the probability of holding certain cards given play so far when doing this sampling to weight the likelihood of each possible hand (each hand is a possible world that we'll solve independently). Then, you solve each hand using perfect information search. The best move over all of these worlds is often the best move overall - with some caveat.

In games like Poker this won't work very well -- the game is all about the hidden information. You have to precisely balance your actions to keep the information about your hand hidden.

But, in games like trick-based card games, this works pretty well - particularly since new information is being revealed all the time. Really good players have a good idea what everyone holds anyway. So, reasonably strong Skat and Bridge programs have been based on these ideas.

If you can completely solve the underlying world, that is best, but if you can't, you can use minimax or UCT to choose the best move in each world. There are also hybrid algorithms (ISMCTS) that try to mix this process together. Be careful about the claims here. Simple sampling approaches are easier to code -- you should try the simpler approach before a more complex one.

Here are some research papers that will give some more information on when the sampling approach to imperfect information has worked well:

Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search (This paper analyzes when the sampling approach is likely to work.)

Improving State Evaluation, Inference, and Search in Trick-Based Card Games (This paper describes the use of sampling in Skat)

Imperfect information in a computationally challenging game (This paper describes sampling in Bridge)

Information Set Monte Carlo Tree Search (This paper merges sampling and UCT/Monte Carlo Tree Search to avoid the issues in the first reference.)

The problem with rule-based approaches in the accepted answer is that they can't take advantage of computational resources beyond that required to create the initial rules. Furthermore, rule-based approaches will be limited by the power of the rules that you can write. Search-based approaches can use the power of combinatorial search to produce much stronger play than the author of the program.

链接地址: http://www.djcxy.com/p/56400.html

上一篇: beta修剪同一玩家的连续移动

下一篇: 使用minimax搜索不完美信息的纸牌游戏