Using minimax search for card games with imperfect information

2018-06-20 01:12:07

I want to use minimax search (with alpha-beta pruning), or rather negamax search, to make a computer program play a card game.

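The card game is actually played by 4 players. To be able to use minimax at all, I simplify the game to "me" against the "others". After each "move", the evaluation of the current state can be read objectively from the game itself: once all 4 players have placed a card, the highest card wins all four, and the values of those cards count toward the score.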

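Since you don't know exactly how the cards are distributed among the other 3 players, I thought you have to simulate every possible distribution ("world") of the cards that are not yours. You hold 12 cards, and the other 3 players hold 36 cards in total.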
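
To get a feeling for how many such "worlds" there are, here is a minimal sketch that counts them. It assumes the start of a game, where each of the 3 opponents holds exactly 12 of the 36 unseen cards; the class and method names are made up for this illustration.

```java
import java.math.BigInteger;

class WorldCount {
    // n! as an exact BigInteger
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Number of ways to deal players * handSize distinct cards into
    // `players` distinguishable hands of `handSize` cards each:
    // (players * handSize)! / (handSize!)^players
    static BigInteger deals(int players, int handSize) {
        return factorial(players * handSize).divide(factorial(handSize).pow(players));
    }

    public static void main(String[] args) {
        // Assumption: 3 opponents with 12 cards each (36 unseen cards).
        System.out.println(deals(3, 12));
    }
}
```

The result is on the order of 10^15 distributions, which is why enumerating every world is not practical at the start of a game.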

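So my approach is the following algorithm, where player is a number between 1 and 3 that stands for one of the three computer players the program may have to find a move for, and -player stands for the opponents, that is, all three other players together.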

private Card computerPickCard(GameState state, ArrayList<Card> cards) {
    int me = state.getPlayersTurn();
    // Use -Integer.MAX_VALUE as the lower bound: negating Integer.MIN_VALUE overflows.
    int bestScore = -Integer.MAX_VALUE;
    Card bestMove = null;
    for (Card card : cards) {
        if (state.moveIsLegal(card)) { // if you are allowed to place this card
            // a move is the placing of a card (which returns a new game state)
            GameState futureState = state.testMove(card);
            // negamax: the child's value is from the opponents' point of view, so negate it
            int score = -negamaxSearch(-me, futureState, 1, -Integer.MAX_VALUE, Integer.MAX_VALUE);
            if (bestMove == null || score > bestScore) {
                bestScore = score;
                bestMove = card;
            }
        }
    }
    return bestMove; // the card to place (null if no card is legal)
}

private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) {
    if (depthLeft <= 0 || state.isEnd()) { // end of recursion as the game is finished or max depth is reached
        if (player > 0) {
            return state.getCurrentPoints(player); // player's points as a positive value (for self)
        }
        return -state.getCurrentPoints(-player); // player's points as a negative value (for others)
    }
    // Collect the cards of the side to move in a fresh list,
    // so that pooling the opponents' hands does not modify the lists held by the game state.
    ArrayList<Card> cards = new ArrayList<>();
    if (player > 0) {
        cards.addAll(state.getCards(player));
    } else {
        for (int other = 0; other <= 3; other++) { // everyone except -player
            if (other != -player) {
                cards.addAll(state.getCards(other));
            }
        }
    }
    int nCards = cards.size();
    if (player > 0) { // make one move (it's player's turn)
        for (int i = 0; i < nCards; i++) {
            GameState futureState = state.testMove(cards.get(i));
            if (futureState != null) { // if move is valid
                // negate: the recursive call scores the position for the other side
                int score = -negamaxSearch(-player, futureState, depthLeft - 1, -beta, -alpha);
                if (score >= beta) {
                    return score; // beta cutoff
                }
                if (score > alpha) {
                    alpha = score; // alpha acts like max
                }
            }
        }
        return alpha;
    }
    // make three moves (it's the others' turn)
    for (int i = 0; i < nCards; i++) {
        GameState futureState = state.testMove(cards.get(i));
        if (futureState != null) { // if move is valid
            for (int k = 0; k < nCards; k++) {
                if (k != i) {
                    GameState futureStateLevel2 = futureState.testMove(cards.get(k));
                    if (futureStateLevel2 != null) { // if move is valid
                        for (int m = 0; m < nCards; m++) {
                            if (m != i && m != k) {
                                GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m));
                                if (futureStateLevel3 != null) { // if move is valid
                                    int score = -negamaxSearch(-player, futureStateLevel3, depthLeft - 1, -beta, -alpha);
                                    if (score >= beta) {
                                        return score; // beta cutoff
                                    }
                                    if (score > alpha) {
                                        alpha = score; // alpha acts like max
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    return alpha;
}

This seems to work, but even at a depth of 1 (depthLeft=1) the program has to evaluate 50,000 moves (placed cards) on average. That is far too many, of course!

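So my questions are:

Is the implementation correct at all? Can you simulate a game like this, especially with regard to the imperfect information?

How can the algorithm be improved in terms of speed and workload?

For example, can I reduce the set of possible moves to a random subset of 50% to improve speed while still getting good results?

I found the UCT algorithm, which may be a good solution. Do you know this algorithm? Can you help me implement it?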

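Minimax search as you have implemented it is the wrong approach for games with this much uncertainty. Since you don't know how the cards are distributed among the other players, your search will spend an exponential amount of time exploring games that cannot happen given the actual distribution of the cards.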

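I think a better approach is to start with good rules for play when you have little or no information about the other players' hands. Things like:

If you play first in a round, play your lowest card, since you have little chance of winning the round.

If you play last in a round, play your lowest card that will win the round. If you can't win the round, play your lowest card.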
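
The two rules above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: cards are plain integer ranks with no suits or trumps, and for the middle positions, which the rules don't cover, the sketch simply falls back to the lowest card. The class and method names are invented for the example.

```java
import java.util.Collections;
import java.util.List;

class RulePlayer {
    // hand: ranks the player holds; trick: ranks already played in this round
    static int pickCard(List<Integer> hand, List<Integer> trick, int numPlayers) {
        int lowest = Collections.min(hand);
        boolean playsLast = trick.size() == numPlayers - 1;
        if (!playsLast) {
            // Playing first: little chance of winning, so give up the lowest card.
            // Assumption: middle positions are treated the same way.
            return lowest;
        }
        int toBeat = Collections.max(trick);
        Integer cheapestWinner = null;
        for (int card : hand) {
            if (card > toBeat && (cheapestWinner == null || card < cheapestWinner)) {
                cheapestWinner = card; // lowest card that still wins the round
            }
        }
        return cheapestWinner != null ? cheapestWinner : lowest;
    }
}
```

With a hand of 3, 8 and 11, this plays the 3 when leading, the 8 when playing last against 5, 7 and 2, and the 3 when playing last against a 12.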

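Have your program initially not bother with search and just play by these rules, and have it assume that all other players use the same heuristics. As the program observes which cards the first and last players of each round play, it can build up a table of information about the cards each player is likely to hold. For example, a 9 would have won this round, but player 3 didn't play one, so he cannot hold any card of rank 9 or higher. As information about each player's hand accumulates, the search space will eventually be constrained to the point where a minimax search over the possible games can produce useful information about the next card to play.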
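
A rough sketch of such an information table is shown below. It assumes, as the rules do, that a player who plays last and does not win the round holds nothing higher than the card that was winning. Ranks are plain integers without suits, and all names are hypothetical.

```java
import java.util.Arrays;

class HandKnowledge {
    private final int[] maxRank; // highest rank each player may still hold

    HandKnowledge(int numPlayers, int highestRank) {
        maxRank = new int[numPlayers];
        Arrays.fill(maxRank, highestRank);
    }

    // Call when `player` played last in a round.
    // played: the rank they played; toBeat: the highest rank on the table before them.
    void observeLastPlay(int player, int played, int toBeat) {
        if (played < toBeat) {
            // They did not win, so (under the heuristic) they hold nothing above toBeat.
            maxRank[player] = Math.min(maxRank[player], toBeat);
        }
    }

    boolean mayHold(int player, int rank) {
        return rank <= maxRank[player];
    }
}
```

For instance, after `observeLastPlay(3, 5, 8)` the table rules out every rank from 9 upward for player 3, which removes all worlds in which player 3 holds such a card.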

I want to clarify some details that the accepted answer doesn't really go into.

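In many card games you can sample the unknown cards your opponents might hold instead of generating all of them. When sampling, you can take into account information such as short suits and the probability of holding certain cards given the play so far, and use it to weight the likelihood of each possible deal (each deal is a possible world that is solved independently). You then solve each deal using perfect-information search. The move that does best across all of these worlds is often the best move overall, with some caveats.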
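
The sampling idea can be sketched as below. This is a simplified illustration, not a complete player: worlds are sampled uniformly (the likelihood weighting mentioned above is left out), cards are plain integers, and the perfect-information solver is hidden behind a hypothetical `WorldEvaluator` interface that you would back with your own minimax search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SamplingPlayer {
    // Hypothetical hook: value of playing `move` if the opponents hold exactly `hands`.
    interface WorldEvaluator {
        int evaluate(int move, List<List<Integer>> hands);
    }

    // Deals the unseen cards at random into hands of the given sizes (one sampled world).
    static List<List<Integer>> sampleWorld(List<Integer> unseen, int[] handSizes, Random rng) {
        List<Integer> deck = new ArrayList<>(unseen);
        Collections.shuffle(deck, rng);
        List<List<Integer>> hands = new ArrayList<>();
        int next = 0;
        for (int size : handSizes) {
            hands.add(new ArrayList<>(deck.subList(next, next + size)));
            next += size;
        }
        return hands;
    }

    // Returns the legal move with the highest total value over `samples` sampled worlds.
    static int chooseMove(List<Integer> legalMoves, List<Integer> unseen, int[] handSizes,
                          int samples, WorldEvaluator evaluator, Random rng) {
        long[] totals = new long[legalMoves.size()];
        for (int s = 0; s < samples; s++) {
            List<List<Integer>> world = sampleWorld(unseen, handSizes, rng);
            for (int m = 0; m < legalMoves.size(); m++) {
                totals[m] += evaluator.evaluate(legalMoves.get(m), world);
            }
        }
        int best = 0;
        for (int m = 1; m < totals.length; m++) {
            if (totals[m] > totals[best]) {
                best = m;
            }
        }
        return legalMoves.get(best);
    }
}
```

Every move is evaluated in the same sampled worlds, so the totals are directly comparable. The number of samples trades accuracy against time and can be set from the time budget per move.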

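In games like Poker this won't work very well, because the game is all about the hidden information. You have to balance your actions precisely to keep the information about your own hand hidden.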

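But in trick-based card games this works quite well, particularly because new information is revealed all the time. Really good players have a good idea of what everyone holds anyway. So reasonably strong Skat and Bridge programs have been built on these ideas.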

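If you can solve the underlying world completely, that is best, but if you can't, you can use minimax or UCT to choose the best move in each world. There are also hybrid algorithms (ISMCTS) that try to blend this process together. Be careful about the claims made for them. Simple sampling approaches are easier to code, and you should try the simpler approach before a more complex one.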
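
For orientation, the core of UCT is the UCB1 rule used to decide which move to try next inside the tree. The following is a minimal sketch of that selection step only, not a full UCT or ISMCTS implementation; the names and the exploration constant `c` are assumptions for this example.

```java
class Ucb1 {
    // totalReward[i]: sum of simulation results for move i
    // visits[i]: number of simulations that tried move i
    // c: exploration constant (tuning parameter)
    static int select(double[] totalReward, int[] visits, double c) {
        int parentVisits = 0;
        for (int v : visits) {
            parentVisits += v;
        }
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < visits.length; i++) {
            if (visits[i] == 0) {
                return i; // try every move once before comparing
            }
            double mean = totalReward[i] / visits[i];
            double bonus = c * Math.sqrt(Math.log(parentVisits) / visits[i]);
            if (mean + bonus > bestValue) {
                bestValue = mean + bonus;
                best = i;
            }
        }
        return best;
    }
}
```

The bonus term shrinks as a move is visited more often, so the search gradually shifts from exploring rarely tried moves to exploiting the ones with the best average result.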

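Here are some research papers that give more information on when sampling approaches to imperfect information work well:

Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search (This paper analyzes when sampling approaches are likely to work.)

Improving State Evaluation, Inference, and Search in Trick-Based Card Games (This paper describes the use of sampling in Skat.)

GIB: Imperfect Information in a Computationally Challenging Game (This paper describes sampling in Bridge.)

Information Set Monte Carlo Tree Search (This paper merges sampling and UCT/Monte Carlo Tree Search to avoid the issues described in the first reference.)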

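The problem with the rule-based approach in the accepted answer is that it cannot take advantage of computational resources beyond those needed to create the initial rules. In addition, a rule-based approach is limited by the power of the rules you are able to write. Search-based approaches can use the power of combinatorial search to produce much stronger play than the author of the program.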
