How to use negamax algorithm
I was wondering how I can use the negamax algorithm. I am trying to write an agent for the game mancala in C#. The algorithm gives you a single number when given a game node.
Suppose my AI player wants to make a move. The negamax function returns a single number. So it tells me what the score of the best move from that point is. How can I use this one number?
if it's player A's turn I tried making his possible moves and checking the negamax for each one. However if I make the move first and then check the negamax, then when the negamax is running (let's assume we're still only 1 level deep) it will evaluate the move, and then the next move will have to be player B's.
I'm getting really confused about this. When I see negamax pseudocode (such as on the wikipedia page) it says to try that player's moves. If I do that it will return the highest score without telling me which move got that score.
How is negamax supposed to be used?
This is a fun one.
It is all about exploring each node in the tree of possible moves. If you use alpha-beta pruning, you can make the algorithm more efficient by "pruning" (not evaluating) some of the branches of the tree. I am going to assume you are not using pruning, and are going to look at the full tree.
If Mancala is a very simple game, like Tic-Tac-Toe, you can implement the algorithm without needing an "evaluation function". With tic-tac-toe, if you play out all possible moves, you get to either a win, a loss, or a draw. You would implement a negamax algorithm there without any regard to the intermediate states of the game (ie, any move before the last), because there are a very limited number of possible moves, and the AI engine will easily be able to calculate all the possibilities all the way to the very end.
In Chess, on the other hand, an "evaluation function" (EF, hereafter) is essential, because no hardware on this planet can calculate every possible sequence of chess moves all the way to the end of the game. As such, most chess AIs will go 12-14 levels deep, and then evaluate the resulting position, assigning 8 points for a queen, 5 for a rook, 3 for bishop or knight, 1 for a pawn, and then assign additional points for things like squares controlled (more points for center squares controlled), king safety, etc.
For Mancala, as far as I can tell, it is probably complicated enough that an evaluation function would be needed, but perhaps that evaluation function would be simple, such as the number of seeds still in possession, also adding points for seeds which are in an advanced position. (I looked up the Wiki Mancala, it looks like there are many possible variants--I'm not sure which one you are working with.)
So, the negamax algorithm would need to be implemented for a certain depth (ie, not until the end of the game using all possible plays), and with a simple EF. Let us assume you will implement the AI looking 5 moves deep. The nice thing about negamax is that it is completely symmetrical and zero-sum; in other words, if the position evaluates to 5 for the AI, it evaluates to -5 for the human player. And if evalutes to 13 for the human player, it evalutes to -13 for the AI. That's the "single number" which is discussed. With all this in mind, the AI algorithm would look something like this (again, with no pruning):
1) Examine each possible AI move
2) For each of those moves, examine each possible opponent response
3) For each of those possible responses, examine each possible AI move
4) For each of those possible AI moves, examine each possible opponent response
5) Finally, for each of those possible opponent responses, examine each possible AI move
Now we have reached depth 5, and you have built a tree with 5 levels, and probably thousands or millions of leaves (bottom layer nodes) of the tree. You code this in such a way that each node has reference to its parent node, and references to all of its children nodes, so that you can easily traverse the tree, from parent to children and then back.
Once you have the tree set up properly, now it is time to implement the negamax algorithm, as follows (let us assume that a higher score is better for the AI player):
6) For each 4th level opponent response, find the HIGHEST evaluation among all of the AI children moves, and prune all other children. You are determining which 5th-from-now move your AI would play in response to each possible 4th-from-now opponent response. So now each 4th level response has exactly one assumed 5th level response. Now you assign the evaluation score you made of the 5th level child, to the 4th level parent. This says that if you reach that 4th level opponent move, the AI will make this particular 5th level move, and the board will evaluate to that score.
7) Next, you evaluate each 3rd level AI move, and for each, find the LOWEST evaluation among all of the 4th-from-now opponent moves, prune all other children, and assign the 4th level score (which came from the highest 5th level node) to the 3rd level. You are doing the same as in step 6, except using LOWEST child score (b/c this is an AI move and not a opponent move).
8) Do the same for the 2nd level as step 6, finding the HIGHEST evaluation among all 3rd-from-now moves, and assign to 2nd level nodes those highest evaluations.
9) Do the same for the 1st level as step 7, finding the LOWEST evaluation among all 2nd-from-now moves, and assign the 1st level nodes those lowest evaluations.
10) Look at all of the 1st level nodes, and your AI should play the one with the HIGHEST score.
Obviously, you would make the depth not hard coded to 5, but instead a parameter, and you will use recursion (as in the Wiki) to accomplish this. To choose a depth, see how long it takes to run, and set n equal to the highest depth which still allows for snappy AI response. Once you build the basics here, you could add pruning strategies later which would enable greater depth by not evaluating whole branches of the tree which are obviously not the right move, but this is the complete, basic negamax which I have laid out for you.
Best of luck, it should be a fun one to program!
Onemancat gives a very thorough explanation - +1.
The short answer to your question is that negamax returns the score for a particular position, so what you would do is play every move at the first ply, call negamax for each resulting position to evaluate it, and then pick the move with the best score as the outcome.
链接地址: http://www.djcxy.com/p/9634.html上一篇: beta修剪深度
下一篇: 如何使用negamax算法