I am about to write a chess engine based on reinforcement learning. I'd like to train an evaluation function and figure out the weights of the board's most important features.
I'm not an expert in machine learning; I'm trying to learn from books and tutorials. In each tutorial the reward is quite straightforward, often 1, 0, maybe -1, but there's no such obvious reward in chess (apart from checkmate positions). For instance, assume I have a position on the board. I make 10 (random) moves, and at that point I should calculate the reward: the difference (or error) between the starting position and the current one. How can I do that when my only evaluation function is the one being trained?
I'd like to avoid using other engines' scoring systems, because I feel that would amount to supervised learning, which is not my goal.
You can't really do that directly.
A few approaches that I can suggest:
- Using scoring from an external source is not a bad way to at least kick-start your algorithm. Algorithms that evaluate a given position are pretty limited, though, and your AI won't reach master level using that alone.
- Explore the possibility of evaluating the position using another chess-playing AI (ideally open source). Say you have a "teacher" AI. You start two instances of it and begin the game from the position you want to evaluate. Let them play against each other from there until the end of the game. Was this move successful? Reward your own AI according to the outcome.
- To add some variability (you don't want to be better than just one AI), do the same against other AIs, or even have your own AI play against itself. For the latter to work, though, it probably needs to be already decent at chess, not playing entirely randomly. You can replay the same move many times and complete the game each time, allowing your AI some random exploration of new moves and strategies (for example, trying the second-best move further down the line).
- Feed your ML with datasets of games between real players. Each move by the winning and losing players can thus be "reinforced".
- Have your AI learn by playing against real players. Reinforce both your AI's moves (losing and winning ones) and those of the players.
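To make the "reward your AI given the outcome" idea concrete, here is a minimal sketch in Python. It is not a chess engine: `simulate_game` is a hypothetical toy stand-in for actually playing a game to the end (teacher vs. teacher, or self-play), and the two features (a material balance and a noise value) are assumptions chosen only for illustration. What it does show is how a final result of +1 or -1 can serve as the reward for every position visited in the game, even though the evaluation function is the thing being trained.

```python
import math
import random


def evaluate(weights, features):
    """Linear evaluation squashed into (-1, 1), from White's point of view."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)))


def update_from_game(weights, positions, outcome, lr=0.01):
    """Nudge the evaluation of each visited position toward the game result.

    positions: list of feature vectors, one per position seen in the game
    outcome:   +1 if White won, -1 if White lost, 0 for a draw
    This is one gradient step on the squared error 0.5 * (outcome - pred)**2.
    """
    for features in positions:
        pred = evaluate(weights, features)
        error = outcome - pred
        slope = 1.0 - pred * pred  # derivative of tanh at the current output
        for i, f in enumerate(features):
            weights[i] += lr * error * slope * f
    return weights


def simulate_game(rng):
    """Hypothetical toy 'game': the side that is ahead in material wins.

    In a real setup this would be replaced by playing an actual game to the
    end and recording the features of the positions along the way.
    """
    material = rng.choice([-3, -1, 1, 3])
    positions = [[material, rng.uniform(-1.0, 1.0)] for _ in range(5)]
    outcome = 1 if material > 0 else -1
    return positions, outcome


def train(num_games=200, seed=0):
    """Play num_games toy games and learn the two feature weights."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]  # [material weight, noise weight]
    for _ in range(num_games):
        positions, outcome = simulate_game(rng)
        update_from_game(weights, positions, outcome)
    return weights


if __name__ == "__main__":
    learned = train()
    print("learned weights:", learned)
    print("eval when up 3 points:", evaluate(learned, [3, 0.0]))
```

Only the end of the game supplies a reward here; nothing scores intermediate positions directly. Swapping `simulate_game` for real games against a teacher engine, other engines, or the AI itself gives the variants listed above.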