There are two different goals for chess AI: to figure out what is objectively the best move in each situation, and to figure out what is, for me as a human, the best practical way to play in each situation. And possibly to explain why. For the explanations part I have no good ideas short of doing an extraordinary amount of manual work to make a training set, but the other two can be done in a fairly automated manner.
(As with all AI posts everything I say here is speculative, may be wrong, and may be reinventing known techniques, but has reasonable justifications about why it might be a good idea.)
First, a controversial high-level opinion which saves a whole lot of computational power: there is zero reason to try to train an AI on deep evaluations of positions. Alpha-beta pruning works equally well for all evaluation algorithms. Tactics are tactics. What training should optimize for is immediate accuracy on a zero-node (static) eval. What deep evaluation is good for is generating more accurate numbers to go into the training data. For a real engine you also want switching information which says which moves should be evaluated more deeply by the alpha-beta pruner. For that I’m going to assume that when doing a deep alpha-beta search you can get information about how critical each branch is, and that this can be used as a training set. I’m going to hand-wave and assume there’s a reasonable way of making that be the output of an alpha-beta search, even though I don’t know how to do it.
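To make the hand-waving slightly more concrete, here is one crude possibility, sketched in Python: log, for each move, how far its search value pushed the alpha-beta window, and use that as the criticality label. The position interface (`legal_moves`, `apply_move`, `static_eval`) is hypothetical, and this particular signal is just a guess at something that might work, not a known technique.

```python
# Hypothetical position interface: legal_moves(pos) -> list of moves,
# apply_move(pos, move) -> new position, static_eval(pos) -> score from
# the side to move's perspective (negamax convention).

def alpha_beta(pos, depth, alpha, beta, log):
    """Negamax alpha-beta that appends (position, move, criticality)
    triples to `log` as a possible training set for a switching head."""
    if depth == 0 or not legal_moves(pos):
        return static_eval(pos)
    best = float("-inf")
    for move in legal_moves(pos):
        value = -alpha_beta(apply_move(pos, move), depth - 1, -beta, -alpha, log)
        # Crude criticality signal: how far this move pushed the window.
        # Moves that raise alpha a lot are the ones worth searching deeper.
        log.append((pos, move, max(0.0, value - alpha)))
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # beta cutoff: the skipped siblings get no label
    return best
```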
Switching gears for a moment, there’s something I really want but which doesn’t seem to exist: an evaluation function which doesn’t say what the best move for a godlike computer program is, but which says what the best practical move is for me, a human being, to make in this situation. Thankfully that can be generated straightforwardly if you have the right data set. Specifically, you need a huge corpus of games played by humans along with the ratings of the players involved. You then train an AI whose input is the ratings of the players and the current position, and whose output is the probability of a win, loss, or draw. This is something people would pay real money for access to, and it can be generated from an otherwise fairly worthless corpus of human games. You could even get fancy and customize it a bit to a particular player’s style if you have enough games from them, but that’s tricky because each human generates very few games and you’d have to somehow relate them to other players by style to get any real signal.
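A minimal sketch of what that model could look like, in PyTorch. The 768-feature board encoding, the layer sizes, and the rating normalization constant are all assumptions for illustration, not a tested architecture:

```python
import torch
import torch.nn as nn

class HumanOutcomeModel(nn.Module):
    """Predicts win/draw/loss probabilities for a position,
    conditioned on both players' ratings."""
    def __init__(self, board_features=768, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(board_features + 2, hidden),  # +2 for the two ratings
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 3),  # logits for win / draw / loss
        )

    def forward(self, board, white_rating, black_rating):
        # board: (batch, 768) piece-plane encoding (12 piece types x 64 squares).
        # Ratings are appended as roughly unit-scale scalars.
        ratings = torch.stack([white_rating, black_rating], dim=-1) / 2800.0
        return self.net(torch.cat([board, ratings], dim=-1))

# Trained with ordinary cross-entropy against each game's actual result:
#   loss = nn.functional.cross_entropy(model(board, wr, br), outcome_index)
```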
Back to making not a human player but a godlike player. Let’s say you’re making something like Leela, with lots of volunteer computers running tasks to improve it. As is often the case with these sorts of things, the bottleneck seems to be bandwidth. To improve a model you need to send a copy of it to all the workers, have them locally generate suggested improvements to all the weights, then send those back; that’s a complete download and upload of the model for each worker every generation. Bandwidth costs can be reduced either by making generations take longer or by making the model smaller. My guess is that biasing towards a smaller model is likely to get better results: training goes dramatically faster, and the lower computational overhead per evaluation means deeper searches when using it in practice.
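To put rough numbers on it (all of these are made up for illustration):

```python
# Back-of-the-envelope bandwidth cost per worker per generation:
# one full download plus one full upload of the model.
params = 50_000_000   # model parameters (assumed)
bytes_per_param = 4   # fp32 weights; fp16 would halve this
model_bytes = params * bytes_per_param

per_generation = 2 * model_bytes  # download + upload
print(f"{per_generation / 1e9:.1f} GB per worker per generation")
# ~0.4 GB here. Halving the parameter count halves this, which is why
# a smaller model stretches the same volunteer bandwidth further.
```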
To generate suggestions for improving a model, a worker does the following. First it downloads the latest model. Then it generates a self-play game with it. After each move of the game it takes the evaluation which the deeper look-ahead produced and trains the zero look-ahead eval against it, accumulating suggested weight updates. Once it’s time for the next generation, it uploads its suggested updates to the central server, which sums all the weight update suggestions (possibly weighting them by the number of games that went into them) and uses the result for the next generation’s model.
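A minimal sketch of that protocol, assuming hypothetical helpers (`download_model`, `self_play_positions`, `deep_eval`, `eval_gradient`, `upload`) for all the plumbing:

```python
import numpy as np

LR = 1e-3                   # assumed learning rate
GAMES_PER_GENERATION = 100  # assumed per-worker workload

def worker_generation(server):
    """One worker's contribution to a generation."""
    model = server.download_model()
    delta = np.zeros_like(model.weights)
    for _ in range(GAMES_PER_GENERATION):
        for pos in self_play_positions(model):
            target = deep_eval(model, pos)   # value from the deep search
            pred = model.static_eval(pos)    # zero-node value
            # Step the static eval toward the deeper search's verdict.
            delta -= LR * (pred - target) * model.eval_gradient(pos)
    server.upload(delta, GAMES_PER_GENERATION)

def aggregate(submissions):
    """Server side: each submission is (summed_delta, game_count).
    Dividing summed deltas by total games gives the average per-game
    update, weighting each worker by how much it contributed."""
    total_delta = sum(d for d, _ in submissions)
    total_games = sum(g for _, g in submissions)
    return total_delta / total_games
```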
This approach shows how chess is in some sense an ‘easy’ problem for AI, because you don’t need training data for it: you can generate all the training data you want out of thin air on an as-needed basis.
Obviously there are security issues here if any of the workers are adversarial but I’m not sure what the best way to deal with those is.