The Pattern Recognition Basis of Artificial Intelligence

Chapter 9. Game Playing

This chapter covers a number of game-playing techniques, concentrating on checkers and backgammon because so much good research has been done on these games and because so many different techniques have been tried.

For machine learning applied to games, see the "Machine Learning in Games" WWW page by Jay Scott of Swarthmore College.

Note that typically a game-playing program generates all the possible moves and rates each one, and this is not quite what a person does: human experts only look at the consequences of making a few of the most promising moves. This seems to be a case of memory-based reasoning at work.

Note, too, that the idea of producing an evaluation function for board positions is another example of how Classical AI tries to condense/compress knowledge.
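To make that concrete, here is a minimal sketch of such an evaluation function: a weighted sum of hand-chosen board features that condenses a whole position into a single number. The feature names, values, and weights below are made-up illustrations, not taken from any particular program.

    # A minimal sketch of a linear evaluation function for a board game.
    # The feature names and weights are hypothetical illustrations; they
    # are not the actual features or weights of any program covered here.

    def evaluate(features, weights):
        """Condense a board position into one number: a weighted feature sum."""
        return sum(weights[name] * value for name, value in features.items())

    # A made-up checkers position described by three simple features.
    position = {"piece_advantage": 2, "king_advantage": 0, "center_control": 3}
    weights = {"piece_advantage": 10.0, "king_advantage": 15.0, "center_control": 2.0}

    print(evaluate(position, weights))  # prints 26.0 for this made-up position

A search procedure then simply hunts for the move that leads to the position with the best score.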

9.1 General Game Playing Techniques

This section covers minimax, alpha-beta pruning, and a few related techniques; a short sketch of minimax with alpha-beta appears below.
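For the curious, here is a minimal sketch in Python of minimax with alpha-beta pruning. The two-ply game tree and its leaf scores are made up for illustration; a real program would generate children from a board position and score leaves with an evaluation function like the one sketched above.

    import math

    # A minimal sketch of minimax with alpha-beta pruning over a made-up
    # two-ply game tree; a real program would generate moves from a board.

    TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
    SCORES = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}

    def children(node):
        return TREE.get(node, [])

    def evaluate(node):
        return SCORES[node]

    def alphabeta(node, depth, alpha, beta, maximizing):
        """Return the minimax value of node, skipping branches that
        cannot change the final decision."""
        if depth == 0 or not children(node):
            return evaluate(node)
        if maximizing:
            value = -math.inf
            for child in children(node):
                value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:   # the minimizer would never allow this line
                    break
            return value
        else:
            value = math.inf
            for child in children(node):
                value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
                beta = min(beta, value)
                if beta <= alpha:   # the maximizer already has something better
                    break
            return value

    print(alphabeta("root", 2, -math.inf, math.inf, True))  # prints 3

Note how the second subtree is cut off after its first leaf: once the minimizing player can force a score of 2 there, the maximizing player, who can already get 3 in the first subtree, has no reason to look further.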

9.2 Checkers

The main programs here are Arthur Samuel's: a rote learning method, which is a lot like a memory-based method; a generalization learning method, which is a lot like backprop; and a signature table approach, which also gives you a feed-forward type of network. One of Samuel's programs did beat a checkers champion, and the AI community has often made a fuss over that, saying that this AI program played a "championship-level" game; however, that expert beat the program in the next 6 games. Note, too, what Samuel himself says: "the program is quite capable of beating any amateur player and can give better players a good contest".

Recent work by Jonathan Schaeffer has produced a much better checkers-playing program. One tech report by Schaeffer, "Man versus Machine: The World Checkers Championship", which largely describes the games and not the technique, is available by ftp from the University of Alberta, Canada.

9.3 Backgammon

This section looks at Berliner's program, two backprop versions by Tesauro, and a temporal difference method by Tesauro. This latter program is VERY good and has found strategies that human backgammon players now acknowledge are better than some of the old human-devised strategies. More information can be found at Jay Scott's page at Swarthmore. In addition, IBM has made a free version available for OS/2; see: Challenge the expert: TD-Gammon.

The article by Gerald Tesauro, "TD-Gammon, A Self-Teaching Backgammon Program, Achieves Master-Level Play" is online at the Ohio State neuroprose archive. It is largely concerned with the program's performance, not its inner workings.

Tesauro's 1995 CACM article on the program is available online via IBM.

Another article (ah, well, a master's thesis) on using the temporal difference method for backgammon is Justin Boyan's "Modular Neural Nets for Reinforcement Learning of Game Strategies", also available from the Ohio State neuroprose archive and somewhere at Carnegie-Mellon, but I don't have that URL handy (one day I'll find it). I've had trouble printing this file with Ghostscript 2.6.1 and 3.0; I have yet to try it with the newest Ghostscript.

While we're on the subject of temporal difference learning, there is a chess-playing program by Sebastian Thrun described in the article "Learning to Play the Game of Chess", available from Carnegie-Mellon. This experiment with temporal difference learning was much less successful than the backgammon one.

I hate to end up trying to do a comprehensive listing of temporal difference methods here (or other special techniques; would someone else like to?), but here is a fairly easy-to-read 70+ page (double-spaced) master's thesis that includes a derivation and review of the algorithm: "Explorations of the Practical Issues of Learning Prediction-Control Tasks Using Temporal Difference Learning Methods" by Charles L. Isbell, available by ftp from MIT.

Rich Sutton, who effectively initiated the current interest in temporal difference methods with his 1988 paper "Learning to Predict by the Methods of Temporal Differences", has this paper and other related papers available via "Rich Sutton's Publications Page" at the University of Massachusetts. The 1988 paper may be the best one to start with.
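For readers who want the flavor of the method before tackling the paper, here is a minimal sketch of its simplest form, TD(0), on a made-up random-walk prediction task (a bounded random walk is the kind of small example Sutton's paper uses, but the code itself is mine, not his, and certainly not TD-Gammon's). Each state's predicted value V(s) is nudged toward the prediction made one step later: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).

    import random

    # A minimal TD(0) sketch on a made-up random walk: states 0..4, start
    # in the middle, step left or right at random, and get a reward of 1
    # only for falling off the right end. Not TD-Gammon's actual code.

    ALPHA, GAMMA = 0.1, 1.0
    values = [0.0] * 5            # V(s): the predicted return from each state

    for episode in range(5000):
        state = 2
        while 0 <= state <= 4:
            next_state = state + random.choice([-1, 1])
            reward = 1.0 if next_state > 4 else 0.0
            next_value = values[next_state] if 0 <= next_state <= 4 else 0.0
            # The TD(0) update: nudge V(state) toward reward + GAMMA * V(next).
            values[state] += ALPHA * (reward + GAMMA * next_value - values[state])
            state = next_state

    print([round(v, 2) for v in values])  # settles near [0.17, 0.33, 0.5, 0.67, 0.83]

No one ever tells the program the correct values; they emerge from making each prediction consistent with the one that follows it, which is exactly what makes the method attractive for games where the only sure feedback is the final win or loss. TD-Gammon replaces the value table above with a neural network trained by backprop on the same error signal.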

Notice that the temporal difference approach is quite different from the other programs in that it learned on its own rather than being told what is good by human experts. Out of all these backgammon programs, I grant a level of minimal intelligence to the temporal difference method because it learns almost everything it knows on its own, and the "world" it works with is extremely simple: in effect it consists of atoms at certain positions that can move around according to certain "Laws of Physics".

But consider this: after a while Tesauro's temporal difference program will likely stop learning, so does this mean that it has lost its intelligence?

For another recent backgammon program that learns, see this one from Brandeis University.

Some game-playing programs are getting quite good, and I expect that in the long run all the best "players" will be programs. That is wonderful, and those programs that learn to play their games get a rating of minimal intelligence from me, but remember that what's impressive about people is that they can not only play games, they can also do heuristic search, prove theorems, use natural language, and cope with the real world. The real challenge is to get programs to do all of that. If you simply pursue techniques for game playing, will you ever end up with all these human capabilities in one program?


If you have any questions or comments, write me.
