The Pattern Recognition Basis of Artificial Intelligence

Chapter 3. Pattern Recognition II

In this chapter the pattern recognition problem becomes a problem in abstract math. If you consult the "pattern recognition" journals, about all you will find is a nearly endless supply of abstract math and theorem proving and hardly any practical applications of it. Despite this theoretical slant, this chapter tries to stay down to Earth as much as possible. There are quite a few algorithms and topics to cover in this chapter, the most important one being back-propagation; it's important because it can be used for so many different applications.

3.1 The Linear Pattern Classifier

This is one of the simplest of all pattern classification algorithms. There is a simple procedure for finding a line, plane or hyperplane that will separate the two classes of patterns you are trying to learn. In effect the line, plane or hyperplane defines the weights of a neural network. One of the beauties of this scheme is that once you find the weights for the network you can throw away all the cases you used to develop them. A simple version can be programmed in a hundred lines or so, and my version is also available.
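
To make the idea a little more concrete, here is a minimal sketch in C of one common procedure for finding such a separating line, the perceptron learning rule; the tiny data set, the learning rate and all the names are made up for the illustration, and this is not the program mentioned above.

    /* A minimal perceptron-style linear classifier sketch.  The      */
    /* weights w[0..DIM-1] plus the bias w[DIM] define the separating */
    /* line; once training is done the training cases can be dropped. */
    #include <stdio.h>

    #define DIM    2      /* inputs per pattern               */
    #define CASES  4      /* number of training patterns      */
    #define EPOCHS 100    /* passes through the training data */

    int main(void)
    {
        /* made-up linearly separable data, classes +1 and -1 (logical AND) */
        double x[CASES][DIM] = { {0,0}, {0,1}, {1,0}, {1,1} };
        int    t[CASES]      = { -1, -1, -1, +1 };
        double w[DIM + 1]    = { 0.0, 0.0, 0.0 };
        double rate = 0.1, sum;
        int e, c, i, out;

        for (e = 0; e < EPOCHS; e++)
            for (c = 0; c < CASES; c++) {
                sum = w[DIM];                      /* bias term */
                for (i = 0; i < DIM; i++)
                    sum += w[i] * x[c][i];
                out = (sum >= 0.0) ? +1 : -1;
                if (out != t[c]) {                 /* misclassified: nudge the plane */
                    for (i = 0; i < DIM; i++)
                        w[i] += rate * t[c] * x[c][i];
                    w[DIM] += rate * t[c];
                }
            }
        printf("w = (%g, %g), bias = %g\n", w[0], w[1], w[2]);
        return 0;
    }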

3.2 Separating Non-Linearly Separable Classes

In many problems the data cannot be separated by lines, planes or hyperplanes, and then some non-linear method must be used. The simplest such algorithm is the nearest neighbor classifier: you simply take your unknown (as a vector) and compute the distance from it to all the patterns you already have the answer for. The answer you guess for the unknown pattern is the class of the closest one.
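
As a quick illustration, here is a sketch in C of the one-nearest-neighbor lookup just described; the stored patterns and classes are made up, and squared Euclidean distance is used since it ranks neighbors the same way as the true distance.

    /* One-nearest-neighbor sketch: the guess for an unknown pattern  */
    /* is the class of the closest stored pattern.                    */
    #include <stdio.h>

    #define DIM    2
    #define STORED 4

    int nearest_class(double unknown[DIM], double pat[STORED][DIM], int cls[STORED])
    {
        int p, i, best = 0;
        double bestd = 1e30;

        for (p = 0; p < STORED; p++) {
            double d = 0.0;
            for (i = 0; i < DIM; i++) {            /* squared Euclidean distance */
                double diff = unknown[i] - pat[p][i];
                d += diff * diff;
            }
            if (d < bestd) { bestd = d; best = p; }
        }
        return cls[best];
    }

    int main(void)
    {
        double pat[STORED][DIM] = { {0,0}, {0,1}, {1,0}, {1,1} };
        int    cls[STORED]      = { 0, 0, 1, 1 };
        double unknown[DIM]     = { 0.9, 0.2 };

        printf("guessed class: %d\n", nearest_class(unknown, pat, cls));
        return 0;
    }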

In the nearest neighbor algorithm you have to keep a large inventory of patterns and their classifications, so searching through this inventory for the closest match may take quite a long time. Lately some interesting variations on the nearest neighbor method have been developed that are much faster because you store fewer known patterns. The idea is to scatter a few prototype points, that is, representative patterns, around the space for each class. There is a whole series of algorithms to do this that are called learning vector quantization (LVQ) algorithms. Perhaps the simplest of these is one called decision surface mapping (DSM), and this one is covered in the text. At one time I had the LVQ1 algorithm in the text as well, in another chapter called Other Neural Networking Methods, but in the end I thought it was best to cut this chapter from the book and make it available on the net.

I've done a nearest neighbor program in C with DOS binaries that implements the k-nearest neighbor algorithm, DSM and LVQ1. More LVQ software is available from the group that started it all, the LVQ/SOM Programming Team of the Helsinki University of Technology, Laboratory of Computer and Information Science, Rakentajanaukio 2 C, SF-02150 Espoo, Finland. It comes as UNIX source and as a DOS self-extracting file. There are also other programs that can be used with this LVQ software.
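
To give the flavor of these prototype methods, here is a sketch in C of a single LVQ1-style training step: find the prototype closest to a training pattern, pull it toward the pattern if their classes match and push it away if they don't. The array sizes and the learning rate are invented for the illustration, and DSM differs in exactly which prototypes get moved; see the text or the programs above for the real details.

    /* One LVQ1-style update: move the winning prototype toward the   */
    /* training pattern if the classes agree, away from it if not.    */
    /* It gets called repeatedly over the training set with a slowly  */
    /* shrinking learning rate.                                       */

    #define DIM    2
    #define PROTOS 4

    void lvq1_step(double proto[PROTOS][DIM], int pclass[PROTOS],
                   double x[DIM], int xclass, double rate)
    {
        int p, i, win = 0;
        double bestd = 1e30;

        for (p = 0; p < PROTOS; p++) {         /* find the nearest prototype */
            double d = 0.0;
            for (i = 0; i < DIM; i++)
                d += (x[i] - proto[p][i]) * (x[i] - proto[p][i]);
            if (d < bestd) { bestd = d; win = p; }
        }
        for (i = 0; i < DIM; i++) {
            double move = rate * (x[i] - proto[win][i]);
            if (pclass[win] == xclass)
                proto[win][i] += move;         /* attract */
            else
                proto[win][i] -= move;         /* repel   */
        }
    }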

3.3 Hopfield Networks

Besides the Hopfield network, this section also covers the Boltzmann machine relaxation algorithm. The Boltzmann machine idea is really very intriguing because of the way it looks up memories; it's radically different from conventional computer architectures. While it's very interesting theoretically, there have been very few applications of this method. In the highly recommended Nanopoulos article he says (in effect) that the microtubules can form a Hopfield network. (I checked with physicist Jack Sarfatti on this to make sure I was interpreting Nanopoulos correctly.) Each tubulin molecule would represent a 1 or 0 depending on which state it's in. I can't imagine how the weights get represented in this system and especially how a molecule at one end of the MT fiber can influence a molecule at the other end, so I asked Jack the following:

if the MTs work this way how does one unit on the far left manage to influence a unit on the far right? (I can imagine one unit affecting a neighboring unit but that is all.) What are the weights between the MT units and how are they created? Is there quantum "magic" at work here?

His reply was:

Yes. We have to look at the quantum version in which the quantum pilot waves provide long-range nonlocal links between the classical switches. Exactly how to do this mathematically I don't know right this minute.

So APPARENTLY some quantum "magic" allows a tubulin molecule to connect to every other tubulin molecule, giving a Hopfield-type network, a pretty neat way to implement the algorithm. Nanopoulos also notes that the physical temperature of the tubulin fibers and the strength of the electric fields surrounding them change the characteristics of the fiber, making possible (*I* think) the Boltzmann machine relaxation algorithm. I'd like to see a supercomputer simulate this numerically; it's a good project for someone, I think.

The Hopfield and Boltzmann programs are available as C source and DOS binaries.
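
For anyone who wants to see the bare mechanics before downloading the programs, here is a sketch in C of a Hopfield memory with Hebbian weights and asynchronous updates; the stored patterns and sizes are made up, and the simulated-annealing step that turns the idea into the Boltzmann machine relaxation algorithm is left out.

    /* Hopfield memory sketch: Hebbian weights store a few +1/-1      */
    /* patterns; repeated asynchronous updates pull a noisy probe     */
    /* toward the nearest stored pattern.                             */
    #include <stdio.h>

    #define N    8      /* units            */
    #define PATS 2      /* stored memories  */

    int main(void)
    {
        int mem[PATS][N] = { { 1, 1, 1, 1,-1,-1,-1,-1},
                             { 1,-1, 1,-1, 1,-1, 1,-1} };
        int s[N] = { 1, 1,-1, 1,-1,-1,-1,-1 };   /* noisy version of mem[0] */
        double w[N][N];
        int i, j, p, sweep;

        for (i = 0; i < N; i++)                  /* Hebbian weights */
            for (j = 0; j < N; j++) {
                w[i][j] = 0.0;
                if (i != j)
                    for (p = 0; p < PATS; p++)
                        w[i][j] += mem[p][i] * mem[p][j];
            }

        for (sweep = 0; sweep < 5; sweep++)      /* asynchronous updates */
            for (i = 0; i < N; i++) {
                double net = 0.0;
                for (j = 0; j < N; j++)
                    net += w[i][j] * s[j];
                s[i] = (net >= 0.0) ? 1 : -1;
            }

        for (i = 0; i < N; i++)                  /* should print mem[0] */
            printf("%2d", s[i]);
        printf("\n");
        return 0;
    }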

3.4 Back-Propagation

Again, backprop is really important because of the large number of problems it can be applied to. Notice how many times it was discovered before it finally caught on. If you're familiar with regression, notice how backprop is really just a version of non-linear regression. There are loads of ways to speed up the plain algorithm and, generally speaking, you should use them rather than the plain algorithm; however, sometimes the plain version will give the best results. At this point there is no way to know ahead of time whether one of the improved versions or the plain version will be best.
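
For reference, here is a minimal sketch in C of the plain on-line backprop algorithm with sigmoid units, trained on the XOR problem. The network size, learning rate, epoch count and the crude weight initialization are all invented for the illustration, this is not the package mentioned below, and like any plain backprop run it can occasionally get stuck.

    /* Plain back-propagation sketch: one hidden layer, sigmoid units, */
    /* on-line weight updates, trained on XOR.  Compile with -lm.      */
    #include <stdio.h>
    #include <math.h>

    #define IN    2
    #define HID   3
    #define OUT   1
    #define CASES 4

    static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

    int main(void)
    {
        double x[CASES][IN]  = { {0,0}, {0,1}, {1,0}, {1,1} };
        double t[CASES][OUT] = { {0}, {1}, {1}, {0} };
        double wh[HID][IN + 1], wo[OUT][HID + 1];   /* last column = bias */
        double h[HID], o[OUT], dh[HID], dout[OUT];
        double rate = 0.5;
        int i, j, k, c, e;
        unsigned seed = 1;

        for (j = 0; j < HID; j++)                   /* small pseudo-random weights */
            for (i = 0; i <= IN; i++)
                wh[j][i] = ((seed = seed * 1103515245u + 12345u) % 1000) / 1000.0 - 0.5;
        for (k = 0; k < OUT; k++)
            for (j = 0; j <= HID; j++)
                wo[k][j] = ((seed = seed * 1103515245u + 12345u) % 1000) / 1000.0 - 0.5;

        for (e = 0; e < 20000; e++)
            for (c = 0; c < CASES; c++) {
                for (j = 0; j < HID; j++) {         /* forward pass */
                    double net = wh[j][IN];
                    for (i = 0; i < IN; i++)
                        net += wh[j][i] * x[c][i];
                    h[j] = sigmoid(net);
                }
                for (k = 0; k < OUT; k++) {
                    double net = wo[k][HID];
                    for (j = 0; j < HID; j++)
                        net += wo[k][j] * h[j];
                    o[k] = sigmoid(net);
                }
                for (k = 0; k < OUT; k++)           /* output deltas */
                    dout[k] = (t[c][k] - o[k]) * o[k] * (1.0 - o[k]);
                for (j = 0; j < HID; j++) {         /* hidden deltas */
                    double sum = 0.0;
                    for (k = 0; k < OUT; k++)
                        sum += dout[k] * wo[k][j];
                    dh[j] = sum * h[j] * (1.0 - h[j]);
                }
                for (k = 0; k < OUT; k++) {         /* weight updates */
                    for (j = 0; j < HID; j++)
                        wo[k][j] += rate * dout[k] * h[j];
                    wo[k][HID] += rate * dout[k];
                }
                for (j = 0; j < HID; j++) {
                    for (i = 0; i < IN; i++)
                        wh[j][i] += rate * dh[j] * x[c][i];
                    wh[j][IN] += rate * dh[j];
                }
            }

        for (c = 0; c < CASES; c++) {               /* show what it learned */
            double net, out;
            for (j = 0; j < HID; j++) {
                net = wh[j][IN];
                for (i = 0; i < IN; i++)
                    net += wh[j][i] * x[c][i];
                h[j] = sigmoid(net);
            }
            net = wo[0][HID];
            for (j = 0; j < HID; j++)
                net += wo[0][j] * h[j];
            out = sigmoid(net);
            printf("%g %g -> %.3f\n", x[c][0], x[c][1], out);
        }
        return 0;
    }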

If you want a tutorial on backprop that is much the same as the one in the text, get my postscript file or see the newer HTML file.

The Rprop and Quickprop papers are available online; for these and more material on backprop see my Backpropagator's Review.

Since backprop is so useful I've spent a lot of time working on a good version of backprop and it is available online.

3.5 Pattern Recognition and Curve Fitting

This section shows that what backprop really ends up doing is fitting curves and surfaces to whatever data you give it; it is a form of non-linear regression. It also shows an example of overfitting. If you're doing anything important with backprop you MUST know about overfitting.

3.6 Associative Memory and Generalization

This section does several things. First, it demonstrates the use of backprop networks to produce an associative memory and shows how such a network can respond in a reasonable way to patterns it has never seen before; it can give rule-like behavior without actually having any rules. The data for the sports example is in my backprop package. Second, there is the issue of local and distributed representations in networks. Third, it shows how a network with a distributed representation can do some elementary reasoning.

The Perrone and Cooper article on the virtues of averaging the results of several networks, "When Networks Disagree: Ensemble Methods for Hybrid Neural Networks", is available by FTP from the Neuroprose archive at Ohio State. This paper proves that, under certain conditions, averaging the output units of a number of networks will give you a better estimate than the best of those networks by itself.
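
As a small sketch of the idea (not the method from the paper, just the simple averaging it analyzes), combining the networks can be as little code as this, assuming each network's outputs have already been computed for the pattern in question:

    /* Ensemble sketch: the combined estimate is just the average of   */
    /* the corresponding output units of the individual networks.      */

    #define NOUT 3    /* output units per network (made up for the sketch) */

    void ensemble_average(double out[][NOUT], int nets, double avg[NOUT])
    {
        int k, n;
        for (k = 0; k < NOUT; k++) {
            avg[k] = 0.0;
            for (n = 0; n < nets; n++)
                avg[k] += out[n][k];    /* output unit k of network n */
            avg[k] /= nets;
        }
    }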

As noted in the text, backprop is not magic and it cannot do everything. For details see this article cited in the text: "Trading Spaces: Computation, Representation and the Limits of Uninformed Learning" by Andy Clark and Chris Thornton, available by FTP from Washington University in St. Louis or by WWW from Philosophy/Neuroscience/Psychology at Washington University. This article makes the very important point that not all problems can be solved by backprop, but it also points to ways that you can modify the problem or the training in order to get the network to solve it. This paper references the article "Incremental Learning, or the Importance of Starting Small" by Jeffrey L. Elman, which is also online from the University of California at San Diego.

There is a recent paper available from the Los Alamos National Laboratory on Quantum Associative Memory by Dan Ventura and Tony Martinez.

3.7 Applications of Back-Propagation

This section describes some of the applications of backprop: the sonar problem, NETtalk, ALVINN and various others. There are many, many more such applications; consult any set of neural network conference proceedings for more. The paper "A Neural Network Model for the Gold Market" by Peter J. McCann and Barry L. Kalman is available by FTP from Washington University in St. Louis or by WWW from Philosophy/Neuroscience/Psychology at Washington University.

The NavLab project now has a home page: the NavLab Home Page. A little more, but not very much more, on ALVINN and automated driving can be found on the Carnegie-Mellon robotics page.

3.8 Exercises

Note that for exercises that call for the linear pattern classifier, Boltzmann machine, nearest neighbor, DSM or backprop, you may want to use the software I wrote.

If you're looking for interesting data, remember that my backprop package includes the most commonly tested sonar data and data for the artificial problem, the "circle in a square". For more real-world data, FTP to the machine learning database at the University of California at Irvine.

For pointers to more data consult the NN FAQ.


If you have any questions or comments, write me.

To Don's Home Page