Backpropagator's Review

Copyright 1996-2003 by Donald R. Tveter, commercial use is prohibited. Short quotations are permitted if proper attribution is given. This material CAN be posted elsewhere on the net if the posted files are not altered in any way but please let me know where it is posted. The main location is: http://dontveter.com/bpr/bpr.html


The Articles, Part I

Last Change to This File: January 13, 2003

Here are the articles, listed in no particular order:

* "Trading Spaces: Computation, Representation and the Limits of Uninformed Learning" by Andy Clark and Chris Thornton, available by WWW from Washington University in St. Louis. This paper references the paper by Elman, which is also online. The article makes the very important point that not all problems can be solved by backprop, but it also points to ways you can modify the problem or the training to get the network to solve it.

* "Incremental Learning, or the Importance of Starting Small" by Jeffrey L. Elman from The University of California at San Diego. Learning adult-size problems can range from hard to impossible, but learning is easier if you start the network out on child-size problems.

* "Financial Prediction, Some Pointers, Pitfalls and Common Errors" by Kevin Swingler from Stirling University, United Kingdom. For those of you eager to predict the financial markets with your brand new neural network package, this article gives a few cautions about network-based predictions.

* "A Framework for Combining Symbolic and Neural Learning" by Jude W. Shavlik from The University of Wisconsin at Madison. For related reports by Shavlik and Towell, follow this link. The paper starts from a set of rules to produce an initial backprop network, trains it, and then extracts a new set of rules, which in this report turned out to be better than the original set.

* "Explorations of the Practical Issues of Learning Prediction-Control Tasks Using Temporal Difference Learning Methods" by Charles L. Isbell, available by ftp from MIT. This is a fairly easy to read 70+ page (double spaced) master's thesis that includes a derivation and review of the algorithm. The Ghostscript 2.6.1 interpreter conked out on page 44, and I have not yet tried any of the newer Ghostscripts to see if they can handle the file. I got this report from Andros Bragianos:

--> GSview (GhostScript graphical interface) for Windows (version 2.8) was able to view the whole thesis but with the "Ignore DSC" option ON. With this option ON, the thesis could only be traversed in a forward direction (previous pages would be "forgotten"). When the option is OFF the thesis can be traversed in forward/reverse directions but only up to the end of the Contents section.

I thank him for this, because I am blessed with Linux these days and so I don't know what the newer Windows versions do.

* "Learning to Predict by the Methods of Temporal Differences" by Rich Sutton, Machine Learning 3: 9--44, is available by ftp from the University of Massachusetts. Other related papers are available via Rich Sutton's Publications page.

* J. R. Chen and P. Mars, "Stepsize Variation Methods for Accelerating the Back-Propagation Algorithm", IJCNN-90-WASH-DC volume 1, pp 601-604, Lawrence Erlbaum, 1990.

* "Faster Learning Variations on Back-Propagation: An Empirical Study" by Scott Fahlman from the Ohio State neuroprose archive or from Carnegie-Mellon. This paper describes a series of experiments aimed at improving backprop and finishes with quickprop, which may be one of the best ways to speed up the training of a network.
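For readers who have not met quickprop, its central step can be sketched in a few lines. Quickprop treats the error curve for each weight as a parabola and uses the current slope, the previous slope and the previous weight change to jump toward that parabola's minimum, limiting any step to mu times the previous one. The sketch below is a simplified one-weight illustration under those assumptions, not Fahlman's full algorithm: the function name, the learning rate and the test function are made up for the example, and mu = 1.75 is the growth limit usually quoted for quickprop.

```python
def quickprop_minimize(grad, w, lr=0.1, mu=1.75, steps=50, eps=1e-12):
    """Minimize a one-weight error function given its slope function grad(w)."""
    prev_slope = grad(w)
    step = -lr * prev_slope              # bootstrap with one plain gradient step
    w += step
    for _ in range(steps):
        slope = grad(w)
        denom = prev_slope - slope
        if abs(step) < eps or abs(denom) < eps:
            new_step = -lr * slope       # no usable history: plain gradient step
        elif slope * prev_slope > 0 and abs(slope) >= abs(prev_slope):
            new_step = mu * step         # slope is not shrinking: take the largest allowed step
        else:
            factor = slope / denom       # secant estimate of the parabola's minimum
            factor = max(-mu, min(mu, factor))
            new_step = factor * step
        w += new_step
        prev_slope, step = slope, new_step
    return w


# Hypothetical test problem: error (w - 3)^2, whose slope is 2 * (w - 3).
print(quickprop_minimize(lambda w: 2.0 * (w - 3.0), 0.0))
```

On an exactly quadratic error curve the secant jump lands on the minimum almost immediately; on a real network the error surface is only roughly parabolic, which is why the growth limit matters.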

* "The Cascade Correlation Learning Algorithm" by Scott Fahlman and Christian Lebiere from the Ohio State neuroprose archive or from Carnegie-Mellon. Cascade correlation builds a network one hidden unit at a time and it is extremely fast. It does not work very well for function approximation, and there have been reports that a better version is under development that will work well on function approximation problems and be simpler as well.

* "The Recurrent Cascade Correlation Architecture" by Scott Fahlman from the Ohio State neuroprose archive or from Carnegie-Mellon. This is a version of cascade correlation designed to work with recurrent networks.

* "TRAINREC: A System for Training Feedforward & Simple Recurrent Networks Efficiently and Correctly" by Barry L. Kalman and Stan C. Kwasny, available by WWW from Washington University in St. Louis. This article contains a multitude of methods for improving training and generalization. One that I approve of is their method for averaging outputs when you need to identify a time series with a recurrent network. They also argue that tanh is a better activation function than the standard sigmoid; this is something I have never really seen, and so I have grave doubts about it. However, they used tanh in the context of a different error function and I have never tried exactly that.
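When weighing the tanh claim, it may help to recall that tanh is the standard sigmoid rescaled to the range (-1, 1) and centered on zero, so the two differ in output range and slope rather than in basic shape. The short Python check below illustrates the identity tanh(x) = 2*sigmoid(2x) - 1; the function names are illustrative and are not taken from TRAINREC.

```python
import math


def sigmoid(x):
    """Standard logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    """tanh written as a rescaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # The last two columns agree to floating-point precision.
    print(f"{x:5.1f}  {math.tanh(x): .6f}  {tanh_via_sigmoid(x): .6f}")
```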

* The Rprop papers from ira.uka.de that were listed here for so long have disappeared. I managed to come up with the following links to them. You can go to Dr. Riedmiller's publication page, "Prof. Dr. Martin Riedmiller - Publications", where you can find these two versions:

http://lrb.cs.uni-dortmund.de/~riedmill/publications/rprop.details.ps.Z which doesn't work under Ghostscript 6.52 that came with my Red Hat 7.3 Linux.
http://lrb.cs.uni-dortmund.de/~riedmill/publications/riedml.icnn93.ps.Z which doesn't work under Ghostscript 6.52 that came with my Red Hat 7.3 Linux.

Further searching turned up links to copies of Rprop papers cached in various formats at http://citeseer.nj.nec.com.

More searching turned up an interesting page, "Improving the Rprop Learning Algorithm" by Igel and Husken (ResearchIndex), where gs 6.52 failed on the paper but xpdf handled the PDF version. I haven't tried this new version of the algorithm yet. The paper is also found at: http://www.neuroinformatik.ruhr-uni-bochum.de/ini/PEOPLE/husken/nc2000_iRprop.ps.gz

* "Optimization of the Backpropagation Algorithm for Training Multilayer Perceptrons" by W. Schiffmann, M. Joost and R. Werner from the Neuroprose Archive at Ohio State. This article is not a reliable report on the ability of the various methods to solve the particular problem the researchers used. Ordinarily you make a dozen or so runs and find the average time it takes for your network to converge; however, the problem the authors used is so large, and the computers they used were so relatively slow, that they made only one run per algorithm per parameter setting. I asked one of the authors about this and he said he thought the results would stand up under multiple runs because of the way the weights were initialized; however, I have not yet taken the time to test this assertion about their weight initialization scheme. (Would someone please do this? It is on page 3, it is rather easy, and the BP world is always interested in using the best weight initialization algorithm.) What makes this article worth looking into is the large number of different algorithms that are described in a fair amount of detail, plus of course the pointers back to the original articles.
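The usual practice mentioned above, making a dozen or so runs and averaging the time to converge, can be sketched as follows. This is a minimal illustration and not the setup used in the paper: the XOR problem, the 2-2-1 network, the learning rate, the tolerance and the epoch cap are all assumptions chosen to keep the example small, and the weights are simply drawn uniformly from [-1, 1] rather than by the authors' initialization scheme.

```python
import math
import random
import statistics

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(x):
    # Clamp the argument so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, x))))


def forward(hid, out, x1, x2):
    """Return the hidden activations and the network output."""
    h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in hid]
    y = sigmoid(sum(o * hj for o, hj in zip(out, h)) + out[-1])
    return h, y


def epochs_to_converge(seed, lr=0.5, tol=0.1, max_epochs=5000):
    """Train a 2-2-1 network on XOR by online backprop from a seeded random start.

    Returns the epoch at which every output is within tol of its target,
    or None if the epoch cap is reached first.
    """
    rng = random.Random(seed)
    hid = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # [w1, w2, bias]
    out = [rng.uniform(-1, 1) for _ in range(3)]                      # [v1, v2, bias]
    for epoch in range(1, max_epochs + 1):
        for (x1, x2), t in XOR:
            h, y = forward(hid, out, x1, x2)
            d_out = (t - y) * y * (1.0 - y)
            d_hid = [d_out * out[j] * h[j] * (1.0 - h[j]) for j in range(2)]
            for j in range(2):
                out[j] += lr * d_out * h[j]
                hid[j][0] += lr * d_hid[j] * x1
                hid[j][1] += lr * d_hid[j] * x2
                hid[j][2] += lr * d_hid[j]
            out[2] += lr * d_out
        if all(abs(t - forward(hid, out, x1, x2)[1]) < tol for (x1, x2), t in XOR):
            return epoch
    return None


def summarize(results):
    """Mean and sample standard deviation over converged runs, plus the failure count."""
    ok = [r for r in results if r is not None]
    mean = statistics.mean(ok) if ok else None
    spread = statistics.stdev(ok) if len(ok) > 1 else None
    return mean, spread, len(results) - len(ok)


if __name__ == "__main__":
    runs = [epochs_to_converge(seed) for seed in range(12)]
    print(runs, summarize(runs))
```

Reporting the spread and the number of runs that failed to converge alongside the mean gives a sense of how far a single run could be from the typical result.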

* "Selecting Neural Network Architectures via the Prediction Risk: Application to Corporate Bond Rating Prediction" by Joachim Utans and John Moody available from the Ohio State Neuroprose archive. The authors use backprop, v-fold cross validation and some pruning algorithms to rate corporate bonds.
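As a reminder of what v-fold cross validation involves, the data are split into v folds, and each fold in turn is held out for testing while the model is trained on the rest. The sketch below shows only that bookkeeping and is not the authors' procedure: the function names are hypothetical, and a trivial mean predictor stands in for the backprop network.

```python
import random


def v_fold_splits(n_items, v, seed=0):
    """Yield (train_idx, test_idx) pairs; each item is held out exactly once."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::v] for i in range(v)]   # fold sizes differ by at most one
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


def cross_validated_error(xs, ys, v, fit, predict):
    """Average squared error over all held-out items."""
    total, count = 0.0, 0
    for train_idx, test_idx in v_fold_splits(len(xs), v):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            total += (predict(model, xs[i]) - ys[i]) ** 2
            count += 1
    return total / count


# Stand-in "network" (hypothetical placeholder): predicts the mean training target.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


xs = [float(i) for i in range(10)]
ys = [2.0 * x for x in xs]
print(cross_validated_error(xs, ys, 5, fit_mean, predict_mean))
```

Comparing this held-out error across candidate architectures is one way to choose among them without setting aside a separate test set.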


If you have any questions or comments, write me.
