Professional Basis of AI Backprop Hypertext Documentation

Copyright (c) 1990-97 by Donald R. Tveter

Benchmarking

Background

The main purpose of the benchmarking command is to make it possible to run a number of tests of a problem with different initial weights and average the number of iterations and CPU time over the networks that converged. The outputs of multiple networks can also be averaged, and this may give better results than the best single network. Benchmarking is also used by the Predict command.

The Benchmarking menu window contains the options of the typed benchmarking command plus other commands that are useful when you run benchmarking. The two forms of the actual benchmarking command are shown in these two examples:

b       * b followed by a carriage return actually runs benchmarking
b g 5 m 15 k 1 r 1000 200  * b followed by options ONLY sets parameters
A typical command to simply test the current parameters on a number of networks is:

b g 5 m 15 k 1 r 1000 200
The "g 5" specifies that you'd like to set the goal of getting 5 networks to converge but the "m 15" sets a maximum of 15 tries to reach this goal. The k specifies that each initial network will get a kick by setting each weight to a random number between -1 and 1. The "r 1000 200" portion specifies that you should run up to 1000 iterations on a network and print the status of learning every 200 iterations. The B menu window has entry boxes where you can set these parameters. There is also an entry box that shows the previous benchmarking result.

For an example, here is some output from one benchmarking run:

b g 5 m 5 r 1000 1000
b
 seed =      7; running . . .
   62    100.00 % 0.07057    DONE
 seed =      7; running . . .
  475    100.00 % 0.06954    DONE
 seed =      7; running . . .
   54    100.00 % 0.06932    DONE
 seed =      7; running . . .
   40    100.00 % 0.07429    DONE
 seed =      7; running . . .
   43    100.00 % 0.08272    DONE
5 successes 0 failures  avg. =   134.80 +/- 170.28     0.07 sec/success

The program reports 5 successes with an average of 134.8 iterations required per success, a standard deviation of 170.28 and 0.07 seconds per success. If all the networks fail to converge you get a different message that lists the time per failure. If you want an estimate of how long your benchmarking will take, set the goal to 1 and the maximum tries to 1; whether the run succeeds or fails you will get a timing value. On PCs the granularity of the timer is 1/18 of a second, so for short problems the timing will be unreliable.
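
The averages in the last line can be checked by hand; the figures match the population form of the standard deviation (dividing by the number of runs rather than by the number of runs minus 1), as this small Python sketch shows:

from statistics import mean, pstdev

# iteration counts of the five successful runs in the example above
iters = [62, 475, 54, 40, 43]

print("avg. = %.2f +/- %.2f" % (mean(iters), pstdev(iters)))
# prints: avg. = 134.80 +/- 170.28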

When benchmarking with the continuous update methods (Cc) you need to set up-to-date statistics ("f u+") to get an accurate count of the number of iterations required. Without this setting the number reported may be greater than or less than the actual number.
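
For example, before a benchmarking run with continuous updates you could type:

f u+   * keep the statistics up to date
b      * run benchmarking with accurate iteration counts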

Benchmarking Window Menu Commands

The Benchmark Run Parameters

The first line lets you set the maximum number of iterations to run and the rate at which to print the training status.

Goal for Successes

A benchmarking run trains until the tolerance per unit or the overall tolerance is met, and when either condition occurs the run is considered a success. When neither condition is met after running the maximum number of iterations the run is considered a failure. The goal entry box lets you set the number of successes you want. The typed command to set the goal to 10 is: "b g 10".

Maximum Tries

As described under Goal for Successes, a run that meets neither tolerance within the maximum number of iterations counts as a failure. The maximum tries entry box lets you set the maximum number of runs to try in order to reach the goal. The typed command to set the maximum number of tries to 10 is "b m 10".

Averaging Outputs

Averaging output units over a number of runs can sometimes give better results than the best single network. In theory, however, this idea can fail if certain conditions are not met, so it is not a sure-fire way to improve generalization. One name for this method is the Basic Ensemble Method (BEM). If you've set the save weights on every minimum (on the test set) flag (swem+) then the weights used for averaging will be read from the saved weights files. If weights are not saved on the minimums then the program uses the final set of weights from each trial. Use this button to turn averaging on or off. The typed command to turn on averaging is "b S+" and the command to turn it off is "b S-".
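
As a rough sketch of what output averaging amounts to (illustrative only, not the program's actual code), suppose three trained networks each produce an output vector for the same input pattern; BEM simply averages them:

import numpy as np

# Basic Ensemble Method sketch (illustrative, not the program's code).
# Each row is one trained network's output vector for the same pattern.
outputs = np.array([[0.91, 0.12],    # network 1
                    [0.85, 0.20],    # network 2
                    [0.95, 0.05]])   # network 3

ensemble = outputs.mean(axis=0)      # averaged outputs used as the answer
print(ensemble)                      # prints: [0.90333333 0.12333333]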

Print Seed Value Note

The program can print the current seed value before each run or skip it. Normally this does not make much difference unless you want to save the screen output to a file so you can analyze the results with another program. The button lets you turn this on or off. The typed command is "bs+" to print the note or "bs-" to skip the note.

Save Final Weights

You can save the final weights for each network. The typed command is "b w+" to save them and "b w-" to not save them. If the "save the networks while benchmarking" flag is on (b w+) and the save weights on every minimum (on the test set) flag is not set (swem-) then the weights will be saved after each network is finished. Remember that to get each network saved in a different file the "number the weights" option (f W+) must also be on. If you have the program saving weights on every minimum (swem+) then only the best set of weights from each run will be saved; however, if you also use "b w+" the last set of weights will be saved on top of the best set and so you lose the best set. Before you start benchmarking you may want to reset the weights file numbering with "f W+".
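
For example, to save each network's final weights in its own numbered file, one possible setup using the commands above is:

f W+     * number the weight files and reset the numbering
swem-    * don't save weights on test set minimums
b w+     * save the final weights of each network
b        * run benchmarking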

Save Weights on Minimum

There is a typed command "swem" to save weights for every minimum error value on the test set. That is, whenever the test set error is checked, if the new error is lower than the previous minimum then the new set of weights is saved. This gives you the network with the lowest error found during the training session. The test set is checked only when a summary is written to the screen, so if you use "r 100 10" the test set is checked every 10 iterations. To turn this option on use "swem+" and to turn it off use "swem-". Don't use this option with the "save weights at regular intervals" command (swe, for save weights every) because that command will overwrite the best weights. Within the benchmark command only the best set of weights from each run will be saved.

Tolerance/Unit

This entry box lets you set the tolerance per unit that must be met for every unit and every pattern in the training set in order to stop training. This is not particularly useful in most problems because the real goal is to minimize the error on the test set. The typed command to set the tolerance to 0.1 is "t 0.1".

Overall Tolerance

This entry box lets you set the AVERAGE error per unit that must be met in order to stop training, so many patterns will not meet the tolerance per unit criterion. This is somewhat more useful than the tolerance per unit criterion because trying to make every unit for every pattern register within the tolerance/unit value can result in overtraining. On the other hand this works with the training set data and does not look at the test set data, so you're left trying to find the training set error that produces the smallest minimum on the test set. The typed command to set the overall tolerance to 0.2 is "to 0.2".
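
The difference between the two stopping criteria can be sketched in Python; this is illustrative only and assumes a simple absolute error per output unit, which may differ from the program's exact error measure:

import numpy as np

# errs[p, u] = hypothetical absolute error of output unit u on pattern p
errs = np.array([[0.05, 0.08],
                 [0.02, 0.31]])

t, to = 0.1, 0.2                        # as set by "t 0.1" and "to 0.2"
per_unit_met = bool((errs < t).all())   # every unit of every pattern within t
overall_met = bool(errs.mean() < to)    # average error per unit below to
print(per_unit_met, overall_met)        # prints: False True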

Initialize Weights to +/-:

This entry box lets you set the range of the initial random weights.

Seed Values:

In most runs a single seed value is all you need; however, it is possible to use a different seed value for every run. The idea behind allowing multiple seeds is that if one network does something interesting, you can use its seed to run a network with the same initial weights outside the benchmarking command. Type whatever seeds you want into the entry box. The typed command to set the seeds to 3, 5, 7, 18484 and 99 is:

s 3 5 7 18484 99

When there are more networks to try than there are seeds, the random values keep coming from the last seed value.

Previous Result:

The previous result of a benchmarking run is listed here, so if you did not copy down the result when it appeared on the screen you can get it again without redoing the benchmarking run.

Run

To actually run benchmarking, click the Run button or use the typed command "b".