Professional Version Basis of AI Backprop Hypertext Documentation

Copyright (c) 1990-97 by Donald R. Tveter

Setting Up Data for Function Approximation or Memorization

The normal way to operate this program is to have the training data in one file and, if there is test set data, to have it in another file. For a function approximation problem with, say, five inputs and one output the data should look like:

  -1.588   -1.650     0.365     0.188     0.962   -1.543
  -1.182   -0.926     0.992     0.188     1.140   -1.372
  -2.650   -1.650     3.501     0.188     0.566   -1.201
where the first five numbers are the inputs and the last number is the answer. If there is a large number of inputs you can use more than one line for each pattern, but you MUST start every new pattern on a new line.
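
For example, the first pattern above could just as well be split across two lines, as long as the next pattern starts on a new line:

  -1.588   -1.650     0.365
   0.188     0.962   -1.543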

NOTE: when doing this type of problem, where the outputs can be outside the range 0 to 1, you need to make the output layer activation function the linear one, because the standard sigmoid can only produce values between 0 and 1 and so could never reach a target like -1.543. Do this in the Algorithm (A) menu window. The hidden layer activation function should be the standard sigmoid.

Data Setup For Classification Problems

For a plain classification problem with, say, four inputs and three possible classes the data file will look like:

0.40 0.30 -0.33 0.21    1  * this is a class 1 pattern
0.55 0.32 -0.09 0.20    2  * this is a class 2 pattern
0.11 0.23 -0.97 0.45    3  * this is a class 3 pattern
where the first four numbers are the inputs and the last number is the class number. If there is a large number of inputs you can use more than one line for each pattern, but you MUST start every new pattern on a new line. An asterisk begins a comment that runs to the end of the line.
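
The class number is just shorthand for a one-of-n set of output targets; presumably the class 2 pattern above is equivalent to writing all three targets out in full (the format used when the classification flag is not set):

0.55 0.32 -0.09 0.20    0 1 0    * class 2, targets written out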

NOTE: reading in patterns in this format requires setting the classification format flag. This can be set with the Input (I), Format (F) menu or in the Pattern (P) menu window.

Data Setup For Recurrent Problems

For a recurrent network problem with, say, four inputs and four possible outputs the data file will look like:

1 0 0 0   H   0 0 1 0
0 0 1 0   H   0 1 0 0
0 1 0 0   H   0 0 1 0
0 0 1 0   H   0 0 0 1
where the first four numbers are the inputs, the H stands for the values of the hidden layer units that are copied down to the input layer, and the last four numbers are the output. If there is a large number of inputs you can use more than one line for each pattern, but you MUST start every new pattern on a new line.

The use of H to stand for all the hidden layer units is convenient because then you can change the number of hidden layer units without changing the pattern files. On the other hand, if you want to copy only a limited number of hidden layer values down to the input there is another notation that can be used. For instance, take this data:

   0.00000  h   0.15636
   0.15636  h   0.30887
   0.30887  h   0.45378
   0.45378  h   0.58753
   0.58753  h   0.70683
   0.70683  h   0.80874
   0.80874  h   0.89075
   0.89075  h   0.95086
a series used to predict the next value of sin(x): given 0.0 you want the network to output the next value, 0.15636, then given 0.15636 you want it to output 0.30887, and so on. The single h in each pattern stands for "take one (the first) hidden layer value and use it as the second input value".

There is an additional page with more background on recurrent networks.

Making a Network

To make a network select the Network (N) menu window. The options there are to make a two-, three- or four-layer network. (In fact you can make a network with any number of layers by typing in the right command, but four layers are rarely useful and more than four is very rarely done, so there are no menu entries for making more than a four-layer network.)
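
At the command line the make command is "m" followed by one unit count per layer, as in the "m 25+20 20 25" example below, so presumably a five-layer network with 10 inputs, hidden layers of 8, 6 and 4 units and 2 outputs would be made with:

m 10 8 6 4 2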

Whichever size network you choose, fill in the entry boxes with the number of units you want in each layer. Then if you want direct input-to-output connections in a three- or four-layer network, click the button that changes that setting. Likewise if you want a recurrent network click that button.

IF THE NETWORK IS A RECURRENT NETWORK AND USES "H" to stand for all the hidden layer units, DO NOT include the number of short term memory units when you enter the number of input units. Thus for the poetry problem tell the program you want 25 input units (not 45; the 45 comes from the 25 normal input units plus the 20 short term memory units whose values come from the hidden layer). The Tcl/Tk program will ultimately output a make command that looks like "m 25+20 20 25", so you will end up with 45 input units in the network.

IF THE NETWORK IS A RECURRENT NETWORK AND USES "h", you DO count the h units as input units and you DO NOT click on the recurrent network button. So if you use this data:

   0.00000  hh   0.15636
   0.15636  hh   0.30887
   0.30887  hh   0.45378
   0.45378  hh   0.58753
   0.58753  hh   0.70683
   0.70683  hh   0.80874
   0.80874  hh   0.89075
   0.89075  hh   0.95086
the number of input units should be 3 (the one data value plus the two h units) and you DO NOT click on the recurrent network button.
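
So the make command produced for this data could look like the following (a sketch; the choice of four hidden units is arbitrary):

m 3 4 1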

(OK, I should change the label on the button to something clearer.)

To finally make the network, click the "Make" button at the bottom of the window or click "Cancel" to exit the window without making a network.

Reading in Patterns

AFTER making the network you can go to the Patterns (P) or Input (I) menu window to read in the patterns. When you select a button there, a list box comes up with the files in the current directory, and you double-click the file you want.

Note that if you make a network again with a different number of hidden layer units the patterns will be lost (they are attached to the network structure and not saved), so you must read them in again (or buy the pro version, where they are saved).

Algorithm Choices

There are many variations on the backprop algorithm that are normally faster than the original plain backprop algorithm. The best of these in this program is usually the Quickprop algorithm; however, there is no guarantee that it will be the best algorithm, and sometimes the other algorithms will be better. When you are using Quickprop (see the Q menu window), Delta-Bar-Delta (D) or the periodic update algorithms (the G menu window, G for Gradient descent) you need to use an eta that is about 1/n, where n is the number of patterns; for example, with 200 training patterns eta should be around 1/200 = 0.005. You must set these parameters yourself; it is not done automatically, and the default settings are just there because something has to be there.

In the D, Q and G menu windows you can turn on the corresponding algorithm, but to get some other algorithm you must go to the A menu window.

Random Weights

Normally you want to initialize the weights in the network with small random values, maybe between -1 and 1. This makes the network converge faster than if it starts with all weights at 0, and sometimes, as in the xor problem, it is absolutely necessary in order to do any learning at all. Every different seed value gives a different set of random initial weights; if you want to set a new seed you can change the value in the seed entry box in the T menu window.

To use the clear and initialize command, click the button in the T menu window (where you can change the size of the initial weights) or use the "ci" button in the second line of the menu bar (which uses the current range for the weights). The menu bar also has an "sci" command that generates a random seed (the CPU time used mod 32768) and then clears and initializes the weights. You can't type "sci" at the command line to do this; the "sci" button is implemented using the "s" and "ci" commands.
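
From the main window entry box the same thing can be done with the "s" and "ci" commands; assuming "s" takes the new seed as its argument, re-initializing with a particular seed would look like:

s 13
ci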

Training

Having initialized the network you can run the training algorithm by typing "r" and a carriage return in the main window entry box, by clicking the "r" button on the menu bar or by clicking "Run" in the T menu window. The initial default is to run 100 passes through the training set data and print the status of the patterns every 10 iterations. For the sonar data included in the sample data the listing will look like:

   10     49.04 %  49.04 % 0.47063      62.50 %  62.50 % 0.38221
   20     70.19 %  73.08 % 0.38548      77.88 %  77.88 % 0.38063
   30     76.92 %  76.92 % 0.34943      77.88 %  80.77 % 0.33282
The first column is the iteration number. The second column gives the percentage of training set patterns right based on the tolerance, the third column gives the percentage right based on the maximum value and the fourth column gives the abs (not RMS!) error. The fifth column is the percentage of test set patterns right based on the tolerance, the sixth column gives the percentage right based on the maximum value and the last column gives the abs error for the test set.
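
Putting the steps together, a whole session from the main window entry box might look like the sketch below (the 60-12-2 network shape and the seed value are assumptions of mine, and the training and test pattern files still have to be read in through the P or I menu windows):

m 60 12 2
s 7
ci
r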

Saving Everything

Once you have all the parameters set you can save the make-a-network command and all the parameters in a command file and the weights in another file (default name: weights). To save everything select the "Save As ..." button or the "Save and Exit" button in the File menu, or the "Save Everything" button in the O (Output) menu window. In all cases you will be asked for a file name to save the commands to; the weights will be saved to the current weights file. Of course, "Save and Exit" also ends the program.

Quitting the Program

To quit the program you can type "q" in the main window entry box or quit from the File menu.