The Pattern Recognition Basis of Artificial Intelligence

Character Recognition

Introduction

Quite often someone asks in comp.ai.neural-nets how to do character or letter recognition. I usually send off a quick reply describing the ideas involved, but I've gotten tired of that, and besides, the topic deserves a little more detail, so here goes. The figures used here are low-resolution gif files made from xdvi (the Linux viewer for dvi files produced from tex files), taken from my textbook: The Pattern Recognition Basis of Artificial Intelligence.

Tiny Characters

If all you want to do is recognize tiny characters at some low resolution such as 5x5 or 5x7, then just list the matrix as a vector. So if your character is this:

11111
10000
11111
10000
11111

just list it as the vector:

1111110000111111000011111

and feed it into the pattern recognition algorithm of your choice; the nearest neighbor algorithm is easy, and backprop is very popular. And you're done. :-) For some brief advice on pattern recognition algorithms see below.
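As a minimal sketch of this, here is the flattening step plus nearest neighbor classification using Hamming distance. The prototype characters and the noisy test character are made up for illustration:

```python
# Sketch: flatten a 5x5 character into a 25-bit vector and classify it
# with nearest neighbor (Hamming distance). The prototypes are made up
# for illustration.

def flatten(rows):
    """Turn a list of row strings like '11111' into a list of ints."""
    return [int(c) for row in rows for c in row]

def hamming(a, b):
    """Count the positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

# Tiny training set: one prototype per letter.
prototypes = {
    "E": flatten(["11111", "10000", "11111", "10000", "11111"]),
    "F": flatten(["11111", "10000", "11111", "10000", "10000"]),
    "H": flatten(["10001", "10001", "11111", "10001", "10001"]),
}

def classify(rows):
    """Return the letter whose prototype is closest to the input."""
    v = flatten(rows)
    return min(prototypes, key=lambda k: hamming(v, prototypes[k]))

# A slightly noisy E (one pixel flipped) still lands closest to E.
print(classify(["11111", "10000", "11011", "10000", "11111"]))  # E
```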

Bigger Characters

The above is nice and simple, but if your figure is larger you need to do more work: some pre-processing to cut down the size of the vector and help generalization. Suppose your figure is 21x21, as in the figure below:

Notice that the matrix has been divided into 9 sub-matrices of 7x7; the number of each sub-matrix is given on the right. What you need to do now is look at each sub-matrix to see what features are there. Let the features to look for be the following:

The choice of these features is ad hoc; there is no guarantee that they will work well with the set of characters you want to recognize. Many versions of Fukushima's neocognitron use the following set of features:

(The neocognitron is covered in more detail in my book, though I skip the training algorithm. It is also available in many variations in other places as well.)

Here are some other features used by Uhr and Vossler in their early 1960s program (cut from my book in the last round):

In these matrices (Uhr and Vossler called them operators) a 1 requires that a 1 be present in the picture, a 0 requires that a 0 be present in the picture, and a blank is a don't-care marker. The operators in the top row were worked out by Uhr and Vossler by hand, but they also had a version of the program that would try to find features of its own within the data; a sampling of what it found is shown in the second row. The program that generated its own operators did better than the program that used the hand-generated operators. (A source for the Uhr and Vossler program is Computers and Thought, edited by Feigenbaum and Feldman, McGraw-Hill, 1963, although it was described in other places as well.)
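The matching rule for these operators can be sketched in a few lines. The 3x3 operator and 4x4 picture below are hypothetical, not Uhr and Vossler's actual masks; blanks are represented as None:

```python
# Sketch of Uhr-and-Vossler-style operator matching: a 1 must match a 1,
# a 0 must match a 0, and a blank (here None) is a don't-care. The
# operator and picture are made-up examples.

def matches(operator, picture, top, left):
    """True if the operator fits the picture at position (top, left)."""
    for i, row in enumerate(operator):
        for j, want in enumerate(row):
            if want is not None and picture[top + i][left + j] != want:
                return False
    return True

# A 3x3 operator for a vertical edge: left column on, right column off,
# middle column don't-care.
op = [[1, None, 0],
      [1, None, 0],
      [1, None, 0]]

picture = [[1, 1, 0, 0],
           [1, 0, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 0]]

print(matches(op, picture, 0, 0))  # True
print(matches(op, picture, 0, 1))  # False
```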

But returning to the problem, we'll work with the four original patterns. The vertical and horizontal patterns are found in region 1 of the letter E, and there are no diagonals present; code this as the sequence (1,1,0,0). Repeat this search process for all 9 regions. Given the letters E, F and H you get these 36-bit vectors:

The great virtue of this representation is that letters can be distorted slightly and still map into the same set of 36 features; for instance, the following two pictures will both map into the same 36-bit vector.


                 *********             *********************
         ********         **           *
        *                              *
        *                              *
        *                              *
       *                               *
       *                               *
       *                               *
       ******************              *********************
      *                                *
      *                                *
      *                                *
      *                                *
     *                                 *
     *                                 *
     *                                 *
     *                                 *
      *********                        *
               *********               *********************

To get effective pattern recognition without this technique you would have to list large numbers of vectors, one for each possible way of drawing the E within the 21x21 area. Not only would this be a pain to do, but training with backprop would be terribly slow and a huge number of hidden-layer neurons would be required.

Once you've made this transformation simply use these vectors in the pattern recognition algorithm of your choice. :-)

The easiest way to find the features is to write a little conventional program to find them. In theory you could use a neural networking program to find the features, and some programs like the neocognitron do exactly that, but since that's a lot of work to go to, skip it if you can.
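Such a little conventional program might look like the following sketch. It splits a 21x21 picture into the 9 sub-matrices and emits 4 bits per region (vertical, horizontal, and the two diagonals). The particular detectors used here, requiring a full-length stroke within the region, are my own guesses, not the feature masks from the figures:

```python
# Sketch: code a 21x21 picture as a 36-bit vector, 4 feature bits per
# 7x7 region. The feature detectors below are illustrative guesses.

def region(picture, r, c):
    """Extract the 7x7 sub-matrix for region row r, column c (0-2)."""
    return [row[7*c:7*c + 7] for row in picture[7*r:7*r + 7]]

def features(sub):
    """Four bits: vertical, horizontal, down-diagonal, up-diagonal."""
    vert = any(all(sub[i][j] for i in range(7)) for j in range(7))
    horiz = any(all(sub[i][j] for j in range(7)) for i in range(7))
    diag_down = all(sub[i][i] for i in range(7))
    diag_up = all(sub[i][6 - i] for i in range(7))
    return [int(vert), int(horiz), int(diag_down), int(diag_up)]

def code(picture):
    """The 36-bit vector, regions taken in row-major order."""
    bits = []
    for r in range(3):
        for c in range(3):
            bits.extend(features(region(picture, r, c)))
    return bits

# A picture with a full-height stroke down the left edge and a full bar
# across the top: region 1 codes as (1,1,0,0), as in the E example.
picture = [[1]*21] + [[1] + [0]*20 for _ in range(20)]
print(code(picture)[:4])  # [1, 1, 0, 0]
```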

Input from a Tablet

If you're fortunate enough to have a tablet, so that the computer can monitor a person writing down a character, you have an additional break. Instead of finding the little patterns, just monitor the movement of each stroke as it passes through the 9 regions. For instance, to write the letter T, first I make the down-stroke, which passes through regions 2, 5 and 8; then I make the top bar, which passes through regions 1, 2 and 3. Code this as two strokes like so:

(2,5,8),(1,2,3)

Then just collect a table of these, and whenever you get an unknown, look for it in the table. If you don't find it, or you get the answer wrong, have the user tell you what it is and add the new entry to the table. If you're monitoring a person's writing, your worst problem is figuring out whether a stroke is part of the current character or the start of a new one. (I've lost track of where I saw this algorithm; an old AFIPS proceedings, I think. A more complicated version is the Ledeen character recognition algorithm found in Principles of Interactive Computer Graphics by Newman and Sproull, McGraw-Hill, 1973.) In general you don't necessarily want the 9 small squares I've used; other areas of different sizes and shapes may work better.
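The stroke-coding and table-lookup scheme above can be sketched like this. The region numbering (1 through 9, row-major) and the sample pen coordinates are assumptions for illustration:

```python
# Sketch of the tablet scheme: reduce each stroke to the sequence of
# regions it passes through, then look the whole character up in a
# table. Regions are numbered 1-9 in row-major order (an assumption).

def region_of(x, y, size=21):
    """Map a pen position to a region number 1-9."""
    return 3 * (3 * y // size) + (3 * x // size) + 1

def stroke_code(points):
    """Regions a stroke passes through, consecutive duplicates dropped."""
    regions = []
    for x, y in points:
        r = region_of(x, y)
        if not regions or regions[-1] != r:
            regions.append(r)
    return tuple(regions)

table = {
    ((2, 5, 8), (1, 2, 3)): "T",   # down-stroke, then top bar
}

def classify_strokes(strokes):
    key = tuple(stroke_code(s) for s in strokes)
    return table.get(key)  # None means: ask the user and add an entry

# A T drawn as a down-stroke through the middle, then the top bar.
down = [(10, y) for y in range(1, 20)]
bar = [(x, 2) for x in range(1, 20)]
print(classify_strokes([down, bar]))  # T
```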

Pattern Recognition Algorithms

Probably the cheapest pattern recognition algorithm you can use is the nearest neighbor algorithm. Take the vector you get from the unknown and compute its distance from all the patterns in your database; the smallest distance gives the best match.

A not-so-simple pattern recognition algorithm is back-propagation (or backprop). It involves a fair amount of math, so I do not recommend it for people with math anxiety; however, it is one of the most useful pattern recognition algorithms around. If you want a tutorial, including a derivation using calculus, get either my version in postscript (68k) or see the HTML version. I also have some backprop software for Unix/Linux and Microsoftian systems.
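For a taste of what backprop looks like, here is a bare-bones sketch, not my tutorial's actual code: one hidden layer of sigmoid units trained by plain gradient descent on squared error, fitting three tiny 5x5 characters. The layer sizes, learning rate, and iteration count are arbitrary choices:

```python
# Minimal backprop sketch: one hidden layer, sigmoid units, plain
# gradient descent on squared error, trained on three 5x5 characters.
import numpy as np

rng = np.random.default_rng(0)

def flatten(rows):
    return [int(c) for row in rows for c in row]

X = np.array([
    flatten(["11111", "10000", "11111", "10000", "11111"]),  # E
    flatten(["11111", "10000", "11111", "10000", "10000"]),  # F
    flatten(["10001", "10001", "11111", "10001", "10001"]),  # H
], dtype=float)
Y = np.eye(3)  # one output unit per letter

W1 = rng.normal(0, 0.5, (25, 6))   # input -> 6 hidden units
W2 = rng.normal(0, 0.5, (6, 3))    # hidden -> 3 output units

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(3000):
    H = sigmoid(X @ W1)            # forward pass
    O = sigmoid(H @ W2)
    d_out = (O - Y) * O * (1 - O)  # backward pass (squared error)
    d_hid = (d_out @ W2.T) * H * (1 - H)
    W2 -= 0.5 * H.T @ d_out        # gradient descent updates
    W1 -= 0.5 * X.T @ d_hid

preds = np.argmax(sigmoid(sigmoid(X @ W1) @ W2), axis=1)
print([["E", "F", "H"][i] for i in preds])
```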


If you have any questions or comments, write me.
