The Pattern Recognition Basis of AI

The Pattern Recognition Basis of Artificial Intelligence

Chapter 10. Natural Language

Natural language processing has always been one of the most researched areas of AI. Unfortunately, natural language normally has to deal with the real world, not a small artificial world, and so it is here that you can see so many failures. For a while researchers worried about the syntax (grammar) of language. Then it became clear that semantics (the meaning of words) was critical. This has not worked very well yet either. As noted in chapter 2, understanding language requires understanding the world, and there are no programs that understand the world.

For more on connectionist natural language processing see James A. Hammerton's Connectionist Natural Language Processing Page at Birmingham University, United Kingdom.

I recently saw a paper, "A connectionist model of recursion", where the authors argue that simple recurrent networks can account for human performance, including the human inability to handle unrestricted levels of recursion. I'm not really capable of judging their experiments (the details numb my mind). However, the paper seems to contain a fair amount of material on language theory and experiments, so you might get some perspective on symbolic vs. connectionist models by reading parts of it.

AltaVista: Translations will translate from English to several languages. Challenge: give it some sentences to translate and report examples where the program messes up. I'll list the sillier ones here and give you credit for finding them. Entry number one comes from Carl Troein, who used the system to translate a short movie review from English into German and back into English. Here are the before and after texts.

A system called START from MIT has geographic information in its database and it can answer some simple questions stated in natural language. I asked it "Where is Chicago?" and it was right. I asked it "Where is Lincoln?" and it said in Nebraska, which is only partially right: there are at least 7 towns named Lincoln in the US. I asked it "Who is Lincoln?" and it gave the answer that Lincoln is in Nebraska.

A company called Conexor will let you try out their parser and tagger software. Again, I'd like to hear about silly cases you run into.

For some material on case-based methods and learning natural language, see Claire Cardie's research interests page at Cornell University.

10.1 Formal Languages

Research on language often focuses on the syntax or grammar of language, but understanding language using only syntax does not work. In the theory of formal languages there are rules that generate all the correct, and only the correct, statements in a language. A tree called a parse tree shows the rules that were used to generate a sentence. In grade school you have to diagram sentences to show the structure, but in the theory of formal languages you want the parse tree that shows how the sentence would be generated. A parse tree looks like a neural network designed to detect the presence of a sentence.
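
To make the idea of generating rules and parse trees concrete, here is a minimal sketch in Python. The grammar, the category names (S, NP, VP and so on) and the vocabulary are toy assumptions chosen for illustration; they are not taken from the book. The code enumerates every sentence the rules can generate and keeps the parse tree that records which rules were used.

```python
from itertools import product

# Toy grammar (an illustrative assumption, not the grammar used in the book).
# Each nonterminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}

def generate(symbol="S"):
    """Yield every (parse_tree, words) pair that can be derived from symbol."""
    if symbol not in GRAMMAR:  # a terminal, i.e. an actual word
        yield symbol, [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every way of expanding each symbol on the right-hand side.
        for parts in product(*(list(generate(s)) for s in rhs)):
            tree = (symbol, [t for t, _ in parts])
            words = [w for _, ws in parts for w in ws]
            yield tree, words

def show(tree, depth=0):
    """Return the parse tree as a list of indented text lines."""
    if isinstance(tree, str):
        return ["  " * depth + tree]
    label, children = tree
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(show(child, depth + 1))
    return lines

# Map each generated sentence to its parse tree.
sentences = {" ".join(words): tree for tree, words in generate()}
print("\n".join(show(sentences["the cat sleeps"])))
```

Because this toy grammar has no recursive rules, the set of sentences is finite and can be listed in full. A word order such as "dog the sees" never appears, which is what "only the correct statements" means here.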

10.2 The Transition Network Grammar

A transition network grammar is a network designed to uncover the parse tree of a sentence. (You can run a transition network grammar "in reverse" and use it to generate a sentence, but I don't think anyone ever does, so calling the algorithm a "grammar" is perhaps not a very good idea.) The PROLOG code is in the PROLOG Programs Package.
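
As a rough illustration of how such a network uncovers a parse tree, here is a minimal sketch in Python. It is not the PROLOG program from the package. The three small networks, the state numbers and the word list are assumptions made up for this example. Each arc is labelled either with a word category or with the name of another network to be traversed first.

```python
# Word categories (an illustrative assumption).
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "sees": "V", "sleeps": "V"}

# Each network maps a state to a list of (label, next_state) arcs.
# A label is either a word category or the name of another network.
NETWORKS = {
    "S":  {0: [("NP", 1)], 1: [("VP", 2)], 2: []},
    "NP": {0: [("Det", 1)], 1: [("N", 2)], 2: []},
    "VP": {0: [("V", 1)], 1: [("NP", 2)], 2: []},
}
# States in which a network may stop. VP may stop after the verb alone.
FINAL = {"S": {2}, "NP": {2}, "VP": {1, 2}}

def traverse(net, words, pos, state=0):
    """Yield (children, next_pos) for each path through net from pos."""
    if state in FINAL[net]:
        yield [], pos
    for label, nxt in NETWORKS[net][state]:
        if label in NETWORKS:
            # The arc names a sub-network: traverse it, then continue here.
            for sub, p in traverse(label, words, pos):
                for rest, end in traverse(net, words, p, nxt):
                    yield [(label, sub)] + rest, end
        elif pos < len(words) and LEXICON.get(words[pos]) == label:
            # The arc names a word category: consume one matching word.
            for rest, end in traverse(net, words, pos + 1, nxt):
                yield [(label, [words[pos]])] + rest, end

def parse(sentence):
    """Return a parse tree covering the whole sentence, or None."""
    words = sentence.split()
    for children, end in traverse("S", words, 0):
        if end == len(words):
            return ("S", children)
    return None

print(parse("the dog sees a cat"))
```

The generators try every path and back up when an arc fails, which is roughly the kind of search PROLOG gives you for free.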

10.3 Semantics Based Methods

This section first gives an example of a semantic grammar, a grammar organized around semantically meaningful quantities like people, places and things rather than the more abstract categories found in a conventional grammar.
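
To show what "organized around semantically meaningful quantities" can look like, here is a minimal sketch in Python. The categories, the two rules and the event names (GIVE-EVENT, TRAVEL-EVENT) are hypothetical and are not the example from the book. Each rule is written directly in terms of people, places and things rather than nouns and verbs.

```python
# Semantic categories and their members (illustrative assumptions).
CATEGORIES = {
    "PERSON": {"mary", "john"},
    "PLACE": {"chicago", "boston"},
    "THING": {"book", "ticket"},
}

# Each rule is a sequence of literal words and semantic categories.
RULES = {
    "GIVE-EVENT": ["PERSON", "gave", "PERSON", "a", "THING"],
    "TRAVEL-EVENT": ["PERSON", "went", "to", "PLACE"],
}

def match(rule, words):
    """Return {slot: word} if the words fit the rule, else None."""
    if len(rule) != len(words):
        return None
    slots = {}
    for item, word in zip(rule, words):
        if item in CATEGORIES:
            if word not in CATEGORIES[item]:
                return None
            # Number repeated categories: PERSON1, PERSON2, ...
            n = sum(1 for k in slots if k.startswith(item)) + 1
            slots[f"{item}{n}"] = word
        elif item != word:
            return None
    return slots

def interpret(sentence):
    """Return (event_name, slots) for the first rule that fits, else None."""
    words = sentence.lower().split()
    for name, rule in RULES.items():
        slots = match(rule, words)
        if slots is not None:
            return name, slots
    return None

print(interpret("Mary gave John a book"))
```

Note that "Chicago went to John" is rejected even though it is grammatical in the conventional sense, because a place cannot fill the PERSON slot.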

The next item in this section deals with Roger Schank's conceptual dependency (CD) theory, a method I tend to find more appealing than other methods perhaps because it is more informal. (I think I don't like formal language and syntax based methods because I hated diagramming sentences in grade school.) The PROLOG program for processing a sentence was adapted from a LISP version in one of Schank's books and it is in the PROLOG Programs Package.

10.4 Scripts and Short Stories

This section shows how, given sentences in CD form, you can implement the idea of scripts. Note how shallow the program's understanding of the event is. We have a long way to go to get programs that have the level of understanding that people have. And indeed Schank views these scripts as unrealistic; he has now moved on to worrying about how script-like processing can be done by organizing MEMORIES. This is an interesting topic for readers to look into because you can see the extent to which memories can replace rules. For a starter see Dynamic Memory by Roger Schank, Cambridge University Press. However, Schank is trying to do this with symbol processing and I cannot help but think that it needs to be done with neural networking and/or image processing and/or quantum mechanics.
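
Here is a minimal sketch in Python of how a script can fill in the parts of a story that were never stated. It is heavily simplified and is not the PROLOG program from the package. PTRANS, MTRANS, INGEST and ATRANS are primitive acts from CD theory, but the particular steps, the slot names and the "?" role variables are assumptions made for this example.

```python
# A restaurant script: ordered CD-style events with role variables ("?name").
RESTAURANT = [
    {"act": "PTRANS", "actor": "?customer", "object": "?customer", "to": "?restaurant"},
    {"act": "MTRANS", "actor": "?customer", "object": "?food", "to": "?waiter"},
    {"act": "INGEST", "actor": "?customer", "object": "?food"},
    {"act": "ATRANS", "actor": "?customer", "object": "money", "to": "?restaurant"},
    {"act": "PTRANS", "actor": "?customer", "object": "?customer", "from": "?restaurant"},
]

def unify(pattern, event, bindings):
    """Match one script step against one story event.

    Returns the extended bindings, or None if they do not match.
    """
    if pattern.keys() != event.keys():
        return None
    new = dict(bindings)
    for key, want in pattern.items():
        have = event[key]
        if want.startswith("?"):
            # A role variable must keep the same value throughout the story.
            if new.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return new

def apply_script(script, story):
    """Bind roles from the story, then return every script step filled in."""
    bindings, step = {}, 0
    for event in story:
        # Story events must follow script order, but steps may be skipped.
        while step < len(script):
            result = unify(script[step], event, bindings)
            step += 1
            if result is not None:
                bindings = result
                break
        else:
            return None  # the event does not fit the script
    return [{k: bindings.get(v, v) for k, v in s.items()} for s in script]

# "John went to Leone's. He ate lasagna."
story = [
    {"act": "PTRANS", "actor": "john", "object": "john", "to": "leones"},
    {"act": "INGEST", "actor": "john", "object": "lasagna"},
]
filled = apply_script(RESTAURANT, story)
print(filled[3])  # the paying step, which the story never mentioned
```

The sketch "knows" that John paid only because the script says customers pay. It has no idea what money or lasagna is, which is exactly the shallowness noted above.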

10.5 A Neural Network Based Approach

This section looks at the 1986 program by McClelland and Kawamoto from the second PDP book that shows how a simpler-than-backprop network can look at short, simple sentences and identify the agent, patient, instrument and modifier. The data is in the backprop package.
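
To give a feel for what a network without hidden layers can do on this kind of task, here is a minimal sketch in Python, assuming a single-layer perceptron and hand-made features. It is not McClelland and Kawamoto's network and it does not use the data in the backprop package. The three features, the five nouns and the single agent-vs-instrument output are all made up for illustration.

```python
# Hypothetical features for the subject noun: (human, animate, tool).
FEATURES = {
    "boy":    (1, 1, 0),
    "girl":   (1, 1, 0),
    "dog":    (0, 1, 0),
    "hammer": (0, 0, 1),
    "rock":   (0, 0, 0),
}

# Training data for sentences of the form "The X broke the window":
# is X the agent (1) or the instrument (0)?
TRAIN = [("boy", 1), ("dog", 1), ("hammer", 0), ("rock", 0)]

def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train(data, epochs=20):
    """Train one threshold unit with the perceptron learning rule."""
    weights, bias = [0, 0, 0], 0
    for _ in range(epochs):
        for word, target in data:
            x = FEATURES[word]
            error = target - predict(weights, bias, x)
            # Adjust weights only when the unit gets the answer wrong.
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias

weights, bias = train(TRAIN)
for noun in FEATURES:
    role = "agent" if predict(weights, bias, FEATURES[noun]) else "instrument"
    print(noun, "->", role)
```

"girl" is not in the training data, but because it shares features with "boy" the unit assigns it the same role. That kind of generalization from features is the interesting part.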

10.6 Defining Words by the Way They're Used

The system described here by Miikkulainen and Dyer is very interesting in that it develops internal codes for each word. Each code is a vector of real numbers, and the codes come out so that similar words are close to each other in terms of Euclidean distance while dissimilar words are far apart. So maybe symbols are implemented in the human mind as vectors of reals, not plain integers?
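
Here is a small Python sketch of what "close in terms of Euclidean distance" means. The four words and their 3-number codes are invented by hand for illustration. In the system described above the codes are learned, and nothing here reproduces that learning.

```python
import math

# Hypothetical 3-number codes; a real system would learn these values.
CODES = {
    "dog":   [0.9, 0.1, 0.2],
    "cat":   [0.8, 0.2, 0.2],
    "fork":  [0.1, 0.9, 0.7],
    "spoon": [0.2, 0.8, 0.8],
}

def distance(a, b):
    """Euclidean distance between the codes of two words."""
    return math.dist(CODES[a], CODES[b])

def nearest(word):
    """Return the other word whose code is closest to this word's code."""
    return min((w for w in CODES if w != word),
               key=lambda w: distance(word, w))

for word in CODES:
    print(word, "is closest to", nearest(word))
```

With plain integer symbols (dog = 1, cat = 2, fork = 3) there is no meaningful notion of "closest" at all, which is the point of the question about vectors of reals.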

10.7 A Recurrent Network for Sentences

This section describes work by St. John and McClelland: a recurrent network that takes in phrases and predicts what may come next. I don't see any way that a structure like this can be scaled up to human level capabilities.

10.8 Neural Based Scripts

This section covers the work by Miikkulainen and Dyer on how script processing can be implemented with backprop networks and another type of network, a self-organizing map. The self-organizing map makes it possible to learn scripts and answer questions about a script. I don't go into this part of the method in the text. In my opinion, while this script processing with networks is a nice piece of work, except for turning symbols into vectors of reals there is nothing remarkable here that improves on a conventional symbol processing implementation of scripts. It seems to be yet another AI method that can't scale up. But hey, you have to do the simple things first. If you're interested in the papers, here they are:

"Script Recognition with Hierarchical Feature Maps" by Risto Miikkulainen from the University of Texas at Austin.

"Natural Language Processing with Modular PDP Networks and Distributed Lexicon" by Risto Miikkulainen and Michael G. Dyer from the University of Texas at Austin.

"Trace Feature Map: A Model of Episodic Associative Memory" by Risto Miikkulainen and Michael G. Dyer from the University of Texas at Austin.

"DISCERN: A Distributed Neural Network Model of Script Processing and Memory" by Risto Miikkulainen from the University of Texas at Austin.

Risto Miikkulainen has written an entire book on this subject: Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. For more, see the WWW page at the University of Texas at Austin.

His neural script processing software is available for ftp from the University of Texas at Austin and an X11 graphics demo can be run remotely under the World Wide Web or by telnet to: "telnet.cascais.utexas.edu 30000". Unless you've got X11 graphics capability don't bother with this.

10.9 Learning the Past Tense of Verbs

As I note in the text, it would be wonderful if a simple problem like this could give us worthwhile insights into how the mind works; it would be like Rutherford bombarding gold foil with alpha particles to determine how matter is structured. Unfortunately, not everyone involved can agree on realistic training data, and it appears to me that it may be possible to tweak a network to get any kind of results you want. It will be a long time before this gets sorted out.

Originally Prince and Pinker objected quite strongly to this experiment by Rumelhart and McClelland and insisted that their own model of language development, where children form rules to capture the regularities in the irregular English verbs, was correct. Later Pinker (at least), on the basis of another experiment with people, conceded that these irregulars were stored in an associationist type of network, leaving only one rule for children to develop: the rule for regular verbs. My comment here is that if a little more ingenuity went into a network model I would not be surprised if a network could account for the regulars as well.
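
The "memorized irregulars plus one rule" position can be sketched in a few lines of Python. This is only an illustration of the division of labor. The handful of verbs is a made-up sample, the rule works on spelling rather than sounds, and a plain dictionary stands in for the associationist network, which it certainly is not.

```python
# Irregular verbs are stored as memorized pairs (a tiny illustrative sample).
IRREGULAR = {"go": "went", "sing": "sang", "take": "took", "come": "came"}

def past_tense(verb):
    """Look the verb up among the irregulars, else apply the regular rule."""
    if verb in IRREGULAR:
        return IRREGULAR[verb]
    # The single rule for regular verbs, in simplified spelling form.
    # (Consonant doubling and "y" -> "ied" are deliberately ignored.)
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"

for verb in ["walk", "like", "sing", "blick"]:
    print(verb, "->", past_tense(verb))
```

The made-up verb "blick" gets a regular ending because it is not in memory. A pure lookup table could never produce that, so the regular route is where the argument over rules versus networks really lies.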

If you're interested in an online article on this subject see: "Learning the past tense in a recurrent network: Acquiring the mapping from meaning to sounds", by Garrison Cottrell and Kim Plunkett from the Ohio State Neuroprose Archive.

Another online article is "Learning the Past Tense of English Verbs: The Symbolic Pattern Associator vs. Connectionist Models" by Charles Ling, published by the Journal of Artificial Intelligence Research. It is available for FTP from their site in Genoa, Italy or from Carnegie-Mellon. An online appendix containing data is also available from Genoa, Italy or from Carnegie-Mellon. This paper shows how another pattern recognition algorithm called ID3 does a better job than various neural networking algorithms.

10.10 Other Positions on Language

For quite a while the formal language model of human language dominated thinking but now there are doubters who are finding fault with this model and looking for new models. Certainly I think a key failing of natural language theory is that it has failed to recognize that people also use mental images to ground the symbols and understand the world and it was/is incredibly naive of researchers to think they could/can get by with only grammar and symbol processing. There is just no hope that a SMALL software package will ever be able to understand natural language at the level of human beings. To get a system to understand language and the world at the level of human beings will probably require "raising" the system like a child and letting it organize its knowledge as it goes along. Without this approach there will be many, many gaps in the system's knowledge.

If you have any questions or comments, write me.
