This is one of my favorite chapters because it contains some genuinely revolutionary and insightful ideas. As I've mentioned above, there are many ways to do pattern recognition: keep cases, make rules, or train networks. Making rules and training networks are ways of compressing a large number of cases down to a small number of rules or a network, so these methods are much like the UNIX compress program or the JPEG algorithm for compressing pictures. In practice they are more like JPEG, because they are also lossy. In the JPEG format the compression scheme degrades the quality of the picture a little, but generally not enough that a person notices any loss of picture quality. Likewise with rules and networks: it's unlikely you can get them to fit your data perfectly, so you experience some loss with these methods as well. Of course there is a difference between what rules do and what compress or JPEG does: compress and JPEG can give you back what you put in (exactly, or nearly so). With rules and networks there is no way to get back the original data, although if you start generating inputs for the rules or the network, then every time you get an answer you generate a new case.
Of course the psychology of using rules and networks to compress data is interesting in itself. First, there is the traditional scientific goal of describing how the world operates with as few equations (the scientist's version of rules) as possible. I'm sure this is a major motivation for the symbolic AI community's quest to find rules to express all that is known about the world.
The other bit of psychology involved in compressing knowledge is that it's really convenient for von Neumann-style computers to work that way. All the computation has to go through a single CPU, and scanning a large database, as you do in a nearest neighbor method, takes a lot of time. Thus a few rules can deal with a problem much faster than an exhaustive search of memory. But what if you have a machine architecture that can search for the nearest match more easily than it can deal with rules? There are already systems like the Connection Machine that can search in parallel. And there is the scheme used in the Boltzmann machine, where, given part of a pattern, you can come up with the rest of the pattern in a single cooling session.
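The exhaustive memory scan mentioned above can be sketched in a few lines; this is a minimal illustration only, and the cases and the (squared Euclidean) distance measure are my own invention, not anything from a particular system:

```python
# A minimal sketch of the serial nearest-neighbor scan described above.
# On a von Neumann machine every stored case must pass through the one
# CPU; a parallel machine could compare all cases at once instead.

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_case(query, cases):
    """Scan the whole case memory and return the closest stored case
    along with how far away it is."""
    best, best_d = None, float("inf")
    for features, answer in cases:   # the slow, exhaustive part
        d = distance(query, features)
        if d < best_d:
            best, best_d = (features, answer), d
    return best, best_d

# Invented example cases: (feature vector, answer) pairs.
cases = [((0.0, 0.0), "A"), ((1.0, 1.0), "B"), ((5.0, 5.0), "C")]
match, d = nearest_case((0.9, 1.2), cases)
print(match[1])  # prints "B", the answer of the nearest stored case
```

The time for this scan grows with the number of stored cases, which is exactly why a handful of rules looks so attractive on serial hardware.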
So it strikes me that, between the scientific tradition of forming rules and the constraints of the von Neumann architecture, AI researchers have been mentally trapped in the condensed/compressed knowledge scenario. In this chapter, try to free yourself from such thinking. Consider the virtues of a different architecture, one where you don't even have to bother to come up with rules! Quite a liberating concept, if you ask me. But if people operate mostly with cases, it poses quite a problem for AI researchers, because coming close to duplicating human capabilities will require vast amounts of storage as well as an appropriate parallel processing architecture.
I just ran into a WWW site that promotes Case Based Reasoning: Welcome to AI-CBR
I've coined the term "condensed knowledge" here not just to capture the "compression" kind of thinking I've described above, but also to make it clear that traditional symbolic systems try to capture knowledge about the world without the system ever having seen complete examples of what it is supposed to know about. So these systems know facts about, say, a rose without ever having seen a complete rose, or smelled one, or been pricked by a thorn. It's a philosophy somewhat like describing a human being by saying that a person is 95% (or whatever percent) water, x% nitrogen, y% calcium, z% carbon, and so on. I chose the word condensed because of its use for the cans of condensed soup you buy at the store.
In brief, this section gives arguments for and against the use of condensed knowledge.
This section looks at one simple original example of mine that was also done with backprop in an earlier section, plus MBRtalk, HERBIE, and JOHNNY.
This section looks at one simple original example plus the CHEF program.
This section just lists some other case based programs.
One thing I noted in the text is that with a memory-based solution the program "knows when it knows" (as the Waltz and Stanfill article puts it). By this they mean that if a new case is close enough to a known case you can be pretty sure the answer is right, but if it's far away you have to worry. For instance, if you find a rock that has twice as much uranium in it as anyone has ever found in a rock before, a human being will give that rock special treatment, but a network or a system of rules will not. This makes it pretty clear that people use cases, and that really intelligent programs must do so as well. So if you're trying to model human behavior, is there really any point in finding rules or networks? Especially when you consider that training networks and extracting rules is a lot of trouble.
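The "knows when it knows" idea amounts to reporting the distance to the nearest case along with the answer, and distrusting any answer whose distance exceeds some threshold. Here is a minimal sketch of that idea; the one-feature uranium cases and the threshold value are invented for illustration:

```python
# Sketch of a memory-based system that "knows when it knows":
# it returns the nearest case's answer together with a flag saying
# whether the match was close enough to trust.

def answer_with_confidence(query, cases, threshold):
    """Find the nearest stored case; trust the answer only if the
    squared distance to it is within the given threshold."""
    best_answer, best_d = None, float("inf")
    for features, answer in cases:
        d = sum((x - y) ** 2 for x, y in zip(query, features))
        if d < best_d:
            best_answer, best_d = answer, d
    return best_answer, best_d <= threshold

# Invented cases: (uranium fraction,) -> answer.
cases = [((0.1,), "safe rock"), ((0.2,), "safe rock")]

# A rock near the known cases is answered confidently; a rock with
# twice the uranium of anything on record falls far from every stored
# case, so the system flags its own uncertainty.
print(answer_with_confidence((0.15,), cases, threshold=0.01))  # ('safe rock', True)
print(answer_with_confidence((0.4,), cases, threshold=0.01))   # ('safe rock', False)
```

A rule or a trained network, by contrast, happily produces an answer for the far-away query with no hint that it is extrapolating.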