Intelligence as an Emergent Behavior or, The Songs of Eden

May 2, 2002

Originally published Winter 1988 in Daedalus, Journal of the American Academy of Arts and Sciences. Published on KurzweilAI.net on May 2, 2002.

Sometimes a system with many simple components will exhibit a behavior of the whole that seems more organized than the behavior of the individual parts. Consider the intricate structure of a snowflake. Symmetric shapes within the crystals of ice repeat in threes and sixes, with patterns recurring from place to place and within themselves at different scales. The shapes formed by the ice are consequences of the local rules of interaction that govern the molecules of water, although the connection between the shapes and the rules is far from obvious. After all, these are the same rules of interaction that cause water to suddenly turn to steam at its boiling point and cause whirlpools to form in a stream. The rules that govern the forces between water molecules seem much simpler than crystals or whirlpools or boiling points, yet all of these complex phenomena are called emergent behaviors of the system.

It would be very convenient if intelligence were an emergent behavior of randomly connected neurons in the same sense that snowflakes and whirlpools are the emergent behaviors of water molecules. It might then be possible to build a thinking machine by simply hooking together a sufficiently large network of artificial neurons. The notion of emergence would suggest that such a network, once it reached some critical mass, would spontaneously begin to think.

This is a seductive idea, since it allows for the possibility of constructing intelligence without first understanding it. Understanding intelligence is difficult and probably a long way off. The possibility that it might spontaneously emerge from the interactions of a large collection of simple parts has considerable appeal to a would-be builder of thinking machines. Unfortunately, as a practical approach to construction, the idea tends to be unproductive. The concept of emergence, in itself, offers neither guidance on how to construct such a system nor insight into why it would work.

Ironically, this apparent inscrutability accounts for much of the idea’s continuing popularity, since it offers a way to believe in physical causality while simultaneously maintaining the impossibility of a reductionist explanation of thought. For some, our ignorance of how local interactions produce emergent behavior offers a reassuring fog in which to hide free will.

There has been a renewal of interest in emergent behavior in the form of neural networks and connectionist models, spin glasses and cellular automata, and evolutionary models. The reasons for this interest have little to do with philosophy one way or the other, but rather are a combination of new insights and new tools. The insights come primarily from a branch of physics called "dynamical systems theory." The tools come from the development of new types of computing devices. Just as in the 1950’s we thought of intelligence in terms of servomechanisms, and in the 60’s and 70’s in terms of sequential computers, we are now beginning to think in terms of parallel machines. This is not a deep philosophical shift, but it is of great practical importance, since it is now possible to study large emergent systems experimentally.

Inevitably, anti-reductionists interpret such progress as a schism within the field between symbolic rationalists who oppose them and gestaltists who support them. I have often been asked which "side" I am on. Not being a philosopher, my inclination is to focus on the practical aspects of this question: How would we go about constructing an emergent intelligence? What information would we need to know in order to succeed? How can this information be determined by experiment?

The emergent system that I can most easily imagine would be an implementation of symbolic thought, rather than a refutation of it. Symbolic thought would be an emergent property of the system. The point of view is best explained by the following parable about the origin of human intelligence. As far as I know, this parable of human evolution is consistent with the available evidence (as are many others), but since it is chosen to illustrate a point it should be read as a story rather than as a theory. It is reversed from most accepted theories of human development in that it presents features that are measurable in the archeological record, such as increased brain size, food sharing, and neoteny, as consequences rather than as causes of intelligence.

Once upon a time, about two and a half million years ago, there lived a race of apes that walked upright. In terms of intellect and habit they were similar to modern chimpanzees. The young apes, like many young apes today, had a tendency to mimic the actions of others. In particular, they had a tendency to imitate sounds. If one ape went "ooh, eeh, eeh," it would be likely that the other one would repeat, "ooh, eeh, eeh." (I do not know why apes do this, but they do. As do many species of birds.) Some sequences of sounds were more likely to be repeated than others. I will call these "songs."

For the moment let us ignore the evolution of the apes and consider the evolution of the songs. Since the songs were replicated by the apes, and since they sometimes died away and were occasionally combined with others, we may consider them, very loosely, a form of life. They survived, bred, competed with one another, and evolved according to their own criterion of fitness. If a song contained a particularly catchy phrase that caused it to be repeated often, then that phrase was likely to be repeated and incorporated into other songs. Only songs that had a strong tendency to be repeated survived.

The survival of the song was only indirectly related to the survival of the apes. It was more directly affected by the survival of other songs. Since the apes were a limited resource, the songs had to compete with one another for a chance to be sung. One successful strategy for competition was for a song to specialize; that is, for it to find a particular niche where it would be likely to be repeated. Songs that fit particularly well with a specific mood or activity of an ape had a special survival value for this reason. (I do not know why some songs fit well with particular moods, but since it is true for me I do not find it hard to believe for my ancestors.)

Up to this point the songs were not of any particular value to the apes. In a biological sense they were parasites, taking advantage of the apes’ tendency to imitate. Once the songs began to specialize, however, it became advantageous for an ape to pay attention to the songs of others and to differentiate between them. By listening to songs, a clever ape could gain useful information. For example, an ape could infer that another ape had found food, or that it was likely to attack. Once the apes began to take advantage of the songs, a mutually beneficial symbiosis developed. Songs enhanced their survival by conveying useful information. Apes enhanced their survival by improving their capacity to remember, replicate, and understand songs. The blind forces of evolution created a partnership between the songs and the apes that thrived on the basis of mutual self-interest. Eventually this partnership evolved into one of the world’s most successful symbionts: us.

Unfortunately, songs do not leave fossils, so unless some natural process has left a phonographic trace, we may never know if this is what really happened. But if the story is true, the apes and the songs became the two components of human intelligence. The songs evolved into the knowledge, mores, and mechanism of thought that together are the symbolic portion of human intelligence. The apes became apes with bigger brains, perhaps optimized for late maturity so that they could learn more songs. "Homo sapiens" is a cooperative combination of the two.

It is not unusual in nature for two species to live together so interdependently that they appear to be a single organism. Lichens are symbionts of a fungus and an alga living so closely intertwined that they can only be separated under a microscope. Bean plants need living bacteria in their roots to fix nitrogen from the air, and in return the bacteria need nutrients from the bean plants. Even the single-celled "Paramecium bursaria" uses green algae living inside itself to synthesize food.

There may be an example even closer to the songs and the apes, where two entirely different forms of "life" form a symbiosis. In "The Origins of Life," Freeman Dyson suggests that biological life is a symbiotic combination of two different self-reproducing entities with very different forms of replication. Dyson suggests that life originated in two stages. While most theories of the origin of life start with nucleotides replicating in some "primeval soup," Dyson’s theory starts with metabolizing drops of oil.

In the beginning, these hypothetical replicating oil drops had no genetic material, but were self-perpetuating chemical systems that absorbed raw materials from their surroundings. When a drop reached a certain size it would split, with about half of the constituents going to each part. Such drops evolved efficient metabolic systems even though their rules of replication were very different from the Mendelian rules of modern life. Once the oil drops became good at metabolizing, they were infected by another form of replicators, which, like the songs, have no metabolism of their own. These were parasitic molecules of DNA which, like modern viruses, took advantage of the existing machinery of the cells to reproduce. The metabolizers and the DNA eventually co-evolved into a mutually beneficial symbiosis that we know today as life.

This two-part theory of life is not conceptually far from the two-part story of intelligence. Both suggest that a pre-existing homeostatic mechanism was infected by an opportunistic parasite. The two parts reproduced according to different sets of rules, but were able to co-evolve so successfully that the resulting symbiont appears to be a single entity.

Viewed in this light, choosing between emergence and symbolic computation in the study of intelligence would be like choosing between metabolism and genetic replication in the study of life. Just as the metabolic system provides a substrate in which the genetic system can work, so an emergent system may provide a substrate in which the symbolic system can operate.

Currently, the metabolic system of life is far too complex for us to fully understand or reproduce. By comparison, the Mendelian rules of genetic replication are almost trivial, and it is possible to study them as a system unto themselves without worrying about the details of metabolism which supports them. In the same sense, it seems likely that symbolic thought can be fruitfully studied and perhaps even recreated without worrying about the details of the emergent system that supports it. So far this has been the dominant approach in artificial intelligence and the approach that has yielded the most progress.

The other approach is to build a model of the emergent substrate of intelligence. This artificial substrate for thought would not need to mimic in detail the mechanisms of the biological system, but it would need to exhibit those emergent properties that are necessary to support the operations of thought.

What is the minimum that we would need to understand in order to construct such a system? For one thing, we would need to know how big a system to build. How many bits are required to store the acquired portion of a typical human’s knowledge? We need to know an approximate answer in order to construct an emergent intelligence with human-like performance. Currently the amount of information stored by a human is not known to within even two orders of magnitude, but it can in principle be determined by experiment. There are at least three ways the question might be answered.

One way to estimate the storage requirements for emergent intelligence would be from an understanding of the physical mechanisms of memory in the human brain. If that information is stored primarily by modifications of synapses, then it would be possible to measure the information storage capacity of the brain by counting the number of synapses. Elsewhere in this issue, Schwartz shows that this method leads to an upper bound on the storage capacity of the brain of 10 to the 15th bits. Even knowing the exact amount of physical storage in the brain would not completely answer the question of storage requirement, since much of the potential storage might be unused or used inefficiently. But at least this method can help establish an upper bound on the requirements.
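
A back-of-envelope version of such a calculation might look like the sketch below; the individual counts are round, commonly quoted figures chosen only for illustration, not necessarily the ones Schwartz uses.

```python
# Rough upper bound on brain storage from synapse counts (all figures assumed).
neurons = 1e11             # neurons in the human brain, order of magnitude
synapses_per_neuron = 1e4  # synapses per neuron, order of magnitude
bits_per_synapse = 1       # information per modifiable synapse, order of magnitude

upper_bound_bits = neurons * synapses_per_neuron * bits_per_synapse
print(f"upper bound: {upper_bound_bits:.0e} bits")   # ~1e+15 bits
```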

A second method for estimating the information in symbolic knowledge would be to measure it by some form of statistical sampling. For instance, it is possible to estimate the size of an individual’s vocabulary by testing specific words randomly sampled from a dictionary. The fraction of words known by the individual is a good estimate of the fraction of words known in the complete dictionary. The estimated vocabulary size is this fraction times the number of words in the dictionary. The experiment depends on having a predetermined body of knowledge against which to measure. For example, it would be possible to estimate how many facts in the "Encyclopedia Britannica" were known by a given individual, but this would give no measure of facts not contained within the encyclopedia. The method is useful only in establishing a lower bound.
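
A minimal sketch of such a sampling estimate is given below; the dictionary, the sample size, and the test of whether a word is known are all hypothetical stand-ins.

```python
import random

def estimate_vocabulary(dictionary_words, knows_word, sample_size=200):
    """Estimate vocabulary size from a random sample of dictionary words.

    dictionary_words -- list of every word in the reference dictionary
    knows_word       -- callable returning True if the subject knows the word
    """
    sample = random.sample(dictionary_words, sample_size)
    fraction_known = sum(knows_word(word) for word in sample) / sample_size
    return fraction_known * len(dictionary_words)

# Hypothetical usage: if the subject knows 120 of 200 sampled words from a
# 100,000-word dictionary, the estimate is 0.6 * 100,000 = 60,000 words.
```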

A related experiment is the game of "20 questions" in which one player identifies an object chosen by the other by asking a series of 20 yes-or-no questions. Since each answer provides no more than a single bit of information, and since skillful players generally require almost all of the 20 questions to choose correctly, we can estimate that the number of allowable choices is on the order of 2 to the 20th, or about one million. This gives an estimated number of allowable objects known in common by the two players. Of course, the measure is inaccurate since the questions are not perfect and the choices of objects are not random. It is possible that a refined version of the game could be developed and used to provide another lower bound.

A third approach to measuring the amount of information of the symbolic portion of human knowledge is to estimate the rate of acquisition and to integrate over time. For example, experiments on memorizing random sequences of syllables indicate that the maximum memorization rate of this type of knowledge is about one "chunk" per second. A "chunk" in this context can be safely assumed to contain less than 100 bits of information, so the results suggest that the maximum rate that a human is able to commit information to long-term memory is significantly less than 100 bits per second. If this is true, a 20-year-old human learning at the maximum rate for 16 hours a day would know less than 50 gigabits of information. I find this number surprisingly small.
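
The arithmetic behind this bound, using the assumptions just stated, is simple enough to write out.

```python
# Upper bound on acquired knowledge: maximum commitment rate integrated over time.
bits_per_second = 100          # generous bound on long-term memorization rate
seconds_per_day = 16 * 3600    # sixteen waking hours per day
days = 20 * 365                # a 20-year lifetime, ignoring leap days

total_bits = bits_per_second * seconds_per_day * days
print(f"{total_bits / 1e9:.0f} gigabits")   # about 42 gigabits, i.e. under 50
```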

A difficulty with this estimate of the rate of acquisition is that the experiment measures only information coming through one sensory channel under one particular set of circumstances. The visual system sends more than a million times this rate of information to the optic nerve, and it is conceivable that all of this information is committed to memory. If it turns out that images are stored directly, it will be necessary to significantly increase the 100 bit per second limit, but there is no current evidence that this is the case. In experiments measuring the ability of exceptional individuals to store "eidetic" images of random dot stereograms, the subjects are given about 5 minutes to "memorize" a 128×128 image. Memorizing only a few hundred of these bits is probably sufficient to pass the test.

I am aware of no evidence that suggests more than a few bits per second of any type of information can be committed to long-term memory. Even if we accept at face value reports of extraordinary feats of memory, such as those of Luria’s showman in "The Mind of a Mnemonist", the average rate of commitment to memory never seems to exceed a few bits per second. Experiments should be able to refine this estimate, but even if we knew the maximum rate exactly, the rate averaged over a lifetime would probably be very much less. Knowing the maximum rate would establish an upper bound on the requirements of storage.

The sketchy data cited above would suggest that an intelligent machine would require 10 to the 9th bits of storage, plus or minus two orders of magnitude. This assumes that the information is encoded in such a way that it requires a minimum amount of storage, which for the purpose of processing information would probably not be the most practical representation. As a would-be builder of thinking machines, I find this number encouragingly small, since it is well within the range of current electronic computers. As a human with an ego, I find it distressing. I do not like to think that my entire lifetime of memories could be placed on a reel of magnetic tape. Hopefully experimental evidence will clear this up one way or the other.

There are a few subtleties in the question of storage requirements, in defining the quantity of information in a way that is independent of the representation. Defining the number of bits in the information-theoretical sense requires a measure of the probabilities over the ensemble of possible states. This means assigning an "a priori" probability to each possible set of knowledge, which is the role of inherited intelligence. Inherited intelligence provides a framework in which the knowledge of acquired intelligence can be interpreted. Inherited intelligence defines what is knowable; acquired intelligence determines which of the knowable is known.
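
In standard information-theoretic terms, this can be written compactly: if inherited intelligence assigns a prior probability P(K) to each possible body of acquired knowledge K, then the information content of one particular body of knowledge, and the expected storage requirement over the whole ensemble, are

```latex
I(K) = -\log_2 P(K)
\qquad\text{and}\qquad
H = -\sum_{K} P(K)\,\log_2 P(K) .
```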

Another potential difficulty is how to count the storage of information that can be deduced from other data. In the strict information-theoretical sense, data that can be inferred from other data add no information at all. An accurate measure would have to take into account the possibility that knowledge is inconsistent, and that only limited inferences are actually made. These are the kind of issues currently being studied on the symbolic side of the field of artificial intelligence.

One issue that does not need to be resolved to measure storage capacity is distributed versus localized representation. Knowing what types of representation are used in what parts of the human brain would be of considerable scientific interest, but it does not have a profound impact on the amount of storage in the system, or on our ability to measure it. Non-technical commentators have a tendency to attribute almost mystical qualities to distributed storage mechanisms such as those in holograms and neural networks, but the limitations on their storage capacities are well understood.

Distributed representations with similar properties are often used within conventional digital computers, and they are invisible to most users except in the system’s capacity to tolerate errors. The error correcting memory used in most computers is a good example. The system is composed of many physically separate memory chips, but any single chip can be removed without losing any data. This is because the data is not stored in any one place, but in a distributed non-local representation across all of the units. In spite of the "holographic" representation, the information storage capacity of the system is no greater than it would be with a conventional representation. In fact, it is slightly less. This is typical of distributed representations.
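
A toy illustration of the same point, using simple XOR parity across a handful of hypothetical "chips" rather than the actual error-correcting codes used in real memories: the data survive the loss of any single chip, and the price is one extra chip’s worth of capacity.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# Hypothetical data striped across four "chips", plus one parity chip.
chips = [b"emergent", b"behavior", b"symbiont", b"thinking"]
parity = xor_blocks(chips)

# Remove any single chip: its contents can be rebuilt from the survivors.
lost = 2
survivors = [chip for i, chip in enumerate(chips) if i != lost] + [parity]
recovered = xor_blocks(survivors)
assert recovered == chips[lost]
print("recovered:", recovered)   # b'symbiont'
```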

Storage capacity offers one measure of the requirements of a human-like emergent intelligence. Another measure is the required rate of computation. Here there is no agreed upon metric, and it is particularly difficult to define a unit of measure that is completely independent of representation. The measure suggested below is simple and the answer is certainly important, if not sufficient.

Given an efficiently stored representation of human knowledge, what is the rate of access to that storage (in bits per second) required to achieve human-like performance? Here "efficiently stored representation" means any representation requiring only a multiplicative constant of storage over the number of bits of information. This restriction eliminates the formal possibility of a representation storing a pre-computed answer to every question. Allowing storage within a multiplicative constant of the optimum does restrict the range of possible representations, but it allows most representations that we would regard as reasonable. In particular, it allows both distributed and local representations.

The question of the bandwidth required for human-like performance is accessible by experiment, along lines similar to those outlined for the question of storage capacity. If the "cycle time" of human memory is limited by the firing time of a neuron, then the ratio of this bandwidth to the total number of bits tells the fraction of the memory that is accessed simultaneously. This gives an indication of the parallel or serial nature of the computation. Informed opinions differ greatly on this matter. The bulk of the quantitative evidence favors the serial approach. Memory retrieval times for items in lists, for example, depend on the position and the number of items in the list. Except for sensory processing, most successful artificial intelligence programs have been based on serial models of computation, although this may be a distortion caused by the availability of serial machines.

My own guess is that the reaction time experiments are misleading and that human-level performance will require accessing large fractions of the knowledge several times per second. Given a representation of acquired intelligence with a realistic representation efficiency of 10%, the 10 to the 9th bits of memory mentioned above would require a memory bandwidth of about 10 to the 11th bits per second. This bandwidth seems physiologically plausible since it corresponds to about a bit per second per neuron in the cerebral cortex.
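
A sketch of that estimate, with the neuron count taken as an assumed round figure (estimates of the number of cortical neurons vary by an order of magnitude):

```python
# Bandwidth estimate using the guesses in the text (all inputs are assumptions).
knowledge_bits = 1e9              # estimated acquired knowledge
representation_efficiency = 0.1   # a realistic 10% efficient representation
accesses_per_second = 10          # "large fractions ... several times per second"

stored_bits = knowledge_bits / representation_efficiency    # 1e10 bits held
bandwidth = stored_bits * accesses_per_second                # 1e11 bits per second

cortical_neurons = 1e11           # assumed round figure
print(bandwidth, "bits/s, about", bandwidth / cortical_neurons, "bit/s per neuron")
```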

By way of comparison, the memory bandwidth of a conventional electronic computer is in the range of 10 to the 6th to 10 to the 8th bits per second. This is less than 0.1% of the imagined requirement. For parallel computers the bandwidth is considerably higher. For example, a 65,536 processor Connection Machine can access its memory at approximately 10 to the 11th bits per second. It is not entirely coincidental that this fits well with the estimate above.

Another important question is: What sensory-motor functions are necessary to sustain symbolic intelligence? An ape is a complex sensory-motor machine, and it is possible that much of this complexity is necessary to sustain intelligence. Large portions of the brain seem to be devoted to visual, auditory, and motor processing, and it is unknown how much of this machinery is needed for thought. A person who is blind and deaf or totally paralyzed can undoubtedly be intelligent, but this does not prove that the portion of the brain devoted to these functions is unnecessary for thought. It may be, for example, that a blind person takes advantage of the visual processing apparatus of the brain for spatial reasoning.

As we begin to understand more of the functional architecture of the brain, it should be possible to identify certain functions as being unnecessary for thought by studying patients whose cognitive abilities are unaffected by locally confined damage to the brain. For example, binocular stereo fusion is known to take place in a specific area of the cortex near the back of the head. Patients with damage to this area of the cortex have visual handicaps, but show no obvious impairment in their ability to think. This suggests that stereo fusion is not necessary for thought. This is a simple example, and the conclusion is not surprising, but it should be possible by such experiments to establish that many sensory-motor functions are unnecessary. One can imagine, metaphorically, whittling away at the brain until it is reduced to its essential core. Of course it is not quite this simple. Accidental damage rarely incapacitates completely and exclusively a single area of the brain. Also, it may be difficult to eliminate one function at a time since one mental capacity may compensate for the lack of another.

It may be more productive to assume that all sensory-motor apparatus is unnecessary until proven useful for thought, but this is contrary to the usual point of view. Our current understanding of the phylogenetic development of the nervous system suggests a point of view in which intelligence is an elaborate refinement of the connection between input and output. This is reinforced by the experimental convenience of studying simple nervous systems, or studying complicated nervous systems by concentrating on those portions most directly related to input and output. By necessity, almost everything we know about the function of the nervous system comes from experiments on those portions that are closely related to sensory inputs or motor outputs. It would not be surprising if we have overestimated the importance of these functions to intelligent thought.

Sensory-motor functions are clearly important for the application of intelligence and for its evolution, but these are separate issues from the question above. Intelligence would not be of much use without an elaborate system of sensory apparatus to measure the environment and an elaborate system of motor apparatus to change it, nor would it have been likely to have evolved. But the apparatus necessary to exercise and evolve intelligence is probably very much more than the apparatus necessary to sustain it. One can believe in the necessity of the opposable thumb for the development of intelligence, without doubting a human capacity for thumbless thought. It is quite possible that even the meager sensory-motor capabilities that we currently know how to provide would be sufficient for the operation of emergent intelligence.

These questions of capacity and scope are necessary in defining the magnitude of the task of constructing an emergent intelligence, but the key question is one of understanding. While it is possible that we will be able to recreate the emergent substrate of intelligence without fully understanding the details of how it works, it seems likely that we would at least need to understand some of its principles. There are at least three paths by which such understanding could be achieved. One is to study the properties of specific emergent systems, to build a theory of their capabilities and limitations. This kind of experimental study is currently being conducted on several classes of promising systems including neural networks, spin glasses, cellular automata, classifier systems and adaptive automata. Another possible path to understanding is the study of biological systems, which are our only real examples of intelligence, and our only example of an emergent system which has produced intelligence. The disciplines that have provided the most useful information of this type so far have been neurophysiology, cognitive psychology, and evolutionary biology. A third path would be a theoretical understanding of the requirements of intelligence, or of the phenomena of emergence. Examples of relevant disciplines are the theories of logic and computability, linguistics, and dynamical systems theory. Anyone who looks to emergent systems as a way of defending human thought from the scrutiny of science is likely to be disappointed.

One cannot conclude, however, that a reductionist understanding is necessary for the creation of intelligence. Even a little understanding could go a long way toward the construction of an emergent system. A good example of this is how cellular automata have been used to simulate the emergent behavior of fluids.

The whirlpools that form as a fluid flows past a barrier are not well understood analytically, yet they are of great practical importance in the design of boats and airplanes. Equations that describe the flow of a fluid have been known for almost a century, but except for a few simple cases they cannot be solved. In practice the flow is generally analyzed by simulation. The most common method of simulation is the numerical solution of the continuous equations.

On a highly parallel computer it is possible to simulate fluids with even less understanding of the system, by simulating billions of colliding particles that reproduce the emergent phenomena such as vortices. Calculating the detailed molecular interactions for so many particles would be extremely difficult, but a few simple aspects of the system such as conservation of energy and particle number are sufficient to reproduce the large-scale behavior. A system of simplified particles that obey these two laws, but are otherwise unrealistic, can reproduce the same emergent phenomena as reality. For example, it is possible to use particles of unit mass that move only at unit speed along a hexagonal lattice, colliding according to the rules of billiard balls. Experiments show that this model produces laminar flow, vortex streets, and even turbulence that is indistinguishable from the behavior of real fluids. Although the detailed rules of interaction are very different from the interactions of real molecules, the emergent phenomena are the same. The emergent phenomena can be created without understanding the details of the forces between the molecules or the equations that describe the flow of the fluid.
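
A minimal sketch of such a lattice gas is shown below, using the simpler square-lattice variant with four particle directions rather than the hexagonal lattice described above; the two-phase structure of local collision followed by streaming, and the conservation of particle number and momentum, are the same.

```python
import numpy as np

# Square-lattice ("HPP"-style) lattice gas: four unit-speed particle directions
# per site, at most one particle per direction per site.
H, W = 64, 64
rng = np.random.default_rng(0)
# occ[d, y, x] is True if a particle moving in direction d occupies site (y, x);
# directions: 0 = east, 1 = north, 2 = west, 3 = south.
occ = rng.random((4, H, W)) < 0.2

def collide(occ):
    """Head-on pairs, with the perpendicular channels empty, rotate 90 degrees.
    This billiard-ball-like rule conserves particle number and momentum."""
    e, n, w, s = occ
    ew = e & w & ~n & ~s
    ns = n & s & ~e & ~w
    return np.stack([(e & ~ew) | ns,    # east
                     (n & ~ns) | ew,    # north
                     (w & ~ew) | ns,    # west
                     (s & ~ns) | ew])   # south

def stream(occ):
    """Every particle moves one lattice site in its direction (periodic edges)."""
    e, n, w, s = occ
    return np.stack([np.roll(e, 1, axis=1),
                     np.roll(n, -1, axis=0),
                     np.roll(w, -1, axis=1),
                     np.roll(s, 1, axis=0)])

before = int(occ.sum())
for _ in range(1000):
    occ = stream(collide(occ))
print("particles before:", before, "after:", int(occ.sum()))   # identical
```

Averaging the particle occupations over small neighborhoods yields smooth, fluid-like density and velocity fields; the hexagonal lattice mentioned above is what makes the large-scale flow properly isotropic, which is why it is specified.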

The recreation of intricate patterns of ebbs and flows within a fluid offers an example of how it is possible to produce a phenomenon without fully understanding it. But the model was constructed by physicists who knew a lot about fluids. That knowledge helped to determine which features of the physical system were important to implement, and which were not.

Physics is an unusually exact science. Perhaps a better example of an emergent system which we can simulate with only a limited understanding is evolutionary biology. We understand, in a weak sense, how creatures with Mendelian patterns of inheritance and different propensities for survival can evolve toward better fitness in their environments. In certain simple situations we can even write down equations that describe how quickly this adaptation will take place. But there are many gaps in our understanding of the processes of evolution. We can explain in terms of natural selection why flying animals have light bones, but we cannot explain why certain animals have evolved flight and others have not. We have some qualitative understanding of the forces that cause evolutionary change, but except in the simplest cases, we cannot explain the rate or even the direction of that change.

In spite of these limitations, our understanding is sufficient to write programs of simulated evolution that show interesting emergent behaviors. For example, I have recently been using an evolutionary simulation to evolve programs to sort numbers. In this system, the genetic material of each simulated individual is interpreted as a program specifying a pattern of comparisons and exchanges. The probability of an individual’s survival in the system is dependent on the efficiency and accuracy of this program in sorting numbers. Surviving individuals produce offspring by sexual combination of their genetic material with occasional random mutation. After tens of thousands of generations, a population of hundreds of thousands of such individuals will evolve very efficient programs for sorting. Although I wrote the simulation producing these sorting programs, I do not understand in detail how they were produced or how they work. If the simulation had not produced working programs, I would have had very little idea about how to fix it.
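
The simulation described above ran with very large populations on highly parallel hardware; the following is only a minimal, hypothetical sketch of the same kind of system, evolving fixed-length lists of compare-exchange operations against random test sequences.

```python
import random

N = 8                  # length of the sequences to be sorted
GENOME_LEN = 40        # compare-exchange pairs per individual (assumed)
POP_SIZE = 100
GENERATIONS = 100
TESTS = [[random.randint(0, 99) for _ in range(N)] for _ in range(32)]

def random_gene():
    i, j = sorted(random.sample(range(N), 2))
    return (i, j)

def apply_network(genome, seq):
    seq = list(seq)
    for i, j in genome:                 # each gene compares and exchanges i, j
        if seq[i] > seq[j]:
            seq[i], seq[j] = seq[j], seq[i]
    return seq

def fitness(genome):
    """Fraction of test sequences the genome sorts correctly."""
    return sum(apply_network(genome, t) == sorted(t) for t in TESTS) / len(TESTS)

def offspring(a, b, mutation_rate=0.02):
    cut = random.randrange(GENOME_LEN)  # single-point sexual recombination
    child = a[:cut] + b[cut:]
    return [random_gene() if random.random() < mutation_rate else g for g in child]

population = [[random_gene() for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:POP_SIZE // 2]    # the fitter half survives to reproduce
    population = parents + [offspring(random.choice(parents), random.choice(parents))
                            for _ in range(POP_SIZE - len(parents))]

print("best fitness:", fitness(max(population, key=fitness)))
```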

The fluid flow and simulated evolution examples suggest that it is possible to make a great deal of use of a small amount of understanding. The emergent behaviors exhibited by these systems are a consequence of the simple underlying rules, which are defined by the program. Although the systems succeed in producing the desired results, their detailed behaviors are beyond our ability to analyze and predict. One can imagine that if a similar process produced a system of emergent intelligence, we would have a similarly limited understanding of how it worked.

My own guess is that such an emergent system would not be an intelligent system itself, but rather the metabolic substrate on which intelligence might grow. In terms of the apes and the songs, the emergent portion of the system would play the role of the ape, or at least that part of the ape that hosts the songs. This artificial mind would need to be inoculated with human knowledge. I imagine this process to be not so different from teaching a child. This would be a tricky and uncertain procedure since, like a child, this emergent mind would presumably be susceptible to bad ideas as well as good. The result would be not so much an artificial intelligence, but rather a human intelligence sustained within an artificial mind.

Of course, I understand that this is just a dream. And I will admit that I am more propelled by hope than by the probability of success. But if, within this artificial mind, the seed of human knowledge begins to sustain itself and grow of its own accord, then for the first time human thought will live free of bones and flesh, giving this child of mind an earthly immortality denied to us.

Attempts to create emergent intelligence, at least those that are far enough in the past for us to judge, have been disappointing. Many computational systems, such as homeostats, perceptrons, and cellular automata, exhibit clear examples of emergent behavior, but that behavior falls far short of intelligence. A perceptron, for example, is a collection of artificial neurons that can recognize simple patterns. Considerable optimism was generated in the 1960’s when it was proved that anything a perceptron could recognize, it could learn to recognize from examples. This was followed by considerable disappointment when it was realized that the set of things that could be recognized at all was very limited. What appeared to be complicated behavior of the system turned out in the final analysis to be surprisingly simple.

In spite of such disappointments, I believe that the notion of emergence contains an element of truth, an element that can be isolated and put to use.

A helpful analogy is the brewing of beer. The brewmaster creates this product by making a soup of barley and hops, and infecting it with yeast. Chemically speaking, most of the real work is done by the yeast, which converts the sugars to alcohol. The brewmaster is responsible for creating and maintaining the conditions under which that conversion can take place. The brewmaster does not need to understand exactly how the yeast does its work, but does need to understand the properties of the environment in which the yeast will thrive. By providing the right combination of ingredients at the right temperature in the right container, the brewmaster is able to create the necessary conditions for the production of beer.

Something analogous to this process may be possible in the creation of an artificial intelligence. It is unlikely that intelligence would spontaneously appear in a random network of neurons, just as it is unlikely that life would spontaneously appear in barley soup. But just as carefully mixed soup can be inoculated with yeast, it may be that a carefully constructed network of artificial neurons can be inoculated with thought.

The approach depends on the possibility of separating human intelligence into two parts, corresponding to the soup and the yeast. Depending on one’s point of view, these two parts can be viewed as hardware and software, intellect and knowledge, nature and nurture, or program and data. Each point of view carries with it a particular set of intuitions about the nature of the split and the relative complexity of the parts.

One way that biologists determine if a living entity is a symbiont is to see if the individual components can be kept alive separately. For example, biologists have tried (unsuccessfully) to prove the oil-drop theory by sustaining metabolizing oil drops in an artificial nutrient broth. Such an experiment for human intelligence would have two parts. One would be a test of the human ape’s ability to live without the ideas of human culture. This experiment is occasionally conducted in an uncontrolled form when feral children are reared by animals. The two-part theory would predict that such children, before human contact, would not be significantly brighter than nonhuman primates. The complementary experiment, sustaining human ideas and culture in an artificial broth, is the one in which we are more specifically interested. If this were successful we would have a thinking machine, although perhaps it would not be accurate to call it an artificial intelligence. It would be natural intelligence sustained within an artificial mind.

To pursue the consequences of this point of view, we will assume that human intelligence can be cleanly divided into two portions that we will refer to as acquired and inherited intelligence. These correspond to the songs and to the apes, respectively, or in the fermentation metaphor, the yeast and the barley soup. We will consider only those features of inherited intelligence that are necessary to support acquired intelligence, and only those features of acquired intelligence that impose requirements on inherited intelligence. We will study the interface between the two.

Even accepting this definition of the problem, it is not obvious that the interface is easy to understand or recreate. This leads to a specific question about the scope of the interface, one that can presumably be answered by experiment: what sensory-motor functions are necessary to sustain symbolic intelligence?

The functional scope of the interface between acquired and inherited intelligence is not the only property that can be investigated. To build a home for an animal, the first thing we would need to know is the animal’s size. This is also one of the first things we need to know in building an artificial home for acquired intelligence. This leads to question number two: how many bits are required to store the acquired portion of a typical human’s knowledge?

The guesses at answers that I have given are imprecise, but the questions are not. In principle they can be answered by experiment. The final question I will pose is more problematic. What I would like to ask is "What are the organizing principles of inherited intelligence?" but this question is vague and it is not clear what would be an acceptable answer. I shall substitute a more specific question that hopefully captures the same intent:

"Question IV: What quantities remain constant during the computation of intelligence; or, equivalently, what functions of state are minimized?"

This question assumes that inherited intelligence is some form of homeostatic process and asks what quantity is held static. It is the most difficult of the four questions, but historically it has been an important question to ask in areas where there was not yet a science to guide progress.

The study of chemistry is one example. In chemical reactions between substances it is obvious that a great number of things change and not so obvious what stays the same. It turns out that if the experiment is done carefully, the weight of the reactants will always equal the weight of the product. The total weight remains the same. This is an important organizing principle in chemistry and understanding it was a stepping stone to the understanding of an even more important principle: the conservation of the weights of the individual elements. The technical difficulty of defining and creating a truly closed experiment, in particular eliminating the inflow and outflow of gases, explains why chemists did not fully appreciate these principles until the middle of the 19th century.

Another very different example of a system that can be understood in terms of what is held constant is the system of formal logic. This is a set of rules under which sentences may be changed without changing their truth. A similar example, which has also been important to artificial intelligence, is the lambda calculus, which is the basis of the language Lisp. This is a system of transforming expressions in such a way that their "values" do not change, where the values are those forms of the expression which are not changed by the transformations. (This sounds circular because it is. A more detailed explanation would show it to be more so.) These formal systems are conceptually organized around that which is held constant.

In physics there are many examples of how conservation principles have been used successfully to organize our conception of reality, but while the conservations of energy, momentum, mass, and charge are certainly important, I do not wish to make too much of them in this context. The conservation principles of intelligence are more likely to resemble those of biology than those of physics.

One of the most useful conservation principles in biology appears in the notion of a gene. This is the unit of character determination that is conserved during reproduction. In sexual reproduction this can get complicated since an individual receives a set of genes from each of two parents. A gene that affects a given trait may not be expressed if it is masked by another, and there is not a simple correspondence between genes and measurable traits. The notion that atomic units of inheritance are always present, even when they are not expressed, was hard to accept, and it was not widely believed until almost a century after Mendel’s initial experiments. In fact the conservation is not perfect, but it is still one of the most important organizing principles in the study of living organisms.

In biology, the rules of conservation are often expressed as minimum principles. The two forms are equivalent. For instance, the minimum principle corresponding to the physical conservation of momentum is the principle of least action. A biological example is the principle of optimal adaptation, which states that species will evolve toward fitness to their environments. The distance to the ideal is minimized. A conservation principle associated with this is Fisher’s Fundamental Theorem of Natural Selection, which states that the rate of change in fitness is equal to the genetic variance in fitness. In cases where this minimum principle can be applied, it allows biologists to quantitatively predict the values of various biological parameters.

For example, sickle-cell anemia is a congenital disease controlled by a recessive gene. Individuals who inherit the gene from both parents are likely to die without reproducing, but individuals who inherit the gene from a single parent are resistant to malaria. In certain regions of West Africa 40% of the population carries the gene. From this fact and the principle of optimal fitness, it is possible to predict that the survival advantage of resistance to malaria is about 25% in these regions. This estimate fits well with measured data. Similar methods have been used to estimate the number of eggs laid by a bird, the shape of sponges, and the gait of animals at different speeds. But these examples of applying a minimum principle are not so crisp as those of physics. Why, for example, do we not evolve a non-lethal gene that protects against malaria? The answer is complicated, and the principle of fitness offers no help. It is useful in aiding our understanding, but it does not explain all. This is probably the kind of answer to Question IV for which we will have to settle.
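
The standard population-genetics argument behind such a prediction can be written as a short worked equation; the text does not show the algebra, and the conversion of the 40% carrier figure into a gene frequency is an assumption here.

```latex
% Heterozygote advantage at equilibrium, with genotype fitnesses
%   non-carrier = 1 - s,   carrier = 1,   sickle-cell homozygote ~ 0 :
\hat{q} = \frac{s}{s + 1}
\qquad\Longrightarrow\qquad
s = \frac{\hat{q}}{1 - \hat{q}} .
% Taking the sickle-allele frequency as roughly 0.2 (about half the quoted
% 40% carrier frequency) gives s = 0.2 / 0.8 = 0.25, the 25% figure above.
```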

Even in physics, knowledge of the exact law does not really explain all behaviors. The snowflakes and whirlpools of water are examples. The forces that govern the interaction of water molecules are understood in some detail, but there is no analytical understanding of the connection between these forces and the emergent behaviors of water.

On the other hand, our goal is not necessarily to understand, but to recreate. In both of the examples mentioned, conservation principles give us sufficient understanding to recreate the phenomena.

In order to achieve this kind of understanding for intelligence it will be necessary to ask and answer the kinds of questions that are mentioned above.

I do not know the answer to Question IV. It is possible that the answer will be very complicated, and that the interface between acquired and inherited intelligence will be difficult to reproduce. But it is also possible that it will be simple. If it is, one can imagine actually constructing a system that satisfies it; this would be the artificial substrate for thought.

Once this is achieved it will still remain to inoculate the artificial mind with the seed of knowledge. I imagine this to be not so different from the process of teaching a child. It will be a tricky and uncertain process since, like a child, this mind will presumably be susceptible to bad ideas as well as good. The first steps will be the most delicate. If we have prepared well, it will reach a point where it can sustain itself and grow of its own accord.

For the first time human thought will live free of bones and flesh, giving this child of mind an earthly immortality denied to us.

References

Dyson, Freeman. "The Origins of Life", Cambridge University Press, 1985.

Haldane, J. B. S. "The Causes of Evolution", Harper & Brothers, 1932.

Hillis, W. Daniel. "The Connection Machine", The MIT Press, 1985.

Luria, A. R. "The Mind of a Mnemonist", Basic Books, 1968.

Newell, Allen. "Human Problem Solving", Prentice Hall, 1972.

Wolfram, Stephen. "Theory and Applications of Cellular Automata", World Scientific, 1986.

"Intelligence as an Emergent Behavior or, The Songs of Eden" reprinted by permission of Daedalus, Journal of the American Academy of Arts and Sciences, from the issue entitled, "Artificial Intelligence," Winter 1988, Vol. 117, No. 1.