Systems Science 5 — Information and Knowledge
In the last post in this series, I introduced the notion of organization as a fundamental concept in systems science. A key question in science in general and in systems science in particular is where does organization come from? How does the world we see today, with life and all the complexities of climate, oceans, mountains, and mineral deposits within the crustal layer of the earth, come to be?
Ontology Revisited and Some Associated Metaphysics
What exists? In Systems Science 4 I indicated that at a fundamental level, we have energy, which in a concentrated form is matter. But this is only half of the story. In this installment I want to introduce another consideration of what constitutes reality. Not everyone is going to accept this view; it is still in the realm of philosophy more than of science. However, I will provide arguments that derive from our scientific understanding of what might seem like an ethereal substance but is, in fact, quite real. The other half of the story is the Information/Knowledge duality. It is related to the Energy/Matter duality in a very deep way, as I will explain.
In Figure 1 below we see a diagrammatic representation of the structure of reality at the most fundamental level. Having some fun with lexical play, I call this the Ontos, what exists in physical reality. By this I mean to distinguish between physical reality and an imagined, or claimed, reality where other things like spirits and unicorns might exist. Imagined concepts certainly 'exist' in the Ontos, but as knowledge structures within the heads of people and as symbolic representations (e.g. dragons flying about in a video game or movie). Someday I need to explore the idea that humans have achieved a level of imagination that sometimes creates impossible things and events (in the sense of physical embodiment) while remaining grounded in physical reality with all of its inherent constraints (like gravity). What does it mean?
The center of the diagram shows the basic existence of what we might call the 'stuff' of the universe. There are four quadrants: energy, matter, information, and knowledge. Matter and energy are in the left two quadrants, collectively the 'Physical' side of reality. These are the stuffs that we can observe and of which we are made. On the right side, the two quadrants labeled Information and Knowledge are, in my terminology, the 'Ethereal' side of reality. There is nothing material about information or knowledge, yet these 'stuffs' exist and are very real.
The upper half of the diagram represents the dynamic or 'Movement' aspect of reality. Energy and Information are the stuffs that travel. But they respectively drive Matter, through forces, and Knowledge, through influencing construction (below). The lower half represents the actual 'Structure' of reality, how it is organized materially. Note that there are several relations, shown with arrows in the diagram, that interrelate these four aspects. Information is transferred by the flow of energy (with or without matter, i.e. as in light). Knowledge is composed in matter, i.e. the arrangement of matter in a system is a representation of that system's knowledge (also below). The last relation is an upward causal one from the lower half, 'Structure', to the upper half, 'Movement'. Namely, the actual arrangements of matter, and hence knowledge, channel or constrain the flow of energy and, hence, information. In other words, there is a causal feedback loop from Movement to Structure back to Movement that recurrently creates a generative force involving all four stuffs. More on this later.
All of these are observable at the human scale of reality. Indeed it is these stuffs and their relations which form the self-similar, recursive patterns discussed in my last systems science post. But the universe scales across many orders of magnitude, from the smallest (right side), where reality is dominated by quantum laws, to the largest (left side), where gravity rules and the bulk of the universe seems to be made up of dark energy and dark matter. I have whimsically called this larger framework of reality the 'Mysterium', partly because the two ends of these scales are only dimly understood by human science, and partly to give a nod to mysterianism, which holds that consciousness will never be understood by the very mind that experiences it. Indeed, if the latter philosophy holds up, these extremes may never be fully understood! But the relationship between dark energy and dark matter and the physical side of human-scale reality seems straightforward, even if mysterious [incidentally, dark energy may yet prove to be a misnomer or a result of misunderstanding in cosmology, so don't take this too seriously — I just like the symmetry!]. On the small side, the quantum effects that give rise to things like measurement uncertainty (e.g. Heisenberg's Uncertainty Principle) may turn out to be the ultimate source of information, which is, after all, a measure of improbability (below). Quantum weirdness may lie at the root of our inability to have perfect knowledge. Uncertainty and weirdness are thus related to our remaining ignorance.
Figure 1. Reality and what exists — the Ontos. Everything that exists can be categorized as energy, matter, information, or knowledge. However, all of these are related at a deep level such that all of the structure and movement of the Universe that man can observe involve these four 'stuffs'. The Mysterium represents the scale boundaries of human understanding, at least at present.
Just as with the duality relation between energy and matter (remember Einstein?), information and knowledge are, in a sense, interconvertible. That is, information can give rise to knowledge, and knowledge can give rise to information. Unlike matter/energy, which is ultimately constrained by a conservation principle (neither created nor destroyed), the information/knowledge duo appears to expand owing to the positive feedback nature of the information → knowledge → information loop. My suspicion is that the expansion is bounded ultimately by the rate of energy flow, which, when the universe comes to maximum entropy — all the stars are burned out — will bring the loop to a halt.
To explain the I/K duality I will have to treat each separately first but then delve into the interrelationship since that is the essence of this thesis. There are, however, a few additional principles to recognize as they will play into the descriptions below.
The Principle of Flow
The first principle to recognize is that of flow, or moving from one position in spacetime to a different position. In material/energy systems this is achieved when there is a region of spacetime in which there is a concentration of material/energy and another region in which there is not. The former is called a high potential region, with respect to the material/energy; the latter is called a low potential region. The rule is very simple. Material/energy flows from the region of high potential (called a source) to the region of low potential (called a sink) at a rate that is proportional to the difference in their respective concentrations and inversely proportional to any "resistance" to flow through the interceding region or channel. The force pushing material/energy from source to sink diminishes as the concentration at the sink approaches that at the source.
Of course this simple description leaves out a tremendous amount of detail, such as boundaries and boundary conditions, channels and their internal interactions with the flowing material/energy, etc. As I said earlier, this is not meant to be a physics text, so the reader who is not familiar with these concepts from physics will need to take some time to brush up on them. This principle of flow, the basis for the Second Law of Thermodynamics, is essential to understanding the Ontos.
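To make the rule concrete, here is a minimal numerical sketch (my own toy illustration in Python; the parameter names and values are arbitrary assumptions, not a formal physics model) of two regions exchanging 'stuff' until their potentials equalize:

```python
# Toy model of the principle of flow: material/energy moves from a
# high-potential source to a low-potential sink at a rate proportional
# to their difference and inversely proportional to a channel resistance.

def simulate_flow(source=100.0, sink=0.0, resistance=5.0, dt=0.1, steps=50):
    for _ in range(steps):
        rate = (source - sink) / resistance  # flow rate falls as the gap closes
        moved = rate * dt                    # amount transferred this time step
        source -= moved
        sink += moved
    return source, sink

print(simulate_flow())  # the two potentials converge toward equality
```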
The Principle of Dissipation
Energy can flow through material structures, appropriately arranged. In other words, it is possible to find some arrangement of matter (say a configuration of atoms in a molecule) that will absorb energy in some appropriate form but quickly give it off again without doing any substantial work on the matter, i.e., without changing the configuration in the long run. A molecule can absorb a photon of light, maybe an electron is raised to a higher orbital energy for a picosecond, and then, when the electron falls back to its original (lower energy) state, a photon is emitted. The molecule may have gone through a temporary configuration change but nothing substantive changed.
This is as opposed to situations where the input of energy results in some substantial physical (mechanical or chemical) work being done to change the configuration of the matter. Energy, for example, might be stored in chemical bonds creating new molecules. Depending on how thermally stable they are, they might continue in that configuration even as additional energy is input. Such systems are further from an equilibrium state and additional input of energy may not do any additional work, being dissipated as in the above paragraph.
Systems like this might be thought of as being 'pumped' up. They reach a point at which additional input has no effect on the structure. Any additional energy input simply 'maintains' the pumped up structure.
The Principle of Amplification
Doing work involves controlling the flow of energy through a system. It involves applying the correct amount of energy at the appropriate time. Amplifiers and their binary cousins, switches, are material devices through which energy can be made to flow under the influence of a second, usually lower-level or lower-power, input of energy. The transistor is a paradigm example. A high-level voltage is applied to one end of a wire that is broken in the center of the device, with the gap filled with a semiconducting material. The wire exits the other side and eventually terminates in a low-level voltage (ground), but under ordinary conditions the energy flow (the current) cannot get through the gap, so no energy flows through and out the exit. A third wire is used to control the flow of energy. This wire can carry a low-energy signal to the gap where its current can turn the semiconductor into a conductor. The result is that the high-energy flow can cross the gap and exit to ground. Thus, a small or weak signal can be used to control a large or strong signal. In an ideal amplifier, the amount of weak signal is proportionally represented in the amount of strong signal. This means that the strong signal essentially mimics the weak one in its ups and downs, but just has more oomph (a technical term meaning strength or capacity to do work!).
Figure 2 shows a generalized representation of amplification. Figure 2A shows a device that is "off" in that no low-level energy is flowing, so no high-level energy can flow either. In Figure 2B, the low-level energy flow is on, which then turns on the high-energy flow, in proportion to the amount of low-level energy. [Note: Transistors used in digital devices do not actually function as amplifiers. The low-energy input is actually at the same potential as the high-energy input. The device then acts as just a switch; it is either on or off, hence the use of the term "binary".]
Figure 2. An amplifier is any physical device in which a small flux of energy (thin arrow) is used to modulate a large flux of energy (thick arrow). Application of a low-level energy flow through the thin arrow (B) opens the pathway for the high-level energy flow through the amplifying device. Such devices can be used to boost the signal of a low-level energy flow, as when an audio amplifier converts a weak radio signal into a signal strong enough to drive a speaker coil. Or it can be used to switch a flow on or off, as in the case of transistors used in digital circuits.
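As a rough sketch of the idealized device in Figure 2 (my own illustration; the gain and supply numbers are arbitrary assumptions), both the proportional (amplifier) regime and the saturated (switch) regime can be captured in a few lines:

```python
# Idealized amplifier: a weak control signal modulates a strong energy flow.
# Below the supply limit the output mirrors the control signal (amplification);
# driven hard enough, the device saturates and behaves as an on/off switch.

def amplifier(control_signal, gain=100.0, supply=10.0):
    """Return the high-level output driven by a low-level control input."""
    return min(gain * control_signal, supply)

print(amplifier(0.0))    # 0.0  -- no control signal, no high-level flow (Fig. 2A)
print(amplifier(0.02))   # 2.0  -- proportional (amplifier) regime (Fig. 2B)
print(amplifier(0.5))    # 10.0 -- saturated: the device acts as a switch
```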
Amplification is what makes work potentially useful. We will revisit this principle in the general theory of the tool. But for now let us just note that amplification is the phenomenon that makes what follows make sense. Information is conveyed in energy flows. The control of an energy flow makes it possible to encode information in the form of fluctuations in that flow. An information process is one that modulates the flow of energy in a way that results in the accomplishment of useful work. And useful work results in the modification of a physical system — in the configuration of material structures. This latter, we will argue below, is the basis of what we call knowledge.
The Principle of Transduction
If amplification is the modulation of a high-level energy signal by means of a low-level signal, then transduction is essentially its opposite. That is, transduction is the process of modulating a low-level energy signal in proportion to a high-level signal. This is the means by which sensors of various kinds produce signals related to the physical forces they respond to. A sensor is any device that converts a large input signal into a small output signal. As a general rule the input signal is due to an energy flow in one form, say mechanical, and the output signal is in another form, say electrical.
In most respects, transduction is physically similar to amplification. It is just one energy flow providing a sufficient amount of free energy to the physical device to do some work resulting in the outflow of another flow of energy. In some devices there doesn't even need to be a distinction between high-level and low-level signals, as in a simple switching device. All that is necessary is that some of the energy inflow is available to do the work of modulating the resulting signal outflow. Nevertheless, it is instructive to consider amplification and transduction as essential physical processes in order to understand the basis for the "ethereal" stuffs. Information and knowledge are dependent on the ability of systems to encode messages into communications channels and extract those messages at the receiving end.
The Principle of Buffering (Storage)
There are special devices that transduce and amplify energy flows. Energy flows across space and time, and often it involves a flow of matter as well. Because physical channels have finite capacities, in order to have anything like coordination between the outflow from one system and the inflow to another there needs to be some intermediary device that can temporarily store accumulations of the energy (and matter). If, for example, a receiving system is not ready to accept an inflow of energy, there needs to be some intermediary buffer that can hold the energy until the receiver is ready.
The buffer is, of necessity, at a lower potential than the source of the flow, so acts as a finite sink that, as it fills up, comes to the same potential level as the source. Hence it will act as a source to the ultimate destination sink — the receiver system.
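A minimal sketch of such a buffer (a toy Python queue of my own devising, not a physical model) shows the essential behavior: it accepts inflow until it is full, then acts as a source for the receiver when drained:

```python
from collections import deque

# Toy buffer between a sender and a receiver: packets of energy/matter
# accumulate while the receiver is not ready and drain out when it is.
# The buffer has a finite capacity, as any physical store does.

class Buffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = deque()

    def inflow(self, packet):
        if len(self.store) < self.capacity:
            self.store.append(packet)
            return True
        return False          # full: the buffer has come up to the source's potential

    def outflow(self):
        return self.store.popleft() if self.store else None  # buffer acts as a source

buf = Buffer(capacity=3)
for packet in range(5):
    print(packet, buf.inflow(packet))   # the last two packets are refused
print(buf.outflow())                    # the receiver drains the first packet: 0
```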
The Principle of the Quantum
One final observation about the physical stuff is in order. I believe the evidence is sufficiently strong to suggest that matter-energy structures are, at base, quantized. That is, at the smallest scales, matter and energy come in discrete packets or quanta. This certainly seems to be the case at Planck scales of time and space. The importance of this principle is that when we look at a number of systems we see that what appears to be continuous at one scale is readily decomposed into discrete quantized steps at a smaller scale. The most obvious example of this is the digitization of analog signals that is used in high-quality electronic systems such as DVDs. Any analog signal can be discretized and quantized, that is, it can be broken into equi-interval pieces with each piece given a fixed (often integer) value. This process is based on taking a periodic measurement of the analog signal at a rate that is rapid compared with the normal fluctuations in the signal. The transducer (sensor) described above can be made to do this interval sampling and is, in fact, the basis of digital electronics systems.
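Here is a small sketch of that sampling-and-quantizing process (my own example; the signal, sample rate, and number of levels are arbitrary choices for illustration):

```python
import math

# Sampling and quantizing a continuous signal: measure it at regular
# intervals and round each sample to one of a fixed number of integer levels.

def digitize(signal, duration=1.0, sample_rate=20, levels=16):
    samples = []
    for n in range(int(duration * sample_rate)):
        t = n / sample_rate                            # periodic measurement times
        value = signal(t)                              # assumed to lie in [-1, 1]
        q = round((value + 1.0) / 2.0 * (levels - 1))  # map to an integer 0..levels-1
        samples.append(q)
    return samples

# A 2 Hz sine wave sampled 20 times per second and quantized to 16 levels.
print(digitize(lambda t: math.sin(2 * math.pi * 2 * t)))
```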
The importance of this dual nature of matter/energy — describable in both continuous and discrete quantal terms — is that we can treat these descriptions as essentially equal. If it turns out that, at bottom, at the Planck scale, all things are discrete, it doesn't matter. That scale is so infinitesimally small compared to processes of interest that everything is essentially continuous. This is just like the nature of water. It is composed of discrete molecules of H2O. Yet at our ordinary scale of perception water appears to be a continuous fluid (at room temperature). We can describe it as such, but, if need be, also choose a description based on its discrete nature. [As an aside, the latter is possible in principle but proves damned hard to do in practice since the scale of description of the discrete will always be much smaller than the scale of description of the continuous. We are still finding out things about the nature of water at the molecular level that surprise us!]
The Principle of Random Events
At a level of description that is continuous we recognize variations in the process that seem to deviate from the expected. That is, things take a turn that cannot be predicted from our knowledge of the behavior of the process, except in some statistical sense. If we were able to drill down to the underlying discrete substrate (the molecular level in water) we might find perfectly deterministic explanations for these seemingly random fluctuations. Such explanations for apparent randomness have been advanced both philosophically and in some instances in physics (e.g. chaos theory applied to nonlinear dynamical systems). The important point is that for all real processes, described as continuous, there is some, however slight, random quality that makes perfect predictions impossible.
An example of this principle can be seen in random mutations, necessary for the generation of novelty in genetic material upon which natural selection can operate in biological evolution. At the level of description of the population we can only provide a statistical picture of mutations and subsequent inheritance. We cannot predict what mutations will take place or when. At the molecular level there are causal sequences that lead up to the replacement, say, of a specific nucleotide in the gene. Because of the sheer complexity (number of variables) of the situation it is impossible, even in principle, to measure each and every particle, its position and velocity, that would be needed to explain the results. We need to be satisfied with our statistical descriptions based on a theory of probability that serves us well in the macrosphere. Hence, we treat randomness as effectively real.
These are some of the important principles that appear to be operative at all levels of organization of reality. They describe the basis for the relationships that can be found at all levels between the four stuffs in our ontology. Now I can delve into information and knowledge as the second great duality of reality — the Ethereal realm.
Information
The problem with the term information is that it has been used for many purposes. It is so common that it is in danger of losing its most valuable kernel of meaning. I say this as a computer scientist who has witnessed its usage by fellow academics who play fast and loose with it. For example, how often have we heard the phrase "information management" only to find that what is really being described is the management of data? Yes, there is an important distinction to be made between information and data. But there are more distinctions still. In this section I will spend no small amount of effort to make those distinctions so that you will (I hope) understand how information is such a fundamental stuff.
In particular I will make a most important distinction between information and knowledge. It is amazing to me how often these two terms are interchanged. On the other hand, I suppose I shouldn't be too surprised. As I will explain, there is a deep connection between information and knowledge that can blur the distinction. Knowledge is a result of information acting on a physical system which is capable of undergoing a change in configuration. Thus, it is sometimes said that knowledge is embodied information. This is true, in some sense, but we should be very careful not to think that they are therefore the same. For example, it is wrong to say (in my Ontos!) that a teacher imparts knowledge to a student. As we will see, it is even wrong to say that a teacher provides a student with information, but that is a little trickier. It is the student who constructs knowledge (in the brain) as a result of the information she receives from the world, which could include a teacher.
What then is information? Gregory Bateson had an apt definition: "...information is news of difference that makes a difference". By this he meant that information is that which tells a receiver (also called an observer) that something is different from what the receiver expected, and that is the basis for an active receiver to do something different than it would have done otherwise. Claude Shannon developed the communications theory of information, which comes very close to this idea of difference from expectation. He defined information in terms of a probability function. First he had to distinguish the roles of various players in the act of communicating information.
As already mentioned, you need a receiver or observer. This entity receives messages from a sender. A message is conveyed along a channel and is comprised of a coded configuration of matter propelled by energy. The channel is anything that acts as a medium for the message. So, for example, a message may be an organized set of voltage pulses in a copper wire. The wire is the channel and the organized pulses are pieces of the message called symbols. A famous example of a coded message process is the Morse code of telegraph fame.
Figure 3. A communications model. The definition of information depends on this model.
The symbols are thought of as being sent and received in a serial fashion with varying timing between. It is this temporal arrangement of symbols which constitutes the message. Before getting into the details of messages and their meaning, I want to provide a small example of information communication that will be useful for further development of these ideas and especially with the relationship between information and knowledge.
Let's suppose we have a system comprised of a sender, channel and receiver as in Figure 3. Let's further suppose that there are just three symbols possible for the sender to send and the receiver to receive. In other words, both sender and receiver must a priori be "designed" to "handle" these symbols. I will explain more about the designed part and what it means to handle symbols shortly. For now please just accept this as a given. Also accept that for simplicity's sake we will state that the symbol generation by the sender (that is, the creation and insertion of a symbol into the channel) is clocked at discrete intervals.
Let the symbols be represented by the characters a, b and c. We would say that these are symbols in a fixed set called S, and represent this set so:
S = {a, b, c}
At each interval of time, Δt, the sender selects one of the symbols in the set and inserts it into the channel. At some time later, depending on the speed of transmission and length of the channel (details that won't concern us here), the symbol arrives at the receiver. Symbols arrive with the same Δt interval, just offset to some later time by the transmission delay. So at each interval, the receiver gets a symbol from the set S. The question Shannon asked (and answered) was: How much information is conveyed by each symbol received? To be fair, Shannon tackled much more complicated versions of this question, some of which we will go into later. But for now let's follow the reasoning and see what we mean by an "amount of information."
Shannon, and others who were interested in measuring the value of messages, reasoned that each symbol must convey an amount of information that is inversely proportional to the likelihood of its arrival. What this means is simply this: Information is a measure of surprise to the receiver! The more a receiver expects to get a specific message at a specific time, the less informed it is by that message's arrival. You can relate this to your own common experience. If you don't know what to expect from observing some event, say who will win a Superbowl game, then the result of the event will inform you that one team was "better" than the other (I should caution you not to think of this as the same as the meaning of a message — we will cover that later). Even more to the point, if you expected team A to win and team B won instead, then you would be surprised (and maybe a bit disappointed if you had bet money on A). The level of surprise is the measure of information.
In some ways this is not intuitive because we ordinarily attach the idea of meaning to the common use of information. But this is not what is meant by the measure of information. That measure, I, is a number ranging from infinitesimally close to zero up to infinity! Here is how Shannon set it up. If I is inversely proportional to the expectation of an event (the receipt of a specific symbol at a specific time), this means that the more probable the event is, the less information is conveyed. In math talk:
I ∝ Pt(x ∈ {a, b, c})^(-1)
which, translated to English means that information is proportional to the inverse of the probability of the event being the arrival of one of the symbols from the symbol set. [OK, maybe that isn't regular English! So say, if a symbol is highly probable (expected) then it produces little information; if it is improbable, then it produces a lot of information.]
Here, the probability Pt(x) is defined by the likelihood of x being one of a, b or c at time t. And what is tremendously important to grasp is that the likelihood is based on the receiver's expectations.
What Shannon did, a stroke of genius, was to see that the relationship could not be a simple one. Probabilities range from zero (absolutely impossible) to one (absolutely certain). Information, intuitively, ranges from minuscule (I just knew that would happen) to humongous (wow, I didn't see that one coming!). Furthermore, it should not scale linearly because, again intuitively, a large change in probability when you are above a 50-50 chance should not convey that much more (or less) information, as compared with a small change when the probability is near zero, which should tell you a lot.
A function that has these properties is the negative logarithmic function shown in Graph 1. Shannon chose the base 2 log function since it has the nice property of producing one unit of information for a probability of 0.5. This unit was named the "bit", short for binary digit. It represents the amount of information that one gets from flipping a fair coin. A priori there is a 50-50 chance of getting either heads or tails — they are equiprobable. The flip results in one or the other, producing just one bit of information, which, if you think about it, isn't much.
Graph 1. Relation between probability and measure of information. This graph shows information as being equal to the negative logarithm (base 2) of the probability.
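In code, the measure plotted in Graph 1 is a one-liner (a quick Python sketch of my own, just to show the numbers):

```python
import math

def information_bits(p):
    """Information, in bits, generated by an event of probability p."""
    return -math.log2(p)

for p in (0.5, 0.25, 0.1, 0.01):
    print(p, round(information_bits(p), 3))
# 0.5 -> 1.0 bit, 0.25 -> 2.0 bits, 0.1 -> ~3.322 bits, 0.01 -> ~6.644 bits
```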
The bit works for binary choices — on-off, one-zero, yes-no, etc. As a result it is the basic unit of information used in computational engines. Numbers are represented in computer memories by arrays of binary values, the ones and zeros. A single memory cell is called a bit, and a string of bits is used to represent a base 2 number. We will have more to say later about computation and representation, but we will leave off talking about computer numbers for the moment since there is nothing more that can be said of binary choices that helps further understand information.
The real nature of information shows up in more complex decision situations. We can start to see this with our model communications system above. Remember that the system only had three symbols. A question we might ask is: how much information does each symbol convey when it arrives at the receiver? This question is motivated by a prior agreement as to the meaning of each symbol. For purposes of our example, and to keep it simple, let's say that the symbols have been assigned the following meanings.
a → BAD, HURT, PAIN!
b → NOTHING, NEUTRAL
c → GOOD, REWARD!
That is, the arrival of an a means either something bad is going to happen to the receiver at some time in the near future, or it could mean that something bad has happened to the sender at some time in the past — it all depends on what the designer intended the system to be used for. Similarly, the arrival of a c means something good. Now the information value of the message takes on some significance. Clearly if the receiver is getting a lot of bad symbols, it should be doing something with that information that would be different from what it might do if it were getting good symbols. And if it were getting the neutral b it wouldn't need to do anything. This may seem a little abstract right now, but you will see it come into play when I describe how the receipt of an informational message 'causes' changes in the receiver below.
Now let's start the system (start the clock ticking). What is the information value of each message? One way to think about this is that the receiver maintains an array of three probability values, one for each symbol type in the set. If we had to guess what any symbol's information value would be at the outset, we would guess they were equally probable. That is, initially we would assume that P(a) = P(b) = P(c) = 0.3333... But there is no reason to assume these are the "real" probabilities, representing the long-term experience of the sender. Imagine that something really bad is happening to the sender, who wants to convey this message to the receiver. Then the sender would start sending out a series of a's. The receipt of the first a would produce an amount of information equal to -log2(0.333), which is about 1.586 bits. Not a spectacular amount of information. Not much of a surprise. The conclusion one should draw from this is, well, not much. We don't have enough information to determine whether something bad is happening or not.
But now let's say the receiver has an ability to use the information received to modify its own probability array. We need a function that will relate the information received to a value that we can add to the current probability to get a new probability for the next occurrence. For example, we could simply divide the information received by 10 (this assumes that if we were to receive 10 bits of information we would be so overloaded that the whole system would blow up!) and add that to the current value. This new probability would be the basis for computing the next informational value of the receipt of that symbol.
To see this work, say the probability for receiving an a starts out at 0.333. Now, for the next 15 clock ticks the receiver gets an a. This is telling the receiver that something bad is happening, but with each receipt the message is less and less surprising ("tell me something I didn't already know!"). That is, there is less and less information generated with each new receipt. Table 1, below, shows the schedule for probability and information with each message (symbol received) for 15 ticks. Notice how the probability climbs, but at a decreasing rate, and the actual information value declines. Less information results in less increase in the probability which leads to less information in the next iteration.
| Tick | Probability | Information (bits) |
|------|-------------|--------------------|
| 0 | 0.333 | 1.586405918 |
| 1 | 0.491640592 | 1.02432406 |
| 2 | 0.594072998 | 0.751287879 |
| 3 | 0.669201786 | 0.5794868 |
| 4 | 0.727150466 | 0.45967417 |
| 5 | 0.773117883 | 0.371239686 |
| 6 | 0.810241851 | 0.303575489 |
| 7 | 0.8405994 | 0.250509668 |
| 8 | 0.865650367 | 0.208143652 |
| 9 | 0.886464732 | 0.17386486 |
| 10 | 0.903851218 | 0.145842783 |
| 11 | 0.918435496 | 0.122749693 |
| 12 | 0.930710466 | 0.103595665 |
| 13 | 0.941070032 | 0.087626006 |
| 14 | 0.949832633 | 0.074254772 |
Table 1. The change in probability resulting from a steady receipt of the same symbol. As the probability increases, the amount of information declines. The receiver comes to 'expect' that the next symbol to be received will be an a!
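For those who like to see the arithmetic, here is a short Python sketch (my own, using the divide-information-by-10 update rule assumed above) that reproduces the schedule in Table 1:

```python
import math

# Reproduce Table 1: repeated receipt of the same symbol raises its expected
# probability and lowers the information generated by each new receipt.

p = 0.333                       # initial probability of receiving an 'a'
for tick in range(15):
    info = -math.log2(p)        # information (bits) generated by this receipt
    print(tick, round(p, 9), round(info, 9))
    p = p + info / 10.0         # the update rule assumed in the text: add I/10
```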
There is, of course, one caveat. As the probability of the next received symbol being an a increases, we must decrease the probabilities of the other two symbols. This is just one of the rules of probability theory for mutually exclusive events. The sum of all of the probabilities of all of the symbols must be 1.0 and not a jot more (or less). So, if the probability of receipt of a goes up, because now we can see that with each such receipt we are coming to "expect" a, the probabilities of the other symbols must go down accordingly. I won't bother with the details of how this might be accomplished here. Suffice it to say that we need to reduce the other two probabilities so that together they sum to 1.0 - P(a_new). In other words, we take that difference and apportion it between the other two so that the total sum still equals 1.0. The results would look like Table 2, representing the array of probabilities after the 15th tick from above.
| P(a) | P(b) | P(c) | Total P |
|------|------|------|---------|
| 0.9498... | 0.0251... | 0.0251... | 1.0000 |
Table 2. The memory array in the receiver after 15 ticks. The probabilities of b and c have declined.
To carry this a step further, suppose that after the 15th tick the receiver gets a c. What does this mean? Since the probability of receiving a c is only 0.0251, the information generated is 5.316 bits. This is a lot! It means that the receipt of c was relatively unexpected. Its receipt was a big surprise. What should happen now?
By our procedure above we need to increase the new probability of c to 0.5891 and reduce the probabilities of a and b by amounts proportional to their current values and to 1 - P(c_new). Shortcutting the math, the new array would look like:
| P(a) | P(b) | P(c) | Total P |
|------|------|------|---------|
| 0.3902... | 0.0206... | 0.5891... | 1.0000 |
Table 3. The memory array after the receipt of c (after the string of a's received previously). This single event was unexpected, resulting in a substantial shift in the probabilities of both a and c. The probability of b is barely affected.
[In real systems it is more likely that a radical shift like this is not the result of a single symbol receipt. More generally, a graded response is the result. That is, the receiving system will respond only minimally to the shift from a to c, treating this shift as potentially due to noise — a random event. If the receiver continues to get c after the initial one, then the receiver will respond more strongly. In other words, the system should employ filters that moderate the information encoding so that the receiver does not make such radical internal changes due to noise.]
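One way to code the update with renormalization (a sketch under the same simplifying assumptions, with my own function names; since the text admits to some unexplained shortcuts, the renormalized values land near, but not exactly on, those in Table 3):

```python
import math

def receive(probs, symbol, damping=10.0):
    """Update the receiver's probability array after receiving `symbol`."""
    info = -math.log2(probs[symbol])           # surprise, in bits
    new_p = probs[symbol] + info / damping     # boost the received symbol by I/10
    remainder = 1.0 - new_p                    # what is left over for the others
    old_rest = sum(p for s, p in probs.items() if s != symbol)
    updated = {s: new_p if s == symbol else p / old_rest * remainder
               for s, p in probs.items()}      # proportional renormalization
    return updated, info

# The array after fifteen a's (Table 2), then a surprising c arrives.
probs = {'a': 0.9498, 'b': 0.0251, 'c': 0.0251}
probs, info = receive(probs, 'c')
print(round(info, 3))                               # about 5.32 bits of surprise
print({s: round(p, 4) for s, p in probs.items()})   # probabilities still sum to 1.0
```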
If the receiver continues to get a symbols for several more ticks, then the probability of receiving a goes back up, just as it did with c. Any series of symbols will cause the values to rise and decline accordingly. It should be easy to see that if the symbols arrive in a random fashion, with a uniform distribution, then the probability values in the array will tend to equilibrate again to 0.333...
Over time, the values associated with each symbol will represent the receiver's expectations for what is happening in its world — the sender. As time goes on the receiver can expect bad things to happen, if mostly a's are received. It could come to expect good things, if mostly c's are received. Or it could come to expect that things are neutral for the sender if mostly b's are received.
This example is very simple, yet you can see that even it has its complexities. The more symbols a system has, the more complex the computation of probabilities becomes. Though the method used here is not mathematically rigorous (I took some unexplained shortcuts), it illustrates the basic concept of information leading to a change in the receiver that then results in a change in the receiver's expectations for future events. The fact is that the array of probabilities, in a very real sense, represents the receiver's knowledge of the sender's state. If you had to say what the general state of the sender was, you could look at the array and make a prediction. And this is exactly what we mean by knowledge. We will take a closer look below.
There is yet one more point to make about information before moving on. We should note that there is a profound relationship between information and energy. It takes energy and work to send messages, so the whole system uses some energy in the act of communicating. It also takes energy and work to compute the new values of probabilities. The energy applied to the computation, in general, comes from other sources, and the information input operates through amplification (or transduction) to make physical changes in the receiving system (see Knowledge below). That is, information receipt (receiving an unexpected quantum of energy encoding a message) triggers work to be accomplished in the receiver such that a similar quantum received in the future will more likely be dissipated, producing less work to modify the system next time.
Actually, it takes energy just to maintain the storage of these values (as, for example, the electricity needed to maintain a memory cell in a computer). There is a very fundamental relationship between energy and information in the sense that there is always some energetic cost (work must be accomplished) to convert information into a change in configuration representing a change in probability for future similar messages. To put it another way, the entropy of the system is increased with every message! This deep relationship can be seen by looking at the similarity between the formulation for average information in a message and the statistical mechanics formulation of entropy. Indeed, one may hear someone refer to information as the entropy in a message.
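For reference, the two formulations being compared are, in their standard textbook forms (not spelled out in the original post), Shannon's average information per symbol,

H = -Σi pi log2(pi)   (bits per symbol)

and the Gibbs formulation of statistical-mechanical entropy,

S = -kB Σi pi ln(pi)

which differ only by a constant factor and the base of the logarithm.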
This has been a brief look at the nature of information as what I have called a 'stuff' in my Ontos. Information is a fundamental category of what exists. It is real even if ethereal-like. It has a real effect on physical systems, otherwise computation and thinking would not be possible. Yet we are only now coming to recognize it for what it is and to realize what its role is in the functioning of the Universe.
Knowledge
The more you know about a message source, the less surprised you are, in general, by the receipt of a message from that source. The less you know, the easier it is to be surprised. In this sense, knowledge is the inverse of information. Figure 4, below, shows the relationship between information and knowledge from this perspective. What is being measured, as above, is the probability of an event, the receipt of a specific symbol, say. Having a higher expectation of the event means its occurrence is not very informative. But having that expectation means that the receiver must have accumulated experiences with the sender such that, as described above, the array of expectations reflects the fact that that event has occurred frequently in the past. Similarly, an event that has not occurred frequently will have a low expectancy and will thereby, when it occurs, generate more information for the receiver.
Figure 4. Knowledge can be thought of as the reciprocal (inverse) of information. Knowing less means that messages will be more highly informational. Note that one of the consequences of this view is that there is no such thing as perfect or complete knowledge; since the function approaches probability 1 asymptotically, there will always be some residual ignorance!
Both receivers and senders are physical systems, connected by a physical channel through which energy or energy-driven matter can flow. The energy moves, as the principle of flow demands, from a high potential source in the sender, to a low potential sink in the receiver. That energy is transduced by a physical aspect of the receiver and, in turn, can be amplified (using energy flows inherent in the receiver) to do work on the internal physical configuration of the receiver. In other words, the receiver's internal organization can be changed by that work so as to represent the message receipt.
But by the principle of dissipation (above) we can see how a sufficiently complex organization within a receiver can become 'pumped' up by prior messages such that future similar messages will have no impact on changes to that structure. The receiver's internal structure that embodies its encoded history of messages represents its knowledge of the sender! This is what I meant above by an array of expectations. A configuration of matter that can be reorganized by the receipt of a message (transduced and then, through amplification, doing work on that matter) can be seen to be 'encoding' that message. The extent to which such a configuration approaches a pumped-up saturation, and subsequent dissipation of energy, determines how much capacity for storing expectations a system has. Systems that dissipate too rapidly cannot retain a 'memory' of prior events, whereas a system that jumps to a fully dissipational state and maintains stability for too long cannot learn new relationships between message states. Systems that have workable memories have to be able to encode informational messages but also forget (reduce their expectations of) messages that were encoded but are no longer arriving regularly.
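To continue the toy example from the Information section, forgetting might look something like this (a sketch of my own; the decay rate is an arbitrary assumption): when reinforcing messages stop arriving, the expectations decay back toward the uninformed, uniform state.

```python
# Toy 'forgetting': with no reinforcing messages, the receiver's expectations
# drift back toward the uniform (maximally ignorant) distribution.

def forget(probs, rate=0.05):
    uniform = 1.0 / len(probs)
    return {s: p + rate * (uniform - p) for s, p in probs.items()}

probs = {'a': 0.9498, 'b': 0.0251, 'c': 0.0251}
for tick in range(30):                  # thirty ticks with no messages received
    probs = forget(probs)
print({s: round(p, 3) for s, p in probs.items()})  # drifting back toward 0.333
```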
Sometime in the not-too-distant future I will write about how the human brain (its neural networks) fulfills this function; how neurons and synapses do work to encode messages that are unexpected and dissipate the energy of messages that are expected. We need to make the connection between this general model of information and knowledge and what goes on in the brain in order to more fully appreciate the principles involved.
So knowledge and information form a duality just as matter and energy do. Moreover, knowledge is embodied in the configuration of matter, but the latter is determined by the flow and interactions of matter with energy. The improbability of the latter, in any given time frame, is gauged by how responsive the material configuration is to the flow. Responsiveness, followed by some form of longer-term storage of energy, constitutes a memory of the energy flow. If the configuration is then able to dissipate additional flows without significant re-configuration, then the system can be said to be NOT surprised(!) by that additional flow. Thus the four stuffs of the Ontos are deeply interrelated to one another.
Whereas the matter/energy duality is constrained by the law of conservation, it is not clear that such a law pertains to knowledge/information. The latter is characterized by the fact that information begets knowledge and knowledge changes the behavior of the knowing system. This change of behavior is the source of information for other observers that perceive our knower as a sender! Thus information increases in the universe as a result. That is the causal loop I mentioned above.
Look for future posts to use this Ontos model and the relationships just described to explicate evolution and organization. I hope to show, through specific examples, how this applies to any system we might look at, thus strengthening my argument that systems science is the meta-science I have claimed it to be.