Stanford Encyclopedia of Philosophy


Information can tell us everything. It has all the answers. But they are answers to questions we have not asked, and which doubtless don't even arise.
-- Jean Baudrillard (b. 1929), French semiologist. Cool Memories, Ch. 5 (1987; tr. 1990)

Information

Information is commonly understood as knowledge or facts acquired or derived from, e.g., study, instruction or observation (Macmillan Contemporary Dictionary, 1979). On this notion, information is presumed to be both meaningful and veridical, and to have some appropriate connection to its object; it is concerned with representations and symbols in the most general sense (MacKay 1969). Information might be misleading, but it can never be false. Deliberately misleading data is misinformation. The scientific notion of information abstracts from the representational idea, and includes anything that could potentially serve as a source of information. The most fundamental notion of information, attributed to a number of different authors, is "a distinction that makes a difference". Information theory, then, is fundamentally the rigorous study of distinctions and their relations, inasmuch as they make a difference.
Contemporary information theory attempts to make our informal notion of information precise and quantify it. The quantitative approach takes at once a broader and more restrictive view, generalising to omit content, thereby including a much broader range of cases, but at the same time losing any distinctions based in content.

In each case there is a common sense notion of information that is a more restricted special case reflecting our interests and capacity to represent, as based in the relevant distinctions within the system under study. The various mathematical theories can be regarded as dealing with information capacities, the ability to carry information in the usual sense (for more detail on this, see the discussion of the combinatorial approach to information). So far, the intentional aspect of information lies outside the capabilities of rigorous information theory, though it remains a highly prized philosophical goal. A construction within information theory would start with one of the standard formulations of information theory, taking the elements of that theory as foundational, and would add various philosophically justifiable constraints (and possibly further operators) to produce a theory that corresponds to our intuitions about intentionality, as mediated by a process of reflective equilibration (reflective readjustment of our intuitions and the introduced constraints and operators). Despite the difference between technical definitions of information and our everyday usage, it has become conventional to refer to information capacity simply as information. This can be confusing to neophytes, so some experts prefer to reserve the term 'information' for intentional cases, and use 'complexity' for the other cases. Unfortunately, 'complexity' also has many disparate uses. There is currently some movement towards a unified approach to information, but several philosophically interesting issues remain unresolved.

In philosophy, information theory has been applied to logic, perception, epistemology, the relation of mind to body, semantics and pragmatics, natural laws, causation, the interpretation of randomness and probability, measurement, and game theory and economics. Although there are many interconnections among these topics, the current state is also predominantly one of increasing divergence, and the overall picture is confusing. Nonetheless, a concept with close ties to logic, computation, probability, causation, the mental, the physical, and meaning shows promise for permitting considerable unification, if foundational issues and intense territorial disputes can be resolved.


The Basic Idea of Information

The scientific notion of information as a quantifiable measure emerged in the 1930s and 1940s, and was refined and integrated with logic and other parts of mathematics in the 1960s and '70s. This led to an explosion of theoretical papers and broad applications in the 1970s and '80s, and more refined work thereafter. Rather than try to trace this multifaceted history, I will give an account of information that is common to all approaches. This account is non-standard in the sense that it is not found in the literature as such, but is a distillation of what underlies various different approaches that are found in the literature. The student of information should be aware that there are many terminological disputes and substantive disagreements as to what constitutes information, and that the views presented in this section and the next are convenient for expressing the various positions rather than widely accepted canonical views.

The fundamental quantitative notion of information is of a unit of distinction (Spencer Brown 1972), called a logon by MacKay (1969), which enables the isolation of a distinguishable group. A distinction is thus an operation, or possible operation. A classification places particular objects in the domain of the classification according to the types of the classification. The types are abstract, permitting further specification, whereas particulars are maximally determinate. For example, in order to classify a tree, a grouping of properties that is sufficient to define a tree must be distinguished from the domain of properties in general. To represent a specific tree, this grouping must be specific enough to pick out the tree in question. More information is required to distinguish a specific tree than a tree in general, since a specific tree is first of all a tree, and must be distinguished from other trees by at least some of its peculiar properties. This might not be obvious. We can, for example, pick out a specific tree by pointing, or saying "that tall skinny thing over there", which doesn't contain any representation of a tree. The problem is that pointing and demonstrative reference do not constitute representations, but can guide us to representations. (This is also true of definite descriptions, at least in their referential mode, as opposed to their descriptive mode, in which the description does not need to be true in order to achieve reference.) In general, the more specific the representation, the more information it requires, i.e. the more distinctions are involved. However, this holds only for representations ordered by determinateness on W.E. Johnson's determinable/determinate scale. Structures of the same determinateness involve less or more information, respectively, depending on whether they are more or less regular, or ordered.

To take a simple example, consider a frame cube and a spatial structure composed of eight irregularly placed nodes with straight line connections between each node. Both structures may encompass the same volume with the same number of components, but the regularity of the cube reduces the amount of information required to specify it. This information reduction results from the mutual constraints on values in the system implied by the regularities in the cube: all the sides, angles and nodes must be the same. This redundancy reduces, for example, the amount of information required in a program that draws the cube over that required by a program that draws the arbitrary eight node volume. On the other hand, the notion of a cube involves more information than the notion of an eight node volume. This is because "cube" is more determinate than the much more determinable "eight node volume".
[Figure: Complexity]

It has become popular to talk of complex structures as being midway between highly ordered structures like perfect crystals and highly disordered structures like ideal gases, along the lines of the figure to the left. The scale between the two (the x axis) is a measure of the amount of information required to fully determine the structure. The crystal requires determining the location of a few atoms and the repetitive relations to the other atoms of the crystal, while the gas requires the specification of the position and momentum of each molecule, a very large amount of information indeed. Note, however, that in describing the gas as low in complexity one is abstracting from the detailed behaviour of its molecules, but this is not so in the case of the crystal, in which the position and location of at least some individual atoms is determined. The comparison is possible, but it is misleading. It is better to compare complexity at a comparable level of determinateness, and recognise that there are two dimensions to the middle realm, organisation and complexity, which have differing measures, as indicated in the figure to the right. An interesting question is whether there is an information theoretic quantification of organisation. Charles Bennett has proposed logical depth as a suitable measure.

[Figure: Organisation & Complexity]

A sequence of 32 '7's requires a shorter program to produce (namely one specifying 5 doublings of an initial output of '7') than does an arbitrary sequence of 32 decimal digits. To take a less obvious case, any specific sequence of digits in the expansion of the transcendental number π = 3.14159... can be produced with a relatively short program, despite the apparent randomness of expansions of π. The information required to unambiguously describe certain types of structures can be compressed due to the redundant information they contain; other structures cannot be so compressed. This is a property of the constraints contained in the structures, not directly of any particular description of the structures, or language used for description. The length of a shortest description of a structure encoded as a string of 1s and 0s represents the amount of information in the structure. This length is the minimal number of distinctions (logons) required to define the structure.
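The contrast between a regular and an arbitrary digit sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and zlib's compressed size is used as a rough, computable upper bound on description length, not as the shortest description itself (which no general procedure can compute).

```python
import random
import zlib


def sevens_by_doubling(doublings: int) -> str:
    """Return a run of '7's built by repeatedly doubling an initial output of '7'."""
    output = "7"
    for _ in range(doublings):
        output += output  # one doubling: append a copy of everything so far
    return output


def compressed_size(text: str) -> int:
    """Bytes used by zlib at its highest setting; an upper-bound proxy only."""
    return len(zlib.compress(text.encode("ascii"), 9))


# Five doublings of '7' give the 32-character run discussed in the text.
run_of_32 = sevens_by_doubling(5)

# Longer strings make the contrast visible despite zlib's fixed overhead.
regular = sevens_by_doubling(12)  # 4096 sevens
rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
arbitrary = "".join(rng.choice("0123456789") for _ in range(len(regular)))

print(len(run_of_32))
print(compressed_size(regular), compressed_size(arbitrary))
```

The pseudo-random digits here are themselves the output of a short program plus a seed, much like the digits of π, yet a general-purpose compressor does not detect that regularity. This is one reason a compressor yields only an upper bound on the information in a structure.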
 

Mathematical Information Theory and Logic

The notion of information is central to logic and reasoning, including probability and meaning, though whether a logic based in information is sufficient for the formal aspects of these topics is as yet undecided. The interrelations are still a bit murky, and there has been a confusing proliferation of mathematical "informations" and "entropies", many of dubious pedigree. This is typical of an immature science, and is not necessarily unhealthy. The main results are in communications theory, computation or recursion theory, logic and metamathematics, and probability theory. I will try to develop the fundamental ideas from the idea of a logon.

Information, Logic and Computation

Anything that can be represented can be represented (in principle) as a string of binary digits via an isomorphic mapping, as pictured to the right. The process is similar to producing a string that is rather like the record of an extended game of twenty questions, with 1s conventionally representing affirmation and 0s representing negation. A successful series of guesses produces a truth table row that represents the original thing. The string, represented in the middle in the picture to the right, has the same structure as the original structure (more correctly, they are manifestations of the same abstract structure); that is, for each set of distinctions in the structure, there is an equivalent set of distinctions in the string. Each string has at least one most compressed form that is non-redundant, represented by the bottom row in the picture to the right. A most compressed string is a generator of all truth table rows implied by what is true of whatever is represented, and can be thought of as a vector in the minimal dimensional space in which the thing can be fully represented (see Bell and Demopoulos, 1996 for details of the notion of a generator). It is relatively trivial to render truth-functional operators into information theoretic form. Negation is just the complement of a string (replace 1s for 0s and 0s for 1s), and conjunction is the mutual information between two strings (mutual information is defined in most elementary works on information within the formalism of the work; intuitively it is the information in both strings). The set of minimal strings is equivalent to a set of equivalent truth-functional propositions. This latter equivalence class can be thought of as the fundamental fact of the structure. It has the advantage of being completely unambiguous, whereas individual minimal length strings can equally well serve as truth functional generators of the truth table rows that are satisfied by the original structure. 
Each proposition in the set can form a basis for the binary linear Boolean space of truths about the structure, and the variant propositions form alternative bases for this space. Thus information theory and propositional logic have the same foundation, converging in truth table rows together with the equivalence of the truth functional operators with certain information theoretic functions.
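A toy version of the "twenty questions" encoding may make the mapping concrete. The sketch below assumes a deliberately small domain (the integers 0 to 15) and a hypothetical question set that asks about one binary place at a time; it shows that each thing receives its own string of 1s and 0s, and that negation corresponds to taking the complement of the string. It does not attempt to model conjunction as mutual information.

```python
from typing import Callable, List

Question = Callable[[int], bool]


def make_questions(n_bits: int) -> List[Question]:
    """Hypothetical yes/no questions: 'is binary place k of the number a 1?'"""
    return [lambda n, k=k: bool((n >> k) & 1) for k in reversed(range(n_bits))]


def encode(thing: int, questions: List[Question]) -> str:
    """Record the answers in order: '1' for affirmation, '0' for negation."""
    return "".join("1" if ask(thing) else "0" for ask in questions)


def complement(bits: str) -> str:
    """Negation of a string: swap every 1 with 0 and every 0 with 1."""
    return "".join("0" if b == "1" else "1" for b in bits)


questions = make_questions(4)
codes = {n: encode(n, questions) for n in range(16)}

print(codes[11])              # the answer record for the number 11
print(complement(codes[11]))  # its negation
```

Because four well-chosen questions separate all sixteen candidates, none of the answers is redundant in this toy case; a poorly chosen question set would produce longer strings carrying no more information.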

A logon might be considered to be an element of the uncompressed string, but some of the logons would then be redundant. The redundancy means that the digit may not make a difference, or that its difference is not a distinct 1 or 0 value. The redundancy is lost only in the compressed form. Therefore, I define a logon as the value of a place in a maximally compressed string. The information represented by a digit of an uncompressed string can be one or fewer logons. A logon is therefore not the same as a bit, except for maximally compressed strings. Each digit of an uncompressed string is a bit of information, even though it may be redundant. The length of the compressed string is the information content in logons. In general, the number of bits in a string is not the same as its information content in logons.

Classifications distribute tokens among the types of a class according to "yes", "no" questions concerning whether they are of the type. As such, they classify according to information, and a specific classification (assignment of tokens to types) can be represented by a set of strings, one for each token. A nonredundant classification would require that each of these strings is not compressible. Ideally, classifications should be nonredundant, but this is often not true in practice, and different classes (types) contain mutual information that is not merely a consequence of their being subclasses within the same classification. The "twenty question game" discussed in the previous section can be thought of as a complete classification of some thing. If we have an ideal and complete classification, the tokens will have equivalence classes of mutual information that can be taken as the types of the classification. If we take these to be predicates, and the tokens to be objects, we can abstract from particular classifications and tokens to implicitly include all possible classifications and tokens. We can then define existential quantification as the assertion that a type is not empty, and universal quantification as the assertion that all members of a type exist. This gives, together with the interpretation of truth-functional logic, an information-theoretic interpretation of predicate (1st order) logic.

The relation between information and logic can be seen from another perspective. George Spencer Brown's Laws of Form, or calculus of distinctions (1972), uses the calculus of distinctions to derive truth functional logic. The distinction can be thought of as an operation represented by Spencer Brown's basic symbol, the right corner, in which case there is one primitive to his system, as Spencer Brown thought. Alternatively, a distinction can be thought of as a symbol, in which case there are at least two primitives, distinction (represented by a right corner) and non-distinction (represented by a blank) (Cull and Frank 1979). In any case, both the making of a distinction and the failure to make a distinction are required, so a second state is implied by the operator approach. The failure to make a distinction is just the blank, which may be regarded as the only constant (Cull and Frank 1979). Banaschewski (1977) showed that truth functional logic implies the calculus of distinctions, proving that they are notational variants, since Spencer Brown had earlier proven that truth-functional logic follows from the logic of distinctions. Cull and Frank (1979) made this more perspicuous by rendering Spencer Brown's axioms into a more standard notation, showing the equivalence of the calculus of distinctions and two-element Boolean algebra directly, up to notational variation. Thus it is firmly established that the logic of distinctions is none other than truth-functional logic, or, equivalently, the two-element Boolean algebra (it is known that there is only one two-element Boolean algebra). Present digital computers process information by distinguishing between "on" and "off" states at certain locations, and using these states, through their circuitry, to control the state of other locations in the computer.
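The equivalence with two-element Boolean algebra can be illustrated with a small sketch. It assumes one common reading of the calculus, which is not spelled out in the text above: the marked state is read as true, the blank as false, the cross as negation of its contents, and juxtaposition as disjunction. The function names are hypothetical.

```python
from itertools import product

MARKED, UNMARKED = True, False  # assumed reading: mark = true, blank = false


def juxtapose(*values: bool) -> bool:
    """Expressions side by side: marked if any is marked; empty space is blank."""
    return any(values)


def cross(*contents: bool) -> bool:
    """Draw a distinction around the contents; an empty cross is the mark."""
    return not juxtapose(*contents)


# The two arithmetic initials under this reading.
law_of_calling = juxtapose(cross(), cross()) == cross()  # mark mark = mark
law_of_crossing = cross(cross()) == UNMARKED             # crossed mark = blank


def conjunction(a: bool, b: bool) -> bool:
    """'a and b' written with crosses only."""
    return cross(cross(a), cross(b))


def implication(a: bool, b: bool) -> bool:
    """'a implies b' as the cross of a juxtaposed with b."""
    return juxtapose(cross(a), b)


truth_tables_agree = all(
    conjunction(a, b) == (a and b) and implication(a, b) == ((not a) or b)
    for a, b in product([MARKED, UNMARKED], repeat=2)
)

print(law_of_calling, law_of_crossing, truth_tables_agree)
```

Since negation and disjunction together suffice to define every truth function, recovering them from the cross and juxtaposition is enough to recover truth-functional logic on this reading.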

There are several ways to approach information theory. The most familiar is the statistical approach used in Shannon's communication theory. Since communication theory is a rather high order application, I will leave full discussion of it until later. It is closely related to the combinatorial approach. Both these approaches are better suited to ensembles of system states rather than individual states. Algorithmic information theory, on the other hand, is well suited to individual states. It also has practical variants that try to estimate the complexity of a data set. A third variant is a theoretical demonstration using Boolean rings and sub-rings to demonstrate that probability is dispensable in information theory. Due to the technical nature of these approaches, I have set them on separate pages:

Randomness and Probability
The close connection between information and probability is evident from the statistical and Boolean approaches. Although historically probability theory preceded Shannon entropy, it is possible to define Shannon information without the explicit use of probability by using Boolean algebra. A brief summary of a proof by Ingarden et al (1990: 25ff) is here. It follows from this that the probability axioms can be explicitly defined within information theory, though I won't give the proof here. Given the inseparability of information theory and logic, probability theory is thus a branch of logic. Given the close connection between information and logic, it seems reasonable to conclude that information is the more fundamental notion. Hume, among others, thought that chance was completely a consequence of ignorance, or lack of information. We now know that this is unlikely, but the syntactic character of contemporary information theory allows us to go beyond epistemic and even intentional characterisations of information.

The most refined approach to defining randomness is found within the algorithmic complexity approach to information, and goes back to Kolmogorov (1968), who also gave a standard axiomatisation of probability theory. The approach is based on the noncomputability of incompressible strings by any program of cardinality less than themselves. If a string has this characteristic, then it is not distinguishable from a random string by any effective statistical test. Some of the more important details are given here.
 

Organisation and Logical Depth

Organisation is the co-ordination or interdependence of parts or components, especially in support of vital functioning (OED). A living body, for example, is well organised when its organs so interrelate that the body as a whole can maintain all its vital functions. Correlations entail descriptive redundancy; if A, B and C are correlated in respect X we may replace their independent description {A(X), B(X), C(X)} with {A(X), R(A,B,C)} and so on. So a formal characterisation of organisation might well focus on a specification in terms of redundancy. Following Shannon (1949), redundancy orders are determined by the minimal number of elements in which a redundancy can be detected, so the redundancy in a system can be decomposed into orders n based on the number of components, k_n, required to detect the redundancy of order n. Order 1 redundancy can be detected by examining elements of a system pairwise, whereas order n redundancy is detectable over a minimum of 2n elements. Examples of low order redundancies are the simple repetitions of molecular arrangement in a crystal and the requirement that being a word of English places on sequences of letters. An example of higher order redundancy is the long-range correlations imposed by being a sequence from a possible lost Shakespearean play or being a sequence of letters from a PhD thesis.
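The claim that correlations entail descriptive redundancy can be given a small numerical illustration. The sketch below is an assumption-laden toy rather than Shannon's decomposition into redundancy orders: it takes three binary components A, B and C, treats the listed joint states as equally likely, and measures redundancy as the sum of the separate entropies minus the joint entropy (a quantity sometimes called total correlation). All names are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Hashable, List, Sequence, Tuple


def entropy_bits(outcomes: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of the outcomes."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def redundancy_bits(states: List[Tuple[int, ...]]) -> float:
    """Bits saved by describing the components jointly rather than separately."""
    n_components = len(states[0])
    separate = sum(
        entropy_bits([state[i] for state in states]) for i in range(n_components)
    )
    return separate - entropy_bits(states)


# Independent components: every combination of values occurs.
independent = list(product([0, 1], repeat=3))

# Fully correlated components: B and C always copy A.
correlated = [(0, 0, 0), (1, 1, 1)]

print(redundancy_bits(independent), redundancy_bits(correlated))
```

In the correlated case, specifying A together with the relation "B and C copy A" does the work of three separate one-bit descriptions, which mirrors the replacement of {A(X), B(X), C(X)} by {A(X), R(A,B,C)}.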

To be organised requires redundancy. But real systems show various combinations of high and low order redundancy, local and global redundancy. This provides an internal richness to the notion of organisation. It also undermines any attempt to provide a simple univocal redundancy correlate of organisation. A significantly organised system is not maximally complex, because of its redundancy (more internally ordered than a gas), but it is not maximally ordered either, because of its higher order correlations (less ordered than a crystal).

Because the information in organised systems involves large numbers of components considered together without any possibility of simplification to logically additive combinations of subsystems, computation of the surface form from the maximally compressed form (typically an equation) requires many individual steps, i.e. it has considerable logical depth (Li and Vitányi 1990 pg. 238). Of course, this measure applies whether or not we regard the order as epistemically hidden or buried. Formally, logical depth is a measure of the minimal computation time (in number of computational steps) required to compute an uncompressed string from its maximally compressed form.

C.H. Bennett has proposed that logical depth is a suitable measure of the organisation in a system. However, while adding more components to a system at the same redundancy level will not increase the system organisation, only the size of the system organised, it will increase its depth because the sheer length of the sequence to be computed has increased. All sequences of n identical entries are intuitively equally trivial; however, the depth of each string depends on the depth of n itself. This effect can be made negligible if we consider only relative depth: the depth of a sequence relative to the depth of the length of the sequence. The relative depth itself of a sequence of n identical entries is no more than the depth required to specify the entry itself (and negligible if the entry is 0 or 1). In the case of adding identical components to a system the relative depth does not increase since the depth of a component is already included in the original system relative depth. It is not transparent whether relative depth deals satisfactorily with all possible cases of this kind, but it is a reasonable, and plausibly sufficient, refinement of logical depth simpliciter to adopt.

When we observe organisation we can reasonably infer that it is the result of a dynamical process that can produce depth. The most likely source of the complex connections in an organised system is an historically long dynamical process. Bennett recognised this in the following conjecture:

A structure is deep, if it is superficially random but subtly redundant, in other words, if almost all its algorithmic probability is contributed by slow-running programs. ... A priori the most probable explanation of 'organized information' such as the sequence of bases in a naturally occurring DNA molecule is that it is the product of an extremely long biological process. (Bennett, 1985; quoted in Li and Vitányi, 1990: 238)

The converse of Bennett's claim is not generally true: a system's being the product of an extremely long process does not ensure that it will contain a lot of organised information. There are further problems about how depth in material systems might arise, and why it seems to be favoured. For further detailed discussion, see Collier and Hooker (1999).

 

Communications Theory

Communications theory was the first applied mathematical theory of information developed. For practical reasons involving technological applications in the communications and computation industry, it is the one that has been pursued the furthest. Communications theory is the theory of the evaluation and control of the probability of transmission of messages with specified accuracy in the presence of noise, including transmission failure, distortion and accidental additions. Its basic elements are a message source, an encoder, a channel over which the message is transmitted, a decoder, and a message recipient. Numerically, information is measured in bits (short for binary digits). One bit is equivalent to the choice between two equally likely choices. For several equally likely choices, the number of bits is the base two logarithm of the number of choices. When the choices are not equally probable, the information is the negative of the sum of the base two logarithms of the probabilities of the choices, each weighted by the probability of that choice, yielding an equation similar in form to that for entropy in Boltzmann's statistical thermodynamics. The greater the information in a message, the more possible cases it rules out, i.e., the more specific it is, and the less likely it is to be true. Because a less likely message is more surprising, the information is sometimes called the surprisal. Any message of equal length to a maximally unlikely message, but itself less than maximally unlikely, must contain some redundancy. In a channel with no noise, the maximal information capacity can be gained by coding to eliminate redundancies in the source. In the presence of noise, which introduces equivocation into the message, reducing its probability of transmission, clever coding can reduce the loss, but at the expense of greater redundancy. This places an upper limit on the probability of transmission of a message in a noisy channel. 
Because of the provable existence of maximally efficient codings for any message, for any given channel there is a limiting capacity or rate at which it can carry information, expressed in bits per second. Once the information content and channel capacity are calculated, specific coding techniques can be used to control errors in the channel. The communication problem is to maximise the mutual information of the source and receiver. The mutual information can be expressed as the intersection of the information in the source and in the receiver, and in bits is the base two logarithm of the correlation of the source and receiver. Most of the fundamentals were first presented in Claude Shannon's painstaking seminal work (1949). The theory does strikingly well in defining the engineering requirements and limitations of communications systems.
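The numerical measures just described can be written out as a short sketch. The function names are illustrative, and the mutual information is computed with the standard formula over a joint distribution of source and receiver symbols, which is an assumption about how the informal "intersection" of information is to be made precise.

```python
import math
from typing import Sequence


def bits_for_equal_choices(n_choices: int) -> float:
    """Information, in bits, of one choice among n equally likely alternatives."""
    return math.log2(n_choices)


def entropy(probs: Sequence[float]) -> float:
    """Probability-weighted sum of surprisals -log2(p), in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint: Sequence[Sequence[float]]) -> float:
    """Bits shared by source (rows) and receiver (columns) of a joint distribution."""
    p_source = [sum(row) for row in joint]
    p_received = [sum(column) for column in zip(*joint)]
    return sum(
        p * math.log2(p / (p_source[i] * p_received[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


fair_coin = entropy([0.5, 0.5])          # two equally likely choices
four_way = entropy([0.25] * 4)           # four equally likely choices
biased_coin = entropy([0.9, 0.1])        # unequal choices carry less information

noiseless = mutual_information([[0.5, 0.0], [0.0, 0.5]])      # receiver copies source
pure_noise = mutual_information([[0.25, 0.25], [0.25, 0.25]])  # receiver independent

print(fair_coin, four_way, biased_coin, noiseless, pure_noise)
```

The two channel examples bracket the communication problem: with no noise all of the source's one bit reaches the receiver, while with a receiver statistically independent of the source nothing is transmitted.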

Despite this success, the theory has metaphysical difficulties, since it requires quantification over an ensemble of states, most of which are often non-existent. This is not a problem when we are concerned only with capacities or potentials of communications channels, but presents problems when the theory is applied, as it often is, to the information content of individual messages or even to specific information sources. The problem concerns the grounding of the probabilities used to compute the information contents. If a source is ergodic, originally meaning that energy alone describes the dynamical state of the system, but now usually interpreted as the ensemble average of the source almost certainly equalling the time average of the source, the probabilities in the ensemble can be understood in terms of potential emissions at some time. Unfortunately, ergodicity in real sources is usually trivial or very hard to establish. The problem of grounding the ensemble probabilities is often dealt with by using operational procedures that vary according to the details of the case. The success of these methods depends on the reliability of the approximations for the problem to be solved as well as the nature of the source. Inappropriate choices can lead to perfectly justifiable formal measures that are nonetheless intuitively wildly unsatisfactory. For example, segments of the decimal expansion of π show no obvious regularity if sampled by standard statistical methods (very recent work may have refuted this), but are highly correlated with ubiquitous physical and mathematical functions involving π either explicitly or implicitly. This ergodic problem is part of the reason for the proliferation of information and entropy measures. It should be noted that an analogous problem is also foundational in statistical mechanics.

 

Semantic Information

A rigorous account of semantic information remains an elusive object of desire. Early attempts were made within the Logical Empiricist approach to language. Carnap and Bar-Hillel (Bar-Hillel, 1964) used the resources of inductive logic to define the information content of a statement in a given language in terms of the possible states it rules out. For "technical reasons" they calculate the states ruled out as a number of state descriptions. A state description is a conjunction of atomic statements assigning each primitive monadic predicate or its negation (but never both) to each individual constant of the language. The information content of a statement is thus relative to a language. Evidence, in the form of observation statements, contains information in virtue of the class of state descriptions the evidence rules out. (They assumed that observation statements can be connected to experience unambiguously.) Information content, then, is inversely related to probability, as intuition would suggest. Our pre-systematic intuitions, though, confuse two different measures of information content, both of which have plausible but incompatible properties. The first measure of the information content of statement S is called the content measure, cont(S). It is defined as the complement of the a priori probability that S is true:
[1]     cont(S) = 1 - prob(S)
This measure fails the additivity condition, according to which the combined information content of two inductively independent statements should be the sum of their individual information contents (Bar-Hillel, 1964: 302). It also fails some natural assumptions about conditional information. These problems motivated the introduction of another measure, called the information measure, inf(S):
[2]     inf(S) = log_2 (1/(1 - cont(S))) = -log_2 prob(S)
The value of this measure is in bits. Although inf satisfies additivity and conditionalisation requirements, it has a property that some people find counterintuitive. If some evidence E is negatively relevant to a statement S, then the information measure of S conditional on E will be greater than the absolute information measure of S. This violates a common intuition that the information of S given E must be less than or equal to the absolute information of S. The content measure, cont(S), does satisfy this intuition (Bar-Hillel, 1964: 306-7). I do not share this widespread intuition since it requires effort to correct the inference based on E that S is less likely. A more serious problem with the whole approach is the linguistic relativity of information, and problems with the Logical Empiricist program that supports it, such as what has somewhat misleadingly been called the theory ladenness of observation (Collier 1990).
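The contrasting behaviour of the two measures can be checked with made-up probabilities. In the sketch below the numerical values are arbitrary assumptions, and the conditional measures are taken to be differences, cont(S given E) = cont(E and S) - cont(E) and likewise for inf, which is my reading of the Carnap/Bar-Hillel definitions rather than something stated in the text above.

```python
import math


def cont(prob: float) -> float:
    """Content measure [1]: complement of the a priori probability."""
    return 1.0 - prob


def inf(prob: float) -> float:
    """Information measure [2], in bits."""
    return -math.log2(prob)


# Additivity for two inductively independent statements (assumed probabilities).
p_s1, p_s2 = 0.5, 0.25
p_both = p_s1 * p_s2  # independence: the probabilities multiply

inf_is_additive = math.isclose(inf(p_both), inf(p_s1) + inf(p_s2))
cont_is_additive = math.isclose(cont(p_both), cont(p_s1) + cont(p_s2))

# The two forms of definition [2] agree.
forms_agree = math.isclose(inf(p_s1), math.log2(1.0 / (1.0 - cont(p_s1))))

# Negative relevance: evidence E lowers the probability of S (assumed values).
p_s, p_e, p_s_given_e = 0.5, 0.5, 0.25
p_e_and_s = p_e * p_s_given_e

inf_s_given_e = inf(p_e_and_s) - inf(p_e)     # equals -log2 prob(S given E)
cont_s_given_e = cont(p_e_and_s) - cont(p_e)

print(inf_is_additive, cont_is_additive, forms_agree)
print(inf(p_s), inf_s_given_e)
print(cont(p_s), cont_s_given_e)
```

With these values the conditional inf exceeds the absolute inf of S, while the conditional cont stays below the absolute cont, which is the pattern the paragraph above describes.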

More recent approaches start with meaningful representations and try to specify their interpretation, making use of available empirical constraints (Barwise and Perry 1983, Dretske 1981, Israel and Perry 1990, Devlin 1991). On this view, the interpretation of a representation is given in terms of the information it conveys. Unlike the formal approach, in which information content is determined entirely by the structure of language (or other representational system), information in this approach is the content (or factual content) of a representation (or information-report).

The goal of this approach is to connect meaningful representations to the concrete situations represented. In their situation semantics, Perry and Barwise base this connection on nomic regularities, called constraints (Israel and Perry 1990). Information is conveyed to us by causal chains connecting situations in a lawful way. The information indicated by a situation is relative to the causal chains connecting the indicating situation both to our beliefs and to the situation the information is about. Thus, "[t]he information a factual state of affairs carries is relative to a constraint". Complete determination of the reference of a representation (at least in cases involving indexicality) also requires specific circumstances. The information content of a representation available to us is delimited by our ability to invoke relevant constraints in the circumstances.

Situation semantics requires that there is something "out there in the world" that can be transmitted to intelligent beings who can understand the information it contains, and pass it around among themselves. The Barwise/Perry approach needs an information-theoretic account of nomic regularities and causal interactions, and of the transmission of the information these nomic regularities and causal interactions contain. Collier (1990, 1999) has offered one such approach based in physical interpretations of information. Barwise and Seligman's (1997) seminal work approaches the issue from a more formal direction. It describes the mathematical structure of information flow in terms of classifications of tokens under types, related through infomorphisms that preserve the structure of a classification across changes in both classifications and tokens.

Superficially, Dretske's (1981) approach to information resembles the Carnap/Bar-Hillel approach. He also defines the information content of a piece of evidence in terms of the cases ruled out. A major difference is that Dretske does not try to specify representations purely syntactically. Rather than calculating the information content of statements, he uses states of affairs directly. His measure of information is similar to the inf definition (Dretske 1981: 52):

[3]     I(s) = -log2 prob(s)   (in bits)
where prob(s) is the probability of the state of affairs s. The use of states of affairs has the potential to avoid the problems of relativity to language that plague the formal approach (see Bell and Demopoulos 1998 for details of the logic of the relativity problem).

Dretske held that "[t]he ultimate source of intentionality inherent in the transmission and receipt of information is, of course, the nomic regularities on which the transmission of information depends" (1981: 76). This is similar to Barwise and Perry's placement of meaning in the world. Dretske's definition of information in terms of the cases ruled out might seem to fall afoul of the problem of background knowledge that plagues the Carnap/Bar-Hillel approach. However, information is transmitted (perhaps indirectly) from structure to structure according to causal laws. If it is transmitted to a structure with the right order of intentionality, the causal constraints imply reliable belief. The causal processes producing beliefs eliminate other possibilities from consideration. Linguistic relativity and related problems are mitigated by allowing information to exist in the non-mental world as well as in the mind and as a purely formal abstraction. Dretske defines three orders of intentionality in order to admit higher order cognitive states. The first requires that all Fs are Gs, S has the content that t is F, and S does not have the content that t is G, where S is a structure, and t an object. The second order requires of the first condition that it is a natural law that Fs are Gs, and the third requires that it is analytically necessary that Fs are Gs. Dretske notes that the second and third orders don't have a clearly defined boundary, but he calls any propositional content exhibiting the third order of intentionality a semantic content. His definitions require that beliefs have higher order intentionality than structures with respect to information content; first order cases do not have the systematic content required for belief (though they can qualify as awareness or sensation).
Dretske holds that the higher orders are formed from the lower orders through a process of digitalisation of the information from the analogue form in which it is received, where a digital representation has a form containing all and only the information of its semantic content. First order intentionality is analogue and vague (a little like C. I. Lewis's ineffability of the given, or James's "blooming, buzzing confusion").

Perry has expanded on this idea by noting that we get information by making discriminations or distinctions within a context, thereby specifying which of several possibilities we mean. To be successful in these discriminations, the distinctions must also exist elsewhere. To be successful in making use of our semantic discriminations in our interactions with the world, there must be appropriately correlated distinctions in the world. Much of the work in situation semantics involves unpacking "discriminations", "interactions", "relevant" and "appropriate" in logical and information theoretic terms (see Devlin 1991 for a recent account).

 

Physical Information

The physical information of a system is closely connected to its entropy, which is, very roughly, a measure of the objective disorder of the system. The Second Law of Thermodynamics requires that the entropy of an isolated system cannot decrease with time. This means, again very roughly, that only some energy within an isolated system (and more generally, in all connected systems) is available for work, and that this available energy never increases. This was deeply disturbing to the Victorian mind. Maxwell posited "a very observant and neat-fingered being", now known as Maxwell's demon, that sits by a frictionless door between two chambers A and B, initially at the same temperature. The demon opens the door whenever either a relatively fast moving molecule moves towards it from B, or a relatively slow moving molecule moves towards it from A. Gradually, the manipulations of the demon lead, without the expenditure of available energy, to a sorting of the fast moving molecules into A and the slow moving molecules into B. This lowers the temperature in B relative to A, decreasing the total entropy of the system and apparently violating the Second Law. It was quickly evident that a purely mechanical demon was not possible.

Szilard (1929) developed an idealised argument involving a single particle on one side or another of a piston that excludes a demon that detects molecules with radiation, showing that each molecule detected required the production of an amount of entropy equal to the amount lost by sorting it, thus tying detection to entropy increase. Schrödinger (1944) proposed that order as found in macromolecules that carry biological information was the negative of entropy, or negentropy. Schrödinger said that he used negative entropy rather than free energy because of misunderstandings of the relation of the technical notion of free energy to the common notions of free and energy, and traced the idea back to Boltzmann. Brillouin (1962) formalised this idea and related it to the Shannon information of communications theory (others developing similar ideas were Gabor, Raymond and Rothstein; see Leff and Rex, 1990). The negentropy principle of information implies that no physical entity can use information in a physical system to lower its entropy. In particular, it implies that any measurement requires the dissipation of a minimal amount of energy. It is worth noting that Shannon entropies have the same mathematical form as physical entropy, but correspond more closely to negentropy in most applications. The difference doesn't matter much to abstract communications theory, which quantifies over ensembles of messages, most of which are fictional, but when we turn to concrete particular messages, such as the information in a measurement, the difference becomes crucial. Shannon entropy can be decreased by a passive filter, but physical entropy, by the Second Law, cannot. This means that the Shannon entropy of a source must be negentropic to be measured.
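
The claim that a passive filter can decrease Shannon entropy can be illustrated with a toy ensemble. This is only a sketch under stated assumptions: the four-message source, its probabilities, and the helper names are hypothetical, and the example models only the formal ensemble, not any physical system.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def passive_filter(dist, allowed):
    """Pass only the allowed messages and renormalise what remains."""
    kept = {m: p for m, p in dist.items() if m in allowed}
    total = sum(kept.values())
    return {m: p / total for m, p in kept.items()}

# Hypothetical source: four equiprobable messages.
source = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
filtered = passive_filter(source, {"a", "b"})

h_source = shannon_entropy(source.values())
h_filtered = shannon_entropy(filtered.values())
print(h_source, h_filtered)
```

The filter does no work on the messages; it simply discards some of them, and the entropy of the ensemble that passes through is lower than that of the source. No comparable passive operation is available for the physical entropy of an isolated system.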

A second approach to physical information is through the physics of computation. Rolf Landauer, noting that some computations are logically reversible, asked whether physical computation is reversible. He concluded that the only essentially irreversible step is erasure. It is possible to make a computer without erasure, as shown by Fredkin and others. However, for computations showing other than logical equivalence, large amounts of waste storage are produced. A computer can be implemented on a system of colliding elastic balls, so at least a reversible physical implementation of a general purpose computer is possible in principle. Erasure corresponds to loss of information, and waste of unusable storage, its reversible equivalent, corresponds to information that cannot be used for further computations without an equivalent loss. The parallel to the Second Law of Thermodynamics did not go unnoticed, and Charles Bennett (1987) argued that Maxwell's demon failed because it must erase information. Collier (1990) argued that the demon fails because it can only make accessible the information required for manipulating the macrostate so as to reduce entropy by making an equivalent or greater amount of information inaccessible in the sense mentioned previously, thereby lowering the entropy. Earman and Norton (1999) argue for the irrelevance of information theory to the explanation of the Second Law, echoing claims by the Denbighs. The resolution of the issue requires a deeper understanding of the problem the demon has to solve.
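
The difference between reversible computation and erasure can be made concrete. The sketch below is an illustration under assumptions the text does not supply: it uses the controlled-swap gate usually called the Fredkin gate as a stand-in for a reversible logic element, and it evaluates the standard form of Landauer's bound, kT ln 2 per erased bit, at an assumed temperature of 300 K.

```python
import math
from itertools import product

def fredkin(c, a, b):
    """Controlled swap: exchange a and b when the control bit c is 1."""
    return (c, b, a) if c == 1 else (c, a, b)

def erase(bit):
    """Erasure: reset a bit to 0 whatever its value (two inputs, one output)."""
    return 0

inputs = list(product((0, 1), repeat=3))
outputs = [fredkin(*bits) for bits in inputs]

# Reversible: the gate permutes the 8 inputs and undoes itself when applied twice.
is_bijection = sorted(outputs) == inputs
is_self_inverse = all(fredkin(*fredkin(*bits)) == bits for bits in inputs)

def reversible_and(x, y):
    """AND without erasure: returns the result and the extra bits that must be kept."""
    c, waste, result = fredkin(x, y, 0)
    return result, (c, waste)

# Landauer's bound on the heat dissipated by erasing one bit (assumed T = 300 K).
BOLTZMANN = 1.380649e-23  # J/K
landauer_joules = BOLTZMANN * 300.0 * math.log(2)

print(is_bijection, is_self_inverse, reversible_and(1, 1), landauer_joules)
```

The extra bits returned by reversible_and are the waste storage mentioned above: they are needed to run the computation backwards, and discarding them is an erasure.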

Maxwell himself used the demon in an argument that the Second Law was statistical in nature, and was subject to exceptions, though these were highly unlikely. Explaining the statistical nature of entropy led to the ergodic problem, which is the problem of how the state parameters of a system whose components have significant spatial and momentum parameters could depend on energy alone. Research in ergodic theory has been extensive, but has drifted away from the original problem, which remains unresolved except for some very special cases (Sklar 1993, 1996). The connections among computation, chance and probability, as well as the demon problem, through information theory suggest information might play a central role in understanding the Second Law, despite cogent arguments to the contrary.

Information theory is connected to the problem of the direction of time through thermodynamics as well as through the asymmetry in our information about the past and the future, and through related asymmetries in causal processes. The significance of the asymmetries is a subject of much current debate.

Information theory with a physical interpretation has been applied to biology, with limited success so far. Notable attempts are Gatlin (1976), Holzmüller (1984), Küppers (1990), Kauffman (1993) and Brooks and Wiley (1988). None of this work has yet been widely accepted in the biology community.

Measurement

Measurement involves getting information about a source via a physical process. This requires the transmission of information from the source, or from something that contains information about the source. Essentially, measurement is a co-ordination problem, in which the mutual information of the source and the result must be maximised through some physical process or processes. This appears to be a problem in communication theory, and it is at least that, but further issues involve the role of natural laws, theory, and often tacit auxiliary assumptions in specifying both what is measured and its significance. Some of these problems converge with the problems concerning semantic information, discussed above. In particular, tacit assumptions place constraints on the interpretation of observations, and causal processes convey information from what is measured to the measuring device. Independent of Quantum Mechanics, the non-existence of a Maxwellian demon places limits on the total information that can be extracted in any particular measurement process because measurement requires the expenditure of available energy, or exergy. This places the sort of physical limits on the accuracy of measurement discussed by Brillouin (1962, Chapter 16).
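
The quantity to be maximised, the mutual information of source and result, can be computed directly from a joint probability table. The sketch below is illustrative only: the two-state source and the three hypothetical instruments (perfect, noisy, useless) are assumptions, not examples from the measurement literature.

```python
import math

def mutual_information(joint):
    """Mutual information in bits from a joint table joint[source][result]."""
    p_src = [sum(row) for row in joint]        # marginal over source states
    p_res = [sum(col) for col in zip(*joint)]  # marginal over readings
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                total += p * math.log2(p / (p_src[i] * p_res[j]))
    return total

# Hypothetical instruments reading a two-state source.
perfect = [[0.5, 0.0], [0.0, 0.5]]      # reading always matches the source
noisy = [[0.4, 0.1], [0.1, 0.4]]        # reading matches 80% of the time
useless = [[0.25, 0.25], [0.25, 0.25]]  # reading independent of the source

for name, table in (("perfect", perfect), ("noisy", noisy), ("useless", useless)):
    print(name, mutual_information(table))
```

On this picture, improving a measurement procedure amounts to moving its joint table away from the independent case and towards the perfectly correlated one.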

The measurement process itself can be thought of as a source, coding, channel transmission and decoding process, much as in a communications channel. For some purposes, for example in seismology, where the physics of the source and its connection to the channel are well understood, this model can be quite useful. It is also useful for determining the sensitivity of observations, and the amount of information that can be conveyed by a particular experiment. In many if not most cases, though, the analogues to coding and the channel, not to mention decoding, are very unclear, and involve problems of observational dependence on theory and related issues. For these same reasons, Barwise and Seligman's (1997) approach to information flow is not immediately helpful, depending, as it does, so heavily on knowing the classifications in the infomorphisms.

One area that has not been investigated as well as it should be is the role of distinctions in crucial experiments, with an eye to how theory based semantic distinctions connect to experimental distinctions. Testing in general involves classification, and seems ripe for information theoretic analysis.

Causation

Information theory has a bearing on a number of the characteristics of physical causation. Some involve the temporal asymmetries mentioned previously. One prominent approach to causation, the mark approach initiated by Reichenbach (1956, 1958) and furthered by Salmon, defines a causal process as one that can bear information, and causal interactions in terms of forks in causal processes. Causal forks exhibit the probabilistic relations dealt with in theories of probabilistic causation. Collier (1999) has given a definition of causal process and causal forks in terms of physical information theory based on the algorithmic model, from which the necessity and other modal properties of causation and natural laws follow naturally. The main problem with this approach, aside from the obscurity of the resources it uses relative to commonly understood ideas, is a possible circularity in the notion of information transfer that may also infect similar accounts.

Natural Laws

As usually conceived, natural laws have the role of axioms for the world. Therefore, the question of the information content of natural laws makes sense in the same way as the question of the information content of an axiomatic formal system makes sense. In both cases, it is the abstract structure of the system, whether laws or axioms, that is relevant to the algorithmic complexity. Axiomatic theories and mathematical models can be treated in the same way. Some attempts, e.g., by Brillouin, have been made to determine the information content of empirical laws, and a number of others have noted that simplicity of theories and compression are connected, but so far there is no canonical way to make the connection. The situation is likely to be analytically intractable for reasons mentioned in the discussions of various mathematical approaches to information theory, but some progress has been made with mathematical models by using minimum message length and minimum description length techniques.

Perception and Epistemology

Dretske's (1981) account of perceptual knowledge remains one of the most advanced philosophical accounts based on information theory. It is based in causal constraints, requiring probability one that the information represents the object for perceptual beliefs, and is thus a reliability account (no justification required). His three levels of intentionality, which distinguish between digital and analogue information, allow us to distinguish between simple perception and perceptual beliefs.

Other approaches use evolutionary considerations to yield a naturalised epistemology based on information flow, but relaxing various requirements of Dretske's account. A completely different approach uses Bayesian methods together with the idea that knowledge is a correlation of mental state with the world. This approach gives up the requirement of probability one. It is also a naturalised approach, since prior probabilities are required, and evolution is the most natural source of these. One account of how an initial reliability can be established was given by Mohan Matthen (1988). Grandy (1987) has extended the correlation account to take practical considerations of survival into account, which creates some problems for the pure correlation account, including Dretske's account.

Two approaches that avoid evolutionary considerations make use of the idea of compression. The Minimum Message Length (MML) approach was developed by Wallace (see 1999) and the Minimum Description Length (MDL) approach was developed by Rissanen (1989). The basic idea of both accounts is to find a minimal message that encodes binary coded data about the real world, though the best that can be achieved in most cases, given the noncomputability of the shortest string, is a probability distribution over a set of strings that gives a model of the probability that the data represented by a string is true of the real world. The two approaches have some differences, over which there has been some dispute. Part of this may stem from differing intuitions about the nature of the task. Wallace and his colleague David Dowe see their approach as fundamentally Bayesian, whereas Rissanen sees his approach as giving an actual hypothesis about the world, suggesting he sees the process embedded in an "epistemic engine" in which the strings have a natural interpretation. When I mentioned this to Dowe, he found the idea preposterous. In any case, both approaches have had some success with restricted data sets, such as DNA strings.
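
The shared compression idea can be sketched with a crude two-part code: a message that first states a model and then states the data given that model. This is not Wallace's MML or Rissanen's MDL as those authors formulate them; the Bernoulli model, the quantisation of its parameter to bin midpoints, and the function names are all assumptions made for illustration.

```python
import math

def two_part_length(data, precision_bits):
    """Bits to state a quantised Bernoulli parameter plus the data given it."""
    n, ones = len(data), sum(data)
    levels = 2 ** precision_bits
    best = None
    # Candidate parameters are the midpoints of `levels` equal bins on (0, 1).
    for k in range(levels):
        p = (k + 0.5) / levels
        data_bits = -(ones * math.log2(p) + (n - ones) * math.log2(1 - p))
        if best is None or data_bits < best:
            best = data_bits
    return precision_bits + best

def best_precision(data, max_bits=8):
    """Precision (in bits) that minimises the total two-part message length."""
    return min(range(max_bits + 1), key=lambda b: two_part_length(data, b))

biased = [1] * 90 + [0] * 10  # hypothetical binary data with a strong bias
balanced = [1, 0] * 50        # hypothetical binary data with no bias

for name, data in (("biased", biased), ("balanced", balanced)):
    b = best_precision(data)
    print(name, b, two_part_length(data, b))
```

Spending a few bits on the parameter pays for itself on the biased data, because the data can then be encoded in far fewer than one bit per symbol. On the balanced data the cheapest message states no parameter at all. The trade-off between model cost and data cost is what both approaches seek to optimise.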

John Dorling (1991) has a more ambitious project of basing theory construction on the minimisation of information relative to data, the best theory being the one that most compresses the data. Brillouin earlier tried a similar approach, trying to determine the information in a theory, but it drew little attention, and had little success. Some of the problems are mentioned in the previous section.

 

Philosophy of Mind

The neutrality of syntactic information between the dynamical and logical, the representational and the represented has been noted by a number of authors (e.g., Sayre, Maturana and Varela 1980, Kampis 1991 and Devlin 1991) as relevant to the Philosophy of Mind. Some of the issues have been discussed under semantics, causation and perception. Whether there is more that information theory can offer the philosophy of mind is open to debate. The Dretskean view of information flow dovetails nicely with computational accounts of mind, and evolutionary accounts of perceptual information fit well with dynamical accounts of mind. No current approach makes deep connections between these larger approaches, however.

A second question is whether information theory has anything to say about traditional problems in the Philosophy of Mind such as the mind-body problem, the problem of intentionality, and the "hard problem" of consciousness. At this stage it seems unlikely that it will help with these problems in their traditional form, but it might be helpful in reformulating the problems in a more intelligible form. For example, Dretske's three levels of intentionality, though hardly providing a complete solution to the problem, suggest that the problem of intentionality is not a single problem, and that different informational states have differing causal and logical properties relevant to representation. Possibly, the traditional questions are the wrong questions to ask, or are at least too confused to have coherent answers.

 

Game Theory and Economics

One of the main issues in game theory is how to deal with the imperfect information each player has about the others' strategies. This is taken up in the article on game theory. The problem is especially difficult in cases of changing information. Since the information states are relevant to determining what game is being played, the dynamics of information is fundamental to useful application of game theory.

 

Bibliography

Resources

A readable and non-mathematical introduction to the issues involving information discussed here is Paul Young's The Nature of Information (1987). There is no compendium of mathematical information theory that covers all aspects of the topic. Kolmogorov's "Three Approaches to the Quantitative Definition of Information" (1965) lays out the basics nicely. Li and Vitányi (1993) review the scope of algorithmic information theory fairly completely. Calude (1994) is a basic text on information and randomness. Ingarden et al. (1997) review some central principles, and applications to dynamical systems. Shannon's original paper on communications theory (1949) is still unsurpassed as a source on this topic. Chaitin's Algorithmic Information Theory (1987) expresses the main results of metalogic in terms of information theory. Unfortunately, his account is somewhat inaccessible due to his use of LISP as his formal language. See Boolos and Jeffrey (xx) for a more familiar approach. Keith Devlin's Logic and Information (1991) reviews the basic results of mathematical information and its connections to logic, mental states, perception and action and situation semantics. Jon Barwise and Jerry Seligman's Information Flow (1997) sets new standards for discussion of the links between information and classifications of tokens by types. As the discussion in this entry has indicated, these issues are central to the role of information theory in a range of philosophical and scientific endeavours. Leon Brillouin's Science and Information Theory (1962) is a classic source for the connections between information theory and physics. A general but not completely reliable introduction to the issues is Jeremy Campbell's Grammatical Man (1982). Leff and Rex (1990) have collected central papers to 1990 in Maxwell's Demon: Entropy, Information, Computing. A classic philosophical source on information and perception is Fred Dretske's Knowledge and the Flow of Information (1981).
The other areas covered in this article are still too ill-formed or too controversial to have reliable canonical texts.

References

Other Internet Resources

Copyright © 1999 by
John Collier
pljdc@alinga.newcastle.edu.au


First posted: February 19, 1999
Last modified: May 3, 2002