
The Classical Theory of Concepts

The classical theory of concepts is one of the five primary theories of concepts, the other four being prototype or exemplar theories, atomistic theories, theory-theories, and neoclassical theories. The classical theory implies that every complex concept has a classical analysis, where a classical analysis of a concept is a proposition giving metaphysically necessary and jointly sufficient conditions for being in the extension across possible worlds for that concept. That is, a classical analysis for a complex concept C gives a set of individually necessary conditions for being a C (or conditions that must be satisfied in order to be a C) that together are sufficient for being a C (or are such that something’s satisfying every member of that set of necessary conditions entails its being a C). The classical view also goes by the name of “the definitional view of concepts,” or “definitionism,” where a definition of a concept is given in terms of necessary and jointly sufficient conditions.

This article provides information on the classical theory of concepts as present in the historical tradition, on concepts construed most generally, on the nature of classical conceptual analysis, and on the most significant of the objections raised against the classical view.

Table of Contents

  1. Historical Background and Advantages of the Classical View
  2. Concepts in General
    1. Concepts as Semantic Values
    2. Concepts as Universals
    3. Concepts as Mind-Dependent or Mind-Independent
    4. Concepts as the Targets of Analysis
    5. The Classical View and Concepts in General
  3. Classical Analyses
    1. Necessary and Sufficient Conditions
    2. Logical Constitution
    3. Other Conditions on Classical Analyses
    4. Testing Candidate Analyses
    5. Apriority and Analyticity with respect to Classical Analyses
  4. Objections to the Classical View
    1. Plato’s Problem
    2. The Argument from Categorization
    3. Arguments from Vagueness
    4. Quine’s Criticisms
    5. Scientific Essentialist Criticisms
  5. References and Further Reading

1. Historical Background and Advantages of the Classical View

The classical view can be traced back to at least the time of Socrates, for in many of Plato’s dialogues Socrates is clearly seeking a classical analysis of some notion or other. In the Euthyphro, for instance, Socrates seeks to know the nature of piety. Yet what he seeks is not given in terms of, for example, a list of pious people or actions, nor is piety to be identified with what the gods love. Instead, Socrates seeks an account of piety in terms of some specification of what is shared by all things pious, or what makes pious things pious; that is, he seeks a specification of the essence of piety itself. The Socratic elenchus is a method of finding out the nature or essence of various kinds of things, such as friendship (discussed in the Lysis), courage (the Laches), knowledge (the Theaetetus), and justice (the Republic). That method of considering candidate definitions and seeking counterexamples to them is the same method one uses to test candidate analyses by seeking possible counterexamples to them, and thus Socrates is in effect committed to something very much like the classical view of concepts.

One sees the same sort of commitment throughout much of the Western tradition in philosophy from the ancient Greeks through the present. Clear examples include Aristotle’s notion of a definition as “an account [or logos] that signifies the essence” (Topics I) by way of a specification of essential attributes, as well as his account of definitions for natural kinds in terms of genus and difference. Particular examples of classical-style analyses abound after Aristotle: For instance, Descartes (in Meditation VI) defines body as that which is extended in both space and time, and mind as that which thinks. Locke (in the Essay Concerning Human Understanding, Ch. 21) defines being free with respect to doing an action A as choosing/willing to do A where one’s choice is part of the cause of one’s actually doing A. Hume defines a miracle (in Enquiry Concerning Human Understanding, §X) as an event that is both a violation of the laws of nature and caused by God. And so on. The classical view looks to be a presumption of the early analytic philosophers as well (with Wittgenstein being a notable exception). The classical view is present in the writings of Frege and Russell, and the view receives its most explicit treatment by that time in G.E. Moore’s Lectures on Philosophy and other writings. Moore gives a classical analysis of the very notion of a classical analysis, and from then on the classical view (or some qualified version of it) has been one of the pillars of analytic philosophy itself.

One reason the classical view has had such staying power is that it provides the most obvious grounding for the sort of inquiry within philosophy that Socrates began. If one presumes that there are answers to What is F?-type questions, where such questions ask for the nature of knowledge, mind, goodness, etc., then that entails that there is such a thing as the nature of knowledge, mind, goodness, etc. The nature of knowledge, for example, is that which is shared by all cases of knowledge, and a classical analysis of the concept of knowledge specifies the nature of knowledge itself. So the classical view fits neatly with the reasonable presumption that there are legitimate answers to philosophical questions concerning the natures or essences of things. As at least some other views of concepts reject the notion that concepts have metaphysically necessary conditions, accepting such other views is tantamount to rejecting (or at least significantly revising) the legitimacy of an important part of the philosophical enterprise.

The classical view also serves as the ground for one of the most basic tools of philosophy: the critical evaluation of arguments. For instance, one ground of contention in the abortion debate concerns whether fetuses have the status of moral persons or not. If they do, then since moral persons have the right not to be killed, generally speaking, it would seem to follow that abortion is immoral. The classical view grounds the natural way to address the main contention here, for part of the task at hand is to find a proper analysis of the concept of being a moral person. If that analysis specifies features such that not all of them are had by fetuses, then fetuses are not moral persons, and the argument against the moral permissibility of abortion fails. But without there being analyses of the sort postulated by the classical view, it is far from clear how such critical analysis of philosophical arguments is to proceed. So again, the classical view seems to underpin an activity crucial to the practice of philosophy itself.

In contemporary philosophy, J. J. Katz (1999), Frank Jackson (1994, 1998), and Christopher Peacocke (1992) are representative of those who hold at least some qualified version of the classical view. There are others as well, though many philosophers have rejected the view (at least in part due to the criticisms to be discussed in section 4 below). The view is almost universally rejected in contemporary psychology and cognitive science, due to both theoretical difficulties with the classical view and the arrival of new theories of concepts over the last quarter of the twentieth century.

2. Concepts in General

The issue of the nature of concepts is important in philosophy generally, but most perspicuously in philosophy of language and philosophy of mind. Most generally, concepts are thought to be among those things that count as semantic values or meanings (along with propositions). There is also reason to think that concepts are universals (along with properties, relations, etc.), and what general theory of universals applies to concepts is thus a significant issue with respect to the nature of concepts. Whether concepts are mind-dependent or mind-independent is another such issue. Finally, concepts tend to be construed as the targets of analysis. If one then treats analysis as classical analysis, and holds that all complex concepts have classical analyses, then one accepts the classical view. Other views of concepts might accept the thesis that concepts are targets of analysis, but differ from the classical view over the sort of analysis that all complex concepts have.

a. Concepts as Semantic Values

As semantic values, concepts are the intensions or meanings of sub-sentential verbal expressions such as predicates, adjectives, verbs, and adverbs. Just as the sentence “The sun is a star” expresses the proposition that the sun is a star, the predicate “is a star” expresses the concept of being a star (or [star], to introduce notation to be used in what follows). Further, just as the English sentence “Snow is white” expresses the proposition that snow is white, and so does the German sentence “Schnee ist Weiss,” the predicates “is white” in English and “ist Weiss” in German both express the same concept, the concept of being white (or [white]). The intension or meaning of a sentence is a proposition. The intensions or meanings of many sub-sentential entities are concepts.

b. Concepts as Universals

Concepts are also generally thought to be universals. The reasons for this are threefold:

(1) A given concept is expressible using distinct verbal expressions. This can occur in several different ways. My uttering “Snow is white” and your uttering “Snow is white” are distinct utterances, and their predicates are distinct expressions of the same concept [white]. My uttering “Snow is white” and your uttering “Schnee ist Weiss” are distinct sentences with their respective predicates expressing the same concept ([white], again). Even within the same language, my uttering “Grisham is the author of The Firm” and your uttering “Grisham is The Firm’s author” are distinct sentences with distinct predicates, yet their respective predicates express the same concept (the concept [the author of The Firm], in this case).

(2) Second, different agents can possess, grasp, or understand the same concept, though such possession might come in degrees. Most English speakers possess the concept [white], and while many possess [neutrino], not many possess that concept to such a degree that one knows a great deal about what neutrinos themselves are.

(3) Finally, concepts typically have multiple exemplifications or instantiations. Many distinct things are white, and thus there are many exemplifications or instances of the concept [white]. There are many stars and many neutrinos, and thus there are many instances of [star] and [neutrino]. Moreover, distinct concepts can have the very same instances. The concepts [renate] and [cardiate] have all the same actual instances, as far as we know, and so do [human] and [rational animal]. Distinct concepts can also have necessarily all of the same instances: For instance, the concepts [triangular figure] and [trilateral figure] must have the same instances, yet the predicates “is a triangular figure” and “is a trilateral figure” seem to have different meanings.

As universals, concepts may be treated under any of the traditional accounts of universals in general. Realism about concepts (considered as universals) is the view that concepts are distinct from their instances, and nominalism is the view that concepts are nothing over and above, or distinct from, their instances. Ante rem realism (or platonism) about concepts is the view that concepts are ontologically prior to their instances—that is, concepts exist whether they have instances or not. In re realism about concepts is the view that concepts are in some sense “in” their instances, and thus are not ontologically prior to their instances. Conceptualism with respect to concepts holds that concepts are mental entities, being either immanent in the mind itself as a sort of idea, as constituents of complete thoughts, or somehow dependent on the mind for their existence (perhaps by being possessed by an agent or by being possessible by an agent). Conceptualist views also include imagism, the view (dating from Locke and others) that concepts are a sort of mental image. Finally, nominalist views of concepts might identify concepts with classes or sets of particular things (with the concept [star] being identified with the set of all stars, or perhaps the set of all possible stars). Linguistic nominalism identifies concepts with the linguistic expressions used to express them (with [star] being identified with the predicate “is a star,” perhaps). Type linguistic nominalism identifies concepts with types of verbal expressions (with [star] identified with the type of verbal expression exemplified by the predicate “is a star”).

c. Concepts as Mind-Dependent or Mind-Independent

On many views, concepts are things that are “in” the mind, or “part of” the mind, or at least are dependent for their existence on the mind in some sense. Other views deny such claims, holding instead that concepts are mind-independent entities. Conceptualist views are examples of the former, and platonic views are examples of the latter. The issue of whether concepts are mind-dependent or mind-independent carries great weight with respect to the clash between the classical view and other views of concepts (such as prototype views and theory-theories). If concepts are immanent in the mind as mental particulars, for instance, then various objections to the classical view have more force; if concepts exist independently of one’s ideas, beliefs, capacities for categorizing objects, etc., then some objections to the classical view have much less force.

d. Concepts as the Targets of Analysis

Conceptual analysis is of concepts, and philosophical questions of the form What is F? (such as “What is knowledge?,” “What is justice?,” “What is a person?,” etc.) are questions calling for conceptual analyses of various concepts (such as [knowledge], [justice], [person], etc.). Answering the further question “What is a conceptual analysis?” is yet another way to distinguish among different views of concepts. For instance, the classical view holds that all complex concepts have classical analyses, where a complex concept is a concept having an analysis in terms of other concepts. Alternatively, prototype views analyze concepts in terms of typical features or in terms of a prototypical or exemplary case. For instance, such a view might analyze the concept of being a bird in terms of such typical features as being capable of flight, being small, etc., which most birds share, even if not all of them do. A second sort of prototype theory (sometimes called “the exemplar view”) might analyze the concept of being a bird in terms of a most exemplary case (a robin, say, for the concept of being a bird). So-called theory-theories analyze a concept in terms of some internally represented theory about the members of the extension of that concept. For example, one might have an overall theory of birds, and the concept one expresses with one’s use of ‘bird’ is then analyzed in terms of the role that concept plays in that internally represented theory. Neoclassical views of concepts preserve one element of the classical view, namely the claim that all complex concepts have metaphysically necessary conditions (in the sense that, for example, being unmarried is necessary for being a bachelor), but reject the claim that all complex concepts have metaphysically sufficient conditions. Finally, atomistic views reject all notions of analysis just mentioned, denying that concepts have analyses at all.

e. The Classical View and Concepts in General

The classical view claims simply that all complex concepts have classical analyses. As such, the classical view makes no claims as to the status of concepts as universals, or as being mind-dependent or mind-independent entities. The classical view also is consistent with concepts being analyzable by means of other forms of analysis. Yet some views of universals are more friendly to the classical view than others, and the issue of the mind-dependence or mind-independence of concepts is of some importance to whether the classical view is correct or not. For instance, if concepts are identical to ideas present in the mind (as would be true on some conceptualist views), then if the contents of those ideas fail to have necessary and sufficient defining conditions, then the classical view looks to be false (or at least not true for all concepts). Alternatively, on platonic views of concepts, such a lack of available necessary and jointly sufficient conditions for the contents of our own ideas is of no consequence to the classical view, since ideas are not concepts according to platonic accounts.

3. Classical Analyses

There are two components to an analysis of a complex concept (where a complex concept is a concept that has an analysis in terms of other “simpler” concepts): The analysandum, or the concept being analyzed, and the analysans, or the concept that “does the analyzing.” For a proposition to be a classical analysis, the following conditions must hold:

(I) A classical analysis must specify a set of necessary and jointly sufficient conditions for being in the analysandum’s extension (where a concept’s extension is everything to which that concept could apply). (Some classical theorists deny that all classical analyses specify jointly sufficient conditions, holding instead that classical analyses merely specify necessary and sufficient conditions.)

(II) A classical analysis must specify a logical constitution of the analysandum.

Other suggested conditions on classical analysis are given below.

a. Necessary and Sufficient Conditions

Consider an arbitrary concept [F]. A necessary condition for being an F is a condition such that something must satisfy that condition in order for it to be an F. For instance, being male is necessary for being a bachelor, and being four-sided is necessary for being a square. Such characteristics specified in necessary conditions are shared by, or had in common with, all things to which the concept in question applies.

A sufficient condition for being an F is a condition such that if something satisfies that condition, then it must be an F. Being a bachelor is sufficient for being male, for instance, and being a square is sufficient for being four-sided.

A necessary and sufficient condition for being an F is a condition such that not only must a thing satisfy that condition in order to be an F, but it is also true that if a thing satisfies that condition, then it must be an F. For instance, being a four-sided regular plane figure is both necessary and sufficient for being a square. That is, a thing must be a four-sided regular plane figure in order for it to be a square, and if a thing is a four-sided regular plane figure, then it must be a square. [The word “regular” means that all sides are the same length and all angles are equal.]

Finally, for a concept [F], necessary and jointly sufficient conditions for being an F is a set of necessary conditions such that satisfying all of them is sufficient for being an F. The conditions of being four-sided and of being a regular figure are each necessary conditions for being a square, for instance, and the conjunction of them is a sufficient condition for being a square.
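The four notions just described can be set out schematically. The following is only a sketch under stated assumptions: the box is read as metaphysical necessity, in keeping with the article's talk of conditions holding across possible worlds, and the letters N, S, C, and N_1 through N_k are placeholder predicates introduced here for illustration (the box symbol assumes the amssymb package).

```latex
\begin{align*}
\text{$N$ is necessary for being an $F$:} \quad
  & \Box\,\forall x\,(Fx \rightarrow Nx) \\
\text{$S$ is sufficient for being an $F$:} \quad
  & \Box\,\forall x\,(Sx \rightarrow Fx) \\
\text{$C$ is necessary and sufficient for being an $F$:} \quad
  & \Box\,\forall x\,(Fx \leftrightarrow Cx) \\
\text{$N_1,\dots,N_k$ are necessary and jointly sufficient:} \quad
  & \Box\,\forall x\,(Fx \rightarrow N_i x) \ \text{for each } i, \text{ and} \\
  & \Box\,\forall x\,\bigl((N_1 x \wedge \dots \wedge N_k x) \rightarrow Fx\bigr)
\end{align*}
```

On this rendering, taking F as being a square, N_1 as being four-sided, and N_2 as being a regular figure gives the example above: each condition is necessary on its own, and only their conjunction is sufficient.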

b. Logical Constitution

A classical analysis also gives a logical constitution of the concept being analyzed, in keeping with Moore’s idea that an analysis breaks a concept up into its components or constituents. What an analysis specifies are the logical constituents of a concept, where a logical constituent of a concept is a concept entailed by that concept. (A concept entails another concept when being in the extension of the former entails being in the extension of the latter.) For instance, [four-sided] is a logical constituent of [square], since something’s being a square entails that it is four-sided.

A logical constitution of a concept [F], as specified by a classical analysis, is a collection of concepts, where each member of that collection is entailed by [F], and where [F] entails all of them taken collectively.

Most complex concepts will have more than one logical constitution, given that there are different ways of analyzing the same concept. For instance, “A square is a four-sided regular figure” expresses an analysis of [square], but so does “A square is a four-sided, closed plane figure having sides all the same length and having neighboring sides orthogonal to one another.” The first analysis gives one logical constitution for [square], and the second analysis seems to give another.

c. Other Conditions on Classical Analyses

In addition to conditions (I) and (II), other conditions on classical analyses have been proposed. Among them are the following:

(III) A classical analysis must not include the analysandum as either its analysans or as part of its analysans. That is, a classical analysis cannot be circular. “A square is a square” does not express an analysis, and neither does “A true sentence is a sentence that specifies a true correspondence between the proposition it expresses and the world.”

(IV) A classical analysis must not have its analysandum be more complex than its analysans. That is, while “A square is a four-sided regular figure” expresses an analysis, “A four-sided regular figure is a square” does not. While the latter sentence is true, it does not express an analysis of [four-sided regular figure]. The concept [four-sided regular figure] analyzes [square], not the other way around.

(V) A classical analysis specifies a precise extension of the concept being analyzed, in the sense of specifying for any possible particular whether it is definitely in or definitely not in that concept’s extension.

(VI) A classical analysis does not include any vague concepts in either its analysandum or its analysans.

The last two conditions concern vagueness. It might be thought that an analysis has to specify in some very precise way what is, and what is not, in that concept’s extension (condition (V)), and also that an expression of an analysis itself cannot include any vague terms (condition (VI)).

d. Testing Candidate Analyses

In seeking a correct analysis for a concept, one typically considers some number of so-called candidate analyses. A correct analysis will have no possible counterexamples, where such counterexamples might show a candidate analysis to be either too broad or too narrow. For instance, let

“A square is a four-sided, closed plane figure”

express a candidate analysis for the concept of being a square. This candidate analysis is too broad, since it would include some things as being squares that are nevertheless not squares. Counterexamples include any trapezoid or rectangle that is not itself a square.

On the other hand, the candidate analysis expressed by

“A square is a red four-sided regular figure”

is too narrow: it rules out some genuine squares as being squares, since it is at least possible for there to be squares other than red ones. Assuming for the sake of illustration that squares are the sorts of things that can be colored at all, a blue square counts as a counterexample to this candidate analysis, since it fails the stated condition that a square be red.
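The two ways a candidate analysis can fail may be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `Case` record, the hand-picked `CASES` list, and the function names are hypothetical, and the `is_square` field is a stipulated pre-theoretic verdict rather than anything computed. A finite list can only exhibit counterexamples; it cannot show that a candidate has no possible counterexamples.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Case:
    """One possible figure, with a stipulated verdict about squareness."""
    name: str
    sides: int
    closed_plane: bool
    regular: bool      # equal sides and equal angles
    color: str
    is_square: bool    # stipulated verdict, not derived from the fields


# A tiny, hand-picked stand-in for the space of possible cases (assumption).
CASES: List[Case] = [
    Case("red square", 4, True, True, "red", True),
    Case("blue square", 4, True, True, "blue", True),
    Case("oblong rectangle", 4, True, False, "red", False),
    Case("trapezoid", 4, True, False, "blue", False),
    Case("equilateral triangle", 3, True, True, "red", False),
]

Candidate = Callable[[Case], bool]


def find_counterexamples(candidate: Candidate,
                         cases: List[Case]) -> Tuple[List[str], List[str]]:
    """Return (too_broad, too_narrow) lists of counterexample names.

    too_broad:  cases the candidate admits that are not squares.
    too_narrow: genuine squares that the candidate excludes.
    """
    too_broad = [c.name for c in cases if candidate(c) and not c.is_square]
    too_narrow = [c.name for c in cases if not candidate(c) and c.is_square]
    return too_broad, too_narrow


def four_sided_closed(c: Case) -> bool:
    # "A square is a four-sided, closed plane figure"
    return c.sides == 4 and c.closed_plane


def red_four_sided_regular(c: Case) -> bool:
    # "A square is a red four-sided regular figure"
    return c.color == "red" and c.sides == 4 and c.regular


def four_sided_regular(c: Case) -> bool:
    # "A square is a four-sided regular figure"
    return c.sides == 4 and c.regular


if __name__ == "__main__":
    for candidate in (four_sided_closed, red_four_sided_regular, four_sided_regular):
        broad, narrow = find_counterexamples(candidate, CASES)
        print(candidate.__name__, "too broad:", broad, "too narrow:", narrow)
```

On this toy list, the first candidate admits the oblong rectangle and the trapezoid, the second excludes the blue square, and the third has no counterexamples among the listed cases.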

It might be wondered as to why correct analyses have no possible counterexamples, instead of the less stringent condition that correct analyses have no actual counterexamples. The reason is that analyses are put forth as necessary truths. An analysis of a concept like the concept of being a mind, for instance, is a specification of what is shared by all possible minds, not just what is in common among those minds that actually happen to exist. Similarly, in seeking an analysis of the concept of justice or piety (as Socrates sought), what one seeks is not a specification of what is in common among all just actions or all pious actions that are actual. Instead, what one seeks is the nature of justice or piety, and that is what is in common among all possible just actions or pious actions.

e. Apriority and Analyticity with respect to Classical Analyses

Classical analyses are commonly thought to be both a priori and analytic. They look to be a priori since there is no empirical component essential to their justification, and in that sense classical analyses are knowable by reason alone. In fact, the method of seeking possible counterexamples to a candidate analysis is a paradigmatic case of justifying a proposition a priori. Classical analyses also appear to be analytic, since on the rough construal of analytic propositions as those propositions “true by meaning alone,” classical analyses are indeed that sort of proposition. For instance, “A square is a four-sided regular figure” expresses an analysis, and if “square” and “four-sided regular figure” are identical in meaning, then the analysis is true by meaning alone. On an account of analyticity where analytic propositions are those propositions where what is expressed by the predicate expression is “contained in” what is expressed by the subject expression, classical analyses turn out to be analytic. If what is expressed by “four-sided regular figure” is contained in what is expressed by “square,” then “A square is a four-sided regular figure” is such that the meaning of its predicate expression is contained in what its subject expresses. Finally, on an account of analyticity treating analytic propositions as those where substitution of codesignating terms yields a logical truth, classical analyses turn out to be analytic propositions once more. For since “square” and “four-sided regular figure” have the same possible-worlds extension, substituting “square” for “four-sided regular figure” in “A square is a four-sided regular figure” yields “A square is a square,” which is a logical truth. (For a contrary view holding that analyses are synthetic propositions, rather than analytic, see Ackerman 1981, 1986, and 1992.)

4. Objections to the Classical View

Despite its history and natural appeal, in many circles the classical view has long since been rejected for one reason or another. Even in philosophy, many harbor at least some skepticism of the thesis that all complex concepts have classical analyses with the character described above. A much more common view is that some complex concepts follow the classical model, but not all of them. This section considers five fairly common objections to the classical view.

a. Plato’s Problem

Plato’s problem is that after over two and a half millennia of seeking analyses of various philosophically important concepts, few if any classical analyses of such concepts have ever been discovered and widely agreed upon as fact. If there are classical analyses for all complex concepts, the critics claim, then one would expect a much higher rate of success in finding such analyses given the effort expended so far. In fact, aside from ordinary concepts such as [bachelor] and [sister], along with some concepts in logic and mathematics, there seems to be no consensus on analyses for any philosophically significant concepts. Socrates’ question “What is justice?,” for instance, has received a monumental amount of attention since Socrates’ time, and while there has been a great deal of progress made with respect to what is involved in the nature of justice, there still is not a consensus view as to an analysis of the concept of justice. The case is similar with respect to questions such as “What is the mind?,” “What is knowledge?,” “What is truth?,” “What is freedom?,” and so on.

One might think that such an objection holds the classical view to too high a standard. After all, even in the sciences there is rarely universal agreement with respect to a particular scientific theory, and progress is ongoing in furthering our understanding of entities such as electrons and neutrinos, as well as events like the Big Bang—there is always more to be discovered. Yet it would be preposterous to think that the scientific method is flawed in some way simply because such investigations are ongoing, and because there is not universal agreement with respect to various theories in the sciences. So why think that the method of philosophical analysis, with its presumption that all complex concepts have classical analyses, is flawed in some way because of the lack of widespread agreement with respect to completed or full analyses of philosophically significant concepts?

Yet while there are disagreements in the sciences, especially in cases where a given scientific theory is freshly proposed, such disagreements are not nearly as common as they are in philosophy. For instance, while there are practicing scientists who claim to be suspicious of quantum mechanics, of the general theory of relativity, or of evolution, such detractors are extremely rare compared to what is nearly a unanimous opinion that those theories are correct or nearly correct. In philosophy, however, there are widespread disagreements concerning even the most basic questions. For instance, take the questions “Are we free?” and “Does being free require somehow being able to do otherwise?” The first question asks for an analysis of what is meant by “free,” and the second asks whether being able to do otherwise is a necessary condition on being free. Much attention has been paid to such basic questions, and the critics of the classical view claim that one would expect some sort of consensus as to the answers to them if the concept of freedom really has a classical analysis. So there is not merely disagreement with respect to the answers to such questions; the disagreements are widespread and involve quite fundamental issues as well. As a result, the difficulty in finding classical analyses has led many to reject the classical view.

b. The Argument from Categorization

There are empirical objections to the classical view as well. The argument from categorization takes as evidence various data with respect to our sorting or categorizing things into various categories, and infers that such behavior shows that the classical view is false. The evidence shows that we tend not to use any set of necessary and sufficient conditions to sort things into one category or another, where such sorting behavior is construed as involving the application of various concepts. It is not as if one uses a classical analysis to sort things into the bird category, for instance. Instead, it seems that things are categorized according to typical features of members of the category in question, and the reason for this is that more typical members of a given category are sorted into that category more quickly than less typical members of that same category. Robins are sorted into the bird category more quickly than eagles, for instance, and eagles are sorted into the bird category more quickly than ostriches. What this suggests is that if concepts are used for acts of categorization, and classical analyses are not used in all such categorization tasks, then the classical view is false.

One presumption of the argument is that when one sorts something into one category or another, one uses one’s understanding of a conceptual analysis to accomplish the task. Yet classical theorists might complain that this need not be the case. One might use a set of typical features to sort things into the bird category, even if there is some analysis not in terms of typical features that gives the essential features shared by all birds. In other words (as Rey (1983) points out), there is a difference between what it is to look like a bird and what it is to be a bird. An analysis of a concept gives the conditions on which something is an instance of that concept, and it would seem that a concept can have an analysis (classical or otherwise) even if agents use some other set of conditions in acts of categorization.

Whether this reply to the argument from categorization rebuts the argument remains to be seen, but many researchers in cognitive psychology have taken the empirical evidence from acts of categorization to be strong evidence against the classical view. For such evidence also serves as evidence in favor of a view of concepts in competition with the classical view: the so-called prototype view of concepts. According to the prototype view, concepts are analyzed not in terms of necessary and jointly sufficient conditions, but in terms of lists of typical features. Such typical features are not shared by all instances of a given concept, but are shared by at least most of them. For instance, a typical bird flies, is relatively small, and is not carnivorous. Yet none of these features is shared by all birds. Penguins don’t fly, albatrosses are quite large, and birds of prey are carnivores. Such a view of concepts fits much more neatly with the evidence concerning our acts of categorization, so such critics reject the classical view.

c. Arguments from Vagueness

Vagueness has also been seen as problematic for the classical view. For one might think that in virtue of specifying necessary and jointly sufficient conditions, a classical analysis thus specifies a precise extension for the concept being analyzed (where a concept C has a precise extension if and only if for all x, x is either definitely in the extension of C or definitely not in the extension of C). Yet most complex concepts seem not to have such precise extensions. Terms like “bald,” “short,” and “old” all seem to have cases where it is unclear whether the term applies or not. That is, it seems that the concepts expressed by those terms are such that their extensions are unclear. For instance, it seems that there is no precise boundary between the bald and the non-bald, the short and the non-short, and the old and the non-old. But if there are no such precise boundaries to the extensions for many concepts, and a classical analysis specifies such precise boundaries, then there cannot be classical analyses for what is expressed by vague terms.

Two responses deserve note. One reply on behalf of the classical view is that vagueness is not part of the world itself, but instead is a matter of our own epistemic shortcomings. We find unclear cases simply because we don’t know where the precise boundaries for various concepts lie. There could very well be a precise boundary between the bald and the non-bald, for instance, but we find “bald” to be vague simply because we do not know where that boundary lies. Such an epistemic view of vagueness would seem to be of assistance to the classical view, though such a view of vagueness needs a defense, particularly given the presence of other plausible views of vagueness. The second response is that one might admit the presence of unclear cases, and admit the presence of vagueness or “fuzziness” as a feature of the world itself, but hold that such fuzziness is mirrored in the analyses of the concepts expressed by vague terms. For instance, the concept of being a black cat might be analyzed in terms of [black] and [cat], even if “black” and “cat” are both vague terms. So classical theorists might reply that if the vagueness of a term can be mirrored in an analysis in such a way, then the classical view can escape the criticisms.

d. Quine’s Criticisms

A family of criticisms of the classical view is based on W.V.O. Quine’s (1953/1999, 1960) extensive attack on analyticity and the analytic/synthetic distinction. According to Quine, there is no philosophically clear account of the distinction between analytic and synthetic propositions, and as such there is either no such distinction at all or it does no useful philosophical work. Yet classical analyses would seem to be paradigmatic cases of analytic propositions (for example, [bachelors are unmarried males], [a square is a four-sided regular figure]), and if there are no analytic propositions then it seems there are no classical analyses. Furthermore, if there is no philosophically defensible distinction between analytic and synthetic propositions, then there is no legitimate criterion by which to delineate analyses from non-analyses. Those who hold that analyses are actually synthetic propositions face the same difficulty. If analyses are synthetic, then one still needs a principled difference between analytic and synthetic propositions in order to distinguish between analyses and non-analyses.

The literature on Quine’s arguments is vast; suffice it to say that criticism of Quine’s arguments and of his general position is widespread as well. Yet even among those philosophers who reject Quine’s arguments, most admit that there remains a great deal of murkiness concerning the analytic/synthetic distinction, despite its philosophical usefulness. With respect to the classical view of concepts, the options available to classical theorists are at least threefold: either meet Quine’s arguments in a satisfactory way, reject the notion that all analyses are analytic (or that all are synthetic), or characterize classical analysis in a way that is neutral with respect to the analytic/synthetic distinction.

e. Scientific Essentialist Criticisms

Scientific essentialism is the view that the members of natural kinds (like gold, tiger, and water) have essential properties at the microphysical level of description, and that identity statements between natural kind terms and descriptions of such properties are metaphysically necessary and knowable only a posteriori. Some versions of scientific essentialism include the thesis that such identity statements are synthetic. That such statements are a posteriori and synthetic looks to be problematic for the classical view. For the sake of illustration, let “Water is H2O” express an analysis of what is meant by the natural kind term “water.” According to scientific essentialism, such a proposition is metaphysically necessary in that it is true in all possible worlds, but it is a necessary truth discovered via empirical science. As such, it is not discovered by the a priori process of seeking possible counterexamples, revising candidate analyses in light of such counterexamples, and so on. But if water’s being H2O is known a posteriori, this runs counter to the usual position that all classical analyses are a priori. Furthermore, the a posteriori status of what is expressed by “Water is H2O” entails that it is synthetic, rather than analytic as the classical view would normally claim.

Again, the literature is vast with respect to scientific essentialism, identity statements involving natural kind terms, and the epistemic and modal status of such statements. For classical theorists, short of denying the basic theses of scientific essentialism, some options that save some portion of the classical view include holding that the classical view holds for some concepts (such as those in logic and mathematics) but not others (such as those expressed by natural kind terms), or characterizing classical analysis in a way that is neutral with respect to the analytic/synthetic distinction. How successful such strategies would be remains to be seen, and such a revised classical view would have to be weighed against other theories of concepts that handle all complex concepts with a unified treatment.

5. References and Further Reading

  • Ackerman, D. F. 1981. “The Informativeness of Philosophical Analysis.” In P. French, et al. (Eds.), Midwest Studies in Philosophy, vol. 6. Minneapolis, Minnesota: University of Minnesota Press, 313-320.
  • Ackerman, D. F. 1986. “Essential Properties and Philosophical Analysis.” In P. French, et al. (Eds.), Midwest Studies in Philosophy, vol. 11. Minneapolis, Minnesota: University of Minnesota Press, 304-313.
  • Ackerman, D. F. 1992. “Analysis and Its Paradoxes.” In E. Ullman-Margalit (Ed.), The Scientific Enterprise: The Israel Colloquium Studies in History, Philosophy, and Sociology of Science, vol. 4. Norwell, Massachusetts: Kluwer.
  • Bealer, George. 1982. Quality and Concept. Oxford: Clarendon Press.
  • Bealer, George. 1996. “A Priori Knowledge and the Scope of Philosophy.” Philosophical Studies 81, 121-142.
  • Bonjour, Laurence. 1998. In Defense of Pure Reason: A Rationalist Account of A Priori Justification. Cambridge: Cambridge University Press.
  • Chalmers, David J. and Jackson, Frank. 2001. “Conceptual Analysis and Reductive Explanation” [On-line]. Available: http://www.u.arizona.edu/~chalmers/papers/analysis.html
  • Donnellan, Keith. 1983. “Kripke and Putnam on Natural Kind Terms.” In C. Ginet and S. Shoemaker (Eds.), Knowledge and Mind. Oxford: Oxford University Press, 84-104.
  • Fodor, Jerry A. 1998. Concepts: Where Cognitive Science Went Wrong. Oxford: Clarendon Press.
  • Fodor, Jerry A., Garrett, M. F., Walker, E. C. T., and Parkes, C. H. 1980/1999. “Against Definitions.” In Margolis and Laurence 1999, 491-512.
  • Grice, H. P. and Strawson, P. F. 1956. “In Defense of a Dogma.” The Philosophical Review 65 (2), 141-158.
  • Hanna, Robert. 1998. “A Kantian Critique of Scientific Essentialism.” Philosophy and Phenomenological Research 58 (3), 497-528.
  • Harman, Gilbert. 1999. “Doubts About Conceptual Analysis.” In Gilbert Harman, Reasoning, Meaning, and Mind, Oxford: Oxford University Press, 138-143.
  • Jackson, Frank. 1994. “Armchair Metaphysics.” In M. Michael and J. O’Leary-Hawthorne (Eds.), Philosophy in Mind. Dordrecht: Kluwer.
  • Jackson, Frank. 1998. From Metaphysics to Ethics: A Defence of Conceptual Analysis. Oxford: Clarendon Press.
  • Katz, J. J. 1999.
  • Keefe, Rosanna and Smith, Peter (Eds.). 1999. Vagueness: A Reader. Cambridge, Massachusetts: M.I.T. Press.
  • King, Jeffrey C. 1998. “What is a Philosophical Analysis?” Philosophical Studies 90, 155-179.
  • Kripke, Saul A. 1980. Naming and Necessity. Cambridge, Massachusetts: Harvard University Press.
  • Kripke, Saul A. 1993. “Identity and Necessity.” In A. W. Moore (Ed.), Meaning and Reference. Oxford: Oxford University Press, 162-191.
  • Langford, C. H. 1968. “The Notion of Analysis in Moore’s Philosophy.” In Schilpp 1968, 321-342.
  • Laurence, Stephen and Margolis, Eric. 1999. “Concepts and Cognitive Science.” In Margolis and Laurence 1999, 3-81.
  • Margolis, Eric and Laurence, Stephen (Eds.). 1999. Concepts: Core Readings. Cambridge, Massachusetts: M.I.T. Press.
  • Moore, G. E. 1966. Lectures on Philosophy. Ed. C. Lewy. London: Humanities Press.
  • Moore, G. E. 1968. “A Reply to My Critics.” In Schilpp 1968, 660-677.
  • Murphy, Gregory L. 2002. The Big Book of Concepts. Cambridge: M.I.T. Press.
  • Peacocke, Christopher. 1992. A Study of Concepts. Cambridge: M.I.T. Press.
  • Pitt, David. 1999. “In Defense of Definitions.” Philosophical Psychology 12 (2), 139-156.
  • Plato. 1961a. The Collected Dialogues of Plato. Ed. Edith Hamilton and Huntington Cairns. Princeton, New Jersey: Princeton University Press.
  • Plato. 1961b. Euthyphro. Trans. L. Cooper. In Plato 1961a, 169-185.
  • Plato. 1961c. Laches. Trans. L. Cooper. In Plato 1961a, 123-144.
  • Plato. 1961d. Lysis. Trans. L. Cooper. In Plato 1961a, 145-168.
  • Plato. 1961e. Theaetetus. Trans. L. Cooper. In Plato 1961a, 845-919.
  • Plato. 1992. Republic. Trans. G. M. A. Grube. Indianapolis, Indiana: Hackett.
  • Prinz, Jesse J. 2002. Furnishing the Mind: Concepts and Their Perceptual Basis. Cambridge: M.I.T. Press.
  • Putnam, Hilary. 1962. “It Ain’t Necessarily So.” Journal of Philosophy 59 (22), 658-671.
  • Putnam, Hilary. 1966. “The Analytic and the Synthetic.” In H. Feigl and G. Maxwell (Eds.), Minnesota Studies in the Philosophy of Science, vol. III. Minneapolis, Minnesota: University of Minnesota Press, 358-397.
  • Putnam, Hilary. 1970. “Is Semantics Possible?” In H. Kiefer and M. Munitz (Eds.), Language, Belief, and Metaphysics. New York: State University of New York Press, 50-63.
  • Putnam, Hilary. 1975. “The Meaning of ‘Meaning’.” In Keith Gunderson (Ed.), Minnesota Studies in the Philosophy of Science, vol. VII. Minneapolis, Minnesota: University of Minnesota Press, 131-193.
  • Putnam, Hilary. 1983. “‘Two Dogmas’ Revisited.” In Hilary Putnam, Realism and Reason: Philosophical Papers, Volume 3. Cambridge: Cambridge University Press, 87-97.
  • Putnam, Hilary. 1990. “Is Water Necessarily H2O?” In James Conant (Ed.), Realism with a Human Face. Cambridge: Harvard University Press, 54-79.
  • Quine, W. V. O. 1953/1999. “Two Dogmas of Empiricism.” In Margolis and Laurence 1999, 153-170.
  • Quine, W. V. O. 1960. Word and Object. Cambridge: The M.I.T. Press.
  • Ramsey, William. 1992. “Prototypes and Conceptual Analysis.” Topoi 11, 59-70.
  • Rey, Georges. 1983. “Concepts and Stereotypes.” Cognition 15, 237-262.
  • Rey, Georges. 1985. “Concepts and Conceptions: A Reply to Smith, Medin and Rips.” Cognition 19, 297-303.
  • Rosch, Eleanor. 1999. “Principles of Categorization.” In Margolis and Laurence 1999, 189-206.
  • Schilpp, P. (Ed.). 1968. The Philosophy of G. E. Moore. LaSalle, Illinois: Open Court.
  • Smith, Edward E. 1989. “Three Distinctions About Concepts and Categorization.” Mind and Language 4 (1, 2), 57-61.
  • Smith, Edward E., and Medin, Douglas L. 1981. Categories and Concepts. Cambridge: Harvard University Press.
  • Smith, Edward E. 1999. “The Exemplar View.” In Margolis and Laurence 1999, 207-221.
  • Smith, Edward E., Medin, Douglas L., and Rips, Lance J. 1984. “A Psychological Approach to Concepts: Comments on Rey’s ‘Concepts and Stereotypes.’” Cognition 17, 265-274.
  • Sosa, Ernest. 1983. “Classical Analysis.” Journal of Philosophy 80 (11), 695-710.
  • Stalnaker, Robert. 2001. “Metaphysics Without Conceptual Analysis.” Philosophy and Phenomenological Research 62 (3), 631-636.
  • Williamson, Timothy. 1994. Vagueness. New York: Routledge.
  • Williamson, Timothy. 1999. “Vagueness and Ignorance.” In Keefe and Smith 1999, 265-280.

Author Information

Dennis Earl
Email: dearl@coastal.edu
Coastal Carolina University
U. S. A.

Fallibilism

Fallibilism is the epistemological thesis that no belief (theory, view, thesis, and so on) can ever be rationally supported or justified in a conclusive way. Always, there remains a possible doubt as to the truth of the belief. Fallibilism applies that assessment even to science’s best-entrenched claims and to people’s best-loved commonsense views. Some epistemologists have taken fallibilism to imply skepticism, according to which none of those claims or views are ever well justified or knowledge. In fact, though, it is fallibilist epistemologists (which is to say, the majority of epistemologists) who tend not to be skeptics about the existence of knowledge or justified belief. Generally, those epistemologists see themselves as thinking about knowledge and justification in a comparatively realistic way — by recognizing the fallibilist realities of human cognitive capacities, even while accommodating those fallibilities within a theory that allows perpetually fallible people to have knowledge and justified beliefs. Still, although that is the aim of most epistemologists, the question arises of whether it is a coherent aim. Are they pursuing a coherent way of thinking about knowledge and justification? Much current philosophical debate is centered upon that question. Epistemologists generally seek to understand knowledge and justification in a way that permits fallibilism to be describing a benign truth about how we can gain knowledge and justified beliefs. One way of encapsulating that project is by asking whether it is possible for a person ever to have fallible knowledge and justification.

Table of Contents

  1. Introduction
  2. Formulating Fallibilism: Preliminaries
  3. Formulating Fallibilism: A Thesis about Justification
  4. Formulating Fallibilism: Necessary Truths
  5. Empirical Evidence of Fallibility
  6. Philosophical Sources of Fallibilism: Hume
  7. Philosophical Sources of Fallibilism: Descartes
  8. Implications of Fallibilism: No Knowledge?
  9. Implications of Fallibilism: Knowing Fallibly?
  10. Implications of Fallibilism: No Justification?
  11. References and Further Reading

1. Introduction

The term “fallibilism” comes from the nineteenth-century American philosopher Charles Sanders Peirce, although the basic idea behind the term long predates him. According to that basic idea, no beliefs (or opinions or views or theses, and so on) are so well justified or supported by good evidence or apt circumstances that they could not be false. Fallibilism tells us that there is no conclusive justification and no rational certainty for any of our beliefs or theses. That is fallibilism in its strongest form, being applied to all beliefs without exception. In principle, it is also possible to be a restricted fallibilist, accepting a fallibilism only about some narrower class of beliefs. For example, we might be fallibilists about whatever beliefs we gain through the use of our senses, even while remaining convinced that we possess the ability to reason in ways that can, at least sometimes, manifest infallibility. Thus, one special case of this possible selectivity would have us being fallibilists about empirical science even while exempting mathematical reasoning from that verdict. For simplicity, though (and because it represents the thinking of most epistemologists), in what follows I will generally discuss fallibilism in its unrestricted form. (The exception will be section 6, where a particularly significant, but seemingly narrower, form of fallibilism will be presented.)

Fallibilism is an epistemologically pivotal thesis, and our initial priority must be to formulate it carefully. Almost all contemporary epistemologists will say that they are fallibilists. Yet the vast majority of them also wish not to be skeptics. They would rather not be committed to embracing principles about the nature of knowledge and justification which commit them to denying that there can be any knowledge or justified belief. This desire coexists, nonetheless, with the belief that fallibility is rampant. Many epistemological debates, it transpires, can be understood in terms of how they try to balance these epistemologically central desires. So, can we find a precise philosophical understanding of ourselves as being perpetually fallible even though reassuringly rational and, for the most part, knowledgeable?

2. Formulating Fallibilism: Preliminaries

An initial statement of fallibilism might be this:

All beliefs are fallible. (No belief is infallible.)

But what, exactly, is that saying? Here are three claims it is not making.

(1) Fallible people. It is not saying just that all believers — all people — are fallible. A person as such is fallible if, at least sometimes, he is capable of forming false beliefs. But that is compatible with the person’s often — on some other occasions — believing infallibly. And that is not a state of affairs which is compatible with fallibilism.

(2) Actually false beliefs. Nor is fallibilism the thesis that in fact all beliefs are false. That possibility is allowed — but it is not required — by fallibilism. Hence, it is false to portray fallibilism — as commentators on science, in particular, sometimes do — in these terms:

All scientific beliefs are false. This includes all scientific theories, of course. (After all, even scientific theories are only theories. So they are fallible — and therefore false.)

Regardless of whether or not that is a correct claim about scientific beliefs and theories, it is not an accurate portrayal of what fallibilism means to say. The key term in fallibilism, as we have so far formulated it, is “fallible.” And this conveys — through its use of “-ible” — only some kind of possibility of falsity, rather than the definite presence of actual falsity.

(3) Contingent truths. Take the belief that there are currently at least one thousand kangaroos alive in Australia. That belief is true, although it need not have been. It could have been false, in that the world need not have been such as to make it true. So, the belief is only contingently true (as philosophers say). By definition, any contingent truth could have failed to be true. But even if we were to accept that all truths are only contingently true, we would not be committed to fallibilism. The recognition that contingent truths exist is not what underlies fallibilism. The claim that any contingent truth could instead have been false is not the fallibilist claim, because fallibilism is not a thesis about truths in themselves. Instead, it is about our attempts to accept or believe truths. It concerns a kind of fundamental limitation first and foremost upon our powers of rational thought and representation. And although a truth’s being contingent means that it did not have to be true, this does not mean that it will, or even that it can, alter its truth-value (by becoming false) in such a way as to deceive you. For instance, the truth that there are now at least one thousand kangaroos alive in Australia is not made false even by there being only five kangaroos alive in Australia two days from now.

3. Formulating Fallibilism: A Thesis about Justification

Given section 2’s details, a better (and routine) expression of fallibilism is this:

F: All beliefs are only, at best, fallibly justified.

F’s main virtue, as a formulation of fallibilism, is its locating the culprit fallibility as arising within the putative justification that is present on behalf of a given belief. The kind of justification in question is called “epistemic justification” by epistemologists. And the suggested formulation, F, of fallibilism is saying that there is never conclusive justification for the truth of a given belief.

There are competing epistemological theories of what, exactly, epistemic justification is. Roughly speaking, though, it is whatever would make a belief more, rather than less, rationally well supported or established. This sort of rationality is meant to be truth-directed. For example (as Conee and Feldman 2004 would argue), whenever some evidence is providing epistemic support — justification — for a belief, this is a matter of its supporting the truth of that belief. In that sense, the evidence provides good reason to adopt the belief — to adopt it as true. Or (to take another example, such as would be approved of by the kind of theory from Goldman 1979) a believer might have formed her belief within some circumstance or in some way that — regardless of whether she can notice this — makes her belief likely to be true. (And when are these kinds of justificatory support present? In particular, are they only ever present if they are guaranteeing that the belief being supported is true? Are any actually false beliefs ever justified? Section 10 will focus on the question of whether fallible justification is ever present, either for true or for false beliefs.)

Just as there are competing interpretations of the nature of epistemic justification, epistemologists exercise care in how they read F. Perhaps the most natural reading of it says that no one is ever so situated — even when possessing evidence in favor of the truth of a particular belief — that, if she were to be rational in the sense of respecting and understanding and responding just to that evidence, she could not proceed to doubt that the belief is true. More generally, the idea behind F is that, no matter how good one’s justification is in support of a particular belief’s being true, that justification is never so good as to be conclusive — leaving no room for anyone who might be rationally attending to that justification not to have the belief it is supporting. At any stage, according to F, doubt could sensibly (in some relevant sense of “sensibly”) arise as to the truth of the particular belief.

Often, therefore, this kind of possible doubt is called a rational doubt. This is not to say that, necessarily, the most rational reaction is to be swayed by the doubt, accepting it as decisive; whether one should react like that is a separate issue, probably deserving to be decided only after some subtle argument. The term “rational doubt” is meant only to distinguish this sort of actual or possible doubt from a patently irrational one — a doubt that is psychologically, but not even prima facie rationally, available. How might a doubt that is not even prima facie rational arise? Here is one possible way. Imagine a person who is attending to evidence for the truth of a particular belief, yet who refuses to accept the belief’s being true. Suppose that this refusal is due either (i) to her misunderstanding the evidence or (ii) to some psychological quirk such as a general lack of respect for evidence at all or such as mere obstinacy (without her supplying counter-reasons disputing the truth or power of the evidence). There is no accounting for why some people will in fact doubt a given belief: psychologically, doubt could be an option even in the face of rationally conclusive evidence. Nevertheless, fallibilism is not a thesis about that psychological option. The option it describes concerns rationality. Fallibilism is about what it claims to be the ever-present availability of rational doubt.

Accordingly, one possible way of misinterpreting F would involve confusing the concept of a rational doubt with that of a subjectively felt doubt or, maybe more generally, a psychologically present doubt. Rational doubts need not be psychologically actual doubts, just as psychologically actual ones need not be rational. In theory, a person might have or feel some doubt as to whether a particular claim is true, a doubt which she should not have or feel. (Perhaps she is misevaluating the strength of the evidence she has in support of that claim.) Equally, someone might have or feel no doubt as to the truth of a belief he has when he should have or feel some such doubt. (Perhaps he, too, is misevaluating the strength of the evidence he has in support of his belief.) In either case, the way in which the person is in fact reacting (by having, or by not having, an actual doubt) does not determine whether his or her evidence is in fact providing rationally conclusive support. That is because a particular reaction, of doubting or of not doubting, might not be as justified or rational in itself as is possible. (By analogy, we may keep in mind the case, unfortunately all too common a kind of case, of a brutal tyrant who claims, sincerely, to have a clear conscience at the end of his life. The morality of his actions is more obviously to be explicated in terms of what his conscience should be telling him rather than of what it is telling him.)

In effect, F is saying that no matter what evidence you have, no matter how carefully you have accumulated it, and no matter how rationally you use and evaluate it, you can never thereby have conclusive justification for a belief which you wish to support via all that evidence. Equally, F is saying that no matter what circumstance you occupy, and no matter how you are forming a particular belief, no guarantee is thereby being provided of your belief being true. In those respects (according to F), any justification you have is fallible, and it will remain so, no matter what you do with it, no matter how assiduously you attend to it, no matter what the circumstances are in which you are operating. The problem will also remain, no matter how you might supplement or try to improve your evidence or circumstances. Any possible addition or alteration that you might make will continue leaving open at least a possibility (one to which a careful and rational thinker would in principle respond respectfully if she were to notice it) of your belief’s being false.

In that way, fallibilism, as a thesis about justification, travels more deeply into the human cognitive condition than it would do if it were a point merely about logic, say. It is not saying that no belief is ever supported by evidence whose content logically entails that belief’s content. An example of that situation would be provided by a person’s having, as evidence, the belief that he is a living, breathing Superman, from which he infers that he is alive. The evidence’s content (“I am a living, breathing Superman”) does logically entail the truth of the inferred content (“I am alive”). (This attribution of logical validity or entailment means, in standard deductive logic, that it is impossible for the first content to be true without the second one also being true.) But the justification being supplied is fallible, because, obviously, the person will have, at best, inconclusive justification for thinking that he is a living, breathing Superman in the first place. The putative justification is the belief (about being Superman) and its history, not only its content and the associated logical relations. Yet fallibilism says that, even when all such further features are taken into account, some potential will remain for rational doubt to be present.

4. Formulating Fallibilism: Necessary Truths

Nevertheless, a modification of F (in section 3) is required, it seems, if fallibilism is to apply to beliefs like mathematical ones or to beliefs reporting theses of pure logic, for instance. Most philosophers would accept that it is possible to be fallible in holding such a belief — and that this is so, even given that there is a sense in which such a belief, when true, could not ever be false. Thus, perhaps mathematical believing is a fallible process, able to lead to false beliefs. Perhaps this is so, even if mathematical truths themselves never “just happen” to be true — never depending upon changeable surrounding circumstances for their truth, hence never being susceptible to being rendered false by some change in those surrounding circumstances. How should we modify F, therefore, so as to understand the way in which fallibility can nonetheless be present in such a case? More generally, how should we modify F, so as to understand the prospect of a person ever having fallible beliefs (let alone only fallible ones) in what philosophers call necessary truths?

By definition, any truth which is not contingent is necessary. The class of necessary truths is the class of propositions or contents which, necessarily, are true. They could not have failed to be true. And that class will generally be thought to contain — maybe most significantly — mathematical truths. Consider, then, the belief that 2 + 2 = 4. In itself (almost every philosopher will concur), there is no possibility of that belief’s being false. However, if it is impossible for that belief to be false, then there is also no possible evidence on the basis of which — in coming to believe that 2 + 2 = 4 — a person could be forming a false belief. In this way, no belief that 2 + 2 = 4 could be merely fallibly justified — at least as this phenomenon has been portrayed in F. Yet it is clear — or so most epistemologists will aver — that mathematical believing can be fallible. Indeed, if fallibilism is true, all mathematical beliefs will be subject to some sort of fallibility: even mathematical beliefs would, at best, be only fallibly justified. How, therefore, is this to be understood?

Here is one suggestion — F* — which modifies F by drawing upon some standard epistemological thinking. The aim in moving from F to F* would be to allow for the possibility of having a fallible belief in a necessary truth:

F*: All beliefs are, at best, only fallibly justified. (And a belief is fallibly justified when — even if the belief, considered in itself, could not be false — the justification for it exemplifies or reflects some more general way or process of thinking or forming beliefs, a way or process which is itself fallible due to its capacity to result in false beliefs.)

Sections 5 and 7 will describe a few possible reasons for a fallibilist to regard your belief that 2 + 2 = 4 as being fallible. In the meantime, we need only note schematically how F* would accommodate those possible reasons. The basic approach would be as follows. Although your belief that 2 + 2 = 4 cannot be false (once it is present), your supposed justification for it is fallible. This could be so in a few ways. For a start, maybe you are merely repeating by rote something you were told many years ago by a somewhat unreliable school teacher. (Imagine the teacher having been poor at making accurate claims within most other areas of mathematics. Even with respect to the elements of mathematics about which she was accurate, she might have been merely repeating by rote what she had been told by her own early — and similarly unreliable — teachers.) The fallibility of memory is also relevant: over the years, one forgets much. Still, your current belief that 2 + 2 = 4 seems accurate. And it need not be present only because of your fallible memory of what your fallible teacher told you. Suppose that you are now very sophisticated in your mathematical thinking: in particular, your justification for your belief that 2 + 2 = 4 is purely mathematical in content. That justification involves clever representation, via precisely defined symbols, of abstract ideas. Nevertheless, even such purely mathematical reasoning can mislead you (no matter that it has not done so on this occasion). Really proving that 2 + 2 = 4 is quite difficult; and when people are seeking to grasp and to implement such proofs, human fallibility may readily intrude. Actual attempts to establish mathematical truths need not always lead to accurate or true beliefs.

At any rate, that is how a fallibilist might well analyze the case.

5. Empirical Evidence of Fallibility

How can we ascertain which of our ways of thinking are fallible? Both ordinary observation and sophisticated empirical research are usually regarded as able to help us here, by revealing some of the means by which fallibility enters our cognitive lives. I will list several of the seemingly fallible means of belief-formation and belief-maintenance that have been noticed.

(1) Misusing evidence. Apparently, people often misevaluate the strength of their evidence. By taking it to be stronger or weaker support than in fact it is for the truth of a particular belief, a person could easily be led to adopt or retain a false, rather than true, belief. Indeed, there are many possible ways not to use evidence properly. For example, people do not always notice, let alone compare and resolve, conflicting pieces of evidence. They might overlook some of the evidence available to them. They can be inattentive to details of their evidence. And so forth.

(2) Unreliable senses. How many of us have wholly reliable — always accurate — senses? Shortsightedness is not so rare. The same is true of long-sightedness. People can have poor hearing, not to mention less-than-perfectly discerning senses of smell, taste, and so on. Sensory illusions and hallucinations affect us, too. The road seems to ripple under the heat of the sun; the stick appears to bend as it enters the glass of water; and so forth. In such cases we will think, upon reflection, that what we seem to sense is something we only seem to sense.

(3) Unreliable memory. At times, people suffer lapses of memory; and they can realize this, experiencing “blanks” as they endeavor to recall something. They can also feel as though they are remembering something, when actually this feeling is inaccurate. (A “false memory” is like that. The event which a person seems to recall, for instance, never actually happened.)

(4) Reasoning fallaciously. To reason in a logically invalid way is to reason in a way which, even given the truth of one’s premises or evidence, can lead to falsity. It is thereby to reason fallibly. Do we often reason like that? Seemingly, yes. Of course, often we and others realize that we are doing so. And we and those others might generally be satisfied with our admittedly fallible reasoning. (But should we ever regard it with satisfaction? Section 10 will consider this kind of question.) There are times, though, when we and others do not notice the fallibility in our reasoning. On those occasions, we are — without realizing this about ourselves — reasoning fallaciously. That is, we are reasoning in ways which are logically invalid but which most people mistakenly, albeit routinely, regard as being logically valid.

(5) Intelligence limitations. Is each of us so intelligent as never to make mistakes which a more intelligent person would be less likely (all else being equal) to make? Presumably none of us escape that limitation. Do we notice people making mistakes due to their exercising (and perhaps possessing) less intelligence than was needed not to make those mistakes? We appear to do so. Sometimes (often too late), we observe this in ourselves, too.

(6) Representational limitations. We use language and thought to represent or describe reality — hopefully, to do this accurately. But people have often, we believe, made mistakes about the world around them because of inadequacies in their representational or descriptive resources. For example, they can have been applying misleading and clumsily constructed concepts — ones which could well be replaced within an improved science. (And this sort of problem — at least to judge by the apparent inescapability of disputes among its practitioners — might be even more acute within such areas of thought as philosophy.)

(7) Situational limitations. It is not uncommon for people to make mistakes of fact because they have biases or prejudices that impede their ability to perceive or represent or reflect accurately upon those facts. Such mistakes may be made when people are manifesting an insufficiently developed awareness of pertinent aspects of the world. Maybe a person’s early upbringing, and how she has subsequently lived her life, has not exposed her to a particularly wide range of ideas. Perhaps she has not encountered what are, as it happens, more accurate ideas or principles than the ones she is applying in her attempts to understand the world. All of this might well prevent her even noticing some relevant aspects of the world. (When both I and a doctor gaze at an X-ray, only one of us notices much of medical relevance.)

That list of realistically possible sources of fallibility — philosophers will suspect — could be continued indefinitely. And its scope is disturbingly expansive. Thus, even when you do not feel as though a belief of yours has been formed or maintained in some way that manifests any of those failings, you could be mistaken about that. This is a factual matter; or so most philosophers will say. On any given occasion, it is an empirical question as to whether in fact you are being fallible in one of those ways. (Notably, it is not simply a matter of whether you are feeling fallible.) Accordingly, many epistemologists have paid attention to pertinent empirical research by psychiatrists, neurologists, biologists, anthropologists, and the like, into actual limitations upon human cognitive powers. Data uncovered so far have unveiled the existence of much fallibility. (See, for example, Nisbett and Ross 1980; Kahneman, Slovic, and Tversky 1982.)

Some epistemologists have found this to be worrying in itself. Still, has enough fallibility thereby been uncovered to justify an acceptance of fallibilism? (Remember that fallibilism, in its most general form, is the thesis that all of our beliefs are fallible.) This, too, is at least partly an empirical question. It is the question of just how fallible people are as a group — and, naturally, of just how much a given individual ever manages to transcend such limitations upon people in general. How fallibly, as it happens, do people ever form and maintain beliefs? Is every single one of us fallible enough to render every single one of our beliefs fallible?

It is difficult, perhaps impossible, to use personal observations and empirical research to answer those questions conclusively. (And fallibilism would deny that this is possible anyway.) For presumably such fallibilities would also afflict people as observers and as scientific inquirers. Hence, this would occur even when theorists — let alone casual observers — are investigating those fallibilities. The history of science reveals that many scientific theories which were at one time considered to be true have subsequently been supplanted, with later theories deeming the earlier ones to have been false.

Is science therefore especially fallible as a way of forming beliefs about the world? That is a matter of some philosophical dispute. Empirical science is performed by fallible people, often involving much fallible coordination among themselves. It relies on the fallible process of observation. And it can generate quite complicated theories and beliefs — with that complexity affording scope for marked fallibility. Yet in spite of these sources of fallibility nestling within it (when it is conceived of as a method), science might well (when it is conceived of as a body of theses and doctrines) encompass the most cognitively impressive store of knowledge that humans have ever amassed. Even if not all of its theories and beliefs are true (and therefore not all of them are knowledge), a significant percentage of them seem to have a strong case for being knowledge. Is that compatible with science’s fallibility, even its inherent fallibility, as a method? Or are none of its theories and beliefs knowledge, simply because (as later scientists will realize) some of them are not? Alternatively, are none of them knowledge, because none of them are conclusively justified? That depends on what kind of knowledge scientific knowledge would be. This is a subtle matter, asking us first to consider in general whether there can be inconclusively justified knowledge at all. Section 9 will indicate how epistemologists might take a step towards answering that question. It will do so by discussing the idea of fallible knowledge. (And section 10 will comment on science and fallible justification.)

6. Philosophical Sources of Fallibilism: Hume

Section 5 indicated some empirical grounds on which fallibilism might be thought to be true. Epistemologists have also provided non-empirical arguments for fallibilism, both in its strongest form and in important-but-weaker forms. This section and the next will present two of those arguments.

One of them comes from the eighteenth-century Scottish philosopher David Hume’s classic invention of what is now called inductive skepticism. (For a succinct version of his argument, see his 1902 [1748], sec. IV. For some sense of the philosophical and historical dimensions of that notion, see Buckle 2001: part 2, ch. 4.) At the core of his skeptical argument was an important-even-if-possibly-not-wholly-general fallibilism. Hume’s argument showed, at the very least, the inescapable fallibility of an extremely significant kind of belief — any belief which either is or could be an inductive extrapolation from observational data. According to Hume, no beliefs about what is yet to be observed (by a particular person or some group) can be infallibly established on the basis of what has been observed (by that person or that group). Consider any use of present and past observations, perhaps to derive, and at least to support, some view that aims to describe aspects of the world that have not yet been observed. (Standard examples include people’s seeking to justify the belief that the sun will rise tomorrow, by using past observations of it having risen, and people’s many observations of black ravens supposedly justifying the belief that all ravens are black.) Hume noticed that observations can never provide conclusive assurance — a proof — that the world is not about to change from what it has thus far been observed to be like. Even if all observed Fs have been Gs, say, this does not entail that any, let alone all, of the currently unobserved Fs are also Gs. No such guarantee can be given by the past observations. And this is so, no matter how many observations of Fs have been made (short of having observed all of them, while realizing that this has occurred).

Hume presents his argument as one that uncovers a limitation upon the power or reach of reason — that is, upon how much can be revealed to us by reason as such. Possibly, this is in part because that is the non-trivial aspect of his argument. Overall, his argument is describing a limitation upon the power or reach both of reason and of observation — upon how far these faculties or capacities can take us towards proving the truth of various beliefs which, inevitably, we find ourselves having. But that limitation reflects both a point that is non-trivially true (about reason) and one that is trivially true (about observation). Hume combines those two points (as follows) to attain his fallibilism. (1) It is trivially true that any observations that have been made at and before a given time have not been of what, at that time, is yet to be observed. (2) It is true (although not trivially so) that our powers of reason face a limitation of their own, one that leaves them unable to overcome (1)’s limitation upon observation. Our capacity to reason — our powers simply of reflection — must concede that, regardless of however unlikely this might seem at the time, the unobserved Fs could be different in a relevant way from those that have been observed. Hence, in particular, whatever powers of reason we might use in seeking to move beyond our observations will be unable to eliminate the possibility that the presently unobserved Fs are quite different (as regards being Gs) from the Fs that have been observed. Our powers of reason must concede — again, even if this seems unlikely at the time — that continued observations of Fs might be about to begin giving results that are quite different to what such observations have previously revealed about Fs being Gs. Obviously, the past observations of Fs (all of which, we are supposing, were Gs) do not tell us that this is likely to occur, let alone that it is about to do so. But, crucially, pure reason tells us that it could be about to occur. (3) Consequently, if we combine (1) and (2), we reach this result:

Neither observation nor reason can reveal with rational certainty anything about the nature of any of the Fs that are presently unobserved.

In other words, there is always a “logical gap” between the observations of Fs that have been made (either by some individual or a group) and any conclusion regarding Fs that have not yet been observed (by either that individual or that group).
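
That gap can be made explicit in a small first-order sketch. The notation is an assumption introduced here for illustration and is not Hume’s own: $O(x)$ abbreviates “x has been observed,” while $F$ and $G$ are as in the text. The symbol $\nvDash$ (from the amssymb package) means “does not entail.”

```latex
% Premise: every F observed so far has been a G.
% Claimed non-entailment: the same need not hold of the unobserved Fs.
\[
  \forall x\,\bigl((F(x) \land O(x)) \rightarrow G(x)\bigr)
  \;\nvDash\;
  \forall x\,\bigl((F(x) \land \lnot O(x)) \rightarrow G(x)\bigr)
\]
% A countermodel with a two-element domain \{a, b\}:
\[
  F(a),\; O(a),\; G(a) \qquad\text{and}\qquad F(b),\; \lnot O(b),\; \lnot G(b)
\]
```

In that model the premise is true (the only observed F is a G) while the conclusion is false (the unobserved F is not a G), so no purely logical step leads from the one to the other.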

Our appreciation of that gap’s existence is made specific — even dramatic — by the Humean thought that the world could be about to change in the relevant respect. We thus see that fallibility cannot be excluded from any justification which we might think is present for a belief that either is or could be an extrapolation from some observations. Such a belief could be about the future (“The sun will rise tomorrow”), the presently unobserved past (“Dinosaurs used to live here”), populations (“The cats in this neighborhood are vicious”), and so on. Beliefs like that are pivotal in our mental lives, it seems.

Indeed, as some philosophers argue, they can be all-but-ubiquitous — even surprisingly so. When you believe that you are seeing a cat, is this an extrapolation from observations? At first glance, it seems straightforwardly observational itself. Yet maybe it is an extrapolation in a less obvious way. Perhaps it is an extrapolation from both your present sensory experience and similar ones that you have had in the past. Perhaps it is implicitly a prediction that the object in front of you is not about to begin looking and acting like a dog, and that it will continue looking and acting like a cat. (Is this part of what it means to say that the object is a cat — a genuine-flesh-and-blood-physical-object cat?) Are even simple observational beliefs therefore concealed or subtle extrapolations? If they are to be justified, will this need to be inductive justification?

If so, the Humean verdict (when formulated in contemporary epistemological language) remains that, even at best, such beliefs are only fallibly justified. Any justification for them would need to be observations from which they might have been extrapolated (even if in fact this is not, psychologically speaking, how they were reached). And no such justification could ever rationally eliminate the possibility that any group of apparently supportive observations is misleading as to what the world would be found to be like if further observations were to be made.

That is Hume’s inductive fallibilism — a fallibilism about all actual or possible inductive extrapolations from observations. Many interpreters believe that his argument established — or at least that Hume meant it to establish — more than a kind of fallibilism. This is why it is generally called an argument for inductive skepticism, not just for inductive fallibilism. (On Hume’s transition from fallibilism to skepticism, see Stove 1973.) Accordingly, his conclusion is sometimes presented more starkly, as saying that observations never rationally show or establish or support or justify at all any extrapolations beyond observational data, even ones that purport only to describe a likelihood of some observed pattern’s being perpetuated. At its most combative, his conclusion might be said — and sometimes is, especially by non-philosophers — to reveal that predictions are rationally useless or untenable, or that any beliefs “going beyond” observational reports are, rationally speaking, nothing more than guesses. Whether or not that skeptical thesis is true depends, for a start, upon whether there can be such a thing as fallible justification — or whether, once fallibility is present, justification departs. Section 10 will consider that issue.

In any case, Hume’s fallibilism is generally considered by philosophers (for instance, see Quine 1969; Miller 1994: 2-13; Howson 2000: ch. 1) to have struck a serious blow against the otherwise beguiling picture of science as delivering conclusive knowledge of the inner continuing workings of the world. It is not uncommon for people to react to this interpretation of Hume’s result by inferring that therefore science — with its reliance upon observations as data, with which it supports its predictions and more general principles and posits — never really gives us knowledge of a world beyond those observations. The appropriateness of that skeptical inference depends on whether or not there can be such a thing as fallible knowledge — or whether, once fallibility is present, knowledge departs. Section 9 will consider that issue.

7. Philosophical Sources of Fallibilism: Descartes

Does Hume’s reasoning (described in section 6) support fallibilism in its most general form? It does, if all beliefs depend for their justification upon extrapolations from observational experience. And section 6 also indicated briefly how there can be more beliefs like that than we might realize. Nevertheless, the usual philosophical reading of Hume’s argument does not assume that the argument shows that all beliefs are to be supported either fallibly or not at all. We should therefore pay attention to another equally famous philosophical argument, one whose conclusion is definitely that no beliefs at all are conclusively justified.

This argument comes to us from the seventeenth-century French philosopher René Descartes. In his seminal Meditations on First Philosophy (1911 [1641]), Descartes ended Meditation I skeptically, denying himself all knowledge. How was that skeptical conclusion derived? It was based upon a fallibilism — a wholly general fallibilism. And his argument for that fallibilism — the Evil Genius (or Evil Demon) argument, as it is often called — may be presented in this way:

Any beliefs you have about … well, anything … could be present within you merely because some evil genius or demon has installed them there. And they might have been installed so as to deceive you: maybe any or all of them are false. Admittedly, you do not feel as if this has happened within you. Nonetheless, it could have done so. Note that the evil genius is not simply some other person, even an especially clever one. Rather, it would be God-like in pertinent powers although malevolent in accompanying intent — mysteriously able to implant any false beliefs within you so that their presence will feel natural to you, leaving you unaware that any of your beliefs are bedeviled by this untoward causal origin. You will never notice the evil genius’s machinations. All will seem normal to you within your mind. It will feel just as it would if you were observing and thinking carefully and insightfully.

Is that state of affairs possible? Indeed it is (said Descartes, and most epistemologists have since agreed with him about that). Moreover, if it is always present as a possibility, then one pressing part of it — your being mistaken — is always present as a possibility. This is always present, as a possibility afflicting each of your beliefs. What is true of you in this respect, too, is true of everyone. The evil genius could be manipulating all of our minds. Hence, any belief could be false, no matter who has it and no matter how much evidence they have on its behalf. Even the evidence, after all, could have been installed and controlled by an evil genius.

Interestingly, the reference to an evil genius as such, provocative though it is, was not essential even to Descartes’ own reasoning. In Meditation I, he had already — immediately prior to outlining the Evil Genius argument — presented a sufficiently fallibilist worry. It concerned the possibility of his having been formed or created in some way — whatever way that might be — which would leave him perpetually fallible. He wanted to believe that God was his creator. However (he wondered), would God create him as a being who constantly makes mistakes, or who is at least always liable to do so? God would be powerful enough to do this. But (Descartes also thought) surely God would have had no reason to allow him to make even some mistakes. Yet manifestly Descartes does make them. So (he inferred), he could not take for granted at this early stage of his inquiry (as it is portrayed in his Meditations) that he has actually been formed or created by a perfect God. The evidence of his fallibility opens the door to the possibility that he does not have that causal background. So (he continues), maybe his causal origins are something less than perfect, as of course they would be if anything less than a perfect God were involved in them. In that event, however, he is even more likely to make mistakes than he would be if God was his creator. In one way or the other, therefore (concludes Descartes), fallibility is unavoidable for him: no belief of his is immune from the possibility of being mistaken. Thus, fallibilism is thrust upon Descartes by this reasoning. (He realizes, nonetheless, that it is subtle reasoning. He might not retain it in his thinking. He might overlook his fallibility, if he is not mentally vigilant. Hence, he proceeds to describe the evil genius possibility to himself, as a graphic way of holding the fallibilism fast in his mind. The Evil Genius argument is, in effect, a philosophical mnemonic for him.)

Descartes himself did not remain a fallibilist. He believed that (in his Meditation II) he had found a convincing answer to that fallibilist argument. This answer was his Cogito, one of philosophy’s emblematic moments, and it arose via the following reasoning. Descartes thought that if ever in fact he is being deceived by an evil genius, at least he will thereby be in existence at these moments. (It is impossible to be an object of deception without existing.) The deception would be inflicted upon him while he exists as a thinker — specifically, as someone thinking whatever false thoughts are being controlled within him by the evil genius. But this entails (reasoned Descartes) that there is a kind of thought about which he cannot be deceived, even by an evil genius. Because he can know that he is having a particular thought, he can know that he exists at that time. And so he thought, “I think, therefore I am.” (This is the usual translation into English of the “Cogito, ergo sum” from Latin. The latter version is from Descartes’ Discourse on Method.) He would thereby know that much, at any rate (inferred Descartes). He need not — and at this point in his inquiry he does not think that he can — know which, if any, of his beliefs about the wider world are true. Nonetheless, he has knowledge of his inner world — knowledge of his own thinking. He would know not only that he is thinking, but even what it is that he is thinking. These beliefs about his mental life are conclusively supported, too, because — as he has just argued — they are beyond the relevant reach of any evil genius. No evil genius can give him these thoughts (that he is thinking and hence existing) and thereby be deceiving him.

But most subsequent epistemologists have been more swayed by the fallibilism emerging from the Evil Genius argument than by Descartes’ reply to that argument. (For a discussion of these issues in Descartes’ project, see Curley 1978; Wilson 1978.) One common epistemological objection to his use of the Cogito is as follows. How could Descartes have known that it was he in particular who was thinking? Shouldn’t he have rested content with the more cautious and therefore less dubitable thought, “There is some thinking occurring” — instead of inferring the less cautious and therefore more dubitable thought, “I am thinking”? That objection was proposed by Georg Lichtenberg in the eighteenth century. (For a criticism of it, see Williams 1978: ch. 3.) An advocate of it might call upon such reasoning as this:

In order to know that it is his own thinking, as against just some thinking or other, Descartes has to know already — on independent grounds — that he exists. However, in that event he would not know of his existing, only through his knowing of the thinking actually occurring: he would have some other source of knowledge of his existence. Yet his Cogito had been relied upon by him because he was assuming that his knowing of the thinking actually occurring was (in the face of the imagined evil genius) the only way for him to know of his existence.

That reasoning would claim to give us the following results. (1) Descartes does not know that he is thinking — because he would have to know already that he exists (in order to be the subject of the thinking which is noticed), and because he can know that he exists only if he already knows that he is thinking (the latter knowledge being all that is claimed to be invulnerable to the Evil Genius argument). (2) Similarly, Descartes does not know that he exists — because he would have to know already that he is thinking (this being all that is claimed to be invulnerable to the Evil Genius argument), and because he could know that he is thinking only by already knowing that he exists (thereby being able to be the subject of the thinking that is being noticed). (3) And once we combine those two results, (1) and (2), what do we find? The objection’s conclusion is that Descartes knows of his thinking and of his existence all at once — or not at all. In short, he is not entitled — as a knower — to the “therefore” in his “I think, therefore I exist.”

That is one possible objection to the Cogito. Still, even if it succeeds on its own terms, it leaves open the following question. Can Descartes have all of that knowledge — the knowledge of his thinking and the knowledge of his existence — all at once? This depends on whether, once he has doubted as strongly and widely as he has done, he can have knowledge even of what is in his own mind. In the mid-twentieth century, the Austrian philosopher Ludwig Wittgenstein mounted a deep challenge to anything like the Cogito as a way of grounding our thought and knowledge. Was Descartes legitimately using words at all so as to form clearly known thoughts, such as “I am thinking”? How could he know what these even mean, unless he is applying some understood language? And Wittgenstein argued that no one could genuinely be thinking thoughts that do not depend upon an immersion in a “public” language, presumably a language shared by other speakers, certainly one already built up over time. In which case, Descartes would be mistaken in believing that, even if the possibility of an evil genius imperils all of his other knowledge, he could retain the knowledge of his own thinking. For even that thinking would have its content only by using terms borrowed from a public language. Hence, Descartes would have to be presupposing some knowledge of that public world, even when supposedly retreating to the inner comfort and security of knowing just what he is thinking. (It should be noted that Wittgenstein himself did not generally direct his reasoning — his Private Language argument, as it came to be called — specifically against Descartes by name. For Wittgenstein’s reasoning, see his 1978 [1953] secs. 243-315, 348-412.)

Of course, even if the Cogito does in fact succeed, epistemologists all-but-unite in denying that such conclusiveness would be available for many — or perhaps any — other beliefs. Accordingly, we would still confront an all-but-universal fallibilism, with Descartes having provided an easy way to remember our all-but-inescapable fallibility. In any case, it remains possible that the Cogito does not succeed, and that instead the Evil Genius argument shows that no belief is ever conclusively justified. Descartes’ argument is not the only one for such a fallibilism. But most epistemologists still refer to it routinely and with some respect, as being a paradigm argument for the most general form of fallibilism.

8. Implications of Fallibilism: No Knowledge?

If we were to accept that fallibilism is true, to what else would we thereby be committed? In particular, what further philosophical views must we hold (all else being equal) if we hold fallibilism?

Probably the most significant idea that arises, in response to that question, is the suggestion that any fallibilist about justification has to be a skeptic about the existence of knowledge. (There is also the proposal that she must be a skeptic about the existence of justification. Section 10 will discuss that proposal.) This potential implication has made fallibilism particularly interesting to many philosophers. Should we accept the skeptical thesis that because (as fallibilists claim) no one is ever holding a belief infallibly, no one ever has a belief which amounts to being knowledge? In this section and the next, we will consider that question — first (in this section) by examining how one might argue for the skeptical thesis, next (in section 9) by seeing how one might argue against it.

That hypothesized skeptic is reasoning along these lines:

  1. Any belief, if it is to be knowledge, needs to be conclusively justified.
  2. No belief is conclusively justified. [Fallibilism tells us this.]
  3. Hence, no belief is knowledge. [This follows from 1-plus-2.]
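
The logical form of that reasoning can be set out in a short sketch. The predicate letters are labels assumed here for illustration: $K(b)$ for “belief b is knowledge” and $C(b)$ for “belief b is conclusively justified.”

```latex
\begin{align*}
&\text{(1)}\quad \forall b\,\bigl(K(b) \rightarrow C(b)\bigr)
  && \text{knowledge requires conclusive justification} \\
&\text{(2)}\quad \forall b\,\lnot C(b)
  && \text{fallibilism} \\
&\text{(3)}\quad \forall b\,\lnot K(b)
  && \text{from (1) and (2), by modus tollens for each } b
\end{align*}
```

Since the form is valid, the weight of the argument falls entirely on its premises.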

Fallibilism gives us 2; deductive logic gives us 3 (as following from 1 and 2); and in this section we are not asking whether fallibilism is true. (We are assuming, for the sake of argument, that it is.) So, our immediate challenge is to ask whether 1 is true. Is it a correct thesis about knowledge? Does knowledge require infallibility (as 1 claims it does)? The rest of this section will evaluate what are probably the two most commonly encountered arguments for the claim that knowledge is indeed like that.

(1) Impossibility. Many people say this about knowledge:

If you have knowledge of some aspect of the world, it is impossible for you to be mistaken about that aspect. (An example: “If you know that it’s a dog, you can’t be mistaken about its being one.”)

We may call that the Impossibility of Mistake thesis. Its advocates might infer, from the conjunction of it with fallibilism, that no one ever has any knowledge. Their reasoning would be like this:

Because no one ever has conclusive justification for a belief, mistakes are always possible within one’s beliefs. Hence, no beliefs attain the rank of knowledge. (We would just think — mistakenly — that often knowledge is present.)

But almost all epistemologists would regard that sort of inference as reflecting a misunderstanding of what the Impossibility of Mistake thesis is actually saying. More specifically, they will say that there is a misunderstanding of how the term “impossible” is being used in that thesis. Here are two possible claims that the Impossibility of Mistake thesis could be thought to be making:

Any instance of knowledge is — indeed, it must be — directed at what is true. (Knowledge entails truth.)

Any instance of knowledge has as its content what, in itself, could not possibly be false. (Knowledge entails necessary truth.)

The first of those two interpretations of the Impossibility of Mistake thesis says that knowledge, in itself, has to be knowledge of what is true. The second of the two possible interpretations says that knowledge is of what, in itself, has to be true. The two claims will be correlatively different in what they imply.
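
The difference between the two interpretations is one of scope, which a sketch in modal notation can display. The symbols are assumptions introduced here for illustration: $\Box$ for “it is necessarily the case that,” and $K p$ for “someone knows that p.”

```latex
\begin{align*}
&\text{First interpretation:}  && \Box\,(K p \rightarrow p) \\
&\text{Second interpretation:} && K p \rightarrow \Box\, p
\end{align*}
```

In the first, the necessity governs the whole conditional linking knowledge to truth; in the second, it attaches to the known proposition itself.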

Epistemologists will insist that the first possible interpretation (which could be called the Necessarily, Knowledge Is of What Is True thesis) is manifestly true — but that it does not join together with fallibilism to entail skepticism. Recall (from (2) in section 2) that fallibilism does not deny that there can be truths among our claims and thoughts. It denies only that we are ever conclusively justified in any specific claim or thought as to which claims or thoughts are true. So, while the Necessarily, Knowledge Is of What Is True thesis entails that any case of knowledge would be knowledge of a truth, fallibilism — because it does not deny that there are truths — does not entail that there is no knowledge.

Epistemologists will also deny that the second possible interpretation (which may be called the Knowledge Is of What Is Necessarily True thesis), even if it is true, entails skepticism. Recall (this time from (3) in section 2) that fallibilism is not a thesis which denies that knowledge could ever be of contingent truths. So, while the Knowledge Is of What Is Necessarily True thesis entails that any case of knowledge would be knowledge of a necessary truth, fallibilism — because it does not, in itself, deny that there is knowledge of contingent truths — does not entail that there is no knowledge. (But most epistemologists, incidentally, will deny that the Knowledge Is of What Is Necessarily True thesis is true. They believe that — if there can be knowledge at all — there can be knowledge of contingent truths, not only of necessary ones.)

(2) Linguistic oddity. Another way in which people are sometimes led to deny that a wholly general fallibilism is compatible with people ever having knowledge is by their reflecting on some supposed linguistic infelicities. Imagine saying or thinking something like this:

“I know that’s true, even though I could be mistaken about its being true.” (An example: “I know that it’s raining, even though I could be mistaken in thinking that it is.”)

That is indeed an odd way to speak or think. Let us refer to it as The Self-Doubting Knowledge Claim. Epistemologists also refer to such claims as concessive knowledge-attributions — for short, as CKAs. Should we infer, from that claim’s being so linguistically odd, that no instance of knowledge can allow the possibility (corresponding to the “could” in The Self-Doubting Knowledge Claim) of being mistaken? Would this imply the incompatibility of fallibilism with anyone’s ever having knowledge? Does this show that, whenever one’s evidence in support of a belief does not provide a conclusive proof, the belief fails to be knowledge?

Few epistemologists will think so. They are yet to agree on what, exactly, the oddity of a sentence like The Self-Doubting Knowledge Claim reflects. (Very roughly: there is some oddity in that claim’s expressed mixture of confidence and caution.) But few of them believe that the oddity — however, ultimately, it is to be understood — will imply that knowledge cannot ever be fallible. Their usual view is that the oddity will be found to reside only in the talking or the thinking — in someone’s actively using — any such sentence. And this could be so (they continue) without the sentence’s also actually being false, even when it is being used. Some sentences which clearly are internally logically consistent — and hence which in some sense could be true — cannot be used without a similar linguistic oddity being manifested. Try saying, for example, “It’s raining, but I don’t believe that it is.” As the twentieth-century English philosopher G. E. Moore remarked (and his observation has come to be called Moore’s Paradox), something is amiss in any utterance of that kind of sentence. (For more on Moore’s Paradox, see Sorensen 1988, ch. 1; Baldwin 1990: 226-32.) This particular sentence — “It’s raining, but I don’t believe that it is” — is manifestly odd, seemingly in a similar way to any utterance of The Self-Doubting Knowledge Claim. Yet this does not entail the sentence’s being false. For each half of it could well be true; and they could be true together. The fact that it is raining is logically consistent with the speaker’s not believing that it is. (She could be quite unaware of the weather at the time.) So, the sentence could be true within itself, no matter that it cannot sensibly be uttered, say. That is, its content — what it reports — could be true, even if it cannot sensibly be asserted — as a case of reporting — in living-and-breathing speech or thought.

And the same is true (epistemologists will generally concur) of The Self-Doubting Knowledge Claim, the analogous sentence about knowledge and the possibility of being mistaken. Are they correct about that? The next section engages with that question.

9. Implications of Fallibilism: Knowing Fallibly?

The question with which section 8 ended amounts to this: is it possible for there to be fallible knowledge? If The Self-Doubting Knowledge Claim could ever be true, this would be because at least some beliefs are capable of being knowledge even when there is an accompanying possibility of their being mistaken. Any such belief, it seems, would thereby be both knowledge and fallible.

Many epistemologists, probably the majority, wish to accept that there can be fallible knowledge (although they do not always call it this). Few of them are skeptics about knowledge: almost all epistemologists believe that everyone has much knowledge. But what do they believe about the nature of such knowledge? When an epistemologist attributes knowledge, what — more fully — is being attributed? In general, epistemologists also accept that (for reasons such as those outlined in sections 5 through 7) knowledge is rarely, if ever, based upon infallible justification: they believe that there is little, if any, infallible justification. Hence, most epistemologists, it seems, accept that when people do gain knowledge, this usually, maybe always, involves fallibility.

Epistemologists generally regard this fallibilist approach as more likely to generate a realistic conception of knowledge, too. Their aim is to be tolerant of the cognitive fallibilities that people have as inquirers, while nevertheless according people knowledge (usually a great deal of it). The knowledge would therefore be gained in spite of the fallibility. And, significantly, it would be a kind of knowledge which somehow reflects and incorporates the fallibility. Indeed, it would thereby be fallible knowledge. (It would not be infallible knowledge coexisting with fallibility existing only elsewhere in people’s thinking.) With this strategy in mind, then, epistemologists who are fallibilists tend not to embrace skepticism.

Nor (if section 8 is right) should they do so. That section reported (i) the two reasons most commonly thought to show that fallibility in one’s support for a belief is not good enough if the belief is to be knowledge, along with (ii) the explanations of why (according to most epistemologists) those reasons mentioned in (i) are not good enough to entail their intended result. Given (ii), therefore, (i) will at least fail to give us infallible justification for thinking that fallible knowledge is not possible. Accordingly, perhaps such knowledge is possible. But if it is, then what form would it take?

Almost all epistemologists will adopt this generic conception of it:

Any instance of fallible knowledge is a true belief which is at least fallibly (and less than infallibly) justified.

(And remember that F*, in section 4, gave us some sense of what fallible justification is.) Let us call this the Fallible Knowledge Thesis. It is an application, to fallible knowledge in particular, of what is commonly called the Justified-True-Belief Analysis of Knowledge. (For an overview of that sort of analysis, see Hetherington 1996.) As stated, the Fallible Knowledge Thesis is quite general, in that it says almost nothing about what specific forms the justification within knowledge might take; all that it does require is that the justification would provide only fallible support.

Nonetheless, generic though it is, the question still arises of whether the Fallible Knowledge Thesis is ever satisfiable, let alone actually satisfied. And that question readily leads into this more specific one: Can a true belief ever be knowledge without having its truth entailed by the justification which is contributing to making the belief knowledge? (Sometimes this talk of justification is replaced by references to warrant, where this designates the justification and/or anything else that is being said to be needed if a particular true belief is to be knowledge. For that use of the term “warrant,” see Plantinga 1993.) Section 8 has disposed of some objections to there being any fallible knowledge; and the previous paragraph has gestured at how — via the Justified-True-Belief Analysis — one might conceive of fallible knowledge. Nonetheless, there could be residual resistance to accepting that there can be fallible knowledge like that. Undoubtedly, some people will think, “There just seems to be something wrong with allowing a belief or claim to be knowledge when it could be mistaken.”

That residual resistance is not clearly decisive, though. It could well owe its existence to a failure to distinguish between two significantly different kinds of question. The first asks whether a particular belief, given the justification supporting it, is true (and thereby fallible knowledge). The other question asks whether, given that belief’s being true, there is enough supporting justification in order for it to be (fallible) knowledge. The former question is raised from “within” a particular inquiry into the truth of a particular belief. The latter question arises from “outside” that inquiry into that belief’s being true (even if this question is arising within another inquiry, perhaps an epistemological one). There is no epistemologically standard way of designating the relevant difference between those kinds of question. Perhaps the following is a helpful way to clarify that difference.

(1) The not-necessarily-epistemological question as to whether a belief is true. Imagine trying to ascertain whether some actual or potential belief or claim is true. You ask yourself, say, “Do I know whether I passed that exam?” Suppose that you have good — fallibly good — evidence in favor of your having passed the exam. (You studied well. You concentrated hard. You felt confident. Your earlier marks in similar exams have been good.) And now suppose that you recall the Justified-True-Belief Analysis. You apply it to your case. What does it tell you? It tells you just that if your actual or possible belief (namely, the belief that you passed the exam) is true, then — given your having fallibly good evidence supporting the belief — the belief is or would be knowledge, albeit fallible knowledge. But does this reasoning tell you whether the belief is knowledge? It does not. All that you have been given is this conditional result: If your belief is true, then (given the justification you have in support of it) the belief is also knowledge. You have no means other than your justification, though, of determining whether the belief is true; and because the justification is fallible, it gives you no guarantee of the belief’s being true (and thereby of being knowledge). Moreover, if fallibilism is true, then any justification which you might have, no matter how extensive or detailed it is, would not save you from that plight. Thus (given fallibilism), you are trapped in the situation of being able to reach, at best, the following conclusion: “Because my evidence provides fallible justification for my belief, the belief is fallible knowledge if it is true.” At which point, most probably, you will wonder, “Is it true? That’s what I still don’t know. (I have no other way of knowing it to be true.)” And so — right there and then — you are denying that your belief is knowledge, because you are denying that you know it to be true. 
The fallibility in your justification leaves you dissatisfied, as an inquirer into the truth of a particular belief, at the idea of allowing that it could be knowledge, even fallible knowledge. When still inquiring into the truth of a particular belief, it is natural for you to deny that (even if, as it happens, the belief is true) your having fallible justification is enough to make the belief knowledge.

(2) The epistemological question as to whether a belief is knowledge. But the epistemologist’s question (asked at the start of this section) as to whether there can be fallible knowledge is not asked from the sort of inquirer’s perspective described in (1). The epistemologist is not asking whether your particular belief is true (while noting the justification you have for the belief). That is the question you are restricted to asking, when you are proceeding as the inquirer in (1). The epistemological question is subtly different. It does not imagine a fallibly justified belief — before asking, without making any actual or hypothetical commitment as to the belief’s truth, whether the belief is knowledge. Rather, the epistemologist’s question considers the conceptual combination of the belief plus the justification for it plus the belief’s being true — which is to say, the whole package that, in this case, is deemed by the Justified-True-Belief Analysis to be knowledge — before proceeding to ask whether this entirety is an instance of knowledge. To put that observation more simply, this epistemological question asks whether a belief which is fallibly justified, and which is true, is (fallible) knowledge. This is the question of whether your belief is knowledge, given (even if only for argument’s sake) that it is true. In (1), your focus was different to that. In wondering whether you had passed the exam, you were asking whether the belief is true: you were still leaving open the issue of whether or not the belief is true. And, as you realized, your fallible justification was also leaving open that question. For it left open the possibility of the belief’s falsity.

Consequently, from (1), it is obvious why an inquirer might want infallibility in her justification for a belief’s truth. Infallibility would mean her not having to leave open the question of the belief’s truth. Without infallibility, the possibility is left open by her justification (which is her only indication of whether her belief is true) of her belief being false — and hence not knowledge. (This is so, even if we demand that, in order for an inquirer’s belief to be knowledge, she has to know that it is. That demand is called the KK-thesis (with its most influential analysis and defense coming from Hintikka 1962: ch. 5) — because one’s having a piece of knowledge is taken to require one’s Knowing that one has that Knowledge. Yet even satisfying that demand does not remove the rational doubt described in (1). If the extra knowledge — the knowledge of the initial belief’s being knowledge — is not required to be infallible itself, then scope for doubt will remain as to whether the initial belief really is knowledge.) But if we can either (i) know or (ii) suppose (for the sake of another kind of inquiry) that the belief is true, then we may switch our perspective, so as to be asking a different question. That is what the epistemologist is doing in (2), by adopting the latter, (ii), of these two options. She supposes, for the sake of argument, that the belief is true; then she can ask, “Would the belief’s being both true and fallibly justified suffice for it to be knowledge?” She can do this without knowing at all, let alone infallibly, whether the belief is true. (She will also not know infallibly, at least not via this questioning, whether the belief is knowledge. Yet what else is to be expected if fallibilism is true?)

It is also obvious, from (1), why an inquirer might want infallibility in her justification, insofar as she is wondering whether to say or claim that some actual or potential belief of hers is knowledge. Nonetheless, this does not entail her needing such justification if her belief is to be knowledge. Remember — from (2) in section 8 — that whether one has a specific piece of knowledge could be quite a different matter to whether one may properly claim to have it. Similarly, most epistemologists will advise us not to confuse what makes a belief knowledge with what rationally assures someone that her belief is knowledge. For example, it is possible — according to fallibilist epistemologists in general — for a person to have some fallible knowledge, even if she does not know infallibly which of her beliefs attain that status.

This section began by asking the epistemological question of whether there can be fallible knowledge. And with our having seen — in this section’s (2) — what that question is actually asking, along with — in this section’s (1) — what it is not asking, we should end the section by acknowledging that, in asking that epistemological question, we need not be crediting epistemological observers with having a special insight into whether, in general, people’s beliefs are true. The question of whether those beliefs are true is not the question being posed by the epistemological observer. She is asking whether a particular belief is knowledge, given (even if only for argument’s sake) that it is true and fallibly justified. She is asking this from “above” or “outside” the various “lower level” or “inner” attempts to know whether the given beliefs are true. The other (“lower level”) inquirers, in contrast, are asking whether their fallibly justified beliefs are true. There is fallibility in each of those processes of questioning; they just happen to have somewhat different subject-matters and methods.

We should not leave a discussion of the Fallible Knowledge Thesis without observing that, even if it is correct in its general thrust, epistemologists have faced severe challenges in their attempts to complete its details — to make it more precise and less generic. Over the past forty or so years, there have been many such attempts. But these have encountered one problem after another, mostly as epistemologists have struggled to solve what is often called the Gettier Problem, stemming from a 1963 article by Edmund Gettier.

A very brief word on that problem is in order here. It has become the epistemological challenge of defining knowledge precisely, so as to understand all actual or possible cases of knowledge — where one of the project’s guiding assumptions has been that it is possible for instances of knowledge to involve justification which supplies only fallible support. In other words, the project has striven to find a precise analysis of what the Fallible Knowledge Thesis would deem to be fallible knowledge; and, unfortunately, the Gettier Problem is generally thought by epistemologists still to be awaiting a definitive solution. Such a solution would determine wholly and exactly how fallible a particular justified true belief can be, and in what specific ways it can be fallible, without that justified true belief failing to be knowledge. In the meantime (while awaiting that sort of solution), epistemologists incline towards accepting the Justified-True-Belief Analysis — represented here in the Fallible Knowledge Thesis — as being at least approximately correct. Certainly in practice, most epistemologists treat the analysis as being correct enough — so that it functions well as giving us a concept of knowledge that is adequate to whatever demands we would place upon a concept of knowledge within most of the contexts where we need a concept of knowledge at all. Such epistemologists take the difficulties that have been encountered in the attempts to ascertain exactly how a fallibly justified true belief can manage to be knowledge as being difficulties of mere (and maybe less important) detail, not ones of insuperable and vital principle. Those epistemologists tend to assume that eventually the needed details will emerge, that these will be agreed upon by epistemologists, and hence that the basic idea behind the Fallible Knowledge Thesis will finally and definitively be vindicated. (For more on the history of that epistemological project, see Shope 1983 and Hetherington 2016.)

But again, that definitive vindication is yet to be achieved. And, of course, it will not eventuate if we should be answering “No” to the question (discussed earlier in this section) of whether a true belief which is less than infallibly justified is able to be knowledge. When there is fallibility in the justification for a particular true belief, is this fact already sufficient to prevent that belief from being knowledge? Few epistemologists wish to believe so. What we have found in this section is that they are at least not obviously mistaken in that optimistic interpretation.

10. Implications of Fallibilism: No Justification?

Sometimes epistemologists believe that fallibilism opens the door to an even more striking worry than the one discussed in section 9 (namely, the possibility of there being no knowledge, due to the impossibility of knowledge’s ever being fallible). Sometimes they infer, from the presence of fallibility, that even justification (let alone knowledge) is absent. That is, once fallibility enters, even justification (indeed, all justification) departs. Consequently, those epistemologists, once they accept that a universal fallibilism obtains, are skeptics even about the existence of justification. (For an example of such an approach, see Miller 1994: ch. 3.)

How would that interpretation of the impact of fallibilism be articulated? In effect, the idea is that if evidence, say, is to provide even good (let alone very good or excellent or perfect) guidance as to which beliefs are true, it is not allowed to be fallible. No justification worthy of the name is able to be merely fallible. And from that viewpoint, of course, skepticism beckons insofar as no one is ever capable of having any infallible justification. If fallibility is rampant, yet infallibility is required if evidence or the like is ever to be supplying real justification, then no real justification is ever supplied. In short, no beliefs are ever justified.

That is a wholly general skepticism about justification, emerging from a wholly general fallibilism. A possible example of that form of skepticism would be the one with which Descartes ended his Meditation I. Cartesian evil genius skepticism would say that, because there is always the possibility of Descartes’ evil genius (in section 7) controlling our minds, any evidence or reasoning that one ever has could be a result just of the evil genius’s hidden intrusion into one’s mind. The evil genius — by making everything within one’s mind false and misleading — could render false all of one’s evidence, along with all of one’s ideas as to what is good reasoning. None of one’s evidence, and none of one’s beliefs as to how to use that evidence, would be true. However, if there were no truth anywhere in one’s thinking (with one never realizing this), then no components of one’s thinking would be truth-indicative or truth-conducive. No part of one’s thinking would ever lead one to have an accurate belief. Continually, one would both begin and end with falsity. And there are many epistemologists in whose estimation this would mean that no part of one’s thinking is ever really justifying some other part of one’s thinking. For justification is usually supposed to have some relevant link to truth. And presumably there would be no such link, if every single element in one’s thinking is misleading — as would be the case if an evil genius was at work. Is that possible, then? Moreover, is it so dramatic a possibility that if we are forever unable to prove that it is absent, then our minds will never contain real justification for even some of our beliefs?

A potentially less general skepticism about justification would be a Humean inductive skepticism (mentioned in section 6). The thinking behind this sort of skepticism infers — from the inherent fallibility of any inductive extrapolations that could be made from some observations — that no such extrapolation is ever even somewhat rational or justifying. Again, the skeptical interpretation of Humean inductive fallibilism is that, given that all possible extrapolations from observations are fallible, neither logic nor any other form of reason can favor one particular extrapolation over another. The fallibilism implies that there is fallibility within any extrapolation: none are immune. And the would-be skeptic infers from this that, once there is such widespread fallibility, there may as well be a complete absence of any pretence at rationality. The fallibility will be inescapable, even as we seek to defend the rationality of one extrapolation over another. Why is that? Well, we could mount such a defense only by pointing to one sort of extrapolation’s possessing a better past record of predictive success, say. But we would be pointing to that better past record, only in order to infer that such an extrapolation is more trustworthy on the present occasion. And that inference would itself be an inductive extrapolation. It, too, is therefore fallible. Accordingly, if there was previously a need to overcome inductive fallibility (with this need being the reason for consulting the past records of success in the first place), then there remains such a need, even after past records of success have been consulted. In this way, it is the fallibility’s inescapability that generates the skepticism.

Yet, as we noted earlier, most epistemologists would wish to evade or undermine skeptical arguments such as these: arguments that seek to convert a kind of fallibilism into a corresponding skepticism. How might this non-skeptical maneuver be achieved? There has been a plethora of attempts, too many to mention here. (For one survey, see Rescher 1980.) Moreover, no consensus has developed on how to escape skeptical arguments like these. That issue is beyond the scope of this article.

What may usefully (even if generically) be described here, however, is a fundamental choice as to how to interpret the force of fallibilism within our cognitive lives. Any response to the skeptical challenges will make that choice (even if usually implicitly and in some more specific way). The basic choice will be between the following two underlying pictures of what a wholly general fallibilism would tell us about ourselves:

(A) The inescapable fallibility of one’s cognitive efforts would be like the inescapable limits — whatever, precisely, these are — upon one’s bodily muscles. These limit what one’s body is capable of — while nonetheless being part of how it achieves whatever it does achieve. Inescapable fallibility would thus be like a background limitation — always present, sometimes a source of frustration, but rarely a danger. When used appropriately, muscles strengthen themselves in accomplished yet limited ways. Would the constant presence of fallibility be like a (fallibly) self-correcting mechanism?

(B) Inescapable fallibility would be like a debilitating illness which “feeds upon” itself. It would become ever more dangerous, as its impact is compounded by repeated use. This would badly lower the quality of one’s thinking. (For a model of that process, notice how easily instances of minor fallibility can interact so as to lead to major fallibility. For example, a sequence in which one slightly fallible piece of evidence after another is used as support for the next can end up providing very weak, overly fallible support: [80%-probabilification X 80%-probabilification X 80%-probabilification X 80%-probabilification], which multiplies out to only about 41%-probabilification.)
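
The compounding described in (B) can be checked with a few lines of arithmetic. The following is only an illustrative sketch, not part of the original discussion: it assumes that the support along a chain of evidence is modelled by simply multiplying the strength of each link, as the bracketed 80% sequence suggests, and the function names chained_support and links_until_below are hypothetical ones introduced here.

```python
def chained_support(link_strengths):
    """Multiply the support values along a chain of evidence.

    Assumption (for illustration only): each link's support is a
    probability between 0 and 1, and the links combine by multiplication.
    """
    support = 1.0
    for strength in link_strengths:
        support *= strength
    return support


def links_until_below(strength, threshold):
    """Count how many equally strong links are needed before the
    overall support first drops below the given threshold."""
    if not 0.0 < strength < 1.0:
        raise ValueError("strength must lie strictly between 0 and 1")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in the interval (0, 1]")
    count = 0
    support = 1.0
    while support >= threshold:
        support *= strength
        count += 1
    return count


# Four links, each giving 80% support, as in the example in (B).
overall = chained_support([0.8, 0.8, 0.8, 0.8])
print(f"{overall:.4f}")  # 0.4096, i.e. roughly 41%

# Number of 80% links needed before support falls below one half.
print(links_until_below(0.8, 0.5))  # 4
```

On this simple model, three links at 80% still leave the overall support just above one half, while the fourth takes it below. Whether multiplication is the right way to combine fallible support is itself a substantive assumption, and one that a defender of (A) might well contest.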

How are we to choose between (A) and (B) — between the Limited Muscles model of fallibilism and the Debilitating Illness model of it?

Because most epistemologists are non-skeptics, they favor (A) — the Limited Muscles model. This is not to insist that thinking in an (A)-influenced way is bound to succeed against skeptical arguments. The point right now is simply that this way of thinking is one possible goal for an epistemologist. It is the goal of finding some means of successfully understanding and defending an instance of the Limited Muscles model. What is described by that model would be such a theorist’s desired way to conceive, if this is possible, of the general idea of inescapable fallibility. She will seek to conceive of inescapable fallibility as being manageable, even useful. Hence, the Limited Muscles model is a framework which — in extremely general terms — she will hope allows her to understand — in more specific terms — the nature and significance of fallibilism. Perhaps the most influential modern example of this approach was Quine’s (1969), centered upon a famous metaphor from Neurath (1959 [1932/33], sec. 201). That metaphor portrays human cognitive efforts as akin to a boat, afloat at sea. The boat has its own sorts of fallibility. It is subject to stresses and cracks. And how worrying is that? Must the boat sink whenever those weaknesses manifest themselves? No, because that is not how boats usually function. In general, repairs can be made. This may occur even while the boat is still at sea. Structurally, it is strong enough to support repairs to itself, even as it continues being used, even while making progress towards its destination. Neurath regarded cognitive progress as being like that — as did Quine, who further developed Neurath’s model. On what Quine called his “naturalized” conception of epistemology (a conception that many subsequent thinkers have sought to make more detailed and to apply more widely), human observation and reason make cognitive progress in spite of their fallibility. 
They do so, even when discovering their own fallibility — finding their own stresses and cracks. Must they then sink, floundering in futility? No. They continue being used, often while repairing their own stresses and cracks — reliably correcting their own deliverances and predictions. Section 5 asked whether science is an especially fallible method. As was also noted, though, science provides impressive results. Indeed, it was Quine’s favored example of large-scale cognitive progress. How can that occur? How can scientific claims — including so many striking ones — be justified, in spite of the fallibility that remains? Maybe science is like a ship that carries within it some skilled and imaginative artisans (carpenters, welders, electricians, and the like). Not only can it survive; it can become more grand and capable when being repaired at sea. (Even so, is such cognitive progress best described in probabilistic terms? On that possibility, implied by Humean fallibilism, see Howson 2000.)

Naturally, in contrast to that optimistic model for thinking about fallible justification, skeptics will prefer (B) — the Debilitating Illness model. We have examined (in sections 6 and 7) a couple of specific ways in which they might try to instantiate that general model. We have also seen (in sections 8 through 10) some reasons why those skeptics might not be right. Perhaps they overstate the force of fallibilism — inferring too much from the facts of fallibility. In any case, the present point is that skeptics (like non-skeptics) seek specific arguments in pursuit of a successful articulation and defense of an underlying picture of inescapable fallibility. Both skeptics and non-skeptics thereby search for an understanding of fallibilism’s nature and significance. They simply reach for opposed conceptions of what fallibilism implies about people’s ability to observe and to reason justifiably.

So, there is a substantial choice to be made; and each of us makes it, more or less carefully and consciously, when reflecting upon these topics. Which of those two basic interpretive directions, then, should we follow? The intellectual implications of this difficult choice are exhilaratingly deep.

11. References and Further Reading

  • Baldwin, T. G. E. Moore. London: Routledge, 1990. 226-32.
    • On Moore’s paradox.
  • Buckle, S. Hume’s Enlightenment Tract: The Unity and Purpose of An Enquiry Concerning Human Understanding. Oxford: Oxford University Press, 2001. Part 2, chapter 4.
    • On Hume’s famous skeptical reasoning in his first Enquiry.
  • Conee, E. and Feldman, R. Evidentialism: Essays in Epistemology. Oxford: Oxford University Press, 2004.
    • A traditional (and popular) approach to understanding the nature of epistemic justification.
  • Curley, E. M. Descartes against the Skeptics. Cambridge, Mass.: Harvard University Press, 1978.
    • On Descartes’ skeptical doubting.
  • Descartes, R. The Philosophical Works of Descartes, Vol. I, (eds. and trans.) E. S. Haldane and G. R. T. Ross. Cambridge: Cambridge University Press, 1911 [1641].
    • Contains both the Discourse and the Meditations. These include both the Evil Genius argument and the Cogito.
  • Feldman, R. “Fallibilism and Knowing That One Knows.” The Philosophical Review 90 (1981): 266-82.
    • On the nature and availability of fallible knowledge.
  • Gettier, E. L. “Is Justified True Belief Knowledge?” Analysis 23 (1963): 121-3.
    • The genesis of the Gettier Problem.
  • Goldman, A. I. “What is Justified Belief?” In G. S. Pappas (ed.), Justification and Knowledge: New Studies in Epistemology. Dordrecht: D. Reidel, 1979.
    • An influential analysis of the nature of epistemic justification.
  • Hetherington, S. Knowledge Puzzles: An Introduction to Epistemology. Boulder, Colo.: Westview Press, 1996.
    • Includes an overview of many of the commonly noticed difficulties posed by the Gettier problem for our attaining a full understanding of fallible knowledge.
  • Hetherington, S. “Knowing Failably.” Journal of Philosophy 96 (1999): 565-87.
    • Describes the genus of which fallible knowledge is a species.
  • Hetherington, S. “Fallibilism and Knowing That One Is Not Dreaming.” Canadian Journal of Philosophy 32 (2002): 83-102.
    • Shows how fallibilism need not lead to skepticism about knowledge.
  • Hetherington, S. “Concessive Knowledge-Attributions: Fallibilism and Gradualism.” Synthese 190 (2013): 2835-51.
    • A fallibilist interpretation of concessive knowledge-attributions (instances of the Self-Doubting Knowledge Claim).
  • Hetherington, S. Knowledge and the Gettier Problem. Cambridge: Cambridge University Press, 2016.
    • A critical analysis of the history of the Gettier Problem.
  • Hintikka, J. Knowledge and Belief: An Introduction to the Logic of the Two Notions. Ithaca, NY: Cornell University Press, 1962. ch. 5.
    • On the KK-thesis — that is, on knowing that one knows.
  • Howson, C. Hume’s Problem: Induction and the Justification of Belief. Oxford: Oxford University Press, 2000.
    • A technically detailed response to Hume’s fallibilist challenge to the possibility of inductively justified belief.
  • Hume, D. An Enquiry Concerning Human Understanding, in Hume’s Enquiries, (ed.) L. A. Selby-Bigge, 2nd edn. Oxford: Oxford University Press, 1902 [1748].
    • This includes, in section IV, the most generally cited version of Hume’s inductive fallibilism and inductive skepticism.
  • Kahneman, D., Slovic, P., and Tversky, A. (eds.). Judgment under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press, 1982.
    • On empirical evidence of people’s cognitive fallibilities.
  • Merricks, T. “More on Warrant’s Entailing Truth.” Philosophy and Phenomenological Research 57 (1997): 627-31.
    • Argues against the possibility of there being fallible knowledge.
  • Miller, D. Critical Rationalism: A Restatement and Defence. Chicago: Open Court, 1994.
    • Discusses many ideas (including a skepticism about epistemic justification) that might arise if fallibilism is true.
  • Morton, A. A Guide through the Theory of Knowledge, 3rd edn. Malden, Mass.: Blackwell, 2003. ch. 5.
    • On the basic idea, plus some possible forms, of fallibilism.
  • Nagel, T. The View from Nowhere. New York: Oxford University Press, 1986.
    • See especially chapters I and V. Discusses the interplay of different perspectives (“inner” and “outer” ones) that a person might seek upon herself, especially as greater objectivity is sought. (This bears upon section 9’s distinction between two possible kinds of question that can be asked about whether a particular belief is fallible knowledge.)
  • Neurath, O. “Protocol Sentences,” in A. J. Ayer (ed.), Logical Positivism. Glencoe, Ill.: The Free Press, 1959 [1932/33].
    • Includes the famous “boat at sea” metaphor.
  • Nisbett, R. and Ross, L. Human Inference: Strategies and Shortcomings of Social Judgment. Englewood Cliffs, NJ: Prentice-Hall, 1980.
    • On empirical evidence of people’s cognitive fallibilities.
  • Peirce, C. S. Collected Papers, (eds.) C. Hartshorne and P. Weiss. Cambridge, Mass.: Harvard University Press, 1931-60.
    • See, for example, 1.120, and 1.141 through 1.175, for some of Peirce’s originating articulation of the concept of fallibilism as such.
  • Plantinga, A. Warrant: The Current Debate. New York: Oxford University Press, 1993.
    • An analysis of some proposals as to what warrant might be within (fallible) knowledge.
  • Quine, W. V. “Epistemology Naturalized,” in Ontological Relativity and Other Essays. New York: Columbia University Press, 1969.
    • A bold and prominent statement of the program of naturalized epistemology, trying to understand fallibility as a part of, rather than a threat to, the justified uses of observation and reason.
  • Reed, B. “How to Think about Fallibilism.” Philosophical Studies 107 (2002): 143-57.
    • An attempt to define fallible knowledge.
  • Rescher, N. Scepticism: A Critical Reappraisal. Oxford: Blackwell, 1980.
    • On fallibilism and many associated skeptical issues about knowledge and justification.
  • Shope, R. K. The Analysis of Knowing: A Decade of Research. Princeton: Princeton University Press, 1983.
    • Presents much of the earlier history of attempts to solve the Gettier problem — and thereby to define fallible knowledge.
  • Sorensen, R. A. Blindspots. Oxford: Oxford University Press, 1988. ch. 1.
    • A philosophical analysis of the kinds of thought or sentence that constitute Moore’s paradox.
  • Stove, D. C. Probability and Hume’s Inductive Scepticism. Oxford: Oxford University Press, 1973.
    • Explains how Hume’s inductive fallibilism gives way to his inductive skepticism.
  • Williams, B. Descartes: The Project of Pure Enquiry. Hassocks: The Harvester Press, 1978.
    • Analysis of Descartes’ skeptical doubts.
  • Wilson, M. D. Descartes. London: Routledge & Kegan Paul, 1978.
    • Includes an account of Descartes’ skeptical endeavors.
  • Wittgenstein, L. Philosophical Investigations, (trans.) G. E. M. Anscombe. Oxford: Blackwell, 1978 [1953]. Sections 243-315, 348-412.
    • Presents the private language argument (against the possibility of anyone’s being able to think in a language which only they could understand).

Author Information

Stephen Hetherington
Email: s.hetherington@unsw.edu.au
University of New South Wales
Australia

Yang Xiong (53 B.C.E.—18 C.E.)

Yang Xiong (Yang Hsiung) was a prolific yet reclusive court poet whose writings and tragic life spanned the collapse of the Former Han dynasty (202 B.C.E.-9 C.E.) and the brief and catastrophic usurpation of the throne by the Imperial Regent Wang Mang (9-23 C.E.). He is best known for his assertion that human nature originally is neither good (as argued by Mencius) nor depraved (as argued by Xunzi) but rather comes into existence as a mixture of both. Yang Xiong’s chief philosophical writings – an abstruse book of divination known as the Tai xuan (The Great Dark Mystery) and his Fa yan (Words to Live By), a collection of aphorisms and dialogues on a variety of historical and philosophical topics – are little known even among Chinese scholars. These works reflect a Daoist concern for cosmology, but may be best described as a product of the intellectual and spiritual syncretism characteristic of the Han dynasty (202 B.C.E.-220 C.E.). As a social critic and classical scholar, he is considered to be the chief representative of the Old Text School (guxue) of Confucianism. Although some think he was one of the most important writers of the late Former Han, he had little influence during his own time and was vilified for his association with the usurper Wang Mang. Consequently, his works have largely been left out of the Confucian canon.

Table of Contents

  1. Life and Writings
  2. Intellectual Context
    1. Han Syncretism and Correlative Cosmology
    2. The Old Text / New Text Controversy
  3. Tai xuan (The Great Dark Mystery)
    1. Date and Significance
    2. The Influence of the Laozi and the Yijing
    3. Correlative Cosmology in the Tai xuan
  4. Fa yan (Words to Live By)
    1. Date and Significance
    2. The Influence of the Lunyu
    3. Syncretism in the Fa yan
    4. Old Text Themes in the Fa yan
    5. Political Philosophy in the Fa yan
    6. View of Human Nature
  5. Poetical Works
  6. References and Further Reading

1. Life and Writings

Yang Xiong was born in 53 B.C.E. in the western city of Chengdu in the province of Shu. His biography in the Qian Han Shu (History of the Former Han) remarks that Yang Xiong was fond of learning, was unconcerned with wealth, office, and reputation, and, because of a speech impediment, spoke little. As a youth he probably was a student of Zhuang Zun, a reclusive marketplace fortune teller who refused to take office, opting instead to use divination and fortune-telling as a means to encourage virtue among the common people. Before coming to the capital he gained renown for his poetic writings, in particular for his fu, a poetic genre associated with an earlier native of Shu, Sima Xiangru (179-117 B.C.E.). Yang Xiong’s reputation as a poet eventually reached the capital of Chang’an, and around 20 B.C.E. he was summoned to the court of Emperor Cheng. Between 14 and 10 B.C.E., Yang Xiong submitted several poetic pieces commemorating imperial sacrifices and hunts, and finally in 10 B.C.E. he was appointed to the humble offices of “Gentleman in Attendance” and “Servitor at the Yellow Gate,” in which he would remain until his final days. While not much is known of Yang Xiong’s activities as a lowly official at the Han court, it appears that, as early as 9 B.C.E., Emperor Cheng issued a decree excusing him from direct official service while allowing him to keep his official title, salary, and access to the imperial library.

Shortly after his appointment, Yang Xiong became disillusioned with the rectifying power of his poetry and stopped writing it for the court. Yang Xiong’s decision appears to have coincided with the death of his son, a tragedy which left him despondent and financially impoverished. Over the next two decades he produced his two works on philology: Cang Jie xun zuan (Annotations to the Cang Jie), a compilation of annotations to the Qin dynasty’s official imperial dictionary, and Fang yan (Dialects), a collection of regional expressions. During this period, he also produced his Tai xuan (The Great Dark Mystery), which he completed around 2 B.C.E., and Fa yan (Words to Live By), which he completed in 9 C.E. – right about the time that the Imperial Regent Wang Mang usurped the throne and established the brief Xin dynasty (9-23 C.E.).

Yang Xiong’s life and writings were overshadowed by the rise and fall of the notorious Wang Mang (45 B.C.E.-23 C.E.). A nephew of the wife of Emperor Yuan (who reigned 48-32 B.C.E.), Wang Mang rose to the rank of Imperial Regent. In 9 C.E., through a combination of court intrigue, political machinations, manipulation of popular superstitions, and opportunity, he seized the throne from the founding House of Liu and declared himself the rightful possessor of the Mandate of Heaven. His short-lived Xin dynasty marks the dividing line between the Former or Western Han (202 B.C.E.-9 C.E.) and the Later or Eastern Han (25-220 C.E.) and, due to widespread rebellion and a series of natural catastrophes, is widely considered one of the most calamitous periods in Chinese history.

While little is known of Yang Xiong’s activities during his final years, his biography notes that, shortly after Wang Mang’s usurpation, Yang Xiong attempted suicide when he was named in a scandal involving one of his former students. He survived the attempt. When Wang Mang heard of it, he ordered all charges against Yang Xiong dropped, proclaiming that the poet had never been involved in any political affairs at court. Yang Xiong’s final work, Ju qin mei xin, appears to have been a controversial memorial presented to Wang Mang around 14 C.E.; Knechtges translates its title as Denigrating Qin and Praising Xin. Yang Xiong died four years later at the age of 71.

2. Intellectual Context

a. Han Syncretism and Correlative Cosmology

The focus of Yang Xiong’s writings during the middle years of his life is commonly seen as reflecting the Han trend toward syncretism and correlative cosmology. While the disunity of the Warring States period (475-221 B.C.E.) provided fertile soil for the flourishing of the “One Hundred Schools of Thought” (baijia), the unification brought about by the Qin (221-206 B.C.E.) and the Former Han dynasties provided the impetus for their coalescence. This combination of diverse views during the Qin and the Han periods can be seen in works such as the Lushi chunqiu (The Spring and Autumn Annals of Mr. Lu) and the Huainanzi (The Master of Huainan), which blend various streams of ancient Chinese thought, including Daoism, Confucianism, Legalism, Huang-Lao thought, Militarism, Mohism, and yinyang and wuxing (Five Phase) thought.

Though Confucianism became the dominant and official school of thought in the Han, it borrowed heavily from earlier schools, particularly the yinyang and wuxing schools. The former explains all entities and events in terms of the interaction between two interdependent properties, yin (associated with darkness, passivity, and femininity) and yang (associated with light, activity, and masculinity). The latter takes a similar approach to understanding natural phenomena but includes the idea that “Five Phases” (associated with metal, wood, water, fire, and earth, respectively) succeed one another in a never-ending cyclical process. The amalgamation of Confucianism, yinyang, and wuxing theory is especially evident in the writings of the scholar Dong Zhongshu (179-104 B.C.E.), whose Chunqiu fanlu (Luxuriant Dew of the Spring and Autumn Annals) illustrates a synthesis between Confucian ethics and an amalgam of yinyang and wuxing cosmology. Attempts to develop exhaustive systems of classification (leishu) were also common during this period and can be seen as part of the larger trend toward syncretization. These systems often use a Five Phase cosmological framework in which things are organized analogically on the basis of their relevant associations, rather than on the basis of some discrete essence. As can be seen in Yang Xiong’s Tai xuan, the correlations which form the basis of these classification systems can be bewildering – especially to anyone unfamiliar with the sorts of complex associations found in early Chinese culture.

b. The Old Text / New Text Controversy

Many historians of Chinese philosophy have identified Yang Xiong’s final and best-known work, the Fa yan (Words to Live By), as representative of a more rational and sober-minded form of Confucianism known as the Old Text School (guxue). In contrast to the New Text School, which relied on versions of the classics written in the simpler and officially recognized script of the Han dynasty known as “new script” (jinwen), the Old Text School relied on versions written in the archaic scripts (guwen) and characters of the Zhou dynasty (c. 1100-221 B.C.E.). Legend has it that these latter texts survived the book burnings of the Qin dynasty by lying concealed in the walls of the home of Confucius. Generally speaking, the Old Text School was associated with the simpler, more pragmatic philosophy of Confucius’s native state of Lu, while the New Text School was associated with the often fantastic writings of Zou Yan (305-240 B.C.E.), a native of Qi and founder of the yinyang and wuxing schools of thought.

Through much of the late Former Han dynasty, Confucianism was under the influence of the yinyang and wuxing theories promoted by New Text adherents. During this period, New Text scholars increasingly became interested in esoteric readings of the classics, cosmological speculation, and calamity and portent interpretation. The chief representatives of this period were classical scholars who commonly employed wuxing and yinyang correlations, numerical calculations, and various techniques of divination to fathom the harmony and continuity of humanity, nature, and the ancestral spirits – and to forecast disruptions.

By the reigns of the last Former Han Emperors, the use of yinyang and wuxing theory in interpreting the classics and the progress of history closely paralleled methods found in apocryphal oracle books and commentaries. These treated the classics as fortune-telling handbooks and used reports of unusual phenomena not to admonish the Emperor boldly, as Zou Yan and Dong Zhongshu had done, but to curry favor with those in power. This trend reached its climax with Wang Mang, whose rise to power and eventual usurpation were associated with, and to a large extent legitimated by, hundreds of favorable omens and the generous rewarding of those who reported them.

While scholars are divided on whether the Old Text School originated from Xunzi’s branch of Confucianism, most characterize this movement as a rational response to the excesses of the New Text School, whose influence had left the Han court and its scholars heavily dependent upon yinyang and wuxing thinking. More broadly, the Old Text School can be seen as a response to the often irrational and superstitious world of the late Former Han – a world that interpreted the classics as containing secret magical formulas and prognostications, was fascinated by talk of immortals, saw itself near the bottom in the historical cycle of rise and decline, and interpreted the passing of each childless Emperor and reports of calamities as portents to be dreaded.

3. Tai xuan (The Great Dark Mystery)

a. Date and Significance

Completed around 2 B.C.E., the Tai xuan is Yang Xiong’s longest and most difficult work. Few scholars have taken time to study it, and those who have often disagree about its import. Some scholars view the main focus of the text to be wuxing theory, others view its main focus to be the Five Constant Virtues (wuchang) of Confucianism, and still others view the Tai xuan as political satire of Wang Mang and other historical figures of the late Former Han. (See Michael Nylan’s translation of and commentary on the Tai xuan (1993).) While the Tai xuan is more a manual of divination than a philosophical treatise, it embodies a number of assumptions about the nature of the world, its cycles of transformation, and the central importance of timeliness in making one’s way in the world. Just as in his earlier poetry, in the Tai xuan Yang Xiong reiterates the view that success and failure do not all come down to individual effort but have much to do with the times and circumstances in which one lives, and that if one does not meet one’s proper time for acting, then one should retire or withdraw and wait for more advantageous times.

b. The Influence of the Laozi and the Yijing

The term xuan in the title is typically used in Chinese literature as a modifier to describe that which is dark, black, mysterious, profound, abstruse or hidden. Yang Xiong, however, uses the term xuan much like the term dao in the Laozi to refer to the hidden fountainhead or initial state out of which things emerge and the mysterious process through which they unfold. While Yang Xiong’s conception of xuan seems to be derived from the Laozi, the text of the Tai xuan is modeled on the Yijing (Book of Changes), certainly the most enigmatic philosophical document in early Chinese literature. Like the Yijing, the Tai xuan is a book of divination based on an evolving sequence of figures that, when taken together, map out the cycles of transformation underlying all things. In both texts, each figure-image-circumstance is articulated through an evolving series of statements that describes and appraises the unfolding of the situation and the meaning of the image. Appended to both the Yijing and the Tai xuan is a set of commentaries that elaborates on the inner meanings of their respective texts.

In some ways, the Tai xuan is even more complex than its model. While the Yijing is made up of 64 hexagrams, the Tai xuan is made up of 81 tetragrams. In the Yijing, each hexagram line can be solid or broken (representing the polarities of yin and yang). In the Tai xuan, each tetragram line can be solid, broken once, or broken twice (representing the triad of heaven, earth, and man), and each of the 81 tetragrams is correlated with, among other things, yin or yang, one of the “Five Phases,” a hexagram from the Yijing, a constellation, days of the calendar, and a musical note.
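
The two figure counts follow from simple combinatorics: six lines with two possible states each give 2^6 = 64 hexagrams, and four lines with three possible states each give 3^4 = 81 tetragrams. A minimal Python sketch of that count is given here; the line-state labels and the function name are illustrative assumptions, not terms drawn from either text.

```python
from itertools import product

# Illustrative labels for the possible states of a single line
# (assumed names, not terms taken from the Yijing or the Tai xuan).
HEXAGRAM_LINE_STATES = ("solid", "broken")
TETRAGRAM_LINE_STATES = ("solid", "broken once", "broken twice")


def enumerate_figures(line_states, lines_per_figure):
    """Return every possible figure as a tuple of line states, one entry per line."""
    return list(product(line_states, repeat=lines_per_figure))


hexagrams = enumerate_figures(HEXAGRAM_LINE_STATES, 6)
tetragrams = enumerate_figures(TETRAGRAM_LINE_STATES, 4)

print(len(hexagrams))   # 2 ** 6
print(len(tetragrams))  # 3 ** 4
```

The sketch only counts line patterns; it says nothing about the order in which either text arranges its figures.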

c. Correlative Cosmology in the Tai xuan

In the Tai xuan, each tetragram is articulated through an evolving series of nine appraisals or judgments (whereas in the Yijing, each hexagram is articulated through a series of six line statements). These line appraisals unfold in a cyclical pattern corresponding to periods of time, the transformations of yin and yang, and a continuous cycle of commencement, maturity and decline. The appraisals can also be divided into those that address the commoner, the noble, and the Emperor.

The often obscure correlative-poetic organization of the images and their associated line appraisals can be seen in the Tai xuan commentary “Numbers of the Dark Mystery,” an example of the Han genre of classificatory works known as leishu. For example, “Numbers of the Dark Mystery” correlates the number five with the earth, the color yellow, fear, wind omens, tumuli, the naked animal (humankind), fur, bottles, weaving, sleeping mats, complying, verticality, glue, sacks, hubs, calves, coffins, bows and arrows, stupidity, and the center courtyard rain well. The basis of these associations is analogical: A is to B as C is to D. The organization scheme is fivefold. The five numerical categories (three and eight, four and nine, two and seven, one and six, and five) correspond to the five directions (east, west, south, north, center), the five phases (wood, metal, fire, water, earth), the seasons (spring, autumn, summer, winter, four seasons), the five colors (green, white, red, black, yellow), the five trades (carpentry, metal smithing, working with fire, waterworks, earth works), and the like.
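
The fivefold scheme described above can be read as a lookup table in which each numerical category selects one entry from every list of correlates. The following Python sketch encodes only the correlations named in the passage; the field names and the helper function are illustrative assumptions rather than anything found in the commentary itself.

```python
# Field names are assumptions chosen for readability.
CATEGORIES = ("numbers", "direction", "phase", "season", "color", "trade")

# Each row pairs a numerical category with its correlates,
# in the order the passage lists them.
ROWS = [
    ((3, 8), "east", "wood", "spring", "green", "carpentry"),
    ((4, 9), "west", "metal", "autumn", "white", "metal smithing"),
    ((2, 7), "south", "fire", "summer", "red", "working with fire"),
    ((1, 6), "north", "water", "winter", "black", "waterworks"),
    ((5,), "center", "earth", "four seasons", "yellow", "earth works"),
]

CORRELATIONS = [dict(zip(CATEGORIES, row)) for row in ROWS]


def correlates_of(number):
    """Return the row of correlates whose numerical category contains `number`."""
    for row in CORRELATIONS:
        if number in row["numbers"]:
            return row
    raise KeyError(f"no category listed for {number}")


print(correlates_of(5)["color"])  # yellow
print(correlates_of(8)["phase"])  # wood
```

Looking up the number five returns earth and yellow, matching the first two items in the longer list of associations quoted above.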

4. Fa yan (Words to Live By)

a. Date and Significance

Unlike that of Yang Xiong’s other works, the dating of the Fa yan is fairly certain. In the final passage of the text, there is a reference to Wang Mang as the Duke of Han. The fact that Wang Mang held this title from 1 to 9 C.E. implies that the Fa yan could not have been submitted after 9 C.E., when he took the title of Emperor. In Fa yan 13:34 there is a reference to the Han dynasty as having ruled for 210 years. If the founding of the Han is taken to be 202 B.C.E., then the passage would have been written no earlier than 8 C.E. Whatever the date of completion, there is little doubt that the Fa yan was written during a period when Wang Mang held in his hands the reins of power and the destiny of his sovereign. It remains Yang Xiong’s best-known work.
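
The lower bound of 8 C.E. can be checked with a short calculation. The sketch below rests on two assumptions that the text does not state outright: that the founding year counts as the first year of rule (inclusive counting), and that the calendar has no year zero, so 1 B.C.E. is followed directly by 1 C.E. The function names are hypothetical.

```python
def to_astronomical(year, era):
    """Convert a B.C.E./C.E. year to astronomical numbering, where 1 B.C.E. is 0."""
    return year if era == "CE" else 1 - year


def from_astronomical(n):
    """Convert an astronomical year number back to a (year, era) pair."""
    return (n, "CE") if n >= 1 else (1 - n, "BCE")


def nth_year_of_rule(founding_year, founding_era, n):
    """Calendar year of the n-th year of rule, counting the founding year as year 1."""
    start = to_astronomical(founding_year, founding_era)
    return from_astronomical(start + n - 1)


# Assumed founding date of the Han: 202 B.C.E.; the text mentions 210 years of rule.
print(nth_year_of_rule(202, "BCE", 210))  # (8, 'CE')
```

Under these assumptions the 210th year of Han rule falls in 8 C.E., which, together with the upper bound of 9 C.E. set by Wang Mang’s change of title, leaves a window of roughly a year for the completion of the text.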

b. The Influence of the Lunyu

In his autobiography, Yang Xiong notes that, just as he modeled his Tai xuan on the greatest of the classics, the Yijing, so he modeled his Fa yan on the text he saw as the greatest of the commentaries – the Confucian Lunyu (Analects). Like the Lunyu, the Fa yan consists of a series of aphorisms and dialogues on a wide variety of historical and philosophical topics. Also like the Lunyu, the language of the Fa yan is archaic, its style terse, and its organization puzzling. While the form, language, and style of the Fa yan all seem to be derived from the Lunyu, the two works are most similar in their underlying concerns.

Both the Lunyu and the Fa yan focus on the perennial Confucian theme of self-cultivation while emphasizing the importance of learning, friendship, role models, rites and music, and the human virtues. Both works look back to the ancient sage kings, the ways of the Zhou dynasty, and the teachings of the classics as models for their own troubled times. Each work has been read as a subtle attack on the predominant political powers. Finally, both the Lunyu and the Fa yan can be characterized as works of frustration that lament the political instability of their respective times, the tendency of princes and officials to overstep their roles, and the failure of Confucius (Kongzi) and Yang Xiong to gain recognition or to exercise political influence.

c. Syncretism in the Fa yan

Among the disjointed sayings and dialogues of the Fa yan, one finds a wide variety of topics and themes. As noted, the most central of these are the perennial Confucian themes: self-cultivation, learning, the natural tendencies, the human virtues, the value of the classics, rites and music, the princely person, the sage, ruling, filial responsibility, and so forth. One also finds in the Fa yan discussions of concepts and themes usually associated with Daoism such as dao (way), de (potency), ziran (spontaneity), wuwei (non-coercive action), minimizing desire, and withdrawing from public life. These topics are often explicated through discussions of an unusually broad assortment of historical figures, including poets, philosophers, rhetoricians, rulers, officials, generals, merchants, rebels, assassins, jesters, recluses, and others. These topics are similarly interpreted through discussions of historical events, such as the collapse of the Zhou dynasty, the intrigues of the Warring States, the rise of the Qin dynasty and its rapid fall, the struggle between Xiang Ji (233-202 B.C.E.) and the Han dynastic founder Liu Bang (247-195 B.C.E.), and the founding of the Han dynasty.

Also included among the numerous topics discussed in the Fa yan are more immediate concerns of the late Former Han. These include the assimilation of heterodox teachings and popular superstitions into commentaries and interpretations of the classics, the decline of the ruling house of Han, the popularity of portents and the rise of Wang Mang, and government reforms in taxation, punishment, division of land, and relations with barbarian tribes. Finally, there are sayings and dialogues which address the concerns of scholar officials living not only in the troubled late Former Han, but throughout much of China’s long history – the practicality and viability of the Confucian way of life, the vanity of the desires for wealth, office and renown, and the challenges of surviving and maintaining one’s integrity in a time of disorder.

d. Old Text Themes in the Fa yan

Throughout the Fa yan, Yang Xiong sets the tone for subsequent representatives of the Old Text School by repeatedly poking fun at questions on magic, immortals, spirits, omens and portents, and esoteric interpretations of the classics. Instead he redirects attention toward concerns directly affecting the living: wealth and poverty, gain and loss, glory and disgrace, success and failure, friendship, joy, integrity, the dangers of public office, ruling the Empire, fate and circumstance, fleeing the world, and death. While the Tai xuan might be described as a synthesis of the various schools of early Chinese thought, the Fa yan elevates the Confucian school above all the others. In aphorism after aphorism, the Fa yan praises Confucius and the classics as the standards, stresses the importance of learning, rites and music, the five virtues, the five relations, and filial responsibility, while at the same time offering sardonic remarks on Daoist, Legalist, and yinyang and wuxing thinkers and their doctrines.

e. Political Philosophy in the Fa yan

On governing, the Fa yan can be seen as advancing a Reformist position. While the literary world of the late Former Han is often explicated in terms of the New and the Old Text schools, the political world of this period is similarly explicated in terms of two opposing camps: Modernists who, like earlier Legalists, advocated policies that sought to increase the wealth and power of the state through conquering border tribes, opening trade routes, and establishing government monopolies, and Reformists who accused Modernists of ignoring the welfare of the people and advocated instead a more frugal form of government that emphasized retrenchment in foreign policy, abolition of government monopolies, and land reform. In the Fa yan, Yang Xiong aligns himself with the Reformists by speaking out against government monopolies and expensive military campaigns and by voicing support for an easing of heavy burdens on the populace and the reinstitution of Zhou dynasty practices and policies.

The Reformist tone of the Fa yan gives credence to the association of Yang Xiong with “the Usurper,” Wang Mang, which has become standard throughout generations of Chinese scholarship. While Wang Mang’s rise to power met with opposition and spurred a number of insurrections, he seems to have found support in the ranks of court scholars for his display of Confucian virtue and his attempts to reorganize the social institutions of the Han along the lines of the Zhou dynasty – the system of rites and institutions highly prized by Confucian scholars since the Warring States period. Some have even seen Wang Mang as genuine in his espousal of Confucian ideals and as a sincere believer that reviving the institutions and rites of the Zhou dynasty would lead to a period of great peace and harmony. The more typical view, dating back to the account of Ban Gu (32-92 C.E.) in the Qian Han Shu (History of the Former Han), portrays Wang Mang as an ambitious, duplicitous, and murderous charlatan who rebelled against his sovereign and left the Empire in ruins.

Little is known of Yang Xiong’s actual political leanings in the face of Wang Mang’s rise to power. Those who portray Yang Xiong as a Wang Mang partisan point to the fact that, when Wang Mang declared himself Emperor, Yang Xiong did not commit suicide or leave court to become a recluse as did many other Han officials. Yang Xiong’s defenders, however, point out that, in his earlier poetic works and in the Fa yan, Yang Xiong has a great deal to say – most of it critical – about men who, in the name of principle, committed suicide or fled to the mountains. As noted above, it appears that Yang Xiong preferred instead to follow his teacher Zhuang Zun – though not as a recluse among men, but as a recluse at court. Although the Fa yan was written during Wang Mang’s rise to power and apparently finished shortly before his usurpation, he is mentioned only once in it. Nonetheless, some read the text as an apology for Wang Mang’s usurpation and the Confucian reforms he attempted to institute. Others read the Fa yan as consisting of a number of cleverly veiled attacks on Wang Mang’s penchant for superstition, his insatiable ambition, and his pretense to being a humble Confucian.

Some passages of the Fa yan have been read as offering neither flattery nor ridicule but bold admonitions, counseling Wang Mang to remember his filial duties and to return the reins of power to the rightful ruler. For example, in Fa yan 8:21, there is a terse passage that reads, “The Red and Black Bows and Arrows do not amount to having it.” Centuries earlier the Imperial house of the Zhou dynasty awarded princes a set of bows and arrows as a symbol of investiture to punish all within their jurisdiction. In an attempt to follow this ancient tradition, a set of red and black bows and arrows was awarded to Wang Mang in 5 C.E. as part of the “Conferment of the Nine Distinctions” bestowed on him by ministers, officials, and scholars of the Han court. While commentators uniformly read the phrase “red and black bows and arrows” in Fa yan 8:21 as a reference to this award, they are divided over its meaning. While some see 8:21 as flattering praise, others see it as reminding Wang Mang that having been bestowed the honor of the “Red and Black Bows and Arrows” does not amount to the possession of the mandate.

The passage most frequently cited as evidence of Yang Xiong’s political leanings is found in Fa yan 13:34, where Wang Mang is compared to two of the greatest ministers in Chinese history: Zhou Gong (the Duke of Zhou, c. 12th century B.C.E.) and Yi Yin (c. 18th century B.C.E.). Given the location of this passage at the very end of the text, some have considered it to be a forgery. Others have seen it as a flattering endorsement of Wang Mang. The great Neo-Confucian philosopher Zhu Xi (1130-1200 CE), for example, reads this passage as lavish praise of Wang Mang’s achievements and, on the basis of it, dismisses Yang Xiong as “Wang Mang’s Grandee.” Still others have seen it as admonishing Wang Mang to be like Yi Yin and Zhou Gong before him and to return the reins of power to his rightful sovereign. It is important to point out that, like Wang Mang, both Yi Yin and Zhou Gong served as Imperial Regents. Like Yi Yin, Wang Mang stood in the wings through a series of short-lived reigns. As in the case of Yi Yin, it fell on Wang Mang to name a successor to the throne. Both Yi Yin and Wang Mang served as regents while their hand-picked successors lacked maturity. But while Yi Yin and Zhou Gong are remembered for handing back the reins of power, Wang Mang is popularly remembered in the chengyu (proverb) as one who “usurped the Han and named himself Emperor.”

f. View of Human Nature

As Wing-tsit Chan and others have pointed out, the view for which Yang Xiong has become most famous – that human nature is a mixture of good and evil – is articulated only in a single passage of the Fa yan (3:2) and is not elaborated any further:

Human nature is a muddle [hun] of good and evil tendencies. Cultivating the good tendencies makes a person good. Cultivating the evil ones makes a person depraved. This force [qi] – is it not like a horse that drives one towards good or evil?

This hardly amounts to the kind of sustained development of a view of human nature found, for example, in the work of Mencius or Xunzi, who represent opposite poles on the continuum of ancient Chinese views of human nature. Nonetheless, Yang Xiong’s view here, although undefended in philosophical terms, contradicts Mencius’ view that human nature originally is good and can only be warped (but never entirely destroyed) through neglect or negative influences. After Mencius’ view became the orthodox one among Confucians, especially during the Neo-Confucian movement of medieval and early modern China, Yang Xiong’s work came in for a great deal of criticism from Confucians. Thus, rather like Xunzi, Yang Xiong may be seen as something of a black sheep among early Confucians because of his deviation from what became Confucian orthodoxy in a later age.

5. Poetical Works

Before being summoned to court, Yang Xiong wrote a number of poetic pieces of which only one – Fan sao (Refuting Sorrow) – survives. As Yang Xiong explains in his autobiography, Fan sao was written in response to Li sao (Encountering Sorrow), a poem by the legendary Warring States poet Qu Yuan (340-278 B.C.E.). According to the Shiji (Historical Records) account, Qu Yuan served as a trusted official to King Huai of Chu, but, after he was slandered by a jealous minister, he fell from favor and was exiled. Qu Yuan desperately wished to return to the service of King Huai, but in the end he gave up hope and, after composing Li sao, drowned himself.

While Yang Xiong’s Fan sao is similar in style to Qu Yuan’s Li sao, its outlook is very different. Qu Yuan saw suicide as the only option left to persons of character living in a corrupt age. Yang Xiong, on the other hand, compares Qu Yuan’s response to failure in the political sphere with the response of Confucius. Unlike Qu Yuan, Confucius’s disappointments in searching for rulers who would employ him in “making good government” did not stop him from living a full life of travel, teaching, and writing. Here and in his later philosophical works, we find Yang Xiong maintaining that success and failure do not come down to individual effort but have much to do with the times and circumstances in which one lives. If one does not meet one’s proper time for acting, then one should retire or withdraw and like a snake or dragon lie submerged or like a phoenix remain concealed and wait for more advantageous times.

While at court, Yang Xiong composed a number of primarily autobiographical poetic pieces where he reflects on his poverty, lowly position, lack of recognition, and the ridicule and difficulties these frustrations have engendered. In Jie chao (Dissolving Ridicule), for example, Yang Xiong portrays himself as ridiculed for his low position and his failure to influence the court. In responding, Yang Xiong reiterates a familiar theme in his writings, arguing that in an age beset with chaos, it is better to remain silent and unknown since, as David R. Knechtges translates it, “those who grab for power die, and those who remain silent survive; those who reach the highest positions endanger their family, while those who maintain themselves intact survive.” In Zhu bin (Expelling Poverty), Yang Xiong expels an unwelcome guest named “Poverty” whose lingering presence in the poet’s life has labored his body and afflicted his health, cut him off from friends, and slowed his promotion in office. After listening to Yang Xiong vent, Poverty humbly agrees to leave, but first reminds Yang Xiong of the virtue of the impoverished sage Shun, warns him of the greed of the tyrants Jie and Zhi, and offers the consolation that it is only because of his privation that the poet is able to bear heat and cold, and to live freely with equanimity. Enlightened, Yang Xiong apologizes to Poverty and welcomes him as an honored guest.

Yang Xiong wrote several pieces in a genre known as fu, a term translated by Knechtges as “rhapsody.” Marked by its florid imagery and ecstatic tone, this genre was commonly employed by Han court officials as a means of offering indirect criticism and admonition to the Emperor. As Knechtges points out, most of the well known early writers of rhapsodies, such as Lu Jia (228-140 B.C.E.) and Jia Yi (200-168 B.C.E.), were not only poets but also scholar-officials who saw it as their duty to offer advice and remonstrance (jian) to rulers and did so through their poetic works. In the rhapsodies of later Former Han writers like Sima Xiangru, however, verbal decoration and entertainment took precedence over instruction and admonition.

In his early years at the court of Emperor Cheng, Yang Xiong submitted a number of rhapsodies. At first glance, these works appear to be little more than ornate, fanciful, and flattering descriptions of Imperial spectacles. In Fa yan (Words to Live By) and in the autobiographical section of his biography, however, Yang Xiong stresses that, like earlier poets, he envisioned the primary purpose of these works to be remonstrance – a dangerous political task widely recognized as one of the most central duties of the Confucian scholar. While, on the surface, Yang Xiong’s rhapsodies heap lavish praise on the Emperor, they also contain stern reprimands and warnings. For example, within the fanciful descriptions of Imperial grandeur found in the Ganquan fu (Sweet Springs Rhapsody), Yang Xiong indirectly admonishes Emperor Cheng to be more solemn in conducting affairs, suggesting through allusion that, like the lascivious tyrant kings Jie and Xia, Emperor Cheng’s wanton conduct would lead to his downfall. In the Jiaolie fu (Barricade Hunt Rhapsody) and the Changyang fu (Changyang Palace Rhapsody), both of which commemorate imperial hunts, Yang Xiong indirectly criticizes the hunts as lavish, wasteful spectacles that burden the peasants and destroy their farms and farmlands. In his later writings, Yang Xiong claims that he eventually came to see the ornate style of rhapsody as excessive, and, realizing that the moral admonitions he tried to provide had gone unheeded (if not unnoticed), he renounced it. He never gave up writing poetry altogether, however.

6. References and Further Reading

There are very few published studies of Yang Xiong in English. Of these, Nylan’s pioneering translation and commentary of the Tai Xuan (1993) is the most complete account of Yang Xiong’s philosophy, while Knechtges’s studies of Yang Xiong’s fu poetry (1976, 1977) and his Qian Han Shu biography (1982) offer superb translations and interpretations of Yang Xiong’s life and literary works. Colvin (2001) provides a translation of the Fa yan and an examination of the seemingly haphazard organization of its aphorisms and dialogues. For a fuller understanding of Yang Xiong’s thought, readers are encouraged to explore the more general accounts of the literary, intellectual, and political contexts of the Former Han dynasty in Bielenstein (1984), Feng (1953), Loewe (1974, 1986), Thomsen (1988), Xiao (1979), and Yu (1967).

  • Bielenstein, Hans. “Han Portents and Prognostications.” Museum of Far Eastern Antiquities 56 (1984): 97-112.
  • Chan, Wing-tsit. “Taoistic Confucianism: Yang Hsiung.” In A Source Book in Chinese Philosophy, ed. Wing-tsit Chan (Princeton: Princeton University Press, 1963), 289-291.
  • Colvin, Andrew. Patterns of Coherence in Yang Xiong’s Fa Yan. Ph.D. dissertation, University of Hawaii at Manoa, 2001.
  • Doeringer, Franklin M. Yang Xiong and his Formulation of a Classicism. Ph.D. dissertation, Columbia University, 1971.
  • Feng, Yulan. A History of Chinese Philosophy, Vol. 2: The Period of Classical Learning. Trans. Derk Bodde. Princeton: Princeton University Press, 1953.
  • Knechtges, David R. The Han Rhapsody: A Study of the Fu of Yang Xiong (53 B.C. – A.D. 18). Cambridge: Cambridge University Press, 1976.
  • Knechtges, David R. “Uncovering the Sauce Jar: A Literary Interpretation of Yang Hsiung’s Chu ch’in mei Hsin.” In Ancient China: Studies in Early Civilization, eds. David T. Roy et al (Hong Kong: Chinese University Press, 1977), 229-252.
  • Knechtges, David R. “The Liu Hsin/Yang Hsiung Correspondence on the Fang Yen.” Monumenta Serica 33 (1977): 309-325.
  • Knechtges, David R. The Han Shu Biography of Yang Xiong (53 B.C. to A.D. 18). Tempe: Arizona State University Press, 1982.
  • Loewe, Michael. Crisis and Conflict in Han China 104 B.C. to A.D. 9. London: George Allen and Unwin, 1974.
  • Nylan, Michael. The Canon of Supreme Mystery by Yang Xiong: A Translation with Commentary of the T’ai Hsüan Ching. Albany: State University of New York Press, 1993.
  • Nylan, Michael. “Han Classicists Writing in Dialogue about their Own Tradition.” Philosophy East & West 47/2 (1996): 133-188.
  • Thomsen, Rudi. Ambition and Confucianism: A Biography of Wang Mang. Aarhus: Aarhus University Press, 1988.
  • Twitchett, Denis, and Michael Loewe, eds. The Cambridge History of China, Vol. 1: The Ch’in and Han Empires, 221 B.C. – A.D. 220. Cambridge: Cambridge University Press, 1986.
  • Xiao, Gongjun. A History of Chinese Political Thought, Vol. 1: From the Beginnings to the Sixth Century A.D. Trans. F.W. Mote. Princeton: Princeton University Press, 1979.
  • Yu, Yingshi. Trade and Expansion in Han China. Berkeley: University of California Press, 1967.

Author Information

Andrew Colvin
Email: andrew.colvin@sru.edu
Slippery Rock University
U. S. A.

Gettier Problems

Gettier problems or cases are named in honor of the American philosopher Edmund Gettier, who discovered them in 1963. They function as challenges to the philosophical tradition of defining knowledge of a proposition as justified true belief in that proposition. The problems are actual or possible situations in which someone has a belief that is both true and well supported by evidence, yet which — according to almost all epistemologists — fails to be knowledge. Gettier’s original article had a dramatic impact, as epistemologists began trying to ascertain afresh what knowledge is, with almost all agreeing that Gettier had refuted the traditional definition of knowledge. They have made many attempts to repair or replace that traditional definition of knowledge, resulting in several new conceptions of knowledge and of justificatory support. In this respect, Gettier sparked a period of pronounced epistemological energy and innovation — all with a single two-and-a-half page article. There is no consensus, however, that any one of the attempts to solve the Gettier challenge has succeeded in fully defining what it is to have knowledge of a truth or fact. So, the force of that challenge continues to be felt in various ways, and to various extents, within epistemology. Sometimes, the challenge is ignored in frustration at the existence of so many possibly failed efforts to solve it. Often, the assumption is made that somehow it can — and will, one of these days — be solved. Usually, it is agreed to show something about knowledge, even if not all epistemologists concur as to exactly what it shows.

Table of Contents

  1. Introduction
  2. The Justified-True-Belief Analysis of Knowledge
  3. Gettier’s Original Challenge
  4. Some other Gettier Cases
  5. The Basic Structure of Gettier Cases
  6. The Generality of Gettier Cases
  7. Attempted Solutions: Infallibility
  8. Attempted Solutions: Eliminating Luck
  9. Attempted Solutions: Eliminating False Evidence
  10. Attempted Solutions: Eliminating Defeat
  11. Attempted Solutions: Eliminating Inappropriate Causality
  12. Attempted Dissolutions: Competing Intuitions
  13. Attempted Dissolutions: Knowing Luckily
  14. Gettier Cases and Analytic Epistemology
  15. References and Further Reading

1. Introduction

Gettier problems or cases arose as a challenge to our understanding of the nature of knowledge. Initially, that challenge appeared in an article by Edmund Gettier, published in 1963. But his article had a striking impact among epistemologists, so much so that hundreds of subsequent articles and sections of books have generalized Gettier’s original idea into a more wide-ranging concept of a Gettier case or problem, where instances of this concept might differ in many ways from Gettier’s own cases. Philosophers swiftly became adept at thinking of variations on Gettier’s own particular cases; and, over the years, this fecundity has been taken to render his challenge even more significant. This is especially so, given that there has been no general agreement on how to solve the challenge posed by Gettier cases as a group — Gettier’s own ones or those that other epistemologists have observed or imagined. (Note that sometimes this general challenge is called the Gettier problem.) What, then, is the nature of knowledge? And can we rigorously define what it is to know? Gettier’s article gave to these questions a precision and urgency that they had formerly lacked. The questions are still being debated — more or less fervently at different times — within post-Gettier epistemology.

2. The Justified-True-Belief Analysis of Knowledge

Gettier cases are meant to challenge our understanding of propositional knowledge. This is knowledge which is described by phrases of the form “knowledge that p,” with “p” being replaced by some indicative sentence (such as “Kangaroos have no wings”). It is knowledge of a truth or fact — knowledge of how the world is in whatever respect is being described by a given occurrence of “p”. Usually, when epistemologists talk simply of knowledge they are referring to propositional knowledge. It is a kind of knowledge which we attribute to ourselves routinely and fundamentally.

Hence, it is philosophically important to ask what, more fully, such knowledge is. If we do not fully understand what it is, will we not fully understand ourselves either? That is a possibility, as philosophers have long realized. Those questions are ancient ones; in his own way, Plato asked them.

And, prior to Gettier’s challenge, different epistemologists would routinely have offered in reply some more or less detailed and precise version of the following generic three-part analysis of what it is for a person to have knowledge that p (for any particular “p”):

  1. Belief. The person believes that p. This belief might be more or less confident. And it might — but it need not — be manifested in the person’s speech, such as by her saying that p or by her saying that she believes that p. All that is needed, strictly speaking, is for her belief to exist (while possessing at least the two further properties that are about to be listed).
  2. Truth. The person’s belief that p needs to be true. If it is incorrect instead, then — no matter what else is good or useful about it — it is not knowledge. It would only be something else, something lesser. Admittedly, even when a belief is mistaken it can feel to the believer as if it is true. But in that circumstance the feeling would be mistaken; and so the belief would not be knowledge, no matter how much it might feel to the believer like knowledge.
  3. Justification. The person’s belief that p needs to be well supported, such as by being based upon some good evidence or reasoning, or perhaps some other kind of rational justification. Otherwise, the belief, even if it is true, may as well be a lucky guess. It would be correct without being knowledge. It would only be something else, something lesser.

Supposedly (on standard pre-Gettier epistemology), each of those three conditions needs to be satisfied, if there is to be knowledge; and, equally, if all are satisfied together, the result is an instance of knowledge. In other words, the analysis presents what it regards as being three individually necessary, and jointly sufficient, kinds of condition for having an instance of knowledge that p.
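The logical shape of this analysis can be set out schematically. What follows is only a sketch: the abbreviations K(S,p), B(S,p), and J(S,p), for "S knows that p," "S believes that p," and "S is justified in believing that p," are assumed here for illustration and are not part of the traditional presentation.

```latex
% Assumed abbreviations (illustrative only):
%   K(S,p): S knows that p
%   B(S,p): S believes that p
%   J(S,p): S is justified in believing that p
\begin{align*}
\text{JTB:}\quad & K(S,p) \iff B(S,p) \land p \land J(S,p) \\
\text{Individually necessary:}\quad & K(S,p) \implies B(S,p), \qquad K(S,p) \implies p, \qquad K(S,p) \implies J(S,p) \\
\text{Jointly sufficient:}\quad & B(S,p) \land p \land J(S,p) \implies K(S,p)
\end{align*}
```

Read this way, the analysis is a biconditional: the left-to-right direction carries the three necessity claims, and the right-to-left direction carries the joint-sufficiency claim.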

The analysis is generally called the justified-true-belief form of analysis of knowledge (or, for short, JTB). For instance, your knowing that you are a person would be your believing (as you do) that you are one, along with this belief’s being true (as it is) and its resting (as it does) upon much good evidence. That evidence will probably include such matters as your having been told that you are a person, your having reflected upon what it is to be a person, your seeing relevant similarities between yourself and other persons, and so on.

It is important to bear in mind that JTB, as presented here, is a generic analysis. It is intended to describe a general structuring which can absorb or generate comparatively specific analyses that might be suggested, either of all knowledge at once or of particular kinds of knowledge. It provides a basic outline — a form — of a theory. In practice, epistemologists would suggest further details, while respecting that general form. So, even when particular analyses suggested by particular philosophers at first glance seem different to JTB, these analyses can simply be more specific instances or versions of that more general form of theory.

Probably the most common way for this to occur involves the specific analyses incorporating, in turn, further analyses of some or all of belief, truth, and justification. For example, some of the later sections in this article may be interpreted as discussing attempts to understand justification more precisely, along with how it functions as part of knowledge. In general, the goal of such attempts can be that of ascertaining aspects of knowledge’s microstructure, thereby rendering the general theory JTB as precise and full as it needs to be in order genuinely to constitute an understanding of particular instances of knowing and of not knowing. Steps in that direction by various epistemologists have tended to be more detailed and complicated after Gettier’s 1963 challenge than had previously been the case. Roderick Chisholm (1966/1977/1989) was an influential exemplar of the post-1963 tendency; A. J. Ayer (1956) famously exemplified the pre-1963 approach.

3. Gettier’s Original Challenge

Gettier’s article described two possible situations. This section presents his Case I. (It is perhaps the more widely discussed of the two. The second will be mentioned in the next section.) Subsequent sections will use this Case I of Gettier’s as a focal point for analysis.

The case’s protagonist is Smith. He and Jones have applied for a particular job. But Smith has been told by the company president that Jones will win the job. Smith combines that testimony with his observational evidence of there being ten coins in Jones’s pocket. (He had counted them himself — an odd but imaginable circumstance.) And he proceeds to infer that whoever will get the job has ten coins in their pocket. (As the present article proceeds, we will refer to this belief several times more. For convenience, therefore, let us call it belief b.) Notice that Smith is not thereby guessing. On the contrary; his belief b enjoys a reasonable amount of justificatory support. There is the company president’s testimony; there is Smith’s observation of the coins in Jones’s pocket; and there is Smith’s proceeding to infer belief b carefully and sensibly from that other evidence. Belief b is thereby at least fairly well justified — supported by evidence which is good in a reasonably normal way. As it happens, too, belief b is true — although not in the way in which Smith was expecting it to be true. For it is Smith who will get the job, and Smith himself has ten coins in his pocket. These two facts combine to make his belief b true. Nevertheless, neither of those facts is something that, on its own, was known by Smith. Is his belief b therefore not knowledge? In other words, does Smith fail to know that the person who will get the job has ten coins in his pocket? Surely so (thought Gettier).

That is Gettier’s Case I, as it was interpreted by him, and as it has subsequently been regarded by almost all other epistemologists. The immediately pertinent aspects of it are standardly claimed to be as follows. It contains a belief which is true and justified — but which is not knowledge. And if that is an accurate reading of the case, then JTB is false. Case I would show that it is possible for a belief to be true and justified without being knowledge. Case I would have established that the combination of truth, belief, and justification does not entail the presence of knowledge. In that sense, a belief’s being true and justified would not be sufficient for its being knowledge.
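The inference in this paragraph can be put schematically. The sketch below assumes some notation that is not in Gettier's article: JTB(b) abbreviates "belief b is true and justified," K(b) abbreviates "belief b is knowledge," and the diamond and box stand for possibility and necessity (understanding "entails" as necessary implication).

```latex
% Assumed notation (illustrative only; \Diamond and \Box need the amssymb package):
%   JTB(b): belief b is held, true, and justified
%   K(b):   belief b is knowledge
\begin{align*}
\text{Case I, as standardly read:}\quad & \Diamond\,\bigl(\mathrm{JTB}(b) \land \lnot K(b)\bigr) \\
\text{Hence sufficiency fails:}\quad & \lnot\,\Box\,\bigl(\mathrm{JTB}(b) \implies K(b)\bigr) \\
\text{Not challenged by Case I:}\quad & \Box\,\bigl(K(b) \implies \mathrm{JTB}(b)\bigr)
\end{align*}
```

On this reading, a single possible case suffices to defeat the sufficiency half of the analysis, while the claim that truth, belief, and justification are each necessary for knowledge is left standing.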

But if JTB is false as it stands, with what should it be replaced? (Gettier himself made no suggestions about this.) Its failing to describe a jointly sufficient condition of knowing does not entail that the three conditions it does describe are not individually necessary to knowing. And if each of truth, belief, and justification is needed, then what aspect of knowledge is still missing? What feature of Case I prevents Smith’s belief b from being knowledge? What is the smallest imaginable alteration to the case that would allow belief b to become knowledge? Would we need to add some wholly new kind of element to the situation? Or is JTB false only because it is too general — too unspecific? For instance, are only some kinds of justification both needed and enough, if a true belief is to become knowledge? Must we describe more specifically how justification ever makes a true belief knowledge? Is Smith’s belief b justified in the wrong way, if it is to be knowledge?

4. Some other Gettier Cases

Having posed those questions, though, we should realize that they are merely representative of a more general epistemological line of inquiry. The epistemological challenge is not just to discover the minimal repair that we could make to Gettier’s Case I, say, so that knowledge would then be present. Rather, it is to find a failing — a reason for a lack of knowledge — that is common to all Gettier cases that have been, or could be, thought of (that is, all actual or possible cases relevantly like Gettier’s own ones). Only thus will we be understanding knowledge in general — all instances of knowledge, everyone’s knowledge. And this is our goal when responding to Gettier cases.

Sections 7 through 11 will present some attempted diagnoses of such cases. In order to evaluate them, therefore, it would be advantageous to have some sense of the apparent potential range of the concept of a Gettier case. I will mention four notable cases.

The lucky disjunction (Gettier’s second case: 1963). Again, Smith is the protagonist. This time, he possesses good evidence in favor of the proposition that Jones owns a Ford. Smith also has a friend, Brown. Where is Brown to be found at the moment? Smith does not know. Nonetheless, on the basis of his accepting that Jones owns a Ford, he infers — and accepts — each of these three disjunctive propositions:

  • Either Jones owns a Ford, or Brown is in Boston.
  • Either Jones owns a Ford, or Brown is in Barcelona.
  • Either Jones owns a Ford, or Brown is in Brest-Litovsk.

No insight into Brown’s location guides Smith in any of this reasoning. He realizes that he has good evidence for the first disjunct (regarding Jones) in each of those three disjunctions, and he sees this evidence as thereby supporting each disjunction as a whole. Seemingly, he is right about that. (These are inclusive disjunctions, not exclusive. That is, each can, if need be, accommodate the truth of both of its disjuncts. Each is true if even one — let alone both — of its disjuncts is true.) Moreover, in fact one of the three disjunctions is true (albeit in a way that would surprise Smith if he were to be told of how it is true). The second disjunction is true because, as good luck would have it, Brown is in Barcelona — even though, as bad luck would have it, Jones does not own a Ford. (As it happened, the evidence for his doing so, although good, was misleading.) Accordingly, Smith’s belief that either Jones owns a Ford or Brown is in Barcelona is true. And there is good evidence supporting — justifying — it. But is it knowledge?
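The point about inclusive disjunction can be checked mechanically. The following is a minimal sketch in Python; the variable and function names are hypothetical, and the two facts encoded are simply those stipulated in the case (Jones does not own a Ford, and Brown is in Barcelona).

```python
# Facts stipulated in the case (both unknown to Smith).
jones_owns_ford = False        # Smith's evidence for this was good but misleading
brown_location = "Barcelona"   # Smith has no idea where Brown is

cities = ["Boston", "Barcelona", "Brest-Litovsk"]

def disjunction_is_true(city):
    """Truth value of 'Either Jones owns a Ford, or Brown is in <city>'.

    Python's `or` is inclusive: the result is True if at least one
    disjunct is True, including when both are.
    """
    return jones_owns_ford or (brown_location == city)

# Evaluate each of Smith's three disjunctive beliefs against the facts.
results = {city: disjunction_is_true(city) for city in cities}

for city, value in results.items():
    print(f"Jones owns a Ford, or Brown is in {city}: {value}")
```

Only the Barcelona disjunction comes out true, and it does so through its second disjunct alone, which is the disjunct for which Smith has no evidence at all.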

The sheep in the field (Chisholm 1966/1977/1989). Imagine that you are standing outside a field. You see, within it, what looks exactly like a sheep. What belief instantly occurs to you? Among the many that could have done so, it happens to be the belief that there is a sheep in the field. And in fact you are right, because there is a sheep behind the hill in the middle of the field. You cannot see that sheep, though, and you have no direct evidence of its existence. Moreover, what you are seeing is a dog, disguised as a sheep. Hence, you have a well justified true belief that there is a sheep in the field. But is that belief knowledge?

The pyromaniac (Skyrms 1967). A pyromaniac reaches eagerly for his box of Sure-Fire matches. He has excellent evidence of the past reliability of such matches, as well as of the present conditions — the clear air and dry matches — being as they should be, if his aim of lighting one of the matches is to be satisfied. He thus has good justification for believing, of the particular match he proceeds to pluck from the box, that it will light. This is what occurs, too: the match does light. However, what the pyromaniac did not realize is that there were impurities in this specific match, and that it would not have lit if not for the sudden (and rare) jolt of Q-radiation it receives exactly when he is striking it. His belief is therefore true and well justified. But is it knowledge?

The fake barns (Goldman 1976). Henry is driving in the countryside, looking at objects in fields. He sees what looks exactly like a barn. Accordingly, he thinks that he is seeing a barn. Now, that is indeed what he is doing. But what he does not realize is that the neighborhood contains many fake barns — mere barn facades that look like real barns when viewed from the road. And if he had been looking at one of them, he would have been deceived into believing that he was seeing a barn. Luckily, he was not doing this. Consequently, his belief is justified and true. But is it knowledge?

In none of those cases (or relevantly similar ones), say almost all epistemologists, is the belief in question knowledge. (Note that some epistemologists do not regard the fake barns case as being a genuine Gettier case. There is a touch of vagueness in the concept of a Gettier case.)

5. The Basic Structure of Gettier Cases

Although the multitude of actual and possible Gettier cases differ in their details, some characteristics unite them. For a start, each Gettier case contains a belief which is true and well justified without — according to epistemologists as a whole — being knowledge. The following two generic features also help to constitute Gettier cases:

  1. Fallibility. The justification that is present within each case is fallible. Although it provides good support for the truth of the belief in question, that support is not perfect, strictly speaking. This means that the justification leaves open at least the possibility of the belief’s being false. The justification indicates strongly that the belief is true — without proving conclusively that it is.
  2. Luck. What is most distinctive of Gettier cases is the luck they contain. Within any Gettier case, in fact the well-but-fallibly justified belief in question is true. Nevertheless, there is significant luck in how the belief manages to combine being true with being justified. Some abnormal or odd circumstance is present in the case, a circumstance which makes the existence of that justified and true belief quite fortuitous.

Here is how those two features, (1) and (2), are instantiated in Gettier’s Case I. Smith’s evidence for his belief b was good but fallible. This left open the possibility of belief b being mistaken, even given that supporting evidence. As it happened, that possibility was not realized: Smith’s belief b was actually true. Yet this was due to the intervention of some good luck. Belief b could easily have been false; it was made true only by circumstances which were hidden from Smith. That is, belief b was in fact made true by circumstances (namely, Smith’s getting the job and there being ten coins in his pocket) other than those which Smith had noticed and which his evidence indicated as being a good enough reason for holding b to be true. What Smith thought were the circumstances (concerning Jones) making his belief b true were nothing of the sort. Luckily, though, some facts of which he had no inkling were making his belief true.

Similar remarks pertain to the sheep-in-the-field case. Within it, your sensory evidence is good. You rely on your senses, taking for granted — as one normally would — that the situation is normal. Then, by standard reasoning, you gain a true belief (that there is a sheep in the field) on the basis of that fallible-but-good evidence. Nonetheless, wherever there is fallibility there is a chance of being mistaken — of gaining a belief which is false. And that is exactly what would have occurred in this case (given that you are actually looking at a disguised dog) — if not, luckily, for the presence behind the hill of the hidden real sheep. Only luckily, therefore, is your belief both justified and true. And because of that luck (say epistemologists in general), the belief fails to be knowledge.

6. The Generality of Gettier Cases

JTB says that any actual or possible case of knowledge that p is an actual or possible instance of some kind of well justified true belief that p — and that any actual or possible instance of some kind of well justified true belief that p is an actual or possible instance of knowledge that p. Hence, JTB is false if there is even one actual or possible Gettier situation (in which some justified true belief fails to be knowledge). Accordingly, since 1963 epistemologists have tried — again and again and again — to revise or repair or replace JTB in response to Gettier cases. The main aim has been to modify JTB so as to gain a ‘Gettier-proof’ definition of knowledge.

How extensive would such repairs need to be? After all, even if some justified true beliefs arise within Gettier situations, not all do so. In practice, such situations are rare, with few of our actual justified true beliefs ever being “Gettiered.” Has Gettier therefore shown only that not all justified true beliefs are knowledge? Correlatively, might JTB be almost correct as it is, in the sense of being accurate about almost all actual or possible cases of knowledge?

On the face of it, Gettier cases do indeed show only that not all actual or possible justified true beliefs are knowledge — rather than that a belief’s being justified and true is never enough for its being knowledge. Nevertheless, epistemologists generally report the impact of Gettier cases in the latter way, describing them as showing that being justified and true is never enough to make a belief knowledge. Why do epistemologists interpret the Gettier challenge in that stronger way?

The reason is that they wish — by way of some universally applicable definition or formula or analysis — to understand knowledge in all of its actual or possible instances and manifestations, not only in some of them. Hence, epistemologists strive to understand how to avoid ever being in a Gettier situation (from which knowledge will be absent, regardless of whether such situations are uncommon). But that goal is, equally, the aim of understanding what it is about most situations that constitutes their not being Gettier situations. If we do not know what, exactly, makes a situation a Gettier case and what changes to it would suffice for its no longer being a Gettier case, then we do not know how, exactly, to describe the boundary between Gettier cases and other situations.

We call various situations in which we form beliefs “everyday” or “ordinary,” for example. In particular, therefore, we might wonder whether all “normally” justified true beliefs are still instances of knowledge (even if in Gettier situations the justified true beliefs are not knowledge). Yet even that tempting idea is not as straightforward as we might have assumed. For do we know what it is, exactly, that makes a situation ordinary? Specifically, what are the details of ordinary situations that allow them not to be Gettier situations — and hence that allow them to contain knowledge? To the extent that we do not understand what it takes for a situation not to be a Gettier situation, we do not understand what it takes for a situation to be a normal one (thereby being able to contain knowledge). Understanding Gettier situations would be part of understanding non-Gettier situations — including ordinary situations. Until we adequately understand Gettier situations, we do not adequately understand ordinary situations — because we would not adequately understand the difference between these two kinds of situation.

7. Attempted Solutions: Infallibility

To the extent that we understand what makes something a Gettier case, we understand what would suffice for that situation not to be a Gettier case. Section 5 outlined two key components — fallibility and luck — of Gettier situations. In this section and the next, we will consider whether removing one of those two components — the removal of which will suffice for a situation’s no longer being a Gettier case — would solve Gettier’s epistemological challenge. That is, we will be asking whether we may come to understand the nature of knowledge by recognizing its being incompatible with the presence of at least one of those two components (fallibility and luck).

There is a prima facie case, at any rate, for regarding justificatory fallibility with concern in this setting. So, let us examine the Infallibility Proposal for solving Gettier’s challenge. There have long been philosophers who doubt (independently of encountering Gettier cases) that allowing fallible justification is all that it would take to convert a true belief into knowledge. (“If you know that p, there must have been no possibility of your being mistaken about p,” they might say.) The classic philosophical expression of that sort of doubt was by René Descartes, most famously in his Meditations on First Philosophy (1641). Contemporary epistemologists who have voiced similar doubts include Keith Lehrer (1971) and Peter Unger (1971). In the opinion of epistemologists who embrace the Infallibility Proposal, we can eliminate Gettier cases as challenges to our understanding of knowledge, simply by refusing to allow that one’s having fallible justification for a belief that p could ever adequately satisfy JTB’s justification condition. Stronger justification than that is required within knowledge (they will claim); infallibilist justificatory support is needed. (They might even say that there is no justification present at all, let alone an insufficient amount of it, given the fallibility within the cases.)

Thus, for instance, an infallibilist about knowledge might claim that because (in Case I) Smith’s justification provided only fallible support for his belief b, this justification was always leaving open the possibility of that belief being mistaken — and that this is why the belief is not knowledge. The infallibilist might also say something similar — as follows — about the sheep-in-the-field case. Because you were relying on your fallible senses in the first place, you were bound not to gain knowledge of there being a sheep in the field. (“It could never be real knowledge, given the inherent possibility of error in using one’s senses.”) And the infallibilist will regard the fake-barns case in the same way, claiming that the potential for mistake (that is, the existence of fallibility) was particularly real, due to the existence of the fake barns. And that is why (infers the infallibilist) there is a lack of knowledge within the case — as indeed there would be within any situation where fallible justification is being used.

So, that is the Infallibility Proposal. The standard epistemological objection to it is that it fails to do justice to the reality of our lives, seemingly as knowers of many aspects of the surrounding world. In our apparently “ordinary” situations, moving from one moment to another, we take ourselves to have much knowledge. Yet we rarely, if ever, possess infallible justificatory support for a belief. And we accept this about ourselves, realizing that we are not wholly — conclusively — reliable. We accept that if we are knowers, then, we are at least not infallible knowers. But the Infallibility Proposal — when combined with that acceptance of our general fallibility — would imply that we are not knowers at all. It would thereby ground a skepticism about our ever having knowledge.

Accordingly, most epistemologists would regard the Infallibility Proposal as being a drastic and mistaken reaction to Gettier’s challenge in particular. In response to Gettier, most seek to understand how we do have at least some knowledge — where such knowledge will either always or almost always be presumed to involve some fallibility. The majority of epistemologists still work towards what they hope will be a non-skeptical conception of knowledge; and attaining this outcome could well need to include their solving the Gettier challenge without adopting the Infallibility Proposal.

8. Attempted Solutions: Eliminating Luck

The other feature of Gettier cases that was highlighted in section 5 is the lucky way in which such a case’s protagonist has a belief which is both justified and true. Is it this luck that needs to be eliminated if the situation is to become one in which the belief in question is knowledge? In general, must any instance of knowledge include no accidentalness in how its combination of truth, belief, and justification is effected? The Eliminate Luck Proposal claims so.

Almost all epistemologists, when analyzing Gettier cases, reach for some version of this idea, at least in their initial or intuitive explanations of why knowledge is absent from the cases. Unger (1968) is one who has also sought to make this a fuller and more considered part of an explanation for the lack of knowledge. He says that a belief is not knowledge if it is true only courtesy of some relevant accident. That description is meant to allow for some flexibility. Even so, further care will still be needed if the Eliminate Luck Proposal is to provide real insight and understanding. After all, if we seek to eliminate all luck whatsoever from the production of the justified true belief (if knowledge is thereby to be present), then we are again endorsing a version of infallibilism (as described in section 7). If no luck is involved in the justificatory situation, the justification renders the belief’s truth wholly predictable or inescapable; in which case, the belief is being infallibly justified. And this would be a requirement which (as section 7 explained) few epistemologists will find illuminating, certainly not as a response to Gettier cases.

What many epistemologists therefore say, instead, is that the problem within Gettier cases is the presence of too much luck. Some luck is to be allowed; otherwise, we would again have reached for the Infallibility Proposal. But too large a degree of luck is not to be allowed. This is why we often find epistemologists describing Gettier cases as containing too much chance or flukiness for knowledge to be present.

Nevertheless, how helpful is that kind of description by those epistemologists? How much luck is too much? That is a conceptually vital question. Yet there has been no general agreement among epistemologists as to what degree of luck precludes knowledge. There has not even been much attempt to determine that degree. (It is no coincidence, similarly, that epistemologists in general are also yet to determine how strong — if it is allowed to be something short of infallibility — the justificatory support needs to be within any case of knowledge.) A specter of irremediable vagueness thus haunts the Eliminate Luck Proposal.

Perhaps understandably, therefore, the more detailed epistemological analyses of knowledge have focused less on delineating dangerous degrees of luck than on characterizing substantive kinds of luck that are held to drive away knowledge. Are there ways in which Gettier situations are structured, say, which amount to the presence of a kind of luck which precludes the presence of knowledge (even when there is a justified true belief)? Most attempts to solve Gettier’s challenge instantiate this form of thinking. In sections 9 through 11, we will encounter a few of the main suggestions that have been made.

9. Attempted Solutions: Eliminating False Evidence

A lot of epistemologists have been attracted to the idea that the failing within Gettier cases is the person’s including something false in her evidence. This would be a problem for her, because she is relying upon that evidence in her attempt to gain knowledge, and because knowledge is itself always true. To the extent that falsity is guiding the person’s thinking in forming the belief that p, she will be lucky to derive a belief that p which is true. And (as section 8 indicated) there are epistemologists who think that a lucky derivation of a true belief is not a way to know that truth. Let us therefore consider the No False Evidence Proposal.

In Gettier’s Case I, for example, Smith includes in his evidence the false belief that Jones will get the job. If Smith had lacked that evidence (and if nothing else were to change within the case), presumably he would not have inferred belief b. He would probably have had no belief at all as to who would get the job (because he would have had no evidence at all on the matter). If so, he would thereby not have had a justified and true belief b which failed to be knowledge. Should JTB therefore be modified so as to say that no belief is knowledge if the person’s justificatory support for it includes something false? JTB would then tell us that one’s knowing that p is one’s having a justified true belief which is well supported by evidence, none of which is false.

That is the No False Evidence Proposal. But epistemologists have noticed a few possible problems with it.

First, as Richard Feldman (1974) saw, there seem to be some Gettier cases in which no false evidence is used. Imagine that (contrary to Gettier’s own version of Case I) Smith does not believe, falsely, “Jones will get the job.” Imagine instead that he believes, “The company president told me that Jones will get the job.” (He could have continued to form the first belief. But suppose that, as it happens, he does not form it.) This alternative belief would be true. It would also provide belief b with as much justification as the false belief provided. So, if all else is held constant within the case (with belief b still being formed), again Smith has a true belief which is well-although-fallibly justified, yet which might well not be knowledge.

Second, it will be difficult for the No False Evidence Proposal not to imply an unwelcome skepticism. Quite possibly, there is always some false evidence being relied upon, at least implicitly, as we form beliefs. Is there nothing false at all — not even a single falsity — in your thinking, as you move through the world, enlarging your stock of beliefs in various ways (not all of which ways are completely reliable and clearly under your control)? If there is even some falsity among the beliefs you use, but if you do not wholly remove it or if you do not isolate it from the other beliefs you are using, then — on the No False Evidence Proposal — there is a danger of its preventing those other beliefs from ever being knowledge. This is a worry to be taken seriously, if a belief’s being knowledge is to depend upon the total absence of falsity from one’s thinking in support of that belief.

Unsurprisingly, therefore, some epistemologists, such as Lehrer (1965), have proposed a further modification of JTB — a less demanding one. They have suggested that what is needed for knowing that p is an absence only of significant and ineliminable (non-isolable) falsehoods from one’s evidence for p’s being true. Here is what that means. First, false beliefs which you are — but need not have been — using as evidence for p are eliminable from your evidence for p. And, second, false beliefs whose absence would seriously weaken your evidence for p are significant within your evidence for p. Accordingly, the No False Evidence Proposal now becomes the No False Core Evidence Proposal. The latter proposal says that if the only falsehoods in your evidence for p are ones which you could discard, and ones whose absence would not seriously weaken your evidence for p, then (with all else being equal) your justification is adequate for giving you knowledge that p. The accompanying application of that proposal to Gettier cases would claim that because, within each such case, some falsehood plays an important role in the protagonist’s evidence, her justified true belief based on that evidence fails to be knowledge. On the modified proposal, this would be the reason for the lack of that knowledge.

One fundamental problem confronting that proposal is obviously its potential vagueness. To what extent, precisely, need you be able to eliminate the false evidence in question if knowledge that p is to be present? How easy, exactly, must this be for you? And just how weakened, exactly, may your evidence for p become — courtesy of the elimination of false elements within it — before it is too weak to be part of making your belief that p knowledge? Such questions still await answers from epistemologists.

10. Attempted Solutions: Eliminating Defeat

Section 9 explored the suggestion that the failing within any Gettier case is a matter of what is included within a given person’s evidence: specifically, some core falsehood is accepted within her evidence. A converse idea has also received epistemological attention, namely the thought that the failing within any Gettier case is a matter of what is not included in the person’s evidence: specifically, some notable truth or fact is absent from her evidence. This proposal would not simply be that the evidence overlooks at least one fact or truth. Like the unmodified No False Evidence Proposal (with which section 9 began), that would be far too demanding, undoubtedly leading to skepticism. Because there are always some facts or truths not noticed by anyone’s evidence for a particular belief, no belief would ever count as knowledge on that unrestricted version. No one’s evidence for p would ever be good enough to satisfy the justification requirement that is generally held to be necessary to a belief that p’s being knowledge.

Epistemologists therefore restrict the proposal, turning it into what is often called a defeasibility analysis of knowledge. It can also be termed the No Defeat Proposal. The thought behind it is that JTB should be modified so as to say that what is needed in knowing that p is an absence from the inquirer’s context of any defeaters of her evidence for p. And what is a defeater? A particular fact or truth t defeats a body of justification j (as support for a belief that p) if adding t to j, thereby producing a new body of justification j*, would seriously weaken the justificatory support being provided for that belief that p — so much so that j* does not provide strong enough support to make even the true belief that p knowledge. This means that t is relevant to justifying p (because otherwise adding it to j would produce neither a weakened nor a strengthened j*) as support for p — but damagingly so. In effect, insofar as one wishes to have beliefs which are knowledge, one should only have beliefs which are supported by evidence that is not overlooking any facts or truths which — if left overlooked — function as defeaters of whatever support is being provided by that evidence for those beliefs.

In Case I, for instance, we might think that the reason why Smith’s belief b fails to be knowledge is that his evidence includes no awareness of the facts that he will get the job himself and that his own pocket contains ten coins. Thus, imagine a variation on Gettier’s case, in which Smith’s evidence does include a recognition of these facts about himself. Then either (i) he would have conflicting evidence (by having this evidence supporting his, plus the original evidence supporting Jones’s, being about to get the job), or (ii) he would not have conflicting evidence (if his original evidence about Jones had been discarded, leaving him with only the evidence about himself). But in either of those circumstances Smith would be justified in having belief b — concerning “the person,” whoever it would be, who will get the job. Moreover, in that circumstance he would not obviously be in a Gettier situation — with his belief b still failing to be knowledge. For, on either (i) or (ii), there would be no defeaters of his evidence — no facts which are being overlooked by his evidence, and which would seriously weaken his evidence if he were not overlooking them.

Unfortunately, however, this proposal — like the No False Core Evidence Proposal in section 9 — faces a fundamental problem of vagueness. As we have seen, defeaters defeat by weakening justification: as more and stronger defeaters are being overlooked by a particular body of evidence, that evidence is correlatively weakened. (This is so, even when the defeaters clash directly with one’s belief that p. And it is so, regardless of the believer’s not realizing that the evidence is thereby weakened.) How weak, exactly, can the justification for a belief that p become before it is too weak to sustain the belief’s being knowledge that p? This question — which, in one form or another, arises for all proposals which allow knowledge’s justificatory component to be satisfied by fallible justificatory support — is yet to be answered by epistemologists as a group. In the particular instance of the No Defeat Proposal, it is the question, raised by epistemologists such as William Lycan (1977) and Lehrer and Paxson (1969), of how much — and which aspects — of one’s environment need to be noticed by one’s evidence, if that evidence is to be justification that makes one’s belief that p knowledge. There can be much complexity in one’s environment, with it not always being clear where to draw the line between aspects of the environment which do — and those which do not — need to be noticed by one’s evidence. How strict should we be in what we expect of people in this respect?

11. Attempted Solutions: Eliminating Inappropriate Causality

It has also been suggested that the failing within Gettier situations is one of causality, with the justified true belief being caused — generated, brought about — in too odd or abnormal a way for it to be knowledge. This Appropriate Causality Proposal — initially advocated by Alvin Goldman (1967) — will ask us to consider, by way of contrast, any case of observational knowledge. Seemingly, a necessary part of such knowledge’s being produced is a stable and normal causal pattern’s generating the belief in question. You use your eyes in a standard way, for example. A belief might then form in a standard way, reporting what you observed. That belief will be justified in a standard way, too, partly by that use of your eyes. And it will be true in a standard way, reporting how the world actually is in a specific respect. All of this reflects the causal stability of normal visually-based belief-forming processes. In particular, we realize that the object of the knowledge — that perceived aspect of the world which most immediately makes the belief true — is playing an appropriate role in bringing the belief into existence.

Within Gettier’s Case I, however, that pattern of normality is absent. The aspects of the world which make Smith’s belief b true are the facts of his getting the job and of there being ten coins in his own pocket. But these do not help to cause the existence of belief b. (That belief is caused by Smith’s awareness of other facts — his conversation with the company president and his observation of the contents of Jones’s pocket.) Should JTB be modified accordingly, so as to tell us that a justified true belief is knowledge only if those aspects of the world which make it true are appropriately involved in causing it to exist?

Epistemologists have noticed problems with that Appropriate Causality Proposal, though.

First, some objects of knowledge might be aspects of the world which are unable ever to have causal influences. In knowing that 2 + 2 = 4 (this being a prima facie instance of what epistemologists term a priori knowledge), you know a truth — perhaps a fact — about numbers. And do they have causal effects? Most epistemologists do not believe so. (Maybe instances of numerals, such as marks on paper being interpreted on particular occasions in specific minds, can have causal effects. Yet — it is usually said — such numerals are merely representations of numbers. They are not the actual numbers.) Consequently, it is quite possible that the scope of the Appropriate Causality Proposal is more restricted than is epistemologically desirable. The proposal would apply only to empirical or a posteriori knowledge, knowledge of the observable world — which is to say that it might not apply to all of the knowledge that is actually or possibly available to people. And (as section 6 explained) epistemologists seek to understand all actual or possible knowledge, not just some of it.

Second, to what extent will the Appropriate Causality Proposal help us to understand even empirical knowledge? The problem is that epistemologists have not agreed on any formula for exactly how (if there is to be knowledge that p) the fact that p is to contribute to bringing about the existence of the justified true belief that p. Inevitably (and especially when reasoning is involved), there will be indirectness in the causal process resulting in the formation of the belief that p. But how much indirectness is too much? That is, are there degrees of indirectness that are incompatible with there being knowledge that p? And if so, how are we to specify those critical degrees?

For example, suppose that (in an altered Case I of which we might conceive) Smith’s being about to be offered the job is actually part of the causal explanation of why the company president told him that Jones would get the job. The president, with his mischievous sense of humor, wished to mislead Smith. And suppose that Smith’s having ten coins in his pocket made a jingling noise, subtly putting him in mind of coins in pockets, subsequently leading him to discover how many coins were in Jones’s pocket. Given all of this, the facts which make belief b true (namely, those ones concerning Smith’s getting the job and concerning the presence of the ten coins in his pocket) will actually have been involved in the causal process that brings belief b into existence. Would the Appropriate Causality Proposal thereby be satisfied — so that (in this altered Case I) belief b would now be knowledge? Or should we continue regarding the situation as being a Gettier case, a situation in which (as in the original Case I) the belief b fails to be knowledge? If we say that the situation remains a Gettier case, we need to explain why this new causal ancestry for belief b would still be too inappropriate to allow belief b to be knowledge.

Most epistemologists will regard the altered case as a Gettier case. But in that event they continue to owe us an analysis of what makes a given causal history inappropriate. Often, they talk of deviant causal chains. And that is an evocative phrase. But how clear is it? Once more, we will wonder about vagueness. In particular, we will ask, how deviant can a causal chain (one that results in some belief-formation) become before it is too deviant to be able to be bringing knowledge into existence? As we also found in sections 9 and 10, a conceptually deep problem of vagueness thus remains to be solved.

12. Attempted Dissolutions: Competing Intuitions

Sections 9 through 11 described some of the main proposals that epistemologists have made for solving the Gettier challenge directly. Those proposals accept the usual interpretation of each Gettier case as containing a justified true belief which fails to be knowledge. Each proposal then attempts to modify JTB, the traditional epistemological suggestion for what it is to know that p. What is sought by those proposals, therefore, is an analysis of knowledge which accords with the usual interpretation of Gettier cases. That analysis would be intended to cohere with the claim that knowledge is not present within Gettier cases. And why is it so important to cohere with the latter claim? The standard answer offered by epistemologists points to what they believe is their strong intuition that, within any Gettier case, knowledge is absent. Almost all epistemologists claim to have this intuition about Gettier cases. They treat this intuition with much respect. (It seems that most do so as part of a more general methodology, one which involves the respectful use of intuitions within many areas of philosophy. Frank Jackson [1998] is a prominent proponent of that methodology’s ability to aid our philosophical understanding of key concepts.)

Nonetheless, a few epistemological voices dissent from that approach (as this section and the next will indicate). These seek to dissolve the Gettier challenge. Instead of accepting the standard interpretation of Gettier cases, and instead of trying to find a direct solution to the challenge that the cases are thereby taken to ground, a dissolution of the cases denies that they ground any such challenge in the first place. And one way of developing such a dissolution is to deny or weaken the usual intuition by which almost all epistemologists claim to be guided in interpreting Gettier cases.

One such attempt has involved a few epistemologists (Jonathan Weinberg, Shaun Nichols, and Stephen Stich, 2001) conducting empirical research which (they argue) casts doubt upon the evidential force of the usual epistemological intuition about the cases. When epistemologists claim to have a strong intuition that knowledge is missing from Gettier cases, they take themselves to be representative of people in general (specifically, in how they use the word “knowledge” and its cognates such as “know,” “knower,” and the like). That intuition is therefore taken to reflect how “we” (people in general) conceive of knowledge. It is thereby assumed to be an accurate indicator of pertinent details of the concept of knowledge, which is to say, “our” concept of knowledge. Yet what is it that gives epistemologists such confidence in their being representative of how people in general use the word “knowledge”? Mostly, epistemologists test this view of themselves upon their students and upon other epistemologists. The empirical research by Weinberg, Nichols, and Stich asked a wider variety of people, including ones from outside of university or college settings, about Gettier cases. And that research has reported encountering a wider variety of reactions to the cases. When people who lack much, or even any, prior epistemological awareness are presented with descriptions of Gettier cases, will they unhesitatingly say (as epistemologists do) that the justified true beliefs within those cases fail to be knowledge? The empirical evidence gathered so far suggests some intriguing disparities in this regard, including ones that might reflect varying ethnic ancestries or backgrounds. In particular, respondents of East Asian or Indian sub-continental descent were found to be more open than were European Americans (of “Western” descent) to classifying Gettier cases as situations in which knowledge is present. A similar disparity seemed to be correlated with respondents’ socio-economic status.

Those data are preliminary. (And other epistemologists have not sought to replicate those surveys.) Nonetheless, the data are suggestive. At the very least, they constitute some empirical evidence that does not simply accord with epistemologists’ usual interpretation of Gettier cases. Hence, a real possibility has been raised that epistemologists, in how they interpret Gettier cases, are not so accurately representative of people in general. Their shared, supposedly intuitive, interpretation of the cases might be due to something distinctive in how they, as a group, think about knowledge, rather than being merely how people as a whole regard knowledge. In other words, perhaps the apparent intuition about knowledge (as it pertains to Gettier situations) that epistemologists share with each other is not universally shared. Maybe it is at least not shared with as many other people as epistemologists assume is the case. And if so, then the epistemologists’ intuition might not merit the significance they have accorded it when seeking a solution to the Gettier challenge. (Indeed, that challenge itself might not be as distinctively significant as epistemologists have assumed it to be. This possibility arises once we recognize that the prevalence of that usual putative intuition among epistemologists has been important to their deeming, in the first place, that Gettier cases constitute a decisive challenge to our understanding of what it is to know that p.)

Epistemologists might reply that people who think that knowledge is present within Gettier cases are not evaluating the cases properly — that is, as the cases should be interpreted. The question thus emerges of whether epistemologists’ intuitions are particularly trustworthy on this topic. Are they more likely to be accurate (than are other people’s intuitions) in what they say about knowledge — in assessing its presence in, or its absence from, specific situations? Presumably, most epistemologists will think so, claiming that when other people do not concur that in Gettier cases there is a lack of knowledge, those competing reactions reflect a lack of understanding of the cases — a lack of understanding which could well be rectified by sustained epistemological reflection.

Potentially, that disagreement has methodological implications about the nature and point of epistemological inquiry. For we should wonder whether those epistemologists, insofar as their confidence in their interpretation of Gettier cases rests upon their more sustained reflection about such matters, are really giving voice to intuitions as such about Gettier cases when claiming to be doing so. Or are they instead applying some comparatively reflective theories of knowledge? The latter alternative need not make their analyses mistaken, of course. But it would make more likely the possibility that the analyses of knowledge which epistemologists develop in order to understand Gettier cases are not based upon a directly intuitive reading of the cases. This might weaken the strength and independence of the epistemologists’ evidential support for those analyses of knowledge.

For example, maybe the usual epistemological interpretation of Gettier cases is manifesting a commitment to a comparatively technical and demanding concept of knowledge, one that only reflective philosophers would use and understand. Even if the application of that concept feels intuitive to them, this could be due to the kind of technical training that they have experienced. It might not be a coincidence, either, that epistemologists tend to present Gettier cases by asking the audience, “So, is this justified true belief within the case really knowledge?” — thereby suggesting, through this use of emphasis, that there is an increased importance in making the correct assessment of the situation. The audience might well feel a correlative caution about saying that knowledge is present. They could feel obliged to take care not to accord knowledge if there is anything odd — as, clearly, there is — about the situation being discussed. When that kind of caution and care are felt to be required, then — as contextualist philosophers such as David Lewis (1996) have argued is appropriate — we are more likely to deny that knowledge is present.

Hence, if epistemologists continue to insist that the nature of knowledge is such as to satisfy one of their analyses (where this includes knowledge’s being such that it is absent from Gettier cases), then there is a correlative possibility that they are talking about something — knowledge — that is too difficult for many, if any, inquirers ever to attain. How should people — as potential or actual inquirers — react to that possibility? Mark Kaplan (1985) has argued that insofar as knowledge must conform to the demands of Gettier cases (and to the usual epistemological interpretation of them), knowledge is not something about which we should care greatly as inquirers. And the fault would be knowledge’s, not ours. Kaplan advocates our seeking something less demanding and more realistically attainable than knowledge is if it needs to cohere with the usual interpretation of Gettier cases. (An alternative thought which Kaplan’s argument might prompt us to investigate is that of whether knowledge itself could be something less demanding — even while still being at least somewhat worth seeking. Section 13 will discuss that idea.)

Those pivotal issues are currently unresolved. In the meantime, their presence confirms that, by thinking about Gettier cases, we may naturally raise some substantial questions about epistemological methodology — about the methods via which we should be trying to understand knowledge. Those questions include the following ones. What evidence should epistemologists consult as they strive to learn the nature of knowledge? Should they be perusing intuitions? If so, whose? Their own? How should competing intuitions be assessed? And how strongly should favored intuitions be relied upon anyway? Are they to be decisive? Are they at least powerful? Or are they no more than a starting-point for further debate — a provider, not an adjudicator, of relevant ideas?

13. Attempted Dissolutions: Knowing Luckily

Section 12 posed the question of whether supposedly intuitive assessments of Gettier situations support the usual interpretation of the cases as strongly — or even as intuitively — as epistemologists generally believe is the case. How best might that question be answered? Sections 5 and 8 explained that when epistemologists seek to support that usual interpretation in a way that is meant to remain intuitive, they typically begin by pointing to the luck that is present within the cases. That luck is standardly thought to be a powerful — yet still intuitive — reason why the justified true beliefs inside Gettier cases fail to be knowledge.

Nevertheless, a contrary interpretation of the luck’s role has also been proposed, by Stephen Hetherington (1998; 2001). It means to reinstate the sufficiency of JTB, thereby dissolving Gettier’s challenge. That contrary interpretation could be called the Knowing Luckily Proposal. And it analyses Gettier’s Case I along the following lines.

This alternative interpretation concedes (in accord with the usual interpretation) that, in forming his belief b, Smith is lucky to be gaining a belief which is true. More fully: He is lucky to do so, given the evidence by which he is being guided in forming that belief, and given the surrounding facts of his situation. In that sense (we might say), Smith came close to definitely lacking knowledge. (For in that sense he came close to forming a false belief; and a belief which is false is definitely not knowledge.) But to come close to definitely lacking knowledge need not be to lack knowledge. It might merely be to almost lack knowledge. So (as we might also say), it could be to know, albeit luckily so. Smith would have knowledge, in virtue of having a justified true belief. (We would thus continue to regard JTB as being true.) However, because Smith would only luckily have that justified true belief, he would only luckily have that knowledge.

Most epistemologists will object that this sounds like too puzzling a way to talk about knowing. Their reaction is natural. Even this Knowing Luckily Proposal would probably concede that there is very little (if any) knowledge which is lucky in so marked or dramatic a way. And because there is so little (if any) such knowledge, our everyday lives leave us quite unused to thinking of some knowledge as being present within ourselves or others quite so luckily: we would actually encounter little (if any) such knowledge. To the extent that the kind of luck involved in such cases reflects the statistical unlikelihood of such circumstances occurring, therefore, we should expect at least most knowledge not to be present in that lucky way. (Otherwise, this would be the normal way for knowledge to be present. It would not in fact be an unusual way. Hence, strictly speaking, the knowledge would not be present only luckily.)

But even if the Knowing Luckily Proposal agrees that, inevitably, at least most knowledge will be present in comparatively normal ways, the proposal will deny that this entails the impossibility of there ever being at least some knowledge which is present more luckily. Ordinarily, when good evidence for a belief that p accompanies the belief’s being true (as it does in Case I), this combination of good evidence and true belief occurs (unlike in Case I) without any notable luck being needed. Ordinary knowledge is thereby constituted, with that absence of notable luck being part of what makes instances of ordinary knowledge ordinary in our eyes. What is ordinary to us will not strike us as being present only luckily. Again, though, is it therefore impossible for knowledge ever to be constituted luckily? The Knowing Luckily Proposal claims that such knowledge is possible even if uncommon. The proposal will grant that there would be a difference between knowing that p in a comparatively ordinary way and knowing that p in a comparatively lucky way. Knowing comparatively luckily that p would be (i) knowing that p (where this might remain one’s having a justified true belief that p), even while also (ii) running, or having run, a greater risk of not having that knowledge that p. In that sense, it would be to know that p less securely or stably or dependably, more fleetingly or unpredictably.

There are many forms that the lack of stability — the luck involved in the knowledge’s being present — could take. Sometimes it might include the knowledge’s having one of the failings found within Gettier cases. The knowledge — the justified true belief — would be present in a correspondingly lucky way. One interpretive possibility — from Hetherington (2001) — is that of describing this knowledge that p as being of a comparatively poor quality as knowledge that p. Normally, knowledge that p is of a higher quality than this — being less obviously flawed, by being less luckily present. The question persists, though: Must all knowledge that p be, in effect, normal knowledge that p — being of a normal quality as knowledge that p? Or could we sometimes — even if rarely — know that p in a comparatively poor and undesirable way? The Knowing Luckily Proposal allows that this is possible — that this is a conceivable form for some knowledge to take.

That proposal is yet to be widely accepted among epistemologists. Their main objection to it has been what they have felt to be the oddity of talking of knowledge in that way. Accordingly, the epistemological resistance to the proposal partly reflects the standard adherence to the dominant (“intuitive”) interpretation of Gettier cases. Yet this section and the previous one have asked whether epistemologists should be wedded to that interpretation of Gettier cases. So, this section leaves us with the following question: Is it conceptually coherent to regard the justified true beliefs within Gettier cases as instances of knowledge which are luckily produced or present? And how are we to answer that question anyway? With intuitions? Whose? Once again, we encounter section 12’s questions about the proper methodology for making epistemological progress on this issue.

14. Gettier Cases and Analytic Epistemology

Since the initial philosophical description in 1963 of Gettier cases, the project of responding to them (so as to understand what it is to know that p) has often been central to the practice of analytic epistemology. Partly this recurrent centrality has been due to epistemologists’ taking the opportunity to think in detail about the nature of justification — about what justification is like in itself, and about how it is constitutively related to knowledge. But partly, too, that recurrent centrality reflects the way in which, epistemologists have often assumed, responding adequately to Gettier cases requires the use of a paradigm example of a method that has long been central to analytic philosophy. That method involves the considered manipulation and modification of definitional models or theories, in reaction to clear counterexamples to those models or theories.

Thus (we saw in section 2), JTB purported to provide a definitional analysis of what it is to know that p. JTB aimed to describe, at least in general terms, the separable-yet-combinable components of such knowledge. Then Gettier cases emerged, functioning as apparently successful counterexamples to one aspect — the sufficiency — of JTB’s generic analysis. That interpretation of the cases’ impact rested upon epistemologists’ claims to have reflective-yet-intuitive insight into the absence of knowledge from those actual or possible Gettier circumstances. These claims of intuitive insight were treated by epistemologists as decisive data, somewhat akin to favored observations. The claims were to be respected accordingly; and, it was assumed, any modification of the theory encapsulated in JTB would need to be evaluated for how well it accommodated them. So, the entrenchment of the Gettier challenge at the core of analytic epistemology hinged upon epistemologists’ confident assumptions that (i) JTB failed to accommodate the data provided by those intuitions — and that (ii) any analytical modification of JTB would need (and would be able) to be assessed for whether it accommodated such intuitions. That was the analytical method which epistemologists proceeded to apply, vigorously and repeatedly.

Nevertheless, the history of post-1963 analytic epistemology has also contained repeated expressions of frustration at the seemingly insoluble difficulties that have accompanied the many attempts to respond to Gettier’s disarmingly simple paper. Precisely how should the theory JTB be revised, in accord with the relevant data? Exactly which data are relevant anyway? We have seen in the foregoing sections that there is much room for dispute and uncertainty about all of this. For example, we have found a persistent problem of vagueness confronting various attempts to revise JTB. This might have us wondering whether a complete analytical definition of knowledge that p is even possible.

That is especially so, given that vagueness itself is a phenomenon, the proper understanding of which is yet to be agreed upon by philosophers. There is much contemporary discussion of what it even is (see Keefe and Smith 1996). On one suggested interpretation, vagueness is a matter of people in general not knowing where to draw a precise and clearly accurate line between instances of X and instances of non-X (for some supposedly vague phenomenon of being X, such as being bald or being tall). On that interpretation of vagueness, such a dividing line would exist; we would just be ignorant of its location. To many philosophers, that idea sounds regrettably odd when the vague phenomenon in question is baldness, say. (“You claim that there is an exact dividing line, in terms of the number of hairs on a person’s head, between being bald and not being bald? I find that claim extremely hard to believe.”) But should philosophers react with such incredulity when the phenomenon in question is that of knowing, and when the possibility of vagueness is being prompted by discussions of the Gettier problem? For most epistemologists remain convinced that their standard reaction to Gettier cases reflects, in part, the existence of a definite difference between knowing and not knowing. But where, exactly, is that dividing line to be found? As we have observed, the usual epistemological answers to this question seek to locate and to understand the dividing line in terms of degrees and kinds of justification or something similar. Accordingly, the threats of vagueness we have noticed in some earlier sections of this article might be a problem for many epistemologists. Possibly, those forms of vagueness afflict epistemologists’ knowing that a difference between knowledge and non-knowledge is revealed by Gettier cases. Epistemologists continue regarding the cases in that way. Are they right to do so? Do they have that supposed knowledge of what Gettier cases show about knowledge?

The Gettier challenge has therefore become a test case for analytically inclined philosophers. The following questions have become progressively more pressing with each failed attempt to convince epistemologists as a group that, in a given article or talk or book, the correct analysis of knowledge has finally been reached. Will an adequate understanding of knowledge ever emerge from an analytical balancing of various theories of knowledge against relevant data such as intuitions? Must any theory of the nature of knowledge be answerable to intuitions prompted by Gettier cases in particular? And must epistemologists’ intuitions about the cases be supplemented by other people’s intuitions, too? What kind of theory of knowledge is at stake? What general form should the theory take? And what degree of precision should it have? If we are seeking an understanding of knowledge, must this be a logically or conceptually exhaustive understanding? (The methodological model of theory-being-tested-against-data suggests a scientific parallel. Yet need scientific understanding always be logically or conceptually exhaustive if it is to be real understanding?)

The issues involved are complex and subtle. No analysis has received general assent from epistemologists, and the methodological questions remain puzzling. Debate therefore continues. There is uncertainty as to whether Gettier cases — and thereby knowledge — can ever be fully understood. There is also uncertainty as to whether the Gettier challenge can be dissolved. Have we fully understood the challenge itself? What exactly is Gettier’s legacy? As epistemologists continue to ponder these questions, it is not wholly clear where their efforts will lead us. Conceptual possibilities still abound.

15. References and Further Reading

  • Ayer, A. J. (1956). The Problem of Knowledge (London: Macmillan), ch. 1.
    • Presents a well-regarded pre-Gettier JTB analysis of knowledge.
  • Chisholm, R. M. (1966/1977/1989). Theory of Knowledge (any of the three editions). (Englewood Cliffs, NJ: Prentice Hall).
    • Includes the sheep-in-the-field Gettier case, along with attempts to repair JTB.
  • Descartes, R. (1911 [1641]). The Philosophical Works of Descartes, Vol. I, (eds. and trans.) E. S. Haldane and G. R. T. Ross. (Cambridge: Cambridge University Press).
    • Contains the Meditations, which develops and applies Descartes’s conception of knowledge as needing to be infallible.
  • Feldman, R. (1974). “An Alleged Defect in Gettier Counterexamples.” Australasian Journal of Philosophy 52: 68-9. Reprinted in Moser (1986).
    • Presents a Gettier case in which, it is claimed, no false evidence is used by the believer.
  • Gettier, E. L. (1963). “Is Justified True Belief Knowledge?” Analysis 23: 121-3. Reprinted in Roth and Galis (1970) and Moser (1986).
  • Goldman, A. I. (1967). “A Causal Theory of Knowing.” Journal of Philosophy 64: 357-72. Reprinted, with revisions, in Roth and Galis (1970).
    • The initial presentation of a No Inappropriate Causality Proposal.
  • Goldman, A. I. (1976). “Discrimination and Perceptual Knowledge.” Journal of Philosophy 73: 771-91. Reprinted in Pappas and Swain (1978).
    • Includes the fake-barns Gettier case.
  • Hetherington, S. (1996). Knowledge Puzzles: An Introduction to Epistemology (Boulder, Colo.: Westview Press).
    • Includes an introduction to the justified-true-belief analysis of knowledge, and to several responses to Gettier’s challenge.
  • Hetherington, S. (1998). “Actually Knowing.” Philosophical Quarterly 48: 453-69.
    • Includes a version of the Knowing Luckily Proposal.
  • Hetherington, S. (2001). Good Knowledge, Bad Knowledge: On Two Dogmas of Epistemology (Oxford: Oxford University Press).
    • Extends the Knowing Luckily Proposal, by explaining the idea of having qualitatively better or worse knowledge that p.
  • Jackson, F. (1998). From Metaphysics to Ethics: A Defence of Conceptual Analysis (Oxford: Oxford University Press).
    • Includes discussion of Gettier cases and the role of intuitions and conceptual analysis.
  • Kaplan, M. (1985). “It’s Not What You Know That Counts.” Journal of Philosophy 82: 350-63.
    • Argues that, given Gettier cases, knowledge is not what inquirers should seek.
  • Keefe, R. and Smith, P. (eds.) (1996). Vagueness: A Reader (Cambridge, Mass.: The MIT Press).
    • Contains both historical and contemporary analyses of the nature and significance of vagueness in general.
  • Kirkham, R. L. (1984). “Does the Gettier Problem Rest on a Mistake?” Mind 93: 501-13.
    • Argues that the usual interpretation of Gettier cases depends upon applying an extremely demanding conception of knowledge to the described situations, a conception with skeptical implications.
  • Lehrer, K. (1965). “Knowledge, Truth and Evidence.” Analysis 25: 168-75. Reprinted in Roth and Galis (1970).
    • Presents a No Core False Evidence Proposal.
  • Lehrer, K. (1971). “Why Not Scepticism?” The Philosophical Forum 2: 283-98. Reprinted in Pappas and Swain (1978).
    • Outlines a skepticism based on an Infallibility Proposal about knowledge.
  • Lehrer, K., and Paxson, T. D. (1969). “Knowledge: Undefeated Justified True Belief.” Journal of Philosophy 66: 225-37. Reprinted in Pappas and Swain (1978).
    • Presents a No Defeat Proposal.
  • Lewis, D. (1996). “Elusive Knowledge.” Australasian Journal of Philosophy 74: 549-67.
    • Includes a much-discussed response to Gettier cases which pays attention to nuances in how people discuss knowledge.
  • Lycan, W. G. (1977). “Evidence One Does not Possess.” Australasian Journal of Philosophy 55: 114-26.
    • Discusses potential complications in a No Defeat Proposal.
  • Lycan, W. G. (2006). “On the Gettier Problem Problem.” In Epistemology Futures, (ed.) S. Hetherington. (Oxford: Oxford University Press).
    • A recent overview of the history of attempted solutions to the Gettier problem.
  • Moser, P. K. (ed.) (1986). Empirical Knowledge: Readings in Contemporary Epistemology (Totowa, NJ: Rowman & Littlefield).
    • Contains some influential papers on Gettier cases.
  • Pappas, G. S., and Swain, M. (eds.) (1978). Essays on Knowledge and Justification (Ithaca, NY: Cornell University Press).
    • A key anthology, mainly on the Gettier problem.
  • Plato. Meno 97a-98b.
    • For what epistemologists generally regard as being an early version of JTB.
  • Plato. Theaetetus 200d-210c.
    • For seminal philosophical discussion of some possible instances of JTB.
  • Roth, M. D., and Galis, L. (eds.) (1970). Knowing: Essays in the Analysis of Knowledge (New York: Random House).
    • Includes some noteworthy papers on Gettier’s challenge.
  • Shope, R. K. (1983). The Analysis of Knowing: A Decade of Research (Princeton: Princeton University Press).
    • Presents many Gettier cases; discusses several proposed analyses of them.
  • Skyrms, B. (1967). “The Explication of ‘X Knows that p’.” Journal of Philosophy 64: 373-89. Reprinted in Roth and Galis (1970).
    • Includes the pyromaniac Gettier case.
  • Unger, P. (1968). “An Analysis of Factual Knowledge.” Journal of Philosophy 65: 157-70. Reprinted in Roth and Galis (1970).
    • Presents an Eliminate Luck Proposal.
  • Unger, P. (1971). “A Defense of Skepticism.” The Philosophical Review 80: 198-218. Reprinted in Pappas and Swain (1978).
    • Defends and applies an Infallibility Proposal about knowledge.
  • Weinberg, J., Nichols, S., and Stich, S. (2001). “Normativity and Epistemic Intuitions.” Philosophical Topics 29: 429-60.
    • Includes empirical data on competing (‘intuitive’) reactions to Gettier cases.
  • Williamson, T. (2000). Knowledge and Its Limits (Oxford: Oxford University Press), Intro., ch. 1.
    • Includes arguments against responding to Gettier cases with an analysis of knowledge.

Author Information

Stephen Hetherington
Email: s.hetherington@unsw.edu.au
University of New South Wales
Australia

Aristotle: Politics

In his Nicomachean Ethics, Aristotle (384-322 B.C.E.) describes the happy life intended for man by nature as one lived in accordance with virtue, and, in his Politics, he describes the role that politics and the political community must play in bringing about the virtuous life in the citizenry.

The Politics also provides analysis of the kinds of political community that existed in his time and shows where and how these cities fall short of the ideal community of virtuous citizens.

Although in some ways we have clearly moved beyond his thought (for example, his belief in the inferiority of women and his approval of slavery in at least some circumstances), there remains much in Aristotle’s philosophy that is valuable today.

In particular, his views on the connection between the well-being of the political community and that of the citizens who make it up, his belief that citizens must actively participate in politics if they are to be happy and virtuous, and his analysis of what causes and prevents revolution within political communities have been a source of inspiration for many contemporary theorists, especially those unhappy with the liberal political philosophy promoted by thinkers such as John Locke and John Stuart Mill.

Table of Contents

  1. Biography and History
  2. The Texts
  3. Challenges of the Texts
  4. Politics and Ethics
  5. The Importance of Telos
  6. The Text of the Politics
  7. The Politics, Book I
    1. The Purpose of the City
    2. How the City Comes Into Being
    3. Man, the Political Animal
    4. Slavery
    5. Women
  8. The Politics, Book II
    1. What Kind of Partnership Is a City?
    2. Existing Cities: Sparta, Crete, Carthage
  9. The Politics, Book III
    1. Who Is the Citizen?
    2. The Good Citizen and the Good Man
    3. Who Should Rule?
  10. The Politics, Book IV
    1. Polity: The Best Practical Regime
    2. The Importance of the Middle Class
  11. The Politics, Book V
    1. Conflict Between the Rich and the Poor
    2. How to Preserve Regimes
  12. The Politics, Book VI
    1. Varieties of Democracy
    2. The Best Kind of Democracy
    3. The Role of Wealth in a Democracy
  13. The Politics, Book VII
    1. The Best Regime and the Best Men
    2. Characteristics of the Best City
  14. The Politics, Book VIII
    1. The Education of the Young
  15. References and Further Reading

1. Biography and History

Aristotle’s life was primarily that of a scholar. However, as with the other ancient philosophers, his was not the stereotypical ivory-tower existence. His father was court physician to Amyntas III of Macedon, so Aristotle grew up in a royal household. Aristotle also knew Philip of Macedon (son of Amyntas III) and there is a tradition that says Aristotle tutored Philip’s son Alexander, who would later be called “the Great” after expanding the Macedonian Empire all the way to what is now India. Clearly, Aristotle had significant firsthand experience with politics, though scholars disagree about how much influence, if any, this experience had on Aristotle’s thought. There is certainly no evidence that Alexander’s subsequent career was much influenced by Aristotle’s teaching, which is uniformly critical of war and conquest as goals for human beings and which praises the intellectual, contemplative lifestyle. It is noteworthy that although Aristotle praises the politically active life, he spent most of his own life in Athens, where he was not a citizen and would not have been allowed to participate directly in politics (although of course anyone who wrote as extensively and well about politics as Aristotle did was likely to be politically influential).

Aristotle studied under Plato at Plato’s Academy in Athens, and eventually opened a school of his own (the Lyceum) there. As a scholar, Aristotle had a wide range of interests. He wrote about meteorology, biology, physics, poetry, logic, rhetoric, and politics and ethics, among other subjects. His writings on many of these interests remained definitive for almost two millennia. They remained, and remain, so valuable in part because of the comprehensiveness of his efforts. For example, in order to understand political phenomena, he had his students collect information on the political organization and history of 158 different cities. The Politics makes frequent reference to political events and institutions from many of these cities, drawing on his students’ research. Aristotle’s theories about the best ethical and political life are drawn from substantial amounts of empirical research. These studies, and in particular the Constitution of Athens, will be discussed in more detail below (Who Should Rule?). The question of how these writings should be unified into a consistent whole (if that is even possible) is an open one and beyond the scope of this article. This article will not attempt to organize all of Aristotle’s work into a coherent whole, but will draw on different texts as they are necessary to complete one version of Aristotle’s view of politics.

2. The Texts

The most important text for understanding Aristotle’s political philosophy, not surprisingly, is the Politics. However, it is also important to read Nicomachean Ethics in order to fully understand Aristotle’s political project. This is because Aristotle believed that ethics and politics were closely linked, and that in fact the ethical and virtuous life is only available to someone who participates in politics, while moral education is the main purpose of the political community. As he says in Nicomachean Ethics at 1099b30, “The end [or goal] of politics is the best of ends; and the main concern of politics is to engender a certain character in the citizens and to make them good and disposed to perform noble actions.” Most people living today in Western societies like the United States, Canada, Germany, or Australia would disagree with both parts of that statement. We are likely to regard politics (and politicians) as aiming at ignoble, selfish ends, such as wealth and power, rather than the “best end”, and many people regard the idea that politics is or should be primarily concerned with creating a particular moral character in citizens as a dangerous intrusion on individual freedom, in large part because we do not agree about what the “best end” is. In fact, what people in Western societies generally ask from politics and the government is that they keep each of us safe from other people (through the provision of police and military forces) so that each of us can choose and pursue our own ends, whatever they may be. This has been the case in Western political philosophy at least since John Locke. Development of individual character is left up to the individual, with help from family, religion, and other non-governmental institutions. More will be said about this later, but the reader should keep in mind that this is an important way in which our political and ethical beliefs are not Aristotle’s. 
The reader is also cautioned against immediately concluding from this that Aristotle was wrong and we are right. This may be so, but it is important to understand why, and the contrast between Aristotle’s beliefs and ours can help to bring the strengths and weaknesses of our own beliefs into greater clarity.

The reference above to “Nicomachean Ethics at 1099b30” makes use of what is called Bekker pagination. This refers to the location of the beginning of the cited text in the edition of Aristotle’s works produced by Immanuel Bekker in Berlin in 1831 (in this case, it begins on page 1099, column b, line 30). Scholars make use of this system for all of Aristotle’s works except the Constitution of Athens (which was not rediscovered until after 1831) and fragmentary works in order to be able to refer to the same point in Aristotle’s work regardless of which edition, translation, or language they happen to be working with. This entry will make use of the Bekker pagination system, and will also follow tradition and refer to Nicomachean Ethics as simply Ethics. (There is also a Eudemian Ethics, which is almost certainly by Aristotle and which shares three of the ten books of the Nicomachean Ethics, and a work on ethics titled Magna Moralia, which has been attributed to him but which most scholars now believe is not his work. Regardless, most scholars believe that the Nicomachean Ethics is Aristotle’s fullest and most mature expression of his ethical theory.) The translation is that of Martin Ostwald; see the bibliography for full information. In addition to the texts listed above, the student with an interest in Aristotle’s political theory may also wish to read the Rhetoric, which includes observations on ethics and politics in the context of teaching the reader how to be a more effective speaker, and the Constitution of Athens, a work attributed to Aristotle, but which may be by one of his students, which describes the political history of the city of Athens.

3. Challenges of the Texts

Any honest attempt to summarize and describe Aristotle’s political philosophy must include an acknowledgment that there is no consensus on many of the most important aspects of that philosophy. Some of the reasons for this should be mentioned from the outset.

One set of reasons has to do with the text itself and the transmission of the text from Aristotle’s time to ours. The first thing that can lead to disagreement over Aristotle’s beliefs is the fact that the Politics and Ethics are believed by many scholars to be his lecture notes, for lectures which were intended to be heard only by his own students. (Aristotle did write for general audiences on these subjects, probably in dialogue form, but only a few fragments of those writings remain). This is also one reason why many students have difficulty reading his work: no teacher’s lecture notes ever make complete sense to anyone else (their meaning can even elude their author at times). Many topics in the texts are discussed less fully than we would like, and many things are ambiguous which we wish were more straightforward. But if Aristotle was lecturing from these writings, he could have taken care of these problems on the fly as he lectured, since presumably he knew what he meant, or he could have responded to requests for clarification or elaboration from his students.

Secondly, most people who read Aristotle are not reading him in the original Attic Greek but are instead reading translations. This leads to further disagreement, because different authors translate Aristotle differently, and the way in which a particular word is translated can be very significant for the text as a whole. There is no way to definitively settle the question of what Aristotle “really meant to say” in using a particular word or phrase.

Third, the Aristotelian texts we have are not the originals, but copies, and every time a text gets copied errors creep in (words, sentences, or paragraphs can get left out, words can be changed into new words, and so forth). For example, imagine someone writing the sentence “Ronald Reagan was the last competent president of the United States.” It is copied by hand, and the person making the copy accidentally writes (or assumes that the author must have written) “Ronald Reagan was the least competent president of the United States.” If the original is then destroyed, so that only the copy remains, future generations will read a sentence that means almost exactly the opposite of what the author intended. It may be clear from the context that a word has been changed, but then again it may not, and there is always hesitation in changing the text as we have it. In addition, although nowadays it is unacceptable to modify someone else’s work without clearly denoting the changes, this is a relatively recent development and there are portions of Aristotle’s texts which scholars believe were added by later writers. This, too, complicates our understanding of Aristotle.

Finally, there are a number of controversies related to the text of the Politics in particular. These controversies cannot be discussed here, but should be mentioned. For more detail consult the works listed in the “Suggestions for further reading” below. First, there is disagreement about whether the books of the Politics are in the order that Aristotle intended. Carnes Lord and others have argued based on a variety of textual evidence that books 7 and 8 were intended by Aristotle to follow book 3. Rearranging the text in this way would have the effect of joining the early discussion of the origins of political life and the city, and the nature of political justice, with the discussion of the ideal city and the education appropriate for it, while leaving together books 4-6 which are primarily concerned with existing varieties of regimes and how they are preserved and destroyed and moving them to the conclusion of the book. Second, some authors, notably Werner Jaeger, have argued that the different focus and orientation of the different portions of the Politics is a result of Aristotle writing them at different times, reflecting his changing interests and orientation towards Plato’s teachings. The argument is that at first Aristotle stuck very closely to the attitudes and ideas of his teacher Plato, and only later developed his own more empirical approach. Thus any difficulties that there may be in integrating the different parts of the Politics arise from the fact that they were not meant to be integrated and were written at different times and with different purposes. Third, the Politics as we have it appears to be incomplete; Book 6 ends in the middle of a sentence and Book 8 in the middle of a discussion. There are also several places in the Politics where Aristotle promises to consider a topic further later but does not do so in the text as we have it (for example, at the end of Book II, Chapter 8). It is possible that Aristotle never finished writing it; more likely there is material missing as a result of damage to the scrolls on which it was written. The extent and content of any missing material is a matter of scholarly debate.

Fortunately, the beginning student of Aristotle will not need to concern themselves much with these problems. It is, however, important to get a quality translation of the text, which provides an introduction, footnotes, a glossary, and a bibliography, so that the reader is aware of places where, for example, there seems to be something missing from the text, or a word can have more than one meaning, or there are other textual issues. These will not always be the cheapest or most widely available translations, but it is important to get one of them, from a library if need be. Several suggested editions are listed at the end of this article.

4. Politics and Ethics

In Book Six of the Ethics Aristotle says that all knowledge can be classified into three categories: theoretical knowledge, practical knowledge, and productive knowledge. Put simply, these kinds of knowledge are distinguished by their aims: theoretical knowledge aims at contemplation, productive knowledge aims at creation, and practical knowledge aims at action. Theoretical knowledge involves the study of truth for its own sake; it is knowledge about things that are unchanging and eternal, and includes things like the principles of logic, physics, and mathematics (at the end of the Ethics Aristotle says that the most excellent human life is one lived in pursuit of this type of knowledge, because this knowledge brings us closest to the divine). The productive and practical sciences, in contrast, address our daily needs as human beings, and have to do with things that can and do change. Productive knowledge means, roughly, know-how; the knowledge of how to make a table or a house or a pair of shoes or how to write a tragedy would be examples of this kind of knowledge. This entry is concerned with practical knowledge, which is the knowledge of how to live and act. According to Aristotle, it is the possession and use of practical knowledge that makes it possible to live a good life. Ethics and politics, which are the practical sciences, deal with human beings as moral agents. Ethics is primarily about the actions of human beings as individuals, and politics is about the actions of human beings in communities, although it is important to remember that for Aristotle the two are closely linked and each influences the other.

The fact that ethics and politics are kinds of practical knowledge has several important consequences. First, it means that Aristotle believes that mere abstract knowledge of ethics and politics is worthless. Practical knowledge is only useful if we act on it; we must act appropriately if we are to be moral. He says at Ethics 1103b25: “The purpose of the present study [of morality] is not, as it is in other inquiries, the attainment of theoretical knowledge: we are not conducting this inquiry in order to know what virtue is, but in order to become good, else there would be no advantage in studying it.”

Second, according to Aristotle, only some people can beneficially study politics. Aristotle believes that women and slaves (or at least those who are slaves by nature) can never benefit from the study of politics, and also should not be allowed to participate in politics, about which more will be said later. But there is also a limitation on political study based on age, as a result of the connection between politics and experience: “A young man is not equipped to be a student of politics; for he has no experience in the actions which life demands of him, and these actions form the basis and subject matter of the discussion” (Ethics 1095a2). Aristotle adds that young men will usually act on the basis of their emotions, rather than according to reason, and since acting on practical knowledge requires the use of reason, young men are unequipped to study politics for this reason too. So the study of politics will only be useful to those who have the experience and the mental discipline to benefit from it, and for Aristotle this would have been a relatively small percentage of the population of a city. Even in Athens, the most democratic city in Greece, no more than 15 percent of the population was ever allowed the benefits of citizenship, including political participation. Athenian citizenship was limited to adult males who were not slaves and who had one parent who was an Athenian citizen (sometimes citizenship was further restricted to require both parents to be Athenian citizens). Aristotle does not think this percentage should be increased – if anything, it should be decreased.

Third, Aristotle distinguishes between practical and theoretical knowledge in terms of the level of precision that can be attained when studying them. Political and moral knowledge does not have the same degree of precision or certainty as mathematics. Aristotle says at Ethics 1094b14: “Problems of what is noble and just, which politics examines, present so much variety and irregularity that some people believe that they exist only by convention and not by nature….Therefore, in a discussion of such subjects, which has to start with a basis of this kind, we must be satisfied to indicate the truth with a rough and general sketch: when the subject and the basis of a discussion consist of matters that hold good only as a general rule, but not always, the conclusions reached must be of the same order.” Aristotle does not believe that the noble and the just exist only by convention, any more than, say, the principles of geometry do. However, the principles of geometry are fixed and unchanging. The definition of a point, or a line, or a plane, can be given precisely, and once this definition is known, it is fixed and unchanging for everyone. However, the definition of something like justice can only be known generally; there is no fixed and unchanging definition that will always be correct. This means that unlike philosophers such as Hobbes and Kant, Aristotle does not and in fact cannot give us a fixed set of rules to be followed when ethical and political decisions must be made. Instead he tries to make his students the kind of men who, when confronted with any particular ethical or political decision, will know the correct thing to do, will understand why it is the correct choice, and will choose to do it for that reason. Such a man will know the general rules to be followed, but will also know when and why to deviate from those rules. 
(I will use “man” and “men” when referring to citizens so that the reader keeps in mind that Aristotle, and the Greeks generally, excluded women from political participation. In fact it is not until the mid-19th century that organized attempts to gain the right to vote for women really get underway, and even today in the 21st century there are still many countries which deny women the right to vote or participate in political life).

5. The Importance of Telos

I have already noted the connection between ethics and politics in Aristotle’s thought. The concept that most clearly links the two is that which Aristotle called telos. A discussion of this concept and its importance will help the reader make sense of what follows. Aristotle himself discusses it in Book II, Chapter 3 of the Physics and Book I, Chapter 3 of the Metaphysics.

The word telos means something like purpose, or goal, or final end. According to Aristotle, everything has a purpose or final end. If we want to understand what something is, it must be understood in terms of that end, which we can discover through careful study. It is perhaps easiest to understand what a telos is by looking first at objects created by human beings. Consider a knife. If you wanted to describe a knife, you would talk about its size, and its shape, and what it is made out of, among other things. But Aristotle believes that you would also, as part of your description, have to say that it is made to cut things. And when you did, you would be describing its telos. The knife’s purpose, or reason for existing, is to cut things. And Aristotle would say that unless you included that telos in your description, you wouldn’t really have described – or understood – the knife. This is true not only of things made by humans, but of plants and animals as well. If you were to fully describe an acorn, you would include in your description that it will become an oak tree in the natural course of things – so acorns too have a telos. Suppose you were to describe an animal, like a thoroughbred foal. You would talk about its size, say it has four legs and hair, and a tail. Eventually you would say that it is meant to run fast. This is the horse’s telos, or purpose. If nothing thwarts that purpose, the young horse will indeed become a fast runner.

Here we are not primarily concerned with the telos of a knife or an acorn or a foal. What concerns us is the telos of a human being. Just like everything else that is alive, human beings have a telos. What is it that human beings are meant by nature to become in the way that knives are meant to cut, acorns are meant to become oak trees, and thoroughbred foals are meant to become race horses? According to Aristotle, we are meant to become happy. This is nice to hear, although it isn’t all that useful. After all, people find happiness in many different ways. However, Aristotle says that living happily requires living a life of virtue. Someone who is not living a life that is virtuous, or morally good, is also not living a happy life, no matter what they might think. They are like a knife that will not cut, an oak tree that is diseased and stunted, or a racehorse that cannot run. In fact they are worse, since they have chosen the life they lead in a way that a knife or an acorn or a horse cannot.

Someone who does live according to virtue, who chooses to do the right thing because it is the right thing to do, is living a life that flourishes; to borrow a phrase, they are being all that they can be by using all of their human capacities to their fullest. The most important of these capacities is logos – a word that means “speech” and also means “reason” (it gives us the English word “logic”). Human beings alone have the ability to speak, and Aristotle says that we have been given that ability by nature so that we can speak and reason with each other to discover what is right and wrong, what is good and bad, and what is just and unjust.

Note that human beings discover these things rather than creating them. We do not get to decide what is right and wrong, but we do get to decide whether we will do what is right or what is wrong, and this is the most important decision we make in life. So too is the happy life: we do not get to decide what really makes us happy, although we do decide whether or not to pursue the happy life. And this is an ongoing decision. It is not made once and for all, but must be made over and over again as we live our lives. Aristotle believes that it is not easy to be virtuous, and he knows that becoming virtuous can only happen under the right conditions. Just as an acorn can only fulfill its telos if there is sufficient light, the right kind of soil, and enough water (among other things), and a horse can only fulfill its telos if there is sufficient food and room to run (again, among other things), an individual can only fulfill their telos and be a moral and happy human being within a well constructed political community. The community brings about virtue through education and through laws which prescribe certain actions and prohibit others.

And here we see the link between ethics and politics in a different light: the role of politics is to provide an environment in which people can live fully human, ethical, and happy lives, and this is the kind of life which makes it possible for someone to participate in politics in the correct way. As Aristotle says at Ethics 1103a30: “We become just by the practice of just actions, self-controlled by exercising self-control, and courageous by performing acts of courage….Lawgivers make the citizens good by inculcating [good] habits in them, and this is the aim of every lawgiver; if he does not succeed in doing that, his legislation is a failure. It is in this that a good constitution differs from a bad one.” This is not a view that would be found in political science textbooks today, but for Aristotle it is the central concern of the study of politics: how can we discover and put into practice the political institutions that will develop virtue in the citizens to the greatest possible extent?

6. The Text of the Politics

Having laid out the groundwork for Aristotle’s thought, we are now in a position to look more closely at the text of the Politics. The translation we will use is that of Carnes Lord, which can be found in the list of suggested readings. This discussion is by no means complete; there is much of interest and value in Aristotle’s political writings that will not be considered here. Again, the reader is encouraged to investigate the list of suggested readings. However, the main topics and problems of Aristotle’s work will be included. The discussion will, to the extent possible, follow the organization of the Politics.

7. The Politics, Book I

a. The Purpose of the City

Aristotle begins the Politics by defining its subject, the city or political partnership. Doing so requires him to explain the purpose of the city. (The Greek word for city is polis, which is the word that gives us English words like “politics” and “policy”). Aristotle says that “It is clear that all partnerships aim at some good, and that the partnership that is most authoritative of all and embraces all the others does so particularly, and aims at the most authoritative good of all. This is what is called the city or the political partnership” (1252a3) (See also III.12). In Greece in Aristotle’s time the important political entities were cities, which controlled surrounding territories that were farmed. It is important to remember that the city was not subordinate to a state or nation, the way that cities are today; it was sovereign over the territory that it controlled. To convey this, some translations use the word “city-state” in place of the word “polis.” Although none of us today lives in a polis, we should not be too quick to dismiss Aristotle’s observations on the way of life of the polis as irrelevant to our own political partnerships.

Notice that Aristotle does not define the political community in the way that we generally would, by the laws that it follows or by the group that holds power or as an entity controlling a particular territory. Instead he defines it as a partnership. The citizens of a political community are partners, and as with any other partnership they pursue a common good. In the case of the city it is the most authoritative or highest good. The most authoritative and highest good of all, for Aristotle, is the virtue and happiness of the citizens, and the purpose of the city is to make it possible for the citizens to achieve this virtue and happiness. When discussing the ideal city, he says “[A] city is excellent, at any rate, by its citizens’ – those sharing in the regime – being excellent; and in our case all the citizens share in the regime” (1332a34). In achieving the virtue that is individual excellence, each of them will fulfill his telos. Indeed, it is the shared pursuit of virtue that makes a city a city.

As I have already noted at the beginning of this text, he says in the Ethics at 1099b30: “The end of politics is the best of ends; and the main concern of politics is to engender a certain character in the citizens and to make them good and disposed to perform noble actions.” As has been mentioned, most people today would not see this as the main concern of politics, or even a legitimate concern. Certainly almost everyone wants to see law-abiding citizens, but it is questionable that changing the citizens’ character or making them morally good is part of what government should do. Doing so would require far more governmental control over citizens than most people in Western societies are willing to allow.

Having seen Aristotle’s definition of the city and its purpose, we then get an example of Aristotle’s usual method of discussing political topics. He begins by examining opinions which are “generally accepted,” which means, as he says in the Topics at 100b21, “are accepted by everyone or by the majority or by the philosophers – i.e. by all, or by the majority, or by the most notable and illustrious of them” on the grounds that any such opinions are likely to have at least some truth to them. These opinions (the Greek word is endoxa), however, are not completely true. They must be systematically examined and modified by scholars of politics before the truths that are part of these opinions are revealed. Because Aristotle uses this method of examining the opinions of others to arrive at truth, the reader must be careful to pay attention to whether a particular argument or belief is Aristotle’s or not. In many cases he is setting out an argument in order to challenge it. It can be difficult to tell when Aristotle is arguing in his own voice and when he is considering the opinions of others, but the reader must carefully make this distinction if they are to understand Aristotle’s teachings. (It has also been suggested that Aristotle’s method should be seen as an example of how political discussion ought to be conducted: a variety of viewpoints and arguments are presented, and the final decision is arrived at through a consideration of the strengths and weaknesses of these viewpoints and arguments). For a further discussion of Aristotle’s methodology, see his discussion of reasoning in general and dialectical reasoning in particular in the Topics. Further examples of his approach can be found in Ethics I.4 and VII.1.

In this case, Aristotle takes up the popular opinion that political rule is really the same as other kinds of rule: that of kings over their subjects, of fathers over their wives and children, and of masters over their slaves. This opinion, he says, is mistaken. In fact, each of these kinds of rule is different. To see why, we must consider how the city comes into being, and it is to this that Aristotle next turns in Book I, Chapter 2.

b. How the City Comes Into Being

Here Aristotle tells the story of how cities have historically come into being. The first partnerships among human beings would have been between “persons who cannot exist without one another” (1252a27). There are two pairs of people for whom this is the case. One pair is that of male and female, for the sake of reproduction. This seems reasonable enough to the modern reader. The other pair, however, is that of “the naturally ruling and ruled, on account of preservation” (1252a30). Here Aristotle is referring to slavery. By “preservation” he means that the naturally ruling master and naturally ruled slave need each other if they are to preserve themselves; slavery is a kind of partnership which benefits both master and slave. We will see how later. For now, he simply says that these pairs of people come together and form a household, which exists for the purpose of meeting the needs of daily life (such as food, shelter, clothing, and so forth). The family is only large enough to provide for the bare necessities of life, sustaining its members’ lives and allowing for the reproduction of the species.

Over time, the family expands, and as it does it will come into contact with other families. Eventually a number of such families combine and form a village. Villages are better than families because they are more self-sufficient. Because villages are larger than families, people can specialize in a wider array of tasks and can develop skills in things like cooking, medicine, building, soldiering, and so forth which they could not develop in a smaller group. So the residents of a village will live more comfortable lives, with access to more goods and services, than those who only live in families.

The significant change in human communities, however, comes when a number of villages combine to form a city. A city is not just a big village, but is fundamentally different: “The partnership arising from [the union of] several villages that is complete is the city. It reaches a level of full self-sufficiency, so to speak; and while coming into being for the sake of living, it exists for the sake of living well” (1252b27). Although the founders of cities create them for the sake of more comfortable lives, cities are unique in making it possible for people to live well. Today we tend to think of “living well” as living a life of comfort, family satisfaction, and professional success, surrounded by nice things. But this is not what Aristotle means by “living well”. As we have seen, for Aristotle “living well” means leading a life of happiness and virtue, and by so doing fulfilling one’s telos. Life in the city, in Aristotle’s view, is therefore necessary for anyone who wishes to be completely human. (His particular concern is with the free men who are citizens). “He who is without a city through nature rather than chance is either a mean sort or superior to man,” Aristotle says (1253a3), and adds “One who is incapable of participating or who is in need of nothing through being self-sufficient is no part of a city, and so is either a beast or a god” (1253a27). Humans are not capable of becoming gods, but they are capable of becoming beasts, and in fact the worst kind of beasts: “For just as man is the best of the animals when completed, when separated from law and adjudication he is the worst of all” (1253a30). Outside of the context of life in a properly constructed city, human happiness and well-being is impossible. Even here at the very beginning of the Politics Aristotle is showing the link between ethics and politics and the importance of a well-constructed city in making it possible for the citizens to live well.

There is therefore a sense in which the city “is prior by nature to the household and to each of us” (1253a19). He compares the individual’s relationship with the city to the relationship of a part of the body to the whole body. The destruction of the whole body would also mean the destruction of each of its parts; “if the whole [body] is destroyed there will not be a foot or a hand” (1253a20). And just as a hand is not able to survive without being attached to a functioning body, so too an individual cannot survive without being attached to a city. Presumably Aristotle also means to imply that the reverse is not true; a body can survive the loss of a foot or a hand, although not without consequence. Thus the individual needs the city more than the city needs any of its individual citizens; as Aristotle says in Book 8 before beginning his discussion of the desirable education for the city’s children, “one ought not even consider that a citizen belongs to himself, but rather that all belong to the city; for each individual is a part of the city” (1337a26).

If the history that he has described is correct, Aristotle points out, then the city is natural, and not purely an artificial human construction, since we have established that the first partnerships which make up the family are driven by natural impulses: “Every city, therefore, exists by nature, if such also are the first partnerships. For the city is their end….[T]he city belongs among the things that exist by nature, and…man is by nature a political animal” (1252b30-1253a3). From the very first partnerships of male and female and master and slave, nature has been aiming at the creation of cities, because cities are necessary for human beings to express their capacities and virtues at their best, thus fulfilling their potential and moving towards such perfection as is possible for human beings. While most people today would not agree that nature has a plan for individual human beings, a particular community, or humanity as a whole (although many people would ascribe such a plan to a god or gods), Aristotle believes that nature does indeed have such a plan, and human beings have unique attributes that when properly used make it possible for us to fulfill that plan. What are those attributes?

c. Man, the Political Animal

That man is much more a political animal than any kind of bee or any herd animal is clear. For, as we assert, nature does nothing in vain, and man alone among the animals has speech….[S]peech serves to reveal the advantageous and the harmful and hence also the just and unjust. For it is peculiar to man as compared to the other animals that he alone has a perception of good and bad and just and unjust and other things of this sort; and partnership in these things is what makes a household and a city (1253a8).

Like bees and herd animals, human beings live together in groups. Unlike bees or herd animals, humans have the capacity for speech – or, in the Greek, logos. As we have seen, logos means not only speech but also reason. Here the linkage between speech and reason is clear: the purpose of speech, a purpose assigned to men by nature, is to reveal what is advantageous and harmful, and by doing so to reveal what is good and bad, just and unjust. This knowledge makes it possible for human beings to live together, and at the same time makes it possible for us to pursue justice as part of the virtuous lives we are meant to live. Other animals living in groups, such as bees, goats, and cows, do not have the ability to speak or to reason as Aristotle uses those terms. Of course, they do not need this ability. They are able to live together without determining what is just and unjust or creating laws to enforce justice among themselves. Human beings, for better or worse, cannot do this.

Although nature brings us together – we are by nature political animals – nature alone does not give us all of what we need to live together: “[T]here is in everyone by nature an impulse toward this sort of partnership. And yet the one who first constituted [a city] is responsible for the greatest of goods” (1253a29). We must figure out how to live together for ourselves through the use of reason and speech, discovering justice and creating laws that make it possible for human community to survive and for the individuals in it to live virtuous lives. A group of people that has done this is a city: “[The virtue of] justice is a thing belonging to the city. For adjudication is an arrangement of the political partnership, and adjudication is judgment as to what is just” (1253a38). And in discovering and living according to the right laws, acting with justice and exercising the virtues that allow human society to function, we make possible not only the success of the political community but also the flourishing of our own individual virtue and happiness. Without the city and its justice, human beings are the worst of animals, just as we are the best when we are completed by the right kind of life in the city. And it is the pursuit of virtue rather than the pursuit of wealth or security or safety or military strength that is the most important element of a city: “The political partnership must be regarded, therefore, as being for the sake of noble actions, not for the sake of living together” (1281a1).

d. Slavery

Having described the basic parts of the city, Aristotle returns in Chapter 3 of Book I to a discussion of the household, beginning with the matter of slavery, including the question of whether slavery is just (and hence an acceptable institution) or not. This, for most contemporary readers, is one of the two most offensive portions of Aristotle’s moral and political thought (the other is his treatment of women, about which more will be said below). For most people today, of course, the answer to this question is obvious: slavery is not just, and in fact is one of the greatest injustices and moral crimes that it is possible to commit. (Although it is not widely known, there are still large numbers of people held in slavery throughout the world at the beginning of the 21st century. It is easy to believe that people in the “modern world” have put a great deal of moral distance between themselves and the less enlightened people in the past, but it is also easy to overestimate that distance).

In Aristotle’s time most people – at least the ones that were not themselves slaves – would also have believed that this question had an obvious answer, if they had asked the question at all: of course slavery is just. Virtually every ancient Mediterranean culture had some form of the institution of slavery. Slaves were usually of two kinds: either they had at one point been defeated in war, and the fact that they had been defeated meant that they were inferior and meant to serve, or else they were the children of slaves, in which case their inferiority was clear from their inferior parentage. Aristotle himself says that the sort of war that involves hunting “those human beings who are naturally suited to be ruled but [are] unwilling…[is] by nature just” (1256b25). What is more, the economies of the Greek city-states rested on slavery, and without slaves (and women) to do the productive labor, there could be no leisure for men to engage in more intellectual lifestyles. The greatness of Athenian plays, architecture, sculpture, and philosophy could not have been achieved without the institution of slavery. Therefore, as a practical matter, regardless of the arguments for or against it, slavery was not going to be abolished in the Greek world. Aristotle’s willingness to consider the justice of slavery, however we might see it, was in fact progressive for the time. It is perhaps also worth noting that Aristotle’s will specified that his slaves should be freed upon his death. This is not to excuse Aristotle or those of his time who supported slavery, but it should be kept in mind so as to give Aristotle a fair hearing.

Before considering Aristotle’s ultimate position on the justness of slavery – for whom, and under what circumstances, slavery is appropriate – it must be pointed out that there is a great deal of disagreement about what that position is. That Aristotle believes slavery to be just and good for both master and slave in some circumstances is undeniable. That he believes that some people who are currently enslaved are not being held in slavery according to justice is also undeniable (this would apparently also mean that there are people who should be enslaved but currently are not). How we might tell which people belong in which group, and what Aristotle believes the consequences of his beliefs about slavery ought to be, are more difficult problems.

Remember that in his discussion of the household, Aristotle has said that slavery serves the interest of both the master and the slave. Now he tells us why: “those who are as different [from other men] as the soul from the body or man from beast – and they are in this state if their work is the use of the body, and if this is the best that can come from them – are slaves by nature….For he is a slave by nature who is capable of belonging to another – which is also why he belongs to another – and who participates in reason only to the extent of perceiving it, but does not have it” (1254b16-23). Notice again the importance of logos – reason and speech. Those who are slaves by nature do not have the full ability to reason. (Obviously they are not completely helpless or unable to reason; in the case of slaves captured in war, for example, the slaves were able to sustain their lives into adulthood and organize themselves into military forces. Aristotle also promises a discussion of “why it is better to hold out freedom as a reward for all slaves” (1330a30) which is not in the Politics as we have it, but if slaves were not capable of reasoning well enough to stay alive it would not be a good thing to free them). They are incapable of fully governing their own lives, and require other people to tell them what to do. Such people should be set to labor by the people who have the ability to reason fully and order their own lives. Labor is their proper use; Aristotle refers to slaves as “living tools” at I.4. Slaves get the guidance and instructions that they must have to live, and in return they provide the master with the benefits of their physical labor, not least of which is the free time that makes it possible for the master to engage in politics and philosophy.

One of the themes running through Aristotle’s thought that most people would reject today is the idea that a life of labor is demeaning and degrading, so that those who must work for a living are not able to be as virtuous as those who do not have to do such work. Indeed, Aristotle says that when the master can do so he avoids labor even to the extent of avoiding the oversight of those who must engage in it: “[F]or those to whom it is open not to be bothered with such things [i.e. managing slaves], an overseer assumes this prerogative, while they themselves engage in politics or philosophy” (1255b35).

This would seem to legitimate slavery, and yet there are two significant problems.

First, Aristotle points out that although nature would like us to be able to differentiate between who is meant to be a slave and who is meant to be a master by making the difference in reasoning capacity visible in their outward appearances, it frequently does not do so. We cannot look at people’s souls and distinguish those who are meant to rule from those who are meant to be ruled – and this will also cause problems when Aristotle turns to the question of who has a just claim to rule in the city.

Second, in Chapter Six, Aristotle points out that not everyone currently held in slavery is in fact a slave by nature. The argument that those who are captured in war are inferior in virtue cannot, as far as Aristotle is concerned, be sustained, and the idea that the children of slaves are meant to be slaves is also wrong: “[T]hey claim that from the good should come someone good, just as from a human being comes a human being and a beast from beasts. But while nature wishes to do this, it is often unable to” (1255b3). We are left with the position that while some people are indeed slaves by nature, and slavery is good for them, it is extremely difficult to find out who these people are, and that therefore it is not the case that slavery is automatically just either for people taken in war or for children of slaves, though sometimes it is (1256b23). In saying this, Aristotle was undermining the legitimacy of the two most significant sources of slaves. If Aristotle’s personal life is relevant, while he himself owned slaves, he was said to have freed them upon his death. Whether this makes Aristotle’s position on slavery more acceptable or less so is left to the reader to decide.

In Chapter 8 of Book I Aristotle says that since we have been talking about household possessions such as slaves we might as well continue this discussion. The discussion turns to “expertise in household management.” The Greek word for “household” is oikos, and it is the source of our word “economics.” In Aristotle’s day almost all productive labor took place within the household, unlike today, in modern capitalist societies, when it mostly takes place in factories, offices, and other places specifically developed for such activity.

Aristotle uses the discussion of household management to make a distinction between expertise in managing a household and expertise in business. The former, Aristotle says, is important both for the household and the city; we must have supplies available of the things that are necessary for life, such as food, clothing, and so forth, and because the household is natural so too is the science of household management, the job of which is to maintain the household. The latter, however, is potentially dangerous. This, obviously, is another major difference between Aristotle and contemporary Western societies, which respect and admire business expertise, and encourage many of our citizens to acquire and develop such expertise. For Aristotle, however, expertise in business is not natural, but “arises rather through a certain experience and art” (1257a5). It is on account of expertise in business that “there is held to be no limit to wealth and possessions” (1257a1). This is a problem because some people are led to pursue wealth without limit, and the choice of such a life, while superficially very attractive, does not lead to virtue and real happiness. It leads some people to “proceed on the supposition that they should either preserve or increase without limit their property in money. The cause of this state is that they are serious about living, but not about living well; and since that desire of theirs is without limit, they also desire what is productive of unlimited things” (1257b38).

Aristotle does not entirely condemn wealth – it is necessary for maintaining the household and for providing the opportunity to develop one’s virtue. For example, generosity is one of the virtues listed in the Ethics, but it is impossible to be generous unless one has possessions to give away. But Aristotle strongly believes that we must not lose sight of the fact that wealth is to be pursued for the sake of living a virtuous life, which is what it means to live well, rather than for its own sake. (So at 1258b1 he agrees with those who object to the lending of money for interest, upon which virtually the entire modern global economy is based). Someone who places primary importance on money and the bodily satisfactions that it can buy is not engaged in developing their virtue and has chosen a life which, however it may seem from the outside or to the person living it, is not a life of true happiness.

This is still another difference between Aristotle and contemporary Western societies. For many if not most people in such societies, the pursuit of wealth without limit is seen as not only acceptable but even admirable. At the same time, many people reject the emphasis Aristotle places on the importance of political participation. Many liberal democracies fail to get even half of their potential voters to cast a ballot at election time, and jury duty, especially in the United States, is often looked on as a burden and waste of time, rather than a necessary public service that citizens should willingly perform. In Chapter 11, Aristotle notes that there is a lot more to be said about enterprise in business, but “to spend much time on such things is crude” (1258b35). Aristotle believes that we ought to be more concerned with other matters; moneymaking is beneath the attention of the virtuous man. (In this Aristotle is in agreement with the common opinion of Athenian aristocrats). He concludes this discussion with a story about Thales the philosopher using his knowledge of astronomy to make a great deal of money, “thus showing how easy it is for philosophers to become wealthy if they so wish, but it is not this they are serious about” (1259a16). Their intellectual powers, which could be turned to wealth, are being used in other, better ways to develop their humanity.

In the course of discussing the various ways of life open to human beings, Aristotle notes that “If, then, nature makes nothing that is incomplete or purposeless, nature must necessarily have made all of these [i.e. all plants and animals] for the sake of human beings” (1256b21). Though not a directly political statement, it does emphasize Aristotle’s belief that there are many hierarchies in nature, as well as his belief that those who are lower in the natural hierarchy should be under the command of those who are higher.

e. Women

In Chapter 12, after the discussion of business expertise has been completed, Aristotle returns to the subject of household rule, and takes up the question of the proper forms of rule over women and children. As with the master’s rule over the slave, and humanity’s rule over plants and other animals, Aristotle defines these kinds of rule in terms of natural hierarchies: “[T]he male, unless constituted in some respect contrary to nature, is by nature more expert at leading than the female, and the elder and complete than the younger and incomplete” (1259a41). This means that it is natural for the male to rule: “[T]he relation of male to female is by nature a relation of superior to inferior and ruler to ruled” (1254b12). And just as with the rule of the master over the slave, the difference here is one of reason: “The slave is wholly lacking the deliberative element; the female has it but it lacks authority; the child has it but it is incomplete” (1260a11).

There is a great deal of scholarly debate about what the phrase “lacks authority” means in this context. Aristotle does not elaborate on it. Some have suggested that it means not that women’s reason is inferior to that of men but that women lack the ability to make men do what they want, either because of some innate psychological characteristic (they are not aggressive and/or assertive enough) or because of the prevailing culture in Greece at the time. Others suggest that it means that women’s emotions are ultimately more influential in determining their behavior than reason is so that reason lacks authority over what a woman does. This question cannot be settled here. I will simply point out the vicious circle in which women were trapped in ancient Greece (and still are in many cultures). The Greeks believed that women are inferior to men (or at least those Greeks who wrote philosophy, plays, speeches, and so forth did. These people, of course, were all men. What Greek women thought of this belief is impossible to say). This belief means that women are denied access to certain areas of life (such as politics). Denying them access to these spheres means that they fail to develop the knowledge and skills to become proficient in them. This lack of knowledge and skills then becomes evidence to reinforce the original belief that they are inferior.

What else does Aristotle have to say about the rule of men over women? He says that the rule of the male over the female and that of the father over children are different in form from the rule of masters over slaves. Aristotle places the rule of male over female in the household in the context of the husband over the wife (female children who had not yet been married would have been ruled by their father; marriage for girls in Athens typically took place at the age of thirteen or fourteen). Aristotle says at 1259a40 that the wife is to be ruled in political fashion. We have not yet seen what political rule looks like, but here Aristotle notes several of its important features, one of which is that it usually involves “alternation in ruling and being ruled” (1259b2), and another is that it involves rule among those who “tend by their nature to be on an equal footing and to differ in nothing” (1259b5). In this case, however, the husband does not alternate rule with the wife but instead always rules. Apparently the husband is to treat his wife as an equal to the degree that it is possible to do so, but must retain ultimate control over household decisions.

Women have their own role in the household, preserving what the man acquires. However, women do not participate in politics, since their reason lacks the authority that would allow them to do so. In order to properly fulfill her role in the household the wife must pursue her own telos. This is not the same as that of a man, but as with a man nature intends her to achieve virtues of the kind that are available to her: “It is thus evident that…the moderation of a woman and a man is not the same, nor their courage or justice…but that there is a ruling and a serving courage, and similarly with the other virtues” (1260a19). Unfortunately Aristotle has very little to say about what women’s virtues look like, how they are to be achieved, or how women should be educated. But it is clear that Aristotle believes that as with the master’s superiority to the slave, the man’s superiority to a woman is dictated by nature and cannot be overcome by human laws, customs, or beliefs.

Aristotle concludes the discussion of household rule, and the first book of the Politics, by stating that the discussion here is not complete and “must necessarily be addressed in the [discourses] connected with the regimes” (1260b11). This is the case because both women and children “must necessarily be educated looking to the regime, at least if it makes any difference with a view to the city’s being excellent that both its children and its women are excellent. But it necessarily makes a difference…” (1260b14). “Regime” is one of the ways to translate the Greek word politeia, which is also often translated as “constitution” or “political system.” Although there is some controversy about how best to translate this word, I will use the word “regime” throughout this article. The reader should keep in mind that if the word “constitution” is used this does not mean a written constitution of the sort that most contemporary nation-states employ. Instead, Aristotle uses politeia (however it is translated) to mean the way the state is organized, what offices there are, who is eligible to hold them, how they are selected, and so forth. All of these things depend on the group that holds political power in the city. For example, sometimes power is held by one man who rules in the interest of the city as a whole; this is the kind of regime called monarchy. If power is held by the wealthy who rule for their own benefit, then the regime is an oligarchy.

We will have much more to say later on the topic of regimes. Here Aristotle is introducing another important idea which he will develop later: the idea that the people living under a regime, including the women and children, must be taught to believe in the principles that underlie that regime. (In Book II, Chapter 9, Aristotle severely criticizes the Spartan regime for its failure to properly educate the Spartan women and shows the negative consequences this has had for the Spartan regime). For a monarchy to last, for example, the people must believe in the rightness of monarchical rule and the principles which justify it. Therefore it is important for the monarch to teach the people these principles and beliefs. In Books IV-VI Aristotle develops in much more detail what the principles of the different regimes are, and the Politics concludes with a discussion of the kind of education that the best regime ought to provide its citizens.

8. The Politics, Book II

“Cities…that are held to be in a fine condition”

In Book II, Aristotle changes his focus from the household to the consideration of regimes that are “in use in some of the cities that are said to be well managed and any others spoken about by certain persons that are held to be in a fine condition” (1260b30). This examination of existing cities must be done both in order to find out what those cities do properly, so that their successes can be imitated, and to find out what they do improperly so that we can learn from their mistakes. This study and the use of the knowledge it brings remains one of the important tasks of political science. Merely imitating an existing regime, no matter how excellent its reputation, is not sufficient. This is the case “because those regimes now available are in fact not in a fine condition” (1260b34). In order to create a better regime we must study the imperfect ones found in the real world. Aristotle will do this again on a more theoretical level in Books IV-VI. We should also examine the ideal regimes proposed by other thinkers. As it turns out, however fine these regimes are in theory, they cannot be put into practice, and this is obviously reason enough not to adopt them. Nevertheless, the ideas of other thinkers can assist us in our search for knowledge. Keep in mind that the practical sciences are not about knowledge for its own sake: unless we put this knowledge to use in order to improve the citizens and the city, the study engaged in by political science is pointless. We will not consider all the details of the different regimes Aristotle describes, but some of them are important enough to examine here.

a. What Kind of Partnership Is a City?

Aristotle begins his exploration of these regimes with the question of the degree to which the citizens in a regime should be partners. Recall that he opened the Politics with the statement that the city is a partnership, and in fact the most authoritative partnership. The citizens of a particular city clearly share something, because it is sharing that makes a partnership. Consider some examples of partnerships: business partners share a desire for wealth; philosophers share a desire for knowledge; drinking companions share a desire for entertainment; the members of a hockey team share a desire to win their game.

So what is it that citizens share? This is an important question for Aristotle, and he chooses to answer this question in the context of Socrates’ imagined community in Plato’s dialogue The Republic. Aristotle has already said that the regime is a partnership in adjudication and justice. But is it enough that the people of a city have a shared understanding of what justice means and what the laws require, or is the political community a partnership in more than these things? Today the answer would probably be that these things are sufficient – a group of people sharing territory and laws is not far from how most people would define the modern state. In the Republic, Socrates argues that the city should be unified to the greatest degree possible. The citizens, or at least those in the ruling class, ought to share everything, including property, women, and children. There should be no private families and no private property. But this, according to Aristotle, is too much sharing. While the city is clearly a kind of unity, it is a unity that must derive from a multitude. Human beings are unavoidably different, and this difference, as we saw earlier, is the reason cities were formed in the first place, because difference within the city allows for specialization and greater self-sufficiency. Cities are preserved not by complete unity and similarity but by “reciprocal equality,” and this principle is especially important in cities where “persons are free and equal.” In such cities “all cannot rule at the same time, but each rules for a year or according to some other arrangement or period of time. In this way, then, it results that all rule…” (1261a30). This topic, the alternation of rule in cities where the citizens are free and equal, is an important part of Aristotle’s thought, and we will return to it later.

There would be another drawback to creating a city in which everything is held in common. Aristotle notes that people value and care for what is their own: “What belongs in common to the most people is accorded the least care: they take thought for their own things above all, and less about things common, or only so much as falls to each individually” (1261b32). (Contemporary social scientists call this a problem of “collective goods”). Therefore to hold women and property in common, as Socrates proposes, would be a mistake. It would weaken attachments to other people and to the common property of the city, and this would lead to each individual assuming that someone else would care for the children and property, with the end result being that no one would. For a modern example, many people who would not throw trash on their own front yard or damage their own furniture will litter in a public park and destroy the furniture in a rented apartment or dorm room. Some in Aristotle’s time (and since) have suggested that holding property in common will lead to an end to conflict in the city. This may at first seem wise, since the unequal distribution of property in a political community is, Aristotle believes, one of the causes of injustice in the city and ultimately of civil war. But in fact it is not the lack of common property that leads to conflict; instead, Aristotle blames human depravity (1263b20). And in order to deal with human depravity, what is needed is to moderate human desires, which can be done among those “adequately educated by the laws” (1266b31). Inequality of property leads to problems because the common people desire wealth without limit (1267b3); if this desire can be moderated, so too can the problems that arise from it. Aristotle also includes here the claim that the citizens making up the elite engage in conflict because of inequality of honors (1266b38). In other words, they engage in conflict with the other citizens because of their desire for an unequal share of honor, which leads them to treat the many with condescension and arrogance. Holding property in common, Aristotle notes, will not remove the desire for honor as a source of conflict.

b. Existing Cities: Sparta, Crete, Carthage

In Chapters 9-11 of Book II, Aristotle considers existing cities that are held to be excellent: Sparta in Chapter 9, Crete in Chapter 10, and Carthage (which, notably, was not a Greek city) in Chapter 11. It is noteworthy that when Athens is considered following this discussion (in Chapter 12), Aristotle takes a critical view and seems to suggest that the city has declined since the time of Solon. Aristotle does not anywhere in his writings suggest that Athens is the ideal city or even the best existing city. It is easy to assume the opposite, and many have done so, but there is no basis for this assumption. We will not examine the particulars of Aristotle’s view of each of these cities. However, two important points should be noted here. The first is a general point about existing regimes: in judging whether a particular piece of legislation is good or not, we must compare it not only to the best possible set of arrangements but also to the set of arrangements that actually prevails in the city. If a law does not fit well with the principles of the regime, although it may be an excellent law in the abstract, the people will not believe in it or support it and as a result it will be ineffective or actually harmful (1269a31). The second is that Aristotle is critical of the Spartans because of their belief that the most important virtue to develop and the one that the city must teach its citizens is the kind of virtue that allows them to make war successfully. But war is not itself an end or a good thing; war is for the sake of peace, and the inability of the Spartans to live virtuously in times of peace has led to their downfall. (See also Book VII, Chapter 2, where Aristotle notes the hypocrisy of a city whose citizens seek justice among themselves but “care nothing about justice towards others” (1324b35) and Book VII, Chapter 15).

9. The Politics, Book III

a. Who Is the Citizen?

In Book III, Aristotle takes a different approach to understanding the city. Again he takes up the question of what the city actually is, but here his method is to understand the parts that make up the city: the citizens. “Thus who ought to be called a citizen and what the citizen is must be investigated” (1274b41). For Americans today this is a legal question: anyone born in the United States or born to American citizens abroad is automatically a citizen. Other people can become citizens by following the correct legal procedures for doing so. However, this rule is not acceptable for Aristotle, since slaves are born in the same cities as free men but that does not make them citizens. For Aristotle, there is more to citizenship than living in a particular place or sharing in economic activity or being ruled under the same laws. Instead, citizenship for Aristotle is a kind of activity: “The citizen in an unqualified sense is defined by no other thing so much as by sharing in decision and office” (1275a22). Later he says that “Whoever is entitled to participate in an office involving deliberation or decision is, we can now say, a citizen in this city; and the city is the multitude of such persons that is adequate with a view to a self-sufficient life, to speak simply” (1275b17). And this citizen is a citizen “above all in a democracy; he may, but will not necessarily, be a citizen in the others” (1275b4). We have yet to talk about what a democracy is, but when we do, this point will be important to defining it properly. When Aristotle talks about participation, he means that each citizen should participate directly in the assembly – not by voting for representatives – and should willingly serve on juries to help uphold the laws. Note again the contrast with modern Western nation-states where there are very few opportunities to participate directly in politics and most people struggle to avoid serving on juries.

Participation in deliberation and decision making means that the citizen is part of a group that discusses the advantageous and the harmful, the good and bad, and the just and unjust, and then passes laws and reaches judicial decisions based on this deliberative process. This process requires that each citizen consider the various possible courses of action on their merits and discuss these options with his fellow citizens. By doing so the citizen is engaging in reason and speech and is therefore fulfilling his telos, engaged in the process that enables him to achieve the virtuous and happy life. In regimes where the citizens are similar and equal by nature – which in practice is all of them – all citizens should be allowed to participate in politics, though not all at once. They must take turns, ruling and being ruled in turn. Note that this means that citizenship is not just a set of privileges, it is also a set of duties. The citizen has certain freedoms that non-citizens do not have, but he also has obligations (political participation and military service) that they do not have. We will see shortly why Aristotle believed that the cities existing at the time did not in fact follow this principle of ruling and being ruled in turn.

b. The Good Citizen and the Good Man

Before looking more closely at democracy and the other kinds of regimes, there are still several important questions to be discussed in Book III. One of the most important of these from Aristotle’s point of view is in Chapter 4. Here he asks the question of “whether the virtue of the good man and the excellent citizen is to be regarded as the same or as not the same” (1276b15). This is a question that seems strange, or at least irrelevant, to most people today. The good citizen today is asked to follow the laws, pay taxes, and possibly serve on juries; these are all good things the good man (or woman) would do, so that the good citizen is seen as being more or less subsumed into the category of the good person. For Aristotle, however, this is not the case. We have already seen Aristotle’s definition of the good man: the one who pursues his telos, living a life in accordance with virtue and finding happiness by doing so. What is Aristotle’s definition of the good citizen?

Aristotle has already told us that if the regime is going to endure it must educate all the citizens in such a way that they support the kind of regime that it is and the principles that legitimate it. Because there are several different types of regime (six, to be specific, which will be considered in more detail shortly), there are several different types of good citizen. Good citizens must have the type of virtue that preserves the partnership and the regime: “[A]lthough citizens are dissimilar, preservation of the partnership is their task, and the regime is [this] partnership; hence the virtue of the citizen must necessarily be with a view to the regime. If, then, there are indeed several forms of regime, it is clear that it is not possible for the virtue of the excellent citizen to be single, or complete virtue” (1276b27).

There is only one situation in which the virtue of the good citizen and excellent man are the same, and this is when the citizens are living in a city that is under the ideal regime: “In the case of the best regime, [the citizen] is one who is capable of and intentionally chooses being ruled and ruling with a view to the life in accordance with virtue” (1284a1). Aristotle does not fully describe this regime until Book VII. For those of us not living in the ideal regime, the ideal citizen is one who follows the laws and supports the principles of the regime, whatever that regime is. That this may well require us to act differently than the good man would act and to believe things that the good man knows to be false is one of the unfortunate tragedies of political life.

There is another element to determining who the good citizen is, and it is one that we today would not support. For Aristotle, remember, politics is about developing the virtue of the citizens and making it possible for them to live a life of virtue. We have already seen that women and slaves are not capable of living this kind of life, although each of these groups has its own kind of virtue to pursue. But there is another group that is incapable of citizenship leading to virtue, and Aristotle calls this group “the vulgar”. These are the people who must work for a living. Such people lack the leisure time necessary for political participation and the study of philosophy: “it is impossible to pursue the things of virtue when one lives the life of a vulgar person or a laborer” (1278a20). They are necessary for the city to exist – someone must build the houses, make the shoes, and so forth – but in the ideal city they would play no part in political life because their necessary tasks prevent them from developing their minds and taking an active part in ruling the city. Their existence, like those of the slaves and the women, is for the benefit of the free male citizens. Aristotle makes this point several times in the Politics: see, for example, VII.9 and VIII.2 for discussions of the importance of avoiding the lifestyle of the vulgar if one wants to achieve virtue, and I.13 and III.4, where those who work with their hands are labeled as kinds of slaves.

The citizens, therefore, are those men who are “similar in stock and free” (1277b8), and rule over such men by those who are their equals is political rule, which is different from the rule of masters over slaves, men over women, and parents over children. This is one of Aristotle’s most important points: “[W]hen [the regime] is established in accordance with equality and similarity among the citizens, [the citizens] claim to merit ruling in turn” (1279a8). Throughout the remainder of the Politics he returns to this point to remind us of the distinction between a good regime and a bad regime. The correct regime of polity, highlighted in Book IV, is under political rule, while deviant regimes are those which are ruled as though a master were ruling over slaves. But this is wrong: “For in the case of persons similar by nature, justice and merit must necessarily be the same according to nature; and so if it is harmful for their bodies if unequal persons have equal sustenance and clothing, it is so also [for their souls if they are equal] in what pertains to honors, and similarly therefore if equal persons have what is unequal” (1287a12).

c. Who Should Rule?

This brings us to perhaps the most contentious of political questions: how should the regime be organized? Another way of putting this is: who should rule? In Books IV-VI Aristotle explores this question by looking at the kinds of regimes that actually existed in the Greek world and answering the question of who actually does rule. By closely examining regimes that actually exist, we can draw conclusions about the merits and drawbacks of each. Like political scientists today, he studied the particular political phenomena of his time in order to draw larger conclusions about how regimes and political institutions work and how they should work. As has been mentioned above, in order to do this, he sent his students throughout Greece to collect information on the regimes and histories of the Greek cities, and he uses this information throughout the Politics to provide examples that support his arguments. (According to Diogenes Laertius, histories and descriptions of the regimes of 158 cities were written, but only one of these has come down to the present: the Constitution of Athens mentioned above).

Another way he used this data was to create a typology of regimes that was so successful that it ended up being used until the time of Machiavelli nearly 2000 years later. He used two criteria to sort the regimes into six categories.

The first criterion that is used to distinguish among different kinds of regimes is the number of those ruling: one man, a few men, or the many. The second is perhaps a little more unexpected: do those in power, however many they are, rule only in their own interest or do they rule in the interest of all the citizens? “[T]hose regimes which look to the common advantage are correct regimes according to what is unqualifiedly just, while those which look only to the advantage of the rulers are errant, and are all deviations from the correct regimes; for they involve mastery, but the city is a partnership of free persons” (1279a16).

Having established these as the relevant criteria, in Book III Chapter 7 Aristotle sets out the six kinds of regimes. The correct regimes are monarchy (rule by one man for the common good), aristocracy (rule by a few for the common good), and polity (rule by the many for the common good); the flawed or deviant regimes are tyranny (rule by one man in his own interest), oligarchy (rule by the few in their own interest), and democracy (rule by the many in their own interest). Aristotle later ranks them in order of goodness, with monarchy the best, aristocracy the next best, then polity, democracy, oligarchy, and tyranny (1289a38). People in Western societies are used to thinking of democracy as a good form of government – maybe the only good form of government – but Aristotle considers it one of the flawed regimes (although it is the least bad of the three) and you should keep that in mind in his discussion of it. You should also keep in mind that by the “common good” Aristotle means the common good of the citizens, and not necessarily all the residents of the city. The women, slaves, and manual laborers are in the city for the good of the citizens.

Almost immediately after this typology is created, Aristotle clarifies it: the real distinction between oligarchy and democracy is in fact the distinction between whether the wealthy or the poor rule (1279b39), not whether the many or the few rule. Since it is always the case that the poor are many while the wealthy are few, it looks like it is the number of the rulers rather than their wealth which distinguishes the two kinds of regimes (he elaborates on this in IV.4). All cities have these two groups, the many poor and the few wealthy, and Aristotle was well aware that it was the conflict between these two groups that caused political instability in the cities, even leading to civil wars (Thucydides describes this in his History of the Peloponnesian War, and the Constitution of Athens also discusses the consequences of this conflict). Aristotle therefore spends a great deal of time discussing these two regimes and the problem of political instability, and we will focus on this problem as well.

First, however, let us briefly consider with Aristotle one other valid claim to rule. Those who are most virtuous have, Aristotle says, the strongest claim of all to rule. If the city exists for the sake of developing virtue in the citizens, then those who have the most virtue are the most fit to rule; they will rule best, and on behalf of all the citizens, establishing laws that lead others to virtue. However, if one man or a few men of exceptional virtue exist in the regime, we will be outside of politics: “If there is one person so outstanding by his excess of virtue – or a number of persons, though not enough to provide a full complement for the city – that the virtue of all the others and their political capacity is not commensurable…such persons can no longer be regarded as part of the city” (1284a4). It would be wrong for the other people in the city to claim the right to rule over them or share rule with them, just as it would be wrong for people to claim the right to share power with Zeus. The proper thing would be to obey them (1284b28). But this situation is extremely unlikely (1287b40). Instead, cities will be made up of people who are similar and equal, which leads to problems of its own.

The most pervasive of these is that oligarchs and democrats each advance a claim to political power based on justice. For Aristotle, justice dictates that equal people should get equal things, and unequal people should get unequal things. If, for example, two students turn in essays of identical quality, they should each get the same grade. Their work is equal, and so the reward should be too. If they turn in essays of different quality, they should get different grades which reflect the differences in their work. But the standards used for grading papers are reasonably straightforward, and the consequences of this judgment are not that important, relatively speaking – they certainly are not worth fighting and dying for. But the stakes are raised when we ask how we should judge the question of who should rule, for the standards here are not straightforward and disagreement over the answer to this question frequently does lead men (and women) to fight and die.

What does justice require when political power is being distributed? Aristotle says that both groups – the oligarchs and democrats – offer judgments about this, but neither of them gets it right, because “the judgment concerns themselves, and most people are bad judges concerning their own things” (1280a14). (This was the political problem that was of most concern to the authors of the United States Constitution: given that people are self-interested and ambitious, who can be trusted with power? Their answer differs from Aristotle’s, but it is worth pointing out the persistence of the problem and the difficulty of solving it). The oligarchs assert that their greater wealth entitles them to greater power, which means that they alone should rule, while the democrats say that the fact that all are equally free entitles each citizen to an equal share of political power (which, because most people are poor, means that in effect the poor rule). If the oligarchs’ claim seems ridiculous, you should keep in mind that the American colonies had property qualifications for voting; those who could not prove a certain level of wealth were not allowed to vote. And poll taxes, which required people to pay a tax in order to vote and therefore kept many poor citizens (including almost all African-Americans) from voting, were not eliminated in the United States until the mid-20th century. At any rate, each of these claims to rule, Aristotle says, is partially correct but partially wrong. We will consider the nature of democracy and oligarchy shortly.

In Book III Aristotle also argues for a principle that has become one of the bedrock principles of liberal democracy: we ought, to the extent possible, to allow the law to rule. “One who asks the law to rule, therefore, is held to be asking god and intellect alone to rule, while one who asks man adds the beast. Desire is a thing of this sort; and spiritedness perverts rulers and the best men. Hence law is intellect without appetite” (1287a28). This is not to say that the law is unbiased. It will reflect the bias of the regime, as it must, because the law reinforces the principles of the regime and helps educate the citizens in those principles so that they will support the regime. But in any particular case, the law, having been established in advance, is impartial, whereas a human judge will find it hard to resist judging in his own interest, according to his own desires and appetites, which can easily lead to injustice. Also, if this kind of power is left in the hands of men rather than with the laws, there will be a desperate struggle to control these offices and their benefits, and this will be another cause of civil war. So whatever regime is in power should, to the extent possible, allow the laws to rule. Ruling in accordance with one’s wishes at any particular time is one of the hallmarks of tyranny (it is the same way masters rule over slaves), and it is also, Aristotle says, typical of a certain kind of democracy, which rules by decree rather than according to settled laws. In these cases we are no longer dealing with politics at all, “For where the laws do not rule there is no regime” (1292b30). There are masters and slaves, but there are no citizens.

10. The Politics, Book IV

a. Polity: The Best Practical Regime

In Book IV Aristotle continues to think about existing regimes and their limitations, focusing on the question: what is the best possible regime? This is another aspect of political science that is still practiced today, as Aristotle combines a theory about how regimes ought to be with his analysis of how regimes really are in practice in order to prescribe changes to those regimes that will bring them more closely in line with the ideal. It is in Book VII that Aristotle describes the regime that would be absolutely the best, if we could have everything the way we wanted it; here he is considering the best regime that we can create given the kinds of human beings and circumstances that cities today find themselves forced to deal with, “For one should study not only the best regime but also the regime that is [the best] possible, and similarly also the regime that is easier and more attainable for all” (1288b37).

Aristotle also provides advice for those that want to preserve any of the existing kinds of regime, even the defective ones, showing a kind of hard-headed realism that is often overlooked in his writings. In order to do this, he provides a higher level of detail about the varieties of the different regimes than he has previously given us. There are a number of different varieties of democracy and oligarchy because cities are made up of a number of different groups of people, and the regime will be different depending on which of these groups happens to be most authoritative. For example, a democracy that is based on the farming element will be different than a democracy that is based on the element that is engaged in commerce, and similarly there are different kinds of oligarchies. We do not need to consider these in detail except to note that Aristotle holds to his position that in either a democracy or an oligarchy it is best if the law rules rather than the people possessing power. In the case of democracy it is best if the farmers rule, because farmers will not have the time to attend the assembly, so they will stay away and will let the laws rule (VI.4).

It is, however, important to consider polity in some detail, and this is the kind of regime to which Aristotle next turns his attention. “Simply speaking, polity is a mixture of oligarchy and democracy” (1293b32). Remember that polity is one of the correct regimes, and it occurs when the many rule in the interest of the political community as a whole. The problem with democracy as the rule of the many is that in a democracy the many rule in their own interest; they exploit the wealthy and deny them political power. But a democracy in which the interests of the wealthy were taken into account and protected by the laws would be ruling in the interest of the community as a whole, and it is this that Aristotle believes is the best practical regime. The ideal regime to be described in Book VII is the regime that we would pray for if the gods would grant us our wishes and we could create a city from scratch, having everything exactly the way we would want it. But when we are dealing with cities that already exist, their circumstances limit what kind of regime we can reasonably expect to create. Creating a polity is a difficult thing to do, and although he provides many examples of democracies and oligarchies Aristotle does not give any examples of existing polities or of polities that have existed in the past.

One of the important elements of creating a polity is to combine the institutions of a democracy with those of an oligarchy. For example, in a democracy, citizens are paid to serve on juries, while in an oligarchy, rich people are fined if they do not. In a polity, both of these approaches are used, with the poor being paid to serve and the rich fined for not serving. In this way, both groups will serve on juries and power will be shared. There are several ways to mix oligarchy and democracy, but “The defining principle of a good mixture of democracy and oligarchy is that it should be possible for the same polity to be spoken of as either a democracy or an oligarchy” (1294b14). The regime must be said to be both – and neither – a democracy and an oligarchy, and it will be preserved “because none of the parts of the city generally would wish to have another regime” (1294b38).

b. The Importance of the Middle Class

In addition to combining elements from the institutions of democracy and oligarchy, the person wishing to create a lasting polity must pay attention to the economic situation in the city. In Book II of the Ethics Aristotle famously establishes the principle that virtue is a mean between two extremes. For example, a soldier who flees before a battle is guilty of the vice of cowardice, while one who charges the enemy singlehandedly, breaking ranks and getting himself killed for no reason, is guilty of the vice of foolhardiness. The soldier who practices the virtue of courage is the one who faces the enemy, moves forward with the rest of the troops in good order, and fights bravely. Courage, then, is a mean between the extremes of cowardice and foolhardiness. The person who has it neither flees from the enemy nor engages in a suicidal and pointless attack but faces the enemy bravely and attacks in the right way.

Aristotle draws a parallel between virtue in individuals and virtue in cities. The city, he says, has three parts: the rich, the poor, and the middle class. Today we would probably believe that it is the rich people who are the most fortunate of those three groups, but this is not Aristotle’s position. He says: “[I]t is evident that in the case of the goods of fortune as well a middling possession is the best of all. For [a man of moderate wealth] is readiest to obey reason, while for one who is [very wealthy or very poor] it is difficult to follow reason. The former sort tend to become arrogant and base on a grand scale, the latter malicious and base in petty ways; and acts of injustice are committed either through arrogance or through malice” (1295b4). A political community that has extremes of wealth and poverty “is a city not of free persons but of slaves and masters, the ones consumed by envy, the others by contempt. Nothing is further removed from affection and from a political partnership” (1295b22). People in the middle class are free from the arrogance that characterizes the rich and the envy that characterizes the poor. And, since members of this class are similar and equal in wealth, they are likely to regard one another as similar and equal generally, and to be willing to rule and be ruled in turn, neither demanding to rule at all times as the wealthy do nor trying to avoid ruling as the poor do from their lack of resources. “Thus it is the greatest good fortune for those who are engaged in politics to have a middling and sufficient property, because where some possess very many things and others nothing, either [rule of] the people in its extreme form must come into being, or unmixed oligarchy, or – as a result of both of these excesses – tyranny. For tyranny arises from the most headstrong sort of democracy and from oligarchy, but much less often from the middling sorts [of regime] and those close to them” (1295b39).

There can be an enduring polity only when the middle class is able to rule either on its own or in conjunction with either of the other two groups, for in this way it can moderate their excesses: “Where the multitude of middling persons predominates either over both of the extremities together or over one alone, there a lasting polity is capable of existing” (1296b38). Unfortunately, Aristotle says, this state of affairs almost never exists. Instead, whichever group, rich or poor, is able to achieve power conducts affairs to suit itself rather than considering the interests of the other group: “whichever of the two succeeds in dominating its opponents does not establish a regime that is common or equal, but they grasp for preeminence in the regime as the prize of victory” (1296a29). And as a result, neither group seeks equality but instead each tries to dominate the other, believing that it is the only way to avoid being dominated in turn. This is a recipe for instability, conflict, and ultimately civil war, rather than a lasting regime. For the polity (or any other regime) to last, “the part of the city that wants the regime to continue must be superior to the part not wanting this” in quality and quantity (1296b16). He repeats this in Book V, calling it the “great principle”: “keep watch to ensure that the multitude wanting the regime is superior to that not wanting it” (1309b16), and in Book VI he discusses how this can be arranged procedurally (VI.3).

The remainder of Book IV focuses on the kinds of authority and offices in the city and how these can be distributed in democratic or oligarchic fashion. We do not need to concern ourselves with these details, but it does show that Aristotle is concerned with particular kinds of flawed regimes and how they can best operate and function in addition to his interest in the best practical government and the best government generally.

11. The Politics, Book V

a. Conflict between the Rich and the Poor

In Book V Aristotle turns his attention to how regimes can be preserved and how they are destroyed. Since we have seen what kind of regime a polity is, and how it can be made to endure, we are already in a position to see what is wrong with regimes which do not adopt the principles of a polity. We have already seen the claims of the few rich and the many poor to rule. The former believe that because they are greater in material wealth they should also be greater in political power, while the latter claim that because all citizens are equally free political power should also be equally distributed, which allows the many poor to rule because of their superior numbers. Both groups are partially correct, but neither is entirely correct, “And it is for this reason that, when either [group] does not share in the regime on the basis of the conception it happens to have, they engage in factional conflict” which can lead to civil war (1301a37). While the virtuous also have a claim to rule, the very fact that they are virtuous leads them to avoid factional conflict. They are also too small a group to be politically consequential: “[T]hose who are outstanding in virtue do not engage in factional conflict to speak of; for they are few against many” (1304b4). Therefore, the conflict that matters is the one between the rich and poor, and as we have seen, whichever group gets the upper hand will arrange things for its own benefit and in order to harm the other group. The fact that each of these groups ignores the common good and seeks only its own interest is why both oligarchy and democracy are flawed regimes. It is also ultimately self-destructive to try to put either kind of regime into practice: “Yet to have everywhere an arrangement that is based simply on one or the other of these sorts of equality is a poor thing. This is evident from the result: none of these sorts of regimes is lasting” (1302a3). 
On the other hand, “[O]ne should not consider as characteristic of popular rule or of oligarchy something that will make the city democratically or oligarchically run to the greatest extent possible, but something that will do so for the longest period of time” (1320a1). Democracy tends to be more stable than oligarchy, because democracies only have a conflict between rich and poor, while oligarchies also have conflicts within the ruling group of oligarchs to hold power. In addition, democracy is closer to polity than oligarchy is, and this contributes to its greater stability. And this is an important goal; the more moderate a regime is, the longer it is likely to remain in place.

Why does factional conflict arise? Aristotle turns to this question in Chapter 2. He says: “The lesser engage in factional conflict in order to be equal; those who are equal, in order to be greater” (1302a29). What are the things in which the lesser seek to be equal and the equal to be greater? “As for the things over which they engage in factional conflict, these are profit and honor and their opposites….They are stirred up further by arrogance, by fear, by preeminence, by contempt, by disproportionate growth, by electioneering, by underestimation, by [neglect of] small things, and by dissimilarity” (1302a33). Aristotle describes each of these in more detail. We will not examine them closely, but it is worth observing that Aristotle regards campaigning for office as a potentially dangerous source of conflict. If the city is arranged in such a way that either of the major factions feels that it is being wronged by the other, there are many things that can trigger conflict and even civil war; the regime is inherently unstable. We see again the importance of maintaining a regime which all of the groups in the city wish to see continue.

Aristotle says of democracies that “[D]emocracies undergo revolution particularly on account of the wanton behavior of the popular leaders” (1304b20). Such leaders will harass the property owners, causing them to unify against the democracy, and they will also stir up the poor against the rich in order to maintain themselves in power. This leads to conflict between the two groups and civil war. Aristotle cites a number of historical examples of this. Oligarchies undergo revolution primarily “when they treat the multitude unjustly. Any leader is then adequate [to effect revolution]” (1305a29). Revolution in oligarchical regimes can also come about from competition within the oligarchy, when not all of the oligarchs have a share in the offices. In this case those without power will engage in revolution not to change the regime but to change those who are ruling.

b. How to Preserve Regimes

However, despite all the dangers to the regimes, and the unavoidable risk that any particular regime will be overthrown, Aristotle does have advice regarding the preservation of regimes. In part, of course, we learn how to preserve the regimes by learning what causes revolutions and then avoiding those causes, so Aristotle has already given us useful advice for the preservation of regimes. But he has more advice to offer: “In well-blended regimes, then, one should watch out to ensure there are no transgressions of the laws, and above all be on guard against small ones” (1307b29). Note, again, the importance of letting the laws rule.

It is also important in every regime “to have the laws and management of the rest arranged in such a way that it is impossible to profit from the offices….The many do not chafe as much at being kept away from ruling – they are even glad if someone leaves them the leisure for their private affairs – as they do when they suppose that their rulers are stealing common [funds]; then it pains them both not to share in the prerogatives and not to share in the profits” (1308b32).

And, again, it is beneficial if the group that does not have political power is allowed to share in it to the greatest extent possible, though it should not be allowed to hold the authoritative offices (such as general, treasurer, and so forth). Such men must be chosen extremely carefully: “Those who are going to rule in the authoritative offices ought to have three things: first, affection for the established regime; next, a very great capacity for the work involved in rule; third, virtue and justice – in each regime the sort that is relative to the regime…” (1309a33). It is difficult to find all three of these in many men, but it is important for the regime to make use of the men with these qualities to the greatest degree possible, or else the regime will be harmed, either by sedition, incompetence, or corruption. Aristotle also reminds us of the importance of the middling element for maintaining the regime and making it long-lasting; instead of hostility between the oligarchs and democrats, whichever group has power should be certain always to behave benevolently and justly to the other group (1309b18).

“But the greatest of all the things that have been mentioned with a view to making regimes lasting – though it is now slighted by all – is education relative to the regimes. For there is no benefit in the most beneficial laws, even when these have been approved by all those engaging in politics, if they are not going to be habituated and educated in the regime – if the laws are popular, in a popular spirit, if oligarchic, in an oligarchic spirit” (1310a13). This does not mean that the people living in a democracy should be educated to believe that oligarchs are enemies of the regime, to be oppressed as much as possible and treated unjustly, nor does it mean that the wealthy under an oligarchy should be educated to believe that the poor are to be treated with arrogance and contempt. Instead it means being educated in the principles of moderate democracy and moderate oligarchy, so that the regime will be long-lasting and avoid revolution.

In the remainder of Book V Aristotle discusses monarchy and tyranny and what preserves and destroys these types of regimes. Here Aristotle is not discussing the kind of monarchies with which most people today are familiar, involving hereditary descent of royal power, usually from father to son. A monarch in Aristotle’s sense is one who rules because he is superior to all other citizens in virtue. Monarchy therefore involves individual rule on the basis of merit for the good of the whole city, and the monarch because of his virtue is uniquely well qualified to determine what that means. The tyrant, on the other hand, rules solely for his own benefit and pleasure. Monarchy, therefore, involving the rule of the best man over all, is the best kind of regime, while tyranny, which is essentially the rule of a master over a regime in which all are slaves, is the worst kind of regime, and in fact is really no kind of regime at all. Aristotle lists the particular ways in which both monarchy and tyranny are changed and preserved. We do not need to spend much time on these, for Aristotle says that in his time “there are many persons who are similar, with none of them so outstanding as to match the extent and the claim to merit of the office” that would be required for the rule of one man on the basis of exceptional virtue that characterizes monarchy (1313a5), and tyranny is inherently extremely short lived and clearly without value. However, those wishing to preserve either of these kinds of regimes are advised, as oligarchs and democrats have been, to pursue moderation, diminishing the degree of their power in order to extend its duration.

12. The Politics, Book VI

a. Varieties of Democracy

Most of Book VI is concerned with the varieties of democracy, although Aristotle also revisits the varieties of oligarchy. Some of this discussion has to do with the various ways in which the offices, laws, and duties can be arranged. This part of the discussion we will pass over. However, Aristotle also includes a discussion of the animating principle of democracy, which is freedom: “It is customarily said that only in this sort of regime do [men] share in freedom, for, so it is asserted, every democracy aims at this” (1317a40). In modern liberal democracies, of course, the ability of all to share in freedom and for each citizen to live as one wants is considered one of the regime’s strengths. However, keep in mind that Aristotle believes that human life has a telos and that the political community should provide education and laws that will lead to people pursuing and achieving this telos. Given that this is the case, a regime that allows people to do whatever they want is in fact flawed, for it is not guiding them in the direction of the good life.

b. The Best Kind of Democracy

Aristotle also explains which of the varieties of democracy is the best. In Chapter 4, we discover that the best sort of democracy is the one made up of farmers: “The best people is the farming sort, so that it is possible also to create [the best] democracy wherever the multitude lives from farming or herding. For on account of not having much property it is lacking in leisure, and so is unable to hold frequent assemblies. Because they do not have the necessary things, they spend their time at work and do not desire the things of others; indeed, working is more pleasant to them than engaging in politics and ruling, where there are not great spoils to be gotten from office” (1318b9). This is a reason why the authoritative offices can be in the hands of the wealthy, as long as the people retain control of auditing and adjudication: “Those who govern themselves in this way must necessarily be finely governed. The offices will always be in the hands of the best persons, the people being willing and not envious of the respectable, while the arrangement is satisfactory for the respectable and notable. These will not be ruled by others who are their inferiors, and they will rule justly by the fact that others have authority over the audits” (1318b33). By “adjudication” Aristotle means that the many should ensure that juries are made up of men from their ranks, so that the laws will be enforced with a democratic spirit and the rich will not be able to use their wealth to put themselves above the law. By “authority over the audits” Aristotle refers to an institution that required those who held office to give an accounting of their activities at regular intervals: where the city’s funds came from, where they went, what actions they took, and so forth. They were liable to prosecution if they were found to have engaged in wrongdoing or mismanagement, and the fear of this prosecution, Aristotle says, will keep them honest and ensure that they act according to the wishes of the democracy.

So we see again that the institutions and laws of a city are important, but equally important is the moral character of the citizens. It is only the character of the farming population that makes the arrangements Aristotle describes possible: “The other sorts of multitude out of which the remaining sorts of democracy are constituted are almost all much meaner than these: their way of life is a mean one, with no task involving virtue among the things that occupy the multitude of human beings who are vulgar persons and merchants or the multitude of laborers” (1319a24). And while Aristotle does not say it here, of course a regime organized in this way, giving a share of power to the wealthy and to the poor, under the rule of law, in the interest of everyone, would in fact be a polity more than it would be a democracy.

c. The Role of Wealth in a Democracy

In Chapter 5 of Book VI he offers further advice that would move the city in the direction of polity when he discusses how wealth should be handled in a democracy. Many democracies offer pay for serving in the assembly or on juries so that the poor will be able to attend. Aristotle advises minimizing the number of trials and length of service on juries so that the cost will not be too much of a burden on the wealthy where there are not sources of revenue from outside the city (Athens, for example, received revenue from nearby silver mines, worked by slaves). Where such revenues exist, he criticizes the existing practice of distributing surpluses to the poor in the form of cash payments, which the poor citizens will take while demanding more. However, poverty is a genuine problem in a democracy: “[O]ne who is genuinely of the popular sort (i.e. a supporter of democracy) should see to it that the multitude is not overly poor, for this is the reason for democracy being depraved” (1320a33). Instead the surplus should be allowed to accumulate until enough is available to give the poor enough money to acquire land or start a trade. And even if there is no external surplus, “[N]otables who are refined and sensible will divide the poor among themselves and provide them with a start in pursuing some work” (1320b8). It seems somewhat unusual for Aristotle to be advocating a form of welfare, but that is what he is doing, on the grounds that poverty is harmful to the character of the poor and this harms the community as a whole by undermining its stability.

13. The Politics, Book VII

a. The Best Regime and the Best Men

It is in Book VII that Aristotle describes the regime that is best without qualification. This differs from the discussion of the best regime in Book IV because in Book IV Aristotle’s concern was the best practical regime, meaning one that it would be possible to bring about from the material provided by existing regimes. Here, however, his interest is in the best regime given the opportunity to create everything just as we would want it. It is “the city that is to be constituted on the basis of what one would pray for” (1325b35). As would be expected, he explicitly ties it to the question of the best way of life: “Concerning the best regime, one who is going to undertake the investigation appropriate to it must necessarily discuss first what the most choiceworthy way of life is. As long as this is unclear, the best regime must necessarily be unclear as well…” (1323a14). We have already discussed the best way of life, as well as the fact that most people do not pursue it: “For [men] consider any amount of virtue to be adequate, but wealth, goods, power, reputation, and all such things they seek to excess without limit” (1323a35). This is, as we have said more than once, a mistake: “Living happily…is available to those who have to excess the adornments of character and mind but behave moderately in respect to the external acquisition of good things” (1323b1). And what is true for the individual is also true for the city. Therefore “the best city is happy and acts nobly. It is impossible to act nobly without acting [to achieve] noble things; but there is no noble deed either of a man or of a city that is separate from virtue and prudence. The courage, justice, and prudence of a city have the same power and form as those human beings share in individually who are called just, prudent, and sound” (1324b30). The best city, like any other city, must educate its citizens to support its principles. The difference between this city and other cities is that the principles that it teaches its citizens are the correct principles for living the good life. It is here, and nowhere else, that the excellent man and the good citizen are the same.

b. Characteristics of the Best City

What would be the characteristics of the best city we could imagine? First of all, we want the city to be the right size. Many people, Aristotle says, are confused about what this means. They assume that the bigger the city is, the better it will be. But this is wrong. It is certainly true that the city must be large enough to defend itself and to be self-sufficient, but “This too, at any rate, is evident from the facts: that it is difficult – perhaps impossible – for a city that is too populous to be well managed” (1326a26). So the right size for the city is a moderate one; it is the one that enables it to perform its function of creating virtuous citizens properly. “[T]he [city] that is made up of too few persons is not self-sufficient, though the city is a self-sufficient thing, while the one that is made up of too many persons is with respect to the necessary things self-sufficient like a nation, but is not a city; for it is not easy for a regime to be present” (1326b3). There is an additional problem in a regime that is too large: “With a view to judgment concerning the just things and with a view to distributing offices on the basis of merit, the citizens must necessarily be familiar with one another’s qualities; where this does not happen to be the case, what is connected with the offices and with judging must necessarily be carried on poorly” (1326b13).

The size of the territory is also an important element of the ideal regime, and it too must be tailored to the purpose of the regime. Aristotle says “[the territory should be] large enough so that the inhabitants are able to live at leisure in liberal fashion and at the same time with moderation” (1326b29). Again Aristotle’s main concern is with life at peace, not life at war. On the other hand, the city and its territory should be such as to afford its inhabitants advantages in times of war; “it ought to be difficult for enemies to enter, but readily exited by [the citizens] themselves,” and not so big that it cannot be “readily surveyable” because only such a territory is “readily defended” (1326b41). It should be laid out in such a way as to be readily defensible (Book VII, Chapters 11-12). It should also be defensible by sea, since proper sea access is part of a good city. Ideally the city will (like Athens) have a port that is several miles away from the city itself, so that contact with foreigners can be regulated. It should also be in the right geographical location.

Aristotle believed that geography was an important factor in determining the characteristics of the people living in a certain area. He thought that the Greeks had the good traits of both the Europeans (spiritedness) and Asians (souls endowed with art and thought) because of the Greek climate (1327b23). While the harsh climate to the north made Europeans hardy and resilient, as well as resistant to being ruled (although Aristotle did not know about the Vikings, they are perhaps the best example of what he is talking about), and the climate of what he called Asia and we now call the Middle East produced a surplus of food that allowed the men the leisure to engage in intellectual and artistic endeavors while robbing them of spiritedness, the Greeks had the best of both worlds: “[I]t is both spirited and endowed with thought, and hence both remains free and governs itself in the best manner and at the same time is capable of ruling all…” (1327b29).

However, despite the necessary attention to military issues, when we consider the ideal city, the principles which we have already elaborated about the nature of the citizens remain central. Even in the ideal city, constructed to meet the conditions for which we would pray, the need for certain tasks, such as farming and laboring, will remain. Therefore there will also be the need for people to do these tasks. But such people should not be citizens, for (as we have discussed) they will lack the leisure and the intellect to participate in governing the city. They are not really even part of the city: “Hence while cities need possessions, possessions are no part of the city. Many animate things (i.e. slaves and laborers) are part of possessions. But the city is a partnership of similar persons, for the sake of a life that is the best possible” (1328a33). The citizens cannot be merchants, laborers, or farmers, “for there is a need for leisure both with a view to the creation of virtue and with a view to political activities” (1329a1). So all the people living in the city who are not citizens are there for the benefit of the citizens. Any goals, wishes, or desires that they might have are irrelevant; in Kant’s terms, they are treated as means rather than ends.

Those that live the lives of leisure that are open to citizens because of the labor performed by the non-citizens (again, including the women) are all similar to one another, and therefore the appropriate political arrangement for them is “in similar fashion to participate in ruling and being ruled in turn. For equality is the same thing [as justice] for persons who are similar, and it is difficult for a regime to last if its constitution is contrary to justice” (1332b25). These citizens will only be able to rule and be ruled in turn if they have had the proper upbringing, and this is the last major topic that Aristotle takes up in the Politics. Most cities make the mistake of neglecting education altogether, leaving it up to fathers to decide whether they will educate their sons at all, and if so what subject matter will be covered and how it will be taught. Some cities have in fact paid attention to the importance of the proper education of the young, training them in the virtues of the regime. Unfortunately, these regimes have taught them the wrong things. Aristotle is particularly concerned with Sparta here; the Spartans devoted great effort to bringing up their sons to believe that the virtues related to war were the only ones that mattered in life. They were successful; but because war is not the ultimate good, their education was not good. (Recall that the Spartan education was also flawed because it neglected the women entirely).

It is important for the person devising the ideal city to learn from this mistake. Such cities do not last unless they constantly remain at war (which is not an end in itself; no one pursues war for its own sake). Aristotle says “Most cities of this sort preserve themselves when at war, but once having acquired [imperial] rule they come to ruin; they lose their edge, like iron, when they remain at peace. The reason is that the legislator has not educated them to be capable of being at leisure” (1334a6). The proper education must be instilled from the earliest stages of life, and even before; Aristotle tells us the ages that are appropriate for marriage (37 for men, 18 for women) in order to bring about children of the finest quality, and insists on the importance of a healthful regimen for pregnant women, specifying that they take sufficient food and remain physically active. He also says that abortion is the appropriate solution when the population threatens to grow too large (1335b24).

14. The Politics, Book VIII

a. The Education of the Young

Book VIII is primarily concerned with the kind of education that the children of the citizens should receive. That this is a crucial topic for Aristotle is clear from its first sentence: “That the legislator must, therefore, make the education of the young his object above all would be disputed by no one” (1337a10). It is so important that it cannot be left to individual families, as was the custom in Greece. Instead, “Since there is a single end for the city as a whole, it is evident that education must necessarily be one and the same for all, and that the superintendence of it should be common and not on a private basis….For common things the training too should be made common” (1337a21). The importance of a common education shaping each citizen so as to enable him to serve the common good of the city recalls the discussion of how the city is prior to the individual in Book I Chapter 2; as has been quoted already in the discussion above, “one ought not even consider that a citizen belongs to himself, but rather that all belong to the city; for each individual is a part of the city” (1337a26).

He elaborates on the content of this education, noting that it should involve the body as well as the mind. Aristotle includes physical education, reading and writing, drawing, and music as subjects which the young potential citizens must learn. The aim of this education is not productive or theoretical knowledge. Instead it is meant to teach the young potential citizens practical knowledge – the kind of knowledge that each of them will need to fulfill his telos and perform his duties as a citizen. Learning the subjects that fall under the heading of productive knowledge, such as how to make shoes, would be degrading to the citizen. Learning the subjects that would fall under the heading of theoretical knowledge would be beyond the ability of most of the citizens, and is not necessary to them as citizens.

15. References and Further Reading

The list below is not intended to be comprehensive. It is limited to works published from 1962 to 2002. Most of these have their own bibliographies and suggested reading lists, and the reader is encouraged to take advantage of these.

Translations of Aristotle

  • Barnes, Jonathan, ed. The Complete Works of Aristotle: The Revised Oxford Translation. Princeton: Princeton University Press, 1984. Two volumes.
    • The standard edition of Aristotle’s complete works.
  • Irwin, Terence, and Gail Fine, eds. Aristotle: Introductory Readings. Indianapolis, IN: Hackett Publishing Company, Inc., 1996.
    • As the title suggests, this book includes excerpts from Aristotle’s writings. Understanding any of Aristotle’s texts means reading it in its entirety, but if you want a book by your side to check cross-references from whichever of his texts you are reading (for example, if the editor of the edition of the Politics you are reading refers to the Ethics), this one should do the trick.
  • Aristotle. Nicomachean Ethics. Translated and edited by Roger Crisp. Cambridge: Cambridge University Press, 2000.
    • This translation lacks the scholarly and critical apparatus of the Rowe translation but is still a fine choice.
  • Aristotle. Nicomachean Ethics. Translated and edited by Terry Irwin. Indianapolis: Hackett Publishing, 1999.
  • Aristotle. Nicomachean Ethics. Translated and with an introduction by Martin Ostwald. New York: Macmillan Publishing Company, 1962.
    • The translation used in preparing this entry. A good basic translation.
  • Aristotle. Nicomachean Ethics. Translated and with an introduction by David Ross. Revised by J.L. Ackrill and J.O. Urmson. Oxford: Oxford University Press, 1980.
    • Updated and revised version of a classic translation from 1925. See also Ross’ book on Aristotle below.
  • Aristotle. Nicomachean Ethics. Translation and historical introduction by Christopher Rowe; philosophical introduction and commentary by Sarah Broadie. Oxford: Oxford University Press, 2002.
    • A very thorough introduction and commentary are included with this translation of the Ethics. A good choice for the beginning student – but remember that the introduction and commentary are not meant to substitute for actually reading the text!
  • Aristotle. The Politics. Translated and with an introduction by Carnes Lord. Chicago: University of Chicago Press, 1984.
    • The translation used in preparing this entry. A useful introduction and very thorough notes, identifying names, places, and terms with which the reader may not be familiar.
  • Aristotle. The Politics. Translated by C.D.C. Reeve. Indianapolis: Hackett Publishing, 1998.
  • Aristotle. The Politics of Aristotle. Translated by Peter Simpson. Chapel Hill: University of North Carolina Press, 1997.
  • Aristotle. The Politics and The Constitution of Athens. Edited by Stephen Everson. Cambridge: Cambridge University Press, 1996.
    • If you’re looking for The Constitution of Athens this is a good place to go – and with the Politics in the same book it’s easy to compare the two books to each other. However, the texts are lacking in footnotes, which is a particular problem with the Constitution since it records Athenian history. So, for example, on page 237 we learn that during the rule of the Thirty Tyrants in Athens the rulers chose “ten colleagues to govern the Peiraeus,” without any indication that the Peiraeus was the Athenian harbor and its surrounding community, five miles from the city (it is also the setting of Plato’s Republic). It would help to have names, places, and concepts defined and explained through footnotes for the beginning student. The more advanced student may wish to consult the four volumes on the Politics in the Oxford University Press’s Clarendon Aristotle Series. Volume I, covering Books I and II of the Politics, is by Trevor Saunders; Volume II, on Books III and IV, is by Richard Robinson; Volume III, on Books V and VI, is by David Keyt; and Volume IV, on Books VII and VIII, is by Richard Kraut.
  • Aristotle. The Rhetoric. In George A. Kennedy, Aristotle On Rhetoric: A Theory of Civic Discourse. Translated and with an introduction by George A. Kennedy. New York: Oxford University Press, 1991.
    • The Rhetoric includes observations on politics and ethics in the context of teaching the reader how to become a rhetorician. Whether or not this requires the student to behave ethically is a matter of some debate. Speaking well in public settings was crucial to attaining political success in the Athenian democracy (and is still valuable today) and much of Aristotle’s practical advice remains useful.

Secondary literature – general works on Aristotle

  • Ackrill, J. L. Aristotle the Philosopher. New York: Oxford University Press, 1981.
  • Adler, Mortimer. Aristotle for Everybody: Difficult Thought Made Easy. New York: Macmillan Publishing Co., Inc., 1978.
    • This is probably the easiest-to-read exposition of Aristotle available; Adler says that it is aimed at “everybody – of any age, from twelve or fourteen years upward.” Obviously the author has had to make some sacrifices in the areas of detail and complexity to accomplish this, and anyone who has spent any time at all with Aristotle will probably wish to start elsewhere. Nevertheless, the author succeeds to a very great degree in delivering on the promise of the subtitle, expressing the basics of Aristotle’s thought in simple language using common examples and straightforward descriptions.
  • Barnes, Jonathan. Aristotle: A Very Short Introduction. New York: Oxford University Press, 2000.
  • Barnes, Jonathan, ed. The Cambridge Companion to Aristotle. Cambridge: Cambridge University Press, 1995.
    • “The Companion is intended for philosophical readers who are new to Aristotle,” Barnes writes in the Introduction, and the book delivers. Chapter Seven, by D.S. Hutchinson, covers Aristotle’s ethical theory; Chapter Eight, by C.C.W. Taylor, his political theory. Barnes himself writes the first chapter on Aristotle’s life and work, as well as an excellent introduction which includes an explanation of why no book (or, I would add, encyclopedia article) can substitute for reading the original Aristotelian texts. It also includes the following: “Plato had an influence second only to Aristotle…. But Plato’s philosophical views are mostly false, and for the most part they are evidently false; his arguments are mostly bad, and for the most part they are evidently bad.” If those remarks provoke any kind of emotional or intellectual response in you, you may as well give up: you are on the way to being a student of philosophy.
  • Guthrie, W.K.C. Aristotle: An Encounter. Cambridge: Cambridge University Press, 1981.
    • Volume 6 of his six-volume Cambridge History of Ancient Greek Philosophy, written between 1962 and 1981.
  • Robinson, Timothy A. Aristotle in Outline. Indianapolis: Hackett Publishing Company, Inc., 1995.
    • Another short (125 pages) introduction to Aristotle’s thought, with three sections: Wisdom and Science, Aristotle’s Ethics, and Politics. It would be an excellent choice for the beginning student or anyone who just wants to be introduced to Aristotle’s philosophy. Robinson is sympathetic to Aristotle but also to his readers, keeping things easy to read while at the same time offering enough detail about Aristotle’s doctrines to illuminate his entire system and making the interconnections among the various elements of Aristotle’s system clear.
  • Ross, Sir David. Aristotle. With an introduction by John L. Ackrill. Sixth edition. London: Routledge, 1995.
    • This is a classic in the field, now in its sixth edition, having first been published in 1923. Not many books can stay useful for eighty years. “It is not an elementary introduction for the absolute beginner,” the introduction says, and that seems right to me, but neither does it require the reader to be an expert. It covers all of Aristotle’s work, with chapters on Logic, Philosophy of Nature, Biology, Psychology, Metaphysics, Ethics, Politics, and Rhetoric and Poetics.
  • Thompson, Garrett and Marshall Missner. On Aristotle. Belmont, CA: Wadsworth, 2000.
    • Another short (100-page) overview of Aristotle’s thought that is too short to be adequate for any one topic (Chapter Nine, Aristotle’s view of politics, is less than six pages long) but might be useful for the new student of Aristotle interested in a brief look at the breadth of Aristotle’s interests. The book by Barnes included above is to be preferred.

Secondary literature – books on Aristotle’s Politics

  • Keyt, David, and Fred Miller, eds. A Companion to Aristotle’s Politics. London: Blackwell, 1991.
  • Kraut, Richard. Aristotle: Political Philosophy. Oxford: Oxford University Press, 2002.
    • An exceptional work of scholarship. Detailed, insightful, and as close to being comprehensive as anyone is likely to get in one book. The text is clearly broken down by topic and sub-topic, and the bibliography will help steer the Aristotle student in the right direction for future research. Kraut also notes other authors who disagree with his interpretation and why he believes they are wrong; this too is helpful for further research. Highly recommended.
  • Miller, Fred. Nature, Justice and Rights in Aristotle’s Politics. New York: Oxford University Press, 1995.
  • Mulgan, R.G. Aristotle’s Political Theory: An Introduction for Students of Political Theory. Oxford: Clarendon Press, 1977.
    • Mulgan’s book “is intended for students of political theory who are meeting the Politics for the first time and in an English translation.” It is divided into subjects rather than following the topics in the order discussed in the Politics as this article has done, with footnotes to the relevant passages in Aristotle’s texts. It is nicely detailed and offers excellent discussions (and criticisms) of Aristotle’s thought.
  • Simpson, Peter. A Philosophical Commentary on the Politics of Aristotle. Chapel Hill: University of North Carolina Press, 1998.

Author Information:

Edward Clayton
Email: clayt1ew@cmich.edu
Central Michigan University
U. S. A.

Propositional Logic

Propositional logic, also known as sentential logic and statement logic, is the branch of logic that studies ways of joining and/or modifying entire propositions, statements or sentences to form more complicated propositions, statements or sentences, as well as the logical relationships and properties that are derived from these methods of combining or altering statements. In propositional logic, the simplest statements are considered as indivisible units, and hence, propositional logic does not study those logical properties and relations that depend upon parts of statements that are not themselves statements on their own, such as the subject and predicate of a statement. The most thoroughly researched branch of propositional logic is classical truth-functional propositional logic, which studies logical operators and connectives that are used to produce complex statements whose truth-value depends entirely on the truth-values of the simpler statements making them up, and in which it is assumed that every statement is either true or false and not both. However, there are other forms of propositional logic in which other truth-values are considered, or in which there is consideration of connectives that are used to produce statements whose truth-values depend not simply on the truth-values of the parts, but additional things such as their necessity, possibility or relatedness to one another.

Table of Contents

  1. Introduction
  2. History
  3. The Language of Propositional Logic
    1. Syntax and Formation Rules of PL
    2. Truth Functions and Truth Tables
    3. Definability of the Operators and the Languages PL’ and PL”
  4. Tautologies, Logical Equivalence and Validity
  5. Deduction: Rules of Inference and Replacement
    1. Natural Deduction
    2. Rules of Inference
    3. Rules of Replacement
    4. Direct Deductions
    5. Conditional and Indirect Proofs
  6. Axiomatic Systems and the Propositional Calculus
  7. Meta-Theoretic Results for the Propositional Calculus
  8. Other Forms of Propositional Logic
  9. References and Further Reading

1. Introduction

A statement can be defined as a declarative sentence, or part of a sentence, that is capable of having a truth-value, such as being true or false. So, for example, the following are statements:

  • George W. Bush is the 43rd President of the United States.
  • Paris is the capital of France.
  • Everyone born on Monday has purple hair.

Sometimes, a statement can contain one or more other statements as parts. Consider for example, the following statement:

  • Either Ganymede is a moon of Jupiter or Ganymede is a moon of Saturn.

While the above compound sentence is itself a statement, because it is true, the two parts, “Ganymede is a moon of Jupiter” and “Ganymede is a moon of Saturn”, are themselves statements, because the first is true and the second is false.

The term proposition is sometimes used synonymously with statement. However, it is sometimes used to name something abstract that two different statements with the same meaning are both said to “express”. In this usage, the English sentence, “It is raining”, and the French sentence “Il pleut”, would be considered to express the same proposition; similarly, the two English sentences, “Callisto orbits Jupiter” and “Jupiter is orbited by Callisto” would also be considered to express the same proposition. However, the nature or existence of propositions as abstract meanings is still a matter of philosophical controversy, and for the purposes of this article, the phrases “statement” and “proposition” are used interchangeably.

Propositional logic, also known as sentential logic, is that branch of logic that studies ways of combining or altering statements or propositions to form more complicated statements or propositions. Joining two simpler propositions with the word “and” is one common way of combining statements. When two statements are joined together with “and”, the complex statement formed by them is true if and only if both the component statements are true. Because of this, an argument of the following form is logically valid:

Paris is the capital of France and Paris has a population of over two million.
Therefore, Paris has a population of over two million.
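
The validity of this form can be checked mechanically. The following is a minimal sketch in Python, not a standard procedure taken from this article: the variable names `p` and `q` and the helper `is_valid` are hypothetical labels introduced for illustration. It enumerates every combination of truth-values for the two component statements and confirms that none makes the premise true while the conclusion is false.

```python
from itertools import product

def is_valid(premise, conclusion, num_vars):
    """Return True if no assignment of truth-values makes the
    premise true and the conclusion false (hypothetical helper)."""
    for values in product([True, False], repeat=num_vars):
        if premise(*values) and not conclusion(*values):
            return False  # found a counterexample
    return True

# p: "Paris is the capital of France"
# q: "Paris has a population of over two million"
premise = lambda p, q: p and q   # the conjunction of the two statements
conclusion = lambda p, q: q      # the second conjunct alone

print(is_valid(premise, conclusion, 2))   # the argument in the text
print(is_valid(conclusion, premise, 2))   # the reverse direction
```

The first call reports that the argument is valid. The second shows that the reverse inference, from “q” alone to “p and q”, is not, since “q” can be true while “p” is false.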

Propositional logic largely involves studying logical connectives such as the words “and” and “or” and the rules determining the truth-values of the propositions they are used to join, as well as what these rules mean for the validity of arguments, and such logical relationships between statements as being consistent or inconsistent with one another, as well as logical properties of propositions, such as being tautologically true, being contingent, and being self-contradictory. (These notions are defined below.)

Propositional logic also studies ways of modifying statements, such as the addition of the word “not” that is used to change an affirmative statement into a negative statement. Here, the fundamental logical principle involved is that if a given affirmative statement is true, the negation of that statement is false, and if a given affirmative statement is false, the negation of that statement is true.

What is distinctive about propositional logic as opposed to other (typically more complicated) branches of logic is that propositional logic does not deal with logical relationships and properties that involve the parts of a statement smaller than the simple statements making it up. Therefore, propositional logic does not study those logical characteristics of the propositions below in virtue of which they constitute a valid argument:

  1. George W. Bush is a president of the United States.
  2. George W. Bush is a son of a president of the United States.
  3. Therefore, there is someone who is both a president of the United States and a son of a president of the United States.

The recognition that the above argument is valid requires one to recognize that the subject in the first premise is the same as the subject in the second premise. However, in propositional logic, simple statements are considered as indivisible wholes, and those logical relationships and properties that involve parts of statements such as their subjects and predicates are not taken into consideration.

Propositional logic can be thought of as primarily the study of logical operators. A logical operator is any word or phrase used either to modify one statement to make a different statement, or join multiple statements together to form a more complicated statement. In English, words such as “and”, “or”, “not”, “if … then…”, “because”, and “necessarily”, are all operators.

A logical operator is said to be truth-functional if the truth-values (the truth or falsity, etc.) of the statements it is used to construct always depend entirely on the truth or falsity of the statements from which they are constructed. The English words “and”, “or” and “not” are (at least arguably) truth-functional, because a compound statement joined together with the word “and” is true if both the statements so joined are true, and false if either or both are false, a compound statement joined together with the word “or” is true if at least one of the joined statements is true, and false if both joined statements are false, and the negation of a statement is true if and only if the statement negated is false.

Some logical operators are not truth-functional. One example of an operator in English that is not truth-functional is the word “necessarily”. Whether a statement formed using this operator is true or false does not depend entirely on the truth or falsity of the statement to which the operator is applied. For example, both of the following statements are true:

  • 2 + 2 = 4.
  • Someone is reading an article in a philosophy encyclopedia.

However, let us now consider the corresponding statements modified with the operator “necessarily”:

  • Necessarily, 2 + 2 = 4.
  • Necessarily, someone is reading an article in a philosophy encyclopedia.

Here, the first example is true but the second example is false. Hence, the truth or falsity of a statement using the operator “necessarily” does not depend entirely on the truth or falsity of the statement modified.
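The point can be checked mechanically. The following is a minimal sketch in Python, not part of the original presentation: it lists every one-place truth-function and looks for one that reproduces the behaviour of “necessarily” on the two examples. The variable names are illustrative assumptions.

```python
from itertools import product

# A one-place truth-function is fixed entirely by its output on True
# and its output on False, so there are exactly four of them.
unary_truth_functions = [
    {True: on_true, False: on_false}
    for on_true, on_false in product((True, False), repeat=2)
]

# Both example statements are true, yet prefixing "necessarily" yields
# one true statement and one false statement (as described in the text).
input_values = (True, True)
necessarily_values = (True, False)

matches = [
    f for f in unary_truth_functions
    if tuple(f[v] for v in input_values) == necessarily_values
]
print(len(unary_truth_functions), len(matches))
```

No candidate survives, because a truth-function must return the same output whenever it is given the same truth-value as input.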

Truth-functional propositional logic is that branch of propositional logic that limits itself to the study of truth-functional operators. Classical (or “bivalent”) truth-functional propositional logic is that branch of truth-functional propositional logic that assumes that there are are only two possible truth-values a statement (whether simple or complex) can have: (1) truth, and (2) falsity, and that every statement is either true or false but not both.

Classical truth-functional propositional logic is by far the most widely studied branch of propositional logic, and for this reason, most of the remainder of this article focuses exclusively on this area of logic. In addition to classical truth-functional propositional logic, there are other branches of propositional logic that study logical operators, such as “necessarily”, that are not truth-functional. There are also “non-classical” propositional logics in which such possibilities as (i) a proposition’s having a truth-value other than truth or falsity, (ii) a proposition’s having an indeterminate truth-value or lacking a truth-value altogether, and sometimes even (iii) a proposition’s being both true and false, are considered. (For more information on these alternative forms of propositional logic, consult Section VIII below.)

2. History

The serious study of logic as an independent discipline began with the work of Aristotle (384-322 BCE). Generally, however, Aristotle’s sophisticated writings on logic dealt with the logic of categories and quantifiers such as “all”, and “some”, which are not treated in propositional logic. However, in his metaphysical writings, Aristotle espoused two principles of great importance in propositional logic, which have since come to be called the Law of Excluded Middle and the Law of Contradiction. Interpreted in propositional logic, the first is the principle that every statement is either true or false, the second is the principle that no statement is both true and false. These are, of course, cornerstones of classical propositional logic. There is some evidence that Aristotle, or at least his successor at the Lyceum, Theophrastus (d. 287 BCE), did recognize a need for the development of a doctrine of “complex” or “hypothetical” propositions, that is, those involving conjunctions (statements joined by “and”), disjunctions (statements joined by “or”) and conditionals (statements joined by “if… then…”), but their investigations into this branch of logic seem to have been very minor.

More serious attempts to study statement operators such as “and”, “or” and “if… then…” were conducted by the Stoic philosophers in the late 3rd century BCE. Since most of their original works—if indeed, these writings were even produced—are lost, we cannot make many definite claims about exactly who first made investigations into what areas of propositional logic, but we do know from the writings of Sextus Empiricus that Diodorus Cronus and his pupil Philo had engaged in a protracted debate about whether the truth of a conditional statement depends entirely on it not being the case that its antecedent (if-clause) is true while its consequent (then-clause) is false, or whether it requires some sort of stronger connection between the antecedent and consequent—a debate that continues to have relevance for modern discussion of conditionals. The Stoic philosopher Chrysippus (roughly 280-205 BCE) perhaps did the most in advancing Stoic propositional logic, by marking out a number of different ways of forming complex premises for arguments, and for each, listing valid inference schemata. Chrysippus suggested that the following inference schemata are to be considered the most basic:

  1. If the first, then the second; but the first; therefore the second.
  2. If the first, then the second; but not the second; therefore, not the first.
  3. Not both the first and the second; but the first; therefore, not the second.
  4. Either the first or the second [and not both]; but the first; therefore, not the second.
  5. Either the first or the second; but not the second; therefore the first.

Inference rules such as the above correspond very closely to the basic principles in a contemporary system of natural deduction for propositional logic. For example, the first two rules correspond to the rules of modus ponens and modus tollens, respectively. These basic inference schemata were expanded upon by less basic inference schemata by Chrysippus himself and other Stoics, and are preserved in the work of Diogenes Laertius, Sextus Empiricus and later, in the work of Cicero.
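As an illustrative sketch only, the five schemata can be checked by brute force over the four combinations of truth-values for “the first” and “the second”. The Python below assumes a truth-functional reading of “if… then…” (the reading associated above with Philo, and not one the Stoics agreed upon), and reads the disjunction in the fourth schema exclusively, as its bracketed gloss suggests. The helper names are hypothetical.

```python
from itertools import product

def implies(a, b):
    # Assumed truth-functional reading of "if a then b".
    return (not a) or b

def is_valid(premises, conclusion):
    """Valid iff no combination makes every premise true and the conclusion false."""
    for a, b in product((True, False), repeat=2):
        if all(p(a, b) for p in premises) and not conclusion(a, b):
            return False
    return True

schemata = {
    1: ([lambda a, b: implies(a, b), lambda a, b: a], lambda a, b: b),
    2: ([lambda a, b: implies(a, b), lambda a, b: not b], lambda a, b: not a),
    3: ([lambda a, b: not (a and b), lambda a, b: a], lambda a, b: not b),
    4: ([lambda a, b: a != b, lambda a, b: a], lambda a, b: not b),  # exclusive "or"
    5: ([lambda a, b: a or b, lambda a, b: not b], lambda a, b: a),
}

for number, (premises, conclusion) in schemata.items():
    print(number, is_valid(premises, conclusion))

# The fourth schema fails if "or" is read inclusively instead.
inclusive_fourth = is_valid(
    [lambda a, b: a or b, lambda a, b: a], lambda a, b: not b
)
print("fourth schema, inclusive reading:", inclusive_fourth)
```

On these assumptions all five schemata come out valid, while the inclusive reading of the fourth does not, which is why the bracketed qualification matters.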

Advances on the work of the Stoics were undertaken in small steps in the centuries that followed. This work was done by, for example, the second century logician Galen (roughly 129-210 CE), the sixth century philosopher Boethius (roughly 480-525 CE) and later by medieval thinkers such as Peter Abelard (1079-1142) and William of Ockham (1288-1347), and others. Much of their work involved producing better formalizations of the principles of Aristotle or Chrysippus, introducing improved terminology and furthering the discussion of the relationships between operators. Abelard, for example, seems to have been the first to clearly differentiate exclusive disjunction from inclusive disjunction (discussed below), and to suggest that inclusive disjunction is the more important notion for the development of a relatively simple logic of disjunctions.

The next major step forward in the development of propositional logic came only much later with the advent of symbolic logic in the work of logicians such as Augustus DeMorgan (1806-1871) and, especially, George Boole (1815-1864) in the mid-19th century. Boole was primarily interested in developing a mathematical-style “algebra” to replace Aristotelian syllogistic logic, primarily by employing the numeral “1” for the universal class, the numeral “0” for the empty class, the multiplication notation “xy” for the intersection of classes x and y, the addition notation “x + y” for the union of classes x and y, etc., so that statements of syllogistic logic could be treated in quasi-mathematical fashion as equations; for example, “No x is y” could be written as “xy = 0”. However, Boole noticed that if an equation such as “x = 1” is read as “x is true”, and “x = 0” is read as “x is false”, the rules given for his logic of classes can be transformed into a logic for propositions, with “x + y = 1” reinterpreted as saying that either x or y is true, and “xy = 1” reinterpreted as meaning that x and y are both true. Boole’s work sparked rapid interest in logic among mathematicians. Later, “Boolean algebras” were used to form the basis of the truth-functional propositional logics utilized in computer design and programming.

In the late 19th century, Gottlob Frege (1848-1925) presented logic as a branch of systematic inquiry more fundamental than mathematics or algebra, and presented the first modern axiomatic calculus for logic in his 1879 work Begriffsschrift. While it covered more than propositional logic, from Frege’s axiomatization it is possible to distill the first complete axiomatization of classical truth-functional propositional logic. Frege was also the first to systematically argue that all truth-functional connectives could be defined in terms of negation and the material conditional.

In the early 20th century, Bertrand Russell gave a different complete axiomatization of propositional logic, considered on its own, in his 1906 paper “The Theory of Implication”, and later, along with A. N. Whitehead, produced another axiomatization using disjunction and negation as primitives in the 1910 work Principia Mathematica. Proof of the possibility of defining all truth functional operators in virtue of a single binary operator was first published by American logician H. M. Sheffer in 1913, though American logician C. S. Peirce (1839-1914) seems to have discovered this decades earlier. In 1917, French logician Jean Nicod discovered that it was possible to axiomatize propositional logic using the Sheffer stroke and only a single axiom schema and single inference rule.
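To make the claim about a single binary operator concrete, here is a small sketch in Python. It assumes the operator is read as “not both” (NAND), which is one common reading of the Sheffer stroke, and defines the five familiar truth-functions from it. The function names are illustrative, and the definitions shown are one choice among several equivalent ones.

```python
from itertools import product

def nand(a, b):
    # Assumed reading of the single operator: "not both a and b".
    return not (a and b)

def neg(a):
    return nand(a, a)

def conj(a, b):
    return nand(nand(a, b), nand(a, b))

def disj(a, b):
    return nand(nand(a, a), nand(b, b))

def implies(a, b):
    return nand(a, nand(b, b))

def iff(a, b):
    return nand(nand(a, b), nand(nand(a, a), nand(b, b)))

# Compare each definition with the usual truth-function on every input.
agreement = all(
    neg(a) == (not a)
    and conj(a, b) == (a and b)
    and disj(a, b) == (a or b)
    and implies(a, b) == ((not a) or b)
    and iff(a, b) == (a == b)
    for a, b in product((True, False), repeat=2)
)
print(agreement)
```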

The notion of a “truth table” is often utilized in the discussion of truth-functional connectives (discussed below). It seems to have been at least implicit in the work of Peirce, W. S. Jevons (1835-1882), Lewis Carroll (1832-1898), John Venn (1834-1923), and Allan Marquand (1853-1924). Truth tables appear explicitly in writings by Eugen Müller as early as 1909. Their use gained rapid popularity in the early 1920s, perhaps due to the combined influence of the work of Emil Post, whose 1921 work makes liberal use of them, and Ludwig Wittgenstein’s 1921 Tractatus Logico-Philosophicus, in which truth tables and truth-functionality are prominently featured.

Systematic inquiry into axiomatic systems for propositional logic and related metatheory was conducted in the 1920s, 1930s and 1940s by David Hilbert, Paul Bernays, Alfred Tarski, Jan Łukasiewicz, Kurt Gödel, Alonzo Church, and others. It was during this period that most of the important metatheoretic results, such as those discussed in Section VII, were discovered.

Complete natural deduction systems for classical truth-functional propositional logic were developed and popularized in the work of Gerhard Gentzen in the mid-1930s, and subsequently introduced into influential textbooks such as that of F. B. Fitch (1952) and Irving Copi (1953).

Modal propositional logics are the most widely studied form of non-truth-functional propositional logic. While interest in modal logic dates back to Aristotle, by contemporary standards the first systematic inquiry into modal propositional logic can be found in the work of C. I. Lewis in 1912 and 1913. Among other well-known forms of non-truth-functional propositional logic, deontic logic began with the work of Ernst Mally in 1926, and epistemic logic was first treated systematically by Jaakko Hintikka in the early 1960s. The modern study of three-valued propositional logic began in the work of Jan Łukasiewicz in 1917, and other forms of non-classical propositional logic soon followed suit. Relevance propositional logic is more recent, dating from the mid-1970s in the work of A. R. Anderson and N. D. Belnap. Paraconsistent logic, while having its roots in the work of Łukasiewicz and others, has blossomed into an independent area of research only recently, mainly due to work undertaken by N. C. A. da Costa, Graham Priest and others in the 1970s and 1980s.

3. The Language of Propositional Logic

The basic rules and principles of classical truth-functional propositional logic are, among contemporary logicians, almost entirely agreed upon, and capable of being stated in a definitive way. This is most easily done if we utilize a simplified logical language that deals only with simple statements considered as indivisible units as well as complex statements joined together by means of truth-functional connectives. We first consider a language called PL for “Propositional Logic”. Later we shall consider two even simpler languages, PL’ and PL”.

a. Syntax and Formation Rules of PL

In any ordinary language, a statement would never consist of a single word, but would always at the very least consist of a noun or pronoun along with a verb. However, because propositional logic does not consider smaller parts of statements, and treats simple statements as indivisible wholes, the language PL uses uppercase letters ‘A‘, ‘B‘, ‘C‘, etc., in place of complete statements. The logical signs ‘\land‘, ‘\lor‘, ‘→’, ‘↔’, and ‘\neg‘ are used in place of the truth-functional operators, “and”, “or”, “if… then…”, “if and only if”, and “not”, respectively. So, consider again the following example argument, mentioned in Section I.

Paris is the capital of France and Paris has a population of over two million.
Therefore, Paris has a population of over two million.

If we use the letter ‘C‘ as our translation of the statement “Paris is the capital of France” in PL, and the letter ‘P‘ as our translation of the statement “Paris has a population of over two million”, and use a horizontal line to separate the premise(s) of an argument from the conclusion, the above argument could be symbolized in language PL as follows:

\( \begin{array}{l} C \land P\\ \hline P \end{array} \)

In addition to statement letters like ‘C‘ and ‘P‘ and the operators, the only other signs that sometimes appear in the language PL are parentheses which are used in forming even more complex statements. Consider the English compound sentence, “Paris is the most important city in France if and only if Paris is the capital of France and Paris has a population of over two million.” If we use the letter ‘I‘ in language PL to mean that Paris is the most important city in France, this sentence would be translated into PL as follows:

I \leftrightarrow (C \land P)

The parentheses are used to group together the statements ‘C‘ and ‘P‘ and differentiate the above statement from the one that would be written as follows:

(I \leftrightarrow C) \land P

This latter statement asserts that Paris is the most important city in France if and only if it is the capital of France, and (separate from this), Paris has a population of over two million. The difference between the two is subtle, but important logically.
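The difference can be made vivid with a short sketch in Python. It anticipates the truth-functional readings of the biconditional and conjunction given later in this section, and simply lists the combinations of truth-values on which the two groupings disagree. The function names are illustrative assumptions.

```python
from itertools import product

def grouped_right(i, c, p):
    # I <-> (C and P)
    return i == (c and p)

def grouped_left(i, c, p):
    # (I <-> C) and P
    return (i == c) and p

# Collect every combination of truth-values for I, C, P on which
# the two statements receive different truth-values.
differing = [
    (i, c, p)
    for i, c, p in product((True, False), repeat=3)
    if grouped_right(i, c, p) != grouped_left(i, c, p)
]
for row in differing:
    print(row)
```

On these readings, the two statements come apart exactly when ‘I‘ and ‘P‘ are both false: the first statement is then true, while the second is false because its right-hand conjunct is false.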

It is important to describe the syntax and make-up of statements in the language PL in a precise manner, and give some definitions that will be used later on. Before doing this, it is worthwhile to distinguish the language in which we will be discussing PL, namely, English, from PL itself. Whenever one language is used to discuss another, the language in which the discussion takes place is called the metalanguage, and the language under discussion is called the object language. In this context, the object language is the language PL, and the metalanguage is English, or to be more precise, English supplemented with certain special devices that are used to talk about language PL. It is possible in English to talk about words and sentences in other languages, and when we do, we place the words or sentences we wish to talk about in quotation marks. Therefore, using ordinary English, I can say that “parler” is a French verb, and “I \land C” is a statement of PL. The following expression is part of PL, not English:

(I \leftrightarrow C) \land P

However, the following expression is a part of English; in particular, it is the English name of a PL sentence:

“(I \leftrightarrow C) \land P”

This point may seem rather trivial, but it is easy to become confused if one is not careful.

In our metalanguage, we shall also be using certain variables that are used to stand for arbitrary expressions built from the basic symbols of PL. In what follows, the Greek letters ‘\alpha‘, ‘\beta‘, and so on, are used for any object language (PL) expression of a certain designated form. For example, later on, we shall say that, if \alpha is a statement of PL, then so is \ulcorner \neg \alpha \urcorner. Notice that ‘\alpha‘ itself is not a symbol that appears in PL; it is a symbol used in English to speak about symbols of PL. We will also be making use of so-called “Quine corners”, written ‘\ulcorner‘ and ‘\urcorner‘, which are a special metalinguistic device used to speak about object language expressions constructed in a certain way. Suppose \alpha is the statement “(I \leftrightarrow C)” and \beta is the statement “(P \land C)“; then \ulcorner \alpha \lor \beta \urcorner is the complex statement “(I \leftrightarrow C) \lor (P \land C)“.

Let us now proceed to giving certain definitions used in the metalanguage when speaking of the language PL.

Definition: A statement letter of PL is defined as any uppercase letter written with or without a numerical subscript.

Note: According to this definition, ‘A‘, ‘B‘, ‘B_2‘, ‘C_3‘, and ‘P_{14}‘ are examples of statement letters. The numerical subscripts are used just in case we need to deal with more than 26 simple statements: in that case, we can use ‘P_1‘ to mean something different than ‘P_2‘, and so forth.

Definition: A connective or operator of PL is any of the signs ‘\neg‘, ‘\land‘, ‘\lor‘, ‘→’, and ‘↔’.

Definition: A well-formed formula (hereafter abbreviated as wff) of PL is defined recursively as follows:

  1. Any statement letter is a well-formed formula.
  2. If \alpha is a well-formed formula, then so is \ulcorner \neg \alpha \urcorner.
  3. If \alpha and \beta are well-formed formulas, then so is \ulcorner (\alpha \land \beta) \urcorner.
  4. If \alpha and \beta are well-formed formulas, then so is \ulcorner (\alpha \lor \beta) \urcorner.
  5. If \alpha and \beta are well-formed formulas, then so is \ulcorner (\alpha \rightarrow \beta) \urcorner.
  6. If \alpha and \beta are well-formed formulas, then so is \ulcorner (\alpha \leftrightarrow \beta) \urcorner.
  7. Nothing that cannot be constructed by successive steps of (1)-(6) is a well-formed formula.

Note: According to part (1) of this definition, the statement letters ‘C‘, ‘P‘ and ‘M‘ are wffs. Because ‘C‘ and ‘P‘ are wffs, by part (3), “(C \land P)” is a wff. Because it is a wff, and ‘M‘ is also a wff, by part (6), “(M \leftrightarrow (C \land P))” is a wff. It is conventional to regard the outermost parentheses on a wff as optional, so that “M \leftrightarrow (C \land P)” is treated as an abbreviated form of “(M \leftrightarrow (C \land P))“. However, whenever a shorter wff is used in constructing a more complicated wff, the parentheses on the shorter wff are necessary.

The notion of a well-formed formula should be understood as corresponding to the notion of a grammatically correct or properly constructed statement of language PL. This definition tells us, for example, that “\neg (Q \lor \neg R)” is grammatical for PL because it is a well-formed formula, whereas the string of symbols, “\neg Q \neg \lor ( \leftrightarrow P \land“, while consisting entirely of symbols used in PL, is not grammatical because it is not well-formed.
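The recursive definition translates fairly directly into a recognizer. The following Python sketch is one way to do it, under stated assumptions: the connectives are typed with the ASCII stand-ins `~`, `&`, `|`, `->` and `<->` (a choice made here for convenience, not notation from PL), subscripts are written as trailing digits, and outermost parentheses are required, so the optional-parentheses convention is not applied.

```python
import re

# One token: optional whitespace, then a statement letter (with optional
# numerical subscript), a connective, or a parenthesis.
TOKEN = re.compile(r"\s*([A-Z]\d*|<->|->|[~&|()])")
BINARY = {"&", "|", "->", "<->"}

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            return None              # a sign that does not belong to PL
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens, i):
    """Return the index just past one wff starting at tokens[i], or None."""
    if i >= len(tokens):
        return None
    tok = tokens[i]
    if re.fullmatch(r"[A-Z]\d*", tok):      # clause 1: statement letter
        return i + 1
    if tok == "~":                          # clause 2: negation
        return parse(tokens, i + 1)
    if tok == "(":                          # clauses 3-6: binary compounds
        j = parse(tokens, i + 1)
        if j is None or j >= len(tokens) or tokens[j] not in BINARY:
            return None
        k = parse(tokens, j + 1)
        if k is None or k >= len(tokens) or tokens[k] != ")":
            return None
        return k + 1
    return None                             # clause 7: nothing else is a wff

def is_wff(text):
    tokens = tokenize(text)
    return tokens is not None and parse(tokens, 0) == len(tokens)

print(is_wff("(M <-> (C & P))"))
print(is_wff("~(Q | ~R)"))
print(is_wff("~Q ~ | ( <-> P &"))
```

Because the sketch follows the official definition strictly, the abbreviated form `M <-> (C & P)` is rejected; a more permissive recognizer could add the missing outer parentheses before parsing.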

b. Truth Functions and Truth Tables

So far we have in effect described the grammar of language PL. When setting up a language fully, however, it is necessary not only to establish rules of grammar, but also describe the meanings of the symbols used in the language. We have already suggested that uppercase letters are used as complete simple statements. Because truth-functional propositional logic does not analyze the parts of simple statements, and only considers those ways of combining them to form more complicated statements that make the truth or falsity of the whole dependent entirely on the truth or falsity of the parts, in effect, it does not matter what meaning we assign to the individual statement letters like ‘P‘, ‘Q‘ and ‘R‘, etc., provided that each is taken as either true or false (and not both).

However, more must be said about the meaning or semantics, of the logical operators ‘\land‘, ‘\lor‘, ‘→’, ‘↔’, and ‘\neg‘. As mentioned above, these are used in place of the English words, ‘and’, ‘or’, ‘if… then…’, ‘if and only if’, and ‘not’, respectively. However, the correspondence is really only rough, because the operators of PL are considered to be entirely truth-functional, whereas their English counterparts are not always used truth-functionally. Consider, for example, the following statements:

  1. If Bob Dole is president of the United States in 2004, then the president of the United States in 2004 is a member of the Republican party.
  2. If Al Gore is president of the United States in 2004, then the president of the United States in 2004 is a member of the Republican party.

For those familiar with American politics, it is tempting to regard the English sentence (1) as true, but to regard (2) as false, since Dole is a Republican but Gore is not. But notice that in both cases, the simple statement in the “if” part of the “if… then…” statement is false, and the simple statement in the “then” part of the statement is true. This shows that the English operator “if… then…” is not fully truth-functional. However, all the operators of language PL are entirely truth-functional, so the sign ‘→’, though similar in many ways to the English “if… then…” is not in all ways the same. More is said about this operator below.

Since our study is limited to the ways in which the truth-values of complex statements depend on the truth-values of the parts, for each operator, the only aspect of its meaning relevant in this context is its associated truth-function. The truth-function for an operator can be represented as a table, each line of which expresses a possible combination of truth-values for the simpler statements to which the operator applies, along with the resulting truth-value for the complex statement formed using the operator.

The signs ‘\land‘, ‘\lor‘, ‘→’, ‘↔’, and ‘\neg‘, correspond, respectively, to the truth-functions of conjunction, disjunction, material implication, material equivalence, and negation. We shall consider these individually.

Conjunction: The conjunction of two statements \alpha and \beta, written in PL as \ulcorner (\alpha \land \beta) \urcorner, is true if both \alpha and \beta are true, and is false if either \alpha is false or \beta is false or both are false. In effect, the meaning of the operator ‘\land‘ can be displayed according to the following chart, which shows the truth-value of the conjunction depending on the four possibilities of the truth-values of the parts:

\( \begin{array}{cc|c}
\alpha & \beta & (\alpha \land \beta) \\ \hline
T & T & T \\
T & F & F \\
F & T & F \\
F & F & F
\end{array} \)

Conjunction using the operator ‘\land‘ is language PL’s rough equivalent of joining statements together with ‘and’ in English. In a statement of the form \ulcorner (\alpha \land \beta) \urcorner, the two statements joined together, \alpha and \beta, are called the conjuncts, and the whole statement is called a conjunction.

Instead of the sign ‘\land‘, some other logical works use the signs ‘\&‘ or ‘\bullet‘ for conjunction.

Disjunction: The disjunction of two statements \alpha and \beta, written in PL as \ulcorner (\alpha \lor \beta) \urcorner, is true if either \alpha is true or \beta is true, or both \alpha and \beta are true, and is false only if both \alpha and \beta are false. A chart similar to that given above for conjunction, modified to show the meaning of the disjunction sign ‘\lor‘ instead, would be drawn as follows:

\( \begin{array}{cc|c}
\alpha & \beta & (\alpha \lor \beta) \\ \hline
T & T & T \\
T & F & T \\
F & T & T \\
F & F & F
\end{array} \)

This is language PL’s rough equivalent of joining statements together with the word ‘or’ in English. However, it should be noted that the sign ‘\lor‘ is used for disjunction in the inclusive sense. Sometimes when the word ‘or’ is used to join together two English statements, we only regard the whole as true if one side or the other is true, but not both, as when the statement “Either we can buy the toy robot, or we can buy the toy truck; you must choose!” is spoken by a parent to a child who wants both toys. This is called the exclusive sense of ‘or’. However, in PL, the sign ‘\lor‘ is used inclusively, and is more analogous to the English word ‘or’ as it appears in a statement such as (for example, said about someone who has just received a perfect score on the SAT), “either she studied hard, or she is extremely bright”, which does not mean to rule out the possibility that she both studied hard and is bright. In a statement of the form \ulcorner (\alpha \lor \beta) \urcorner, the two statements joined together, \alpha and \beta, are called the disjuncts, and the whole statement is called a disjunction.
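The contrast between the two senses of ‘or’ can be sketched in a few lines of Python; the function names are illustrative assumptions. The sketch lists the combinations of truth-values on which the inclusive and exclusive readings disagree.

```python
from itertools import product

def inclusive_or(a, b):
    # The reading used for the PL disjunction sign.
    return a or b

def exclusive_or(a, b):
    # "One or the other, but not both."
    return a != b

differences = [
    (a, b)
    for a, b in product((True, False), repeat=2)
    if inclusive_or(a, b) != exclusive_or(a, b)
]
print(differences)  # prints [(True, True)]
```

The two readings differ only in the case where both disjuncts are true, which is exactly the case the parent in the toy example means to exclude.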

Material Implication: This truth-function is represented in language PL with the sign ‘→’. A statement of the form \ulcorner (\alpha \rightarrow \beta) \urcorner, is false if \alpha is true and \beta is false, and is true if either \alpha is false or \beta is true (or both). This truth-function generates the following chart:

\( \begin{array}{cc|c}
\alpha & \beta & (\alpha \rightarrow \beta) \\ \hline
T & T & T \\
T & F & F \\
F & T & T \\
F & F & T
\end{array} \)

Because the truth of a statement of the form \ulcorner (\alpha \rightarrow \beta) \urcorner rules out the possibility of \alpha being true and \beta being false, there is some similarity between the operator ‘→’ and the English phrase, “if… then…”, which is also used to rule out the possibility of one statement being true and another false; however, ‘→’ is used entirely truth-functionally, and so, for reasons discussed earlier, it is not entirely analogous with “if… then…” in English. If \alpha is false, then \ulcorner (\alpha \rightarrow \beta) \urcorner is regarded as true, whether or not there is any connection between the falsity of \alpha and the truth-value of \beta. In a statement of the form \ulcorner (\alpha \rightarrow \beta) \urcorner, we call \alpha the antecedent, and we call \beta the consequent, and the whole statement \ulcorner (\alpha \rightarrow \beta) \urcorner is sometimes also called a (material) conditional.

The sign ‘\supset‘ is sometimes used instead of ‘→’ for material implication.

Material Equivalence: This truth-function is represented in language PL with the sign ‘↔’. A statement of the form \ulcorner (\alpha \leftrightarrow \beta) \urcorner is regarded as true if \alpha and \beta are either both true or both false, and is regarded as false if they have different truth-values. Hence, we have the following chart:

\( \begin{array}{cc|c}
\alpha & \beta & (\alpha \leftrightarrow \beta) \\ \hline
T & T & T \\
T & F & F \\
F & T & F \\
F & F & T
\end{array} \)

Since the truth of a statement of the form \ulcorner (\alpha \leftrightarrow \beta) \urcorner requires \alpha and \beta to have the same truth-value, this operator is often likened to the English phrase “…if and only if…”. Again, however, they are not in all ways alike, because ‘↔’ is used entirely truth-functionally. Regardless of what \alpha and \beta are, and what relation (if any) they have to one another, if both are false, \ulcorner (\alpha \leftrightarrow \beta) \urcorner is considered to be true. However, we would not normally regard the statement “Al Gore is the President of the United States in 2004 if and only if Bob Dole is the President of the United States in 2004” as true simply because both simpler statements happen to be false. A statement of the form \ulcorner (\alpha \leftrightarrow \beta) \urcorner is also sometimes referred to as a (material) biconditional.

The sign ‘\equiv‘ is sometimes used instead of ‘↔’ for material equivalence.

Negation: The negation of statement \alpha, simply written \ulcorner \neg \alpha \urcorner in language PL, is regarded as true if \alpha is false, and false if \alpha is true. Unlike the other operators we have considered, negation is applied to a single statement. The corresponding chart can therefore be drawn more simply as follows:

\( \begin{array}{c|c}
\alpha & \neg \alpha \\ \hline
T & F \\
F & T
\end{array} \)

The negation sign ‘\neg‘ bears obvious similarities to the word ‘not’ used in English, as well as similar phrases used to change a statement from affirmative to negative or vice-versa. In logical languages, the signs ‘\sim‘ or ‘-‘ are sometimes used in place of ‘\neg‘.

The five charts together provide the rules needed to determine the truth-value of a given wff in language PL when given the truth-values of the independent statement letters making it up. These rules are very easy to apply in the case of a very simple wff such as “(P \land Q)“. Suppose that ‘P‘ is true, and ‘Q‘ is false; according to the second row of the chart given for the operator, ‘\land‘, we can see that this statement is false.

However, the charts also provide the rules necessary for determining the truth-value of more complicated statements. We have just seen that “(P \land Q)” is false if ‘P‘ is true and ‘Q‘ is false. Consider a more complicated statement that contains this statement as a part, for example, “((P \land Q) \rightarrow \neg R)“, and suppose once again that ‘P‘ is true, and ‘Q‘ is false, and further suppose that ‘R‘ is also false. To determine the truth-value of this complicated statement, we begin by determining the truth-value of the internal parts. The statement “(P \land Q)“, as we have seen, is false. The other substatement, “\neg R“, is true, because ‘R‘ is false, and ‘\neg‘ reverses the truth-value of that to which it is applied. Now we can determine the truth-value of the whole wff, “((P \land Q) \rightarrow \neg R)“, by consulting the chart given above for ‘→’. Here, the wff “(P \land Q)” is our \alpha, and “\neg R” is our \beta, and since their truth-values are F and T, respectively, we consult the third row of the chart, and we see that the complex statement “((P \land Q) \rightarrow \neg R)” is true.
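The inside-out procedure just described can be sketched as a small recursive evaluator in Python. The representation of wffs as nested tuples, and the names `evaluate` and `BINARY`, are assumptions made for this illustration only.

```python
# Truth-functions for the four two-place operators, following the charts.
BINARY = {
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
    "implies": lambda a, b: (not a) or b,
    "iff": lambda a, b: a == b,
}

def evaluate(wff, assignment):
    """Evaluate a wff given as a statement letter or a nested tuple."""
    if isinstance(wff, str):                  # a statement letter
        return assignment[wff]
    if wff[0] == "not":                       # ("not", subformula)
        return not evaluate(wff[1], assignment)
    op, left, right = wff                     # (operator, left, right)
    return BINARY[op](evaluate(left, assignment), evaluate(right, assignment))

# ((P and Q) -> not R), with P true, Q false and R false.
wff = ("implies", ("and", "P", "Q"), ("not", "R"))
assignment = {"P": True, "Q": False, "R": False}

print(evaluate(("and", "P", "Q"), assignment))   # the antecedent
print(evaluate(("not", "R"), assignment))        # the consequent
print(evaluate(wff, assignment))                 # the whole wff
```

The evaluator resolves the internal parts first and the main operator last, mirroring the order followed in the worked example.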

We have so far been considering the case in which ‘P‘ is true and ‘Q‘ and ‘R‘ are both false. There are, however, a number of other possibilities with regard to the possible truth-values of the statement letters, ‘P‘, ‘Q‘ and ‘R‘. There are eight possibilities altogether, as shown by the following list:

\( \begin{array}{ccc}
P & Q & R\\ \hline
T & T & T\\
T & T & F\\
T & F & T\\
T & F & F\\
F & T & T\\
F & T & F\\
F & F & T\\
F & F & F
\end{array} \)

Strictly speaking, each of the eight possibilities above represents a different truth-value assignment, which can be defined as a possible assignment of truth-values T or F to the different statement letters making up a wff or series of wffs. If a wff has n distinct statement letters making it up, the number of possible truth-value assignments is 2^n. With the wff, “((P \land Q) \rightarrow \neg R)“, there are three statement letters, ‘P‘, ‘Q‘ and ‘R‘, and so there are 2^3 = 8 truth-value assignments.
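As a rough illustration, the truth-value assignments for a list of statement letters can be enumerated mechanically. The function name `assignments` is an assumption made for this sketch; it relies only on `itertools.product` from the Python standard library.

```python
from itertools import product

def assignments(letters):
    """Yield each truth-value assignment to the given statement letters as a dict."""
    for values in product([True, False], repeat=len(letters)):
        yield dict(zip(letters, values))

rows = list(assignments(["P", "Q", "R"]))
print(len(rows))  # 2 ** 3 == 8
for row in rows:
    print(" ".join("T" if row[letter] else "F" for letter in "PQR"))
```

With `True` listed before `False`, the rows come out in the same order as in the list of eight possibilities above.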

It then becomes possible to draw a chart showing how the truth-value of a given wff would be resolved for each possible truth-value assignment. We begin with a chart showing all the possible truth-value assignments for the wff, such as the one given above. Next, we write out the wff itself on the top right of our chart, with spaces between the signs. Then, for each truth-value assignment, we repeat the appropriate truth-value, ‘T’, or ‘F’, underneath the statement letters as they appear in the wff. Then, as the truth-values of those wffs that are parts of the complete wff are determined, we write their truth-values underneath the logical sign that is used to form them. The final column filled in shows the truth-value of the entire statement for each truth-value assignment. Given the importance of this column, it is usually highlighted in some way; in the chart below, it is the column under the main operator, ‘→’.

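\( \begin{array}{ccc|cccccc}
P & Q & R & ((P & \land & Q) & \rightarrow & \neg & R)\\ \hline
T & T & T & T & T & T & F & F & T\\
T & T & F & T & T & T & T & T & F\\
T & F & T & T & F & F & T & F & T\\
T & F & F & T & F & F & T & T & F\\
F & T & T & F & F & T & T & F & T\\
F & T & F & F & F & T & T & T & F\\
F & F & T & F & F & F & T & F & T\\
F & F & F & F & F & F & T & T & F
\end{array} \)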

Charts such as the one given above are called truth tables. In classical truth-functional propositional logic, a truth table constructed for a given wff in effect reveals everything logically important about that wff. The above chart tells us that the wff “((P \land Q) \rightarrow \neg R)” can only be false if ‘P‘, ‘Q‘ and ‘R‘ are all true, and is true otherwise.
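The final column of such a truth table can also be computed by brute force. The following is a minimal sketch, assuming the wff is encoded as an ordinary Python function; the names `wff`, `truth_table` and `falsifying` are hypothetical and chosen only for this example.

```python
from itertools import product

def implies(a, b):
    """Material implication."""
    return (not a) or b

def wff(p, q, r):
    """((P ∧ Q) → ¬R)"""
    return implies(p and q, not r)

def truth_table(f, n):
    """Return (assignment, value) pairs for an n-place truth-function f."""
    return [(vals, f(*vals)) for vals in product([True, False], repeat=n)]

table = truth_table(wff, 3)
for vals, result in table:
    print(" ".join("T" if v else "F" for v in vals), "|", "T" if result else "F")

# The assignments on which the wff comes out false.
falsifying = [vals for vals, result in table if not result]
print(falsifying)  # prints: [(True, True, True)]
```

Only the final column is reproduced here; the intermediate columns of the hand-drawn chart correspond to the sub-expressions `p and q` and `not r`.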

c. Definability of the Operators and the Languages PL’ and PL”

The language PL, as we have seen, contains operators that are roughly analogous to the English operators ‘and’, ‘or’, ‘if… then…’, ‘if and only if’, and ‘not’. Each of these, as we have also seen, can be thought of as representing a certain truth-function. It might be objected, however, that there are other methods of combining statements together in which the truth-value of the statement depends wholly on the truth-values of the parts, or in other words, that there are truth-functions besides conjunction, (inclusive) disjunction, material implication, material equivalence and negation. For example, we noted earlier that the sign ‘\lor‘ is used analogously to ‘or’ in the inclusive sense, which means that language PL has no simple sign for ‘or’ in the exclusive sense. It might be thought, then, that the language PL is incomplete without the addition of a further symbol, say ‘\veebar‘, such that \ulcorner (\alpha \veebar \beta) \urcorner would be regarded as true if \alpha is true and \beta is false, or \alpha is false and \beta is true, but would be regarded as false if either both \alpha and \beta are true or both \alpha and \beta are false.

However, a possible response to this objection would be to make note that while language PL does not include a simple sign for this exclusive sense of disjunction, it is possible, using the symbols that are included in PL, to construct a statement that is true in exactly the same circumstances. Consider, for example, a statement of the form \ulcorner \neg (\alpha \leftrightarrow \beta) \urcorner. It is easily shown, using a truth table, that any wff of this form would have the same truth-value as a would-be statement using the operator ‘\veebar‘. See the following chart:

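\( \begin{array}{cc|cccc}
\alpha & \beta & \neg & (\alpha & \leftrightarrow & \beta)\\ \hline
T & T & F & T & T & T\\
T & F & T & T & F & F\\
F & T & T & F & F & T\\
F & F & F & F & T & F
\end{array} \)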

Here we see that a wff of the form \ulcorner \neg (\alpha \leftrightarrow \beta) \urcorner is true if either \alpha or \beta is true but not both. This shows that PL is not lacking in any way by not containing a sign ‘\veebar‘. All the work that one would wish to do with this sign can be done using the signs ‘↔’ and ‘\neg‘. Indeed, one might claim that the sign ‘\veebar‘ can be defined in terms of the signs ‘↔’, and ‘\neg‘, and then use the form \ulcorner (\alpha \veebar \beta) \urcorner as an abbreviation of a wff of the form \ulcorner \neg (\alpha \leftrightarrow \beta) \urcorner, without actually expanding the primitive vocabulary of language PL.
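The agreement between the two can be checked mechanically. In this sketch the chart for the would-be operator ‘\veebar‘ is written out as a Python dictionary exactly as described above; the names `VEEBAR` and `not_iff` are assumptions made for illustration.

```python
from itertools import product

# The chart described in the text for the would-be operator of exclusive disjunction.
VEEBAR = {
    (True, True): False,
    (True, False): True,
    (False, True): True,
    (False, False): False,
}

def not_iff(a, b):
    """¬(α ↔ β): the biconditional is true exactly when both sides match."""
    return not (a == b)

agree = all(VEEBAR[(a, b)] == not_iff(a, b)
            for a, b in product([True, False], repeat=2))
print(agree)  # prints: True
```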

The signs ‘\land‘, ‘\lor‘, ‘→’, ‘↔’ and ‘\neg‘, were chosen as the operators to include in PL because they correspond (roughly) to the sorts of truth-functional operators that are most often used in ordinary discourse and reasoning. However, given the preceding discussion, it is natural to ask whether or not some operators on this list can be defined in terms of the others. It turns out that they can. In fact, if for some reason we wished our logical language to have a more limited vocabulary, it is possible to get by using only the signs ‘\neg‘ and ‘→’, and define all other possible truth-functions in virtue of them. Consider, for example, the following truth table for statements of the form \ulcorner \neg (\alpha \rightarrow \neg \beta) \urcorner:

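\( \begin{array}{cc|ccccc}
\alpha & \beta & \neg & (\alpha & \rightarrow & \neg & \beta)\\ \hline
T & T & T & T & F & F & T\\
T & F & F & T & T & T & F\\
F & T & F & F & T & F & T\\
F & F & F & F & T & T & F
\end{array} \)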

We can see from the above that a wff of the form \ulcorner \neg (\alpha \rightarrow \neg \beta) \urcorner always has the same truth-value as the corresponding statement of the form \ulcorner (\alpha \land \beta) \urcorner. This shows that the sign ‘\land‘ can in effect be defined using the signs ‘\neg‘ and ‘→’.

Next, consider the truth table for statements of the form \ulcorner (\neg \alpha \rightarrow \beta) \urcorner:

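\( \begin{array}{cc|cccc}
\alpha & \beta & (\neg & \alpha & \rightarrow & \beta)\\ \hline
T & T & F & T & T & T\\
T & F & F & T & T & F\\
F & T & T & F & T & T\\
F & F & T & F & F & F
\end{array} \)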

Here we can see that a statement of the form \ulcorner (\neg \alpha \rightarrow \beta) \urcorner always has the same truth-value as the corresponding statement of the form \ulcorner (\alpha \lor \beta) \urcorner. Again, this shows that the sign ‘\lor‘ could in effect be defined using the signs ‘→’ and ‘\neg‘.

Lastly, consider the truth table for a statement of the form \ulcorner \neg (( \alpha \rightarrow \beta) \rightarrow \neg (\beta \rightarrow \alpha)) \urcorner:

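\( \begin{array}{cc|ccccccccc}
\alpha & \beta & \neg & ((\alpha & \rightarrow & \beta) & \rightarrow & \neg & (\beta & \rightarrow & \alpha))\\ \hline
T & T & T & T & T & T & F & F & T & T & T\\
T & F & F & T & F & F & T & F & F & T & T\\
F & T & F & F & T & T & T & T & T & F & F\\
F & F & T & F & T & F & F & F & F & T & F
\end{array} \)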

From the above, we see that a statement of the form \ulcorner \neg (( \alpha \rightarrow \beta) \rightarrow \neg (\beta \rightarrow \alpha)) \urcorner always has the same truth-value as the corresponding statement of the form \ulcorner (\alpha \leftrightarrow \beta) \urcorner. In effect, therefore, we have shown that the remaining operators of PL can all be defined in virtue of ‘→’, and ‘\neg‘, and that, if we wished, we could do away with the operators, ‘\land‘, ‘\lor‘ and ‘↔’, and simply make do with those equivalent expressions built up entirely from ‘→’ and ‘\neg‘.

Let us call the language that results from this simplification PL’. While the definition of a statement letter remains the same for PL’ as for PL, the definition of a well-formed formula (wff) for PL’ can be greatly simplified. In effect, it can be stated as follows:

Definition: A well-formed formula (or wff) of PL’ is defined recursively as follows:

  1. Any statement letter is a well-formed formula.
  2. If \alpha is a well-formed formula, then so is \ulcorner \neg \alpha \urcorner.
  3. If \alpha and \beta are well-formed formulas, then so is \ulcorner (\alpha \rightarrow \beta) \urcorner.
  4. Nothing that cannot be constructed by successive steps of (1)-(3) is a well-formed formula.

Strictly speaking, then, the language PL’ does not contain any statements using the operators ‘\lor‘, ‘\land‘, or ‘↔’. One could however, utilize conventions such that, in language PL’, an expression of the form \ulcorner (\alpha \land \beta) \urcorner is to be regarded as a mere abbreviation or short-hand for the corresponding statement of the form \ulcorner \neg (\alpha \rightarrow \neg \beta) \urcorner, and similarly that expressions of the forms \ulcorner (\alpha \lor \beta) \urcorner and \ulcorner (\alpha \leftrightarrow \beta) \urcorner are to be regarded as abbreviations of expressions of the forms \ulcorner (\neg \alpha \rightarrow \beta) \urcorner and \ulcorner \neg (( \alpha \rightarrow \beta) \rightarrow \neg (\beta \rightarrow \alpha)) \urcorner, respectively. In effect, this means that it is possible to translate any wff of language PL into an equivalent wff of language PL’.
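The translation from PL into PL’ can be sketched as a short recursive procedure. The encoding of wffs as nested tuples, and the names `to_pl_prime`, `evaluate` and `operators`, are assumptions made purely for this illustration; only the three abbreviation conventions themselves come from the text.

```python
from itertools import product

# Hypothetical encoding of wffs as nested tuples:
#   "P"                                   a statement letter
#   ("not", a)                            negation
#   ("and" | "or" | "imp" | "iff", a, b)  binary operators

def to_pl_prime(f):
    """Rewrite a wff of PL so that it uses only 'not' and 'imp'."""
    if isinstance(f, str):
        return f
    op = f[0]
    if op == "not":
        return ("not", to_pl_prime(f[1]))
    a, b = to_pl_prime(f[1]), to_pl_prime(f[2])
    if op == "imp":
        return ("imp", a, b)
    if op == "and":   # ¬(α → ¬β)
        return ("not", ("imp", a, ("not", b)))
    if op == "or":    # (¬α → β)
        return ("imp", ("not", a), b)
    if op == "iff":   # ¬((α → β) → ¬(β → α))
        return ("not", ("imp", ("imp", a, b), ("not", ("imp", b, a))))
    raise ValueError(f"unknown operator: {op}")

def evaluate(f, v):
    """Truth-value of wff f under the assignment v (a dict from letters to bools)."""
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not evaluate(f[1], v)
    a, b = evaluate(f[1], v), evaluate(f[2], v)
    return {"and": a and b, "or": a or b, "imp": (not a) or b, "iff": a == b}[f[0]]

def operators(f):
    """The set of operator names occurring in f."""
    if isinstance(f, str):
        return set()
    return {f[0]}.union(*(operators(g) for g in f[1:]))

original = ("iff", ("and", "P", "Q"), ("or", "P", ("not", "Q")))
translated = to_pl_prime(original)
print(operators(translated) <= {"not", "imp"})  # prints: True
print(all(evaluate(original, dict(zip("PQ", vals))) ==
          evaluate(translated, dict(zip("PQ", vals)))
          for vals in product([True, False], repeat=2)))  # prints: True
```

The second check confirms, for this one example, that the translated wff has the same truth-value as the original on every truth-value assignment.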

In Section VII, it is proven that not only are the operators ‘\neg‘ and ‘→’ sufficient for defining every truth-functional operator included in language PL, but also that they are sufficient for defining any imaginable truth-functional operator in classical propositional logic.

Nevertheless, the choice of ‘\neg‘ and ‘→’ for the primitive signs used in language PL’ is to some extent arbitrary. It would also have been possible to define all other operators of PL (including ‘→’) using the signs ‘\neg‘ and ‘\lor‘. On this approach, \ulcorner (\alpha \land \beta) \urcorner would be defined as \ulcorner \neg (\neg \alpha \lor \neg \beta) \urcorner, \ulcorner (\alpha \rightarrow \beta) \urcorner would be defined as \ulcorner (\neg \alpha \lor \beta) \urcorner, and \ulcorner (\alpha \leftrightarrow \beta) \urcorner would be defined as \ulcorner \neg (\neg(\neg \alpha \lor \beta) \lor \neg (\neg \beta \lor \alpha)) \urcorner. Similarly, we could instead have begun with ‘\neg‘ and ‘\land‘ as our starting operators. On this way of proceeding, \ulcorner (\alpha \lor \beta) \urcorner would be defined as \ulcorner \neg (\neg \alpha \land \neg \beta) \urcorner, \ulcorner (\alpha \rightarrow \beta) \urcorner would be defined as \ulcorner \neg (\alpha \land \neg \beta) \urcorner, and \ulcorner (\alpha \leftrightarrow \beta) \urcorner would be defined as \ulcorner (\neg (\alpha \land \neg \beta) \land \neg (\beta \land \neg \alpha)) \urcorner.
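These alternative definitions can be checked against the charts for the operators of PL. In the sketch below, each definition is written using only Python's `not` together with `or` (or only `not` together with `and`), and its final truth-table column is compared with the expected one; the names `column`, `via_not_or`, `via_not_and` and `EXPECTED` are hypothetical.

```python
from itertools import product

PAIRS = list(product([True, False], repeat=2))  # rows TT, TF, FT, FF

def column(f):
    """Final truth-table column of a two-place truth-function, as a T/F string."""
    return "".join("T" if f(a, b) else "F" for a, b in PAIRS)

# Using only negation and disjunction.
via_not_or = {
    "and": lambda a, b: not ((not a) or (not b)),
    "imp": lambda a, b: (not a) or b,
    "iff": lambda a, b: not ((not ((not a) or b)) or (not ((not b) or a))),
}
# Using only negation and conjunction.
via_not_and = {
    "or":  lambda a, b: not ((not a) and (not b)),
    "imp": lambda a, b: not (a and (not b)),
    "iff": lambda a, b: (not (a and (not b))) and (not (b and (not a))),
}
# Final columns read off the charts for the operators of PL.
EXPECTED = {"and": "TFFF", "or": "TTTF", "imp": "TFTT", "iff": "TFFT"}

for name, defs in (("not/or", via_not_or), ("not/and", via_not_and)):
    for op, f in defs.items():
        print(name, op, column(f), column(f) == EXPECTED[op])
```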

There are, as we have seen, multiple different ways of reducing all truth-functional operators down to two primitives. There are also two ways of reducing all truth-functional operators down to a single primitive operator, but they require using an operator that is not included in language PL as primitive. On one approach, we utilize an operator written ‘|’, and explain the truth-function corresponding to this sign by means of the following chart:

\( \begin{array}{cc|c}
\alpha & \beta & (\alpha | \beta)\\ \hline
T & T & F\\
T & F & T\\
F & T & T\\
F & F & T
\end{array} \)

Here we can see that a statement of the form \ulcorner (\alpha | \beta) \urcorner is false if both \alpha and \beta are true, and true otherwise. For this reason one might read ‘|’ as akin to the English expression, “Not both … and …”. Indeed, it is possible to represent this truth-function in language PL using an expression of the form, \ulcorner \neg (\alpha \land \beta) \urcorner. However, since it is our intention to show that all other truth-functional operators, including ‘\neg‘ and ‘\land‘ can be derived from ‘|’, it is better not to regard the meanings of ‘\neg‘ and ‘\land‘ as playing a part of the meaning of ‘|’, and instead attempt (however counterintuitive it may seem) to regard ‘|’ as conceptually prior to ‘\neg‘ and ‘\land‘.

The sign ‘|’ is called the Sheffer stroke, and is named after H. M. Sheffer, who first publicized the result that all truth-functional connectives could be defined in virtue of a single operator in 1913.

We can then see that the connective ‘\land‘ can be defined in virtue of ‘|’, because an expression of the form \ulcorner ((\alpha | \beta) | (\alpha | \beta)) \urcorner generates the following truth table, and hence is equivalent to the corresponding expression of the form \ulcorner (\alpha \land \beta) \urcorner:

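\( \begin{array}{cc|ccccccc}
\alpha & \beta & ((\alpha & | & \beta) & | & (\alpha & | & \beta))\\ \hline
T & T & T & F & T & T & T & F & T\\
T & F & T & T & F & F & T & T & F\\
F & T & F & T & T & F & F & T & T\\
F & F & F & T & F & F & F & T & F
\end{array} \)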

Similarly, we can define the operator ‘\lor‘ using ‘|’ by noting that an expression of the form \ulcorner ((\alpha | \alpha) | (\beta | \beta)) \urcorner always has the same truth-value as the corresponding statement of the form \ulcorner (\alpha \lor \beta) \urcorner:

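\( \begin{array}{cc|ccccccc}
\alpha & \beta & ((\alpha & | & \alpha) & | & (\beta & | & \beta))\\ \hline
T & T & T & F & T & T & T & F & T\\
T & F & T & F & T & T & F & T & F\\
F & T & F & T & F & T & T & F & T\\
F & F & F & T & F & F & F & T & F
\end{array} \)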

The following truth table shows that a statement of the form \ulcorner (\alpha | (\beta | \beta)) \urcorner always has the same truth table as a statement of the form \ulcorner (\alpha \rightarrow \beta) \urcorner:

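\( \begin{array}{cc|ccccc}
\alpha & \beta & (\alpha & | & (\beta & | & \beta))\\ \hline
T & T & T & T & T & F & T\\
T & F & T & F & F & T & F\\
F & T & F & T & T & F & T\\
F & F & F & T & F & T & F
\end{array} \)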

Although far from intuitively obvious, the following table shows that an expression of the form \ulcorner (((\alpha | \alpha) | (\beta | \beta)) | (\alpha | \beta)) \urcorner always has the same truth-value as the corresponding wff of the form \ulcorner (\alpha \leftrightarrow \beta) \urcorner:

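\( \begin{array}{cc|ccccccccccc}
\alpha & \beta & (((\alpha & | & \alpha) & | & (\beta & | & \beta)) & | & (\alpha & | & \beta))\\ \hline
T & T & T & F & T & T & T & F & T & T & T & F & T\\
T & F & T & F & T & T & F & T & F & F & T & T & F\\
F & T & F & T & F & T & T & F & T & F & F & T & T\\
F & F & F & T & F & F & F & T & F & T & F & T & F
\end{array} \)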

This leaves only the sign ‘\neg‘, which is perhaps the easiest to define using ‘|’, as clearly \ulcorner (\alpha | \alpha) \urcorner, or, roughly, “not both \alpha and \alpha“, has the opposite truth-value from \alpha itself:

\( \begin{array}{c|ccc}
\alpha & (\alpha & | & \alpha)\\ \hline
T & T & F & T\\
F & F & T & F
\end{array} \)

If, therefore, we desire a language for use in studying propositional logic that has as small a vocabulary as possible, we might suggest using a language that employs the sign ‘|’ as its sole primitive operator, and defines all other truth-functional operators in virtue of it. Let us call such a language PL”. PL” differs from PL and PL’ only in that its definition of a well-formed formula can be simplified even further:

Definition: A well-formed formula (or wff) of PL” is defined recursively as follows:

  1. Any statement letter is a well-formed formula.
  2. If \alpha and \beta are well-formed formulas, then so is \ulcorner (\alpha | \beta) \urcorner.
  3. Nothing that cannot be constructed by successive steps of (1)-(2) is a well-formed formula.

In language PL”, strictly speaking, ‘|’ is the only operator. However, for reasons that should be clear from the above, any expression from language PL that involves any of the operators ‘\neg‘, ‘\land‘, ‘\lor‘, ‘→’, or ‘↔’ could be translated into language PL” without the loss of any of its important logical properties. In effect, statements using these signs could be regarded as abbreviations or shorthand expressions for wffs of PL” that only use the operator ‘|’.
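The five definitions in terms of ‘|’ given above can be verified together. In this sketch the Sheffer stroke is taken as primitive by writing its chart out as a dictionary, so that nothing in the definitions depends on Python's own `not`, `and` or `or`; the names `STROKE`, `s`, `definitions` and `EXPECTED` are assumptions made for illustration.

```python
from itertools import product

PAIRS = list(product([True, False], repeat=2))  # rows TT, TF, FT, FF

# Chart for the Sheffer stroke, taken as primitive.
STROKE = {(True, True): False, (True, False): True,
          (False, True): True, (False, False): True}

def s(a, b):
    """(a | b), looked up directly in the chart."""
    return STROKE[(a, b)]

definitions = {
    "not a":   lambda a, b: s(a, a),
    "a and b": lambda a, b: s(s(a, b), s(a, b)),
    "a or b":  lambda a, b: s(s(a, a), s(b, b)),
    "a -> b":  lambda a, b: s(a, s(b, b)),
    "a <-> b": lambda a, b: s(s(s(a, a), s(b, b)), s(a, b)),
}
# Final columns of the corresponding operators of PL, rows TT, TF, FT, FF.
EXPECTED = {"not a": "FFTT", "a and b": "TFFF", "a or b": "TTTF",
            "a -> b": "TFTT", "a <-> b": "TFFT"}

def column(f):
    return "".join("T" if f(a, b) else "F" for a, b in PAIRS)

results = {name: column(f) for name, f in definitions.items()}
print(results == EXPECTED)  # prints: True
```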

Even here, the choice of ‘|’ as the sole primitive is to some extent arbitrary. It would also be possible to reduce all truth-functional operators down to a single primitive by making use of a sign ‘\downarrow‘, treating it as roughly equivalent to the English expression, “neither … nor …”, so that the corresponding chart would be drawn as follows:

\( \begin{array}{cc|c}
\alpha & \beta & (\alpha \downarrow \beta)\\ \hline
T & T & F\\
T & F & F\\
F & T & F\\
F & F & T
\end{array} \)

If we were to use ‘\downarrow‘ as our sole operator, we could again define all the others. \ulcorner \neg \alpha \urcorner would be defined as \ulcorner (\alpha \downarrow \alpha) \urcorner; \ulcorner (\alpha \lor \beta) \urcorner would be defined as \ulcorner ((\alpha \downarrow \beta) \downarrow (\alpha \downarrow \beta)) \urcorner; \ulcorner (\alpha \land \beta) \urcorner would be defined as \ulcorner ((\alpha \downarrow \alpha) \downarrow (\beta \downarrow \beta)) \urcorner; and similarly for the other operators. The sign ‘\downarrow‘ is sometimes also referred to as the Sheffer stroke, and is also called the Peirce/Sheffer dagger.
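A similar check can be run for ‘\downarrow‘. The first three definitions below are the ones stated above; the definitions of `imp` and `iff` are not given in the text and are only one possible way of building the remaining operators, assumed here for the sake of the sketch. All function names are hypothetical.

```python
from itertools import product

PAIRS = list(product([True, False], repeat=2))  # rows TT, TF, FT, FF

# Chart for the dagger ("neither ... nor ..."), taken as primitive.
DAGGER = {(True, True): False, (True, False): False,
          (False, True): False, (False, False): True}

def nor(a, b):
    return DAGGER[(a, b)]

# Definitions stated in the text.
def neg(a):
    return nor(a, a)

def disj(a, b):
    return nor(nor(a, b), nor(a, b))

def conj(a, b):
    return nor(nor(a, a), nor(b, b))

# Assumed for this sketch (not from the text): the remaining operators,
# obtained by composing the definitions above.
def imp(a, b):
    return disj(neg(a), b)

def iff(a, b):
    return conj(imp(a, b), imp(b, a))

def column(f):
    return "".join("T" if f(a, b) else "F" for a, b in PAIRS)

print(column(disj), column(conj), column(imp), column(iff))
# prints: TTTF TFFF TFTT TFFT
```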

Depending on one’s purposes in studying propositional logic, sometimes it makes sense to use a rich language like PL with more primitive operators, and sometimes it makes sense to use a relatively sparse language such as PL’ or PL” with fewer primitive operators. The advantage of the former approach is that it conforms better with our ordinary reasoning and thinking habits; the advantage of the latter is that it simplifies the logical language, which makes certain interesting results regarding the deductive systems making use of the language easier to prove.

For the remainder of this article, we shall primarily be concerned with the logical properties of statements formed in the richer language PL. However, we shall consider a system making use of language PL’ in some detail in Section VI, and shall also make brief mention of a system making use of language PL”.

4. Tautologies, Logical Equivalence and Validity

Truth-functional propositional logic concerns itself only with those ways of combining statements to form more complicated statements in which the truth-values of the complicated statements depend entirely on the truth-values of the parts. Owing to this, all those features of a complex statement that are studied in propositional logic derive from the way in which their truth-values are derived from those of their parts. These features are therefore always represented in the truth table for a given statement.

Some complex statements have the interesting feature that they would be true regardless of the truth-values of the simple statements making them up. A simple example would be the wff “P \lor \neg P“; that is, “P or not P“. It is fairly easy to see that this statement is true regardless of whether ‘P‘ is true or ‘P‘ is false. This is also shown by its truth table:

\( \begin{array}{c|cccc}
P & P & \lor & \neg & P\\ \hline
T & T & T & F & T\\
F & F & T & T & F
\end{array} \)

There are, however, statements for which this is true but it is not so obvious. Consider the wff, “R \rightarrow ((P \rightarrow Q) \lor \neg (R \rightarrow Q))“. This wff also comes out as true regardless of the truth-values of ‘P‘, ‘Q‘ and ‘R‘.

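\( \begin{array}{ccc|cccccccccc}
P & Q & R & R & \rightarrow & ((P & \rightarrow & Q) & \lor & \neg & (R & \rightarrow & Q))\\ \hline
T & T & T & T & T & T & T & T & T & F & T & T & T\\
T & T & F & F & T & T & T & T & T & F & F & T & T\\
T & F & T & T & T & T & F & F & T & T & T & F & F\\
T & F & F & F & T & T & F & F & F & F & F & T & F\\
F & T & T & T & T & F & T & T & T & F & T & T & T\\
F & T & F & F & T & F & T & T & T & F & F & T & T\\
F & F & T & T & T & F & T & F & T & T & T & F & F\\
F & F & F & F & T & F & T & F & T & F & F & T & F
\end{array} \)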

Statements that have this interesting feature are called tautologies. Let us define this notion precisely.

Definition: a wff is a tautology if and only if it is true for all possible truth-value assignments to the statement letters making it up.

Tautologies are also sometimes called logical truths or truths of logic because tautologies can be recognized as true solely in virtue of the principles of propositional logic, and without recourse to any additional information.

On the other side of the spectrum from tautologies are statements that come out as false regardless of the truth-values of the simple statements making them up. A simple example of such a statement would be the wff “P \land \neg P“; clearly such a statement cannot be true, as it contradicts itself. This is revealed by its truth table:

\( \begin{array}{c|cccc}
P & P & \land & \neg & P\\ \hline
T & T & F & F & T\\
F & F & F & T & F
\end{array} \)

To state this precisely:

Definition: a wff is a self-contradiction if and only if it is false for all possible truth-value assignments to the statement letters making it up.

Another, more interesting, example of a self-contradiction is the statement “\neg (P \rightarrow Q) \land \neg (Q \rightarrow P)“; this is not as obviously self-contradictory. However, we can see that it is when we consider its truth table:

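\( \begin{array}{cc|ccccccccc}
P & Q & \neg & (P & \rightarrow & Q) & \land & \neg & (Q & \rightarrow & P)\\ \hline
T & T & F & T & T & T & F & F & T & T & T\\
T & F & T & T & F & F & F & F & F & T & T\\
F & T & F & F & T & T & F & T & T & F & F\\
F & F & F & F & T & F & F & F & F & T & F
\end{array} \)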

A statement that is neither self-contradictory nor tautological is called a contingent statement. A contingent statement is true for some truth-value assignments to its statement letters and false for others. The truth table for a contingent statement reveals which truth-value assignments make it come out as true, and which make it come out as false. Consider the truth table for the statement “(P \rightarrow Q) \land (P \rightarrow \neg Q)“:

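\( \begin{array}{cc|cccccccc}
P & Q & (P & \rightarrow & Q) & \land & (P & \rightarrow & \neg & Q)\\ \hline
T & T & T & T & T & F & T & F & F & T\\
T & F & T & F & F & F & T & T & T & F\\
F & T & F & T & T & T & F & T & F & T\\
F & F & F & T & F & T & F & T & T & F
\end{array} \)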

We can see that of the four possible truth-value assignments for this statement, two make it come out as true, and two make it come out as false. Specifically, the statement is true when ‘P‘ is false and ‘Q‘ is true, and when ‘P‘ is false and ‘Q‘ is false, and the statement is false when ‘P‘ is true and ‘Q‘ is true and when ‘P‘ is true and ‘Q‘ is false.
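The three-way classification into tautologies, self-contradictions and contingent statements lends itself to a simple brute-force test. The following is a sketch only, assuming each wff is encoded as a Python function of its statement letters; `classify` and `examples` are names invented for this illustration.

```python
from itertools import product

def implies(a, b):
    """Material implication."""
    return (not a) or b

def classify(f, n):
    """Classify an n-place truth-function by inspecting every assignment."""
    values = [f(*vals) for vals in product([True, False], repeat=n)]
    if all(values):
        return "tautology"
    if not any(values):
        return "self-contradiction"
    return "contingent"

# The example wffs discussed in this section, with their number of letters.
examples = {
    "P v ~P": (lambda p: p or not p, 1),
    "R -> ((P -> Q) v ~(R -> Q))":
        (lambda p, q, r: implies(r, implies(p, q) or not implies(r, q)), 3),
    "P & ~P": (lambda p: p and not p, 1),
    "~(P -> Q) & ~(Q -> P)":
        (lambda p, q: (not implies(p, q)) and (not implies(q, p)), 2),
    "(P -> Q) & (P -> ~Q)":
        (lambda p, q: implies(p, q) and implies(p, not q), 2),
}
for name, (f, n) in examples.items():
    print(name, "is a", classify(f, n))
```

The first two examples are reported as tautologies, the next two as self-contradictions, and the last as contingent, in agreement with the truth tables above.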

Truth tables are also useful in studying logical relationships that hold between two or more statements. For example, two statements are said to be consistent when it is possible for both to be true, and are said to be inconsistent when it is not possible for both to be true. In propositional logic, we can make this more precise as follows.

Definition: two wffs are consistent if and only if there is at least one possible truth-value assignment to the statement letters making them up that makes both wffs true.

Definition: two wffs are inconsistent if and only if there is no truth-value assignment to the statement letters making them up that makes them both true.

Whether or not two statements are consistent can be determined by means of a combined truth table for the two statements. For example, the two statements, “P \lor Q” and “\neg (P \leftrightarrow \neg Q)” are consistent:

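\( \begin{array}{cc|ccc|ccccc}
P & Q & P & \lor & Q & \neg & (P & \leftrightarrow & \neg & Q)\\ \hline
T & T & T & T & T & T & T & F & F & T\\
T & F & T & T & F & F & T & T & T & F\\
F & T & F & T & T & F & F & T & F & T\\
F & F & F & F & F & T & F & F & T & F
\end{array} \)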

Here, we see that there is one truth-value assignment, that in which both ‘P‘ and ‘Q‘ are true, that makes both “P \lor Q” and “\neg (P \leftrightarrow \neg Q)” true. However, the statements “(P \rightarrow Q) \land P” and “\neg (Q \lor \neg P)” are inconsistent, because there is no truth-value assignment in which both come out as true.

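\( \begin{array}{cc|ccccc|ccccc}
P & Q & (P & \rightarrow & Q) & \land & P & \neg & (Q & \lor & \neg & P)\\ \hline
T & T & T & T & T & T & T & F & T & T & F & T\\
T & F & T & F & F & F & T & T & F & F & F & T\\
F & T & F & T & T & F & F & F & T & T & T & F\\
F & F & F & T & F & F & F & F & F & T & T & F
\end{array} \)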
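The definitions of consistency and inconsistency can be tested in the same brute-force fashion. This is a minimal sketch under the assumption that each wff is given as a Python function of ‘P‘ and ‘Q‘; the function name `consistent` is hypothetical.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def iff(a, b):
    return a == b

def consistent(f, g, n):
    """True if at least one truth-value assignment makes both wffs true."""
    return any(f(*v) and g(*v) for v in product([True, False], repeat=n))

# "P v Q" and "~(P <-> ~Q)"
pair1 = consistent(lambda p, q: p or q,
                   lambda p, q: not iff(p, not q), 2)
# "(P -> Q) & P" and "~(Q v ~P)"
pair2 = consistent(lambda p, q: implies(p, q) and p,
                   lambda p, q: not (q or not p), 2)
print(pair1, pair2)  # prints: True False
```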

Another relationship that can hold between two statements is that of having the same truth-value regardless of the truth-values of the simple statements making them up. Consider a combined truth table for the wffs “\neg P \rightarrow \neg Q” and “\neg (Q \land \neg P)“:

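\( \begin{array}{cc|ccccc|ccccc}
P & Q & \neg & P & \rightarrow & \neg & Q & \neg & (Q & \land & \neg & P)\\ \hline
T & T & F & T & T & F & T & T & T & F & F & T\\
T & F & F & T & T & T & F & T & F & F & F & T\\
F & T & T & F & F & F & T & F & T & T & T & F\\
F & F & T & F & T & T & F & T & F & F & T & F
\end{array} \)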

Here we see that these two statements necessarily have the same truth-value.

Definition: two statements are said to be logically equivalent if and only if every possible truth-value assignment to the statement letters making them up gives the two statements the same truth-value.

The above statements are logically equivalent. However, the truth table given above for the statements “P \lor Q” and “\neg (P \leftrightarrow \neg Q)” shows that they, on the other hand, are not logically equivalent, because they differ in truth-value for three of the four possible truth-value assignments.
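Logical equivalence can be checked by collecting the truth-value assignments on which two wffs differ; the wffs are equivalent just in case that collection is empty. The name `differences` and the encoding of wffs as Python functions are assumptions of this sketch.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def differences(f, g, n):
    """Assignments on which the two wffs receive different truth-values."""
    return [v for v in product([True, False], repeat=n) if f(*v) != g(*v)]

# "~P -> ~Q" and "~(Q & ~P)"
d1 = differences(lambda p, q: implies(not p, not q),
                 lambda p, q: not (q and not p), 2)
# "P v Q" and "~(P <-> ~Q)"
d2 = differences(lambda p, q: p or q,
                 lambda p, q: not (p == (not q)), 2)
print(len(d1), len(d2))  # prints: 0 3
```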

Finally, and perhaps most importantly, truth tables can be utilized to determine whether or not an argument is logically valid. In general, an argument is said to be logically valid whenever it has a form that makes it impossible for the conclusion to be false if the premises are true. (See the encyclopedia article on “Validity and Soundness“.) In classical propositional logic, we can give this a more precise characterization.

Definition: a wff \beta is said to be a logical consequence of a set of wffs \alpha_1, \alpha_2, ..., \alpha_n, if and only if there is no truth-value assignment to the statement letters making up these wffs that makes all of \alpha_1, \alpha_2, ..., \alpha_n true but does not make \beta true.

An argument is logically valid if and only if its conclusion is a logical consequence of its premises. If an argument whose conclusion is \beta and whose only premise is \alpha is logically valid, then \alpha is said to logically imply \beta.

For example, consider the following argument:

\( \begin{array}{l} P \rightarrow Q\\ \neg Q \rightarrow P\\ \hline Q \end{array} \)

We can test the validity of this argument by constructing a combined truth table for all three statements.

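\( \begin{array}{cc|ccc|cccc|c}
P & Q & P & \rightarrow & Q & \neg & Q & \rightarrow & P & Q\\ \hline
T & T & T & T & T & F & T & T & T & T\\
T & F & T & F & F & T & F & T & T & F\\
F & T & F & T & T & F & T & T & F & T\\
F & F & F & T & F & T & F & F & F & F
\end{array} \)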

Here we see that both premises come out as true in the case in which both ‘P‘ and ‘Q‘ are true, and in which ‘P‘ is false but ‘Q‘ is true. However, in those cases, the conclusion is also true. It is possible for the conclusion to be false, but only if one of the premises is false as well. Hence, we can see that the inference represented by this argument is truth-preserving. Contrast this with the following example:

\( \begin{array}{l} P \rightarrow Q\\ \hline \neg Q \lor \neg P \end{array} \)

Consider the truth-value assignment making both ‘P‘ and ‘Q‘ true. If we were to fill in that row of the truth table for these statements, we would see that “P \rightarrow Q” comes out as true, but “\neg Q \lor \neg P” comes out as false. Even if ‘P‘ and ‘Q‘ are not actually both true, it is possible for them to both be true, and so this form of reasoning is not truth-preserving. In other words, the argument is not logically valid, and its premise does not logically imply its conclusion.
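The definition of logical consequence suggests a direct test: search for a truth-value assignment that makes every premise true and the conclusion false. The sketch below applies this to the two arguments just discussed; `counterexamples` is a hypothetical name, and premises and conclusions are assumed to be encoded as Python functions of ‘P‘ and ‘Q‘.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def counterexamples(premises, conclusion, n):
    """Assignments making every premise true and the conclusion false."""
    return [v for v in product([True, False], repeat=n)
            if all(p(*v) for p in premises) and not conclusion(*v)]

# P -> Q, ~Q -> P, therefore Q
first = counterexamples(
    [lambda p, q: implies(p, q), lambda p, q: implies(not q, p)],
    lambda p, q: q, 2)

# P -> Q, therefore ~Q v ~P
second = counterexamples(
    [lambda p, q: implies(p, q)],
    lambda p, q: (not q) or (not p), 2)

print(first)   # prints: []
print(second)  # prints: [(True, True)]
```

An empty list means the argument is logically valid; a non-empty list exhibits the assignments that show it is not.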

One of the most striking features of truth tables is that they provide an effective procedure for determining the logical truth, or tautologyhood of any single wff, and for determining the logical validity of any argument written in the language PL. The procedure for constructing such tables is purely rote, and while the size of the tables grows exponentially with the number of statement letters involved in the wff(s) under consideration, the number of rows is always finite and so it is in principle possible to finish the table and determine a definite answer. In sum, classical propositional logic is decidable.

5. Deduction: Rules of Inference and Replacement

a. Natural Deduction

Truth tables, as we have seen, can theoretically be used to solve any question in classical truth-functional propositional logic. However, this method has its drawbacks. The size of the tables grows exponentially with the number of distinct statement letters making up the statements involved. Moreover, truth tables are alien to our normal reasoning patterns. Another method for establishing the validity of an argument exists that does not have these drawbacks: the method of natural deduction. In natural deduction an attempt is made to reduce the reasoning behind a valid argument to a series of steps each of which is intuitively justified by the premises of the argument or previous steps in the series.

Consider the following argument stated in natural language:

Either cat fur or dog fur was found at the scene of the crime. If dog fur was found at the scene of the crime, officer Thompson had an allergy attack. If cat fur was found at the scene of the crime, then Macavity is responsible for the crime. But officer Thompson didn’t have an allergy attack, and so therefore Macavity must be responsible for the crime.

The validity of this argument can be made more obvious by representing the chain of reasoning leading from the premises to the conclusion:

  1. Either cat fur was found at the scene of the crime, or dog fur was found at the scene of the crime. (Premise)
  2. If dog fur was found at the scene of the crime, then officer Thompson had an allergy attack. (Premise)
  3. If cat fur was found at the scene of the crime, then Macavity is responsible for the crime. (Premise)
  4. Officer Thompson did not have an allergy attack. (Premise)
  5. Dog fur was not found at the scene of the crime. (Follows from 2 and 4.)
  6. Cat fur was found at the scene of the crime. (Follows from 1 and 5.)
  7. Macavity is responsible for the crime. (Conclusion. Follows from 3 and 6.)

Above, we do not jump directly from the premises to the conclusion, but show how intermediate inferences are used to ultimately justify the conclusion by a step-by-step chain. Each step in the chain represents a simple, obviously valid form of reasoning. In this example, the form of reasoning exemplified in line 5 is called modus tollens, which involves deducing the negation of the antecedent of a conditional from the conditional and the negation of its consequent. The form of reasoning exemplified in line 6 is called disjunctive syllogism, and involves deducing one disjunct of a disjunction on the basis of the disjunction and the negation of the other disjunct. Lastly, the form of reasoning found at line 7 is called modus ponens, which involves deducing the truth of the consequent of a conditional given the truth of both the conditional and its antecedent. “Modus ponens” is Latin for affirming mode, and “modus tollens” is Latin for denying mode.

A system of natural deduction consists in the specification of a list of intuitively valid rules of inference for the construction of derivations or step-by-step deductions. Many equivalent systems of deduction have been given for classical truth-functional propositional logic. In what follows, we sketch one system, which is derived from the popular textbook by Irving Copi (1953). The system makes use of the language PL.

b. Rules of Inference

Here we give a list of intuitively valid rules of inference. The rules are stated in schematic form. Any inference in which any wff of language PL is substituted uniformly for the schematic letters in the forms below constitutes an instance of the rule.

Modus ponens (MP):

\( \begin{array}{l} \alpha \rightarrow \beta, \alpha\\ \hline \beta \end{array} \)

(Modus ponens is sometimes also called “modus ponendo ponens”, “detachment” or a form of “→-elimination”.)

Modus tollens (MT):

\( \begin{array}{l} \alpha \rightarrow \beta, \neg \beta\\ \hline \neg \alpha \end{array} \)

(Modus tollens is sometimes also called “modus tollendo tollens” or a form of “→-elimination”.)

Disjunctive syllogism (DS): (two forms)

\( \begin{array}{l} \alpha \lor \beta, \neg \alpha\\ \hline \beta \end{array} \)

\( \begin{array}{l} \alpha \lor \beta, \neg \beta\\ \hline \alpha \end{array} \)

(Disjunctive syllogism is sometimes also called “modus tollendo ponens” or “\lor-elimination”.)

Addition (Add): (two forms)

\( \begin{array}{l} \alpha\\ \hline \alpha \lor \beta \end{array} \)

\( \begin{array}{l} \beta\\ \hline \alpha \lor \beta \end{array} \)

(Addition is sometimes also called “disjunction introduction” or “\lor-introduction”.)

Simplification (Simp): (two forms)

\( \begin{array}{l} \alpha \land \beta\\ \hline \alpha \end{array} \)

\( \begin{array}{l} \alpha \land \beta\\ \hline \beta \end{array} \)

(Simplification is sometimes also called “conjunction elimination” or “\land-elimination”.)

Conjunction (Conj):

\( \begin{array}{l} \alpha, \beta\\ \hline \alpha \land \beta \end{array} \)

(Conjunction is sometimes also called “conjunction introduction”, “\land-introduction” or “logical multiplication”.)

Hypothetical syllogism (HS):

\( \begin{array}{l} \alpha \rightarrow \beta, \beta \rightarrow \gamma\\ \hline \alpha \rightarrow \gamma \end{array} \)

(Hypothetical syllogism is sometimes also called “chain reasoning” or “chain deduction”.)

Constructive dilemma (CD):

\( \begin{array}{l} (\alpha \rightarrow \gamma) \land (\beta \rightarrow \delta), \alpha \lor \beta\\ \hline \gamma \lor \delta \end{array} \)

Absorption (Abs):

\( \begin{array}{l} \alpha \rightarrow \beta\\ \hline \alpha \rightarrow (\alpha \land \beta) \end{array} \)
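That each of the rules listed above is truth-preserving can be confirmed with the truth-table method of the previous section. In the sketch below the schematic letters \alpha, \beta, \gamma and \delta are represented by the arguments `a`, `b`, `c` and `d`; the names `RULES` and `truth_preserving` are hypothetical, and the final check on "affirming the consequent" is an invalid form added here purely for contrast.

```python
from itertools import product

def imp(a, b):
    """Material implication."""
    return (not a) or b

# Each rule: (function returning the list of premise values, conclusion function).
# Letters a rule does not use are simply ignored.
RULES = {
    "MP":     (lambda a, b, c, d: [imp(a, b), a],         lambda a, b, c, d: b),
    "MT":     (lambda a, b, c, d: [imp(a, b), not b],     lambda a, b, c, d: not a),
    "DS-1":   (lambda a, b, c, d: [a or b, not a],        lambda a, b, c, d: b),
    "DS-2":   (lambda a, b, c, d: [a or b, not b],        lambda a, b, c, d: a),
    "Add-1":  (lambda a, b, c, d: [a],                    lambda a, b, c, d: a or b),
    "Add-2":  (lambda a, b, c, d: [b],                    lambda a, b, c, d: a or b),
    "Simp-1": (lambda a, b, c, d: [a and b],              lambda a, b, c, d: a),
    "Simp-2": (lambda a, b, c, d: [a and b],              lambda a, b, c, d: b),
    "Conj":   (lambda a, b, c, d: [a, b],                 lambda a, b, c, d: a and b),
    "HS":     (lambda a, b, c, d: [imp(a, b), imp(b, c)], lambda a, b, c, d: imp(a, c)),
    "CD":     (lambda a, b, c, d: [imp(a, c) and imp(b, d), a or b],
               lambda a, b, c, d: c or d),
    "Abs":    (lambda a, b, c, d: [imp(a, b)],            lambda a, b, c, d: imp(a, a and b)),
}

def truth_preserving(premises, conclusion):
    """True if no assignment makes all premises true and the conclusion false."""
    return all(conclusion(*v)
               for v in product([True, False], repeat=4)
               if all(premises(*v)))

for name, (premises, conclusion) in RULES.items():
    print(name, truth_preserving(premises, conclusion))

# For contrast (not one of the rules): from (a -> b) and b, infer a.
bad = truth_preserving(lambda a, b, c, d: [imp(a, b), b], lambda a, b, c, d: a)
print("affirming the consequent", bad)  # prints: affirming the consequent False
```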

c. Rules of Replacement

The nine rules of inference listed above represent ways of inferring something new from previous steps in a deduction. Many systems of natural deduction, including those initially designed by Gentzen, consist entirely of rules similar to the above. If the language of a system involves signs introduced by definition, it must also allow the substitution of a defined sign for the expression used to define it, or vice versa. Still other systems, while not making use of defined signs, allow one to make certain substitutions of expressions of one form for expressions of another form in certain cases in which the expressions in question are logically equivalent. These are called rules of replacement, and Copi’s natural deduction system invokes such rules. Strictly speaking, rules of replacement differ from inference rules, because, in a sense, when a rule of replacement is used, one is not inferring something new but merely stating what amounts to the same thing using a different combination of symbols. In some systems, rules for replacement can be derived from the inference rules, but in Copi’s system, they are taken as primitive.

Rules of replacement also differ from inference rules in other ways. Inference rules only apply when the main operators match the patterns given and only apply to entire statements. Inference rules are also strictly unidirectional: one must infer what is below the horizontal line from what is above and not vice-versa. However, replacement rules can be applied to portions of statements and not only to entire statements; moreover, they can be implemented in either direction.

The rules of replacement used by Copi are the following:

Double negation (DN):

\ulcorner \neg \neg \alpha \urcorner is interreplaceable with \alpha

(Double negation is also called “\neg-elimination”.)

Commutativity (Com): (two forms)

\ulcorner \alpha \land \beta \urcorner is interreplaceable with \ulcorner \beta \land \alpha \urcorner
\ulcorner \alpha \lor \beta \urcorner is interreplaceable with \ulcorner \beta \lor \alpha \urcorner

Associativity (Assoc): (two forms)

\ulcorner (\alpha \land \beta) \land \gamma \urcorner is interreplaceable with \ulcorner \alpha \land (\beta \land \gamma) \urcorner
\ulcorner (\alpha \lor \beta) \lor \gamma \urcorner is interreplaceable with \ulcorner \alpha \lor (\beta \lor \gamma) \urcorner

Tautology (Taut): (two forms)

\alpha is interreplaceable with \ulcorner \alpha \land \alpha \urcorner
\alpha is interreplaceable with \ulcorner \alpha \lor \alpha \urcorner

DeMorgan’s Laws (DM): (two forms)

\ulcorner \neg (\alpha \land \beta) \urcorner is interreplaceable with \ulcorner \neg \alpha \lor \neg \beta \urcorner
\ulcorner \neg (\alpha \lor \beta) \urcorner is interreplaceable with \ulcorner \neg \alpha \land \neg \beta \urcorner

Transposition (Trans):

\ulcorner \alpha \rightarrow \beta \urcorner is interreplaceable with \ulcorner \neg \beta \rightarrow \neg \alpha \urcorner

(Transposition is also sometimes called “contraposition”.)

Material Implication (Impl):

\ulcorner \alpha \rightarrow \beta \urcorner is interreplaceable with \ulcorner \neg \alpha \lor \beta \urcorner

Exportation (Exp):

\ulcorner \alpha \rightarrow (\beta \rightarrow \gamma) \urcorner is interreplaceable with \ulcorner (\alpha \land \beta) \rightarrow \gamma \urcorner

Distribution (Dist): (two forms)

\ulcorner \alpha \land (\beta \lor \gamma) \urcorner is interreplaceable with \ulcorner (\alpha \land \beta) \lor (\alpha \land \gamma) \urcorner
\ulcorner \alpha \lor (\beta \land \gamma) \urcorner is interreplaceable with \ulcorner (\alpha \lor \beta) \land (\alpha \lor \gamma) \urcorner

Material Equivalence (Equiv): (two forms)

\ulcorner \alpha \leftrightarrow \beta \urcorner is interreplaceable with \ulcorner (\alpha \rightarrow \beta) \land (\beta \rightarrow \alpha) \urcorner
\ulcorner \alpha \leftrightarrow \beta \urcorner is interreplaceable with \ulcorner (\alpha \land \beta) \lor (\neg \alpha \land \neg \beta) \urcorner

(Material equivalence is sometimes also called “biconditional introduction/elimination” or “↔-introduction/elimination”.)
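
The claim that each rule of replacement pairs logically equivalent forms can be checked by brute force over truth-value assignments. The following is a minimal sketch in Python; the names `implies`, `equivalent` and `REPLACEMENT_RULES` are illustrative choices and not part of Copi's system.

```python
from itertools import product

def implies(a, b):
    """Truth function for the material conditional."""
    return (not a) or b

# Each rule maps to a pair of truth functions (left form, right form)
# over three truth values standing in for alpha, beta, gamma.
REPLACEMENT_RULES = {
    "DN": (lambda a, b, c: not (not a), lambda a, b, c: a),
    "Com (and)": (lambda a, b, c: a and b, lambda a, b, c: b and a),
    "Com (or)": (lambda a, b, c: a or b, lambda a, b, c: b or a),
    "Assoc (and)": (lambda a, b, c: (a and b) and c, lambda a, b, c: a and (b and c)),
    "Assoc (or)": (lambda a, b, c: (a or b) or c, lambda a, b, c: a or (b or c)),
    "Taut (and)": (lambda a, b, c: a, lambda a, b, c: a and a),
    "Taut (or)": (lambda a, b, c: a, lambda a, b, c: a or a),
    "DM (and)": (lambda a, b, c: not (a and b), lambda a, b, c: (not a) or (not b)),
    "DM (or)": (lambda a, b, c: not (a or b), lambda a, b, c: (not a) and (not b)),
    "Trans": (lambda a, b, c: implies(a, b), lambda a, b, c: implies(not b, not a)),
    "Impl": (lambda a, b, c: implies(a, b), lambda a, b, c: (not a) or b),
    "Exp": (lambda a, b, c: implies(a, implies(b, c)), lambda a, b, c: implies(a and b, c)),
    "Dist (and)": (lambda a, b, c: a and (b or c), lambda a, b, c: (a and b) or (a and c)),
    "Dist (or)": (lambda a, b, c: a or (b and c), lambda a, b, c: (a or b) and (a or c)),
    "Equiv (1)": (lambda a, b, c: a == b, lambda a, b, c: implies(a, b) and implies(b, a)),
    "Equiv (2)": (lambda a, b, c: a == b, lambda a, b, c: (a and b) or ((not a) and (not b))),
}

def equivalent(left, right):
    """True if the two forms agree on all eight truth-value assignments."""
    return all(left(a, b, c) == right(a, b, c)
               for a, b, c in product([True, False], repeat=3))

results = {name: equivalent(left, right)
           for name, (left, right) in REPLACEMENT_RULES.items()}
```

Every entry of `results` should come out true, whereas a non-equivalence such as swapping the antecedent and consequent of a conditional fails the same test.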

d. Direct Deductions

A direct deduction of a conclusion from a set of premises consists of an ordered sequence of wffs such that each member of the sequence is either (1) a premise, (2) derived from previous members of the sequence by one of the inference rules, or (3) derived from a previous member of the sequence by the replacement of a logically equivalent part according to the rules of replacement, and such that the conclusion is the final step of the sequence.

To be even more precise, a direct deduction is defined as an ordered sequence of wffs, \beta_1, \beta_2, ..., \beta_n, such that for each step \beta_i where i is between 1 and n inclusive, either (1) \beta_i is a premise, (2) \beta_i matches the form given below the horizontal line for one of the 9 inference rules, and there are wffs in the sequence prior to \beta_i matching the forms given above the horizontal line, or (3) there is a previous step in the sequence \beta_j where j < i and \beta_j differs from \beta_i at most by matching or containing a part that matches one of the forms given for one of the 10 replacement rules in the same place in which \beta_i contains the wff of the corresponding form, and such that the conclusion of the argument is \beta_n.

Using line numbers and the abbreviations for the rules of the system to annotate, the chain of reasoning given above in English, when transcribed into language PL and organized as a direct deduction, would appear as follows:

1. C \lor D Premise
2. C \rightarrow O Premise
3. D \rightarrow M Premise
4. \neg O Premise
5. \neg C 2,4 MT
6. D 1,5 DS
7. M 3,6 MP

There is no unique derivation for a given conclusion from a given set of premises. Here is a distinct derivation for the same conclusion from the same premises:

1. C \lor D Premise
2. C \rightarrow O Premise
3. D \rightarrow M Premise
4. \neg O Premise
5. (C \rightarrow O) \land (D \rightarrow M) 2,3 Conj
6. O \lor M 1,5 CD
7. M 4,6 DS

Consider next the argument:

\( \begin{array}{l} P \leftrightarrow Q\\ (S \lor T) \rightarrow Q\\ \neg P \lor (\neg T \land R)\\ \hline T \rightarrow U \end{array} \)

This argument has six distinct statement letters, and hence constructing a truth table for it would require 64 rows. The table would have 22 columns, thereby requiring 1,408 distinct T/F calculations. Happily, the derivation of the conclusion from the premises using our inference and replacement rules, while far from simple, is relatively less exhausting:

1. P \leftrightarrow Q Premise
2. (S \lor T) \rightarrow Q Premise
3. \neg P \lor (\neg T \land R) Premise
4. (P \rightarrow Q) \land (Q \rightarrow P) 1 Equiv
5. Q \rightarrow P 4 Simp
6. (S \lor T) \rightarrow P 2,5 HS
7. P \rightarrow (\neg T \land R) 3 Impl
8. (S \lor T) \rightarrow (\neg T \land R) 6,7 HS
9. \neg (S \lor T) \lor (\neg T \land R) 8 Impl
10. (\neg S \land \neg T) \lor (\neg T \land R) 9 DM
11. ((\neg S \land \neg T) \lor \neg T) \land ((\neg S \land \neg T) \lor R) 10 Dist
12. (\neg S \land \neg T) \lor \neg T 11 Simp
13. \neg T \lor (\neg S \land \neg T) 12 Com
14. (\neg T \lor \neg S) \land (\neg T \lor \neg T) 13 Dist
15. \neg T \lor \neg T 14 Simp
16. \neg T 15 Taut
17. \neg T \lor U 16 Add
18. T \rightarrow U 17 Impl
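
For comparison, the truth-table test that the derivation lets us avoid by hand is easy to mechanize. The sketch below enumerates all 64 truth-value assignments and collects any that make the premises true and the conclusion false; the helper names are illustrative only.

```python
from itertools import product

def implies(a, b):
    """Truth function for the material conditional."""
    return (not a) or b

LETTERS = ["P", "Q", "R", "S", "T", "U"]

def premises(v):
    """Truth values of the three premises under assignment v."""
    return [
        v["P"] == v["Q"],                            # P <-> Q
        implies(v["S"] or v["T"], v["Q"]),           # (S v T) -> Q
        (not v["P"]) or ((not v["T"]) and v["R"]),   # ~P v (~T & R)
    ]

def conclusion(v):
    """Truth value of the conclusion T -> U under assignment v."""
    return implies(v["T"], v["U"])

# One row per truth-value assignment to the six statement letters.
rows = [dict(zip(LETTERS, values))
        for values in product([True, False], repeat=len(LETTERS))]

# A counterexample would make every premise true and the conclusion false.
counterexamples = [v for v in rows if all(premises(v)) and not conclusion(v)]
```

Here `rows` has 64 entries and `counterexamples` is empty, which agrees with the derivation: the argument is valid.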

e. Conditional and Indirect Proofs

Together the nine inference rules and ten rules of replacement are sufficient for creating a deduction for any logically valid argument, provided that the argument has at least one premise. However, to cover the limiting case of arguments with no premises, and simply to facilitate certain deductions that would be recondite otherwise, it is also customary to allow for certain methods of deduction other than direct derivation. Specifically, it is customary to allow the proof techniques known as conditional proof and indirect proof.

A conditional proof is a derivation technique used to establish a conditional wff, that is, a wff whose main operator is the sign ‘→’. This is done by constructing a sub-derivation within a derivation in which the antecedent of the conditional is assumed as a hypothesis. If, by using the inference rules and rules of replacement (and possibly additional sub-derivations), it is possible to arrive at the consequent, it is permissible to end the sub-derivation and conclude the truth of the conditional statement within the main derivation, citing the sub-derivation as a conditional proof, or ‘CP’ for short. This is much clearer by considering the following example argument:

\( \begin{array}{l} P \rightarrow (Q \lor R)\\ P \rightarrow \neg S\\ S \leftrightarrow Q\\ \hline P \rightarrow R \end{array} \)

While a direct derivation establishing the validity of this argument is possible, it is easier to establish the validity of this argument using a conditional derivation.

1. P \rightarrow (Q \lor R) Premise
2. P \rightarrow \neg S Premise
3. S \leftrightarrow Q Premise
4. P Assumption
5. Q \lor R 1,4 MP
6. \neg S 2,4 MP
7. (S \rightarrow Q) \land (Q \rightarrow S) 3 Equiv
8. Q \rightarrow S 7 Simp
9. \neg Q 6,8 MT
10. R 5,9 DS
11. P \rightarrow R 4-10 CP

Here in order to establish the conditional statement “P \rightarrow R“, we constructed a sub-derivation, which is the portion found at lines 4-10. First, we assumed the truth of ‘P‘, and found that with it, we could derive ‘R‘. Given the premises, we therefore had shown that if ‘P‘ were also true, so would be ‘R‘. Therefore, on the basis of the sub-derivation we were justified in concluding “P \rightarrow R“. This is the usual methodology used in logic and mathematics for establishing the truth of a conditional statement.

Another common method is that of indirect proof, also known as proof by reductio ad absurdum. (For a fuller discussion, see the article on reductio ad absurdum in the encyclopedia.) In an indirect proof (‘IP’ for short), our goal is to demonstrate that a certain wff is false on the basis of the premises. Again, we make use of a sub-derivation; here, we begin by assuming the opposite of that which we’re trying to prove, that is, we assume that the wff is true. If on the basis of this assumption, we can demonstrate an obvious contradiction, that is, a statement of the form \ulcorner \alpha \land \neg \alpha \urcorner, we can conclude that the assumed statement must be false, because anything that leads to a contradiction must be false.

For example, consider the following argument:

\( \begin{array}{l} P \rightarrow Q\\ P \rightarrow (Q \rightarrow \neg P)\\ \hline \neg P \end{array} \)

While, again, a direct derivation of the conclusion for this argument from the premises is possible, it is somewhat easier to prove that “\neg P” is true by showing that, given the premises, it would be impossible for ‘P‘ to be true by assuming that it is and showing this to be absurd.

1. P \rightarrow Q Premise
2. P \rightarrow (Q \rightarrow \neg P) Premise
3. P Assumption
4. Q 1,3 MP
5. Q \rightarrow \neg P 2,3 MP
6. \neg P 4,5 MP
7. P \land \neg P 3,6 Conj
8. \neg P 3-7 IP

Here we were attempting to show that “\neg P” was true given the premises. To do this we assumed instead that ‘P‘ was true. Since this assumption was impossible, we were justified in concluding that ‘P‘ is false, that is, that “\neg P” is true.

When making use of either conditional proof or indirect proof, once a sub-derivation is finished, the lines making it up cannot be used later on in the main derivation or any additional sub-derivations that may be constructed later on.

This completes our characterization of a system of natural deduction for the language PL.

The system of natural deduction just described is formally adequate in the following sense. Earlier, we defined a valid argument as one in which there is no possible truth-value assignment to the statement letters making up its premises and conclusion that makes the premises all true but the conclusion untrue. It is provable that an argument in the language of PL is formally valid in that sense if and only if it is possible to construct a derivation of the conclusion of that argument from the premises using the above rules of inference, rules of replacement and techniques of conditional and indirect proof. Space limitations preclude a full proof of this in the metalanguage, although the reasoning is very similar to that given for the axiomatic Propositional Calculus discussed in Sections VI and VII below.

Informally, it is fairly easy to see that no argument for which a deduction is possible in this system could be invalid according to truth tables. Firstly, the rules of inference are all truth-preserving. For example, in the case of modus ponens, it is fairly easy to see from the truth table for any set of statements of the appropriate form that no truth-value assignment could make both \ulcorner \alpha \rightarrow \beta \urcorner and \alpha true while making \beta false. A similar consideration applies for the others. Moreover, truth tables can easily be used to verify that statements of one of the forms mentioned in the rules of replacement are all logically equivalent with those the rule allows one to swap for them. Hence, the statements could never differ in truth-value for any truth-value assignment. In case of conditional proof, note that any truth-value assignment must make either the conditional true, or it must make the antecedent true and consequent false. The antecedent is what is assumed in a conditional proof. So, if the truth-value assignment makes both it and the premises of the argument true, because the other rules are all truth-preserving, it would be impossible to derive the consequent unless it were also true. A similar consideration justifies the use of indirect proof.
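
The truth-preservation claim for the inference rules can be spot-checked in the same mechanical way. The sketch below tests modus ponens, hypothetical syllogism, constructive dilemma and absorption, and contrasts them with the fallacy of affirming the consequent; `truth_preserving` is a hypothetical helper introduced only for this illustration.

```python
from itertools import product

def implies(a, b):
    """Truth function for the material conditional."""
    return (not a) or b

def truth_preserving(premises, conclusion, arity):
    """True if no assignment makes every premise true and the conclusion false."""
    for values in product([True, False], repeat=arity):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True

mp_ok = truth_preserving(
    [lambda a, b: implies(a, b), lambda a, b: a],
    lambda a, b: b, 2)
hs_ok = truth_preserving(
    [lambda a, b, c: implies(a, b), lambda a, b, c: implies(b, c)],
    lambda a, b, c: implies(a, c), 3)
cd_ok = truth_preserving(
    [lambda a, b, c, d: implies(a, c) and implies(b, d), lambda a, b, c, d: a or b],
    lambda a, b, c, d: c or d, 4)
abs_ok = truth_preserving(
    [lambda a, b: implies(a, b)],
    lambda a, b: implies(a, a and b), 2)

# Not a rule of the system: from alpha -> beta and beta, infer alpha.
affirming_consequent_ok = truth_preserving(
    [lambda a, b: implies(a, b), lambda a, b: b],
    lambda a, b: a, 2)
```

The four genuine rules pass, while affirming the consequent fails on the assignment making \alpha false and \beta true.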

This system represents a useful method for establishing the validity of an argument that has the advantage of coinciding more closely with the way we normally reason. (As noted earlier, however, there are many equivalent systems of natural deduction, all coinciding relatively closely to ordinary reasoning patterns.) One disadvantage this method has, however, is that, unlike truth tables, it does not provide a means for recognizing that an argument is invalid. If an argument is invalid, there is no deduction for it in the system. However, the system itself does not provide a means for recognizing when a deduction is impossible.

Another objection that might be made to the system of deduction sketched above is that it contains more rules and more techniques than it needs to. This leads us directly into our next topic.

6. Axiomatic Systems and the Propositional Calculus

The system of deduction discussed in the previous section is an example of a natural deduction system, that is, a system of deduction for a formal language that attempts to coincide as closely as possible to the forms of reasoning most people actually employ. Natural systems of deduction are typically contrasted with axiomatic systems. Axiomatic systems are minimalist systems; rather than including rules corresponding to natural modes of reasoning, they utilize as few basic principles or rules as possible. Since so few kinds of steps are available in a deduction, relatively speaking, an axiomatic system usually requires more steps for the deduction of a conclusion from a given set of premises as compared to a natural deduction system.

Typically, an axiomatic system consists in the specification of certain wffs as “axioms”. An axiom is something that is taken as a fundamental truth of the system that does not itself require proof. To allow for the deduction of results from the axioms or the premises of an argument, the system typically also includes at least one (and often only one) rule of inference. Usually, an attempt is made to limit the number of axioms to as few as possible, or at least, limit the number of forms axioms can take.

Because axiomatic systems aim to be minimal, typically they employ languages with simplified vocabularies whenever possible. For classical truth-functional propositional logic, this might involve using a simpler language such as PL’ or PL” instead of the full language PL.

For most of the remainder of this section, we shall sketch an axiomatic system for classical truth-functional propositional logic, which we shall dub the Propositional Calculus (or PC for short). The Propositional Calculus makes use of language PL’, described above. That is, the only connectives it uses are ‘→’ and ‘\neg’, and the other operators, if used at all, would be understood as shorthand abbreviations making use of the definitions discussed in Section III(c).

System PC consists of three axiom schemata, which are forms a wff fits if it is an axiom, along with a single inference rule: modus ponens. We make this more precise by specifying certain definitions.

Definition: a wff of language PL’ is an axiom of PC if and only if it is an instance of one of the following three forms:

\alpha \rightarrow (\beta \rightarrow \alpha) (Axiom Schema 1, or AS1)
(\alpha \rightarrow (\beta \rightarrow \gamma)) \rightarrow ((\alpha \rightarrow \beta) \rightarrow (\alpha \rightarrow \gamma)) (Axiom Schema 2, or AS2)
(\neg \alpha \rightarrow \neg \beta) \rightarrow ((\neg \alpha \rightarrow \beta) \rightarrow \alpha) (Axiom Schema 3, or AS3)

Note that according to this definition, every wff of the form \ulcorner \alpha \rightarrow (\beta \rightarrow \alpha) \urcorner is an axiom. This includes an infinite number of different wffs, from simple cases such as “P \rightarrow (Q \rightarrow P)“, to much more complicated cases such as “(\neg R \rightarrow \neg \neg S) \rightarrow (\neg(\neg M \rightarrow N) \rightarrow (\neg R \rightarrow \neg \neg S))“.

An ordered step-by-step deduction constitutes a derivation in system PC if and only if each step in the deduction is either (1) a premise of the argument, (2) an axiom, or (3) derived from previous steps by modus ponens. Once again we can make this more precise with the following (more recondite) definition:

Definition: an ordered sequence of wffs \beta_1, \beta_2, ..., \beta_n is a derivation in system PC of the wff \beta_n from the premises \alpha_1, \alpha_2, ..., \alpha_m if and only if, for each wff \beta_i in the sequence \beta_1, \beta_2, ..., \beta_n, either (1) \beta_i is one of the premises \alpha_1, \alpha_2, ..., \alpha_m, (2) \beta_i is an axiom of PC, or (3) \beta_i follows from previous members of the series by the inference rule modus ponens (that is, there are previous members of the sequence, \beta_j and \beta_k, such that \beta_j takes the form \ulcorner \beta_k \rightarrow \beta_i \urcorner).

For example, consider the following argument written in the language PL’:

\( \begin{array}{l} P\\ (R \rightarrow P) \rightarrow (R \rightarrow (P \rightarrow S))\\ \hline R \rightarrow S \end{array} \)

The following constitutes a derivation in system PC of the conclusion from the premises:

1. P Premise
2. (R \rightarrow P) \rightarrow (R \rightarrow (P \rightarrow S)) Premise
3. P \rightarrow (R \rightarrow P) Instance of AS1
4. R \rightarrow P 1,3 MP
5. R \rightarrow (P \rightarrow S) 2,4 MP
6. (R \rightarrow (P \rightarrow S)) \rightarrow ((R \rightarrow P) \rightarrow (R \rightarrow S)) Instance of AS2
7. (R \rightarrow P) \rightarrow (R \rightarrow S) 5,6 MP
8. R \rightarrow S 4,7 MP
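
The definition of a derivation in PC is simple enough to be checked mechanically. The following is a minimal sketch, assuming wffs are represented as nested tuples with statement letters as strings; the representation and all function names are illustrative assumptions, not part of system PC itself.

```python
def imp(a, b):
    return ("imp", a, b)

def neg(a):
    return ("not", a)

def is_imp(f):
    return isinstance(f, tuple) and f[0] == "imp"

def is_neg(f):
    return isinstance(f, tuple) and f[0] == "not"

def is_as1(f):
    # alpha -> (beta -> alpha)
    return is_imp(f) and is_imp(f[2]) and f[2][2] == f[1]

def is_as2(f):
    # (alpha -> (beta -> gamma)) -> ((alpha -> beta) -> (alpha -> gamma))
    if not (is_imp(f) and is_imp(f[1]) and is_imp(f[1][2])):
        return False
    a, b, c = f[1][1], f[1][2][1], f[1][2][2]
    return f[2] == imp(imp(a, b), imp(a, c))

def is_as3(f):
    # (~alpha -> ~beta) -> ((~alpha -> beta) -> alpha)
    if not (is_imp(f) and is_imp(f[1]) and is_neg(f[1][1]) and is_neg(f[1][2])):
        return False
    a, b = f[1][1][1], f[1][2][1]
    return f[2] == imp(imp(neg(a), b), a)

def is_axiom(f):
    return is_as1(f) or is_as2(f) or is_as3(f)

def is_derivation(steps, premises):
    """Each step must be a premise, an axiom, or follow from earlier steps by MP."""
    for i, step in enumerate(steps):
        earlier = steps[:i]
        by_mp = any(imp(minor, step) in earlier for minor in earlier)
        if not (step in premises or is_axiom(step) or by_mp):
            return False
    return True

# The eight-step derivation of R -> S given above.
P, R, S = "P", "R", "S"
premises = [P, imp(imp(R, P), imp(R, imp(P, S)))]
steps = [
    P,
    premises[1],
    imp(P, imp(R, P)),
    imp(R, P),
    imp(R, imp(P, S)),
    imp(imp(R, imp(P, S)), imp(imp(R, P), imp(R, S))),
    imp(imp(R, P), imp(R, S)),
    imp(R, S),
]
derivation_ok = is_derivation(steps, premises)
```

The checker accepts the derivation above, and it rejects the one-step sequence consisting of the conclusion alone, since that wff is neither a premise nor an axiom.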

Historically, the original axiomatic systems for logic were designed to be akin to other axiomatic systems found in mathematics, such as Euclid’s axiomatization of geometry. The goal of developing an axiomatic system for logic was to create a system in which to derive truths of logic making use only of the axioms of the system and the inference rule(s). Those wffs that can be derived from the axioms and inference rule alone, that is, without making use of any additional premises, are called theorems or theses of the system. To make this more precise:

Definition: a wff \alpha is said to be a theorem of PC if and only if there is an ordered sequence of wffs, specifically, a derivation, \beta_1, \beta_2, ..., \beta_n such that, \alpha is \beta_n and each wff \beta_i in the sequence \beta_1, \beta_2, ..., \beta_n, is such that either (1) \beta_i is an axiom of PC, or (2) \beta_i follows from previous members of the series by modus ponens.

One very simple theorem of system PC is the wff “P \rightarrow P“. We can show that it is a theorem by constructing a derivation of “P \rightarrow P” that makes use only of axioms and MP and no additional premises.

1. P \rightarrow (P \rightarrow P) Instance of AS1
2. P \rightarrow ((P \rightarrow P) \rightarrow P) Instance of AS1
3. (P \rightarrow ((P \rightarrow P) \rightarrow P)) \rightarrow ((P \rightarrow (P \rightarrow P)) \rightarrow (P \rightarrow P)) Instance of AS2
4. (P \rightarrow (P \rightarrow P)) \rightarrow (P \rightarrow P) 2,3 MP
5. P \rightarrow P 1,4 MP

It is fairly easy to see that not only is “P \rightarrow P” a theorem of PC, but so is any wff of the form \ulcorner \alpha \rightarrow \alpha \urcorner. Whatever \alpha happens to be, there will be a derivation in PC of the same form:

1. \alpha \rightarrow (\alpha \rightarrow \alpha) Instance of AS1
2. \alpha \rightarrow ((\alpha \rightarrow \alpha) \rightarrow \alpha) Instance of AS1
3. (\alpha \rightarrow ((\alpha \rightarrow \alpha) \rightarrow \alpha)) \rightarrow ((\alpha \rightarrow (\alpha \rightarrow \alpha)) \rightarrow (\alpha \rightarrow \alpha)) Instance of AS2
4. (\alpha \rightarrow (\alpha \rightarrow \alpha)) \rightarrow (\alpha \rightarrow \alpha) 2,3 MP
5. \alpha \rightarrow \alpha 1,4 MP

So even if we make \alpha in the above the more complicated wff, for example, “\neg (\neg M \rightarrow N)“, a derivation with the same form shows that “\neg (\neg M \rightarrow N) \rightarrow \neg (\neg M \rightarrow N)” is also a theorem of PC. Hence, we call \ulcorner \alpha \rightarrow \alpha \urcorner a theorem schema of PC, because all of its instances are theorems of PC. From now on, let’s call it “Theorem Schema 1”, or “TS1” for short.
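
Because the derivation has the same shape whatever \alpha is, it can be generated by a function of \alpha. The sketch below builds the five steps for an arbitrary wff, using a nested-tuple representation of wffs that is assumed here purely for illustration.

```python
def imp(a, b):
    return ("imp", a, b)

def neg(a):
    return ("not", a)

def show(f):
    """Render a wff as text, with statement letters given as strings."""
    if isinstance(f, str):
        return f
    if f[0] == "not":
        return "¬" + show(f[1])
    return "(" + show(f[1]) + " → " + show(f[2]) + ")"

def self_implication_derivation(alpha):
    """Return the five steps deriving alpha -> alpha in PC."""
    aa = imp(alpha, alpha)
    step1 = imp(alpha, aa)               # instance of AS1
    step2 = imp(alpha, imp(aa, alpha))   # instance of AS1
    step3 = imp(step2, imp(step1, aa))   # instance of AS2
    step4 = imp(step1, aa)               # from steps 2 and 3 by MP
    step5 = aa                           # from steps 1 and 4 by MP
    return [step1, step2, step3, step4, step5]

# The more complicated instance mentioned above: alpha is ~(~M -> N).
alpha = neg(imp(neg("M"), "N"))
steps = self_implication_derivation(alpha)
for number, step in enumerate(steps, start=1):
    print(number, show(step))
```

The two applications of modus ponens can be confirmed structurally: step 3 is the conditional from step 2 to step 4, and step 4 is the conditional from step 1 to step 5.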

The following are also theorem schemata of PC:

\alpha \rightarrow \neg \neg \alpha (Theorem Schema 2, or TS2)
\neg \alpha \rightarrow (\alpha \rightarrow \beta) (TS3)
\alpha \rightarrow (\neg \beta \rightarrow \neg (\alpha \rightarrow \beta)) (TS4)
(\alpha \rightarrow \beta) \rightarrow ((\neg \alpha \rightarrow \beta) \rightarrow \beta) (TS5)

You may wish to verify this for yourself by attempting to construct the appropriate proofs for each. Be warned that some require quite lengthy derivations!
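
Before attempting the derivations, it can be reassuring to confirm that each schema is at least a tautology. The sketch below does only that: it checks truth tables and does not construct any derivation in PC. The names used are illustrative.

```python
from itertools import product

def implies(a, b):
    """Truth function for the material conditional."""
    return (not a) or b

# Truth functions for TS2 through TS5, with a and b standing in for alpha and beta.
THEOREM_SCHEMATA = {
    "TS2": lambda a, b: implies(a, not (not a)),
    "TS3": lambda a, b: implies(not a, implies(a, b)),
    "TS4": lambda a, b: implies(a, implies(not b, not implies(a, b))),
    "TS5": lambda a, b: implies(implies(a, b), implies(implies(not a, b), b)),
}

def is_tautology(form):
    """True if the form comes out true on all four truth-value assignments."""
    return all(form(a, b) for a, b in product([True, False], repeat=2))

tautology_check = {name: is_tautology(form)
                   for name, form in THEOREM_SCHEMATA.items()}
```

All four schemata pass. Given the completeness result for PC reported below, passing this test guarantees that a derivation exists, although finding it is a separate task.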

It is common to use the notation:

\vdash \beta

to mean that \beta is a theorem. Similarly, it is common to use the notation:

\alpha_1, \alpha_2, ..., \alpha_m \vdash \beta

to mean that it is possible to construct a derivation of \beta making use of \alpha_1, \alpha_2, ..., \alpha_m as premises.

Considered in terms of the number of rules it employs, the axiomatic system PC is far less complex than the system of natural deduction sketched in the previous section. The natural deduction system made use of nine inference rules, ten rules of replacement and two additional proof techniques. The axiomatic system, by contrast, makes use of three axiom schemata and a single inference rule, with no additional proof techniques. Yet, the axiomatic system is not lacking in any way.

Indeed, for any argument using language PL’ that is logically valid according to truth tables it is possible to construct a derivation in system PC for that argument. Moreover, every wff of language PL’ that is a logical truth, that is, a tautology according to truth tables, is a theorem of PC. The reverse of these results is true as well; every theorem of PC is a tautology, and every argument for which a derivation in system PC exists is logically valid according to truth tables. These and other features of the Propositional Calculus are discussed, and some are even proven in the next section below.

While the Propositional Calculus is simpler in one way than the natural deduction system sketched in the previous section, in many ways it is actually more complicated to use. For any given argument, a deduction of the conclusion from the premises conducted in PC is likely to be far longer and less psychologically natural than one carried out in a natural deduction system. Such deductions are only simpler in the sense that fewer distinct rules are employed.

System PC is only one of many possible ways of axiomatizing propositional logic. Some systems differ from PC in only very minor ways. For example, we could alter our definition of “axiom” so that a wff is an axiom iff it is an instance of AS1, an instance of AS2, or an instance of the following:

(A3′) (\neg \alpha \rightarrow \neg \beta) \rightarrow (\beta \rightarrow \alpha)

Replacing axiom schema AS3 with (A3′), while altering the way certain deductions must be constructed (making the proofs of many important results longer), has little effect otherwise; the resulting system would have all the same theorems and every argument for which a deduction is possible in the system above would also have a deduction in the revised system, and vice versa.

We also noted above that, strictly speaking, there are an infinite number of axioms of system PC. Instead of utilizing an infinite number of axioms, we might alternatively have utilized only three axioms, namely, the specific wffs:

(A1*) P \rightarrow (Q \rightarrow P)
(A2*) (P \rightarrow (Q \rightarrow R)) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R))
(A3*) (\neg P \rightarrow \neg Q) \rightarrow ((\neg P \rightarrow Q) \rightarrow P)

Note that (A1*) is just a single wff; on this approach, the wff “(\neg R \rightarrow \neg \neg S) \rightarrow (\neg(\neg M \rightarrow N) \rightarrow (\neg R \rightarrow \neg \neg S))” would not count as an axiom, even though it shares a common form with (A1*). To such a system it would be necessary to add an additional inference rule, a rule of substitution or uniform replacement. This would allow one to infer, from a theorem of the system, the result of uniformly replacing any given statement letter (for example, ‘P’ or ‘Q’) that occurs within the theorem, with any wff, simple or complex, provided that the same wff replaces all occurrences of the same statement letter in the theorem. On this approach, “(\neg R \rightarrow \neg \neg S) \rightarrow (\neg(\neg M \rightarrow N) \rightarrow (\neg R \rightarrow \neg \neg S))”, while not an axiom, would still be a theorem because it could be derived from (A1*) by applying the rule of uniform replacement twice, that is, by first replacing ‘P’ in (A1*) with “(\neg R \rightarrow \neg \neg S)”, and then replacing ‘Q’ with “\neg(\neg M \rightarrow N)”. The resulting system differs in only subtle ways from our earlier system PC. System PC, strictly speaking, uses only one inference rule, but countenances an infinite number of axioms. This system uses only three axioms, but makes use of an additional rule. System PC, however, avoids this additional inference rule by allowing everything that one could get by substitution in (A1*) to be an axiom. For every theorem \alpha, therefore, if \beta is a wff obtained from \alpha by uniformly substituting wffs for statement letters in \alpha, then \beta is also a theorem of PC, because there would always be a proof of \beta analogous to the proof of \alpha only beginning from different axioms.
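
The rule of uniform replacement is easy to state as a recursive function on wffs. The following sketch assumes a nested-tuple representation of wffs with statement letters as strings; the function `substitute` and the representation are illustrative assumptions.

```python
def imp(a, b):
    return ("imp", a, b)

def neg(a):
    return ("not", a)

def substitute(wff, mapping):
    """Uniformly replace statement letters in wff according to mapping."""
    if isinstance(wff, str):
        # A statement letter: replace it if the mapping mentions it.
        return mapping.get(wff, wff)
    # A complex wff: keep the connective and substitute in each part.
    return (wff[0],) + tuple(substitute(part, mapping) for part in wff[1:])

A1_STAR = imp("P", imp("Q", "P"))

x = imp(neg("R"), neg(neg("S")))   # replaces P
y = neg(imp(neg("M"), "N"))        # replaces Q

# Two successive applications, as described above.
after_first = substitute(A1_STAR, {"P": x})
after_second = substitute(after_first, {"Q": y})

# A single simultaneous substitution, for comparison.
simultaneous = substitute(A1_STAR, {"P": x, "Q": y})
```

In this case the two-step and simultaneous substitutions agree, because neither replacement wff contains ‘P’ or ‘Q’. In general the two can differ, which is one reason a careful statement of the rule matters.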

It is also possible to construct even more austere systems. Indeed, it is possible to utilize only a single axiom schema (or a single axiom plus a rule of replacement). One possibility, suggested by C. A. Meredith (1953), would be to define an axiom as any wff matching the following form:

((((\alpha \rightarrow \beta) \rightarrow (\neg \gamma \rightarrow \neg \delta)) \rightarrow \gamma) \rightarrow \epsilon) \rightarrow ((\epsilon \rightarrow \alpha) \rightarrow (\delta \rightarrow \alpha))

The resulting system is equally powerful as system PC and has exactly the same set of theorems. However, it is far less psychologically intuitive and straightforward, and deductions even for relatively simple results are often very long.

Historically, the first single axiom schema system made use, instead of language PL’, of the even simpler language PL” in which the only connective is the Sheffer stroke, ‘|’, as discussed above. In that case, it is possible to make use only of the following axiom schema:

(\alpha | (\beta | \gamma)) | ((\delta | (\delta | \delta)) | ((\epsilon | \beta) | ((\alpha | \epsilon) | (\alpha | \epsilon))))

The inference rule of MP is replaced with the rule that from wffs of the form \ulcorner \alpha | (\beta | \gamma) \urcorner and \alpha, one can deduce the wff \gamma. This system was discovered by Jean Nicod (1917). Subsequently, a number of possible single axiom systems have been found, some faring better than others in terms of the complexity of the single axiom and in terms of how long deductions for the same results are required to be. (For research in this area, consult McCune et al. 2002.) Generally, however, the more the system allows, the shorter the deductions.
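
Although neither single axiom schema is intuitive, a truth table confirms that both are tautologies and that Nicod's inference rule is truth-preserving. The sketch below checks this by enumeration; it says nothing about the completeness of either system, and the function names are illustrative.

```python
from itertools import product

def implies(a, b):
    """Truth function for the material conditional."""
    return (not a) or b

def nand(a, b):
    """Truth function for the Sheffer stroke."""
    return not (a and b)

def meredith(a, b, c, d, e):
    # ((((a -> b) -> (~c -> ~d)) -> c) -> e) -> ((e -> a) -> (d -> a))
    antecedent = implies(
        implies(implies(implies(a, b), implies(not c, not d)), c), e)
    consequent = implies(implies(e, a), implies(d, a))
    return implies(antecedent, consequent)

def nicod(a, b, c, d, e):
    # (a | (b | c)) | ((d | (d | d)) | ((e | b) | ((a | e) | (a | e))))
    left = nand(a, nand(b, c))
    right = nand(nand(d, nand(d, d)),
                 nand(nand(e, b), nand(nand(a, e), nand(a, e))))
    return nand(left, right)

def is_tautology(form, arity):
    """True if the form is true on every truth-value assignment."""
    return all(form(*values) for values in product([True, False], repeat=arity))

meredith_ok = is_tautology(meredith, 5)
nicod_ok = is_tautology(nicod, 5)

# Nicod's rule: from alpha | (beta | gamma) and alpha, infer gamma.
nicod_rule_ok = all(
    c for a, b, c in product([True, False], repeat=3)
    if nand(a, nand(b, c)) and a)
```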

Besides axiomatic and natural deduction forms, deduction systems for propositional logic can also take the form of a sequent calculus; here, rather than specifying definitions of axioms and inference rules, the rules are stated directly in terms of derivability or entailment conditions; for example, one rule might state that if (either \alpha \vdash \beta or \alpha \vdash \gamma) then if \gamma, \alpha \vdash \beta then \alpha \vdash \beta. Sequent calculi, like modern natural deduction systems, were first developed by Gerhard Gentzen. Gentzen’s work also suggests the use of tree-like deduction systems rather than linear step-by-step deduction systems, and such tree systems have proven more useful in automated theorem-proving, that is, in the creation of algorithms for the mechanical construction of deductions (for example, by a computer). However, rather than exploring the details of these and other rival systems, in the next section, we focus on proving things about the system PC, the axiomatic system treated at length above.

7. Important Meta-Theoretic Results for the Propositional Calculus

Note: this section is relatively more technical, and is designed for audiences with some prior background in logic or mathematics. Beginners may wish to skip to the next section.

In this section, we sketch informally the proofs given for certain important features of the Propositional Calculus. Our first topic, however, concerns the language PL’ generally.

Metatheoretic result 1: Language PL’ is expressively adequate, that is, within the context of classical bivalent logic, there are no truth-functions that cannot be represented in it.

We noted in Section III(c) that the connectives ‘\land‘, ‘↔’ and ‘\lor‘ can be defined using the connectives of PL’ (‘→’ and ‘\neg‘). More generally, metatheoretic result 1 holds that any statement built using truth-functional connectives, regardless of what those connectives are, has an equivalent statement formed using only ‘→’ and ‘\neg‘. Here’s the proof.

1. Assume that \alpha is some wff built in some language containing any set of truth-functional connectives, including those not found in PL, PL’ or PL”. For example, \alpha might make use of some three or four-place truth-functional connectives, or connectives such as the exclusive or, or the sign ‘\downarrow‘, or any others you might imagine.

2. We need to show that there is a wff \beta formed only with the connectives ‘→’ and ‘\neg‘ that is logically equivalent with \alpha. Because we have already shown that forms equivalent to those built from ‘\land‘, ‘↔’, and ‘\lor‘ can be constructed from ‘→’ and ‘\neg‘, we are entitled to use them as well.

3. In order for it to be logically equivalent to \alpha, the wff \beta that we construct must have the same final truth-value for every possible truth-value assignment to the statement letters making up \alpha, or in other words, it must have the same final column in a truth table.

4. Let p_1, p_2, ..., p_n be the distinct statement letters making up \alpha. For some possible truth-value assignments to these letters, \alpha may be true, and for others \alpha may be false. The only hard case would be the one in which \alpha is contingent. If \alpha were not contingent, it must either be a tautology, or a self-contradiction. Since clearly tautologies and self-contradictions can be constructed in PL’, and all tautologies are logically equivalent to one another, and all self-contradictions are equivalent to one another, in those cases, our job is easy. Let us suppose instead that \alpha is contingent.

5. Let us construct a wff \beta in the following way.

(a) Consider in turn each truth-value assignment to the letters p_1, p_2, ..., p_n. For each truth-value assignment, construct a conjunction made up of those letters the truth-value assignment makes true, along with the negations of those letters the truth-value assignment makes false. For instance, if the letters involved are ‘A‘, ‘B‘ and ‘C‘, and the truth-value assignment makes ‘A‘ and ‘C‘ true but ‘B‘ false, consider the conjunction ‘((A \land \neg B) \land C)‘.

(b) From the conjunctions formed in step (a), build a disjunction out of just those conjunctions whose corresponding truth-value assignment makes \alpha true. For example, if the truth-value assignment making ‘A‘ and ‘C‘ true but ‘B‘ false makes \alpha true, include its conjunction in the disjunction. Suppose, for example, that this truth-value assignment does make \alpha true, as does the assignment in which ‘A‘, ‘B‘ and ‘C‘ are all made false, but no other truth-value assignment makes \alpha true. In that case, the resulting disjunction would be ‘((A \land \neg B) \land C) \lor ((\neg A \land \neg B) \land \neg C)‘.

6. The wff \beta constructed in step 5 is logically equivalent to \alpha. Consider that for those truth-value assignments making \alpha true, one of the conjunctions making up the disjunction \beta is true, and hence the whole disjunction is true as well. For those truth-value assignments making \alpha false, none of the conjunctions making up \beta is true, because each conjunction will contain at least one conjunct that is false on that truth-value assignment.

7. Because \beta is constructed using only ‘\land‘, ‘\lor‘ and ‘\neg‘, and these can in turn be defined using only ‘\neg‘ and ‘→’, and because \beta is equivalent to \alpha, there is a wff built up only from ‘\neg‘ and ‘→’ that is equivalent to \alpha, regardless of the connectives making up \alpha.

8. Therefore, PL’ is expressively adequate.
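The construction in step 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`dnf_rows`, `evaluate`, `render`) and the ASCII signs `~`, `&` and `v` are hypothetical conveniences, conjunctions are rendered flat rather than with nested parentheses, and \alpha is represented as an arbitrary Python function from truth-value assignments to truth-values, standing in for a wff built from any connectives whatever.

```python
from itertools import product

def dnf_rows(letters, alpha):
    """Step 5: collect the truth-value assignments that make alpha true.
    Each collected row stands for one conjunction of letters and negated letters."""
    rows = []
    for values in product([True, False], repeat=len(letters)):
        row = dict(zip(letters, values))
        if alpha(row):
            rows.append(row)
    return rows  # empty if alpha is a self-contradiction (the easy case of step 4)

def evaluate(rows, assignment):
    """Truth-value of the disjunction of conjunctions under an assignment."""
    return any(all(assignment[p] == want for p, want in row.items())
               for row in rows)

def render(rows):
    """Write the disjunction out, using ~ for negation, & for 'and', v for 'or'."""
    return " v ".join(
        "(" + " & ".join(p if want else "~" + p for p, want in row.items()) + ")"
        for row in rows)

letters = ["A", "B", "C"]

def alpha(v):
    # Hypothetical three-place connective: true when A and C are true and B false,
    # or when all three letters are false (the example used in step 5(b)).
    return (v["A"] and not v["B"] and v["C"]) or not (v["A"] or v["B"] or v["C"])

beta = dnf_rows(letters, alpha)
print(render(beta))  # (A & ~B & C) v (~A & ~B & ~C)

# Step 6: beta and alpha agree on every truth-value assignment.
equivalent = all(
    evaluate(beta, dict(zip(letters, vs))) == alpha(dict(zip(letters, vs)))
    for vs in product([True, False], repeat=len(letters)))
print(equivalent)  # True
```

The final check mirrors step 6: the disjunction has the same final column as \alpha in a truth table.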

Corollary 1.1: Language PL” is also expressively adequate.

The corollary follows at once from metatheoretic result 1, along with the fact, noted in Section III(c), that ‘→’ and ‘\neg‘ can be defined using only ‘|’.

Metatheoretic result 2 (a.k.a. “The Deduction Theorem”): In the Propositional Calculus, PC, whenever it holds that \alpha_1, ..., \alpha_n \vdash \beta, it also holds that \alpha_1, ..., \alpha_{n-1} \vdash \alpha_n \rightarrow \beta

What this means is that whenever we can prove a given result in PC using a certain number of premises, it is possible, using all the same premises except one, \alpha_n, to prove the conditional statement that has the removed premise, \alpha_n, as antecedent and the conclusion of the original derivation, \beta, as consequent. The importance of this result is that, in effect, it shows that the technique of conditional proof, typically found in natural deduction (see Section V), is unnecessary in PC, because whenever it is possible to prove the consequent of a conditional by taking the antecedent as an additional premise, a derivation directly for the conditional can be found without taking the antecedent as a premise.

Here’s the proof:

1. Assume that \alpha_1, ..., \alpha_n \vdash \beta. This means that there is a derivation of \beta in the Propositional Calculus from the premises \alpha_1, ..., \alpha_n. This derivation takes the form of an ordered sequence \gamma_1, \gamma_2, ..., \gamma_m, where the last member of the sequence, \gamma_m, is \beta, and each member of the sequence is either (1) a premise, that is, it is one of \alpha_1, ..., \alpha_n, (2) an axiom of PC, or (3) derived from previous members of the sequence by modus ponens.

2. We need to show that there is a derivation of \ulcorner \alpha_n \rightarrow \beta \urcorner, which, while possibly making use of the other premises of the argument, does not make use of \alpha_n. We’ll do this by showing that for each member, \gamma_i, of the sequence of the original derivation: \gamma_1, \gamma_2, ..., \gamma_m, one can derive \ulcorner \alpha_n \rightarrow \gamma_i \urcorner without making use of \alpha_n as a premise.

3. Each step \gamma_i in the sequence of the original derivation was gotten at in one of three ways, as mentioned in (1) above. Regardless of which case we are dealing with, we can get the result that \alpha_1, ..., \alpha_{n-1} \vdash \alpha_n \rightarrow \gamma_i. There are three cases to consider:

Case (a): Suppose \gamma_i is a premise of the original argument. Then \gamma_i is either one of \alpha_1, ..., \alpha_{n-1} or it is \alpha_n itself. In the latter subcase, what we need to show is that \ulcorner \alpha_n \rightarrow \alpha_n \urcorner can be derived without using \alpha_n as a premise. Because \ulcorner \alpha_n \rightarrow \alpha_n \urcorner is an instance of TS1, we can get it without using any premises. In the former subcase, notice that \gamma_i is one of the premises we’re allowed to use in the new derivation. We’re also allowed to introduce the instance of AS1, \ulcorner \gamma_i \rightarrow (\alpha_n \rightarrow \gamma_i) \urcorner. From these, we can get \ulcorner \alpha_n \rightarrow \gamma_i \urcorner by modus ponens.

Case (b): Suppose \gamma_i is an axiom. We need to show that we can get \ulcorner \alpha_n \rightarrow \gamma_i \urcorner without using \alpha_n as a premise. In fact, we can get it without using any premises. Because \gamma_i is an axiom, we can use it in the new derivation as well. As in the last case, we have \ulcorner \gamma_i \rightarrow (\alpha_n \rightarrow \gamma_i) \urcorner as another axiom (an instance of AS1). From these two axioms, we arrive at \ulcorner \alpha_n \rightarrow \gamma_i \urcorner by modus ponens.

Case (c): Suppose that \gamma_i was derived from previous members of the sequence by modus ponens. Specifically, there is some \gamma_j and \gamma_k such that both j and k are less than i, and \gamma_j takes the form \ulcorner \gamma_k \rightarrow \gamma_i \urcorner. We can assume that we have already been able to derive both \ulcorner \alpha_n \rightarrow \gamma_j \urcorner—that is, \ulcorner \alpha_n \rightarrow (\gamma_k \rightarrow \gamma_i) \urcorner—and \ulcorner \alpha_n \rightarrow \gamma_k \urcorner in the new derivation without making use of \alpha_n. (This may seem questionable in the case that either \gamma_j or \gamma_k was itself gotten at by modus ponens. But notice that this just pushes the assumption back, and eventually one will reach the beginning of the original derivation. The first two steps of the sequence, namely, \gamma_1 and \gamma_2, cannot have been derived by modus ponens, since this would require there to have been two previous members of the sequence, which is impossible.) So, in our new derivation, we already have both \ulcorner \alpha_n \rightarrow (\gamma_k \rightarrow \gamma_i) \urcorner and \ulcorner \alpha_n \rightarrow \gamma_k \urcorner.
Notice that \ulcorner (\alpha_n \rightarrow (\gamma_k \rightarrow \gamma_i)) \rightarrow ((\alpha_n \rightarrow \gamma_k) \rightarrow (\alpha_n \rightarrow \gamma_i)) \urcorner is an instance of AS2, and so it can be introduced in the new derivation. By two steps of modus ponens, we arrive at \ulcorner \alpha_n \rightarrow \gamma_i \urcorner, again without using \alpha_n as a premise.

4. If we continue through each step of the original derivation, showing for each such step \gamma_i that we can get \ulcorner \alpha_n \rightarrow \gamma_i \urcorner without using \alpha_n as a premise, eventually we come to the last step of the original derivation, \gamma_m, which is \beta itself. Applying the procedure from step (3), we get \ulcorner \alpha_n \rightarrow \beta \urcorner without making use of \alpha_n as a premise. Therefore, the new derivation formed in this way shows that \alpha_1, ..., \alpha_{n-1} \vdash \alpha_n \rightarrow \beta, which is what we were attempting to show.

What’s interesting about this proof for metatheoretic result 2 is that it provides a recipe, given a derivation for a certain result that makes use of one or more premises, for transforming that derivation into one of a conditional statement in which one of the premises of the original argument has become the antecedent. This may be much clearer with an example.

Consider the following derivation for the result that Q \rightarrow R \vdash (P \rightarrow Q) \rightarrow (P \rightarrow R):

1. Q \rightarrow R Premise
2. (Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R)) AS1
3. P \rightarrow (Q \rightarrow R) 1,2 MP
4. (P \rightarrow (Q \rightarrow R)) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R)) AS2
5. (P \rightarrow Q) \rightarrow (P \rightarrow R) 3,4 MP

It is possible to transform the above derivation into one that uses no premises and that shows that \ulcorner (Q \rightarrow R) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R)) \urcorner is a theorem of PC. The procedure for such a transformation involves looking at each step of the original derivation, and for each one, attempting to derive the same statement, only beginning with “(Q \rightarrow R) \rightarrow ...“, without making use of “(Q \rightarrow R)” as a premise. How this is done depends on whether the step is a premise, an axiom, or a result of modus ponens, and depending on which it is, applying one of the three procedures sketched in the proof above. The result is the following:

1. (Q \rightarrow R) \rightarrow (Q \rightarrow R) TS1
2. (Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R)) AS1
3. ((Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R))) \rightarrow ((Q \rightarrow R) \rightarrow ((Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R)))) AS1
4. (Q \rightarrow R) \rightarrow ((Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R))) 2,3 MP
5. ((Q \rightarrow R) \rightarrow ((Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R)))) \rightarrow (((Q \rightarrow R) \rightarrow (Q \rightarrow R)) \rightarrow ((Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R)))) AS2
6. ((Q \rightarrow R) \rightarrow (Q \rightarrow R)) \rightarrow ((Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R))) 4,5 MP
7. (Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R)) 1,6 MP
8. (P \rightarrow (Q \rightarrow R)) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R)) AS2
9. ((P \rightarrow (Q \rightarrow R)) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R))) \rightarrow \big((Q \rightarrow R) \rightarrow ((P \rightarrow (Q \rightarrow R)) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R)))\big) AS1
10. (Q \rightarrow R) \rightarrow ((P \rightarrow (Q \rightarrow R)) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R))) 8,9 MP
11. \big((Q \rightarrow R) \rightarrow ((P \rightarrow (Q \rightarrow R)) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R)))\big) \rightarrow (((Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R))) \rightarrow ((Q \rightarrow R) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R)))) AS2
12. ((Q \rightarrow R) \rightarrow (P \rightarrow (Q \rightarrow R))) \rightarrow ((Q \rightarrow R) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R))) 10,11 MP
13. (Q \rightarrow R) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R)) 7,12 MP

The procedure for transforming one sort of derivation into another is purely rote. Moreover, the result is quite often not the most elegant or easiest way to show what you were trying to show. Notice, for example, that in the above derivation lines (2) and (7) are redundant, so more steps were taken than necessary. However, the purely rote procedure is effective.
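Because the procedure is rote, it can be mechanized. The following Python sketch is one way to do so, under several assumptions that are not part of the text: wffs are represented as nested tuples, each step of a derivation carries a justification tag, and the names `imp`, `deduce` and `mp_steps_valid` are hypothetical. As in the proof, TS1 is simply cited rather than expanded into its own derivation.

```python
def imp(a, b):
    """The conditional a -> b, as a nested tuple."""
    return ("->", a, b)

def deduce(derivation, hyp):
    """Turn a derivation of beta into a derivation of hyp -> beta that
    does not use hyp as a premise. Each step is (formula, justification),
    where justification is ("premise",), ("axiom", name) or ("mp", j, k):
    step j is the conditional, step k its antecedent (0-based indices)."""
    new = []
    where = {}  # original step index -> index of "hyp -> step" in new
    for i, (g, just) in enumerate(derivation):
        if just[0] == "premise" and g == hyp:
            # Case (a), latter subcase: hyp -> hyp is an instance of TS1.
            new.append((imp(hyp, g), ("TS1",)))
        elif just[0] in ("premise", "axiom"):
            # Case (a), former subcase, and case (b): use AS1 and modus ponens.
            new.append((g, just))
            new.append((imp(g, imp(hyp, g)), ("axiom", "AS1")))
            new.append((imp(hyp, g), ("mp", len(new) - 1, len(new) - 2)))
        else:
            # Case (c): the step came by modus ponens from steps j and k.
            _, j, k = just
            gk = derivation[k][0]
            as2 = imp(imp(hyp, imp(gk, g)), imp(imp(hyp, gk), imp(hyp, g)))
            new.append((as2, ("axiom", "AS2")))
            new.append((as2[2], ("mp", len(new) - 1, where[j])))
            new.append((imp(hyp, g), ("mp", len(new) - 1, where[k])))
        where[i] = len(new) - 1
    return new

def mp_steps_valid(derivation):
    """Check that every modus ponens step really follows from the steps it cites."""
    return all(derivation[j[1]][0] == imp(derivation[j[2]][0], f)
               for f, j in derivation if j[0] == "mp")

P, Q, R = "P", "Q", "R"
QR = imp(Q, R)
original = [
    (QR, ("premise",)),
    (imp(QR, imp(P, QR)), ("axiom", "AS1")),
    (imp(P, QR), ("mp", 1, 0)),
    (imp(imp(P, QR), imp(imp(P, Q), imp(P, R))), ("axiom", "AS2")),
    (imp(imp(P, Q), imp(P, R)), ("mp", 3, 2)),
]
result = deduce(original, QR)
print(len(result))             # 13
print(mp_steps_valid(result))  # True
```

Run on the five-step derivation above, the sketch yields thirteen steps in the same order as the transformed derivation shown, ending in (Q \rightarrow R) \rightarrow ((P \rightarrow Q) \rightarrow (P \rightarrow R)).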

This metatheoretic result is due to Jacques Herbrand (1930).

It is interesting on its own, especially when one reflects on it as a substitution or replacement for the conditional proof technique. However, it is also very useful for proving other metatheoretic results, as we shall see below.

Metatheoretic result 3: If \alpha is a wff of language PL’, and the statement letters making it up are p_1, p_2, ..., p_n, then if we consider any possible truth-value assignment to these letters, and consider the set of premises, \Delta, that contains p_1 if the truth-value assignment makes p_1 true, but contains \ulcorner \neg p_1 \urcorner if the truth-value assignment makes p_1 false, and similarly for p_2, ..., p_n, if the truth-value assignment makes \alpha true, then in PC, it holds that \Delta \vdash \alpha, and if it makes \alpha false, then \Delta \vdash \neg \alpha.

Here’s the proof.

1. By the definition of a wff, \alpha is either itself a statement letter, or ultimately built up from statement letters by the connectives ‘\neg‘ and ‘→’.

2. If \alpha is itself a statement letter, then obviously either it or its negation is a member of \Delta. It is a member of \Delta if the truth-value assignment makes it true. In that case, obviously, there is a derivation of \alpha from \Delta, since a premise may be introduced at any time. If the truth-value assignment makes it false instead, then \ulcorner \neg \alpha \urcorner is a member of \Delta, and so we have a derivation of \ulcorner \neg \alpha \urcorner from \Delta, since again a premise may be introduced at any time. This covers the case in which our wff is simply a statement letter.

3. Suppose that \alpha is built up from some other wff \beta with the sign ‘\neg‘, that is, suppose that \alpha is \ulcorner \neg \beta \urcorner. We can assume that we have already gotten the desired result for \beta. (Either \beta is a statement letter, in which case the result holds by step (2), or is itself ultimately built up from statement letters, so even if verifying this assumption requires making a similar assumption, ultimately we will get back to statement letters.) That is, if the truth-value assignment makes \beta true, then we have a derivation of \beta from \Delta. If it makes it false, then we have a derivation of \ulcorner \neg \beta \urcorner from \Delta. Suppose that it makes \beta true. Since \alpha is the negation of \beta, the truth-value assignment must make \alpha false. Hence, we need to show that there is a derivation of \ulcorner \neg \alpha \urcorner from \Delta. Since \alpha is \ulcorner \neg \beta \urcorner, \ulcorner \neg \alpha \urcorner is \ulcorner \neg \neg \beta \urcorner. If we append to our derivation of \beta from \Delta the derivation of \ulcorner \beta \rightarrow \neg \neg \beta \urcorner, an instance of TS2, we can reach a derivation of \ulcorner \neg \neg \beta \urcorner by modus ponens, which is what was required. If we assume instead that the truth-value assignment makes \beta false, then by our assumption, there is a derivation of \ulcorner \neg \beta \urcorner from \Delta. Since \alpha is the negation of \beta, this truth-value assignment must make \alpha true. Now, \alpha simply is \ulcorner \neg \beta \urcorner, so we already have a derivation of it from \Delta.

4. Suppose instead that \alpha is built up from other wffs \beta and \gamma with the sign ‘→’, that is, suppose that \alpha is \ulcorner \beta \rightarrow \gamma \urcorner. Again, we can assume that we have already gotten the desired result for \beta and \gamma. (Again, either they themselves are statement letters or built up in like fashion from statement letters.) Suppose that the truth-value assignment we are considering makes \alpha true. Because \alpha is \ulcorner \beta \rightarrow \gamma \urcorner, by the semantics for the sign ‘→’, the truth-value assignment must make either \beta false or \gamma true. Take the first subcase. If it makes \beta false, then by our assumption, there is a derivation of \ulcorner \neg \beta \urcorner from \Delta. If we append to this the derivation of the instance of TS3, \ulcorner \neg \beta \rightarrow (\beta \rightarrow \gamma) \urcorner, by modus ponens we arrive at a derivation of \ulcorner \beta \rightarrow \gamma \urcorner, that is, \alpha, from \Delta. If instead the truth-value assignment makes \gamma true, then by our assumption there is a derivation of \gamma from \Delta. If we add to this derivation the instance of AS1, \ulcorner \gamma \rightarrow (\beta \rightarrow \gamma) \urcorner, by modus ponens we again arrive at a derivation of \ulcorner \beta \rightarrow \gamma \urcorner, that is, \alpha, from \Delta. If instead the truth-value assignment makes \alpha false, then since \alpha is \ulcorner \beta \rightarrow \gamma \urcorner, the truth-value assignment in question must make \beta true and \gamma false. By our assumption, it is then possible to prove both \beta and \ulcorner \neg \gamma \urcorner from \Delta. If we concatenate these two derivations, and add to them the derivation of the instance of TS4, \ulcorner \beta \rightarrow (\neg \gamma \rightarrow \neg (\beta \rightarrow \gamma)) \urcorner, then by two applications of modus ponens, we can derive \ulcorner \neg (\beta \rightarrow \gamma) \urcorner, which is simply \ulcorner \neg \alpha \urcorner, which is what was desired.
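The case analysis in steps 2 through 4 amounts to a recursive recipe for building the required derivation. The Python sketch below follows that recipe under some assumptions of its own: wffs are nested tuples, the names `derive`, `value` and `mp_ok` are hypothetical, and the theorem schemata TS2, TS3 and TS4 are cited by name rather than expanded into their derivations from the axioms.

```python
def neg(a):
    return ("not", a)

def imp(a, b):
    return ("->", a, b)

def value(f, v):
    """Truth-value of wff f under the assignment v (a dict from letters to bools)."""
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not value(f[1], v)
    return (not value(f[1], v)) or value(f[2], v)

def derive(f, v):
    """Steps from Delta ending in f if v makes f true, and in neg(f) otherwise."""
    if isinstance(f, str):                       # step 2: statement letter
        return [(f if v[f] else neg(f), "premise")]
    if f[0] == "not":                            # step 3: f is the negation of b
        b = f[1]
        steps = derive(b, v)
        if value(b, v):                          # f is false: derive not-not-b
            steps.append((imp(b, neg(neg(b))), "TS2"))
            steps.append((neg(neg(b)), "MP"))
        return steps                             # otherwise steps already end in f
    b, c = f[1], f[2]                            # step 4: f is b -> c
    if not value(b, v):
        steps = derive(b, v)                     # ends in not-b
        steps.append((imp(neg(b), f), "TS3"))
        steps.append((f, "MP"))
    elif value(c, v):
        steps = derive(c, v)                     # ends in c
        steps.append((imp(c, f), "AS1"))
        steps.append((f, "MP"))
    else:
        steps = derive(b, v) + derive(c, v)      # contains b and not-c
        steps.append((imp(b, imp(neg(c), neg(f))), "TS4"))
        steps.append((imp(neg(c), neg(f)), "MP"))
        steps.append((neg(f), "MP"))
    return steps

def mp_ok(steps):
    """Every MP step must follow from two earlier steps, a and a -> f."""
    for i, (f, just) in enumerate(steps):
        if just == "MP":
            earlier = [g for g, _ in steps[:i]]
            if not any(imp(a, f) in earlier for a in earlier):
                return False
    return True

example = imp(neg("P"), "Q")
for formula, justification in derive(example, {"P": False, "Q": False}):
    print(formula, justification)
```

With both letters false, the example wff is false, so the printed steps end in its negation; for any assignment making it true, they end in the wff itself.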

From the above we see that the Propositional Calculus PC can be used to demonstrate the appropriate results for a complex wff if given as premises either the truth or falsity of all its simple parts. This is of course the foundation of truth-functional logic: that the truth or falsity of the complex statements one can make in it is determined entirely by the truth or falsity of the simple statements entering into them. Metatheoretic result 3 is again interesting on its own, but it plays a crucial role in the proof of completeness, which we turn to next.

Metatheoretic result 4 (Completeness): If \alpha is a wff of language PL’ and a tautology, then \alpha is a theorem of the Propositional Calculus.

This feature of the Propositional Calculus is called completeness because it shows that the Propositional Calculus, as a deductive system aiming to capture all the truths of logic, is a success. Every wff true solely in virtue of the truth-functional nature of the connectives making it up is something that one can prove using only the axioms of PC along with modus ponens. Here’s the proof:

1. Suppose that \alpha is a tautology. This means that every possible truth-value assignment to its statement letters makes it true.

2. Let the statement letters making up \alpha be p_1, p_2, ..., p_n, arranged in some order (say alphabetically and by the numerical order of their subscripts). It follows from (1) and metatheoretic result 3, that there is a derivation in PC of \alpha using any possible set of premises that consists, for each statement letter, of either it or its negation.

3. By metatheoretic result 2, we can remove from each of these sets of premises either p_n or \ulcorner \neg p_n \urcorner, depending on which it contains, and make it an antecedent of a conditional in which \alpha is consequent, and the result will be provable without using p_n or \ulcorner \neg p_n \urcorner as a premise. This means that for every possible set of premises consisting of either p_1 or \ulcorner \neg p_1 \urcorner and so on, up until p_{n-1}, we can derive both \ulcorner p_n \rightarrow \alpha \urcorner and \ulcorner \neg p_n \rightarrow \alpha \urcorner.

4. The wff \ulcorner (p_n \rightarrow \alpha) \rightarrow ((\neg p_n \rightarrow \alpha) \rightarrow \alpha) \urcorner is an instance of TS5. Therefore, for any set of premises from which one can derive both \ulcorner p_n \rightarrow \alpha \urcorner and \ulcorner \neg p_n \rightarrow \alpha \urcorner, by two applications of modus ponens, one can also derive \alpha itself.

5. Putting (3) and (4) together, we have the result that \alpha can be derived from every possible set of premises consisting of either p_1 or \ulcorner \neg p_1 \urcorner and so on, up until p_{n-1}.

6. We can apply the same reasoning given in steps (3)-(5) to remove p_{n-1} or its negation from the premise sets by the deduction theorem, arriving at the result that for every set of premises consisting of either p_1 or \ulcorner \neg p_1 \urcorner and so on, up until p_{n-2}, it is possible to derive \alpha. If we continue to apply this reasoning, eventually, we’ll get the result that we can derive \alpha with either p_1 or its negation as our sole premise. Again, applying the deduction theorem, this means that both \ulcorner p_1 \rightarrow \alpha \urcorner and \ulcorner \neg p_1 \rightarrow \alpha \urcorner can be proven in PC without using any premises, that is, they are theorems. Concatenating the derivations of these theorems, along with the instance of TS5, \ulcorner (p_1 \rightarrow \alpha) \rightarrow ((\neg p_1 \rightarrow \alpha) \rightarrow \alpha) \urcorner, and by two applications of modus ponens, it follows that \alpha itself is a theorem, which is what we sought to demonstrate.

The above proof of the completeness of system PC is easier to appreciate when visualized. Suppose, just for the sake of illustration, that the tautology we wish to demonstrate in system PC has three statement letters, ‘P‘, ‘Q‘ and ‘R‘. There are eight possible truth-value assignments to these letters, and since \alpha is a tautology, all of them make \alpha true. We can sketch in at least this much of \alpha‘s truth table:

P Q R | \alpha
T T T | T
T T F | T
T F T | T
T F F | T
F T T | T
F T F | T
F F T | T
F F F | T

Now, given this feature of \alpha, it follows from metatheoretic result 3, that for every possible combination of premises that consists of either ‘P‘ or “\neg P” (but not both), either ‘Q‘ or “\neg Q“, and ‘R‘ or “\neg R“, it is possible from those premises to construct a derivation showing \alpha. This can be visualized as follows:

P, Q, R \vdash \alpha
P, Q, \neg R \vdash \alpha
P, \neg Q, R \vdash \alpha
P, \neg Q, \neg R \vdash \alpha
\neg P, Q, R \vdash \alpha
\neg P, Q, \neg R \vdash \alpha
\neg P, \neg Q, R \vdash \alpha
\neg P, \neg Q, \neg R \vdash \alpha

By the deduction theorem, we can pull out the last premise from each list of premises and make it an antecedent. However, because from the same remaining list of premises we get both \ulcorner R \rightarrow \alpha \urcorner and \ulcorner \neg R \rightarrow \alpha \urcorner, we can get \alpha by itself from those premises according to TS5. Again, to visualize this:

P, Q \vdash R \rightarrow \alpha … and so P, Q \vdash \alpha
P, Q \vdash \neg R \rightarrow \alpha
P, \neg Q \vdash R \rightarrow \alpha … and so P, \neg Q \vdash \alpha
P, \neg Q \vdash \neg R \rightarrow \alpha
\neg P, Q \vdash R \rightarrow \alpha … and so \neg P, Q \vdash \alpha
\neg P, Q \vdash \neg R \rightarrow \alpha
\neg P, \neg Q \vdash R \rightarrow \alpha … and so \neg P, \neg Q \vdash \alpha
\neg P, \neg Q \vdash \neg R \rightarrow \alpha

We can continue this line of reasoning until all the premises are removed.

P, Q \vdash \alpha, so P \vdash Q \rightarrow \alpha … and so P \vdash \alpha, so \vdash P \rightarrow \alpha … and so \vdash \alpha
P, \neg Q \vdash \alpha, so P \vdash \neg Q \rightarrow \alpha
\neg P, Q \vdash \alpha, so \neg P \vdash Q \rightarrow \alpha … and so \neg P \vdash \alpha, so \vdash \neg P \rightarrow \alpha
\neg P, \neg Q \vdash \alpha, so \neg P \vdash \neg Q \rightarrow \alpha

At the end of this process, we see that \alpha is a theorem. Despite only having three axiom schemata and a single inference rule, it is possible to prove any tautology in the simple Propositional Calculus, PC. It is complete in the requisite sense.

This method of proving the completeness of the Propositional Calculus is due to Kalmár (1935).

Corollary 4.1: If a given wff \beta of language PL’ is a logical consequence of a set of wffs \alpha_1, \alpha_2, ..., \alpha_n, according to their combined truth table, then there is a derivation of \beta with \alpha_1, \alpha_2, ..., \alpha_n as premises in the Propositional Calculus.

Without going into the details of the proof of this corollary, it follows from the fact that if \beta is a logical consequence of \alpha_1, \alpha_2, ..., \alpha_n, then the wff of the form \ulcorner (\alpha_1 \rightarrow (\alpha_2 \rightarrow ... (\alpha_n \rightarrow \beta)...)) \urcorner is a tautology. As a tautology, it is a theorem of PC, and so if one begins with its derivation in PC and appends a number of steps of modus ponens using \alpha_1, \alpha_2, ..., \alpha_n as premises, one can derive \beta.

Metatheoretic result 5 (Soundness): If a wff \alpha is a theorem of the Propositional Calculus (PC), then \alpha is a tautology.

Above, we saw that all tautologies are theorems of PC. The reverse is also true: all theorems of PC are tautologies. Here’s the proof:

1. Suppose that \alpha is a theorem of PC. This means that there is an ordered sequence of steps, each of which is either (1) an axiom of PC, or (2) derived from previous members of the sequence by modus ponens, and such that \alpha is the last member of the sequence.

2. We can show that not only is \alpha a tautology, but so are all the members of the sequence leading to it. The first thing to note is that every axiom of PC is a tautology. To be an axiom of PC, a wff must match one of the axiom schemata AS1, AS2 or AS3. All such wffs must be tautologous; this can easily be verified by constructing truth tables for AS1, AS2 and AS3. (This is left to the reader.)

3. The rule of modus ponens preserves tautologyhood. If \alpha is a tautology and \ulcorner \alpha \rightarrow \beta \urcorner is also a tautology, \beta must be a tautology as well. This is because if \beta were not a tautology, it would be false on some truth-value assignments. However, \alpha, as a tautology, is true for all truth-value assignments. Because a statement of the form \ulcorner \alpha \rightarrow \beta \urcorner is false for any truth-value assignment making \alpha true and \beta false, it would then follow that some truth-value assignment makes \ulcorner \alpha \rightarrow \beta \urcorner false, which is impossible if it too is a tautology.

4. Hence, we see that the axioms with which we begin the sequence, and every step derived from them using modus ponens, must all be tautologies, and consequently, the last step of the sequence, \alpha, must also be a tautology.

This result is called the soundness of the Propositional Calculus; it shows that in it, one cannot demonstrate something that is not logically true.

Corollary 5.1: A wff \alpha of language PL’ is a tautology if and only if \alpha is a theorem of system PC.

This follows immediately from metatheoretic results 4 and 5.

Corollary 5.2 (Consistency): There is no wff \alpha of language PL’ such that both \alpha and \ulcorner \neg \alpha \urcorner are theorems of the Propositional Calculus (PC).

Due to metatheoretic result 5, all theorems of PC are tautologies. It is therefore impossible for both \alpha and \ulcorner \neg \alpha \urcorner to be theorems, as this would require both to be tautologies. That would mean that both are true for all truth-value assignments, but obviously, they must have different truth-values for any given truth-value assignment, and cannot both be true for any, much less all, such assignments.

This result is called consistency because it guarantees that no theorem of system PC can be inconsistent with any other theorem.

Corollary 5.3: If there is a derivation of the wff \beta with \alpha_1, \alpha_2, ..., \alpha_n as premises in the Propositional Calculus, then \beta is a logical consequence of the set of wffs \alpha_1, \alpha_2, ..., \alpha_n, according to their combined truth table.

This is the converse of Corollary 4.1. It follows by the reverse reasoning involved in that corollary. If there is a derivation of \beta taking \alpha_1, \alpha_2, ..., \alpha_n as premises, then by multiple applications of the deduction theorem (Metatheoretic result 2), it follows that \ulcorner (\alpha_1 \rightarrow (\alpha_2 \rightarrow ... (\alpha_n \rightarrow \beta)...)) \urcorner is a theorem of PC. By metatheoretic result 5, \ulcorner (\alpha_1 \rightarrow (\alpha_2 \rightarrow ... (\alpha_n \rightarrow \beta)...)) \urcorner must be a tautology. If so, then there cannot be a truth-value assignment making all of \alpha_1, \alpha_2, ..., \alpha_n true while making \beta false, and so \beta is a logical consequence of \alpha_1, \alpha_2, ..., \alpha_n.

Corollary 5.4: There is a derivation of the wff \beta with \alpha_1, \alpha_2, ..., \alpha_n as premises in the Propositional Calculus if and only if \beta is a logical consequence of \alpha_1, \alpha_2, ..., \alpha_n, according to their combined truth table.

This follows at once from corollaries 4.1 and 5.3. In sum, then, the Propositional Calculus method of demonstrating something to follow from the axioms of logic is extensionally equivalent to the truth table method of determining whether or not something is a logical truth. Similarly, the truth-table method for testing the validity of an argument is equivalent to the test of being able to construct a derivation for it in the Propositional Calculus. In short, the Propositional Calculus is exactly what we wanted it to be.

Corollary 5.5 (Decidability): The Propositional Calculus (PC) is decidable, that is, there is a finite, effective, rote procedure for determining whether or not a given wff \alpha is a theorem of PC.

By Corollary 5.1, a wff \alpha is a theorem of PC if and only if it is a tautology. Truth tables provide a rote, effective, and finite procedure for determining whether or not a given wff is a tautology. They therefore also provide such a procedure for determining whether or not a given wff is a theorem of PC.
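The truth-table procedure just mentioned is simple enough to write out. The following Python sketch is one possible rendering, with wffs of PL’ represented as nested tuples; the function names (`letters_of`, `value`, `is_tautology`, `follows`) are hypothetical. The `follows` function applies the same test to the nested conditional used in Corollaries 4.1 and 5.3.

```python
from itertools import product

def letters_of(f):
    """The set of statement letters occurring in wff f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(letters_of(part) for part in f[1:]))

def value(f, v):
    """Truth-value of f under assignment v; f uses only 'not' and '->'."""
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not value(f[1], v)
    return (not value(f[1], v)) or value(f[2], v)

def is_tautology(f):
    """Check every row of the truth table; there are finitely many rows."""
    letters = sorted(letters_of(f))
    return all(value(f, dict(zip(letters, vs)))
               for vs in product([True, False], repeat=len(letters)))

def follows(premises, conclusion):
    """Test a1 -> (a2 -> ... (an -> conclusion)...) for tautologyhood."""
    nested = conclusion
    for a in reversed(premises):
        nested = ("->", a, nested)
    return is_tautology(nested)

as1_instance = ("->", "P", ("->", "Q", "P"))
print(is_tautology(as1_instance))     # True
print(is_tautology(("->", "P", "Q")))  # False
print(follows([("->", "Q", "R")],
              ("->", ("->", "P", "Q"), ("->", "P", "R"))))  # True
```

By Corollary 5.1, a result of True from `is_tautology` corresponds to theoremhood in PC. The procedure always terminates, although the number of rows checked doubles with each additional statement letter.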

8. Forms of Propositional Logic

So far we have focused only on classical, truth-functional propositional logic. Its distinguishing features are (1) that all connectives it uses are truth-functional, that is, the truth-values of complex statements formed with those connectives depend entirely on the truth-values of the parts, and (2) that it assumes bivalence: all statements are taken to have exactly one of two truth-values—truth or falsity—with no statement assigned both truth-values or neither. Classical truth-functional propositional logic is the most widely studied and discussed form, but there are other forms of propositional logic.

Perhaps the best-known form of non-truth-functional propositional logic is modal propositional logic. Modal propositional logic involves introducing operators into the logic involving necessity and possibility, usually along with truth-functional operators such as ‘→’, ‘\land‘, ‘\neg‘, etc. Typically, the sign ‘\Box‘ is used in place of the English operator, “it is necessary that…”, and the sign ‘\Diamond‘ is used in place of the English operator “it is possible that…”. Sometimes both these operators are taken as primitive, but quite often one is defined in terms of the other, since \ulcorner \neg \Box \neg \alpha \urcorner would appear to be logically equivalent with \ulcorner \Diamond \alpha \urcorner. (Roughly, it means the same to say that something is not necessarily not true as it does to say that it is possibly true.)

To see that modal propositional logic is not truth-functional, just consider the following pair of statements:

\Box P
\Box (P \lor \neg P)

The first states that it is necessary that P. Let us suppose in fact that ‘P‘ is true, but might have been false. Since P is not necessarily true, the statement “\Box P” is false. However, the statement “P \lor \neg P” is a tautology and so it could not be false. Hence, the statement “\Box (P \lor \neg P)” is true. Notice that both ‘P‘ and “P \lor \neg P” are true, but different truth-values result when the operator ‘\Box‘ is added. So, in modal propositional logic, the truth-value of a statement does not depend entirely on the truth-values of the parts.
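The contrast can be made concrete with a small model. The sketch below assumes a possible-worlds reading of ‘\Box‘ on which a statement is necessary just in case it is true in every world; that reading, the two-world model, and the names `worlds` and `holds` are illustrative assumptions rather than anything fixed by the discussion above.

```python
# Two hypothetical worlds: P is true in the actual one, false in the other.
worlds = {"actual": {"P": True}, "other": {"P": False}}

def holds(f, w):
    """Truth-value of f at world w; 'box' quantifies over every world."""
    if isinstance(f, str):
        return worlds[w][f]
    op = f[0]
    if op == "not":
        return not holds(f[1], w)
    if op == "or":
        return holds(f[1], w) or holds(f[2], w)
    if op == "box":
        return all(holds(f[1], u) for u in worlds)
    raise ValueError("unknown operator: " + str(op))

excluded_middle = ("or", "P", ("not", "P"))

# Both operands are true in the actual world ...
print(holds("P", "actual"))              # True
print(holds(excluded_middle, "actual"))  # True
# ... yet prefixing the box gives different truth-values.
print(holds(("box", "P"), "actual"))              # False
print(holds(("box", excluded_middle), "actual"))  # True
```

Since the two operands share a truth-value in the actual world while their necessitations do not, no truth table for ‘\Box‘ could produce both results.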

The study of modal propositional logic involves identifying under what conditions statements involving the operators ‘\Box‘ and ‘\Diamond‘ should be regarded as true. Different notions or conceptions of necessity lead to different answers to that question. It also involves discovering what inference rules or systems of deduction would be appropriate given the addition of these operators. Here, there is more controversy than with classical truth-functional logic. For example, in the context of discussions of axiomatic systems for modal propositional logic, very different systems result depending on whether instances of the following schemata are regarded as axiomatic truths, or even truths at all:

\Box \alpha \rightarrow \Box \Box \alpha
\Diamond \alpha \rightarrow \Box \Diamond \alpha

If a statement is necessary, is it necessarily necessary? If a statement is possible, is it necessarily possible? A positive answer to the first question is a key assumption in a logical system known as S4 modal logic. Positive answers to both these questions are key assumptions in a logical system known as S5 modal logic. Other systems of modal logic that avoid such assumptions have also been developed. (For an excellent introductory survey, see Hughes and Cresswell 1996.)
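One way to see why these two schemata come apart is to test them on small possible-worlds frames. The sketch below assumes Kripke-style semantics (not introduced in this article) and checks only the instances in which \alpha is a single atomic statement ‘P‘; the frame names `CHAIN`, `PREORDER` and `UNIVERSAL`, and the helper functions, are hypothetical.

```python
from itertools import product

def holds(formula, world, access, val):
    """Truth of a formula at a world; val is the set of worlds where 'P' is true."""
    op = formula[0]
    if op == "atom":
        return world in val
    if op == "imp":
        return (not holds(formula[1], world, access, val)) or holds(formula[2], world, access, val)
    if op == "box":
        return all(holds(formula[1], v, access, val) for v in access[world])
    if op == "dia":
        return any(holds(formula[1], v, access, val) for v in access[world])
    raise ValueError(f"unknown operator: {op}")

def valid_on_frame(formula, access):
    """True if the formula holds at every world under every valuation of 'P'."""
    worlds = sorted(access)
    for bits in product([False, True], repeat=len(worlds)):
        val = {w for w, b in zip(worlds, bits) if b}
        if not all(holds(formula, w, access, val) for w in worlds):
            return False
    return True

P = ("atom",)
SCHEMA_4 = ("imp", ("box", P), ("box", ("box", P)))   # box P -> box box P
SCHEMA_5 = ("imp", ("dia", P), ("box", ("dia", P)))   # dia P -> box dia P

FRAMES = {
    "chain": {0: {1}, 1: {2}, 2: set()},      # neither reflexive nor transitive
    "preorder": {0: {0, 1}, 1: {1}},          # reflexive and transitive, not symmetric
    "universal": {0: {0, 1}, 1: {0, 1}},      # every world sees every world
}

report = {
    name: (valid_on_frame(SCHEMA_4, frame), valid_on_frame(SCHEMA_5, frame))
    for name, frame in FRAMES.items()
}
print(report)
```

On these three frames the first schema holds on the transitive ones only, and the second holds only where every world sees every world, which fits the usual association of S4 with the first schema and S5 with both.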

Deontic propositional logic and epistemic propositional logic are two other forms of non-truth-functional propositional logic. The former involves introduction of operators similar to the English operators “it is morally obligatory that…” and “it is morally permissible that…”. Obviously, some things that are in fact true were not morally obligatory, whereas some things that are true were morally obligatory. Again, the truth-value of a statement in deontic logic does not depend wholly on the truth-value of the parts. Epistemic logic involves the addition of operators similar to the English operators “it is known that…” and “it is believed that…”. While everything that is known to be the case is in fact the case, not everything that is the case is known to be the case, so the truth-value of a statement built up with an “it is known that…” operator will not depend entirely on the truth of the proposition it modifies, even if it depends on it to some degree.

Yet another widely studied form of non-truth-functional propositional logic is relevance propositional logic, which involves the addition of an operator ‘Rel‘ used to connect two statements \alpha and \beta to form a statement \ulcorner Rel(\alpha, \beta) \urcorner, which is interpreted to mean that \alpha is related to \beta in theme or subject matter. For example, if ‘P‘ means that Ben loves Jennifer and ‘Q‘ means that Jennifer is a pop star, then the statement “Rel(P, Q)” is regarded as true; whereas if ‘S‘ means The sun is shining in Tokyo, then “Rel(P, S)” is false, and hence “\neg Rel(P, S)” is true. Obviously, whether or not a statement formed using the connective ‘Rel‘ is true does not depend solely on the truth-value of the propositions involved.

One of the motivations for introducing non-truth-functional propositional logics is to make up for certain oddities of truth-functional logic. Consider the truth table for the sign ‘→’ used in Language PL. A statement of the form \ulcorner \alpha \rightarrow \beta \urcorner is regarded as true whenever its antecedent is false or consequent is true. So if we were to translate the English sentence, “if the author of this article lives in France, then the moon is made of cheese” as “E \rightarrow M“, then strangely, it comes out as true given the semantics of the sign ‘→’ because the antecedent, ‘E‘, is false. In modal propositional logic it is possible to define a much stronger sort of operator to use to translate English conditionals as follows:

\ulcorner \alpha ⥽ \beta \urcorner is defined as \ulcorner \Box (\alpha \rightarrow \beta) \urcorner

If we transcribe the English “if the author of this article lives in France, then the moon is made of cheese” instead as “E ⥽ M“, then it does not come out as true, because presumably, it is possible for the author of this article to live in France without the moon being made of cheese. Similarly, in relevance logic, one could also define a stronger sort of connective as follows:

\ulcorner \alpha \Rightarrow \beta \urcorner is defined as \ulcorner Rel(\alpha, \beta) \land (\alpha \rightarrow \beta) \urcorner

Here too, if we were to transcribe the English “if the author of this article lives in France, then the moon is made of cheese” as “E \Rightarrow M” instead of simply “E \rightarrow M“, it comes out as false, because the author of this article living in France is not related to the composition of the moon.

Besides non-truth-functional logic, other logical systems differ from classical truth-functional logic by allowing statements to be assigned truth-values other than truth or falsity, or to be assigned neither truth nor falsity or both truth and falsity. These sorts of logical systems may still be truth-functional in the sense that the truth-value of a complex statement may depend entirely on the truth-values of the parts, but the rules governing such truth-functionality would be more complicated than for classical logic, because it must consider possibilities that classical logic rejects.

Many-valued or multivalent logics are those that consider more than two truth-values. They may admit anything from three to an infinite number of possible truth-values. The simplest sort of many-valued logic is one that admits three truth-values, for example, truth, falsity and indeterminacy. It might seem, for example, that certain statements such as statements about the future, or paradoxical statements such as “this sentence is not true” cannot easily be assigned either truth or falsity, and so, it might be concluded, must have an indeterminate truth-value. The admission of this third truth-value requires one to expand the truth tables given in Section III(a). There, we gave a truth table for statements formed using the operator ‘→’; in three-valued logic, we have to decide what the truth-value of a statement of the form \ulcorner \alpha \rightarrow \beta \urcorner is when either or both of \alpha and \beta has an indeterminate truth-value. Arguably, if any component of a statement is indeterminate in truth-value, then the whole statement is indeterminate as well. This would lead to the following expanded truth table:

\alpha    \beta    (\alpha \rightarrow \beta)
T         T        T
T         I        I
T         F        F
I         T        I
I         I        I
I         F        I
F         T        T
F         I        I
F         F        T

However, we might wish to retain the feature of classical logic that a statement of the form \ulcorner \alpha \rightarrow \beta \urcorner is always true when its antecedent is false or its consequent is true, and hold that it is indeterminate only when its antecedent is indeterminate and its consequent false or when its antecedent is true and its consequent indeterminate, so that its truth table appears:

\alpha    \beta    (\alpha \rightarrow \beta)
T         T        T
T         I        I
T         F        F
I         T        T
I         I        T
I         F        I
F         T        T
F         I        T
F         F        T

Such details will affect the rest of the logical system. For example, suppose an axiomatic or natural deduction system is created, and a desirable feature is that something be provable from no premises if and only if it is a tautology in the sense of being true (and not just not false) for all possible truth-value assignments. If we make use of the first truth table for ‘→’, then “P \rightarrow P” should not be provable, because it is indeterminate when ‘P‘ is. If we use the second truth table, then “P \rightarrow P” should be provable, since it is a tautology according to that truth table, that is, it is true regardless of which of the three truth-values is assigned to ‘P‘.
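The two candidate tables for ‘→’ and their differing verdicts on “P \rightarrow P” can be reproduced with a few lines of code. This is a minimal sketch; the function names `cond_first` and `cond_second` are hypothetical labels for the first and second truth tables described above.

```python
VALUES = ("T", "I", "F")

def cond_first(a, b):
    """First table: any indeterminate component makes the whole indeterminate."""
    if a == "I" or b == "I":
        return "I"
    return "F" if (a == "T" and b == "F") else "T"

def cond_second(a, b):
    """Second table: true whenever the antecedent is false or the consequent true."""
    if a == "F" or b == "T":
        return "T"
    if a == "T" and b == "F":
        return "F"
    if a == "I" and b == "I":
        return "T"
    return "I"  # antecedent I with consequent F, or antecedent T with consequent I

def p_implies_p_is_tautology(cond):
    """True if 'P -> P' receives the value T under all three assignments to 'P'."""
    return all(cond(v, v) == "T" for v in VALUES)

# Rows are listed in the same order as the tables: TT, TI, TF, IT, II, IF, FT, FI, FF.
first_rows = [cond_first(a, b) for a in VALUES for b in VALUES]
second_rows = [cond_second(a, b) for a in VALUES for b in VALUES]
first_verdict = p_implies_p_is_tautology(cond_first)
second_verdict = p_implies_p_is_tautology(cond_second)
print(first_rows, first_verdict)
print(second_rows, second_verdict)
```

The only rows on which the two functions differ are those with an indeterminate antecedent and a true or indeterminate consequent, and those with a false antecedent and an indeterminate consequent; the second of these differences is what decides the status of “P \rightarrow P”.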

Here we get just a glimpse at the complications created by admitting more than two truth-values. If more than three are admitted, and possibly infinitely many, then the issues become even more complicated.

Intuitionistic propositional logic results from rejecting the assumption that every statement is true or false, and countenancing statements that are neither. The result is a sort of logic very much akin to a three-valued logic, since “neither true nor false”, while strictly speaking the rejection of a truth-value, can be thought of as though it were a third truth-value. In intuitionistic logic, the so-called “law of excluded middle,” that is, the law that all statements of the form \ulcorner \alpha \lor \neg \alpha \urcorner are true, is rejected. This is because intuitionistic logic takes truth to coincide with direct provability, and it may be that certain statements, such as Goldbach’s conjecture in mathematics, are neither provably the case nor provably not the case.

Paraconsistent propositional logic is even more radical, in countenancing statements that are both true and false. Again, depending on the nature of the system, semantic rules have to be given that determine what truth-value or truth-values a complex statement has when its component parts are both true and false. Such decisions determine what sorts of new or restricted rules of inference would apply to the logical system. For example, paraconsistent logics, if not trivial, must restrict the rules of inference allowable in classical truth-functional logic, because in systems such as those sketched in Sections V and VI above, from a contradiction, that is, a statement of the form \ulcorner \alpha \land \neg \alpha \urcorner, it is possible to deduce any other statement. Consider, for example, the following deduction in the natural deduction system sketched in Section V.

1. P \land \neg P Premise
2. P 1 Simp
3. \neg P 1 Simp
4. P \lor Q 2 Add
5. Q 3,4 DS
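The semantic counterpart of this deduction can be confirmed by brute force over classical truth-value assignments. The sketch below is an illustration only; `entails` is a hypothetical helper that tests classical two-valued entailment, not a procedure from the natural deduction system itself.

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """Classical entailment: no assignment makes every premise true and the conclusion false."""
    for values in product([True, False], repeat=len(atoms)):
        row = dict(zip(atoms, values))
        if all(premise(row) for premise in premises) and not conclusion(row):
            return False
    return True

ATOMS = ["P", "Q"]
contradiction = lambda row: row["P"] and not row["P"]   # P and not-P
not_p = lambda row: not row["P"]
p_or_q = lambda row: row["P"] or row["Q"]
p = lambda row: row["P"]
q = lambda row: row["Q"]

# From a contradiction anything follows, because no row satisfies the premise.
explosion_valid = entails([contradiction], q, ATOMS)
# Disjunctive syllogism (step 5) is classically valid as well.
ds_valid = entails([not_p, p_or_q], q, ATOMS)
# Control case: 'Q' does not follow from 'P' alone.
control = entails([p], q, ATOMS)
print(explosion_valid, ds_valid, control)
```

The first check succeeds vacuously: in two-valued semantics there is simply no assignment on which the premise is true, which is the situation paraconsistent logics set out to change.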

In order to avoid this result, paraconsistent logics must restrict the notion of a valid inference. In order for an inference to be considered valid, not only must it be truth-preserving, that is, it must be impossible to arrive at something untrue when starting with true premises, but it must also be falsity-avoiding, that is, it must be impossible, starting with true premises, to arrive at something that is false. In paraconsistent logic, where a statement can be both true and false, these two requirements do not coincide. The inference rule of disjunctive syllogism, while truth-preserving, is not falsity-avoiding. In cases in which its premises are true, its conclusion can still be false; more specifically, provided that at least one of its premises is both true and false, its conclusion can be false.

Other forms of non-classical propositional logic, and non-truth-functional propositional logic, continue to be discovered. Obviously any deviance from classical bivalent propositional logic raises complicated logical and philosophical issues that cannot be fully explored here. For more details both on non-classical logic, and on non-truth-functional logic, see the recommended reading section.

9. Suggestions for Further Reading

  • Anderson, A. R. and N. D. Belnap [and J. M. Dunn]. 1975 and 1992. Entailment. 2 vols. Princeton, NJ: Princeton University Press.
  • Bocheński, I. M. 1961. A History of Formal Logic. Notre Dame, Ind.: University of Notre Dame Press.
  • Boole, George. 1847. The Mathematical Analysis of Logic. Cambridge: Macmillan.
  • Boole, George. 1854. An Investigation into the Laws of Thought. Cambridge: Macmillan.
  • Carroll, Lewis. 1958. Symbolic Logic and the Game of Logic. London: Dover.
  • Church, Alonzo. 1956. Introduction to Mathematical Logic. Princeton, NJ: Princeton University Press.
  • Copi, Irving. 1953. Introduction to Logic. New York: Macmillan.
  • Copi, Irving. 1974. Symbolic Logic. 4th ed. New York: Macmillan.
  • da Costa, N. C. A. 1974. “On the Theory of Inconsistent Formal Systems,” Notre Dame Journal of Formal Logic 15: 497-510.
  • De Morgan, Augustus. 1847. Formal Logic. London: Walton and Maberly.
  • Fitch, F. B. 1952. Symbolic Logic: An Introduction. New York: Ronald Press.
  • Frege, Gottlob. 1879. Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens. Halle: L. Nebert. Published in English as Conceptual Notation, ed. and trans. by Terrell Bynum. Oxford: Clarendon, 1972.
  • Frege, Gottlob. 1923. “Gedankengefüge,” Beiträge zur Philosophie des deutschen Idealismus 3: 36-51. Published in English as “Compound Thoughts,” in The Frege Reader, edited by Michael Beaney. Oxford: Blackwell, 1997.
  • Gentzen, Gerhard. 1934. “Untersuchungen über das logische Schließen,” Mathematische Zeitschrift 39: 176-210, 405-31. Published in English as “Investigations into Logical Deduction,” in Gentzen 1969.
  • Gentzen, Gerhard. 1969. Collected Papers. Edited by M. E. Szabo. Amsterdam: North-Holland Publishing.
  • Haack, Susan. 1996. Deviant Logic, Fuzzy Logic. Chicago: University of Chicago Press.
  • Herbrand, Jacques. 1930. “Recherches sur la théorie de la démonstration,” Travaux de la Société des Sciences et des Lettres de Varsovie 33: 133-160.
  • Hilbert, David and Wilhelm Ackermann. 1950. Principles of Mathematical Logic. New York: Chelsea.
  • Hintikka, Jaakko. 1962. Knowledge and Belief: An Introduction to the Logic of the Two Notions. Ithaca: Cornell University Press.
  • Hughes, G. E. and M. J. Cresswell. 1996. A New Introduction to Modal Logic. London: Routledge.
  • Jevons, W. S. 1880. Studies in Deductive Logic. London: Macmillan.
  • Kalmár, L. 1935. “Über die Axiomatisierbarkeit des Aussagenkalküls,” Acta Scientiarum Mathematicarum 7: 222-43.
  • Kleene, Stephen C. 1952. Introduction to Metamathematics. Princeton, NJ: Van Nostrand.
  • Kneale, William and Martha Kneale. 1962. The Development of Logic. Oxford: Clarendon.
  • Lewis, C. I. and C. H. Langford. 1932. Symbolic Logic. New York: Dover.
  • Łukasiewicz, Jan. 1920. “O logice trójwartościowej,” Ruch Filozoficzny 5: 170-171. Published in English as “On Three-Valued Logic,” in Łukasiewicz 1970.
  • Łukasiewicz, Jan. 1970. Selected Works. Amsterdam: North-Holland.
  • Łukasiewicz, Jan and Alfred Tarski. 1930. “Untersuchungen über den Aussagenkalkül,” Comptes Rendus des séances de la Société des Sciences et des Lettres de Varsovie 32: 30-50. Published in English as “Investigations into the Sentential Calculus,” in Tarski 1956.
  • Mally, Ernst. 1926. Grundgesetze des Sollens: Elemente der Logik des Willens. Graz: Leuschner und Lubensky.
  • McCune, William, Robert Veroff, Branden Fitelson, Kenneth Harris, Andrew Feist and Larry Wos. 2002. “Short Single Axioms for Boolean Algebra,” Journal of Automated Reasoning 29: 1-16.
  • Mendelson, Elliott. 1997. Introduction to Mathematical Logic. 4th ed. London: Chapman and Hall.
  • Meredith, C. A. 1953. “Single Axioms for the Systems (C, N), (C, O) and (A, N) of the Two-valued Propositional Calculus,” Journal of Computing Systems 3: 155-62.
  • Müller, Eugen, ed. 1909. Abriss der Algebra der Logik, by E. Schröder. Leipzig: Teubner.
  • Nicod, Jean. 1917. “A Reduction in the Number of the Primitive Propositions of Logic,” Proceedings of the Cambridge Philosophical Society 19: 32-41.
  • Peirce, C. S. 1885. “On the Algebra of Logic,” American Journal of Mathematics 7: 180-202.
  • Post, Emil. 1921. “Introduction to a General Theory of Propositions,” American Journal of Mathematics 43: 163-185.
  • Priest, Graham, Richard Routley and Jean Norman, eds. 1990. Paraconsistent Logic. Munich: Philosophia Verlag.
  • Prior, Arthur. 1990. Formal Logic. 2nd ed. Oxford: Oxford University Press.
  • Read, Stephen. 1988. Relevant Logic. New York: Blackwell.
  • Rescher, Nicholas. 1966. The Logic of Commands. London: Routledge and Kegan Paul.
  • Rescher, Nicholas. 1969. Many-Valued Logic. New York: McGraw Hill.
  • Rosser, J. B. 1953. Logic for Mathematicians. New York: McGraw Hill.
  • Russell, Bertrand. 1906. “The Theory of Implication,” American Journal of Mathematics 28: 159-202.
  • Schlesinger, G. N. 1985. The Range of Epistemic Logic. Aberdeen: Aberdeen University Press.
  • Sheffer, H. M. 1913. “A Set of Five Postulates for Boolean Algebras with Application to Logical Constants,” Transactions of the American Mathematical Society 14: 481-88.
  • Smullyan, Raymond. 1961. Theory of Formal Systems. Princeton: Princeton University Press.
  • Tarski, Alfred. 1956. Logic, Semantics and Meta-Mathematics. Oxford: Oxford University Press.
  • Urquhart, Alasdair. 1986. “Many-valued Logic,” in Handbook of Philosophical Logic, vol. 3, edited by D. Gabbay and F. Guenthner. Dordrecht: Reidel.
  • Venn, John. 1881. Symbolic Logic. London: Macmillan.
  • Whitehead, Alfred North and Bertrand Russell. 1910-1913. Principia Mathematica. 3 vols. Cambridge: Cambridge University Press.
  • Wittgenstein, Ludwig. 1922. Tractatus Logico-Philosophicus. London: Routledge and Kegan Paul.

Author Information

Kevin C. Klement
Email: klement@philos.umass.edu
University of Massachusetts, Amherst
U. S. A.

Epsilon Calculi

Epsilon Calculi are extended forms of the predicate calculus that incorporate epsilon terms. Epsilon terms are individual terms of the form ‘εxFx’, being defined for all predicates in the language. The epsilon term ‘εxFx’ denotes a chosen F, if there are any F’s, and has an arbitrary reference otherwise. Epsilon calculi were originally developed to study certain forms of arithmetic, and set theory; also to prove some important meta-theorems about the predicate calculus. Later formal developments have included a variety of intensional epsilon calculi, of use in the study of necessity, and more general intensional notions, like belief. An epsilon term such as ‘εxFx’ was originally read as ‘the first F’, and in arithmetical contexts as ‘the least F’. More generally it can be read as the demonstrative description ‘that F’, when arising either deictically, that is, in a pragmatic context where some F is being pointed at, or in linguistic cross-reference situations, as with, for example, ‘There is a red-haired man in the room. That red-haired man is Caucasian’. The application of epsilon terms to natural language shares some features with the use of iota terms within the theory of descriptions given by Bertrand Russell, but differs in formalising aspects of a slightly different theory of reference, first given by Keith Donnellan. More recently, epsilon terms have been used by a number of writers to formalise cross-sentential anaphora, which would arise if ‘that red-haired man’ in the linguistic case above was replaced with a pronoun such as ‘he’. There is then also the similar application in intensional cases, like ‘There is a red-haired man in the room. Celia believed he was a woman.’

Table of Contents

  1. Introduction
  2. Descriptions and Identity
  3. Rigid Epsilon Terms
  4. The Epsilon Calculus’ Problematic
  5. The Formal Semantics of Epsilon Terms
  6. Some Metatheory
  7. References and Further Reading

1. Introduction

Epsilon terms were introduced by the German mathematician David Hilbert, in Hilbert 1923, 1925, to provide explicit definitions of the existential and universal quantifiers, and resolve some problems in infinitistic mathematics. But it is not just the related formal results and structures which are of interest. In Hilbert’s major book Grundlagen der Mathematik, which he wrote with his collaborator Paul Bernays, epsilon terms were presented as formalising certain natural language constructions, like definite descriptions. And they in fact have a considerably larger range of such applications, for instance in the symbolisation of certain cross-sentential anaphora. Hilbert and Bernays also used their epsilon calculus to prove two important meta-theorems about the predicate calculus. One theorem subsequently led, for instance, to the development of semantic tableaux: it is called the First Epsilon Theorem, and its content and proof will be given later, in section 6 below. A second theorem that Hilbert and Bernays proved, which we shall also look at then, establishes that epsilon calculi are conservative extensions of the predicate calculus, that is, that no more theorems expressible just in the quantificational language of the predicate calculus can be proved in epsilon calculi than can be proved in the predicate calculus itself. But while epsilon calculi do have these further important formal functions, we will not only be concerned to explore them, for we shall also first discuss the natural language structures upon which epsilon calculi have a considerable bearing.

The growing awareness of the larger meaning and significance of epsilon calculi has only come in stages. Hilbert and Bernays introduced epsilon terms for several meta-mathematical purposes, as above, but the extended presentation of an epsilon calculus, as a formal logic of interest in its own right, in fact only first appeared in Bourbaki’s Éléments de Mathématique (although see also Ackermann 1937-8). Bourbaki’s epsilon calculus with identity (Bourbaki, 1954, Book 1) is axiomatic, with Modus Ponens as the only primitive inference or derivation rule. Thus, in effect, we get:

(X ∨ X) → X,
X → (X ∨ Y),
(X ∨ Y) → (Y ∨ X),
(X ∨ Y) → ((Z ∨ X) → (Z ∨ Y)),
Fy → FεxFx,
x = y → (Fx ↔ Fy),
(x)(Fx ↔ Gx) → εxFx = εxGx.

This adds to a basis for the propositional calculus an epsilon axiom schema, then Leibniz’ Law, and a second epsilon axiom schema, which is a further law of identity. Bourbaki, though, used the Greek letter tau rather than epsilon to form what are now called ‘epsilon terms’; nevertheless, he defined the quantifiers in terms of his tau symbol in the manner of Hilbert and Bernays, namely:

(∃x)Fx ↔ FεxFx,
(x)Fx ↔ Fεx¬Fx;

and note that, in his system the other usual law of identity, ‘x = x’, is derivable.
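The epsilon definitions of the quantifiers can be illustrated on a finite domain, where an epsilon term is modelled by a choice function. The sketch below is an illustration under stated assumptions, not part of Bourbaki’s or Hilbert and Bernays’ formal systems: the three-element `DOMAIN`, the rule of choosing the first satisfier, and the default referent `DOMAIN[0]` are all hypothetical choices.

```python
from itertools import product

DOMAIN = (0, 1, 2)

def epsilon(pred):
    """Return a chosen element satisfying pred, or an arbitrary element if none does."""
    for x in DOMAIN:
        if pred(x):
            return x
    return DOMAIN[0]  # arbitrary reference when nothing satisfies pred

def exists_via_epsilon(pred):
    # (∃x)Fx ↔ FεxFx
    return pred(epsilon(pred))

def forall_via_epsilon(pred):
    # (x)Fx ↔ Fεx¬Fx
    return pred(epsilon(lambda x: not pred(x)))

# Compare with the ordinary quantifiers for every predicate definable on DOMAIN.
mismatches = []
for bits in product([False, True], repeat=len(DOMAIN)):
    F = lambda x, bits=bits: bits[x]
    if exists_via_epsilon(F) != any(F(x) for x in DOMAIN):
        mismatches.append(("exists", bits))
    if forall_via_epsilon(F) != all(F(x) for x in DOMAIN):
        mismatches.append(("forall", bits))
print(mismatches)
```

The universal case works because ‘εx¬Fx’ picks a counterexample to ‘F’ whenever one exists, so ‘F’ holds of that term only if there is no counterexample at all.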

The principal purpose Bourbaki found for his system of logic was in his theory of sets, although through that, in the modern manner, it thereby came to be the foundation for the rest of mathematics. Bourbaki’s theory of sets discriminates amongst predicates those which determine sets: thus some, but only some, predicates determine sets, i.e. are ‘collectivisantes’. All the main axioms of classical Set Theory are incorporated in his theory, but he does not have an Axiom of Choice as a separate axiom, since its functions are taken over by his tau symbol. The same point holds in Bernays’ epsilon version of his set theory (Bernays 1958, Ch VIII).

Epsilon calculi, during this period, were developed without any semantics, but a semantic interpretation was produced by Günter Asser in 1957, and subsequently published in a book by A.C. Leisenring, in 1969. Even then, readings of epsilon terms in ordinary language were still uncommon. A natural language reading of epsilon terms, however, was present in Hilbert and Bernays’ work. In fact the last chapter of book 1 of the Grundlagen is a presentation of a theory of definite descriptions, and epsilon terms relate closely to this. In the better-known theory of definite descriptions by Bertrand Russell (Russell 1905) there are three clauses: with

The king of France is bald

we get, on Russell’s theory, first

there is a king of France,

second

there is only one king of France,

and third

anyone who is king of France is bald.

Russell uses the Greek letter iota to formalise the definite description, writing the whole

BιxKx,

but he recognises the iota term is not a proper individual symbol. He calls it an ‘incomplete symbol’, since, because of the three parts, the whole proposition is taken to have the quantificational analysis,

(∃x)(Kx & (y)(Ky → y = x) & (y)(Ky → By)),

which is equivalent to

(∃x)(Kx & (y)(Ky→ y = x) & Bx).

And that means that it does not have the form ‘Bx’. Russell believed that, in addition to his iota terms, there was another class of individual terms, which he called ‘logically proper names’. These would simply fit into the ‘x’ place in ‘Bx’. He believed that ‘this’ and ‘that’ were in this class, but gave no symbolic characterisation of them.
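The equivalence of the two quantificational analyses given above can be spot-checked by enumerating every interpretation of ‘K’ and ‘B’ on a small domain. This is only a finite sanity check on an assumed three-element domain, not a proof, and the function names are hypothetical.

```python
from itertools import product

DOMAIN = range(3)

def three_clause(K, B):
    """(∃x)(Kx & (y)(Ky → y = x) & (y)(Ky → By))"""
    return any(
        K[x]
        and all((not K[y]) or y == x for y in DOMAIN)
        and all((not K[y]) or B[y] for y in DOMAIN)
        for x in DOMAIN
    )

def compact(K, B):
    """(∃x)(Kx & (y)(Ky → y = x) & Bx)"""
    return any(
        K[x] and all((not K[y]) or y == x for y in DOMAIN) and B[x]
        for x in DOMAIN
    )

# Each predicate is a tuple of truth-values indexed by the elements of DOMAIN.
extensions = list(product([False, True], repeat=len(DOMAIN)))
agree = all(three_clause(K, B) == compact(K, B) for K in extensions for B in extensions)
print(agree)
```

The two forms agree because, once uniqueness is secured, the only thing that is ‘K’ is the witness itself, so saying that every ‘K’ is ‘B’ comes to the same as saying that the witness is ‘B’.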

Hilbert and Bernays, by contrast, produced what is called a ‘pre-suppositional theory’ of definite descriptions. The first two clauses of Russell’s definition were not taken to be part of the meaning of ‘The King of France is bald’: they were merely conditions under which they took it to be permitted to introduce a complete individual term for ‘the King of France’, which then satisfies

Kx & (y)(Ky → y = x).

Hilbert and Bernays continued to use the Greek letter iota in their individual term, although it has a quite different grammar from Russell’s iota term, since, when Hilbert and Bernays’ term can be introduced, it is provably equivalent to the corresponding epsilon term (Kneebone 1963, p102). In fact it was later suggested by many that epsilon terms are not only complete symbols, but can be seen as playing the same role as the ‘logically proper names’ Russell discussed.

It is at the start of book 2 of the Grundlagen that we find the definition of epsilon terms. There, Hilbert and Bernays first construct a theory of indefinite descriptions in a similar manner to their theory of definite descriptions. They allow, now, an eta term to be introduced as long as just the first of Russell’s conditions is met. That is to say, given

(∃x)Fx,

one can introduce the term ‘ηxFx’, and say

FηxFx.

But the condition for the introduction of the eta term can be established logically, for certain predicates, since

(∃x)((∃y)Fy → Fx),

is a predicate calculus theorem (Copi 1973, p110). It is the eta term this theorem allows us to introduce which is otherwise called an epsilon term, and its logical basis enables entirely formal theories to be constructed, since such individual terms are invariably defined. Thus we may invariably introduce ‘ηx((∃y)Fy → Fx)’, and this is commonly written ‘εxFx’, about which we can therefore say

(∃y)Fy → FεxFx.

Since it is that F which exists if anything is F, Hilbert read the epsilon term in this case ‘the first F’. For instance, in arithmetic, ‘the first’ may be taken to be the least number operator. However, while if there are F’s then the first F is clearly some chosen one of them, if there are no F’s then ‘the first F’ must be a misnomer. And that form of speech only came to be fully understood in the theories of reference which appeared much later, when reference and denotation came to be more clearly separated from description and attribution. Donnellan (Donnellan 1966) used the example ‘the man with martini in his glass’, and pointed out that, in certain uses, this can refer to someone without martini in his glass. In the terminology Donnellan made popular, ‘the first F’, in the second case above works similarly: it cannot be attributive, and so, while it refers to something, it must refer arbitrarily, from a semantic point of view.

With reference in this way separated from attribution it becomes possible to symbolise the anaphoric cross-reference between, for instance, ‘There is one and only one king of France’ and ‘He is bald’. For, independently of whether the former is true, the ‘he’ in the latter is a pronoun for the epsilon term in the former — by a simple extension of the epsilon definition of the existential quantifier. Thus the pair of remarks may be symbolised

(∃x)(Kx & (y)(Ky → y = x)) & Bεx(Kx & (y)(Ky → y = x)).

Furthermore such cross-reference may occur in connection with intensional constructions of a kind Russell also considered, such as

George IV wondered whether the author of Waverley was Scott.

Thus we can say ‘There is an author of Waverley, and George IV wondered whether he was Scott’. But the epsilon analysis of these cases puts intensional epsilon calculi at odds with Russellian views of such constructions, as we shall see later. The Russellian approach, by not having complete symbols for individuals, tends to confuse cases in which assertions are made about individuals and cases in which assertions are made about identifying properties. As we shall see, epsilon terms enable us to make the discrimination between, for instance,

s = εx(y)(Ay ↔ y = x),

(i.e. ‘Scott is the author of Waverley’), and

(y)(Ay ↔ y = s),

(that is, ‘there is one and only one author of Waverley and he is Scott’), and so it enables us to locate more exactly the object of George IV’s thought.

2. Descriptions and Identity

When one starts to ask about the natural language meaning of epsilon terms, it is interesting that Leisenring just mentions the ‘formal superiority’ of the epsilon calculus (Leisenring 1969, p63, see also Routley 1969, Hazen 1987). Leisenring took the epsilon calculus to be a better logic than the predicate calculus, but merely because of the Second Epsilon Theorem. Its main virtue, to Leisenring, was that it could prove all that seemingly needed to be proved, but in a more elegant way. Epsilon terms were just neater at calculating which were the valid theorems of the predicate calculus.

Remembering Hilbert and Bernays’ discussion of definite and indefinite descriptions, clearly there is more to the epsilon calculus than this. And there are, in fact, two specific theorems provable within the epsilon calculus, though not the predicate calculus, which will start to indicate the epsilon calculus’ more general range of application. They concern individuals, since the epsilon calculus is distinctive in providing an appropriate, and systematic means of reference to them.

The need to have complete symbols for individuals became evident some years after Russell’s promotion of incomplete symbols for them. The first major book to allow for this was Rosser’s Logic for Mathematicians, in 1953, although there were precursors. For the classical difficulty with providing complete terms for individuals concerns what to do with ‘non-denoting’ terms, and Quine, for instance, following Frege, often gave them an arbitrary, though specific referent (Marciszewski 1981, p113). This idea is also present in Kalish and Montague (Kalish and Montague 1964, pp242-243), who gave the two rules:

(∃x)(y)(Fy ↔ y = x) ├ FιxFx,
¬(∃x)(y)(Fy ↔ y = x) ├ ιxFx = ιx¬(x = x),

where ‘ιxFx’ is what otherwise might be written ‘εx(y)(Fy ↔ y = x)’. Kalish and Montague believed, however, that the second rule ‘has no intuitive counterpart, simply because ordinary language shuns improper definite descriptions’ (Kalish and Montague 1964, p244). And, at that time, what Donnellan was to publish in Donnellan 1966, about improper definite descriptions, was certainly not well known. In fact ordinary speech does not shun improper definite descriptions, although their referents are not as fixed as the above second rule requires. Indeed the very fact that the descriptions are improper means that their referents are not determined semantically: instead they are just a practical, pragmatic choice.

Stalnaker and Thomason recognised the need to be more liberal when they defined their referential terms, which also had to refer, in the contexts they were concerned with, in more than one possible world (Thomason and Stalnaker 1968, p363):

In contrast with the Russellian analysis, definite descriptions are treated as genuine singular terms; but in general they will not be substance terms [rigid designators]. An expression like ιxPx is assigned a referent which may vary from world to world. If in a given world there is a unique existing individual which has the property corresponding to P, this individual is the referent of ιxPx; otherwise, ιxPx refers to an arbitrarily chosen individual which does not exist in that world.

Stalnaker and Thomason appreciated that ‘A substance term is much like what Russell called a logically proper name’, but they said that an individual constant might or might not be a substance term, depending on whether it was more like ‘Socrates’ or ‘Miss America’ (Thomason and Stalnaker 1968, p362). A more complete investigation of identity and descriptions, in modal and general intensional contexts, was provided in Routley, Meyer and Goddard 1974, and Routley 1977; see also Hughes and Cresswell 1968, Ch 11. And with these writers we get the explicit rendering of definite descriptions in epsilon terms, as in Goddard and Routley 1973, p558, Routley 1980, p277, cf. Hughes and Cresswell 1968, p203.

Certain specific theorems in the epsilon calculus, as was said before, support these kinds of identification. One theorem demonstrates directly the relation between Russell’s attributive, and some of Donnellan’s referential ideas. For

(∃x)(Fx & (y)(Fy → y = x) & Gx)

is logically equivalent to

(∃x)(Fx & (y)(Fy → y = x)) & Ga,

where a = εx(Fx & (y)(Fy → y = x)). This arises because the latter is equivalent to

Fa & (y)(Fy → y = a) & Ga,

which entails the former. But the former is

Fb & (y)(Fy → y = b) & Gb,

with b = εx(Fx & (y)(Fy → y = x) & Gx), and so entails

(∃x)(Fx & (y)(Fy → y = x)),

and

Fa & (y)(Fy → y = a).

But that means that, from the uniqueness clause,

a = b,

and so

Ga,

meaning the former entails the latter, and therefore the former is equivalent to the latter.
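
The equivalence can also be checked mechanically on small finite models. What follows is only a minimal sketch, not part of the original presentation: it assumes a choice-function reading of epsilon terms (a witness if there is one, otherwise an arbitrary element), and the names `epsilon`, `alone_F` and `russell_equivalence_holds` are purely illustrative.

```python
from itertools import product

def epsilon(pred, domain, default):
    """Choice-function reading (an assumption of this sketch): return a
    witness of pred if one exists, otherwise the arbitrary element default."""
    for d in domain:
        if pred(d):
            return d
    return default

def russell_equivalence_holds(size=3):
    """Check, for every F and G on a domain of the given size and every
    arbitrary default, that
        (Ex)(Fx & (y)(Fy -> y = x) & Gx)
    has the same truth value as
        (Ex)(Fx & (y)(Fy -> y = x)) & Ga,
    where a = ex(Fx & (y)(Fy -> y = x))."""
    domain = range(size)
    truth_rows = list(product([False, True], repeat=size))
    for F, G in product(truth_rows, repeat=2):
        def alone_F(x):
            # x is F, and anything that is F is identical with x
            return F[x] and all(not F[y] or y == x for y in domain)
        former = any(alone_F(x) and G[x] for x in domain)
        for default in domain:
            a = epsilon(alone_F, domain, default)
            latter = any(alone_F(x) for x in domain) and G[a]
            if former != latter:
                return False
    return True

print(russell_equivalence_holds())
```

A finite search of this kind is no substitute for the proof just given; it merely confirms that no countermodel exists among the interpretations enumerated.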

The former, of course, gives Russell’s Theory of Descriptions, in the case of ‘The F is G’; it explicitly asserts the first two clauses, to do with the existence and uniqueness of an F. A presuppositional theory, such as we saw in Hilbert and Bernays, would not explicitly assert these two clauses: on such an account they are a precondition before the term ‘the F’ can be introduced. But neither of these theories accommodates improper definite descriptions. Since Donnellan it is more common to allow that we can always use ‘the F’: if the description is improper then the referent of this term is simply found in the term’s practical use.

One detail of Donnellan’s historical account, however, must be treated with some care at this point. Donnellan was himself concerned with definite descriptions which were improper in the sense that they did not uniquely describe what the speaker took to be their referent. So the description might still be ‘proper’ in the above sense, if there still was something to which it uniquely applied on account of its semantic content. Thus Donnellan allowed ‘the man with martini in his glass’ to identify someone without martini in his glass irrespective of whether there was some sole man with martini in his glass. But if one talks about ‘the man with martini in his glass’ one can be correctly taken to be talking about whoever this describes, if it does in fact correctly describe someone, as Devitt and Bertolet pointed out in criticism of Donnellan (Devitt 1974, Bertolet 1980). It is this aspect of our language which the epsilon account matches, for an epsilon account allows definite descriptions to refer without attribution of their semantic character, but only if nothing uniquely has that semantic character. Thus it is not the whole of the first statement above, but only the third part of the second statement, which makes the remark ‘The F is G’.

The difficulty with Russell’s account becomes more plain if we read the two equivalent statements using relative and personal pronouns. They then become

There is one and only one F, which is G,
There is one and only one F; it is G.

Using just the logic derived from Frege, Russell could formalise the ‘which’, but could not separate out the last clause, ‘it is G’. In that clause ‘it’ is an anaphor for ‘the (one and only) F’, and it still has this linguistic meaning if there is no such thing, since that is just a matter of grammar. But the uniqueness clause is needed for the two statements to be equivalent (without uniqueness there is no equivalence, as we shall see), so ‘which’ is not itself equivalent to ‘it’. Russell, however, because he could not separate out the ‘it’, had to take the whole of the first expression as the analysis of ‘The F is G’: he could not formulate the needed ‘logically proper name’.

But how can something be the one and only F ‘if there is no such thing’? That is where another important theorem provable in the epsilon calculus is illuminating, namely:

(Fa & (y)(Fy → y = a)) → a = εx(Fx & (y)(Fy → y = x)).

The important thing is that there is a difference between the left hand side and the right hand side, i.e. between something being alone F, and that thing being the one and only F. For the left-right implication cannot be reversed. We get from the left to the right when we see that the left as a whole entails

(∃x)(Fx & (y)(Fy → y = x)),

and so also its epsilon equivalent

Fεx(Fx & (y)(Fy → y = x)) & (z)(Fz → z = εx(Fx & (y)(Fy → y = x))).

Given Fa, then from the second clause here we get the right hand side of our original implication. But if we substitute ‘εx(Fx & (y)(Fy → y = x))’ for ‘a’ in that implication then on the right we have something which is necessarily true. But the left hand side is then the same as

(∃x)(Fx & (y)(Fy → y = x)),

and that is in general contingent. Hence the implication cannot generally be reversed. Having the property of being alone F is here contingent, but possessing the identity of the one and only F is necessary.
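
The asymmetry between the two directions can be illustrated by a finite search. The sketch below is offered under the same assumption as before, namely a choice-function reading of epsilon terms with an arbitrary value when nothing satisfies the predicate; the function names are illustrative only. It confirms the left-to-right implication in every model examined and returns one model in which the converse fails.

```python
from itertools import product

def epsilon(pred, domain, default):
    """A witness of pred if there is one, else the arbitrary element default
    (an assumption of this sketch)."""
    for d in domain:
        if pred(d):
            return d
    return default

def check_alone_implies_identity(size=3):
    """Return (forward_valid, counterexample_to_converse) for
        (Fa & (y)(Fy -> y = a)) -> a = ex(Fx & (y)(Fy -> y = x))."""
    domain = range(size)
    forward_valid = True
    counterexample = None
    for F in product([False, True], repeat=size):
        def alone_F(x):
            return F[x] and all(not F[y] or y == x for y in domain)
        for default in domain:
            the_F = epsilon(alone_F, domain, default)
            for a in domain:
                left = alone_F(a)        # a is alone F (contingent)
                right = (a == the_F)     # a is the one and only F (identity)
                if left and not right:
                    forward_valid = False
                if right and not left and counterexample is None:
                    counterexample = {"F": F, "a": a}
    return forward_valid, counterexample

valid, cex = check_alone_implies_identity()
print(valid, cex)
```

The counterexample found first is a model in which nothing is F: the epsilon term still has a referent, and that referent is identical with itself, but it is not alone F.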

The distinction is not made in Russell’s logic, since possession of the relevant property is the only thing which can be formally expressed there. In Russell’s theory of descriptions, a’s possession of the property of being alone a king of France is expressed as a quasi identity

a = ιxKx,

and that has the consequence that such identities are contingent. Indeed, in counterpart theories of objects in other possible worlds the idea is pervasive that an entity may be defined in terms of its contingent properties in a given world. Hughes and Cresswell, however, differentiated between contingent identities and necessary identities in the following way (Hughes and Cresswell 1968, p191):

Now it is contingent that the man who is in fact the man who lives next door is the man who lives next door, for he might have lived somewhere else; that is living next door is a property which belongs contingently, not necessarily, to the man to whom it does belong. And similarly, it is contingent that the man who is in fact the mayor is the mayor; for someone else might have been elected instead. But if we understand [The man who lives next door is the mayor] to mean that the object which (as a matter of contingent fact) possesses the property of being the man who lives next door is identical with the object which (as a matter of contingent fact) possesses the property of being the mayor, then we are understanding it to assert that a certain object (variously described) is identical with itself, and this we need have no qualms about regarding as a necessary truth. This would give us a way of construing identity statements which makes [(x = y) → L(x = y)] perfectly acceptable: for whenever x = y is true we can take it as expressing the necessary truth that a certain object is identical with itself.

There are more consequences of this matter, however, than Hughes and Cresswell drew out. For now that we have proper referring terms for individuals to go into such expressions as ‘x = y’, we first see better where the contingency of the properties of such individuals comes from — simply the linguistic facility of using improper definite descriptions. But we also see, because identities between such terms are necessary, that proper referring terms must be rigid, i.e. have the same reference in all possible worlds.

This is not how Stalnaker and Thomason saw the matter. Stalnaker and Thomason, it will be remembered, said that there were two kinds of individual constants: ones like ‘Socrates’ which can take the place of individual variables, and others like ‘Miss America’ which cannot. The latter, as a result, they took to be non-rigid. But it is strictly ‘Miss America in year t’ which is meant in the second case, and that is not a constant expression, even though such functions can take the place of individual variables. It was Routley, Meyer and Goddard who most seriously considered the resultant possibility that all properly individual terms are rigid. At least, they worked out many of the implications of this position, even though Routley was not entirely content with it.

Routley described several rigid intensional semantics (Routley 1977, pp185-186). One of these, for instance, just took the first epsilon axiom to hold in any interpretation, and made the value of an epsilon term itself. On such a basis Routley, Meyer and Goddard derived what may be called ‘Routley’s Formula’, i.e.

L(∃x)Fx → (∃x)LFx.

In fact, on their understanding, this formula holds for any operator and any predicate, but they had in mind principally the case of necessity illustrated here, with ‘Fx’ taken as ‘x numbers the planets’, making ‘εxFx’ ‘the number of the planets’. The formula is derived quite simply, in the following way: from

L(∃x)Fx,

we can get

LFεxFx,

by the epsilon definition of the existential quantifier, and so

(∃x)LFx,

by existential generalisation over the rigid term (Routley, Meyer and Goddard 1974, p308, see also Hughes and Cresswell 1968, pp197, 204). Routley, however, was still inclined to think that a rigid semantics was philosophically objectionable (Routley 1977, p186):

Rigid semantics tend to clutter up the semantics for enriched systems with ad hoc modelling conditions. More important, rigid semantics, whether substitutional or objectual, are philosophically objectionable. For one thing, they make Vulcan and Hephaestus everywhere indistinguishable though there are intensional claims that hold of one but not of the other. The standard escape from this sort of problem, that of taking proper names like ‘Vulcan’ as disguised descriptions we have already found wanting… Flexible semantics, which satisfactorily avoid these objections, impose a more objectual interpretation, since, even if [the domain] is construed as the domain of terms, [the value of a term in a world] has to be permitted, in some cases at least, to vary from world to world.

As a result, while Routley, Meyer and Goddard were still prepared to defend the formula, and say, for instance, that there was a number which necessarily numbers the planets, namely the number of the planets (np), they thought that this was only in fact the same as 9, so that one still could not argue correctly that as L(np numbers the planets), so L(9 numbers the planets). ‘For extensional identity does not warrant intersubstitutivity in intensional frames’ (Routley, Meyer and Goddard 1974, p309). They held, in other words, that the number of the planets was only contingently 9.

This means that they denied ‘(x = y) → L(x = y)’, but, as we shall see in more detail later, there are ways to hold onto this principle, i.e. maintain the invariable necessity of identity.

3. Rigid Epsilon Terms

There is some further work which has helped us to understand how reference in modal and general intensional contexts must be rigid. But it involves some different ideas in semantics, and starts, even, outside our main area of interest, namely predicate logic, in the semantics of propositional logic.

When one thinks of ‘semantics’ one maybe thinks of the valuation of formulas. Since the 1920s a meta-study of this kind was certainly added to the previous logical interest in proof theory. Traditional proof theory is commonly associated with axiomatic procedures, but, from a modern perspective, its distinction is that it is to do with ‘object languages’. Tarski’s theory of truth relies crucially on the distinction between object languages and meta-languages, and so semantics generally seems to be necessarily a meta-discipline. In fact Tarski believed that such an elevation of our interest was forced upon us by the threat of semantic paradoxes like The Liar. If there was, by contrast, ‘semantic closure’, i.e. if truth and other semantic notions were definable at the object level, then there would be contradictions galore (c.f. Priest 1984). In this way truth may seem to be necessarily a predicate of (object-level) sentences.

But there is another way of looking at the matter which is explicitly non-Tarskian, and which others have followed (see Prior 1971, Ch 7, Sayward 1987). This involves seeing ‘it is true that’ as not a predicate, but an object-level operator, with the truth tabulations in Truth Tables, for instance, being just another form of proof procedure. Operators indeed include ‘it is provable that’, and this is distinct from Gödel’s provability predicate, as Gödel himself pointed out (Gödel 1969). Operators are intensional expressions, as in the often discussed ‘it is necessary that’ and ‘it is believed that’, and trying to see such forms of indirect discourse as metalinguistic predicates was very common in the middle of the last century. It was pervasive, for instance, in Quine’s many discussions of modality and intensionality. Wouldn’t someone be believing that the Morning Star is in the sky, but the Evening Star is not, if, respectively, they assented to the sentence ‘the Morning Star is in the sky’, and dissented from ‘the Evening Star is in the sky’? Anyone saying ‘yes’ is still following the Quinean tradition, but after Montague’s and Thomason’s work on operators (e.g. Montague 1963, Thomason 1977, 1980) many logicians are more persuaded that indirect discourse is not quotational. It is open to doubt, that is to say, whether we should see the mind in terms of the direct words which the subject would use.

The alternative involves seeing the words ‘the Morning Star is in the sky’ in such an indirect speech locution as ‘Quine believes that the Morning Star is in the sky’ as words merely used by the reporter, which need not directly reflect what the subject actually says. That is indeed central to reported speech — putting something into the reporter’s own words rather than just parroting them from another source. Thus a reporter may say

Celia believed that the man in the room was a woman,

but clearly that does not mean that Celia would use ‘the man in the room’ for who she was thinking about. So referential terms in the subordinate proposition are only certainly in the mouth of the reporter, and as a result only certainly refer to what the reporter means by them. It is a short step from this thought to seeing

There was a man in the room, but Celia believed that he was a woman,

as involving a transparent intensional locution, with the same object, as one might say, ‘inside’ the belief as ‘outside’ in the room. So it is here where rigid constant epsilon terms are needed, to symbolise the cross-sentential anaphor ‘he’, as in:

(∃x)(Mx & Rx) & BcWεx(Mx & Rx).

To understand the matter fully, however, we must make the shift from meta- to object language we saw at the propositional level above with truth. Routley, Meyer and Goddard realised that a rigid semantics required treating such expressions as ‘BcWx’ as simple predicates, and we must now see what this implies. They derived, as we saw before, ‘Routley’s Formula’

L(∃x)Fx → (∃x)LFx,

but we can now start to spell out how this is to be understood, if we hold to the necessity of identities, i.e. if we use ‘=’ so that

x = y → L(x = y).

Again a clear illustration of the validity of Routley’s Formula is provided by the number of the planets, but now we may respect the fact that some things may lack a number, and also the fact that referential, and attributive senses of terms may be distinguished. Thus if we write ‘(nx)Px’ for ‘there are n P’s’, then εn(ny)Py will be the number of P’s, and it is what numbers them (i.e. ([εn(ny)Py]x)Px) if they have a number (i.e. if (∃n)(nx)Px) — by the epsilon definition of the existential quantifier. Then, with ‘Fx’ as the proper (necessary) identity ‘x = εn(ny)Py’ Routley’s Formula holds because the number in question exists eternally, making both sides of the formula true. But if ‘Fn’ is simply the attributive ‘(ny)Py’ then this is not necessary, since it is contingent even, in the first place, that there is a number of P’s, instead of just some P, making both sides of the formula false.

Hughes and Cresswell argue against the principle saying (Hughes and Cresswell 1968, p144):

…let [Fx] be ‘x is the number of the planets’. Then the antecedent is true, for there must be some number which is the number of the planets (even if there were no planets at all there would still be such a number, namely 0): but the consequent is false, for since it is a contingent matter how many planets there are, there is no number which must be the number of the planets.

But this forgets continuous quantities, where there are no discrete items before the nomination of a unit. The number associated with some planetary material, for instance, numbers only arbitrary units of that material, and not the material itself. So the antecedent of Routley’s Formula is not necessarily true.

Quine also used the number of the planets in his central argument against quantification into modal contexts. He said (Quine 1960, pp195-197):

If for the sake of argument we accept the term ‘analytic’ as predicable of sentences (hence as attachable predicatively to quotations or other singular terms designating sentences), then ‘necessarily’ amounts to ‘is analytic’ plus an antecedent pair of quotation marks. For example, the sentence:

(1) Necessarily 9 > 4

is explained thus:

(2) ‘9 > 4’ is analytic…

So suppose (1) explained as in (2). Why, one may ask, should we preserve the operatorial form as of (1), and therewith modal logic, instead of just leaving matters as in (2)? An apparent advantage is the possibility of quantifying into modal positions; for we know we cannot quantify into quotation, and (2) uses quotation…

But is it more legitimate to quantify into modal positions than into quotation? For consider (1) even without regard to (2); surely, on any plausible interpretation, (1) is true and this is false:

(3) Necessarily the number of major planets > 4.

Since 9 = the number of major planets, we can conclude that the position of ‘9’ in (1) is not purely referential and hence that the necessity operator is opaque.

But here Quine does not separate out the referential ‘the number of the major planets is greater than 4’, i.e. ‘εn(ny)Py > 4’, from the attributive ‘There are more than 4 major planets’, i.e. ‘(∃n)((ny)Py & n > 4)’. If 9 = εn(ny)Py, then it follows that εn(ny)Py > 4, but it does not follow that (∃n)((ny)Py & n > 4). Substitution of identicals in (1), therefore, does yield (3), even though it is not necessary that there are more than 4 major planets.

We can now go into some details of how one gets the ‘x’ in such a form as ‘LFx’ to be open for quantification. For, what one finds in traditional modal semantics (see Hughes and Cresswell 1968, passim) are formulas in the meta-linguistic style, like

V(Fx, i) = 1,

which say that the valuation put on ‘Fx’ is 1, in world i. There should be quotation marks around the ‘Fx’ in such a formula, to make it meta-linguistic, but by convention they are generally omitted. To effect the change to the non-meta-linguistic point of view, we must simply read this formula as it literally is, so that the ‘Fx’ is in indirect speech rather than direct speech, and the whole becomes the operator form ‘it would be true in world i that Fx’. In this way, the term ‘x’ gets into the language of the reporter, and the meta/object distinction is not relevant. Any variable inside the subordinate proposition can now be quantified over, just like a variable outside it, which means there is ‘quantifying in’, and indeed all the normal predicate logic operations apply, since all individual terms are rigid.

An example illustrating this rigidity involves the actual top card in a pack, and the cards which might have been top card in other circumstances (see Slater 1988a). If the actual top card is the Ace of Spades, and it is supposed that the top card is the Queen of Hearts, then clearly what would have to be true for those circumstances to obtain would be for the Ace of Spades to be the Queen of Hearts. The Ace of Spades is not in fact the Queen of Hearts, but that does not mean they cannot be identical in other worlds (c.f. Hughes and Cresswell, 1968, p190). Certainly if there were several cards people variously thought were on top, those cards in the various supposed circumstances would not provide a constant c such that Fc is true in all worlds. But that is because those cards are functions of the imagined worlds: the card a believes is top (εxBaFx) need not be the card b believes is top (εxBbFx), etc. It still remains that there is a constant, c, such that Fc is true in all worlds. Moreover, that c is not an ‘intensional object’, for the given Ace of Spades is a plain and solid extensional object, the actual top card (εxFx).

Routley, Meyer and Goddard did not accept the latter point, wanting a rigid semantics in terms of ‘intensional objects’ (Goddard and Routley, 1973, p561, Routley, Meyer and Goddard, 1974, p309, see also Hughes and Cresswell 1968, p197). Stalnaker and Thomason accepted that certain referential terms could be functional, when discriminating ‘Socrates’ from ‘Miss America’ — although the functionality of ‘Miss America in year t’ is significantly different from that of ‘the top card in y’s belief’. For if this year’s Miss America is last year’s Miss America, still it is only one thing which is identical with itself, unlike with the two cards. Also, there is nothing which can force this year’s Miss America to be last year’s different Miss America, in the way that the counterfactuality of the situation with the playing cards forces two non-identical things in the actual world to be the same thing in the other possible world. Other possible worlds are thus significantly different from other times, and so, arguably, other possible worlds should not be seen from the Realist perspective appropriate for other times — or other spaces.

4. The Epsilon Calculus’ Problematic

It might be said that Realism has delayed a proper logical understanding of many of these things. If you look ‘realistically’ at picturesque remarks like that made before, namely ‘the same object is ‘inside’ the belief as ‘outside’ in the room’, then it is easy for inappropriate views about the mind to start to interfere, and make it seem that the same object cannot be in these two places at once. But if the mind were something like another space or time, then counterfactuality could get no proper purchase — no one could be ‘wrong’, since they would only be talking about elements in their ‘world’, not any objective, common world. But really, all that is going on when one says, for instance,

There was a man in the room, but Celia believed he was a woman,

is that the same term — or one term and a pronominal surrogate for it — appears at two linguistic places in some discourse, with the same reference. Hence there is no grammatical difference between the cross reference in such an intensional case and the cross reference in a non-intensional case, such as

There was a man in the room. He was hungry.

i.e.

(∃x)Mx & HεxMx.

What has been difficult has merely been getting a symbolisation of the cross-reference in this more elementary kind of case. But it just involves extending the epsilon definition of existential statements, using a reiteration of the substituted epsilon term, as we can see.

It is now widely recognised how the epsilon calculus allows us to do this (Purdy 1994, Egli and von Heusinger 1995, Meyer Viol 1995, Ch 6), the theoretical starting point being the theorem about the Russellian theory of definite descriptions proved before, which breaks up what otherwise would be a single sentence into a sequential piece of discourse, enabling the existence and uniqueness clauses to be put in one sentence while the characterising remark is in another. The relationship starts to matter when, in fact, there is no obvious way to formulate a combination of anaphoric remarks in the predicate calculus, as in, for instance,

There is a king of France. He is bald,

where there is no uniqueness clause. This difficulty became a major problem when logicians started to consider anaphoric reference in the 1960s.

Geach, for instance, in Geach 1962, even believed there could not be a syllogism of the following kind (Geach 1962, p126):

A man has just drunk a pint of sulphuric acid.
Nobody who drinks a pint of sulphuric acid lives through the day.
So, he won’t live through the day.

He said that one could only draw the conclusion:

Some man who has just drunk a pint of sulphuric acid won’t live through the day.

Certainly one can only derive

(∃x)(Mx & Dx & ¬Lx)

from

(∃x)(Mx & Dx),

and

(x)(Dx → ¬Lx),

within predicate logic. But one can still derive

¬Lεx(Mx & Dx),

within the epsilon calculus.
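
As a rough check on this last derivation, one can search small finite models for a case in which the two premises hold while the conclusion about εx(Mx & Dx) fails. The sketch assumes a choice-function reading of epsilon terms, and lets the term take every value such a function could give it; `possible_referents` and `geach_inference_valid` are illustrative names, not anything from the literature.

```python
from itertools import product

def possible_referents(pred, domain):
    """Every value a choice function could assign to an epsilon term:
    any witness if there is one, otherwise any element at all
    (an assumption of this sketch)."""
    found = [d for d in domain if pred(d)]
    return found if found else list(domain)

def geach_inference_valid(size=3):
    """Search for a model of (Ex)(Mx & Dx) and (x)(Dx -> not Lx)
    in which L ex(Mx & Dx) nevertheless holds."""
    domain = range(size)
    rows = list(product([False, True], repeat=size))
    for M, D, L in product(rows, repeat=3):
        premise1 = any(M[x] and D[x] for x in domain)
        premise2 = all(not D[x] or not L[x] for x in domain)
        if not (premise1 and premise2):
            continue
        for he in possible_referents(lambda x: M[x] and D[x], domain):
            if L[he]:            # 'he' lives through the day
                return False     # countermodel found
    return True

print(geach_inference_valid())
```

No countermodel turns up, since whichever man who drank the acid the term picks out falls under the universal premise.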

Geach likewise was foxed later when he produced his famous case (numbered 3 in Geach 1967):

Hob thinks a witch has blighted Bob’s mare, and Nob wonders whether she (the same witch) killed Cob’s sow,

which is, in epsilon terms

Th(∃x)(Wx & Bxb) & OnKεx(Wx & Bxb)c.

For Geach saw that this could not be (4)

(∃x)(Wx & ThBxb & OnKxc),

or (5)

(∃x)(Th(Wx & Bxb) & OnKxc).

But also a reading of the second clause as (c.f. 18)

Nob wonders whether the witch who blighted Bob’s mare killed Cob’s sow,

in which ‘the witch who blighted Bob’s mare killed Cob’s sow’ is analysed in the Russellian manner, i.e. as (20)

just one witch blighted Bob’s mare and she killed Cob’s sow,

Geach realised does not catch the specific cross-reference — amongst other things because of the uniqueness condition which is then introduced.

This difficulty with the uniqueness clause in Russellian analyses has been widely commented on, although a recent theorist, Neale, has said that Russell’s theory only needs to be modestly modified: Neale’s main idea is that, in general, definite descriptions should just be localised to the context. His resolution of Geach’s troubling cases thus involves suggesting that ‘she’, in the above, might simply be ‘the witch we have been hearing about’ (Neale 1990, p221). Neale might here have said ‘that witch who blighted Bob’s mare’, showing that a Hilbertian account of demonstrative descriptions would have a parallel effect.

A good deal of the ground breaking work on these matters, however, was done by someone again much influenced by Russell: Evans. But Evans significantly broke with Russell over uniqueness (Evans 1977, pp516-517):

One does not want to be committed, by this way of telling the story, to the existence of a day on which just one man and boy walked along a road. It was with this possibility in mind that I stated the requirement for the appropriate use of an E-type pronoun in terms of having answered, or being prepared to answer upon demand, the question ‘He? Who?’ or ‘It? Which?’ In order to effect this liberalisation we should allow the reference of the E-type pronoun to be fixed not only by predicative material explicitly in the antecedent clause, but also by material which the speaker supplies upon demand. This ruling has the effect of making the truth conditions of such remarks somewhat indeterminate; a determinate proposition will have been put forward only when the demand has been made and the material supplied.

It was Evans who gave us the title ‘E-type pronoun’ for the ‘he’ in such expressions as

A Cambridge philosopher smoked a pipe, and he drank a lot of whisky,

i.e., in epsilon terms,

(∃x)(Cx & Px) & Dεx(Cx & Px).

He also insisted (Evans 1977, p516) that what was unique about such pronouns was that this conjunction of statements was not equivalent to

A Cambridge philosopher, who smoked a pipe, drank a lot of whisky,

i.e.

(∃x)(Cx & Px & Dx).

Clearly the epsilon account is entirely in line with this, since it illustrates the point made before about cases without a uniqueness clause. Only the second expression, which contains a relative pronoun, is formalisable in the predicate calculus. To formalise the first expression, which contains a personal pronoun, one at least needs something with the expressive capabilities of the epsilon calculus.
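
The non-equivalence can be seen in a toy model. The sketch below is only an illustration under stated assumptions: the individuals "a" and "b", the sets C, P and D, and the parameter `choose` (standing in for whatever choice function fixes the referent of εx(Cx & Px)) are all hypothetical.

```python
def relative_clause_reading(C, P, D):
    """(Ex)(Cx & Px & Dx): some Cambridge philosopher who smoked a pipe
    drank a lot of whisky."""
    return any(x in P and x in D for x in C)

def e_type_reading(C, P, D, choose):
    """(Ex)(Cx & Px) & D ex(Cx & Px): there was one, and HE drank a lot.
    `choose` picks the referent of the epsilon term among the witnesses."""
    witnesses = sorted(C & P)
    if not witnesses:
        return False
    he = choose(witnesses)
    return he in D

# Two pipe-smoking Cambridge philosophers, only one of whom drank a lot.
C = {"a", "b"}
P = {"a", "b"}
D = {"b"}

print(relative_clause_reading(C, P, D))   # True
print(e_type_reading(C, P, D, min))       # False: 'he' is taken to be "a"
print(e_type_reading(C, P, D, max))       # True: 'he' is taken to be "b"
```

The relative-clause reading is settled by the model alone, whereas the truth of the two-sentence discourse turns on who the pronoun refers to.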

5. The Formal Semantics of Epsilon Terms

The semantics of epsilon terms is nowadays more general, but the first interpretations of epsilon terms were restricted to arithmetical cases, and specifically took epsilon to be the least number operator. Hilbert and Bernays developed Arithmetic using the epsilon calculus, using the further epsilon axiom schema (Hilbert and Bernays 1970, Book 2, p85f, c.f. Leisenring 1969, p92):

(εxAx = st) → ¬At,

where ‘s’ is intended to be the successor function, and ‘t’ is any numeral. This constrains the interpretation of the epsilon symbol, but the least number interpretation is not strictly forced, since the axiom only ensures that no number having the property A immediately precedes εxAx.

The new axiom, however, is sufficient to prove mathematical induction, in the form:

(A0 & (x)(Ax → Asx)) → (x)Ax.

For assume the reverse, namely

A0 & (x)(Ax → Asx) & ¬(x)Ax,

and consider what happens when the term ‘εx¬Ax’ is substituted in

t = 0 ∨ t = sn,

which is derivable from the other axioms of number theory which Hilbert and Bernays are using. If we had

εx¬Ax = 0,

then, since it is given that A0, then we would have Aεx¬Ax. But since, by the definition of the universal quantifier,

Aεx¬Ax ↔ (x)Ax,

we know, because ¬(x)Ax is also given, that ¬Aεx¬Ax, which means we cannot have εx¬Ax = 0. Hence we must have the other alternative, i.e.

εx¬Ax = sn,

for some n. But from the new axiom

(εx¬Ax = sn) → An,

hence we must have An, although we must also have

An → Asn,

because (x)(Ax → Asx). All together that requires Aεx¬Ax again, which is impossible. Hence the further epsilon axiom is sufficient to establish the given principle of induction.

The more general link between epsilon terms and choice functions was first set out by Asser, although Asser’s semantics for an elementary epsilon calculus without the second epsilon axiom makes epsilon terms denote rather complex choice functions. Wilfrid Meyer Viol, calling an epsilon calculus without the second axiom an ‘intensional’ epsilon calculus, makes the epsilon terms in such a calculus instead name Skolem functions. Skolem functions are also called Herbrand functions, although they arise in a different way, namely in Skolem’s Theorem. Skolem’s Theorem states that, if a formula in prenex normal form is provable in the predicate calculus, then a certain corresponding formula, with the existential quantifiers removed, is provable in a predicate calculus enriched with function symbols. The functions symbolised are called Skolem functions, although, in another context, they would be Herbrand functions.

Skolem’s Theorem is a meta-logical theorem, about the relation between two logical calculi, but a non-metalogical version is in fact provable in the epsilon calculus from which Skolem’s actual theorem follows, since, for example, we can get, by the epsilon definition, now of the existential quantifier

(x)(∃y)Fxy ↔ (x)FxεyFxy.

As a result, if the left hand side of such an equivalence is provable in an epsilon calculus the right hand side is provable there. But the left hand side is provable in an epsilon calculus if it is provable in the predicate calculus, by the Second Epsilon Theorem; and if the right hand side is provable in an epsilon calculus it is provable in a predicate calculus enriched with certain function symbols — epsilon terms, like ‘εyFxy’. So, by generalisation, we get Skolem’s original result.
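
The object-level equivalence used in this argument can be checked on small finite models. The following is a minimal sketch, assuming that the choice function behind the epsilon symbol is induced by an ordering of the domain (first witness in the ordering, or the first element when there is no witness); that restriction, and the function names, are assumptions made for illustration.

```python
from itertools import product, permutations

def epsilon(pred, order):
    """First element of `order` satisfying pred, else the first element
    of `order` as the arbitrary value (an assumption of this sketch)."""
    for d in order:
        if pred(d):
            return d
    return order[0]

def skolem_equivalence_holds(size=3):
    """Check (x)(Ey)Fxy <-> (x)Fx(eyFxy) for every binary relation F on a
    domain of the given size and every ordering-induced choice function."""
    domain = list(range(size))
    pairs = [(x, y) for x in domain for y in domain]
    for bits in product([False, True], repeat=len(pairs)):
        F = dict(zip(pairs, bits))
        left = all(any(F[(x, y)] for y in domain) for x in domain)
        for order in permutations(domain):
            # eyFxy depends on x, so it behaves as a function of x
            right = all(
                F[(x, epsilon(lambda y: F[(x, y)], order))] for x in domain
            )
            if left != right:
                return False
    return True

print(skolem_equivalence_holds())
```

The term ‘εyFxy’ is here computed afresh for each value of x, which is the sense in which it plays the part of a function symbol.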

When we add to an intensional epsilon calculus the second epsilon axiom

(x)(Fx ↔ Gx) → εxFx = εxGx,

the interpretation of epsilon terms is commonly extensional, i.e. in terms of sets, since two predicates ‘F’ and ‘G’ satisfying the antecedent of this second axiom will determine the same set — if they determine sets at all, that is. For that requires the predicates to be collectivisantes, in Bourbaki’s terms, as with explicit set membership statements, like ‘x ∈ y’. In such a case the epsilon term ‘εx(x ∈ y)’ designates a choice function, i.e. a function which selects one from a given set (c.f. Leisenring 1969, p19, Meyer Viol 1995, p42). In the case where there are no members of the set the selection is arbitrary, although for all empty sets it is invariably the same. Thus the second axiom validates, for example, Kalish and Montague’s rule for this case, which they put in the form

εxFx = εx¬(x = x).
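
The extensional reading can be sketched as follows, on the assumption that the predicates do determine sets. The domain ROOM, the function `choice`, and the two predicates are hypothetical, chosen only to show that, on such a semantics, the value of an epsilon term depends on nothing but the set its predicate determines.

```python
ROOM = ["ann", "bob", "cy"]

def choice(extension):
    """A hypothetical choice function on subsets of ROOM: the alphabetically
    first member, or the fixed arbitrary element "cy" for the empty set."""
    return min(extension) if extension else "cy"

def epsilon(pred):
    """Extensional epsilon: the value depends only on the set pred determines."""
    return choice(frozenset(d for d in ROOM if pred(d)))

red_haired = {"ann", "bob"}
caucasian = {"ann", "bob"}   # co-extensive with red_haired in this model

# Co-extensive predicates receive the same referent (second epsilon axiom).
same_referent = epsilon(lambda x: x in red_haired) == epsilon(lambda x: x in caucasian)

# Predicates true of nothing all receive the one arbitrary referent,
# as in exFx = ex not(x = x) when nothing is F.
empty_case = epsilon(lambda x: False) == epsilon(lambda x: x != x)

print(same_referent, empty_case)
```

Because the value is computed from the extension alone, no two co-extensive predicates can yield epsilon terms with different referents on this semantics.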

Kalish and Montague in fact prove a version of the second epsilon axiom in their system (Kalish and Montague 1964, see T407, p256). The second axiom also holds in Hermes’ system (Hermes 1965), although there one in addition finds a third epsilon axiom,

εx¬(x = x) = εx(x = x),

for which there would seem to be no real justification.

But the second epsilon axiom itself is curious. One questionable thing about it is that neither Leisenring nor Meyer Viol states that the predicates in question must determine sets before their choice function semantics can apply. That the predicates are collectivisantes is merely presumed in their theories, since ‘εxBx’ is invariably modelled by means of a choice from the presumed set of things which in the model are B. Certainly there is a special clause dealing with the empty set; but there is no consideration of the case where some things are B although those things are not discrete, as with the things which are red, for instance. If the predicate in question is not a count noun then there is no set of things involved, since with mass terms, and continuous quantities there are no given elements to be counted (c.f. Bunt 1985, pp262-263 in particular). Of course numbers can still be associated with them, but only given an arbitrary unit. With the cows in a field, for instance, we can associate a determinate number, but with the beef there we cannot, unless we consider, say, the number of pounds of it.

The point, as we saw before, has a formalisation in epsilon terms. Thus if we write ‘(nx)Fx’, for ‘there are n F’s’, then εn(ny)Fy will be the number of F’s, and it is what numbers them if they have a number. But in the reverse case the previously mentioned arbitrariness of the epsilon term comes in. For if ¬(∃n)(nx)Fx, then ¬([εn(ny)Fy]x)Fx, and so, although an arbitrary number exists, it does not number the F’s. In that case, in other words, we do not have a number of F’s, merely some F.

In fact, even when there is a set of things, the second epsilon axiom, as stated above, does not apply in general, since there are intensional differences between properties to consider, as in, for instance, ‘There is a red-haired man, and a Caucasian in the room, and they are different’. Here, if there were only red-haired Caucasians in the room, then with the above second axiom, we could not find epsilon substitutions to differentiate the two individuals involved. This may remind us that it is necessary co-extensionality, and not just contingent co-extensionality, which is the normal criterion for the identity of properties (cf. Hughes and Cresswell 1968, pp209-210). So it leads us to see the appropriateness of a modalised second axiom, which uses just an intensional version of the antecedent of the previous second epsilon axiom, in which ‘L’ means ‘it is necessary that’, namely:

L(x)(Fx ↔ Gx) →εxFx = εxGx.

For with this axiom only the co-extensionalities which are necessary will produce identities between the associated epsilon terms. We can only get, for instance,

εxPx = εx(Px ∨ Px),

and

εxFx = εyFy,

and all other identities derivable in a similar way.

However, the original second epsilon axiom is then provable, in the special case where the predicates express set membership. For if necessarily

(x)(x ∈ y ↔ x ∈ z) ↔ y = z,

while necessarily

y = z ↔ L(y = z),

(see Hughes and Cresswell, 1968, p190) then

L(x)(x ∈ y ↔ x ∈ z) ↔ (x)(x ∈ y ↔ x ∈ z),

and so, from the modalised second axiom we can get

(x)(x ∈ y ↔ x ∈ z) →εx(x ∈ y) = εx(x ∈ z).
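The steps of this derivation can be set out explicitly. The following is only a sketch, assuming a normal modal logic in which ‘Lp → p’ holds, and writing ‘E’ as an abbreviation (introduced here) for ‘(x)(x ∈ y ↔ x ∈ z)’.

```latex
% E abbreviates (x)(x \in y \leftrightarrow x \in z); L is 'it is necessary that'.
\begin{align*}
&1.\quad L\bigl(E \leftrightarrow y = z\bigr) && \text{necessary extensionality of sets}\\
&2.\quad L\bigl(y = z \leftrightarrow L(y = z)\bigr) && \text{necessity of identity}\\
&3.\quad E \rightarrow y = z && \text{from 1, by } Lp \rightarrow p\\
&4.\quad y = z \rightarrow L(y = z) && \text{from 2, by } Lp \rightarrow p\\
&5.\quad L(y = z) \rightarrow LE && \text{from 1, by } L(p \leftrightarrow q) \rightarrow (Lp \leftrightarrow Lq)\\
&6.\quad E \rightarrow LE && \text{from 3, 4, 5}\\
&7.\quad LE \leftrightarrow E && \text{from 6 and } LE \rightarrow E\\
&8.\quad LE \rightarrow \varepsilon x(x \in y) = \varepsilon x(x \in z) && \text{modalised second axiom}\\
&9.\quad E \rightarrow \varepsilon x(x \in y) = \varepsilon x(x \in z) && \text{from 6, 8}
\end{align*}
```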

Note, however, that if one only has contingently

(x)(Fx ↔ x ∈ z),

then one cannot get, on this basis,

εxFx = εx(x ∈ z).

But this is something which is desirable, as well. For we have seen that it is contingent that the number of the planets does number the planets — because it is not necessary that ([εn(ny)Py]x)Px. This makes ‘(9x)Px’ contingent, even though the identity ‘9 = εn(nx)Px’ remains necessary. But also it is contingent that there is the set of planets, p, which there is, since while, say,

(x)(x ∈ p ↔ Px),

where

εn(nx)(x ∈ p) = εn(nx)Px = 9,

it is still possible that, in some other possible world,

(x)(x ∈ p’ ↔ Px),

with p’ the set of planets there, and

¬(εn(nx)(x ∈ p’) = 9).

We could not have this further contingency, however, if the original second epsilon axiom held universally.

It is on this fuller basis that we can continue to hold ‘x = y → L(x = y)’, i.e. the invariable necessity of identity: one merely distinguishes ‘(9x)Px’ from ‘9 = εn(nx)Px’, and from ‘9 = εn(nx)(x ∈ p)’, as above.

Adding the original second epsilon axiom to an intensional epsilon calculus is therefore acceptable only if all the predicates are about set membership. This is not an uncommon assumption, indeed it is pervasive in the usually given semantics for predicate logic, for instance. But if, by contrast, we want to allow for the fact that not all predicates are collectivisantes then we should take just the first epsilon axiom with merely a modalised version of the second epsilon axiom. The interpretation of epsilon terms is then always in terms of Skolem functions, although if we are dealing with the membership of sets, those Skolem functions naturally are choice functions.

6. Some Metatheory

To finish we shall briefly look, as promised, at some metatheory.

The epsilon calculi that were first described were not very convenient to use, and Hilbert and Bernays’ proofs of the First and Second Epsilon Theorems were very complex. This was because the presentation was axiomatic, however, and with the development of other means of presenting the same logics we get more readily available meta-logical results. I will indicate some of the early difficulties before showing how these theorems can be proved, nowadays, much more simply.

The problem with proving the Second Epsilon Theorem, on an axiomatic basis, is that complex, and non-constant epsilon terms may enter a proof in the epsilon calculus by means of substitutions into the axioms. What has to be proved is that an epsilon calculus proof of an epsilon-free theorem (i.e. one which can be expressed just in predicate calculus language) can be replaced by a predicate calculus proof. So some analysis of complex epsilon terms is required, to show that they can be eliminated in the relevant cases, leaving only constant epsilon terms, which are sufficiently similar to the individual symbols in standard predicate logic. Hilbert and Bernays (Hilbert and Bernays 1970, Book 2, p23f) say that one epsilon term ‘εxFx’ is subordinate to another ‘εyGy’ if and only if ‘G’ contains ‘εxFx’, and a free occurrence of the variable ‘y’ lies within ‘εxFx’. For instance ‘εxRyx’ is a complex, and non-constant epsilon term, which is subordinate to ‘εySyεxRyx’. Hilbert and Bernays then define the rank of an epsilon term to be 1 if there are no epsilon terms subordinate to it, and otherwise to be one greater than the maximal rank of the epsilon terms which are subordinate to it. Using the same general ideas, Leisenring proves two theorems (Leisenring 1969, p72f). First he proves a rank reduction theorem, which shows that epsilon proofs of epsilon-free formulas in which the second epsilon axiom is not used, but in which every term is of rank less than or equal to r, may be replaced by epsilon proofs in which every term is of rank less than or equal to r – 1. Then he proves the eliminability of the second epsilon axiom in proofs of epsilon-free formulas. Together, these two theorems show that if there is an epsilon proof of an epsilon-free formula, then there is such a proof not using the second epsilon axiom, and in which all epsilon terms have rank just 1. Even though such epsilon terms might still contain free variables, if one replaces those that do with a fixed symbol ‘a’ (starting with those of maximal length) that reduces the proof to one in what is called the ‘epsilon star’ system, in which there are only constant epsilon terms (Leisenring 1969, p66f). Leisenring shows that proofs in the epsilon star system can be turned into proofs in the predicate calculus, by replacing the epsilon terms by individual symbols.
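The definitions of subordination and rank lend themselves to a short recursive sketch. The encoding is an assumption made for illustration only: variables are strings, an atomic formula is a tuple such as `("R", "y", "x")`, an epsilon term ‘εxA’ is `("eps", "x", A)`, and all bound variables are taken to be distinct.

```python
def free_vars(node):
    """Variables occurring free in a term or formula of the toy encoding."""
    if isinstance(node, str):
        return {node}
    if node[0] == "eps":
        _, var, body = node
        return free_vars(body) - {var}
    result = set()
    for arg in node[1:]:          # node[0] is the predicate or connective symbol
        result |= free_vars(arg)
    return result


def eps_subterms(node):
    """Yield every epsilon term occurring anywhere inside node."""
    if isinstance(node, str):
        return
    if node[0] == "eps":
        yield node
        yield from eps_subterms(node[2])
    else:
        for arg in node[1:]:
            yield from eps_subterms(arg)


def rank(term):
    """Rank of an epsilon term: 1 if nothing is subordinate to it, otherwise
    one more than the maximal rank of its subordinate terms."""
    _, var, body = term
    subordinate = [e for e in eps_subterms(body) if var in free_vars(e)]
    if not subordinate:
        return 1
    return 1 + max(rank(e) for e in subordinate)


# The inner term of 'εySyεxRyx' contains 'y' free, so it is subordinate to the whole.
inner = ("eps", "x", ("R", "y", "x"))
outer = ("eps", "y", ("S", "y", inner))
# Here the inner term is constant, so nothing is subordinate to the outer term.
flat = ("eps", "y", ("S", "y", ("eps", "x", ("R", "x"))))
```

On this encoding `rank(inner)` is 1, `rank(outer)` is 2, and `rank(flat)` is 1, since a contained term which does not mention the outer bound variable does not count as subordinate.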

But, as was said before, there is now available a much shorter proof of the Second Epsilon Theorem. In fact there are several, but I shall just indicate one, which arises simply by modifying the predicate calculus truth trees, as found in, for instance, Jeffrey (see Jeffrey 1967). Jeffrey uses the standard propositional truth tree rules, together with the rules of quantifier interchange, which remain unaffected, and which are not material to the present purpose. He also has, however, a rule of existential quantifier elimination,

(∃x)Fx ├ Fa,

in which ‘a’ must be new, and a rule of universal quantifier elimination

(x)Fx ├ Fb,

in which ‘b’ must be old — unless no other individual terms are available. By reducing closed formulas of the form ‘P & ¬C’ to absurdity Jeffrey can then prove ‘P → C’, and validate ‘P ├ C’ in his calculus. But clearly, upon adding epsilon terms to the language, the first of these rules must be changed to

(∃x)Fx ├ FεxFx,

while also the second rule can be replaced by the pair

(x)Fx ├ Fεx¬Fx,
Fεx¬Fx ├ Fa,

(where ‘a’ is old) to produce an appropriate proof procedure. Steen reads ‘εx¬Fx’ as ‘the most un-F-like thing’ (Steen 1972, p162), which explains why Fεx¬Fx entails Fa, since if the most un-F-like thing is in fact F, then the most plausible counter-example to the generalisation is in fact not so, making the generalisation exceptionless. But there is a more important reason why the rule of universal quantifier elimination is best broken up into two parts.

For Jeffrey’s rules only allow him ‘limited upward correctness’ (Jeffrey 1967, p167), since Jeffrey has to say, with respect to his universal quantifier elimination rule, that the range of the quantification there be limited merely to the universe of discourse of the path below. This is because, if an initial sentence is false in a valuation so also must be one of its conclusions. But the first epsilon rule which replaces Jeffrey’s rule ensures, instead, that there is ‘total upwards correctness’. For if it is false that everything is F then, without any special interpretation of the quantifier, one of the given consequences of the universal statement is false, namely the immediate one — since Fεx¬Fx is in fact equivalent to (x)Fx. A similar improvement also arises with the existential quantifier elimination rule. For Jeffrey can only get ‘limited downwards correctness’, with his existential quantifier elimination rule (Jeffrey 1967, p165), since it is not an entailment. In fact, in order to show that if an initial sentence is true in a valuation so is one of its conclusions, in this case, Jeffrey has to stretch his notion of ‘truth’ to being true either in the given valuation, or some nominal variant of it.

The epsilon rule which replaces Jeffrey’s overcomes this difficulty by not employing names, only demonstrative descriptions, and by being, as a result, totally downward correct. For if there is an F then that F is F, whatever name is used to refer to it. The epsilon calculus terminology thus precedes any naming: it gets hold of the more primitive, demonstrative way we have of referring to objects, using phrases like ‘that F’. Thus in explication of the predicate calculus rule we might well have said

suppose there is an F, well, call that F ‘a’, then Fa,

but that requires we understand ‘that F’ before we come to use ‘a’.
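That the two epsilon rules are equivalences, and not merely one-way inferences, can be checked by brute force in a small finite model. The sketch below assumes a three-element domain and a choice-function reading of epsilon with an arbitrary default; the function names are hypothetical, and a finite check of this kind is an illustration, not a proof.

```python
from itertools import product

DOMAIN = [0, 1, 2]
DEFAULT = DOMAIN[0]  # arbitrary value of an epsilon term with empty extension (assumption)


def eps(pred):
    """Choice-function reading of epsilon over the finite DOMAIN."""
    for x in DOMAIN:
        if pred(x):
            return x
    return DEFAULT


def check_quantifier_rules():
    """Check, for every one-place predicate F on DOMAIN, that
    (∃x)Fx holds exactly when FεxFx does, and (x)Fx exactly when Fεx¬Fx does."""
    for values in product([False, True], repeat=len(DOMAIN)):
        F = lambda x, v=values: v[x]
        not_F = lambda x, v=values: not v[x]
        exists_F = any(F(x) for x in DOMAIN)
        forall_F = all(F(x) for x in DOMAIN)
        if exists_F != F(eps(F)):
            return False
        if forall_F != F(eps(not_F)):
            return False
    return True


rules_hold = check_quantifier_rules()
```

The check succeeds because ‘εx¬Fx’ picks out a counter-example to the generalisation whenever there is one, so the single instance ‘Fεx¬Fx’ can be true only if there is none.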

So how does the Second Epsilon Theorem follow? This theorem, as before, states that an epsilon calculus proof of an epsilon-free theorem may be replaced by a predicate calculus proof of the same formula. But the transformation required in the present setting is now evident: simply change to new names all epsilon terms introduced in the epsilon calculus quantifier elimination rules. This covers not only the new names in Jeffrey’s first rule, but also the odd case where there are no old names in Jeffrey’s second rule. The epsilon calculus proofs invariably use constant epsilon terms, and are thus effectively in Leisenring’s epsilon star system.
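The renaming step can be sketched as follows. The tuple encoding of formulas, the function name `replace_epsilon_terms`, and the supply of names ‘c1’, ‘c2’, … are all assumptions of the sketch; in particular the generated names are presumed not to occur already in the proof.

```python
def replace_epsilon_terms(proof):
    """Replace each constant epsilon term in a list of formulas by a fresh name.

    Formulas are nested tuples; an epsilon term 'εxA' is ("eps", "x", A).
    The same epsilon term always receives the same name.
    """
    names = {}

    def fresh(term):
        if term not in names:
            names[term] = f"c{len(names) + 1}"
        return names[term]

    def walk(node):
        if isinstance(node, str):
            return node
        if node[0] == "eps":
            return fresh(node)    # the whole term becomes an individual symbol
        return (node[0],) + tuple(walk(arg) for arg in node[1:])

    return [walk(line) for line in proof], names


# A fragment of a tree: (∃x)Fx, then FεxFx by the first epsilon rule, then ¬FεxFx.
eps_F = ("eps", "x", ("F", "x"))
epsilon_proof = [
    ("exists", "x", ("F", "x")),
    ("F", eps_F),
    ("not", ("F", eps_F)),
]
predicate_proof, name_table = replace_epsilon_terms(epsilon_proof)
```

After the replacement the second and third lines become ‘Fc1’ and ‘¬Fc1’, which is just what Jeffrey’s existential rule would have produced with ‘c1’ as the new name.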

Epsilon terms which are non-constant, however, crucially enter the proof of the First Epsilon Theorem. The First Epsilon Theorem states that if C is a provable predicate calculus formula, in prenex normal form, i.e. with all quantifiers at the front, then a finite disjunction of instances of C’s matrix is provable in the epsilon calculus. The crucial fact is that the epsilon calculus gives us access to Herbrand functions, which arise when universal quantifiers are eliminated from formulas using their epsilon definition. Thus

(∃y)(x)¬Fyx,

for instance, is equivalent to

(∃y)¬Fyεx¬¬Fyx,

and so

(∃y)¬FyεxFyx,

and the resulting epsilon term ‘εxFyx’ is a Herbrand function.
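The elimination of the universal quantifier in this example can be checked in a small model. The sketch assumes a two-element domain and a choice-function reading of epsilon with an arbitrary default; `herbrand_equivalence_holds` is a hypothetical name, and the exhaustive check is only an illustration of the equivalence.

```python
from itertools import product

DOMAIN = [0, 1]


def eps(pred):
    """Choice-function reading of epsilon; DOMAIN[0] is the arbitrary default."""
    for x in DOMAIN:
        if pred(x):
            return x
    return DOMAIN[0]


def herbrand_equivalence_holds():
    """Compare (∃y)(x)¬Fyx with (∃y)¬FyεxFyx for every two-place F on DOMAIN."""
    pairs = list(product(DOMAIN, repeat=2))
    for values in product([False, True], repeat=len(pairs)):
        table = dict(zip(pairs, values))
        F = lambda y, x, t=table: t[(y, x)]
        # The Herbrand function: for each y, the value of 'εxFyx'.
        h = lambda y, F=F: eps(lambda x: F(y, x))
        original = any(all(not F(y, x) for x in DOMAIN) for y in DOMAIN)
        eliminated = any(not F(y, h(y)) for y in DOMAIN)
        if original != eliminated:
            return False
    return True


equivalent = herbrand_equivalence_holds()
```

The value of the epsilon term depends on ‘y’, which is why it behaves as a function of that variable and not as a constant.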

Using such reductions, all universal quantifiers can evidently be removed from formulas in prenex normal form, and the additional fact that, in a certain specific way, the remaining existential quantifiers are disjunctions makes all predicate calculus formulas equivalent to disjunctions. Remember that a formula is provable if its negation is reducible to absurdity, which means that its truth tree must close. But, by König’s Lemma, if there is no open path through a truth tree then there is some finite stage at which there is no open path, so, in the case above, for instance, if no valuation makes the last formula’s negation true, then the tree of the instances of that negative statement must close in a finite length. But the negative statement is the universal formula

(y)FyεxFyx,

by the rules of quantifier interchange, so a finite conjunction of instances of the matrix of this universal formula, namely Fyx, must reduce to absurdity. For the rules of universal quantifier elimination only produce consequences with the form of this matrix. By de Morgan’s Laws, that makes necessary a finite disjunction of instances of ¬Fyx. By generalisation we thus get the First Epsilon Theorem.

The epsilon calculus, however, can take us further than the First Epsilon Theorem. Indeed, one has to take care with the impression this theorem may give that existential statements are just equivalent to disjunctions. If that were the case, then existential statements would be unlike individual statements, saying not that one specified thing has a certain property, but merely that one of a certain group of things has a certain property. The group in question is normally called the ‘domain’ of the quantification, and this, it seems, has to be specified when setting out the semantics of quantifiers. But study of the epsilon calculus shows that there is no need for such ‘domains’, or indeed for such semantics. This is because the example above, for instance, is also equivalent to

¬FaεzFaz,

where a = εy¬FyεxFyx. So the previous disjunction of instances of ¬Fyx is in fact only true because this specific disjunct is true. The First Epsilon Theorem, it must be remembered, does not prove that an existential statement is equivalent to a certain disjunction; it shows merely that an existential statement is provable if and only if a certain disjunction is provable. And what is also provable, in such a case, is a statement merely about one object. Indeed the existential statement is provably equivalent to it. It is this fact which supports the epsilon definition of the quantifiers; and it is what permits anaphoric reference to the same object by means of the same epsilon term. An existential statement is thus just another statement about an individual, merely a nameless one.
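The reduction of the existential statement to a single instance can be made concrete. The sketch below assumes a three-element domain, a choice-function reading of epsilon, and takes ‘a’ to be the value of ‘εy¬FyεxFyx’; the function `witness` and the sample relation are hypothetical.

```python
from itertools import product

DOMAIN = [0, 1, 2]


def eps(pred):
    """Choice-function reading of epsilon; DOMAIN[0] is the arbitrary default."""
    for x in DOMAIN:
        if pred(x):
            return x
    return DOMAIN[0]


def witness(F):
    """Return the object a, and the truth value of the single instance ¬FaεzFaz."""
    a = eps(lambda y: not F(y, eps(lambda x: F(y, x))))
    return a, not F(a, eps(lambda z: F(a, z)))


def instance_matches_existential():
    """Check that ¬FaεzFaz agrees with (∃y)(x)¬Fyx for every two-place F on DOMAIN."""
    pairs = list(product(DOMAIN, repeat=2))
    for values in product([False, True], repeat=len(pairs)):
        table = dict(zip(pairs, values))
        F = lambda y, x, t=table: t[(y, x)]
        existential = any(all(not F(y, x) for x in DOMAIN) for y in DOMAIN)
        if witness(F)[1] != existential:
            return False
    return True


# Sample relation: Fyx iff y is less than x; only 2 is less than nothing in DOMAIN.
sample = witness(lambda y, x: y < x)
all_match = instance_matches_existential()
```

For the sample relation the chosen object is 2 and the instance is true, so the existential statement holds there in virtue of a fact about that one object, without any survey of a disjunction of cases.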

The reverse point goes for the universal quantifier: a universal statement is not the conjunction of its instances, even though it implies them. A generalisation is simply equivalent to one of its instances — to the one involving the prime putative exception to it, as we have seen. Not being able to specify that prime putative exception leaves Jeffrey saying that if a generalisation is false then one of its instances is false without any way of ensuring that that instance has been drawn as a conclusion below it in the truth tree except by limiting the interpretation of the generalisation just to the universe of discourse of the path. It thus seems necessary, within the predicate calculus, that there be a ‘model’ for the quantifiers which restricts them to a certain ‘domain’, which means that they do not necessarily range over everything. But in the epsilon calculus the quantifiers do, invariably, range over everything, and so there is no need to specify their range.

7. References and Further Reading

  • Ackermann, W. 1937-8, ‘Mengentheoretische Begründung der Logik’, Mathematische Annalen, 115, 1-22.
  • Asser, G. 1957, ‘Theorie der Logischen Auswahlfunktionen’, Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 3, 30-68.
  • Bernays, P. 1958, Axiomatic Set Theory, North Holland, Dordrecht.
  • Bertolet, R. 1980, ‘The Semantic Significance of Donnellan’s Distinction’, Philosophical Studies, 37, 281-288.
  • Bourbaki, N. 1954, Éléments de Mathématique, Hermann, Paris.
  • Bunt, H.C. 1985, Mass Terms and Model-Theoretic Semantics, C.U.P., Cambridge.
  • Church, A. 1940, ‘A Formulation of the Simple Theory of Types’, Journal of Symbolic Logic, 5, 56-68.
  • Copi, I. 1973, Symbolic Logic, 4th ed., Macmillan, New York.
  • Devitt, M. 1974, ‘Singular Terms’, The Journal of Philosophy, 71, 183-205.
  • Donnellan, K. 1966, ‘Reference and Definite Descriptions’, Philosophical Review, 75, 281-304.
  • Egli, U. and von Heusinger, K. 1995, ‘The Epsilon Operator and E-Type Pronouns’ in U. Egli et al. (eds.), Lexical Knowledge in the Organisation of Language, Benjamins, Amsterdam.
  • Evans, G. 1977, ‘Pronouns, Quantifiers and Relative Clauses’, Canadian Journal of Philosophy, 7, 467-536.
  • Geach, P.T. 1962, Reference and Generality, Cornell University Press, Ithaca.
  • Geach, P.T. 1967, ‘Intentional Identity’, The Journal of Philosophy, 64, 627-632.
  • Goddard, L. and Routley, R. 1973, The Logic of Significance and Context, Scottish Academic Press, Aberdeen.
  • Gödel, K. 1969, ‘An Interpretation of the Intuitionistic Sentential Calculus’, in J. Hintikka (ed.), The Philosophy of Mathematics, O.U.P., Oxford.
  • Hazen, A. 1987, ‘Natural Deduction and Hilbert’s ε-operator’, Journal of Philosophical Logic, 16, 411-421.
  • Hermes, H. 1965, Eine Termlogik mit Auswahloperator, Springer Verlag, Berlin.
  • Hilbert, D. 1923, ‘Die Logischen Grundlagen der Mathematik’, Mathematische Annalen, 88, 151-165.
  • Hilbert, D. 1925, ‘On the Infinite’ in J. van Heijenoort (ed.), From Frege to Gödel, Harvard University Press, Cambridge MA.
  • Hilbert, D. and Bernays, P. 1970, Grundlagen der Mathematik, 2nd ed., Springer, Berlin.
  • Hughes, G.E. and Cresswell, M.J. 1968, An Introduction to Modal Logic, Methuen, London.
  • Jeffrey, R. 1967, Formal Logic: Its Scope and Limits, 1st ed., McGraw-Hill, New York.
  • Kalish, D. and Montague, R. 1964, Logic: Techniques of Formal Reasoning, Harcourt, Brace and World, Inc, New York.
  • Kneebone, G.T. 1963, Mathematical Logic and the Foundations of Mathematics, Van Nostrand, Dordrecht.
  • Leisenring, A.C. 1969, Mathematical Logic and Hilbert’s ε-symbol, Macdonald, London.
  • Marciszewski, W. 1981, Dictionary of Logic, Martinus Nijhoff, The Hague.
  • Meyer Viol, W.P.M. 1995, Instantial Logic, ILLC Dissertation Series 1995-11, Amsterdam.
  • Montague, R. 1963, ‘Syntactical Treatments of Modality, with Corollaries on Reflection Principles and Finite Axiomatisability’, Acta Philosophica Fennica, 16, 155-167.
  • Neale, S. 1990, Descriptions, MIT Press, Cambridge MA.
  • Priest, G.G. 1984, ‘Semantic Closure’, Studia Logica, XLIII 1/2, 117-129.
  • Prior, A.N. 1971, Objects of Thought, O.U.P., Oxford.
  • Purdy, W.C. 1994, ‘A Variable-Free Logic for Anaphora’ in P. Humphreys (ed.), Patrick Suppes: Scientific Philosopher, Vol 3, Kluwer, Dordrecht, 41-70.
  • Quine, W.V.O. 1960, Word and Object, Wiley, New York.
  • Rasiowa, H. 1956, ‘On the ε-theorems’, Fundamenta Mathematicae, 43, 156-165.
  • Rosser, J. B. 1953, Logic for Mathematicians, McGraw-Hill, New York.
  • Routley, R. 1969, ‘A Simple Natural Deduction System’, Logique et Analyse, 12, 129-152.
  • Routley, R. 1977, ‘Choice and Descriptions in Enriched Intensional Languages II, and III’, in E. Morscher, J. Czermak, and P. Weingartner (eds), Problems in Logic and Ontology, Akademische Druck und Verlagsanstalt, Graz.
  • Routley, R. 1980, Exploring Meinong’s Jungle, Departmental Monograph #3, Philosophy Department, R.S.S.S., A.N.U., Canberra.
  • Routley, R., Meyer, R. and Goddard, L. 1974, ‘Choice and Descriptions in Enriched Intensional Languages I’, Journal of Philosophical Logic, 3, 291-316.
  • Russell, B. 1905, ‘On Denoting’, Mind, 14, 479-493.
  • Sayward, C. 1987, ‘Prior’s Theory of Truth’, Analysis, 47, 83-87.
  • Slater, B.H. 1986(a), ‘E-type Pronouns and ε-terms’, Canadian Journal of Philosophy, 16, 27-38.
  • Slater, B.H. 1986(b), ‘Prior’s Analytic’, Analysis, 46, 76-81.
  • Slater, B.H. 1988(a), ‘Intensional Identities’, Logique et Analyse, 121-2, 93-107.
  • Slater, B.H. 1988(b), ‘Hilbertian Reference’, Noûs, 22, 283-97.
  • Slater, B.H. 1989(a), ‘Modal Semantics’, Logique et Analyse, 127-8, 195-209.
  • Slater, B.H. 1990, ‘Using Hilbert’s Calculus’, Logique et Analyse, 129-130, 45-67.
  • Slater, B.H. 1992(a), ‘Routley’s Formulation of Transparency’, History and Philosophy of Logic, 13, 215-24.
  • Slater, B.H. 1994(a), ‘The Epsilon Calculus’ Problematic’, Philosophical Papers, XXIII, 217-42.
  • Steen, S.W.P. 1972, Mathematical Logic, C.U.P., Cambridge.
  • Thomason, R. 1977, ‘Indirect Discourse is not Quotational’, Monist, 60, 340-354.
  • Thomason, R. 1980, ‘A Note on Syntactical Treatments of Modality’, Synthese, 44, 391-395.
  • Thomason, R.H. and Stalnaker, R.C. 1968, ‘Modality and Reference’, Noûs, 2, 359-372.

Author Information

Barry Hartley Slater
Email: slaterbh@cyllene.uwa.edu.au
University of Western Australia
Australia

Thomas Hobbes: Moral and Political Philosophy

The English philosopher Thomas Hobbes (1588-1679) is best known for his political thought, and deservedly so. His vision of the world is strikingly original and still relevant to contemporary politics. His main concern is the problem of social and political order: how human beings can live together in peace and avoid the danger and fear of civil conflict. He poses stark alternatives: we should give our obedience to an unaccountable sovereign (a person or group empowered to decide every social and political issue). Otherwise what awaits us is a state of nature that closely resembles civil war – a situation of universal insecurity, where all have reason to fear violent death and where rewarding human cooperation is all but impossible.

One controversy has dominated interpretations of Hobbes. Does he see human beings as purely self-interested or egoistic? Several passages support such a reading, leading some to think that his political conclusions can be avoided if we adopt a more realistic picture of human nature. However, most scholars now accept that Hobbes himself had a much more complex view of human motivation. A major theme below will be why the problems he poses cannot be avoided simply by taking a less selfish view of human nature.

Table of Contents

  1. Introduction
  2. Life and Times
  3. Two Intellectual Influences
  4. Ethics and Human Nature
    1. Materialism Versus Self-Knowledge
    2. The Poverty of Human Judgment and our Need for Science
    3. Motivation
    4. Political Philosophy
  5. The Natural Condition of Mankind
    1. The Laws of Nature and the Social Contract
    2. Why Should we Obey the Sovereign?
    3. Life Under the Sovereign
  6. Conclusion
  7. References and Further Reading

1. Introduction

Hobbes is the founding father of modern political philosophy. Directly or indirectly, he has set the terms of debate about the fundamentals of political life right into our own times. Few have liked his thesis, that the problems of political life mean that a society should accept an unaccountable sovereign as its sole political authority. Nonetheless, we still live in the world that Hobbes addressed head on: a world where human authority is something that requires justification, and is automatically accepted by few; a world where social and political inequality also appears questionable; and a world where religious authority faces significant dispute. We can put the matter in terms of the concern with equality and rights that Hobbes’s thought heralded: we live in a world where all human beings are supposed to have rights, that is, moral claims that protect their basic interests. But what or who determines what those rights are? And who will enforce them? In other words, who will exercise the most important political powers, when the basic assumption is that we all share the same entitlements?

We can see Hobbes’s importance if we briefly compare him with the most famous political thinkers before and after him. A century before, Niccolò Machiavelli had emphasized the harsh realities of power, as well as recalling ancient Roman experiences of political freedom. Machiavelli appears as the first modern political thinker because, like Hobbes, he was no longer prepared to talk about politics in terms set by religious faith (indeed, he was still more offensive than Hobbes to many orthodox believers); instead, he looked upon politics as a secular discipline divorced from theology. But unlike Hobbes, Machiavelli offers us no comprehensive philosophy: we have to reconstruct his views on the importance and nature of freedom; it remains uncertain which, if any, principles Machiavelli draws on in his apparent praise of amoral power politics.

Writing a few years after Hobbes, John Locke had definitely accepted the terms of debate Hobbes had laid down: how can human beings live together, when religious or traditional justifications of authority are no longer effective or persuasive? How is political authority justified and how far does it extend? In particular, are our political rulers properly as unlimited in their powers as Hobbes had suggested? And if they are not, what system of politics will ensure that they do not overstep the mark, do not trespass on the rights of their subjects?

So, in assessing Hobbes’s political philosophy, our guiding questions can be: What did Hobbes write that was so important? How was he able to set out a way of thinking about politics and power that remains decisive nearly four centuries afterwards? We can get some clues to this second question if we look at Hobbes’s life and times.

2. Life and Times

Hobbes’s biography is dominated by the political events in England and Scotland during his long life. Born in 1588, the year the Spanish Armada made its ill-fated attempt to invade England, he lived to the exceptional age of 91, dying in 1679. He was not born to power or wealth or influence: the son of a disgraced village vicar, he was lucky that his uncle was wealthy enough to provide for his education and that his intellectual talents were soon recognized and developed (through thorough training in the classics of Latin and Greek). Those intellectual abilities, and his uncle’s support, brought him to university at Oxford. And these in turn—together with a good deal of common sense and personal maturity—won him a place tutoring the son of an important noble family, the Cavendishes. This meant that Hobbes entered circles where the activities of the King, of Members of Parliament, and of other wealthy landowners were known and discussed, and indeed influenced. Thus intellectual and practical ability brought Hobbes to a place close to power—later he would even be math tutor to the future King Charles II. Although this never made Hobbes powerful, it meant he was acquainted with and indeed vulnerable to those who were. As the scene was being set for the Civil Wars of 1642-46 and 1648-51—wars that would lead to the King being executed and a republic being declared—Hobbes felt forced to leave the country for his personal safety, and lived in France from 1640 to 1651. Even after the monarchy had been restored in 1660, Hobbes’s security was not always certain: powerful religious figures, critical of his writings, made moves in Parliament that apparently led Hobbes to burn some of his papers for fear of prosecution.

Thus Hobbes lived in a time of upheaval, sharper than any England has since known. This turmoil had many aspects and causes, political and religious, military and economic. England stood divided against itself in several ways. The rich and powerful were divided in their support for the King, especially concerning the monarch’s powers of taxation. Parliament was similarly divided concerning its own powers vis-à-vis the King. Society was divided religiously, economically, and by region. Inequalities in wealth were huge, and the upheavals of the Civil Wars saw the emergence of astonishingly radical religious and political sects. (For instance, the Levellers called for much greater equality in terms of wealth and political rights; the Diggers, more radical still, fought for the abolition of wage labor.) Civil war meant that the country became militarily divided. And all these divisions cut across one another: for example, the army of the republican challenger, Cromwell, was the main home of the Levellers, yet Cromwell in turn would act to destroy their power within the army’s ranks. In addition, England’s recent union with Scotland was fragile at best, and was almost destroyed by King Charles I’s attempts to impose consistency in religious practices. We shall see that Hobbes’s greatest fear was social and political chaos—and he had ample opportunity both to observe it and to suffer its effects.

Although social and political turmoil affected Hobbes’s life and shaped his thought, it never hampered his intellectual development. His early position as a tutor gave him the scope to read, write and publish (a brilliant translation of the Greek writer Thucydides appeared in 1629), and brought him into contact with notable English intellectuals such as Francis Bacon. His self-imposed exile in France, along with his emerging reputation as a scientist and thinker, brought him into contact with major European intellectual figures of his time, leading to exchange and controversy with figures such as Descartes, Mersenne and Gassendi. Intensely disputatious, Hobbes repeatedly embroiled himself in prolonged arguments with clerics, mathematicians, scientists and philosophers, sometimes to the cost of his intellectual reputation. (For instance, he argued repeatedly that it is possible to square the circle. It is no accident that the phrase is now proverbial for a problem that cannot be solved!) His writing was as undaunted by age and ill health as it was by the events of his times. Though his health slowly failed (from about sixty, he began to suffer shaking palsy, probably Parkinson’s disease, which steadily worsened), even in his eighties he continued to dictate his thoughts to a secretary, and to defend his quarter in various controversies.

Hobbes gained a reputation in many fields. He was known as a scientist (especially in optics), as a mathematician (especially in geometry), as a translator of the classics, as a writer on law, as a disputant in metaphysics and epistemology; not least, he became notorious for his writings and disputes on religious questions. But it is for his writings on morality and politics that he has, rightly, been most remembered. Without these, scholars might remember Hobbes as an interesting intellectual of the seventeenth century; but few philosophers would even recognize his name.

What are the writings that earned Hobbes his philosophical fame? The first was entitled The Elements of Law (1640); this was Hobbes’s attempt to provide arguments supporting the King against his challengers. De Cive [On the Citizen] (1642) has much in common with Elements, and offers a clear, concise statement of Hobbes’s moral and political philosophy. His most famous work is Leviathan, a classic of English prose (1651; a slightly altered Latin edition appeared in 1668). Leviathan expands on the argument of De Cive, mostly in terms of its huge second half that deals with questions of religion. Other important works include: De Corpore [On the Body] (1655), which deals with questions of metaphysics; De Homine [On Man] (1658); and Behemoth (published 1682, though written rather earlier), in which Hobbes gives his account of England’s Civil Wars. But to understand the essentials of Hobbes’s ideas and system, one can rely on De Cive and Leviathan. It is also worth noting that, although Leviathan is more famous and more often read, De Cive actually gives a much more straightforward account of Hobbes’s ideas. Readers whose main interest is in those ideas may wish to skip the next section and go straight to ethics and human nature.

3. Two Intellectual Influences

As well as the political background just stressed, two influences are extremely marked in Hobbes’s work. The first is a reaction against religious authority as it had been known, and especially against the scholastic philosophy that accepted and defended such authority. The second is a deep admiration for (and involvement in) the emerging scientific method, alongside an admiration for a much older discipline, geometry. Both influences affected how Hobbes expressed his moral and political ideas. In some areas it is also clear that they significantly affected the ideas themselves.

Hobbes’s contempt for scholastic philosophy is boundless. Leviathan and other works are littered with references to the “frequency of insignificant speech” in the speculations of the scholastics, with their combinations of Christian theology and Aristotelian metaphysics. Hobbes’s reaction, apart from much savage and sparkling sarcasm, is twofold. In the first place, he makes very strong claims about the proper relation between religion and politics. He was not (as many have charged) an atheist, but he was deadly serious in insisting that theological disputes should be kept out of politics. (He also adopts a strongly materialist metaphysics, that—as his critics were quick to charge—makes it difficult to account for God’s existence as a spiritual entity.) For Hobbes, the sovereign should determine the proper forms of religious worship, and citizens never have duties to God that override their duty to obey political authority. Second, this reaction against scholasticism shapes the presentation of Hobbes’s own ideas. He insists that terms be clearly defined and relate to actual concrete experiences—part of his empiricism. (Many early sections of Leviathan read rather like a dictionary.) Commentators debate how seriously to take Hobbes’s stress on the importance of definition, and whether it embodies a definite philosophical doctrine. What is certain, and more important from the point of view of his moral and political thought, is that he tries extremely hard to avoid any metaphysical categories that do not relate to physical realities (especially the mechanical realities of matter and motion). Commentators further disagree whether Hobbes’s often mechanical sounding definitions of human nature and human behavior are actually important in shaping his moral and political ideas—see Materialism versus self-knowledge below.

Hobbes’s determination to avoid the “insignificant” (that is, meaningless) speech of the scholastics also overlaps with his admiration for the emerging physical sciences and for geometry. His admiration is not so much for the emerging method of experimental science, but rather for deductive science—science that deduces the workings of things from basic first principles and from true definitions of the basic elements. Hobbes therefore approves a mechanistic view of science and knowledge, one that models itself very much on the clarity and deductive power exhibited in proofs in geometry. It is fair to say that this a priori account of science has found little favor after Hobbes’s time. It looks rather like a dead-end on the way to the modern idea of science based on patient observation, theory-building and experiment. Nonetheless, it certainly provided Hobbes with a method that he follows in setting out his ideas about human nature and politics. As presented in Leviathan, especially, Hobbes seems to build from first elements of human perception and reasoning, up to a picture of human motivation and action, to a deduction of the possible forms of political relations and their relative desirability. Once more, it can be disputed whether this method is significant in shaping those ideas, or merely provides Hobbes with a distinctive way of presenting them.

4. Ethics and Human Nature

Hobbes’s moral thought is difficult to disentangle from his politics. On his view, what we ought to do depends greatly on the situation in which we find ourselves. Where political authority is lacking (as in his famous natural condition of mankind), our fundamental right seems to be to save our skins, by whatever means we think fit. Where political authority exists, our duty seems to be quite straightforward: to obey those in power.

But we can usefully separate the ethics from the politics if we follow Hobbes’s own division. For him ethics is concerned with human nature, while political philosophy deals with what happens when human beings interact. What, then, is Hobbes’s view of human nature?

a. Materialism Versus Self-Knowledge

Reading the opening chapters of Leviathan is a confusing business, and the reason for this is already apparent in Hobbes’s very short Introduction. He begins by telling us that the human body is like a machine, and that political organization (the commonwealth) is like an artificial human being. He ends by saying that the truth of his ideas can be gauged only by self-examination, by looking into our selves to adjudge our characteristic thoughts and passions, which form the basis of all human action. But what is the relationship between these two very different claims? For obviously when we look into our selves we do not see mechanical pushes and pulls. This mystery is hardly answered by Hobbes’s method in the opening chapters, where he persists in talking about all manner of psychological phenomena (from emotions to thoughts to whole trains of reasoning) as products of mechanical interactions. (As to what he will say about successful political organization, the resemblance between the commonwealth and a functioning human being is slim indeed. Hobbes’s only real point seems to be that there should be a head that decides most of the important things that the body does.)

Most commentators now agree with an argument made in the 1960’s by the political philosopher Leo Strauss. Hobbes draws on his notion of a mechanistic science, that works deductively from first principles, in setting out his ideas about human nature. Science provides him with a distinctive method and some memorable metaphors and similes. What it does not provide—nor could it, given the rudimentary state of physiology and psychology in Hobbes’s day—are any decisive or substantive ideas about what human nature really is. Those ideas may have come, as Hobbes also claims, from self-examination. In all likelihood, they actually derived from his reflection on contemporary events and his reading of classics of political history such as Thucydides.

This is not to say that we should ignore Hobbes’s ideas on human nature; far from it. But it does mean we should not be misled by scientific imagery that stems from a science that did not in fact exist (and also, to some extent, from an unproven and uncertain metaphysics). The point is important mainly when it comes to a central interpretative question in Hobbes’s work: whether or not he thinks of human beings as mechanical objects, programmed as it were to pursue their self-interest. Some have suggested that Hobbes’s mechanical world-view leaves no room for the influence of moral ideas, that he thinks the only effective influence on our behavior will be incentives of pleasure and pain. But while it is true that Hobbes sometimes says things like this, we should be clear that the ideas fit together only in a metaphorical way. For example, there is no reason why moral ideas should not “get into” the mechanisms that drive us round (like so many clock-work dolls perhaps?). Likewise, there is no reason why pursuing pleasure and avoiding pain should work in our self-interest. (What self-interest is depends on the time-scale we adopt, and how effectively we might achieve this goal also depends on our insight into what harms and benefits us.) If we want to know what drives human beings, on Hobbes’s view, we must read carefully all he says about this, as well as what he needs to assume if the rest of his thought is to make sense. The mechanistic metaphor is something of a red herring and, in the end, probably less useful than his other starting point in Leviathan, the Delphic maxim: nosce teipsum (know thyself).

b. The Poverty of Human Judgment and our Need for Science

There are two major aspects to Hobbes’s picture of human nature. As we have seen, and will explore below, what motivates human beings to act is extremely important to Hobbes. The other aspect concerns human powers of judgment and reasoning, about which Hobbes tends to be extremely skeptical. Like many philosophers before him, Hobbes wants to present a more solid and certain account of human morality than is contained in everyday beliefs. Plato had contrasted knowledge with opinion. Hobbes contrasts science with a whole raft of less reliable forms of belief—from probable inference based on experience, right down to “absurdity, to which no living creature is subject but man” (Leviathan, v.7).

Hobbes has several reasons for thinking that human judgment is unreliable, and needs to be guided by science. Our judgments tend to be distorted by self-interest or by the pleasures and pains of the moment. We may share the same basic passions, but the various things of the world affect us all very differently; and we are inclined to use our feelings as measures for others. Our judgment becomes dogmatic through vanity and morality, as with “men vehemently in love with their own new opinions…and obstinately bent to maintain them, [who give] their opinions also that reverenced name of conscience” (Leviathan, vii.4). When we use words which lack any real objects of reference, or are unclear about the meaning of the words we use, the danger is not only that our thoughts will be meaningless, but also that we will fall into violent dispute. (Hobbes has scholastic philosophy in mind, but he also makes related points about the dangerous effects of faulty political ideas and ideologies.) We form beliefs about supernatural entities, fairies and spirits and so on, and fear follows where belief has gone, further distorting our judgment. Judgment can be swayed this way and that by rhetoric, that is, by the persuasive and “colored speech” of others, who can deliberately deceive us and may well have purposes that go against the common good or indeed our own good. Not least, much judgment is concerned with what we should do now, that is, with future events, “the future being but a fiction of the mind” (Leviathan, iii.7) and therefore not reliably known to us.

For Hobbes, it is only science, “the knowledge of consequences” (Leviathan, v.17), that offers reliable knowledge of the future and overcomes the frailties of human judgment. Unfortunately, his picture of science, based on crudely mechanistic premises and developed through deductive demonstrations, is not even plausible in the physical sciences. When it comes to the complexities of human behavior, Hobbes’s model of science is even less satisfactory. He is certainly an acute and wise commentator on political affairs; we can praise him for his hard-headedness about the realities of human conduct, and for his determination to create solid chains of logical reasoning. Nonetheless, this does not mean that Hobbes was able to reach a level of scientific certainty in his judgments that had been lacking in all previous reflection on morals and politics.

c. Motivation

The most consequential aspect of Hobbes’s account of human nature centers on his ideas about human motivation, and this topic is therefore at the heart of many debates about how to understand Hobbes’s philosophy. Many interpreters have presented the Hobbesian agent as a self-interested, rationally calculating actor (those ideas have been important in modern political philosophy and economic thought, especially in terms of rational choice theories). It is true that some of the problems that face people like this—rational egoists, as philosophers call them—are similar to the problems Hobbes wants to solve in his political philosophy. And it is also very common for first-time readers of Hobbes to get the impression that he believes we are all basically selfish.

There are good reasons why earlier interpreters and new readers tend to think the Hobbesian agent is ultimately self-interested. Hobbes likes to make bold and even shocking claims to get his point across. “I obtained two absolutely certain postulates of human nature,” he says, “one, the postulate of human greed by which each man insists upon his own private use of common property; the other, the postulate of natural reason, by which each man strives to avoid violent death” (De Cive, Epistle Dedicatory). What could be clearer?—We want all we can get, and we certainly want to avoid death. There are two problems with thinking that this is Hobbes’s considered view, however. First, quite simply, it represents a false view of human nature. People do all sorts of altruistic things that go against their interests. They also do all sorts of needlessly cruel things that go against self-interest (think of the self-defeating lengths that revenge can run to). So it would be uncharitable to interpret Hobbes this way, if we can find a more plausible account in his work. Second, in any case Hobbes often relies on a more sophisticated view of human nature. He describes or even relies on motives that go beyond or against self-interest, such as pity, a sense of honor or courage, and so on. And he frequently emphasizes that we find it difficult to judge or appreciate just what our interests are anyhow. (Some also suggest that Hobbes’s views on the matter shifted away from egoism after De Cive, but the point is not crucial here.)

The upshot is that Hobbes does not think that we are basically or reliably selfish; and he does not think we are fundamentally or reliably rational in our ideas about what is in our interests. He is rarely surprised to find human beings doing things that go against self-interest: we will cut off our noses to spite our faces, we will torture others for their eternal salvation, we will charge to our deaths for love of country. In fact, a lot of the problems that befall human beings, according to Hobbes, result from their being too little concerned with self-interest. Too often, he thinks, we are too much concerned with what others think of us, or inflamed by religious doctrine, or carried away by others’ inflammatory words. This weakness as regards our self-interest has even led some to think that Hobbes is advocating a theory known as ethical egoism. This is to claim that Hobbes bases morality upon self-interest, claiming that we ought to do what it is most in our interest to do. But we shall see that this would over-simplify the conclusions that Hobbes draws from his account of human nature.

d. Political Philosophy

This is Hobbes’s picture of human nature. We are needy and vulnerable. We are easily led astray in our attempts to know the world around us. Our capacity to reason is as fragile as our capacity to know; it relies upon language and is prone to error and undue influence. When we act, we may do so selfishly or impulsively or in ignorance, on the basis of faulty reasoning or bad theology or others’ emotive speech.

What is the political fate of this rather pathetic sounding creature—that is, of us? Unsurprisingly, Hobbes thinks little happiness can be expected of our lives together. The best we can hope for is peaceful life under an authoritarian-sounding sovereign. The worst, on Hobbes’s account, is what he calls the natural condition of mankind, a state of violence, insecurity and constant threat. In outline, Hobbes’s argument is that the alternative to government is a situation no one could reasonably wish for, and that any attempt to make government accountable to the people must undermine it, so threatening the situation of non-government that we must all wish to avoid. Our only reasonable option, therefore, is a “sovereign” authority that is totally unaccountable to its subjects. Let us deal with the “natural condition” of non-government, also called the “state of nature,” first of all.

5. The Natural Condition of Mankind

The state of nature is “natural” in one specific sense only. For Hobbes political authority is artificial: in the “natural” condition human beings lack government, which is an authority created by men. What is Hobbes’s reasoning here? He claims that the only authority that naturally exists among human beings is that of a mother over her child, because the child is so very much weaker than the mother (and indebted to her for its survival). Among adult human beings this is invariably not the case. Hobbes concedes an obvious objection, admitting that some of us are much stronger than others. And although he is very sarcastic about the idea that some are wiser than others, he does not have much difficulty with the idea that some are fools and others are dangerously cunning. Nonetheless, it is almost invariably true that every human being is capable of killing any other. “Even the strongest must sleep; even the weakest might persuade others to help him kill another” (Leviathan, xiii.1-2). Because adults are equal in this capacity to threaten one another’s lives, Hobbes claims there is no natural source of authority to order their lives together. (He is strongly opposing arguments that established monarchs have a natural or God-given right to rule over us.)

Thus, as long as human beings have not successfully arranged some form of government, they live in Hobbes’s state of nature. Such a condition might occur at the “beginning of time” (see Hobbes’s comments on Cain and Abel, Leviathan, xiii.11, Latin version only), or in “primitive” societies (Hobbes thought the American Indians lived in such a condition). But the real point for Hobbes is that a state of nature could just as well occur in seventeenth-century England, should the King’s authority be successfully undermined. It could occur tomorrow in every modern society, for example, if the police and army suddenly refused to do their jobs on behalf of government. Unless some effective authority stepped into the King’s place (or the place of army and police and government), Hobbes argues the result is doomed to be deeply awful, nothing less than a state of war.

Why should peaceful cooperation be impossible without an overarching authority? Hobbes provides a series of powerful arguments that suggest it is extremely unlikely that human beings will live in security and peaceful cooperation without government. (Anarchism, the thesis that we should live without government, of course disputes these arguments.) His most basic argument is threefold (Leviathan, xiii.3-9). (i) He thinks we will compete, violently compete, to secure the basic necessities of life and perhaps to make other material gains. (ii) He argues that we will challenge others and fight out of fear (“diffidence”), so as to ensure our personal safety. (iii) And he believes that we will seek reputation (“glory”), both for its own sake and for its protective effects (for example, so that others will be afraid to challenge us).

This is a more difficult argument than it might seem. Hobbes does not suppose that we are all selfish, that we are all cowards, or that we are all desperately concerned with how others see us. Two points, though. First, he does think that some of us are selfish, some of us cowardly, and some of us “vainglorious” (perhaps some people are all of these!). Moreover, many of these people will be prepared to use violence to attain their ends, especially if there is no government or police to stop them. In this Hobbes is surely correct. Second, in some situations it makes good sense, at least in the short term, to use violence and to behave selfishly, fearfully or vaingloriously. If our lives seem to be at stake, after all, we are unlikely to have many scruples about stealing a loaf of bread; if we perceive someone as a deadly threat, we may well want to attack first, while his guard is down; if we think that there are lots of potential attackers out there, it is going to make perfect sense to get a reputation as someone who should not be messed with. In Hobbes’s words, “the wickedness of bad men also compels good men to have recourse, for their own protection, to the virtues of war, which are violence and fraud” (De Cive, Epistle Dedicatory). As well as being more complex than first appears, Hobbes’s argument becomes very difficult to refute.

Underlying this most basic argument is an important consideration about insecurity. As we shall see, Hobbes places great weight on contracts (thus some interpreters see Hobbes as heralding a market society dominated by contractual exchanges). In particular, he often speaks of “covenants,” by which he means a contract where one party performs his part of the bargain later than the other. In the state of nature such agreements are not going to work. Only the weakest will have good reason to perform the second part of a covenant, and then only if the stronger party is standing over them. Yet a huge amount of human cooperation relies on trust that others will return their part of the bargain over time. A similar point can be made about property, most of which we cannot carry about with us and watch over. This means we must rely on others respecting our possessions over extended periods of time. If we cannot do this, then many of the achievements of human society that involve putting hard work into land (farming, building) or material objects (the crafts, or modern industrial production, still unknown in Hobbes’s time) will be near impossible.

One can reasonably object to such points: Surely there are basic duties to reciprocate fairly and to behave in a trustworthy manner? Even if there is no government providing a framework of law, judgment and punishment, do not most people have a reasonable sense of what is right and wrong, which will prevent the sort of contract-breaking and generalized insecurity that Hobbes is concerned with? Indeed, should not our basic sense of morality prevent much of the greed, pre-emptive attack and reputation-seeking that Hobbes stressed in the first place? This is the crunch point of Hobbes’s argument, and it is here (if anywhere) that one can accuse Hobbes of pessimism. He makes two claims. The first concerns our duties in the state of nature (that is, the so-called “right of nature”). The second follows from this, and is less often noticed: it concerns the danger posed by our different and variable judgments of what is right and wrong.

On Hobbes’s view the right of nature is quite simple to define. Naturally speaking (that is, outside of civil society), we have a right to do whatever we think will ensure our self-preservation. The worst that can happen to us is violent death at the hands of others. If we have any rights at all, if (as we might put it) nature has given us any rights whatsoever, then the first is surely this: the right to prevent violent death befalling us. But Hobbes says more than this, and it is this point that makes his argument so powerful. We do not just have a right to ensure our self-preservation: we each have a right to judge what will ensure our self-preservation. And this is where Hobbes’s picture of humankind becomes important. Hobbes has given us good reasons to think that human beings rarely judge wisely. Yet in the state of nature no one is in a position to successfully define what is good judgment. If I judge that killing you is a sensible or even necessary move to safeguard my life, then, in Hobbes’s state of nature, I have a right to kill you. Others might judge the matter differently, of course. Almost certainly you will have quite a different view of things (perhaps you were just stretching your arms, not raising a musket to shoot me). Because we are all insecure, because trust is more-or-less absent, there is little chance of our sorting out misunderstandings peacefully, nor can we rely on some (trusted) third party to decide whose judgment is right. We all have to be judges in our own causes, and the stakes are very high indeed: life or death.

For this reason Hobbes makes very bold claims that sound totally amoral. “To this war of every man against every man,” he says, “this also is consequent [i.e., it follows]: that nothing can be unjust. The notions of right and wrong, justice and injustice have no place [in the state of nature]” (Leviathan, xiii.13). He further argues that in the state of nature we each have a right to all things, “even to one another’s body” (Leviathan, xiv.4). Hobbes is dramatizing his point, but the core is defensible. If I judge that I need such and such (an object, another person’s labor, another person’s death) to ensure my continued existence, then in the state of nature, there is no agreed authority to decide whether I’m right or wrong. New readers of Hobbes often suppose that the state of nature would be a much nicer place, if only he were to picture human beings with some basic moral ideas. But this is naïve: unless people share the same moral ideas, not just at the level of general principles but also at the level of individual judgment, then the challenge he poses remains unsolved: human beings who lack some shared authority are almost certain to fall into dangerous and deadly conflict.

There are different ways of interpreting Hobbes’s view of the absence of moral constraints in the state of nature. Some think that Hobbes is imagining human beings who have no idea of social interaction and therefore no ideas about right and wrong. In this case, the natural condition would be a purely theoretical construction, and would demonstrate what both government and society do for human beings. (A famous statement about the state of nature in De Cive (viii.1) might support this interpretation: “looking at men as if they had just emerged from the earth like mushrooms and grown up without any obligation to each other…”) Another, complementary view reads Hobbes as a psychological egoist, so that, in the state of nature as elsewhere, he is merely describing the interaction of ultimately selfish and amoral human beings.

Others suppose that Hobbes has a much more complex picture of human motivation, so that there is no reason to think moral ideas are absent in the state of nature. In particular, it is historically reasonable to think that Hobbes invariably has civil war in mind, when he describes our “natural condition.” If we think of civil war, we need to imagine people who have lived together and indeed still do live together: huddled together in fear in their houses, banded together as armies or guerrillas or groups of looters. The problem here is not a lack of moral ideas (far from it), but rather that moral ideas and judgments differ enormously. This means (for example) that two people who are fighting tooth and nail over a cow or a gun can both think they are perfectly entitled to the object and both think they are perfectly right to kill the other, a point Hobbes makes explicitly and often. It also enables us to see that many Hobbesian conflicts are about religious ideas or political ideals (as well as self-preservation and so on), as in the British Civil War raging while Hobbes wrote Leviathan, and in the many violent sectarian conflicts throughout the world today.

In the end, though, whatever account of the state of nature and its (a)morality we attribute to Hobbes, we must remember that it is meant to function as a powerful and decisive threat: if we do not heed Hobbes’s teachings and fail to respect existing political authority, then the natural condition and its horrors of war await us.

a. The Laws of Nature and the Social Contract

Hobbes thinks the state of nature is something we ought to avoid, at any cost except our own self-preservation (this being our “right of nature,” as we saw above). But what sort of ought is this? There are two basic ways of interpreting Hobbes here. It might be a counsel of prudence: avoid the state of nature, if you’re concerned to avoid violent death. In this case Hobbes’s advice only applies to us (i) if we agree that violent death is what we should fear most and should therefore avoid; and (ii) if we agree with Hobbes that only an unaccountable sovereign stands between human beings and the state of nature. This line of thought fits well with an egoistic reading of Hobbes, but it faces serious problems, as will be seen.

The other way of interpreting Hobbes is not without problems either. This takes Hobbes to be saying that we ought, morally speaking, to avoid the state of nature. We have a duty to do what we can to avoid this situation arising, and a duty to end it, if at all possible. Hobbes often makes his view clear, that we have such moral obligations. But then two difficult questions arise: Why these obligations? And why are they obligatory?

Hobbes frames the issues in terms of an older vocabulary, using the idea of natural law that many ancient and medieval philosophers had relied on. Like them, he thinks that human reason can discern some eternal principles to govern our conduct. These principles are independent of (though also complementary to) whatever moral instruction we might get from God or religion. In other words, they are laws given by nature rather than revealed by God. But Hobbes makes radical changes to the content of these so-called laws of nature. In particular, he does not think that natural law provides any scope whatsoever to criticize or disobey the actual laws made by a government. He thus disagrees with those Protestants who thought that religious conscience might sanction disobedience of immoral laws, and with Catholics who thought that the commandments of the Pope have primacy over those of national political authorities.

Although he sets out nineteen laws of nature, it is the first two that are politically crucial. A third, which stresses the importance of keeping to contracts we have entered into, is important in Hobbes’s moral justifications of obedience to the sovereign. (The remaining sixteen can be quite simply encapsulated in the formula, do as you would be done by. While the details are important for scholars of Hobbes, they do not affect the overall theory and will be ignored here.)

The first law reads as follows:

Every man ought to endeavor peace, as far as he has hope of obtaining it, and when he cannot obtain it, that he may seek and use all helps and advantages of war. (Leviathan, xiv.4)

This repeats the points we have already seen about our right of nature, so long as peace does not appear to be a realistic prospect. The second law of nature is more complicated:

That a man be willing, when others are so too, as far-forth as for peace and defense of himself he shall think it necessary, to lay down this right to all things, and be contented with so much liberty against other men, as he would allow other men against himself. (Leviathan, xiv.5)

What Hobbes tries to tackle here is the transition from the state of nature to civil society. But how he does this is misleading and has generated much confusion and disagreement. The way that Hobbes describes this second law of nature makes it look as if we should all put down our weapons, give up (much of) our “right of nature,” and jointly authorize a sovereign who will tell us what is permitted and punish us if we do not obey. But the problem is obvious. If the state of nature is anything like as bad as Hobbes has argued, then there is just no way people could ever make an agreement like this or put it into practice.

At the end of Leviathan, Hobbes seems to concede this point, saying “there is scarce a commonwealth in the world whose beginnings can in conscience be justified” (Review and Conclusion, 8). That is: governments have invariably been foisted upon people by force and fraud, not by collective agreement. But Hobbes means to defend every existing government that is powerful enough to secure peace among its subjects—not just a mythical government that’s been created by a peaceful contract out of a state of nature. His basic claim is that we should behave as if we had voluntarily entered into such a contract with everyone else in our society—everyone else, that is, except the sovereign authority.

In Hobbes’s myth of the social contract, everyone except the person or group who will wield sovereign power lays down their “right to all things.” They agree to limit drastically their right of nature, retaining only a right to defend their lives in case of immediate threat. (How limited this right of nature becomes in civil society has caused much dispute, because deciding what is an immediate threat is a question of judgment. It certainly permits us to fight back if the sovereign tries to kill us. But what if the sovereign conscripts us as soldiers? What if the sovereign looks weak and we doubt whether he can continue to secure peace…?) The sovereign, however, retains his (or her, or their) right of nature, which we have seen is effectively a right to all things: to decide what everyone else should do, to decide the rules of property, to judge disputes and so on. Hobbes concedes that there are moral limits on what sovereigns should do (God might call a sovereign to account). However, since in any case of dispute the sovereign is the only rightful judge (on this earth, that is), those moral limits make no practical difference. In every moral and political matter, the decisive question for Hobbes is always: who is to judge? As we have seen, in the state of nature, each of us is judge in our own cause, part of the reason why Hobbes thinks it is inevitably a state of war. Once civil society exists, the only rightful judge is the sovereign.

b. Why Should we Obey the Sovereign?

If we had all made a voluntary contract, a mutual promise, then it might seem half-way plausible to think we have an obligation to obey the sovereign (although even this requires the claim that promising is a moral value that overrides all others). If we have been conquered or, more fortunately, have simply been born into a society with an established political authority, this seems quite improbable. Hobbes has to make three steps here, all of which have seemed weak to many of his readers. First of all, he insists that promises made under threat of violence are nonetheless freely made, and just as binding as any others. Second, he has to put great weight on the moral value of promise keeping, which hardly fits with the absence of duties in the state of nature. Third, he has to give a story of how those of us born and raised in a political society have made some sort of implied promise to each other to obey, or at least, he has to show that we are bound (either morally or out of self-interest) to behave as if we had made such a promise.

In the first place, Hobbes draws on his mechanistic picture of the world, to suggest that threats of force do not deprive us of liberty. Liberty, he says, is freedom of motion, and I am free to move whichever way I wish, unless I am literally enchained. If I yield to threats of violence, that is my choice, for physically I could have done otherwise. If I obey the sovereign for fear of punishment or in fear of the state of nature, then that is equally my choice. Such obedience then comes, for Hobbes, to constitute a promise that I will continue to obey.

Second, promises carry a huge moral weight for Hobbes, as they do in all social contract theories. The question, however, is why we should think they are so important. Why should my (coerced) promise oblige me, given the wrong you committed in threatening me and demanding my valuables? Hobbes has no good answer to this question (but see below, on egoistic interpretations of Hobbes’s thinking here). His theory suggests that (in the state of nature) you could do me no wrong, as the right of nature dictates that we all have a right to all things. Likewise, promises do not oblige in the state of nature, inasmuch as they go against our right of nature. In civil society, the sovereign’s laws dictate what is right and wrong; if your threat was wrongful, then my promise will not bind me. But as the sovereign is outside of the original contract, he sets the terms for everyone else: so his threats create obligations.

As this suggests, Hobbesian promises are strangely fragile. Implausibly binding so long as a sovereign exists to adjudicate and enforce them, they lose all power should things revert to a state of nature. Relatedly, they seem to contain not one jot of loyalty. To be logically consistent, Hobbes needs to be politically implausible. Now there are passages where Hobbes sacrifices consistency for plausibility, arguing we have a duty to fight for our (former) sovereign even in the midst of civil war. Nonetheless the logic of his theory suggests that, as soon as government starts to weaken and disorder sets in, our duty of obedience lapses. That is, when the sovereign power needs our support, because it is no longer able to coerce us, there is no effective judge or enforcer of covenants, so that such promises no longer override our right of nature. This turns common sense on its head. Surely a powerful government can afford to be challenged, for instance by civil disobedience or conscientious objection? But when civil conflict and the state of nature threaten, in other words when government is failing, then we might reasonably think that political unity is as morally important as Hobbes always suggests. A similar question of loyalty also comes up when the sovereign power has been usurped—when Cromwell has supplanted the King, when a foreign invader has ousted our government. Right from the start, Hobbes’s critics saw that his theory makes turncoats into moral heroes: our allegiance belongs to whoever happens to be holding the gun(s). Perversely, the only crime the makers of a coup can commit is to fail.

Why does this problem come about? To overcome the fact that his contract is a fiction, Hobbes is driven to construct a “sort of” promise out of the fact of our subjugation to whatever political authority exists. He stays wedded to the idea that obedience can only find a moral basis in a “voluntary” promise, because only this seems to justify the almost unlimited obedience and renunciation of individual judgment he is determined to prove. It is no surprise that Hobbes’s arguments creak at every point: nothing could bear the weight of justifying such an overriding duty.

All the difficulties in finding a reliable moral obligation to obey might tempt us back to the idea that Hobbes is some sort of egoist. However, the difficulties with this tack are even greater. There are two sorts of egoism commentators have attributed to Hobbes: psychological and ethical. The first theory says that human beings always act egoistically, the second that they ought to act egoistically. Either view might support this simple idea: we should obey the sovereign, because his political authority is what keeps us from the evils of the natural condition. But the basic problem with such egoistic interpretations, from the point of view of Hobbes’s system of politics, is shown when we think about cases where selfishness seems to conflict with the commands of the sovereign (for example, where illegal conduct will benefit us or keep us from danger). For a psychologically egoist agent, such behavior will be irresistible; for an ethically egoist agent, it will be morally obligatory. Now, provided the sovereign is sufficiently powerful and well-informed, he can prevent many such cases arising by threatening and enforcing punishments of those who disobey. Effective threats of punishment mean that obedience is in our self-interest. But such threats will not be effective when we think our disobedience can go undetected. After Orwell’s 1984 we can imagine a state that is so powerful that no reasonable person would ever think disobedience could pay. But for Hobbes, such a powerful sovereign was not even conceivable: he would have had to assume that there would be many situations where people could reasonably hope to “get away with it.” (Likewise, under non-totalitarian, liberal politics, there are many situations where illegal behavior is very unlikely to be detected or punished.) So, still thinking of egoistic agents, the more people do get away with it, the more reason others have to think they can do the same. Thus the problem of disobedience threatens to “snowball,” undermining the sovereign and plunging selfish agents back into the chaos of the state of nature.

In other words, sovereignty as Hobbes imagined it, and liberal political authority as we know it, can only function where people feel some additional motivation apart from pure self-interest. Moreover, there is strong evidence that Hobbes was well aware of this. Part of Hobbes’s interest in religion (a topic that occupies half of Leviathan) lies in its power to shape human conduct. Sometimes this does seem to work through self-interest, as in crude threats of damnation and hell-fire. But Hobbes’s main interest lies in the educative power of religion, and indeed of political authority. Religious practices, the doctrines taught in the universities (!), the beliefs and habits inculcated by the institutions of government and society: how these can encourage and secure respect for law and authority seem to be even more important to Hobbes’s political solutions than his theoretical social contract or shaky appeals to simple self-interest.

What are we to conclude, then, given the difficulties in finding a reliable moral or selfish justification for obedience? In the end, for Hobbes, everything rides on the value of peace. Hobbes wants to say both that civil order is in our “enlightened” self-interest, and that it is of overwhelming moral value. Life is never going to be perfect for us, and life under the sovereign is the best we can do. Recognizing this aspect of everyone’s self-interest should lead us to recognize the moral value of supporting whatever authority we happen to live under. For Hobbes, this moral value is so great, and the alternatives so stark, that it should override every threat to our self-interest except the imminent danger of death. The million-dollar question is then: is a life of obedience to the sovereign really the best human beings can hope for?

c. Life Under the Sovereign

Hobbes has definite ideas about the proper nature, scope and exercise of sovereignty. Much that he says is cogent, and much of it can reduce the worries we might have about living under this drastically authoritarian sounding regime. Many commentators have stressed, for example, the importance Hobbes places upon the rule of law. His claim that much of our freedom, in civil society, “depends on the silence of the laws” is often quoted (Leviathan, xxi.18). In addition, Hobbes makes many points that are obviously aimed at contemporary debates about the rights of King and Parliament—especially about the sovereign’s rights as regards taxation and the seizure of property, and about the proper relation between religion and politics. Some of these points continue to be relevant, others are obviously anachronistic: evidently Hobbes could not have imagined the modern state, with its vast bureaucracies, massive welfare provision and complicated interfaces with society. Nor could he have foreseen how incredibly powerful the state might become, meaning that sovereigns such as Hitler or Stalin might starve, brutalize and kill their subjects, to such an extent that the state of nature looks clearly preferable.

However, the problem with all of Hobbes’s notions about sovereignty is that, on his account, it is not Hobbes the philosopher, nor we the citizens, who decide what counts as the proper nature, scope or exercise of sovereignty. He faces a systematic problem: justifying any limits or constraints on the sovereign involves making judgments about moral or practical requirements. But one of his greatest insights, still little recognized by many moral philosophers, is that any right or entitlement is only practically meaningful when combined with a concrete judgment as to what it dictates in some given case. Hobbes’s own failure, however understandable, to foresee the growth of government and its powers only supports this thought: that the proper nature, scope or exercise of sovereignty is a matter of complex judgment. Alone among the people who make up Hobbes’s commonwealth, it is the sovereign who judges what form he should appear in, how far he should reach into the lives of his subjects, and how he should exercise his powers.

It should be added that the one part of his system that Hobbes concedes not to be proven with certainty is just this question: who or what should constitute the sovereign power. It was natural for Hobbes to think of a King, or indeed a Queen (he was born under Elizabeth I). But he was certainly very familiar with ancient forms of government, including aristocracy (government by an elite) and democracy (government by the citizens, who formed a relatively small group within the total population). Hobbes was also aware that an assembly such as Parliament could constitute a sovereign body. All have advantages and disadvantages, he argues. But the unity that comes about from having a single person at the apex, together with fixed rules of succession that pre-empt dispute about who this person should be, makes monarchy Hobbes’s preferred option.

In fact, if we want to crack open Hobbes’s sovereign, to be able to lay down concrete ideas about its nature and limits, we must begin with the question of judgment. For Hobbes, dividing capacities to judge between different bodies is tantamount to letting the state of nature straight back in. “For what is it to divide the power of a commonwealth, but to dissolve it; for powers divided mutually destroy each other” (Leviathan, xxix.12; cf. De Cive, xii.5). Beyond the example of England in the 1640s, Hobbes hardly bothers to argue the point, although it is crucial to his entire theory. Always in his mind is the Civil War that arose when Parliament claimed the right to judge rules of taxation, and thereby prevented the King from ruling and making war as he saw fit, and when churches and religious sects claimed prerogatives that went against the King’s decisions.

Especially given modern experiences of the division of powers, however, it is easy to see that these examples are extreme and atypical. We might recall the American constitution, where powers of legislation, execution and case-by-case judgment are separated (to Congress, President and the judiciary respectively) and counter-balance one another. Each of these bodies is responsible for judging different questions. There are often, of course, boundary disputes, as to whether legislative, executive or judicial powers should apply to a given issue, and no one body is empowered to settle this crucial question of judgment. Equally obviously, however, such disputes have not led to a state of nature (well, at least if we think of the US after the Civil War). For Hobbes it is simply axiomatic that disputation as to who should judge important social and political issues spells the end of the commonwealth. For us, it is equally obvious that only a few extreme forms of dispute have this very dangerous power. Dividing the powers that are important to government need not leave a society more open to those dangerous conflicts. Indeed, many would now argue that political compromises which provide different groups and bodies with independent space to judge certain social or political issues can be crucial for preventing disputes from escalating into violent conflict or civil war.

6. Conclusion

What happens, then, if we do not follow Hobbes in his arguments that judgment must, by necessity or by social contract or both, be the sole province of the sovereign? If we are optimists about the power of human judgment, and about the extent of moral consensus among human beings, we have a straightforward route to the concerns of modern liberalism. Our attention will not be on the question of social and political order, rather on how to maximize liberty, how to define social justice, how to draw the limits of government power, and how to realize democratic ideals. We will probably interpret Hobbes as a psychological egoist, and think that the problems of political order that obsessed him were the product of an unrealistic view of human nature, or unfortunate historical circumstances, or both. In this case, I suggest, we might as well not have read Hobbes at all.

If we are less optimistic about human judgment in morals and politics, however, we should not doubt that Hobbes’s problems remain our problems. But hindsight shows grave limitations to his solutions. Theoretically, Hobbes fails to prove that we have an almost unlimited obligation to obey the sovereign. His arguments that sovereignty—the power to judge moral and political matters, and enforce those judgments—cannot be divided are not only weak; they are simply refuted by the (relatively) successful distribution of powers in modern liberal societies. Not least, the horrific crimes of twentieth century dictatorships show beyond doubt that judgment about right and wrong cannot be a question only for our political leaders.

If Hobbes’s problems are real and his solutions only partly convincing, where will we go? It might reasonably be thought that this is the central question of modern political thought. We will have no doubt that peaceful coexistence is one of the greatest goods of human life, something worth many inconveniences, sacrifices and compromises. We will see that there is moral force behind the laws and requirements of the state, simply because human beings do indeed need authority and systems of enforcement if they are to cooperate peacefully. But we can hardly accept that, because human judgment is weak and faulty, there can be only one judge of these matters, precisely because that judge might turn out to be very faulty indeed. Our concern will be how we can effectively divide power between government and people, while still ensuring that important questions of moral and political judgment are peacefully adjudicated. We will be concerned with the standards and institutions that provide for compromise between many different and conflicting judgments. And all the time, we will remember Hobbes’s reminder that human life is never without inconvenience and troubles, that we must live with a certain amount of bad, to prevent the worst: fear of violence, and violent death.

7. References and Further Reading

  • Edwards, Alistair (2002) “Hobbes” in Interpreting Modern Political Philosophy: From Machiavelli to Marx, eds. A Edwards and J Townshend (Palgrave Macmillan, Houndmills)
    • A very helpful overview of key interpretative debates about Hobbes in the twentieth century.
  • Hill, Christopher (1961/1980) The Century of Revolution, 1603-1714, second ed (Routledge, London)
    • The classic work on the history and repercussions of England’s civil war.
  • Hobbes, Thomas (1998 [1642]) On the Citizen, ed & trans Richard Tuck and Michael Silverthorne (Cambridge University Press, Cambridge)
    • The best translation of Hobbes’s most straightforward book, De Cive.
  • Hobbes, Thomas (1994 [1651/1668]) Leviathan, ed Edwin Curley (Hackett, Indianapolis)
    • The best edition of Hobbes’s magnum opus, including extensive additional material and many important variations (ignored by all other editions) between the English text and later Latin edition.
  • Sorell, Tom (1986) Hobbes (Routledge & Kegan Paul, London)
    • A concise and well-judged account of Hobbes’s life and works.
  • Sorell, Tom, ed (1996) The Cambridge Companion to Hobbes (Cambridge University Press, Cambridge)
    • An excellent set of essays on all aspects of Hobbes’s intellectual endeavors.

Author Information

Garrath Williams
Email: g.d.williams@lancaster.ac.uk
Lancaster University
United Kingdom

Nasir Khusraw (1004—1060)

Abu Mo’in Hamid al-Din Nasir ibn Khusraw is an important figure in the development of Ismaili philosophy. Much of what is known of his biography and philosophical outlook has been reconstructed from fragmentary texts, in both poetry and prose. Born into a politically connected family, Khusraw was well educated in the sciences and humanities. Having spent most of his life occupying prestigious positions within the Saljuq court, Khusraw converted to the Ismaili faith at the age of forty after careful study. He spent the rest of his life writing and advocating for the Ismaili faith, and eventually was forced into exile by Sunni authorities.

Consistent with other Ismaili philosophers, Khusraw’s cosmology is heavily inspired by Neoplatonism. His metaphysics describes a God from which everything emanates and toward which everything consistently strives to return. Through God, existence is cast into being through Universal Soul and Universal Intellect. Each of these concepts provides the foundation for material objects, ascending from minerals to human beings. Within each human being exists a soul and intellect, imperfect in form but existing within the Universals. Khusraw interweaves his metaphysics with the Shi’i doctrine, requiring a divinely inspired guide to assist us in our journey to reconnect with Universal Intellect and Soul. In holding to this cosmogonic description, Khusraw distinguishes his philosophy from the structure introduced by al-Farabi and picked up by Ibn Sina and the Ismaili philosopher al-Kirmani.

Table of Contents

  1. Life
  2. Philosophy
  3. References and Further Reading

1. Life

In striking contrast to other Ismaili writers of the time (s.v., Hamid al-Din al-Kirmani; Abu Ya‘qub al-Sijistani), many sources of information exist pertaining to Khusraw’s life. Documentation was recorded, with varying degrees of accuracy, by Khusraw himself, by a (hostile) contemporary, and by later historians. Since his death, Khusraw has been included in every major literary or historical survey of Ismailism. Khusraw’s life can be divided into four periods: his early years up to the age of forty (discernible from fragments of various texts); his conversion to Ismailism (of which he has left two different versions in the form of prose and poetry); his seven-year journey (documented in Safarnama); and his years of preaching followed by persecution and exile (drawn primarily from his poetry, but also a few statements in his philosophical works).

In 1004, Abu Mo’in Hamid al-Din Nasir ibn Khusraw was born in Qobadiyan, in the district of Marv, in the eastern Iranian province of Khurasan. Along with two of his brothers, Khusraw occupied a high position in the administrative ranks of the Saljuq court, reportedly in the revenue department. Evidence also suggests that he was familiar with the court of the previous dynasty, the Ghaznavids. To judge by the quality of his writings, he received an excellent education in the sciences, literatures and philosophies of his time, including the study of Greek and Neoplatonic philosophy. In his writing, Khusraw reports examining the doctrines of the different Islamic schools and not being satisfied until he found and understood the Ismaili faith. As a result of his conversion to Ismailism he embarked on a seven-year journey, during which time he spent three years in the Ismaili court in Cairo under the Fatimid caliph, al-Mustansir (1029-1094). The Fatimid dynasty (909-1171) aimed at creating an Islamic state based on Ismaili tenets, and thus presented a direct theological and military challenge to the Sunni ‘Abbasid caliphate based in Baghdad. Khusraw left Cairo as the head (hujjat) of Ismaili missionary activities in his home province of Khurasan. After leaving Cairo, Khusraw was forced into exile by the Sunni authorities. He spent the rest of his life exiled in the Pamir Mountains in Badakhshan, located in modern-day Tajikistan and Afghanistan.

2. Philosophy

Khusraw’s philosophical works reveal a strong Neoplatonic structure and vocabulary.  For example, his cosmogony closely follows Plotinus, moving from God and God’s word (logos) to Intellect, Soul, and the world of Nature.  Underlying each of the Ismaili cosmogonic systems is a fundamental division of the world into two realms, the esoteric (batin) and the exoteric (zahir).  From this division, everything in the physical world points to its counterpart in the spiritual, which is seen as its source, or true form.  The cosmogonic structure itself reveals a purposeful, providential unfolding from the spiritual realm into the physical world.  Conversely, as a reflection, the physical world seeks to grasp the spiritual realm and comprehend it.    In holding to this cosmogonic description, Khusraw follows his fellow Ismailis (Nasafi and al-Sijistani) while differentiating his theory from the structure introduced by al-Farabi and later adopted by Ibn Sina and the Ismaili philosopher al-Kirmani.

Khusraw begins with a discussion of tawhid (oneness, God’s unity), the clear understanding of which is the only way to achieve spiritual perfection. For Khusraw, God Himself is indescribable, beyond all categories of being and non-being (nothing which has an opposite can be ascribed to Him, since that would be limiting Him to human concepts). However, from God emerges his Word (kalima), ‘Be!’, which brings into existence Universal Intellect, perfect in potentiality and actuality. Universal Intellect transcends time and space, containing all being within itself. Universal Intellect enjoys a worshipful intimacy with God and derives perfection from this intimacy. From this worship emerges Universal Soul, perfect in potentiality but not in actuality because it is separated from God by Intellect. Universal Soul recognizes its separation from God, and moves closer to God in a desire for the perfection enjoyed by Intellect. Through its search for perfection, Universal Soul introduces the first movement into the entire structure, manifest in time and space.

The entire cosmos is set into motion through the movement of Universal Soul. As a corollary, being is differentiated into two sets of opposites: hot and cold, wet and dry. Derived from these sets of opposites are the four elements: earth, air, fire, and water. From these four elements arises the successive development of minerals, plants, and animals. Finally, as the summit of physical creation, human beings arise. Within each human being exists an individual intellect and individual soul manifesting the same characteristics (but on a smaller scale) as the universals. In fact, the entire cosmos is formed on a matrix of Intellect and Soul; everything within the cosmos displays original intelligence and the search for perfection exhibited by the soul.

Khusraw’s ethics grow from and reflect this cosmogony. Each individual’s task is to recognize his or her own imperfections and then move to correct them, seeking the closest relationship possible with God. For Khusraw, this is achieved by stringent and repeated application of the intellect to both physical and spiritual matters. In order to correct these imperfections a believer must find a guide and study diligently, perform all required religious acts with a full understanding, and supplement new understanding with higher levels of worldly activity. As an Ismaili, Khusraw held the Shi‘i doctrine that God would not send a revelation without a guide to interpret it. For the Ismailis, this guide must be a living person, the Imam of the Time. As a living bridge between the two realms, this person must be divinely inspired, infallible, and perfectly capable of providing guidance in spiritual and worldly affairs.

3. References and Further Reading

The following sources elucidate Khusraw’s philosophy:

  • H. Corbin, “Nasir-i Khusrau and Iranian Ismailism,” in The Cambridge History of Iran: Volume 4, ed., R. N. Frye (Cambridge 1975), pp. 520-42 and 689-90;
  • A. Hunsberger, “Nasir Khusraw: Fatimid Intellectual,” in F. Daftary, ed., Intellectual Traditions in Islam (London 2000), pp. 112-29;
  • A. Hunsberger, Nasir Khusraw’s Doctrine of the Soul: From the Universal Intellect to the Physical World in Ismaili Philosophy, PhD thesis, Columbia University, New York, 1992;
  • S. Meskoob, “The Origin and Meaning of ‘Aql (Reason) in the View of Nasir Khusraw,” Iran Nameh, 6 (1989), pp. 239-57, and 7 (1989), pp. 405-29.

For a full bibliography of Nasir Khusraw’s works and ideas, see:

  • A. C. Hunsberger, Nasir Khusraw, the Ruby of Badakhshan: A Portrait of the Persian Poet, Traveller and Philosopher (London 2000).

For works still in manuscript, see:

  • I. K. Poonawala, Biobibliography of Ismaili Literature, Malibu, Calif., 1977, p. 123.

Author Information

Alice C. Hunsberger
Email: info@iis.ac.uk
Institute of Ismaili Studies
United Kingdom

Logical Consequence

Logical consequence is arguably the central concept of logic. The primary aim of logic is to tell us what follows logically from what. In order to simplify matters we take the logical consequence relation to hold for sentences rather than for abstract propositions, facts, states of affairs, etc. Correspondingly, logical consequence is a relation between a given class of sentences and the sentences that logically follow. One sentence is said to be a logical consequence of a set of sentences if and only if, in virtue of logic alone, it is impossible for the sentences in the set to be all true without the other sentence being true as well. If sentence X is a logical consequence of a set of sentences K, then we may say that K implies or entails X, or that one may correctly infer the truth of X from the truth of the sentences in K. For example, “Kelly is not at work” is a logical consequence of “Kelly is not both at home and at work” and “Kelly is at home.” However, the sentence “Kelly is not a football fan” does not follow from “All West High School students are football fans” and “Kelly is not a West High School student.” The central question to be investigated here is: What conditions must be met in order for a sentence to be a logical consequence of others?

One popular answer derives from the work of Alfred Tarski, one of the preeminent logicians of the twentieth century, in his famous 1936 paper, “On the Concept of Logical Consequence.” Here Tarski uses his observations of the salient features of what he calls the common concept of logical consequence to guide his theoretical development of it. Accordingly, we begin by examining the common concept, focusing on Tarski’s observations of the criteria by which we intuitively judge what follows from what and which Tarski thinks must be reflected in any theory of logical consequence. Then two theoretical definitions of logical consequence are introduced: the model theoretic and the deductive theoretic definitions. They represent two major approaches to making the common concept of logical consequence more precise. The article concludes by highlighting considerations relevant to evaluating model theoretic and deductive theoretic characterizations of logical consequence.

Table of Contents

  1. Introduction
  2. The Concept of Logical Consequence
    1. Tarski’s Characterization of the Common Concept of Logical Consequence
      1. The Logical Consequence Relation Has a Modal Element
      2. The Logical Consequence Relation is Formal
      3. The Logical Consequence Relation is A Priori
    2. Logical and Non-Logical Terminology
      1. The Nature of Logical Constants Explained in Terms of Their Semantic Properties
      2. The Nature of Logical Constants Explained in Terms of Their Inferential Properties
  3. Model-Theoretic and Deductive-Theoretic Conceptions of Logic
  4. Conclusion
  5. References and Further Reading

1. Introduction

For a given language, a sentence is said to be a logical consequence of a set of sentences, if and only if, in virtue of logic alone, the sentence must be true if every sentence in the set were to be true. This corresponds to the ordinary notion of a sentence “logically following” from others. Logicians have attempted to make the ordinary concept more precise relative to a given language L by sketching a deductive system for L, or by formalizing the intended semantics for L. Any adequate precise characterization of logical consequence must reflect its salient features such as those highlighted by Alfred Tarski: (1) that the logical consequence relation is formal, that is, depends on the forms of the sentences involved, (2) that the relation is a priori, that is, it is possible to determine whether or not it holds without appeal to sense-experience, and (3) that the relation has a modal element.
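For the special case of truth-functional sentences, the definition can be tried out mechanically. The following is a minimal sketch, not a general account: it assumes that "in virtue of logic alone" may be read as "under every assignment of truth values to the atomic sentences," which is one way of making the ordinary concept precise and covers only the sentential connectives. The function name `entails` and the atom labels `home` and `work` are illustrative choices, and the sentences are the Kelly example from the opening summary.

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """Return True if no truth-value assignment to the atoms makes
    every premise true while making the conclusion false."""
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False  # found an assignment that is a counterexample
    return True

atoms = ["home", "work"]  # "Kelly is at home", "Kelly is at work"

# "Kelly is not both at home and at work" and "Kelly is at home"
premises = [
    lambda v: not (v["home"] and v["work"]),
    lambda v: v["home"],
]
# "Kelly is not at work"
conclusion = lambda v: not v["work"]

kelly_follows = entails(premises, conclusion, atoms)

# Dropping the second premise: "Kelly is at home" does not follow
# from "Kelly is not both at home and at work" alone.
home_follows = entails([premises[0]], lambda v: v["home"], atoms)

print(kelly_follows, home_follows)
```

The check is exhaustive only because a sentence with n atoms has finitely many (2^n) assignments; nothing comparable is available in general once quantifiers are involved.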

For more comprehensive presentations of the two definitions of logical consequence, as well as further critical discussion, see the entries Logical Consequence, Model-Theoretic Conceptions and Logical Consequence, Deductive-Theoretic Conceptions.

2. The Concept of Logical Consequence

a. Tarski’s Characterization of the Common Concept of Logical Consequence

Tarski begins his article, “On the Concept of Logical Consequence,” by noting a challenge confronting the project of making precise the common concept of logical consequence.

The concept of logical consequence is one of those whose introduction into a field of strict formal investigation was not a matter of arbitrary decision on the part of this or that investigator; in defining this concept efforts were made to adhere to the common usage of the language of everyday life. But these efforts have been confronted with the difficulties which usually present themselves in such cases. With respect to the clarity of its content the common concept of consequence is in no way superior to other concepts of everyday language. Its extension is not sharply bounded and its usage fluctuates. Any attempt to bring into harmony all possible vague, sometimes contradictory, tendencies which are connected with the use of this concept, is certainly doomed to failure. We must reconcile ourselves from the start to the fact that every precise definition of this concept will show arbitrary features to a greater or less degree. (Tarski 1936, p. 409)

Not every feature of the technical account will be reflected in the ordinary concept, and we should not expect any clarification of the concept to reflect each and every deployment of it in everyday language and life. Nevertheless, despite its vagueness, Tarski believes that there are identifiable, essential features of the common concept of logical consequence.

…consider any class K of sentences and a sentence X which follows from this class. From an intuitive standpoint, it can never happen that both the class K consists of only true sentences and the sentence X is false. Moreover, since we are concerned here with the concept of logical, that is, formal consequence, and thus with a relation which is to be uniquely determined by the form of the sentences between which it holds, this relation cannot be influenced in any way by empirical knowledge, and in particular by knowledge of the objects to which the sentence X or the sentences of class K refer. The consequence relation cannot be affected by replacing designations of the objects referred to in these sentences by the designations of any other objects. (Tarski 1936, pp. 414-415)

According to Tarski, the logical consequence relation as it is employed by typical reasoners is (1) necessary, (2) formal, and (3) not influenced by empirical knowledge. I now elaborate on (1)-(3) in order to shape two preliminary characterizations of logical consequence.

i. The Logical Consequence Relation Has a Modal Element

Tarski countenances an implicit modal notion in the common concept of logical consequence. If X is a logical consequence of K, then not only is it not the case that all of the elements of K are true and X is false, but this is also necessarily so. That is, X follows from K only if it is not possible for all of the sentences in K to be true with X false. For example, the supposition that All West High School students are football fans and that Kelly is not a West High School student does not rule out the possibility that Kelly is a football fan. Hence, the sentences All West High School students are football fans and Kelly is not a West High School student do not entail Kelly is not a football fan, even if she, in fact, isn’t a football fan. Also, Most of Kelly’s male classmates are football fans does not entail Most of Kelly’s classmates are football fans. What if the majority of Kelly’s class is composed of females who are not fond of football?

We said above that Kelly is not both at home and at work and Kelly is at home jointly imply Kelly is not at work. Note that it doesn’t seem possible for the first two sentences to be true and Kelly is not at work false. But it is hard to see what this comes to without further clarification of the relevant notion of possibility. For example, consider the following pairs of sentences.

Kelly kissed her sister at 2:00pm.
2:00pm is not a time during which Kelly and her sister were 100 miles apart.

Kelly is a female.
Kelly is not the US President.

There is a chimp in Paige’s house.
There is a primate in Paige’s house.

Ten is a prime number.
Ten is greater than nine.

For each pair of sentences, there is a sense in which it is not possible for the first to be true and the second false. At the very least an account of logical consequence must distinguish logical possibility from other types of possibility. Should truths about physical laws, US political history, zoology, and mathematics constrain what we take to be possible in determining whether or not the first sentence of each pair could logically be true with the second sentence false? If not, then this seems to mystify logical possibility (e.g., how could ten be a prime number?). To paraphrase questions asked by G.E. Moore (1959, pp. 231-238), given that I know that George W. Bush is US President and that he is not a female named Kelly, isn’t it inconsistent for me to grant the logical possibility of the truth of Kelly is a female and the falsity of Kelly is not the US President? Or should I ignore my present state of knowledge in considering what is logically possible? Tarski does not derive a clear notion of logical possibility from the common concept of logical consequence. Perhaps there is none to be had, and we should seek the help of a proper theoretical development in clarifying the notion of logical possibility. Towards this end, let’s turn to the other features of logical consequence highlighted by Tarski, starting with the formality criterion of logical consequence.

ii. The Logical Consequence Relation is Formal

Tarski observes that logical consequence is a formal consequence relation. And he tells us that a formal consequence relation is a consequence relation that is uniquely determined by the form of the sentences between which it holds. Consider the following pair of sentences.

(1) Some children are both lawyers and peacemakers.
(2) Some children are peacemakers

Intuitively, (2) is a logical consequence of (1). It appears that this fact does not turn on the subject matter of the sentences. Replace ‘children’, ‘lawyers’, and ‘peacemakers’ in (1) and (2) with the variables S, M, and P to get the following.

(1′) Some S are both M and P
(2′) Some S are P

(1′) and (2′) are forms of (1) and (2), respectively. Note that there is no interpretation of S, M, and P according to which the sentence that results from (1′) is true and the resulting instance of (2′) is false. Hence, (2) is a formal consequence of (1) and on each interpretation of S, M, and P the resulting (2′) is a formal consequence of the sentence that results from (1′) (e.g., Some clowns are sad is a formal consequence of Some clowns are both lonely and sad). Tarski’s observation is that for any sentence X and set K of sentences, X is a logical consequence of K only if X is a formal consequence of K. The formality criterion of logical consequence can work in explaining why one sentence doesn’t entail another in cases where it seems impossible for the first to be true and the second false. For example, (3) is false and (4) is true.

(3) Ten is a prime number
(4) Ten is greater than nine

Does (4) follow from (3)? One might think that (4) does not follow from (3) because being a prime number does not necessitate being greater than nine. However, this does not require one to think that ten could be a prime number and less than or equal to nine, which is probably a good thing since it is hard to see how this is possible. Rather, we take

(3′) a is a P
(4′) a is R than b

to be the forms of (3) and (4) and note that there are interpretations of ‘a’, ‘b’, ‘P’, and ‘R’ according to which the first is true and the second false (e.g., let ‘a’ and ‘b’ name the numbers two and ten, respectively, and let ‘P’ mean prime number, and ‘R’ greater). Note that the claim here is not that formality is sufficient for a consequence relation to qualify as logical but only that it is a necessary condition. I now elaborate on this last point by saying a little more about forms of sentences (that is, sentential forms) and formal consequence.
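The search for a falsifying interpretation of (3′) and (4′) can be sketched mechanically. The following is only an illustration under stated assumptions: the small stock of candidate meanings for ‘a’, ‘b’, ‘P’, and ‘R’ (the names NAMES, PROPERTIES, and RELATIONS) is hypothetical and chosen for this sketch, not drawn from Tarski.

```python
from itertools import product

def is_prime(n):
    """Return True when n is a prime number."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Hypothetical candidate meanings for the variable terms (an assumption of
# this sketch; any other stock of meanings could be used instead).
NAMES = [2, 9, 10]                                  # referents for 'a' and 'b'
PROPERTIES = {"prime number": is_prime,
              "even number": lambda n: n % 2 == 0}  # meanings for 'P'
RELATIONS = {"greater": lambda x, y: x > y,
             "smaller": lambda x, y: x < y}         # meanings for 'R'

def counter_interpretations():
    """Yield interpretations making (3') 'a is a P' true and (4') 'a is R than b' false."""
    for a, b, (p_name, p), (r_name, r) in product(
            NAMES, NAMES, PROPERTIES.items(), RELATIONS.items()):
        if p(a) and not r(a, b):
            yield {"a": a, "b": b, "P": p_name, "R": r_name}

found = list(counter_interpretations())
```

The list found contains the interpretation mentioned above (a is two, b is ten, P is prime number, R is greater). One such interpretation is enough to show that (4′) is not a formal consequence of (3′) relative to this choice of variables.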

Distinguishing between a term of a sentence replaced with a variable and one held constant determines a form of the sentence. In Some children are both lawyers and peacemakers we may replace ‘Some’ with a variable and treat all the other terms as constant. Then

(1”) D children are both lawyers and peacemakers

is a form of (1), and each sentence generated by assigning a meaning to D shares this form with (1). For example, the following three sentences are instances of (1”), produced by interpreting D as ‘No’, ‘Many’, and ‘Few’.

No children are both lawyers and peacemakers
Many children are both lawyers and peacemakers
Few children are both lawyers and peacemakers

Whether X is a formal consequence of K then turns on a prior selection of terms as constant and others replaced with variables. Relative to such a determination, X is a formal consequence of K if and only if (iff) there is no interpretation of the variables according to which every sentence in K is true and X is false. So, taking all the terms, except for ‘Some’, in (1) Some children are both lawyers and peacemakers and in (2) Some children are peacemakers as constants makes the following forms of (1) and (2).

(1”) D children are both lawyers and peacemakers
(2”) D children are peacemakers

Relative to this selection, (2) is not a formal consequence of (1) because replacing ‘D’ with ‘No’ yields a true instance of (1”) and a false instance of (2”).
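The dependence of formal consequence on the choice of variable terms can be illustrated with a small sketch. It assumes a hypothetical toy situation (the sets children, lawyers, and peacemakers below are invented for illustration) and treats each candidate meaning for D as a function from two sets to a truth value. ‘Many’ and ‘Few’ are left out because their truth conditions are vague.

```python
# Candidate meanings for the variable D: each maps a restrictor set and a
# scope set to a truth value.
DETERMINERS = {
    "Some": lambda a, b: len(a & b) > 0,
    "No":   lambda a, b: len(a & b) == 0,
    "All":  lambda a, b: a <= b,
}

# A hypothetical toy situation (assumed for illustration only).
children = {"ann", "bo"}
lawyers = {"cy"}
peacemakers = {"ann", "cy"}

def form_1(d):
    """(1'') D children are both lawyers and peacemakers."""
    return d(children, lawyers & peacemakers)

def form_2(d):
    """(2'') D children are peacemakers."""
    return d(children, peacemakers)

# Interpretations of D that make (1'') true and (2'') false.
counterexamples = [name for name, d in DETERMINERS.items()
                   if form_1(d) and not form_2(d)]
```

In this toy situation the only counterexample among the listed determiners is ‘No’, which matches the reasoning in the text.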

Consider the following pair.

(5) Kelly is female
(6) Kelly is not US President

(6) is a formal consequence of (5) relative to replacing ‘Kelly’ with a variable. Given current U.S. political history, there is no individual whose name yields a true (5) and a false (6) when it replaces ‘Kelly’. This is not, however, sufficient reason for seeing (6) as a logical consequence of (5). There are two ways of thinking about why, a metaphysical consideration and an epistemological one. First the metaphysical consideration. It seems possible for (5) to be true and (6) false. The course of U.S. political history could have turned out differently. One might think that the current US President could–logically–have been a female named, say, ‘Sally’. Using ‘Sally’ as a replacement for ‘Kelly’ would yield in that situation a true (5) and a false (6). Also, it seems possible that in the future there will be a female US President. In order for a formal consequence relation from K to X to qualify as logical it has to be the case that it is necessary that there is no interpretation of the variables in K and X according to which the K-sentences are true and X is false.

The epistemological consideration is that one might think that knowledge that X follows logically from K should not essentially depend on being justified by experience of extra-linguistic states of affairs. Clearly, the determination that (6) follows formally from (5) essentially turns on empirical knowledge, specifically knowledge about the current political situation in the US. This leads to the final highlight of Tarski’s rendition of the intuitive concept of logical consequence: that logical consequence cannot be influenced by empirical knowledge.

iii. The Logical Consequence Relation is A Priori

Tarski says that by virtue of being formal, knowledge that X follows logically from K cannot be affected by knowledge of the objects that X and the sentences of K are about. Hence, our knowledge that X is a logical consequence of K cannot be influenced by empirical knowledge. However, as noted above, formality by itself does not ensure that the extension of a consequence relation is not influenced by empirical knowledge. So, let’s view this alleged feature of logical consequence as independent of formality. We characterize empirical knowledge in two steps as follows. First, a priori knowledge is knowledge “whose truth, given an understanding of the terms involved, is ascertainable by a procedure which makes no reference to experience” (Hamlyn 1967, p. 141). Second, empirical, or a posteriori, knowledge is knowledge that is not a priori, that is, knowledge whose validation necessitates a procedure that does make reference to experience. We can safely read Tarski as saying that a consequence relation is logical only if knowledge that something falls in its extension is a priori, that is, only if the relation is a priori. Knowledge of physical laws, a determinant in people’s observed sizes, is not a priori and such knowledge is required to know that there is no interpretation of k, h, and t according to which (7) is true and (8) false.

(7) k kissed h at time t
(8) t is not a time during which k and h were 100 miles apart

So (8) cannot be a logical consequence of (7). However, my knowledge that Kelly is not Paige’s only friend follows from Kelly is taller than Paige’s only friend is a priori since I know a priori that nobody is taller than herself.

Let’s summarize and tie things together. We began by asking, for a given language L, what conditions must be met in order for a sentence X of L to be a logical consequence of a class K of L-sentences? Tarski thinks that an adequate response must reflect the common concept of logical consequence, that is, the concept as it is ordinarily employed. By the lights of this concept, an adequate account of logical consequence must reflect the formality and necessity of logical consequence, and must also reflect the fact that knowledge of what follows logically from what is a priori. Tying the criteria together, in order to fix what follows logically from what in a given language L, we must select a class of constants that determines a formal consequence relation that is both necessary and known, if at all, a priori. Such constants are called logical constants, and we say that the logical form of a sentence is a function of the logical constants that occur in the sentence and the pattern of the remaining expressions. As was illustrated above, the notion of formality does not presuppose a criterion of logical constancy. A consequence relation based on any division between constants and terms replaced with variables will automatically be formal with respect to the latter.

b. Logical and Non-Logical Terminology

Tarski’s basic move from his rendition of the common concept of logical consequence is to distinguish between logical terms and non-logical terms and then say that X is a logical consequence of K only if there is no possible interpretation of the non-logical terms of the language L that makes all of the sentences in K true and X false. The choice of the right terms as logical will reflect the modal element in the concept of logical consequence, that is, will ensure that there is no ‘possible’ interpretation of the variable, non-logical terms of the language L that makes all of the K true and X false, and will ensure that this is known a priori. Of course, we have yet to spell out the modal notion in the concept of logical consequence. Tarski left this largely undeveloped in his (1936). Lacking such an explanation hampers our ability to clarify the rationale for a selection of terms to serve as the logical ones.

Traditionally, logicians have regarded sentential connectives such as and, not, or, and if-then, the quantifiers all and some, and the identity predicate ‘=’ as logical terms. Remarking on the boundary between logical and non-logical terms, Tarski (1936, p. 419) writes the following.

Underlying this characterization of logical consequence is the division of all terms of the language discussed into logical and extra-logical. This division is not quite arbitrary. If, for example, we were to include among the extra-logical signs the implication sign, or the universal quantifier, then our definition of the concept of consequence would lead to results which obviously contradict ordinary usage. On the other hand, no objective grounds are known to me which permit us to draw a sharp boundary between the two groups of terms. It seems to be possible to include among logical terms some which are usually regarded by logicians as extra-logical without running into consequences which stand in sharp contrast to ordinary usage.

Tarski seems right to think that the logical consequence relation turns on the work that the logical terminology does in the relevant sentences. It seems odd to say that Kelly is happy does not logically follow from All are happy because the second is true and the first false when All is replaced with Few. However, by Tarski’s version of the ordinary concept of logical consequence there is no reason not to treat, say, taller than as a logical term along with not and, therefore, no reason not to take Kelly is not taller than Paige as following logically from Paige is taller than Kelly. Also, it seems plausible to say that I know a priori that there is no possible interpretation of Kelly and is mortal according to which it is necessary that Kelly is mortal is true and Kelly is mortal is false. This makes Kelly is mortal a logical consequence of it is necessary that Kelly is mortal. Given that taller than and it is necessary that, along with other terms, were not generally regarded as logical terms by logicians of Tarski’s day, the fact that they seem to be logical terms by the common concept of logical consequence, as observed by Tarski, highlights the question of what it takes to be a logical term. Tarski says that future research will either justify the traditional boundary between the logical and the non-logical or conclude that there is no such boundary and the concept of logical consequence is a relative concept whose extension is always relative to some selection of terms as logical (p. 420). For further discussion of Tarski’s views on logical terminology and contemporary views see Logical Consequence, Model-Theoretic Conceptions: Section 5.3.

How, exactly, does the terminology usually regarded by logicians as logical work in making it the case that one sentence follows from others? In the next two sections two distinct approaches to understanding the nature of logical terms are sketched. Each approach leads to a unique way of characterizing logical consequence and thus yields a unique response to the above question.

i. The Nature of Logical Constants Explained in Terms of Their Semantic Properties

Consider the following metaphor, borrowed from Bencivenga (1999).

The locked room metaphor

Suppose that you are locked in a dark windowless room and you know everything about your language but nothing about the world outside. A sentence X and a class K of sentences are presented to you. If you can determine that X is true if all the sentences in K are, X is a logical consequence of K.

Ignorant of US politics, I couldn’t determine the truth of Kelly is not US President solely on the basis of Kelly is a female. However, behind such a veil of ignorance I would be able to tell that Kelly is not US President is true if Kelly is female and Kelly is not US President is true. How? Short answer: based on my linguistic competence; longer answer: based on my understanding of the semantic contribution of and to the determination of the truth conditions of a sentence of the form P and Q. For any sentences P and Q, I know that P and Q is true just in case P is true and Q is true. So, I know, a priori, if P and Q is true, then Q is true. As noted by one philosopher, “This really is remarkable since, after all, it’s what they mean, together with the facts about the non-linguistic world, that decide whether P or Q are true” (Fodor 2000, p.12).

Taking not and and to be the only logical constants in (9) Kelly is not both at home and at work, (10) Kelly is at home, and (11) Kelly is not at work, we formalize the sentences as follows, letting k mean Kelly, H mean is at home, and W mean is at work.

(9′) not-(Hk and Wk)
(10′) Hk
(11′) not-Wk

There is no interpretation of k, H, and W according to which (9′) and (10′) are true and (11′) is false. The reason why turns on the semantic properties of and and not, which are knowable a priori. Suppose (9′) and (10′) are true on some interpretation of the variable terms. Then the meaning of not in (9′) makes it the case that Hk and Wk is false, which, by the meaning of and requires that Hk is false or Wk is false. Given (10′), it must be that Wk is false, that is, not-Wk is true. So, there can’t be an interpretation of the variable terms according to which (9′) and (10′) are true and (11′) is false, and, as the above reasoning illustrates, this is due exclusively to the semantic properties of not and and. So the reason that it is impossible that an interpretation of k, H, and W make (9′) and (10′) true and (11′) false is that the supposition otherwise is inconsistent with the semantic functioning of not and and. Compare: the supposition that there is an interpretation of k according to which k is a female is true and k is not US President is false does not seem to violate the semantic properties of the constant terms. If we identify the meanings of the predicates with their extensions in all possible worlds, then the supposition that there is a female U.S. President does not violate the meanings of female and US President for surely it is possible that there be a female US President. But, supposing that (9′) and (10′) could be true with (11′) false on some interpretation of k, H, and W, violates the semantic properties of either and or not.
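The reasoning about (9′), (10′), and (11′) can be checked by a simple truth-table sketch. As a simplifying assumption, variation over interpretations of k, H, and W is represented here only by variation over the truth values of the atomic sentences Hk and Wk, while not and and keep their fixed meanings.

```python
from itertools import product

def premise_9(h, w):
    """(9') not-(Hk and Wk)"""
    return not (h and w)

def premise_10(h, w):
    """(10') Hk"""
    return h

def conclusion_11(h, w):
    """(11') not-Wk"""
    return not w

# Each row assigns truth values to Hk and Wk; 'not' and 'and' keep their
# fixed meanings throughout.
rows = list(product([True, False], repeat=2))

# Rows on which both premises are true and the conclusion is false.
counter_rows = [(h, w) for h, w in rows
                if premise_9(h, w) and premise_10(h, w)
                and not conclusion_11(h, w)]
```

No row survives the filter, so counter_rows is empty: no assignment of truth values makes (9′) and (10′) true and (11′) false.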

In sum, our first-step characterization of logical consequence is the following. For a given language L,

X is a logical consequence of K if and only if there is no possible interpretation of the non-logical terminology of L according to which all the sentences in K are true and X is false.

A possible interpretation of the non-logical terminology of the language L according to which sentences are true or false is a reading of the non-logical terms according to which the sentences receive a truth-value (that is, are either true or false) in a situation that is not ruled out by the semantic properties of the logical constants. The philosophical locus of the technical development of ‘possible interpretation’ in terms of models is Tarski (1936). A model for a language L is the theoretical development of a possible interpretation of non-logical terminology of L according to which the sentences of L receive a truth-value. Models have become standard tools for characterizing the logical consequence relation, and the characterization of logical consequence in terms of models is called the Tarskian or model-theoretic characterization of logical consequence. We say that X is a model-theoretic consequence of K if and only if all models of K are models of X. This relation may be represented as K ⊨ X. If model-theoretic consequence is adequate as a representation of logical consequence, then it must reflect the salient features of the common concept, which, according to Tarski, means that it must be necessary, formal and a priori.
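The idea that X is a model-theoretic consequence of K just in case all models of K are models of X can be illustrated with a rough sketch. It rests on strong simplifying assumptions that are mine rather than Tarski’s: models are restricted to an assumed two-element domain, the language has only one-place predicates and one name, and the helper names (subsets, models, consequence) are hypothetical. A genuine model-theoretic account ranges over all models, not a finite sample.

```python
from itertools import chain, combinations, product

def subsets(domain):
    """All subsets of a finite domain, as frozensets."""
    items = list(domain)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, n) for n in range(len(items) + 1))]

def models(domain, predicates, names):
    """Yield every model over `domain`: an extension for each predicate
    and a referent for each name."""
    for exts in product(subsets(domain), repeat=len(predicates)):
        for refs in product(domain, repeat=len(names)):
            yield {**dict(zip(predicates, exts)), **dict(zip(names, refs))}

def consequence(premises, conclusion, all_models):
    """True iff every listed model of the premises is a model of the conclusion."""
    return all(conclusion(m) for m in all_models
               if all(p(m) for p in premises))

DOMAIN = [0, 1]  # assumed two-element domain, for illustration only

# 'F' and 'U' stand in for 'is female' and 'is US President'; 'k' for 'Kelly'.
fu_models = list(models(DOMAIN, ["F", "U"], ["k"]))
female = lambda m: m["k"] in m["F"]
not_president = lambda m: m["k"] not in m["U"]

# 'H' and 'W' stand in for 'is at home' and 'is at work'.
hw_models = list(models(DOMAIN, ["H", "W"], ["k"]))
p9 = lambda m: not (m["k"] in m["H"] and m["k"] in m["W"])
p10 = lambda m: m["k"] in m["H"]
c11 = lambda m: m["k"] not in m["W"]

kelly_case = consequence([female], not_president, fu_models)   # False
home_case = consequence([p9, p10], c11, hw_models)             # True
```

The first check fails because some model places the referent of k in the extensions of both F and U. The second succeeds because the fixed meanings of not and and leave no such model.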

For further discussion of this conception of logical consequence, see the article, Logical Consequence, Model-Theoretic Conceptions.

ii. The Nature of Logical Constants Explained in Terms of Their Inferential Properties

We now turn to a second approach to understanding logical constants. Instead of understanding the nature of logical constants in terms of their semantic properties as is done on the model-theoretic approach, on the second approach we appeal to their inferential properties conceived of in terms of principles of inference, that is, principles justifying steps in deductions. We begin with a remark made by Aristotle. In his study of logical consequence, Aristotle comments that

A syllogism is discourse in which, certain things being stated, something other than what is stated follows of necessity from their being so. I mean by the last phrase that they produce the consequence, and by this, that no further term is required from without in order to make the consequence necessary. (Prior Analytics, p. 24b)

Adapting this to our X and K, we may say that X is a logical consequence of K when the sentences of K are sufficient to produce X. How are we to think of a sentence being produced by others? One way of developing this is to appeal to a notion of an actual or possible deduction. X is a deductive consequence of K if and only if there is a deduction of X from K. In such a case, we say that X may be correctly inferred from K or that it would be correct to conclude X from K. A deduction is associated with a pair ⟨K, X⟩; the set K of sentences is the basis of the deduction, and X is the conclusion. A deduction from K to X is a finite sequence S of sentences ending with X such that each sentence in S (that is, each intermediate conclusion) is derived from a sentence (or more) in K or from previous sentences in S in accordance with a correct principle of inference.

For example, intuitively, the following inference seems correct.

  Kelly is not both at home and at work
  Kelly is at home
(therefore) Kelly is not at work

The set K of sentences above the line is the basis of the inference and the sentence X below is the conclusion. We represent their logical forms, again, as follows.

  (9′) not-(Hk and Wk)
  (10′) Hk
(therefore) (11′) not-Wk

Consider the following deduction of (11′) from (10′) and (9′).

Deduction: Assume that (12′) Wk. Then from (10′) and (12′) we may deduce that (13′) Hk and Wk. (13′) contradicts (9′) and so (12′), our initial assumption, must be false. We have deduced not-Wk from not-(Hk and Wk) and Hk.
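For readers who find a machine-checkable rendering helpful, the same deduction can be written as a short Lean 4 proof. This is offered only as an illustration: it treats Hk and Wk as arbitrary propositions H and W, and the hypothesis names h9, h10, and h12 are labels chosen here to match the numbering in the text.

```lean
-- H and W stand in for the sentences Hk and Wk.
example (H W : Prop) (h9 : ¬(H ∧ W)) (h10 : H) : ¬W :=
  fun h12 : W =>        -- assume (12') Wk
    h9 ⟨h10, h12⟩       -- (13') Hk and Wk contradicts (9')
```

The pairing ⟨h10, h12⟩ corresponds to the step from (10′) and (12′) to (13′), and applying h9 to it corresponds to the observation that (13′) contradicts (9′).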

Since the deduction of not-Wk from not-(Hk and Wk) and Hk did not depend on the interpretation of k, W, and H, the deductive relation is formal. Furthermore, my knowledge of this is a priori because my knowledge of the underlying principles of inference in the above deduction is not empirical. For example, letting P and Q be any sentences, we know a priori that P and Q may be inferred from the set K={P, Q} of basis sentences. This principle grounds the move from (10′) and (12′) to (13′). Also, the deduction appeals to the principle that if we deduce a contradiction from an assumption, then we may infer that the assumption is false. The correctness of this principle seems to be an a priori matter. Let’s look at another example of a deduction.

  (1) Some children are both lawyers and peacemakers
(therefore) (2) Some children are peacemakers

The logical forms are, again, the following.

  (1′) Some S are both M and P
(therefore) (2′) Some S are P

Again, intuitively, (2′) is deducible from (1′).

Deduction: The basis tells us that at least one S (let’s call this S ‘a’) is both an M and a P. Clearly, a is a P may be deduced from a is both an M and a P. Since we’ve assumed that a is an S, what we derive with respect to a we derive with respect to some S. So our derivation of a is a P is a derivation of Some S is a P, which is our desired conclusion.

Since the deduction is formal, we have shown not merely that (2) can be correctly inferred from (1), but we have shown that for any interpretation of S, M, and P it is correct to infer (2′) from (1′).

Typically, deductions leave out steps (perhaps because they are too obvious), and they usually do not justify each and every step made in moving towards the conclusion (again, obviousness begets brevity). The notion of a deduction is made precise by describing a mechanism for constructing deductions that are both transparent and rigorous (each step is explicitly justified and no steps are omitted). This mechanism is a deductive system (also known as a formal system or as a formal proof calculus). A deductive system D is a collection of rules that govern which sequences of sentences, associated with a given ⟨K, X⟩, are allowed and which are not. Such a sequence is called a proof in D (or, equivalently, a deduction in D) of X from K. The rules must be such that whether or not a given sequence associated with ⟨K, X⟩ qualifies as a proof in D of X from K is decidable purely by inspection and calculation. That is, the rules provide a purely mechanical procedure for deciding whether a given object is a proof in D of X from K.
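To make the idea of a purely mechanical check concrete, here is a minimal sketch of a proof checker for a toy deductive system. Everything about the system is an assumption made for illustration: it has only three rules (citing a basis sentence, conjunction introduction, and conjunction elimination), and the function name is_proof and the encoding of sentences as strings and tuples are hypothetical.

```python
def is_proof(steps, basis, conclusion):
    """Mechanically decide whether `steps` is a proof of `conclusion` from `basis`.

    Each step is (formula, rule, refs), where refs are indices of earlier steps.
    Atomic sentences are strings; a conjunction is ("and", left, right).
    """
    derived = []
    for formula, rule, refs in steps:
        # A step may cite only steps that come before it.
        if any(not 0 <= r < len(derived) for r in refs):
            return False
        cited = [derived[r] for r in refs]
        if rule == "premise":
            ok = not refs and formula in basis
        elif rule == "and_intro":
            ok = len(cited) == 2 and formula == ("and", cited[0], cited[1])
        elif rule == "and_elim":
            ok = (len(cited) == 1 and isinstance(cited[0], tuple)
                  and cited[0][0] == "and" and formula in cited[0][1:])
        else:
            ok = False  # unknown rule: not allowed by the system
        if not ok:
            return False
        derived.append(formula)
    # The sequence must end with the conclusion.
    return bool(derived) and derived[-1] == conclusion

K = {"Hk", "Wk"}
good = [("Hk", "premise", []),
        ("Wk", "premise", []),
        (("and", "Hk", "Wk"), "and_intro", [0, 1])]
bad = [("Hk", "premise", []),
       (("and", "Hk", "Wk"), "and_intro", [0, 0])]

good_ok = is_proof(good, K, ("and", "Hk", "Wk"))
bad_ok = is_proof(bad, K, ("and", "Hk", "Wk"))
```

The checker accepts the first sequence and rejects the second, whose last step cites the wrong lines. It inspects only the shapes of the sentences and the cited steps, never what the sentences mean, which is the sense in which the procedure is mechanical.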

We say that a deductive system D is correct when for any K and X, proofs in D of X from K correspond to intuitively valid deductions. For example, intuitively, there are no correct principles of inference according to which it is correct to conclude

Some animals are both mammals and reptiles

on the basis of the following two sentences.

Some animals are mammals
Some animals are reptiles

Hence, a proof in a deductive system of the former sentence from the latter two is evidence that the deductive system is incorrect. The point here is that a proof in D may fail to represent a deduction if D is incorrect.

A rich variety of deductive systems have been developed for registering deductions. Each system has its advantages and disadvantages, which are assessed in the context of the more specific tasks the deductive system is designed to accomplish. Historically, the general purpose of the construction of deductive systems was to reduce reasoning to precise mechanical rules (Hodges 1983, p. 26). Some view a deductive system defined for a language L as a mathematical model of actual or possible chains of correct reasoning in L. Sundholm (1983) offers a thorough survey of three main types of deductive systems. For a shorter, excellent introduction to the concept of a deductive system see Henkin (1967). A deductive system is developed in detail in the accompanying article, Logical Consequence, Deductive-Theoretic Conceptions.

If there is a proof of X from K in deductive system D, then we may say that X is a deductive consequence in D of K, which is sometimes expressed as K ⊢D X. Relative to a correct deductive system D, we characterize logical consequence in terms of deductive consequence as follows.

X is a logical consequence of K if and only if X is a deductive consequence in D of K, that is, there is an actual or possible proof in D of X from K.

This is called the deductive-theoretic (or proof-theoretic) characterization of logical consequence.

3. Model-Theoretic and Deductive-Theoretic Conceptions of Logic

We began with Tarski’s observations of the common or ordinary concept of logical consequence that we employ in daily life. According to Tarski, if X is a logical consequence of a set of sentences, K, then, in virtue of the logical forms of the sentences involved, if all of the members of K are true, then X must be true, and furthermore, we know this a priori. The formality criterion makes the logical constants the essential determinant of the logical consequence relation. The logical consequence relation is fixed exclusively in terms of the nature of the logical terminology. We have highlighted two different approaches to the nature of a logical constant: (1) in terms of its semantic contribution to sentences in which it occurs and (2) in terms of its inferential properties. The two approaches yield distinct conceptions of the notion of necessity inherent in the common concept of logical consequence, and lead to the following characterizations of logical consequence.

(1) X is a logical consequence of K if and only if there is no possible interpretation of the non-logical terminology of the language according to which all the sentences in K are true and X is false.

(2) X is a logical consequence of K if and only if X is deducible from K.

We make the notions of possible interpretation in (1) and deducibility in (2) precise by appealing to the technical notions of model and deductive system. This leads to the following theoretical characterizations of logical consequence.

(1) The model-theoretic characterization of logical consequence: X is a logical consequence of K iff all models of K are models of X.

(2) The deductive-theoretic characterization of logical consequence: X is a logical consequence of K iff there is a deduction in a correct deductive system of X from K.

Following Shapiro (1991, p. 3), we define a logic to be a language L plus either a model-theoretic or a deductive-theoretic account of logical consequence. A language with both characterizations is a full logic just in case both characterizations coincide. A soundness proof establishes K ⊢D X only if K ⊨ X, and a completeness proof establishes K ⊢D X if K ⊨ X. These proofs together establish that the two characterizations coincide, and in such a case the deductive system D is said to be complete and sound with respect to the model-theoretic consequence relation defined for the relevant language L.

We said that the primary aim of logic is to tell us what follows logically from what. These two characterizations of logical consequence lead to two different orientations or conceptions of logic (see Tharp 1975, p. 5).

Model-theoretic approach: Logic is a theory of possible interpretations, that is, for a given language, a theory of the class of situations that can–logically–be described by that language.

Deductive-theoretic approach: Logic is a theory of formal deductive inference.

The article now concludes by highlighting three considerations relevant to evaluating a particular deployment of the model-theoretic or deductive-theoretic definition in defining logical consequence. These considerations emerge from the above development of the two theoretic definitions from the common concept of logical consequence.

4. Conclusion

The two theoretical characterizations of logical consequence do not provide the means for drawing a boundary in a language L between logical and non-logical terms. Indeed, their use presupposes that a list of logical terms is in hand. Hence, in evaluating a model-theoretic or deductive-theoretic definition of logical consequence for a language L the issue arises whether or not the boundary in L between logical and non-logical terms has been correctly drawn. This requires a response to a central question in the philosophy of logic: what qualifies as a logical constant? Tarski gives a well-reasoned response in his (1986). (For more recent discussion see McCarthy 1981 and 1998, Hanson 1997, and Warmbrod 1999.)

A second thing to consider in evaluating a theoretical account of logical consequence is whether or not its characterization of the logical terminology is accurate. For example, model-theoretic and deductive accounts of logical consequence are inadequate unless they reflect the semantic and inferential properties of the logical terms, respectively. So a model-theoretic account is inadequate unless it gets right the semantic contributions of the logical terms to the truth conditions of the sentences formed using them. For a particular deductive system D, the question arises whether or not D’s rules of inference reflect the inferential properties of the logical terms. (For further discussion of the semantic and inferential properties of logical terms see Haack 1978 and 1996, Read 1995, and Quine 1986.)

A third consideration in assessing the success of a theoretical definition of logical consequence is whether or not the definition, relative to a selection of terms as logical, reflects the salient features of the common concept of logical consequence. There are criticisms of the theoretical definitions that claim that they are incapable of reflecting the common concept of logical consequence. Typically, such criticisms are used to question the status of the model-theoretic and deductive-theoretic approaches to logic.

For example, there are critics who question the model-theoretic approach to logic by arguing that any model-theoretic account lacks the conceptual resources to reflect the notion of necessity inherent in the common concept of logical consequence, because such an account does not rule out the possibility of there being logically possible situations in which sentences in K are true and X is false even though every model of K is a model of X. Kneale (1961) is an early critic; Etchemendy (1988, 1999) offers a sustained and multi-faceted attack. It is also argued that the model-theoretic approach to logic makes knowledge of what follows from what depend on knowledge of the existence of models, which is knowledge of worldly matters of fact. But logical knowledge should not depend on knowledge about the extra-linguistic world (recall the locked room metaphor in 2.2.1). This standard logical positivist line has recently been challenged by those who see logic as penetrated and permeated by metaphysics (e.g., Putnam 1971, Almog 1989, Sher 1991, Williamson 1999).

The status of the deductive-theoretic approach to logic is not clear, for, as Tarski argues in his (1936), deductive-theoretic accounts are unable to reflect the fact that, according to the common concept, logical consequence is not compact. Relative to any deductive system D, the ⊢D-consequence relation is compact if and only if for any sentence X and set K of sentences, if K ⊢D X, then K’ ⊢D X for some finite subset K’ of K. But there are intuitively correct principles of inference according to which one may infer a sentence X from a set K of sentences, even though it is incorrect to infer X from any finite subset of K. This suggests that the intuitive notion of deducibility is not completely captured by any compact consequence relation. We need to weaken

X is a logical consequence of K if and only if there is a proof in a correct deductive system of X from K,

given above, to

X is a logical consequence of K if there is a proof in a correct deductive system of X from K.

In sum, the issue of the nature of logical consequence, which intersects with other areas of philosophy, is still a matter of debate. Tarski’s analysis of the concept is not universally accepted; philosophers and logicians differ over what the features of the common concept are. For example, some offer accounts of the logical consequence relation according to which it is not a priori (e.g., see Koslow 1999, Sher 1991 and see Hanson 1997 for criticism of Sher) or deny that it even need be strongly necessary (Smiley 1995, 2000, section 6). The entry Logical Consequence, Model-Theoretic Conceptions gives a model-theoretic definition of logical consequence. For a detailed development of a deductive system see the entry Logical Consequence, Deductive-Theoretic Conceptions. The critical discussion in both articles deepens and extends points made in the conclusion of this article.

5. References and Further Reading

  • Almog, J. (1989): “Logic and the World”, pp. 43-65 in Themes From Kaplan, ed. J. Almog, J. Perry, and H. Wettstein. New York: Oxford UP.
  • Aristotle. (1941): Basic Works, ed. R. McKeon. New York: Random House.
  • Bencivenga, E. (1999): “What is Logic About?”, pp. 5-19 in Varzi (1999).
  • Etchemendy, J. (1983): “The Doctrine of Logic as Form”, Linguistics and Philosophy 6, pp. 319-334.
  • Etchemendy, J. (1988): “Tarski on truth and logical consequence”, Journal of Symbolic Logic 53, pp. 51-79.
  • Etchemendy, J. (1999): The Concept of Logical Consequence. Stanford: CSLI Publications.
  • Fodor, J. (2000): The Mind Doesn’t Work That Way. Cambridge: The MIT Press.
  • Gabbay, D. and F. Guenthner, eds. (1983): Handbook of Philosophical Logic, Vol 1. Dordrecht: D. Reidel Publishing Company.
  • Haack, S. (1978): Philosophy of Logics. Cambridge: Cambridge University Press.
  • Haack, S. (1996): Deviant Logic, Fuzzy Logic. Chicago: The University of Chicago Press.
  • Hodges, W. (1983): “Elementary Predicate Logic”, in Gabbay, D. and F. Guenthner (1983).
  • Hamlyn, D.W. (1967): “A Priori and A Posteriori”, pp. 105-109 in The Encyclopedia of Philosophy, Vol. 1, ed. P. Edwards. New York: Macmillan & The Free Press.
  • Hanson, W. (1997): “The Concept of Logical Consequence”, The Philosophical Review 106, pp. 365-409.
  • Henkin, L. (1967): “Formal Systems and Models of Formal Systems”, pp. 61-74 in The Encyclopedia of Philosophy, Vol. 8, ed. P. Edwards. New York: Macmillan & The Free Press.
  • Kneale, W. (1961): “Universality and Necessity”, British Journal for the Philosophy of Science 12, pp. 89-102.
  • Koslow, A. (1999): “The Implicational Nature of Logic: A Structuralist Account”, pp. 111-155 in Varzi (1999).
  • McCarthy, T. (1981): “The Idea of a Logical Constant”, Journal of Philosophy 78, pp. 499-523.
  • McCarthy, T. (1998): “Logical Constants”, pp. 599-603 in Routledge Encyclopedia of Philosophy, Vol. 5, ed. E. Craig. London: Routledge.
  • McGee, V. (1999): “Two Problems with Tarski’s Theory of Consequence”, Proceedings of the Aristotelian Society 92, pp. 273-292.
  • Moore, G.E. (1959): “Certainty”, pp. 227-251 in Philosophical Papers. London: George Allen & Unwin.
  • Priest, G. (1995): “Etchemendy and Logical Consequence”, Canadian Journal of Philosophy 25, pp. 283-292.
  • Putnam, H. (1971): Philosophy of Logic. New York: Harper & Row.
  • Quine, W.V. (1986): Philosophy of Logic, 2nd ed. Cambridge: Harvard UP.
  • Read, S. (1995): Thinking About Logic. Oxford: Oxford UP.
  • Shapiro, S. (1991): Foundations without Foundationalism: A Case For Second-Order Logic. Oxford: Clarendon Press.
  • Shapiro, S. (1993): “Modality and Ontology”, Mind 102, pp. 455-481.
  • Shapiro, S. (1998): “Logical Consequence: Models and Modality”, pp. 131-156 in The Philosophy of Mathematics Today, ed. Matthias Schirn. Oxford, Clarendon Press.
  • Shapiro, S. (2000): Thinking About Mathematics. Oxford: Oxford University Press.
  • Sher, G. (1989): “A Conception of Tarskian Logic”, Pacific Philosophical Quarterly 70, pp. 341-368.
  • Sher, G. (1991): The Bounds of Logic: A Generalized Viewpoint, Cambridge, MA: The MIT Press.
  • Sher, G. (1996): “Did Tarski commit ‘Tarski’s fallacy’?” Journal of Symbolic Logic 61, pp. 653-686.
  • Sher, G. (1999): “Is Logic a Theory of the Obvious?”, pp. 207-238 in Varzi (1999).
  • Smiley, T. (1995): “A Tale of Two Tortoises”, Mind 104, pp. 725-36.
  • Smiley, T. (1998): “Consequence, Conceptions of”, pp. 599-603 in Routledge Encyclopedia of Philosophy, vol. 2, ed. E. Craig. London: Routledge.
  • Sundholm, G. (1983): “Systems of Deduction”, in Gabbay and Guenthner (1983).
  • Tarski, A. (1933): “Pojęcie prawdy w językach nauk dedukcyjnych”, translated as “On the Concept of Truth in Formalized Languages”, pp. 152-278 in Tarski (1983).
  • Tarski, A. (1936): “On the Concept of Logical Consequence”, pp. 409-420 in Tarski (1983).
  • Tarski, A. (1983): Logic, Semantics, Metamathematics, 2nd ed. Indianapolis: Hackett Publishing.
  • Tarski, A. (1986): “What are logical notions?” History and Philosophy of Logic 7, pp. 143-154.
  • Tharp, L. (1975): “Which Logic is the Right Logic?” Synthese 31, pp. 1-21.
  • Warmbrod, K. (1999): “Logical Constants”, Mind 108, pp. 503-538.
  • Williamson, T. (1999): “Existence and Contingency”, Proceedings of the Aristotelian Society Supplementary Vol. 73, pp. 181-203.
  • Varzi, A., ed. (1999): European Review of Philosophy, Vol. 4: The Nature of Logic, Stanford: CSLI Publications.

Author Information

Matthew McKeon
Email: mckeonm@msu.edu
Michigan State University
U. S. A.

Deductive-Theoretic Conceptions of Logical Consequence

According to the deductive-theoretic conception of logical consequence, a sentence X is a logical consequence of a set K of sentences if and only if X is a deductive consequence of K, that is, X is deducible or provable from K. Deductive consequence is clarified in terms of the notion of proof in a correct deductive system. Since, arguably, logical consequence conceived deductive-theoretically is not a compact relation and deducibility in a deductive system is, there are languages for which deductive consequence cannot be defined in terms of deducibility in a correct deductive system. However, it is true that if a sentence is deducible in a correct deductive system from other sentences, then the sentence is a deductive consequence of them. A deductive system is correct only if its rules of inference correspond to intuitively valid principles of inference. So whether or not a natural deductive system is correct brings into play rival theories of valid principles of inference such as classical, relevance, intuitionistic, and free logics.

Table of Contents

  1. Introduction
  2. Linguistic Preliminaries: the Language M
    1. Syntax of M
    2. Semantics for M
  3. What is a Logic?
  4. Deductive System N
  5. The Status of the Deductive Characterization of Logical Consequence in Terms of N
    1. Tarski’s argument that the model-theoretic characterization of logical consequence is more basic than its characterization in terms of a deductive system
    2. Is deductive system N correct?
      1. Relevance logic
      2. Intuitionistic logic
      3. Free logic
  6. Conclusion
  7. References and Further Reading

1. Introduction

According to the deductive-theoretic conception of logical consequence, a sentence X is a logical consequence of a set K of sentences if and only if X is a deductive consequence of K, that is, X is deducible from K. X is deducible from K just in case there is an actual or possible deduction of X from K. In such a case, we say that X may be correctly inferred from K or that it would be correct to conclude X from K. A deduction is associated with a pair <K, X>; the set K of sentences is the basis of the deduction, and X is the conclusion. A deduction from K to X is a finite sequence S of sentences ending with X such that each sentence in S (that is, each intermediate conclusion) is derived from a sentence (or more) in K or from previous sentences in S in accordance with a correct principle of inference. The notion of a deduction is clarified by appealing to a deductive system. A deductive system D is a collection of rules that govern which sequences of sentences, associated with a given <K, X>, are allowed and which are not. Such a sequence is called a proof in D (or, equivalently, a deduction in D) of X from K. The rules must be such that whether or not a given sequence associated with <K, X> qualifies as a proof in D of X from K is decidable purely by inspection and calculation. That is, the rules provide a purely mechanical procedure for deciding whether a given object is a proof in D of X from K. We write

K ⊢D X

to mean

X is deducible in deductive system D from K.

See the entry Logical Consequence, Philosophical Considerations for discussion of the interplay between the concepts of logical consequence and deductive consequence, and deductive systems. We say that a deductive system D is correct when for any K and X, proofs in D of X from K correspond to intuitively valid deductions. For a given language the deductive consequence relation is defined in terms of a correct deductive system D only if it is true that

X is a deductive consequence of K if and only if X is deducible in D from K.

Sundholm (1983) offers a thorough survey of three main types of deductive systems. In this article, a natural deductive system is presented that originates in the work of the mathematician Gerhard Gentzen (1934) and the logician Frederic Fitch (1952). We will refer to the deductive system as N (for ‘natural deduction’). For an in-depth introductory presentation of a natural deductive system very similar to N see Barwise and Etchemendy (2001). N is a collection of inference rules. A proof of X from K that appeals exclusively to the inference rules of N is a formal deduction or formal proof. We shall take a formal proof to be associated with a pair <K, X> where K is a set of sentences from a first-order language M, which will be introduced below, and X is an M-sentence. The set K of sentences is the basis of the deduction, and X is the conclusion. We say that a formal deduction from K to X is a finite sequence S of sentences ending with X such that each sentence in S is either an assumption, deduced from a sentence (or more) in K, or deduced from previous sentences in S in accordance with one of N’s inference rules.

Formal proofs are not only epistemologically significant for securing knowledge, but also the derivations making up formal proofs may serve as models of the informal deductive reasoning performed using sentences from language M. Indeed, a primary value of a formal proof is that it can serve as a model of ordinary deductive reasoning that explains the force of such reasoning by representing the principles of inference required to get to X from K.

Gentzen, one of the first logicians to present a natural deductive system, makes clear that a primary motive for the construction of his system is to reflect as accurately as possible the actual logical reasoning involved in mathematical proofs. He writes,

My starting point was this: The formalization of logical deduction especially as it has been developed by Frege, Russell, and Hilbert, is rather far removed from the forms of deduction used in practice in mathematical proofs…In contrast, I intended first to set up a formal system which comes as close as possible to actual reasoning. The result was a ‘calculus of natural deduction’. (Gentzen 1934, p. 68)

Natural deductive systems are distinguished from other deductive systems by their usefulness in modeling ordinary, informal deductive inferential practices. Paraphrasing Gentzen, we may say that if one is interested in seeing logical connections between sentences in the most natural way possible, then a natural deductive system is a good choice for defining the deductive consequence relation.

The remainder of the article proceeds as follows. First, an interpreted language M is given. Next, we present the deductive system N and represent the deductive consequence relation in M. After discussing the philosophical significance of the deductive consequence relation defined in terms of N, we consider some standard criticisms of the correctness of deductive system N.

2. Linguistic Preliminaries: the Language M

Here we define a simple language M, a language about the McKeon family, by first sketching what strings qualify as well-formed formulas (wffs) in M. Next we define sentences from formulas, and then give an account of truth in M, that is, we describe the conditions in which M-sentences are true.

a. Syntax of M

Building blocks of formulas

Terms

Individual names: ‘beth’, ‘kelly’, ‘matt’, ‘paige’, ‘shannon’, ‘evan’, and ‘w1’, ‘w2’, ‘w3’, etc.

Variables: ‘x’, ‘y’, ‘z’, ‘x1’, ‘y1’, ‘z1’, ‘x2’, ‘y2’, ‘z2’, etc.

Predicates

1-place predicates: ‘Female’, ‘Male’.

2-place predicates: ‘Parent’, ‘Brother’, ‘Sister’, ‘Married’, ‘OlderThan’, ‘Admires’, ‘=’.

Blueprints of well-formed formulas (wffs)

Atomic formulas: An atomic wff is any of the above n-place predicates followed by n terms which are enclosed in parentheses and separated by commas.

Formulas: The general notion of a well-formed formula (wff) is defined recursively as follows:

(1) All atomic wffs are wffs.
(2) If α is a wff, so is '~α'.
(3) If α and β are wffs, so is '(α & β)'.
(4) If α and β are wffs, so is '(α v β)'.
(5) If α and β are wffs, so is '(α → β)'.
(6) If Ψ is a wff and v is a variable, then '∃vΨ' is a wff.
(7) If Ψ is a wff and v is a variable, then '∀vΨ' is a wff.
Finally, no string of symbols is a well-formed formula of M unless the string can be derived from (1)-(7).
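The recursive definition of a wff can be mirrored by a short recursive check. The sketch below is illustrative only: it assumes formulas are already represented as nested tuples rather than strings (so it does no parsing), and the tag names `"atom"`, `"not"`, `"and"`, `"or"`, `"if"`, `"exists"`, `"forall"` and the helper names are hypothetical choices, not notation from the article.

```python
# Arity of each predicate of M, as listed in the syntax above.
PREDICATES = {"Female": 1, "Male": 1, "Parent": 2, "Brother": 2, "Sister": 2,
              "Married": 2, "OlderThan": 2, "Admires": 2, "=": 2}
NAMES = {"beth", "kelly", "matt", "paige", "shannon", "evan"}

def is_name(t):
    """Individual names: the six family names plus w1, w2, w3, ..."""
    return isinstance(t, str) and (t in NAMES or (t[:1] == "w" and t[1:].isdigit()))

def is_variable(t):
    """Variables: x, y, z, optionally followed by a numeral."""
    return (isinstance(t, str) and t[:1] in ("x", "y", "z")
            and (t[1:] == "" or t[1:].isdigit()))

def is_wff(f):
    """Check a tuple-encoded formula against clauses (1)-(7)."""
    if not isinstance(f, tuple) or not f:
        return False
    tag = f[0]
    if tag == "atom":                      # clause (1): n-place predicate, n terms
        if len(f) != 3 or not isinstance(f[2], tuple):
            return False
        _, pred, terms = f
        return (PREDICATES.get(pred) == len(terms)
                and all(is_name(t) or is_variable(t) for t in terms))
    if tag == "not":                       # clause (2)
        return len(f) == 2 and is_wff(f[1])
    if tag in ("and", "or", "if"):         # clauses (3)-(5)
        return len(f) == 3 and is_wff(f[1]) and is_wff(f[2])
    if tag in ("exists", "forall"):        # clauses (6)-(7)
        return len(f) == 3 and is_variable(f[1]) and is_wff(f[2])
    return False                           # closure clause: nothing else is a wff

print(is_wff(("atom", "Parent", ("beth", "paige"))))            # True
print(is_wff(("exists", "x", ("atom", "Male", ("x",)))))        # True
print(is_wff(("atom", "Male", ("x", "y"))))                     # False: wrong arity
```

The final `return False` plays the role of the closure clause: anything not built by one of the seven clauses is rejected.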

The signs ‘~’, ‘&’, ‘v’, and ‘→’ are called sentential connectives. The signs ‘∀’ and ‘∃’ are called quantifiers.

It will prove convenient to have available in M an infinite number of individual names as well as variables. The strings ‘Parent(beth, paige)’ and ‘Male(x)’ are examples of atomic wffs. We allow the identity symbol in an atomic formula to occur in between two terms, e.g., instead of ‘=(evan, evan)’ we allow ‘(evan = evan)’. The symbols ‘~’, ‘&’, ‘v’, and ‘→’ correspond to the English words ‘not’, ‘and’, ‘or’ and ‘if…then’, respectively. ‘∃’ is our symbol for an existential quantifier and ‘∀’ represents the universal quantifier. '∃vΨ' and '∀vΨ' correspond to ‘for some v, Ψ’ and ‘for all v, Ψ’, respectively. For every quantifier, its scope is the smallest part of the wff in which it is contained that is itself a wff. An occurrence of a variable v is a bound occurrence iff it is in the scope of some quantifier of the form '∃v' or the form '∀v', and is free otherwise. For example, the occurrence of ‘x’ is free in ‘Male(x)’ and in ‘∃y Married(y, x)’. The occurrences of ‘y’ in the second formula are bound because they are in the scope of the existential quantifier. A wff with at least one free variable is an open wff, and a closed formula is one with no free variables. A sentence is a closed wff. For example, ‘Female(kelly)’ and ‘∃y∃x Married(y, x)’ are sentences but ‘OlderThan(kelly, y)’ and ‘(∃x Male(x) & Female(z))’ are not. So, not all of the wffs of M are sentences. As noted below, this will affect our definition of truth for M.
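The distinction between free and bound occurrences, and hence between open wffs and sentences, can be computed recursively. The following is a minimal sketch under the assumption that wffs are encoded as nested tuples; the encoding and the function names `free_variables` and `is_sentence` are hypothetical and chosen only for illustration.

```python
def is_variable(term):
    """Variables of M: x, y, z, optionally followed by a numeral."""
    return term[:1] in ("x", "y", "z") and (term[1:] == "" or term[1:].isdigit())

def free_variables(f):
    """Return the set of variables with at least one free occurrence in f."""
    tag = f[0]
    if tag == "atom":                       # ("atom", predicate, terms)
        return {t for t in f[2] if is_variable(t)}
    if tag == "not":
        return free_variables(f[1])
    if tag in ("and", "or", "if"):
        return free_variables(f[1]) | free_variables(f[2])
    if tag in ("exists", "forall"):         # the quantifier binds its variable in its scope
        return free_variables(f[2]) - {f[1]}
    raise ValueError(f"not a wff tag: {tag}")

def is_sentence(f):
    """A sentence is a closed wff: one with no free variables."""
    return not free_variables(f)

married_yx = ("atom", "Married", ("y", "x"))
examples = {
    "Female(kelly)": ("atom", "Female", ("kelly",)),
    "∃y∃x Married(y, x)": ("exists", "y", ("exists", "x", married_yx)),
    "OlderThan(kelly, y)": ("atom", "OlderThan", ("kelly", "y")),
    "(∃x Male(x) & Female(z))": ("and", ("exists", "x", ("atom", "Male", ("x",))),
                                 ("atom", "Female", ("z",))),
}
for text, wff in examples.items():
    print(text, "is a sentence:", is_sentence(wff))
```

On the four examples from the text, the first two come out as sentences and the last two as open wffs, since ‘y’ and ‘z’ respectively occur free.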

b. Semantics for M

We now provide a semantics for M. This is done in two steps. First, we specify a domain of discourse, that is, the chunk of the world that our language M is about, and interpret M’s predicates and names in terms of the elements composing the domain. Then we state the conditions under which each type of M-sentence is true. To each of the above syntactic rules (1)-(7) there corresponds a semantic rule that stipulates the conditions in which the sentence constructed using the syntactic rule is true. The principle of bivalence is assumed and so ‘not true’ and ‘false’ are used interchangeably. In effect, the interpretation of M determines a truth-value (true, false) for each and every sentence of M.

Domain D—The McKeons: Matt, Beth, Shannon, Kelly, Paige, and Evan.

Here are the referents and extensions of the names and predicates of M.

Terms: ‘matt’ refers to Matt, ‘beth’ refers to Beth, ‘shannon’ refers to Shannon, etc.

Predicates. The meaning of a predicate is identified with its extension, that is, the set (possibly empty) of elements from the domain D the predicate is true of. The extension of a one-place predicate is a set of elements from D; the extension of a two-place predicate is a set of ordered pairs of elements from D.

The extension of ‘Male’ is {Matt, Evan}.

The extension of ‘Female’ is {Beth, Shannon, Kelly, Paige}.

The extension of ‘Parent’ is {<Matt, Shannon>, <Matt, Kelly>, <Matt, Paige>, <Matt, Evan>, <Beth, Shannon>, <Beth, Kelly>, <Beth, Paige>, <Beth, Evan>}.

The extension of ‘Married’ is {<Matt, Beth>, <Beth, Matt>}.

The extension of ‘Sister’ is {<Shannon, Kelly>, <Kelly, Shannon>, <Shannon, Paige>, <Paige, Shannon>, <Kelly, Paige>, <Paige, Kelly>, <Kelly, Evan>, <Paige, Evan>, <Shannon, Evan>}.

The extension of ‘Brother’ is {<Evan, Shannon>, <Evan, Kelly>, <Evan, Paige>}.

The extension of ‘OlderThan’ is {<Beth, Matt>, <Beth, Shannon>, <Beth, Kelly>, <Beth, Paige>, <Beth, Evan>, <Matt, Shannon>, <Matt, Kelly>, <Matt, Paige>, <Matt, Evan>, <Shannon, Kelly>, <Shannon, Paige>, <Shannon, Evan>, <Kelly, Paige>, <Kelly, Evan>, <Paige, Evan>}.

The extension of ‘Admires’ is {<Matt, Beth>, <Shannon, Matt>, <Shannon, Beth>, <Kelly, Beth>, <Kelly, Matt>, <Kelly, Shannon>, <Paige, Beth>, <Paige, Matt>, <Paige, Shannon>, <Paige, Kelly>, <Evan, Beth>, <Evan, Matt>, <Evan, Shannon>, <Evan, Kelly>, <Evan, Paige>}.

The extension of ‘=’ is {<Matt, Matt>, <Beth, Beth>, <Shannon, Shannon>, <Kelly, Kelly>, <Paige, Paige>, <Evan, Evan>}.

The atomic sentence ‘Female(kelly)’ is true because, as indicated above, the referent of ‘kelly’ is in the extension of the property designated by ‘Female’. The atomic sentence ‘Married(shannon, kelly)’ is false because the ordered pair <Shannon, Kelly> is not in the extension of the relation designated by ‘Married’.

(I) An atomic sentence with a one-place predicate is true iff the referent of the term is a member of the extension of the predicate, and an atomic sentence with a two-place predicate is true iff the ordered pair formed from the referents of the terms in order is a member of the extension of the predicate.
(II) '~α' is true iff α is false.
(III) '(α & β)' is true when both α and β are true; otherwise '(α & β)' is false.
(IV) '(α v β)' is true when at least one of α and β is true; otherwise '(α v β)' is false.
(V) '(α → β)' is true if and only if (iff) α is false or β is true. So, '(α → β)' is false just in case α is true and β is false.

The meanings for ‘~’ and ‘&’ roughly correspond to the meanings of ‘not’ and ‘and’ as ordinarily used. We call '~α' and '(α & β)' negation and conjunction formulas, respectively. The formula '(α v β)' is called a disjunction and the meaning of ‘v’ corresponds to inclusive or. There are a variety of conditionals in English (e.g., causal, counterfactual, logical), with each type having a distinct meaning. The conditional defined by (V) above is called the material conditional. One way of following (V) is to see that the truth conditions for '(α → β)' are the same as for '~(α & ~β)'.

By (II) ‘~Married(shannon, kelly)’ is true because, as noted above, ‘Married(shannon, kelly)’ is false. (II) also tells us that ‘~Female(kelly)’ is false since ‘Female(kelly)’ is true. According to (III), ‘(~Married(shannon, kelly) & Female(kelly))’ is true because ‘~Married(shannon, kelly)’ is true and ‘Female(kelly)’ is true. And ‘(Male(shannon) & Female(shannon))’ is false because ‘Male(shannon)’ is false. (IV) confirms that ‘(Female(kelly) v Married(evan, evan))’ is true because, even though ‘Married(evan, evan)’ is false, ‘Female(kelly)’ is true. From (V) we know that the sentence ‘(~(beth = beth) → Male(shannon))’ is true because ‘~(beth = beth)’ is false. If α is false then '(α → β)' is true regardless of whether or not β is true. The sentence ‘(Female(beth) → Male(shannon))’ is false because ‘Female(beth)’ is true and ‘Male(shannon)’ is false.
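Clauses (I)-(V) amount to a recursive evaluation procedure, which the following sketch spells out for quantifier-free sentences. It is an illustration under stated assumptions: sentences are encoded as nested tuples, only the predicates needed for the examples above are included, and the names `REFERENT`, `EXTENSION`, and `is_true` are hypothetical.

```python
# Referents of names and (a subset of) the extensions given above.
REFERENT = {"matt": "Matt", "beth": "Beth", "shannon": "Shannon",
            "kelly": "Kelly", "paige": "Paige", "evan": "Evan"}
DOMAIN = set(REFERENT.values())
EXTENSION = {
    "Male": {("Matt",), ("Evan",)},
    "Female": {("Beth",), ("Shannon",), ("Kelly",), ("Paige",)},
    "Married": {("Matt", "Beth"), ("Beth", "Matt")},
    "=": {(d, d) for d in DOMAIN},
}

def is_true(s):
    """Evaluate a quantifier-free M-sentence by clauses (I)-(V)."""
    tag = s[0]
    if tag == "atom":                                   # (I)
        _, pred, names = s
        return tuple(REFERENT[n] for n in names) in EXTENSION[pred]
    if tag == "not":                                    # (II)
        return not is_true(s[1])
    if tag == "and":                                    # (III)
        return is_true(s[1]) and is_true(s[2])
    if tag == "or":                                     # (IV), inclusive
        return is_true(s[1]) or is_true(s[2])
    if tag == "if":                                     # (V), material conditional
        return (not is_true(s[1])) or is_true(s[2])
    raise ValueError(f"unsupported sentence form: {tag}")

married_sk = ("atom", "Married", ("shannon", "kelly"))
female_k = ("atom", "Female", ("kelly",))
male_s = ("atom", "Male", ("shannon",))

print(is_true(("not", married_sk)))                                    # True
print(is_true(("and", male_s, ("atom", "Female", ("shannon",)))))      # False
print(is_true(("or", female_k, ("atom", "Married", ("evan", "evan")))))  # True
print(is_true(("if", ("not", ("atom", "=", ("beth", "beth"))), male_s)))  # True
print(is_true(("if", ("atom", "Female", ("beth",)), male_s)))          # False
```

The five printed values reproduce the verdicts reached in the paragraph above; bivalence is built in because every sentence evaluates to exactly one of `True` or `False`.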

Before describing the truth conditions for quantified sentences we need to say something about the notion of satisfaction. We’ve defined truth only for the formulas of M that are sentences. So, the notions of truth and falsity are not applicable to non-sentences such as ‘Male(x)’ and ‘((x = x) → Female(x))’ in which ‘x’ occurs free. However, objects may satisfy wffs that are non-sentences. We introduce the notion of satisfaction with some examples. An object satisfies ‘Male(x)’ just in case that object is male. Matt satisfies ‘Male(x)’, Beth does not. This is the case because replacing ‘x’ in ‘Male(x)’ with ‘matt’ yields a truth while replacing the variable with ‘beth’ yields a falsehood. An object satisfies ‘((x = x) → Female(x))’ if and only if it is either not identical with itself or is a female. Beth satisfies this wff (we get a truth when ‘beth’ is substituted for the variable in all of its occurrences), Matt does not (putting ‘matt’ in for ‘x’ wherever it occurs results in a falsehood). As a first approximation, we say that an object with a name, say ‘a’, satisfies a wff 'Ψv' in which at most v occurs free if and only if the sentence that results by replacing v in all of its occurrences with ‘a’ is true. ‘Male(x)’ is neither true nor false because it is not a sentence, but it is either satisfied or not by a given object. Now we define the truth conditions for quantifications, utilizing the notion of satisfaction. For a more detailed discussion of the notion of satisfaction, see the article, “Logical Consequence, Model-Theoretic Conceptions.”

Let Ψ be any formula of M in which at most v occurs free.

(VI) '∃vΨ' is true just in case there is at least one individual in the domain of quantification (e.g. at least one McKeon) that satisfies Ψ.
(VII) '∀vΨ' is true just in case every individual in the domain of quantification (e.g. every McKeon) satisfies Ψ.

Here are some examples. ‘∃x(Male(x) & Married(x, beth))’ is true because Matt satisfies ‘(Male(x) & Married(x, beth))’; replacing ‘x’ wherever it appears in the wff with ‘matt’ results in a true sentence. The sentence ‘∃xOlderThan(x, x)’ is false because no McKeon satisfies ‘OlderThan(x, x)’, that is, replacing ‘x’ in ‘OlderThan(x, x)’ with the name of a McKeon always yields a falsehood.

The universal quantification ‘∀x(OlderThan(x, paige) → Male(x))’ is false for there is a McKeon who doesn’t satisfy ‘(OlderThan(x, paige) → Male(x))’. For example, Shannon does not satisfy ‘(OlderThan(x, paige) → Male(x))’ because Shannon satisfies ‘OlderThan(x, paige)’ but not ‘Male(x)’. The sentence ‘∀x(x = x)’ is true because all McKeons satisfy ‘x = x’; replacing ‘x’ with the name of any McKeon results in a true sentence.

Note that in the explanation of satisfaction we suppose that an object satisfies a wff only if the object is named. But we don’t want to presuppose that all objects in the domain of discourse are named. For the purposes of an example, suppose that the McKeons adopt a baby boy, but haven’t named him yet. Then, ‘∃x Brother(x, evan)’ is true because the adopted child satisfies ‘Brother(x, evan)’, even though we can’t replace ‘x’ with the child’s name to get a truth. Getting around this is easy enough. We have added a list of names, ‘w1’, ‘w2’, ‘w3’, etc. to M, and we may say that any unnamed object satisfies 'Ψv' iff the replacement of v with a previously unused wi assigned as a name of this object results in a true sentence. In the above scenario, ‘∃xBrother(x, evan)’ is true because, ultimately, treating ‘w1’ as a temporary name of the child, ‘Brother(w1, evan)’ is true. Of course, the meanings of the predicates would have to be amended in order to reflect the addition of a new person to the domain of McKeons.
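The quantifier clauses (VI) and (VII) can be sketched as an extension of the recursive truth definition. Note one deliberate departure, labeled here as an assumption of the sketch: instead of substituting names (or temporary names such as ‘w1’) for variables, the code assigns domain objects directly to variables through an environment `env`, which sidesteps the question of unnamed objects. The tuple encoding and the names `satisfies`, `is_true`, and `AGE_ORDER` are hypothetical; the ‘OlderThan’ extension is generated from an oldest-first ordering that yields exactly the fifteen pairs listed above.

```python
from itertools import combinations

AGE_ORDER = ["Beth", "Matt", "Shannon", "Kelly", "Paige", "Evan"]  # oldest first
DOMAIN = set(AGE_ORDER)
REFERENT = {name.lower(): name for name in AGE_ORDER}
EXTENSION = {
    "Male": {("Matt",), ("Evan",)},
    "Married": {("Matt", "Beth"), ("Beth", "Matt")},
    "OlderThan": set(combinations(AGE_ORDER, 2)),  # <a, b> with a before b
    "=": {(d, d) for d in DOMAIN},
}

def satisfies(f, env):
    """Does the assignment env (variable -> object) satisfy wff f?"""
    tag = f[0]
    if tag == "atom":
        _, pred, terms = f
        objs = tuple(env[t] if t in env else REFERENT[t] for t in terms)
        return objs in EXTENSION[pred]
    if tag == "not":
        return not satisfies(f[1], env)
    if tag == "and":
        return satisfies(f[1], env) and satisfies(f[2], env)
    if tag == "or":
        return satisfies(f[1], env) or satisfies(f[2], env)
    if tag == "if":
        return (not satisfies(f[1], env)) or satisfies(f[2], env)
    if tag == "exists":                                  # (VI)
        return any(satisfies(f[2], {**env, f[1]: d}) for d in DOMAIN)
    if tag == "forall":                                  # (VII)
        return all(satisfies(f[2], {**env, f[1]: d}) for d in DOMAIN)
    raise ValueError(f"unknown wff tag: {tag}")

def is_true(sentence):
    """A sentence is true iff it is satisfied under the empty assignment."""
    return satisfies(sentence, {})

male_x = ("atom", "Male", ("x",))
s1 = ("exists", "x", ("and", male_x, ("atom", "Married", ("x", "beth"))))
s2 = ("exists", "x", ("atom", "OlderThan", ("x", "x")))
s3 = ("forall", "x", ("if", ("atom", "OlderThan", ("x", "paige")), male_x))
s4 = ("forall", "x", ("atom", "=", ("x", "x")))
print(is_true(s1), is_true(s2), is_true(s3), is_true(s4))  # True False False True
```

Because the domain is finite, `any` and `all` can simply iterate over it; this is a feature of the McKeon interpretation, not of first-order semantics in general.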

3. What is a Logic?

We have characterized an interpreted formal language M by defining what qualifies as a sentence of M and by specifying the conditions under which any M-sentence is true. The received view of logical consequence entails that the logical consequence relation in M turns on the nature of the logical constants in the relevant M-sentences. We shall regard just the sentential connectives, the quantifiers of M, and the identity predicate as logical constants (the language M is a first-order language). For discussion of the notion of a logical constant see Logical Consequence, Philosophical Considerations and Logical Consequence, Model-Theoretic Conceptions. Intuitively, one M-sentence is a logical consequence of a set of M-sentences if and only if it is impossible for all the sentences in the set to be true without the former sentence being true as well. A model-theoretic conception of logical consequence in M clarifies this intuitive characterization of logical consequence by appealing to the semantic properties of the logical constants, represented in the above truth clauses (I)-(VII). The entry Logical Consequence, Model-Theoretic Conceptions formalizes the account of truth in language M and gives a model-theoretic characterization of logical consequence in M. In contrast to the model-theoretic conception, the deductive-theoretic conception clarifies logical consequence, conceived of in terms of deducibility, by appealing to the inferential properties of logical constants portrayed as intuitively valid principles of inference, that is, principles justifying steps in deductions. See Logical Consequence, Philosophical Considerations for discussion of the relationship between the logical consequence relation and the model-theoretic and deductive-theoretic conceptions of it.

Deductive system N’s inference rules, introduced below, are introduction and elimination rules, defined for each logical constant of our language M. An introduction rule introduces a logical constant into a proof and is useful for deriving a sentence that contains the constant. An elimination rule for the constant makes it possible to derive a sentence that has at least one less occurrence of the logical constant. Elimination rules are useful for deriving a sentence from another in which the constant appears.

Following Shapiro (1991, p. 3), we define a logic to be a language L plus either a model-theoretic or a deductive-theoretic account of logical consequence. A language with both characterizations is a full logic just in case both characterizations coincide. For discussion on the relationship between the model-theoretic and deductive-theoretic accounts of logical consequence, see Logical Consequence, Philosophical Considerations. The logic for M developed below may be viewed as a classical logic or a first-order theory.

4. Deductive System N

In stating N’s rules, we begin with the simpler inference rules and give a sample formal deduction of them in action. Then we turn to the inference rules that employ what we shall call sub-proofs. In the statement of the rules, we let P and Q be any sentences from our language M. We shall number each line of a formal deduction with a positive integer. We let k, l, m, n, o, p and q be any positive integers such that k < m, and l < m, and m < n < o < p < q.

&-Intro

k. P
l. Q
m. (P & Q) &-Intro: k, l

&-Elim

k. (P & Q)
m. P &-Elim: k

k. (P & Q)
m. Q &-Elim: k

&-Intro allows us to derive a conjunction from both of its two parts (called conjuncts). According to the &-Elim rule we may derive a conjunct from a conjunction. To the right of the sentence derived using an inference rule is the justification. Steps in a proof are justified by identifying both the lines in the proof used and by citing the appropriate rule. The vertical lines serve as proof margins, which, as you will shortly see, help in portraying the structure of a proof when it contains embedded sub-proofs.

~-Elim

k. ~~P
m. P ~-Elim: k

The ~-Elim rule allows us to drop double negations and infer what was subject to the two negations.

v-Intro

k. P
m. (P v Q) v-Intro: k

k. P
m. (Q v P) v-Intro: k

By v-Intro we may derive a disjunction from one of its parts (called disjuncts).

→-Elim

k. (P → Q)
l. P
m. Q →-Elim: k, l

The →-Elim rule corresponds to the principle of inference called modus ponens: from a conditional and its antecedent one may infer the consequent.

Here’s a sample deduction using the above inference rules. The formal deduction (the sequence of sentences 4-11) is associated with the pair

<{(Female(paige) & Female(kelly)), (Female(paige) → ~~Sister(paige, kelly)), (Female(kelly) → ~~Sister(paige, shannon))}, ((Sister(paige, kelly) & Sister(paige, shannon)) v Male(evan))>.

The first element is the set of basis sentences and the second element is the conclusion. We number the basis sentences and list them (beginning with 1) ahead of the deduction. The deduction ends with the conclusion.

1. (Female(paige) & Female(kelly)) Basis
2. (Female(paige) → ~~Sister(paige, kelly)) Basis
3. (Female(kelly) → ~~Sister(paige, shannon)) Basis
4. Female(paige) &-Elim: 1
5. Female(kelly) &-Elim: 1
6. ~~Sister(paige, kelly) →-Elim: 2, 4
7. Sister(paige, kelly) ~-Elim: 6
8. ~~Sister(paige, shannon) →-Elim: 3, 5
9. Sister(paige, shannon) ~-Elim: 8
10. (Sister(paige, kelly) & Sister(paige, shannon)) &-Intro: 7, 9
11. ((Sister(paige, kelly) & Sister(paige, shannon)) v Male(evan)) v-Intro: 10

Again, the column all the way to the right gives the explanations for each line of the proof. Assuming the adequacy of N, the formal deduction establishes that the following inference is correct.

(Female(paige) & Female(kelly))
(Female(paige) → ~~Sister(paige, kelly))
(Female(kelly) → ~~Sister(paige, shannon))


(therefore) ((Sister(paige, kelly) & Sister(paige, shannon)) v Male(evan))
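The line-by-line justification of the sample deduction can be checked mechanically. The following is a minimal sketch in Python, not part of system N itself: the tuple encoding of sentences and the names `check_step`, `check_deduction`, and `sample` are assumptions introduced here for illustration. It covers only the rules stated so far (&-Intro, &-Elim, ~-Elim, v-Intro, →-Elim) and treats every basis sentence as needing no justification.

```python
# Formulas: an atomic sentence is a string; compounds are tagged tuples.
def And(a, b): return ("and", a, b)
def Or(a, b): return ("or", a, b)
def Not(a): return ("not", a)
def Imp(a, b): return ("imp", a, b)

def is_op(f, op):
    return isinstance(f, tuple) and f[0] == op

# Number of earlier lines each rule must cite.
ARITY = {"Basis": 0, "&-Intro": 2, "&-Elim": 1, "~-Elim": 1,
         "v-Intro": 1, "→-Elim": 2}

def check_step(formula, rule, cited):
    """True if `formula` follows from the cited sentences by `rule`."""
    if rule == "Basis":
        return True
    if rule == "&-Intro":
        return formula == And(cited[0], cited[1])
    if rule == "&-Elim":
        c = cited[0]
        return is_op(c, "and") and formula in (c[1], c[2])
    if rule == "~-Elim":
        return cited[0] == Not(Not(formula))
    if rule == "v-Intro":
        return is_op(formula, "or") and cited[0] in (formula[1], formula[2])
    if rule == "→-Elim":
        a, b = cited  # accept the conditional and antecedent in either order
        return a == Imp(b, formula) or b == Imp(a, formula)
    return False

def check_deduction(steps):
    """steps: list of (formula, rule, cited line numbers), numbered from 1."""
    lines = {}
    for number, (formula, rule, refs) in enumerate(steps, start=1):
        if rule not in ARITY or len(refs) != ARITY[rule]:
            return False
        if any(r not in lines for r in refs):  # only earlier lines may be cited
            return False
        if not check_step(formula, rule, [lines[r] for r in refs]):
            return False
        lines[number] = formula
    return True

Fp, Fk, Me = "Female(paige)", "Female(kelly)", "Male(evan)"
Spk, Sps = "Sister(paige, kelly)", "Sister(paige, shannon)"
sample = [
    (And(Fp, Fk), "Basis", ()),
    (Imp(Fp, Not(Not(Spk))), "Basis", ()),
    (Imp(Fk, Not(Not(Sps))), "Basis", ()),
    (Fp, "&-Elim", (1,)),
    (Fk, "&-Elim", (1,)),
    (Not(Not(Spk)), "→-Elim", (2, 4)),
    (Spk, "~-Elim", (6,)),
    (Not(Not(Sps)), "→-Elim", (3, 5)),
    (Sps, "~-Elim", (8,)),
    (And(Spk, Sps), "&-Intro", (7, 9)),
    (Or(And(Spk, Sps), Me), "v-Intro", (10,)),
]
print(check_deduction(sample))
```

Changing any justification to a rule that does not license the step (for instance, citing &-Elim at line 7) makes `check_deduction` reject the sequence, which mirrors the requirement that each step identify both the lines used and the appropriate rule.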

For convenience in building proofs, we expand M to include ‘⊥’, which we use as a symbol for a contradiction (e.g., ‘(Female(beth) & ~Female(beth))’).

⊥-Intro

k. P
l. ~P
m. ⊥ ⊥-Intro: k, l

⊥-Elim

k. ⊥
m. P ⊥-Elim: k

If we have derived a sentence and its negation we may derive ⊥ using ⊥-Intro. The ⊥-Elim rule represents the idea that any sentence P is deducible from a contradiction. So, from ⊥ we may derive any sentence P using ⊥-Elim.

Here’s a deduction using the two rules.

1. (Parent(beth, evan) & ~Parent(beth, evan)) Basis
2. Parent(beth, evan) &-Elim: 1
3. ~Parent(beth, evan) &-Elim: 1
4. ⊥ ⊥-Intro: 2, 3
5. Parent(beth, shannon) ⊥-Elim: 4

For convenience, we introduce a reiteration rule that allows us to repeat steps in a proof as needed.

Reit

k. P
.
.
.
m. P Reit: k

We now turn to the rules for the sentential connectives that employ what we shall call sub-proofs. Consider the following inference.

1. ~(Married(shannon, kelly) & OlderThan(shannon, kelly))
2. Married(shannon, kelly)


(therefore) ~OlderThan(shannon, kelly)

Here is an informal deduction of the conclusion from the basis sentences.

Proof: Suppose that ‘OlderThan(shannon, kelly)’ is true. Then, from this assumption and basis sentence 2, it follows that ‘((Shannon is married to Kelly) & (Shannon is older than Kelly))’ is true. But this contradicts the first basis sentence ‘~((Shannon is married to Kelly) & (Shannon is older than Kelly))’, which is true by hypothesis. Hence our initial supposition is false. We have derived that ‘~(Shannon is older than Kelly)’ is true.

Such a proof is called a reductio ad absurdum proof (or reductio for short). Reductio ad absurdum is Latin for ‘reduction to the absurd’. (For more information, see the article “Reductio ad absurdum”.) In order to model this proof in N we introduce the ~-Intro rule.

~-Intro

k. P Assumption
.
.
.
m. ⊥
n. ~P ~-Intro: k-m

The ~-Intro rule allows us to infer the negation of an assumption if we have derived a contradiction, symbolized by ‘⊥’, from the assumption. The indented proof margin (k-m) signifies a sub-proof. In a sub-proof the first line is always an assumption (and so requires no justification), which is cancelled when the sub-proof is ended and we are back out on a line that sits on a wider proof margin. The effect of this is that we can no longer appeal to any of the lines in the sub-proof to generate later lines on wider proof margins. No deduction ends in the middle of a sub-proof.

Here is a formal analogue of the above informal reductio.

1. ~(Married(shannon, kelly) & OlderThan(shannon, kelly)) Basis
2. Married(shannon, kelly) Basis
3. OlderThan(shannon, kelly) Assumption
4. (Married(shannon, kelly) & OlderThan(shannon, kelly)) &-Intro: 2, 3
5. ⊥ ⊥-Intro: 1, 4
6. ~OlderThan(shannon, kelly) ~-Intro: 3-5

We signify a sub-proof with the indented proof margin line; the start and finish of a sub-proof are indicated by the start and break of the indented proof margin. An assumption, like a basis sentence, is a supposition taken to be true for the purposes of the deduction. The difference is that whereas a basis sentence may be used at any step in a proof, an assumption may only be used to make a step within the sub-proof it heads. At the end of the sub-proof, the assumption is discharged. We now look at more sub-proofs in action and introduce another of N’s inference rules. Consider the following inference.

1. (Male(kelly) v Female(kelly))
2. (Male(kelly) → ~Sister(kelly, paige))
3. (Female(kelly) → ~Brother(kelly, evan))


(therefore) (~Sister(kelly, paige) v ~Brother(kelly, evan))

Informal Proof:

By assumption ‘(Male(kelly) v Female(kelly))’ is true, that is, by assumption at least one of the disjuncts is true.

Suppose that ‘Male(kelly)’ is true. Then by modus ponens we may derive that ‘~Sister(kelly, paige)’ is true from this assumption and the basis sentence 2. Then ‘(~Sister(kelly, paige) v ~Brother(kelly, evan))’ is true.

Suppose that ‘Female(kelly)’ is true. Then by modus ponens we may derive that ‘~Brother(kelly, evan)’ is true from this assumption and the basis sentence 3. Then ‘(~Sister(kelly, paige) v ~Brother(kelly, evan))’ is true.

So in either case we have derived that ‘(~Sister(kelly, paige) v ~Brother(kelly, evan))’ is true. Thus we have shown that this sentence is a deductive consequence of the basis sentences.

We model this proof in N using the v-Elim rule.

v-Elim

k. (P v Q)
m. P Assumption
.
.
.
n. R
o. Q Assumption
.
.
.
p. R
q. R v-Elim: k, m-n, o-p

The v-Elim rule allows us to derive a sentence from a disjunction by deriving it from each disjunct, possibly using sentences on earlier lines that sit on wider proof margins.

The following formal proof models the above informal one.

1. (Male(kelly) v Female(kelly)) Basis
2. (Male(kelly) → ~Sister(kelly, paige)) Basis
3. (Female(kelly) → ~Brother(kelly, evan)) Basis
4. Male(kelly) Assumption
5. ~Sister(kelly, paige) →-Elim: 2, 4
6. (~Sister(kelly, paige) v ~Brother(kelly, evan)) v-Intro: 5
7. Female(kelly) Assumption
8. ~Brother(kelly, evan) →-Elim: 3, 7
9. (~Sister(kelly, paige) v ~Brother(kelly, evan)) v-Intro: 8
10. (~Sister(kelly, paige) v ~Brother(kelly, evan)) v-Elim: 1, 4-6, 7-9

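The v-Elim rule, together with the ⊥ rules and Reit, also yields a deduction of Q from ‘(P v Q)’ and ‘~P’ (the principle of the disjunctive syllogism, discussed below).

1. (P v Q) Basis
2. ~P Basis
3. P Assumption
4. ⊥ ⊥-Intro: 2, 3
5. Q ⊥-Elim: 4
6. Q Assumption
7. Q Reit: 6
8. Q v-Elim: 1, 3-5, 6-7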

Now we introduce the →-Intro rule by considering the following inference.

1. (OlderThan(shannon, kelly) → OlderThan(shannon, paige))
2. (OlderThan(shannon, paige) → OlderThan(shannon, evan))


(therefore) (OlderThan(shannon, kelly) → OlderThan(shannon, evan))

Informal proof:

Suppose that OlderThan(shannon, kelly). From this assumption and basis sentence 1 we may derive, by modus ponens, that OlderThan(shannon, paige). From this and basis sentence 2 we get, again by modus ponens, that OlderThan(shannon, evan). Hence, if OlderThan(shannon, kelly), then OlderThan(shannon, evan).

The structure of this proof is that of a conditional proof: a deduction of a conditional from a set of basis sentences that starts with the assumption of the antecedent, proceeds to a derivation of the consequent, and concludes with the conditional. To build conditional proofs in N, we rely on the →-Intro rule.

→-Intro

k. P Assumption
.
.
.
m. Q
n. (P → Q) →-Intro: k-m

According to the →-Intro rule we may derive a conditional if we derive the consequent Q from the assumption of the antecedent P, and, perhaps, other sentences occurring earlier in the proof on wider proof margins. Again, such a proof is called a conditional proof.

We model the above informal conditional proof in N as follows.

1. (OlderThan(shannon, kelly) → OlderThan(shannon, paige)) Basis
2. (OlderThan(shannon, paige) → OlderThan(shannon, evan)) Basis
3. OlderThan(shannon, kelly) Assumption
4. OlderThan(shannon, paige) →-Elim: 1, 3
5. OlderThan(shannon, evan) →-Elim: 2, 4
6. (OlderThan(shannon, kelly) → OlderThan(shannon, evan)) →-Intro: 3-5
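The bookkeeping behind sub-proofs (which lines may be cited, and when an assumption is discharged) can be made explicit in code. The sketch below is an illustration under stated assumptions rather than a definition of N: each line carries a hypothetical `depth` field standing in for its proof margin (0 for the widest margin), only Basis, Assumption, Reit, →-Elim and →-Intro are handled, and the conditional is required to sit on the line immediately after its sub-proof. The names `accessible`, `closes_subproof`, and `check` are invented here.

```python
def Imp(a, b):
    return ("imp", a, b)

# A proof is a list of lines (formula, depth, rule, refs), numbered from 1.

def accessible(proof, r, n):
    """Line r may be cited at line n only if r's sub-proof is still open at n."""
    if not 1 <= r < n:
        return False
    depth_r = proof[r - 1][1]
    for j in range(r + 1, n + 1):
        _, depth, rule, _ = proof[j - 1]
        if depth < depth_r:
            return False  # the sub-proof containing r has ended
        if depth == depth_r and rule == "Assumption":
            return False  # a new sub-proof began on the same margin
    return True

def closes_subproof(proof, n, refs, formula):
    """Check →-Intro at line n citing the sub-proof k-m."""
    if len(refs) != 2:
        return False
    k, m = refs
    if not (1 <= k <= m == n - 1):
        return False
    d = proof[n - 1][1]
    first, last = proof[k - 1], proof[m - 1]
    if first[2] != "Assumption" or first[1] != d + 1 or last[1] != d + 1:
        return False
    for j in range(k + 1, m + 1):
        _, depth, rule, _ = proof[j - 1]
        if depth < d + 1 or (depth == d + 1 and rule == "Assumption"):
            return False
    return formula == Imp(first[0], last[0])

def check(proof):
    prev_depth = 0
    for n, (formula, depth, rule, refs) in enumerate(proof, start=1):
        if rule == "Basis":
            ok = depth == 0 and prev_depth == 0
        elif rule == "Assumption":
            ok = 1 <= depth <= prev_depth + 1
        elif rule == "Reit":
            ok = (depth == prev_depth and len(refs) == 1
                  and accessible(proof, refs[0], n)
                  and proof[refs[0] - 1][0] == formula)
        elif rule == "→-Elim":
            ok = (depth == prev_depth and len(refs) == 2
                  and all(accessible(proof, r, n) for r in refs))
            if ok:
                a, b = (proof[r - 1][0] for r in refs)
                ok = a == Imp(b, formula) or b == Imp(a, formula)
        elif rule == "→-Intro":
            ok = depth == prev_depth - 1 and closes_subproof(proof, n, refs, formula)
        else:
            ok = False
        if not ok:
            return False
        prev_depth = depth
    return prev_depth == 0  # no deduction ends in the middle of a sub-proof

sk, sp, se = ("OlderThan(shannon, kelly)", "OlderThan(shannon, paige)",
              "OlderThan(shannon, evan)")
conditional_proof = [
    (Imp(sk, sp), 0, "Basis", ()),
    (Imp(sp, se), 0, "Basis", ()),
    (sk, 1, "Assumption", ()),
    (sp, 1, "→-Elim", (1, 3)),
    (se, 1, "→-Elim", (2, 4)),
    (Imp(sk, se), 0, "→-Intro", (3, 5)),
]
print(check(conditional_proof))
```

Appending a seventh line on the widest margin that cites line 4 is rejected by `accessible`, since line 4 sits inside a sub-proof whose assumption has already been discharged.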

Mastery of a deductive system facilitates the discovery of proof pathways in hard cases and increases one’s efficiency in communicating proofs to others and explaining why a sentence is a logical consequence of others. For example, suppose that (1) if Beth is not Paige’s parent, then it is false that if Beth is a parent of Shannon, Shannon and Paige are sisters. Further suppose (2) that Beth is not Shannon’s parent. Then we may conclude that Beth is Paige’s parent. Of course, knowing the type of sentences involved is helpful for then we have a clearer idea of the inference principles that may be involved in deducing that Beth is a parent of Paige. Accordingly, we represent the two basis sentences and the conclusion in M, and then give a formal proof of the latter from the former.

1. (~Parent(beth, paige) → ~(Parent(beth, shannon) → Sister(shannon, paige))) Basis
2. ~Parent(beth, shannon) Basis
3. ~Parent(beth, paige) Assumption
4. ~(Parent(beth, shannon) → Sister(shannon, paige)) →-Elim: 1, 3
5. Parent(beth, shannon) Assumption
6. ⊥ ⊥-Intro: 2, 5
7. Sister(shannon, paige) ⊥-Elim: 6
8. (Parent(beth, shannon) → Sister(shannon, paige)) →-Intro: 5-7
9. ⊥ ⊥-Intro: 4, 8
10. ~~Parent(beth, paige) ~-Intro: 3-9
11. Parent(beth, paige) ~-Elim: 10

Because we derived a contradiction at line 9, we got ‘~~Parent(beth, paige)’ at line 10, using ~-Intro, and then we derived ‘Parent(beth, paige)’ by ~-Elim. Look at the conditional proof (lines 5-7) from which we derived line 8. Pretty neat, huh? Lines 2 and 5 generated the contradiction from which we derived ‘Sister(shannon, paige)’ at line 7 in order to get the conditional at line 8. This is our first example of a sub-proof (5-7) embedded in another sub-proof (3-9). It is unlikely that independent of the resources of a deductive system, a reasoner would be able to readily build the informal analogue of this pathway from the basis sentences to the sentence at line 11. Again, mastery of a deductive system such as N can increase the efficiency of our performances of rigorous reasoning and cultivate skill at producing elegant proofs (proofs that take the least number of steps to get from the basis to the conclusion).

We now introduce the Intro and Elim rules for the identity symbol and the quantifiers. Let n and n’ be any names, and 'Ωn' and 'Ωn’ ' be any well-formed formulas in which n and n’ appear and that have no free variables.

=-Intro

k. (n = n) =-Intro

=-Elim

k. Ωn
l. (n = n’ )
m. Ωn’ =-Elim: k, l

The =-Intro rule allows us to introduce '(n = n)' at any step in a proof. Since '(n = n)' is deducible from any sentence, there is no need to identify the lines from which line k is derived. In effect, the =-Intro rule confirms that ‘(paige = paige)’, ‘(shannon = shannon)’, ‘(kelly = kelly)’, etc., may be inferred from any sentence(s). The =-Elim rule tells us that if we have proven 'Ωn' and '(n = n’ )', then we may derive 'Ωn’ ', which is obtained from 'Ωn' by replacing n with n’ in some but possibly not all occurrences. The =-Elim rule represents the principle known as the indiscernibility of identicals, which says that if '(n = n’ )' is true, then whatever is true of the referent of n is true of the referent of n’. This principle grounds the following inference.

1. ~Sister(beth, kelly)
2. (beth = shannon)


(therefore) ~Sister(shannon, kelly)

The indiscernibility of identicals is fairly obvious. If I know that Beth isn’t Kelly’s sister and that Beth is Shannon (perhaps ‘Shannon’ is an alias) then this establishes, with the help of the indiscernibility of identicals, that Shannon isn’t Kelly’s sister. Now we turn to the quantifier rules.

Let 'Ωv' be a formula in which v is the only free variable, and let n be any name.

∃-Intro

k. Ωn
m. ∃vΩv ∃-Intro: k

∃-Elim

k. ∃vΩv
[n] m. Ωn Assumption
.
.
.
n. P
o. P ∃-Elim: k, m-n

Here, n must be unique to the subproof, that is, the name n doesn’t occur on any line above line m or below line n.

The ∃-Intro rule, which represents the principle of inference known as existential generalization, tells us that if we have proven 'Ωn', then we may derive '∃vΩv', which results from 'Ωn' by replacing n with a variable v in some but possibly not all of its occurrences and prefixing the existential quantifier. According to this rule, we may infer, say, ‘∃x Married(x, matt)’ from the sentence ‘Married(beth, matt)’. By the ∃-Elim rule, we may reason from a sentence that is produced from an existential quantification by stripping the quantifier and replacing the resulting free variable in all of its occurrences by a name which is new to the proof. Recall that the language M has an infinite number of constants, and the name introduced by the ∃-Elim rule may be one of the wi. We regard the assumption at line m, which starts the embedded sub-proof, as saying “Suppose n names an arbitrary individual from the domain of discourse such that 'Ωn' is true.”

To illustrate the basic idea behind the ∃-Elim rule, if I tell you that Shannon admires some McKeon, you can’t infer that Shannon admires any particular McKeon such as Matt, Beth, Shannon, Kelly, Paige, or Evan. Nevertheless we have it that she admires somebody. The principle of inference corresponding to the ∃-Elim rule, called existential instantiation, allows us to assign this ‘somebody’ an arbitrary name new to the proof, say, ‘w1’, and reason within the relevant sub-proof from ‘Shannon admires w1’. Then we cancel the assumption and infer a sentence that doesn’t make any claims about w1. For example, suppose that (1) Shannon admires some McKeon. Let’s call this McKeon ‘w1’, that is, assume (2) that Shannon admires a McKeon named ‘w1’. By the principle of inference corresponding to v-Intro we may derive (3) that Shannon admires w1 or w1 admires Kelly. From (3), we may infer by existential generalization (4) that for some McKeon x, Shannon admires x or x admires Kelly. We now cancel the assumption (that is, cancel (2)) by concluding (5) that for some McKeon x, Shannon admires x or x admires Kelly from (1) and the subproof (2)-(4), by existential instantiation. Here is the above reasoning set out formally.

1. ∃x Admires(shannon, x) Basis
[w1] 2. Admires(shannon, w1) Assumption
3. (Admires(shannon, w1) v Admires(w1, kelly)) v-Intro: 2
4. ∃x(Admires(shannon, x) v Admires(x, kelly)) ∃-Intro: 3
5. ∃x(Admires(shannon, x) v Admires(x, kelly)) ∃-Elim: 1, 2-4

The string at the assumption of the sub-proof (line 2) says “Suppose that ‘w1’ names an arbitrary McKeon such that ‘Admires(shannon, w1)’ is true.” This is not a sentence of M, but of the meta-language for M, that is, the language used to talk about M. Hence, the ∃-Elim rule (as well as the ∀-Intro rule introduced below) has a meta-linguistic character.

∀-Intro

[n] k. Assumption
.
.
.
m. Ωn
n. ∀vΩv ∀-Intro: k-m
n must be unique to the subproof

∀-Elim

k. ∀vΩv
m. Ωn ∀-Elim: k

The ∀-Elim rule corresponds to the principle of inference known as universal instantiation: to infer that something holds for an individual of the domain if it holds for the entire domain. The ∀-Intro rule allows us to derive a claim that holds for the entire domain of discourse from a proof that the claim holds for an arbitrary selected individual from the domain. The assumption at line k reads in English “Suppose n names an arbitrarily selected individual from the domain of discourse.” As with the ∃-Elim rule, the name introduced by the ∀-Intro rule may be one of the wi. The ∀-Intro rule corresponds to the principle of inference often called universal generalization.

For example, suppose that we are told that (1) if a McKeon admires Paige, then that McKeon admires himself/herself, and that (2) every McKeon admires Paige. To show that we may correctly infer that every McKeon admires himself/herself we appeal to the principle of universal generalization, which (again) is represented in N by the ∀-Intro rule. We begin by assuming that (3) a McKeon is named ‘w1’. All we assume about w1 is that w1 is one of the McKeons. From (2), we infer that (4) w1 admires Paige. We know from (1), using the principle of universal instantiation (the ∀-Elim rule in N), that (5) if w1 admires Paige then w1 admires w1. From (4) and (5) we may infer that (6) w1 admires w1 by modus ponens. Since w1 is an arbitrarily selected individual (and so what holds for w1 holds for all McKeons) we may conclude from (3)-(6), by universal generalization, that (7) every McKeon admires himself/herself. This reasoning is represented by the following formal proof.

1. ∀x(Admires(x, paige) → Admires(x, x)) Basis
2. ∀x Admires(x, paige) Basis
[w1] 3. Assumption
4. Admires(w1, paige) ∀-Elim: 2
5. (Admires(w1, paige) → Admires(w1, w1)) ∀-Elim: 1
6. Admires(w1, w1) →-Elim: 4, 5
7. ∀x Admires(x, x) ∀-Intro: 3-6

Line 3, the assumption of the sub-proof, corresponds to the English sentence “Let ‘w1’ refer to an arbitrary McKeon.” The notion of a name referring to an arbitrary individual from the domain of discourse, utilized by both the ∀-Intro and ∃-Elim rules in the assumptions that start the respective sub-proofs, incorporates two distinct ideas. One, relevant to the ∃-Elim rule, means “some specific object, but I don’t know which”, while the other, relevant to the ∀-Intro rule, means “any object, it doesn’t matter which”. (See Pelletier 1999, pp. 118-120 for discussion.)

Consider:

K = {All McKeons admire those who admire somebody, Some McKeon admires a McKeon}
X = Paige admires Paige

Here’s a proof that X is deducible from K.

1. ∀x(∃y Admires(x, y) → ∀z Admires(z, x)) Basis
2. ∃x∃y Admires(x, y) Basis
[w1] 3. ∃y Admires(w1, y) Assumption
4. (∃y Admires(w1, y) → ∀z Admires(z, w1)) ∀-Elim: 1
5. ∀z Admires(z, w1) →-Elim: 3, 4
6. Admires(paige, w1) ∀-Elim: 5
7. ∃y Admires(paige, y) ∃-Intro: 6
8. (∃y Admires(paige, y) → ∀z Admires(z, paige)) ∀-Elim: 1
9. ∀z Admires(z, paige) →-Elim: 7, 8
10. Admires(paige, paige) ∀-Elim: 9
11. Admires(paige, paige) ∃-Elim: 2, 3-10

An informal correlate, put somewhat succinctly, runs as follows.

Let’s call the unnamed admirer, mentioned in (2), w1. From this and (1), every McKeon admires w1 and so Paige admires w1. Hence, Paige admires somebody. From this and (1) it follows that everybody admires Paige. So, Paige admires Paige. This is our desired conclusion.

Even though the informal proof skips steps and doesn’t mention by name the principles of inference used, the formal proof guides its construction.

5. The Status of the Deductive Characterization of Logical Consequence in Terms of N

We began the article by presenting the deductive-theoretic characterization of logical consequence: X is a logical consequence of a set K of sentences if and only if X is deducible from K, that is, there is a deduction of X from K. To make it official, we now characterize the deductive consequence relation in M in terms of deducibility in N.

X is a deductive consequence of K if and only if K ⊢N X, that is, X is deducible in N from K

We now inquire into the status of this characterization of deductive consequence.

The first thing to note is that deductive system N is complete and sound with respect to the model-theoretic consequence relation defined in Logical Consequence, Model-Theoretic Conceptions: Section 4.4. Let

K ⊢N X

abbreviate

X is deducible in N from K

Similarly, let

K ⊨ X

abbreviate

X is a model-theoretic consequence of K, that is, every M-structure that is a model of K is also a model of X. (For more information on structures and models, see Logical Consequence, Model-Theoretic Conceptions.)

The completeness and soundness of N means that for any set K of M sentences and M-sentence X, K ⊢N X if and only if K ⊨ X. A soundness proof establishes K ⊢N X only if K ⊨ X, and a completeness proof establishes K ⊢N X if K ⊨ X. So, the ⊢N and ⊨ relations, defined on sentences of M, are extensionally equivalent. The question arises: which characterization of the logical consequence relation is more basic or fundamental?
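For the sentential fragment of M, the ⊨ side of this equivalence can be checked by brute force. The sketch below is a simplification and an assumption of this illustration: it treats atomic sentences as independent truth-value bearers and enumerates truth assignments rather than full M-structures, so it says nothing about identity or the quantifiers. The function names `atoms`, `value`, and `entails` are invented here. Given soundness, any purely sentential inference deducible in N should pass the check.

```python
from itertools import product

def Not(a): return ("not", a)
def Or(a, b): return ("or", a, b)
def Imp(a, b): return ("imp", a, b)

def atoms(f):
    """Collect the atomic sentences (strings) occurring in a formula."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(sub) for sub in f[1:]))

def value(f, row):
    """Truth value of f under the assignment `row` (atom -> bool)."""
    if isinstance(f, str):
        return row[f]
    op = f[0]
    if op == "not":
        return not value(f[1], row)
    if op == "and":
        return value(f[1], row) and value(f[2], row)
    if op == "or":
        return value(f[1], row) or value(f[2], row)
    if op == "imp":
        return (not value(f[1], row)) or value(f[2], row)
    raise ValueError(f"unknown connective: {op}")

def entails(K, X):
    """True if every assignment making all of K true also makes X true."""
    names = sorted(set().union(atoms(X), *(atoms(k) for k in K)))
    for bits in product([False, True], repeat=len(names)):
        row = dict(zip(names, bits))
        if all(value(k, row) for k in K) and not value(X, row):
            return False  # found a counter-assignment
    return True

# Disjunctive syllogism, which has a deduction in N.
print(entails([Or("P", "Q"), Not("P")], "Q"))

# The Beth/Paige inference from Section 4.
pbp, pbs, ssp = "Parent(beth, paige)", "Parent(beth, shannon)", "Sister(shannon, paige)"
print(entails([Imp(Not(pbp), Not(Imp(pbs, ssp))), Not(pbs)], pbp))

# Affirming the consequent has a counter-assignment.
print(entails([Imp("P", "Q"), "Q"], "P"))
```

The first two checks succeed and the third fails, which is the pattern soundness and completeness predict for inferences that do, and do not, have deductions in N.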

a. Tarski’s argument that the model-theoretic characterization of logical consequence is more basic than its characterization in terms of a deductive system

The first thing to note is that the ⊢N-consequence relation is compact. For any deductive system D and pair <K, X> there is a K’ such that K ⊢D X if and only if K’ ⊢D X, where K’ is a finite subset of sentences from K. As pointed out by Tarski (1936), among others, there are intuitively correct principles of inference reflected in certain languages according to which one may infer a sentence X from a set K of sentences, even though it is incorrect to infer X from any finite subset of K. Here’s a rendition of his reasoning, focusing on the ⊢N-consequence relation defined on a language for arithmetic, which allows us to talk about the natural numbers 0, 1, 2, 3, and so on. Let ‘P’ be a predicate defined over the domain of natural numbers and let ‘NatNum(x)’ abbreviate ‘x is a natural number’. According to Tarski, intuitively,

∀x(NatNum(x) → P(x))

is a logical consequence of the infinite set S of sentences

P(0)
P(1)
P(2)
.
.
.

However, the universal quantification is not a ⊢N-consequence of the set S. The reason why is that the ⊢N-consequence relation is compact: for any sentence X and set K of sentences, X is a ⊢N-consequence of K, if and only if X is a ⊢N-consequence of some finite subset of K. Proofs in N are objects of finite length; a deduction is a finite sequence of sentences. Since the universal quantification is not a ⊢N-consequence of any finite subset of S, it is not a ⊢N-consequence of S. By the completeness of system N, it follows that

∀x(NatNum(x) → P(x))

is not a ⊨-consequence of S either. Consider the structure U* whose domain is the set of McKeons. Let all numerals name Beth. Let the extension of ‘NatNum’ be the entire domain, and the extension of ‘P’ be just Beth. Then each element of S is true in U*, but ‘∀x (NatNum(x) → P(x))’ is not true in U*. (See Logical Consequence, Model-Theoretic Conceptions for further discussion of structures.) Note that the sentences in S only say that P holds for 0, 1, 2, and so on, and not also that 0,1, 2, etc., are all the elements of the domain of discourse. The above interpretation takes advantage of this fact by reinterpreting all numerals as names for Beth.
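The counter-structure U* is small enough to write down directly. The following sketch assumes, for illustration only, that the domain consists of the six McKeons named earlier in the article and that Python integers stand in for the numerals; the names `denotation`, `p_holds_of`, and `universal_claim_holds` are invented here. Because S is infinite, the code can only sample finitely many of its members, but every numeral denotes Beth, so each sampled case goes the same way.

```python
# Domain of U*: the McKeons (membership assumed from names used in the article).
domain = {"matt", "beth", "shannon", "kelly", "paige", "evan"}

def denotation(numeral: int) -> str:
    """In U*, every numeral names Beth."""
    return "beth"

nat_num = set(domain)  # extension of 'NatNum': the entire domain
P = {"beth"}           # extension of 'P': just Beth

def p_holds_of(numeral: int) -> bool:
    """Truth in U* of the sentence P(numeral), a member of S."""
    return denotation(numeral) in P

def universal_claim_holds() -> bool:
    """Truth in U* of the sentence: for all x, NatNum(x) -> P(x)."""
    return all((x not in nat_num) or (x in P) for x in domain)

# Each sampled member of S is true in U* ...
print(all(p_holds_of(n) for n in range(1000)))
# ... but the universal quantification is not.
print(universal_claim_holds())
```

The universal claim fails because individuals other than Beth fall under ‘NatNum’ without falling under ‘P’, which is exactly the gap left open by S saying nothing about what else the domain contains.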

However, we can reflect model-theoretically the intuition that ‘∀x(NatNum(x) → P(x))’ is a logical consequence of set S by doing one of two things. We can add to S the functional equivalent of the claim that 0, 1, 2, etc., are all the natural numbers there are, on the basis that this is an implicit assumption of the view that the universal quantification follows from S. Or we could add ‘NatNum’ and all numerals to our list of logical terms. On either option it still won’t be the case that ‘∀x(NatNum(x) → P(x))’ is a ⊢N-consequence of the set S. There is no way to accommodate the intuition that ‘∀x(NatNum(x) → P(x))’ is a logical consequence of S in terms of a compact consequence relation. Tarski takes this to be a reason to think that the model-theoretic account of logical consequence is definitive as opposed to an account of logical consequence in terms of a compact consequence relation such as ⊢N.

Tarski’s illustration shows that what is called the ω-rule is a correct inference rule.

The ω-rule is that from:

{P(0), P(1), P(2), …}

one may infer

∀x(NatNum(x) → P(x))

with respect to any predicate P. Any inference guided by this rule is correct even though it can’t be represented in a deductive system as this notion has been used here and discussed in Logical Consequence, Philosophical Considerations.

Compactness is not a salient feature of logical consequence conceived deductive theoretically. This suggests, by the third criterion of a successful theoretical definition of logical consequence mentioned in Logical Consequence, Philosophical Considerations, that no compact consequence relation is definitive of the intuitive notion of deducibility. So, assuming that deductive system N is correct (that is, deducibility is co-extensive in M with the ⊢N-relation), we can’t treat

X is intuitively deducible from K if and only if K ⊢N X.

as a definition of deducibility in M since

X is a deductive consequence of K if and only if X is deducible in a correct deductive system from K.

is not true with respect to languages for which deducibility is not captured by any compact consequence relation (that is, not captured by any deduction-system account of it). Some (e.g., Quine) demur at using a language for purposes of science in which deducibility is not completely represented by a deduction-system account because of epistemological considerations. Nevertheless, as Tarski (1936) argues, the fact that there cannot be deduction-system accounts of some intuitively correct principles of inference is reason for taking a model-theoretic characterization of logical consequence to be more fundamental than any characterization in terms of a deductive system sound and complete with respect to the model-theoretic characterization.

b. Is deductive system N correct?

In discussing the status of the characterization of logical consequence in terms of deductive system N, we assumed that N is correct. The question arises whether N is, indeed, correct. That is, is it the case that X is intuitively deducible from K if and only if K ⊢N X? The biconditional holds only if both (1) and (2) are true.

(1) If sentence X is intuitively deducible from set K of sentences, then K ⊢N X.
(2) If K ⊢N X, then sentence X is intuitively deducible from set K of sentences.

So N is incorrect if either (1) or (2) is false. The truth of (1) and (2) is relevant to the correctness of the characterization of logical consequence in terms of system N, because any adequate deductive-theoretic characterization of logical consequence must identify the logical terms of the relevant language and account for their inferential properties (for discussion, see Logical Consequence, Philosophical Considerations: Section 4). (1) is false if the list of logical terms in M is incomplete. In such a case, there will be a sentence X and set K of sentences such that X is intuitively deducible from set K because of at least one inferential property of logical terminology unaccounted for by N and so false that K ⊢N X (for discussion of some of the issues surrounding what qualifies as a logical term see Logical Consequence, Model-theoretic Conceptions: Section 5.3). In this case, N would be incorrect because it wouldn’t completely account for the inferential machinery of language M. (2) is false if there are deductions in N that are intuitively incorrect. Are there such deductions? In order to fine-tune the question note that the sentential connectives, the identity symbol, and the quantifiers of M are intended to correspond to or, and, not, if…then (the indicative conditional), is identical with, some, and all. Hence, N is a correct deductive system only if the Intro and Elim rules of N reflect the inferential properties of the ordinary language expressions. In what follows, we sketch three views that are critical of the correctness of system N because they reject (2).

i. Relevance logic

Not everybody accepts it as a fact that any sentence is deducible from a contradiction, and so some question the correctness of the ⊥-Elim rule. Consider the following informal proof of Q from 'P & ~P', for sentences P and Q, as a rationale for the ⊥-Elim rule.

From (1) P and not-P, we may correctly infer (2) P, from which it is correct to infer (3) P or Q. We derive (4) not-P from (1). (5) Q follows from (3) and (4).

The proof seems to be composed of valid modes of inference. Critics of the ⊥-Elim rule are obliged to tell us where it goes wrong. Here we follow the relevance logicians Anderson and Belnap (1962, pp.105-108; for discussion, see Read 1995, pp. 54-60). In a nutshell, Anderson and Belnap claim that the proof is defective because it commits a fallacy of equivocation. The move from (2) to (3) is correct only if or has the sense of at least one. For example, from Kelly is female it is legit to infer that at least one of the two sentences Kelly is female and Kelly is older than Paige is true. On this sense of or given that Kelly is female, one may infer that Kelly is female or whatever you like. However, in order for the passage from (3) and (4) to (5) to be legitimate the sense of or in (3) is if not-…then. For example from if Kelly is not female, then Kelly is not Paige’s sister and Kelly is not female it is correct to infer Kelly is not Paige’s sister. Hence, the above “support” for the ⊥-Elim rule is defective for it equivocates on the meaning of or.

Two things to highlight. First, Anderson and Belnap think that the inference from (2) to (3) on the if not-…then reading of or is incorrect. Given that Kelly is female it is problematic to deduce that if she is not then Kelly is older than Paige—or whatever you like. Such an inference commits a fallacy of relevance for Kelly not being female is not relevant to her being older than Paige. The representation of this inference in system N appeals to the ⊥-Elim rule, which is rejected by Anderson and Belnap. Second, the principle of inference underlying the move from (3) and (4) to (5)—from P or Q and not-P to infer Q—is called the principle of the disjunctive syllogism. Anderson and Belnap claim that this principle is not generally valid when or has the sense of at least one, which it has when it is rendered by ‘v‘ (e.g., see above). If Q is relevant to P, then the principle holds on this reading of or.

It is worthwhile to note the essentially informal nature of the debate. It calls upon our pre-theoretic intuitions about correct inference. It would be quite useless to cite the proof in N of the validity of disjunctive syllogism (given above) against Anderson and Belnap, for it relies on the ⊥-Elim rule whose legitimacy is in question. No doubt, pre-theoretical notions and original intuitions must be refined and shaped somewhat by theory. Our pre-theoretic notion of correct deductive reasoning in ordinary language is not completely determinate and precise independently of the resources of a full or partial logic. (See Shapiro 1991, chaps. 1 and 2 for discussion of the interplay between theory and pre-theoretic notions and intuitions.) Nevertheless, hardcore intuitions regarding correct deductive reasoning do seem to drive the debate over the legitimacy of deductive systems such as N and over the legitimacy of the ⊥-Elim rule in particular. Anderson and Belnap (1962, p. 108) write that denying the principle of the disjunctive syllogism, regarded as a valid mode of inference since Aristotle, “… will seem hopelessly naïve to those logicians whose logical intuitions have been numbed through hearing and repeating the logicians fairy tales of the past half century, and hence stand in need of further support”. The possibility that intuitions in support of the general validity of the principle of the disjunctive syllogism have been shaped by a bad theory of inference is motive enough to consider argumentative support for the principle and to investigate deductive systems for relevance logic.

A natural deductive system for relevance logic has the means for tracking the relevance quotient of the steps used in a proof and allows the application of an introduction rule in the step from A to B “only when A is relevant to B in the sense that A is used in arriving at B” (Anderson and Belnap 1962, p. 90). Consider the following proof in system N.

1. Admires(evan, paige) Basis
2. ~Married(beth, matt) Assumption
3. Admires(evan, paige) Reit: 1
4. (~Married(beth, matt) → Admires(evan, paige)) →-Intro: 2-3

Recall that the rationale behind the →-Intro rule is that we may derive a conditional if we derive the consequent Q from the assumption of the antecedent P, and, perhaps, other sentences occurring earlier in the proof on wider proof margins. The defect of this rule, according to Anderson and Belnap, is that “from” in “from the assumption of the antecedent P” is not taken seriously. They seem to have a point. By the lights of the →-Intro rule, we have derived line 4, but it is hard to see how we have derived the sentence at line 3 from the assumption at line 2 when we have simply reiterated the basis at line 3. Clearly, ‘~Married(beth, matt)’ was not used in inferring ‘Admires(evan, paige)’ at line 3. The relevance logician claims that the →-Intro rule in a correct natural deductive system should not make it possible to prove a conditional when the consequent was arrived at independently of the antecedent. A typical strategy is to use classes of numerals to mark the relevance conditions of basis sentences and assumptions and formulate the Intro and Elim rules to tell us how an application of the rule transfers the numerical subscript(s) from the sentences used to the sentence derived with the help of the rule. Label the basis sentences, if any, with distinct numerical subscripts. Let a, b, c, etc., range over classes of numerals. The →-rules for a relevance natural deductive system may be represented as follows.

→-Elim

k. (P → Q)_a
l. P_b
m. Q_(a ∪ b)    →-Elim: k, l

→-Intro

k. P_{k}    Assumption
.
.
.
m. Q_b
n. (P → Q)_(b − {k})    →-Intro: k-m, provided k ∈ b

The numerical subscript of the assumption at line k must be new to the proof. This is ensured by using the line number for the subscript.

In the directions for the →-Intro rule, the proviso that k ∈ b ensures that the antecedent P is used in deriving the consequent Q. Anderson and Belnap require that if the line that results from the application of either rule is the conclusion of the proof, the relevance markers be discharged. Here is a sample proof showing the above two rules in action.

1. Admires(evan, paige)_{1}    Assumption
2. (Admires(evan, paige) → ~Married(beth, matt))_{2}    Assumption
3. ~Married(beth, matt)_{1, 2}    →-Elim: 1, 2
4. ((Admires(evan, paige) → ~Married(beth, matt)) → ~Married(beth, matt))_{1}    →-Intro: 2-3
5. (Admires(evan, paige) → ((Admires(evan, paige) → ~Married(beth, matt)) → ~Married(beth, matt)))    →-Intro: 1-4
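The bookkeeping behind the relevance subscripts can be sketched in a few lines of Python. This is only an illustrative toy, not Anderson and Belnap's own formulation: the names `Line`, `assume`, `arrow_elim`, and `arrow_intro` are hypothetical, formulas are plain strings or tuples of the form `("->", P, Q)`, and subscript classes are modeled as frozen sets of line numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """A proof line: a formula together with its class of relevance subscripts."""
    formula: object
    indices: frozenset


def assume(formula, line_number):
    # An assumption is tagged with its own line number, which is new to the proof.
    return Line(formula, frozenset({line_number}))


def arrow_elim(conditional, antecedent):
    # From (P -> Q)_a and P_b infer Q with the union of a and b.
    op, p, q = conditional.formula
    if op != "->" or p != antecedent.formula:
        raise ValueError("->-Elim does not apply to these lines")
    return Line(q, conditional.indices | antecedent.indices)


def arrow_intro(assumption, consequent):
    # From P_{k} ... Q_b infer (P -> Q) with b minus {k}, provided k is in b.
    (k,) = assumption.indices
    if k not in consequent.indices:
        raise ValueError("the antecedent was not used in deriving the consequent")
    return Line(("->", assumption.formula, consequent.formula),
                consequent.indices - {k})


# The sample proof above, replayed step by step.
A = "Admires(evan, paige)"
M = "~Married(beth, matt)"
line1 = assume(A, 1)
line2 = assume(("->", A, M), 2)
line3 = arrow_elim(line2, line1)   # subscripts {1, 2}
line4 = arrow_intro(line2, line3)  # subscripts {1}
line5 = arrow_intro(line1, line4)  # no subscripts left
```

On this sketch the earlier proof in N is blocked: a reiterated basis sentence carrying subscript 1 does not carry the subscript of an assumption made at line 2, so `arrow_intro` raises an error instead of producing the conditional.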

For further discussion see Anderson and Belnap (1962). For a comprehensive discussion of relevance deductive systems see their (1975). For a more up-to-date review of the relevance logic literature see Dunn (1986).

ii. Intuitionistic logic

We now consider the correctness of the ~-Elim rule and consider the rule in the context of using it along with the ~-Intro rule.

~-Intro

k. P Assumption
.
.
.
m. ⊥
n. ~P ~-Intro: k-m

~-Elim

k. ~~P
m. P ~-Elim: k

Here is a typical use in classical logic of the ~-Intro and ~-Elim rules. Suppose that we derive a contradiction from the assumption that a sentence P is true. So, if P were true, then a contradiction would be true, which is impossible. So P cannot be true and we may infer that not-P. Similarly, suppose that we derive a contradiction from the assumption that not-P. Since a contradiction cannot be true, not-P is not true. Then we may infer that P is true by ~-Elim.

The intuitionist logician rejects the second piece of reasoning, the one that concludes P by ~-Elim. If a contradiction is derived from not-P we may infer that not-P is not true, that is, that not-not-P is true, but it is incorrect to infer that P is true. Why? Because the intuitionist rejects the presupposition behind the ~-Elim rule, which is that for any proposition P there are two alternatives: P and not-P. The grounds for this are the intuitionistic conceptions of truth and meaning.

According to intuitionistic logic, truth is an epistemic notion: the truth of a sentence P consists of our ability to verify it. To assert P is to have a proof of P, and to assert not-P is to have a refutation of P. This leads to an epistemic conception of the meaning of logical constants. The meaning of a logical constant is characterized in terms of its contribution to the criteria of proof for the sentences in which it occurs. Compare with classical logic: the meaning of a logical constant is semantically characterized in terms of its contribution to the determination of the truth conditions of the sentences in which it occurs. For example, the classical logician accepts a sentence of the form 'P v Q' only when she accepts that at least one of the disjuncts is true. On the other hand, the intuitionistic logician accepts 'P v Q' only when she has a method for proving P or a method for proving Q. But then the Law of Excluded Middle no longer holds, because a sentence of the form P or not-P is true, that is, assertible, only when we are in a position to prove or refute P, and we lack the means for verifying or refuting all sentences. The alleged problem with the ~-Elim rule is that it illegitimately extends the grounds for asserting P on the basis of not-not-P, since a refutation of not-P is not ipso facto a proof of P.

Since there are finitely many McKeons and the predicates of language M seem well defined, we can work through the domain of the McKeons to verify or refute any M-sentence, and so there doesn’t seem to be an M-sentence that is neither verifiable nor refutable. However, consider a language about the natural numbers. Any sentence that results by substituting numerals for the variables in ‘x = y + z’ is decidable. This is to say that for any natural numbers x, y, and z, we have an effective procedure for determining whether or not x is the sum of y and z. Hence, for all x, y, and z either we may assert that x = y + z or we may assert the contrary. Let ‘A(x)’ abbreviate ‘if x is even and greater than 2 then there exist primes y and z such that x = y + z’. Since there are algorithms for determining of any number whether or not it is even, greater than 2, or prime, the hypothesis that the open formula ‘A(x)’ is satisfied by a given natural number is decidable, for we can effectively determine for all smaller numbers whether or not they are prime. However, there is no known method for verifying or refuting Goldbach’s conjecture, for all x, A(x). Even though, for each numeral n standing for a natural number, the sentence 'A(n)' is decidable (that is, we can determine which of 'A(n)' or 'not-A(n)' is true), the sentence ‘for all x, A(x)’ is not. That is, we are not in a position to hold that either Goldbach’s conjecture is true or that it is not. Clearly, verification of the conjecture via an exhaustive search of the domain of natural numbers is not possible since the domain is non-finite. Absent a counterexample or proof of Goldbach’s conjecture, the intuitionist demurs from asserting that either Goldbach’s conjecture is true or it is not. This is just one of many examples where the intuitionist thinks that the law of excluded middle fails.
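The contrast between the decidable instances 'A(n)' and the undecided generalization can be made concrete with a short Python sketch. The function names `is_prime`, `A`, and `search_counterexample` are illustrative choices, and trial division is used only for simplicity. Each call to `A` terminates with a verdict, whereas the bounded search can at best report that no counterexample was found below a chosen limit, which settles nothing about ‘for all x, A(x)’.

```python
def is_prime(n):
    """Decide primality by trial division (adequate for small n)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def A(x):
    """Decide: if x is even and greater than 2, then x is a sum of two primes."""
    if x % 2 != 0 or x <= 2:
        return True  # the antecedent fails, so the conditional holds
    # Only finitely many candidate pairs (y, x - y) need to be inspected.
    return any(is_prime(y) and is_prime(x - y) for y in range(2, x // 2 + 1))


def search_counterexample(limit):
    """Return the first x <= limit with A(x) false, or None if there is none.

    A result of None covers only the numbers up to `limit`; it is neither a
    proof nor a refutation of the universal claim.
    """
    for x in range(limit + 1):
        if not A(x):
            return x
    return None
```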

In sum, the legitimacy of the ~-Elim rule requires a realist conception of truth as verification transcendent. On this conception, sentences have truth-values independently of the possibility of a method for verifying them. Intuitionistic logic abandons this conception of truth in favor of an epistemic conception according to which the truth of a sentence turns on our ability to verify it. Hence, the inference rules of an intuitionistic natural deductive system must be coded in such a way to reflect this notion of truth. For example, consider an intuitionistic language in which a, b, … range over proofs, ‘a: P’ stands for ‘a is a proof of P’, and ‘(a, b)’ stands for some suitable pairing of the proofs a and b. The &-rules of an intuitionistic natural deductive system may look like the following:

&-Intro

k. a: P
l. b: Q
m. (a, b): (P & Q) &-Intro: k, l

&-Elim

k. (a, b): (P & Q)
m. a: P    &-Elim: k

k. (a, b): (P & Q)
m. b: Q    &-Elim: k
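One way to picture the proof interpretation of the &-rules is to treat a judgement ‘a: P’ as a pair of a proof object and a formula. The Python sketch below is an informal illustration under that assumption; the function names and the tuple encoding `("&", P, Q)` for conjunctions are hypothetical and not drawn from the sources cited here.

```python
def and_intro(left, right):
    """From a: P and b: Q build the judgement (a, b): (P & Q)."""
    (a, p), (b, q) = left, right
    return ((a, b), ("&", p, q))


def and_elim_left(judgement):
    """From (a, b): (P & Q) recover a: P."""
    (a, _b), formula = judgement
    if not (isinstance(formula, tuple) and formula[0] == "&"):
        raise ValueError("&-Elim needs a proof of a conjunction")
    return (a, formula[1])


def and_elim_right(judgement):
    """From (a, b): (P & Q) recover b: Q."""
    (_a, b), formula = judgement
    if not (isinstance(formula, tuple) and formula[0] == "&"):
        raise ValueError("&-Elim needs a proof of a conjunction")
    return (b, formula[2])


# Pairing two proofs and then projecting returns the original judgements.
conj = and_intro(("a", "P"), ("b", "Q"))
```

The design point is that a proof of a conjunction literally contains a proof of each conjunct, so the elimination rules are just projections.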

Apart from the negation rules, it is fairly straightforward to dress the Intro and Elim rules of N with a proof interpretation as is illustrated above with the &-rules. For the details see Van Dalen (1999). For further introductory discussion of the philosophical theses underlying intuitionistic logic see Read (1995) and Shapiro (2000). Tennant (1997) offers a more comprehensive discussion and defense of the philosophy of language underlying intuitionistic logic.

iii. Free Logic

We now turn to the ∃-Intro and ∀-Elim rules. Consider the following two inferences.

(1) Male(evan)
(therefore) (2) ∃x Male(x)

(3) ∀x Male(x)
(therefore) (4) Male(evan)

Both are correct by the lights of our system N. Specifically, (2) is derivable from (1) by the ∃-Intro rule and we get (4) from (3) by the ∀-Elim rule. Note an implicit assumption required for the legitimacy of these inferences: every individual constant refers to an element of the quantifier domain. If this existence assumption, which is built into the semantics for M and reflected in the two quantifier rules, is rejected, then the inferences are unacceptable. What motivates rejecting the existence assumption and denying the correctness of the above inferences?

There are contexts in which singular terms are used without assuming that they refer to existing objects. For example, it is perfectly reasonable to regard the individual constants of a language used to talk about myths and fairy tales as not denoting existing objects. It seems inappropriate to infer that some actually existing individual is jolly on the basis that the sentence ‘Santa Claus is jolly’ is true. Also, the logic of a language used to debate the existence of God should not presuppose that ‘God’ refers to something in the world. The atheist doesn’t seem to be contradicting herself in asserting that God does not exist. Furthermore, there are contexts in science where introducing an individual constant for an allegedly existing object such as a planet or particle should not require the scientist to know that the purported object to which the term allegedly refers actually exists. A logic that allows non-denoting individual constants (terms that do not refer to existing things) while maintaining the existential import of the quantifiers (‘∀x’ and ‘∃x’ mean something like ‘for all existing individuals x’ and ‘for some existing individuals x’, respectively) is called a free logic. In order for the above two inferences to be correct by the lights of free logic, the sentence ‘Evan exists’ must be added to the basis. Correspondingly, the ∃-Intro and ∀-Elim rules in a natural deductive system for free logic may be portrayed as follows. Again, let 'Ωv' be a formula in which v is the only free variable, and let n be any name.

∀-Elim

k. ∀vΩv
l. E!n
m. Ωn    ∀-Elim: k, l

∃-Intro

k. Ωn
l. E!n
m. ∃vΩv    ∃-Intro: k, l

'E!n' abbreviates ‘n exists’, and so we suppose that ‘E!’ is an item of the relevant language. The ∀-Intro and ∃-Elim rules in a free logic deductive system also make explicit the required existential presuppositions with respect to individual constants (for details see Bencivenga 1986, p. 387). Free logic seems to be a useful tool for representing and evaluating reasoning in contexts such as the above. Different types of free logic arise depending on whether we treat terms that do not denote existing individuals as denoting objects that do not actually exist or as simply not denoting at all.
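The role of the extra premise 'E!n' can be illustrated with a deliberately small Python sketch. It is restricted to atomic formulas with a one-place predicate, encoded as hypothetical tuples such as `("Male", "evan")`, `("E!", "evan")`, and `("∀", "x", ("Male", "x"))`; none of these encodings or function names comes from the literature cited here.

```python
def exists_intro(premise, existence, var="x"):
    """From Ωn and E!n infer ∃vΩv (one-place atomic Ω only)."""
    predicate, name = premise
    if existence != ("E!", name):
        raise ValueError("free-logic ∃-Intro needs the premise E!" + name)
    return ("∃", var, (predicate, var))


def forall_elim(universal, existence):
    """From ∀vΩv and E!n infer Ωn (one-place atomic Ω only)."""
    quantifier, var, (predicate, bound) = universal
    tag, name = existence
    if quantifier != "∀" or bound != var or tag != "E!":
        raise ValueError("free-logic ∀-Elim does not apply to these lines")
    return (predicate, name)


# With 'Evan exists' in the basis, both inferences from the text go through.
some_male = exists_intro(("Male", "evan"), ("E!", "evan"))
evan_male = forall_elim(("∀", "x", ("Male", "x")), ("E!", "evan"))
```

Without a matching existence premise the first function refuses to apply, which mirrors the point that ‘Santa Claus is jolly’ alone does not license an existential conclusion in free logic.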

In sum, there are contexts in which it is appropriate to use languages whose vocabulary and syntactic formation rules are independent of our knowledge of the actual existence of the entities the language is about. In such languages, the quantifier rules of deductive system N sanction incorrect inferences, and so at best N represents correct deductive reasoning in languages for which the existential presupposition with respect to singular terms makes sense. The proponent of system N may argue that only those expressions guaranteed a referent (e.g., demonstratives) are truly singular terms. On this view, advocated by Bertrand Russell at one time, expressions that may not have a referent such as Santa Claus, God, Evan, Bill Clinton, the child abused by Michael Jackson are not genuinely singular expressions. For example, in the sentence Evan is male, Evan abbreviates a unique description such as the son of Matt and Beth. Then Evan is male comes to

There exists a unique x such that x is a son of Matt and Beth and x is male.

From this we may correctly infer that some are male. The representation of this inference in N appeals to both the ∃-Intro and ∃-Elim rules, as well as the &-Elim rule. However, treating most singular expressions as disguised definite descriptions at worst generates counter-intuitive truth-value assignments (‘Santa Claus is jolly’ turns out false since there is no Santa Claus) and seems at best an unnatural response to the criticism posed from the vantage point of free logic.

For a short discussion of the motives behind free logic and a review of the family of free logics see Read (1995, chap. 5). For a more comprehensive discussion and a survey of the relevant literature see Bencivenga (1986). Morscher and Hieke (2001) is a collection of recent essays devoted to taking stock of the past fifty years of research in free logic and outlining new directions.

6. Conclusion

This completes our discussion of the deductive-theoretic conception of logical consequence. Since, arguably, logical consequence conceived deductive-theoretically is not compact, it cannot be defined in terms of deducibility in a correct deductive system. Nevertheless, correct deductive systems are useful for modeling deductive reasoning and they have applications in areas such as computer science and mathematics. Is deductive system N correct? In other words: Do the Intro and Elim rules of N represent correct principles of inference? We sketched three motives for answering in the negative, each leading to a logic that differs from the classical one developed here and which requires altering the Intro and Elim rules of N. It is clear from the discussion that any full coverage of the topic would have to engage philosophical issues, still a matter of debate, such as the nature of truth, meaning and inference. For a comprehensive and very readable survey of proposed revisions to classical logic (those discussed here and others) see Haack (1996). For discussion of related issues, see also the entries, “Logical Consequence, Philosophical Considerations” and “Logical Consequence, Model-Theoretic Conceptions” in this encyclopedia.

7. References and Further Reading

  • Anderson, A.R. and N. Belnap (1962): “Entailment”, pp. 76-110 in Logic and Philosophy, ed. G. Iseminger. New York: Appleton-Century-Crofts, 1968.
  • Anderson, A.R., and N. Belnap (1975): Entailment: The Logic of Relevance and Necessity. Princeton: Princeton University Press.
  • Barwise, J. and J. Etchemendy (2001): Language, Proof and Logic. Chicago: University of Chicago Press and CSLI Publications.
  • Bencivenga, E. (1986): “Free logics”, pp. 373-426 in Gabbay and Guenthner (1986).
  • Dunn, M. (1986): “Relevance Logic and Entailment”, pp. 117-224 in Gabbay and Guenthner (1986).
  • Fitch, F.B. (1952): Symbolic Logic: An Introduction. New York: The Ronald Press.
  • Gabbay, D. and F. Guenthner, eds. (1983): Handbook of Philosophical Logic, Vol 1. Dordrecht: D. Reidel.
  • Gabbay, D. and F. Guenthner, eds. (1986): Handbook of Philosophical Logic, Vol. 3. Dordrecht: D. Reidel.
  • Gentzen, G. (1934): “Investigations Into Logical Deduction”, pp. 68-128 in Collected Papers, ed. M.E. Szabo. Amsterdam: North-Holland, 1969.
  • Haack, S. (1978): Philosophy of Logics. Cambridge: Cambridge University Press.
  • Haack, S. (1996): Deviant Logic, Fuzzy Logic. Chicago: The University of Chicago Press.
  • Morscher E. and A. Hieke, eds. (2001): New Essays in Free Logic: In Honour of Karel Lambert, Dordrecht: Kluwer.
  • Pelletier, F.J. (1999): “A History of Natural Deduction and Elementary Logic Textbooks”, pp.105-138 in Logical Consequence: Rival Approaches, ed. J. Woods and B. Brown. Oxford: Hermes Science Publishing, 2001.
  • Read, S. (1995): Thinking About Logic. Oxford: Oxford University Press.
  • Shapiro, S. (1991): Foundations without Foundationalism: A Case For Second-Order Logic. Oxford: Clarendon Press.
  • Shapiro, S. (2000): Thinking About Mathematics. Oxford: Oxford University Press.
  • Sundholm, G. (1983): “Systems of Deduction”, in Gabbay and Guenthner (1983).
  • Tarski, A. (1936): “On the Concept of Logical Consequence”, pp. 409-420 in Tarski (1983).
  • Tarski, A. (1983): Logic, Semantics, Metamathematics, 2nd ed. Indianapolis: Hackett Publishing.
  • Tennant, N. (1997): The Taming of the True. Oxford: Clarendon Press.
  • Van Dalen, D. (1999): “The Intuitionistic Conception of Logic”, pp. 45-73 in Varzi (1999).
  • Varzi, A., ed. (1999): European Review of Philosophy, Vol. 4, The Nature of Logic, Stanford: CSLI Publications.

Author Information

Matthew McKeon
Email: mckeonm@msu.edu
Michigan State University
U. S. A.

Reductio ad Absurdum

Reductio ad absurdum is a mode of argumentation that seeks to establish a contention by deriving an absurdity from its denial, thus arguing that a thesis must be accepted because its rejection would be untenable. It is a style of reasoning that has been employed throughout the history of mathematics and philosophy from classical antiquity onwards.

Table of Contents

  1. Basic Ideas
  2. The Logic of Strict Propositional Reductio: Indirect Proof
  3. A Classical Example of Reductio Argumentation
  4. Self-Annihilation: Processes that Engender Contradiction
  5. Doctrinal Annihilation: Sets of Statements that Are Collectively Inconsistent
  6. Absurd Definitions and Specifications
  7. Per Impossible Reasoning
  8. References and Further Reading

1. Basic Ideas

Use of this Latin terminology traces back to the Greek expression hê eis to adunaton apagôgê, reduction to the impossible, found repeatedly in Aristotle’s Prior Analytics. In its most general construal, reductio ad absurdum (reductio for short) is a process of refutation on grounds that absurd and patently untenable consequences would ensue from accepting the item at issue. This takes three principal forms according as that untenable consequence is:

  1. a self-contradiction (ad absurdum)
  2. a falsehood (ad falsum or even ad impossible)
  3. an implausibility or anomaly (ad ridiculum or ad incommodum)

The first of these is reductio ad absurdum in its strictest construction and the other two cases involve a rather wider and looser sense of the term. Some conditionals that instantiate this latter sort of situation are:

  • If that’s so, then I’m a monkey’s uncle.
  • If that is true, then pigs can fly.
  • If he did that, then I’m the Shah of Persia.

What we have here are consequences that are absurd in the sense of being obviously false and indeed even a bit ridiculous. Despite its departure from the usual form of reductio, this sort of thing is also characterized as an attenuated mode of reductio. But although all three cases fall into the range of the term as it is commonly used, logicians and mathematicians generally have the first and strongest of them in view.

The usual explanations of reductio fail to acknowledge the full extent of its range of application. For at the very minimum such a refutation is a process that can be applied to

  • individual propositions or theses
  • groups of propositions or theses (that is, doctrines or positions or teachings)
  • modes of reasoning or argumentation
  • definitions
  • instructions and rules of procedure
  • practices, policies and processes

The task of the present discussion is to explain the modes of reasoning at issue with reductio and to illustrate the wide range of its applications.

2. The Logic of Strict Propositional Reductio: Indirect Proof

Whitehead and Russell in Principia Mathematica characterize the principle of “reductio ad absurdum” as tantamount to the formula (~p → p) → p of propositional logic. But this view is idiosyncratic. Elsewhere the principle is almost universally viewed as a mode of argumentation rather than a specific thesis of propositional logic.

Propositional reductio is based on the following line of reasoning:

If p ⊢ ~p, then ⊢ ~p

Here ⊢ represents assertability, be it absolute or conditional (that is, derivability). Since p ⊢ q yields ⊢ p → q, this principle can be established as follows:

Suppose (1) p ⊢ ~p

(2) ⊢ p → ~p from (1)

(3) ⊢ p → (p & ~p) from (2) since p → p

(4) ⊢ ~(p & ~p) → ~p from (3) by contraposition

(5) ⊢ ~(p & ~p) by the Law of Contradiction

(6) ⊢ ~p from (4), (5) by modus ponens

Accordingly, the above-indicated line of reasoning does not represent a postulated principle but a theorem that issues from subscription to various axioms and proof rules, as instanced in the just-presented derivation.
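As a sanity check on the derivation, the conditionals corresponding to its steps can be tested by brute force over the two classical truth values. The Python sketch below is a semantic check under classical two-valued assumptions, not a replacement for the derivation itself; the helper names `implies` and `is_tautology` and the labels given to the individual formulas are illustrative.

```python
from itertools import product


def implies(a, b):
    """Material conditional on Python booleans."""
    return (not a) or b


def is_tautology(formula, arity=1):
    """True if `formula` holds under every assignment of truth values."""
    return all(formula(*values) for values in product([True, False], repeat=arity))


# (2) to (3): if p -> ~p, then p -> (p & ~p)
step_2_to_3 = lambda p: implies(implies(p, not p), implies(p, p and not p))
# (3) to (4): contraposition of p -> (p & ~p)
step_3_to_4 = lambda p: implies(implies(p, p and not p),
                                implies(not (p and not p), not p))
# (5): the Law of Contradiction
law_of_contradiction = lambda p: not (p and not p)
# The principle as a single formula: (p -> ~p) -> ~p
reductio = lambda p: implies(implies(p, not p), not p)
# The Principia-style formula mentioned above: (~p -> p) -> p
principia_form = lambda p: implies(implies(not p, p), p)

checks = [step_2_to_3, step_3_to_4, law_of_contradiction, reductio, principia_form]
all_valid = all(is_tautology(f) for f in checks)
```

Each formula comes out true on both assignments, so every step of the derivation is classically sound.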

The reasoning involved here provides the basis for what is called an indirect proof. This is a process of justificatory argumentation that proceeds as follows when the object is to establish a certain conclusion p:

(1) Assume not-p

(2) Provide argumentation that derives p from this assumption.

(3) Maintain p on this basis.

Such argumentation is in effect simply an implementation of the above-stated principle with ~p standing in place of p.

As this line of thought indicates, reductio argumentation is a special case of demonstrative reasoning. What we deal with here is an argument of the pattern: From the situation

(to-be-refuted assumption + a conjunction of preestablished facts) ⊢ contradiction

one proceeds to conclude the denial of that to-be-refuted assumption via modus tollens argumentation.

An example may help to clarify matters. Consider division by zero. If this were possible when x is not 0 and we took x ÷ 0 to constitute some well-defined quantity Q, then we would have x ÷ 0 = Q, so that x = 0 × Q; and since 0 × (anything) = 0, we would have x = 0, contrary to assumption. The supposition that x ÷ 0 qualifies as a well-defined quantity is thereby refuted.

3. A Classical Example of Reductio Argumentation

A classic instance of reductio reasoning in Greek mathematics relates to the discovery by Pythagoras – disclosed to the chagrin of his associates by Hippasus of Metapontum in the fifth century BC – of the incommensurability of the diagonal of a square with its sides. The reasoning at issue runs as follows:

Let d be the length of the diagonal of a square and s the length of its sides. Then by the Pythagorean theorem we have it that d² = 2s². Now suppose (by way of a reductio assumption) that d and s were commensurable in terms of a common unit u, so that d = n × u and s = m × u, where m and n are whole numbers (integers) that have no common divisor. (If there were a common divisor, we could simply shift it into u.) Now we know that

(n × u)² = 2(m × u)²

We then have it that n² = 2m². This means that n must be even, since only even integers have even squares. So n = 2k. But now n² = (2k)² = 4k² = 2m², so that 2k² = m². But this means that m must be even (by the same reasoning as before). And this means that m and n, both being even, will have common divisors (namely 2), contrary to the hypothesis that they do not. Accordingly, since that initial commensurability assumption engendered a contradiction, we have no alternative but to reject it. The incommensurability thesis is accordingly established.
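The two arithmetical facts the argument leans on can be spot-checked numerically. The Python sketch below is a finite check over a chosen range and therefore no substitute for the proof; the function names and the bounds used are illustrative assumptions.

```python
from math import gcd, isqrt


def even_square_implies_even(limit):
    """Check, for 0 <= n <= limit, that n*n even implies n even."""
    return all(n % 2 == 0 for n in range(limit + 1) if (n * n) % 2 == 0)


def coprime_solutions(limit):
    """List pairs (n, m), 1 <= m <= limit, with gcd(n, m) = 1 and n*n == 2*m*m."""
    found = []
    for m in range(1, limit + 1):
        n = isqrt(2 * m * m)  # the only candidate integer for n
        if n * n == 2 * m * m and gcd(n, m) == 1:
            found.append((n, m))
    return found
```

The search is expected to come back empty for every bound, which is exactly what the reductio establishes in general.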

As indicated above, this sort of proof of a thesis by reductio argumentation that derives a contradiction from its negation is characterized as an indirect proof in mathematics. (On the historical background see T. L. Heath, A History of Greek Mathematics [Oxford, Clarendon Press, 1921].)

The use of such reductio argumentation was common in Greek mathematics and was also used by philosophers in antiquity and beyond. Aristotle employed it in the Prior Analytics to demonstrate the so-called imperfect syllogisms, and it had already been used in dialectical contexts by Plato (see Republic I, 338C-343A; Parmenides 128d). Immanuel Kant’s entire discussion of the antinomies in his Critique of Pure Reason was based on reductio argumentation.

The mathematical school of so-called intuitionism has taken a definite line regarding the limitation of reductio argumentation for the purposes of existence proofs. The only valid way to establish existence, so they maintain, is by providing a concrete instance or example: general-principle argumentation is not acceptable here. This means, in specific, that one cannot establish (∃x)Fx by deducing an absurdity from (∀x)~Fx. Accordingly, intuitionists would not let us infer the existence of invertebrate ancestors of homo sapiens from the patent absurdity of the supposition that humans are vertebrates all the way back. They would maintain that in such cases where we are totally in the dark as to the individuals involved we are not in a position to maintain their existence.

4. Self-Annihilation: Processes that Engender Contradiction

Not only a self-inconsistent statement (and thereby a self-refuting, self-annihilating one) but also a self-inconsistent process or practice or principle of procedure can be “reduced to absurdity.” For any such modus operandi answers to some instruction (or combination thereof), and such an instruction can also prove to be self-contradictory. Examples of this would be:

  • Never say never.
  • Keep the old warehouse intact until the new one is constructed. And build the new warehouse from the materials salvaged by demolishing the old.

More loosely, there are also instructions that do not automatically result in logically absurd (self-contradictory) conclusions, but which open the door to such absurdity in certain conditions and circumstances. Along these lines, a practical rule of procedure or modus operandi would be reduced to absurdity when it can be shown that its actual adoption and implementation would result in an anomaly.

Consider an illustration of this sort of situation. A man dies leaving an estate consisting of his town house, his bank account of $30,000, his share in the family business, and several pieces of costume jewelry he inherited from his mother. His will specifies that his sister is to have any three of the valuables in his estate and that his daughter is to inherit the rest. The sister selects the house, a bracelet, and a necklace. The executor refuses to make this distribution and the sister takes him to court. No doubt the judge will rule something like “Finding for the plaintiff would lead ad absurdum. She could just as well have also opted not just for the house but also for the bank account and the business, thereby effectively disinheriting the daughter, which was clearly not the testator’s wish.” Here we have a juridical reductio ad absurdum of sorts. Actually implementing this rule in all eligible cases (its generalized utilization across the board) would yield an unacceptable and untoward result, so that the rule could self-destruct in its actual unrestricted implementation. (This sort of reasoning is common in legal contexts. Many such cases are discussed in David Daube, Roman Law [Edinburgh: Edinburgh University Press, 1969], pp. 176-94.)

Immanuel Kant taught that interpersonal practices cannot represent morally appropriate modes of procedure if they do not correspond to verbally generalizable rules in this way. Such practices as stealing (that is, taking someone else’s possessions without due authorization) or lying (that is, telling falsehoods where it suits your convenience) are inappropriate, so Kant maintains, exactly because the corresponding maxims, if generalized across the board, would be utterly anomalous (leading to the annihilation of property ownership and verbal communication respectively). Since the rule-conforming practices thus reduce to absurdity upon their general implementation, such practices are adjudged morally unacceptable. For Kant, generalizability is the acid test of the acceptability of practices in the realm of interpersonal dealings.

5. Doctrinal Annihilation: Sets of Statements that Are Collectively Inconsistent

Even as individual statements can prove to be self-contradictions, so a plurality of statements (a “doctrine” let us call it) can prove to be collectively inconsistent. And so in this context reductio reasoning can also come into operation. For example, consider the following schematic theses:

  • A → B
  • B → C
  • C → D
  • Not-D

In this context, the supposition that A can be refuted by a reductio ad absurdum. For if A were conjoined to these premisses, we would arrive at both D and not-D, which is patently absurd. Hence A is untenable (false) in the context of this family of givens.

When someone is “caught out in a contradiction” in this way, their position self-destructs in a reduction to absurdity. An example is provided by the exchange between Socrates and his accusers, who had charged him with godlessness. In elaborating this accusation, these opponents also accused Socrates of believing in inspired beings (daimonia). But here inspiration is divine inspiration: such a daimonion is supposed to be a being inspired by a god. And at this point Socrates has a ready-made defense: how can someone disbelieve in gods when he is acknowledged to believe in god-inspired beings? His accusers here become enmeshed in self-contradiction, and their position accordingly runs out into absurdity. (Compare Aristotle, Rhetorica 1398a12 [II xxiii 8].)
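Returning to the four schematic theses, the claim that adding A makes the doctrine collectively inconsistent can be verified by enumerating truth assignments. The Python sketch below reads the first three theses as material conditionals, which is an assumption about the schematic notation; the names `doctrine`, `models`, and `models_with_A` are illustrative.

```python
from itertools import product


def implies(a, b):
    """Material conditional on Python booleans."""
    return (not a) or b


def doctrine(a, b, c, d):
    """The four theses taken together: A -> B, B -> C, C -> D, and not-D."""
    return implies(a, b) and implies(b, c) and implies(c, d) and not d


# Every assignment (A, B, C, D) under which the doctrine holds.
models = [v for v in product([True, False], repeat=4) if doctrine(*v)]
# Those among them in which A is also true.
models_with_A = [v for v in models if v[0]]
```

The doctrine on its own is satisfiable only when A, B, C, and D are all false, so no assignment survives once A is added, and that is the reductio of A.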

6. Absurd Definitions and Specifications

Even as instructions can issue in absurdity, so can definitions and explanations. For example:

  • A zor is a round square that is colored green.

Again consider the following pair:

  • A bird is a vertebrate animal that flies.
  • An ostrich is a species of flightless bird.

Definitions or specifications that are in principle unsatisfiable are for this very reason absurd.

7. Per Impossible Reasoning

Per impossible reasoning also proceeds from a patently impossible premiss. It is closely related to, albeit distinctly different from reductio ad absurdum argumentation. Here we have to deal with literally impossible suppositions that are not just dramatically but necessarily false thanks to their logical conflict with some clearly necessary truths, be the necessity at issue logical or conceptual or mathematical or physical. In particular, such an utterly impossible supposition may negate:

  • a matter of (logico-conceptual) necessity (“There are infinitely many prime numbers”).
  • a law of nature (“Water freezes at low temperatures”).

Suppositions of this sort commonly give rise to per impossible counterfactuals such as:

  • If (per impossible) water did not freeze, then ice would not exist.
  • If, per impossible, pigs could fly, then the sky would sometimes be full of porkers.
  • If you were transported through space faster than the speed of light, then you would return from a journey younger than at the outset.
  • Even if there were no primes less than 1,000,000,000, the number of primes would be infinite.
  • If (per impossible) there were only finitely many prime numbers, then there would be a largest prime number.

A somewhat more interesting mathematical example is as follows: If, per impossible, there were a counterexample to Fermat’s Last Theorem, there would be infinitely many counterexamples, because if x^k + y^k = z^k, then (nx)^k + (ny)^k = (nz)^k, for any n.
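The scaling step holds for any exponent, since (nx)^k + (ny)^k = n^k (x^k + y^k). It can be watched in action for k = 2, where solutions actually exist. The Python sketch below uses the triple (3, 4, 5) purely as a stand-in; the function names are illustrative and nothing here bears on exponents greater than 2.

```python
def satisfies(x, y, z, k):
    """True if x**k + y**k == z**k."""
    return x ** k + y ** k == z ** k


def scaled(x, y, z, n):
    """Multiply each member of the triple by n."""
    return (n * x, n * y, n * z)


# Every positive multiple of a solution for k = 2 is again a solution.
multiples_ok = all(satisfies(*scaled(3, 4, 5, n), 2) for n in range(1, 50))
```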

With such per impossible counterfactuals we envision what is acknowledged as an impossible and thus necessarily false antecedent, doing so not in order to refute it as absurd (as in reductio ad absurdum reasoning), but in order to do the best one can to indicate its “natural” consequences.

Again, consider such counterfactuals as:

  • If (per impossible) 9 were divisible by 4 without a remainder, then it would be an even number.
  • If (per impossible) Napoleon were still alive today, he would be amazed at the state of international politics in Europe.

A virtually equivalent formulation of the very point at issue with these two contentions is:

  • Any number divisible by 4 without remainders is even.
  • By the standards of Napoleonic France the present state of international politics in Europe is amazing.

However, the designation per impossible indicates that it is the conditional itself that concerns us. Our concern is with the character of that consequence relationship rather than with the antecedent or consequent per se. In this regard the situation is quite different from reductio argumentation, by which we seek to establish the untenability of the antecedent. To all intents and purposes, then, counterfactuals can serve distinctly factual purposes. And so, often what looks to be a per impossible conditional actually is not. Thus consider

  • If I were you, I would accept his offer.

Clearly the antecedent/premiss “I = you” is absurd. But even the slightest heed of what is communicatively occurring here shows that what is at issue is not this just-stated impossibility but a counterfactual of the format:

  • If I were in your place (that is, if I were circumstanced in the condition in which you now find yourself), then I would accept his offer.

Only by being perversely literalistic could the absurdity of that antecedent be of any concern to us.

One final point. The contrast between reductio and per impossible reasoning conveys an interesting lesson. In both cases alike we begin with a situation of exactly the same basic format, namely a conflict or contradiction between an assumption or supposition and various facts that we already know. The difference lies entirely in pragmatic considerations, in what we are trying to accomplish. In the one (reductio) case we seek to refute and rebut that assumption so as to establish its negation, and in the other (per impossible) case we are trying to establish an implication – to validate a conditional. The difference at bottom thus lies not in the nature of the inference at issue, but only in what we are trying to achieve by its means. The difference accordingly is not so much theoretical as functional – it is a pragmatic difference in objectives.

8. References and Further Reading

  • David Daube, Roman Law (Edinburgh: Edinburgh University Press, 1969), pp. 176-94.
  • M. Dorolle, “La valeur des conclusion par l’absurde,” Révue philosophique, vol. 86 (1918), pp. 309-13.
  • T. L. Heath, A History of Greek Mathematics, vol. 2 (Oxford: Clarendon Press, 1921), pp. 488-96.
  • A. Heyting, Intuitionism: An Introduction (Amsterdam, North-Holland Pub. Co., 1956).
  • William and Martha Kneale, The Development of Logic (Oxford: Clarendon Press, 1962), pp. 7-10.
  • J. M. Lee, “The Form of a reductio ad absurdum,” Notre Dame Journal of Formal Logic, vol. 14 (1973), pp. 381-86.
  • Gilbert Ryle, “Philosophical Arguments,” Colloquium Papers, vol. 2 (Bristol: University of Bristol, 1992), pp. 194-211.

Author Information

Nicholas Rescher
Email: rescher+@pitt.edu
University of Pittsburgh
U. S. A.

Russell-Myhill Paradox

The Russell-Myhill Antinomy, also known as the Principles of Mathematics Appendix B Paradox, is a contradiction that arises in the logical treatment of classes and “propositions”, where “propositions” are understood as mind-independent and language-independent logical objects. If propositions are treated as objectively existing objects, then they can be members of classes. But propositions can also be about classes, including classes of propositions. Indeed, for each class of propositions, there is a proposition stating that all propositions in that class are true. Propositions of this form are said to “assert the logical product” of their associated classes. Some such propositions are themselves in the class whose logical product they assert. For example, the proposition asserting that all-propositions-in-the-class-of-all-propositions-are-true is itself a proposition, and therefore it itself is in the class whose logical product it asserts. However, the proposition stating that all-propositions-in-the-null-class-are-true is not itself in the null class. Now consider the class w, consisting of all propositions that state the logical product of some class m in which they are not included. This w is itself a class of propositions, and so there is a proposition r, stating its logical product. The contradiction arises from asking the question of whether r is in the class w. It seems that r is in w just in case it is not.

This antinomy was discovered by Bertrand Russell in 1902, a year after discovering a simpler paradox usually called “Russell’s paradox.” It was discussed informally in Appendix B of his 1903 Principles of Mathematics. In 1958, the antinomy was independently rediscovered by John Myhill, who found it to plague the “Logic of Sense and Denotation” developed by Alonzo Church.

Table of Contents

  1. History and Historical Importance
  2. Formulation and Derivation
  3. Frege’s Response
  4. Possible Solutions
  5. References and Further Reading

1. History and Historical Importance

In his early work (prior to 1907) Russell held an ontology of propositions understood as being mind independent entities corresponding to possible states of affairs. The proposition corresponding to the English sentence “Socrates is wise” would be thought to contain both Socrates the person and wisdom (understood as a Platonic universal) as constituent entities. These entities are the meanings of declarative sentences.

After discovering “Russell’s paradox” in 1901 while working on his Principles of Mathematics, Russell began searching for a solution. He soon came upon the Theory of Types, which he describes in Appendix B of the Principles. This early form of the theory of types was a version of what has later come to be known as the “simple theory of types” (as opposed to ramified type theory). The simple theory of types was successful in solving the simpler paradox. However, Russell soon asked himself whether there were other contradictions similar to Russell’s paradox that the simple theory of types could not solve. In 1902, he discovered such a contradiction. Like the simpler paradox, Russell discovered this paradox by considering Cantor’s power class theorem: the mathematical result that the number of classes of entities in a certain domain is always greater than the number in the domain itself. However, there seems to be a 1-1 correspondence between the number of classes of propositions and the number of propositions themselves. A different proposition can seemingly be generated for each class of propositions, for instance, the proposition stating that all propositions in the class are true. This would mean that the number of propositions is as great as the number of classes of propositions, in violation of Cantor’s theorem.

Unlike Russell’s paradox, this paradox cannot be blocked by the simple theory of types. The simple theory of types divides entities into individuals, properties of individuals, properties of properties of individuals, and so forth. The question of whether a certain property applies to itself does not arise, because properties never apply to entities of their own type. Thus there is no question as to whether the property that a property has just in case it does not apply to itself applies to itself. Classes can only have entities of a certain type: the type to which the property defining the class applies. There can be classes of individuals, classes of classes of individuals, and classes of classes of classes of individuals, etc., but never classes that contain members of different types. Thus, there is no such thing as the class of all classes that are not in themselves. However, on the simple theory of types, propositions are not properties of anything, and thus, they are all in the type of individuals. However, they can include classes or properties as constituents. But consider the property a proposition has just in case it states the logical product of a class it is not in. This property defines a class. This class will be a class of individuals; for any individual, the question arises whether that individual is in the class. However, the proposition stating the logical product of this class is also an individual. Thus, the problematic question is not avoided by the simple theory of types.

Some authors have speculated that this antinomy was the first hint Russell found that what was needed to solve the paradoxes was something more than the simple theory of types. If so, then this antinomy is of considerable importance, as it might represent the first motivation for the ramified theory of types adopted by Russell and Whitehead in Principia Mathematica.

2. Formulation and Derivation

In 1902, when he discovered this paradox, Russell’s logical notation was borrowed mostly from Peano. However, translating into more contemporary notation, the class w of all propositions stating the logical product of a class they are not in, and r, the proposition stating its logical product, are written as follows:

w = {p: (∃m)[(p = (∀q)(q ∈ m ⊃ q)) & ~(p ∈ m)]}
r = (∀q)(q ∈ w ⊃ q)

Because propositions are entities, variables for them in Russell’s logic can be bound by quantifiers and can flank the identity sign. Indeed, Russell also allows complete sentences or formulae to flank the identity sign. If α is some complex formula, then “p = α” is to be understood as asserting that p is the proposition that “α”. Thus, w is defined as the class of propositions p such that there is a class m for which p is the proposition that all propositions q in m are true, and such that p is not in m. The proposition r is then defined as the proposition stating that all propositions in w are true.

The derivation of the contradiction requires certain principles involving the identity conditions of propositions understood as entities. These principles were never explicitly formulated by Russell, but are informally stated in his discussion of the antinomy in the Principles. However, other writers have sought to make these principles explicit, and even to develop a fully formulated intensional logic of propositions based on Russell’s views. The principles relevant for the derivation of the contradiction are the following:

Principle 1: (∀p)(∀q)(∀r)(∀s)[((p ⊃ q) = (r ⊃ s)) → ((p = r) & (q = s))]
Principle 2: [(∀x)A(x) = (∀x)B(x)] → (∀y)[A(y) = B(y)]

The first principle states that identical conditional propositions have identical antecedent and consequent component propositions. The second states that if the universal proposition that everything satisfies open formula A(x) is the same as the universal proposition that everything satisfies open formula B(x), then for any particular entity y, the proposition that A(y) is identical to the proposition that B(y).

Then, from either the assumption that r ∈ w or the assumption ~(r ∈ w), the opposite follows.

Assume:

1. r ∈ w

From (1), by class abstraction and the definition of w:

2. (∃m)[(r = (∀q)(q ∈ m ⊃ q)) & ~(r ∈ m)]

(2) allows us to consider some m such that:

3. (r = (∀q)(q ∈ m ⊃ q)) & ~(r ∈ m)

From the first conjunct of (3) and the definition of r we arrive at:

4. (∀q)(q ∈ w ⊃ q) = (∀q)(q ∈ m ⊃ q)

By (4) and principle 2, then:

5. (∀q)[(q ∈ w ⊃ q) = (q ∈ m ⊃ q)]

Instantiating (5) to r, we conclude:

6. (r ∈ w ⊃ r) = (r ∈ m ⊃ r)

By (6), and principle 1, then:

7. (r ∈ w) = (r ∈ m)

The second conjunct of (3) yields:

8. ~(r ∈ m)

By (7) and (8) and substitution of identicals, we get:

9. ~(r ∈ w)

This contradicts our assumption. However, assume instead:

10. ~(r ∈ w)

By (10) and class abstraction:

11. ~(∃m)[(r = (∀q)(q ∈ m ⊃ q)) & ~(r ∈ m)]

By the rules of the quantifiers and propositional logic, (11) becomes:

12. (∀m)[(r = (∀q)(q ∈ m ⊃ q)) → (r ∈ m)]

Instantiating (12) to w:

13. (r = (∀q)(q ∈ w ⊃ q)) → (r ∈ w)

By (13), the definition of r, and modus ponens:

14. r ∈ w

Thus, from either assumption the opposite follows.
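The diagonal structure of this derivation can be illustrated in a toy finite model. The sketch below is not Russell's system: it assumes a handful of stand-in "propositions", each stating the logical product of exactly one class (as Principles 1 and 2 would guarantee), and all function names are hypothetical. It searches every such assignment for one in which some proposition r states the logical product of the diagonal class w.

```python
from itertools import combinations, product


def all_classes(props):
    """Every class (subset) of the given propositions, as frozensets."""
    return [frozenset(c) for r in range(len(props) + 1)
            for c in combinations(props, r)]


def diagonal_class(states_product_of):
    """The class w: propositions that state the logical product of a class
    they are not themselves members of."""
    return frozenset(p for p, m in states_product_of.items() if p not in m)


def find_r(props):
    """Search every assignment of a class to each proposition for one in
    which some proposition r states the logical product of w.
    Returns such an assignment, or None if there is none."""
    classes = all_classes(props)
    for choice in product(classes, repeat=len(props)):
        states_product_of = dict(zip(props, choice))
        w = diagonal_class(states_product_of)
        if any(m == w for m in states_product_of.values()):
            return states_product_of
    return None


if __name__ == "__main__":
    print(find_r(["p0", "p1", "p2"]))  # prints None
```

The search comes back empty for the same reason the derivation yields a contradiction: a proposition r stating the logical product of w would be in w just in case it is not. In the finite model the conflict is avoided only because some classes have no proposition stating their logical product, which is what the ontology of propositions described above does not allow.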

3. Frege’s Response

Soon after discovering this antinomy, in September of 1902, Russell related his discovery to Gottlob Frege. Although Frege was clearly devastated by the simpler “Russell’s paradox”, which Russell had related to Frege three months prior, Frege was not similarly impressed by the Russell-Myhill antinomy. Russell had formulated the antinomy in Peano’s logical notation, and Frege charged that the apparent paradox derived from defects of Peano’s symbolism.

In Frege’s own way of speaking, a “proposition” is understood simply as a declarative sentence, a bit of language. Frege certainly did not ascribe to propositions the sort of ontology Russell did. However, he thought propositions had both senses and references. He called the senses of propositions “thoughts” and believed that their references were truth-values, either the True or the False. An expression written in his logical language was thought to stand for its reference (though express a thought). When propositions flank the identity sign, e.g. “p = q” this is taken as expressing that the two propositions have the same truth-value, not that they express the same thought.

Thus, Frege was unsatisfied with Russell’s formulation of the antinomy. In Russell’s definition “w = {p: (∃m)[(p = (∀q)(q ∈ m ⊃ q)) & ~(p ∈ m)]}”, the part “p = (∀q)(q ∈ m ⊃ q)” seems to mean not an identity of truth-values, but an identity of thoughts. If this is the case, then “(∀q)(q ∈ m ⊃ q)” must be understood as referring to, rather than simply expressing, a thought. On Frege’s view, this would mean that the expressions that occur in it have indirect reference, i.e. they refer to the thoughts they customarily express. With indirect reference, the variable “m” in that context must be understood not as standing for a class, but as standing for a sense picking out a class. The second occurrence of “m” later on in the definition of w, however, must be understood as referring to a class, not a sense picking out a class. If the two occurrences of “m” do not refer to the same thing, it is extremely problematic that they be bound by the same quantifier. Moreover, Russell’s derivation of the contradiction requires treating the two occurrences of “m” as referring to the same thing. Thus, Frege himself concluded that the antinomy was due to unclarities in the symbolism Russell used to formulate the paradox. He suggested that the antinomy can only be derived in a system that conflates or assimilates sense and reference.

However, it is not clear that Frege’s response is adequate. Frege criticizes only the syntactic formulation of the antinomy in a logical language, not the violation of Cantor’s theorem lying behind the paradox. Frege does not have an ontology of propositions, but he does have an ontology of thoughts. Thoughts, as objectively existing entities, can be members of classes. Moreover, it seems that there will be as many thoughts as there are classes of thoughts. One can generate a different thought for every class, i.e. the thought that everything is in the class or that all thoughts in the class are true. We now consider the class of all thoughts that state the logical product of a class they are not in, and a thought stating the logical product of this class, and arrive at the same contradiction. Frege’s metaphysics seems to have similar difficulties.

It is true that the antinomy cannot be formulated in Frege’s own logical systems. However, this is only because those systems are entirely extensional. In them, it is impossible to refer to thoughts (as opposed to simply express them) and assert their identity–one can only refer to truth-values and assert their identity. However, it appears that if Frege’s logical systems were expanded to include commitment to the realm of sense, to make it possible to refer not only to truth-values and classes, but thoughts and other senses, a version of the antinomy would be provable. In 1951, Alonzo Church developed an expanded logical system based loosely on Frege’s views, which he called “the Logic of Sense and Denotation”. In 1958, John Myhill discovered that the antinomy considered here was formulable in Church’s system. Myhill seems to have rediscovered the paradox independently of Russell. Hence the term, “Russell-Myhill Antinomy.”

4. Possible Solutions

The antinomy results from the following commitments:

(A) The commitment to classes, defined for every property,

(B) The commitment to propositions as intensional entities (or to similar entities, such as Frege’s thoughts),

(C) An understanding of propositions such that there must exist as many propositions as there are classes of propositions; i.e. a different proposition can be generated for every class,

(D) An understanding of propositions and classes such that for every proposition and every class of propositions, the question arises as to whether the proposition falls in the class.

One might hope to solve the antinomy by abandoning any one of these commitments. Let us examine them in turn.

Abandoning (A), the commitment to classes, is very tempting, especially given the other paradoxes of class theory. However, in this context, this option may not be as fruitful as it might appear. Russell himself worked on a “no classes” theory from 1905 through 1907. However, he soon discovered a classless version of the same paradox. Here, rather than considering a class w consisting of propositions, we consider a property W that a proposition p has just in case there is some property F for which p states that all propositions with F are true but which p does not itself have. Thus:

(∀p)[Wp ↔ (∃F)[(p = (∀q)(Fq ⊃ q)) & ~Fp]]

We then define proposition r as the proposition that all propositions with property W are true:

r = (∀q)(Wq ⊃ q)

Then, via a similar deduction to that given above, from the assumption of Wr one can prove ~Wr and vice versa. Thus it does not do to simply abandon classes. One would also have to abandon a robust ontology of properties; perhaps eschewing all of higher-order logic.

One might simply want to abandon (B), the commitment to propositions or Fregean thoughts understood as logical entities. The commitment to logical entities in a Platonic realm has grown less and less popular, especially given the widespread view that logic ought to be without ontological commitment. The challenge would be to abandon such intensional entities while maintaining a plausible account of meaning and intentionality.

However, one might hope to maintain commitment to propositions or thoughts, but attempt to reduce the number posited. This would likely involve denying (C). The Cantorian construction lying at the heart of the antinomy involves the claim that one can generate a different proposition for every class. In the construction given above, this claim is justified by showing that for each class, one can generate a proposition stating its logical product, and showing that, for each class, the proposition so generated is different. To deny this, one could either deny that one can generate such a proposition for each class, or instead, deny that the proposition so generated is different for every class. The first strategy is difficult to justify if one understands propositions and classes as objectively existing entities, independent of mind and language. If a proposition exists for every possible state of affairs, then one such proposition will exist for every class.

However, if one adopts looser identity conditions for propositions or thoughts, one might attempt to take the second approach to denying (C). That is, one would allow that the proposition stating the logical product of one class might be the same proposition as the proposition stating the logical product of a different class. This is perhaps not an easy approach to justify. In the Russellian deduction given above, principles 1 and 2 guarantee that the proposition stating the logical product of one class is always different from the proposition stating the logical product of another class. These principles seem justified by the understanding of propositions as composite entities with a certain fixed structure. Consider principle 1. It states that identical conditional propositions have identical propositions in their antecedent and consequent positions. However, this might be denied if one were to adopt looser identity conditions for propositions. One might, for example, adopt logical equivalence as being a sufficient condition for propositions to be identical. If so, then principle 1 would be unjustified. For example, p → q and ~q → ~p are logically equivalent, yet they obviously need not have the same antecedent propositions. This approach, however, may lead to other difficulties. Often, part of the motivation for positing intensional entities such as propositions or Fregean thoughts is to view them as relata in belief and other intentional states. If one adopts logical equivalence as sufficient for propositions to be identical, this is extremely problematic. The simple proposition p is logically equivalent to the proposition ~(p & ~q) → ~(q → ~p). If we take these two to be the same proposition, then if propositions are relata in belief states, we seemingly must conclude that anyone who believes p also believes ~(p & ~q) → ~(q → ~p). This does not seem to be true.
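The two logical equivalences appealed to here can be confirmed by a brute-force truth table. The following is a minimal sketch; the helper names (implies, equivalent, and the four formula names) are introduced only for illustration.

```python
from itertools import product


def implies(a, b):
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b


def equivalent(f, g):
    """True if f and g agree under every assignment to p and q."""
    return all(f(p, q) == g(p, q) for p, q in product([False, True], repeat=2))


# p -> q versus its contrapositive ~q -> ~p
conditional = lambda p, q: implies(p, q)
contrapositive = lambda p, q: implies(not q, not p)

# p versus ~(p & ~q) -> ~(q -> ~p)
simple = lambda p, q: p
complex_form = lambda p, q: implies(not (p and not q), not implies(q, not p))

if __name__ == "__main__":
    print(equivalent(conditional, contrapositive))  # True
    print(equivalent(simple, complex_form))         # True
```

Both pairs agree on every row of the truth table, so a criterion of identity based on logical equivalence would count each pair as a single proposition despite the obvious difference in structure.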

W. V. Quine is famous for suggesting that intensional entities are “creatures of darkness”, having obscure identity conditions. Here it appears that if the identity conditions of intensions are taken to be too loose, then intensions cannot do many of the things we want of them. If the identity conditions of intensions are too stringent, however, it is difficult to avoid positing so many of them that inconsistency with Cantor’s theorem is a genuine threat.

Lastly, one could maintain commitment to a great number of propositions or thoughts as entities, but block the paradox by suggesting that these entities fall into different logical types. That is, one could deny (D), and suggest instead that the question does not always arise for every proposition and class of propositions whether that proposition is in that class. This is in effect the approach taken with ramified type-theory. In ramified type theory, the type of a formula α depends not only on whether α stands for an individual, a property of an individual, or a property of a property of an individual, etc., but also on what sort of quantification α involves. The core notion is that α cannot involve quantification over, or classes including, entities within a domain that includes the thing that α itself stands for. Consider the proposition r from the antinomy. Recall that r was defined as (∀q)(q ∈ w ⊃ q). Thus, r involves quantification over propositions. In ramified type theory, we would disallow r to fall within the range of the quantifier involved in the definition of r. If a certain proposition involves quantification over a range of propositions, it cannot be included in that range. Thus, we divide the type of propositions into orders. Propositions of the lowest order include mundane propositions such as the proposition that Socrates is bald or the proposition that Hypatia is wise. Propositions of the next highest order involve quantification over, or classes of, propositions of this order, such as the proposition that all such propositions are true, or the proposition that if such a proposition is true, then God believes it, etc. Here, the challenge is to justify the ramified hierarchy as something more than a simple ad hoc dodge of the antinomies, to provide it with solid philosophical foundations. Poincaré’s Vicious Circle Principle is perhaps one way of providing such justification.

Antinomies such as the Russell-Myhill antinomy must be a concern for anyone with a robust ontology of intensional entities. Nevertheless, there may be solutions to the antinomy short of eschewing intensions altogether.

5. References and Further Reading

  • Anderson, C. A. “Semantic Antinomies in the Logic of Sense and Denotation.” Notre Dame Journal of Formal Logic 28 (1987): 99-114.
  • Anderson, C. A. “Some New Axioms for the Logic of Sense and Denotation: Alternative (0).” Noûs 14 (1980): 217-34.
  • Church, Alonzo. “A Formulation of the Logic of Sense and Denotation.” In Structure, Method and Meaning: Essays in Honor of Henry M. Sheffer, edited by P. Henle, H. Kallen and S. Langer. New York: Liberal Arts Press, 1951.
  • Church, Alonzo. “Russell’s Theory of Identity of Propositions.” Philosophia Naturalis 21 (1984): 513-22.
  • Frege, Gottlob. Correspondence with Russell. In Philosophical and Mathematical Correspondence. Translated by Hans Kaal. Chicago: University of Chicago Press, 1980.
  • Klement, Kevin C. Frege and the Logic of Sense and Reference, New York: Routledge, 2002.
  • Myhill, John. “Problems Arising in the Formalization of Intensional Logic.” Logique et Analyse 1 (1958): 78-83.
  • Russell, Bertrand. Correspondence with Frege. In Philosophical and Mathematical Correspondence, by Gottlob Frege. Translated by Hans Kaal. Chicago: University of Chicago Press, 1980.
  • Russell, Bertrand. The Principles of Mathematics. 1903. 2d. ed. Reprint, New York: W. W. Norton & Company, 1996, especially §500.

Author Information

Kevin C. Klement
Email: klement@philos.umass.edu
University of Massachusetts, Amherst
U. S. A.

Logical Paradoxes

A paradox is generally a puzzling conclusion we seem to be driven towards by our reasoning, but which is highly counterintuitive, nevertheless. There are, among these, a large variety of paradoxes of a logical nature which have teased even professional logicians, in some cases for several millennia. But what are now sometimes isolated as “the logical paradoxes” are a much less heterogeneous collection: they are a group of antinomies centered on the notion of self-reference, some of which were known in Classical times, but most of which became particularly prominent in the early decades of last century. Quine distinguished amongst paradoxes such antinomies. He did so by first isolating the “veridical” and “falsidical” paradoxes, which, although puzzling riddles, turned out to be plainly true, or plainly false, after some inspection. In addition, however, there were paradoxes which “produce a self-contradiction by accepted ways of reasoning,” and which, Quine thought, established “that some tacit and trusted pattern of reasoning must be made explicit, and henceforward be avoided or revised.” We will first look, more broadly, and historically, at several of the main conundrums of a logical nature which have proved difficult, some since antiquity, before concentrating later on the more recent troubles with paradoxes of self-reference. They will all be called “logical paradoxes.”

Table of Contents

  1. Classical Logical Paradoxes
  2. Moving to Modern Times
  3. Some Recent Logical Paradoxes
  4. Paradoxes of Self-Reference
  5. A Contemporary Twist
  6. References and Further Reading

1. Classical Logical Paradoxes

The four main paradoxes attributed to Eubulides, who lived in the fourth century BC, were “The Liar,” “The Hooded Man,” “The Heap,” and “The Horned Man” (compare Kneale and Kneale 1962, p. 114).

The Horned Man is a version of the “When did you stop beating your wife?” puzzle. This is not a simple question, and needs a carefully phrased reply, to avoid the inevitable come-back to “I have not.” How is one to understand this denial, as saying you continue to beat your wife, or that you once did but do so no longer, or that you never have, and never will? It is a question of what the “not,” or negation means, in this case. If “stopped beating” means “beat before, but no longer,” then “not stopped beating” covers both “did not beat before” and “continues to beat.” And in that case “I haven’t” is an entirely correct answer to the question, if you in fact did not beat your wife. However, your audience might still need to be taken slowly through the alternatives before they clearly see this. Likewise with the Horned Man, which arises if someone wants to say, for instance, “what you have not lost you still have.” In that case they will maybe have to accept the unwelcome conclusion “I still have horns,” if they admit “I have not lost any horns.” Here, if “lost” means “had, but do not still have” then “not lost” would cover the alternative “did not have in the first place” as well as “do still have” — in which case what you have not lost you do not necessarily still have.
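The point about the scope of “not” can be made concrete by enumerating the cases. The sketch below assumes the reading of “lost” given above (“had, but do not still have”); the function names are hypothetical.

```python
from itertools import product


def lost(had_before, has_now):
    """'Lost' read as: had it before, but does not have it now."""
    return had_before and not has_now


def cases_where_not_lost():
    """All (had_before, has_now) combinations in which nothing was lost."""
    return [(had, has) for had, has in product([False, True], repeat=2)
            if not lost(had, has)]


if __name__ == "__main__":
    for had, has in cases_where_not_lost():
        print(f"had_before={had}, has_now={has}")
```

Among the cases in which nothing was lost is the one where the thing was never had and is not had now, so “I have not lost any horns” does not by itself yield “I still have horns.”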

The Heap is nowadays commonly referred to as the Sorites Paradox, and concerns the possibility that the borderline between a predicate and its negation need not be finely drawn. We would all say that a man with no hairs on his head was bald, and that a man with, say, 10,000 hairs on his head was hirsute, that is not bald, but what about a man with only 1,000 hairs on his head, which are, say, evenly spread? It is not too clear what we should say, although maybe some would still want to say positively “bald,” while others would want to say “not bald.” The learned treatment of this issue, in recent years, has been very extensive, with “the lazy solution” not being the only one favoured, by any means. The lazy solution says that any lack of certainty about what to say is merely a matter of us not having yet decided upon, or even having the need to make up our mind about, a “precisification” of the concept of baldness. There are objectors to this “epistemic” way of seeing the matter, some of whom would prefer to think, for instance (see, e.g. Sainsbury 1995), that there was something essentially “fuzzy” about baldness, so it is a “vague predicate” by the nature of things, instead of just through lack of effort, or need. (For recent work in this area, see, for instance, Williamson 1994, and Keefe 2001).

The Hooded Man is about the concept of knowledge, and in other versions has again been much studied in recent years, as we shall see. In its original version the problem is this: maybe you would be prepared to say that you know your brother, yet surely someone might come in, who was in fact your brother, but with his head covered, so you did not know who it was. One aspect of this paradox is that the verb “know” is ambiguous, and in fact is translated by two separate terms in several other languages than English — French, for instance, has “connaître” and “savoir.” There is the sense of “being acquainted with,” in other words, and the sense of “knowing a fact about something.” Perhaps these two senses are inter-related, but distinguishing them provides one way out of the Hooded Man. For we can distinguish being acquainted with your brother from knowing that someone is your brother. Although you do not know it, you are certainly acquainted with the hooded man, since he is your brother, and you are acquainted with your brother. But that does not entail that you know that the hooded man is your brother, indeed, evidently you do not. We could also say, in that case, that you did not recognize your brother, for the notion of recognition is close to that of knowledge. And that points to another aspect of the problem, and another way of resolving the paradox — showing, in addition, that there needn’t just be one solution, or way out. Thus you might well be able to recognise your brother, but that does not require you can always do so, it merely means you can do better at this than those people who cannot do so. If we re-phrase the case: “you can recognise your brother, but you did not recognise him when he had his head covered,” then there is not really a paradox.

The last of Eubulides’ paradoxes mentioned above was The Liar, which is perhaps the most famous paradox in the “self-reference” family. The basic idea had several variations, even in antiquity. There was, for instance, The Cretan, where Epimenides, a Cretan, says that all Cretans are liars, and The Crocodile, where a crocodile has stolen someone’s child, and says to him “I will return her to you if you guess correctly whether I will do so or not” — to which the father says “You will not return my child”! Indeed a whole host of complications of The Liar have been constructed, especially in the last century, as we shall see. Now in The Cretan there is no real antinomy — it may simply be false that all Cretans are liars; but if someone says just “I am lying,” the situation is different. For if it is true that he is lying then seemingly what he says is false; but if it is false that he is lying then what he is saying may seem to be true. A pedant might say that “lying” was strictly not telling an untruth, but telling merely what one believes to be an untruth. In that case there is not the same difficulty with the person’s remark being true: maybe he is indeed lying, although he does not believe it. The pedant, however, misses the point that his verbal nicety can be circumvented, and the paradox re-constructed in another, indeed many other forms. We shall look in more detail later at the paradox here, in some of its more complicated versions.

Before leaving the ancients, however, we can look at Zeno’s Paradoxes, which not only have a logical interest in their own right, but also have a very close bearing on some paradoxes which appeared later, to do with infinity, and infinitesimals. Zeno’s Paradoxes are primarily about the possibility of motion, but more generally they are about the possibility of specifying the units, or atomic parts, of which either space or time, or indeed any continuum may be thought to be composed.

For, Zeno argued (see, for instance, Owen 1957, and Salmon 1970), if there were such units then they would either have a size, or not have a size. But if they had a size we would have the paradox of The Stadium, while if they had no size we would have the paradox of The Arrow. Thus if runners A and B are approaching one another both at unit speed, then, supposing the units have a finite size, after one time unit they will have each moved one space unit relative to the stadium. But they will have moved two space units relative to each other, which implies that there was a time unit in between when they were just one space unit apart. So the time unit must be divisible after all. On the other hand, if the units of division have no size, then, at any given time, an arrow in flight must occupy a space just equal to itself — for it cannot move within that time. But if so then it is at rest, and the arrow never moves.

That would seem to mean that space and time are divided without limit. But Zeno argued that if space and time were in themselves divided without limit then we would have the paradox of Achilles and the Tortoise. A runner, before he gets to the end of his race would have to get to the half-way point, but then also to the half-way point beyond that, that is the three-quarter-way point, and so on. There would be no limit to the sequence of points he would have to get to, and so there would always be a bit more to be run, and he could never get to the end. Likewise in a competitive race, even, say, between the super-speedy Achilles and a tortoise: Achilles would not be able to catch the tortoise up — so long as the tortoise was given a start. For Achilles has first to get to the tortoise’s original position, but by then the tortoise will be, however fractionally, further on. Now Achilles must always reach the tortoise’s previous position before catching him up. Hence he never catches it up.

Aristotle had a way of resolving Zeno’s Paradoxes which convinced most people until more recent times. Aristotle’s resolution of Zeno’s Paradoxes involved distinguishing between space and time being in themselves divided into parts without limit, and simply being divisible (by ourselves, for instance) without limit. No continuous magnitude, Aristotle thought, is actually composed of parts, since, although it may be divisible into parts without limit, the continuum is given before any such resulting division into parts. In particular, Aristotle denied that there could be any non-finite parts, and so is often called a “Finitist”: non-finite “parts” cannot be parts of space or time, he thought, since no magnitude can be composed of what has no extension. This view came to be challenged later, since it means that an arrow can only be “at rest” if it is at the same place at two separate times — for Aristotle both rest and motion can only be defined over a finite increment of time. But later the notion of an instantaneous velocity came to be accepted, and that includes the case where the velocity is zero.

The puzzle about non-finite parts may remind one of the question which occupied many scholastic theologians in the Middle Ages: how many angels can sit upon a pin? And it is perhaps no accident that the theorist who gave the currently received answer to the general question of how many things without any extension make up a whole which has such an extension was a fervent believer in God. Certainly Aristotle’s Finitism only stayed generally persuasive until the latter end of the nineteenth century, when the theorist in question, Cantor, specified the number of non-finite points in a continuum to most learned people’s satisfaction.

2. Moving to Modern Times

Between the classical times of Aristotle and the late nineteenth century when Cantor worked, there was a period in the middle ages when paradoxes of a logical kind were considered intensively. That was during the fourteenth century. Notable individuals were Paul of Venice, living towards the end of that century, and John Buridan, born just before it. As models of the care, and clarity which is required to extricate oneself from the above kind of difficulties with problem propositions each of these writers will surely stand forever. As an illustration, Buridan discusses “No change is instantaneous” in the following way (Scott 1966, p178):

I prove it, because every change is either in an indivisible instant or it is in a divisible time. But none is in an indivisible instant, since an indivisible instant cannot be given in time, as is always supposed. Hence every change is in divisible time, and every such must be called temporal and not instantaneous.

The opposite is argued, because at least the creation of our intellective soul is instantaneous. For since it is indivisible, it must be made altogether at once, not one part after another. And such creation we call instantaneous. Therefore.

Buridan also discusses “You know the one approaching,” which resembles Eubulides’ Hooded Man (Scott 1966, p178):

I posit the case that you see your father coming from a distance, in such a way that you do not discern whether it is your father or another. Then it is proved, because you do indeed know your father, and he is the one approaching; hence, you know the one approaching. Likewise, you know him who is known by you, but the one approaching is known by you; hence, you know the one approaching. I prove the minor, because your father is known by you and your father is the one approaching; hence, the one approaching is known by you.

The opposite is argued, because you do not know him of whom, if you are asked who he is, you will answer truly “I do not know.” But concerning the one approaching you say this; hence etc.

These two cases are “sophisms” in Buridan’s book on such, Sophismata, and amongst these, in chapter 8, are the “insolubles,” which are the ones involving some form of self-reference. Broadly speaking, that is to say, Buridan made a distinction similar to that mentioned before, between general paradoxes of a logical nature, and “the logical paradoxes.” Thus in his chapter 8 Buridan discusses Eubulides’ Liar Paradox in several forms, for instance as it arises with “Every proposition is false” in the following circumstances (Scott 1966, p191): “I posit the case that all true propositions should be destroyed and false ones remain. And then Socrates utters only this proposition: ‘Every proposition is false’.”

Extended discussion of such cases may seem somewhat academic, but between Buridan’s period, and more recent times, one notable figure started to bring out something of the larger importance of these issues. Indeed, quite generally, sophisms about the nature of change and continuity, about knowledge and its objects, and the ones about the notion of self-reference, amongst many others, have attracted a great deal of very professional attention, once their significance was realised, with techniques of analysis drawn from developments in formal logic and linguistic studies being added to the careful and clear expression, and modes of argument found in the best writers before. The pace of change started to quicken in the later nineteenth century, but the one earlier thinker who will also be mentioned here is Bishop Berkeley, who was active in the early eighteenth. For a history of this period, in connection with the issues which concerned Berkeley, see, for instance, Grattan-Guinness 1980. Berkeley’s argument was with Newton about the foundations of the calculus; he took, amongst other things, a sceptical line about the possibility of instantaneous velocities.

It will be remembered that in the calculation of a derivative the following fraction is considered:

[f(x + δx) – f(x)] / δx,

where δx is a very small quantity. In the elementary case where f(x) = x², for instance, we get

[(x + δx)² – x²] / δx,

and the calculation goes first to

(2xδx + δx²) / δx,

and then to 2x + δx, with δx being subsequently set to zero to get the exact derivative 2x. Berkeley objected that only if δx was not zero could one first divide through by it, and so one was in no position, with the result of that operation, to then take δx to be zero. If it took δx to be zero, Newton’s calculus, it seemed, required the impossible notion of an instantaneous velocity, which, of course, Aristotle had denied in connection with his analysis of Zeno’s Paradoxes. The point was appreciated to some extent elsewhere. For the association between the derivative and motion, initiated by Newton’s use of the term “fluxion,” was largely confined to England, and on the Continent, Leibniz’ contemporaneous development of the calculus had more hold. And that involved the idea that the increment δx was never zero, but merely remained a still finite “infinitesimal.”
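Berkeley’s objection can be made concrete with a small computation. The following Python sketch is only an illustration and is not drawn from Newton or Berkeley: the names difference_quotient and square are invented here, and exact rational arithmetic is assumed so that rounding does not obscure the algebra. For every non-zero δx the quotient equals 2x + δx exactly, whereas at δx = 0 the quotient itself is undefined.

```python
from fractions import Fraction


def difference_quotient(f, x, dx):
    """Return (f(x + dx) - f(x)) / dx; the division needs dx != 0."""
    if dx == 0:
        raise ZeroDivisionError("quotient undefined for dx = 0")
    return (f(x + dx) - f(x)) / dx


def square(x):
    return x * x


x = Fraction(3)
for dx in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    q = difference_quotient(square, x, dx)
    # For f(x) = x^2 the quotient is exactly 2x + dx whenever dx is non-zero.
    assert q == 2 * x + dx
    print(dx, q)
```

The sketch divides first and only afterwards lets δx shrink, which is the order of operations Berkeley insisted on; it never sets δx to zero inside the quotient.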

One way of putting Aristotle’s Finitism is to say that he believed that infinities, such as the possible successive divisions of a line, were only “potential,” not “actual” — an actual infinite division would end up with non-extensional, and so non-finite points. Leibniz, however, had no problem with the notion of an actual infinite division of a line — or with the idea that the result could be a finite quantity. However, while Leibniz introduced finite infinitesimals instead of fluxions, this idea was also questioned as not sufficiently rigorous, and both ideas lost ground to definitions of derivatives in terms of limits, by Cauchy and Weierstrass in the nineteenth century. Leibniz’ notion of finite infinitesimals in fact has been given a more rigorous definition since that time, by Abraham Robinson, and other proponents of “non-standard analysis,” but it was on the previous, nineteenth century theory of real numbers that Cantor worked, before he came to formulate his theory of infinite numbers. Leibniz would not have thought it too sensible to ask how many of his infinitesimals made up the line, but Cantor made much more precise the answer “infinitely many.”

It is necessary to get some idea about the theory of real numbers before we can understand the next logical paradoxes which emerged in this tradition: Russell’s Paradox, Burali-Forti’s Paradox, Cantor’s Paradox, and Skolem’s Paradox. We will look at those in the next section, which will then lead us into twentieth century developments in the area of self-reference. But before all that it should be mentioned how recent discussion of knowledge and its objects, for instance, has become very professionalised, since developed discussion of issues to do with Eubulides’ Hooded Man has been just as dominant in this period.

These issues, it will be remembered, centred on the problem of non-recognition, and in various ways two central cases of this have been given close attention since the end of the nineteenth century. A great deal of other relevant discussion has also gone on, but these two cases are perhaps the most important, historically (see, for example, Linsky 1967). First must be mentioned Frege’s interest in the difficulty of inferring someone believes something about the Evening Star so long as they believe that thing about the Morning Star. In fact the Morning Star is the same as the Evening Star, we now realise, but this was not always recognised, and indeed it is now realised that even the term “star” is a misnomer, both objects being the planet Venus. Still someone ignorant of the astronomical identity, it may be thought, might accept “The Evening Star is in the sky,” but reject “The Morning Star is in the sky.” Quine produced another much discussed case of a similar sort, concerning Bernard J. Ortcutt, a respectable man with grey hair, once seen at the beach. In one location he was taken to be not a spy, in another place he was taken to be a spy, as one might say; but is that quite the best way the situation should be described? Maybe one who does not recognise him can have beliefs about the man at the beach without thereby having those beliefs about the respectable man with grey hair — or even Bernard J. Ortcutt. Certainly Quine thought so, which has not only caused a large scale controversy in itself; it has also led to, or been part of much broader discussions about identity in similar, but non-personal, intensional notions, like modality. Thus, as Quine pointed out, it would not seem to be necessary that the number of the planets is greater than 4, although it is necessary that 9 is greater than 4, and 9 is the number of the planets. A branch of formal logic, Intensional Logic, has been developed to enable a more precise analysis of these kinds of issue.

3. Some Recent Logical Paradoxes

It was developments in other parts of mathematics which were integral to the discovery of the next logical paradoxes to be considered. These were developments in the theory of real numbers, as was mentioned before, but also in Set Theory, and Arithmetic. Arithmetic is now taken to be concerned with a “denumerable” number of objects — the natural numbers — while real numbers are “non-denumerable.” Sets of both infinite sizes can be formed, it is now thought, which is the basis on which Cantor was to give his precise answer “two to the aleph zero” to the question of how many points there are on a line.

The tradition up to the middle of the nineteenth century did not look at these matters in this kind of way. For the natural numbers arise in connection with counting, for instance counting the cows in a field. If there are a number of cows in the field then there is a set of them: sets are collections of such individuals. But with the beef in the field we do not normally talk in these terms: “beef” is a mass noun not a count noun, and so it does not individuate things, merely name some stuff, and, as a result, a number can be associated with the beef in the field only given some arbitrary unit, like a pound, or a kilogram. When there is just some F then there isn’t a number of F’s, although there might be a number of, say, pieces of F. It is the same with continua like space and time, which we can divide into yards, or seconds, or indeed any finite quantity, and that is perhaps the main fact which supports Aristotle’s view that any division of such a continuum is merely potential rather than actual, and inevitably finite both in the unit used and in the number of them in a whole.

But continua from Cantor onwards have been seen as composed of non-finite individuals. And that is not the only change. For the number of individuals in some set of individuals (whether cows, or the non-finite elements in beef) has also been taken to be possibly non-finite, with a whole containing those individuals being then still available: the infinite set of them. We now commonly have the idea that there may be infinite sets first of finite entities, which will then be “countable” or “denumerable,” but also there will be sets of non-finite, infinitesimal entities, which will be “uncountable,” or “non-denumerable.”

It is important to appreciate the grip that these new ideas had on the late nineteenth century generation of mathematicians and logicians, since it came to seem, as a result of these sorts of changes, that everything in mathematics was going to be explainable in terms of sets: Set Theory looked like it would become the entire foundation for mathematics. Only once one has appreciated this expectation, which the vanguard of theorists uniformly had, can one realise the very severe jolt to that society which came with the discovery of Russell’s Paradox, and several others at much the same time, around the turn of the century. For Russell’s Paradox showed that not everything could be a set.

If we write “x is F” as “Fx”—as came to be common in this same period—then the set of F’s is written

{x|Fx},

and to say a is F, that is Fa, would then seem to be to say that a belonged to this set, that is

a ∈ {x|Fx},

where the symbol “∈” represents “is a member of.”

It therefore seems plausible to enunciate this as a general principle,

for all y: y ∈ {x|Fx} if and only if Fy,

which is symbolised in contemporary logic,

(y)(y ∈ {x|Fx} iff Fy).

But if the result held for all predicates “F” then we could say, for any “F”

there is a z such that: (y)(y ∈ z iff Fy),

which is now formalised

(∃z)(y)(y ∈ z iff Fy).

In the foundations of Arithmetic which Frege described in his major logical works Begriffsschrift, and Grundgesetze, this principle is a major axiom (Kneale and Kneale 1962, Ch 8), but Russell found it was logically impossible, since if one takes for “Fy” the specific predicate “y does not belong to y,” that is “¬ y ∈ y” then it requires

(∃z)(y)(y ∈ z iff ¬ y ∈ y),

wherefrom, given the above meanings of “(∃z)” and “(y)”, we get the contradiction

z ∈ z iff ¬ z ∈ z,

that is z is a member of itself if and only if it is not a member of itself. As a result of this paradox which Russell discovered, the theory of sets was considerably altered, and limits were put on Frege’s axiom, so that, for instance, either it defined merely subsets of known sets (Zermelo’s theory), or allowed one to discriminate sets from other entities — usually called “proper classes” (von Neumann’s theory). In the latter case those things which are not members of themselves form a proper class but not a set, and proper classes cannot be members of anything.
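The logical impossibility Russell found does not depend on the sets being infinite, and this can be checked mechanically on a small scale. The Python sketch below is an illustration under stated assumptions: it treats “membership” as an arbitrary two-place relation on a three-element domain, and the function names are invented for the purpose. It enumerates every such relation and looks for an element z with y ∈ z iff ¬ y ∈ y for all y.

```python
from itertools import product


def has_russell_element(domain, member):
    """member[(y, z)] is True when y ∈ z.

    Return True if some z satisfies (y ∈ z) iff not (y ∈ y) for every y.
    """
    return any(
        all(member[(y, z)] == (not member[(y, y)]) for y in domain)
        for z in domain
    )


def count_relations_with_russell_element(n):
    """Count the membership relations on n objects that contain a Russell element."""
    domain = range(n)
    pairs = [(y, z) for y in domain for z in domain]
    count = 0
    for bits in product([False, True], repeat=len(pairs)):
        member = dict(zip(pairs, bits))
        if has_russell_element(domain, member):
            count += 1
    return count


# All 2**9 = 512 relations on three objects are examined; none qualifies,
# because the condition already fails at the instance y = z.
print(count_relations_with_russell_element(3))
```

The search comes back empty for the reason given in the text: taking y to be z itself demands that z ∈ z iff ¬ z ∈ z. A finite check of this kind illustrates the point but is of course no substitute for the general argument.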

But there were other reasons why it came to be realised that sets could not always be formed, following the discovery of Burali-Forti’s and Cantor’s Paradoxes. Burali-Forti’s Paradox is about certain sets called “ordinals,” because of their connection with the ordinals of ordinary language, that is “first,” “second,” “third,” etc. The sets which are ordinals are so ordered that each one is a member of all the following ones, and so, with no limit envisaged to the sets which could be formed, it seemed possible to prove that any succession of such ordinals would themselves be members of a further ordinal – which would have to be distinct from each of them. The trouble came in considering the totality of all ordinals, since that would mean that there would have to be a further distinct ordinal not in this totality, and yet it was supposed to be the totality of all ordinals. A very similar contradiction is reached in Cantor’s Paradox.

For, for finite sets of finite entities it is easy to prove Cantor’s Theorem, namely that the number of members of a set is strictly less than the number of its subsets. If one forms a set of the subsets of a given set then one produces the “power set” of the original set, so another way of stating Cantor’s Theorem is to say that the number of members of a set is strictly less than the number of members of its power set. Cantor extended this theorem to his infinite sets as well – although there was at least one such set he realised it obviously could not apply to, namely the set of everything, sometimes called the universal set. For the set of its subsets clearly could not have a greater number than the number of things in the universal set itself, since that contained everything. This was Cantor’s Paradox, and his resolution of it was to say that such an infinity was “inconsistent,” since it could not be consistently numbered. He thought, however, that only the size of infinite sets had to be limited, assuming that lesser infinities could be consistently numbered, and nominating, for a start, “aleph zero” as the number, or more properly the “power” of the natural numbers (Hallett 1984, p175). In fact an earlier paradox about the natural numbers had suggested that even they could not be consistently numbered: for they could be put into 1 to 1 correlation with the even numbers, for one thing, and yet there were surely more of them, since they included the odd numbers as well. This paradox Cantor took to be avoided by his definition of the power of a set (N.B. not the power set of a set): his definition merely required two sets to be put into 1 to 1 correlation in order for them to have the same power. Thus all infinite sequences of natural numbers have the same power, aleph zero.
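The finite case of Cantor’s Theorem, and the 1 to 1 correlation between the natural numbers and the even numbers, can both be illustrated with a short Python sketch. It is offered only as an illustration: power_set is a helper written for this purpose, and a finite check says nothing by itself about infinite sets.

```python
from itertools import combinations


def power_set(items):
    """Return all subsets of a finite collection, as frozensets."""
    items = list(items)
    return [
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    ]


# Finite case of Cantor's Theorem: a set with n members has 2**n subsets.
for n in range(6):
    members = set(range(n))
    subsets = power_set(members)
    assert len(subsets) == 2 ** n
    assert len(members) < len(subsets)

# An initial segment of the 1 to 1 correlation n <-> 2n between the
# natural numbers and the even numbers.
correlation = [(n, 2 * n) for n in range(5)]
print(correlation)
```

The correlation never runs out of even numbers to pair with, which is why, on Cantor’s definition of power, the two collections count as equally numerous even though one is a proper part of the other.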

But the number of points in a line was not aleph zero, it was two to the aleph zero, and Cantor produced several proofs that these were not the same. The most famous was his diagonal argument which seems to show that there must be orders of infinity, and specifically that the non-denumerably infinite is distinct from the denumerably infinite. For belief in real numbers is equivalent to belief in certain infinite sets: real numbers are commonly understood simply in terms of possibly-non-terminating decimals, but this definition can be derived from the more theoretical ones (Suppes 1972, p189). But can the decimals between, say, 0 and 1 be listed? Listing them would make them countable in the special sense of this which has been adopted, which amongst other things does not require there to be a last item counted. The natural numbers are countable in this sense, as before, and any list, it seems, can be indexed by the ordinal numbers. Suppose, however, that we had a list in which the n-th member was of this form:

aₙ = 0.aₙ₁aₙ₂aₙ₃aₙ₄…,

where aₙᵢ is a digit between 0 and 9 inclusive. Then that list would not contain the “diagonal” decimal aₘ defined by

aₘₙ = 9 – aₙₙ,

since for n = m this equation is false, if only whole digits are involved. This seems to show that the totality of decimals in any continuous interval cannot be listed, which implies that there are at least two separate orders of infinity.
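The diagonal construction can be sketched in Python on a finite fragment of such a list. This is only an illustration: the four rows of digits are made up, the name diagonal_digits is invented here, and a real list would be infinite in both directions. The rule used is the one above, namely that the n-th digit of the new decimal is 9 minus the n-th digit of the n-th listed decimal.

```python
def diagonal_digits(rows):
    """rows[n][i] is the i-th decimal digit of the n-th listed decimal.

    Return the first len(rows) digits of the diagonal decimal, where
    digit n is 9 - rows[n][n].
    """
    return [9 - rows[n][n] for n in range(len(rows))]


# A made-up fragment of a list of decimals between 0 and 1.
listing = [
    [1, 4, 1, 5],
    [7, 1, 8, 2],
    [3, 3, 3, 3],
    [0, 0, 0, 0],
]

diagonal = diagonal_digits(listing)

# The diagonal decimal differs from the n-th listed decimal at place n,
# since no whole digit d satisfies d = 9 - d.
for n, row in enumerate(listing):
    assert diagonal[n] != row[n]

print(diagonal)
```

However the fragment is extended, the same check succeeds at every place, which is the sense in which the diagonal decimal cannot appear anywhere in the list.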

Of course, if there were no infinite sets then there would be no infinite numbers, countable or uncountable, and so an Aristotelian would not accept the result of this proof as a fact. Discrete things might, at the most, be potentially denumerable, for him. But the difficulty with the result extends even to those who accept that there are infinite sets, because of another paradox, Skolem’s Paradox, which shows that all theories of a certain sort must have a countable model, that is must be true in some countable domain of objects. But Set Theory is one such theory, and in it, supposedly, there must be non-countable sets. In fact a denumerable model for Set Theory has recently been specified, by Lavine (Lavine 1994), so how can Cantor’s diagonal proof be accommodated? Commonly it is accommodated by saying that, within the denumerable model of Set Theory, non-denumerability is represented merely by the absence of a function which can do the indexing of a set, that is produce a correlation between the set and the ordinal numbers. But if that is the case, then maybe the difficulty of listing the real numbers in an interval is comparable. Certainly given a list of real numbers with a functional way of indexing them, then diagonalisation enables us to construct another real number. But maybe there still might be a denumerable number of all the real numbers in an interval without any possibility of finding a function which lists them, in which case we would have no diagonal means of producing another. We seem to need a further proof that being denumerable in size means being listable by means of a function.

4. Paradoxes of Self-Reference

The possibility that Cantor’s diagonal procedure is a paradox in its own right is not usually entertained, although a direct application of it does yield an acknowledged paradox: Richard’s Paradox. Consider for a start all finite sequences of the twenty-six letters of the English alphabet, the ten digits, a comma, a full stop, a dash and a blank space. Order these expressions according, first, to the number of symbols, and then lexicographically within each such set. We then have a way of identifying the n-th member of this collection. Now some of these expressions are English phrases, and some of those phrases will define real numbers. Let E be the sub-collection which does this, and suppose we can again identify the n-th place in this, for each natural number n. Then the following phrase, as Richard pointed out, would seem to define a real number which is not defined in the collection: “The real number whose whole part is zero, and whose n-th decimal place is p plus 1 if the n-th decimal of the real number defined by the n-th member of E is p, and p is neither 8 nor 9, and is simply one if this n-th decimal is eight or nine.” But this expression is a finite sequence of the previously described kind.
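The purely syntactic part of Richard’s construction, the ordering of all finite expressions, is easy to make precise, as the following Python sketch suggests. It rests on simplifying assumptions: a two-letter alphabet stands in for the forty symbols, the empty sequence is left out, and the names shortlex and richard_digit are invented here. The second function encodes only the digit rule quoted in Richard’s phrase; nothing in the sketch decides which expressions define real numbers, which is the semantic step.

```python
from itertools import count, islice, product


def shortlex(alphabet):
    """Yield every non-empty finite string over `alphabet`: shorter strings
    first, and strings of equal length in the order induced by `alphabet`."""
    for length in count(1):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


def richard_digit(p):
    """Digit rule from Richard's phrase: p + 1, or 1 when p is 8 or 9."""
    return 1 if p in (8, 9) else p + 1


# The first few members of the ordered collection over a toy alphabet.
first_six = list(islice(shortlex("ab"), 6))
print(first_six)

# The rule always changes the digit it is applied to.
assert all(richard_digit(p) != p for p in range(10))
```

Because each length contributes only finitely many strings, every expression receives a definite position, so the n-th member of the whole collection is always identifiable.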

One significant fact about this paradox is that it is a semantical paradox, since it is concerned not just with the ordered collection of expressions (which is a syntactic matter), but also their meaning, that is whether they refer to real numbers. It is this which possibly makes it unclear whether there is a specifiable list of expressions of the required kind, since while the total list of expressions can certainly be straightforwardly ordered, whether some expression defines a real number is maybe not such a clear cut matter. Indeed, it might be concluded, just from the very fact that a paradox ensues, as above, that whether some English phrase defines a real number is not always entirely settleable. In Borel’s terms, it cannot be decided effectively (Martin-Löf 1970, p44). Another very similar semantical paradox with this same aspect is Berry’s Paradox, about “the least integer not nameable in fewer than nineteen syllables.” The problem here is that that very phrase has less than nineteen syllables in it, and yet, if it names an integer, that integer would have to be not nameable in less than nineteen syllables. So is there a definite set of English expressions which name integers not nameable in less than nineteen syllables?

If some sort of fuzziness was the case then there would be a considerable difference between such paradoxes and the previous paradoxes in logical theory like Russell’s, Burali-Forti’s and Cantor’s, for instance. Indeed it has been common since Ramsey’s discussion of these matters, in the 1920s, to divide the major logical paradoxes into two: the semantic or linguistic on the one hand, and the syntactic or mathematical on the other. Mackie disagreed with Ramsey to a certain extent, although he was prepared to say (Mackie 1973, p262):

The semantical paradoxes…can thus be solved in a philosophical sense by demonstrating the lack of content of the key items, the fact that various questions and sentences, construed in the intended way, raise no substantial issue. But these are comments appropriate only to linguistic items; one would expect that this method would apply only to the semantic paradoxes, and not to “syntactic” ones like Russell’s class paradox, which are believed to involve only (formal) logical and mathematical elements.

Russell himself opposed the distinction, formulating his famous “Vicious Circle Principle” which, he held, all the paradoxes of self-reference violated. Specifically he held that statements about all the members of certain collections were nonsense (compare Haack 1978, p141):

Whatever involves all of a collection must not be one of a collection, or, conversely, if, provided a certain collection had a total it would have members only definable in terms of that total, then the said collection has no total.

But this, seemingly, would rule out specifying, for instance, a man as the one with the highest batting average in his team, since he is then defined in terms of a total of which he is a member. It effectively imposes a ban on all forms of self-reference, and so Russell’s uniform solution to the paradoxes is usually thought to be too drastic. Some might say “this may be using a cannon against a fly, but at least it stops the fly!”; but it also devastates too much else in the vicinity.

A more recent theorist to oppose Ramsey’s distinction has been Priest. In fact he has tried to prove that all the main paradoxes of self-reference have a common structure using a further insight of Russell’s, which he calls “Russell’s Schema” (Priest 1994, p27). This pre-dates Russell’s attachment to the Vicious Circle Principle, but Priest has shown that, when adapted and applied to all the main paradoxes, it matches the reasoning which leads to the contradiction in each one of them. This approach, however, presumes that semantical notions, like definability, designation, truth, and knowledge can be construed in terms of mathematical sets, which seems to be really the very supposition which Ramsey disputed.

Grelling’s Paradox also makes this supposition questionable. It is a self-referential, semantical paradox resembling, to some extent, Russell’s Paradox, and concerns the property which an adjective has if it does not apply to itself. Thus

“large” is not large,
“multi-syllabled” is multi-syllabled,
“English” is English,
“French” is not French.

Let us use the term “heterological” for the property of being non-self-applicable, so we can say that “large” and “French” are heterological, for instance, and we can write as a general definition

“x” is heterological if and only if “x” is not x.

But clearly, substituting “heterological” for “x” produces a contradiction. Does this contradiction mean there is no such concept as heterologicality, just as there is no such set as the Russell set? Goldstein has recently argued that this is so (Goldstein 2000, p67), following a tradition Mackie calls “the logical proof approach” (Mackie 1973, p254f), to which Ryle was a notable contributor (Ryle 1950-1). The point is made even more plausible given the very detailed logical analysis which Copi provided (Copi 1973, p301).

Copi first introduces the definition

Hs =df (∃F)(sDesF & (P)(sDesP iff P=F) & ¬Fs),

in which “¬” abbreviates “not”, and “Des” refers to the relation between a verbal expression and the property it designates. Thus “sDesF” reads: s designates F. Copi’s proof of the contradiction then goes in the following way. First, H“H” entails in turn

(∃F)(“H”DesF & (P)(“H”DesP iff P=F) & ¬F“H”) – by substitution in the definition,

“H”DesF & (P)(“H”DesP iff P=F) & ¬F“H” – by taking the case thus said to exist,

(“H”DesH iff H=F) & ¬F“H” – by substitution in the “for all P”,

H=F & ¬F“H” – by assuming “H” designates H,

Then ¬H“H” entails in turn

(F)¬(“H”DesF & (P)(“H”DesP iff P=F) & ¬F“H”) – since “¬(∃F)” is equivalent to “(F)¬”,

¬(“H”DesH & (P)(“H”DesP iff P=H) & ¬H“H”) – substituting “H” for “F”,

¬((P)(“H”DesP iff P=H) & ¬H“H”) – assuming “H” designates H,

H“H” – assuming (P)(“H”DesP iff P=H).

To get the contradiction

H“H” iff ¬H“H”,

therefore, one has to be assured that there is one and only one property which “H” designates. And Copi gives no proof of this.

The Liar Paradox is a further self-referential, semantical paradox, perhaps the major one to come down from antiquity. And one may very well ask, with respect to

What I am now saying is false,

for instance, whether this has any sense, or involves a substantive issue, as Mackie would have it (see also Parsons 1984). But there is a well known further paradox which seems to block this dismissal. For if we allow, as well as “true” and “false” also “meaningless,” then it might well seem that The Strengthened Liar arises, which, in this case, could be expressed

What I am now saying is false, or meaningless.

If I am saying nothing meaningful here, then seemingly what I say is true, which seems to imply that it does have meaning, after all.

Let us, therefore, look at some other notable ways of trying to escape even the Unstrengthened Liar. The Unstrengthened Liar comes in a whole host of variations, for instance:

This very sentence is false,

or

Some sentence in this book is false,

if that sentence is the only sentence in a book, say in its preface. It also arises with the following pair of sentences taken together:

The following sentence is false,

The previous sentence is true;

and in a case of Buridan’s,

What Plato is saying is false,

What Socrates is saying is true,

if Socrates says the first, while Plato says the second. There are many other variations, some of which we shall look at later.

The semantical concepts in these paradoxes are truth and falsity, and the first major contribution to our understanding of these, in the twentieth century, was by Tarski. Tarski took truth and falsity to be predicates of sentences, and discussed at length the following example of his famous “T-scheme”:

“snow is white” is true if and only if snow is white.

He believed that

Ts iff p,

holds, quite generally, if “s” is some phrase naming, or referring to, the sentence “p”: for instance, as above, that same sentence in quotation marks, or a number in some system of numbering, which was the way Gödel handled such matters. Tarski’s analysis of truth involved denying that there could be “semantic closure”, that is, the presence in a language of the semantic concepts relating to expressions in that language (Tarski 1956, p402):

The main source of the difficulties met with seems to lie in the following: it has not always been kept in mind that the semantical concepts have a relative character, that they must always be related to a particular language. People have not been aware that the language about which we speak need by no means coincide with the language in which we speak. They have carried out the semantics of a language in that language itself and, generally speaking, they have proceeded as though there was only one language in the world. The analysis of the antinomies mentioned shows, on the contrary, that the semantical concepts simply have no place in the language to which they relate, that the language which contains its own semantics, and within which the usual laws of logic hold, must inevitably be inconsistent.

This conclusion, which requires that any consistent language be incomplete, Tarski derived directly by considering The Liar, since “This is false” seems to provide a self-referential “s” for which

s = “¬Ts”,

hence, substituting in the following example of the T-scheme

T”¬Ts” iff ¬Ts,

we get

Ts iff ¬Ts.
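That a biconditional of this form cannot hold in classical logic can be checked mechanically. The following Lean 4 fragment is a minimal sketch: the proposition `p` stands in for “Ts”, and the theorem name is a label introduced here, not anything found in Tarski.

```lean
-- From "p iff not p" a contradiction follows, whatever p is.
theorem liar_biconditional_absurd (p : Prop) (h : p ↔ ¬p) : False :=
  -- First show ¬p: if p held, the biconditional would give ¬p, refuting p.
  have hnp : ¬p := fun hp => (h.mp hp) hp
  -- Then the biconditional turns ¬p back into p, contradicting ¬p.
  hnp (h.mpr hnp)
```

The proof uses nothing about truth or sentences, which is why the weight of the paradox falls on the premises that deliver the biconditional, namely the self-referential identity and the T-scheme.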

To block this conclusion Tarski held that the self-reference seemingly available in the identity

s = “¬Ts”

was just not consistently available, and specifically that, if one used the sentence “this is false” then the referent of “this” should not be that very sentence itself – on pain of the evident contradiction. Using “this is false” coherently meant speaking about an object language, but in another, higher, language – the meta-language. Of course the semantical concepts applicable in this meta-language likewise could not be sensibly defined within it, so generally there was supposed to be a whole hierarchy of languages.

It seems difficult to apply this kind of stratification of languages to the way we ordinarily speak, however. Indeed, to assert that truth can attach to indexical sentences, like “What I am now saying is false,” would seem to be flying in the face of a very clear truth (Kneale 1972, p234f). Consider, further, this variation of the Plato-Socrates case above (compare Haack 1978, p144), where Jones says

All of Nixon’s utterances about Watergate are false,

and Nixon says

All of Jones’ utterances about Watergate are true.

If, following Tarski, we were to try to assign levels of language to this pair of utterances, then how could we do it? It would seem that Jones’ utterance would have to be in a language higher, in Tarski’s hierarchy, than any of Nixon’s; yet, contrariwise, Nixon’s would have to be higher than any of Jones’.

Martin has produced a typology of solutions to the Liar which locates Tarski’s way out as one amongst four possible, general diagnoses (Martin 1984, p4). The two principles which Martin takes to categorise the Liar we have just seen, namely

(S) There is a sentence which says of itself only that it is not true,

and

(T) Any sentence is true if and only if what it says is the case.

Tarski, in these terms, took claim (S) to be incorrect. But one also might claim that (T) is incorrect, maybe because there are sentences without a truth value, being meaningless, or lacking in content in some other way, as is held by the theorists mentioned before. A third general diagnosis claims that both (S) and (T) are correct, and indeed incompatible, but proceeds to some “rational reconstruction” of them so that the incompatibility is removed. Fourthly it is possible to argue that (S) and (T) are correct, but really compatible. Martin sees this happening as a result of some possible ambiguity in the terms used in the two principles.

We can isolate a further, fifth option, although Martin does not consider it. That option is to hold that both (S) and (T) are incorrect, as is done by the tradition which holds that it is not sentences which are true or false. One cannot say, for instance, that the sentence “that is white” is true, in itself, since what is spoken of might vary from one utterance of the sentence to another. Following the second world war, because of this sort of thing, it became more common to think of semantical notions as attached not to sentences and words, but to what such sentences and words mean (Kneale and Kneale 1962, p601f). On this understanding it is not specifically the sentence “that is white” but what is expressed by this sentence, that is the statement or proposition made by it, which may be true. But it was shown by Thomason, following work by Montague, that the same sorts of problems can be generated even in this case. We can create self-referential paradoxes to do with statements and propositions which again cannot be obviously escaped (Thomason 1977, 1980, 1986). And the problems are not just confined to the semantics of truth and falsity, but also arise in just the same way with more general semantical notions like knowledge, belief, and provability. In recent years, the much larger extent of the problems to do with self-reference has, in this way, become increasingly apparent.

Asher and Kamp sum up (Asher and Kamp 1989, p87):

Thomason argues that the results of Montague (1963) apply not only to theories in which attitudinal concepts, such as knowledge and belief, are treated as predicates of sentences, but also to “representational” theories of the attitudes, which analyse these concepts as relations to, or operations on (mental) representations. Such representational treatments of the attitudes have found many advocates; and it is probably true that some of their proponents have not been sufficiently alert to the pitfalls of self-reference even after those had been so clearly exposed in Montague (1963)… To such happy-go-lucky representationalists, Thomason (1980) is a stern warning of the obstacles that a precise elaboration of their proposals would encounter.

Thomason mentions specifically Fodor’s “Language of Thought” in his work; Asher and Kamp themselves show that modes of argument similar to Thomason’s can be used even to show that Montague’s Intensional Semantics has the same problems. Asher and Kamp go on to explain the general method which achieves these results (Asher and Kamp 1989, p87):

Thomason’s argument is, at least on the face of it, straightforward. He reasons as follows: Suppose that a certain attitude, say belief, is treated as a property of “proposition-like” objects – let us call them “representations” – which are built up from atomic constituents in much the way that sentences are. Then, with enough arithmetic at our disposal, we can associate a Gödel number with each such object and we can mimic the relevant structural properties of and relations between such objects by explicitly defined arithmetical predicates of their Gödel numbers. This Gödelisation of representations can then be exploited to derive a contradiction in ways familiar from the work of Gödel, Tarski and Montague.

The only ray of hope Asher and Kamp can offer is (Asher and Kamp 1989, p94): “Only the familiar systems of epistemic and doxastic logic, in which knowledge and belief are treated as sentential operators, and which do not treat propositions as objects of reference and quantification, seem solidly protected from this difficulty.” But see on these, for instance, Mackie 1973, p276f, although also Slater 1986.

Gödel’s famous theorems in this area are, of course, concerned with the notion of provability, and they show that if this notion is taken as a predicate of certain formulas, then in any standard formal system which has enough arithmetic to handle the Gödel numbers used to identify the formulas in the system, certain statements can be constructed which are true, but are not provable in the system, if it is consistent. What is also true, and even provable in such a system is that, if it is consistent then (a) a certain specific self-referential formula is not provable in the system, and (b) the consistency of the system is not provable in the system. This means the consistency of the system cannot be proved in the system unless it is inconsistent, and it is commonly believed that the appropriate systems are consistent. But if they are consistent then this result shows they are incomplete, that is there are truths which they cannot prove.

The paradoxical thing about Gödel’s Theorems is that they seem to show that there are things we can ourselves prove, in the natural language we use to talk about formal systems, but which a formal system of proof cannot prove. And that fact has been fed into the very large debate about our differences from, even superiority over mechanisms (see e.g. Penrose 1989). But if we consider the way many people would argue about, for instance,

this very sentence is unprovable,

then our abilities as humans might not seem to be too great. For many people would argue:

If that sentence is provable then it is true, since provability entails truth; but that makes it unprovable, which is a contradiction. Hence it must be unprovable. But by this process we seem to have proved that it is unprovable – another contradiction!

So, unless we can extricate ourselves from this impasse, as well as the many others we have looked at, we would not seem to be too bright. Or does this sort of argument show that there is, indeed, no escape? Some people, of course, might want to follow Tarski, and run from “natural language” in the face of these conclusions. For Gödel had no reason to conclude, from his theorems, that the formal systems he was concerned with were inconsistent. However, his formal arguments differ crucially from that just given, since there is no proof within his systems that “provability” entails “truth.” There is no doubt that what we have been dealing with are real paradoxes!

The intractability of the impasse here, and the failure of many great minds to make headway with it, has led some theorists to believe that indeed there is no escape. Notable amongst these is Priest (compare Priest 1979), who believes we must now learn to accept that some contradictions can be true, and adjust our logic accordingly. This is very much in line with the expectation we initially noted Quine had, that maybe “some tacit and trusted pattern of reasoning must be made explicit and henceforward be avoided or revised” (Quine 1966, p7). The particular law which “paraconsistent” logicians mainly doubt is “ex impossibile quodlibet”, that is “from an impossibility anything follows,” or

(p&¬p) ⊢ q.

It is thought that, if this traditional rule were removed from logic then, at least, any true contradictions we find, e.g. anything of the form “p&¬p” which we deduce from some paradox of self-reference, will not have the wholesale repercussions that it otherwise would have in traditional logic. Objectors to paraconsistency might say that the premise of this rule could not arise, so its “explosive” repercussions would never eventuate. But there is the broader, philosophical question, as well, about whether a switch to a different logic does not just change the subject, leaving the original problems unattacked. That depends on how one views “deviant logics.” There are reasons to believe that deviant logics are not rivals of traditional logic, but merely supplementary to, or extensions of it (Haack, 1974, Pt 1, Ch1). For if one drops the above rule then hasn’t one merely produced a new kind of negation? Are “p” and “¬p” still contradictory, if they can, somehow, both be true? And if “p” and “¬p” are not contradictory, then what is contradictory to “p”, and couldn’t we formulate the previous paradoxes in terms of it? It seems we may have just turned our backs on the real difficulty.
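For reference, the rule that paraconsistent logicians reject is immediately derivable in a classical (indeed, intuitionistic) setting. A minimal Lean 4 sketch, in which the theorem name is merely a label chosen here:

```lean
-- Classical "explosion": from p together with ¬p, any q whatever follows.
theorem explosion (p q : Prop) (h : p ∧ ¬p) : q :=
  absurd h.left h.right
```

A paraconsistent system must therefore give up whatever principle licenses the step marked `absurd`, and the question raised above is whether the connective that results still deserves to be called negation.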

5. A Contemporary Twist

There have been developments, in the last few years, which have shown that the previous emphasis on paradoxes involving self-reference was to some extent misleading. For a family of paradoxes, with similar levels of intractability, has been discovered, whose members are not reflexive in this way.

It was mentioned before that a form of the Liar paradox could be derived in connection with the pair of statements

What Plato is saying is false,

What Socrates is saying is true,

when Socrates says the former, and Plato the latter. For, if what Socrates is saying is true, then, according to the former, what Plato is saying is false, but then, according to the latter, what Socrates is saying is false. On the other hand, if what Socrates is saying is false then, according to the former, what Plato is saying is true, and then, according to the latter, what Socrates is saying is true. Such a paradox is called a “liar chain”; they can be of any length; and with them we are already out of the really strict “self-reference” family, although, by passing along through the chain what Socrates is saying, it will eventually come back to reflect on itself.
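The case analysis just given can be confirmed by brute force. The sketch below assumes a classical, two-valued setting in which each utterance is true exactly when what it says is the case; the function name is hypothetical.

```python
from itertools import product


def consistent_valuations():
    """List the (socrates, plato) truth values that respect both utterances."""
    found = []
    for socrates, plato in product([True, False], repeat=2):
        # Socrates says: "What Plato is saying is false."
        socrates_ok = socrates == (not plato)
        # Plato says: "What Socrates is saying is true."
        plato_ok = plato == socrates
        if socrates_ok and plato_ok:
            found.append((socrates, plato))
    return found


valuations = consistent_valuations()
```

None of the four candidate assignments survives, so `valuations` is empty, which is the two-link version of the impasse.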

It seems, however, that, if one creates what might be called “infinite chains” then there is not even this attenuated form of self-reference (though see Beall, 2001). Yablo asked us to consider an infinite sequence of sentences of which the following is representative (Yablo 1993):

(S_i) For all k > i, S_k is untrue.

Sorensen’s “Queue Paradox” is similar, and can be obtained by replacing “all” by “some” here, and considering the series of thoughts of some students in an infinite queue (Sorensen 1998). Suppose that, in Yablo’s case, S_n is true for some n. Then S_{n+1} is untrue, as are all subsequent statements; but the latter fact makes S_{n+1} true, giving a contradiction. Hence for no n is S_n true. But that means that S_1 is true, S_2 is true, etc; in fact it means every statement is true, which is another contradiction. In Sorensen’s case, if some student thinks “some of the students behind me are now thinking an untruth” then this cannot be false, since then all the students behind her are thinking the truth – although that means that some student behind her is thinking an untruth, a contradiction. So no student is thinking an untruth. But if some student is consequently thinking a truth, then some student behind that student is thinking an untruth, which we know to be impossible. Indeed every supposition seems impossible, and we are in the characteristic impasse.
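It may help to see that the infinity of the sequence is doing real work in Yablo's case. The following sketch is an illustration added here, not part of Yablo's paper: it cuts the sequence off after finitely many sentences (the function name and the truncation are assumptions of the sketch) and searches, in a classical two-valued setting, for truth assignments under which each sentence is true exactly when all later sentences are untrue.

```python
from itertools import product


def yablo_truncation(n):
    """Consistent truth assignments for the first n Yablo sentences only.

    Position i holds the value of sentence i+1, which is taken to be true
    exactly when every later sentence in the truncated list is untrue.
    """
    consistent = []
    for values in product([True, False], repeat=n):
        if all(
            values[i] == all(not values[k] for k in range(i + 1, n))
            for i in range(n)
        ):
            consistent.append(values)
    return consistent


four_sentences = yablo_truncation(4)
```

Every finite truncation has exactly one consistent assignment: the last sentence is vacuously true and all the earlier ones are false. The contradiction described above arises only because, in the infinite sequence, there is no last sentence.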

Gaifman has worked up a way of dealing with such more complex paradoxes of the Liar sort, which can end up denying the sentences in such loops, chains, and infinite sequences have any truth value whatever. Using “GAP” for “recognised failure to assign a standard truth value” Gaifman formulates what he calls the “closed loop rule” (Gaifman 1992, pp225, 230):

If, in the course of applying the evaluation procedure, a closed unevaluated loop forms and none of its members can be assigned a standard value by any of the rules, then all of its members are assigned GAP in a single evaluation step.

Goldstein has formulated a comparable process, which he thinks improves upon Gaifman in certain details, and which ends up labelling certain sentences “FA”, meaning that the sentence has made a “failed attempt” at making a statement (Goldstein 2000, p57). But the major question with such approaches, as before, is how they deal with The Strengthened Liar. Surely there remain major problems with

This sentence is false, or has a GAP,

and

This sentence makes a false statement, or is a FA.

6. References and Further Reading

  • Asher, N. and Kamp, H. 1986, “The Knower’s Paradox and Representational Theories of Attitudes,” in J. Halpern (ed.) Theoretical Aspects of Reasoning about Knowledge, San Mateo CA, Morgan Kaufmann.
  • Asher, N. and Kamp, H. 1989, “Self-Reference, Attitudes and Paradox” in G. Chierchia, B.H. Partee, and R. Turner (eds.) Properties, Types and Meaning 1.
  • Beall, J.C., 2001, “Is Yablo’s Paradox Non-Circular?,” Analysis 61.3.
  • Copi, I.M., 1973, Symbolic Logic 4th ed. Macmillan, New York.
  • Gaifman, H. 1992, “Pointers to Truth,” The Journal of Philosophy, 89, 223-61.
  • Goldstein, L. 2000, “A Unified Solution to Some Paradoxes,” Proceedings of the Aristotelian Society, 100, pp53-74.
  • Grattan-Guinness, I. (ed.) 1980, From the Calculus to Set Theory, 1630-1910, Duckworth, London.
  • Haack, S. 1974, Deviant Logic, C.U.P., Cambridge.
  • Haack, S. 1978, Philosophy of Logics, C.U.P., Cambridge.
  • Hallett, M. 1984, Cantorian Set Theory and Limitation of Size, Clarendon Press, Oxford.
  • Keefe, R. 2001, Theories of Vagueness, C.U.P. Cambridge.
  • Kneale, W. 1972, “Propositions and Truth in Natural Languages,” Mind, 81, pp225-243.
  • Kneale, W. and Kneale M. 1962, The Development of Logic, Clarendon Press, Oxford.
  • Lavine, S. 1994, Understanding the Infinite, Harvard University Press, Cambridge MA.
  • Linsky, L. 1967, Referring, Routledge and Kegan Paul, London.
  • Mackie, J.L. 1973, Truth, Probability and Paradox, Clarendon Press, Oxford.
  • Martin, R.L. (ed.) 1984, Recent Essays on Truth and the Liar Paradox, Clarendon Press, Oxford.
  • Martin-Löf, P. 1970, Notes on Constructive Mathematics, Almqvist and Wiksell, Stockholm.
  • Montague, R. 1963, “Syntactic Treatments of Modality, with Corollaries on Reflection Principles and Finite Axiomatisability,” Acta Philosophica Fennica, 16, pp153-167.
  • Owen, G.E.L. 1957-8, “Zeno and the Mathematicians,” Proceedings of the Aristotelian Society, 58, 199-222.
  • Parsons, C. 1984, “The Liar Paradox” in R.L.Martin (ed.) Recent Essays on Truth and the Liar Paradox, Clarendon Press, Oxford.
  • Penrose, R. 1989, The Emperor’s New Mind, O.U.P., Oxford.
  • Priest, G.G. 1979, “The Logic of Paradox,” Journal of Philosophical Logic, 8, pp219-241.
  • Priest, G.G. 1994, “The Structure of the Paradoxes of Self-Reference,” Mind, 103, pp25-34.
  • Quine, W.V.O. 1966, The Ways of Paradox, Random House, New York.
  • Ryle, G. 1950-1, “Heterologicality,” Analysis, 11, pp61-69.
  • Sainsbury, M. 1995, Paradoxes, 2nd ed., C.U.P. Cambridge.
  • Salmon, W.C. (ed.) 1970, Zeno’s Paradoxes, Bobbs-Merrill, Indianapolis.
  • Scott, T.K. 1966, John Buridan: Sophisms on Meaning and Truth, Appleton-Century-Crofts, New York.
  • Slater, B.H. 1986, “Prior’s Analytic,” Analysis, 46, pp76-81.
  • Sorensen, R. 1998, “Yablo’s Paradox and Kindred Infinite Liars,” Mind, 107, 137-55.
  • Suppes, P. 1972, Axiomatic Set Theory, Dover, New York.
  • Tarski, A. 1956, Logic, Semantics, Metamathematics: Papers from 1923 to 1938, trans. J.H. Woodger, O.U.P. Oxford.
  • Thomason, R. 1977, “Indirect Discourse is not Quotational,” The Monist, 60, pp340-354.
  • Thomason, R. 1980, “A Note on Syntactical Treatments of Modality,” Synthese, 44, pp391-395.
  • Thomason, R. 1986, “Paradoxes and Semantic Representation,” in J.Halpern (ed.) Theoretical Aspects of Reasoning about Knowledge, San Mateo CA, Morgan Kaufmann.
  • Williamson, T. 1994, Vagueness, London, Routledge.
  • Yablo, S. 1993, “Paradox without Self-Reference,” Analysis, 53, 251-52.

Author Information

Barry Hartley Slater
Email: slaterbh@cyllene.uwa.edu.au
University of Western Australia
Australia

Russell’s Paradox

Russell’s paradox represents either of two interrelated logical antinomies. The most commonly discussed form is a contradiction arising in the logic of sets or classes. Some classes (or sets) seem to be members of themselves, while some do not. The class of all classes is itself a class, and so it seems to be in itself. The null or empty class, however, must not be a member of itself. However, suppose that we can form a class of all classes (or sets) that, like the null class, are not included in themselves. The paradox arises from asking the question of whether this class is in itself. It is if and only if it is not. The other form is a contradiction involving properties. Some properties seem to apply to themselves, while others do not. The property of being a property is itself a property, while the property of being a cat is not itself a cat. Consider the property that something has just in case it is a property (like that of being a cat) that does not apply to itself. Does this property apply to itself? Once again, from either assumption, the opposite follows. The paradox was named after Bertrand Russell (1872-1970), who discovered it in 1901.

Table of Contents

  1. History
  2. Possible Solutions to the Paradox of Properties
  3. Possible Solutions to the Paradox of Classes or Sets
  4. References and Further Reading

1. History

Russell’s discovery came while he was working on his Principles of Mathematics. Although Russell discovered the paradox independently, there is some evidence that other mathematicians and set-theorists, including Ernst Zermelo and David Hilbert, had already been aware of the first version of the contradiction prior to Russell’s discovery. Russell, however, was the first to discuss the contradiction at length in his published works, the first to attempt to formulate solutions and the first to appreciate fully its importance. An entire chapter of the Principles was dedicated to discussing the contradiction, and an appendix was dedicated to the theory of types that Russell suggested as a solution.

Russell discovered the contradiction from considering Cantor’s power class theorem: the mathematical result that the number of entities in a certain domain is always smaller than the number of subclasses of those entities. Certainly, there must be at least as many subclasses of entities in the domain as there are entities in the domain, given that for each entity, one subclass will be the class containing only that entity. However, Cantor proved that there also cannot be the same number of entities as there are subclasses. If there were the same number, there would have to be a 1-1 function f mapping entities in the domain on to subclasses of entities in the domain. However, this can be proven to be impossible. Some entities in the domain would be mapped by f on to subclasses that contain them, whereas others may not. However, consider the subclass of entities in the domain that are not in the subclasses on to which f maps them. This is itself a subclass of entities of the domain, and thus f would have to map some particular entity in the domain on to it. The problem is that then the question arises as to whether this entity is in the subclass on to which f maps it. Given the subclass in question, it is just in case it is not. The Russell paradox of classes can in effect be seen as an instance of this line of reasoning, only simplified. Are there more classes or subclasses of classes? It would seem that there would have to be at least as many classes, since all subclasses of classes are themselves classes. But if Cantor’s theorem is correct, there would have to be more subclasses. Russell considered the simple mapping of classes onto themselves, and invoked the Cantorian approach of considering the class of all those entities that are not in the classes onto which they are mapped. Given Russell’s mapping, this becomes the class of all classes not in themselves.
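For a small finite domain the Cantorian construction can be carried out and checked exhaustively. The sketch below is only an illustration under stated assumptions: the three-element domain, the sample mapping `f`, and the function names are all hypothetical, and subclasses are modelled as Python `frozenset` values.

```python
from itertools import chain, combinations, product


def diagonal_subclass(domain, f):
    """The entities that are NOT in the subclass on to which f maps them."""
    return frozenset(x for x in domain if x not in f[x])


def all_subclasses(domain):
    """Every subclass of the domain, as frozensets."""
    items = sorted(domain)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def every_mapping_misses_its_diagonal(domain):
    """Check that, for each f from entities to subclasses, the diagonal
    subclass is not the value of f at any entity."""
    items = sorted(domain)
    for images in product(all_subclasses(domain), repeat=len(items)):
        f = dict(zip(items, images))
        if diagonal_subclass(domain, f) in f.values():
            return False
    return True


domain = {"a", "b", "c"}
# One sample mapping (hypothetical): "a" is in its image, "b" and "c" are not.
f = {"a": frozenset({"a", "b"}), "b": frozenset(), "c": frozenset({"a"})}
diagonal = diagonal_subclass(domain, f)
checked = every_mapping_misses_its_diagonal(domain)
```

For the sample mapping the diagonal subclass contains “b” and “c”, and no entity is mapped on to it. The exhaustive check runs through all 512 mappings from the three entities to the eight subclasses and finds the same in every case, so no such mapping can reach every subclass.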

The paradox had profound ramifications for the historical development of class or set theory. It made the notion of a universal class, a class containing all classes, extremely problematic. It also brought into considerable doubt the notion that for every specifiable condition or predicate, one can assume there to exist a class of all and only those things that satisfy that condition. The properties version of the contradiction, a natural extension of the classes or sets version, raised serious doubts about whether one can be committed to the objective existence of a property or universal corresponding to every specifiable condition or predicate. Indeed, contradictions and problems were soon found in the work of those logicians, philosophers and mathematicians who made such assumptions. In 1902, Russell discovered that a version of the contradiction was expressible in the logical system developed in Volume I of Gottlob Frege’s Grundgesetze der Arithmetik, one of the central works in the late-19th and early-20th century revolution in logic. In Frege’s philosophy, a class is understood as the “extension” or “value-range” of a concept. Concepts are the closest correlates to properties in Frege’s metaphysics. A concept is presumed to exist for every specifiable condition or predicate. Thus, there is a concept of being a class that does not fall under its defining concept. There is also a class defined by this concept, and it falls under its defining concept just in case it does not.
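The core of the class version can be stated and checked very compactly. The Lean 4 fragment below is a sketch under stated assumptions: `mem x y` is an arbitrary binary relation read as “x is a member of y”, and the theorem name is a label chosen here. It is not a rendering of Frege’s system, only of the bare pattern that the unrestricted assumption of classes runs into.

```lean
-- There is no r whose members are exactly the things that are not members of themselves.
theorem no_russell_class {α : Type} (mem : α → α → Prop) :
    ¬ ∃ r : α, ∀ x : α, mem x r ↔ ¬ mem x x := by
  intro ⟨r, h⟩
  -- Instantiate the defining condition at r itself.
  have hr : mem r r ↔ ¬ mem r r := h r
  -- r cannot be a member of itself, for then it would fail its own condition.
  have hn : ¬ mem r r := fun hm => (hr.mp hm) hm
  -- But then r satisfies the condition, so it is a member of itself after all.
  exact hn (hr.mpr hn)
```

Nothing in the argument depends on what the entities are, which is why the same pattern recurs for classes, for properties, and for Frege’s extensions of concepts.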

Russell wrote to Frege concerning the contradiction in June of 1902. This began one of the most interesting and discussed correspondences in intellectual history. Frege immediately recognized the disastrous consequences of the paradox. He did note, however, that the properties version of the paradox was solved in his philosophy by his distinction between levels of concepts. For him, concepts are understood as functions from arguments to truth-values. Some concepts, “first-level concepts”, take objects as arguments; some concepts, “second-level concepts”, take these functions as arguments; and so on. Thus, a concept can never take itself as argument, and the properties version cannot be formulated. However, classes, or extensions of concepts, were all understood by Frege to be of the same logical type as all other objects. The question does arise, then, for each class whether it falls under its defining concept.

When he received Russell’s first letter, the second volume of Frege’s Grundgesetze was already in the latter stages of the publication process. Frege was forced to quickly prepare an appendix in response to the paradox. Frege considers a number of possible solutions. The conclusion he settles on, however, is to weaken the class abstraction principle in the logical system. In the original system, one could conclude that an object is in a class if and only if the object falls under the concept defining the class. In the revised system, one can conclude only that an object is in a class if and only if the object falls under the concept defining the class and the object is not identical to the class in question. This blocks the class version of the paradox. However, Frege was not entirely happy even with this solution. And this was for good reason. Some years later the revised system was found to lead to a more complicated form of the contradiction. Even before this result was discovered, Frege abandoned it and seems to have concluded that his earlier approach to the logic of classes was simply unworkable, and that logicians would have to make do entirely without commitment to classes or sets.

However, other logicians and mathematicians have proposed other, relatively more successful, alternative solutions. These are discussed below.

2. Possible Solutions to the Paradox of Properties

The Theory of Types. It was noted above that Frege did have an adequate response to the contradiction when formulated as a paradox of properties. Frege’s response was in effect a precursor to what is one of the most commonly discussed and articulated proposed solutions to this form of the paradox. This is to insist that properties fall into different types, and that the type of a property is never the same as that of the entities to which it applies. Thus, the question never even arises as to whether a property applies to itself. A logical language that divides entities into such a hierarchy is said to employ the theory of types. Though hinted at already in Frege, the theory of types was first fully explained and defended by Russell in Appendix B of the Principles. Russell’s theory of types was more comprehensive than Frege’s distinction of levels; it divided not only properties into different logical types, but classes as well. The use of the theory of types to solve the other form of Russell’s paradox is described below.

To be philosophically adequate, the adoption of the theory of types for properties requires developing an account of the nature of properties such that one would be able to explain why they cannot apply to themselves. After all, at first blush, it would seem to make sense to predicate a property of itself. The property of being self-identical would seem to be self-identical. The property of being nice seems to be nice. Similarly, it seems false, not nonsensical, to say that the property of being a cat is a cat. However, different thinkers explain the justification for the type-division in different ways. Russell even gave different explanations at different points in his career. For his part, Frege derived the justification for his division of concepts into levels from his theory of the unsaturatedness of concepts. Concepts, as functions, are essentially incomplete. They require an argument in order to yield a value. One cannot simply predicate one concept of a concept of the same type, because the argument concept still requires its own argument. For example, while it is possible to take the square root of the square root of some number, one cannot simply apply the function square root to the function square root and arrive at a value.
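The square root example has a direct analogue in programming languages, which may make the point more vivid. The sketch below uses Python’s standard `math.sqrt`; the variable names are introduced here for illustration, and the analogy is only loose, since a run-time type error is a fact about an implementation whereas Frege’s claim concerns the nature of functions.

```python
import math

# Applying the function to a number, and then to the result, yields a value.
nested = math.sqrt(math.sqrt(16.0))  # sqrt(16) is 4, and sqrt(4) is 2

# Applying the function to the function itself yields no value at all:
# the argument is still "waiting" for a number of its own.
try:
    math.sqrt(math.sqrt)
    self_application_has_value = True
except TypeError:
    self_application_has_value = False
```

The first computation returns 2.0, while the second is rejected with a `TypeError` because `math.sqrt` accepts only real numbers as arguments.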

Conservatism about Properties. Another possible solution to the paradox of properties would involve denying that a property exists corresponding to every specifiable condition or well-formed predicate. Of course, if one eschews metaphysical commitment to properties as objective and independent entities altogether, that is, if one adopts nominalism, then the paradoxical question is avoided entirely. However, one does not need to be quite so extreme in order to solve the antinomy. The higher-order logical systems developed by Frege and Russell contained what is called the comprehension principle, the principle that for every open formula, no matter how complex, there exists as an entity a property or concept exemplified by all and only those things that satisfy the formula. In effect, they were committed to attributes or properties for any conceivable set of conditions or predicates, no matter how complex. However, one could instead adopt a more austere metaphysics of properties, only granting objective existence to simple properties, perhaps including redness, solidity and goodness, etc. One might even allow that such properties can possibly apply to themselves, e.g. that goodness is good. However, on this approach one would deny the same status to complex attributes, e.g. such so-called “properties” as having-seventeen-heads, being-a-cheese-made-in-England, having-been-written-underwater, etc. It is simply not the case that any specifiable condition corresponds to a property, understood as an independently existing entity that has properties of its own. Thus, one might deny that there is a simple property being-a-property-that-does-not-apply-to-itself. If so, one can avoid the paradox simply by adopting a more conservative metaphysics of properties.

3. Possible Solutions to the Paradox of Classes or Sets

It was mentioned above that late in his life, Frege gave up entirely on the feasibility of the logic of classes or sets. This is of course one ready solution to the antinomy in the class or set form: simply deny the existence of such entities altogether. Short of this, however, the following solutions have enjoyed the greatest popularity:

The Theory of Types for Classes: It was mentioned earlier that Russell advocated a more comprehensive theory of types than Frege’s distinction of levels, one that divided not only properties or concepts into various types, but classes as well. Russell divided classes into classes of individuals, classes of classes of individuals, and so on. Classes were not taken to be individuals, and classes of classes of individuals were not taken to be classes of individuals. A class is never of the right type to have itself as member. Therefore, there is no such thing as the class of all classes that are not members of themselves, because for any class, the question of whether it is in itself is a violation of type. Once again, here the challenge is to explain the metaphysics of classes or sets in order to explain the philosophical grounds of the type-division.

Stratification: In 1937, W. V. Quine suggested an alternative solution in some ways similar to type-theory. His suggestion was that, rather than actually dividing entities into individuals, classes of individuals, etc., such that the proposition that some class is in itself is always ill-formed or nonsensical, we can instead put certain restrictions on what classes are supposed to exist. Classes are supposed to exist only if their defining conditions do not involve what would, in type theory, be a violation of types. Thus, for Quine, while “x is not a member of x” is a meaningful assertion, we do not suppose there to exist a class of all entities x that satisfy this statement. In Quine’s system, a class is supposed to exist for some open formula A if and only if A is stratified, that is, if there is some assignment of natural numbers to the variables in A such that for each occurrence of the class membership sign, the variable preceding the membership sign is given an assignment one lower than the variable following it. This blocks Russell’s paradox, because the formula used to define the problematic class has the same variable both before and after the membership sign, obviously making it unstratified. However, it has yet to be determined whether the resulting system, which Quine called “New Foundations for Mathematical Logic” or NF for short, is consistent.
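
The stratification test just described amounts to a small constraint-checking procedure, and a minimal sketch may make it concrete. The Python below is only an illustration and not Quine’s own formulation: the function name stratify and the representation of a formula as a list of (x, y) pairs, each read as “x is a member of y,” are assumptions introduced here. Only membership atoms are considered, on the further assumption that each variable name stands for one variable throughout the formula.

```python
from collections import defaultdict, deque


def stratify(memberships):
    """Try to assign a natural number to every variable so that each
    atom (x, y), read as "x is a member of y", has level[y] == level[x] + 1.

    Returns a dict of levels, or None if no such assignment exists
    (that is, if the atoms are unstratified). Hypothetical helper,
    written for illustration only.
    """
    # Constraint graph: stepping from x to y adds 1 to the level,
    # stepping from y back to x subtracts 1.
    graph = defaultdict(list)
    for x, y in memberships:
        graph[x].append((y, 1))
        graph[y].append((x, -1))

    level = {}
    for start in list(graph):
        if start in level:
            continue
        # Explore one connected group of variables, starting at level 0.
        level[start] = 0
        component = [start]
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w, delta in graph[v]:
                wanted = level[v] + delta
                if w not in level:
                    level[w] = wanted
                    component.append(w)
                    queue.append(w)
                elif level[w] != wanted:
                    return None  # two constraints disagree
        # Shift the group so that its lowest level is 0 (natural numbers).
        lowest = min(level[v] for v in component)
        for v in component:
            level[v] -= lowest
    return level


# "x is not a member of x" contains the atom (x, x): no assignment works.
print(stratify([("x", "x")]))
# "x is a member of y and y is a member of z" can be given levels 0, 1, 2.
print(stratify([("x", "y"), ("y", "z")]))
```

On this sketch the first call fails because the single atom would require the level of x to be one lower than itself, which is the situation of the formula defining Russell’s class; the second call succeeds.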

Aussonderung: A quite different approach is taken in Zermelo-Fraenkel (ZF) set theory. Here too, a restriction is placed on what sets are supposed to exist. Rather than taking the “top-down” approach of Russell and Frege, who originally believed that for any concept, property or condition, one can suppose there to exist a class of all those things in existence with that property or satisfying that condition, in ZF set theory, one begins from the “bottom up”. One begins with individual entities, and the empty set, and puts such entities together to form sets. Thus, unlike the early systems of Russell and Frege, ZF is not committed to a universal set, a set including all entities or even all sets. ZF puts tight restrictions on what sets exist. Only those sets that are explicitly postulated to exist, or which can be put together from such sets by means of iterative processes, etc., can be concluded to exist. Then, rather than having a naive class abstraction principle that states that an entity is in a certain class if and only if it meets its defining condition, ZF has a principle of separation, selection, or, as in the original German, “Aussonderung”. Rather than supposing there to exist a set of all entities that meet some condition simpliciter, for each set already known to exist, Aussonderung tells us that there is a subset of that set of all those entities in the original set that satisfy the condition. The class abstraction principle then becomes: if set A exists, then for all entities x in A, x is in the subset of A that satisfies condition C if and only if x satisfies condition C. This approach solves Russell’s paradox, because we cannot simply assume that there is a set of all sets that are not members of themselves. Given a set of sets, we can separate or divide it into those sets within it that are in themselves and those that are not, but since there is no universal set, we are not committed to the set of all such sets. Without the supposition of Russell’s problematic class, the contradiction cannot be proven.
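
The contrast between the naive abstraction principle and Aussonderung can be set out symbolically. The following is a sketch in standard first-order notation rather than a quotation from any of the systems discussed; the letters A, B, C and the label R_A are chosen here for illustration.

```latex
% Naive class abstraction: every condition C determines a class B.
\exists B \,\forall x \,\bigl( x \in B \leftrightarrow C(x) \bigr)

% Aussonderung (separation): every condition C carves a subset B
% out of a set A that is already known to exist.
\forall A \,\exists B \,\forall x \,\bigl( x \in B \leftrightarrow ( x \in A \wedge C(x) ) \bigr)

% Taking C(x) to be "x is not a member of x" and writing
% R_A = \{ x \in A : x \notin x \}, separation yields only
R_A \in R_A \leftrightarrow ( R_A \in A \wedge R_A \notin R_A )

% If R_A were a member of A, this would reduce to the contradictory
% R_A \in R_A \leftrightarrow R_A \notin R_A. Hence the conclusion is
R_A \notin A
```

Read this way, the Russell-style condition no longer produces a contradiction. It shows only that the separated subset falls outside the set it was separated from, which is another way of seeing why ZF admits no universal set.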

There have been subsequent expansions or modifications made on all these solutions, such as the ramified type-theory of Principia Mathematica, Quine’s later expanded system of his Mathematical Logic, and the later developments in set-theory made by Bernays, Gödel and von Neumann. The question of what is the correct solution to Russell’s paradox is still a matter of debate.

See also the Russell-Myhill Paradox article in this encyclopedia.

4. References and Further Reading

  • Coffa, Alberto. “The Humble Origins of Russell’s Paradox.” Russell nos. 33-4 (1979): 31-7.
  • Frege, Gottlob. The Basic Laws of Arithmetic: Exposition of the System. Edited and translated by Montgomery Furth. Berkeley: University of California Press, 1964.
  • Frege, Gottlob. Correspondence with Russell. In Philosophical and Mathematical Correspondence. Translated by Hans Kaal. Chicago: University of Chicago Press, 1980.
  • Geach, Peter T. “On Frege’s Way Out.” Mind 65 (1956): 408-9.
  • Grattan-Guinness, Ivor. “How Bertrand Russell Discovered His Paradox.” Historia Mathematica 5 (1978): 127-37.
  • Hatcher, William S. Logical Foundations of Mathematics. New York: Pergamon Press, 1982.
  • Quine, W. V. O. “New Foundations for Mathematical Logic.” In From a Logical Point of View. 2d rev. ed. Cambridge, MA: Harvard University Press, 1980. (First published in 1937.)
  • Quine, W. V. O. “On Frege’s Way Out.” Mind 64 (1955): 145-59.
  • Russell, Bertrand. Correspondence with Frege. In Philosophical and Mathematical Correspondence, by Gottlob Frege. Translated by Hans Kaal. Chicago: University of Chicago Press, 1980.
  • Russell, Bertrand. The Principles of Mathematics. 2d. ed. Reprint, New York: W. W. Norton & Company, 1996. (First published in 1903.)
  • Zermelo, Ernst. “Investigations in the Foundations of Set Theory I.” In From Frege to Gödel, ed. by Jean van Heijenoort. Cambridge, MA: Harvard University Press, 1967. (First published in 1908.)

Author Information

Kevin C. Klement
Email: klement@philos.umass.edu
University of Massachusetts, Amherst
U. S. A.

Square of Opposition

The square of opposition is a chart that was introduced within classical (categorical) logic to represent the logical relationships holding between certain propositions in virtue of their form. The square, traditionally conceived, looks like this:

[Figure: the traditional square of opposition]

The four corners of this chart represent the four basic forms of propositions recognized in classical logic:

A propositions, or universal affirmatives, take the form: All S are P.
E propositions, or universal negations, take the form: No S are P.
I propositions, or particular affirmatives, take the form: Some S are P.
O propositions, or particular negations, take the form: Some S are not P.

Given the assumption made within classical (Aristotelian) categorical logic, that every category contains at least one member, the following relationships, depicted on the square, hold:

Firstly, A and O propositions are contradictory, as are E and I propositions. Propositions are contradictory when the truth of one implies the falsity of the other, and conversely. Here we see that the truth of a proposition of the form All S are P implies the falsity of the corresponding proposition of the form Some S are not P. For example, if the proposition “all industrialists are capitalists” (A) is true, then the proposition “some industrialists are not capitalists” (O) must be false. Similarly, if “no mammals are aquatic” (E) is false, then the proposition “some mammals are aquatic” (I) must be true.

Secondly, A and E propositions are contrary. Propositions are contrary when they cannot both be true. An A proposition, e.g., “all giraffes have long necks” cannot be true at the same time as the corresponding E proposition: “no giraffes have long necks.” Note, however, that corresponding A and E propositions, while contrary, are not contradictory. While they cannot both be true, they can both be false, as with the examples of “all planets are gas giants” and “no planets are gas giants.”

Next, I and O propositions are subcontrary. Propositions are subcontrary when it is impossible for both to be false. Because “some lunches are free” is false, “some lunches are not free” must be true. Note, however, that it is possible for corresponding I and O propositions both to be true, as with “some nations are democracies,” and “some nations are not democracies.” Again, I and O propositions are subcontrary, but not contrary or contradictory.

Lastly, two propositions are said to stand in the relation of subalternation when the truth of the first (“the superaltern”) implies the truth of the second (“the subaltern”), but not conversely. A propositions stand in the subalternation relation with the corresponding I propositions. The truth of the A proposition “all plastics are synthetic,” implies the truth of the proposition “some plastics are synthetic.” However, the truth of the O proposition “some cars are not American-made products” does not imply the truth of the E proposition “no cars are American-made products.” In traditional logic, the truth of an A or E proposition implies the truth of the corresponding I or O proposition, respectively. Consequently, the falsity of an I or O proposition implies the falsity of the corresponding A or E proposition, respectively. However, the truth of a particular proposition does not imply the truth of the corresponding universal proposition, nor does the falsity of a universal proposition carry downwards to the respective particular propositions.

The presupposition, mentioned above, that all categories contain at least one thing, has been abandoned by most later logicians. Modern logic deals with uninstantiated terms such as “unicorn” and “ether flow” the same as it does other terms such as “apple” and “orangutan”. When dealing with “empty categories”, the relations of being contrary, being subcontrary and of subalternation no longer hold. Consider, e.g., “all unicorns have horns” and “no unicorns have horns.” Within contemporary logic, these are both regarded as true, so strictly speaking, they cannot be contrary, despite the former’s status as an A proposition and the latter’s status as an E proposition. Similarly, “some unicorns have horns” (I) and “some unicorns do not have horns” (O) are both regarded as false, and so they are not subcontrary. Obviously then, the truth of “all unicorns have horns” does not imply the truth of “some unicorns have horns,” and the subalternation relation fails to hold as well. Without the traditional presupposition of “existential import”, i.e., the supposition that all categories have at least one member, only the contradictory relation holds. On what is sometimes called the “modern square of opposition” (as opposed to the traditional square of opposition sketched above) the lines for contraries, subcontraries and subalternation are erased, leaving only the diagonal lines for the contradictory relation.
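
The difference between the traditional and the modern square can be checked mechanically on small finite examples. The following Python sketch is offered only as an illustration under the modern reading, in which a universal proposition about an empty category counts as true; the function names and the use of Python sets to stand in for the categories S and P are assumptions made here, not part of the traditional presentation.

```python
def a_prop(s, p):
    """A: All S are P (true when S is empty, on the modern reading)."""
    return all(x in p for x in s)


def e_prop(s, p):
    """E: No S are P (also true when S is empty)."""
    return all(x not in p for x in s)


def i_prop(s, p):
    """I: Some S are P."""
    return any(x in p for x in s)


def o_prop(s, p):
    """O: Some S are not P."""
    return any(x not in p for x in s)


def square_relations(s, p):
    """Report which relations of the square hold for this choice of S and P."""
    a, e, i, o = a_prop(s, p), e_prop(s, p), i_prop(s, p), o_prop(s, p)
    return {
        "contradictories": (a != o) and (e != i),  # exactly one of each pair is true
        "contraries": not (a and e),               # A and E are not both true
        "subcontraries": i or o,                   # I and O are not both false
        "subalternation": (not a or i) and (not e or o),  # A implies I, E implies O
    }


# A category with members: every relation of the traditional square holds.
planets = {"Mercury", "Earth", "Jupiter", "Saturn"}
gas_giants = {"Jupiter", "Saturn"}
print(square_relations(planets, gas_giants))

# An empty category: only the contradictory relation survives.
unicorns = set()
horned_things = {"rhinoceros", "narwhal"}
print(square_relations(unicorns, horned_things))
```

With the non-empty category all four entries come out true, while with the empty category only the entry for contradictories does, matching the unicorn examples above.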

Author Information

The author of this article is anonymous.
The IEP would like a qualified author to replace this article with a longer one.

Zhuangzi (Chuang-Tzu, 369—298 B.C.E.)

The Zhuangzi (also known in Wade-Giles romanization as Chuang-tzu), named after “Master Zhuang,” was, along with the Laozi, one of the earliest texts to contribute to the philosophy that has come to be known as Daojia, or School of the Way. According to traditional dating, Master Zhuang, to whom the first seven chapters of the text have traditionally been attributed, was an almost exact contemporary of the Confucian thinker Mencius, but we have no record of direct philosophical dialogue between them. The text is ranked among the greatest of literary and philosophical masterpieces that China has produced. Its style is complex: mythical, poetic, narrative, humorous, indirect, and polysemic.

Much of the text espouses a holistic philosophy of life, encouraging disengagement from the artificialities of socialization, and cultivation of our natural “ancestral” potencies and skills, in order to live a simple and natural, but full and flourishing life. It is critical of our ordinary categorizations and evaluations, noting the multiplicity of different modes of understanding between different creatures, cultures, and philosophical schools, and the lack of an independent means of making a comparative evaluation. It advocates a mode of understanding that is not committed to a fixed system, but is fluid and flexible, and that maintains a provisional, pragmatic attitude towards the applicability of these categories and evaluations.

The Zhuangzi text is an anthology, in which several distinctive strands of Daoist thought can be recognized. The Jin dynasty thinker and commentator, Guo Xiang (Kuo Hsiang, d. 312 CE), edited and arranged an early collection, and reduced what had been a work in fifty-two chapters down to thirty-three chapters, excising material that he considered to be repetitious or spurious.  The versions of Daoist philosophy expressed in this text were highly influential in the reception, interpretation, and transformation of Buddhist philosophies in China.

Table of Contents

  1. Historical Background
  2. The Zhuangzi Text
  3. Central Concepts in the “Inner Chapters”
    1. Chapter 1: Xiao Yao You (Wandering Beyond)
    2. Chapter 2: Qi Wu Lun (Discussion on Smoothing Things Out)
    3. Chapter 3: Yang Sheng Zhu (The Principle of Nurturing Life)
    4. Chapter 4: Ren Jian Shi (The Realm of Human Interactions)
    5. Chapter 5: De Chong Fu (Signs of the Flourishing of Potency)
    6. Chapter 6: Da Zong Shi (The Vast Ancestral Teacher)
    7. Chapter 7: Ying Di Wang (Responding to Emperors and Kings)
  4. Key Interpreters of Zhuangzi
  5. References and Further Reading

1. Historical Background

According to the Han dynasty historian, Sima Qian, Zhuangzi was born during the Warring States (403-221 BCE), more than a century after the death of Confucius. During this time, the ostensibly ruling house of Zhou had lost its authority, and there was increasing violence between states contending for imperial power. This situation gave birth to the phenomenon known as the baijia, the hundred schools: the flourishing of many schools of thought, each articulating its own conception of a return to a state of harmony. The first and most significant of these schools was that of Confucius, who became the chief representative of the Ruists (Confucians), the scholars and propagators of the wisdom and culture of the tradition. Their great rivals were the Mohists, the followers of Mozi (“Master Mo”), who were critical of what they perceived to be the elitism and extravagance of the traditional culture. The archaeological discovery at Guo Dian in 1993 of an early Laozi manuscript suggests that the philosophical movement associated with the text also began to emerge during this period. The strands of Daoist philosophy expressed in the earliest strata of the Zhuangzi developed within a context infused with the ideas of these three schools. Master Zhuang is usually taken to be the author of the first seven chapters, but in recent years a few scholars have found reason to be skeptical not just of his authorship of any of the text, but also of his very existence.

According to early evidence compiled by Sima Qian, Zhuangzi was born in a village called Meng, in the state of Song; according to Lu Deming, the Sui-Tang dynasty scholar, the Pu River in which Zhuangzi was said to have fished was in the state of Chen which, as Wang Guowei points out, had become a territory of the southern state of Chu. We might say that Zhuangzi was situated in the borderlands between Chu, centered around the Yangzi River, and the central plains—which centered around the Yellow River and which were the home of the Shang and Zhou cultures. Some scholars, especially in China, maintain that there is a connection between the philosophies of the Daoist texts and the culture of Chu. The diversity of regions and cultures in early China has increasingly been acknowledged, and most interest has been directed to the state of Chu, in large part because of the wealth of archaeological evidence that is being unearthed there. As one develops a sensitivity for the culture of Chu, one senses deep resonances with the aesthetic sensibility of the Daoists, and with Zhuangzi’s style in particular. The silks and bronzes of Chu, for example, are rich and vibrant; the patterns and images on fabrics and pottery are fanciful and naturalistic. However, while the evidence is persuasive, it is far from decisive.

If the traditional dating is reliable, then Zhuangzi would have been an almost exact contemporary of the Ruist thinker Mencius, though there is no clear evidence of communication between them. There are a few remarks in the Zhuangzi that could possibly be alluding to Mencius’ philosophy, but there is nothing in the Mencius that shows any interest in Zhuangzi. The philosopher and statesman Hui Shi, or Huizi (“Master Hui,” 380-305 BCE), is represented as a close friend of Zhuangzi, though decidedly unconvinced by his philosophical musings. There appears to have been a friendly rivalry between the broad and mythic-minded Zhuangzi and the politically motivated Huizi, who is critiqued in the text as a shortsighted paradox-monger. Despite their very deep philosophical distance, and Huizi’s perceived limitations, Zhuangzi expresses great appreciation both for his linguistic abilities and for his friendship. The other “logician,” Gongsun Longzi, would also have been a contemporary of Zhuangzi, and although Zhuangzi does not, unfortunately, engage in any direct philosophical discussion with him, one does find what appears to be an occasional wink in his direction.

2. The Zhuangzi Text

The currently extant text known as the Zhuangzi is the result of the editing and arrangement of the Jin dynasty thinker and commentator Guo Xiang (Kuo Hsiang, d. 312 CE). He reduced what was then a work in fifty-two chapters to the current edition of thirty-three chapters, excising material that he considered to be spurious. His commentary on the text provides an interpretation that has been one of the most influential over the subsequent centuries.

Guo Xiang’s thirty-three chapter edition of the text is divided into three collections, known as the Inner Chapters (Neipian), the Outer Chapters (Waipian), and the Miscellaneous Chapters (Zapian). The Inner Chapters are the first seven chapters and are generally considered to be the work of Zhuangzi himself. Because the evidence for this attribution is sparse and because of the miscellaneous nature of the editing, some scholars (McCraw, Klein) express skepticism that we can be sure which were the earliest passages or who they were written by. The Outer Chapters are chapters 8 to 22, and the Miscellaneous Chapters are chapters 23 to 33. The Outer and Miscellaneous Chapters can be further subdivided into different strands of Daoist thought. Much modern research has been devoted to a sub-classification of these chapters according to philosophical school. Kuan Feng made some scholarly breakthroughs early in the twentieth century; A. C. Graham continued his classification in the tradition of Kuan Feng. Harold Roth has also taken up a consideration of this issue and come up with some very interesting results. What follows is a simplified version of the results of the research of Liu Xiaogan.

According to Liu, chapters 17 to 27 and 32 can be considered to be the work of a school of Zhuangzi’s followers, what he calls the Shu Zhuang Pai, or the “Transmitter” school. Graham, following Kuan Feng, considers chapters 22 to 27 and 32 not to be coherent chapters, but merely random “ragbag” collections of fragments. In fact, this miscellaneous character is characteristic of many, if not most, of the rest of the chapters, and complicates any simplistic classification of chapters as a whole. Liu considers chapters 8 to 10, chapters 28 to 31, and the first part of chapter 11 to be from a school of Anarchists whose philosophy is closely related to that of Laozi. Graham, again following Kuan Feng, sees these as two separate but related schools: the first he attributes to a writer he calls the “Primitivist,” the second he considers to be a school of followers of Yang Zhu. Liu classifies chapters 12 to 16, chapter 33, and the first part of chapter 11 as belonging to the Han dynasty school known as Huang-Lao. Graham refers to them as the Syncretist chapters. Graham finds the classification of chapter 16 to be problematic. Chapter 30 does not seem to have any distinctively Daoist content at all. Though Graham thinks that it is consistent with the Yangist emphasis on preserving life, it is also consistent with Confucian and Mohist critiques of aggression.

In the following chart, the further to the right the chapters are listed, the further they are from the central ideas of the Zhuangzian philosophy of the Inner Chapters:

| The Inner Chapters | School of Zhuang | Anarchist Utopianism | Huang-Lao Syncretism |
| 1. Wandering Beyond | 17. Autumn Floods | 8. Webbed Toes | 11. Let it Be, Leave it Alone |
| 2. Discussion on Smoothing Things Out | 18. Utmost Happiness | 9. Horse’s Hooves | 12. Heaven and Earth |
| 3. The Principle of Nurturing Life | 19. Mastering Life | 10. Rifling Trunks | 13. The Way of Heaven |
| 4. In the Human Realm | 20. The Mountain Tree | 11. Let it Be, Leave it Alone | 14. The Turning of Heaven |
| 5. Signs of Abundant Potency | 21. Tian Zi Fang | | 15. Constrained in Will |
| 6. The Vast Ancestral Teacher | 22. Knowledge Wandered North | (16?. Mending the Inborn Nature) | (16?. Mending the Inborn Nature) |
| 7. Responding to Emperors and Kings | 23. Geng Sang Chu | | |
| | 24. Xu Wugui | 28. Yielding the Throne | 33. The World |
| | 25. Ze Yang | 29. Robber Zhi | |
| | 26. External Things | (30. Discoursing on Swords?) | |
| | 27. Imputed Words | 31. The Old Fisherman | |
| | 32. Lie Yukou | | |

3. Central Concepts in the “Inner Chapters”

The following is an account of the central ideas of Zhuangzian philosophy, going successively through each of the seven Inner Chapters. This discussion is not confined to the content of the particular chapters, but rather represents a fuller articulation of the inter-relationships of the ideas between the Inner Chapters, and also between these ideas and those expressed in the Outer and Miscellaneous Chapters, where these appear to be related. References to “Zhuangzi” below should not be taken as referring to a historical person, but rather as shorthand for the overall philosophy as articulated in the text of the Inner Chapters and related passages.

a. Chapter 1: Xiao Yao You (Wandering Beyond)

The title of the first chapter of the Zhuangzi has also been translated as “Free and Easy Wandering” and “Going Rambling Without a Destination.” Both of these reflect the sense of the Daoist who is in spontaneous accord with the natural world, and who has retreated from the anxieties and dangers of social life, in order to live a healthy and peaceful natural life. In modern Mandarin, the word xiaoyao has thus come to mean “free, at ease, leisurely, spontaneous.” It conveys the impression of people who have given up the hustle and bustle of worldly existence and have retired to live a leisurely life outside the city, perhaps in the natural setting of the mountains.

But this everyday expression is lacking a deeper significance that is expressed in the classical Chinese phrase: the sense of distance, or going beyond. As with all Zhuangzi’s images, this is to be understood metaphorically. The second word, ‘yao,’ means ‘distance’ or ‘beyond,’ and here implies going beyond the boundaries of familiarity. We ordinarily confine ourselves within our social roles, expectations, and values, and with our everyday understandings of things. But this, according to Zhuangzi, is inadequate for a deeper appreciation of the natures of things, and for a more successful mode of interacting with them. We need at the very least to undo preconceptions that prevent us from seeing things and events in new ways; we need to see how we can structure and restructure the boundaries of things. But we can only do so when we ourselves have ‘wandered beyond’ the boundaries of the familiar. It is only by freeing our imaginations to reconceive ourselves, and our worlds, and the things with which we interact, that we may begin to understand the deeper tendencies of the natural transformations by which we are all affected, and of which we are all constituted. By loosening the bonds of our fixed preconceptions, we bring ourselves closer to an attunement to the potent and productive natural way (dao) of things.

Paying close attention to the textual associations, we see that wandering is associated with the word wu, ordinarily translated ‘nothing,’ or ‘without.’ Related associations include: wuyou (no ‘something’) and wuwei (no interference). Roger Ames and David Hall have commented extensively on these wu expressions. Most importantly, they are not to be understood as simple negations, but have a much more complex function. The significance of all of these expressions must be traced back to the wu of Laozi: a type of negation that does not simply negate, but places us in a new kind of relation to ‘things’—a phenomenological waiting that allows them to manifest, one that acknowledges the space that is the possibility of their coming to presence, one that appreciates the emptiness that is the condition of the possibility of their capacity to function, to be useful (as the hollow inside a house makes it useful for living). The behavior of one who wanders beyond becomes wuwei: sensitive and responsive without fixed preconceptions, without artifice, responding spontaneously in accordance with the unfolding of the inter-developing factors of the environment of which one is an inseparable part.

But it is not just the crossing of horizontal boundaries that is at stake. There is also the vertical distance that is important: one rises to a height from which formerly important distinctions lose what appeared to be their crucial significance. Thus arises the distinction between the great and the small, or the Vast (da) and the petty (xiao). Of this distinction Zhuangzi says that the petty cannot come up to the Vast: petty understanding that remains confined and defined by its limitations cannot match Vast understanding, the expansive understanding that wanders beyond. Now, while it is true that the Vast loses sight of distinctions noticed by the petty, it does not follow that they are thereby equalized, as Guo Xiang suggests. For the Vast still embraces the petty in virtue of its very vastness. The petty, precisely in virtue of its smallness, is not able to reciprocate.

Now, the Vast that goes beyond our everyday distinctions also thereby appears to be useless. A soaring imagination may be wild and wonderful, but it is extremely impractical and often altogether useless. Indeed, Huizi, Zhuangzi’s friend and philosophical foil, chides him for this very reason. But Zhuangzi expresses disappointment in him: for his inability to sense the use of this kind of uselessness is a kind of blindness of the spirit. The useless has use, only not as seen on the ordinary level of practical affairs. It has a use in the cultivation and nurturing of the ‘shen’ (spirit), in protecting the ancestral and preserving one’s life, so that one can last out one’s natural years and live a flourishing life. Now, this notion of a flourishing life is not to be confused with a ‘successful’ life: Zhuangzi is not impressed by worldly success. A flourishing life may indeed look quite unappealing from a traditional point of view. One may give up social ambition and retire in relative poverty to tend to one’s shen and cultivate one’s xing (nature, or life potency).

To summarize: When we wander beyond, we leave behind everything we find familiar, and explore the world in all its unfamiliarity. We drop the tools that we have been taught to use to tame the environment, and we allow it to teach us without words. We imitate its spontaneous behavior and we learn to respond immediately without fixed articulations.

b. Chapter 2: Qi Wu Lun (Discussion on Smoothing Things Out)

If the Inner Chapters form the core of the Zhuangzi collection, then the Qi Wu Lun may be thought of as forming the core of the Inner Chapters. It is, at any rate, the most complex and intricate of the chapters of the Zhuangzi, with allusions and allegories, highly condensed arguments, and baffling metaphors juxtaposed without explanation. It appears to be concerned with the deepest and most ‘abstract’ understanding of ourselves, our lives, our world, our language, and indeed of our understanding itself. The most perplexing sections concern language and judgment, and are filled with paradox, sometimes even contradiction. But the contradictions are not easy to dismiss: their context indicates that they have a deep significance. In part, they appear to attempt to express an understanding about the limits of understanding itself, about the limits of language and thought.

This creates a problem for the interpreter, and especially for the translator. How do we deal with the contradictions? The most common solution is to paraphrase them so as to remove the direct contradictoriness, under the presupposition that no sense can be made of a contradiction. The most common way to remove the contradictions is to insert references to points of view. Those translators, such as A. C. Graham, who do this are following the interpretation of the Jin dynasty commentator Guo Xiang, who presents the philosophy as a form of relativism: apparently opposing judgments can be harmonized when it is recognized that they are made from different perspectives.

According to Guo Xiang’s interpretation, each thing has its own place, its own nature (ziran); and each thing has its own value that follows from its own nature. If so, then nothing should be judged by values appropriate to the natures of other things. According to Guo Xiang, the vast and the small are equal in significance: this is his interpretation of the word “qi” in the title, “equalization of all viewpoints”. Now, such a radical relativism may have the goal of issuing a fundamental challenge to the status quo, arguing that the established values have no more validity than any of the minority values, no matter how shocking they may seem to us. In this way, its effect would be one of destabilization of the social structure. Here, however, we see another of the possible consequences of such a position: its inherent conservativeness. Guo Xiang’s purpose in asserting this radical uniqueness and necessity of each position is conservative in this way. Indeed, it appears to be articulated precisely in response to those who oppose the traditional Ruist values of humanity and rightness (ren and yi) by claiming to have a superior mystical ground from which to judge them to be lacking. Guo Xiang’s aim in asserting the equality of every thing, every position, and every function, is to encourage each thing, and each person, to accept its own place in the hierarchical system, to acknowledge its value in the functioning of the whole. In this way, radical relativism actually forestalls the possibility of radical critique altogether!

According to this reading, the Vast perspective of the giant Peng bird is no better than the petty perspectives of the little birds who laugh at it. And indeed, Guo Xiang draws precisely this conclusion. But there is a problem with taking this reading too seriously, and it is the kind of problem that plagues all forms of radical relativism when one attempts to follow them through consistently. Simply put, Zhuangzi would have to acknowledge that his own position is no better than those he appears to critique. He would have to acknowledge that his Daoist philosophy, indeed even this articulation of relativism, is no improvement over Confucianism after all, and that it is no less short-sighted than the logic-chopping of the Mohists. This, however, is a consequence that Zhuangzi does not recognize. This is surely an indication that the radical relativistic interpretation is a misreading.

Recently, some western interpreters (Lisa Raphals and Paul Kjellberg, for example) have focused their attention on aspects of the text that express affinities with the Hellenistic philosophy of Skepticism. Now, it is important not to confuse this with what in modern philosophy is thought of as a doctrine of skepticism, the most common form of which is the claim that we cannot ever claim to know anything, for at least the reason that we might always be wrong about anything we claim to know—that is, because we can never know anything with absolute certainty. This is not quite the claim of the ancient Skeptics. Arguing from a position of fallibilism, these latter feel that we ought never to make any final judgments that go beyond the immediate evidence, or the immediate appearances. We should simply accept what appears at face value and have no further beliefs about its ultimate consequences, or its ultimate value. In particular, we should refrain from making judgments about whether it is good or bad for us. We bracket (epoche) these ultimate judgments. When we see that such things are beyond our ability to know with certainty, we will learn to let go of our anxieties and accept the things that happen to us with equanimity. Such a state of emotional tranquility they call ‘ataraxia.’

Now, the resonances with Zhuangzi’s philosophy are clear. Zhuangzi also accepts a form of fallibilism. While he does not refrain from making judgments, he nevertheless acknowledges that we cannot be certain that what we think of as good for us may not ultimately be bad for us, or that what we now think of as something terrible to be feared (death, for example) might not be an extraordinarily blissful awakening and a release from the toils and miseries of worldly life. When we accept this, we refrain from dividing things into the acceptable and the unacceptable; we learn to accept the changes of things in all their aspects with equanimity. In the Skeptical reading, the textual contradictions are also resolved by appealing to different perspectives from which different judgments appear to be true. Once one has learnt how to shift easily between the perspectives from which such different judgments can be made, then one can see how such apparently contradictory things can be true at the same time—and one no longer feels compelled to choose between them.

There is, however, another way to resolve these contradictions, one that involves recognizing the importance of continuous transformation between contrasting phenomena and even between opposites. In the tradition of Laozi’s cosmology, Zhuangzi’s worldview is also one of seasonal transformations of opposites. The world is seen as a giant clod (da kuai) around which the heavens (tian) revolve about a polar axis (daoshu). All transformations have such an axis, and the aim of the sage is to settle into this axis, so that one may observe the changes without being buffeted around by them.

Now, the theme of opposites is taken up by the Mohists, in their later Mohist Canon, but with a very different understanding. The later Mohists present a detailed analysis of judgments as requiring bivalence: that is, judgments may be acceptable (ke) (also, ‘affirmed’ shi) or unacceptable (buke) (also, ‘rejected’ fei); they must be one or the other and they cannot be both. There must always be a clear distinction between the two. It is to this claim, I believe, that Zhuangzi is directly responding. Rejecting also the Mohist style of discussion, he appeals to an allusive, aphoristic, mythological style of poetic writing to upset the distinctions and blur the boundaries that the Mohists insist must be held apart. The Mohists believe that social harmony can only be achieved when we have clarity of distinctions, especially of evaluative distinctions: true/false, good/bad, beneficial/harmful. Zhuangzi’s position is that this kind of sharp and rigid thinking can result ultimately only in harming our natural tendencies (xing), which are themselves neither sharp nor rigid. If we, on the contrary, learn to nurture those aspects of our heart-minds (xin), our natural tendencies (xing), that are in tune with the natural (tian) and ancestral (zong) within us, then we will eventually find our place at the axis of the way (daoshu) and will be able to ride the transformations of the cosmos free from harm. That is, we will be able to sense and respond to what can only be vaguely expressed without forcing it into gross and unwieldy verbal expressions. We are then able to recognize the paradoxes of vagueness and indeterminacy that arise from infinitesimal processes of transformation.

Put another way, our knowledge and understanding (zhi, tong, da) are not just what we can explicitly see before us and verbalize: in modern terms, they are not just what is ‘consciously,’ ‘conceptually,’ or ‘linguistically’ available to us. Zhuangzi also insists on a level of understanding that goes beyond such relatively crude modes of dividing up our world and experiences. There are hidden modes of knowing, not evident or obviously present, modes that allow us to live, breathe, move, understand, connect with others without words, read our environments through subtle signs; these modes of knowing also give us tremendous skill in coping with others and with our environments. These modes of knowing Zhuangzi calls “wuzhi”, literally ‘without knowing,’ or ‘unknowing’. What is known by such modes of knowing, when we attempt to express it in words, becomes paradoxical and appears contradictory. It seems that bivalent distinctions leave out too much on either side of the divide: they are too crude a tool to cope with the subtlety and complexity of our non-conceptual modes of knowing. Zhuangzi, following a traditional folk psychology of his time, calls this capacity shenming: “spirit insight.”

When we nurture that deepest and most natural, most ancestral part of our psyches, through psycho-physical meditative practices, we at the same time nurture these non-cognitive modes of understanding, embodied wisdoms, that enable us to deal successfully with our circumstances. It is then that we are able to cope directly with what from the limited perspective of our ‘socialized’ and ‘linguistic’ understanding seems to be too vague, too open, too paradoxical.

c. Chapter 3: Yang Sheng Zhu (The Principle of Nurturing Life)

This chapter, like the Anarchist Utopian chapters, deals with the way to nurture and cultivate one’s ‘life tendencies’ (sheng, xing) so as to enable one to live skillfully and last out one’s natural years (zhong qi tian nian). There is a ‘potency’ within oneself that is a source of longevity, an ancestral place from which the phenomena of one’s life continue to arise. This place is to be protected (bao), kept whole (quan), nurtured and cultivated (yang). The result is a sagely and skillful life. We must be careful how we understand this word, ‘skill.’ Zhuangzi takes pains to point out that it is no mere technique. A technique is a procedure that may be mastered, but the skill of the sage goes beyond this. One might say that it has become an ‘art,’ a dao. With Zhuangzi’s conception, any physical activity, whether butchering a carcass, making wooden wheels, or carving beautiful ceremonial bell stands, becomes a dao when it is performed in a spiritual state of heightened awareness (‘attenuation’ xu).

Zhuangzi sees civic involvement as particularly inimical to the preservation and cultivation of one’s natural life. In order to cultivate one’s natural potencies, one must retreat from social life, or at least one must retreat from the highly complex and artificially structured social life of the city. One undergoes a psycho-physical training in which one’s sensory and physical capacities become honed to an extraordinary degree, indicating one’s attunement with the transformations of nature, and thus highly responsive to the tendencies (xing) of all things, people, and processes. The mastery achieved is demonstrated (both metaphorically, and literally) by practical embodied skill. That is, practical embodied skill is also a metaphor representing the mastery of the life of the sage, and so it is also a sign of sagehood (though not all those who are skillful are to be reckoned as sages). Thus, we see many examples of individuals who have achieved extraordinary levels of excellence in their achievements—practical, aesthetic, and spiritual. Butcher Ding provides an example of a practical, and very lowly skill; Liezi’s teacher, Huzi, in chapter 7, provides an example of skill in controlling the very forces responsible for life themselves. Chapter 19, Mastering Life, is replete with examples: a cicada catcher, a ferryman, a carpenter, a swimmer, and Woodcarver Qing, whose aesthetic skill reaches ‘magical’ heights.

d. Chapter 4: Ren Jian Shi (The Realm of Human Interactions)

In this chapter, Zhuangzi continues the theme broached in the last chapter, but now takes on the problem of how to maintain and preserve one’s life and last out one’s years while living in the social realm, especially in circumstances of great danger: a life of civic engagement in a time of social corruption.

The Daoists, especially the authors of the anarchistic utopian chapters, are highly critical of the artificiality required to create and sustain complex social structures. The Daoists are skeptical of the ability of deliberate planning to deal with the complexities of the world within which our social structures have their place. Even the developments of the social world when left to themselves are ‘natural’ developments, and as such escape the confines of planned, structured thinking. The more we try to control and curtail these natural meanderings, the more complicated and unwieldy the social structures become. According to the Daoists, no matter how complex we make our structures, they will never be fully able to cope with the fluid flexibility of natural changes. The Daoists perceive the unfolding of the transformations of nature as exhibiting a kind of natural intelligence, a wisdom that cannot be matched by deliberate artificial thinking, thinking that can be articulated in words. The result is that phenomena guided by such artificial structures quickly lose their course, and have to be constantly regulated, re-calibrated. This need gives rise to the development and articulation of the artificial concepts of ren and yi for the Ruists, and shi and fei for the Mohists.

The Ruists emphasize the importance of cultivating the values of ren ‘humanity’ and yi ‘appropriateness/rightness.’ The Mohists identify a bivalent structure of preference and evaluation, shifei. Our judgments can be positive or negative, and these arise out of our acceptance and rejection of things or of judgments, and these in turn arise out of our emotional responses to the phenomena of benefit and harm, that is, pleasure and pain. Thus, we set up one of two types of systems: the intuitive renyi morality of the Ruists, or the articulated structured shifei of the Mohists.

Zhuangzi sees both of these as dangerous. Neither can keep up with the complex transformations of things and so both will result in harm to our shen and xing. They lead to the desire of rulers to increase their personal profit, their pleasure, and their power, and to do so at the expense of others. The best thing is to steer clear of such situations. But there are times when one cannot do so: there is nothing one can do to avoid involvement in a social undertaking. There are also times—if one has a Ruist sensibility—when one will be moved to do what one can and must in order to improve the social situation. Zhuangzi makes up a story about Confucius’ most beloved and most virtuous follower, Yen Hui, who feels called to help ‘rectify’ the King of a state known for his selfishness and brutality.

Zhuangzi thinks that such a motivation, while admirable, is ultimately misguided. There is little to nothing one can do to change things in a corrupt world. But if you really have to try, then you should be aware of the dangers, be aware of the natures of things, and of how they transform and develop. Be on the lookout for the ‘triggers’: the critical junctures at which a situation can explode out of hand. In the presence of danger, do not confront it: always dance to one side, redirect it through skilled and subtle manipulations, that do not take control, but by adding their own weight appropriately, redirect the momentum of the situation. One must treat all dangerous social undertakings as a Daoist adept: one must perform xinzhai, fasting of the heart-mind. This is a psycho-physical discipline of attenuation, in which one nurtures one’s inner potencies by thinning out one’s personal preferences and keeping one’s emotions in check, so that one may achieve a heightened sensitivity to the tendencies of things. One then responds with the skill of a sage to the dangerous moods and intentions of one’s worldly ruler.

e. Chapter 5: De Chong Fu (Signs of the Flourishing of Potency)

This chapter is populated with a collection of characters with bodily eccentricities: criminals with amputated feet, people born with ‘ugly’ deformities, hunchbacks with no lips. Perhaps some of these are moralistic advisors, like those of chapter 4, who were unsuccessful in bringing virtue and harmony to a corrupt state, and instead received the harsh punishment of their offended ruler. But it is also possible that some were born with these physical ‘deformities.’ As the Commander of the Right says in chapter 3, “When tian (nature) gave me life, it saw to it that I would be one footed.” These then are people whose natural capacity (de) has been twisted somehow, redirected, so that it gives them a potency (de) that is beyond the normal human range. At any rate, this out of the ordinary appearance, this extraordinary physical form, is a sign of something deeper: a potency and a power (de) that connects them more closely to the ancestral source. These are the sages that Zhuangzi admires: those whose virtue (de) is beyond the ordinary, and whose signs of virtue indicate that they have gone beyond.

But what goes beyond is also the source of life. To hold fast to that which is beyond both living and dying, is perhaps also to hold fast to something more primordial that is beyond human and inhuman. To identify with and nurture this source is to nurture that which is at the root of our humanity. If so, then one does not necessarily become inhuman. Indeed, one might argue that this creates the possibility of deepening one’s most genuine humanity, insofar as this is a deeper nature still.

f. Chapter 6: Da Zong Shi (The Vast Ancestral Teacher)

The first part of this chapter is devoted to a discussion of the zhenren: the “genuine person,” or “genuine humanity,” (or in older translations, “True Man”). It begins by asking about the relation between tian and ren, the natural/heaven and the human, and suggests that the greatest wisdom lies in the ability to understand both. Thus, to be forced to choose between being natural or being human is a mistake. A genuinely flourishing human life cannot be separated from the natural, but nor can it on that account deny its own humanity. Genuine humanity is natural humanity.

There are several sections devoted to explicating this genuine humanity. We find that the genuinely human person, the zhenren, is in tune with the cycles of nature, and is not upset by the vicissitudes of life. The zhenren, like Laozi’s sage, is somehow simultaneously unified with things, and yet not tied down by them. The zhenren is in tune with the cycles of nature, and with the cycles of yin yang, and is not disturbed or harmed by them. In fact, the zhenren is not harmed by them even in what appear to us to be their negative phases, nor are their most extreme phases able to upset the balance of the zhenren. This is sometimes expressed with what I take to be the hyperbole that the sage or zhenren can never be drowned by the ocean, nor burned by fire.

In the second part of the chapter, Zhuangzi hints at the process by which we are to cultivate our genuine and natural humanity. These are meditative practices and psycho-physical disciplines (“yogas,” perhaps) by which we learn how to nourish the ancestral root of life that is within us. We learn how to identify with that center which functions as an axis of stability around which the cycles of emotional turbulence flow. By maintaining ourselves as a shifting and responding center of gravity we are able to maintain an equanimity without giving up our feelings altogether. We enjoy riding the dragon without being thrown around by it. Ordinarily, we are buffeted around like flotsam in a storm, and yet, by holding fast to our ancestral nature, and by following the nature of the environment, by “matching nature with nature,” we free ourselves from the mercy of random circumstances.

In this chapter we see a mature development of the ideas of life and death broached in the first three chapters. Zhuangzi continues musing on the significance of our existential predicament as being inextricably tied into interweaving cycles of darkness and light, sadness and joy, living and dying. In chapter two, it was the predicament itself that Zhuangzi described, and he tried to focus on the inseparability and indistinguishability of the two aspects of this single process of transformation. In this chapter, Zhuangzi tries to delve deeper to reach the center of balance, the ‘axis of the way,’ that allows one to undergo these changes with tranquility, and even to accept them with a kind of ‘joy.’ Not an ecstatic affirmation, to be sure, but a tranquil appreciation of the richness, beauty, and “inevitability” of whatever experiences we eventually will undergo. Again, not that we must experience whatever is ‘fated’ for us, or that we ought not to minimize harm and suffering where we can do so, but only that we should acknowledge and accept our situatedness, our thrownness into our situation, as the ‘raw materials’ that we have to deal with.

There are mystical practices hinted at that enable the sage to identify with the datong, the greater flow, not with the particular arisings of these particular emotions, or this particular body, but with what lies within (and below and above) as their ancestral root. These meditative and yogic practices are hinted at in this chapter, and also in chapter 7, but nothing in the text reveals what they are. It is not unreasonable to believe that similar techniques have been handed down by the practitioners of religious Daoism. It is clear, nonetheless, that part of the change is a change in self-understanding, self-identification. We somehow learn to expand, to wander beyond, our boundaries until they include the entire cosmic process. This entire process is seen as like a potter’s wheel, and simultaneously as a whetstone and as a grindstone, on which things are formed, and arise, sharpened, and are ground back down only to be made into new forms. With each ‘birth’ (sheng) some ‘thing’ (wu) new arises, flourishes, develops through its natural (tian) tendencies (xing), and then still following its natural tendencies, responding to those of its natural environment, it winds down: enters (ru) back into the undifferentiated (wu) from which it emerged (chu). The truest friendship arises when members of a community identify with this unknown undifferentiated process in which they are embedded, “forget” differences between self and other, and spontaneously follow the natural developments of which they are inseparable “parts.”

g. Chapter 7: Ying Di Wang (Responding to Emperors and Kings)

The last of the Inner Chapters does not introduce anything new, but closes by returning to a recurring theme from chapters 1, 3, 5, and 6: that of withdrawing from society. This ‘withdrawal’ has two functions: the first is to preserve one’s ‘life’; the second is to allow society to function naturally, and thus to bring itself to a harmonious completion. Rather than interfering with social interactions, one should allow them to follow their natural course, which, Zhuangzi believes, will be both imaginative and harmonious.

These themes resonate with those of the Anarchist chapters in the Outer (and Miscellaneous) chapters: 8 to 11a and 28 to 32. These encourage a life closer to nature in which one lets go of deliberate control and instead learns how to sense the tendencies of things, allowing them to manifest and flourish, while also adding one’s weight to redirect their momentum away from harm and danger. Or, if harm and danger are unavoidable, then one learns how to minimize them, and how to accept whatever one does have to suffer with equanimity.

4. Key Interpreters of Zhuangzi

The earliest of the interpreters of Zhuangzi’s philosophy are of course his followers, whose commentaries and interpretations have been preserved in the text itself, in the chapters that Liu Xiaogan ascribes to the “Shu Zhuang Pai,” chapters 17 to 27. Most of these chapters constitute holistic developments of the ideas of the Inner Chapters, but some of them concentrate on particular issues raised in particular chapters. For example, the author of Chapter 17, the Autumn Floods, elaborates on the philosophy of perspective and overcoming boundaries that is discussed in the first chapter, Xiao Yao You. This chapter develops the ideas in several divergent directions: relativism, skepticism, pragmatism, and even a kind of absolutism. Which of these, if any, is the overall philosophical perspective is not easy to discern. The author of chapter 19, Da Sheng, Mastering Life, takes up the theme of the cultivation of the wisdom of embodied skill that is introduced in chapter 3, Yang Sheng Zhu, The Principle of Nurturing Life. The author of chapter 18, Zhi Le, Utmost Happiness, and chapter 22, Zhi Bei You, Knowledge Wanders North, continues the meditations on life and death, and the cultivation of meditative practice, that are explored in chapter 6, Da Zong Shi, The Vast Ancestral Teacher.

The next group of interpreters have also become incorporated into the extant version of the text. They are the school of philosophers inclined towards anarchist utopias, that Graham identifies as a “Primitivist” and a school of “Yangists,” chapters 8 to 11, and 28 to 31. These thinkers appear to have been profoundly influenced by the Laozi, and also by the thought of the first and last of the Inner Chapters: “Wandering Beyond,” and “Responding to Emperors and Kings.” There are also possible signs of influence from Yang Zhu, whose concern was to protect and cultivate one’s inner life-source. These chapters combine the anarchistic ideals of a simple life close to nature that can be found in the Laozi with the practices that lead to the cultivation and nurturing of life. The practice of the nurturing of life in chapter 3, that leads to the “lasting out of one’s natural years,” becomes an emphasis on maintaining and protecting xing ming zhi qing “the essentials of nature and life’s command” in these later chapters.

The third main group, whose interpretation has been preserved in the text itself, is the Syncretist school, an eclectic school whose aim is to promote an ideal of mystical rulership, influenced by the major philosophical schools of the time, especially those that recommend a cultivation of inner potency. They may or may not be exemplary of the so-called ‘Huang-Lao’ school. They scoured the earlier philosophers in order to extract what was valuable in their philosophies, the element of the dao that is to be found in each philosophical claim. In particular, they sought to combine the more ‘mystically’ inclined philosophies with the more practical ones to create a more complete dao. The last chapter, Tian Xia, The World, considers several philosophical schools, and comments on what is worthwhile in each of them. Zhuangzi’s philosophy is here characterized as “vast,” “vague,” “outrageous,” “extravagant,” and “reckless”; he is also recognized for his encompassing modes of thought, his lack of partisanship, and his recklessness is acknowledged to be harmless. Nevertheless, it is stated that he did not succeed in getting it all.

Perhaps the most important of the pre-Qin thinkers to comment on Zhuangzi is Xunzi. In his “Dispelling Obsessions” chapter, anticipating the eclecticism of the Huang-Lao commentators of chapter 33, he considers several philosophical schools, mentions the corner of ‘truth’ that each has recognized, and then goes on to criticize them for failing to understand the larger picture. Xunzi mentions Zhuangzi by name, describes him as a philosopher who recognizes the value of nature and of following the tendencies of nature, but who thereby fails to recognize the value of the human ‘ren’. Indeed, Zhuangzi seems to be aware of this kind of objection, and even delights in it. He revels in knowing that he is one who wanders off into the distance, far from human concerns, one who is not bound by the guidelines. Perhaps in doing so he corroborates Xunzi’s fears.

Another text that reveals what might be a development of Zhuangzi’s philosophy is the Liezi. This is a philosophical treatise that clearly stands in the same tradition as the Zhuangzi, dealing with many of the same issues, and on occasion with almost identical stories and discussions. Although the Daoist adept, Liezi, to whom the text is attributed is said to have lived before Zhuangzi, the text clearly dates from a later period, perhaps compiled as late as the Eastern Han, though in terms of linguistic style the material appears to date from around the same period as Zhuangzi. The Liezi continues the line of philosophical thinking of the Xiao Yao You, and the Qiu Shui, taking up the themes of transcending boundaries, and even cosmic realms, by spirit journeying. The leaving behind and overturning of human values is a theme that is repeated in this text, though again not without a certain paradoxical tension: after all, the purpose of such journeying and overturning of values is ultimately to enable us in some sense to live ‘better’ lives. While Zhuangzi’s own philosophy exerted a significant influence on the interpretation of Buddhism in China, the Liezi appears to provide a possible converse case of Mahayana Buddhist influence on the development of the ideas of Zhuangzi.

The Jin dynasty scholar, Guo Xiang, is one of the most influential of the early interpreters. His “relativistic” reading of the text has become the received interpretation, and his own distinctive style of philosophical thinking has in this way become almost inseparable from that of Zhuangzi. The task of interpreting Zhuangzi independently of Guo Xiang’s reading is not easy to accomplish. His contribution and interpretation have already been discussed in the body of the entry (See sections above: The Zhuangzi text, and Chapter 2: Qi Wu Lun (Discussion on Smoothing Things Out)). The Sui dynasty scholar, Lu Deming, produced an invaluable glossary and philological commentary on the text, enabling later generations to benefit from his vast linguistic expertise. The Ming dynasty Buddhist poet and scholar, Han Shan, wrote a commentary on the Zhuangzi from a Chan Buddhist perspective. In a similar vein, the Qing dynasty scholar, Zhang Taiyan, constructed a masterful interpretation of the Zhuangzi in the light of Chinese Buddhist Idealism, or Weishilun. Guo Qingfan, a late Qing, early twentieth century scholar, collected and synthesized the work of previous generations of commentators. The scholarly work of Takeuchi Yoshio in Japan has also been of considerable influence. Qian Mu is a twentieth century scholar who has exerted considerable efforts with regard to historical scholarship. Currently, in Taiwan, Chen Guying is the leading scholar and interpreter of Zhuangzi, and he uses his knowledge of western philosophy, particularly western epistemology, cosmology, and metaphysics, to throw new light on this ancient text.

In the west, probably the most important and influential scholar was A. C. Graham, whose pioneering work on this text, and on the later Mohist Canon, has laid the groundwork and set an extraordinarily high standard for future western philosophical scholarship. Graham, following the reading of Guo Xiang, develops a relativistic reading based on a theory of the conventional nature of language. Chad Hansen is a current interpreter who sees the Daoists as largely theorists of language, and he interprets Zhuangzi’s own contribution as a form of “linguistic skepticism.” Recently, there has been a growth of interest in the aspects of Zhuangzi’s philosophy that resonate with the Hellenistic school of Skepticism. This was proposed by Paul Kjellberg, and has been pursued by other scholars such as Lisa Raphals.

5. References and Further Reading

  • Ames, Roger, ed. Wandering at Ease in the Zhuangzi. Albany: State University of New York Press, 1998.
  • Ames, Roger, and Takahiro Nakajima. Zhuangzi and the Happy Fish. Honolulu: University of Hawai‘i Press, 2015.
  • Chai, David. Early Zhuangzi Commentaries: On the Sounds and Meanings of the Inner Chapters. Saarbrücken: VDM Publishing, 2008.
  • Chuang Tzu. Basic Writings. Translated by Burton Watson. New York: Columbia University Press, 1964.
  • Chuang Tzu. The Complete Works of Chuang Tzu. Translated by Burton Watson. New York: Columbia University Press, 1968.
  • Chuang Tzu. Chuang-Tzu The Inner Chapters: A Classic of Tao. Translated by A. C. Graham. London: Mandala, 1991.
  • Chuang Tzu. Chuang tzu. Translated by James Legge, Sacred Books of the East, volumes 39, 40. Oxford: Oxford University Press, 1891.
  • Cook, Scott. Hiding the World Within the World: Ten Uneven Discourses on Zhuangzi. Albany: State University of New York Press, 2003.
  • Coutinho, Steve. An Introduction to Daoist Philosophies. New York: Columbia University Press, 2014.
  • Coutinho, Steve. “Conceptual Analyses of the Zhuangzi”. Dao Companion to Daoist Philosophy. Springer, 2014.
  • Coutinho, Steve. “Zhuangzi”. Berkshire Dictionary of Chinese Biography, pp. 149-162. 2014.
  • Coutinho, Steve. Zhuangzi and Early Chinese Philosophy: Vagueness, Transformation, and Paradox. London: Ashgate Press, forthcoming, December, 2004.
  • Fung, Yu-Lan. Chuang-Tzu: A New Selected Translation with an Exposition of the Philosophy of Kuo Hsiang. 2nd ed. New York: Paragon Book Reprint Corporation, 1964.
  • Graham, Angus Charles. Later Mohist Logic, Ethics and Science. London: School of Oriental and African Studies, 1978.
  • Graham, Angus Charles. Disputers of the Tao: Philosophical Argument in Ancient China. La Salle: Open Court, 1989.
  • Graham, A. C. “Chuang-tzu’s Essay on Seeing things as Equal.” History of Religions 9 (1969/1970), pp. 137–159. Reproduced in Roth, 2003.
  • Graham, A. C. “Chuang-tzu: Textual Notes to a Partial Translation.” London: School of Oriental and African Studies, 1982. Reproduced in Roth, 2003.
  • Hansen, Chad. A Daoist Theory of Chinese Thought: A Philosophical Interpretation. New York: Oxford University Press, 1992.
  • Ivanhoe, P. J., and Paul Kjellberg, eds. Essays on Skepticism, Relativism, and Ethics in the Zhuangzi. Albany: State University of New York Press, 1996.
  • Kaltenmark, Max. Lao Tzu and Taoism. Translated by Roger Greaves. Stanford: Stanford University Press, 1969.
  • Kjellberg, Paul. Zhuangzi and Skepticism. PhD dissertation, Department of Philosophy, Stanford University, 1993.
  • Klein, Esther. (2010). Were there “Inner Chapters” in the Warring States? A new examination of evidence about the Zhuangzi. T’oung Pao, 4/5, pp. 299–369.
  • Kohn, Livia. Zhuangzi: Text and Context. Three Pines Press, 2014.
  • Kohn, Livia. New Visions of the Zhuangzi. Three Pines Press, 2015.
  • Lawton, Thomas, ed. New Perspectives on Chu Culture During the Eastern Zhou Period. Washington, D.C.: Smithsonian Institution, 1991.
  • Li, Xueqin. Eastern Zhou and Qin Civilizations. Translated by Kwang-chih Chang. New Haven: Yale University Press, 1985.
  • Liu, Xiaogan. Classifying the Zhuangzi Chapters. Translated by Donald Munro. Michigan Monographs in Chinese Studies, no. 65. Ann Arbor, Michigan: The University of Michigan, 1994.
  • Mair, Victor H., ed. Experimental Essays on Chuang-tzu. Honolulu: University of Hawaii Press, 1983.
  • Mair, Victor. ed. Chuang-tzu: Composition and Interpretation. Symposium issues, Journal of Chinese Religions 11, 1983.
  • Mair, Victor. Wandering on the Way: Early Taoist Tales and Parables of Chuang Tzu. New York: Bantam Books, 1994.
  • Maspero, Henri. Le Taoïsme. Vol. II, Mélanges Posthumes sur les Religions et l’histoire de la Chine. Paris: Civilisations du Sud S.A.E.P., 1950.
  • McCraw, David. Stratifying Zhuangzi: Rhyme and other quantitative evidence. Language and Linguistics Monograph Series, 41. Taipei, Taiwan: Institute of Linguistics, Academia Sinica, 2010.
  • Roth, Harold. “Who Compiled the Chuang-tzu?” in Chinese Texts and Philosophical Contexts. edited by Henry Rosemont. La Salle: Open Court, 1991.
  • Roth, Harold. A Companion to A. C. Graham’s Chuang Tzu: The Inner Chapters. Honolulu: University of Hawai’i Press, 2003.
  • Wang, Bo. Thinking Through the Inner Chapters. Three Pines Press, 2014.
  • Wu, Kuang-ming. The Butterfly as Companion: Meditations on the First Three Chapters of the Chuang Tzu. Albany: State University of New York Press, 1990.
  • Ziporyn, Brook. Zhuangzi: The Essential Writings: With Selections from Traditional Commentaries. Hackett, 2009.

Author Information

Steve Coutinho
Email: coutinho@muhlenberg.edu
Muhlenberg College
U. S. A.

Zhang Zai (Chang Tsai, 1020—1077)

Zhang Zai was one of the pioneers of the Song dynasty philosophical movement called “Study of the Way,” often known as Neo-Confucianism. One of the most distinctive features of many of these new ways of thought being formulated at the time was an increased interest in metaphysics, usually influenced by the Classic of Changes (Yijing). Zhang’s most significant contributions to Chinese philosophy were primarily in the area of metaphysics, where he came up with a new theory of qi that was very influential. He is also credited with differentiating original nature and physical nature, which was to become a key concept in the most prominent Song philosophers, the Cheng brothers and Zhu Xi (Chu Hsi). Ethically, his most influential doctrines were found in the brief essay “Western Inscription,” where he propounded the ideas of being one body with all things and universal caring. After his death, most of his disciples were absorbed into the Cheng brothers’ school and his thought became known primarily through the efforts of the Cheng brothers and Zhu Xi, who honored Zhang as one of the founders of the Study of the Way.

Table of Contents

  1. Life and Work
  2. Metaphysics
  3. Human Nature and Ethics
  4. Moral Education and the Heart
  5. References and Further Reading

1. Life and Work

Zhang Zai is also known as Zhang Hengqu, after the town where he grew up and later did much of his teaching. He was born in 1020 and died in 1077. As a youth he was interested in military affairs, but began studying the Confucian texts on the recommendation of an important official who was impressed with Zhang’s abilities. Like most of the Song philosophers, Zhang was initially dissatisfied with Confucian thought and studied Buddhism and Daoism for several years. Eventually, however, he decided that the Way was not to be found in Buddhism or Daoism and returned to Confucian texts. This acquaintance with the other major ways of thought was to have significant influence on Zhang’s own views. According to tradition, around 1056 Zhang sat on a tiger skin in the capital and lectured on the Classic of Changes. It may have been during this period that he first became acquainted with the Cheng brothers, who were actually his younger cousins. After passing the highest level of the civil service examinations, he held a series of minor government posts.

In 1069 Zhang was recommended to the emperor and given a position in the capital, but not long after he ran into conflict with the prime minister and retired home to Hengqu, where he spent his time in retirement studying and teaching. This was probably his most productive period for developing and spreading his own philosophy. In 1076 he completed his most important work, Correcting Ignorance, and presented it to his disciples. “Western Inscription” was originally part of this longer work. That same year he was summoned back to the capital and restored to an important position. However, in the winter he became ill and resigned again to try to convalesce at home. He never reached home, dying on the road in 1077. Zhang was awarded a posthumous title in 1220 and enshrined in the Confucian temple in 1241. Many of Zhang’s writings have been lost. Zhu Xi collected selections of Zhang’s writings in his anthology of Song Study of the Way known as Reflections on Things at Hand. His most important surviving works are probably his commentary on the Changes and Correcting Ignorance.

2. Metaphysics

Zhang Zai’s metaphysics is largely based on the Classic of Changes, especially one of the commentaries, “Appended Remarks,” traditionally attributed to Confucius. According to Zhang, all things of the world are composed of a primordial substance called qi. Qi is sometimes translated as “substance,” “matter,” or “material force,” but there is really no term in English that can capture its meaning for Zhang. Qi originally meant “breath” and is a very old concept in Chinese culture, particularly medicine. For Zhang, qi includes matter and the forces that govern interactions between matter, yin and yang. In its dispersed, rarefied state, qi is invisible and insubstantial, but when it condenses it becomes a solid or liquid and takes on new properties. All material things are composed of condensed qi: rocks, trees, even people. There is nothing that is not qi. Thus, in a real sense, everything has the same essence, an idea which has important ethical implications.

Zhang believed that qi is never created or destroyed; the same qi goes through a continuous process of condensation and dispersion. He compared it to water: water in liquid form or frozen into ice is still the same water. Similarly, condensed qi which forms things or dispersed qi is still the same substance. Condensation is the yin force of qi and dispersion is the yang force. In its wholly dispersed state, Zhang refers to qi as the Great Vacuity, a term he adopted from the Zhuangzi. He emphasized that though this qi is insubstantial, it still exists, and thus is very different from the Buddhist concept of emptiness. Whereas Buddhists argued that the fact that everything changes shows it has no essence and is unreal, Zhang argued that the very fact that it changes proves it is real. Everything that is real is composed of qi, and since qi always changes, anything real must change. Although the Great Vacuity always exists, the particular qi that is dispersed into the Great Vacuity at any time is not the same, which allows Zhang to assert both that qi always changes and the Great Vacuity always remains. There is no such thing as creation ex nihilo for Zhang, an idea he attributes to both Buddhists and Daoists.

Qi begins dispersed and undifferentiated in the Great Vacuity and through condensation forms material things. When these material things pass away, their qi disperses and rejoins the Great Vacuity to begin the process again. What looks like creation and destruction is just the never-ending movements of qi. These processes of condensation and dispersion have no outside cause; they are just part of the nature of qi. Zhang wholly naturalized the workings of qi and rejected any idea of an anthropomorphic Heaven that controlled things. While the Classic of Changes talked of the workings of ghosts and spirits, he reinterpreted these terms to mean the extending and receding of qi from and back to the Great Vacuity. It is all a naturally occurring process.

Unlike in the thought of later thinkers such as the Cheng brothers and Zhu Xi, the concept of pattern (li, also translated as “principle”) is not that important in Zhang’s philosophy. While in the thought of Cheng Yi and Zhu Xi, pattern is a transcendental universal that exists outside of qi, Zhang denied there was anything outside of qi. He seems to use pattern to describe the actions of qi condensing and dispersing, and also for the pattern that actions should fit in order to be moral. It certainly has none of the importance for Zhang that it did for some of his successors. Zhu Xi criticized Zhang for this, saying that qi was not enough to explain the workings of the universe without pattern as well.

3. Human Nature and Ethics

Zhang accepted Mencius’s belief that human nature is good, and his theory of qi allowed him to come up with what became the definitive Song answer to a classic problem in Mencius’ thought: if human nature is good, what makes people bad? Zhang’s solution involved positing two ways of looking at nature: the original nature and nature embodied in qi. Zhang claimed original nature exists forever in unchanging perfection, as opposed to material things which decay and die. This raises the question of what original nature consists of, since Zhang has claimed that everything is qi and qi always changes. He is not very clear on this point, but he apparently identified original nature with the undifferentiated qi of the Great Vacuity. When qi condenses to form human beings, each somehow retains some of the character of the unity of the Great Vacuity (or Great Harmony, as he sometimes calls it). This is the original nature, and that is what is good.

However, human beings also have a nature embodied in qi, which Zhang calls physical nature. Being ordinary qi, physical nature changes, eventually dissipating upon death. Zhang theorized that the physical nature obscures the original nature, preventing it from being fulfilled, and this is what causes people to stray from the path of goodness. At one point, he stated that if clear yang qi formed the greater part of physical nature one’s moral capacities would function, but if turbid yin qi dominated, material desires would hold sway. However, it is unclear whether he meant all yang qi was clear and all yin qi was turbid, and he often seems to attach no particular moral weight to whether qi is primarily yang (dispersed) or yin (condensed). As we are all different individuals, we all have slightly different physical natures. Some people are naturally bigger and stronger, some are more generous, some are wiser. This is all a result of the particular endowment of qi that makes up the individual, and since qi condenses into things without cause or direction, there is no reason an individual has the particular physical nature he starts out with: it is just a matter of chance. What is important in terms of moral cultivation is that there is also the potential to transform one’s physical nature and fulfill one’s original nature.

Zhang had a deep faith in the potential for human improvement. Like earlier Confucian thinkers such as Mencius and Xunzi, he believed that moral development was a matter of effort, not ability. In a departure from his metaphysical views, where he held that qi changes naturally with no particular rhyme or reason, he claimed that the human heart has the capacity to alter one’s own qi. One can change one’s physical nature in order to fulfill one’s original nature. If that were not possible, goodness would be a matter of chance, being born with the right kind of qi. Zhang said that only the qi of life span, which determines whether one dies young or lives to an old age, cannot be changed. This was Zhang’s attack on longevity-oriented Daoists, who taught techniques that promised to increase one’s life span or even confer immortality. Undoubtedly, part of the goal of Zhang’s theory of qi and physical nature was to refute Buddhist and Daoist teachings.

Many Song and Ming thinkers, such as Zhu Xi and Wang Yangming, identified desires as one of the main obstacles to moral development. Zhang Zai was no exception to this trend, which was also probably due to Buddhist influence. The issue of how to moderate or channel desires had been discussed in Chinese philosophy at least since Mencius and Xunzi, but while the earlier Confucian tradition had emphasized finding the proper outlet to express desires and not letting them entirely control one’s actions, eliminating desires entirely never seemed to be a real option. In Xunzi’s case, at least, he clearly denied it was possible to get rid of desires. Eliminating desires was a main focus of Buddhism, on the other hand, and this view of desires was adopted by many of these Study of the Way philosophers. These thinkers focused mainly on what we might call sensual desires. The desire to be a good person was naturally not a cause for concern, but desires for fine clothes, good food, and sex were seen as interfering with one’s original nature. Zhang used the term “material desires,” identifying them with physical nature, so they had to be overcome to return to one’s original nature. Desires somehow arise from the interaction of yin and yang that produces material objects, though Zhang is none too clear exactly what this process is. The fundamental point is that following one’s desires is giving in to physical nature and regressing farther and farther away from original goodness.

Overcoming the desires of physical nature, one progresses toward original nature, or the heavenly within, as Zhang also put it. In “Western Inscription” Zhang illustrated this ideal state. Putting aside selfishness, one comes to understand the essential unity of all things. All things are formed from the same qi, and ultimately we all share the same substance. This was to become Zhang’s most famous ethical doctrine, the idea of forming one body with all things. As Zhang wrote in “Western Inscription,” “That which fills the universe I regard as my body.” Everyone has Heaven and Earth as their father and mother, and thus all people are brothers and sisters. Caring for others is like caring for one’s own family. Zhang further wrote, “Even those who are tired, infirm, crippled, or sick; those who have no brothers or children, wives, or husbands, are all my brothers who are in distress and have no one to turn to.” Though there are some precedents for this idea of brotherhood in earlier Confucianism, it sounds much more like the great compassion of Buddhism or the Mohist idea of universal caring; Zhang even uses the same term (jian’ai). In response to a question about this apparent slide into Mohism, Cheng Yi admitted that “Western Inscription” went a little too far, but still defended it as going beyond what previous sages had discussed and being as meritorious as Mencius’ idea of the goodness of human nature. Later thinkers recognized “Western Inscription” as Zhang’s greatest contribution to the Study of the Way.

4. Moral Education and the Heart

Presaging Zhu Xi, Zhang emphasized the role of education in moral development. Education was the way one transformed one’s qi and overcame physical nature. Following earlier philosophers such as Confucius and Xunzi, Zhang insisted that learning should always be directed toward moral cultivation, which in his case meant returning to one’s original nature. Knowledge was not important for its own sake, but for its contributions to moral character. Despite this, Zhang’s own interests were fairly wide-ranging, and he was especially interested in observing and explaining natural phenomena such as the movements of the stars and planets. Nevertheless, he tended not to emphasize this kind of scientific study in his writings on education, which focused on ritual and the classical Confucian texts. Compared with his contemporaries, Zhang placed more importance on the study of ritual. He believed ritual derived from original nature, and following it helps one hold onto original nature and overcome the obstructions of physical nature. Zhang’s interest in the Classic of Changes has already been mentioned, and he also recommended studying the other Confucian classics, the Analects, and Mencius. In contrast to some later Study of the Way philosophers, he did not put a lot of weight on histories, considering them inferior to the classics for helping people transform their qi.

Though Zhang recommended reciting and memorizing these books, he still believed that books were a means to returning to one’s original nature, not an end in themselves. Books functioned like a set of directions: they could tell you how to get to the destination, but they should not be confused with the destination. He felt close reading and textual criticism were not necessary, and getting too caught up in the meaning of a word or sentence could detract from understanding the overall meaning. And even in the classics, not everything should be accepted. Zhang recalled Mencius’ criticism of literal readings of the Classic of Documents and pointed out the necessity for understanding the classics in light of one’s own sense of what is right. This seems to set up a paradox: a student needs to study the classics to return to his original nature and know what is right, but he needs to know what is right to properly understand the classics.

Zhang resolved this contradiction by positing an innate moral sense in everyone that he called “this heart,” a term he apparently adopted from the Mencius. “This heart” presumably belongs to the original nature, and is still present even when embodied in qi, but it can be obstructed and blocked by the physical nature. Zhang referred to this situation as the problem of the “fixed heart” blocking “this heart.” The fixed heart means having intentions, certainty, inflexibility, and egotism. Under these conditions, “this heart” will not function properly and one will have difficulty understanding the classics. The learner must get rid of the fixed heart to let “this heart” free. At times, Zhang suggests that reading books itself helps preserve “this heart,” and it is this heart itself that understands the Way. Ritual is perhaps more important than books. Zhang once suggested that even the illiterate could still develop “this heart,” but apparently ritual was indispensable in overcoming the fixed heart.

Zhang also talked of “expanding the heart” and “making the heart vast.” Both these phrases mean eliminating the obstructions of the fixed heart and putting the heart in a state where it is ready to understand. He tended to value knowledge apprehended directly through the heart over knowledge from sense perception. Zhang did not deny the validity of empirical knowledge, but he believed its scope was limited. Knowledge gained from sense perception is just knowledge of things, not knowledge of the Way. Knowledge of the Way is knowledge gained through the virtuous nature, not through sense perception. “Knowledge gained through the virtuous nature” is another way of saying knowledge apprehended directly by the heart, though Zhang seems to be talking more about a kind of mystic experience than rationalism: he wrote that understanding of the Way is not something thought and consideration can bring about.

The goal of moral cultivation was fulfilling one’s original nature. This was Zhang Zai’s definition of becoming a sage, the term in Chinese philosophy for a perfected person. Another term common in philosophical discourse of the time was integrity or authenticity (cheng). Integrity figured in some important passages in the Doctrine of the Mean, which was one of the most important Confucian texts in Song Study of the Way. Zhang emphasized “integrity resulting from clarity,” which he explained as first coming to understanding through study and inquiry and then fulfilling one’s nature. This could be a long and difficult process, but if one could persist and make the necessary effort, one could fulfill one’s nature and become a sage. There was no greater goal for Zhang.

5. References and Further Reading

Very little is available in English on Zhang Zai. The reader is encouraged to look into general histories of Chinese philosophy, especially those dealing with neo-Confucianism, in addition to the works listed here.

  • Chan, Wing-tsit. A Sourcebook in Chinese Philosophy. Princeton: Princeton University Press, 1963.
    • Translates a selection of Zhang’s works, focusing on Correcting Ignorance.
  • Chan, Wing-tsit, trans. Reflections on Things at Hand: The Neo-Confucian Anthology Compiled by Chu Hsi and Lü Tsu-chien. New York: Columbia University Press, 1967.
    • This probably contains the most extensive collection of Zhang’s writings in English. Chan includes a finding list to help the reader find the selections of a particular philosopher.
  • Chow, Kai-wing. “Ritual, Cosmology, and Ontology: Chang Tsai’s Moral Philosophy.” Philosophy East and West 43.2 (April 1993): 201-28.
    • Emphasizes the importance of ritual in moral development.
  • Huang, Siu-chi. “Chang Tsai’s Concept of Ch’i.” Philosophy East and West 18.4 (October 1968): 247-60.
  • Huang, Siu-chi. “The Moral Point of View of Chang Tsai.” Philosophy East and West 21.2 (April 1971): 141-56.
  • Kasoff, Ira. The Thought of Chang Tsai. Cambridge: Cambridge University Press, 1984.
    • This is the only English-language monograph on Zhang’s philosophy.
  • T’ang, Chün-i. “Chang Tsai’s Theory of Mind and Its Metaphysical Basis.” Philosophy East and West 6.2 (July 1956): 113-36.

Author Information

David Elstein
Email: davidelstein@world.oberlin.edu
State University of New York at New Paltz
U. S. A.

Xunzi (Hsün Tzu, c. 310—c. 220 B.C.E.)

Xunzi, along with Confucius and Mencius, was one of the three great early architects of Confucian philosophy. In many ways, he offers a more complete and sophisticated defense of Confucianism than Mencius. Xunzi lived toward the end of the Warring States period (453-221 BCE), generally regarded as the formative era for most later Chinese philosophy. It was a time of great variety of thought, comparable to classical Greece, so Xunzi was acquainted with many competing ideas. In reaction to some of the other thinkers of the time, he articulated a systematic version of Confucianism that encompasses ethics, metaphysics, political theory, philosophy of language, and a highly developed philosophy of education. Xunzi is known for his belief that ritual is crucial for reforming humanity’s original nature. Human nature lacks an innate moral compass, and left to itself falls into contention and disorder, which is why Xunzi characterizes human nature as bad. Ritual is thus an integral part of a stable society. He focused on humanity’s part in creating the roles and practices of an orderly society, and gave a much smaller role to Heaven or Nature as a source of order or morality than most other thinkers of the time. Although his thought was later considered to be outside of Confucian orthodoxy, it was still very influential in China and remains a source of interest today. (See Romanization systems for Chinese terms.)

Table of Contents

  1. Life and Work
  2. The Way and Heaven
  3. Human Nature, Education, and the Ethical Ideal
    1. Human Nature
    2. Education
    3. The Ethical Ideal
    4. Discovering the Way
    5. The Heart
  4. Logic and Language
  5. Social and Political Thought
    1. Government structure
    2. Ritual and Music
    3. Moral Power
  6. References and Further Reading

1. Life and Work

Xunzi (“Master Xun”) is the common appellation for the philosopher whose full name was Xun Kuang. He is also known as Xun Qing, “Minister Xun,” after an office he held. He was born in the state of Zhao in north-central China around 310 BCE. As a young man he studied in the state of Qi in the northeast, which had the greatest concentration of philosophers of the age. Xunzi’s writings show him to be well acquainted with all the doctrines current at the time, which he probably came in contact with during this period of his life. Leaving Qi, he traveled to many of the other states that made up China at the time, and was briefly employed by some of them. His last post ended when his patron was assassinated in 238 BCE, ending his chances to put his theories of government into practice. Xunzi may have lived to see China unified by the authoritarian state of Qin in 221 BCE. If so, he certainly must have been disappointed that two of his former students, Li Si and Han Feizi, helped counsel Qin to victory when the Qin government was steadfastly opposed to Xunzi’s ideas of government through moral power. The Qin dynasty was long remembered as a time of strict laws and draconian punishments, and Xunzi’s association with two of its architects probably was one factor in the later marginalization of his thought.

Like most philosophical works of the time, the Xunzi that we have today is a later compilation of writings associated with him, not all of which were necessarily written by Xunzi himself. The current version of the Xunzi is divided into thirty-two books, about twenty-five of which are considered mostly or wholly authentic and others of which are considered representative of his thought, if not his actual writings. This is probably the largest collection of early Chinese philosophical writings that can be plausibly attributed to one author. The Xunzi is also notable for its style. Comparatively little of it is written in the dialogue format of works like the Mencius, and there are none of the fanciful parables of the Zhuangzi. Most books normally attributed to Xunzi are sustained essays on one topic that appear to have been written as more or less unified pieces, though there are often sections of verse and two books that are merely compilations of poetry. In these writings, Xunzi carefully defines his own position and raises objections to rival thinkers in a way that renders his work more recognizable as philosophy than that of many other early Chinese thinkers.

2. The Way and Heaven

The most important concept in Xunzi’s philosophy is the Way (dao). This is one of the most common terms of Chinese philosophy, though all thinkers define it somewhat differently. Though the term originally referred to a road or path, it became extended to a way of doing things, a way of acting, or as it was used in philosophy, the right way to live. In Xunzi’s case, he means the human way, the way of good government and the proper way of behaving, not the Way of Heaven or Nature as Laozi and Zhuangzi define it, and as Mencius often suggests. In fact, Xunzi is notable for having probably the most rationalistic view of Heaven and the supernatural in the early period. Xunzi claims that the Way was first pointed out by particularly wise and gifted people he calls sages (a common term for an exemplar in early Chinese thought), and following the Way as it has been handed down from the past will result in a stable, prosperous, peaceful society, while going against it will have the opposite results. While certain aspects of the Way, such as particular rituals, are certainly created by humanity, whether the Way as a whole is created or discovered remains a matter of scholarly debate.

Unlike many other early philosophers, Xunzi does not believe Heaven gets involved in human affairs. Heaven was sometimes considered to be an anthropomorphic god, sometimes an impersonal force that automatically rewarded the good and punished the bad, but in Xunzi’s view Heaven is much like Nature: it acts as it always does, neither helping the good nor harming the bad. The Way is not the Way because Heaven approves of it; it is the Way because it is good for people. In the chapter “Discourse on Heaven” (chapter 17, also translated as “Discourse on Nature”), Xunzi devotes himself to refuting these other views of Heaven, most prominently that of the Mohists. Heaven does not reward good kings with peace and prosperity, nor punish tyrants by having them deposed. These results come about through their own good or bad decisions. Having a good harvest and sufficient food is not a sign of Heaven’s favor; it is the result of wise agricultural policy. Similarly, events like eclipses and floods are not signs of Heaven’s displeasure: they are simply things that sometimes happen. One might wonder at them as unusual occurrences, but it is not right to be afraid of them or consider them ominous. Worrying about Heaven’s favor is a waste of time; it is better to be prepared for whatever might happen. There will be some natural disasters, but if one is prepared they will not cause harm.

Interestingly, though Xunzi has this rational view of Nature, which extends to spirits and gods as well, he never suggests eliminating religious rituals that are directed toward them, such as sacrifices and divination. One must perform them as part of the ritual system that binds society together, but one does not perform them expecting any results. In “Discourse on Heaven,” Xunzi wrote, “You pray for rain and it rains. Why? For no particular reason, I say. It is just as though you had not prayed for rain and it rained anyway.” When it rains after you pray for rain, it is just like when it rains when you didn’t pray for it. Yet during a drought, officials must still pray for rain, not because it has any effect on the natural world, but because of its effect on people. What Xunzi believes ritual does will be examined later.

In Xunzi’s view, the best thing to do is understand what Nature does and what humanity does, and concentrate on the latter. Not only is it wrong to believe that Heaven intervenes in human affairs, it is useless to speculate about why Nature is the way it is or to try to help it along. Xunzi is interested in practical knowledge, and speculation about Nature is not useful. In this respect, he could be considered anti-metaphysical, since he has no interest in how the world works or what it is. His concern is what people should do, and anything that might confuse or detract from that is a waste of time. We know that Nature is invariable, and we know the Way to get what we need from Nature to live, and that is all we need to know. This kind of division between knowledge of the human world and knowledge of Heaven may have been partially influenced by Zhuangzi, but while Zhuangzi considers knowing Heaven to be important, Xunzi does not.

3. Human Nature, Education, and the Ethical Ideal

a. Human Nature

As Mencius is known for the slogan “human nature is good,” Xunzi is known for its opposite, “human nature is bad.” Mencius viewed self-cultivation as developing natural tendencies within us. Xunzi believes that our natural tendencies lead to conflict and disorder, and what we need to do is radically reform them, not develop them. Both shared an optimism about human perfectibility, but they viewed the process quite differently. Xunzi envisioned that humanity was once in a state of nature reminiscent of Hobbes. Without study of the Way, people’s desires will run rampant, and they will inevitably find themselves in conflict in trying to satisfy their desires. Left to themselves, people will fall into disorder, poverty and conflict, living a life that would be, as Hobbes put it, “poor, nasty, brutish, and short.” It was this insistence that human nature is bad that was most often condemned by later thinkers, who rejected Xunzi’s view in favor of the idea, traced to Mencius, that people are naturally good.

Xunzi offers several arguments against Mencius’s position. He defines human nature as what is inborn and does not need to be learned. He argues that if people were good by nature, there would be no need for ritual and social norms. The sages would not have had to create them, and they would not need to have been handed down through the generations. They were created precisely because people do not act in accordance with them naturally. He also notes that people desire the good, and on the principle that one desires what one doesn’t already have, this shows that people are not good. He gives several illustrations of what life is like in the state of nature, without any education on ritual and morality. Xunzi does not believe that people are evil, that they deliberately violate the rules of morality, taking a perverse pleasure in doing so. They have no natural conception of morality at all: they are morally blind by nature. Their desires bring them into conflict because they don’t know any better, not because they enjoy conflict. In fact, Xunzi believes people do not enjoy it at all, which is why they desire the kind of life that results from good order brought about through the rituals of the sages.

Like Mencius, Xunzi believed human nature is the same in everyone: no one starts off with moral principles. The original nature of Yao (a legendary sage king) and Jie (a legendary tyrant) was the same. The difference was in how they cultivated themselves. Yao reformed his original nature, Jie did not. In this way, Xunzi emphasizes the essential perfectibility of everyone. Human nature is bad, but it is not incorrigible, and in fact Xunzi was rather optimistic about the possibility of overcoming the demands of desires that result in the state of nature. Though Confucius suggests that some people are better off by nature than others, Mencius and Xunzi seem to agree that everyone starts out the same, though they differ on the content of that original state. Though Xunzi believes that it is always possible to reform oneself, he recognizes that in reality this will not always happen. In most cases, the individual himself has to make the first step in attempting to reform, and Xunzi is rather pessimistic about people actually doing this. They cannot be forced to do so, and they may in practice be unable to make the choice to improve, but for Xunzi, this does not mean that in principle it is impossible for them to change.

b. Education

Like Confucius and Mencius, Xunzi is much more concerned with what kind of person to be than with rules of moral behavior or duty, and in this respect his view is similar to Western virtue ethics. The goal of Xunzi’s ethics is to become a person who knows and acts according to the Way as if it were second nature. Because human nature is bad, Xunzi emphasizes the importance of study to learn the Way. He compares the process of reforming one’s nature to making a pot out of clay or straightening wood with a press-frame. Without the potter, the clay would never become a pot on its own. Similarly, people will not be able to reform their nature without a teacher showing them what to do. Xunzi’s concern is primarily moral education; he wants people to develop into good people, not people who know a lot of facts. He emphasizes the transformative aspect of education, where it changes one’s basic nature. Xunzi laid out a program of study based on the works of the sages of the past that would teach proper ritual behavior and develop moral principles. He was the first to offer an organized Confucian curriculum, and his curriculum became the blueprint for traditional education in China until the modern period.

Practice was an important aspect of Xunzi’s course of education. A student did not simply study ritual, he performed it. Xunzi recognized that this performative aspect was crucial to the goal of transforming one’s nature. It was only through practice that one could realize the beauty of ritual, ideally coming to appreciate it for itself. Though this was the end of education, Xunzi appealed to more utilitarian motives to start the student on the program of study. As noted above, he discussed how desires would inevitably be frustrated in the state of nature. Organizing society through ritual was the only way people could ever satisfy even some of their desires, and study of ritual was the best way to achieve satisfaction on a personal level. Through study and practice, one could learn to appreciate ritual for its own sake, not just as a means to satisfy desires. Ritual has this power to transform someone’s motives and character. The beginning student of ritual is like a child learning to play the piano. Maybe she doesn’t enjoy playing the piano at first, but her parents take her out for ice cream after each lesson, so she goes along with it because she gets what she wants. After years of study and practice, she might learn to appreciate playing the piano for its own sake, and will practice even without any reward. This is what Xunzi imagines will happen to the dedicated student of ritual: he starts out studying ritual as a means, but it becomes an end in itself as part of the Way.

c. The Ethical Ideal

Xunzi often distinguishes three stages of progress in study: the scholar, the gentleman, and the sage, though sometimes the sage and the gentleman seem to be equivalent for him. These were all terms in common use in philosophical discourse of the time, especially in Confucian thought, but Xunzi gives them a unique twist. He describes the achievements of each stage slightly differently in several places, but what he seems to mean is that a scholar is someone who has taken the first step of wishing to study the Way of the ancient sages and adopts them as the model for correct conduct; the gentleman has acquired a good deal of learning, but still must think about what the right thing to do is in a situation; and the sage has wholly internalized the principles of ritual and morality so that his action flows spontaneously without the need for thought, yet never goes beyond the bounds of what is proper. Using the piano analogy, the scholar has made up his mind to study the piano and is practicing basic scales. The gentleman is fairly skilled, but still needs to look at the music in front of him to know what to play. The sage is like a concert pianist who not only plays with perfect technique, but also adds his own style and unique interpretation of the music, accomplishing all this without ever consciously thinking about what notes to play. As the pianist is still playing someone else’s music, the sage does not make up new standards of conduct; he still follows the Way, but he makes it his own. Yet even then, at this highest stage, Xunzi believes there is still room for learning. Study is a lifelong process that only ends at death, much as concert pianists must still practice to maintain their skills.

The teacher plays an extremely important role in the course of study. A good teacher does not simply know the rituals, he embodies them and practices them in his own life. Just as one would not learn piano from someone who had just read a book on piano pedagogy but never touched an actual instrument, one should not study from someone who has only learned texts. A teacher is not just a source of information; he is a model for the student to look up to and a source of inspiration of what to become. A teacher who does not live up to the Way of the sages in his own life is no teacher at all. Xunzi believes there is no better method of study than learning from such a teacher. In this way, the student has a model before him of how to live ritual principles, so his learning does not become simple accumulation of facts. In the event that such a teacher is unavailable, the next best method is to honor ritual principles sincerely, trying to embody them in oneself. Without either of these methods, Xunzi believes learning degenerates into memorizing a jumble of facts with no impact on one’s conduct.

d. Discovering the Way

Given Xunzi’s insistence on the importance of teachers to transmit the Way of the sages of the past and his belief that people are all bad by nature, he must face the question of how the first sages discovered the Way. Xunzi uses the metaphor of a river ford for the true Way: without the people who have gone before to leave markers, those coming after would have no way of knowing where the deep places are, and they would be in danger of drowning. The question is, how did the first people get across safely, when there were no markers? Xunzi does not address the question in precisely this way, but we can piece together an answer from his writings.

Examining the analogies Xunzi uses is instructive here. He talks about cultivating moral principles as a process of crafting, using the metaphors of a potter shaping and firing clay into a pot, or using a press-frame to straighten a bent piece of wood. Just as the skill of making pottery was undoubtedly accumulated through generations of refining, Xunzi appears to think that the Way of the sages was also a product of generations of development. According to Xunzi’s definition of human nature, no one could say people know how to make pots by nature: this is not something we can do without study and practice, like walking and talking are. Nevertheless, some people, through a combination of perseverance, talent, and luck, were able to discover how to make pots, and then taught that skill to others. Similarly, through generations of observing humanity and trying different ways of regulating society, the sages hit upon the correct Way, the best way to order society in Xunzi’s view. David Nivison has suggested that different sages of the past contributed different aspects of the Way: some discovered agriculture, some discovered fire, some discovered the principles of filiality and respect between husband and wife, and so on.

Xunzi views these achievements as products of the sage’s acquired nature, not his original nature. This is another way of saying these are not products of people’s natural tendencies, but the results of study and experimentation. Accumulation of effort is an important concept for Xunzi. The Way of the sages was created through accumulation of learning what worked and benefited society. The sages built on the accomplishments of previous sages, added their own contributions, and now Xunzi believes the process is basically complete: we know the ritual principles that will produce a harmonious society. Trying to govern or become a moral person without studying the sages of the past is essentially trying to re-invent the wheel, or discover how to make pots on one’s own without learning from a potter. It is conceivable (though Xunzi is very skeptical about anyone actually being able to do it), but it is much more difficult and time-consuming, when all one has to do is study what has already been created.

e. The Heart

In addition to having a teacher, a critical requirement for study is having the proper frame of mind, or more precisely, heart, since early Chinese thought considered cognition to be located in the heart. Xunzi’s philosophy of the heart draws from other contemporary views as well as Confucian philosophy. Like Mencius, Xunzi believed that the heart should be the lord of the body, and using the heart to direct desires and decide on right and wrong accords with the Way. However, like Zhuangzi, Xunzi emphasizes that the heart must be tranquil and concentrated to be able to learn. In the view of the heart basically shared by Xunzi and Mencius, desires are not wholly voluntary. Desires are part of human nature, and can be activated without our necessarily being conscious of them. The function of the heart is to regulate the sense faculties and parts of the body, so that though one may have desires, the heart only acts on those desires when it is right to do so. The heart controls itself and directs the other parts of the body. This ability of the heart is what allows humanity to create ritual and moral principles and escape the state of nature.

In the chapter “Dispelling Blindness” Xunzi discusses the right way to develop the heart to avoid falling into error. For study, the heart needs to be trained to be receptive, focused, and calm. These qualities of the heart allow it to know the Way, and knowing the Way, the heart can realize the benefits of the Way and practice it. This receptivity Xunzi calls emptiness, meaning the ability of the heart to continually store new information without becoming full. Focus is called unity, by which Xunzi means the ability to be aware of two aspects of a thing or situation without allowing them to interfere with each other. “Being of two hearts” was a common problem in Chinese philosophical writings: it could mean being confused or perplexed about something, as well as what we would call being two-faced. Xunzi addresses the first aspect with his discussion of unity, a focus that keeps the heart directed and free from perplexity. The final quality the heart needs is stillness, the quality of moving freely from task to task without disorder, remaining unperturbed while processing new information. A heart that has the qualities of emptiness, unity, and stillness can understand the Way. Without these qualities, the heart is liable to fall into various kinds of “blindness” or obsessions that Xunzi attributes to his philosophical rivals. Their hearts focus too much on just one aspect of the Way, so they are unable to see the big picture. They become obsessed with this one part and mistake it for the entirety of the Way. Only with the proper attitudes and control of one’s heart can one perceive and grasp the Way as a whole.

4. Logic and Language

One subject that was certainly not part of Xunzi’s program of study is logic. Other philosophers, particularly the Mohist school, were developing sophisticated views on logic and the principles of argumentation around Xunzi’s time, and other thinkers were known for their paradoxes that played with language to show its limits. Though Xunzi was undoubtedly influenced by the principles of argument developed by the Mohists, he had no patience for the dialectical games and disputation for its own sake that were popular at the time. According to one story, a philosopher, having just convinced a king through his arguments, then took the other side and persuaded the king that his earlier arguments were wrong. Such exercises in argument and rhetoric were a waste of time for Xunzi; the only correct use of argument was to convince someone of the truth. Even the work of trying to distinguish logical categories was not productive in his view. According to Xunzi, such work can accomplish something, but it is still not the province of the gentleman, much as wondering about the workings of nature is not the gentleman’s concern, either. The only proper object of study is the Way of the sages; anything else is at best useless and at worst detrimental to the Way.

Despite his professed disinterest in logic, Xunzi came up with the most detailed philosophy of language in early Confucian thought. Again, however, his primary concern was preserving the Way in the face of attacks, which in Xunzi’s view included questions about the nature of language that were arising at the time. He defended a modified conventionalism concerning language: names were not intrinsically appropriate for the objects they referred to, but once usage was determined by convention, to depart from it is wrong. It would be a mistake to think of Xunzi’s view as a kind of nominalism, however, since he is very clear that there is an objective reality that names refer to. The particular phonemes used to make the word “cat” in language are conventionally determined, but the fact that a cat is a kind of feline is real. One of the fundamental principles of Confucianism was that the reality must match the name. Confucian thinkers were most concerned about the names of social roles: a father must act like a father should, a ruler must act like a ruler should. Not fulfilling the demands of one’s role means that one does not deserve the title, hence Mencius defined the removal of a tyrant as the killing of a commoner, not regicide. Xunzi defended this view, yet he objected to the Mohists, who claimed that a robber is not a person, so that killing a robber is not killing a person. This kind of usage violated the principles of correct naming and departed from the Way, though Xunzi is not entirely clear why. In Xunzi’s view, the reality represented by a name is objective, even if the name is merely conventional. Because of the objectivity of referent, he distinguishes appropriate (following convention) and inappropriate (violating convention) uses of names. In addition, he believes there are good and bad names. Good names are simple and direct and readily bring the referent to mind. Using names in a way that the referents are clear is using names correctly. The chief function of language is to communicate, and anything that interferes with communication, such as the word games and paradoxes of other philosophers, should be eliminated.

5. Social and Political Thought

a. Government structure

The Warring States period, during which Xunzi lived, was a time of great social change and instability. As the name implies, it was a period of disunity, when several different states were warring with each other to determine who would gain control of all of China and found a new dynasty. Under the pressure of competition, the old ways and political systems were being abandoned in the search for greater control over human and material resources and increased military power. The central question for most philosophers of the time was how to respond to this time of instability and achieve a greater measure of order and safety. For the Confucian philosophers, the answer was found in a revival of the ways of the past, and for Xunzi in particular, the most important aspect of that was the ritual system. In this sense, the ethical and political aspects of Xunzi’s philosophy are the core areas, and in fact were not sharply distinguished in most Confucian thought. Metaphysics and philosophy of language serve to further the goal of restoring social stability.

All of the Warring States philosophers assumed that the government should be a monarchy. The king was the ultimate authority in all areas of government, having full power to hire and dismiss (and execute) any other government official. There was no idea of democracy in early China. The ruler could lose his state through failing in his duties as a sovereign, but he could not be replaced at the whim of the people. The political thinkers of the time instead tried to impose checks through tradition and thought, rather than law. The Mohists made Heaven the watchdog over the ruler: if a ruler offended Heaven by mistreating the people, Heaven would have him removed through war or revolt. The Confucians also emphasized the duties of the ruler to the people, though in Xunzi’s case there was no personified Heaven watching over things. One of the functions of ritual was to try to put limits on the power of the ruler and emphasize his obligation to the people. Confucian thinkers, including Xunzi, often viewed the state as a family. Just as a father must take care of his children, the ruler must take care of the people, and in return, the people will respond with loyalty. The Confucians also offered a very practical motive to care for the people: if the people were dissatisfied with the ruler, they would not fight on his behalf, and the state would be ripe for annexation by its neighbors.

b. Ritual and Music

Xunzi diagnosed the main cause of disorder as a breakdown of the social hierarchy. When hierarchical distinctions are confused and people do not follow their proper roles, they compete indiscriminately to satisfy their desires. The way to put limits on this competition is to clarify social distinctions, such as those between ruler and subject, between older brother and younger brother, or between men and women. When everyone knows their place and what obligations and privileges they have, they will not contend for goods beyond their status. Not only will this result in order and stability, it actually will allow for greater satisfaction of everyone’s desires than the competition of the state of nature. This is the primary purpose of ritual: to clarify and enforce social distinctions, which will bring an end to contention for limited resources and improve social order. This, in turn, will ensure greater prosperity. The ritual tradition not only emphasized reciprocal obligations between people of different status, it had extremely precise regulations concerning who was allowed to own what kind of luxuries. There were rules concerning what colors of clothing different people could wear, who was allowed to ride in carriages, and what grave goods they could be buried with. The point of all these rules is to enforce the distinctions necessary for social harmony and prevent people from reaching beyond their station.

Without the benefit of ritual principles to enforce the social hierarchy, the identity of human nature makes conflict inevitable. By nature we all desire the same things: fine food, beautiful clothing, wealth, and comfort. Xunzi believes desires are inevitable. When most people see something beautiful, they will desire it: only the sage can control his desires. Because of limited resources, it is impossible for everyone to satisfy their desires for material goods. What people can do is decide whether to act on a desire or not. Ritual teaches people to channel, moderate, and in some cases transform their desires so they can satisfy them in appropriate ways. When it is right to do so one satisfies them, and when that is not possible one moderates them. This allows both the partial satisfaction of desires and the maintenance of social harmony. All of this is made possible by the ritual principles of the Way, when the alternative is the chaos of the state of nature. Hence, Xunzi wrote that Confucian teachings allow people to satisfy both the demands of ritual and their desires, when the alternative is satisfying neither.

Another important part of governing is music. The ancient Chinese believed that music was the most direct and effective way of influencing the emotions. Hence, only allowing the correct music to be played was crucial to governing the state. The right kinds of music, those attributed to the ancient sages, could both give people an outlet for emotions that could not be satisfied in other ways, like aggression, and channel their emotions and bring them in line with the Way. The wrong kind of music would instead encourage wanton, destructive behavior and cause a breakdown of social order. Because of its powerful effect on the emotions, music is as important a tool as ritual in moral education and in governing. Much as Plato suggested in the Republic, Xunzi believes regulating music is one of the duties of the state. It must promulgate the correct music to give people a legitimate source of emotional expression and ban unorthodox music to prevent it from upsetting the balance of society.

c. Moral Power

As he does with virtuous people, Xunzi distinguishes different levels of rulers. The lowest is the ruler who relies on military power to expand his territory, taxes excessively without regard for whether his people have enough to sustain themselves, and keeps them in line with laws and punishments. According to Xunzi, such a ruler is sure to come to a bad end. A ruler who governs efficiently, does not tax the people too harshly, gathers people of ability around him, and makes allies of the neighboring states can become a hegemon. The institution of the hegemon existed briefly about three hundred years before Xunzi’s time, but he often uses the term to connote an effective ruler who is still short of the highest level. The highest level is that of the true king who wins the hearts of the people through his rule by ritual principles. The moral power of the true king is so great that he can unify the whole country without a single battle, since the people will come to him of their own accord to live under his beneficent rule. According to Xunzi, this is how the sage kings of the past were able to unify the country even though they began as rulers of small states. The best kind of government is government through the moral power acquired by following the Way.

This concept of moral power was quite old in China even in Xunzi’s time, though initially it referred to the power gained from the spirits through sacrifice. Beginning with Confucius, it became ethicized into a kind of power or charisma that anyone who cultivated virtue and followed the Way developed. Through this moral power, a king could rule effectively without having to personally attend to the day-to-day business of governing. Following his example, the people would become virtuous as well, so crime would be minimal, and the ruler’s subordinates could carry out the necessary administrative tasks to run the state. In Confucian thought, the most important role of the ruler is that of moral example, which is why the best government was that of a sage who followed the ritual principles of the Way. Confucius seemed to believe that the moral power of a sage king would render laws and punishments completely unnecessary: the people would be transformed by the ruler’s moral power and never transgress the boundaries of what is right. Xunzi, while still believing in the efficacy of rule through moral force, is not quite as optimistic, which is likely related to his view on human nature. He thinks punishments will still be necessary because some people will break the law, but a sage king will only rarely need to employ punishments to keep the people in line, while a hegemon or ordinary ruler will have to resort to them much more. This increased acceptance of the necessity for punishments may have influenced Xunzi’s student Han Feizi, to whom is attributed the most developed theory of government through a strict system of rewards and punishments that was employed by the short-lived Qin dynasty.

6. References and Further Reading

  • Cua, Antonio S. Ethical Argumentation: A Study in Hsün Tzu’s Moral Epistemology. Honolulu: University of Hawaii Press, 1985.
  • Dubs, Homer H. Hsüntze: Moulder of Ancient Confucianism. London: Arthur Probsthain, 1927. The first English-language monograph on Xunzi’s thought.
  • Goldin, Paul. Rituals of the Way. Chicago: Open Court, 1999. A good overview of the essentials of Xunzi’s thought.
  • Ivanhoe, Philip J. Confucian Moral Self Cultivation. Indianapolis: Hackett, 2000. An introduction to Confucian thought, focusing on the theme of self cultivation. Includes a chapter on Xunzi.
  • Kline, T.C. III and Philip J. Ivanhoe, eds. Virtue, Nature, and Moral Agency in the Xunzi. Indianapolis: Hackett, 2000. An excellent anthology bringing together much of the recent important work on Xunzi. The bibliography includes virtually every English publication related to Xunzi.
  • Knoblock, John, trans. Xunzi: A Translation and Study of the Complete Works, 3 vols. Stanford: Stanford University Press, 1988, 1990, 1994. The only complete English translation of the Xunzi, with extensive introductory material.
  • Machle, Edward. Nature and Heaven in the Xunzi: A Study of the Tian Lun. Albany: SUNY Press, 1993. A translation and study of chapter seventeen, “Discourse on Heaven.”
  • Watson, Burton, trans. Hsün Tzu: Basic Writings. New York: Columbia University Press, 1964. An excerpted translation, including many of the more philosophically interesting chapters. It is easier for non-specialists than Knoblock.

Author Information

David Elstein
Email: davidelstein@world.oberlin.edu
State University of New York at New Paltz
U. S. A.

Xuanzang (Hsüan-tsang) (602—664)

Xuanzang, world-famous for his sixteen-year pilgrimage to India and career as a translator of Buddhist scriptures, is one of the most illustrious figures in the history of scholastic Chinese Buddhism. Born into a scholarly family at the outset of the Tang (T’ang) Dynasty, he enjoyed a classical Confucian education. Under the influence of his elder brother, a Buddhist monk, however, he developed a keen interest in Buddhist subjects and soon became a monk himself at the age of thirteen. Upon his return to Chang’an in 645, Xuanzang brought back with him a great number of Sanskrit texts, of which he was able to translate only a small portion during the remainder of his lifetime. In addition to his translations of the most essential Mahayana scriptures, Xuanzang authored the Da tang xi yu ji (Ta-T’ang Hsi-yu-chi or Records of the Western Regions of the Great T’ang Dynasty) with the aid of Bianji (Bian-chi). It is through Xuanzang and his chief disciple Kuiji (K’uei-chi) (632-682) that the Faxiang (Fa-hsiang or Yogacara/Consciousness-only) School was initiated in China. In order to honor the famous Buddhist scholar, the Tang Emperor Gaozong (Gao-tsung) cancelled all audiences for three days after Xuanzang’s death. (See Romanization systems for Chinese terms.)

Table of Contents

  1. Xuanzang’s Beginnings (602-630)
  2. Pilgrimage to India (630-645)
  3. His Return to China and Career as Translator (645-664)
  4. The Faxiang School
    1. The Development of Yogacara
    2. Metaphysics of Mere-Consciousness
    3. Some Objections Answered
    4. The Vijnaptimatratasiddhi-sastra
    5. Faxiang Doctrines
  5. Conclusion
  6. References and Further Reading

1. Xuanzang’s Beginnings (602-630)

Born into a family that had been learned for generations, in Yanshi prefecture of Henan province, Xuanzang, whose lay name was Chenhui, was the youngest of four children. His great-grandfather was an official serving as a prefect, his grandfather was appointed as Professor in the National College at the capital, and his father was a Confucianist of the rigid conservative type who gave up office and withdrew into seclusion to escape the political turmoil that gripped China at that time. According to traditional biographies, Xuanzang displayed a precocious intelligence and seriousness, amazing his father by his careful observance of the Confucian rituals at the age of eight. Along with his brothers and sister, he received an early education from his father, who instructed him in classical works on filial piety and several other canonical treatises of orthodox Confucianism.

After the death of Xuanzang’s father in 611, his older brother Chensu, later known as Changjie, became the primary influence on his life. As a result, he commenced visiting the monastery of Jingtu at Luoyang where his brother dwelled as a Buddhist monk, and studying sacred texts of the faith with all the ardor of a young convert. When Xuanzang requested to take Buddhist orders at the age of thirteen, the abbot Zheng Shanguo made an exception in his case because of his precocious sapience.

In 618, due to the civil war breaking out in Henan, Xuanzang and his brother sought refuge in the mountains of Sichuan, where he spent three years or so in the monastery of Kong Hui plunging into the study of various Buddhist texts, such as the Abhidharmakosa-sastra (Abhidharma Storehouse Treatise). In 622, he was fully ordained as a monk. Deeply confused by myriad contradictions and discrepancies in the texts, and not receiving any solutions from his Chinese masters, Xuanzang decided to go to India and study in the cradle of Buddhism.

2. Pilgrimage to India (630-645)

An imperial decree by the Emperor Taizong (T’ai-tsung) forbade Xuanzang’s proposed visit to India on the grounds of preserving national security. Instead of feeling deterred from his long-standing dream, Xuanzang is said to have experienced a vision that strengthened his resolve. In 629, defying imperial proscription, he secretly set out on his epochal journey to the land of the Buddha from Chang’an.

Xuanzang reports that he travelled by night, hiding during the day, enduring many dangers, and bereft of a guide after being abandoned by his companions. After some time in the Gobi Desert, he arrived in Liangzhou in modern Gansu province, the westernmost extent of the Chinese frontier at that time and the southern terminus of the Silk Road trade route connecting China with Central Asia. Here he spent approximately a month preaching the Buddhist message before being invited to Hami by King Qu Wentai (Ch’u Wen-tai) of Turfan, a pious Buddhist of Chinese extraction.

It soon became apparent to Xuanzang that Qu Wentai, although most hospitable and respectful, planned to detain him for life in his Court as its ecclesiastical head. In response, Xuanzang undertook a hunger strike until the king relented, extracting from Xuanzang a promise to spend three years in the kingdom upon his return. After remaining there for a month more for the sake of the dharma, Xuanzang resumed his journey in 630, well provided with introductions to all the kings on his itinerary, including the formidable Turkish Khan whose power extended to the very gates of India. Although he had initially left China against the will of the Emperor, he was no longer an unknown fugitive fleeing in secret, but an accredited pilgrim with official standing.

At long last, Xuanzang reached his ultimate destination, where his strongest personal interest in Buddhism was located and the principal portion of his time abroad was spent: the Nalanda monastery, located southwest of the modern city of Bihar in northern Bihar state. As a far-famed metropolis of Buddhist monastic education, Nalanda was a veritable monastic city consisting of some ten huge temples with spaces between divided into eight compounds, surrounded by a high wall. There were over ten thousand Mahayana monks there engaged in the study of the orthodox Buddhist canon as well as the Vedas, arithmetic, and medicine. According to legend, Silabhadra (529-645), abbot of Nalanda, was considering suicide after years of wasting illness when he received instructions from deities in a dream, commanding him to endure and await the arrival of a Chinese monk in order to guarantee the preservation of the Mahayana tradition abroad. Indeed, Xuanzang became Silabhadra’s disciple in 636 and was initiated into the Yogacara lineage of Mahayana learning by the venerable abbot. While at Nalanda, Xuanzang also studied Sanskrit and Brahmana philosophy. Subsequent studies in India included hetu-vidya (logic), the exegesis of Mahayana texts such as the Mahayana-sutralamkara (Treatise on the Scripture of Adorning the Great Vehicle), and Madhyamika (“Middle-ist”) doctrines.

The name of the Madhyamika School, founded by Nagarjuna (2nd century CE), derives from its having sought a middle position between the realism of the Sarvastivada (Doctrine That All Is Real) School and the idealism of the Yogacara (Mind Only) School. Xuanzang appears to have merged these two systems into a more eclectic and comprehensive Mahayanism. With the approval of his Nalanda mentors, Xuanzang composed a treatise, Hui zong lun (Hui-tsüng-lun or On the Harmony of the Principles), which articulates his synthesis.

At Nalanda, Xuanzang became a critic of two major philosophical systems of Hinduism opposed to Buddhism: the Samkhya and the Vaiseshika. The former was based upon a dualism of Nature and Spirit. The latter was a realist system, immediate and direct in its realism, resting upon the acceptance of the data of consciousness and experience as such: in brief, it was a melding of monism and atomism. Such beliefs were in absolute contradiction to the acosmic idealism of the Buddhist Yogacara, which equally rejected the substantial entity of the ego and the objective existence of matter. Xuanzang also critiqued the atheistic monism of the Jains, especially inveighing against what he saw as their caricature of Buddhism in terms of Jain monastic garb and iconography.

Xuanzang’s success in religious and philosophical disputes evidently aroused the attention of some Indian potentates, including the King of Assam and the poet-cum-dramatist king Harsha (r. 606-647), who was regarded as a Buddhist patron saint upon the throne like Ashoka and Kanishka of old. An eighteen-day religious assembly was convoked in Harsha’s capital of Kanauj during the first week of the year 643, during which Xuanzang allegedly defeated five hundred Brahmins, Jains, and heterodox Buddhists in spirited debate.

Following these public successes in India, Xuanzang resolved to return to China by way of Central Asia. He followed the caravan-track that led across the Pamirs to Dunhuang. In the spring of 644, he reached Khotan and awaited a reply to his request for return addressed to the Emperor Taizong. In the month of November, Xuanzang left for Dunhuang by a decree of the Emperor, and arrived in the Chinese capital Chang’an the first month of the Chinese Lunar Year 645.

3. His Return to China and Career as Translator (645-664)

Traditional sources report that Xuanzang’s arrival in Chang’an was greeted with an imperial audience and an offer of official position (which Xuanzang declined), followed by an assembly of all the Buddhist monks of the capital city, who accepted the manuscripts, relics, and statues brought back by the pilgrim and deposited them in the Temple of Great Happiness. It was in this Temple that Xuanzang devoted the rest of his life to the translation of the Sanskrit works that he had brought back out of the wide west, assisted by a staff of more than twenty translators, all well-versed in the knowledge of Chinese, Sanskrit, and Buddhism itself. Besides translating Buddhist texts and dictating the Da tang xi yu ji in 646, Xuanzang also translated the Dao de jing (Tao-te Ching) of Laozi (Lao-tzu) into Sanskrit and sent it to India in 647.

His translations may, by and large, be divided into three phases: the first six years (645-650), focusing on the Yogacarabhumi-sastra; the middle ten years (651-660), centering on the Abhidharmakosa-sastra; and the last four years (661-664), concentrating upon the Maha-prajnaparamita-sutra. In each phase of his career as a translator, Xuanzang saw his task as introducing Indian Buddhist texts to Chinese audiences in all their integrity. According to Thomas Watters, the total number of texts brought by Xuanzang from India to China is six hundred and fifty seven, enumerated as follows:

Mahayanist sutras: 224 items
Mahayanist sastras: 192
Sthavira sutras, sastras and Vinaya: 14
Mahasangika sutras, sastras and Vinaya: 15
Mahisasaka sutras, sastras and Vinaya: 22
Sammitiya sutras, sastras and Vinaya: 15
Kasyapiya sutras, sastra and Vinaya: 17
Dharmagupta sutras, Vinaya, sastras: 42
Sarvastivadin sutras, Vinaya, sastras: 67
Yin-lun (Treatises on the science of Inference): 36
Sheng-lun (Etymological treatises): 13

4. The Faxiang School

a. The Development of Yogacara

The Chinese Faxiang School, derived from the Indian Yogacara (yoga practice) School, is based upon the writings of two brothers, Asanga and Vasubandhu, who explicated a course of practice wherein hindrances are removed according to a sequence of stages, from which it gets its name. The appellation of the school originated with the title of an important fourth- or fifth-century CE text of the school, the Yogacarabhumi-sastra. Yogacara attacked both the provisional practical realism of the Madhyamika School of Mahayana Buddhism and the complete realism of Theravada Buddhism. Madhyamika is regarded as the nihilistic or Emptiness School, whereas Yogacara is seen as the realistic or Existence School. While the former is characterized as Mahayana due to its central theme of emptiness, the latter might be considered to be semi-Mahayana to a point for three basic reasons: (1) the Yogacara remains realistic like the Abhidharma School; (2) it expounds the three vehicles side by side without being confined to the Bodhisattvayana; and (3) it does not accent the doctrine of Buddha nature.

The other name of the school, Vijnanavada (Consciousness-affirming/Doctrine of Consciousness), is more descriptive of its philosophical position, which in short is that the reality a human being perceives does not exist. Yogacara became much better known, however, not for its practices but for its rich development of psychological and metaphysical theory. The Yogacara thinkers took the theories of the body-mind aggregate of sentient beings that had been under development in earlier Indian schools such as the Sarvastivada, and worked them into a more fully articulated scheme of eight consciousnesses, the most weighty of which was the eighth, or store consciousness, the alaya-vijnana.

The Yogacara School is also known for the development of other key concepts that would hold great influence not merely within its own system, but within all later forms of Mahayana. These include the theory of the three natures (the dependently originated, the completely real, and the imaginary), which is understood as a Yogacara response to the Madhyamika’s truth of emptiness. Yogacara is also the original source for the theory of the three bodies of the Buddha, and greatly expands the notions of categories of elemental constructs.

Yogacara explored and propounded basic doctrines that were to be fundamental in the future growth of Mahayana and that influenced the rise of Tantric Buddhism. Its central doctrine is that only consciousness (vijnanamatra; hence the name Vijnanavada) is real, and that mind is the ultimate reality. In other words, external objects do not exist; nothing exists outside the mind. The common view that external phenomena exist is due to a misconception that is removable through a meditative or yogic process, which brings a complete withdrawal from these fictitious externals, and an inner concentration and tranquility may accordingly be bodied forth.

Yogacara is an alternative system of Buddhist logic. According to it, the object is not at all as it seems, and thus can not be of any service to knowledge. It is therefore unreal when consciousness is the sole reality. The object is only a mode of consciousness. Its appearance although objective and external is in fact the transcendental illusion, because of which consciousness is bifurcated into the subject-object duality. Consciousness is creative and its creativity is governed by the illusive idea of the object. Reality is to be viewed as an Idea or a Will. This creativity is manifested at different levels of consciousness.

Since this school believes that only ideation exists, it is also called the Idealistic School. In China, it was established by Xuanzang and his principal pupil Kuiji who systematized the teaching of his masters recorded in two essential works: the Fa yuan i lin zhang (Fa-yuan i-lin-chang or Chapter on the Forest of Meanings in the Garden of Law) and the Cheng wei shi lun shu ji (Ch’eng wei-shih lun shu-chi or Notes on the Treatise on the Completion of Ideation Only). On account of the school’s idealistic accent it is known as Weishi (Wei-shih) or Ideation Only School; yet because it is concerned with the specific character of all the dharmas, it is often called the Faxiang School as well. Besides, this school argues that not all beings possess pure seeds and, therefore, not all of them are capable of attaining Buddhahood.

The central concept of this school is borrowed from a statement by Vasubandhu — idam sarvam vijnaptimatrakam, “All this world is ideation only.” It strongly claims that the external world is merely a fabrication of our consciousness, that the external world does not exist, and that the internal ideation presents an appearance as if it were an outer world. The whole external world is, hence, an illusion according to it.

b. Metaphysics of Mere-Consciousness

Broadly speaking, Mere-Consciousness may cover the eight consciousnesses, the articulation of which forms one of the most seminal and distinctive aspects of the doctrine of the Yogacara School, transmitted to East Asia where it received the somewhat pejorative designations of Dharma-character School and Consciousness-only School. According to this doctrine, sentient beings possess eight distinct layers of consciousness, the first five — the visual consciousness, auditory consciousness, olfactory consciousness, gustatory consciousness, and tactile consciousness — corresponding to the sense perceptions, the sixth discriminatory consciousness to the thinking mind, the seventh manas consciousness to the notion of ego, and the eighth alaya-consciousness to the repository of all the impressions from one’s past experiences. As the first seven of these arise on the basis of the eighth, they are called the transformed consciousnesses. In contrast, the eighth is known as the base consciousness, store consciousness, or seed consciousness. And in particular, it is this last consciousness that the Mere-Consciousness is all about.

One of the foremost themes discussed in the school is the alaya-vijnana or storehouse consciousness, which stores and coordinates all the notions reflected in the mind. Thus, it is a storehouse where all the pure and contaminated ideas are blended or interfused. This principle might be illustrated by the school’s favorite citation:

“A seed produces a manifestation,
A manifestation perfumes a seed.
The three elements (seed, manifestation, and perfume) turn on and on,
The cause and effect occur at one and the same time.”

It is the doctrine of consciousness or mind as the basis for so-called “external” objects that gave the Cittamatra (Mind Only) tradition its name. Apparently external objects are constituted by consciousness and do not exist apart from it. Vasubandhu began his Vimsatika vijnapti-matrata-siddhih (Twenty Verses on Consciousness-only) by stating: “All this is only perception, since consciousness manifests itself in the form of nonexistent objects.” There is only a flow of perceptions. This flow, however, really exists, and it is mental by nature, as in terms of the Buddhist division of things it has to be either mental or physical. The flow of experiences could barely be a physical or material flow. There might be a danger in calling this “idealism,” because it is rather dissimilar from forms of idealism in Western philosophy, in which it is deemed necessary for a newcomer to negate and transcend previous theories and philosophies through criticism, but the situation in Buddhism, especially Yogacara Buddhism, is such that it developed its doctrines by inheriting the entire body of thought of its former masters. Nonetheless, if “idealism” denotes that subjects and objects are no more than a flow of experiences and perceptions, which are of the same nature, and these experiences, just as perceptions, are mental, then this could be called a form of “dynamic idealism.”

This school maintains that no external reality exists while retaining the position that knowledge exists, on the assumption that knowledge itself is the object of consciousness. It therefore postulates a higher storage consciousness, which is the final basis of the apparent individual. The universe consists in an infinite number of possible ideas that lie inactively in storage. Such dormant consciousness projects an interrupted sequence of thoughts, while it itself is in restless flux till the karma, or accumulated consequences of past deeds, blows out. This storage consciousness takes in all the impressions of previous experiences, which shape up the seeds of future karmic action, an illusory force creating outer categories that are actually only fictions of the mind. So illusive a force determines the world of difference and belongs to human nature, sprouting the erroneous notions of an I and a non-I. That duality can only be conquered by enlightenment, which effects the transformation of an ordinary person into a Buddha.

c. Some Objections Answered

Certain objections were leveled at Yogacara’s doctrine of consciousness. Vasubandhu, again in his Vimsatika, undertook to prove the invalidity of some of these:

  • Spatiotemporal determination would be impossible — experiences of object X are not occurrent everywhere and at every time so there must be some external basis for our experiences.
  • Many people experience X and not just one person, as in the case of a hallucination.
  • Hallucinations can be determined because they do not possess pragmatic results. It does not follow that entities, which we generally accept as real, can be placed in the same class.

In reply, Vasubandhu argued that these were after all no objections; they simply failed to show that perception-only as a teaching was beyond the limits of what could be concretely reasoned. Spatiotemporal determination can be elucidated on the analogy of dream experience, where a complete and surreal world is created with objects appearing to have spatiotemporal localization despite the fact that they do not exist apart from the mind which is cognizing them. Moreover, the second objection can be met by recourse to the wider Buddhist religious framework. The hells and their tortures, which Buddhist teaching presents as the result of wicked deeds to be endured for a very long time until purified, are experienced as the collective fruit of the previous karma of the hell inmates. The torturers of hell obviously cannot really exist, for otherwise they would have been reborn in hell themselves and would also experience the sufferings associated with it. If this were the case, then how could they jovially inflict sufferings upon their fellow inmates? Thus they must be illusive, and yet they are experienced by a number of people. Finally, just as objects in a dream serve some pragmatic purpose within that dream, and likewise in hell, so too do objects in everyday life. Furthermore, as physical activity can be directed toward unreal objects in a dream owing, it is said, to nervous irritation on the part of the dreamer, so too in daily life.

e. The Vijnaptimatratasiddhi-sastra

Representing a two-hundred-year development within the Vijnanavadin tradition subsequent to the Lankavatara Sutra (Sutra on the Buddha’s Entering the Country of Lanka) and being the primary text of the Faxiang School, the Vijnaptimatratasiddhi-sastra is an exhaustive study of the alaya-vijnana and the sevenfold development of the manas, manovijnana, and the five sensorial consciousnesses. As a creative and elaborate exposition of Vasubandhu’s Trimsika-vijnapti-matrata-siddhi (Treatise in Thirty Stanzas on Consciousness Only) rendered by Xuanzang in 648 at Great Happiness Monastery, it synthesizes the ten most significant commentaries written on it, and becomes the enchiridion of the new Faxiang School of Buddhist idealism. It is mainly a translation by Xuanzang in 659 of Dharmapala’s commentary on the Trimsika-vijnapti-matrata-siddhi, yet it also contains edited translations of other masters’ works on the same verses. This is the only translation by Xuanzang that is not a direct translation of a text, but instead a selective and evaluative editorial drawing on ten distinct texts. Since Kuiji aligned himself with this text as assuming the role of Xuanzang’s successor, the East Asian tradition has treated the Vijnaptimatratasiddhi-sastra as the pivotal exemplar of Xuanzang’s teachings.

In both style and content, the Vijnaptimatratasiddhi-sastra represents a marked advance over the earlier Lankavatara Sutra, a basic canonical text of the Faxiang School that sets forth quite a few hallmarks of the Mahayana position, such as the eight consciousnesses and the tathagatagarbha (Womb of the Buddha-to-be). Instead of bearing the latter’s cryptically aphoristic form, Xuanzang’s treatise is a detailed and coherent analysis, a scholastic apologetics on the doctrine of Consciousness-only. Without any reference to the tathagatagarbha itself, the Vijnaptimatratasiddhi-sastra firmly grounds its pan-consciousness upon Absolute Suchness or the existence of the mind as true reality. Aside from human consciousness, another principle is accepted as real: the so-called suchness, which is the equivalent of the void of the Madhyamika School.

The Vijnaptimatratasiddhi-sastra spells out how there can be a common empirical world for different individuals who ideate or construct particular objects, and who possess distinct bodies and sensory systems. According to Xuanzang, the universal “seeds” in the store consciousness account for the common appearance of things, while particular “seeds” account for the differences.

f. Faxiang Doctrines

Being first and foremost an idealistic school of Mahayana Buddhism, the Faxiang School categorically discerns chimerical phenomena manifested in consistent patterns of regularity and continuity; in order to justify this order in which only defiled elements could prevail before enlightenment is attained, it created the tenet of the alaya-vijnana. Sense perceptions are commanded as regular and coherent by a store of consciousnesses, of which one is consciously unaware. Then, sense impressions produce certain configurations in this insensibility that “perfume” later impressions so that they appear consistent and regular. Every being possesses this seed consciousness, which therefore becomes a sort of collective consciousness that takes control of human perceptions of the world, though this world does not exist at all according to the very tenet. This school’s forerunner had emerged in India around the second century AD, yet had its period of greatest productivity in the fourth century, during the time of Asanga and Vasubandhu. Following them, the school divided into two branches, the Nyayanusarino Vijnanavadinah (Vijnanavada School of the Logical Tradition) and the Agamanusarino Vijnanavadinah (Vijnanavada School of the Scriptural Tradition), with the former sub-school postulating the standpoints of the logician Dignaga (c. AD 480-540) and his successor, Dharmakirti (c. AD 600?-680?).

This consciousness-oriented school of ideology was largely represented in China by the Faxiang School, called Popsang in Korea, and Hosso in Japan. The radical teachings of Yogacara became known in China primarily through a work of Paramartha, a sixth-century Indian missionary-translator. His rendition of the Mahayana-samparigraha-sastra (Compendium of the Great Vehicle) by Asanga provided a sound base for the Sanlun (Three-Treatise) School, which preceded the Faxiang School as the vehicle of Yogacara thought in China. Faxiang is the Chinese translation of the Sanskrit term dharmalaksana (characteristic of dharma), referring to the school’s basal emphasis on the unique characteristics of the dharmas that make up the world, which appears in human ideation. According to Faxiang doctrines, there are five categories of dharmas: (1) eight mental dharmas, encompassing the five sense consciousnesses, cognition, the cognitive faculty, and the store consciousness; (2) eleven elements relating to appearances or material forms; (3) fifty-one mental capacities or functions, activities, and dispositions; (4) twenty-four situations, processes, and things not associated with the mind — for example, time and becoming; and (5) six non-conditioned or non-created elements — for instance, space and the nature of existence.

Alaya-consciousness is posited as the receptacle of the imprint of thoughts and deeds, and thus it is the dwelling of sundry karmic seeds. These “germs” develop into form, feeling, perception, impulse, and consciousness, collectively known as the Five Aggregates. Then ideation gradually takes shape, which sets a self or mind against an outer world. Finally comes the awareness of the objects of thought via sense perceptions and ideas. The store consciousness must be purified of its subject-object duality and notions of false existence, and restored to its pure state tantamount to buddhahood, the Absolute Suchness, and the undifferentiated. In line with these three elements of false imagination, right knowledge, and suchness are the three modes in which things respectively exist: (1) as the mere fictions of false imagination; (2) as relatively existent under certain conditions; and (3) in the perfect mode of being. Corresponding to this threefold version of the modes of existence is the tri-body doctrine of the Buddha: the Dharma Body, the Reward Body, and the Response Body, a creed that was given its systematic and highly developed form by Yogacara thinkers. The distinguishing features of the Faxiang School lie in its emphasis on meditation and broadly psychological analyses. Seen in this light, it is a far cry from the other predominant Mahayana stream, Madhyamika, where the stress is entirely upon dialectics and logical arguments.

The base consciousness is interpreted as the container of the karmic impressions or seeds, nourished by us beings in the process of our existence. These seeds, ripening in the course of future circumstances, find the nearest parallel to the present-day understanding of genes. In view of the foregoing, philosophers of this school have constantly essayed to explain in detail how karmic force actually operates and affects us on a concrete, personal level. Comprised in this development of consciousness theory is the concept of conscious justification — phenomena that are presumably external to us can never exist but in intimate association with consciousness itself. Such a notion is commonly referred to as “Mind Only.”

The fundamental early canonical texts that expound Yogacara doctrines are such scriptures as the Samdhinirmocana-sutra (Sutra on Understanding Profound and Esoteric Doctrine) and the Srimala-sutra (Sutra on the Lion’s Roar of Queen Srimala), and treatises like the Mahayana-samparigraha-sastra, the Prakaranaryavaca-sastra (Acclamation of the Scriptural Teaching), and the Yogacarabhumi.

5. Conclusion

As an early and influential Chinese Buddhist monk, Xuanzang embodies the tensions inherent in Chinese Buddhism: filial piety versus monastic discipline, Confucian orthodoxy versus Mahayana progressivism, etc. Such tensions can be seen not only in his personal legacies, which include the extremely popular Chinese novel based on his travels, Xiyouji (Journey to the West), but also in the career of scholastic Buddhism in China.

For a time during the middle of the Tang Dynasty the Faxiang School achieved a high degree of eminence and popularity across China, but after the passing of Xuanzang and Kuiji the school swiftly declined. One of the factors contributing to this decline was the anti-Buddhist imperial persecution of 845. Another likely factor was the harsh criticism of Faxiang by members of the Huayan (Hua-yen) School. In addition, the philosophy of this school, with its abstruse terminology and hairsplitting analysis of the mind and the senses, was too alien to be accepted by the practical-minded Chinese.

6. References and Further Reading

  • Bapat, P. V., and K. A. Nilakanta Sastri, eds. 2500 Years of Buddhism. Delhi: Government of India Press, 1964.
  • Bernstein, Richard. Ultimate Journey: Retracing the Path of an Ancient Buddhist Monk Who Crossed Asia in Search of Enlightenment. New York: Alfred A. Knopf, 2001.
  • Brown, Brian Edward. The Buddha Nature: A Study of the Tathagatagarbha and Alayavijnana. Delhi: Motilal Banarsidass, 1991.
  • Ch’en, Kenneth K. S. Buddhism in China: A Historical Survey. Princeton: Princeton University Press, 1973.
  • Chatterjee, Ashok Kumar. The Yogacara Idealism. Delhi: Motilal Banarsidass, 1987.
  • The Unknown Hsuan-Tsang. Oxford: Oxford University Press, 2001.
  • Edkins, Joseph. Chinese Buddhism: A Volume of Sketches, Historical, Descriptive and Critical. San Francisco: Chinese Materials Center, 1976.
  • Grousset, Rene. In the Footsteps of the Buddha. San Francisco: Chinese Materials Center, 1976.
  • Hwui Li. The life of Hiuen-Tsiang. London: Kegan Paul, Trench, and Trubner, 1911.
  • Kieschnick, John. The Eminent Monk: Buddhist Ideals in Medieval Chinese Hagiography. Honolulu: University of Hawaii Press, 1997.
  • Lan Ji-fu, ed. The Chung-hwa Fo Jian Bai Ke Quan Shu: Religious Affairs Committee of Foguangshan Buddhist Order, 1993.
  • Lusthaus, Dan. Buddhist Phenomenology: A Philosophical Investigation of Yogacara Buddhism and the Ch’eng Wei-shih lun. London: Routledge Curzon, 2002.
  • Nagao, Gadjin M. Madhyamika and Yogacara: A Study of Mahayana Philosophies. Albany: State University of New York Press, 1991.
  • Pachow, W. Chinese Buddhism: Aspects of Interaction and Reinterpretation. Lanham, MD: University Press of America, 1980.
  • Sharf, Robert H. Coming to Terms with Chinese Buddhism: A Reading of the Treasure Store Treatise. Honolulu: University of Hawaii Press, 2002.
  • Waley, Arthur. The Real Tripitaka, and Other Pieces. London: George Allen & Unwin, 1952.
  • Watters, Thomas. On Yuan Chwang’s Travels in India: A. D. 629-645. Delhi: Munshiram Manoharlal, 1996.
  • Williams, Paul. Mahayana Buddhism: The Doctrinal Foundations. London: Routledge, 1991.
  • Wriggins, Sally Hovey. Xuanzang: A Buddhist Pilgrim on the Silk Road. Boulder: Westview Press, 1996.

Author Information

Der Huey Lee
Email: leederhuey@hotmail.com
Peking University
China

Validity and Soundness

A deductive argument is said to be valid if and only if it takes a form that makes it impossible for the premises to be true and the conclusion nevertheless to be false. Otherwise, a deductive argument is said to be invalid.

A deductive argument is sound if and only if it is valid and all of its premises are actually true. Otherwise, a deductive argument is unsound.
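
For simple propositional arguments, these two definitions can be checked mechanically. The following is a minimal Python sketch, not part of the original article. It assumes that each premise and the conclusion is represented as a function from a truth assignment to a Boolean; the names `is_valid` and `is_sound` are hypothetical helpers introduced only for illustration. Validity is tested by searching every assignment for a counterexample, and soundness additionally requires the premises to be true under an assignment taken to describe the actual world.

```python
from itertools import product

def is_valid(premises, conclusion, variables):
    """True if no assignment makes every premise true and the conclusion false."""
    for values in product([True, False], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(p(world) for p in premises) and not conclusion(world):
            return False  # counterexample: true premises, false conclusion
    return True

def is_sound(premises, conclusion, variables, actual_world):
    """True if the argument is valid and its premises hold in actual_world."""
    return (is_valid(premises, conclusion, variables)
            and all(p(actual_world) for p in premises))

variables = ["P", "Q"]

# "P or Q; not P; therefore Q"
premises = [lambda w: w["P"] or w["Q"], lambda w: not w["P"]]
conclusion = lambda w: w["Q"]

# "If P then Q; Q; therefore P" (an invalid form)
bad_premises = [lambda w: (not w["P"]) or w["Q"], lambda w: w["Q"]]
bad_conclusion = lambda w: w["P"]

print(is_valid(premises, conclusion, variables))          # True
print(is_valid(bad_premises, bad_conclusion, variables))  # False
print(is_sound(premises, conclusion, variables, {"P": False, "Q": True}))  # True
print(is_sound(premises, conclusion, variables, {"P": True, "Q": True}))   # False
```

The sketch covers only arguments built from sentence connectives; arguments that turn on words such as "all" or "some" need a richer representation.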

According to the definition of a deductive argument (see the article on Deduction and Induction), the author of a deductive argument always intends that the premises provide the sort of justification for the conclusion whereby if the premises are true, the conclusion is guaranteed to be true as well. Loosely speaking, if the author’s process of reasoning is a good one, if the premises actually do provide this sort of justification for the conclusion, then the argument is valid.

In effect, an argument is valid if the truth of the premises logically guarantees the truth of the conclusion. The following argument is valid, because it is impossible for the premises to be true and the conclusion nevertheless to be false:

Elizabeth owns either a Honda or a Saturn.
Elizabeth does not own a Honda.
Therefore, Elizabeth owns a Saturn.

It is important to stress that the premises of an argument do not actually have to be true in order for the argument to be valid. An argument is valid if the premises and conclusion are related to each other in the right way so that if the premises were true, then the conclusion would have to be true as well. We can recognize in the above case that even if one of the premises is actually false, if they had both been true the conclusion would have been true as well. Consider, then, an argument such as the following:

All toasters are items made of gold.
All items made of gold are time-travel devices.
Therefore, all toasters are time-travel devices.

Obviously, the premises in this argument are not true. It may be hard to imagine these premises being true, but it is not hard to see that if they were true, their truth would logically guarantee the conclusion’s truth.

It is easy to see that the previous example is not an example of a completely good argument. A valid argument may still have a false conclusion. When we construct our arguments, we must aim to construct ones that are not only valid, but sound. A sound argument is one that is not only valid, but begins with premises that are actually true. The example given about toasters is valid, but not sound. However, the following argument is both valid and sound:

In some states, no felons are eligible voters, that is, eligible to vote.
In those states, some professional athletes are felons.
Therefore, in some states, some professional athletes are not eligible voters.

Here, not only do the premises provide the right sort of support for the conclusion, but the premises are actually true. Therefore, so is the conclusion. Although it is not part of the definition of a sound argument, because sound arguments both start out with true premises and have a form that guarantees that the conclusion must be true if the premises are, sound arguments always end with true conclusions.

It should be noted that both invalid, as well as valid but unsound, arguments can nevertheless have true conclusions. One cannot reject the conclusion of an argument simply by discovering a given argument for that conclusion to be flawed.

Whether or not the premises of an argument are true depends on their specific content. However, according to the dominant understanding among logicians, the validity or invalidity of an argument is determined entirely by its logical form. The logical form of an argument is that which remains of it when one abstracts away from the specific content of the premises and the conclusion, that is, words naming things, their properties and relations, leaving only those elements that are common to discourse and reasoning about any subject matter, that is, words such as “all,” “and,” “not,” “some,” and so forth. One can represent the logical form of an argument by replacing the specific content words with letters used as place-holders or variables.

For example, consider these two arguments:

All tigers are mammals.
No mammals are creatures with scales.
Therefore, no tigers are creatures with scales.

All spider monkeys are elephants.
No elephants are animals.
Therefore, no spider monkeys are animals.

These arguments share the same form:

All A are B;
No B are C;
Therefore, No A are C.

All arguments with this form are valid. Because they have this form, the examples above are valid. However, the first example is sound while the second is unsound, because its premises are false. Now consider:

All basketballs are round.
The Earth is round.
Therefore, the Earth is a basketball.

All popes reside at the Vatican.
John Paul II resides at the Vatican.
Therefore, John Paul II is a pope.

These arguments also have the same form:

All A’s are F;
X is F;
Therefore, X is an A.

Arguments with this form are invalid. This is easy to see with the first example. The second example may seem like a good argument because the premises and the conclusion are all true, but note that the conclusion’s truth isn’t guaranteed by the premises’ truth. It could have been possible for the premises to be true and the conclusion false. This argument is invalid, and all invalid arguments are unsound.
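
The contrast between the two forms can be illustrated by searching for counterexamples, that is, interpretations of the placeholder letters that make the premises true and the conclusion false. The sketch below assumes a hypothetical three-object domain and interprets each letter as a set of objects. Finding a counterexample shows that a form is invalid; finding none in such a small domain is only suggestive and is not a proof of validity.

```python
from itertools import chain, combinations, product

UNIVERSE = (0, 1, 2)  # hypothetical domain of three objects

def subsets(items):
    """Return every subset of items as a frozenset."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def counterexample_form_1():
    """All A are B; No B are C; therefore, No A are C."""
    for a, b, c in product(subsets(UNIVERSE), repeat=3):
        premises_true = a <= b and not (b & c)
        conclusion_true = not (a & c)
        if premises_true and not conclusion_true:
            return a, b, c
    return None  # no counterexample in this domain

def counterexample_form_2():
    """All A's are F; X is F; therefore, X is an A."""
    for a, f in product(subsets(UNIVERSE), repeat=2):
        for x in UNIVERSE:
            premises_true = a <= f and x in f
            conclusion_true = x in a
            if premises_true and not conclusion_true:
                return a, f, x
    return None

print(counterexample_form_1())  # None: the search finds no counterexample
print(counterexample_form_2())  # some (A, F, x) with x in F but not in A
```

Any counterexample to the second form has the same structure as the basketball argument: an object that has the property F without belonging to A.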

While it is accepted by most contemporary logicians that logical validity and invalidity are determined entirely by form, there is some dissent. Consider, for example, the following arguments:

My table is circular. Therefore, it is not square shaped.

Juan is a bachelor. Therefore, he is not married.

These arguments, at least on the surface, have the form:

x is F;
Therefore, x is not G.

Arguments of this form are not valid as a rule. However, it seems clear in these particular cases that it is, in some strong sense, impossible for the premises to be true while the conclusion is false. Logicians respond to these complications in various ways. Some might insist (although this is controversial) that these arguments actually contain implicit premises such as “Nothing is both circular and square shaped” or “All bachelors are unmarried,” which, while themselves necessary truths, nevertheless play a role in the form of these arguments. It might also be suggested, especially with the first argument, that while (even without the additional premise) there is a necessary connection between the premise and the conclusion, the sort of necessity involved is something other than “logical” necessity, and hence that this argument (in the simple form) should not be regarded as logically valid. Lastly, especially with regard to the second example, it might be suggested that because “bachelor” is defined as “adult unmarried male,” the true logical form of the argument is the following universally valid form:

x is F and not G and H;
Therefore, x is not G.

The logical form of a statement is not always as easy to discern as one might expect. For example, statements that seem to have the same surface grammar can nevertheless differ in logical form. Take for example the two statements:

(1) Tony is a ferocious tiger.
(2) Clinton is a lame duck.

Despite their apparent similarity, only (1) has the form “x is an A that is F.” From it one can validly infer that Tony is a tiger. One cannot validly infer from (2) that Clinton is a duck. Indeed, one and the same sentence can be used in different ways in different contexts. Consider the statement:

(3) The King and Queen are visiting dignitaries.

It is not clear what the logical form of this statement is. Either there are dignitaries that the King and Queen are visiting, in which case the sentence (3) has the same logical form as “The King and Queen are playing violins,” or the King and Queen are themselves the dignitaries who are visiting from somewhere else, in which case the sentence has the same logical form as “The King and Queen are sniveling cowards.” Depending on which logical form the statement has, inferences may be valid or invalid. Consider:

The King and Queen are visiting dignitaries. Visiting dignitaries is always boring. Therefore, the King and Queen are doing something boring.

Only if the statement is given the first reading can this argument be considered to be valid.

Because of the difficulty in identifying the logical form of an argument, and the potential deviation of logical form from grammatical form in ordinary language, contemporary logicians typically make use of artificial logical languages in which logical form and grammatical form coincide. In these artificial languages, certain symbols, similar to those used in mathematics, are used to represent those elements of form analogous to ordinary English words such as “all”, “not”, “or”, “and”, and so forth. The use of an artificially constructed language makes it easier to specify a set of rules that determine whether or not a given argument is valid or invalid. Hence, the study of which deductive argument forms are valid and which are invalid is often called “formal logic” or “symbolic logic.”

In short, a deductive argument must be evaluated in two ways. First, one must ask if the premises provide support for the conclusion by examining the form of the argument. If they do, then the argument is valid. Then, one must ask whether the premises are true or false in actuality. Only if an argument passes both these tests is it sound. However, if an argument does not pass these tests, its conclusion may still be true, even though the argument gives no support for its truth.

Note: there are other, related, uses of these words that are found within more advanced mathematical logic. In that context, a formula (on its own) written in a logical language is said to be valid if it comes out as true (or “satisfied”) under all admissible or standard assignments of meaning to that formula within the intended semantics for the logical language. Moreover, an axiomatic logical calculus (in its entirety) is said to be sound if and only if all theorems derivable from the axioms of the logical calculus are semantically valid in the sense just described.

For a more sophisticated look at the nature of logical validity, see the articles on “Logical Consequence” in this encyclopedia. The articles on “Argument” and “Deductive and Inductive Arguments” in this encyclopedia may also be helpful.

Author Information

The author of this article is anonymous. The IEP is actively seeking an author who will write a replacement article.

Special Relativity: Proper Time, Coordinate Systems, and Lorentz Transformations

This supplement to the main Time article explains some of the key concepts of the Special Theory of Relativity (STR). It shows how the predictions of STR differ from classical mechanics in the most fundamental way. Some basic mathematical knowledge is assumed.

Table of Contents

  1. Proper Time
  2. The STR Relationship Between Space, Time, and Proper Time
  3. Coordinate Systems
    1. Coordinates as a Mathematical Language for Time and Space
  4. Cartesian Coordinates for Space
  5. Choice of Inertial Reference Frame
  6. Operational Specification of Coordinate Systems for Classical Space and Time
  7. Operational Specification of Coordinate Systems for STR Space and Time
  8. Operationalism
  9. Coordinate Transformations and Object Transformations
  10. Valid Transformations
  11. Velocity Boosts in STR and Classical Mechanics
  12. Galilean Transformation of Coordinate System
  13. Lorentz Transformation of Coordinate System
  14. Time and Space Dilation
  15. The Full Special Theory of Relativity
  16. References and Further Reading

1. Proper Time

The essence of the Special Theory of Relativity (STR) is that it connects three distinct quantities to each other: space, time, and proper time. ‘Time’ is also called coordinate time or real time, to distinguish it from ‘proper time’. Proper time is also called clock time, or process time, and it is a measure of the amount of physical process that a system undergoes. For example, proper time for an ordinary mechanical clock is recorded by the number of rotations of the hands of the clock. Alternatively, we might take a gyroscope, or a freely spinning wheel, and measure the number of rotations in a given period. We could also take a chemical process with a natural rate, such as the burning of a candle, and measure the proportion of candle that is burnt over a given period.

Note that these processes are measured by ‘absolute quantities’: the number of times a wheel spins on its axis, or the proportion of candle that has burnt. These give absolute physical quantities and do not depend upon assigning any coordinate system, as does a numerical representation of space or real time. The numerical coordinate systems we use firstly require a choice of measuring units (meters and seconds, for example). Even more importantly, the measurement of space and real time in STR is relative to the choice of an inertial frame. This choice is partly arbitrary.

Our numerical representation of proper time also requires a choice of units, and we adopt the same units as we use for real time (seconds). But the choice of a coordinate system, based on an inertial frame, does not affect the measurement of proper time. We will consider the concept of coordinate systems and measuring units shortly.

Proper time can be defined in classical mechanics through cyclic processes that have natural periods – for instance, pendulum clocks are based on counting the number of swings of a pendulum. More generally, any natural process in a classical system runs through a sequence of physical states at a certain absolute rate, and this is the ‘proper time rate’ for the system.

In classical physics, two identical types of systems (with identical types of internal construction, and identical initial states) are predicted to have the same proper time rates. That is, they will run through their physical states in perfect correlation with each other.

This holds even if two identical systems are in relative constant motion with respect to each other. For instance, two identical classical clocks would run at the same rate, even if one is kept stationary in a laboratory, while the other is placed in a spaceship traveling at high speed.

This invariance principle is fundamental to classical physics, and it means that in classical physics we can define: Coordinate time = Proper time for all natural systems. For this reason, the distinction between these two concepts of time was hardly recognized in classical physics (although Newton did distinguish them conceptually, regarding ‘real time’ as an absolute temporal flow, and ‘proper time’ as merely a ‘sensible measure’ of real time; see his Scholium).

However, the distinction only gained real significance in the Special Theory of Relativity, which contradicts classical physics by predicting that the rate of proper time for a system varies with its velocity, or motion through space. The relationship is very simple: the faster a system travels through space, the slower its internal processes go. At the maximum possible speed, the speed of light, c, the internal processes in a physical system would stop completely. Indeed, for light itself, the rate of proper time is zero: there is no ‘internal process’ occurring in light. It is as if light is ‘frozen’ in a specific internal state.

At this point, we should mention that the concept of proper time appears more strongly in quantum mechanics than in classical mechanics, through the intrinsically ‘wave-like’ nature of quantum particles. In classical physics, single point-particles are simple things, and do not have any ‘internal state’ that represents proper time, but in quantum mechanics, the most fundamental particles have an intrinsic proper time, represented by an internal frequency. This is directly related to the wave-like nature of quantum particles. For radioactive systems, the rate of radioactive decay is a measure of proper time. Note that the amount of decay of a substance can be measured in an absolute sense. For light, treated as a quantum mechanical particle (the photon), the rate of proper time is zero, and this is because it has no mass. But for quantum mechanical particles with mass, there is always a finite ‘intrinsic’ proper time rate, represented by the ‘phase’ of the quantum wave. Classical particles do not have any correlate of this feature, which is responsible for quantum interference effects and other non-classical ‘wave-like’ behavior.

2. The STR Relationship Between Space, Time, and Proper Time

STR predicts that motion of a system through space is directly compensated by a decrease in real internal processes, or proper time rates. Thus, a clock will run fastest when it is stationary. If we move it about in space, its rate of internal processes will decrease, and it will run slower than an identical type of stationary clock. The relationship is precisely specified by the most profound equation of STR, usually called the metric equation (or line metric equation). The metric equation is:

(1) c²Δτ² = c²Δt² − Δr²

This applies to the trajectory of any physical system. The quantities involved are:

Δ is the difference operator.

Δτ is the amount of proper time elapsed between two points on the trajectory.

Δt is the amount of real time elapsed between two points on the trajectory.

Δr is the amount of motion through space between two points on the trajectory.

c is the speed of light, and depends on the units we choose for space and time.

The meaning of this equation is illustrated by considering simple trajectories depicted in a space-time diagram.

Figure 1. Two simple space-time trajectories.

If we start at an initial point on the trajectory of a physical system, and follow it to a later point, we find that the system has covered a certain amount of physical space, Δr, over a certain amount of real time, Δt, and has undergone a certain amount of internal process or proper time, Δτ. As long as we use the same units (seconds) to represent proper time and real time, these quantities are connected as described in Equation (1). Proper time intervals are shown in Figure 1 by blue dots along the trajectories. If these were trajectories of clocks, for example, then the blue dots would represent seconds ticked off by the clock mechanism.

In Figure 1, we have chosen to set the speed of light as 1. This is equivalent to using our normal units for time, i.e. seconds, but choosing the units for space as c meters (instead of 1 meter), where c is the speed of light in meters per second. This system of units is often used by physicists for convenience, and it appears to make the quantity c drop out of the equations, since c = 1. However, it is important to note that c is a dimensional constant, and even if its numerical value is set equal to 1 by choosing appropriate units, it is still logically necessary in Equation (1) for the equation to balance dimensionally. Multiplying an interval of time, Δt, by the quantity c converts a temporal quantity into a spatial quantity. Equations of physics, just like ordinary propositions, can only identify objects or quantities of the same physical kinds with each other, and the role of c as a dimensional constant remains crucial in Equation (1), for the identity it states to make any sense.

Trajectories in Figure 1

  • Trajectory 1 (green) is for a stationary particle, hence Δr = 0 (it has no motion through space), and putting this value in Equation (1), we find that: Δτ = Δt. For a stationary particle, the amount of proper time is equal to the amount of coordinate time.
  • Trajectory 2 (red) is for a moving particle, and Δr > 0. We have chosen the velocity in this example to be: v = c/2, half the speed of light. But: v = Δr/Δt (distance traveled in the interval of time). Hence: Δr = ½cΔt. Putting this value into Equation (1), we get: c²Δτ² = c²Δt² − (½cΔt)², or: Δτ = √(¾)Δt ≈ 0.87Δt. Hence the amount of proper time is only about 87% of coordinate time. Even though this trajectory is very fast, proper time is still only slowed down a little.
  • Trajectory 3 (black) is for a particle moving at the speed of light, with v = c, giving: Δr = cΔt. Putting this in Equation (1), we get: c²Δτ² = c²Δt² − (cΔt)² = 0. Hence for a light-like particle, the amount of proper time is equal to 0.
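
The three cases above can be reproduced numerically. The following is a minimal sketch, assuming Equation (1) has the form c²Δτ² = c²Δt² − Δr² that the worked cases imply; the function name proper_time and the one-second interval are illustrative choices, not part of the original text.

```python
import math

C = 299_792_458.0  # speed of light in meters per second

def proper_time(dt, dr, c=C):
    """Proper time elapsed, from c^2 dtau^2 = c^2 dt^2 - dr^2.

    dt is the coordinate time interval in seconds and dr is the
    distance covered in meters over that interval.
    """
    interval_sq = (c * dt) ** 2 - dr ** 2
    if interval_sq < 0:
        raise ValueError("dr exceeds c * dt: the trajectory would be faster than light")
    return math.sqrt(interval_sq) / c

dt = 1.0  # one second of coordinate time (illustrative)
cases = [("stationary", 0.0), ("half the speed of light", 0.5 * C), ("speed of light", C)]
for label, v in cases:
    dtau = proper_time(dt, v * dt)  # dr = v * dt for uniform motion
    print(f"{label}: proper time = {dtau:.4f} s per {dt:.1f} s of coordinate time")
```

The middle case gives about 0.87 seconds of proper time per second of coordinate time, matching the 87% figure for Trajectory 2.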

Now from the classical point of view, Equation (1) is a surprise – indeed, it seems bizarre! For how can mere motion through space directly and precisely affect the rate of physical processes occurring in a system? We are used to the opposite idea, that motion through space, by itself, has no intrinsic effect on processes. This is at the heart of the classical Galilean invariance or symmetry. But STR breaks this rule.

We can compare this situation with classical physics, where (for linear trajectories) we have two independent equations:

(2.a) Δτ = Δt

(2.b) Δr = vΔt for some v ∈ ℝ (real numbers)

  • Equation (2.a) just means that the rate of proper time in a system is invariant – and we measure it in the same units as coordinate time, t.
  • Equation (2.b) just means that every particle or system has some finite velocity or speed, v, through space, with v defined by: v = Δr/Δt.

There is no connection here between proper time and spatial motion of the system.

The fact that (2) is replaced by (1) in STR is very peculiar indeed. It means that the rate of internal process in a system like a clock (whether it is a mechanical, chemical, or radioactive clock) is automatically connected to the motion of the clock in space. If we speed up a clock in motion through space, the rate of internal process slows down in a precise way to compensate for the motion through space.
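
The precise way in which the rate slows can be made explicit with a short derivation. It assumes Equation (1) has the form c²Δτ² = c²Δt² − Δr², as the worked trajectories imply, and uses the definition v = Δr/Δt for uniform motion:

```latex
\begin{align*}
c^{2}\,\Delta\tau^{2} &= c^{2}\,\Delta t^{2} - \Delta r^{2}
  && \text{Equation (1)} \\
\left(\frac{\Delta\tau}{\Delta t}\right)^{2}
  &= 1 - \frac{1}{c^{2}}\left(\frac{\Delta r}{\Delta t}\right)^{2}
  && \text{divide both sides by } c^{2}\,\Delta t^{2} \\
\Delta\tau &= \Delta t\,\sqrt{1 - \frac{v^{2}}{c^{2}}}
  && \text{substitute } v = \Delta r/\Delta t
\end{align*}
```

Setting v = 0 recovers the classical relation (2.a), v = c/2 gives the factor √(¾) ≈ 0.87, and v = c gives zero proper time.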

The great mystery is that there is no apparent mechanism for this effect, called time dilation. In classical physics, to slow down a clock, we have to apply some force like friction to its internal mechanism. In STR, the physical process of a system is slowed down just by moving it around. This applies equally to all physical processes. For instance, a radioactive isotope decays more slowly at high speed. And even animals, including human beings, should age more slowly if they move around at high speed, giving rise to the Twin Paradox.

In fact, time dilation was already recognized by Lorentz and Poincaré, who developed most of the essential mathematical relationships of STR before Einstein. But Einstein formulated a more comprehensive theory, and, with important contributions by Minkowski, he provided an explanation for the effects. The Einstein-Minkowski explanation appeals to the new concept of a space-time manifold, and interprets Equation (1) as a kind of ‘geometric’ feature of space-time. This view has been widely embraced in 20th Century physics. By contrast, Lorentz refused to believe in the ‘geometric’ explanation, and he thought that motion through space has some kind of ‘mechanical’ effect on particles, which causes processes to slow down. While Lorentz’s view is dismissed by most physicists, some writers have persisted with similar ideas, and the issues involved in the explanation of Equation (1) continue to be of deep interest, to philosophers at least.

But before moving on to the explanation, we need to discuss the concepts of coordinate systems for space and time, which we have been assuming so far without explanation.

3. Coordinate Systems

In physics we generally assume that space is a three dimensional manifold and time is a one dimensional continuum. A coordinate system is a way of representing space and time using numbers to represent points. We assign a set of three numbers, (x,y,z), to characterize points in space, and one number, t, to characterize a point in time. Combining these, we have general space-time coordinates: (x,y,z,t). The idea is that every physical event in the universe has a ‘space-time location’, and a coordinate system provides a numerical description of the system of these possible ‘locations’.

Classical coordinate systems were used by Descartes, Galileo, Newton, Leibniz, and other classical physicists to describe space. Classical space is assumed to be a three dimensional Euclidean manifold. Classical physicists added time coordinates, t, as an additional parameter to characterize events. The principles behind coordinate systems seemed very intuitive and natural up until the beginning of the 20th century, but things changed dramatically with the STR. One of Einstein’s first great achievements was to reexamine the concept of a coordinate system, and to propose a new system suited to STR, which differs from the system for classical physics. In doing this, Einstein recognized that the notion of a coordinate system is theory dependent. The classical system depends on adopting certain physical assumptions of classical physics – for instance, that clocks do not alter their rates when they are moved about in space. In STR, some of the laws underpinning these classical assumptions change, and this changes our very assumptions about how we can measure space and time. To formulate STR successfully, Einstein could not simply propose a new set of physical laws within the existing classical framework of ideas about space and time: he had to simultaneously reformulate the representation of space and time. He did this primarily by reformulating the rules for assigning coordinate systems for space and time. He gave a new system of rules suited to the new physical principles of STR, and reexamined the validity of the old rules of classical physics within this new system.

A key feature Einstein focused on is that a coordinate system involves a system of operational principles, which connect the features of space and time with physical processes or ‘operations’ that we can use to measure those features. For instance, the theory of classical space assumes that there is an intrinsic distance (or length) between points of space. We may take distance itself to be an underlying feature of ‘empty space’. Geometric lines can be defined as collections of points in space, and line segments have intrinsic lengths, prior to any physical objects being placed in space. But of course, we only measure (or perceive) the underlying structure of space by using physical objects or physical processes to make measurements. Typically, we use ‘straight rigid rulers’ to measure distances between points of space; or we use ‘uniform, standard clocks’ to measure the time intervals between moments of time. Rulers and clocks are particular physical objects or processes, and for them to perform their measurement functions adequately, they must have appropriate physical properties.

But those physical properties are the subject of the theories of physics themselves. Classical physics, for example, assumes that ordinary rigid rulers maintain the same length (or distance between the end-points) when they are moved around in space. It also assumes that there are certain types of systems (providing ‘idealized clocks’) that produce cyclic physical processes, and maintain the same temporal intervals between cycles through time, even if we move these systems around in space.

These assumptions are internally consistent with principles of measurement in classical physics. But they are contradicted in STR, and Einstein had to reformulate the operational principles for measuring space and time, in a way that is internally consistent with the new physical principles of STR.

We will briefly describe these new operational principles shortly, but there are some features of coordinate systems that are important to appreciate first.

a. Coordinates as a Mathematical Language for Time and Space

The assignment of a numerical coordinate system for time or space is thought of as providing a mathematical language (using numbers as names) for representing physical things (time and space). In a sense, this language could be ‘arbitrarily chosen’: there are no laws about what names can be used to represent things. But naturally there are features that we want a coordinate system to reflect. In particular, we want the assignment of numbers to directly reflect the concepts of distance between points of space, and the size of intervals between moments of time.

We perform mathematical operations on numbers, and we can subtract two numbers to find the ‘numerical distance’ between them. For numbers are really defined as certain structures, with features such as continuity, and we want to use the structures of number systems to represent structural features of space and time.

For instance, we assume in our fundamental physical theory that any two intervals of time have intrinsic magnitudes, which can be compared to each other. The ‘intrinsic temporal distance’ between two moments, t1 and t2, may be the same as that between two quite different moments, t3 and t4. We naturally want to assign numbers to times so that ordinary numerical subtraction corresponds to the ‘intrinsic temporal distance’ between events. We choose a ‘uniform’ coordinate system for time to achieve this.

Figure 2. A coordinate system for time gives a mathematical language for a physical thing. Numbers are used as names for moments of time.

4. Cartesian Coordinates for Space

Time is simple because it is one-dimensional. Three-dimensional space is much more complex. Because space is three dimensional, we need three separate real numbers to represent a single point. Physicists normally choose a Cartesian coordinate system to represent space. We represent points in this system as: r = (x,y,z), where x, y, and z are separate numerical coordinates, in three orthogonal (perpendicular) directions.

The numerical structure with real-number points (x,y,z) is denoted in mathematics as ℝ³. Three dimensional space itself (a physical thing) is denoted as E³. A Cartesian coordinate system is a special kind of mapping between points of these two structures. It makes the intrinsic spatial distance between two points in E³ be directly reflected by the ‘numerical distance’ between their numerical coordinates in ℝ³.

The numerical distances in ℝ³ are determined by a numerical function for length. A line from the origin: (0,0,0), to the point r = (x,y,z), which is called the vector r, has its length given by the Pythagorean formula:

|r| = √(x²+y²+z²).

More generally, for any two points, r1 = (x1, y1, z1), and: r2 = (x2, y2, z2), the distance function is:

|r2 – r1| = √((x2 – x1)²+ (y2 – y1)²+ (z2 – z1)²)

The special feature of this system is that the lengths of lines in the x, y, or z directions alone are given directly by the values of the coordinates. E.g. if: r = (x,0,0), then the vector to r is a line purely in the x-direction, and its length is simply: |r| = x. If r1 = (x1,0,0), and: r2 = (x2,0,0), then the distance between them is just: |r2 – r1| = (x2 – x1). As well, a Cartesian coordinate system treats the three directions, x, y, and z, in a symmetric way: the angle between any pair of these directions is the same, 90°. For this reason, a Cartesian system can be rotated, and the same form of the general distance function is maintained in the rotated system.
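
The distance function and its behavior under rotation can be sketched in a few lines. This is an illustration only: the sample points, the 30° angle, and the choice of the z-axis are arbitrary assumptions. The code rotates the points rather than the axes, which is equivalent to rotating the coordinate system by the opposite angle.

```python
import math

def distance(r1, r2):
    """Cartesian distance |r2 - r1| between two points (x, y, z)."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(r1, r2)))

def rotate_about_z(r, angle):
    """Coordinates of point r after rotation by angle (radians) about the z-axis."""
    x, y, z = r
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (cos_a * x - sin_a * y, sin_a * x + cos_a * y, z)

# Illustrative points whose separation is a 3-4-5 triangle in the x-y plane.
r1 = (1.0, 2.0, 3.0)
r2 = (4.0, 6.0, 3.0)

angle = math.radians(30)
d_original = distance(r1, r2)
d_rotated = distance(rotate_about_z(r1, angle), rotate_about_z(r2, angle))

# The same Pythagorean formula yields the same distance in both systems,
# up to floating-point rounding.
print(d_original, d_rotated)
```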

In fact, there are spatial manifolds which do not have any possible Cartesian coordinate system – e.g. the surface of a sphere, regarded as a two dimensional manifold, cannot be represented by using Cartesian coordinates. Such spaces were first studied as geometric systems in the 19th century, and are called non-classical or non-Euclidean geometries. However, classical space is Euclidean, and by definition:

  • Euclidean space can be represented by Cartesian coordinate systems.

We can define alternative, non-Cartesian, coordinate systems for Euclidean space; for instance, cylindrical and spherical coordinate systems are very useful in physics, and they use mixtures of linear or radial distance, and angles, as the numbers to specify points of space. The numerical formulas for distance in these coordinate systems appear quite different from the Cartesian formula. But they are defined to give the same results for the distances between physical points. This is the most crucial feature of the concept of distance in classical physics:

  • Distance between points in classical space (or between two events that occur at the same moment of time) is a physical invariant. It does not change with the choice of coordinate system.

The form of the numerical equation for distance changes with the choice of coordinate system; but this is done deliberately to preserve the physical concept of distance.
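
As an illustration, the sketch below computes the distance between the same two physical points in two ways: directly from spherical coordinates, and by converting to Cartesian coordinates first. It assumes the convention in which θ is the polar angle measured from the z-axis and φ is the azimuthal angle; the sample points are arbitrary.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    """Convert (r, theta, phi) to (x, y, z); theta is measured from the z-axis."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def cartesian_distance(p, q):
    """Pythagorean distance between two points given as (x, y, z)."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

def spherical_distance(p, q):
    """Distance between two points given as (r, theta, phi), computed
    from the law of cosines without converting to Cartesian form."""
    r1, th1, ph1 = p
    r2, th2, ph2 = q
    # Cosine of the angle between the two position vectors.
    cos_gamma = (math.sin(th1) * math.sin(th2) * math.cos(ph1 - ph2)
                 + math.cos(th1) * math.cos(th2))
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * cos_gamma))

p = (2.0, 0.7, 1.1)   # illustrative point (r, theta, phi)
q = (3.5, 2.0, -0.4)  # illustrative point (r, theta, phi)

d_spherical = spherical_distance(p, q)
d_cartesian = cartesian_distance(spherical_to_cartesian(*p), spherical_to_cartesian(*q))
print(d_spherical, d_cartesian)
```

The two formulas look nothing alike, yet they agree up to rounding, because both are defined to report the same physical distance.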

5. Choice of Inertial Reference Frame

A second crucial concept is the idea of a reference frame. A reference frame specifies all the trajectories that are regarded as stationary, or at rest in space. This defines the property of remaining at the same place through time. But the key feature of both classical mechanics and STR is that no unique reference frame is determined. Any object that is not accelerating can be regarded as stationary ‘in its own inertial frame’. It defines a valid reference frame for the whole universe. This is the natural reference frame ‘from the point of view’ of the object, or ‘relative to the object’. But there are many possible choices, because given any particular reference frame, any other frame defined to give everything a constant velocity relative to the first frame is also a valid choice.

The class of possible (physically valid) reference frames is objectively determined, because acceleration is absolutely distinguished from constant motion. Any object that is not accelerating may be regarded as defining a valid reference frame. But the specific choice of a reference frame from the range of possibilities is regarded as arbitrary or conventional. This choice must be made before a coordinate system can be defined to represent distances in space and time. Even after we have chosen a reference frame, there are still innumerable choices of coordinate systems. But the reference frame settles the definition of distances between events, which must be defined as the same in any coordinate system relative to a given reference frame.

The idea of the conventionality of the reference frame is partly evident already in the choice of a Cartesian coordinate system: for it is an arbitrary matter where we choose the origin, or point: 0 = (0,0,0), for such a system. It is also arbitrary which directions we choose for the x, y, and z axes – as long as we make them mutually perpendicular. We are free to rotate a given set of axes, x, y, z, to produce a new set, x’, y’, and z’, and this gives another Cartesian coordinate system. Thus, translations and rotations of Cartesian coordinate systems for space still leave us with Cartesian systems.

But there is a further transformation, which is absolutely central to classical physics, and involves both time and space. This is the Galilean velocity transformation, or velocity boost. The essential point is that we need to apply a spatial coordinate system through time. In pure classical geometry, we do not have to take time into account: we just assign a single coordinate system, at a single moment of time. But in physics we need to apply a coordinate system for space at different moments of time. How do we know whether the coordinate system we apply at one moment of time represents the same coordinate system we use at a later moment of time?

The principles of classical physics mean that we cannot measure ‘absolute location in space’ across time. The reason is the fundamental classical principle that the laws of nature do not distinguish between two inertial frames moving relative to each other at a constant speed. This is the classical Galilean principle of ‘relativity of motion’. Roughly stated, this means that uniform motion through space has no effect on physical processes. And if motion in itself does not affect processes, then we cannot use processes to detect motion.

Newton believed that the classical conception of space requires there to be absolute spatial locations through time nonetheless, and that some special coordinate systems or physical objects will indeed be at ‘absolute rest’ in space. But in the context of classical physics, it is impossible to measure whether any object is at absolute rest, or is in uniform motion in space. Because of this, Leibniz denied that classical physics requires any concept of absolute position in space, and argued that only the notion of ‘relative’ or ‘relational’ space is required. In this view, only the relative positions of objects with regard to each other are considered real. For Newton, the impossibility of measuring absolute space does not prevent it from being a viable concept, and even a logically necessary concept. There is still no general agreement about this debate between ‘absolute’ and ‘relative’ or ‘relational’ conceptions of space. It is one of the great historical debates in the philosophy of both classical and relativistic physics. However, it is generally accepted that classical physics makes absolute space undetectable. This means, at least, that in the context of classical physics there is no way of giving an operational procedure for determining absolute position (or absolute rest) through time.

However, absolute acceleration is detectable. Accelerations are always accompanied by forces. This means that we can certainly specify the class of coordinate systems which are in uniform motion, or which do not accelerate. These special systems are called inertial systems, or inertial frames, or Galilean frames. The existence of inertial frames is a fundamental assumption of classical physics. It is also fundamental in STR, and the notion of an inertial frame is very similar in both theories.

The laws of classical physics are therefore specified for inertial coordinate systems. They are equally valid in any inertial frame. The same holds for the laws of STR. However, the laws for transforming from one inertial frame to another are different for the two theories. To see how this works, we now consider the operational specification of coordinate systems.

6. Operational Specification of Coordinate Systems for Classical Space and Time

In classical physics, we can define an ‘operational’ measuring system, which allows us to assign coordinates to events in space and time.

Classical Time. We imagine measuring time by making a number of uniform clocks, synchronizing them at some initial moment, checking that they all run at exactly the same rates (proper time rates), and then moving clocks to different points of space, where we keep them ‘stationary’ in a chosen inertial frame. We subsequently measure the times of events that occur at the various places, as recorded by the different clocks at those places.

Of course, we cannot assume that our system of clocks is truly stationary. The entire system of clocks placed in uniform motion would also define a valid inertial frame. But the laws of classical physics mean that clocks in uniform inertial motion run at exactly the same rates, and so the times recorded for specific events turn out to be exactly the same, on the assumptions of the classical theory, for any such system of clocks.

Classical Space. We imagine measuring space by constructing a set of rigid measuring rods or rulers of the same length, which we can (imaginatively at least) set up as a grid across space, in an inertial frame. We keep all the rulers stationary relative to each other, and we use them to measure the distances between various events. Again, the main complication is that we cannot determine any absolutely stationary frame for the grid of rulers, and we can set up an alternative system of rulers which is in relative motion. This results in assigning different ‘absolute velocities’ to objects, as measured in two different frames. However, on the assumptions of the classical theory, the relative distance between any two objects or events, taken at any given moment of time, is measured to be the same in any inertial frame. This is because, in classical physics, uniform motion in itself does not alter the lengths of material objects, or the forces between systems of objects. (Accelerations do alter lengths).

7. Operational Specification of Coordinate Systems for STR Space and Time

In STR, the situation is in many ways very similar to classical physics: there is still a special concept of inertial frames, acceleration is absolutely detectable, and uniform velocity is undetectable. According to STR, the laws of physics still are invariant with regard to uniform motion in space, very much like the classical laws.

We also specify operational definitions of inertial coordinate systems in STR in a similar way to classical physics. However, the system sketched above for assigning classical coordinates fails, because it is inconsistent with the physical principles of STR. Einstein was forced to reconstruct the classical system of measurement to obtain a system which is internally consistent with STR.

STR Time. In STR, we can still make uniform clocks, which run at the same rates when they are held stationary relative to each other. But now there is a problem synchronizing them at different points of space. We can start them off synchronized at a particular common point; but moving them to different points of space already upsets their synchronization, according to Equation (1).

However, while synchronizing distant clocks is a problem, they nonetheless run at the same intrinsic rates as each other when held in the same inertial frame. And we can ensure two clocks are in a common inertial frame as long as we can ensure that they maintain the same distance from each other. We see how to do this next.

Given we have two clocks maintained at the same distance from each other, Einstein showed that there is indeed a simple operational procedure to establish synchronization. We send a light signal from Clock 1 to Clock 2, and reflect it back to Clock 1. We record the time it was sent on Clock 1 as t0, and the time it was received again as a later time, t2. We also record the time it was received at Clock 2 as t1’ on Clock 2. Now symmetry of the situation requires that, in the inertial frame of Clock 1, we must assume that the light signal reached Clock 2 at a moment halfway between t0 and t2, i.e. at the time: t1 = t0 + ½(t2 – t0) = ½(t0 + t2). This is because, by symmetry, the light signal must take equal time traveling in either direction between the clocks, given that they are kept at a constant distance throughout the process, and they do not accelerate. (If the light signal took longer to travel one way than the other, then light would have to move at different speeds in different directions, which contradicts STR).

Hence, we must resynchronize Clock 2 to make: t1’ = t1. We simply set the hands on Clock 2 forwards by: (t1 – t1’), i.e. by: ½(t0 + t2) – t1’. (Hence, the coordinate time on Clock 2 at t1’ is changed to: t1’ + (½(t0 + t2) – t1’) = ½(t0 + t2) = t1.)

This is sometimes called the ‘clock synchronization convention’, and some philosophers have argued about whether it is justified. But there is no real dispute that this successfully defines the only system for assigning simultaneity in time, in the chosen reference frame, which is consistent with STR.

Some deeper issues arise over the notion of simultaneity that it seems to involve. From the point of view of Clock 1, the moment recorded at: t1 = ½(t0 + t2) must be judged as ‘simultaneous’ with the moment recorded at t1’ on Clock 2. But in a different inertial frame, the natural coordinate system will alter the apparent simultaneity of these two events, so that simultaneity itself is not ‘objective’ in STR, except relative to a choice of inertial frame. We will consider this later.
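
The synchronization procedure can be illustrated with a small numerical sketch. It is only an illustration: the function name and the clock readings are hypothetical, and the time of reflection is taken to be the midpoint between the sending and return times on Clock 1.

```python
def einstein_sync_offset(t0, t2, t1_prime):
    """Return (t1, offset) for the light-signal synchronization procedure.

    t0       -- time on Clock 1 when the signal is sent
    t2       -- time on Clock 1 when the reflected signal returns
    t1_prime -- reading on Clock 2 when the signal arrives there
    """
    # By symmetry the signal takes equal time each way, so in Clock 1's
    # frame the reflection happens midway between sending and return.
    t1 = t0 + 0.5 * (t2 - t0)
    # Amount by which the hands of Clock 2 must be set forwards.
    offset = t1 - t1_prime
    return t1, offset


# Hypothetical readings, chosen only for illustration.
t1, offset = einstein_sync_offset(t0=10.0, t2=16.0, t1_prime=12.5)
print(t1, offset)
```

A negative offset would simply mean that Clock 2 has to be set backwards rather than forwards.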

STR Space. In STR, we can measure space in a very similar way as in classical physics. We imagine constructing a set of rigid measuring rods or rulers, which are checked to be the same length in the inertial frame of Clock 1, and we extend this out into a grid across space. We have to move the rulers around to start with, but when we have set up the grid, we keep them all stationary in the chosen inertial frame of Clock 1.

We then use this grid of stationary measuring rods to measure the distances between various events. The main assumption is that identical types of measuring rods (which are the same lengths when we originally compare them at rest with Clock 1), maintain the same lengths after being moved to different places (and being made stationary again with regard to Clock 1). This feature is required by STR.

The main complication, once again, is that we cannot determine any absolutely stationary frame for the grid of rulers. We can set up an alternative system of rulers, which are all in relative motion in a different inertial frame. As in classical physics, this results in assigning different ‘absolute velocities’ to most trajectories in the two different frames. But in this case there is a deeper difference: on the assumptions of STR, the lengths of measuring rods alter according to their velocities. This is called space dilation, and it is the counterpart of time dilation.

Nonetheless, Einstein showed that perfectly sensible operational definitions of coordinate measurements for length, as well as time, are available in STR. But both simultaneity and length become relative to specified inertial frames.

It is this confusing conceptual problem, which involves the theory dependence of measurement, that Einstein first managed to unravel, as the prelude to showing how to radically reconstruct classical physics.

8. Operationalism

Unraveling this problem requires us to specify ‘operational principles’ of measurement, but this does not require us to embrace an operational theory of meaning. The latter is a form of positivism, and it holds that the meaning of ‘time’ or ‘space’ in physics is determined entirely by specifying the procedures for measuring time or space. This theory is generally rejected by philosophers and logicians, and it was rejected by Einstein himself in his mature work. According to operationalism, STR changes the meanings of the concepts of space and time from the classical conception. However, many philosophers would argue that ‘time’ and ‘space’ have a meaning for us which is essentially the same as for Galileo and Newton, because we identify the same kinds of things as time and space; but relativity theory has altered our scientific beliefs about these things – just as the discovery that water is H2O has altered our understanding of the nature of water, without necessarily altering the meaning of the term ‘water’. This semantic dispute is ongoing in the philosophy of science. Having clarified these basic ideas of coordinate systems and inertial frames, we now turn back to the notion of transformations between coordinate systems for different inertial frames.

9. Coordinate Transformations and Object Transformations

Physics uses two different concepts of transformations. It is important to distinguish these carefully.

  • Coordinate transformations: Taking the description of a given process (such as a trajectory), described in one coordinate system, and transforming to its description in an alternative coordinate system.
  • Object transformations: Taking a given process, described in a given coordinate system, and transforming it into a different process, described in the same coordinate system as the original process.

The difference is illustrated in the following diagram for the simplest kind of transformation, translation of space.

Figure 3. Object, Coordinate, and Combined Transformations.

  • The transformations in Figure 3 are simple space translations.
  • Figure 3 (B) shows an object transformation. The original trajectory (A) is moved in space to the right, by 4 units. The new coordinates are related to the original coordinates by: x(new particle) = x(original particle) + 4.
  • Figure 3 (C) shows a coordinate transformation: the coordinate system is moved to the left by 4 units. The new coordinate system, x’, is related to the original system, x, by: x’(original particle) = x(original particle) + 4. The result ‘looks’ the same as (B).
  • Figure 3 (D) shows a combination of the object transformation (B) and a coordinate transformation, which is the inverse of that in (C), defined by: x’’(original particle) = x(original particle) – 4. The result of this looks the same as the original trajectory in (A), because the coordinate transformation appears to ‘undo’ the effect of the object transformation.
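
The relation between the three cases can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the figure itself: the sample trajectory and the function names are hypothetical, and a trajectory is represented simply as a list of (t, x) points.

```python
def object_translate(trajectory, shift):
    """Move the process itself by `shift` units; the coordinate system is unchanged."""
    return [(t, x + shift) for (t, x) in trajectory]


def redescribe(trajectory, origin_shift):
    """Describe the same process in a coordinate system whose origin has been
    moved by `origin_shift` units (negative = to the left)."""
    return [(t, x - origin_shift) for (t, x) in trajectory]


# Hypothetical trajectory (A): a particle drifting slowly to the right.
A = [(0, 1.0), (1, 1.5), (2, 2.0)]

B = object_translate(A, 4)   # (B): particle moved 4 units to the right
C = redescribe(A, -4)        # (C): coordinate system moved 4 units to the left
D = redescribe(B, 4)         # (D): (B) described in a system moved 4 units right

print(B == C)  # the two descriptions 'look' the same
print(D == A)  # the coordinate change 'undoes' the object transformation
```

Although (B) and (C) produce the same list of numbers, they stand for different things: (B) describes a different process, while (C) is a new description of the original one.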

10. Valid Transformations

There is an intimate connection between these two kinds of transformations. This connection provides the major conceptual apparatus of modern physics, through the concept of physical symmetries, or invariance principles, and valid transformations.

The deepest features of laws or theories of physics are reflected in their symmetry properties, which are also called invariances under symmetry transformations. Laws or theories can be understood as describing classes of physical processes. Physical processes that conform to a theory are valid physical processes of that theory. Of course, not all (logically) possible processes that we can imagine are valid physical processes of a given theory. Otherwise the theory would encompass all possible processes, and tell us nothing about what is physically possible, as opposed to what is logically conceivable.

Symmetries of a theory are described by transformations that preserve valid processes of the theory. For instance, time translation is a symmetry of almost all theories. This means that if we take a valid process, and transform it, intact, to an earlier or later time, we still have a valid process. This is equivalent to simply setting the ‘temporal origin’ of the process to a later or earlier time.

Other common symmetries are:

  • Rotations in space (if we take a valid process, and rotate it to another direction in space, we end up with another valid process).
  • Translations in space (if we take a valid process, and move it to another position in space, we end up with another valid process).
  • Velocity transformations (if we take a valid process, and give it a uniform velocity boost in some direction in space, we end up with another valid process).

These symmetries are valid both in classical physics and in STR. In classical physics, they are called Galilean symmetries or transformations. In STR they are called Lorentz transformations. However, although the symmetries are very similar in both theories, the Lorentz transformations in STR involve features that are not evident in the classical theory. In fact, this difference only emerges for velocity boosts. Translations and rotations are identical in both theories. This is essentially because velocity boosts in STR involve transformations of the connection between proper time and ordinary space and time, which does not appear in classical theory.

The concept of valid coordinate transformations follows directly from that of valid object transformations. The point is that when we make an object transformation, we begin with a description of a process in a coordinate system, and end up with another description, of a different process, given in the same coordinate system. Now instead of transforming the processes involved, we can do the inverse, and make a transformation of the coordinate system, so that we end up with a new coordinate description of the original process, which looks exactly the same as the description of the transformed process in the original coordinate system.

This gives an alternative way of regarding the process, and its transformed image: instead of taking them as two different processes, we can take them as two different coordinate descriptions of the same process.

This is connected to the idea that certain aspects of the coordinate system are arbitrary or conventional. For instance, the choice of a particular origin for time or space is regarded as conventional: we can move the origins in our coordinate description, and we still have a valid system. This is only possible because the corresponding object transformations (time and space translations) are valid physical transformations.

Physicists tend to regard coordinate transformations and valid object transformations interchangeably and somewhat ambiguously, and the distinction between the two is often blurred in applied physics. While this doesn’t cause practical problems, it is important when learning the concepts of the theory to distinguish the two kinds of transformations clearly.

11. Velocity Boosts in STR and Classical Mechanics

STR and classical mechanics have exactly the same symmetries under translations of time and space, and rotations of space. They also both have symmetries under velocity boosts: both theories hold that, if we take a valid physical process, and give it a uniform additional velocity in some direction, we end with another valid physical process. But the transformations of space and time coordinates, and of proper time, are different for the two theories under a velocity boost. In classical physics, it is called a Galilean transformation, while for STR it is called a Lorentz transformation.

To see how the difference appears, we can take a stationary trajectory, and consider what happens when we apply a velocity boost in either theory.

Figure 4. Classical and STR Velocity Boosts give different results.

In both diagrams, the green line is the original trajectory of a stationary particle, and it looks exactly the same in STR and classical mechanics. Proper time events (marked in blue) are equally spaced with the coordinate time intervals in both cases.

If we transform the classical trajectory by giving the particle a velocity (in this example, v = c/2) towards the right, the result (red line) is very simple: the proper time events remain equally spaced with coordinate time intervals. The same sequence of proper time events takes the same amount of coordinate time to complete. The classical particle moves a distance: Δx = vΔt to the right, where Δt is the coordinate time duration of the original process.

But when we transform the STR particle, a strange thing happens: the proper time events become more widely spaced than the coordinate time intervals, and the same sequence of proper time events takes more coordinate time to complete. The STR particle moves a distance: Δx’ = vΔt’ to the right, where: Δt’ > Δt, and hence: Δx’ > Δx.

The transformations of the coordinates of the (proper time) points of the original processes are shown in the following table.

Table 1. Example of Velocity Transformation.

We can work out the general formula for the STR transformations of t’ and x’ in this example by using Equation (1). This requires finding a formula for the transformation of time-space coordinates:

(t, 0) → (t’, x’)

We obtain this by applying Equation (1) in the (t’,x’) coordinate system, giving:

(1’) Δτ² = Δt’² – Δx’²/c²

It is crucial that this equation retains the same form under the Lorentz transformation. In this special case, we have the additional facts that:

(i) Δτ = Δt, and: (ii) Δx’ = vΔt’

We substitute (i) and (ii) in (1’) to get:

Δt² = Δt’² – v²Δt’²/c²

This rearranges to give:

Δt’ = Δt/√(1 – v²/c²) and: Δx’ = vΔt/√(1 – v²/c²)

We can see that: Δx’/Δt’ = v. This is a special case of a Lorentz transformation for this simplest kind of trajectory. Note that if we think of this as a coordinate transformation which generates the appearance of this object transformation, we need to move the new coordinate system in the opposite direction to the motion of the object. I.e. if we define a new coordinate system, (x’,t’), moving at –v (i.e. to the left) with regard to the original (x,t) system, then the original trajectory (which appeared stationary in (x,t)) will appear to be moving with velocity +v (to the right) in (x’,t’). In general, object transformations correspond to the inverse coordinate transformations.
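
The special case just derived can be checked numerically. The sketch below is a rough illustration under stated assumptions: it works in units where c = 1, it takes the boosted coordinate time of a stationary particle to be the original time divided by √(1 – v²/c²), and the function names are hypothetical.

```python
import math


def classical_boost_stationary_interval(dt, v):
    """Classical boost of a stationary particle: coordinate time is unchanged."""
    return dt, v * dt


def boost_stationary_interval(dt, v, c=1.0):
    """STR boost of a stationary particle whose proper time interval is dt.

    Returns (dt_new, dx_new), the coordinate time and distance covered
    after the particle is given velocity v.
    """
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    dt_new = gamma * dt      # more coordinate time for the same proper time
    dx_new = v * dt_new      # distance covered at speed v
    return dt_new, dx_new


# The article's example velocity, v = c/2, over one unit of proper time.
dt_new, dx_new = boost_stationary_interval(1.0, 0.5)
print(classical_boost_stationary_interval(1.0, 0.5))
print(dt_new, dx_new)

# The proper time recovered from the new coordinates is still one unit.
print(math.sqrt(dt_new**2 - dx_new**2))
```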

12. Lorentz Transformations for Velocity Boost V in the x-direction

The previous transformation holds only for points on the special line where: x = 0. More generally, we want to work out the formulae for transforming points anywhere in the coordinate system:

(t, x) → (t’, x’)

The classical formulas are Galilean transformations, and they are very simple.

Galilean Velocity Boost:

(t, x) → (t, x+vt)

t’ = t

x’ = x+vt

The STR formulas are more general Lorentz transformations. The Galilean transformation is simple because time coordinates are unchanged, so that: t = t’. This means that simultaneity in time in classical physics is absolute: it does not depend upon the choice of coordinate system. We also have that distance between two points at a given moment of time is invariant, because if: x₂ – x₁ = Δx, then: x’₂ – x’₁ = (x₂+vt) – (x₁+vt) = Δx. Ordinary distance in space is the crucial invariant quantity in classical physics.
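
A short sketch makes the two invariants of the Galilean boost concrete. The function name and the sample events are hypothetical; the formulas are the ones given above, t’ = t and x’ = x + vt.

```python
def galilean_boost(t, x, v):
    """Galilean velocity boost: time is unchanged, position is shifted by v*t."""
    return t, x + v * t


# Two hypothetical events at the same moment t = 3, at x1 = 2 and x2 = 5.
v = 0.5
t1_new, x1_new = galilean_boost(3.0, 2.0, v)
t2_new, x2_new = galilean_boost(3.0, 5.0, v)

print(t1_new == t2_new)        # the events remain simultaneous
print(x2_new - x1_new)         # their separation is still 3.0
```

The separation is preserved only for events taken at the same moment of time; events at different times are shifted by different amounts.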

But in STR, we have a complex interdependence of time and space coordinates. This is seen because the transformation formulas for both t’ and x’ are functions of both x and t. I.e. there are functions f and g such that:

t’ = f(x,t) and: x’ = g(x,t)

These functions represent the Lorentz transformations. To give stationary objects a velocity V in the x-direction, these general functions are found to be: t’ = (t + Vx/c²)/√(1 – V²/c²) and: x’ = (x + Vt)/√(1 – V²/c²). The factor 1/√(1 – V²/c²) is called γ, letting us write these equations more simply as:

Lorentz Transformations: t’ = γ(t+Vx/c²) and: x’ = γ(x+Vt)

We can equally consider the corresponding coordinate transformation, which would generate the appearance of this object transformation in a new coordinate system. It is essentially the same as the object transformation – except it must go in the opposite direction. The object transformation, which increases the velocity of stationary particles by the speed V in the x direction, corresponds to moving the coordinate system in the opposite direction. I.e. if we define a new coordinate system, and call it (x’,t’), and place this in motion with a speed –V (i.e. V in the negative-x-direction), relative to the (x,t) coordinate system, then the original stationary trajectories in (x,t)-coordinates will appear to have speed V in the new (x’,t’) coordinates.

Because the Lorentz transformation of processes leaves us with valid STR processes, the Lorentz transformation of a STR coordinate system leaves us with a valid coordinate system. In particular, the form of Equation (1) is preserved by the Lorentz transformation, so that we get: Δτ² = Δt’² – Δx’²/c². This can be checked by substituting the formulas for t’ and x’ back into this equation, and simplifying; the resulting equation turns out to be identical to Equation (1).
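
The substitution check can also be carried out numerically. The following is a sketch under two assumptions that should be read as such: Equation (1) is taken to have the one-dimensional form Δτ² = Δt² – Δx²/c², and units are chosen so that c = 1 by default. The function names and the sample events are hypothetical.

```python
import math


def gamma_factor(V, c=1.0):
    """The factor 1/sqrt(1 - V^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - (V / c) ** 2)


def lorentz_boost(t, x, V, c=1.0):
    """Lorentz velocity boost V in the x-direction: returns (t', x')."""
    g = gamma_factor(V, c)
    return g * (t + V * x / c**2), g * (x + V * t)


def proper_time_squared(dt, dx, c=1.0):
    """Assumed form of Equation (1): dtau^2 = dt^2 - dx^2 / c^2."""
    return dt**2 - dx**2 / c**2


# Two hypothetical events, described before and after a boost of V = c/2.
event_a, event_b = (1.0, 0.2), (3.0, 0.9)
V = 0.5
a_new = lorentz_boost(*event_a, V)
b_new = lorentz_boost(*event_b, V)

before = proper_time_squared(event_b[0] - event_a[0], event_b[1] - event_a[1])
after = proper_time_squared(b_new[0] - a_new[0], b_new[1] - a_new[1])
print(before, after)  # equal up to floating-point rounding

# Boosting by -V returns the original coordinates.
print(lorentz_boost(*a_new, -V))
```

Both coordinates change under the boost, but the combination Δt² – Δx²/c² does not, which is the sense in which the form of the equation is preserved.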

13. Galilean Transformation of Coordinate System

One useful way to visualize the effect of a transformation is to make an ordinary space-time diagram, with the space and time axes drawn perpendicular to each other as usual, and then to draw the new set of coordinates on this diagram. In these diagrams, the space axes represent points which are measured to have the same time coordinates, and similarly, the time axes represent points which are measured to have the same space coordinates. When we make a velocity boost, these lines of simultaneity and same-position are altered.

This is shown first for a Galilean velocity boost, where in fact the lines of simultaneity remain the same, but the lines representing position are rotated:

Figure 5. Galilean Velocity Boost.

  • In Figure 5, the (green) horizontal lines are lines of absolute simultaneity. They have the same coordinates in both t and t’.
  • The (blue) vertical lines are lines with the same x-coordinates.
  • The (gray) slanted lines are lines with the same x’-coordinates.
  • The spacing of the x’ coordinates is the same as the x coordinates, which means that relative distances between points are not affected.
  • The solid black arrow represents a stationary trajectory in (x,t).
  • An object transformation of +V moves it onto the green arrow, with velocity: v = c/2 in the (x,t)-system.
  • A coordinate transformation of +V, to a system (x’,t’) moving at +V with regard to (x,t), makes this green arrow appear stationary in the (x’,t’) system.
  • This coordinate transformation makes the black arrow appear to be moving at –V in (x’,t’) coordinates.

14. Lorentz Transformation of Coordinate System

In a Lorentz velocity boost, the time and space axes are both rotated, and the spacing is also changed.

Figure 6. Rotation of Space and Time Coordinate Axes by a Lorentz Velocity Boost. Some proper time events are marked in blue.

To obtain the (x’,t’)-coordinates of a point defined in (x,t)-coordinates, we start at that point, and: (i) move parallel to the green lines, to find the intersection with the (red) t’-axis, which is marked with the x’-coordinates; and: (ii) move parallel to the red lines, to find the intersection with the (green) x’-axis, which is marked with the t’-coordinates. The effect of this transformation on a solid rod or ruler extending from x=0 to x=1, and stationary in (x,t), is shown in more detail below.

Figure 7. Lorentz Velocity Boost. Magnified view of Figure 6 shows time and space dilation. The gray rectangle represents a unit of the space-time path of a rod (Rod 1) stationary in (x,t). The dark green lines represent a Lorentz (object) transformation of this trajectory, which is a second rod (Rod 2) moving at V in (x,t) coordinates. This is a unit of the space-time path of a stationary rod in (x’,t’).

15. Time and Space Dilation

Figure 7 shows how both time and space dilation effects work. To see this clearly, we need to consider the volumes of space-time that an object like a rod traces out.

  • The (gray) rectangle PQRS represents a space-time volume, for a stationary rod or ruler in the original frame. It is 1-meter long in original coordinates (Δx = 1), and is shown over 1 unit of proper time, which corresponds to one unit of coordinate time (Δt = 1).
  • The rectangle PQ’R’S’ (green edges) represents a second space-time volume, for a rod which appears to be moving in the original frame. This is how the space-time volume of the first rod transforms under a Lorentz transformation.
  • We may interpret the transformation as either: (i) a Lorentz velocity boost of the rod by velocity +V (object transformation), or equally: (ii) a Lorentz transformation to a new coordinate system, (x’,t’), moving at –V with regard to (x,t). Note that:
  • The length of the moving rod measured in x is now shorter than the stationary rod: Δx = 1/γ. This is space dilation.
  • The coordinate time between proper time events on the moving rod measured in t is now longer than for the stationary rod (Δt = γ). This is time dilation.

The need to fix the new coordinate system in this way can be worked out by considering the moving rod from the point of view of its own inertial system.

  • As viewed in its own inertial coordinate system, the green rectangle PQ’R’S’ appears as the space-time boundary for a stationary rod. In this frame:
  • PS’ appears stationary: it is a line where: x’ = 0.
  • PQ’ appears as a line of simultaneity, i.e. it is a line where: t’=0.
  • R’S’ is also a line of simultaneity in t’.
  • Points on R’S’ must have the time coordinate: t’=1, since it is at the time t’ when one unit of proper time has elapsed, and for the stationary object, Δt’ = Δτ.
  • The length of PQ’ must be one unit in x’, since the moving rod appears the same length in its own inertial frame as the original stationary rod did.
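
The two dilation factors quoted for Figure 7 can be reproduced by boosting events on the rod and reading off their coordinates in the original frame. This is a sketch only: it assumes units where c = 1, uses the Lorentz boost formulas t’ = γ(t + Vx/c²) and x’ = γ(x + Vt) as an object transformation, and the function names are hypothetical.

```python
import math


def lorentz_boost(t, x, V, c=1.0):
    """Lorentz velocity boost V in the x-direction: returns (t', x')."""
    g = 1.0 / math.sqrt(1.0 - (V / c) ** 2)
    return g * (t + V * x / c**2), g * (x + V * t)


def moving_rod_length(V, rest_length=1.0, c=1.0):
    """Length of the boosted rod, measured between its ends at one coordinate time."""
    # Event on the left end that lands at coordinate time 0 after the boost.
    t_left, x_left = lorentz_boost(0.0, 0.0, V, c)
    # Event on the right end chosen so that it also lands at coordinate time 0.
    s_right = -V * rest_length / c**2
    t_right, x_right = lorentz_boost(s_right, rest_length, V, c)
    assert math.isclose(t_left, t_right, abs_tol=1e-12)  # simultaneous ends
    return x_right - x_left


def coordinate_time_per_unit_proper_time(V, c=1.0):
    """Coordinate time taken by one unit of proper time on the boosted rod."""
    t_new, _ = lorentz_boost(1.0, 0.0, V, c)
    return t_new


V = 0.5
g = 1.0 / math.sqrt(1.0 - V**2)
print(moving_rod_length(V), 1.0 / g)               # both about 0.866
print(coordinate_time_per_unit_proper_time(V), g)  # both about 1.155
```

The length is shortened because the two ends must be compared at the same coordinate time, and the events that are simultaneous after the boost are not the ones that were simultaneous before it.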

Time and space dilation are often referred to as ‘perspective effects’ in discussions of STR. Objects and processes are said to ‘look’ shorter or longer when viewed in one inertial frame rather than in another. It is common to regard this effect as a purely ‘conventional’ feature, which merely reflects a conventional choice of reference frame. But this is rather misleading, because time and space dilation are very real physical effects, and they lead to completely different types of physical predictions than classical physics.

However, the symmetrical properties of the Lorentz transformation make it impossible to use these features to tell whether one frame is ‘really moving’ and another is ‘really stationary’. For instance, if objects get shorter when they are placed in motion, then why do we not simply measure how long objects are, and use this to determine whether they are ‘really stationary’? The details in Figure 7 reveal why this does not work: the space dilation effect is reversed when we change reference frames. That is:

  • Measured in Frame 1, i.e. in (x,t)-coordinates, the stationary object (Rod 1) appears longer than the moving object (Rod 2). But:
  • Measured in Frame 2, using (x’,t’)-coordinates, the moving object (Rod 2) appears stationary, while the originally stationary object (Rod 1) moves. But now the space dilation effect appears reversed, and Rod 2 appears longer than Rod 1!

The reason this is not a real paradox or inconsistency can be seen from the point of view of Frame 2, because now Rod 1 at the moment of time t’ = 0 stretches from the point P to Q’’, rather than from P to Q, as in Frame 1. The line of simultaneity alters in the new frame, so that we measure the distance between a different pair of space-time events. And PQ’’ is now found to be shorter than PQ’, which is the length of Rod 2 in Frame 2.

There is no answer, within STR, as to which rod ‘really gets shorter’. Similarly there is no answer as to which rod ‘really has faster proper time’ – when we switch to Frame 2, we find that Rod 2 has a faster rate of proper time with regard to coordinate time, reversing the time dilation effect apparent in Frame 1. In this sense, we could consider these effects a matter of ‘perspective’ – although it is more accurate to say that in STR, in its usual interpretation, there are simply no facts about absolute length, or absolute time, or absolute simultaneity, at all.

However, this does not mean that time and space dilation are not real effects. They are displayed in other situations where there is no ambiguity. One example is the twins’ paradox, where proper time slows down in an absolute way for a moving twin. And there are equally real physical effects resulting from space dilation. It is just that these effects cannot be used to determine an absolute frame of rest.

16. The Full Special Theory of Relativity

So far, we have only examined the most basic part of STR: the valid STR transformations for space, time, and proper time, and the way these three quantities are connected together. This is the most fundamental part of the theory. It represents relativistic kinematics. It already has very powerful implications. But the fully developed theory is far more extensive: it results from Einstein’s idea that the Lorentz transformations represent a universal invariance, applicable to all physics. Einstein formulated this in 1905: “The laws of physics are invariant under Lorentz transformations (when going from one inertial system to another arbitrarily chosen inertial system)”. Adopting this general principle, he explored the ramifications for the concepts of mass, energy, momentum, and force.

The most famous result is Einstein’s equation for energy: E = mc². This involves the extension of the Lorentz transformation to mass. Einstein found that when we Lorentz transform a stationary particle with original rest-mass m0, to set it in motion with a velocity V, we cannot regard it as maintaining the same total mass. Instead, its mass becomes larger: m = γm0, with γ defined as above. This is another deep contradiction with classical physics.

Einstein showed that this requires us to reformulate our concept of energy. In classical physics, kinetic energy is given by: E = ½ mv². In STR, there is a more general definition of energy, as: E = mc². A stationary particle then has a basic ‘rest mass energy’ of m0c². When it is set in motion, its energy is increased purely by the increase in mass, and this is kinetic energy. So we find in STR that:

Kinetic Energy = mc² − m0c² = (γ − 1)m0c²

For low velocities, with: v << c, it is easily shown that: (γ-1)c² is very close to ½v², so this corresponds to the classical result in the classical limit of low energies. But for high energies, the behavior of particles is very different. The discovery that there is an underlying energy of m0c² simply from rest-mass is what made nuclear reactors and nuclear bombs possible: they convert tiny amounts of rest mass into vast amounts of thermal energy.
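The low-velocity claim can be checked numerically. The following is a minimal sketch, not part of the original exposition: the function names are illustrative, and a rest mass of 1 kg is an arbitrary choice. It compares (γ − 1)m0c² with the classical ½m0v² at several speeds.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma(v):
    """Lorentz factor for a speed v (m/s) with |v| < C."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def kinetic_energy_relativistic(m0, v):
    """Relativistic kinetic energy (gamma - 1) * m0 * c^2, in joules."""
    return (gamma(v) - 1.0) * m0 * C ** 2


def kinetic_energy_classical(m0, v):
    """Classical kinetic energy (1/2) * m0 * v^2, in joules."""
    return 0.5 * m0 * v ** 2


if __name__ == "__main__":
    # Illustrative rest mass of 1 kg; the ratio does not depend on m0.
    for fraction in (0.001, 0.1, 0.5, 0.9):
        v = fraction * C
        ratio = kinetic_energy_relativistic(1.0, v) / kinetic_energy_classical(1.0, v)
        print(f"v = {fraction} c: relativistic / classical = {ratio:.4f}")
```

The ratio stays very close to 1 at small fractions of c and grows well above 1 as v approaches c, which is the sense in which the classical formula is recovered only in the low-velocity limit.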

The main application Einstein explored first was the theory of electromagnetism, and his most famous paper, in which he defined STR in 1905, is called “Electrodynamics of Moving Bodies”. In fact, Lorentz, Poincaré and others already knew that they needed to apply the Lorentz transformation to Maxwell’s theory of classical electromagnetism, and had succeeded a few years earlier in formulating a theory which is extremely similar to Einstein’s in its predictions. Some important experimental verification of this was also available before Einstein’s work (most famously, the Michelson-Morley experiment). But his theory went much further. He radically reformulated the concepts that we use to analyse force, energy, momentum, and so forth. In this sense, his new theory was primarily a philosophical and conceptual achievement, rather than a new experimental discovery of the kind traditionally regarded as the epitome of empirical science.

He also attributed his universal ‘principle of relativity’ to the very nature of space and time itself. With important contributions by Minkowski, this gave rise to the modern view that physics is based on an inseparable combination of space and time, called space-time. Minkowski treated this as a kind of ‘geometric’ entity, based on regarding our Equation (1) as a ‘metric equation’ describing the geometric nature of space-time. This view is called the ‘geometric explanation’ of relativity theory, and this approach led Einstein even deeper into modern physics, when he applied this new conception to the theory of gravity, and discovered a generalised theory of space-time.

The nature of this ‘geometric explanation’ of the connection between space, time, and proper time is one of the most fascinating topics in the philosophy of physics. But it involves the General Theory of Relativity, which goes beyond STR.

17. References and Further Reading

The literature on relativity and its philosophical implications is enormous – and still growing rapidly. The following short selection illustrates some of the range of material available. Original publication dates are in brackets.

  • Bondi, Hermann. 1962. Relativity and Common Sense. Heinemann Educational Books.
    • A clear exposition of basic relativity theory for beginners, with a minimum of equations. Contains useful discussions of the Twins Paradox and other topics.
  • Einstein, Albert. 1956 (1921). The Meaning of Relativity. (The Stafford Little Lectures of Princeton University.) Princeton University Press.
    • Einstein’s account of the principles of his famous theory. Simple in parts, but mainly a fairly technical summary, requiring a good knowledge of physics.
  • Epstein, Lewis Carroll. 1983. Relativity Visualized. Insight Press. San Francisco.
    • A clear, simple, and rather unique introduction to relativity theory for beginners. Epstein illustrates the functional relationships between space, time, and proper time in a clear and direct way, using novel geometric presentations.
  • Grünbaum, Adolf. 1963. Philosophical Problems of Space and Time. Knopf, New York.
    • A collection of original studies by one of the seminal philosophers of relativity theory, this covers an impressive range of issues, and remains an important starting place for many recent philosophical studies.
  • Lorentz, H. A., A. Einstein, H. Minkowski and H. Weyl. 1923. The Principle of Relativity. A Collection of Original Memoirs on the Special and General Theory of Relativity. Trans. W. Perrett and G.B. Jeffery. Methuen. London.
    • These are the major figures in the early development of relativity theory, apart from Poincaré, who simultaneously with Lorentz formulated the ‘pre-relativistic’ version of electromagnetic theory, which contains most of the mathematical basis of STR, shortly before Einstein’s paper of 1905. While Einstein deeply admired Lorentz – despite their permanent disagreements about STR – he paid no attention to Poincaré.
  • Newton, Isaac. 1686. Mathematical Principles of Natural Philosophy.
    • Every serious student should read Newton’s “Definitions” and “Scholium”, where he introduces his concepts of time and space.
  • Planck, Max. 1998 (1909). Eight Lectures on Theoretical Physics.
    • Planck elegantly summarizes the revolutionary discoveries that characterized the first decade of 20th Century physics. Lecture 8 is one of the earliest accounts of relativity theory. This classic work shows Planck’s penetrating vision of many fundamental themes that soon came to dominate physics.
  • Reichenbach, Hans. 1958 (1928). The Philosophy of Space and Time. Dover, New York.
    • An influential early study of the concepts of space and time, and the relativistic revolution. Although Reichenbach’s approach is underpinned by his positivistic program, which is rejected today by philosophers, the central issues are of continuing interest.
  • Russell, Bertrand. 1977 (1925). ABC of Relativity. Unwin Paperbacks, London.
    • An early popular exposition of the meaning of relativity theory by one of the most influential 20th century philosophers, this presents key philosophical issues with Russell’s characteristic simplicity.
  • Schilpp, P.A. (Ed.) 1949. Albert Einstein: Philosopher-Scientist. The Library of Living Philosophers.
    • A classic collection of papers on Einstein and relativity theory.
  • Spivak, M. 1979. A Comprehensive Introduction to Differential Geometry. Publish or Perish. Berkeley.
    • An advanced mathematical introduction to the modern approach to differentiable manifolds, which developed in the 1960’s. Philosophical interest lies in the detailed semantics for coordinate systems, and the generalizations of concepts of geometry, such as the tangent vector.
  • Tipler, Paul A. 1982. Physics. Worth Publishers Ltd.
    • An extended introductory textbook for undergraduates, Chapter 35, “Relativity Theory”, is a typical modern introduction to relativity theory.
  • Torretti, Roberto. 1983/1996. Relativity and Geometry. Dover, New York.
    • An excellent source for the specialist philosopher, summarizing history and concepts of both the Special and General Theories, with extended bibliography. Combines excellent technical summaries with detailed historical surveys.
  • Wangsness, Roald K. 1979. Electromagnetic Fields. John Wiley & Sons Ltd.
    • This is a typical advanced modern undergraduate textbook on electromagnetism. The final chapter explains how the structure of electrodynamics is derived from the principles of STR.


Author Information

Andrew Holster
Email: ATASA@clear.net.nz
New Zealand

Rudolf Carnap (1891—1970)

Rudolf Carnap, a German-born philosopher and naturalized U.S. citizen, was a leading exponent of logical positivism and was one of the major philosophers of the twentieth century. He made significant contributions to philosophy of science, philosophy of language, the theory of probability, inductive logic and modal logic. He rejected metaphysics as meaningless because metaphysical statements cannot be proved or disproved by experience. He asserted that many philosophical problems are indeed pseudo-problems, the outcome of a misuse of language. Some of them can be resolved when we recognize that they are not expressing matters of fact, but rather concern the choice between different linguistic frameworks. Thus the logical analysis of language becomes the principal instrument in resolving philosophical problems. Since ordinary language is ambiguous, Carnap asserted the necessity of studying philosophical issues in artificial languages, which are governed by the rules of logic and mathematics. In such languages, he dealt with the problems of the meaning of a statement, the different interpretations of probability, the nature of explanation, and the distinctions between analytic and synthetic, a priori and a posteriori, and necessary and contingent statements.

Table of Contents

  1. Life
  2. The Structure of Scientific Theories
  3. Analytic and Synthetic
  4. Meaning and Verifiability
  5. Probability and Inductive Logic
  6. Modal Logic and the Philosophy of Language
  7. Philosophy of Physics
  8. Carnap’s Heritage
  9. References and Further Reading
    1. Carnap’s Works
    2. Other Sources

1. Life

Rudolf Carnap was born on May 18, 1891, in Ronsdorf, Germany. In 1898, after his father’s death, his family moved to Barmen, where Carnap studied at the Gymnasium. From 1910 to 1914 he studied philosophy, physics and mathematics at the universities of Jena and Freiburg. He studied Kant under Bruno Bauch and later recalled how a whole year was devoted to the discussion of The Critique of Pure Reason. Carnap became especially interested in Kant’s theory of space. Carnap took three courses from Gottlob Frege in 1910, 1913 and 1914. Frege was professor of mathematics at Jena. During those courses, Frege expounded his system of logic and its applications in mathematics. However, Carnap’s principal interest at that time was in physics, and by 1913 he was planning to write his dissertation on thermionic emission. His studies were interrupted by World War I and Carnap served at the front until 1917. He then moved to Berlin and studied the theory of relativity. At that time, Albert Einstein was professor of physics at the University of Berlin.

After the war, Carnap developed a new dissertation, this time on an axiomatic system for the physical theory of space and time. He submitted a draft to physicist Max Wien, director of the Institute of Physics at the University of Jena, and to Bruno Bauch. Both found the work interesting, but Wien told Carnap the dissertation was pertinent to philosophy, not to physics, while Bauch said it was relevant to physics. Carnap then chose to write a dissertation under the direction of Bauch on the theory of space from a philosophical point of view. Entitled Der Raum (Space), the work was clearly influenced by Kantian philosophy. Submitted in 1921, it was published the following year in a supplemental issue of Kant-Studien.

Carnap’s involvement with the Vienna Circle developed over the next few years. He met Hans Reichenbach at a conference on philosophy held at Erlangen in 1923. Reichenbach introduced him to Moritz Schlick, then professor of the theory of inductive science at Vienna. Carnap visited Schlick—and the Vienna Circle—in 1925 and the following year moved to Vienna to become assistant professor at the University of Vienna. He became a leading member of the Vienna Circle and, in 1929, with Hans Hahn and Otto Neurath, he wrote the manifesto of the Circle.

In 1928, Carnap published The Logical Structure of the World, in which he developed a formal version of empiricism arguing that all scientific terms are definable by means of a phenomenalistic language. The great merit of the book was the rigor with which Carnap developed his theory. In the same year he published Pseudoproblems in Philosophy asserting the meaninglessness of many philosophical problems. He was closely involved in the First Conference on Epistemology, held in Prague in 1929 and organized by the Vienna Circle and the Berlin Circle (the latter founded by Reichenbach in 1928). The following year, he and Reichenbach founded the journal Erkenntnis. At the same time, Carnap met Alfred Tarski, who was developing his semantical theory of truth. Carnap was also interested in mathematical logic and wrote a manual of logic, entitled Abriss der Logistik (1929).

In 1931, Carnap moved to Prague to become professor of natural philosophy at the German University. It was there that he made his important contribution to logic with The Logical Syntax of Language (1934). His stay in Prague, however, was cut short by the Nazi rise to power. In 1935, with the aid of the American philosophers Charles Morris and Willard Van Orman Quine, whom he had met in Prague the previous year, Carnap moved to the United States. He became an American citizen in 1941.

From 1936 to 1952, Carnap was a professor at the University of Chicago (with the year 1940-41 spent as a visiting professor at Harvard University). He then spent two years at the Institute for Advanced Study at Princeton before taking an appointment at the University of California at Los Angeles.

In the 1940s, stimulated by Tarskian model theory, Carnap became interested in semantics. He wrote several books on semantics: Introduction to Semantics (1942), Formalization of Logic (1943), and Meaning and Necessity: A Study in Semantics and Modal Logic (1947). In Meaning and Necessity, Carnap used semantics to explain modalities. Subsequently he began to work on the structure of scientific theories. His main concerns were (i) to give an account of the distinction between analytic and synthetic statements and (ii) to give a suitable formulation of the verifiability principle; that is, to find a criterion of significance appropriate to scientific language. Other important works were “Meaning Postulates” (1952) and “Observation Language and Theoretical Language” (1958). The latter sets out Carnap’s definitive view on the analytic-synthetic distinction. “The Methodological Character of Theoretical Concepts” (1956) is an attempt to give a tentative definition of a criterion of significance for scientific language. Carnap was also interested in formal logic (Introduction to Symbolic Logic, 1954) and in inductive logic (Logical Foundations of Probability, 1950; The Continuum of Inductive Methods, 1952). The Philosophy of Rudolf Carnap, ed. by Paul Arthur Schilpp, was published in 1963 and includes an intellectual autobiography. Philosophical Foundations of Physics, ed. by Martin Gardner, was published in 1966. Carnap was working on the theory of inductive logic when he died on September 14, 1970, at Santa Monica, California.

2. The Structure of Scientific Theories

In Carnap’s opinion, a scientific theory is an interpreted axiomatic formal system. It consists of:

  • a formal language, including logical and non-logical terms;
  • a set of logical-mathematical axioms and rules of inference;
  • a set of non-logical axioms, expressing the empirical portion of the theory;
  • a set of meaning postulates stating the meaning of non-logical terms, which formalize the analytic truths of the theory;
  • a set of rules of correspondence, which give an empirical interpretation of the theory.

The sets of meaning postulates and rules of correspondence may be included in the set of non-logical axioms. Indeed, meaning postulates and rules of correspondence are not usually explicitly distinguished from non-logical axioms; only one set of axioms is formulated. One of the main purposes of the philosophy of science is to show the difference between the various kinds of statements.

The Language of Scientific Theories

The language of a scientific theory consists of:

  1. a set of symbols and
  2. rules to ensure that a sequence of symbols is a well-formed formula, that is, correct with respect to syntax.

Among the symbols of the language are logical and non-logical terms. The set of logical terms includes logical symbols, e.g., connectives and quantifiers, and mathematical symbols, e.g., numbers, derivatives, and integrals. Non-logical terms are divided into observational and theoretical. They are symbols denoting physical entities, properties or relations such as ‘blue’, ‘cold’, ‘warmer than’, ‘proton’, ‘electromagnetic field’. Formulas are divided into: (i) logical statements, which do not contain non-logical terms; (ii) observational statements, which contain observational terms but no theoretical terms; (iii) purely theoretical statements, which contain theoretical terms but no observational terms; and (iv) rules of correspondence, which contain both observational and theoretical terms.

Classification of statements in a scientific language

Type of statement               Observational terms   Theoretical terms
Logical statements              No                    No
Observational statements        Yes                   No
Purely theoretical statements   No                    Yes
Rules of correspondence         Yes                   Yes

Observational language contains only logical and observational statements; theoretical language contains logical and theoretical statements and rules of correspondence.
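The four-way classification depends only on which kinds of non-logical terms a statement contains, so it can be written as a small decision procedure. The sketch below is purely illustrative: the function name and the two example vocabularies are assumptions, with the sample terms taken from the examples mentioned above.

```python
# Illustrative vocabularies (assumed for this sketch), using the example terms above.
OBSERVATIONAL_TERMS = {"blue", "cold", "warmer than"}
THEORETICAL_TERMS = {"proton", "electromagnetic field"}


def classify_statement(non_logical_terms):
    """Classify a statement by the kinds of non-logical terms it contains.

    `non_logical_terms` is the set of non-logical terms occurring in the
    statement; logical and mathematical symbols are ignored.
    """
    has_observational = bool(non_logical_terms & OBSERVATIONAL_TERMS)
    has_theoretical = bool(non_logical_terms & THEORETICAL_TERMS)
    if has_observational and has_theoretical:
        return "rule of correspondence"
    if has_observational:
        return "observational statement"
    if has_theoretical:
        return "purely theoretical statement"
    return "logical statement"


if __name__ == "__main__":
    print(classify_statement(set()))                # logical statement
    print(classify_statement({"blue"}))             # observational statement
    print(classify_statement({"proton"}))           # purely theoretical statement
    print(classify_statement({"cold", "proton"}))   # rule of correspondence
```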

The distinction between observational and theoretical terms is a central tenet of logical positivism and at the core of Carnap’s view on scientific theories. In his book Philosophical Foundations of Physics (1966), Carnap bases the distinction between observational and theoretical terms on the distinction between two kinds of scientific laws, namely empirical laws and theoretical laws.

An empirical law deals with objects or properties that can be observed or measured by means of simple procedures. This kind of law can be directly confirmed by empirical observations. It can explain and forecast facts and be thought of as an inductive generalization of such factual observations. Typically, an empirical law that deals with measurable physical quantities can be established by measuring such quantities in suitable cases and then interpolating a simple curve between the measured values. For example, a physicist could measure the volume V, the temperature T and the pressure P of a gas in diverse experiments, and find the law PV=RT, for a suitable constant R.
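As a rough illustration of how such a law might be read off from measurements, the sketch below fits the constant R in PV = RT by least squares. Everything in it is an assumption made for the example: the readings are synthetic (generated from the gas law for one mole with small made-up errors), and the function name is hypothetical.

```python
# Value used ONLY to generate synthetic readings for this illustration.
R_FOR_SYNTHETIC_DATA = 8.314  # J/(mol K), one mole of gas assumed

# Chosen temperature (K) and volume (m^3) settings, with made-up relative
# errors standing in for measurement noise on the pressure reading.
settings = [(280.0, 0.020), (300.0, 0.022), (320.0, 0.024), (340.0, 0.025), (360.0, 0.030)]
relative_errors = [0.01, -0.01, 0.005, -0.005, 0.0]

# Each measurement is (T, V, P) with P in pascals.
measurements = [
    (T, V, R_FOR_SYNTHETIC_DATA * T / V * (1.0 + err))
    for (T, V), err in zip(settings, relative_errors)
]


def fit_gas_constant(data):
    """Least-squares estimate of R in the model P*V = R*T (line through the origin)."""
    numerator = sum(P * V * T for T, V, P in data)
    denominator = sum(T * T for T, V, P in data)
    return numerator / denominator


if __name__ == "__main__":
    print(f"Estimated R = {fit_gas_constant(measurements):.3f} J/(mol K)")
```

The fitted constant summarizes the observations directly, which is what makes the law an inductive generalization rather than a hypothesis about unobservable entities.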

A theoretical law, on the other hand, is concerned with objects or properties we cannot observe or measure but only infer from direct observations. A theoretical law cannot be justified by means of direct observation. It is not an inductive generalization but a hypothesis reaching beyond experience. While an empirical law can explain and forecast facts, a theoretical law can explain and forecast empirical laws. The method of justifying a theoretical law is indirect: a scientist does not test the law itself but, rather, the empirical laws that are among its consequences.

The distinction between empirical and theoretical laws entails the distinction between observational and theoretical properties, and hence between observational and theoretical terms. The distinction in many situations is clear, for example: the laws that deal with the pressure, volume and temperature of a gas are empirical laws and the corresponding terms are observational; while the laws of quantum mechanics are theoretical. Carnap admits, however, that the distinction is not always clear and the line of demarcation often arbitrary. In some ways the distinction between observational and theoretical terms is similar to that between macro-events, which are characterized by physical quantities that remain constant over a large portion of space and time, and micro-events, where physical quantities change rapidly in space or time.

3. Analytic and Synthetic

To the logical empiricist, all statements can be divided into two classes: analytic a priori and synthetic a posteriori. There can be no synthetic a priori statements. A substantial aspect of Carnap’s work was his attempt to give precise definition to the distinction between analytic and synthetic statements.

In The Logical Syntax of Language (1934), Carnap studied a formal language that could express classical mathematics and scientific theories, for example, classical physics. Carnap would have known Kurt Gödel’s 1931 article on the incompleteness of mathematics. He was, therefore, aware of the substantial difference between the two concepts of proof and consequence: some statements, despite being a logical consequence of the axioms of mathematics, are not provable by means of these axioms. He would not, however, have been able to take account of Alfred Tarski’s essay on semantics, first published in Polish in 1933. Tarski’s essay led to the notion of logical consequence being regarded as a semantic concept and defined by means of model theory. These circumstances explain how Carnap, in The Logical Syntax of Language, gave a purely syntactic formulation of the concept of logical consequence. However, he did define a new rule of inference, now called the omega-rule, but formerly called the Carnap rule:

From the infinite series of premises A(1), A(2), … , A(n), A(n+1) ,…, we can infer the conclusion (x)A(x)

Carnap defines the notion of logical consequence in the following way: a statement A is a logical consequence of a set S of statements if and only if there is a proof of A based on the set S; it is admissible to use the omega-rule in the proof of A. In the definition of the notion of provable, however, a statement A is provable by means of a set S of statements if and only if there is a proof of A based on the set S, but the omega-rule is not admissible in the proof of A. (A formal system which admits the use of the omega-rule is complete, so Gödel’s incompleteness theorem does not apply to such formal systems.)

Carnap then proceeded to define some kinds of statements: (i) a statement is L-true if and only if it is a logical consequence of the empty set of statements; (ii) a statement is L-false if and only if all statements are a logical consequence of it; (iii) a statement is analytic if and only if it is L-true or L-false; (iv) a statement is synthetic if and only if it is not analytic. Carnap thus defines analytic statements as logically determined statements: their truth depends on logical rules of inference and is independent of experience. Thus, analytic statements are a priori while synthetic statements are a posteriori, because they are not logically determined.

Carnap maintained his definitions of statements in his article “Testability and Meaning” (1936) and his book Meaning and Necessity (1947). In “Testability and Meaning,” he introduced semantic concepts: a statement is analytic if and only if it is logically true; it is self-contradictory if and only if it is logically false. In any other case, the statement is synthetic. In Meaning and Necessity, Carnap first defines the notion of L-true (a statement is L-true if its truth depends on semantic rules) and then defines the notion of L-false (a statement is L-false if its negation is L-true). A statement is L-determined if it is L-true or L-false; analytic statements are L-determined, while synthetic statements are not L-determined. This is very similar to the definitions Carnap gave in The Logical Syntax of Language but with the change from syntactic to semantic concepts.
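A toy version of the semantic definitions can be sketched for a propositional language. The simplifying assumption, which is not Carnap's own formulation, is that "true by semantic rules alone" is modelled as "true under every assignment of truth values to the atomic sentences"; the function names are hypothetical.

```python
from itertools import product


def classify(statement, atoms):
    """Return 'L-true', 'L-false' or 'synthetic' for a propositional statement.

    `statement` is a function from an assignment (dict atom -> bool) to bool;
    `atoms` lists the atomic sentences it may mention.
    """
    values = [
        statement(dict(zip(atoms, row)))
        for row in product([True, False], repeat=len(atoms))
    ]
    if all(values):
        return "L-true"       # true under every assignment
    if not any(values):
        return "L-false"      # its negation is true under every assignment
    return "synthetic"        # truth value depends on the assignment


def is_l_determined(statement, atoms):
    """A statement is L-determined if it is L-true or L-false."""
    return classify(statement, atoms) != "synthetic"


if __name__ == "__main__":
    print(classify(lambda v: v["p"] or not v["p"], ["p"]))     # L-true
    print(classify(lambda v: v["p"] and not v["p"], ["p"]))    # L-false
    print(classify(lambda v: v["p"] and v["q"], ["p", "q"]))   # synthetic
```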

In 1951, Quine published the article “Two Dogmas of Empiricism,” in which he disputed the distinction made between analytic and synthetic statements. In response, Carnap partially changed his point of view on this problem. His first response to Quine came in “Meaning Postulates” (1952) where Carnap suggested that analytic statements are those which can be derived from a set of appropriate sentences that he called meaning postulates. Such sentences define the meaning of non-logical terms and thus the set of analytic statements is not equal to the set of logically true statements. Later, in “Observation Language and Theoretical Language” (1958), he expressed a general method for determining a set of meaning postulates for the language of a scientific theory. He further expounded on this method in his reply to Carl Gustav Hempel in The Philosophy of Rudolf Carnap (1963), and in Philosophical Foundations of Physics (1966). Suppose the number of non-logical axioms is finite. Let T be the conjunction of all purely theoretical axioms, and C the conjunction of all correspondence postulates and TC the conjunction of T and C. The theory is equivalent to the single axiom TC. Carnap formulates the following problem: how can we find two statements, say A and R, so that A expresses the analytic portion of the theory (that is, all consequences of A are analytic) while R expresses the empirical portion (that is, all consequences of R are synthetic)? The empirical content of the theory is formulated by means of a Ramsey sentence (a discovery of the English philosopher Frank Ramsey). Carnap’s solution to the problem builds a Ramsey sentence on the following instructions:

  1. Replace every theoretical term in TC with a variable.
  2. Add an appropriate number of existential quantifiers at the beginning of the sentence.

Look at the following example. Let TC(O₁,…,Oₙ,T₁,…,Tₘ) be the conjunction of T and C; in TC there are observational terms O₁…Oₙ and theoretical terms T₁…Tₘ. The Ramsey sentence (R) is

∃X₁…∃Xₘ TC(O₁,…,Oₙ,X₁,…,Xₘ)

Every observational statement which is derivable from TC is also derivable from R and vice versa, so that R expresses exactly the empirical portion of the theory. Carnap proposes the statement R → TC as the only meaning postulate; this became known as the Carnap sentence. Note that every empirical statement that can be derived from the Carnap sentence is logically true, and thus the Carnap sentence lacks empirical consequences. So, a statement is analytic if it is derivable from the Carnap sentence; otherwise the statement is synthetic. The requirements of Carnap’s method can be summarized as follows: (i) non-logical axioms must be explicitly stated, (ii) the number of non-logical axioms must be finite and (iii) observational terms must be clearly distinguished from theoretical terms.
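The two-step recipe for the Ramsey sentence, and the Carnap sentence built from it, can be sketched as plain string manipulation. This is only a schematic illustration: the theory is represented as a text formula, the toy axiom and the term names `Electron` and `LeavesTrack` are hypothetical, and no actual logical derivation is performed.

```python
import re


def ramsey_sentence(tc, theoretical_terms):
    """Replace each theoretical term in TC by a variable and prefix existential quantifiers."""
    body = tc
    variables = []
    for index, term in enumerate(theoretical_terms, start=1):
        variable = f"X{index}"
        # Whole-word replacement, so that a term is not replaced inside a longer name.
        body = re.sub(rf"\b{re.escape(term)}\b", variable, body)
        variables.append(variable)
    prefix = "".join(f"∃{variable} " for variable in variables)
    return f"{prefix}({body})"


def carnap_sentence(tc, theoretical_terms):
    """Build the conditional R → TC, where R is the Ramsey sentence of TC."""
    return f"({ramsey_sentence(tc, theoretical_terms)}) → ({tc})"


if __name__ == "__main__":
    # Hypothetical one-axiom theory: one theoretical term, one observational term.
    tc = "∀x (Electron(x) → LeavesTrack(x))"
    print(ramsey_sentence(tc, ["Electron"]))
    print(carnap_sentence(tc, ["Electron"]))
```

The observational term is left untouched in both outputs, which reflects the requirement that observational and theoretical terms be clearly distinguished before the method is applied.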

4. Meaning and Verifiability

Perhaps the most famous tenet of logical empiricism is the verifiability principle, according to which a synthetic statement is meaningful only if it is verifiable. Carnap sought to give a logical formulation of this principle. In The Logical Structure of the World (1928) he asserted that a statement is meaningful only if every non-logical term is explicitly definable by means of a very restricted phenomenalistic language. A few years later, Carnap realized that this thesis was untenable because a phenomenalistic language is insufficient to define physical concepts. Thus he chose an objective language (“thing language”) as the basic language, one in which every primitive term is a physical term. All other terms (biological, psychological, cultural) must be defined by means of basic terms. To overcome the problem that an explicit definition is often impossible, Carnap used dispositional concepts, which can be introduced by means of reduction sentences. For example, if A, B, C and D are observational terms and Q is a dispositional concept, then

(x)[Ax → (Bx ↔ Qx)]
(x)[Cx → (Dx ↔ ~Qx)]

are reduction sentences for Q. In “Testability and Meaning” (1936) Carnap revised the new verifiability principle in this way: all terms must be reducible, by means of definitions or reduction sentences, to the observational language. But this proved to be inadequate. K. R. Popper showed not only that some metaphysical terms can be reduced to the observational language and thus fulfill Carnap’s requirements, but also that some genuine physical concepts are forbidden. Carnap acknowledged that criticism and in “The Methodological Character of Theoretical Concepts” (1956) sought to develop a further definition.

The main philosophical properties of Carnap’s new principle can be outlined under three headings. First of all, the significance of a term becomes a relative concept: a term is meaningful with respect to a given theory and a given language. The meaning of a concept thus depends on the theory in which that concept is used. This represents a significant modification in empiricism’s theory of meaning. Secondly, Carnap explicitly acknowledges that some theoretical terms cannot be reduced to the observational language: they acquire an empirical meaning by means of the links with other reducible theoretical terms. Third, Carnap realizes that the principle of operationalism is too restrictive. Operationalism was formulated by the American physicist Percy Williams Bridgman (1882-1961) in his book The Logic of Modern Physics (1927). According to Bridgman, every physical concept is defined by the operations a physicist uses to apply it. Bridgman asserted that the curvature of space-time, a concept used by Einstein in his general theory of relativity, is meaningless, because it is not definable by means of operations. Bridgman subsequently changed his philosophical point of view, and admitted there is an indirect connection with observations. Perhaps influenced by Popper’s criticism, or by the problematic consequences of a strict operationalism, Carnap changed his earlier point of view and freely admitted a very indirect connection between theoretical terms and the observational language.
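The pair of reduction sentences for Q given earlier in this section fixes the application of Q only under the test conditions A and C, which is why they amount to a partial rather than an explicit definition. The sketch below illustrates this for a single object; the function name and the use of `None` for "undetermined" are assumptions of the example.

```python
def infer_q(a, b, c, d):
    """Apply the two reduction sentences to one object x.

    (x)[Ax → (Bx ↔ Qx)]   : if A holds, Q has the same truth value as B.
    (x)[Cx → (Dx ↔ ~Qx)]  : if C holds, Q has the opposite truth value to D.

    Returns True or False when the sentences settle Qx, and None when
    neither test condition holds, so that Qx is left undetermined.
    """
    verdicts = set()
    if a:
        verdicts.add(b)
    if c:
        verdicts.add(not d)
    if len(verdicts) == 2:
        # The observations contradict the conjunction of the two reduction sentences.
        raise ValueError("observations are inconsistent with the reduction sentences")
    return verdicts.pop() if verdicts else None


if __name__ == "__main__":
    print(infer_q(a=True, b=True, c=False, d=False))    # True
    print(infer_q(a=False, b=False, c=True, d=True))    # False
    print(infer_q(a=False, b=False, c=False, d=False))  # None
```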

5. Probability and Inductive Logic

A variety of interpretations of probability have been proposed:

  • Classical interpretation. The probability of an event is the ratio of the favorable outcomes to the possible outcomes. For example: a die is thrown with the result that “the score is five”. There are six possible outcomes with only one favorable; thus the probability of “the score is five” is one sixth.
  • Axiomatic interpretation. The probability is whatever fulfils the axioms of the theory of probability. In the early 1930s, the Russian mathematician Andrei Nikolaevich Kolmogorov (1903-1987) formulated the first axiomatic system for probability.
  • Frequency interpretation, now the favored interpretation in empirical science. The probability of an event in a sequence of events is the limit of the relative frequency of that event. Example: throw a die several times and record the scores; the relative frequency of “the score is five” is about one sixth; the limit of the relative frequency is exactly one sixth.
  • Probability as a degree of confirmation. This was an approach supported by Carnap and students of inductive logic. The probability of a statement is the degree of confirmation the empirical evidence gives to the statement. Example: the statement “the score is five” receives a partial confirmation by the evidence; its degree of confirmation is one sixth.
  • Subjective interpretation. The probability is a measure of the degree of belief. A special case is the theory that the probability is a fair betting quotient – this interpretation was supported by Carnap. Example: suppose you bet that the score would be five; you bet a dollar and, if you win, you will receive six dollars: this is a fair bet.
  • Propensity interpretation. This is a proposal of K. R. Popper. The probability of an event is an objective property of the event. For example: the physical properties of a die (the die is homogeneous; it has six sides; on every side there is a different number between one and six; etc.) explain the fact that the limit of the relative frequency of “the score is five” is one sixth.
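
The die example that runs through the list above can be made concrete. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the simulated die is a pseudo-random generator rather than a physical object, and a finite run of throws can only approximate the limit that the frequency interpretation refers to.

```python
import random
from fractions import Fraction


def classical_probability(favorable, possible):
    """Classical interpretation: favorable outcomes over possible outcomes."""
    return Fraction(len(favorable), len(possible))


def relative_frequency(throws, target, rng):
    """Frequency interpretation: share of simulated throws showing `target`."""
    hits = sum(1 for _ in range(throws) if rng.randint(1, 6) == target)
    return hits / throws


def fair_betting_quotient(stake, total_payout):
    """Betting reading: stake divided by the total received on a win."""
    return Fraction(stake, total_payout)


rng = random.Random(0)  # fixed seed so the run is reproducible
print("classical:", classical_probability({5}, {1, 2, 3, 4, 5, 6}))
print("betting quotient:", fair_betting_quotient(1, 6))
for n in (60, 6000, 60000):
    # The relative frequency should settle near 1/6 as n grows.
    print(n, "throws:", relative_frequency(n, 5, rng))
```

The classical value and the betting quotient are exact fractions, while the relative frequency only tends towards one sixth as the number of throws increases.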

Carnap devoted himself to giving an account of probability as a degree of confirmation. The philosophically most significant consequences of his research arise from his assertion that the probability of a statement, with respect to a given body of evidence, is a logical relation between the statement and the evidence. Thus it is necessary to build an inductive logic; that is, a logic which studies the logical relations between statements and evidence. Inductive logic would give us a mathematical method of evaluating the reliability of a hypothesis. In this way inductive logic would answer the problem raised by David Hume’s analysis of induction. Of course, we cannot be sure that a hypothesis is true; but we can evaluate its degree of confirmation and we can thus compare alternative theories.

In spite of the abundance of logical and mathematical methods Carnap used in his own research on inductive logic, he was not able to formulate a theory of the inductive confirmation of scientific laws. In fact, in Carnap’s inductive logic, the degree of confirmation of every universal law over an infinite domain of individuals is always zero.
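
Why universal laws end up with zero confirmation can be seen in a simple special case. The Python sketch below assumes a Carnap-style rule in which the probability that the next individual has a property, given that `positive` out of `observed` individuals had it, is (positive + λ/k) / (observed + λ). The choice of k = 2 kinds of individual and λ = 2 is an assumption made for the illustration, as are the function names; with these values the rule reduces to the familiar rule of succession.

```python
from fractions import Fraction


def next_instance_probability(positive, observed, k=2, lam=2):
    """Probability that the next individual is positive (assumed rule)."""
    return (Fraction(positive) + Fraction(lam, k)) / (observed + lam)


def prior_probability_all_positive(n, k=2, lam=2):
    """Prior probability that the first n individuals are all positive."""
    probability = Fraction(1)
    for seen in range(n):
        # Every individual seen so far was positive: `seen` out of `seen`.
        probability *= next_instance_probability(seen, seen, k, lam)
    return probability


for n in (10, 100, 1000):
    # With k = 2 and lam = 2 the product telescopes to 1 / (n + 1).
    print(n, prior_probability_all_positive(n))
```

Under these assumptions the probability that the first n individuals all satisfy the law is 1/(n + 1), which tends to zero as the domain grows without bound; a law about infinitely many individuals therefore receives probability zero.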

Carnap tried to employ the physical-mathematical theory of thermodynamic entropy to develop a comprehensive theory of inductive logic, but his plan never progressed beyond an outline stage. His works on entropy were published posthumously.

6. Modal Logic and the Philosophy of Language

The following table, which is an adaptation of a similar table Carnap used in Meaning and Necessity, shows the relations between modal properties such as necessary and impossible and logical properties such as L-true, L-false, analytic, synthetic. The symbol N means “necessarily”, so that Np means “necessarily p” or “p is necessary.”

Modal and logical properties of statements

Modalities            Formalization    Logical status
p is necessary        Np               L-true, analytic
p is impossible       N~p              L-false, contradictory
p is contingent       ~Np & ~N~p       factual, synthetic
p is not necessary    ~Np              not L-true
p is possible         ~N~p             not L-false
p is not contingent   Np v N~p         L-determined, not synthetic

Carnap identifies the necessity of a statement p with its logical truth: a statement is necessary if and only if it is logically true. Thus modal properties can be defined by means of the usual logical properties of statements. Np, i.e., “necessarily p”, is true if and only if p is logically true. He defines the possibility of p as “it is not necessary that not p”. That is, “possibly p” is defined as ~N~p. The impossibility of p means that p is logically false. It must be stressed that, in Carnap’s opinion, every modal concept is definable by means of the logical properties of statements. Modal concepts are thus explicable from a classical point of view (meaning “using classical logic”, e.g., first order logic). Carnap was aware that the symbol N is definable only in the meta-language, not in the object language. Np means “p is logically true”, and the last statement belongs to the meta-language; thus N is not explicitly definable in the language of a formal logic, and we cannot eliminate the term N. More precisely, we can define N only by means of another modal symbol we take as a primitive symbol, so that at least one modal symbol is required among the primitive symbols.
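
The definitions in this paragraph can be illustrated for a small propositional language. In the Python sketch below, logical truth is approximated by truth under every assignment of truth values to the atomic sentences; this is an assumption standing in for Carnap's own semantic apparatus, and all function names are hypothetical. Note that the modal tests are ordinary Python functions applied to formulas from outside, which mirrors the point that N is defined in the meta-language.

```python
from itertools import product


def valuations(atoms):
    """Yield every assignment of truth values to the atomic sentences."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def necessary(formula, atoms):
    """Np: p is L-true, i.e. true under every valuation."""
    return all(formula(v) for v in valuations(atoms))


def impossible(formula, atoms):
    """N~p: p is L-false."""
    return necessary(lambda v: not formula(v), atoms)


def possible(formula, atoms):
    """~N~p: it is not necessary that not p."""
    return not impossible(formula, atoms)


def contingent(formula, atoms):
    """~Np & ~N~p: p is neither L-true nor L-false."""
    return not necessary(formula, atoms) and not impossible(formula, atoms)


atoms = ["p"]
excluded_middle = lambda v: v["p"] or not v["p"]
contradiction = lambda v: v["p"] and not v["p"]
atomic = lambda v: v["p"]

print(necessary(excluded_middle, atoms))  # p v ~p is L-true
print(impossible(contradiction, atoms))   # p & ~p is L-false
print(contingent(atomic, atoms))          # p alone is factual
```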

Carnap’s formulation of modal logic is very important from a historical point of view. Carnap gave the first semantic analysis of a modal logic, using Tarskian model theory to explain the conditions in which “necessarily p” is true. He also solved the problem of the meaning of the statement (x)N[Ax], where Ax is a sentence in which the individual variable x occurs. Carnap showed that (x)N[Ax] is equivalent to N[(x)Ax] or, more precisely, he proved we can assume its equivalence without contradictions.

From a broader philosophical point of view, Carnap believed that modalities did not require a new conceptual framework; a semantic logic of language can explain the modal concepts. The method he used in explaining modalities was a typical example of his philosophical analysis. Another interesting example is the explanation of belief-sentences which Carnap gave in Meaning and Necessity. Carnap asserts that two sentences have the same extension if they are equivalent, i.e., if they are both true or both false. On the other hand, two sentences have the same intension if they are logically equivalent, i.e., their equivalence is due to the semantic rules of the language. Let A be a sentence in which another sentence occurs, say p. A is called “extensional with respect to p” if and only if the truth value of A does not change if we substitute the sentence p with an equivalent sentence q. A is called “intensional with respect to p” if and only if (i) A is not extensional with respect to p and (ii) the truth value of A does not change if we substitute the sentence p with a logically equivalent sentence q. The following examples arise from Carnap’s assertions:

  • The sentence A v B is extensional with respect to both A and B; we can substitute A and B with equivalent sentences and the truth value of A v B does not change.
  • Suppose A is true but not L-true; therefore the sentences A v ~A and A are equivalent (both are true) and, of course, they are not L-equivalent. The sentence N(A v ~A) is true and the sentence N(A) is false; thus N(A) is not extensional with respect to A. On the contrary, if C is a sentence L-equivalent to A v ~A, then N(A v ~A) and N(C) are both true: N(A) is intensional with respect to A.
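
The second example can be checked by a substitution test. The Python sketch below assumes a toy language with two atomic sentences and one designated "actual" valuation in which A happens to be true; these choices and all names are illustrative assumptions, not Carnap's notation. Equivalence compares truth values in the actual valuation only, while L-equivalence compares them under every valuation.

```python
from itertools import product

ATOMS = ("A", "B")


def all_valuations():
    """Every assignment of truth values to the atomic sentences."""
    return [dict(zip(ATOMS, vals))
            for vals in product([True, False], repeat=len(ATOMS))]


def l_true(sentence):
    """Truth value of N(sentence): true under every valuation."""
    return all(sentence(v) for v in all_valuations())


def equivalent(s1, s2, actual):
    """Same truth value in the actual valuation."""
    return s1(actual) == s2(actual)


def l_equivalent(s1, s2):
    """Same truth value under every valuation."""
    return all(s1(v) == s2(v) for v in all_valuations())


actual = {"A": True, "B": False}     # assumed: A is true but not L-true
a = lambda v: v["A"]
a_or_not_a = lambda v: v["A"] or not v["A"]
c = lambda v: v["B"] or not v["B"]   # a sentence L-equivalent to A v ~A

# Equivalent sentences, different truth values under N: N is not extensional.
print(equivalent(a, a_or_not_a, actual), l_true(a_or_not_a), l_true(a))
# L-equivalent sentences, same truth value under N: N is intensional.
print(l_equivalent(c, a_or_not_a), l_true(c))
```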

There are sentences which are neither extensional nor intensional; for example, belief-sentences. Carnap’s example is “John believes that D”. Suppose that “John believes that D” is true; let A be a sentence equivalent to D and let B be a sentence L-equivalent to D. It is possible that the sentences “John believes that A” and “John believes that B” are both false. In fact, John can believe that a sentence is true and yet believe that a logically equivalent sentence is false. To explain belief-sentences, Carnap defines the notion of intensional isomorphism. In broad terms, two sentences are intensionally isomorphic if and only if their corresponding elements are L-equivalent. In the belief-sentence “John believes that D” we can substitute D with an intensionally isomorphic sentence C.

7. Philosophy of Physics

The first and the last books Carnap published during his lifetime were concerned with the philosophy of physics: his doctoral dissertation (Der Raum, 1922) and Philosophical Foundations of Physics, ed. by Martin Gardner, 1966. Der Raum deals with the philosophy of space. Carnap distinguishes three kinds of theories of space: formal, physical and intuitive. Formal space is analytic a priori; it is concerned with the formal properties of space, that is, with those properties which are a logical consequence of a definite set of axioms. Physical space is synthetic a posteriori; it is the object of natural science, and we can know its structure only by means of experience. Intuitive space is synthetic a priori, and is known via a priori intuition. According to Carnap, the distinction between three different kinds of space is similar to the distinction between three different aspects of geometry: projective, metric and topological respectively.

Some aspects of Der Raum remain very interesting. First, Carnap accepts a neo-Kantian philosophical point of view. Intuitive space, with its synthetic a priori character, is a concession to Kantian philosophy. Second, Carnap uses the methods of mathematical logic; for example, the characterization of intuitive space is given by means of Hilbert’s axioms for topology. Third, the distinction between formal and physical space is similar to the distinction between mathematical and physical geometry. This distinction, first proposed by Hans Reichenbach and later accepted by Carnap, became the official position of logical empiricism on the philosophy of space.

Carnap also developed a formal system for space-time topology. He asserted (1925) that space relations are based on the causal propagation of a signal, while the causal propagation itself is based on the time order.

Philosophical Foundations of Physics is a clear and approachable survey of topics from the philosophy of physics, based on Carnap’s university lectures. Some theories expressed there are not Carnap’s alone but belong to the common heritage of logical empiricism. The subjects dealt with in the book include:

  • The structure of scientific explanation: deductive and probabilistic explanation.
  • The philosophical and physical significance of non-Euclidean geometry; the theory of space in the general theory of relativity. Carnap argues against Kantian philosophy, especially against the synthetic a priori, and against conventionalism. He gives a clear explanation of the main properties of non-Euclidean geometry.
  • Determinism and quantum physics.
  • The nature of scientific language. Carnap deals with (i) the distinction between observational and theoretical terms, (ii) the distinction between analytic and synthetic statements and (iii) quantitative concepts.

As a sample of the content of Philosophical Foundations of Physics we can briefly look at Carnap’s thought on scientific explanation. Carnap accepts the classical theory developed by Carl Gustav Hempel. Carnap gives the following example to explain the general structure of a scientific explanation:

(x)(Px → Qx)
Pa
———
Qa

where the first statement is a scientific law, the second is a description of the initial conditions, and the third is the description of the event we want to explain. The last statement is a logical consequence of the first and the second, which are the premises of the explanation. A scientific explanation is thus a logical derivation of an appropriate statement from a set of premises, which state universal laws and initial conditions. According to Carnap, there is another kind of scientific explanation, probabilistic explanation, in which at least one universal law is not a deterministic law, but a probabilistic law. Carnap’s example is:

fr(Q,P) = 0.8
Pa
———-
Qa

where the first sentence means “the relative frequency of Q with respect to P is 0.8”. Qa is not a logical consequence of the premises; therefore this kind of explanation determines only a certain degree of confirmation for the event we want to explain.
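
The contrast between the two schemas can be made explicit for the deductive case. The following Lean 4 sketch is offered only as an illustration: the hypothesis names `law` and `init` are labels chosen here, not Carnap's. It shows that Qa follows from the universal law and the initial condition by instantiating the law at a and applying it to Pa.

```lean
-- Deductive explanation: from (x)(Px → Qx) and Pa, derive Qa.
example {α : Type} (P Q : α → Prop) (a : α)
    (law : ∀ x, P x → Q x)  -- the scientific law
    (init : P a)            -- the initial condition
    : Q a :=
  law a init
```

No analogous derivation exists for the probabilistic schema: from fr(Q,P) = 0.8 and Pa nothing entails Qa, which is why that schema yields only a degree of confirmation.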

8. Carnap’s Heritage

Carnap’s work has stimulated much debate. A substantial scholarly literature, both critical and supportive, has developed from examination of his thought. With respect to the analytic-synthetic distinction, Ryszard Wojcicki and Marian Przelecki – two Polish logicians – formulated a semantic definition of the distinction between analytic and synthetic. They proved that the Carnap sentence is the weakest meaning postulate, i.e., every meaning postulate entails the Carnap sentence. As a result, the set of analytic statements which are a logical consequence of the Carnap sentence is the smallest set of analytic statements. Wojcicki and Przelecki’s research is independent of the distinction between observational and theoretical terms, i.e., their suggested definition also works in a purely theoretical language. They also dispense with the requirement for a finite number of non-logical axioms.

The tentative definition of meaningfulness that Carnap proposed in “The Methodological Character of Theoretical Concepts” has been proved untenable. See, for example, David Kaplan, “Significance and Analyticity” in Rudolf Carnap, Logical Empiricist and Marco Mondadori’s introduction to Analiticità, Significanza, Induzione, in which Mondadori suggests a possible correction of Carnap’s definition.

With respect to inductive logic, I mention only Jaakko Hintikka’s generalization of Carnap’s continuum of inductive methods. In Carnap’s inductive logic, the probability of every universal law is always zero. Hintikka succeeded in formulating an inductive logic in which universal laws can obtain a positive degree of confirmation.

In Meaning and Necessity, 1947, Carnap was the first logician to use a semantic method to explain modalities. However, he used Tarskian model theory, so that every model of the language is an admissible model. In work published between 1959 and 1963 the American philosopher Saul Kripke showed that a full semantics of modalities can be attained by means of possible-worlds semantics. According to Kripke, not all possible models are admissible. J. Hintikka’s essay “Carnap’s heritage in logical semantics” in Rudolf Carnap, Logical Empiricist, shows that Carnap came extremely close to possible-worlds semantics, but was not able to go beyond classical model theory.

The omega-rule, which Carnap proposed in The Logical Syntax of Language, has come into widespread use in metamathematical research over a broad range of subjects.

9. References and Further Reading

The Philosophy of Rudolf Carnap (1963) contains the most complete bibliography of Carnap’s work. Listed below are Carnap’s most important works, arranged in chronological order.

a. Carnap’s Works

  • 1922 Der Raum: Ein Beitrag zur Wissenschaftslehre, dissertation, in Kant-Studien, Ergänzungshefte, n. 56
  • 1925 “Über die Abhängigkeit der Eigenschaften des Raumes von denen der Zeit” in Kant-Studien, 30
  • 1926 Physikalische Begriffsbildung, Karlsruhe : Braun, (Wissen und Wirken ; 39)
  • 1928 Scheinprobleme in der Philosophie, Berlin : Weltkreis-Verlag
  • 1928 Der Logische Aufbau der Welt, Leipzig : Felix Meiner Verlag (English translation The Logical Structure of the World; Pseudoproblems in Philosophy, Berkeley : University of California Press, 1967)
  • 1929 (with Otto Neurath and Hans Hahn) Wissenschaftliche Weltauffassung: Der Wiener Kreis, Vienna : A. Wolf
  • 1929 Abriss der Logistik, mit besonderer Berücksichtigung der Relationstheorie und ihrer Anwendungen, Vienna : Springer
  • 1932 “Die physikalische Sprache als Universalsprache der Wissenschaft” in Erkenntnis, II (English translation The Unity of Science, London : Kegan Paul, 1934)
  • 1934 Logische Syntax der Sprache (English translation The Logical Syntax of Language, New York : Humanities, 1937)
  • 1935 Philosophy and Logical Syntax, London : Kegan Paul
  • 1936 “Testability and Meaning” in Philosophy of Science, III (1936) and IV (1937)
  • 1938 “Logical Foundations of the Unity of Science” in International Encyclopaedia of Unified Science, vol. I n. 1, Chicago : University of Chicago Press
  • 1939 “Foundations of Logic and Mathematics” in International Encyclopaedia of Unified Science, vol. I n. 3, Chicago : University of Chicago Press
  • 1942 Introduction to Semantics, Cambridge, Mass. : Harvard University Press
  • 1943 Formalization of Logic, Cambridge, Mass. : Harvard University Press
  • 1947 Meaning and Necessity: a Study in Semantics and Modal Logic, Chicago : University of Chicago Press
  • 1950 Logical Foundations of Probability, Chicago : University of Chicago Press
  • 1952 “Meaning postulates” in Philosophical Studies, III (now in Meaning and Necessity, 1956, 2nd edition)
  • 1952 The Continuum of Inductive Methods, Chicago : University of Chicago Press
  • 1954 Einführung in die Symbolische Logik, Vienna : Springer (English translation Introduction to Symbolic Logic and its Applications, New York : Dover, 1958)
  • 1956 “The Methodological Character of Theoretical Concepts” in Minnesota Studies in the Philosophy of Science, vol. I, ed. by H. Feigl and M. Scriven, Minneapolis : University of Minnesota Press
  • 1958 “Beobachtungssprache und theoretische Sprache” in Dialectica, XII (English translation “Observation Language and Theoretical Language” in Rudolf Carnap, Logical Empiricist, Dordrecht, Holl. : D. Reidel Publishing Company, 1975)
  • 1966 Philosophical Foundations of Physics, ed. by Martin Gardner, New York : Basic Books
  • 1977 Two Essays on Entropy, ed. by Abner Shimony, Berkeley : University of California Press

b. Other Sources

  • 1962 Logic and Language: Studies Dedicated to Professor Rudolf Carnap on the Occasion of his Seventieth Birthday, Dordrecht, Holl. : D. Reidel Publishing Company
  • 1963 The Philosophy of Rudolf Carnap, ed. by Paul Arthur Schilpp, La Salle, Ill. : Open Court Pub. Co.
  • 1970 PSA 1970: Proceedings of the 1970 Biennial Meeting of the Philosophy of Science Association: In Memory of Rudolf Carnap, Dordrecht, Holl. : D. Reidel Publishing Company
  • 1971 Analiticità, Significanza, Induzione, ed. by Alberto Meotti and Marco Mondadori, Bologna, Italy : il Mulino
  • 1975 Rudolf Carnap, Logical Empiricist. Materials and Perspectives, ed. by Jaakko Hintikka, Dordrecht, Holl. : D. Reidel Publishing Company
  • 1986 Joëlle Proust, Questions de Forme: Logique et Proposition Analytique de Kant à Carnap, Paris, France: Fayard (English translation Questions of Forms: Logic and Analytic Propositions from Kant to Carnap, Minneapolis : University of Minnesota Press)
  • 1990 Dear Carnap, Dear Van: The Quine-Carnap Correspondence and Related Work, ed. by Richard Creath, Berkeley : University of California Press
  • 1991 Maria Grazia Sandrini, Probabilità e Induzione: Carnap e la Conferma come Concetto Semantico, Milano, Italy : Franco Angeli
  • 1991 Erkenntnis Orientated: A Centennial Volume for Rudolf Carnap and Hans Reichenbach, ed. by Wolfgang Spohn, Dordrecht; Boston : Kluwer Academic Publishers
  • 1991 Logic, Language, and the Structure of Scientific Theories: Proceedings of the Carnap-Reichenbach Centennial, University of Konstanz, 21-24 May 1991, Pittsburgh : University of Pittsburgh Press; [Konstanz] : Universitätsverlag Konstanz
  • 1995 L’eredità di Rudolf Carnap: Epistemologia, Filosofia delle Scienze, Filosofia del Linguaggio, ed. by Alberto Pasquinelli, Bologna, Italy : CLUEB

Author Information

Mauro Murzi
Email: murzim@yahoo.com
Italy