Logical Consequence

Logical consequence is arguably the central concept of logic. The primary aim of logic is to tell us what follows logically from what. In order to simplify matters we take the logical consequence relation to hold for sentences rather than for abstract propositions, facts, states of affairs, etc. Correspondingly, logical consequence is a relation between a given class of sentences and the sentences that logically follow. One sentence is said to be a logical consequence of a set of sentences if and only if, in virtue of logic alone, it is impossible for the sentences in the set to be all true without the other sentence being true as well. If sentence X is a logical consequence of a set of sentences K, then we may say that K implies or entails X, or that one may correctly infer the truth of X from the truth of the sentences in K. For example, Kelly is not at work is a logical consequence of Kelly is not both at home and at work and Kelly is at home. However, the sentence Kelly is not a football fan does not follow from All West High School students are football fans and Kelly is not a West High School student. The central question to be investigated here is: What conditions must be met in order for a sentence to be a logical consequence of others?

One popular answer derives from the work of Alfred Tarski, one of the preeminent logicians of the twentieth century, in his famous 1936 paper, “On the Concept of Logical Consequence.” Here Tarski uses his observations of the salient features of what he calls the common concept of logical consequence to guide his theoretical development of it. Accordingly, we begin by examining the common concept, focusing on Tarski’s observations of the criteria by which we intuitively judge what follows from what and which Tarski thinks must be reflected in any theory of logical consequence. Then two theoretical definitions of logical consequence are introduced: the model-theoretic and the deductive-theoretic definitions. They represent two major approaches to making the common concept of logical consequence more precise. The article concludes by highlighting considerations relevant to evaluating model-theoretic and deductive-theoretic characterizations of logical consequence.

Table of Contents

  1. Introduction
  2. The Concept of Logical Consequence
    1. Tarski’s Characterization of the Common Concept of Logical Consequence
      1. The Logical Consequence Relation Has a Modal Element
      2. The Logical Consequence Relation is Formal
      3. The Logical Consequence Relation is A Priori
    2. Logical and Non-Logical Terminology
      1. The Nature of Logical Constants Explained in Terms of Their Semantic Properties
      2. The Nature of Logical Constants Explained in Terms of Their Inferential Properties
  3. Model-Theoretic and Deductive-Theoretic Conceptions of Logic
  4. Conclusion
  5. References and Further Reading

1. Introduction

For a given language, a sentence is said to be a logical consequence of a set of sentences, if and only if, in virtue of logic alone, the sentence must be true if every sentence in the set were to be true. This corresponds to the ordinary notion of a sentence “logically following” from others. Logicians have attempted to make the ordinary concept more precise relative to a given language L by sketching a deductive system for L, or by formalizing the intended semantics for L. Any adequate precise characterization of logical consequence must reflect its salient features such as those highlighted by Alfred Tarski: (1) that the logical consequence relation is formal, that is, depends on the forms of the sentences involved, (2) that the relation is a priori, that is, it is possible to determine whether or not it holds without appeal to sense-experience, and (3) that the relation has a modal element.

For more comprehensive presentations of the two definitions of logical consequence, as well as further critical discussion, see the entries Logical Consequence, Model-Theoretic Conceptions and Logical Consequence, Deductive-Theoretic Conceptions.

2. The Concept of Logical Consequence

a. Tarski’s Characterization of the Common Concept of Logical Consequence

Tarski begins his article, “On the Concept of Logical Consequence,” by noting a challenge confronting the project of making precise the common concept of logical consequence.

The concept of logical consequence is one of those whose introduction into a field of strict formal investigation was not a matter of arbitrary decision on the part of this or that investigator; in defining this concept efforts were made to adhere to the common usage of the language of everyday life. But these efforts have been confronted with the difficulties which usually present themselves in such cases. With respect to the clarity of its content the common concept of consequence is in no way superior to other concepts of everyday language. Its extension is not sharply bounded and its usage fluctuates. Any attempt to bring into harmony all possible vague, sometimes contradictory, tendencies which are connected with the use of this concept, is certainly doomed to failure. We must reconcile ourselves from the start to the fact that every precise definition of this concept will show arbitrary features to a greater or less degree. (Tarski 1936, p. 409)

Not every feature of the technical account will be reflected in the ordinary concept, and we should not expect any clarification of the concept to reflect each and every deployment of it in everyday language and life. Nevertheless, despite its vagueness, Tarski believes that there are identifiable, essential features of the common concept of logical consequence.

…consider any class K of sentences and a sentence X which follows from this class. From an intuitive standpoint, it can never happen that both the class K consists of only true sentences and the sentence X is false. Moreover, since we are concerned here with the concept of logical, that is, formal consequence, and thus with a relation which is to be uniquely determined by the form of the sentences between which it holds, this relation cannot be influenced in any way by empirical knowledge, and in particular by knowledge of the objects to which the sentence X or the sentences of class K refer. The consequence relation cannot be affected by replacing designations of the objects referred to in these sentences by the designations of any other objects. (Tarski 1936, pp. 414-415)

According to Tarski, the logical consequence relation as it is employed by typical reasoners is (1) necessary, (2) formal, and (3) not influenced by empirical knowledge. I now elaborate on (1)-(3) in order to shape two preliminary characterizations of logical consequence.

i. The Logical Consequence Relation Has a Modal Element

Tarski countenances an implicit modal notion in the common concept of logical consequence. If X is a logical consequence of K, then not only is it not the case that all of the elements of K are true and X is false, but also this is necessarily so. That is, X follows from K only if it is not possible for all of the sentences in K to be true with X false. For example, the supposition that All West High School students are football fans and that Kelly is not a West High School student does not rule out the possibility that Kelly is a football fan. Hence, the sentences All West High School students are football fans and Kelly is not a West High School student do not entail Kelly is not a football fan, even if she, in fact, isn’t a football fan. Also, Most of Kelly’s male classmates are football fans does not entail Most of Kelly’s classmates are football fans. What if the majority of Kelly’s class is composed of females who are not fond of football?

We said above that Kelly is not both at home and at work and Kelly is at home jointly imply Kelly is not at work. Note that it doesn’t seem possible for the first two sentences to be true and Kelly is not at work false. But it is hard to see what this comes to without further clarification of the relevant notion of possibility. For example, consider the following pairs of sentences.

Kelly kissed her sister at 2:00pm.
2:00pm is not a time during which Kelly and her sister were 100 miles apart.

Kelly is a female.
Kelly is not the US President.

There is a chimp in Paige’s house.
There is a primate in Paige’s house.

Ten is a prime number.
Ten is greater than nine.

For each pair of sentences, there is a sense in which it is not possible for the first to be true and the second false. At the very least an account of logical consequence must distinguish logical possibility from other types of possibility. Should truths about physical laws, US political history, zoology, and mathematics constrain what we take to be possible in determining whether or not the first sentence of each pair could logically be true with the second sentence false? If not, then this seems to mystify logical possibility (e.g., how could ten be a prime number?). To paraphrase questions asked by G.E. Moore (1959, pp. 231-238), given that I know that George W. Bush is US President and that he is not a female named Kelly, isn’t it inconsistent for me to grant the logical possibility of the truth of Kelly is a female and the falsity of Kelly is not the US President? Or should I ignore my present state of knowledge in considering what is logically possible? Tarski does not derive a clear notion of logical possibility from the common concept of logical consequence. Perhaps there is none to be had, and we should seek the help of a proper theoretical development in clarifying the notion of logical possibility. Towards this end, let’s turn to the other features of logical consequence highlighted by Tarski, starting with the formality criterion of logical consequence.

ii. The Logical Consequence Relation is Formal

Tarski observes that logical consequence is a formal consequence relation. And he tells us that a formal consequence relation is a consequence relation that is uniquely determined by the form of the sentences between which it holds. Consider the following pair of sentences

(1) Some children are both lawyers and peacemakers.
(2) Some children are peacemakers

Intuitively, (2) is a logical consequence of (1). It appears that this fact does not turn on the subject matter of the sentences. Replace ‘children’, ‘lawyers’, and ‘peacemakers’ in (1) and (2) with the variables S, M, and P to get the following.

(1′) Some S are both M and P
(2′) Some S are P

(1′) and (2′) are forms of (1) and (2), respectively. Note that there is no interpretation of S, M, and P according to which the sentence that results from (1′) is true and the resulting instance of (2′) is false. Hence, (2) is a formal consequence of (1) and on each interpretation of S, M, and P the resulting (2′) is a formal consequence of the sentence that results from (1′) (e.g., Some clowns are sad is a formal consequence of Some clowns are both lonely and sad). Tarski’s observation is that for any sentence X and set K of sentences, X is a logical consequence of K only if X is a formal consequence of K. The formality criterion of logical consequence can work in explaining why one sentence doesn’t entail another in cases where it seems impossible for the first to be true and the second false. For example, (3) is false and (4) is true.

(3) Ten is a prime number
(4) Ten is greater than nine

Does (4) follow from (3)? One might think that (4) does not follow from (3) because being a prime number does not necessitate being greater than nine. However, this does not require one to think that ten could be a prime number and less than or equal to nine, which is probably a good thing since it is hard to see how this is possible. Rather, we take

(3′) a is a P
(4′) a is R than b

to be the forms of (3) and (4) and note that there are interpretations of ‘a’, ‘b’, ‘P’, and ‘R’ according to which the first is true and the second false (e.g., let ‘a’ and ‘b’ name the numbers two and ten, respectively, and let ‘P’ mean prime number, and ‘R’ greater). Note that the claim here is not that formality is sufficient for a consequence relation to qualify as logical but only that it is a necessary condition. I now elaborate on this last point by saying a little more about forms of sentences (that is, sentential forms) and formal consequence.

Distinguishing between a term of a sentence replaced with a variable and one held constant determines a form of the sentence. In Some children are both lawyers and peacemakers we may replace ‘Some’ with a variable and treat all the other terms as constant. Then

(1”) D children are both lawyers and peacemakers

is a form of (1), and each sentence generated by assigning a meaning to D shares this form with (1). For example, the following three sentences are instances of (1”), produced by interpreting D as ‘No’, ‘Many’, and ‘Few’.

No children are both lawyers and peacemakers
Many children are both lawyers and peacemakers
Few children are both lawyers and peacemakers

Whether X is a formal consequence of K then turns on a prior selection of some terms as constant and others as replaced with variables. Relative to such a determination, X is a formal consequence of K if and only if (iff) there is no interpretation of the variables according to which every sentence in K is true and X is false. So, taking all the terms, except for ‘Some’, in (1) Some children are both lawyers and peacemakers and in (2) Some children are peacemakers as constants makes the following forms of (1) and (2).

(1”) D children are both lawyers and peacemakers
(2”) D children are peacemakers

Relative to this selection, (2) is not a formal consequence of (1) because replacing ‘D’ with ‘No’ yields a true instance of (1”) and a false instance of (2”).
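The dependence of formal consequence on the selection of variable terms can be illustrated with a small search. The following is only a sketch under stated assumptions: interpretations of S, M, and P are taken to be subsets of a three-element domain, the determiners ‘Some’ and ‘No’ are modeled as relations between sets, and the sets named children, lawyers, and peacemakers are hypothetical toy data. A search over a finite domain illustrates the idea but does not prove anything about all interpretations.

```python
from itertools import combinations, product

def powerset(domain):
    """Return every subset of `domain` as a frozenset."""
    items = list(domain)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Determiners modeled as relations between two sets (an assumption of this sketch).
some = lambda A, B: len(A & B) > 0
no = lambda A, B: len(A & B) == 0

# Selection 1: 'Some', 'both', and 'and' are held constant; S, M, P vary.
def counter_interpretations_SMP(domain):
    """Interpretations making (1') true and (2') false."""
    sets = powerset(domain)
    return [(S, M, P) for S, M, P in product(sets, repeat=3)
            if some(S, M & P) and not some(S, P)]

# Selection 2: every term is held constant except the determiner D.
# Hypothetical toy extensions for the constant terms.
children = frozenset({"ann", "bo", "cy"})
lawyers = frozenset()
peacemakers = frozenset({"ann", "bo"})

def counter_interpretations_D(determiners):
    """Readings of D making (1'') true and (2'') false."""
    return [name for name, D in determiners.items()
            if D(children, lawyers & peacemakers) and not D(children, peacemakers)]

print(counter_interpretations_SMP(range(3)))                # []
print(counter_interpretations_D({"Some": some, "No": no}))  # ['No']
```

On the first selection the search finds no counter-interpretation; on the second, reading D as ‘No’ yields a true premise and a false conclusion.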

Consider the following pair.

(5) Kelly is female
(6) Kelly is not US President

(6) is a formal consequence of (5) relative to replacing ‘Kelly’ with a variable. Given current U.S. political history, there is no individual whose name yields a true (5) and a false (6) when it replaces ‘Kelly’. This is not, however, sufficient reason for seeing (6) as a logical consequence of (5). There are two ways of thinking about why, a metaphysical consideration and an epistemological one. First the metaphysical consideration. It seems possible for (5) to be true and (6) false. The course of U.S. political history could have turned out differently. One might think that the current US President could–logically–have been a female named, say, ‘Sally’. Using ‘Sally’ as a replacement for ‘Kelly’ would yield in that situation a true (5) and a false (6). Also, it seems possible that in the future there will be a female US President. In order for a formal consequence relation from K to X to qualify as logical it has to be the case that it is necessary that there is no interpretation of the variables in K and X according to which the K-sentences are true and X is false.

The epistemological consideration is that one might think that knowledge that X follows logically from K should not essentially depend on being justified by experience of extra-linguistic states of affairs. Clearly, the determination that (6) follows formally from (5) essentially turns on empirical knowledge, specifically knowledge about the current political situation in the US. This leads to the final highlight of Tarski’s rendition of the intuitive concept of logical consequence: that logical consequence cannot be influenced by empirical knowledge.

iii. The Logical Consequence Relation is A Priori

Tarski says that by virtue of being formal, knowledge that X follows logically from K cannot be affected by knowledge of the objects that X and the sentences of K are about. Hence, our knowledge that X is a logical consequence of K cannot be influenced by empirical knowledge. However, as noted above, formality by itself does not ensure that the extension of a consequence relation is not influenced by empirical knowledge. So, let’s view this alleged feature of logical consequence as independent of formality. We characterize empirical knowledge in two steps as follows. First, a priori knowledge is knowledge “whose truth, given an understanding of the terms involved, is ascertainable by a procedure which makes no reference to experience” (Hamlyn 1967, p. 141). Second, empirical, or a posteriori, knowledge is knowledge that is not a priori, that is, knowledge whose validation necessitates a procedure that does make reference to experience. We can safely read Tarski as saying that a consequence relation is logical only if knowledge that something falls in its extension is a priori, that is, only if the relation is a priori. Knowledge of physical laws, a determinant in people’s observed sizes, is not a priori and such knowledge is required to know that there is no interpretation of k, h, and t according to which (7) is true and (8) false.

(7) k kissed h at time t
(8) t is not a time during which k and h were 100 miles apart

So (8) cannot be a logical consequence of (7). However, my knowledge that Kelly is not Paige’s only friend follows from Kelly is taller than Paige’s only friend is a priori since I know a priori that nobody is taller than herself.

Let’s summarize and tie things together. We began by asking, for a given language L, what conditions must be met in order for a sentence X of L to be a logical consequence of a class K of L-sentences? Tarski thinks that an adequate response must reflect the common concept of logical consequence, that is, the concept as it is ordinarily employed. By the lights of this concept, an adequate account of logical consequence must reflect the formality and necessity of logical consequence, and must also reflect the fact that knowledge of what follows logically from what is a priori. Tying the criteria together, in order to fix what follows logically from what in a given language L, we must select a class of constants that determines a formal consequence relation that is both necessary and known, if at all, a priori. Such constants are called logical constants, and we say that the logical form of a sentence is a function of the logical constants that occur in the sentence and the pattern of the remaining expressions. As was illustrated above, the notion of formality does not presuppose a criterion of logical constancy. A consequence relation based on any division between constants and terms replaced with variables will automatically be formal with respect to the latter.

b. Logical and Non-Logical Terminology

Tarski’s basic move from his rendition of the common concept of logical consequence is to distinguish between logical terms and non-logical terms and then say that X is a logical consequence of K only if there is no possible interpretation of the non-logical terms of the language L that makes all of the sentences in K true and X false. The choice of the right terms as logical will reflect the modal element in the concept of logical consequence, that is, will ensure that there is no ‘possible’ interpretation of the variable, non-logical terms of the language L that makes all of the sentences in K true and X false, and will ensure that this is known a priori. Of course, we have yet to spell out the modal notion in the concept of logical consequence. Tarski largely left this underdeveloped in his (1936). Lacking such an explanation hampers our ability to clarify the rationale for a selection of terms to serve as the logical ones.

Traditionally, logicians have regarded sentential connectives such as and, not, or, if...then, the quantifiers all and some, and the identity predicate ‘=’ as logical terms. Remarking on the boundary between logical and non-logical terms, Tarski (1936, p. 419) writes the following.

Underlying this characterization of logical consequence is the division of all terms of the language discussed into logical and extra-logical. This division is not quite arbitrary. If, for example, we were to include among the extra-logical signs the implication sign, or the universal quantifier, then our definition of the concept of consequence would lead to results which obviously contradict ordinary usage. On the other hand, no objective grounds are known to me which permit us to draw a sharp boundary between the two groups of terms. It seems to be possible to include among logical terms some which are usually regarded by logicians as extra-logical without running into consequences which stand in sharp contrast to ordinary usage.

Tarski seems right to think that the logical consequence relation turns on the work that the logical terminology does in the relevant sentences. It seems odd to say that Kelly is happy does not logically follow from All are happy because the second is true and the first false when All is replaced with Few. However, by Tarski’s version of the ordinary concept of logical consequence there is no reason not to treat, say, taller than as a logical term along with not and, therefore, no reason not to take Kelly is not taller than Paige as following logically from Paige is taller than Kelly. Also, it seems plausible to say that I know a priori that there is no possible interpretation of Kelly and is mortal according to which it is necessary that Kelly is mortal is true and Kelly is mortal is false. This makes Kelly is mortal a logical consequence of it is necessary that Kelly is mortal. Given that taller than and it is necessary that, along with other terms, were not generally regarded as logical terms by logicians of Tarski’s day, the fact that they seem to be logical terms by the common concept of logical consequence, as observed by Tarski, highlights the question of what it takes to be a logical term. Tarski says that future research will either justify the traditional boundary between the logical and the non-logical or conclude that there is no such boundary and the concept of logical consequence is a relative concept whose extension is always relative to some selection of terms as logical (p. 420). For further discussion of Tarski’s views on logical terminology and contemporary views see Logical Consequence, Model-Theoretic Conceptions: Section 5.3.

How, exactly, does the terminology usually regarded by logicians as logical work in making it the case that one sentence follows from others? In the next two sections two distinct approaches to understanding the nature of logical terms are sketched. Each approach leads to a unique way of characterizing logical consequence and thus yields a unique response to the above question.

i. The Nature of Logical Constants Explained in Terms of Their Semantic Properties

Consider the following metaphor, borrowed from Bencivenga (1999).

The locked room metaphor

Suppose that you are locked in a dark windowless room and you know everything about your language but nothing about the world outside. A sentence X and a class K of sentences are presented to you. If you can determine that X is true if all the sentences in K are, X is a logical consequence of K.

Ignorant of US politics, I couldn’t determine the truth of Kelly is not US President solely on the basis of Kelly is a female. However, behind such a veil of ignorance I would be able to tell that Kelly is not US President is true if Kelly is female and Kelly is not US President is true. How? Short answer: based on my linguistic competence; longer answer: based on my understanding of the semantic contribution of and to the determination of the truth conditions of a sentence of the form P and Q. For any sentences P and Q, I know that P and Q is true just in case P is true and Q is true. So, I know, a priori, if P and Q is true, then Q is true. As noted by one philosopher, “This really is remarkable since, after all, it’s what they mean, together with the facts about the non-linguistic world, that decide whether P or Q are true” (Fodor 2000, p.12).

Taking not and and to be the only logical constants in (9) Kelly is not both at home and at work, (10) Kelly is at home, and (11) Kelly is not at work, we formalize the sentences as follows, letting k mean Kelly, H mean is at home, and W mean is at work.

(9′) not-(Hk and Wk)
(10′) Hk
(11′) not-Wk

There is no interpretation of k, H, and W according to which (9′) and (10′) are true and (11′) is false. The reason why turns on the semantic properties of and and not, which are knowable a priori. Suppose (9′) and (10′) are true on some interpretation of the variable terms. Then the meaning of not in (9′) makes it the case that Hk and Wk is false, which, by the meaning of and requires that Hk is false or Wk is false. Given (10′), it must be that Wk is false, that is, not-Wk is true. So, there can’t be an interpretation of the variable terms according to which (9′) and (10′) are true and (11′) is false, and, as the above reasoning illustrates, this is due exclusively to the semantic properties of not and and. So the reason that it is impossible that an interpretation of k, H, and W make (9′) and (10′) true and (11′) false is that the supposition otherwise is inconsistent with the semantic functioning of not and and. Compare: the supposition that there is an interpretation of k according to which k is a female is true and k is not US President is false does not seem to violate the semantic properties of the constant terms. If we identify the meanings of the predicates with their extensions in all possible worlds, then the supposition that there is a female U.S. President does not violate the meanings of female and US President for surely it is possible that there be a female US President. But, supposing that (9′) and (10′) could be true with (11′) false on some interpretation of k, H, and W, violates the semantic properties of either and or not.
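The reasoning above can be mirrored by a truth-table check. This is a minimal sketch, assuming that all an interpretation of k, H, and W contributes to the truth of (9′)-(11′) is a truth-value for Hk and a truth-value for Wk, and that not and and are modeled by Python's `not` and `and`; the function names are illustrative only.

```python
from itertools import product

# Each function gives the truth-value of a sentence form, given truth-values
# for the atomic sentences Hk and Wk.
def sentence_9(Hk, Wk):
    return not (Hk and Wk)   # (9')  not-(Hk and Wk)

def sentence_10(Hk, Wk):
    return Hk                # (10') Hk

def sentence_11(Hk, Wk):
    return not Wk            # (11') not-Wk

# Collect every assignment on which (9') and (10') are true but (11') is false.
bad_rows = [
    (Hk, Wk)
    for Hk, Wk in product([True, False], repeat=2)
    if sentence_9(Hk, Wk) and sentence_10(Hk, Wk) and not sentence_11(Hk, Wk)
]

print(bad_rows)  # []
```

The empty result reflects the point made in the text: a row with true premises and a false conclusion would need Hk and Wk both true, which the meaning of not in (9′) excludes.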

In sum, our first-step characterization of logical consequence is the following. For a given language L,

X is a logical consequence of K if and only if there is no possible interpretation of the non-logical terminology of L according to which all the sentences in K are true and X is false.

A possible interpretation of the non-logical terminology of the language L according to which sentences are true or false is a reading of the non-logical terms according to which the sentences receive a truth-value (that is, are either true or false) in a situation that is not ruled out by the semantic properties of the logical constants. The philosophical locus of the technical development of ‘possible interpretation’ in terms of models is Tarski (1936). A model for a language L is the theoretical development of a possible interpretation of the non-logical terminology of L according to which the sentences of L receive a truth-value. Models have become standard tools for characterizing the logical consequence relation, and the characterization of logical consequence in terms of models is called the Tarskian or model-theoretic characterization of logical consequence. We say that X is a model-theoretic consequence of K if and only if all models of K are models of X. This relation may be represented as K ⊨ X. If model-theoretic consequence is adequate as a representation of logical consequence, then it must reflect the salient features of the common concept, which, according to Tarski, means that it must be necessary, formal and a priori.
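The definition “all models of K are models of X” can be sketched in code for a very small fragment. The sketch below rests on several assumptions that are not part of the article: a model is represented as a dictionary giving a referent for k and extensions for H and W, only models over a two-element domain are generated, and sentences are represented as Python functions from models to truth-values. Restricting attention to finitely many small models illustrates the definition without establishing consequence in general.

```python
from itertools import combinations, product

def powerset(domain):
    """Return every subset of `domain` as a frozenset."""
    items = list(domain)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def models(domain, predicates=("H", "W")):
    """Yield every assignment of a referent to 'k' and an extension to each predicate."""
    for k in domain:
        for exts in product(powerset(domain), repeat=len(predicates)):
            yield {"k": k, **dict(zip(predicates, exts))}

def entails(K, X, all_models):
    """K |= X: every model of all the sentences in K is also a model of X."""
    return all(X(m) for m in all_models if all(s(m) for s in K))

# Sentences as functions from a model to a truth-value.
s9 = lambda m: not (m["k"] in m["H"] and m["k"] in m["W"])   # not-(Hk and Wk)
s10 = lambda m: m["k"] in m["H"]                             # Hk
s11 = lambda m: m["k"] not in m["W"]                         # not-Wk

ms = list(models(range(2)))
print(entails([s9, s10], s11, ms))  # True
print(entails([s10], s11, ms))      # False
```

The second call fails because some model places the referent of k in the extensions of both H and W, so Hk alone does not exclude Wk.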

For further discussion of this conception of logical consequence, see the article, Logical Consequence, Model-Theoretic Conceptions.

ii. The Nature of Logical Constants Explained in Terms of Their Inferential Properties

We now turn to a second approach to understanding logical constants. Instead of understanding the nature of logical constants in terms of their semantic properties as is done on the model-theoretic approach, on the second approach we appeal to their inferential properties conceived of in terms of principles of inference, that is, principles justifying steps in deductions. We begin with a remark made by Aristotle. In his study of logical consequence, Aristotle comments that

A syllogism is discourse in which, certain things being stated, something other than what is stated follows of necessity from their being so. I mean by the last phrase that they produce the consequence, and by this, that no further term is required from without in order to make the consequence necessary. (Prior Analytics, p. 24b)

Adapting this to our X and K, we may say that X is a logical consequence of K when the sentences of K are sufficient to produce X. How are we to think of a sentence being produced by others? One way of developing this is to appeal to a notion of an actual or possible deduction. X is a deductive consequence of K if and only if there is a deduction of X from K. In such a case, we say that X may be correctly inferred from K or that it would be correct to conclude X from K. A deduction is associated with a pair ⟨K, X⟩; the set K of sentences is the basis of the deduction, and X is the conclusion. A deduction from K to X is a finite sequence S of sentences ending with X such that each sentence in S (that is, each intermediate conclusion) is derived from a sentence (or more) in K or from previous sentences in S in accordance with a correct principle of inference.

For example, intuitively, the following inference seems correct.

  Kelly is not both at home and at work
  Kelly is at home
(therefore) Kelly is not at work

The set K of sentences preceding ‘(therefore)’ is the basis of the inference and the sentence X that follows it is the conclusion. We represent their logical forms, again, as follows.

  (9′) not-(Hk and Wk)
  (10′) Hk
(therefore) (11′) not-Wk

Consider the following deduction of (11′) from (10′) and (9′).

Deduction: Assume that (12′) Wk. Then from (10′) and (12′) we may deduce that (13′) Hk and Wk. (13′) contradicts (9′) and so (12′), our initial assumption, must be false. We have deduced not-Wk from not-(Hk and Wk) and Hk.
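For readers who find a machine-checkable rendering helpful, the deduction just given can be sketched in Lean 4. The proposition variables H and W are stand-ins for the sentences Hk and Wk, and the hypothesis names are illustrative labels; this is one possible formalization, not part of Tarski's or the article's apparatus.

```lean
-- H and W stand in for the sentences Hk and Wk.
theorem not_W_of_not_and (H W : Prop) (h9 : ¬(H ∧ W)) (h10 : H) : ¬W :=
  fun h12 : W =>        -- assume (12′) Wk
    h9 ⟨h10, h12⟩       -- (13′) Hk and Wk contradicts (9′)
```

The two steps correspond to the two principles used in the informal deduction: forming a conjunction from its conjuncts, and rejecting an assumption that leads to a contradiction.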

Since the deduction of not-Wk from not-(Hk and Wk) and Hk did not depend on the interpretation of k, W, and H, the deductive relation is formal. Furthermore, my knowledge of this is a priori because my knowledge of the underlying principles of inference in the above deduction is not empirical. For example, letting P and Q be any sentences, we know a priori that P and Q may be inferred from the set K={P, Q} of basis sentences. This principle grounds the move from (10′) and (12′) to (13′). Also, the deduction appeals to the principle that if we deduce a contradiction from an assumption, then we may infer that the assumption is false. The correctness of this principle seems to be an a priori matter. Let’s look at another example of a deduction.

  (1) Some children are both lawyers and peacemakers
(therefore) (2) Some children are peacemakers

The logical forms are, again, the following.

  (1′) Some S are both M and P
(therefore) (2′) Some S are P

Again, intuitively, (2′) is deducible from (1′).

Deduction: The basis tells us that at least one S (let’s call this S ‘a’) is both an M and a P. Clearly, a is a P may be deduced from a is both an M and a P. Since we’ve assumed that a is an S, what we derive with respect to a we derive with respect to some S. So our derivation of a is a P is a derivation of Some S is a P, which is our desired conclusion.
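The same deduction can be sketched in Lean 4, under the assumption that S, M, and P are read as arbitrary predicates over an arbitrary domain α and that Some S are P is rendered as an existential claim. The names are illustrative.

```lean
-- S, M, P are arbitrary predicates over an arbitrary domain α.
example {α : Type} (S M P : α → Prop)
    (h1 : ∃ x, S x ∧ (M x ∧ P x)) : ∃ x, S x ∧ P x :=
  match h1 with
  | ⟨a, hSa, _hMa, hPa⟩ =>   -- call the witness a; it is an S, an M, and a P
    ⟨a, hSa, hPa⟩            -- a is an S and a P, so some S is P
```

Because nothing in the proof depends on what S, M, and P are, the sketch reflects the formality of the deduction.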

Since the deduction is formal, we have shown not merely that (2) can be correctly inferred from (1), but we have shown that for any interpretation of S, M, and P it is correct to infer (2′) from (1′).

Typically, deductions leave out steps (perhaps because they are too obvious), and they usually do not justify each and every step made in moving towards the conclusion (again, obviousness begets brevity). The notion of a deduction is made precise by describing a mechanism for constructing deductions that are both transparent and rigorous (each step is explicitly justified and no steps are omitted). This mechanism is a deductive system (also known as a formal system or as a formal proof calculus). A deductive system D is a collection of rules that govern which sequences of sentences, associated with a given ⟨K, X⟩, are allowed and which are not. Such a sequence is called a proof in D (or, equivalently, a deduction in D) of X from K. The rules must be such that whether or not a given sequence associated with ⟨K, X⟩ qualifies as a proof in D of X from K is decidable purely by inspection and calculation. That is, the rules provide a purely mechanical procedure for deciding whether a given object is a proof in D of X from K.
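The idea that proofhood is decidable “purely by inspection and calculation” can be illustrated with a toy proof checker. Everything here is an assumption made for illustration: the system has only three rules (the labels premise, and_intro, and and_elim are hypothetical), atomic sentences are strings, and a conjunction is the tuple ("and", A, B). It is a sketch of the mechanical character of proof checking, not a rendering of any particular deductive system from the literature.

```python
def is_proof(K, X, steps):
    """Return True iff `steps` is a proof of X from K in the toy system.

    Each step is (formula, rule, cited), where `cited` lists the indices of
    earlier steps that the rule is applied to.
    """
    derived = []
    for formula, rule, cited in steps:
        # A step may cite only lines that already appear in the sequence.
        if any(not 0 <= i < len(derived) for i in cited):
            return False
        refs = [derived[i] for i in cited]
        if rule == "premise":
            ok = formula in K and not refs
        elif rule == "and_intro":
            # From A and B, infer ("and", A, B).
            ok = len(refs) == 2 and formula == ("and", refs[0], refs[1])
        elif rule == "and_elim":
            # From ("and", A, B), infer A or infer B.
            ok = (len(refs) == 1 and isinstance(refs[0], tuple)
                  and refs[0][0] == "and" and formula in refs[0][1:])
        else:
            ok = False  # unknown rule label
        if not ok:
            return False
        derived.append(formula)
    # The sequence must be non-empty and end with the conclusion X.
    return bool(derived) and derived[-1] == X

K = {("and", "Ma", "Pa")}
good = [(("and", "Ma", "Pa"), "premise", []), ("Pa", "and_elim", [0])]
bad = [("Pa", "premise", [])]
print(is_proof(K, "Pa", good))  # True
print(is_proof(K, "Pa", bad))   # False
```

The checker never consults what Ma or Pa mean; it inspects only the shapes of the formulas and the cited lines, which is the sense in which the procedure is mechanical.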

We say that a deductive system D is correct when for any K and X, proofs in D of X from K correspond to intuitively valid deductions. For example, intuitively, there are no correct principles of inference according to which it is correct to conclude

Some animals are both mammals and reptiles

on the basis of the following two sentences.

Some animals are mammals
Some animals are reptiles

Hence, a proof in a deductive system of the former sentence from the latter two is evidence that the deductive system is incorrect. The point here is that a proof in D may fail to represent a deduction if D is incorrect.
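One way to support the intuitive judgment that the inference is incorrect is to exhibit an interpretation of the form "Some S are M; Some S are P; therefore Some S are both M and P" under which the premises hold and the conclusion fails. The following sketch searches small finite domains by brute force. It borrows the idea of an interpretation from the model-theoretic side and is offered only as an illustration; the function names and the representation of predicates as sets are assumptions made here.

```python
from itertools import combinations, product

def subsets(domain):
    """All subsets of a finite domain, as frozensets."""
    items = list(domain)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def find_counterexample(domain):
    """Look for extensions of S, M, P with true premises and a false conclusion."""
    for S, M, P in product(subsets(domain), repeat=3):
        premise1 = any(x in S and x in M for x in domain)      # Some S are M
        premise2 = any(x in S and x in P for x in domain)      # Some S are P
        conclusion = any(x in S and x in M and x in P for x in domain)
        if premise1 and premise2 and not conclusion:
            return S, M, P
    return None

print(find_counterexample({0}) is None)         # True: one object is not enough
print(find_counterexample({0, 1}) is not None)  # True: two objects suffice
```

With two objects, one can be an S that is M but not P and the other an S that is P but not M, so a deductive system that licensed the inference would prove something that can fail.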

A rich variety of deductive systems have been developed for registering deductions. Each system has its advantages and disadvantages, which are assessed in the context of the more specific tasks the deductive system is designed to accomplish. Historically, the general purpose of the construction of deductive systems was to reduce reasoning to precise mechanical rules (Hodges 1983, p. 26). Some view a deductive system defined for a language L as a mathematical model of actual or possible chains of correct reasoning in L. Sundholm (1983) offers a thorough survey of three main types of deductive systems. For a shorter, excellent introduction to the concept of a deductive system see Henkin (1967). A deductive system is developed in detail in the accompanying article, Logical Consequence, Deductive-Theoretic Conceptions.

If there is a proof of X from K in deductive system D, then we may say that X is a deductive consequence in D of K, which is sometimes expressed as K ⊢D X. Relative to a correct deductive system D, we characterize logical consequence in terms of deductive consequence as follows.

X is a logical consequence of K if and only if X is a deductive consequence in D of K, that is, there is an actual or possible proof in D of X from K.

This is called the deductive-theoretic (or proof-theoretic) characterization of logical consequence.

3. Model-Theoretic and Deductive-Theoretic Conceptions of Logic

We began with Tarski’s observations of the common or ordinary concept of logical consequence that we employ in daily life. According to Tarski, if X is a logical consequence of a set of sentences, K, then, in virtue of the logical forms of the sentences involved, if all of the members of K are true, then X must be true, and furthermore, we know this a priori. The formality criterion makes the logical constants the essential determinant of the logical consequence relation. The logical consequence relation is fixed exclusively in terms of the nature of the logical terminology. We have highlighted two different approaches to the nature of a logical constant: (1) in terms of its semantic contribution to sentences in which it occurs and (2) in terms of its inferential properties. The two approaches yield distinct conceptions of the notion of necessity inherent in the common concept of logical consequence, and lead to the following characterizations of logical consequence.

(1) X is a logical consequence of K if and only if there is no possible interpretation of the non-logical terminology of the language according to which all the sentences in K are true and X is false.

(2) X is a logical consequence of K if and only if X is deducible from K.

We make the notions of possible interpretation in (1) and deducibility in (2) precise by appealing to the technical notions of model and deductive system. This leads to the following theoretical characterizations of logical consequence.

(1) The model-theoretic characterization of logical consequence: X is a logical consequence of K iff all models of K are models of X.

(2) The deductive-theoretic characterization of logical consequence: X is a logical consequence of K iff there is a deduction in a correct deductive system of X from K.

Following Shapiro (1991, p. 3), we define a logic to be a language L plus either a model-theoretic or a deductive-theoretic account of logical consequence. A language with both characterizations is a full logic just in case both characterizations coincide. A soundness proof establishes K ⊢D X only if K ⊨ X, and a completeness proof establishes K ⊢D X if K ⊨ X. These proofs together establish that the two characterizations coincide, and in such a case the deductive system D is said to be complete and sound with respect to the model-theoretic consequence relation defined for the relevant language L.
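For a purely sentential fragment, the relation K ⊨ X can be checked exhaustively, which gives a concrete feel for "all models of K are models of X". The sketch below makes a simplifying assumption: truth assignments to sentence letters stand in for models, which is adequate only when no quantifiers or predicates are in play. It uses the Kelly example from the opening of the article, with H for "Kelly is at home" and W for "Kelly is at work"; the function name `entails` is illustrative.

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """True iff every truth assignment satisfying all premises satisfies the conclusion."""
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False  # found an assignment with true premises, false conclusion
    return True

not_both = lambda v: not (v["H"] and v["W"])   # not-(H and W)
at_home = lambda v: v["H"]                     # H
not_at_work = lambda v: not v["W"]             # not-W

print(entails([not_both, at_home], not_at_work, ["H", "W"]))  # True
print(entails([not_both], not_at_work, ["H", "W"]))           # False
```

A soundness proof for a deductive system D would guarantee that anything D proves passes a check of this kind; a completeness proof would guarantee the converse.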

We said that the primary aim of logic is to tell us what follows logically from what. These two characterizations of logical consequence lead to two different orientations or conceptions of logic (see Tharp 1975, p. 5).

Model-theoretic approach: Logic is a theory of possible interpretations: for a given language, it concerns the class of situations that can, logically, be described by that language.

Deductive-theoretic approach: Logic is a theory of formal deductive inference.

The article now concludes by highlighting three considerations relevant to evaluating a particular deployment of the model-theoretic or deductive-theoretic definition in defining logical consequence. These considerations emerge from the above development of the two theoretic definitions from the common concept of logical consequence.

4. Conclusion

The two theoretical characterizations of logical consequence do not provide the means for drawing a boundary in a language L between logical and non-logical terms. Indeed, their use presupposes that a list of logical terms is in hand. Hence, in evaluating a model-theoretic or deductive-theoretic definition of logical consequence for a language L the issue arises whether or not the boundary in L between logical and non-logical terms has been correctly drawn. This requires a response to a central question in the philosophy of logic: what qualifies as a logical constant? Tarski gives a well-reasoned response in his (1986). (For more recent discussion see McCarthy 1981 and 1998, Hanson 1997, and Warmbrod 1999.)

A second thing to consider in evaluating a theoretical account of logical consequence is whether or not its characterization of the logical terminology is accurate. For example, model-theoretic and deductive accounts of logical consequence are inadequate unless they reflect the semantic and inferential properties of the logical terms, respectively. So a model-theoretic account is inadequate unless it gets right the semantic contributions of the logical terms to the truth conditions of the sentences formed using them. For a particular deductive system D, the question arises whether or not D’s rules of inference reflect the inferential properties of the logical terms. (For further discussion of the semantic and inferential properties of logical terms see Haack 1978 and 1996, Read 1995, and Quine 1986.)

A third consideration in assessing the success of a theoretical definition of logical consequence is whether or not the definition, relative to a selection of terms as logical, reflects the salient features of the common concept of logical consequence. There are criticisms of the theoretical definitions that claim that they are incapable of reflecting the common concept of logical consequence. Typically, such criticisms are used to question the status of the model-theoretic and deductive-theoretic approaches to logic.

For example, there are critics who question the model-theoretic approach to logic by arguing that any model-theoretic account lacks the conceptual resources to reflect the notion of necessity inherent in the common concept of logical consequence. Such an account, they argue, does not rule out the possibility of there being logically possible situations in which the sentences in K are true and X is false even though every model of K is a model of X. Kneale (1961) is an early critic; Etchemendy (1988, 1999) offers a sustained and multi-faceted attack. Also, it is argued that the model-theoretic approach to logic makes knowledge of what follows from what depend on knowledge of the existence of models, which is knowledge of worldly matters of fact. But logical knowledge should not depend on knowledge about the extra-linguistic world (recall the locked room metaphor in 2.2.1). This standard logical positivist line has been recently challenged by those who see logic penetrated and permeated by metaphysics (e.g., Putnam 1971, Almog 1989, Sher 1991, Williamson 1999).

The status of the deductive-theoretic approach to logic is not clear for, as Tarski argues in his (1936), deductive-theoretic accounts are unable to reflect the fact that, according to the common concept, logical consequence is not compact. Relative to any deductive system D, the ⊢D-consequence relation is compact if and only if for any sentence X and set K of sentences, if K ⊢D X, then K’ ⊢D X, where K’ is a finite subset of sentences from K. But there are intuitively correct principles of inference according to which one may infer a sentence X from a set K of sentences, even though it is incorrect to infer X from any finite subset of K. This suggests that the intuitive notion of deducibility is not completely captured by any compact consequence relation. We need to weaken

X is a logical consequence of K if and only if there is a proof in a correct deductive system of X from K,

given above, to

X is a logical consequence of K if there is a proof in a correct deductive system of X from K.
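The compactness condition, and the kind of inference that is taken to violate it, can be set out schematically. The example below is a standard illustration in the spirit of Tarski's (1936) discussion rather than a quotation from it; the predicate P and the assumption that the quantifier ranges over exactly the natural numbers are supplied here for illustration.

```latex
% Compactness of the deducibility relation of a deductive system D
\[
  K \vdash_D X \;\Longrightarrow\;
  \exists K' \subseteq K \,\bigl( K' \text{ is finite and } K' \vdash_D X \bigr)
\]

% An infinite basis and a conclusion that, intuitively, follows from it
\[
  K = \{\, P(0),\; P(1),\; P(2),\; \dots \,\}, \qquad X = \forall n\, P(n)
\]
```

Intuitively X follows from K, yet no finite subset K′ of K suffices, since any finite subset leaves infinitely many numbers unconstrained.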

In sum, the issue of the nature of logical consequence, which intersects with other areas of philosophy, is still a matter of debate. Tarski’s analysis of the concept is not universally accepted; philosophers and logicians differ over what the features of the common concept are. For example, some offer accounts of the logical consequence relation according to which it is not a priori (e.g., see Koslow 1999, Sher 1991 and see Hanson 1997 for criticism of Sher) or deny that it even need be strongly necessary (Smiley 1995, 2000, section 6). The entry Logical Consequence, Model-Theoretic Conceptions gives a model-theoretic definition of logical consequence. For a detailed development of a deductive system see the entry Logical Consequence, Deductive-Theoretic Conceptions. The critical discussion in both articles deepens and extends points made in the conclusion of this article.

5. References and Further Reading

  • Almog, J. (1989): “Logic and the World”, pp. 43-65 in Themes From Kaplan, ed. J. Almog, J. Perry, and H. Wettstein. New York: Oxford UP.
  • Aristotle. (1941): Basic Works, ed. R. McKeon. New York: Random House.
  • Bencivenga, E. (1999): “What is Logic About?”, pp. 5-19 in Varzi (1999).
  • Etchemendy, J. (1983): “The Doctrine of Logic as Form”, Linguistics and Philosophy 6, pp. 319-334.
  • Etchemendy, J. (1988): “Tarski on truth and logical consequence”, Journal of Symbolic Logic 53, pp. 51-79.
  • Etchemendy, J. (1999): The Concept of Logical Consequence. Stanford: CSLI Publications.
  • Fodor, J. (2000): The Mind Doesn’t Work That Way. Cambridge: The MIT Press.
  • Gabbay, D. and F. Guenthner, eds. (1983): Handbook of Philosophical Logic, Vol 1. Dordrecht: D. Reidel Publishing Company.
  • Haack, S. (1978): Philosophy of Logics. Cambridge: Cambridge University Press.
  • Haack, S. (1996): Deviant Logic, Fuzzy Logic. Chicago: The University of Chicago Press.
  • Hodges, W. (1983): “Elementary Predicate Logic”, in Gabbay, D. and F. Guenthner (1983).
  • Hamlyn, D.W. (1967): “A Priori and A Posteriori”, pp. 105-109 in The Encyclopedia of Philosophy, Vol. 1, ed. P. Edwards. New York: Macmillan & The Free Press.
  • Hanson, W. (1997): “The Concept of Logical Consequence”, The Philosophical Review 106, pp. 365-409.
  • Henkin, L. (1967): “Formal Systems and Models of Formal Systems”, pp. 61-74 in The Encyclopedia of Philosophy, Vol. 8, ed. P. Edwards. New York: Macmillan & The Free Press.
  • Kneale, W. (1961): “Universality and Necessity”, British Journal for the Philosophy of Science 12, pp. 89-102.
  • Koslow, A. (1999): “The Implicational Nature of Logic: A Structuralist Account”, pp. 111-155 in Varzi (1999).
  • McCarthy, T. (1981): “The Idea of a Logical Constant”, Journal of Philosophy 78, pp. 499-523.
  • McCarthy, T. (1998): “Logical Constants”, pp. 599-603 in Routledge Encyclopedia of Philosophy, Vol. 5, ed. E. Craig. London: Routledge.
  • McGee, V. (1999): “Two Problems with Tarski’s Theory of Consequence”, Proceedings of the Aristotelian Society 92, pp. 273-292.
  • Moore, G.E. (1959): “Certainty”, pp. 227-251 in Philosophical Papers. London: George Allen & Unwin.
  • Priest, G. (1995): “Etchemendy and Logical Consequence”, Canadian Journal of Philosophy 25, pp. 283-292.
  • Putnam, H. (1971): Philosophy of Logic. New York: Harper & Row.
  • Quine, W.V. (1986): Philosophy of Logic, 2nd ed. Cambridge: Harvard UP.
  • Read, S. (1995): Thinking About Logic. Oxford: Oxford UP.
  • Shapiro, S. (1991): Foundations without Foundationalism: A Case For Second-Order Logic. Oxford: Clarendon Press.
  • Shapiro, S. (1993): “Modality and Ontology”, Mind 102, pp. 455-481.
  • Shapiro, S. (1998): “Logical Consequence: Models and Modality”, pp. 131-156 in The Philosophy of Mathematics Today, ed. Matthias Schirn. Oxford, Clarendon Press.
  • Shapiro, S. (2000): Thinking About Mathematics. Oxford: Oxford University Press.
  • Sher, G. (1989): “A Conception of Tarskian Logic”, Pacific Philosophical Quarterly 70, pp. 341-368.
  • Sher, G. (1991): The Bounds of Logic: A Generalized Viewpoint, Cambridge, MA: The MIT Press.
  • Sher, G. (1996): “Did Tarski commit ‘Tarski’s fallacy’?” Journal of Symbolic Logic 61, pp. 653-686.
  • Sher, G. (1999): “Is Logic a Theory of the Obvious?”, pp. 207-238 in Varzi (1999).
  • Smiley, T. (1995): “A Tale of Two Tortoises”, Mind 104, pp. 725-36.
  • Smiley, T. (1998): “Consequence, Conceptions of”, pp. 599-603 in Routledge Encyclopedia of Philosophy, vol. 2, ed. E. Craig. London: Routledge.
  • Sundholm, G. (1983): “Systems of Deduction”, in Gabbay and Guenthner (1983).
  • Tarski, A. (1933): “Pojecie prawdy w jezykach nauk dedukcyjnych”, translated as “On the Concept of Truth in Formalized Languages”, pp. 152-278 in Tarski (1983).
  • Tarski, A. (1936): “On the Concept of Logical Consequence”, pp. 409-420 in Tarski (1983).
  • Tarski, A. (1983): Logic, Semantics, Metamathematics, 2nd ed. Indianapolis: Hackett Publishing.
  • Tarski, A. (1986): “What are logical notions?” History and Philosophy of Logic 7, pp. 143-154.
  • Tharp, L. (1975): “Which Logic is the Right Logic?” Synthese 31, pp. 1-21.
  • Warmbrod, K. (1999): “Logical Constants”, Mind 108, pp. 503-538.
  • Williamson, T. (1999): “Existence and Contingency”, Proceedings of the Aristotelian Society Supplementary Vol. 73, pp. 181-203.
  • Varzi, A., ed. (1999): European Review of Philosophy, Vol. 4: The Nature of Logic, Stanford: CSLI Publications.

Author Information

Matthew McKeon
Email: mckeonm@msu.edu
Michigan State University
U. S. A.

Deductive-Theoretic Conceptions of Logical Consequence

According to the deductive-theoretic conception of logical consequence, a sentence X is a logical consequence of a set K of sentences if and only if X is a deductive consequence of K, that is, X is deducible or provable from K. Deductive consequence is clarified in terms of the notion of proof in a correct deductive system. Since, arguably, logical consequence conceived deductive-theoretically is not a compact relation and deducibility in a deductive system is, there are languages for which deductive consequence cannot be defined in terms of deducibility in a correct deductive system. However, it is true that if a sentence is deducible in a correct deductive system from other sentences, then the sentence is a deductive consequence of them. A deductive system is correct only if its rules of inference correspond to intuitively valid principles of inference. So whether or not a natural deductive system is correct brings into play rival theories of valid principles of inference such as classical, relevance, intuitionistic, and free logics.

Table of Contents

  1. Introduction
  2. Linguistic Preliminaries: the Language M
    1. Syntax of M
    2. Semantics for M
  3. What is a Logic?
  4. Deductive System N
  5. The Status of the Deductive Characterization of Logical Consequence in Terms of N
    1. Tarski’s argument that the model-theoretic characterization of logical consequence is more basic than its characterization in terms of a deductive system
    2. Is deductive system N correct?
      1. Relevance logic
      2. Intuitionistic logic
      3. Free logic
  6. Conclusion
  7. References and Further Reading

1. Introduction

According to the deductive-theoretic conception of logical consequence, a sentence X is a logical consequence of a set K of sentences if and only if X is a deductive consequence of K, that is, X is deducible from K. X is deducible from K just in case there is an actual or possible deduction of X from K. In such a case, we say that X may be correctly inferred from K or that it would be correct to conclude X from K. A deduction is associated with a pair <K, X>; the set K of sentences is the basis of the deduction, and X is the conclusion. A deduction from K to X is a finite sequence S of sentences ending with X such that each sentence in S (that is, each intermediate conclusion) is derived from a sentence (or more) in K or from previous sentences in S in accordance with a correct principle of inference. The notion of a deduction is clarified by appealing to a deductive system. A deductive system D is a collection of rules that govern which sequences of sentences, associated with a given pair <K, X>, are allowed and which are not. Such a sequence is called a proof in D (or, equivalently, a deduction in D) of X from K. The rules must be such that whether or not a given sequence associated with <K, X> qualifies as a proof in D of X from K is decidable purely by inspection and calculation. That is, the rules provide a purely mechanical procedure for deciding whether a given object is a proof in D of X from K. We write

K ⊢D X

to mean

X is deducible in deductive system D from K.

See the entry Logical Consequence, Philosophical Considerations for discussion of the interplay between the concepts of logical consequence and deductive consequence, and deductive systems. We say that a deductive system D is correct when for any K and X, proofs in D of X from K correspond to intuitively valid deductions. For a given language the deductive consequence relation is defined in terms of a correct deductive system D only if it is true that

X is a deductive consequence of K if and only if X is deducible in D from K.

Sundholm (1983) offers a thorough survey of three main types of deductive systems. In this article, a natural deductive system is presented that originates in the work of the mathematician Gerhard Gentzen (1934) and the logician Frederic Fitch (1952). We will refer to the deductive system as N (for ‘natural deduction’). For an in-depth introductory presentation of a natural deductive system very similar to N see Barwise and Etchemendy (2001). N is a collection of inference rules. A proof of X from K that appeals exclusively to the inference rules of N is a formal deduction or formal proof. We shall take a formal proof to be associated with a pair <K, X> where K is a set of sentences from a first-order language M, which will be introduced below, and X is an M-sentence. The set K of sentences is the basis of the deduction, and X is the conclusion. We say that a formal deduction from K to X is a finite sequence S of sentences ending with X such that each sentence in S is either an assumption, deduced from a sentence (or more) in K, or deduced from previous sentences in S in accordance with one of N’s inference rules.

Formal proofs are not only epistemologically significant for securing knowledge, but also the derivations making up formal proofs may serve as models of the informal deductive reasoning performed using sentences from language M. Indeed, a primary value of a formal proof is that it can serve as a model of ordinary deductive reasoning that explains the force of such reasoning by representing the principles of inference required to get to X from K.

Gentzen, one of the first logicians to present a natural deductive system, makes clear that a primary motive for the construction of his system is to reflect as accurately as possible the actual logical reasoning involved in mathematical proofs. He writes,

My starting point was this: The formalization of logical deduction especially as it has been developed by Frege, Russell, and Hilbert, is rather far removed from the forms of deduction used in practice in mathematical proofs…In contrast, I intended first to set up a formal system which comes as close as possible to actual reasoning. The result was a ‘calculus of natural deduction’. (Gentzen 1934, p. 68)

Natural deductive systems are distinguished from other deductive systems by their usefulness in modeling ordinary, informal deductive inferential practices. Paraphrasing Gentzen, we may say that if one is interested in seeing logical connections between sentences in the most natural way possible, then a natural deductive system is a good choice for defining the deductive consequence relation.

The remainder of the article proceeds as follows. First, an interpreted language M is given. Next, we present the deductive system N and represent the deductive consequence relation in M. After discussing the philosophical significance of the deductive consequence relation defined in terms of N, we consider some standard criticisms of the correctness of deductive system N.

2. Linguistic Preliminaries: the Language M

Here we define a simple language M, a language about the McKeon family, by first sketching what strings qualify as well-formed formulas (wffs) in M. Next we define sentences from formulas, and then give an account of truth in M; that is, we describe the conditions in which M-sentences are true.

a. Syntax of M

Building blocks of formulas

Terms

Individual names: ‘beth’, ‘kelly’, ‘matt’, ‘paige’, ‘shannon’, ‘evan’, and ‘w1’, ‘w2’, ‘w3’, etc.

Variables: ‘x’, ‘y’, ‘z’, ‘x1’, ‘y1’, ‘z1’, ‘x2’, ‘y2’, ‘z2’, etc.

Predicates

1-place predicates: ‘Female’, ‘Male’

2-place predicates: ‘Parent’, ‘Brother’, ‘Sister’, ‘Married’, ‘OlderThan’, ‘Admires’, ‘=’.

Blueprints of well-formed formulas (wffs)

Atomic formulas: An atomic wff is any of the above n-place predicates followed by n terms which are enclosed in parentheses and separated by commas.

Formulas: The general notion of a well-formed formula (wff) is defined recursively as follows:

(1) All atomic wffs are wffs.
(2) If α is a wff, so is '~α'.
(3) If α and β are wffs, so is '(α & β)'.
(4) If α and β are wffs, so is '(α v β)'.
(5) If α and β are wffs, so is '(α → β)'.
(6) If Ψ is a wff and v is a variable, then '∃vΨ' is a wff.
(7) If Ψ is a wff and v is a variable, then '∀vΨ' is a wff.
Finally, no string of symbols is a well-formed formula of M unless the string can be derived from (1)-(7).

The signs ‘~’, ‘&’, ‘v’, and ‘→’ are called sentential connectives. The signs ‘∀’ and ‘∃’ are called quantifiers.

It will prove convenient to have available in M an infinite number of individual names as well as variables. The strings ‘Parent(beth, paige)’ and ‘Male(x)’ are examples of atomic wffs. We allow the identity symbol in an atomic formula to occur in between two terms, e.g., instead of ‘=(evan, evan)’ we allow ‘(evan = evan)’. The symbols ‘~’, ‘&’, ‘v’, and ‘→’ correspond to the English words ‘not’, ‘and’, ‘or’ and ‘if…then’, respectively. ‘∃’ is our symbol for an existential quantifier and ‘∀’ represents the universal quantifier. '∃vΨ' and '∀vΨ' correspond to ‘for some v, Ψ’ and ‘for all v, Ψ’, respectively. For every quantifier, its scope is the smallest part of the wff in which it is contained that is itself a wff. An occurrence of a variable v is a bound occurrence iff it is in the scope of some quantifier of the form '∃v' or the form '∀v', and is free otherwise. For example, the occurrence of ‘x’ is free in ‘Male(x)’ and in ‘∃y Married(y, x)’. The occurrences of ‘y’ in the second formula are bound because they are in the scope of the existential quantifier. A wff with at least one free variable is an open wff, and a closed formula is one with no free variables. A sentence is a closed wff. For example, ‘Female(kelly)’ and ‘∃y∃x Married(y, x)’ are sentences but ‘OlderThan(kelly, y)’ and ‘(∃x Male(x) & Female(z))’ are not. So, not all of the wffs of M are sentences. As noted below, this will affect our definition of truth for M.
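The recursive definition of wffs and the distinction between free and bound occurrences lend themselves to a small sketch. The representation below is an assumption made for illustration: each formation rule becomes a Python class, and a finite set `VARIABLES` stands in for M's unlimited stock of variables. The function `free_vars` follows the clauses of the definition, and a wff counts as a sentence when it has no free variables.

```python
from dataclasses import dataclass
from typing import Tuple, Union

VARIABLES = {"x", "y", "z"}  # finite stand-in for M's variables (assumption)

@dataclass(frozen=True)
class Atom:            # rule (1): predicate followed by terms
    pred: str
    terms: Tuple[str, ...]

@dataclass(frozen=True)
class Not:             # rule (2)
    sub: "Wff"

@dataclass(frozen=True)
class Bin:             # rules (3)-(5); op is "&", "v", or "->"
    op: str
    left: "Wff"
    right: "Wff"

@dataclass(frozen=True)
class Quant:           # rules (6)-(7); q is "exists" or "forall"
    q: str
    var: str
    body: "Wff"

Wff = Union[Atom, Not, Bin, Quant]

def free_vars(w: Wff) -> set:
    """Variables with at least one free occurrence in w."""
    if isinstance(w, Atom):
        return {t for t in w.terms if t in VARIABLES}
    if isinstance(w, Not):
        return free_vars(w.sub)
    if isinstance(w, Bin):
        return free_vars(w.left) | free_vars(w.right)
    return free_vars(w.body) - {w.var}  # a quantifier binds its own variable

def is_sentence(w: Wff) -> bool:
    return not free_vars(w)

married_yx = Atom("Married", ("y", "x"))
print(free_vars(Quant("exists", "y", married_yx)))                        # {'x'}
print(is_sentence(Quant("exists", "y", Quant("exists", "x", married_yx))))  # True
print(is_sentence(Bin("&", Quant("exists", "x", Atom("Male", ("x",))),
                      Atom("Female", ("z",)))))                           # False
```

The three printed results match the examples in the text: ‘x’ is free in ‘∃y Married(y, x)’, ‘∃y∃x Married(y, x)’ is a sentence, and ‘(∃x Male(x) & Female(z))’ is not.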

b. Semantics for M

We now provide a semantics for M. This is done in two steps. First, we specify a domain of discourse, that is, the chunk of the world that our language M is about, and interpret M’s predicates and names in terms of the elements composing the domain. Then we state the conditions under which each type of M-sentence is true. To each of the above syntactic rules (1-7) there corresponds a semantic rule that stipulates the conditions in which the sentence constructed using the syntactic rule is true. The principle of bivalence is assumed and so ‘not true’ and ‘false’ are used interchangeably. In effect, the interpretation of M determines a truth-value (true, false) for each and every sentence of M.

Domain D—The McKeons: Matt, Beth, Shannon, Kelly, Paige, and Evan.

Here are the referents and extensions of the names and predicates of M.

Terms: ‘matt’ refers to Matt, ‘beth’ refers to Beth, ‘shannon’ refers to Shannon, etc.

Predicates. The meaning of a predicate is identified with its extension, that is the set (possibly empty) of elements from the domain D the predicate is true of. The extension of a one-place predicate is a set of elements from D, the extension of a two-place predicate is a set of ordered pairs of elements from D.

The extension of ‘Male’ is {Matt, Evan}.

The extension of ‘Female’ is {Beth, Shannon, Kelly, Paige}.

The extension of ‘Parent’ is {<Matt, Shannon>, <Matt, Kelly>, <Matt, Paige>, <Matt, Evan>, <Beth, Shannon>, <Beth, Kelly>, <Beth, Paige>, <Beth, Evan>}.

The extension of ‘Married’ is {<Matt, Beth>, <Beth, Matt>}.

The extension of ‘Sister’ is {<Shannon, Kelly>, <Kelly, Shannon>, <Shannon, Paige>, <Paige, Shannon>, <Kelly, Paige>, <Paige, Kelly>, <Kelly, Evan>, <Paige, Evan>, <Shannon, Evan>}.

The extension of ‘Brother’ is {<Evan, Shannon>, <Evan, Kelly>, <Evan, Paige>}.

The extension of ‘OlderThan’ is {<Beth, Matt>, <Beth, Shannon>, <Beth, Kelly>, <Beth, Paige>, <Beth, Evan>, <Matt, Shannon>, <Matt, Kelly>, <Matt, Paige>, <Matt, Evan>, <Shannon, Kelly>, <Shannon, Paige>, <Shannon, Evan>, <Kelly, Paige>, <Kelly, Evan>, <Paige, Evan>}.

The extension of ‘Admires’ is {<Matt, Beth>, <Shannon, Matt>, <Shannon, Beth>, <Kelly, Beth>, <Kelly, Matt>, <Kelly, Shannon>, <Paige, Beth>, <Paige, Matt>, <Paige, Shannon>, <Paige, Kelly>, <Evan, Beth>, <Evan, Matt>, <Evan, Shannon>, <Evan, Kelly>, <Evan, Paige>}.

The extension of ‘=’ is {<Matt, Matt>, <Beth, Beth>, <Shannon, Shannon>, <Kelly, Kelly>, <Paige, Paige>, <Evan, Evan>}.

The atomic sentence ‘Female(kelly)’ is true because, as indicated above, the referent of ‘kelly’ is in the extension of the property designated by ‘Female’. The atomic sentence ‘Married(shannon, kelly)’ is false because the ordered pair <Shannon, Kelly> is not in the extension of the relation designated by ‘Married’.

(I) An atomic sentence with a one-place predicate is true iff the referent of the term is a member of the extension of the predicate, and an atomic sentence with a two-place predicate is true iff the ordered pair formed from the referents of the terms in order is a member of the extension of the predicate.
(II) '~α' is true iff α is false.
(III) '(α & β)' is true when both α and β are true; otherwise '(α & β)' is false.
(IV) '(α v β)' is true when at least one of α and β is true; otherwise '(α v β)' is false.
(V) '(α → β)' is true if and only if (iff) α is false or β is true. So, '(α → β)' is false just in case α is true and β is false.

The meanings for ‘~’ and ‘&’ roughly correspond to the meanings of ‘not’ and ‘and’ as ordinarily used. We call '~α' and '(α & β)' negation and conjunction formulas, respectively. The formula '(α v β)' is called a disjunction and the meaning of ‘v’ corresponds to inclusive or. There are a variety of conditionals in English (e.g., causal, counterfactual, logical), with each type having a distinct meaning. The conditional defined by (V) above is called the material conditional. One way of following (V) is to see that the truth conditions for '(α → β)' are the same as for '~(α & ~β)'.

By (II) ‘~Married(shannon, kelly)’ is true because, as noted above, ‘Married(shannon, kelly)’ is false. (II) also tells us that ‘~Female(kelly)’ is false since ‘Female(kelly)’ is true. According to (III), ‘(~Married(shannon, kelly) & Female(kelly))’ is true because ‘~Married(shannon, kelly)’ is true and ‘Female(kelly)’ is true. And ‘(Male(shannon) & Female(shannon))’ is false because ‘Male(shannon)’ is false. (IV) confirms that ‘(Female(kelly) v Married(evan, evan))’ is true because, even though ‘Married(evan, evan)’ is false, ‘Female(kelly)’ is true. From (V) we know that the sentence ‘(~(beth = beth) → Male(shannon))’ is true because ‘~(beth = beth)’ is false. If α is false then '(α → β)' is true regardless of whether or not β is true. The sentence ‘(Female(beth) → Male(shannon))’ is false because ‘Female(beth)’ is true and ‘Male(shannon)’ is false.
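Clauses (I)-(V) amount to a recursive evaluation procedure, which the following sketch spells out for quantifier-free sentences. The encoding is an assumption made for illustration: sentences are nested tuples, only three of M's predicates are included, and each lowercase name stands in for its referent (so identity is checked by comparing names, which works here because the six family names refer to six distinct people).

```python
MALE = {"matt", "evan"}
FEMALE = {"beth", "shannon", "kelly", "paige"}
MARRIED = {("matt", "beth"), ("beth", "matt")}

def true_in_M(s):
    """Evaluate a quantifier-free sentence of M given as a nested tuple."""
    op = s[0]
    if op == "Male":    return s[1] in MALE                    # clause (I)
    if op == "Female":  return s[1] in FEMALE                  # clause (I)
    if op == "Married": return (s[1], s[2]) in MARRIED         # clause (I)
    if op == "=":       return s[1] == s[2]                    # clause (I)
    if op == "~":       return not true_in_M(s[1])             # clause (II)
    if op == "&":       return true_in_M(s[1]) and true_in_M(s[2])        # (III)
    if op == "v":       return true_in_M(s[1]) or true_in_M(s[2])         # (IV)
    if op == "->":      return (not true_in_M(s[1])) or true_in_M(s[2])   # (V)
    raise ValueError(f"unknown operator: {op!r}")

not_married = ("~", ("Married", "shannon", "kelly"))
print(true_in_M(("&", not_married, ("Female", "kelly"))))                    # True
print(true_in_M(("&", ("Male", "shannon"), ("Female", "shannon"))))          # False
print(true_in_M(("v", ("Female", "kelly"), ("Married", "evan", "evan"))))    # True
print(true_in_M(("->", ("~", ("=", "beth", "beth")), ("Male", "shannon"))))  # True
print(true_in_M(("->", ("Female", "beth"), ("Male", "shannon"))))            # False
```

The five results agree with the verdicts reached in the paragraph above by applying (II)-(V) by hand.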

Before describing the truth conditions for quantified sentences we need to say something about the notion of satisfaction. We’ve defined truth only for the formulas of M that are sentences. So, the notions of truth and falsity are not applicable to non-sentences such as ‘Male(x)’ and ‘((x = x) → Female(x))’ in which ‘x’ occurs free. However, objects may satisfy wffs that are non-sentences. We introduce the notion of satisfaction with some examples. An object satisfies ‘Male(x)’ just in case that object is male. Matt satisfies ‘Male(x)’, Beth does not. This is the case because replacing ‘x’ in ‘Male(x)’ with ‘matt’ yields a truth while replacing the variable with ‘beth’ yields a falsehood. An object satisfies ‘((x = x) → Female(x))’ if and only if it is either not identical with itself or is a female. Beth satisfies this wff (we get a truth when ‘beth’ is substituted for the variable in all of its occurrences), Matt does not (putting ‘matt’ in for ‘x’ wherever it occurs results in a falsehood). As a first approximation, we say that an object with a name, say ‘a’, satisfies a wff 'Ψv' in which at most v occurs free if and only if the sentence that results by replacing v in all of its occurrences with ‘a’ is true. ‘Male(x)’ is neither true nor false because it is not a sentence, but it is either satisfied or not by a given object. Now we define the truth conditions for quantifications, utilizing the notion of satisfaction. For a more detailed discussion of the notion of satisfaction, see the article, “Logical Consequence, Model-Theoretic Conceptions.”

Let Ψ be any formula of M in which at most v occurs free.

(VI) '∃vΨ' is true just in case there is at least one individual in the domain of quantification (e.g. at least one McKeon) that satisfies Ψ.
(VII) '∀vΨ' is true just in case every individual in the domain of quantification (e.g. every McKeon) satisfies Ψ.

Here are some examples. ‘∃x(Male(x) & Married(x, beth))’ is true because Matt satisfies ‘(Male(x) & Married(x, beth))’; replacing ‘x’ wherever it appears in the wff with ‘matt’ results in a true sentence. The sentence ‘∃xOlderThan(x, x)’ is false because no McKeon satisfies ‘OlderThan(x, x)’, that is replacing ‘x’ in ‘OlderThan(x, x)’ with the name of a McKeon always yields a falsehood.

The universal quantification ‘∀x(OlderThan(x, paige) → Male(x))’ is false for there is a McKeon who doesn’t satisfy ‘(OlderThan(x, paige) → Male(x))’. For example, Shannon does not satisfy ‘(OlderThan(x, paige) → Male(x))’ because Shannon satisfies ‘OlderThan(x, paige)’ but not ‘Male(x)’. The sentence ‘∀x(x = x)’ is true because all McKeons satisfy ‘x = x’; replacing ‘x’ with the name of any McKeon results in a true sentence.
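Clauses (VI) and (VII) can be sketched by running through the domain and testing each individual for satisfaction. In the illustration below, which is a simplification assumed here rather than the article's official definition, an open wff with one free variable is represented directly by a Python function from individuals to truth values, and individuals are represented by lowercase strings. The extensions are copied from the interpretation given above.

```python
DOMAIN = ["matt", "beth", "shannon", "kelly", "paige", "evan"]
MALE = {"matt", "evan"}
MARRIED = {("matt", "beth"), ("beth", "matt")}
OLDER_THAN = {
    ("beth", "matt"), ("beth", "shannon"), ("beth", "kelly"), ("beth", "paige"),
    ("beth", "evan"), ("matt", "shannon"), ("matt", "kelly"), ("matt", "paige"),
    ("matt", "evan"), ("shannon", "kelly"), ("shannon", "paige"),
    ("shannon", "evan"), ("kelly", "paige"), ("kelly", "evan"), ("paige", "evan"),
}

def exists(satisfies):
    """Clause (VI): at least one individual in the domain satisfies the wff."""
    return any(satisfies(d) for d in DOMAIN)

def forall(satisfies):
    """Clause (VII): every individual in the domain satisfies the wff."""
    return all(satisfies(d) for d in DOMAIN)

# Ex(Male(x) & Married(x, beth))
print(exists(lambda x: x in MALE and (x, "beth") in MARRIED))            # True
# Ex OlderThan(x, x)
print(exists(lambda x: (x, x) in OLDER_THAN))                            # False
# Ax(OlderThan(x, paige) -> Male(x))
print(forall(lambda x: (x, "paige") not in OLDER_THAN or x in MALE))     # False
# Ax(x = x)
print(forall(lambda x: x == x))                                          # True
```

Because satisfaction is tested on the individuals themselves rather than by substituting names, this style of evaluation does not require every object in the domain to have a name.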

Note that in the explanation of satisfaction we suppose that an object satisfies a wff only if the object is named. But we don’t want to presuppose that all objects in the domain of discourse are named. For the purposes of an example, suppose that the McKeons adopt a baby boy, but haven’t named him yet. Then, ‘∃x Brother(x, evan)’ is true because the adopted child satisfies ‘Brother(x, evan)’, even though we can’t replace ‘x’ with the child’s name to get a truth. Getting around this is easy enough. We have added a list of names, ‘w1’, ‘w2’, ‘w3’, etc. to M, and we may say that any unnamed object satisfies 'Ψv' iff the replacement of v with a previously unused wi assigned as a name of this object results in a true sentence. In the above scenario, ‘∃x Brother(x, evan)’ is true because, ultimately, treating ‘w1’ as a temporary name of the child, ‘Brother(w1, evan)’ is true. Of course, the meanings of the predicates would have to be amended in order to reflect the addition of a new person to the domain of McKeons.
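The truth clauses (VI) and (VII) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the extensions listed for the predicates are hypothetical except where the text fixes them (Matt is male and married to Beth, Shannon is older than Paige and is not male, no McKeon is older than himself or herself). A wff with one free variable is modeled as a function from objects to truth values, so an object can satisfy a wff whether or not it has a name.

```python
# Domain of quantification: the McKeons.
DOMAIN = ["matt", "beth", "shannon", "kelly", "paige", "evan"]

# Extensions of some predicates of M. Entries not fixed by the text
# are assumptions made purely for illustration.
MALE = {"matt", "evan"}
MARRIED = {("matt", "beth"), ("beth", "matt")}
OLDER_THAN = {("matt", "paige"), ("beth", "paige"),
              ("shannon", "paige"), ("kelly", "paige")}

def exists(satisfies):
    """Clause (VI): true iff at least one individual satisfies the wff."""
    return any(satisfies(d) for d in DOMAIN)

def forall(satisfies):
    """Clause (VII): true iff every individual satisfies the wff."""
    return all(satisfies(d) for d in DOMAIN)

def implies(p, q):
    """Truth condition for the conditional."""
    return (not p) or q

# ∃x(Male(x) & Married(x, beth))
ex1 = exists(lambda x: x in MALE and (x, "beth") in MARRIED)
# ∃x OlderThan(x, x)
ex2 = exists(lambda x: (x, x) in OLDER_THAN)
# ∀x(OlderThan(x, paige) → Male(x))
ex3 = forall(lambda x: implies((x, "paige") in OLDER_THAN, x in MALE))
# ∀x(x = x)
ex4 = forall(lambda x: x == x)

print(ex1, ex2, ex3, ex4)
```

Under these assumed extensions the four values agree with the verdicts given in the examples: true, false, false, true.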

3. What is a Logic?

We have characterized an interpreted formal language M by defining what qualifies as a sentence of M and by specifying the conditions under which any M-sentence is true. The received view of logical consequence entails that the logical consequence relation in M turns on the nature of the logical constants in the relevant M-sentences. We shall regard just the sentential connectives, the quantifiers of M, and the identity predicate as logical constants (the language M is a first-order language). For discussion of the notion of a logical constant see Logical Consequence, Philosophical Considerations and Logical Consequence, Model-Theoretic Conceptions. Intuitively, one M-sentence is a logical consequence of a set of M-sentences if and only if it is impossible for all the sentences in the set to be true without the former sentence being true as well. A model-theoretic conception of logical consequence in M clarifies this intuitive characterization of logical consequence by appealing to the semantic properties of the logical constants, represented in the above truth clauses (I)-(VII). The entry Logical Consequence, Model-Theoretic Conceptions formalizes the account of truth in language M and gives a model-theoretic characterization of logical consequence in M. In contrast to the model-theoretic conception, the deductive-theoretic conception clarifies logical consequence, conceived of in terms of deducibility, by appealing to the inferential properties of logical constants portrayed as intuitively valid principles of inference, that is, principles justifying steps in deductions. See Logical Consequence, Philosophical Considerations for discussion of the relationship between the logical consequence relation and the model-theoretic and deductive-theoretic conceptions of it.

Deductive system N’s inference rules, introduced below, are introduction and elimination rules, defined for each logical constant of our language M. An introduction rule introduces a logical constant into a proof and is useful for deriving a sentence that contains the constant. An elimination rule for the constant makes it possible to derive a sentence that has at least one less occurrence of the logical constant. Elimination rules are useful for deriving a sentence from another in which the constant appears.

Following Shapiro (1991, p. 3), we define a logic to be a language L plus either a model-theoretic or a deductive-theoretic account of logical consequence. A language with both characterizations is a full logic just in case both characterizations coincide. For discussion on the relationship between the model-theoretic and deductive-theoretic accounts of logical consequence, see Logical Consequence, Philosophical Considerations. The logic for M developed below may be viewed as a classical logic or a first-order theory.

4. Deductive System N

In stating N’s rules, we begin with the simpler inference rules and give a sample formal deduction of them in action. Then we turn to the inference rules that employ what we shall call sub-proofs. In the statement of the rules, we let P and Q be any sentences from our language M. We shall number each line of a formal deduction with a positive integer. We let k, l, m, n, o, p and q be any positive integers such that k < m, and l < m, and m < n < o < p < q.

&-Intro

k. P
l. Q
m. (P & Q) &-Intro: k, l

&-Elim

k. (P & Q)
m. P &-Elim: k

k. (P & Q)
m. Q &-Elim: k

&-Intro allows us to derive a conjunction from both of its two parts (called conjuncts). According to the &-Elim rule we may derive a conjunct from a conjunction. To the right of the sentence derived using an inference rule is the justification. Steps in a proof are justified by identifying both the lines in the proof used and by citing the appropriate rule. The vertical lines serve as proof margins, which, as you will shortly see, help in portraying the structure of a proof when it contains embedded sub-proofs.

~-Elim

k. ~~P
m. P ~-Elim: k

The ~-Elim rule allows us to drop double negations and infer what was subject to the two negations.

v-Intro

k. P
m. (P v Q) v-Intro: k

k. P
m. (Q v P) v-Intro: k

By v-Intro we may derive a disjunction from one of its parts (called disjuncts).

→-Elim

k. (P → Q)
l. P
m. Q →-Elim: k, l

The →-Elim rule corresponds to the principle of inference called modus ponens: from a conditional and its antecedent one may infer the consequent.

Here’s a sample deduction using the above inference rules. The formal deduction (the sequence of sentences 4-11) is associated with the pair

<{(Female(paige) & Female (kelly)), (Female(paige) → ~~Sister(paige, kelly)), (Female(kelly) → ~~Sister(paige, shannon))}, ((Sister(paige, kelly) & Sister(paige, shannon)) v Male(evan))>.

The first element is the set of basis sentences and the second element is the conclusion. We number the basis sentences and list them (beginning with 1) ahead of the deduction. The deduction ends with the conclusion.

1. (Female(paige) & Female (kelly)) Basis
2. (Female(paige) → ~~Sister(paige, kelly)) Basis
3. (Female(kelly) → ~~Sister(paige, shannon)) Basis
4. Female(paige) &-Elim: 1
5. Female(kelly) &-Elim: 1
6. ~~Sister(paige, kelly) →-Elim: 2, 4
7. Sister(paige, kelly) ~-Elim: 6
8. ~~Sister(paige, shannon) →-Elim: 3, 5
9. Sister(paige, shannon) ~-Elim: 8
10. (Sister(paige, kelly) & Sister(paige, shannon)) &-Intro: 7, 9
11. ((Sister(paige, kelly) & Sister(paige, shannon)) v Male(evan)) v-Intro: 10

Again, the column all the way to the right gives the explanations for each line of the proof. Assuming the adequacy of N, the formal deduction establishes that the following inference is correct.

(Female(paige) & Female (kelly))
(Female(paige) → ~~Sister(paige, kelly))
(Female(kelly) → ~~Sister(paige, shannon))


(therefore) ((Sister(paige, kelly) & Sister(paige, shannon)) v Male(evan))
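To make the idea of checking a deduction line by line concrete, here is a minimal sketch of a checker for the five rules introduced so far. The tuple encoding of sentences, the rule labels, and the function names are assumptions made for this illustration; they are not part of N's official presentation, and sub-proofs are not handled.

```python
# Sentences: atoms are strings; compound sentences are tuples
# ("and", A, B), ("or", A, B), ("imp", A, B), ("not", A).

def follows(formula, rule, cited):
    """True iff `formula` is licensed by `rule` from the cited sentences."""
    if rule == "&-Intro" and len(cited) == 2:
        return formula == ("and", cited[0], cited[1])
    if rule == "&-Elim" and len(cited) == 1:
        c = cited[0]
        return isinstance(c, tuple) and c[0] == "and" and formula in c[1:]
    if rule == "~-Elim" and len(cited) == 1:
        return cited[0] == ("not", ("not", formula))
    if rule == "v-Intro" and len(cited) == 1:
        return (isinstance(formula, tuple) and formula[0] == "or"
                and cited[0] in formula[1:])
    if rule == "→-Elim" and len(cited) == 2:
        # cited[0] is the conditional, cited[1] its antecedent
        return cited[0] == ("imp", cited[1], formula)
    return False

def check_deduction(lines):
    """lines: list of (sentence, rule, cited line numbers), numbered from 1."""
    for number, (formula, rule, refs) in enumerate(lines, start=1):
        if rule == "Basis":
            continue
        if any(r >= number for r in refs):
            return False  # only earlier lines may be cited
        if not follows(formula, rule, [lines[r - 1][0] for r in refs]):
            return False
    return True

Fp, Fk = "Female(paige)", "Female(kelly)"
Spk, Sps = "Sister(paige, kelly)", "Sister(paige, shannon)"
Me = "Male(evan)"

sample = [
    (("and", Fp, Fk), "Basis", []),
    (("imp", Fp, ("not", ("not", Spk))), "Basis", []),
    (("imp", Fk, ("not", ("not", Sps))), "Basis", []),
    (Fp, "&-Elim", [1]),
    (Fk, "&-Elim", [1]),
    (("not", ("not", Spk)), "→-Elim", [2, 4]),
    (Spk, "~-Elim", [6]),
    (("not", ("not", Sps)), "→-Elim", [3, 5]),
    (Sps, "~-Elim", [8]),
    (("and", Spk, Sps), "&-Intro", [7, 9]),
    (("or", ("and", Spk, Sps), Me), "v-Intro", [10]),
]

# Same deduction, but line 7 now cites the wrong line.
broken = list(sample)
broken[6] = (Spk, "~-Elim", [2])

print(check_deduction(sample), check_deduction(broken))
```

The checker accepts the eleven-line deduction above and rejects the variant in which line 7 cites a line that is not a double negation of it.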

For convenience in building proofs, we expand M to include ‘⊥’, which we use as a symbol for a contradiction (e.g., ‘(Female(beth) & ~Female(beth))’).

⊥-Intro

k. P
l. ~P
m. ⊥ ⊥-Intro: k, l

⊥-Elim

k. ⊥
m. P ⊥-Elim: k

If we have derived a sentence and its negation we may derive ⊥ using ⊥-Intro. The ⊥-Elim rule represents the idea that any sentence P is deducible from a contradiction. So, from ⊥ we may derive any sentence P using ⊥-Elim.

Here’s a deduction using the two rules.

1. (Parent(beth, evan) & ~Parent(beth, evan)) Basis
2. Parent(beth, evan) &-Elim: 1
3. ~Parent(beth, evan) &-Elim: 1
4. ⊥ ⊥-Intro: 2, 3
5. Parent(beth, shannon) ⊥-Elim: 4

For convenience, we introduce a reiteration rule that allows us to repeat steps in a proof as needed.

Reit

k. P
.
.
.
m. P Reit: k

We now turn to the rules for the sentential connectives that employ what we shall call sub-proofs. Consider the following inference.

1. ~(Married(shannon, kelly) & OlderThan(shannon, kelly))
2. Married(shannon, kelly)


(therefore) ~OlderThan(shannon, kelly)

Here is an informal deduction of the conclusion from the basis sentences.

Proof: Suppose that ‘OlderThan(shannon, kelly)’ is true. Then, from this assumption and basis sentence 2 it follows that ‘((Shannon is married to Kelly) & (Shannon is older than Kelly))’ is true. But this contradicts the first basis sentence ‘~((Shannon is married to Kelly) & (Shannon is older than Kelly))’, which is true by hypothesis. Hence our initial supposition is false. We have derived that ‘~(Shannon is older than Kelly)’ is true.

Such a proof is called a reductio ad absurdum proof (or reductio for short). Reductio ad absurdum is Latin for ‘reduction to the absurd’. (For more information, see the article “Reductio ad absurdum”.) In order to model this proof in N we introduce the ~-Intro rule.

~-Intro

k. P Assumption
.
.
.
m. ⊥
n. ~P ~-Intro: k-m

The ~-Intro rule allows us to infer the negation of an assumption if we have derived a contradiction, symbolized by ‘⊥’, from the assumption. The indented proof margin (k-m) signifies a sub-proof. In a sub-proof the first line is always an assumption (and so requires no justification), which is cancelled when the sub-proof is ended and we are back out on a line that sits on a wider proof margin. The effect of this is that we can no longer appeal to any of the lines in the sub-proof to generate later lines on wider proof margins. No deduction ends in the middle of a sub-proof.

Here is a formal analogue of the above informal reductio.

1. ~(Married(shannon, kelly) & OlderThan(shannon, kelly)) Basis
2. Married(shannon, kelly) Basis
3. OlderThan(shannon, kelly) Assumption
4. (Married(shannon, kelly) & OlderThan(shannon, kelly)) &-Intro: 2, 3
5. ⊥ ⊥-Intro: 1, 4
6. ~OlderThan(shannon, kelly) ~-Intro: 3-5
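The restriction on citing lines inside a closed sub-proof can be sketched as a small check. The encoding is an assumption made for illustration: each line of the formal reductio is tagged with its scope, meaning the tuple of line numbers at which the sub-proofs enclosing it begin.

```python
# Scope of each line in the six-line reductio (hypothetical encoding).
SCOPE = {
    1: (),      # Basis
    2: (),      # Basis
    3: (3,),    # Assumption opening sub-proof 3-5
    4: (3,),
    5: (3,),
    6: (),      # back on the widest proof margin
}

def may_cite(line, cited, scope=SCOPE):
    """A line may cite an earlier line only if the cited line sits on the
    same proof margin or a wider one, i.e. its scope is a prefix of ours."""
    if cited >= line:
        return False
    outer, inner = scope[cited], scope[line]
    return inner[:len(outer)] == outer

def may_cite_subproof(line, start, end, scope=SCOPE):
    """A closed sub-proof start-end may be cited as a whole by a later line
    lying on (or inside) the margin from which the sub-proof was opened."""
    parent = scope[start][:-1]
    return (end < line
            and scope[start][-1:] == (start,)
            and scope[end] == scope[start]
            and scope[line][:len(parent)] == parent)

print(may_cite(4, 2), may_cite(4, 3), may_cite(6, 4), may_cite_subproof(6, 3, 5))
```

Line 4 may use lines 2 and 3, and line 6 may cite the sub-proof 3-5 as a unit, but line 6 may not cite line 4 by itself, since the assumption on which line 4 depends has been cancelled.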

We signify a sub-proof with the indented proof margin line; the start and finish of a sub-proof is indicated by the start and break of the indented proof margin. An assumption, like a basis sentence, is a supposition we suppose true for the purposes of the deduction. The difference is that whereas a basis sentence may be used at any step in a proof, an assumption may only be used to make a step within the sub-proof it heads. At the end of the sub-proof, the assumption is discharged. We now look at more sub-proofs in action and introduce another of N’s inference rules. Consider the following inference.

1. (Male(kelly) v Female(kelly))
2. (Male(kelly) → ~Sister(kelly, paige))
3. (Female(kelly) → ~Brother(kelly, evan))


(therefore) (~Sister(kelly, paige) v ~Brother(kelly, evan))

Informal Proof:

By assumption ‘(Male(kelly) v Female(kelly))’ is true, that is, by assumption at least one of the disjuncts is true.

Suppose that ‘Male(kelly)’ is true. Then by modus ponens we may derive that ‘~Sister(kelly, paige)’ is true from this assumption and the basis sentence 2. Then ‘(~Sister(kelly, paige) v ~Brother(kelly, evan))’ is true.

Suppose that ‘Female(kelly)’ is true. Then by modus ponens we may derive that ‘~Brother(kelly, evan)’ is true from this assumption and the basis sentence 3. Then ‘(~Sister(kelly, paige) v ~Brother(kelly, evan))’ is true.

So in either case we have derived that ‘(~Sister(kelly, paige) v ~Brother(kelly, evan))’ is true. Thus we have shown that this sentence is a deductive consequence of the basis sentences.

We model this proof in N using the v-Elim rule.

v-Elim

k. (P v Q)
m. P Assumption
.
.
.
n. R
o. Q Assumption
.
.
.
p. R
q. R v-Elim: k, m-n, o-p

The v-Elim rule allows us to derive a sentence from a disjunction by deriving it from each disjunct, possibly using sentences on earlier lines that sit on wider proof margins.

The following formal proof models the above informal one.

1. (Male(kelly) v Female(kelly)) Basis
2. (Male(kelly) → ~Sister(kelly, paige)) Basis
3. (Female(kelly) → ~Brother(kelly, evan)) Basis
4. Male(kelly) Assumption
5. ~Sister(kelly, paige) →-Elim: 2, 4
6. (~Sister(kelly, paige) v ~Brother(kelly, evan)) v-Intro: 5
7. Female(kelly) Assumption
8. ~Brother(kelly, evan) →-Elim: 3, 7
9. (~Sister(kelly, paige) v ~Brother(kelly, evan)) v-Intro: 8
10. (~Sister(kelly, paige) v ~Brother(kelly, evan)) v-Elim: 1, 4-6, 7-9

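As a further illustration of v-Elim, here is a derivation of Q from '(P v Q)' and '~P', for any sentences P and Q.

1. (P v Q) Basis
2. ~P Basis
3. P Assumption
4. ⊥ ⊥-Intro: 2, 3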
5. Q ⊥-Elim: 4
6. Q Assumption
7. Q Reit: 6
8. Q v-Elim: 1, 3-5, 6-7
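The two v-Elim proofs above can be cross-checked semantically with a truth table. Note that this is a different kind of check from a deduction in N: it surveys all assignments of truth values to the atomic sentences rather than deriving the conclusion by rules. The function names and the single-letter abbreviations are assumptions made for this sketch.

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """True iff every valuation that makes all premises true
    also makes the conclusion true."""
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False
    return True

def implies(a, b):
    return (not a) or b

# First proof. M = Male(kelly), F = Female(kelly),
# S = Sister(kelly, paige), B = Brother(kelly, evan).
dilemma = entails(
    [lambda v: v["M"] or v["F"],
     lambda v: implies(v["M"], not v["S"]),
     lambda v: implies(v["F"], not v["B"])],
    lambda v: (not v["S"]) or (not v["B"]),
    ["M", "F", "S", "B"],
)

# Second proof: from (P v Q) and ~P to Q.
syllogism = entails(
    [lambda v: v["P"] or v["Q"], lambda v: not v["P"]],
    lambda v: v["Q"],
    ["P", "Q"],
)

# A non-example: the disjunction alone does not yield Q.
too_weak = entails([lambda v: v["P"] or v["Q"]], lambda v: v["Q"], ["P", "Q"])

print(dilemma, syllogism, too_weak)
```

Both inferences pass the truth-table test, while dropping the premise '~P' from the second one leaves a valuation (P true, Q false) on which the premise holds and the conclusion fails.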

Now we introduce the →-Intro rule by considering the following inference.

1. (OlderThan(shannon, kelly) → OlderThan(shannon, paige))
2. (OlderThan(shannon, paige) → OlderThan(shannon, evan))


(therefore) (OlderThan(shannon, kelly) → OlderThan(shannon, evan))

Informal proof:

Suppose that OlderThan(shannon, kelly). From this assumption and basis sentence 1 we may derive, by modus ponens, that OlderThan(shannon, paige). From this and basis sentence 2 we get, again by modus ponens, that OlderThan(shannon, evan). Hence, if OlderThan(shannon, kelly), then OlderThan(shannon, evan).

The structure of this proof is that of a conditional proof: a deduction of a conditional from a set of basis sentences which starts with the assumption of the antecedent, then a derivation of the consequent, and concludes with the conditional. To build conditional proofs in N, we rely on the →-Intro rule.

→-Intro

k. P Assumption
.
.
.
m. Q
n. (P → Q) →-Intro: k-m

According to the →-Intro rule we may derive a conditional if we derive the consequent Q from the assumption of the antecedent P, and, perhaps, other sentences occurring earlier in the proof on wider proof margins. Again, such a proof is called a conditional proof.

We model the above informal conditional proof in N as follows.

1. (OlderThan(shannon, kelly) → OlderThan(shannon, paige)) Basis
2. (OlderThan(shannon, paige) → OlderThan(shannon, evan)) Basis
3. OlderThan(shannon, kelly) Assumption
4. OlderThan(shannon, paige) →-Elim: 1, 3
5. OlderThan(shannon, evan) →-Elim: 2, 4
6. (OlderThan(shannon, kelly) → OlderThan(shannon, evan)) →-Intro: 3-5

Mastery of a deductive system facilitates the discovery of proof pathways in hard cases and increases one’s efficiency in communicating proofs to others and explaining why a sentence is a logical consequence of others. For example, suppose that (1) if Beth is not Paige’s parent, then it is false that if Beth is a parent of Shannon, Shannon and Paige are sisters. Further suppose (2) that Beth is not Shannon’s parent. Then we may conclude that Beth is Paige’s parent. Of course, knowing the type of sentences involved is helpful for then we have a clearer idea of the inference principles that may be involved in deducing that Beth is a parent of Paige. Accordingly, we represent the two basis sentences and the conclusion in M, and then give a formal proof of the latter from the former.

1. (~Parent(beth, paige) → ~(Parent(beth, shannon) → Sister(shannon, paige))) Basis
2. ~Parent(beth, shannon) Basis
3. ~Parent(beth, paige) Assumption
4. ~(Parent(beth, shannon) → Sister(shannon, paige)) →-Elim: 1, 3
5. Parent(beth, shannon) Assumption
6. ⊥ ⊥-Intro: 2, 5
7. Sister(shannon, paige) ⊥-Elim: 6
8. (Parent(beth, shannon) → Sister(shannon, paige)) →-Intro: 5-7
9. ⊥ ⊥-Intro: 4, 8
10. ~~Parent(beth, paige) ~-Intro: 3-9
11. Parent(beth, paige) ~-Elim: 10

Because we derived a contradiction at line 9, we got ‘~~Parent(beth, paige)’ at line 10, using ~-Intro, and then we derived ‘Parent(beth, paige)’ by ~-Elim. Look at the conditional proof (lines 5-7) from which we derived line 8. Pretty neat, huh? Lines 2 and 5 generated the contradiction from which we derived ‘Sister(shannon, paige)’ at line 7 in order to get the conditional at line 8. This is our first example of a sub-proof (5-7) embedded in another sub-proof (3-9). It is unlikely that independent of the resources of a deductive system, a reasoner would be able to readily build the informal analogue of this pathway from the basis sentences to the sentence at line 11. Again, mastery of a deductive system such as N can increase the efficiency of our performances of rigorous reasoning and cultivate skill at producing elegant proofs (proofs that take the least number of steps to get from the basis to the conclusion).

We now introduce the Intro and Elim rules for the identity symbol and the quantifiers. Let n and n’ be any names, and 'Ωn' and 'Ωn’ ' be any well-formed formulas in which n and n’ appear and that have no free variables.

=-Intro

k. (n = n) =-Intro

=-Elim

k. Ωn
l. (n = n’ )
m. Ωn’ =-Elim: k, l

The =-Intro rule allows us to introduce '(n = n)' at any step in a proof. Since '(n = n)' is deducible from any sentence, there is no need to identify the lines from which line k is derived. In effect, the =-Intro rule confirms that ‘(paige = paige)’, ‘(shannon = shannon)’, ‘(kelly = kelly)’, etc… may be inferred from any sentence(s). The =-Elim rule tells us that if we have proven 'Ωn' and '(n = n’ )', then we may derive 'Ωn’ ' which is gotten from 'Ωn' by replacing n with n’ in some but possibly not all occurrences. The =-Elim rule represents the principle known as the indiscernibility of identicals, which says that if '(n = n’ )' is true, then whatever is true of the referent of n is true of the referent of n’. This principle grounds the following inference

1. ~Sister(beth, kelly)
2. (beth = shannon)


(therefore) ~Sister(shannon, kelly)
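The "some but possibly not all occurrences" clause of =-Elim can be illustrated by enumerating every sentence obtainable from a premise by swapping a selection of occurrences of one name for another. The nested-tuple encoding and the function name are assumptions made for this sketch; the enumeration also includes the trivial case in which no occurrence is replaced.

```python
from itertools import product

def variants(expr, old, new):
    """All expressions obtained from `expr` by replacing any selection of
    occurrences of the name `old` with the name `new`.
    Names are strings; compound expressions are tuples whose first item
    is a predicate or connective symbol and whose remaining items are
    arguments."""
    if isinstance(expr, str):
        return [expr, new] if expr == old else [expr]
    head, *args = expr
    choices = [variants(a, old, new) for a in args]
    return [(head, *combo) for combo in product(*choices)]

# ~Sister(beth, kelly) together with (beth = shannon)
premise = ("not", ("Sister", "beth", "kelly"))
results = variants(premise, "beth", "shannon")

# A sentence with two occurrences of the name yields four variants.
self_admiration = variants(("Admires", "beth", "beth"), "beth", "shannon")

print(results)
print(len(self_admiration))
```

The conclusion '~Sister(shannon, kelly)' appears among the variants of the premise, which is what the =-Elim step requires.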

The indiscernibility of identicals is fairly obvious. If I know that Beth isn’t Kelly’s sister and that Beth is Shannon (perhaps ‘Shannon’ is an alias) then this establishes, with the help of the indiscernibility of identicals, that Shannon isn’t Kelly’s sister. Now we turn to the quantifier rules.

Let 'Ωv' be a formula in which v is the only free variable, and let n be any name.

∃-Intro

k. Ωn
m. ∃vΩv ∃-Intro: k

∃-Elim

k. ∃vΩv
[n] m. Ωn Assumption
.
.
.
n. P
o. P ∃-Elim: k, m-n

Here, the name n must be unique to the subproof, that is, the name n doesn’t occur on any line above line m or below line n.

The ∃-Intro rule, which represents the principle of inference known as existential generalization, tells us that if we have proven 'Ωn', then we may derive '∃vΩv' which results from 'Ωn' by replacing n with a variable v in some but possibly not all of its occurrences and prefixing the existential quantifier. According to this rule, we may infer, say, ‘∃x Married(x, matt)’ from the sentence ‘Married(beth, matt)’. By the ∃-Elim rule, we may reason from a sentence that is produced from an existential quantification by stripping the quantifier and replacing the resulting free variable in all of its occurrences by a name which is new to the proof. Recall that the language M has an infinite number of constants, and the name introduced by the ∃-Elim rule may be one of the wi. We regard the assumption at line m, which starts the embedded sub-proof, as saying “Suppose n names an arbitrary individual from the domain of discourse such that 'Ωn' is true.” To illustrate the basic idea behind the ∃-Elim rule, if I tell you that Shannon admires some McKeon, you can’t infer that Shannon admires any particular McKeon such as Matt, Beth, Shannon, Kelly, Paige, or Evan. Nevertheless we have it that she admires somebody. The principle of inference corresponding to the ∃-Elim rule, called existential instantiation, allows us to assign this ‘somebody’ an arbitrary name new to the proof, say, ‘w1’, and reason within the relevant sub-proof from ‘Shannon admires w1’. Then we cancel the assumption and infer a sentence that doesn’t make any claims about w1. For example, suppose that (1) Shannon admires some McKeon. Let’s call this McKeon ‘w1’, that is, assume (2) that Shannon admires a McKeon named ‘w1’. By the principle of inference corresponding to v-Intro we may derive (3) that Shannon admires w1 or w1 admires Kelly. From (3), we may infer by existential generalization (4) that for some McKeon x, Shannon admires x or x admires Kelly. We now cancel the assumption (that is, cancel (2)) by concluding (5) that for some McKeon x, Shannon admires x or x admires Kelly from (1) and the subproof (2)-(4), by existential instantiation. Here is the above reasoning set out formally.

1. ∃x Admires(shannon, x) Basis
[w1] 2. Admires(shannon, w1) Assumption
3. (Admires(shannon, w1) v Admires(w1, kelly)) v-Intro: 2
4. ∃x(Admires(shannon, x) v Admires(x, kelly)) ∃-Intro: 3
5. ∃x(Admires(shannon, x) v Admires(x, kelly)) ∃-Elim: 1, 2-4

The string at the assumption of the sub-proof (line 2) says “Suppose that ‘w1’ names an arbitrary McKeon such that ‘Admires(shannon, w1)’ is true.” This is not a sentence of M, but of the meta-language for M, that is, the language used to talk about M. Hence, the ∃-Elim rule (as well as the ∀-Intro rule introduced below) has a meta-linguistic character.
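The requirement that the name introduced for ∃-Elim be unique to the sub-proof can be sketched as a simple textual check. Representing a proof as a dictionary from line numbers to sentence strings, and the function name itself, are assumptions made for this illustration.

```python
import re

def name_is_fresh(proof, name, start, end):
    """True iff `name` occurs on no line outside the sub-proof start..end.
    proof: dict mapping line numbers to sentences written as strings."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    outside = (s for i, s in proof.items() if i < start or i > end)
    return not any(pattern.search(s) for s in outside)

# The five-line proof above; the sub-proof runs from line 2 to line 4.
good = {
    1: "∃x Admires(shannon, x)",
    2: "Admires(shannon, w1)",
    3: "(Admires(shannon, w1) v Admires(w1, kelly))",
    4: "∃x(Admires(shannon, x) v Admires(x, kelly))",
    5: "∃x(Admires(shannon, x) v Admires(x, kelly))",
}

# A faulty attempt: 'kelly' is used as the temporary name and the
# would-be conclusion on line 3 still mentions her.
bad = {
    1: "∃x Admires(shannon, x)",
    2: "Admires(shannon, kelly)",
    3: "Admires(shannon, kelly)",
}

print(name_is_fresh(good, "w1", 2, 4), name_is_fresh(bad, "kelly", 2, 2))
```

The faulty attempt is exactly the inference the text warns against: from the claim that Shannon admires some McKeon one cannot conclude that she admires Kelly in particular.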

∀-Intro

[n] k. Assumption
.
.
.
m. Ωn
n. ∀vΩv ∀-Intro: k-m
n must be unique to the subproof

∀-Elim

k. ∀vΩv
m. Ωn ∀-Elim: k

The ∀-Elim rule corresponds to the principle of inference known as universal instantiation: to infer that something holds for an individual of the domain if it holds for the entire domain. The ∀-Intro rule allows us to derive a claim that holds for the entire domain of discourse from a proof that the claim holds for an arbitrarily selected individual from the domain. The assumption at line k reads in English “Suppose n names an arbitrarily selected individual from the domain of discourse.” As with the ∃-Elim rule, the name introduced by the ∀-Intro rule may be one of the wi. The ∀-Intro rule corresponds to the principle of inference often called universal generalization.

For example, suppose that we are told that (1) if a McKeon admires Paige, then that McKeon admires himself/herself, and that (2) every McKeon admires Paige. To show that we may correctly infer that every McKeon admires himself/herself we appeal to the principle of universal generalization, which (again) is represented in N by the ∀-Intro rule. We begin by assuming that (3) a McKeon is named ‘w1’. All we assume about w1 is that w1 is one of the McKeons. From (2), we infer by universal instantiation that (4) w1 admires Paige. We know from (1), again using the principle of universal instantiation (the ∀-Elim rule in N), that (5) if w1 admires Paige then w1 admires w1. From (4) and (5) we may infer that (6) w1 admires w1 by modus ponens. Since w1 is an arbitrarily selected individual (and so what holds for w1 holds for all McKeons) we may conclude from (3)-(6) that (7) every McKeon admires himself/herself follows from (1) and (2) by universal generalization. This reasoning is represented by the following formal proof.

1. ∀x(Admires(x, paige) → Admires(x, x)) Basis
2. ∀x Admires(x, paige) Basis
[w1] 3. Assumption
4. Admires(w1, paige) ∀-Elim: 2
5. (Admires(w1, paige) → Admires(w1, w1)) ∀-Elim: 1
6. Admires(w1, w1) →-Elim: 4, 5
7. ∀x Admires(x, x) ∀-Intro: 3-6

Line 3, the assumption of the sub-proof, corresponds to the English sentence “Let ‘w1’ refer to an arbitrary McKeon.” The notion of a name referring to an arbitrary individual from the domain of discourse, utilized by both the ∀-Intro and ∃-Elim rules in the assumptions that start the respective sub-proofs, incorporates two distinct ideas. One, relevant to the ∃-Elim rule, means “some specific object, but I don’t know which”, while the other, relevant to the ∀-Intro rule, means “any object, it doesn’t matter which”. (See Pelletier 1999, pp. 118-120 for discussion.)

Consider:

K = {All McKeons admire those who admire somebody, Some McKeon admires a McKeon}
X = Paige admires Paige

Here’s a proof that X is deducible from K.

1. ∀x(∃y Admires(x, y) → ∀z Admires(z, x)) Basis
2. ∃x∃y Admires(x, y) Basis
[w1] 3. ∃y Admires(w1, y) Assumption
4. (∃y Admires(w1, y) → ∀z Admires(z, w1)) ∀-Elim: 1
5. ∀z Admires(z, w1) →-Elim: 3, 4
6. Admires(paige, w1) ∀-Elim: 5
7. ∃y Admires(paige, y) ∃-Intro: 6
8. (∃y Admires(paige, y) → ∀z Admires(z, paige)) ∀-Elim: 1
9. ∀z Admires(z, paige) →-Elim: 7, 8
10. Admires(paige, paige) ∀-Elim: 9
11. Admires(paige, paige) ∃-Elim: 2, 3-10

An informal correlate, put somewhat succinctly, runs as follows.

Let’s call the unnamed admirer, mentioned in (2), w1. From this and (1), every McKeon admires w1 and so Paige admires w1. Hence, Paige admires somebody. From this and (1) it follows that everybody admires Paige. So, Paige admires Paige. This is our desired conclusion.

Even though the informal proof skips steps and doesn’t mention by name the principles of inference used, the formal proof guides its construction.
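As a sanity check on the two quantifier proofs above, one can search small finite structures for a countermodel, that is, a structure in which the basis sentences are true and the conclusion false. This is only a sketch under stated assumptions: the encoding of structures and all function names are hypothetical, and a failed search over structures with at most three elements does not by itself establish that the conclusion follows. It merely agrees with what the formal deductions show.

```python
from itertools import product

def structures(max_size):
    """Yield (domain, admires, paige) for every structure with at most
    max_size elements: a domain, an extension for 'Admires', and a
    referent for the name 'paige'."""
    for n in range(1, max_size + 1):
        domain = range(n)
        pairs = [(a, b) for a in domain for b in domain]
        for bits in product([False, True], repeat=len(pairs)):
            admires = {pair for pair, bit in zip(pairs, bits) if bit}
            for paige in domain:
                yield domain, admires, paige

def countermodel(premises, conclusion, max_size=3):
    """Return a structure making the premises true and the conclusion
    false, or None if none is found up to max_size."""
    for D, A, p in structures(max_size):
        if all(prem(D, A, p) for prem in premises) and not conclusion(D, A, p):
            return D, A, p
    return None

# ∀x(Admires(x, paige) → Admires(x, x)), ∀x Admires(x, paige) / ∀x Admires(x, x)
u1 = lambda D, A, p: all((x, p) not in A or (x, x) in A for x in D)
u2 = lambda D, A, p: all((x, p) in A for x in D)
uc = lambda D, A, p: all((x, x) in A for x in D)

# ∀x(∃y Admires(x, y) → ∀z Admires(z, x)), ∃x∃y Admires(x, y) / Admires(paige, paige)
k1 = lambda D, A, p: all(
    not any((x, y) in A for y in D) or all((z, x) in A for z in D) for x in D)
k2 = lambda D, A, p: any((x, y) in A for x in D for y in D)
kc = lambda D, A, p: (p, p) in A

first = countermodel([u1, u2], uc)
second = countermodel([k1, k2], kc)
without_k2 = countermodel([k1], kc)  # drop the second basis sentence

print(first, second, without_k2 is not None)
```

No countermodel turns up for either inference, whereas dropping the second basis sentence of K immediately yields one (a structure in which nobody admires anybody).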

5. The Status of the Deductive Characterization of Logical Consequence in Terms of N

We began the article by presenting the deductive-theoretic characterization of logical consequence: X is a logical consequence of a set K of sentences if and only if X is deducible from K, that is, there is a deduction of X from K. To make it official, we now characterize the deductive consequence relation in M in terms of deducibility in N.

X is a deductive consequence of K if and only if K ⊢N X, that is, X is deducible in N from K

We now inquire into the status of this characterization of deductive consequence.

The first thing to note is that deductive system N is complete and sound with respect to the model-theoretic consequence relation defined in Logical Consequence, Model-Theoretic Conceptions: Section 4.4. Let

K ⊢N X

abbreviate

X is deducible in N from K

Similarly, let

K ⊨ X

abbreviate

X is a model-theoretic consequence of K, that is, every M-structure that is a model of K is also a model of X. (For more information on structures and models, see Logical Consequence, Model-Theoretic Conceptions.)

The completeness and soundness of N means that for any set K of M sentences and M-sentence X, K ⊢N X if and only if K ⊨ X. A soundness proof establishes K ⊢N X only if K ⊨ X, and a completeness proof establishes K ⊢N X if K ⊨ X. So, the ⊢N and ⊨ relations, defined on sentences of M, are extensionally equivalent. The question arises: which characterization of the logical consequence relation is more basic or fundamental?

a. Tarski’s argument that the model-theoretic characterization of logical consequence is more basic than its characterization in terms of a deductive system

The first thing to note is that the ⊢N-consequence relation is compact. For any deductive system D and pair <K, X> there is a K’ such that, K ⊢D X if and only if K’ ⊢D X, where K’ is a finite subset of sentences from K. As pointed out by Tarski (1936), among others, there are intuitively correct principles of inference reflected in certain languages according to which one may infer a sentence X from a set K of sentences, even though it is incorrect to infer X from any finite subset of K. Here’s a rendition of his reasoning, focusing on the ⊢N-consequence relation defined on a language for arithmetic, which allows us to talk about the natural numbers 0, 1, 2, 3, and so on. Let ‘P’ be a predicate defined over the domain of natural numbers and let ‘NatNum(x)’ abbreviate ‘x is a natural number’. According to Tarski, intuitively,

∀x(NatNum(x) → P(x))

is a logical consequence of the infinite set S of sentences

P(0)
P(1)
P(2)
.
.
.

However, the universal quantification is not a ⊢N-consequence of the set S. The reason why is that the ⊢N-consequence relation is compact: for any sentence X and set K of sentences, X is a ⊢N-consequence of K, if and only if X is a ⊢N-consequence of some finite subset of K. Proofs in N are objects of finite length; a deduction is a finite sequence of sentences. Since the universal quantification is not a ⊢N-consequence of any finite subset of S, it is not a ⊢N-consequence of S. By the completeness of system N, it follows that

∀x(NatNum(x) → P(x))

is not a ⊨-consequence of S either. Consider the structure U* whose domain is the set of McKeons. Let all numerals name Beth. Let the extension of ‘NatNum’ be the entire domain, and the extension of ‘P’ be just Beth. Then each element of S is true in U*, but ‘∀x (NatNum(x) → P(x))’ is not true in U*. (See Logical Consequence, Model-Theoretic Conceptions for further discussion of structures.) Note that the sentences in S only say that P holds for 0, 1, 2, and so on, and not also that 0,1, 2, etc., are all the elements of the domain of discourse. The above interpretation takes advantage of this fact by reinterpreting all numerals as names for Beth.
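The structure U* can be written out directly. In this sketch the Python names are assumptions made for illustration, and the loop over the first thousand numerals is only a sample: the reason every sentence of S is true in U* is simply that each numeral refers to Beth, who is in the extension of ‘P’.

```python
# The structure U*: the domain is the set of McKeons.
DOMAIN = {"matt", "beth", "shannon", "kelly", "paige", "evan"}

def referent(numeral):
    """Every numeral, whatever it is, names Beth."""
    return "beth"

NATNUM = set(DOMAIN)   # extension of 'NatNum': the entire domain
P = {"beth"}           # extension of 'P': just Beth

def p_of(numeral):
    """Truth value in U* of the sentence P(numeral)."""
    return referent(numeral) in P

# A sample of the sentences P(0), P(1), P(2), ... of S.
sample_of_S_true = all(p_of(n) for n in range(1000))

# ∀x(NatNum(x) → P(x)) evaluated in U*.
universal = all((d not in NATNUM) or (d in P) for d in DOMAIN)

print(sample_of_S_true, universal)
```

Each sampled member of S comes out true while the universal quantification comes out false, since Matt, for instance, is in the extension of ‘NatNum’ but not of ‘P’.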

However, we can reflect model-theoretically the intuition that ‘∀x(NatNum(x) → P(x))’ is a logical consequence of set S by doing one of two things. We can add to S the functional equivalent of the claim that 0, 1, 2, etc., are all the natural numbers there are on the basis that this is an implicit assumption of the view that the universal quantification follows from S. Or we could add ‘NatNum’ and all numerals to our list of logical terms. On either option it still won’t be the case that ‘∀x(NatNum(x) → P(x))’ is a ⊢N-consequence of the set S. There is no way to accommodate the intuition that ‘∀x(NatNum(x) → P(x))’ is a logical consequence of S in terms of a compact consequence relation. Tarski takes this to be a reason to think that the model-theoretic account of logical consequence is definitive as opposed to an account of logical consequence in terms of a compact consequence relation such as ⊢N.

Tarski’s illustration shows that what is called the ω-rule is a correct inference rule.

The ω-rule is that from:

{P(0), P(1), P(2), …}

one may infer

∀x(NatNum(x) → P(x))

with respect to any predicate P. Any inference guided by this rule is correct even though it can’t be represented in a deductive system as this notion has been used here and discussed in Logical Consequence, Philosophical Considerations.

Compactness is not a salient feature of logical consequence conceived deductive theoretically. This suggests, by the third criterion of a successful theoretical definition of logical consequence mentioned in Logical Consequence, Philosophical Considerations, that no compact consequence relation is definitive of the intuitive notion of deducibility. So, assuming that deductive system N is correct (that is, deducibility is co-extensive in M with the ⊢N-relation), we can’t treat

X is intuitively deducible from K if and only if K ⊢N X.

as a definition of deducibility in M since

X is a deductive consequence of K if and only if X is deducible in a correct deductive system from K.

is not true with respect to languages for which deducibility is not captured by any compact consequence relation (that is, not captured by any deduction-system account of it). Some (e.g., Quine) demur at using a language for purposes of science in which deducibility is not completely represented by a deduction-system account because of epistemological considerations. Nevertheless, as Tarski (1936) argues, the fact that there cannot be deduction-system accounts of some intuitively correct principles of inference is reason for taking a model-theoretic characterization of logical consequence to be more fundamental than any characterization in terms of a deductive system sound and complete with respect to the model-theoretic characterization.

b. Is deductive system N correct?

In discussing the status of the characterization of logical consequence in terms of deductive system N, we assumed that N is correct. The question arises whether N is, indeed, correct. That is, is it the case that X is intuitively deducible from K if and only if K ⊢N X? The biconditional holds only if both (1) and (2) are true.

(1) If sentence X is intuitively deducible from set K of sentences, then K ⊢N X.
(2) If K ⊢N X, then sentence X is intuitively deducible from set K of sentences.

So N is incorrect if either (1) or (2) is false. The truth of (1) and (2) is relevant to the correctness of the characterization of logical consequence in terms of system N, because any adequate deductive-theoretic characterization of logical consequence must identify the logical terms of the relevant language and account for their inferential properties (for discussion, see Logical Consequence, Philosophical Considerations: Section 4). (1) is false if the list of logical terms in M is incomplete. In such a case, there will be a sentence X and set K of sentences such that X is intuitively deducible from set K because of at least one inferential property of logical terminology unaccounted for by N and so false that K ⊢N X (for discussion of some of the issues surrounding what qualifies as a logical term see Logical Consequence, Model-theoretic Conceptions: Section 5.3). In this case, N would be incorrect because it wouldn’t completely account for the inferential machinery of language M. (2) is false if there are deductions in N that are intuitively incorrect. Are there such deductions? In order to fine-tune the question note that the sentential connectives, the identity symbol, and the quantifiers of M are intended to correspond to or, and, not, if…then (the indicative conditional), is identical with, some, and all. Hence, N is a correct deductive system only if the Intro and Elim rules of N reflect the inferential properties of the ordinary language expressions. In what follows, we sketch three views that are critical of the correctness of system N because they reject (2).

i. Relevance logic

Not everybody accepts it as a fact that any sentence is deducible from a contradiction, and so some question the correctness of the ⊥-Elim rule. Consider the following informal proof of Q from 'P & ~P', for sentences P and Q, as a rationale for the ⊥-Elim rule.

From (1) P and not-P, we may correctly infer (2) P, from which it is correct to infer (3) P or Q. We derive (4) not-P from (1). (5) Q follows from (3) and (4).

The proof seems to be composed of valid modes of inference. Critics of the ⊥-Elim rule are obliged to tell us where it goes wrong. Here we follow the relevance logicians Anderson and Belnap (1962, pp. 105-108; for discussion, see Read 1995, pp. 54-60). In a nutshell, Anderson and Belnap claim that the proof is defective because it commits a fallacy of equivocation. The move from (2) to (3) is correct only if or has the sense of at least one. For example, from Kelly is female it is legitimate to infer that at least one of the two sentences Kelly is female and Kelly is older than Paige is true. On this sense of or, given that Kelly is female, one may infer that Kelly is female or whatever you like. However, in order for the passage from (3) and (4) to (5) to be legitimate, the sense of or in (3) must be if not-…then. For example, from if Kelly is not female, then Kelly is not Paige’s sister and Kelly is not female it is correct to infer Kelly is not Paige’s sister. Hence, the above “support” for the ⊥-Elim rule is defective, for it equivocates on the meaning of or.

Two things to highlight. First, Anderson and Belnap think that the inference from (2) to (3) on the if not-…then reading of or is incorrect. Given that Kelly is female, it is problematic to deduce that if she is not then Kelly is older than Paige, or whatever you like. Such an inference commits a fallacy of relevance, for Kelly not being female is not relevant to her being older than Paige. The representation of this inference in system N appeals to the ⊥-Elim rule, which is rejected by Anderson and Belnap. Second, the principle of inference underlying the move from (3) and (4) to (5), from P or Q and not-P to infer Q, is called the principle of the disjunctive syllogism. Anderson and Belnap claim that this principle is not generally valid when or has the sense of at least one, which it has when it is rendered by ‘v’ (e.g., see above). If Q is relevant to P, then the principle holds on this reading of or.
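The classical side of this dispute can be made concrete with a truth-table check. The following is a minimal sketch, assuming the two-valued, at-least-one reading of or; the helper name classically_valid is illustrative and not part of system N. It confirms only that each step of the informal proof is valid on that reading, which is the reading Anderson and Belnap contest for the final step, so it does not settle the debate.

```python
from itertools import product

def classically_valid(premises, conclusion, n_vars=2):
    """True if no two-valued assignment makes every premise true and the conclusion false."""
    for values in product([True, False], repeat=n_vars):
        if all(prem(*values) for prem in premises) and not conclusion(*values):
            return False
    return True

# Step (2) to (3): from P infer P or Q.
addition = classically_valid([lambda p, q: p], lambda p, q: p or q)

# Steps (3) and (4) to (5): from P or Q and not-P infer Q (disjunctive syllogism).
disjunctive_syllogism = classically_valid(
    [lambda p, q: p or q, lambda p, q: not p], lambda p, q: q)

# The argument as a whole: from P and not-P infer Q.
explosion = classically_valid([lambda p, q: p and not p], lambda p, q: q)

# A control case that should fail: from P or Q alone infer Q.
control = classically_valid([lambda p, q: p or q], lambda p, q: q)

print(addition, disjunctive_syllogism, explosion, control)
```

The first three checks succeed and the control fails. The relevance logician accepts the truth table but denies that it captures the sense of or needed for the last step.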

It is worthwhile to note the essentially informal nature of the debate. It calls upon our pre-theoretic intuitions about correct inference. It would be quite useless to cite the proof in N of the validity of disjunctive syllogism (given above) against Anderson and Belnap, for it relies on the ⊥-Elim rule whose legitimacy is in question. No doubt, pre-theoretical notions and original intuitions must be refined and shaped somewhat by theory. Our pre-theoretic notion of correct deductive reasoning in ordinary language is not completely determinate and precise independently of the resources of a full or partial logic. (See Shapiro 1991, chaps. 1 and 2 for discussion of the interplay between theory and pre-theoretic notions and intuitions.) Nevertheless, hardcore intuitions regarding correct deductive reasoning do seem to drive the debate over the legitimacy of deductive systems such as N and over the legitimacy of the ⊥-Elim rule in particular. Anderson and Belnap (1962, p. 108) write that denying the principle of the disjunctive syllogism, regarded as a valid mode of inference since Aristotle, “… will seem hopelessly naïve to those logicians whose logical intuitions have been numbed through hearing and repeating the logicians fairy tales of the past half century, and hence stand in need of further support”. The possibility that intuitions in support of the general validity of the principle of the disjunctive syllogism have been shaped by a bad theory of inference is motive enough to consider argumentative support for the principle and to investigate deductive systems for relevance logic.

A natural deductive system for relevance logic has the means for tracking the relevance quotient of the steps used in a proof and allows the application of an introduction rule in the step from A to B “only when A is relevant to B in the sense that A is used in arriving at B” (Anderson and Belnap 1962, p. 90). Consider the following proof in system N.

1. Admires(evan, paige) Basis
2. ~Married(beth, matt) Assumption
3. Admires(evan, paige) Reit: 1
4. (~Married(beth, matt) → Admires(evan, paige)) →-Intro: 2-3

Recall that the rationale behind the →-Intro rule is that we may derive a conditional if we derive the consequent Q from the assumption of the antecedent P, and, perhaps, other sentences occurring earlier in the proof on wider proof margins. The defect of this rule, according to Anderson and Belnap, is that “from” in “from the assumption of the antecedent P” is not taken seriously. They seem to have a point. By the lights of the →-Intro rule, we have derived line 4, but it is hard to see how we have derived the sentence at line 3 from the assumption at line 2 when we have simply reiterated the basis at line 3. Clearly, ‘~Married(beth, matt)’ was not used in inferring ‘Admires(evan, paige)’ at line 3. The relevance logician claims that the →-Intro rule in a correct natural deductive system should not make it possible to prove a conditional when the consequent was arrived at independently of the antecedent. A typical strategy is to use classes of numerals to mark the relevance conditions of basis sentences and assumptions and to formulate the Intro and Elim rules to tell us how an application of the rule transfers the numerical subscript(s) from the sentences used to the sentence derived with the help of the rule. Label the basis sentences, if any, with distinct numerical subscripts. Let a, b, c, etc., range over classes of numerals. The →-rules for a relevance natural deductive system may be represented as follows.

→-Elim

k. (P → Q)_a
l. P_b
m. Q_(a ∪ b) →-Elim: k, l

→-Intro

k. P_{k} Assumption
.
.
.
m. Q_b
n. (P → Q)_(b − {k}) →-Intro: k-m, provided k ∈ b
The numerical subscript of the assumption
at line k must be new to the proof.
This is insured by using the line number
for the subscript.

In the directions for the →-Intro rule, the proviso that k ∈ b insures that the antecedent P is used in deriving the consequent Q. Anderson and Belnap require that if the line that results from the application of either rule is the conclusion of the proof the relevance markers be discharged. Here is a sample proof of the above two rules in action.

1. Admires(evan, paige)_{1} Assumption
2. (Admires(evan, paige) → ~Married(beth, matt))_{2} Assumption
3. ~Married(beth, matt)_{1, 2} →-Elim: 1, 2
4. ((Admires(evan, paige) → ~Married(beth, matt)) → ~Married(beth, matt))_{1} →-Intro: 2-3
5. (Admires(evan, paige) → ((Admires(evan, paige) → ~Married(beth, matt)) → ~Married(beth, matt))) →-Intro: 1-4
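The bookkeeping behind the relevance subscripts can be sketched in a few lines of code. What follows is an illustrative sketch only: the names Line, assume, arrow_elim and arrow_intro are hypothetical, formulas are represented as plain strings or tuples, and nothing here is Anderson and Belnap's own formulation. It replays the sample proof and then shows that the earlier proof in N, which merely reiterates the basis, is blocked by the proviso on →-Intro.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Line:
    formula: object      # a string for atoms, ("->", antecedent, consequent) for conditionals
    marks: frozenset     # relevance subscript: the class of numerals recording what was used

def assume(formula, k):
    """Enter an assumption at line k, marked with the fresh subscript {k}."""
    return Line(formula, frozenset({k}))

def arrow_elim(conditional, antecedent):
    """From (P -> Q) marked a and P marked b, infer Q marked with the union of a and b."""
    f = conditional.formula
    if not (isinstance(f, tuple) and f[0] == "->" and f[1] == antecedent.formula):
        raise ValueError("->-Elim does not apply to these lines")
    return Line(f[2], conditional.marks | antecedent.marks)

def arrow_intro(k, assumption, consequent):
    """From assumption P marked {k} and Q marked b, infer (P -> Q) marked b minus {k},
    provided k is a member of b."""
    if k not in consequent.marks:
        raise ValueError("the antecedent was not used in deriving the consequent")
    return Line(("->", assumption.formula, consequent.formula), consequent.marks - {k})

A = "Admires(evan, paige)"
M = "~Married(beth, matt)"

# The sample relevance proof.
line1 = assume(A, 1)
line2 = assume(("->", A, M), 2)
line3 = arrow_elim(line2, line1)        # marked {1, 2}
line4 = arrow_intro(2, line2, line3)    # marked {1}
line5 = arrow_intro(1, line1, line4)    # no markers left: the conclusion

# The proof in N that reiterates the basis: the assumption at line 2 is never used.
basis = Line(A, frozenset({1}))
hypothesis = assume(M, 2)
reiterated = Line(basis.formula, basis.marks)   # reiteration keeps the subscript {1}
try:
    arrow_intro(2, hypothesis, reiterated)
    blocked = False
except ValueError:
    blocked = True

print(sorted(line3.marks), sorted(line4.marks), sorted(line5.marks), blocked)
```

Representing subscripts as sets makes the two rules a union and a set difference, so the proviso reduces to a membership test.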

For further discussion see Anderson and Belnap (1962). For a comprehensive discussion of relevance deductive systems see their (1975). For a more up-to-date review of the relevance logic literature see Dunn (1986).

ii. Intuitionistic logic

We now consider the correctness of the ~-Elim rule and consider the rule in the context of using it along with the ~-Intro rule.

~-Intro

k. P Assumption
.
.
.
m. ⊥
n. ~P ~-Intro: k-m

~-Elim

k. ~~P
m. P ~-Elim: k

Here is a typical use in classical logic of the ~-Intro and ~-Elim rules. Suppose that we derive a contradiction from the assumption that a sentence P is true. So, if P were true, then a contradiction would be true, which is impossible. So P cannot be true and we may infer that not-P. Similarly, suppose that we derive a contradiction from the assumption that not-P. Since a contradiction cannot be true, not-P is not true. Then we may infer that P is true by ~-Elim.

The intuitionist logician rejects the second piece of reasoning, which moves from a contradiction derived from not-P to the truth of P. If a contradiction is derived from not-P we may infer that not-P is not true, that is, that not-not-P is true, but it is incorrect to infer that P is true. Why? Because the intuitionist rejects the presupposition behind the ~-Elim rule, which is that for any proposition P there are two alternatives: P and not-P. The grounds for this are the intuitionistic conceptions of truth and meaning.

According to intuitionistic logic, truth is an epistemic notion: the truth of a sentence P consists in our ability to verify it. To assert P is to have a proof of P, and to assert not-P is to have a refutation of P. This leads to an epistemic conception of the meaning of logical constants. The meaning of a logical constant is characterized in terms of its contribution to the criteria of proof for the sentences in which it occurs. Compare with classical logic: the meaning of a logical constant is semantically characterized in terms of its contribution to the determination of the truth conditions of the sentences in which it occurs. For example, the classical logician accepts a sentence of the form 'P v Q' only when she accepts that at least one of the disjuncts is true. On the other hand, the intuitionistic logician accepts 'P v Q' only when she has a method for proving P or a method for proving Q. But then the Law of Excluded Middle no longer holds, because a sentence of the form P or not-P is true, that is, assertible, only when we are in a position to prove or refute P, and we lack the means for verifying or refuting all sentences. The alleged problem with the ~-Elim rule is that it illegitimately extends the grounds for asserting P on the basis of not-not-P, since a refutation of not-P is not ipso facto a proof of P.

Since there are finitely many McKeons and the predicates of language M seem well defined, we can work through the domain of the McKeons to verify or refute any M-sentence and so there doesn’t seem to be an M-sentence that is neither verifiable nor refutable. However, consider a language about the natural numbers. Any sentence that results by substituting numerals for the variables in ‘x = y + z’ is decidable. This is to say that for any natural numbers x, y, and z, we have an effective procedure for determining whether or not x is the sum of y and z. Hence, for all x, y, and z either we may assert that x = y + z or we may assert the contrary. Let ‘A(x)’ abbreviate ‘if x is even and greater than 2 then there exist primes y and z such that x = y + z’. Since there are algorithms for determining of any number whether or not it is even, greater than 2, or prime, the hypothesis that the open formula ‘A(x)’ is satisfied by a given natural number is decidable, for we can effectively determine for all smaller numbers whether or not they are prime. However, there is no known method for verifying or refuting Goldbach’s conjecture, for all x, A(x). Even though, for each numeral n standing for a natural number, the sentence 'A(n)' is decidable (that is, we can determine which of 'A(n)' or 'not-A(n)' is true), the sentence ‘for all x, A(x)’ is not. That is, we are not in a position to hold that either Goldbach’s conjecture is true or that it is not. Clearly, verification of the conjecture via an exhaustive search of the domain of natural numbers is not possible since the domain is non-finite. Absent a counterexample or proof of Goldbach’s conjecture, the intuitionist demurs from asserting that either Goldbach’s conjecture is true or it is not. This is just one of many examples where the intuitionist thinks that the law of excluded middle fails.
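The contrast between the decidable instances and the undecided generalization can be illustrated with a short program. This is a sketch under obvious simplifications: the function names is_prime and A are illustrative, and trial division is only one of many possible primality tests. Each call to A terminates after a finite search, whereas no finite run of the final loop amounts to a verification of ‘for all x, A(x)’.

```python
def is_prime(n):
    """Decide primality by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def A(x):
    """Decide A(x): if x is even and greater than 2, then x is the sum of two primes."""
    if x % 2 != 0 or x <= 2:
        return True   # the antecedent is false, so the conditional holds
    # Only finitely many candidate pairs (y, x - y) need to be examined.
    return any(is_prime(y) and is_prime(x - y) for y in range(2, x // 2 + 1))

# Any single instance is settled by a finite computation.
print(A(28))

# A bounded search settles only the instances below the bound, never the conjecture.
checked_below_2000 = all(A(n) for n in range(2000))
print(checked_below_2000)
```

Raising the bound extends the list of verified instances but leaves the status of the universal sentence exactly where it was.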

In sum, the legitimacy of the ~-Elim rule requires a realist conception of truth as verification transcendent. On this conception, sentences have truth-values independently of the possibility of a method for verifying them. Intuitionistic logic abandons this conception of truth in favor of an epistemic conception according to which the truth of a sentence turns on our ability to verify it. Hence, the inference rules of an intuitionistic natural deductive system must be coded in such a way as to reflect this notion of truth. For example, consider an intuitionistic language in which a, b, … range over proofs, ‘a: P’ stands for ‘a is a proof of P’, and ‘(a, b)’ stands for some suitable pairing of the proofs a and b. The &-rules of an intuitionistic natural deductive system may look like the following:

&-Intro

k. a: P
l. b: Q
m. (a, b): (P & Q) &-Intro: k, l

&-Elim

k. (a, b): (P & Q)                    k. (a, b): (P & Q)
m. a: P   &-Elim: k                   m. b: Q   &-Elim: k
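Proof assistants based on constructive type theory treat proofs as objects in much this way. As a hedged illustration, and not as a rendering of any system discussed in this article, the &-rules can be written in Lean 4, where the anonymous-constructor brackets play the role of the pairing ‘(a, b)’. The last line shows that passing from P to not-not-P is constructively unproblematic; the converse step, which is what ~-Elim licenses, has no such proof term without a classical axiom.

```lean
-- &-Intro: a proof of P and a proof of Q pair up into a proof of P ∧ Q.
example (P Q : Prop) (a : P) (b : Q) : P ∧ Q := ⟨a, b⟩

-- &-Elim: either component can be projected out of the pair.
example (P Q : Prop) (c : P ∧ Q) : P := c.left
example (P Q : Prop) (c : P ∧ Q) : Q := c.right

-- From a proof of P we can build a proof of ¬¬P without any classical principle.
example (P : Prop) (a : P) : ¬¬P := fun h => h a
```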

Apart from the negation rules, it is fairly straightforward to dress the Intro and Elim rules of N with a proof interpretation as is illustrated above with the &-rules. For the details see Van Dalen (1999). For further introductory discussion of the philosophical theses underlying intuitionistic logic see Read (1995) and Shapiro (2000). Tennant (1997) offers a more comprehensive discussion and defense of the philosophy of language underlying intuitionistic logic.

iii. Free Logic

We now turn to the ∃-Intro and ∀-Elim rules. Consider the following two inferences.

(1) Male(evan)                          (3) ∀x Male(x)
(therefore) (2) ∃x Male(x)              (therefore) (4) Male(evan)

Both are correct by the lights of our system N. Specifically, (2) is derivable from (1) by the ∃-Intro rule and we get (4) from (3) by the ∀-Elim rule. Note an implicit assumption required for the legitimacy of these inferences: every individual constant refers to an element of the quantifier domain. If this existence assumption, which is built into the semantics for M and reflected in the two quantifier rules, is rejected, then the inferences are unacceptable. What motivates rejecting the existence assumption and denying the correctness of the above inferences?

There are contexts in which singular terms are used without assuming that they refer to existing objects. For example, it is perfectly reasonable to regard the individual constants of a language used to talk about myths and fairy tales as not denoting existing objects. It seems inappropriate to infer that some actually existing individual is jolly on the basis that the sentence Santa Claus is jolly is true. Also, the logic of a language used to debate the existence of God should not presuppose that God refers to something in the world. The atheist doesn’t seem to be contradicting herself in asserting that God does not exist. Furthermore, there are contexts in science where introducing an individual constant for an allegedly existing object such as a planet or particle should not require the scientist to know that the purported object to which the term allegedly refers actually exists. A logic that allows non-denoting individual constants (terms that do not refer to existing things) while maintaining the existential import of the quantifiers (‘∀x’ and ‘∃x’ mean something like ‘for all existing individuals x’ and ‘for some existing individuals x’, respectively) is called a free logic. In order for the above two inferences to be correct by the lights of free logic, the sentence Evan exists must be added to the basis. Correspondingly, the ∃-Intro and ∀-Elim rules in a natural deductive system for free logic may be portrayed as follows. Again, let 'Ωv' be a formula in which v is the only free variable, and let n be any name.

∀-Elim                              ∃-Intro
k. ∀vΩv                             k. Ωn
l. E!n                              l. E!n
m. Ωn   ∀-Elim: k, l                m. ∃vΩv   ∃-Intro: k, l

'E!n' abbreviates n exists and so we suppose that ‘E!’ is an item of the relevant language. The ∀-Intro and ∃-Elim rules in a free logic deductive system also make explicit the required existential presuppositions with respect to individual constants (for details see Bencivenga 1986, p. 387). Free logic seems to be a useful tool for representing and evaluating reasoning in contexts such as the above. Different types of free logic arise depending on whether we treat terms that do not denote existing individuals as denoting objects that do not actually exist or as simply not denoting at all.
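A toy model shows why the extra premise 'E!n' matters. The sketch below assumes one of the options just mentioned, namely that terms failing to denote existing individuals denote objects outside the quantifier domain; the model, the object labels evan_obj and santa_obj, and all function names are hypothetical. It checks that the unrestricted quantifier rules of N can lead from truths to falsehoods in this model, while the free-logic versions cannot.

```python
# Hypothetical model: one existing individual, one object outside the quantifier domain.
quantifier_domain = {"evan_obj"}
denotation = {"evan": "evan_obj", "santa": "santa_obj"}
extension = {"Male": {"evan_obj"}, "Jolly": {"santa_obj"}}

def atomic(pred, name):
    """Truth of an atomic sentence such as Male(evan)."""
    return denotation[name] in extension[pred]

def exists_bang(name):
    """E!n: the name denotes a member of the quantifier domain."""
    return denotation[name] in quantifier_domain

def some(pred):
    """Truth of the existential sentence; the quantifier ranges over existing individuals only."""
    return any(d in extension[pred] for d in quantifier_domain)

def every(pred):
    """Truth of the universal sentence over existing individuals."""
    return all(d in extension[pred] for d in quantifier_domain)

names = list(denotation)
preds = list(extension)

# Existential introduction: does a true atomic premise always yield a true existential?
unrestricted_exists_intro_ok = all(
    some(p) for p in preds for n in names if atomic(p, n))
free_exists_intro_ok = all(
    some(p) for p in preds for n in names if atomic(p, n) and exists_bang(n))

# Universal elimination: does a true universal always yield a true instance?
unrestricted_forall_elim_ok = all(
    atomic(p, n) for p in preds for n in names if every(p))
free_forall_elim_ok = all(
    atomic(p, n) for p in preds for n in names if every(p) and exists_bang(n))

print(unrestricted_exists_intro_ok, free_exists_intro_ok)
print(unrestricted_forall_elim_ok, free_forall_elim_ok)
```

In this model Jolly(santa) is true although nothing in the quantifier domain is jolly, and everything in the domain is male although Male(santa) is false. Those two cases are what defeat the unrestricted rules.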

In sum, there are contexts in which it is appropriate to use languages whose vocabulary and syntactic formation rules are independent of our knowledge of the actual existence of the entities the language is about. In such languages, the quantifier rules of deductive system N sanction incorrect inferences, and so at best N represents correct deductive reasoning in languages for which the existential presupposition with respect to singular terms makes sense. The proponent of system N may argue that only those expressions guaranteed a referent (e.g., demonstratives) are truly singular terms. On this view, advocated by Bertrand Russell at one time, expressions that may not have a referent such as Santa Claus, God, Evan, Bill Clinton, the child abused by Michael Jackson are not genuinely singular expressions. For example, in the sentence Evan is male, Evan abbreviates a unique description such as the son of Matt and Beth. Then Evan is male comes to

There exists a unique x such that x is a son of Matt and Beth and x is male.

From this we may correctly infer that some are male. The representation of this inference in N appeals to both the ∃-Intro and ∃-Elim rules, as well as the &-Elim rule. However, treating most singular expressions as disguised definite descriptions at worst generates counter-intuitive truth-value assignments (Santa Claus is jolly turns out false since there is no Santa Claus) and seems at best an unnatural response to the criticism posed from the vantage point of free logic.

For a short discussion of the motives behind free logic and a review of the family of free logics see Read (1995, chap. 5). For a more comprehensive discussion and a survey of the relevant literature see Bencivenga (1986). Morscher and Hieke (2001) is a collection of recent essays devoted to taking stock of the past fifty years of research in free logic and outlining new directions.

6. Conclusion

This completes our discussion of the deductive-theoretic conception of logical consequence. Since, arguably, logical consequence conceived deductive-theoretically is not compact it cannot be defined in terms of deducibility in a correct deductive system. Nevertheless correct deductive systems are useful for modeling deductive reasoning and they have applications in areas such as computer science and mathematics. Is deductive system N correct? In other words: Do the Intro and Elim rules of N represent correct principles of inference? We sketched three motives for answering in the negative, each leading to a logic that differs from the classical one developed here and which requires altering Intro and Elim rules of N. It is clear from the discussion that any full coverage of the topic would have to engage philosophical issues, still a matter of debate, such as the nature of truth, meaning and inference. For a comprehensive and very readable survey of proposed revisions to classical logic (those discussed here and others) see Haack (1996). For discussion of related issues, see also the entries, “Logical Consequence, Philosophical Considerations” and “Logical Consequence, Model-Theoretic Conceptions” in this encyclopedia.

7. References and Further Reading

  • Anderson, A.R. and N. Belnap (1962): “Entailment”, pp. 76-110 in Logic and Philosophy, ed. G. Iseminger. New York: Appleton-Century-Crofts, 1968.
  • Anderson, A.R., and N. Belnap (1975): Entailment: The Logic of Relevance and Necessity. Princeton: Princeton University Press.
  • Barwise, J. and J. Etchemendy (2001): Language, Proof and Logic. Chicago: University of Chicago Press and CSLI Publications.
  • Bencivenga, E. (1986): “Free logics”, pp. 373-426 in Gabbay and Guenthner (1986).
  • Dunn, M. (1986): “Relevance Logic and Entailment”, pp. 117-224 in Gabbay and Guenthner (1986).
  • Fitch, F.B. (1952): Symbolic Logic: An Introduction. New York: The Ronald Press.
  • Gabbay, D. and F. Guenthner, eds. (1983): Handbook of Philosophical Logic, Vol 1. Dordrecht: D. Reidel.
  • Gabbay, D. and F. Guenthner, eds. (1986): Handbook of Philosophical Logic, Vol. 3. Dordrecht: D. Reidel.
  • Gentzen, G. (1934): “Investigations Into Logical Deduction”, pp. 68-128 in Collected Papers, ed. M.E. Szabo. Amsterdam: North-Holland, 1969.
  • Haack, S. (1978): Philosophy of Logics. Cambridge: Cambridge University Press.
  • Haack, S. (1996): Deviant Logic, Fuzzy Logic. Chicago: The University of Chicago Press.
  • Morscher E. and A. Hieke, eds. (2001): New Essays in Free Logic: In Honour of Karel Lambert, Dordrecht: Kluwer.
  • Pelletier, F.J. (1999): “A History of Natural Deduction and Elementary Logic Textbooks”, pp.105-138 in Logical Consequence: Rival Approaches, ed. J. Woods and B. Brown. Oxford: Hermes Science Publishing, 2001.
  • Read, S. (1995): Thinking About Logic. Oxford: Oxford University Press.
  • Shapiro, S. (1991): Foundations without Foundationalism: A Case For Second-Order Logic. Oxford: Clarendon Press.
  • Shapiro, S. (2000): Thinking About Mathematics. Oxford: Oxford University Press.
  • Sundholm, G. (1983): “Systems of Deduction”, in Gabbay and Guenthner (1983).
  • Tarski, A. (1936): “On the Concept of Logical Consequence”, pp. 409-420 in Tarski (1983).
  • Tarski, A. (1983): Logic, Semantics, Metamathematics, 2nd ed. Indianapolis: Hackett Publishing.
  • Tennant, N. (1997): The Taming of the True. Oxford: Clarendon Press.
  • Van Dalen, D. (1999): “The Intuitionistic Conception of Logic”, pp. 45-73 in Varzi (1999).
  • Varzi, A., ed. (1999): European Review of Philosophy, Vol. 4, The Nature of Logic, Stanford: CSLI Publications.

Author Information

Matthew McKeon
Email: mckeonm@msu.edu
Michigan State University
U. S. A.

Reductio ad Absurdum

Reductio ad absurdum is a mode of argumentation that seeks to establish a contention by deriving an absurdity from its denial, thus arguing that a thesis must be accepted because its rejection would be untenable. It is a style of reasoning that has been employed throughout the history of mathematics and philosophy from classical antiquity onwards.

Table of Contents

  1. Basic Ideas
  2. The Logic of Strict Propositional Reductio: Indirect Proof
  3. A Classical Example of Reductio Argumentation
  4. Self-Annihilation: Processes that Engender Contradiction
  5. Doctrinal Annihilation: Sets of Statements that Are Collectively Inconsistent
  6. Absurd Definitions and Specifications
  7. Per Impossible Reasoning
  8. References and Further Reading

1. Basic Ideas

Use of this Latin terminology traces back to the Greek expression hê eis to adunaton apagôgê, reduction to the impossible, found repeatedly in Aristotle’s Prior Analytics. In its most general construal, reductio ad absurdum (reductio for short) is a process of refutation on grounds that absurd, and patently untenable, consequences would ensue from accepting the item at issue. This takes three principal forms according as that untenable consequence is:

  1. a self-contradiction (ad absurdum)
  2. a falsehood (ad falsum or even ad impossibile)
  3. an implausibility or anomaly (ad ridiculum or ad incommodum)

The first of these is reductio ad absurdum in its strictest construction and the other two cases involve a rather wider and looser sense of the term. Some conditionals that instantiate this latter sort of situation are:

  • If that’s so, then I’m a monkey’s uncle.
  • If that is true, then pigs can fly.
  • If he did that, then I’m the Shah of Persia.

What we have here are consequences that are absurd in the sense of being obviously false and indeed even a bit ridiculous. Despite its departure from the usual form of reductio, this sort of thing is also characterized as an attenuated mode of reductio. But although all three cases fall into the range of the term as it is commonly used, logicians and mathematicians generally have the first and strongest of them in view.

The usual explanations of reductio fail to acknowledge the full extent of its range of application. For at the very minimum such a refutation is a process that can be applied to

  • individual propositions or theses
  • groups of propositions or theses (that is, doctrines or positions or teachings)
  • modes of reasoning or argumentation
  • definitions
  • instructions and rules of procedure
  • practices, policies and processes

The task of the present discussion is to explain the modes of reasoning at issue with reductio and to illustrate the wide range of its applications.

2. The Logic of Strict Propositional Reductio: Indirect Proof

Whitehead and Russell in Principia Mathematica characterize the principle of “reductio ad absurdum” as tantamount to the formula (~p → p) → p of propositional logic. But this view is idiosyncratic. Elsewhere the principle is almost universally viewed as a mode of argumentation rather than a specific thesis of propositional logic.

Propositional reductio is based on the following line of reasoning:

If p ⊢ ~p, then ⊢ ~p

Here ⊢ represents assertability, be it absolute or conditional (that is, derivability). Since p ⊢ q yields ⊢ p → q, this principle can be established as follows:

Suppose (1) p ⊢ ~p

(2) ⊢ p → ~p from (1)

(3) ⊢ p → (p & ~p) from (2) since p → p

(4) ⊢ ~(p & ~p) → ~p from (3) by contraposition

(5) ⊢ ~(p & ~p) by the Law of Contradiction

(6) ⊢ ~p from (4), (5) by modus ponens

Accordingly, the above-indicated line of reasoning does not represent a postulated principle but a theorem that issues from subscription to various axioms and proof rules, as instanced in the just-presented derivation.
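The propositional steps of this derivation can be checked by truth table. The following is a small sketch assuming classical two-valued semantics; the helper implies and the variable names are illustrative. It verifies that the conditional form of the principle, the Principia formula, and the conditionals corresponding to steps (3) and (4) all come out true under both assignments to p.

```python
def implies(a, b):
    """Material conditional."""
    return (not a) or b

values = [True, False]

# (p -> ~p) -> ~p : the conditional counterpart of "if p |- ~p, then |- ~p".
reductio = all(implies(implies(p, not p), not p) for p in values)

# (~p -> p) -> p : the formula singled out in Principia Mathematica.
principia = all(implies(implies(not p, p), p) for p in values)

# Step (3): from p -> ~p to p -> (p & ~p).
step3 = all(implies(implies(p, not p), implies(p, p and not p)) for p in values)

# Step (4): contraposition of p -> (p & ~p).
step4 = all(implies(implies(p, p and not p),
                    implies(not (p and not p), not p)) for p in values)

# Step (5): the Law of Contradiction.
law_of_contradiction = all(not (p and not p) for p in values)

print(reductio, principia, step3, step4, law_of_contradiction)
```

A truth table confirms only that the steps are classically truth-preserving; it is not a substitute for the axiomatic derivation given above.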

The reasoning involved here provides the basis for what is called an indirect proof. This is a process of justificatory argumentation that proceeds as follows when the object is to establish a certain conclusion p:

(1) Assume not-p

(2) Provide argumentation that derives p from this assumption.

(3) Maintain p on this basis.

Such argumentation is in effect simply an implementation of the above-stated principle with ~p standing in place of p.

As this line of thought indicates, reductio argumentation is a special case of demonstrative reasoning. What we deal with here is an argument of the pattern: From the situation

(to-be-refuted assumption + a conjunction of preestablished facts) ⊢ contradiction

one proceeds to conclude the denial of that to-be-refuted assumption via modus tollens argumentation.

An example may help to clarify matters. Consider division by zero. If this were possible when x is not 0 and we took x ÷ 0 to constitute some well-defined quantity Q, then we would have x ÷ 0 = Q, so that x = 0 × Q, so that since 0 × (anything) = 0 we would have x = 0, contrary to assumption. The supposition that x ÷ 0 qualifies as a well-defined quantity is thereby refuted.

3. A Classical Example of Reductio Argumentation

A classic instance of reductio reasoning in Greek mathematics relates to the discovery by Pythagoras – disclosed to the chagrin of his associates by Hippasus of Metapontum in the fifth century BC – of the incommensurability of the diagonal of a square with its sides. The reasoning at issue runs as follows:

Let d be the length of the diagonal of a square and s the length of its sides. Then by the Pythagorean theorem we have it that d² = 2s². Now suppose (by way of a reductio assumption) that d and s were commensurable in terms of a common unit u, so that d = n × u and s = m × u, where m and n are whole numbers (integers) that have no common divisor. (If there were a common divisor, we could simply shift it into u.) Now we know that

(n × u)² = 2(m × u)²

We then have it that n² = 2m². This means that n must be even, since only even integers have even squares. So n = 2k. But now n² = (2k)² = 4k² = 2m², so that 2k² = m². But this means that m must be even (by the same reasoning as before). And this means that m and n, both being even, will have common divisors (namely 2), contrary to the hypothesis that they do not. Accordingly, since that initial commensurability assumption engendered a contradiction, we have no alternative but to reject it. The incommensurability thesis is accordingly established.
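The key arithmetical facts used in this argument can be spot-checked numerically. The sketch below is an illustration, not a proof: the function name solutions and the search bounds are arbitrary choices. It confirms, over a finite range, that a whole number is even exactly when its square is, and that n² = 2m² has no solution in positive whole numbers within the bound, which is what the reductio establishes in full generality.

```python
from math import isqrt

# Over a finite range, an integer is even exactly when its square is even.
parity_lemma_holds = all((n * n % 2 == 0) == (n % 2 == 0) for n in range(1000))

def solutions(bound):
    """All pairs (n, m) with 1 <= m <= bound and n*n == 2*m*m."""
    found = []
    for m in range(1, bound + 1):
        n = isqrt(2 * m * m)          # the only possible integer candidate for n
        if n * n == 2 * m * m:
            found.append((n, m))
    return found

print(parity_lemma_holds, solutions(10000))
```

The search returns an empty list for every bound, but only the indirect proof shows that it must.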

As indicated above, this sort of proof of a thesis by reductio argumentation that derives a contradiction from its negation is characterized as an indirect proof in mathematics. (On the historical background see T. L. Heath, A History of Greek Mathematics [Oxford, Clarendon Press, 1921].)

The use of such reductio argumentation was common in Greek mathematics and was also used by philosophers in antiquity and beyond. Aristotle employed it in the Prior Analytics to demonstrate the so-called imperfect syllogisms when it had already been used in dialectical contexts by Plato (see Republic I, 338C-343A; Parmenides 128d). Immanuel Kant’s entire discussion of the antinomies in his Critique of Pure Reason was based on reductio argumentation.

The mathematical school of so-called intuitionism has taken a definite line regarding the limitation of reductio argumentation for the purposes of existence proofs. The only valid way to establish existence, so they maintain, is by providing a concrete instance or example: general-principle argumentation is not acceptable here. This means, specifically, that one cannot establish (∃x)Fx by deducing an absurdity from (∀x)~Fx. Accordingly, intuitionists would not let us infer the existence of invertebrate ancestors of Homo sapiens from the patent absurdity of the supposition that humans are vertebrates all the way back. They would maintain that in such cases where we are totally in the dark as to the individuals involved we are not in a position to maintain their existence.

4. Self-Annihilation: Processes that Engender Contradiction

Not only a self-inconsistent statement (and thereby a self-refuting, self-annihilating one) but also a self-inconsistent process or practice or principle of procedure can be “reduced to absurdity.” For any such modus operandi answers to some instruction (or combination thereof), and such instruction can also prove to be self-contradictory. Examples of this would be:

  • Never say never.
  • Keep the old warehouse intact until the new one is constructed. And build the new warehouse from the materials salvaged by demolishing the old.

More loosely, there are also instructions that do not automatically result in logically absurd (self-contradictory) conclusions, but which open the door to such absurdity in certain conditions and circumstances. Along these lines, a practical rule of procedure or modus operandi would be reduced to absurdity when it can be shown that its actual adoption and implementation would result in an anomaly.

Consider an illustration of this sort of situation. A man dies leaving an estate consisting of his town house, his bank account of $30,000, his share in the family business, and several pieces of costume jewelry he inherited from his mother. His will specifies that his sister is to have any three of the valuables in his estate and that his daughter is to inherit the rest. The sister selects the house, a bracelet, and a necklace. The executor refuses to make this distribution and the sister takes him to court. No doubt the judge will rule something like “Finding for the plaintiff would lead ad absurdum. She could just as well have also opted not just for the house but also for the bank account and the business, thereby effectively disinheriting the daughter, which was clearly not the testator’s wish.” Here we have a juridical reductio ad absurdum of sorts. Actually implementing this rule in all eligible cases (its generalized utilization across the board) would yield an unacceptable and untoward result so that the rule could self-destruct in its actual unrestricted implementation. (This sort of reasoning is common in legal contexts. Many such cases are discussed in David Daube, Roman Law [Edinburgh: Edinburgh University Press, 1969], pp. 176-94.)

Immanuel Kant taught that interpersonal practices cannot represent morally appropriate modes of procedure if they do not correspond to verbally generalizable rules in this way. Such practices as stealing (that is, taking someone else’s possessions without due authorization) or lying (that is, telling falsehoods where it suits your convenience) are inappropriate, so Kant maintains, exactly because the corresponding maxims, if generalized across the board, would be utterly anomalous (leading to the annihilation of property ownership and verbal communication respectively). Since the rule-conforming practices thus reduce to absurdity upon their general implementation, such practices are adjudged morally unacceptable. For Kant, generalizability is the acid test of the acceptability of practices in the realm of interpersonal dealings.

5. Doctrinal Annihilation: Sets of Statements that Are Collectively Inconsistent

Even as individual statements can prove to be self-contradictions, so a plurality of statements (a “doctrine” let us call it) can prove to be collectively inconsistent. And so in this context reductio reasoning can also come into operation. For example, consider the following schematic theses:

  • A → B
  • B → C
  • C → D
  • Not-D

In this context, the supposition that A can be refuted by a reductio ad absurdum. For if A were conjoined to these premisses, we would arrive at both D and not-D, which is patently absurd. Hence A is untenable (false) in the context of this family of givens.

When someone is “caught out in a contradiction” in this way, their position self-destructs in a reduction to absurdity. An example is provided by the exchange between Socrates and his accusers, who had charged him with godlessness. In elaborating this accusation, these opponents also accused Socrates of believing in inspired beings (daimonia). But here inspiration is divine inspiration: such a daimonion is supposed to be a being inspired by a god. And at this point Socrates has a ready-made defense: how can someone disbelieve in gods when he is acknowledged to believe in god-inspired beings? His accusers here become enmeshed in self-contradiction. And their position accordingly runs out into absurdity. (Compare Aristotle, Rhetorica 1398a12 [II xxiii 8].)
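The schematic reductio of this section can be checked mechanically. The following is a minimal sketch, assuming that the first three theses are read as material conditionals (A implies B, B implies C, C implies D); the helper names `implies` and `satisfying_rows` are illustrative only. It enumerates all truth-value assignments and confirms that the givens are consistent by themselves but become unsatisfiable once A is added.

```python
from itertools import product

def implies(a, b):
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b

# The four givens, each as a test on an assignment (A, B, C, D).
premisses = [
    lambda A, B, C, D: implies(A, B),
    lambda A, B, C, D: implies(B, C),
    lambda A, B, C, D: implies(C, D),
    lambda A, B, C, D: not D,
]

def satisfying_rows(constraints):
    """Return every assignment (A, B, C, D) that meets all constraints."""
    return [row for row in product([True, False], repeat=4)
            if all(c(*row) for c in constraints)]

givens_only = satisfying_rows(premisses)
with_A = satisfying_rows(premisses + [lambda A, B, C, D: A])

print(givens_only)  # [(False, False, False, False)]
print(with_A)       # []
```

The only assignment satisfying the givens makes A false, which is the semantic counterpart of refuting A by reductio.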

6. Absurd Definitions and Specifications

Even as instructions can issue in absurdity, so can definitions and explanations. As for example:

  • A zor is a round square that is colored green.

Again consider the following pair:

  • A bird is a vertebrate animal that flies.
  • An ostrich is a species of flightless bird.

Definitions or specifications that are in principle unsatisfiable are for this very reason absurd.

7. Per Impossible Reasoning

Per impossible reasoning also proceeds from a patently impossible premiss. It is closely related to, albeit distinctly different from reductio ad absurdum argumentation. Here we have to deal with literally impossible suppositions that are not just dramatically but necessarily false thanks to their logical conflict with some clearly necessary truths, be the necessity at issue logical or conceptual or mathematical or physical. In particular, such an utterly impossible supposition may negate:

  • a matter of (logico-conceptual) necessity (“There are infinitely many prime numbers”).
  • a law of nature (“Water freezes at low temperatures”).

Suppositions of this sort commonly give rise to per impossible counterfactuals such as:

  • If (per impossible) water did not freeze, then ice would not exist.
  • If, per impossible, pigs could fly, then the sky would sometimes be full of porkers.
  • If you were transported through space faster than the speed of light, then you would return from a journey younger than at the outset.
  • Even if there were no primes less than 1,000,000,000, the number of primes would be infinite.
  • If (per impossible) there were only finitely many prime numbers, then there would be a largest prime number.

A somewhat more interesting mathematical example is as follows: If, per impossible, there were a counterexample to Fermat’s Last Theorem, there would be infinitely many counterexamples, because if x^k + y^k = z^k, then (nx)^k + (ny)^k = (nz)^k, for any positive integer n.
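The scaling step behind this example can be written out explicitly. Here x, y, z and k stand for the hypothetical counterexample and n for any positive integer; nothing beyond the distributive law is assumed.

```latex
(nx)^k + (ny)^k \;=\; n^k x^k + n^k y^k \;=\; n^k \left( x^k + y^k \right) \;=\; n^k z^k \;=\; (nz)^k
```

Each choice of n thus yields a further triple (nx, ny, nz) satisfying the same equation for the same exponent k.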

With such per impossible counterfactuals we envision what is acknowledged as an impossible and thus necessarily false antecedent, doing so not in order to refute it as absurd (as in reductio ad absurdum reasoning), but in order to do the best one can to indicate its “natural” consequences.

Again, consider such counterfactuals as:

  • If (per impossible) 9 were divisible by 4 without a remainder, then it would be an even number.
  • If (per impossible) Napoleon were still alive today, he would be amazed at the state of international politics in Europe.

A virtually equivalent formulation of the very point at issue with these two contentions is:

  • Any number divisible by 4 without remainders is even.
  • By the standards of Napoleonic France the present state of international politics in Europe is amazing.

However, the designation per impossible indicates that it is the conditional itself that concerns us. Our concern is with the character of that consequence relationship rather than with the antecedent or consequent per se. In this regard the situation is quite different from reductio argumentation, by which we seek to establish the untenability of the antecedent. To all intents and purposes, then, counterfactuals can serve a distinctly factual purpose.

And so, often what looks to be a per impossible conditional actually is not. Thus consider:

  • If I were you, I would accept his offer.

Clearly the antecedent/premiss “I = you” is absurd. But even the slightest heed of what is communicatively occurring here shows that what is at issue is not this just-stated impossibility but a counterfactual of the format:

  • If I were in your place (that is, if I were circumstanced in the condition in which you now find yourself), then I would accept his offer.

Only by being perversely literalistic could the absurdity of that antecedent be of any concern to us.

One final point. The contrast between reductio and per impossible reasoning conveys an interesting lesson. In both cases alike we begin with a situation of exactly the same basic format, namely a conflict or contradiction between an assumption or supposition and various facts that we already know. The difference lies entirely in pragmatic considerations, in what we are trying to accomplish. In the one (reductio) case we seek to refute and rebut that assumption so as to establish its negation, and in the other (per impossible) case we are trying to establish an implication, that is, to validate a conditional. The difference at bottom thus lies not in the nature of the inference at issue, but only in what we are trying to achieve by its means. The difference accordingly is not so much theoretical as functional: it is a pragmatic difference in objectives.

8. References and Further Reading

  • David Daube, Roman Law (Edinburgh: Edinburgh University Press, 1969), pp. 176-94.
  • M. Dorolle, “La valeur des conclusions par l’absurde,” Revue philosophique, vol. 86 (1918), pp. 309-13.
  • T. L. Heath, A History of Greek Mathematics, vol. 2 (Oxford: Clarendon Press, 1921), pp. 488-96.
  • A. Heyting, Intuitionism: An Introduction (Amsterdam: North-Holland Pub. Co., 1956).
  • William and Martha Kneale, The Development of Logic (Oxford: Clarendon Press, 1962), pp. 7-10.
  • J. M. Lee, “The Form of a reductio ad absurdum,” Notre Dame Journal of Formal Logic, vol. 14 (1973), pp. 381-86.
  • Gilbert Ryle, “Philosophical Arguments,” Colloquium Papers, vol. 2 (Bristol: University of Bristol, 1992), pp. 194-211.

Author Information

Nicholas Rescher
Email: rescher+@pitt.edu
University of Pittsburgh
U. S. A.

Russell-Myhill Paradox

The Russell-Myhill Antinomy, also known as the Principles of Mathematics Appendix B Paradox, is a contradiction that arises in the logical treatment of classes and “propositions”, where “propositions” are understood as mind-independent and language-independent logical objects. If propositions are treated as objectively existing objects, then they can be members of classes. But propositions can also be about classes, including classes of propositions. Indeed, for each class of propositions, there is a proposition stating that all propositions in that class are true. Propositions of this form are said to “assert the logical product” of their associated classes. Some such propositions are themselves in the class whose logical product they assert. For example, the proposition asserting that all-propositions-in-the-class-of-all-propositions-are-true is itself a proposition, and therefore it itself is in the class whose logical product it asserts. However, the proposition stating that all-propositions-in-the-null-class-are-true is not itself in the null class. Now consider the class w, consisting of all propositions that state the logical product of some class m in which they are not included. This w is itself a class of propositions, and so there is a proposition r, stating its logical product. The contradiction arises from asking the question of whether r is in the class w. It seems that r is in w just in case it is not.

This antinomy was discovered by Bertrand Russell in 1902, a year after discovering a simpler paradox usually called “Russell’s paradox.” It was discussed informally in Appendix B of his 1903 Principles of Mathematics. In 1958, the antinomy was independently rediscovered by John Myhill, who found it to plague the “Logic of Sense and Denotation” developed by Alonzo Church.

Table of Contents

  1. History and Historical Importance
  2. Formulation and Derivation
  3. Frege’s Response
  4. Possible Solutions
  5. References and Further Reading

1. History and Historical Importance

In his early work (prior to 1907) Russell held an ontology of propositions understood as being mind independent entities corresponding to possible states of affairs. The proposition corresponding to the English sentence “Socrates is wise” would be thought to contain both Socrates the person and wisdom (understood as a Platonic universal) as constituent entities. These entities are the meanings of declarative sentences.

After discovering “Russell’s paradox” in 1901 while working on his Principles of Mathematics, Russell began searching for a solution. He soon came upon the Theory of Types, which he describes in Appendix B of the Principles. This early form of the theory of types was a version of what later came to be known as the “simple theory of types” (as opposed to ramified type theory). The simple theory of types was successful in solving the simpler paradox. However, Russell soon asked himself whether there were other contradictions similar to Russell’s paradox that the simple theory of types could not solve. In 1902, he discovered such a contradiction. As with the simpler paradox, Russell discovered it by considering Cantor’s power class theorem: the mathematical result that the number of classes of entities in a certain domain is always greater than the number of entities in the domain itself. However, there seems to be a 1-1 correspondence between classes of propositions and propositions themselves. A different proposition can seemingly be generated for each class of propositions, for instance, the proposition stating that all propositions in the class are true. This would mean that the number of propositions is as great as the number of classes of propositions, in violation of Cantor’s theorem.
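Cantor’s theorem can be illustrated on a small finite domain. The sketch below is only an illustration, not Russell’s own argument: the three-element domain and the helper names `powerset` and `diagonal_class` are assumptions made for the example. For every possible way of assigning one class to each entity, it builds the “diagonal” class of entities that are not members of the class assigned to them and checks that this class is never among the classes assigned.

```python
from itertools import combinations, product

def powerset(domain):
    """Return every subclass of `domain` as a frozenset."""
    items = list(domain)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_class(domain, assign):
    """Class of entities x that are not members of the class assigned to x."""
    return frozenset(x for x in domain if x not in assign[x])

domain = ["a", "b", "c"]
classes = powerset(domain)

# Try every assignment of one class to each entity (8**3 = 512 assignments).
missed_every_time = True
for choice in product(classes, repeat=len(domain)):
    assign = dict(zip(domain, choice))
    if diagonal_class(domain, assign) in assign.values():
        missed_every_time = False

print(len(domain), len(classes))  # 3 8
print(missed_every_time)          # True
```

No assignment of classes to entities covers every class, which is why a domain always has more classes than entities.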

Unlike Russell’s paradox, this paradox cannot be blocked by the simple theory of types. The simple theory of types divides entities into individuals, properties of individuals, properties of properties of individuals, and so forth. The question of whether a certain property applies to itself does not arise, because properties never apply to entities of their own type. Thus there is no question as to whether the property that a property has just in case it does not apply to itself applies to itself. Classes can only have entities of a certain type: the type to which the property defining the class applies. There can be classes of individuals, classes of classes of individuals, and classes of classes of classes of individuals, etc., but never classes that contain members of different types. Thus, there is no such thing as the class of all classes that are not in themselves. However, on the simple theory of types, propositions are not properties of anything, and thus, they are all in the type of individuals. However, they can include classes or properties as constituents. But consider the property a proposition has just in case it states the logical product of a class it is not in. This property defines a class. This class will be a class of individuals; for any individual, the question arises whether that individual is in the class. However, the proposition stating the logical product of this class is also an individual. Thus, the problematic question is not avoided by the simple theory of types.

Some authors have speculated that this antinomy was the first hint Russell found that what was needed to solve the paradoxes was something more than the simple theory of types. If so, then this antinomy is of considerable importance, as it might represent the first motivation for the ramified theory of types adopted by Russell and Whitehead in Principia Mathematica.

2. Formulation and Derivation

In 1902, when he discovered this paradox, Russell’s logical notation was borrowed mostly from Peano. However, translating into more contemporary notation, the class w of all propositions stating the logical product of a class they are not in, and r, the proposition stating its logical product, are written as follows:

w = {p: (∃m)[(p = (∀q)(q ∈ m ⊃ q)) & ~(p ∈ m)]}
r = (∀q)(q ∈ w ⊃ q)

Because propositions are entities, variables for them in Russell’s logic can be bound by quantifiers and can flank the identity sign. Indeed, Russell also allows complete sentences or formulae to flank the identity sign. If α is some complex formula, then “p = α” is to be understood as asserting that p is the proposition that “α”. Thus, w is defined as the class of propositions p such that there is a class of m for which p is the proposition that all propositions q in m are true, and such that p is not in m. The proposition r is then defined as the proposition stating that all propositions in w are true.

The derivation of the contradiction requires certain principles involving the identity conditions of propositions understood as entities. These principles were never explicitly formulated by Russell, but are informally stated in his discussion of the antinomy in the Principles. However, other writers have sought to make these principles explicit, and even to develop a fully formulated intensional logic of propositions based on Russell’s views. The principles relevant for the derivation of the contradiction are the following:

Principle 1: (∀p)(∀q)(∀r)(∀s)[((p ⊃ q) = (r ⊃ s)) → ((p = r) & (q = s))]
Principle 2: [(∀x)A(x) = (∀x)B(x)] → (∀y)[A(y) = B(y)]

The first principle states that identical conditional propositions have identical antecedent and consequent component propositions. The second states that if the universal proposition that everything satisfies open formula A(x) is the same as the universal proposition that everything satisfies open formula B(x), then for any particular entity y, the proposition that A(y) is identical to the proposition that B(y).

Then, from either the assumption that r ∈ w or the assumption ~(r ∈ w), the opposite follows.

Assume:

1. r ∈ w

From (1), by class abstraction and the definition of w:

2. (∃m)[(r = (∀q)(q ∈ m ⊃ q)) & ~(r ∈ m)]

(2) allows us to consider some m such that:

3. (r = (∀q)(q ∈ m ⊃ q)) & ~(r ∈ m)

From the first conjunct of (3) and the definition of r we arrive at:

4. (∀q)(q ∈ w ⊃ q) = (∀q)(q ∈ m ⊃ q)

By (4) and principle 2, then:

5. (∀q)[(q ∈ w ⊃ q) = (q ∈ m ⊃ q)]

Instantiating (5) to r, we conclude:

6. (r ∈ w ⊃ r) = (r ∈ m ⊃ r)

By (6), and principle 1, then:

7. (r ∈ w) = (r ∈ m)

The second conjunct of (3) gives:

8. ~(r ∈ m)

By (7) and (8) and substitution of identicals, we get:

9. ~(r ∈ w)

This contradicts our assumption. However, assume instead:

10. ~(r ∈ w)

By (10) and class abstraction:

11. ~(∃m)[(r = (∀q)(q ∈ m ⊃ q)) & ~(r ∈ m)]

By the rules of the quantifiers and propositional logic, (11) becomes:

12. (∀m)[(r = (∀q)(q ∈ m ⊃ q)) → (r ∈ m)]

Instantiating (12) to w:

13. (r = (∀q)(q ∈ w ⊃ q)) → (r ∈ w)

By (13), the definition of r, and modus ponens:

14. r ∈ w

Thus, from either assumption the opposite follows.
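The shape of this derivation can be mimicked in a finite toy model. The sketch below is an assumption-laden illustration rather than a formalization of Russell’s logic: “propositions” are mere labels, `product_of` is an arbitrary map sending each class of propositions to the proposition taken to state its logical product, and the function name `russell_myhill_collision` is hypothetical. In such a model no contradiction arises; instead the construction of w and r exhibits two distinct classes with the same product, so the analogue of the identity principles (which would make `product_of` one-to-one) must fail.

```python
import random
from itertools import combinations

def powerset(items):
    """Return every class of the given propositions as a frozenset."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def russell_myhill_collision(props, product_of):
    """Return (w, r, m) with m != w and product_of[m] == product_of[w] == r."""
    # w: propositions stating the product of some class they are not in.
    w = frozenset(p for p in props
                  if any(product_of[m] == p and p not in m for m in product_of))
    r = product_of[w]
    # If r were outside w, the class w itself would witness r's membership in w.
    assert r in w
    # Hence some class m has product r while excluding r; m differs from w.
    m = next(m for m in product_of if product_of[m] == r and r not in m)
    return w, r, m

props = ["p0", "p1", "p2"]
classes = powerset(props)
rng = random.Random(0)
for _ in range(1000):
    product_of = {m: rng.choice(props) for m in classes}
    w, r, m = russell_myhill_collision(props, product_of)
    assert m != w and product_of[m] == product_of[w] == r
print("every sampled assignment sends two distinct classes to the same proposition")
```

Steps (10) to (14) correspond to the first assertion inside the function, and steps (1) to (9) correspond to the observation that the witness m would have to be w itself if distinct classes always had distinct products.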

3. Frege’s Response

Soon after discovering this antinomy, in September of 1902, Russell related his discovery to Gottlob Frege. Although Frege was clearly devastated by the simpler “Russell’s paradox”, which Russell had related to Frege three months prior, Frege was not similarly impressed by the Russell-Myhill antinomy. Russell had formulated the antinomy in Peano’s logical notation, and Frege charged that the apparent paradox derived from defects of Peano’s symbolism.

In Frege’s own way of speaking, a “proposition” is understood simply as a declarative sentence, a bit of language. Frege certainly did not ascribe to propositions the sort of ontology Russell did. However, he thought propositions had both senses and references. He called the senses of propositions “thoughts” and believed that their references were truth-values, either the True or the False. An expression written in his logical language was thought to stand for its reference (though express a thought). When propositions flank the identity sign, e.g. “p = q” this is taken as expressing that the two propositions have the same truth-value, not that they express the same thought.

Thus, Frege was unsatisfied with Russell’s formulation of the antinomy. In Russell’s definition “w = {p: (∃m)[(p = (∀q)(q ∈ m ⊃ q)) & ~(p ∈ m)]}”, the part “p = (∀q)(q ∈ m ⊃ q)” seems to mean not an identity of truth-values, but of thoughts. If this is the case, then “(∀q)(q ∈ m ⊃ q)” must be understood as referring to, rather than simply expressing, a thought. On Frege’s view, this would mean that the expressions that occur in it have indirect reference, i.e. they refer to the thoughts they customarily express. In indirect reference, the variable “m” in that context must be understood not as standing for a class, but as standing for a sense picking out a class. However, the second occurrence of “m” later on in the definition of w must be understood as referring to a class, not a sense picking out a class. If the two occurrences of “m” do not refer to the same thing, it is extremely problematic that they be bound by the same quantifier. Moreover, Russell’s derivation of the contradiction requires treating the two occurrences of “m” as referring to the same thing. Thus, Frege himself concluded that the antinomy was due to unclarities in the symbolism Russell used to formulate the paradox. He suggests that the antinomy can only be derived in a system that conflates or assimilates sense and reference.

However, it is not clear that Frege’s response is adequate. Frege criticizes only the syntactic formulation of the antinomy in a logical language, not the violation of Cantor’s theorem lying behind the paradox. Frege does not have an ontology of propositions, but he does have an ontology of thoughts. Thoughts, as objectively existing entities, can be members of classes. Moreover, it seems that there will be as many thoughts as there are classes of thoughts. One can generate a different thought for every class, i.e. the thought that everything is in the class or that all thoughts in the class are true. We now consider the class of all thoughts that state the logical product of a class they are not in, and a thought stating the logical product of this class, and arrive at the same contradiction. Frege’s metaphysics seems to have similar difficulties.

It is true that the antinomy cannot be formulated in Frege’s own logical systems. However, this is only because those systems are entirely extensional. In them, it is impossible to refer to thoughts (as opposed to simply express them) and assert their identity–one can only refer to truth-values and assert their identity. However, it appears that if Frege’s logical systems were expanded to include commitment to the realm of sense, to make it possible to refer not only to truth-values and classes, but thoughts and other senses, a version of the antinomy would be provable. In 1951, Alonzo Church developed an expanded logical system based loosely on Frege’s views, which he called “the Logic of Sense and Denotation”. In 1958, John Myhill discovered that the antinomy considered here was formulable in Church’s system. Myhill seems to have rediscovered the paradox independently of Russell. Hence the term, “Russell-Myhill Antinomy.”

4. Possible Solutions

The antinomy results from the following commitments:

(A) The commitment to classes, defined for every property,

(B) The commitment to propositions as intensional entities (or to similar entities, such as Frege’s thoughts),

(C) An understanding of propositions such that there must exist as many propositions as there are classes of propositions; i.e. a different proposition can be generated for every class,

(D) An understanding of propositions and classes such that for every proposition and every class of propositions, the question arises as to whether the proposition falls in the class.

One might hope to solve the antinomy by abandoning any one of these commitments. Let us examine them in turn.

Abandoning (A), the commitment to classes, is very tempting, especially given the other paradoxes of class theory. However, in this context, this option may not be as fruitful as it might appear. Russell himself worked on a “no classes” theory from 1905 through 1907. However, he soon discovered a classless version of the same paradox. Here, rather than considering a class w consisting of propositions, we consider a property W that a proposition p has just in case there is some property F for which p states that all propositions with F are true but which p does not itself have. Thus:

(∀p)[Wp ↔ (∃F)[(p = (∀q)(Fq ⊃ q)) & ~Fp]]

We then define proposition r as the proposition that all propositions with property W are true:

r = (∀q)(Wq ⊃ q)

Then, via a similar deduction to that given above, from the assumption of Wr one can prove ~Wr and vice versa. Thus it does not do to simply abandon classes. One would also have to abandon a robust ontology of properties; perhaps eschewing all of higher-order logic.

One might simply want to abandon (B), the commitment to propositions or Fregean thoughts understood as logical entities. The commitment to logical entities in a Platonic realm has grown less and less popular, especially given the widespread view that logic ought to be without ontological commitment. The challenge would be to abandon such intensional entities while maintaining a plausible account of meaning and intentionality.

However, one might hope to maintain commitment to propositions or thoughts, but attempt to reduce the number posited. This would likely involve denying (C). The Cantorian construction lying at the heart of the antinomy involves the claim that one can generate a different proposition for every class. In the construction given above, this claim is justified by showing that for each class, one can generate a proposition stating its logical product, and showing that, for each class, the proposition so generated is different. To deny this, one could either deny that one can generate such a proposition for each class, or instead, deny that the proposition so generated is different for every class. The first strategy is difficult to justify if one understands propositions and classes as objectively existing entities, independent of mind and language. If a proposition exists for every possible state of affairs, then one such proposition will exist for every class.

However, if one adopts looser identity conditions for propositions or thoughts, one might attempt to take the second approach to denying (C). That is, one would allow that the proposition stating the logical product of one class might be the same proposition as the proposition stating the logical product of a different class. This is perhaps not an easy approach to justify. In the Russellian deduction given above, principles 1 and 2 guarantee that the proposition stating the logical product of one class is always different from the proposition stating the logical product of another class. These principles seem justified by the understanding of propositions as composite entities with a certain fixed structure. Consider principle 1. It states that identical conditional propositions have identical propositions in their antecedent and consequent positions. However, this might be denied if one were to adopt looser identity conditions for propositions. One might, for example, adopt logical equivalence as being a sufficient condition for propositions to be identical. If so, then principle 1 would be unjustified. For example, p ⊃ q and ~q → ~p are logically equivalent; however, they obviously need not have the same antecedent propositions. This approach, though, may lead to other difficulties. Often, part of the motivation for intensional entities such as propositions or Fregean thoughts is to view them as relata in belief and other intentional states. If one adopts logical equivalence as sufficient for propositions to be identical, this is extremely problematic. The simple proposition p is logically equivalent to the proposition ~(p & ~q) → ~(q → ~p). If we take these two to be the same proposition, then if propositions are relata in belief states, we seemingly must conclude that anyone who believes p also believes ~(p & ~q) → ~(q → ~p). This does not seem to be true.
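The two logical equivalences appealed to here can be confirmed by truth table. The following is a minimal sketch, assuming that every conditional in the examples is the material conditional of classical propositional logic; the helper names `implies` and `equivalent` are illustrative.

```python
from itertools import product

def implies(a, b):
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b

def equivalent(f, g, n_vars=2):
    """True when f and g agree on every assignment of truth values."""
    return all(f(*row) == g(*row)
               for row in product([True, False], repeat=n_vars))

# A conditional and its contrapositive.
contraposition = equivalent(
    lambda p, q: implies(p, q),
    lambda p, q: implies(not q, not p),
)

# The simple proposition p and the complex proposition ~(p & ~q) -> ~(q -> ~p).
simple_vs_complex = equivalent(
    lambda p, q: p,
    lambda p, q: implies(not (p and not q), not implies(q, not p)),
)

print(contraposition, simple_vs_complex)  # True True
```

Both pairs agree on all four assignments, so identifying logically equivalent propositions would indeed collapse each pair into a single proposition.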

W. V. Quine is famous for suggesting that intensional entities are “creatures of darkness”, having obscure identity conditions. Here it appears that if the identity conditions of intensions are taken to be too loose, then intensions cannot do many of the things we want of them. If the identity conditions of intensions are too stringent, however, it is difficult to avoid positing so many of them that inconsistency with Cantor’s theorem is a genuine threat.

Lastly, one could maintain commitment to a great number of propositions or thoughts as entities, but block the paradox by suggesting that these entities fall into different logical types. That is, one could deny (D), and suggest instead that the question does not always arise for every proposition and class of propositions whether that proposition is in that class. This is in effect the approach taken with ramified type theory. In ramified type theory, the type of a formula α depends not only on whether α stands for an individual, a property of an individual, or a property of a property of an individual, etc., but also on what sort of quantification α involves. The core notion is that α cannot involve quantification over, or classes including, entities within a domain that includes the thing that α itself stands for. Consider the proposition r from the antinomy. Recall that r was defined as (∀q)(q ∈ w ⊃ q). Thus, r involves quantification over propositions. In ramified type theory, we would not allow r to fall within the range of the quantifier involved in the definition of r. If a certain proposition involves quantification over a range of propositions, it cannot be included in that range. Thus, we divide the type of propositions into orders. Propositions of the lowest order include mundane propositions such as the proposition that Socrates is bald or the proposition that Hypatia is wise. Propositions of the next highest order involve quantification over, or classes of, propositions of this order, such as the proposition that all such propositions are true, or the proposition that if such a proposition is true, then God believes it, etc. Here, the challenge is to justify the ramified hierarchy as something more than a simple ad hoc dodge of the antinomies, to provide it with solid philosophical foundations. Poincaré’s Vicious Circle Principle is perhaps one way of providing such justification.

Antinomies such as the Russell-Myhill antinomy must be a concern for anyone with a robust ontology of intensional entities. Nevertheless, there may be solutions to the antinomy short of eschewing intensions altogether.

5. References and Further Reading

  • Anderson, C. A. “Semantic Antinomies in the Logic of Sense and Denotation.” Notre Dame Journal of Formal Logic 28 (1987): 99-114.
  • Anderson, C. A. “Some New Axioms for the Logic of Sense and Denotation: Alternative (0).” Noûs 14 (1980): 217-34.
  • Church, Alonzo. “A Formulation of the Logic of Sense and Denotation.” In Structure, Method and Meaning: Essays in Honor of Henry M. Sheffer, edited by P. Henle, H. Kallen and S. Langer. New York: Liberal Arts Press, 1951.
  • Church, Alonzo. “Russell’s Theory of Identity of Propositions.” Philosophia Naturalis 21 (1984): 513-22.
  • Frege, Gottlob. Correspondence with Russell. In Philosophical and Mathematical Correspondence. Translated by Hans Kaal. Chicago: University of Chicago Press, 1980.
  • Klement, Kevin C. Frege and the Logic of Sense and Reference, New York: Routledge, 2002.
  • Myhill, John. “Problems Arising in the Formalization of Intensional Logic.” Logique et Analyse 1 (1958): 78-83.
  • Russell, Bertrand. Correspondence with Frege. In Philosophical and Mathematical Correspondence, by Gottlob Frege. Translated by Hans Kaal. Chicago: University of Chicago Press, 1980.
  • Russell, Bertrand. The Principles of Mathematics. 1903. 2d. ed. Reprint, New York: W. W. Norton & Company, 1996, especially §500.

Author Information

Kevin C. Klement
Email: klement@philos.umass.edu
University of Massachusetts, Amherst
U. S. A.

Logical Paradoxes

A paradox is generally a puzzling conclusion we seem to be driven towards by our reasoning, but which is highly counterintuitive, nevertheless. There are, among these, a large variety of paradoxes of a logical nature which have teased even professional logicians, in some cases for several millennia. But what are now sometimes isolated as “the logical paradoxes” are a much less heterogeneous collection: they are a group of antinomies centered on the notion of self-reference, some of which were known in Classical times, but most of which became particularly prominent in the early decades of last century. Quine distinguished amongst paradoxes such antinomies. He did so by first isolating the “veridical” and “falsidical” paradoxes, which, although puzzling riddles, turned out to be plainly true, or plainly false, after some inspection. In addition, however, there were paradoxes which “produce a self-contradiction by accepted ways of reasoning,” and which, Quine thought, established “that some tacit and trusted pattern of reasoning must be made explicit, and henceforward be avoided or revised.” We will first look, more broadly, and historically, at several of the main conundrums of a logical nature which have proved difficult, some since antiquity, before concentrating later on the more recent troubles with paradoxes of self-reference. They will all be called “logical paradoxes.”

Table of Contents

  1. Classical Logical Paradoxes
  2. Moving to Modern Times
  3. Some Recent Logical Paradoxes
  4. Paradoxes of Self-Reference
  5. A Contemporary Twist
  6. References and Further Reading

1. Classical Logical Paradoxes

The four main paradoxes attributed to Eubulides, who lived in the fourth century BC, were “The Liar,” “The Hooded Man,” “The Heap,” and “The Horned Man” (compare Kneale and Kneale 1962, p. 114).

The Horned Man is a version of the “When did you stop beating your wife?” puzzle. This is not a simple question, and needs a carefully phrased reply, to avoid the inevitable come-back to “I have not.” How is one to understand this denial, as saying you continue to beat your wife, or that you once did but do so no longer, or that you never have, and never will? It is a question of what the “not,” or negation means, in this case. If “stopped beating” means “beat before, but no longer,” then “not stopped beating” covers both “did not beat before” and “continues to beat.” And in that case “I haven’t” is an entirely correct answer to the question, if you in fact did not beat your wife. However, your audience might still need to be taken slowly through the alternatives before they clearly see this. Likewise with the Horned Man, which arises if someone wants to say, for instance, “what you have not lost you still have.” In that case they will maybe have to accept the unwelcome conclusion “I still have horns,” if they admit “I have not lost any horns.” Here, if “lost” means “had, but do not still have” then “not lost” would cover the alternative “did not have in the first place” as well as “do still have” — in which case what you have not lost you do not necessarily still have.
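The analysis of “lost” given here can be tabulated. The sketch below assumes exactly the reading stated in the text, namely that “lost” means “had, but do not still have”; the function name `lost` and the two boolean parameters are illustrative. It lists the situations compatible with “not lost” and picks out the one in which the thing is nevertheless not had.

```python
from itertools import product

def lost(had_before, has_now):
    """'Lost' on the reading: had it before, but does not have it now."""
    return had_before and not has_now

# All (had_before, has_now) situations in which the thing was NOT lost.
not_lost_cases = [(had, has)
                  for had, has in product([True, False], repeat=2)
                  if not lost(had, has)]

# Among those, the situations in which it is nevertheless not had now.
counterexamples = [(had, has) for had, has in not_lost_cases if not has]

print(not_lost_cases)   # [(True, True), (False, True), (False, False)]
print(counterexamples)  # [(False, False)]
```

The single counterexample is the case of never having had horns at all, which is why “what you have not lost you still have” does not hold in general.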

The Heap is nowadays commonly referred to as the Sorites Paradox, and concerns the possibility that the borderline between a predicate and its negation need not be finely drawn. We would all say that a man with no hairs on his head was bald, and that a man with, say, 10,000 hairs on his head was hirsute, that is not bald, but what about a man with only 1,000 hairs on his head, which are, say, evenly spread? It is not too clear what we should say, although maybe some would still want to say positively “bald,” while others would want to say “not bald.” The learned treatment of this issue, in recent years, has been very extensive, with “the lazy solution” not being the only one favoured, by any means. The lazy solution says that any lack of certainty about what to say is merely a matter of us not having yet decided upon, or even having the need to make up our mind about, a “precisification” of the concept of baldness. There are objectors to this “epistemic” way of seeing the matter, some of whom would prefer to think, for instance (see, e.g. Sainsbury 1995), that there was something essentially “fuzzy” about baldness, so it is a “vague predicate” by the nature of things, instead of just through lack of effort, or need. (For recent work in this area, see, for instance, Williamson 1994, and Keefe 2001).

The Hooded Man is about the concept of knowledge, and in other versions has again been much studied in recent years, as we shall see. In its original version the problem is this: maybe you would be prepared to say that you know your brother, yet surely someone might come in, who was in fact your brother, but with his head covered, so you did not know who it was. One aspect of this paradox is that the verb “know” is ambiguous, and in fact is translated by two separate terms in several other languages than English — French, for instance, has “connaître” and “savoir.” There is the sense of “being acquainted with,” in other words, and the sense of “knowing a fact about something.” Perhaps these two senses are inter-related, but distinguishing them provides one way out of the Hooded Man. For we can distinguish being acquainted with your brother from knowing that someone is your brother. Although you do not know it, you are certainly acquainted with the hooded man, since he is your brother, and you are acquainted with your brother. But that does not entail that you know that the hooded man is your brother, indeed, evidently you do not. We could also say, in that case, that you did not recognize your brother, for the notion of recognition is close to that of knowledge. And that points to another aspect of the problem, and another way of resolving the paradox — showing, in addition, that there needn’t just be one solution, or way out. Thus you might well be able to recognise your brother, but that does not require you can always do so, it merely means you can do better at this than those people who cannot do so. If we re-phrase the case: “you can recognise your brother, but you did not recognise him when he had his head covered,” then there is not really a paradox.

The last of Eubulides’ paradoxes mentioned above was The Liar, which is perhaps the most famous paradox in the “self-reference” family. The basic idea had several variations, even in antiquity. There was, for instance, The Cretan, where Epimenides, a Cretan, says that all Cretans are liars, and The Crocodile, where a crocodile has stolen someone’s child, and says to him “I will return her to you if you guess correctly whether I will do so or not” — to which the father says “You will not return my child”! Indeed a whole host of complications of The Liar have been constructed, especially in the last century, as we shall see. Now in The Cretan there is no real antinomy — it may simply be false that all Cretans are liars; but if someone says just “I am lying,” the situation is different. For if it is true that he is lying then seemingly what he says is false; but if it is false that he is lying then what he is saying may seem to be true. A pedant might say that “lying” was strictly not telling an untruth, but telling merely what one believes to be an untruth. In that case there is not the same difficulty with the person’s remark being true: maybe he is indeed lying, although he does not believe it. The pedant, however, misses the point that his verbal nicety can be circumvented, and the paradox re-constructed in another, indeed many other forms. We shall look in more detail later at the paradox here, in some of its more complicated versions.

Before leaving the ancients, however, we can look at Zeno’s Paradoxes, which not only have a logical interest in their own right, but also have a very close bearing on some paradoxes which appeared later, to do with infinity, and infinitesimals. Zeno’s Paradoxes are primarily about the possibility of motion, but more generally they are about the possibility of specifying the units, or atomic parts, of which either space or time, or indeed any continuum may be thought to be composed.

For, Zeno argued (see, for instance, Owen 1957, and Salmon 1970), if there were such units then they would either have a size, or not have a size. But if they had a size we would have the paradox of The Stadium, while if they had no size we would have the paradox of The Arrow. Thus if runners A and B are approaching one another both at unit speed, then, supposing the units have a finite size, after one time unit they will have each moved one space unit relative to the stadium. But they will have moved two space units relative to each other, which implies that there was a time unit in between when they were just one space unit apart. So the time unit must be divisible after all. On the other hand, if the units of division have no size, then, at any given time, an arrow in flight must occupy a space just equal to itself — for it cannot move within that time. But if so then it is at rest, and the arrow never moves.

That would seem to mean that space and time are divided without limit. But Zeno argued that if space and time were in themselves divided without limit then we would have the paradox of Achilles and the Tortoise. A runner, before he gets to the end of his race would have to get to the half-way point, but then also to the half-way point beyond that, that is the three-quarter-way point, and so on. There would be no limit to the sequence of points he would have to get to, and so there would always be a bit more to be run, and he could never get to the end. Likewise in a competitive race, even, say, between the super-speedy Achilles and a tortoise: Achilles would not be able to catch the tortoise up — so long as the tortoise was given a start. For Achilles has first to get to the tortoise’s original position, but by then the tortoise will be, however fractionally, further on. Now Achilles must always reach the tortoise’s previous position before catching him up. Hence he never catches it up.
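The sequence of points the runner must reach can be written out explicitly. The following is only a minimal sketch, assuming the course has length 1 and using exact fractions; the function name `halfway_points` is hypothetical and is introduced here purely for illustration. It shows that, however many stages are listed, some distance always remains, which is the feature of the case on which Zeno's argument turns.

```python
from fractions import Fraction


def halfway_points(steps):
    """Return the successive points the runner must reach: 1/2, 3/4, 7/8, ..."""
    points = []
    position = Fraction(0)
    for _ in range(steps):
        # At each stage the runner covers half of the distance still remaining
        # to the finish, which is assumed to lie at position 1.
        position += (1 - position) / 2
        points.append(position)
    return points


points = halfway_points(5)
# Distance still to be run after each stage: 1/2, 1/4, 1/8, ...
remaining = [1 - p for p in points]
for p, r in zip(points, remaining):
    print(f"reached {p}, still to run {r}")
```

The sketch lists the stages only; it takes no side on whether the runner completes them.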

Aristotle had a way of resolving Zeno’s Paradoxes which convinced most people until more recent times. Aristotle’s resolution of Zeno’s Paradoxes involved distinguishing between space and time being in themselves divided into parts without limit, and simply being divisible (by ourselves, for instance) without limit. No continuous magnitude, Aristotle thought, is actually composed of parts, since, although it may be divisible into parts without limit, the continuum is given before any such resulting division into parts. In particular, Aristotle denied that there could be any non-finite parts, and so is often called a “Finitist”: non-finite “parts” cannot be parts of space or time, he thought, since no magnitude can be composed of what has no extension. This view came to be challenged later, since it means that an arrow can only be “at rest” if it is at the same place at two separate times — for Aristotle both rest and motion can only be defined over a finite increment of time. But later the notion of an instantaneous velocity came to be accepted, and that includes the case where the velocity is zero.

The puzzle about non-finite parts may remind one of the question which occupied many scholastic theologians in the Middle Ages: how many angels can sit upon a pin? And it is perhaps no accident that the theorist who gave the currently received answer to the general question of how many things without any extension make up a whole which has such an extension was a fervent believer in God. Certainly Aristotle’s Finitism only stayed generally persuasive until the latter end of the nineteenth century, when the theorist in question, Cantor, specified the number of non-finite points in a continuum to most learned people’s satisfaction.

2. Moving to Modern Times

Between the classical times of Aristotle and the late nineteenth century when Cantor worked, there was a period in the Middle Ages when paradoxes of a logical kind were considered intensively. That was during the fourteenth century. Notable individuals were Paul of Venice, living towards the end of that century, and John Buridan, born just before it. As models of the care and clarity required to extricate oneself from the above kind of difficulties with problem propositions, each of these writers will surely stand forever. As an illustration, Buridan discusses “No change is instantaneous” in the following way (Scott 1966, p178):

I prove it, because every change is either in an indivisible instant or it is in a divisible time. But none is in an indivisible instant, since an indivisible instant cannot be given in time, as is always supposed. Hence every change is in divisible time, and every such must be called temporal and not instantaneous.

The opposite is argued, because at least the creation of our intellective soul is instantaneous. For since it is indivisible, it must be made altogether at once, not one part after another. And such creation we call instantaneous. Therefore.

Buridan also discusses “You know the one approaching,” which resembles Eubulides’ Hooded Man (Scott 1966, p178):

I posit the case that you see your father coming from a distance, in such a way that you do not discern whether it is your father or another. Then it is proved, because you do indeed know your father, and he is the one approaching; hence, you know the one approaching. Likewise, you know him who is known by you, but the one approaching is known by you; hence, you know the one approaching. I prove the minor, because your father is known by you and your father is the one approaching; hence, the one approaching is known by you.

The opposite is argued, because you do not know him of whom, if you are asked who he is, you will answer truly “I do not know.” But concerning the one approaching you say this; hence etc.

These two cases are “sophisms” in Buridan’s book on such, Sophismata, and amongst these, in chapter 8, are the “insolubles,” which are the ones involving some form of self-reference. Broadly speaking, that is to say, Buridan made a distinction similar to that mentioned before, between general paradoxes of a logical nature, and “the logical paradoxes.” Thus in his chapter 8 Buridan discusses Eubulides’ Liar Paradox in several forms, for instance as it arises with “Every proposition is false” in the following circumstances (Scott 1966, p191): “I posit the case that all true propositions should be destroyed and false ones remain. And then Socrates utters only this proposition: ‘Every proposition is false’.”

Extended discussion of such cases may seem somewhat academic, but between Buridan’s period, and more recent times, one notable figure started to bring out something of the larger importance of these issues. Indeed, quite generally, sophisms about the nature of change and continuity, about knowledge and its objects, and the ones about the notion of self-reference, amongst many others, have attracted a great deal of very professional attention, once their significance was realised, with techniques of analysis drawn from developments in formal logic and linguistic studies being added to the careful and clear expression, and modes of argument found in the best writers before. The pace of change started to quicken in the later nineteenth century, but the one earlier thinker who will also be mentioned here is Bishop Berkeley, who was active in the early eighteenth. For a history of this period, in connection with the issues which concerned Berkeley, see, for instance, Grattan-Guinness 1980. Berkeley’s argument was with Newton about the foundations of the calculus; he took, amongst other things, a sceptical line about the possibility of instantaneous velocities.

It will be remembered that in the calculation of a derivative the following fraction is considered:

(f(x + δx) − f(x)) / δx,

where δx is a very small quantity. In the elementary case where f(x) = x², for instance, we get

((x + δx)² − x²) / δx,

and the calculation goes first to

(2xδx + δx²) / δx,

and then to 2x + δx, with δx being subsequently set to zero to get the exact derivative 2x. Berkeley objected that only if δx was not zero could one first divide through by it, and so one was in no position, with the result of that operation, to then take δx to be zero. If it took δx to be zero, Newton’s calculus, it seemed, required the impossible notion of an instantaneous velocity, which, of course, Aristotle had denied in connection with his analysis of Zeno’s Paradoxes. The point was appreciated to some extent elsewhere. For the association between the derivative and motion, initiated by Newton’s use of the term “fluxion,” was largely confined to England, and on the Continent, Leibniz’ contemporaneous development of the calculus had more hold. And that involved the idea that the increment δx was never zero, but merely remained a still finite “infinitesimal.”
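Berkeley's point can be made concrete with a small calculation. What follows is a minimal sketch, assuming f(x) = x² as in the text and using exact rational arithmetic; the names `f` and `difference_quotient` are hypothetical. For every non-zero increment the fraction equals 2x + δx exactly, while for a zero increment the fraction is simply undefined.

```python
from fractions import Fraction


def f(x):
    """The elementary case from the text: f(x) = x squared."""
    return x * x


def difference_quotient(x, dx):
    """Compute (f(x + dx) - f(x)) / dx for a non-zero increment dx."""
    if dx == 0:
        # Dividing through by the increment is only legitimate when it is non-zero.
        raise ZeroDivisionError("the increment must not be zero")
    return (f(x + dx) - f(x)) / dx


x = Fraction(3)
for dx in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    quotient = difference_quotient(x, dx)
    # For f(x) = x squared the quotient is exactly 2x + dx, never exactly 2x.
    assert quotient == 2 * x + dx
    print(dx, quotient)
```

However small the increment is made, the computed value differs from 2x by exactly that increment, so the step to 2x itself needs some further justification.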

One way of putting Aristotle’s Finitism is to say that he believed that infinities, such as the possible successive divisions of a line, were only “potential,” not “actual” — an actual infinite division would end up with non-extensional, and so non-finite points. Leibniz, however, had no problem with the notion of an actual infinite division of a line — or with the idea that the result could be a finite quantity. However, while Leibniz introduced finite infinitesimals instead of fluxions, this idea was also questioned as not sufficiently rigorous, and both ideas lost ground to definitions of derivatives in terms of limits, by Cauchy and Weierstrass in the nineteenth century. Leibniz’ notion of finite infinitesimals in fact has been given a more rigorous definition since that time, by Abraham Robinson, and other proponents of “non-standard analysis,” but it was on the previous, nineteenth century theory of real numbers that Cantor worked, before he came to formulate his theory of infinite numbers. Leibniz would not have thought it too sensible to ask how many of his infinitesimals made up the line, but Cantor made much more precise the answer “infinitely many.”

It is necessary to get some idea about the theory of real numbers before we can understand the next logical paradoxes which emerged in this tradition: Russell’s Paradox, Burali-Forti’s Paradox, Cantor’s Paradox, and Skolem’s Paradox. We will look at those in the next section, which will then lead us into twentieth century developments in the area of self-reference. But before all that it should be mentioned how recent discussions of knowledge and its objects, for instance, have become very professionalised, since developed discussion of issues to do with Eubulides’ Hooded Man has been just as dominant in this period.

These issues, it will be remembered, centred on the problem of non-recognition, and in various ways two central cases of this have been given close attention since the end of the nineteenth century. A great deal of other relevant discussion has also gone on, but these two cases are perhaps the most important, historically (see, for example, Linsky 1967). First must be mentioned Frege’s interest in the difficulty of inferring someone believes something about the Evening Star so long as they believe that thing about the Morning Star. In fact the Morning Star is the same as the Evening Star, we now realise, but this was not always recognised, and indeed it is now realised that even the term “star” is a misnomer, both objects being the planet Venus. Still someone ignorant of the astronomical identity, it may be thought, might accept “The Evening Star is in the sky,” but reject “The Morning Star is in the sky.” Quine produced another much discussed case of a similar sort, concerning Bernard J. Ortcutt, a respectable man with grey hair, once seen at the beach. In one location he was taken to be not a spy, in another place he was taken to be a spy, as one might say; but is that quite the best way the situation should be described? Maybe one who does not recognise him can have beliefs about the man at the beach without thereby having those beliefs about the respectable man with grey hair — or even Bernard J. Ortcutt. Certainly Quine thought so, which has not only caused a large scale controversy in itself; it has also led to, or been part of much broader discussions about identity in similar, but non-personal, intensional notions, like modality. Thus, as Quine pointed out, it would not seem to be necessary that the number of the planets is greater than 4, although it is necessary that 9 is greater than 4, and 9 is the number of the planets. A branch of formal logic, Intensional Logic, has been developed to enable a more precise analysis of these kinds of issue.

3. Some Recent Logical Paradoxes

It was developments in other parts of mathematics which were integral to the discovery of the next logical paradoxes to be considered. These were developments in the theory of real numbers, as was mentioned before, but also in Set Theory, and Arithmetic. Arithmetic is now taken to be concerned with a “denumerable” number of objects — the natural numbers — while real numbers are “non-denumerable.” Sets of both infinite sizes can be formed, it is now thought, which is the basis on which Cantor was to give his precise answer “two to the aleph zero” to the question of how many points there are on a line.

The tradition up to the middle of the nineteenth century did not look at these matters in this kind of way. For the natural numbers arise in connection with counting, for instance counting the cows in a field. If there are a number of cows in the field then there is a set of them: sets are collections of such individuals. But with the beef in the field we do not normally talk in these terms: “beef” is a mass noun not a count noun, and so it does not individuate things, merely name some stuff, and, as a result, a number can be associated with the beef in the field only given some arbitrary unit, like a pound, or a kilogram. When there is just some F then there isn’t a number of F’s, although there might be a number of, say, pieces of F. It is the same with continua like space and time, which we can divide into yards, or seconds, or indeed any finite quantity, and that is perhaps the main fact which supports Aristotle’s view that any division of such a continuum is merely potential rather than actual, and inevitably finite both in the unit used and in the number of them in a whole.

But continua from Cantor onwards have been seen as composed of non-finite individuals. And not only that is the change. For also the number of individuals in some set of individuals — whether cows, or the non-finite elements in beef — has been taken to be possibly non-finite, with a whole containing those individuals being then still available: the infinite set of them. We now commonly have the idea that there may be infinite sets first of finite entities, which will then be “countable” or “denumerable,” but also there will be sets of non-finite, infinitesimal entities, which will be “uncountable,” or “non-denumerable.”

It is important to appreciate the grip that these new ideas had on the late nineteenth century generation of mathematicians and logicians, since it came to seem, as a result of these sorts of changes, that everything in mathematics was going to be explainable in terms of sets: Set Theory looked like it would become the entire foundation for mathematics. Only once one has appreciated this expectation, which the vanguard of theorists uniformly had, can one realise the very severe jolt to that society which came with the discovery of Russell’s Paradox, and several others at much the same time, around the turn of the century. For Russell’s Paradox showed that not everything could be a set.

If we write “x is F” as “Fx”—as came to be common in this same period—then the set of F’s is written

{x|Fx},

and to say a is F, that is Fa, would then seem to be to say that a belonged to this set, that is

a ∈ {x|Fx},

where the symbol “∈” represents “is a member of.”

It therefore seems plausible to enunciate this as a general principle,

for all y: y ∈ {x|Fx} if and only if Fy,

which is symbolised in contemporary logic,

(y)(y ∈ {x|Fx} iff Fy).

But if the result held for all predicates “F” then we could say, for any “F”

there is a z such that: (y)(y ∈ z iff Fy),

which is now formalised

(∃z)(y)(y ∈ z iff Fy).

In the foundations of Arithmetic which Frege described in his major logical works Begriffsschrift, and Grundgesetze, this principle is a major axiom (Kneale and Kneale 1962, Ch 8), but Russell found it was logically impossible, since if one takes for “Fy” the specific predicate “y does not belong to y,” that is “¬ y ∈ y” then it requires

(∃z)(y)(y ∈ z iff ¬ y ∈ y),

wherefrom, given the above meanings of “(∃z)” and “(y)”, we get the contradiction

z ∈ z iff ¬ z ∈ z,

that is z is a member of itself if and only if it is not a member of itself. As a result of this paradox which Russell discovered, the theory of sets was considerably altered, and limits were put on Frege’s axiom, so that, for instance, either it defined merely subsets of known sets (Zermelo’s theory), or allowed one to discriminate sets from other entities — usually called “proper classes” (von Neumann’s theory). In the latter case those things which are not members of themselves form a proper class but not a set, and proper classes cannot be members of anything.
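That the difficulty is purely logical, and does not depend on what membership happens to be, can be checked mechanically for small cases. The following is a minimal sketch, assuming a finite domain of objects and treating “∈” as an arbitrary two-place relation on it; the function names are hypothetical. It runs through every possible membership relation on the domain and counts those in which some z contains exactly the things that are not members of themselves.

```python
from itertools import product


def has_russell_element(domain, member):
    """True if some z satisfies: for all y, (y in z) iff not (y in y)."""
    return any(
        all(((y, z) in member) == ((y, y) not in member) for y in domain)
        for z in domain
    )


def count_counterexamples(n):
    """Count the membership relations on n objects that contain a Russell element."""
    domain = range(n)
    pairs = [(y, z) for y in domain for z in domain]
    found = 0
    # Each assignment of True/False to the pairs is one candidate membership relation.
    for bits in product([False, True], repeat=len(pairs)):
        member = {pair for pair, bit in zip(pairs, bits) if bit}
        if has_russell_element(domain, member):
            found += 1
    return found


# All 512 candidate relations on three objects are examined.
print(count_counterexamples(3))
```

No relation passes the test, because taking y to be z itself always yields the contradictory condition. A finite search is of course only an illustration of the general argument given above, not a substitute for it.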

But there were other reasons why it came to be realised that sets could not always be formed, following the discovery of Burali-Forti’s and Cantor’s Paradoxes. Burali-Forti’s Paradox is about certain sets called “ordinals,” because of their connection with the ordinals of ordinary language, that is “first,” “second,” “third,” etc. The sets which are ordinals are so ordered that each one is a member of all the following ones, and so, with no limit envisaged to the sets which could be formed, it seemed possible to prove that any succession of such ordinals would themselves be members of a further ordinal – which would have to be distinct from each of them. The trouble came in considering the totality of all ordinals, since that would mean that there would have to be a further distinct ordinal not in this totality, and yet it was supposed to be the totality of all ordinals. A very similar contradiction is reached in Cantor’s Paradox.

For finite sets of finite entities it is easy to prove Cantor’s Theorem, namely that the number of members of a set is strictly less than the number of its subsets. If one forms a set of the subsets of a given set then one produces the “power set” of the original set, so another way of stating Cantor’s Theorem is to say that the number of members of a set is strictly less than the number of members of its power set. Cantor extended this theorem to his infinite sets as well – although there was at least one such set he realised it obviously could not apply to, namely the set of everything, sometimes called the universal set. For the set of its subsets clearly could not have a greater number than the number of things in the universal set itself, since that contained everything. This was Cantor’s Paradox, and his resolution of it was to say that such an infinity was “inconsistent,” since it could not be consistently numbered. He thought, however, that only the size of infinite sets had to be limited, assuming that lesser infinities could be consistently numbered, and nominating, for a start, “aleph zero” as the number, or more properly the “power” of the natural numbers (Hallett 1984, p175). In fact an earlier paradox about the natural numbers had suggested that even they could not be consistently numbered: for they could be put into 1 to 1 correlation with the even numbers, for one thing, and yet there were surely more of them, since they included the odd numbers as well. This paradox Cantor took to be avoided by his definition of the power of a set (N.B. not the power set of a set): his definition merely required two sets to be put into 1 to 1 correlation in order for them to have the same power. Thus all infinite sequences of natural numbers have the same power, aleph zero.
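The finite case of Cantor's Theorem, and the 1 to 1 correlation between the natural numbers and the even numbers, can both be illustrated directly. This is a minimal sketch under stated assumptions: the helper `power_set` is hypothetical, and only small finite sets and an initial segment of the correlation are examined.

```python
from itertools import combinations


def power_set(items):
    """Return every subset of a finite collection, as a list of frozensets."""
    items = list(items)
    return [
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    ]


# Finite case of Cantor's Theorem: a set with n members has 2**n subsets,
# which is strictly more than n.
for n in range(6):
    members = set(range(n))
    subsets = power_set(members)
    assert len(members) < len(subsets) == 2 ** n

# The start of the 1 to 1 correlation of each natural number n with the even number 2n.
correlation = [(n, 2 * n) for n in range(5)]
print(correlation)
```

The correlation can be continued indefinitely with no natural number and no even number left unpaired, which is all that Cantor's definition of sameness of power requires.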

But the number of points in a line was not aleph zero, it was two to the aleph zero, and Cantor produced several proofs that these were not the same. The most famous was his diagonal argument which seems to show that there must be orders of infinity, and specifically that the non-denumerably infinite is distinct from the denumerably infinite. For belief in real numbers is equivalent to belief in certain infinite sets: real numbers are commonly understood simply in terms of possibly-non-terminating decimals, but this definition can be derived from the more theoretical ones (Suppes 1972, p189). But can the decimals between, say, 0 and 1 be listed? Listing them would make them countable in the special sense of this which has been adopted, which amongst other things does not require there to be a last item counted. The natural numbers are countable in this sense, as before, and any list, it seems, can be indexed by the ordinal numbers. Suppose, however, that we had a list in which the n-th member was of this form:

aₙ = 0.aₙ₁aₙ₂aₙ₃aₙ₄…,

where aₙᵢ is a digit between 0 and 9 inclusive. Then that list would not contain the “diagonal” decimal aₘ defined by

aₘₙ = 9 − aₙₙ,

since for n = m this equation is false, if only whole digits are involved. This seems to show that the totality of decimals in any continuous interval cannot be listed, which implies that there are at least two separate orders of infinity.
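The construction can be carried out on any finite initial portion of a proposed list. The following is a minimal sketch, assuming each listed decimal is given by its first few digits; the function name `diagonal` and the sample digits are hypothetical, chosen only for illustration.

```python
def diagonal(rows):
    """Build the diagonal digits: the n-th digit is 9 minus the n-th digit of row n."""
    return [9 - rows[n][n] for n in range(len(rows))]


# The first four digits of the first four decimals in some proposed list.
rows = [
    [1, 4, 1, 5],
    [7, 1, 8, 2],
    [3, 3, 3, 3],
    [0, 0, 0, 9],
]

new_digits = diagonal(rows)
# The new decimal differs from the n-th listed decimal at its n-th place,
# since no whole digit d satisfies d = 9 - d.
for n, row in enumerate(rows):
    assert new_digits[n] != row[n]
print(new_digits)
```

The same rule applies however far the list and its digits are extended, so the constructed decimal cannot coincide with any member of the list.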

Of course, if there were no infinite sets then there would be no infinite numbers, countable or uncountable, and so an Aristotelian would not accept the result of this proof as a fact. Discrete things might, at the most, be potentially denumerable, for him. But the difficulty with the result extends even to those who accept that there are infinite sets, because of another paradox, Skolem’s Paradox, which shows that all theories of a certain sort must have a countable model, that is must be true in some countable domain of objects. But Set Theory is one such theory, and in it, supposedly, there must be non-countable sets. In fact a denumerable model for Set Theory has recently been specified, by Lavine (Lavine 1994), so how can Cantor’s diagonal proof be accommodated? Commonly it is accommodated by saying that, within the denumerable model of Set Theory, non-denumerability is represented merely by the absence of a function which can do the indexing of a set, that is produce a correlation between the set and the ordinal numbers. But if that is the case, then maybe the difficulty of listing the real numbers in an interval is comparable. Certainly given a list of real numbers with a functional way of indexing them, then diagonalisation enables us to construct another real number. But maybe there still might be a denumerable number of all the real numbers in an interval without any possibility of finding a function which lists them, in which case we would have no diagonal means of producing another. We seem to need a further proof that being denumerable in size means being listable by means of a function.

4. Paradoxes of Self-Reference

The possibility that Cantor’s diagonal procedure is a paradox in its own right is not usually entertained, although a direct application of it does yield an acknowledged paradox: Richard’s Paradox. Consider for a start all finite sequences of the twenty six letters of the English alphabet, the ten digits, a comma, a full stop, a dash and a blank space. Order these expressions according, first, to the number of symbols, and then lexicographically within each such set. We then have a way of identifying the n-th member of this collection. Now some of these expressions are English phrases, and some of those phrases will define real numbers. Let E be the sub-collection which does this, and suppose we can again identify the n-th place in this, for each natural number n. Then the following phrase, as Richard pointed out, would seem to define a real number which is not defined in the collection: “The real number whose whole part is zero, and whose n-th decimal place is p plus 1 if the n-th decimal of the real number defined by the n-th member of E is p, and p is neither 8 nor 9, and is simply one if this n-th decimal is eight or nine.” But this expression is a finite sequence of the previously described kind.
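The purely syntactic part of this construction, the ordering of all finite expressions, is straightforward to make explicit. The following is a minimal sketch: the names `ALPHABET`, `expressions` and `nth_expression` are hypothetical, and the order assumed among the forty symbols (here the character-code order of lower-case letters, digits and the four punctuation marks) is an arbitrary choice, since the text does not fix one. Selecting the sub-collection E is a different matter, because that depends on what the expressions mean.

```python
import string
from itertools import count, islice, product

# Twenty six letters, ten digits, a comma, a full stop, a dash and a blank space.
# The order among these symbols is an assumption made for this sketch.
ALPHABET = sorted(string.ascii_lowercase + string.digits + ",.- ")


def expressions(alphabet=ALPHABET):
    """Yield every finite sequence of symbols, shortest first, then lexicographically."""
    for length in count(1):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


def nth_expression(n, alphabet=ALPHABET):
    """Return the n-th expression in the ordering, counting from 1."""
    return next(islice(expressions(alphabet), n - 1, None))


# With a two-symbol alphabet the ordering begins: a, b, aa, ab, ba, bb, aaa, ...
print([nth_expression(n, "ab") for n in range(1, 8)])
```

Every finite expression occurs at some definite place in this enumeration, which is why the n-th member of the whole collection is unproblematic even if the n-th member of E is not.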

One significant fact about this paradox is that it is a semantical paradox, since it is concerned not just with the ordered collection of expressions (which is a syntactic matter), but also their meaning, that is whether they refer to real numbers. It is this which possibly makes it unclear whether there is a specifiable list of expressions of the required kind, since while the total list of expressions can certainly be straightforwardly ordered, whether some expression defines a real number is maybe not such a clear cut matter. Indeed, it might be concluded, just from the very fact that a paradox ensues, as above, that whether some English phrase defines a real number is not always entirely settleable. In Borel’s terms, it cannot be decided effectively (Martin-Löf 1970, p44). Another very similar semantical paradox with this same aspect is Berry’s Paradox, about “the least integer not nameable in fewer than nineteen syllables.” The problem here is that that very phrase has less than nineteen syllables in it, and yet, if it names an integer, that integer would have to be not nameable in less than nineteen syllables. So is there a definite set of English expressions which name integers not nameable in less than nineteen syllables?

If some sort of fuzziness was the case then there would be a considerable difference between such paradoxes and the previous paradoxes in logical theory like Russell’s, Burali-Forti’s and Cantor’s, for instance. Indeed it has been common since Ramsey’s discussion of these matters, in the 1920s, to divide the major logical paradoxes into two: the semantic or linguistic on the one hand, and the syntactic or mathematical on the other. Mackie disagreed with Ramsey to a certain extent, although he was prepared to say (Mackie 1973, p262):

The semantical paradoxes…can thus be solved in a philosophical sense by demonstrating the lack of content of the key items, the fact that various questions and sentences, construed in the intended way, raise no substantial issue. But these are comments appropriate only to linguistic items; one would expect that this method would apply only to the semantic paradoxes, and not to “syntactic” ones like Russell’s class paradox, which are believed to involve only (formal) logical and mathematical elements.

Russell himself opposed the distinction, formulating his famous “Vicious Circle Principle” which, he held, all the paradoxes of self-reference violated. Specifically he held that statements about all the members of certain collections were nonsense (compare Haack 1978, p141):

Whatever involves all of a collection must not be one of a collection, or, conversely, if, provided a certain collection had a total it would have members only definable in terms of that total, then the said collection has no total.

But this, seemingly, would rule out specifying, for instance, a man as the one with the highest batting average in his team, since he is then defined in terms of a total of which he is a member. It effectively imposes a ban on all forms of self-reference, and so Russell’s uniform solution to the paradoxes is usually thought to be too drastic. Some might say “this may be using a cannon against a fly, but at least it stops the fly!”; but it also devastates too much else in the vicinity.

A more recent theorist to oppose Ramsey’s distinction has been Priest. In fact he has tried to prove that all the main paradoxes of self-reference have a common structure using a further insight of Russell’s, which he calls “Russell’s Schema” (Priest 1994, p27). This pre-dates Russell’s attachment to the Vicious Circle Principle, but Priest has shown that, when adapted and applied to all the main paradoxes, it matches the reasoning which leads to the contradiction in each one of them. This approach, however, presumes that semantical notions, like definability, designation, truth, and knowledge can be construed in terms of mathematical sets, which seems to be really the very supposition which Ramsey disputed.

Grelling’s Paradox also makes this supposition questionable. It is a self-referential, semantical paradox resembling, to some extent, Russell’s Paradox, and concerns the property which an adjective has if it does not apply to itself. Thus

“large” is not large,
“multi-syllabled” is multi-syllabled,
“English” is English,
“French” is not French.

Let us use the term “heterological” for the property of being non-self-applicable, so we can say that “large” and “French” are heterological, for instance, and we can write as a general definition

“x” is heterological if and only if “x” is not x.

But clearly, substituting “heterological” for “x” produces a contradiction. Does this contradiction mean there is no such concept as heterologicality, just as there is no such set as the Russell set? Goldstein has recently argued that this is so (Goldstein 2000, p67), following a tradition Mackie calls “the logical proof approach” (Mackie 1973, p254f), to which Ryle was a notable contributor (Ryle 1950-1). The point is made even more plausible given the very detailed logical analysis which Copi provided (Copi 1973, p301).
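The way the substitution fails can be checked mechanically. The following is a minimal sketch in Python, not anything taken from Goldstein or Copi: the table `SELF_APPLIES` and the function names are illustrative assumptions, and the table simply records the four examples given above. The sketch tries both truth values for “‘heterological’ is heterological” and finds that neither satisfies the definition.

```python
# Self-application facts for the four adjectives used as examples in the text.
# (The table and all names here are illustrative assumptions.)
SELF_APPLIES = {
    "large": False,
    "multi-syllabled": True,
    "English": True,
    "French": False,
}


def heterological(adjective, self_applies):
    """An adjective is heterological iff it does not apply to itself."""
    return not self_applies[adjective]


def consistent_values_for_heterological():
    """Try both truth values for '"heterological" is heterological'."""
    survivors = []
    for guess in (True, False):
        # Extend the table with a guessed value for the new adjective.
        table = dict(SELF_APPLIES, heterological=guess)
        # The definition demands that the guess agree with what the
        # definition itself yields for "heterological".
        if heterological("heterological", table) == guess:
            survivors.append(guess)
    return survivors
```

For the ordinary adjectives the definition behaves well, so `heterological("large", SELF_APPLIES)` holds; for “heterological” itself no guess survives, so `consistent_values_for_heterological()` returns an empty list.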

Copi first introduces the definition

Hs =df (∃F)(sDesF&(P)(sDesP iff P=F)&¬Fs),

in which “¬” abbreviates “not”, and “Des” refers to the relation between a verbal expression and the property it designates. Thus “sDesF” reads: s designates F. Copi’s proof of the contradiction then goes in the following way. First, H“H” entails in turn

(∃F)(“H”DesF&(P)(“H”DesP iff P=F)&¬F“H”) – by substitution in the definition,

“H”DesF&(P)(“H”DesP iff P=F)&¬F“H” – by taking the case thus said to exist,

(“H”DesH iff H=F)&¬F“H” – by substitution in the “for all P”,

H=F&¬F“H” – by assuming “H” designates H,

Then ¬H“H” entails in turn

(F)¬(“H”DesF&(P)(“H”DesP iff P=F)&¬F“H”) – since “¬(∃F)” is equivalent to “(F)¬”,

¬(“H”DesH&(P)(“H”DesP iff P=H)&¬H“H”) – substituting “H” for “F”,

¬((P)(“H”DesP iff P=H)&¬H“H”) – assuming “H” designates H,

H“H” – assuming (P)(“H”DesP iff P=H).

To get the contradiction

H“H” iff ¬H“H”,

therefore, one has to be assured that there is one and only one property which “H” designates. And Copi gives no proof of this.

The Liar Paradox is a further self-referential, semantical paradox, perhaps the major one to come down from antiquity. And one may very well ask, with respect to

What I am now saying is false,

for instance, whether this has any sense, or involves a substantive issue, as Mackie would have it (see also Parsons 1984). But there is a well known further paradox which seems to block this dismissal. For if we allow, as well as “true” and “false” also “meaningless,” then it might well seem that The Strengthened Liar arises, which, in this case, could be expressed

What I am now saying is false, or meaningless.

If I am saying nothing meaningful here, then seemingly what I say is true, which seems to imply that it does have meaning, after all.

Let us, therefore, look at some other notable ways of trying to escape even the Unstrengthened Liar. The Unstrengthened Liar comes in a whole host of variations, for instance:

This very sentence is false,

or

Some sentence in this book is false,

if that sentence is the only sentence in a book, say in its preface. It also arises with the following pair of sentences taken together:

The following sentence is false,
The previous sentence is true;

and in a case of Buridan’s,

What Plato is saying is false,
What Socrates is saying is true,

if Socrates says the first, while Plato says the second. There are many other variations, some of which we shall look at later.

The semantical concepts in these paradoxes are truth and falsity, and the first major contribution to our understanding of these, in the twentieth century, was by Tarski. Tarski took truth and falsity to be predicates of sentences, and discussed at length the following example of his famous “T-scheme”:

“snow is white” is true if and only if snow is white.

He believed that

Ts iff p,

holds, quite generally, if “s” is some phrase naming, or referring to, the sentence “p”: for instance, as above, that same sentence in quotation marks, or a number in some system of numbering, which was the way Gödel handled such matters. Tarski’s analysis of truth involved denying that there could be “semantic closure”, that is, the presence in a language of the semantic concepts relating to expressions in that language (Tarski 1956, p402):

The main source of the difficulties met with seems to lie in the following: it has not always been kept in mind that the semantical concepts have a relative character, that they must always be related to a particular language. People have not been aware that the language about which we speak need by no means coincide with the language in which we speak. They have carried out the semantics of a language in that language itself and, generally speaking, they have proceeded as though there was only one language in the world. The analysis of the antinomies mentioned shows, on the contrary, that the semantical concepts simply have no place in the language to which they relate, that the language which contains its own semantics, and within which the usual laws of logic hold, must inevitably be inconsistent.

This conclusion, which requires that any consistent language be incomplete, Tarski derived directly by considering The Liar, since “This is false” seems to provide a self-referential “s” for which

s = “¬Ts”,

hence, substituting in the following example of the T-scheme

T“¬Ts” iff ¬Ts,

we get

Ts iff ¬Ts.

To block this conclusion Tarski held that the self-reference seemingly available in the identity

s = “¬Ts”

was just not consistently available, and specifically that, if one used the sentence “this is false” then the referent of “this” should not be that very sentence itself – on pain of the evident contradiction. Using “this is false” coherently meant speaking about an object language, but in another, higher, language – the meta-language. Of course the semantical concepts applicable in this meta-language likewise could not be sensibly defined within it, so generally there was supposed to be a whole hierarchy of languages.

It seems difficult to apply this kind of stratification of languages to the way we ordinarily speak, however. Indeed, to assert that truth can attach to indexical sentences, like “What I am now saying is false,” would seem to be flying in the face of a very clear truth (Kneale 1972, p234f). Consider, further, this variation of the Plato-Socrates case above (compare Haack 1978, p144), where Jones says

All of Nixon’s utterances about Watergate are false,

and Nixon says

All of Jones’ utterances about Watergate are true.

If, following Tarski, we were to try to assign levels of language to this pair of utterances, then how could we do it? It would seem that Jones’ utterance would have to be in a language higher, in Tarski’s hierarchy, than any of Nixon’s; yet, contrariwise, Nixon’s would have to be higher than any of Jones’.
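The difficulty can be put as a problem about ordering. The sketch below, in Python, assumes a deliberately crude reading of the hierarchy that is not Tarski’s own formulation: every utterance must sit at a level strictly above each utterance whose truth it speaks of. The function `assign_levels` and the dictionary format are hypothetical names introduced only for illustration. An assignment of levels exists on this reading only if the “speaks of” relation has no cycle, which is what the standard-library topological sorter checks.

```python
from graphlib import CycleError, TopologicalSorter


def assign_levels(talks_about):
    """Return a dict utterance -> level, or None if no assignment exists.

    talks_about maps each utterance to the set of utterances whose truth
    it speaks of; each of those must sit at a strictly lower level.
    """
    try:
        # static_order() lists every utterance after the ones it speaks of,
        # and raises CycleError when that is impossible.
        order = list(TopologicalSorter(talks_about).static_order())
    except CycleError:
        return None
    levels = {}
    for utterance in order:
        lower = talks_about.get(utterance, ())
        # Level 0 is the object language; otherwise go one above the highest
        # utterance spoken of.
        levels[utterance] = 1 + max((levels[v] for v in lower), default=-1)
    return levels


# An unproblematic case: a truth ascription sits one level above its subject.
ordinary = assign_levels({"'snow is white' is true": {"snow is white"}})

# The Jones and Nixon pair: each speaks of the other's utterances.
watergate = assign_levels({"Jones": {"Nixon"}, "Nixon": {"Jones"}})
```

On these assumptions `ordinary` places “snow is white” at level 0 and the truth ascription at level 1, while `watergate` is `None`, since each utterance would have to be higher than the other.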

Martin has produced a typology of solutions to the Liar which locates Tarski’s way out as one amongst four possible, general diagnoses (Martin 1984, p4). The two principles which Martin takes to categorise the Liar we have just seen, namely

(S) There is a sentence which says of itself only that it is not true,

and

(T) Any sentence is true if and only if what it says is the case.

Tarski, in these terms, took claim (S) to be incorrect. But one also might claim that (T) is incorrect, maybe because there are sentences without a truth value, being meaningless, or lacking in content in some other way, as is held by the theorists mentioned before. A third general diagnosis claims that both (S) and (T) are correct, and indeed incompatible, but proceeds to some “rational reconstruction” of them so that the incompatibility is removed. Fourthly it is possible to argue that (S) and (T) are correct, but really compatible. Martin sees this happening as a result of some possible ambiguity in the terms used in the two principles.

We can isolate a further, fifth option, although Martin does not consider it. That option is to hold that both (S) and (T) are incorrect, as is done by the tradition which holds that it is not sentences which are true or false. One cannot say, for instance, that the sentence “that is white” is true, in itself, since what is spoken of might vary from one utterance of the sentence to another. Following the second world war, because of this sort of thing, it became more common to think of semantical notions as attached not to sentences and words, but to what such sentences and words mean (Kneale and Kneale 1962, p601f). On this understanding it is not specifically the sentence “that is white” but what is expressed by this sentence, that is the statement or proposition made by it, which may be true. But it was shown by Thomason, following work by Montague, that the same sorts of problems can be generated even in this case. We can create self-referential paradoxes to do with statements and propositions which again cannot be obviously escaped (Thomason 1977, 1980, 1986). And the problems are not just confined to the semantics of truth and falsity, but also arise in just the same way with more general semantical notions like knowledge, belief, and provability. In recent years, the much larger extent of the problems to do with self-reference has, in this way, become increasingly apparent.

Asher and Kamp sum up (Asher and Kamp 1989, p87):

Thomason argues that the results of Montague (1963) apply not only to theories in which attitudinal concepts, such as knowledge and belief, are treated as predicates of sentences, but also to “representational” theories of the attitudes, which analyse these concepts as relations to, or operations on (mental) representations. Such representational treatments of the attitudes have found many advocates; and it is probably true that some of their proponents have not been sufficiently alert to the pitfalls of self-reference even after those had been so clearly exposed in Montague (1963)… To such happy-go-lucky representationalists, Thomason (1980) is a stern warning of the obstacles that a precise elaboration of their proposals would encounter.

Thomason mentions specifically Fodor’s “Language of Thought” in his work; Asher and Kamp themselves show that modes of argument similar to Thomason’s can be used even to show that Montague’s Intensional Semantics has the same problems. Asher and Kamp go on to explain the general method which achieves these results (Asher and Kamp 1989, p87):

Thomason’s argument is, at least on the face of it, straightforward. He reasons as follows: Suppose that a certain attitude, say belief, is treated as a property of “proposition-like” objects – let us call them “representations” – which are built up from atomic constituents in much the way that sentences are. Then, with enough arithmetic at our disposal, we can associate a Gödel number with each such object and we can mimic the relevant structural properties of and relations between such objects by explicitly defined arithmetical predicates of their Gödel numbers. This Gödelisation of representations can then be exploited to derive a contradiction in ways familiar from the work of Gödel, Tarski and Montague.

The only ray of hope Asher and Kamp can offer is (Asher and Kamp 1989, p94): “Only the familiar systems of epistemic and doxastic logic, in which knowledge and belief are treated as sentential operators, and which do not treat propositions as objects of reference and quantification, seem solidly protected from this difficulty.” But see on these, for instance, Mackie 1973, p276f, although also Slater 1986.

Gödel’s famous theorems in this area are, of course, concerned with the notion of provability, and they show that if this notion is taken as a predicate of certain formulas, then in any standard formal system which has enough arithmetic to handle the Gödel numbers used to identify the formulas in the system, certain statements can be constructed which are true, but are not provable in the system, if it is consistent. What is also true, and even provable in such a system is that, if it is consistent then (a) a certain specific self-referential formula is not provable in the system, and (b) the consistency of the system is not provable in the system. This means the consistency of the system cannot be proved in the system unless it is inconsistent, and it is commonly believed that the appropriate systems are consistent. But if they are consistent then this result shows they are incomplete, that is there are truths which they cannot prove.

The paradoxical thing about Gödel’s Theorems is that they seem to show that there are things we can ourselves prove, in the natural language we use to talk about formal systems, but which a formal system of proof cannot prove. And that fact has been fed into the very large debate about our differences from, even superiority over mechanisms (see e.g. Penrose 1989). But if we consider the way many people would argue about, for instance,

this very sentence is unprovable,

then our abilities as humans might not seem to be too great. For many people would argue:

If that sentence is provable then it is true, since provability entails truth; but that makes it unprovable, which is a contradiction. Hence it must be unprovable. But by this process we seem to have proved that it is unprovable – another contradiction!

So, unless we can extricate ourselves from this impasse, as well as the many others we have looked at, we would not seem to be too bright. Or does this sort of argument show that there is, indeed, no escape? Some people, of course, might want to follow Tarski, and run from “natural language” in the face of these conclusions. For Gödel had no reason to conclude, from his theorems, that the formal systems he was concerned with were inconsistent. However, his formal arguments differ crucially from that just given, since there is no proof within his systems that “provability” entails “truth.” There is no doubt that what we have been dealing with are real paradoxes!

The intractability of the impasse here, and the failure of many great minds to make headway with it, has led some theorists to believe that indeed there is no escape. Notable amongst these is Priest (compare Priest 1979), who believes we must now learn to accept that some contradictions can be true, and adjust our logic accordingly. This is very much in line with the expectation we initially noted Quine had, that maybe “some tacit and trusted pattern of reasoning must be made explicit and henceforward be avoided or revised” (Quine 1966, p7). The particular law which “paraconsistent” logicians mainly doubt is “ex impossibile quodlibet”, that is “from an impossibility anything follows,” or

(p&¬p) ⊢ q.

It is thought that, if this traditional rule were removed from logic then, at least, any true contradictions we find, e.g. anything of the form “p&¬p” which we deduce from some paradox of self-reference, will not have the wholesale repercussions that it otherwise would have in traditional logic. Objectors to paraconsistency might say that the premise of this rule could not arise, so its “explosive” repercussions would never eventuate. But there is the broader, philosophical question, as well, about whether a switch to a different logic does not just change the subject, leaving the original problems unattacked. That depends on how one views “deviant logics.” There are reasons to believe that deviant logics are not rivals of traditional logic, but merely supplementary to, or extensions of it (Haack, 1974, Pt 1, Ch1). For if one drops the above rule then hasn’t one merely produced a new kind of negation? Are “p” and “¬p” still contradictory, if they can, somehow, both be true? And if “p” and “¬p” are not contradictory, then what is contradictory to “p”, and couldn’t we formulate the previous paradoxes in terms of it? It seems we may have just turned our backs on the real difficulty.
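What dropping the rule amounts to can be illustrated with truth tables. The following Python sketch is offered under stated assumptions rather than as an account of any particular author’s system: it uses the three-valued tables commonly associated with Priest’s “Logic of Paradox”, with values 1 (true only), 0.5 (both true and false) and 0 (false only), negation as 1 − v, conjunction as the minimum, and an inference counted valid when no valuation gives the premise a designated value while denying one to the conclusion. None of these details is set out in this article, and the function names are illustrative.

```python
from itertools import product


def neg(v):
    """Negation: swaps 1 and 0, and leaves the middle value 0.5 fixed."""
    return 1 - v


def conj(a, b):
    """Conjunction: the lesser of the two values."""
    return min(a, b)


def explosion_valid(values, designated):
    """Check (p & not-p) |- q over every valuation of p and q.

    The inference fails if some valuation designates the premise
    without designating the conclusion.
    """
    for p, q in product(values, repeat=2):
        if designated(conj(p, neg(p))) and not designated(q):
            return False
    return True


# Two-valued logic: only the value 1 is designated.
classical_ok = explosion_valid((0, 1), lambda v: v == 1)

# Three-valued sketch: both 1 and 0.5 count as designated.
lp_ok = explosion_valid((0, 0.5, 1), lambda v: v >= 0.5)
```

In the two-valued case the premise is never designated, so the rule holds vacuously and `classical_ok` is `True`. In the three-valued case the valuation p = 0.5, q = 0 designates “p&¬p” but not “q”, so `lp_ok` is `False`. The question raised above, whether “¬” so tabulated is still contradiction-forming negation, is left untouched by the calculation.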

5. A Contemporary Twist

There have been developments, in the last few years, which have shown that the previous emphasis on paradoxes involving self-reference was to some extent misleading. For a family of paradoxes, with similar levels of intractability, have been discovered, which are not reflexive in this way.

It was mentioned before that a form of the Liar paradox could be derived in connection with the pair of statements

What Plato is saying is false,

What Socrates is saying is true,

when Socrates says the former, and Plato the latter. For, if what Socrates is saying is true, then, according to the former, what Plato is saying is false, but then, according to the latter, what Socrates is saying is false. On the other hand, if what Socrates is saying is false then, according to the former, what Plato is saying is true, and then, according to the latter, what Socrates is saying is true. Such a paradox is called a “liar chain”; they can be of any length; and with them we are already out of the really strict “self-reference” family, although, by passing along through the chain what Socrates is saying, it will eventually come back to reflect on itself.
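The reasoning in this paragraph is an exhaustive check of truth values, and it can be carried out mechanically for a chain of any finite length. The Python sketch below is an illustration only; the encoding of a chain as a list of booleans and the name `consistent_assignments` are assumptions made here. Each sentence says of the next one either that it is true or that it is false, and the last sentence speaks of the first.

```python
from itertools import product


def consistent_assignments(chain):
    """List the truth-value assignments that respect every sentence.

    chain[i] is True if sentence i says the next sentence is true, and
    False if it says the next sentence is false; the last sentence
    speaks of the first.
    """
    n = len(chain)
    found = []
    for values in product((True, False), repeat=n):
        # Sentence i is true exactly when the next sentence has the
        # truth value that sentence i attributes to it.
        ok = all(
            values[i] == (values[(i + 1) % n] == chain[i])
            for i in range(n)
        )
        if ok:
            found.append(values)
    return found


# Socrates: "What Plato is saying is false"; Plato: "What Socrates is saying is true".
plato_socrates = consistent_assignments([False, True])

# A single sentence saying of itself that it is false.
simple_liar = consistent_assignments([False])

# By contrast, two sentences each denying the other.
mutual_denial = consistent_assignments([False, False])
```

Both `plato_socrates` and `simple_liar` come out empty, matching the impasse described above, whereas `mutual_denial` has two consistent assignments, so not every chain of this kind is paradoxical.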

It seems, however, that, if one creates what might be called “infinite chains” then there is not even this attenuated form of self-reference (though see Beall, 2001). Yablo asked us to consider an infinite sequence of sentences of which the following is representative (Yablo 1993):

(Si) For all k>i, Sk is untrue.

Sorensen’s “Queue Paradox” is similar, and can be obtained by replacing “all” by “some” here, and considering the series of thoughts of some students in an infinite queue (Sorensen 1998). Suppose that, in Yablo’s case, Sn is true for some n. Then Sn+1 is false, as are all subsequent statements; but the latter fact makes Sn+1 true, giving a contradiction. Hence for no n is Sn true. But that means that S1 is true, S2 is true, etc.; in fact it means every statement is true, which is another contradiction. In Sorensen’s case, if some student thinks “some of the students behind me are now thinking an untruth” then this cannot be false, since then all the students behind her are thinking the truth, although, given what each of them is thinking, that means that some student behind her is thinking an untruth, a contradiction. So no student is thinking an untruth. But if some student is consequently thinking a truth, then some student behind them is thinking an untruth, which we know to be impossible. Indeed every supposition seems impossible, and we are in the characteristic impasse.
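It may help to see why the sequence has to be infinite. The following Python sketch does not reproduce either paradox; it checks, by brute force, finite truncations of the two sequences, on the assumption that a sentence with no successors is vacuously true in the “all” version and vacuously false in the “some” version. The function name and its `quantifier` parameter are illustrative choices.

```python
from itertools import product


def consistent_truncations(n, quantifier=all):
    """Consistent truth assignments for a truncated sequence S_1..S_n.

    With quantifier=all, S_i says that all later sentences are untrue;
    with quantifier=any, S_i says that some later sentence is untrue.
    """
    found = []
    for values in product((True, False), repeat=n):
        ok = all(
            values[i] == quantifier(not values[k] for k in range(i + 1, n))
            for i in range(n)
        )
        if ok:
            found.append(values)
    return found


yablo_4 = consistent_truncations(4, quantifier=all)
queue_4 = consistent_truncations(4, quantifier=any)
```

Each truncation has exactly one consistent assignment: in the “all” version the last sentence is true and every earlier one false, and in the “some” version the last thought is false and every earlier one true. The contradiction derived above therefore depends on there being no last member of the sequence.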

Gaifman has worked up a way of dealing with such more complex paradoxes of the Liar sort, which can end up denying the sentences in such loops, chains, and infinite sequences have any truth value whatever. Using “GAP” for “recognised failure to assign a standard truth value” Gaifman formulates what he calls the “closed loop rule” (Gaifman 1992, pp225, 230):

If, in the course of applying the evaluation procedure, a closed unevaluated loop forms and none of its members can be assigned a standard value by any of the rules, then all of its members are assigned GAP in a single evaluation step.
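The general shape of such a rule can be conveyed by a toy evaluator. The Python sketch below is a heavily simplified illustration and should not be read as Gaifman’s actual procedure: it assumes that every sentence either states a settled fact or says of exactly one other sentence that it is true or that it is false, and all names and the data format are hypothetical. Standard values are propagated as far as they go; when nothing more can be evaluated and a closed loop of unevaluated sentences remains, every member of the loop is assigned GAP in one step. Sentences that merely point into a loop are left unevaluated here, since how they should be treated goes beyond the rule quoted.

```python
GAP = "GAP"


def find_loop(start, sentences, values):
    """Follow pointers from an unevaluated sentence; return a closed loop or None."""
    path = []
    node = start
    while values[node] is None and node not in path:
        path.append(node)
        node = sentences[node][1]  # the sentence this one speaks of
    if values[node] is None:
        # We came back to a sentence already on the path: that is the loop.
        return path[path.index(node):]
    return None


def evaluate(sentences):
    """Assign True, False or GAP; sentences are ("fact", value) or ("says", target, claim)."""
    values = dict.fromkeys(sentences)
    while True:
        progress = False
        # Standard rules: settled facts, and claims about sentences that
        # already have a standard truth value.
        for name, spec in sentences.items():
            if values[name] is not None:
                continue
            if spec[0] == "fact":
                values[name] = spec[1]
                progress = True
            elif values[spec[1]] in (True, False):
                values[name] = (values[spec[1]] == spec[2])
                progress = True
        if progress:
            continue
        # No standard rule applies: look for a closed unevaluated loop.
        for name in sentences:
            if values[name] is None:
                loop = find_loop(name, sentences, values)
                if loop:
                    for member in loop:
                        values[member] = GAP
                    progress = True
                    break
        if not progress:
            return values


result = evaluate({
    "snow": ("fact", True),
    "A": ("says", "snow", False),          # "'snow' is false"
    "Socrates": ("says", "Plato", False),  # "What Plato is saying is false"
    "Plato": ("says", "Socrates", True),   # "What Socrates is saying is true"
})
```

Here the grounded sentences receive standard values, while the two members of the liar chain are both assigned GAP in a single step.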

Goldstein has formulated a comparable process, which he thinks improves upon Gaifman in certain details, and which ends up labelling certain sentences “FA”, meaning that the sentence has made a “failed attempt” at making a statement (Goldstein 2000, p57). But the major question with such approaches, as before, is how they deal with The Strengthened Liar. Surely there remain major problems with

This sentence is false, or has a GAP,

and

This sentence makes a false statement, or is a FA.

6. References and Further Reading

  • Asher, N. and Kamp, H. 1986, “The Knower’s Paradox and Representational Theories of Attitudes,” in J. Halpern (ed.) Theoretical Aspects of Reasoning about Knowledge, San Mateo CA, Morgan Kaufmann.
  • Asher, N. and Kamp, H. 1989, “Self-Reference, Attitudes and Paradox” in G. Chierchia, B.H. Partee, and R. Turner (eds.) Properties, Types and Meaning 1.
  • Beall, J.C., 2001, “Is Yablo’s Paradox Non-Circular?,” Analysis 61.3.
  • Copi, I.M., 1973, Symbolic Logic 4th ed. Macmillan, New York.
  • Gaifman, H. 1992, “Pointers to Truth,” The Journal of Philosophy, 89, 223-61.
  • Goldstein, L. 2000, “A Unified Solution to Some Paradoxes,” Proceedings of the Aristotelian Society, 100, pp53-74.
  • Grattan-Guinness, I. (ed.) 1980, From the Calculus to Set Theory, 1630-1910, Duckworth, London.
  • Haack, S. 1974, Deviant Logic, C.U.P., Cambridge.
  • Haack, S. 1978, Philosophy of Logics, C.U.P., Cambridge.
  • Hallett, M. 1984, Cantorian Set Theory and Limitation of Size, Clarendon Press, Oxford.
  • Keefe, R. 2001, Theories of Vagueness, C.U.P. Cambridge.
  • Kneale, W. 1972, “Propositions and Truth in Natural Languages,” Mind, 81, pp225-243.
  • Kneale, W. and Kneale M. 1962, The Development of Logic, Clarendon Press, Oxford.
  • Lavine, S. 1994, Understanding the Infinite, Harvard University Press, Cambridge MA.
  • Linsky, L. 1967, Referring, Routledge and Kegan Paul, London.
  • Mackie, J.L. 1973, Truth, Probability and Paradox, Clarendon Press, Oxford.
  • Martin, R.L. (ed.) 1984, Recent Essays on Truth and the Liar Paradox, Clarendon Press, Oxford.
  • Martin-Löf, P. 1970, Notes on Constructive Mathematics, Almqvist and Wiksell, Stockholm.
  • Montague, R. 1963, “Syntactic Treatments of Modality, with Corollaries on Reflection Principles and Finite Axiomatisability,” Acta Philosophica Fennica, 16, pp153-167.
  • Owen, G.E.L. 1957-8, “Zeno and the Mathematicians,” Proceedings of the Aristotelian Society, 58, 199-222.
  • Parsons, C. 1984, “The Liar Paradox” in R.L.Martin (ed.) Recent Essays on Truth and the Liar Paradox, Clarendon Press, Oxford.
  • Penrose, R. 1989, The Emperor’s New Mind, O.U.P., Oxford.
  • Priest, G.G. 1979, “The Logic of Paradox,” Journal of Philosophical Logic, 8, pp219-241.
  • Priest, G.G. 1994, “The Structure of the Paradoxes of Self-Reference,” Mind, 103, pp25-34.
  • Quine, W.V.O. 1966, The Ways of Paradox, Random House, New York.
  • Ryle, G. 1950-1, “Heterologicality,” Analysis, 11, pp61-69.
  • Sainsbury, M. 1995, Paradoxes, 2nd ed., C.U.P. Cambridge.
  • Salmon, W.C. (ed.) 1970, Zeno’s Paradoxes, Bobbs-Merrill, Indianapolis.
  • Scott, T.K. 1966, John Buridan: Sophisms on Meaning and Truth, Appleton-Century-Crofts, New York.
  • Slater, B.H. 1986, “Prior’s Analytic,” Analysis, 46, pp76-81.
  • Sorensen, R. 1998, “Yablo’s Paradox and Kindred Infinite Liars,” Mind, 107, 137-55.
  • Suppes, P. 1972, Axiomatic Set Theory, Dover, New York.
  • Tarski, A. 1956, Logic, Semantics, Metamathematics: Papers from 1923 to 1938, trans. J.H. Woodger, O.U.P. Oxford.
  • Thomason, R. 1977, “Indirect Discourse is not Quotational,” The Monist, 60, pp340-354.
  • Thomason, R. 1980, “A Note on Syntactical Treatments of Modality,” Synthese, 44, pp391-395.
  • Thomason, R. 1986, “Paradoxes and Semantic Representation,” in J.Halpern (ed.) Theoretical Aspects of Reasoning about Knowledge, San Mateo CA, Morgan Kaufmann.
  • Williamson, T. 1994, Vagueness, London, Routledge.
  • Yablo, S. 1993, “Paradox without Self-Reference,” Analysis, 53, 251-52.


Author Information

Barry Hartley Slater
Email: slaterbh@cyllene.uwa.edu.au
University of Western Australia
Australia

Russell’s Paradox

Russell’s paradox represents either of two interrelated logical antinomies. The most commonly discussed form is a contradiction arising in the logic of sets or classes. Some classes (or sets) seem to be members of themselves, while some do not. The class of all classes is itself a class, and so it seems to be in itself. The null or empty class, however, must not be a member of itself. However, suppose that we can form a class of all classes (or sets) that, like the null class, are not included in themselves. The paradox arises from asking the question of whether this class is in itself. It is if and only if it is not. The other form is a contradiction involving properties. Some properties seem to apply to themselves, while others do not. The property of being a property is itself a property, while the property of being a cat is not itself a cat. Consider the property that something has just in case it is a property (like that of being a cat) that does not apply to itself. Does this property apply to itself? Once again, from either assumption, the opposite follows. The paradox was named after Bertrand Russell (1872-1970), who discovered it in 1901.

Table of Contents

  1. History
  2. Possible Solutions to the Paradox of Properties
  3. Possible Solutions to the Paradox of Classes or Sets
  4. References and Further Reading

1. History

Russell’s discovery came while he was working on his Principles of Mathematics. Although Russell discovered the paradox independently, there is some evidence that other mathematicians and set-theorists, including Ernst Zermelo and David Hilbert, had already been aware of the first version of the contradiction prior to Russell’s discovery. Russell, however, was the first to discuss the contradiction at length in his published works, the first to attempt to formulate solutions and the first to appreciate fully its importance. An entire chapter of the Principles was dedicated to discussing the contradiction, and an appendix was dedicated to the theory of types that Russell suggested as a solution.

Russell discovered the contradiction from considering Cantor’s power class theorem: the mathematical result that the number of entities in a certain domain is always smaller than the number of subclasses of those entities. Certainly, there must be at least as many subclasses of entities in the domain as there are entities in the domain given that for each entity, one subclass will be the class containing only that entity. However, Cantor proved that there also cannot be the same number of entities as there are subclasses. If there were the same number, there would have to be a 1-1 function f mapping entities in the domain on to subclasses of entities in the domain. However, this can be proven to be impossible. Some entities in the domain would be mapped by f on to subclasses that contain them, whereas others may not. However, consider the subclass of entities in the domain that are not in the subclasses on to which f maps them. This is itself a subclass of entities of the domain, and thus, f would have to map it on to some particular entity in the domain. The problem is that then the question arises as to whether this entity is in the subclass on to which f maps it. Given the subclass in question, it does just in case it does not. The Russell paradox of classes can in effect be seen as an instance of this line of reasoning, only simplified. Are there more classes or subclasses of classes? It would seem that there would have to be more classes, since all subclasses of classes are themselves classes. But if Cantor’s theorem is correct, there would have to be more subclasses. Russell considered the simple mapping of classes onto themselves, and invoked the Cantorian approach of considering the class of all those entities that are not in the classes onto which they are mapped. Given Russell’s mapping, this becomes the class of all classes not in themselves.
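The diagonal step in Cantor’s argument can be checked exhaustively on a small finite domain. The Python sketch below is an illustration under obvious limitations: it verifies the construction for a three-element domain only, whereas the theorem covers every domain, and the function names are chosen here for the purpose. For every mapping f from entities to subclasses, it forms the subclass of entities not in the subclasses onto which f maps them and confirms that f maps no entity onto that subclass.

```python
from itertools import combinations, product


def subclasses(domain):
    """All subclasses of a finite domain, as frozensets."""
    items = list(domain)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


def diagonal(domain, f):
    """The entities that are not in the subclass onto which f maps them."""
    return frozenset(x for x in domain if x not in f[x])


def diagonal_always_missed(domain):
    """Check every mapping f from entities to subclasses of the domain."""
    items = list(domain)
    subs = subclasses(items)
    for images in product(subs, repeat=len(items)):
        f = dict(zip(items, images))
        # If f mapped some entity onto the diagonal subclass, that entity
        # would be in the subclass just in case it was not.
        if diagonal(items, f) in f.values():
            return False
    return True


example_f = {"a": frozenset("a"), "b": frozenset(), "c": frozenset("ab")}
example_diagonal = diagonal("abc", example_f)
```

For `example_f` the diagonal subclass contains b and c but not a, and it is not among the values of `example_f`; `diagonal_always_missed("abc")` confirms the same for all 512 mappings on that domain.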

The paradox had profound ramifications for the historical development of class or set theory. It made the notion of a universal class, a class containing all classes, extremely problematic. It also brought into considerable doubt the notion that for every specifiable condition or predicate, one can assume there to exist a class of all and only those things that satisfy that condition. The properties version of the contradiction–a natural extension of the classes or sets version–raised serious doubts about whether one can be committed to the objective existence of a property or universal corresponding to every specifiable condition or predicate. Indeed, contradictions and problems were soon found in the work of those logicians, philosophers and mathematicians who made such assumptions. In 1902, Russell discovered that a version of the contradiction was expressible in the logical system developed in Volume I of Gottlob Frege’s Grundgesetze der Arithmetik, one of the central works in the late-19th and early-20th century revolution in logic. In Frege’s philosophy, a class is understood as the “extension” or “value-range” of a concept. Concepts are the closest correlates to properties in Frege’s metaphysics. A concept is presumed to exist for every specifiable condition or predicate. Thus, there is a concept of being a class that does not fall under its defining concept. There is also a class defined by this concept, and it falls under its defining concept just in case it does not.

Russell wrote to Frege concerning the contradiction in June of 1902. This began one of the most interesting and discussed correspondences in intellectual history. Frege immediately recognized the disastrous consequences of the paradox. He did note, however, that the properties version of the paradox was solved in his philosophy by his distinction between levels of concepts. For him, concepts are understood as functions from arguments to truth-values. Some concepts, “first-level concepts”, take objects as arguments; others, “second-level concepts”, take these functions as arguments, and so on. Thus, a concept can never take itself as argument, and the properties version cannot be formulated. However, classes, or extensions of concepts, were all understood by Frege to be of the same logical type as all other objects. The question does arise, then, for each class whether it falls under its defining concept.

When he received Russell’s first letter, the second volume of Frege’s Grundgesetze was already in the latter stages of the publication process. Frege was forced to quickly prepare an appendix in response to the paradox. Frege considers a number of possible solutions. The conclusion he settles on, however, is to weaken the class abstraction principle in the logical system. In the original system, one could conclude that an object is in a class if and only if the object falls under the concept defining the class. In the revised system, one can conclude only that an object is in a class if and only if the object falls under the concept defining the class and the object is not identical to the class in question. This blocks the class version of the paradox. However, Frege was not entirely happy even with this solution. And this was for good reason. Some years later the revised system was found to lead to a more complicated form of the contradiction. Even before this result was discovered, Frege abandoned it and seems to have concluded that his earlier approach to the logic of classes was simply unworkable, and that logicians would have to make do entirely without commitment to classes or sets.

However, other logicians and mathematicians have proposed other, relatively more successful, alternative solutions. These are discussed below.

2. Possible Solutions to the Paradox of Properties

The Theory of Types. It was noted above that Frege did have an adequate response to the contradiction when formulated as a paradox of properties. Frege’s response was in effect a precursor to one of the most commonly discussed and articulated proposed solutions to this form of the paradox. This is to insist that properties fall into different types, and that the type of a property is never the same as that of the entities to which it applies. Thus, the question never even arises as to whether a property applies to itself. A logical language that divides entities into such a hierarchy is said to employ the theory of types. Though hinted at already in Frege, the theory of types was first fully explained and defended by Russell in Appendix B of the Principles. Russell’s theory of types was more comprehensive than Frege’s distinction of levels; it divided not only properties into different logical types, but classes as well. The use of the theory of types to solve the other form of Russell’s paradox is described below.

To be philosophically adequate, the adoption of the theory of types for properties requires developing an account of the nature of properties such that one would be able to explain why they cannot apply to themselves. After all, at first blush, it would seem to make sense to predicate a property of itself. The property of being self-identical would seem to be self-identical. The property of being nice seems to be nice. Similarly, it seems false, not nonsensical, to say that the property of being a cat is a cat. However, different thinkers explain the justification for the type-division in different ways. Russell even gave different explanations at different parts of his career. For his part, the justification for Frege’s division of different levels of concepts derived from his theory of the unsaturatedness of concepts. Concepts, as functions, are essentially incomplete. They require an argument in order to yield a value. One cannot simply predicate one concept of a concept of the same type, because the argument concept still requires its own argument. For example, while it is possible to take the square root of the square root of some number, one cannot simply apply the function square root to the function square root and arrive at a value.

Conservatism about Properties. Another possible solution to the paradox of properties would involve denying that a property exists corresponding to any specifiable conditions or well-formed predicate. Of course, if one eschews metaphysical commitment to properties as objective and independent entities altogether, that is, if one adopts nominalism, then the paradoxical question is avoided entirely. However, one does not need to be quite so extreme in order to solve the antinomy. The higher-order logical systems developed by Frege and Russell contained what is called the comprehension principle, the principle that for every open formula, no matter how complex, there exists as an entity a property or concept exemplified by all and only those things that satisfy the formula. In effect, they were committed to attributes or properties for any conceivable set of conditions or predicates, no matter how complex. However, one could instead adopt a more austere metaphysics of properties, only granting objective existence to simple properties, perhaps including redness, solidity and goodness, etc. One might even allow that such properties can possibly apply to themselves, e.g. that goodness is good. However, on this approach one would deny the same status to complex attributes, e.g. such so-called “properties” as having-seventeen-heads, being-a-cheese-made-in-England, having-been-written-underwater, etc. It is simply not the case that any specifiable condition corresponds to a property, understood as an independently existing entity that has properties of its own. Thus, one might deny that there is a simple property being-a-property-that-does-not-apply-to-itself. If so, one can avoid the paradox simply by adopting a more conservative metaphysics of properties.

3. Possible Solutions to the Paradox of Classes or Sets

It was mentioned above that late in his life, Frege gave up entirely on the feasibility of the logic of classes or sets. This is of course one ready solution to the antinomy in the class or set form: simply deny the existence of such entities altogether. Short of this, however, the following solutions have enjoyed the greatest popularity:

The Theory of Types for Classes: It was mentioned earlier that Russell advocated a more comprehensive theory of types than Frege’s distinction of levels, one that divided not only properties or concepts into various types, but classes as well. Russell divided classes into classes of individuals, classes of classes of individuals, and so on. Classes were not taken to be individuals, and classes of classes of individuals were not taken to be classes of individuals. A class is never of the right type to have itself as member. Therefore, there is no such thing as the class of all classes that are not members of themselves, because for any class, the question of whether it is in itself is a violation of type. Once again, here the challenge is to explain the metaphysics of classes or sets in order to explain the philosophical grounds of the type-division.

Stratification: In 1937, W. V. Quine suggested an alternative solution in some ways similar to type-theory. His suggestion was that, rather than actually dividing entities into individuals, classes of individuals, etc., such that the proposition that some class is in itself is always ill-formed or nonsensical, we can instead put certain restrictions on what classes are supposed to exist. Classes are only supposed to exist if their defining conditions do not involve what would, in type theory, be a violation of types. Thus, for Quine, while “x is not a member of x” is a meaningful assertion, we do not suppose there to exist a class of all entities x that satisfy this statement. In Quine’s system, a class is supposed to exist for some open formula A if and only if the formula A is stratified, that is, if there is some assignment of natural numbers to the variables in A such that for each occurrence of the class membership sign, the variable preceding the membership sign is given an assignment one lower than the variable following it. This blocks Russell’s paradox, because the formula used to define the problematic class has the same variable both before and after the membership sign, obviously making it unstratified. However, it has yet to be determined whether or not the resulting system, which Quine called “New Foundations for Mathematical Logic” or NF for short, is consistent or inconsistent.
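The stratification test just described is mechanical, and a small sketch may help make it concrete. The Python below is an illustration only, not Quine's own formulation: it assumes a formula is represented simply as a list of (left, right) pairs, one for each occurrence of the membership sign, and the function name stratify is hypothetical. It tries to assign a natural number to each variable so that every variable preceding a membership sign sits exactly one level below the variable following it.

```python
from collections import deque


def stratify(atoms):
    """Try to stratify a formula given as (left, right) pairs meaning 'left is in right'.

    Returns a dict mapping each variable to a natural number, or None when
    no such assignment exists (the formula is unstratified).
    """
    # Each membership atom forces level(right) == level(left) + 1.
    graph = {}
    for left, right in atoms:
        graph.setdefault(left, []).append((right, 1))
        graph.setdefault(right, []).append((left, -1))

    level = {}
    for start in graph:
        if start in level:
            continue
        # Propagate the constraints through this connected group of variables.
        level[start] = 0
        component = [start]
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w, delta in graph[v]:
                expected = level[v] + delta
                if w not in level:
                    level[w] = expected
                    component.append(w)
                    queue.append(w)
                elif level[w] != expected:
                    return None  # two constraints disagree: unstratified
        # Shift the group so that its lowest level is 0 (natural numbers only).
        lowest = min(level[v] for v in component)
        for v in component:
            level[v] -= lowest
    return level


russell_condition = [("x", "x")]          # the atom inside "x is not a member of x"
chain = [("x", "y"), ("y", "z")]          # "x is in y and y is in z"
print(stratify(russell_condition))
print(stratify(chain))
```

On this representation the condition behind Russell's class receives no assignment, since x would need a level one lower than its own, whereas the chain of memberships is assigned increasing levels.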

Aussonderung: A quite different approach is taken in Zermelo-Fraenkel (ZF) set theory. Here too, a restriction is placed on what sets are supposed to exist. Rather than taking the “top-down” approach of Russell and Frege, who originally believed that for any concept, property or condition, one can suppose there to exist a class of all those things in existence with that property or satisfying that condition, in ZF set theory, one begins from the “bottom up”. One begins with individual entities, and the empty set, and puts such entities together to form sets. Thus, unlike the early systems of Russell and Frege, ZF is not committed to a universal set, a set including all entities or even all sets. ZF puts tight restrictions on what sets exist. Only those sets that are explicitly postulated to exist, or which can be put together from such sets by means of iterative processes, etc., can be concluded to exist. Then, rather than having a naive class abstraction principle that states that an entity is in a certain class if and only if it meets its defining condition, ZF has a principle of separation, selection, or as in the original German, “Aussonderung”. Rather than supposing there to exist a set of all entities that meet some condition simpliciter, for each set already known to exist, Aussonderung tells us that there is a subset of that set of all those entities in the original set that satisfy the condition. The class abstraction principle then becomes: if set A exists, then for all entities x in A, x is in the subset of A that satisfies condition C if and only if x satisfies condition C. This approach solves Russell’s paradox, because we cannot simply assume that there is a set of all sets that are not members of themselves. Given a set of sets, we can separate or divide it into those sets within it that are in themselves and those that are not, but since there is no universal set, we are not committed to the set of all such sets. Without the supposition of Russell’s problematic class, the contradiction cannot be proven.
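The contrast between naive class abstraction and Aussonderung can be checked by brute force on a very small scale. The following Python sketch is only an illustration under stated assumptions, not a model of ZF: it enumerates every possible membership relation on a universe of three nameless objects, and the names extension and count_counterexamples are hypothetical. It counts (a) structures containing an object whose members are exactly the things that are not members of themselves, and (b) cases in which the subset separated from some A by the condition “is not a member of itself” is itself a member of A.

```python
from itertools import product


def extension(universe, member, s):
    """All x with x 'in' s, where the pair (x, s) in `member` means x is in s."""
    return frozenset(x for x in universe if (x, s) in member)


def count_counterexamples(n=3):
    """Search every membership relation on n objects for the two situations above."""
    universe = list(range(n))
    pairs = [(x, y) for x in universe for y in universe]
    naive = 0      # structures containing a Russell class
    separated = 0  # cases where the separated subset of A is a member of A

    for bits in product([False, True], repeat=len(pairs)):
        member = {pair for pair, bit in zip(pairs, bits) if bit}
        non_self = frozenset(x for x in universe if (x, x) not in member)

        # Naive abstraction: an object whose members are exactly the non-self-members.
        if any(extension(universe, member, r) == non_self for r in universe):
            naive += 1

        # Aussonderung: only the part of an existing A that meets the condition.
        for a in universe:
            russell_part = extension(universe, member, a) & non_self
            if any(
                extension(universe, member, r) == russell_part and (r, a) in member
                for r in universe
            ):
                separated += 1
    return naive, separated


print(count_counterexamples(3))
```

Both counts come out as zero. No structure contains Russell's class at all, whereas the separated subset may exist without trouble precisely because it is never among the members of the set it was separated from.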

There have been subsequent expansions or modifications made on all these solutions, such as the ramified type-theory of Principia Mathematica, Quine’s later expanded system of his Mathematical Logic, and the later developments in set-theory made by Bernays, Gödel and von Neumann. The question of what is the correct solution to Russell’s paradox is still a matter of debate.

See also the Russell-Myhill Paradox article in this encyclopedia.

4. References and Further Reading

  • Coffa, Alberto. “The Humble Origins of Russell’s Paradox.” Russell nos. 33-4 (1979): 31-7.
  • Frege, Gottlob. The Basic Laws of Arithmetic: Exposition of the System. Edited and translated by Montgomery Furth. Berkeley: University of California Press, 1964.
  • Frege, Gottlob. Correspondence with Russell. In Philosophical and Mathematical Correspondence. Translated by Hans Kaal. Chicago: University of Chicago Press, 1980.
  • Geach, Peter T. “On Frege’s Way Out.” Mind 65 (1956): 408-9.
  • Grattan-Guinness, Ivor. “How Bertrand Russell Discovered His Paradox.” Historia Mathematica 5 (1978): 127-37.
  • Hatcher, William S. Logical Foundations of Mathematics. New York: Pergamon Press, 1982.
  • Quine, W. V. O. “New Foundations for Mathematical Logic.” In From a Logical Point of View. 2d rev. ed. Cambridge, MA: Harvard University Press, 1980. (First published in 1937.)
  • Quine, W. V. O. “On Frege’s Way Out.” Mind 64 (1955): 145-59.
  • Russell, Bertrand. Correspondence with Frege. In Philosophical and Mathematical Correspondence, by Gottlob Frege. Translated by Hans Kaal. Chicago: University of Chicago Press, 1980.
  • Russell, Bertrand. The Principles of Mathematics. 2d. ed. Reprint, New York: W. W. Norton & Company, 1996. (First published in 1903.)
  • Zermelo, Ernst. “Investigations in the Foundations of Set Theory I.” In From Frege to Gödel, ed. by Jean van Heijenoort. Cambridge, MA: Harvard University Press, 1967. (First published in 1908.)

Author Information

Kevin C. Klement
Email: klement@philos.umass.edu
University of Massachusetts, Amherst
U. S. A.

Square of Opposition

The square of opposition is a chart that was introduced within classical (categorical) logic to represent the logical relationships holding between certain propositions in virtue of their form. The square, traditionally conceived, looks like this:

[Figure: the traditional square of opposition]

The four corners of this chart represent the four basic forms of propositions recognized in classical logic:

A propositions, or universal affirmatives, take the form: All S are P.
E propositions, or universal negations, take the form: No S are P.
I propositions, or particular affirmatives, take the form: Some S are P.
O propositions, or particular negations, take the form: Some S are not P.

Given the assumption made within classical (Aristotelian) categorical logic, that every category contains at least one member, the following relationships, depicted on the square, hold:

Firstly, A and O propositions are contradictory, as are E and I propositions. Propositions are contradictory when the truth of one implies the falsity of the other, and conversely. Here we see that the truth of a proposition of the form All S are P implies the falsity of the corresponding proposition of the form Some S are not P. For example, if the proposition “all industrialists are capitalists” (A) is true, then the proposition “some industrialists are not capitalists” (O) must be false. Similarly, if “no mammals are aquatic” (E) is false, then the proposition “some mammals are aquatic” (I) must be true.

Secondly, A and E propositions are contrary. Propositions are contrary when they cannot both be true. An A proposition, e.g., “all giraffes have long necks” cannot be true at the same time as the corresponding E proposition: “no giraffes have long necks.” Note, however, that corresponding A and E propositions, while contrary, are not contradictory. While they cannot both be true, they can both be false, as with the examples of “all planets are gas giants” and “no planets are gas giants.”

Next, I and O propositions are subcontrary. Propositions are subcontrary when it is impossible for both to be false. Because “some lunches are free” is false, “some lunches are not free” must be true. Note, however, that it is possible for corresponding I and O propositions both to be true, as with “some nations are democracies,” and “some nations are not democracies.” Again, I and O propositions are subcontrary, but not contrary or contradictory.

Lastly, two propositions are said to stand in the relation of subalternation when the truth of the first (“the superaltern”) implies the truth of the second (“the subaltern”), but not conversely. A propositions stand in the subalternation relation with the corresponding I propositions. The truth of the A proposition “all plastics are synthetic,” implies the truth of the proposition “some plastics are synthetic.” However, the truth of the O proposition “some cars are not American-made products” does not imply the truth of the E proposition “no cars are American-made products.” In traditional logic, the truth of an A or E proposition implies the truth of the corresponding I or O proposition, respectively. Consequently, the falsity of an I or O proposition implies the falsity of the corresponding A or E proposition, respectively. However, the truth of a particular proposition does not imply the truth of the corresponding universal proposition, nor does the falsity of a universal proposition carry downwards to the respective particular propositions.
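The four relations on the traditional square can be checked exhaustively over a small domain. The Python sketch below is an illustration under assumptions of its own: categories are modelled as finite sets, the four forms are read in the usual set-theoretic way, and the function names are hypothetical. Existential import is modelled by requiring only that the subject category S be non-empty.

```python
from itertools import combinations


def all_are(s, p):        # A: All S are P
    return s <= p

def none_are(s, p):       # E: No S are P
    return not (s & p)

def some_are(s, p):       # I: Some S are P
    return bool(s & p)

def some_are_not(s, p):   # O: Some S are not P
    return bool(s - p)


def subsets(domain):
    """Every subset of the domain, as frozensets."""
    items = list(domain)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def traditional_square_holds(domain):
    """Check all four relations for every non-empty S and every P over the domain."""
    for s in subsets(domain):
        if not s:
            continue  # existential import: skip empty subject categories
        for p in subsets(domain):
            a, e = all_are(s, p), none_are(s, p)
            i, o = some_are(s, p), some_are_not(s, p)
            contradictory = (a != o) and (e != i)        # exactly one of each pair is true
            contrary = not (a and e)                     # never both true
            subcontrary = i or o                         # never both false
            subaltern = (not a or i) and (not e or o)    # A implies I, E implies O
            if not (contradictory and contrary and subcontrary and subaltern):
                return False
    return True


print(traditional_square_holds({1, 2, 3}))
```

The check succeeds for every pair of categories over the three-element domain, which matches the claim that the traditional relations hold once empty subject categories are excluded.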

The presupposition, mentioned above, that all categories contain at least one thing, has been abandoned by most later logicians. Modern logic deals with uninstantiated terms such as “unicorn” and “ether flow” the same as it does other terms such as “apple” and “orangutan”. When dealing with “empty categories”, the relations of being contrary, being subcontrary and of subalternation no longer hold. Consider, e.g., “all unicorns have horns” and “no unicorns have horns.” Within contemporary logic, these are both regarded as true, so strictly speaking, they cannot be contrary, despite the former’s status as an A proposition and the latter’s status as an E proposition. Similarly, “some unicorns have horns” (I) and “some unicorns do not have horns” (O) are both regarded as false, and so they are not subcontrary. Obviously then, the truth of “all unicorns have horns” does not imply the truth of “some unicorns have horns,” and the subalternation relation fails to hold as well. Without the traditional presupposition of “existential import”, i.e., the supposition that all categories have at least one member, only the contradictory relation holds. On what is sometimes called the “modern square of opposition” (as opposed to the traditional square of opposition sketched above) the lines for contraries, subcontraries and subalternation are erased, leaving only the diagonal lines for the contradictory relation.
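The unicorn example can be reproduced with the same set-theoretic reading of the four forms. In this short Python sketch the function names and the contents of the horned set are hypothetical choices made for illustration; the only thing that matters is that the subject category is empty.

```python
def all_are(s, p):        # A: All S are P
    return s <= p

def none_are(s, p):       # E: No S are P
    return not (s & p)

def some_are(s, p):       # I: Some S are P
    return bool(s & p)

def some_are_not(s, p):   # O: Some S are not P
    return bool(s - p)


unicorns = frozenset()                        # an empty category
horned = frozenset({"rhinoceros", "goat"})    # illustrative members only

results = {
    "A": all_are(unicorns, horned),
    "E": none_are(unicorns, horned),
    "I": some_are(unicorns, horned),
    "O": some_are_not(unicorns, horned),
}
print(results)
```

Here A and E are both true and I and O are both false, so the contrary, subcontrary and subalternation relations all fail, while each diagonal pair (A with O, E with I) still has opposite truth values.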

Author Information

The author of this article is anonymous.
The IEP would like a qualified author to replace this article with a longer one.

Zhuangzi (Chuang-Tzu, 369—298 B.C.E.)

The Zhuangzi (also known in Wade-Giles romanization as Chuang-tzu), named after “Master Zhuang,” was, along with the Laozi, one of the earliest texts to contribute to the philosophy that has come to be known as Daojia, or School of the Way. According to traditional dating, Master Zhuang, to whom the first seven chapters of the text have traditionally been attributed, was an almost exact contemporary of the Confucian thinker Mencius, but we have no record of direct philosophical dialogue between them. The text is ranked among the greatest of literary and philosophical masterpieces that China has produced. Its style is complex: mythical, poetic, narrative, humorous, indirect, and polysemic.

Much of the text espouses a holistic philosophy of life, encouraging disengagement from the artificialities of socialization, and cultivation of our natural “ancestral” potencies and skills, in order to live a simple and natural, but full and flourishing life. It is critical of our ordinary categorizations and evaluations, noting the multiplicity of different modes of understanding between different creatures, cultures, and philosophical schools, and the lack of an independent means of making a comparative evaluation. It advocates a mode of understanding that is not committed to a fixed system, but is fluid and flexible, and that maintains a provisional, pragmatic attitude towards the applicability of these categories and evaluations.

The Zhuangzi text is an anthology, in which several distinctive strands of Daoist thought can be recognized. The Jin dynasty thinker and commentator, Guo Xiang (Kuo Hsiang, d. 312 CE), edited and arranged an early collection, and reduced what had been a work in fifty-two chapters down to thirty-three chapters, excising material that he considered to be repetitious or spurious.  The versions of Daoist philosophy expressed in this text were highly influential in the reception, interpretation, and transformation of Buddhist philosophies in China.

Table of Contents

  1. Historical Background
  2. The Zhuangzi Text
  3. Central Concepts in the “Inner Chapters”
    1. Chapter 1: Xiao Yao You (Wandering Beyond)
    2. Chapter 2: Qi Wu Lun (Discussion on Smoothing Things Out)
    3. Chapter 3: Yang Sheng Zhu (The Principle of Nurturing Life)
    4. Chapter 4: Ren Jian Shi (The Realm of Human Interactions)
    5. Chapter 5: De Chong Fu (Signs of the Flourishing of Potency)
    6. Chapter 6: Da Zong Shi (The Vast Ancestral Teacher)
    7. Chapter 7: Ying Di Wang (Responding to Emperors and Kings)
  4. Key Interpreters of Zhuangzi
  5. References and Further Reading

1. Historical Background

According to the Han dynasty historian, Sima Qian, Zhuangzi was born during the Warring States (403-221 BCE), more than a century after the death of Confucius. During this time, the ostensibly ruling house of Zhou had lost its authority, and there was increasing violence between states contending for imperial power. This situation gave birth to the phenomenon known as the baijia, the hundred schools: the flourishing of many schools of thought, each articulating its own conception of a return to a state of harmony. The first and most significant of these schools was that of Confucius, who became the chief representative of the Ruists (Confucians), the scholars and propagators of the wisdom and culture of the tradition. Their great rivals were the Mohists, the followers of Mozi (“Master Mo”), who were critical of what they perceived to be the elitism and extravagance of the traditional culture. The archaeological discovery at Guo Dian in 1993 of an early Laozi manuscript suggests that the philosophical movement associated with the text also began to emerge during this period. The strands of Daoist philosophy expressed in the earliest strata of the Zhuangzi developed within a context infused with the ideas of these three schools. Master Zhuang is usually taken to be the author of the first seven chapters, but in recent years a few scholars have found reason to be skeptical not just of his authorship of any of the text, but also of his very existence.

According to early evidence compiled by Sima Qian, Zhuangzi was born in a village called Meng, in the state of Song; according to Lu Deming, the Sui-Tang dynasty scholar, the Pu River in which Zhuangzi was said to have fished was in the state of Chen which, as Wang Guowei points out, had become a territory of the southern state of Chu. We might say that Zhuangzi was situated in the borderlands between Chu, centered around the Yangzi River, and the central plains—which centered around the Yellow River and which were the home of the Shang and Zhou cultures. Some scholars, especially in China, maintain that there is a connection between the philosophies of the Daoist texts and the culture of Chu. The diversity of regions and cultures in early China has increasingly been acknowledged, and most interest has been directed to the state of Chu, in large part because of the wealth of archaeological evidence that is being unearthed there. As one develops a sensitivity for the culture of Chu, one senses deep resonances with the aesthetic sensibility of the Daoists, and with Zhuangzi’s style in particular. The silks and bronzes of Chu, for example, are rich and vibrant; the patterns and images on fabrics and pottery are fanciful and naturalistic. However, while the evidence is persuasive, it is far from decisive.

If the traditional dating is reliable, then Zhuangzi would have been an almost exact contemporary of the Ruist thinker Mencius, though there is no clear evidence of communication between them. There are a few remarks in the Zhuangzi that could possibly be alluding to Mencius’ philosophy, but there is nothing in the Mencius that shows any interest in Zhuangzi. The philosopher and statesman Hui Shi, or Huizi (“Master Hui,” 380-305 BCE), is represented as a close friend of Zhuangzi, though decidedly unconvinced by his philosophical musings. There appears to have been a friendly rivalry between the broad and mythic-minded Zhuangzi and the politically motivated Huizi, who is critiqued in the text as a shortsighted paradox-monger. Despite their very deep philosophical distance, and Huizi’s perceived limitations, Zhuangzi expresses great appreciation both for his linguistic abilities and for his friendship. The other “logician,” Gongsun Longzi, would also have been a contemporary of Zhuangzi, and although Zhuangzi does not, unfortunately, engage in any direct philosophical discussion with him, one does find what appears to be an occasional wink in his direction.

2. The Zhuangzi Text

The currently extant text known as the Zhuangzi is the result of the editing and arrangement of the Jin dynasty thinker and commentator Guo Xiang (Kuo Hsiang, d. 312 CE). He reduced what was then a work in fifty-two chapters to the current edition of thirty-three chapters, excising material that he considered to be spurious. His commentary on the text provides an interpretation that has been one of the most influential over the subsequent centuries.

Guo Xiang’s thirty-three chapter edition of the text is divided into three collections, known as the Inner Chapters (Neipian), the Outer Chapters (Waipian), and the Miscellaneous Chapters (Zapian). The Inner Chapters are the first seven chapters and are generally considered to be the work of Zhuangzi himself. Because the evidence for this attribution is sparse and because of the miscellaneous nature of the editing, some scholars (McCraw, Klein) express skepticism that we can be sure which were the earliest passages or by whom they were written. The Outer Chapters are chapters 8 to 22, and the Miscellaneous Chapters are chapters 23 to 33. The Outer and Miscellaneous Chapters can be further subdivided into different strands of Daoist thought. Much modern research has been devoted to a sub-classification of these chapters according to philosophical school. Kuan Feng made some scholarly breakthroughs early in the twentieth century; A. C. Graham continued the classification in the tradition of Kuan Feng. Harold Roth has also taken up a consideration of this issue and come up with some very interesting results. What follows is a simplified version of the results of the research of Liu Xiaogan.

According to Liu, chapters 17 to 27 and 32 can be considered to be the work of a school of Zhuangzi’s followers, what he calls the Shu Zhuang Pai, or the “Transmitter” school. Graham, following Kuan Feng, considers chapters 22 to 27 and 32 not to be coherent chapters, but merely random “ragbag” collections of fragments. In fact, this miscellaneous character is characteristic of many, if not most, of the rest of the chapters, and complicates any simplistic classification of chapters as a whole. Liu considers chapters 8 to 10, chapters 28 to 31, and the first part of chapter 11 to be from a school of Anarchists whose philosophy is closely related to that of Laozi. Graham, again following Kuan Feng, sees these as two separate but related schools: the first he attributes to a writer he calls the “Primitivist,” the second he considers to be a school of followers of Yang Zhu. Liu classifies chapters 12 to 16, chapter 33, and the latter part of chapter 11 as belonging to the Han dynasty school known as Huang-Lao. Graham refers to them as the Syncretist chapters. Graham finds the classification of chapter 16 to be problematic. Chapter 30 does not seem to have any distinctively Daoist content at all. Though Graham thinks that it is consistent with the Yangist emphasis on preserving life, it is also consistent with Confucian and Mohist critiques of aggression.

In the following chart the further to the right the chapters are listed, the further away they are from the central ideas of the Zhuangzian philosophy of the Inner Chapters:

The Inner Chapters | School of Zhuang | Anarchist Utopianism | Huang-Lao Syncretism
1. Wandering Beyond | 17. Autumn Floods | 8. Webbed Toes | 11. Let it Be, Leave it Alone
2. Discussion on Smoothing Things Out | 18. Utmost Happiness | 9. Horse’s Hooves | 12. Heaven and Earth
3. The Principle of Nurturing Life | 19. Mastering Life | 10. Rifling Trunks | 13. The Way of Heaven
4. In the Human Realm | 20. The Mountain Tree | 11. Let it Be, Leave it Alone | 14. The Turning of Heaven
5. Signs of Abundant Potency | 21. Tian Zi Fang | | 15. Constrained in Will
6. The Vast Ancestral Teacher | 22. Knowledge Wandered North | (16?. Mending the Inborn Nature) | (16?. Mending the Inborn Nature)
7. Responding to Emperors and Kings | 23. Geng Sang Chu | |
| 24. Xu Wugui | 28. Yielding the Throne | 33. The World
| 25. Ze Yang | 29. Robber Zhi |
| 26. External Things | (30. Discoursing on Swords?) |
| 27. Imputed Words | 31. The Old Fisherman |
| 32. Lie Yukou | |

3. Central Concepts in the “Inner Chapters”

The following is an account of the central ideas of Zhuangzian philosophy, going successively through each of the seven Inner Chapters. This discussion is not confined to the content of the particular chapters, but rather represents a fuller articulation of the inter-relationships of the ideas between the Inner Chapters, and also between these ideas and those expressed in the Outer and Miscellaneous Chapters, where these appear to be related. References to “Zhuangzi” below should not be taken as referring to a historical person, but rather as shorthand for the overall philosophy as articulated in the text of the Inner Chapters and related passages.

a. Chapter 1: Xiao Yao You (Wandering Beyond)

The title of the first chapter of the Zhuangzi has also been translated as “Free and Easy Wandering” and “Going Rambling Without a Destination.” Both of these reflect the sense of the Daoist who is in spontaneous accord with the natural world, and who has retreated from the anxieties and dangers of social life, in order to live a healthy and peaceful natural life. In modern Mandarin, the word xiaoyao has thus come to mean “free, at ease, leisurely, spontaneous.” It conveys the impression of people who have given up the hustle and bustle of worldly existence and have retired to live a leisurely life outside the city, perhaps in the natural setting of the mountains.

But this everyday expression is lacking a deeper significance that is expressed in the classical Chinese phrase: the sense of distance, or going beyond. As with all Zhuangzi’s images, this is to be understood metaphorically. The second word, ‘yao,’ means ‘distance’ or ‘beyond,’ and here implies going beyond the boundaries of familiarity. We ordinarily confine ourselves within our social roles, expectations, and values, and with our everyday understandings of things. But this, according to Zhuangzi, is inadequate for a deeper appreciation of the natures of things, and for a more successful mode of interacting with them. We need at the very least to undo preconceptions that prevent us from seeing things and events in new ways; we need to see how we can structure and restructure the boundaries of things. But we can only do so when we ourselves have ‘wandered beyond’ the boundaries of the familiar. It is only by freeing our imaginations to reconceive ourselves, and our worlds, and the things with which we interact, that we may begin to understand the deeper tendencies of the natural transformations by which we are all affected, and of which we are all constituted. By loosening the bonds of our fixed preconceptions, we bring ourselves closer to an attunement to the potent and productive natural way (dao) of things.

Paying close attention to the textual associations, we see that wandering is associated with the word wu, ordinarily translated ‘nothing,’ or ‘without.’ Related associations include: wuyou (no ‘something’) and wuwei (no interference). Roger Ames and David Hall have commented extensively on these wu expressions. Most importantly, they are not to be understood as simple negations, but have a much more complex function. The significance of all of these expressions must be traced back to the wu of Laozi: a type of negation that does not simply negate, but places us in a new kind of relation to ‘things’—a phenomenological waiting that allows them to manifest, one that acknowledges the space that is the possibility of their coming to presence, one that appreciates the emptiness that is the condition of the possibility of their capacity to function, to be useful (as the hollow inside a house makes it useful for living). The behavior of one who wanders beyond becomes wuwei: sensitive and responsive without fixed preconceptions, without artifice, responding spontaneously in accordance with the unfolding of the inter-developing factors of the environment of which one is an inseparable part.

But it is not just the crossing of horizontal boundaries that is at stake. There is also the vertical distance that is important: one rises to a height from which formerly important distinctions lose what appeared to be their crucial significance. Thus arises the distinction between the great and the small, or the Vast (da) and the petty (xiao). Of this distinction Zhuangzi says that the petty cannot come up to the Vast: petty understanding that remains confined and defined by its limitations cannot match Vast understanding, the expansive understanding that wanders beyond. Now, while it is true that the Vast loses sight of distinctions noticed by the petty, it does not follow that they are thereby equalized, as Guo Xiang suggests. For the Vast still embraces the petty in virtue of its very vastness. The petty, precisely in virtue of its smallness, is not able to reciprocate.

Now, the Vast that goes beyond our everyday distinctions also thereby appears to be useless. A soaring imagination may be wild and wonderful, but it is extremely impractical and often altogether useless. Indeed, Huizi, Zhuangzi’s friend and philosophical foil, chides him for this very reason. But Zhuangzi expresses disappointment in him: for his inability to sense the use of this kind of uselessness is a kind of blindness of the spirit. The useless has use, only not as seen on the ordinary level of practical affairs. It has a use in the cultivation and nurturing of the ‘shen’ (spirit), in protecting the ancestral and preserving one’s life, so that one can last out one’s natural years and live a flourishing life. Now, this notion of a flourishing life is not to be confused with a ‘successful’ life: Zhuangzi is not impressed by worldly success. A flourishing life may indeed look quite unappealing from a traditional point of view. One may give up social ambition and retire in relative poverty to tend to one’s shen and cultivate one’s xing (nature, or life potency).

To summarize: When we wander beyond, we leave behind everything we find familiar, and explore the world in all its unfamiliarity. We drop the tools that we have been taught to use to tame the environment, and we allow it to teach us without words. We imitate its spontaneous behavior and we learn to respond immediately without fixed articulations.

b. Chapter 2: Qi Wu Lun (Discussion on Smoothing Things Out)

If the Inner Chapters form the core of the Zhuangzi collection, then the Qi Wu Lun may be thought of as forming the core of the Inner Chapters. It is, at any rate, the most complex and intricate of the chapters of the Zhuangzi, with allusions and allegories, highly condensed arguments, and baffling metaphors juxtaposed without explanation. It appears to be concerned with the deepest and most ‘abstract’ understanding of ourselves, our lives, our world, our language, and indeed of our understanding itself. The most perplexing sections concern language and judgment, and are filled with paradox, sometimes even contradiction. But the contradictions are not easy to dismiss: their context indicates that they have a deep significance. In part, they appear to attempt to express an understanding about the limits of understanding itself, about the limits of language and thought.

This creates a problem for the interpreter, and especially for the translator. How do we deal with the contradictions? The most common solution is to paraphrase them so as to remove the direct contradictoriness, under the presupposition that no sense can be made of a contradiction. The most common way to remove the contradictions is to insert references to points of view. Those translators, such as A. C. Graham, who do this are following the interpretation of the Jin dynasty commentator Guo Xiang, who presents the philosophy as a form of relativism: apparently opposing judgments can be harmonized when it is recognized that they are made from different perspectives.

According to Guo Xiang’s interpretation, each thing has its own place, its own nature (ziran); and each thing has its own value that follows from its own nature. If so, then nothing should be judged by values appropriate to the natures of other things. According to Guo Xiang, the vast and the small are equal in significance: this is his interpretation of the word “qi” in the title, “equalization of all viewpoints”. Now, such a radical relativism may have the goal of issuing a fundamental challenge to the status quo, arguing that the established values have no more validity than any of the minority values, no matter how shocking they may seem to us. In this way, its effect would be one of destabilization of the social structure. Here, however, we see another of the possible consequences of such a position: its inherent conservativeness. Guo Xiang’s purpose in asserting this radical uniqueness and necessity of each position is conservative in this way. Indeed, it appears to be articulated precisely in response to those who oppose the traditional Ruist values of humanity and rightness (ren and yi) by claiming to have a superior mystical ground from which to judge them to be lacking. Guo Xiang’s aim in asserting the equality of every thing, every position, and every function, is to encourage each thing, and each person, to accept its own place in the hierarchical system, to acknowledge its value in the functioning of the whole. In this way, radical relativism actually forestalls the possibility of radical critique altogether!

According to this reading, the Vast perspective of the giant Peng bird is no better than the petty perspectives of the little birds who laugh at it. And indeed, Guo Xiang draws precisely this conclusion. But there is a problem with taking this reading too seriously, and it is the kind of problem that plagues all forms of radical relativism when one attempts to follow them through consistently. Simply put, Zhuangzi would have to acknowledge that his own position is no better than those he appears to critique. He would have to acknowledge that his Daoist philosophy, indeed even this articulation of relativism, is no improvement over Confucianism after all, and that it is no less short-sighted than the logic-chopping of the Mohists. This, however, is a consequence that Zhuangzi does not recognize. This is a strong indication that the radical relativistic interpretation is a misreading.

Recently, some western interpreters (Lisa Raphals and Paul Kjellberg, for example) have focused their attention on aspects of the text that express affinities with the Hellenistic philosophy of Skepticism. Now, it is important not to confuse this with what in modern philosophy is thought of as a doctrine of skepticism, the most common form of which is the claim that we cannot ever claim to know anything, for at least the reason that we might always be wrong about anything we claim to know—that is, because we can never know anything with absolute certainty. This is not quite the claim of the ancient Skeptics. Arguing from a position of fallibilism, these latter feel that we ought never to make any final judgments that go beyond the immediate evidence, or the immediate appearances. We should simply accept what appears at face value and have no further beliefs about its ultimate consequences, or its ultimate value. In particular, we should refrain from making judgments about whether it is good or bad for us. We bracket (epoche) these ultimate judgments. When we see that such things are beyond our ability to know with certainty, we will learn to let go of our anxieties and accept the things that happen to us with equanimity. Such a state of emotional tranquility they call ‘ataraxia.’

Now, the resonances with Zhuangzi’s philosophy are clear. Zhuangzi also accepts a form of fallibilism. While he does not refrain from making judgments, he nevertheless acknowledges that we cannot be certain that what we think of as good for us may not ultimately be bad for us, or that what we now think of as something terrible to be feared (death, for example) might not be an extraordinarily blissful awakening and a release from the toils and miseries of worldly life. When we accept this, we refrain from dividing things into the acceptable and the unacceptable; we learn to accept the changes of things in all their aspects with equanimity. In the Skeptical reading, the textual contradictions are also resolved by appealing to different perspectives from which different judgments appear to be true. Once one has learnt how to shift easily between the perspectives from which such different judgments can be made, then one can see how such apparently contradictory things can be true at the same time—and one no longer feels compelled to choose between them.

There is, however, another way to resolve these contradictions, one that involves recognizing the importance of continuous transformation between contrasting phenomena and even between opposites. In the tradition of Laozi’s cosmology, Zhuangzi’s worldview is also one of seasonal transformations of opposites. The world is seen as a giant clod (da kuai) around which the heavens (tian) revolve about a polar axis (daoshu). All transformations have such an axis, and the aim of the sage is to settle into this axis, so that one may observe the changes without being buffeted around by them.

Now, the theme of opposites is taken up by the Mohists, in their later Mohist Canon, but with a very different understanding. The later Mohists present a detailed analysis of judgments as requiring bivalence: that is, judgments may be acceptable (ke) (also ‘affirmed,’ shi) or unacceptable (buke) (also ‘rejected,’ fei); they must be one or the other and they cannot be both. There must always be a clear distinction between the two. It is to this claim, I believe, that Zhuangzi is directly responding. Rejecting also the Mohist style of discussion, he appeals to an allusive, aphoristic, mythological style of poetic writing to upset the distinctions and blur the boundaries that the Mohists insist must be held apart. The Mohists believe that social harmony can only be achieved when we have clarity of distinctions, especially of evaluative distinctions: true/false, good/bad, beneficial/harmful. Zhuangzi’s position is that this kind of sharp and rigid thinking can result ultimately only in harming our natural tendencies (xing), which are themselves neither sharp nor rigid. If we, on the contrary, learn to nurture those aspects of our heart-minds (xin), our natural tendencies (xing), that are in tune with the natural (tian) and ancestral (zong) within us, then we will eventually find our place at the axis of the way (daoshu) and will be able to ride the transformations of the cosmos free from harm. That is, we will be able to sense and respond to what can only be vaguely expressed without forcing it into gross and unwieldy verbal expressions. We are then able to recognize the paradoxes of vagueness and indeterminacy that arise from infinitesimal processes of transformation.

Put another way, our knowledge and understanding (zhi, tong, da) are not just what we can explicitly see before us and verbalize: in modern terms, they are not just what is ‘consciously,’ ‘conceptually,’ or ‘linguistically’ available to us. Zhuangzi also insists on a level of understanding that goes beyond such relatively crude modes of dividing up our world and experiences. There are hidden modes of knowing, not evident or obviously present, modes that allow us to live, breathe, move, understand, connect with others without words, read our environments through subtle signs; these modes of knowing also give us tremendous skill in coping with others and with our environments. These modes of knowing Zhuangzi calls “wuzhi”, literally ‘without knowing,’ or ‘unknowing’. What is known by such modes of knowing, when we attempt to express it in words, becomes paradoxical and appears contradictory. It seems that bivalent distinctions leave out too much on either side of the divide: they are too crude a tool to cope with the subtlety and complexity of our non-conceptual modes of knowing. Zhuangzi, following a traditional folk psychology of his time, calls this capacity shenming: “spirit insight.”

When we nurture that deepest and most natural, most ancestral part of our psyches, through psycho-physical meditative practices, we at the same time nurture these non-cognitive modes of understanding, embodied wisdoms, that enable us to deal successfully with our circumstances. It is then that we are able to cope directly with what from the limited perspective of our ‘socialized’ and ‘linguistic’ understanding seems to be too vague, too open, too paradoxical.

c. Chapter 3: Yang Sheng Zhu (The Principle of Nurturing Life)

This chapter, like the Anarchist Utopian chapters, deals with the way to nurture and cultivate one’s ‘life tendencies’ (sheng, xing) so as to enable one to live skillfully and last out one’s natural years (zhong qi tian nian). There is a ‘potency’ within oneself that is a source of longevity, an ancestral place from which the phenomena of one’s life continue to arise. This place is to be protected (bao), kept whole (quan), nurtured and cultivated (yang). The result is a sagely and skillful life. We must be careful how we understand this word, ‘skill.’ Zhuangzi takes pains to point out that it is no mere technique. A technique is a procedure that may be mastered, but the skill of the sage goes beyond this. One might say that it has become an ‘art,’ a dao. With Zhuangzi’s conception, any physical activity, whether butchering a carcass, making wooden wheels, or carving beautiful ceremonial bell stands, becomes a dao when it is performed in a spiritual state of heightened awareness (‘attenuation’ xu).

Zhuangzi sees civic involvement as particularly inimical to the preservation and cultivation of one’s natural life. In order to cultivate one’s natural potencies, one must retreat from social life, or at least one must retreat from the highly complex and artificially structured social life of the city. One undergoes a psycho-physical training in which one’s sensory and physical capacities become honed to an extraordinary degree, indicating one’s attunement with the transformations of nature, and thus highly responsive to the tendencies (xing) of all things, people, and processes. The mastery achieved is demonstrated (both metaphorically, and literally) by practical embodied skill. That is, practical embodied skill is also a metaphor representing the mastery of the life of the sage, and so it is also a sign of sagehood (though not all those who are skillful are to be reckoned as sages). Thus, we see many examples of individuals who have achieved extraordinary levels of excellence in their achievements—practical, aesthetic, and spiritual. Butcher Ding provides an example of a practical, and very lowly skill; Liezi’s teacher, Huzi, in chapter 7, provides an example of skill in controlling the very forces responsible for life themselves. Chapter 19, Mastering Life, is replete with examples: a cicada catcher, a ferryman, a carpenter, a swimmer, and Woodcarver Qing, whose aesthetic skill reaches ‘magical’ heights.

d. Chapter 4: Ren Jian Shi (The Realm of Human Interactions)

In this chapter, Zhuangzi continues the theme broached in the last chapter, but now takes on the problem of how to maintain and preserve one’s life and last out one’s years while living in the social realm, especially in circumstances of great danger: a life of civic engagement in a time of social corruption.

The Daoists, especially the authors of the anarchistic utopian chapters, are highly critical of the artificiality required to create and sustain complex social structures. The Daoists are skeptical of the ability of deliberate planning to deal with the complexities of the world within which our social structures have their place. Even the developments of the social world when left to themselves are ‘natural’ developments, and as such escape the confines of planned, structured thinking. The more we try to control and curtail these natural meanderings, the more complicated and unwieldy the social structures become. According to the Daoists, no matter how complex we make our structures, they will never be fully able to cope with the fluid flexibility of natural changes. The Daoists perceive the unfolding of the transformations of nature as exhibiting a kind of natural intelligence, a wisdom that cannot be matched by deliberate artificial thinking, thinking that can be articulated in words. The result is that phenomena guided by such artificial structures quickly lose their course, and have to be constantly regulated, re-calibrated. This need gives rise to the development and articulation of the artificial concepts of ren and yi for the Ruists, and shi and fei for the Mohists.

The Ruists emphasize the importance of cultivating the values of ren ‘humanity’ and yi ‘appropriateness/rightness.’ The Mohists identify a bivalent structure of preference and evaluation, shifei. Our judgments can be positive or negative, and these arise out of our acceptance and rejection of things or of judgments, and these in turn arise out of our emotional responses to the phenomena of benefit and harm, that is, pleasure and pain. Thus, we set up one of two types of systems: the intuitive renyi morality of the Ruists, or the articulated structured shifei of the Mohists.

Zhuangzi sees both of these as dangerous. Neither can keep up with the complex transformations of things and so both will result in harm to our shen and xing. They lead to the desire of rulers to increase their personal profit, their pleasure, and their power, and to do so at the expense of others. The best thing is to steer clear of such situations. But there are times when one cannot do so: there is nothing one can do to avoid involvement in a social undertaking. There are also times—if one has a Ruist sensibility—when one will be moved to do what one can and must in order to improve the social situation. Zhuangzi makes up a story about Confucius’ most beloved and most virtuous follower, Yen Hui, who feels called to help ‘rectify’ the King of a state known for his selfishness and brutality.

Zhuangzi thinks that such a motivation, while admirable, is ultimately misguided. There is little to nothing one can do to change things in a corrupt world. But if you really have to try, then you should be aware of the dangers, be aware of the natures of things, and of how they transform and develop. Be on the lookout for the ‘triggers’: the critical junctures at which a situation can explode out of hand. In the presence of danger, do not confront it: always dance to one side, redirect it through skilled and subtle manipulations, that do not take control, but by adding their own weight appropriately, redirect the momentum of the situation. One must treat all dangerous social undertakings as a Daoist adept: one must perform xinzhai, fasting of the heart-mind. This is a psycho-physical discipline of attenuation, in which one nurtures one’s inner potencies by thinning out one’s personal preferences and keeping one’s emotions in check, so that one may achieve a heightened sensitivity to the tendencies of things. One then responds with the skill of a sage to the dangerous moods and intentions of one’s worldly ruler.

e. Chapter 5: De Chong Fu (Signs of the Flourishing of Potency)

This chapter is populated with a collection of characters with bodily eccentricities: criminals with amputated feet, people born with ‘ugly’ deformities, hunchbacks with no lips. Perhaps some of these are moralistic advisors, like those of chapter 4, who were unsuccessful in bringing virtue and harmony to a corrupt state, and instead received the harsh punishment of their offended ruler. But it is also possible that some were born with these physical ‘deformities.’ As the Commander of the Right says in chapter 3, “When tian (nature) gave me life, it saw to it that I would be one footed.” These then are people whose natural capacity (de) has been twisted somehow, redirected, so that it gives them a potency (de) that is beyond the normal human range. At any rate, this out of the ordinary appearance, this extraordinary physical form, is a sign of something deeper: a potency and a power (de) that connects them more closely to the ancestral source. These are the sages that Zhuangzi admires: those whose virtue (de) is beyond the ordinary, and whose signs of virtue indicate that they have gone beyond.

But what goes beyond is also the source of life. To hold fast to that which is beyond both living and dying, is perhaps also to hold fast to something more primordial that is beyond human and inhuman. To identify with and nurture this source is to nurture that which is at the root of our humanity. If so, then one does not necessarily become inhuman. Indeed, one might argue that this creates the possibility of deepening one’s most genuine humanity, insofar as this is a deeper nature still.

f. Chapter 6: Da Zong Shi (The Vast Ancestral Teacher)

The first part of this chapter is devoted to a discussion of the zhenren: the “genuine person,” or “genuine humanity,” (or in older translations, “True Man”). It begins by asking about the relation between tian and ren, the natural/heaven and the human, and suggests that the greatest wisdom lies in the ability to understand both. Thus, to be forced to choose between being natural or being human is a mistake. A genuinely flourishing human life cannot be separated from the natural, but nor can it on that account deny its own humanity. Genuine humanity is natural humanity.

There are several sections devoted to explicating this genuine humanity. We find that the genuinely human person, the zhenren, is in tune with the cycles of nature, and is not upset by the vicissitudes of life. The zhenren, like Laozi’s sage, is somehow simultaneously unified with things, and yet not tied down by them. The zhenren is in tune with the cycles of nature, and with the cycles of yin yang, and is not disturbed or harmed by them. In fact, the zhenren is not harmed by what appear to us to be their negative phases, nor are their most extreme phases able to upset the balance of the zhenren. This is sometimes expressed with what I take to be the hyperbole that the sage or zhenren can never be drowned by the ocean, nor burned by fire.

In the second part of the chapter, Zhuangzi hints at the process by which we are to cultivate our genuine and natural humanity. These are meditative practices and psycho-physical disciplines (“yogas,” perhaps) by which we learn how to nourish the ancestral root of life that is within us. We learn how to identify with that center which functions as an axis of stability around which the cycles of emotional turbulence flow. By maintaining ourselves as a shifting and responding center of gravity we are able to maintain an equanimity without giving up our feelings altogether. We enjoy riding the dragon without being thrown around by it. Ordinarily, we are buffeted around like flotsam in a storm, and yet, by holding fast to our ancestral nature, and by following the nature of the environment, by “matching nature with nature,” we free ourselves from the mercy of random circumstances.

In this chapter we see a mature development of the ideas of life and death broached in the first three chapters. Zhuangzi continues musing on the significance of our existential predicament as being inextricably tied into interweaving cycles of darkness and light, sadness and joy, living and dying. In chapter two, it was the predicament itself that Zhuangzi described, and he tried to focus on the inseparability and indistinguishability of the two aspects of this single process of transformation. In this chapter, Zhuangzi tries to delve deeper to reach the center of balance, the ‘axis of the way,’ that allows one to undergo these changes with tranquility, and even to accept them with a kind of ‘joy.’ Not an ecstatic affirmation, to be sure, but a tranquil appreciation of the richness, beauty, and “inevitability” of whatever experiences we eventually will undergo. Again, not that we must experience whatever is ‘fated’ for us, or that we ought not to minimize harm and suffering where we can do so, but only that we should acknowledge and accept our situatedness, our thrownness into our situation, as the ‘raw materials’ that we have to deal with.

There are mystical practices hinted at that enable the sage to identify with the datong, the greater flow, not with the particular arisings of these particular emotions, or this particular body, but with what lies within (and below and above) as their ancestral root. These meditative and yogic practices are hinted at in this chapter, and also in chapter 7, but nothing in the text reveals what they are. It is not unreasonable to believe that similar techniques have been handed down by the practitioners of religious Daoism. It is clear, nonetheless, that part of the change is a change in self-understanding, self-identification. We somehow learn to expand, to wander beyond, our boundaries until they include the entire cosmic process. This entire process is seen as like a potter’s wheel, and simultaneously as a whetstone and as a grindstone, on which things are formed, and arise, sharpened, and are ground back down only to be made into new forms. With each ‘birth’ (sheng) some ‘thing’ (wu) new arises, flourishes, develops through its natural (tian) tendencies (xing), and then still following its natural tendencies, responding to those of its natural environment, it winds down: enters (ru) back into the undifferentiated (wu) from which it emerged (chu). The truest friendship arises when members of a community identify with this unknown undifferentiated process in which they are embedded, “forget” the differences between self and other, and spontaneously follow the natural developments of which they are inseparable “parts.”

g. Chapter 7: Ying Di Wang (Responding to Emperors and Kings)

The last of the Inner Chapters does not introduce anything new, but closes by returning to a recurring theme from chapters 1, 3, 5, and 6: that of withdrawing from society. This ‘withdrawal’ has two functions: the first is to preserve one’s ‘life’; the second is to allow society to function naturally, and thus to bring itself to a harmonious completion. Rather than interfering with social interactions, one should allow them to follow their natural course, which, Zhuangzi believes, will be both imaginative and harmonious.

These themes resonate with those of the Anarchist chapters in the Outer (and Miscellaneous) chapters: 8 to 11a and 28 to 32. These encourage a life closer to nature in which one lets go of deliberate control and instead learns how to sense the tendencies of things, allowing them to manifest and flourish, while also adding one’s weight to redirect their momentum away from harm and danger. Or, if harm and danger are unavoidable, then one learns how to minimize them, and how to accept whatever one does have to suffer with equanimity.

4. Key Interpreters of Zhuangzi

The earliest of the interpreters of Zhuangzi’s philosophy are of course his followers, whose commentaries and interpretations have been preserved in the text itself, in the chapters that Liu Xiaogan ascribes to the “Shu Zhuang Pai,” chapters 17 to 27. Most of these chapters constitute holistic developments of the ideas of the Inner Chapters, but some of them concentrate on particular issues raised in particular chapters. For example, the author of Chapter 17, the Autumn Floods, elaborates on the philosophy of perspective and overcoming boundaries that is discussed in the first chapter, Xiao Yao You. This chapter develops the ideas in several divergent directions: relativism, skepticism, pragmatism, and even a kind of absolutism. Which of these, if any, is the overall philosophical perspective is not easy to discern. The author of chapter 19, Da Sheng, Mastering Life, takes up the theme of the cultivation of the wisdom of embodied skill that is introduced in chapter 3, Yang Sheng Zhu, The Principle of Nurturing Life. The author of chapter 18, Zhi Le, Utmost Happiness, and chapter 22, Zhi Bei You, Knowledge Wanders North, continues the meditations on life and death, and the cultivation of meditative practice, that are explored in chapter 6, Da Zong Shi, The Vast Ancestral Teacher.

The next group of interpreters have also become incorporated into the extant version of the text. They are the philosophers inclined towards anarchist utopias, whom Graham identifies as a “Primitivist” and a school of “Yangists,” in chapters 8 to 11 and 28 to 31. These thinkers appear to have been profoundly influenced by the Laozi, and also by the thought of the first and last of the Inner Chapters: “Wandering Beyond,” and “Responding to Emperors and Kings.” There are also possible signs of influence from Yang Zhu, whose concern was to protect and cultivate one’s inner life-source. These chapters combine the anarchistic ideals of a simple life close to nature that can be found in the Laozi with the practices that lead to the cultivation and nurturing of life. The practice of the nurturing of life in chapter 3, which leads to the “lasting out of one’s natural years,” becomes an emphasis on maintaining and protecting xing ming zhi qing, “the essentials of nature and life’s command,” in these later chapters.

The third main group, whose interpretation has been preserved in the text itself, is the Syncretist school, an eclectic school whose aim is to promote an ideal of mystical rulership, influenced by the major philosophical schools of the time, especially those that recommend a cultivation of inner potency. They may or may not be exemplary of the so-called ‘Huang-Lao’ school. They scoured the earlier philosophers in order to extract what was valuable in their philosophies, the element of the dao that is to be found in each philosophical claim. In particular, they sought to combine the more ‘mystically’ inclined philosophies with the more practical ones to create a more complete dao. The last chapter, Tian Xia, The World, considers several philosophical schools, and comments on what is worthwhile in each of them. Zhuangzi’s philosophy is here characterized as “vast,” “vague,” “outrageous,” “extravagant,” and “reckless”; he is also recognized for his encompassing modes of thought and his lack of partisanship, and his recklessness is acknowledged to be harmless. Nevertheless, it is stated that he did not succeed in getting it all.

Perhaps the most important of the pre-Qin thinkers to comment on Zhuangzi is Xunzi. In his “Dispelling Obsessions” chapter, anticipating the eclecticism of the Huang-Lao commentators of chapter 33, he considers several philosophical schools, mentions the corner of ‘truth’ that each has recognized, and then goes on to criticize them for failing to understand the larger picture. Xunzi mentions Zhuangzi by name, describes him as a philosopher who recognizes the value of nature and of following the tendencies of nature, but who thereby fails to recognize the value of the human ‘ren’. Indeed, Zhuangzi seems to be aware of this kind of objection, and even delights in it. He revels in knowing that he is one who wanders off into the distance, far from human concerns, one who is not bound by the guidelines. Perhaps in doing so he corroborates Xunzi’s fears.

Another text that reveals what might be a development of Zhuangzi’s philosophy is the Liezi. This is a philosophical treatise that clearly stands in the same tradition as the Zhuangzi, dealing with many of the same issues, and on occasion with almost identical stories and discussions. Although the Daoist adept, Liezi, to whom the text is attributed is said to have lived before Zhuangzi, the text clearly dates from a later period, perhaps compiled as late as the Eastern Han, though in terms of linguistic style the material appears to date from around the same period as Zhuangzi. The Liezi continues the line of philosophical thinking of the Xiao Yao You, and the Qiu Shui, taking up the themes of transcending boundaries, and even cosmic realms, by spirit journeying. The leaving behind and overturning of human values is a theme that is repeated in this text, though again not without a certain paradoxical tension: after all, the purpose of such journeying and overturning of values is ultimately to enable us in some sense to live ‘better’ lives. While Zhuangzi’s own philosophy exerted a significant influence on the interpretation of Buddhism in China, the Liezi appears to provide a possible converse case of Mahayana Buddhist influence on the development of the ideas of Zhuangzi.

The Jin dynasty scholar, Guo Xiang, is one of the most influential of the early interpreters. His “relativistic” reading of the text has become the received interpretation, and his own distinctive style of philosophical thinking has in this way become almost inseparable from that of Zhuangzi. The task of interpreting Zhuangzi independently of Guo Xiang’s reading is not easy to accomplish. His contribution and interpretation have already been discussed in the body of the entry (see the sections above: The Zhuangzi text, and Chapter 2: Qi Wu Lun (Discussion on Smoothing Things Out)). The Sui dynasty scholar, Lu Deming, produced an invaluable glossary and philological commentary on the text, enabling later generations to benefit from his vast linguistic expertise. The Ming dynasty Buddhist poet and scholar, Han Shan, wrote a commentary on the Zhuangzi from a Chan Buddhist perspective. In a similar vein, the Qing dynasty scholar, Zhang Taiyan, constructed a masterful interpretation of the Zhuangzi in the light of Chinese Buddhist Idealism, or Weishilun. Guo Qingfan, a late Qing, early twentieth century scholar, collected and synthesized the work of previous generations of commentators. The scholarly work of Takeuchi Yoshio in Japan has also been of considerable influence. Qian Mu is a twentieth century scholar who has exerted considerable efforts with regard to historical scholarship. Currently, in Taiwan, Chen Guying is the leading scholar and interpreter of Zhuangzi, and he uses his knowledge of western philosophy, particularly western epistemology, cosmology, and metaphysics, to throw new light on this ancient text.

In the west, probably the most important and influential scholar was A. C. Graham, whose pioneering work on this text, and on the later Mohist Canon, has laid the groundwork and set an extraordinarily high standard for future western philosophical scholarship. Graham, following the reading of Guo Xiang, develops a relativistic reading based on a theory of the conventional nature of language. Chad Hansen is a current interpreter who sees the Daoists as largely theorists of language, and he interprets Zhuangzi’s own contribution as a form of “linguistic skepticism.” Recently, there has been a growth of interest in the aspects of Zhuangzi’s philosophy that resonate with the Hellenistic school of Skepticism. This was proposed by Paul Kjellberg, and has been pursued by other scholars such as Lisa Raphals.

5. References and Further Reading

  • Ames, Roger, ed. Wandering at Ease in the Zhuangzi. Albany: State University of New York Press, 1998.
  • Ames, Roger, and Takahiro Nakajima. Zhuangzi and the Happy Fish. Honolulu: University of Hawai`i Press, 2015.
  • Chai, David. Early Zhuangzi Commentaries: On the Sounds and Meanings of the Inner Chapters. Saarbrücken: VDM Publishing, 2008.
  • Chuang Tzu. Basic Writings. Translated by Burton Watson. New York: Columbia University Press, 1964.
  • Chuang Tzu. The Complete Works of Chuang Tzu. Translated by Burton Watson. New York: Columbia University Press, 1968.
  • Chuang Tzu. Chuang-Tzu The Inner Chapters: A Classic of Tao. Translated by A. C. Graham. London: Mandala, 1991.
  • Chuang Tzu. Chuang tzu. Translated by James Legge, Sacred Books of the East, volumes 39, 40. Oxford: Oxford University Press, 1891.
  • Cook, Scott. Hiding the World Within the World: Ten Uneven Discourses on Zhuangzi. Albany: State University of New York Press, 2003.
  • Coutinho, Steve. An Introduction to Daoist Philosophies. New York: Columbia University Press, 2014.
  • Coutinho, Steve. “Conceptual Analyses of the Zhuangzi”. Dao Companion to Daoist Philosophy. Springer, 2014.
  • Coutinho, Steve. “Zhuangzi”. Berkshire Dictionary of Chinese Biography, pp. 149-162. 2014.
  • Coutinho, Steve. Zhuangzi and Early Chinese Philosophy: Vagueness, Transformation, and Paradox. London: Ashgate Press, forthcoming, December, 2004.
  • Fung, Yu-Lan. Chuang-Tzu: A New Selected Translation with an Exposition of the Philosophy of Kuo Hsiang. 2nd ed. New York: Paragon Book Reprint Corporation, 1964.
  • Graham, Angus Charles. Later Mohist Logic, Ethics and Science. London: School of Oriental and African Studies, 1978.
  • Graham, Angus Charles. Disputers of the Tao: Philosophical Argument in Ancient China. La Salle: Open Court, 1989.
  • Graham, A. C. “Chuang-tzu’s Essay on Seeing things as Equal.” History of Religions 9 (1969/1970), pp. 137–159. Reproduced in Roth, 2003.
  • Graham, A. C. “Chuang-tzu: Textual Notes to a Partial Translation.” London: School of Oriental and African Studies, 1982. Reproduced in Roth, 2003.
  • Hansen, Chad. A Daoist Theory of Chinese Thought: A Philosophical Interpretation. New York: Oxford University Press, 1992.
  • Ivanhoe, P. J. & Paul Kjellberg, ed. Essays on Skepticism, Relativism, and Ethics in the Zhuangzi. Albany: State University of New York Press, 1996.
  • Kaltenmark, Max. Lao Tzu and Taoism. Translated by Roger Greaves. Stanford: Stanford University Press, 1969.
  • Kjellberg, Paul. Zhuangzi and Skepticism. PhD dissertation, Department of Philosophy, Stanford University, 1993.
  • Klein, Esther. (2010). Were there “Inner Chapters” in the Warring States? A new examination of evidence about the Zhuangzi. T’oung Pao, 4/5, pp. 299–369.
  • Kohn, Livia. Zhuangzi: Text and Context. Three Pines Press, 2014.
  • Kohn, Livia. New Visions of the Zhuangzi. Three Pines Press, 2015.
  • Lawton, Thomas, ed. New Perspectives on Chu Culture During the Eastern Zhou Period. Washington, D.C.: Smithsonian Institution, 1991.
  • Li, Xueqin. Eastern Zhou and Qin Civilizations. Translated by Kwang-chih Chang. New Haven: Yale University Press, 1985.
  • Liu, Xiaogan. Classifying the Zhuangzi Chapters. Translated by Donald Munro. Michigan Monographs in Chinese Studies, no. 65. Ann Arbor, Michigan: The University of Michigan, 1994.
  • Mair, Victor H., ed. Experimental Essays on Chuang-tzu. Honolulu: University of Hawaii Press, 1983.
  • Mair, Victor. ed. Chuang-tzu: Composition and Interpretation. Symposium issues, Journal of Chinese Religions 11, 1983.
  • Mair, Victor. Wandering on the Way: Early Taoist Tales and Parables of Chuang Tzu. New York: Bantam Books, 1994.
  • Maspero, Henri. Le Taoïsme. Vol. II, Mélanges Posthumes sur les Religions et l’histoire de la Chine. Paris: Civilisations du Sud S.A.E.P., 1950.
  • McCraw, David. Stratifying Zhuangzi: Rhyme and other quantitative evidence. Language and Linguistics Monograph Series, 41. Taipei, Taiwan: Institute of Linguistics, Academia Sinica, 2010.
  • Roth, Harold. “Who Compiled the Chuang-tzu?” in Chinese Texts and Philosophical Contexts. edited by Henry Rosemont. La Salle: Open Court, 1991.
  • Roth, Harold. A Companion to A. C. Graham’s Chuang Tzu: The Inner Chapters. Honolulu: University of Hawai’i Press, 2003.
  • Wang, Bo. Thinking Through the Inner Chapters. Three Pines Press, 2014.
  • Wu, Kuang-ming. The Butterfly as Companion: Meditations on the First Three Chapters of the Chuang Tzu. Albany: State University of New York Press, 1990.
  • Ziporyn, Brook. Zhuangzi: The Essential Writings: With Selections from Traditional Commentaries. Hackett, 2009.

Author Information

Steve Coutinho
Email: coutinho@muhlenberg.edu
Muhlenberg College
U. S. A.

Zhang Zai (Chang Tsai, 1020—1077)

Zhang Zai was one of the pioneers of the Song dynasty philosophical movement called “Study of the Way,” often known as Neo-Confucianism. One of the most distinctive features of many of these new ways of thought being formulated at the time was an increased interest in metaphysics, usually influenced by the Classic of Changes (Yijing). Zhang’s most significant contributions to Chinese philosophy were primarily in the area of metaphysics, where he came up with a new theory of qi that was very influential. He is also credited with differentiating original nature and physical nature, which was to become a key concept in the most prominent Song philosophers, the Cheng brothers and Zhu Xi (Chu Hsi). Ethically, his most influential doctrines were found in the brief essay “Western Inscription,” where he propounded the ideas of being one body with all things and universal caring. After his death, most of his disciples were absorbed into the Cheng brothers’ school and his thought became known primarily through the efforts of the Cheng brothers and Zhu Xi, who honored Zhang as one of the founders of the Study of the Way.

Table of Contents

  1. Life and Work
  2. Metaphysics
  3. Human Nature and Ethics
  4. Moral Education and the Heart
  5. References and Further Reading

1. Life and Work

Zhang Zai is also known as Zhang Hengqu, after the town where he grew up and later did much of his teaching. He was born in 1020 and died in 1077. As a youth he was interested in military affairs, but began studying the Confucian texts on the recommendation of an important official who was impressed with Zhang’s abilities. Like most of the Song philosophers, Zhang was initially dissatisfied with Confucian thought and studied Buddhism and Daoism for several years. Eventually, however, he decided that the Way was not to be found in Buddhism or Daoism and returned to Confucian texts. This acquaintance with the other major ways of thought was to have significant influence on Zhang’s own views. According to tradition, around 1056 Zhang sat on a tiger skin in the capital and lectured on the Classic of Changes. It may have been during this period that he first became acquainted with the Cheng brothers, who were actually his younger cousins. After passing the highest level of the civil service examinations, he held a series of minor government posts.

In 1069 Zhang was recommended to the emperor and given a position in the capital, but not long after he ran into conflict with the prime minister and retired home to Hengqu, where he spent his time in retirement studying and teaching. This was probably his most productive period for developing and spreading his own philosophy. In 1076 he completed his most important work, Correcting Ignorance, and presented it to his disciples. “Western Inscription” was originally part of this longer work. That same year he was summoned back to the capital and restored to an important position. However, in the winter he became ill and resigned again to try to convalesce at home. He never reached home, dying on the road in 1077. Zhang was awarded a posthumous title in 1220 and enshrined in the Confucian temple in 1241. Many of Zhang’s writings have been lost. Zhu Xi collected selections of Zhang’s writings in his anthology of Song Study of the Way known as Reflections on Things at Hand. His most important surviving works are probably his commentary on the Changes and Correcting Ignorance.

2. Metaphysics

Zhang Zai’s metaphysics is largely based on the Classic of Changes, especially one of the commentaries, “Appended Remarks,” traditionally attributed to Confucius. According to Zhang, all things of the world are composed of a primordial substance called qi. Qi is sometimes translated as “substance,” “matter,” or “material force,” but there is really no term in English that can capture its meaning for Zhang. Qi originally meant “breath” and is a very old concept in Chinese culture, particularly medicine. For Zhang, qi includes matter and the forces that govern interactions between matter, yin and yang. In its dispersed, rarefied state, qi is invisible and insubstantial, but when it condenses it becomes a solid or liquid and takes on new properties. All material things are composed of condensed qi: rocks, trees, even people. There is nothing that is not qi. Thus, in a real sense, everything has the same essence, an idea which has important ethical implications.

Zhang believed that qi is never created or destroyed; the same qi goes through a continuous process of condensation and dispersion. He compared it to water: water in liquid form or frozen into ice is still the same water. Similarly, condensed qi which forms things or dispersed qi is still the same substance. Condensation is the yin force of qi and dispersion is the yang force. In its wholly dispersed state, Zhang refers to qi as the Great Vacuity, a term he adopted from the Zhuangzi. He emphasized that though this qi is insubstantial, it still exists, and thus is very different from the Buddhist concept of emptiness. Whereas Buddhists argued that the fact that everything changes shows it has no essence and is unreal, Zhang argued that the very fact that it changes proves it is real. Everything that is real is composed of qi, and since qi always changes, anything real must change. Although the Great Vacuity always exists, the particular qi that is dispersed into the Great Vacuity at any time is not the same, which allows Zhang to assert both that qi always changes and the Great Vacuity always remains. There is no such thing as creation ex nihilo for Zhang, an idea he attributes to both Buddhists and Daoists.

Qi begins dispersed and undifferentiated in the Great Vacuity and through condensation forms material things. When these material things pass away, their qi disperses and rejoins the Great Vacuity to begin the process again. What looks like creation and destruction is just the never-ending movements of qi. These processes of condensation and dispersion have no outside cause; they are just part of the nature of qi. Zhang wholly naturalized the workings of qi and rejected any idea of an anthropomorphic Heaven that controlled things. While the Classic of Changes talked of the workings of ghosts and spirits, he reinterpreted these terms to mean the extending and receding of qi from and back to the Great Vacuity. It is all a naturally occurring process.

Unlike in the thought of later thinkers such as the Cheng brothers and Zhu Xi, the concept of pattern (li, also translated as “principle”) is not especially important in Zhang’s philosophy. While in the thought of Cheng Yi and Zhu Xi, pattern is a transcendental universal that exists outside of qi, Zhang denied there was anything outside of qi. He seems to use pattern to describe the actions of qi condensing and dispersing, and also the pattern that actions should fit in order to be moral. It certainly has none of the importance for Zhang that it did for some of his successors. Zhu Xi criticized Zhang for this, saying that qi was not enough to explain the workings of the universe without pattern as well.

3. Human Nature and Ethics

Zhang accepted Mencius’s belief that human nature is good, and his theory of qi allowed him to come up with what became the definitive Song answer to a classic problem in Mencius’ thought: if human nature is good, what makes people bad? Zhang’s solution involved positing two ways of looking at nature: the original nature and nature embodied in qi. Zhang claimed original nature exists forever in unchanging perfection, as opposed to material things which decay and die. This raises the question of what original nature consists of, since Zhang has claimed that everything is qi and qi always changes. He is not very clear on this point, but he apparently identified original nature with the undifferentiated qi of the Great Vacuity. When qi condenses to form human beings, each somehow retains some of the character of the unity of the Great Vacuity (or Great Harmony, as he sometimes calls it). This is the original nature, and that is what is good.

However, human beings also have a nature embodied in qi, which Zhang calls physical nature. Being ordinary qi, physical nature changes, eventually dissipating upon death. Zhang theorized that the physical nature obscures the original nature, preventing it from being fulfilled, and this is what causes people to stray from the path of goodness. At one point, he stated that if clear yang qi formed the greater part of physical nature one’s moral capacities would function, but if turbid yin qi dominated, material desires would hold sway. However, it is unclear whether he meant all yang qi was clear and all yin qi was turbid, and he often seems to attach no particular moral weight to whether qi is primarily yang (dispersed) or yin (condensed). As we are all different individuals, we all have slightly different physical natures. Some people are naturally bigger and stronger, some are more generous, some are wiser. This is all a result of the particular endowment of qi that makes up the individual, and since qi condenses into things without cause or direction, there is no reason an individual has the particular physical nature he starts out with: it is just a matter of chance. What is important in terms of moral cultivation is that there is also the potential to transform one’s physical nature and fulfill one’s original nature.

Zhang had a deep faith in the potential for human improvement. Like earlier Confucian thinkers such as Mencius and Xunzi, he believed that moral development was a matter of effort, not ability. In a departure from his metaphysical views, where he held that qi changes naturally with no particular rhyme or reason, he claimed that the human heart has the capacity to alter one’s own qi. One can change one’s physical nature in order to fulfill one’s original nature. If that were not possible, goodness would be a matter of chance, being born with the right kind of qi. Zhang said that only the qi of life span, which determines whether one dies young or lives to an old age, cannot be changed. This was Zhang’s attack on longevity-oriented Daoists, who taught techniques that promised to increase one’s life span or even confer immortality. Undoubtedly, part of the goal of Zhang’s theory of qi and physical nature was to refute Buddhist and Daoist teachings.

Many Song and Ming thinkers, such as Zhu Xi and Wang Yangming, identified desires as one of the main obstacles to moral development. Zhang Zai was no exception to this trend, which was also probably due to Buddhist influence. The issue of how to moderate or channel desires had been discussed in Chinese philosophy at least since Mencius and Xunzi, but while the earlier Confucian tradition had emphasized finding the proper outlet to express desires and not letting them entirely control one’s actions, eliminating desires entirely never seemed to be a real option. In Xunzi’s case, at least, he clearly denied it was possible to get rid of desires. Eliminating desires was a main focus of Buddhism, on the other hand, and this view of desires was adopted by many of these Study of the Way philosophers. These thinkers focused mainly on what we might call sensual desires. The desire to be a good person was naturally not a cause for concern, but desires for fine clothes, good food, and sex were seen as interfering with one’s original nature. Zhang used the term “material desires,” identifying them with physical nature, so they had to be overcome to return to one’s original nature. Desires somehow arise from the interaction of yin and yang that produces material objects, though Zhang is none too clear exactly what this process is. The fundamental point is that following one’s desires is giving in to physical nature and regressing farther and farther away from original goodness.

Overcoming the desires of physical nature, one progresses toward original nature, or the heavenly within, as Zhang also put it. In “Western Inscription” Zhang illustrated this ideal state. Putting aside selfishness, one comes to understand the essential unity of all things. All things are formed from the same qi, and ultimately we all share the same substance. This was to become Zhang’s most famous ethical doctrine, the idea of forming one body with all things. As Zhang wrote in “Western Inscription,” “That which fills the universe I regard as my body.” Everyone has Heaven and Earth as their father and mother, and thus all people are brothers and sisters. Caring for others is like caring for one’s own family. Zhang further wrote, “Even those who are tired, infirm, crippled, or sick; those who have no brothers or children, wives, or husbands, are all my brothers who are in distress and have no one to turn to.” Though there are some precedents for this idea of brotherhood in earlier Confucianism, it sounds much more like the great compassion of Buddhism or the Mohist idea of universal caring; Zhang even uses the same term (jian’ai). In response to a question about this apparent slide into Mohism, Cheng Yi admitted that “Western Inscription” went a little too far, but still defended it as going beyond what previous sages had discussed and being as meritorious as Mencius’ idea of the goodness of human nature. Later thinkers recognized “Western Inscription” as Zhang’s greatest contribution to the Study of the Way.

4. Moral Education and the Heart

Presaging Zhu Xi, Zhang emphasized the role of education in moral development. Education was the way one transformed one’s qi and overcame physical nature. Following earlier philosophers such as Confucius and Xunzi, Zhang insisted that learning should always be directed toward moral cultivation, which in his case meant returning to one’s original nature. Knowledge was not important for its own sake, but for its contributions to moral character. Despite this, Zhang’s own interests were fairly wide-ranging, and he was especially interested in observing and explaining natural phenomena such as the movements of the stars and planets. Nevertheless, he tended not to emphasize this kind of scientific study in his writings on education, which focused on ritual and the classical Confucian texts. Compared with his contemporaries, Zhang placed more importance on the study of ritual. He believed ritual derived from original nature, and following it helps one hold onto original nature and overcome the obstructions of physical nature. Zhang’s interest in the Classic of Changes has already been mentioned, and he also recommended studying the other Confucian classics, the Analects, and Mencius. In contrast to some later Study of the Way philosophers, he did not put a lot of weight on histories, considering them inferior to the classics for helping people transform their qi.

Though Zhang recommended reciting and memorizing these books, he still believed that books were a means to returning to one’s original nature, not an end in themselves. Books functioned like a set of directions: they could tell you how to get to the destination, but they should not be confused with the destination. He felt close reading and textual criticism were not necessary, and getting too caught up in the meaning of a word or sentence could detract from understanding the overall meaning. And even in the classics, not everything should be accepted. Zhang recalled Mencius’ criticism of literal readings of the Classic of Documents and pointed out the necessity for understanding the classics in light of one’s own sense of what is right. This seems to set up a paradox: a student needs to study the classics to return to his original nature and know what is right, but he needs to know what is right to properly understand the classics.

Zhang resolved this contradiction by positing an innate moral sense in everyone that he called “this heart,” a term he apparently adopted from the Mencius. “This heart” presumably belongs to the original nature, and is still present even when embodied in qi, but it can be obstructed and blocked by the physical nature. Zhang referred to this situation as the problem of the “fixed heart” blocking “this heart.” The fixed heart means having intentions, certainty, inflexibility, and egotism. Under these conditions, “this heart” will not function properly and one will have difficulty understanding the classics. The learner must get rid of the fixed heart to let “this heart” free. At times, Zhang suggests that reading books itself helps preserve “this heart,” and it is this heart itself that understands the Way. Ritual is perhaps more important than books. Zhang once suggested that even the illiterate could still develop “this heart,” but apparently ritual was indispensable in overcoming the fixed heart.

Zhang also talked of “expanding the heart” and “making the heart vast.” Both these phrases mean eliminating the obstructions of the fixed heart and putting the heart in a state where it is ready to understand. He tended to value knowledge apprehended directly through the heart over knowledge from sense perception. Zhang did not deny the validity of empirical knowledge, but he believed its scope was limited. Knowledge gained from sense perception is just knowledge of things, not knowledge of the Way. Knowledge of the Way is knowledge gained through the virtuous nature, not through sense perception. “Knowledge gained through the virtuous nature” is another way of saying knowledge apprehended directly by the heart, though Zhang seems to be talking more about a kind of mystic experience than rationalism: he wrote that understanding of the Way is not something thought and consideration can bring about.

The goal of moral cultivation was fulfilling one’s original nature. This was Zhang Zai’s definition of becoming a sage, the term in Chinese philosophy for a perfected person. Another term common in philosophical discourse of the time was integrity or authenticity (cheng). Integrity figured in some important passages in the Doctrine of the Mean, which was one of the most important Confucian texts in Song Study of the Way. Zhang emphasized “integrity resulting from clarity,” which he explained as first coming to understanding through study and inquiry and then fulfilling one’s nature. This could be a long and difficult process, but if one could persist and make the necessary effort, one could fulfill one’s nature and become a sage. There was no greater goal for Zhang.

5. References and Further Reading

Very little is available in English on Zhang Zai. The reader is encouraged to look into general histories of Chinese philosophy, especially those dealing with neo-Confucianism, in addition to the works listed here.

  • Chan, Wing-tsit. A Sourcebook in Chinese Philosophy. Princeton: Princeton University Press, 1963.
    • Translates a selection of Zhang’s works, focusing on Correcting Ignorance.
  • Chan, Wing-tsit, trans. Reflections on Things at Hand: The Neo-Confucian Anthology Compiled by Chu Hsi and Lü Tsu-chien. New York: Columbia University Press, 1967.
    • This probably contains the most extensive collection of Zhang’s writings in English. Chan includes a finding list to help the reader find the selections of a particular philosopher.
  • Chow, Kai-wing. “Ritual, Cosmology, and Ontology: Chang Tsai’s Moral Philosophy.” Philosophy East and West 43.2 (April 1993): 201-28.
    • Emphasizes the importance of ritual in moral development.
  • Huang, Siu-chi. “Chang Tsai’s Concept of Ch’i.” Philosophy East and West 18.4 (October 1968): 247-60.
  • Huang, Siu-chi. “The Moral Point of View of Chang Tsai.” Philosophy East and West 21.2 (April 1971): 141-56.
  • Kasoff, Ira. The Thought of Chang Tsai. Cambridge: Cambridge University Press, 1984.
    • This is the only English-language monograph on Zhang’s philosophy.
  • T’ang, Chün-i. “Chang Tsai’s Theory of Mind and Its Metaphysical Basis.” Philosophy East and West 6.2 (July 1956): 113-36.

Author Information

David Elstein
Email: davidelstein@world.oberlin.edu
State University of New York at New Paltz
U. S. A.

Xunzi (Hsün Tzu, c. 310—c. 220 B.C.E.)

Xunzi, along with Confucius and Mencius, was one of the three great early architects of Confucian philosophy. In many ways, he offers a more complete and sophisticated defense of Confucianism than Mencius. Xunzi lived toward the end of the Warring States period (453-221 BCE), generally regarded as the formative era for most later Chinese philosophy. It was a time of great variety of thought, comparable to classical Greece, so Xunzi was acquainted with many competing ideas. In reaction to some of the other thinkers of the time, he articulated a systematic version of Confucianism that encompasses ethics, metaphysics, political theory, philosophy of language, and a highly developed philosophy of education. Xunzi is known for his belief that ritual is crucial for reforming humanity’s original nature. Human nature lacks an innate moral compass, and left to itself falls into contention and disorder, which is why Xunzi characterizes human nature as bad. Ritual is thus an integral part of a stable society. He focused on humanity’s part in creating the roles and practices of an orderly society, and gave a much smaller role to Heaven or Nature as a source of order or morality than most other thinkers of the time. Although his thought was later considered to be outside of Confucian orthodoxy, it was still very influential in China and remains a source of interest today. (See Romanization systems for Chinese terms.)

Table of Contents

  1. Life and Work
  2. The Way and Heaven
  3. Human Nature, Education, and the Ethical Ideal
    1. Human Nature
    2. Education
    3. The Ethical Ideal
    4. Discovering the Way
    5. The Heart
  4. Logic and Language
  5. Social and Political Thought
    1. Government structure
    2. Ritual and Music
    3. Moral Power
  6. References and Further Reading

1. Life and Work

Xunzi (“Master Xun”) is the common appellation for the philosopher whose full name was Xun Kuang. He is also known as Xun Qing, “Minister Xun,” after an office he held. He was born in the state of Zhao in north-central China around 310 BCE. As a young man he studied in the state of Qi in the northeast, which had the greatest concentration of philosophers of the age. Xunzi’s writings show him to be well acquainted with all the doctrines current at the time, which he probably came in contact with during this period of his life. Leaving Qi, he traveled to many of the other states that made up China at the time, and was briefly employed by some of them. His last post ended when his patron was assassinated in 238 BCE, ending his chances to put his theories of government into practice. Xunzi may have lived to see China unified by the authoritarian state of Qin in 221 BCE. If so, he certainly must have been disappointed that two of his former students, Li Si and Han Feizi, helped counsel Qin to victory when the Qin government was steadfastly opposed to Xunzi’s ideas of government through moral power. The Qin dynasty was long remembered as a time of strict laws and draconian punishments, and Xunzi’s association with two of its architects probably was one factor in the later marginalization of his thought.

Like most philosophical works of the time, the Xunzi that we have today is a later compilation of writings associated with him, not all of which were necessarily written by Xunzi himself. The current version of the Xunzi is divided into thirty-two books, about twenty-five of which are considered mostly or wholly authentic and others of which are considered representative of his thought, if not his actual writings. This is probably the largest collection of early Chinese philosophical writings that can be plausibly attributed to one author. The Xunzi is also notable for its style. Comparatively little of it is written in the dialogue format of works like the Mencius, and there are none of the fanciful parables of the Zhuangzi. Most books normally attributed to Xunzi are sustained essays on one topic that appear to have been written as more or less unified pieces, though there are often sections of verse and two books that are merely compilations of poetry. In these writings, Xunzi carefully defines his own position and raises objections to rival thinkers in a way that renders his work more recognizable as philosophy than that of many other early Chinese thinkers.

2. The Way and Heaven

The most important concept in Xunzi’s philosophy is the Way (dao). This is one of the most common terms of Chinese philosophy, though all thinkers define it somewhat differently. Though the term originally referred to a road or path, it became extended to a way of doing things, a way of acting, or as it was used in philosophy, the right way to live. In Xunzi’s case, he means the human way, the way of good government and the proper way of behaving, not the Way of Heaven or Nature as Laozi and Zhuangzi define it, and as Mencius often suggests. In fact, Xunzi is notable for having probably the most rationalistic view of Heaven and the supernatural in the early period. Xunzi claims that the Way was first pointed out by particularly wise and gifted people he calls sages (a common term for an exemplar in early Chinese thought), and following the Way as it has been handed down from the past will result in a stable, prosperous, peaceful society, while going against it will have the opposite results. While certain aspects of the Way, such as particular rituals, are certainly created by humanity, whether the Way as a whole is created or discovered remains a matter of scholarly debate.

Unlike many other early philosophers, Xunzi does not believe Heaven gets involved in human affairs. Heaven was sometimes considered to be an anthropomorphic god, sometimes an impersonal force that automatically rewarded the good and punished the bad, but in Xunzi’s view Heaven is much like Nature: it acts as it always does, neither helping the good nor harming the bad. The Way is not the Way because Heaven approves of it; it is the Way because it is good for people. In the chapter “Discourse on Heaven” (chapter 17, also translated as “Discourse on Nature”), Xunzi devotes himself to refuting these other views of Heaven, most prominently that of the Mohists. Heaven does not reward good kings with peace and prosperity, nor punish tyrants by having them deposed. These results come about through their own good or bad decisions. Having a good harvest and sufficient food is not a sign of Heaven’s favor; it is the result of wise agricultural policy. Similarly, events like eclipses and floods are not signs of Heaven’s displeasure: they are simply things that sometimes happen. One might wonder at them as unusual occurrences, but it is not right to be afraid of them or consider them ominous. Worrying about Heaven’s favor is a waste of time; it is better to be prepared for whatever might happen. There will be some natural disasters, but if one is prepared they will not cause harm.

Interestingly, though Xunzi has this rational view of Nature, which extends to spirits and gods as well, he never suggests eliminating religious rituals that are directed toward them, such as sacrifices and divination. One must perform them as part of the ritual system that binds society together, but one does not perform them expecting any results. In “Discourse on Heaven,” Xunzi wrote, “You pray for rain and it rains. Why? For no particular reason, I say. It is just as though you had not prayed for rain and it rained anyway.” Rain that follows a prayer is no different from rain that falls when no one has prayed. Yet during a drought, officials must still pray for rain, not because it has any effect on the natural world, but because of its effect on people. What Xunzi believes ritual does will be examined later.

In Xunzi’s view, the best thing to do is understand what Nature does and what humanity does, and concentrate on the latter. Not only is it wrong to believe that Heaven intervenes in human affairs, it is useless to speculate about why Nature is the way it is or to try to help it along. Xunzi is interested in practical knowledge, and speculation about Nature is not useful. In this respect, he could be considered anti-metaphysical, since he has no interest in how the world works or what it is. His concern is what people should do, and anything that might confuse or detract from that is a waste of time. We know that Nature is invariable, and we know the Way to get what we need from Nature to live, and that is all we need to know. This kind of division between knowledge of the human world and knowledge of Heaven may have been partially influenced by Zhuangzi, but while Zhuangzi considers knowing Heaven to be important, Xunzi does not.

3. Human Nature, Education, and the Ethical Ideal

a. Human Nature

As Mencius is known for the slogan “human nature is good,” Xunzi is known for its opposite, “human nature is bad.” Mencius viewed self-cultivation as developing natural tendencies within us. Xunzi believes that our natural tendencies lead to conflict and disorder, and what we need to do is radically reform them, not develop them. Both shared an optimism about human perfectibility, but they viewed the process quite differently. Xunzi envisioned that humanity was once in a state of nature reminiscent of Hobbes. Without study of the Way, people’s desires will run rampant, and they will inevitably find themselves in conflict in trying to satisfy their desires. Left to themselves, people will fall into disorder, poverty and conflict, living a life that would be, as Hobbes put it, “solitary, poor, nasty, brutish, and short.” It was this insistence that human nature is bad that was most often condemned by later thinkers, who rejected Xunzi’s view in favor of the idea, traced to Mencius, that people are naturally good.

Xunzi offers several arguments against Mencius’s position. He defines human nature as what is inborn and does not need to be learned. He argues that if people were good by nature, there would be no need for ritual and social norms. The sages would not have had to create them, and they would not need to have been handed down through the generations. They were created precisely because people do not act in accordance with them naturally. He also notes that people desire the good, and on the principle that one desires what one doesn’t already have, this shows that people are not good. He gives several illustrations of what life is like in the state of nature, without any education on ritual and morality. Xunzi does not believe that people are evil, that they deliberately violate the rules of morality, taking a perverse pleasure in doing so. They have no natural conception of morality at all: they are morally blind by nature. Their desires bring them into conflict because they don’t know any better, not because they enjoy conflict. In fact, Xunzi believes people do not enjoy it at all, which is why they desire the kind of life that results from good order brought about through the rituals of the sages.

Like Mencius, Xunzi believed human nature is the same in everyone: no one starts off with moral principles. The original nature of Yao (a legendary sage king) and Jie (a legendary tyrant) was the same. The difference was in how they cultivated themselves. Yao reformed his original nature; Jie did not. In this way, Xunzi emphasizes the essential perfectibility of everyone. Human nature is bad, but it is not incorrigible, and in fact Xunzi was rather optimistic about the possibility of overcoming the demands of desires that result in the state of nature. Though Confucius suggests that some people are better off by nature than others, Mencius and Xunzi seem to agree that everyone starts out the same, though they differ on the content of that original state. Though Xunzi believes that it is always possible to reform oneself, he recognizes that in reality this will not always happen. In most cases, the individual himself has to make the first step in attempting to reform, and Xunzi is rather pessimistic about people actually doing this. They cannot be forced to do so, and they may in practice be unable to make the choice to improve, but for Xunzi, this does not mean that in principle it is impossible for them to change.

b. Education

Like Confucius and Mencius, Xunzi is much more concerned with what kind of person to be than with rules of moral behavior or duty, and in this respect his view is similar to Western virtue ethics. The goal of Xunzi’s ethics is to become a person who knows and acts according to the Way as if it were second nature. Because human nature is bad, Xunzi emphasizes the importance of study to learn the Way. He compares the process of reforming one’s nature to making a pot out of clay or straightening wood with a press-frame. Without the potter, the clay would never become a pot on its own. Similarly, people will not be able to reform their nature without a teacher showing them what to do. Xunzi’s concern is primarily moral education; he wants people to develop into good people, not people who know a lot of facts. He emphasizes the transformative aspect of education, where it changes one’s basic nature. Xunzi laid out a program of study based on the works of the sages of the past that would teach proper ritual behavior and develop moral principles. He was the first to offer an organized Confucian curriculum, and his curriculum became the blueprint for traditional education in China until the modern period.

Practice was an important aspect of Xunzi’s course of education. A student did not simply study ritual, he performed it. Xunzi recognized that this performative aspect was crucial to the goal of transforming one’s nature. It was only through practice that one could realize the beauty of ritual, ideally coming to appreciate it for itself. Though this was the end of education, Xunzi appealed to more utilitarian motives to start the student on the program of study. As noted above, he discussed how desires would inevitably be frustrated in the state of nature. Organizing society through ritual was the only way people could ever satisfy even some of their desires, and study of ritual was the best way to achieve satisfaction on a personal level. Through study and practice, one could learn to appreciate ritual for its own sake, not just as a means to satisfy desires. Ritual has this power to transform someone’s motives and character. The beginning student of ritual is like a child learning to play the piano. Maybe she doesn’t enjoy playing the piano at first, but her parents take her out for ice cream after each lesson, so she goes along with it because she gets what she wants. After years of study and practice, she might learn to appreciate playing the piano for its own sake, and will practice even without any reward. This is what Xunzi imagines will happen to the dedicated student of ritual: he starts out studying ritual as a means, but it becomes an end in itself as part of the Way.

c. The Ethical Ideal

Xunzi often distinguishes three stages of progress in study: the scholar, the gentleman, and the sage, though sometimes the sage and the gentleman seem to be equivalent for him. These were all terms in common use in philosophical discourse of the time, especially in Confucian thought, but Xunzi gives them a unique twist. He describes the achievements of each stage slightly differently in several places, but what he seems to mean is that a scholar is someone who has taken the first step of wishing to study the Way of the ancient sages and adopts them as the model for correct conduct; the gentleman has acquired a good deal of learning, but still must think about what the right thing to do is in a situation; and the sage has wholly internalized the principles of ritual and morality so that his action flows spontaneously without the need for thought, yet never goes beyond the bounds of what is proper. Using the piano analogy, the scholar has made up his mind to study the piano and is practicing basic scales. The gentleman is fairly skilled, but still needs to look at the music in front of him to know what to play. The sage is like a concert pianist who not only plays with perfect technique, but also adds his own style and unique interpretation of the music, accomplishing all this without ever consciously thinking about what notes to play. As the pianist is still playing someone else’s music, the sage does not make up new standards of conduct; he still follows the Way, but he makes it his own. Yet even then, at this highest stage, Xunzi believes there is still room for learning. Study is a lifelong process that only ends at death, much as concert pianists must still practice to maintain their skills.

The teacher plays an extremely important role in the course of study. A good teacher does not simply know the rituals; he embodies them and practices them in his own life. Just as one would not learn piano from someone who had just read a book on piano pedagogy but never touched an actual instrument, one should not study from someone who has only learned texts. A teacher is not just a source of information; he is a model for the student to look up to and a source of inspiration of what to become. A teacher who does not live up to the Way of the sages in his own life is no teacher at all. Xunzi believes there is no better method of study than learning from such a teacher. In this way, the student has a model before him of how to live ritual principles, so his learning does not become simple accumulation of facts. In the event that such a teacher is unavailable, the next best method is to honor ritual principles sincerely, trying to embody them in oneself. Without either of these methods, Xunzi believes learning degenerates into memorizing a jumble of facts with no impact on one’s conduct.

d. Discovering the Way

Given Xunzi’s insistence on the importance of teachers to transmit the Way of the sages of the past and his belief that people are all bad by nature, he must face the question of how the first sages discovered the Way. Xunzi uses the metaphor of a river ford for the true Way: without the people who have gone before to leave markers, those coming after would have no way of knowing where the deep places are, and they would be in danger of drowning. The question is, how did the first people get across safely, when there were no markers? Xunzi does not address the question in precisely this way, but we can piece together an answer from his writings.

Examining the analogies Xunzi uses is instructive here. He talks about cultivating moral principles as a process of crafting, using the metaphors of a potter shaping and firing clay into a pot, or using a press-frame to straighten a bent piece of wood. Just as the skill of making pottery was undoubtedly accumulated through generations of refining, Xunzi appears to think that the Way of the sages was also a product of generations of development. According to Xunzi’s definition of human nature, no one could say people know how to make pots by nature: this is not something we can do without study and practice, like walking and talking are. Nevertheless, some people, through a combination of perseverance, talent, and luck, were able to discover how to make pots, and then taught that skill to others. Similarly, through generations of observing humanity and trying different ways of regulating society, the sages hit upon the correct Way, the best way to order society in Xunzi’s view. David Nivison has suggested that different sages of the past contributed different aspects of the Way: some discovered agriculture, some discovered fire, some discovered the principles of filiality and respect between husband and wife, and so on.

Xunzi views these achievements as products of the sage’s acquired nature, not his original nature. This is another way of saying these are not products of people’s natural tendencies, but the results of study and experimentation. Accumulation of effort is an important concept for Xunzi. The Way of the sages was created through accumulation of learning what worked and benefited society. The sages built on the accomplishments of previous sages, added their own contributions, and now Xunzi believes the process is basically complete: we know the ritual principles that will produce a harmonious society. Trying to govern or become a moral person without studying the sages of the past is essentially trying to re-invent the wheel, or discover how to make pots on one’s own without learning from a potter. It is conceivable (though Xunzi is very skeptical about anyone actually being able to do it), but it is much more difficult and time-consuming, when all one has to do is study what has already been created.

e. The Heart

In addition to having a teacher, a critical requirement for study is having the proper frame of mind, or more precisely, heart, since early Chinese thought considered cognition to be located in the heart. Xunzi’s philosophy of the heart draws from other contemporary views as well as Confucian philosophy. Like Mencius, Xunzi believed that the heart should be the lord of the body, and using the heart to direct desires and decide on right and wrong accords with the Way. However, like Zhuangzi, Xunzi emphasizes that the heart must be tranquil and concentrated to be able to learn. In the view of the heart basically shared by Xunzi and Mencius, desires are not wholly voluntary. Desires are part of human nature, and can be activated without our necessarily being conscious of them. The function of the heart is to regulate the sense faculties and parts of the body, so that though one may have desires, the heart only acts on those desires when it is right to do so. The heart controls itself and directs the other parts of the body. This ability of the heart is what allows humanity to create ritual and moral principles and escape the state of nature.

In the chapter “Dispelling Blindness” Xunzi discusses the right way to develop the heart to avoid falling into error. For study, the heart needs to be trained to be receptive, focused, and calm. These qualities of the heart allow it to know the Way, and knowing the Way, the heart can realize the benefits of the Way and practice it. This receptivity Xunzi calls emptiness, meaning the ability of the heart to continually store new information without becoming full. Focus is called unity, by which Xunzi means the ability to be aware of two aspects of a thing or situation without allowing them to interfere with each other. “Being of two hearts” was a common problem in Chinese philosophical writings: it could mean being confused or perplexed about something, as well as what we would call being two-faced. Xunzi addresses the first aspect with his discussion of unity, a focus that keeps the heart directed and free from perplexity. The final quality the heart needs is stillness, the quality of moving freely from task to task without disorder, remaining unperturbed while processing new information. A heart that has the qualities of emptiness, unity, and stillness can understand the Way. Without these qualities, the heart is liable to fall into various kinds of “blindness” or obsessions that Xunzi attributes to his philosophical rivals. Their hearts focus too much on just one aspect of the Way, so they are unable to see the big picture. They become obsessed with this one part and mistake it for the entirety of the Way. Only with the proper attitudes and control of one’s heart can one perceive and grasp the Way as a whole.

4. Logic and Language

One subject that was certainly not part of Xunzi’s program of study is logic. Other philosophers, particularly the Mohist school, were developing sophisticated views on logic and the principles of argumentation around Xunzi’s time, and other thinkers were known for their paradoxes that played with language to show its limits. Though Xunzi was undoubtedly influenced by the principles of argument developed by the Mohists, he had no patience for the dialectical games and disputation for its own sake that were popular at the time. According to one story, a philosopher, having just convinced a king through his arguments, then took the other side and persuaded the king that his earlier arguments were wrong. Such exercises in argument and rhetoric were a waste of time for Xunzi; the only correct use of argument was to convince someone of the truth. Even the work of trying to distinguish logical categories was not productive in his view. According to Xunzi, such work can accomplish something, but it is still not the province of the gentleman, much as wondering about the workings of nature is not the gentleman’s concern, either. The only proper object of study is the Way of the sages; anything else is at best useless and at worst detrimental to the Way.

Despite his professed lack of interest in logic, Xunzi came up with the most detailed philosophy of language in early Confucian thought. Again, however, his primary concern was preserving the Way in the face of attacks, which in Xunzi’s view included questions about the nature of language that were arising at the time. He defended a modified conventionalism concerning language: names were not intrinsically appropriate for the objects they referred to, but once usage was determined by convention, to depart from it is wrong. It would be a mistake to think of Xunzi’s view as a kind of nominalism, however, since he is very clear that there is an objective reality that names refer to. The particular phonemes used to make the word “cat” in a language are conventionally determined, but the fact that a cat is a kind of feline is real. One of the fundamental principles of Confucianism was that the reality must match the name. Confucian thinkers were most concerned about the names of social roles: a father must act like a father should, a ruler must act like a ruler should. Not fulfilling the demands of one’s role means that one does not deserve the title, hence Mencius defined the removal of a tyrant as the killing of a commoner, not regicide. Xunzi defended this view, yet he objected to the Mohists, who claimed that a robber is not a person, so that killing a robber is not killing a person. This kind of usage violated the principles of correct naming and departed from the Way, though Xunzi is not entirely clear why. In Xunzi’s view, the reality represented by a name is objective, even if the name is merely conventional. Because of the objectivity of the referent, he distinguishes appropriate (following convention) and inappropriate (violating convention) uses of names. In addition, he believes there are good and bad names. Good names are simple and direct and readily bring the referent to mind. Using names in a way that the referents are clear is using names correctly. The chief function of language is to communicate, and anything that interferes with communication, such as the word games and paradoxes of other philosophers, should be eliminated.

5. Social and Political Thought

a. Government structure

The Warring States period, during which Xunzi lived, was a time of great social change and instability. As the name implies, it was a period of disunity, when several different states were warring with each other to determine who would gain control of all of China and found a new dynasty. Under the pressure of competition, the old ways and political systems were being abandoned in the search for greater control over human and material resources and increased military power. The central question for most philosophers of the time was how to respond to this time of instability and achieve a greater measure of order and safety. For the Confucian philosophers, the answer was found in a revival of the ways of the past, and for Xunzi in particular, the most important aspect of that was the ritual system. In this sense, the ethical and political aspects of Xunzi’s philosophy are the core areas, and in fact were not sharply distinguished in most Confucian thought. Metaphysics and philosophy of language serve to further the goal of restoring social stability.

All of the Warring States philosophers assumed that the government should be a monarchy. The king was the ultimate authority in all areas of government, having full power to hire and dismiss (and execute) any other government official. There was no idea of democracy in early China. The ruler could lose his state through failing in his duties as a sovereign, but he could not be replaced at the whim of the people. The political thinkers of the time instead tried to impose checks through tradition and thought, rather than law. The Mohists made Heaven the watchdog over the ruler: if a ruler offended Heaven by mistreating the people, Heaven would have him removed through war or revolt. The Confucians also emphasized the duties of the ruler to the people, though in Xunzi’s case there was no personified Heaven watching over things. One of the functions of ritual was to try to put limits on the power of the ruler and emphasize his obligation to the people. Confucian thinkers, including Xunzi, often viewed the state as a family. Just as a father must take care of his children, the ruler must take care of the people, and in return, the people will respond with loyalty. The Confucians also offered a very practical motive to care for the people: if the people were dissatisfied with the ruler, they would not fight on his behalf, and the state would be ripe for annexation by its neighbors.

b. Ritual and Music

Xunzi diagnosed the main cause of disorder as a breakdown of the social hierarchy. When hierarchical distinctions are confused and people do not follow their proper roles, they compete indiscriminately to satisfy their desires. The way to put limits on this competition is to clarify social distinctions, such as those between ruler and subject, between older brother and younger brother, or between men and women. When everyone knows their place and what obligations and privileges they have, they will not contend for goods beyond their status. Not only will this result in order and stability, it will actually allow for greater satisfaction of everyone’s desires than the competition of the state of nature. This is the primary purpose of ritual: to clarify and enforce social distinctions, which will bring an end to contention for limited resources and improve social order. This, in turn, will ensure greater prosperity. The ritual tradition not only emphasized reciprocal obligations between people of different status, it had extremely precise regulations concerning who was allowed to own what kind of luxuries. There were rules concerning what colors of clothing different people could wear, who was allowed to ride in carriages, and what grave goods they could be buried with. The point of all these rules is to enforce the distinctions necessary for social harmony and prevent people from reaching beyond their station.

Without the benefit of ritual principles to enforce the social hierarchy, the identity of human nature makes conflict inevitable. By nature we all desire the same things: fine food, beautiful clothing, wealth, and comfort. Xunzi believes desires are inevitable. When most people see something beautiful, they will desire it: only the sage can control his desires. Because of limited resources, it is impossible for everyone to satisfy their desires for material goods. What people can do is decide whether to act on a desire or not. Ritual teaches people to channel, moderate, and in some cases transform their desires so they can satisfy them in appropriate ways. When it is right to do so one satisfies them, and when that is not possible one moderates them. This allows both the partial satisfaction of desires and the maintenance of social harmony. All of this is made possible by the ritual principles of the Way, when the alternative is the chaos of the state of nature. Hence, Xunzi wrote that Confucian teachings allow people to satisfy both the demands of ritual and their desires, when the alternative is satisfying neither.

Another important part of governing is music. The ancient Chinese believed that music was the most direct and effective way of influencing the emotions. Hence, only allowing the correct music to be played was crucial to governing the state. The right kinds of music, those attributed to the ancient sages, could both give people an outlet for emotions that could not be satisfied in other ways, like aggression, and channel their emotions and bring them in line with the Way. The wrong kind of music would instead encourage wanton, destructive behavior and cause a breakdown of social order. Because of its powerful effect on the emotions, music is as important a tool as ritual in moral education and in governing. Much as Plato suggested in the Republic, Xunzi believes regulating music is one of the duties of the state. It must promulgate the correct music to give people a legitimate source of emotional expression and ban unorthodox music to prevent it from upsetting the balance of society.

c. Moral Power

As he does with virtuous people, Xunzi distinguishes different levels of rulers. The lowest is the ruler who relies on military power to expand his territory, taxes excessively without regard for whether his people have enough to sustain themselves, and keeps them in line with laws and punishments. According to Xunzi, such a ruler is sure to come to a bad end. A ruler who governs efficiently, does not tax the people too harshly, gathers people of ability around him, and makes allies of the neighboring states can become a hegemon. The institution of the hegemon existed briefly about three hundred years before Xunzi’s time, but he often uses the term to connote an effective ruler who is still short of the highest level. The highest level is that of the true king who wins the hearts of the people through his rule by ritual principles. The moral power of the true king is so great that he can unify the whole country without a single battle, since the people will come to him of their own accord to live under his beneficent rule. According to Xunzi, this is how the sage kings of the past were able to unify the country even though they began as rulers of small states. The best kind of government is government through the moral power acquired by following the Way.

This concept of moral power was quite old in China even in Xunzi’s time, though initially it referred to the power gained from the spirits through sacrifice. Beginning with Confucius, it became ethicized into a kind of power or charisma that anyone who cultivated virtue and followed the Way developed. Through this moral power, a king could rule effectively without having to personally attend to the day-to-day business of governing. Following his example, the people would become virtuous as well, so crime would be minimal, and the ruler’s subordinates could carry out the necessary administrative tasks to run the state. In Confucian thought, the most important role of the ruler is that of moral example, which is why the best government was that of a sage who followed the ritual principles of the Way. Confucius seemed to believe that the moral power of a sage king would render laws and punishments completely unnecessary: the people would be transformed by the ruler’s moral power and never transgress the boundaries of what is right. Xunzi, while still believing in the efficacy of rule through moral force, is not quite as optimistic, which is likely related to his view on human nature. He thinks punishments will still be necessary because some people will break the law, but a sage king will only rarely need to employ punishments to keep the people in line, while a hegemon or ordinary ruler will have to resort to them much more. This increased acceptance of the necessity for punishments may have influenced Xunzi’s student Han Feizi, to whom is attributed the most developed theory of government through a strict system of rewards and punishments that was employed by the short-lived Qin dynasty.

6. References and Further Reading

  • Cua, Antonio S. Ethical Argumentation: A Study in Hsün Tzu’s Moral Epistemology. Honolulu: University of Hawaii Press, 1985.
  • Dubs, Homer H. Hsüntze: Moulder of Ancient Confucianism. London: Arthur Probsthain, 1927. The first English-language monograph on Xunzi’s thought.
  • Goldin, Paul. Rituals of the Way. Chicago: Open Court, 1999. A good overview of the essentials of Xunzi’s thought.
  • Ivanhoe, Philip J. Confucian Moral Self Cultivation. Indianapolis: Hackett, 2000. An introduction to Confucian thought, focusing on the theme of self cultivation. Includes a chapter on Xunzi.
  • Kline, T.C. III and Philip J. Ivanhoe, eds. Virtue, Nature, and Moral Agency in the Xunzi. Indianapolis: Hackett, 2000. An excellent anthology bringing together much of the recent important work on Xunzi. The bibliography includes virtually every English publication related to Xunzi.
  • Knoblock, John, trans. Xunzi: A Translation and Study of the Complete Works, 3 vols. Stanford: Stanford University Press, 1988, 1990, 1994. The only complete English translation of the Xunzi, with extensive introductory material.
  • Machle, Edward. Nature and Heaven in the Xunzi: A Study of the Tian Lun. Albany: SUNY Press, 1993. A translation and study of chapter seventeen, “Discourse on Heaven.”
  • Watson, Burton, trans. Hsün Tzu: Basic Writings. New York: Columbia University Press, 1964. An excerpted translation, including many of the more philosophically interesting chapters. It is easier for non-specialists than Knoblock.

Author Information

David Elstein
Email: davidelstein@world.oberlin.edu
State University of New York at New Paltz
U. S. A.

Xuanzang (Hsüan-tsang) (602—664)

Xuanzang, world-famous for his sixteen-year pilgrimage to India and career as a translator of Buddhist scriptures, is one of the most illustrious figures in the history of scholastic Chinese Buddhism. Born into a scholarly family at the outset of the Tang (T’ang) Dynasty, he enjoyed a classical Confucian education. Under the influence of his elder brother, a Buddhist monk, however, he developed a keen interest in Buddhist subjects and soon became a monk himself at the age of thirteen. Upon his return to Chang’an in 645, Xuanzang brought back with him a great number of Sanskrit texts, of which he was able to translate only a small portion during the remainder of his lifetime. In addition to his translations of the most essential Mahayana scriptures, Xuanzang authored the Da tang xi yu ji (Ta-T’ang Hsi-yu-chi or Records of the Western Regions of the Great T’ang Dynasty) with the aid of Bianji (Bian-chi). It is through Xuanzang and his chief disciple Kuiji (K’uei-chi) (632-682) that the Faxiang (Fa-hsiang or Yogacara/Consciousness-only) School was initiated in China. In order to honor the famous Buddhist scholar, the Tang Emperor Gaozong (Gao-tsung) cancelled all audiences for three days after Xuanzang’s death. (See Romanization systems for Chinese terms.)

Table of Contents

  1. Xuanzang’s Beginnings (602-630)
  2. Pilgrimage to India (630-645)
  3. His Return to China and Career as Translator (645-664)
  4. The Faxiang School
    1. The Development of Yogacara
    2. Metaphysics of Mere-Consciousness
    3. Some Objections Answered
    4. The Vijnaptimatratasiddhi-sastra
    5. Faxiang Doctrines
  5. Conclusion
  6. References and Further Reading

1. Xuanzang’s Beginnings (602-630)

Born of a family possessing erudition for generations in Yanshi prefecture of Henan province, Xuanzang, whose lay name was Chenhui, was the youngest of four children. His great-grandfather was an official serving as a prefect, his grand-father was appointed as Professor in the National College at the capital, and his father was a Confucianist of the rigid conservative type who gave up office and withdrew into seclusion to escape the political turmoil that gripped China at that time. According to traditional biographies, Xuanzang displayed a precocious intelligence and seriousness, amazing his father by his careful observance of the Confucian rituals at the age of eight. Along with his brothers and sister, he received an early education from his father, who instructed him in classical works on filial piety and several other canonical treatises of orthodox Confucianism.

After the death of Xuanzang’s father in 611, his older brother Chensu, later known as Changjie, became the primary influence on his life. As a result, he commenced visiting the monastery of Jingtu at Luoyang where his brother dwelled as a Buddhist monk, and studying sacred texts of the faith with all the ardor of a young convert. When Xuanzang requested to take Buddhist orders at the age of thirteen, the abbot Zheng Shanguo made an exception in his case because of his precocious sapience.

In 618, due to the civil war breaking out in Henan, Xuanzang and his brother sought refuge in the mountains of Sichuan, where Xuanzang spent three years or so in the monastery of Kong Hui plunging into the study of various Buddhist texts, such as the Abhidharmakosa-sastra (Abhidharma Storehouse Treatise). In 622, he was fully ordained as a monk. Deeply confused by myriad contradictions and discrepancies in the texts, and not receiving any solutions from his Chinese masters, Xuanzang decided to go to India and study in the cradle of Buddhism.

2. Pilgrimage to India (630-645)

An imperial decree by the Emperor Taizong (T’ai-tsung) forbade Xuanzang’s proposed visit to India on the grounds of preserving national security. Instead of being deterred from his long-standing dream, Xuanzang is said to have experienced a vision that strengthened his resolve. In 629, defying imperial proscription, he secretly set out from Chang’an on his epochal journey to the land of the Buddha.

Xuanzang reports that he travelled by night, hiding during the day, enduring many dangers, and bereft of a guide after being abandoned by his companions. After some time in the Gobi Desert, he arrived in Liangzhou in modern Gansu province, the westernmost extent of the Chinese frontier at that time and the southern terminus of the Silk Road trade route connecting China with Central Asia. Here he spent approximately a month preaching the Buddhist message before being invited to Hami by King Qu Wentai (Ch’u Wen-tai) of Turfan, a pious Buddhist of Chinese extraction.

It soon became apparent to Xuanzang that Qu Wentai, although most hospitable and respectful, planned to detain him for life in his Court as its ecclesiastical head. In response, Xuanzang undertook a hunger strike until the king relented, extracting from Xuanzang a promise to spend three years in the kingdom on his way back from India. After remaining there for a month more for the sake of the dharma, Xuanzang resumed his journey in 630, well provided with introductions to all the kings on his itinerary, including the formidable Turkish Khan whose power extended to the very gates of India. Although he had initially left China against the will of the Emperor, he was no longer an unknown fugitive fleeing in secret, but an accredited pilgrim with official standing.

At long last, Xuanzang reached his ultimate destination, where his strongest personal interest in Buddhism was located and the principal portion of his time abroad was spent: the Nalanda monastery, located southwest of the modern city of Bihar in northern Bihar state. As a far-famed metropolis of Buddhist monastic education, Nalanda was a veritable monastic city consisting of some ten huge temples with spaces between divided into eight compounds, surrounded by a high wall. There were over ten thousand Mahayana monks there engaged in the study of the orthodox Buddhist canon as well as the Vedas, arithmetic, and medicine. According to legend, Silabhadra (529-645), abbot of Nalanda, was considering suicide after years of wasting illness when he received instructions from deities in a dream, commanding him to endure and await the arrival of a Chinese monk in order to guarantee the preservation of the Mahayana tradition abroad. Indeed, Xuanzang became Silabhadra’s disciple in 636 and was initiated into the Yogacara lineage of Mahayana learning by the venerable abbot. While at Nalanda, Xuanzang also studied Sanskrit and Brahmana philosophy. Subsequent studies in India included hetu-vidya (logic), the exegesis of Mahayana texts such as the Mahayana-sutralamkara (Treatise on the Scripture of Adorning the Great Vehicle), and Madhyamika (“Middle-ist”) doctrines.

The name of the Madhyamika School, founded by Nagarjuna (2nd century CE), derives from its having sought a middle position between the realism of the Sarvastivada (Doctrine That All Is Real) School and the idealism of the Yogacara (Mind Only) School. Xuanzang appears to have combined the Madhyamika and Yogacara systems into a more eclectic and comprehensive Mahayanism. With the approval of his Nalanda mentors, Xuanzang composed a treatise, Hui zong lun (Hui-tsüng-lun or On the Harmony of the Principles), which articulates his synthesis.

At Nalanda, Xuanzang became a critic of two major philosophical systems of Hinduism opposed to Buddhism: the Samkhya and the Vaiseshika. The former was based upon a dualism of Nature and Spirit. The latter was a realist system, immediate and direct in its realism, resting upon the acceptance of the data of consciousness and experience as such: in brief, it was a melding of monism and atomism. Such beliefs were in absolute contradiction to the acosmic idealism of the Buddhist Yogacara, which rejected equally the substantial entity of the ego and the objective existence of matter. Xuanzang also critiqued the atheistic monism of the Jains, especially inveighing against what he saw as their caricature of Buddhism in terms of Jain monastic garb and iconography.

Xuanzang’s success in religious and philosophical disputes evidently aroused the attention of some Indian potentates, including the King of Assam and the poet-cum-dramatist king Harsha (r. 606-647), who was regarded as a Buddhist patron saint upon the throne like Ashoka and Kanishka of old. An eighteen-day religious assembly was convoked in Harsha’s capital of Kanauj during the first week of the year 643, during which Xuanzang allegedly defeated five hundred Brahmins, Jains, and heterodox Buddhists in spirited debate.

Following these public successes in India, Xuanzang resolved to return to China by way of Central Asia. He followed the caravan-track that led across the Pamirs to Dunhuang. In the spring of 644, he reached Khotan and awaited a reply to his request for return addressed to the Emperor Taizong. In the month of November, Xuanzang left for Dunhuang by a decree of the Emperor, and arrived in the Chinese capital Chang’an the first month of the Chinese Lunar Year 645.

3. His Return to China and Career as Translator (645-664)

Traditional sources report that Xuanzang’s arrival in Chang’an was greeted with an imperial audience and an offer of official position (which Xuanzang declined), followed by an assembly of all the Buddhist monks of the capital city, who accepted the manuscripts, relics, and statues brought back by the pilgrim and deposited them in the Temple of Great Happiness. It was in this Temple that Xuanzang devoted the rest of his life to the translation of the Sanskrit works that he had brought back out of the wide west, assisted by a staff of more than twenty translators, all well-versed in the knowledge of Chinese, Sanskrit, and Buddhism itself. Besides translating Buddhist texts and dictating the Da tang xi yu ji in 646, Xuanzang also translated the Dao de jing (Tao-te Ching) of Laozi (Lao-tzu) into Sanskrit and sent it to India in 647.

His translations may, by and large, be divided into three phases: the first six years (645-650), focusing on the Yogacarabhumi-sastra; the middle ten years (651-660), centering on the Abhidharmakosa-sastra; and the last four years (661-664), concentrating upon the Maha-prajnaparamita-sutra. In each phase of his career as a translator, Xuanzang saw his task as introducing Indian Buddhist texts to Chinese audiences in all their integrity. According to Thomas Watters, the total number of texts brought by Xuanzang from India to China is six hundred and fifty-seven, enumerated as follows:

Mahayanist sutras: 224 items
Mahayanist sastras: 192
Sthavira sutras, sastras and Vinaya: 14
Mahasangika sutras, sastras and Vinaya: 15
Mahisasaka sutras, sastras and Vinaya: 22
Sammitiya sutras, sastras and Vinaya: 15
Kasyapiya sutras, sastras and Vinaya: 17
Dharmagupta sutras, Vinaya, sastras: 42
Sarvastivadin sutras, Vinaya, sastras: 67
Yin-lun (Treatises on the science of Inference): 36
Sheng-lun (Etymological treatises): 13

4. The Faxiang School

a. The Development of Yogacara

The Chinese Faxiang School, derived from the Indian Yogacara (yoga practice) School, is based upon the writings of two brothers, Asanga and Vasubandhu, who explicated a course of practice wherein hindrances are removed according to a sequence of stages, from which it gets its name. The appellation of the school originated with the title of an important fourth- or fifth-century CE text of the school, the Yogacarabhumi-sastra. Yogacara attacked both the provisional practical realism of the Madhyamika School of Mahayana Buddhism and the complete realism of Theravada Buddhism. Madhyamika is regarded as the nihilistic or Emptiness School, whereas Yogacara is seen as the realistic or Existence School. While the former is characterized as Mahayana due to its central theme of emptiness, the latter might be considered to be semi-Mahayana to a point for three basic reasons: (1) the Yogacara remains realistic like the Abhidharma School; (2) it expounds the three vehicles side by side without being confined to the Bodhisattvayana; and (3) it does not accent the doctrine of Buddha nature.

The other name of the school, Vijnanavada (Consciousness-affirming/Doctrine of Consciousness), is more descriptive of its philosophical position, which in short is that the reality a human being perceives does not exist. Yogacara became much better known, however, not for its practices but for its rich development in psychological and metaphysical theory. The Yogacara thinkers took the theories of the body-mind aggregate of sentient beings that had been under development in earlier Indian schools such as the Sarvastivada, and worked them into a more fully articulated scheme of eight consciousnesses, the most weighty of which was the eighth, or store consciousness: the alaya-vijnana.

The Yogacara School is also known for the development of other key concepts that would hold great influence not merely within its own system, but within all later forms of Mahayana. These include the theory of the three natures (the dependently originated, the completely real, and the imaginary), which is understood as a Yogacara response to the Madhyamika’s truth of emptiness. Yogacara is also the original source for the theory of the three bodies of the Buddha, and greatly expands the notions of categories of elemental constructs.

Yogacara explored and propounded basic doctrines that were to be fundamental in the future growth of Mahayana and that influenced the rise of Tantric Buddhism. Its central doctrine is that only consciousness (vijnanamatra; hence the name Vijnanavada) is real, and that mind is the ultimate reality. In other words, external objects do not exist; nothing exists outside the mind. The common view that external phenomena exist is due to a misconception that is removable through a meditative or yogic process, which brings a complete withdrawal from these fictitious externals, and an inner concentration and tranquility may accordingly be bodied forth.

Yogacara is an alternative system of Buddhist logic. According to it, the object is not at all as it seems, and thus cannot be of any service to knowledge. It is therefore unreal when consciousness is the sole reality. The object is only a mode of consciousness. Its appearance, although objective and external, is in fact the transcendental illusion, because of which consciousness is bifurcated into the subject-object duality. Consciousness is creative and its creativity is governed by the illusive idea of the object. Reality is to be viewed as an Idea or a Will. This creativity is manifested at different levels of consciousness.

Since this school believes that only ideation exists, it is also called the Idealistic School. In China, it was established by Xuanzang and his principal pupil Kuiji who systematized the teaching of his masters recorded in two essential works: the Fa yuan i lin zhang (Fa-yuan i-lin-chang or Chapter on the Forest of Meanings in the Garden of Law) and the Cheng wei shi lun shu ji (Ch’eng wei-shih lun shu-chi or Notes on the Treatise on the Completion of Ideation Only). On account of the school’s idealistic accent it is known as Weishi (Wei-shih) or Ideation Only School; yet because it is concerned with the specific character of all the dharmas, it is often called the Faxiang School as well. Besides, this school argues that not all beings possess pure seeds and, therefore, not all of them are capable of attaining Buddhahood.

The central concept of this school is borrowed from a statement by Vasubandhu — idam sarvam vijnaptimatrakam, “All this world is ideation only.” It strongly claims that the external world is merely a fabrication of our consciousness, that the external world does not exist, and that the internal ideation presents an appearance as if it were an outer world. The whole external world is, hence, an illusion according to it.

b. Metaphysics of Mere-Consciousness

Broadly speaking, Mere-Consciousness may cover the eight consciousnesses, the articulation of which forms one of the most seminal and distinctive aspects of the doctrine of the Yogacara School, transmitted to East Asia where it received the somewhat pejorative designations of Dharma-character School and Consciousness-only School. According to this doctrine, sentient beings possess eight distinct layers of consciousness, the first five — the visual consciousness, auditory consciousness, olfactory consciousness, gustatory consciousness, and tactile consciousness — corresponding to the sense perceptions, the sixth discriminatory consciousness to the thinking mind, the seventh manas consciousness to the notion of ego, and the eighth alaya-consciousness to the repository of all the impressions from one’s past experiences. As the first seven of these arise on the basis of the eighth, they are called the transformed consciousnesses. In contrast, the eighth is known as the base consciousness, store consciousness, or seed consciousness. And in particular, it is this last consciousness that the Mere-Consciousness is all about.

One of the foremost themes discussed in the school is the alaya-vijnana or storehouse consciousness, which stores and coordinates all the notions reflected in the mind. Thus, it is a storehouse where all the pure and contaminated ideas are blended or interfused. This principle might be illustrated by the school’s favorite citation:

“A seed produces a manifestation,
A manifestation perfumes a seed.
The three elements (seed, manifestation, and perfume) turn on and on,
The cause and effect occur at one and the same time.”

It is the doctrine of consciousness or mind as the basis for so-called “external” objects that gave the Cittamatra (Mind Only) tradition its name. Apparently external objects are constituted by consciousness and do not exist apart from it. Vasubandhu began his Vimsatika vijnapti-matrata-siddhih (Twenty Verses on Consciousness-only) by stating: “All this is only perception, since consciousness manifests itself in the form of nonexistent objects.” There is only a flow of perceptions. This flow, however, really exists, and it is mental by nature, since in terms of the Buddhist division of things it has to be either mental or physical, and the flow of experiences could hardly be a physical or material flow. There might be a danger in calling this “idealism,” because it is rather dissimilar from forms of idealism in Western philosophy. In Western philosophy it is deemed necessary for a newcomer to negate and transcend previous theories and philosophies through criticism, whereas Buddhism, especially Yogacara Buddhism, developed its doctrines by inheriting the entire body of thought of its former masters. Nonetheless, if “idealism” denotes that subjects and objects are no more than a flow of experiences and perceptions, which are of the same nature, and these experiences, just as perceptions, are mental, then this could be called a form of “dynamic idealism.”

This school maintains that no external reality exists while retaining the position that knowledge exists, on the assumption that knowledge itself is the object of consciousness. It therefore postulates a higher storage consciousness, which is the final basis of the apparent individual. The universe consists in an infinite number of possible ideas that lie inactively in storage. Such dormant consciousness projects an interrupted sequence of thoughts, while it itself is in restless flux till the karma, or accumulated consequences of past deeds, blows out. This storage consciousness takes in all the impressions of previous experiences, which shape the seeds of future karmic action, an illusory force creating outer categories that are actually only fictions of the mind. This illusive force determines the world of difference and belongs to human nature, giving rise to the erroneous notions of an I and a non-I. That duality can only be conquered by enlightenment, which effects the transformation of an ordinary person into a Buddha.

c. Some Objections Answered

Certain objections were leveled at Yogacara’s doctrine of consciousness. Vasubandhu, again in his Vimsatika, undertook to prove the invalidity of some of these:

  • Spatiotemporal determination would be impossible — experiences of object X are not occurrent everywhere and at every time so there must be some external basis for our experiences.
  • Many people experience X and not just one person, as in the case of a hallucination.
  • Hallucinations can be determined because they do not possess pragmatic results. It does not follow that entities, which we generally accept as real, can be placed in the same class.

In reply, Vasubandhu argued that these were after all no objections; they simply failed to show that perception-only as a teaching was beyond the limits of what could be concretely reasoned. Spatiotemporal determination can be elucidated on the analogy of dream experience, where a complete and surreal world is created with objects appearing to have spatiotemporal localization despite the fact that they do not exist apart from the mind which is cognizing them. Moreover, the second objection can be met by recourse to the wider Buddhist religious framework. The hells and their tortures, which Buddhism teaches are the result of wicked deeds and are to be endured for a very long time until one is purified, are experienced as the collective fruit of the previous karmas of the hell inmates. The torturers of hell obviously cannot really exist, for otherwise they would have been reborn in hell themselves and would themselves experience the sufferings associated with it. If this were the case, then how could they jovially inflict sufferings upon their fellow inmates? Thus they must be illusive, and yet they are experienced by a number of people. Finally, just as objects in a dream serve some pragmatic purpose within that dream, and likewise in hell, so it is in everyday life. Furthermore, as physical activity can be directed toward unreal objects in a dream owing, it is said, to nervous irritation on the part of the dreamer, so too in daily life.

d. The Vijnaptimatratasiddhi-sastra

Representing a two-hundred-year development within the Vijnanavadin tradition subsequent to the Lankavatara Sutra (Sutra on the Buddha’s Entering the Country of Lanka) and being the primary text of the Faxiang School, the Vijnaptimatratasiddhi-sastra is an exhaustive study of the alaya-vijnana and the sevenfold development of the manas, manovijnana, and the five sensorial consciousnesses. As a creative and elaborate exposition of Vasubandhu’s Trimsika-vijnapti-matrata-siddhi (Treatise in Thirty Stanzas on Consciousness Only) rendered by Xuanzang in 648 at Great Happiness Monastery, it synthesizes the ten most significant commentaries written on it, and became the enchiridion of the new Faxiang School of Buddhist idealism. It is mainly a translation by Xuanzang in 659 of Dharmapala’s commentary on the Trimsika-vijnapti-matrata-siddhi, yet it also contains edited translations of other masters’ works on the same verses. This is the only translation by Xuanzang that is not a direct translation of a text, but instead a selective and evaluative editorial drawing on ten distinct texts. Since Kuiji aligned himself with this text while assuming the role of Xuanzang’s successor, the East Asian tradition has treated the Vijnaptimatratasiddhi-sastra as the pivotal exemplar of Xuanzang’s teachings.

In both style and content, the Vijnaptimatratasiddhi-sastra represents a marked advance over the earlier Lankavatara Sutra, a basic canonical text of the Faxiang School that sets forth quite a few hallmarks of the Mahayana position, such as the eight consciousnesses and the tathagatagarbha (Womb of the Buddha-to-be). Instead of bearing the latter’s cryptically aphoristic form, Xuanzang’s treatise is a detailed and coherent analysis, a scholastic apologetics on the doctrine of Consciousness-only. Without any reference to the tathagatagarbha itself, the Vijnaptimatratasiddhi-sastra firmly grounds its pan-consciousness upon Absolute Suchness or the existence of the mind as true reality. Aside from human consciousness, another principle is accepted as real: the so-called suchness, which is the equivalent of the void of the Madhyamika School.

The Vijnaptimatratasiddhi-sastra spells out how there can be a common empirical world for different individuals who ideate or construct particular objects, and who possess distinct bodies and sensory systems. According to Xuanzang, the universal “seeds” in the store consciousness account for the common appearance of things, while particular “seeds” account for the differences.

e. Faxiang Doctrines

Being first and foremost an idealistic school of Mahayana Buddhism, the Faxiang School holds that illusory phenomena are manifested in consistent patterns of regularity and continuity; in order to justify this order, in which only defiled elements could prevail before enlightenment is attained, it created the tenet of the alaya-vijnana. Sense perceptions are ordered as regular and coherent by a store of consciousness of which one is not consciously aware. Sense impressions then produce certain configurations in this unconscious store that “perfume” later impressions so that they appear consistent and regular. Every being possesses this seed consciousness, which therefore becomes a sort of collective consciousness that takes control of human perceptions of the world, though this world does not exist at all according to the very tenet. This school’s forerunner had emerged in India in roughly the second century AD, yet had its period of greatest productivity in the fourth century, during the time of Asanga and Vasubandhu. Following them, the school divided into two branches, the Nyayanusarino Vijnanavadinah (Vijnanavada School of the Logical Tradition) and the Agamanusarino Vijnanavadinah (Vijnanavada School of the Scriptural Tradition), with the former sub-school postulating the standpoints of the logician Dignaga (c. AD 480-540) and his successor, Dharmakirti (c. AD 600?-680?).

This consciousness-oriented school of ideology was largely represented in China by the Faxiang School, called Popsang in Korea, and Hosso in Japan. The radical teachings of Yogacara became known in China primarily through a work of Paramartha, a sixth-century Indian missionary-translator. His rendition of the Mahayana-samparigraha-sastra (Compendium of the Great Vehicle) by Asanga provided a sound base for the Sanlun (Three-Treatise) School, which preceded the Faxiang School as the vehicle of Yogacara thought in China. Faxiang is the Chinese translation of the Sanskrit term dharmalaksana (characteristic of dharma), referring to the school’s basal emphasis on the unique characteristics of the dharmas that make up the world, which appears in human ideation. According to Faxiang doctrines, there are five categories of dharmas: (1) eight mental dharmas, encompassing the five sense consciousnesses, cognition, the cognitive faculty, and the store consciousness; (2) eleven elements relating to appearances or material forms; (3) fifty-one mental capacities or functions, activities, and dispositions; (4) twenty-four situations, processes, and things not associated with the mind — for example, time and becoming; and (5) six non-conditioned or non-created elements — for instance, space and the nature of existence.

Alaya-consciousness is posited as the receptacle of the imprint of thoughts and deeds; thus it is the dwelling of sundry karmic seeds. These “germs” develop into form, feeling, perception, impulse, and consciousness, collectively known as the Five Aggregates. Then ideation gradually takes shape, which sets a self or mind against an outer world. Finally comes the awareness of the objects of thought via sense perceptions and ideas. The store consciousness must be purified of its subject-object duality and notions of false existence, and restored to its pure state tantamount to buddhahood, the Absolute Suchness, and the undifferentiated. In line with these three elements of false imagination, right knowledge, and suchness are the three modes in which things respectively exist: (1) as the mere fictions of false imagination; (2) as relatively existing under certain conditions; and (3) in the perfect mode of being. Corresponding to this threefold version of the modes of existence is the tri-body doctrine of the Buddha (the Dharma Body, the Reward Body, and the Response Body), a creed that was developed into a systematic theory by Yogacara thinkers. The distinguishing features of the Faxiang School lie in its emphasis on meditation and broadly psychological analyses. Seen in this light, it is a far cry from the other predominant Mahayana stream, Madhyamika, where the stress is entirely upon dialectics and logical arguments.

The base consciousness is interpreted as the container of the karmic impressions or seeds, nourished by us beings in the process of our existence. These seeds, ripening in the course of future circumstances, find the nearest parallel to the present-day understanding of genes. In view of the foregoing, philosophers of this school have constantly essayed to explain in detail how karmic force actually operates and affects us on a concrete, personal level. Comprised in this development of consciousness theory is the concept of conscious justification — phenomena that are presumably external to us can never exist but in intimate association with consciousness itself. Such a notion is commonly referred to as “Mind Only.”

The fundamental early canonical texts that expound Yogacara doctrines are such scriptures as the Sutra on Understanding Profound and Esoteric Doctrine and the Srimala-sutra (Sutra on the Lion’s Roar of Queen Srimala), and treatises like the Mahayana-samparigraha-sastra, the Prakaranaryavaca-sastra (Acclamation of the Scriptural Teaching), and the Yogacarabhumi.

5. Conclusion

As an early and influential Chinese Buddhist monk, Xuanzang embodies the tensions inherent in Chinese Buddhism: filial piety versus monastic discipline, Confucian orthodoxy versus Mahayana progressivism, etc. Such tensions can be seen not only in his personal legacies, which include the extremely popular Chinese novel based on his travels, Xiyouji (Journey to the West), but also in the career of scholastic Buddhism in China.

For a time during the middle of the Tang Dynasty the Faxiang School achieved a high degree of eminence and popularity across China, but after the passing of Xuanzang and Kuiji the school swiftly declined. One of the factors contributing to this decline was the anti-Buddhist imperial persecutions of 845. Another likely factor was the harsh criticism of Faxiang by members of the Huayan (Hua-yen) School. In addition, the philosophy of this school, with its abstruse terminology and hairsplitting analysis of the mind and the senses, was too alien to be accepted by the practical-minded Chinese.

6. References and Further Reading

  • Bapat, P. V., and K. A. Nilakanta Sastri, eds. 2500 Years of Buddhism. Delhi: Government of India Press, 1964.
  • Bernstein, Richard. Ultimate Journey: Retracing the Path of an Ancient Buddhist Monk Who Crossed Asia in Search of Enlightenment. New York: Alfred A. Knopf, 2001.
  • Brown, Brian Edward. The Buddha Nature: A Study of the Tathagatagarbha and Alayavijnana. Delhi: Motilal Banarsidass, 1991.
  • Ch’en, Kenneth K. S. Buddhism in China: A Historical Survey. Princeton: Princeton University Press, 1973.
  • Chatterjee, Ashok Kumar. The Yogacara Idealism. Delhi: Motilal Banarsidass, 1987.
  • The Unknown Hsuan-Tsang. Oxford: Oxford University Press, 2001.
  • Edkins, Joseph. Chinese Buddhism: A Volume of Sketches, Historical, Descriptive and Critical. San Francisco: Chinese Materials Center, 1976.
  • Grousset, Rene. In the Footsteps of the Buddha. San Francisco: Chinese Materials Center, 1976.
  • Hwui Li. The life of Hiuen-Tsiang. London: Kegan Paul, Trench, and Trubner, 1911.
  • Kieschnick, John. The Eminent Monk: Buddhist Ideals in Medieval Chinese Hagiography. Honolulu: University of Hawaii Press, 1997.
  • Lan Ji-fu, ed. The Chung-hwa Fo Jian Bai Ke Quan Shu: Religious Affairs Committee of Foguangshan Buddhist Order, 1993.
  • Lusthaus, Dan. Buddhist Phenomenology: A Philosophical Investigation of Yogacara Buddhism and the Ch’eng Wei-shih lun. London: Routledge Curzon, 2002.
  • Nagao, Gadjin M. Madhyamika and Yogacara: A Study of Mahayana Philosophies. Albany: State University of New York Press, 1991.
  • Pachow, W. Chinese Buddhism: Aspects of Interaction and Reinterpretation. Lanham, MD: University Press of America, 1980.
  • Sharf, Robert H. Coming to Terms with Chinese Buddhism: A Reading of the Treasure Store Treatise. Honolulu: University of Hawaii Press, 2002.
  • Waley, Arthur. The Real Tripitaka, and Other Pieces. London: George Allen & Unwin, 1952.
  • Watters, Thomas. On Yuan Chwang’s Travels in India: A. D. 629-645. Delhi: Munshiram Manoharlal, 1996.
  • Williams, Paul. Mahayana Buddhism: The Doctrinal Foundations. London: Routledge, 1991.
  • Wriggins, Sally Hovey. Xuanzang: A Buddhist Pilgrim on the Silk Road. Boulder: Westview Press, 1996.

Author Information

Der Huey Lee
Email: leederhuey@hotmail.com
Peking University
China

Validity and Soundness

A deductive argument is said to be valid if and only if it takes a form that makes it impossible for the premises to be true and the conclusion nevertheless to be false. Otherwise, a deductive argument is said to be invalid.

A deductive argument is sound if and only if it is valid and all of its premises are actually true. Otherwise, a deductive argument is unsound.

According to the definition of a deductive argument (see the article on Deductive and Inductive Arguments), the author of a deductive argument always intends that the premises provide the sort of justification for the conclusion whereby if the premises are true, the conclusion is guaranteed to be true as well. Loosely speaking, if the author’s process of reasoning is a good one, that is, if the premises actually do provide this sort of justification for the conclusion, then the argument is valid.

In effect, an argument is valid if the truth of the premises logically guarantees the truth of the conclusion. The following argument is valid, because it is impossible for the premises to be true and the conclusion nevertheless to be false:

Elizabeth owns either a Honda or a Saturn.
Elizabeth does not own a Honda.
Therefore, Elizabeth owns a Saturn.
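
The claim that no assignment of truth values makes the premises true and the conclusion false can be checked mechanically for simple sentential arguments. The following is a minimal sketch, not a standard library routine: the function name `is_valid` and the variables `h` ("Elizabeth owns a Honda") and `s` ("Elizabeth owns a Saturn") are illustrative assumptions, and "either ... or" is read inclusively.

```python
from itertools import product

def is_valid(premises, conclusion, num_vars):
    """Return True if no truth assignment makes every premise true
    while the conclusion is false (a brute-force truth-table check)."""
    for values in product([True, False], repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # this row of the truth table is a counterexample
    return True

# Hypothetical encoding of the argument above.
premises = [
    lambda h, s: h or s,   # Elizabeth owns either a Honda or a Saturn.
    lambda h, s: not h,    # Elizabeth does not own a Honda.
]
conclusion = lambda h, s: s  # Therefore, Elizabeth owns a Saturn.

elizabeth_valid = is_valid(premises, conclusion, 2)
print(elizabeth_valid)
```

The check looks only at the pattern of "or" and "not", never at whether Elizabeth actually owns either car, which is the sense in which validity is independent of the actual truth of the premises.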

It is important to stress that the premises of an argument do not actually have to be true in order for the argument to be valid. An argument is valid if the premises and conclusion are related to each other in the right way so that if the premises were true, then the conclusion would have to be true as well. We can recognize in the above case that even if one of the premises is actually false, if they had been true the conclusion would have been true as well. Consider, then, an argument such as the following:

All toasters are items made of gold.
All items made of gold are time-travel devices.
Therefore, all toasters are time-travel devices.

Obviously, the premises in this argument are not true. It may be hard to imagine these premises being true, but it is not hard to see that if they were true, their truth would logically guarantee the conclusion’s truth.

It is easy to see that the previous example is not an example of a completely good argument. A valid argument may still have a false conclusion. When we construct our arguments, we must aim to construct one that is not only valid, but sound. A sound argument is one that is not only valid, but begins with premises that are actually true. The example given about toasters is valid, but not sound. However, the following argument is both valid and sound:

In some states, no felons are eligible voters, that is, eligible to vote.
In those states, some professional athletes are felons.
Therefore, in some states, some professional athletes are not eligible voters.

Here, not only do the premises provide the right sort of support for the conclusion, but the premises are actually true. Therefore, so is the conclusion. Although it is not part of the definition of a sound argument, because sound arguments both start out with true premises and have a form that guarantees that the conclusion must be true if the premises are, sound arguments always end with true conclusions.

It should be noted that both invalid, as well as valid but unsound, arguments can nevertheless have true conclusions. One cannot reject the conclusion of an argument simply by discovering a given argument for that conclusion to be flawed.

Whether or not the premises of an argument are true depends on their specific content. However, according to the dominant understanding among logicians, the validity or invalidity of an argument is determined entirely by its logical form. The logical form of an argument is that which remains of it when one abstracts away from the specific content of the premises and the conclusion, that is, words naming things, their properties and relations, leaving only those elements that are common to discourse and reasoning about any subject matter, that is, words such as “all,” “and,” “not,” “some,” and so forth. One can represent the logical form of an argument by replacing the specific content words with letters used as place-holders or variables.

For example, consider these two arguments:

All tigers are mammals.
No mammals are creatures with scales.
Therefore, no tigers are creatures with scales.

All spider monkeys are elephants.
No elephants are animals.
Therefore, no spider monkeys are animals.

These arguments share the same form:

All A are B;
No B are C;
Therefore, No A are C.

All arguments with this form are valid. Because they have this form, the examples above are valid. However, the first example is sound while the second is unsound, because its premises are false. Now consider:

All basketballs are round.
The Earth is round.
Therefore, the Earth is a basketball.

All popes reside at the Vatican.
John Paul II resides at the Vatican.
Therefore, John Paul II is a pope.

These arguments also have the same form:

All A’s are F;
X is F;
Therefore, X is an A.

Arguments with this form are invalid. This is easy to see with the first example. The second example may seem like a good argument because the premises and the conclusion are all true, but note that the conclusion’s truth isn’t guaranteed by the premises’ truth. It could have been possible for the premises to be true and the conclusion false. This argument is invalid, and all invalid arguments are unsound.
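
The difference between the two forms can be illustrated by searching for counterexamples, that is, interpretations of the place-holder letters under which the premises are true and the conclusion false. The sketch below is only an illustration under stated assumptions: the letters are interpreted as subsets of a small three-element domain (`DOMAIN`, an arbitrary choice), and the function names are hypothetical. Because the search is bounded, finding no counterexample is evidence of validity rather than a proof, whereas finding one does establish invalidity.

```python
from itertools import combinations, product

DOMAIN = range(3)  # assumed small universe of objects

def subsets(domain):
    """All subsets of a finite domain, as frozensets."""
    items = list(domain)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def counterexample_all_no():
    """Form: All A are B; No B are C; therefore, no A are C."""
    for A, B, C in product(subsets(DOMAIN), repeat=3):
        premises_true = A <= B and not (B & C)
        conclusion_true = not (A & C)
        if premises_true and not conclusion_true:
            return (A, B, C)
    return None

def counterexample_all_is():
    """Form: All A's are F; x is F; therefore, x is an A."""
    for A, F in product(subsets(DOMAIN), repeat=2):
        for x in DOMAIN:
            if A <= F and x in F and x not in A:
                return (A, F, x)
    return None

print(counterexample_all_no())   # no counterexample found in this domain
print(counterexample_all_is())   # some interpretation with true premises, false conclusion
```

A counterexample to the second form corresponds to the basketball case: something that has the property F without belonging to A.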

While it is accepted by most contemporary logicians that logical validity and invalidity are determined entirely by form, there is some dissent. Consider, for example, the following arguments:

My table is circular. Therefore, it is not square shaped.

Juan is a bachelor. Therefore, he is not married.

These arguments, at least on the surface, have the form:

x is F;
Therefore, x is not G.

Arguments of this form are not valid as a rule. However, it seems clear in these particular cases that it is, in some strong sense, impossible for the premises to be true while the conclusion is false. Many logicians would respond to these complications in various ways. Some might insist, although this is controversial, that these arguments actually contain implicit premises such as “Nothing is both circular and square shaped” or “All bachelors are unmarried,” which, while themselves necessary truths, nevertheless play a role in the form of these arguments. It might also be suggested, especially with the first argument, that while (even without the additional premise) there is a necessary connection between the premise and the conclusion, the sort of necessity involved is something other than “logical” necessity, and hence that this argument (in the simple form) should not be regarded as logically valid. Lastly, especially with regard to the second example, it might be suggested that because “bachelor” is defined as “adult unmarried male,” the true logical form of the argument is the following universally valid form:

x is F and not G and H;
Therefore, x is not G.

The logical form of a statement is not always as easy to discern as one might expect. For example, statements that seem to have the same surface grammar can nevertheless differ in logical form. Take for example the two statements:

(1) Tony is a ferocious tiger.
(2) Clinton is a lame duck.

Despite their apparent similarity, only (1) has the form “x is an A that is F.” From it one can validly infer that Tony is a tiger. One cannot validly infer from (2) that Clinton is a duck. Indeed, one and the same sentence can be used in different ways in different contexts. Consider the statement:

(3) The King and Queen are visiting dignitaries.

It is not clear what the logical form of this statement is. Either there are dignitaries that the King and Queen are visiting, in which case the sentence (3) has the same logical form as “The King and Queen are playing violins,” or the King and Queen are themselves the dignitaries who are visiting from somewhere else, in which case the sentence has the same logical form as “The King and Queen are sniveling cowards.” Depending on which logical form the statement has, inferences may be valid or invalid. Consider:

The King and Queen are visiting dignitaries. Visiting dignitaries is always boring. Therefore, the King and Queen are doing something boring.

Only if the statement is given the first reading can this argument be considered to be valid.

Because of the difficulty in identifying the logical form of an argument, and the potential deviation of logical form from grammatical form in ordinary language, contemporary logicians typically make use of artificial logical languages in which logical form and grammatical form coincide. In these artificial languages, certain symbols, similar to those used in mathematics, are used to represent those elements of form analogous to ordinary English words such as “all”, “not”, “or”, “and”, and so forth. The use of an artificially constructed language makes it easier to specify a set of rules that determine whether or not a given argument is valid or invalid. Hence, the study of which deductive argument forms are valid and which are invalid is often called “formal logic” or “symbolic logic.”

In short, a deductive argument must be evaluated in two ways. First, one must ask if the premises provide support for the conclusion by examining the form of the argument. If they do, then the argument is valid. Then, one must ask whether the premises are true or false in actuality. Only if an argument passes both these tests is it sound. However, if an argument does not pass these tests, its conclusion may still be true, even though the argument gives no support for its truth.

Note: there are other, related, uses of these words that are found within more advanced mathematical logic. In that context, a formula (on its own) written in a logical language is said to be valid if it comes out as true (or “satisfied”) under all admissible or standard assignments of meaning to that formula within the intended semantics for the logical language. Moreover, an axiomatic logical calculus (in its entirety) is said to be sound if and only if all theorems derivable from the axioms of the logical calculus are semantically valid in the sense just described.

For a more sophisticated look at the nature of logical validity, see the articles on “Logical Consequence” in this encyclopedia. The articles on “Argument” and “Deductive and Inductive Arguments” in this encyclopedia may also be helpful.

Author Information

The author of this article is anonymous. The IEP is actively seeking an author who will write a replacement article.

Special Relativity: Proper Time, Coordinate Systems, and Lorentz Transformations

This supplement to the main Time article explains some of the key concepts of the Special Theory of Relativity (STR). It shows how the predictions of STR differ from classical mechanics in the most fundamental way. Some basic mathematical knowledge is assumed.

Table of Contents

  1. Proper Time
  2. The STR Relationship Between Space, Time, and Proper Time
  3. Coordinate Systems
    1. Coordinates as a Mathematical Language for Time and Space
  4. Cartesian Coordinates for Space
  5. Choice of Inertial Reference Frame
  6. Operational Specification of Coordinate Systems for Classical Space and Time
  7. Operational Specification of Coordinate Systems for STR Space and Time
  8. Operationalism
  9. Coordinate Transformations and Object Transformations
  10. Valid Transformations
  11. Velocity Boosts in STR and Classical Mechanics
  12. Galilean Transformation of Coordinate System
  13. Lorentz Transformation of Coordinate System
  14. Time and Space Dilation
  15. The Full Special Theory of Relativity
  16. References and Further Reading

1. Proper Time

The essence of the Special Theory of Relativity (STR) is that it connects three distinct quantities to each other: space, time, and proper time. ‘Time’ is also called coordinate time or real time, to distinguish it from ‘proper time’. Proper time is also called clock time, or process time, and it is a measure of the amount of physical process that a system undergoes. For example, proper time for an ordinary mechanical clock is recorded by the number of rotations of the hands of the clock. Alternatively, we might take a gyroscope, or a freely spinning wheel, and measure the number of rotations in a given period. We could also take a chemical process with a natural rate, such as the burning of a candle, and measure the proportion of candle that is burnt over a given period.

Note that these processes are measured by ‘absolute quantities’: the number of times a wheel spins on its axis, or the proportion of candle that has burnt. These give absolute physical quantities and do not depend upon assigning any coordinate system, as does a numerical representation of space or real time. The numerical coordinate systems we use firstly require a choice of measuring units (meters and seconds, for example). Even more importantly, the measurement of space and real time in STR is relative to the choice of an inertial frame. This choice is partly arbitrary.

Our numerical representation of proper time also requires a choice of units, and we adopt the same units as we use for real time (seconds). But the choice of a coordinate system, based on an inertial frame, does not affect the measurement of proper time. We will consider the concept of coordinate systems and measuring units shortly.

Proper time can be defined in classical mechanics through cyclic processes that have natural periods – for instance, pendulum clocks are based on counting the number of swings of a pendulum. More generally, any natural process in a classical system runs through a sequence of physical states at a certain absolute rate, and this is the ‘proper time rate’ for the system.

In classical physics, two identical types of systems (with identical types of internal construction, and identical initial states) are predicted to have the same proper time rates. That is, they will run through their physical states in perfect correlation with each other.

This holds even if two identical systems are in relative constant motion with respect to each other. For instance, two identical classical clocks would run at the same rate, even if one is kept stationary in a laboratory, while the other is placed in a spaceship traveling at high speed.

This invariance principle is fundamental to classical physics, and it means that in classical physics we can define: Coordinate time = Proper time for all natural systems. For this reason, the distinction between these two concepts of time was hardly recognized in classical physics (although Newton did distinguish them conceptually, regarding ‘real time’ as an absolute temporal flow, and ‘proper time’ as merely a ‘sensible measure’ of real time; see his Scholium).

However, the distinction only gained real significance in the Special Theory of Relativity, which contradicts classical physics by predicting that the rate of proper time for a system varies with its velocity, or motion through space. The relationship is very simple: the faster a system travels through space, the slower its internal processes go. At the maximum possible speed, the speed of light, c, the internal processes in a physical system would stop completely. Indeed, for light itself, the rate of proper time is zero: there is no ‘internal process’ occurring in light. It is as if light is ‘frozen’ in a specific internal state.

At this point, we should mention that the concept of proper time appears more strongly in quantum mechanics than in classical mechanics, through the intrinsically ‘wave-like’ nature of quantum particles. In classical physics, single point-particles are simple things, and do not have any ‘internal state’ that represents proper time, but in quantum mechanics, the most fundamental particles have an intrinsic proper time, represented by an internal frequency. This is directly related to the wave-like nature of quantum particles. For radioactive systems, the rate of radioactive decay is a measure of proper time. Note that the amount of decay of a substance can be measured in an absolute sense. For light, treated as a quantum mechanical particle (the photon), the rate of proper time is zero, and this is because it has no mass. But for quantum mechanical particles with mass, there is always a finite ‘intrinsic’ proper time rate, represented by the ‘phase’ of the quantum wave. Classical particles do not have any correlate of this feature, which is responsible for quantum interference effects and other non-classical ‘wave-like’ behavior.

2. The STR Relationship Between Space, Time, and Proper Time

STR predicts that motion of a system through space is directly compensated by a decrease in real internal processes, or proper time rates. Thus, a clock will run fastest when it is stationary. If we move it about in space, its rate of internal processes will decrease, and it will run slower than an identical type of stationary clock. The relationship is precisely specified by the most profound equation of STR, usually called the metric equation (or line metric equation). The metric equation is:

(1) c²Δτ² = c²Δt² − Δr²

This applies to the trajectory of any physical system. The quantities involved are:

Δ is the difference operator.

Δτ is the amount of proper time elapsed between two points on the trajectory.

Δt is the amount of real time elapsed between two points on the trajectory.

Δr is the amount of motion through space between two points on the trajectory.

c is the speed of light, and depends on the units we choose for space and time.

The meaning of this equation is illustrated by considering simple trajectories depicted in a space-time diagram.

Figure 1. Two simple space-time trajectories.

If we start at an initial point on the trajectory of a physical system, and follow it to a later point, we find that the system has covered a certain amount of physical space, Δr, over a certain amount of real time, Δt, and has undergone a certain amount of internal process or proper time, Δτ. As long as we use the same units (seconds) to represent proper time and real time, these quantities are connected as described in Equation (1). Proper time intervals are shown in Figure 1 by blue dots along the trajectories. If these were trajectories of clocks, for example, then the blue dots would represent seconds ticked off by the clock mechanism.

In Figure 1, we have chosen to set the speed of light as 1. This is equivalent to using our normal units for time, that is, seconds, but choosing the units for space as c meters (instead of 1 meter), where c is the speed of light in meters per second. This system of units is often used by physicists for convenience, and it appears to make the quantity c drop out of the equations, since c = 1. However, it is important to note that c is a dimensional constant, and even if its numerical value is set equal to 1 by choosing appropriate units, it is still logically necessary in Equation (1) for the equation to balance dimensionally. Multiplying an interval of time, Δt, by the quantity c converts a temporal quantity into a spatial quantity. Equations of physics, just like ordinary propositions, can only identify objects or quantities of the same physical kinds with each other, and the role of c as a dimensional constant remains crucial in Equation (1), for the identity it states to make any sense.

Trajectories in Figure 1

  • Trajectory 1 (green) is for a stationary particle, hence Δr = 0 (it has no motion through space), and putting this value in Equation (1), we find that: Δτ = Δt. For a stationary particle, the amount of proper time is equal to the amount of coordinate time.
  • Trajectory 2 (red) is for a moving particle, and Δr > 0. We have chosen the velocity in this example to be: v = c/2, half the speed of light. But: v = Δr/Δt (distance traveled in the interval of time). Hence: Δr = ½cΔt. Putting this value into Equation (1), we get: c²Δτ² = c²Δt² − (½cΔt)², or: Δτ = √(¾)Δt ≈ 0.87Δt. Hence the amount of proper time is only about 87% of coordinate time. Even though this trajectory is very fast, proper time is still only slowed down a little.
  • Trajectory 3 (black) is for a particle moving at the speed of light, with v = c, giving: Δr = cΔt. Putting this in Equation (1), we get: c²Δτ² = c²Δt² − (cΔt)² = 0. Hence for a light-like particle, the amount of proper time is equal to 0.
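
The three cases above can be reproduced numerically. The following is a minimal sketch that assumes the metric equation in the form c²Δτ² = c²Δt² − Δr² (as used in the worked trajectories) and units in which c = 1, as in Figure 1; the function name `proper_time` and its parameter names are illustrative assumptions.

```python
import math

def proper_time(dt, dr, c=1.0):
    """Proper time elapsed along a straight trajectory, from
    c^2 * dtau^2 = c^2 * dt^2 - dr^2 (units with c = 1 by default)."""
    interval = (c * dt) ** 2 - dr ** 2
    if interval < 0:
        raise ValueError("dr > c*dt would require motion faster than light")
    return math.sqrt(interval) / c

dt = 1.0                              # one unit of coordinate time
tau_rest = proper_time(dt, 0.0)       # Trajectory 1: stationary, dr = 0
tau_half = proper_time(dt, 0.5 * dt)  # Trajectory 2: v = c/2, dr = (1/2) c dt
tau_light = proper_time(dt, 1.0 * dt) # Trajectory 3: v = c, dr = c dt

print(tau_rest, tau_half, tau_light)
```

With these inputs the stationary case returns the full coordinate time, the half-light-speed case returns √(¾) of it, and the light-like case returns zero, matching the three bullets.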

Now from the classical point of view, Equation (1) is a surprise – indeed, it seems bizarre! For how can mere motion through space directly and precisely affect the rate of physical processes occurring in a system? We are used to the opposite idea, that motion through space, by itself, has no intrinsic effect on processes. This is at the heart of the classical Galilean invariance or symmetry. But STR breaks this rule.

We can compare this situation with classical physics, where (for linear trajectories) we have two independent equations:

(2.a) Δτ = Δt

(2.b) Δr = vΔt for some v ∈ ℝ (real numbers)

  • Equation (2.a) just means that the rate of proper time in a system is invariant – and we measure it in the same units as coordinate time, t.
  • Equation (2.b) just means that every particle or system has some finite velocity or speed, v, through space, with v defined by: v = Δr/Δt.

There is no connection here between proper time and spatial motion of the system.

The fact that (2) is replaced by (1) in STR is very peculiar indeed. It means that the rate of internal process in a system like a clock (whether it is a mechanical, chemical, or radioactive clock) is automatically connected to the motion of the clock in space. If we speed up a clock in motion through space, the rate of internal process slows down in a precise way to compensate for the motion through space.

The great mystery is that there is no apparent mechanism for this effect, called time dilation. In classical physics, to slow down a clock, we have to apply some force like friction to its internal mechanism. In STR, the physical process of a system is slowed down just by moving it around. This applies equally to all physical processes. For instance, a radioactive isotope decays more slowly at high speed. And even animals, including human beings, should age more slowly if they move around at high speed, giving rise to the Twin Paradox.
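
The effect on radioactive decay can be made concrete with a small calculation. This is a hedged sketch, not a description of any particular experiment: it assumes simple exponential decay governed by the sample's own proper time, uses the relation Δτ = Δt√(1 − v²/c²) that follows from the metric equation when Δr = vΔt, and works in units with c = 1. The function name and the numbers are hypothetical.

```python
import math

def remaining_fraction(coordinate_time, half_life, v, c=1.0):
    """Fraction of a radioactive sample left after a given coordinate time,
    for a sample moving at constant speed v (assumes |v| < c)."""
    # Proper time experienced by the sample is less than coordinate time.
    proper = coordinate_time * math.sqrt(1.0 - (v / c) ** 2)
    return 0.5 ** (proper / half_life)

# Hypothetical numbers: half-life of 5 time units, observed for 10 units.
at_rest = remaining_fraction(10.0, 5.0, v=0.0)
moving = remaining_fraction(10.0, 5.0, v=0.6)

print(at_rest, moving)
```

The stationary sample passes through two half-lives, while the sample moving at 0.6c experiences only eight units of proper time, so more of it remains undecayed.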

In fact, time dilation was already recognized by Lorentz and Poincaré, who developed most of the essential mathematical relationships of STR before Einstein. But Einstein formulated a more comprehensive theory, and, with important contributions by Minkowski, he provided an explanation for the effects. The Einstein-Minkowski explanation appeals to the new concept of a space-time manifold, and interprets Equation (1) as a kind of ‘geometric’ feature of space-time. This view has been widely embraced in 20th century physics. By contrast, Lorentz refused to believe in the ‘geometric’ explanation, and he thought that motion through space has some kind of ‘mechanical’ effect on particles, which causes processes to slow down. While Lorentz’s view is dismissed by most physicists, some writers have persisted with similar ideas, and the issues involved in the explanation of Equation (1) continue to be of deep interest, to philosophers at least.

But before moving on to the explanation, we need to discuss the concepts of coordinate systems for space and time, which we have been assuming so far without explanation.

3. Coordinate Systems

In physics we generally assume that space is a three dimensional manifold and time is a one dimensional continuum. A coordinate system is a way of representing space and time using numbers to represent points. We assign a set of three numbers, (x,y,z), to characterize points in space, and one number, t, to characterize a point in time. Combining these, we have general space-time coordinates: (x,y,z,t). The idea is that every physical event in the universe has a ‘space-time location’, and a coordinate system provides a numerical description of the system of these possible ‘locations’.

Classical coordinate systems were used by Descartes, Galileo, Newton, Leibniz, and other classical physicists to describe space. Classical space is assumed to be a three dimensional Euclidean manifold. Classical physicists added time coordinates, t, as an additional parameter to characterize events. The principles behind coordinate systems seemed very intuitive and natural up until the beginning of the 20th century, but things changed dramatically with the STR. One of Einstein’s first great achievements was to reexamine the concept of a coordinate system, and to propose a new system suited to STR, which differs from the system for classical physics. In doing this, Einstein recognized that the notion of a coordinate system is theory dependent. The classical system depends on adopting certain physical assumptions of classical physics – for instance, that clocks do not alter their rates when they are moved about in space. In STR, some of the laws underpinning these classical assumptions change, and this changes our very assumptions about how we can measure space and time. To formulate STR successfully, Einstein could not simply propose a new set of physical laws within the existing classical framework of ideas about space and time: he had to simultaneously reformulate the representation of space and time. He did this primarily by reformulating the rules for assigning coordinate systems for space and time. He gave a new system of rules suited to the new physical principles of STR, and reexamined the validity of the old rules of classical physics within this new system.

A key feature Einstein focused on is that a coordinate system involves a system of operational principles, which connect the features of space and time with physical processes or ‘operations’ that we can use to measure those features. For instance, the theory of classical space assumes that there is an intrinsic distance (or length) between points of space. We may take distance itself to be an underlying feature of ‘empty space’. Geometric lines can be defined as collections of points in space, and line segments have intrinsic lengths, prior to any physical objects being placed in space. But of course, we only measure (or perceive) the underlying structure of space by using physical objects or physical processes to make measurements. Typically, we use ‘straight rigid rulers’ to measure distances between points of space; or we use ‘uniform, standard clocks’ to measure the time intervals between moments of time. Rulers and clocks are particular physical objects or processes, and for them to perform their measurement functions adequately, they must have appropriate physical properties.

But those physical properties are the subject of the theories of physics themselves. Classical physics, for example, assumes that ordinary rigid rulers maintain the same length (or distance between the end-points) when they are moved around in space. It also assumes that there are certain types of systems (providing ‘idealized clocks’) that produce cyclic physical processes, and maintain the same temporal intervals between cycles through time, even if we move these systems around in space.

These assumptions are internally consistent with principles of measurement in classical physics. But they are contradicted in STR, and Einstein had to reformulate the operational principles for measuring space and time, in a way that is internally consistent with the new physical principles of STR.

We will briefly describe these new operational principles shortly, but there are some features of coordinate systems that are important to appreciate first.

a. Coordinates as a Mathematical Language for Time and Space

The assignment of a numerical coordinate system for time or space is thought of as providing a mathematical language (using numbers as names) for representing physical things (time and space). In a sense, this language could be ‘arbitrarily chosen’: there are no laws about what names can be used to represent things. But naturally there are features that we want a coordinate system to reflect. In particular, we want the assignment of numbers to directly reflect the concepts of distance between points of space, and the size of intervals between moments of time.

We perform mathematical operations on numbers, and we can subtract two numbers to find the ‘numerical distance’ between them. For numbers are really defined as certain structures, with features such as continuity, and we want to use the structures of number systems to represent structural features of space and time.

For instance, we assume in our fundamental physical theory that any two intervals of time have intrinsic magnitudes, which can be compared to each other. The ‘intrinsic temporal distance’ between two moments, t1 and t2, may be the same as that between two quite different moments, t3 and t4. We naturally want to assign numbers to times so that ordinary numerical subtraction corresponds to the ‘intrinsic temporal distance’ between events. We choose a ‘uniform’ coordinate system for time to achieve this.

Figure 2. A coordinate system for time gives a mathematical language for a physical thing.
Numbers are used as names for moments of time.

4. Cartesian Coordinates for Space

Time is simple because it is one-dimensional. Three-dimensional space is much more complex. Because space is three dimensional, we need three separate real numbers to represent a single point. Physicists normally choose a Cartesian coordinate system to represent space. We represent points in this system as: r = (x,y,z), where x, y, and z are separate numerical coordinates, in three orthogonal (perpendicular) directions.

The numerical structure with real-number points (x,y,z) is denoted in mathematics as ℝ³. Three dimensional space itself (a physical thing) is denoted as: E3. A Cartesian coordinate system is a special kind of mapping between points of these two structures. It makes the intrinsic spatial distance between two points in E3 be directly reflected by the ‘numerical distance’ between their numerical coordinates in ℝ³.

The numerical distances in ℝ³ are determined by a numerical function for length. A line from the origin: (0,0,0), to the point r = (x,y,z), which is called the vector r, has its length given by the Pythagorean formula:

|r| = √(x²+y²+z²).

More generally, for any two points, r1 = (x1, y1, z1), and: r2 = (x2, y2, z2), the distance function is:

|r2 – r1| = √((x2 – x1)²+ (y2 – y1)²+ (z2 – z1)²)
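
The distance function can be written directly as a short routine. This is a minimal sketch of the Pythagorean formula above; the function name `distance` and the sample points are illustrative assumptions.

```python
import math

def distance(r1, r2):
    """Cartesian distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(r1, r2)))

origin = (0.0, 0.0, 0.0)
length_r = distance(origin, (3.0, 4.0, 12.0))            # length of the vector r
separation = distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0))  # distance between two points

print(length_r, separation)
```

Taking the first point to be the origin recovers the formula for |r| as a special case of the general formula for |r2 – r1|.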

The special feature of this system is that the lengths of lines in the x, y, or z directions alone are given directly by the values of the coordinates. For example, if: r = (x,0,0), then the vector to r is a line purely in the x-direction, and its length is simply: |r| = x. If r1 = (x1,0,0), and: r2 = (x2,0,0), then the distance between them is just: |r2 – r1| = (x2 – x1). As well, a Cartesian coordinate system treats the three directions, x, y, and z, in a symmetric way: the angle between any pair of these directions is the same, 90°. For this reason, a Cartesian system can be rotated, and the same form of the general distance function is maintained in the rotated system.
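
The claim about rotation can be checked numerically for a particular case. The sketch below rotates two points about the z-axis by an arbitrary angle and compares the Cartesian distance before and after; the choice of axis, angle, and points is an illustrative assumption, and the helper names are hypothetical.

```python
import math

def distance(r1, r2):
    """Cartesian distance between two (x, y, z) points."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(r1, r2)))

def rotate_z(point, angle):
    """Coordinates of a point after rotating by `angle` radians about the z-axis."""
    x, y, z = point
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (cos_a * x - sin_a * y, sin_a * x + cos_a * y, z)

p, q = (1.0, 2.0, 3.0), (-2.0, 0.5, 7.0)
angle = 0.7  # arbitrary rotation angle in radians

before = distance(p, q)
after = distance(rotate_z(p, angle), rotate_z(q, angle))

print(before, after)
```

The individual coordinates of each point change under the rotation, but the same distance formula applied to the new coordinates yields the same value up to floating-point rounding.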

In fact, there are spatial manifolds which do not have any possible Cartesian coordinate system – e.g. the surface of a sphere, regarded as a two dimensional manifold, cannot be represented by using Cartesian coordinates. Such spaces were first studied as geometric systems in the 19th century, and are called non-classical or non-Euclidean geometries. However, classical space is Euclidean, and by definition:

  • Euclidean space can be represented by Cartesian coordinate systems.

We can define alternative, non-Cartesian, coordinate systems for Euclidean space; for instance, cylindrical and spherical coordinate systems are very useful in physics, and they use mixtures of linear or radial distance, and angles, as the numbers to specify points of space. The numerical formulas for distance in these coordinate systems appear quite different from the Cartesian formula. But they are defined to give the same results for the distances between physical points. This is the most crucial feature of the concept of distance in classical physics:

  • Distance between points in classical space (or between two events that occur at the same moment of time) is a physical invariant. It does not change with the choice of coordinate system.

The form of the numerical equation for distance changes with the choice of coordinate system; but this is done deliberately to preserve the physical concept of distance.
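As an illustration of this point, the sketch below compares the Cartesian distance formula with the corresponding formula in spherical coordinates (ρ, θ, φ), where θ is taken as the polar angle and φ as the azimuthal angle. That convention, the helper names, and the sample points are assumptions made for the example.

```python
import math

def cartesian_distance(r1, r2):
    """Distance from the Cartesian formula."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(r1, r2)))

def to_spherical(x, y, z):
    """Convert (x, y, z) to (rho, theta, phi); assumes the point is not the origin."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / rho)   # polar angle measured from the z-axis
    phi = math.atan2(y, x)       # azimuthal angle in the x-y plane
    return rho, theta, phi

def spherical_distance(p1, p2):
    """Distance computed directly from spherical coordinates (law of cosines)."""
    rho1, th1, ph1 = p1
    rho2, th2, ph2 = p2
    # Cosine of the angle between the two position vectors.
    cos_angle = (math.sin(th1) * math.sin(th2) * math.cos(ph1 - ph2)
                 + math.cos(th1) * math.cos(th2))
    return math.sqrt(rho1 ** 2 + rho2 ** 2 - 2 * rho1 * rho2 * cos_angle)

r1 = (1.0, 2.0, 3.0)
r2 = (4.0, 6.0, 3.0)

d_cartesian = cartesian_distance(r1, r2)
d_spherical = spherical_distance(to_spherical(*r1), to_spherical(*r2))
print(d_cartesian, d_spherical)
```

The two formulas look quite different, yet they return the same number for the same pair of physical points, apart from rounding error.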

5. Choice of Inertial Reference Frame

A second crucial concept is the idea of a reference frame. A reference frame specifies all the trajectories that are regarded as stationary, or at rest in space. This defines the property of remaining at the same place through time. But the key feature of both classical mechanics and STR is that no unique reference frame is determined. Any object that is not accelerating can be regarded as stationary ‘in its own inertial frame’. It defines a valid reference frame for the whole universe. This is the natural reference frame ‘from the point of view’ of the object, or ‘relative to the object’. But there are many possible choices, because given any particular reference frame, any other frame defined to give everything a constant velocity relative to the first frame is also a valid choice.

The class of possible (physically valid) reference frames is objectively determined, because acceleration is absolutely distinguished from constant motion. Any object that is not accelerating may be regarded as defining a valid reference frame. But the specific choice of a reference frame from the range of possibilities is regarded as arbitrary or conventional. This choice must be made before a coordinate system can be defined to represent distances in space and time. Even after we have chosen a reference frame, there are still innumerable choices of coordinate systems. But the reference frame settles the definition of distances between events, which must be defined as the same in any coordinate system relative to a given reference frame.

The idea of the conventionality of the reference frame is partly evident already in the choice of a Cartesian coordinate system: for it is an arbitrary matter where we choose the origin, or point: 0 = (0,0,0), for such a system. It is also arbitrary which directions we choose for the x, y, and z axes – as long as we make them mutually perpendicular. We are free to rotate a given set of axes, x, y, z, to produce a new set, x’, y’, and z’, and this gives another Cartesian coordinate system. Thus, translations and rotations of Cartesian coordinate systems for space still leave us with Cartesian systems.

But there is a further transformation, which is absolutely central to classical physics, and involves both time and space. This is the Galilean velocity transformation, or velocity boost. The essential point is that we need to apply a spatial coordinate system through time. In pure classical geometry, we do not have to take time into account: we just assign a single coordinate system, at a single moment of time. But in physics we need to apply a coordinate system for space at different moments of time. How do we know whether the coordinate system we apply at one moment of time represents the same coordinate system we use at a later moment of time?

The principles of classical physics mean that we cannot measure ‘absolute location in space’ across time. The reason is the fundamental classical principle that the laws of nature do not distinguish between two inertial frames moving relative to each other at a constant speed. This is the classical Galilean principle of ‘relativity of motion’. Roughly stated, this means that uniform motion through space has no effect on physical processes. And if motion in itself does not affect processes, then we cannot use processes to detect motion.

Newton believed that the classical conception of space requires there to be absolute spatial locations through time nonetheless, and that some special coordinate systems or physical objects will indeed be at ‘absolute rest’ in space. But in the context of classical physics, it is impossible to measure whether any object is at absolute rest, or is in uniform motion in space. Because of this, Leibniz denied that classical physics requires any concept of absolute position in space, and argued that only the notion of ‘relative’ or ‘relational’ space is required. In this view, only the relative positions of objects with regard to each other are considered real. For Newton, the impossibility of measuring absolute space does not prevent it from being a viable concept, and even a logically necessary concept. There is still no general agreement about this debate between ‘absolute’ and ‘relative’ or ‘relational’ conceptions of space. It is one of the great historical debates in the philosophy of both classical and relativistic physics. However, it is generally accepted that classical physics makes absolute space undetectable. This means, at least, that in the context of classical physics there is no way of giving an operational procedure for determining absolute position (or absolute rest) through time.

However, absolute acceleration is detectable. Accelerations are always accompanied by forces. This means that we can certainly specify the class of coordinate systems which are in uniform motion, or which do not accelerate. These special systems are called inertial systems, or inertial frames, or Galilean frames. The existence of inertial frames is a fundamental assumption of classical physics. It is also fundamental in STR, and the notion of an inertial frame is very similar in both theories.

The laws of classical physics are therefore specified for inertial coordinate systems. They are equally valid in any inertial frame. The same holds for the laws of STR. However, the laws for transforming from one inertial frame to another are different for the two theories. To see how this works, we now consider the operational specification of coordinate systems.

6. Operational Specification of Coordinate Systems for Classical Space and Time

In classical physics, we can define an ‘operational’ measuring system, which allows us to assign coordinates to events in space and time.

Classical Time. We imagine measuring time by making a number of uniform clocks, synchronizing them at some initial moment, checking that they all run at exactly the same rates (proper time rates), and then moving clocks to different points of space, where we keep them ‘stationary’ in a chosen inertial frame. We subsequently measure the times of events that occur at the various places, as recorded by the different clocks at those places.

Of course, we cannot assume that our system of clocks is truly stationary. The entire system of clocks placed in uniform motion would also define a valid inertial frame. But the laws of classical physics mean that clocks in uniform inertial motion run at exactly the same rates, and so the times recorded for specific events turn out to be exactly the same, on the assumptions of the classical theory, for any such system of clocks.

Classical Space. We imagine measuring space by constructing a set of rigid measuring rods or rulers of the same length, which we can (imaginatively at least) set up as a grid across space, in an inertial frame. We keep all the rulers stationary relative to each other, and we use them to measure the distances between various events. Again, the main complication is that we cannot determine any absolutely stationary frame for the grid of rulers, and we can set up an alternative system of rulers which is in relative motion. This results in assigning different ‘absolute velocities’ to objects, as measured in two different frames. However, on the assumptions of the classical theory, the relative distance between any two objects or events, taken at any given moment of time, is measured to be the same in any inertial frame. This is because, in classical physics, uniform motion in itself does not alter the lengths of material objects, or the forces between systems of objects. (Accelerations do alter lengths).

7. Operational Specification of Coordinate Systems for STR Space and Time

In STR, the situation is in many ways very similar to classical physics: there is still a special concept of inertial frames, acceleration is absolutely detectable, and uniform velocity is undetectable. According to STR, the laws of physics still are invariant with regard to uniform motion in space, very much like the classical laws.

We also specify operational definitions of inertial coordinate systems in STR in a similar way to classical physics. However, the system sketched above for assigning classical coordinates fails, because it is inconsistent with the physical principles of STR. Einstein was forced to reconstruct the classical system of measurement to obtain a system which is internally consistent with STR.

STR Time. In STR, we can still make uniform clocks, which run at the same rates when they are held stationary relative to each other. But now there is a problem synchronizing them at different points of space. We can start them off synchronized at a particular common point; but moving them to different points of space already upsets their synchronization, according to Equation (1).

However, while synchronizing distant clocks is a problem, they nonetheless run at the same intrinsic rates as each other when held in the same inertial frame. And we can ensure two clocks are in a common inertial frame as long as we can ensure that they maintain the same distance from each other. We see how to do this next.

Given we have two clocks maintained at the same distance from each other, Einstein showed that there is indeed a simple operational procedure to establish synchronization. We send a light signal from Clock 1 to Clock 2, and reflect it back to Clock 1. We record the time it was sent on Clock 1 as t0, and the time it was received again as a later time, t2. We also record the time it was received at Clock 2 as t1’ on Clock 2. Now symmetry of the situation requires that, in the inertial frame of Clock 1, we must assume that the light signal reached Clock 2 at a moment halfway between t0 and t2, i.e. at the time: t1 = ½(t2 + t0). This is because, by symmetry, the light signal must take equal time traveling in either direction between the clocks, given that they are kept at a constant distance throughout the process, and they do not accelerate. (If the light signal took longer to travel one way than the other, then light would have to move at different speeds in different directions, which contradicts STR).

Hence, we must resynchronize Clock 2 to make: t1’ = t1. We simply set the hands on Clock 2 forwards by: (t1 – t1’), i.e. by: ½(t2 + t0) – t1’. (Hence, the coordinate time on Clock 2 at t1’ is changed to: t1’ + (½(t2 + t0) – t1’) = ½(t2 + t0) = t1.)

This is sometimes called the ‘clock synchronization convention’, and some philosophers have argued about whether it is justified. But there is no real dispute that this successfully defines the only system for assigning simultaneity in time, in the chosen reference frame, which is consistent with STR.

Some deeper issues arise over the notion of simultaneity that it seems to involve. From the point of view of Clock 1, the moment recorded at: t1 = ½(t2 + t0) must be judged as ‘simultaneous’ with the moment recorded at t1’ on Clock 2. But in a different inertial frame, the natural coordinate system will alter the apparent simultaneity of these two events, so that simultaneity itself is not ‘objective’ in STR, except relative to a choice of inertial frame. We will consider this later.
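The synchronization procedure described above can be sketched as a small calculation. This is only an illustration: the function name `einstein_sync_offset` and the sample clock readings are hypothetical. The reflection time is taken to be the midpoint of the round trip, t0 + ½(t2 – t0) = ½(t2 + t0), which reduces to ½t2 when Clock 1 is started at t0 = 0.

```python
def einstein_sync_offset(t0, t2, t1_prime):
    """Return (t1, offset) for the light-signal synchronization procedure.

    t0       -- time on Clock 1 when the signal is sent
    t2       -- time on Clock 1 when the reflected signal returns
    t1_prime -- reading on Clock 2 when the signal arrives there
    """
    t1 = 0.5 * (t0 + t2)      # midpoint of the round trip, in Clock 1's frame
    offset = t1 - t1_prime    # amount by which Clock 2 must be set forwards
    return t1, offset

# Hypothetical readings: sent at 10.0, returned at 16.0, Clock 2 read 12.5.
t1, offset = einstein_sync_offset(t0=10.0, t2=16.0, t1_prime=12.5)
corrected_reading = 12.5 + offset

print(t1, offset, corrected_reading)
```

After the correction, the reading on Clock 2 at the reflection event agrees with the time t1 assigned to that event in the frame of Clock 1.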

STR Space. In STR, we can measure space in a very similar way as in classical physics. We imagine constructing a set of rigid measuring rods or rulers, which are checked to be the same length in the inertial frame of Clock 1, and we extend this out into a grid across space. We have to move the rulers around to start with, but when we have set up the grid, we keep them all stationary in the chosen inertial frame of Clock 1.

We then use this grid of stationary measuring rods to measure the distances between various events. The main assumption is that identical types of measuring rods (which are the same lengths when we originally compare them at rest with Clock 1), maintain the same lengths after being moved to different places (and being made stationary again with regard to Clock 1). This feature is required by STR.

The main complication, once again, is that we cannot determine any absolutely stationary frame for the grid of rulers. We can set up an alternative system of rulers, which are all in relative motion in a different inertial frame. As in classical physics, this results in assigning different ‘absolute velocities’ to most trajectories in the two different frames. But in this case there is a deeper difference: on the assumptions of STR, the lengths of measuring rods alter according to their velocities. This is called space dilation (it is more commonly known as length contraction), and it is the counterpart of time dilation.

Nonetheless, Einstein showed that perfectly sensible operational definitions of coordinate measurements for length, as well as time, are available in STR. But both simultaneity and length become relative to specified inertial frames.

It is this confusing conceptual problem, which involves the theory dependence of measurement, that Einstein first managed to unravel, as the prelude to showing how to radically reconstruct classical physics.

8. Operationalism

Unraveling this problem requires us to specify ‘operational principles’ of measurement, but this does not require us to embrace an operational theory of meaning. The latter is a form of positivism, and it holds that the meaning of ‘time’ or ‘space’ in physics is determined entirely by specifying the procedures for measuring time or space. This theory is generally rejected by philosophers and logicians, and it was rejected by Einstein himself in his mature work. According to operationalism, STR changes the meanings of the concepts of space and time from the classical conception. However, many philosophers would argue that ‘time’ and ‘space’ have a meaning for us which is essentially the same as for Galileo and Newton, because we identify the same kinds of things as time and space; but relativity theory has altered our scientific beliefs about these things – just as the discovery that water is H2O has altered our understanding of the nature of water, without necessarily altering the meaning of the term ‘water’. This semantic dispute is ongoing in the philosophy of science. Having clarified these basic ideas of coordinate systems and inertial frames, we now turn back to the notion of transformations between coordinate systems for different inertial frames.

9. Coordinate Transformations and Object Transformations

Physics uses two different concepts of transformations. It is important to distinguish these carefully.

  • Coordinate transformations: Taking the description of a given process (such as a trajectory), described in one coordinate system, and transforming to its description in an alternative coordinate system.
  • Object transformations: Taking a given process, described in a given coordinate system, and transforming it into a different process, described in the same coordinate system as the original process.

The difference is illustrated in the following diagram for the simplest kind of transformation, translation of space.

Figure 3. Object, Coordinate, and Combined Transformations.

  • The transformations in Figure 3 are simple space translations.
  • Figure 3 (B) shows an object transformation. The original trajectory (A) is moved in space to the right, by 4 units. The new coordinates are related to the original coordinates by: x_new particle = x_original particle + 4.
  • Figure 3 (C) shows a coordinate transformation: the coordinate system is moved to the left by 4 units. The new coordinate system, x’, is related to the original system, x, by: x’_original particle = x_original particle + 4. The result ‘looks’ the same as (B).
  • Figure 3 (D) shows a combination of the object transformation (B) and a coordinate transformation, which is the inverse of that in (C), defined by: x’’_original particle = x_original particle – 4. The result of this looks the same as the original trajectory in (A), because the coordinate transformation appears to ‘undo’ the effect of the object transformation.
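The relation between the three cases can be made concrete with a short sketch. The trajectory values and the helper names `object_translate` and `coordinate_shift` are hypothetical; only the shift of 4 units is taken from the description of Figure 3.

```python
SHIFT = 4.0

# (A) A sample trajectory as (t, x) pairs: a particle drifting slowly to the right.
trajectory_a = [(t, 1.0 + 0.5 * t) for t in range(5)]

def object_translate(trajectory, d):
    """Object transformation: move the process itself d units to the right."""
    return [(t, x + d) for t, x in trajectory]

def coordinate_shift(trajectory, origin):
    """Coordinate transformation: re-describe the same process in a system
    whose origin sits at `origin` in the old coordinates, so x' = x - origin."""
    return [(t, x - origin) for t, x in trajectory]

# (B) Move the object 4 units to the right.
trajectory_b = object_translate(trajectory_a, SHIFT)

# (C) Move the coordinate system 4 units to the left: x' = x + 4.
trajectory_c = coordinate_shift(trajectory_a, -SHIFT)

# (D) Move the object as in (B), then apply the inverse of (C): x'' = x - 4.
trajectory_d = coordinate_shift(trajectory_b, SHIFT)

print(trajectory_b == trajectory_c)  # (B) and (C) give the same description
print(trajectory_d == trajectory_a)  # (D) reproduces the description in (A)
```

The numbers produced in (B) and (C) coincide even though one case moves the process and the other only re-labels it, which is the ambiguity the distinction is meant to expose.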

10. Valid Transformations

There is an intimate connection between these two kinds of transformations. This connection provides the major conceptual apparatus of modern physics, through the concept of physical symmetries, or invariance principles, and valid transformations.

The deepest features of laws or theories of physics are reflected in their symmetry properties, which are also called invariances under symmetry transformations. Laws or theories can be understood as describing classes of physical processes. Physical processes that conform to a theory are valid physical processes of that theory. Of course, not all (logically) possible processes that we can imagine are valid physical processes of a given theory. Otherwise the theory would encompass all possible processes, and tell us nothing about what is physically possible, as opposed to what is logically conceivable.

Symmetries of a theory are described by transformations that preserve valid processes of the theory. For instance, time translation is a symmetry of almost all theories. This means that if we take a valid process, and transform it, intact, to an earlier or later time, we still have a valid process. This is equivalent to simply setting the ‘temporal origin’ of the process to a later or earlier time.

Other common symmetries are:

  • Rotations in space (if we take a valid process, and rotate it to another direction in space, we end up with another valid process).
  • Translations in space (if we take a valid process, and move it to another position in space, we end up with another valid process).
  • Velocity transformations (if we take a valid process, and give it uniform velocity boost in some direction in space, we end up with another valid process).

These symmetries are valid both in classical physics and in STR. In classical physics, they are called Galilean symmetries or transformations. In STR they are called Lorentz transformations. However, although the symmetries are very similar in both theories, the Lorentz transformations in STR involve features that are not evident in the classical theory. In fact, this difference only emerges for velocity boosts. Translations and rotations are identical in both theories. This is essentially because velocity boosts in STR involve transformations of the connection between proper time and ordinary space and time, which does not appear in classical theory.

The concept of valid coordinate transformations follows directly from that of valid object transformations. The point is that when we make an object transformation, we begin with a description of a process in a coordinate system, and end up with another description, of a different process, given in the same coordinate system. Now instead of transforming the processes involved, we can do the inverse, and make a transformation of the coordinate system, so that we end up with a new coordinate description of the original process, which looks exactly the same as the description of the transformed process in the original coordinate system.

This gives an alternative way of regarding the process, and its transformed image: instead of taking them as two different processes, we can take them as two different coordinate descriptions of the same process.

This is connected to the idea that certain aspects of the coordinate system are arbitrary or conventional. For instance, the choice of a particular origin for time or space is regarded as conventional: we can move the origins in our coordinate description, and we still have a valid system. This is only possible because the corresponding object transformations (time and space translations) are valid physical transformations.

Physicists tend to regard coordinate transformations and valid object transformations interchangeably and somewhat ambiguously, and the distinction between the two is often blurred in applied physics. While this doesn’t cause practical problems, it is important when learning the concepts of the theory to distinguish the two kinds of transformations clearly.

11. Velocity Boosts in STR and Classical Mechanics

STR and classical mechanics have exactly the same symmetries under translations of time and space, and rotations of space. They also both have symmetries under velocity boosts: both theories hold that, if we take a valid physical process, and give it a uniform additional velocity in some direction, we end with another valid physical process. But the transformations of space and time coordinates, and of proper time, are different for the two theories under a velocity boost. In classical physics, the boost is called a Galilean transformation, while for STR it is called a Lorentz transformation.

To see how the difference appears, we can take a stationary trajectory, and consider what happens when we apply a velocity boost in either theory.

Figure 4. Classical and STR Velocity Boosts give different results.

In both diagrams, the green line is the original trajectory of a stationary particle, and it looks exactly the same in STR and classical mechanics. Proper time events (marked in blue) are equally spaced with the coordinate time intervals in both cases.

If we transform the classical trajectory by giving the particle a velocity (in this example, v = c/2) towards the right, the result (red line) is very simple: the proper time events remain equally spaced with coordinate time intervals. The same sequence of proper time events takes the same amount of coordinate time to complete. The classical particle moves a distance: Δx = vΔt to the right, where Δt is the coordinate time duration of the original process.

But when we transform the STR particle, a strange thing happens: the proper time events become more widely spaced than the coordinate time intervals, and the same sequence of proper time events takes more coordinate time to complete. The STR particle moves a distance: Δx’ = vΔt’ to the right, where: Δt’ > Δt, and hence: Δx’ > Δx.

The transformations of the coordinates of the (proper time) points of the original processes are shown in the following table.

Table 1. Example of Velocity Transformation.

We can work out the general formula for the STR transformations of t’ and x’ in this example by using Equation (1). This requires finding a formula for the transformation of time-space coordinates:

(t, 0) → (t’, x’)

We obtain this by applying Equation (1) in the (t’,x’) coordinate system, giving:

(1’) Δτ² = Δt’² – Δx’²/c²

It is crucial that this equation retains the same form under the Lorentz transformation. In this special case, we have the additional facts that:

(i) Δτ = Δt, and: (ii) Δx’ = vΔt’

We substitute (i) and (ii) in (1’) to get:

Δt² = Δt’² – v²Δt’²/c²

This rearranges to give:

Δt’ = Δt/√(1 – v²/c²) and: Δx’ = vΔt/√(1 – v²/c²)

We can see that: Δx’/Δt’ = v. This is a special case of a Lorentz transformation for this simplest kind of trajectory. Note that if we think of this as a coordinate transformation which generates the appearance of this object transformation, we need to move the new coordinate system in the opposite direction to the motion of the object. I.e. if we define a new coordinate system, (x’,t’), moving at –v (i.e. to the left) with regard to the original (x,t) system, then the original trajectory (which appeared stationary in (x,t)) will appear to be moving with velocity +v (to the right) in (x’,t’). In general, object transformations correspond to the inverse coordinate transformations.
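The difference between the two boosts for a stationary trajectory can be tabulated with a few lines of code. This is a sketch under stated assumptions: Equation (1) is taken to be the usual proper-time relation Δτ² = Δt² – Δx²/c², units are chosen so that c = 1, and the function names are hypothetical.

```python
import math

C = 1.0  # speed of light in the chosen units (an assumption of this sketch)

def classical_boost_stationary(dt, v):
    """Classical boost of a stationary trajectory: coordinate time is unchanged."""
    return dt, v * dt

def str_boost_stationary(dt, v, c=C):
    """STR boost of a stationary trajectory lasting proper time dt.

    Coordinate time is stretched to dt / sqrt(1 - v²/c²), and the particle
    covers v times that stretched coordinate time.
    """
    dt_new = dt / math.sqrt(1.0 - v ** 2 / c ** 2)
    return dt_new, v * dt_new

v = 0.5 * C  # the example velocity v = c/2
for proper_time in (1.0, 2.0, 3.0):
    dt_cl, dx_cl = classical_boost_stationary(proper_time, v)
    dt_str, dx_str = str_boost_stationary(proper_time, v)
    print(proper_time, (dt_cl, dx_cl), (round(dt_str, 4), round(dx_str, 4)))

# Check that the boosted STR trajectory still satisfies the proper-time relation.
dt_str, dx_str = str_boost_stationary(1.0, v)
proper_time_squared = dt_str ** 2 - dx_str ** 2 / C ** 2
```

In both theories the ratio Δx’/Δt’ equals v, but only the STR boost stretches the coordinate time needed for one unit of proper time.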

12. Lorentz Transformations for Velocity Boost V in the x-direction

The previous transformation is only for points on the special line where: x = 0. More generally, we want to work out the formulae for transforming points anywhere in the coordinate system:

(t, x) → (t’, x’)

The classical formulas are Galilean transformations, and they are very simple.

Galilean Velocity Boost:

(t, x) → (t, x+vt)

t’ = t

x’ = x+vt

The STR formulas are more general Lorentz transformations. The Galilean transformation is simple because time coordinates are unchanged, so that: t = t’. This means that simultaneity in time in classical physics is absolute: it does not depend upon the choice of coordinate system. We also have that distance between two points at a given moment of time is invariant, because if: x2 – x1 = Δx, then: x’2 – x’1 = (x2+vt) – (x1+vt) = Δx. Ordinary distance in space is the crucial invariant quantity in classical physics.
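A brief numerical check of these two invariants is given below. The function name `galilean_boost` and the sample events are illustrative assumptions.

```python
def galilean_boost(t, x, v):
    """Galilean velocity boost: t' = t, x' = x + v*t."""
    return t, x + v * t

v = 0.5

# Two events at the same moment of time (t = 3), at different places.
event_1 = (3.0, 2.0)
event_2 = (3.0, 7.0)

t1_new, x1_new = galilean_boost(*event_1, v)
t2_new, x2_new = galilean_boost(*event_2, v)

separation_before = event_2[1] - event_1[1]
separation_after = x2_new - x1_new

print(t1_new == t2_new)                    # the events remain simultaneous
print(separation_before, separation_after) # the distance between them is unchanged
```

Both events are shifted by the same amount vt, so the difference of their positions at a common time is untouched by the boost.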

But in STR, we have a complex interdependence of time and space coordinates. This is seen because the transformation formulas for both t’ and x’ are functions of both x and t. I.e. there are functions f and g such that:

t’ = f(x,t) and: x’ = g(x,t)

These functions represent the Lorentz transformations. To give stationary objects a velocity V in the x-direction, these general functions are found to be:

t’ = (t + Vx/c²)/√(1 – V²/c²) and: x’ = (x + Vt)/√(1 – V²/c²)

The factor 1/√(1 – V²/c²) is called γ, letting us write these equations more simply as:

Lorentz Transformations: t’ = γ(t + Vx/c²) and: x’ = γ(x + Vt)

We can equally consider the corresponding coordinate transformation, which would generate the appearance of this object transformation in a new coordinate system. It is essentially the same as the object transformation – except it must go in the opposite direction. The object transformation, which increases the velocity of stationary particles by the speed V in the x direction, corresponds to moving the coordinate system in the opposite direction. I.e. if we define a new coordinate system, and call it (x’,t’), and place this in motion with a speed –V (i.e. V in the negative-x-direction), relative to the (x,t) coordinate system, then the original stationary trajectories in (x,t)-coordinates will appear to have speed V in the new (x’,t’) coordinates.

Because the Lorentz transformation of processes leaves us with valid STR processes, the Lorentz transformation of an STR coordinate system leaves us with a valid coordinate system. In particular, the form of Equation (1) is preserved by the Lorentz transformation, so that we get: Δτ² = Δt’² – Δx’²/c². This can be checked by substituting the formulas for t’ and x’ back into this equation, and simplifying; the resulting equation turns out to be identical to Equation (1).
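The substitution can also be checked numerically. The sketch below assumes that Equation (1) is the proper-time relation Δτ² = Δt² – Δx²/c², uses units with c = 1, and applies the boost formulas t’ = γ(t + Vx/c²), x’ = γ(x + Vt); the function names and sample events are hypothetical.

```python
import math

def lorentz_boost(t, x, V, c=1.0):
    """Lorentz velocity boost: t' = γ(t + Vx/c²), x' = γ(x + Vt)."""
    gamma = 1.0 / math.sqrt(1.0 - (V / c) ** 2)
    return gamma * (t + V * x / c ** 2), gamma * (x + V * t)

def interval_squared(event_a, event_b, c=1.0):
    """Δt² - Δx²/c² between two events given as (t, x) pairs."""
    dt = event_b[0] - event_a[0]
    dx = event_b[1] - event_a[1]
    return dt ** 2 - dx ** 2 / c ** 2

V = 0.5
event_a = (0.0, 0.0)
event_b = (2.0, 1.0)

boosted_a = lorentz_boost(*event_a, V)
boosted_b = lorentz_boost(*event_b, V)

before = interval_squared(event_a, event_b)
after = interval_squared(boosted_a, boosted_b)

# Boosting by -V undoes the boost by +V.
recovered_b = lorentz_boost(*boosted_b, -V)

print(before, after)
print(recovered_b)
```

The individual values of Δt and Δx change under the boost, while the combination Δt² – Δx²/c² does not, up to rounding error.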

13. Galilean Transformation of Coordinate System

One useful way to visualize the effect of a transformation is to make an ordinary space-time diagram, with the space and time axes drawn perpendicular to each other as usual, and then to draw the new set of coordinates on this diagram. In these diagrams, the space axes represent points which are measured to have the same time coordinates, and similarly, the time axes represent points which are measured to have the same space coordinates. When we make a velocity boost, these lines of simultaneity and same-position are altered.

This is shown first for a Galilean velocity boost, where in fact the lines of simultaneity remain the same, but the lines representing position are rotated:

Figure 5. Galilean Velocity Boost.

  • In Figure 5, the (green) horizontal lines are lines of absolute simultaneity. They have the same coordinates in both t and t’.
  • The (blue) vertical lines are lines with the same x-coordinates.
  • The (gray) slanted lines are lines with the same x’-coordinates.
  • The spacing of the x’ coordinates is the same as the x coordinates, which means that relative distances between points are not affected.
  • The solid black arrow represents a stationary trajectory in (x,t).
  • An object transformation of +V moves it onto the green arrow, with velocity: v = c/2 in the (x,t)-system.
  • A coordinate transformation of +V, to a system (x’,t’) moving at +V with regard to (x,t), makes this green arrow appear stationary in the (x’,t’) system.
  • This coordinate transformation makes the black arrow appear to be moving at –V in (x’,t’) coordinates.

14. Lorentz Transformation of Coordinate System

In a Lorentz velocity boost, the time and space axes are both rotated, and the spacing is also changed.

Figure 6. Rotation of Space and Time Coordinate Axes by a Lorentz Velocity Boost. Some proper time events are marked in blue.

To obtain the (x’,t’)-coordinates of a point defined in (x,t)-coordinates, we start at that point, and: (i) move parallel to the green lines, to find the intersection with the (red) t’-axis, which is marked with the t’-coordinates; and: (ii) move parallel to the red lines, to find the intersection with the (green) x’-axis, which is marked with the x’-coordinates. The effects of this transformation on a solid rod or ruler extending from x=0 to x=1, and stationary in (x,t), are shown in more detail below.

Figure 7. Lorentz Velocity Boost. Magnified view of Figure 6 shows time and space dilation. The gray rectangle represents a unit of the space-time path of a rod (Rod 1) stationary in (x,t). The dark green lines represent a Lorentz (object) transformation of this trajectory, which is a second rod (Rod 2) moving at V in (x,t) coordinates. This is a unit of the space-time path of a stationary rod in (x’,t’).

15. Time and Space Dilation

Figure 7 shows how both time and space dilation effects work. To see this clearly, we need to consider the volumes of space-time that an object like a rod traces out.

  • The (gray) rectangle PQRS represents a space-time volume, for a stationary rod or ruler in the original frame. It is 1-meter long in original coordinates (Δx = 1), and is shown over 1 unit of proper time, which corresponds to one unit of coordinate time (Δt = 1).
  • The rectangle PQ’R’S’ (green edges) represents a second space-time volume, for a rod which appears to be moving in the original frame. This is how the space-time volume of the first rod transforms under a Lorentz transformation.
  • We may interpret the transformation as either: (i) a Lorentz velocity boost of the rod by velocity +V (object transformation), or equally: (ii) a Lorentz transformation to a new coordinate system, (x’,t’), moving at –V with regard to (x,t). Note that:
  • The length of the moving rod measured in x is now shorter than the stationary rod: Δx = 1/γ. This is space dilation.
  • The coordinate time between proper time events on the moving rod measured in t is now longer than for the stationary rod (Δt = γ). This is time dilation.

The need to fix the new coordinate system in this way can be worked out by considering the moving rod from the point of view of its own inertial system.

  • As viewed in its own inertial coordinate system, the green rectangle PQ’R’S’ appears as the space-time boundary for a stationary rod. In this frame:
  • PS’ appears stationary: it is a line where: x’ = 0.
  • PQ’ appears as a line of simultaneity, i.e. it is a line where: t’=0.
  • R’S’ is also a line of simultaneity in t’.
  • Points on R’S’ must have the time coordinate: t’=1, since it is at the time t’ when one unit of proper time has elapsed, and for the stationary object, Dt’ = Dt.
  • The length of PQ’ must be one unit in x’, since the moving rod appears the same length in its own inertial frame as the original stationary rod did.
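The two factors quoted above (Dx = 1/γ and Dt = γ) can be checked numerically. The following is a minimal sketch, assuming units in which c = 1 and an illustrative speed V = 0.6; the function names are hypothetical and chosen only for this example.

```python
import math

def lorentz_factor(v, c=1.0):
    """Return gamma = 1 / sqrt(1 - v^2/c^2) for a speed v with |v| < c."""
    if abs(v) >= c:
        raise ValueError("speed must be below c")
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

def moving_rod_in_original_frame(v, rest_length=1.0, proper_time=1.0, c=1.0):
    """Length and elapsed coordinate time of the moving rod, as measured
    in the original (x, t) coordinates."""
    gamma = lorentz_factor(v, c)
    dx = rest_length / gamma   # space dilation: Dx = 1/gamma
    dt = proper_time * gamma   # time dilation:  Dt = gamma
    return dx, dt

# Illustrative value only: V = 0.6 gives gamma = 1.25.
dx, dt = moving_rod_in_original_frame(0.6)
print(dx, dt)  # approximately 0.8 and 1.25
```

At V = 0.6 the unit rod measures about 0.8 units in x, and one unit of its proper time spans about 1.25 units of t.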

Time and space dilation are often referred to as ‘perspective effects’ in discussions of STR. Objects and processes are said to ‘look’ shorter or longer when viewed in one inertial frame rather than in another. It is common to regard this effect as a purely ‘conventional’ feature, which merely reflects a conventional choice of reference frame. But this is rather misleading, because time and space dilation are very real physical effects, and they lead to physical predictions of a completely different type from those of classical physics.

However, the symmetrical properties of the Lorentz transformation make it impossible to use these features to tell whether one frame is ‘really moving’ and another is ‘really stationary’. For instance, if objects get shorter when they are placed in motion, then why do we not simply measure how long objects are, and use this to determine whether they are ‘really stationary’? The details in Figure 7 reveal why this does not work: the space dilation effect is reversed when we change reference frames. That is:

  • Measured in Frame 1, i.e. in (x,t)-coordinates, the stationary object (Rod 1) appears longer than the moving object (Rod 2). But:
  • Measured in Frame 2, using (x’,t’)-coordinates, the moving object (Rod 2) appears stationary, while the originally stationary object (Rod 1) moves. But now the space dilation effect appears reversed, and Rod 2 appears longer than Rod 1!

The reason this is not a real paradox or inconsistency can be seen from the point of view of Frame 2, because now Rod 1 at the moment of time t’ = 0 stretches from the point P to Q’’, rather than from P to Q, as in Frame 1. The line of simultaneity alters in the new frame, so that we measure the distance between a different pair of space-time events. And PQ’’ is now found to be shorter than PQ’, which is the length of Rod 2 in Frame 2.
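The role of the shifted line of simultaneity can be illustrated with a short calculation. This is a sketch under stated assumptions: units with c = 1, an illustrative speed V = 0.6, and the standard textbook form of the Lorentz transformation, x’ = γ(x − Vt), t’ = γ(t − Vx/c²), whose sign conventions may differ from those used in Figure 7. The function `boost` is a hypothetical helper.

```python
import math

def boost(x, t, v, c=1.0):
    """Standard Lorentz transformation of the event (x, t) into a frame
    moving at velocity v relative to the current one."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (x - v * t), gamma * (t - v * x / c**2)

V = 0.6  # illustrative speed of Rod 2 in Frame 1

# Rod 1 rests in Frame 1 with its ends on x = 0 and x = 1.
# The event on the far end that is simultaneous with P = (0, 0) in Frame 2
# (t' = 0) lies at t = V * x in Frame 1. This is the point called Q'' above.
rod1_length_in_frame2, t_prime = boost(1.0, V * 1.0, V)

# Rod 2 rests in Frame 2 with its ends on x' = 0 and x' = 1.
# Going back to Frame 1 uses the inverse boost (velocity -V); the far-end
# event simultaneous with P in Frame 1 (t = 0) lies at t' = -V * x'.
rod2_length_in_frame1, t = boost(1.0, -V * 1.0, -V)

print(rod1_length_in_frame2, rod2_length_in_frame1)  # both about 0.8 = 1/gamma
```

Each frame picks out a different pair of events as "the two ends of the rod at one moment", which is why each frame finds the other frame's rod to be the shorter one.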

There is no answer, within STR, as to which rod ‘really gets shorter’. Similarly there is no answer as to which rod ‘really has faster proper time’ – when we switch to Frame 2, we find that Rod 2 has a faster rate of proper time with regard to coordinate time, reversing the time dilation effect apparent in Frame 1. In this sense, we could consider these effects a matter of ‘perspective’ – although it is more accurate to say that in STR, in its usual interpretation, there are simply no facts about absolute length, or absolute time, or absolute simultaneity, at all.

However, this does not mean that time and space dilation are not real effects. They are displayed in other situations where there is no ambiguity. One example is the twins’ paradox, where proper time slows down in an absolute way for a moving twin. And there are equally real physical effects resulting from space dilation. It is just that these effects cannot be used to determine an absolute frame of rest.

16. The Full Special Theory of Relativity

So far, we have only examined the most basic part of STR: the valid STR transformations for space, time, and proper time, and the way these three quantities are connected together. This is the most fundamental part of the theory. It represents relativistic kinematics. It already has very powerful implications. But the fully developed theory is far more extensive: it results from Einstein’s idea that the Lorentz transformations represent a universal invariance, applicable to all physics. Einstein formulated this in 1905: “The laws of physics are invariant under Lorentz transformations (when going from one inertial system to another arbitrarily chosen inertial system)”. Adopting this general principle, he explored the ramifications for the concepts of mass, energy, momentum, and force.

The most famous result is Einstein’s equation for energy: E = mc². This involves the extension of the Lorentz transformation to mass. Einstein found that when we Lorentz transform a stationary particle with original rest-mass m0, to set it in motion with a velocity V, we cannot regard it as maintaining the same total mass. Instead, its mass becomes larger: m = γm0, with γ defined as above. This is another deep contradiction with classical physics.

Einstein showed that this requires us to reformulate our concept of energy. In classical physics, kinetic energy is given by: E = ½ mv². In STR, there is a more general definition of energy, as: E = mc². A stationary particle then has a basic ‘rest mass energy’ of m0c². When it is set in motion, its energy is increased purely by the increase in mass, and this is kinetic energy. So we find in STR that:

Kinetic Energy = mc² - m0c² = (γ-1)m0c²

For low velocities, with: v << c, it is easily shown that: (γ-1)c² is very close to ½v², so this corresponds to the classical result in the classical limit of low energies. But for high energies, the behavior of particles is very different. The discovery that there is an underlying energy of m0c² simply from rest-mass is what made nuclear reactors and nuclear bombs possible: they convert tiny amounts of rest mass into vast amounts of thermal energy.
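The low-velocity claim can be made explicit with a binomial expansion of γ. The following is a sketch of the standard textbook argument, written with the symbols used above (m0 for rest mass, v for speed); it is offered as a worked illustration rather than as the derivation Einstein gave.

```latex
\begin{align*}
\gamma &= \left(1 - \frac{v^2}{c^2}\right)^{-1/2}
        = 1 + \frac{1}{2}\frac{v^2}{c^2} + \frac{3}{8}\frac{v^4}{c^4} + \cdots \\
(\gamma - 1)\,c^2 &= \frac{1}{2}v^2 + \frac{3}{8}\frac{v^4}{c^2} + \cdots \\
(\gamma - 1)\,m_0 c^2 &= \frac{1}{2}m_0 v^2
        \left(1 + \frac{3}{4}\frac{v^2}{c^2} + \cdots\right)
\end{align*}
```

For v << c the correction term (3/4)(v²/c²) is negligible, which recovers the classical expression ½ m0v².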

The main application Einstein explored first was the theory of electromagnetism, and his most famous paper, in which he defined STR in 1905, is called “Electrodynamics of Moving Bodies”. In fact, Lorentz, Poincaré and others already knew that they needed to apply the Lorentz transformation to Maxwell’s theory of classical electromagnetism, and had succeeded a few years earlier in formulating a theory which is extremely similar to Einstein’s in its predictions. Some important experimental verification of this was also available before Einstein’s work (most famously, the Michelson-Morley experiment). But his theory went much further. He radically reformulated the concepts that we use to analyse force, energy, momentum, and so forth. In this sense, his new theory was primarily a philosophical and conceptual achievement, rather than a new experimental discovery of the kind traditionally regarded as the epitome of empirical science.

He also attributed his universal ‘principle of relativity’ to the very nature of space and time itself. With important contributions by Minkowski, this gave rise to the modern view that physics is based on an inseparable combination of space and time, called space-time. Minkowski treated this as a kind of ‘geometric’ entity, based on regarding our Equation (1) as a ‘metric equation’ describing the geometric nature of space-time. This view is called the ‘geometric explanation’ of relativity theory, and this approach led Einstein even deeper into modern physics, when he applied this new conception to the theory of gravity, and discovered a generalised theory of space-time.

The nature of this ‘geometric explanation’ of the connection between space, time, and proper time is one of the most fascinating topics in the philosophy of physics. But it involves the General Theory of Relativity, which goes beyond STR.

17. References and Further Reading

The literature on relativity and its philosophical implications is enormous – and still growing rapidly. The following short selection illustrates some of the range of material available. Original publication dates are in brackets.

  • Bondi, Hermann. 1962. Relativity and Common Sense. Heinemann Educational Books.
    • A clear exposition of basic relativity theory for beginners, with a minimum of equations. Contains useful discussions of the Twins Paradox and other topics.
  • Einstein, Albert. 1956 (1921). The Meaning of Relativity. (The Stafford Little Lectures of Princeton University.) Princeton University Press.
    • Einstein’s account of the principles of his famous theory. Simple in parts, but mainly a fairly technical summary, requiring a good knowledge of physics.
  • Epstein, Lewis Carroll. 1983. Relativity Visualized. Insight Press. San Francisco.
    • A clear, simple, and rather unique introduction to relativity theory for beginners. Epstein illustrates the functional relationships between space, time, and proper time in a clear and direct way, using novel geometric presentations.
  • Grünbaum, Adolf. 1963. Philosophical Problems of Space and Time. Knopf, New York.
    • A collection of original studies by one of the seminal philosophers of relativity theory, this covers an impressive range of issues, and remains an important starting place for many recent philosophical studies.
  • Lorentz, H. A., A. Einstein, H. Minkowski and H. Weyl. 1923. The Principle of Relativity. A Collection of Original Memoirs on the Special and General Theory of Relativity. Trans. W. Perrett and G.B. Jeffery. Methuen. London.
    • These are the major figures in the early development of relativity theory, apart from Poincaré, who simultaneously with Lorentz formulated the ‘pre-relativistic’ version of electromagnetic theory, which contains most of the mathematical basis of STR, shortly before Einstein’s paper of 1905. While Einstein deeply admired Lorentz – despite their permanent disagreements about STR – he paid no attention to Poincaré.
  • Newton, Isaac. 1686. Mathematical Principles of Natural Philosophy.
    • Every serious student should read Newton’s “Definitions” and “Scholium”, where he introduces his concepts of time and space.
  • Planck, Max. 1998 (1909). Eight Lectures on Theoretical Physics.
    • Planck elegantly summarizes the revolutionary discoveries that characterized the first decade of 20th Century physics. Lecture 8 is one of the earliest accounts of relativity theory. This classic work shows Planck’s penetrating vision of many fundamental themes that soon came to dominate physics.
  • Reichenbach, Hans. 1958 (1928). The Philosophy of Space and Time. Dover, New York.
    • An influential early study of the concepts of space and time, and the relativistic revolution. Although Reichenbach’s approach is underpinned by his positivistic program, which is rejected today by philosophers, the central issues are of continuing interest.
  • Russell, Bertrand. 1977 (1925). ABC of Relativity. Unwin Paperbacks, London.
    • An early popular exposition of the meaning of relativity theory by one of the most influential 20th century philosophers, this presents key philosophical issues with Russell’s characteristic simplicity.
  • Schilpp, P.A. (Ed.) 1949. Albert Einstein: Philosopher-Scientist. The Library of Living Philosophers.
    • A classic collection of papers on Einstein and relativity theory.
  • Spivak, M. 1979. A Comprehensive Introduction to Differential Geometry. Publish or Perish. Berkeley.
    • An advanced mathematical introduction to the modern approach to differentiable manifolds, which developed in the 1960s. Philosophical interest lies in the detailed semantics for coordinate systems, and the generalizations of concepts of geometry, such as the tangent vector.
  • Tipler, Paul A. 1982. Physics. Worth Publishers Ltd.
    • An extended introductory textbook for undergraduates. Chapter 35, “Relativity Theory”, is a typical modern introduction to relativity theory.
  • Torretti, Roberto. 1983/1996. Relativity and Geometry. Dover, New York.
    • An excellent source for the specialist philosopher, summarizing history and concepts of both the Special and General Theories, with extended bibliography. Combines excellent technical summaries with detailed historical surveys.
  • Wangsness, Roald K. 1979. Electromagnetic Fields. John Wiley & Sons Ltd.
    • This is a typical advanced modern undergraduate textbook on electromagnetism. The final chapter explains how the structure of electrodynamics is derived from the principles of STR.

Back to the main “Time” article.

Author Information

Andrew Holster
Email: ATASA@clear.net.nz
New Zealand

Rudolf Carnap (1891—1970)

Rudolf Carnap, a German-born philosopher and naturalized U.S. citizen, was a leading exponent of logical positivism and was one of the major philosophers of the twentieth century. He made significant contributions to philosophy of science, philosophy of language, the theory of probability, inductive logic and modal logic. He rejected metaphysics as meaningless because metaphysical statements cannot be proved or disproved by experience. He asserted that many philosophical problems are indeed pseudo-problems, the outcome of a misuse of language. Some of them can be resolved when we recognize that they are not expressing matters of fact, but rather concern the choice between different linguistic frameworks. Thus the logical analysis of language becomes the principal instrument in resolving philosophical problems. Since ordinary language is ambiguous, Carnap asserted the necessity of studying philosophical issues in artificial languages, which are governed by the rules of logic and mathematics. In such languages, he dealt with the problems of the meaning of a statement, the different interpretations of probability, the nature of explanation, and the distinctions between analytic and synthetic, a priori and a posteriori, and necessary and contingent statements.

Table of Contents

  1. Life
  2. The Structure of Scientific Theories
  3. Analytic and Synthetic
  4. Meaning and Verifiability
  5. Probability and Inductive Logic
  6. Modal Logic and the Philosophy of Language
  7. Philosophy of Physics
  8. Carnap’s Heritage
  9. References and Further Reading
    1. Carnap’s Works
    2. Other Sources

1. Life

Rudolf Carnap was born on May 18, 1891, in Ronsdorf, Germany. In 1898, after his father’s death, his family moved to Barmen, where Carnap studied at the Gymnasium. From 1910 to 1914 he studied philosophy, physics and mathematics at the universities of Jena and Freiburg. He studied Kant under Bruno Bauch and later recalled how a whole year was devoted to the discussion of The Critique of Pure Reason. Carnap became especially interested in Kant’s theory of space. Carnap took three courses from Gottlob Frege in 1910, 1913 and 1914. Frege was professor of mathematics at Jena. During those courses, Frege expounded his system of logic and its applications in mathematics. However, Carnap’s principal interest at that time was in physics, and by 1913 he was planning to write his dissertation on thermionic emission. His studies were interrupted by World War I and Carnap served at the front until 1917. He then moved to Berlin and studied the theory of relativity. At that time, Albert Einstein was professor of physics at the University of Berlin.

After the war, Carnap developed a new dissertation, this time on an axiomatic system for the physical theory of space and time. He submitted a draft to physicist Max Wien, director of the Institute of Physics at the University of Jena, and to Bruno Bauch. Both found the work interesting, but Wien told Carnap the dissertation was pertinent to philosophy, not to physics, while Bauch said it was relevant to physics. Carnap then chose to write a dissertation under the direction of Bauch on the theory of space from a philosophical point of view. Entitled Der Raum (Space), the work was clearly influenced by Kantian philosophy. Submitted in 1921, it was published the following year in a supplemental issue of Kant-Studien.

Carnap’s involvement with the Vienna Circle developed over the next few years. He met Hans Reichenbach at a conference on philosophy held at Erlangen in 1923. Reichenbach introduced him to Moritz Schlick, then professor of the theory of inductive science at Vienna. Carnap visited Schlick—and the Vienna Circle—in 1925 and the following year moved to Vienna to become assistant professor at the University of Vienna. He became a leading member of the Vienna Circle and, in 1929, with Hans Hahn and Otto Neurath, he wrote the manifesto of the Circle.

In 1928, Carnap published The Logical Structure of the World, in which he developed a formal version of empiricism arguing that all scientific terms are definable by means of a phenomenalistic language. The great merit of the book was the rigor with which Carnap developed his theory. In the same year he published Pseudoproblems in Philosophy asserting the meaninglessness of many philosophical problems. He was closely involved in the First Conference on Epistemology, held in Prague in 1929 and organized by the Vienna Circle and the Berlin Circle (the latter founded by Reichenbach in 1928). The following year, he and Reichenbach founded the journal Erkenntnis. At the same time, Carnap met Alfred Tarski, who was developing his semantical theory of truth. Carnap was also interested in mathematical logic and wrote a manual of logic, entitled Abriss der Logistik (1929).

In 1931, Carnap moved to Prague to become professor of natural philosophy at the German University. It was there that he made his important contribution to logic with The Logical Syntax of Language (1934). His stay in Prague, however, was cut short by the Nazi rise to power. In 1935, with the aid of the American philosophers Charles Morris and Willard Van Orman Quine, whom he had met in Prague the previous year, Carnap moved to the United States. He became an American citizen in 1941.

From 1936 to 1952, Carnap was a professor at the University of Chicago (with the year 1940-41 spent as a visiting professor at Harvard University). He then spent two years at the Institute for Advanced Study at Princeton before taking an appointment at the University of California at Los Angeles.

In the 1940s, stimulated by Tarskian model theory, Carnap became interested in semantics. He wrote several books on semantics: Introduction to Semantics (1942), Formalization of Logic (1943), and Meaning and Necessity: A Study in Semantics and Modal Logic (1947). In Meaning and Necessity, Carnap used semantics to explain modalities. Subsequently he began to work on the structure of scientific theories. His main concerns were (i) to give an account of the distinction between analytic and synthetic statements and (ii) to give a suitable formulation of the verifiability principle; that is, to find a criterion of significance appropriate to scientific language. Other important works were “Meaning Postulates” (1952) and “Observation Language and Theoretical Language” (1958). The latter sets out Carnap’s definitive view on the analytic-synthetic distinction. “The Methodological Character of Theoretical Concepts” (1956) is an attempt to give a tentative definition of a criterion of significance for scientific language. Carnap was also interested in formal logic (Introduction to Symbolic Logic, 1954) and in inductive logic (Logical Foundations of Probability, 1950; The Continuum of Inductive Methods, 1952). The Philosophy of Rudolf Carnap, ed. by Paul Arthur Schilpp, was published in 1963 and includes an intellectual autobiography. Philosophical Foundations of Physics, ed. by Martin Gardner, was published in 1966. Carnap was working on the theory of inductive logic when he died on September 14, 1970, at Santa Monica, California.

2. The Structure of Scientific Theories

In Carnap’s opinion, a scientific theory is an interpreted axiomatic formal system. It consists of:

  • a formal language, including logical and non-logical terms;
  • a set of logical-mathematical axioms and rules of inference;
  • a set of non-logical axioms, expressing the empirical portion of the theory;
  • a set of meaning postulates stating the meaning of non-logical terms, which formalize the analytic truths of the theory;
  • a set of rules of correspondence, which give an empirical interpretation of the theory.

The sets of meaning postulates and rules of correspondence may be included in the set of non-logical axioms. Indeed, meaning postulates and rules of correspondence are not usually explicitly distinguished from non-logical axioms; only one set of axioms is formulated. One of the main purposes of the philosophy of science is to show the difference between the various kinds of statements.

The Language of Scientific Theories

The language of a scientific theory consists of:

  1. a set of symbols and
  2. rules to ensure that a sequence of symbols is a well-formed formula, that is, correct with respect to syntax.

Among the symbols of the language are logical and non-logical terms. The set of logical terms includes logical symbols, e.g., connectives and quantifiers, and mathematical symbols, e.g., numbers, derivatives, and integrals. Non-logical terms are divided into observational and theoretical. They are symbols denoting physical entities, properties or relations such as ‘blue’, ‘cold’, ‘warmer than’, ‘proton’, ‘electromagnetic field’. Formulas are divided into: (i) logical statements, which do not contain non-logical terms; (ii) observational statements, which contain observational terms but no theoretical terms; (iii) purely theoretical statements, which contain theoretical terms but no observational terms; and (iv) rules of correspondence, which contain both observational and theoretical terms.

Classification of statements in a scientific language

Type of statement               Observational terms   Theoretical terms
Logical statements              No                    No
Observational statements        Yes                   No
Purely theoretical statements   No                    Yes
Rules of correspondence         Yes                   Yes

Observational language contains only logical and observational statements; theoretical language contains logical and theoretical statements and rules of correspondence.
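The fourfold classification and the two languages can be restated as a small decision procedure. This is only an illustrative sketch: the function names and the Boolean encoding of "contains observational terms" and "contains theoretical terms" are assumptions made for the example, not part of Carnap's own formalism.

```python
def classify_statement(has_observational_terms, has_theoretical_terms):
    """Return the type of a statement, given which kinds of non-logical
    terms it contains (the two columns of the classification above)."""
    if has_observational_terms and has_theoretical_terms:
        return "rule of correspondence"
    if has_observational_terms:
        return "observational statement"
    if has_theoretical_terms:
        return "purely theoretical statement"
    return "logical statement"

def languages_containing(statement_type):
    """Return the languages (observational, theoretical) that contain
    statements of the given type, following the paragraph above."""
    membership = {
        "logical statement": {"observational", "theoretical"},
        "observational statement": {"observational"},
        "purely theoretical statement": {"theoretical"},
        "rule of correspondence": {"theoretical"},
    }
    return membership[statement_type]

# A statement mentioning both 'warmer than' and 'proton' mixes the two vocabularies.
kind = classify_statement(True, True)
print(kind, languages_containing(kind))
```

Logical statements are the only type that belongs to both languages, since they contain no non-logical terms at all.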

The distinction between observational and theoretical terms is a central tenet of logical positivism and at the core of Carnap’s view on scientific theories. In his book Philosophical Foundations of Physics (1966), Carnap bases the distinction between observational and theoretical terms on the distinction between two kinds of scientific laws, namely empirical laws and theoretical laws.

An empirical law deals with objects or properties that can be observed or measured by means of simple procedures. This kind of law can be directly confirmed by empirical observations. It can explain and forecast facts and be thought of as an inductive generalization of such factual observations. Typically, an empirical law that deals with measurable physical quantities can be established by measuring such quantities in suitable cases and then interpolating a simple curve between the measured values. For example, a physicist could measure the volume V, the temperature T and the pressure P of a gas in diverse experiments, and he could find the law PV=RT, for a suitable constant R.

A theoretical law, on the other hand, is concerned with objects or properties we cannot observe or measure but only infer from direct observations. A theoretical law cannot be justified by means of direct observation. It is not an inductive generalization but a hypothesis reaching beyond experience. While an empirical law can explain and forecast facts, a theoretical law can explain and forecast empirical laws. The method of justifying a theoretical law is indirect: a scientist does not test the law itself but, rather, the empirical laws that are among its consequences.

The distinction between empirical and theoretical laws entails the distinction between observational and theoretical properties, and hence between observational and theoretical terms. The distinction in many situations is clear, for example: the laws that deal with the pressure, volume and temperature of a gas are empirical laws and the corresponding terms are observational; while the laws of quantum mechanics are theoretical. Carnap admits, however, that the distinction is not always clear and the line of demarcation often arbitrary. In some ways the distinction between observational and theoretical terms is similar to that between macro-events, which are characterized by physical quantities that remain constant over a large portion of space and time, and micro-events, where physical quantities change rapidly in space or time.

3. Analytic and Synthetic

To the logical empiricist, all statements can be divided into two classes: analytic a priori and synthetic a posteriori. There can be no synthetic a priori statements. A substantial aspect of Carnap’s work was his attempt to give precise definition to the distinction between analytic and synthetic statements.

In The Logical Syntax of Language (1934), Carnap studied a formal language that could express classical mathematics and scientific theories, for example, classical physics. Carnap would have known Kurt Gödel’s 1931 article on the incompleteness of mathematics. He was, therefore, aware of the substantial difference between the two concepts of proof and consequence: some statements, despite being a logical consequence of the axioms of mathematics, are not provable by means of these axioms. He would not, however, have been able to take account of Alfred Tarski’s essay on semantics, first published in Polish in 1933. Tarski’s essay led to the notion of logical consequence being regarded as a semantic concept and defined by means of model theory. These circumstances explain how Carnap, in The Logical Syntax of Language, gave a purely syntactic formulation of the concept of logical consequence. However, he did define a new rule of inference, now called the omega-rule, but formerly called the Carnap rule:

From the infinite series of premises A(1), A(2), … , A(n), A(n+1) ,…, we can infer the conclusion (x)A(x)

Carnap defines the notion of logical consequence in the following way: a statement A is a logical consequence of a set S of statements if and only if there is a proof of A based on the set S; it is admissible to use the omega-rule in the proof of A. In the definition of the notion of provable, however, a statement A is provable by means of a set S of statements if and only if there is a proof of A based on the set S, but the omega-rule is not admissible in the proof of A. (A formal system which admits the use of the omega-rule is complete, so Gödel’s incompleteness theorem does not apply to such formal systems.)

Carnap then proceeded to define some kinds of statements: (i) a statement is L-true if and only if it is a logical consequence of the empty set of statements; (ii) a statement is L-false if and only if all statements are a logical consequence of it; (iii) a statement is analytic if and only if it is L-true or L-false; (iv) a statement is synthetic if and only if it is not analytic. Carnap thus defines analytic statements as logically determined statements: their truth depends on logical rules of inference and is independent of experience. Thus, analytic statements are a priori while synthetic statements are a posteriori, because they are not logically determined.

Carnap maintained his definitions of statements in his article “Testability and Meaning” (1936) and his book Meaning and Necessity (1947). In “Testability and Meaning,” he introduced semantic concepts: a statement is analytic if and only if it is logically true; it is self-contradictory if and only if it is logically false. In any other case, the statement is synthetic. In Meaning and Necessity, Carnap first defines the notion of L-true (a statement is L-true if its truth depends on semantic rules) and then defines the notion of L-false (a statement is L-false if its negation is L-true). A statement is L-determined if it is L-true or L-false; analytic statements are L-determined, while synthetic statements are not L-determined. This is very similar to the definitions Carnap gave in The Logical Syntax of Language but with the change from syntactic to semantic concepts.

In 1951, Quine published the article “Two Dogmas of Empiricism,” in which he disputed the distinction made between analytic and synthetic statements. In response, Carnap partially changed his point of view on this problem. His first response to Quine came in “Meaning Postulates” (1952), where Carnap suggested that analytic statements are those which can be derived from a set of appropriate sentences that he called meaning postulates. Such sentences define the meaning of non-logical terms and thus the set of analytic statements is not equal to the set of logically true statements. Later, in “Observation Language and Theoretical Language” (1958), he set out a general method for determining a set of meaning postulates for the language of a scientific theory. He further expounded on this method in his reply to Carl Gustav Hempel in The Philosophy of Rudolf Carnap (1963), and in Philosophical Foundations of Physics (1966). Suppose the number of non-logical axioms is finite. Let T be the conjunction of all purely theoretical axioms, and C the conjunction of all correspondence postulates and TC the conjunction of T and C. The theory is equivalent to the single axiom TC. Carnap formulates the following problem: how can we find two statements, say A and R, so that A expresses the analytic portion of the theory (that is, all consequences of A are analytic) while R expresses the empirical portion (that is, all consequences of R are synthetic)? The empirical content of the theory is formulated by means of a Ramsey sentence (a discovery of the English philosopher Frank Ramsey). Carnap’s solution to the problem builds a Ramsey sentence according to the following instructions:

  1. Replace every theoretical term in TC with a variable.
  2. Add an appropriate number of existential quantifiers at the beginning of the sentence.

Look at the following example. Let TC(O_1, …, O_n, T_1, …, T_m) be the conjunction of T and C; in TC there are observational terms O_1, …, O_n and theoretical terms T_1, …, T_m. The Ramsey sentence (R) is

∃X_1 … ∃X_m TC(O_1, …, O_n, X_1, …, X_m)

Every observational statement which is derivable from TC is also derivable from R and vice versa, so that R expresses exactly the empirical portion of the theory. Carnap proposes the statement R → TC as the only meaning postulate; this became known as the Carnap sentence. Note that every empirical statement that can be derived from the Carnap sentence is logically true, and thus the Carnap sentence lacks empirical consequences. So, a statement is analytic if it is derivable from the Carnap sentence; otherwise the statement is synthetic. The requirements of Carnap’s method can be summarized as follows: (i) non-logical axioms must be explicitly stated, (ii) the number of non-logical axioms must be finite and (iii) observational terms must be clearly distinguished from theoretical terms.
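A very small worked case may help to fix the construction. The theory below is a hypothetical toy example, not one discussed by Carnap: it has a single observational predicate O_1 and a single theoretical predicate T_1, and the predicate variable X is bound by a second-order existential quantifier.

```latex
% Hypothetical toy theory: one observational predicate O_1,
% one theoretical predicate T_1 (illustration only).
\begin{align*}
TC(O_1, T_1) &:\quad \forall x\,\bigl(T_1(x) \rightarrow O_1(x)\bigr)
                \;\wedge\; \exists x\, T_1(x) \\
R &:\quad \exists X\,\Bigl[\forall x\,\bigl(X(x) \rightarrow O_1(x)\bigr)
                \;\wedge\; \exists x\, X(x)\Bigr] \\
\text{Carnap sentence} &:\quad R \rightarrow TC(O_1, T_1)
\end{align*}
```

In this toy case the observational statement ∃x O_1(x) follows from TC and equally from R, while the Carnap sentence only says that if something plays the role described by the theory, then T_1 does.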

4. Meaning and Verifiability

Perhaps the most famous tenet of logical empiricism is the verifiability principle, according to which a synthetic statement is meaningful only if it is verifiable. Carnap sought to give a logical formulation of this principle. In The Logical Structure of the World (1928) he asserted that a statement is meaningful only if every non-logical term is explicitly definable by means of a very restricted phenomenalistic language. A few years later, Carnap realized that this thesis was untenable because a phenomenalistic language is insufficient to define physical concepts. Thus he chose an objective language (“thing language”) as the basic language, one in which every primitive term is a physical term. All other terms (biological, psychological, cultural) must be defined by means of basic terms. To overcome the problem that an explicit definition is often impossible, Carnap used dispositional concepts, which can be introduced by means of reduction sentences. For example, if A, B, C and D are observational terms and Q is a dispositional concept, then

(x)[Ax → (Bx ↔ Qx)]
(x)[Cx → (Dx ↔ ~Qx)]

are reduction sentences for Q. In “Testability and Meaning” (1936) Carnap revised the new verifiability principle in this way: all terms must be reducible, by means of definitions or reduction sentences, to the observational language. But this proved to be inadequate. K. R. Popper showed not only that some metaphysical terms can be reduced to the observational language and thus fulfill Carnap’s requirements, but also that some genuine physical concepts are forbidden. Carnap acknowledged that criticism and in “The Methodological Character of Theoretical Concepts” (1956) sought to develop a further definition.

The main philosophical properties of Carnap’s new principle can be outlined under three headings. First, the significance of a term becomes a relative concept: a term is meaningful with respect to a given theory and a given language. The meaning of a concept thus depends on the theory in which that concept is used. This represents a significant modification in empiricism’s theory of meaning. Second, Carnap explicitly acknowledges that some theoretical terms cannot be reduced to the observational language: they acquire an empirical meaning by means of their links with other, reducible theoretical terms. Third, Carnap realizes that the principle of operationalism is too restrictive. Operationalism was formulated by the American physicist Percy Williams Bridgman (1882-1961) in his book The Logic of Modern Physics (1927). According to Bridgman, every physical concept is defined by the operations a physicist uses to apply it. Bridgman asserted that the curvature of space-time, a concept used by Einstein in his general theory of relativity, is meaningless, because it is not definable by means of operations. Bridgman subsequently changed his philosophical point of view and admitted there is an indirect connection with observations. Perhaps influenced by Popper’s criticism, or by the problematic consequences of a strict operationalism, Carnap changed his earlier point of view and freely admitted a very indirect connection between theoretical terms and the observational language.
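
The reduction sentences for Q given earlier in this section can be read as a partial decision procedure, which may help to show why they fall short of an explicit definition. The following is a minimal sketch, not Carnap's own formalism: the function name q_status and the use of Boolean arguments for the observational terms A, B, C and D are assumptions made for illustration.

```python
def q_status(a, b, c, d):
    """Apply the two reduction sentences for Q to one individual x.

    (x)[Ax -> (Bx <-> Qx)]   : if the test condition A holds, Q agrees with B.
    (x)[Cx -> (Dx <-> ~Qx)]  : if the test condition C holds, Q agrees with not-D.
    """
    verdicts = set()
    if a:
        verdicts.add(b)        # first reduction sentence fixes Q as B
    if c:
        verdicts.add(not d)    # second reduction sentence fixes Q as not-D
    if not verdicts:
        return "undetermined"  # neither test condition holds: Q is left open
    if len(verdicts) == 2:
        return "inconsistent"  # the two sentences give opposite verdicts
    return "Q" if verdicts.pop() else "not Q"


print(q_status(True, True, False, False))    # only the first test applies
print(q_status(False, False, True, True))    # only the second test applies
print(q_status(False, False, False, False))  # no test applies
```

Under these assumptions, an individual for which neither A nor C holds receives no verdict at all, which is the sense in which reduction sentences introduce Q without defining it explicitly. The "inconsistent" case shows that the pair of sentences is not empirically empty: it rules out observations in which A, B, C and D all hold together.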

5. Probability and Inductive Logic

A variety of interpretations of probability have been proposed:

  • Classical interpretation. The probability of an event is the ratio of the favorable outcomes to the possible outcomes. For example: a die is thrown with the result that “the score is five”. There are six possible outcomes with only one favorable; thus the probability of “the score is five” is one sixth.
  • Axiomatic interpretation. The probability is whatever fulfils the axioms of the theory of probability. In the early 1930s, the Russian mathematician Andrei Nikolaevich Kolmogorov (1903-1987) formulated the first axiomatic system for probability.
  • Frequency interpretation, now the favored interpretation in empirical science. The probability of an event in a sequence of events is the limit of the relative frequency of that event. Example: throw a die several times and record the scores; the relative frequency of “the score is five” is about one sixth; the limit of the relative frequency is exactly one sixth.
  • Probability as a degree of confirmation. This was an approach supported by Carnap and students of inductive logic. The probability of a statement is the degree of confirmation the empirical evidence gives to the statement. Example: the statement “the score is five” receives a partial confirmation by the evidence; its degree of confirmation is one sixth.
  • Subjective interpretation. The probability is a measure of the degree of belief. A special case is the theory that the probability is a fair betting quotient – this interpretation was supported by Carnap. Example: suppose you bet that the score would be five; you bet a dollar and, if you win, you will receive six dollars: this is a fair bet.
  • Propensity interpretation. This is a proposal of K. R. Popper. The probability of an event is an objective property of the event. For example: the physical properties of a die (the die is homogeneous; it has six sides; on every side there is a different number between one and six; etc.) explain the fact that the limit of the relative frequency of “the score is five” is one sixth.
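
The die example used in the list above can be worked through numerically for three of the interpretations. This is only an illustrative sketch: the function names, the simulated number of throws and the fixed random seed are assumptions, and a pseudo-random simulation can only approximate the limit that the frequency interpretation refers to.

```python
from fractions import Fraction
import random


def classical_probability(favorable, possible):
    """Classical interpretation: favorable outcomes over possible outcomes."""
    return Fraction(len(favorable), len(possible))


def relative_frequency(event, throws, rng):
    """Frequency interpretation: observed relative frequency in a finite run."""
    hits = sum(1 for _ in range(throws) if event(rng.randint(1, 6)))
    return hits / throws


def expected_net_gain(stake, total_return, probability):
    """Betting reading: a bet is fair when the expected net gain is zero."""
    win = probability * (total_return - stake)   # net gain if the bet is won
    loss = (1 - probability) * stake             # stake forfeited otherwise
    return win - loss


p_classical = classical_probability([5], list(range(1, 7)))
freq = relative_frequency(lambda score: score == 5, 60_000, random.Random(0))
gain = expected_net_gain(Fraction(1), Fraction(6), p_classical)

print(p_classical)  # exact ratio of favorable to possible outcomes
print(freq)         # finite-run estimate, close to but not exactly one sixth
print(gain)         # zero means the one-dollar bet returning six dollars is fair
```

The classical value and the betting quotient are computed exactly, while the frequency value is only an estimate from a finite sequence of simulated throws.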

Carnap devoted himself to giving an account of probability as a degree of confirmation. The philosophically most significant consequences of his research arise from his assertion that the probability of a statement, with respect to a given body of evidence, is a logical relation between the statement and the evidence. Thus it is necessary to build an inductive logic; that is, a logic which studies the logical relations between statements and evidence. Inductive logic would give us a mathematical method of evaluating the reliability of a hypothesis. In this way inductive logic would answer the problem raised by David Hume’s analysis of induction. Of course, we cannot be sure that a hypothesis is true; but we can evaluate its degree of confirmation and we can thus compare alternative theories.

In spite of the abundance of logical and mathematical methods Carnap used in his own research on inductive logic, he was not able to formulate a theory of the inductive confirmation of scientific laws. In fact, in Carnap’s inductive logic, the degree of confirmation of every universal law is always zero.
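
A small calculation can make the zero-confirmation result plausible. The sketch below assumes a deliberately simplified setting: a language with one property and therefore two cells (kappa = 2), and the singular predictive formula of Carnap's continuum of inductive methods, (s_i + lambda/kappa) / (s + lambda), with lambda = kappa. The function names are hypothetical. It computes the confirmation that the next n individuals all have the property, given that all s observed individuals had it.

```python
from fractions import Fraction


def next_instance(s_pos, s_total, kappa=2, lam=2):
    """Confirmation that the next individual has the property:
    (s_pos + lam/kappa) / (s_total + lam), written with integers only."""
    return Fraction(s_pos * kappa + lam, kappa * (s_total + lam))


def next_n_all_positive(s, n):
    """Confirmation that the next n individuals all have the property,
    given s observed individuals, all of which had it."""
    c = Fraction(1)
    for i in range(n):
        # each further positive instance is added to the evidence in turn
        c *= next_instance(s + i, s + i)
    return c


for n in (1, 10, 100, 1000, 10000):
    print(n, float(next_n_all_positive(10, n)))
```

In this setting the product telescopes to (s + 1) / (s + n + 1), which shrinks toward zero as n grows for any fixed amount of evidence s. A universal law over an infinite domain entails every such finite prediction, so its degree of confirmation cannot exceed any of these values.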

Carnap tried to employ the physical-mathematical theory of thermodynamic entropy to develop a comprehensive theory of inductive logic, but his plan never progressed beyond an outline stage. His works on entropy were published posthumously.

6. Modal Logic and the Philosophy of Language

The following table, which is an adaptation of a similar table Carnap used in Meaning and Necessity, shows the relations between modal properties such as necessary and impossible and logical properties such as L-true, L-false, analytic, synthetic. The symbol N means “necessarily”, so that Np means “necessarily p” or “p is necessary.”

Modal and logical properties of statements

| Modality | Formalization | Logical status |
|---|---|---|
| p is necessary | Np | L-true, analytic |
| p is impossible | N~p | L-false, contradictory |
| p is contingent | ~Np & ~N~p | factual, synthetic |
| p is not necessary | ~Np | not L-true |
| p is possible | ~N~p | not L-false |
| p is not contingent | Np v N~p | L-determined, not synthetic |

Carnap identifies the necessity of a statement p with its logical truth: a statement is necessary if and only if it is logically true. Thus modal properties can be defined by means of the usual logical properties of statements. Np, i.e., “necessarily p”, is true if and only if p is logically true. He defines the possibility of p as “it is not necessary that not p”. That is, “possibly p” is defined as ~N~p. The impossibility of p means that p is logically false. It must be stressed that, in Carnap’s opinion, every modal concept is definable by means of the logical properties of statements. Modal concepts are thus explicable from a classical point of view (meaning “using classical logic”, e.g., first order logic). Carnap was aware that the symbol N is definable only in the meta-language, not in the object language. Np means “p is logically true”, and the last statement belongs to the meta-language; thus N is not explicitly definable in the language of a formal logic, and we cannot eliminate the term N. More precisely, we can define N only by means of another modal symbol we take as a primitive symbol, so that at least one modal symbol is required among the primitive symbols.
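
The identification of necessity with logical truth can be sketched for a small propositional language. The code below is an illustration under stated assumptions, not Carnap's system: truth-value assignments to atomic sentences stand in for his state descriptions, formulas are represented as Python functions, and the names state_descriptions, N, possible and modal_status are hypothetical. Note that N here is a function applied to formulas from outside the object language, which mirrors the point that N is defined in the meta-language.

```python
from itertools import product


def state_descriptions(atoms):
    """Every assignment of truth values to the atomic sentences."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def N(formula, atoms):
    """Np: p holds in every state description, i.e. p is L-true."""
    return all(formula(s) for s in state_descriptions(atoms))


def possible(formula, atoms):
    """Possibility defined as ~N~p."""
    return not N(lambda s: not formula(s), atoms)


def modal_status(formula, atoms):
    """Classify a formula as necessary, impossible or contingent."""
    if N(formula, atoms):
        return "necessary"
    if N(lambda s: not formula(s), atoms):
        return "impossible"
    return "contingent"


atoms = ["p", "q"]
print(modal_status(lambda s: s["p"] or not s["p"], atoms))
print(modal_status(lambda s: s["p"] and not s["p"], atoms))
print(modal_status(lambda s: s["p"] and s["q"], atoms))
```

Each modal notion is computed from N alone, following the definitions in the paragraph above: impossibility is N~p, possibility is ~N~p, and contingency is ~Np & ~N~p.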

Carnap’s formulation of modal logic is very important from a historical point of view. Carnap gave the first semantic analysis of a modal logic, using Tarskian model theory to explain the conditions in which “necessarily p” is true. He also solved the problem of the meaning of the statement (x)N[Ax], where Ax is a sentence in which the individual variable x occurs. Carnap showed that (x)N[Ax] is equivalent to N[(x)Ax] or, more precisely, he proved we can assume its equivalence without contradictions.

From a broader philosophical point of view, Carnap believed that modalities did not require a new conceptual framework; a semantic logic of language can explain the modal concepts. The method he used in explaining modalities was a typical example of his philosophical analysis. Another interesting example is the explanation of belief-sentences which Carnap gave in Meaning and Necessity. Carnap asserts that two sentences have the same extension if they are equivalent, i.e., if they are both true or both false. On the other hand, two sentences have the same intension if they are logically equivalent, i.e., their equivalence is due to the semantic rules of the language. Let A be a sentence in which another sentence occurs, say p. A is called “extensional with respect to p” if and only if the truth value of A does not change if we substitute the sentence p with an equivalent sentence q. A is called “intensional with respect to p” if and only if (i) A is not extensional with respect to p and (ii) the truth value of A does not change if we substitute the sentence p with a logically equivalent sentence q. The following examples arise from Carnap’s assertions:

  • The sentence A v B is extensional with respect to both A and B; we can substitute A and B with equivalent sentences and the truth value of A v B does not change.
  • Suppose A is true but not L-true; therefore the sentences A v ~A and A are equivalent (both are true) and, of course, they are not L-equivalent. The sentence N(A v ~A) is true and the sentence N(A) is false; thus N(A) is not extensional with respect to A. On the contrary, if C is a sentence L-equivalent to A v ~A, then N(A v ~A) and N(C) are both true: N(A) is intensional with respect to A.
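
The two examples above can be checked mechanically in a one-atom propositional language. This is a minimal sketch under assumptions: the actual state of affairs is fixed by the hypothetical constant ACTUAL (in which A is true), L-truth is tested by enumerating truth-value assignments, and the sentence C is chosen here as ~(A & ~A), which is one possible sentence L-equivalent to A v ~A.

```python
from itertools import product

ATOMS = ["A"]
ACTUAL = {"A": True}  # assumption: A is true, though not L-true


def l_true(f):
    """A sentence is L-true if it holds under every assignment."""
    return all(f(dict(zip(ATOMS, vs)))
               for vs in product([True, False], repeat=len(ATOMS)))


def equivalent(f, g):
    """Same extension: same truth value in the actual state of affairs."""
    return f(ACTUAL) == g(ACTUAL)


def l_equivalent(f, g):
    """Same intension: same truth value under every assignment."""
    return l_true(lambda v: f(v) == g(v))


a = lambda v: v["A"]
a_or_not_a = lambda v: v["A"] or not v["A"]
c = lambda v: not (v["A"] and not v["A"])  # L-equivalent to A v ~A

# A and A v ~A are equivalent but not L-equivalent ...
print(equivalent(a, a_or_not_a), l_equivalent(a, a_or_not_a))
# ... and N gives them different truth values, so N is not extensional.
print(l_true(a_or_not_a), l_true(a))
# Replacing A v ~A with the L-equivalent C leaves the value of N unchanged.
print(l_equivalent(a_or_not_a, c), l_true(c))
```

Substituting a merely equivalent sentence changes the truth value of the N-sentence, while substituting an L-equivalent sentence does not, which is the pattern the definition of “intensional” requires.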

There are sentences which are neither extensional nor intensional; for example, belief-sentences. Carnap’s example is “John believes that D”. Suppose that “John believes that D” is true; let A be a sentence equivalent to D and let B be a sentence L-equivalent to D. It is possible that the sentences “John believes that A” and “John believes that B” are false. In fact, John can believe that a sentence is true while believing that a logically equivalent sentence is false. To explain belief-sentences, Carnap defines the notion of intensional isomorphism. In broad terms, two sentences are intensionally isomorphic if and only if their corresponding elements are L-equivalent. In the belief-sentence “John believes that D” we can substitute D with an intensionally isomorphic sentence C.

7. Philosophy of Physics

The first and the last books Carnap published during his lifetime were concerned with the philosophy of physics: his doctoral dissertation (Der Raum, 1922) and Philosophical Foundations of Physics (ed. by Martin Gardner, 1966). Der Raum deals with the philosophy of space. Carnap distinguishes three kinds of theories of space: formal, physical and intuitive. Formal space is analytic a priori; it is concerned with the formal properties of space, that is, with those properties which are a logical consequence of a definite set of axioms. Physical space is synthetic a posteriori; it is the object of natural science, and we can know its structure only by means of experience. Intuitive space is synthetic a priori, and is known via a priori intuition. According to Carnap, the distinction between three different kinds of space is similar to the distinction between three different aspects of geometry: projective, metric and topological respectively.

Some aspects of Der Raum remain very interesting. First, Carnap accepts a neo-Kantian philosophical point of view. Intuitive space, with its synthetic a priori character, is a concession to Kantian philosophy. Second, Carnap uses the methods of mathematical logic; for example, the characterization of intuitive space is given by means of Hilbert’s axioms for topology. Third, the distinction between formal and physical space is similar to the distinction between mathematical and physical geometry. This distinction, first proposed by Hans Reichenbach and later accepted by Carnap, became the official position of logical empiricism on the philosophy of space.

Carnap also developed a formal system for space-time topology. He asserted (1925) that space relations are based on the causal propagation of a signal, while the causal propagation itself is based on the time order.

Philosophical Foundations of Physics is a clear and approachable survey of topics from the philosophy of physics based on Carnap’s university lectures. Some theories expressed there are not those of Carnap alone, but they belong to the common heritage of logical empiricism. The subjects dealt with in the book include:

  • The structure of scientific explanation: deductive and probabilistic explanation.
  • The philosophical and physical significance of non-Euclidean geometry; the theory of space in the general theory of relativity. Carnap argues against Kantian philosophy, especially against the synthetic a priori, and against conventionalism. He gives a clear explanation of the main properties of non-Euclidean geometry.
  • Determinism and quantum physics.
  • The nature of scientific language. Carnap deals with (i) the distinction between observational and theoretical terms, (ii) the distinction between analytic and synthetic statements and (iii) quantitative concepts.

As a sample of the content of Philosophical Foundations of Physics we can briefly look at Carnap’s thought on scientific explanation. Carnap accepts the classical theory developed by Carl Gustav Hempel. Carnap gives the following example to explain the general structure of a scientific explanation:

(x)(Px → Qx)
Pa
———
Qa

where the first statement is a scientific law, the second is a description of the initial conditions, and the third is the description of the event we want to explain. The last statement is a logical consequence of the first and the second, which are the premises of the explanation. A scientific explanation is thus a logical derivation of an appropriate statement from a set of premises, which state universal laws and initial conditions. According to Carnap, there is another kind of scientific explanation, probabilistic explanation, in which at least one universal law is not a deterministic law, but a probabilistic law. Carnap’s example is:

fr(Q,P) = 0.8
Pa
———
Qa

where the first sentence means “the relative frequency of Q with respect to P is 0.8”. Qa is not a logical consequence of the premises; therefore this kind of explanation determines only a certain degree of confirmation for the event we want to explain.
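
The contrast between the two schemas can be illustrated on small finite populations. The sketch below is an assumption-laden toy model, not part of Carnap's text: each individual is represented as a pair (P, Q) of truth values, and the two populations and the function names are invented for the example.

```python
from fractions import Fraction


def law_holds(population):
    """(x)(Px -> Qx): every individual that is P is also Q."""
    return all(q for p, q in population if p)


def fr_q_given_p(population):
    """fr(Q,P): relative frequency of Q among the individuals that are P."""
    q_values = [q for p, q in population if p]
    return Fraction(sum(q_values), len(q_values))


def counterexamples(population):
    """Individuals that are P but not Q."""
    return [(p, q) for p, q in population if p and not q]


# Deterministic law: Pa together with the law leaves no room for ~Qa.
deterministic = [(True, True)] * 5 + [(False, False)] * 2
# Statistical law with fr(Q,P) = 0.8: some P-individuals are not Q.
statistical = [(True, True)] * 8 + [(True, False)] * 2

print(law_holds(deterministic), counterexamples(deterministic))
print(fr_q_given_p(statistical), len(counterexamples(statistical)))
```

When the universal law holds, no individual satisfies P without satisfying Q, so Qa follows from Pa. When only the statistical premise holds, the premises are compatible with an individual a that is P but not Q, which is why they confer a degree of confirmation on Qa rather than entailing it.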

8. Carnap’s Heritage

Carnap’s work has stimulated much debate. A substantial scholarly literature, both critical and supportive, has developed from examination of his thought. With respect to the analytic-synthetic distinction, Ryszard Wojcicki and Marian Przelecki – two Polish logicians – formulated a semantic definition of the distinction between analytic and synthetic. They proved that the Carnap sentence is the weakest meaning postulate, i.e., every meaning postulate entails the Carnap sentence. As a result, the set of analytic statements which are a logical consequence of the Carnap sentence is the smallest set of analytic statements. Wojcicki and Przelecki’s research is independent of the distinction between observational and theoretical terms, i.e., their suggested definition also works in a purely theoretical language. They also dispense with the requirement for a finite number of non-logical axioms.

The tentative definition of meaningfulness that Carnap proposed in “The Methodological Character of Theoretical Concepts” has been proved untenable. See, for example, David Kaplan, “Significance and Analyticity” in Rudolf Carnap, Logical Empiricist and Marco Mondadori’s introduction to Analiticità, Significanza, Induzione, in which Mondadori suggests a possible correction of Carnap’s definition.

With respect to inductive logic, I mention only Jaakko Hintikka’s generalization of Carnap’s continuum of inductive methods. In Carnap’s inductive logic, the probability of every universal law is always zero. Hintikka succeeded in formulating an inductive logic in which universal laws can obtain a positive degree of confirmation.

In Meaning and Necessity, 1947, Carnap was the first logician to use a semantic method to explain modalities. However, he used Tarskian model theory, so that every model of the language is an admissible model. In 1972 the American philosopher Saul Kripke was able to prove that a full semantics of modalities can be attained by means of possible-worlds semantics. According to Kripke, not all possible models are admissible. J. Hintikka’s essay “Carnap’s heritage in logical semantics” in Rudolf Carnap, Logical Empiricist, shows that Carnap came extremely close to possible-worlds semantics, but was not able to go beyond classical model theory.

The omega-rule, which Carnap proposed in The Logical Syntax of Language, has come into widespread use in metamathematical research over a broad range of subjects.

9. References and Further Reading

The Philosophy of Rudolf Carnap (1963) contains the most complete bibliography of Carnap’s work. Listed below are Carnap’s most important works, arranged in chronological order.

a. Carnap’s Works

  • 1922 Der Raum: Ein Beitrag zur Wissenschaftslehre, dissertation, in Kant-Studien, Ergänzungshefte, n. 56
  • 1925 “Über die Abhängigkeit der Eigenschaften des Raumes von denen der Zeit” in Kant-Studien, 30
  • 1926 Physikalische Begriffsbildung, Karlsruhe : Braun, (Wissen und Wirken ; 39)
  • 1928 Scheinprobleme in der Philosophie, Berlin : Weltkreis-Verlag
  • 1928 Der Logische Aufbau der Welt, Leipzig : Felix Meiner Verlag (English translation The Logical Structure of the World; Pseudoproblems in Philosophy, Berkeley : University of California Press, 1967)
  • 1929 (with Otto Neurath and Hans Hahn) Wissenschaftliche Weltauffassung: Der Wiener Kreis, Vienna : A. Wolf
  • 1929 Abriss der Logistik, mit besonderer Berücksichtigung der Relationstheorie und ihrer Anwendungen, Vienna : Springer
  • 1932 “Die physikalische Sprache als Universalsprache der Wissenschaft” in Erkenntnis, II (English translation The Unity of Science, London : Kegan Paul, 1934)
  • 1934 Logische Syntax der Sprache (English translation The Logical Syntax of Language, New York : Humanities, 1937)
  • 1935 Philosophy and Logical Syntax, London : Kegan Paul
  • 1936 “Testability and meaning” in Philosophy of Science, III (1936) and IV (1937)
  • 1938 “Logical Foundations of the Unity of Science” in International Encyclopaedia of Unified Science, vol. I n. 1, Chicago : University of Chicago Press
  • 1939 “Foundations of Logic and Mathematics” in International Encyclopaedia of Unified Science, vol. I n. 3, Chicago : University of Chicago Press
  • 1942 Introduction to Semantics, Cambridge, Mass. : Harvard University Press
  • 1943 Formalization of Logic, Cambridge, Mass. : Harvard University Press
  • 1947 Meaning and Necessity: a Study in Semantics and Modal Logic, Chicago : University of Chicago Press
  • 1950 Logical Foundations of Probability, Chicago : University of Chicago Press
  • 1952 “Meaning postulates” in Philosophical Studies, III (now in Meaning and Necessity, 1956, 2nd edition)
  • 1952 The Continuum of Inductive Methods, Chicago : University of Chicago Press
  • 1954 Einführung in die Symbolische Logik, Vienna : Springer (English translation Introduction to Symbolic Logic and its Applications, New York : Dover, 1958)
  • 1956 “The Methodological Character of Theoretical Concepts” in Minnesota Studies in the Philosophy of Science, vol. I, ed. by H. Feigl and M. Scriven, Minneapolis : University of Minnesota Press
  • 1958 “Beobachtungssprache und theoretische Sprache” in Dialectica, XII (English translation “Observation Language and Theoretical Language” in Rudolf Carnap, Logical Empiricist, Dordrecht, Holl. : D. Reidel Publishing Company, 1975)
  • 1966 Philosophical Foundations of Physics, ed. by Martin Gardner, New York : Basic Books
  • 1977 Two Essays on Entropy, ed. by Abner Shimony, Berkeley : University of California Press

b. Other Sources

  • 1962 Logic and Language: Studies Dedicated to Professor Rudolf Carnap on the Occasion of his Seventieth Birthday, Dordrecht, Holl. : D. Reidel Publishing Company
  • 1963 The Philosophy of Rudolf Carnap, ed. by Paul Arthur Schilpp, La Salle, Ill. : Open Court Pub. Co.
  • 1970 PSA 1970: Proceedings of the 1970 Biennial Meeting of the Philosophy of Science Association: In Memory of Rudolf Carnap, Dordrecht, Holl. : D. Reidel Publishing Company
  • 1971 Analiticità, Significanza, Induzione, ed. by Alberto Meotti and Marco Mondadori, Bologna, Italy : il Mulino
  • 1975 Rudolf Carnap, Logical Empiricist. Materials and Perspectives, ed. by Jaakko Hintikka, Dordrecht, Holl. : D. Reidel Publishing Company
  • 1986 Joëlle Proust, Questions de Forme: Logique et Proposition Analytique de Kant à Carnap, Paris, France : Fayard (English translation Questions of Form: Logic and the Analytic Proposition from Kant to Carnap, Minneapolis : University of Minnesota Press)
  • 1990 Dear Carnap, Dear Van: The Quine-Carnap Correspondence and Related Work, ed. by Richard Creath, Berkeley : University of California Press
  • 1991 Maria Grazia Sandrini, Probabilità e Induzione: Carnap e la Conferma come Concetto Semantico, Milano, Italy : Franco Angeli
  • 1991 Erkenntnis Orientated: A Centennial Volume for Rudolf Carnap and Hans Reichenbach, ed. by Wolfgang Spohn, Dordrecht; Boston : Kluwer Academic Publishers
  • 1991 Logic, Language, and the Structure of Scientific Theories: Proceedings of the Carnap-Reichenbach Centennial, University of Konstanz, 21-24 May 1991, Pittsburgh : University of Pittsburgh Press; [Konstanz] : Universitätsverlag Konstanz
  • 1995 L’eredità di Rudolf Carnap: Epistemologia, Filosofia delle Scienze, Filosofia del Linguaggio, ed. by Alberto Pasquinelli, Bologna, Italy : CLUEB

Author Information

Mauro Murzi
Email: murzim@yahoo.com
Italy