set theory, axiom, Zermelo, Fraenkel, Frankel, Cantor, Frege, Russell, paradox, formal, axiomatic, axioms, comprehension, infinity

Back to title page.

2.1. Origin of Cantor's Set Theory

In the dates and facts of the real history I am following the excellent books by Fyodor Andreevich Medvedev (1922–1992) (see Medvedev [1965] and [1982]). See also online paper "The beginnings of set theory" in the MacTutor History of Mathematics archive.

In XIX century, development process of the most basic mathematical notions led to the intuition of arbitrary infinite sets. Principles of the past mathematical thinking were developed up to their logical limits. Georg Cantor did the last step in this process, and this step was forced by a specific mathematical problem.

G. Cantor. Ueber die Ausdehnung eines Satzes aus der Theorie der trigonometrischen Reihen. "Mathematische Annalen", 1872, Vol. 5, pp. 123-132 (available at http://www.maths.tcd.ie/pub/HistMath/People/Cantor/Ausdehnung/, see also online comments at http://thoralf2.uwaterloo.ca/htdocs/scav/cantor/cantor.html).

See also a collection of Cantor web-sites at www.geometry.net.

In this 1872 paper Cantor proved the following theorem: if two Fourier series are known converging to identical limit values at all but a finite number of points of the segment [-pi, pi], then these series (i.e. their respective coefficients) are identical. How far this theorem could be extended?

Cantor started with the simplest kinds of infinite sets of exception points, for example:

{1/n | n >= 1}.

This set is infinite, it possess only one the so-called condensation point (namely, x=0). Cantor succeeded in proving that his theorem holds, if the set of exceptions possess only one condensation point. The generalization to any finite number of condensation points was straightforward.

The next step would be considering sets having infinitely many condensation points. The most simple kind of such sets has exactly one "second order" condensation point, i.e. a condensation point for the usual ("first order") condensation points, for example:

{1/m + 1/n | m, n >= 1}

Cantor succeeded in proving his theorem for such sets of exceptions, too. The following step would involve the "third order" condensation points etc. In this way Cantor was forced to work with sets of points of rapidly growing complexity. Thus, gradually, the intuitive notion of arbitrary infinite sets of points began to take shape...

To bring some order into this process Cantor introduced the notion of the derivative set: if P is a set of points, then P' denotes the set of all condensation points of P. Further one can define P'' as (P')', P''' - as (P'')' etc.

Exercise 2.1. For any fixed k>=1, let us consider the set

Q(k) = {1/n1 +... + 1/nk | n1,... , nk >= 1}.

Prove that the k-th derivative set Qk(k) = {0}.

Cantor successfully generalized his Fourier series uniqueness theorem for exception sets of any finite order (i.e. for sets P having a finite Pk for some k).

In this way, investigating Fourier series, and having built a plentiful collection of complicated infinite sets of points, Cantor came to the intuition of arbitrary infinite sets. At some moment he arrived to the question: have all infinite sets equal "number" of members?

This critical point was reached in the fall of 1873. Cantor's letter to Richard Dedekind from September 29, 1873 contains the surprising one-to-one correspondence between natural and positive rational numbers:

1     2     3     4     5     6     7     8     9   …
1/1, 1/2, 2/1, 1/3, 3/1, 1/4, 2/3, 3/2, 4/1,...
---   --------   --------   ------------------
2        3            4                5  …

The sequence starts with the only fraction m/n such that m+n=2, after that follow: two fractions having m+n=3, two fractions having m+n=4 (2/2 is omitted, it equals to 1/1 that was already encountered) etc.

In other words, the (dense!) set of all rational numbers possess the same "number of members" as the (discrete!) set of all natural numbers! As the next step Cantor proposed to try enumerating of all real numbers, i.e. all the points of a straight line. In his reply, Dedekind pointed out that also the set of all algebraic numbers can be enumerated by using natural numbers. Still, he did not succeed in enumerating all the real numbers...

In his next letter to Dedekind (December 7, 1873) Cantor proved that this is not possible: there is no one-to-one correspondence between real numbers and natural numbers. Cantor's proof involves the method that is called now the diagonal method (perhaps, first used for another purpose by P. du Bois-Reymond in 1871).

Namely, assume that we have some segment [a, b] and some "enumerating" sequence of real numbers r1, r2,... , rn,.... Divide [a, b] into three equal parts, and take the part that does not contain the number r1. Denote this part by [a1, b1], divide it again into three equal parts, and take the part that does not contain the number r2, etc. This process produces a sequence of contracting segments:

a1 <= a2 <= a3 <= ... <= b3 <= b2 <= b1.

The only common point of these segments is some real number r that does not belong to the "counting" sequence r1, r2,... , rn,.... Hence, you cannot enumerate all real numbers (even a tiny segment of them!) by using natural numbers.

Thus there are at least two kinds of infinite sets: the so-called countable sets (that can be enumerated by using natural numbers), and some "more strongly infinite" sets that cannot be enumerated. This discovery by Georg Cantor is the most significant event in the history of the mathematical investigation of infinity. Still, for Cantor itself - at the moment of the discovery - the following corollary was even more significant: "most" real numbers are transcendent. (Indeed, all the algebraic numbers can be enumerated, but the transcendent ones cannot.) The above Cantor's proof is simple enough to be followed by children (unlike the 1844 construction of particular transcendent numbers by J. Liouville, and the 1873 proof by Ch. Hermite that e is transcendent).

These first Cantor's set theoretical results were published in 1874:

G.Cantor. Ueber eine Eigenschaft des Inbegriffes aller reelen algebraischen Zahlen, "J. reine und angew. Math.", 1874, Vol. 77, pp, 258-262 (see also online comments at http://thoralf2.uwaterloo.ca/htdocs/scav/cantor/cantor.html).

Exercise 2.2. Construct one-to-one correspondences between: a) two segments (of different length), b) between the segment [a, b] (i.e. the set {x | a<=x<=b}) and the interval (a, b) (i.e. the set {x | a<x<b}), c) between an interval and the entire set of all real numbers.

Having discovered the existence of at least two radically different types of infinite sets, Cantor went further. In a letter to Dedekind (January 5, 1874) he asks: is a two-dimensional continuum (for example, a rectangle) equivalent to a one-dimensional continuum (for example, a segment)? In other words: does a rectangle contain more points than a straight-line segment? Cantor conjectured "yes", and spent the following three years trying to prove that a rectangle contains more points than a segment. He did not succeed. Only after he decided to explore the "unrealistic" opposite hypothesis (rectangles and segments contain equal numbers of points), he succeeded almost immediately! His proof (first exposed in a letter to Dedekind from July 20, 1877) is simple enough to be followed by children:

A one-to-one correspondence between the rectangle [0, 1)x[0, 1) and the segment [0, 1] can be produced easily from the decimal fractions of coordinates:

(x, y) -> z

x = 0.abcd...

y = 0.ABCD...

z = 0.aAbBcCdD...

Exercise 2.3. Fill in the gaps and complete this proof (published in 1878):

G. Cantor. Ein Beitrag zur Mannigfaltigkeitslehre. "J. reine und angew. Math.", 1878, Vol. 84, pp. 119-133 (see also online comments at http://thoralf2.uwaterloo.ca/htdocs/scav/cantor/cantor.html).

Cantor was afraid that his simple proof has "destroyed" the notion of dimensionality. Replying to this, Dedekind pointed out that Cantor's correspondence seems to be discontinuous, and conjectured that continuous correspondences between continuums of different dimensionalities would be impossible. Still, G. Peano in 1890 and D. Hilbert in 1891 succeeded in producing continuous (yet not one-to-one!) mappings from a segment onto a rectangle. And only in 1911 L. Brouwer "saved" the notion of dimensionality by proving that Dedekind was right: a continuous one-to-one correspondence between continuums of different dimensionalities is impossible:

L. Brouwer. Beweis der Invarianz der Dimensionenzahl. "Math. Ann.", 1911, Vol. 70, pp. 161-165.

The remarkable paper of 1878 contains another famous statement - the so-called continuum hypothesis. Working hard with various infinite sets of points Cantor established that all infinite sets he could produce, fall into two categories:

a) the so-called countable sets (i.e. the sets that can be enumerated by using natural numbers),

b) the sets that are equivalent to the entire continuum (i.e. the set of all real numbers).

Cantor was unable to produce sets of "intermediate power", i.e. uncountable sets of points that were not equivalent to the entire continuum. This is why he conjectured that such sets do not exist. This conjecture is known as the continuum hypothesis: each infinite set of points either is countable, or it is equivalent to the entire continuum.

Cantor spent many years trying to prove the continuum hypothesis. For this purpose, he developed further his apparatus of derivative sets Pk. He introduced condensation points of infinite order, i.e. points that are simultaneously condensation points of any finite order, and defined the infinite derivative set Pw ("P omega") as the intersection of P', P'', P''' etc. After this, he introduced the next derivatives:

Pw+1 = (Pw)', Pw+2 = (Pw+1)',...

Pw*2 = intersection of Pw+k for all finite k,

Pw*2+1,... , Pw*3, Pw*3+1,...

Pw*w, Pw*w+1,...

Exercise 2.4. Try to produce a set P such that Pw = {0}, such that Pw*2 = {0} etc.

In this way Cantor arrived to a remarkable extension of the natural number system, the so-called transfinite ordinal numbers. These numbers extend the usual finite counting process:

Finite (i.e. natural) numbers are called first class ordinal numbers.

w (the first transfinite ordinal number) follows after all finite numbers,

w+1 follows immediately after w,

w+2 follows immediately after w+1,

...

w*2 is defined as w+w, i.e. it follows after all w+k where k is finite,

w*3 is defined as w*2+w, i.e. it follows after all w*2+k where k is finite,

...

w2 is defined as w*w, i.e. it follows after all w*k where k is finite, etc.

...

ww follows after all wk where k is finite, etc.

...

e0 (epsilon 0) follows after all expressions built up of natural numbers and w by using addition, multiplication and exponentiation,

...

Infinite numbers having a countable set of predecessors are called second class ordinal numbers.

w1 (omega 1) follows after all second class ordinal numbers, i.e. it is the least third class ordinal number.

...

Cantor proved that the set of all second class ordinal numbers O2 is the least uncountable set, i.e. if some infinite subset S of O2 is not equivalent to the entire O2, then S is countable. Thus, to prove the continuum hypothesis, it was sufficient to prove that O2 is equivalent to the entire continuum, i.e. "simply" to produce a one-to-one correspondence between the second class ordinal numbers and the real numbers. Still, Cantor and all his numerous followers failed to do this...

Thus, in some sense, the continuum hypothesis represents one of the most beautiful problems in mathematics: it can be explained to children, yet it still remains unsolved for 130 years!

Cantor was already tired of many years of unsuccessful attempts to prove the continuum hypothesis, when he received another blow. In 1895 he discovered in his set theory... a contradiction. In 1897 another contradiction was found (and published) by C.Burali-Forti...

Still, all these difficulties cannot change the "verdict of the history": Georg Cantor remains one of the most outstanding personalities in the history of mathematics. He succeeded in developing the principles of the past mathematical thinking up to their logical limits.

2.2 Formalization of Cantor's Inconsistent Set Theory

First let us formalize the set theory directly as G. Cantor created it. This theory is based on the intuitive concept of "a world of sets" where all sets (finite and infinite) and all their members exist simultaneously and completely. In our axioms we would like to describe the governing laws of this frozen "world of sets".

At the very beginning we must answer the following question: can our "world" consist of sets only? A set can consist of members, which are sets. If the set x1 contains a member x2, which is also a set, if x2 contains x3, x3 contains x4 etc. - then, following along this chain must we not encounter something more tangible than "sets of sets"? If nothing tangible exists, how can sets exist? If nothing exists, then the world is ... an empty set o (i.e. from nothing we obtain something). Having o we can build the set which contains o as a member - the set {o}. Having o and {o} we can build a two-element set {o, {o}} etc.:

x0 = o; x1 = {o}; x2 = {o, {o}}; ...; xn+1 = xn U {xn}; ...

Therefore, even postulating that "nothing exists" we can obtain infinitely many different sets (compare Devlin [1977]).

Now we can define the language of the formal set theory. We will use only one sort of variables: x, y, z, ... - subscripted or not. Intuitively, the range of each variable consists of "all possible sets" (since we have decided that the "world of sets" consists from sets only). We do not introduce constants (like as o to denote the empty set) and function symbols (like as xUy to denote union of sets). Later we will see how to do without these symbols. We introduce two predicate symbols: "in" (intuitively, "x in y" means "x is a member of y"), and "=" (intuitively, "x=y" means "x is the same set as y"). We can combine atomic formulas like as "x in y" and "x=y" by using connector and quantifier symbols, thus obtaining formulas of set theory. For example, the formula Ay ~(y in x) says that x is empty set, and the formula EyAz (z in x -> z=y) says that x possess exactly one member.

Exercise 2.4a. Provide formulas expressing the following assertions:

"x possess exactly two members",

"x possess exactly five members",

"x is a subset of y (i.e. all the members of x are members of y)",

"x and y do not intersect (i.e. x and y do not possess common members)",

"there is an infinite set".

As a serious formal theory, set theory adopts axioms and inference rules of the classical logic (see Appendix 3). After this we must define the specific meaning of the identity predicate in the set theory: x=y means that the sets x and y have the same members. It may seem obvious, yet it requires a special axiom - you cannot derive this meaning from pure logic. Using pure logic you can derive (check it!) only that

x=y -> Az (z in x <-> z in y).

This is normal: if x equals to y, then x an y have the same properties. Still, if the sets x and y have different definitions, and after some effort we could establish that Az(z in x <-> z in y), then we can conclude that x=y only by using the specific meaning of identity adopted in set theory. Logic as such is indifferent to the meaning of identity, it postulates only some general properties of this predicate. Therefore, to define the specific meaning of set identity we must introduce the following axiom:

Az (z in x <-> z in y) -> x=y. --------------- (C1)

This axiom is called the Extensionality Axiom (it says that in our set theory the identity of sets is treated extensionally, not intensionally, i.e. two different set definitions can lead to identical sets).

Since we adopted the classical logic for our "world of sets", we can prove the formula Ex(x=x), i.e. that at least one set exists in our "world". Still, pure logic does not allow to conclude something about the properties of this set x. To obtain sets having specific properties (for example, the empty set) we must introduce additional axioms.

The main principle of Cantor's set theory says that a set is a "many" of which we can think as of a single "whole". How could we think of many things as of a whole? One of the ways would be to define the combination of properties which allows to separate things as belonging to the whole and not belonging to it. Now, the corresponding notation is used widely in mathematics, for example, c = { x | C(x) }, where C(x) says that "x is a crocodile". Mathematicians say that c is the "set of all crocodiles" and may operate with c as with a single object.

In the modern language we formulate Cantor's principle as the Comprehension Axiom Schema. Let F(y) be a formula of set theory (it may contain additonal free variables, let us call them parameters). We may think of F(y) as of an assertion "the set y possess the property F". Hence, according to the Cantor's principle we can introduce the set x of all y's possessing the property F: x= { y | F(y) }. To legalize operations like this one we must adopt the following axiom schema:

ExAy (y in x <-> F(y)) -------------- (C2[F])

(of course, formula F does not contain x). For each formula F we have a separate comprehension axiom C2[F].

In particular, the axiom C2[~(y=y)] allows to prove the existence of the empty set:  ExAy (y in x <-> ~(y=y)). Hence, ExAy ~(y in x), i.e. x is the empty set.

Exercise 2.5. Using appropriate comprehension axioms, prove the existence of the following sets (o is the empty set): {o}, {o, {o}}, {{{o}}}. Can you prove the existence of at least one infinite set? (If not, see how this can be done a la von Neumann in Section 2.3.)

The axioms C1 and C2[F] (for all formulas F that do not contain x) and the axiom of choice define a formal set theory C which corresponds almost 100% to Cantor's intuitive set theory (of the "pre-paradox" period of 1873-94).

Cantor and the Axiom of Choice

Of course, in 1873-94 Cantor believed in the unrestricted comprehension principle. Still, did Cantor really accept a kind of axiom of choice as a valid principle of set theory? Let us check his two main papers on foundations of set theory:

G.Cantor. Grundlagen einer allgemeinen Mannigfaltigkeitslehre. "Math. Annalen", 1883, Vol.21, pp.545-586

G.Cantor. Beitraege zur Begruendung der transfiniten Mengenlehre. "Math. Annalen", 1895, Vol.46, pp.481-512; 1897, Vol.49, pp.207-246

Both papers were reprinted (with extensive comments by E.Zermelo) in

G.Cantor. "Gesammelte Abhandlungen", Springer, 1932

I am using the book by F.A.Medvedev "The early history of the axiom of choice" (Medvedev [1982]) that contains extensive relevant quotes from Cantor's works and Zermelo's comments.

Two main conclusions:

1. Cantor is using ad hoc - as he needs, and tens of times! - the "possibility of arbitrary choices" without any attempt to formulate something like a general axiom of choice. For example, when proving that each infinite set contains a countable subset (a quote from the 1895-97 paper):

"If we have some rule of deleting of elements t1, t2, ..., tn-1 from [an infinite set - K.P.] T, then always there is a possibility to delete one more element tn..."

2. Cantor believed in "validity" of some assertions that are equivalent (or almost equivalent) to the axiom of choice. For example, in the 1883 paper he qualifies the thesis "each well defined set can be well ordered" as a "remarkable generally valid law of thought" and promises to consider it in one of subsequent papers. Still, he never did, and in the 1895-97 paper this thesis does not appear at all. Because of the "smell" of paradoxes?

Today, we can guess, did Cantor's thesis "each well defined set can be well ordered" mean that all sets can be well ordered (this would be equivalent to the axiom of choice, see below)? Or, maybe, Cantor intended (already in 1883?) to distinguish between "well defined" and "ill defined" sets? This would mean that Cantor believed only that "each constructible set can be well ordered" (theorem proved by K.Goedel in 1938, see below).

Of course, we would be able to derive from our formal set theory C (i.e. from a very simple set of axioms!) all the common mathematics. A very remarkable fact - 100% of mathematics can be derived from an extremely simple set of axioms! Still,... these axioms are also sufficient... to derive a contradiction!

A very simple way how to do this was invented by Bertrand Russell in 1902, and is now called Russell's paradox. Russell discovered his paradox in Frege's formal system of mathematics (see Section 6.1), yet his method applies to Cantor's set theory as well. In terms of our formal set theory C Russell's paradox can be derived as follows.

Normally, sets are not members of themselves, i.e. normally, ~(y in y), for example, ~(o in o). Still, the axioms of C do not exclude the existence of "abnormal" sets, which are members of themselves. Hence, let us consider the set of all "normal" sets: x = { y | ~(y in y) }. The existence of this x is guarantied by the comprehension axiom C2[~(y in y)]:

ExAy (y in x <-> ~(y in y)).

Now substitute x for y, and you will have a contradiction:

Ex (x in x <-> ~(x in x)).

Hence, unexpectedly, some of the comprehension axioms lead to contradictions. Cantor's set theory C is inconsistent. The unrestricted comprehension axiom scheme cannot serve as a foundation of set theory!

Note. The real history of paradoxes in set theory was more complicated. In Russell's paradox the notion of "abnormal" sets (x in x) is involved, i.e. this paradox could not be discovered in the intuitive set theory, and was discovered first in Frege's formal system of mathematics. Cantor discovered the first paradox of set theory himself in 1895. The next paradox discovered - and published - C.Burali-Forti in 1987. These paradoxes were more complicated than Russell's paradox, yet much easier to discover! See Fraenkel, Bar Hillel, Levy [1973].

Note. In the above text the terms "invented" and "discovered" were used as synonyms. Mathematics is the only branch of science for which this is true (see Section 1.2).

Today, 100 years after, perhaps, the discovery of paradoxes in mathematics would not be percepted as a catastrophe. Since then, humans have seen things much worse than paradoxes. Still, for G.Cantor (1845 - 1918), R.Dedekind (1831 - 1916) and G.Frege (1848 - 1925), who started to believe in the unrestricted comprehension schema in 1870's and lived with this absolute belief for more than 20 years, the discovery of paradoxes was a personal tragedy. For Georg Cantor - his set theory, and for Gottlob Frege - his formal system of mathematics were their main and only contributions to mathematics. After the discovery of paradoxes they published nothing comparable with their outstanding works of the previous period. They could not find an exit. The exit was found by mathematicians of the next generation.

2.3. Zermelo-Fraenkel Axioms

The crucial idea is due to Ernst Zermelo. He proposed to restrict the comprehension axiom schema by adopting only of those axioms, which are really necessary for reconstruction of common mathematics. In 1908 Zermelo published his account:

E. Zermelo. Untersuchungen ueber die Grundlagen der Mengenlehre. "Math. Annalen", 1908, Vol. 65, pp. 261-281 (see English translation in Heijenoort [1967], see also online comments at http://thoralf2.uwaterloo.ca/htdocs/scav/zermelo/zermelo.html)

We will consider Zermelo's axioms some time later. Of course, you will not find the axiom C2[~(y in y)] among them, since such kind of reasoning is not used in common mathematics.

Still, how to solve the situation, when we have some formula F(y), we use it to collect the sets y having the property F, yet trying to treat this collection as a set, we obtain a contradiction? For example, what to do with the Russell's collection R = {y | ~(y in y)}? The empty set o belongs to it: ~(o in o), hence, o in R. Still, if you will consider this collection R as a set and denote it by r, then you will have the Russell's paradox: "r in r" is equivalent to ~(r in r). How to solve this problem? Let us use the well-known method due to King Solomon: let us consider Russell's paradox as the proof that Russell's collection is not a set!

Using this approach we must legalize some collections of sets that are not sets. Let F(y, z1, ..., zn) be a formula in the language of set theory (z1, ..., zn are optional parameters). Then let us say that for any fixed values of z1, ..., zn the formula F defines a class

A = {y | F(y, z1, ..., zn)},

i.e. the class of all y's possessing the property F. In general, different values of z1, ..., zn yield a different class. For example, the class { y | ~(y=z1)} consists of all sets except the set z1, i.e. it depends on the parameter z1. On the other hand, the class

V = {y | y=y}

consists of all sets, and does not depend on parameters.

Each set is a class: the set x coincides with the class {y | y in x}. In its general (inconsistent!) form, the comprehension axiom schema says that all classes are sets:

ExAy(y in x <-> F(y, z1, ..., zn)).

Still, "really", some classes are not sets. For example, the Russell's class R = {y | ~(y in y)} cannot be a set: if R=x, then "y in x" is equivalent to ~(y in y), and by setting y=x we obtain a contradiction. Classes that are not sets we will call proper classes. Now we can say simply that each paradox of set theory "generates" some proper class.

We will denote classes by capital letters: A, B, C, ... (small letters a, b, c, ... are variables of the language of set theory, i.e. they denote sets). This notation must remind to us of the metaphoric character of the following "formulas":

y in A, A=B, A<=B, A^B, AUB, A-B.

These "formulas" do not belong to the language of set theory, which does not contain "capital" variables. They are used merely as a convenient notation for the following ("proper") formulas:

F(y), Ay(F(y)<->G(y)), Ay(F(y)->G(y)), F(y)&G(y), F(y)VG(y), F(y)&~G(y),

where the formula F defines the class A, and the formula G defines B.

Now we can start formulating the axioms of set theory as Zermelo proposed them. We will use the same formal language of set theory introduced in the previous section. As the first axiom we adopt the extensionality axiom C1. And we will adopt only those comprehension axioms that are really used in mathematics for building of "useful" sets.

Separation Axiom Schema

Perhaps, the simplest way to obtain new sets would be separation: having a set x separate those members of x that possess some common property F. For example, in this way the set of all prime numbers is obtained from the set of all natural numbers. In general, the situation could be represented as follows:

|------------|-----------------y in x------------------|---------------------|
|-----------------------------------|-------------F(y)-------------|--------|
|-----------------------------------|------y in z------|---------------------|

The condition F(y) is any formula of set theory (it may contain parameters z1, ..., zn). Using this condition we separate those members of the set x that satisfy the condition F. The members separated make up a new set denoted by z.

To legalize this way of reasoning we must introduce the following Separation Axiom Schema: if F(y, z1, ..., zn) is any formula that contains free variables y, z1, ..., zn, but does not contain x and z, then the following formula is declared to be an axiom:

EzAy(y in z <-> y in x & F(y, z1, ..., zn). --------------- (C21[F])

Of course, this schema is a part of the general comprehension schema, namely, it is C2[y in x & F(y, z1, ..., zn)]. An alternative, extremely convenient form of the separation schema can be obtained by using the notion of classes: the formula F defines a class A, hence, the axiom C21[F] says that the intersection A^x (of the class A and the set x) is a set: A^x=z.

Now we can prove the existence of the empty set, i.e the formula EzAy ~(y in z). Indeed, from the logical axioms we know that some sets exist, let z0 be a set. Then, using the impossible condition ~(y=y) and the axiom C21[~(y=y)] we obtain a set x such that

Ay(y in x <-> y in z0 & ~(y=y)).

hence, Ay ~(y in x). Q.E.D.

On the other hand, the axiom of extensionality implies that there is only one empty set. Indeed, if the sets x1, x2 both are empty, i.e. Ay ~(y in x1) and Ay ~(y in x2), hence, Ay(y in x1 <-> y in x2) and (by C1) x1=x2.

Note. Do not underestimate this simple invention - the empty set. You may think safely, that the empty set is not a set. Still, this would make the treatment of, for example, set intersection very uneasy - it would be sometimes a set, and sometimes - empty!

It would be nice to denote the empty set, for example, by o, yet our language of set theory does not contain constants. If we would introduce one, this would not solve the problem completely, because after the first constant we may wish to introduce the second one etc. Therefore, it would be better to do without any constants at all. Let us see how this can be achieved.

If we wish to use the empty set constant o in our reasoning legally, we must provide a method allowing to translate our "o-extended" formulas into the pure language of set theory that does not contain o. This can be done easily. Indeed, if we assert that the empty set o possess some property expressed by a formula F(x), i.e. we use the formula F(o), then this assertion can be expressed also by the formula

Ax(Ay ~(y in x) -> F(x)),

i.e. if x is empty set, then x possess the property F. This formula does not contain the constant o. Hence, we can use the constant o in our reasoning safely. If needed, we can reformulate any such reasoning by using (more complicated) formulas that do not contain o.

The above approach can be generalized. Let us assume that we have proved the existence and uniqueness of the set satisfying some formula T(x). I.e. we have proved the following two formulas:

Ex T(x),

Ax1Ax2 ( T(x1) & T(x2) -> x1=x2 ).

Then we may introduce a constant t denoting the above-mentioned unique set, and use it in our reasoning safely. Any assertion like as F(t) (i.e. "t possess the property F") we can reformulate without using of t: Ax(T(x) -> F(x)).

Still, the formula T could contain parameters, for example, T(x, z1, z2). If we have proved that for each pair of z1 and z2 there is a unique x such that T(x, z1, z2), then T defines some operation. We may wish to introduce a specific symbol # denoting this operation, and use expressions like z1 # z2 in our reasoning. Then x=z1# z2 will be just another record of the formula T(x, z1 , z2). This can be done safely, since for example, the assertion F(z1 # z2) we can reformulate as Ax(T(x, z1, z2) -> F(x)).

The possibility of safe introducing additional constants and operation symbols is widely used in semi-formal reasoning of set theory.

Note. The above explanation somewhat simplifies the problem. For full treatment see the excellent book by E. Mendelson [1997].

Exercise 2.6. a) Prove that for each pair of sets x1, x2 the intersection x1^x2 and the difference x1-x2 also are sets. Justify the usage of both operation symbols.

b) Prove that if A<=B, and A is a proper class, then B also is a proper class.

The statement (b) allows to prove that the class of all sets V = {y | y=y} is a proper class (indeed, R<=V, where R is Russell's class).

The separation schema allows only obtaining smaller sets from larger ones. This means that without additional axioms we would be able to prove only the existence of the "smallest" set - the empty set. So, let us adopt some "positive" axioms, too.

Pairing Axiom

As the first "positive" axiom let us consider the Pairing Axiom. If we have two sets x1 and x2, then we can build a new set containing x1 and x2 as its only members. We will denote this set by {x1, x2}(if x1=x2, then we will write simply {x1}). To make this way of reasoning legal, we must adopt as an axiom the following formula:

Ax1Ax2EzAy ( y in z <-> y=x1 V y=x2). --------- (C22)

Of course, C22 is a comprehension axiom, namely, C2[y=x1 V y=x2]. In terms of classes: the pairing axiom says that if x1 and x2 are sets, then the class {x1, x2} also is a set.

The set {x1, x2} represents the so-called unordered pair. How to introduce in our theory the notion of ordered pair, which is important, for example, as a base for the notion of function? We will denote the ordered pair of x1 and x2 by (x1, x2).

N.Wiener. A simplification of the logic of relations. "Proc. of the Cambridge Philos. Soc.", 1914, vol. 17, pp.387-390

K.Kuratowski. Sur la notion d'ordre dans la theorie des ensembles. "Fund. Math.", 1921, Vol. 2, pp.161-171

The following elegant method of deriving (x1, x2) from {x1, x2} was proposed in the above papers of Norbert Wiener and Kazimierz Kuratowski:

(x1, x2) = {{x1}, {x1, x2}}.

Intuitively, x1 and x2 have different positions in the right hand side expression, i.e. they seem to be "ordered". Do not underestimate this simple invention - without it we would need an additional construct in the language of set theory!

Exercise 2.7. Justify the definition by Wiener and Kuratowski by proving that

(x1, x2) = (y1, y2) -> x1=y1 & x2=y2.

And, by the way, if x1 and x2 are different, then (x1, x2) and (x2, x1) also are different.

Using the notion of ordered pairs we can introduce the notions of product, relation and function.

The product of classes A, B is defined as follows:

A*B = { (u, v) | u in A & v in B },

or, more precisely:

A*B = { z | EuEv(u in A & v in B & z=(u, v) }.

If A and B are sets, then the product A*B also will be a set? Sorry, to legalize this conclusion we must wait for additional axioms.

Relations are defined as classes that consist only of ordered pairs. I.e. the class Q={ y | F(y) } will be called a relation, iff

Ay( y in Q -> EuEv y=(u, v) ).

Each formula F1(u, v) having two free variables defines a relation:

Q = { (u, v) | F1(u, v) }.

And conversely, for each relation Q we can build a formula F1(u, v) such that

(u, v) in Q <-> F1(u, v).

Some time later we will prove that some relations are proper classes, for example:

E = { (u, v) | u=v }, C = { (u, v) | u in v }.

If we had only the axioms C21 and C22, we could not build sets that possess more than two members. Therefore, let us consider the next operation for building sets - the union of sets.

The simplest case is the union xUy, yet in general we can merge arbitrary collections of sets. I.e. for any class B we can define the union of all members of B:

UB = { y | Ez(y in z & z in B) }.

In other words, the union UB consists of all "members of members" of B. By the way, U{x, y} represents the well-known xUy.

Union Axiom

The next axiom we adopt is the Union Axiom. It says that if x is a set, then Ux also is a set:

AxEu(Ay(y in u <-> Ez(y in z & z in x)). ----------- (C23)

This axiom is a comprehension axiom, namely, C2[Ez(y in z & z in x)].

Exercise 2.8. a) Prove that if B is a proper class, and x is a set, then the difference B-x is a proper class.

b) Show that the axioms C21, C22 and C23 allow to prove the existence of any set which can be defined by using a finite number of the empty set symbols o, commas and braces, for example, {o, {o, {o}}, {o, {o, {o}}}}.

c) (Open problem for students). My feeling says that the set theory C1+C21+C22+C23 is equivalent (in the sense of Section 3.2) to the first order arithmetic PA (defined in Section 3.1). Prove or disprove this conjecture. See also Exercise 2.13c.

Now we can prove that, if A and B are non-empty classes, and their product A*B is a set, then A and B also are sets. Indeed, let v0 be a member of B, then for each u in A:

(u, v0) in A*B, {{u}, {u, v0}} in A*B, {u} in U(A*B), u in UU(A*B).

Hence, A<=UU(A*B). Since A*B is a set, then UU(A*B) also is a set, hence, so is A. Similar argument proves that B also is a set. (For a proof that, if A and B are sets, then the product A*B also is a set, we must still wait!)

Exercise 2.9. Prove that the relations E and C defined above are proper classes.

Power-Set Axiom

The following rather complicated way of building sets was invented, perhaps, as late as in 1870s - during the attempts to derive the definition of real numbers from the properties of rational numbers. It appeared that speaking about "arbitrary" real numbers involves inevitably speaking about arbitrary sets of natural numbers. Let us consider, for example, the definition of real numbers by means of infinite binary fractions. Any such fraction, for example,

0.10101100110000101110...

"generates" some set of natural numbers. The above example generates the set

{1, 3, 5, 6, 9, 10, 15, 17, 18, 19, ...}.

In this way we can "generate" all the possible sets of natural numbers.

In general, this new operation is defined as follows. If x is a set, let us consider all possible subsets of x, i.e. all y's such that y<=x, or Az(z in y -> z in x). Let us denote the class of all subsets of x by

P(x) = { y | y<=x}

(P stands for "power"). We wish to postulate that if x is a set, then P(x) also is a set. Thus, we adopt the following Power-Set Axiom:

AxEzAy(y in z <-> y<=x). --------------- (C24)

Of course, C24 is a comprehension axiom, namely, C2[y<=x].

Now we can prove that the product of two sets also is a set:

x*y = { (u, v) | u in x & v in y }.

Indeed, if u in x and v in y, then {u} in P(x) and {u, v} in P(xUy). Hence, (u, v) = {{u}, {u, v}} in PP(xUy), i.e. the class x*y is a part of the set PP(xUy). Q.E.D.

Replacement Axiom Schema

Functions are defined as relations that possess the "mapping" property:

(u, v1) in F & (u, v2) in F -> v1=v2,

or

AuAv1Av2 (F(u, v1) & F(u, v2) -> v1=v2).

If F is a function, then F(u)=v can be used as a convenient record of "(u, v) in F". Some functions are proper classes, for example, the above-mentioned identity function E = {(u, v) | u=v}.

The well-known notions of domain and range of the function F can be defined as follows:

domain(F) = { u | Ev F(u)=v }

range(F) = { v | Eu F(u)=v }.

If F is a proper class, then, in general, domain(F) and range(F) also will be proper classes. For example, domain(E) = range(E) = V.

Exercise 2.10. Prove that F is a set, iff domain(F) and range(F) both are sets.

The "zero"-constant-function, i.e. F0 = {(u, v) | v=o}, has a proper class domain (domain(F0)=V), yet its range is a set (range(F0)={o}). Still, how about the fourth possibility - can a function have a set domain and a proper class range, i.e. can a function map a set onto a proper class? Of course, we do not wish such functions, yet to prohibit them we must adopt additional axioms - the so-called replacement axiom schema. You will not find these axioms in Zermelo's 1908 paper. They were proposed in 1921 by Abraham Fraenkel:

A. A. Fraenkel. Zu den Grundlagen der Cantor-Zermeloschen Mengenlehre. "Math. Annalen", 1922, Vol. 86, pp. 230-237

If F is a function, then for any class B we will denote by F"B the F-image of B:

F"B = { v | Eu (u in B & F(u)=v) }.

Exercise 2.11. Prove that if a function f is a set, then for any class B the image f"B is a set.

Still, if the function F is a class, and b is a set, then the image F"b is ... a set? Imagine, you take members of the set b one by one, and replace each member y by F(y). The result is F"b. Of course, we wish F"b to be a set. So, let us adopt the Replacement Axiom Schema:

F is a function -> AbEc F"b=c.

Or, more precisely, let F(u, v, z1, ..., zn) be a formula that does not contain v1, v2, x, y, b, c, then we adopt as the axiom C25[F] the following formula:

AuAv1Av2 (F(u, v1, z1, ..., zn) & F(u, v2, z1, ..., zn) -> v1=v2) -> AbEcAy(y in c <-> Ex(x in b & F(x, y, z1, ..., zn))).

Of course, this is again a comprehension axiom, namely, C2[Ex(x in b & F(x, y))], yet we allow to apply it only after we have proved that for each x there is only one (or none) y such that F(x, y).

Exercise 2.12. a) Prove that if B is a proper class, and b is a set, then no one-to-one function can map B into b.

b) Derive the separation axiom schema C21 from the replacement axiom schema C25.

c) (J.Mycielski) Derive the pairing axiom C22 from C21, C24 and C25. (Hint: apply first C21, then twice - C24, and finally - C25).

Axiom of Infinity

The axioms of set theory we have so far allow proving the existence only of finite sets. Indeed, if you wish, you can verify easily that all these axioms hold when interpreted in the area of sets, which can be defined by using of a finite number of the empty set symbols o, commas and braces, for example, {o, {o, {o}}, {o, {o, {o}}}}. Hence, our next step must be adopting of some axiom of infinity.

The approach used below is somewhat non-traditional: I will try to introduce the axiom of infinity as a comprehension axiom.

First let us declare that some of the simplest finite sets are natural numbers (by definition). Namely, the empty set o we declare to be zero, the set {o} - to be the number "one", {o, {o}}, or {0, 1} will be the number 2, {0, 1, 2} - the number 3, etc. In general, if the set cn is declared to be the number n, then the set cn+1 = cn U {cn} we declare to be the number n+1. It seems that we have thus also the set of all natural numbers

{0, 1, 2, 3, ..., n, n+1, ...}?

As we established above, the axioms we have adopted so far do not allow proving the existence of such (infinite!) set. Therefore, let us try to define at least the class of all natural numbers. It appears not an easy task! The easy "solutions" by using "infinite formulas" like as

y=c0 V y=c1 V y=c2 V ... V y=cn V ...

cannot be taken seriously, since the language of set theory allows only finite formulas. To obtain a finite formula N(y) expressing En(y=cn) let us follow an early (1923) idea by John von Neumann:

J.von Neumann. Zur Einfuehrung der transfiniten Zahlen. "Acta Szeged", 1923, 1, pp. 199-208

In 1930s von Neumann's constructions were simplified considerably by R.M. Robinson, K. Goedel and P. Bernays (see Fraenkel, BarHillel, Levy [1973]).

The non-trivial proposal is to formalize the semiformal predicate En(y=cn) as a combination of the following two properties:

a) Transitivity. The set y is called transitive, if it contains all members of each its member, i.e. Au(u in y -> u<=y), or

AuAv (u in v & v in y -> v in y).

b) Double-well-ordering under "in". The relation "<" on the set y is called a (partial) ordering, iff

Ab(b in y -> ~(b<b)) (non-reflexivity),

AbAcAd(b,c,d in y -> (b<c & c<d -> b<d)) (transitivity).

Let us call an ordering "<" of x a double-well-ordering, iff each non-empty subset of y contains a minimum and a maximum member under "<", i.e.

Az(z<=y & Es(s in z) -> Eu(u in z & Av(v in z -> v<u V v=u)) & Eu(u in z & Av(v in z -> u<v V u=v))).

Note. Intuitively, only finite sets can be double well ordered (by any relation).

The second property we introduce requires that y must be double well ordered by the membership relation "in".

Hence, the surprising (finite!) definition of the (infinite!) class of all natural numbers would be:

N = {y | y is transitive & y is double well ordered by "in"}.

Exercise 2.13. a) Show that the "standard" natural numbers cn defined above all are members of the class N. I.e. prove the theorem schema "for all n: cn in N". We do not know how to replace this schema (i.e. an infinite sequence of theorems) by one (finite) theorem.

b) Show that, if x in N and y is the maximum (under "in") member of x, then x-{y} in N, x-{y} in x, and x-{y}=y. Derive from this that for all n: if x in N, and x contains exactly n members, then x= cn.

c) (Open problem for students). Try to prove that all the axioms of the first order arithmetic PA (see Section 3.1) hold in the class N. Probably, you will need only the axioms C21, C22, C23 (see Exercise 2.8c).

Are the results (a) and (b) enough to say that "N is the class of all natural numbers"? For me, this is enough.

If we wish to think of N as of a set, then we must adopt the following Axiom of Infinity:

ExAy(y in x <-> y in N). (C26).

Of course, this is a comprehension axiom, namely, C2[y in N].

Exercise 2.14. Write down the full text of C26 and count the number of characters in it.

The set of all natural numbers is denoted traditionally by w (omega), instead of the above N.

Note. As you see, C26 is somewhat lengthy when compared to other single Zermelo axioms (not axiom schemas, of course!). Perhaps, for this reason, Zermelo and his followers use shorter forms of the axiom of infinity, for example:

Ex(o in x & Ay(y in x -> yU{y} in x)),

or even shorter:

Ex( Ey(y in x) & Ay(y in x -> Ez(z in x & ~(z=y) & y<=z))).

Of course, C26 implies these formulas (simply take w for x). If you wish, prove that the converse also is true.

As a single axiom, C26 allows to prove the existence only of countable sets. To prove the existence of uncountable sets, the power-set axiom C24 must be applied additionally: the set of all sets of natural numbers, i.e. P(w) is uncountable.

Exercise 2.15. Prove that in the set w the semiformal ("second order") Peano axioms hold (see Section 3.1).

The axiom of infinity completes the list of comprehension axioms, which are necessary for reconstruction of common mathematics, i.e. for building of "useful" sets. Is our list (the axioms from C21 to C26) "complete" in the sense that no "acceptable" comprehension axioms will be discovered in the future? The answer is "yes" (Church's thesis for set theory?). We will discuss this problem in Section 2.4.

So, it would be nice to stop at this point and finish our list of axioms by adopting an axiom asserting that no other sets exist - except those, which can be built by using the comprehension axioms? I.e. the axiom: "All sets can be built by using the comprehension axioms". Still, how to put this restriction into one (finite!) formula of set theory?

Open problem. How to put the statement "All sets can be built by using the comprehension axioms" into one formula of set theory? Is this possible at all?

While this problem remains unsolved, let us return to the tradition, and discuss the remaining two axioms - the axiom of regularity and the axiom of choice.

Axiom of Regularity

It appears that the existence of some "abnormal" kinds of sets is consistent with all the axioms we have adopted so far. For example, if you wish to assert the existence of a set x such that "x in x", you can do this safely: no contradiction with our previously adopted axioms will arise.

Exercise 2.16. Prove that this is the case by building a "model" where the "in" relation is interpreted appropriately (i.e. "abnormally"). Similarly, prove the relative consistency of the conjecture "there are two sets x, y such that x in y and y in x".

The generally acknowledged way of excluding such "abnormal" sets from the set theory is the Axiom of Regularity, introduced by J. von Neumann in 1925:

~Ex( Ey(y in x) & Ay(y in x -> Ez(z in x & z in y))). ----- (C3)

J. von Neumann. Eine Axiomatisierung der Mengenlehre. "Journal fuer reine und angewandte Mathematik", 1925, Vol.154, pp. 219-240.

The general idea behind C3 is excluding of infinite chains "down" the membership relation, for example:

... in xn in ... in x3 in x2 in x1 in x.

This is because the axiom of regularity is called sometimes the "foundation axiom".

Exercise 2.17. Derive from C3 that V=R, where R is the Russell's class, i.e. prove that ~(x in x) for all x. Similarly, derive from C3 that there are no two sets x, y such that x in y and y in x.

The set theory adopting the axiom of extensionality (C1), the separation axiom schema (C21), the pairing axiom (C22), the union axiom (C23), the power-set axiom (C24), the replacement axiom schema (C25), the axiom of infinity (C26) and the axiom of regularity (C3), is called Zermelo-Fraenkel set theory, and is denoted by ZF.

Zermelo had in his axiom list also the famous axiom of choice.

Axiom of Choice

If all members of the set x are non-empty sets, then we can try to define a choice function f that assigns to each y in x some f(y) in y. Can we hope to define a choice function for each collection of non-empty sets? At least, we can postulate, that such function always exists. Thus we obtain the Axiom of Choice (AC):

Ax( Ay(y in x -> y not empty) -> Ef(f is function & domain (f) = x & Ay(y in x -> f(y) in y))).

In 1904, Zermelo used this "principle of arbitrary choice" explicitly to prove that each set can be well-ordered. (The ordering "<" of some set x is called well-ordering, iff each non-empty subset of x contains a minimum member under "<".)

E.Zermelo. Beweiss, dass jede Menge wohlgeordnet werden kann. "Math. Annalen", 1904, Vol. 59, pp. 514-516 (see also online comments at http://thoralf2.uwaterloo.ca/htdocs/scav/zermelo/zermelo.html)

This provocative paper by Zermelo was by far not the first time in the history when something like the "principle of arbitrary choice" was used in mathematical proofs. Still, Zermelo dared to state this principle in its most unrestricted form.

Exercise 2.18. a) Prove the converse statement: if the union Ux is well ordered (by some relation "<"), then there is a choice function for x.

b) Derive from AC that each infinite set contains a countable subset.

Note that AC is not a comprehension axiom. The choice function f is not defined by a formula expressing that f(y)=z. The existence of f is merely postulated. The term "principle of arbitrary choice" underlines this extremely non-constructive nature of AC. I.e. we assume that we are able to make infinite number of choices without having any guiding rule. (For the long history of hot emotional discussions "around the axiom of choice" - see Medvedev [1982]).

We can "judge" AC as we please, yet as an axiom of set theory it is absolutely safe: in 1938 Kurt Goedel proved that

"If ZF+AC would be an inconsistent theory, then so would be ZF".

K.Goedel. The consistency of the axiom of choice and of the generalized continuum hypothesis. "Acad. U. S. A.". 1938, Vol. 24, pp.556-557.

The set theory ZF+AC is denoted traditionally by ZFC. By using the axioms of ZFC all theorems of Cantor's intuitive set theory can be proved. (Mathematicians may be interested to verify this themselves - just follow the excellent concise book by T. Jech [1971]: you will be forced to do yourselves 90% of the technical work.) Since the end of XIX century we know that all the other theories of common mathematics can be reformulated in set theory. This completes the first stage of Hilbert's program: convert all existing mathematics into a formal theory.

set theory, axiom, Zermelo, Fraenkel, Frankel, Cantor, Frege, Russell, paradox, formal, axiomatic, axioms, comprehension, infinity

Back to title page.