set theory, axiom, Zermelo, Fraenkel, Frankel, Cantor, Frege, Russell, paradox, formal, axiomatic, axioms, comprehension, infinity

Back to title page.

In the dates and facts of the real history I am following the excellent books by Fyodor Andreevich Medvedev (1922–1992) (see Medvedev [1965] and [1982]). See also online paper "The beginnings of set theory" in the MacTutor History of Mathematics archive.

In XIX century, development process of the most basic mathematical notions led to the intuition of arbitrary infinite sets. Principles of the past mathematical thinking were developed up to their logical limits. Georg Cantor did the last step in this process, and this step was forced by a specific mathematical problem.

**G. Cantor.** Ueber die Ausdehnung eines Satzes aus der
Theorie der trigonometrischen Reihen. "Mathematische
Annalen", 1872, Vol. 5, pp. 123-132 (available at http://www.maths.tcd.ie/pub/HistMath/People/Cantor/Ausdehnung/,
see also online comments at http://thoralf2.uwaterloo.ca/htdocs/scav/cantor/cantor.html).

See also a collection of Cantor web-sites at www.geometry.net.

In this 1872 paper Cantor proved the following theorem: if two
Fourier series are known converging to identical limit values at **all
but a finite number of points** of the segment [-pi, pi], then
these series (i.e. their respective coefficients) are identical.
How far this theorem could be extended?

Cantor started with the simplest kinds of infinite sets of exception points, for example:

{1/n | n >= 1}.

This set is infinite, it possess only one the so-called condensation point (namely, x=0). Cantor succeeded in proving that his theorem holds, if the set of exceptions possess only one condensation point. The generalization to any finite number of condensation points was straightforward.

The next step would be considering sets having infinitely many condensation points. The most simple kind of such sets has exactly one "second order" condensation point, i.e. a condensation point for the usual ("first order") condensation points, for example:

{1/m + 1/n | m, n >= 1}

Cantor succeeded in proving his theorem for such sets of exceptions, too. The following step would involve the "third order" condensation points etc. In this way Cantor was forced to work with sets of points of rapidly growing complexity. Thus, gradually, the intuitive notion of arbitrary infinite sets of points began to take shape...

To bring some order into this process Cantor introduced the
notion of the **derivative set**: if P is a set of points,
then P' denotes the set of all condensation points of P. Further
one can define P'' as (P')', P''' - as (P'')' etc.

**Exercise 2.1.** For any fixed k>=1, let us consider
the set

Q(k) = {1/n_{1} +... + 1/n_{k}
| n_{1},... , n_{k} >= 1}.

Prove that the k-th derivative set Q^{k}(k) = {0}.

Cantor successfully generalized his Fourier series uniqueness
theorem for exception sets of any finite order (i.e. for sets P
having a finite P^{k} for some k).

In this way, investigating Fourier series, and having built a plentiful collection of complicated infinite sets of points, Cantor came to the intuition of arbitrary infinite sets. At some moment he arrived to the question: have all infinite sets equal "number" of members?

This critical point was reached in the fall of 1873. Cantor's letter to Richard Dedekind from September 29, 1873 contains the surprising one-to-one correspondence between natural and positive rational numbers:

1 2 3 4 5 6 7
8 9 …

1/1, 1/2, 2/1, 1/3, 3/1, 1/4, 2/3, 3/2, 4/1,...

--- -------- -------- ------------------

2 3 4
5 …

The sequence starts with the only fraction m/n such that m+n=2, after that follow: two fractions having m+n=3, two fractions having m+n=4 (2/2 is omitted, it equals to 1/1 that was already encountered) etc.

In other words, the (dense!) set of all rational numbers possess the same "number of members" as the (discrete!) set of all natural numbers! As the next step Cantor proposed to try enumerating of all real numbers, i.e. all the points of a straight line. In his reply, Dedekind pointed out that also the set of all algebraic numbers can be enumerated by using natural numbers. Still, he did not succeed in enumerating all the real numbers...

In his next letter to Dedekind (December 7, 1873) Cantor
proved that this is not possible: there is no one-to-one
correspondence between real numbers and natural numbers. Cantor's
proof involves the method that is called now the **diagonal
method** (perhaps, first used for another purpose by P.
du Bois-Reymond in 1871).

Namely, assume that we have some segment [a, b] and some
"enumerating" sequence of real numbers r_{1}, r_{2},...
, r_{n},.... Divide [a, b] into three equal parts, and
take the part that does not contain the number r_{1}.
Denote this part by [a_{1}, b_{1}], divide it
again into three equal parts, and take the part that does not
contain the number r_{2}, etc. This process produces a
sequence of contracting segments:

a_{1} <= a_{2} <= a_{3}
<= ... <= b_{3} <= b_{2} <= b_{1}.

The only common point of these segments is some real number r
that does not belong to the "counting" sequence r_{1},
r_{2},... , r_{n},.... Hence, you cannot
enumerate all real numbers (even a tiny segment of them!) by
using natural numbers.

Thus **there are at least two kinds of infinite sets**: the
so-called countable sets (that can be enumerated by using natural
numbers), and some "more strongly infinite" sets that
cannot be enumerated. This discovery by Georg Cantor is the most
significant event in the history of the mathematical
investigation of infinity. Still, for Cantor itself - at the
moment of the discovery - the following corollary was even more
significant: "most" real numbers are transcendent.
(Indeed, all the algebraic numbers can be enumerated, but the
transcendent ones cannot.) The above Cantor's proof is simple
enough to be followed by children (unlike the 1844 construction
of particular transcendent numbers by J.
Liouville, and the 1873 proof by Ch.
Hermite that e is transcendent).

These first Cantor's set theoretical results were published in 1874:

**G.Cantor.** Ueber eine Eigenschaft des Inbegriffes aller
reelen algebraischen Zahlen, "J. reine und angew.
Math.", 1874, Vol. 77, pp, 258-262 (see also online comments
at http://thoralf2.uwaterloo.ca/htdocs/scav/cantor/cantor.html).

**Exercise 2.2.** Construct one-to-one correspondences
between: a) two segments (of different length), b) between the
segment [a, b] (i.e. the set {x | a<=x<=b}) and the
interval (a, b) (i.e. the set {x | a<x<b}), c) between an
interval and the entire set of all real numbers.

Having discovered the existence of at least two radically different types of infinite sets, Cantor went further. In a letter to Dedekind (January 5, 1874) he asks: is a two-dimensional continuum (for example, a rectangle) equivalent to a one-dimensional continuum (for example, a segment)? In other words: does a rectangle contain more points than a straight-line segment? Cantor conjectured "yes", and spent the following three years trying to prove that a rectangle contains more points than a segment. He did not succeed. Only after he decided to explore the "unrealistic" opposite hypothesis (rectangles and segments contain equal numbers of points), he succeeded almost immediately! His proof (first exposed in a letter to Dedekind from July 20, 1877) is simple enough to be followed by children:

A one-to-one correspondence between the rectangle [0, 1)x[0, 1) and the segment [0, 1] can be produced easily from the decimal fractions of coordinates:

(x, y) -> z

x = 0.abcd...

y = 0.ABCD...

z = 0.aAbBcCdD...

**Exercise 2.3.** Fill in the gaps and complete this proof
(published in 1878):

**G. Cantor.** Ein Beitrag zur Mannigfaltigkeitslehre.
"J. reine und angew. Math.", 1878, Vol. 84, pp. 119-133
(see also online comments at http://thoralf2.uwaterloo.ca/htdocs/scav/cantor/cantor.html).

Cantor was afraid that his simple proof has "destroyed" the notion of dimensionality. Replying to this, Dedekind pointed out that Cantor's correspondence seems to be discontinuous, and conjectured that continuous correspondences between continuums of different dimensionalities would be impossible. Still, G. Peano in 1890 and D. Hilbert in 1891 succeeded in producing continuous (yet not one-to-one!) mappings from a segment onto a rectangle. And only in 1911 L. Brouwer "saved" the notion of dimensionality by proving that Dedekind was right: a continuous one-to-one correspondence between continuums of different dimensionalities is impossible:

**L. Brouwer**. Beweis der Invarianz der
Dimensionenzahl. "Math. Ann.", 1911, Vol. 70, pp.
161-165.

The remarkable paper of 1878 contains another famous statement - the so-called continuum hypothesis. Working hard with various infinite sets of points Cantor established that all infinite sets he could produce, fall into two categories:

a) the so-called **countable** sets (i.e. the sets that can
be enumerated by using natural numbers),

b) the sets that are equivalent to the entire **continuum**
(i.e. the set of all real numbers).

Cantor was unable to produce sets of "intermediate
power", i.e. uncountable sets of points that were not
equivalent to the entire continuum. This is why he conjectured
that such sets do not exist. This conjecture is known as the **continuum
hypothesis**: each infinite set of points either is countable,
or it is equivalent to the entire continuum.

Cantor spent many years trying to prove the continuum
hypothesis. For this purpose, he developed further his apparatus
of derivative sets P^{k}. He introduced condensation
points of infinite order, i.e. points that are simultaneously
condensation points of any finite order, and defined the infinite
derivative set P^{w }("P omega") as the
intersection of P', P'', P''' etc. After this, he introduced the
next derivatives:

P^{w+1} = (P^{w})', P^{w+2} = (P^{w+1})',...

P^{w*2} = intersection of P^{w+k} for all
finite k,

P^{w*2+1},... , P^{w*3}, P^{w*3+1},...

P^{w*w}, P^{w*w+1},...

**Exercise 2.4. **Try to produce a set P such that P^{w}
= {0}, such that P^{w*2} = {0} etc.

In this way Cantor arrived to a remarkable extension of the
natural number system, the so-called **transfinite ordinal
numbers**. These numbers extend the usual finite counting
process:

Finite (i.e. natural) numbers are called first class ordinal numbers.

w (the first transfinite ordinal number) follows after all finite numbers,

w+1 follows immediately after w,

w+2 follows immediately after w+1,

...

w*2 is defined as w+w, i.e. it follows after all w+k where k is finite,

w*3 is defined as w*2+w, i.e. it follows after all w*2+k where k is finite,

...

w^{2} is defined as w*w, i.e. it follows after all w*k
where k is finite, etc.

...

w^{w} follows after all w^{k} where k is
finite, etc.

...

e_{0} (epsilon 0) follows after all expressions built
up of natural numbers and w by using addition, multiplication and
exponentiation,

...

Infinite numbers having a countable set of predecessors are
called **second class ordinal numbers**.

w_{1} (omega 1) follows after all second class ordinal
numbers, i.e. it is the least third class ordinal number.

...

Cantor proved that the set of all second class ordinal numbers
O_{2} is the least uncountable set, i.e. if some infinite
subset S of O_{2} is not equivalent to the entire O_{2},
then S is countable. Thus, to prove the continuum hypothesis, it
was sufficient to prove that O_{2} is equivalent to the
entire continuum, i.e. "simply" to produce a one-to-one
correspondence between the second class ordinal numbers and the
real numbers. Still, Cantor and all his numerous followers failed
to do this...

Thus, in some sense, the continuum hypothesis represents one of the most beautiful problems in mathematics: it can be explained to children, yet it still remains unsolved for 130 years!

Cantor was already tired of many years of unsuccessful attempts to prove the continuum hypothesis, when he received another blow. In 1895 he discovered in his set theory... a contradiction. In 1897 another contradiction was found (and published) by C.Burali-Forti...

Still, all these difficulties cannot change the "verdict of the history": Georg Cantor remains one of the most outstanding personalities in the history of mathematics. He succeeded in developing the principles of the past mathematical thinking up to their logical limits.

First let us formalize the set theory directly as G. Cantor created it. This theory is based on the intuitive concept of "a world of sets" where all sets (finite and infinite) and all their members exist simultaneously and completely. In our axioms we would like to describe the governing laws of this frozen "world of sets".

At the very beginning we must answer the following question:
can our "world" consist of sets only? A set can consist
of members, which are sets. If the set x_{1} contains a
member x_{2,} which is also a set, if x_{2}
contains x_{3}, x_{3} contains x_{4} etc.
- then, following along this chain must we not encounter
something more tangible than "sets of sets"? If nothing
tangible exists, how can sets exist? If nothing exists, then the
world is ... an empty set o (i.e. from nothing we obtain
something). Having o we can build the set which contains o as a
member - the set {o}. Having o and {o} we can build a two-element
set {o, {o}} etc.:

x_{0} = o; x_{1} = {o}; x_{2}
= {o, {o}}; ...; x_{n+1} = x_{n} U {x_{n}};
...

Therefore, even postulating that "nothing exists" we can obtain infinitely many different sets (compare Devlin [1977]).

Now we can define the **language of the formal set theory**.
We will use only one sort of variables: x, y, z, ... -
subscripted or not. Intuitively, the range of each variable
consists of "all possible sets" (since we have decided
that the "world of sets" consists from sets only). We
do not introduce constants (like as o to denote the empty set)
and function symbols (like as xUy to denote union of sets). Later
we will see how to do without these symbols. We introduce two
predicate symbols: "in" (intuitively, "x in
y" means "x is a member of y"), and "="
(intuitively, "x=y" means "x is the same set as
y"). We can combine atomic formulas like as "x in
y" and "x=y" by using connector and quantifier
symbols, thus obtaining **formulas of set theory**. For
example, the formula Ay ~(y in x) says that x is empty set, and
the formula EyAz (z in x -> z=y) says that x possess exactly
one member.

**Exercise 2.4a**. Provide formulas expressing
the following assertions:

"x possess exactly two members",

"x possess exactly five members",

"x is a subset of y (i.e. all the members of x are members of y)",

"x and y do not intersect (i.e. x and y do not possess common members)",

"there is an infinite set".

As a serious formal theory, set theory adopts axioms and inference rules of the classical logic (see Appendix 3). After this we must define the specific meaning of the identity predicate in the set theory: x=y means that the sets x and y have the same members. It may seem obvious, yet it requires a special axiom - you cannot derive this meaning from pure logic. Using pure logic you can derive (check it!) only that

x=y -> Az (z in x <-> z in y).

This is normal: if x equals to y, then x an y have the same properties. Still, if the sets x and y have different definitions, and after some effort we could establish that Az(z in x <-> z in y), then we can conclude that x=y only by using the specific meaning of identity adopted in set theory. Logic as such is indifferent to the meaning of identity, it postulates only some general properties of this predicate. Therefore, to define the specific meaning of set identity we must introduce the following axiom:

Az (z in x <-> z in y) -> x=y. --------------- (C1)

This axiom is called the **Extensionality Axiom** (it says
that in our set theory the identity of sets is treated
extensionally, not intensionally, i.e. two different set
definitions can lead to identical sets).

Since we adopted the classical logic for our "world of sets", we can prove the formula Ex(x=x), i.e. that at least one set exists in our "world". Still, pure logic does not allow to conclude something about the properties of this set x. To obtain sets having specific properties (for example, the empty set) we must introduce additional axioms.

The main principle of Cantor's set theory says that a set is a "many" of which we can think as of a single "whole". How could we think of many things as of a whole? One of the ways would be to define the combination of properties which allows to separate things as belonging to the whole and not belonging to it. Now, the corresponding notation is used widely in mathematics, for example, c = { x | C(x) }, where C(x) says that "x is a crocodile". Mathematicians say that c is the "set of all crocodiles" and may operate with c as with a single object.

In the modern language we formulate Cantor's principle as the **Comprehension
Axiom Schema**. Let F(y) be a formula of set theory (it may
contain additonal free variables, let us call them parameters).
We may think of F(y) as of an assertion "the set y possess
the property F". Hence, according to the Cantor's principle
we can introduce the set x of all y's possessing the property F:
x= { y | F(y) }. To legalize operations like this one we must
adopt the following axiom schema:

ExAy (y in x <-> F(y)) -------------- (C2[F])

(of course, formula F does not contain x). For each formula F we have a separate comprehension axiom C2[F].

In particular, the axiom C2[~(y=y)] allows to prove the existence of the empty set: ExAy (y in x <-> ~(y=y)). Hence, ExAy ~(y in x), i.e. x is the empty set.

**Exercise 2.5.** Using appropriate comprehension axioms,
prove the existence of the following sets (o is the empty set):
{o}, {o, {o}}, {{{o}}}. Can you prove the existence of at least
one infinite set? (If not, see how this **can** be
done a la von Neumann in Section 2.3.)

The axioms C1 and C2[F] (for all formulas F that do not contain x) and the axiom of choice define a formal set theory C which corresponds almost 100% to Cantor's intuitive set theory (of the "pre-paradox" period of 1873-94).

**Cantor and the Axiom of Choice**

Of course, in 1873-94 Cantor believed in the unrestricted comprehension principle. Still, did Cantor really accept a kind of axiom of choice as a valid principle of set theory? Let us check his two main papers on foundations of set theory:

**G.Cantor**. Grundlagen einer allgemeinen
Mannigfaltigkeitslehre. "Math. Annalen", 1883, Vol.21,
pp.545-586

**G.Cantor**. Beitraege zur Begruendung der transfiniten
Mengenlehre. "Math. Annalen", 1895, Vol.46, pp.481-512;
1897, Vol.49, pp.207-246

Both papers were reprinted (with extensive comments by E.Zermelo) in

**G.Cantor**. "Gesammelte Abhandlungen",
Springer, 1932

I am using the book by F.A.Medvedev "The early history of the axiom of choice" (Medvedev [1982]) that contains extensive relevant quotes from Cantor's works and Zermelo's comments.

Two main conclusions:

1. Cantor is using ad hoc - as he needs, and tens of times! - the "possibility of arbitrary choices" without any attempt to formulate something like a general axiom of choice. For example, when proving that each infinite set contains a countable subset (a quote from the 1895-97 paper):

"If we have some rule of deleting of elements t_{1},
t_{2}, ..., t_{n-1} from [an infinite set - K.P.]
T, then always there is a possibility to delete one more element
t_{n}..."

2. Cantor believed in "validity" of some assertions that are equivalent (or almost equivalent) to the axiom of choice. For example, in the 1883 paper he qualifies the thesis "each well defined set can be well ordered" as a "remarkable generally valid law of thought" and promises to consider it in one of subsequent papers. Still, he never did, and in the 1895-97 paper this thesis does not appear at all. Because of the "smell" of paradoxes?

Today, we can guess, did Cantor's thesis "each well
defined set can be well ordered" mean that **all** sets
can be well ordered (this would be equivalent to the axiom of
choice, see below)? Or, maybe, Cantor intended (already in 1883?)
to distinguish between "well defined" and "ill
defined" sets? This would mean that Cantor believed only
that "each constructible set can be well ordered"
(theorem proved by K.Goedel in 1938, see below).

Of course, we would be able to derive from our formal set
theory C (i.e. from a very simple set of axioms!) all the common
mathematics. **A very remarkable fact - 100% of mathematics can
be derived from an extremely simple set of axioms! **Still,...
these axioms are also sufficient... to derive a contradiction!

A very simple way how to do this was invented by Bertrand Russell in 1902, and is now called Russell's paradox. Russell discovered his paradox in Frege's formal system of mathematics (see Section 6.1), yet his method applies to Cantor's set theory as well. In terms of our formal set theory C Russell's paradox can be derived as follows.

Normally, sets are not members of themselves, i.e. normally, ~(y in y), for example, ~(o in o). Still, the axioms of C do not exclude the existence of "abnormal" sets, which are members of themselves. Hence, let us consider the set of all "normal" sets: x = { y | ~(y in y) }. The existence of this x is guarantied by the comprehension axiom C2[~(y in y)]:

ExAy (y in x <-> ~(y in y)).

Now substitute x for y, and you will have a contradiction:

Ex (x in x <-> ~(x in x)).

Hence, unexpectedly, some of the comprehension axioms lead to contradictions. Cantor's set theory C is inconsistent. The unrestricted comprehension axiom scheme cannot serve as a foundation of set theory!

**Note**. The real history of paradoxes in set theory was
more complicated. In Russell's paradox the notion of
"abnormal" sets (x in x) is involved, i.e. this paradox
could not be discovered in the intuitive set theory, and was
discovered first in Frege's formal system of mathematics. Cantor
discovered the first paradox of set theory himself in 1895. The
next paradox discovered - and published - C.Burali-Forti in 1987.
These paradoxes were more complicated than Russell's paradox, yet
much easier to discover! See Fraenkel,
Bar Hillel, Levy [1973].

**Note**. In the above text the terms "invented"
and "discovered" were used as synonyms. Mathematics is
the only branch of science for which this is true (see Section
1.2).

Today, 100 years after, perhaps, the discovery of paradoxes in mathematics would not be percepted as a catastrophe. Since then, humans have seen things much worse than paradoxes. Still, for G.Cantor (1845 - 1918), R.Dedekind (1831 - 1916) and G.Frege (1848 - 1925), who started to believe in the unrestricted comprehension schema in 1870's and lived with this absolute belief for more than 20 years, the discovery of paradoxes was a personal tragedy. For Georg Cantor - his set theory, and for Gottlob Frege - his formal system of mathematics were their main and only contributions to mathematics. After the discovery of paradoxes they published nothing comparable with their outstanding works of the previous period. They could not find an exit. The exit was found by mathematicians of the next generation.

The crucial idea is due to Ernst Zermelo. He proposed to restrict the comprehension axiom schema by adopting only of those axioms, which are really necessary for reconstruction of common mathematics. In 1908 Zermelo published his account:

**E. Zermelo.** Untersuchungen ueber die Grundlagen der
Mengenlehre. "Math. Annalen", 1908, Vol. 65, pp.
261-281 (see English translation in Heijenoort [1967], see also
online comments at http://thoralf2.uwaterloo.ca/htdocs/scav/zermelo/zermelo.html)

We will consider Zermelo's axioms some time later. Of course, you will not find the axiom C2[~(y in y)] among them, since such kind of reasoning is not used in common mathematics.

Still, how to solve the situation, when we have some formula
F(y), we use it to collect the sets y having the property F, yet
trying to treat this collection as a set, we obtain a
contradiction? For example, what to do with the Russell's
collection R = {y | ~(y in y)}? The empty set o belongs to it:
~(o in o), hence, o in R. Still, if you will consider this
collection R as a set and denote it by r, then you will have the
Russell's paradox: "r in r" is equivalent to ~(r in r).
How to solve this problem? Let us use the well-known method due
to King Solomon: let us consider Russell's paradox as the **proof
**that Russell's collection **is not a set**!

Using this approach we must legalize some collections of sets
that are not sets. Let F(y, z_{1}, ..., z_{n}) be
a formula in the language of set theory (z_{1}, ..., z_{n}
are optional parameters). Then let us say that for any fixed
values of z_{1}, ..., z_{n} the formula F defines
a **class**

A = {y | F(y, z_{1}, ..., z_{n})},

i.e. the class of all y's possessing the property F. In
general, different values of z_{1}, ..., z_{n}
yield a different class. For example, the class { y | ~(y=z_{1})}
consists of all sets except the set z_{1}, i.e. it
depends on the parameter z_{1}. On the other hand, the
class

V = {y | y=y}

consists of all sets, and does not depend on parameters.

Each set is a class: the set x coincides with the class {y | y in x}. In its general (inconsistent!) form, the comprehension axiom schema says that all classes are sets:

ExAy(y in x <-> F(y, z_{1}, ...,
z_{n})).

Still, "really", some classes are not sets. For
example, the Russell's class R = {y | ~(y in y)} cannot be a set:
if R=x, then "y in x" is equivalent to ~(y in y), and
by setting y=x we obtain a contradiction. Classes that are not
sets we will call **proper classes**. Now we can say simply
that each paradox of set theory "generates" some proper
class.

We will denote classes by capital letters: A, B, C, ... (small letters a, b, c, ... are variables of the language of set theory, i.e. they denote sets). This notation must remind to us of the metaphoric character of the following "formulas":

y in A, A=B, A<=B, A^B, AUB, A-B.

These "formulas" do not belong to the language of set theory, which does not contain "capital" variables. They are used merely as a convenient notation for the following ("proper") formulas:

F(y), Ay(F(y)<->G(y)), Ay(F(y)->G(y)), F(y)&G(y), F(y)VG(y), F(y)&~G(y),

where the formula F defines the class A, and the formula G defines B.

Now we can start formulating the axioms of set theory as Zermelo proposed them. We will use the same formal language of set theory introduced in the previous section. As the first axiom we adopt the extensionality axiom C1. And we will adopt only those comprehension axioms that are really used in mathematics for building of "useful" sets.

**Separation Axiom Schema**

Perhaps, the simplest way to obtain new sets would be **separation**:
having a set x separate those members of x that possess some
common property F. For example, in this way the set of all prime
numbers is obtained from the set of all natural numbers. In
general, the situation could be represented as follows:

|------------|-----------------y in
x------------------|---------------------|

|-----------------------------------|-------------F(y)-------------|--------|

|-----------------------------------|------y in
z------|---------------------|

The condition F(y) is any formula of set theory (it may
contain parameters z_{1}, ..., z_{n}). Using this
condition we separate those members of the set x that satisfy the
condition F. The members separated make up a new set denoted by
z.

To legalize this way of reasoning we must introduce the
following **Separation Axiom Schema**: if F(y, z_{1},
..., z_{n}) is any formula that contains free variables
y, z_{1}, ..., z_{n}, but does not contain x and
z, then the following formula is declared to be an axiom:

EzAy(y in z <-> y in x & F(y, z_{1},
..., z_{n}). --------------- (C21[F])

Of course, this schema is a part of the general comprehension
schema, namely, it is C2[y in x & F(y, z_{1}, ..., z_{n})].
An alternative, extremely convenient form of the separation
schema can be obtained by using the notion of classes: the
formula F defines a class A, hence, the axiom C21[F] says that
the intersection A^x (of the class A and the set x) is a set:
A^x=z.

Now we can prove the **existence of the empty set**, i.e
the formula EzAy ~(y in z). Indeed, from the logical axioms we
know that some sets exist, let z_{0} be a set. Then,
using the impossible condition ~(y=y) and the axiom C21[~(y=y)]
we obtain a set x such that

Ay(y in x <-> y in z_{0} &
~(y=y)).

hence, Ay ~(y in x). Q.E.D.

On the other hand, the axiom of extensionality implies that
there is only one empty set. Indeed, if the sets x_{1}, x_{2}
both are empty, i.e. Ay ~(y in x_{1}) and Ay ~(y in x_{2}),
hence, Ay(y in x_{1} <-> y in x_{2}) and
(by C1) x_{1}=x_{2}.

**Note.** Do not underestimate this simple invention - the
empty set. You may think safely, that the empty set is not a set.
Still, this would make the treatment of, for example, set
intersection very uneasy - it would be sometimes a set, and
sometimes - empty!

It would be nice to denote the empty set, for example, by o, yet our language of set theory does not contain constants. If we would introduce one, this would not solve the problem completely, because after the first constant we may wish to introduce the second one etc. Therefore, it would be better to do without any constants at all. Let us see how this can be achieved.

If we wish to use the empty set constant o in our reasoning legally, we must provide a method allowing to translate our "o-extended" formulas into the pure language of set theory that does not contain o. This can be done easily. Indeed, if we assert that the empty set o possess some property expressed by a formula F(x), i.e. we use the formula F(o), then this assertion can be expressed also by the formula

Ax(Ay ~(y in x) -> F(x)),

i.e. if x is empty set, then x possess the property F. This formula does not contain the constant o. Hence, we can use the constant o in our reasoning safely. If needed, we can reformulate any such reasoning by using (more complicated) formulas that do not contain o.

The above approach can be generalized. Let us assume that we have proved the existence and uniqueness of the set satisfying some formula T(x). I.e. we have proved the following two formulas:

Ex T(x),

Ax_{1}Ax_{2} ( T(x_{1})
& T(x_{2}) -> x_{1}=x_{2 }).

Then we may introduce a constant t denoting the above-mentioned unique set, and use it in our reasoning safely. Any assertion like as F(t) (i.e. "t possess the property F") we can reformulate without using of t: Ax(T(x) -> F(x)).

Still, the formula T could contain parameters, for example,
T(x, z_{1}, z_{2}). If we have proved that for
each pair of z_{1} and z_{2} there is a unique x
such that T(x, z_{1}, z_{2}), then T defines some
**operation**. We may wish to introduce a specific symbol #
denoting this operation, and use expressions like z_{1} #
z_{2} in our reasoning. Then x=z_{1}# z_{2}
will be just another record of the formula T(x, z_{1} , z_{2}).
This can be done safely, since for example, the assertion F(z_{1}
# z_{2}) we can reformulate as Ax(T(x, z_{1}, z_{2})
-> F(x)).

The possibility of safe introducing additional constants and operation symbols is widely used in semi-formal reasoning of set theory.

**Note.** The above explanation somewhat simplifies the
problem. For full treatment see the excellent book by E. Mendelson [1997].

**Exercise 2.6.** a) Prove that for each pair of sets x_{1},
x_{2} the intersection x_{1}^x_{2 }and
the difference x_{1}-x_{2} also are sets. Justify
the usage of both operation symbols.

b) Prove that if A<=B, and A is a proper class, then B also is a proper class.

The statement (b) allows to prove that the **class of all
sets **V = {y | y=y} **is a proper class** (indeed, R<=V,
where R is Russell's class).

The separation schema allows only obtaining smaller sets from larger ones. This means that without additional axioms we would be able to prove only the existence of the "smallest" set - the empty set. So, let us adopt some "positive" axioms, too.

**Pairing Axiom**

As the first "positive" axiom let us consider the **Pairing
Axiom**. If we have two sets x_{1} and x_{2},
then we can build a new set containing x_{1} and x_{2}
as its only members. We will denote this set by {x_{1}, x_{2}}(if
x_{1}=x_{2}, then we will write simply {x_{1}}).
To make this way of reasoning legal, we must adopt as an axiom
the following formula:

Ax_{1}Ax_{2}EzAy ( y in z
<-> y=x_{1} V y=x_{2}). --------- (C22)

Of course, C22 is a comprehension axiom, namely, C2[y=x1 V
y=x2]. In terms of classes: the pairing axiom says that if x_{1
}and x_{2} are sets, then the class {x_{1},
x_{2}} also is a set.

The set {x_{1}, x_{2}} represents the
so-called **unordered pair**. How to introduce in our theory
the notion of **ordered pair**, which is important, for
example, as a base for the notion of function? We will denote the
ordered pair of x_{1} and x_{2} by (x_{1},
x_{2}).

**N.Wiener. **A simplification of the logic of relations.
"Proc. of the Cambridge Philos. Soc.", 1914, vol. 17,
pp.387-390

**K.Kuratowski. **Sur la notion d'ordre dans la theorie des
ensembles. "Fund. Math.", 1921, Vol. 2, pp.161-171

The following elegant method of deriving (x_{1}, x_{2})
from {x_{1}, x_{2}} was proposed in the above
papers of Norbert
Wiener and Kazimierz
Kuratowski:

(x_{1}, x_{2}) = {{x_{1}},
{x_{1}, x_{2}}}.

Intuitively, x_{1} and x_{2} have different
positions in the right hand side expression, i.e. they seem to be
"ordered". Do not underestimate this simple invention -
without it we would need an additional construct in the language
of set theory!

**Exercise 2.7.** Justify the definition by Wiener and
Kuratowski by proving that

(x_{1}, x_{2}) = (y_{1},
y_{2}) -> x_{1}=y_{1} & x_{2}=y_{2}.

And, by the way, if x_{1} and x_{2} are
different, then (x_{1}, x_{2}) and (x_{2},
x_{1}) also are different.

Using the notion of ordered pairs we can introduce the notions of product, relation and function.

The **product of classes** A, B is defined as follows:

A*B = { (u, v) | u in A & v in B },

or, more precisely:

A*B = { z | EuEv(u in A & v in B & z=(u, v) }.

If A and B are sets, then the product A*B also will be a set? Sorry, to legalize this conclusion we must wait for additional axioms.

**Relations** are defined as classes that consist only of
ordered pairs. I.e. the class Q={ y | F(y) } will be called a
relation, iff

Ay( y in Q -> EuEv y=(u, v) ).

Each formula F1(u, v) having two free variables defines a relation:

Q = { (u, v) | F1(u, v) }.

And conversely, for each relation Q we can build a formula F1(u, v) such that

(u, v) in Q <-> F1(u, v).

Some time later we will prove that some relations are proper classes, for example:

E = { (u, v) | u=v }, C = { (u, v) | u in v }.

If we had only the axioms C21 and C22, we could not build sets
that possess more than two members. Therefore, let us consider
the next operation for building sets - the **union** of sets.

The simplest case is the union xUy, yet in general we can merge arbitrary collections of sets. I.e. for any class B we can define the union of all members of B:

UB = { y | Ez(y in z & z in B) }.

In other words, the union UB consists of all "members of members" of B. By the way, U{x, y} represents the well-known xUy.

**Union Axiom**

The next axiom we adopt is the **Union Axiom**. It says
that if x is a set, then Ux also is a set:

AxEu(Ay(y in u <-> Ez(y in z & z in x)). ----------- (C23)

This axiom is a comprehension axiom, namely, C2[Ez(y in z & z in x)].

**Exercise 2.8.** a) Prove that if B is a proper class, and
x is a set, then the difference B-x is a proper class.

b) Show that the axioms C21, C22 and C23 allow to prove the existence of any set which can be defined by using a finite number of the empty set symbols o, commas and braces, for example, {o, {o, {o}}, {o, {o, {o}}}}.

c) (Open problem for students). My feeling says that the set theory C1+C21+C22+C23 is equivalent (in the sense of Section 3.2) to the first order arithmetic PA (defined in Section 3.1). Prove or disprove this conjecture. See also Exercise 2.13c.

Now we can prove that, if A and B are non-empty classes, and
their product A*B is a set, then A and B also are sets. Indeed,
let v_{0} be a member of B, then for each u in A:

(u, v_{0}) in A*B, {{u}, {u, v_{0}}}
in A*B, {u} in U(A*B), u in UU(A*B).

Hence, A<=UU(A*B). Since A*B is a set, then UU(A*B) also is a set, hence, so is A. Similar argument proves that B also is a set. (For a proof that, if A and B are sets, then the product A*B also is a set, we must still wait!)

**Exercise 2.9.** Prove that the relations E and C defined
above are proper classes.

**Power-Set Axiom**

The following rather complicated way of building sets was invented, perhaps, as late as in 1870s - during the attempts to derive the definition of real numbers from the properties of rational numbers. It appeared that speaking about "arbitrary" real numbers involves inevitably speaking about arbitrary sets of natural numbers. Let us consider, for example, the definition of real numbers by means of infinite binary fractions. Any such fraction, for example,

0.10101100110000101110...

"generates" some set of natural numbers. The above example generates the set

{1, 3, 5, 6, 9, 10, 15, 17, 18, 19, ...}.

In this way we can "generate" all the possible sets of natural numbers.

In general, this new operation is defined as follows. If x is
a set, let us consider all possible **subsets** of x, i.e. all
y's such that y<=x, or Az(z in y -> z in x). Let us denote
the class of all subsets of x by

P(x) = { y | y<=x}

(P stands for "power"). We wish to postulate that if
x is a set, then P(x) also is a set. Thus, we adopt the following
**Power-Set Axiom**:

AxEzAy(y in z <-> y<=x). --------------- (C24)

Of course, C24 is a comprehension axiom, namely, C2[y<=x].

Now we can prove that the **product of two sets also is a set**:

x*y = { (u, v) | u in x & v in y }.

Indeed, if u in x and v in y, then {u} in P(x) and {u, v} in P(xUy). Hence, (u, v) = {{u}, {u, v}} in PP(xUy), i.e. the class x*y is a part of the set PP(xUy). Q.E.D.

**Replacement Axiom Schema **

**Functions** are defined as relations that possess the
"mapping" property:

(u, v_{1}) in F & (u, v_{2})
in F -> v_{1}=v_{2},

or

AuAv_{1}Av_{2} (F(u, v_{1})
& F(u, v_{2}) -> v_{1}=v_{2}).

If F is a function, then F(u)=v can be used as a convenient record of "(u, v) in F". Some functions are proper classes, for example, the above-mentioned identity function E = {(u, v) | u=v}.

The well-known notions of **domain** and **range** of
the function F can be defined as follows:

domain(F) = { u | Ev F(u)=v }

range(F) = { v | Eu F(u)=v }.

If F is a proper class, then, in general, domain(F) and range(F) also will be proper classes. For example, domain(E) = range(E) = V.

**Exercise 2.10.** Prove that F is a set, iff domain(F) and
range(F) both are sets.

The "zero"-constant-function, i.e. F_{0} =
{(u, v) | v=o}, has a proper class domain (domain(F_{0})=V),
yet its range is a set (range(F_{0})={o}). Still, how
about the fourth possibility - can a function have a set domain
and a proper class range, i.e. can a function map a set onto a
proper class? Of course, we do not wish such functions, yet to
prohibit them we must adopt additional axioms - the so-called
replacement axiom schema. You will not find these axioms in
Zermelo's 1908 paper. They were proposed in 1921 by Abraham
Fraenkel:

**A. A. Fraenkel.** Zu den Grundlagen der
Cantor-Zermeloschen Mengenlehre. "Math. Annalen", 1922,
Vol. 86, pp. 230-237

If F is a function, then for any class B we will denote by
F"B the **F-image** of B:

F"B = { v | Eu (u in B & F(u)=v) }.

**Exercise 2.11.** Prove that if a function f is a set,
then for any class B the image f"B is a set.

Still, if the function F is a class, and b is a set, then the
image F"b is ... a set? Imagine, you take members of the set
b one by one, and replace each member y by F(y). The result is
F"b. Of course, we wish F"b to be a set. So, let us
adopt the **Replacement Axiom Schema**:

F is a function -> AbEc F"b=c.

Or, more precisely, let F(u, v, z_{1}, ..., z_{n})
be a formula that does not contain v1, v2, x, y, b, c, then we
adopt as the axiom C25[F] the following formula:

AuAv_{1}Av_{2} (F(u, v_{1}, z_{1},
..., z_{n}) & F(u, v_{2}, z_{1}, ...,
z_{n}) -> v_{1}=v_{2}) -> AbEcAy(y
in c <-> Ex(x in b & F(x, y, z_{1}, ..., z_{n}))).

Of course, this is again a comprehension axiom, namely, C2[Ex(x in b & F(x, y))], yet we allow to apply it only after we have proved that for each x there is only one (or none) y such that F(x, y).

**Exercise 2.12.** a) Prove that if B is a proper class,
and b is a set, then no one-to-one function can map B into b.

b) Derive the separation axiom schema C21 from the replacement axiom schema C25.

c) (J.Mycielski) Derive the pairing axiom C22 from C21, C24 and C25. (Hint: apply first C21, then twice - C24, and finally - C25).

**Axiom of Infinity**

The axioms of set theory we have so far allow proving the
existence only of **finite** sets. Indeed, if you wish, you
can verify easily that all these axioms hold when interpreted in
the area of sets, which can be defined by using of a finite
number of the empty set symbols o, commas and braces, for
example, {o, {o, {o}}, {o, {o, {o}}}}. Hence, our next step must
be adopting of some axiom of infinity.

The approach used below is somewhat non-traditional: I will
try to introduce the **axiom of infinity as a comprehension
axiom.**

First let us declare that some of the simplest finite sets are
**natural numbers** (by definition). Namely, the empty set o
we declare to be zero, the set {o} - to be the number
"one", {o, {o}}, or {0, 1} will be the number 2, {0, 1,
2} - the number 3, etc. In general, if the set c_{n} is
declared to be the number n, then the set c_{n+1} = c_{n}
U {c_{n}} we declare to be the number n+1. It seems that
we have thus also the set of all natural numbers

{0, 1, 2, 3, ..., n, n+1, ...}?

As we established above, the axioms we have adopted so far do
not allow proving the existence of such (infinite!) set.
Therefore, let us try to define at least the **class of all
natural numbers**. It appears not an easy task! The easy
"solutions" by using "infinite formulas" like
as

y=c_{0} V y=c_{1} V y=c_{2}
V ... V y=c_{n} V ...

cannot be taken seriously, since the language of set theory
allows only finite formulas. To obtain a finite formula N(y)
expressing En(y=c_{n}) let us follow an early (1923) idea
by John
von Neumann:

**J.von Neumann.** Zur Einfuehrung der transfiniten Zahlen.
"Acta Szeged", 1923, 1, pp. 199-208

In 1930s von Neumann's constructions were simplified considerably by R.M. Robinson, K. Goedel and P. Bernays (see Fraenkel, BarHillel, Levy [1973]).

The non-trivial proposal is to formalize the semiformal
predicate En(y=c_{n}) as a combination of the following
two properties:

a) **Transitivity**. The set y is called **transitive**,
if it contains all members of each its member, i.e. Au(u in y
-> u<=y), or

AuAv (u in v & v in y -> v in y).

b) **Double-well-ordering under "in".** The
relation "<" on the set y is called a (partial) **ordering**,
iff

Ab(b in y -> ~(b<b)) (non-reflexivity),

AbAcAd(b,c,d in y -> (b<c & c<d -> b<d)) (transitivity).

Let us call an ordering "<" of x a **double-well-ordering**,
iff each non-empty subset of y contains a minimum and a maximum
member under "<", i.e.

Az(z<=y & Es(s in z) -> Eu(u in z & Av(v in z -> v<u V v=u)) & Eu(u in z & Av(v in z -> u<v V u=v))).

**Note.** Intuitively, only finite sets can be double well
ordered (by any relation).

The second property we introduce requires that **y must be
double well ordered by the membership relation "in"**.

Hence, the surprising (finite!) definition of the (infinite!) class of all natural numbers would be:

N = {y | y is transitive & y is double well ordered by "in"}.

**Exercise 2.13.** a) Show that the "standard"
natural numbers c_{n} defined above all are members of
the class N. I.e. prove the theorem schema "for all n: c_{n}
in N". We do not know how to replace this schema (i.e. an
infinite sequence of theorems) by one (finite) theorem.

b) Show that, if x in N and y is the maximum (under
"in") member of x, then x-{y} in N, x-{y} in x, and
x-{y}=y. Derive from this that for all n: if x in N, and x
contains exactly n members, then x= c_{n}.

c) (Open problem for students). Try to prove that all the axioms of the first order arithmetic PA (see Section 3.1) hold in the class N. Probably, you will need only the axioms C21, C22, C23 (see Exercise 2.8c).

Are the results (a) and (b) enough to say that "N is the class of all natural numbers"? For me, this is enough.

If we wish to think of N as of a set, then we must adopt the
following **Axiom of Infinity**:

ExAy(y in x <-> y in N). (C26).

Of course, this is a comprehension axiom, namely, C2[y in N].

**Exercise 2.14.** Write down the full text of C26 and
count the number of characters in it.

The set of all natural numbers is denoted traditionally by w (omega), instead of the above N.

**Note.** As you see, C26 is somewhat lengthy when compared
to other single Zermelo axioms (not axiom schemas, of course!).
Perhaps, for this reason, Zermelo and his followers use shorter
forms of the axiom of infinity, for example:

Ex(o in x & Ay(y in x -> yU{y} in x)),

or even shorter:

Ex( Ey(y in x) & Ay(y in x -> Ez(z in x & ~(z=y) & y<=z))).

Of course, C26 implies these formulas (simply take w for x). If you wish, prove that the converse also is true.

As a single axiom, C26 allows to prove the existence only of **countable
sets**. To prove the existence of uncountable sets, the
power-set axiom C24 must be applied additionally: the set of all
sets of natural numbers, i.e. P(w) is uncountable.

**Exercise 2.15.** Prove that in the set w the semiformal
("second order") Peano axioms hold (see Section 3.1).

The axiom of infinity completes the list of comprehension axioms, which are necessary for reconstruction of common mathematics, i.e. for building of "useful" sets. Is our list (the axioms from C21 to C26) "complete" in the sense that no "acceptable" comprehension axioms will be discovered in the future? The answer is "yes" (Church's thesis for set theory?). We will discuss this problem in Section 2.4.

So, it would be nice to stop at this point and finish our list of axioms by adopting an axiom asserting that no other sets exist - except those, which can be built by using the comprehension axioms? I.e. the axiom: "All sets can be built by using the comprehension axioms". Still, how to put this restriction into one (finite!) formula of set theory?

**Open problem.** How to put the statement "All sets
can be built by using the comprehension axioms" into one
formula of set theory? Is this possible at all?

While this problem remains unsolved, let us return to the tradition, and discuss the remaining two axioms - the axiom of regularity and the axiom of choice.

**Axiom of Regularity**

It appears that the existence of some "abnormal" kinds of sets is consistent with all the axioms we have adopted so far. For example, if you wish to assert the existence of a set x such that "x in x", you can do this safely: no contradiction with our previously adopted axioms will arise.

**Exercise 2.16.** Prove that this is the case by building
a "model" where the "in" relation is
interpreted appropriately (i.e. "abnormally").
Similarly, prove the relative consistency of the conjecture
"there are two sets x, y such that x in y and y in x".

The generally acknowledged way of excluding such
"abnormal" sets from the set theory is the **Axiom of
Regularity**, introduced by J. von Neumann in 1925:

~Ex( Ey(y in x) & Ay(y in x -> Ez(z in x & z in y))). ----- (C3)

**J. von Neumann.** Eine Axiomatisierung der Mengenlehre.
"Journal fuer reine und angewandte Mathematik", 1925,
Vol.154, pp. 219-240.

The general idea behind C3 is excluding of infinite chains "down" the membership relation, for example:

... in x_{n} in ... in x_{3} in
x_{2} in x_{1} in x.

This is because the axiom of regularity is called sometimes the "foundation axiom".

**Exercise 2.17**. Derive from C3 that V=R, where R is the
Russell's class, i.e. prove that ~(x in x) for all x. Similarly,
derive from C3 that there are no two sets x, y such that x in y
and y in x.

The set theory adopting the axiom of extensionality (C1), the
separation axiom schema (C21), the pairing axiom (C22), the union
axiom (C23), the power-set axiom (C24), the replacement axiom
schema (C25), the axiom of infinity (C26) and the axiom of
regularity (C3), is called **Zermelo-Fraenkel set theory**,
and is denoted by ZF.

Zermelo had in his axiom list also the famous axiom of choice.

**Axiom of Choice**

If all members of the set x are non-empty sets, then we can
try to define a **choice function** f that assigns to each y
in x some f(y) in y. Can we hope to define a choice function for
each collection of non-empty sets? At least, we can postulate,
that such function always exists. Thus we obtain the **Axiom of
Choice** (AC):

Ax( Ay(y in x -> y not empty) -> Ef(f is function & domain (f) = x & Ay(y in x -> f(y) in y))).

In 1904, Zermelo used this "principle of arbitrary choice" explicitly to prove that each set can be well-ordered. (The ordering "<" of some set x is called well-ordering, iff each non-empty subset of x contains a minimum member under "<".)

**E.Zermelo.** Beweiss, dass jede Menge wohlgeordnet werden
kann. "Math. Annalen", 1904, Vol. 59, pp. 514-516 (see
also online comments at http://thoralf2.uwaterloo.ca/htdocs/scav/zermelo/zermelo.html)

This provocative paper by Zermelo was by far not the first time in the history when something like the "principle of arbitrary choice" was used in mathematical proofs. Still, Zermelo dared to state this principle in its most unrestricted form.

**Exercise 2.18.** a) Prove the converse statement: if the
union Ux is well ordered (by some relation "<"),
then there is a choice function for x.

b) Derive from AC that each infinite set contains a countable subset.

Note that AC **is not a comprehension axiom**. The choice
function f is not defined by a formula expressing that f(y)=z.
The existence of f is merely postulated. The term "principle
of arbitrary choice" underlines this extremely
non-constructive nature of AC. I.e. we assume that we are able to
make infinite number of choices without having any guiding rule.
(For the long history of hot emotional discussions "around
the axiom of choice" - see Medvedev
[1982]).

We can "judge" AC as we please, yet as an axiom of set theory it is absolutely safe: in 1938 Kurt Goedel proved that

"If ZF+AC would be an inconsistent theory, then so would be ZF".

**K.Goedel.** The consistency of the axiom of choice and of
the generalized continuum hypothesis. "Acad. U. S. A.".
1938, Vol. 24, pp.556-557.

The set theory ZF+AC is denoted traditionally by ZFC. By using the axioms of ZFC all theorems of Cantor's intuitive set theory can be proved. (Mathematicians may be interested to verify this themselves - just follow the excellent concise book by T. Jech [1971]: you will be forced to do yourselves 90% of the technical work.) Since the end of XIX century we know that all the other theories of common mathematics can be reformulated in set theory. This completes the first stage of Hilbert's program: convert all existing mathematics into a formal theory.

set theory, axiom, Zermelo, Fraenkel, Frankel, Cantor, Frege, Russell, paradox, formal, axiomatic, axioms, comprehension, infinity

Back to title page.