Peano Axioms. First Order Arithmetic

first order arithmetic, Peano, Dedekind, Grassmann, arithmetic, formal, axiomatic, axioms, Presburger

3. First Order Arithmetic

3.1. From Peano Axioms to First Order Arithmetic

Is the notion of natural numbers independent in mathematics, or it depends on other, more complicated mathematical notions - the notion of real numbers, and Cantor's notion of arbitrary infinite sets? I.e., could we define natural numbers separately from all the other mathematics? At first glance, it seems that the answer "must" be positive. If natural numbers are the most fundamental mathematical concept (more fundamental than real numbers and Cantor's infinite sets), then an independent definition of them "must" be possible?

One more reason to search for an independent formalization of natural numbers are paradoxes of set theory. Zermelo-Fraenkel's set theory does not allow you to derive Russell's paradox, Cantor's paradox and Burali-Forti paradox. Still, who could guarantee that there are no other inconsistencies? Still, have you ever thought about paradoxes in arithmetic of natural numbers? Natural number system seems to be the most reliable branch of mathematical reasoning.

So, let us try. Our work will result in the so-called Peano axioms. This terminology is one of numerous strange naming traditions in mathematics, since: "It is rather well-known, through Peano's own acknowledgment, that Peano borrowed his axioms from Dedekind and made extensive use of Grassmann's work in his development of the axioms." (see Hao Wang [1957], p.145).

Indeed, about 90% of the entire work were done already by Hermann Grassmann in his book published in 1861:

H. Grassmann. Lehrbuch der Arithmetik, 1861

The traditional recursive definitions of addition and multiplication are due to Grassmann (see Hao Wang [1957]):

x+0=x; x+(y+1)=(x+y)+1;

x*0=0; x*(y+1)=(x*y)+x.

In this way addition and multiplication of natural numbers are derived from one single argument operation x+1. By using these definitions Grassmann proved all the other principal properties of arithmetical operations (associativity, commutativity etc.). Grassmann's axioms included also the induction principle (see P3 below). Of course, the hard part of the work was not the invention of the above simple formulas, but the idea that they are necessary.

Richard Dedekind made another attempt in his essay published in 1888:

R. Dedekind. Was sind und was sollen die Zahlen. Braunschweig, 1888 (see also http://thoralf2.uwaterloo.ca/htdocs/scav/dedek/dedek.html)

Dedekind's intention was not axiomatization of arithmetic, but - giving an "algebraic" characterization of natural numbers as a mathematical structure. He writes in a letter to Dr. H. Keferstein (dated 27 February 1890, I will quote English translation from Hao Wang [1957]):

"[...]

(1) The number sequence N is a system of individuals or elements which are called numbers. This leads to a general study of systems as such (section 1 of my essay).

(2) The elements of the system N stand in a certain relation to one another, they are in a certain order determined, in the first place, by the fact that to each definite number n, belongs again a definite number n', the number which succeeds or is next after n. This leads to the consideration of the general concept of a mapping F of a system (section 2). Since the image F(n) of each number n is again a number n' and therefore F(N) is a part of N, we are here concerned with the mapping F of a system N into itself. And so this must be studied in its full generality (section 4).

(3) Given distinct numbers a, b, their successors a', b' are also distinct; the mapping F has therefore the character of distinctness or similarity (section 3).

(4) Not every number is successor n', i.e. F(N) is a proper part of N; this (together with the preceding paragraph) constitutes the infinitude of the number sequence N (section 5).

(5) More precisely, 1 is the only number which does not lie in F(N). Thus we have listed those facts which you regard as the complete characterization of an ordered simply infinite system N.

(6) But I have shown in my reply that these facts are still far from being adequate for a complete characterization of the nature of the number-sequence. Indeed, all these facts also apply to every system S which, in addition to the number-sequence N, contains also a system T of arbitrary elements t. One can always define the mapping F so as to preserve the character of similarity and so as to make F(T)=T. [...] What must we now add to the facts above in order to cleanse our system S from such alien intruders t [...]? This was one of the most difficult points of my analysis and its mastery required much thought. [...] I shall say: an element n of S belongs to the sequence N if and only if n is an element of every such part K of S which possesses the two properties (i) that the element 1 belongs to K and (ii) that the image F(K) is part of K. [...] Only after this addition is the complete character of the sequence N determined. [...]

For a brief period last summer (1889) Frege's "Begriffschrift" and "Grundlagen der Arithmetik" came, for the first time, into my possession. I noted with pleasure that his method of defining a relation between an element and another which it follows, not necessarily immediately, in a sequence, agrees in essence with my concept of chains [...]. Only one must not be put off by his somewhat inconvenient terminology.

[...]"

Dedekind refers here to the following essay by Gottlob Frege, where a similar (to Dedekind's) analysis of the natural number system is given:

G.Frege. Die Grundlagen der Arithmetik. Eine logisch-mathematische Untersuchung ueber den Begriff der Zahl. Breslau, 1884, 119 pp. (see also http://thoralf2.uwaterloo.ca/htdocs/scav/frege/frege.html)

In section 10 of his essay Dedekind proves that his characterization of the natural number system is complete in the sense that any two systems N₁ and N₂ satisfying the conditions (1)-(6) are isomorphic (for details see Appendix 1).

Giuseppe Peano made the next step in 1889 - he converted Dedekind's conditions (1)-(6) into axioms:

G.Peano. Arithmetices pricipia, nova methodo exposita. Torino, 1889, 40 pp. (see English translation in Heijenoort [1967], see also online comments at http://thoralf2.uwaterloo.ca/htdocs/scav/peano/peano.html).

The modern version of Peano axioms can be put as follows. Let variables x, y, … range over natural numbers, and let 0 denote the number "zero", Sx - denote the operation x+1, and let the variable F range over sets of natural numbers. Then the following statements are called Peano axioms:

P1) Ax~(0=Sx) (a part of Dedekind's condition (5)),

P2) AxAy(~(x=y) -> ~(Sx=Sy) (Dedekind's condition (3)),

P3) 0 in F & Ax(x in F -> Sx in F)) -> Ax(x in F) (Dedekind's condition (6), or the induction principle).

We do not need the remaining conditions (1), (2) and (4) in our list, since they can be derived from the general logical axioms and identity axioms.

One can prove easily that any two "systems" N₁ and N₂ satisfying the axioms (P1)-(P3) are isomorphic (see Appendix 1).

One of such "systems" can be constructed in the set theory ZF. Let us define 0 as the empty set, Sx - as xU{x}, and let variables range over the set w. Then the axioms P1-P3 can be proved as theorems of ZF (see Exercise 2.15). Still, this is not the kind of formalization we are searching for. Using of arbitrary sets of natural numbers seems to be less dangerous than using of Cantor's general notion of arbitrary infinite sets. Still, there is one-to-one correspondence between sets of natural numbers and real numbers. Can the notion of natural numbers depend on the notion of real numbers? If we believe that it cannot depend, we must try to formalize natural numbers without reference to arbitrary sets of these numbers.

One of the ways to do this would be replacing the axiom P3 (induction principle) by an axiom schema where the set variable F is "instantiated" by formulas in some formal language L. I.e. let us restrict the induction principle P3 to some kind of "definable" sets F. It appears that the power of the theory we obtain in this way depends essentially on the language L.

The minimum version of the language L (let us denote it by L₀) would contain: the constant letter 0, the function letter S, and the predicate letter "=". In this language the notion of term is defined as follows:

a) 0 and any variable is a term;

b) If t is a term, then St also is a term.

Hence, we have here only two types of terms: SS...S0 (i.e. representations of particular natural numbers) and SS...Sx (i.e. representations of functions x+n). Atomic formulas in the language L₀ are built by the following rule:

c) If t₁ and t₂ are terms, then (t₁=t₂) is an atomic formula.

Formulas of the language L₀ are built from atomic formulas by using logical connectors and quantifiers.

Let us denote by PA₀ the theory in the language L₀ having the following specific axioms:

P1₀) Ax~(0=Sx) (the same as P1),

P2₀) AxAy(~(x=y) -> ~(Sx=Sy) (the same as P2),

P3₀) B(0) & Ax(B(x)->B(Sx)) -> AxB(x),

where B is arbitrary formula of the language L₀ containing x as a free variable, and (maybe) some other free variables (parameters). I.e. P3₀ is the axiom schema replacing the axiom P3.

PA₀ is a very poor theory.

Exercise 3.1. Prove that PA₀ cannot "express" the relationship x<y. Hints: a) We would say that a formula LESS(x, y) in the language L₀ "expresses" the assertion "x<y", iff:

PA₀ proves: ~LESS(x, x),

PA₀ proves: LESS(x, Sx),

PA₀ proves: LESS(x, y) & LESS(y, z) -> LESS(x, z).

b) As the first step, prove that no formula LESS(x, y) in the language L₀ can possess the property: LESS(m, n) is "true" iff the number m is less than the number n.

Since x<y can be "expressed" as Ez(x+Sz=y), the addition of natural numbers also cannot be discussed in PA₀.

We can try to extend PA₀ by: a) adding to the language L₀ the "missing" predicate letter "<", and b) adding to PA₀ three new axioms: ~(x<x), x<Sx, x<y & y<z -> x<z. Unfortunately, in this way we obtain a theory PA₀₀ in which the addition of natural numbers still cannot be discussed (see Hilbert, Bernays [1934], section 7.4). I.e. none of the formulas PLUS(x, y, z) in the language of PA₀₀ can possess the properties:

a) PA₀₀ proves: PLUS(x, 0, x) (i.e. x+0=x),

b) PA₀₀ proves: PLUS(x, y, u) -> PLUS(x, Sy, Su) (i.e. x+Sy=S(x+y)).

As the next step, let us try to extend PA₀ by adding to the language L₀the "missing" function letter "+" and the constant letter 1. Then we will no more need the function letter S (the function Sx can be represented as x+1) and the predicate letter "<" (it can be "expressed" as Ez(x+z+1=y)). Let us denote this new language by L₁. Terms of L₁ are defined as follows:

a) The constants 0 and 1, and all variables are terms.

b) If t₁ and t₂ are terms, then (t₁+t₂) also is a term.

Atomic formulas of L₁ are built as (t₁=t₂), where t₁ and t₂ are terms. Since we can use, for example, the expression 2*x-3*y+1=0 as a shortcut for x+x+1=y+y+y, we can say simply that atomic formulas of L₁are linear Diophantine equations.

After this, we must modify and extend the axioms of PA₀:

P1₁) ~(0=x+1),

P2₁) ~(x=y) -> ~(x+1=y+1),

P3₁) x+0=x,

P4₁) x+(y+1)=(x+y)+1,

P5₁) B(0) & Ax(B(x)->B(x+1)) -> AxB(x).

The axioms P3₁ and P4₁ represent Grassmann's recursive definition of the addition. The induction schema P5₁ and our previous induction schema P3₀ look identical, yet actually P5₁ is much more powerful - in P5₁we can take as B(x) formulas that contain the addition letter.

Let us denote our new theory by PA₁. It is called sometimes Presburger's arihmetic. In 1929 Mojsesz Presburger established that:

a) There is an algorithm that allows deciding whether an arbitrary closed formula in the language L₁ is "true" or not.

b) Using the axioms of PA₁ we can prove each "true" formula of L₁.

c) PA₁ is consistent theory (in this last part of his proof Presburger used only "finitistic" means of reasoning allowed in Hilbert's program).

M. Presburger. Ueber die Vollstaendigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. C.R. du I Congr. des Math. des pays Slaves, Warszawa, 1929, pp.92-101

Presburger's proof is a non-trivial piece of mathematics (see Hilbert, Bernays [1934], section 7.4). Of course, these results strengthened Hilbert's trust in a forthcoming solution of the entire problem... Just one more (maybe - technically very complicated) step, and we will have a "finitistic" proof that mathematics as a whole is both consistent and complete. Still, in 1930 a completely unexpected settlement followed... (see Section 5).

In 1974 Michael J.Fischer and Michael O.Rabin proved that any algorithm for deciding whether an arbitrary closed formula in the language L₁ is "true" or not requires at least 2**2**Cn units of time for a formula consisting of n characters (** stands for exponentiation). Could this result discourage Hilbert in 1929?

M.J.Fischer, M.O.Rabin. Super-Exponential Complexity of Presburger Arithmetic. "Proceedings of the SIAM-AMS Symposium in Applied Mathematics", 1974, vol. 7, pp.27-41 (see also Barwise [1977] - Russian translation available)

Unfortunately, PA₁ again is not a theory we are searching for: in PA₁multiplication of natural numbers cannot be discussed. I.e. none of the formulas MULT(x, y, z) in the language of PA₁ can possess the properties:

a) PA₁ proves: MULT(x, 0, 0) (i.e. x*0=0),

b) PA₁ proves: MULT(x, y, u) & MULT(x, y+1, v) -> u+x=v (i.e. x*(y+1)=x*y+x)).

(this result follows from Goedel's incompleteness theorem, see Section 5.3). Hence, you cannot discuss in PA₁, for example, the notion and properties of prime numbers.

As the next step, let us try again - let us extend PA₁ by adding to the language L₁ the "missing" function letter "*". Let us denote this new language by L₂. Terms of L₂ are defined as follows:

a) The constants 0 and 1, and all variables are terms.

b) If t₁ and t₂ are terms, then (t₁+t₂) and (t₁*t₂) also are terms.

Atomic formulas of L₂ are built as (t₁=t₂), where t₁ and t₂ are terms.

Since we can use, for example, the expression 2x²-3y²-1=0 as a shortcut for (1+1)*x*x=(1+1+1)*y*y+1, we can say simply that atomic formulas of L₂are arbitrary Diophantine equations. After this, we must modify and extend the axioms of PA₁:

P1₂) ~(0=x+1),

P2₂) ~(x=y) -> ~(x+1=y+1),

P3₂) x+0=x,

P4₂) x+(y+1)=(x+y)+1,

P5₂) x*0=0,

P6₂) x*(y+1)=(x*y)+x,

P7₂) B(0) & Ax(B(x)->B(x+1)) -> AxB(x).

The axioms P5₂ and P6₂ represent Grassmann's recursive definition of the multiplication. The induction schema P7₂ and our previous induction schema P5₁ look identical again, yet actually P7₂ is much more powerful - in P7₂ we can take as B(x) formulas that contain the addition and multiplication letters. Let us denote our new theory by PA₂.

The language of PA₂ is powerful enough to represent (at least) simple assertions about natural numbers. For example:

Ey(x=y+y) represents "x is even number",

1<x & ~EyEz(y<x & z<x & x=y*z) represents "x is prime number",

AxEy(x<y & y is prime number) represents "there are infinitely many prime numbers".

Exercise 3.2. Put in the language of PA₂ the following assertions: "x and y have no common divisors", "there are infinitely many twin prime numbers", "sqrt(2) is not a rational number".

If PA₂ is much more powerful than PA₁, then what kind of Fischer-Rabin's result can be proved for it? If all algorithms for deciding PA₁formulas (i.e. for deciding whether an arbitrary closed formula in the language of PA₁ is "true" or not) are absolutely infeasible, then, perhaps, there are no algorithms at all for deciding of PA₂formulas? In Section 6.3 we will prove that this is the case.

But how about another "next step" - why not to try adding to the language of PA₂ the exponentiation letter and the axioms: x⁰=1; x^y+1=x^y*x? It appears that this is not necessary. In Section 3.3 we will prove the so-called representation theorem: any computable function can be represented in PA₂. For example, there is a PA₂ formula EXP(x, y, z) such that:

PA₂ proves: EXP(x, 0, 1) (x⁰=1),

PA₂ proves: EXP(x, y, u) & EXP(x, y+1, v) -> u*x=v (i.e. x^y+1=x^y*x).

Hence, adding of new functional letters does not improve the power of PA₂.

Another estimate of the power of PA₂ you could have been proved in the Exercise 2.13(c): PA₂ is equivalent to a set theory where all sets are finite (i.e. to ZF minus infinity axiom).

Hence, the volume of means of reasoning included in PA₂, is not accidental. This is the reason why PA₂ is called simply the first order arithmetic. The term "first order" means that in PA₂ the notion of natural numbers is defined "in itself" (an sich) - without using the more complicated notion of real numbers (which is equivalent to the "second order" notion of arbitrary sets of natural numbers). Hence, in a sense, you may think of PA₂ as formalization of the so-called discrete mathematics (see also Section 5.2).

Traditionally, the first order arithmetic is called Peano arithmetic, and is denoted simply by PA (instead of the above PA₂). More about it - see Hajek, Pudlak [1993]. The original Dedekind's system of three axioms P1, P2, P3 (see above) is called second order Peano arithmetic (because it is using not only variables for natural numbers, yet also variables for sets of these numbers).

Our customary intuitive understanding of the language of PA (first order arithmetic) leads to an old Platonist prejudice which says that any closed formula of PA "must be" either "true" or "false". Let us clarify this. Of course, we must follow the intuition while selecting the axioms of PA. We must follow the intuition when trying to establish formal versions of statements from the usual (intuitive) number theory developed by working mathematicians (as we did it in the Exercise 3.2). Still, any attempt to go further than this, an attempt to grant our intuition of natural number sequence a mysterious informal meaning ends up in Platonism. Many people are going this way taking seriously the following "definition" of "true" formulas in PA:

a) Atomic formulas of PA are assertions about equality of polynomial values. For any particular values of variables the polynomial values can be calculated, and in this way the truth or falsity of atomic formulas can be established.

b) When a formula F is built up from two formulas A, B by using the connectors ~, &, v, ->, and the truth or falsity of A and B is known, then the truth of F is established easily by using the truth tables.

c) In PA we have two quantifier symbols E, A. The formula Ex C(x) is true, iff C(x) is true for at least one x. And the formula Ax C(x) is true, iff C(x) is true for all x's.

Hence, it seems that any closed formula of PA (i.e. a formula containing no free variables) "must be" either "true" or "false". This corresponds well to our customary intuition: a closed formula of PA asserts some completely definite property of natural number system (like as the assertion that there exist infinitely many prime numbers), and this system either possess this property or not.

Let us note, however, that in the above "definition" the sentence "C(x) is true for all x's" is used. According to our intuition of natural numbers, there exists an infinite number of x's. Hence, it is impossible to establish the truth of the assertion Ax C(x) simply by checking the truth of C(x) for particular x's. The truth of AxC(x) could be established (if ever) only theoretically, by proving the assertion from some axioms, i.e in some theory (for example, in PA or ZF). May we assert that any closed formula of PA must be either "theoretically provable", or "theoretically refutable"? We would like to assert this..., still, which particular theory are we intending - PA, ZF or other? Is this all the same? We must choose some particular theory, else our "definition" of true formulas will hang by a thread...

This problem was put elegantly by Paul Lévy (see Levy [1926]): "Ce qu'il faut admirer, c'est la puissance de l'analyse mathematique qui arrive ainsi, dans tant de cas, a reduire une infinite de verifications a un raisonnement unique. Qui peut s'etonner qu'elle n'y soit pas parvenue dans tous les cas? Non seulement cela n'a rien d'etonnant, mais il est a priory assez probable qu'il existe certains enonces, qui resument ainsi en une formule unique une infinite de cas particuliers, et pour lesquels il est impossible de jamais reduire toutes les verifications necessaires a un nombre fini d'operations..."

Maybe, the firm belief in the above-mentioned "definition" is due to the law of the excluded middle, which is a postulate in the classical logic. Of course, in PA the formula Av~A is provable for any A. Does it follow from this that for a closed A either PA proves A, or PA proves ~A? It follows from Goedel's incompleteness theorem that neither PA, nor ZF, nor any other fundamental mathematical theory can possess this perfect property - a mere postulation of the excluded middle cannot solve our problems! For an exact treatment see Section 5.

Let us return to PA as a formal theory. The principal means of proving theorems in PA is, of course, the induction principle P7₂. Let us prove, for example, the formula 0+x=x (until we prove that x+y=y+x, this formula differs from the axiom x+0=x). Let us denote 0+x=x by B(x). First, we must prove B(0), i.e. 0+0=0. This is simply an instance of the axiom x+0=x. Now we must prove that B(x)->B(x+1):

B(x) (0+x=x, hypothesis),

0+(x+1)=(0+x)+1 (an instance of the axiom x+(y+1)=(x+y)+1),

0+x=x -> (0+x)+1=x+1 (follows from identity axioms),

(0+x)+1=x+1 (by MODUS PONENS),

0+(x+1)=x+1, or B(x+1) (by transitiveness of identity).

Hence, by the Deduction Theorem, B(x)->B(x+1), i.e. also Ax(B(x)->B(x+1)). And, by the induction principle, AxB(x). Q.E.D.

In a similar way we could prove other simple theorems of PA. The only problem is to find the optimal order of proving. Try to prove directly, for example, the commutativity of multiplication (x*y=y*x). It seems to be a hard task. Still,

Exercise 3.3. Prove the following theorems of PA (see Mendelson [1997]):

x+z=y+z -> x=y,
0+x=x, (x+1)+y=(x+y)+1, x+y=y+x, (x+y)+z=x+(y+z),
0*x=0, (x+1)*y=(x*y)+y, x*y=y*x, x*(y*z)=(x*y)*z.

In PA we can prove all the necessary properties of the relation x<y (it can be defined by the formula Ez(x+z+1=y). In particular, we could prove the following schema of theorems:

~AxC(x) -> Ex(~C(x) & Ay(y<x -> C(y)).

I.e. if not all natural numbers possess the property C, then there is the minimum number that does not possess C.

In PA we can discuss freely the properties of the division operation. Let us denote by R(x, y, z) the formula Eu(x=y*u+z & z<y), i.e. x mod y = z in Pascal. Then we could prove in PA that:

0<y -> EzR(x, y, z),

R(x, y, z₁) & R(x, y, z₂) -> z₁=z₂.

All these (and many more) formal proofs you can find in Mendelson [1997]. In this book formal proving of theorems about natural numbers is performed to the extent that you can start proving in PA more serious theorems of the (intuitive) elementary number theory. For example, you can try to prove that:

there are infinitely many prime numbers,

sqrt(2) is not a rational number,

e and pi are transcendent numbers,

if p(x) is the number of primes in the sequence 1, 2, ..., x, then p(x)/ln x -> 1 as x -> infinity,

etc.

Do not be surprised at the use of some real numbers and the function ln x in the above statements. If in a statement or a proof only a fixed list of computable real numbers and computable real functions is used, then this statement/proof can be translated into PA (see Exercise 3.2).

Let us introduce shortcuts for some specific terms of PA: bold 0 will denote 0, bold 1 will denote 1, bold 2 - (1+1), bold 3 - ((1+1)+1) etc. These terms are called numerals, they are used as standard representations of particular natural numbers. To denote numerals in schemas of PA formulas we will use bold letters k, l, m, n, p, q, r.

For example, if k is a natural number, k>0, then:

PA proves: x<k <-> (x=0) v (x=1) v ... v (x=k-1),

PA proves: (Ex<k) C(x) <-> C(0) v C(1) v ... v C(k-1),

PA proves: (Ax<k) C(x) <-> C(0) & C(1) & ... & C(k-1).

Exercise 3.4. If some atomic formula (t₁=t₂) of PA contains variables x₁, x₂, ..., x_n, then by moving all components of t₂ to the left hand side we can obtain an arbitrary Diophantine equation t₁-t₂=0 (see Section 4.1). Our concern is solving of these equations in natural numbers. Show that, if we have a particular solution (b₁, b₂, ..., b_n) of the equation t₁-t₂=0, then

PA proves: Ex₁Ex₂...Ex_n(t₁=t₂).

Hint: show that PA proves every "true" atomic formula s₁=s₂, where the terms s₁ and s₂ do not contain variables.

3.2. How to Find Arithmetic in Other Formal Theories

In first order arithmetic (PA) the simplest of the (important) ways of mathematical reasoning is formalized, where only natural numbers (i.e. discrete objects) are used. Other, more complicated mathematical notions (real numbers, Cantor's arbitrary infinite sets) are formalized (as it should be) in more complicated formal theories. These more complicated theories are more "powerful" than PA in the sense that they are able to discuss more complicated objects (real numbers, arbitrary infinite sets). Still, how about their "power" in discussing of properties of natural numbers? Can they prove a theorem about natural numbers that cannot be proved in PA? For example, maybe, the twin prime conjecture can be proved in ZFC, yet it cannot be proved in PA? At first glance, this seems impossible, because the notion of natural numbers seems to be independent of (and "more fundamental than") other mathematical notions. Unfortunately, this is not the case... For the final solution see Section 6.5 and Appendix 2.

How to "select" natural numbers in some formal theory T? The most general method of doing this is the so-called relative interpretation. To indicate natural numbers in some theory T, we must first provide a formula N(x) that "asserts" that x is a natural number. The second component to be provided is an algorithm Tr transforming atomic formulas of PA (i.e. Diophantine equations) into formulas of T. If F is an atomic formula of PA, then Tr(F) must be a formula of T having exactly the free variables of F. The algorithm Tr can be extended to cover also non-atomic formulas of PA:

Tr(~F) = ~Tr(F),

Tr(F->G) = Tr(F)->Tr(G),

Tr(F & G) = Tr(F) & Tr(G),

Tr(F v G) = Tr(F) v Tr(G),

Tr(ExF(x)) = Ex(N(x) & Tr(F(x))),

Tr(AxF(x)) = Ax(N(x) -> Tr(F(x))).

Hence, having an assertion about natural numbers, i.e. some PA formula F, we can ask: if PA proves F, then T proves Tr(F)? And, maybe, T proves some Tr(F) such that PA cannot prove F?

Exercise 3.5. Generalize the above definition for arbitrary two formal theories T1 and T2.

Let us say that PA is relatively interpretable in the theory T, iff there is a relative interpretation (N, Tr) such that for each closed PA formula F: if PA proves F, then T proves Tr(F). Of course, to establish the relative interpretability of PA in some theory T, we need not to check all closed PA formulas. It is sufficient to check only the axioms of PA. Let us consider, for example, the axiom x+1=y+1 -> x=y. We must show that

T proves: N(x) & N(y) -> (Tr(x+1=y+1) -> Tr(x=y)).

If we have established this for all axioms of PA, then (since theory T contains 100% of logical means of reasoning) we have proved that T proves Tr(F), if PA proves F.

Acknowledging the fundamental role of natural numbers in mathematics, let us call a formal theory T a fundamental theory, iff PA is relatively interpretable in T.

The simplest fundamental theory is, of course, PA itself. The set theory ZF is (of course) fundamental also. As the formula N(x) asserting that "x is natural number" we can take here simply the formula "x in w", where w is the set of all natural numbers (see Section 2.3). As you established in Exercise 2.14, "x in w" is a very long formula in the language of ZF. The translation algorithm Tr from PA to ZF also is somewhat complicated. Of course, Tr(x=y) = (x=y). Since 0 is defined in ZF as the empty set, Tr(x=0) = Ay~(y in x). Since 1 is defined as the set {o}, Tr(x=1) = Ay(y in x <-> Tr(y=0)). After this, Tr(x=2) can be obtained as Ay(y in x <-> Tr(y=0) v Tr(y=1)). Etc.

Now, we can reformulate the question of the first paragraph of this section as follows: is it possible that there is a fundamental theory T and a PA formula F such that PA cannot prove F, yet T proves Tr(F)? See positive answers in Section 6.5 and Appendix 2.

Exercise 3.6. Generalize the result of the Exercise 1.4 by proving that the set Tr^-1(T) of "arithmetical theorems" of any formal theory T is effectively enumerable. Hint: use the computability of Tr.

For a complete treatment of relative interpretability see Feferman [1960]. The most striking result is due to Hao Wang: any formal theory T is relatively interpretable in PA+Con(T), i.e. in PA plus one more axiom - an arithmetical translation of the assertion "T is a consistent theory" (for details see Section 5.4).

3.3. Representation Theorem

How could we verify that some formula F(x) "asserts" that x is a prime number? Of course, if F(x) is the formula

1<x & ~EyEz(y<x & z<x & x=y*z),

we could say that it "corresponds" to the definition of prime numbers. Still, this is not the only formula "expressing" that x is a prime number...

As the first step, we could introduce the following "definition": formula F(x) "expresses" the predicate "x is prime number", iff for all n: F(n) is "true", iff n is indeed a prime number.

The next step would be to replace the vague notion of "truth" by the more exact notion of provability in PA: formula F(x) "expresses" in PA the predicate "x is prime number", iff for all n: n is a prime number, iff PA proves F(n) (where n is the numeral, i.e. 1+1+...+1 n times).

This definition involves unprovability (if n is not prime, then F(n) must be unprovable in PA). Establishing of unprovability is a very hard task. Indeed, if PA would be inconsistent, then every formula would be provable in PA, and thus, the predicate "x is prime number" could not be "expressed" at all. Since we do not know exactly, is PA consistent or not (for details see Section 5.4), the latter definition of "expressability" is not satisfactory.

A much better solution: let us say that a formula F(x) expresses the predicate "x is prime number" in PA, iff for all n:

a) If n is a prime number, then PA proves F(n).

b) If n is not prime number, then PA proves ~F(n).

This definition is 100% positive in the sense that it involves only provability of formulas, and does not use unprovability. The expressability according to this definition can be established independently of our (missing) knowledge, is PA consistent or not.

Let us adopt this definition as a general one. Let us say that a formula P(x, y) expresses the predicate p(x, y) in PA, iff P has exactly two free variables x, y, and for all pairs m, n:

a) If p(m, n), then PA proves P(m, n).

b) If not p(m, n), then PA proves ~P(m, n).

The definition of expressability for ternary etc. predicates is similar.

You could prove easily, for example, that the formula x=y expresses the identity predicate of natural numbers. Indeed, if m=n, then PA proves m=n (since the terms m, n are identical). If m, n are different numbers, then (in order to prove ~(m=n)) apply to the formula m=n the axiom x+1=y+1 -> x=y as many times as you can, and then apply the axiom ~(0=x+1).

If PA would be an inconsistent theory, then (according to our final definition) any formula would express any predicate, i.e. then all predicates would be expressible. If PA is consistent, then each formula can express only one predicate. Hence, in this case the set of all predicates that are expressible in PA is countable. Since the set of all possible natural number predicates is uncountable, some of these predicates cannot be expressible.

Let us assume that PA is consistent. Then only computable predicates are expressible in PA. Indeed, let some formula P(x, y) express in PA the predicate p(x, y). How to determine, is p(m, n) true or false for given numbers m, n? If p(m, n) is true, then PA proves P(m, n), and if p(m, n) is false, then PA proves ~P(m, n). Hence, let us take a computer that prints out all theorems of PA one by one. (Such a computer does exist, since the set of all theorems of any formal theory is effectively enumerable - see Exercise 1.4). Let us sit beside the output tape of this computer waiting for one of the formulas - P(m, n) or ~P(m, n). If P(m, n) appears, then ~P(m, n) will not appear (PA is assumed to be consistent), i.e. p(m, n) is true. If ~P(m, n) appears, then P(m, n) will not appear, and hence, p(m, n) is false. Thus we have a procedure solving p(m, n) for any given numbers m, n, i.e. the predicate p(x, y) is computable.

We do not know exactly, is PA consistent or not. Later in this section we will prove (without any consistency conjectures!) that each computable predicate can be expressed in PA.

For our crucial proofs in Section 5 we will need a somewhat stronger definition of "expressability" of functions in PA. The straightforward definition (the function f(x, y) is expressible iff the predicate f(x, y)=z is expressible) is too weak.

Let us say that the formula F(x, y, z) represents in PA the function f(x, y), iff F has exactly three free variables x, y, z, and for every natural numbers k, m, n such that f(k, m)=n:

a) PA proves F(k, m, n).

b) PA proves Az(~(z=n) -> ~F(k, m, z)).

For simple expressability of the predicate f(x, y)=z we would need instead of b):

b₁) if not f(k, m) = n, then PA proves ~F(k, m, n),

i.e. for each n we could provide a separate proof ~F(k, m, n), yet b) requires instead of these separate proofs a single general proof for all values of n.

If PA is inconsistent, then all natural number functions are representable in PA.

Exercise 3.7. a) Verify that if PA is consistent, then only computable functions can be represented in PA.

b) Prove that a predicate p(x) can be expressed in PA, iff the following function h_p(x) can be represented in PA: h_p(x)=1 if p(x), and h_p (x)=0 if not p(x).

Hence, we would prove that all computable predicates are expressible in PA, if we would prove the following

Representation Theorem. Every computable function can be represented in PA.

Proof. The rest of this section.

Note. The formulas we will build to represent functions in PA, will contain the letters "&" and "v", but not the negation letter "~", they will contain existential quantifiers Ex, and only restricted universal quantifiers Ax(x<t -> ...), where t are linear terms of PA. This observation will allow us to apply the representation theorem as the starting point for the solution of Hilbert's Tenth problem in Section 4.

Let us start the proof. We have a computable function f(x, y) mapping pairs or natural numbers into natural numbers. And we must build a PA formula F(x, y, z) representing f(x, y)=z in PA. I.e. we must provide an algorithm that allows converting computer programs computing natural number functions into formulas representing these functions.

What kind of computers should we choose for this purpose - Wintel, Macintosh, or Sun? For the simplicity of proof let us use the so-called Turing machines invented in 1936 by Alan M.Turing.

A.Turing. On computable numbers, with an application to the Entscheidungsproblem. "Proc. London Math. Soc.", 1936, vol. 42, pp.230-265 (received May 28, 1936)

Independently and almost simultaneously a similar explicit concept of abstract machines was proposed by Emil L. Post:

E.L.Post. Finite combinatory processes - formulation 1. "Journ. Symb. Logic", 1936, vol.1, pp.103-105 (received October 7, 1936)

Formally, a (somewhat modernized version of) Turing machine M consists of:

a) A finite set of internal states:

S_M = {s_start, s_stop, s₁, ..., s_m}.

Programmers can think of S_M as the main memory of M: 2+m=2^k, where k is the number of memory bits.

b) A finite set of letters:

B_M = {b₁, ..., b_n}.

c) A finite set (not sequence!) P_M of instructions (the "program") each having the form:

s, b -> s', b', e,

where s and s' are states, b and b' - letters, e = 0, +1, or -1. Two instructions of P_Mshould not have the same left-hand side (s, b).

This formal object works as follows. Imagine:

a) An infinite tape (the "hard disk" of M) consisting of fixed size cells, each cell containing a letter from B_M.

b) A box encapsulating some state from S_M, and attached to some cell on the tape.

| Box |
| cell 0 | cell 1 | cell 2 |........................| cell n |.........................................................

Every second the following happens: if the box is in the state s, and if the cell it is attached to contains the letter b, and if the program P_M contains the instruction s, b -> s', b', e, then the box changes its state from s to s', the letter b in the cell is replaced by the letter b', and the box moves by e cells. I.e. if e = -1, then it moves one cell to left, if e = +1, it moves one cell to right, if e = 0, it does not change the position. If the program P_Mdoes not contain an appropriate instruction, then nothing happens. If e = -1, but the box is attached to the leftmost cell of the tape, then the box does not move.

As you can see, Turing did not use the von Neumann's principle (invented some 10 years later) according to which data and programs should be kept in the same memory space.

Let us say that the machine M computes the function f(x, y), iff for each pair m, n the following happens. Starting in the situation where:

a) The box is in the state s_start and is attached to the leftmost cell of the tape.

b) The tape contains the following sequence of letters:

1 1 ... 1 0 1 1 ... 1 0 0 0 ...,
m times ... n times.............

the machine M performs a finite number of steps according to its program and after this the situation occurs where:

a) The box is in the state s_stop.

b) The tape contains the following sequence of letters:

1 1 ... 1 0 ...
f(m, n) times

The rest of the tape may contain arbitrary letters.

Let us say that a function f(x, y) is a computable function, iff it can be computed in the above way by some Turing machine.

As an example, let us build a machine computing the function f(x) = x+2. This machine must simply cross the array of 1's and append another two 1's to it.

S_M = { s_start, s_stop, s},

B_M = {0, 1},

P_M = {

s_start, 1 -> s_start, 1, +1; // skip 1's

s_start, 0 -> s, 1, +1; // write the first additional 1, register this as done

s, 0 -> s_stop, 1, 0; // write the second additional 1, and stop

Exercise 3.8. a) Build some of the Turing machines computing the following functions: x+y, x*2, x*y, 2^x, [log₂x] (or int(log2(x)) in Pascal).

b) Maybe, you cannot buy a real Turing machine in your city, still, you can surely force your PC to emulate these machines. Write (using your favorite programming language) an interpreter of Turing machines. It should be a program, receiving as inputs S_M, B_M, P_M and initial states of the tape cells. As outputs the program should print out final states of the tape cells.

Thus, your PC can easily emulate Turing machines. Much more striking is the converse statement: Turing machines can (not easily, yet they can!) emulate your PC, with your Pascal, C++, Lisp, Prolog etc. included. All the techniques necessary for proving this statement can be found, for example, in Mendelson [1997] or Kleene [1952].

Turing machines represent one of the possible formal reconstructions of the intuitive notion of computability (the concept of algorithm). Since 1930s, besides Turing machines, several other very different formal reconstructions of this notion were proposed: recursive functions, lambda-calculus by A. Church, canonical systems by E. Post, normal algorithms by A.A. Markov, etc. And the equivalence of all these reconstructions was proved (see Mendelson [1997] or Kleene [1952]).

The equivalence of different formal reconstructions of the same intuitive concept means that the volume of the reconstructed formal concepts is not accidental. This is the best reason to abandon the (vague) intuitive concept of computability, and to replace it by the formal concept, for example, by the concept of computability by Turing machines. This approach is known as the Church's thesis:

If some function is computable in the intuitive sense of the word, then an appropriate Turing machine can compute it.

Alonzo Church stated (an equivalent of) this thesis in 1936:

A.Church. An unsolvable problem in elementary number theory. "American Journal of Mathematics", 1936, vol. 58, pp.345-363

In the original form of Church's thesis recursive functions were used instead of Turing machines.

Let us return to the proof of our representation theorem. We must represent the predicate f(x, y)=z by a formula F(x, y, z) in the language of PA. Since f(x, y) is a computable function, we can try to build F(x, y, z) by describing in PA the computation process that leads from the value pair (k, m) to the value f(k, m). Let us denote by M some Turing machine performing this computation process.

Our task would be much easier, if the language of PA contained some additional constructs. The following sequence could be called a situation:

(s, p, a₀, a₁, ..., a_q-1),

where s is a state from S_M, p is the number of the cell to which the box M is currently attached, and a₀, a₁, ..., a_q-1 are letters in the first q cells of the tape (all the other cells contain the letter 0). Let us introduce a new kind of variables C, C₁, C₂, ..., their domain will consist of all possible situations. Let us introduce also some additional function letters:

s(C) = s in C,

p(C) = p in C,

q(C) = q in C,

a_i(C) = a_i in C.

If we had all these symbols in the language of PA, our task of representing computable functions by formulas would be much easier.

Our first formula START(C, x, y) must assert that C is the initial situation having the values of arguments x and y on the tape:

s(C)=s_start & p(C)=0 & q(C)=x+y+1 & (Ai<x+y+1)(a_i(C)=1 v (a_i(C)=0 & i=x))

(I'm sorry, but we need this trick to avoid negations).

The second formula STOP(C, z) must assert that C is a final situation having the function value z on the tape:

s(C)=s_stop & z<q(C) & a_z(C)=0 & (Ai<z)a_i(C)=1.

As the next step, we build for each instruction I: s, b -> s', b', e the formula STEP_I(C₁, C₂) asserting that by applying the instruction I in the situation C₁ we will obtain the situation C₂.

Exercise 3.9. Write these formulas yourselves ignoring my next few paragraphs.

First let us consider the case when e = 0:

s(C₁)=s & s(C₂)=s' & EkEn(p(C₁)=k & p(C₂)=k & q(C₁)=n & q(C₂)=n &

a_k(C₁)=b & a_k(C₂)=b' & (Ai<n)(i=k v Ej(a_i(C₁)=j & a_i(C₂)=j)).

We did not use expressions like as p(C₁)=p(C₂) to simplify our next steps.

Now the case when e = -1:

s(C₁)=s & s(C₂)=s' & EkEmEn(p(C₁)=k & p(C₂)=m & q(C₁)=n & q(C₂)=n &

a_k(C₁)=b & a_k(C₂)=b' & ((k=0 & m=k) v m+1=k) &

(Ai<n)(i=k v Ej(a_i(C₁)=j & a_i(C₂)=j)).

The second row of this formula says that if in C₁ the box is attached to the leftmost cell of the tape, then e = -1 works as e = 0.

And finally, the case e = +1:

s(C₁)=s & s(C₂)=s' & EkEmEnEr(p(C₁)=k & p(C₂)=m & q(C₁)=n & q(C₂)=r &

a_k(C₁)=b & a_k(C₂)=b' & k+1=m & ((m<n & n=r) v (m=n & n+1=r & a_m(C₂)=0) &

(Ai<n)(i=k v Ej(a_i(C₁)=j & a_i(C₂)=j)).

The condition a_m(C₂)=0 in the second row says that the "unregistered" rest of the tape contains only letters 0.

The next formula COMPUTE_M(C₁, C₂) will assert that starting the program P_M in the situation C₁ after a finite number of steps we obtain the situation C₂. To simplify the task let us introduce one more variable L taking as its values finite sequences of situations. We will need also the corresponding function symbols: d(L) = the length of the sequence L, g_i(L) = the i-th situation in L. Now we can easily write the formula COMPUTE_M(C₁, C₂):

ELEw((d(L)=w+1 & g₀(L)=C₁ & g_w(L)=C₂ & (Ai<w)EC₃EC₄ (g_i(L)=C₃ & g_i(L)=C₄ & STEP_M(C₃, C₄))),

where STEP_M(C₃, C₄) is the following formula:

STEP_I(C₃, C₄) v STEP_II(C₃, C₄) v ... v STEP_IK(C₃, C₄),

assuming that P_M = {I, II, ..., IK}.

Finally, we can write the formula F(x, y, z) asserting that f(x, y)=z as follows:

EC₁EC₂ (START(C₁, x, y) & COMPUTE_M(C₁, C₂) & STOP(C₁, z)).

This formula is not 100% a PA formula since it contains symbols denoting states and tape letters, situation variables (and even a variable for sequences of situations), and a lot of function symbols missing in the language of PA. Hence, to complete our proof we must show how to eliminate these constructs.

As the first step, let us replace the symbols denoting states (s_start, s_stop, s, s', etc.) and tape letters (0, 1, b, b', etc.) of the machine M by some natural numbers. How to replace the situation variables? If states and tape letters are already replaced by numbers, then situations are simply finite sequences of numbers. Hence, our situation variable problem would be solved, if we could find a good coding algorithm that allowed to represent any finite sequence of natural numbers by a single natural number (or, at least by two or three numbers). This algorithm must be "good" in the sense that it must allow representation of the functions s(C), a_i(C) etc. by formulas of PA.

It was Goedel's idea to use for this purpose the so-called Chinese remainder theorem. Could you find a number X such that (in Pascal):

X mod 3 = 2, X mod 5 = 3, and X mod 7 = 4?

Let us consider the numbers 7k+4 for k=0, 1, 2, ...:

4, 11, 18, 25, 32, 39, 46, 53, ...

Here 11 mod 3 = 2, 32 mod 3 = 2, 53 mod 3 = 2, yet only the number 53 possesses the property 53 mod 5 = 3. Hence, the least number that we can take is X = 53.

In general, if we have a sequence of divisors u₁, u₂, ..., u_n (i.e. u_i>=2 for all i), and a sequence of remainders v₁, v₂, ..., v_n (i.e. 0<=v_i<u_i for all i), could we find a number X such that X mod u_i = v_i for all i? If some of the numbers u_i have a common divisor, then this problem may have no solutions. For example, if u₁ = 6 and u₂ =10, then X=6y₁+v₁=10y₂+v₂, and thus v₁-v₂ must be an even number, i.e. if v₁ = 1 and v₂ = 2, then our problem has no solutions. Still, if two of the numbers u_i never have common divisors, then the solution always exists. This is the

Chinese Remainder Theorem. Let u₁, u₂, ..., u_n be a sequence of natural numbers (u_i>=2 for all i), two of them never have common divisors. And letv₁, v₂, ..., v_n be a sequence of natural numbers such that 0<=v_i<u_i for all i. Then there is a natural number X (less than the product u₁u₂...u_n) such that X mod u_i = v_i for all i.

Proof. Let us associate with every number x such that 0 <= x < u₁u₂...u_n, the "remainder vector" (x mod u₁, x mod u₂, ..., x mod u_n). Show that if two such numbers have equal remainder vectors, then their difference is a multiple of the product u₁u₂...u_n. Q.E.D.

Using this theorem we can try to organize a coding of sequences of natural numbers v₀, v₁, ..., v_n-1 by representing each number v_i as X mod u_i, where X is the "code" (possibly large, yet fixed number) and the sequence u₀, u₁, ..., is generated in some simple way. For example, we can try a linear function: u_i = yi+z. How to choose y and z? Two numbers u_i should not have common divisors. If d is a common divisor of u_i and u_j, then d divides also u_i-u_j = y(i-j). If we take z = y+1, i.e. u_i = y(i+1)+1, then divisors of y cannot divide neither u_i, nor u_j. Thus our common divisor d must divide i-j, i.e. a number less than n. Hence, if we take as y a multiple of (n-1)! (i.e. 1*2*3*...(n-1)), then the numbers u₀, u₁, ..., u_n-1 will have no common divisors. And finally, if we take y large enough to ensure that u₀>v₀, u₁>v₁, ..., u_n-1>v_n-1, then according to Chinese remainder theorem we can find a number x such that x mod u_i = v_i for all i = 0, 1, ..., n-1.

Hence, we could use the pair (x, y) as a code of the sequence v₀, v₁, ..., v_n-1. Such a code does not include the number n, so, it would be better to code the sequence of n, v₀, v₁, ..., v_n-1 instead of v₀, v₁, ..., v_n-1 alone.

The function:

beta(x, y, i) = x mod (1+y(i+1))

is called Goedel's beta-function. As we have proved, for each sequence of natural numbers v₁, ..., v_n a pair of natural numbers x, y can be found such that

beta(x, y, 0) = n, beta(x, y, 1) = v₁, ..., beta(x, y, n) = v_n.

Note also, that we can represent beta-function in PA by the following formula BETA(x, y, i, j) (asserting that beta(x, y, i) = j):

Ez(x=(1+y*(i+1))*z+j & j<1+y*(i+1)).

Now we can start rewriting of our formulas START, STOP, STEP etc. in the language of PA. We have already replaced by natural numbers all states of the machine M and all tape letters. Hence, any situation (s, p, a₀, a₁, ..., a_q-1) is now a sequence of natural numbers that we can replace by two numbers x, y such that:

beta(x, y, 0) = q, beta(x, y, 1) = s, beta(x, y, 2) = p, beta(x, y, 3) = a₀, ..., beta(x, y, q+2) = a_q-1.

Hence, we can replace any quantifier EC by two quantifiers ExEy, where x, y are variables of PA. The additional function symbols we have introduced:

q(C)=q₁, s(C)=s₁, p(C)=p₁, a_i(C)=b,

we can replace now by:

beta(x, y, 0) = q₁, beta(x, y, 1) = s₁, beta(x, y, 2) = p₁, beta(x, y, 3) = a₀, beta(x, y, i+3) = b.

The "illegal" inequalities such as q₁<q(C) also can be eliminated:

Eq₂(beta(x, y, 0)=q₂ & q₁<q₂).

Exercise 3.10. Rewrite the formulas START, STOP, STEP_I and STEP_M, and calculate the length of each.

In the formula COMPUTE_M we introduced the variable L for finite sequences of situations, and function letters d(L) and g_i(L). For each situation we have now a code consisting of two numbers x, y. Hence, if the code of the situation C_i is (c_i, d_i), then we can code the sequence L = (C₀,..., C_n-1) as the sequence of numbers:

n, c₀, d₀, c₁, d₁, ..., c_n-1, d_n-1,

i.e. by two numbers x, y such that:

beta(x, y, 0) = n, beta(x, y, 2i+1) = c_i, beta(x, y, 2i+2) = d_i.

Now we can replace the quantifier EL by two quantifiers ExEy, where x, y are variables of PA. Our two last additional function symbols can be eliminated as follows. We will replace the formula d(L)=w by beta(x, y, 0) = w, and g_i(L)=C - by

beta(x, y, 2*i+1)=c₁ & beta(x, y, 2*i+2)=d₁,

where (c₁, d₁) is the code of the situation C.

Exercise 3.11. Rewrite the formulas COMPUTE_M and F(x, y, z), and calculate the length of each.

Thus we have an algorithm allowing to convert a Turing machine M computing the function f(x, y) into a PA formula F(x, y, z) asserting that f(x, y) = z. To complete the proof of the representation theorem we must show that for every natural numbers k, m, n such that f(k, m)=n:

a) PA proves: F(k, m, n).

b) PA proves: Az(~(z=n) -> ~F(k, m, z)).

This would take a lot of time and space. See Mendelson [1997] or Kleene [1952] instead.

Let us think we have proved the representation theorem. Q.E.D.

first order arithmetic, Peano, Dedekind, Grassmann, arithmetic, formal, axiomatic, axioms, Presburger

Back to title page.