

1. Introduction. What Is Logic, Really?

  • 1.1. Total formalization is possible!
  • 1.2. First order languages
  • 1.3. Axioms of logic: minimal system, constructive system and classical system
  • 1.4. The flavor of proving directly
  • 1.5. Deduction theorems

    What is logic?

    Ontology, i.e. common framework for building theories?

    Engine of reasoning?

    Your own favorite concept of logic?

    See also Factasia Logic by Roger Bishop Jones.

    What is logic? Is logic absolute (i.e. unique, predestined) or relative (i.e. there are many kinds of logic)? In modern times, an absolutist position is somewhat inconvenient - you must defend your concept of logic against heretics and dissidents, but very little can be done to exterminate these bad people. They may freely publish their wrecking concepts on the Internet...

    So let us better adopt the relativist position, and define logic(s) as any common framework for building theories. For example, we can view a theory as logic for all its extensions. The so-called absolute geometry can be viewed as a common logic for both the Euclidean and non-Euclidean geometry. Group theory serves as a common logic for theories investigating mathematical structures that are subtypes of groups. And if you decide to rebuild all mathematical theories on your favorite set theory, then you can view set theory as your logic.

    Can there be a common logic for the entire mathematics? To avoid the absolutist approach let us appreciate all the existing concepts of mathematics - classical (traditional), constructivist (intuitionist), New Foundations etc. Of course, enthusiasts of each of these concepts must propose some specific common framework for building mathematical theories, i.e. some specific kind of logic. And they do.

    Can set theory (for example, the most popular one - Zermelo-Fraenkel's set theory) be viewed as a common logic for the classical (traditional) mathematics? You can think so, if you do not wish to distinguish between the first order notion of natural numbers (i.e. discrete mathematics) and the second order notion (i.e. "continuous" mathematics based on set theory or a subset of it). Or, if you do not wish to investigate in parallel the classical and the constructivist (intuitionist) versions of some theories.

     

    1.1. Total Formalization is Possible!

    Gottlob Frege (1848-1925)

    Bertrand Russell (1872-1970)

    David Hilbert (1862-1943)

    How far can we proceed with the mathematical rigor - with the axiomatization of some theory? Complete elimination of intuition, i.e. full reduction of all proofs to a list of axioms and rules of inference, is this possible? The work by Gottlob Frege, Bertrand Russell, David Hilbert and their colleagues showed how this could be achieved even with the most complicated mathematical theories. All these theories were indeed reduced to systems of axioms and rules of inference without any admixture of human skills, intuitions etc. Today, the logical techniques developed by these people allow ultimate axiomatization of any theory that is based on a stable, self-consistent system of principles (i.e. of any mathematical theory).

    What do they look like - such "100% rigorous" theories? They are called formal theories (formal systems, or deductive systems) emphasizing that no step of reasoning can be done without a reference to an exactly formulated list of axioms and rules of inference. Even the most "self-evident" logical principles (like, "if A implies B, and B implies C, then A implies C") must be either formulated in the list of axioms and rules explicitly, or derived from it.

    What is, in fact, a (mathematical) theory? It is an "engine" generating theorems. Then, a formal theory must be an "engine" generating theorems without involving human skills, intuitions etc., i.e. by means of a computer program.

    The first distinctive feature of a formal theory is a precisely defined ("formal") language used to express its propositions. "Precisely defined" means here that there is an algorithm allowing one to determine whether a given character string is a correct proposition or not.

    The second distinctive feature of a formal theory is a precisely defined ("formal") notion of proof. Each proof proves some proposition, which is called (after being proved) a theorem. Thus, theorems are a subset of propositions.

    It may seem surprising to a mathematician, but the most general exact definition of the "formal proof" involves neither axioms, nor inference rules. Neither "self-evident" axioms, nor "plausible" rules of inference are distinctive features of the "formality". Speaking strictly, "self-evident" is synonymous to "accepted without argumentation". Hence, axioms and/or rules of inference may be "good, or bad", "true, or false", and so may be the theorems obtained by means of them. The only definitely verifiable thing here is the fact that some theorem has been, indeed, proved from a fixed set of axioms by means of a fixed set of inference rules.

    Thus, the second distinctive feature of "formality" is the possibility to verify the correctness of proofs mechanically, i.e. without involving human skills, intuitions etc. This can be best formulated by using the (since 1936 - precisely defined) notion of algorithm (a "mechanically applicable computation procedure").

    A theory T is called a formal theory, iff there is an algorithm allowing one to verify whether a given text is a correct proof via the principles of T or not. If somebody is going to publish a "mathematical text" calling it "a proof of a theorem in T", then we must be able to verify mechanically whether the text in question is really a correct proof according to the standards of proving accepted in T. Thus, in a formal theory, the standards of reasoning must be defined precisely enough to enable verification of proofs by means of a computer program. (Note that we are discussing here the verification of ready proofs, and not the much more difficult problem - whether some proposition is provable in T or not, see Exercise 1.1.5 below and the text after it.)

    Axioms and inference rules represent only one (the most popular!) of the possible techniques of formalization.

    As an unpractical example of a formal theory let us consider the game of chess, let us call this "theory" CHESS. Let us call propositions of CHESS all the possible positions - i.e. allocations of some of the pieces (kings must be included) on a chessboard - plus the flag: "white to move" or "black to move". Thus, the set of all possible positions is the language of CHESS. The only axiom of CHESS is the initial position, and the rules of inference - the rules of the game. The rules allow passing from some propositions of CHESS to some other ones. Starting with the axiom and iterating the rules we obtain theorems of CHESS. Thus, theorems of CHESS are all the possible positions (i.e. propositions of CHESS) that can be obtained from the initial position (the axiom of CHESS) by moving pieces according to the rules of the game (i.e. by "proving" via the inference rules of CHESS).

    Exercise 1.1.1. Could you provide an unprovable proposition of CHESS?

    Why is CHESS called a formal theory? When somebody offers a "mathematical text" P as a proof of a theorem A in CHESS, this means that P is a record of some chess-game stopped in the position A. And, of course, checking the correctness of such "proofs" is a boring, but easy task. The rules of the game are formulated precisely enough - we could write a computer program that will execute this task.

    Exercise 1.1.2. Try estimating the size of this program in some programming language.

    Our second example of a formal theory is only a bit more serious. It was proposed by Paul Lorenzen, let us call this theory L. Propositions of L are all the possible "words" made of letters a, b, for example: a, b, aa, aba, abaab. Thus, the set of all these "words" is the language of L. The only axiom of L is the word a, and L has two rules of inference: X |- Xb, and X |- aXa. This means that (in L) from a proposition X we can infer immediately the propositions Xb and aXa. For example, the proposition aababb is a theorem of L:

    a |-rule1 ab |-rule2 aaba |-rule1 aabab |-rule1 aababb

    This fact is expressed usually as L |- aababb ( "L proves aababb", |- being a "fallen T").
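
    The two rules of L are easy to implement. A minimal Python sketch (the function names are mine, not part of the theory) that replays the above derivation:

```python
# The two inference rules of Lorenzen's theory L, acting on words over {a, b}.
def rule1(x):          # X |- Xb
    return x + "b"

def rule2(x):          # X |- aXa
    return "a" + x + "a"

# Replay the derivation a |- ab |- aaba |- aabab |- aababb:
word = "a"                                 # the only axiom of L
for rule in (rule1, rule2, rule1, rule1):
    word = rule(word)
print(word)                                # -> aababb
```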

    Exercise 1.1.3. a) Verify that L is a formal theory. (Hint: describe an algorithm allowing one to determine whether a sequence of propositions of L is a correct proof or not.)

    b) (P. Lorenzen) Verify the following property of theorems in L: for any X, if L |- X, then L |- aaX.

    One of the most important properties of formal theories is given in the following

    Exercise 1.1.4. Show that the set of all theorems of a formal theory is effectively denumerable (or "recursively enumerable", according to the official terminology), i.e. show that, for any formal theory T, a computer program PT can be written that prints out on an (endless) paper tape all theorems of this theory (and nothing else). (Hint: T is called a formal theory, iff an algorithm is presented for checking texts as correct proofs via the principles of reasoning of T. Thus, assume you have 4 functions: GenerateFirstText() - returns Text, GenerateNextText() - returns Text, IsCorrectProof(Text) - returns true or false, ExtractTheorem(Text) - returns Text; you must implement the functions GenerateFirstTheorem() - returns Text, GenerateNextTheorem() - returns Text.)

    Unfortunately, such programs cannot solve the problem that the mathematicians are mainly interested in: is a given proposition A provable in T or not? When, running the program PT on a computer, we see our proposition A printed out, this means that A is provable in T. Still, in general, until that moment we cannot know in advance whether A will be printed out some time later or it will not be printed at all.
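
    For the small theory L, the program PT of Exercise 1.1.4 can be sketched very directly - as a breadth-first closure of the axiom under the two rules (the names below are mine, for illustration only):

```python
from collections import deque

def theorems_of_L():
    """The program "PT" for theory L: print out (here - yield) all
    theorems of L, by breadth-first closure of the axiom 'a' under
    the two rules X |- Xb and X |- aXa."""
    seen = {"a"}
    queue = deque(["a"])
    while queue:
        x = queue.popleft()
        yield x
        for y in (x + "b", "a" + x + "a"):
            if y not in seen:
                seen.add(y)
                queue.append(y)

def wait_for(word):
    """Sit at the computer and wait until `word` is printed out.
    Answers True eventually for every theorem of L, but for a
    non-theorem this loop runs forever - exactly the problem
    described above."""
    for t in theorems_of_L():
        if t == word:
            return True
```

For example, wait_for("aababb") returns True after a finite (though possibly long) wait.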

    Exercise 1.1.5. a) Describe an algorithm determining whether a proposition of L is a theorem or not.

    b) Could you imagine such an algorithm for the theory CHESS? Of course, you can, yet... Thus you see that, even having a relatively simple algorithm for checking proof correctness, the problem of provability can appear to be a very complicated one.

    T is called a solvable theory (more precisely - effectively solvable), iff there is an algorithm allowing one to check whether some proposition is provable by using the principles of T or not. In Exercise 1.1.5a you proved that L is a solvable theory. Still, in Exercise 1.1.5b you established that it is hard to state whether CHESS is a "feasibly solvable" theory or not. Checking the correctness of proposed proofs is always much simpler than determining provability. It can be proved that most mathematical theories are unsolvable, the elementary (first order) arithmetic of natural numbers and set theory included (see, for example, Mendelson [1997], or click here).
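
    For L, solvability can be seen directly: both rules strictly lengthen a word (Xb by one letter, aXa by two), so a derivation of X can never pass through words longer than X. A sketch of the resulting decision algorithm (in effect, one possible solution of Exercise 1.1.5a - skip it if you prefer to find your own):

```python
def is_theorem_of_L(word):
    """Decide provability in L. Since both rules strictly increase
    the length of a word, it suffices to search derivations whose
    words never exceed len(word)."""
    level = {"a"}                  # words derivable in 0, 1, 2, ... steps
    while level:
        if word in level:
            return True
        level = {y for x in level
                   for y in (x + "b", "a" + x + "a")
                   if len(y) <= len(word)}
    return False
```

For example, is_theorem_of_L("aababb") returns True, while is_theorem_of_L("aba") returns False.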

    Normally, mathematical theories contain the negation symbol not. In such theories, solving the problem stated in a proposition A means either proving A, or proving notA ("disproving A", "refuting A"). We can try to solve the problem by using the enumeration program of Exercise 1.1.4: let us sit at the computer and wait until A or notA is printed. If both A and notA were printed, it would mean that T is an inconsistent theory (i.e. using the principles of T one can prove some proposition and its negation). In total, we have here 4 possibilities:

    a) A will be printed, but notA will not (then the problem A has a positive solution),

    b) notA will be printed, but A will not (then the problem A has a negative solution),

    c) A and notA will be printed both (then T is an inconsistent theory),

    d) neither A, nor notA will be printed.

    In the case d) we may sit at our computer forever, yet nothing interesting will happen: using the principles of T one can neither prove nor disprove the proposition A, and for this reason such a theory is called an incomplete theory. The famous incompleteness theorem proved by Kurt Goedel in 1930 says that most mathematical theories are either inconsistent or incomplete (see Mendelson [1997] or click here).

    Exercise 1.1.6. Show that any (simultaneously) consistent and complete formal theory is solvable. (Hint: use the program of Exercise 1.1.4, i.e. assume that you have the functions GenerateFirstTheorem() - returns Text, GenerateNextTheorem() - returns Text, and you must implement the function IsProvable(Text) - returns true or false.) Where do consistency and completeness come in?

    Exercise 1.1.7. Verify that "fully axiomatic theories" are formal theories in the sense of the above general definition. (Hint: assume that you have the following functions: GenerateFirstText() - returns Text, GenerateNextText() - returns Text, IsProposition(Text) - returns true or false, IsAxiom(Proposition) - returns true or false, IsRuleName(Text) - returns true or false, Infer(RuleName, ListOfPropositions) - returns Proposition, and you must implement the functions IsCorrectProof(ListOfPropositions) - returns true or false, ExtractTheorem(Proof) - returns Proposition.)

     

    1.2. First Order Languages

    Aristotle (384-322 BC) - in a sense, the "first logician", "... was not primarily a mathematician but made important contributions by systematizing deductive logic." (according to MacTutor History of Mathematics archive).

    In mathematics and computer science, the most common approach to formalization is using the so-called first order languages.

    Language primitives

    The vision behind the notion of first order languages is centered on the so-called "domain" - a (non-empty?) collection of "objects" that you wish to "describe" (or "define") by using the language. Thus, the first kind of language elements you will need are variables:

    x, y, z, x1, y1, z1, ...

    The above-mentioned "domain" is the intended "range" of all these variables. For example, when building the language of the so-called first order arithmetic, we are thinking about "all natural numbers" as the range of variables:

    0, 1, 2, 3, 4, ...

    Or, when building the language of set theory, we think about "all sets" as the range of variables.

    Note. Since our screens and laser-jets allow only a limited number of pixels per inch, we should generate variable names by using a finite set of characters, for example, by using a single letter x:

    x, xx, xxx, xxxx, xxxxx, ...

    Note. Maybe, you have to describe two or more kinds of "objects" that you do not wish to reduce to "sub-kinds" of one kind of "objects" (for example, integer numbers and character strings). Then you may need introducing for each of your "domains" a separate kind ("sort") of variables. In this way you arrive at a kind of the so-called many-sorted first order languages. Theoretically, many-sorted languages can be reduced to one-sorted languages by introducing the corresponding predicates Sorti(x) ("the value of x belongs to the sort i"). Still, in applications of logic (for example, in computer science) the many-sorted approach is usually more natural and more convenient. (See Chapter 10. Many-Sorted First Order Logic, by Jean Gallier.)

    Warning! Do not try to use many sorted logic to separate "objects" and "object properties" (i.e. sets of "objects"), i.e. do not try building a second order language. You should better use your favorite first order set theory instead.

    The next possibility we may wish to have in our language are the so-called constant letters - symbols denoting some specific "objects" of our "domain". For example, when building the language of the first order arithmetic, we will introduce two constant letters - 0 and 1 to denote "zero" and "one" - two natural numbers having specific properties. When building the language of set theory, we could introduce a constant letter denoting the empty set, but there is a way to do without it as well (see Section TT).

    Note. In many-sorted first order languages, each constant letter must be assigned to some sort.

    In some languages we may need also the so-called function letters - symbols denoting specific functions, i.e. mappings between "objects" of our "domain", or operations on these objects. For example, when building the language of the first order arithmetic, we will introduce only two function letters "+" and "*" denoting addition and multiplication of natural numbers, i.e. the two argument functions x+y and x*y. When building the language of set theory, we could introduce function letters denoting set intersections, unions, set differences, power sets etc., but there is a way to do without them as well (see Section TT).

    Normally, we are writing f(x, y) to denote the value of the function f for the argument values x, y. This is a uniform way suitable for functions having any number of arguments: f(x), g(x, y), h(x, y, z) etc. In our everyday mathematical practice some of two argument functions (in fact - operations) are represented by using the more compact "infix" notation (x+y instead of the uniform +(x, y), etc.).

    Note. In many-sorted first order languages, for each function letter, a) each argument must be assigned to some sort, b) the function values must be assigned to a (single) sort.

    The last kind of primitives we need in our language are the so-called predicate letters - symbols denoting specific properties of, or relations between, "objects" of our "domain". It may seem strange to non-mathematicians, yet the most popular relation of objects used in most mathematical theories is equality (or identity). Still, this is not strange for mathematicians. We can select an object x in our "domain" by using a very specific combination of properties and relations of it, and then - select another object y - by using a different combination. And after this (sometimes it may take many years to do) we prove that x=y, i.e. that these two different combinations of properties and relations are possessed by a single object. Many discoveries in mathematics could be reduced to this form.

    In the language of first order arithmetic, equality "=" is the only available predicate letter. Other predicates must be reduced to equality. For example, the relation x<y for natural numbers can be reduced to equality by using the addition letter and the formula Ez(x+z+1=y). Of course, if the language does not have function letters (as, for example, the language of Zermelo-Fraenkel's set theory), then it must contain additional predicate letters. The language of set theory contains a specific letter "in" denoting the set membership relation: "x in y" means "x is a member of y".
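
    This reduction of < to equality is easy to check numerically. A small sketch (the function name is mine), reading Ez(x+z+1=y) as a search for a witness z:

```python
def less_than(x, y):
    """The formula Ez(x+z+1=y). For natural numbers, a witness z
    (if any) must itself be smaller than y, so a finite search
    over z = 0, 1, ..., y suffices."""
    return any(x + z + 1 == y for z in range(y + 1))

# The defined predicate agrees with the built-in < on natural numbers:
print(all(less_than(x, y) == (x < y)
          for x in range(20) for y in range(20)))    # -> True
```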

    The uniform way of representation suitable for predicates having any number of arguments is again the "prefix" notation: p(x), q(x, y), r(x, y, z) etc. In the real mathematical practice some of the two argument predicates are represented by using the more compact "infix" notation (x=y instead of the uniform =(x, y), etc.).

    Predicate constants? Zero-argument predicate letters can be treated as predicate constants. In an interpretation, each such predicate must become either "true", or "false". Hence, paradoxically, predicate constants behave like "propositional variables" - they represent sentences that do not possess a meaning, but possess a "truth value".

    Note. In many-sorted first order languages, for each predicate letter, each argument must be assigned to some sort.

    Thus, the specification of a first order language includes the following primitives:

    1) A countable set of variable names (you may generate these names, for example, by using a single letter "x": x, xx, xxx, xxxx, ...).

    2) An empty, finite, or countable set of constant letters.

    3) An empty, finite, or countable set of function letters. To each function letter a fixed argument number must be assigned.

    4) A finite, or countable set of predicate letters. To each predicate letter a fixed argument number must be assigned.

    Different sets of primitives yield different first order languages.

    Summary of the primitives of the language of the first order arithmetic:

    a) Variables: x, y, z, ....

    b) Constants: 0, 1.

    c) Function letters: +, *

    d) Predicate letter: =
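
    Such a specification is just data. A sketch (the dictionary encoding is mine, for illustration) recording the two signatures mentioned above:

```python
# The specification of a first order language: constant letters,
# function letters and predicate letters with their argument numbers.
first_order_arithmetic = {
    "constants": {"0", "1"},
    "functions": {"+": 2, "*": 2},       # letter -> number of arguments
    "predicates": {"=": 2},
}

# A different set of primitives yields a different language; the language
# of set theory has no constant or function letters at all:
set_theory = {
    "constants": set(),
    "functions": {},
    "predicates": {"=": 2, "in": 2},
}
```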

    Terms and formulas

    By using the language primitives, we can build terms, atomic formulas and (compound) formulas.

    Terms are expressions denoting "objects" and functions:

    a) Variables names and constant letters (if any), are terms.

    b) If f is a k-argument function letter, and t1, ..., tk are terms, then the string f(t1, ..., tk) is a term.

    c) There are no other terms.

    In the first order arithmetic, for the addition and multiplication letters the "infix" notation is used: if t1, t2 are terms, then (t1+t2) and (t1*t2) are terms.

    If a term does not contain variable names, then it "denotes" an "object" of our "domain" (for example, ((1+1)+1) denotes a specific natural number - the number 3). If a term contains variables, then it denotes a function. For example, (((x*x)+(y*y))+1) denotes the function x^2+y^2+1 (note that the language of first order arithmetic does not contain a function letter denoting the exponentiation x^y, thus, we must write x*x instead of x^2).
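
    The inductive definition of terms suggests a direct recursive evaluator. A sketch (the nested-tuple encoding and the function name are mine) computing the "objects" and functions denoted by terms of first order arithmetic:

```python
def eval_term(t, env=None):
    """Compute the natural number denoted by a term, given values
    of its variables in `env`."""
    env = env or {}
    if isinstance(t, tuple):             # f(t1, t2) encoded as (f, t1, t2)
        f, t1, t2 = t
        v1, v2 = eval_term(t1, env), eval_term(t2, env)
        return v1 + v2 if f == "+" else v1 * v2
    if t in ("0", "1"):                  # constant letters
        return int(t)
    return env[t]                        # a variable: look up its value

three = ("+", ("+", "1", "1"), "1")      # the term ((1+1)+1)
print(eval_term(three))                  # -> 3

# (((x*x)+(y*y))+1) denotes the function x^2+y^2+1:
poly = ("+", ("+", ("*", "x", "x"), ("*", "y", "y")), "1")
print(eval_term(poly, {"x": 2, "y": 3})) # -> 14
```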

    If the language does not contain constant letters and function letters (as, for example, the language of set theory), then variable names are the only kind of terms.

    Note. In many-sorted first order languages, the term definition is somewhat more complicated: building the term f(t1, ..., tk) is allowed only if the sort of the term ti values coincides with the sort of the i-th argument of f.

    Of course, the key element of our efforts in describing "objects" will be assertions, for example, ((x+y)=(y+x)). In first order languages, assertions are called formulas (or, sometimes - sentences).

    Atomic formulas (in some other textbooks: elementary formulas, prime formulas) are defined as follows:

    a) If p is a k-argument predicate letter, and t1, ..., tk are terms, then the string p(t1, ..., tk) is an atomic formula.

    b) (If you like so,) there are no other atomic formulas.

    For the equality letter the "infix" notation is used: if t1, t2 are terms, then (t1=t2) is an atomic formula.

    Note. In many-sorted first order languages, the atomic formula definition is somewhat more complicated: building the formula p(t1, ..., tk) is allowed only if the sort of the term ti values coincides with the sort of the i-th argument of p.

    Summary of the terms and atomic formulas of the language of the first order arithmetic:

    a) The constants 0 and 1, and all variables are terms.

    b) If t1 and t2 are terms, then (t1+t2) and (t1*t2) also are terms.

    c) Atomic formulas are built as (t1=t2), where t1 and t2 are terms.

    In this language, even using the only available predicate letter "=" atomic formulas can express a lot of clever things:

    x+0=x, x+y=y+x, x+(y+z)=(x+y)+z,
    x*0=0, x*1=x, x*y=y*x, x*(y*z)=(x*y)*z,
    (x+y)*z=(x*z)+(y*z).

    Exercise 1.2.1. As the next step, translate the following assertions into the language of first order arithmetic: 2*2=4, 2*2=5, (x+y)^2 = x^2+2xy+y^2.

    To write more complicated assertions, we will need compound formulas, built of atomic formulas by using a fixed set of propositional connectives and quantifiers. In this book, we will use the following set:

    -> (implication letter, B->C means "if B, then C", or "B implies C", or "C follows from B").

    & (conjunction letter, B&C means "B and C").

    v (disjunction letter, BvC means "B, or C, or both", i.e. the so-called non-exclusive "or").

    ~ (negation letter, ~B means "not B").

    A (the universal quantifier, AxB means "for all x: B").

    E (the existential quantifier, ExB means "there is x such that B").

    The widely used equivalence connective <-> can be derived in the following way: B<->C stands for (B->C)&(C->B). If you like using the so-called exclusive "or", you could define B xor C as ~(B<->C).
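
    Reading the connectives classically (as truth functions), both definitions are easy to verify by truth tables. A small sketch checking them over all four combinations of truth values:

```python
def implies(b, c):      # the truth table of B->C
    return (not b) or c

for B in (True, False):
    for C in (True, False):
        equiv = implies(B, C) and implies(C, B)   # B<->C as (B->C)&(C->B)
        assert equiv == (B == C)                  # <-> means "same truth value"
        assert (not equiv) == (B != C)            # B xor C as ~(B<->C)
print("ok")
```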

    Warning! For programmers, conjunction, disjunction and negation are familiar "logical operations" - unlike the implication that is not used in "normal" programming languages. In programming, the so-called IF-statements, when compared to logic, mean a different thing: in the statement IF x=y THEN z:=17, the condition x=y is, indeed, a formula, but the "consequence" z:=17 is not a formula - it is an executable statement. In logic, in B->C ("if B, then C"), B and C both are formulas.

    We define the notion of formula of our first order language as follows:

    a) Atomic formulas are formulas.

    b) If B and C are formulas, then (B->C), (B&C), (BvC), and (~B) also are formulas (B and C are called sub-formulas).

    c) If B is a formula, and x is a variable name, then (AxB) and (ExB) also are formulas (B is called a sub-formula).

    d) (If you like so,) there are no other formulas.

    Note. In many-sorted first order languages, the "meaning" of a quantifier depends on the sort of the variable used with it. For example, Ax means "for all values of x from the domain of the sort of x".

    See also Notes on Logic Notation on the Web by Peter Suber.

    Some simple examples of compound formulas in the language of the first order arithmetic:

    (Eu(x=(u+u))) "x is an even number"
    (Eu(((x+u)+1)=y)) "x is less than y", or x<y
    (0<x & Eu(x=(y*u))) "x is divisible by y"
    ((1<x)&(~(Ey(Ez(((y<x)&(z<x))&(x=(y*z))))))) "x is a prime number"
    (Aw(Ex((w<x)&(x is a prime number)))) "There are infinitely many prime numbers" (one of the first mathematical theorems, VI century BC)
    AxAy(0<y -> EzEu(u<y & x=y*z+u)) What does it mean?

    Warning! Speaking strictly, predicate letters "<", ">", "<=", ">=", "<>" etc. do not belong to the language of the first order arithmetic. Thus, in the third formula we should replace the abbreviation 0<x by ~(x=0), in the fourth formula: 1<x, y<x, z<x - by Eu(((1+u)+1)=x), Eu(((y+u)+1)=x) and Eu(((z+u)+1)=x) respectively, etc.

    Exercise 1.2.2. As the next step, "translate" the following assertions into the language of the first order arithmetic:

    "x and y do not have common divisors" (note: 1 is not a divisor!),
    "x and y are twin primes" (examples of twin pairs: 3,5; 5,7; 11,13; 17,19;...),
    "There are infinitely many pairs of twin primes" (the famous Twin Prime Conjecture),
    "x is a power of 2" (Warning! En(x=2n) is not a correct solution. Why? Because exponentiation does not belong to the language of the first order arithmetic),
    "Each positive even integer >=4 can be expressed as a sum of two primes" (the famous Goldbach Conjecture),
    "sqrt(2) is not a rational number" (Warning! ~EpEq(sqrt(2)=p/q, and Ex(x*x=2) are not correct solutions. Why?)

    Warning! Some typical problems. 1) Computer programmers do not like using the implication connective ->, trying to write formulas as conditions of IF- or WHILE-statements, i.e. by using conjunction, disjunction and negation only. This "approach" makes most logical tasks much harder than they really are!
    2) Do not use abbreviations at this early stage of your studies. For example, do not write (Ax>2)EyF(x,y) to say that "for all x that are >2, there is y such that F(x,y)". Instead, you should write Ax(x>2->EyF(x,y)). And don't be silly writing Ax(x>2&EyF(x,y)) - this would imply that Ax(x>2)! How about x=1? Similarly, instead of (Ea>2)EbG(a,b), you should write Ea(a>2&EbG(a,b)), where & (here, correctly!) is a conjunction.

    Exercise 1.2.3. Imagine the following first order language: it consists of variables whose "range" is "the set of all people", and two predicate letters: FATHER(x, y) ("x is the father of y") and MOTHER(x, y) ("x is the mother of y"). Construct formulas expressing as much relationships between people as you can. For example, FATHER(x, y) v MOTHER(x, y) means CHILD(y, x) ("y is child of x"). For smart students: how about the predicate ANCESTOR(x, y)?

    Omitting parentheses

    Our formal definitions of terms and formulas lead to expressions containing many parentheses. Let us recall, for example, our formula expressing that "x is a prime number":

    ((1<x)&(~(Ey(Ez(((y<x)&(z<x))&(x=(y*z)))))))

    Such formulas are easy for computers to read, yet inconvenient for humans (especially, to write them correctly). In the usual mathematical practice (and in programming languages) we are allowed to improve the look of our formulas by omitting some of the parentheses - according to (some of) the following rules:

    a) Omit the outermost parentheses, for example, we may write A->(B->C) instead of the formally correct (A->(B->C)). In this way we may improve the final look of our formulas. Still, if we wish to use such formulas as parts of more complicated formulas, we must restore the outermost parentheses, for example: (A->(B->C))->D.

    b) We may write, for example, simply:

    x+y+z+u, x*y*z*u, A&B&C&D, AvBvCvD, ExAyEzAu(F)

    instead of the more formal

    ((x+y)+z)+u, ((x*y)*z)*u, ((A&B)&C)&D, ((AvB)vC)vD, Ex(Ay(Ez(Au(F)))).

    In this way we can simplify the above expression "x is a prime number" as follows:

    (1<x)&(~(EyEz((y<x)&(z<x)&(x=(y*z))))).

    c) We can apply the so-called priority rules. For example, the priority rank of multiplications is supposed to be higher than the priority rank of additions. This rule allows writing x+y*z instead of the more formal x+(y*z) - because of its higher priority rank, multiplication must be "performed first". The most popular priority rules are the following:

    c1) The priority rank of function letters is higher than the priority rank of predicate letters. This allows, for example, writing x*y = y*x instead of (x*y)=(y*x), or "x in yUz" - instead of "x in (yUz)".

    c2) The priority rank of predicate letters is higher than the priority rank of propositional connectives and quantifiers. This allows, for example, writing y<x & z<x instead of (y<x)&(z<x).

    c3) The priority rank of quantifiers is higher than the priority rank of propositional connectives. This allows, for example, writing Ex(F)&Ay(G) instead of (Ex(F))&(Ay(G)), or writing ~Ex(F) instead of ~(Ex(F)).

    c4) The priority rank of negations is higher than the priority rank of conjunctions and disjunctions. This allows, for example, writing ~A&~B instead of (~A)&(~B).

    c5) The priority rank of conjunctions and disjunctions is higher than the priority rank of implications. This allows, for example, writing A->AvB instead of A->(AvB).

    In the usual mathematical practice some additional priority rules are used, but some of them are not allowed in the common programming languages. To avoid confusions we do not recommend using too many priority rules.

    According to the above priority rules, we can simplify the above expression "x is a prime number" obtaining a form that is much easier for human reading (but is somewhat complicated for computers to understand):

    1<x & ~EyEz(y<x & z<x & x=y*z).

    As you see, all the above rules are mere abbreviations. In principle, you could use any other set of abbreviation rules accepted by your audience. If computers would do logic themselves, they would not need such rules at all (except, maybe, for displaying some of their results to humans, but why?).

    Exercise 1.2.3X. Restore parentheses in the formulas preceding Exercise 1.2.1.

    Free variables and bound variables

    The above expression "x is a prime number":

    1<x & ~EyEz(y<x & z<x & x=y*z)

    contains 3 variables: x - occurs 4 times in terms, y - 2 times in terms and 1 time in quantifiers, z - occurs 2 times in terms and 1 time in quantifiers. Of course, x is here a "free" variable - in the sense that the "truth value" of the formula depends on particular "values" taken by x. On the contrary, the "truth value" of the formula does not depend on the particular "values" taken by the two "bound" variables y and z - the quantifiers Ey, Ez force these variables to "run across their entire range".

    More precisely, first, we will count only the occurrences of variables in terms, not in quantifiers. And second, we will define a particular occurrence ox of a variable x in (a term of) a formula F as a free occurrence or a bound occurrence according to the following rules:

    a) If F does not contain quantifiers Ex, Ax, then ox is free in F.

    b) If F is ExG or AxG, and ox is free in G, then ox is bound in F.

    c1) If F is G&H, GvH, or G->H, and ox is free in G (or in H), then ox is free in F.

    c2) If F is ~G, EyG, or AyG, where y is not x, and ox is free in G, then ox is free in F.

    d1) If F is G&H, GvH, or G->H, and ox is bound in G (or in H), then ox is bound in F.

    d2) If F is ~G, EyG, or AyG (where y is any variable, x included), and ox is bound in G, then ox is bound in F.

    Thus, the above formula 1<x & ~EyEz(y<x & z<x & x=y*z) contains 4 free occurrences of x, 2 bound occurrences of y, and 2 bound occurrences of z.
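    The rules a)-d2) can be restated as a simple recursive computation. The following sketch (in Python; the tuple representation is our own convention: ('E', x, F) and ('A', x, F) for quantified formulas, ('atom', v1, v2, ...) for an atomic formula listing the variables occurring in its terms) computes the set of variables having at least one free occurrence:

```python
# free_vars follows the recursion of rules a)-d2): an occurrence stays
# free until some enclosing Ex/Ax binds its variable.
def free_vars(f):
    tag = f[0]
    if tag == 'atom':
        return set(f[1:])                       # base case, rule a)
    if tag == '~':
        return free_vars(f[1])                  # rules c2)/d2)
    if tag in ('&', 'v', '->'):
        return free_vars(f[1]) | free_vars(f[2])  # rules c1)/d1)
    if tag in ('E', 'A'):
        return free_vars(f[2]) - {f[1]}         # rule b): x becomes bound
    raise ValueError(tag)

# "x is a prime number": 1<x & ~EyEz(y<x & z<x & x=y*z)
prime = ('&', ('atom', 'x'),
         ('~', ('E', 'y', ('E', 'z',
            ('&', ('&', ('atom', 'y', 'x'), ('atom', 'z', 'x')),
                  ('atom', 'x', 'y', 'z'))))))
print(free_vars(prime))   # {'x'}
```

    As expected, only x occurs free in the formula; y and z are bound by their quantifiers.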

    Exercise 1.2.4. Verify that an occurrence of x in F cannot be free and bound simultaneously. (Hint: assume that it is not the case, and consider the sequence of all sub-formulas of F containing this particular occurrence of x.)

    Formally, we can use formulas containing free and bound occurrences of a single variable simultaneously, for example, x>1 -> Ex(x>1). Still, we do not recommend using a single variable letter in many different roles in a single formula. Such formulas do not cause problems for computers, but they may be inconvenient for human reading.

    Let us say that x is a free variable of the formula F, iff F contains at least one free occurrence of x, or F does not contain occurrences of x at all.

    If a formula contains free variables, i.e. variables that are not bound by quantifiers (for example: x=0 v x=1), then the "truth value" of such formulas may depend on particular values assigned to free variables. For example, the latter formula is "true" for x=1, yet it is "false" for x=2. Formulas that do not contain free occurrences of variables, are called closed formulas, for example:

    AwEx( w<x & x is a prime number).

    Closed formulas represent "definite assertions about objects of theory", they are expected to be (but not always really are) either "true", or "false".

    Term substitution

    To say that x is a free variable of the formula F, we may wish to write F(x) instead of simply F. Replacing all free occurrences of x by a term t yields an "instance" of the formula F. It would be natural to denote this "instance" by F(t).

    Sometimes, this operation can lead to crazy results. For example, in our expression "x is a prime number", let us replace x by y:

    1<y & ~EyEz(y<y & z<y & y=y*z).

    Does this formula mean "y is a prime number"? Since y<y is always false, the second part ~EyEz(...) is true, hence, the latter formula means simply that "1 is less than y".

    Of course, we failed because we replaced a free variable x by a variable y in such a way that some free occurrences of x became bound by quantifiers. In this way we deformed the initial meaning of our formula.

    The following simple rule allows us to avoid such situations. Suppose, x is a free variable of the formula F. We will say that the substitution F(x/t) (i.e. the substitution of the term t for x in the formula F) is admissible, iff no free occurrences of x in F are located under quantifiers that bind variables contained in t. Of course, if the substitution F(x/t) is admissible, then replacing all free occurrences of x in F by t preserves the initial meaning of the formula F(x), hence, we may safely denote the result of this substitution by F(t).
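    The admissibility check is again a simple recursion over the formula. A sketch (in Python; the representation and the name admissible are our own assumptions: quantifiers as ('E', x, F) / ('A', x, F), atomic formulas as ('atom', ...) listing the variables of their terms):

```python
# F(x/t) is admissible iff no free occurrence of x in F lies under a
# quantifier binding some variable of t. We carry along the set of
# quantifier variables ("binders") enclosing the current subformula.
def admissible(f, x, t_vars, binders=frozenset()):
    tag = f[0]
    if tag == 'atom':
        if x in f[1:] and binders & t_vars:
            return False    # a free x sits under a capturing quantifier
        return True
    if tag == '~':
        return admissible(f[1], x, t_vars, binders)
    if tag in ('&', 'v', '->'):
        return (admissible(f[1], x, t_vars, binders) and
                admissible(f[2], x, t_vars, binders))
    if tag in ('E', 'A'):
        if f[1] == x:
            return True     # below this point, all occurrences of x are bound
        return admissible(f[2], x, t_vars, binders | {f[1]})
    raise ValueError(tag)

# "x is a prime number": 1<x & ~EyEz(y<x & z<x & x=y*z)
prime = ('&', ('atom', 'x'),
         ('~', ('E', 'y', ('E', 'z',
            ('&', ('&', ('atom', 'y', 'x'), ('atom', 'z', 'x')),
                  ('atom', 'x', 'y', 'z'))))))
assert not admissible(prime, 'x', frozenset({'y'}))  # x/y: captured by Ey
assert admissible(prime, 'x', frozenset({'w'}))      # x/w: safe
```

    This confirms the example above: substituting y for x in "x is a prime number" is not admissible, while substituting a fresh variable w is.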

    Exercise 1.2.5. Is x/y an admissible substitution in the following formulas? Why?

    x=0 v Ey(y>z)
    x=0 v Ey(y>x)

    Exercise 1.2.6. a) Mathematicians: think over the analogy between bound variables in logic and bound variables in sum expressions and integrals.

    b) Programmers: think over the analogy between bound variables in logic and loop counters in programs.

     

    1.3. Axioms of Logic: Minimal System, Constructive System and Classical System

    Now, for any fixed first order language L, we wish to formulate a list of logical axioms and inference rules that would allow formalization of any "pure logical" chains of reasoning.

    Gottlob Frege (1848-1925) - "In 1879 Frege published his first major work Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens (Conceptual notation, a formal language modelled on that of arithmetic, for pure thought). A.George and R Heck write: ... In effect, it constitutes perhaps the greatest single contribution to logic ever made and it was, in any event, the most important advance since Aristotle. ... In this work Frege presented for the first time what we would recognise today as a logical system with negation, implication, universal quantification, essentially the idea of truth tables etc."(according to MacTutor History of Mathematics archive).

    Bertrand Russell (1872-1970) - "The Principia Mathematica is a three-volume work on the foundations of mathematics, written by Bertrand Russell and Alfred North Whitehead and published in 1910-1913. It is an attempt to derive all mathematical truths from a well-defined set of axioms and inference rules in symbolic logic. The main inspiration and motivation for the Principia was Frege's earlier work on logic, which had led to some contradictions discovered by Russell." (according to Wikipedia, the Free Encyclopedia).

    David Hilbert (1862-1943), Wilhelm Ackermann (1896-1962):

    D.Hilbert, W.Ackermann. Grundzuege der theoretischen Logik. Berlin (Springer), 1928 (see online comments Hilbert and Ackermann's 1928 Logic Book by Stanley N.Burris).

    The first version of logical axioms was introduced in 1879 by G.Frege in his above-mentioned Begriffsschrift (see online comments Concept Script: Frege by Stanley Burris). The next important version was proposed in 1910-1913 by B.Russell and A.Whitehead in their Principia Mathematica. And finally, in 1928 D.Hilbert and W.Ackermann published in their above-mentioned book, in a sense, the final version. Modifications of this version are used in all textbooks of mathematical logic.

    In our version, the axioms will be represented by means of the so-called axiom schemas. Each schema represents an infinite, yet easily recognizable collection of single axioms. For example, the schema L3: B&C->B may represent the following axioms ("instances of the schema") in the language of first order arithmetic:

    x=y & x=x -> x=y,

    1*1=1 & 1+1=1+1 ->1*1=1,

    and many other axioms (depending on the language).

    We will not specify properties of the equivalence connective in axioms. We will regard this connective as a derived one: B<->C will be used as an abbreviation of (B->C)&(C->B).

    Axioms

    In the schemas L1-L11 below, B, C and D are any formulas.

    "Definition" of the implication connective:

    L1: B->(C->B) (what does it mean?),

    L2: (B->(C->D))->((B->C)->(B->D)) (what does it mean?).

    "Definition" of the AND-connective (conjunction):

    L3: B&C->B (what does it mean?),

    L4: B&C->C (what does it mean?),

    L5: B->(C->B&C) (what does it mean?).

    "Definition" of the (non-exclusive) OR-connective (disjunction):

    L6: B->BvC (what does it mean?),

    L7: C->BvC (what does it mean?),

    L8: (B->D)->((C->D)->(BvC->D)) (what does it mean?).

    "Definition" of the NOT-connective (negation) - refutation by deriving a contradiction (Reductio ad absurdum):

    L9: (B->C)->((B->~C)->~B) (what does it mean?).

    "Contradiction Implies Anything" (Ex contradictione sequitur quodlibet, or Ex falso sequitur quodlibet):

    L10: ~B->(B->C) (what does it mean?).

    Law of the Excluded Middle (Tertium non datur):

    L11: Bv~B (what does it mean?).

    The above 11 schemas (plus the Modus Ponens rule of inference, see below) represent the classical propositional logic in the language L.
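    Under the classical two-valued reading of the connectives, every instance of the schemas L1-L11 is a tautology, and this can be verified mechanically. A sketch (in Python; the evaluator ev, the tuple representation of formulas and the name tautology are our own devices, not part of the text):

```python
# Brute-force truth tables: a formula is a tautology iff it evaluates
# to True under every assignment of truth values to its letters.
from itertools import product

def ev(f, val):
    if isinstance(f, str):
        return val[f]
    op = f[0]
    if op == '~':  return not ev(f[1], val)
    if op == '&':  return ev(f[1], val) and ev(f[2], val)
    if op == 'v':  return ev(f[1], val) or ev(f[2], val)
    if op == '->': return (not ev(f[1], val)) or ev(f[2], val)

def tautology(f, letters):
    return all(ev(f, dict(zip(letters, vs)))
               for vs in product([True, False], repeat=len(letters)))

L3  = ('->', ('&', 'B', 'C'), 'B')             # B&C->B
L10 = ('->', ('~', 'B'), ('->', 'B', 'C'))     # ~B->(B->C)
L11 = ('v', 'B', ('~', 'B'))                   # Bv~B
assert all(tautology(f, ['B', 'C']) for f in (L3, L10, L11))
```

    Since Modus Ponens preserves truth under any fixed assignment, all formulas provable in the classical propositional logic are tautologies.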

    In the following schemas L12, L13, F is any formula, and t is a term such that the substitution F(x/t) is admissible (in particular, t may be x itself):

    L12: AxF(x)->F(t) (in particular, AxF(x)->F(x), what does it mean?),

    L13: F(t)->ExF(x) (in particular, F(x)->ExF(x), what does it mean?).

    In the following schemas L14, L15, F is any formula, and G is a formula that does not contain x as a free variable:

    L14: Ax(G->F(x))->(G->AxF(x)) (what does it mean?),

    L15: Ax(F(x)->G)->(ExF(x)->G) (what does it mean?).

    Rules of Inference

    In the following rules of inference, B, C and F are any formulas.

    Modus Ponens: B->C, B |- C (what does it mean?).

    Generalization: F(x) |- AxF(x) (what does it mean?).

    This list of logical axioms and rules of inference represents the so-called classical predicate logic in the first order language L (or, simply - the classical logic in the language L).

    Comments

    The axioms L1, L2 represent the (currently) most popular version of "defining" the implication connective. About other (equivalent) versions - containing 3 or 4 axioms - see Hilbert, Bernays [1934] (Chapter III) and Exercise 1.5.4.

    The axiom L9 represents the (currently) most popular version of "defining" the negation connective. About other (equivalent) versions - see Hilbert, Bernays [1934] (Chapter III), Exercise 2.4.2, and Section 7.1.

    Three of the above axioms seem to be (at least partly) problematic.

    For example, how do you find the funny axiom L10: ~B->(B->C)? If ~B and B were true simultaneously, then anything would be true? Is this a really "true" axiom? Of course, it is not. Still, this does not matter: we do not need to know whether C would be "true" or not, if ~B and B were "true" simultaneously. By assuming that "if ~B and B were true simultaneously, then anything would be true" we greatly simplify our logical apparatus. For example, we will prove in Section 2.6 that, in the classical logic, ~~B->B. This simple formula could not be proved without the "crazy" axiom L10 (see Section 2.8).

    In fact, the first axiom L1: B->(C->B) is as funny as L10. If B is (unconditionally) true, then B follows from C, even if C has nothing in common with B? Moreover, in Exercise 1.4.2(d) we will see that the axioms L1, L9 allow proving that ~B, B |- ~C, i.e. if ~B and B were true simultaneously, then anything would be false (50% of L10!). After this, could we think of L1 as a really "true" axiom? Of course not. Still, this does not matter: if B is (unconditionally) true, then we do not need to know whether B follows from C or not. By assuming that "if B is true, then B follows from anything" we greatly simplify our logical apparatus.

    The above two phenomena are called paradoxes of the material implication, see Paradoxes of Material Implication by Peter Suber, and Falsity Implies Anything by Alexander Bogomolny.

    Constructive Logic

    Still, the most serious problem is caused by L11: Bv~B - the Law of the Excluded Middle. How can we think of L11 as a "true" axiom, if (according to Goedel's Incompleteness Theorem) each sufficiently strong consistent theory contains undecidable propositions? I.e. we postulate that B or ~B "must be true", yet for some B we can prove neither B, nor ~B! Knowing that Bv~B is "true" may inspire us to work on the problem, but it may appear useless, if we do not succeed... Should we retain L11 as an axiom after this?

    For these (and some other) reasons some people reject L11 (or even both - L10 and L11) as "valid" logical axioms.

    The above list as it stands is called the classical logic.

    By excluding L11 from the above list the so-called constructive (historically, and in most textbooks - intuitionistic) logic is obtained. As a concept, it was introduced by L.E.J.Brouwer in 1908:

    L.E.J. Brouwer. De onbetrouwbaarheid der logische principes (in Dutch - " The unreliability of the logical principles" ), Tijdschrift voor Wijsbegeerte, 2 (1908), pp.152-158.

    As a formal system, the intuitionistic logic was introduced by Arend Heyting in 1930:

    A.Heyting. Die formalen Regeln der intuitionistischen Mathematik. Sitzungsberichte der Preussischen Akademie der Wissenschaften, Physikalisch-mathematische Klasse, 1930, pp.42-56.

    The constructive concept of logic differs from the classical one mainly in its interpretation of disjunction and existence assertions:

    - To prove BvC constructively, you must prove B, or prove C. To prove BvC classically, you may assume ~(BvC) as a hypothesis, and derive a contradiction. Having only such a "negative" proof, you may be unable to determine, which part of the disjunction BvC is true - B, or C, or both. Knowing that BvC is "true" may inspire you to work on the problem, but it may appear useless, if you do not succeed...

    - To prove ExB(x) constructively, you must provide a particular value of x such that B(x) is true. To prove ExB(x) classically, you may assume Ax~B(x) as a hypothesis, and derive a contradiction. Having only such a "negative" proof, you may be unable to find a particular x for which B(x) is true. Knowing that ExB(x) is "true" may inspire you to work on the problem, but it may appear useless, if you do not succeed...

    Note. Informally, we may regard existence assertions as "huge disjunctions". For example, in the language of first order arithmetic, ExB(x) could be "replaced" by B(0)vB(1)vB(2)v..., i.e. by an infinite "formula". Thus, the above two theses are, in a sense, equivalent.

    For elegant examples of non-constructive proofs see Constructive Mathematics by Douglas Bridges in Stanford Encyclopedia of Philosophy.

    Note. A similar kind of non-constructive reasoning is represented by the so-called Double Negation Law: ~~B->B, see Section 2.6.

    The constructive (intuitionist) logic is one of the great discoveries in mathematical logic - surprisingly, a complete system of constructive reasoning (as we will see later, in Section 4.4) can be obtained simply by dropping the Law of the Excluded Middle from the list of valid logical principles.

    See also Intuitionistic Logic by Joan Moschovakis in Stanford Encyclopedia of Philosophy.

    Luitzen Egbertus Jan Brouwer (1881-1966): "He rejected in mathematical proofs the Principle of the Excluded Middle, which states that any mathematical statement is either true or false. In 1918 he published a set theory, in 1919 a measure theory and in 1923 a theory of functions all developed without using the Principle of the Excluded Middle." (according to MacTutor History of Mathematics archive). "As Heinrich Scholz used to say in his courses: the true renovators of philosophy are neither Heidegger nor Sartre, but Brouwer, because only he has attacked the two-thousand-year-old bastion of Platonism: the conception of mathematical entities as things in themselves." (quoted after Andrés R.Raggio, Escritos Completos, Prometeo Libros, 2002).

    Minimal Logic

    By excluding both L10 and L11 the so-called minimal logic is obtained. It was introduced by Ingebrigt Johansson in 1936:

    I.Johansson. Der Minimalkalkuel, ein reduzierter intuitionistischer Formalismus. Compositio Mathematica, 1936, Vol. 4, N1, pp.119-136.

    As a separate concept, the minimal logic is much less significant than the constructive logic. Indeed, since it allows proving ~B, B |- ~C (50% of L10!), dropping L10 is not a very big step.

    First Order Theories

    All really working formal mathematical theories are formulated as the so-called first order theories.

    Each first order theory T includes:

    a) a specific first order language L(T),

    b) logical axioms and rules of inference for this language (the classical or constructive version may be adopted),

    c) a set of specific non-logical axioms of T (specific non-logical rules of inference are not necessary, as we will see in Section ZZ).

    An example of a first order theory - the so-called first order arithmetic (also called Peano arithmetic):

    The language:

    a) The constants 0 and 1, and all variables are terms.

    b) If t1 and t2 are terms, then (t1+t2) and (t1*t2) also are terms.

    c) Atomic formulas are built as (t1=t2), where t1 and t2 are terms.

    Since we can use, for example, the expression 2x^2-3y^2-1=0 as a shortcut for (1+1)*x*x=(1+1+1)*y*y+1, we can say simply that, in the first order arithmetic, atomic formulas are arbitrary Diophantine equations.

    The specific (non-logical) axioms of the first order arithmetic:

    x=x,
    x=y -> y=x,
    x=y -> (y=z -> x=z),
    x=y -> x+1=y+1,
    ~(0=x+1),
    x+1=y+1 -> x=y,
    x+0=x,
    x+(y+1)=(x+y)+1,
    x*0=0,
    x*(y+1)=(x*y)+x,
    B(0) & Ax(B(x)->B(x+1)) -> AxB(x), where B is any formula.

    The axioms 7-10 represent recursive definitions of addition and multiplication. The last axiom listed is the so-called induction schema. (To read more - click here.)
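    The recursive character of the axioms 7-10 is easy to see: read as rewrite rules, they compute ordinary addition and multiplication. A numerical sanity check (our own sketch, not part of the theory; the names add and mul are ours):

```python
# Axioms 7-8 as a recursion on the second argument:
#   x+0 = x,  x+(y+1) = (x+y)+1
def add(x, y):
    return x if y == 0 else add(x, y - 1) + 1

# Axioms 9-10 as a recursion on the second argument:
#   x*0 = 0,  x*(y+1) = (x*y)+x
def mul(x, y):
    return 0 if y == 0 else mul(x, y - 1) + x

assert add(3, 4) == 7
assert mul(3, 4) == 12
```

    Each recursive call strips one "+1" from the second argument, so the recursion always terminates on natural numbers.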

    Proofs and Theorems

    In general, any sequence of formulas F1, F2, ..., Fm could be regarded as a (correct or incorrect) formal proof (or, simply, a proof) of its last formula Fm. In a correct proof, formulas can play the following roles:

    a) Axioms. Some formulas may be instances of logical or non-logical axioms.

    b) Consequences of earlier formulas, obtained by using rules of inference. For example, if F25 is A, and F34 is A->B, and F51 is B, then we can say that F51 has been obtained from F25 and F34 by using the Modus Ponens rule. Or, if F62 is C(x), and F63 is AxC(x), then we can say that F63 has been obtained from F62 by using the Generalization rule.

    c) Hypotheses. Some formulas may appear in the proof as "proved elsewhere", or even without any justification, simply by assuming they are "true".

    Thus, the following notation can describe the real status of a formal proof:

    [T]: A1, A2, ..., An |- B,

    where T is a first order theory (it determines which formulas are axioms and which are not), A1, A2, ..., An are all the hypotheses used in the proof, and B is the formula proved by the proof. Each formula in such a proof must be either an axiom, or a hypothesis from the set A1, A2, ..., An, or it must be obtained from earlier formulas (in this proof) by using a rule of inference. You may wish to read the above notation as "in theory T, by using formulas A1, A2, ..., An as hypotheses, the formula B can be proved".
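    Verifying such a proof is a purely mechanical matter - exactly as the definition suggests. A sketch of a checker for the Modus Ponens part (in Python; the list representation, the name check_proof, and the convention that the justification ('MP', i, j) means "line j is (line i)->(this formula)" are all our own assumptions):

```python
# lines: list of (formula, justification); justification is 'axiom',
# 'hyp', or ('MP', i, j) with 1-based references to earlier lines.
# Formulas are nested tuples such as ('->', 'A', 'B'), atoms are strings.
def check_proof(lines, axioms, hypotheses):
    proved = []
    for formula, why in lines:
        if why == 'axiom':
            ok = formula in axioms
        elif why == 'hyp':
            ok = formula in hypotheses
        else:                      # Modus Ponens step
            _, i, j = why
            fi, fj = proved[i - 1], proved[j - 1]
            ok = fj == ('->', fi, formula)   # fj must be fi -> formula
        if not ok:
            return False
        proved.append(formula)
    return True

# A, A->B |- B:
AB = ('->', 'A', 'B')
assert check_proof([('A', 'hyp'), (AB, 'hyp'), ('B', ('MP', 1, 2))],
                   axioms=set(), hypotheses={'A', AB})
```

    A full checker would also recognize instances of the axiom schemas and Generalization steps; the skeleton stays the same.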

    For first examples of real formal proofs see the next section, Theorem 1.4.1 and Theorem 1.4.2.

    In the real mathematical practice, when proving [T]: A1, A2, ..., An |- B, we may apply some theorem Q that has already been proved earlier. If we simply inserted Q into our formal proof, then, formally, this would yield only a proof of [T]: A1, A2, ..., An, Q |- B. To obtain the desired formal proof of [T]: A1, A2, ..., An |- B, we must insert not only Q itself, but the entire proof of Q! In this way we obtain the following

    Theorem 1.3.1. If there is a proof of [T]: A1, A2, ..., An, Q |- B, and a proof of [T]: A1, A2, ..., An |- Q, then there is a proof of [T]: A1, A2, ..., An |- B.

    Proof. A proof of [T]: A1, A2, ..., An, Q |- B is a sequence of formulas F1, F2, ..., Q, ..., Fm, B, and a proof of [T]: A1, A2, ..., An |- Q is some sequence of formulas G1, G2, ..., Gp, Q. Let us replace Q by G1, G2, ..., Gp, Q:

    F1, F2, ..., G1, G2, ..., Gp, Q, ..., Fm, B,

    and eliminate the duplicate formulas. This sequence is a proof of [T]: A1, A2, ..., An |- B. Q.E.D.
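    The splicing operation used in this proof is easily made explicit, treating proofs simply as sequences of formulas, as in the definition above. A sketch (the name splice is our own assumption):

```python
# Replace every occurrence of the hypothesis Q inside a proof of B by an
# entire proof of Q, then eliminate duplicate formulas, keeping first copies.
def splice(proof_of_B, Q, proof_of_Q):
    spliced = []
    for F in proof_of_B:
        spliced.extend(proof_of_Q if F == Q else [F])
    seen, result = set(), []
    for F in spliced:
        if F not in seen:
            seen.add(F)
            result.append(F)
    return result

# Proof of B from A and Q, proof of Q from A; the splice proves B from A alone:
assert splice(['A', 'Q', 'B'], 'Q', ['A', 'G1', 'Q']) == ['A', 'G1', 'Q', 'B']
```

    Eliminating duplicates keeps the first copies, so every formula still appears only after the formulas it was derived from.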

    If, in some proof, hypotheses are not used at all, then we may write simply [T]: |- B, or even T |- B, and say that B is a theorem of theory T. Of course, real complicated theorems almost never can be proved by using axioms directly. Still, we can retain our simple formal definition of the notion of theorem because of the following

    Corollary 1.3.1. If there is a proof of [T]: A1, A2, ..., An |- B, and proofs of [T]: |- A1, [T]: |- A2, ..., [T]: |- An, then there is a proof of [T]: |- B.

    Proof. Immediately, by Theorem 1.3.1.

    Consistency

    Sometimes, a seemingly plausible set of axioms allows deriving contradictions (the most striking example - Russell's paradox in the "naive" set theory). A formula F is called a contradiction in the theory T, iff [T]: |- F and [T]: |- ~F, i.e. if T proves and disproves F simultaneously. Theories containing contradictions are called inconsistent theories. Thus, T is called a consistent theory, iff T does not allow deriving contradictions.

    Normally, for a first order theory, the set of all theorems is infinite, and, therefore, consistency cannot be verified empirically. We may only hope to establish this desirable property by means of some theoretical proof (click here for a more detailed discussion of this problem).

    For theories adopting the above logical axioms, inconsistency is, in a sense, "the worst possible property". Indeed, the axiom L10: ~B->(B->C) says that if a theory allows deriving a contradiction, then, in this theory, anything is provable. In Section 2.4 we will - without L10 - prove 50% of it: ~B->(B->~C). Thus, even without L10: if a theory allows deriving a contradiction, then, in this theory, anything is disprovable.

    Is consistency enough for a theory to be "perfect"? In Section 4.3 we will prove the so-called Model Existence Theorem: if a first order theory is consistent, then there is a "model" (a kind of "mathematical reality") where all its axioms and theorems are "true".

    Completeness

    Recall that closed formulas - formulas that do not contain free occurrences of variables - represent "definite assertions about objects of theory". They are expected to be either "true", or "false". Or, in a first order theory, they are expected to be either provable, or disprovable. For example, the closed formula

    AwEx( w<x & x is a prime number)

    (stating that "there are infinitely many prime numbers") is provable - in the first order arithmetic.

    T is called a complete theory, iff for each closed formula F in the language of T: [T]: |- F or [T]: |- ~F, i.e. iff T proves or disproves any closed formula of its language. Thus, in an incomplete theory, some closed formulas ("definite assertions about objects of theory") can be neither proved, nor disproved.

    Formally, according to this definition, an inconsistent theory is complete. Indeed, the axiom L10: ~B->(B->C) says that if a theory allows deriving a contradiction, then, in this theory, anything is provable, i.e. it is complete. In Section 2.4 we will - without L10 - prove 50% of it: ~B->(B->~C). Thus, even without L10: if a theory allows deriving a contradiction, then, in this theory, anything is disprovable, i.e. it is complete.

    Of course, if T would be both consistent and complete, then we would call it "absolutely perfect". Unfortunately, Goedel's incompleteness theorem says that fundamental mathematical theories are either inconsistent or incomplete (see Mendelson [1997] or click here), i.e. none of them is "perfect".

    Exercise 1.3.1. Re-formulate the above axiom system for a many-sorted first order language (or, see Chapter 10. Many-Sorted First Order Logic, by Jean Gallier.)

     

    1.4. The Flavor of Proving Directly

    Theorem 1.4.1. [L1, L2, MP]: |- A->A for any formula A (reflexivity property of implication).

    The following sequence of formulas represents a proof of the formula A->A.

    (1) (A->((C->A)->A))->((A->(C->A))->(A->A)) It's the axiom schema L2: (B->(C->D))->((B->C)->(B->D)), with B = A, C = C->A, D = A.
    (2) A->((C->A)->A) It's the axiom schema L1: B->(C->B), with B = A, C = C->A.
    (3) (A->(C->A))->(A->A) It follows from (1) and (2) by Modus Ponens.
    (4) A->(C->A) It's the axiom schema L1: B->(C->B), with B = A, C = C.
    (5) A->A It follows from (3) and (4) by Modus Ponens.

    As you can see, the proof is easy to verify, but it could be hard to build it from scratch. "Why" should we take "the axiom L2 with B = A, C = C->A, D = A" for (1)? No answer.

    Theorem 1.4.2. For arbitrary formulas A, B, C, [L1, L2, MP]: A->B, B->C |- A->C. What does it mean? It's the so-called Law of Syllogism (by Aristotle), or the transitivity property of implication.

    The following sequence of formulas represents a proof of the formula A->C from the hypotheses A->B and B->C.

    (1) A->B Hypothesis.
    (2) B->C Hypothesis.
    (3) (A->(B->C))->((A->B)->(A->C)) It's the axiom schema L2: (B->(C->D))->((B->C)->(B->D)), with B = A, C = B, D = C.
    (4) (B->C)->(A->(B->C)) It's the axiom schema L1: B->(C->B), with B = B->C, C = A.
    (5) A->(B->C) It follows from (2) and (4) by Modus Ponens.
    (6) (A->B)->(A->C) It follows from (3) and (5) by Modus Ponens.
    (7) A->C It follows from (1) and (6) by Modus Ponens.

    Note. Only the axiom schemas L1 and L2, and the inference rule Modus Ponens are used in proving Theorems 1.4.1 and 1.4.2. Hence, these theorems will remain valid for any logical system containing L1, L2 and Modus Ponens.

    Exercise 1.4.1. Build sequences of formulas representing the following proofs (only axiom schemas L1 and L2 and Modus Ponens are necessary):

    a) [L1, MP]: A |- B->A (a sequence of 3 formulas). What does it mean?

    b) [L2, MP]: A->B, A->(B->C) |- A->C (a sequence of 5 formulas). What does it mean?

    c) [L1, L2, MP]: A->(B->C) |- B->(A->C) (a sequence of 9 formulas). What does it mean? It's the so-called Premise Permutation Law. (Hint: use Theorem 1.4.2).

    d) [L1, L2, MP]: A->(A->B) |- A->B (a sequence of 9 formulas). What does it mean? (Hint: use Theorem 1.4.1).

    Theorem 1.4.3. [L14, MP, Gen] If F is any formula, and G is any formula that does not contain x as a free variable, then

    G->F(x) |- G->AxF(x).

    The following sequence of formulas represents a proof of the formula G->AxF(x) from the hypothesis G->F(x).

    (1) G->F(x) Hypothesis.
    (2) Ax(G->F(x)) It follows from (1) by Generalization.
    (3) Ax(G->F(x))->(G->AxF(x)) It's the axiom schema L14 (G does not contain x as a free variable).
    (4) G->AxF(x) It follows from (2) and (3) by Modus Ponens.

    Exercise 1.4.2. Build sequences of formulas representing the following proofs (F is any formula, and G is a formula that does not contain x as a free variable):

    a) [L15, MP, Gen]: F(x)->G |- ExF(x)->G (a sequence of 4 formulas). What does it mean?

    b) [L3-L5, MP]: A&B |- B&A (a sequence of 8 formulas). What does it mean?

    c) [L6-L8, MP]: |- AvB->BvA (a sequence of 5 formulas). What does it mean?

    d) [L1, L9, MP]: B, ~B |- ~C (a sequence of 9 formulas). What does it mean? It's 50% of the axiom L10!

    e) [L3, L4, L9, MP]: |- ~(A&~A) (a sequence of 5 formulas). What does it mean? It's the so-called Law of Non-Contradiction.

     

    1.5. Deduction Theorems

    If, assuming B as a hypothesis, we have proved C, then we have proved that B implies C. This "trivial" way of reasoning is formalized in the so-called deduction theorems (introduced by Jacques Herbrand and Alfred Tarski):

    J.Herbrand. Recherches sur la théorie de la démonstration. PhD Thesis, University of Paris, 1930 (approved in April 1929).

    A.Tarski. Ueber einige fundamentale Begriffe der Metamathematik. "Comptes Rendus de Séances de la Société des Sciences et des Lettres de Varsovie, Classe III", 1930, Vol.23, pp. 22--29.

    We will prove two such theorems.

    Theorem 1.5.1 (Deduction Theorem 1). If there is a proof [L1, L2, MP]: A1, A2, ..., An, B |- C, then there is a proof [L1, L2, MP]: A1, A2, ..., An |- B->C. (I.e. having a Modus Ponens proof of C from the hypotheses A1, A2, ..., An, B, we can build a Modus Ponens proof of B->C from the hypotheses A1, A2, ..., An.)

    Exercise 1.5.1. (For smart students) Do not read the proof below. Try proving it yourself.

    Proof (thanks to Sergey Kozlovich for the idea, see also Kleene [1967], Exercise 10C). We must define a procedure allowing us to convert any proof [L1, L2, MP]: A1, A2, ..., An, B |- C into a proof [L1, L2, MP]: A1, A2, ..., An |- B->C.

    The easy way to do this would be using induction on the number of formulas in the proof [L1, L2, MP]: A1, A2, ..., An, B |- C. But we will use a more elegant idea: we will replace each formula F of the proof [L1, L2, MP]: A1, A2, ..., An, B |- C by 3 or 5 formulas, one of them being the formula B->F.

    We must consider the following cases:

    1) F is an axiom (i.e. an instance A of L1, or L2). Replace F by 3 formulas: F, F->(B->F), B->F. The second formula is an instance of L1, the third formula is obtained from the first two ones by using Modus Ponens.

    2) F is one of the hypotheses Ai. Replace F by 3 formulas: F, F->(B->F), B->F. The second formula is an instance of L1, the third formula is obtained from the first two ones by using Modus Ponens.

    3) F is B. Replace F by the 5 formulas from the proof of Theorem 1.4.1, where D is any formula:

    (B->((D->B)->B))->((B->(D->B))->(B->B)) (an instance of L2),
    B->((D->B)->B) (an instance of L1),
    (B->(D->B))->(B->B) (by Modus Ponens),
    B->(D->B) (an instance of L1),
    B->B (by Modus Ponens).

    The last formula is, of course, B->F.

    4) F is derived from some previous formulas Fi and Fj by Modus Ponens, Fi having the form Fj->F (i.e. Fj->F and Fj yield F by Modus Ponens). Then, the formulas B->Fj and B->(Fj->F) are already present in the converted proof (they appeared during the replacement operations applied to the formulas Fj and Fj->F). Replace F by 3 formulas:

    (B->(Fj->F))->((B->Fj)->(B->F)) (an instance of L2),
    (B->Fj)->(B->F) (by Modus Ponens),
    B->F (by Modus Ponens).

    Thus, what we have now is a correct proof in [L1, L2, MP] that is using the hypotheses A1, A2, ..., An, but not B! The last formula of this proof is B->C (because C is the last formula of our initial proof [L1, L2, MP]: A1, A2, ..., An, B |- C). Thus, we have a proof [L1, L2, MP]: A1, A2, ..., An |- B->C. Q.E.D.
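    The replacement procedure of this proof can be written out in full. A sketch (in Python; our own representation: proof lines are (formula, justification) pairs, where the justification ('MP', i, j) means that line j is (line i)->(this formula); the names imp and move_hypothesis are ours):

```python
def imp(a, b):
    return ('->', a, b)

# Convert a proof of C from hypotheses A1..An, B into a proof of B->C
# from A1..An alone: each old line F becomes 3 or 5 new lines, one of
# them being B->F. 'where' records the position of B->F for MP steps.
def move_hypothesis(lines, B):
    out, where = [], {}
    for n, (F, why) in enumerate(lines, start=1):
        base = len(out)
        if F == B:                       # case 3: 5-line proof of B->B (D := B)
            D = B
            out += [
                (imp(imp(B, imp(imp(D, B), B)),
                     imp(imp(B, imp(D, B)), imp(B, B))), 'axiom'),  # L2
                (imp(B, imp(imp(D, B), B)), 'axiom'),               # L1
                (imp(imp(B, imp(D, B)), imp(B, B)), ('MP', base + 2, base + 1)),
                (imp(B, imp(D, B)), 'axiom'),                       # L1
                (imp(B, B), ('MP', base + 4, base + 3)),
            ]
        elif why in ('axiom', 'hyp'):    # cases 1-2: F, F->(B->F), B->F
            out += [
                (F, why),
                (imp(F, imp(B, F)), 'axiom'),                       # L1
                (imp(B, F), ('MP', base + 1, base + 2)),
            ]
        else:                            # case 4: F obtained by Modus Ponens
            _, i, j = why
            Fj = lines[i - 1][0]         # minor premise of the old MP step
            out += [
                (imp(imp(B, imp(Fj, F)),
                     imp(imp(B, Fj), imp(B, F))), 'axiom'),         # L2
                (imp(imp(B, Fj), imp(B, F)), ('MP', where[j], base + 1)),
                (imp(B, F), ('MP', where[i], base + 2)),
            ]
        where[n] = len(out)
    return out

# Proof of C from the hypotheses A, A->C; move A out:
proof = [('A', 'hyp'), (('->', 'A', 'C'), 'hyp'), ('C', ('MP', 1, 2))]
moved = move_hypothesis(proof, 'A')
assert moved[-1][0] == ('->', 'A', 'C')   # the result proves A->C
```

    Applying the procedure to this 3-line proof yields an 11-line proof of A->C in which A is no longer used as a hypothesis - exactly the 3-or-5-line replacement described above.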

    Corollaries 1.5.1. 1) If there is a proof [L1, L2, MP]: A1, A2, ..., An, B1, B2, ..., Bk |- C, then there is a proof [L1, L2, MP]: A1, A2, ..., An |- (B1->(B2->(...->(Bk->C)...))).

    2) If T is a theory whose axioms include the schemas L1, L2, then, if there is a proof [T, MP]: A1, A2, ..., An, B |- C, then there is a proof [T, MP]: A1, A2, ..., An |- B->C. In particular, if [T, MP]: B |- C, then [T, MP]: |- B->C.

    Proof. 1) By iterating the Deduction Theorem 1.

    2) In the proof [T, MP]: A1, A2, ..., An, B |- C, the axioms of T, other than L1, L2, can be regarded as additional hypotheses H1, ..., Hk. Thus, there is a proof [L1, L2, MP]: A1, A2, ..., An, H1, ..., Hk, B |- C. By Deduction Theorem 1, there is then a proof [L1, L2, MP]: A1, A2, ..., An, H1, ..., Hk |- B->C. This proof can be regarded also as a proof [T, MP]: A1, A2, ..., An |- B->C. Q.E.D.

    Exercise 1.5.2. In earlier versions of logical axioms, instead of the axiom L2, the following 3 axioms were in use:
    L21: (A->(A->B))->(A->B),
    L22: (A->(B->C))->(B->(A->C)) (i.e. the Premise Permutation Law),
    L23: (A->B)->((B->C)->(A->C)) (the Law of Syllogism, or the transitivity property of implication).
    Verify that both versions, i.e. [L1, L2, MP] and [L1, L21, L22, L23, MP], are equivalent. (Hint: a) See Section 2.1 to verify that [L1, L2, MP] proves L21, L22, L23. b) Verify that [L1, L21, L22, L23, MP] proves L2 either directly, or by proving the Deduction Theorem 1 for [L1, L21, L22, L23, MP].)

    Exercise 1.5.3 (thanks to Sergey Kozlovich for the idea).

    a) Prove the following "generalization" of the Modus Ponens rule:

    [L1, L2, MP]: D1->(D2->...(Dk->B)...), D1->(D2->...(Dk->(B->C))...) |- D1->(D2->...(Dk->C)...).

    b) Prove the following "generalization" of the axiom L14 (formulas D1, D2, ..., Dk do not contain x as a free variable):

    [L1, L2, L14, MP]: |- Ax(D1->(D2->...(Dk->F(x))...)) -> (D1->(D2->...(Dk->AxF(x))...)).

    Warning! Generalization involved...

    Now, what if, in the proof of A1, A2, ..., An, B |- C, not only Modus Ponens but also Generalization is used?

    We must be careful, because, trying "simply" to apply Deduction Theorem 1, we can obtain crazy results. Indeed, having a formula F(x), by Generalization, we obtain the formula AxF(x). Thus, F(x) |- AxF(x). If Deduction Theorem 1 could be extended to Gen without any restrictions, then we could conclude that |- F(x)->AxF(x). If this is true for any x, it is true also for x=2, hence, |- F(2)->AxF(x). Thus, if the number 2 is prime, then all numbers are prime?
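    The warning can be checked semantically: with F(x) = "x is prime" over the natural numbers, F(2) is true but AxF(x) is false, so the "derived" formula F(2)->AxF(x) is false, and hence must not be provable in a sound calculus. A finite-domain sketch (the domain 0..19 and the helper name is_prime are our own choices):

```python
# F(2)->AxF(x) fails when F(x) means "x is prime": the antecedent holds,
# the universally quantified consequent does not (e.g. x = 4).

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, n))

antecedent = is_prime(2)                          # F(2): True
consequent = all(is_prime(x) for x in range(20))  # AxF(x): False
implication = (not antecedent) or consequent      # truth table of ->

print(implication)  # False: so |- F(2)->AxF(x) must not be derivable
```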

    So, let us try deriving a restricted formulation of the Deduction Theorem - we should prohibit application of Gen to the free variables of the hypothesis B "to be moved".

    Theorem 1.5.2 (Deduction Theorem 2). If there is a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An, B |- C, where Generalization is not applied to the free variables of B, then there is a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An |- B->C.

    Proof. We must extend the above proof of the Deduction Theorem 1 that consisted of 4 cases. First, we must extend the first case:

    1') F is an axiom (i.e. an instance of L1, L2, or L14). Replace F by 3 formulas: F, F->(B->F), B->F. The second formula is an instance of L1; the third formula is obtained from the first two by Modus Ponens.

    And we must add the following case:

    5) F is derived from some previous formula Fi by Generalization, thus, F having the form AxFi, where x is not free in the formula B. Replace F by the following 3 formulas:

    Ax(B->Fi)->(B->AxFi),
    Ax(B->Fi),
    B->AxFi.

    Since x is not free in B, the first formula is an instance of L14. The second formula is obtained by Generalization from the formula B->Fi that is already present in the converted proof (it appeared during the replacement operation applied to the formula Fi). The third formula is obtained from the first two by Modus Ponens.

    Thus, what we have now is a correct proof in [L1, L2, L14, MP, Gen] that uses the hypotheses A1, A2, ..., An, but not B! The last formula of this proof is B->C (because C is the last formula of our initial proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An, B |- C). Thus, we have a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An |- B->C. Q.E.D.
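    The new case 5) can also be written out as a small formula transformer. This is a sketch with our own encoding: quantified formulas are tuples ('A', x, body) for Ax(body), implications are ('->', X, Y), and the helper names are hypothetical.

```python
# Case 5) of Deduction Theorem 2: the Generalization step F = AxFi is
# replaced by three formulas, assuming x is not free in B.

def imp(a, b):
    return ('->', a, b)

def forall(x, body):
    return ('A', x, body)

def replace_gen_step(B, x, Fi):
    """Return the 3 formulas replacing F = AxFi (x not free in B)."""
    return [
        imp(forall(x, imp(B, Fi)), imp(B, forall(x, Fi))),  # instance of L14
        forall(x, imp(B, Fi)),                              # Gen applied to B->Fi
        imp(B, forall(x, Fi)),                              # Modus Ponens
    ]

print(replace_gen_step('B', 'x', 'F(x)')[-1])
# ('->', 'B', ('A', 'x', 'F(x)')), i.e. B->AxF(x)
```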

    Corollaries 1.5.2. 1) If there is a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An, B1, B2, ..., Bk |- C, where Generalization is not applied to the free variables of the formulas B1, B2, ..., Bk, then there is a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An |- (B1->(B2->(...->(Bk->C)...))).

    2) If B is a closed formula, and there is a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An, B |- C, then there is a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An |- B->C.

    3) If T is a theory whose axioms include the schemas L1, L2, L14, and there is a proof [T, MP, Gen]: A1, A2, ..., An, B |- C, where Generalization is not applied to the free variables of B, then there is a proof [T, MP, Gen]: A1, A2, ..., An |- B->C. In particular, if [T, MP, Gen]: B |- C, where Generalization is not applied to the free variables of B, then [T, MP, Gen]: |- B->C.

    Proof. Similar to the proof of the above Corollaries of Deduction Theorem 1.

    Exercise 1.5.4. (For smart students) Investigate the size (the number of formulas) of the proof [L1, L2, MP]: A1, A2, ..., An |- B->C as a function f(m) of the size m of the proof [L1, L2, MP]: A1, A2, ..., An, B |- C. You may wish to report your result. We will publish your report on the web as an appendix to this book. (The current record holder is Sergey Kozlovich, 2004: f(m) <= 3m+2.)

    Exercise 1.5.5. (For smart students) Investigate the size (the number of instances of atomic formulas) of the proof [L1, L2, MP]: A1, A2, ..., An |- B->C as a function g(m) of the size m of the proof [L1, L2, MP]: A1, A2, ..., An, B |- C. You may wish to report your result. We will publish your report on the web as an appendix to this book.

    Warning! Previously proved theorems involved...

    In real mathematical practice, when proving [T, MP, Gen]: A1, A2, ..., An |- C, we may wish to apply some theorem Q that we have already proved earlier. If we simply inserted Q into our formal proof, then, formally, this would yield only that [T, MP, Gen]: A1, A2, ..., An, Q |- C. To obtain the desired formal proof [T, MP, Gen]: A1, A2, ..., An |- C, we must insert not only Q itself, but the entire proof of Q!

    Still, with the Deduction Theorem 2 this may be problematic. If we are proving [T, MP, Gen]: A1, A2, ..., An, B |- C with the intention to apply Deduction Theorem 2 (to obtain [T, MP, Gen]: A1, A2, ..., An |- B->C), then, before inserting the proof of Q, we must ensure that, in this proof, Generalization is not applied to the free variables of B. But, of course, the original proof of Q could contain such Generalizations! To solve this problem, we could try, before inserting the proof of Q, to rename simultaneously all the variables to which Generalization is applied and which occur free in B. But this simultaneous renaming may affect the bound variables of Q, and thus destroy the intended use of Q...

    The problem is solved completely by the following extension of the Deduction Theorem 2:

    Theorem 1.5.3 (Deduction Theorem 2A). If there is a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An, B |- C, where, after B appears in the proof, Generalization is not applied to the free variables of B, then there is a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An |- B->C.

    Indeed, having such a theorem, we obtain the necessary

    Corollary 1.5.3. If there is a proof [T, MP, Gen]: A1, A2, ..., An, B, Q |- C, where, after B appears in the proof, Generalization is not applied to the free variables of B, and there is a proof [T, MP, Gen]: A1, A2, ..., An |- Q, then there is a proof [T, MP, Gen]: A1, A2, ..., An |- B->C.

    Proof. In the proof [T, MP, Gen]: A1, A2, ..., An, B, Q |- C, first, move the hypotheses A1, A2, ..., An to the beginning. Then, immediately after them, insert the proof [T, MP, Gen]: A1, A2, ..., An |- Q. Now we have a proof [T, MP, Gen]: A1, A2, ..., An, B |- C, where, after B appears in the proof, Generalization is not applied to the free variables of B. By Deduction Theorem 2A, there is then a proof [T, MP, Gen]: A1, A2, ..., An |- B->C. Q.E.D.

    Proof of the Deduction Theorem 2A. Let us modify the above proof of the Deduction Theorem 2.

    We must define a procedure allowing us to convert any allowed proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An, B |- C into a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An |- B->C.

    Unlike the above proof, let us leave unchanged all the formulas of the proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An, B |- C that appear before B. After this, starting with B, we will replace each formula F by 3 or 5 formulas, one of them being the formula B->F.

    We must consider the following cases:

    1), 2), 3) - as in the proof of the Deduction Theorem 1.

    4) F is derived from some previous formulas Fi and Fj by Modus Ponens, Fi having the form Fj->F (i.e. Fj->F and Fj yield F by Modus Ponens). Then, 4 subcases are possible.

    4a) Fj and Fj->F both appear before B, i.e. they remain unchanged in the converted proof. Let us replace F by the following 3 formulas: F, F->(B->F), B->F. The second formula is an instance of L1; the third formula is obtained from the first two by Modus Ponens.

    4b) Fj appears before B, and Fj->F is B or appears after B. Then, the formulas Fj and B->(Fj->F) are already present in the converted proof. Let us replace F by the following 5 formulas:

    (B->(Fj->F))->((B->Fj)->(B->F)) (an instance of L2),
    (B->Fj)->(B->F) (by Modus Ponens),
    Fj->(B->Fj) (an instance of L1),
    B->Fj (by Modus Ponens),
    B->F (by Modus Ponens).

    4c) Fj is B or appears after B, and Fj->F appears before B. Then, the formulas B->Fj and Fj->F are already present in the converted proof. Let us replace F by the following 5 formulas from the proof of Theorem 1.4.2:

    (B->(Fj->F))->((B->Fj)->(B->F)) (an instance of L2),
    (Fj->F)->(B->(Fj->F)) (an instance of L1),
    B->(Fj->F) (by Modus Ponens),
    (B->Fj)->(B->F) (by Modus Ponens),
    B->F (by Modus Ponens).

    4d) Fj and Fj->F both are B or appear after B. Then, the formulas B->Fj and B->(Fj->F) are already present in the converted proof (they appeared during the replacement operations applied to the formulas Fj and Fj->F). Let us replace F by the following 3 formulas:

    (B->(Fj->F))->((B->Fj)->(B->F)) (an instance of L2),
    (B->Fj)->(B->F) (by Modus Ponens),
    B->F (by Modus Ponens).

    5) F is derived from some previous formula Fi by Generalization, thus, F having the form AxFi. Then, 2 subcases are possible.

    5a) Fi appears before B. Then x is not free in B. Let us replace F by the following 3 formulas:

    F (by Generalization, from the unchanged formula Fi),
    F->(B->F) (an instance of L1),
    B->F (by Modus Ponens).

    5b) Fi is B or appears after B. Then x is not free in B, and the formula B->Fi is already present in the converted proof (it appeared during the replacement operation applied to the formula Fi). Let us replace F by the following 3 formulas:

    Ax(B->Fi) (by Generalization, x is not free in B),
    Ax(B->Fi)->(B->AxFi) (an instance of L14, since x is not free in B),
    B->AxFi (by Modus Ponens).

    Thus, what we have now is a correct proof in [L1, L2, L14, MP, Gen] that uses the hypotheses A1, A2, ..., An, but not B! The last formula of this proof is B->C (because C is the last formula of our initial proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An, B |- C). Thus, we have a proof [L1, L2, L14, MP, Gen]: A1, A2, ..., An |- B->C. Q.E.D.

    Exercise 1.5.6. In some other textbooks, a somewhat different system of logical axioms is used: instead of the axioms L14, L15 and the Generalization rule the following two rules of inference are used: G->F(x) |- G->AxF(x) (A-Introduction), F(x)->G |- ExF(x)->G (E-Elimination). Of course, here, G is a formula that does not contain x as a free variable. Verify that both systems are equivalent in all of their versions (minimal, constructive, and classical).
