machine learning, inductive inference, probabilistic, strategy, machine, inductive, learning, inference, Karlis, Podnieks, theorem, estimate, theory, log, bound, upper, lower, logarithmic

INDUCTIVE INFERENCE OF FUNCTIONS BY PROBABILISTIC STRATEGIES

Karlis Podnieks

Institute of Mathematics and Computer Science

University of Latvia

Karlis.Podnieks@mii.lu.lv

Translation of papers [Po 75] and [Po 77-2] published in Russian.
For the results (without proofs) in English see also [FBP 91].

Table of contents
1. Definitions and results
2. Proofs
3. Proofs of Lemmas 2, 3
4. Proof of Lemma 4
5. Exclusion algorithm
6. Tail reduction proof
References

Abstract

The following model of inductive inference is considered. An arbitrary numbering tau = {tau0, tau1, tau2, ... } of total functions N->N is fixed. A "black box" outputs the values f(0), f(1), ..., f(m), ... of some function f from the numbering tau. Processing these values by some algorithm (a strategy) F, we try to identify a tau-index of f (i.e. a number n such that f = taun). Strategy F outputs hypotheses h0, h1, ..., hm, ... If lim hm = n and taun = f, we say that F identifies in the limit a tau-index of f. The complexity of identification is measured by the number of mindchanges, i.e. by Ftau(f) = card{m | hm <> hm+1}. One can easily verify that for any numbering tau there exists a deterministic strategy F such that Ftau(taun) <= n for all n. This estimate is exact (see [Ba 74], [FBP 90]). In the current paper the corresponding exact estimate ln n + o(log n) is proved for probabilistic strategies. This result was published without proof in [Po 75]. Proofs were published in [Po 77-2] (in Russian).

 

1. Definitions and results

A probabilistic strategy is defined by:

a) a probability space (W, B, P), where W is the set of elementary events, B - a Borel field of subsets of W, P - a probability measure over B;

b) a mapping M: N*->Z, where N* is the set of all finite sequences of natural numbers, Z - the set of all random variables over the space (W, B, P) taking their values in N U {oo} (oo means "undefined").

Thus M associates with each elementary event e in W some deterministic strategy Fe. The hypothesis Fe(<f(0),...,f(m)>) takes its values n with fixed probabilities P{Fe(<f(0),...,f(m)>)=n}.

Computable (recursive) probabilistic strategies can be defined by means of probabilistic Turing machines, first introduced in [LMS 56]. Let a random Bernoulli generator of the distribution (1/2, 1/2) be fixed. The generator is connected to the deterministic "apparatus" of a Turing machine. As a result, the operation of the machine becomes probabilistic, and we can speak of the probability that the operation satisfies certain conditions.

Consider the following Turing machine M operating with a fixed Bernoulli generator. With input sequence

f(0), f(1), ..., f(m), ...

this machine prints as output an empty, finite or infinite sequence of natural numbers (hypotheses):

h0, h1, ..., hm, ...,

where hm depends only on the values f(0), ..., f(m). Each infinite realization of the Bernoulli generator's output (i.e. an infinite sequence of 0's and 1's) determines completely the operation of the machine M as a deterministic strategy.

By P{M, tau, f} we denote the probability that a probabilistic strategy M identifies in the limit tau-index of the function f. By P{M, f, <=k} we denote the probability that strategy M makes no more than k mindchanges by the function f.

Let us denote by f[m] the code of <f(0), ..., f(m)>, then the random variable M(<f(0), ..., f(m)>) can be denoted by M(f[m]). By Pm(M, f) we denote the probability that M changes its hypothesis at the step m, i.e. P{M(f[m+1])<>M(f[m])}.

First, let us consider some sufficient condition for P{M, tau, f}=1. We will say that strategy M is tau-consistent on the function f if, for all m,

a) M(f[m]) is always defined (i.e. defined for all events e in W),

b) if M(f[m])=n for some event e in W, then taun(j)=f(j) for all j<=m.

Thus, consistent strategies do not output "explicitly incorrect" hypotheses.

THEOREM 1. For any enumerated class (U, tau) there is a probabilistic strategy M and a constant C>0 such that: a) M always identifies in the limit a tau-index of each function f in U, and b) M changes its mind by the function taun no more than ln n + C*sqrt(ln n)*lnln n times with probability ->1 as n->oo. For a computable numbering tau, a computable probabilistic strategy M can be constructed.

THEOREM 2. For any countable set FI of probabilistic strategies there exists an enumerated class (U, tau) and a constant C>0 such that: for any strategy M in FI which identifies in the limit a tau-index of each function f in U, there is an increasing sequence {nk} such that M changes its mind by the function taunk at least ln nk - C*sqrt(ln nk)*lnln nk times with probability ->1 as k->oo. For the class of all computable probabilistic strategies, a computable numbering tau can be constructed.

The upper bound ln n can be proved by means of a strategy from [BF 72]. Essential difficulties arise, however, not in the construction of the strategy but in its analysis.

The main idea is as follows. Let us suppose that the function f in the "black box" which outputs the values f(0), f(1), f(2), ..., is chosen at random from the numbering tau, according to some probability distribution pi={pin} (pin is the probability that f=taun, pi1+pi2+pi3+... =1). Then, having received the values f(0), ..., f(m), let us consider the set Em of all tau-indices "suitable" for such f, i.e.

Em = {n | (Aj<=m) taun(j) = f(j)}.

It would be natural to output as a hypothesis any tau-index n in Em with probability pin / s, where s = Sum{pin |n in Em}.

To put it precisely, we define a probabilistic strategy BFtau,pi as follows. Let tau be any numbering of total functions. Take some probability distribution {pin}, where pin>0 for all n, and pi1+pi2+pi3+... =1.

If the set E0 = {n | taun(0)=f(0)} is empty, then we put BFtau,pi(f[0]) undefined with probability 1. If E0 is non-empty, we put BFtau,pi(f[0])=n with probability pin/s for every n in E0, where s = Sum{pin | n in E0}.

Let us assume now that the hypotheses BFtau,pi(f[j]) have already been determined for j<m, and BFtau,pi(f[m-1])=p. If p is "undefined", then we set BFtau,pi(f[m]) undefined with probability 1. Else, if taup(m)=f(m) (i.e. the hypothesis p is correct also for the next argument m), we set BFtau,pi(f[m])=p with probability 1.

Suppose now that taup(m)<>f(m). Let us take the set of all appropriate (for the present) hypotheses, i.e.

Em = {n | (Aj<=m) taun(j) = f(j)}.

If Em is empty, then we put BFtau,pi(f[m]) undefined with probability 1. If Em is non-empty, we put BFtau,pi(f[m])=n with probability pin/s for every n in Em, where s = Sum{pin | n in Em}.

Of course, BFtau,pi always identifies in the limit tau-index of each function f in U.
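The definition above translates directly into a simulation. Below is a minimal Python sketch of one run of BFtau,pi over a finite toy numbering; the particular family tau_n(x) = 1 if x < n else 0 and the uniform distribution are illustrative assumptions, not taken from the paper:

```python
import random

def bf_strategy(tau, pi, f, steps, rng=random):
    """One probabilistic run of BF_{tau,pi}: read f(0..steps-1), emit hypotheses."""
    hyps, current = [], None
    for m in range(steps):
        if m > 0 and current is None:
            hyps.append(None)          # once "undefined", stay undefined
            continue
        if current is not None and tau[current](m) == f(m):
            hyps.append(current)       # keep the hypothesis while it still fits
            continue
        # E_m = {n | tau_n agrees with f on arguments 0..m}
        live = [n for n in range(len(tau))
                if all(tau[n](j) == f(j) for j in range(m + 1))]
        if not live:
            current = None
        else:
            s = sum(pi[n] for n in live)
            r, acc = rng.random() * s, 0.0     # draw n with probability pi_n / s
            for n in live:
                acc += pi[n]
                if r < acc:
                    current = n
                    break
            else:
                current = live[-1]             # guard against float rounding
        hyps.append(current)
    return hyps

# Toy numbering: tau_n(x) = 1 for x < n, 0 afterwards; here f = tau_3.
tau = [(lambda x, n=n: 1 if x < n else 0) for n in range(6)]
hyps = bf_strategy(tau, [1.0 / 6] * 6, tau[3], 6)
print(hyps)  # the hypotheses stabilize at 3 once f(3)=0 is seen
```

Note that the run is consistent in the sense of Section 1: every hypothesis it outputs agrees with all values of f seen so far, so once E_m shrinks to the single correct index, the hypothesis is final.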

 

2. Proofs

LEMMA 1. Let {Xm} be a sequence of independent random variables taking values in {0,1} such that: a) Summ Xm is always finite, b) E = Summ P{Xm=1} is finite. Then for any number t>0:

P{ |Summ Xm - E| >= t*sqrt(E) } <= 1/t^2.

PROOF. Immediate, by Chebyshev's inequality.

Lemma 1 allows deriving upper and lower bounds of the number of mindchanges from the corresponding bounds of Summ Pm(M, f).

LEMMA 2. For all n,

Summ{ Pm(BFtau,pi, taun) } <= ln(1/pin).

LEMMA 3. Let a function f of the numbering tau be fixed. Then the following events are independent:

Am = { BFtau,pi(f[m]) <> BFtau,pi(f[m+1]) }, m = 0, 1, 2, ...

It is curious that the events Am (i.e. "at the m-th step BFtau,pi changes its mind") do not display any striking indications of independence; nevertheless, they do satisfy the formal independence criterion.

If we take

pi'n = c / (n * (ln n)^2),

with the convention that 1/0=1 and ln 0 = 1, then by Lemma 2 the sum of the probabilities of BFtau,pi' changing its hypothesis by the function taun will be less than ln n + 2*lnln n - ln c. Lemma 3 and Lemma 1 (take t = lnln n) allow us to derive from this that, as n->oo, with probability ->1, the number of mindchanges of BFtau,pi' by taun does not exceed ln n + C*sqrt(ln n)*lnln n, thus proving Theorem 1. For proofs of Lemmas 2, 3 see Section 3.
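As a quick numeric sanity check (a sketch only: the cutoff N and the sample index n are arbitrary choices, and the small arguments covered by the paper's conventions are skipped), one can approximate the normalizing constant c and confirm that ln(1/pi'n) = ln n + 2*lnln n - ln c:

```python
import math

# The series Sum 1/(n (ln n)^2) converges, so a large finite cutoff
# gives a fair approximation of the normalizing constant c.
N = 10 ** 6
c = 1.0 / sum(1.0 / (n * math.log(n) ** 2) for n in range(2, N))

n = 1000
ln_inv_pi = math.log(n * math.log(n) ** 2 / c)   # ln(1/pi'_n)
bound = math.log(n) + 2 * math.log(math.log(n)) - math.log(c)
print(abs(ln_inv_pi - bound))  # agrees up to floating-point error
```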

It is easy to verify that if the numbering tau is computable, then the strategy BFtau,pi' can be modified so that it becomes computable (see Section 3).

The lower bound ln n of Theorem 2 is proved by diagonalization. Let {Mi} be some numbering of strategies of the set FI. In the numbering tau to be constructed, for each strategy Mi an infinite sequence of blocks Bij is built in, each block consisting of a finite number of functions taun. If Mi identifies (with probability 1) tau-indices of all functions of Bij, then by some function of this block Mi will change its mind sufficiently often.

To construct an individual block Bij the following Lemma 4 will be used.

Let {Zj} be a sequence of independent random variables taking values 0,1 in such a way that

P{Zj = 1} = 1/j, P{Zj = 0}=1- 1/j.

Thus,

P{Z2=1}+ ... +P{Zn=1}= 1/2+...+1/n = ln n+O(1).

Let us take t = lnln n in Lemma 1, then for some C>0, as n->oo,

P{ Z2+...+Zn >= ln n - C*sqrt(ln n)*lnln n } -> 1.
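This concentration claim is easy to probe by simulation; the sketch below (the values of n, the number of trials, and the seed are arbitrary choices) estimates the probability that Z2+...+Zn reaches the threshold ln n - sqrt(ln n)*lnln n:

```python
import math
import random

def z_sum(n, rng):
    # Z_j = 1 with probability 1/j, independently, for j = 2..n
    return sum(1 for j in range(2, n + 1) if rng.random() < 1.0 / j)

rng = random.Random(0)
n, trials = 10_000, 200
threshold = math.log(n) - math.sqrt(math.log(n)) * math.log(math.log(n))
hits = sum(z_sum(n, rng) >= threshold for _ in range(trials))
print(hits / trials)  # already close to 1 at this moderate n
```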

Now one can understand easily the significance of

LEMMA 4. Let M be a probabilistic strategy, k and n - natural numbers with k<n, e>0 - a rational number, gamma - a binary string. Then there is a set of n functions s1, ..., sn such that: a) each function sj has gamma as initial segment, and b) if M identifies with probability 1 s-indices of all functions sj, then by one of these functions M changes its mind >=k times with probability

>=(1-e) P{ Z2+...+Zn >=k }.

If M is a recursive strategy, then the functions s1, ..., sn can be constructed effectively.

Now, to prove Theorem 2, we derive the block Bij from the set of functions of Lemma 4 for

M = Mi, n = 2^j, k = ]j*ln 2 - sqrt(j)*ln j[, e = 2^(-j), gamma = 0...01

with <i, j> zeros in gamma, where <i, j> is Cantor's number of the pair (i, j).

Let us introduce a special coding of triples (i, j, s) such that 0 <= s < 2^j (see [BF 74]). The code of (i, j, s) is defined as the binary number

100....0at...a0100....0          ----------(*)
 |----- j -----| |-- i --|

where at...a0 is the binary notation of the number s: the leading 1 is followed by j bits encoding s (padded with leading zeros), then a single 1, then i zeros. Clearly,

2^(i+j+1) + 2^i <= code(i, j, s) <= 2^(i+j+2) - 2^i.
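The layout (*) and its bounds can be checked mechanically. The sketch below (the function name code is an illustrative choice; it reads the middle field as s padded to j bits, so 0 <= s < 2^j) reproduces the inequality above:

```python
def code(i, j, s):
    # (*): a leading 1, then s written in j binary digits, then 1, then i zeros
    assert 0 <= s < 2 ** j
    return (((1 << j) | s) << (i + 1)) | (1 << i)

# Every admissible s stays within the stated bounds; the extreme values
# s = 0 and s = 2^j - 1 hit the lower and upper bound exactly.
for i in range(1, 5):
    for j in range(1, 6):
        for s in range(2 ** j):
            n = code(i, j, s)
            assert 2 ** (i + j + 1) + 2 ** i <= n <= 2 ** (i + j + 2) - 2 ** i
print("bounds verified")
```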

Now the numbering tau can be defined as follows. If n is the code (*) of some triple (i, j, s), 0 <= s < 2^j, then taun is the s-th function of the block Bij. Otherwise taun is the function identically equal to zero.

Let a probabilistic strategy M in FI identify with probability 1 tau-indices of all functions of the numbering tau. Then M=Mi for some i, and for each j>1 the block Bij contains a function by which M changes its mind at least ]j*ln 2 - sqrt(j) ln j[ times with probability

>= (1 - 2^(-j)) P{ Z2+...+Zn >= ]j*ln 2 - sqrt(j)*ln j[ }, where n = 2^j. -------------(**)

Let us denote by nj the (unique) tau-index of this "bad" function. Obviously,

2^(i+j+1) < nj < 2^(i+j+2).

Hence, {nj} is an increasing sequence, and

log2 nj - i - 2 < j < log2 nj - i - 1.

By the function taunj the strategy M changes its mind at least

j*ln 2 - sqrt(j)*ln j > ln nj - C*sqrt(ln nj)*lnln nj

times with probability (**) which ->1 as j->oo, thus proving Theorem 2.

For a proof of Lemma 4 see Section 4 and Section 5.

 

3. Proofs of Lemmas 2, 3

4. Proof of Lemma 4

5. Exclusion algorithm

6. Tail reduction proof

References

 
