

3. Proofs of Lemmas 2, 3

 

LEMMA 5. Let L be a finite (possibly empty) set of natural numbers, and let K(L) be the set of all tuples (i1, ..., is) such that i1 < i2 < ... < is and ik in L for all k <= s (the empty tuple included). Then for an arbitrary sequence of reals {xn} we have

Sumi{ (xi1-1)...(xis-1) | i in K(L) }= Prodj{ xj | j in L },

where Sumi ranges over all tuples i in K(L).

PROOF. Immediate, by induction on the number of elements of L: the left-hand side is just the expansion of the product Prod{ (xj-1)+1 | j in L }.
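(For illustration only, not part of the proof: the identity of Lemma 5 is easy to check numerically. The following Python fragment enumerates K(L) for a small set L and compares both sides; the function name check_lemma5 is just an ad hoc name for this sketch.)

    from itertools import combinations
    import random

    def check_lemma5(L, x):
        # left-hand side: sum over all increasing tuples with members in L
        # (the empty tuple included) of the products (x_i1 - 1)...(x_is - 1)
        lhs = 0.0
        for s in range(len(L) + 1):
            for tup in combinations(sorted(L), s):
                term = 1.0
                for i in tup:
                    term *= x[i] - 1
                lhs += term
        # right-hand side: the product of x_j over j in L
        rhs = 1.0
        for j in L:
            rhs *= x[j]
        return abs(lhs - rhs) < 1e-9

    L = {0, 2, 3}
    x = {i: random.uniform(-2.0, 2.0) for i in L}
    print(check_lemma5(L, x))   # expected output: True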

PROOF OF LEMMA 3. Let us consider the tree of functions of the numbering tau which coincide up to some moment with the function f:

 ---> b0     ---> b1     ---> b2     ---> b3              ---> bm
 |           |           |           |                    |
 |           |           |           |                    |
 +-----------+-----------+-----------+------- ... --------+--------- ... ------>  a    (f)
 S0          S1          S2          S3                   Sm

The infinite path drawn here corresponds to the function f (which may have more than one tau-index, by the way). The outgoing arrows correspond to functions taun declining from f. With each vertex of the tree we associate the probability that a function taun chosen according to the distribution pi meets this vertex. Let bm be the probability that taun meets the vertex Sm and immediately after that declines from f. Let

Bm = bm + bm+1 +...

Then the probability assigned to Sm will be a+Bm, where a is the probability assigned to the infinite path f (i.e. a = Sumn{pin |taun=f}).

By bij (i>=0, j>=i) we denote the probability that, in the case of changing its mind at the vertex Si, the strategy BFtau,pi directs its new hypothesis through Sj and then aside from f. Clearly,

bij = bj/(a+Bi). ----------------(*)

The total probability of changing its mind at Sm, m>0, can be expressed in terms of the numbers bij:

P{Am} = Sum{ b0,i1*bi1+1,i2*...*bik+1,m },

where Sum ranges over all tuples (i1, ..., ik) such that k>=0, 0<=i1< i2 < ... < ik < m.

The probability of simultaneous mindchanges at Sm1, ..., Smt (where m1 < m2 < ... < mt ) can be expressed similarly:

P{Am1 ^ ... ^ Amt} = Sum{ b0,i1*bi1+1,i2*...*bik+1,mt },

where Sum ranges over all tuples (i1, ..., ik) such that k>=0, 0<=i1< i2 < ... < ik < mt and {m1, ..., mt-1} is a subset of {i1, ..., ik}.

By (*), the probability P{Am1 ^ ... ^ Amt} depends only on a, b0, ..., bm and Bm+1, where m = mt = max mi. Let us introduce new variables gi, 0<=i<=m+1:

a + Bm+1 = a gm+1,

a + Bm = a gm gm+1,      bm = a (gm - 1) gm+1,

a + Bm-1 = a gm-1 gm gm+1,      bm-1 = a (gm-1 - 1) gm gm+1,

a + Bj = a gj gj+1 ... gm+1,      bj = a (gj - 1) gj+1 ... gm+1      (j = 0, 1, ..., m).

(Equivalently, gm+1 = (a+Bm+1)/a and gj = (a+Bj)/(a+Bj+1) for j = 0, ..., m.) Then we will have

bij = bj/(a+Bi) = (gj-1)/(gi ... gj),

b0,i1*bi1+1,i2*...*bik+1,m = (gi1-1)...(gik-1)(gm-1) / (g0 ... gm),

P{Am1 ^ ... ^ Amt} = (gm1-1)...(gmt-1) Sum{ (gi1-1)...(gik-1) } / (g0 ... gm),

where L = {0, 1, ..., m}-{m1, ..., mt}, and Sum ranges over all tuples (i1, ..., ik) in K(L), where K(L) is defined in Lemma 5. By Lemma 5, the latter Sum is equal to Prod{ gi | i in L }, hence,

P{Am1 ^ ... ^ Amt} = (gm1-1)...(gmt-1) Prod{ gi | i in L } / (g0 ... gm) = (gm1-1)/gm1 * ... * (gmt-1)/gmt.

For t=1 we have:

P{Am} = Pm(BFtau,pi, f) = (gm-1)/gm = bm/(a+Bm).

Hence,

P{Am1 ^ ... ^ Amt} = P{Am1}* ... *P{Amt}, i.e. the events Ai are independent. This proves Lemma 3.
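(Again for illustration only: the independence claim can be checked numerically. The sketch below assumes finitely many nonzero probabilities b0, ..., bM; it computes the joint probability by the explicit sum over tuples given above and compares it with the product of the individual probabilities bm/(a+Bm).)

    from itertools import combinations
    import random

    random.seed(1)
    M = 6
    b = [random.random() for _ in range(M + 1)]   # b0, ..., bM; all later b's are 0
    a = random.random()
    total = a + sum(b)                            # normalize so that a + B0 = 1
    a, b = a / total, [x / total for x in b]
    B = [sum(b[m:]) for m in range(M + 1)]        # Bm = bm + bm+1 + ...

    def bij(i, j):
        return b[j] / (a + B[i])                  # formula (*)

    def p_joint(ms):
        # sum over all tuples 0 <= i1 < ... < ik < mt containing {m1, ..., m(t-1)}
        mt, rest = ms[-1], set(ms[:-1])
        s = 0.0
        for k in range(mt + 1):
            for tup in combinations(range(mt), k):
                if rest <= set(tup):
                    prob, prev = 1.0, 0
                    for i in tup + (mt,):
                        prob, prev = prob * bij(prev, i), i + 1
                    s += prob
        return s

    ms = (1, 3, 5)
    product = 1.0
    for m in ms:
        product *= b[m] / (a + B[m])              # P{Am} = bm/(a+Bm)
    print(abs(p_joint(ms) - product) < 1e-12)     # expected output: True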

PROOF OF LEMMA 2. As we already know,

Pm(BFtau,pi, f) = bm/(a+Bm) = 1 - (a+Bm+1)/(a+Bm).

Summing up for all m, and using the inequality 1-x <= ln(1/x) we obtain that

Summ{ bm/(a+Bm) } <= ln Prodm{ (a+Bm)/(a+Bm+1) }.

Since

Prodm<=s{ (a+Bm)/(a+Bm+1) } = (a+B0)/(a+Bs+1) -> (a+B0)/a, as s->oo,

we obtain that

Summ{ Pm(BFtau,pi, f) } <= ln((a+B0)/a).

If f=taun, then a>=pin. Clearly, a+B0<=1, hence,

ln((a+B0)/a) <= ln (1/pin).

Q.E.D.
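(A numerical illustration of the above chain of inequalities, under the same finiteness assumption as before: for randomly chosen a and b0, ..., bM the sum of the probabilities bm/(a+Bm) never exceeds ln((a+B0)/a).)

    import math, random

    for _ in range(1000):
        M = random.randint(0, 10)
        b = [random.random() for _ in range(M + 1)]
        a = random.random()
        total = a + sum(b)                        # normalize so that a + B0 = 1
        a, b = a / total, [x / total for x in b]
        B = [sum(b[m:]) for m in range(M + 1)]
        s = sum(b[m] / (a + B[m]) for m in range(M + 1))
        assert s <= math.log((a + B[0]) / a) + 1e-12
    print("bound confirmed on 1000 random instances")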

To complete the proof of Theorem 1 we now show, for a computable numbering tau and a computable probability distribution pi (pi1+pi2+pi3+...=1), how the recursive counterpart BF'tau,pi of the strategy BFtau,pi can be constructed. We will also use a computable sequence of rationals {em} such that Prodm(1+em) < oo (for example, em = 2^-m).

Let us modify the definition of the hypothesis BFtau,pi(f[m]) (see Section 2) as follows. Since the numbering tau is computable, the set Em is recursive; hence we can compute a binary-rational probability distribution (lambdan1, lambdan2, ..., lambdank) which em-approximates the distribution {pin/s | n in Em}, s = Sumn{pin | n in Em}, i.e. lambdan1 + lambdan2 + ... + lambdank = 1, and for all i:

ni in Em & lambdani <= (1+em) pini / s

Now define BF'tau,pi(f[m]) = ni with probability lambdani for all i = 1, ..., k.
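(The text above only states that such a binary-rational distribution can be computed. The following Python sketch shows one possible way to produce it; it assumes, for simplicity of illustration, that the probabilities pin/s are given as exact fractions, whereas in the actual construction one would work with sufficiently precise lower approximations of these computable reals. The function name binary_rational_approx is ad hoc.)

    from fractions import Fraction

    def binary_rational_approx(p, e):
        # Given p1 + ... + pk = 1 and e > 0, return binary rationals lam1, ..., lamk
        # with lam1 + ... + lamk = 1 and lami <= (1 + e) * pi for every i.
        k = len(p)
        N = 0
        while Fraction(k, 2 ** N) > e:            # choose N with k * 2^-N <= e
            N += 1
        step = Fraction(1, 2 ** N)
        # qi: (1+e)*pi rounded down to a multiple of 2^-N; then q1 + ... + qk >= 1
        q = [((1 + e) * pi // step) * step for pi in p]
        lam, remaining = [], Fraction(1)
        for qi in q:                              # greedily distribute the unit mass
            li = min(qi, remaining)
            lam.append(li)
            remaining -= li
        assert remaining == 0 and sum(lam) == 1
        return lam

    print(binary_rational_approx([Fraction(1, 3)] * 3, Fraction(1, 8)))
    # expected output: [Fraction(3, 8), Fraction(3, 8), Fraction(1, 4)]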

LEMMA 6. Let BF'tau,pi be the modified computable probabilistic strategy. Then for all n and k,

P{BF'tau,pi, taun, >=k} <= P{BFtau,pi, taun, >=k}* Prodm(1+em).

PROOF. Let us return to the proof of Lemma 3. For the probabilities b'ij of BF'tau,pi (corresponding to bij of BFtau,pi in Section 2) we have:

b'ij <= (1+ei) bij. -------------(1)

The probability P{BF'tau,pi, f, >=k} can be expressed by b'ij:

P{BF'tau,pi, f, >=k} = Sum{ b'0,i1*b'i1+1,i2* ... *b'ik-1+1,ik },

where Sum ranges over all tuples (i1, ..., ik) such that 0 <= i1 < i2 < ... < ik. Hence, by (1),

P{BF'tau,pi, f, >=k} <= Prodm(1+em) * Sum{ b0,i1*bi1+1,i2* ... *bik-1+1,ik }
<= P{BFtau,pi, f, >=k} * Prodm(1+em).

Since the inequality of Theorem 1 holds for the strategy BFtau,pi, by Lemma 6 it holds also for BF'tau,pi.

 

4. Proof of Lemma 4

Let us carry out the (more complicated) "computable case" of the proof. Let M be a computable probabilistic strategy, let n, k be natural numbers with k < n, let e > 0 be a rational number, and let gamma = g0g1...ga be a binary string. We will construct n functions s1, s2, ..., sn starting with the values from gamma, such that if M identifies with probability 1 the s-indices of all the functions si, then on one of these functions M will change its mind >=k times with probability

>= (1-e) P{ Z2+...+Zn >=k},

where Zi are the random variables defined in Section 2.

Let us consider the idea of the proof for the case n=4, k=2. The generalization is straightforward.

Procedure PM. We initiate the parallel computation of the probabilities of the following events:

M(b0)=t0 & M(b0b1)=t1 & ... & M(b0...bm)=tm -----------------(1)

for all binary strings beta = b0b1...bm and all finite sequences t = t0t1...tm of natural numbers. This can be done as follows. For all pairs (alpha, beta) of binary strings alpha, beta the following parallel computation process is carried out: alpha serves as a finite realization of the Bernoulli generator's output (i.e. a finite sequence of 0's and 1's), and beta as an initial segment f(0), ..., f(m) of some function f taking values 0, 1. Initially, we associate with each pair (beta, t) an empty set of binary strings alpha. When, during the computation process with alpha and beta, we see the sequence t printed on the output tape of M, we add alpha to the set associated with (beta, t). If it appears that alpha is too short for some computation, we simply drop this computation. And so on.

End of procedure PM.
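(A rough sketch, in Python, of the book-keeping behind procedure PM. The interface run_M(beta, alpha, steps) is hypothetical: it is assumed to simulate M on the successive prefixes of the segment beta for at most the given number of steps, using the bit string alpha as the Bernoulli generator's output, and to return the sequence t of hypotheses output on these prefixes, or None if alpha is too short or the step bound does not suffice. The dictionary accumulates, for each pair (beta, t), a lower bound for the probability of the corresponding event (1), namely the measure of the set of realizations alpha observed so far.)

    from itertools import product

    def pm_lower_bounds(run_M, max_len, bits, steps):
        # approx[(beta, t)] is a lower bound for the probability of the event (1):
        # the total measure 2^-bits of those realizations alpha of the random bits
        # for which M, run on beta with random bits alpha, prints the sequence t.
        approx = {}
        for m in range(1, max_len + 1):
            for beta in ("".join(p) for p in product("01", repeat=m)):
                for alpha in ("".join(p) for p in product("01", repeat=bits)):
                    t = run_M(beta, alpha, steps)     # None if alpha/steps do not suffice
                    if t is not None:
                        key = (beta, tuple(t))
                        approx[key] = approx.get(key, 0.0) + 2.0 ** (-bits)
        return approx

Increasing max_len, bits and steps (dovetailed in the actual parallel process) yields better and better lower bounds for the probabilities of the events (1).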

Simultaneously with PM, we add new values to functions s1, s2, s3, s4:

si(0)=g0, si(1)=g1, ..., si(a)=ga, si(a+1)=0, si(a+2)=0,...

(where gamma = g0g1...ga). Only at some particular moments will we interfere with this process and add a finite number of 1's as values of the si.

The first of these moments appears when the probabilities of (1) have been computed precisely enough to obtain, for some number j1, the following approximate probability distribution of the hypothesis M(gamma + 0^j1):

hypothesis:      1     2     3     4
probability:    q1    q2    q3    q4                    (2)

where q1+q2+q3+q4 =1 and for all i:

qi <= P{ M(gamma + 0^j1) = i } / (1-delta),

where delta > 0 is a constant such that (1-delta)^3 >= 1-e (here we have n-1=3, see Section 5).

If such a moment never appears, this means that for each j in N the hypothesis M(gamma + 0^j) is undefined or not in {1, 2, 3, 4} with probability greater than delta. Then, with probability > delta, this is the case for infinitely many j's simultaneously, i.e. on the function gamma + 0^oo the strategy M outputs infinitely many hypotheses other than 1, 2, 3, 4. But this is exactly the case in which we do not interfere with the definition process of the functions s1, s2, s3, s4, and hence they will all be equal to gamma + 0^oo. In this case Lemma 4 holds obviously.

Now let us consider the case when the probability distribution (2) can be obtained. Using (2) and the algorithm described in Section 5, we exclude one of the numbers 1, 2, 3, 4 in the following sense (for example, let it be the number 1):

The function s1, instead of the current value s1(x1)=0, obtains the value s1(x1)=1, and for all x>x1 s1 is set equal to 0. The remaining three functions s2, s3, s4 obtain for x=x1 the value 0 (i.e. a value other than s1(x1)), and then (after our "interference" is concluded) they continue to obtain zero values. Thus, after this moment, s1 differs from s2, s3, s4, and on these last 3 functions the hypothesis 1 will be wrong. Our algorithm (see Section 5) guarantees that 1 will be removed only if q1 > 0.
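(A minimal sketch of the book-keeping behind such an interference step, assuming the functions s1, ..., s4 are stored as growing lists of their values; the function names are ad hoc. Between interferences every function is simply extended by 0's; an interference at position x gives the excluded function the value 1 there and all the other functions the value 0, so that the excluded function differs from the remaining ones from this point on.)

    def extend_with_zeros(s, upto):
        # make every function defined (with value 0) on all arguments < upto
        for values in s.values():
            values.extend([0] * (upto - len(values)))

    def interfere(s, active, excluded, x):
        extend_with_zeros(s, x)
        for i in s:
            s[i].append(1 if i == excluded else 0)   # the only argument where they differ
        active.discard(excluded)                     # the excluded index is never touched again

    # usage for the first interference described above:
    gamma = [0, 1, 1]                                # some initial segment g0 g1 ... ga
    s = {i: list(gamma) for i in (1, 2, 3, 4)}
    active = {1, 2, 3, 4}
    interfere(s, active, excluded=1, x=7)            # here x plays the role of x1 > a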

The definition process of s2, s3, s4 will be interfered with for the second time when the probabilities of (1) have been computed precisely enough to obtain, for some j2 > j1, the numbers q1i:

hypothesis:      2      3      4
probability:   q12    q13    q14                    (3)

such that q12+q13+q14=1 and for all i:

q1i <= P{ M(gamma + 0^j1) = 1 & M(gamma + 0^j2) = i } / (1-delta)^2.

If such a moment never appears, it means that for some delta' > 0 and all j > j1:

P{ M(gamma + 0^j) not in {2, 3, 4} | M(gamma + 0^j1) = 1 } > delta'.

Hence, with probability > delta' this is the case for infinitely many j's simultaneously. Since our second interference does not take place in this case, the functions s2, s3, s4 will be set equal to gamma + 0^oo. Lemma 4 obviously holds in this case.

Let us consider the case when the distribution (3) can be obtained. Then, by the algorithm of Section 5, we exclude another function si (for example, let it be s2). We set s2(x2)=1, s3(x2)=s4(x2)=0, and for all x>x2: s2(x)=0 (here, of course, x2 > x1, where x1 is the location of our first interference).

The third interference (and, since n=4, the last one) in the definition process of the functions s3, s4 takes place when the probabilities of (1) have been computed precisely enough to obtain a number j3 > j2 and numbers q12i, q22i:

hypothesis:       3       4                 hypothesis:       3       4
probability:   q123    q124                 probability:   q223    q224                    (4)

such that q123+q124=1, q223+q224=1, and for i=3, 4:

q12i <= P{ M(gamma + 0^j1) = 1 & M(gamma + 0^j2) = 2 & M(gamma + 0^j3) = i } / (1-delta)^3,

q22i <= P{ M(gamma + 0^j1) = 2 & M(gamma + 0^j2) = i } / (1-delta)^3.

In the case when the numbers j3, q12i, q22i cannot be obtained, Lemma 4 holds obviously.

If the numbers (4) have been obtained, the algorithm of Section 5 "removes" another function si (for example, let it be s3). We set s3(x3)=1 (where, of course, x3 > x2), s4(x3)=0, and for all x>x3: s3(x)=0. Since n=4, only one function s4 now remains; let s4(x)=0 for all x>x3.

This concludes the definition of the functions s1, ..., sn, corresponding by Lemma 4 to the probabilistic strategy M, the natural numbers k, n (k<n) and the rational number e>0. The algorithm described in Section 5 guarantees that on one of the functions si the strategy M will change its mind >=k times with a sufficiently large probability.

