The obvious definition of a random sequence is one that has the same properties as a sequence of coin flips. But this definition leaves the question, what are these properties? Kolmogorov resolved these problems with a new definition of random sequences: those with no description shorter than their full length. See survey and history in [Kolmogorov, Uspensky 1987], [Li, Vitanyi 1993].
of the string x given y is
the length of the shortest possible program p which lets algorithm A
transform y into x:
. There exists a Universal
Algorithm U such that,
, for every algorithm A. This
constant
is bounded by the length of the program U needs to simulate
A. We abbreviate
as
or
, for empty y.
An example: For
,
, so
.
Can we compute
? One could try all programs
and find
the shortest one generating x. This does not work because some programs
diverge, and the halting problem is unsolvable. In fact, no algorithm can
compute K or even any its lower bounds except
.
Consider an old paradox expressed in the following phrase:
``The smallest integer which cannot be uniquely and clearly defined
by an English phrase of less than two hundred characters.''
There are
English phrases of < 200 characters. So there must
be integers that cannot be expressed by such phrases. Then there is the
smallest such integer, but isn't it described by the above phrase?
A similar argument proves that K is not computable. Suppose an algorithm
computes a lower bound for
. We can use it to compute
that finds x with
, but
and
, so
: a contradiction.
So, K and its non-constant lower bounds are not computable.
A nice application of Kolmogorov Complexity measures the Mutual Information:
. It has many uses which we cannot consider here.
There are ways to estimate complexity that are correct in some important cases,
but not always. One such case is when a string x is generated at random. Let
us define
; (
).
What is the probability of
, for |x|=n? There are
strings x
of length n. If
, then
. There are
programs
of such length, generating
strings. So, the probability of such
strings is
(regardless of n)! So even for n=
1,000,000, the probability of
is absolutely negligible (provided
x was indeed generated by fair coin flips). We call
the
deficiency of randomness of a string x with respect to the uniform
probability distributions on all strings of that length.
If
is small then x satisfies all other reasonable properties of random
strings. Indeed, consider a property ``
'' with enumerable
of negligible probability. Let
be the number of strings of length n,
violating P and
. What can be said about the complexity of
all strings in S? S is enumerable and sparse (has only
strings). To
generate x, using the algorithm enumerating S, one needs only the position
i of x in the enumeration of S. We know that
and, thus,
. Then the deficiency of randomness
is large.
Every x which violates P will, thus, also violate the ``small deficiency of
randomness'' requirement. In particular, the small deficiency of randomness
implies unpredictability of random strings: A compact algorithm with frequent
prediction would assure large
. Sadly, the randomness can not be
detected: we saw, K and its lower bounds are not computable.
, given all previous bits. (Without such m we
would not know a segment needed to extract even one not fully predictable bit.)
We can fold X into an
matrix. No relation is required between
n,m,i,k, but useful are small m,i,k and huge
. We also need
a small
matrix Z, independent of X and really uniformly random
(or random Toeplitz, i.e. with restriction
). Then the
product X Z has uniform distribution with accuracy
. This follows from [Goldreich, Levin], which uses earlier ideas of U.
and V. Vazirani.