\newpage \subsection {Simulation.}
We considered several types of machines (models of computation).
We will now see that all these machines can be simulated by the
simplest of them: the Turing Machine. In other words, these powerful
machines can compute only those functions computable by a TM.
\vfill{\bf Church-Turing Thesis} is a generalization of this conclusion:
TMs can compute every function computable in any conceivable physical
model of computation. This is not a mathematical result, because the
notion of a model is not formally specified. But the long history of
investigations into ways to design real and ideal computing devices makes
it very convincing. Moreover, the Thesis has a stronger {\em Polynomial
Time} version: if any model computes a function in polynomial volume, a
TM can do the same. Both forms of the Thesis play a significant role in
the foundations of Computer Science.
\vfill{\bf PKM Simulation of PPM.} For convenience, we assume each PPM
node has an edge into the root. Now, for each node, we reconnect its
incoming PPM edges (unlimited in number), two per leaf, into a
bidirectional binary tree with new PKM colors $l,r,u$. The number of
edges increases at most fourfold. The nodes simulate the PPM's pulling
stage by extending their trees to double depth. To simulate the
re-coloring stage, each node gets a binary {\em name} formed by the
$l,r$ colors on its path through the root's tree. It then broadcasts its
name down its own tree. When each node has thus received the identities
and pointers of its PPM neighbors, it stores them by creating a little
auxiliary chain (acting as a TM). It then computes and implements the
actions of the original PPM and rebalances its tree. This simulation of
one PPM step takes polylogarithmic time.
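The fan-in reduction above can be sketched concretely. The following is a minimal illustration, not the notes' formalism: the numbering of leaves and internal nodes and the triple representation of colored edges are our own assumptions. It replaces a node's $m$ incoming edges by a bidirectional binary tree, multiplying the edge count by at most 4.

```python
def fanin_tree(m):
    """Replace a node's m incoming edges by a bidirectional binary tree.
    Leaves 0..m-1 stand for the original edges' endpoints; new internal
    nodes are numbered m, m+1, ...  Down-edges get colors l/r, up-edges u.
    (Illustrative sketch; representation is an assumption.)"""
    edges, nodes, nxt = [], list(range(m)), m
    while len(nodes) > 1:
        parents = []
        for i in range(0, len(nodes) - 1, 2):
            p, a, b = nxt, nodes[i], nodes[i + 1]
            nxt += 1
            # one new internal node, four directed colored edges
            edges += [(p, a, 'l'), (p, b, 'r'), (a, p, 'u'), (b, p, 'u')]
            parents.append(p)
        if len(nodes) % 2:            # odd node is promoted unchanged
            parents.append(nodes[-1])
        nodes = parents
    return nodes[0], edges            # root of the tree, list of edges
```

Each merge of two subtrees adds one node and four directed edges, so $m$ incoming edges yield $4(m-1)$ tree edges, within the fourfold bound stated above.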
\vfill{\bf TM Simulation of PPM.} To simulate the above PKM by a TM, we
first represent its memory configuration on the TM tape as the list of
all pointers, sorted by source names (described above) and then by
color. The PKM program is reflected in the TM's transition table. The
TM can then simulate the PKM's pulling stage as follows: it creates a
copy of each pointer and sorts the copies by their sinks. Now each
pointer, located at its source, has a copy near its sink, so both
components of 2-pointer paths are nearby: the special double-colored
pointers can be created and moved to their sources by sorting. The
re-coloring stage is straightforward, as all relevant pointers have the
same source and are located together. When no active edges remain in the
root, the TM stops and its tape represents the PKM output. If a PPM
computes a function $f(x)$ in $t(x)$ steps, using $s(x)$ nodes, the
simulating TM uses space $S = O(s\log s)$ ($O(\log s)$ bits for each of
$O(s)$ pointers) and time $T= O(S^2)\,t$, since sorting on a TM takes
quadratic time.
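The pulling stage's two sorts amount to a merge-join of the pointer list with its own copy. A minimal sequential sketch, assuming pointers are (source, color, sink) triples — an illustrative representation, not the actual tape encoding:

```python
def pull(pointers):
    """Compose 2-pointer paths a -c1-> b -c2-> c into double-colored
    pointers a -(c1,c2)-> c.  All grouping is done purely by sorting
    (a merge-join), as the tape-bound TM must do it."""
    a = sorted(pointers, key=lambda p: p[2])   # copies, sorted by sink
    b = sorted(pointers, key=lambda p: p[0])   # originals, sorted by source
    out, j = [], 0
    for s1, c1, t1 in a:                       # t1 is nondecreasing
        while j < len(b) and b[j][0] < t1:     # advance to sources == t1
            j += 1
        k = j
        while k < len(b) and b[k][0] == t1:    # every pointer leaving t1
            s2, c2, t2 = b[k]
            out.append((s1, (c1, c2), t2))     # double-colored pointer
            k += 1
    return sorted(out)                         # move results back to sources
```

The point of the two sorts is locality: after sorting by sink, every pointer sits next to all pointers leaving its sink, so the composition needs only a single linear scan.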
{\small A TM cannot outperform Bubble Sort. Is its quadratic overhead a
big deal? In a short time, all silicon gates on your PC combined run,
say, $X\!=\! 10^{23}\!\sim\!2^{2^{6.25}}$ clock cycles. Silicon
parameters double almost annually. Decades may bring micron-thin devices
that can sail sunlight in space in clouds of great computing and
physical (light-beam) power. Centuries may turn them into a Dyson Sphere
enveloping the
solar system. Still, the power of such an ultimate computer is limited
by the number of photons the Sun emits per second: $Y\!\sim\!
2^{2^{7.25}} \!\!=\! X^2$. Giga-years may turn much of the known
universe into a computer, but its might is still limited by its total
entropy $2^{2^{8.25}}\!\!=\!Y^2$. Squaring matters!}
\vfill{\bf Faster PPM Simulations.} Parallel Bubble-Sort on a CA or
Merge-Sort on a sequential FCM takes nearly linear time. A parallel FCM
can do much better [Ofman 65]. It represents and updates pointer graphs
as the above TM does. All steps are straightforward to perform locally
in parallel polylog time, except the sorting of pointers. For that we
need a fixed-connection sorting network. Sophisticated networks sort
arbitrary arrays of integers in log time; we need only a simpler polylog
method. Merge-Sort splits an array of two or more entries into two
halves and sorts each recursively. Batcher-Merge then combines two
sorted lists in {\em parallel} log time.
% Errata in SIGACT NEWS 28(2):80, June 1997 (submitted 1996):
\vfill{\bf Batcher Merge:} Call array entries {\em $i$-th partners} when
their addresses differ only in the $i$-th bit. Operations on partners
can be implemented on a {\em Shuffle Exchange} graph of $2^k$ nodes.
Each node has pointers to its $k$-th partner and to and from its {\em
shift} node, obtained by moving the first bit of its address to the end.
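The partner and shift pointers are plain bit manipulations on $k$-bit addresses. A small sketch; numbering bits from 1 at the least significant end is our assumption, chosen so that the $k$-th partner is the diametric opposite on a cycle of $2^k$ entries:

```python
def partner(addr, i):
    """i-th partner: flip the i-th bit (i counted from 1 at the least
    significant end, so i = k gives the diametric opposite)."""
    return addr ^ (1 << (i - 1))

def shift(addr, k):
    """Shift node: rotate the k-bit address left by one position,
    moving its first (most significant) bit to the end."""
    msb = addr >> (k - 1)
    return ((addr << 1) | msb) & ((1 << k) - 1)
```

For example, on $2^3$ nodes the address $101$ has diametric partner $001$ and shift node $011$; three shifts return it to itself.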
A {\em bitonic cycle} is the combination of two sorted arrays (one may
be shorter), connected max-to-max and min-to-min. We merge two sorted
arrays by appending the reversed second one to the first, treating the
last and first entries as neighbors, and sorting the resulting bitonic
cycle.
Sorting a bitonic cycle of length $2^k$ proceeds by comparing each entry
with its $k$-th partner (i.e., its diametric opposite on the cycle) and
switching the two if they are out of order. Each half then becomes a
bitonic cycle, and any two entries from different halves are in the
proper order. The process then repeats on each half recursively
(decrementing $k$ via the graph's shift edges).
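The recursion above can be sketched sequentially; on the shuffle-exchange network each inner loop below is a single parallel comparison step, giving $k$ steps in all. Assumptions of this sketch: zero-based Python lists, and a total length that is a power of two.

```python
def bitonic_merge(a):
    """Sort a bitonic sequence of length 2^j: compare each entry with
    its diametric partner, swap if out of order, then recurse on the
    halves (each again bitonic, with all cross-half pairs in order)."""
    n = len(a)
    if n <= 1:
        return list(a)
    a, h = list(a), n // 2
    for i in range(h):                 # independent comparisons:
        if a[i] > a[i + h]:            # one parallel step per level
            a[i], a[i + h] = a[i + h], a[i]
    return bitonic_merge(a[:h]) + bitonic_merge(a[h:])

def batcher_merge(x, y):
    """Merge sorted lists x, y (total length a power of 2) by appending
    the reversed second list and sorting the bitonic result."""
    return bitonic_merge(x + y[::-1])
```

After the first round of swaps, every entry of the lower half is at most every entry of the upper half, so the two halves can be finished independently — which is exactly what makes the network run in parallel log time.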