[G. Itkis and L. A. Levin. Fast and Lean Self-Stabilizing Asynchronous Protocols, FOCS, 1994]
Distributed computation in general, and self-stabilizing protocols in
particular, are a rich area of research. However, all prior protocols (even
non-stabilizing) for most non-trivial tasks required the memory of each node of
the network to grow (at least logarithmically) with the size of the network.(1)
Our main contribution: we give the first fault-tolerant (self-stabilizing) protocols for these tasks that use constant memory per node, independent of the network size. Furthermore, our protocols work in asynchronous networks. Finally, we show that handling asynchrony and providing self-stabilization do not limit the computational power and carry only a constant-factor memory cost.
Suppose some nodes in arbitrary states are carelessly tossed into a heap, forming connections in the process (this is pretty much what happens with sensor and ad-hoc networks).
Even more challenging: suppose that instead of careless tossing, a malicious adversary creates a (connected) network of any size and topology of her choosing. She also determines the initial state of each node (e.g., she might initialize the system to "think" that the task has already been accomplished, but hold a wrong answer).
Still worse, after the network is thus created (or modified), the adversary controls the timing of all the nodes, determining the schedule: the order in which different nodes act (possibly activating some nodes more frequently than others).
To break symmetry, randomness is required. But we let the adversary flip the coins for each node! Bad coins may, of course, delay stabilization, but by no more than the number of times two nodes flip identical coins in a row (thus failing to break symmetry between them).
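The coin-flip bound can be illustrated with a tiny simulation (an illustrative sketch, not the paper's protocol; the function name is ours): two identical anonymous neighbors each flip a fair coin per round, and the symmetry between them is broken on the first round their coins differ, so the delay is geometric with expectation 2 rounds.

```python
import random

def rounds_to_break_symmetry(rng: random.Random) -> int:
    """Count rounds until two anonymous neighbors flip different coins.

    Each round, both nodes flip one fair coin; identical outcomes
    leave them symmetric, so they must flip again.
    """
    rounds = 1
    while rng.randint(0, 1) == rng.randint(0, 1):
        rounds += 1
    return rounds

# The delay attributable to bad luck equals the number of consecutive
# identical flips; for fair coins the average is 2 rounds.
rng = random.Random(0)
trials = [rounds_to_break_symmetry(rng) for _ in range(100_000)]
print(round(sum(trials) / len(trials), 2))  # close to 2
```

Adversarial coins can stretch any single run, but only in proportion to how long the coins stay identical, which is exactly the bound stated above.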
Finally, the most difficult part: the nodes have constant size. In particular, this means that the nodes cannot have names, nor can one node "point" to another node outside its immediate neighborhood. Prior to our work it was believed (2) that this situation was hopeless.
What task can such a "pile of junk" solve? We show that, surprisingly, it can solve any task that can be solved by a reliable computer, e.g., a Turing machine, of the same (up to a constant factor) size and with the same information (the network topology, e.g., given on a read-only tape as an adjacency matrix). Clearly, no task requiring greater resources can be solved by such a network.
(1) Such growing space requirements often open the door to peculiarities of the models. Thus, our constant-space model offers simplicity as an important benefit.
(2) In part this belief was probably due to a misinterpretation of the result of Israeli and Jalfon, showing that a deadlock cannot be broken by sub-logarithmic-space nodes even on a simple ring network.
Consider a city in which everyone has, say, three randomly chosen friends,
and wants to visit them any time without risking traffic jams. For that she buys
land and builds private railroads from her home to their houses, denying access
to anyone else. The city so designed will be huge, very sparsely populated, most
of its area covered by the railroads. Also, even without traffic jams, travel will be slow, since the average distance is quadratically larger than it would be were the city densely populated.
This is much how VLSI chips are designed today: each pair of communicating components is linked by a separate wire. An alternative approach is similar to the way actual cities are built: populated as densely as possible, with public roads shared by all traffic. The obvious risk of such a design is traffic jams, should everyone simultaneously try to reach their destination as quickly as possible.
However, it turns out that if everyone accepts a small delay factor, say about logarithmic, then traffic rules can be devised that exclude traffic jams completely: everybody's arrival time will be guaranteed.
More formally, for any monotone constructible f, the VLSI f-model assumes that a signal propagates across a wire of length d in time f(d). Avoiding traffic jams for linear f (e.g., d divided by the speed of light) turns out to be impossible, but for slightly super-linear f, e.g., f(d) = d(log d)^2, our traffic rules avoid them.
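The role of the super-linear threshold can be seen numerically (an illustrative sketch; `partial_sum` and the two function names are ours, not the paper's): for f(d) = d the sum of 1/f(d) is the harmonic series and keeps growing, while for f(d) = d(log d)^2 the partial sums stabilize.

```python
import math

def partial_sum(f, n):
    """Partial sum of 1/f(d) for d = 2, ..., n."""
    return sum(1.0 / f(d) for d in range(2, n + 1))

linear = lambda d: d                          # f(d) = d: harmonic series, divergent
superlinear = lambda d: d * math.log(d) ** 2  # f(d) = d (log d)^2: convergent

for n in (10**3, 10**4, 10**5, 10**6):
    # The linear column grows like log n; the super-linear column levels off.
    print(n, round(partial_sum(linear, n), 2), round(partial_sum(superlinear, n), 4))
```

The finiteness of this sum is exactly the condition under which the simulation below works, and the paper shows it cannot be weakened.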
More precisely, we show that if Sum_d 1/f(d) is finite, then the computational power of f-model chips remains the same when any combination of the following is added:
(1) eliminating (making zero the width of) each wire of length d, except for its log d segment (containing the destination's relative address); (1)
(2) allowing wires to transmit log d bits simultaneously;
(3) making the switching time f(d) of each node depend only on the length d of its own input wires, thus enabling small sub-circuits to run faster;
(4) enabling the nodes to change connections arbitrarily at run time.
We construct a kind of operating system, Link Server, or Linx for short, running on a planar grid of cellular automata and simulating all these powers on-line. We also show that the condition on f cannot be weakened.
Using (prefixless) Kolmogorov complexity K, the condition that Sum_d 1/f(d) is finite can be replaced by the single bound f(d) = Ω(d·2^K(log d)).
Imagine a cavalier theorist who must design a chip. He specifies the connections, but neglects to specify the layout of the wires. He does not even leave any room for them! Moreover, he assumes that the connections can be changed arbitrarily at run time. And with all this he expects the manufacturer to somehow implement his chip without an area increase, or a computation slow-down of more than a constant factor. Our result shows that his expectations can be satisfied, however little he deserves it. The manufacturer has to build only a primitive chip, i.e., a simple grid running our Linx software. This Linx software will create the illusion that the chip works exactly as our mad scientist thinks it should: users will not know the difference.
(1) This eliminates wire-layout and area considerations.