## Preface

- The consensus problem
- Consensus with Byzantine Failures

## Processor Failures in Message Passing

`Fail stop`

: at some point the processor stops taking steps- at the processor’s final step, it might succeed in sending only
**a subset of the messages**it is supposed to send, e.g.:`Crash`

- Power outage
- Hardware failure
- Out of memory/disk full

- Strategies:
- Checkpoint state and restart (
**High latency**) - Replicate state and fail over (失效备援) (
**High cost**)

- Checkpoint state and restart (

- at the processor’s final step, it might succeed in sending only
`Byzantine`

:**Everything that is not fail stop**. Processor changes state arbitrarily and sends message with arbitrary content.- E.g.:
- Bit flip in memory or on disk corrupts data
- Older version of code on one node sends (now) invalid messages
- Node starts running malicious version of software

- Goal: turn into fail stop
- Checksums/ECC (Error Correction Code)
- Assertions
- Timeouts

- E.g.:

### Failure Matrix

### Negative result for **link failures**

It is impossible to reach consensus in case of link failures, even in the synchronous case, and even if one only wants to tolerate a single link failure.

Consensus under link failures: `the 2 generals problem`

.

- There are two generals of the same army who have encamped a short distance apart.
- Their objective is to capture a hill, which is possible only if they attack simultaneously.
- If only one general attacks, he will be defeated.
- The two generals can only communicate by sending messengers,
**which is not reliable**. - Is it possible for them to attack simultaneously?

See Fischer/Lynch/Paterson for more discussion.

## The Consensus Problem

Each process starts with an individual `input`

from a particular value set V. Processes may fail by crashing. All non-faulty processes are required to produce outputs from the value set V, subject to simple agreement and validity. A solution to the **consensus proble**m must guarantee the following:

`Termination`

: Eventually every**nonfaulty processor**must decide on a value (decision is irrevocable).`Agreement`

: All decisions by nonfaulty processors must be the same.`Validity`

: If all inputs are the same, then the decision of a nonfaulty processor must equal the common input

Once a processor crashes, it is of no interest to the algorithm, and no requirements are placed on its decision.

**f-resilient system**:

- at most
`f`

processors may fail - The set of faulty processors may be
**different**in different executions.

In the last round:

`Clean crash`

: none or all of the outgoing messages are sent`Not-clean crash`

: an arbitrary set of its outgoing messages are delivered.

### Consensus algorithm in the presence of crash failures

Each process maintains a set of the values it knows to exist in the system; initially, this set contains only its own input. At the first round, each process broadcasts its own input to all processes. For the subsequent f rounds, each process takes the following actions:

- updates its set by joining it with the sets received from other

processes, and - broadcasts any new additions to the set to all processes.

After f+1 rounds, the process decides on the smallest value in its set.

- Intuition for
`Agreement`

: Assume that a process pi decides on a value x smaller than that decided by some other process p_j. Then, x has remained “hidden” from pj for (f+1) rounds. We have at most f faulty processes. A contradiction!!! - Number of processes: n > f
- Round Complexity: f + 1
**Message Complexity**: (at most) n^2 * |V| messages, where V is the set of input values.

#### Worst case scenario

## Consensus with Byzantine Failures

Theorem (5.7): Any consensus algorithm for 1 Byzantine failure must have at least 4 processors.

Proof of Theorem 5.7:

Theorem: Any consensus algorithm forfByzantine failures must have at least3f+1processors.

- Partition the processors into three sets P0, P1, P2;
- Each containing at most n/3 processors

- P0 simulates p0, P1 simulates p1 and P2 simulates p2
- n processors solves consensus => {p0, p1, p2} solves consensus. ==> Contradiction.

## Exponential Information Gathering (EIG) Algorithm

This algorithm uses

- f + 1 rounds (optimal)
- n = 3f + 1 processors (optimal)
- exponential size messages (sub-optimal)

Each processor keeps a tree data structure in its local state. Values are filled in the tree during the f + 1 rounds

At the end, the values in the tree are used to calculate the decision.

### Local Tree Data Structure

- Each tree node is labeled with
**a sequence of**unique processor indices. - Root’s label is empty sequence ; root has level 0
- Root has
`n`

children, labeled 0 through n - 1 - Child node labeled i has n - 1 children, labeled i : 0 through i : n-1 (
**skipping i : i**) - Node at level
`d`

labeled`v`

has`n - d`

children, labeled v : 0 through v : n-1 (**skipping any index appearing in v**) - Nodes at level f + 1 are leaves.

The tree when n = 4 and f = 1 :

### Filling in the Tree Nodes

- Initially store your input in the root (level 0)
- Round 1:
- send level 0 of your tree to all
- store value x received from each p_j in tree node labeled j (level 1); use a default if necessary
- “pj told me that pj ‘s input was x”

- Round 2:
- send level 1 of your tree to all
- store value x received from each pj for each tree node k in tree node labeled k : j (level 2); use a default if necessary
- “pj told me that pk told pj that pk’s input was x”

- Continue for f + 1 rounds

### Calculating the Decision

- In round f + 1, each processor uses the values in its tree to compute its decision.
Recursively compute the “resolved” value for the root of the tree,

`resolve()`

, based on the “resolved” values for the other tree nodes:

Example of resolving values when n = 4 and f = 1:

## A Polynomial Algorithm (The King Algorithm)

We can reduce the message size to polynomial with a simple algorithm.

- The number of processors increases to: n > 4f
- The number of rounds increases to 2(f + 1)
- Uses f+1 phases, each taking two rounds.

## Asynchronous Consensus

Assumptions

- Communication system is reliable
- Only processor failures (crash / Byzantine)
- Completely asynchronous

Theorem: For n ≥ 2, there is no algorithm in the read/write shared memory model that solves the agreement problem and guarantees wait-free termination

`Consensus is impossible`

!!!

- Even in the presence of one single

processor (crash) failure

Proof of impossibility

- Impossibility for shared memory
- The (n-1)-resilient case (wait-free case)
- The 1-resilient case

- Impossibility for message passing
- Simulation

- Impossibility motivates the use of
`failure detectors`

.- E.g., “The weakest failure detector for solving consensus”, JACM 43(4).

## Summary

Let f be the maximum number of faulty processors.

## References

[1] Attiya, Hagit, and Jennifer Welch. **Distributed computing**: fundamentals, simulations, and advanced topics. Vol. 19. John Wiley & Sons, 2004..

[2] Byzantine Fault Tolerance, Distributed System (A free online class), by Chris Colohan.

[3] The Byzantine Generals Problem, Leslie Lamport, Robert Shostack and Mashall Peace. ACM TOPLAS 4.3, 1982.

[4] CSCE 668: Distributed Algorithms and Systems Spring 2014.