Category Archives: Consensus

The problem with consensus

A distributed system is collection of nodes, each which there own local memory, which are able to communicate via message passing, cooperate to perform a computation. CAP theorem [1,2] argues that its not possible to achieve consistency, availability and partition tolerance. But machines will fail and there failures are tolerated (to varying extents) by replication. These replicates now need to agree on consistent worldviews, leading to the problem of consensus, originally proposed by [6], Consensus is when these nodes agree on a value, applications for this include:

  • mutable exclusion locks
  • committing a transaction to a database
  • distributed storage such as NFS
  • Implementing reliable broadcast
  • leader election

A protocol for consensus must provide the following to be “correct”:

  • Agreement: all correct nodes arrive at the same value, (the safety property)
  • Validity: the value chosen is one that was proposed by a correct node, (the non-triviality property)
  • Termination: all correct nodes eventually decide on a value, (the liveness property)

A correct node is a node that will eventency make progress so its hasn’t yet and will not experience any of the failures listed below:

QUESTION: Some papers [7] , list a 4th condition called unanimity, stating that if all nodes propose the same value, this will be the value chosen. It appears to me that this is just a case of validity, since the value chosen must have proposed by a node, so if all nodes propose the same value then there is only one possible value to be chosen, according to the validity condition. But this used in a few paper so I can’t be right, so why ? 

The following failures are possible:

  • Fail-stop – nodes may stop but will not restart
  • Fail-recover – nodes may stop and restart
  • Byzantine – nodes behavior unexpectedly, from fault or malicious

Assuming synchronous communication (reliable unicast with known bounded message delay and execution of nodes)  and less than 2/3ds of nodes experience Byzantine failures , we can achieve consensus [3] but a completely asynchronous consensus protocol cannot guarantee consensus with just a single fail-stop node [4], the intuition behind this is that you can’t detect a fail-stopped node, as it my just be slow or its messages not yet delivered. This can be masked by trying to detect failures with waiting on unresponsive nodes, comprising liveness or timeouts/heartbeats comprising accuracy. All consensus algorithms provide consistency but for this they comprises partition tolerance and/or availability.

Henceforth, I would like to consider processes communicating over TCP/IP, making the following assumptions:

  • Nodes have local state, no shared memory and local clocks (though they may be shared when processes are running on the same host)
  • If a path is available, message passing is reliable unicast with unbounded delay (thanks to TCP’s reliable delivery, data integrity and in order delivery) but network partitions are possible
  • Failure detection methods don’t need to be accurate, thus its ok to use timeouts and assume a node is dead when it is infact live
  • A node can tell which node sent a message (e.g. MACs/IPs) so a Byzantine node cannot forge a message which appears to come from an honest host


Two phase commit is depend on a single node, whereas three phase commit [5] can tolerant half its nodes failing but not network partitions or unbounded network delays.

My next post will consider some more complex approaches meeting the above requirements


[1] Brewer, E.  Rowards Robust Distributed System. Symposium on Principles of Distributed Computing (PODC).(2000).

[2] Gilbert, Seth, and Nancy Lynch. “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services.” ACM SIGACT News 33, no. 2 (2002): 51-59.

[3 Lamport, Leslie, Robert Shostak, and Marshall Pease. "The Byzantine generals problem." ACM Transactions on Programming Languages and Systems (TOPLAS) 4, no. 3 (1982): 382-401.

[4] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (April 1985), 374-382.

[5] Skeen, Dale; Stonebraker, M. (May 1983). “A Formal Model of Crash Recovery in a Distributed System”. IEEE Transactions on Software Engineering: 219–228

[6] Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2) (April 1980) 228–234