Tag Archives: Flexible Paxos

Towards an intuitive high-performance consensus algorithm

Distributed consensus is the problem of how to reach decisions in asynchronous and unreliable distributed systems. The two most widely known algorithms are: Paxos, the traditional yet poorly understood solution; and Raft, the newcomer which democratised the understanding of distributed consensus.

Raft is one of the most influential papers of this decade in distributed systems and it has been tremendously successful in industry, with 50+ open source implementations including CoreOS’s etcd and HashiCorp’s Consul. Over the years, we have increasingly seen many efforts to optimise the performance of Raft (which I myself am also guilty of). The Raft paper makes clear that it prioritises understandability over performance. Fundamentally, what makes Raft so understandable – its strong leadership and focus on state machine replication – also makes it a poor basis for high-performance consensus. The vast existing literature on distributed consensus proposes many more efficient algorithms than Raft though, like Paxos, their presentation is often less understandable to the broader community. It is sad to see that such insightful research (EPaxos, Vertical Paxos, VR and Ring Paxos to name but a few) is being largely ignored in practice.
The question, therefore, arises: can we have a consensus algorithm which is both high-performance and intuitive to reason about?

Our latest draft, available now on arXiv, tries to take the first steps towards these goals. We present a new approach to reaching distributed consensus. We aim to improve performance and understandability by utilising two key weapons: generality and immutability.

Weapon 1: Generality

Instead of proposing a single concrete algorithm optimised for a particular scenario, we provide a set of rules which, if followed, allow any algorithm to safely implement consensus. It is well understood that compromise is inevitable in distributed consensus, our aim is to enable engineers to choose their tradeoffs by customising the consensus algorithm to meet their particular needs.

In the paper, we prove that Paxos implements these rule, as does our earlier work, Flexible Paxos and an interesting decentralised consensus algorithm called Fast Paxos. In moving to this generalised model, we also observe that algorithms such as Paxos and Fast Paxos are conservative in their approach and can be optimised accordingly.

Weapon 2: Immutability

The confusion around Paxos largely lies in its subtlety. We have been inspired by the recent trend to use immutable state to tame the complexity of systems and so we have applied this idea to consensus. We build our algorithm from persistent write-once registers and the guarantee that if a value is read from a register then that response is valid forever. From this simple primitive, we can construct consensus algorithms with the aim to make reasoning about safety so intuitive that it is clear, without formal proofs, why an algorithm is correct.

Summary

Our new draft, available here, proposes an immutable and generalised approach to reaching distributed consensus over a single value. We prove that our approach is sufficiently general to encode Paxos, Flexible Paxos and Fast Paxos as well as opening the door to a board spectrum of new solutions to distributed consensus. Our hope is that in the future this new perspective on consensus may be used to express more algorithms from the academic literature and thus effectively communicate their insights to the broader community.