Paper notes on S-Paxos [SRDS’12]

The following are my paper notes for “S-Paxos: Offloading the Leader for High Throughput State Machine Replication”. This paper was recommended to me as an example of high-throughput consensus, achieved by offloading responsibilities from the leader.

The paper starts by demonstrating that JPaxos is throughput-limited by the leader's CPU, peaking at 70 kreqs/sec, whereas the throughput of S-Paxos can reach 500 kreqs/sec.

Algorithm

The (normal case) algorithm works as follows:
  • Any node is able to receive a request from a client; let's call this node the coordinator
  • The coordinator sends the request and its ID to all nodes
  • All nodes send an ack with the ID to all other nodes
  • When the leader receives f+1 acks for an ID, it sends phase 2a for that ID
  • When the leader receives f+1 successful phase 2b messages for an ID, it sends a commit for that ID to all nodes
  • When the coordinator receives the commit for an ID, it executes the request and replies to the client
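The leader's part of the steps above can be sketched as a small piece of bookkeeping. This is only a minimal sketch under my own assumptions: the class name `Leader`, its method names and the tuple return values are hypothetical and not taken from the paper or from JPaxos, and it ignores ballots, slot numbers, failures, batching and everything outside the normal case. The point it illustrates is that the leader only ever handles request IDs, never request payloads.

```python
class Leader:
    """Hypothetical leader-side bookkeeping for the normal case only."""

    def __init__(self, f):
        self.quorum = f + 1      # f+1 matching messages are required
        self.acks = {}           # request ID -> nodes that acked it
        self.accepts = {}        # request ID -> nodes that sent phase 2b
        self.proposed = set()    # IDs for which phase 2a has been sent
        self.committed = []      # IDs in commit order

    def on_ack(self, req_id, sender):
        """Record an ack; return a phase 2a action once f+1 acks are seen."""
        self.acks.setdefault(req_id, set()).add(sender)
        if len(self.acks[req_id]) >= self.quorum and req_id not in self.proposed:
            self.proposed.add(req_id)
            return ("phase2a", req_id)
        return None

    def on_phase2b(self, req_id, sender):
        """Record a successful 2b; return a commit action once f+1 are seen."""
        if req_id not in self.proposed or req_id in self.committed:
            return None
        self.accepts.setdefault(req_id, set()).add(sender)
        if len(self.accepts[req_id]) >= self.quorum:
            self.committed.append(req_id)
            return ("commit", req_id)
        return None


leader = Leader(f=1)
leader.on_ack("req-1", "node-a")         # returns None, only one ack so far
leader.on_ack("req-1", "node-b")         # returns ("phase2a", "req-1")
leader.on_phase2b("req-1", "node-a")     # returns None
leader.on_phase2b("req-1", "node-b")     # returns ("commit", "req-1")
```

Using sets for the senders means a duplicated ack or 2b from the same node is not counted twice towards the f+1 threshold.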
Path of request:
client -> all -> all -> all -> leader -> all -> client
1 + n + n^2 + n + n + n + 1 = n^2 + 4n + 2
or
client -> all -> all -> all -> all -> client
1 + n + n^2 + n + n^2 + 1 = 2n^2 + 2n + 2
depending on whether message 2b is sent to just the leader or to all nodes.
For comparison, Multi-Paxos is:
client -> leader -> all -> leader -> client
1 + n + n + 1 = 2n + 2
or
client -> leader -> all -> all -> client
1 + n + n^2 + 1 = n^2 + n + 2
depending on whether message 2b is sent to just the leader or to all nodes.
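As a quick sanity check, the per-hop terms listed above can be summed for a concrete cluster size. The function and parameter names below (`s_paxos_messages`, `multi_paxos_messages`, `broadcast_2b`) are my own, and the counts follow the simplified accounting in these notes rather than anything measured in the paper.

```python
def s_paxos_messages(n, broadcast_2b=False):
    """Messages per request for S-Paxos, using the per-hop terms in the notes."""
    to_coordinator = 1     # client -> coordinator
    forward = n            # coordinator -> all (request and ID)
    acks = n * n           # all -> all (acks)
    phase_2a = n           # leader -> all
    if broadcast_2b:
        rest = n * n       # 2b sent all -> all, no separate commit
    else:
        rest = n + n       # 2b sent all -> leader, then commit leader -> all
    reply = 1              # coordinator -> client
    return to_coordinator + forward + acks + phase_2a + rest + reply


def multi_paxos_messages(n, broadcast_2b=False):
    """Messages per request for Multi-Paxos, using the same accounting."""
    request = 1                          # client -> leader
    phase_2a = n                         # leader -> all
    phase_2b = n * n if broadcast_2b else n
    reply = 1                            # reply to client
    return request + phase_2a + phase_2b + reply


for n in (3, 5, 7):
    print(n,
          s_paxos_messages(n), s_paxos_messages(n, broadcast_2b=True),
          multi_paxos_messages(n), multi_paxos_messages(n, broadcast_2b=True))
```

For n = 3 this gives 23 and 26 messages per request for S-Paxos, against 8 and 14 for Multi-Paxos.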
The paper proposes various optimizations, such as batching, pipelining and message piggybacking, to reduce the network load.

Evaluation

The evaluation demonstrates that under the right conditions S-Paxos can achieve 5x the throughput of JPaxos. Throughout the evaluation, the x-axis of each graph shows the number of closed-loop clients, which are clients that send their next request only once the previous response has been received. Without any indication of client latency, this does not tell us much about the rate of incoming requests. For example, the graphs in Figure 4 do not seem to be a fair comparison, as the same number of clients represents quite different workloads for Paxos and S-Paxos.
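To make the closed-loop point concrete: by Little's law, N closed-loop clients with no think time offer roughly N divided by the per-request latency in requests per second. The sketch below uses made-up latencies purely for illustration (they are not figures from the paper), and the function name `closed_loop_rate` is my own.

```python
def closed_loop_rate(clients, latency_ms, think_time_ms=0.0):
    """Requests per second offered by closed-loop clients (Little's law).

    Each client has at most one outstanding request, so it completes one
    request every (latency + think time) milliseconds.
    """
    return clients * 1000.0 / (latency_ms + think_time_ms)


# Hypothetical latencies: the same 1000 clients are a very different workload.
print(closed_loop_rate(1000, latency_ms=2))    # 500000.0 requests/sec
print(closed_loop_rate(1000, latency_ms=20))   # 50000.0 requests/sec
```

So the same point on the x-axis can correspond to an order of magnitude difference in offered load, depending on the latency each system delivers.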

Conclusion

I like the basic idea of this paper, and it is interesting to see that latency was increased by only 1/3. However, the system places a substantially higher load on the network than Paxos, and the leader is still required to execute phase 2 of Paxos for each client request.

Unanimous: System Research Group talklet


I’m looking forward to sharing my thoughts on consensus for the edge network with the SRG today; the abstract is below.

Many projects in the SRG at the moment (HAT, UCN, contacts app, MirageOS for ARM, Jitsu, databox, signposts) are trying to give individuals a viable alternative to 3rd party centralised services and put them back in control of their personal data. However, developing applications for the hostile edge network, with its heterogeneous hosts and networks, trust issues and poorly understood middleboxes, is tricky. This is made worse by the fact that consensus algorithms are famously difficult to use, underspecified and based on decade-old assumptions about the internet. In this talklet, I will motivate Unanimous, a new consensus algorithm for the modern internet.

NB: this is a practice talk for the EuroSys doctoral workshop next Tuesday, so this 5 min talk will simply motivate a research direction instead of presenting a complete solution.
EDIT (17/4): these slides are now online