The following is a paper notes for “S-Paxos: Offloading the Leader for High Throughput State Machine Replication”. This paper was recommended to me as a example of high-throughput consensus, achieved by offloading responsibilities from the leader.
The paper starts by demonstrates that JPaxos is throughput limited by leader CPU, peaking at 70 kreqs/sec, where as the throughput of S-Paxos can reach 500 kreqs/sec.
Algorithm
The (normal case) algorithms works as follows:
- Any node is able to receive a request for a client, lets call this coordinator
- Coordinator sends request and its ID to all nodes
- All nodes send ack with ID to all other nodes
- When leader receives f+1 asks then sends phase 2a for ID
- When leader receives f+1 successful phase 2B for ID then send commit for ID to all
- When coordinator receives commit for ID, then execute request and reply to client
Path of request:
client -> all -> all -> all -> leader -> all -> client 1 + n + n^2 + n + n + n + 1 = n^2 + 4 + 1
or
client -> all -> all -> all -> all -> client 1 + n + n^2 + n + n^2 + 1 = 2n^2 + 2 + 1
depending on if message 2b is sent to all nodes or just leader
for comparison multi-paxos is:
client -> leader -> all -> leader -> client 1 + n + n + 1 = 2n + 2
or
client -> leader -> all -> all -> client 1 + n + n^2 + 1
depending on if message 2b is sent to all nodes or just leader
The paper proposes various optimizations such as batching and pipelining and message piggybacking to reduce network load
Evaluation
The evaluation demonstrated that under the right conditions S-Paxos can achieve 5x the throughput of JPaxos. Throughput the evaluation, graph x-axis shows number of closed loop clients, which are client who send the next request when the previous response is received. Without any indication of client latency, this did not tell us much about the rate of incoming requests. For example, the graphs in Figure 4 do need seem to be a fair comparison as # of client for Paxos and S-Paxos represents quite at different workloads.
Conclusion
I like the basic idea of this paper and its is interesting to see that latency was increased only by 1/3. However, the system places a substantially load on the network when compared to Paxos and the leader is still required to execute phase 2 paxos for each client request.