Category Archives: Reading List

Paper notes on S-Paxos [SRDS’12]

The following are my paper notes for “S-Paxos: Offloading the Leader for High Throughput State Machine Replication”. This paper was recommended to me as an example of high-throughput consensus, achieved by offloading responsibilities from the leader.

The paper starts by demonstrating that JPaxos's throughput is limited by the leader's CPU, peaking at 70 kreqs/sec, whereas the throughput of S-Paxos can reach 500 kreqs/sec.

Algorithm

The normal-case algorithm works as follows (a sketch of the flow follows the list):
  • Any node can receive a request from a client; let's call this node the coordinator
  • The coordinator sends the request and its ID to all nodes
  • All nodes send an ack with the ID to all other nodes
  • When the leader receives f+1 acks, it sends phase 2a for the ID
  • When the leader receives f+1 successful phase 2b responses for the ID, it sends a commit for the ID to all nodes
  • When the coordinator receives the commit for the ID, it executes the request and replies to the client
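To make the flow concrete, here is a minimal single-process sketch of the normal case in Python. This is my own illustrative reconstruction; the names, message formats and in-memory "network" are assumptions, not taken from the paper's JPaxos-based implementation.

    # Sketch of the S-Paxos normal case; a list stands in for the network.
    F = 1                        # tolerated crash failures
    NODES = 2 * F + 1            # cluster size
    queue = []                   # pretend network: (destination, message)

    def broadcast(msg):
        for dest in range(NODES):
            queue.append((dest, msg))

    class Node:
        def __init__(self, nid, leader):
            self.nid, self.leader = nid, leader
            self.reqs, self.acks, self.p2b = {}, {}, {}

        def on_client_request(self, req):        # this node = coordinator
            rid = (self.nid, req)                # request ID (sketch only)
            broadcast(('fwd', rid, req))         # disseminate payload + ID

        def handle(self, msg):
            kind = msg[0]
            if kind == 'fwd':                    # store payload, ack the ID
                _, rid, req = msg
                self.reqs[rid] = req
                broadcast(('ack', rid, self.nid))
            elif kind == 'ack' and self.leader:  # leader orders IDs only
                _, rid, sender = msg
                self.acks.setdefault(rid, set()).add(sender)
                if len(self.acks[rid]) == F + 1:
                    broadcast(('2a', rid))
            elif kind == '2a':                   # accept, reply with 2b
                broadcast(('2b', msg[1], self.nid))
            elif kind == '2b' and self.leader:   # 2b broadcast here, but
                _, rid, sender = msg             # only the leader acts
                self.p2b.setdefault(rid, set()).add(sender)
                if len(self.p2b[rid]) == F + 1:
                    broadcast(('commit', rid))
            elif kind == 'commit':
                rid = msg[1]
                if rid[0] == self.nid:           # coordinator executes and
                    print(self.nid, 'executes', self.reqs[rid])  # replies

    nodes = [Node(i, leader=(i == 0)) for i in range(NODES)]
    nodes[1].on_client_request('set x=1')        # any node may coordinate
    while queue:
        dest, msg = queue.pop(0)
        nodes[dest].handle(msg)

Note that the leader only ever handles request IDs; the request payloads are disseminated by the coordinator, which is the whole point of the offloading.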
Path of request:
client -> all -> all -> all -> leader -> all -> client
1 + n + n^2 + n + n + n + 1 = n^2 + 4n + 2
or
client -> all -> all -> all -> all -> client
1 + n + n^2 + n + n^2 + 1 = 2n^2 + 2n + 2
depending on whether message 2b is sent to all nodes or just the leader.
For comparison, Multi-Paxos is:
client -> leader -> all -> leader -> client
1 + n + n + 1 = 2n + 2
or
client -> leader -> all -> all -> client
1 + n + n^2 + 1 = n^2 + n + 2
again depending on whether message 2b is sent to all nodes or just the leader.
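As a sanity check on the tallies above, a few lines of Python (counts include the client's request and the reply):

    # Messages per request as a function of cluster size n, per the tallies.
    def s_paxos(n, p2b_to_all=False):
        return 2*n**2 + 2*n + 2 if p2b_to_all else n**2 + 4*n + 2

    def multi_paxos(n, p2b_to_all=False):
        return n**2 + n + 2 if p2b_to_all else 2*n + 2

    for n in (3, 5):
        print(n, s_paxos(n), multi_paxos(n))   # e.g. n=3: 23 vs 8

The quadratic term is the price S-Paxos pays for all-to-all dissemination of requests and acks.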
The paper proposes various optimizations, such as batching, pipelining and message piggybacking, to reduce network load.

Evaluation

The evaluation demonstrated that under the right conditions S-Paxos can achieve 5x the throughput of JPaxos. Throughout the evaluation, the graphs' x-axes show the number of closed-loop clients, i.e. clients that send their next request only once the previous response has been received. Without any indication of client latency, this does not tell us much about the rate of incoming requests. For example, the graphs in Figure 4 do not seem to be a fair comparison, as the same number of clients represents quite different workloads for Paxos and S-Paxos.

Conclusion

I like the basic idea of this paper, and it is interesting to see that latency increased by only a third. However, the system places a substantially higher load on the network than Paxos does, and the leader is still required to execute phase 2 of Paxos for each client request.

Conservative election in Byzantine Fault Tolerant Raft

Tangaroa is a Byzantine fault tolerant Raft implementation in Haskell, developed by Christopher Copeland and Hongxia Zhong for a distributed systems class at Stanford. The authors apply many of the approaches of PBFT to Raft, allowing for the Byzantine failure of nodes. My interest in this work is in how you can stop unnecessary leader elections in the Raft algorithm using more conservative election approaches, given that elections are too common in some environments (see §4.2 in ARC).

Copeland and Zhong quickly identify this as one of the key barriers to developing a Byzantine Raft and describe their lazy voters approach. As in the original Raft protocol, a node times out, becomes a candidate, increments its term and dispatches RequestVote RPCs. In addition to the normal restrictions on granting a vote, Tangaroa adds two extra conditions (sketched after the list):

  • the new term must be < current_term + H, where H is a statistically unlikely number of split votes.
  • the current leader must have failed to dispatch AppendEntries within the heartbeat timeout.
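A rough Python rendering of the vote-granting check with these two extra conditions (my own sketch; Tangaroa itself is written in Haskell, and the names and the value of H here are assumptions, not its API):

    import time
    from dataclasses import dataclass

    H = 4  # a statistically unlikely number of split votes (assumed value)

    @dataclass
    class FollowerState:
        current_term: int
        last_append_entries: float   # when we last heard from the leader
        heartbeat_timeout: float

    def may_grant_vote(state, candidate_term, log_ok, now):
        # Normal Raft restrictions still apply (simplified here): the
        # candidate's term must be newer and its log up to date.
        if candidate_term <= state.current_term or not log_ok:
            return False
        # Extra condition 1: reject implausibly large term jumps.
        if candidate_term >= state.current_term + H:
            return False
        # Extra condition 2: only vote if we have observed the leader
        # failure ourselves (no AppendEntries within the heartbeat timeout).
        if now - state.last_append_entries < state.heartbeat_timeout:
            return False   # a lazy voter records the vote here instead
        return True

    s = FollowerState(current_term=7, last_append_entries=time.monotonic(),
                      heartbeat_timeout=0.15)
    print(may_grant_vote(s, candidate_term=8, log_ok=True,
                         now=time.monotonic()))  # False: leader looks alive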

I am not convinced that the first condition cannot lead the system to become permanently unavailable. For example, a network partition could separate the nodes into two groups, so that one group has a leader in a low term while the other group constantly tries to elect a new leader, as it cannot get a majority. When network connectivity is restored, the nodes in the first group will observe a sudden, seemingly unbounded term increase. Thus those nodes will not give out their votes, and the system is stalled.

The authors highlight how the second condition could be implemented. Nodes that are in a position to grant a vote, except that the second condition isn't met, record the vote locally and then dispatch it only when they observe the leader failure for themselves. This means that a node may only replace the leader when a majority of nodes detect a failure (or the client does), instead of when a single node detects a failure. The paper doesn't go into much detail about this or how it would impact performance. I suspect that this approach is a little too conservative and may in some cases make it significantly more difficult to replace a faulty leader. Various implementation details need to be considered here: for example, if a node times out and dispatches its vote, how long until the node steps up to candidate itself? And after how long does it “forget” its local vote, the election timeout maybe?
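One possible shape for that bookkeeping, continuing the sketch above (again an assumption about one implementation, not something the paper specifies):

    # Record votes withheld only because the leader still looks alive, and
    # dispatch them once we observe the leader failure for ourselves.
    pending_votes = {}   # term -> (candidate_id, recorded_at)

    def record_lazy_vote(term, candidate_id, now):
        pending_votes[term] = (candidate_id, now)

    def on_local_leader_failure(current_term, now, election_timeout):
        for term, (cand, at) in list(pending_votes.items()):
            if now - at > election_timeout:
                del pending_votes[term]          # "forget" stale votes
            elif term > current_term:
                print('dispatching vote for', cand, 'in term', term)

    record_lazy_vote(8, 'node-2', now=0.0)
    on_local_leader_failure(current_term=7, now=0.1, election_timeout=0.3)

The open questions above are exactly the parameters of this sketch: when the holder of a pending vote becomes a candidate itself, and how long before a recorded vote expires.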

Writing a Byzantine fault tolerant Raft was always going to be hard; Tangaroa is a good first step in the right direction.

Middleboxes considered harmful: DNS Edition

This article is a brief overview of how middleboxes interact with DNS traffic. In particular, I'm interested in finding answers to the following: Will middleboxes drop or modify DNS traffic, and what is the purpose of this: stopping abuse, security, buggy implementations, advertising or censorship? Does using your own stub resolver and recursive nameserver therefore free you from the above issues? Do caching DNS recursive nameservers respect the TTL? And ultimately, how does all this affect the deployment of DNS extensions such as DNSSEC, DNSCurve, DynDNS and EDNS?

My particular interest in DNS is in how research projects for naming edge network devices (e.g. HIP, UIA, UIP, MobilityFirst, CoDoNS, FERN) will actually fare in the wild, and whether using or extending DNS is a way around such issues. The title of this article is a play on the title of the paper describing the Delegation-Oriented Architecture.

Applications & Stub Resolvers

Stub resolvers are, in essence, the clients of the Domain Name System (DNS). They sit between applications and DNS, usually run locally by the OS and interfaced with via gethostbyname. The stub resolver is responsible for forming and parsing DNS packets for the application, offering a simple API to applications for resolving domain names into IP addresses. The simplicity of this API is also its downfall; for example, gethostbyname has few error codes compared to DNS's RCODEs. Proponents of DNSSEC hope that web browsers will present DNS validation failures to users in the same way that TLS failures are presented. At the moment, however, for many stub resolvers the only possible error codes (often called h_errno) are HOST_NOT_FOUND, TRY_AGAIN, NO_RECOVERY and NO_ADDRESS. The application may not even get this much information, depending on the language API, such as Unix.gethostbyname in OCaml's standard library.
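The same coarseness is visible from any high-level language. For instance, Python's socket wrapper collapses every failure into a single exception carrying an errno-style code, with no sight of the underlying DNS RCODE:

    import socket

    try:
        print(socket.gethostbyname('www.example.com'))
    except socket.gaierror as e:    # all failure detail reduced to this
        print('lookup failed:', e.errno, e.strerror)

Whether the name was censored, the resolver timed out, or DNSSEC validation failed, the application sees much the same error.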

A common Linux default is to request AAAA records as well as A records, even if the host doesn't have an IPv6 address. Kreibich et al. found that 13% of all sessions requested AAAA records: 42% of Linux sessions requested AAAA records, compared to 10% of non-Linux sessions, backing up this theory.

Some stub resolvers and client applications cache DNS responses; interestingly, some do not respect TTLs. For example, the default cache time for nscd (enabled by default on some Linux distros) is 15 minutes regardless of TTL, whereas Internet Explorer caches all records for 30 minutes. It is important that caches respect short TTLs, as they are increasingly utilised by content distribution networks and dynamic DNS. A quick check on my own browser (go to chrome://net-internals/#dns in Chrome) shows that the browser cache contains 73 active A/AAAA records and 263 expired records.
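The correct behaviour is simple enough. A toy cache that honours TTLs (a sketch, ignoring negative caching and the other details a real resolver cache needs) looks like this:

    import time

    class DnsCache:
        def __init__(self):
            self.entries = {}                  # name -> (address, expiry)

        def put(self, name, address, ttl):
            self.entries[name] = (address, time.monotonic() + ttl)

        def get(self, name):
            hit = self.entries.get(name)
            if hit is None:
                return None
            address, expiry = hit
            if time.monotonic() >= expiry:     # TTL elapsed: evict, miss
                del self.entries[name]
                return None
            return address

    cache = DnsCache()
    cache.put('cdn.example.com', '203.0.113.7', ttl=30)  # short CDN TTL
    print(cache.get('cdn.example.com'))                  # hit within TTL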

Weaver et al. and Kreibich et al. studied how middleboxes interact with DNS traffic using the Netalyzr tool. Weaver et al. concluded that applications wishing to use non-standard resource records (RRs), including TXT resources or DNSSEC, should use their own DNS resolver and bypass the stub resolver provided by the host. It is often not possible for an application to override the stub resolver's choice of DNS resolver, which is normally a DNS resolver at the gateway, with a host of problems (see the next section). The study also concluded that host stub resolvers often lack failover mechanisms (e.g. retrying requests over TCP) for common issues such as: the gateway resolver not supporting the full DNS protocol, the gateway resolver not being trustworthy, the gateway resolver being slow, and the network gateway/middleboxes filtering UDP traffic.

In-Gateway Resolvers

The gateway resolver is a common (but not necessary) stage in DNS resolution (there may also be multiple stages of gateway resolvers). The stub resolver running on the local host will usually forward the DNS query to the resolver(s) whose address it was given in the DHCP lease when connecting to the local router. This address is normally a DNS resolver running at the gateway (at the .1 or .254 address in the local subnet, e.g. 192.168.1.x). I say “usually” as this can be overridden; for example, some people instead opt to use a public DNS server such as Google's or OpenDNS, or to run their own resolver, though this is of course rare. Furthermore, not all gateways run DNS resolvers, in which case they typically refer hosts straight to the ISP's resolvers. Gateway resolvers have the advantage that they can enable local resolution of domains such as .local, or of the domain name for router administration (e.g. www.routerlogin.net for Netgear devices).

Weaver et al. tested whether in-gateway resolvers correctly process various DNS queries, finding the following success rates: AAAA lookups (96%), TXT RRs (92%), unknown RRs (91%) and EDNS0 (91%). They also found that a significant number of in-gateway resolvers are externally usable, opening the gateway to DoS attacks.

ISPs (& Other) Resolvers

The ISP's resolver is a common (but not necessary) stage in DNS resolution (there may also be multiple stages of ISP resolvers). The ISP resolver is often the resolver responsible for beginning the actual resolution, instead of just forwarding/proxying queries.

Despite their widespread deployment and dedicated management, these resolvers are not without their problems. Weaver et al. found that 4% of sessions did not implement source port randomisation, only 55% of sessions exhibited EDNS0 usage, and 4% of sessions implemented 0x20 encoding, whilst 94% propagated capitalisation unmodified. Kreibich et al. found that 49% of sessions used DNSSEC-enabled resolvers.

Figure: DNSSEC capable resolvers, by Matthäus Wander (source: https://www.vs.uni-due.de/wander/20121229_Secure_Name_Resolution.pdf)

NXDOMAIN wildcarding is where a resolver replaces responses carrying the NXDOMAIN error (returned, for example, when a user mistypes a domain) with valid DNS responses pointing to another site, often laden with advertising. Weaver et al. observed this in 24% of the sessions surveyed. This should only be done for queries from web browsers, though this is not always the case. It may also interact badly with web browsers that treat NXDOMAIN errors specially, e.g. by suggesting likely alternatives when a query fails due to NXDOMAIN. Worryingly, Weaver et al. also observed a few cases of SERVFAIL wildcarding, IPv4 addresses in responses where only IPv6 was requested, and ignored additional answer RRs. Some resolvers redirect queries for some search engines, whilst others use malware to inject advertising. Kreibich et al. found that essentially all resolvers respected 0 and 1 second TTLs.
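Wildcarding of this kind is easy to probe for yourself: a random label under a real TLD should fail to resolve, so a successful answer suggests something in the path is rewriting NXDOMAIN. This is a crude version of what Netalyzr automates:

    import socket
    import uuid

    probe = uuid.uuid4().hex + '.com'   # almost certainly unregistered
    try:
        addr = socket.gethostbyname(probe)
        print(probe, '->', addr, ': NXDOMAIN wildcarding suspected')
    except socket.gaierror:
        print('NXDOMAIN surfaced as expected')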

Another interesting area is the treatment of RRs in the Authority and Additional RR sets. For example, glue records are A RRs in the Additional section, added to answers containing NS RRs which place the name servers under the domain they control; without these additional RRs we would have a circular dependency. Kreibich et al. found that 61% of sessions accept glue records when the glue records refer to authoritative nameservers, 25% accept A records corresponding to CNAMEs contained in the reply, and 21% of sessions accept any glue records present in the Additional field, with some only doing so for records for subdomains of the authoritative server.

Other ISP controlled middleboxes

It is clear that resolvers (stub, in-gateway and ISP/public) do not reliably handle all DNS traffic and all its extensions. Users could opt to run their own recursive and stub resolvers, but would this mean that their traffic is free from modification by middleboxes? Of course not.

ISPs have been known to hijack traffic to port 53, redirecting it to their own DNS resolvers, or simply to drop it, blocking the use of third-party DNS resolvers. Some public resolvers support alternative ports (e.g. OpenDNS supports port 5353), but this can be difficult to configure, as it cannot be easily expressed in /etc/resolv.conf. There is also some evidence of gateways provided by ISPs redirecting traffic to port 53 to the ISP's DNS resolvers.
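Bypassing such interception by hand is possible if you are willing to go below the stub resolver. For example, with the third-party dnspython package one can direct a query at OpenDNS on an alternative port (the address and port here follow the OpenDNS example above and may change):

    import dns.message
    import dns.query

    query = dns.message.make_query('www.example.com', 'A')
    reply = dns.query.udp(query, '208.67.222.222', port=5353, timeout=3)
    for rrset in reply.answer:
        print(rrset)

Of course, this only helps if the middlebox keys on the destination port rather than inspecting the traffic itself.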

TLDs and Root Servers

The root DNS server (or actually the 504 servers behind 13 addresses) is the heart of the DNS. The root has supported DNSSEC since 2010 but will not support DNSCurve. Likewise, many of the TLDs support DNSSEC and will not support DNSCurve. On the whole, these seem to be fairly well managed and free of major issues.

Link

This course, titled “Fog Networks and the Internet of Things”, may be of interest to readers.

This course teaches the fundamentals of Fog Networking, the network architecture that uses one or a collaborative multitude of end-user clients or near-user edge devices to carry out storage, communication, computation, and control in a network. It also teaches the key results in the design of the Internet of Things, including consumer and industrial applications.

Link: https://www.coursera.org/course/fog

Link

This seems like an interesting venue for work on edge network distributed systems. Nice to see that we are not the only ones who think that the edge is an area worth researching.


CFP: Fog Networking for 5G and IoT workshop

>>>>> In conjunction with SECON 2015 <<<<<

=========================================

22 June 2015, SEATTLE – USA

http://secon2015.ieee-secon.org/workshop-program

Important dates

Submission deadline(Hard): April 1st, 2015

Notification of acceptance: April 15th, 2015

Camera Ready: April 30th, 2015

Workshop: June 22nd, 2015

Scope:

Pushing computation, control and storage into the “cloud” has been a key trend in networking in the past decade. Over-dependence on the cloud, however, indicates that availability and fault tolerance issues in the cloud would directly impact millions of end-users. Indeed, the cloud is now “descending” to the network edge and often diffused among the client devices in both mobile and wireline networks. The cloud is becoming the “fog.”

Empowered by the latest chips, radios, and sensors, each client device today is powerful in computation, in storage, in sensing and in communication. Yet client devices are still limited in battery power, global view of the network, and mobility support. Most interestingly, the collection of many clients in a crowd presents a highly distributed, under-organized, and possibly dense network. Further, wireless networking is increasingly used locally, e.g. intra-building, intra-vehicle, and personal body-area networks; and data generated locally is increasingly consumed locally.

Fog Networking presents an architecture that uses one or a collaborative multitude of end-user clients or near-user edge devices to carry out storage, communication, computation, and control in a network.

It is an architecture that will support the Internet of Things, heterogeneous 5G mobile services, and home and personal area networks. Fog Networking leverages past experience in sensor networks, P2P and MANET research, and incorporates the latest advances in devices, network systems, and data science to reshape the “balance of power” in the ecosystem of computing and networking.

As the first high-quality IEEE workshop in the emergent area of Fog Networking, this workshop’s scope includes:

–       Edge data analytics and stream mining

–       Edge resource pooling

–       Edge caching and distributed data center

–       Client-side measurement and crowd-sensing

–       Client-side control and configuration

–       Security and privacy in Fog

–       Fog applications in IoT

–       Fog applications in 5G

–       Fog applications in home and personal area networking

Accepted and presented papers will be published in the IEEE FOG Networking Proceedings by the IEEE Computer Society Conference Publishing Services and IEEE Xplore Digital Library. To be published in the IEEE FOG Networking Proceedings, an author of an accepted paper is required to register for the workshop at the full (member or non-member) rate, and the paper must be presented by an author of that paper at the conference, unless the TPC Chair grants permission for a substitute presenter arranged in advance of the event who is qualified both to present and answer questions. Non-refundable registration fees must be paid prior to uploading the final IEEE-formatted, publication-ready version of the paper. For authors with multiple accepted papers, one full registration is valid for up to 3 papers.

Workshop Co-Chairs:

Mung Chiang

Arthur LeGrand Doty Professor of Electrical Engineering

Director of Keller Center for Innovation in Engineering Education

Princeton University

Sangtae Ha

Assistant Professor, Computer Science Department

University of Colorado at Boulder

Junshan Zhang

Professor, Electrical and Computer Engineering Department

Arizona State University

Workshop Technical Program Committee:

Bharath Balasubramanian (AT&T Labs)

Suman Banerjee (University of Wisconsin)

John Brassil (HP Labs)

Gary Chan (Hong Kong University of Science and Technology)

Tian Lan (George Washington University)

Athina Markopoulou (UC Irvine)

Rajesh Panta (AT&T Labs)

Chunming Qiao (University of Buffalo)

Moo-ryong Ra (AT&T Labs)

Tao Zhang (Cisco)

Can You Engineer Privacy?

“Can You Engineer Privacy?”, featured in the August 2014 CACM, has one of the best opening paragraphs I have seen. Following this strong start, the article articulately introduces some of the challenges and areas of active research in privacy engineering. The article does an excellent job of presenting a cross-discipline overview, though the lack of references (the typical style of CACM articles) can leave you guessing which specific works the article is referring to.

The article introduces data minimization, a concept that ignores the fact that many companies' business models rely on collecting, using (e.g. for targeted ads) and selling data in order to provide online services that are free at the point of use, such as Facebook and Google, which clearly people want.

Personal data is an asset that each individual owns. Many people want to exchange their personal data for services; our job as a community is to enable them and provide viable alternatives, instead of blocking them.

“Can You Engineer Privacy?” is worth reading if you're new to privacy research, and it is refreshingly articulate; it's available over at the CACM.