Paper Notes: Ethical Privacy Guidelines for Mobile Connectivity Measurements [Report Nov’13]

Ethical Privacy Guidelines for Mobile Connectivity Measurements is the first item on the C3S reading list. Below are my brief notes on this November 2013 report by the Oxford Internet Institute.

The stated purpose of this report is to inform networking researchers about the best practices for preserving data subject privacy when performing active measurement of mobile networks.

Researchers must make a compromise between the privacy of data subjects and the dissemination of research artefacts for reproducibility. To aid in reasoning about this compromise, the report presents a risk assessment format covering: the contributions of the research, risk of de-identification, impact of re-identification, unforeseen risk (such as data theft), methods for disseminating artefacts, informed consent and transparency. The report goes on to discuss a few legal implications, in particular the ongoing debate on whether IP addresses and communication metadata are personally identifiable information.

The authors focus on a guiding principle: collect the minimal data needed to conduct the stated research. Data should not be used for a secondary purpose unless this was explained at the consent stage. This includes open dissemination of the data collected. The report suggests some methods of fuzzing the data including: perturbation, truncation, permutation, quantisation, pseudonymization, k-anonymity and differential privacy.
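To make a few of these fuzzing methods concrete, here is a minimal Python sketch of truncation, quantisation, pseudonymization and perturbation. It is my own illustration rather than anything taken from the report: the function names, the /24 prefix, the bucket sizes and the choice of HMAC-SHA256 and Laplace noise are all assumptions, and none of this is a vetted anonymisation scheme.

```python
import hashlib
import hmac
import ipaddress
import random


def truncate_ip(addr: str, prefix_len: int = 24) -> str:
    """Truncation: keep only the network prefix of an IPv4 address."""
    # strict=False lets us pass a host address and have the host bits zeroed.
    net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return str(net.network_address)


def quantise(value: float, step: float) -> float:
    """Quantisation: map a value to the start of its bucket of width `step`."""
    return (value // step) * step


def pseudonymise(identifier: str, key: bytes) -> str:
    """Pseudonymization: replace an identifier with a keyed hash.

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the key must be kept secret (assumption: the
    researcher holds it and never publishes it).
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def perturb(value: float, scale: float, rng: random.Random) -> float:
    """Perturbation: add zero-mean Laplace noise with the given scale.

    The difference of two independent exponential samples with mean `scale`
    follows a Laplace(0, scale) distribution.
    """
    return value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    print(truncate_ip("192.0.2.157"))
    print(quantise(125.0, 60.0))
    print(pseudonymise("device-1234", b"hypothetical-secret-key"))
    print(perturb(42.0, 1.0, rng))
```

Each helper discards or blurs a different kind of detail, so in practice the choice depends on which fields the research question actually needs, which is the data minimisation principle again.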

Overall, I would recommend the report to someone new to the domain of data privacy, as it's a nice introduction to the topic. The authors raise awareness of the necessary compromise between reproducible research and data privacy, though they do not provide concrete advice to researchers on how to make the best compromise (other than telling them to be conservative). The report claims to focus on active mobile measurements, but in practice its contribution is much more general than this. I would love to see this report extended with real-world examples of measurement studies that have been conducted, the compromise between reproducible research and data privacy that was chosen, and how it was chosen.

Paper Notes: The Network is Reliable [ACMQ July’14]

The Network is Reliable is an excellent article which attempts to formalise the discussion on real-world failures in distributed systems. There is currently great debate on whether the assumption that network partitions are rare is too strong or too weak for modern networks. Much of the data which we could use to answer this question is not published; instead it takes the form of anecdotes shared between sysadmins over a beer. This article lets the reader sit at the table and share the stories.

It is worth noting that network partition is used here in the broadest possible sense. We are looking at end-to-end communications and anything which can hinder these. This goes beyond simple link and switch failures to distributed GC, bugs in the NIC, and slow IO.

Here are five examples of the anecdotes covered:

  • split brain caused by non-transitive reachability on EC2
  • redundancy doesn’t always prevent link failure
  • asymmetric reachability due to bugs in the NIC
  • GC and blocking for I/O can cause huge runtime latencies
  • short transient failures become long-term problems with existing algorithms
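
The first anecdote, split brain from non-transitive reachability, can be illustrated with a toy model. This Python sketch is my own simplification and not the election protocol of any system described in the article: the three node names, the link list and the "pick the lowest-named node you can reach" rule are all assumptions chosen to keep the example small.

```python
def reachable(node, links):
    """Return the nodes that `node` can talk to directly, plus itself."""
    peers = {node}
    for a, b in links:
        if a == node:
            peers.add(b)
        elif b == node:
            peers.add(a)
    return peers


def elect(nodes, links):
    """Naive rule: each node picks the lowest-named node it can reach."""
    return {n: min(reachable(n, links)) for n in nodes}


# Hypothetical topology: a-b and b-c are up, but a-c is down,
# so reachability is not transitive (a reaches b, b reaches c, a cannot reach c).
nodes = ["a", "b", "c"]
links = [("a", "b"), ("b", "c")]

views = elect(nodes, links)
leaders = set(views.values())

if __name__ == "__main__":
    print(views)    # each node's view of who the leader is
    print(leaders)  # more than one distinct leader means split brain
```

Here a and b both believe a is the leader, while c cannot see a and so follows b, leaving two leaders at once even though no node is fully isolated.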

So, what does this mean for Unanimous and my research on reliable consensus for real-world networks?

My original inspiration for working on consensus algorithms was the need for reliable consensus at the internet edge. With consensus comes fault tolerance, and from this we can construct new systems for preserving privacy and handling personal data, offering users a viable alternative to third-party centralised services.

However, it has become apparent that the issue of reliable consensus is much more general. This article illustrates that even within the constrained setting of a datacenter, distributed systems are failing to tolerate the wide range of possible failures.