Good morning everyone!
I am here in Estonia with Zafar, Steffen and Jon at the First Cyber Security Summer School. It’s 8am and we have a packed schedule ahead of us today.
Our first session is led by Steven M. Bellovin (Columbia University).
Our second session is led by Jaan Priisalu (Senior Fellow at CCDCOE), with the help of Andres Kütt, Heiko Vainsalu, Mark Erlich (Estonian Information System Authority) and Kristjan Vassil (University of Tartu).
The panel discussion is led by Lauri Almann (BHC Laboratory) and includes Parag Pruthi, Jon Crowcroft, Konstantina Papagiannaki, Jaan Priisalu and Andres Kütt.
Hello from a lovely little spa hotel in a forest in Estonia.
After an excellent dinner and many coffees, our first session started at 9pm (yes! they are working us hard already). We received a warm welcome from Olaf Meaneel (TUT) to the first ever cyber security summer school (C3S2015). Dr Parag Pruthi kicked off proceedings with his talk titled “Advancing big data analytics – cyber defence solutions”.
Parag asked “When was the first cyber war?” The answer: in 1982, during the Cold War, the CIA attacked the flow control software of a Soviet Siberian gas pipeline. Our networks today are even more fragile. He gave the example of Iran hijacking a US drone and showed some excellent clips from The IT Crowd. Breaching systems is fast, discovery is slow and recovery is very slow. We always blame ‘Dave’; we aren’t good at protecting against human error. Intrusion detection systems are not reliable: a 1% false positive rate gives a trust level of 0.19%.
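To see why even a small false positive rate undermines trust in alerts, here is a minimal Python sketch of the base-rate calculation using Bayes' rule. The detection rate and the fraction of malicious events are my own illustrative assumptions, not figures from Parag's talk, so the result is not meant to reproduce his number.

```python
def alert_precision(base_rate: float,
                    detection_rate: float,
                    false_positive_rate: float) -> float:
    """Probability that an alert corresponds to a real intrusion (Bayes' rule)."""
    # Fraction of all events that are intrusions AND raise an alert.
    true_alerts = detection_rate * base_rate
    # Fraction of all events that are benign AND still raise an alert.
    false_alerts = false_positive_rate * (1.0 - base_rate)
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical inputs: 1 in 10,000 events is malicious, the detector
# catches 99% of intrusions and misfires on 1% of benign events.
precision = alert_precision(base_rate=1e-4,
                            detection_rate=0.99,
                            false_positive_rate=0.01)
print(f"{precision:.2%} of alerts are real intrusions")
```

Under these assumed inputs, fewer than one alert in a hundred is a real intrusion, even though the detector itself looks accurate. The rarer the attacks, the worse this gets.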
We researchers are disconnected from the real world: we make simplifying assumptions, design a solution and then test it in simulation against those same assumptions. Parag argued instead for engineering from real-world networks. He detailed the challenges of collecting petabytes of data: storage, compression, retrieval, processing and evaluation.
Parag's key message was that big data provides us with near perfect information for intrusion detection.
Q: Is your approach limited in time, since we must collect and analyse data before we can react?
A: Correct, we still have real people watching data visualisations, like a security guard watching CCTV, but they are not an order of magnitude faster than they were before.
Ethical Privacy Guidelines for Mobile Connectivity Measurements is the first item on the C3S reading list. Below are my brief notes on this November 2013 report by the Oxford Internet Institute.
The stated purpose of this report is to inform networking researchers about the best practices for preserving data subject privacy when performing active measurement of mobile networks.
Researchers must make a compromise between the privacy of data subjects and the dissemination of research artefacts for reproducibility. To aid in reasoning about this compromise, the report presents a risk assessment format covering: the contributions of the research, risk of de-identification, impact of re-identification, unforeseen risk (such as data theft), methods for disseminating artefacts, informed consent and transparency. The report goes on to discuss a few legal implications, in particular the ongoing debate on whether IP addresses and communication metadata are personally identifiable information.
The authors focus on a guiding principle: collect the minimal data possible to conduct the stated research. Data should not be used for a secondary purpose unless this was explained at the consent stage. This includes open dissemination of the data collected. The report suggests some methods of fuzzing the data, including: perturbation, truncation, permutation, quantisation, pseudonymization, k-anonymity and differential privacy.
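Three of these methods (truncation, quantisation and pseudonymization) are easy to sketch with the Python standard library. This is only an illustration under my own assumptions: the function names, the /24 prefix, the one-hour bucket and the key are hypothetical choices rather than recommendations from the report, and keyed hashing is just one common way to pseudonymise an identifier.

```python
import hashlib
import hmac
import ipaddress


def truncate_ipv4(addr: str, prefix_len: int = 24) -> str:
    """Truncation: zero the host bits and keep only the network prefix."""
    network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return str(network.network_address)


def quantise_timestamp(ts: int, bucket_seconds: int = 3600) -> int:
    """Quantisation: round a Unix timestamp down to the start of its bucket."""
    return ts - (ts % bucket_seconds)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Pseudonymization: replace an identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the mapping cannot be recomputed without the key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical measurement record, invented for this example.
secret_key = b"example-key-keep-this-private"
record = {"ip": "192.0.2.77", "time": 7325, "device": "device-1234"}
released = {
    "ip": truncate_ipv4(record["ip"]),
    "time": quantise_timestamp(record["time"]),
    "device": pseudonymise(record["device"], secret_key),
}
print(released)
```

None of these steps guarantees anonymity on its own. Coarse prefixes and time buckets can still be combined with other fields to single someone out, which is where k-anonymity and differential privacy come in.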
Overall, I would recommend the report to someone new to the domain of data privacy, as it's a nice introduction to the topic. The authors raise awareness of the necessary compromise between reproducible research and data privacy, though they do not provide concrete advice to researchers on how to make the best compromise (other than telling them to be conservative). The report claims to focus on active mobile measurements, but in practice its contribution is much more general than this. I would love to see this report extended with real-world examples of measurement studies that have been conducted, covering the compromise between reproducible research and data privacy that was chosen and how it was chosen.