The problem with consensus

A distributed system is a collection of nodes, each with its own local memory, which communicate via message passing and cooperate to perform a computation. The CAP theorem [1,2] argues that it is not possible to achieve consistency, availability and partition tolerance all at once. But machines will fail, and their failures are tolerated (to varying extents) by replication. These replicas now need to agree on a consistent worldview, leading to the problem of consensus, originally proposed by [6]. Consensus is when these nodes agree on a value; applications for this include:

  • mutual exclusion locks
  • committing a transaction to a database
  • distributed storage such as NFS
  • implementing reliable broadcast
  • leader election

A protocol for consensus must provide the following to be “correct”:

  • Agreement: all correct nodes arrive at the same value (the safety property)
  • Validity: the value chosen is one that was proposed by a correct node (the non-triviality property)
  • Termination: all correct nodes eventually decide on a value (the liveness property)
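
The three properties can be checked mechanically against the record of a single finished run. The following is a minimal sketch, not a protocol: the function name `check_consensus` and the dictionary layout are assumptions made for illustration, and "termination" here only means "had decided by the end of the recorded run".

```python
def check_consensus(proposals, decisions, correct):
    """Check one recorded run against agreement, validity and termination.

    proposals: dict mapping node -> value it proposed
    decisions: dict mapping node -> value it decided (absent = never decided)
    correct:   set of nodes that did not fail during the run
    """
    decided = [decisions[n] for n in correct if n in decisions]
    # Agreement: correct nodes never decide on two different values.
    agreement = len(set(decided)) <= 1
    # Validity: every decided value was proposed by some correct node.
    proposed_by_correct = {proposals[n] for n in correct}
    validity = all(v in proposed_by_correct for v in decided)
    # Termination: every correct node has decided.
    termination = all(n in decisions for n in correct)
    return {"agreement": agreement, "validity": validity, "termination": termination}


# Hypothetical run: node "c" is correct but never decides.
run = check_consensus(
    proposals={"a": 1, "b": 2, "c": 2},
    decisions={"a": 2, "b": 2},
    correct={"a", "b", "c"},
)
print(run)  # {'agreement': True, 'validity': True, 'termination': False}
```

A run like this one is safe (agreement and validity hold) but not live, which is exactly the trade-off discussed later in this post.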

A correct node is a node that will eventually make progress, so it has not yet experienced, and will not experience, any of the failures listed below.

QUESTION: Some papers [7] list a 4th condition called unanimity, stating that if all nodes propose the same value, this will be the value chosen. It appears to me that this is just a case of validity: the value chosen must have been proposed by a node, so if all nodes propose the same value then there is only one possible value to be chosen, according to the validity condition. But this is used in a few papers, so I can’t be right. Why?

The following failures are possible:

  • Fail-stop – nodes may stop but will not restart
  • Fail-recover – nodes may stop and restart
  • Byzantine – nodes behave unexpectedly, through fault or malice

Assuming synchronous communication (reliable unicast with known bounds on message delay and on node execution time) and fewer than a third of nodes experiencing Byzantine failures, we can achieve consensus [3]. But a completely asynchronous protocol cannot guarantee consensus with even a single fail-stop node [4]. The intuition behind this is that you can’t detect a fail-stopped node, as it may just be slow or its messages may not yet have been delivered. This can be masked by trying to detect failures, either by waiting on unresponsive nodes, which compromises liveness, or by using timeouts/heartbeats, which compromises accuracy. All consensus algorithms provide consistency, but for this they compromise partition tolerance and/or availability.
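
The Byzantine bound in [3] for unsigned messages is usually stated as n > 3f: agreement among n nodes is possible only if more than two thirds of them are correct. The arithmetic can be sketched as below; the function names are mine, chosen for illustration.

```python
def max_byzantine_faults(n):
    """Largest f satisfying n >= 3f + 1, i.e. n > 3f."""
    return (n - 1) // 3


def can_tolerate(n, f):
    """True if n nodes can reach agreement with f Byzantine nodes (unsigned messages)."""
    return n > 3 * f


for n in (3, 4, 7, 10):
    print(n, max_byzantine_faults(n))
# 3 0
# 4 1
# 7 2
# 10 3
```

So three nodes cannot mask even one Byzantine node, and each additional fault tolerated costs three more nodes.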

Henceforth, I would like to consider processes communicating over TCP/IP, making the following assumptions:

  • Nodes have local state, no shared memory and local clocks (though they may be shared when processes are running on the same host)
  • If a path is available, message passing is reliable unicast with unbounded delay (thanks to TCP’s reliable delivery, data integrity and in-order delivery) but network partitions are possible
  • Failure detection methods don’t need to be accurate, thus it’s OK to use timeouts and to assume a node is dead when it is in fact live
  • A node can tell which node sent a message (e.g. MACs/IPs) so a Byzantine node cannot forge a message which appears to come from an honest host
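
The third assumption, an inaccurate timeout-based failure detector, can be sketched as follows. This is an illustration only: the class name, the heartbeat interface and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions, not part of any particular protocol.

```python
class TimeoutFailureDetector:
    """Suspects a node once no heartbeat has been seen for `timeout` seconds."""

    def __init__(self, nodes, timeout, now=0.0):
        self.timeout = timeout
        self.last_seen = {node: now for node in nodes}

    def heartbeat(self, node, now):
        # Record the arrival time of a heartbeat from `node`.
        self.last_seen[node] = now

    def suspected(self, now):
        # A node is suspected purely on silence; it may still be alive.
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}


fd = TimeoutFailureDetector(["a", "b", "c"], timeout=2.0)
fd.heartbeat("a", now=1.0)
fd.heartbeat("b", now=1.0)
# "c" is alive, but its heartbeat is delayed in the network.
first = fd.suspected(now=2.5)
print(sorted(first))   # ['c']

# The delayed heartbeat finally arrives, so the suspicion is withdrawn.
fd.heartbeat("a", now=2.8)
fd.heartbeat("b", now=2.8)
fd.heartbeat("c", now=3.0)
second = fd.suspected(now=3.5)
print(sorted(second))  # []
```

The first result is a false positive: "c" was live the whole time. A protocol built on such a detector has to stay safe even when it acts on a wrong suspicion.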

Approaches:

Two-phase commit depends on a single node (the coordinator), whereas three-phase commit [5] can tolerate half its nodes failing, but not network partitions or unbounded network delays.
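
To make the dependence on a single node concrete, here is a minimal in-memory sketch of two-phase commit. The `Participant` class and its `prepare`/`finish` methods are hypothetical names for illustration; there is no networking, logging or recovery, all of which a real implementation needs.

```python
class Participant:
    """A node that votes on a transaction and later learns the outcome."""

    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.outcome = None

    def prepare(self, txn):
        # Phase 1: vote on whether this node is able to commit.
        return self.vote_yes

    def finish(self, txn, decision):
        # Phase 2: apply the coordinator's decision.
        self.outcome = decision


def two_phase_commit(participants, txn):
    """Run by the coordinator: commit only if every participant votes yes."""
    votes = [p.prepare(txn) for p in participants]
    decision = "commit" if all(votes) else "abort"
    # If the coordinator stopped here, participants that voted yes would
    # be left blocked, unable to learn the decision on their own.
    for p in participants:
        p.finish(txn, decision)
    return decision


happy = [Participant("a"), Participant("b"), Participant("c")]
unhappy = [Participant("a"), Participant("b", vote_yes=False)]
r1 = two_phase_commit(happy, "txn-1")
r2 = two_phase_commit(unhappy, "txn-2")
print(r1, r2)  # commit abort
```

Everything runs inside one function call on the coordinator, which is why its failure between the two phases stalls the whole protocol.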

My next post will consider some more complex approaches meeting the above requirements.

 

[1] Brewer, E. Towards Robust Distributed Systems. Symposium on Principles of Distributed Computing (PODC) (2000).

[2] Gilbert, Seth, and Nancy Lynch. “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services.” ACM SIGACT News 33, no. 2 (2002): 51-59.

[3] Lamport, Leslie, Robert Shostak, and Marshall Pease. “The Byzantine generals problem.” ACM Transactions on Programming Languages and Systems (TOPLAS) 4, no. 3 (1982): 382-401.

[4] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (April 1985), 374-382.

[5] Skeen, Dale; Stonebraker, M. (May 1983). “A Formal Model of Crash Recovery in a Distributed System”. IEEE Transactions on Software Engineering: 219–228

[6] Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2) (April 1980) 228–234

Poster Feedback Wanted

This is a draft of the A1-sized poster I will be presenting at LCDNets in a few weeks, and I’d like some feedback. I’ll be printing tomorrow at 9am.

Version 2 (Latest Version)

poster

Version 1 (Original Version)

LCDNets Draft Poster

OCaml Monthly Meeting – Live Blog

Today’s OCaml Labs Monthly Meeting is all about practice talks for OCaml2013 so, in that spirit, I’ll be practising a bit of live-blogging too.

13:53 – Today’s SRG Meeting is over and it’s time for some work before the OCaml Labs meeting at 4:00, see you then …

16:02 Technical difficulties delayed the start

16:02 Intro from Anil

Introducing Gabriel Scherer, who is visiting us this week; we are going to the Maypole after this meeting. We had a cash prize from ASPLOS after winning the HiPEAC paper award and the money will go towards SRG wine for the Xmas party. The Signpost paper was accepted to FOCI and a HotNets paper on Trevi was also just accepted.

OCL Website – Too much manual management at the moment, so we are moving to an OCaml planet feed of blog posts. David has been busy hacking on opam2web. OPAM has 512 packages; opam2web takes a subset of the OPAM packages and makes the metadata into a minisite, like the one on OPAM. It doesn’t require manual updates, like an Atom feed.

Upcoming events – Tomorrow is the 2nd compiler hacking event, at the Makespace. Anil will be talking at QCon on Mirage. The Mirage 1.0 release date is October 22nd, so maybe a workshop before. We have 3 talks for OCaml2013 (Platform, OCamlOT and Ctypes) so here we go …

16:09 Anil practice talk on OCaml Platform 1.0

Languages take many different approaches to platform, but what does platform even mean? As a late mover in this field, we can learn from other languages. A platform is NOT a group of temporarily motivated hackers building a replacement standard library. It’s hard to adopt a particular approach without a domain-specific purpose: there are too many opinions, we need an objective way to determine what belongs in the platform, and we need a genuine community that is sustainable (even if a large party leaves). A platform is a bundle of tools that interoperate, with quantitative metrics to judge success, built-in agility and support for developers throughout the whole development life cycle. Industrial partners have a range of needs, as each works in a different domain.

Tooling – Overview of 5 areas: OPAM from OCamlPro, IDE Tools, OPAM-DOC, the OCaml compiler itself and ocaml.org.

OPAM – 1.1 released today (maybe), over 100 contributors to OPAM,  500+ packages, 1500+ unique versions, external dependency solver using CUDF

IDE Support – OCaml has many intermediate files. From OCaml 4.0 onwards, we have a binary format of an abstract syntax tree with type annotations called cmt (and cmti for interface files), so we can now create external tools that query it, like opam-doc. ocp-index and ocp-indent from OCamlPro, and Merlin (I think this is EPIC), are also now available.

opam-doc – Now that we have cmt files, we need unified documentation across packages. This is much harder than it sounds, as it touches every part of the tool stack. Not all packages can be installed at once due to conflicts. Module inclusion is tough to code in static HTML. (Need to make a demo.) bindoc takes the typed AST (in cmt) and generates cmd files, which include the ocamldoc comments; opam-doc takes the cmt database for opam and outputs a single website with your universe of packages.

ocaml.org – Demo of ocaml.org at ocaml-redesign.github.io/pkg/index.html, feedback is welcome says Amir

Now we have the tools, what metrics can we extract to see how well our tools are doing?

Portability – Windows compatibility?

Maintainer – is there a place for docs, will people respond to issues/comments/emails, and where can issues be submitted?

Tests – code coverage, multi variant benchmarking in core-bench

Stability – OPAM supports pinning; how stable are the interfaces of libraries?

opam tracks compiler constraints and statically analyses the build system from logs (OCamlOT)

Agility – Building a platform is EXHAUSTING. We want to ask “WHAT IF” questions: what if let was monomorphic? what if we removed camlp4? what if the syntax precedence changes?

Distributed workflow – built on git, distributing tasks between 3 actors: authors (library writers), OCamlOT workers and maintainers. As we become more stable we move from staging to stable to inclusion in the platform.

We are building a tussle: we want to launch a game in January and let people put standard libraries into the ring, running OCamlOT to discover the winner.

No clear winner: Lwt – portability, Batteries – free of syntax extensions, core – comprehensive.

16:36  Discussion over the battle of the standard libraries and talk feedback

C: talk is a bit long, not sure what to cut..

C: OPAM was discussed last year at OCaml2013, we want to update everyone and follow on without overlapping too much

Q: Haven’t we already decided on JS’s core ?

A: No, we use all of them, e.g. Mirage uses lwt extensively

Q: What if we don’t want any of the new standard libraries ? maybe I just want to use domain specific libraries from OPAM as and when I need them

A: We are not forcing the new standard libraries on anyone, but they are useful for beginners, and it is nice to have consistent style, interoperability and fewer open statements, e.g. open Core.Std

Q: What if I have already decided which standard library I want to use ?

A: Again we are not forcing standard libraries on anyone, we are just trying to focus effort more directly. OCaml tools will always be standard library agnostic

C: the diagram of OCamlOT is confusing

C: how to not overlap with David’s talk

16:41 David’s talk on OCamlOT

State for the open source OCaml community

Outline: what is quality software? what is the user experience? what is feedback loop for package authors? How do we represent the thing underneath this all? utopian future ?

Quality: Work on every core (ANIL: We want multi-core :P ), consistent results: work or die nicely with an obvious solution, no more “What have I forgotten?” questions, it should just tell you. We need addictive actions (not sure what they are), consistency, quality functions…

Universal concerns: compiler hypotheses, i.e. “what if” questions (Anil already said this), build system hypotheses (“what strange assumptions is the build system making?”), package manager hypotheses and environment hypotheses

Workflow: Make a pull request, curator observes the proposal, predicts the future, proposes amendments, feedback loop and finally agreement is reached. Core is released weekly, for example; we are trying to work like Linux kernel patches

New workflow: promote the health of the OCaml community, preaching compatibility, “observe, orient, decide and act”. A computer-assisted curator will help a human to run the loop faster; the human can pose questions to the computer-assisted curator, e.g. “will this run on ARM?”

Repository Observation: github binding with web hooks, but we are not tied to github. We merge into the world and we need dependencies from each possible user’s perspective of the world

Dependency Orientation: capabilities with environmental dependencies, packages with constraint-based dependencies, repositories with revision dependencies and artifact dependencies. Example of the android repo

Triage Decisions: taking plain text errors and parsing them into categories such as unsatisfiability (can’t have these two packages), dependencies (if my dependency is broken, then I am broken), transient (network down), system, metadata, external dependencies (you forgot to write a dependency), build errors and a combo of many of the above.

State Action: commit intention, build, error analysis and build results

Internet res: The agents negotiate over a REST API on HTTPS, independent metadata layers (not sure about this), everything’s an s-exp, branch consistent store explained, like git or Irminsule

Current state: github web hooks, we are conservative so one byte changes and we rebuild everything, basic triage heuristics completed, no amendments are proposed by the system atm, we don’t commit the outcome but the evidence, simple reactions to results, a website with green and red boxes in a large table

History: we have found lots of metadata issues, many package bugs, some tool bugs like a non-relocatable compiler and ocamlbuild PATH ignorer. We currently have 30+ x86-64, 30+ x86-32, 8 ARMs, many Linux distros, a dead Raspberry Pi, panicking *nix filesystems and lots of people have set warnings as errors

Future: opamfu for DAG analysis, schema migration overhead, lower overhead for administrating exotic workers contributed to OCamlOT, we need to authenticate machines using ocaml-sodium, we need more advanced automation, proposed amendments, let’s have a dialogue, better website integration, benchmarking your upgrades (how much improves cost), run experiments on the whole OPAM universe with differential analysis and a VM-based test system, to specify the worker finely.

What I think quality is, vision of the future, how it’s represented underneath and what’s next.

Discussions

C: that was 20 mins, feedback to David regarding content to be cut

17:23 Ctypes by Jeremy 

This is an update, not a practice talk

An example of calling puts from C: how we can write no C and link in OCaml.

NEW things in Ctypes:

prettyprinting – for C types and C values, making it much easier to examine values for debugging

bigarray – support for lumps of C memory

More types – nullable strings, complex numbers

String conversions – much faster

Memory management issues – ctypes now gives the programmer more control over the lifetime of OCaml values passed to C

finaliser – which you can attach to memory

Future

stub generation – instead of dynamically binding, it will generate stub code to act to the API

capability-style memory safety – one rogue pointer in a C library can cause hell; loading each C library in a separate address space means a library can only kill itself, and you can then even run a C library on a foreign host or on a virtual machine

static struct/union layout – checking the layout of structures and unions against the API

17:40 Amir demo of ocaml-redesign.github.io/docs/opam (it looks great :))

ocaml

OCaml Development in Vim

This is a quick run-through of how I set up my development environment in vim:

Install pathogen.vim

mkdir -p ~/.vim/autoload ~/.vim/bundle; \
curl -Sso ~/.vim/autoload/pathogen.vim \
    https://raw.github.com/tpope/vim-pathogen/master/autoload/pathogen.vim

Add the following to ~/.vimrc:

execute pathogen#infect()
syntax on
filetype plugin indent on

Install Syntastic

cd ~/.vim/bundle
git clone https://github.com/scrooloose/syntastic.git

Then restart vim and use :Helptags to check that the installs so far have worked.

Install Merlin

opam switch 4.01.0dev+trunk
opam update
opam upgrade
opam install merlin

Add the following to ~/.vimrc

:set rtp+=~/.opam/4.01.0dev+trunk/share/ocamlmerlin/vim
:set rtp+=~/.opam/4.01.0dev+trunk/share/ocamlmerlin/vimbufsync
let g:syntastic_ocaml_checkers=['merlin']

:SyntasticInfo will return a list of the syntax checkers available to Syntastic; check that this now includes merlin.

Install OCP Indent

opam install ocp-indent

Add the following to ~/.vimrc

autocmd FileType ocaml source /home/heidi-ann/.opam/4.01.0dev+trunk/share/typerex/ocp-indent/ocp-indent.vim

How do I download many PDFs from a webpage at once?

A combo of curl and its URL globbing, e.g.

Ross Anderson’s “Security Engineering” is available online under CC here, but you must download each chapter as a separate PDF. This can be fixed using:

$ curl -O "http://www.cl.cam.ac.uk/~rja14/Papers/SEv2-{toc,pref,acks,biblio,index}.pdf" \
       -O "http://www.cl.cam.ac.uk/~rja14/Papers/SEv2-c[01-27].pdf"
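
To preview which files such a pattern is meant to cover before downloading anything, the expansion can be sketched in a few lines of Python. The part names (toc, pref, acks, c01 to c27, biblio, index) are taken from the pattern in this post; I have not checked that every one of them exists on the server, and `chapter_urls` is just an illustrative name.

```python
BASE = "http://www.cl.cam.ac.uk/~rja14/Papers/SEv2-"


def chapter_urls(first=1, last=27):
    """List the PDF URLs: front matter, zero-padded chapters, back matter."""
    parts = ["toc", "pref", "acks"]
    parts += ["c%02d" % n for n in range(first, last + 1)]
    parts += ["biblio", "index"]
    return [BASE + part + ".pdf" for part in parts]


urls = chapter_urls()
print(len(urls))  # 32
print(urls[3])    # http://www.cl.cam.ac.uk/~rja14/Papers/SEv2-c01.pdf
```

The list can then be handed to any downloader, one URL per line.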

ICFP, CUFP & OCaml2013

I’m busy planning my first trip across the Atlantic to attend ICFP, CUFP and OCaml 2013. Today, I’ve been given the duty of “live blogging” the event, over at the syslog, the Cambridge Systems Research Group blog.

My other job for the event is to improve the documentation for Jane Street’s Async library. If anyone else is keen, I would love to organise a doc-a-thon to populate the .mli files.
