Monthly Archives: July 2012

UROP Goals

RFC 1034 – An insight into the 1980’s

Historical Context

Originally per host file in /etc/hosts for hostname to IP address mapping was updated via FTP.
The bandwidth required to update this per host file was proportional to number of host squared, hence exponential grow of hosts mean new system was required
The first operational packet switching network was the Advanced Research Projects Agency Network (ARPANET)
When changes to the per host file were make, they took along time to propagate to all hosts

Common Ideas from Proposed Solutions

Hierarchical name space with the hierarchy corresponding to organisations structures
Using a full stop “.” to mark the boundary between hierachy levels

DNS Design Goals

Consistent name space where names do not include network identifiers or addresses
Distributed management with local caching
Minimising attempts to collect consistent copies of entire database as this is not scalable
Source (not client) control tradeoff between cost, speed and accuracy of caches
All data associated with a name is tagged with a type
Name server translations should be independent of communication system that carriers them
The system should be useful across a wide spectrum of hosts

Usage Assumptions

Initially total database size is proportional to number of hosts but this will grow in time
System wide data will change slowly, subnet wide data will change rapidly
Each organisation is responsible for its own domains including providing redundant name servers
Clients use trusted name server before name server outside the “trusted set”
Availability is more important than consistency so its ok that updates are not simultaneous
Believe old data when new data is not yet avaliable
Copies of translations are distributed with time-outs for refreshing, the distributor sets time-out value
When a name server is presented with a query that can only be answered by some other server, either a recursive (name server asks another server to do query) or iterative (name server returns address of another name server to ask instead). The iterative approach is preferred but the recursive approach is still an option
All data originates from master files keep on hosts that use the DNS
All master files are updated by local sysadmin and all master files are text files that are read by a local name server
User programs access name servers through standard programs called resolvers
The standard format of master files allows them to be exchanged between hosts
Sysadmin provides zone boundaries, master files of data, updates of master fules and refreshing policies
DNS provides standard formats for resource data, standard method of querying the database, standard methods for name server to refresh local data from foreign name servers

DNS & Resource Records (RR)

The names in the name space will have a tree structure and RR hold the data associated with each name
Each node and leaf on the DNS tree has a set of information associated with it and a DNS query aims to extract this information
A DNS query includes the name of interest and the type of information that is required

Name Servers

These are server programs which hold information about the DNS tree structure and information set
A name server may cache information on DNS tree structure or information associated with particular names
In general, a name server has complete information from a subset of the DNS tree leaves and then pointers to other name server that can be used to get information on the other areas of the DNS tree
A name server has authority for the area of the DNS tree for which it has complete information
Authotitative information is organized into units called zones

Resolvers

Resolvers are programs that extract information from the name servers in response to client queries
All resolvers must have access to at least one name server
Typically a resolver is a system routine that is accessible to user programs

DNS from a client prospective

DNS is accessed via a procedure call to the local resolver
The DNS consists of a single tree and the user can request information from any section of the tree

DNS from the resolvers prospective

DNS is composed of an unknown number of name servers
Each name server hold at least one piece of information from the DNS tree
The DNS tree is static

DNS from the name servers prospective

DNS is divided into zones
The name server has local copies of some zones
The name server must periodically refresh its zones from master copies in local files or foreign name server

Name Specification

The term “node” is used to refer to both interior nodes and leaves
Each node has a resource set, though this can be empty
Each node has a label, 0 to 63 octets (an octet is 8 bits) in length
Brother nodes may not have the same label
The same label can be used by nodes that are not brothers
The root label is null
The domain name of a node (not the same as the label for a node) is the list of labels on the path from the node to the tree root
The label that compose the domain name are read from (most specific) left to (most general) right, e.g. consider the structure of the DNS name for college computers as342.priv.queens.cam.ac.uk
Domain names are case-insenstive
In domain names the labels are separated by dots (.)
A domain name ending in a dot represents a absolute/complete domain name
A domain name not ending in a dot represents an incomplete domain name that should be completed by local software (aka relative domain name)
The maximum domain name length is 255 octets (set of 8 bits), this includes labels and label lengths
A domain is identified by a domain name and consists of the tree at and below that domain name
A domain is a sub domain of another domain if it is contained within that domain
You can test if a domain is a sub-domain of another, if the domain name is the within the end of the sub-domain e.g. cam.ac.uk is a sub-domain of uk and queens.cam.ac.uk is a subdomain of ac.uk
Labels must follow the rules for ARPANET host names

Email

The usually format of email addresses is localpart@maildomain, this is mapped to a domain name so that it can be looked up in the domain name tree by converting localpart into a single label (ignoring dots) then converting maildomain into a domain name (not ignoring dots) and replacing @ with a dot to form a domain name so the email address as342@cam.ac.uk will become as342.cam.ac.uk
Domain names ( and therefore email addresses) are case insensative

Resource Records (RR)

Each node in the tree has a set of resource information, this can be empty
The set of resource information is composed of the set of Rrs
Order of RRs is not significant and need not be preserved
The owner of RR is the domain name where the RR is found, this is normally inplicit rather than being an explicit part of a RR
The type of a RR is an encoded 16 value such as:

A – a host address

CNAME – the canonical name of an alias

HINFO – CPU & OS used by a host

MX – mail exchange for domain

NS – authoritative name server for the domain

PTR – pointer to another part of domain name space

SOA – start of zone of authority

The class of a RR in an encoded 16 bit value such as IN – internet system or CH – Chaos System
The TTL of a RR is the time to live of the RR, this is a 32 bit integer in seconds, this defines how long a RR can be cached for before discarding, this limit does not apply to authoritative data in zones (TTL of 0 will prevent caching)
The RDATA of an RR contains different values depending on the type of class of the RR such as:
If type is A and class is IN, then RDATA is a 32 bit IP address
If type is A and class is CH, then RDATA is a domain name followed by a 16 bit octal Chaos address
If type is CNAME, then RDATA is a domain name
If type is MX, then RDATA is a 16 bit preference values followied by a host name willing to act as a mail exchange
If type is NS, then RDATA is a host name
If type is PTR, then RDATA is a domain name
If type is SOA, then RDATA is several fields
The structure of an RR is as follows:
start of line gives the owner of the RR, next we list the TTL, the type and the class of the RR (dig giving results in different order)
Domain names in RR’s could point at the primary name not another alias to avoid extra indirection

Queries

queries are carried in UDP or over TCP connection
responses to queries are either the answer to the question, refers the requester to another set of name server or signals some error condition
in general the resolver deals with queries on the users behalf
DNS queries and responses have a standard format, the start is a opcode which is one for a standard query
Then there are 4 section in a DNS query/response:
Question – Query name and other parameters
Answer – Carries RR’s which directly answer the query
Authority – Carries Rrs which describe other authoritative servers
Additional – Carries Rrs which may be helpful in using the Rrs in the other sections
A standard query specifies a target domain name (QNAME), query type (QTYPE) and query class (QCLASS)
QTYPE field may specify a specific type, AXFR special zone transfer QTYPE, MAILB matches all mailbox related Rrs or * which (like the regular expression) matchs all RR types
QCLASS field may contain a specific type or * to match all RR classes

Getting Started with Python

Why have I chosen to learn Python ?

Its a language sometimes used by my summer research project
It works particularly better than other language on the Raspberry Pi
I’ve covered the basics of Python before but never gone further than simple syntax
Its open source
It can be used as a scripting language, which is an area that I really want to improve
It supports object-oriented programming so I feel that with my previous experience of Java and C++, I will have covered the most popular OOP languages

Python 2.7 or 3 ?

The first decision that I needed to make before learning Python was whether to learn Python 2.7 or 3, since Python 3 fixes some of the “non-optional” design decision found in 2.7 and it of course more update I decided to learn Python 3. The potential problems with this is the lack of backwards compatibility with 2.7 and lack of teaching materials.

Python in my degree ?

The lecture course which is most likely to have included Python is Concepts of Programming Language which I took in Easter term this year. The course makes a few references to the language but does not cover the language to the degree that languages such as Fortran, Lisp, Algol, Pascal, Simula, Smalltalk, SML and Scala were covered.

What programming paradigms does Python support ?

Python advertises itself as a multi-paradigm programming language and from what I can see the range of supported paradigms supports this claim. This makes Python a multi-purpose language along with languages such as Java, C, C++ and SML compared to special-purpose languages such as SQL and LaTeX. The paradigm which Python support include:

imperative programming – describing computation by prescribing how to move the program from one state to another, imperative languages included Fortran, C and assembly
object-oriented programming – it supports dynamic lookup, abstraction, subtyping and inheritance, OOP language include Java, C++, Simula, Smalltalk and Scala
Functional programming – treats computation as the evaluation of mathematical function and avoids state and mutable data. Examples include Lisp, Ocaml, Haskell and Scala
Reflection programming – allowing a program to examine and modify its structure at runtime. Like Java, ML and Haskell
Structured programming – subroutines, blocks, for and while making code more readable compared to simple tests and goto’s. Examples include most modern languages except assembly code

How does Python handle types?

Type checking is the process of attempting to prevent type errors by ensuring that the operations in a programm are applied properly. There are two common types of type checking:

Dynamic (run-time) type checking – the compiler generates code such that whenever an operation is performed when the program is ran, the code checks to make sure that the operands have the correct types. Examples of dynamically type checked languages are MATLAB, Prolog and Ruby
Static (compile-time) type checking – the compilers checks that program text for potential type errors as the program is compiled. Examples of statically type checked languages are Fortran, Haskell, Java and Ocaml


Python uses Duck typing

Python is dynamically type checked, in fact it uses duck typing. Duck typing means that valid semantics of a object are dictated by the current state of methods instead of its inheritance from a class/interface like in Java. Why call it duck typing then ? the name refers to the duck test:

“When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck”

This allows use of EAFP or “It’s easier to Ask Forgiveness than Permission”. attributed to Grace Hopper.

A type system is said to be strongly typed when it specifies restrictions on how operations involving values of different data types can be intermixed. If this is not true, then we describe the language as weak. Python is strongly typed.

The built in types are as follows:

bool – truth value e.g. True or False
int – int of unlimited magnitude
list – mutable sequence (supports mixed types) e.g. [4, 4.0, “Four”]
tuple – immutable sequence (supports mixed types) e.g. ( 4, 4.0, “Four”)
complex – complex number e.g. 3+5j
float – floating point number e.g. 3.14
dict – group of key and value pairs {“key1”:1.0, “key2”:2.0}
set – unordered, no-duplicate collection of values
bytes – immutable sequence of bytes
bytearray – mutable sequence of bytes
str – sequence of Unicode characters

Mutable means that an objects state can be modified after its created, immutable means the opposite.

Python uses the off-side rule

Syntax & Semantics of Python ?

Unlike almost all other languages that I have studied Python uses whitespace to delimit block instead of curly braces (like Java), this feature is termed the off-side rule.

You define a function or method using the def keyword. The typically statements that you would except also apply: if, for, while, try, except and finally.

Python uses the words and, or, not for boolean expressions. Integer division (using //) is defined to round towards minus infinity and floating point division is do using /. Compare by value is achieved using == and compare by reference is achieved using is.

The syntax and semantics of Python highlight the main purpose of Python as a language for teaching programming, hence the focus on readability. This is why Python is the language being taught to school children using the Raspberry Pi

Installing the Interpreter

I install the packages for Python 3 from the Ubuntu Repositories using sudo apt-get install python3-minimal, and then run the interpreter by entering Python3 into the terminal

Once I get going with Python programming I hope to return to this type of language analysis and review how to design decisions made in the development of Python 3 reflect the use of the language in an introduction to programming. I hope to put up a part 2 of Getting Started with Python, later this week.

As ever, feel free to comment and highlight my mistakes even the spelling/grammar ones.

I’m back from brussels

1 Reply

From this …

I arrived in the back in Cambridge yesterday afternoon after the IBM EMEA Best Student Recognition Event 2012 in Brussels, Belgium. The theme of this years 3 day event was “Data Analytics”. The highlight for me on Wednesday 4th July was the a semenar by Prof. Bart Goethals, a lecturer at University Antwerp on data mining and machine learning in data analytics. For Thursday 5th July my highlights were a talk by Corinna Schulze from the IBM Governmental Program on privacy and a talk by Robert-Jan Sips focusing on how data analytics is being applied to water management in the Netherlands. And for the final day was highlights was a talk by Guy Lefever on the application of Watson to future medical care. Overall I really enjoyed the event and I would throughly recommend attending if you get the opportunity to go.

It looks as if I’m going to have some new work on Signposts next week but I don’t know as yet what this will be so over the weekend I intend to review some of the new material that I’ve learned over the my first few weeks on the Undergraduate Research Opportunities Project (UROP) here at the William Gates Building, Cambridge. My initial project brief was:

“Signposts – the virtual personal internet

Users with multiple networked devices are having a hard time linking them up when said devices reside at more than one location. Simple example is trying to synch your android phone while on the bus to the laptop in your office and the Mac Mini at home. NATs and Firewalls and other obstacles to symmetric communication abound, and obstruct. Point solutions (Dropbox, iCloud) do not deal with many cases (different OS) and only work if you pay (with money or eyeball time on adverts). In any case, they are not acceptable to privacy conscious users.

Instead, we are building a system called signposts. This is the smallest conceivable virtual machine that sits in the public internet, and looks after your personal namespace of devices. Registration and lookup of names is “effectful”, in other words, has side effects. The side effect is to use a set of possible tactics to rendezvous between the devices (various techniques for NAT traversal, for example) including possible multiple simultaneous path use, and also allowing for constraints, such as battery life, and price of use of connections. The system sits on top of DNSSEC, but more interestingly, as a piece of Computer Science, makes use of the new Cloud OS software being developed in CL, called Mirage (see http://www.openmirage.org/ and https://github.com/avsm/signpostd

This project is to help with the documentation and testing “in the wild” of signposts. We expect to uncover a lot of problems with different oddities (see http://conferences.sigcomm.org/imc/2011/docs/p181.pdf for examples) that appear, and we need to have good stories to account for what we are doing to the public and to security people. OCaml programming is pretty essential, as well as at least basic understanding of development tools (including git etc) “

Over the last few weeks, I had used a wide range of tools that are new or that I had not previously used to this extent including (in no particular order):

Unix command line – although this is something that i typically use, I have definitely used it more and for a wider range of applications that ever before
Git & GitHub – for my 2nd year group project PROTON, we took the (controversial) decision to use Mercurial and BitBucket instead of Git and GitHub like most other group so it been interesting to see how they compare
Tor – As alot of my work over the last few weeks has been looking at testing the performance of tools for online anonymity, I’ve made extensively use of the onion router
Orbot – Tor for android
Tails – An OS build around tor
Iperf – As just explained as I’ve been working with testing network performance Iperf has been an invaluable tool (though I have recently discovered Torperf)
Amazon Elastic Compute Cloud – I’ve been using this a server when testing network performance, its really use as it provides as global domain name and IP address and avoid pathological cases that could be created by having other cleint and server on the same network
SHH – for remote access to the other computers I am testing with locally and the virtual machine on Amazon EC2
dig-host – command line tools for performing DNS queries, just generally useful
CyanogenMod – the OS on my android phone which allows easy using of proxies and tethering
Privoxy – heavy customizable non-caching web proxy
Hamachi – used with privoxy so set up encrypted VPN access
OpenVPN – easy set up of VPNs
Texmaker – a nice LaTeX editor, that I would throughly recommend which I have been using to keep the notes on my work over the last few weeks
Mozilla Firefox & addons – previously a Chrome user myself, I’ve opted to use Firefox since the research project begin for a change and during this time I have experimented with a range of addons for privacy including: HTTPS Everywhere, NoScript, AdBlock Plus, DoNotTrackPlus, Collusion …
Eclipse & Android Development tools – getting to grips with the basics of Android development in preparation for the Google European Android Camp
Telnet – for commend line control of Tor
Wireshark – very useful for learning about computer networking, allows you to watch the packets on the network
TraceRoute – brillient tools which used TTL to should the route that a packet takes to its destination
Ocaml – the main project that I’m working with over the summer makes extensive use of Ocaml

I plan to use this list to visit some of the material that I’ve learned over the last few weeks in preparation for new work next week.

Read, Write, Execute

Notebook of a researcher in distributed systems.