Tuesday, November 11, 2008

2008-11-04 Readings - DNS and the Web


DEVELOPMENT OF THE DOMAIN NAME SYSTEM

Paul V. Mockapetris, Kevin J. Dunlap


Summary:

Talks about the original motivation for and implementation of DNS. Motivated by how the HOSTS.txt fails to scale. Original system requirements include scale, distributed management, no limit for names & data associated with names, interoperable across DARPA Internet and other networks. Additional requirements that helped transition/adoption include independence on network topology, avoid focing a single OS or organizational style. Generally minimalist design to allow greatest implementation flexibility. Caching responses is really important, including caching negative responses. Adoption resistance mainly arised from different assumptions regarding how systems should work. Also protocol compliance tests should assume people do enough to get this working and do no more, and thus must include mechanisms that require demonstration of the required functionality.

Discussions & Criticisms:

Overall a very interesting story of how a core component of the Internet got developed. The motivation of the system was pretty obvious, and I don't think anyone would dispute that DNS is needed or that DNS has been successful. What got me going is the adoption story, perhaps a model for designing future technologies that are likely to be adopted on a large scale.

I think there are several criteria for the DNS success story to be repeated. 1. The technology must tackle a critical problem that makes both research and industry say "Oh crap we're screwed", and resolve the problem with significant demonstrable benefits. 2. The technology must be minimalist in design, and make no assumptions about system topology, architecture, organizational style, policy priorities. 3. The initial technology must be minimalist in design, resolve the problem at hand and do not try to force solutions for other problems (the debate on MX records could easily have blown out of control and had MX been central to DNS, the MX debate could have scuttled DNS adoption). Once adoption is well underway, layer-on functionality can be introduced. 4. The techonology must work under the assumption that real life implementers will achieve minimal functionality and does very little testing/tuning.

Would appreciate comments re the above list.

I also find it interesting to think about the naming resolution/distribution problems for the Internet (DNS) and for the phone network (basically through human contact).


DNS PERFORMANCE AND THE EFFECTIVENESS OF CACHING

Jaeyeon Jung, Emil Sit, Hari Balakrishnan


Summary:

Presents DNS trace data and associated analysis. Data comes from around 2000-2001. Three continuous chunks on the order of days. Trace from link connecting MIT LCS and KAIST to the WAN. Also has trace-driven simulations to investigate caching strategies and the effect of changing the cache TTL. Key results include a large % of no answer or error results - the former takes up a large chunk of traffic due to overly persistent retries. Also, long-tail name access patterns suggest low cache TTL does not impede performance, and shared cache among clients do not improve performance.

Discussions & Criticisms:

First, the data collection method. I appreciate that as a first-of-its-kind study, they had to work with whatever data they have instead of whatever data they would wish for. Still, future studies should consider capturing more data and do some kind of Poisson sampling with the data (e.g. samples a few hours at a time over several months using Poisson sampling). General intuition that Poisson sampling captures time averages. I think both tracing and data analysis capabilities have increased greatly since the work was first published.

Long tail name access patterns means TTL ~ duration of each user session, since it's reasonably pointless to cache across sessions that go all over the place. Turning point of all their TTL vs hit rate curves occur at TTL ~ 500 sec. Seems to correspond to duration of user sessions.

Did they propose any fixes for the observed pathologies (no answers, repeated answers etc.)?

Caching and TTL messages reasonably convincing and looks adoptible - just requires software config. of DNS cache, right? Have people been willing to do this?



No comments: