Tuesday, November 11, 2008

2008-10-30 Readings - Distributed Hash Tables


LOOKING UP DATA IN P2P SYSTEMS

Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica


Summary:

Surveys various DHT (distributed hash table) schemes for lookup in P2P systems. Frames the "lookup problem" and describes the DHT. Outlines one-dimensional routing (Chord), tree-like routing (Pastry and Tapestry), and multidimensional routing (CAN). Notes open problems in the field, e.g. distance function, operation costs, fault tolerance, proximity routing, malicious nodes, indexing and keyword search.

Discussions & Criticisms:

It would be nice for a survey paper to kind of do some comparative study using some figure of merit. Doesn't need to say one thing sucks and another rocks, but would give an indication of the pros and cons of each. The community did arrive at the conclusion later that Chord is basically the best of them.

Not much else to say for a survey paper. DHT is still new material for me. Skipped the undergraduate database class and didn't fully "get" the topic in CS262A. Also read the Chord and CAN paper there. This will hopefully sink in on the second pass.


Chord: A SCALABLE PEER-TO-PEER LOOKUP SERVICE FOR INTERNET APPLICATIONS

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan

Summary:

Presents Chord, an architecture and an implementation of a P2P lookup services. Key ideas include a circular structure, consistent hashing to load balance, finger tables to provide O(logN) lookup cost, successor nodes to ensure nodes can join and leave without affecting system performance. Generally agreed to be the best P2P DHT there is. Lots of details in the implementation. Not clear how to deal with concurrent joins and departures.

Discussions & Criticisms:

Same concerns as my first read of the paper. The system is obviously proved to be good and accepted as such. Easy of implementation? Is it adopted widely? Is the next random P2P start up going to jump onto the Chord bandwagon?

The modifications to the vanilla protocol to facilitate concurrent joins/departures look somewhat awkward. Seems like one does need to read their technical report to see all the proofs.

Mentioned this in class. Chord uses hop count now. People have proved for various contexts that hop count may suck as a routing metric. Any attempts to apply ETX and other such tricks to Chord? Could be side project once I take care of business in the datacenter track.

DHT for datacenters? Surely there must be techniques that we can recycle for something.



No comments: