Tuesday, November 11, 2008

2008-10-25 Readings - Overlay Networks


RESILIENT OVERLAY NETWORKS

David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris

Summary:


Presents Resilient Overlay Networks (RON), an overlay network system that lets participating nodes route data to each other around network outages. The basic idea is to build an overlay whose participating nodes are (hopefully) geographically/topologically dispersed. Nodes run algorithms to detect outage conditions and switch from direct routes to source - 3rd party - destination routes as appropriate. RON includes tight application integration mechanisms so applications can specify what counts as an "outage," plus mechanisms for routing over the overlay per some policies, and algorithms for detecting outages and selecting alternate routes. Results show that the system addresses the problem it sets out to address.

Discussions & Criticisms:

Waaaaaaaaaait .......... isn't one of the original goals of the internet to be resilient against outages and to auto-magically route around them? What the hell happened? Did we totally fail? I guess we did :(

I guess the root cause of the failure is that both intra-AS and inter-AS routing algorithms need to avoid fluctuations and route flaps, which necessarily means slow convergence to new routing table state when existing routes go down. So a RON-type overlay is basically necessary for quick outage detection.
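To make the detect-and-detour idea concrete, here's a toy sketch of the kind of path selection RON does. All names are mine, not the paper's, and the real system uses richer metrics (loss, latency, throughput) and active probing; this just shows the core decision: probe the direct path and each one-intermediate path, then pick the best by loss rate, breaking ties on latency.

```python
# Hypothetical sketch of RON-style path selection (my naming, not the
# paper's): given probe results for the direct path and one-intermediate
# detours, pick the lowest-loss path, tie-broken by latency.

def select_path(probes):
    """probes: dict mapping path -> (loss_rate, latency_ms).

    A path is a tuple of hops, e.g. ('dst',) for the direct route or
    ('peer3', 'dst') for a detour through overlay node peer3.
    """
    return min(probes, key=lambda p: (probes[p][0], probes[p][1]))

# Example: the direct route is suffering an outage (100% loss), so the
# detour through peer3 wins despite its higher latency.
probes = {
    ('dst',): (1.0, 30.0),          # direct path: outage
    ('peer3', 'dst'): (0.0, 95.0),  # one-hop detour: lossless but slower
}
assert select_path(probes) == ('peer3', 'dst')
```

Note that a formulation like this answers its own fail-back question: as soon as the direct path's probes recover, it becomes the minimum again and traffic switches back.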

Does RON ever switch back to a direct route once the traditional routing tables converge? Didn't manage to dig that out of their route selection algorithm.

Good piece of research but arguably a less worthy piece of software engineering. My main objection is that the system tries to do too much at once, again yielding a good technology that is very unlikely to see adoption. The key idea is an overlay that routes around outages. But the RON system, as presented, muddies that with tighter application integration, policy routing, and somewhat of a new routing protocol.

Ok so here's what I would do. My approach would be the minimalist design: overlays that route around outages with a minimalist routing protocol, done TRANSPARENTLY to the application, with optional hooks for applications to supply their own "outage" definitions. Routing policy specification should be left in the hands of people who actually understand policy specs, and should come as a doubly optional configuration layer. "Legacy" applications running on the ordinary TCP stack should require minimal migration changes, or no changes at all.

I would also put out Java/C/C++/Python libraries implementing this new overlay transparently, still offering the nice socket interface, complete with all the necessary online API documentation.
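Roughly what I have in mind, sketched in Python under made-up names (OverlayRouter and pick_route are hypothetical): a drop-in wrapper that keeps the familiar connect/send/recv surface while the overlay logic hides behind it.

```python
import socket

# Sketch of the "transparent socket" idea: legacy code keeps the usual
# socket calls; the overlay router silently decides whether to connect
# directly or via an intermediate overlay node. OverlayRouter and its
# pick_route method are hypothetical, not from any real library.

class OverlaySocket:
    def __init__(self, router):
        self._router = router  # decides direct vs. detour
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def connect(self, addr):
        # The router may substitute an intermediate overlay node's
        # address; the application never sees the difference.
        self._sock.connect(self._router.pick_route(addr))

    def send(self, data):
        return self._sock.send(data)

    def recv(self, n):
        return self._sock.recv(n)

    def close(self):
        self._sock.close()
```

The point of the design is that migration cost is near zero: swapping `socket.socket()` for `OverlaySocket(router)` is the only change a legacy application would need.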

The next step would be to partner with some enterprise/campus network and run the system in a limited, managed deployment to work out all the kinks (and gather data for more research).

Somewhere along the line we would need to figure out which applications/use cases would especially benefit from this overlay. Otherwise there's no way we can convince ordinary users to install/configure the system. A good time to push for adoption would be right after a significant outage - "Use our system so that next time you can keep working instead of just helplessly whining, moaning, and endlessly complaining!" Somewhat devious .... profiting from outages.

More seriously, on the technology side, it seems that a very small number of geographically/topologically dispersed nodes would be sufficient to facilitate routing around outages.

The resilience-against-flooding-attacks argument doesn't hold if nodes are not multi-homed.

Overlay routing itself seems to be an open question still.

Obvious security concerns with data traversing 3rd party network end points.


ACTIVE NETWORK VISION AND REALITY: LESSONS FROM A CAPSULE-BASED SYSTEM

David Wetherall


Summary:


Describes the experience of implementing the ANTS system and how it shaped the author's views on the original vision for active networks. ANTS is implemented in Java, provides basic active network functionality, and includes a sandbox (an extension of existing Java security mechanisms) for running active network code. An active network node receives capsules, demuxes to the appropriate routine, runs the routine (which may include forwarding), and repeats. A variety of issues remain outstanding, including security and resource isolation. Key lessons: capsules are useful, but need traffic patterns that make code caching effective; sandboxing achieves some local isolation/protection, but attacks against the network as a whole still need to be stopped; and the greatest value lies in allowing services in the network layer to evolve.
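The receive/demux/run loop above can be sketched in a few lines. This is my own toy rendering (ANTS itself is Java, and real capsules carry a hash of their forwarding code); on a cache miss the real system demand-loads the code from the previous hop, which I omit here.

```python
# Toy sketch of an active network node's capsule-processing loop
# (my naming, not ANTS's): each capsule carries a type identifier
# used to demux into a cached routine, which then runs and may
# forward the capsule onward.

class ActiveNode:
    def __init__(self):
        self.code_cache = {}   # capsule type -> routine
        self.forwarded = []    # capsules handed to the next hop

    def install(self, capsule_type, routine):
        self.code_cache[capsule_type] = routine

    def receive(self, capsule):
        routine = self.code_cache.get(capsule['type'])
        if routine is None:
            return  # real ANTS would fetch the code from the previous hop
        routine(self, capsule)

def plain_forward(node, capsule):
    # Simplest possible routine: just forward unchanged.
    node.forwarded.append(capsule)

node = ActiveNode()
node.install('forward', plain_forward)
node.receive({'type': 'forward', 'payload': b'x'})
assert node.forwarded[0]['payload'] == b'x'
```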

Discussions & Criticisms:

Fingers tired from the RON ramble. This might be short.

Is the fundamental goal of the network to communicate or to compute? Hmm ...

Their "Deployment" section basically tells me the technology is not deployed. But nice to see the authors being sensitive to deployment requirements and issues.

Any business case for offering new services in the network?

The DNS caching results may undermine their assumption of traffic patterns that allow code caching to be effective.

Hmm ... the real value of active networks seems to be network services variation/evolution. There's gotta be more uses for this very "different" idea.

ASes may prevent active network services from traversing administrative boundaries? "Why should I use resources on my own network to help you compute services for which users pay only you and not also us?"

Seems really useful for sensor nets though.


