Monday, November 24, 2008

2008-11-25 Readings - Datacenter Networking


A POLICY-AWARE SWITCHING LAYER FOR DATA CENTERS

Dilip Antony Joseph, Arsalan Tavakoli, Ion Stoica

Summary:

Presents the idea of a policy-aware layer-two PLayer, as well as policy-aware switches (pswitches). Middleboxes are located off-path, and policy is separated from reachability. pswitches explicitly direct traffic to off-path middleboxes according to policy. The bulk of the paper covers implementation details. Prototype implemented in Click; evaluations done on small topologies. Includes mechanisms for policy specification and middlebox instance selection.

Discussions & Criticisms:

Policy fluctuates a lot, but router architecture necessarily needs to change infrequently. But middleboxes are needed. Hmm. How to balance the two ... Maybe some kind of funky application-layer overlay stuff? Use an overlay to fake routes to middleboxes and define traversal order? The only downside may be inefficient use of network bandwidth. However, adoption should be easier than pushing a whole new router architecture.

Maybe the answer is autoconf applications? Once upon a time it was a pain to (mis)configure compilers and such too. But then autoconf came along ...

Scalability behavior unknown?

Dealing with a complexity problem by adding another abstraction layer and more complexity??? Seems like the fundamental problem with the current setup is that middleboxes have to sit on the path of the traffic, placed in the desired order of traversal. Once we solve that, we solve all the problems.

Hmmm. Is the performance of Click anywhere near production switch quality? Basically all research work on routers uses Click. The paper admits that software routers are nowhere near line speed. But if anything requires what the paper calls a "fork-lift upgrade" of all routers, it's going to be a hard sell to Cisco and a potential barrier to adoption :( Results from NetFPGA might be interesting.



IMPROVING MapReduce PERFORMANCE IN HETEROGENEOUS ENVIRONMENTS

Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica

Summary:

Outlines a new way to schedule speculative tasks for Hadoop. Schedules speculative execution based not on progress rate but on expected remaining running time. Outlines the assumptions implicit in Hadoop scheduling and how the assumptions break down in heterogeneous environments. Implements the LATE (Longest Approximate Time to End) scheduler. Critical insights include: make decisions early in the job, use estimated finishing times instead of progress rates, avoid speculative execution on slow nodes, and cap the number of speculative tasks. Evaluation done on EC2 (hundreds of nodes/VMs) and a local testbed (R cluster? tens of nodes). Results show visible improvement over the native Hadoop speculative scheduler for all setups. Improvement can be as high as a factor of 2 decrease in finishing time.
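
To make the heuristic concrete, here's a toy Python sketch of the LATE idea as I understand it. The thresholds, names, and task representation are my own illustration (not the actual Hadoop patch), and the "don't speculate on slow nodes" check is omitted.

    def estimated_time_left(task, now):
        # progress_rate = progress / elapsed; time left = (1 - progress) / rate
        elapsed = now - task["start"]
        rate = task["progress"] / elapsed
        return (1.0 - task["progress"]) / rate

    def pick_speculative_task(tasks, now, num_speculating, num_slots,
                              speculative_cap=0.10, slow_task_pct=0.25):
        if num_speculating >= speculative_cap * num_slots:
            return None                              # cap total speculative work
        rates = sorted(t["progress"] / (now - t["start"]) for t in tasks)
        cutoff = rates[int(slow_task_pct * len(rates))]
        # only consider genuinely slow tasks, then speculate the one that will
        # finish farthest in the future (longest approximate time to end)
        slow = [t for t in tasks if t["progress"] / (now - t["start"]) <= cutoff]
        return max(slow, key=lambda t: estimated_time_left(t, now), default=None)

    tasks = [{"id": 1, "start": 0, "progress": 0.9},
             {"id": 2, "start": 0, "progress": 0.2},   # the straggler
             {"id": 3, "start": 0, "progress": 0.7}]
    print(pick_speculative_task(tasks, now=100, num_speculating=0, num_slots=50))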

Discussions & Criticisms:

MapReduce goodness :)

This looks adoptable. Anyone who felt the benefits are worth the effort yet?

Have only one question/reservation. MapReduce is a general programming paradigm and Hadoop is only one implementation of it. Doing stuff with Hadoop is fine and relevant, but people might say maybe Hadoop just sucks, and a better implementation of MapReduce would get rid of all the Hadoop problems. I get asked that about my energy-efficiency-of-MapReduce work. So what is the fundamental inefficiency in MapReduce going from homogeneous to heterogeneous environments? When I say MapReduce I mean the underlying programming paradigm, not just the Hadoop implementation. Yea ... sounds like all the wrong assumptions are Hadoop's wrong assumptions, and the insights from the paper would lead to a better implementation of MapReduce.

Could it be that a more fundamental solution for MapReduce is not to assign tasks of homogeneous size, but to size tasks according to historical work rate? Or assign different-sized tasks based on known system config params like disk speed, CPU speed, network speed, etc.? Sounds like that would improve not just the speculative tasks' finishing times but the finishing time of all tasks.
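
Something like this entirely hypothetical sketch: size each node's next task in proportion to its measured rate, so fast and slow nodes finish a wave at roughly the same time.

    def assign_task_sizes(total_work, node_rates):
        """node_rates: dict of node -> bytes/sec measured from past tasks."""
        total_rate = sum(node_rates.values())
        return {node: total_work * rate / total_rate
                for node, rate in node_rates.items()}

    # made-up rates: a fast, a medium, and a slow node splitting 64 GB of input
    print(assign_task_sizes(64 * 10**9, {"fast": 100e6, "medium": 60e6, "slow": 20e6}))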

Not sure if tuning Hadoop like crazy is the best way to go long term. I got persuaded to take a broader, more fundamental perspective ...



Wednesday, November 19, 2008

2008-11-20 Readings - Multicast


PREAMBLE

Might be helpful to briefly review network level multicast.



A RELIABLE MULTICAST FRAMEWORK FOR LIGHT-WEIGHT SESSIONS AND APPLICATION LEVEL FRAMING

Sally Floyd, Van Jacobson, Steven McCanne, Ching-Gung Liu, Lixia Zhang

Summary:

Talks about the stuff in the title (well duh, the title is long enough). Includes a REALLY nice description of the problems facing multicast, and why many of the unicast techniques cannot be migrated. Describes one example of a multicast framework in LBL's wb system. Big ideas include: receivers are responsible for detecting loss and requesting retransmission, and retransmission requests identify the data by name/address rather than by sequence number.
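
A toy sketch of the receiver-driven repair idea with randomized request timers; the constants and the naming scheme below are made up by me, not from the paper.

    import random

    class Group:                             # stand-in for the multicast group
        def multicast_repair_request(self, name):
            print("repair request for", name)

    class Receiver:
        def __init__(self, group):
            self.group = group
            self.have = set()                # names of data received so far
            self.pending = {}                # missing name -> time to send our request

        def on_data(self, name, now):
            self.have.add(name)
            self.pending.pop(name, None)     # a repair arrived; cancel our request

        def on_gap_detected(self, name, now, dist_to_source=1.0):
            # wait a randomized, distance-scaled delay so someone closer asks first
            delay = random.uniform(1.0, 2.0) * dist_to_source
            self.pending.setdefault(name, now + delay)

        def on_repair_request_heard(self, name, now):
            self.pending.pop(name, None)     # duplicate suppression

        def tick(self, now):
            for name, when in list(self.pending.items()):
                if now >= when:
                    self.group.multicast_repair_request(name)
                    del self.pending[name]

    r = Receiver(Group())
    r.on_gap_detected(("page 3", "op 12"), now=0.0)
    r.tick(now=5.0)                          # -> repair request for ('page 3', 'op 12')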

Discussions & Criticisms:

Yay Sally Floyd and Van Jacobson! But what's with the long title?

I tend to really like the underlying philosophy/world-view/scientific-outlook that permeates many of their papers. Here the money line is "One cannot make a single reliable multicast delivery scheme that simultaneously meets the functionality, scalability and efficiency requirements of all applications." Perhaps a very necessary bucket of cold water that we should pour on ourselves every time we attempt a one-size-fits-all solution. I think that as engineers, we would inevitably be super excited about the stuff we build; but as scientists in search of why systems work well instead of just what systems we can build, we should constantly question and be skeptical about our own findings/systems/views. Anyhow. I'm actually a big fan of sound scientific method and good experiment design. Probably the residual trauma of having parents who are experimental chemists/chemical engineers. Hence hats off to Floyd and Jacobson again.

I feel this work is much more complete/thorough than the other paper. I actually read them in the reverse order. There are discussions of different topologies, an attempt to isolate the bare-skeleton functionality from the bonus features, as well as a generally more convincing presentation of the data and the thinking behind it.

One thing I am confused about - is their method really application layer, or are they kind of faking transport-layer stuff in the application layer? In a sense the answer perhaps matters only to Internet architecture purists.

Also, where do the routers and switches fit into the system picture?

wb. Yea. Far cry from the collaboration stuff we have now. SharePoint anyone? :)

More seriously though, the receiver responsible for retransmission view is actually totally new to me. Very innovative.

The last set of papers and this one have also caused me to think that perhaps the right way to design the network is not to organize it around the operations we want to do on the communication channel, but around the operations we want to do on the data that flows through the channel. The data addressing scheme they brought up is totally along the data-centric path.

Turning devil's advocate, I would actually question the general need for multicast. Multicast makes sense only for live, streaming data transfers. Otherwise, one-to-many general content distribution can be done in any number of ways, given that "liveness" is not an issue. So really the killer live streaming apps that could burn the network with bandwidth are live webcasts, gaming, anything else? Live webcasts we can always deliver through the cable network :). Gaming has intrinsic bandwidth limits based on our reaction time/sensory limits. So yea. Seems multicast is something that's common enough to warrant that it be done right, but not so common that I won't ask why we need it ...



SCALABLE APPLICATION LAYER MULTICAST

Suman Banerjee, Bobby Bhattacharjee, Christopher Kommareddy

Summary:

Presents the NICE application-layer multicast protocol. The protocol is in charge of arranging the end hosts into a hierarchy with some nice properties. The hierarchy is created and maintained in a distributed fashion. Nodes maintain soft state. Nice properties include O(logN) control overhead, an O(logN) "stretch" bound, and O(logN) hierarchy diameter. Includes simulation results, as well as WAN implementation measurements compared against the Narada protocol.

Discussions & Criticisms:

So there's the NICE multicast protocol and the NICE project. What does the NICE project do that demanded the construction of such an ... elaborate protocol?

I actually found this paper to be good and bad at the same time. It's evident that the scheme they propose is quite elaborate. They put a lot of effort into it and really did a lot of work, both theory-wise in proving some qualities about it and experiment-wise in simulating and measuring a real implementation.

However I'm quite disturbed by the seeming lack of any explanation of the thinking behind their architecture, or any account of how they came up with the design. The paper is presented as: here's the pain point, here's a system, here are some properties about it, and here are our simulation and implementation measurement results. Somewhat feels like "Fiat Lux!/NICE!/whatever!" And I am left to wonder how the hell they conjured up the system, like the big bang.

The unease just mentioned leads to other reservations. What about other topologies? Hypercube? If it's really low-bandwidth and tolerant of delay variation within certain bounds, what's wrong with faking multicast with unicast and staggering the transmission times? The hierarchy seems to be very even and symmetrical. Any note on distinguishing the source of the data? Usually for each data streaming application there are a small number of data sources but a large number of data sinks. Their system doesn't seem to account for that.

Also I wonder what Akamai's secret sauce really is. We did talk about Akamai earlier in the semester in the context of DNS. Perhaps now would be a good time to revisit the convo? How do all the stock ticker systems and ESPN game ticker systems do it? ESPN could have problems during the Super Bowl or other major sporting events. And I imagine some stock ticker system somewhere would be near collapse during times of high-volume trading.

Methodology-wise, all their results depend on comparison against Narada. What is that???? What if Narada just sucks ....


Tuesday, November 18, 2008

2008-11-18 Readings - Data-oriented networking and DTNs


AN ARCHITECTURE FOR INTERNET DATA TRANSFER

Niraj Tolia, Michael Kaminsky, David G. Andersen, and Swapnil Patil

Summary:

Presents the DOT architecture. Basically an abstraction layer between the transport and application layers that allows innovations in data transfer to occur separately from innovations in application functionality. Reasonable API. Implemented as various plugins. Does not impose excessive overhead. Able to rewrite a mail program to use DOT.

Discussions & Criticisms:

I'm hesitant to say this, but this is actually one of the very few papers where my disagreement is not just with the methodology or the significance of the results or the relevance/impact of the work. Here I actually disagree with the big-picture philosophical outlook of the paper.

In my mind the greatest motivation for DOT is to facilitate innovation in data transfer independent of application-specific details, and to allow cross-application services related to data (e.g. caching, middle-box type processing, etc.). Isn't it the explicit purpose of transport-layer protocols to facilitate data transfer independent of application-layer details?

Most of the experiments in the paper show that DOT does not suck or impose excessive overhead. But there are no results to demonstrate that innovation is facilitated where it has been impossible, or made easier where it has been difficult. Neither were the cross-application services realized, nor was enough effort put into demonstrating that such services are possible with the existing system architecture and API. So in my mind the work did not address the very valid pain points that it brought up.

That said, I do subscribe to the view that a data-centric perspective on the network would be enlightening. After all, the network exists because we use it to communicate data. I also buy the benefits of separating "content negotiation" from "data transfer". It's just that the approach in the paper seems to be a very odd way to approach this set of topics.

Will bring skepticism to class and see what everyone says.


A DELAY-TOLERANT NETWORK ARCHITECTURE FOR CHALLENGED INTERNETS

Kevin Fall

Summary:

Sets the context and defines the problems faced by "challenged" Internets. Examples of such networks include terrestrial mobile networks, exotic media like satellite links, ad-hoc networks, and sensor networks. The basic argument is that the real-time interactive message exchange assumed by most existing protocols is broken for challenged Internets. Outlines a potential architecture to resolve the problems. Does not contain an architecture realization or performance study.

Discussions & Criticisms:

There has been another series of similar work by one of Eric Brewer's students, recently presented as a dissertation, on the Internet for developing regions. There, the context of developing regions presents a whole host of unexpected stuff that creates a very challenged Internet. It also uses the old "sneaker net" idea of directly delivering data (the DOT paper had a version of this idea re portable storage too).

Does a good job of setting the context and framing the problem.

I think the real value of this series of work may be that the "unchallenged" Internet poses a small subset of the problems of the "challenged" Internet. So techniques for the "challenged" Internet should enlighten efforts to make the regular Internet more robust.

There is no implementation or performance evaluation. So I guess the paper is more of a philosophical-outlook type paper. I think the measure of success for anything dealing with a "challenged" Internet should just be that things work. They don't even need to work well. We either have a non-working Internet, or we have a "challenged" Internet where things suck but kinda work. Better than nothing.

Again, I do have some reservations that instead of some big and fuzzy new architecture, the right response to the "challenged" Internet may be to overhaul the existing IP/TCP/UDP protocols and make them better able to withstand "challenges". This way the everything-converges-to-IP trend would allow all the exotic contexts the paper brings up to benefit. Otherwise we'd have to go through the pain of implementing one-size-fits-all solutions that require consensus across the pretty well segregated exotic-networks fields. Or have each field implement its own DTN solutions, which is essentially taking the DTN philosophy but continuing on the existing path of devising/improving custom protocols for each type of "challenged" network.



Thursday, November 13, 2008

2008-11-13 Readings - Measurement and Tracing


END-TO-END INTERNET PACKET DYNAMICS

Vern Paxson

Summary:

Presents results from a large-scale Internet tracing study. On the order of 20,000 TCP bulk transfers of 100 KB between 35 Internet sites. Measurements at Poisson intervals to extract time averages. Analyzes network "pathologies" including out-of-order delivery, packet replication, and packet corruption that's not due to noise. Also finds the bottleneck bandwidth for each pair of sites. Presents distributions of packet loss and packet delays. Methodological refinements over previous studies in various areas.

Discussions & Criticisms:

Does Vern always do his research by himself?

The bottleneck link findings - I would be more interested in where the bottleneck links are. Knowing where they are would allow us to figure out ways of relieving/removing the bottlenecks. Also what are T1 circuits?

The packet loss rates are quite high. And the claim is that TCP works nevertheless?

Figure 9 looks very similar to the graphs we have from DC Metro measurements. Probably worth going back to those graphs again and see if it's indeed the same phenomenon.

Interesting how the data is from 1995, but the publication is in 1999. Does it take 4 years to crunch through the data?

Internet in 1995 is order 5 million hosts. In 2008 it's order 500 million hosts. How big of an effort would it be to repeat a similar measurement study?

Also, 35 hosts out of 5 million is almost nothing. Vern noted several findings that differ from other studies of the same phenomena. A fundamental shortcoming of any attempt to get a snapshot of the Internet?

Packet replication and corruption - hard to imagine what could happen ...

Infinite variance in loss outage duration! I wonder if that's still the case.

The Poisson sampling thing is really nice for measuring long term phenomena with a small amount of data. Don't see that often anymore. Did people forget the early 1990s papers?
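
For my own reference, the mechanics are just exponentially distributed gaps between measurements (PASTA: Poisson arrivals see time averages). A minimal sketch:

    import random

    def poisson_sample_times(rate_per_hour, duration_hours):
        t, times = 0.0, []
        while True:
            t += random.expovariate(rate_per_hour)   # exponential inter-sample gap
            if t > duration_hours:
                return times
            times.append(t)

    # e.g. on average one measurement per hour over a week of tracing
    print(poisson_sample_times(rate_per_hour=1.0, duration_hours=168)[:5])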


X-TRACE: A PERVASIVE NETWORK TRACING FRAMEWORK

Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion Stoica

Summary:

Presents the architecture and implementation of the X-Trace infrastructure. Allows tracing of causal interactions across multiple network layers. Does so through bookkeeping in metadata that gets propagated through the network stack. Uses pushDown() and pushNext() operations for this purpose. Metadata fits easily into extensible headers for HTTP/TCP/IP protocols and the like. Offline reconstruction of call trees. Takes care of administrative control - people who request X-Trace don't necessarily see the traced data; the data goes to the admin of each administrative domain. Needs modifications to protocols/clients/nodes.
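
Roughly how I picture the metadata propagation; the field names and report format are my own simplification, and in the real system pushNext() and pushDown() produce different edge types, which I'm glossing over here.

    import uuid

    class XTraceMetadata:
        def __init__(self, task_id, parent_op_id=None):
            self.task_id = task_id
            self.op_id = uuid.uuid4().hex[:8]
            self.parent_op_id = parent_op_id

        def push_next(self):
            """Hand the metadata to the next operation at the same layer (e.g. next HTTP hop)."""
            return XTraceMetadata(self.task_id, parent_op_id=self.op_id)

        def push_down(self):
            """Hand the metadata to the layer below (e.g. HTTP -> TCP)."""
            return XTraceMetadata(self.task_id, parent_op_id=self.op_id)

    reports = []
    def report(md, label):
        # in the real system this goes to the collector of the local administrative domain
        reports.append((md.task_id, md.op_id, md.parent_op_id, label))

    root = XTraceMetadata(task_id="task42")
    report(root, "http request")
    report(root.push_down(), "tcp connection")
    report(root.push_next(), "http request at proxy")
    # offline: rebuild the tree by linking op_id -> parent_op_id within a task_id
    print(reports)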

Discussions & Criticisms:

Requires modifying various protocols/clients/nodes. Already noted multiple times in the paper as a shortcoming. Probably a necessary thing if we were to really get comprehensive traces. I think the whole ugly business of tracing is not so much a shortcoming of the trace tools (that they must modify various things), but a shortcoming of the original network design, in which accounting/tracing was a very low priority. Sure, the top priority of the DARPA project was blah blah survivability, but tracing/accounting/debugging capabilities should arguably not even be a priority, but a general system design principle. Without these capabilities, we end up with systems that are hard to understand and hard to improve.

One could give the counter argument - screw accounting and tracing, what's important is having systems that work, and who cares about understanding why they work. In that view we'll often end up re-inventing the wheel and stumbling from revolutionary change to revolutionary change, with very little common evolutionary theme connecting it all.

Real life is somewhere in the middle .... Hence X-Trace I guess ...

I don't understand what's fundamentally preventing us from having a task graph. It's just metadata and task identifiers, right? So the metadata could be designed to be extensible and carry multiple task identifiers? Also do something like XML namespaces and have each piece of X-Trace metadata conform to a common format, but be extensible beyond that? Could be nice for future evolution as well.

The paper did note that the key measure of X-Trace's success is whether it can help real-life admins and developers debug/maintain their stuff. So, one year since the paper was published, any happy stories in that regard?



Tuesday, November 11, 2008

2008-11-06 Readings - Names, Identifiers, and Network Architecture


MIDDLEBOXES NO LONGER CONSIDERED HARMFUL

Michael Walfish, Jeremy Stribling, Maxwell Krohn, Hari Balakrishnan, Robert Morris, and Scott Shenker

Summary:


Presents the Delegation-Oriented Architecture (DOA), which not only allows, but also facilitates, the deployment of middleboxes. The basic idea is very simple. Instead of identifying nodes by IP addresses & ports, do it using another layer of abstraction, the EID (Endpoint ID). Implementation requires a DHT for EID translation, an addition to the OS kernel to do the EID lookup, and an addition to the network stack headers to place a DOA header between the IP header and the transport header. Their initial implementation has tolerable overhead.
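
My own toy picture of the delegation idea, as I remember it: an EID resolves (via the DHT) either to an IP address or to another EID, so a host can delegate to an off-path middlebox and the sender chases the chain before transmitting. All the names and the flat dict standing in for the DHT are made up.

    eid_table = {                            # stand-in for the DHT mapping service
        "eid:web-server": "eid:firewall",    # server delegates to its firewall
        "eid:firewall":   "10.0.0.7",        # firewall resolves to a real IP
    }

    def resolve(eid, max_hops=8):
        for _ in range(max_hops):
            target = eid_table[eid]
            if not target.startswith("eid:"):
                return target                # an IP: send the DOA-encapsulated packet here
            eid = target
        raise RuntimeError("delegation chain too long")

    print(resolve("eid:web-server"))         # -> 10.0.0.7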

Discussions & Criticisms:

First, minor point: the name DOA isn't all that cute or inspiring or blandly technological. Usually I don't even notice the name, but when I was reading the paper and my roommate was watching some disturbing junk on Youtube, the screams he made pretty much sounded like DOOOOOUUUUUUAAAAAAHHHH!!!!! Had the name been Goa then at least some people would recognize it as some place in India, albeit associated with the short-lived Portuguese colonial presence there ...

Second, major point. Continuing the train of thought from the DNS paper, there's no chance this technology would get adopted in real life. It solves a somewhat artificial problem - the complaints from Internet architecture purists (the authors themselves) about the harmfulness of middleboxes and how they violate various Internet design principles. I would have preferred something snappy and blunt, e.g. Scott Shenker coming out and saying "We think middleboxes are no longer full of crap." He made an equivalent statement about his prior work in QoS and that message stuck more than any paper would. Also, the way it's presented, it requires too many architectural changes in the underlying network implementation. So costs > benefits.

There might be an economy-of-scale business argument for concentrating the management of middleboxes into an expert third party, removing the redundant expertise in that area from across multiple businesses/organizations. But for most organizations, keeping control over the potentially critical functionality of the middleboxes far outweighs any saving in efficiency. Nevertheless the "outsourced" middleboxes idea is worth keeping in mind and has the potential to completely disrupt the on-the-path middlebox business.

The overhead results in the paper are not yet fully convincing. Latency overhead could get large for large administrative domains with large DHT lookup tables. Packet header overhead could get large for small, chatty packets. Might need lots of caching in the style of DNS.

Also it's not clear to me on which nodes the DHT would reside. For example, if we use Chord, who would be the "other" Chord nodes?

Potential security exploits in DOA are obvious.



INTERNET INDIRECTION INFRASTRUCTURE

Ion Stoica, Daniel Adkins, Shelley Zhuang, Scott Shenker, Sonesh Surana


Summary:


Outlines the Internet Indirection Infrastructure (i3). Motivated by the current inefficiencies in multicast and anycast, and the network architecture's sole focus on unicast. The basic idea is to use (ID, data) packets and (ID, address) triggers to separate sending from receiving. The senders and receivers rendezvous at some point in the network. Implemented on an overlay network (Chord used for the particular implementation). Each node stores sufficient state to facilitate the rendezvous. Each node stores state for IDs allocated in Chord style. Lots of implementation details.
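
A toy sketch of the rendezvous idea: receivers insert triggers (id -> addr), senders send (id, data), and the i3 node responsible for the id forwards the data to every matching address. I'm ignoring the longest-prefix matching on identifiers and the Chord ring the ids actually live on; names below are my own.

    triggers = {}                                # id -> set of receiver addresses

    def insert_trigger(id_, addr):
        triggers.setdefault(id_, set()).add(addr)

    def send(id_, data):
        for addr in triggers.get(id_, ()):       # multicast falls out naturally
            deliver(addr, data)

    def deliver(addr, data):
        print(f"to {addr}: {data}")

    insert_trigger("stream-1", "host-A")
    insert_trigger("stream-1", "host-B")         # a second receiver = multicast
    send("stream-1", "frame 17")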

Discussions & Criticisms:

Not sure if we're solving a real-life problem here. Most multicast/unicast traffic would be a large content provider sending to many individual end consumers. The multicast-tree/Akamai approach seems to work fine for that. Mobility would be a better motivation. But perhaps that's because the home agent/foreign agent business is really ugly. How do cell phones manage mobility? Perhaps IP/voice convergence should learn from cell phones in terms of mobility?

The architecture would inherit all the deficiencies of Chord. In particular the hop count routing metric is broken (ETX, ETT would be better?).

Much of the implementation ugliness arises from the assumption that multicast and unicast traffic can be handled well by the same infrastructure. I think they are fundamentally different. But I'm not sure if the best solution is the current one (faking multicast using clever unicast through Akamai and such), or something else. For mobility we should consider borrowing ideas from cell phones.

The security threat model is horrendous. You have Chord nodes storing routing information and acting as points of transit. These are end consumers, and have a significantly lower level of trust than the routers etc. managed by ISPs and backbones. They can eavesdrop/re-route/spoof like crazy.

Also potentially creating bottlenecks at the network edge - using the low capacity of the network edge to do stuff that should arguably be done in the network core.

Still, a nice idea that more or less solves what it sets out to solve under the constraints it places on itself.



2008-11-04 Readings - DNS and the Web


DEVELOPMENT OF THE DOMAIN NAME SYSTEM

Paul V. Mockapetris, Kevin J. Dunlap


Summary:

Talks about the original motivation for and implementation of DNS. Motivated by how HOSTS.TXT fails to scale. Original system requirements include scale, distributed management, no limits on names & data associated with names, and interoperability across the DARPA Internet and other networks. Additional requirements that helped transition/adoption include independence from network topology and avoiding forcing a single OS or organizational style. Generally a minimalist design to allow the greatest implementation flexibility. Caching responses is really important, including caching negative responses. Adoption resistance mainly arose from different assumptions regarding how systems should work. Also, protocol compliance tests should assume people do just enough to get things working and no more, and thus must include mechanisms that require demonstration of the required functionality.
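
A tiny sketch of a resolver-side cache that also remembers negative answers; the structure is mine, and the TTLs come from whatever the (hypothetical) resolve callback returns.

    import time

    class DnsCache:
        def __init__(self):
            self.entries = {}                  # name -> (answer_or_None, expires_at)

        def lookup(self, name, resolve):
            hit = self.entries.get(name)
            if hit and hit[1] > time.time():
                return hit[0]                  # may be None: a cached "no such name"
            answer, ttl = resolve(name)        # answer is None for a negative response
            self.entries[name] = (answer, time.time() + ttl)
            return answer

    cache = DnsCache()
    print(cache.lookup("example.com", lambda n: ("93.184.216.34", 300)))
    print(cache.lookup("no-such-host.example", lambda n: (None, 60)))   # negative, cached too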

Discussions & Criticisms:

Overall a very interesting story of how a core component of the Internet got developed. The motivation of the system was pretty obvious, and I don't think anyone would dispute that DNS is needed or that DNS has been successful. What got me going is the adoption story, perhaps a model for designing future technologies that are likely to be adopted on a large scale.

I think there are several criteria for the DNS success story to be repeated. 1. The technology must tackle a critical problem that makes both research and industry say "Oh crap, we're screwed", and resolve the problem with significant, demonstrable benefits. 2. The technology must be minimalist in design, and make no assumptions about system topology, architecture, organizational style, or policy priorities. 3. The initial technology must resolve the problem at hand and not try to force solutions for other problems (the debate on MX records could easily have blown out of control, and had MX been central to DNS, the MX debate could have scuttled DNS adoption). Once adoption is well underway, layered-on functionality can be introduced. 4. The technology must work under the assumption that real-life implementers will achieve minimal functionality and do very little testing/tuning.

Would appreciate comments re the above list.

I also find it interesting to think about the naming resolution/distribution problems for the Internet (DNS) and for the phone network (basically through human contact).


DNS PERFORMANCE AND THE EFFECTIVENESS OF CACHING

Jaeyeon Jung, Emil Sit, Hari Balakrishnan


Summary:

Presents DNS trace data and associated analysis. Data comes from around 2000-2001: three continuous chunks on the order of days, traced at the links connecting MIT LCS and KAIST to the WAN. Also has trace-driven simulations to investigate caching strategies and the effect of changing the cache TTL. Key results include a large % of no-answer or error results - the former takes up a large chunk of traffic due to overly persistent retries. Also, long-tailed name access patterns suggest that a low cache TTL does not impede performance, and that a shared cache among clients does not improve performance.

Discussions & Criticisms:

First, the data collection method. I appreciate that as a first-of-its-kind study, they had to work with whatever data they had instead of whatever data they would wish for. Still, future studies should consider capturing more data and doing some kind of Poisson sampling with it (e.g. sample a few hours at a time over several months using Poisson sampling). The general intuition is that Poisson sampling captures time averages. I think both tracing and data analysis capabilities have increased greatly since the work was first published.

Long-tailed name access patterns mean TTL ~ duration of each user session, since it's reasonably pointless to cache across sessions that go all over the place. The turning point of all their TTL vs. hit rate curves occurs at TTL ~ 500 sec. Seems to correspond to the duration of user sessions.

Did they propose any fixes for the observed pathologies (no answers, repeated answers etc.)?

The caching and TTL messages are reasonably convincing and look adoptable - just requires software config of the DNS cache, right? Have people been willing to do this?



2008-10-30 Readings - Distributed Hash Tables


LOOKING UP DATA IN P2P SYSTEMS

Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica


Summary:

Surveys various DHT (distributed hash table) schemes for lookup in P2P systems. Frames the "lookup problem" and describes the DHT. Outlines one-dimensional routing (Chord), tree-like routing (Pastry and Tapestry), and multidimensional routing (CAN). Notes open problems in the field, e.g. distance function, operation costs, fault tolerance, proximity routing, malicious nodes, indexing and keyword search.

Discussions & Criticisms:

It would be nice for a survey paper to do some kind of comparative study using some figure of merit. It doesn't need to say one thing sucks and another rocks, but it would give an indication of the pros and cons of each. The community did arrive at the conclusion later that Chord is basically the best of them.

Not much else to say for a survey paper. DHT is still new material for me. Skipped the undergraduate database class and didn't fully "get" the topic in CS262A. Also read the Chord and CAN papers there. This will hopefully sink in on the second pass.


Chord: A SCALABLE PEER-TO-PEER LOOKUP SERVICE FOR INTERNET APPLICATIONS

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan

Summary:

Presents Chord, an architecture and implementation of a P2P lookup service. Key ideas include a circular identifier structure, consistent hashing for load balance, finger tables to provide O(logN) lookup cost, and successor lists to ensure nodes can join and leave without affecting system performance. Generally agreed to be the best P2P DHT there is. Lots of details in the implementation. Not clear how to deal with concurrent joins and departures.
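
To remind myself how finger tables give O(logN) hops, here's a toy on a 2^16 identifier circle - my own simplification, with no joins, failures, or successor lists, and made-up node ids.

    M = 16                                      # identifier bits; ring size 2**M
    nodes = sorted([1, 412, 9000, 20000, 41000, 60012])

    def in_interval(x, a, b):
        """Is x in the circular interval (a, b]?"""
        if a < b:
            return a < x <= b
        return x > a or x <= b                  # interval wraps around zero

    def successor(ident):
        for n in nodes:
            if n >= ident:
                return n
        return nodes[0]                         # wrap around the circle

    def finger(n, i):
        return successor((n + 2**i) % 2**M)

    def lookup(n, key):
        hops = 0
        while not in_interval(key, n, successor((n + 1) % 2**M)):
            # jump to the closest finger that still precedes the key
            candidates = [finger(n, i) for i in range(M)
                          if in_interval(finger(n, i), n, key)]
            n = candidates[-1] if candidates else successor((n + 1) % 2**M)
            hops += 1
        return successor((n + 1) % 2**M), hops

    print(lookup(1, 45000))                     # -> (60012, 1): one finger hop from node 1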

Discussions & Criticisms:

Same concerns as my first read of the paper. The system is obviously proven to be good and accepted as such. Ease of implementation? Is it adopted widely? Is the next random P2P startup going to jump onto the Chord bandwagon?

The modifications to the vanilla protocol to facilitate concurrent joins/departures look somewhat awkward. Seems like one does need to read their technical report to see all the proofs.

Mentioned this in class. Chord uses hop count now. People have proved for various contexts that hop count may suck as a routing metric. Any attempts to apply ETX and other such tricks to Chord? Could be side project once I take care of business in the datacenter track.

DHT for datacenters? Surely there must be techniques that we can recycle for something.



2008-10-25 Readings - Overlay Networks


RESILIENT OVERLAY NETWORKS

David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris

Summary:


Presents Resilient Overlay Networks (RON), an overlay network system that allows participating nodes to send data to each other around network outages. The basic idea is to have an overlay network, with the participating nodes being hopefully geographically/topologically dispersed. Nodes run algorithms to detect outage conditions, and switch from direct routes to source - 3rd party - destination routes as appropriate. Includes tight application-integration mechanisms to specify what counts as an "outage". Also includes mechanisms to allow overlay routing per some policies, and algorithms to detect outages and select alternate routes. Results show that the system addresses the problem it sets out to address.
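
A toy of the kind of decision a RON node makes; the thresholds, metrics, and node names are made up by me.

    def best_path(metrics, loss_outage=0.3):
        """metrics: {path_tuple: {'loss': float, 'latency_ms': float}} from probing."""
        direct = ("src", "dst")
        if metrics[direct]["loss"] < loss_outage:
            return direct                        # direct path still acceptable
        usable = {p: m for p, m in metrics.items() if m["loss"] < loss_outage}
        if not usable:
            return direct                        # nothing better; keep trying direct
        return min(usable, key=lambda p: usable[p]["latency_ms"])

    paths = {
        ("src", "dst"):          {"loss": 0.45, "latency_ms": 80},   # looks like an outage
        ("src", "ron-B", "dst"): {"loss": 0.02, "latency_ms": 95},
        ("src", "ron-C", "dst"): {"loss": 0.01, "latency_ms": 140},
    }
    print(best_path(paths))                      # -> ('src', 'ron-B', 'dst')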

Discussions & Criticisms:

Waaaaaaaaaait .......... isn't one of the original goals of the internet to be resilient against outages and to auto-magically route around them? What the hell happened? Did we totally fail? I guess we did :(

I guess the root cause of the failure comes from the need for both intra-AS and inter-AS routing algorithms to avoid fluctuations and route flaps, which necessarily means slow convergence to new routing table state when existing routes go down. So basically a RON-type overlay is necessary for quick outage detection.

Does RON ever switch back to a direct route once the traditional routing tables converge? Didn't manage to dig that out of their route selection algorithm.

Good piece of research but arguably a less worthy piece of software engineering. My main objection is that the system tries to do too much at once, again leading to a good technology that is very unlikely to see adoption. The key idea is to have some kind of overlay that routes around outages. But the RON system, as presented, mixes that up with tighter application integration, policy routing, and somewhat of a new routing protocol.

Ok so here's what I would do. My approach would be a minimalist design, i.e. overlays that route around outages with a minimalist routing protocol, done TRANSPARENTLY to the application, with optional handles to specify application-specific "outage" definitions. Routing policy specification should be left in the hands of people who actually understand policy specs, and should come as a doubly optional configuration thing. "Legacy" applications running on the ordinary TCP stack should require minimal migration changes or no changes at all.

I would also put out Java/C/C++/Python libraries implementing this new overlay in a transparent fashion, basically still offering the nice socket interface, complete with all the necessary online API documentation.

The next step would be to partner with some enterprise/campus network, and have the system in limited, managed deployment to work out all the kinks (also data for more research).

Somewhere along the line we would need to figure out what applications/use cases would especially benefit from this overlay. Otherwise there's no way we can convince ordinary users to install/configure the system. A good time to push for adoption would be after significant outages - "Use our system so that next time you can keep working instead of just helplessly whining and moaning and endlessly complaining!" Somewhat devious .... benefiting from outages.

More seriously, technology wise, seems that a very small number of geographically/topologically dispersed nodes would be sufficient to facilitate routing around outages.

The resilience against flooding attack argument wouldn't work if nodes are not multi-homed.

Overlay routing itself seems to be an open question still.

Obvious security concerns with data traversing 3rd party network end points.


ACTIVE NETWORK VISION AND REALITY: LESSONS FROM A CAPSULE-BASED SYSTEM

David Wetherall


Summary:


Talks about the experience of implementing the ANTS system and how it shaped the author's views on the original vision for active networks. ANTS is implemented in Java, has basic active network functionality, and includes a sandbox (an extension of existing Java security mechanisms) for running active network code. An active network node receives capsules, demuxes to the appropriate routine, runs the routine (which may include forwarding), and repeats. A variety of issues remain outstanding, including security and resource isolation. Key lessons: capsules are useful, but need traffic patterns that make code caching effective; sandboxing achieves some local isolation/protection, but attacks against the network still need to be stopped; the greatest value is in allowing evolution of services in the network layer.

Discussions & Criticisms:

Fingers tired from the RON ramble. This might be short.

Is the fundamental goal of the network to communicate or to compute? Hmm ...

Their "Deployment" section basically tells me the technology is not deployed. But nice to see the authors being sensitive to deployment requirements and issues.

Any business case for offering new services in the network?

DNS caching results may undermine their reliance/assumption of traffic patterns that allow code caching to be effective.

Hmm ... the real value of active networks seems to be network services variation/evolution. There's gotta be more uses for this very "different" idea.

ASs may prevent active network services from traversing administrative boundaries? "Why should I use resources on my own network to help you compute services for which users pay only you and not also us?"

Seems really useful for sensor nets though.



2008-10-21 Readings - Topology


ON POWER LAW RELATIONSHIPS OF THE INTERNET TOPOLOGY

Michalis Faloutsos, Petros Faloutsos, Christos Faloutsos

Summary:


Presents analysis of routing data to extract insights about Internet topology circa 1997-1998. The data set comes from BGP routing tables between ASes and from router tables. Internet growth during the period covered is 45%. The key finding is that many topology metrics have power law relationships (A = kB^p, i.e. logA = p logB + logk) to one another. Fits data to power laws using linear regression, with correlation coefficients ~0.96 and above, sometimes as high as 0.99. Specific power laws found include: out-degree of a node = rank of the node to the power of ~-0.80; frequency (number) of nodes of a given out-degree = out-degree to the power of ~-2.2; number of pairs of nodes within a given number of hops = number of hops to the power of ~4.7, valid for hop counts well below the graph diameter; plus a power law for the sorted eigenvalues of the graph versus their order.
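
Their fitting procedure is basically linear regression in log-log space: the slope is the power-law exponent and the correlation coefficient tells you how well A = kB^p holds. A quick sketch on synthetic data obeying a rank-exponent-like law (the numbers below are made up, not theirs):

    import numpy as np

    rank = np.arange(1, 101)                     # node rank, 1 = highest degree
    out_degree = 300 * rank ** -0.8              # synthetic data following a power law

    slope, intercept = np.polyfit(np.log(rank), np.log(out_degree), 1)
    corr = np.corrcoef(np.log(rank), np.log(out_degree))[0, 1]
    print(f"exponent ~ {slope:.2f}, k ~ {np.exp(intercept):.1f}, correlation {corr:.3f}")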

Discussions & Criticisms:


Are the authors related? Couldn't figure out whether they are related or not, but Michalis and Petros' webpages make it look like there's a Faloutsos clan out there.

Must have taken a LONG time to crunch the data and generate the graphs. I like how the key results are presented tersely in nicely highlighted boxes.

What is the significance of the eigenvalue of graphs? The authors mention the eigenvalue power law exponent is good for distinguishing different families of graphs. Other significance?

There is not much to criticize. Their mathematics is near air-tight (at least the couple of small parts that I cared to check). One can complain about the representativeness of their data, but their power law relationships are so nearly identical across data sets that it would have to be a really remarkable coincidence for unrepresentative data to give them such nice results. Also the linear regression correlation coefficients are as high as any experimental data can hope to get. Really surprising.

If there is a shortcoming, I guess it's that it remains unclear how someone could use their results. I buy their results and I concur that the power law perspective is really nice for understanding and essential for simulation, and it should also inform network design. But how does one begin to incorporate their findings in actual design tasks? Any related work that uses their power law findings?

The authors make me feel inadequate about my own math .... Must seek to arm myself with some basic level of number crunching competence. Would help with understanding machine learning stuff too.


A FIRST PRINCIPLES APPROACH TO UNDERSTANDING THE INTERNET'S ROUTER LEVEL TOPOLOGY

Lun Li, David Alderson, Walter Willinger, John Doyle


Summary:


Presents toy models to show that the power-law focus in building Internet models fails, and that models guided more by first-principles design would be better. The basic idea is that many randomly generated graphs can fit the macroscopic power laws, but not all of those would be a good representation of the Internet. Presents topology choices as driven by technology and economic constraints, and thus not randomly generated at all. Uses toy topologies including preferential attachment, randomly generated graphs, heuristically optimal graphs, graphs inspired by some real-life systems, and graphs designed to be sub-optimal. All conform to the same power laws, but some are clearly better than others. Uses traditional performance metrics, graph-theoretic metrics (same for all topologies), and a graph likelihood metric (roughly, how likely the given graph topology would be to turn up in a random graph generator).

Discussions & Criticisms:

A pretty complicated and heavily theoretical paper to present a very simple idea - that the Internet shows power law relationships doesn't mean stuff with power law relationships represents the Internet.

Good paper to temper the (over)enthusiasm over power laws.

Good discussion of economic constraints.

Took a LONG time to really make it clear what their "first principles approach" actually is.

A caveat for the other implication they brought up: just because designed things are unlikely to arise from randomness, things that are not random don't automatically imply the presence of some organized design.

At what point does a model become an attempt to replicate, or an attempt to snapshot?



2008-10-16 Readings - Sensor Networks


TAG: A TINY AGGREGATION SERVICE FOR AD-HOC SENSOR NETWORKS

Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong


Summary:


Describes a way to implement SQL-like functionality for ad-hoc sensor nets. There are two key ideas. First, users write SQL-like queries against the sensor net, instead of writing low-level, potentially buggy C code to do the same thing. Second, the query is fulfilled as results are "incast" back to the query source. Lots of tricks to cut the amount of transmission and reduce energy consumption. Simulation-based evaluation and a prototype implementation. Results show their system performs much better than a "centralized" query in cutting down communication.
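
The trick that makes in-network aggregation work for something like AVG is forwarding a partial state record instead of raw readings; each parent merges its children's records with its own before sending up the tree. A tiny sketch (my own toy; the real thing is the epoch-based tree protocol):

    def init(reading):
        return (reading, 1)                      # partial state record for AVG: (sum, count)

    def merge(a, b):
        return (a[0] + b[0], a[1] + b[1])

    def evaluate(psr):
        return psr[0] / psr[1]

    # a node with reading 21.5 and two children that already aggregated their subtrees
    children = [(84.0, 4), (40.5, 2)]
    psr = init(21.5)
    for c in children:
        psr = merge(psr, c)
    print(psr, "-> AVG =", evaluate(psr))        # (146.0, 7) -> ~20.9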

Discussions & Criticisms:

Philosophically speaking, I do believe aggregation is very important for sensor networks. Also, detecting anomalies is very important. So are we opening some sort of flood gate of requests for all functionalities to be put on the sensor motes? Ok, so we can express anomaly detection queries in SQL. Non-SQL-expressible queries? If there can be such things ....

Transmission power >> on-board computational power for wireless. Therefore all the efforts to do as much as possible through active nodes in the network make complete sense. Should borrow this general approach for other wireless apps. In wired situations, on-board computation power is comparable to or greater than transmission power, so duplicating computations may be more power-intensive there. How can we systematically measure end-to-end power efficiency in wireless?

Did the paper try to cram too much into a limited amount of space? It also defined more or less a new database query language.

Can the insights be recycled in DHT stuff? Basically the paper outlines a way to implement a distributed database capable of responding to a variety of queries. There's GOT to be a use for the same ideas elsewhere.

Agree with the usability argument that user-facing API should be as high level as possible. Wouldn't want to have a null pointer exception from a mote ... I want to actually use this!

Also liked the work as an example of mixing fields (databases and wireless) to create something really funky and really new. No longer a new idea and people have caught on and done follow up studies, right?



DIRECTED DIFFUSION: A SCALABLE AND ROBUST COMMUNICATION PARADIGM FOR SENSOR NETWORKS

Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin

Summary:


Describes the directed diffusion paradigm for sensor nets. Tasks are described by attribute-value pairs. A node injects an "interest" in certain tasks. The "interest" gets forwarded through the network and kept alive by periodic refreshes. Nodes receiving an "interest" may forward it to some subset of their neighbors. "Interest" state is cached at nodes, with state scaling with the number of active tasks. "Gradients" establish the direction of data propagation. Nodes propagate along the gradients the events they detect that match their cached interests. The sink can "reinforce" certain paths, selecting gradients for data flow based on empirical link quality.
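
My own rough sketch of the gradient bookkeeping, leaving out the interest flooding, timestamps, and the exact data-rate semantics; all names and numbers below are made up.

    class Node:
        def __init__(self, name):
            self.name = name
            self.gradients = {}                  # interest -> {upstream neighbor: rate}

        def receive_interest(self, interest, from_neighbor, rate):
            # remember who to send matching data back to, and how often
            self.gradients.setdefault(interest, {})[from_neighbor] = rate

        def reinforce(self, interest, neighbor, high_rate):
            # the sink "reinforces" a good neighbor by asking for a higher rate
            self.gradients[interest][neighbor] = high_rate

        def on_event(self, interest, event):
            for neighbor, rate in self.gradients.get(interest, {}).items():
                print(f"{self.name} -> {neighbor}: {event} (rate {rate}/s)")

    n = Node("sensor-7")
    n.receive_interest("four-legged animal", "node-3", rate=1)
    n.receive_interest("four-legged animal", "node-5", rate=1)
    n.reinforce("four-legged animal", "node-3", high_rate=10)
    n.on_event("four-legged animal", "detection at (12, 4)")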

Discussions & Criticisms:

The key ideas are not seen only here - doing computation in the network, using empirically good paths. I feel like doing some reference and publication-date chasing to see who borrowed from whom, or whether we have been reinventing the wheel, so to speak. It would be sad if the wireless world got so big that even experts in the field cannot keep up .... TAG did give a nod to this paper; I don't think the Roofnet set of papers referenced it, even though they came later.

The data-centric approach definitely smells of design sensibilities from databases. Chased down the authors' backgrounds. Interestingly, none of them have a prominent database background.

The authors explicitly want to defer the study of naming to focus on the study of diffusion/data routing. I understand the need for this. But can it actually be done? I'm not convinced - don't all their results need to be qualified as "the system behaves like this, given that we name everything like this"?

Periodic broadcast of "interest" - doesn't this lead to a very chatty protocol that's very power hungry? Possibly that alone undid all the savings from doing data transformation in the network. Their comparative simulations against flooding and omniscient multicast would fail to capture this cost.

State storage could become a scalability issue when the number of active tasks goes up.

Again, doing more computation on nodes can save transmission energy - duplicate suppression is unsurprisingly good.

I wonder what power model they used to compute/simulate dissipated energy. Looks like they only included transmission energy anyways. The authors weren't very detailed about this.

Did not like the somewhat unintuitive terminology.

I also noticed that my routine skepticism re wireless ad-hoc networks for IP type data reduces significantly when we're talking about wireless ad-hoc for sensor nets.


Monday, November 3, 2008

Doing drop-front queueing on readings


But will blog in order. Hence the appearance of being severely behind :-)

Drop-front really IS the best queuing scheme.