Tuesday, November 11, 2008

2008-10-25 Readings - Overlay Networks


RESILIENT OVERLAY NETWORKS

David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris

Summary:


Presents Resilient Overlay Networks (RON), an overlay network system that lets participating nodes send data to each other around network outages. The basic idea is to build an overlay whose participating nodes are (hopefully) geographically/topologically dispersed. Nodes run algorithms to detect outage conditions and switch from direct routes to source - 3rd party - destination routes as appropriate. The system includes tight application integration mechanisms to let applications specify what counts as an "outage", plus mechanisms for routing over the overlay according to policies. Results show that the system addresses the problem it sets out to address.
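To make the route-selection idea concrete, here is a minimal sketch in Python, with fake probe results and an assumed loss threshold - my simplification, not RON's actual probing or routing protocol. Each node compares the direct path against every one-intermediary detour and picks a healthy, low-latency option.

# Minimal sketch of RON-style path selection (my simplification, not the
# paper's actual probing/routing protocol). Each node probes the direct
# path and every one-intermediary detour, then picks whichever currently
# looks healthy with the lowest latency.
import random

OUTAGE_LOSS = 0.5  # loss rate above which we declare a path "down" (assumed threshold)

def probe(path):
    """Pretend to probe a path; returns (loss_rate, latency_ms).
    In a real system this would come from active probes between overlay nodes."""
    return random.uniform(0, 1), random.uniform(10, 200)

def pick_route(src, dst, intermediaries):
    candidates = {(src, dst): probe((src, dst))}  # the direct path
    for hop in intermediaries:
        # latency of a detour is the sum of the two legs; loss compounds
        loss1, lat1 = probe((src, hop))
        loss2, lat2 = probe((hop, dst))
        candidates[(src, hop, dst)] = (1 - (1 - loss1) * (1 - loss2), lat1 + lat2)
    usable = {p: (l, d) for p, (l, d) in candidates.items() if l < OUTAGE_LOSS}
    if not usable:
        return (src, dst)  # nothing looks good; fall back to the direct path
    return min(usable, key=lambda p: usable[p][1])  # lowest-latency usable path

print(pick_route("A", "B", ["C", "D"]))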

Discussions & Criticisms:

Waaaaaaaaaait .......... isn't one of the original goals of the internet to be resilient against outages and to auto-magically route around them? What the hell happened? Did we totally fail? I guess we did :(

I guess the root cause of the failure comes from the need for both intra-AS and inter-AS routing algorithms to avoid fluctuations and route flaps, which necessarily means slow convergence to new route table state when existing routes go down. So basically a RON-type overlay is necessary for quick outage detection.

Does RON ever switch back to a direct route once the traditional routing tables converge? Didn't manage to dig that out of their route selection algorithm.

Good piece of research but arguably a less worthy piece of software engineering. My main objection is that the system tries to do too much at once, again leading to a good technology that is very unlikely to see adoption. The key idea is to have some kind of overlay that routes around outages. But the RON system, as presented, muddles that with tighter application integration, policy routing, and something of a new routing protocol.

Ok so here's what I would do. My approach would be the minimalist design: overlays that route around outages with a minimalist routing protocol, done TRANSPARENTLY to the application, with optional handles to specify application-specific "outage" definitions. Routing policy specifications should be left in the hands of people who actually understand policy specs, and should come as a doubly optional configuration layer. "Legacy" applications running on the ordinary TCP stack should require minimal migration changes or no changes at all.

I would also put out Java/C/C++/Python libraries implementing this new overlay in a transparent fashion, basically still offering the nice socket interface, complete with all the necessary online API documentation. A sketch of what such a library could look like is below.
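A purely hypothetical sketch of such a transparent library (the class name, relay handshake, and fallback behavior are all my inventions, not an existing API): the application keeps the familiar socket calls, and the library quietly falls back to an overlay relay when the direct connection fails.

# Hypothetical sketch of a "transparent" overlay socket wrapper (names and
# behavior are my invention, not an existing library). Legacy code keeps the
# familiar socket interface while the library silently falls back to an
# overlay relay when the direct path fails.
import socket

class OverlaySocket:
    def __init__(self, relays=None, timeout=2.0):
        self.relays = relays or []          # (host, port) of overlay relay nodes
        self.timeout = timeout
        self.sock = None

    def connect(self, addr):
        try:
            self.sock = socket.create_connection(addr, timeout=self.timeout)
        except OSError:
            # Direct path looks down; try each relay and ask it to forward.
            for relay in self.relays:
                try:
                    self.sock = socket.create_connection(relay, timeout=self.timeout)
                    # Application-level handshake telling the relay the real target
                    self.sock.sendall(f"RELAY {addr[0]}:{addr[1]}\n".encode())
                    return
                except OSError:
                    continue
            raise  # no relay worked either; surface the original error

    def sendall(self, data):
        self.sock.sendall(data)

    def recv(self, n=4096):
        return self.sock.recv(n)

# usage (hypothetical): s = OverlaySocket(relays=[("relay.example.net", 9000)])
#                       s.connect(("example.com", 80))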

The next step would be to partner with some enterprise/campus network, and have the system in limited, managed deployment to work out all the kinks (also data for more research).

Somewhere along the line we would need to figure out what applications/use cases would especially benefit from this overlay. Otherwise there's no way we can convince ordinary users to install/configure the system. A good time to push for adoption would be right after significant outages - "Use our system so that next time you can keep working instead of just helplessly whining, moaning, and endlessly complaining!" Somewhat devious .... benefiting from outages.

More seriously, technology-wise, it seems that a very small number of geographically/topologically dispersed nodes would be sufficient to facilitate routing around outages.

The resilience-against-flooding-attacks argument wouldn't work if nodes are not multi-homed.

Overlay routing itself seems to be an open question still.

Obvious security concerns with data traversing 3rd party network end points.


ACTIVE NETWORK VISION AND REALITY: LESSONS FROM A CAPSULE-BASED SYSTEM

David Wetherall


Summary:


Talks about the experience of implementing the ANTS system and how it shaped the author's views on the original vision for active networks. ANTS is implemented in Java, has basic active network functionality, and includes a sandbox (an extension of existing Java security mechanisms) for running active network code. An active network node receives capsules, demuxes to the appropriate routine, runs the routine (which may include forwarding), and repeats. A variety of issues remain outstanding, including security and resource isolation. Key lessons: capsules are useful but need traffic patterns that make code caching effective; sandboxing achieves some local isolation/protection, but attacks against the network as a whole still need to be stopped; the greatest value lies in allowing evolution of services in the network layer.
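A toy sketch of the capsule-processing loop described above (my simplification in Python; ANTS itself is a Java system, and its demand loading and sandboxing are far more involved):

# Toy capsule-processing loop. A capsule names its type; the node looks up
# the forwarding routine for that type, fetching and caching it on a miss,
# then runs it.
class Node:
    def __init__(self, name):
        self.name = name
    def next_hop(self, dst):
        return dst                      # trivial topology: everything is one hop away
    def forward(self, capsule, hop):
        print(f"{self.name}: forwarding {capsule['type']} capsule to {hop}")

def default_route(capsule, node):
    node.forward(capsule, node.next_hop(capsule["dst"]))

def fetch_code(capsule_type, prev_hop):
    # In ANTS the code group would be demand-loaded from the previous hop and
    # verified; here every unknown type just falls back to plain forwarding.
    return default_route

code_cache = {}                          # capsule type -> forwarding routine

def process(node, capsule, prev_hop):
    routine = code_cache.get(capsule["type"])
    if routine is None:
        routine = fetch_code(capsule["type"], prev_hop)   # cache miss
        code_cache[capsule["type"]] = routine
    routine(capsule, node)               # the real system runs this inside a sandbox

process(Node("r1"), {"type": "ants-ping", "dst": "h2"}, prev_hop="h1")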

Discussions & Criticisms:

Fingers tired from the RON ramble. This might be short.

Is the fundamental goal of the network to communicate or to compute? Hmm ...

Their "Deployment" section basically tells me the technology is not deployed. But nice to see the authors being sensitive to deployment requirements and issues.

Any business case for offering new services in the network?

DNS caching results may undermine the traffic-pattern assumptions they rely on for code caching to be effective.

Hmm ... the real value of active networks seems to be network services variation/evolution. There's gotta be more uses for this very "different" idea.

ASs may prevent active network services from traversing administrative boundaries? "Why should I use resources on my own network to help you compute services for which users pay only you and not also us?"

Seems really useful for sensor nets though.



2008-10-21 Readings - Topology


ON POWER LAW RELATIONSHIPS OF THE INTERNET TOPOLOGY

Michalis Faloutsos, Petros Faloutsos, Christos Faloutsos

Summary:


Presents an analysis of routing data to extract insights about Internet topology circa 1997-1998. The data set comes from BGP routing tables between ASs and from router tables. Internet growth during the period covered is 45%. The key finding is that many topology metrics have power law relationships (A = kB^p, or logA = p*logB + logk) to one another. Data is fit to power laws using linear regression, with correlation coefficients ~0.96 and above, sometimes as high as 0.99. Specific power laws found include: out-degree of a node = rank of the node to the power of ~-0.80; frequency (number) of nodes with a given out-degree = out-degree to the power of ~-2.2; number of pairs of nodes within some number of hops = number of hops to the power of ~4.7 (valid for hop counts well below the diameter of the graph); and the eigenvalues of the graph versus their rank (order) also follow a power law.
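The fitting procedure is just linear regression in log-log space. A quick sketch with synthetic, made-up data (not the paper's dataset) to show the mechanics:

# Fit A = k * B^p by linear regression on log A vs log B. The data here is
# a noisy synthetic power law, just to demonstrate the procedure.
import numpy as np

rng = np.random.default_rng(0)
rank = np.arange(1, 501)                                               # node rank, 1..500
out_degree = 50.0 * rank ** -0.8 * rng.lognormal(0, 0.1, rank.size)    # noisy power law

logB, logA = np.log10(rank), np.log10(out_degree)
p, logk = np.polyfit(logB, logA, 1)            # slope = exponent, intercept = log k
corr = np.corrcoef(logB, logA)[0, 1]

print(f"exponent p ~ {p:.2f}, k ~ {10**logk:.1f}, correlation {corr:.3f}")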

Discussions & Criticisms:


Are the authors related? Couldn't figure out whether they are relatives or not, but Michalis and Petros' webpages make it look like there's a Faloutsos clan out there.

Must have taken a LONG time to crunch the data and generate the graphs. I like how the key results are presented tersely in nicely highlighted boxes.

What is the significance of the eigenvalue of graphs? The authors mention the eigenvalue power law exponent is good for distinguishing different families of graphs. Other significance?

There is not much to criticize. Their mathematics is near air-tight (at least the couple of small parts that I cared to check). One can complain about the representativeness of their data, but their power law relationships are so nearly identical across data sets that it would take a really remarkable coincidence for the results not to reflect something fundamental. Also the linear regression correlation coefficients are as high as any experimental data can hope to get. Really surprising.

If there is a shortcoming, I guess it's that it remains unclear how someone would actually use their results. I buy their results, and I concur that the power law perspective is really nice for understanding and essential for simulation, and it should also inform network design. But how does one begin to incorporate their findings in actual design tasks? Is there any related work that uses their power law findings?

The authors make me feel inadequate about my own math .... Must seek to arm myself with some basic level of number crunching competence. Would help with understanding machine learning stuff too.


A FIRST PRINCIPLES APPROACH TO UNDERSTANDING THE INTERNET'S ROUTER LEVEL TOPOLOGY

Lun Li, David Alderson, Walter Willinger, John Doyle


Summary:


Presents toy models to show that the power-law focus in building Internet models fails, and that models guided more by first-principles design would be better. The basic idea is that many randomly generated graphs can fit the macroscopic power laws, but not all of them are good representations of the Internet. Presents topology choices as driven by technology and economic constraints, and thus not randomly generated at all. Uses toy topologies including preferential attachment, randomly generated graphs, heuristically optimal graphs, graphs inspired by some real-life systems, and graphs designed to be sub-optimal. All conform to the same power laws, but some are clearly better than others. Uses traditional performance metrics, graph-theoretic metrics (the same for all topologies), and a graph likelihood metric (roughly, how likely the given graph topology would be to turn up in a random graph generator).
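A small illustration of the core point, with a made-up structural statistic - the s(g) = sum over edges of deg(u)*deg(v) used below is my stand-in for the paper's likelihood-style metric, not their exact definition. Two graphs can share the same degree sequence yet wire their hubs very differently.

# Two graphs with the exact same degree sequence (hence the same degree
# distribution) but different wiring; a simple structural statistic tells
# them apart: hub-to-hub wiring scores higher than hub-to-edge wiring.
from collections import Counter

def s_metric(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(deg[u] * deg[v] for u, v in edges)

# Both graphs have degree sequence [3, 3, 2, 2, 1, 1].
hub_to_hub  = [("a", "b"), ("a", "c"), ("a", "e"), ("b", "d"), ("b", "f"), ("c", "d")]
hub_to_edge = [("a", "c"), ("a", "d"), ("a", "e"), ("b", "c"), ("b", "d"), ("b", "f")]

print(s_metric(hub_to_hub), s_metric(hub_to_edge))   # 31 vs 30: same degrees, different structure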

Discussions & Criticisms:

Pretty complicated and heavily theoretical paper to present a very simple idea - the fact that the Internet exhibits power law relationships doesn't mean that anything exhibiting power law relationships represents the Internet.

Good paper to temper the (over)enthusiasm over power laws.

Good discussion of economic constraints.

It took the paper a LONG time to really make clear what their "first principles approach" actually is.

A caveat for the other implication they bring up: just because things that have been designed are unlikely to arise from randomness, it doesn't automatically follow that things that are not random are the product of some organized design.

At what point does a model become an attempt to replicate, or an attempt to snapshot?



2008-10-16 Readings - Sensor Networks


TAG: A TINY AGGREGATION SERVICE FOR AD-HOC SENSOR NETWORKS

Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong


Summary:


Describes a way to implement SQL-like query functionality for ad-hoc sensor nets. There are two key ideas. First, users write SQL-like queries against the sensor net instead of writing low-level, potentially buggy C code to do the same thing. Second, the query is fulfilled as results are "incast" back toward the query source, with lots of tricks to reduce the amount of transmission and hence energy consumption. Evaluation is simulation-based, and a prototype was implemented. Results show their system performs much better than a "centralized" query approach in cutting down communication.
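A minimal sketch of the in-network aggregation idea (my simplification, not TAG's actual partial state records or tree maintenance): for AVG, each node only ships a (sum, count) pair up the tree, never the raw readings.

# Each node combines its own reading with its children's partial state and
# sends a single partial record up the tree; the root finalizes the answer.
def leaf_state(reading):
    return (reading, 1)                       # (sum, count)

def merge(a, b):
    return (a[0] + b[0], a[1] + b[1])         # combine two partial states

def finalize(state):
    return state[0] / state[1]                # AVG at the root

# A two-level routing tree: the root has two children, each with leaf children.
child1 = merge(leaf_state(21.0), merge(leaf_state(22.5), leaf_state(20.5)))
child2 = merge(leaf_state(19.0), leaf_state(23.0))
root = merge(child1, merge(child2, leaf_state(24.0)))   # the root also senses
print(f"network-wide AVG = {finalize(root):.2f}")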

Discussion & Criticisms:

Philosophically speaking, I do believe aggregation is very important for sensor networks. Detecting anomalies is also very important. So are we opening some sort of flood gate of requests for all functionality to be put on the sensor motes? Ok, so we can express anomaly detection queries in SQL. What about queries that can't be expressed in SQL? If there can be such things ....

Transmission power >> on-board computational power for wireless. Therefore all the efforts to do as much as possible at active nodes within the network make complete sense. We should borrow this general approach for other wireless apps. In wired settings, on-board computation power is comparable to or greater than transmission power, so duplicate computation may be more power-intensive there. How can we systematically measure end-to-end power efficiency in wireless?

Did the paper try to cram too much into a limited amount of space? It also defined more or less a new database query language.

Can these insights be recycled in DHT stuff? Basically the paper outlines a way to implement a distributed database capable of responding to a variety of queries. There's GOT to be use for the same ideas elsewhere.

Agree with the usability argument that user-facing API should be as high level as possible. Wouldn't want to have a null pointer exception from a mote ... I want to actually use this!

Also liked the work as an example of mixing fields (databases and wireless) to create something really funky and really new. No longer a new idea and people have caught on and done follow up studies, right?



DIRECTED DIFFUSION: A SCALABLE AND ROBUST COMMUNICATION PARADIGM FOR SENSOR NETWORKS

Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin

Summary:


Describes the directed diffusion paradigm for sensor nets. Tasks are described by attribute-value pairs. A node injects an "interest" in certain tasks. The "interest" is forwarded through the network and refreshed periodically to stay alive. Nodes receiving an "interest" may forward it to some subset of their neighbors. "Interest" state is cached at nodes, with state scaling with the number of active tasks. "Gradients" establish the direction of data propagation. Nodes propagate, along the gradients, the events they detect that match their cached interests. Certain paths can be "reinforced" to select gradients for data flow based on empirical link quality.
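A toy sketch of the interest/gradient mechanics (the message format and matching rule here are assumptions for illustration, not taken from the paper): a node caches interests, remembers which neighbor each came from, and sends matching events back along those gradients.

# A node caches (attributes, gradient-neighbor) pairs and forwards any
# detected event along every gradient whose interest it matches.
class DiffusionNode:
    def __init__(self, name):
        self.name = name
        self.interest_cache = []   # list of (attrs, gradient_neighbor)

    def receive_interest(self, attrs, from_neighbor):
        # Cache the interest and set up a gradient back toward the sender.
        self.interest_cache.append((attrs, from_neighbor))

    def matches(self, interest_attrs, event_attrs):
        # Simple attribute-value matching: every interest attribute must match.
        return all(event_attrs.get(k) == v for k, v in interest_attrs.items())

    def detect_event(self, event_attrs):
        for attrs, neighbor in self.interest_cache:
            if self.matches(attrs, event_attrs):
                print(f"{self.name}: sending {event_attrs} toward {neighbor}")

node = DiffusionNode("sensor-7")
node.receive_interest({"type": "four-legged animal"}, from_neighbor="sink-gw")
node.detect_event({"type": "four-legged animal", "location": (12, 40)})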

Discussions & Criticisms:

The key ideas are not seen only here - doing computation in the network, using empirically good paths. I feel like doing some reference and publication-date chasing to see who borrowed from whom, or whether we have been reinventing the wheel, so to speak. Would be sad if the wireless world got so big that even experts in the field cannot keep up .... TAG did give a nod to this paper; I don't think the Roofnet set of papers referenced it, even though they came later.

The data-centric approach definitely smells of design sensibilities from databases. Chased down the authors' backgrounds. Interestingly, none of them have a prominent database background.

The authors explicitly want to defer the study of naming to focus on the study of diffusion/data routing. I understand the need for this. But can it actually be done? I'm not convinced, since all their results may need to be qualified as "the system behaves this way, given that we name everything like this."

Periodic broadcast of "interest" - doesn't this lead to a very chatty protocol that's very power hungry? Possibly that alone undoes all the savings from doing data transformation in the network. Their comparative simulations against flooding and omniscient multicast would fail to capture this cost.

State storage could become a scalability issue when the number of active tasks goes up.

Again, doing more computation on nodes can save transmission energy - duplicate suppression is unsurprisingly effective.

I wonder what power model they used to compute/simulate dissipated energy. It looks like they only included transmission energy anyway. The authors weren't very detailed about this.

Did not like the somewhat unintuitive terminology.

I also noticed that my routine skepticism re wireless ad-hoc networks for IP type data reduces significantly when we're talking about wireless ad-hoc for sensor nets.


Monday, November 3, 2008

Doing drop-front queueing on readings


But will blog in order. Hence the appearance of being severely behind :-)

Drop-front really IS the best queuing scheme.


Wednesday, October 29, 2008

2008-10-09 Readings - Making the Best of Broadcast


ExOR: OPPORTUNISTIC MULTI-HOP ROUTING FOR WIRELESS NETWORKS

Sanjit Biswas and Robert Morris


Summary:

Describes ExOR, a combined routing + MAC protocol for multi-hop wireless networks. The basic idea is that since each transmission is a broadcast, some packets may be received by far-away nodes over highly lossy links. Traditional routing (of the ETT/ETX flavor) would not take advantage of such opportunities; all data would travel along a fixed route. Key elements include a packet buffer to store received packets in a batch, a mechanism to determine the set of nodes that may have received a packet, a mechanism to determine which nodes actually received it, and a mechanism to decide which node should transmit next toward the destination. Measurements in certain contexts show a significant improvement over ETX.
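A much-simplified sketch of the forwarder-selection idea (my abstraction, ignoring ExOR's actual batching, batch maps, and priority scheduling): among the nodes that happened to receive a broadcast, the one with the lowest remaining cost to the destination forwards.

# Hypothetical per-node "distance" to the destination, e.g. cumulative ETX.
etx_to_dst = {"src": 4.5, "n1": 3.1, "n2": 2.2, "n3": 1.3, "dst": 0.0}

def next_forwarder(received_by):
    """received_by: set of nodes that actually got this broadcast."""
    return min(received_by, key=lambda n: etx_to_dst[n])

# Lossy broadcast: suppose only n1 and n3 heard the source this time.
print(next_forwarder({"n1", "n3"}))   # n3 forwards; it is closest to dst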

Discussions & Criticisms:

Yet another study done on the Roofnet platform. Basically the whole train of work would be wasted if one day someone shows directional antennas > omnidirectional antennas, or someone finds something really wrong with Roofnet.

Knee-jerk skepticism re multi-hop wireless networks.

The Roofnet paper also came out at the same time as this (SIGCOMM 2005 for this and MobiCom 2005 for Roofnet). Roofnet uses ETT as a significant improvement over ETX. Why wasn't ETT discussed here?

Also, per the ETX paper, ETX is highly susceptible to discrepancies between the loss-rate measurement packet size and the data packet size. In particular, for the ETX implementation in the ETX paper, 1024-byte packets perform VERY POORLY. The ExOR comparison measurements here use 1024-byte packets, without telling us whether the ETX problems noted in the ETX paper have been properly mitigated. So the comparison conclusions are sketchy.

Would be interesting for them to present the "average" hop-count reduction that ExOR achieves. From the Sally Floyd wireless modeling paper, we know that intermediate transmissions should be minimized because wireless is by nature not duplex.

Reduced throughput variation was cool though.


XORs IN THE AIR: PRACTICAL WIRELESS NETWORK CODING

Sachin Katti, Hariharan Rahul, Wenjun Hu, Dina Katabi, Muriel Medard, Jon Crowcroft


Summary:

Presents COPE (the paper never says what it stands for), a new architecture for wireless mesh networks. The basic idea is to have a relay XOR packets together and send the result out. Omnidirectional broadcast means the XORed packet can be decoded by each intended receiver, provided it already knows the other packets in the XOR (e.g., its own transmitted packet) and is missing only one. This potentially cuts the transmission count. The benefits of the architecture can be characterized as coding gain (reduction in the number of transmissions needed per unit of data) and coding+MAC gain (increased throughput, since traditional MAC fairness no longer throttles the relay's send rate). Requires mechanisms to opportunistically listen, opportunistically code via XOR, and learn neighbor state. Measurements on a single-floor wireless testbed show significant improvements for both TCP and UDP.
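The XOR trick itself is easy to show. A toy sketch of the canonical Alice-relay-Bob exchange (illustration only; COPE's packet pools, reception reports, and coding decisions are much more involved):

# Alice and Bob exchange packets through a relay. The relay broadcasts one
# XORed packet instead of two separate ones; each side recovers the other's
# packet by XORing with the packet it originally sent.
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

pkt_from_alice = b"hello bob.."   # padded to equal length in a real system
pkt_from_bob   = b"hi alice..."

# The relay hears both packets, then broadcasts a single coded packet.
coded = xor_bytes(pkt_from_alice, pkt_from_bob)

# Alice knows what she sent, so she can decode Bob's packet (and vice versa).
print(xor_bytes(coded, pkt_from_alice))   # b'hi alice...'
print(xor_bytes(coded, pkt_from_bob))     # b'hello bob..'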

Discussions & Criticisms:

My standard skeptical reaction towards wireless multi-hop networks.

Also, a fundamental reservation about the overall approach: the authors showed that the traditional MAC concept of "fairness" is broken in a multi-hop context. So shouldn't the approach be to redesign the MAC, instead of adding more complexity to cover up a broken MAC?

TCP losses > 10% and the thing still works? What happened to the 3% figure above which TCP basically fails to make progress?

Good to see them use ETT routing as outlined in the Roofnet paper.

Throughput was their only performance metric. What about latency?

Also, their TCP results are for an artificially compressed topology, and they themselves admit limited improvement with COPE in a real topology.

Increased RTT could push memory needs to unacceptable levels.

Overall, not sure if adding COPE is the right approach to address the problems they identified. Maybe designing a better MAC for wireless mesh would be more helpful.



Sunday, October 26, 2008

2008-10-07 Readings - Routing in Ad-hoc Networks



A HIGH THROUGHPUT PATH METRIC FOR MULTI-HOP WIRELESS ROUTING

Douglas S. J. De Couto, Daniel Aguayo, John Bicket, Robert Morris.

Summary:


Presents the ETX (expected transmission count) metric. Motivated by the fact that the conventional DSDV hop-count metric is broken - it picks long links with high error rates over short links with low error rates. Routing packets get through in DSDV, but data packets do not. ETX is calculated as 1/(df x dr), where df and dr are the forward and reverse delivery probabilities; their product is the probability that a packet and its acknowledgment are transmitted successfully. df and dr are found using dedicated link probe packets, sent with a known size and average frequency. Can be implemented by modifying DSDV and DSR. Performance gains are best when the network has many lossy links, or transmitters have low power. For a given network, the improvement is best for multi-hop routes. Discrepancies between the size of link probe packets and data packets can decrease performance.
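A worked sketch of the formula (the delivery ratios are made-up numbers, just to show why ETX prefers two clean hops over one lossy hop, where hop count would not):

# Per-link ETX is the expected number of transmissions (including retries)
# needed to deliver a packet and hear its ACK; a route's metric is the sum
# of its links' ETX values.
def etx(df, dr):
    return 1.0 / (df * dr)   # df, dr: forward/reverse probe delivery ratios

# One long lossy link vs. two short clean links between the same endpoints.
long_lossy = etx(0.5, 0.5)                       # 4.0 expected transmissions
two_clean  = etx(0.95, 0.95) + etx(0.9, 0.95)    # ~2.28 total
print(f"1-hop lossy route: {long_lossy:.2f}, 2-hop clean route: {two_clean:.2f}")
# Hop count prefers the lossy 1-hop route; ETX correctly prefers the 2-hop route.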

Discussions & Criticisms:

Still somewhat skeptical about the multi-hop wireless network concept ... Where would such networks be needed, given that we have so much wired infrastructure in place, even in 3rd world countries? Seems only needed for sensor nets and other such niche applications.

As already mentioned in their own evaluation, any time you have a transmission count based thing you would likely have issues with packet size variations. Glad to see they modified the metric to ETT (Estimated transmission time) for Roofnet.

Also, the metric solves specifically the problem of conventional DSDV not finding routes with low loss rates. What about other criteria for evaluating a routing algorithm? Overhead (both traffic and latency)? Stability? Speed of convergence? Handling topology changes (either nodes moving or link quality changing)?

It wasn't immediately clear that, with ETX replacing hop count in DSDV, DSDV preserves its loop-free property.

I think the whole wireless multi-hop routing thing has gone through the typical borrow-first refine-later process. That process is pretty evident in the paper.



A PERFORMANCE COMPARISON OF MULTI-HOP WIRELESS AD HOC NETWORK ROUTING PROTOCOLS

Josh Broch, David A. Maltz, David B. Johnson, Yih-Chun Hu, Jorjeta Jetcheva.


Summary:

Compares the performance of the DSDV, TORA, DSR, and AODV protocols, emphasizing performance under various degrees of mobility. Done in ns-2 with several modifications to take care of wireless link details. Movement patterns have each node move to a random destination at a constant, uniformly distributed speed, then pause for one of a range of discrete pause times. Nodes send at a constant bit rate, with various numbers of senders in various scenarios. Figures of merit include delivery ratio and number of routing packets (overhead). Bottom-line finding: AODV and DSR perform well; DSDV and TORA, not so well.

Discussions & Criticisms:


Routine skepticism re multi-hop wireless networks.

The study emphasizes mobility a lot. Good thing their results extend to pretty long pause times (900 seconds = 15 minutes), or else we wouldn't know the protocols' performance for stable topologies. I would argue they need even longer pause times: 30 sources with a 15-minute pause time means a topology change roughly every 30 seconds - still a very, very non-stable network.

Another interesting axis to plot might be performance vs. the speed at which the nodes move.

Maybe the Roofnet guys need to build a car-roof net and repeat this experiment for a "real" implementation comparison. That would be more convincing than the ns-2 stuff.



Sunday, October 19, 2008

2008-10-02 Readings - Wireless Networks in the Real World


MODELING WIRELESS LINKS FOR TRANSPORT PROTOCOLS

Andrei Gurtov, Sally Floyd

Summary:

Talks about how to model real-world wireless links. A critical topic, because modeling and simulation are precursors to any extensive and/or expensive field tests. Includes discussion of the essential aspects of models, examples of bad models in the literature, modeling various link characteristics, modeling queue management, the effect of mobility, and wireless link vs. transport protocol issues.

Background:

Requires basic 802.11-type wireless knowledge.

Discussions and Criticisms:


The paper covers the science well. More helpful would be an operational checklist of common modeling pitfalls, and a list of how to model each aspect of the network. That way somebody wanting to use wireless link models could quickly reference the essential results of the work. For my own future reference, I make the lists below.

Common modeling pitfalls (may apply for wired as well)

Unrealistic models:
- Running TCP with unrealistically high loss rates.
- Evaluating active queue management by focusing on TCP start-up.
- Assuming regular inter-packet delays.
- Assuming a constant TCP retransmission timer.
- Using deprecated TCP versions.
- Modeling WLAN as duplex.

Modeling a subset of parameter space:
- Evaluate using a single flow.
- Assume a particular link is bottleneck and not try the opposite assumption.
- Assume reserves transmission.

Overly realistic models:
- Focus on transient implementation flaws.

Lack of reproducibility (my routine complaint against a lot of methodological descriptions, far less emphasized today than it was in the early days of modern science):
- Not giving enough modeling details.

Also, a list of the wireless link aspects being modeled (a toy sketch of a few of these follows after the list):

- Error loss & corruption.
- Delay variation.
- Packet reordering.
- On-demand resource allocation.
- Bandwidth variation.
- Asymmetry in bandwidth/latency.
- Queue management.
- Mobility.
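
A toy sketch of how a few of these aspects might appear in a very simple link model (my illustration with made-up parameters, not the paper's models): each packet independently experiences loss, delay variation, and hence possible reordering.

# Each packet is dropped with some probability, delayed by a base delay plus
# random jitter, and packets arrive in delay order (so they may reorder).
import random

def send_over_link(packets, loss_rate=0.02, base_delay=0.020, jitter=0.015):
    delivered = []
    for seq, pkt in enumerate(packets):
        if random.random() < loss_rate:
            continue                                    # error loss
        delay = base_delay + random.uniform(0, jitter)  # delay variation
        delivered.append((delay, seq, pkt))
    delivered.sort()          # arrival in delay order -> possible reordering
    return [(seq, pkt, round(d * 1000, 1)) for d, seq, pkt in delivered]

random.seed(1)
print(send_over_link([f"pkt{i}" for i in range(5)]))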


ARCHITECTURE AND EVALUATION OF AN UNPLANNED 802.11B MESH NETWORK

John Bicket, Daniel Aguayo, Sanjit Biswas, Robert Morris.

Summary:


Talks about the architecture and realization of the Roofnet system. Key aspects of the system: it is totally unplanned - volunteers sign up and get an antenna kit; it uses omnidirectional antennas; and it uses multi-hop routing with the estimated transmission time (ETT) metric, derived from the expected transmission count (ETX) metric, which is presented in a separate paper.

Background:

Probably should read the ETX paper before this one. Also 802.11b knowledge. Maybe something about the different mesh routing algorithms would also help.

Discussions & Criticisms:

Pretty thorough performance characterization. However, most of the results have no baseline for comparison - ok, it's good, but HOW good is it? It's not clear what the baseline of comparison should be either. Some kind of planned network? Directional antennas? Single hop to a gateway? Some mixture of everything?

Tables 4 & 5 are pretty valuable - a comparison of multi-hop and single-hop networks.

Two key insights:
- Transmissions on successive hops of a multi-hop route will interfere with each other.
- 802.11 RTS/CTS is BROKEN. They found RTS/CTS doesn't do what it's supposed to do, and it introduces overhead. The evaluation should be repeated for other, more general 802.11 settings. If the results are reproduced, then we need to give general advice that RTS/CTS should be turned off.

One evaluation that I think they missed - evaluating network performance under heavy load/utilization. Their network is utilized at around 70% - at the Internet gateway, that is - can we push it to heavier loads? Also, the scalability of the gateway: all traffic funnels through the gateway, so is the gateway a bottleneck? This will only become apparent if the network is more heavily loaded.