Thursday, November 13, 2008

2008-11-13 Readings - Measurement and Tracing


END-TO-END INTERNET PACKET DYNAMICS

Vern Paxson

Summary:

Presents results from large scale Internet tracing study. Order 20,000 TCP bulk transfers of 100 KB between 35 Internet sites. Measurements at Poisson intervals to extract time averages. Analysis network "pathologies" including out of order delivery, packet replication, and packet corruption that's not due to noise. Also finds the bottleneck bandwidth for each pair of links. Presents distributions of packet loss and packet delays. Methodological refinements over previous studies in various areas.

Discussions & Criticisms:

Does Vern always do his research by himself?

The bottleneck link findings - I would be more interested in where the bottleneck links are. Knowing where they are would allow us to figure out ways of relieving/removing the bottlenecks. Also what are T1 circuits?

The packet loss rates are quite high. And the claim is that TCP works nevertheless?

Figure 9 looks very similar to the graphs we have from DC Metro measurements. Probably worth going back to those graphs again and see if it's indeed the same phenomenon.

Interesting how the data is from 1995, but the publication is in 1999. Does it take 4 years to crunch through the data?

Internet in 1995 is order 5 million hosts. In 2008 it's order 500 million hosts. How big of an effort would it be to repeat a similar measurement study?

Also, 35 hosts out of 5 million is almost nothing. Vern noted several findings different from other studies in the same things. Fundamental shortcoming of any attempt to get a snap shot at the Internet?

Packet replication and corruption - hard to imagine what could happen ...

Infinite variance in loss outage duration! I wonder if that's still the case.

The Poisson sampling thing is really nice for measuring long term phenomena with a small amount of data. Don't see that often anymore. Did people forget the early 1990s papers?


X-TRACE: A PERVASIVE NETWORK TRACING FRAMEWORK

Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion Stoica

Summary:

Presents the architecture and implementation of the X-Trace infrastructure. Allows tracing of causal interactions across multiple network layers. Does so through book keeping in metadata that gets propagated through the network stack. Uses pushDown() and pushNext() operations for this purpose. Metadata easily goes into extensible headers for HTTP/TCP/IP protocols and the like. Offline reconstruction of call trees. Takes care of administrative control - people who request X-Trace doesn't necessarily see the traced data - data goes to the admin of each administrative domain. Needs modifications to protocol/clients/nodes.

Discusisons & Criticisms:

Requires modifying various protocols/clients/nodes. Already noted multiple times in the paper as a shortcoming. Probably a necessary thing if we were to really get comprehensive traces. I think the whole ugly business of tracing is not like ... shortcoming of the trace tools that they must modify various things, but a shortcoming in the original network design architecture that accounting/tracing was a very low priority. Sure the top priority of the DARPA project was blah blah survivability, but tracing/accounting/debugging capabilities shoudl arguably not even be a priority, but a general system design principle. Without these capabilities, we end up having systems that are hard to understand and hard to improve.

One could give the counter argument - screw accounting and tracing, what's important is having systems that work, and who cares about understanding why they work. In that view we'll often end up re-inventing the wheel and stumbling from revolutionary change to revolutionary change, with very little common evolutionary theme connecting it all.

Real life is somewhere in the middle .... Hence X-Trace I guess ...

I don't understand fundamentally what's preventing us from having a task graph? It's just metadata and task identifiers, right? So the metadata could be designed as extensible and carry multiple task identifiers? Also do something like XML name spaces and have each X-Trace metadata conform to a common format, but extensible beyond that? Could be nice for future evolution as well.

Paper did note that the key measurement for X-Trace's success is when it can help real life admins and developers debug/maintain their stuff. So one year since the paper was published, any happy stories in that regard?



No comments: