Tuesday, November 25, 2008

Improving Map-Reduce Performance in Heterogeneous Environments

Andy and Matei's great work over the last year - if you are in the RAD Lab, you simply cannot have missed this talk! Awesome work - smart problem selection and a neat solution!

Hadoop is an open-source implementation of MapReduce. Its implicit assumption is that nodes are homogeneous, and this assumption is increasingly turning out to be wrong as environments like Amazon's EC2 become popular. Hadoop deals with "stragglers" by starting backup copies of their tasks on other machines and taking the result from whichever copy (original or backup) finishes earliest. The authors argue successfully that this is not the way to go.

LATE (Longest Approximate Time to End) is their solution. It bases its decisions on estimated finishing times rather than raw progress rates: progress counters are used to estimate when each task will finish, and the tasks farthest from finishing are speculatively re-executed on fast, idle machines.
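The core heuristic can be sketched in a few lines. This is a toy illustration of the idea, not Hadoop's actual scheduler code; the function and field names are my own, and real LATE also caps the number of speculative tasks and only speculates on sufficiently slow ones.

```python
def estimated_time_left(progress, elapsed):
    """Estimate seconds remaining for a task, assuming its past
    progress rate continues: (1 - progress) / (progress / elapsed)."""
    rate = progress / elapsed  # fraction of work done per second so far
    return (1.0 - progress) / rate

def pick_speculation_candidate(tasks):
    """Pick the task estimated to finish farthest in the future;
    LATE would launch its backup copy on a fast, idle node."""
    return max(tasks, key=lambda t: estimated_time_left(t["progress"], t["elapsed"]))
```

The key point versus Hadoop's original heuristic: a task that is 20% done after 100s (400s left at its current rate) is a far better speculation target than one that is 90% done after 90s (10s left), even though both may look "behind" by a naive progress comparison.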

This paper assumes that all tasks take the same amount of time. A straggler might actually be executing a heavy task that is making it slow... differentiating between this case and a genuinely slow machine could be a nice, principled piece of future work. That might lead to an intelligent assignment mechanism that allocates tasks based on how heavy they are and on the capabilities of the machines.

Also, how much of this is a problem with Hadoop as opposed to MapReduce as such? The right question to ask is: are we solving Hadoop's problems or MapReduce's problems? It looks like Hadoop made a set of sub-optimal assumptions, and we should work more towards identifying the general problems. That said, this probably shows the complexity of handling corner cases in distributed systems...

Policy-Aware Switching Layer

I have talked with Dilip so many times about this work and have used pLayer code myself - very nice!

Middleboxes are here to stay (whether purists like them or not), and datacenters typically contain load balancers, firewalls, SSL offload boxes, etc. PLayer is a proposal for managing middlebox deployments, and the proposal, at heart, is quite simple: configure policies (the set and sequence of middleboxes to traverse) at a centralized controller, which then distributes them to the PSwitches.

I loved the way the paper describes and motivates the problem - section 2.2 was very helpful in conveying the magnitude of the problem and why existing solutions are totally ad hoc and ugly.

The solution is to specify policies that indicate the sequence of middleboxes to traverse for each traffic type. Policies refer only to middlebox types, not instances, enabling fancy load-balancing and resilience techniques. The details are well explained - seamless switchover on policy changes probably takes up a little more space than warranted, but there's no question about its utility.
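The type-vs-instance distinction is the nice bit, and a tiny sketch makes it concrete. The data structures and names below are my invention, not the pLayer code: a policy maps a traffic classifier to an ordered sequence of middlebox *types*, and a concrete instance of each type is picked per flow at forwarding time.

```python
import zlib

# Hypothetical policy table: traffic classifier -> sequence of middlebox types.
POLICIES = [
    {"match": {"dst_port": 80},  "sequence": ["firewall", "load_balancer"]},
    {"match": {"dst_port": 443}, "sequence": ["firewall", "ssl_offloader", "load_balancer"]},
]

# Available instances of each type (illustrative names).
INSTANCES = {
    "firewall": ["fw1", "fw2"],
    "ssl_offloader": ["ssl1"],
    "load_balancer": ["lb1"],
}

def next_hops(packet):
    """Return a concrete middlebox chain for this packet, or [] if no
    policy matches (plain forwarding)."""
    for policy in POLICIES:
        if all(packet.get(k) == v for k, v in policy["match"].items()):
            # Hash the flow so every packet of a flow hits the same instances;
            # crc32 is used here only because it is deterministic across runs.
            key = f"{packet.get('src_ip')}:{packet.get('dst_port')}".encode()
            return [INSTANCES[t][zlib.crc32(key) % len(INSTANCES[t])]
                    for t in policy["sequence"]]
    return []
```

Because the policy names only types, taking `fw2` down for maintenance or adding `fw3` requires no policy change, which is exactly the flexibility the paper is after.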

I had a question about the evaluation - it seemed to evaluate something PLayer wasn't solving! The reason PLayer came into existence was that middlebox configuration is a hard problem... so the ideal evaluation in such a paper has to come from some sort of study with actual network operators. Evaluating latency, throughput, etc. on a software router really doesn't indicate much.

Overall, a very nice paper...

Thursday, November 20, 2008

Scalable Application Layer Multicast

This paper starts off lamenting that network-layer multicast solutions have not been adopted even a decade after being proposed, hence the push for application-level solutions. Application-level multicast is obviously less efficient than network-level multicast, and this inefficiency can be quantified in terms of stress (duplicate packets on a link) and stretch (sub-optimal paths to reach a node).

NICE pushes for a tree hierarchy. Each layer is divided into clusters, and the leader of each cluster also belongs to a cluster in the next higher layer. The cluster leader is chosen to have the minimum distance to all the other hosts in the cluster. This choice is important in ensuring that new members are quickly able to find their position in the hierarchy.
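The leader choice is essentially picking the cluster's graph-theoretic center. Here's a toy, centralized sketch of that criterion (my own formulation; the actual protocol is distributed and works off measured inter-host latencies):

```python
def choose_leader(members, dist):
    """Pick the cluster's center: the member whose maximum distance
    to any other member is smallest.

    members: list of host ids
    dist:    function (a, b) -> latency estimate between two hosts
    """
    def eccentricity(m):
        return max(dist(m, other) for other in members if other != m)
    return min(members, key=eccentricity)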

One immediate question that pops up: why a tree and not something else? While the authors can claim tree-based topologies are the common choice in the multicast literature, it would still have been useful to argue the point. Overall, this paper felt unnecessarily complicated.

Their results compare against Narada, and Narada was probably the new kid on the block then... but is it really the gold standard?

I personally would vote against keeping this paper and favour the Narada or Overcast paper (the Overcast paper is extremely lucid and comes across as highly practical).

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

(One day when I grow up, I want to write a paper with a title longer than this! :P )

This is a beautiful paper describing why TCP is a bad model for achieving reliable multicast. It is an enormous burden, and sometimes impossible, for the sender to keep state for each receiver in the multicast group. Throw in high churn, with receivers joining and dropping out, and you have an obvious scalability problem. The logical solution is to make the receivers keep state and make them responsible for reliability.

In-order delivery is the second major selling point of TCP, and it inarguably comes at a high premium. This paper proposes doing away with it and letting the application deal with ordering, using numbering in application data units (ADUs). A simple but effective solution: if a recipient notices a "hole", it sends out a request to repair that hole. Probabilistic delays on these requests and repair messages guard against an implosion.
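The receiver-side mechanics can be sketched briefly. This is an illustrative toy, not SRM's actual implementation: the class name, timer constants, and message handling are my assumptions (real SRM scales its timers by the estimated distance to the source).

```python
import random

class SrmReceiver:
    """Toy receiver: detects holes in the ADU sequence, schedules a
    repair request after a random delay, and backs off if it overhears
    another receiver requesting the same hole (implosion avoidance)."""

    def __init__(self):
        self.received = set()
        self.pending_requests = {}  # missing seq -> time the request fires

    def on_data(self, seq, now):
        self.received.add(seq)
        # Any gap below the highest sequence seen so far is a hole.
        for missing in range(min(self.received), max(self.received)):
            if missing not in self.received and missing not in self.pending_requests:
                # Random backoff spreads requests from many receivers over time.
                self.pending_requests[missing] = now + random.uniform(1.0, 2.0)

    def on_overheard_request(self, seq, now):
        # Someone else already asked: reschedule (suppress) our own request.
        if seq in self.pending_requests:
            self.pending_requests[seq] = now + random.uniform(2.0, 4.0)
```

The same suppression trick applies to repairs: any member holding the data can answer, and randomized delays mean usually only one does.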

I liked the overall design philosophy in this paper - one-size-fits-all is not sensible, and the paper doesn't try to do too many things. I had my questions about the scalability of this approach, but I am dead sure that multicast is not (and will not be!) deployed on a large scale anywhere. The largest setting I can imagine is within an enterprise for streaming talks... and these are not particularly challenged networks. I think SRM should more than do the job. Nice and neat!

Tuesday, November 18, 2008


The Data-Oriented Transfer (DOT) service is an architecture that aims to separate data transfer from the control layer. This is related to PANTS, my class project - even the APIs they propose are similar to what we came up with.

I like the motivating scenarios in the paper - all of them seem practically useful to me. The authors come up with a flexible solution, but this obviously reads more like a presentation of a design space than a concrete solution. I like the way this paper goes about its job.
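The transfer/control split is easy to sketch. The API below is my illustration, not DOT's actual interface: the sender hands bulk data to the transfer service and gets back a content-derived object id; only the small id travels over the application's control channel, and the receiver fetches the bytes separately. Content-derived ids are also what makes cross-application caching possible.

```python
import hashlib

class TransferService:
    """Toy stand-in for a DOT-style transfer service. In the real system,
    get() would fetch via pluggable transfer mechanisms; here a dict
    stands in for the cache/store."""

    def __init__(self):
        self.store = {}  # oid -> bytes

    def put(self, data: bytes) -> str:
        oid = hashlib.sha1(data).hexdigest()  # id derived from content
        self.store[oid] = data                # stored once, shareable
        return oid

    def get(self, oid: str) -> bytes:
        return self.store[oid]
```

The cross-application benefit falls out directly: if a mail server (Postfix) and a file-sharing app both put the same attachment, they get the same oid and the bytes are cached and transferred once.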

Their evaluation seems almost apologetic to me - it is there mostly to show that performance does not really crap out. I would have liked a better, even if only analytical, evaluation of the possible benefits of, say, cross-application caching. They could have used sample traces from a set of users to see if there are benefits to doing this. I am sure people will criticize this paper on this count... and that would be sad.

I give them credit for integrating DOT with Postfix - for what it's worth, I always love a real implementation in a paper, even if its novelty value isn't super-high. Being able to prove that things work with your system is quite a big statement.

I vote for keeping it in.


Traditionally, network protocols have been designed with the assumptions that delay and loss are not excessive and that an end-to-end path exists more often than not. DTNs violate these assumptions, and the authors give a list of example scenarios in the genre of mobile ad hoc networks. My doubts about DTNs "becoming important" persist - the scenarios listed here seem outlandish, and I am not convinced they are becoming prevalent or ever will. This reminds me of the din surrounding the work on MANETs... all exciting research problems but (I apologize) contrived scenarios. The standard approach of using a special proxy/gateway to connect to these networks, along with specialized protocols internally, seems to be more than enough. Their argument about not being able to use these networks as "transit" doesn't convince me. Don't get me wrong - I totally understand that the problem is immensely challenging in the contexts where these networks operate. Anyways, let's read on...

The paper does a good job of listing the challenges DTNs pose and how they are markedly different from traditional networks. This paper was published in 2003, and I am not sure if there were enough deployments and data, but I would have loved to see some data backing their challenges - what are the communication patterns, how often, how many bytes, etc.

Their architecture is pretty solid, and their naming and forwarding schemes are nice. The best thing is that it is entirely generic, but then it really isn't optimized for any particular network. And my guess is that the moment you start optimizing it for different scenarios, we will end up in a state not too different from where we are now!

This is a good overview paper laying down a nice framework with all the possible issues (though sometimes it goes overboard, e.g., with security - now that clearly is totally unnecessary in these settings!). But I don't think this paper finds a place in a class teaching networking fundamentals.

Thursday, November 6, 2008

Middleboxes No Longer Considered Harmful

This paper points out that middleboxes and Internet architects have never been happy with each other, and given that middleboxes have become enormously useful, integrating them into the Internet architecture is important. To that end, the authors propose the Delegation-Oriented Architecture (DOA).

DOA achieves its goal using a flat namespace of self-certifying names, plus the ability for the sender and receiver to specify the intermediaries that process their packets. Packets are addressed to endpoint identifiers (EIDs), which are resolved through a global resolution service.
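A small sketch of the resolution idea (record formats and function names are my assumptions, not DOA's wire protocol): an EID is self-certifying because it is a hash of the endpoint's public key, and an EID can resolve either to an IP address or to another EID, so resolution may chain through the intermediaries an endpoint has delegated to.

```python
import hashlib

def make_eid(pubkey: bytes) -> str:
    """Self-certifying flat name: hash of the endpoint's public key,
    so anyone can verify a claimed name against the key."""
    return hashlib.sha1(pubkey).hexdigest()

# Stand-in for the global resolution service (a DHT in the paper):
# eid -> ("ip", address) or ("eid", next_eid) for a delegated intermediary.
RESOLVER = {}

def resolve(eid, max_hops=8):
    """Follow EID -> EID delegations until an IP address is found."""
    for _ in range(max_hops):
        kind, value = RESOLVER[eid]
        if kind == "ip":
            return value
        eid = value  # delegated: resolve the intermediary next
    raise RuntimeError("delegation chain too long")
```

Every packet destination potentially triggering a lookup like this is exactly where my performance concern about the DHT dependence comes from.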

The reason for doing this seems unconvincing. Architectural cleanliness is a good thing, but given that NATs and firewalls are already deployed and work fairly well, deploying DOA's new addressing format etc. seems unnecessary. At least, I am not convinced why someone would go for it. Peer-to-peer applications have also worked around this problem through an externally accessible machine. Even if you disregard all that, the whole system has serious performance concerns, being dependent on DHT resolution.

Overall this paper has a highly "academic" feel to it and I wouldn't vote to keep it on the list.