Tuesday, November 25, 2008
Improving Map-Reduce Performance in Heterogeneous Environments
Hadoop is an open-source implementation of MapReduce. Its scheduler implicitly assumes that nodes are homogeneous, and this assumption is increasingly turning out to be wrong as environments like Amazon's EC2 become popular. Hadoop deals with "stragglers" by speculatively re-executing their tasks on other machines and taking the result from whichever copy (original or backup) finishes first. The authors argue convincingly that, in heterogeneous settings, this is not the way to go.
LATE (Longest Approximate Time to End) is their solution. It bases speculation decisions on estimated finishing times rather than raw progress scores: progress counters are used to estimate when each task will finish, and the tasks farthest from finishing are speculatively executed on idle fast machines.
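The core estimate is simple to sketch. The following toy code (my own illustration, not Hadoop's or the paper's actual implementation; all names are mine) shows the extrapolation LATE relies on: compute a progress rate from the progress score and elapsed time, then speculate the task with the longest approximate time to end.

```python
# Toy sketch of LATE's time-left heuristic (illustrative names, not Hadoop's API).
def time_left(progress, elapsed):
    """Estimate seconds until a task finishes.

    progress: progress score in (0, 1]; elapsed: seconds running so far.
    LATE extrapolates: rate = progress / elapsed, time_left = (1 - progress) / rate.
    """
    rate = progress / elapsed          # fraction of work completed per second
    return (1.0 - progress) / rate

def pick_task_to_speculate(tasks):
    """Among candidate stragglers, pick the one farthest from finishing."""
    return max(tasks, key=lambda t: time_left(t["progress"], t["elapsed"]))

tasks = [
    {"id": "t1", "progress": 0.9, "elapsed": 90},   # fast task, nearly done
    {"id": "t2", "progress": 0.2, "elapsed": 80},   # slow straggler
]
print(pick_task_to_speculate(tasks)["id"])  # t2: largest estimated time to end
```

Note how a raw-progress heuristic would have flagged neither or both; the rate-based estimate singles out the task that will actually hold up the job.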
This paper makes the assumption that all tasks take the same amount of time. A straggler might actually be executing a heavy task that makes it slow...differentiating between this case and genuinely slow machines could be nice, principled future work. This might lead to an intelligent assignment mechanism that allocates tasks based on how heavy they are and on the capabilities of the machines.
Also, how much of this is a problem with Hadoop as opposed to MapReduce as such? The right question to ask is: are we solving Hadoop's problems or MapReduce's problems? It looks like Hadoop made a set of sub-optimal assumptions, and we should work more towards identifying the general problems. That said, this probably shows the complexity of handling corner cases in distributed systems...
Policy-Aware Switching Layer
Middleboxes are here to stay (whether purists like them or not), and datacenters typically have load balancers, firewalls, SSL offload boxes etc. in them. PLayer is a proposal for middlebox deployment, and the proposal, at heart, is quite simple: configure policies (the set and sequence of middleboxes to traverse) at a centralized controller, which then distributes them to the PSwitches.
I loved the way the paper goes about describing and motivating the problems - section 2.2 was very helpful in conveying the magnitude of the problem and why existing solutions are totally ad hoc and ugly.
The solution is to specify policies that indicate the sequence of middleboxes to traverse for each traffic type. Policies refer only to middlebox types and not instances, enabling fancy load-balancing and resilience techniques. The details are well explained - seamless switchover on policy changes probably takes up a little more space than warranted, but there is no question about its utility.
I had a question about the evaluation - it seemed to evaluate something that PLayer wasn't solving! The reason PLayer came into existence was that middlebox configuration is a hard problem... and the ideal evaluation in such papers has to come from some sort of testing with actual network operators. Evaluating latency, throughput etc. on a software router really doesn't indicate much.
Overall, a very nice paper...
Thursday, November 20, 2008
Scalable Application Layer Multicast
NICE pushes for a tree hierarchy. Each layer in the tree is divided into clusters, and the leader of each cluster also belongs to the next higher layer. The cluster leader is chosen such that it has the minimum distance to all the other hosts in the cluster. This choice is important in ensuring that new members are quickly able to find their position in the hierarchy.
One immediate question that pops up is: why a tree and not something else? While the authors can claim tree-based topologies to be the common thing in the multicast literature, it would still have been useful to argue for this choice. Overall, this paper felt unnecessarily complicated.
Their results are against Narada, and probably Narada was the new kid on the block then...but is it really the gold standard?
I personally would vote against keeping this paper and favour the Narada or Overcast paper (the Overcast paper is extremely lucid and comes across as highly practical).
A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing
This is a beautiful paper describing why TCP is a bad model for achieving reliable multicast. It is an enormous burden, and sometimes impossible, for the sender to keep state for each receiver in the multicast group. Throw in high churn with receivers joining and dropping out, and you have an obvious scalability problem. The logical solution is to make the receivers keep state and make them responsible for reliability.
In-order delivery is the second major selling point of TCP, and it is inarguably achieved at a high premium. This paper proposes doing away with it and letting the application deal with ordering using numbering in application data units (ADUs). A simple but effective solution. If a recipient notices a "hole", it sends out a request for repairing that hole. Probabilistic delays in these requests and repair messages guard against an implosion.
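The hole detection and the randomized request timer can be sketched in a few lines. This is a toy illustration of the SRM-style mechanism (my own names and simplifications; real SRM also scales the back-off by the receiver's distance from the original source):

```python
import random

def find_holes(received_seqs):
    """Return missing ADU sequence numbers between the min and max received.

    A receiver notices a "hole" when numbering in the ADUs skips values."""
    lo, hi = min(received_seqs), max(received_seqs)
    return [s for s in range(lo, hi + 1) if s not in received_seqs]

def request_delay(rtt_estimate, c1=1.0, c2=1.0):
    """Randomized back-off before multicasting a repair request.

    Drawn uniformly from [c1*rtt, (c1+c2)*rtt]; if another receiver's request
    for the same hole is heard first, this timer is cancelled (suppression),
    which is what guards against request implosion."""
    return random.uniform(c1 * rtt_estimate, (c1 + c2) * rtt_estimate)

received = {1, 2, 3, 5, 6, 9}
print(find_holes(received))   # [4, 7, 8]
```

Repair replies use the same randomized-timer-plus-suppression trick, so that one nearby host answers instead of the whole group.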
I liked the overall design philosophy in this paper - one-size-fits-all is not sensible and not trying to do too many things. I had my questions about the scalability of this approach but I am dead sure that multicast is not (and will not be!) deployed on a large-scale anywhere. The largest I can imagine is within an enterprise for streaming talks...and these are not particularly challenged networks. I think SRM should more than do the job. Nice and neat!
Tuesday, November 18, 2008
DOT
I like the motivating scenarios in the paper - all of them seem practically useful to me. The authors come up with a solution that is flexible but obviously this seems like a presentation of a design space as opposed to any concrete solution. I like the way this paper goes about its job.
Their evaluations seem almost apologetic to me - they exist more to show that performance does not really crap out. I would have liked a better, if only analytical, evaluation of the possible benefits of, say, cross-application caching. They could have used some sample traces from a set of users to see if there are benefits to doing this. I am sure people will criticize this paper on this count...and that would be sad.
I give them credit for integrating DOT with Postfix - for what it's worth, I always love a real implementation in a paper, even if its novelty value isn't super-high. To be able to prove that things can work with your system is quite a big statement.
I vote for keeping it in.
DTN
The paper does a good job of listing the challenges DTNs pose and how they are markedly different from traditional networks. This paper was published in 2003 and I am not sure if there was enough deployment experience and data then, but I would have loved to see some data backing their challenges - what are the communication patterns, how often do nodes communicate, how many bytes etc.
Their architecture is pretty solid and their naming and forwarding schemes are nice. The best thing seems to be that this is entirely generic, but then it really isn't optimized for any particular network. And my guess is that the moment you start optimizing it for different scenarios, you will end up in a state not too different from where we are now!
This is a good overview paper laying down a nice framework with all the possible issues (though sometimes it goes overboard, e.g., with security - now that clearly is totally unnecessary in these settings!). But I don't think this paper finds a place in a class teaching networking fundamentals.
Thursday, November 6, 2008
Middleboxes No Longer Considered Harmful
DOA achieves its goal using a flat namespace of self-certifying names and the ability for the sender and receiver to specify the intermediaries that process their packets. Routing between end-point identifiers (EIDs) is done using a global resolution service.
The reason for doing this seems unconvincing. Architectural cleanliness seems a good thing but given that NATs and firewalls have been deployed and work fairly well, it seems unnecessary to deploy this new addressing format etc. in DOA. At least, I am not convinced why someone would go for this. Peer-to-peer applications have also fixed this problem through an externally accessible machine. Even if you were to disregard all this, the whole system has serious performance concerns dependent on the DHT resolution.
Overall this paper has a highly "academic" feel to it and I wouldn't vote to keep it on the list.
i3
Internet Indirection Infrastructure (i3) is a neat idea that presents a communication abstraction based on an overlay. The key notion is decoupling the act of sending from the act of receiving. The paper is motivated by the fact that while IP has served the Internet pretty well in its original form, adding services like multicast and mobility to it has been a problem, with inelegant solutions.
In comes i3! In i3, nodes interested in receiving packets insert a trigger into an i3 overlay node. Packets are sent to identifiers and appropriately forwarded to interested receivers – delivery is best-effort, as in the Internet. The overlay consists of a set of i3 nodes, each responsible for a chunk of identifiers; the mapping of identifiers to nodes is done through Chord. Every packet is routed to the i3 node responsible for the packet’s identifier, which in turn forwards it to the interested receivers.
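The rendezvous abstraction is easy to sketch. The following toy code (my own illustration; real i3 stores each trigger on the Chord node responsible for its identifier, and triggers soft-expire unless refreshed) shows how sending to identifiers instead of hosts gives you multicast and mobility almost for free:

```python
# Toy sketch of i3's trigger-based indirection (illustrative, not the real API).
triggers = {}   # identifier -> set of receiver addresses

def insert_trigger(identifier, receiver_addr):
    """A receiver interested in `identifier` inserts (id, addr) into the overlay.
    Mobility = the receiver simply re-inserts the trigger with its new address."""
    triggers.setdefault(identifier, set()).add(receiver_addr)

def send_packet(identifier, payload, deliver):
    """Senders address packets to identifiers, not hosts; the responsible i3
    node forwards a copy to every matching trigger (best-effort delivery)."""
    for addr in triggers.get(identifier, ()):
        deliver(addr, payload)

insert_trigger("id42", "10.0.0.1")
insert_trigger("id42", "10.0.0.2")   # a second receiver: multicast for free
out = []
send_packet("id42", b"hello", lambda addr, p: out.append(addr))
print(sorted(out))  # ['10.0.0.1', '10.0.0.2']
```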
I would have liked a bit of discussion on what naming service is required for i3 – mapping names to identifiers – but I guess it shouldn’t be too different from DNS. Routing inefficiency is a major problem with this overlay and, if it were deployed at a large scale, could potentially be a show-stopper. The solution for it is sort of hand-wavy, and I didn’t like that.
Given that the i3 nodes are now crucial to the communication, churn in the overlay can potentially leave TCP extremely confused! Handling the dynamics already present is hard enough – i3 just adds a lot more, with overlay nodes that can come and go, and possibly be overloaded, causing even more dynamicity.
I liked the security analysis. The anonymity argument made me uncomfortable. Contrary to their argument, tracing back attackers is a much more important problem than hiding your own identity. In the earlier Internet, attackers at least had to spoof their addresses (and fight against techniques that prevent that) to avoid being traced back – with i3 they wouldn’t even have to do that. I think this is a fairly serious concern. On the other hand, the fact that the receiver can remove a trigger at will makes it less vulnerable to DoS attacks.
All these neat ideas make me wonder what exactly is the E2E principle relevant for and why is it such a big deal in the networking world?! Fine paper, a keeper!
Tuesday, November 4, 2008
DNS Performance and the Effectiveness of Caching
Over a third of all DNS lookups were not answered successfully: 23% of the client lookups in the MIT trace failed to elicit any answer, while 13% of lookups resulted in an error. This seems a fairly high number...wonder what it would be today. What does "failed to elicit any answer" mean? Packet loss?
As expected, name popularity is Zipf-distributed. Hence aggressive caching does not necessarily buy us much. That said, I wonder if traces from a university are generic enough to support this comment. Maybe corporations show a different trend? Caching should definitely be useful in scenarios where the set of clients is larger and more varied (e.g., ISPs).
This paper does seem to point out some interesting results about DNS performance and I would vote for keeping it in the reading list.
Development of the Domain Name System
DNS had the following design goals - do at least as much as HOSTS.TXT, allow the database to be maintained in a distributed manner, no obvious size restrictions on names, inter-operability and tolerable performance overheads. Name servers and resolvers are the two main components of DNS.
Caching is the main trick behind why DNS works as well as it does. This paper acknowledges that caches could be a point of mischief. Of course, they have been proved right: cache poisoning is quite a scary thing, and it has rightfully become an important area of research. While this paper is neat and I enjoyed reading it, I would vote for replacing it with some paper on cache poisoning.
Thursday, October 30, 2008
P2P Survey
Some of the approaches like a central database and broadcast are obviously out of the window for the scale that is being aimed at by p2p systems. But given this is a survey paper, I would have wished the authors gave them a little more credit – these approaches do work in many limited but useful settings. The dismissal of the “superpeer” idea irked me. I think in practice this approach should work very well and might actually provide better “guarantees” if superpeers were maintained by some commercial entity. To me, the p2p era became a “bubble” mainly because such un-sexy approaches weren’t given much importance (fair enough, you can’t get them published!).
The lookup problem has been addressed using ring-based approaches (Chord) and tree-like data structures (Pastry, Tapestry and Kademlia). Differences between them include the distance functions (which vary in being symmetric and/or unidirectional), the cost of node join/failure operations, and topology-aware routing.
This whole space seemed a little disconnected from reality – who really is the customer for such smart and sophisticated lookup schemes? At some point, it comes across as over-optimization and complexity that we can do without. That said, I think there were quite a few positives and overall an interesting area.
I would like this paper to be substituted with some paper that shows a real system based on any of these schemes. If a survey paper is a must, my vote is for “The impact of DHT routing geometry on resilience and proximity” at SIGCOMM 2003.
Chord
Chord is based on consistent hashing, which has the nice property of balancing values across the identifier space. In my opinion, showing the community the value of consistent hashing in the then-important p2p space (though I believe it was a touch magnified) was a crucial contribution. Each node and key is assigned a unique identifier by the consistent hashing function. I found the representation of the space as a “ring” to be quite beautiful. Every node is responsible for a range of key values and keeps pointers (in its finger table) to nodes at exponentially increasing distances. Queries can be resolved in O(log n) hops. A beautiful, intuitive scheme – genius!
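The successor rule at the heart of the ring is worth seeing in code. This toy sketch (my own; it skips finger tables and does a direct lookup, and uses a 16-bit space instead of Chord's 160-bit SHA-1 space) shows how each key is owned by the first node clockwise from it:

```python
import bisect
import hashlib

M = 2 ** 16   # toy identifier space (Chord uses 2^160 with SHA-1)

def ident(name):
    """Hash a node name or key into the identifier space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % M

def build_ring(node_names):
    """The 'ring' is just the nodes sorted by identifier."""
    return sorted((ident(n), n) for n in node_names)

def successor(ring, key):
    """Owner of `key`: the first node whose id >= hash(key), wrapping around."""
    ids = [i for i, _ in ring]
    pos = bisect.bisect_left(ids, ident(key)) % len(ring)
    return ring[pos][1]

ring = build_ring(["nodeA", "nodeB", "nodeC", "nodeD"])
owner = successor(ring, "some-key")
print(owner)   # one of the four nodes, chosen deterministically by the hash
```

Finger tables simply let a node jump to the successor in O(log n) hops instead of walking the ring linearly; the ownership rule above stays exactly the same.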
While Chord in its original form can be faulted for not considering the underlying topology in its allocation, we should remember that this was the beginning of the p2p storm. Follow-on work has been published where people take such optimizations into account too. An interesting question is what happens to the guarantees of consistent hashing when such optimizations take place – the allocation is no longer completely under its control.
Chord has been used in multiple contexts and is also applicable in datacenters. While it can be argued that storing state isn’t that big a deal, I still think there is value in storing less state and being more resilient to node failures and key redistribution (given that DCs mostly have machines with commodity hardware).
I would have liked its results to be compared to Pastry and OceanStore; the contrasting in the related work section came across as hand-waving (“other details”).
Overall, I think it was a very nice paper (I believe it is the most cited in the last decade or something) and is an obvious keeper!
Tuesday, October 28, 2008
Active network vision and reality
This paper is based on observations and feedback from the ANTS system. The system uses capsule-based forwarding. Capsules contain code that can either be actually present in the packet or referenced locally (taking advantage of caching). Accessibility is an important goal: code from untrusted users should not do any harm to users of other services. The authors discuss techniques to guard against malicious code but also do some hand-waving by saying that this is analogous to malicious users in the Internet monopolizing bandwidth. It sort of seems to be saying, “There are bad guys anywhere you go…that doesn’t mean dealing with them is the highest priority”. Not the most convincing, given that the proposal is to run arbitrary code on other machines.
For the most part, I tried not to cloud my reading with my skepticism about active networks. Were there any useful concepts that came out of active networks that people use in other areas? For example, did network management and upgrades become easier and more elegant with active networks? That would be a good discussion to have in class: while active networks as a whole did not make much progress, which of their concepts became useful?
I didn’t like this paper but I would vote for keeping it on the reading list for that specific reason. People should have the chance to read it and make their call.
Resilient Overlay Networks
I liked the fact that RON gives applications control over the routing protocol. Applications can register a RON router and a RON membership manager that implement the routing protocol and the consequent decisions. This spirit is similar to the class project we are working on – ask what the application needs and make decisions about network interfaces accordingly. While this might not be ideal for every single application, the ceding of control is a fundamental shift; applications that do not want it can use a default module. That said, I am not a fan of frameworks/architectures that require applications to be re-written. Also, an interesting extension would be to let applications monitor custom parameters, and not just the throughput, latency etc. that are available by default.
Their motivating scenario, where an ISP deploys a RON to provide better customer service, seemed compelling. I wonder if ISPs actually use it – unlike many other motivating scenarios for overlay networks, this one is actually motivating! :-) But I think they blame BGP a little too much - it's the mess that BGP gets into with policies that sometimes results in its sorry performance. It is not clear how RON interacts with BGP policies...
The scalability aspect (or the lack of it) raises some questions. What happens if RON actually becomes very popular and everybody deploys one? It is not clear if aggressive probing by the numerous RONs would do the network any good. I suspect not, but an evaluation that points out the cases where it won't would be nice to see. Another aspect is fairness - isn't RON being unfair to non-RON users by being so aggressive?
Their result that two-hop paths are good enough to route around most outages seems surprising and non-obvious. I generally prefer papers that also identify the worst-case scenarios where the system would not work.
Overall, I have no doubt that this paper should be retained! It was a good read and introduces many important points and concepts. I get a feeling that most of these important papers that we are reading are out of MIT and that is noteworthy ;-)
Thursday, October 9, 2008
XORs in The Air: Practical Wireless Network Coding
Ah, a paper with a mechanism true to its name! Routers use the broadcast medium in wireless networks to learn which packets their neighbors have received and encode packets accordingly to minimize transmissions. The idea is clearly explained in Figure 3 of the paper. Given the knowledge of which nodes have received which packets, the router can encode (XOR) packets together and rely on the nodes’ ability to decode them. The protocol, COPE, uses the ETX metric to estimate the probability that a given node has received a given packet. I didn’t quite understand their explanation of why reception reports alone won’t work. The authors mention that during periods of high congestion, reception reports are likely to get lost. But arguably, the potential for any smart encoding is going to be limited under high contention, as the delivery ratios are going to be very low too (estimating them dynamically is another major headache!).
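The XOR trick itself, as in Figure 3, fits in a few lines. This is a toy illustration of the coding idea (my own framing; real COPE XORs multiple packets, pads to a common length, and headers carry the IDs of the packets coded together): a relay can send p1 XOR p2 in one broadcast if each next hop already holds the "other" packet and can cancel it out.

```python
def xor_bytes(a, b):
    """XOR two equal-length packets byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

# Alice and Bob exchange packets through a relay. The relay, instead of two
# unicast transmissions, broadcasts a single coded packet:
p1 = b"ALICE->BOB"        # packet the relay must deliver to Bob
p2 = b"BOB->ALICE"        # packet the relay must deliver to Alice
coded = xor_bytes(p1, p2)  # one broadcast replaces two transmissions

# Bob already has p2 (he sent it), so he decodes p1:
print(xor_bytes(coded, p2))   # b'ALICE->BOB'
# Alice already has p1, so she decodes p2:
print(xor_bytes(coded, p1))   # b'BOB->ALICE'
```

The saved transmission is exactly the "coding gain" the paper measures; the relay's job is figuring out, from overheard traffic and reception estimates, when such an opportunity exists.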
The coding+MAC gain of COPE was interesting. MAC protocols aim to achieve fairness among all the nodes even though the load on the routers and the end-nodes is vastly different. COPE reduces the effects of this “unfair” fairness (!) and also helps drain the router queues faster.
COPE increases the throughput of UDP flows significantly, but much less so for TCP flows.
I like COPE better than ExOR as it seems applicable in a more practical scenario. We should definitely keep it in the reading list.
ExOR: Opportunistic Multi-Hop Routing for Wireless Networks
This paper uses the inherent broadcast nature of wireless networks to increase the throughput of multi-hop wireless networks. I think the idea of late binding in routing decisions is very smart. Traditional routing protocols choose the best route beforehand. ExOR broadcasts every packet with a “forwarder list” containing the set of nodes that are likely to be closest to the destination. But since other nodes hear the packet too, we inherently end up with redundancy in routes.
ExOR operates on batches of packets. The source node includes a list of candidate forwarders for each packet, essentially an ordered list of nodes prioritized by their distance to the destination. All nodes that receive the packets wait till the end of the batch. Using a timer for synchronization, nodes transmit the packets they have received in order of priority. This keeps moving forward till the packets reach the destination. The forwarder list is estimated by the source using a “link-state” view of the entire network; the metric used is ETX. ETX ensures that this seemingly crazy method of broadcasting at every step makes forward progress towards the destination. After 90% of the transfer is complete, ExOR falls back on traditional routing mechanisms. Overall, I think this is a very smart idea, and the fact that they got it to work on real networks is impressive.
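The priority rule can be sketched as follows. This is my own simplified illustration (real ExOR uses per-node timers and batch maps gossiped in packet headers to reach this agreement in a distributed way, not an omniscient function): among the candidates that actually heard a packet, the one closest to the destination by ETX forwards it, and the rest suppress their copies.

```python
def forwarder_for(packet_id, candidates, received):
    """Decide who forwards one packet of a batch.

    candidates: list of (node, etx_to_destination) pairs from the source's
                forwarder list; lower ETX = higher priority.
    received:   node -> set of packet ids that node heard this batch."""
    for node, _etx in sorted(candidates, key=lambda c: c[1]):
        if packet_id in received.get(node, set()):
            return node          # highest-priority receiver forwards
    return None                  # nobody heard it; retransmit next round

candidates = [("n1", 1.2), ("n2", 2.5), ("n3", 4.0)]
received = {"n2": {7}, "n3": {7}}          # n1 missed packet 7
print(forwarder_for(7, candidates, received))  # n2 forwards; n3 suppresses
```

The redundancy is the point: any lucky long-distance reception skips hops, and the priority ordering prevents duplicate forwarding from eating up the gain.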
By design, this mechanism increases latency. Hence it is not suited for streaming, web etc. Given that the only practical example of a multi-hop network that I know of is RoofNet (or networks that increase the coverage of a wireless network), and Internet access typically consists of web accesses, I wonder how useful ExOR is in practice.
All this variable latency per packet might leave TCP totally confused! If TCP isn’t happy with ExOR, this makes its case even weaker. What are the other scenarios? Maybe applications that work over intermittent networks…
Also, this idea seems unsuited to scenarios that concentrate on power optimization. If nodes mostly sleep and wake up only periodically as in sensor nodes or PSM in Wi-Fi (does it make sense in ad hoc Wi-Fi?), ExOR is not too useful.
If you buy into the idea of multi-hop wireless networks, I think ExOR is a good paper to keep. I vote for it.
Btw, can somebody tell me what ExOR has got to do with the XOR operator? :-)
Tuesday, October 7, 2008
A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols
Simulations are the basis for testing most new proposals, and this paper extends ns2 to include a realistic model of the physical layer (where the signal attenuates based on the distance between nodes), a MAC layer on top of it, and a few other things. The signal attenuates depending on the horizontal distance between nodes, but I wonder if there should also be a component depending on the height of the nodes above the ground (the vertical distance). Anyway, this is a reasonable model, and the authors proceed to test the performance of the available ad hoc routing protocols using their enhanced simulator. The extension in itself is quite valuable, and it would have been nice for them to spend more time describing the details of their implementation (it was hardly a page!).
The routing protocols evaluated are DSDV, TORA, DSR and AODV. DSDV is a hop-by-hop distance-vector routing protocol that requires each node to periodically broadcast routing updates, guaranteeing loop-free routing. TORA is based on a link-reversal algorithm; it discovers routes on demand and minimizes communication overhead through localized reactions. DSR uses source routing. AODV is a mixture of DSR and DSDV. DSR and AODV are familiar from my undergrad days when I used to work on ad hoc routing protocols. The question I had then (and still have now) is the practical significance of such ad hoc routing protocols. It gets even murkier with assumptions of mobility. Where exactly in real life do these scenarios crop up? While these definitely seem like interesting problems to solve, I am yet to grasp their significance in real life.
Now on to their methodology. The mobility pattern they use is “random waypoint” – nodes move to another random point with a certain velocity. This seems a reasonable model, but I fear it could lead to designing protocols for the worst-case scenario. Clearly, nobody moves randomly in reality, and hence one could do a whole bunch of optimizations to predict routes. Let’s see if there are interesting insights in class into how this has affected MANET routing protocols…
Results: (1) DSDV-SQ and DSR use routes very close to optimal, (2) TORA has a high overhead and (3) DSDV-SQ is not very sensitive to node mobility.
The little paragraph where they talk about not using TCP raises more questions than answers. Probably a more qualified tone (“While our current results are valuable for…”) would have worked better.
While I admit that my reading was clouded by my lack of understanding of the real-world significance of MANETs, I didn’t find this paper worth keeping. I would vote against it.
A High Throughput Path Metric for Multi-Hop Wireless Networking
Minimum hop-count is the metric widely used for route selection. This is a natural extension from the wired world, where a link is treated as simply up or down. The wireless world requires more nuanced treatment, and this paper starts off with experimental results showing why hop-count often doesn’t pick the route with the best throughput: links have varying loss rates and asymmetry. Picking a route based on hop-count is clearly meaningless, especially under churn.
ETX to the rescue! Expected Transmission Count (ETX) estimates the number of link-layer transmissions (including retransmissions) needed to get a packet across a link. Dedicated link-probe packets are sent to calculate the delivery ratio of the link: each node broadcasts probes periodically, and nodes calculate the ratio of the number of probes received to the number that should have been received, over a window. The inverse of the product of the delivery ratios in the two directions is the ETX of the link, and the ETX of a route is the sum of its link metrics. The authors show the elegance of their metric by easily retrofitting it into existing routing protocols.
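The definition above reduces to two one-liners; this sketch (which mirrors the paper's formula but simplifies away the windowed probe counting) also shows why ETX can prefer a longer, cleaner route over a shorter, lossy one:

```python
def link_etx(df, dr):
    """Expected transmission count for one link.

    df: forward delivery ratio (probability the data packet gets through)
    dr: reverse delivery ratio (probability the link-layer ACK gets back)
    A transmission succeeds with probability df*dr, so the expected number
    of attempts is 1 / (df * dr)."""
    return 1.0 / (df * dr)

def route_etx(links):
    """Route metric: the sum of the per-link ETX values."""
    return sum(link_etx(df, dr) for df, dr in links)

# A lossless 3-hop route vs. a lossy 2-hop route:
clean = route_etx([(1.0, 1.0)] * 3)           # 3 expected transmissions
lossy = route_etx([(0.5, 0.5), (0.6, 0.8)])   # about 6.08 expected transmissions
print(clean < lossy)                           # True: ETX picks the 3-hop route
```

Minimum hop-count would have picked the lossy 2-hop route here; ETX correctly charges it for all the retransmissions it will cause.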
I had a few questions on the paper:
1. How does the metric function when different node pairs have links with different bit-rates? I guess we could incorporate this information into ETX and calculate the end-to-end throughput for route selection.
2. While the authors have been clear that load balancing is not what they are targeting, I wonder how different ETX would behave when the network is loaded.
3. While on the topic of load, a sensitivity analysis of w, the window used for calculating the ETX of a link, would be interesting. Like most systems, this is a parameter to be “tuned”: high values make it less susceptible to transient changes, while low values make it agile.
4. I wasn’t clear on how “interference between successive hops in a multi-hop path” is captured by ETX.
5. ETX seems to give equal weight to the delivery ratios in both directions – can we optimize this? I am not sure how link-layer acknowledgements work, but is every packet acknowledged? If not, shouldn’t more weight be assigned to the forward delivery ratio?
Overall, I vote to keep this paper in the reading list. It presents a landmark shift and also encourages discussion with a bunch of related/unanswered questions.