Thursday, October 30, 2008

Chord

This is a celebrated paper from the p2p era and addresses a fundamental problem with p2p applications – given a key, locate the node that is responsible for the key. This problem has wider applicability, as the values associated with the key can be application-specific. The authors do a nice job of presenting some potential applications – I especially liked the time-shared storage and combinatorial search breaking examples.

Chord is based on consistent hashing that has a nice property of balancing values across the image space. In my opinion, showing the community the value of consistent hashing in the then important p2p space (though I believe it was a touch magnified) was a crucial contribution. Each node and key is assigned a unique identifier by the consistent hashing function. I found the representation of the space as a “ring” to be quite beautiful. Every node is responsible for a range of key values and keeps pointers (in its finger table) to nodes at distances that exponentially increase. Queries can be resolved in log(n) time. A beautiful intuitive scheme – genius!

While Chord in its original form can be faulted for not considering the underlying topology in its allocation, we should remember that this was the beginning of the p2p storm. Follow-on’s have been published where people take into account such optimizations too. An interesting question is what happens to the guarantees of consistent hashing when such optimizations take place – the allocation is not something completely under its control.

Chord has been used in multiple contexts and is also applicable in datacenters. While it can be argued that storage of state isn’t that big a deal, I still think there is value in having to store lesser state and be more resilient to node failures and key redistribution (given that DCs mostly have machines with commodity hardware).

I would have liked its results to be compared to Pastry and OceanStore; the contrasting in the related works section came across as hand-waving (“other details").

Overall, I think it was a very nice paper (I believe it is the most cited in the last decade or something) and is an obvious keeper!

No comments: