This paper argues against providing functions at the low levels of a distributed system. Choosing appropriate modules and the boundaries between them is an important part of system design, and communication is generally provided as a separate subsystem with a well-defined interface. The paper illustrates its point with a sample application that reads a file from disk on one machine and sends it over the network to another machine, which receives the data and writes it to disk. Reliable communication is only one of the many things the application depends on to function correctly; others include correct disk reads and writes and surviving host crashes. The author suggests that we should not strive for an error rate lower than what the application needs, because there is a clear trade-off between reliability and performance. In addition, a function implemented at a low level may be forced on every application, regardless of its requirements.
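To make the file-transfer example concrete, here is a minimal sketch of an end-to-end check, under my own assumptions rather than anything taken from the paper: the sender computes a checksum of the file, the receiver recomputes it after the data has passed through the whole path, and the transfer is retried on a mismatch. The names `flaky_channel` and `careful_transfer` are hypothetical, and the "channel" is a stand-in for the network plus both disks.

```python
import hashlib


def sha256(data: bytes) -> str:
    """Checksum computed by the application at each end."""
    return hashlib.sha256(data).hexdigest()


def flaky_channel(data: bytes, corrupt: bool) -> bytes:
    """Hypothetical stand-in for the disk + network + disk path.

    When `corrupt` is True it flips the bits of the first byte,
    simulating an error that a lower layer failed to catch.
    """
    if corrupt and data:
        damaged = bytearray(data)
        damaged[0] ^= 0xFF
        return bytes(damaged)
    return data


def careful_transfer(source: bytes, corruption_plan, max_attempts: int = 5):
    """Transfer `source`, verify it end to end, and retry on mismatch.

    `corruption_plan` is an iterable of booleans, one per attempt,
    saying whether that attempt is corrupted (an assumption made only
    so the sketch is deterministic).
    Returns (received_bytes, attempts_used).
    """
    expected = sha256(source)
    for attempt, corrupt in zip(range(1, max_attempts + 1), corruption_plan):
        received = flaky_channel(source, corrupt)
        # The check happens at the end point, after every intermediate step.
        if sha256(received) == expected:
            return received, attempt
    raise IOError("transfer failed the end-to-end check on every attempt")


data, attempts = careful_transfer(b"file contents", [True, True, False])
print(attempts)
```

Because the check sits at the end points, it catches an error no matter which intermediate step introduced it. Reliability in the lower layers then only affects how often a retry is needed, not whether the result is correct.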
The author reinforces the case against low-level functions with several examples: delivery guarantees (ACKs seem meaningless unless they confirm that the application did what it was supposed to do with the data), secure transmission (inefficient for applications that do not care about security), duplicate suppression (the application might itself generate duplicates that the lower layers cannot recognize), and real-time communication (where latency matters more than reliability). The author finally tones down the argument, saying that end-to-end design is not an absolute rule and is meant to motivate designers to identify the “right” end points.
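The delivery-guarantee and duplicate-suppression examples can be sketched together. The following is an illustration under my own assumptions, not code from the paper: the application tags each request with an identifier, acknowledges only after it has acted on the request, and answers a repeated identifier with the original acknowledgement instead of acting again. The `Account` class and its `deposit` method are hypothetical.

```python
class Account:
    """Hypothetical application end point with its own duplicate suppression."""

    def __init__(self) -> None:
        self.balance = 0
        # Maps an application-level request id to the acknowledgement
        # (here, the resulting balance) that was sent the first time.
        self._acks = {}

    def deposit(self, request_id: str, amount: int) -> int:
        """Apply a deposit once and return an application-level ACK."""
        if request_id in self._acks:
            # A retry from the sending application: the lower layers see
            # a brand-new message, so only this end point can tell it is
            # a duplicate.
            return self._acks[request_id]
        self.balance += amount
        # The ACK is produced only after the action has been performed.
        self._acks[request_id] = self.balance
        return self.balance


acct = Account()
acct.deposit("req-1", 10)
acct.deposit("req-1", 10)  # application-level retry, suppressed
print(acct.balance)
```

A transport-level ACK would only say that the message arrived. Here the acknowledgement says that the deposit was applied, which is what the sending application actually wants to know.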
Overall, many of the principles in the paper, such as providing greater flexibility to tune the low-level modules and separating policy from mechanism, are very powerful and have found their way into modern systems.
1. What exactly was the state of the art when this paper was published? Wasn’t work on improving the reliability of other modules, like file systems, already happening? If so, why was it so hard to buy the idea of providing functionality at the low level?
1. I find it hard to understand the philosophy of tying something to a specific application (e.g., error rates in communication).
2. Delivery guarantees: Again, why does the web server care what the client did with the data? I can read the data and render it, store it, do some text processing on it, or maybe just count the bytes. I would be curious to know what setting led the author to argue against providing low-level functionality.