In this Sunday Read with Horton edition we take a closer look at the selection of papers about Distributed Consensus provided by Camille Fournier (Zookeeper PMC) as part of the RfP (Research for Practice) of the ACM. For Hadoop practitioners distributed consensus is best know as Apache Zookeeper, which supports most critical aspects of almost all Hadoop components.
The Chubby lock service for loosely-coupled distributed systems [*]
Burrows, M. 2006
Zookeeper is essentially based up on this publication in 2006 by Google. Chubby is Google’s distributed consensus system backing the Google File System (GFS) and BigTable.
In this paper the authors highlight their design decision to implement consensus as a service rather than a library implementing a consensus protocol. This proofs to be a very good decision looking at the Hadoop ecosystem and the way it leverages the use of Zookeeper.
Surely another design decision that made Chubby so successful discussed in the paper, is the implementation of locks and discovery services ontop of a file system like structure, which makes it easily comprehensible by developers.
Among other aspects I found it interesting that the authors mention a Chubby cell being in use across data centers with thousands of kilometers apart.
Important to note is the fact that Chobby is designed forward coarse-grained and not fine-grained locking. A fact that is also true for Zookeeper and as to be kept in mind when architecting distributed systems around it.
Paxos made live—an engineering perspective [*]
Chandra, T. D., Griesemer, R., Redstone, J. et al. 2007
In this paper the authors describe the effort needed to turn a protocol proofed by couple of lines in pseudo code into a production ready implementation using a higher level programming language. Distributed computing remains challenging and involved.
“A distributed system is one in which the failure of a computer you didn’t know existed can render your own computer unusable.”
—Leslie Lamport
Especially the authors highlight the need to work with permanent storage that has a finite capacity and can get corrupted, just as Zookeeper persists value changes by writing an image of it’s database to disk.
Additionally a master lease is being introduced to ensure read operations are not served from invalid state as a new leader has been elected.
In search of an understandable consensus algorithm [*]
Ongaro, D., Ousterhout, J. 2014
- Paxos is exceptionally difficult to understand
- Paxo does not provide a good foundation for building practical implementations
Almost as a testament to the second drawback mentioned by the authors is the also here listed publication of Paxos made live, which demonstrates the hassle of implementing a Paxos protocol that can be used in production environments.
Raft benefits from it’s concise design around a state machine around Leader, Candidate, and Follower.