The upcoming release of Hyperledger Sawtooth 1.2 includes Sawtooth PBFT—a new, Cargill-sponsored, production-ready consensus algorithm. Sawtooth PBFT is much more than a basic implementation of the PBFT consensus algorithm that was originally defined in 1999. Sawtooth PBFT has all the core features of the original definition—it provides the same safety and liveness guarantees—but we have adapted the algorithm to take advantage of features unique to blockchains, and extended it to provide more flexibility and robust guarantees.
In two earlier blog posts, Making Dynamic Consensus More Dynamic and Introduction to Sawtooth PBFT, we introduced Sawtooth’s dynamic consensus interface and explained how the PBFT consensus algorithm works from a high level. In this post, I’ll describe the adaptations and enhancements that helped us bring a robust PBFT consensus algorithm to Hyperledger Sawtooth:
- Sequence number simplications (no more watermarks)
- An idle timeout to guarantee more liveness
- Regular view changes for fairness
- A consensus seal as proof of commit
- No checkpoints for garbage collection
- A catch-up procedure for slow and new nodes
- Membership changes
Sequence Number == Block Number
In traditional PBFT, the primary node assigns a sequence number for each request when it broadcasts a pre-prepare message for the request. In Sawtooth PBFT, the “requests” are actually blocks that are created by a Sawtooth validator. Blocks in Sawtooth are ordered; we can take advantage of this ordering to simplify the sequence numbering in PBFT. Instead of assigning a sequence number for each block, the primary node will just use the block number as the sequence number.
This simplification makes the algorithm easier to understand. It also means that secondary nodes don’t need to check that the primary node picks reasonable sequence numbers. Originally, nodes would have to make sure that the sequence number was between a low and high watermark, in case the primary node picked a number that was so high that it exhausted all of the possible sequence numbers. Now that the sequence number is automatically determined by the block number, a secondary node only needs to check the pre-prepare message to make sure that the sequence number matches the block number of the block in the message. It’s as simple as that!
More Liveness with an Idle Timeout
One liveness problem that traditional PBFT does not solve is the “silent leader” problem. The leader is responsible for building and sharing blocks with the rest of the network to start a new round of consensus. Even if transactions are being submitted to the network, the leader could stay silent by never sharing a block with the network; this “silent leadership” would allow the leader to halt progress indefinitely.
To solve this problem, we implemented an idle timeout as a new way to trigger a view change. After a block is committed, each node starts its idle timer. The primary node must publish the next block and broadcast a pre-prepare message for the block before the timer expires. Otherwise, a secondary node will initiate a view change.
The idle timeout provides more robust liveness by ensuring that a faulty primary node can’t stall the network indefinitely. This timeout can lead to unnecessary view changes, though; if the network really is idle (no transactions are being submitted), then the primary node isn’t faulty when it doesn’t publish a block, since the Sawtooth validator does not publish a block when there are no transactions. The cost of these extra view changes is negligible, however; when the network is idle, a view change will not hinder the performance of the network.
Pseudo-Fairness with Regular View Changes
Determining if a leader is acting fairly is a very complex and challenging problem; there is still a lot of research being done to solve it.
To get us a step closer to fairness, we introduced a regular view change mechanism to Sawtooth PBFT. This mechanism is fairly simple: after every nth block is committed, the network will “force” a view change by incrementing its view by one. The number of blocks between forced view changes is called the forced view change interval, which is a configurable setting.
Regular view changes reduce unfairness by not allowing any node to control block publishing for too long; instead, every node gets a chance to be the leader for a period of time. While this doesn’t completely solve the fairness problem, it limits unfairness in a way similar to Tendermint and lottery-style consensus algorithms. This solution is also inexpensive: since the view change is “forced” (the view number is directly incremented) at a consistent point across the network, we don’t need the expensive view changing procedure.
The original PBFT algorithm does not define a reliable way to check if a block was committed validly (that is, when 2f + 1 nodes agreed to commit the block). The only way to check this would be to request each node’s commit messages and count them. The problem with this approach is that old messages might be unavailable if the messages were removed (garbage collected) or if a node is no longer part of the network.
To overcome this limitation, we implemented a consensus seal for each block that is committed. The consensus seal is a signed data structure that contains a block ID and 2f signed commit messages for that block ID. This seal proves that the network agreed to commit the block and that the block was committed validly.
The seal is stored on the blockchain itself, which means that the seal is immutable and can be checked at a later point. A seal can’t be inserted directly into the block that the seal is for, however, due to the way blocks are constructed by the Sawtooth validator. Instead, the seal is added to the next block that is committed to the blockchain; thus, each block contains the consensus seal for the previous block. This approach has a limitation—the most recently committed block can’t be validated using a consensus seal, since there isn’t a subsequent block yet. However, all previous blocks in the chain can be validated to ensure that they were committed validly, which is a useful guarantee.
In the real world, distributed networks face failures, segmentations, dropped messages, and many other hiccups. When these issues arise, nodes may fall behind, so they need a way to catch up to the rest of the network.
Sawtooth PBFT uses a special catch-up procedure that takes advantage of the consensus seal extension. The catch-up procedure is best illustrated by an example.
Let’s say the network as a whole has committed block 100, but a node has fallen behind and has only committed blocks up to block 90. When the slow node eventually gets block 91, it waits for a pre-prepare message from the primary node. Because the network has already moved past this block, though, the node never gets this old pre-prepare message. Instead, the slow node gets block 92, which contains a consensus seal for block 91. The node examines the seal from block 92 to make sure it’s valid, then commits block 91. Our node knows that it can skip the usual consensus procedure because the consensus seal in block 92 proves that the network agreed to commit block 91.
The node follows this procedure until it commits block 99. At this point, it needs to do something different. Because there is no block 101 yet, the node doesn’t have a seal for block 100. So the node broadcasts a request to the rest of the network for a consensus seal for block 100. If another node has already committed block 100, that node builds the seal and sends it directly to the node that requested it. When the node receives the seal, it will verify the seal and commit block 100. The node that fell behind is now up to date!
Who Needs Checkpoints?
As originally defined, PBFT uses a checkpoint procedure to perform garbage collection for each node’s log of consensus messages. In this procedure, the nodes exchange messages to come to consensus on the current version of state. When the nodes have agreed on the current state, they save these messages as proof of the agreement, then discard all previous consensus messages.
Sawtooth PBFT does not need this checkpoint procedure. Because each block contains a consensus seal that provides proof of the validity of the previous block, and each block is immutable and permanent by nature of a blockchain, every block contains a checkpoint that will be kept forever. This means that the nodes can clean up old log messages any time they wish, while the state can still be verified using the consensus seal.
Another fact of the real world is that network membership can change. This happens for a variety of reasons: a node may need to be replaced or a new company might want to join a consortium’s network. The original PBFT algorithm does not define a procedure to change membership of a PBFT network, so we implemented our own.
In Sawtooth PBFT, membership in the network is determined by an on-chain Sawtooth setting, sawtooth.consensus.pbft.members, which is a list of the public keys of all nodes in the network. Since this list is on the blockchain itself, it’s shared and agreed on by the whole network, which makes changing membership easy.
An administrator can update the on-chain setting to add, remove, and reorder nodes by submitting the change as a transaction. Each time a block gets committed, all PBFT nodes check this setting to see if it changed because of a transaction in that block, then update their local state if necessary.
Like forced view changes, a membership change is done at a consistent point across the network: when a block is committed. This consistency makes the local view of membership on each node easy to update in an atomic way. This approach to membership also makes it easy to check the historical membership throughout the life of a Sawtooth blockchain managed by PBFT consensus.
Sawtooth PBFT has a lot to offer. It not only includes all the functionality you would expect from PBFT, but it adds features and optimizations for real-world use. We’ve developed new solutions to overcome limitations of traditional PBFT. Plus, we’ve been able to take advantage of the properties of a blockchain, such as immutability and ordering, to make the algorithm more efficient and robust.
Want to Learn More?
These features are all new to PBFT, and some of the implementations are unique among published consensus algorithms. If you’re interested in learning more about our PBFT extensions, read the regular view changes RFC, consensus seal RFC, and PBFT catch up RFC. Also, check out the Sawtooth PBFT documentation and source code on GitHub.
Do you have questions? Do you have comments or concerns about how the extensions work? Let us know in the #sawtooth-consensus-dev channel on Hyperledger Chat!
Stay tuned for more news and info about the upcoming Sawtooth PBFT 1.0 release.
About the Author
Logan Seeley is a Software Engineer at Bitwise IO. He has been a part of the Hyperledger Sawtooth community since May 2018,