As I noted in a previous post, we’ve been asked to look into the security aspects of running a Tendermint validator on the Cosmos Hub. To assess whether the proper controls are in place, we must consider what the salient risks against a system are. There has been a lot of discussion of risks across blockchain networks in general, for instance denial of service attacks, Sybil attacks, eclipse attacks, routing attacks, crypto attacks, block discarding / withholding attacks, double spend attacks, and long range attacks. There is a great deal of overlap and interplay between these risks, something that is itself worthy of its own blog post. Tendermint addresses most of those risks through a Proof of Stake Practical Byzantine Fault Tolerant network design. The major implication of this design is that it puts higher weight on the security of the validators: in particular, if an attacker can subvert enough validators to control more than ⅔ of the stake in the network, Game Over.
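To make the ⅔ threshold concrete, here is a minimal Python sketch that checks whether a set of compromised validators holds more than ⅔ of the total stake. The validator names, stake amounts, and function names are hypothetical and chosen for illustration; this is not how Tendermint itself tallies voting power.

```python
from fractions import Fraction


def compromised_fraction(stakes, compromised):
    """Return the fraction of total stake held by compromised validators.

    stakes: dict mapping validator name -> integer stake (hypothetical units).
    compromised: set of validator names assumed to be under attacker control.
    """
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    held = sum(stake for name, stake in stakes.items() if name in compromised)
    return Fraction(held, total)


def attacker_controls_network(stakes, compromised):
    """True when compromised validators hold strictly more than 2/3 of the stake."""
    return compromised_fraction(stakes, compromised) > Fraction(2, 3)


# Hypothetical validators and stakes, for illustration only.
stakes = {"val-a": 40, "val-b": 30, "val-c": 20, "val-d": 10}
print(attacker_controls_network(stakes, {"val-a", "val-b"}))  # 70/100 of stake
print(attacker_controls_network(stakes, {"val-a", "val-c"}))  # 60/100 of stake
```

Exact fractions are used rather than floats so that a stake sitting exactly on the ⅔ boundary is not misclassified by rounding.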
Tendermint has one key feature to encourage validators to be secure: punish those that are seen to behave badly, that is, those exhibiting signs that they have been compromised or are attempting one of the above attacks; this punishment is a loss of stake. Since both validator owners and those that delegate their stake to a given validator have an economic investment in their stake, owners and delegates take an actual financial loss for poor security. Put in a positive light, they are economically incentivized to have good security.
So what does good security look like to a validator? To determine that, we first need to determine what the risks against it are; that is the purpose of this post. To find the risks, we will examine the assets, threat actors, subsequent threats, the likelihood and impact of those threats, and finally pull those together into an overall risk assessment. We close with some questions. We will start with a foreword addressing copyright and some definitions:
Foreword / Copyright
Risk assessments are one of the services we perform at BuboWerks. Most risk assessments are performed under contract to a client and are subsequently considered client confidential work. Tendermint and Cosmos Hub are open source projects, and have hence provided us the opportunity to perform a risk assessment for the community. This has the distinct benefit for BuboWerks of being able to showcase to potential clients what a risk assessment looks like. Obviously, every environment is fairly different: enterprise environments can have orders of magnitude more assets and many more threat actors. Nevertheless, this work should give a general idea of what a risk assessment covers.
We at BuboWerks hope this work benefits the Tendermint and Cosmos Hub community. That said, we also plan to use this work to perform paid services for members of the community, and recognize that others may wish to do the same. As such, this work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. We are happy to discuss licensing this work for commercial use — just contact us!
- Sentry System: A system (as defined below) that runs a Cosmos Hub full node that acts as a public or private sentry node for a validator; most Sentry Systems will be addressable from the public Internet.
- Support System: A system (as defined below) that provides support functions for Validator System(s) or Sentry System(s), such as monitoring, administration, or backup; also includes the computer network(s) connecting the Systems in a Validator Deployment, including networking hardware such as routers, switches, and VPN gateways; most of these systems should not be directly addressable from the public Internet.
- System: The sum total of the hardware, operating system, application software, and any related components (such as firmware) that compose a single logical computer system
- Validator Deployment: The sum total of Validator System(s), Sentry Systems, and Support Systems that make up a logical Validator
- Validator System: A system (as defined above) that runs a Cosmos Hub validator.
The core asset for a validator might appear on the surface to be the validator system, but it is in fact the keys it uses to sign blocks. The system itself is still important, but not critical: the system can be lost (say, through natural disaster) and the validator can continue to operate seamlessly, assuming a failover system and the original keys. These keys (particularly those from numerous validators) would allow a sophisticated attacker to launch many of the above-mentioned blockchain attacks, either on the original validators or on the attacker’s cloned validators. The destruction of these keys could also cause the network to adopt a new validator that is already under the attacker’s control. Beyond the core validator, there are sentry nodes, support systems, and the network itself.
Security goes beyond just computer systems, however. A validator needs its stake to be one of the 100-300 Cosmos Hub live validators. We can decompose that stake into the stake contributed by the validator owner and that contributed by delegates. Similarly, the ability of a validator to remain viable is related to its reputation and the ability of its team to execute. The reputation includes any public presence, including websites, social media accounts, Cosmos Forum accounts, and press interviews. The team itself can be further broken into key personnel and other staff, where key personnel are those who provide an essential element of the validator’s operation, be that stake, operational know-how, security access, or something else.
In summary, we have this list of assets:
- Validator public/private keys
- Validator system(s)
- Sentry systems
- Support systems
- Owner stake
- Delegate stake
- Validator reputation
- Key personnel
- Other staff
Threat actors are any entity that could damage the above assets. Many of these threats are not malicious: someone who has lost stake in a validator does not care much whether the loss came from a malicious party or force majeure; furthermore, most unintentional damage could be carried out maliciously, so it is best to protect against threats holistically. Through this lens, we come up with the following list of threat actors:
- Force Majeure: Events outside the control of any other parties listed here, inclusive of so called “acts of god” or “mother nature”, as well as extreme events such as serious rioting and acts of war
- Resourceful Attackers: External attackers with significant resources, including nation states, non-state actors (such as terrorist groups), and organized crime; motivated by significant economic payoffs or harm
- Opportunistic attackers: External attackers without significant resources who may be looking for easy resources to steal or deface; motivated by fame or minor financial incentives; given that they will have visibility into what the stake for each validator is, that could inform decisions to attack availability in exchange for a ransom
- Insiders – Key Personnel: Just as key personnel are an asset, anyone with an essential element of the validator’s operation may also cause significant damage, whether accidental or intentional; generally key personnel are motivated to act in the best interest of the validator, but they also have the greatest opportunity to do damage
- Insiders – Other Personnel: Anyone with internal access to the validator deployment may cause damage, whether accidental or intentional; internal access is defined as any access to any component that does not go through the public Tendermint P2P protocol, including both privileged and regular user access (there is an assumption that most personnel will have privileged access to at least one component, hence there is little value in separating these two user classes); typically motivated to act in the best interest of the validator, other personnel will sometimes behave badly, especially if they feel slighted
- Delegates: Anyone who assigns their stake to a validator; as with insiders, will generally act in the best interest of the system, although can behave badly if they feel slighted, or if they are investing with ulterior motives
- Other validator operators / delegates: Anyone invested (by virtue of labor or stake) in a different validator; will generally be motivated by actions that give their validator a greater stake, hence producing more returns for themselves; however, a larger threat comes here from other validators who become compromised themselves
We can construct a matrix of threats based on what each threat actor may try to do to each asset. This matrix can be viewed in the three appendices on threat likelihood, impact, and uncontrolled risks. In this section we review the full list of threats with a brief description of each.
- Compromise due to exploited trust link from sentry or support system: validator systems at a minimum must communicate with sentries and support systems. These trusted communication links could be exploited to gain access to the validator. In particular, these communication links will typically be how the validator system is accessed for ongoing administration; if an attacker has access to the system that is logging into the validator (such as the sysadmin desktop), they can at a minimum piggyback on that remote session.
- Compromise due to exploited trust link from validator or support system: As with validators, sentries will have trusted links that can be exploited to access them.
- Compromise due to infected USB device: A machine can be infected with targeted or general purpose malware if an infected USB device is plugged into it; this is certainly more prevalent on certain operating systems and with systems like desktops where the user sits physically at the system, but has even been used as a vector for air-gapped machines.
- Compromise due to MitM attack: MitM attacks were once the realm of theoretical attacks and local area network proofs of concept, but recent large-scale redirection attacks have demonstrated that sophisticated attackers are capable of inserting themselves into the middle of network traffic. In the past five years we’ve seen these types of attacks escalate to redirect traffic to the attackers’ servers to serve up false information (including an attack on Bitcoin), so it stands to reason they could be used to provide Trojaned updates to a system, subsequently providing a backdoor for the attacker.
- Compromise due to other network vulnerability: The various services that run on modern hosts, beyond the Tendermint-specific services, present an attack surface where vulnerabilities tend to arise. Given a vulnerability in any network software, an attacker is bound to exploit it.
- Compromise due to other network vulnerability over trusted link: As above, but for hosts such as validators that may only be accessible over trusted links; even with such inherent architectural protection, the other end of the link could still exploit that trust to attack the protected system.
- Compromise due to phishing: Phishing is likely the most successful — and hence prevalent — form of attack today. This generally involves an attacker getting the target to go to a site the attacker controls but that the target thinks is legitimate and providing their credentials or other protected information. More sophisticated forms exist as well that start to look more like MitM attacks.
- Compromise due to phishing / phishing hole: An expansion of the basic phishing attack, the phishing hole is where the attacker incites the target to go to a website (such as through use of a sensational headline), which may be malicious in nature, subsequently attacking the target’s browser or plugins (especially Java, Flash, or PDF) and gaining control of the target’s computer.
- Compromise due to ransom / extortion / bribery: This is where an insider may abuse their position, cutting off others’ access to the asset (or threatening to do so) and using their exclusive control of the asset as leverage to get what they want.
- Compromise due to supply chain attack: While the supply chain could include compromised hardware (such as the substitution of malicious chips), the real concern these days is the software supply chain. Operating system vendors, utilities vendors, infrastructure vendors, and application vendors have all been hacked. Especially if a specific system or systems were targeted, an attacker could easily substitute a trojaned package and likely never be noticed.
- Compromise due to Tendermint network vulnerability: Validators and Sentries at a minimum need to run the Tendermint network services. Any vulnerabilities here could be exploited by an attacker (either directly; or via malicious transactions, consensus messages, or blocks) to gain access to those systems.
- Compromise due to Tendermint network vulnerability over trusted link: As above, but specifically executed over a trusted (private network) connection.
- Compromise for botnet due to network vulnerability: Opportunistic attackers will typically monetize their control of machines by either adding them to a botnet or ransoming their control of it to the owners. Should there be any known vulnerabilities in network services, opportunistic attackers will find them.
- Compromise for botnet due to other network vulnerability: As above, called out for network services other than Tendermint services for validators and sentries.
- Compromise for botnet due to phishing / phishing hole: As above, but gaining control of the machine by getting the user to go to a site under control of the attacker rather than attacking the network service directly. With the use of firewalls and attack surface reduction, this is a far more popular means of gaining a foothold on systems these days.
- Compromise keys due to compromise of backup / support system: Even if keys are secured, any copies of them in backups or on other support systems could still be compromised and used to clone the validator to malicious ends.
- Compromise keys due to compromise of validator system: Even if the keys are secured, if they can be used from the validator (such as to sign blocks), then an attacker who has control of the validator can get anything they want signed by the keys.
- Compromise of any accessible assets due to extortion: Even if a validator has perfect technical controls, it will still be operated by people. Anything those people can access is vulnerable if the people are vulnerable, and in fact, most successful attacks these days occur due to failures by people, be it phishing, social engineering, or — in extreme cases — extortion. There have been numerous cases of armed robbery involving cryptocurrency — it stands to reason that given a sufficient upside, attackers will resort to blackmail or threats of physical violence to achieve their goals.
- Compromise website to add false, misleading, or inflammatory statements due to network vulnerability: Validators only have value to their owners and delegates if they are one of the accepted validators in the network, meaning if a sufficient number of delegates pull out their stake, the validator may be dropped, leaving no value for those who remain. Delegates will choose where to invest their stake based on the reputation of a validator, or pull out that stake if the validator develops a negative reputation. Consequently, an attacker may attack the validator’s website — which may be completely separate from all other validator operations, possibly even outsourced — to place information that is false, misleading, or inflammatory (such as hate speech). In order for the validator operators to claim that it was not them who made the statements, they must admit they were breached, which still tarnishes their reputation.
- Death due to natural or unnatural causes: Everybody dies eventually — sometimes it’s preventable, sometimes not. The threat to the organization is: will they be able to carry on without this person. This has implications both in terms of operational duties (can anyone else perform them?), as well as legal implications (what happens to their stake in the validator and other ownership in the organization?).
- Destruction due to event that destroys data center: Data centers tend to be very robust buildings, designed to withstand extreme weather events and with physical security to protect against physical attacks such as bombs. Nevertheless, no building is indestructible, be it from extreme acts of nature, such as earthquakes or tornadoes, very creative attackers (think 9/11), or other unexpected catastrophes (failure of the dam for that cheap hydroelectric power or meltdown of the nearby nuclear power plant).
- Destruction due to sabotage: Sometimes people flip out, whether due to some misguided rational thought process (such as being slighted by their employer) or due to a loss of rationality altogether. This can result in the person causing huge amounts of damage (physical or logical) to anything they can access.
- Extreme event leaves personnel unable to perform duties: Any type of force majeure may render a staff member unable to perform their duties, perhaps because they are trapped at home without power, evacuated without network access, or something of that nature.
- Hardware Failure: Computers are ultimately machines, meaning that they have some finite lifespan and will fail at some point.
- Keys on discarded hardware: When hardware dies or reaches end-of-life, it gets discarded. After that, the organization loses control of it. A dedicated attacker might attempt to target acquiring that hardware to learn the organization’s secrets, and even an opportunistic attacker might buy discarded hardware to see what they can ransom an organization for.
- Loss due to compromise of the holder’s wallet: Tendermint holds all the stake for validators in trust. Should the storage mechanism for that stake be compromised (be it the compromise of a wallet or analogous issue), then the stake the owners and delegates have in a validator could be siphoned off, dropping the validator from the pool, and leaving them with a loss of both their stake and income from validator operations.
- Negative forum reviews: As discussed above, validators rely heavily on their reputation. A sophisticated attacker may try to manipulate public opinion through false and misleading statements on public forums to try to drive delegates away from a validator, particularly so that the validator will be dropped from the pool and the attacker’s own validator can be picked up.
- Negative forum reviews for extortion: As discussed above, validators rely heavily on their reputation. Some opportunistic attackers and even unscrupulous delegates may attempt to extort the validator operator under threat of a negative review.
- Network unavailable due to DDoS ransom: Many DDoS attacks these days are conducted by opportunistic attackers who promise to end the DDoS if a ransom is paid to them.
- Posting of hate speech or other inflammatory material: As discussed above, validators rely heavily on their reputation. While we acknowledge that everybody has opinions that may be unpopular, a good business leader knows when to keep these thoughts to themselves, for fear of tarnishing the brand. We would like to think that this should go without saying, but it would seem otherwise.
- Stake flip attack: This is a new threat in Proof-of-Stake systems wherein a very resourceful attacker will invest significant stake in a validator, then, when they want their own validator in the pool, they will in short order unstake their investment in the one validator and place a significant stake in the other validator. Tendermint attempts to protect against this by creating a three-week unstaking period, during which time that stake is not available to the delegate; this should deter the average delegate, but will probably not deter a resourceful attacker, who can stand to have double the resources tied up in the attack. This attack played out across validators holding two-thirds of the total network stake could allow an attacker to take control of the network in short order. More realistically, doing so to one-third of the network stake and violating consensus will bring down the entire network. Even without that catastrophic result, the attack could still deprive the validator owner and other delegates of income. We have put up a longer post detailing the stake flip attack.
- Tie up stake with end of operations: Generally, validator operators are not going to put in the effort and expense of starting up a validator and then just walk away from it. Sometimes, stuff happens though — be it death (see above), insolvency, or some other reason, a validator may need to shut down. When this happens, the stake of the other owners and delegates will be tied up for three weeks, depriving them of income during that time.
- Unavailable due to data center unavailability (power, cooling, etc.) due to disaster, weather event, etc.: As noted above, data centers are generally robust, but extreme weather events can still bring them down. In particular, hurricanes such as Katrina and Sandy have been major contributors to data center outages, as there is only a certain amount of water that a data center can withstand. We expect that with the increase in extreme weather events this will continue to be an issue.
- Unavailable due to DDoS: Since validators are penalized for downtime, a DDoS attack against the validator or any of its supporting infrastructure (sentries and support systems) presents a distinct threat to validator operations.
- Unavailable due to DDoS over trusted link: As above, but leveraging private network connections established for trusted communications.
- Unavailable due to ransom / extortion: Just as an insider may control or threaten to control an asset for the purposes of ransom or extortion, they may bring a system down (or threaten to do so) in order to have their demands met.
- Unavailable due to ransom / ransomware: Opportunistic attackers are frequently using a variant of malware known as ransomware to encrypt the entire contents of a system, rendering it unusable until a ransom is paid.
- Unavailable due to targeted network outage: While DDoS attacks are the most popular way of knocking a system or systems off the Internet, an attacker may also use a much more targeted network outage, including means such as blackholing traffic or exploiting a vulnerability on the validator network’s hardware.
- Unavailable due to targeted network outage over trusted link: As above, but done utilizing a private network established for trusted communications.
- Unstaked due to compromise of the owner’s credentials: Should the owner’s credentials for a stake be compromised, that stake could be unstaked.
- Unstaked due to ransom / extortion / bribery: It is expected that delegates will move around some, and there will be some ebb-and-flow for the exact stake a given validator has. If a given delegate, or set of delegates, has a significant stake in a validator, they may threaten to unstake it (hence taking the validator out of operation), unless their demands are met.
- WAN unavailable due to data center unavailability (power, cooling, etc.) due to disaster, weather event, etc.: One aspect of data center unavailability worth calling out from the others is WAN availability, as this may be affected separately from the rest of the data center, particularly due to a backhoe attack.
- Website unavailable due to DDoS: As noted above, the validator website may be operated independently from the rest of the validator infrastructure. Consequently, it may be vulnerable to a DDoS attack, which would subsequently raise questions to outsiders (such as potential delegates) about whether the rest of the validator infrastructure is vulnerable as well.
See the appendix below for the full threat likelihood matrix. It takes each of the threats discussed above, correlated to asset (rows) and threat actor (columns), and color codes the threat as low (green), moderate (yellow), or high (red) likelihood. This is the likelihood of the threat occurring “in the wild”, that is, without the protection of compensating controls. For example, a sophisticated attacker will certainly try to attack the network services on the validator, but with proper controls like firewalls and limiting network services, the likelihood will drop. The likelihood can generally be viewed as probability over time. Events that are likely (more than 50% probability) to happen within a matter of years are high, those that are likely to happen in a matter of decades are moderate, and those that are likely to happen less frequently are low.
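As a rough illustration of viewing likelihood as probability over time, the Python sketch below bands a threat by the chance of at least one occurrence within a horizon. It assumes a constant annual probability with independent years, and it assumes 5 years for “a matter of years” and 30 years for “a matter of decades”; those horizons and the function names are our illustrative choices, not figures from the assessment.

```python
def prob_within(annual_probability, years):
    """Probability of at least one occurrence within `years`,
    assuming a constant, independent annual probability."""
    return 1.0 - (1.0 - annual_probability) ** years


def likelihood_band(annual_probability, years_horizon=5, decades_horizon=30):
    """Band a threat as 'high', 'moderate', or 'low' likelihood.

    The 5- and 30-year horizons are assumptions standing in for
    "a matter of years" and "a matter of decades".
    """
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    if prob_within(annual_probability, years_horizon) > 0.5:
        return "high"
    if prob_within(annual_probability, decades_horizon) > 0.5:
        return "moderate"
    return "low"


# Hypothetical annual probabilities, for illustration only.
for p in (0.20, 0.05, 0.01):
    print(p, likelihood_band(p))
```

In practice the ratings in the appendix are judgment calls rather than computed values; the sketch only shows how the 50% rule of thumb behaves under the stated assumptions.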
Generally, the biggest threat comes from resourceful attackers, particularly those who are looking to gain some sort of control or access to the validator as part of a larger campaign against the network. Other personnel also present a distinct threat, absent any controls; this is largely reflective of the insider threat. Key personnel are generally not regarded as a likely threat, as they are expected to operate in the best interests of the system so as to protect their investment. The key exception is the tendency for these personnel, who are in a position of power, to abuse that power and order employees to do something even when the employee knows it is not in the best interests of the system; this is particularly a threat as the key person may do so without an understanding of the true risks. The threats from other validator operators come from an assumption that some other operators may be unscrupulous, and peering with them without proper technical and procedural (due diligence) controls presents distinct threats.
See the appendix below for the full threat impact matrix. As with likelihood, it takes each of the threats discussed above, correlated to asset (rows) and threat actor (columns) and color codes the threat as low (green), moderate (yellow), or high (red) impact. High impact events are those with the potential of ending the operations of the validator. Moderate events are those that can incur a significant remediation cost. Low events are those that will have to be dealt with, but should be manageable in the normal course of events. As with likelihood, the impact is assessed without the benefit of any compensating controls; for example, a weather event that destroys the keys will be game over for the validator, unless there are backups or a redundant copy of said keys.
While the likelihood ratings were based largely on the threat actor, impact is tied much more closely to the asset. Anything that makes the keys unavailable is high impact. Similarly, most threats against the validator are high impact unless they allow the validator to continue to operate, or their effectiveness is limited (such as the amount of DDoS traffic that could be pushed over trusted links). Most threats against the sentries are moderate, based on the principle that a validator likely has multiple sentries, or can at least spin up new ones in short order. Similarly, the validator can continue to operate (although it does so at significant risk) in the absence of most support systems, save for the networks or an event with the potential to take all the support systems out of operation for an extended period. All the threats against the owners’ stake are high impact, as the loss of that stake will often put the validator out of business. The impact from delegator stake is not as great, as delegators are expected to come and go, but may be higher if the threat may impact the stake from many delegators. Threats against reputation are largely tempered by the fact that most readers will take anything posted on the Internet with a grain of salt, but are more serious when they come from within or are indicative of real problems with the operator. Threats against staff, which are generally those things with widespread impact on the organization, are generally high, with a few tempered to the moderate level when the operator can exert a bit more control over the events.
If we combine the impact and likelihood, we get the risk levels for the uncontrolled threats shown in the appendix below. If both impact and likelihood are high, the risk is high (red). If one is moderate and the other is high, then the risk is medium-high (orange). If both are moderate or one is high and the other is low, the risk is moderate (yellow). If one is low and the other one is moderate, then the risk is medium-low (lime). If both are low, the risk is low (green). Note that threats may be associated with multiple risk levels, depending on the asset and threat actor involved. We categorize each based on its highest risk level. The rest of this section reviews the threats by risk level.
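The combination rule above can be written as a small lookup. The following Python sketch is one way to encode it, assuming each rating is scored 0 (low), 1 (moderate), or 2 (high) and the two scores are added; the scoring scheme and function names are our illustrative assumptions, chosen because the sum reproduces the five levels described above. It is not the tool used to produce the appendices.

```python
# Assumed numeric scores for the three rating levels.
LEVELS = {"low": 0, "moderate": 1, "high": 2}

# Risk levels indexed by likelihood score + impact score (0 through 4).
RISK_BY_SCORE = ["low", "medium-low", "moderate", "medium-high", "high"]


def risk_level(likelihood, impact):
    """Combine a likelihood rating and an impact rating into a risk level."""
    return RISK_BY_SCORE[LEVELS[likelihood] + LEVELS[impact]]


def highest_risk(cells):
    """Return the highest risk level across a threat's matrix cells.

    cells: iterable of (likelihood, impact) pairs, one per
    asset / threat actor combination in which the threat appears.
    """
    return max((risk_level(l, i) for l, i in cells), key=RISK_BY_SCORE.index)


# Hypothetical ratings for one threat across three cells.
cells = [("low", "low"), ("moderate", "high"), ("high", "low")]
print([risk_level(l, i) for l, i in cells])
print(highest_risk(cells))
```

The `highest_risk` helper mirrors the note that a threat appearing at several risk levels is categorized by its highest one.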
High risk threats
There are 19 high risk threats:
- Compromise due to exploited trust link from sentry or support system
- Compromise due to exploited trust link from validator or support system
- Compromise due to MitM attack
- Compromise due to other network vulnerability
- Compromise due to other network vulnerability over trusted link
- Compromise due to phishing / phishing hole
- Compromise due to ransom / extortion / bribery
- Compromise due to supply chain attack
- Compromise due to Tendermint network vulnerability
- Compromise due to Tendermint network vulnerability over trusted link
- Compromise of any accessible assets due to extortion
- Extreme event leaves personnel unable to perform duties
- Hardware failure
- Loss due to compromise of the holder’s wallet
- Network unavailable due to DDoS ransom
- Unavailable due to DDoS
- Unavailable due to ransom / extortion
- Unavailable due to targeted network outage
- Unstaked due to compromise of the owner’s credentials
Medium-high risk threats
There were 29 medium-high risk threats, and 12 of those were covered at a higher risk level, leaving 17:
- Compromise due to infected USB device
- Compromise for botnet due to network vulnerability
- Compromise for botnet due to other network vulnerability
- Compromise for botnet due to phishing / phishing hole
- Compromise keys due to compromise of backup / support system
- Compromise keys due to compromise of validator system
- Death due to natural or unnatural causes
- Destruction due to sabotage
- Keys on discarded hardware
- Posting of hate speech or other inflammatory material
- Stake flip attack
- Unavailable due to data center unavailability (power, cooling, etc.) due to disaster, weather event, etc.
- Unavailable due to ransom / ransomware
- Unavailable due to targeted network outage over trusted link
- Unstaked due to ransom / extortion / bribery
- WAN unavailable due to data center unavailability (power, cooling, etc.) due to disaster, weather event, etc.
- Website unavailable due to DDoS
Moderate risk threats
There were 18 moderate risk threats, and 11 of those were covered at a higher risk level, leaving seven:
- Compromise due to phishing
- Compromise website to add false, misleading, or inflammatory statements due to network vulnerability
- Destruction due to event that destroys data center
- Negative forum reviews
- Negative forum reviews for extortion
- Tie up stake with end of operations
- Unavailable due to DDoS over trusted link
Medium-low risk threats
There were five medium-low risk threats, all of which were covered at a higher risk level.
Low risk threats
There was one low risk threat, which was covered at a higher risk level.
The examination in this risk assessment makes some assumptions as to how certain aspects of Tendermint / Cosmos Hub operate. A few open questions to help verify those assumptions are:
- Can a validator operate without the owners having any stake (that is, can it operate solely on delegate stake)?
- Is the stake for validators stored in a wallet, or is it effectively “in the aether”, such as being accounted for in the blockchain?
- If a validator ends operations, does the stake invested in that validator still go through the three-week unstaking process to be returned to the investors?
This assessment covered the assets of and threat actors against a typical Tendermint / Cosmos Hub Validator. Given those, a list of 43 potential threats was developed. Those threats were assessed for likelihood and impact to develop an overall risk rating for each threat, absent any compensating controls. We welcome the community’s feedback on the threats and their risk ratings, as well as the few open questions that arose in the course of the assessment. Please leave any feedback in the post comments, below.