
Cosmos / Tendermint Network Architecture

This post is a bit different from our usual offerings in that it’s not about security, at least not directly. We have been asked to look into some security aspects of the Cosmos Network, which is based on the Tendermint blockchain technology. To do so, I first needed to understand what its network communications look like, and I wanted to capture that here to help others and facilitate discussion. We’ll take a quick look at what Cosmos / Tendermint is, what its network components and key data components are, and then how they talk to each other.

Cosmos / Tendermint

Let’s start with the underlying technology: Tendermint is a blockchain technology designed to use Proof of Stake instead of Proof of Work (as Bitcoin and Ethereum do). It is also Byzantine Fault-Tolerant: it stays available as long as fewer than 1/3 of the validators fail (including by being actively malicious), and it takes more than 2/3 of the validators being actively malicious before invalid blocks are stored on the chain.
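To make those fractions concrete, here is a minimal sketch of the arithmetic. It assumes every validator carries equal weight, which is a simplification: Tendermint weighs votes by stake (voting power), not by head count. The function name and dictionary keys are made up for illustration.

```python
def bft_thresholds(n_validators):
    """Fault thresholds for n equally weighted validators (a simplification)."""
    if n_validators < 1:
        raise ValueError("need at least one validator")
    # A block is committed only with votes from MORE than 2/3 of validators,
    # i.e. the smallest integer q such that 3 * q > 2 * n.
    quorum = (2 * n_validators) // 3 + 1
    return {
        "quorum": quorum,
        # Largest number of failed validators that still leaves a quorum.
        "max_faults_while_live": n_validators - quorum,
        # Smallest number of colluding validators that form a quorum alone.
        "min_malicious_to_commit_invalid": quorum,
    }


if __name__ == "__main__":
    print(bft_thresholds(100))
```

With 100 equally weighted validators this gives a quorum of 67, so the chain keeps running with up to 33 validators down, and it would take 67 colluding validators to commit an invalid block.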

The Cosmos Network is a network of blockchains, in the same way that the Internet is a network of networks. The core blockchain in this collection is called Cosmos Hub and is powered by Tendermint, although other blockchains can be wired up to Cosmos. Such a network of blockchains enables applications like exchanges: not just trading one cryptocurrency for another, but doing things like trading cryptocurrency for energy contracts.

While Cosmos may be able to run different blockchain technologies, the remainder of this post will be focused on the implementation of Tendermint for Cosmos Hub.

Network Components

Full Node: A machine that keeps the full ledger of the blockchain.

Validator: A Full Node that can propose new blocks and vote to accept or reject proposed blocks. Cosmos Hub will be limited to 100 Validators initially, and up to 300 after ten years. Validators should be very secure, will be punished if they behave badly, and as a consequence should not be directly accessible from the public Internet.

Sentry Node: Since Validators should not be accessible from the public Internet, Sentry Nodes are Full Nodes that are typically accessible from the public Internet, essentially acting as an application firewall for Validators by receiving messages and validating them before forwarding them along. Sentries may also be private, either for internal redundancy, or facilitating Validator-to-Validator communication.

Application Node: Applications can technically connect to any publicly available Full Node to interact with the rest of the network, but in practice they will generally run their own Full Nodes for such interactions.

Seed Node: A node that continuously queries the other publicly available nodes to discover their peers, so that it can give any new node joining the network an initial list of “seed” peers to connect to.

Lightweight Node: A machine that only keeps the most recent block or blocks of the blockchain.
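The Validator and Sentry Node roles above largely come down to who is allowed to learn about whom. The sketch below shows one way that split might be expressed. The setting names (pex, persistent_peers, private_peer_ids) are modeled on the p2p options in Tendermint's config.toml as I understand them, so check them against the version you run. The node IDs, IP addresses, and helper functions are hypothetical.

```python
P2P_PORT = 26656  # default Tendermint P2P port


def peer_address(node_id, host, port=P2P_PORT):
    """Format a peer as nodeID@host:port."""
    return f"{node_id}@{host}:{port}"


def validator_p2p(sentry_addresses):
    """P2P settings for a Validator that only talks to its own Sentry Nodes."""
    return {
        "pex": False,  # no peer exchange: never gossip or learn addresses
        "persistent_peers": ",".join(sentry_addresses),
    }


def sentry_p2p(validator_address):
    """P2P settings for a Sentry Node fronting a single Validator."""
    validator_id = validator_address.split("@", 1)[0]
    return {
        "pex": True,  # discover and gossip with the public network
        "persistent_peers": validator_address,
        "private_peer_ids": validator_id,  # never advertise the Validator
    }


if __name__ == "__main__":
    sentries = [peer_address("sentry1", "10.0.0.2"),
                peer_address("sentry2", "10.0.0.3")]
    print(validator_p2p(sentries))
    print(sentry_p2p(peer_address("val1", "10.0.0.1")))
```

The point of the split is that the Validator dials only peers it already trusts, while each Sentry Node takes part in public peer discovery but keeps the Validator's ID out of what it shares.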

Data Components

Transaction: A discrete and deterministic state change being recorded by an application.

Block: A collection of transactions.

Blockchain: An ordered collection of Blocks, validated by consensus.
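As a rough illustration of how these three pieces nest, here is a toy model. It is not Tendermint's actual block format: the class and field names are invented, each block is linked to its predecessor by a SHA-256 hash (an assumption typical of blockchains in general), and the consensus step is left out entirely.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    payload: str  # opaque, application-defined state change


@dataclass(frozen=True)
class Block:
    height: int
    prev_hash: str  # hash of the previous block ("" for the first block)
    transactions: tuple

    def hash(self):
        body = json.dumps(
            {
                "height": self.height,
                "prev_hash": self.prev_hash,
                "txs": [tx.payload for tx in self.transactions],
            },
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Blockchain:
    def __init__(self):
        self.blocks = []

    def append(self, transactions):
        """Add a block that points at the current tip of the chain."""
        prev_hash = self.blocks[-1].hash() if self.blocks else ""
        block = Block(len(self.blocks) + 1, prev_hash, tuple(transactions))
        self.blocks.append(block)
        return block

    def is_consistent(self):
        """Check that every block references the hash of its predecessor."""
        pairs = zip(self.blocks, self.blocks[1:])
        return all(cur.prev_hash == prev.hash() for prev, cur in pairs)
```

Because each block commits to the hash of the one before it, changing an old transaction breaks every later link, which is what lets a Full Node check the ledger it has been handed.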

Network Communication

By default, Tendermint RPC communication occurs over an HTTP REST interface on TCP port 26657 (what we call the Application Connection below). P2P communication occurs over TCP port 26656. We’ll start with the big picture of how all the components talk to each other, and then zoom in for greater detail on each area.
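For anyone who wants to poke at the RPC side, a minimal sketch follows. The ports are the defaults named above. The helper names are hypothetical, the host is whatever node you have access to, and the endpoint names used in the usage note (status and net_info) are ones the Tendermint RPC exposes as far as I know. The sketch makes no assumption about the shape of the JSON that comes back.

```python
import json
from urllib.request import urlopen

RPC_PORT = 26657  # default Tendermint RPC port
P2P_PORT = 26656  # default Tendermint P2P port (not used for RPC calls)


def rpc_url(host, endpoint, port=RPC_PORT):
    """Build the URL for an RPC endpoint on a node."""
    return f"http://{host}:{port}/{endpoint.lstrip('/')}"


def rpc_get(host, endpoint, port=RPC_PORT, timeout=5.0):
    """Fetch an RPC endpoint and return the decoded JSON document."""
    with urlopen(rpc_url(host, endpoint, port), timeout=timeout) as response:
        return json.load(response)
```

Assuming a node is running locally with its RPC port reachable, rpc_get("localhost", "status") would return that node's view of itself, and rpc_get("localhost", "net_info") would return its view of its peers.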

Cosmos Network Architecture

Now we’ll take a closer look at the Applications, Other Nodes, Validators and Public Sentry Nodes, Private Sentry Nodes, and wrap up with some anti-patterns. But first, a closer look at the Key for the connections we see here.

Connection Key

Applications

Cosmos Network Architecture - Applications

Other Nodes

Cosmos Network Architecture - Other Nodes

Validators and Public Sentry Nodes

Cosmos Network Architecture - Validators and Public Sentries

Private Sentry Nodes

Cosmos Network Architecture - Private Sentries

Anti-Patterns

Cosmos Network Architecture - Anti-Pattern 1

Cosmos Network Architecture - Anti-Pattern 2

Conclusion

So there you have it: a quick summary of what the Cosmos Network, Cosmos Hub, and Tendermint are, along with their components and how they all connect to each other. This is still something I am figuring out, so please leave your feedback in the comments and I will update as appropriate.

Copyright

This post, specifically, is licensed as follows:
Creative Commons License
Cosmos / Tendermint Network Architecture by S Terry Brugger, PhD is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://bubowerks.io/blog/2018/07/19/cosmos-tendermint-network-architecture/.

7 thoughts on “Cosmos / Tendermint Network Architecture”

    1. Thank you Iancu!
      Since a Tendermint network like Cosmos Hub has a limited number of validators and the availability of the network depends on more than ⅔ of the validators being online and operating correctly, validators will be punished for downtime, which will mean a loss of stake (monetary loss) for the operators and delegators of that validator. There are numerous controls that should be implemented to protect against that (to be discussed in a future blog post), but the key one is system redundancy: in other words, there should be a system to fail over to in the event of an outage for any reason.
      Failover systems come in three flavors: cold, warm, and hot. A cold failover needs to be turned on and spun up, sometimes with physical intervention; it is good for low cost when you have a Recovery Time Objective (RTO) on the order of days. A hot failover runs live in parallel with the primary system; often the load is distributed between the systems, and if one of them goes offline, the others continue working such that the users shouldn’t even notice the outage. It is good when your RTO is on the order of seconds or less, but is much more expensive to operate since all the systems must be synchronized. A warm failover is in between the two: you have a system running with everything ready to go, but in the event of a failure there might be a delay as the failover system is brought up to the current state (through either manual or automated means).
      Fortunately, in a blockchain-based system this is much easier, as the failover system can operate as a private peer and always have the current blockchain available, meaning that failover is limited to changing its state to tell it that it’s now a validator and should start participating in consensus activities.

  1. Awesome article @drzow! Can I translate this into Korean and spread it to the Korean Cosmos community? I think it’ll be very helpful for people to understand the Cosmos architecture at a technical level.

  2. hey, editor
    Bravo, I really love this article.
    I found the `Connection Key` section is a bit empty. Are we missing some graphics here?
    Thanks in advance.

    1. All of the graphics are in SVG format to try to allow for anyone to load them up and zoom in as much as they need to — if you are not seeing anything, my first guess is it might be your browser or version.
