TPS is a terrible metric, stop using it to compare systems

Please note: This article discusses and compares different aspects of Ethereum scalability solutions. I work on Optimism, an Optimistic Rollup system, and therefore have an inherent bias in this discussion. I've tried to be as fair as possible while still taking issue with certain statements that I believe to be disingenuous. Enjoy!


A very bold statement is made

I recently saw the following tweet which made some bold claims about the theoretical maximum throughput of various Ethereum scalability solutions.

One has to be very careful when making statements about "theoretical maximum throughput" of any system. Theoretical bounds really only make sense in the context of a well-defined problem space. I feel quite comfortable making a statement about the theoretical maximum speed of light. I do not feel comfortable making a statement about the theoretical maximum throughput of an Ethereum scaling solution (because I'm almost certain to be wrong).

As a side note, the author of the original tweet seems to misunderstand the meaning of "theoretical maximum". A reply by the author states that "all of these numbers are subject to change". One would typically expect a theoretical bound to be much less flexible. Is it really a theoretical bound if we can expect to exceed it within the next year? Oh well...

Examining the evidence

Either way, let's attempt to analyze the claims here in a more rigorous manner. The source given for the 100 TPS "max" for Optimistic Rollups is an article by Matter Labs, the company behind zkSync (a ZK Rollup). You should generally be careful when reading claims that anyone makes about their competitors (including the claims I'm making in this article, of course). This is no exception. Let's break down the math behind the claims.

Matter Labs effectively relies on the assumption that computation is unbounded, chains can process as many transactions as you'd like, and any rollup system is simply limited by the maximum number of transactions it can post to Ethereum every second. I do want to dig into that assumption (because it's wrong), but let's use their model to see what an appropriate "theoretical maximum" TPS for an Optimistic Rollup should look like given what we know about building ORUs today.

Matter Labs cites a Telegram message (really? is that where crypto's at with citations?) claiming that a simple transfer transaction on an Optimistic Rollup costs ~4k gas to publish to Ethereum. It turns out this number is actually vaguely reasonable for Optimistic Rollups today. A simple transfer on Optimism costs ~3800 gas on Ethereum: ~1700 of this gas comes from calldata and ~2100 comes from the execution overhead of processing these transactions on-chain. With Optimism's recently enabled calldata compression we can expect the calldata portion to shrink by ~35% on average, leaving us with a total cost of roughly ~3200 gas. Ethereum blocks currently have a target gas limit of 15m and are produced approximately every 13 seconds, so let's do some math:

# some variables
gas_per_block = 15_000_000
seconds_per_block = 13
gas_per_tx = 3200

# some math
gas_per_second = gas_per_block / seconds_per_block  # ~1,153,846 gas/s
tx_per_second = gas_per_second / gas_per_tx         # ~360 TPS

360 is clearly bigger than 100 [citation needed], so we've already broken our theoretical limit. But c'mon, we're talking about theoretical numbers here. We can do better. Optimism is planning to roll out a major upgrade (Bedrock) which should bring per-transaction overhead close to zero. This means that the gas required to publish a transaction to Ethereum will be purely defined by its calldata size. Our simple transfer transactions will only cost ~1100 gas (after compression) on Ethereum, bringing our TPS all the way up to 1049. We just broke 1000 TPS and we didn't even need BLS signatures!
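
Here's a rough sketch of that Bedrock-era math, using the same simple-transfer assumptions as above (the ~1100 gas figure is just the compressed calldata estimate, not a measured number):

# same block parameters as before
gas_per_block = 15_000_000
seconds_per_block = 13

# Bedrock assumption: execution overhead is ~0, so the cost is just compressed calldata
gas_per_transfer = 1100

gas_per_second = gas_per_block / seconds_per_block   # ~1,153,846 gas/s
tx_per_second = gas_per_second / gas_per_transfer    # ~1,049 TPS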

We can do even better! Crank that TPS up!

I feel like I've probably already made my point about theoretical maximum values. Your predicted maximum should be based on the information you have available at the time. I understand that techniques for reducing rollup costs weren't widely known in 2019, but it's 2022 and these numbers clearly aren't valid anymore. But, since we're talking about theoreticals, we can always do better. EIP-4844 (blob transactions) is expected to bring the cost of publishing data to Ethereum down by a factor of almost 100. Let's be conservative and assume that it only gives us a 10x improvement; we're immediately up to more than 10000 transactions per second!

If you want to get really crazy, we can start to think about sharding. Optimism is designed so that its chain data can be published to multiple shards at the same time. Ethereum is expecting to host 64 shard chains. Let's be conservative again and assume that we only use some of these shard chains and get another 10x cost improvement. Now we're up to a theoretical maximum throughput of over 100000 transactions per second. That's more than Visa's reported 64000 TPS. We've solved the scalability problem!
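
To make the escalation explicit, here's the same back-of-the-envelope math with the deliberately conservative 10x multipliers for blob transactions and sharding layered on top of the Bedrock estimate. The multipliers are assumptions taken straight from the two paragraphs above, not measurements:

bedrock_tps = 1049            # from the earlier estimate

# conservative assumption: blobs (EIP-4844) only cut data costs by 10x
blob_tps = bedrock_tps * 10   # ~10,490 TPS

# conservative assumption: using only some shard chains gives another 10x
sharded_tps = blob_tps * 10   # ~104,900 TPS, comfortably past Visa's 64000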

Bad assumptions lead to bad results

Okay, we've had our fun. 100k TPS is clearly unreasonable, but why? First, our methodology here is completely broken. Simple transfer transactions are a small subset of all transactions, so you can't just compute a maximum TPS based on the unreasonable assumption that all transactions are simple transfers. Still, the average calldata cost of a transaction on Optimism is only ~2300 gas, so we could still use the methodology above to get to crazy TPS numbers.

Of course, we also don't have free rein over the entirety of Ethereum's block space. We're sharing the blockchain with other applications and even other rollups. It's hard to sustain 100k TPS if every transaction is extremely expensive because we're constantly buying up so much of Ethereum's block space. A robust theoretical TPS analysis would likely have to consider the impact of fees on transaction volume. It might not be unreasonable for a popular rollup to consistently purchase some large percentage of every block (20%?), so we're still up in the "many thousands" of TPS (remember, we made conservative estimates of the impact of EIP-4844 and sharding).
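
As a purely illustrative sketch (the 20% block-space share, the ~2300 gas average transaction, and the combined 100x data-cost improvement are all assumptions from the paragraphs above, not measurements), the "many thousands" figure falls out like this:

gas_per_second = 15_000_000 / 13          # ~1,153,846 gas/s on Ethereum
rollup_share = 0.20                       # assume we only buy 20% of each block
gas_per_avg_tx = 2300                     # average calldata cost from above

# at today's data costs
tps_today = gas_per_second * rollup_share / gas_per_avg_tx   # ~100 TPS

# layer on the conservative 10x (blobs) * 10x (sharding) improvement
tps_future = tps_today * 100                                 # ~10,000 TPS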

Another major limiting factor here is compute. Compute is not free. If compute were free, Ethereum would be much cheaper than it is and we might not have a scaling problem to be solving in the first place. You cannot make the assumption that your client software will be able to instantaneously process tens of thousands of transactions per second. Compute will get cheaper over time, yes, but this also means that compute will get cheaper for everyone. Assuming that you can process arbitrarily many transactions per second is like feeling satisfied that you could build a car from scratch, without ever actually doing it, just because you know you could pull it off after years and years of effort.

If you calculate things based on bad assumptions, you will get bad results.

TPS still doesn't convey anything useful

At the end of the day, even if you could construct the perfect operating assumptions for a TPS comparison, TPS simply doesn't convey anything useful. I can always spam my network with thousands of small, low-impact transactions and claim that I'm able to handle a large number of transactions. Sure, the number is real, but it doesn't reflect the way my system is going to be used in the real world, by real people. A more accurate metric would probably be to measure the amount of "Ethereum-equivalent gas units" a system can handle every second. You could do this by attempting to convert your system's local representation of gas into equivalent units on Ethereum. Unfortunately, Ethereum gas doesn't map cleanly onto a single dimension because it has to reflect multiple different resources (storage, compute, bandwidth) with a single unit. So even this is always going to be a bit of a stretch.
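
As a sketch of what that might look like in practice (the function and its inputs here are hypothetical, and the gas-equivalence estimate is exactly the fuzzy part described above):

def gas_equivalent_throughput(txs_processed, window_seconds, avg_l1_equivalent_gas):
    """Rough "Ethereum-equivalent gas per second" for a benchmark run.

    avg_l1_equivalent_gas is the hard part: an estimate of what the average
    benchmarked transaction would cost in Ethereum gas, which collapses
    storage, compute, and bandwidth into a single unit and is therefore fuzzy.
    """
    return txs_processed * avg_l1_equivalent_gas / window_seconds

# e.g. 50,000 simple transfers in 60 seconds at ~21,000 gas each
print(gas_equivalent_throughput(50_000, 60, 21_000))  # ~17.5M gas/s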

Again, the point here is that TPS is a bad metric that doesn't convey anything useful. TPS can be manipulated heavily to favor one system or another. It's not a standardized metric and no one ever treats TPS comparisons with the level of rigor necessary to make it even remotely valuable.

So I'll say it one more time: TPS is a terrible metric, stop using it to compare systems.

For people who are looking to compare various different systems, just ignore TPS. You're much better off simply testing the systems out to see what they can actually do. People who talk your ear off about the "theoretical properties" of their system will either never deliver or take years to get to anything useful. Always take vanity metrics (like TVL, ugh, what a bad metric, that deserves its own post) with a very salty grain of salt.

Why do I care?

I care about this sort of stuff because real people read these tweets and make decisions based on them. I find these sorts of TPS comparisons to be at best misguided and at worst entirely deceptive. At the end of the day, I'm left feeling that these comparisons don't treat users with the level of respect they deserve. We should be creating the best possible world for our users, so should we really be pitching them impressive numbers that don't mean anything?

Maybe I'm just getting old and grumpy, but whatever. Someone has to keep talking about this sort of stuff or it'll never stop.

-kf