Optimistic EVM proofs, should they be native?

It looks like a new L2 (or perhaps an L2 research project?) is entering the scene! I recently spotted a tweet from a project called Specular that's marketing itself as "the first 'EVM-native' Optimistic Rollup":

Specular is aiming to be an EVM-equivalent Optimistic Rollup (just like Optimism). It's attempting to differentiate itself based on its underlying optimistic proof model.

According to the Specular website, this means that "[Specular] provides stronger security and decentralization properties than existing optimistic rollup solutions". As someone who works on an L2 (Optimism), I think it's important to dive into alternative architectures and come up with a relatively neutral analysis of their pros and cons. After all, Optimism's decision to move away from the single-step fault proof to its now EVM-equivalent design was the result of this sort of analysis. So, what's the deal with Specular?

Specular's design is actually very straightforward. If you want to build an EVM-equivalent Optimistic Rollup, you need to be able to prove the correctness of the EVM. You effectively have two options for doing this:

  1. Prove the EVM directly ("native EVM")
  2. Prove a lower-level VM (like MIPS) and compile something that implements the EVM (like Geth) down to this lower-level VM

Each of these approaches has its own set of trade-offs. Specular is going with approach #1. Optimism considered both approaches during the design process for its upcoming Bedrock upgrade but settled on approach #2 (Proto even messed around with an early python implementation of the EVM native proof system). Let's take a look at their similarities and differences.

EVM Equivalence

From the perspective of an end-user, a complete implementation of either of these systems should look identical. This is because modern Rollup architectures will completely decouple the client and the proving system. In fact, we construct Rollups today with the mindset that the client has no notion that the proving system even exists.

Both of the above approaches will give you EVM equivalence. Smart contracts that run on Ethereum will run on your Rollup, gas should work the same way, etc etc. Generally speaking, either approach will get you where you want to go in the long-run.

Proof Complexity

All of the facets I'm going to compare here will necessarily be quite nuanced because either approach gets you to EVM equivalence. However, nuanced doesn't mean unimportant. I find this especially true when building software in the real world --- theory often comes into conflict with practical implementation. The success of a system is as much a product of the practical effects of design decisions as it is the theoretical capabilities of that system.

One concrete example of a practical concern derived from theoretical design is the complexity of the on-chain proof component. The EVM is a highly complex architecture that was purpose-built for the blockchain environment. Rules about the calculation of the gas usage of any given opcode can be convoluted and the call frame model introduces a significant number of edge-cases. A native EVM fault proof would have to implement all of these edge cases exactly according to the EVM specification.

It's also worth remembering that an EVM-equivalent fault proof actually needs to implement the entire Ethereum state transition function, which includes multiple components on top of the EVM itself. This include things like the transaction pre-check, receipt and log generation, and block generation. All of these components will introduce additional complexity into the final proof contracts.

In comparison, Optimism's complete MIPS proof contracts are approximately ~1000 lines of code in total (broken up into multiple legible files). Since the lower-level VM approach means the proof has no concept of the EVM, none of the EVM's complexity leaks into on-chain components.

Although it's not an impossible task to build a native EVM proof (LeapDAO made some progress towards this in 2018-2020 though never implemented the full STF), it's significantly more effort than constructing a lower-level system. In the real world, this means at least a small team of people with a deep knowledge of the EVM dedicated to building these components. More people means more overhead, and more overhead means a slower time to production.


Another key point of comparison between these approaches is flexibility. By flexibility, I generally mean to refer to the ability for the approach to introduce changes to the network. In the context of the EVM, I think this can further be broken down into the ability to keep up with EVM changes and the ability to introduce new features entirely.

A native EVM fault proof is necessarily at the mercy of changes to the upstream EVM. Ethereum hardforks approximately twice per year and at least one of these hardforks will typically introduce EVM tweaks (some more than others, like the introduction of EIP-1559 in the London hardfork). If a native EVM proof system wants to maintain upstream EVM equivalence, the proof contracts must be upgraded. Taking into account the tendency for hardfork specifications to be flexible up until the spec deadline and the multi-month upgrade process (when accounting for audit time), I would not be surprised if native EVM chains lagged behind the upstream EVM by at least 1-2 months.

A system with a native EVM proof may struggle further when attempting to introduce new features outside of the EVM. Take for example calldata compression which reduces the cost of an L2 transaction by an average of ~30-40% (very significant savings that can possible be pushed down further with stateful compression techniques). Combined with other techniques, compression can take total transaction cost down more than 50% from an unoptimized system. Users really care about cost reduction, so these sort of numbers can make a huge difference.

As with every other part of the Rollup node, the calldata decompression step must also be provable. This means that a native EVM would have to specify their compression/decompression scheme such that it can be proven on-chain. Off the top of my head, I think the only viable approaches here would either be to specify a custom compression scheme or to find (and probably modify) an existing scheme that includes sufficiently small component functions such that each "step" in the process can be easily implemented and executed on-chain. Again, all of this would have to be carefully implemented and audited (on both the client side and the contract side).

In practice I think this leads to the result that improvements like this are either never added or are added at an extremely slow pace. Compare this to the MIPS case where we can simply use off-the-shelf compression libraries (with small chunk sizes to avoid memory issues on-chain). We can add a library, compile the client, update the code hash stored on-chain and we're done, no contract code changes required. This sort of flexibility gives you an immense amount of power to introduce convenience features that improve the user and developer experience.

The lower-level VM design also makes the introduction of new EVM changes relatively trivial. Simply rebase your client changes on top of the latest upstream client code, update the code hash, and start working on the next important thing. Optimism can be compatible with upstream EVM updates essentially immediately after they're released on Ethereum with minimal effort.

I think this an excellent example of the way in which theoretical design can have a significant impact on practical implementation. Since Optimism can basically freely modify client code without touching any smart contracts, precious engineering time becomes available to work on future improvements (like Sequencer decentralization). A tight engineering team will also generally limit coordination overhead and speed up the process of getting everything done.

Auditability and Correctness

Although auditability is a component of general system security, I think there are sufficiently many things to cover in the context of security that it's best for this to have its own section. For a system to be considered secure, we should be very confident that it does what we think it does. Both the native and non-native approaches to proof design have clear pros and cons on this front.

The native EVM approach gets a strike against it as a result of the complexity of the EVM. Since the EVM is so complex, the contracts that implement the EVM will be equally (if not more) complex. I suspect that the audit process for the EVM proof contracts will be long and drawn out and will require auditors deeply familiar with the EVM.

However, the native EVM gets bonus points as a result of the existence of the KEVM, the formal K model of the EVM. It's possible that one could find a way to actually formally verify the correctness of the on-chain EVM implementation by cross-referencing against the KEVM. I'm not sure how much effort this is in practice, but it's a very promising concept. Of course, this would not apply to any custom system components outside of the scope of the EVM (e.g., calldata compression).

The low-level approach is, as we've discussed, significantly simpler to construct on-chain. Additionally, since changes to the client don't need to be propagated to the proof, EVM updates don't trigger a new smart contract audit.

On the other hand, the flexibility of the low-level approach means that client updates will likely be more frequent, and some types of non-trivial client updates will need to be audited. In this sense, flexibility does not come for free. That certain client changes should be audited applies to both approaches, but the increased ability to make changes means this is probably more of a concern for the low-level approach.

An interesting point against the low-level proof model is that we need to run proofs through a slightly different node than what's running in production. Since very few modern machines run MIPS natively, the node that we run during the proof process needs to sit inside of a MIPS virtual machine. A native (e.g., x86) binary is always going to be significantly more performant than a node inside of a MIPS VM inside of an x86 machine. As a result, we assume that people run the x86 binary in production and run the MIPS binary during the proof process.

This discrepancy means that the low-level approach relies on the assumption that whatever compiler toolchain we're using spits out x86 and MIPS binaries that produce identical results. Effectively, we're relying on the correctness of the compiler toolchain to guarantee the correctness of the proof. I think this assumption is reasonable for a mature (and relatively simple) target like MIPS, but it's still an additional assumption. As I'll cover in a second, we can also mitigate this problem by introducing client and proof diversity.

Client Diversity

Client diversity is a key component of the overall security of a network because it limits the impact of bugs in any one client. Client diversity is why a (theoretical) bug in Geth that mints infinite ETH wouldn't permanently damage Ethereum --- the chain would split and "correct" clients would continue to operate normally until the bug is resolved. Similarly, client diversity on L2 is necessary to prevent a client bug from becoming "reality".

Both the native and low-level approaches to fault proof design can support client diversity in different ways. The native EVM approach has the advantage here that the proof is based (primarily) on the EVM execution trace, whereas the low-level approach is based on the client execution trace. This is important because every client that implements the EVM shares the EVM trace and only specific clients will share the client trace. In practice, this means that different clients can participate in the same challenge process (the proof is client-agnostic).

In the low-level case, different clients would have to participate in different proofs. For example, there would have to be a "Geth proof" and an "Erigon proof". If a user challenges you with the "Geth proof" then you must run a Geth node to respond. Note, though, that this does not require two different sets of smart contracts --- we simply store two different code hashes on-chain and run two challenge processes.

Although running multiple challenge processes is more expensive than running a single process (by a constant N, where N is the number of different clients), you still get client diversity in the end. Early on Optimism will probably "cheat" a bit by only supporting the Geth proof. Under this model, faults can still be detected by an Erigon node but can only be proven by a Geth node. Eventually, when you're ready to add multiple proofs, you can allow the Erigon node to prove faults too.

You can also only support particular versions of a client because, e.g., Geth v1.10.23 will generate a different trace than Geth v1.10.22. The primary downside of this is that the low-level approach might get certain upstream client improvements less frequently. Assuming the upstream changes do not impact the behavior of the EVM, you may be able to circumvent this by running newer clients in production and running older clients in the proof process.

Finally, another advantage of the native EVM approach is that it doesn't care about the language in which the client was written. The low-level approach requires that the language used supports the low-level VM as a compiler target. Conveniently this typically isn't an issue for MIPS. Golang will natively compile to MIPS and many other languages (including Java and Rust) will compile to LLVM which can then target MIPS. This means we have MIPS support for Geth, Erigon, Besu, OpenEthereum (RIP), and Akula out of the box. Since this covers almost all of the major clients, I think it's good enough.

Proof Diversity

The final facet I'll cover is proof diversity. Like client diversity, proof diversity is the idea of having multiple proof implementations. In theory, each additional proof implementation reduces the impact of a bug in any one implementation.

The native EVM proof contract is significantly more complex than the low-level proof, sufficiently so that even building a single native EVM proof will be a massive engineering effort. However, the existence of KEVM means that it might be possible to formally verify the correctness of the EVM aspect of the proof. You would then only need multiple implementations of the non-EVM part (e.g., calldata compression) which seems more achievable.

At the end of the day, I simply don't know how realistic it is to use KEVM to verify the proof contracts. If it's not feasible to do this, I would have a hard time seeing anyone building more than one native EVM proof any time soon. This problem is made more difficult by the fact that you should be building your additional proofs in a clean-room environment to avoid design bugs from one implementation creeping into another.

Proof correctness is so critical to the security of the system that you essentially must rely on either formal verification or proof diversity. Even a single minor bug in the proof could cause a valid transaction result to be rejected or, even worse, an invalid one to be accepted. I imagine that this would be an important long-term research problem for anyone using the native EVM approach.

The low-level approach gives you two options for proof diveristy. You can either choose to implement multiple versions of the low-level VM contract or you can alternatively choose to build a different low-level VM entirely. Since most low-level VMs will be much simpler than the EVM, either choice seems like a viable approach. Optimism's MIPS proof was implemented over the course of several months by a single person (okay fine maybe citing geohot is cheating, but still).

My guess is that low-level Rollups will go for proof diversity over multiple VMs at first. Doing this might give you support for new clients that you previously didn't support and also helps mitigate the x86-to-MIPS issue I described earlier. We also kind of already have low-level proof diversity because Optimism could use Arbitrum's WASM proof and Arbitrum could use Optimism's MIPS proof.

General Utility

Although this isn't a fair comparison, I think it's cool enough that it's worth sharing here. The native EVM approach gives you the ability to prove the correctness of EVM execution. A low-level apporach like MIPS gives you the ability to prove the correctness of any code that compiles to MIPS.

What can you do with this? I don't know. Maybe you can use the MIPS proof to verify that a game of snake was played correctly. Maybe you can create a Bitcoin Rollup on Ethereum. I bet you can do a lot of weird things!

Point is, this makes the low-level approach a foundational building block that many new projects can experiment with. I look forward to the day when we standardize an interface for the bisection game and for VM contracts and maintain a community repository of available VM proof systems on-chain.

In Summation

I wrote all of this because I was interested in exploring the trade-offs between the native and low-level EVM proof approaches. Although the native EVM approach certainly gets you to EVM equivalence, I generally feel that the low-level approach still maintains several significant advantages. In particular, the simplicity, flexibility, and general utility of the low-level approach means it can ship fast and iterate quickly.

I do think that the native EVM approach makes a few things somewhat easier (namely supporting different clients). That said, I think that (1) the low-level approach has a very realistic pathway to client diversity and (2) the complexity of the proof system works against the security gains of supporting multiple clients.

At the end of this post, I still feel that the practical benefits of the low-level approach outweigh any benefits of the native EVM model. A simple proof model means you get to spend much less engineering time on the basics of your protocol. And the ability to free up most of your engineering resources to work on things like EIP-4844 and Sequencer decentralization seems more important in the long-term. But who knows, maybe I'm wrong! I like being wrong, because it means Optimism gets better.

The Catty Corner

And since I'm feeling a little spicy today, I also want to take a second to address specific statements in the Specular tweet thread:

[The low-level EVM approach] ... leads to ... a lack of support for L2 client diveristy

Well, no. Low-level EVM approach can do that too.

[The low-level EVM approach] ... leads to ... poor upgrade transparency

You just compile the open-source client code and store a hash of the binary on-chain. Anyone can clone the repo, look at the diff since the last release, compile the client, and verify the code hash for themselves before the upgrade goes through. I think that's pretty transparent.

[The low-level EVM approach] ... leads to ... frequent upgrades

Hmmm, maybe. I did give some ways to get around this (use new version in prod, old version in proofs). But the ability to upgrade frequently and add new features or optimizations can also be beneficial.

[Specular] ... provides stronger security and decentralization properties than existing optimistic rollup solutions

... is a strong statement about a nuanced subject. The native EVM approach has its pros and cons. I don't think it offers any significant advantages over the low-level approach at a practical level.

And Finally

You made it to the end! Thanks for sticking around. I hope you liked this post since it took me all day to write. If you have questions or comments, please feel free to reach out to me directly.