Foreign vs Native DA

An Exploration into the depths of Data Availability

Shoutout to Noah Citron @ncitron.eth for our continual discussions— I love you dearly. And @EulerLagrange for helping me noodle on non-attributable DA. If you like what you read, I'm active on Farcaster @vinny.

Ethereum’s successful Dencun upgrade in March and the recent launch of EigenLabs’ data availability solution, EigenDA, have brought data availability (DA) back into the spotlight. Dencun included EIP-4844, Ethereum’s first major leap towards a rollup-centric future, which introduced data blobs. Blobs provide a dedicated space for rollup data: rather than being stored as calldata, layer-2 data (e.g., from rollups) lives in a separate data availability space that is not accessible to the Ethereum Virtual Machine (EVM), so blobs can be verified separately from a block. While data blobs serve as the native Ethereum DA solution, other non-native DA solutions—like EigenDA and the Celestia network—have become wildly successful. When examined more closely, however, there are important and nuanced differences between native and non-native DA concerning security and attributability, and those differences should ultimately be reflected in price.

Why Bother with DA at all?

For a rollup to settle its state on L1, the L1 has to be convinced that the rollup executed its transactions correctly. The L1 nodes could run all of the transactions to verify that the claimed state is correct, but that would prevent any real scaling. In practice, a rollup will post state differences and transaction data to the L1 and then rely on either a fault proof or a zk (zero-knowledge) proof to convince the L1 that the state is correct.

A fault proof operates as follows—the L2 sequencer posts an updated state root along with a bond; any network participant can download the transaction data and re-run the transactions; and if a party (the verifier) detects a mismatch between the state root posted on the L1 chain and the state it computes locally, the verifier can challenge that state transition and prove its invalidity. The verifier does this by computing a machine trace, which captures the step-by-step execution of the EVM, including opcode execution, memory and storage modifications, and any other relevant state changes. The verifier and sequencer then run what is essentially a binary search over the trace to find the first step where they disagree. That single step is run on chain, and if the result differs from the sequencer’s claim, the sequencer’s bond is given to the verifier and the rollup state is reverted. Because of this challenge process, fault-proof rollups typically have a seven-day settlement period. Interestingly, when analyzed game-theoretically, any sequencer that attempts to deceive the L1 will be caught, thanks to the verifier’s economic incentive and the ability to re-run the transactions. Crucial to this process, however, is the universal availability of rollup transaction data. If the data is not available, no party can re-run the transactions and challenge a sequencer’s claims, allowing a sequencer to commit fraud and lie about the rollup state with impunity.
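
To make the bisection step concrete, here is a minimal Python sketch of the idea—not any rollup’s actual fault-proof implementation; the traces are simplified to lists of per-step state hashes and the function names are hypothetical:

```python
import hashlib

def state_hash(state: bytes) -> bytes:
    """Stand-in for a real state root: just a hash of the execution state."""
    return hashlib.sha256(state).digest()

def find_first_disputed_step(sequencer_trace, verifier_trace):
    """Binary-search two machine traces (lists of per-step state hashes) for the
    first step where they diverge. Both parties agree on the starting state."""
    lo, hi = 0, len(sequencer_trace) - 1   # agree at lo, disagree at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if sequencer_trace[mid] == verifier_trace[mid]:
            lo = mid    # still in agreement; the dispute lies later
        else:
            hi = mid    # already diverged; the dispute lies at or before mid
    return hi           # the single step the L1 needs to re-execute on-chain

# Toy usage: the two parties first diverge at step 6.
honest = [state_hash(bytes([i])) for i in range(10)]
dishonest = honest[:6] + [state_hash(b"forged" + bytes([i])) for i in range(6, 10)]
assert find_first_disputed_step(dishonest, honest) == 6
```

Only the single disputed step ever needs to be executed on-chain, which is what keeps the challenge game cheap—but the verifier can only build its trace if the transaction data is available.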

While a validity proof operates differently, data availability is still crucial. In a validity proof, the sequencer posts a zk proof that the transactions were executed correctly and therefore that the new state root is valid. A zk validity proof is a succinct cryptographic proof that the state transition followed the rules of the system: it indisputably proves that the transactions and the state change are valid while revealing no information about the transactions themselves. In this case, transaction data is not needed for full nodes to verify that the transactions were run correctly and to compute new state roots. If the data is not made available, however, nodes on the L1 cannot open the cryptographic commitment to the state and determine account balances—which matters in certain scenarios. For example, if the data is not available and the rollup operator ceases operations, rollup users would be unable to withdraw to the main chain because nodes have no way of calculating their current balances. And even a single transaction by a malicious sequencer drastically changes the state root—so without the data being made available, no one can withdraw their funds.

To understand why, it is imperative to delve into what the state root is. The state root is a cryptographic commitment that can be opened—usually a Merkle Patricia trie, though it can also be a standard binary Merkle tree or a Verkle tree. The most basic Merkle tree is a tree where leaf nodes carry the state data—think account balances and smart contracts. Each leaf node is hashed, and neighboring hashes are hashed together to create a parent node. Neighboring parent nodes are then hashed together in turn, and this process continues until the root of the tree is reached and a single hash—the state root—is produced. Changing even a single leaf node changes every hash on its path up the tree, drastically altering the intermediate nodes and the state root. Because of this property of hash functions, the state root is a commitment to the state of the tree: any change or disagreement between nodes results in a different state root. When a user wants to withdraw their funds from the L2 to the L1, the L1 nodes “open” the commitment by checking an inclusion proof, which requires only the log(n) sibling hashes along the path from the user’s leaf to the root. If a byzantine sequencer conducts even a single transaction without making the data available, the state root changes radically, and no one can construct a Merkle inclusion proof against the new root. Since account balances are no longer known, all funds become inaccessible. Therefore, a malicious sequencer could extort accounts and successfully launch a ransom attack. It is important to note that the sequencer can’t steal users’ funds, due to the incorruptibility of the proof system, but without making the data available, it can hold users’ balances to ransom.
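
Here’s a minimal sketch of that opening process using a plain binary Merkle tree—Ethereum’s real structure is a Merkle Patricia trie, and the account encoding here is a made-up simplification:

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    """Build a plain binary Merkle tree bottom-up and return its root (the 'state root')."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node if the level is odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_inclusion(leaf, proof, root):
    """Open the commitment for one leaf: `proof` is the log(n) list of
    (sibling_hash, sibling_is_right_child) pairs from the leaf up to the root."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

# Example: prove the balance of account 2 in a four-leaf state tree.
leaves = [b"acct0:10", b"acct1:25", b"acct2:7", b"acct3:0"]
root = merkle_root(leaves)
proof = [(h(leaves[3]), True), (h(h(leaves[0]) + h(leaves[1])), False)]
assert verify_inclusion(leaves[2], proof, root)
```

Notice that building the proof requires knowing the leaf data and its siblings—exactly the information that disappears when a sequencer withholds the data.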

By now, the importance of data availability should be clear, and we can properly define the DA problem—how to prove that transaction records exist and are available to download without each node actually downloading the data. EIP-4844, shipped in the Dencun upgrade, implemented proto-danksharding (shoutout Vitalik, Protolambda, and Dankrad Feist), the first step in Ethereum’s DA solution. Currently, all blob data is stored by all of the nodes, but with the future implementation of PeerDAS, which paves the way to full danksharding, the blob data will be “sharded” amongst many nodes. L1 nodes will then use data availability sampling (DAS) to ensure the data has been made available.

Data Availability Sampling Explained

Here’s a good mental model for understanding DAS, what it tries to solve, and why it works. PeerDAS, the scheme designed by Ethereum researchers (another shoutout to Vitalik, Dankrad, and Protolambda) for sampling data blobs, uses a more complicated two-dimensional Reed-Solomon erasure-coding construction, but the fundamental tenets are the same. For anyone interested, I encourage you to read their paper.

Let’s start with n data points that we wish to shard amongst a committee of nodes. To make the numbers simple, let n = 1000 and give our committee 10 nodes. We can plot our data on a Cartesian plane where the first data point (n = 1) has an x-value of 1 and a y-value corresponding to the data value; similarly, the second data point (n = 2) has an x-value of 2 and a y-value corresponding to that second data value. With all 1000 data points plotted, we can fit a polynomial of degree n-1 (in this case, degree 999) to the data. Because a degree-999 polynomial is uniquely determined by any 1000 of its points, we can evaluate it at another 1000 x-values (e.g., up to x = 2000). While our data is encoded in the first 1000 points, it’s crucial to note that recovering any 1000 of the 2000 points is sufficient to recreate the polynomial and then recompute those first 1000 points. Now we split our 2000 points amongst the 10 nodes in our committee so that each holds 200 points.
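
Here’s a toy sketch of that polynomial-extension idea, scaled down to 4 data points extended to 8 so it fits on a screen. The prime modulus and Lagrange interpolation are stand-ins for the mental model above, not Ethereum’s actual field or encoding:

```python
# A toy 1-D Reed-Solomon-style extension over a prime field.
P = 65537  # small prime modulus, chosen arbitrarily for the demo

def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial passing through `points`
    (a list of (xi, yi) pairs), working modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

data = [42, 7, 99, 13]                                   # 4 original data points
points = list(enumerate(data, start=1))                  # plotted at x = 1..4
extended = [(x, lagrange_eval(points, x)) for x in range(1, 9)]  # extend to x = 1..8

# Any 4 of the 8 extended points are enough to recover the original data:
subset = [extended[1], extended[4], extended[6], extended[7]]
recovered = [lagrange_eval(subset, x) for x in range(1, 5)]
assert recovered == data
```

The same logic scales to 1000 points extended to 2000: any half of the extended points pins down the polynomial and therefore the original data.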

Trivially, we could sample all 10 nodes for all 2000 points and recreate our original data that way. However, this would quickly overwhelm our network and yield no bandwidth savings. So our task is as follows: how can we be convinced that the data is available without having to sample all of it? The key is that while we expect each node to hold 200 data points, in reality we only need each to make 100 points available for our initial data to be recoverable. If we sample a point at random and the node has stored 0 points, then clearly we won’t get a response, and if the node has stored all 200 points, we’ll always get a response.

But what if the node has stored 99 of the 200 points? Well, if every node we sample stores only 99 of its 200 points, then we won’t have enough values to recreate our original data (since 99 × 10 = 990 is less than 1000) and we run into the issues discussed earlier. If we sample such a node at a random point, there’s about a 50% chance we’ll get a response. That’s not enough for us to trust. But what if we sample the node 20 times? The chance that the node can answer every query is (99/200)^20, or roughly 1 in 1.3 million. That seems like a pretty safe assumption. By sampling only 20 of the 200 points, we accept roughly a 0.00008% chance that our data isn’t actually available and we’re being played the fool. Interestingly, this scales quite well: if each node holds 2000 points instead of 200, we still only need to sample about 20 points for a similar guarantee.
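
For the skeptical, a quick back-of-the-envelope check of those numbers, treating the 20 queries as independent draws (which slightly simplifies the real combinatorics):

```python
# Probability that a node holding only `stored` of its assigned points can
# answer every one of our random sample queries.
def chance_of_answering_every_query(stored: int, assigned: int = 200, queries: int = 20) -> float:
    return (stored / assigned) ** queries

print(chance_of_answering_every_query(200))                  # 1.0 -- honest node always answers
print(chance_of_answering_every_query(99))                   # ~7.8e-07, about 1 in 1.3 million
print(chance_of_answering_every_query(999, assigned=2000))   # ~9.4e-07 -- same guarantee at larger scale
```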

The astute among you may have an enduring question that I have avoided answering until now—what prevents a node that is being sampled from just making up a response? After all, the whole ethos of the crypto space is to minimize trust, and it seems like we’re putting a lot of faith in the idea that Joe Schmo’s node won’t send us a random value just to stop our pesky inquiries. The short answer is that the initial data is committed to using a KZG commitment, which makes it impossible for the node to answer a query falsely without our knowing. The longer answer is that when encoding the data into chunks, the block proposer uses a KZG commitment scheme to commit to the polynomial, and this commitment is published to all the nodes along with the data availability root (DAR), the Merkle root of the data chunks. The network nodes then download the chunks they receive, use the DAR to verify that the chunks are consistent, and check the KZG commitment to ensure that the chunks came from the right polynomial. When nodes want to check DA and sample a random fraction of the data points, they use the KZG commitment to ensure that the node being sampled isn’t lying. The longest answer—for those of you who, like me, continually and perhaps annoyingly ask “why?”—will require a deeper dive into commitment schemes, which is a piece I am currently working on and will link here when completed.

In practice, our system would ensure redundancy by having several different nodes store each part of the sharded data. It’s also important to note that this is a mental model for one-dimensional Reed-Solomon erasure coding; Ethereum’s PeerDAS design uses a two-dimensional model that is more complicated to explain—I might take a crack at that sometime in the future.

First Method of Subtle Exploitation

An interesting critique of Ethereum’s native DA is that you can deceive a handful of nodes into believing that data which was never actually published is available, and that the block referencing it might get finalized. This may initially seem like a glaring security concern, but we can be certain that such a block will never be finalized. Let’s see why.

Ethereum’s consensus mechanism relies on votes (called attestations) by nodes that the block at the head of the chain is correct. Nodes will only vote for that block if, among other things, they can sample the data and are convinced that it has been made available. Now, what happens if node A samples only node B to ensure that B has made the data available (we’ll come back to this), and B responds only to A’s queries? In this event, node A would be convinced that the data was available. However, node B could then decide not to respond to any other node’s queries, effectively rendering the data unavailable. Since node A believes the data to be available, A will attest to this block at the head of the chain and update its local view of the chain. However, since Ethereum’s finality mechanism requires a supermajority of votes and we assume an honest majority (if we don’t have an honest majority, this becomes the least of our worries), the other nodes to which node B doesn’t respond will refuse to attest to the block, it won’t become finalized, and the chain will be reorged. In fact, not only will the honest nodes refuse to attest to the block, they won’t even consider it a valid block. This means that even if two-thirds of the nodes are deceived or malicious, an attacker still cannot mount a DA-withholding attack, since the honest minority will fork off. Note: if node B responds to enough of the data sampling queries, then the other nodes can collectively recreate the data and it has effectively been made available, so this “attack” only works against a few nodes.
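
As a tiny illustration of why the deceived node’s vote doesn’t matter, here is the supermajority check in miniature—the stake numbers are made up, and real finality accounting is far more involved:

```python
# Ethereum-style finality needs a supermajority (> 2/3) of attesting stake.
def can_finalize(attesting_stake: float, total_stake: float) -> bool:
    return 3 * attesting_stake > 2 * total_stake

total = 100.0
deceived_stake = 1.0                        # only node A was convinced the data is available
print(can_finalize(deceived_stake, total))  # False -- the withheld-data block is orphaned
print(can_finalize(67.0, total))            # True  -- a genuine supermajority finalizes
```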

The other challenge for this specific attack is the peer-to-peer gossip network that underlies Ethereum. Nodes aren’t talking to just one other node but to many, so an attacker would have to surround the node it’s attacking on the network, either blocking its communications with other nodes or intercepting the response packets before they arrive. This is prohibitively hard to pull off but merits mentioning. Also, this issue does not arise with ordinary L1 transactions, since all transactions are executed by the nodes and any block that includes invalid transactions won’t be attested to.

Native vs Non-Native DA

While data blobs are the native Ethereum solution, non-native solutions have been gaining popularity. Celestia and EigenDA are two prominent solutions currently in the spotlight. As stated on Celestia’s website, “rollups and L2s use Celestia as a network for publishing and making transaction data available for anyone to download.” Similarly, EigenDA is a solution built atop the popular Ethereum restaking protocol Eigenlayer, and it launched on mainnet earlier this month.

Let’s look further at how both operate, beginning with EigenDA. Fundamentally, EigenDA is an honest-majority protocol—the EigenDA nodes check that the data is available and sign attestations saying so. An L1 smart contract tallies the signatures, and if a supermajority (greater than 66%) attests that the data has been made available, then it is considered available. The reliance on an honest majority—which opens up its own avenue of attack—isn’t EigenDA’s only challenge: data unavailability is also not uniquely attributable. This simply means that you can’t prove that a DA node is withholding data. While the EigenDA nodes may attest that the data has been made available, nothing prevents a DA node from refusing to answer your queries specifically. In that event, there’s nothing you can submit to a slashing contract to prove that the data hasn’t been made available. EigenDA, like other non-native solutions, pawns this off to governance. Since slashing is handled by a governance layer, the only way for a byzantine DA node to be slashed is by a vote of the governance members, which leaves you at their whim. So, theoretically, if a node refuses to make data available, slashing its stake would be put to a vote, and the governance members can collude or exclude individuals at will. It’s important to note that this is not a problem for Ethereum’s native DA, since each L1 node samples the data itself and will refuse to attest to—or even accept as valid—a block whose data it cannot confirm is available.
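
For intuition, here is a minimal sketch of what an honest-majority quorum check like this boils down to—hypothetical operator names and a simplified two-thirds-of-stake rule, not EigenDA’s actual contract logic:

```python
from typing import Dict, Set

def data_considered_available(stake: Dict[str, int], signers: Set[str]) -> bool:
    """Data counts as available if operators holding more than 2/3 of the
    registered stake have signed the availability attestation."""
    signed = sum(stake[op] for op in signers if op in stake)
    total = sum(stake.values())
    return 3 * signed > 2 * total

stake = {"op1": 40, "op2": 35, "op3": 25}
print(data_considered_available(stake, {"op1", "op2"}))  # True: 75 of 100 stake signed
print(data_considered_available(stake, {"op1"}))         # False: only 40 of 100 signed
```

Note what the tally cannot tell you: even a full quorum of signatures doesn’t prove that any particular operator will answer your query, which is exactly the non-attributability problem described above.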

At first glance, one might wonder why anyone would choose a non-native DA solution. The real benefit of non-native DA solutions is that they are cheaper than L1 DA, precisely because Ethereum’s own nodes aren’t storing and sampling the data. An additional challenge for EigenDA, however, is that its economic security comes from restaked ETH, and in the event of a compromise, Ethereum validators are incredibly unlikely to fork the entire chain to rescue EigenDA (see “Don’t Overload Ethereum’s Consensus” by Vitalik Buterin).

Furthermore, because Eigenlayer and EigenDA rely on restaked ETH, it is easier for a malicious actor to launch a 51% attack on the network. As of early last month, Eigenlayer’s total value locked (TVL) was above 14 billion dollars, which represents less than 5% of the total ETH in circulation. An attacker could lock up enough ETH in the Eigenlayer smart contracts to gain a majority of the stake. While trying to suddenly acquire roughly 15 billion dollars’ worth of ETH would undoubtedly drive prices up, Ether’s market capitalization of around 400 billion dollars would soften the price impact. Compare that to a similar attack on Ethereum itself: even for a well-capitalized malicious actor, any attempt to seize a majority of the stake could drive the price of ETH prohibitively high—many parties would refuse to sell out of principle, and such a large buy would immediately be cause for alarm. And, as we’ve discussed, in the worst-case scenario where the entire network is compromised, the Ethereum Foundation, in conjunction with the 10 biggest client teams, could execute a hard fork—rewrite the software and reset balances—to save the network. EigenDA can’t rely on that nuclear option—known as social slashing—to save its network.
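
Plugging the article’s own round numbers in for scale (these are snapshots quoted above, not live figures):

```python
# Rough scale comparison using the figures cited in the text.
eigenlayer_tvl_usd = 14e9     # restaked value securing EigenDA
eth_market_cap_usd = 400e9    # total ETH market capitalization

print(eigenlayer_tvl_usd / eth_market_cap_usd)  # 0.035 -- matching EigenDA's stake means
                                                # bidding for ~3.5% of all ETH, versus a
                                                # vastly larger share to attack Ethereum itself
```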

If we look at Celestia, we see similar problems arise. Take the example of a rollup using Celestia for DA. The rollup would run a Celestia light client as a smart contract on Ethereum, and if a supermajority of Celestia validators attest that the data has been made available, the rollup considers it available. But what if the EVM wanted to sample Celestia directly? It might seem trivial to have the EVM sample the Celestia nodes, but when we look at possible designs, we see it’s not nearly that simple.

Let’s take the base case—the smart contract uses some form of randomness (let’s assume a random oracle) to produce sample queries that someone can then perform and report back to the smart contract. Here, with the random queries made public, Celestia would know which queries are Ethereum’s and could make sure those particular queries are always answered while withholding the rest of the data—keep in mind that the L1 nodes live on an entirely different network and have no capacity to sample Celestia themselves, since they aren’t running the Celestia software or plugged into the Celestia network. One might think it’s trivial to restrict who can be a sampler, but then we end up with another trust assumption—and arguably a more tenuous one than before.

We could also tell the samplers to sample first, and then have Ethereum use a random oracle to select a subset of samplers at random. Now Celestia is in a position where it doesn’t know which samples are Ethereum’s, since at the time of sampling that hasn’t been decided yet. To understand the challenge, we must look at how sampling would work—a reasonable option is that all samplers submit their public keys to the smart contract, a subset is selected at random, and the smart contract checks that each submitted sample is signed by the corresponding private key. However, public-private key generation is trivially cheap—your phone can churn out key pairs by the millions without breaking a sweat. And here we have our second problem: nothing prevents Celestia, or a single malicious actor, from registering ten million samplers. Even if there are 10,000 honest samplers, with overwhelming probability the selected samplers will be the attacker’s. In this case, the malicious samplers simply don’t answer any queries until the smart contract reveals which samplers it selected, at which point only those queries are answered. Since the smart contract samples only a subset of the data, answering those queries alone is not enough to recreate the data. In fact, this type of attack, called a Sybil attack, is the reason that blockchains employ proof-of-work or proof-of-stake schemes. But if we require the samplers to lock up stake in the smart contract, then we have to assume that a majority of the stake is held by honest samplers, and we’re back to an honest-majority assumption. And even if an attacker doesn’t control the majority of the samplers or the stake, in the window after the smart contract reveals which samplers were chosen, nothing prevents a malicious actor from bribing them.
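
To see how lopsided the selection gets, here is a quick calculation with the numbers above, approximating selection as independent draws:

```python
# With 10,000,000 registered sybil keys and 10,000 honest samplers, the chance
# that any randomly chosen sampler is honest is tiny.
honest, sybil = 10_000, 10_000_000
p_honest = honest / (honest + sybil)
print(p_honest)               # ~0.000999 -- about 0.1% of registered samplers are honest

# Probability that a randomly selected committee of 100 samplers contains
# zero honest members:
print((1 - p_honest) ** 100)  # ~0.905 -- the honest samplers almost never get picked
```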

A reasonable way to get past this bribery issue would be to have the Ethereum smart contract require not only the sample but also the solution to a special type of function called a verifiable delay function (VDF). A VDF is a function that takes a roughly fixed amount of wall-clock time to solve regardless of the compute power used—it is inherently sequential and non-parallelizable, so more hardware doesn’t help solve it faster. By requiring the sampler to feed its sample into a VDF, we could be more confident that no tampering occurred in the interim, since the sampler would be occupied solving the VDF. The tricky thing is that VDFs are relatively new to the space, and computer scientists are still unsure how much they can be accelerated; a well-capitalized malicious actor could conceivably build specialized circuits to speed up the process. In addition, we have to contend with Sybils checking every data point, running the VDF, and then selectively disclosing only the samples requested. This again reverts back to the honest-majority assumption.
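
For intuition only, here is a sketch of the “enforced sequential delay” idea using iterated hashing. This is not a real VDF—genuine VDFs (e.g., the Wesolowski or Pietrzak constructions) also make the output cheap to verify, which iterated hashing does not:

```python
import hashlib

def sequential_delay(seed: bytes, iterations: int) -> bytes:
    """Iterated hashing as a stand-in for an enforced delay: each step depends on
    the previous one, so extra parallel hardware doesn't finish it any sooner.
    NOTE: not an actual VDF -- real VDFs are also efficiently verifiable."""
    out = seed
    for _ in range(iterations):
        out = hashlib.sha256(out).digest()
    return out

# A sampler would submit both its sampled chunk and sequential_delay(chunk, N),
# demonstrating it already held the chunk roughly N hash-times before responding.
```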

It seems that, for now at least, we’re stuck with an honest-majority assumption for non-native DA solutions.

Another Type of Honest Majority Assumption

While we’ve seen that native DA doesn’t rely on an honest-majority assumption, there is one scenario where such an assumption is required. Full data blobs are stored for only about eighteen days—a window deemed long enough for all interested parties to download the data and take appropriate action, like playing the fault-proof game to challenge a malicious sequencer. A node that goes offline and comes back more than eighteen days later, however, has no choice but to trust the majority of Ethereum that the data was made available, since it has no way to sample the data itself. This is a reasonable assumption, but it is important to note nonetheless.

In Sum

There are important tradeoffs and nuances to consider when comparing different DA solutions. Ethereum blobs are more expensive than non-native DA because each L1 node samples the data to ensure it has been made available, and developers will have to weigh the pros and cons conscientiously. Given Ethereum's rollup-centric future, however, it is imperative that we understand how DA works, the various security assumptions, and the possible solutions. Stay tuned for more deep dives and a follow-up on how blockchain solutions should think about these different considerations!

#ethereum #da #dataavailability #vinny