Behind the curtain

What makes light clients work? I've done some handwaving in the past about merkle proofs and cryptography and general mathemagic but I'm going to attempt to at least go one level deeper. Bear with me in this silly example and I hope it will shed some light on how we can trust that light clients should work.

A thought experiment

Let's start with a thought experiment. Imagine you are part of a 10,000 member DAO and the whole DAO has voted to do an NFT gift exchange where each member of the DAO is randomly paired up with another member of the DAO to exchange NFTs with. At the time the vote is finalized, one member of the DAO is assigned to distribute out the assignments. Unfortunately, somebody wants to mess things up and is going around Discord impersonating the assigner and sending random Discord user names to each DAO member, totally messing up the gift exchange . What if there was a simple solution to this solve this and the real assigner could somehow prove to each DAO member that they had been given the correct name? If the DAO had just used a merkle tree, they could have avoided this whole problem!

ELI5 - Merkle Trees!

Before I try and explain a merkle tree, let's define a couple of terms.

A hash function is a deterministic mathematical function that returns a unique value for any input. To take a simple example, lets say we have a hash function that takes any English word of any length, say "supercallifragilisticexpiallodocious" and it produces a string of 5 alphanumeric characters like a1542. And not only that, if you change one letter so "sopercallifragilisticexpiallodocious", you would get something like 2jnv1. In other words, the hash of each word should be unique and you should never get the same hash for two different words.
A tree in computer science terms is a data structure that looks something like below:

where 1 is the root node of the tree, 7 and 9 are its children, and the leaf nodes are 5, 11, and 5. Each layer in the tree has some relationship to the nodes above and below it which is defined by the type of tree it is. If this all feels like gibberish, just bear with me.

Okay, so we have a hash function, and we have this goofy looking thing called a tree. What's a merkle? Despite what you might think, it's not something that young Gerald McGrew might catch if he ran the zoo.

Ralph Merkle invented merkle trees but that's as far into the origin of the name as we can go if we want to ever get back to our gift exchange.

Anyway, a merkle tree is a tree where the relationship between any parent and child nodes in the tree is that the value of the parent node is the hash of the two values of the two child nodes. Do what? Imagine we have the following.
1
/ \
2 3
/ \
4 5
which is a merkle tree where the leaf nodes are 2, 4, and 5. Each of these contains a name, 2 - Jim, 4 - Sally, 5 - Perez. Using the hash function we described above, the value of 3 is the hash of "SallyPerez" and that value is say vm521. Now, the value of 1 is the hash of "Jimvm521" so say k61ps. Thus, what is called the hash root of the tree is k61ps.

Now, what's magical about this is that from this merkle tree is, if you only know the root node, I can prove to you that Perez in in node 5 while only giving you a subset of nodes from this tree. If I give you a list of words - k61ps, (Jim, vm521), (Sally, Perez), that proves that Perez is in node 5. Well, yes, you say, it does, but you gave me the entire tree. What's so magical about that? The magic is what if 2 and 4 are not leaf nodes but have child nodes under them? I don't have to give you any of those. The hashes of their parents are all that is needed to reconstruct the part of the tree that leads to the root node. And, if you only know the root node and I give you the "merkle proof" (this subset of nodes in the tree), you can use the hash function to verify that these nodes lead back to the root that you have (or not if I'm trying to lie to you).

So, back to our example. Let's slightly rewrite history. When the DAO decided to do the gift exchange, they produced a merkle tree with the name of each DAO member in a leaf node of the tree and then distributed the root hash to each DAO member. Then, when the member given the role of assigning each name sends out the assignments to each random pair of DAO members, they send a merkle proof along with the name of each member. Since all the members of the DAO have the hashroot of the tree (which was embedded in the Snapshot vote that kicked off the gift exchange), they can easily verify if the name they have been given is truly a member of the DAO by verifying the merkle proof that the assigner provided with the name assignment. Then, when the baddie tries to mess things up, the DAO members just need to ask the fake assigner for a merkle proof along with the name of the other DAO member they are to exchange gifts with and will quickly see the baddie is lying.

What does this have to do with light clients?

This is a silly example but if you imagine the list of DAO member names as the entire state of the Ethereum chain, light clients are able to depend on merkle proofs to validate that the data that full nodes are sending them is correct based on only having the state root for the Ethereum state from a recent block. And that, is why I am interested in light clients.