Web Proof: Make More Data Verifiable

1. Introduction

I first encountered Web Proof (zkTLS) a few months ago on Crypto Twitter (CT). At the time, only a few people were talking about it, and I dismissed it as just another buzzword without looking into it deeply. A few months later, while participating in the 2024 PSE Core Program, I discovered TLSNotary, a project led by the PSE team, which finally got me to study Web Proof seriously. Among the various projects I encountered during the PSE Core Program, Web Proof, including TLSN, captured my interest the most. With this article, I aim to introduce Web Proof to more people.

Before diving into the content, I want to thank everyone who has worked hard for a long time to educate people about Web Proof through CT and other forms of content. First, I want to thank @HarshaKaramchat, @Mojtek, @dawufi, and @aaronjmars for reviewing this article and giving insightful feedback. The materials from people like @aaronjmars, @madhavanmalolan, @Euler__Lagrange, @delitzer, and @_weidai were also very helpful in understanding this concept, and I hope my writing will similarly help others understand Web Proof.

2. What is Web Proof?

2.1 Definition

Web Proof is evidence I can present to a third party to prove that information was received from a website and has not been tampered with. A simple example is as follows:

I can provide a Web Proof of my year-end Spotify Wrapped page showing that my top artist is The Weeknd, proving to a third party that I'm a big fan of The Weeknd.
I can provide a Web Proof of my bank account page showing my balance, proving to a third party that my bank account holds more than a certain amount.

Previously, this kind of verification was only possible when the website (the entity providing the information) offered a specific API or signature. Web Proof enables anyone to prove information received via HTTPS from any website, even if the website doesn't offer a specific API for this purpose.

2.2 A More Formal Definition

Web Proof is proof of the following statement: “A user received a response R from website W for request Q, and the response R contains the string S.”

For instance, if I want to create a Web Proof that I am the owner of the X account @samoyedali, this would mean creating a proof of the statement: “The user received the 'setting.json' response from the website x.com, and the response contains the string 'screen_name: samoyedali'.”

Another example could be a Web Proof that I follow A$AP Rocky on Spotify, which would mean proving the statement: “The user received the 'following?market=from_token' response from the website open.spotify.com, and the response contains the string 'name: A$AP Rocky'.”
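The formal statement can be written down as a small data structure plus a check. This is a minimal sketch with hypothetical names; it assumes the response has already been authenticated, which is the hard part that the mechanisms in Section 5 address. The response string below is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebProofStatement:
    website: str    # W, e.g. "x.com"
    request: str    # Q, the request whose response is being proven
    substring: str  # S, the string the response must contain


def statement_holds(stmt, website, request, response):
    """Check the claim against a response that has ALREADY been authenticated.

    Proving provenance and authenticity of `response` is out of scope here.
    """
    return (
        website == stmt.website
        and request == stmt.request
        and stmt.substring in response
    )


stmt = WebProofStatement("x.com", "setting.json", "screen_name: samoyedali")
response = "{ screen_name: samoyedali, lang: en }"  # hypothetical response body
print(statement_holds(stmt, "x.com", "setting.json", response))  # True
```

The check itself is trivial; what makes a Web Proof valuable is convincing a third party that `response` really came from `website`.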

2.3 One Liner

The most impressive description of Web Proof I've come across comes from @aaronjmars, who called it “API on demand for anything on earth”. In other words, Web Proof allows you to use any website, even if it doesn’t offer an API, as if it does.

3. Why is Web Proof Important?

3.1 Making More Data Verifiable

Fundamentally, Web Proof is meaningful because it makes more data verifiable. As @Euler__Lagrange mentioned in this video, currently, only a very small fraction of data is verifiable—things like onchain data, JWT, email, or verifiable credentials. With Web Proof, we can make much more data verifiable without altering the existing system.

Why is making more data verifiable important? Just as assets have value when uncertainty is reduced, data also gains more economic value when its authenticity can be verified. For example, when training machine learning models, data whose origin can be verified is far more valuable than data of unknown origin. Beyond economic value, leveraging blockchain technology could also enable interoperability based on verifiable data.

3.2 A Solution to the Cold Start Problem

From a business perspective, Web Proof is important because it offers a new solution to the common Cold Start problem faced by platform businesses. The Cold Start problem refers to the challenge new platform businesses face when they cannot generate sufficient network effects due to a lack of initial users, thereby failing to gain competitiveness. Traditional solutions to this problem involve providing economic incentives or leveraging other platforms to acquire initial users. For instance, PayPal used a $10 sign-up bonus strategy, and Zynga leveraged Facebook’s user base to grow.

With Web Proof, platforms entering the market can easily leverage existing platforms' data without their permission. For example, the decentralized food delivery service Nosh enables drivers and restaurants to transfer their data from Doordash via a 'sign with doordash' button using Web Proof. Similarly, the onchain ride-sharing service Teleport allows drivers to easily bring over their Uber ratings via Web Proof.

(Video: “Vampire Attacking Web2 Marketplaces with zkTLS”, DePIN Summit 2024)

With Web Proof allowing easy transfer of data from existing platforms, the network effects, which were previously considered a significant moat for established platform businesses, may become less critical. For instance, without Web Proof, services like Nosh or Teleport might struggle to attract participants, even if they offered better costs and services, due to users' reluctance to leave behind their reputation and data accumulated on platforms like Doordash or Uber. Thanks to Web Proof, that barrier is now removed.

3.3 10 Use Cases for Web Proof

The potential applications of Web Proof are vast. Below, I've selected 10 use cases that either already exist or resonate most with me:

Crypto On-Ramp: Submit a Web Proof of an offchain transaction on a service like Venmo or Revolut and receive funds from an onchain escrow. A related project is zkp2p.
Expanding Airdrop Criteria: Web Proof can extend airdrop criteria beyond onchain activities to include offchain actions. As @delitzer mentioned in his post, Pleasr airdropped 10 $ALBUM tokens to individuals who held $GME stock, although Web Proof wasn’t used in that process. Using Web Proof would make such tasks easier and allow them to be applied to a wider range of services without needing permission from those services.
Onchain Incentives for Offchain Actions: Onchain escrows can automate reward payments based on the submission of Web Proofs for specific offchain actions. For instance, submitting a Web Proof of a pull request on a GitHub repository could automatically release USDC from an onchain escrow, incentivizing more contributions from external developers.
Platform and Marketplace Bootstrapping: As mentioned earlier with Nosh or Teleport, new platforms can use Web Proof to bring over seller data, reputation, and curation from existing platforms without their permission.
Oracle Role in Prediction Markets: Web Proof can serve as an oracle for resolving outcomes in prediction markets like Polymarket. For example, for a prediction about the U.S. presidential election, a Web Proof of a New York Times article declaring the winner could be submitted, providing a verifiable oracle without requiring consensus among participants. One existing example is @0xsmallbrain's TMR.NEWS, an Intelligent Prediction Market where users predict the next day's New York Times headline. The system uses OpenAI to measure how much the user's input sentence differs contextually from the actual headline and pays out accordingly. The project uses Reclaim Protocol’s zkFetch as the oracle. For more detailed information about TMR.NEWS, refer to the following article.
Open & Private Community Building: Current community spaces are either too public, like X, or too private, like group chats. In-between solutions such as subreddits or token-gated Discord servers exist, but they require ongoing resources to manage, are complex, or are limited. Tonk's Speakeasy aims to create an "open but private" community using Web Proof. This allows anyone to join a private group chat without an invitation as long as they meet the required qualifications. For example, people who have watched Studio Ghibli films can join a group chat without needing an invitation. For more information on how Speakeasy came about, check out this article.
Verifying Data for AI Model Training: Web Proof can verify the source and integrity of data used for training AI models, helping ensure that high-quality, trusted data is used, ultimately contributing to better model performance.
Proof of Humanity (PoH): A consistent history of Amazon purchases or Doordash orders could serve as a simple Proof of Humanity (PoH). Equal uses this approach, verifying user addresses through order history from Swiggy, an Indian food delivery app.
Automating Manual Verification Processes: Tasks that previously required manual verification can be automated with Web Proof. Daisy, for example, uses Web Proof to automate the verification of influencer engagement, replacing the need for screenshots and manual checks.
Pay with Your Data/Dynamic Pricing: Imagine a secure and verifiable way to share your data with service providers, unlocking new possibilities in commerce. As proposed in @HarshaKaramchat's tweets, instead of paying with money or viewing ads, you could exchange your data for services. In this model, service providers can offer dynamic pricing tailored to each user based on their data. Users get better deals, while companies can target their products to the most relevant customers, creating a win-win scenario.

4. A Closer Look

Now that we've covered what Web Proof is, why it's important, and what it can do, let's dive deeper into its technical aspects.

4.1 HTTPS & TLS

Why can’t we prove the information we receive from a website to a third party using the existing system? To answer that, we need to first understand HTTPS and TLS.

From what I've learned, HTTPS is a protocol that encrypts communication between websites and users. It adds a security protocol called TLS on top of traditional, unencrypted HTTP, and TLS is what ensures that the information exchanged between the user and the website is transmitted securely without exposure.

The key point to understand about the TLS protocol is that the website and the user use the same key (the session key) to encrypt and decrypt information. TLS combines asymmetric and symmetric key algorithms: asymmetric algorithms are first used during the handshake to securely establish the session key, after which the actual data is encrypted with that session key using a symmetric algorithm.
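A toy sketch of the handshake idea: each side generates a key pair, exchanges only public values, and both arrive at the same symmetric session key. The sketch uses textbook finite-field Diffie-Hellman with the prime 2^127 - 1 and base 3, chosen purely for brevity and not secure. Real TLS 1.3 uses ECDHE (e.g. X25519) and derives keys with HKDF; the single SHA-256 step below is a simplified stand-in.

```python
import hashlib
import secrets

# Toy group parameters (NOT secure, illustration only).
P = 2**127 - 1  # a Mersenne prime
G = 3           # base for exponentiation


def keypair():
    """Generate a (private, public) Diffie-Hellman key pair."""
    private = secrets.randbelow(P - 2) + 1  # 1 <= private <= P - 2
    public = pow(G, private, P)
    return private, public


def derive_session_key(my_private, their_public):
    """Compute the shared secret and hash it into a 32-byte symmetric key."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# Each side sends only its public value over the network.
client_private, client_public = keypair()
server_private, server_public = keypair()

client_key = derive_session_key(client_private, server_public)
server_key = derive_session_key(server_private, client_public)
print(client_key == server_key)  # True: both ends now hold the same session key
```

The symmetry that makes this efficient is also the root of the problem discussed next: after the handshake, both parties hold identical key material.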

4.2 The Problem with TLS Session Keys

TLS combines asymmetric and symmetric key algorithms to balance security and efficiency, and it secures data in transit between the website and the user without any problem. Proving that data to a third party, however, is a challenge.

Since the website and the user both use the same session key to encrypt/decrypt data, a third party has no way to distinguish whether the data provided by the user was truly received from the website or tampered with by the user. For example, if I want to prove to a third party that I am the owner of the X account @elonmusk, the third party cannot differentiate between the following two scenarios:

I am truly Elon Musk, and I forwarded the HTTPS response I received from x.com to the third party.
I am not Elon Musk, but I decrypted the HTTPS response I received from x.com, altered the account information ('screen_name: samoyedali' → 'screen_name: elonmusk'), and then forwarded it to the third party.

Therefore, under the current system, it is impossible to prove the information exchanged between a user and a website to a third party. After all, TLS was designed to ensure secure data transmission between users and websites, so this limitation is somewhat expected.
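To make the problem concrete, here is a minimal sketch using HMAC-SHA256 as a stand-in for TLS record authentication. (TLS 1.3 protects records with AEAD ciphers such as AES-GCM or ChaCha20-Poly1305, whose authentication tags are likewise computed from keys both sides share.) The key and messages are illustrative. Whoever holds the session key can produce a valid tag for any message, so a valid tag tells a third party nothing about who actually wrote it.

```python
import hashlib
import hmac

session_key = b"\x01" * 32  # illustrative; in TLS both user and website know this key


def tag(key, message):
    """Authentication tag computed with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def tag_is_valid(key, message, t):
    return hmac.compare_digest(tag(key, message), t)


# Scenario 1: the website authenticates the genuine response.
genuine = b'{"screen_name": "samoyedali"}'
genuine_tag = tag(session_key, genuine)

# Scenario 2: the user, who also holds session_key, edits the response and re-tags it.
forged = b'{"screen_name": "elonmusk"}'
forged_tag = tag(session_key, forged)

# Both pass exactly the same check, so the two scenarios are indistinguishable.
print(tag_is_valid(session_key, genuine, genuine_tag))  # True
print(tag_is_valid(session_key, forged, forged_tag))    # True
```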

4.3 What We Want

Web Proof is attractive because it allows us to prove the data we exchange with websites to a third party while maintaining the existing HTTPS system. So, what are the attributes we want from Web Proof? These can be summarized into two "must-have" properties and one "nice-to-have" property:

Provenance: The identity of the website must be verifiable. For example, it should be possible to verify that the data truly came from github.com. Provenance can be verified by examining the certificate included in the TLS transcript, which can be checked using the public key of a trusted Certificate Authority (CA).
Authenticity: Web Proof must show that the data itself has not been tampered with by the user. For example, it should be possible to verify that the data sent by github.com has not been modified since it was received.
Selective Disclosure: Users should be able to disclose only the necessary information to a third party. For example, a user may want to prove that they own a GitHub account but not disclose other information. Selective disclosure is a nice-to-have property, but it is not strictly necessary.

Of course, I'm not saying that selective disclosure isn't an important feature for Web Proof. In fact, it greatly enhances the usability of the system. For example, with selective disclosure, I can share only relevant information—like my account balance—instead of revealing all the details from my bank, which helps protect my privacy. My point is that while selective disclosure is a desirable feature, it's not a necessary requirement for Web Proof.
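One common way to get selective disclosure is to commit to each field separately and later reveal only the field you need. The sketch below assumes the attested object is a set of per-field salted hashes, the approach used by formats such as SD-JWT; actual Web Proof systems like TLSN use their own commitment schemes. All field names and values are hypothetical.

```python
import hashlib
import json
import secrets


def field_digest(salt, key, value):
    # json.dumps gives an unambiguous encoding of (salt, key, value).
    return hashlib.sha256(json.dumps([salt, key, value]).encode()).hexdigest()


def commit_fields(fields):
    """Commit to every field separately; the digests are what gets attested."""
    salts = {k: secrets.token_hex(16) for k in fields}
    digests = {k: field_digest(salts[k], k, v) for k, v in fields.items()}
    return digests, salts


def disclose(fields, salts, key):
    """Reveal a single field together with its salt."""
    return {"key": key, "value": fields[key], "salt": salts[key]}


def verify_disclosure(digests, disclosure):
    """Check a revealed field against the attested digests."""
    expected = field_digest(disclosure["salt"], disclosure["key"], disclosure["value"])
    return digests.get(disclosure["key"]) == expected


# Hypothetical fields parsed from a bank page.
fields = {"account_holder": "samoyedali", "balance": "15230.55", "address": "123 Example St"}
digests, salts = commit_fields(fields)

proof = disclose(fields, salts, "balance")
print(verify_disclosure(digests, proof))  # True; holder and address stay hidden
tampered = dict(proof, value="999999.00")
print(verify_disclosure(digests, tampered))  # False; the altered value no longer matches
```

The random salt matters: without it, a verifier could guess low-entropy hidden values (such as a balance) by hashing candidates.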

5. Three Types of Web Proof

Web Proof can be categorized into three main types based on the implementation.

5.1 MPC-Based Web Proof

5.1.1 Mechanism

MPC-based Web Proof uses Multi-Party Computation (MPC) and a commitment scheme to prevent users from arbitrarily modifying the responses they receive from websites. In this type of Web Proof, a new actor called a Notary is introduced, and their role is as follows:

Engaging in an MPC Protocol with the User: Through the MPC protocol, the Notary and the user each hold only a share of the session key required for the TLS protocol. As a result, the user cannot encrypt or decrypt the TLS transcript alone and must do so through the MPC protocol with the Notary.
Blind Signature: At the end of the session, the user creates a commitment of the data they received, and the Notary signs it. Since the Notary does not see the actual message, this process is called a Blind Signature. This Blind Signature ensures that the user cannot alter the data they received via the HTTPS response.
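The commitment step in the list above can be sketched as follows. Assumptions: a salted SHA-256 hash serves as the commitment, and textbook RSA with tiny primes stands in for the Notary's real signature scheme purely so the example runs with the standard library; it offers no security. In a real MPC-based system the commitment is tied to the TLS transcript produced during the MPC session, which this sketch does not model.

```python
import hashlib
import secrets

# Toy textbook-RSA key for the Notary: tiny primes, INSECURE, illustration only.
_P, _Q = 61, 53
N = _P * _Q                          # public modulus (3233)
E = 17                               # public exponent
D = pow(E, -1, (_P - 1) * (_Q - 1))  # private exponent (Python 3.8+)


def commit(data, salt):
    """Hiding commitment to the received data: H(salt || data)."""
    return hashlib.sha256(salt + data).digest()


def notary_sign(commitment):
    """The Notary signs the commitment without ever seeing the data."""
    return pow(int.from_bytes(commitment, "big") % N, D, N)


def verify(data, salt, commitment, signature):
    """Third party: the opening must match, and the Notary's signature must be valid."""
    opens = commit(data, salt) == commitment
    signed = pow(signature, E, N) == int.from_bytes(commitment, "big") % N
    return opens and signed


response = b'{"login": "samoyedali"}'  # hypothetical HTTPS response body
salt = secrets.token_bytes(16)
commitment = commit(response, salt)    # user -> Notary: only this digest
signature = notary_sign(commitment)    # Notary -> user

print(verify(response, salt, commitment, signature))                  # True
print(verify(b'{"login": "elonmusk"}', salt, commitment, signature))  # False
```

Because the Notary signed the commitment before the user could change anything, swapping in different data breaks the opening check.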

Through the combination of the MPC protocol and the Blind Signature, users cannot arbitrarily modify the HTTPS responses they receive, and third parties can trust the Provenance and Authenticity of the data the user provides. Because this computation scales poorly beyond two parties, in most cases it is implemented as 2PC-based (two-party computation) Web Proof.
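A minimal sketch of the key-splitting idea behind the 2PC setting, using 2-of-2 XOR secret sharing. It only illustrates why neither party alone can encrypt or decrypt. In an actual 2PC TLS protocol the shares come out of a jointly computed handshake, the full key is never reconstructed, and encryption and decryption run inside the 2PC protocol; the `combine` helper exists here only to show that the shares are correct.

```python
import secrets


def split_key(key):
    """2-of-2 XOR secret sharing: each share on its own is uniformly random."""
    user_share = secrets.token_bytes(len(key))
    notary_share = bytes(k ^ u for k, u in zip(key, user_share))
    return user_share, notary_share


def combine(user_share, notary_share):
    """Only both shares together recover the key (shown to check correctness)."""
    return bytes(u ^ n for u, n in zip(user_share, notary_share))


session_key = secrets.token_bytes(32)
user_share, notary_share = split_key(session_key)
print(combine(user_share, notary_share) == session_key)  # True
```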

5.1.2 Trust Assumptions

The trust assumption for MPC-based Web Proof is that the Notary should not collude with the user. If the Notary colludes with the user and signs a false commitment, it could deceive the third party.

5.1.3 Pros and Cons

One advantage of MPC-based Web Proof is that the website cannot distinguish whether the user is using MPC-based Web Proof or just the regular TLS protocol. A disadvantage is the network overhead and latency due to the computational complexity of MPC, which impacts efficiency.

5.1.4 TLSNotary (TLSN)

TLSN is a project by Privacy & Scaling Explorations (PSE) and the most representative MPC-based Web Proof project. The mechanism explained above closely matches how TLSNotary actually works.

Using the TLSN Browser Extension, you can easily generate a Web Proof through TLSNotary directly in your web browser. Besides the Notary server, TLSNotary needs a raw TCP connection to the website, which browsers cannot open directly, so a WebSocket proxy server is needed to bridge the extension's WebSocket connection to TCP. Once everything is set up, you can generate a Web Proof conveniently using the plugin. Below is an example of using the plugin to verify a Twitter profile.

Afterward, you can use the Verify feature of the extension to check whether the Web Proof for the HTTPS request and response was created correctly as intended.

Other examples of TLSNotary in use include zkCredit and @aaronjmars's Instagram plugin.

5.1.5 Opacity

Opacity addresses the risk of collusion between the Notary and the user by employing the following mechanisms:

Proof of Committee: Opacity initially selects a random Notary from a network of multiple Notaries to engage in the MPC protocol with the user. After this initial selection, the system runs a 2-party MPC scheme reinforced by the mechanisms below.
Verifiable log of Attempts: A verifiable log of attempts allows us to track and record instances where a user may be attempting to find a Notary to collude with.
X Account Binding: To prevent malicious operators from increasing the probability of collusion by running multiple Notaries, each Notary is bound to an X account, ensuring Sybil resistance. Although I use an X account as the example, any static identity information, such as an account creation date or a handle on another platform, could serve the same purpose.
AVS & Economic Slashing: By leveraging EigenLayer’s AVS, Opacity enhances economic security and reduces incentives for malicious behavior through economic slashing.
Whistleblowing Process: Users can submit proof of malicious behavior by Notaries, adding an additional layer of security.
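The interplay between random Notary selection and a verifiable log of attempts can be illustrated with a hypothetical sketch. It assumes a public, unbiasable seed (in practice this might come from a VRF or a randomness beacon) and derives the Notary from that seed, the user ID, and the attempt number, so the user cannot pick the Notary, and every retry made to fish for a colluding Notary leaves a record. This is not Opacity's actual protocol; all names are illustrative.

```python
import hashlib

# Hypothetical Notary network.
NOTARIES = ["notary-a", "notary-b", "notary-c", "notary-d", "notary-e"]


class AttemptLog:
    """Append-only record of Notary selections (stand-in for a verifiable log)."""

    def __init__(self):
        self.entries = []  # (user_id, attempt_number, notary)

    def attempts(self, user_id):
        return sum(1 for uid, _, _ in self.entries if uid == user_id)

    def select_notary(self, user_id, public_seed):
        """Derive the Notary deterministically; the user has no say in the choice."""
        attempt = self.attempts(user_id)
        digest = hashlib.sha256(f"{public_seed}|{user_id}|{attempt}".encode()).digest()
        notary = NOTARIES[int.from_bytes(digest, "big") % len(NOTARIES)]
        self.entries.append((user_id, attempt, notary))
        return notary


log = AttemptLog()
first = log.select_notary("alice", "beacon-round-42")
second = log.select_notary("alice", "beacon-round-42")
print(log.attempts("alice"))  # 2: the retry is on the record
```

A verifier who sees many attempts from the same user for a single proof has good reason to be suspicious, which is the deterrent a log of attempts provides.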

Although not much information has been disclosed yet, projects such as Nosh, Daisy, and Teleport are known to use Opacity.

Besides TLSN and Opacity, projects like Pluto, vLayer, OpenLayer, and PADO Labs are working on approaches closely related to this scheme.

5.2 Proxy-Based Web Proof

5.2.1 Mechanism

In Proxy-based Web Proof, a new actor called a Proxy is introduced instead of a Notary. The Proxy mediates the encrypted data exchanged between the user and the website. The user’s HTTPS request is sent to the website via the Proxy, and the website’s response is delivered to the user through the Proxy. The Proxy performs the following roles:

Provenance Attestation: The Proxy provides attestation that the request indeed came from the user and that the response indeed came from the website.
Encrypted Data Logging: The Proxy logs the encrypted data exchanged, which, along with a Zero-Knowledge Proof (ZKP) provided by the user, helps prove the authenticity of the data later.
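A simplified sketch of this flow, under loudly stated assumptions: a SHA-256 counter-mode keystream stands in for the real TLS cipher (AES-GCM or ChaCha20), and the user simply reveals the session key to the verifier in place of a ZKP. Revealing the key is exactly what the ZKP avoids in practice, since the key would also expose the user's request, including any credentials. Real protocols must additionally tie the key to the handshake the Proxy observed. Class and function names are hypothetical.

```python
import hashlib
import secrets


def keystream(key, nonce, length):
    """Toy SHA-256 counter-mode keystream (stand-in for AES-GCM/ChaCha20)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(data, stream):
    return bytes(d ^ s for d, s in zip(data, stream))


class Proxy:
    """Relays encrypted records and logs them; it cannot read them."""

    def __init__(self):
        self.log = []

    def relay(self, nonce, ciphertext):
        self.log.append((nonce, ciphertext))
        return ciphertext


def verify_claim(log_entry, revealed_key, expected):
    """Decrypt the ciphertext the Proxy logged and check the claimed substring."""
    nonce, ciphertext = log_entry
    plaintext = xor(ciphertext, keystream(revealed_key, nonce, len(ciphertext)))
    return expected in plaintext


# Website -> Proxy -> user: the response travels encrypted under the session key.
session_key, nonce = secrets.token_bytes(32), secrets.token_bytes(12)
response = b'{"username": "samoyedali"}'  # hypothetical response body
proxy = Proxy()
proxy.relay(nonce, xor(response, keystream(session_key, nonce, len(response))))

print(verify_claim(proxy.log[0], session_key, b'"username": "samoyedali"'))  # True
print(verify_claim(proxy.log[0], session_key, b'"username": "elonmusk"'))    # False
```

The verifier checks the user's claim against the ciphertext the Proxy saw, not against anything the user hands over, which is what stops the user from editing the response after the fact.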

5.2.2 Trust Assumptions

Like MPC-based Web Proof, Proxy-based Web Proof relies on the assumption that the Proxy is not colluding with the user. Additionally, it assumes that the Proxy is directly connected to the website and that malicious users do not bypass the Proxy.

5.2.3 Pros and Cons

The primary advantage of Proxy-based Web Proof is that it avoids the computational overhead of MPC, resulting in lower network latency. However, because the website sees the Proxy’s IP address rather than the user’s, large-scale usage could get the Proxy’s IP addresses banned. This can be mitigated by sourcing residential IPs, but that often leads to a poor user experience due to unreliability, and it reintroduces the risk of collusion within the protocol.

5.2.4 Reclaim Protocol

Reclaim Protocol is the most representative project using Proxy-based Web Proof. Its core mechanism is very similar to what has been explained previously. However, like Opacity in the MPC-based Web Proof camp, Reclaim Protocol also uses a network of multiple Proxies and an economic incentive/slashing mechanism to reduce the risk of collusion between the Proxy and the user and to prevent malicious actions.

Through Reclaim Protocol's Demo, you can experience creating Web Proof firsthand. On the demo site, you first select the type of proof you want to generate, scan a QR code, and are redirected to the mobile app. After logging in to the respective site in the app and navigating to the relevant page, Web Proof is automatically generated. In my case, I logged into the Kaggle website via the app to create a Web Proof for my Kaggle Username, and I was able to generate the following Web Proof:

Recently, Reclaim Protocol has open-sourced its code and released Devtool v3 to make it easier for developers to build applications using Reclaim. If you're interested, I recommend checking it out directly.

Interestingly, projects like Zap are adopting a hybrid approach that combines MPC-based and proxy-based mechanisms to leverage the advantages of both.

5.3 TEE-Based Web Proof

5.3.1 Mechanism

TEE-based Web Proof uses a Trusted Execution Environment (TEE) to guarantee the authenticity of the HTTPS requests and responses between the user and the website. The user provides encrypted login credentials to the TEE, which logs in and stores the received response. The TEE then generates a signature that proves the Provenance and Authenticity of the data exchanged between the user and the website.

5.3.2 Trust Assumptions

TEE-based Web Proof depends on both the security of the hardware and the integrity of the signature process. Additionally, it requires a trust assumption that companies like Intel (or other TEE providers) won’t be incentivized to limit these activities in the future—a concept referred to as the "hard Intel trust assumption" by @Euler__Lagrange.

5.3.3 Pros and Cons

The main advantage of TEE-based Web Proof is that it does not require additional actors such as a Notary or Proxy. Its main disadvantage is the trust assumptions described above.

5.3.4 Clique

Clique is the most prominent project utilizing TEE-based Web Proof. Clique operates on a TEE node network, supporting the execution of custom bytecode in virtual machines like EVM and WASM. It primarily uses Intel SGX and can be applied to Web Proof as well as various onchain data authentication and incentive distribution use cases.

Sibyl is Clique's privacy-preserving API solution, which uses TEE to provide Provenance and Authenticity for data accessed through arbitrary API calls. This API call can generate Zero-Knowledge Proofs (ZKP), and the TEE can create an ECDSA signature verifying that it executed the computation as expected. This signature can then be verified onchain through Intel's Root CA.

6. Q&A

6.1 Why Do I Use the Term "Web Proof" Instead of "zkTLS"?

You may have noticed that I have been using the term "Web Proof" instead of "zkTLS," a term commonly used by many others. The reason I choose to use "Web Proof" is that ZK (Zero-Knowledge) is not the core of this concept.

In Web Proof, Provenance and Authenticity are essential attributes, while Selective Disclosure is a nice-to-have feature. The core mechanisms for ensuring Provenance and Authenticity rely on MPC, Proxy, and TEE—not Zero-Knowledge technology. Therefore, I believe "Web Proof" better captures the essence of this technology.

6.2 Why Can’t Websites Just Sign Their Responses?

If we lived in a world where every website signed their responses, Web Proof wouldn’t be necessary. The fundamental reason Web Proof exists is that in HTTPS communication, the user and the website use a single session key for encryption and decryption, rather than digital signatures.

Signed Exchanges (SXG), for example, is a mechanism in which websites digitally sign their content. Google originally used it to pre-cache content and speed up page loading, but it can also provide authenticity for data served by websites. Websites hosted via Cloudflare can easily enable SXG. For more details, refer to this post by @viv_boop. A standard proposed by the Amazon team for digitally signing and verifying parts of an HTTPS message is also worth checking out.

In conclusion, while some websites may adopt signing, large internet platforms lack incentives to implement it. Therefore, Web Proof remains the most practical solution for making more data verifiable.

6.3 Are There Any Legal Issues?

Some people may worry that Web Proof could violate the Computer Fraud and Abuse Act (CFAA), as many websites’ terms of service prohibit the use of automated tools to collect data. However, a post by @dbarabander argues that Web Proof does not violate the CFAA as long as the Web Proof browser extension does not directly control the user’s account.

7. Conclusion

In this article, I introduced Web Proof and explained why it is important and what can be done with it. While Web Proof still faces challenges such as developer/user experience, latency, and proving dynamic data, I believe these are relatively minor issues compared to the hurdles faced by so-called onchain native applications. Given the variety of applications already using Web Proof, I believe it is no longer a technology of the future but one that is already practical today. I look forward to seeing more applications emerge based on Web Proof and will introduce interesting projects as they come out.