A 10-Week Journey into vFHE

I’ve been working on a deep dive into vFHE (verifiable Fully Homomorphic Encryption) for the last 10 weeks. This post reflects on the project and briefly explains it. The whole paper can be found at this link.

1. Half Success, Half Failure

I can't predict what grade the professor will give me, but I would evaluate my project as being worth 50 points out of 100.

Strengths

In my view, the strength of my research lies in being the most comprehensive and up-to-date guide on vFHE (hence the title, 'The Hitchhiker’s Guide to the vFHE'). Surprisingly, vFHE turned out to be a much more nascent technology than I expected. In my search for materials to study vFHE, I found only one paper that was beginner-friendly and covered the significance and concepts of vFHE. The paper was excellent, but had the following shortcomings:

vFHE combines ZKP and FHE, but the paper did not sufficiently explain these two technologies. Therefore, it might be difficult for absolute beginners who are not already familiar with ZKP and FHE.
The paper was published in January 2023, so it is not outdated. However, since vFHE is in its early stages and has been actively researched recently, the paper does not mention the latest vFHE research published since it was written.

Therefore, in this project, I aimed to address these two points by 1) ensuring that the reader could understand the overall flow of ZKP, FHE, and vFHE from this single work, and 2) including the most notable recent vFHE research as of June 2024 to differentiate my work from previous studies.

Weaknesses

To be honest, the result of my work is just a collage of existing research and materials. Although this has its own value, I failed to create something new or to take even a half-step beyond existing studies, as I had originally aimed to. Also, the topic I chose was too broad and ambitious. Not knowing enough about ZKP or FHE, I spent much of the 10 weeks just understanding these technologies, which I believe contributed to my failure to go beyond existing research.

Now, I will briefly summarize my research, dividing it into the problem and the solution.

2. Problem

2.1 Let's not Repeat the Mistakes of the Past

There are several ways to use machine learning, but I think there are two main forms: 1) training your own model and using it, and 2) using MLaaS (Machine Learning as a Service). For example, asking ChatGPT for information means using a model trained by OpenAI as MLaaS.

The problem that led me to start this project rests on the following assumptions:

The main differentiators in ML models are data and computational resources.
Most companies and individuals, except for a few large corporations, will use machine learning through MLaaS.
In the near future, ML services will become more deeply integrated into our daily lives.

If the main differentiators in ML models are data and computational resources, models created by a few large corporations that can make significant investments will dominate the market, and other companies and individuals will use MLaaS provided by these corporations. As ML services become more integrated into people's lives, our lives will become dependent on the services provided by these corporations. For example, everything from content recommendations to disease diagnosis might be decided by services provided by a few corporations' ML models.

Personally, this situation reminds me of how social media platforms have influenced our lives. It's not that social media platforms are inherently bad, but we have already experienced through the Facebook-Cambridge Analytica scandal what can happen when our lives become dependent on services provided by a few entities.

Therefore, I believe that we should proactively consider the safety of users and model providers before ML services become more integrated into our lives, learning from our experiences with social media platforms. I believe that the direction of ensuring user safety and the direction of creating better ML models are not mutually exclusive but can actually be synergistic. Thus, ensuring user safety is an urgent task that must be addressed in our time, as important as creating better ML models.

2.2 What is a Safe Machine Learning Service from the User’s Perspective?

What does 'safety' mean for users in the MLaaS scenario we are currently considering? To understand this, let’s briefly review the entire MLaaS process:

Model provider (MP) deploys the model: MP provides a pre-trained model as a service, allowing users to use it immediately without additional training or to further customize the model with their own data.
User provides data: The user uploads their data through the platform, which preprocesses it to make it suitable for model training or prediction.
Computation and result delivery: MP uses the user's data to train the model or perform prediction computations, then returns the results to the user, usually through an API.

In this scenario, I believe users would want the following two things to feel that they are using a ‘safe’ machine learning service:

Data privacy: Users would likely want to protect their data privacy from the MP as much as possible. Especially for more sensitive MLaaS, users would want to disclose as little data as possible to prevent potential data breaches without compromising service performance.
Verifying the correctness of results: Here, 'correct' means that the MP used 1) the data the user submitted and 2) the promised model to generate the results. If the MP used different data, or a cheaper or malicious model instead of the promised one, the results would be 'incorrect,' and users would want to be able to detect this. Therefore, users would want to verify the correctness of the results.
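
To make the notion of 'correct' concrete, here is a minimal Python sketch of the statement a user would want checked. Everything in it is an assumption for illustration: the 'model' is a hypothetical integer dot product, the commitment is a plain SHA-256 hash, and the check works by naive re-execution with the model fully revealed. In vFHE the MP would instead attach a proof of this same relation, so the user would not need to see the model or re-run the computation.

```python
import hashlib
import json


def commit(obj):
    """Plain hash commitment (no hiding randomness; illustration only)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def run_model(weights, features):
    """Hypothetical 'promised model': an integer dot product."""
    return sum(w * x for w, x in zip(weights, features))


def is_correct(model_commitment, weights, features, claimed_result):
    """True only if the committed model, applied to the submitted data,
    yields the claimed result."""
    return (commit(weights) == model_commitment
            and run_model(weights, features) == claimed_result)


promised = [3, 2]              # model the MP promised to use
cheap = [1, 1]                 # cheaper model the MP might swap in
data = [20, 7]                 # data the user submitted
published = commit(promised)   # commitment the MP published in advance

print(is_correct(published, promised, data, 74))                     # honest run
print(is_correct(published, cheap, data, run_model(cheap, data)))    # wrong model
print(is_correct(published, promised, data, 75))                     # wrong result
```

The first check passes and the other two fail, which corresponds to the two conditions above: the promised model and the submitted data must both be what produced the result.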

vFHE addresses both needs: it keeps user data encrypted during computation and lets users verify that the computation was performed correctly, which could allow AI services to expand into more sensitive industries. Sometimes I think about how AI could help with our mental health. There are already services like Wysa and Youper, but as a user, I would be too afraid of personal data leaks to use them. Given the highly personal nature of mental health, even if the service is claimed to be secure, I would feel uneasy. In such cases, if vFHE could cryptographically ensure user safety, it would bring more peace of mind. I plan to discuss mental health and AI in more detail later.

3. Solution

3.1 Why vFHE?

vFHE is the most suitable solution to the problem because it combines the technologies of FHE and ZKP.

Several technologies protect data during the training, distribution, and inference of machine learning models; collectively, they are known as Privacy Preserving Machine Learning (PPML). Each PPML technique has its own strengths and weaknesses, and the most suitable one depends on the scenario.

For a comparison of various PPML techniques, I recommend this post by Bagel.

Bagel, With Great Data, Comes Great Responsibility

In short, among these techniques FHE is the strongest at protecting data privacy, and ZKP is the strongest at verifying the correctness of specific computations. This aligns with the two attributes that users want, making vFHE, which combines these two technologies, the most suitable solution to the problem we are considering. vFHE enhances traditional FHE by making computations verifiable through ZKP.

FHE (Fully Homomorphic Encryption) allows arbitrary computations on encrypted data without decrypting it. ZKP (Zero-Knowledge Proof) allows a prover to convince a verifier that a statement is true, for example that a computation was carried out correctly, without revealing anything beyond the truth of that statement.
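
To give a feel for what 'computing on encrypted data' means, here is a toy Python sketch. Note the swap: it uses the Paillier scheme, which is only additively homomorphic and therefore not FHE; real FHE schemes also support multiplying ciphertexts together. The primes are tiny and insecure, and the function names and the weights 3 and 2 are assumptions chosen for illustration. The server-side step computes a linear score from encrypted features without ever seeing them.

```python
import math
import secrets

# Toy primes, far too small for real security (illustration only).
P, Q = 1009, 1013


def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu)), using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Ciphertext of m1 + m2, computed without decrypting."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Ciphertext of k * m for a public constant k."""
    return pow(c, k, n * n)


# User side: encrypt the features.
n, priv = keygen()
enc_x1, enc_x2 = encrypt(n, 20), encrypt(n, 7)

# Server side: compute 3*x1 + 2*x2 on ciphertexts only.
enc_score = add_encrypted(n,
                          scale_encrypted(n, enc_x1, 3),
                          scale_encrypted(n, enc_x2, 2))

# User side: decrypt the result.
print(decrypt(n, priv, enc_score))  # 3*20 + 2*7 = 74
```

What this sketch does not provide is any guarantee that the server actually computed 3*x1 + 2*x2; that missing guarantee is the part vFHE adds through ZKP.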

The Biggest Challenge of vFHE is Practicality

The biggest challenge of vFHE is its efficiency and practicality, as it combines two already complex and time-consuming technologies, FHE and ZKP. Therefore, recent research on vFHE focuses on improving its practicality through various combinations of FHE & ZKP methods and optimization tricks.

Moyed, The Hitchhiker’s Guide to the vFHE

The End is Just the Beginning

In the end, what I wanted to say with this project is:

Discussions on AI safety should start now.
vFHE is the optimal solution in the context of most people accessing machine learning through MLaaS (though this assumption could change depending on the market share of on-device AI).
The biggest challenge of vFHE is practicality, but research has been active recently.

Although this project has ended, it might not be completely over, as I will participate in the 2024 PSE Core program this summer.

During this project, I keenly felt my lack of programming experience related to cryptography, which limited my understanding and practical application. Interestingly, around that time I learned about the 2024 PSE Core program. Given my current theoretical understanding of ZKP and FHE, this program, which focuses on cryptographic programming, seemed like a great next step. I applied immediately, and despite my limited programming experience, I think I was accepted thanks to my enthusiasm and the results of this project. From late July to September, I will take this new step and share my reflections afterward.