AI Has a GDPR Problem

If Europe’s General Data Protection Regulation were strictly enforced, generative AI would not be allowed to operate, because it violates so many of the regulation’s core tenets. But does the problem lie with the technology, or with the fact that today’s privacy law simply does not account for modern technologies?

This is a slightly longer link-enhanced version of an article that first appeared in the Mint.

Over the past few weeks, I have been invited to more than my fair share of panel discussions on the regulation of artificial intelligence (AI), and in every one of them, the conversation has very quickly veered towards privacy. Most panellists worry that, given the way this new technology has been built, it will not meet the stringent requirements of Europe’s General Data Protection Regulation (GDPR). And since the rest of the world sees GDPR as the gold standard for privacy regulation, they fear that this will hamper the deployment of the technology around the world.

Consent

Take consent, for instance. GDPR operates on the principle that personal data should only be collected and processed with the consent of the individual. This gives the person to whom the data pertains some control over who can access it and what they can do with it. The data-sets on which most large language models (LLMs) are trained have largely been assembled by scraping the web. If personally identifiable information finds its way into such a data-set, it probably got there without consent. This is clearly not what the GDPR prescribes.

Personal data can be collected and processed without consent where there is a “legitimate interest” that necessitates it. However, to avail of this exception, it must be demonstrated that the creation of these LLMs is “necessary” and would, for that reason, justify collecting data without consent. However useful LLMs might be, it would be hard to argue that they are a “necessity” of the kind required to justify collecting personal data without consent.

Minimisation and Retention

GDPR also requires compliance with the principles of data minimisation and storage limitation, which together restrict how much personal data may be collected and how long it may be retained. Only as much personal data should be collected and processed as is required to achieve a specific purpose, and once that purpose has been served, data that is no longer required must be purged.

LLMs operate very differently. Their effectiveness depends almost entirely on access to vast amounts of training data, and they need that data to remain available indefinitely so that their models can be continuously refined. All of this runs contrary to the way GDPR requires data businesses to operate.

Private Conversations

Finally, there are the privacy implications of conversational AI solutions that encourage users to engage in almost human-like dialogue with the AI. In conversations designed to feel this real, users are more likely than ever to share personally identifiable information with the AI. Worse, there is a risk that this data could enter the reinforcement learning cycle and become part of the AI model, raising the prospect of further, ongoing violations of the GDPR.

For all these reasons, it seems highly unlikely that LLMs, in their current form, will be seen as fulfilling the data protection requirements set out under the GDPR.

Earlier this year, the Garante, Italy’s Data Protection Authority, held as much, banning Replika, an AI chatbot, from accessing the personal data of Italian residents. Last month, it did the same to ChatGPT, forcing OpenAI to take its services offline in Italy. The French, German and Irish regulators are reportedly looking closely into the matter and the European Data Protection Board has set up a task force to coordinate investigations and enforcement. The days of LLMs seem to be well and truly numbered.

The Problem Lies Elsewhere

But what if the problem does not lie with the technology? What if it is the laws being used to regulate it that need to change? Just because generative AI does not meet the requirements set out under the GDPR, does that mean we should prohibit its use? Or should we, instead, try to redesign our regulatory frameworks so that they enable these new technologies to function better?

Throughout history, new technologies have forced us to reconsider existing legal frameworks. When DNA sequencing advanced to the point where it became commercially viable, existing laws had to be amended so that individuals could continue to avail themselves of insurance benefits, and to protect them from having genetic insights misused to deny them employment. Similarly, when drones became commonplace, India enacted a brand new set of regulations to make sure that this new technology was not crushed under the weight of aviation regulations that had been enacted at a different time and for other regulatory objectives.

LLMs and generative AI are evolutionary technologies that call for a re-examination of our existing legal frameworks. If, instead of undertaking that re-examination, we confine generative AI within the bounds of existing law, we will impose on it a regulatory construct that is incompatible with all it has to offer.

Europe’s GDPR was enacted at a time when much of what is happening today was functionally impossible. If even the best technologists of the day could not have foreseen what these technologies would evolve into, we can hardly expect legislators to have factored this into the laws they drafted. And if GDPR was not designed to regulate LLMs, why should we assume that the legal framework it describes is fit for purpose?

Regulation must strike a delicate balance between fostering innovation and preserving privacy. That will not come from blindly forcing new technologies to comply with existing frameworks.