Does AI change how much my data is worth?

Placing a financial value on an individual’s personal data, let alone selling it, has always felt like a futile exercise. While ‘big data’ is a hot commodity, my own personal dataset has seemed pretty worthless commercially. But as more AI companies announce partnerships with media companies in exchange for training data, I can’t help but wonder whether the value of my personal dataset is going up.

Individual datasets have traditionally been viewed as commercially insignificant because the market only attributes value to data in the aggregate. It’s aggregated data that enables companies like Facebook and Google to spot trends and patterns across millions of users and monetize those insights through targeted ads. Whole empires have been built off of our collective digital exhaust. And while a robust market exists to facilitate the buying and selling of consumer data for the purpose of targeted marketing, this value has never really trickled down to people like you and me.

History doesn’t repeat itself, but it does rhyme. And we’re seeing a separate (though related) market emerge to facilitate the buying and selling of consumer data for the purpose of training LLMs. Every day there’s a new announcement of another AI company signing a deal with a media platform. Google signed a $60m multiyear deal with Reddit. Apple offered $50m+ for news publishing data. Photobucket, a niche image-hosting site, discussed charging rates of $1 to $2 per image, $2 to $4 per short-form video, and $0.001 per word.
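To make those per-item rates concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the Photobucket figures quoted above could be applied directly to one person's archive, which is a big assumption: those rates were discussed for licensing a platform's aggregate library, not an individual's files. The function name and the archive counts are hypothetical, made up purely for illustration.

```python
# Rate ranges in USD (low, high), taken from the Photobucket figures quoted above.
RATES = {
    "image": (1.00, 2.00),
    "short_video": (2.00, 4.00),
    "word": (0.001, 0.001),
}


def estimate_value(counts):
    """Return a (low, high) USD estimate for a dict mapping item kind to count."""
    low = sum(RATES[kind][0] * n for kind, n in counts.items())
    high = sum(RATES[kind][1] * n for kind, n in counts.items())
    return low, high


# Hypothetical personal archive; these counts are invented for illustration.
my_archive = {"image": 500, "short_video": 40, "word": 200_000}

low, high = estimate_value(my_archive)
print(f"Estimated value: ${low:,.2f} to ${high:,.2f}")
```

For this made-up archive the estimate lands at roughly $780 to $1,360. That is a ceiling on paper only, since nobody is currently offering platform-level rates to a single person.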

Where are they getting these numbers from? It feels a bit like shooting from the hip to me. But can we expect a clearer framework on how to price these datasets over time? Here’s what Mira Murati of OpenAI had to say last week:

“[OpenAI is] experimenting with methods to basically create … tools that allow people to be compensated for data contribution. This is quite tricky both from a technical perspective and also just building a product like that, because you have to sort of figure out how much a specific amount of data, how much value it creates in a model that has been trained afterwards.”

OpenAI seems to recognize the need for a better solution when it comes to pricing data. And I doubt it’s the only AI company to do so. 

As competition for training data intensifies and as more media platforms wake up to this new revenue model, I can see this market reaching the same scale and sophistication as the one for targeted marketing. Supply and demand will determine the price of a pixel, and the more recent, differentiated, and authoritative the data, the better.

So, back to the question: does any of this actually change the worth of my data? Well, Mira continued:

“And maybe individual data would be very difficult to gauge, how much value that would provide. But if you can sort of create consortiums of aggregate data and pools where people can provide their data, maybe that'd be better.”

So the answer is ‘yes and no.’ At a theoretical level, the history of my Reddit posts is now worth more to Reddit because it can monetize my data in a way that wasn’t possible before. But if I were to download all my Reddit data to a .txt file and hawk it on Craigslist, the data would still be pretty worthless to an LLM. Yet again, the value seems to belong to the platform and not the person.

Are there signs that value will end up trickling down to users? We’re already seeing users rebel against platforms like Stack Overflow, which has so far refused to compensate creators of the posts used to train ChatGPT.

I personally think it’s unlikely we’ll see platforms compensate individual users in dollars and cents any time soon. It’s inefficient and too likely to end up as “selling your house for firewood.”

Instead, I’m paying attention to less obvious business models that enable people to band together and form data collectives in which they own a stake. We’re already seeing this happen with companies like Vana, which is building user-owned foundation models, and Hivemapper, which enables anyone to contribute mapping data in exchange for a share in the economic benefits of the map’s development.

So does AI change how much my data is worth? In isolation, probably not. But collectively? The value is there. As this new data economy continues to play itself out, I’m betting on data collectives and cooperative models that ensure individuals who contribute their data can directly participate in the benefits of its use.
