Let's Talk About Data

Data runs the world.

It is used by car insurance companies to analyze driving patterns. By software companies to understand which features in their applications are being used more and which are being used less. By gaming companies to analyze player behavior and optimize in-app purchases. By Netflix to recommend movies to you. By Spotify to recommend songs. And the list goes on.

After about 3.5 years at Microsoft, I moved back to Austin, TX, where I’ve spent the last nine years at SparkCognition, an industrial artificial intelligence (AI) company, and SkyGrid, a joint venture between SparkCognition and Boeing.

Most of my time at SparkCognition was spent building AR/VR interfaces integrated with Natural Language Processing (NLP) applications, a subset of AI apps focused on human-machine interaction via natural language (think ChatGPT-like Large Language Models (LLMs), before ChatGPT existed 😊). At SkyGrid, we built the first AI + blockchain software platform for drone integration into the airspace.

After a long time building AI solutions and watching them get built around me, I've found that one truth reigns supreme: data does in fact run the world!

Data sanitization was, and remains, a large part of creating good AI models. There's a common saying in the field of computer science, "garbage in, garbage out," and that principle very much applies to building and fine-tuning AI models.

So, what's the purpose in telling you all this? Well, it sets the stage for us to dive into what Navigate is and why building a platform like this is crucial for the next wave of AI.

We have entered the age of Data Wars. X (formerly Twitter) locked down access to its site and increased the pricing of its APIs. Reddit increased the price of its APIs, impacting a lot of developers who used them to build applications. Why? After OpenAI released ChatGPT, which exists in large part due to data scraped from the internet, both of these companies realized the importance of having control over data.

X and Reddit are two of the most active platforms in the world where you have new data being generated by a massive user community every minute of every day, and it was being scraped by companies to create their own products! Ironically, OpenAI's terms and conditions don't allow you to use ChatGPT to train LLMs that can compete with it, even though OpenAI used data created by people and companies all over the world in order to create ChatGPT.

Before this, it was Apple, whose release of the “App Tracking Transparency” feature ignited a war with Facebook (a war Facebook ultimately lost, at a cost of billions of dollars). Now we have Google doing the same with third-party cookies. Come Q3, Chrome will phase out third-party cookies to protect user privacy. It just so happens that Google will also release “privacy preserving solutions”, which, you guessed it, will be available via APIs to developers.

The size of the “data collection” pie is increasing, and everyone wants a larger chunk of it. Sharing is not the top priority here.

While companies are battling each other to maximize their revenue and get a larger part of the market, the users who are creating all this data are getting, you guessed it, nothing.

This is the problem we need to solve. The exploitation of users is getting worse. You create content and build a following on centralized, controlled, and closed platforms, and you have no recourse if these companies decide to suspend your account, remove features, or change their terms and conditions. Chris Dixon made a fine point in his book, Read Write Own, where he talks about the rising popularity of platforms like Substack, which let creators export the email addresses of their subscribers and thereby (at least for now) move away from Substack if they wish to do so.

This is where Navigate comes in.

At Navigate, we’re building a decentralized intelligence platform: one that rewards a community of contributors for creating and augmenting datasets that will be used for training and tuning AI models.

All of us already contribute data to so many Big Tech companies every day, helping make their products better, but not getting any earnings for it. With Navigate, we believe in distributing rewards amongst the community that is creating new and valuable datasets for AI.

So here we are, building a platform to decentralize AI, reward contributors, and give people ownership of what they help build. The datasets created and augmented can be used by a multitude of companies: those building and tuning AI models, those that need consumer insights for advertising purposes, and so on. The advantage for contributors is that they get rewarded not only for contributing, but also every time the data they created or augmented is used by a customer. The community wins by selling data to companies instead of giving it away for free and having companies sell it to each other.

One last thing I'll say is that gamification will be a large part of the Navigate ecosystem. We want to create a social experience where you "Play to Build" and earn as a result of that. I don’t like the term gamification very much, so I’ll try to say this a little differently. We want to build games that help us create the largest community-owned and curated datasets. We want to build games that help us decentralize AI.

I hope reading this brief introduction will energize and excite you about what we will all build together. See you soon, Navigators!

- Ali

thechaingamer.eth