Neural Media

an exploration of neural networks as a creative tool & medium

"All media are extensions of some human faculty — psychic or physical."  ~Marshall McLuhan

For most of 2024, and especially since publishing my last essay, I’ve been spending quite a bit of time trying to make sense of what we now call “generative AI” and its implications for me personally and for society more broadly. Like many, I’ve been captivated by AI as a creative tool, and have found myself integrating many of these new products into my workflows, particularly for creative writing and music creation. However, as a crypto investor thinking about consumer media and user-facing applications, AI has increasingly felt like a blind spot for me. When we talk about the most successful consumer media businesses of the Internet Age, we don’t talk about them in terms of technological silos because that isn’t how they are built. We don’t think about Facebook purely as a “mobile app” or an “AI app” despite its success being path-dependent on those innovations. Instead, we recognize that it was the confluence of many different innovations that made an application like Facebook possible.

With that context in mind, the purpose of this essay is to consolidate and refine my personal findings & insights from exploring AI throughout the last year. I’m sharing in hopes that this will resonate with or be helpful to others (especially my fellow crypto enthusiasts).

Yet Another “Napster Moment”

Today, the majority of conversation around AI-generated media is focused on: (1) the ethics of model training & data scraping, (2) whether or not “AI art” is “real Art,” and (3) deepfake dystopia. These conversations are all very interesting and worth having, but I think they all miss the forest for the trees in important ways.

The framework I’ve found most useful for reasoning about the rise of generative AI is to consider that IP is going through yet another “Napster moment,” but this time for production rather than distribution. As I’ve written about previously, the rise of the Internet and the subsequent collapsing of media distribution costs to $0 was a “zero-to-one” moment in time. The abruptness of this shift is captured brilliantly in a documentary that I highly recommend called How Music Got Free, which tells the story of how a CD factory worker and a group of teenage hackers brought the entire music industry to its knees seemingly overnight. Up until the emergence of Napster, and digital file-sharing more broadly, the entire corporate media industrial complex (and thus artist livelihoods) was dependent upon the technological reality of media distribution being expensive, high-friction, and centralized. Within just a few years of Napster’s launch, major labels went from doing record sales volume to literally begging the federal government to save them via legal intervention. The industry was presented with an immensely hard pill to swallow: the economic regime underpinning their business had fundamentally and irreversibly changed, and the days of buying music were over.

Today, I think generative AI is presenting us with an even harder pill to swallow. The implications of the cost of creative production going to $0 are, in many ways, even harder to grapple with because they strike right to the core of what many feel makes us human: our creativity. This existential fear does not, however, change the reality that media generation, and in particular “style transfer” or aesthetic mimicry, is now effectively free for every media type we care about (text, images, video, audio, software). This is another “zero-to-one” moment in time. However, the most important distinction between today and the early 2000s is that in the case of Napster vs. the media corporations, the government sided with the corporations and ultimately decided to criminalize file-sharing as “piracy” (hence why I often refer to corporate media / IP as “fiat media”). That decision, along with Steve Jobs introducing the iPod as a way to popularize what would become iTunes and eventually “streaming,” is what kept the industry from collapsing entirely. Unfortunately, I think creatives who are expecting the government to step in and take action here are coping at best and deluding themselves at worst. What I think we’ll probably find out is that the IP regime exists largely to protect corporations and their fiat media, and no one is coming to save us. Legacy media corporations learned this lesson the hard way last time around, so they’ve proactively done licensing deals with all the major AI companies and will be compensated at least to some extent. New media corporations are training models on the user-generated content shared on their platforms, even if they claim not to be. Once again, the technological reality has shifted abruptly, and independent creatives are largely being left behind.

Computing: The Medium of Our Time

It’s easy to see why so many creatives view generative AI as disempowering, and I think much of that fear is warranted. However, I also think there’s an opportunity to consider that computing is evolving in a way that’s calling us to engage with it not merely as a communications medium, but also as a creative medium.

The idea of computing as a creative medium won’t be novel to anyone who’s created video games or generative artworks, but even today, this still hasn’t clicked for many. Software, the first digitally-native category of media, is something that most people understand primarily through the lens of “services,” “utility” and “optimization,” but not necessarily through the lens of creative expression. I have many theories for why this is, despite computing obviously being the medium of our time, but it seems that generative AI is now driving the point home quite aggressively by sending the production costs of every other medium to $0. One of the existential questions that this seems to bring up for people is something like: “So where does the human creativity come in? Where does craft come in?” My answer, probably unsurprisingly to some, is “at the level of programmability,” but before diving into what exactly I mean by that, there are a few important technical concepts we need to unpack:

Neural Networks 101 (for Dummies)

Training is a process that essentially involves “teaching” a model how to do some task by providing it with many examples of that task being done, then allowing it to find patterns, make predictions based on novel inputs, and correct itself when wrong. Conceptually, it’s similar to how we learn to draw by first copying shapes until we can create something original, using the feedback of our peers and teachers to refine our approach along the way. The key distinction, of course, is that a text generation model, for example, has not learned how to write in the way that you and I can, but has instead learned to simulate writing with extremely high fidelity. This is one of many reasons why I’ve come to agree that “simulators” are a much better mental model for neural networks than “agents.”
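To make the "predict, check, correct" loop a bit more tangible, here is a minimal sketch in Python. It is a toy with a single adjustable number rather than a real neural network, and the example data, learning rate, and function name are all my own assumptions for illustration.

```python
# Toy illustration of training: learn the number w in y = w * x from examples.
# The data, learning rate, and epoch count are invented for this sketch.

def train(examples, learning_rate=0.01, epochs=200):
    w = 0.0  # the model's single parameter starts out knowing nothing
    for _ in range(epochs):
        for x, y_true in examples:
            y_pred = w * x                   # make a prediction
            error = y_pred - y_true          # measure how wrong it was
            w -= learning_rate * error * x   # nudge w to reduce that error
    return w

# Examples of "the task being done"; the hidden pattern is y = 3x.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

w = train(examples)
prediction = w * 10.0  # an input the model never saw during training
print(round(w, 3), round(prediction, 3))
```

Real models adjust billions of parameters instead of one, but the basic rhythm of guessing, measuring the error, and correcting is the same.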

Latent space, or as I like to call it, “high-dimensional possibility space,” is a representational space within neural networks where learnings from training are represented in a compressed form. Metaphorically speaking, you can think of this as an “internal world model” that’s constructed as the model learns to make sense of the complex relationships and similarities between various detectable features in its training data. As I’ll expand on below, understanding the concept of latent space is key to understanding the essence of neural networks as a creative tool and medium.

latent space visualization #1 — interpolating between known embeddings
latent space visualization #2 — representation of the multi-dimensional attributes & relationships between embeddings

Embeddings can be thought of as the specific points within latent space that inputs are mapped onto. Embedding is the process by which your prompt is essentially translated into the language of the model’s “mind.” In this way, we can understand “prompting” as a way of exploring and navigating a model’s latent space, which implies that to be skilled at prompting is to develop an intuition for the shape of a model’s latent space, such that one can direct the model to produce specific, desired outputs.
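As a rough sketch of what "points in latent space" means in practice, the Python below treats a few concepts as short lists of numbers, measures how close they are, and interpolates between two of them. The three-dimensional vectors are hand-made and hypothetical; real models learn embeddings with hundreds or thousands of dimensions.

```python
import math

# Hand-made 3-dimensional "embeddings" (hypothetical values for illustration).
EMBEDDINGS = {
    "cat":   [0.9, 0.8, 0.1],
    "dog":   [0.8, 0.9, 0.2],
    "piano": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Similarity of direction between two vectors: near 1.0 means very alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def interpolate(a, b, t):
    """Linear interpolation between two points: t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

cat, dog, piano = EMBEDDINGS["cat"], EMBEDDINGS["dog"], EMBEDDINGS["piano"]

# Related concepts sit closer together than unrelated ones.
print(cosine_similarity(cat, dog) > cosine_similarity(cat, piano))

# A point halfway between "cat" and "piano": somewhere no training example lives.
midpoint = interpolate(cat, piano, 0.5)
print(midpoint)
```

Interpolating like this is one simple way of "walking" through latent space; what a given model actually generates from an in-between point depends entirely on that model.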

Much of the fun in playing with neural networks lies in the fact that their deepest inner workings largely remain a mystery to us, but I think these basic concepts should provide the context needed for thinking about models as creative tools.

Neural Networks: A New (Creative) Paradigm

One of the first and most important things to grasp about computer media is that it calls us to shift our thinking from focusing primarily on outputs (songs, images, videos, text) to focusing more centrally on systems & processes. In the specific case of neural networks, this means thinking in terms of programmable media generation engines rather than thinking exclusively in terms of particular pieces of media. It’s through this lens that it became obvious to me that the answer to the aforementioned question of “where does human creativity and craft come in?” can be found in the training process and the design of the model’s architecture — this is what I mean by “at the level of programmability.”

xhairymutantx by Holly Herndon & Mat Dryhurst: outputs from a model that was trained strictly on photos of Holly and produces photos inspired by her likeness no matter what prompt is given.

If you consider that neural networks represent an attempt to implement software-based abstractions of human cognitive functions, it becomes clear that training and designing a model is akin to teaching it how to think. You could imagine telling (“prompting”) all your friends to “imagine a childhood memory,” and each response will of course be different because what they generate will depend on their personal backgrounds and imagination (“training data”). Over many prompts, you can also imagine that some of your friends will consistently generate more beautiful or creative responses, maybe even adhering to a particular personal style. Now what if you could run this exercise with every human mind that exists and has ever existed? What if you could pick out particularly unique human brains to run this exercise with, like Picasso or Kanye West? This is essentially the creative superpower that neural networks afford us: the ability to harness other minds as creative tools. Here, I think what’s truly compelling is not so much any one particular output of a model, but more so the opportunity to creatively program a “software brain” that thinks unique thoughts and generates unique artifacts.

Arcade.ai is a “prompt-to-product” marketplace that allows users to design their own jewelry products. They’ve fine-tuned a model specifically to produce high-fidelity images of jewelry that use only the materials available to their end-users for manufacturing.

Taking this point about systems > outputs further, another hallmark of interacting with neural networks is the experience of participating in a continuous feedback loop of prompts and responses, an experience I’ve heard a few people compare to the feedback loop of reading and writing. I’ve personally noticed that it’s rare for me to prompt a model, receive a single output, and be done. Almost every interaction I have with a model brings me into an interactive feedback loop where I’m continuously iterating, reflecting and exploring. This is a very subtle but critical point to grasp as it relates to the kinds of media generation that neural networks are meant for:

Agent-based media

I briefly mentioned this concept in a previous essay, but the idea is fairly simple: here, the model is simulating the role of an agentic human companion of some kind, trying to relate to us as we go back and forth with it, and engaging primarily through text-based conversation, although it can also understand and respond with other forms of media. This is also where we’re seeing models that are capable of taking actions (e.g. making financial transactions) on behalf of others or themselves. Think chatbots, AI companions, in-game NPCs or any other form of anthropomorphic UX. Infinite Backrooms, a creative experiment by Andy Ayrey that involves positioning various instances of Claude to communicate without human intervention, is a particularly interesting example of this.

Real-time game engines

Here, the model is simulating a game engine (or more specifically, a game state transition function) by taking in user actions within the game as prompts, and generating the next frame within the game as a response output. At fast enough speeds, this should feel like navigating a virtual world that is rendering itself in real time in response to your actions. Immersive and interactive media at its finest.

DOOM game frames generated by GameNGen, a game engine powered entirely by a neural model, as outlined in Google’s Diffusion Models Are Real-Time Game Engines paper.
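The "state transition function" idea can be sketched in a few lines of Python. This is a toy under loud assumptions: `toy_model` is a hand-written rule standing in for a neural network, a "frame" is just a player position rather than an image, and none of the names come from GameNGen itself.

```python
from typing import Callable, List, Tuple

Frame = Tuple[int, int]  # toy "frame": only the player's (x, y) position

def toy_model(previous_frames: List[Frame], action: str) -> Frame:
    """Stand-in for a neural model: predicts the next frame from the frame
    history plus the latest action. A real system would generate pixels;
    this hand-written rule only moves a point."""
    x, y = previous_frames[-1]
    moves = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}
    dx, dy = moves.get(action, (0, 0))  # unknown actions leave the frame unchanged
    return (x + dx, y + dy)

def play(model: Callable[[List[Frame], str], Frame],
         first_frame: Frame, actions: List[str]) -> List[Frame]:
    """The game loop: each user action is the 'prompt', each new frame the 'response'."""
    frames = [first_frame]
    for action in actions:
        frames.append(model(frames, action))
    return frames

frames = play(toy_model, (0, 0), ["right", "right", "up", "wait"])
print(frames)
```

The point of the sketch is the shape of the loop: there is no separate game logic anywhere, only a function that turns "what just happened plus what the player did" into "what happens next."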

Multiverse generators

Here, the model acts as a creative oracle by helping us expand upon our original ideas through the generation of infinite variations, each of which can also be further explored and manipulated. This enables us to “branch out” from any idea or concept to explore the surrounding possibility space. AI Dungeon, a text-based “choose your own adventure” game, is a great example of this.

view of the UI for Loom, a tree-based writing interface for LLMs such as ChatGPT by @repligate
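One way to picture the "branching out" described above is as a tree, where every node is an idea and its children are variations on it. The Python sketch below is my own illustration and not how Loom or AI Dungeon are actually built; `toy_generate` is a hypothetical stand-in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)

def toy_generate(text: str, n: int) -> List[str]:
    """Stand-in for a model call: returns n labelled variations of the text."""
    return [f"{text} / variation {i + 1}" for i in range(n)]

def branch(node: Node, generate: Callable[[str, int], List[str]] = toy_generate,
           n: int = 3) -> List[Node]:
    """Expand one node into n child variations and return them."""
    node.children = [Node(t) for t in generate(node.text, n)]
    return node.children

def count_nodes(node: Node) -> int:
    """Total number of ideas in the tree rooted at this node."""
    return 1 + sum(count_nodes(child) for child in node.children)

root = Node("A lighthouse keeper finds a door in the sea")
first = branch(root)             # three variations on the original idea
second = branch(first[1], n=2)   # go deeper only on the branch we like

print(second[0].text)
print(count_nodes(root))
```

Because every variation remains in the tree, you can always walk back up and explore a branch you passed over earlier, which is what makes the tree feel like a map of the possibility space rather than a single thread of conversation.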

Latent Space as a Creative Tool

Increasingly, I’ve come to believe that this idea of “exploring possibility space” is central to understanding neural networks as a creative tool and medium. In my time playing with tools like Midjourney, Suno, Websim, Claude & more, I’ve noticed that much of what I’m doing falls into this flow of Prompt -> Generate output -> Generate specific variations on that output -> Use variations as prompt for new output -> Generate specific variations on that output -> and so on…

For example, when using the AI-powered music generation tool Suno, I typically provide the model with a 60-second demo of my own singing and some written lyrics as a prompt. I’ll then use the Cover feature to generate an output, then generate 10+ variations of that output, and then use whatever pieces I like from those variations as inputs for further prompting. I’m essentially exploring the possibility space around my own demo as an embedding within the model’s latent space, discovering variations that are based on my original work, but that I might not have been able to come up with on my own or generate in a reasonable amount of time. I think this unlocks a kind of rapid prototyping and beta testing for creative work that hasn’t been possible before, and will lead to the emergence of “100x creatives” much like the software community has talked about AI giving rise to “100x engineers.”
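The generate, pick, and re-prompt loop can be sketched as a simple search. Everything in this Python toy is an assumption for illustration: the "latent space" is two-dimensional, `vary` stands in for asking a model for variations, and `taste` stands in for my own judgment about which variation I like best. It has nothing to do with how Suno works internally.

```python
import random

def vary(point, rng, spread=0.5, n=10):
    """Stand-in for 'generate variations': n nearby points in a toy 2-D space."""
    return [[c + rng.uniform(-spread, spread) for c in point] for _ in range(n)]

def taste(point, target=(2.0, -1.0)):
    """Stand-in for the creator's judgment: higher is better, meaning closer
    to a result they have in mind but could not specify up front."""
    return -sum((c - t) ** 2 for c, t in zip(point, target))

def explore(start, rounds=20, seed=0):
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    current = list(start)
    for _ in range(rounds):
        # Keep the current option among the candidates so a round never makes things worse.
        candidates = vary(current, rng) + [current]
        current = max(candidates, key=taste)  # pick the favorite, then vary it again
    return current

start = [0.0, 0.0]   # the original demo
result = explore(start)
print(taste(start), taste(result))
```

The human stays in the loop as the `taste` function; the model's job is only to propose nearby options faster than I could produce them myself.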

It’s clear to me that latent space is a creative tool, and that harnessing the power of AI in creative production won’t just be about training powerful models, but also about designing interfaces that empower users to explore and manipulate these vast spaces of latent possibility with more precision and granularity.

Consumer Behavior & Cultural Impact

As for the more “practical” conversation around how this technology transforms consumer behavior and what emerging business opportunities it creates, I have three predictions I’ll share:

Everything becomes a creative tool

Prompting — whether text-based, image-based or otherwise — is becoming embedded into more and more interfaces and experiences, inviting end-user creativity into spaces it’s never previously been welcome in. I agree with folks like Scott Belsky on the idea of most prompts being abstracted away from end-users as “controls,” but most importantly, I think this fundamentally transforms how we have to think about interface design going forward.

Corporate Media —> User-generated Media —> Machine-generated media

The last major shift in media business models saw corporations go from generating their own media to outsourcing that generation entirely to end-users. It seems obvious at this point that the next major consumer media businesses will have built their businesses around the proliferation of machine-generated media, but it’s not entirely clear what a “winner” here would look like. Will it be generalized models like Midjourney, more specialized creative tools or even social experiences built on top, or a third, less obvious thing? Regardless, I think that if you’re a founder or independent creative operating within the consumer media landscape today, you probably want to have a strategy for how these tools can enhance and drive value to your business.

Beyond that, another place I think it’s worth spending time is on the question of how to make AI-powered experiences feel more social and multiplayer. Today, at least for me, the majority of AI apps feel very anti-social in the sense that you’re primarily interacting with models rather than other humans. There will likely be lots of opportunity and design space to play around with here, including building more human-centered co-creative experiences, and also creating ways for humans and bots to socialize in more meaningful ways.

The Impact on IP

It’s not just that the cost of creative production is going to $0, but specifically the cost of aesthetic mimicry. I can take a picture of a person’s outfit and give it to Midjourney as a prompt for designing a couch in the same style. I can do the same kind of style transfer with that person’s voice, writing style, and more. What is the value and significance of IP under this new paradigm? I haven’t answered this question yet myself, but it certainly feels like most previous assumptions and mental models no longer hold.

The Role of Crypto & Closing Thoughts

If you’ve made it this far — thank you!

I’ll be exploring the implications of all this for crypto in future essays, but to preview what I’ll be spending time on going forward:

  • Opportunities for crypto companies to build around new media — the intersection of onchain markets and machine-generated media

  • Crypto as an incentive layer for IP — going beyond attribution and provenance to think about building incentives & networks around media

  • Crypto as a monetization & access control layer for media, particularly user-generated software — rethinking webpage architecture; minting as a business model for small models; NFTs as infra for personal programs & user-generated software

  • Crypto as a social & economic coordination layer between humans and machines — humans and AI collaborating on identifying, funding & solving problems of various kinds; community-owned & operated models
