
In Search of Magic: AI Interfaces Beyond the Chat Box

3 promptless AI mobile app prototypes and other thoughts on UX Arbitrage

There’s a popular adage from science fiction writer Arthur C. Clarke:

Any sufficiently advanced technology is indistinguishable from magic.

One of my favorite questions to ask is, when was the last time you used tech that felt like magic?

I think everyone has only a handful of uniquely personal experiences with technology that evoke a visceral feeling of magic. 

The first time I felt the magic of generative AI was in Paris last summer. I was walking around the perfectly trimmed hedges and ornamental fountains of the Tuileries Garden when I vaguely remembered an interesting discussion from a college seminar about how the design elements of French and English gardens can be traced back to differing philosophies. I couldn't remember the explanation, so I pulled out the then-recently-launched ChatGPT app.

Maybe this isn’t that impressive to you - but for me at that time, reading the explanation as the text typed itself out was a magical experience. Just moments after this vague thought crossed my mind, ChatGPT helped me remember what I’d learned back in college. It felt like watching a YouTube video buffer at 360p and then switch to 4K high definition. The convenience, the speed, and the brevity of the answer were a completely different experience from opening the Safari app, searching on Google, finding the Wikipedia page, and scrolling to dig out the part I wanted.

For the rest of my time in Paris, I used the ChatGPT app as a personal museum/art history/architecture/philosophy guide. It was like having a personal Rick Steves guide who could answer any niche question - a game changer for travel. 

Chat boxes lead to "prompter's block"

Over the last year of consumer generative AI, a divide has formed between the haves and the have-nots:

  1. People who have tried ChatGPT a handful of times but “don’t know what to use it for”

  2. People who have identified a job to be done and what to prompt to accomplish it 

The cause of this divide is the chat box interface.

There has been plenty of discussion on how chat interfaces are not the future. I like how Amelia Wattenberger frames it in her post: “Good tools make it clear how they should be used. And more importantly, how they should not be used.” Presenting a user with a general-purpose tool and an empty chat box leads to decision paralysis - “prompter’s block”.

What can I ask? What should I ask? I heard that prompt engineering is important, how do I do that?  

The future of AI-enabled products and interfaces shouldn’t require users to know what special commands or words to use. Interfaces have to move closer to the user and clearly frame the job to be done.

Benedict Evans made this clever comparison with WordPerfect keyboard overlays in his essay “Building AI Products.”

Some people argue that LLMs are the end of interfaces because computers can now understand natural language, and “agents” (elusive definition) will do everything for us. I think this is wrong - humans are tool users. The best forms of human-computer interaction can’t be language alone; there is value in visual information hierarchy, interaction design, haptics, and sound. It’s telling that babies and animals intuitively understand how to use touch screens.

Nat Friedman pointed out that it took 20 years of the internet existing before the industry discovered the infinite scrolling feed, which has fundamentally shaped the topography of how we use the internet:

I kind of suspect there’s an inverse correlation between the profoundness of an innovation and the time it takes for it to actually make a difference. With the Internet, it took 20 years, to your point, or 15 years, wherever it might be, for the feed to come along, and the feed, to my mind, is the core Internet innovation.

That was something that could not be done before: a dynamically created list of content that never ends, that is personalized to every single individual. That is fundamentally new. It took 15 years to get to it, and that’s what unlocked the advertising model; it unlocked the entire economy of the Internet.

New technologies and capabilities enable native interfaces. We're still learning how to make these very capable models useful to people. What is the version of the infinite feed for AI?

There’s a favorite line from investor/blogger types: "the next big thing starts out looking like a toy."  

It's Time to Build “Promptless AI” Toys

Recently I’ve been spending time exploring low-hanging fruit around AI interfaces. Lots of energy has been directed towards B2B/workflow use cases, but I’m excited about consumer mobile apps – the surface we casually use the most. With mobile, we can build natural flows around capabilities like the camera, the user's camera roll, and location.

I call it "Promptless AI": simple apps that use LLM APIs to solve for a niche use case without making the user type into a chat box. 

Here are 3 toys I've been working on, focused on image as input:

1. Color Analysis

Color analysis is a method that looks at your skin tone, eye color, and hair color to determine which of 12 possible color seasons you are. Knowing your color season tells you which clothing and makeup colors fit your complexion. Professional studios charge hundreds of dollars for a comprehensive color analysis, a practice that has especially taken off in Korea.

Earlier this year, my girlfriend showed me a viral TikTok “hack” where women used ChatGPT to get a free color analysis. The “hack” entailed taking a selfie, using the digital color picker/eyedropper tool to find the hex colors of your skin/eyes/hair, and pasting them into a predefined prompt asking ChatGPT what your color season is. These TikTok tutorials were effectively low-fidelity promptless AI apps where you don’t have to think about what to prompt.

I thought it’d be fun to brush up on my SwiftUI skills and build a vertical experience around using GPT-4o’s vision capabilities to get your color analysis:

You can try this - it's live on the App Store. Use referral code "COLORS" for a free analysis :)
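For the curious, the pattern under the hood is basically the TikTok hack with the manual steps removed: the prompt is baked into the app, and the selfie goes straight to a vision-capable model. Here's a minimal sketch of that call, assuming a bare URLSession request to OpenAI's chat completions endpoint - the prompt wording and response handling are simplified placeholders, not the shipped app's code:

```swift
import Foundation
import UIKit

// Hypothetical sketch (not the shipped app's code): send a selfie plus a
// fixed color-analysis prompt to a vision-capable model, so the user never
// has to type anything into a chat box.
func requestColorAnalysis(for selfie: UIImage, apiKey: String) async throws -> String {
    guard let jpeg = selfie.jpegData(compressionQuality: 0.8) else {
        throw URLError(.cannotDecodeContentData)
    }
    let dataURL = "data:image/jpeg;base64,\(jpeg.base64EncodedString())"

    // The "promptless" part: the prompt is baked into the app, not typed by the user.
    let prompt = """
    Analyze the skin tone, eye color, and hair color in this photo and \
    determine which of the 12 color seasons the person is. Explain briefly.
    """

    let content: [[String: Any]] = [
        ["type": "text", "text": prompt],
        ["type": "image_url", "image_url": ["url": dataURL]]
    ]
    let body: [String: Any] = [
        "model": "gpt-4o",
        "messages": [["role": "user", "content": content] as [String: Any]]
    ]

    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)

    // Pull the assistant's text out of the JSON response (error handling omitted).
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let choices = json?["choices"] as? [[String: Any]]
    let message = choices?.first?["message"] as? [String: Any]
    return (message?["content"] as? String) ?? ""
}
```

From the user's perspective, the whole interface is a camera button and a results screen - the prompt never shows up.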

2. Art Museum Guide

Take a picture of an artwork to learn about it. It's like a universal museum audio guide. I built a prototype with 3 modes:

1. Museum tour guide

2. Explain like I'm 5

3. Impress my date

I had a lot of fun testing this out at the Met. In its current iteration - a single one-shot call to Claude 3.5 Sonnet - it's maybe 60% accurate at identifying works. Time to make a new "artwork identification eval" for models?
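If you're curious how the modes work: the image call itself doesn't change - each mode just swaps in a different system prompt. A rough sketch below, assuming Anthropic's Messages API with a base64 image block; the prompt wording and model string are illustrative, not the prototype's actual prompts:

```swift
import Foundation

// Hypothetical sketch of the "modes" idea: one image call, three system prompts.
// The prompt text below is made up for illustration.
enum GuideMode: String, CaseIterable {
    case tourGuide = "Museum tour guide"
    case explainLikeIm5 = "Explain like I'm 5"
    case impressMyDate = "Impress my date"

    var systemPrompt: String {
        switch self {
        case .tourGuide:
            return "You are a knowledgeable museum tour guide. Identify this artwork, then give context on the artist, the period, and why it matters."
        case .explainLikeIm5:
            return "Identify this artwork, then explain it in simple words a five-year-old would understand."
        case .impressMyDate:
            return "Identify this artwork, then share two or three impressive, conversation-worthy facts about it."
        }
    }
}

// Request body for Anthropic's Messages API with a base64-encoded photo of the artwork.
func claudeRequestBody(mode: GuideMode, jpegBase64: String) -> [String: Any] {
    let imageBlock: [String: Any] = [
        "type": "image",
        "source": ["type": "base64", "media_type": "image/jpeg", "data": jpegBase64]
    ]
    let textBlock: [String: Any] = ["type": "text", "text": "What artwork is this?"]
    let userMessage: [String: Any] = ["role": "user", "content": [imageBlock, textBlock]]

    return [
        "model": "claude-3-5-sonnet-20240620",  // illustrative model identifier
        "max_tokens": 1024,
        "system": mode.systemPrompt,
        "messages": [userMessage]
    ]
}
```

Adding a new explanation style is just adding a new case to the enum, which is part of what makes this kind of toy so cheap to iterate on.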

I'm planning on working on this app more - improving accuracy, adding an audio guide mode and more explanation styles, and maybe even a social layer to track your favorite works at museums around the world.

8/22/24 update - I spent a couple of weeks polishing this app, and now it's live on the App Store.

3. Improve my outfit

I’d seen the wave of male “looksmaxxing” apps take off (take a selfie, get a rating of how hot you are, print $$ off scammy weekly subscription paywall tactics).

There's natural appeal in apps and personality quizzes that tell you about yourself. This prototype has you upload a picture of an outfit and then tells you:

1. Why it works

2. Ideas to improve

3. Rating from 0-100

I'll admit that this one is a bit of a tarpit idea, but it's cool to have AI identify and tag the parts of an outfit. For relatively simple men's outfits - really just the color and cut of a shirt and pants - some of the advice isn't bad.
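Under the hood, this is the same take-a-photo-and-call-a-model pattern as above; the only interesting part is the output shape. A sketch of how the response might be modeled, assuming the app asks the model to reply as JSON and decodes it into a Swift type (field names are illustrative, not the prototype's real schema):

```swift
import Foundation

// Hypothetical shape of the structured feedback the prototype renders.
struct OutfitFeedback: Codable {
    let whyItWorks: [String]       // what's already good about the outfit
    let improvementIdeas: [String] // concrete suggestions (color, cut, fit)
    let rating: Int                // 0-100 overall score
}

// Decode the model's JSON reply into the struct above.
func parseFeedback(from jsonData: Data) throws -> OutfitFeedback {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase // accepts "why_it_works", etc.
    return try decoder.decode(OutfitFeedback.self, from: jsonData)
}
```

Getting the model to return predictable JSON is what lets the app render three clean sections instead of a wall of chat text.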

4. Bonus: BestWatermelonPicker

Have you bought watermelon? Do you know how to pick the best one? Introducing: BestWatermelonPicker.

Never worry about taking home a dud of a watermelon again.

I haven't built this, but I recently saw a meme graphic about how to read the webbing and spots to choose the ripest watermelon. A use case ripe for a dumb vertical toy app!

The UX Arbitrage Opportunity in AI

UX Arbitrage is when a product presents an opinionated, novel UX on top of a commodity technology (foundational LLM/image/video model APIs). Colloquially, these get called ChatGPT wrappers.

These "promptless" toy prototypes just scratch the surface of AI interfaces. They're effectively still just a one-shot input output tool, conceptually not all that different from a chat interface (but visually very different!). I've been more interested in the nuanced interaction between user actions and AI capabilities. Jordan Cooper captures this eloquently in his blog post:

Granola’s brilliance is that they acknowledge the truth which is that I don’t want to outsource my thinking to AI. A summary of the meeting, without my input, isn’t what I value or want to reference. Granola gives me a note pad, to jot down my shorthand, but then utilizes the transcript from the meeting to “enhance” my notes with the surrounding context from the meeting. The output is fully flushed out, highly legible notes, where I’ve defined the focal points (where I end) and it has filled in the blanks (where it begins). It’s a very subtle interplay between the user and autonomy that once again strikes this magical balance similar to Tesla.

Each use case that contemplates autonomy calls for a different interaction paradigm between a user and automation, and those that elegantly enable the baton of control to be passed back and forth seamlessly within a given use case will thrive.

Most of the new AI products that have resonated with users take a clearly defined use case and combine it with a clever, often subtle, UX that delivers a magical-feeling experience. There is massive value in effectively framing the context so that a user can come in and naturally understand what to do.

Examples that have stood out to me: 

  • Cursor: The first time I used Cursor to ask how to fix a complex bug that spanned several files in a codebase, it felt like magic. Putting a chat sidebar in the code editor and adding the whole codebase as context was a massive UX improvement over the ChatGPT status quo of copying and pasting code, switching windows, and asking the question.

  • Devin: Split the chat interface in half, show the user how the LLM “thinks” and plans steps, run code in a browser, and check/debug whether it works.

  • Claude Artifacts: Split the chat interface in half, and render generated code from the chat directly. 

  • TLDraw’s Make it Real: show a drawing interface, tell the user they can draw a web interface, feed the drawing as image input into a vision-capable model, and get HTML code back.

  • Perplexity: scrape live, up-to-date search results, keep track of sources/links, and display the answer to the user while clearly citing the sources of information.

  • Apple Intelligence: embed generative AI inside very specific use cases - email summaries, text rewriting inside existing apps.

Each of these examples leverages the power of LLMs in a specific context and makes it clear how to use them. I think we’ve just started to explore the first level of UX arbitrage in AI interfaces. What do the second, third, and fourth levels of UX arbitrage look like?

What native experiences can we build with generative AI? And how can we bring these magical experiences to more humans?


I'm excited to continue exploring promptless AI interfaces and UX arbitrage, and, more broadly, to build technology that feels like magic (haven't forgotten about you, Vision Pro). If you're also excited by these things, I'd love to chat. Feel free to DM me on Twitter @spenciefy.
