Why AI Still Falls Short at Understanding Real Users

Simulated interviews fall short when complexity, emotion, and nuance enter the conversation.

Eric P. Rhodes

I came across a research tool called Synthetic Users. It uses large language models to simulate realistic user interviews. No recruiting. No scheduling. Just simulated feedback on demand. The idea is to speed up early research and hypothesis testing by replicating what a real user might say.

I love seeing all the different ways LLMs can be used. This is how we learn what works and what doesn't.

It reminded me of an experiment I ran back in 2023. I used ChatGPT to synthesize insights from interviews, first-hand experience, and secondary market data into a set of proto-personas for my collectors of Second Realm's artwork. Then I pushed it further. I asked the LLM how each persona might respond to influence using Robert Greene’s 48 Laws of Power.

That work was not just a creative exercise. I've spent nearly two decades in human-centered design, leading service design and customer experience research. I've worked with real users across messy, high-stakes environments and paradigm-shifting emerging technology products. So when I ran that experiment, I wasn't trying to replace people. I was trying to understand what this tech could actually do.

And here's what I've learned since. LLMs are good at surfacing patterns. They're fast. They're directional. But once you step into ambiguity, they start to break down and need human intervention.

Recent studies back that up. In AbsenceBench, researchers found LLMs are much better at spotting inserted facts than noticing when something important is missing. They're built to predict what should come next, not question what isn't there. They work best in environments where the rules are clear. But the world doesn't operate like that.

Even the team behind Synthetic Users admits this. In their own comparison study, they found simulated responses often lack depth unless you ask for it directly. Personal stories, emotional nuance, contradiction: all the things that tend to unlock real product insights have to be forced into the conversation.

They claim 85 to 92 percent fidelity when comparing synthetic responses to real interviews. But if you read closely, that number comes from structural and thematic overlap. Not emotional realism. Not subtext. Not the awkward silence that happens when a real person gets asked a hard question.

Humans are still better at handling complexity. We pick up on tone, timing, hesitation. We notice when something feels off, even if we can't explain why. As I previously wrote, LLMs are flooding the market with cheap competence. But what they are not replacing is our ability to move through layered, context-dependent situations and come out the other side with insight.

Simulated user feedback might help you get started. It's fast, cheap, and sometimes good enough to move things forward. But it's still a mirror. It reflects what's already been said, not what's been left unsaid.

And that's where the truth usually lives.

In my experience, observational research is still the most reliable way to find early signals. Watching real users interact with your product or service tells you more than a dozen simulated interviews ever will. Start with five people. That is usually all it takes to surface patterns and unexpected truths.

I can see specific use cases for synthetic people, especially when you are testing a strict set of conditions or validating edge-case logic. But those are the exception, not the rule.


Who comes to mind when you read this post?

Send it their way. It might say what they’ve been needing to hear.




Follow Eric: X | Farcaster | SuperRare
