
LLMs: Powerful, But Not Magic

I can't imagine there is anyone in the tech world who hasn't heard of ChatGPT at this point. "AI" and "LLMs" are buzzwords that have reached everyone, if only through shared stories of their use gone wrong. These are new and powerful technologies, but most people would benefit from understanding a little more about how they work in order to be successful with them.

We aren't yet at the point where people can use LLMs like they use a car. There was a point in time when everyone needed to be a mechanic in order to drive a car. Now, you jump in and it just goes. Maybe you take it for maintenance periodically. LLMs are not there yet.

With that in mind, the one thing to understand in order to be successful is that they aren't intelligence machines so much as consensus machines. By "consensus machines," I mean that LLMs produce responses based on the most common or likely answer found in their training data, rather than through genuine understanding or reasoning. They have access to a vast amount of knowledge, but they aren't "smart," per se.

The flair with which LLM use case examples are delivered often obscures the reality for those new to the technology, leading to misunderstandings about its potential. A lot of the focus on artificial general intelligence (AGI), and on the magical scenarios where LLMs seem to do amazing things, has mismanaged people's expectations about how these tools work and what they can do. Quite literally, they are probabilistic prediction machines: given a query or prompt, they guess what the next word in the response should be. Sometimes this does feel like magic. But sometimes it produces unexpected results. As reported by The Verge, Google's AI recommended using glue to make toppings adhere to pizza. In another instance, a customer service email bot apparently started rickrolling people. The rickroll is an internet cultural touchstone, so I expect that, in its predictive capacity, the LLM determined from its training data that a link to a Rick Astley video was a highly likely response in many scenarios.
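To make the "next word prediction" idea concrete, here is a toy sketch using a bigram model. The tiny corpus and all of its sentences are made up for illustration; a real LLM learns probabilities with a neural network over vast amounts of text, but the core idea of sampling the likeliest continuation is the same.

```python
import random
from collections import Counter

# Toy "training data" — invented sentences, purely for illustration.
corpus = (
    "the cat sat on the mat . "
    "the cat ate the fish . "
    "the dog sat on the rug ."
).split()

# Count which word follows each word (a bigram model — a vastly
# simplified stand-in for an LLM's learned next-token probabilities).
bigrams = {}
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams.setdefault(prev, Counter())[nxt] += 1

def next_word(prev, rng=random):
    """Sample the next word in proportion to how often it followed
    `prev` in the training data — the "consensus" answer wins most often."""
    counts = bigrams[prev]
    words, weights = zip(*counts.items())
    return rng.choices(words, weights=weights)[0]

# In this corpus "the" is followed by "cat" twice and by "mat",
# "fish", "dog", and "rug" once each, so "cat" is the consensus guess.
print(bigrams["the"].most_common(1))  # [('cat', 2)]
print(next_word("the"))  # usually "cat", sometimes another follower
```

Note that `next_word` samples rather than always picking the top choice: the most common continuation is most likely, but unlikely ones still appear occasionally, which is one intuition for why LLM output is usually sensible yet sometimes surprising.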

The takeaway here is that if you are using an unmodified generalist LLM (ChatGPT, Claude, Perplexity, etc.) you're going to be the most successful in scenarios where you can assess the quality of the output. If you want to experiment with recipes, that's great. You're going to try the recipe to see what it's like. If you want it to write some code, wonderful, you're going to run the code and see if it works. But I wouldn't recommend, for example, you generate a legal document replete with citations if you can't or won't check the validity of the document, as cautioned by Simon Willison in his analysis of a ChatGPT-generated legal brief.

I do believe these tools can be borderline magical in particular ways when used thoughtfully. However, it's crucial to understand both their benefits and shortcomings. This understanding can significantly affect your outcomes when using LLMs. By recognizing their nature as consensus machines rather than true intelligence, users can set realistic expectations and leverage these powerful tools more effectively in appropriate contexts.
