The other day I was scrolling through my YouTube feed and stumbled across a Steve Jobs talk I hadn’t seen before: “The Objects Of Our Life”, a talk given at the 1983 International Design Conference in Aspen. I believe the Steve Jobs Archive only released it this past July.
I was expecting more or less the same platitudes that have become familiar in these Steve Jobs videos. But I was struck by how prescient and truly visionary his thoughts were – this was in 1983, a time when the concept of personal computers was still foreign to most people. Apple Computer had just launched the Lisa computer, one of the first commercially available machines with a mouse and a graphical user interface.
Steve discusses a range of topics, from the personal computer revolution to software distribution to reasons for the low employee churn at Apple. I found his thoughts on the history and future of computing particularly relevant today. It's amazing to me that Steve's ultimate vision of "computers understanding language" is finally here, 40 years later. I wonder how he would have reacted to the OpenAI ChatGPT advanced voice mode demo – powered by low-latency speech-to-text, text-to-speech, and large language models.
Voice recognition is going to be the better part of a decade away. We can do toy voice recognition now. The problem is, it isn’t just recognizing the voice; it’s understanding language, which is much harder than recognizing the voice. We can sort out the words, but what do they all mean?
Most language is exceptionally contextually driven. One word means something in this context and something entirely different in another context. When you’re talking to someone, people interact. It’s not a one-way communication like “yep yep yep yep”; they gracefully interact, they go in and out of levels of detail. Boy, this stuff’s hard. So, I think you’re really looking at the better part of a decade before we get even close to that.
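Forty years later, that stack fits in a dozen lines. Here is a minimal sketch of the naive speech-to-speech loop, assuming OpenAI's Python SDK; the file names are hypothetical, and the real advanced voice mode streams audio end-to-end for low latency rather than batching three separate calls like this.

```python
# A naive speech -> language -> speech loop: the three stages Jobs is describing.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# 1. Speech to text: "we can sort out the words".
with open("question.wav", "rb") as audio:  # hypothetical recording
    words = client.audio.transcriptions.create(model="whisper-1", file=audio).text

# 2. Language understanding: "but what do they all mean?" (the hard part in 1983).
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": words}],
).choices[0].message.content

# 3. Text to speech: say the answer back.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
with open("answer.mp3", "wb") as out:
    out.write(speech.content)
```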
I took the YouTube video and fed it into GPT-4o to get an edited transcript, which I published in full here if you want to read it.
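For the curious, the gist of the pipeline is below. This is a rough sketch, assuming the youtube-transcript-api package for pulling the auto-generated captions and OpenAI's Python SDK for the cleanup pass; the video ID and editing prompt are placeholders, and in practice the transcript needs to be chunked to fit the context window.

```python
# Pull YouTube's auto-generated captions, then have GPT-4o edit them into clean prose.
from openai import OpenAI
from youtube_transcript_api import YouTubeTranscriptApi

VIDEO_ID = "..."  # placeholder: the talk's YouTube video ID

# Raw captions arrive as {"text", "start", "duration"} snippets (v0.x API).
snippets = YouTubeTranscriptApi.get_transcript(VIDEO_ID)
raw = " ".join(s["text"] for s in snippets)

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
edited = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Lightly edit this talk transcript: fix "
            "punctuation and obvious mis-transcriptions, keep the speaker's words."},
        {"role": "user", "content": raw},  # in practice, chunked to fit the context window
    ],
).choices[0].message.content
print(edited)
```

Below are my favorite excerpts: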
On the computer generation and predicting that the computer will take over the TV:
A lot of you are products of the television generation. I’m pretty much a product of the television generation, but I’m also starting to be a product of the computer generation. And the kids growing up now are definitely products of the computer generation. In their lifetimes, the computer will become the predominant medium of communication, just as television took over from radio, and radio took over from the book.
computer generation.. iphone generation.. social media generation.. chatgpt generation.. what's next?
Computers feel like magic because they can do simple things very fast
The analogy he gives of running outside to get flowers is a great example of how he was able to communicate complex ideas in a way everyone can understand.
Computers are really dumb. They are exceptionally simple, but they’re really fast. The raw instructions we feed these little microprocessors—even the instructions for giant Cray-1 supercomputers—are the most trivial things. They are things like: “Get some data from here,” “Fetch a number from there,” “Add two numbers together,” “Test to see if it’s bigger than zero,” and “Put it over there.” It’s the most mundane thing you could imagine.
But the key thing is that computers can execute these instructions extremely fast. Let’s say I could move 100 times faster than anyone in here. In the blink of an eye, I could run outside, grab a bouquet of fresh spring flowers, run back in, and snap my fingers. You’d all think I was a magician, but I’d just be doing a series of really simple actions—running out, grabbing the flowers, running back, snapping my fingers. The only difference is, I’d be doing them so fast that it would seem magical.
It’s exactly the same with a computer. It can grab numbers, add them together, and throw them around at a rate of about a million instructions per second. So, we tend to think something magical is going on, but really it’s just a series of simple instructions executed at incredible speed.
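To make the point concrete, here is a toy sketch of my own (not anything from the talk): a little machine whose entire vocabulary is the five trivial instructions Jobs lists.

```python
# A toy machine whose entire vocabulary is Jobs' five "trivial" instructions.
memory = {"here": 2, "there": 3, "result": 0}

program = [
    ("fetch", "here", "a"),      # "Get some data from here"
    ("fetch", "there", "b"),     # "Fetch a number from there"
    ("add", "a", "b", "sum"),    # "Add two numbers together"
    ("test", "sum"),             # "Test to see if it's bigger than zero"
    ("store", "sum", "result"),  # "Put it over there"
]

registers = {}
for op, *args in program:
    if op == "fetch":
        registers[args[1]] = memory[args[0]]
    elif op == "add":
        registers[args[2]] = registers[args[0]] + registers[args[1]]
    elif op == "test":
        registers["flag"] = registers[args[0]] > 0
    elif op == "store":
        memory[args[1]] = registers[args[0]]

print(memory["result"])  # 5 -- mundane, until you execute a million of these per second
```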
Progression of human-computer interaction = higher and higher levels of abstraction
What we do is take these very simple instructions and, by building collections of them, create higher-level instructions. Instead of saying, “Turn right. Left foot. Right foot. Extend hand. Grab flowers. Run back,” we can say, “Could you go get some flowers? Could you pour a cup of coffee?”
We have started in the last 20 years to deal with computers in higher and higher levels of abstraction. But ultimately these levels of abstraction get translated down into these stupid instructions that run really fast.
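That layering maps directly onto code: a higher-level instruction is just a name for a bundle of lower-level ones. A small illustration of my own:

```python
# Low-level "instructions": each one is trivial on its own.
def turn(direction): print(f"turn {direction}")
def step(foot): print(f"{foot} foot forward")
def extend_hand(): print("extend hand")
def grab(item): print(f"grab {item}")

# A higher-level instruction is just a named collection of lower-level ones.
def go_get(item):
    turn("right")
    for foot in ("left", "right"):
        step(foot)
    extend_hand()
    grab(item)

# At the top of the stack we speak almost in natural language...
go_get("flowers")
# ...but it all still translates down into the simple instructions underneath.
```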
Command line -> GUI with mouse and keyboard -> touch screens
Higher levels of abstraction today:
Natural conversational interfaces.
Spatial hand gestures with the Vision Pro.
Short aside, but this excerpt from Bret Taylor's recent podcast with Patrick O'Shaughnessy captures why conversational interfaces are so exciting as the next level of abstraction in HCI:
That demo of GPT-4o was so remarkable. I had so many people text me about it, talking about just how emotional the experience of watching it was.
What I love about it is it's the way a science fiction author would describe how you should interact with a computer. We started out with punch cards, then we had mice and keyboards, now we have touch screens, which are, I feel, slightly more haptic and slightly more natural.
Every time you watch a science fiction film, you're just having a conversation, right? And it was probably the first time for a lot of people where it really felt right. It didn't have that uncanny valley feeling of I'm talking to a robot. And I think that's really exciting.
And so obviously, multimodal models to generate images and video are quite exciting, like Sora. But I think that particular thing, where the interface to computers goes away and we can simply have a conversation to interact with software and with this digital world, is so powerful.
I think the accessibility of software in this world of conversational AI is just such a tremendous breakthrough. It's just the correct way we should interact with software, and no one needs an instruction manual on how to have a conversation.
Call to action to get more designers working on computers instead of the cars/auto industry of the '80s
Hindsight is always 20/20, and it's wild to hear the "prediction" that people will be spending 2-3 hours a day on computers... "more than they spend driving".
Let me digress for a minute. One of the reasons I’m here is because I need your help. If you’ve looked at computers, they look like garbage. All the great product designers are off designing automobiles or buildings, but hardly any of them are designing computers.
Now, if we take a look, we’re going to sell 3 million computers this year. We’ll sell 10 million computers in ’86, whether they look like a piece of junk or they look great. It doesn’t really matter, because people are going to suck this stuff up so fast they’ll buy it no matter what it looks like. And here’s the thing: it doesn’t cost any more to make it look great.
These computers will be new objects in everyone’s working, educational, and home environments. We have a shot at putting a great object there—or, if we don’t, we’ll end up with just another piece of junk. By ’86 or ’87, pick a year, people will be spending more time interacting with these machines than they do with their cars today. They’ll be spending two or three hours a day, sometimes longer, using these machines—more time than they spend driving.
So, the industrial design, the software design, and how people interact with these machines must be given at least the same consideration we give to automobiles today, if not more. If you take a look, most automobiles, televisions, audio electronics, watches, cameras, bicycles, calculators—almost all the objects in our lives—aren’t being designed in America anymore. Europe and Japan have taken over those markets.
The history of communication medium shifts
New technologies enable new communication mediums (printing press = books, telegraph = 1:1 long-distance communication, radio = 1:many local communication). Each medium has new and unique opportunities - it's all about finding these.
This fits the tech/social media thesis that you need fundamental new technology shifts for new forms of communication: SMS (160 chars) -> Twitter (140 chars), mobile cameras + 3G -> Instagram and Snapchat, better phones + 4G -> Musical.ly/TikTok. Underlying all of these is the unique capability of the infinite personalized feed (which I wrote briefly about here).
Okay, let’s go back to this revolution. What’s happening? What’s happening is the personal computer is a new medium of communication—one of the media. So, what’s a medium? It’s a technology of communication. A book is a medium, as are the telephone, radio, and television. These are all mediums of communication, and each medium has pitfalls, shortcomings, and boundaries you can’t cross. But each also has new and unique opportunities.
The interesting thing is that each medium shapes not only the communication that goes through it but also the process of communication itself. A perfect example is comparing the telephone to what we’re seeing now in electronic mail. With email, we link computers together and can send messages to an electronic mailbox, which people can retrieve at their leisure. While we’re still sending information through wires, the process is fundamentally different.
Breaking out of old habits and leveraging the unique capabilities of new technology mediums
I love this line in particular: "We’re starting to break out of old habits, and it’s really exciting."
What I’m claiming is that computers are a medium, and personal computers are a new and different medium from large computers. When a new medium enters the scene, we tend to fall back into old media habits. Let’s look at a few transitions from one medium to another: radio to television, and television to the interactive medium of the video disc.
Now, let’s look at the next transition: the optical video disc, which can store 55,000 images on a side or an hour of video, randomly accessible. But what are we using it for? Movies—we’re falling back into old media habits. Still, there are some experiments happening, and you can believe that in five to ten years, it will come into its own. For example, MIT did an experiment here in Aspen, where they photographed every street and intersection in town. Using a video disc and a computer, you could navigate the streets virtually and even see the town in different seasons. While not incredibly useful, it points to the interactive potential of this new medium, which is just starting to break out from traditional uses like movies.
Let’s go back to computers. Right now, we’re in the “I Love Lucy” stage of our medium development. When microcomputers and personal computers first came on the scene, we fell back into old habits. We ran strange languages like COBOL and used them for business accounting—things we’ve been doing on computers historically. It took us about four years to start breaking out of that mindset, and we’re just starting now.
Look at Lisa. Lisa enables someone like me, who isn’t an artist, to sit down and draw artistic pictures using a program called LisaDraw. I can erase, move, shrink, grow, and change textures with ease. I can even use an airbrush effect, making areas darker the more I scrub. With no drawing talent, I can create neat drawings, combine them with words in documents, and send them electronically. We’re starting to break out of old habits, and it’s really exciting.
Computers enable thousands of individual experiences, all based on one set of underlying principles
Would young Steve Jobs have used Character AI to talk to Aristotle?
When I was going to school, I had a few great teachers and a lot of mediocre ones. The thing that probably kept me out of trouble was books. I could read what Aristotle wrote, or what Plato wrote, without needing an intermediary in the way. A book is a phenomenal thing—it gets right from the source to the destination without anything in the middle.
The problem is, you can’t ask Aristotle a question. I think as we look toward the next 50 to 100 years, if we can really come up with machines that capture an underlying spirit, or an underlying set of principles, or an underlying way of looking at the world, then when the next Aristotle comes along, maybe, if he carries one of these machines around his whole life, typing all his ideas into it, then maybe someday, after he’s gone, we can ask the machine: “What would Aristotle have said about this?”
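Today's crude approximation of that machine is a persona prompt over a large language model (Character AI is essentially this at scale). A hypothetical sketch using OpenAI's SDK, with the caveat that genuinely capturing a thinker's "underlying spirit" would need their corpus and retrieval on top, not just a prompt:

```python
# A crude "ask Aristotle": a persona prompt, not the captured spirit Jobs imagined.
from openai import OpenAI

client = OpenAI()
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "Answer as Aristotle would, reasoning from his surviving works. "
                "If his writings don't address the question, say so."
            ),
        },
        {"role": "user", "content": "What would you make of machines that converse?"},
    ],
)
print(answer.choices[0].message.content)
```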
Injecting some liberal arts into these computers
"People do judge a book by its cover" - Mike Markkula
Every computer to date has used a weird type on the screen. As you know, the I’s are just as wide as the W’s—they’re non-proportionally spaced fonts, as we call them. It’s really been impossible to use multiple fonts on the screen at any given time. As a matter of fact, the fonts have been just garbage, and it’s been really impossible to embed any kind of graphics with text.
If you take a look at Lisa, it is totally proportionally spaced text. We have 30 or 40 fonts that come out at approximately 80 dots per inch on the screen, and up to approximately 300 dots per inch on a laser printer. And that’s where we are today. What you’re saying is we really want to go to 600 or 800 dots per inch on a laser film printer.
We’re not there yet. But we’re solving the problems of injecting some liberal arts into these computers. That’s what we’re trying to do right now: let’s get proportionally spaced fonts in there, let’s get multiple fonts in there, let’s get graphics in there so we can deal in pictures.
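Two pieces of arithmetic hide in that passage: proportional spacing means each glyph carries its own advance width, and the same 12-point glyph covers very different numbers of dots at 80 versus 300 dpi. A toy illustration of my own, with made-up glyph widths:

```python
# Proportional vs. monospaced: line width depends on which glyphs you use.
advance = {"i": 3, "l": 3, "W": 12, "M": 12, " ": 4}  # made-up widths, in dots

def width(text, proportional=True):
    # Monospaced type gives every glyph the widest advance ("the I's are just
    # as wide as the W's"); proportional type sums per-glyph widths.
    per_glyph = advance if proportional else {g: max(advance.values()) for g in advance}
    return sum(per_glyph[g] for g in text)

print(width("ill"), width("ill", proportional=False))  # 9 vs. 36 dots

# The same 12-point glyph at screen vs. laser-printer resolution (1 pt = 1/72 inch):
for dpi in (80, 300):
    print(f"{dpi} dpi: {round(12 / 72 * dpi)} dots tall")
```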
On putting something back into the pool of human experience
Someone asked how Apple has been able to have such low employee churn:
We feel that for some crazy reason, we’re in the right place at the right time to put something back. What I mean by that is, most of us didn’t make the clothes we’re wearing, we didn’t cook or grow the food that we eat, we’re speaking a language that was developed by other people, and we use mathematics that was developed by other people. We’re constantly taking, and the ability to put something back into that pool of human experience is extremely neat.
"The day someone at Apple decides they can't make a difference anymore is the day we've lost them"
At Apple, when you get hired, some people survive and some don’t. But in general, it’s, “Hey, this is the general thing we think we need done. Go figure out what we need, come back and tell us how much it’s going to cost, and then go do it.”
So we’ve got an incredible group of entrepreneurs, and we’re always arguing with each other, but that’s just fine. Out of the 5,000 people we have, most are very independent thinkers. What they really want is the environment where they don’t have to convince 30 other people that it’s the right thing to do. Does that make sense?
It gets harder as we get older, and it’s harder to spend time with everyone and pull everyone together. But we make an attempt to do that. Our feeling is that the day someone at Apple decides they can’t make a difference anymore is the day we’ve lost them.
I highly recommend spending an hour to listen to the whole talk. I published my GPT-4o-edited transcript in full here if you want to read it.