
Language Is Having a Moment

Considering language, English dominance and networked effects

Let us start with some facts about language before veering into speculation and analysis:

The global population is 8 billion.

67% of it is online. There are roughly 400 million native English speakers in the world, but English now has 1.5 billion speakers. That means about 18.8% of the world speaks English, and since native speakers account for only about 5%, roughly 13.8% of the world speaks it as a second language.

No other language has a similar usage profile. Mandarin has 1.1 billion speakers, but only a tiny share of them speak it as a second language. Hindi has 600 million speakers and Spanish 550 million. Rounding out the top six are French and Arabic, each with 275 million. To put this in context, there are approximately 7,000 living languages in the world, and roughly half of the global population speaks just 6 of them.
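For readers who want to check the arithmetic, here is a minimal back-of-envelope sketch in Python. It uses only the figures quoted above (an 8 billion world population and the rounded speaker counts) and assumes they are directly comparable, which the sources do not guarantee. Summing speaker counts double counts anyone who speaks more than one of these languages, so the six-language total is an upper bound rather than an exact share.

```python
# Back-of-envelope shares using only the rounded figures quoted in this essay.
WORLD_POPULATION = 8_000_000_000
NATIVE_ENGLISH = 400_000_000

# Total speakers (native plus second language) for the top six languages.
speakers = {
    "English": 1_500_000_000,
    "Mandarin": 1_100_000_000,
    "Hindi": 600_000_000,
    "Spanish": 550_000_000,
    "French": 275_000_000,
    "Arabic": 275_000_000,
}


def share(count: int) -> float:
    """Return a head count as a percentage of the world population."""
    return 100 * count / WORLD_POPULATION


english_total = share(speakers["English"])         # 18.75
english_native = share(NATIVE_ENGLISH)              # 5.0
english_second = english_total - english_native     # 13.75

# Anyone who speaks two of these languages is counted twice,
# so this is an upper bound on the share of people covered.
top_six_upper_bound = share(sum(speakers.values()))  # 53.75

print(f"English, any level: {english_total:.2f}%")
print(f"English, native: {english_native:.2f}%")
print(f"English, second language: {english_second:.2f}%")
print(f"Top six languages (upper bound): {top_six_upper_bound:.2f}%")
```

The second-language share is simply the total English share minus the native share, which is where the 13.8% figure comes from. The top-six sum of at most 53.75% is consistent with the "roughly half the world" claim once overlap between speakers is taken into account.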

What is even more interesting is that the growth rate of English has accelerated dramatically since 2017.

When I first started looking into this, I assumed that the rise of English would track the growth of global internet usage. Between 55% and 60% of all content online is in English, a share that has been steady for years. The chart below tells a different story. A steady 14-15% of the world population spoke English between 1995 and 2016; only from 2017 onwards do we see rapid growth in English language adoption. English and the internet have not moved in lockstep. Something else is at work here.

The reason for this growth is not entirely clear. The first problem we encounter is the paucity of reporting and statistics on this trend. The data in this article is cobbled together from several sources. All the statistics I've seen on language growth come from a paywalled reference publication, Ethnologue, and I got the data by watching this YouTube video. The numbers match the stats I've seen cited by the CIA World Factbook and Encyclopedia Britannica, but details on methodology and proficiency levels are lacking.

The second problem is that there's little to no widely available analysis of these trends beyond financial reporting on the growth of the English Language Training market. Despite that, all signs point to English increasing its market share and cementing its status as the global language of business, culture and the web.

What the heck is going on?

Had this surge in English happened in 2020, it would have been all too easy to point to Covid as the reason. 2017, on the other hand, offers no clear catalyst, but the trend has held strong for six years now, which indicates it was not an anomaly.

Language apps are certainly helping, but let's take a look at the best-known one in the world, Duolingo, which reported quarterly daily active users (DAUs) in this blog post:

These numbers are impressive, but Duolingo teaches many languages and is quite popular in the US, which has the largest native English-speaking population, so its growth can't be read as a direct measure of English learning.

Reports say that most growth in English learning is in APAC, with the two largest populations in the world (India and China) making up the bulk of it. This isn't, however, a case of two markets carrying an entire global trend. Statistics here are dated, but India is estimated to have 125 million English speakers and China 10 million, together well under a tenth of the 1.5 billion total. It's possible that growth is coming primarily from these places and there is a reporting gap, but it's far more likely that English use is growing everywhere as a result of globalization.

In short, there are no good answers (at least none that I can provide) for the rise of English. We have to look at macro trends at all levels and point to a variety of effects. First, there's an increasing concentration around the large languages as a result of many compounding effects, from the legacy of colonization to urbanization to economic opportunity favoring scale and conformity. We are losing linguistic diversity across the board. Second, we need to look at globalization and the effects of corporations adopting standard languages that are required for employment. Third, the internet has to have an effect here, especially in creating access to learning. Duolingo is just one of many language learning tools out there.

Still, this isn't a particularly satisfying answer. If anyone out there can point to specific triggers for the recent surge in English, please share. This feels like too big a trend to leave to fuzzy reasons and unclear causation.

Let's Get Speculative

In under 30 years, we've put 5.4 billion people online. That's double the population of the entire world in 1954, a mere 70 years ago. At least one billion of them (but likely more) speak English, and the rate of English adoption has started accelerating in the last six years.

These numbers are staggering on so many levels, and yet we never stop to question what the second-order effects of this transition are. We either assume that things behave the same as always (government, institutions, values) or view the internet at surface level, never pondering how system dynamics change as languages reach billion-plus speaker levels of penetration.

This leads me to the title of this essay: is language having a moment?

It's now time to explore whether the scale and speed of this dominant, English-speaking, networked communication layer is having a bigger impact than is readily apparent. We'll begin by looking backwards:

A broad arc of economic alpha over the last century looks like this:

Resources -> Manufacturing -> Management -> Finance/Technology -> Information

By this I mean that the greatest opportunity for growth has shifted from one area of focus to another as each of the preceding areas developed and matured. Back in 1920, what mattered was pulling things out of the ground. At a certain point, transforming those resources into finished goods took prominence. Efficiency in both extraction and production then took center stage as management streamlined operations, until it was gated by pencil, paper and punch cards. That led to a push in the technology and finance arenas, creating global networks that connected us and made it easier to move capital worldwide. We literally opened the world to same-day interoperability on an unimagined scale. With that accomplished, data became the priority as we filled databases with numbers and then made them talk to each other. The trends and relationships within that data then informed businesses to a degree they had previously lacked.

What we see in this evolution is a pattern of increasing complexity that requires higher and higher degrees of abstraction. We package components into concepts, rolling things up into ever larger assemblages which we can wield and manipulate.

Ask yourself: would you rather own a coal mine or a SaaS provider? The answer depends entirely on whether you lived in 1920 or 2020.

Now ask yourself: would you rather own a SaaS provider or a Large Language Model? Four years ago, one was a cash-flow-generating machine and the other was an unproven, high-cost research project. Now AI is poised to render specialty software obsolete with its switchblade capabilities.

Information remains as important as ever, but is it the alpha that it once was or has complexity increased to the point that a higher level of abstraction is ready to take center stage?

Here is where we leap into the truly speculative and suggest that language is in the process of supplanting information as the new source of alpha. None of this is binary. Information remains important. So do resources, manufacturing, tech and finance; however, all of them are maturing systems working in concert with each other in a feedback loop known as the global economy. The market has absorbed all the low-hanging fruit it can get from the last unlock and needs to add a new component at the edge to push itself further.

Everyone seems to think that is AI, but what if I argued that it is actually language, or, more precisely, that language forms the top and bottom with AI sandwiched in between? Prior to LLMs, we had machine learning. ML was about forming predictions from past data and steering systems into beneficial behaviors. It was a triumph of information, but not of language. As such, the use cases for ML were limited to the corporate and institutional sectors. LLMs are far more democratizing because they form associations across language and act as text prediction engines on steroids whose outputs go beyond text and into media and software. Anyone from a kid wanting to make a cartoon of themselves to a Fortune 500 company can benefit from LLMs.

Not only that, but the interface for LLMs is dead simple. It is text, asking for language as the starting point in a conversation that ultimately ends in the transformation of huge amounts of information into an end product created on demand for a user who simply asked for it.

Accessibility and language go hand in hand. You cannot have one without the other once you ascend Maslow's hierarchy of needs past the very basics. As powerful and transformative as AI will be, the ubiquity and versatility of language may make it the real star of our next step forward.

The Case For Language

We are going to start this section with the structuralist argument that language defines what we are capable of thinking about. If there are no words to describe a thing or an experience, then one human cannot communicate the concept to another. A thing must be named, and that name must be disseminated through broad communication between people, or else it is not useful because we can't place it in our mind's eye.

Empiricism and the start of the scientific revolution are examples where the formation of language was just as important as the underlying work being performed. The periodic table, taxonomy, anatomy, physics and mathematics all required building on existing language in order for their respective disciplines to advance. Without new words or expanded definitions of existing words, there was only superstition or the will of God to explain the natural world.

Let's get more modern. Router, Wi-Fi, USB cable, 5G, virtual reality, mobile app, cryptocurrency, play2earn, inference model, eigenvectors... our lives are dominated by techno jargon. We must constantly update our language in order to understand and interact with the proliferation of tools that we are creating at an ever-accelerating pace.

The advantage of language is that it is plastic, used every day and essential. By plastic, I mean language is self-organizing and bottom-up. There is no committee that decides whether or not something is a word. It is decided by social contract in ever-broadening circles of communication between peoples. If a new word is useful, it is adopted. Once it reaches a critical mass of importance and distribution, a committee does sit down and include it in a dictionary, but that is a post-facto action, not an initiating one.

We live in an age of enormity. There are three times as many of us as there were when my father was born. Not only that, but we now live beyond the physical and extend into digital worlds and networks that are conceptually boundless. There are simply more things to talk about and give our attention to, which means language matters more than ever because of the competition for our mindshare. We only think about things we can understand and communicate to others.

The argument here is that language sits on top of all other advancements. It is the necessary precursor for the establishment of new ideas and technology. Unlike other fields, language has the advantage of speed and malleability. It is universal and in constant use, and has continued to show it is more than capable of keeping up with what comes next.

What Does Come Next? Looking At The Weird Today To Talk Tomorrow

Over half of my son's TV watching is not TV. If it weren't for our parental intervention, I suspect all of his consumption would be watching and listening to other people playing video games on YouTube or Twitch.

Streamers are a relatively new global phenomenon. The format is bottom-up, radically inclusive (in terms of access, not culture) and perhaps the ultimate abstraction to also double as an industry. We are talking about watching someone else navigate a virtual world. The story here is not the story of the game, but rather someone else's attempt to beat it. In under twenty-five years, streaming and influencing have gone from nonexistent to a completely normalized and heavily aspirational activity and industry for younger people.

It's also language all the way down. Even the games themselves are nothing but a collection of language made to behave in arbitrary ways. Perhaps, however, we don't consider streaming all that strange because it still sits squarely in the realm of ad-supported entertainment.

Let's switch gears and talk about memes, cryptocurrency and their intersection, memecoins. Crypto has a language problem that is worse than that of many fields of frontier tech. It is dominated by STEM-trained males who believe that code is law, that mechanism design matters more than usability, and that economic incentives (giving away tokens) are the primary, and sometimes only, means of marketing.

Some of this mindset is justified by the nascent state of the industry. Many of the protocols developed in the space are bleeding-edge concepts that have never been attempted before and whose utility is unproven. There's also an incentive loop that justifies skipping marketing and communications. Crypto can create value unlike any other field of finance, and early adopters are heavily rewarded for simply being early and participating. While the barrier to entry is high, the payoff is worth the work. One might not understand just what one is getting into, but more often than not, jumping in with only a partial understanding has positive expected value (EV+).

Memes are a hack of language. Combining an image with a short burst of text creates an easily communicated concept capable of moving at digital speeds. It's possible that we are able to scale the internet as fast as we have simply because we've compressed many ideas that used to take a paragraph of text to communicate down into a viral semiotic shorthand. Memes are abstraction on steroids.

Memecoins, then, are a combination of crypto and memes in a doubly compressed format. All the complexity of crypto is reduced to owning a permissionless, tradeable asset whose value either goes up or down. A holder simply decides how much to buy and when to sell. On the flip side, memecoins are a way to make culture and language interoperable with markets in the most conceptually simple way. If memes are a compression of language into a snackable format, memecoins are a snackable, social way to participate in global finance.

There's been a lot of theorizing about the rise of memecoins. I have called them tokens of attention and a way for people to project shared beliefs by pooling capital. Others have called them a symbol of financial nihilism, an unregulated way to gamble and modern-day lottery tickets. All of these have elements of truth, but perhaps there's another element at work here: the abstractive power of language to steer capital, now democratized at fully networked scale.

Would the memecoin market be worth $53 billion if there weren't 5.5 billion people online? Would internet culture be able to accrue value at this scale if half the world didn't speak just 6 languages? Here's one very strange glimpse into why English adoption is accelerating. Memecoins are dead easy to understand, their concepts are overwhelmingly communicated in simple English, and they are the highest-performing category in crypto this year.

Toby Shorin wrote an important piece called Life After Lifestyle, which traces the evolution of brands as drop shipping, white labeling and ecommerce social distribution came to prominence in the last decade. In it, he argued that consumer goods became plastic enough (meaning you can put any label on nearly any product) that the underlying thing lost its meaning. In its place, brand association gained importance to the point that brands freed themselves from the product itself and were able to perpetuate themselves on meaning and values alone. In some ways, memecoins are the manifestation of that vision. There is no product; there is only the value attached to the values of an idea. Said another way, there is only language.

If we are trending in this direction, then generative AI is poised to take it even further. I spent the first couple of months of this year enamored with AI music, making five albums' worth of material under the Starholder name. All I needed was the ability to write lyrics, a deep working knowledge of the language of music and Suno.ai to talk to. That effort resulted in a three-way conversation between myself, the platform and the dozens of people who listened to what I produced. Through all of it, not a single instrument was picked up, which abstracted music up to the level of language.

There's a simplistic ubiquity surrounding generative AI. All models use text as their user interface. It doesn't matter if you are making images, articles, music or video; each modality just asks you to type. There's also a lack of precision in these models that rewards large descriptive vocabularies. The tech is young enough that each model has its own fussy levels of refinement and capabilities. They often converge on the mid, producing generic outputs, and getting them to break from that to create interesting work requires lulling them away from common terrain via conversation, the same way you need to coax your football-fanatic co-worker into talking about anything else.

Conclusion

In some ways, making the case for language is like arguing about the importance of water. It is essential and it is everywhere, but it's also so basic that it gets taken for granted and might not be all that actionable. On the flip side, something larger is at work here, and it might be too early to pin down exactly what that is.

What is clear is this:

  • Half the world speaks just 6 languages, with English scaling to 1.5 billion speakers

  • We are seeing people learn English at an accelerated rate, with a big pickup since 2017

  • The frontier of technology is operating in a conversational modality. We are past the era of code, math and numbers at the user interface level.

  • While hard science and technology development will remain in the realm of engineering and mathematics in the short term, access to those advances is being radically democratized via chat interfaces into artificial intelligence.

  • The world continues to get flatter and more connected, and it operates at an accelerated pace. Language adapts to these changes by concentrating speakers into fewer languages and by communicating at higher frequency, which makes it easier to create and disseminate updates to its corpus in a way that pushes other fields forward.

Because language is so big and so basic, I think its elevation in importance is being overlooked. There's some blindness at play here as well: we've always mentally separated the world into industrialized and developing regions. The value add is not yet coming from turning Madagascar into a chip manufacturing powerhouse; it's coming from integrating a broad, distributed base of very online people into an emerging digital economy, the kind of activity that has traditionally been underreported. No one writes articles about all the labor gig workers contribute; they write about the creator who outsourced and assembled that effort.

This will change in time. It will change when the economic activity around more weird, productless digital ideas reaches tipping points where capital takes notice. Today, venture capitalists and hedge funds are dipping their toes into memecoins because the returns are there. We are not far off from that extending into media. Influencers and streamers have already shown they can use the power of language to assemble audiences that rival broadcast networks; they are just doing it under the umbrella of tech platforms like YouTube, Instagram, TikTok and Twitch. There will come a time when independently produced entertainment is distributed over permissionless networks at a scale that forces Hollywood to stand up and take notice.

Our larger story already has a scripted ending. It's the arc promised by globalization, where economic development (AKA labor arbitrage) was an investment in the future of developing nations. If we reach a point where that optimism plays out and there are flat screen TVs and Hondas in every house in the world, then analysis will point to free trade agreements and western liberal democracy as the catalyst for transformation. It won't focus on the actions of each person who adapted their lives. It won't point to our horrific treatment of migrants, discouraging them from moving to another country. It will not point to language, nor to the self-organizing effects of the internet.

Under the surface, there will be another story. It will be one of scale, connection and language. More people communicating in common tongues on a shared public good, using more powerful tools to create new categories, scenes and even industries. This is where the opportunity is. It will be strange, seem niche, and move at uncomfortably fast speeds. Language will lead it, because we cannot have new things without words for them. Soon enough it will be words that produce objects, code and systems.

It remains to be seen how impactful AI is or what its end interfaces will look like, but the more it mirrors how we communicate today, the more we will integrate it into our lives. If current trends hold, we could see a quarter of the world speaking the same language. Combine those two things and the potential for growth is explosive.

Maybe in hindsight, the case for language will have been obvious. Only time will tell.
