How will the internet evolve in the coming decades?
Fiction writers have explored some possibilities.
In his 2019 novel “Fall,” science fiction author Neal Stephenson imagined a near future in which the internet still exists. But it has become so polluted with misinformation, disinformation and advertising that it is largely unusable.
Characters in Stephenson’s novel deal with this problem by subscribing to “edit streams” – human-selected news and information that can be considered trustworthy.
The drawback is that only the wealthy can afford such bespoke services, leaving most of humanity to consume low-quality, uncurated online content.
To some extent, this has already happened: Many news organizations, such as The New York Times and The Wall Street Journal, have placed their curated content behind paywalls. Meanwhile, misinformation festers on social media platforms like X and TikTok.
Stephenson’s record as a prognosticator has been impressive – he anticipated the metaverse in his 1992 novel “Snow Crash,” and a key plot element of his 1995 novel “The Diamond Age” is an interactive primer that functions much like a chatbot.
On the surface, chatbots seem to provide a solution to the misinformation epidemic. By dispensing factual content, chatbots could supply alternative sources of high-quality information that aren’t cordoned off by paywalls.
Ironically, however, the output of these chatbots may represent the greatest danger to the future of the web – one that was hinted at decades earlier by Argentine writer Jorge Luis Borges.
The rise of the chatbots
Today, a significant fraction of the internet still consists of factual and ostensibly truthful content, such as articles and books that have been peer-reviewed, fact-checked or vetted in some way.
The developers of large language models, or LLMs – the engines that power bots like ChatGPT, Copilot and Gemini – have taken advantage of this resource.
To perform their magic, however, these models must ingest immense quantities of high-quality text for training purposes. A vast amount of verbiage has already been scraped from online sources and fed to the fledgling LLMs.
The problem is that the web, enormous as it is, is a finite resource. High-quality text that hasn’t already been strip-mined is becoming scarce, leading to what The New York Times called an “emerging crisis in content.”
This has forced companies like OpenAI to enter into agreements with publishers to obtain even more raw material for their ravenous bots. But according to one prediction, a shortage of additional high-quality training data may strike as early as 2026.
As the output of chatbots ends up online, these second-generation texts – complete with made-up information called “hallucinations,” as well as outright errors, such as suggestions to put glue on your pizza – will further pollute the web.
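To see why this feedback loop matters, consider a minimal back-of-the-envelope sketch – a toy Python simulation with invented numbers, not figures from this article – of how the machine-generated share of a training corpus could compound if each model generation publishes text that the next generation then scrapes and trains on:

```python
# Toy simulation (illustrative assumptions only): the stock of human-written
# text is fixed, while each model generation adds more machine-generated text
# to the web, so the synthetic share of any future training corpus grows.

human_text = 1.0             # normalized stock of human-written text
synthetic_text = 0.0         # machine-generated text accumulated online
output_per_generation = 0.5  # assumed volume each model generation publishes

for generation in range(1, 6):
    corpus = human_text + synthetic_text
    synthetic_share = synthetic_text / corpus
    print(f"Generation {generation}: "
          f"{synthetic_share:.0%} of the scraped corpus is machine-generated")
    # The new generation's output ends up back on the web.
    synthetic_text += output_per_generation
```

Under these made-up assumptions, the synthetic share climbs from 0% to two-thirds within five generations – a crude illustration of why researchers worry about models increasingly training on their own errors.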
And if a chatbot hangs out with the wrong sort of people…