Artificial intelligence has in recent years proved itself to be a quick study, although it is being educated in a manner that would shame the most brutal headmaster. Locked into airtight Borgesian libraries for months with no bathroom breaks or sleep, AIs are told not to emerge until they’ve finished a self-paced speed course in human culture. On the syllabus: a decent fraction of all the surviving text that we have ever produced.
When AIs surface from these epic study sessions, they possess astonishing new abilities. People with the most linguistically supple minds—hyperpolyglots—can reliably flip back and forth between a dozen languages; AIs can now translate between more than 100 in real time. They can churn out pastiche in a range of literary styles and write passable rhyming poetry. DeepMind’s Ithaca AI can glance at Greek letters etched into marble and guess the text that was chiseled off by vandals thousands of years ago.
The trouble is that, like other high-end human cultural products, good prose ranks among the most difficult things to produce in the known universe. It is not in infinite supply, and for AI, not any old text will do: Large language models trained on books are much better writers than those trained on huge batches of social-media posts. (It’s best not to think about one’s Twitter habit in this context.) When we calculate how many well-constructed sentences remain for AI to ingest, the numbers aren’t encouraging. A team of researchers led by Pablo Villalobos at Epoch AI recently predicted that programs such as the eerily impressive ChatGPT will run out of high-quality reading material by 2027. Without new text to train on, AI’s recent hot streak could come to a premature end.