Languaging the Future: A provisional check-list of obstacles

Andrew Joscelyne
4 min read · Feb 10, 2023
  • We are currently bewitched by the belief that “natural” language — a grammatical flow of intentional meaningfulness — can be extracted via statistical wizardry from cohorts of digital words. But it is we humans, of course, who are doing the understanding, not the machine…
  • We want to optimize languaging at scale for commercial, personal, and regulatory activities of all kinds. This entails building tools to identify, produce, correct, summarize, and translate written and spoken content across many languages on a global scale. But we know that current statistical approaches to automated talk and text lead to massive quality issues, confusion about content provenance, and far from global coverage of languages, given the lengthy production timelines.
  • We also want to rapidly introduce technical solutions for language disabilities everywhere, but we know that most of the world cannot access the reliable electricity or digital networks needed to drive such aids.
  • We want to accelerate language revitalization, often for tongues spoken in parts of the world where technology is a luxury. But speakers of these languages are typically wary of the negative influence of networked tech on the revitalization process (spying, loss of local control, exploitation of their patrimony, loss of traditions among younger generations, etc.);
  • We could benefit from deep-monitoring the health of languages worldwide in terms of their changing quotas of access, speaker populations, education, translation, and digitalization. But as yet there is no rigorous model for classifying language status beyond comparing speaker numbers.
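To make the bullet above concrete, the kind of multi-dimensional status model it calls for could be sketched as a toy composite score over exactly those quotas. Everything here — the weights, the indicator names, and the sample figures — is a made-up placeholder for illustration, not a proposed standard:

```python
# Toy "language vitality" score: a weighted average of normalized
# indicators in [0, 1], one per dimension named in the text.
# All weights and example values are hypothetical.

def vitality_score(indicators, weights=None):
    """Return a weighted average of the indicator values."""
    weights = weights or {
        "access": 0.2,          # share of speakers with digital access
        "population": 0.2,      # speaker numbers, normalized (e.g. log-scaled)
        "education": 0.2,       # presence of the language in schooling
        "translation": 0.2,     # share of key content translated in/out
        "digitalization": 0.2,  # availability of corpora, fonts, tools
    }
    return sum(weights[k] * indicators[k] for k in weights)

# Hypothetical example: a mid-sized language taught in schools
# but with weak digital tooling.
example = {"access": 0.6, "population": 0.4, "education": 0.7,
           "translation": 0.2, "digitalization": 0.1}
print(round(vitality_score(example), 2))  # → 0.4
```

A real classification would need defensible normalization and weighting for each dimension — which is precisely the missing “critical model” the bullet points to.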
  • We want to enhance language learning as a practical advantage for hundreds of millions (digital devices like phones are an obvious but insufficient tool for scaling up). Yet constantly improving automatic translation appears to offer a much quicker and cheaper technical pathway to cross-language understanding and communication than individual brain-to-brain learning.
  • We want to simplify and accelerate the actions and passions involved in everyday cross-media communication for millions of people by enabling them to creatively listen to books, talk to films, zap ideas into pictures or music — i.e. digitally “make things new”. Yet we know that the same technology will impinge on copyright and reduce the value of individual creative outputs;
  • We accept the fact that clashing populations led by religious/political power-seekers should be able to use their own languages, but find it hard to swallow the possibility that language sharing or mutual learning might be used to mitigate hostility and encourage better co-habitation;
  • We have nearly reached the launch stage for a brain-dump of human written expression. Large language models are attempting to scour all available written (and soon spoken) data from digital resources across all relevant languages. Yet new content is now being seen as variations on existing text, rendering new “knowledge” a mere remix of the past. And only a fraction of even the 1,000 most-used languages are so far covered by these digital projects. So what do we know about the real extent of possible knowledge?
  • Language applications such as chat-search, media content generation, and more are constantly improving in various ways. Yet we all know they focus overwhelmingly on content and ability in English and a few other languages, using hidden armies of checkers to edit and clean acceptable content. Will the weight of the digital language pyramid be skewed against the long tail for the foreseeable future?
  • James Lovelock’s Gaia hypothesis argues that “Life” acts in a self-regulating way to maintain habitable conditions for itself on planet Earth. If proved true, there will be no real benefit in a long-termist escape out to the galaxy, except to dig for rare minerals. So we’ll need to develop ways of equitably languaging the planet in a context of inevitable language change, drift, loss…and possibly re-emergence. How can we best prepare a proper language agenda for little Earthism?
  • Language loss is not yet a major “ecological” issue on a par with species loss. Will it be possible to go further and use technology to start de-extincting languages, even though they have no DNA? “Dead” written languages have been with us for centuries and are still used by some scholars and education systems. The quickest way to save dying languages would be to digitize them in a “rich” manner. But would the result be a new artifact or a “natural” language?
  • The key sites for language technology inventions throughout history have been the usual three suspects of prosthetic engineering: a) automating military or government information processing at a distance (by signaling, telegraph, phones, computers), b) enhancing religious or political proselytization through mass translation and the use of writing/print, and c) improving conditions for the sensorially disabled to communicate more effectively (via typewriting, tape-recording, Braille, etc.). Yet it is ultimately story-telling in all its forms that is the driving force for human sense-making and reasoning, and even making up new words and inventing new worlds. Good translators, for example, basically “tell the story” of the sentence or text they are working on! When will the AGI agenda manage to address (not just imitate) this decisively human competence?
  • Coda: Human knowledge made explicit via language and preserved as speech, text, and image appears to encode natural truth about the world. We tend to say that knowledge has been dis-covered — previously lurking out of sight beyond words. We “know” that most of our everyday knowledge (our “native” physics, psychology, art, etc.) is tacit — i.e. not couched in linguistic phrases somewhere, and so not searchable via a linguistic prompt. You can’t easily articulate what you don’t know, but you can know and predict many things without being able to utter exactly how or what they are. Is it possible that, rather in the manner of the long-lasting CYC project, we still aim to collect zillions of previously tacit statements about the human universe and make them all utterable, searchable, and shareable as “the truth”? Or, to reverse Wittgenstein: will the limits of my world eventually mean the limits of my language?