What Google Translate Tells Us About Where AI Is Headed Next

The computer scientists Rich Sutton and Andrew Barto have been recognized for a long track record of influential ideas with this year's Turing Award, the most prestigious in the field. Sutton's 2019 essay "The Bitter Lesson," for example, underpins much of today's feverishness around artificial intelligence (AI).

He argues that methods to improve AI that rely on heavy-duty computation rather than human knowledge are "ultimately the most effective, and by a large margin." This is an idea whose truth has been demonstrated again and again in AI history. Yet there is another important lesson in that history, from some 20 years ago, that we should heed.

Today's AI chatbots are built on large language models (LLMs), which are trained on huge amounts of data and enable a machine to "reason" by predicting the next word in a sentence using probabilities.

Useful probabilistic language models were formalized by the American polymath Claude Shannon in 1948, citing precedents from the 1910s and 1920s. Language models of this kind were then popularized in the 1970s and 1980s for use by computers in translation and speech recognition, in which spoken words are converted into text.

The first language model on the scale of contemporary LLMs was published in 2007 and was a component of Google Translate, which had been launched a year earlier. Trained on trillions of words using over a thousand computers, it is the unmistakable forebear of today's LLMs, although it was technically different.

It relied on probabilities computed from word counts, whereas today's LLMs are based on what are known as transformers. First developed in 2017, also originally for translation, these are artificial neural networks that allow machines to better exploit the context of each word.
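To make that contrast concrete, here is a minimal sketch, in Python and using a made-up toy corpus, of the count-based idea: estimate the probability of the next word from how often it followed the previous word in the training text. Transformer-based LLMs replace these raw counts with learned neural representations of much longer contexts, but the underlying goal of assigning probabilities to the next word is the same.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the trillions of words the 2007 Google model saw.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_probabilities(prev):
    """Turn raw counts into probabilities for the word that comes after `prev`."""
    counts = following[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probabilities("the"))
# {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

A real count-based system extends the context to several preceding words and smooths the counts to cope with unseen sequences; this sketch only illustrates the principle.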

The Pros and Cons of Google Translate

Machine translation (MT) has improved relentlessly in the past two decades, driven not only by tech advances but also by the size and variety of training data sets. Whereas Google Translate started by offering translations between just three languages in 2006 (English, Chinese, and Arabic), today it supports 249. Yet while this may sound impressive, it is still actually less than 4 percent of the world's estimated 7,000 languages.

Between a handful of these languages, like English and Spanish, translations are often flawless. Yet even in these languages, the translator sometimes fails on idioms, place names, legal and technical terms, and various other nuances.

Between many other languages, the service can help you get the gist of a text, but the output often contains serious errors. The largest annual evaluation of machine translation systems, which now includes translations done by LLMs that rival those of purpose-built translation systems, bluntly concluded in 2024 that "MT is not solved yet."

Machine translation is widely used despite these shortcomings: As far back as 2021, the Google Translate app reached one billion installs. Yet users still seem to understand that they should use such services cautiously. A 2022 survey of 1,200 people found that they mostly used machine translation in low-stakes settings, like understanding online content outside of work or study. Only about 2 percent of respondents' translations involved higher-stakes settings, including interacting with healthcare workers or police.

Sure enough, there are high risks associated with using machine translation in these settings. Studies have shown that machine-translation errors in healthcare can potentially cause serious harm, and there are reports that it has harmed credible asylum cases. It doesn't help that users tend to trust machine translations that are easy to understand, even when they are misleading.

Knowing the risks, the translation industry overwhelmingly relies on human translators in high-stakes settings like international law and commerce. Yet these workers' marketability has been diminished by the fact that machines can now do much of their work, leaving them to focus more on assuring quality.

Many human translators are freelancers in a marketplace mediated by platforms with machine-translation capabilities. It's frustrating to be reduced to wrangling inaccurate output, not to mention the precarity and loneliness endemic to platform work. Translators also have to contend with the actual or perceived threat that their machine rivals will eventually replace them; researchers refer to this as automation anxiety.

Lessons for LLMs

The recent unveiling of the Chinese AI model DeepSeek, which appears to be close to the capabilities of market leader OpenAI's latest GPT models but at a fraction of the cost, signals that highly sophisticated LLMs are on a path to being commoditized. They will likely be deployed by organizations of all sizes at low cost, just as machine translation is today.

Of course, today's LLMs go far beyond machine translation, performing a much wider range of tasks. Their fundamental limitation is data: They have already exhausted most of what is available on the web. For all its scale, their training data is likely to underrepresent most tasks, just as it underrepresents most languages for machine translation.

Indeed, the problem is worse with generative AI. Unlike with languages, it's difficult to know which tasks are well represented in an LLM. There will undoubtedly be efforts to improve training data that make LLMs better at some underrepresented tasks. But the scope of the challenge dwarfs that of machine translation.

Tech optimists may pin their hopes on machines being able to keep increasing the scale of their training data by making their own synthetic versions, or on learning from human feedback through chatbot interactions. These avenues have already been explored in machine translation, with limited success.

So the foreseeable future for LLMs is one in which they are excellent at some tasks, mediocre at others, and unreliable elsewhere. We will use them where the risks are low, while they may harm unsuspecting users in high-risk settings, as has already happened to lawyers who trusted ChatGPT output containing citations to nonexistent case law.

These LLMs will aid human workers in industries with a culture of quality assurance, like computer programming, while making the experience of those workers worse. Plus, we will have to cope with new problems such as their threat to human creative works and to the environment. The urgent question: Is this really the future we want to build?