AI Babel Fish becomes reality, allowing direct speech-to-speech translations

An AI model that can translate speech and text, including direct speech-to-speech translations, for up to 101 languages is described in Nature. The model, named SEAMLESSM4T, fills gaps in language coverage and outperforms existing systems. The work may pave the way for rapid universal translations, with resources being made publicly available (for non-commercial use) to assist further research on inclusive speech translation technologies.

Readers of science fiction might be familiar with the Babel Fish from The Hitchhiker’s Guide to the Galaxy, a small fish that could be inserted into an ear and simultaneously translate from one spoken language to another. Such a tool would be valuable in facilitating communication in an interconnected global landscape, but most existing machine learning translation systems are text oriented, or involve multiple steps: speech recognition, translation into text, and conversion of text to speech.

In addition, language coverage for existing speech-to-speech models falls behind that of text-to-text models and tends to be skewed towards translating from a source language into English, rather than from English to another language.

Addressing these limitations, the Seamless Communication Team from Meta have developed a single model that supports multiple modes of translation between up to 101 languages. SEAMLESSM4T can facilitate speech-to-speech translation (recognizing 101 languages and translating to 36 languages), speech-to-text translation (101 to 96 languages), text-to-speech translation (96 to 36 languages), text-to-text translation (96 languages), and automatic speech recognition (96 languages).

SEAMLESSM4T translates text with up to 23% more accuracy than existing systems. The AI model can filter out background noise and adjust to speaker variation. Although further optimization is required, SEAMLESSM4T may represent a step towards improving communication across language barriers, the authors conclude.

More information:
Marta Costa-jussà, Joint speech and text machine translation for up to 100 languages, Nature (2025). DOI: 10.1038/s41586-024-08359-z. www.nature.com/articles/s41586-024-08359-z