Bad broadband, no problem: Google’s open-source speech codec works on even low-quality networks
In a bid to put an end to the all-too-familiar choppy, robotic voice calls that come with low bandwidth, Google is open-sourcing Lyra, a new audio codec that taps machine learning to produce high-quality calls even when faced with a dodgy internet connection.
Google’s AI team is making Lyra available for developers to integrate with their communication apps, with the promise that the new tool enables audio calls of a similar quality to that achieved with the most popular existing codecs, while requiring 60% less bandwidth.
Audio codecs are widely used today for internet-based real-time communication. The technology consists of compressing an input audio file into a smaller package that requires less bandwidth for transmission, and then decoding the file back into a waveform that can be played out over a listener’s phone speaker.
The more compressed the file is, the less data is required to send the audio over to the listener. But there is a trade-off: typically, the most compressed files are also harder to reconstruct, and tend to be decompressed into less intelligible, robotic voice signals.
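The trade-off can be illustrated with a deliberately simple sketch. It is not how Lyra or any speech codec works: it only applies plain uniform quantization to a test tone, with an assumed 16 kHz sample rate and hypothetical helper names, to show that spending fewer bits per sample leaves a larger reconstruction error.

```python
import math

def quantize(samples, bits):
    """Uniformly quantize samples in [-1.0, 1.0] to 2**bits levels, then reconstruct."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    reconstructed = []
    for s in samples:
        index = round((s + 1.0) / step)          # "encoder": an integer that fits in `bits` bits
        reconstructed.append(index * step - 1.0)  # "decoder": map the index back to an amplitude
    return reconstructed

def rms_error(original, reconstructed):
    """Root-mean-square difference between two equal-length signals."""
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return math.sqrt(total / len(original))

# 20 ms of a 440 Hz tone; the 16 kHz sample rate is an assumption for illustration.
SAMPLE_RATE = 16_000
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(320)]

errors = {bits: rms_error(tone, quantize(tone, bits)) for bits in (2, 4, 8)}
for bits, err in errors.items():
    print(f"{bits} bits/sample -> RMS error {err:.4f}")
```

Real codecs use far more sophisticated models than this, but the direction of the relationship is the same: the smaller the payload, the more the decoder has to make up for.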
“As such, a continuing challenge in developing codecs, both for video and audio, is to provide increasing quality, using less data, and to minimize latency for real-time communication,” Andrew Storus and Michael Chinen, both software engineers at Google, wrote in a blog post.
The engineers first introduced Lyra last February as a potential solution to this equation. Fundamentally, Lyra works similarly to conventional audio codecs: the system is built in two pieces, with an encoder and a decoder. When a user talks into their phone, the encoder identifies and extracts attributes from their speech, called features, in chunks of 40 milliseconds, then compresses the data and sends it over the network for the decoder to read out to the receiver.
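As a rough sketch of the chunking step described above, the snippet below splits a waveform into 40-millisecond frames and computes one value per frame. The 16 kHz sample rate is an assumption (the article does not give one), and `toy_features` is a hypothetical stand-in that returns a single energy value; it does not reproduce the features Lyra actually extracts.

```python
import math

SAMPLE_RATE_HZ = 16_000   # assumed for illustration; not stated in the article
FRAME_MS = 40             # chunk length given in the article
FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000

def split_into_frames(samples):
    """Yield consecutive, non-overlapping 40 ms frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - FRAME_SAMPLES + 1, FRAME_SAMPLES):
        yield samples[start:start + FRAME_SAMPLES]

def toy_features(frame):
    """Hypothetical feature extractor: one RMS energy value per frame."""
    return [math.sqrt(sum(s * s for s in frame) / len(frame))]

def encode(samples):
    """Turn a waveform into a list of per-frame feature vectors."""
    return [toy_features(frame) for frame in split_into_frames(samples)]

# One second of a 220 Hz test tone.
one_second = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE_HZ)
              for n in range(SAMPLE_RATE_HZ)]
packets = encode(one_second)
print(len(packets), "frames,", len(packets[0]), "feature(s) each")
```

Under these assumptions each frame holds 640 samples, so one second of speech becomes 25 small feature packets rather than 16,000 raw samples.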
To give the decoder a boost, however, Google’s AI engineers infused the system with a particular type of machine learning model. Called a generative model, and trained on thousands of hours of data, the algorithm is capable of reconstructing a full audio file even from a limited number of features.
Where traditional codecs can only re-create a piece of audio from the parameters they receive, a generative model can read features and generate new sounds based on a small set of data.
Generative models have been the focus of much research in the past few years, with different companies taking an interest in the technology. Engineers have already developed state-of-the-art systems, starting with DeepMind’s WaveNet, which can generate speech that mimics the human voice.
Equipped with a model that reconstructs audio using minimal amounts of data, Lyra can therefore maintain very compressed files at low bitrates, and still achieve high-quality decoding on the other end of the line.
Storus and Chinen evaluated Lyra’s performance against that of Opus, an open-source codec that is widely used in voice-over-internet applications.
When used in a high-bandwidth environment, with audio at 32 kbps, Opus is known to enable a level of audio quality that is indistinguishable from the original; but when operating in bandwidth-constrained environments down to 6 kbps, the codec starts showing degraded audio quality.
In comparison, Lyra compresses raw audio down to 3 kbps. Based on feedback from expert and crowdsourced listeners, the researchers found that the output audio quality compares favorably against that of Opus. At the same time, other codecs that are capable of operating at comparable bitrates to Lyra, such as Speex, all showed worse results, marked by unnatural and robotic-sounding voices.
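To put those bitrates in perspective, the back-of-the-envelope sketch below works out how many bits each one leaves for a 40-millisecond chunk and how much payload a minute of audio adds up to. It assumes the bit budget is spread evenly over time and ignores packet headers; the 40 ms window is applied to all three rows purely for comparison, since the article does not say what frame length Opus uses. The function names are illustrative.

```python
FRAME_MS = 40  # chunk length quoted earlier in the article

def bits_per_frame(bitrate_kbps, frame_ms=FRAME_MS):
    """Bits available per frame at a given bitrate (1 kbps = 1000 bits per second)."""
    return bitrate_kbps * 1000 * frame_ms // 1000

def kilobytes_per_minute(bitrate_kbps):
    """Audio payload for one minute, in kilobytes, ignoring any network overhead."""
    return bitrate_kbps * 1000 * 60 / 8 / 1000

rates = [("Lyra", 3), ("Opus, constrained", 6), ("Opus, high bandwidth", 32)]
for name, kbps in rates:
    print(f"{name}: {bits_per_frame(kbps)} bits per 40 ms, "
          f"{kilobytes_per_minute(kbps):.1f} kB per minute")
```

At 3 kbps that comes to 120 bits, or 15 bytes, for every 40 ms of speech, which is why the decoder has to generate most of the waveform rather than simply unpack it.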
“Lyra can be used wherever the bandwidth conditions are insufficient for higher-bitrates and existing low-bitrate codecs do not provide adequate quality,” said Storus and Chinen.
The idea will appeal to most internet users who have found themselves, especially over the past year, faced with insufficient bandwidth when working from home during the COVID-19 pandemic.
Since the start of the crisis, demand for broadband communication services has soared, with some operators experiencing as much as a 60% increase in internet traffic compared to the previous year – leading to network congestion and the much-dreaded conference call freezes.
Even before the COVID-19 pandemic hit, however, some users were already faced with unreliable internet speeds: in the UK, for example, 1.6 million properties are still unable to access superfast broadband.
In developing countries, the divide is even more striking. With billions of new internet users expected to come online in the next few years, said Storus and Chinen, it is unlikely that the explosion of on-device compute power will be met with the appropriate high-speed wireless infrastructure anytime soon. “Lyra can save meaningful bandwidth in these kinds of scenarios,” said the engineers.
Among other applications that they expect will emerge with Lyra, Storus and Chinen also mentioned archiving large amounts of speech, saving battery or alleviating network congestion in emergency situations.
It is now up to the open-source community, therefore, to come up with innovative use cases for the technology. Developers can access Lyra’s code on GitHub, where the core API is provided along with an example app showcasing how to integrate native Lyra code into a Java-based Android app.