You have just returned home after a long day at work and are about to sit down for dinner when suddenly your phone starts buzzing. On the other end is a loved one, perhaps a parent, a child or a childhood friend, begging you to send them money immediately.
You ask them questions, attempting to understand. There is something off about their answers, which are either vague or out of character, and sometimes there is a peculiar delay, almost as though they were thinking a little too slowly. Yet, you are certain that it is definitely your loved one speaking: That is their voice you hear, and the caller ID is showing their number. Chalking up the strangeness to their panic, you dutifully send the money to the bank account they provide you.
The next day, you call them back to make sure everything is all right. Your loved one has no idea what you are talking about. That is because they never called you – you have been tricked by technology: a voice deepfake. Thousands of people were scammed this way in 2022.
As computer security researchers, we see that ongoing advancements in deep-learning algorithms, audio editing and engineering, and synthetic voice generation have meant that it is increasingly possible to convincingly simulate a person’s voice.
Even worse, chatbots like ChatGPT are starting to generate realistic scripts with adaptive real-time responses. By combining these technologies with voice generation, a deepfake goes from being a static recording to a live, lifelike avatar that can convincingly have a phone conversation.
Cloning a voice
Crafting a compelling high-quality deepfake, whether video or audio, is not the easiest thing to do. It requires a wealth of artistic and technical skills, powerful hardware and a fairly hefty sample of the target voice.
There are a growing number of services offering to produce moderate- to high-quality voice clones for a fee, and some voice deepfake tools need a sample of only a minute long, or even just a few seconds, to produce a voice clone that could be convincing enough to fool someone. However, to convince a loved one – for example, to use in an impersonation scam – it would likely take a significantly larger sample.
Protecting against scams and disinformation
With all that said, we at the DeFake Project of the Rochester Institute of Technology, the University of Mississippi and Michigan State University, and other researchers are working hard to be able to detect video and audio deepfakes and limit the harm they cause. There are also straightforward and everyday actions that you can take to protect yourself.
For starters, voice phishing, or “vishing,”…