'Sorry, I didn't get that': AI misunderstands some people's words more than others

The idea of a humanlike artificial intelligence assistant that you can speak with has been alive in many people’s imaginations since the release of “Her,” Spike Jonze’s 2013 film about a man who falls in love with a Siri-like AI named Samantha. Over the course of the film, the protagonist grapples with the ways in which Samantha, real as she may seem, is not and never will be human.

Twelve years on, this is no longer the stuff of science fiction. Generative AI tools like ChatGPT and digital assistants like Apple’s Siri and Amazon’s Alexa help people get driving directions, make grocery lists and handle plenty of other tasks. But just like Samantha, automatic speech recognition systems still cannot do everything that a human listener can.

You have probably had the frustrating experience of calling your bank or utility company and needing to repeat yourself so that the digital customer service bot on the other end of the line can understand you. Maybe you’ve dictated a note on your phone, only to spend time editing garbled words.

Linguistics and computer science researchers have shown that these systems work worse for some people than for others. They tend to make more errors if you have a non-native or regional accent, are Black, speak African American Vernacular English, code-switch, are a woman, are old or very young, or have a speech impediment.
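
Researchers typically quantify these gaps with the word error rate: the share of words in a human-made reference transcript that the system substitutes, drops or invents. The short Python sketch below is a simplified illustration of how error rates can be compared across speaker groups; it is not taken from any particular study, and the transcripts and group labels are invented for the example.

# Simplified sketch: comparing speech recognition word error rates (WER)
# across speaker groups. Transcripts and groups are invented for illustration.

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in the reference,
    computed with a standard word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical data: (speaker group, human reference transcript, ASR output)
samples = [
    ("group_a", "turn left at the next light", "turn left at the next light"),
    ("group_a", "add milk to my grocery list", "add milk to my grocery list"),
    ("group_b", "turn left at the next light", "turn left at the next slide"),
    ("group_b", "add milk to my grocery list", "at milk to my grocery lift"),
]

rates_by_group = {}
for group, reference, hypothesis in samples:
    rates_by_group.setdefault(group, []).append(word_error_rate(reference, hypothesis))

for group, rates in rates_by_group.items():
    print(f"{group}: average WER = {sum(rates) / len(rates):.2f}")

In real audits, the same comparison is run over thousands of recordings from carefully sampled speakers, but the underlying arithmetic is the same: count the mistakes, divide by the number of words spoken and compare the averages across groups.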

Tin ear

Unlike you or me, automatic speech recognition systems are not what researchers call “sympathetic listeners.” Instead of trying to understand you by taking in other useful clues like intonation or facial gestures, they simply give up. Or they take a probabilistic guess, a move that can sometimes result in an error.

As companies and public agencies increasingly adopt automatic speech recognition tools to cut costs, people have little choice but to interact with them. But the more these systems are used in critical fields, from emergency response and health care to education and law enforcement, the more likely it is that there will be grave consequences when they fail to recognize what people say.

Imagine sometime in the near future you’ve been hurt in a car crash. You dial 911 to call for help, but instead of being connected to a human dispatcher, you get a bot that’s designed to weed out nonemergency calls. It takes you several rounds to be understood, wasting time and raising your anxiety level at the worst moment.

What causes this kind of error to occur? Some of the inequalities that result from these systems are baked into the reams of linguistic data that developers use to build large language models. Developers train artificial intelligence systems to understand and mimic human language by feeding them vast quantities of text and audio files containing real human speech. But whose speech are they feeding them?

If a system scores high accuracy rates when speaking with affluent white Americans in their mid-30s, it is reasonable to guess…
