Sarcasm, notoriously difficult to interpret, demystified by multimodal approach

Oscar Wilde once said that sarcasm was the lowest form of wit, but the highest form of intelligence. Perhaps that is due to how difficult it is to use and understand. Sarcasm is notoriously tricky to convey through text—even in person, it can be easily misinterpreted. The subtle changes in tone that convey sarcasm often confuse computer algorithms as well, limiting virtual assistants and content analysis tools.

Xiyuan Gao, Shekhar Nayak, and Matt Coler of the Speech Technology Lab at the University of Groningen, Campus Fryslân developed a multimodal algorithm for improved sarcasm detection that examines multiple aspects of audio recordings for increased accuracy. Gao presented their work on May 16 at a joint meeting of the Acoustical Society of America and the Canadian Acoustical Association, running May 13–17 at the Shaw Center in downtown Ottawa, Ontario, Canada.

Traditional sarcasm detection algorithms often rely on a single parameter to produce their results, which is the main reason they fall short. Gao, Nayak, and Coler instead combined two complementary approaches, sentiment analysis of text and emotion recognition from audio, for a more complete picture.

“We extracted acoustic parameters such as pitch, speaking rate, and energy from speech, then used Automatic Speech Recognition to transcribe the speech into text for sentiment analysis,” said Gao.

“Next, we assigned emoticons to each speech segment, reflecting its emotional content. By integrating these multimodal cues into a machine learning algorithm, our approach leverages the combined strengths of auditory and textual information along with emoticons for a comprehensive analysis.”
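The pipeline Gao describes can be illustrated with a minimal Python sketch. This is not the team's implementation: the autocorrelation pitch estimate, the toy sentiment lexicon, the three-emoticon set, and every function name below are assumptions made for illustration. The transcript is assumed to come from a separate speech recognition step, and the emoticon label is supplied by the caller because the article does not say how the emoticons were assigned.

```python
import math

# Hypothetical toy lexicon and emoticon set, for illustration only.
POSITIVE = {"great", "love", "wonderful"}
NEGATIVE = {"terrible", "hate", "awful"}
EMOTICONS = (":)", ":(", ":|")


def rms_energy(samples):
    """Root-mean-square energy of the segment."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Crude pitch estimate: the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def speaking_rate(transcript, duration_s):
    """Words per second, given a transcript (assumed to come from ASR)."""
    return len(transcript.split()) / duration_s


def text_sentiment(transcript):
    """Lexicon score in [-1, 1]: (positive - negative) / word count."""
    words = transcript.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(1, len(words))


def fuse_features(samples, sample_rate, transcript, emoticon):
    """Concatenate acoustic, textual, and emoticon cues into one vector."""
    duration_s = len(samples) / sample_rate
    acoustic = [
        estimate_pitch_hz(samples, sample_rate),
        speaking_rate(transcript, duration_s),
        rms_energy(samples),
    ]
    textual = [text_sentiment(transcript)]
    emoticon_one_hot = [1.0 if e == emoticon else 0.0 for e in EMOTICONS]
    return acoustic + textual + emoticon_one_hot


# Synthetic 1-second, 200 Hz tone standing in for a recorded speech segment.
sample_rate = 8000
samples = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate)
           for n in range(sample_rate)]
features = fuse_features(samples, sample_rate, "oh great another meeting", ":|")
print(features)
```

The resulting vector holds pitch, speaking rate, energy, text sentiment, and a one-hot emoticon code. In a full system, vectors like this, paired with sarcastic or sincere labels, would be passed to a machine learning classifier of the developer's choosing.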

The team is optimistic about the performance of their algorithm, but they are already looking for ways to improve it further.

“There are a range of expressions and gestures people use to highlight sarcastic elements in speech,” said Gao. “These need to be better integrated into our project. In addition, we would like to include more languages and adopt developing sarcasm recognition techniques.”

This approach can be used for more than identifying a dry wit. The researchers highlight that the technique can be applied in many fields.

“The development of sarcasm recognition technology can benefit other research domains using sentiment analysis and emotion recognition,” said Gao.

“Traditionally, sentiment analysis mainly focuses on text and is developed for applications such as online hate speech detection and customer opinion mining. Emotion recognition based on speech can be applied to AI-assisted health care. Sarcasm recognition technology that applies a multimodal approach is insightful to these research domains.”