It takes a body to understand the world – why ChatGPT and other language AIs don’t know what they’re saying

When we asked GPT-3, an extremely powerful and popular artificial intelligence language system, whether you’d be more likely to use a paper map or a stone to fan life into coals for a barbecue, it preferred the stone.

To smooth your wrinkled skirt, would you grab a warm thermos or a hairpin? GPT-3 suggested the hairpin.

And if you need to cover your hair for work in a fast-food restaurant, which would work better, a paper sandwich wrapper or a hamburger bun? GPT-3 went for the bun.

Why does GPT-3 make those choices when most people choose the alternative? Because GPT-3 does not understand language the way humans do.

Bodiless words

One of us is a psychology researcher who over 20 years ago presented a series of scenarios like those above to test the understanding of a computer model of language from that time. The model did not accurately choose between using rocks and maps to fan coals, whereas humans did so easily.

The other of us is a doctoral student in cognitive science who was part of a team of researchers that more recently used the same scenarios to test GPT-3. Although GPT-3 did better than the older model, it was significantly worse than humans. It got the three scenarios mentioned above completely wrong.

GPT-3, the engine that powered the initial release of ChatGPT, learns about language by noting, from a trillion instances, which words tend to follow which other words. The strong statistical regularities in language sequences allow GPT-3 to learn a lot about language. And that sequential knowledge often allows ChatGPT to produce reasonable sentences, essays, poems and computer code.

Although GPT-3 is extremely good at learning the rules of what follows what in human language, it doesn’t have the foggiest idea what any of those words mean to a human being. And how could it?

Humans are biological entities that evolved with bodies that need to operate in the physical and social worlds to get things done. Language is a tool that helps people do that. GPT-3 is an artificial software system that predicts the next word. It does not need to get anything done with those predictions in the real world.

I am, therefore I understand

The meaning of a word or sentence is intimately related to the human body: people’s abilities to act, to perceive and to have emotions. Human cognition is empowered by being embodied. People’s understanding of a term like “paper sandwich wrapper,” for example, includes the wrapper’s appearance, its feel, its weight, and, consequently, how we can use it: for wrapping a sandwich. People’s understanding also includes how someone can use it for myriad other opportunities it affords, such as scrunching it into a ball for a game of hoops, or covering one’s hair.

All of these uses arise because of the nature of human bodies and needs: People have hands that can fold paper, a head of hair that is about the same size as a sandwich wrapper, and a need to be employed and thus follow rules…

Access the original article