AI doesn’t ‘see’ the way that you do, and that could be a problem when it categorizes objects and scenes


Even with no fur in frame, you can easily see that a photo of a hairless Sphynx cat depicts a cat. You wouldn’t mistake it for an elephant.

But many artificial intelligence vision systems would. Why? Because when AI systems learn to categorize objects, they often rely on visual cues – like surface texture or simple patterns in pixels. This tendency makes them vulnerable to getting confused by small changes that have little effect on human perception.

A vision system aligned more closely with human perception – one that emphasizes shape, for instance – might still mistake the cat for another similarly shaped mammal, such as a tiger, but it would be unlikely to call it an elephant.

The kinds of mistakes an AI makes reveal how it organizes visual information, with potential limitations that become concerning in higher-stakes settings.


Stickers and graffiti on a stop sign could serve as an adversarial attack, confusing AI in autonomous vehicles.
rick/Flickr, CC BY

Imagine an autonomous vehicle approaching a vandalized stop sign. While a human driver recognizes the sign from its shape and context, an AI that relies on pixel patterns may misclassify it, pushing the altered sign out of the category “sign” altogether and into a different group of images that it identifies as similar, such as a billboard, advertisement or other roadside object.

Together, these problems point to a misalignment between how humans perceive the visual world and how AI represents it.

We are experts in visual perception, and we work at the intersection of human and machine perception. People organize visual input into objects, meaning and relationships shaped by experience and context. AI models don’t organize visual information the same way. This key difference explains why AI sometimes fails in surprising ways.

Seeing objects, not features

Imagine that in front of you is a small, opaque object with both straight and curved edges. But you don’t see those features; you just see your coffee mug.

Vision isn’t a camera, passively recording the world. Instead, your brain rapidly turns the light your eyes absorb into objects you recognize and understand, organizing experience into structured mental representations.

Researchers can understand how these representations are structured by examining how people judge similarity. Your coffee mug is not like your computer, but it’s similar to a glass of water despite differences in appearance. That judgment reflects how the mug is mentally represented: not just in terms of appearance, but also what the mug is used for and how it fits into everyday activities.


A glass and a mug are very alike in how you use them, though less similar in looks.
Oscar Wong/Moment via Getty Images

Importantly, the mental organization of representations is flexible. Which aspects of an object stand out change with context and goals. If packing a moving box, shape and size…
