‘Indirect prompt injection’ attacks could upend chatbots

ChatGPT’s explosive growth has been breathtaking. Barely two months after its introduction last fall, 100 million users had tapped into the AI chatbot’s ability to engage in playful banter, argue politics, generate compelling essays and write poetry.

“In 20 years following the internet space, we cannot recall a faster ramp in a consumer internet app,” analysts at UBS investment bank declared earlier this year.

That’s good news for programmers, tinkerers, commercial interests, consumers and members of the general public, all of whom stand to reap immeasurable benefits from enhanced transactions fueled by AI brainpower.

But the bad news is whenever there’s an advance in technology, scammers are not far behind.

A new study, published on the pre-print server arXiv, has found that AI chatbots can be easily hijacked and used to retrieve sensitive user information.

Researchers at Saarland University’s CISPA Helmholtz Center for Information Security reported last month that hackers can employ a procedure called indirect prompt injection to surreptitiously insert malevolent components into a user-chatbot exchange.

Chatbots use large language model (LLM) algorithms to detect, summarize, translate and predict text sequences based on massive datasets. LLMs are popular in part because they use natural language prompts. But that feature, warns Saarland researcher Kai Greshake, “might also make them susceptible to targeted adversarial prompting.”

Greshake explained it could work like this: A hacker slips a prompt in zero-point font—that is, invisible—into a web page that will likely be used by the chatbot to respond to a user’s question. Once that “poisoned” page is

Subscribe
Don't miss the best news ! Subscribe to our free newsletter :