AI has made it easier than ever to find information: Ask ChatGPT almost anything, and the system swiftly delivers an answer. But the large language models that power popular tools like OpenAI’s ChatGPT or Anthropic’s Claude were not designed to be accurate or factual. They regularly “hallucinate” and offer up falsehoods as if they were hard facts.
Yet people are relying more and more on AI to answer their questions. Half of all people in the U.S. between the ages of 14 and 22 now use AI to get information, according to a 2024 Harvard study. An analysis by The Washington Post found that more than 17% of prompts on ChatGPT are requests for information.
One way researchers are attempting to improve the information AI systems give is to have the systems indicate how confident they are in the accuracy of their answers. I’m a computer scientist who studies natural language processing and machine learning. My lab at the University of Michigan has developed a new way of deriving confidence scores that improves the accuracy of AI chatbot answers. But confidence scores can only do so much.
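To give a rough sense of how a confidence score can be derived (this is a minimal illustrative sketch, not the method my lab developed), one common baseline looks at the probabilities a language model assigns to the tokens of its own answer and averages them into a single number. The function name and the example log-probabilities below are hypothetical, not any system's actual output:

```python
import math

def sequence_confidence(token_logprobs):
    """Naive confidence score: the geometric-mean probability the model
    assigned to the tokens of its own answer. Values near 1.0 mean the
    model found its wording highly predictable; lower values mean it was
    less certain. Note that this measures how fluent the answer looked to
    the model, not whether it is factually correct, which is one reason
    such scores can be poorly calibrated."""
    if not token_logprobs:
        return 0.0
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

# Hypothetical per-token log-probabilities for a short answer.
example = [-0.05, -0.30, -0.02, -1.10, -0.20]
print(f"confidence ~ {sequence_confidence(example):.2f}")
```

A score like this can be shown alongside an answer or used to flag low-confidence responses for review, but as the rest of this article explains, it is only a partial fix.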
Popular and problematic
Leading technology companies are increasingly integrating AI into search engines. Google now offers AI Overviews, which appear as text summaries above the usual list of links in many search results. Upstart search engines such as Perplexity are challenging traditional search engines with their own AI-generated summaries.
The convenience of these summaries has made these tools very popular. Why scour the contents of multiple websites when AI can provide the most pertinent information in a few seconds?
AI tools seem to offer a smoother, more expedient avenue to getting information. But they can also lead people astray or even expose them to harmful falsehoods. My lab has found that even the most accurate AI models hallucinate in 25% of their claims. This hallucination rate is concerning because other research suggests that AI can influence what people think.
It bears emphasizing: AI chatbots are designed to sound good, not to give accurate information.
Language models hallucinate because they learn and operate on statistical patterns drawn from a massive amount of text data, much of which comes from the internet. This means that they are not necessarily grounded in real-world facts. They also lack other human competencies, like common sense and the ability to distinguish between serious expressions and sarcastic ones.
All this was on display last spring, when a user asked Google’s AI Overviews tool to suggest a way to keep cheese from sliding off a pizza. The tool promptly recommended mixing the cheese with glue. It then came to light that someone had once posted this obviously tongue-in-cheek recommendation on Reddit. Like most large language models, Google’s model had likely been trained with information scraped from myriad internet sources, including Reddit.