The Trump administration is looking to develop a process that would have the federal government review the safety of powerful artificial intelligence models before approving their release, according to a report in The New York Times on May 4, 2026. The move would stand in contrast to the administration’s generally anti-regulatory approach to industry, and it comes in the wake of Anthropic’s decision to voluntarily postpone the release of its latest AI model, Mythos.
Anthropic was concerned because when it tested Mythos, the model found thousands of vulnerabilities in operating systems and web browsers. The implication was that if a cybercriminal or hostile foreign agent had Mythos, they could penetrate computer systems worldwide and compromise the basic computer code underlying public safety, national economies and military security.
As a result, Anthropic gave limited access only to about 50 companies and organizations managing critical infrastructure as part of its Project Glasswing. The initiative aims to help governments and corporations close software loopholes Mythos has identified. When Anthropic sought to broaden the number of organizations with access to Mythos, the White House objected.
Security experts, meanwhile, have expressed concern that AI researchers in nations such as China, Russia, Iran and North Korea might soon create similarly powerful AI models and use them to threaten or attack other countries, or to create chaos in those countries’ economies.
Major challenges
As a computer scientist working in this area, I know from my research on computer security and malware that it’s difficult even to define what safety measures the field should take to make these models safe to use. Yet the future of many industries, critical infrastructure, national security and human well-being seems to depend on achieving AI models that are truthful, ethical and reasonable.
The first of these challenges, truthfulness and factual accuracy, came to light when OpenAI’s ChatGPT burst onto the scene in 2022. People worldwide realized that the output of large language models does not necessarily reflect a truthful reality. The AI companies’ goal was coherent writing that read as if a human had written it. If an output was factually flawed, programmers wrote it off as a “hallucination” by the model.
After AI programs led to some legal catastrophes and stock market panic, AI companies made at least some effort to ensure that their models avoid falsehoods and inaccuracies.
Nonetheless, false information stated confidently within a sea of solid-sounding text can take on a life of its own. Because of the consequences, research is underway on how to engineer truthfulness into models, or at least prevent hallucination.
Truthfulness and grounding in reality are part of a broader concern about safe AI models. The very pace of these models’ advancement may itself pose a threat.


