The Department of Government Efficiency, or DOGE, has secured unprecedented access to at least seven sensitive federal databases, including those of the Internal Revenue Service and Social Security Administration. This access has sparked fears about cybersecurity vulnerabilities and privacy violations. Another concern has received far less attention: the potential use of the data to train a private company’s artificial intelligence systems.
The White House press secretary said government data that DOGE has collected isn’t being used to train Elon Musk’s AI models, despite Musk’s control over DOGE. However, evidence has emerged that DOGE personnel simultaneously hold positions with at least one of Musk’s companies.
At the Federal Aviation Administration, SpaceX employees have government email addresses. This dual employment creates a conduit for federal data to potentially be siphoned to Musk-owned enterprises, including xAI. The company’s latest Grok AI chatbot model conspicuously refuses to give a clear denial about using such data.
As a political scientist and technologist who is intimately acquainted with public sources of government data, I believe this potential transmission of government data to private companies presents far greater privacy and power implications than most reporting identifies. A private entity with the capacity to develop artificial intelligence technologies could use government data to leapfrog its competitors and wield massive influence over society.
Value of government data for AI
For AI developers, government databases are something akin to the Holy Grail. While companies such as OpenAI, Google and xAI currently rely on information scraped from the public internet, nonpublic government repositories offer something much more valuable: verified records of actual human behavior across entire populations.
This isn’t merely more data – it’s fundamentally different data. Social media posts and web browsing histories show curated or intended behaviors, but government databases capture real decisions and their consequences. For example, Medicare records reveal health care choices and outcomes. IRS and Treasury data reveal financial decisions and long-term impacts. And federal employment and education statistics reveal education paths and career trajectories.
What makes this data particularly valuable for AI training is its longitudinal nature and reliability. Unlike the disordered information available online, government records follow standardized protocols, undergo regular audits and must meet legal requirements for accuracy. Every Social Security payment, Medicare claim and federal grant creates a verified data point about real-world behavior. Nowhere else in the U.S. does data exist with such breadth and authenticity.
Most critically, government databases track entire populations over time, not just digitally active users. They include people who never use social media,…