DOGE threat: How government data would give an AI company extraordinary power
- Written by Allison Stanger, Distinguished Endowed Professor, Middlebury
 
The Department of Government Efficiency, or DOGE, has secured unprecedented access[1] to at least seven sensitive federal databases, including those of the Internal Revenue Service and Social Security Administration. This access has sparked fears about cybersecurity vulnerabilities[2] and privacy violations[3]. Another concern has received far less attention: the potential use of the data to train a private company’s artificial intelligence systems.
The White House press secretary said government data that DOGE has collected isn’t being used[4] to train Elon Musk’s AI models, despite Musk’s control over DOGE. However, evidence has emerged that DOGE personnel simultaneously hold positions[5] with at least one of Musk’s companies.
At the Federal Aviation Administration, SpaceX employees have government email addresses[6]. This dual employment creates a conduit for federal data to potentially be siphoned to Musk-owned enterprises, including xAI. The company’s latest Grok AI chatbot model conspicuously refuses to give a clear denial[7] about using such data.
As a political scientist and technologist[8] who is intimately acquainted with public sources of government data[9], I believe this potential transmission of government data to private companies presents far greater privacy and power implications than most reporting identifies. A private entity with the capacity to develop artificial intelligence technologies could use government data to leapfrog its competitors and wield massive influence over society.
Value of government data for AI
For AI developers, government databases represent something akin to finding the Holy Grail[10]. While companies such as OpenAI, Google and xAI currently rely on information scraped from the public internet, nonpublic government repositories offer something much more valuable: verified records of actual human behavior across entire populations.
This isn’t merely more data – it’s fundamentally different data[11]. Social media posts and web browsing histories show curated or intended behaviors, but government databases capture real decisions and their consequences. For example, Medicare records[12] reveal health care choices and outcomes. IRS and Treasury data reveal financial decisions and long-term impacts. And federal employment and education statistics reveal education paths and career trajectories.
What makes this data particularly valuable for AI training is its longitudinal nature and reliability[13]. Unlike the disordered information available online, government records follow standardized protocols[14], undergo regular audits and must meet legal requirements for accuracy. Every Social Security payment, Medicare claim and federal grant creates a verified data point about real-world behavior. This data exists nowhere else with such breadth and authenticity in the U.S.
Most critically, government databases track entire populations over time[15], not just digitally active users. They include people who never use social media, don’t shop online, or actively avoid digital services. For an AI company, this would mean training systems on the actual diversity of human experience rather than just the digital reflections people cast online.
The technical advantage
Current AI systems face fundamental limitations that no amount of data scraped from the internet can overcome. When ChatGPT or Google’s Gemini makes mistakes, it’s often because it has been trained on information that might be popular but isn’t necessarily true[17]. These systems can tell you what people say about a policy’s effects, but they can’t track those effects across populations and years.
Government data could change this equation. Imagine training an AI system not just on opinions about health care but on actual treatment outcomes across millions of patients. Consider the difference between learning from social media discussions about economic policies and analyzing their real impacts across different communities and demographics over decades.
A large, state-of-the-art, or frontier, model trained on comprehensive government data[18] could understand the actual relationships between policies and outcomes. It could track unintended consequences across different population segments, model complex societal systems with real-world validation and predict the impacts of proposed changes based on historical evidence. For companies seeking to build next-generation AI systems, access to this data would create an almost insurmountable advantage.
Control of critical systems
A company like xAI could do far more with models trained on government data than building better chatbots or content generators. Such systems could fundamentally transform – and potentially control – how people understand and manage complex societal systems. While some of these capabilities could be beneficial under the control of accountable public agencies, I believe they pose a threat in the hands of a single private company.
Medicare and Medicaid databases[19] contain records of treatments, outcomes[20] and costs across diverse populations over decades. A frontier model trained on new government data could identify treatment patterns that succeed where others fail, and so dominate the health care industry. Such a model could understand how different interventions affect various populations over time, accounting for factors such as geographic location, socioeconomic status and concurrent conditions.
A company wielding the model could influence health care policy by demonstrating superior predictive capabilities, and it could market population-level insights to pharmaceutical companies and insurers.
Treasury data represents perhaps the most valuable prize[21]. Government financial databases contain granular details about how money flows through the economy. This includes real-time transaction data across federal payment systems, complete records of tax payments and refunds, detailed patterns of benefit distributions, and government contractor payments with performance metrics.
An AI company with access to this data could develop extraordinary capabilities[22] for economic forecasting and market prediction. It could model the cascading effects of regulatory changes, predict economic vulnerabilities before they become crises, and optimize investment strategies with precision impossible through traditional methods.
Elon Musk’s xAI company is well financed.
Infrastructure and urban systems
Government databases contain information about critical infrastructure usage patterns, maintenance histories, emergency response times and development impacts. Every federal grant, infrastructure inspection and emergency response creates a data point that could help train AI to better understand how cities and regions function.
The power lies in the potential interconnectedness of this data[23]. An AI system trained on government infrastructure records would understand how transportation patterns affect energy use, how housing policies affect emergency response times, and how infrastructure investments influence economic development across regions.
A private company with exclusive access would gain unique insight into the physical and economic arteries of American society. This could allow the company to develop “smart city” systems[24] that city governments would become dependent on, effectively privatizing aspects of urban governance. When combined with real-time data from private sources, the predictive capabilities would far exceed what any current system can achieve.
Absolute data corrupts absolutely
A company such as xAI, with Musk’s resources and preferential access through DOGE, could surmount technical and political obstacles far more easily than competitors. Recent advances in machine learning have also reduced the burdens of preparing data for the algorithms to process, making government data a veritable gold mine – one that rightfully belongs to the American people.
The threat of a private company accessing government data transcends individual privacy concerns. Even with personal identifiers removed, an AI system that analyzes patterns across millions of government records could enable surprising capabilities for making predictions and influencing behavior at the population level. The threat is AI systems that leverage government data to influence society, including electoral outcomes.
Since information is power, concentrating unprecedented data in the hands of a private entity with an explicit political agenda represents a profound challenge to the republic. I believe that the question is whether the American people can stand up to the potentially democracy-shattering corruption such a concentration would enable. If not, Americans should prepare to become digital subjects rather than human citizens.
References
- ^ unprecedented access (www.nytimes.com)
- ^ cybersecurity vulnerabilities (theconversation.com)
- ^ privacy violations (www.washingtonpost.com)
- ^ isn’t being used (www.politico.com)
- ^ simultaneously hold positions (www.theverge.com)
- ^ have government email addresses (www.theverge.com)
- ^ conspicuously refuses to give a clear denial (www.politico.com)
- ^ political scientist and technologist (scholar.google.com)
- ^ public sources of government data (yalebooks.yale.edu)
- ^ finding the Holy Grail (serval.unil.ch)
- ^ fundamentally different data (ibridgellc.com)
- ^ Medicare records (doi.org)
- ^ longitudinal nature and reliability (doi.org)
- ^ standardized protocols (bjs.ojp.gov)
- ^ track entire populations over time (www.nidcd.nih.gov)
- ^ Al Drago/Getty Images (www.gettyimages.com)
- ^ popular but isn’t necessarily true (doi.org)
- ^ trained on comprehensive government data (doi.org)
- ^ Medicare and Medicaid databases (healthcaredelivery.cancer.gov)
- ^ outcomes (doi.org)
- ^ perhaps the most valuable prize (home.treasury.gov)
- ^ develop extraordinary capabilities (www.cbo.gov)
- ^ interconnectedness of this data (www.snowflake.com)
- ^ “smart city” systems (stvinc.com)


