GS Federal Insights

Washington Should Set the Safety Test, Not Hold the Guest List

The future of AI should not be decided by a velvet rope in Washington. Safety matters, but progress should not need a permission slip.


The future of AI should not be decided by a velvet rope in Washington.

I understand the need for safety. I understand the national security concerns. I understand why a frontier AI model should be tested before it is released broadly. But what I do not accept is the idea that the government should get to decide, customer by customer, who gets access to the best technology first.

That is absolutely, 100%, the wrong answer, for several obvious reasons: pay-to-play, unfair market advantages, slower research, and no clear way to make the “early access” list. And the list goes on.

The right and obvious answer is a clear, consistent, transparent testing framework that applies before powerful models go to market. The government should help define the safety test. It should not become the gatekeeper of progress.

According to Axios, the White House asked OpenAI to limit the initial release of GPT-5.6 to a small group of government-approved partners before any wider launch, citing security concerns. Axios also reported that the White House Office of the National Cyber Director and the Office of Science and Technology Policy are involved as the administration builds a framework for testing and evaluating the security of new models.

Reuters reported that OpenAI planned a limited preview for select partners, with the government “approving access customer by customer during this preview period.”

Seriously? That phrase should concern anyone who cares about innovation: customer by customer.

That is not just safety oversight. That is access control.

I cannot overstate how much early access matters. When a new frontier model comes out, the first users get a real advantage. They can build products sooner, test use cases sooner, improve workflows sooner, discover security applications sooner, and position themselves in the market before everyone else. If the government gets to pick those early users, then Washington is no longer just protecting the public. It is influencing who gets the first shot at the next wave of progress.

At best, this creates market imbalance; at worst, it can be very dangerous. There is a better way. And to the government’s credit, that better way does seem to be under discussion.

The government should not be deciding which companies, researchers, or builders are allowed through the door first. It should help set the rules for the door itself. And that door should have a very wide frame and open to everyone at the same time.

A responsible framework would look something like this.

First, define which models require enhanced review. Not every AI model should be treated like a national security asset. The threshold should be based on measurable capabilities, such as advanced cyber operations, autonomous agent behavior, biological or chemical misuse potential, large-scale persuasion capability, or the ability to meaningfully assist criminal activity. The current White House executive order already points in this direction by calling for a classified benchmarking process to assess advanced cyber capabilities and determine when a model becomes a “covered frontier model.” Powerful models should be tested more heavily than ordinary models.

Second, require a pre-release safety test, not a political permission slip. The current executive order contemplates a voluntary framework where developers can provide the federal government access to covered frontier models for up to 30 days before release to trusted partners. It also says developers may collaborate with the government to select trusted partners for early access.

That is where the line gets blurry.

A 30-day review window can be reasonable. Government-approved customer selection is the problem. The review should answer a clear question: does the model meet the safety standard for release? It should not answer a political question: who does Washington trust to use it first?

Third, the test should be standardized. Every frontier AI lab should know the rules before launch. The government should publish the broad categories of testing, even if some specific national security benchmarks remain classified. The public does not need every exploit detail, but companies need predictable expectations. Otherwise, we end up with one-off decisions, private pressure, inconsistent enforcement, and companies guessing what will trigger intervention.

Fourth, use independent red teams. Testing should not rely only on the company, and it should not rely only on the government. A strong system would include certified third-party evaluators, academic security labs, and trusted technical experts. They should test for cybersecurity misuse, jailbreak resistance, autonomous tool use, data exfiltration, privacy leakage, dangerous instruction-following, model deception, and the ability to bypass safeguards.

Fifth, make the outcome rule-based. If the model passes the defined tests, it can launch. If it fails, the company gets a clear remediation path. Fix the vulnerability, retest, document the safeguards, and move forward. That is how we handle serious industries without turning every release into a political negotiation.

Sixth, separate safety review from market access. This is the most important piece. The government can say, “This model cannot be released until it passes these tests.” That is a rule. But the government should not say, “These favored partners may use it first, and everyone else waits.” That is gatekeeping.

The Anthropic situation shows why this matters. Anthropic said the U.S. government issued an export control directive suspending access to Fable 5 and Mythos 5 by any foreign national, including foreign national Anthropic employees, and said the practical effect was that it had to disable the models for all customers to ensure compliance. Just Security noted that the government had not publicly disclosed the order, its reason, or its legal basis, and argued that one-off private directives leave industry and the public in the dark.

That is exactly the problem.

When the rules are unclear, enforcement becomes arbitrary. When enforcement is arbitrary, innovation slows down. Companies hesitate. Investors hesitate. Customers hesitate. Builders hesitate. And the advantage shifts from the best ideas to the best access.

We should learn from China, not imitate it.

China has taken a much more aggressive approach to AI regulation, including rules around recommendation algorithms, synthetic media, chatbots, content controls, and public-facing generative AI services. Carnegie has described China as rolling out some of the world’s earliest and most detailed AI regulations. Reuters also reported that China has proposed additional rules for AI services that simulate human personalities, including lifecycle safety responsibilities, algorithm review, data security, personal information protection, and strict content limits.

Some of that may sound reasonable on paper. But the broader lesson is that heavy, permission-based systems create friction. TIME reported that analysts have said Chinese AI developers face higher compliance burdens, and that content filters and aggressive fine-tuning can water down capabilities.

That friction helps explain why America should not voluntarily adopt the same instinct, even in a softer form.

America’s advantage has never been that Washington picks the winners. America’s advantage has been competition, openness, speed, experimentation, entrepreneurship, and the ability of outsiders to build things the establishment did not see coming.

So yes, test frontier AI models before they go to market.

Yes, require serious cybersecurity evaluations.

Yes, require documentation, red teaming, incident reporting, and post-release monitoring.

Yes, protect critical infrastructure and national security.

But do not turn frontier AI into a government-approved guest list.

A better policy would say:

Define the risk threshold. Test against the threshold. Require fixes when a model fails. Allow release when it passes. Publish the general rules. Keep emergency restrictions narrow and time-limited. Do not let political access become a competitive advantage.

That is how we protect safety without choking progress.

The government should be the referee that enforces clear rules, not the bouncer deciding who gets into the club.

Because once innovation requires permission, progress starts slowing down. And if America starts copying the same gatekeeping instincts that hold back more controlled systems like China’s, we should not be surprised when we begin giving away the advantage that made us the leader in the first place.

Safety matters.

But progress should not need a permission slip.

Originally published by Greg Pliler on LinkedIn on June 26, 2026.