The gap between training AI models and deploying them safely in production environments is widening. White Circle, a London-based startup, just raised $11 million in seed funding to address exactly this problem: giving enterprises real-time control over what AI systems actually do after deployment.

The Problem with Post-Training Safety
AI labs spend considerable resources on safety training before releasing models. But once those models enter enterprise workflows, the controlled conditions of the lab give way to messy reality. Employees find creative prompts that bypass guidelines. Autonomous agents take unexpected actions. Sensitive data leaks through innocuous-sounding queries. Biases emerge in high-stakes decisions that were never tested during development.
Denis Shilov, White Circle's founder, discovered this gap firsthand in late 2024 when he found a universal jailbreak prompt that could bypass safety filters across leading AI models. That experience led him to a core insight: pre-deployment safety training, while necessary, is insufficient for production environments where the range of possible inputs and contexts is effectively unlimited.
As Shilov puts it: "Jailbreaks are just one part of the problem. In as many ways people can misbehave, models can misbehave too."
How White Circle Works
White Circle operates as a real-time enforcement layer positioned between users and AI models. Every input and output passes through their system, which examines both against company-specific policies. When something problematic is detected, the system can flag it for review, block it entirely, or trigger custom remediation workflows.
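In architectural terms, this is a policy-checking proxy wrapped around every model call. The sketch below is a minimal illustration of that pattern, not White Circle's actual API; every name in it (Verdict, PolicyResult, guarded_completion, the check callable) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Verdict(Enum):
    ALLOW = "allow"
    FLAG = "flag"    # pass through, but record for human review
    BLOCK = "block"  # refuse and substitute a safe response

@dataclass
class PolicyResult:
    verdict: Verdict
    reason: str = ""

def log_for_review(stage: str, text: str, reason: str) -> None:
    # Placeholder for a remediation workflow (audit log, ticket, alert).
    print(f"[review] {stage} flagged: {reason}")

def guarded_completion(
    prompt: str,
    call_model: Callable[[str], str],      # the underlying LLM call
    check: Callable[[str], PolicyResult],  # the detection stack
) -> str:
    # 1. Screen the input before it ever reaches the model.
    inbound = check(prompt)
    if inbound.verdict is Verdict.BLOCK:
        return f"Request blocked: {inbound.reason}"
    if inbound.verdict is Verdict.FLAG:
        log_for_review("input", prompt, inbound.reason)

    # 2. Call the model only after the input passes.
    output = call_model(prompt)

    # 3. Screen the output before it reaches the user.
    outbound = check(output)
    if outbound.verdict is Verdict.BLOCK:
        return f"Response withheld: {outbound.reason}"
    if outbound.verdict is Verdict.FLAG:
        log_for_review("output", output, outbound.reason)
    return output
```

The important property is that enforcement happens outside the model: a blocked verdict stops the response regardless of what the model was persuaded to generate.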
The technical approach is notable for its flexibility. Rather than relying solely on static rules or keyword filters (which sophisticated prompts can easily circumvent), White Circle layers multiple detection methods; a sketch of that layering follows the list below. Their system can identify:
- Malware generation attempts: Requests designed to produce malicious code, even when disguised as legitimate development tasks
- Data exfiltration: Queries that could leak proprietary information, customer data, or trade secrets
- Scam enablement: Outputs that could facilitate fraud, phishing, or social engineering
- Biased decision-making: Patterns in autonomous agent behavior that systematically disadvantage certain groups
- Policy violations: Company-specific rules around topics, tone, or actions that the base model does not inherently respect
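To make the layering concrete, here is one plausible way to combine cheap pattern rules with a learned classifier, escalating to the strictest verdict any layer returns. It reuses the Verdict and PolicyResult types from the sketch above; fake_risk_score is a stub standing in for a real model-based classifier, and none of this is White Circle's actual implementation.

```python
import re

def fake_risk_score(text: str) -> float:
    # Stub: a real system would call a trained risk classifier here,
    # covering malware, exfiltration, scam, and bias categories.
    return 0.0

def rule_based_check(text: str) -> PolicyResult:
    # Cheap patterns catch the obvious cases before any model runs.
    if re.search(r"\b(ssh|api)[_ ]?key\b", text, re.IGNORECASE):
        return PolicyResult(Verdict.FLAG, "possible credential exposure")
    return PolicyResult(Verdict.ALLOW)

def classifier_check(text: str) -> PolicyResult:
    score = fake_risk_score(text)
    if score > 0.9:
        return PolicyResult(Verdict.BLOCK, f"risk score {score:.2f}")
    if score > 0.6:
        return PolicyResult(Verdict.FLAG, f"risk score {score:.2f}")
    return PolicyResult(Verdict.ALLOW)

SEVERITY = {Verdict.ALLOW: 0, Verdict.FLAG: 1, Verdict.BLOCK: 2}

def combined_check(text: str) -> PolicyResult:
    # Run every layer and keep the strictest verdict.
    results = [rule_based_check(text), classifier_check(text)]
    return max(results, key=lambda r: SEVERITY[r.verdict])
```

A prompt that slips past one layer still has to survive the others, which is what makes this arrangement harder to circumvent than a single keyword filter.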
The key distinction from built-in model safety is that White Circle enforces behavior rather than merely influencing it. A model's training creates tendencies and preferences. White Circle creates hard boundaries.
The Investor Signal
The seed round's investor list reads like a who's who of frontier AI development:
- Romain Huet, Developer Experience Lead at OpenAI
- Durk Kingma, cofounder of OpenAI, now at Anthropic
- Guillaume Lample, Cofounder and Chief Scientist at Mistral
- Thomas Wolf, Cofounder and Chief Science Officer at Hugging Face
- Ophelia Cai, Partner at Tiny VC
When the people building frontier models invest in tools to constrain those models, it signals something important about their assessment of deployment risks. These investors understand, perhaps better than anyone, that the models they create need external controls when operating in unstructured environments.
Why This Matters for Enterprise AI
The timing of White Circle's funding aligns with a broader shift in enterprise AI adoption. We are moving past the experimental phase where companies ran isolated pilots with limited exposure. Now, AI agents handle customer interactions, process financial transactions, make hiring recommendations, and access sensitive databases autonomously.
The regulatory environment is evolving in parallel. Singapore released its agentic AI governance framework earlier this year. The EU's AI Act is coming into full enforcement. Yale's Chief Executive Leadership Institute recently published a cross-industry governance framework identifying eight key variables for responsible agentic AI deployment. All of these frameworks emphasize post-deployment monitoring and intervention capabilities, exactly what White Circle provides.
For organizations in regulated industries like banking, healthcare, and government, the question is no longer whether they need runtime AI controls, but which solutions can actually deliver them without breaking existing workflows.
The Broader Landscape
White Circle enters a market that is becoming increasingly crowded. Existing approaches include model-level fine-tuning for safety, prompt injection detection systems, and output filtering based on content classifiers. Each has limitations: fine-tuning is expensive and inflexible, prompt detection is an arms race, and output filtering often creates false positives that degrade user experience.
What differentiates White Circle is the combination of real-time enforcement with policy customization. Enterprises do not need generic safety (they get that from the base model). They need safety tailored to their specific risk profile, compliance requirements, and operational context.
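As a rough illustration of what that per-company tailoring could look like, the hypothetical configuration below maps the detection categories to organization-specific actions. The schema is invented for this article, not White Circle's.

```python
# Hypothetical per-tenant policies: the same detectors,
# different actions depending on the deployment's risk profile.
BANK_POLICY = {
    "malware_generation": {"action": "block"},
    "data_exfiltration":  {"action": "block"},
    "scam_enablement":    {"action": "block"},
    "biased_decisions":   {"action": "flag", "notify": "compliance"},
    # A company-specific rule the base model knows nothing about:
    "topic_restrictions": {
        "action": "block",
        "topics": ["investment advice", "competitor pricing"],
    },
}

SUPPORT_BOT_POLICY = {
    # A lower-stakes customer-support bot flags more and blocks less.
    "malware_generation": {"action": "block"},
    "data_exfiltration":  {"action": "flag"},
    "topic_restrictions": {"action": "allow"},
}
```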
The 20-person team, distributed across London, Amsterdam, France, and other European locations, is primarily composed of engineers. This technical focus suggests a product-first approach: build something that works reliably at scale, then expand.
Looking Forward
The $11 million seed round positions White Circle to scale its platform while the market for enterprise AI safety tools is still defining itself. First movers in this space have an advantage: enterprises that implement a control layer become operationally dependent on it, creating switching costs.
The more interesting question is whether runtime safety controls will become a standard component of enterprise AI architecture, like firewalls became standard for network security. If the leaders building frontier models are investing in this approach, they seem to think so.
For AI practitioners deploying agents in production, the message is clear: training-time safety is the foundation, but runtime controls are what let you actually trust the system in the wild.