
Anthropic's Project Deal: AI Agents Closed 186 Trades Without Human Help

Anthropic's marketplace experiment reveals how Claude AI agents autonomously negotiate deals, with Opus models outperforming Haiku by roughly $3 per transaction.

AI agents · agentic commerce · Anthropic · Claude · autonomous systems

Last week, Anthropic published results from one of the most fascinating AI experiments I have seen this year. In "Project Deal," the company created a classifieds-style marketplace where 69 employees let Claude AI agents negotiate and close deals on their behalf. No human intervention. Real money. Real goods changing hands.

Anthropic Project Deal AI agent marketplace experiment

How the Experiment Worked

The structure was straightforward but revealing. Anthropic ran four parallel marketplace sessions using Slack channels. Two runs gave everyone Claude Opus 4.5 as their negotiating agent. The other two mixed Opus and Haiku 4.5 models, randomly assigning participants to each.

Each employee started with a $100 budget. Claude interviewed them about items they wanted to sell, their asking prices, and how aggressively they wanted to negotiate. From there, the AI agents took over completely. They posted listings, initiated conversations, haggled over prices, and closed deals.

The items traded ranged from practical to absurd: snowboards, folding bikes, lab-grown rubies, artwork, and even a "doggy date" for someone's dog. One person bought 19 ping pong balls as, according to their agent, "a gift to Claude."

The Results: 186 Deals Worth $4,000

Across all sessions, Claude agents completed 186 transactions totaling just over $4,000 in real value. The median item sold for $12, with a mean price of $20.05. More than 500 items were listed for sale.

What makes this significant is not the transaction volume. It is that AI agents conducted an entire commercial cycle autonomously: discovery, negotiation, agreement, and exchange. This is the first public demonstration I have seen of agent-to-agent commerce at meaningful scale.

Opus Outperformed Haiku Significantly

Here is where it gets interesting for practitioners. In the mixed-model sessions, Opus consistently outperformed Haiku:

  • Opus users completed roughly 2 more deals on average
  • When selling, Opus agents earned $2.68 more per item
  • When buying, Opus agents saved $2.45 per item
  • The same broken bike fetched $38 with a Haiku seller but $65 with an Opus seller
  • A lab-grown ruby sold for $35 through Haiku versus $65 through Opus

The performance gap was statistically significant across multiple metrics. Opus models negotiated better outcomes regardless of whether they were buying or selling.
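Anthropic did not publish its testing methodology, but a difference like the one above is typically checked with something like a permutation test on per-item prices. Here is a minimal sketch using synthetic, illustrative numbers (not Anthropic's data): shuffle the pooled prices many times and count how often a random split produces a mean gap as large as the observed one.

```python
import random

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.
    Returns the fraction of random relabelings whose mean gap is
    at least as large as the observed one (an approximate p-value)."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        res_a, res_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(res_a) / len(res_a) - sum(res_b) / len(res_b))
        if diff >= observed:
            count += 1
    return count / n_resamples

# Synthetic per-item sale prices for illustration only
opus_prices = [65, 65, 40, 28, 22, 18, 15, 12]
haiku_prices = [38, 35, 25, 14, 12, 10, 8, 6]
p_value = permutation_test(opus_prices, haiku_prices)
```

A small p-value here would suggest the price gap is unlikely to be a fluke of which items landed with which model; with a few hundred real deals, as in Project Deal, the test gains far more power than this toy sample.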

The Invisible Inequality Problem

The most troubling finding was not about the gap itself. It was that Haiku users had no idea they were disadvantaged.

When surveyed after the experiment, participants rated deal fairness identically regardless of which model represented them. Opus users rated fairness at 4.05 on a 7-point scale. Haiku users rated it 4.06. Satisfaction scores showed no statistical difference either.

This creates a genuine concern for the agentic future we are building. If weaker AI agents systematically deliver worse outcomes, but users cannot perceive the difference, we have a new form of invisible inequality. People will not know to demand better representation because they will not realize they are being outmaneuvered.

Aggressive Tactics Did Not Matter

One counterintuitive result: participants who asked their agents to negotiate aggressively saw no benefit. Whether they requested hard-nosed tactics or gentle approaches, the outcomes were statistically equivalent.

This contradicts some economic research suggesting that negotiation style significantly affects results. In AI-to-AI commerce, model capability appears to matter far more than negotiation strategy.

What This Means for Enterprise AI

For organizations planning to deploy AI agents for procurement, sales, or any transactional work, Project Deal offers three practical lessons.

First, model selection matters for commercial outcomes. The gap between Opus and Haiku translated to real dollars per transaction. Over thousands of deals, that compounds into significant value.

Second, humans cannot reliably audit agent performance through satisfaction alone. If your team reports that the AI purchasing agent "seems to be doing fine," that feedback may be meaningless. You need objective benchmarks comparing prices, terms, and completion rates against market standards.
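What would an objective benchmark look like in practice? A minimal sketch, under my own assumptions (the `Deal` record and `market_price` field are hypothetical, standing in for whatever comparable-sales data your organization has): track completion rate and the average percentage gap between what the agent achieved and an independent market reference.

```python
from dataclasses import dataclass

@dataclass
class Deal:
    item: str
    agent_price: float   # price the agent achieved
    market_price: float  # independent benchmark, e.g. comparable recent sales
    closed: bool

def audit(deals, selling=True):
    """Return (completion rate, mean % gap vs. market benchmark).
    A positive gap is favorable when selling (sold above market);
    when buying, the sign is flipped so favorable stays positive."""
    closed = [d for d in deals if d.closed]
    completion_rate = len(closed) / len(deals) if deals else 0.0
    gaps = [(d.agent_price - d.market_price) / d.market_price for d in closed]
    if not selling:
        gaps = [-g for g in gaps]
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return completion_rate, avg_gap

# Illustrative records, not data from the experiment
deals = [
    Deal("snowboard", 65.0, 50.0, True),
    Deal("folding bike", 38.0, 55.0, True),
    Deal("lab-grown ruby", 0.0, 40.0, False),
]
rate, gap = audit(deals, selling=True)
```

Even two numbers like these are more informative than a satisfaction survey: Project Deal showed that users' fairness ratings were identical across models that delivered measurably different outcomes.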

Third, the infrastructure for agent commerce is closer than many assume. Anthropic built a functional marketplace in Slack using standard tools. Any organization with API access could construct something similar. The technical barriers are low. The governance and oversight questions are the hard part.
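To make the "low technical barriers" point concrete, here is a toy alternating-offers haggle, my own sketch rather than anything Anthropic described: each side concedes a fixed fraction of the remaining gap, bounded by the seller's floor and the buyer's ceiling, until the offers nearly meet. A real deployment would wrap model calls and a messaging layer (such as Slack) around a loop of roughly this shape.

```python
def negotiate(seller_floor, buyer_ceiling, ask, bid, rounds=10):
    """Toy alternating-offers protocol: each round, seller and buyer
    each concede 20% of the remaining gap. Settle at the midpoint
    once offers are within $1; return None if no deal in time."""
    for _ in range(rounds):
        if ask - bid <= 1:
            return round((ask + bid) / 2, 2)
        ask = max(seller_floor, ask - 0.2 * (ask - bid))  # seller concedes
        bid = min(buyer_ceiling, bid + 0.2 * (ask - bid))  # buyer concedes
    return None

price = negotiate(seller_floor=40, buyer_ceiling=70, ask=80, bid=30)
```

The hard part, as the post argues, is not this loop; it is the governance around it: who audits the outcomes, and against what benchmark.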

Looking Forward

Project Deal is an experiment, not a product launch. But it demonstrates that autonomous agent commerce is technically feasible today. The question is no longer whether AI agents can conduct business independently. It is whether we can build systems that ensure fair outcomes when they do.

For those of us working on AI strategy in the Gulf region, this has direct relevance. As government and enterprise AI deployments accelerate across the UAE and Saudi Arabia, the infrastructure for agentic commerce will follow. Understanding these dynamics now, before widespread adoption, gives us the opportunity to design systems that prevent invisible disparities rather than entrench them.

The experiment also raises a question I keep returning to: as AI agents become more capable negotiators, what role remains for human judgment in routine commercial transactions? Project Deal suggests the answer may be "less than we expect."
