
Thinking Machines Lab Unveils Real-Time AI Interaction Models

Mira Murati's startup introduces interaction models that eliminate turn-taking, processing conversation in 200-millisecond micro-turns with roughly 0.4-second response latency.

Tags: interaction models · real-time AI · multimodal AI · Mira Murati

Mira Murati, the former CTO of OpenAI, has been relatively quiet since leaving the company in 2024. That changed this week when her startup, Thinking Machines Lab, unveiled what they call "interaction models," a fundamentally new approach to human-AI conversation that eliminates the awkward turn-taking we have all grown accustomed to.

Thinking Machines Lab demonstrates real-time AI interaction models

The Problem With Current AI Conversations

Every AI assistant today, whether it is ChatGPT, Claude, or Gemini, follows the same basic pattern: you speak or type, the AI processes your input, then it responds. This sequential approach feels unnatural because real human conversations do not work that way. We interrupt each other, we respond with "mmhmm" while still listening, and we adjust our responses based on real-time feedback from facial expressions and tone.

Thinking Machines Lab argues that this turn-based limitation is not just a user experience problem. It fundamentally constrains what AI assistants can do. How can an AI be a true collaborator if it cannot react to your confusion mid-sentence or notice that you are looking at the wrong part of the screen?

A Split Architecture for Real-Time Response

The technical approach is clever. Instead of one model trying to do everything, interaction models split the work between two components. The interaction model stays constantly connected to the user, handling the real-time stream of audio, video, and text. Meanwhile, a background model handles reasoning, tool use, and complex tasks asynchronously.

This split enables what Thinking Machines calls "time-aligned micro-turns": the system processes 200-millisecond chunks of continuous input and output rather than waiting for you to finish speaking. The result is a 0.40-second response latency, roughly matching the pace of natural human conversation.
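The two-component design described above can be sketched as a lightweight front-end loop paced by 200-millisecond micro-turns that delegates heavy work to an asynchronous background worker. This is an illustrative assumption of how such a split could be wired, not Thinking Machines' actual implementation; the queue-based hand-off, the `needs_reasoning` flag, and the backchannel strings are all invented for the sketch.

```python
import asyncio
from dataclasses import dataclass

CHUNK_MS = 200  # micro-turn duration cited in the article


@dataclass
class Chunk:
    """One micro-turn of user input (hypothetical representation)."""
    text: str
    needs_reasoning: bool = False  # would this chunk trigger tool use?


async def background_model(tasks: asyncio.Queue, results: asyncio.Queue):
    """Heavy component: reasoning and tool use, run asynchronously."""
    while True:
        task = await tasks.get()
        if task is None:  # shutdown signal
            return
        await asyncio.sleep(0.01)  # stand-in for slow reasoning
        await results.put(f"answer:{task}")


async def interaction_model(chunks, tasks, results):
    """Light component: reacts to every micro-turn immediately and
    surfaces background answers as soon as they are ready."""
    replies = []
    for chunk in chunks:
        if chunk.needs_reasoning:
            await tasks.put(chunk.text)  # delegate without blocking
            replies.append("mmhmm")      # instant backchannel
        else:
            replies.append(f"ack:{chunk.text}")
        # Merge any finished background answers without waiting.
        while not results.empty():
            replies.append(results.get_nowait())
        await asyncio.sleep(CHUNK_MS / 1000)  # pace of incoming micro-turns
    await tasks.put(None)
    while not results.empty():  # flush late answers
        replies.append(results.get_nowait())
    return replies


async def main():
    tasks, results = asyncio.Queue(), asyncio.Queue()
    stream = [
        Chunk("hi"),
        Chunk("what time is it?", needs_reasoning=True),
        Chunk("ok"),
    ]
    worker = asyncio.create_task(background_model(tasks, results))
    replies = await interaction_model(stream, tasks, results)
    await worker
    return replies


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The key property the sketch captures is that the front-end never blocks on the slow component: it acknowledges every chunk within the micro-turn budget and splices in the background answer whenever it lands.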

Benchmarks That Actually Matter

The company introduced three new evaluation frameworks to measure capabilities that existing benchmarks cannot capture. TimeSpeak tests whether the model can answer questions like "what time is it?" while conversing (their model achieves 64.7% accuracy versus 4.3% for competitors). CueSpeak evaluates response to conversational cues and interruptions. Visual proactivity tests measure whether the model can notice and react to visual changes without being explicitly asked.

On the more established FD-bench for interaction quality, their model (TML-Interaction-Small, a 276-billion-parameter mixture-of-experts system with 12 billion active parameters) scores 77.8, dramatically outperforming competitors that range from 39 to 54.3.

Practical Implications for the UAE

For organizations in the UAE and the broader Middle East working on AI applications, this development signals an important shift. Customer service automation, telemedicine consultations, and educational AI tutors could all benefit from more natural, real-time interaction. The current generation of AI assistants often feels like talking to a very smart answering machine. Interaction models could make AI feel more like talking to a knowledgeable colleague who actually pays attention.

The split architecture also has implications for edge deployment. By separating the lightweight interaction component from the heavier reasoning component, organizations could potentially run the responsive front-end locally while offloading complex processing to cloud infrastructure.

What Comes Next

Thinking Machines Lab is currently running a limited research preview. A broader release is planned for later in 2026. The company has not disclosed pricing or API availability details, but given Murati's experience scaling AI products at OpenAI, I would expect a developer-friendly approach.

The timing is interesting. OpenAI recently released GPT-5.5 Instant with improved conversational tone and reduced latency. Google continues to iterate on Gemini's multimodal capabilities. But neither has tackled the fundamental turn-taking problem that Thinking Machines is addressing.

Whether interaction models represent the next paradigm shift in AI or a specialized capability for specific use cases remains to be seen. But for anyone building products that depend on natural human-AI dialogue, this is worth watching closely.
