Stanford's DeLM Framework Cuts Multi-Agent AI Task Costs by 50% Through Decentralized Coordination
Stanford University researchers have introduced DeLM, a decentralized language model framework that lets AI agents coordinate directly without a central orchestrator. The approach challenges traditional multi-agent systems, which often rely on a main agent to manage tasks and information flow, creating bottlenecks and inefficiencies. DeLM uses parallel agents, a shared context of compact, verified summaries ("gists"), and a task queue to support direct communication and collaborative problem-solving. The framework scored 10.5% higher than the strongest baseline on SWE-bench Verified while cutting cost per task by approximately 50%. It also showed superior accuracy on long-context reasoning benchmarks such as LongBench-v2 Multi-Doc QA compared to leading models, suggesting a more efficient and robust way to scale AI reasoning tasks.

A new framework developed by Stanford researchers, called a decentralized language model (DeLM), aims to reduce the costs and improve the efficiency of multi-agent artificial intelligence (AI) tasks by eliminating the need for a central orchestrator. This system allows AI agents to coordinate directly, bypassing the traditional model where a single "boss" agent manages all communication and task allocation.
Traditional centralized multi-agent systems often run into communication and integration bottlenecks. In these setups, a main agent divides tasks, assigns them to sub-agents, and then merges and summarizes their progress. Researchers Yuzhen Mao and Azalia Mirhoseini note that as the number of subtasks grows, this central controller becomes overloaded and can dilute or distort crucial information. The result can be slower coordination, repeated rounds of iteration, and less overall progress, particularly in scenarios that require long-context reasoning.
DeLM addresses these challenges through a design centered on parallel agents, a shared context, and a task queue. The shared context acts as a curated repository of "gists," which are compact, verified summaries of findings, partial results, and documented failures. Agents can independently claim tasks from the queue and directly access this shared knowledge base. This allows them to build on prior work, avoid redundant efforts, and focus on unresolved issues without routing every interaction through a central controller.
The system's pipeline starts by loading work units into a queue. Agents then execute tasks in parallel while reading from the shared context. Each result is compressed into a reusable gist, verified for accuracy, and added to the shared context. If further work is needed after the queue is empty, the last agent to return an answer inspects the shared context to determine the next steps before providing a final answer.
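The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the names `SharedContext`, `run_task`, `compress`, `verify`, and `run_pipeline` are hypothetical, the LLM call is replaced by a placeholder, and the final inspection step by the last agent is not modeled.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Thread-safe store of verified gists (hypothetical structure)."""
    gists: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def read(self):
        # Return a copy so agents never mutate shared state directly.
        with self._lock:
            return list(self.gists)

    def add(self, gist):
        with self._lock:
            self.gists.append(gist)


def run_task(task, known_gists):
    # Placeholder for an LLM call that would condition on the shared gists.
    return f"result for {task} (saw {len(known_gists)} gists)"


def compress(task, result):
    # Placeholder: reduce a full result to a compact summary.
    return f"{task}: done"


def verify(gist):
    # Placeholder check; a real verifier would test the claim itself.
    return bool(gist.strip())


def agent(tasks, context):
    """Claim tasks from the queue until it is empty; no orchestrator involved."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        result = run_task(task, context.read())
        gist = compress(task, result)
        if verify(gist):
            context.add(gist)  # only verified gists are shared


def run_pipeline(work_units, n_agents=3):
    tasks = queue.Queue()
    for unit in work_units:
        tasks.put(unit)
    context = SharedContext()
    workers = [
        threading.Thread(target=agent, args=(tasks, context))
        for _ in range(n_agents)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return context.read()
```

Calling `run_pipeline(["parse logs", "reproduce bug"])` returns the gists the agents contributed. The order depends on thread scheduling, which reflects the design: agents pull work and publish findings independently rather than waiting for a coordinator.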
DeLM has shown promising results on real-world benchmarks. On SWE-bench Verified, which evaluates AI models on solving software engineering problems, DeLM performed 10.5% better than the strongest baseline and reduced cost per task by roughly 50%. On the LongBench-v2 Multi-Doc QA benchmark, which assesses long-context reasoning, it also achieved the highest accuracy across several leading large language models, including GPT-5.4, Claude Sonnet, Gemini Flash, and DeepSeek-V4-Pro.
The framework's effectiveness stems from several factors. Agents in DeLM share documented failures, which keeps others from pursuing the same unproductive paths. Verified constraints are added to the shared context immediately, becoming binding shared state that guides subsequent agent actions. DeLM also balances comprehensive information against cost by making shared progress "unfoldable." Agents see short gists by default but can open more detailed summaries or raw evidence when necessary, avoiding the long-context bottlenecks associated with sharing full traces.
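One way to picture an "unfoldable" gist is as a record with several levels of detail, where only the shortest level is placed in an agent's context unless the agent asks for more. The sketch below is an assumption about how such a structure could look; the `Gist` fields, the `view` helper, and `build_prompt_context` are hypothetical names, not DeLM's API.

```python
from dataclasses import dataclass

LEVELS = ("short", "detail", "evidence")


@dataclass(frozen=True)
class Gist:
    short: str               # default view shown to every agent
    detail: str              # longer summary, opened on demand
    evidence: str            # raw trace or log, opened on demand
    is_failure: bool = False  # documented dead end, shared so others skip it


def view(gist, level="short"):
    """Return one level of a gist; 'short' is the default."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return getattr(gist, level)


def build_prompt_context(gists, unfold=()):
    """Render gists for an agent, expanding only the indices in `unfold`."""
    lines = []
    for i, gist in enumerate(gists):
        tag = "FAILED" if gist.is_failure else "OK"
        level = "detail" if i in unfold else "short"
        lines.append(f"[{tag}] {view(gist, level)}")
    return "\n".join(lines)
```

For example, an agent that wants to know why an earlier attempt failed could call `build_prompt_context(gists, unfold={1})` to expand only the second gist, keeping the rest of its context short.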
This decentralized approach aims to make agentic tasks more efficient by preventing repeated analysis, more effective by propagating useful findings across parallel threads, and more robust by sharing only verified claims. For enterprise builders, DeLM suggests that decentralized multi-agent workflows can be faster, more accurate, and more cost-effective than their centralized counterparts.
According to VentureBeat, Yuzhen Mao, a co-developer of the framework, detailed the reasons for DeLM's outperformance.
