
When one AI model isn’t enough

Database Expert
March 10, 2026
5 min read
#Artificial Intelligence #IT automation #Security

Artificial intelligence is learning to delegate. For the past few years, the story of AI has been about making individual models bigger, smarter and faster. Now, a different idea is gaining ground: that the real power might come not from a single brilliant model, but from many models working together, each doing what it does best. Think: less lone genius, more well-run team.

Perplexity Computer, launched recently, is the latest expression of that idea. Instead of routing a complex task to one AI model and hoping for the best, the system breaks the job into pieces and sends each piece to a specialized AI agent, a software program designed to take actions and make decisions on its own. The agents work simultaneously, typically in the cloud rather than on a user’s own machine, and report back when done.

“The space around orchestration and building something that can manage a team is getting a bit crowded,” Aaron Baughman, an IBM Distinguished Engineer and Master Inventor, said on a recent episode of the Mixture of Experts podcast.

The reason so many researchers and companies are chasing this idea, Baughman said, comes down to two stubborn limits that even the most powerful AI models run into. The first is speed: a single model will work through a long, complicated task one step at a time, like a solo surgeon performing every role in an operating room at once. The second is memory: every AI model has a context window, a limit on how much information it can hold in its working memory at once. Push a model past that limit and it starts to lose the thread, forgetting earlier instructions and making mistakes that compound.

“If coordinated correctly and the task can be parallelized, meaning split into simultaneous streams, N agents can complete the task N times as fast as a single agent alone,” Eugene Vinitsky, an Assistant Professor at New York University, said in an interview with IBM Think.
“As the length of the input to an agent grows, its performance degrades accordingly, and it can start to forget things or fail to execute its task. Spawning things to sub-agents with dedicated roles can be useful for wringing the best performance out of agents.”

There is a diagnostic benefit, too, according to Niranjan Balasubramanian, an Assistant Professor of Computer Science at Stony Brook University. When a single model makes a mistake in the course of completing a long task, finding and fixing the error can be like searching a room for a lost key in the dark. Distributed systems make the problem smaller and the solution cleaner.

“The partitioning of roles across the agents actually allows for effective debugging and analysis of failure modes,” Balasubramanian said in an interview with IBM Think. “Getting into multi-agent systems not only has immediate computational and modularity benefits. It is where I believe systems development is: AI as services.”

The approach draws on a design principle that has been central to software engineering for decades, Balasubramanian said: complex systems work better when they are built from smaller, independent parts that each do one thing well, a principle called “modularity.” “Specialized AI models working together naturally mirrors this need when building complex workflows,” he said.
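The two ideas in play here, parallel fan-out and role-based partitioning, can be sketched in a few lines of Python. This is a toy illustration, not any vendor’s implementation: the roles and the agent functions are invented, standing in for calls to real AI models.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical role-specific "agents": ordinary functions standing in
# for calls to specialized AI models. All names here are illustrative.
AGENTS = {
    "research": lambda task: f"notes on {task}",
    "draft":    lambda task: f"draft of {task}",
    "review":   lambda task: f"review of {task}",
}

def orchestrate(task: str) -> dict[str, str]:
    # Fan the task out to the role-specific agents simultaneously,
    # keeping each result filed under its role name so a bad output
    # can be traced straight back to the agent that produced it.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        futures = {role: pool.submit(agent, task) for role, agent in AGENTS.items()}
        return {role: f.result() for role, f in futures.items()}

results = orchestrate("quarterly summary")
```

The per-role results dictionary is the modularity benefit Balasubramanian describes in miniature: when one entry looks wrong, only one agent needs debugging.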

None of this fully explains why agent systems captured the popular imagination when they did. Gabe Goodhart, Chief Architect of AI Open Innovation at IBM, pointed to something that sounds almost comically mundane. What changed the conversation, he said on Mixture of Experts, was not a technical breakthrough, but the addition of a for loop, a basic programming instruction that tells a system to repeat a task, and cron jobs, scheduled commands that fire automatically at set times. “That had a self-improvement aspect to it,” Goodhart said. “It actually tried to curate almost a persona for your bot. That part may not have had a whole lot of utility to it, but it certainly started capturing the imagination and personified these things in a way that just having a long-running agent doesn’t.”

Giving an AI system a personality, it turns out, is more persuasive than giving it a benchmark. Perplexity Computer manages long-running tasks by routing them to sub-agents, Goodhart said, but what made early agent systems resonate with users was something it does not appear to replicate. “I think the kernel of what Perplexity Computer is trying to do is manage extremely long-running tasks that are very high-order in nature,” he said.
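The mechanics Goodhart describes really are that simple. A minimal sketch, with the agent step and its memory invented for illustration: a plain loop repeats the agent’s task (a cron scheduler would trigger the same step on a real clock), and each pass records what it did so the next pass can build on it.

```python
# Minimal sketch of the "for loop" idea: a loop repeats an agent's
# task, and each pass leaves a note the next pass can build on.
# The agent step and its memory are hypothetical, for illustration.

def agent_step(memory: list[str], tick: int) -> None:
    # One pass of the agent: act, then record the run so later
    # passes can refer back to it (the "self-improvement" aspect).
    memory.append(f"run {tick}: built on {len(memory)} prior runs")

memory: list[str] = []
for tick in range(3):       # a cron job would fire this on a schedule
    agent_step(memory, tick)
```

Nothing here is sophisticated, which is Goodhart’s point: repetition plus accumulated state is enough to make a system feel like it has a life of its own.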

Years of academic work underpin these systems, Baughman said. Work on deep learning frameworks for optimal agent selection, systems that decide which AI model is best suited to a given subtask, and on the theory of manager agents that coordinate others has been developing in academic literature for years. “Lots of this work is built around that, which has been years in development,” he said. “That gives us a foundation within science and engineering.”

That research foundation has not protected any product from overpromising, Baughman noted. Perplexity describes its new system as enabling “fully autonomous project management” and “broad accessibility.” This is a pattern familiar to anyone who has watched a technology cycle play out.

“What they’re really doing is trying to build you an end-to-end agentic platform where all of these elements come together,” said Chris Hay, a Distinguished Engineer at IBM, on the podcast. The vision is coherent. Whether any current product fully delivers it is a different question.
