The year companies stop building AI agents and start running them
This will be the year enterprises shift their attention away from building AI agents and toward the harder, more consequential work of operating them safely, at scale, and inside real business systems, according to one IBM expert.

That assessment comes from Maryam Ashoori, Vice President of Product and Engineering at IBM watsonx, who told IBM Think in an interview that the market has moved through several fast cycles since generative AI first entered corporate life. Early enthusiasm gave way to frustration, followed by a surge of interest in agents, and is now settling into a more sober phase focused on control, visibility and governance.

Just three years ago, in 2023, she said, most companies treated generative AI as an exploratory investment. Teams experimented with models and prompts, but struggled to move beyond pilots. The value, when it appeared, was concentrated in a narrow set of use cases such as summarization, classification, question answering, information extraction, code generation and content creation.

“When organizations had a clear, production-relevant use case in those categories, the value was immediate,” she said. “Without that clarity, the investment generated insight, but not measurable business outcomes.”

In early 2024, large language models gained the ability to take actions by calling application programming interfaces. The capability, now commonly described as agentic AI, allowed models to interact directly with enterprise software and legacy systems.

The reaction inside large organizations was immediate. “The enterprise market saw this as an opportunity to drive generative AI acceleration into every corner of the business,” Ashoori said.

Chief information officers began asking for agents, sometimes without a clear sense of what those agents should actually do. The emphasis was on building quickly.
In many cases, speed won out over structure. By late 2025, Ashoori said, enterprises found themselves with dozens or even hundreds of agents running across different platforms. Some were built by developers. Others were built by business users. Many came from different providers and operated under different assumptions.

“You can build an agent in less than five minutes,” she said. “The problem is what happens after that.”
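The five-minute build is plausible because the core of an agent is only a short loop: a model chooses an action, a tool runs, and the result feeds the next decision. A minimal sketch follows, with a stubbed decision function standing in for a real model call and a hypothetical tool name; everything that comes after this loop — tracing, governance, security — is the hard part Ashoori describes.

```python
# Minimal agent loop: a model picks a tool, the tool runs, the result feeds
# the next decision. The "model" is a stub; in practice this is an LLM call.

def stub_model(request: str, context: list[str]) -> dict:
    """Pretend model decision: search the manuals first, then answer."""
    if not context:
        return {"action": "search_manuals", "input": request}
    return {"action": "answer", "input": context[-1]}

# Hypothetical tool registry -- one illustrative tool
TOOLS = {
    "search_manuals": lambda q: f"manual entry for: {q}",
}

def run_agent(request: str) -> str:
    context: list[str] = []
    while True:
        decision = stub_model(request, context)
        if decision["action"] == "answer":
            return decision["input"]
        tool = TOOLS[decision["action"]]
        context.append(tool(decision["input"]))

print(run_agent("My camera is not working"))
# → manual entry for: My camera is not working
```

The loop itself is trivial to write, which is exactly the point: building is the easy half of the problem.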
As agents multiplied, so did their risks. Ashoori said agents inherit the limitations of the models that underpin them, and those limitations are amplified when systems are allowed to act rather than simply respond.

Hallucinations, she said, are a familiar problem at the model layer. At the agent layer, they can become operational failures.

“If the model hallucinates and takes the wrong tool,” she said, “and that tool has access to unauthorized data, then you have a data leak.”

That possibility has pushed enterprises into a new phase of concern. The question is no longer how quickly an agent can be built, but whether it can be trusted once it is connected to sensitive systems.

The focus, she said, has shifted from build time to run time. Companies are moving from experimentation into production and discovering that managing agents is more complex than creating them.

“What enterprises are dealing with now is managing and governing a collection of agents,” she said. “That has become an issue.”
At the center of this shift is observability, a concept Ashoori said many enterprises underestimated during the initial rush to deploy agents.

Agents do not perform a single action, Ashoori explained. They break a request into a series of steps, each involving a model decision and often a call to an external tool or data source. Understanding that sequence is essential once something goes wrong.

To explain the idea, she described a hypothetical customer support interaction. A user tells an agent that their camera is not working. The agent first searches internal manuals connected through a retrieval system. If it finds an answer, it responds with references. If it does not, it may call an online search tool or look for answers in public forums.

Each of those actions is a decision point. If the agent surfaces incorrect or inappropriate information, enterprises need to know exactly how that result was produced.

“This is what we call tracing,” she said.

Tracing records every action an agent takes. According to Ashoori, that includes which model was involved, which tool was called, what the inputs and outputs were, how long each step took, and what it cost. Observability, she said, is the ability to surface, aggregate and analyze those traces in a way that supports auditability and improvement.

“With tracing, you can go back and see exactly what happened,” she said.

The visibility serves multiple purposes. It allows enterprises to investigate failures and demonstrate compliance. It also exposes opportunities for optimization, such as identifying slow steps, expensive model calls or inefficient workflows.

“You can see where the latency was caused,” she said. “You can see where the cost was triggered, and you can replace that model.”

Despite the stakes, Ashoori said adoption of observability remains low.
She cited figures showing that only about 19% of organizations currently focus on observability and monitoring in production, even as the risks and costs increase.

Looking ahead, she said the gap will close quickly. She pointed to Gartner estimates suggesting that by 2028, roughly one-third of interactions with generative AI systems will occur through agents.

“Agents are going to be everywhere,” she said.

Observability, however, is only one part of the challenge. Ashoori said the next phase of agent adoption will also be shaped by policy enforcement and security.

As agents gain autonomy, defining responsibility becomes harder. If something goes wrong, it is not always clear who is accountable. The builder, the model provider, the tool involved and the end user may all play a role.

“The security of agents is a very hot topic,” she said.

She said enterprises are beginning to set non-negotiable policies to govern agent behavior, particularly in highly regulated environments. Some of those policies come from risk and compliance teams. Others come from security professionals concerned with identity, access and responsibility.

There are still many open questions, she said, especially around who owns an agent’s actions and how responsibility should be assigned when autonomy is involved.

“There’s a certain level of autonomy associated with agents that has security consequences,” she said.

Fragmentation makes the problem harder. Agents are built on different platforms and frameworks, both inside and outside enterprises. That fragmentation, she said, is driving interest in governance approaches that are agnostic to how agents are created or where they run.

From her perspective, best practice involves separating the systems that build agents from those that govern them, enabling enterprises to monitor, evaluate and optimize agent behavior regardless of origin.
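The separation Ashoori recommends — governance decoupled from how an agent was built — can be sketched as a wrapper that enforces non-negotiable policies before any agent is allowed to act. The policy (a tool allow-list) and the agent function here are hypothetical; the point is that the guard does not care which framework produced the agent:

```python
# Sketch of framework-agnostic policy enforcement: a governance layer that
# checks non-negotiable rules before any agent, however it was built, may act.
# The allow-list policy and tool names are illustrative assumptions.

from typing import Callable

class PolicyViolation(Exception):
    pass

def govern(agent_act: Callable[[str, str], str],
           allowed_tools: set[str]) -> Callable[[str, str], str]:
    """Wrap an agent's tool-calling function with a policy check."""
    def guarded(tool: str, payload: str) -> str:
        if tool not in allowed_tools:
            raise PolicyViolation(f"tool '{tool}' not permitted")
        return agent_act(tool, payload)
    return guarded

# Any agent, from any platform, reduces to "call a tool with a payload"
def some_agent_act(tool: str, payload: str) -> str:
    return f"{tool} handled: {payload}"

guarded = govern(some_agent_act, allowed_tools={"internal_search"})

print(guarded("internal_search", "camera issue"))   # permitted
try:
    guarded("customer_database", "export all records")
except PolicyViolation as e:
    print("blocked:", e)                            # policy stops the action
```

Because the guard sits between the agent and its tools, the same enforcement point also gives risk and compliance teams one place to audit, whatever mix of platforms produced the agents behind it.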