
The Parallel Agent Paradox: Why More AI in Your Dev Team Doesn't Automatically Mean More Productivity

Chloe Tan
Fintech Product Leader & Digital Banking Strategist

There is a new pattern spreading quietly through engineering floors across Silicon Valley, Singapore, and Bangalore. Open a developer’s terminal window and you are increasingly likely to see not one but three or four AI coding sessions running simultaneously—Claude Code working on a refactoring task in one pane, Codex CLI handling a test-writing job in another, and a third agent exploring whether a new library will actually work with the existing codebase.

The Pragmatic Engineer newsletter documented this shift in October 2025, calling it a “new trend: programming by kicking off parallel AI agents.” Software engineer Simon Willison, widely regarded as one of the most clear-eyed voices on AI development tools, wrote about “embracing the parallel coding agent lifestyle” that same month, describing how he now routinely fires off multiple Claude Code and Codex CLI instances against the same codebase.


For product managers overseeing engineering teams—especially in the rapidly evolving tech ecosystems of Southeast Asia—this trend raises an urgent question: should you be pushing your teams to adopt this approach? The honest answer is more nuanced than the enthusiasm in developer forums suggests.

The Seductive Logic of Parallel Agents

The appeal of parallel AI agents is intuitive. If a single AI agent can handle a research task, a maintenance fix, or a small feature in twenty to thirty minutes with no human interaction, why not run four at once? Willison identifies three practical categories where the approach shines: research and proof of concept work (answering “will this library do what we need it to do?”), small maintenance tasks (fixing deprecation warnings, updating dependencies), and carefully specified directed work where the developer has already thought through the implementation in detail.

The architectural insight here matters. Willison notes that reviewing code written from your own specification takes dramatically less effort than reviewing code built to a specification you did not write. Because the developer designed the task, they retain enough context to confirm correctness quickly: the bottleneck shifts from writing to reviewing, and review stays manageable.

This, combined with the fact that agents can work while the developer’s attention is elsewhere, creates a genuine multiplier. The Pragmatic Engineer makes a pointed observation: senior engineers, who already juggle code reviews across two to five workstreams, mentor juniors, and switch context dozens of times per day, are likely to be “naturals” at working with parallel AI agents precisely because their existing workflows mirror the skill set required.
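Running several agents against a single checkout invites them to overwrite each other's work in progress. One common isolation tactic is to give each agent its own git worktree: a separate working directory sharing the same repository history. A minimal sketch, where the function and the `agent/<name>` branch convention are illustrative rather than any tool's actual defaults:

```python
import subprocess
from pathlib import Path


def add_agent_worktree(repo: Path, name: str) -> Path:
    """Create a dedicated worktree and branch for one agent session.

    The worktree lands next to the main checkout (e.g. myrepo-tests),
    on a fresh branch named agent/<name>.
    """
    worktree = repo.parent / f"{repo.name}-{name}"
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add",
         str(worktree), "-b", f"agent/{name}"],
        check=True,
    )
    return worktree


# One isolated checkout per parallel task; point each agent CLI at its own
# directory so half-finished edits never collide.
# for task in ("refactor", "tests", "spike"):
#     add_agent_worktree(Path("~/code/myrepo").expanduser(), task)
```

Each worktree has its own branch and working directory, so an agent's in-progress edits never collide with another's; merging the surviving branches back is an ordinary review step.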

The Counterintuitive Finding That Should Make Every PM Pause

Before you update your team’s working practices based on developer enthusiasm alone, consider a landmark study published by Model Evaluation and Threat Research (METR) in July 2025.

METR recruited sixteen experienced developers—contributors to mature open-source repositories averaging over twenty-two thousand GitHub stars—and paid them $150 per hour to fix real issues from projects they knew well. Some were assigned AI tools (primarily Cursor Pro with Claude 3.5 and 3.7 Sonnet); others were not. Screens were recorded. Results were measured. The finding was striking:

“When developers use AI tools, they take 19% longer than without. AI makes them slower.”

More revealing still was the perception gap. Before using the tools, developers predicted AI would speed them up by 24 percent. After completing the work—still believing they had been faster—they estimated a 20 percent speedup. In reality, they had been slower. The study authors characterised this as evidence that “developers drastically overestimate the usefulness of AI on developer productivity, even after they have spent many hours using the tools.”

The Pragmatic Engineer’s detailed breakdown of the METR findings identifies where the time was lost: developers saved time on writing code and on research, but spent significantly more time prompting, waiting on AI responses, reviewing AI output, and managing IDE overhead. The net result was negative.

There was one notable exception. The single developer with more than fifty hours of prior Cursor experience completed work 38 percent faster than his non-AI peers—a result dramatically better than the group average. Simon Willison’s interpretation resonates: “This study mainly demonstrated that the learning curve of AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.”

What the Data Actually Tells Product Managers

The tension between the METR findings and the enthusiastic parallel-agent trend is not actually a contradiction. It is a signal about the conditions under which AI-assisted development delivers value—and those conditions have direct implications for how PMs should structure their teams.

The learning curve is real and it is steep. Developers who have not invested serious time in learning to use AI tools effectively will be slower, not faster, in the short term. Mandating adoption without giving the team time to learn the tools properly is likely to hurt sprint velocity before it helps.

The gains are concentrated in experienced users. The 38 percent speedup for the developer with 50+ hours of Cursor experience is not a freak result. That developer, Quentin Anthony, identified the core skill clearly: knowing which tasks are amenable to LLMs (writing tests, understanding unfamiliar code) and which are not (GPU kernels, complex synchronisation semantics). That judgment takes time to develop.

Measurement is broken. If developers believe they are 20 percent faster when they are actually 19 percent slower, standard productivity proxies—developer satisfaction surveys, story point velocity, self-reported estimates—are worse than useless when evaluating AI tooling. Teams need outcome-based measurement: time from task creation to merge, defect rates, review cycles.
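To make outcome-based measurement concrete, here is a minimal sketch of one such metric, time from task creation to merge, computed over hypothetical task records whose field names (`created_at`, `merged_at`) are illustrative, not any tracker's actual schema:

```python
from datetime import datetime, timedelta
from statistics import median


def median_time_to_merge(tasks):
    """Median hours from task creation to merge; unmerged tasks are ignored."""
    durations = [
        (t["merged_at"] - t["created_at"]) / timedelta(hours=1)
        for t in tasks
        if t.get("merged_at") is not None
    ]
    return median(durations) if durations else None


# Compare cohorts (AI-assisted vs. not) on the same metric, instead of
# relying on self-reported speedups the METR study showed to be unreliable.
tasks_with_ai = [
    {"created_at": datetime(2025, 7, 1, 9), "merged_at": datetime(2025, 7, 2, 9)},
    {"created_at": datetime(2025, 7, 3, 9), "merged_at": datetime(2025, 7, 4, 21)},
]
print(median_time_to_merge(tasks_with_ai))  # 30.0
```

The same pattern extends to defect rates and review-cycle counts: pull the numbers from the tracker, not from the survey.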

The real productivity unlock may be task composition, not raw speed. In January 2026, the Pragmatic Engineer published a first-person account of replacing a $120/year SaaS product in twenty minutes using Codex. The productivity gain there was not about doing an existing task faster—it was about making a previously uneconomical task (rebuilding a third-party dependency) suddenly viable. Product managers should be asking not just “are my engineers faster?” but “are my engineers tackling problems they would not have attempted before?”

Implications for Southeast Asia Engineering Teams

The regional context matters. Engineering teams across Singapore, Vietnam, and Indonesia are working in environments where the talent market is competitive and the pressure to demonstrate velocity is intense. There is a risk that parallel AI agents become a signalling tool—something teams claim to use because it sounds modern—rather than a genuine productivity lever.

In practice, the same principles apply here as anywhere: structured adoption, adequate learning time, and honest measurement. The advantage that regional teams do have is proximity to the product. Engineers embedded in markets with distinct user behaviours, such as Indonesia's mobile-first commerce boom, Vietnam's fast-growing fintech sector, and Singapore's AI-first enterprise adoption, can direct AI agents toward product discovery and proof-of-concept work in ways that teams far removed from their users cannot.

The parallel-agent pattern is particularly valuable for exactly this kind of exploratory work: fire off an agent to assess whether a new payment SDK integrates cleanly with the existing stack while a second agent builds a throwaway prototype of a feature under consideration. Neither output needs to be production-ready; both generate information that speeds up the decision that matters.

A Framework for Product Managers

If you manage an engineering team and you are trying to decide how to engage with the parallel AI agent trend, three questions are worth answering before you update your working practices:

1. Have individuals on your team reached genuine fluency? If the answer is no for most of your team, adding parallel workflows will compound the learning overhead. Focus first on depth, not breadth.

2. Are you measuring outcomes, not effort? Time-to-merge, defect introduction rates, and the proportion of user-facing issues caught before production are more reliable than velocity points or developer self-reports.

3. Are you using agents for the right tasks? Parallel agents work best on well-defined, low-stakes, parallelisable work: research, small maintenance fixes, and directed implementations against a clear spec. They work poorly on high-ambiguity, tightly coupled tasks, or on changes that demand intensive human review. The worst outcome is parallelising the wrong things and creating more noise for your reviewers to manage.

The underlying shift that the parallel agent trend represents—from developers as writers of code to developers as directors of AI agents—is real and significant. But the firms that will benefit are those that take the transition seriously: investing in the learning curve, measuring honestly, and using the new capability to attack problems that were previously out of reach, not merely to generate more of what they already produce.



