When Code Gets Cheap, Complexity Moves: Rethinking Estimation in the AI Era

AI makes first-draft code cheaper, but it does not erase ambiguity, integration, verification, coordination, or production risk. That means engineering complexity has to be discussed differently, especially in Scrum planning.

TL;DR

  • Your intuition is directionally right: in an AI-assisted environment, implementation time is becoming a worse proxy for complexity.
  • But the stronger version of the claim is more precise: time was never the same thing as complexity, and AI is exposing that mismatch much more aggressively.
  • AI compresses first-draft coding on bounded tasks. It does not compress ambiguity, dependency management, verification, cross-team coordination, deployment risk, or operational hardening at the same rate.
  • For Scrum teams, this means story discussions should focus less on “how long will coding take?” and more on “what kind of delivery complexity are we buying?”
  • The planning mistake of the AI era is not underestimating code generation speed. It is mistaking fast code creation for low-complexity delivery.

The rest of this article uses sources current through April 22, 2026.

The Simple Idea

Here is the intuition behind the hypothesis:

If AI can generate a feature scaffold, test draft, refactor suggestion, migration helper, or UI implementation in minutes, then the old mental shortcut starts to break:

“This will take longer, therefore it must be more complex.”

That shortcut was always imperfect, but it was often good enough. In many teams, writing and rewriting code consumed such a large share of the work that elapsed development time felt like a reasonable stand-in for complexity.

That is much less true now.

When code generation gets dramatically cheaper, the hard part of software work becomes easier to see. It was not only the typing. It was the uncertainty around the typing.

That is why the hypothesis is useful, but it needs sharpening.

The Stronger Version of the Hypothesis

I would restate it like this:

In the AI era, coding time is becoming a weaker proxy for software complexity, so engineering teams need a better language for complexity and a better way to forecast delivery than relying on implementation effort alone.

That wording matters because “complexity cannot be measured in time anymore” is emotionally correct, but literally too absolute.

Time still matters. If something takes six weeks, that is obviously a planning concern.

What changed is this:

  • coding time is no longer the dominant cost in as many tasks as before
  • the variance introduced by AI is highly context-dependent
  • the remaining work is often socio-technical rather than purely syntactic
  • elapsed time now hides multiple kinds of complexity that do not move together

In other words, AI does not eliminate complexity. It rearranges where complexity lives.

What the Evidence Actually Says

The most interesting part of this debate is that the evidence is not cleanly pro-AI or anti-AI. It is more useful than that. It shows where the gains show up and where they get absorbed by other forms of work.

1. AI clearly speeds up some coding tasks

GitHub’s controlled Copilot experiment found that developers finished a JavaScript server task about 55% faster with Copilot than without it. In later enterprise research with Accenture, GitHub also reported higher developer satisfaction, less mental effort on repetitive work, and easier access to flow.

That matters because it confirms the basic premise: AI can substantially compress the cost of generating working code in bounded environments.

If your mental model of complexity was anchored heavily to “how much code has to be written,” AI breaks that model quickly.

2. AI can also slow down experienced developers in real brownfield work

METR’s 2025 randomized controlled trial found a result that ran against most expectations: experienced open-source developers working in their own familiar repositories took 19% longer when using early-2025 AI tools.

That result is especially important for managers.

It does not mean AI is useless. It means the bottleneck in mature systems is often not initial code generation. It is:

  • loading repository context
  • honoring local conventions
  • validating almost-right output
  • reviewing and cleaning up generated changes
  • making safe decisions under existing constraints

That is already a strong challenge to time-based thinking. The same “task type” can look easy in a greenfield benchmark and messy in a real production system.

3. AI improves some local process measures without automatically improving delivery

The 2025 DORA AI report adds another critical layer.

DORA found that higher AI adoption was associated with better documentation quality, code quality, code review speed, approval speed, and lower perceived code complexity. But the same report found worse delivery performance, including a likely reduction in throughput and a larger reduction in delivery stability.

Their hypothesis is highly plausible: if AI lets teams produce much more code in the same time, teams may start shipping larger change sets. DORA has repeatedly shown that large batches are slower and more destabilizing.

This is one of the clearest pieces of evidence for your hypothesis.

The complexity did not disappear. It moved downstream into:

  • reviewability
  • batch size
  • integration safety
  • instability in production

4. Productivity research was already warning us not to use one metric

Before the current AI wave, the SPACE framework argued that developer productivity cannot be reduced to a single metric. Productivity spans satisfaction, performance, activity, communication and collaboration, and efficiency and flow.

That matters here because teams often slip into a very narrow AI productivity story:

“The model wrote more code faster, so the work must be simpler.”

But faster code production is mostly an activity and efficiency signal. It says much less about collaboration overhead, verification burden, service health, or whether the right problem got solved.

5. Scrum was already built for complex work

The Scrum Guide defines Scrum as a framework for generating value through adaptive solutions for complex problems. It also frames backlog refinement around increasing transparency and sizing work that the team can actually complete within a Sprint.

This is a useful reminder: story points were never supposed to be a direct translation of developer hours.

In practice, many teams quietly drifted there anyway. AI now makes that drift much more dangerous.

If a team estimates mostly from coding effort, then AI will make many stories look artificially smaller, even when the real uncertainty sits somewhere else.

So What Is Complexity Now?

This is the core of the article.

In an AI-assisted engineering environment, complexity is better discussed as a delivery surface, not just as implementation effort.

I find it useful to break that surface into six dimensions.

1. Ambiguity complexity

How unclear is the problem itself?

Examples:

  • requirements are incomplete
  • stakeholders disagree on the expected behavior
  • edge cases are still being discovered
  • success criteria are not testable yet

AI can help generate options here, but it does not remove the ambiguity. In some cases AI increases the ambiguity by making many possible implementations feel equally reachable.

2. Integration complexity

How many existing systems, services, constraints, and conventions must the change respect?

Examples:

  • legacy schemas
  • third-party APIs
  • auth and permission models
  • caching layers
  • frontend/backend contract changes
  • migration sequencing

This is where brownfield work often stops being “just code.”

3. Verification complexity

How hard is it to prove that the change is correct?

Examples:

  • hard-to-reproduce bugs
  • weak test harnesses
  • brittle environments
  • performance-sensitive code
  • security-sensitive behavior
  • subtle regressions that only appear under load or in production data

AI often shifts effort here. The draft appears faster. The proof remains expensive.

4. Coordination complexity

How many people, teams, approvals, or organizational interfaces have to align?

Examples:

  • platform team dependencies
  • design review
  • compliance or legal review
  • QA coordination
  • release windows
  • customer communication

This is one reason the same feature can be “small” for one team and “big” for another.

5. Consequence complexity

What happens if this goes wrong?

Examples:

  • payment failures
  • security exposure
  • data corruption
  • customer-visible downtime
  • expensive rollbacks
  • regulatory implications

Low coding effort and high consequence can coexist. AI makes that mismatch more common, not less.

6. Hardening complexity

How much work is needed after the first working version exists?

Examples:

  • cleanup and refactoring
  • observability
  • documentation
  • backward compatibility
  • rollout controls
  • monitoring and alerting
  • support readiness

This is the classic “it worked in the demo” trap. The prototype arrives quickly, and everyone underestimates the cost of making it safe, maintainable, and operable.
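
One way to make the six dimensions discussable in refinement is to record a rough rating for each of them. The sketch below is illustrative only: the ComplexityProfile name, the 0 to 3 scale, and the example ratings are my own assumptions, not part of Scrum or of any source cited here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ComplexityProfile:
    """Hypothetical rating per dimension: 0 = negligible, 3 = severe."""
    ambiguity: int
    integration: int
    verification: int
    coordination: int
    consequence: int
    hardening: int

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 0-3 scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 3:
                raise ValueError(f"{f.name} must be between 0 and 3, got {value}")

    def dominant(self) -> list[str]:
        """Return the dimension name(s) that carry the highest rating."""
        ratings = {f.name: getattr(self, f.name) for f in fields(self)}
        top = max(ratings.values())
        return [name for name, value in ratings.items() if value == top]


# Illustrative ratings for a three-line fix to a production concurrency bug.
bug_fix = ComplexityProfile(
    ambiguity=1, integration=1, verification=3,
    coordination=0, consequence=2, hardening=1,
)
print(bug_fix.dominant())
```

The ratings themselves matter less than the prompt they give the team: dominant() tells you which dimension the conversation should start with.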

The New Rule

Here is the most concise way to say it:

AI reduces implementation friction faster than it reduces delivery complexity.

You do not need a grand theory to see the modern pattern:

  • AI is exceptionally good at removing friction around drafting, translating, scaffolding, and exploring implementation options
  • AI is much less reliable as a substitute for judgment around problem framing, tradeoffs, organizational context, blast radius, and proof

So when teams say, “AI made this faster,” the right next question is:

Which part got faster?

Because that answer determines whether the overall work really became less complex or whether the complexity simply migrated.

Common Questions Teams Are Asking Now

These are the questions showing up again and again in engineering discussions, delivery reviews, and Scrum debates.

“If AI can build it in one afternoon, why is this still an 8?”

Because the implementation draft is not the same thing as the delivery risk.

The story may still involve data migration, verification gaps, release sequencing, rollback planning, stakeholder alignment, or non-obvious edge cases.

In the AI era, teams need to stop letting demo speed overpower delivery reality.

“Should we stop using story points?”

Not necessarily.

But teams should stop pretending story points represent coding hours. If story points are still useful, they should reflect a blended judgment about uncertainty, integration, verification, and consequence, not just implementation effort.

If your team cannot keep that meaning stable, flow-based forecasting with historical cycle-time ranges may be a better fit.
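
As a rough illustration of what a cycle-time range can look like, here is a minimal sketch. The history values are invented, the function names are hypothetical, and the nearest-rank percentile is just one simple convention; a team would substitute its own completed-item data and preferred percentiles.

```python
import math


def nearest_rank(sorted_values: list[float], pct: int) -> float:
    """Smallest observation with at least pct percent of the data at or below it."""
    if not sorted_values:
        raise ValueError("need at least one observation")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def cycle_time_range(cycle_times_days: list[float]) -> dict[str, float]:
    """Summarize historical cycle times as a typical value plus two planning bounds."""
    ordered = sorted(cycle_times_days)
    return {
        "p50": nearest_rank(ordered, 50),
        "p85": nearest_rank(ordered, 85),
        "p95": nearest_rank(ordered, 95),
    }


# Invented example: days from start to done for the last ten completed items.
history = [2, 3, 3, 4, 5, 5, 6, 8, 9, 14]
forecast = cycle_time_range(history)
print(forecast)  # {'p50': 5, 'p85': 9, 'p95': 14}
```

Read this way, the team can say “half of our items finish within 5 days and 85% within 9” instead of defending a single point value.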

“Why do easy-looking tickets still blow up during the Sprint?”

Because AI makes the visible part of the ticket cheaper.

That means teams discover late that the hard part was never the scaffold. It was the mismatch between the local code change and the wider system.

“Why are PRs bigger and reviews harder even though coding is faster?”

Because faster generation encourages larger batch sizes unless teams actively resist that pull.

DORA’s findings are a warning here: local coding speed can coexist with worse delivery stability when AI-generated changes become too large to review and validate safely.

“Does this mean senior engineers matter less?”

Almost the opposite.

Senior leverage shifts away from raw code output and toward:

  • framing the problem correctly
  • decomposing work into safe batches
  • deciding what must be verified
  • spotting hidden dependencies
  • protecting system integrity

AI compresses the kind of implementation work that is accessible to junior engineers. It does not erase the value of judgment.

What This Looks Like in Real Planning

The best way to make this concrete is through a few common scenarios.

Scenario 1: The “simple” CRUD feature

The assistant generates the form, validation, endpoint, and tests in an hour.

Everyone feels like the story is tiny.

But the real work includes:

  • reconciling role-based access rules
  • updating audit logs
  • enforcing data retention rules
  • handling partial failure states
  • mapping the new behavior into existing reporting

The coding time collapsed. The business and system complexity did not.

Scenario 2: A bug in a mature system

The change itself may be three lines.

But the bug only appears under specific concurrency conditions in production, the original author left years ago, the test harness is weak, and the team is not fully confident in the rollout path.

This is low implementation complexity and high verification complexity.

Teams that size only by “amount of code” will underestimate it every time.

Scenario 3: A cross-team platform change

AI can generate adapters, migration scripts, and documentation drafts quickly.

But the work still depends on:

  • coordination with another team
  • contract negotiation
  • rollout timing
  • backward compatibility
  • downstream consumers

This is where engineering managers especially need a broader language than “fast” or “slow.”

Scenario 4: A flashy prototype becomes a roadmap commitment

Someone uses AI to build a convincing internal demo in two days.

Leadership now assumes the production feature is close.

But what remains is:

  • security review
  • observability
  • UX polish
  • operational ownership
  • support readiness
  • performance testing
  • change management

The prototype was real. The complexity was also real. They lived in different parts of the lifecycle.

What Breaks in Scrum Planning

This shift hits Scrum in a very practical way.

1. Teams accidentally re-anchor estimates to coding time

This is the biggest trap.

As soon as AI speeds up visible implementation, teams start shrinking estimates without re-evaluating the rest of the work. The result is predictable:

  • Sprints turn out to be overcommitted, but only late in the iteration
  • “almost done” stories pile up
  • review and testing become the hidden bottleneck
  • confidence in estimation gets worse, not better

2. Story points become unstable across work types

A point value that felt calibrated in pre-AI feature work may no longer map well across:

  • greenfield vs brownfield work
  • isolated components vs system-wide changes
  • internal tools vs regulated customer-facing flows
  • prototype work vs production-hardening work

The faster AI gets, the more this inconsistency shows up.

3. Sprint plans ignore AI-induced review load

Teams often celebrate that AI helped produce more code, then quietly overwhelm reviewers, QA, or release owners.

If review capacity is not part of planning, AI can raise local output while reducing system-level flow.
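
A back-of-the-envelope check makes the point concrete. Every number and name below is a made-up assumption for illustration; the only claim is the arithmetic: if the number of changes grows and review capacity does not, utilization can cross 100%.

```python
def review_load(planned_changes: int, review_hours_per_change: float,
                reviewers: int, review_hours_per_reviewer: float) -> dict[str, float]:
    """Compare review demand with review capacity for one Sprint."""
    demand = planned_changes * review_hours_per_change
    capacity = reviewers * review_hours_per_reviewer
    if capacity <= 0:
        raise ValueError("review capacity must be positive")
    return {
        "demand_hours": demand,
        "capacity_hours": capacity,
        "utilization": demand / capacity,
    }


# Hypothetical Sprint before AI assistance: 12 changes, 1.5 review hours each,
# 3 reviewers who can each spare 8 hours for review.
before = review_load(12, 1.5, 3, 8)

# Same team after output rises to 20 changes, with review capacity unchanged.
after = review_load(20, 1.5, 3, 8)

print(before["utilization"], after["utilization"])  # 0.75 1.25
```

A utilization above 1.0 means the Sprint plan already contains a review backlog before the first line is generated.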

4. Managers mistake reduced effort for reduced uncertainty

These are not the same thing.

A task can feel easier because AI handles more of the drafting, while still remaining highly uncertain because:

  • requirements are unstable
  • dependencies are unresolved
  • the failure modes are poorly understood

This is one of the reasons AI can make work feel smoother while forecasts stay noisy.

A Better Way to Estimate in the AI Era

Engineering teams do not need a magical new framework as much as they need a better conversation.

Here is the practical shift I would recommend.

1. Estimate delivery difficulty, not typing effort

During refinement or planning, ask:

  • What is unclear?
  • What must integrate?
  • What is hard to verify?
  • Who else must align?
  • What happens if this fails?
  • What has to be hardened after the first working version?

This quickly surfaces the real complexity that AI does not erase.

2. Split stories by complexity type

A story that mixes discovery, implementation, verification, and rollout is harder to estimate than a story that isolates one mode of work.

Possible split:

  • discovery slice
  • implementation slice
  • hardening slice
  • rollout slice

This is often much more useful than trying to produce one “smart” number.

3. Forecast with ranges, not false precision

If a team uses Scrum, the healthiest planning move is usually to reduce certainty theater.

Say:

  • “AI will likely compress implementation here”
  • “integration and review remain the risk”
  • “we should forecast this as a range, not as a deterministic promise”

The goal is not to sound less confident. It is to be more accurate.
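
One common way to produce such a range is to resample the team's own throughput history, often called a Monte Carlo forecast. The sketch below assumes a made-up six-week history and a hypothetical weeks_to_finish helper; it is one simple variant of the technique, not a prescribed method.

```python
import math
import random


def weeks_to_finish(backlog_items: int, weekly_throughput: list[int],
                    runs: int = 5000, seed: int = 7) -> dict[str, int]:
    """Resample past weekly throughput to estimate how many weeks a backlog takes."""
    if backlog_items <= 0:
        raise ValueError("backlog_items must be positive")
    if not weekly_throughput or min(weekly_throughput) < 0 or max(weekly_throughput) == 0:
        raise ValueError("history needs non-negative values and at least one productive week")

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = []
    for _ in range(runs):
        remaining, weeks = backlog_items, 0
        while remaining > 0:
            # Each simulated week replays one randomly chosen week from history.
            remaining -= rng.choice(weekly_throughput)
            weeks += 1
        outcomes.append(weeks)
    outcomes.sort()

    def nearest_rank(pct: int) -> int:
        return outcomes[max(1, math.ceil(pct * len(outcomes) / 100)) - 1]

    return {"p50": nearest_rank(50), "p85": nearest_rank(85)}


# Invented history: items finished in each of the last six weeks.
forecast = weeks_to_finish(backlog_items=20, weekly_throughput=[3, 5, 4, 6, 2, 5])
print(f"Likely {forecast['p50']} weeks; plan for up to {forecast['p85']} weeks")
```

The two numbers map directly onto the language above: the p50 is the likely outcome, and the gap between p50 and p85 is the risk the team is being honest about.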

4. Recalibrate estimates by work class

Many teams now need separate intuition for different categories:

  • greenfield feature work
  • brownfield changes
  • incident fixes
  • migrations
  • cross-team platform work
  • production hardening

AI affects each category differently. Treating them as one estimation domain makes the numbers noisier than they need to be.
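
A quick way to see whether the categories really behave differently is to summarize past cycle times per work class. The data and labels below are invented for illustration, and summarize_by_class is a hypothetical helper; the point is only the grouping.

```python
from collections import defaultdict
from statistics import median

# Invented history: (work class, cycle time in days) for completed items.
completed = [
    ("greenfield", 2), ("greenfield", 3), ("greenfield", 4),
    ("brownfield", 3), ("brownfield", 8), ("brownfield", 13),
    ("migration", 5), ("migration", 21),
]


def summarize_by_class(items: list[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """Group cycle times by work class and report the median and the spread."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for work_class, days in items:
        grouped[work_class].append(days)
    return {
        work_class: {"median": median(days), "low": min(days), "high": max(days), "n": len(days)}
        for work_class, days in grouped.items()
    }


summary = summarize_by_class(completed)
for work_class, stats in summary.items():
    print(work_class, stats)
```

If the spread for one class is several times wider than for another, a single shared point scale is probably hiding more than it reveals.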

5. Protect small batch sizes aggressively

If AI increases output, teams need stronger discipline around:

  • narrower pull requests
  • earlier review
  • tighter acceptance criteria
  • stronger test gates

Otherwise AI turns into a batch-size amplifier, and the delivery system absorbs the gains as instability.
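
A batch-size guard can be as simple as flagging pull requests that exceed an agreed limit. The thresholds of 400 changed lines and 15 files below are arbitrary placeholders, and the PullRequest record is a hypothetical stand-in for whatever your code host reports.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    title: str
    lines_changed: int
    files_changed: int


def oversized(prs: list[PullRequest], max_lines: int = 400,
              max_files: int = 15) -> list[PullRequest]:
    """Return the pull requests that exceed either agreed size limit."""
    return [
        pr for pr in prs
        if pr.lines_changed > max_lines or pr.files_changed > max_files
    ]


# Invented examples.
open_prs = [
    PullRequest("Add retention flag to export", lines_changed=120, files_changed=4),
    PullRequest("Generated adapters for billing API", lines_changed=1850, files_changed=22),
    PullRequest("Rename config keys", lines_changed=90, files_changed=31),
]

for pr in oversized(open_prs):
    print(f"Consider splitting: {pr.title}")
```

The specific limits matter less than having one the team agreed on before the generated change set arrived.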

What Engineers and Engineering Managers Should Change

For engineers:

  • stop using “I can generate this quickly” as evidence that the work is simple
  • distinguish first draft from verified change
  • optimize for smaller, reviewable slices
  • make hidden integration and verification costs explicit earlier

For engineering managers:

  • stop treating AI coding speed as a direct forecast input by itself
  • ask where the remaining uncertainty lives
  • monitor review queues, batch size, escaped defects, and rework, not just output volume
  • separate prototype velocity from production readiness in roadmap conversations

This is where the management impact is strongest.

The old planning instinct was:

more coding speed -> lower effort -> smaller estimate

The better AI-era instinct is:

more coding speed -> check where complexity moved before shrinking the estimate

Final Position on the Hypothesis

So, is the hypothesis right?

Yes, with one important correction.

It is not that complexity can no longer be discussed in relation to time at all. It is that time is no longer a trustworthy shortcut for complexity when AI changes only some parts of the work.

That matters because Scrum planning, engineering forecasting, and delivery conversations all become distorted when teams continue to treat implementation speed as the main signal.

The new planning language should sound more like this:

  • this is low coding effort but high verification risk
  • this is easy to scaffold but hard to integrate
  • this is fast to prototype but expensive to harden
  • this is small in code but large in blast radius
  • this is simple for one team but coordination-heavy across three

That is the conversation AI is forcing us to have.

And honestly, it is a healthier conversation anyway. It is closer to how software complexity really worked all along.


Sources