AI Coding Workflows

From Spec to Parallel Delivery: Spec Kit, Cursor, Beads, and Claude Code on a Real Feature

A practical engineer-first guide to turning a messy enterprise feature into specs, acceptance criteria, Beads task graphs, git worktrees, and parallel delivery across Cursor and Claude Code.


My earlier article on Claude Code, Cursor, and Beads answered an important question: which tool is better for planning, execution, and persistence?

The missing follow-up was the one most engineers actually need: how do you use them together on a feature that is big enough to involve architecture decisions, acceptance criteria, background work, branch isolation, and multiple people?

This article is that companion piece.

On April 2, 2026, I rechecked the workflow primitives here against the current Spec Kit docs, the Cursor changelog and docs snippets, Anthropic’s Claude Code docs, the Beads README, and the official Git worktree docs. One important note up front: the exact end-to-end workflow below is my synthesis. The individual capabilities are source-backed. The orchestration is the practical part we as engineers still have to design.

TL;DR

  • The earlier comparison article was right that Spec Kit, Cursor, Beads, and Claude Code solve different layers of the workflow.
  • The missing piece is a spec-first operating model: requirements and acceptance criteria first, durable task graph second, isolated implementation third.
  • A strong default flow is:
    1. Spec Kit in Cursor for constitution, spec, clarification, plan, and task generation
    2. Draft PR for the planning docs so the team agrees on scope before code starts
    3. Beads to convert the approved plan into persistent, dependency-aware tasks
    4. Git branches and worktrees to isolate parallel implementation streams
    5. Claude Code on the web for long-running or well-bounded tasks that can run asynchronously
  • If multiple humans and agents are involved, the safest rule is simple: one task stream, one branch, one worktree, one owner at a time.

What You Will Learn Here

  • What the earlier comparison article got right, and where a practical companion has to go further
  • How to use Spec Kit’s workflow as the planning backbone inside Cursor
  • How to turn acceptance criteria and requirements into a Beads task graph
  • How to split a complex feature across branches, worktrees, AI agents, and human teammates
  • Where Claude Code on the web fits best in a real engineering loop
  • What to avoid when multiple agents and developers are all changing the same system

What the Earlier Article Got Right

The previous article still holds up in three important ways:

  • Claude Code is strongest when you want explicit planning, shell access, and branch-oriented execution.
  • Cursor is strongest when you want an IDE-native planning surface and async agent experience.
  • Beads is strongest when you need durable task memory that survives context resets and handoffs.

Where that article intentionally stayed higher-level was the part most teams struggle with in practice:

  • what the actual planning artifacts should be
  • when to freeze the spec and open implementation branches
  • how to avoid task collisions across people and agents
  • when to use local work versus cloud execution

So instead of another comparison, let’s walk through one realistic feature end to end.


The Real Use Case

Assume we are building a B2B SaaS platform and need to add:

  • SAML SSO for enterprise tenants
  • Just-in-time user provisioning
  • SCIM directory sync
  • Audit log events for admin and identity actions
  • Admin UI for identity provider setup and sync monitoring

This is a good example because it crosses product, backend, worker, security, and platform boundaries. It also creates real sequencing:

  • the domain model and auth boundaries have to be agreed on first
  • the API and callback flows need strong acceptance criteria
  • the worker can be parallelized, but only after contracts are stable
  • audit logging touches almost every path
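Because audit logging touches almost every path, it is worth pinning the event shape down before any stream starts. Here is a minimal TypeScript sketch (TypeScript because that is the stack the plan assumes later). The action names, fields, and the `buildAuditEvent` helper are hypothetical, not an existing schema:

```typescript
// Hypothetical audit event shape for the enterprise identity feature.
// Action names and fields are illustrative, not taken from any library.
type AuditAction =
  | "sso.config.updated"
  | "sso.login.failed"
  | "jit.user.provisioned"
  | "scim.user.deactivated"
  | "scim.sync.failed";

interface AuditEvent {
  tenantId: string;
  actorId: string; // an admin, the IdP, or "system" for worker-driven changes
  action: AuditAction;
  occurredAt: string; // ISO-8601 UTC timestamp
  metadata: Record<string, string>;
}

// Refuse to build an event that cannot be attributed to a tenant and an actor.
function buildAuditEvent(
  tenantId: string,
  actorId: string,
  action: AuditAction,
  metadata: Record<string, string> = {},
  now: Date = new Date(),
): AuditEvent {
  if (tenantId.trim() === "" || actorId.trim() === "") {
    throw new Error(`audit event ${action} needs both a tenant and an actor`);
  }
  return { tenantId, actorId, action, occurredAt: now.toISOString(), metadata };
}

const configChange = buildAuditEvent(
  "tenant-acme",
  "admin-7",
  "sso.config.updated",
  { idp: "saml" },
  new Date("2026-04-02T10:00:00Z"),
);
console.log(configChange.occurredAt); // 2026-04-02T10:00:00.000Z
```

Rejecting unattributed events at construction time, rather than discovering them later in queries, is what turns “every identity change must be attributable” into something the code enforces.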

Here is the mental model:

feature request
  -> Spec Kit + Cursor
  -> acceptance criteria + requirements + technical plan
  -> draft PR for plan review
  -> Beads task graph
  -> isolated branches + git worktrees
  -> humans + Cursor agents + Claude Code web sessions
  -> PRs + review + merge

That is the whole thesis of this article: planning artifacts are not enough by themselves. You need a clean handoff from planning into execution.


Why This Workflow Is Plausible

This flow is not magic. It is just a clean composition of capabilities that now exist.

Spec Kit gives you the planning sequence

Spec Kit’s quick start describes a structured flow: define a constitution, create the spec, clarify ambiguity, create a technical plan, then break the work into tasks. It also notes that the active feature is inferred from the current Git branch, which is a subtle but powerful fit for engineering workflows.

That makes Spec Kit a good backbone for:

  • project rules
  • feature intent
  • clarification loops
  • plan generation
  • task breakdown

Cursor gives you a strong planning surface

Cursor’s planning docs describe planning around to-dos and queued work, and the current Cursor 3 changelog explicitly says the new Agents Window can run many agents in parallel across repos and environments, including worktrees, the cloud, and remote SSH. Cursor also added an explicit /worktree command and includes plans in shared chats.

That makes Cursor a strong place to do the messy middle:

  • explore the codebase
  • debate architecture in context
  • keep to-dos visible
  • share the evolving plan with the team

Beads gives you durable execution memory

Beads describes itself as a distributed graph issue tracker for AI agents, with persistent structured memory, dependency-aware tasks, hash-based IDs, a bd ready queue, and explicit claiming. That is the missing layer after the plan is approved.

Claude Code on the web gives you async execution

Anthropic’s current docs say Claude Code is available in the terminal, IDE, desktop app, and browser. The web product can run tasks asynchronously on cloud infrastructure: it clones the repo, runs setup scripts, executes the task, shows diffs, creates PRs, can automatically respond to PR activity such as CI failures and review comments, and can run multiple remote tasks in parallel. Anthropic also documents a “plan locally, execute remotely” pattern: start with claude --permission-mode plan, then hand the work off with claude --remote.

Git worktree gives you safe isolation

The official Git docs describe git worktree as a way to manage multiple working trees attached to the same repository, which means you can check out multiple branches at once without constant stash-switch chaos.

That is the missing safety rail when you have several parallel streams.


Step 1: Start with a Planning Branch

Before anyone touches production code, create a branch for the planning artifacts only.

git switch -c 042-enterprise-sso-spec

I like using a numeric prefix because it maps cleanly to:

  • a feature folder
  • a Beads epic
  • a draft PR
  • a branch naming convention

If you are using Spec Kit’s branch-aware flow, a branch like 042-enterprise-sso-spec gives the feature a stable identity immediately.
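The convention stays consistent only if every name is derived from the prefix instead of typed by hand. A small TypeScript sketch, assuming hypothetical `featureNames` and `featurePrefix` helpers and the naming patterns used in this article:

```typescript
// Hypothetical helpers: derive every name for a feature from one numeric prefix
// so the spec folder, planning branch, stream branches, and worktrees stay aligned.
interface FeatureNames {
  prefix: string;
  specDir: string;
  planningBranch: string;
  streamBranch: (stream: string) => string;
  worktreePath: (stream: string) => string;
}

function featureNames(num: number, slug: string, shortSlug: string): FeatureNames {
  const prefix = String(num).padStart(3, "0"); // 42 -> "042"
  return {
    prefix,
    specDir: `specs/${prefix}-${slug}/`,
    planningBranch: `${prefix}-${slug}-spec`,
    streamBranch: (stream) => `feat/${prefix}-${shortSlug}-${stream}`,
    worktreePath: (stream) => `../wt-${prefix}-${shortSlug}-${stream}`,
  };
}

// Recover the three-digit feature prefix from any branch that follows the convention.
function featurePrefix(branch: string): string | null {
  const match = /(?:^|\/)(\d{3})-/.exec(branch);
  return match ? match[1] : null;
}

const sso = featureNames(42, "enterprise-sso", "sso");
console.log(sso.planningBranch); // 042-enterprise-sso-spec
console.log(sso.streamBranch("worker")); // feat/042-sso-worker
console.log(featurePrefix("feat/042-sso-api")); // 042
```

Because every stream branch carries the same prefix as the planning branch, a script or a reviewer can always trace a branch back to its spec folder and Beads epic.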

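At this point, the branch should contain only planning artifacts. Spec Kit keeps the project constitution under .specify/memory/ and gives each feature its own folder, so the branch holds files such as:

.specify/memory/
  constitution.md
specs/042-enterprise-sso/
  spec.md
  plan.md
  tasks.md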

You do not want implementation mixed into this branch. The goal is to make the plan reviewable on its own.


Step 2: Use Spec Kit in Cursor to Define the Work

This is where Cursor becomes the planning cockpit.

If you already use Cursor as the primary IDE, keep the whole planning conversation there and run the Spec Kit sequence in order.

1. Constitution

Start with project rules and non-negotiables:

/speckit.constitution
This feature is security-first and audit-first.
All identity changes must be attributable to a tenant and actor.
All SSO and SCIM flows must emit structured audit events.
Database changes must be backward compatible for one deploy cycle.
We prefer incremental delivery over a single big-bang release.

That is not busywork. It is what stops later agent output from drifting into “whatever compiles.”

2. Specification

Then describe the feature in business terms, not implementation details:

/speckit.specify
Build enterprise identity management for a B2B SaaS platform.
Tenant admins must be able to configure a SAML identity provider,
test login safely, enable just-in-time provisioning, and monitor SCIM
sync status. The system must maintain audit history for configuration
changes, login attempts, provisioning events, and directory sync actions.

3. Clarification

Now use the clarify step to force ambiguity out of the room.

Examples:

/speckit.clarify
Focus on tenant isolation, rollback strategy, audit retention, and failure handling.
/speckit.clarify
Clarify whether SCIM deactivation removes access immediately or marks the user as suspended.
Clarify whether SSO can be enabled before SCIM is configured.
Clarify the safe rollout sequence for existing enterprise tenants.

4. Technical plan

Only after the spec is sharp enough do we move into implementation shape:

/speckit.plan
The product is a TypeScript monorepo with Next.js web app, Node API,
Postgres, Redis-backed jobs, and OpenTelemetry. Keep the rollout incremental.
Add feature flags per tenant. Prefer additive schema changes. Use idempotent
worker behavior for SCIM sync.
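“Idempotent worker behavior” deserves a concrete shape before tasks are generated. One common approach, sketched in TypeScript under assumptions: every SCIM change carries a stable id (a hypothetical `changeId`), the worker records which ids it has already applied, and failed attempts retry with capped exponential backoff. The in-memory store stands in for Postgres; in a real database the check and the write must share one transaction.

```typescript
// Hypothetical SCIM change record; changeId is assumed to be stable across retries.
interface ScimChange {
  changeId: string;
  userId: string;
  active: boolean;
}

// In-memory stand-in for the user table plus an "applied changes" table.
class InMemoryDirectory {
  readonly users: Record<string, { active: boolean }> = {};
  private readonly applied: Record<string, true> = {};

  // Re-applying a change that already succeeded is a no-op, so retries are safe.
  apply(change: ScimChange): "applied" | "duplicate" {
    if (this.applied[change.changeId]) return "duplicate";
    this.users[change.userId] = { active: change.active };
    this.applied[change.changeId] = true;
    return "applied";
  }
}

// Capped exponential backoff for failed sync attempts; attempt is 1-based.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

const directory = new InMemoryDirectory();
const deactivate: ScimChange = { changeId: "chg-1", userId: "u-9", active: false };
console.log(directory.apply(deactivate)); // applied
console.log(directory.apply(deactivate)); // duplicate
console.log([1, 2, 3, 8].map((n) => backoffDelayMs(n)).join(", ")); // 500, 1000, 2000, 30000
```

Dedup by change id makes retries safe, but it does not handle changes that arrive out of order. Periodic reconciliation against the IdP’s current state covers that case, which is why the worker stream is about retry and reconciliation together.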

5. Task generation

Finally:

/speckit.tasks

This is where the previous comparison article stops and the execution system begins.


Step 3: Convert the Plan into Acceptance Criteria Before You Trust the Tasks

This is the part people skip. Do not move straight from generated tasks into coding.

The planning branch should produce a crisp acceptance-criteria layer first.

For this feature, I would want the planning docs to lock in ACs like these:

  • Tenant admins can configure SAML metadata without enabling SSO immediately.
  • Test login validates the SAML handshake without locking out password-based fallback.
  • When SSO is enabled for a tenant, only approved domains can JIT provision users.
  • SCIM deactivation suspends access within the agreed SLA and emits an audit event.
  • Failed SCIM syncs are retryable, observable, and visible in the admin UI.
  • Every identity-admin action writes a structured audit event with tenant, actor, action, and timestamp.
  • Rollout can be enabled per tenant behind a feature flag.
  • Existing local-auth tenants can migrate without downtime.

If the generated tasks.md does not clearly map back to these ACs, the task list is premature.
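One way to make that mapping checkable is to give each AC an id in spec.md and have each task cite the ids it serves. Neither Spec Kit nor Beads enforces this; the `AC-n` convention and the `traceAcs` helper in this TypeScript sketch are hypothetical:

```typescript
// Hypothetical convention: each task lists the acceptance-criteria ids it serves.
interface PlannedTask {
  id: string;
  title: string;
  acIds: string[];
}

interface TraceReport {
  uncoveredAcs: string[]; // contract items with no task behind them
  untracedTasks: string[]; // tasks that serve no AC
  unknownAcRefs: string[]; // tasks citing AC ids that do not exist
}

function traceAcs(acIds: string[], tasks: PlannedTask[]): TraceReport {
  const known = new Set(acIds);
  const covered = new Set<string>();
  const unknown = new Set<string>();
  const untracedTasks: string[] = [];
  for (const task of tasks) {
    if (task.acIds.length === 0) untracedTasks.push(task.id);
    for (const ac of task.acIds) {
      if (known.has(ac)) covered.add(ac);
      else unknown.add(ac);
    }
  }
  return {
    uncoveredAcs: acIds.filter((ac) => !covered.has(ac)),
    untracedTasks,
    unknownAcRefs: Array.from(unknown).sort(),
  };
}

const report = traceAcs(
  ["AC-1", "AC-2", "AC-3"],
  [
    { id: "T1", title: "SAML metadata form", acIds: ["AC-1"] },
    { id: "T2", title: "Refactor session helpers", acIds: [] },
    { id: "T3", title: "Domain allowlist check", acIds: ["AC-3", "AC-9"] },
  ],
);
console.log(report.uncoveredAcs.join(",")); // AC-2
console.log(report.untracedTasks.join(",")); // T2
console.log(report.unknownAcRefs.join(",")); // AC-9
```

Any non-empty field in the report is a reason to keep the plan PR in draft.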

A useful rule is:

ACs are the contract. Tasks are the implementation proposal.

That distinction matters because tasks often change during delivery. The contract should not.
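Because the ACs are the stable contract, some are precise enough to encode directly. The JIT-provisioning AC, for instance, could become a guard like this TypeScript sketch. The config fields are hypothetical, and exact domain matching (no subdomains) is an assumption the spec would need to confirm:

```typescript
// Hypothetical per-tenant identity settings.
interface TenantIdentityConfig {
  ssoFlagEnabled: boolean; // per-tenant rollout flag
  ssoEnabled: boolean; // admin turned SSO on; saving metadata alone does not set this
  jitEnabled: boolean;
  approvedDomains: string[]; // lower-case, e.g. "acme.com"
}

// Extract the lower-cased domain of an email address, or null if malformed.
function emailDomain(email: string): string | null {
  const at = email.lastIndexOf("@");
  if (at <= 0 || at === email.length - 1) return null;
  return email.slice(at + 1).toLowerCase();
}

// JIT provisioning requires the rollout flag, SSO, and JIT to all be on,
// and the asserted email must belong to an approved domain (exact match).
function canJitProvision(cfg: TenantIdentityConfig, email: string): boolean {
  if (!cfg.ssoFlagEnabled || !cfg.ssoEnabled || !cfg.jitEnabled) return false;
  const domain = emailDomain(email);
  return domain !== null && cfg.approvedDomains.includes(domain);
}

const acme: TenantIdentityConfig = {
  ssoFlagEnabled: true,
  ssoEnabled: true,
  jitEnabled: true,
  approvedDomains: ["acme.com"],
};
console.log(canJitProvision(acme, "Jane@ACME.com")); // true
console.log(canJitProvision(acme, "jane@sub.acme.com")); // false
console.log(canJitProvision({ ...acme, ssoEnabled: false }, "jane@acme.com")); // false
```

Keeping `ssoEnabled` separate from the saved metadata is what lets admins configure and test an IdP without enabling SSO immediately.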


Step 4: Open a Draft PR for the Plan Before Opening Execution Branches

Once constitution.md, spec.md, plan.md, and tasks.md are in decent shape, open a draft PR containing only those documents.

That gives you one clean review checkpoint for:

  • scope
  • rollout order
  • acceptance criteria
  • dependency sequencing
  • ownership boundaries

This is where the tech lead, staff engineer, or responsible PM should challenge the plan before code starts.

If you skip this step, Beads and the worktrees will just operationalize a shaky plan faster.


Step 5: Move the Approved Work into Beads

Now we translate the approved plan into durable task memory.

Initialize Beads in the repo:

bd init

Then create the epic and major streams using the approved plan, not the raw prompt history:

bd create "Enterprise SSO rollout" -t epic -p 0
bd create "Admin UI for IdP setup and sync monitoring" -p 1
bd create "SAML ACS endpoint and JIT provisioning" -p 0
bd create "SCIM sync worker with retry and reconciliation" -p 0
bd create "Audit events for SSO, SCIM, and admin changes" -p 0
bd create "Tenant rollout flags and migration path" -p 1

Then connect the dependencies:

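# link streams to the epic as children; a default (blocking) link to an open epic would keep them out of bd ready
bd dep add <ui-task-id> <epic-id> --type parent-child
bd dep add <api-task-id> <epic-id> --type parent-child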
bd dep add <worker-task-id> <api-task-id>
bd dep add <audit-task-id> <api-task-id>
bd dep add <rollout-task-id> <api-task-id>
bd ready

The point is not the exact commands. The point is the shape:

  • the spec branch gives you the approved scope
  • Beads turns that scope into durable, claimable, dependency-aware work
  • bd ready becomes the operational queue for humans and agents

This is the right moment to add ownership fields, notes, and links back to the planning docs.
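To see what “dependency-aware” means operationally, here is a toy TypeScript model of a ready queue: open tasks whose blocking dependencies are all closed, ordered by priority. It is an illustration with hypothetical ids, not Beads’ implementation or storage format:

```typescript
type TaskStatus = "open" | "in_progress" | "closed";

interface GraphTask {
  id: string;
  status: TaskStatus;
  priority: number; // 0 is highest, like the -p values above
  blockedBy: string[]; // tasks that must close before this one is ready
}

// Ready = open, and every blocker exists and is closed. Unknown blockers count as open.
function readyQueue(tasks: GraphTask[]): string[] {
  const byId = new Map(tasks.map((t) => [t.id, t] as const));
  return tasks
    .filter((t) => t.status === "open")
    .filter((t) => t.blockedBy.every((dep) => byId.get(dep)?.status === "closed"))
    .sort((a, b) => a.priority - b.priority || a.id.localeCompare(b.id))
    .map((t) => t.id);
}

const graph: GraphTask[] = [
  { id: "api", status: "open", priority: 0, blockedBy: [] },
  { id: "worker", status: "open", priority: 0, blockedBy: ["api"] },
  { id: "audit", status: "open", priority: 0, blockedBy: ["api"] },
  { id: "rollout", status: "open", priority: 1, blockedBy: ["api"] },
  { id: "ui", status: "open", priority: 1, blockedBy: [] },
];
console.log(readyQueue(graph).join(" ")); // api ui
graph[0].status = "closed";
console.log(readyQueue(graph).join(" ")); // audit worker rollout ui
```

Closing the API task is what releases the worker, audit, and rollout streams. That is the same sequencing the dependency commands above encode.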


Step 6: Create One Branch and One Worktree Per Delivery Stream

After the planning PR is approved, merge it.

Then create implementation branches from updated main, not from stale copies of the planning branch.

git switch main
git pull --ff-only

git worktree add -b feat/042-sso-ui ../wt-042-sso-ui main
git worktree add -b feat/042-sso-api ../wt-042-sso-api main
git worktree add -b feat/042-sso-worker ../wt-042-sso-worker main
git worktree add -b feat/042-sso-audit ../wt-042-sso-audit main
git worktree add -b feat/042-sso-rollout ../wt-042-sso-rollout main

Now each stream has:

  • its own branch
  • its own working tree
  • its own owner
  • its own Beads task or subgraph

That is what prevents “everyone is editing auth and migrations at once” chaos.

A clean ownership map

| Stream | Example Beads focus | Branch | Worktree | Primary owner |
|---|---|---|---|---|
| UI | IdP config screens, sync health, rollout controls | feat/042-sso-ui | ../wt-042-sso-ui | Frontend engineer |
| API | SAML callback flow, JIT provisioning, policy checks | feat/042-sso-api | ../wt-042-sso-api | Backend engineer |
| Worker | SCIM sync, retries, reconciliation | feat/042-sso-worker | ../wt-042-sso-worker | Claude Code web or backend engineer |
| Audit | Event schema, instrumentation, observability | feat/042-sso-audit | ../wt-042-sso-audit | Platform engineer |
| Rollout | Feature flags, migration scripts, tenant enablement | feat/042-sso-rollout | ../wt-042-sso-rollout | Tech lead or staff engineer |

If two streams need the same files, they are not separate streams yet. Go back and redraw the boundaries.
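That check is cheap to automate at planning time. A TypeScript sketch, assuming each stream declares the path prefixes it expects to edit; the ownership map and paths are hypothetical:

```typescript
// Strip trailing slashes so "a/b/" and "a/b" compare equal.
function normalizePath(path: string): string {
  return path.replace(/\/+$/, "");
}

// Two prefixes overlap if they are equal or one is a parent directory of the other.
function pathsOverlap(a: string, b: string): boolean {
  const x = normalizePath(a);
  const y = normalizePath(b);
  return x === y || x.startsWith(`${y}/`) || y.startsWith(`${x}/`);
}

interface OwnershipConflict {
  streams: [string, string];
  paths: [string, string];
}

// Compare every pair of streams and report each overlapping pair of path prefixes.
function findOwnershipConflicts(ownership: Record<string, string[]>): OwnershipConflict[] {
  const names = Object.keys(ownership).sort();
  const conflicts: OwnershipConflict[] = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      for (const p of ownership[names[i]]) {
        for (const q of ownership[names[j]]) {
          if (pathsOverlap(p, q)) {
            conflicts.push({ streams: [names[i], names[j]], paths: [p, q] });
          }
        }
      }
    }
  }
  return conflicts;
}

// Hypothetical monorepo layout for feature 042.
const plannedOwnership: Record<string, string[]> = {
  ui: ["apps/web/src/admin/identity"],
  api: ["apps/api/src/auth/saml", "packages/db/migrations"],
  worker: ["apps/worker/src/scim"],
  audit: ["packages/audit"],
  rollout: ["packages/flags", "packages/db/migrations"],
};
const planConflicts = findOwnershipConflicts(plannedOwnership);
console.log(planConflicts.map((c) => `${c.streams.join(" vs ")}: ${c.paths[0]}`).join("\n"));
// api vs rollout: packages/db/migrations
```

In this hypothetical layout the API and rollout streams both claim migrations, which is exactly the overlap that should send the plan back for redrawing, for example by giving migrations to a single stream.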


Step 7: Use Cursor for Ambiguous Work, Claude Code on the Web for Bounded Work

This is the practical split I recommend.

Keep work in Cursor when:

  • the architecture is still moving
  • you need heavy codebase exploration
  • a human needs to guide the solution in tight loops
  • the task affects several shared abstractions at once

Use Claude Code on the web when:

  • the task is well-bounded and already tied to a branch
  • the acceptance criteria are stable
  • the agent can run for a while without constant steering
  • you want async progress, diff review, or PR auto-fix

For example, the SCIM worker stream is a great candidate for cloud execution once the contracts are stable.

From the worker worktree:

cd ../wt-042-sso-worker
git push -u origin feat/042-sso-worker

claude --remote "Checkout feat/042-sso-worker and implement the approved
SCIM retry and reconciliation tasks from specs/042-enterprise-sso/tasks.md.
Do not change API contracts without asking. Run tests and leave the branch
ready for PR review."

Why this split works:

  • Claude Code on the web runs the task asynchronously on Anthropic-managed cloud infrastructure
  • it shows the resulting diff so you can review it before a PR is created
  • it can respond to PR comments and CI failures if auto-fix is enabled
  • you can teleport the session back into the terminal later if you want to continue locally

This is much safer than pointing a cloud agent at a vague prompt like “build enterprise SSO.”
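The two checklists in this step amount to a routing rule. Written down as a TypeScript sketch, with hypothetical task attributes and an assumed threshold for “several shared abstractions”:

```typescript
// Hypothetical task profile; the fields mirror the two checklists in this step.
interface TaskProfile {
  architectureSettled: boolean;
  acceptanceCriteriaStable: boolean;
  hasOwnBranch: boolean;
  needsHeavyExploration: boolean;
  needsTightHumanLoop: boolean;
  sharedAbstractionsTouched: number;
}

type ExecutionSurface = "cursor" | "claude-code-web";

// Anything not clearly bounded stays in the interactive IDE by default.
function chooseSurface(t: TaskProfile): ExecutionSurface {
  const bounded =
    t.architectureSettled &&
    t.acceptanceCriteriaStable &&
    t.hasOwnBranch &&
    !t.needsHeavyExploration &&
    !t.needsTightHumanLoop &&
    t.sharedAbstractionsTouched <= 1; // assumed threshold for "several"
  return bounded ? "claude-code-web" : "cursor";
}

const scimWorker: TaskProfile = {
  architectureSettled: true,
  acceptanceCriteriaStable: true,
  hasOwnBranch: true,
  needsHeavyExploration: false,
  needsTightHumanLoop: false,
  sharedAbstractionsTouched: 1,
};
console.log(chooseSurface(scimWorker)); // claude-code-web
console.log(chooseSurface({ ...scimWorker, architectureSettled: false })); // cursor
```

Defaulting to Cursor means an unclear task costs some human attention instead of an unsteered cloud run heading in the wrong direction.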


Step 8: Give Humans and Agents the Same Operating Rules

If both humans and AI are participating, the repo needs one shared contract.

I would add a short instruction block to AGENTS.md or CLAUDE.md like this:

## Delivery rules for feature 042-enterprise-sso

- Read `specs/042-enterprise-sso/spec.md`, `plan.md`, and `tasks.md` before editing code.
- Claim a Beads task before starting implementation.
- One task stream per branch and worktree.
- Do not change auth contracts, audit schema, or rollout flags without updating the planning docs.
- If a task touches files owned by another active stream, stop and ask for re-planning.

This matters more than people think.

When several agents are active, the main failure mode is not syntax errors. It is unspoken overlap.


Step 9: Assign Work to the Team Like an Engineering Lead, Not Like a Prompt Author

The most useful handoff unit is not “please help with SSO.”

It is something closer to:

  • Frontend engineer: own feat/042-sso-ui; implement admin configuration, validation, and sync status screens; do not change callback semantics
  • Backend engineer: own feat/042-sso-api; implement SAML ACS flow, JIT provisioning, and tenant-domain enforcement
  • Platform engineer: own feat/042-sso-audit; define audit event schema and instrumentation hooks
  • Claude Code web: own feat/042-sso-worker; implement retryable SCIM reconciliation against the approved contracts
  • Tech lead: own feat/042-sso-rollout; keep the migration path and feature-flag strategy coherent across all streams

That is how Beads, worktrees, and async agents become helpful instead of noisy.

Each contributor should claim tasks explicitly:

bd ready
bd update <task-id> --claim
bd show <task-id>

Now the coordination surface is durable, inspectable, and not trapped in chat history.
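Explicit claiming prevents collisions only if a claim fails when someone else already holds the task. A toy TypeScript model of that rule, not Beads’ implementation; the task id and owner names are hypothetical:

```typescript
// Toy model of explicit claiming, not Beads' implementation.
interface ClaimableTask {
  id: string;
  assignee: string | null;
  status: "open" | "in_progress" | "closed";
}

type ClaimResult = { ok: true } | { ok: false; reason: string };

// A claim succeeds only if the task is not closed and nobody else holds it.
// Re-claiming your own task succeeds again. In a shared store this
// check-and-set must happen atomically, or two agents can both "win".
function claimTask(task: ClaimableTask, who: string): ClaimResult {
  if (task.status === "closed") {
    return { ok: false, reason: `${task.id} is closed` };
  }
  if (task.assignee !== null && task.assignee !== who) {
    return { ok: false, reason: `${task.id} is already claimed by ${task.assignee}` };
  }
  task.assignee = who;
  task.status = "in_progress";
  return { ok: true };
}

const workerTask: ClaimableTask = { id: "bd-a1b2", assignee: null, status: "open" };
console.log(claimTask(workerTask, "claude-web").ok); // true
const secondClaim = claimTask(workerTask, "backend-eng");
console.log(secondClaim.ok ? "claimed" : secondClaim.reason); // bd-a1b2 is already claimed by claude-web
```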


Step 10: Merge in Dependency Order, Not in Completion Order

This is where teams accidentally recreate chaos after doing the hard planning work.

Do not merge whichever branch happens to finish first.

Merge in dependency order:

  1. API contracts and migrations that define the stable surface
  2. Audit instrumentation if it is required by multiple streams
  3. Worker behavior once contracts are real
  4. UI and rollout controls once the lower layers are stable

If one stream finishes early, that is fine. It can stay open until the dependency chain is ready.
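Dependency order is a topological sort of the stream graph. A TypeScript sketch using Kahn’s algorithm, with hypothetical stream names and one assumption beyond the Beads graph in Step 5: the worker emits audit events, so it depends on the audit stream as well as the API.

```typescript
// Kahn's algorithm: a stream can merge only after every stream it depends on
// has merged. Ready streams are taken alphabetically so the order is deterministic.
function mergeOrder(dependsOn: Record<string, string[]>): string[] {
  const streams = Object.keys(dependsOn);
  const pending: Record<string, number> = {}; // unmerged dependencies per stream
  const dependents: Record<string, string[]> = {}; // reverse edges
  for (const s of streams) {
    pending[s] = 0;
    dependents[s] = [];
  }
  for (const s of streams) {
    for (const dep of dependsOn[s]) {
      if (!Object.prototype.hasOwnProperty.call(dependsOn, dep)) {
        throw new Error(`${s} depends on unknown stream ${dep}`);
      }
      pending[s] += 1;
      dependents[dep].push(s);
    }
  }
  const ready = streams.filter((s) => pending[s] === 0).sort();
  const order: string[] = [];
  while (ready.length > 0) {
    const next = ready.shift() as string;
    order.push(next);
    for (const d of dependents[next]) {
      pending[d] -= 1;
      if (pending[d] === 0) {
        ready.push(d);
        ready.sort();
      }
    }
  }
  if (order.length !== streams.length) {
    throw new Error("dependency cycle: no valid merge order");
  }
  return order;
}

const sequence = mergeOrder({
  api: [],
  audit: ["api"],
  worker: ["api", "audit"],
  rollout: ["worker"],
  ui: ["worker"],
});
console.log(sequence.join(" -> ")); // api -> audit -> worker -> rollout -> ui
```

A stream that finishes early simply waits until it reaches the front of this order, and a cycle is reported as an error instead of being merged around.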

This is also where Claude Code’s PR auto-fix loop can be useful. A bounded branch that already has clear ownership is a good candidate for:

  • CI-failure fixes
  • style or lint cleanups
  • small review comment iterations

It is not the place to let an agent reinterpret architecture after the team already agreed on the spec.


Where This Workflow Breaks

This pattern is strong, but not universal.

It breaks down when:

  • the feature is too small to justify a planning branch
  • the repo is so coupled that “separate streams” still edit the same files
  • the team treats Beads and markdown plans as two separate sources of truth
  • nobody owns the acceptance criteria after task generation

It also breaks if you skip the planning review and go straight from a prompt into a five-branch execution plan. That just gives you faster confusion.

The safe defaults are:

  • spec first
  • one approved task graph
  • one owner per stream
  • one worktree per stream
  • one source of truth for task state

The Practical Takeaway

The earlier comparison article was useful because it separated planning, execution, and persistence into different tool strengths.

The more practical follow-up is this:

Use Spec Kit and Cursor to decide what should be built. Use Beads to remember what is currently being built. Use branches, worktrees, and Claude Code to control how it gets built.

That sounds simple, but it solves a very real engineering problem:

  • specs tend to vanish into chat logs
  • tasks drift across sessions
  • branches turn into junk drawers
  • agents overlap when nobody defines ownership

A spec-first planning branch, followed by a Beads task graph, followed by isolated worktrees, is the cleanest pattern I have found so far for keeping a complex feature legible while multiple humans and agents are all moving at once.


Sources