Using GitHub Spec Kit to Investigate Alternatives Before You Build

A practical guide for engineers and PMs who want to use GitHub Spec Kit as an investigation workflow: compare alternatives, approve one direction, and turn the decision into a stronger implementation spec.

Updated Jun 12, 2026

TL;DR

  • GitHub Spec Kit is not only useful after you already know what to build. It can also help you investigate which option is worth building.
  • Treat the early Spec Kit flow as a decision funnel:
problem
  -> investigation brief
  -> alternatives
  -> decision matrix
  -> approved direction
  -> hardened spec
  -> implementation plan
  • Use /speckit.specify to capture the problem and success criteria, /speckit.clarify to expose unknowns, and /speckit.plan to compare implementation approaches before writing production code.
  • Ask the AI assistant for explicit alternatives, expected trade-offs, validation experiments, and “what would make this wrong?”
  • Do not approve an alternative because it sounds elegant. Approve it when the spec explains:
    • why this option wins
    • what it rejects
    • what must be verified
    • what risks remain
    • what implementation constraints are now non-negotiable
  • The strongest output is not just a prettier spec. It is a spec with a decision record inside it.

What You Will Learn Here

By the end of this article, you should be able to:

  • use GitHub Spec Kit as a research workflow, not only an implementation workflow
  • structure prompts that ask an AI agent to investigate alternatives
  • recognize the expected outputs at each stage
  • approve one alternative without losing the rejected context
  • turn the approved option into a more solid implementation spec
  • explain the workflow to both engineers and PMs

This article is written for teams that already feel the pain of AI-assisted delivery:

  • ideas move faster than shared understanding
  • agents produce code before requirements are stable
  • PMs need to compare product options without reading a giant diff
  • engineers need to compare architecture options without turning every idea into a branch

Research Audit

I rechecked this article on June 12, 2026 against the current public materials for GitHub Spec Kit and adjacent spec-driven tools.

The source-backed parts are:

  • Spec Kit is an open source toolkit from GitHub for spec-driven development.
  • The official flow centers on specifications, plans, tasks, and implementation.
  • Spec Kit initializes project-local templates and assistant-specific command files.
  • GitHub positions Spec Kit as a way to make specifications executable enough to drive AI-assisted implementation.
  • Kiro and Tessl are useful comparison points because they also organize software work around specs, requirements, design, tasks, and validation.

The opinionated parts are my synthesis:

  • using Spec Kit as an investigation funnel
  • adding an explicit alternative approval step
  • treating the approved alternative as a decision record that strengthens the implementation spec
  • using lightweight experiments before implementation approval

The Problem: AI Agents Collapse the Wrong Distance

AI coding agents are very good at collapsing the distance between:

prompt -> code

That is useful, but it is not always the distance your team needs to collapse.

For real product and engineering work, the more expensive distance is usually:

unclear problem -> evaluated options -> defensible decision

If you skip that step, the agent can still produce code. It may even produce good code. But the team has not answered the higher-leverage questions:

  • Are we solving the right version of the problem?
  • What alternatives did we reject?
  • What constraints matter most?
  • What would make this approach fail?
  • What should be true before implementation begins?

That is where GitHub Spec Kit becomes more interesting than a scaffold for implementation. It gives you a repeatable place to slow down before code, without turning planning into a heavy ceremony.

The Simple Mental Model

Spec Kit is often introduced as a path from idea to implementation:

constitution
  -> spec
  -> clarify
  -> plan
  -> tasks
  -> implement

For investigation work, I like to stretch the middle:

constitution
  -> investigation spec
  -> clarification questions
  -> alternative plans
  -> decision approval
  -> final implementation spec
  -> tasks
  -> implementation

The important shift is this:

the first spec is allowed to be a research brief.

It does not need to pretend the team already knows the answer. In fact, the strongest investigation specs say the opposite:

We know the desired outcome. We do not yet know the best implementation path.

That sentence is healthy. It tells the agent to investigate instead of performing certainty.

When to Use Spec Kit for Investigation

Use this workflow when the cost of choosing poorly is higher than the cost of doing one extra planning loop.

Good fits:

  • replacing a core service dependency
  • choosing between build and buy
  • redesigning a data model
  • adding an auth or permissions layer
  • planning a migration from a legacy feature
  • deciding between multiple agent frameworks
  • introducing a new infrastructure component
  • picking a product workflow that changes user behavior

Weak fits:

  • tiny UI copy changes
  • obvious bug fixes
  • low-risk CRUD additions
  • tasks where the team already has a standard pattern

Spec Kit is useful when ambiguity is real. If the answer is obvious, do not add ceremony for aesthetic reasons.

The Investigation Funnel

Here is the workflow I recommend.

+----------------------+
| 1. Frame the problem |
+----------+-----------+
           |
           v
+----------------------+
| 2. Generate options  |
+----------+-----------+
           |
           v
+----------------------+
| 3. Compare tradeoffs |
+----------+-----------+
           |
           v
+----------------------+
| 4. Validate quickly  |
+----------+-----------+
           |
           v
+----------------------+
| 5. Approve one path  |
+----------+-----------+
           |
           v
+----------------------+
| 6. Harden the spec   |
+----------+-----------+
           |
           v
+----------------------+
| 7. Plan and build    |
+----------------------+

The main difference from a normal AI chat is that every stage should produce an artifact the team can review.

Step 1: Start With an Investigation Spec

Instead of asking:

Build a notification preference system.

Start with:

/speckit.specify

We need to design a notification preference system for a B2B SaaS app.

Goal:
- users can control which events notify them by email, in-app, or Slack
- workspace admins can set defaults for new members
- compliance events must remain non-optional

Investigation request:
- compare at least three implementation alternatives
- include schema impact, product flexibility, migration cost, and operational risk
- do not choose an implementation yet
- produce a spec that captures the problem, user outcomes, constraints,
  open questions, and evaluation criteria

Context:
- existing app uses Postgres
- events are already published to an internal queue
- product expects more notification channels next quarter
- team wants to avoid a hard-coded matrix of event types

Expected output:

  • a feature folder for the current branch
  • a spec.md file
  • user stories or scenarios
  • functional requirements
  • success criteria
  • assumptions and open questions
  • enough context to support alternative generation later

What you are checking:

  • Does the spec separate the user problem from the implementation guess?
  • Are success criteria measurable?
  • Are constraints explicit?
  • Are unresolved questions visible instead of hidden?

If the first spec already picks a solution, push back.

Revise this spec so it remains implementation-neutral.
Keep the desired outcomes and constraints, but remove solution commitments.
Add an "Alternatives to investigate" section with placeholders only.

Step 2: Use Clarification to Expose Decision Inputs

Run the clarification step before you ask for architecture options.

/speckit.clarify

Focus on questions that affect the choice between alternatives.
Group them by:
- product behavior
- data model
- migration
- security and compliance
- operational ownership
- future extension

Expected output:

  • a list of important unanswered questions
  • suggested answers or assumptions
  • updated spec text that makes ambiguity visible

Good clarification questions sound like this:

  • Can workspace admins enforce a channel, or only set defaults?
  • Are compliance notifications legally required to be delivered, or only attempted?
  • Do we need per-user quiet hours?
  • Should notification preferences apply retroactively to scheduled notifications?
  • How many event types exist today, and who owns adding new ones?

Weak clarification questions sound like this:

  • Should the UI be nice?
  • Should the system be scalable?
  • Should we use a database?

Those are too generic. Good questions change the decision.

Step 3: Ask for Alternatives Explicitly

Now ask the assistant to generate competing implementation paths.

/speckit.plan

Before producing the final implementation plan, investigate three alternatives:

Alternative A:
- normalized relational model for users, channels, event types, and preferences

Alternative B:
- JSONB policy document per user or workspace

Alternative C:
- rules engine / policy evaluation layer with event metadata

For each alternative, produce:
- short architecture summary
- data model sketch
- example read path and write path
- migration strategy
- product flexibility
- operational complexity
- testing burden
- failure modes
- when this option is the wrong choice

End with a decision matrix and a recommendation, but do not create tasks yet.

Expected output:

  • a plan.md or plan-like artifact
  • architecture sketches
  • trade-off analysis
  • recommended direction
  • technical unknowns
  • validation steps

The key instruction is “do not create tasks yet.”

You are still deciding. Generating tasks too early creates false momentum.

Step 4: Require a Decision Matrix

Free-form comparisons are easy to read and easy to forget. A decision matrix forces the trade-off into a form PMs and engineers can discuss together.

Prompt example:

Create a decision matrix with scores from 1 to 5, where 5 is the most favorable (for operational risk, 5 means lowest risk).

Criteria:
- user experience flexibility
- admin control
- implementation simplicity
- migration safety
- reporting/queryability
- future channel support
- operational risk
- testability

Add one paragraph below the table explaining where the scoring is uncertain.

Expected output:

Criteria                    | Relational | JSONB Policy | Rules Engine
----------------------------|------------|--------------|-------------
UX flexibility              | 4          | 3            | 5
Admin control               | 4          | 3            | 5
Implementation simplicity   | 4          | 5            | 2
Migration safety            | 4          | 3            | 2
Reporting/queryability      | 5          | 2            | 3
Future channel support      | 4          | 3            | 5
Operational risk (5 = low)  | 4          | 3            | 2
Testability                 | 4          | 3            | 3

The exact scores matter less than the discussion they create.

A good matrix should make someone on the team say:

I disagree with this score.

That is not a failure. That is the process working.
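
One way to make that disagreement concrete is to re-total the matrix under different weights. The sketch below copies the example scores from the table above and sums them per option. The weighting scheme is an assumption for illustration, not something Spec Kit produces or prescribes.

```python
# Scores copied from the example matrix above (1 = weakest, 5 = strongest).
CRITERIA = [
    "UX flexibility", "Admin control", "Implementation simplicity",
    "Migration safety", "Reporting/queryability", "Future channel support",
    "Operational risk", "Testability",
]
SCORES = {
    "Relational":   [4, 4, 4, 4, 5, 4, 4, 4],
    "JSONB Policy": [3, 3, 5, 3, 2, 3, 3, 3],
    "Rules Engine": [5, 5, 2, 2, 3, 5, 2, 3],
}


def weighted_totals(scores, weights=None):
    """Return {option: weighted sum}. Weights default to 1 per criterion."""
    weights = weights or [1] * len(CRITERIA)
    if len(weights) != len(CRITERIA):
        raise ValueError("need exactly one weight per criterion")
    return {
        option: sum(score * weight for score, weight in zip(row, weights))
        for option, row in scores.items()
    }


def ranking(totals):
    """Options ordered from highest total to lowest."""
    return sorted(totals, key=totals.get, reverse=True)


# Equal weights: every criterion counts the same.
equal = weighted_totals(SCORES)

# Hypothetical weighting for a team that values UX flexibility, admin control,
# and future channel support four times as much as everything else.
flexibility_first = weighted_totals(SCORES, [4, 4, 1, 1, 1, 4, 1, 1])

print(ranking(equal), equal)
print(ranking(flexibility_first), flexibility_first)
```

With equal weights the relational model leads. Under the flexibility-first weights the rules engine overtakes it. That sensitivity is the useful signal: if the ranking flips under weights someone on the team would defend, the decision needs more discussion before approval.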

Step 5: Ask for Validation Experiments

Before approving an alternative, ask what can be tested quickly.

For the recommended alternative, propose validation experiments that can be
completed before implementation.

Include:
- one code spike
- one schema/query experiment
- one product review artifact
- one risk we cannot validate cheaply

Keep each experiment under half a day.

Expected output:

  • a small prototype or script idea
  • a query benchmark or schema proof
  • a wireframe or product scenario review
  • a named risk that remains uncertain

Example validation plan:

Experiment 1: Query shape
- Create sample tables for event_types, notification_channels, and preferences.
- Seed 50 event types, 5 channels, 10k users.
- Verify the read path can resolve preferences for one user and one event in one query.

Experiment 2: Admin default behavior
- Write three acceptance scenarios:
  1. new user inherits workspace defaults
  2. user override wins over default
  3. compliance event cannot be disabled

Experiment 3: Product review
- Review preference grouping with PM and design.
- Confirm whether users think in event categories or individual events.

This step is what keeps investigation from becoming theater.
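
As an illustration, Experiment 1 could be spiked in a few minutes along the following lines. This is a minimal sketch under stated assumptions: in-memory SQLite stands in for Postgres so it runs without a server, the column names and the `default_enabled` flag are hypothetical, and the `sms` and `webhook` channels are placeholders added only to reach five channels.

```python
import sqlite3

# email, in_app, and slack come from the article; sms and webhook are placeholders.
CHANNELS = ["email", "in_app", "slack", "sms", "webhook"]

SCHEMA = """
CREATE TABLE event_types (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    default_enabled INTEGER NOT NULL
);
CREATE TABLE notification_channels (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE preferences (
    user_id INTEGER NOT NULL,
    event_type_id INTEGER NOT NULL REFERENCES event_types(id),
    channel_id INTEGER NOT NULL REFERENCES notification_channels(id),
    enabled INTEGER NOT NULL,
    PRIMARY KEY (user_id, event_type_id, channel_id)
);
"""

# One query: every channel for one event, with the user's override if present,
# otherwise the event type's default.
RESOLVE_SQL = """
SELECT c.name, COALESCE(p.enabled, et.default_enabled) AS enabled
FROM notification_channels AS c
JOIN event_types AS et ON et.id = :event_type_id
LEFT JOIN preferences AS p
       ON p.user_id = :user_id
      AND p.event_type_id = et.id
      AND p.channel_id = c.id
ORDER BY c.id
"""


def build_sample_db(users=10_000, event_types=50):
    """Create and seed the sample tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO event_types (id, name, default_enabled) VALUES (?, ?, 1)",
        [(i, f"event_{i}") for i in range(1, event_types + 1)],
    )
    conn.executemany(
        "INSERT INTO notification_channels (id, name) VALUES (?, ?)",
        list(enumerate(CHANNELS, start=1)),
    )
    # Seed one override per user: mute Slack (channel 3) for a single event type.
    conn.executemany(
        "INSERT INTO preferences (user_id, event_type_id, channel_id, enabled) "
        "VALUES (?, ?, 3, 0)",
        [(u, u % event_types + 1) for u in range(1, users + 1)],
    )
    return conn


def resolve(conn, user_id, event_type_id):
    """Return [(channel_name, enabled)] for one user and one event type."""
    rows = conn.execute(
        RESOLVE_SQL, {"user_id": user_id, "event_type_id": event_type_id}
    )
    return [(name, bool(enabled)) for name, enabled in rows]


conn = build_sample_db()
print(resolve(conn, user_id=7, event_type_id=8))
```

The spike answers a narrow question: can the read path stay a single query once defaults and overrides both exist? Real performance numbers would still need to come from Postgres with production-like data.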

Step 6: Approve One Alternative

The approval moment should be explicit. Do not let the final plan silently choose.

Use a prompt like this:

We approve Alternative A: normalized relational model.

Update the spec and plan to record:
- why this option was approved
- which alternatives were rejected
- what trade-offs we accept
- which assumptions are now locked
- which risks require monitoring during implementation

Then rewrite the implementation spec so it is specific enough for task generation.

Expected output:

  • a final recommendation section
  • rejected alternatives
  • accepted trade-offs
  • implementation constraints
  • updated functional requirements
  • updated non-functional requirements
  • clearer acceptance criteria

The approved spec should now sound different from the investigation spec.

Before:

The system should support notification preferences across channels.

After:

The system must store notification preferences in normalized relational tables:
event_types, notification_channels, workspace_notification_defaults, and
user_notification_preferences. User overrides must take precedence over
workspace defaults, except for compliance events marked as non-optional.

That difference matters. The first version tells the agent what the product wants. The second tells the agent what the team decided.
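
The precedence rule in the "After" text is precise enough to express directly in code. The sketch below is one possible reading of it; the `EventType` class, the `non_optional` flag, and the function name are hypothetical, and it assumes that "non-optional" means the notification is always enabled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventType:
    name: str
    non_optional: bool = False  # hypothetical flag for compliance events


def resolve_enabled(
    event: EventType,
    workspace_default: bool,
    user_override: Optional[bool],
) -> bool:
    """Decide whether a notification is enabled for one channel.

    Precedence, highest first:
      1. non-optional (compliance) events are always enabled
      2. an explicit user override
      3. the workspace default
    """
    if event.non_optional:
        return True
    if user_override is not None:
        return user_override
    return workspace_default


comment_added = EventType("comment_added")
audit_export = EventType("audit_export", non_optional=True)

# The three acceptance scenarios from the admin-default experiment:
print(resolve_enabled(comment_added, workspace_default=True, user_override=None))
print(resolve_enabled(comment_added, workspace_default=True, user_override=False))
print(resolve_enabled(audit_export, workspace_default=False, user_override=False))
```

The investigation-stage sentence could not be turned into a function like this without guessing. The approved version can, which is a quick test of whether a spec is ready for task generation.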

Step 7: Generate Tasks Only After Approval

Now the task prompt has real grounding.

/speckit.tasks

Generate implementation tasks from the approved relational-model spec.

Group tasks by:
- database migration
- domain model
- preference resolution service
- API endpoints
- admin defaults
- user settings UI
- compliance event enforcement
- tests
- migration and rollout

Mark tasks that can run in parallel.

Expected output:

  • ordered tasks
  • dependency-aware sequencing
  • test tasks
  • migration tasks
  • reviewable implementation slices

Good tasks should be small enough that an engineer or AI agent can complete them without reopening the whole product debate.
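
To sanity-check the sequencing an agent proposes, the task groups can be modeled as a dependency graph and sorted into batches that may run in parallel. The sketch below uses Python's standard `graphlib` module. The dependency edges are assumptions made up for this example; a real project would take them from the generated task list.

```python
from graphlib import TopologicalSorter

# Maps each task group to the groups it depends on.
# These edges are hypothetical and exist only to illustrate the technique.
DEPENDS_ON = {
    "database migration": set(),
    "domain model": {"database migration"},
    "preference resolution service": {"domain model"},
    "compliance event enforcement": {"preference resolution service"},
    "API endpoints": {"preference resolution service"},
    "admin defaults": {"API endpoints"},
    "user settings UI": {"API endpoints"},
    "tests": {"admin defaults", "user settings UI", "compliance event enforcement"},
    "migration and rollout": {"tests"},
}


def parallel_batches(graph):
    """Return task groups as ordered batches; items in a batch can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(parallel_batches(DEPENDS_ON), start=1):
    print(number, batch)
```

Any batch with more than one entry is a candidate for parallel work. A cycle error is a sign the task list still contains an unresolved design question.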

What a Strong Final Spec Contains

After investigation, the final spec should include these sections:

  • Problem statement
  • Users and stakeholders
  • Success criteria
  • Scope and non-scope
  • Approved approach
  • Rejected alternatives
  • Decision matrix summary
  • Assumptions
  • Functional requirements
  • Non-functional requirements
  • Data model
  • Acceptance scenarios
  • Migration plan
  • Risks and mitigations
  • Validation checklist

Here is a compact outline you can ask Spec Kit to produce:

Rewrite the spec using this structure:

1. Problem
2. Goals
3. Non-goals
4. Users and stakeholders
5. Approved approach
6. Rejected alternatives
7. Functional requirements
8. Non-functional requirements
9. Data model
10. Acceptance scenarios
11. Migration and rollout
12. Risks
13. Validation checklist

Keep the language precise enough for implementation.
Remove speculative options from the requirements, but keep them in
"Rejected alternatives" with short rationale.
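
A small script can confirm that a rewritten spec actually contains every section in the outline above. This is a minimal sketch, assuming the spec is a Markdown file whose section titles appear as `#` headings, optionally numbered. The function name and the heading pattern are hypothetical and not part of Spec Kit.

```python
import re

# Section titles taken from the outline above.
REQUIRED_SECTIONS = [
    "Problem", "Goals", "Non-goals", "Users and stakeholders",
    "Approved approach", "Rejected alternatives", "Functional requirements",
    "Non-functional requirements", "Data model", "Acceptance scenarios",
    "Migration and rollout", "Risks", "Validation checklist",
]

# Matches lines such as "## Problem" or "## 6. Rejected alternatives".
HEADING = re.compile(r"^#{1,6}\s+(?:\d+\.\s*)?(.+?)\s*$")


def missing_sections(spec_text, required=REQUIRED_SECTIONS):
    """Return the required section titles that have no matching heading."""
    present = set()
    for line in spec_text.splitlines():
        match = HEADING.match(line)
        if match:
            present.add(match.group(1).casefold())
    return [name for name in required if name.casefold() not in present]


sample_spec = """\
# Notification preferences

## 1. Problem
Users cannot control which events notify them.

## 5. Approved approach
Normalized relational model.

## 6. Rejected alternatives
JSONB policy document; rules engine.
"""

print(missing_sections(sample_spec))
```

Run against the real spec file, an empty result means the structure is complete. It says nothing about whether the content of each section is any good, which remains a review task.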

A Reusable Prompt Pack

Here is a practical prompt pack you can adapt.

1. Investigation Brief

/speckit.specify

Create an investigation spec for [problem].

We know:
- [context]
- [constraints]
- [desired outcomes]

We do not yet know:
- [decision to make]

The spec must stay implementation-neutral.
Include success criteria, non-goals, open questions, and evaluation criteria.

2. Clarify the Decision

/speckit.clarify

Ask only questions that could change the implementation choice.
Group them by product, technical, data, security, migration, and operations.
For each question, explain why the answer matters.

3. Generate Alternatives

/speckit.plan

Investigate at least three implementation alternatives.
For each one, include:
- architecture
- data model
- integration points
- test strategy
- migration path
- risks
- when not to use it

End with a decision matrix and recommendation.
Do not generate implementation tasks.

4. Challenge the Recommendation

Before we approve this recommendation, argue against it.

What assumptions would make it wrong?
What hidden costs are easy to underestimate?
What would a senior engineer object to?
What would a PM object to?
What should we validate before committing?

5. Approve the Direction

We approve [alternative].

Update the spec and plan:
- record the decision
- summarize rejected alternatives
- lock the implementation constraints
- preserve remaining risks
- rewrite acceptance criteria so they match the approved approach

6. Produce Build Tasks

/speckit.tasks

Generate tasks from the approved spec only.
Do not reopen rejected alternatives unless a task depends on an unresolved risk.
Group tasks by dependency and mark parallelizable work.

Example: Build vs Buy for Feature Flags

Imagine a team is deciding whether to build a lightweight internal feature flag system or adopt an external platform.

Investigation prompt:

/speckit.specify

Create an investigation spec for feature flag management.

Context:
- B2B SaaS product
- 30 engineers
- weekly releases
- current flags are environment variables
- PMs want controlled beta rollouts
- compliance team wants audit history

Investigate:
- build a simple internal flag service
- adopt a managed feature flag platform
- use database-backed flags inside the existing admin app

Success criteria:
- reduce risky releases
- allow targeted rollout by workspace
- preserve audit history
- avoid blocking releases on engineering-only changes

Stay implementation-neutral until alternatives are compared.

Expected final decision summary:

Approved approach:
Use a managed feature flag platform for rollout targeting, audit history,
and non-engineering control.

Rejected:
- Internal service: lower vendor cost, but high hidden operational and audit burden.
- DB-backed admin flags: fast to start, but likely to become an incomplete
  feature flag platform without experimentation, SDKs, or evaluation caching.

Accepted trade-off:
The team accepts vendor dependency in exchange for faster safe rollout,
auditing, and PM-operable controls.

That summary is valuable even before implementation starts. It gives PMs a clear decision and engineers a clear boundary.

How This Compares to Other Spec-Driven Tools

GitHub Spec Kit is one part of a broader shift toward spec-driven development.

Kiro, for example, also emphasizes specs with requirements, design, and tasks. Tessl frames software development around specs and AI-native delivery. Those tools may provide more integrated product surfaces depending on your workflow.

Spec Kit is especially interesting when you want:

  • an open source toolkit
  • local repository artifacts
  • assistant-specific command files
  • a workflow that fits Git branches and PR review
  • specs that live near the code

The main trade-off is that Spec Kit is still a toolkit, not a full product management system. It gives structure, but your team still owns the quality of the questions, decisions, and approval process.

Common Failure Modes

Failure Mode 1: The Spec Chooses Too Early

Symptom:

The spec says "use Redis Streams" before the team has compared queues.

Fix:

Revise this as an investigation spec.
Move Redis Streams into "Alternatives to investigate."
Keep requirements implementation-neutral until approval.

Failure Mode 2: The Matrix Is Fake Precision

Scoring alternatives can create an illusion of certainty.

Fix:

For every score in the decision matrix, add confidence:
- high
- medium
- low

Then list the three scores most likely to change after validation.

Failure Mode 3: Rejected Alternatives Disappear

If the final spec only includes the winning option, future readers lose the decision history.

Fix:

Add a "Rejected alternatives" section.
For each rejected option, include:
- why it was considered
- why it lost
- when we should reconsider it

Failure Mode 4: Tasks Reopen the Debate

Sometimes task generation reintroduces options the team already rejected.

Fix:

Regenerate tasks using only the approved approach.
Rejected alternatives are context, not implementation options.

A Lightweight Approval Checklist

Before moving from investigation to tasks, ask:

[ ] Is the user problem clear?
[ ] Are success criteria measurable?
[ ] Were at least two serious alternatives compared?
[ ] Are rejected alternatives recorded?
[ ] Is the approved approach explicit?
[ ] Are accepted trade-offs named?
[ ] Are implementation constraints clear?
[ ] Are remaining risks visible?
[ ] Is there a validation plan?
[ ] Would a new engineer understand why this path won?

If you cannot check those boxes, do not ask the agent to implement yet.

The PM and Engineering Split

This workflow works best when PMs and engineers review different parts of the same artifacts.

PMs should focus on:

  • problem framing
  • user scenarios
  • success criteria
  • non-goals
  • acceptance criteria
  • product trade-offs

Engineers should focus on:

  • architecture choices
  • migration risk
  • data model
  • operational behavior
  • security implications
  • test strategy

The shared artifact is the spec. The shared decision is the approved alternative.

That is the real benefit: everyone reviews the same story from different angles before the code exists.

Where Spec Kit Still Needs Human Judgment

Spec Kit can organize the investigation, but it cannot guarantee the team is asking the right questions.

You still need humans to decide:

  • which constraints are real
  • which risks are unacceptable
  • when a product trade-off is worth engineering complexity
  • when the recommendation feels too convenient
  • when a prototype is necessary
  • when to stop analyzing and build

The healthiest version of this workflow is not “AI decides.” It is:

AI structures the decision.
Humans approve the decision.
AI helps execute the approved decision.

Final Takeaway

GitHub Spec Kit is easy to underestimate if you treat it as a command wrapper for AI implementation.

The better use is more strategic:

make the agent help you think before you make it help you code.

Use the first spec as an investigation brief. Use clarification to expose decision inputs. Use planning to compare alternatives. Use approval to lock one direction. Then use tasks and implementation with much more confidence.

That small pause can save days of beautifully implemented wrongness.

Source List