AI Coding Workflows

How to Use Gherkin in AI Development and Modern Engineering Workflows

A practical research guide for using Gherkin, BDD, Spec-Driven Development, and eval-driven workflows to make AI-assisted delivery clearer, safer, and easier to review.

Updated Jun 9, 2026

TL;DR

  • Gherkin is useful in AI development because it turns vague product intent into concrete, reviewable examples.
  • A good Gherkin scenario gives an AI coding agent three things it badly needs: context, action, and observable success criteria.
  • Do not use Gherkin as a UI scripting language. Use it to describe business behavior in domain language.
  • Gherkin and GitHub Spec Kit solve different parts of the workflow: Spec Kit gives the agent a structured delivery process; Gherkin gives the process concrete acceptance examples.
  • You do need to install and initialize Spec Kit if you want /speckit.* commands. A .feature file by itself is just a readable plain-text specification unless you also wire it to Cucumber or another test runner.
  • The best workflow is not “write Gherkin, then let AI code everything.” It is:
    1. discover examples with product, engineering, and QA
    2. formulate the important examples as Gherkin
    3. ask the AI to implement against those examples
    4. automate tests at the right level
    5. keep the examples alive as documentation and regression protection
  • Gherkin fits naturally with modern methods like BDD, Specification by Example, ATDD, TDD, Spec-Driven Development, eval-driven AI development, and CI/CD.
  • The main risk is writing scenarios that are too vague, too technical, too long, or disconnected from real tests.

On June 9, 2026, I reviewed current sources from Cucumber, Agile Alliance, GitHub Spec Kit, Microsoft for Developers, OpenAI’s evals documentation, and current practitioner writing about Gherkin for AI-assisted coding. Source links are listed at the end.

What You Will Learn Here

You will learn:

  • why Gherkin is becoming more useful, not less useful, in AI-assisted development
  • how to write scenarios that help both humans and AI coding agents
  • where Gherkin fits inside BDD, Spec-Driven Development, TDD, and eval-driven AI workflows
  • how to integrate Gherkin with GitHub Spec Kit without confusing specifications, feature files, tests, and agent commands
  • how engineers and PMs can collaborate before prompting an AI to build
  • what good and bad AI-ready Gherkin looks like
  • how to convert a Gherkin scenario into implementation tasks, tests, and review checks

This article is written for engineers, PMs, QA engineers, engineering managers, and technical founders who are already using, or about to use, AI coding agents in real product work.

The Basic Idea

AI coding tools are very good at generating code.

They are much weaker when the team gives them ambiguous intent.

That is not really an AI problem. It is a requirements problem with a faster engine attached to it.

Gherkin helps because it creates a small contract:

Feature: Saved dashboard views

  Rule: A user can save a private dashboard view

    Scenario: Save a new dashboard view
      Given I am viewing a dashboard with active filters
      When I save the view as "Executive weekly"
      Then I should see "Executive weekly" in my saved views
      And the saved view should restore the same filters

That scenario is not just a test.

It is a compact agreement between product and engineering:

  • who the behavior is for
  • what state matters before the action
  • what action matters
  • what result must be observable
  • what an AI coding agent should preserve while implementing

In AI development, that agreement is gold.
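To make the "context, action, observable result" shape concrete, here is a minimal sketch that splits the scenario above into those three buckets. The names `ScenarioParts` and `split_scenario` are hypothetical, and this is not a full Gherkin parser: it only understands Given, When, Then, And, and But.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioParts:
    """The three things a scenario hands to a reader or an agent."""
    context: list[str] = field(default_factory=list)  # Given steps
    action: list[str] = field(default_factory=list)   # When steps
    outcome: list[str] = field(default_factory=list)  # Then steps


KEYWORDS = {"Given": "context", "When": "action", "Then": "outcome"}


def split_scenario(text: str) -> ScenarioParts:
    parts = ScenarioParts()
    current = None  # bucket that a following And/But step continues
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword in KEYWORDS:
            current = KEYWORDS[keyword]
        elif keyword not in ("And", "But") or current is None:
            continue  # skips Feature:, Rule:, Scenario:, and anything else
        getattr(parts, current).append(rest)
    return parts


SCENARIO = '''
Scenario: Save a new dashboard view
  Given I am viewing a dashboard with active filters
  When I save the view as "Executive weekly"
  Then I should see "Executive weekly" in my saved views
  And the saved view should restore the same filters
'''

parts = split_scenario(SCENARIO)
print(parts.context, parts.action, parts.outcome, sep="\n")
```

If any of the three buckets comes back empty, the scenario is probably missing something the agent will otherwise have to guess.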

Gherkin Is Not the Same Thing as BDD

This distinction matters.

BDD is the collaboration method.

Gherkin is one language teams can use to express examples from that collaboration.

Cucumber’s current BDD documentation describes BDD as a way for teams to close the gap between business and technical people by building shared understanding, working in small iterations, and producing system documentation that can be checked against behavior.

That is the important part: shared understanding first, automation second.

If a team skips the conversation and only asks an AI to generate .feature files, it is not really doing BDD. It is generating formatted guesses.

Why Gherkin Helps AI Development

AI agents need boundaries.

Without boundaries, they tend to answer questions like these silently, with plausible assumptions:

  • “Should this be private or shared?”
  • “Should duplicate names be allowed?”
  • “Should deleting a saved view delete the underlying dashboard?”
  • “Should this behavior be tested through the API, UI, or unit layer?”

Gherkin reduces those silent assumptions by turning behavior into examples.

Here is the practical difference:

vague prompt
  -> "Build saved dashboard views"
  -> AI guesses product rules
  -> reviewer finds mismatches late

gherkin-backed prompt
  -> examples define behavior
  -> AI implements against explicit scenarios
  -> reviewer checks code against agreed examples

The value is not that Gherkin is magical.

The value is that concrete examples are easier for humans and models to reason about than abstract requirements.

The Modern Workflow

Here is the flow I recommend for teams using AI coding tools:

product intent
  -> example mapping
  -> Gherkin scenarios
  -> AI implementation plan
  -> code + tests
  -> CI checks
  -> human review
  -> living documentation

In more operational terms:

PM / domain expert
  explains the user need

PM + engineer + QA
  discover rules, examples, and questions

engineer + QA
  formulate key examples in Gherkin

AI coding agent
  creates an implementation plan and code changes

test suite
  verifies behavior at the right level

reviewer
  checks whether the code still matches the examples

This workflow keeps the AI in the right role.

It is not the product owner.

It is not the source of truth.

It is a fast implementation partner working from a clearer contract.

Start With Example Mapping

Before writing formal Gherkin, run a short example mapping session.

The Cucumber team describes example mapping as a lightweight way to capture:

  • the story
  • the rules
  • examples for each rule
  • unresolved questions
  • follow-up stories that should be sliced out

For AI development, this is especially useful because unresolved questions are exactly where AI tools tend to invent answers.

Example:

Story:
  Users can save dashboard views.

Rules:
  - A saved view belongs to the user who created it.
  - A saved view restores filters, columns, and sort order.
  - Saved view names must be unique per user.

Examples:
  - The one where Ana saves a dashboard with two filters.
  - The one where Ana tries to reuse an existing saved view name.
  - The one where Ben cannot see Ana's private saved view.

Questions:
  - Should admins see users' private saved views?
  - Is there a maximum number of saved views per user?
  - Do saved views include relative dates like "last 7 days"?

Notice the key point: you do not need perfect Gherkin yet.

You need the team to find the rules and questions before the AI starts building.
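One way to keep that discipline is to store the example map as data and refuse to hand it to the agent while questions are still open. The sketch below is only an illustration: `ExampleMap`, `open_questions`, and `ready_for_agent` are hypothetical names, not part of Cucumber or any other tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExampleMap:
    story: str
    rules: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)
    # question text -> decision; None means nobody has answered it yet
    questions: dict[str, Optional[str]] = field(default_factory=dict)

    def open_questions(self) -> list[str]:
        return [q for q, decision in self.questions.items() if decision is None]

    def ready_for_agent(self) -> bool:
        """Ready only with rules, examples, and no unanswered questions."""
        return bool(self.rules) and bool(self.examples) and not self.open_questions()


saved_views = ExampleMap(
    story="Users can save dashboard views.",
    rules=[
        "A saved view belongs to the user who created it.",
        "Saved view names must be unique per user.",
    ],
    examples=[
        "The one where Ana saves a dashboard with two filters.",
        "The one where Ben cannot see Ana's private saved view.",
    ],
    questions={
        "Should admins see users' private saved views?": None,
        "Is there a maximum number of saved views per user?": None,
    },
)

print(saved_views.ready_for_agent())  # False while questions are open
```

A recorded decision can be an answer or an explicit "out of scope for now". Either one closes the question; silence does not.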

Then Write AI-Ready Gherkin

Good AI-ready Gherkin is:

  • concrete
  • short
  • observable
  • written in product language
  • focused on one behavior
  • independent from implementation details

Gherkin that is not AI-ready is:

  • vague
  • overloaded
  • written like a click-by-click UI script
  • full of database details
  • trying to cover five behaviors in one scenario

Compare these two versions.

Weak Scenario

Scenario: User saves dashboard
  Given the user is on the page
  When the user clicks save
  Then it works

The AI can generate almost anything from this.

The reviewer cannot tell what “works” means.

Better Scenario

Scenario: Save a dashboard view with active filters
  Given I am viewing the revenue dashboard filtered by region "LATAM"
  And the table is sorted by "Closed revenue" descending
  When I save the view as "LATAM revenue"
  Then "LATAM revenue" should appear in my saved views
  And opening "LATAM revenue" should restore the region filter
  And the table should still be sorted by "Closed revenue" descending

This gives the AI a real contract.

It still does not dictate whether the implementation uses React state, a SQL table, an API endpoint, or a cache. Those choices belong in the technical plan.
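The difference between the two versions can be caught mechanically, at least roughly. Here is a small lint sketch that flags vague or click-level wording. The patterns in `VAGUE_PATTERNS` are my own assumptions for illustration, so tune them to your product language before trusting the output.

```python
import re

# Hypothetical heuristics, not an official rule set.
VAGUE_PATTERNS = {
    r"\bit works\b": "outcome is not observable",
    r"\bon the page\b": "context does not name a product state",
    r"\bclicks?\b": "step describes a UI gesture, not a behavior",
}


def lint_steps(steps):
    """Return (step, reason) pairs for steps that match a vague pattern."""
    findings = []
    for step in steps:
        for pattern, reason in VAGUE_PATTERNS.items():
            if re.search(pattern, step, flags=re.IGNORECASE):
                findings.append((step, reason))
    return findings


WEAK = ["the user is on the page", "the user clicks save", "it works"]
BETTER = [
    'I am viewing the revenue dashboard filtered by region "LATAM"',
    'the table is sorted by "Closed revenue" descending',
    'I save the view as "LATAM revenue"',
    '"LATAM revenue" should appear in my saved views',
    'opening "LATAM revenue" should restore the region filter',
    'the table should still be sorted by "Closed revenue" descending',
]

for step, reason in lint_steps(WEAK):
    print(f"{step!r}: {reason}")
```

A check like this will not tell you whether a scenario is right. It only catches the wording that most often means nobody decided what "right" is.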

Prompt Pattern: Give the AI the Scenario and the Rules

Once the team agrees on the behavior, prompt the AI with the scenario plus constraints.

You are implementing the saved dashboard views feature.

Use these Gherkin scenarios as the acceptance contract:

Feature: Saved dashboard views

  Rule: Saved views are private per user

    Scenario: User saves a private dashboard view
      Given Ana is viewing the revenue dashboard filtered by region "LATAM"
      When Ana saves the view as "LATAM revenue"
      Then Ana should see "LATAM revenue" in her saved views
      And Ben should not see "LATAM revenue" in his saved views

Implementation instructions:
- First inspect the existing dashboard, auth, and persistence patterns.
- Propose a short plan before editing files.
- Add tests that map to each scenario.
- Prefer API or integration tests for business rules.
- Keep UI tests focused on the user-visible flow.
- Do not introduce a new persistence abstraction unless the repo already uses that pattern.

This prompt does four useful things:

  • it gives the AI behavioral examples
  • it separates behavior from implementation instructions
  • it tells the AI to inspect existing patterns
  • it makes tests part of the task, not an afterthought
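If your team sends this kind of prompt often, it can help to assemble it from parts so the behavior and the implementation instructions never blur together. This is a minimal sketch; `build_prompt` is a hypothetical helper, and the wording simply mirrors the prompt above.

```python
def build_prompt(feature_name, gherkin, instructions):
    """Keep the acceptance contract and the implementation rules in separate sections."""
    lines = [
        f"You are implementing the {feature_name} feature.",
        "",
        "Use these Gherkin scenarios as the acceptance contract:",
        "",
        gherkin.strip(),
        "",
        "Implementation instructions:",
    ]
    lines += [f"- {item}" for item in instructions]
    return "\n".join(lines)


GHERKIN = '''
Feature: Saved dashboard views

  Rule: Saved views are private per user

    Scenario: User saves a private dashboard view
      Given Ana is viewing the revenue dashboard filtered by region "LATAM"
      When Ana saves the view as "LATAM revenue"
      Then Ana should see "LATAM revenue" in her saved views
      And Ben should not see "LATAM revenue" in his saved views
'''

prompt = build_prompt(
    "saved dashboard views",
    GHERKIN,
    [
        "First inspect the existing dashboard, auth, and persistence patterns.",
        "Propose a short plan before editing files.",
        "Add tests that map to each scenario.",
    ],
)
print(prompt)
```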

Where Gherkin Fits With Other Modern Methodologies

Gherkin is most useful when it is part of a larger delivery system.

It should not replace all other planning and testing practices.

BDD

BDD uses conversation, examples, and automation to align product and engineering.

Gherkin is a strong formulation tool for BDD because it is readable by PMs and executable by test frameworks.

Use it when behavior needs shared understanding across roles.

Specification by Example

Specification by Example is the broader idea of describing requirements through concrete examples instead of only abstract statements.

Gherkin is one common syntax for those examples.

Use it when the domain has rules, exceptions, and edge cases.

ATDD

Acceptance Test-Driven Development starts with acceptance criteria before implementation.

Gherkin fits well because each scenario can become an acceptance test or a review checklist.

Use it when “done” must be explicit before coding starts.

TDD

TDD works at a lower level.

A Gherkin scenario might say:

Then the saved view should restore the same filters

The implementation might require several unit tests for:

  • serializing filter state
  • validating filter names
  • loading a user’s saved view
  • rejecting access from another user

Gherkin describes behavior. TDD helps design the internal code.
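As an illustration of that lower level, here is what two unit tests behind "restore the same filters" might look like. The JSON storage format and the function names are assumptions made for this sketch; the scenario itself says nothing about how filter state is stored.

```python
import json


def serialize_filters(filters: dict) -> str:
    # Assumption: filter state is stored as a JSON object with stable key order.
    return json.dumps(filters, sort_keys=True)


def deserialize_filters(payload: str) -> dict:
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("saved filter state must be a JSON object")
    return data


def test_round_trip_restores_same_filters():
    filters = {"region": "LATAM", "period": "last 7 days"}
    assert deserialize_filters(serialize_filters(filters)) == filters


def test_rejects_non_object_payload():
    try:
        deserialize_filters("[]")
    except ValueError:
        return
    raise AssertionError("expected ValueError for a non-object payload")


test_round_trip_restores_same_filters()
test_rejects_non_object_payload()
```

Neither test mentions Ana, dashboards, or saved views. That is the point: the scenario stays in product language while the unit tests pin down the internals.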

Spec-Driven Development

GitHub Spec Kit popularized a modern AI-friendly version of spec-driven delivery: define what you want, clarify it, plan it, generate tasks, then implement.

Gherkin fits inside that workflow as the acceptance-example layer.

Spec Kit style workflow
  -> feature spec
  -> clarification
  -> plan
  -> tasks
  -> implementation

Gherkin contribution
  -> concrete examples
  -> business rules
  -> acceptance checks
  -> regression scenarios

In other words, Spec-Driven Development gives the delivery structure. Gherkin gives behavior-level precision.

How to Integrate Gherkin With GitHub Spec Kit

This is the part that can feel confusing at first, so let’s separate the pieces.

Spec Kit is a workflow tool.

It gives your AI coding agent commands and templates for moving through the delivery process:

constitution
  -> specify
  -> clarify
  -> plan
  -> tasks
  -> analyze
  -> implement

Gherkin is a behavior format.

It gives your team a readable way to express examples:

Feature
  -> Rule
  -> Scenario
  -> Given / When / Then

They work best together when Spec Kit owns the delivery flow and Gherkin owns the acceptance examples.

Do You Need to Install Spec Kit?

Yes, if you want the actual Spec Kit workflow.

According to the current Spec Kit README, the standard path is:

uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@vX.Y.Z
specify init my-project --integration copilot
cd my-project

The exact integration can change depending on your agent. The README also says most agents expose commands as /speckit.*, while some integrations expose agent skills instead of slash-command prompt files.

After initialization, your agent can use commands such as:

/speckit.constitution
/speckit.specify
/speckit.clarify
/speckit.plan
/speckit.tasks
/speckit.analyze
/speckit.implement

So the short answer is:

Want Spec Kit commands and generated artifacts?
  -> install and initialize Spec Kit.

Only want to write acceptance examples in Gherkin?
  -> no Spec Kit install required.

Is a Gherkin File Just a Plain-Text File?

It depends how you use it.

A .feature file can be three different things:

| Use | Requires install? | What it does |
| --- | --- | --- |
| Plain acceptance spec | No | Humans and AI agents read it as requirements |
| Spec Kit input/context | Spec Kit yes, Cucumber no | The agent uses it while specifying, planning, and implementing |
| Executable BDD test | Cucumber or equivalent yes | A test runner maps steps to code and verifies behavior |

This distinction is important.

If you create this file:

features/saved-dashboard-views.feature

but you do not install Cucumber, Playwright-BDD, Behave, SpecFlow, pytest-bdd, or another runner, nothing will execute it automatically.

That is still useful. It can be a shared acceptance contract.

But it is not an automated test yet.

To make it executable, you need a test framework and step definitions:

features/saved-dashboard-views.feature
  -> step definitions
  -> app/test driver
  -> assertions
  -> CI result

For AI-assisted delivery, I usually recommend starting with the .feature file as a readable contract, then automating only the scenarios that deserve regression coverage.
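If "step definitions" sounds abstract, the idea is simply pattern matching: each step line is matched to a function that drives the app or asserts on it. The toy registry below shows the mechanism in plain Python. It is not the Cucumber, Behave, or pytest-bdd API, and the `step` decorator, `run_step`, and the in-memory `world` are all hypothetical.

```python
import re

STEP_DEFINITIONS = []  # (compiled pattern, handler) pairs


def step(pattern):
    """Register a handler for step text matching the pattern."""
    def register(handler):
        STEP_DEFINITIONS.append((re.compile(pattern), handler))
        return handler
    return register


def run_step(text, world):
    for pattern, handler in STEP_DEFINITIONS:
        match = pattern.fullmatch(text)
        if match:
            handler(world, *match.groups())
            return
    raise LookupError(f"no step definition for: {text}")


@step(r'(\w+) already has a saved view named "(.+)"')
def existing_view(world, user, name):
    world["views"].setdefault(user, set()).add(name)


@step(r'(\w+) saves another view as "(.+)"')
def save_again(world, user, name):
    # Toy app logic: a duplicate name for the same user is rejected.
    world["rejected"] = name in world["views"].get(user, set())


@step(r"(\w+) should be asked to choose a different name")
def asked_for_new_name(world, user):
    assert world["rejected"], f"{user} was not asked for a different name"


world = {"views": {}, "rejected": False}
for line in (
    'Ana already has a saved view named "LATAM revenue"',
    'Ana saves another view as "LATAM revenue"',
    "Ana should be asked to choose a different name",
):
    run_step(line, world)
```

A real runner adds keyword handling, reporting, and hooks, but the contract is the same: a step with no matching definition fails loudly instead of passing silently.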

A Practical Spec Kit + Gherkin Flow

Here is a clean flow for a real team.

1. Initialize Spec Kit
   -> install specify-cli
   -> run specify init with your agent integration

2. Establish principles
   -> /speckit.constitution
   -> define quality, testing, security, UX, and review standards

3. Define the feature
   -> /speckit.specify
   -> describe what and why
   -> include Gherkin scenarios as acceptance examples

4. Clarify uncertainty
   -> /speckit.clarify
   -> turn vague scenario outcomes into decisions or out-of-scope notes

5. Plan implementation
   -> /speckit.plan
   -> choose architecture, data model, API shape, UI approach, and test strategy

6. Generate tasks
   -> /speckit.tasks
   -> ensure each important scenario maps to a task, test, eval, or manual check

7. Analyze consistency
   -> /speckit.analyze
   -> catch gaps between spec, plan, tasks, and acceptance examples

8. Implement
   -> /speckit.implement
   -> build against the spec and scenarios

The key is to bring Gherkin into /speckit.specify and keep it alive through /speckit.tasks and /speckit.analyze.

What to Put in the Spec Kit Feature Spec

Inside the Spec Kit-generated feature spec, I would include a section like this:

## Acceptance Examples

```gherkin
Feature: Saved dashboard views

  Rule: Saved views are private per user

    Scenario: User saves a private dashboard view
      Given Ana is viewing the revenue dashboard filtered by region "LATAM"
      When Ana saves the view as "LATAM revenue"
      Then Ana should see "LATAM revenue" in her saved views
      And Ben should not see "LATAM revenue" in his saved views

  Rule: Saved view names are unique per user

    Scenario: User tries to reuse a saved view name
      Given Ana already has a saved view named "LATAM revenue"
      When Ana saves another view as "LATAM revenue"
      Then Ana should be asked to choose a different name
```

Then add a traceability table:

## Acceptance Traceability

| Scenario | Verification |
| --- | --- |
| User saves a private dashboard view | API integration test + one E2E happy path |
| User tries to reuse a saved view name | API integration test |
| Different users use the same saved view name | API integration test |
| Admin visibility for private views | Out of scope until product decision |

That table is very useful for AI coding agents because it tells the agent where proof should live.
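The same traceability can be checked by a script in CI. Below is a minimal sketch that extracts scenario names from a feature file and reports any that have no verification entry. The function names are hypothetical, and the regular expression only handles `Scenario:` and `Scenario Outline:` lines.

```python
import re


def scenario_names(feature_text):
    """Return the titles of every Scenario / Scenario Outline in the text."""
    return re.findall(
        r"^\s*Scenario(?: Outline)?:\s*(.+?)\s*$", feature_text, flags=re.MULTILINE
    )


def untraced(feature_text, traceability):
    """Scenarios with no verification recorded in the traceability mapping."""
    return [
        name
        for name in scenario_names(feature_text)
        if not traceability.get(name, "").strip()
    ]


FEATURE = '''
Feature: Saved dashboard views

  Rule: Saved views are private per user

    Scenario: User saves a private dashboard view
      Given Ana is viewing the revenue dashboard filtered by region "LATAM"

  Rule: Saved view names are unique per user

    Scenario: User tries to reuse a saved view name
      Given Ana already has a saved view named "LATAM revenue"
'''

TRACEABILITY = {
    "User saves a private dashboard view": "API integration test + one E2E happy path",
}

print(untraced(FEATURE, TRACEABILITY))
```

Here the second scenario is reported because nobody said where its proof lives. I would fail the build on a non-empty result only after the team has agreed which scenarios count as accepted.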

Prompt Example for /speckit.specify

Use a prompt like this:

/speckit.specify

Build saved dashboard views for authenticated users.

Goal:
Users should be able to save useful dashboard filter setups and return to them later.

Business rules:
- Saved views are private per user.
- Saved view names must be unique per user.
- Saved views restore filters, visible columns, and sort order.
- Deleting a saved view must not delete the underlying dashboard.

Acceptance examples:

Feature: Saved dashboard views

  Rule: Saved views are private per user

    Scenario: User saves a private dashboard view
      Given Ana is viewing the revenue dashboard filtered by region "LATAM"
      When Ana saves the view as "LATAM revenue"
      Then Ana should see "LATAM revenue" in her saved views
      And Ben should not see "LATAM revenue" in his saved views

  Rule: Saved view names are unique per user

    Scenario: User tries to reuse a saved view name
      Given Ana already has a saved view named "LATAM revenue"
      When Ana saves another view as "LATAM revenue"
      Then Ana should be asked to choose a different name

Open questions:
- Should admins see users' private saved views?
- Is there a maximum number of saved views per user?
- Do relative dates like "last 7 days" resolve dynamically when reopened?

Do not answer open questions by guessing. Mark them for clarification or explicitly out of scope.

This gives Spec Kit enough product intent to create the feature spec while preserving the behavior examples the team already agreed on.

Can Spec Kit Auto-Generate Gherkin Scenarios?

Yes, but with an important caveat.

Spec Kit does not need a special “Gherkin generator” to do this. During /speckit.specify or /speckit.clarify, the AI agent can draft Gherkin scenarios from the product brief, business rules, examples, and open questions you provide.

That means this is possible:

product brief
  -> /speckit.specify
  -> generated feature spec
  -> generated draft Gherkin scenarios
  -> human review
  -> clarified acceptance scenarios

But generated scenarios should be treated as draft acceptance examples, not truth.

The AI can infer useful cases, but it can also invent product rules that sound reasonable and are completely wrong.

A good prompt is:

/speckit.specify

Create the feature spec and draft Gherkin acceptance scenarios.

Rules:
- Generate scenarios only from the provided business rules and examples.
- If a scenario requires an unstated product rule, put it under "Candidate scenarios needing clarification."
- Separate must-have scenarios from edge cases.
- Do not invent limits, roles, permissions, data retention rules, or admin behavior.
- Add open questions for anything ambiguous.

Output sections:
1. Feature summary
2. Business rules
3. Draft Gherkin scenarios
4. Candidate scenarios needing clarification
5. Open questions

Then use /speckit.clarify to harden the generated scenarios:

/speckit.clarify

Review the draft Gherkin scenarios.

For each scenario:
- identify any hidden assumption
- ask a clarification question if the behavior is not explicit
- mark the scenario as accepted, revised, split, or removed
- move speculative behavior out of acceptance criteria

This creates a healthier flow:

AI drafts scenarios
  -> humans reject bad assumptions
  -> clarify missing rules
  -> accepted scenarios become the contract
  -> plan and tasks trace back to accepted scenarios

The rule I would use is simple:

AI-generated Gherkin is allowed.
AI-approved Gherkin is not.

Someone with product or domain authority must review the examples before implementation treats them as acceptance criteria.

Use this checklist before accepting generated scenarios:

[ ] Does each scenario come from a known business rule?
[ ] Does each scenario have one clear When action?
[ ] Are expected results observable by a user, API, test, or eval?
[ ] Did the AI invent roles, limits, permissions, defaults, or error states?
[ ] Are uncertain scenarios separated from accepted scenarios?
[ ] Can the team map each accepted scenario to a test, eval, or manual check?

So yes, auto-generation is useful.

Just keep a review gate between generated scenarios and implementation.

Prompt Example for /speckit.plan

After clarification, use the Gherkin scenarios to shape the technical plan:

/speckit.plan

Plan the implementation using the existing stack and repository conventions.

Testing strategy:
- Map each acceptance scenario to a verification method.
- Use API or integration tests for privacy and uniqueness rules.
- Use one E2E test for the happy path of saving and reopening a view.
- Do not automate unresolved questions.

Traceability requirement:
The plan should include a table mapping each Gherkin scenario to:
- implementation area
- test level
- file or test location
- owner of unresolved product decisions

This prevents the plan from becoming a generic technical design.

It keeps the plan anchored to behavior.

Prompt Example for /speckit.tasks

Task generation is where many AI workflows lose the plot.

Ask for scenario-linked tasks:

/speckit.tasks

Generate implementation tasks from the plan.

Requirements:
- Group tasks by acceptance scenario when possible.
- Include tests beside the implementation tasks they verify.
- Add a final traceability task that checks every Gherkin scenario has proof.
- Keep unresolved questions out of implementation tasks unless clarified.

The output should feel like this:

Scenario: User saves a private dashboard view
  [ ] Add persistence model scoped by user ID
  [ ] Add create saved view API endpoint
  [ ] Add saved views list API endpoint filtered by current user
  [ ] Add integration test proving Ben cannot see Ana's saved view
  [ ] Add UI flow for save and reopen
  [ ] Add E2E happy-path test

Scenario: User tries to reuse a saved view name
  [ ] Add uniqueness constraint per user
  [ ] Return validation error for duplicate names
  [ ] Add integration test for duplicate name rejection

Now the AI is not just generating a task list.

It is generating a task list tied to behavior.
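Here is a sketch of what the proof behind those tasks could look like, using an in-memory store in place of a real database and API. `SavedViewStore` and `DuplicateViewName` are hypothetical names invented for this example; your repo's persistence layer will look different.

```python
class DuplicateViewName(Exception):
    """Raised when a user reuses one of their own saved view names."""


class SavedViewStore:
    def __init__(self):
        self._views = {}  # (user, name) -> filters

    def save(self, user, name, filters):
        key = (user, name)
        if key in self._views:
            raise DuplicateViewName(name)
        self._views[key] = dict(filters)

    def list_names(self, user):
        # Only the current user's views are ever returned.
        return sorted(name for owner, name in self._views if owner == user)


def test_saved_view_is_private():
    store = SavedViewStore()
    store.save("ana", "LATAM revenue", {"region": "LATAM"})
    assert store.list_names("ana") == ["LATAM revenue"]
    assert store.list_names("ben") == []


def test_duplicate_name_rejected_for_same_user():
    store = SavedViewStore()
    store.save("ana", "LATAM revenue", {"region": "LATAM"})
    try:
        store.save("ana", "LATAM revenue", {"region": "EMEA"})
    except DuplicateViewName:
        return
    raise AssertionError("expected DuplicateViewName")


def test_same_name_allowed_for_different_users():
    store = SavedViewStore()
    store.save("ana", "LATAM revenue", {"region": "LATAM"})
    store.save("ben", "LATAM revenue", {"region": "LATAM"})
    assert store.list_names("ben") == ["LATAM revenue"]


for test in (
    test_saved_view_is_private,
    test_duplicate_name_rejected_for_same_user,
    test_same_name_allowed_for_different_users,
):
    test()
```

Each test name reads like the scenario it proves, which makes the final traceability task almost mechanical.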

Where the Files Should Live

A practical repo layout can look like this:

.specify/
  memory/
    constitution.md

specs/
  001-saved-dashboard-views/
    spec.md
    plan.md
    tasks.md

features/
  saved-dashboard-views.feature

tests/
  integration/
    saved-dashboard-views.test.ts
  e2e/
    saved-dashboard-views.spec.ts

Use specs/ for the Spec Kit artifacts.

Use features/ for durable Gherkin scenarios if your team wants them visible outside the generated spec.

Use tests/ for executable proof.

That separation avoids a common confusion:

spec.md
  explains the feature

.feature
  expresses examples

test files
  execute verification

My Recommendation

Start simple:

Week 1:
  Use Gherkin inside Spec Kit specs as acceptance examples.
  Do not install Cucumber yet.

Week 2:
  Require scenario-to-test traceability in /speckit.tasks.
  Automate key scenarios with normal test tools.

Week 3+:
  If the team likes executable Gherkin, add Cucumber or a BDD runner.
  If not, keep Gherkin as readable acceptance specs and automate behavior at API/E2E levels.

This is the pragmatic path.

Do not start by installing every BDD tool.

Start by making the AI build from better examples.

Eval-Driven AI Development

When the product itself includes LLM behavior, Gherkin can also inspire eval cases.

OpenAI’s evals documentation describes evals as structured tests for checking whether AI outputs meet style and content criteria, and it explicitly compares the process to BDD: specify expected behavior, run test inputs, analyze results, and iterate.

For example:

Feature: Support ticket classifier

  Rule: Billing questions are routed to the billing queue

    Scenario: Customer asks about an invoice charge
      Given a support ticket says "Why was I charged twice this month?"
      When the AI classifies the ticket
      Then the category should be "Billing"
      And the confidence should be high enough for automatic routing

That scenario may not become a Cucumber browser test.

It may become an eval dataset row:

{
  "input": "Why was I charged twice this month?",
  "expected_category": "Billing",
  "minimum_confidence": 0.8
}

The principle is the same: define expected behavior before trusting automation.
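To show how a row like that gets checked, here is a minimal eval loop. The `classify_ticket` function is a hypothetical stand-in for the real model call, and the row fields match the JSON above. This is not the OpenAI Evals API, just the smallest loop that demonstrates the idea.

```python
import json


def classify_ticket(text):
    """Hypothetical stand-in for the model; returns (category, confidence)."""
    lowered = text.lower()
    if "charged" in lowered or "invoice" in lowered:
        return "Billing", 0.93
    return "General", 0.40


def run_evals(jsonl, classify):
    """Return (input, passed) for each non-empty JSONL row."""
    results = []
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        category, confidence = classify(row["input"])
        passed = (
            category == row["expected_category"]
            and confidence >= row["minimum_confidence"]
        )
        results.append((row["input"], passed))
    return results


DATASET = (
    '{"input": "Why was I charged twice this month?", '
    '"expected_category": "Billing", "minimum_confidence": 0.8}'
)

for text, passed in run_evals(DATASET, classify_ticket):
    print("PASS" if passed else "FAIL", text)
```

In a real setup the classifier would call your model and you would track the pass rate across many rows, since a single row says little about probabilistic behavior.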

How to Choose the Right Test Level

Not every Gherkin scenario should become an end-to-end test.

That is one of the easiest ways to create a slow, fragile suite.

Use this heuristic:

| Behavior | Best test level |
| --- | --- |
| Pure calculation or transformation | Unit test |
| Authorization or workflow rule | Integration/API test |
| Critical user journey | E2E test |
| LLM output quality | Eval |
| Visual layout | Visual regression or component test |
| Cross-service contract | Contract test |

The Gherkin scenario is the business-facing example.

The automation can live at the level that gives the best signal for the lowest maintenance cost.

A Practical Repository Pattern

For an AI-assisted team, I like this structure:

features/
  saved-dashboard-views.feature

docs/specs/
  saved-dashboard-views.md

tests/
  integration/
    saved-dashboard-views.test.ts
  e2e/
    saved-dashboard-views.spec.ts

evals/
  support-ticket-routing.jsonl

The .feature file explains behavior.

The spec explains scope, decisions, and open questions.

The tests verify deterministic software behavior.

The evals verify probabilistic AI behavior.

This gives both humans and AI agents a better map of the system.

Review Checklist for AI-Generated Code

When a PR was built from Gherkin scenarios, review it against the examples.

[ ] Does each scenario map to at least one test, eval, or explicit manual check?
[ ] Did the AI preserve the business language from the scenario?
[ ] Are the tests checking observable outcomes instead of implementation details?
[ ] Did the implementation add behavior not covered by the examples?
[ ] Did the AI invent answers to unresolved product questions?
[ ] Are edge cases represented as separate scenarios instead of hidden in one big scenario?
[ ] Can a PM or QA read the examples and understand the behavior?

The most dangerous failure is not bad syntax.

The most dangerous failure is when the AI builds a reasonable feature that is not the feature the team agreed to build.

Common Anti-Patterns

Anti-Pattern 1: UI Script Gherkin

Scenario: Save view
  Given I click the dashboard menu
  And I click the three dots
  And I click the input
  And I type "LATAM revenue"
  When I click the blue save button
  Then I see a toast

This is brittle.

It tells the AI too much about the interface and too little about the business behavior.

Prefer:

Scenario: Save a named dashboard view
  Given I am viewing the revenue dashboard filtered by region "LATAM"
  When I save the view as "LATAM revenue"
  Then "LATAM revenue" should appear in my saved views

Anti-Pattern 2: One Giant Scenario

If a scenario has multiple When steps and multiple outcomes, it probably contains multiple behaviors.

Split it.

one rule
  -> one behavior
  -> one scenario
  -> one clear expected result
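This one is easy to detect automatically. The sketch below counts When steps per scenario, treating an And or But that follows a When as another action. `when_counts` is a hypothetical helper, and it deliberately ignores Background sections and Scenario Outlines.

```python
def when_counts(feature_text):
    """Map each scenario title to the number of action (When) steps it contains."""
    counts = {}
    current = None   # title of the scenario being read
    previous = None  # last primary keyword seen: Given, When, or Then
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith("Scenario:"):
            current = line[len("Scenario:"):].strip()
            counts[current] = 0
            previous = None
        elif current is not None:
            keyword = line.split(" ", 1)[0]
            if keyword in ("Given", "When", "Then"):
                previous = keyword
            if keyword == "When" or (keyword in ("And", "But") and previous == "When"):
                counts[current] += 1
    return counts


FEATURE = '''
Scenario: Save and delete a view
  Given Ana is viewing the revenue dashboard
  When Ana saves the view as "LATAM revenue"
  Then "LATAM revenue" should appear in her saved views
  When Ana deletes "LATAM revenue"
  Then the revenue dashboard should still exist

Scenario: Save a named dashboard view
  Given I am viewing the revenue dashboard filtered by region "LATAM"
  When I save the view as "LATAM revenue"
  Then "LATAM revenue" should appear in my saved views
'''

oversized = [name for name, count in when_counts(FEATURE).items() if count > 1]
print(oversized)
```

Anything on that list is a candidate for splitting, here into one scenario for saving and one for deleting.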

Anti-Pattern 3: AI-Generated Placeholder Examples

AI tools love filler:

Scenario: Valid input
Scenario: Invalid input
Scenario: Edge case

Push for real examples:

Scenario: Reject a duplicate saved view name for the same user
Scenario: Allow the same saved view name for different users
Scenario: Restore a relative date filter when opening a saved view tomorrow

Anti-Pattern 4: Treating Gherkin as Final Truth

Gherkin scenarios are only useful if they reflect current product decisions.

If the implementation changes behavior, update the examples.

If the examples expose a bad product rule, change the product rule.

Living documentation must keep living.

A Small End-to-End Example

Imagine a PM writes this request:

Users should be able to save dashboard views so they can return to useful filter setups later.

After example mapping, the team identifies:

Rules:
  - saved views are private per user
  - names must be unique per user
  - saved views restore filters and sort order
  - deleting a view does not delete the dashboard

Questions:
  - should saved views be shareable later?
  - should admins see private views?

The team writes:

Feature: Saved dashboard views

  Rule: Saved views are private per user

    Scenario: User saves a private dashboard view
      Given Ana is viewing the revenue dashboard filtered by region "LATAM"
      When Ana saves the view as "LATAM revenue"
      Then Ana should see "LATAM revenue" in her saved views
      And Ben should not see "LATAM revenue" in his saved views

  Rule: Saved view names are unique per user

    Scenario: User tries to reuse a saved view name
      Given Ana already has a saved view named "LATAM revenue"
      When Ana saves another view as "LATAM revenue"
      Then Ana should be asked to choose a different name

    Scenario: Different users use the same saved view name
      Given Ana has a saved view named "LATAM revenue"
      When Ben saves a view as "LATAM revenue"
      Then Ben's saved view should be created

Then the engineer asks the AI:

Implement the feature described in this Gherkin.

Before editing:
- inspect current dashboard state management
- inspect auth/user scoping patterns
- inspect existing API test patterns
- propose a file-level plan

During implementation:
- add API or integration tests for privacy and duplicate-name rules
- add one E2E test for the happy path
- do not implement sharing or admin visibility; those are unresolved future questions

That last line matters.

It prevents the AI from turning open questions into accidental scope.

How PMs Can Use This Without Becoming Test Engineers

PMs do not need to write perfect Gherkin.

They need to help provide:

  • realistic examples
  • business rules
  • exception cases
  • words users would recognize
  • answers to product questions

A PM can start with this:

The one where Ana saves a LATAM revenue view.
The one where Ana tries to reuse the same name.
The one where Ben cannot see Ana's view.
The one where Ana deletes the saved view but the dashboard remains.

Then engineering and QA can turn the important examples into formal Gherkin.

That is a healthy division of labor.

How Engineers Can Use This Without Slowing Down

Engineers do not need to turn every ticket into ceremony.

Use Gherkin when:

  • the feature has product rules
  • multiple roles need to agree on behavior
  • the AI agent might make dangerous assumptions
  • the workflow spans several components
  • regressions would be expensive
  • the feature includes LLM behavior that needs evals

Skip formal Gherkin when:

  • the change is purely internal refactoring
  • the behavior is already covered by clear tests
  • the task is a small mechanical fix
  • the cost of writing scenarios is higher than the ambiguity risk

The goal is better delivery, not ritual.

For teams using AI coding agents, I would make this lightweight agreement:

For every non-trivial user-facing feature:

1. Capture rules, examples, and open questions before implementation.
2. Write Gherkin for the scenarios that define done.
3. Mark unresolved questions as out of scope until answered.
4. Give the AI the scenarios as the acceptance contract.
5. Require tests, evals, or manual checks that map back to the scenarios.
6. Review the PR against the scenarios, not only against the diff.

This is small enough to use.

It is also strong enough to prevent many AI-assisted delivery mistakes.

Existing Gaps and Where This Practice Can Improve

The main gap in today’s AI development conversation is that teams talk a lot about models and tools, but not enough about behavioral contracts.

Gherkin can help, but there are still open questions:

  • How should teams keep .feature files, specs, tests, and eval datasets synchronized over time?
  • Which scenarios deserve E2E automation, and which should stay as API tests, unit tests, or manual checks?
  • How should AI agents report traceability from scenario to code to test?
  • Can coding agents reliably detect when a scenario is too vague or too implementation-specific?
  • How should PMs review AI-generated Gherkin without getting pulled into tool syntax?

Those are good future sections for a deeper follow-up article.

My Practical Recommendation

Use Gherkin as the behavioral layer of your AI development workflow.

Do not make it the whole workflow.

The strongest pattern is:

BDD / Example Mapping
  gives shared understanding

Gherkin
  captures concrete behavior

Spec-Driven Development
  organizes the implementation path

TDD / integration tests / E2E tests
  verify deterministic software behavior

Evals
  verify probabilistic AI behavior

CI/CD
  keeps the whole thing honest

If you do only one thing, do this:

Before asking an AI agent to build a feature, write three examples of how the feature should behave and one example of what it must not do.

That small habit will improve your prompts, your tests, your reviews, and your product conversations.

Source List