Modern Agent Engineering

The Evolution of AI Agent Orchestration: System Prompts, Skills, MCP, and Plugins

How AI-assisted engineering workflows mature from a simple system prompt into skills, MCP tools, and full plugins — with real engineering examples at every stage.

19 min read · Updated Mar 24, 2026

Every team I’ve seen adopt AI coding agents follows the same invisible arc. They start with a system prompt — a few lines of context pasted into a chat box. Then the prompt grows, gets messy, and someone extracts it into a reusable skill. Then the skill needs real data, so they wire up an MCP server. Then someone packages the whole thing as a plugin, and suddenly it’s infrastructure.

This progression isn’t accidental. It’s a natural maturity curve that mirrors how any engineering tool matures: from script to library to service. Understanding it ahead of time lets you move through the stages intentionally instead of reactively.

The Four Layers

At any point in this evolution, you’re working within one of four layers:

  1. System Prompt — context injected before every conversation
  2. Skills and Commands — reusable, invocable workflows: commands as prompt templates, skills as intelligent auto-discoverable units
  3. MCP Tools — structured access to external systems over a standard protocol
  4. Plugins / Extensions — packaged bundles that combine all three and ship as shareable units

Each layer builds on the one below it. You rarely skip ahead — a plugin that skips directly to tool calls without grounding instructions will hallucinate. A skill without MCP access can’t reach your actual data. The layers are cumulative.

Layer 1: The System Prompt

System prompts are the zero-to-one moment for any AI workflow. They’re where you give the model its persona, its constraints, and the domain knowledge it needs to be useful in your context.

What it looks like in Claude Code

Claude Code reads CLAUDE.md files at three levels — project (./CLAUDE.md), user (~/.claude/CLAUDE.md), and enterprise policy — merging them into a single effective system prompt before every session.

A typical CLAUDE.md for a backend team looks like this:

# CLAUDE.md

## Stack
- Node.js 22 + TypeScript strict mode
- PostgreSQL 16 via Prisma ORM
- Jest for tests, ESLint + Prettier for style

## Conventions
- All new functions need JSDoc with @param and @returns
- Database queries go through service layer — never in controllers
- Run `npm test` before committing, fix any failures

## Branching
- Feature branches: feature/<jira-id>-<short-desc>
- Never push directly to main or develop

## Commands
- `npm run dev` starts the local dev server
- `npm run db:migrate` runs pending Prisma migrations

What it looks like in Cursor IDE

Cursor uses .cursor/rules — a directory of .mdc files, each scoped by when it applies:

.cursor/rules/
  global.mdc          # Always attached
  typescript.mdc      # Auto-attached to *.ts, *.tsx files
  database.mdc        # Manually invoked when working on DB code
  security.mdc        # Agent-requested when model detects security context

Each rule file carries metadata:

---
description: TypeScript and React conventions
globs: ["**/*.ts", "**/*.tsx"]
alwaysApply: false
---

Use functional components only. No class components.
All async operations must have explicit error handling.
State management via Zustand — no Redux.

When the system prompt is enough

The system prompt works well when:

  • The context is stable — your stack doesn’t change project to project
  • The instructions are declarative — tell the model what, not how
  • The context fits comfortably in ~2,000 tokens without crowding out your actual conversation

A single-team project with a settled stack rarely needs more than a well-written CLAUDE.md. The problem comes when the prompt tries to do too much.

When to leave this layer

System prompts break down under three conditions:

Context explosion. A prompt that tries to encode your entire engineering handbook — API contract rules, database schema, deployment procedures, code review checklist — will fill the context window and still be incomplete. Models also start ignoring instructions buried deep in a long prompt.

Repetitive procedural tasks. When you find yourself typing the same multi-step instruction over and over (“review this PR: check types, check error handling, check test coverage, summarize in the PR description format”), that’s a skill waiting to be extracted.

Dynamic context. When the instructions need to reference live data — “use the current schema from our database” or “check the latest API spec from our docs site” — a static text file can’t help you.

Layer 2: Skills and Commands

This layer is where reusable workflows live. Claude Code offers two related but distinct mechanisms: commands are simple prompt templates invoked with a slash command, while skills are structured, intelligent workflow units that Claude can discover and execute autonomously. Both promote repeated instructions from inline chat into named, reusable artifacts — but skills can do significantly more.

Commands

A command is a markdown file in .claude/commands/. Its name becomes a slash command. Its content is a prompt template that Claude executes when you invoke it.

.claude/commands/
  review-pr.md
  gen-migration.md
  write-adr.md
  update-changelog.md

.claude/commands/review-pr.md:

Review the current git diff as a pull request.

Check the following in order:
1. TypeScript types — are all new functions properly typed?
2. Error handling — are async operations wrapped in try/catch?
3. Test coverage — do new code paths have tests?
4. Prisma queries — are they going through the service layer?
5. Performance — any N+1 queries or missing indexes?

After review, generate a PR description in this format:

## Summary
[2-3 bullet points about what this PR does]

## Test Plan
[Checkbox list of how to verify the changes]

## Risk
[Low / Medium / High and why]

Use $ARGUMENTS to target a specific file or directory.

Now /review-pr src/auth/ triggers the full workflow without re-typing instructions.

.claude/commands/gen-migration.md:

Generate a Prisma migration for the following schema change: $ARGUMENTS

Steps:
1. Read the current schema at prisma/schema.prisma
2. Design the migration to add/modify the requested fields
3. Check for data implications — will existing rows need backfilling?
4. Generate the migration SQL
5. Update the schema file
6. Write a short description of what the migration does for the migration history

If this migration is destructive (drops columns or tables), flag it clearly.

Commands transform the system prompt from a passive context document into an active workflow registry. The system prompt tells the model who it is. Commands tell it what it knows how to do.

How Claude Code Skills Work

Skills are the evolution of commands. Where a command is a flat prompt template you invoke manually, a skill is a structured unit with metadata, invocation rules, and the ability to spawn parallel agents, inject live data, and adapt to your codebase.

A skill lives in .claude/skills/<name>/SKILL.md:

.claude/skills/
  review-pr/
    SKILL.md
  scaffold-tests/
    SKILL.md
  deploy/
    SKILL.md
    runbook.md     # referenced supporting files

.claude/skills/review-pr/SKILL.md:

---
name: review-pr
description: Reviews pull requests against team standards. Use when asked to review a PR or diff.
allowed-tools: Read, Grep, Glob, Bash
---

Review the PR at $ARGUMENTS against our engineering standards:

1. TypeScript types — all new functions properly typed?
2. Error handling — async operations wrapped in try/catch?
3. Test coverage — new code paths have tests?
4. Service layer — queries going through service layer only?
5. Performance — N+1 patterns or missing indexes?

Generate a PR description with Summary, Test Plan, and Risk level.

The description field is key. Claude reads skill descriptions at session start and automatically invokes the skill when the context matches — you don’t have to type /review-pr. Ask “can you look at this diff?” and Claude loads the skill because its description matches.

.claude/skills/deploy/SKILL.md:

---
name: deploy
description: Runs the deployment checklist for a service.
disable-model-invocation: true
allowed-tools: Bash, Read
---

Deploy $ARGUMENTS to the target environment:

1. Run `npm test` — abort if any failures
2. Check the deployment checklist in runbook.md
3. Run `npm run build`
4. Push the image and verify health checks pass

DO NOT proceed if any step fails.

disable-model-invocation: true means Claude will never run this automatically — only you can trigger it with /deploy. Critical for anything with real-world side effects.

Skills can inject live data before execution:

---
name: pr-summary
description: Summarizes a PR with its current diff and review comments.
---

Context:
- PR diff: !`gh pr diff $ARGUMENTS`
- Existing comments: !`gh pr view $ARGUMENTS --comments`

Summarize the PR, highlight the most important reviewer feedback, and suggest next steps.

The !`command` syntax runs the shell command at invocation time and injects its output into the prompt. The skill now operates on real data, not a description of what data might exist.

Skills can run parallel subagents:

---
name: deep-review
description: Thorough code review using multiple parallel reviewers.
context: fork
---

Spawn three parallel review agents for $ARGUMENTS:
- Agent 1: Focus on correctness and logic errors
- Agent 2: Focus on security vulnerabilities and input validation
- Agent 3: Focus on performance and scalability

Aggregate their findings into a single prioritized report.

context: fork isolates the skill in a subagent, preventing the heavy review work from bloating your main conversation context.

Key frontmatter fields:

| Field | Purpose |
| --- | --- |
| `description` | When to auto-invoke. Required for Claude to discover the skill. |
| `disable-model-invocation` | `true` = only you can invoke. Prevents accidental execution. |
| `user-invocable` | `false` = only Claude can invoke. Hides from the `/` menu (background knowledge). |
| `allowed-tools` | Restrict which tools Claude can use inside this skill. |
| `context` | `fork` = run in an isolated subagent context. |
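As a rough illustration of how a client might consume these fields, here is a minimal frontmatter parser (not Claude Code's actual implementation, just a sketch of the metadata format):

```typescript
// Minimal SKILL.md frontmatter parser: extracts the key/value block
// between the leading "---" fences. Illustrative only — this is not
// how Claude Code itself parses skills.
function parseFrontmatter(skillMd: string): Record<string, string> {
  const match = skillMd.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return {}; // no frontmatter block found
  const fields: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) {
      fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
    }
  }
  return fields;
}
```

With this, a skill whose `description` matches the conversation context can be selected by simple metadata lookup before the full body is ever loaded.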

Claude Code ships several bundled skills out of the box: /batch parallelizes large refactors across isolated worktrees, /simplify runs three concurrent code-review agents and aggregates findings, /loop schedules recurring prompts on an interval, and /claude-api loads the Anthropic SDK reference into your session.

The distinction between commands and skills matters as your workflows grow complex. Commands work well for simple, text-in/text-out templates. Skills are the right choice when you need auto-discovery, safety guards on destructive actions, live data injection, or parallel execution across large codebases.

Cursor’s equivalent: Agent rules and Composer

Cursor’s Composer (the multi-file agent mode) lets you attach rules selectively per session. Rules with alwaysApply: false are available on demand. You can also build reusable “Notepads” — persistent context blocks you attach to any conversation.

The pattern is the same: extract repeated instructions from inline chat into a named, reusable unit.

Real engineering use cases

Code review (/review-pr command or skill with auto-invocation): Encodes your team’s entire review checklist. The model runs it consistently on every diff without you re-specifying every criterion.

ADR generation (/write-adr): Takes a decision description, researches the codebase for context, and generates an Architecture Decision Record in your team’s template format.

Changelog update (/update-changelog): Reads recent commits, categorizes them by conventional commit type, and appends a properly formatted entry to CHANGELOG.md.

Test scaffolding (/scaffold-tests): Given a file path, reads the module, identifies untested functions, and generates Jest test stubs with the right mocking patterns for your project.

Parallel refactoring (/batch skill, built-in): Decomposes a large codebase change into 5–30 independent units and runs them in parallel isolated worktrees — dramatically faster than sequential edits.

Protected deployments (/deploy skill with disable-model-invocation: true): Runs a pre-flight checklist, validates environment, and executes the deployment. The safety flag ensures Claude never deploys without an explicit human command.

When to leave this layer

Commands and skills are powerful but they’re still working with text in a closed loop. The moment you need them to:

  • Query your actual database for current schema state
  • Pull the latest spec from an internal API
  • Create a Jira ticket
  • Read from your company’s internal docs system
  • Interact with GitHub, Slack, or any external service

…you’ve hit the ceiling of what a static prompt can do. That’s where MCP comes in.

Layer 3: MCP Tools

The Model Context Protocol (MCP) is an open standard — created by Anthropic but adopted across the industry — that defines how AI models connect to external systems. It’s the structured I/O layer between your AI agent and the real world.

The architecture

MCP uses a client/server model over two transports: stdio for local process communication, and HTTP for remote servers (streamable HTTP in current revisions of the spec; earlier versions used SSE, server-sent events). An MCP server exposes three primitive types:

  • Tools — functions the model can call (query a database, create a ticket, run a build)
  • Resources — context the model can read (documentation, schema, configuration)
  • Prompts — server-defined prompt templates the model can invoke
┌─────────────────┐                              ┌─────────────────┐
│   Claude Code   │ ◄──── MCP (stdio/SSE) ─────► │   MCP Server    │
│   or Cursor     │    tools / resources /       │  (your system)  │
│    (client)     │         prompts              │                 │
└─────────────────┘                              └────────┬────────┘
                                                          │
                                                          ▼
                                                Your DB / API / Service
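On the wire, these primitives travel as JSON-RPC 2.0 messages. A `tools/call` request from client to server looks roughly like this (the method name and params shape come from the MCP spec; the tool name and arguments here are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "query_database",
    "arguments": { "sql": "SELECT count(*) FROM users" }
  }
}
```

The server replies with a result whose `content` array carries text (or other media) back to the model, which is why MCP tools compose so cleanly with ordinary prompting.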

Connecting MCP servers in Claude Code

MCP servers for a project are declared in a .mcp.json file at the repository root:

{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": {
        "POSTGRES_CONNECTION_STRING": "postgresql://localhost/myapp_dev"
      }
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    "jira": {
      "command": "npx",
      "args": ["mcp-server-jira"],
      "env": {
        "JIRA_HOST": "https://mycompany.atlassian.net",
        "JIRA_API_TOKEN": "${JIRA_TOKEN}"
      }
    }
  }
}
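You can also register servers from the command line instead of editing the config by hand, using Claude Code's `claude mcp` subcommands (exact flags may vary by version):

```shell
# Add a local stdio server at project scope, then verify it is registered.
claude mcp add postgres --scope project -- npx -y @modelcontextprotocol/server-postgres
claude mcp list
```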

Once configured, the model can call these tools naturally in conversation — no special syntax, no manual wiring. It decides when to use them based on context.

MCP transforms Skills into real workflows

Now the skills from Layer 2 gain real power:

/review-pr + GitHub MCP:

Review the current PR: $ARGUMENTS

1. Use the GitHub tool to fetch the PR diff and all inline review comments
2. Run the review checklist (types, error handling, tests, service layer, performance)
3. Check if any flagged patterns match existing GitHub issues
4. Post the review as a GitHub PR review (not just a comment) with line-level annotations
5. If all checks pass, approve the PR; otherwise request changes

/gen-migration + PostgreSQL MCP:

Generate a migration for: $ARGUMENTS

1. Use the postgres tool to read the current schema — run \d on relevant tables
2. Check for existing migrations in prisma/migrations/ to understand migration history
3. Generate the migration ensuring it's compatible with the current production schema
4. Check if any existing data would be affected
5. Generate rollback SQL as well

/write-adr + Confluence MCP:

Write an ADR for: $ARGUMENTS

1. Search Confluence for existing ADRs on related topics
2. Read the codebase to understand the current approach
3. Generate the ADR document
4. Create it in Confluence under the Engineering > ADRs space
5. Link the Confluence page in the PR description

Building a custom MCP server

For internal systems without existing MCP servers, you build your own. Here’s a minimal TypeScript MCP server exposing your internal API:

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { ListToolsRequestSchema, CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "internal-api", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "get_feature_flags",
      description: "Get the current feature flag configuration for an environment",
      inputSchema: {
        type: "object",
        properties: {
          environment: { type: "string", enum: ["dev", "staging", "prod"] },
        },
        required: ["environment"],
      },
    },
    {
      name: "get_service_status",
      description: "Get the health status of all microservices",
      inputSchema: {
        type: "object",
        properties: {
          service: { type: "string", description: "Service name (optional, all services if omitted)" },
        },
      },
    },
  ],
}));

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  if (name === "get_feature_flags") {
    const flags = await fetchFromInternalAPI(`/flags?env=${args.environment}`);
    return { content: [{ type: "text", text: JSON.stringify(flags, null, 2) }] };
  }

  if (name === "get_service_status") {
    const status = await fetchFromInternalAPI(`/health${args.service ? `/${args.service}` : ""}`);
    return { content: [{ type: "text", text: JSON.stringify(status, null, 2) }] };
  }

  throw new Error(`Unknown tool: ${name}`);
});

const transport = new StdioServerTransport();
await server.connect(transport);
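The server above assumes a `fetchFromInternalAPI` helper, which the snippet leaves undefined. A minimal sketch, assuming a plain JSON-over-HTTP internal API (the base URL and endpoint shapes are hypothetical, not part of the MCP SDK):

```typescript
// Hypothetical helper assumed by the server above: a thin wrapper
// around fetch() against an internal base URL. The URL is illustrative.
const INTERNAL_API_BASE = "http://localhost:4000"; // assumed internal endpoint

type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

async function fetchFromInternalAPI(
  path: string,
  fetchImpl: FetchLike = fetch // injectable so it can be exercised without a live API
): Promise<unknown> {
  const res = await fetchImpl(`${INTERNAL_API_BASE}${path}`);
  if (!res.ok) {
    throw new Error(`Internal API request failed: ${res.status} for ${path}`);
  }
  return res.json();
}
```

Keeping the helper thin matters: the MCP handler already serializes results to text, so all the helper needs to do is fetch and surface failures loudly.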

Real engineering use cases for MCP

Database exploration — Connect your development database and let the model introspect actual schema, run explain plans, and generate migrations that are guaranteed to be compatible with the real state of your data, not a stale schema diagram.

CI/CD integration — Connect GitHub Actions or Jenkins. The model can check build status, read failing test logs, identify the root cause, and propose fixes — all without you copying logs into chat.

Multi-repo awareness — An MCP server exposing your service registry lets the model understand which service owns which API, where it’s deployed, and who’s on-call — context that doesn’t fit in any single CLAUDE.md.

Documentation access — Connect Confluence or Notion. Skills that generate ADRs, RFCs, or runbooks can now read existing documents for consistency rather than hallucinating from memory.

Observability — Connect Datadog, Sentry, or Grafana. A debugging session can start from an actual error trace rather than a vague description.

Layer 4: Plugins and Extensions

Plugins are the packaging layer. They take everything from Layers 1–3 — the grounding instructions, the reusable skills, the MCP connections — and bundle them into a shareable, installable unit.

What plugins look like today

Claude Code doesn’t have a formal plugin registry yet, but the pattern already exists: a plugin is a directory or package that contains:

my-engineering-plugin/
  CLAUDE.md              # Grounding instructions for this domain
  .claude/
    commands/
      review-pr.md       # Skills specific to this workflow
      deploy-preview.md
      write-runbook.md
  .mcp.json              # MCP server declarations
  mcp-servers/
    internal-api/        # Custom MCP server code
      index.ts
      package.json
  README.md

Teams share this via their internal package registry or a Git submodule. A new engineer installs the plugin with npm, runs a setup script, and has the entire team’s AI workflow available locally — the same instructions, the same skills, the same data connections.
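In practice, "runs a setup script" can be as simple as copying the bundled config into the consuming repo. A minimal sketch (the package name and directory layout are assumptions matching the tree above, not a standard Claude Code mechanism):

```shell
#!/usr/bin/env sh
# Illustrative setup step: copy the plugin's bundled AI config into the
# current project. The package name and layout below are assumptions.
install_claude_plugin() {
  plugin_dir="$1"
  if [ ! -d "$plugin_dir" ]; then
    echo "Plugin not found at $plugin_dir" >&2
    return 1
  fi
  mkdir -p .claude/commands
  cp "$plugin_dir/CLAUDE.md" ./CLAUDE.md
  cp -R "$plugin_dir/.claude/commands/." .claude/commands/
  echo "AI workflow installed from $plugin_dir"
}

# In a real package this would run from a postinstall hook:
install_claude_plugin "node_modules/@mycompany/engineering-ai-plugin" || true
```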

Cursor extensions

Cursor has a more formal extension mechanism through its VS Code compatibility layer, but the emerging pattern for team-level sharing is the .cursor/ directory committed to the repo. Teams share:

  • Rules in .cursor/rules/ for project-specific instructions
  • Notepads for reusable context blocks
  • MCP configuration in .cursor/mcp.json

A shared .cursor/ directory in a monorepo effectively functions as a plugin for every engineer who clones it.

The distribution advantage

The plugin layer solves a key organizational problem: AI workflow drift. Without packaging, every engineer evolves their personal CLAUDE.md and skill set independently. The senior engineer’s review process diverges from the junior’s. Critical security checks get missed because they’re not in everyone’s skill set.

Plugins make the team’s AI capability a versioned, deployable artifact — the same as any other piece of infrastructure.

The Full Picture: A Maturity Model

Here’s how a real team’s AI capability evolves across these layers:

Stage 1 — Personal Context (Week 1)

You write a CLAUDE.md for your project. Ten lines. Stack, conventions, test command. The model stops asking you which testing framework you use.

Stage 2 — Team Alignment (Month 1)

The team’s individual CLAUDE.md files diverge. You merge them, commit one to the repo, and establish a shared baseline. Everyone’s model now knows the team conventions.

Stage 3 — Workflow Extraction (Month 2–3)

You notice everyone types the same review instructions. You extract /review-pr. Then /write-adr. Then /update-changelog. Skills become a library of the team’s repeatable processes.

Stage 4 — Live Data (Month 3–6)

Skills hit their ceiling — they can’t query the real database, they can’t post to GitHub, they can’t read Confluence. You wire up MCP servers for the systems you use most. Skills become full workflows that interact with real systems.

Stage 5 — Packaged Distribution (Month 6+)

You want new team members, new services, and new repos to start with everything already configured. You package CLAUDE.md + skills + MCP configs into a plugin. It becomes part of your internal platform’s setup script.

Knowing When to Advance

The signals are clear if you watch for them:

| Signal | Next Layer |
| --- | --- |
| You paste the same context into every conversation | System Prompt in CLAUDE.md |
| You re-type the same multi-step task repeatedly | Command or Skill |
| Your skill needs real data it can’t access as text | MCP Tool |
| New team members need to manually configure everything | Plugin |
| Different repos have diverged AI configurations | Plugin |

A Concrete Example: The PR Review Workflow

Here’s the same PR review capability built at each layer:

Layer 1 — System Prompt only:

# CLAUDE.md
When reviewing PRs, check: types, error handling, test coverage,
service layer patterns, and N+1 queries.

Result: The model knows your standards. You still type “review this PR” and paste the diff.

Layer 2a — Command added:

# .claude/commands/review-pr.md
Review $ARGUMENTS as a PR. Check:
1. TypeScript types on all new functions
2. try/catch on async operations
3. Test coverage for new code paths
4. Queries only in service layer
5. N+1 query patterns

Format the output as a PR description ready to paste.

Result: /review-pr runs the full checklist. You still paste the diff manually.

Layer 2b — Skill added:

# .claude/skills/review-pr/SKILL.md
---
name: review-pr
description: Reviews pull requests and diffs against team standards. Use when asked to review a PR or code changes.
allowed-tools: Read, Grep, Bash
---

Context:
- PR diff: !`gh pr diff $ARGUMENTS`

Review the diff against team standards (types, error handling, test coverage, service layer, N+1 patterns).
Format the output as a PR description ready to paste.

Result: Claude auto-detects review requests and runs the checklist. The skill fetches the diff itself — no manual copy-paste.

Layer 3 — MCP added:

Same skill, but now with GitHub MCP configured. The skill fetches the PR diff itself, checks the PR’s existing comments, and posts the review directly to GitHub as a formal review with line annotations.

Result: /review-pr 1234 is the entire workflow. No copy-paste. Review appears on GitHub.

Layer 4 — Plugin:

The whole setup — CLAUDE.md, /review-pr skill, GitHub MCP config, internal API MCP server — is packaged and installed via npm install @mycompany/engineering-ai-plugin. Every engineer gets it. Every repo has it.

Result: PR review is a one-command operation for every engineer from day one.
