Modern Agent Engineering

12-Factor Agents in Practice: LangChain/LangGraph and Agno/AgentOS

A source-audited translation of HumanLayer's 12-factor agent principles into practical LangGraph and Agno/AgentOS architecture, with production-minded Python examples.

24 min read

humanlayer/12-factor-agents is one of the few agent-engineering documents I would hand to senior engineers without apologizing first.

Not because it is perfectly complete, and not because every factor should be implemented literally, but because it is arguing about the right thing: runtime semantics, not prompt theater.

As of March 31, 2026, I audited the upstream repository and the current official docs for LangGraph and Agno/AgentOS for this piece. This article is not a rewrite of the original guide. It is a translation layer for engineers who need to decide: which of these principles should become explicit architecture, and which framework makes that easier?

TL;DR

  • 12-factor-agents is best read as a production runtime manifesto, not a framework tutorial.
  • The factors are not equally important. In practice, the center of gravity is:
    • structured tool and human interfaces
    • owning context and state
    • pause/resume plus explicit control flow
    • failure handling plus narrow agent scope
  • LangGraph is usually the closer fit when you want the framework to expose durable state, checkpoints, interrupts, and graph-level control directly.
  • Agno/AgentOS is usually the more practical fit when you want sessions, approvals, traces, APIs, and control-plane ergonomics without rebuilding that runtime surface yourself.
  • My strongest editorial take: the HumanLayer philosophy is directionally right, but it also assumes your team is willing to own more orchestration detail than many product teams actually want to maintain.

What You Will Learn Here

  • What 12-factor-agents is actually saying once you strip away the branding and factor-numbering.
  • Which factors matter most in production systems, and which ones are better understood as extensions or style choices.
  • How to map the key factors onto LangGraph concepts like threads, checkpoints, interrupts, and subgraphs.
  • How to map the same ideas onto Agno/AgentOS concepts like sessions, summaries, approvals, tools, and the control plane.
  • A practical release/deployment agent example in Python for both ecosystems.
  • Where LangGraph is closer to the philosophy, and where Agno/AgentOS is simply easier to ship.

The Research Audit: What 12-Factor Agents Actually Says

The source-backed reading of the repository is fairly consistent.

First, the repo is explicitly skeptical of the default “LLM plus tools in a loop” story. The README argues that strong production agents are mostly software, with LLM decisions inserted at the right points, and that most teams chasing quality eventually need deeper control over prompts, flow, and state than frameworks expose by default.

Second, the repo treats tool calls as structured outputs, context as the real runtime, and control flow as an application concern. Factors 1, 3, 4, 5, 6, 7, 8, and 9 all orbit that same idea from different angles.

Third, the repo is heavily biased toward:

  • serializable thread-like state
  • explicit pause/resume semantics
  • durable human approval points
  • compact error feedback
  • small agent scope

That bias is not subtle. It is the heart of the guide.

Finally, the repo does not really read like twelve equally weighted laws. Some factors are core architectural guidance. Others are more like implementation heuristics. Factor 12 literally says it is “mostly just for fun,” which is a useful clue for how seriously to operationalize each section.

My inference from the source material is this:

12-factor-agents is not really a manifesto for “agentic” software in general. It is a manifesto for durable, inspectable, interruption-friendly, human-aware agent runtimes.

That is exactly why it maps unusually well to LangGraph, and only partially to simpler agent wrappers.

A Better Mental Model: 12 Factors as 4 Runtime Concerns

Instead of walking factor-by-factor, I think the more useful translation is to regroup them around four runtime concerns.

Slack / API / Cron trigger
          |
          v
+-----------------------------+
| Planner / structured output |
| "what should happen next?"  |
+-------------+---------------+
              |
              v
+-----------------------------+
| State store / thread        |
| events, approvals, errors,  |
| summaries, session history  |
+------+------+---------------+
       |      |
       |      +-------------------+
       |                          |
       v                          v
+--------------+          +------------------+
| Tool action  |          | Human approval   |
| fetch/deploy |          | reject/approve   |
+------+-------+          +---------+--------+
       |                           |
       +------------+--------------+
                    |
                    v
            resume next step
| Runtime concern | Factors | LangGraph fit | Agno/AgentOS fit | Practical tradeoff |
|---|---|---|---|---|
| Structured tool and human interfaces | 1, 4, 7 | Strong, explicit, low-level | Strong, ergonomic, batteries included | LangGraph gives more raw control; Agno gives less ceremony |
| Context and state as runtime | 2, 3, 5, 12 | Very strong via thread state and checkpoints | Strong via sessions, summaries, DB-backed context | LangGraph is more explicit; Agno is more managed |
| Pause/resume and control flow | 6, 8, 11 | Excellent with interrupts and durable execution | Strong with confirmations, approvals, AgentOS APIs | LangGraph feels closer to the philosophy; Agno is easier to operate |
| Failure handling and narrow scope | 9, 10 | Strong if you model errors in state deliberately | Strong if you keep tools and teams narrow | Neither framework saves you from bad scope decisions |

That table is the whole article in compressed form.

1. Structured Tool and Human Interfaces

The source-backed principle across factors 1, 4, and 7 is simple:

  • the model should emit structured intent
  • deterministic code should decide how that intent is executed
  • asking a human for approval or clarification is just another kind of structured intent

That is an underrated point.

Many frameworks still treat “tool calling” as a magical API feature. HumanLayer is arguing for a more useful view: a tool call is just serialized intent at the LLM/runtime boundary.

That gives you freedom to do things frameworks do not always model directly:

  • turn tool intent into approval records
  • translate one model output into several deterministic actions
  • convert “contact a human” into a first-class step instead of unstructured assistant text
  • inspect or reject a proposed tool before it executes

Why this matters in the release-agent scenario

Suppose a release agent gets a Slack message:

Deploy release-2026.03.31 to production and page me if health checks fail.

The first useful question is not “which framework has tools?”

It is:

Can the system represent this as structured intent in a way that lets me inspect, pause, approve, deny, retry, and resume?

That is the 12-factor question.

Factor-by-factor examples

Factor 1: Natural Language to Tool Calls

In this framing, factor 1 means the model should turn a human release request into a typed intent object before any deploy logic runs.

from typing import Literal

from pydantic import BaseModel


class ReleaseIntent(BaseModel):
    action: Literal["deploy", "verify", "rollback"]
    tag: str
    environment: Literal["staging", "production"]
    notify_on_failure: bool


# Assume `llm` is an instantiated chat model client that supports structured output.
intent = llm.with_structured_output(ReleaseIntent).invoke(
    "Deploy release-2026.03.31 to production and page me if health checks fail."
)

In LangGraph this intent usually becomes a state update; in Agno it can become an output_schema result or tool-selection input.

Factor 4: Tools Are Just Structured Outputs

Factor 4 says the model is deciding what should happen next, while deterministic code still owns how that action is executed.

def dispatch_release_intent(intent: ReleaseIntent) -> str:
    if intent.action == "deploy":
        return "request_approval" if intent.environment == "production" else "deploy_now"
    if intent.action == "verify":
        return "run_checks"
    if intent.action == "rollback":
        return "open_incident"
    return "done"


next_step = dispatch_release_intent(intent)

That split maps naturally to LangGraph routing functions and to Agno code around tools, approvals, or workflow steps.

Factor 7: Contact Humans With Tool Calls

For factor 7, “ask a human” should be a typed step in the runtime, not a vague assistant sentence buried in chat history.

def build_approval_request(intent: ReleaseIntent, requested_by: str) -> dict:
    return {
        "intent": "request_human_input",
        "question": f"Approve deploy of {intent.tag} to {intent.environment}?",
        "context": "Production release with customer-facing impact.",
        "options": ["approve", "reject"],
        "requested_by": requested_by,
    }


approval_event = build_approval_request(intent, requested_by="slack:alice")

In LangGraph this becomes an interrupt payload or event; in Agno/AgentOS it maps cleanly to confirmation-gated tools and approval records.

LangGraph reading

LangGraph fits this well because it already assumes that state updates and node transitions are the real runtime. The model can emit structured intent, but the graph decides what node runs next and whether to pause before a high-stakes side effect.

In practice, LangGraph encourages a healthy split:

  • model decides the next semantic step
  • nodes and edges decide execution semantics
  • interrupts decide when humans re-enter the loop

Agno/AgentOS reading

Agno fits the same principle from a more productized angle:

  • output_schema makes structured output first-class
  • @tool(requires_confirmation=True) makes approval-gated execution first-class
  • AgentOS persists and exposes the approval flow operationally

This is one of the first places where Agno/AgentOS feels more practical than purist. HumanLayer would like you to own the runtime semantics yourself. Agno says: fine, but most teams would rather have a supported approval system.

That is a real tradeoff, not a philosophical failure.

2. Context Is the Runtime

Factors 2, 3, 5, and 12 all point at the same architectural claim:

The best source of truth for an agent is often the serialized history of what has happened so far, not a separate hidden execution object.

This is where the repo gets genuinely interesting.

Most agent discussions stop at prompt engineering or memory. HumanLayer pushes further and says:

  • own the prompt
  • own the context format
  • unify execution state and business state when possible
  • think of the agent as a reducer over serialized events

That is a much more systems-oriented framing.

The practical payoff

If your release agent keeps a serialized event log like:

  • trigger received
  • release metadata loaded
  • approval requested
  • approval granted
  • deploy attempted
  • health checks failed

then several hard production features become simpler:

  • replay
  • debugging
  • UI rendering
  • forking
  • resumability
  • summaries and compaction
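Forking in particular falls out almost for free once history is serialized. A minimal sketch, assuming a plain list-of-dicts event log (the `fork_events` helper is mine, not part of either framework):

```python
def fork_events(events: list[dict], keep_until: int, branch_event: dict) -> list[dict]:
    """Return a new event list that replays history up to `keep_until`
    (exclusive) and then diverges with `branch_event`. The original
    log is left untouched, so both branches remain inspectable."""
    return events[:keep_until] + [branch_event]


original = [
    {"type": "trigger", "data": "deploy release-2026.03.31"},
    {"type": "approval_requested", "data": {"channel": "slack"}},
    {"type": "approval_response", "data": {"approved": False}},
]

# Fork just before the rejection to explore the approved branch instead.
branch = fork_events(
    original,
    keep_until=2,
    branch_event={"type": "approval_response", "data": {"approved": True}},
)
```

The same copy-a-prefix move also gives you replay and time-travel debugging: reduce any prefix of the log and you have the runtime state at that point.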

Factor-by-factor examples

Factor 2: Own Your Prompts

Factor 2 means the planner prompt should be explicit application code, not a hidden bundle of “role” and “goal” defaults.

def build_release_planner_prompt(events: list[dict], environment: str) -> str:
    history = "\n".join(f"- {event['type']}: {event['data']}" for event in events[-6:])
    return f"""You are a release planner for {environment}.

Rules:
- Never deploy to production without approval.
- Prefer verifying before declaring success.

History:
{history}
"""

That pattern works the same way in both frameworks because prompt ownership sits above the orchestration layer.

Factor 3: Own Your Context Window

Factor 3 says you should serialize only the events that matter instead of dumping every prior token back into the model.

def event_to_prompt(event: dict) -> str:
    return f"<{event['type']}>\n{event['data']}\n</{event['type']}>"


def thread_to_prompt(events: list[dict]) -> str:
    keep = {"trigger", "release_context", "approval_response", "error"}
    selected = [event for event in events if event["type"] in keep]
    return "\n\n".join(event_to_prompt(event) for event in selected)

LangGraph encourages this through explicit state shaping; Agno supports it through custom context assembly plus summaries and history controls.

Factor 5: Unify Execution State and Business State

Factor 5 argues that one event log can often explain both “what happened” and “what the runtime should do next.”

events = [
    {"type": "trigger", "data": "deploy release-2026.03.31"},
    {"type": "release_context", "data": {"tag": "release-2026.03.31"}},
    {"type": "approval_requested", "data": {"channel": "slack"}},
    {"type": "approval_response", "data": {"approved": True}},
]

waiting_for_approval = events[-1]["type"] == "approval_requested"
ready_to_deploy = events[-1]["type"] == "approval_response" and events[-1]["data"]["approved"]

LangGraph exposes this style more directly, while Agno tends to wrap more of it inside session and run abstractions.

Factor 12: Make Your Agent a Stateless Reducer

Factor 12 is the most functional reading of the same idea: derive current release state by reducing prior events.

def reduce_release_events(events: list[dict]) -> dict:
    state = {"approved": False, "deployed": False, "last_error": None}
    for event in events:
        if event["type"] == "approval_response":
            state["approved"] = event["data"]["approved"]
        elif event["type"] == "deploy_result":
            state["deployed"] = event["data"]["status"] == "success"
        elif event["type"] == "error":
            state["last_error"] = event["data"]["message"]
    return state

You do not have to implement your system this way, but LangGraph’s state-first model is closer to it than most agent wrappers.

LangGraph reading

This is the strongest LangGraph fit in the entire article.

The persistence docs are explicit that LangGraph saves graph state as checkpoints organized into threads, and that these checkpoints enable human-in-the-loop workflows, memory, time travel, and fault-tolerant execution. The durable execution docs are also explicit that with a checkpointer in place, a workflow can resume from saved progress instead of replaying from zero.

That is very close to the HumanLayer worldview.

LangGraph does not force you into a hidden “agent object with vibes.” It encourages you to model state directly, checkpoint it, reload it by thread_id, and keep progressing from there.

Agno/AgentOS reading

Agno lands on a different but still useful abstraction.

The current docs for Agent.run() describe the runtime building context from system instructions, user input, chat history, user memories, session state, and other relevant inputs. Session summaries then compress long histories into shorter running summaries. AgentOS adds DB-backed sessions, memory, traces, and APIs on top of that.

So Agno absolutely supports the underlying runtime idea. It just packages more of it behind session-centric APIs.

This is the key difference:

  • LangGraph says: state is the graph runtime; model it explicitly.
  • Agno says: state is the agent/session runtime; we will help manage it.

If you like explicit orchestration, LangGraph feels cleaner. If you want a production service quickly, Agno is often the faster move.

3. Pause/Resume Beats Magical Autonomy

This is where factors 6, 8, and 11 become the real production story.

The source-backed argument is straightforward:

  • agents should launch from simple APIs
  • they should pause when a human or long-running dependency is needed
  • they should resume later from an external trigger
  • humans should be reachable through the same runtime surface, not bolted on as an afterthought

This matters more than most “multi-agent” conversations.

A release agent that can choose tools but cannot pause before deploy, wait for approval, and resume later without losing its place is not production-grade. It is a demo with credentials.

Factor-by-factor examples

Factor 6: Launch/Pause/Resume With Simple APIs

Factor 6 means the outside world should be able to start and resume the runtime with small, predictable APIs.

# Assumes `graph` is a compiled LangGraph with a checkpointer attached
# and `Command` is imported from langgraph.types.
def start_release_run(message: str, thread_id: str) -> dict:
    state = {"events": [{"type": "trigger", "data": message}]}
    return graph.invoke(state, config={"configurable": {"thread_id": thread_id}})


def resume_release_run(thread_id: str, approved: bool, approver: str) -> dict:
    return graph.invoke(
        Command(resume={"approved": approved, "approved_by": approver}),
        config={"configurable": {"thread_id": thread_id}},
    )

In LangGraph this is literally a thread plus Command(resume=...); in Agno the equivalent is session_id, paused runs, and continue_run().

Factor 8: Own Your Control Flow

Factor 8 is the insistence that routing rules should remain visible deterministic code, not opaque framework behavior.

def route_release_step(state: dict) -> str:
    if state.get("last_error"):
        return "escalate"
    if not state.get("context_loaded"):
        return "collect_context"
    if state["environment"] == "production" and not state.get("approved"):
        return "approval_gate"
    if not state.get("deployed"):
        return "deploy"
    return "verify"

This is idiomatic LangGraph and still a useful mental model when wrapping Agno tools or workflow steps.

Factor 11: Trigger From Anywhere

Factor 11 means Slack, cron, and API calls should normalize into the same internal event format before the agent loop sees them.

def normalize_trigger(source: str, payload: dict) -> dict:
    return {
        "type": "trigger",
        "data": {
            "source": source,
            "requested_by": payload.get("user", source),
            "message": payload["message"],
            "thread_key": payload.get("thread_id") or payload.get("session_id"),
        },
    }

That keeps channel-specific glue outside the core runtime whether you are using LangGraph threads or AgentOS sessions.

LangGraph example: explicit state, explicit interrupt, explicit resume

This example stays intentionally close to the 12-factor mindset: the model emits structured intent, deterministic code owns execution, and approval is a real pause point.

from __future__ import annotations

from typing import Any, Literal, NotRequired, TypedDict

from langgraph.checkpoint.memory import InMemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.types import Command, interrupt


class ReleaseDecision(TypedDict):
    intent: Literal[
        "collect_release_data",
        "request_approval",
        "deploy_release",
        "post_deploy_check",
        "done",
    ]
    reason: str


class ReleaseState(TypedDict):
    release_id: str
    tag: str
    environment: Literal["staging", "production"]
    requested_by: str
    events: list[dict[str, Any]]
    next_action: NotRequired[ReleaseDecision]
    approval: NotRequired[dict[str, Any]]
    deployment_result: NotRequired[dict[str, Any]]
    consecutive_errors: int


def fetch_release_context(release_id: str) -> dict[str, Any]:
    # Deterministic tool boundary: this is not model logic.
    return {
        "release_id": release_id,
        "tag": "release-2026.03.31",
        "change_ticket": "CHG-8471",
        "smoke_tests_passed": True,
    }


def deploy_release(tag: str, environment: str) -> dict[str, Any]:
    # Replace with your deploy system.
    return {"tag": tag, "environment": environment, "status": "success"}


def build_prompt(events: list[dict[str, Any]]) -> str:
    rendered = "\n".join(f"- {event['type']}: {event['data']}" for event in events)
    return f"""You coordinate software releases.

Here is the execution history so far:
{rendered}

Return the next structured step only.
"""


def planner_node(state: ReleaseState) -> dict[str, Any]:
    prompt = build_prompt(state["events"])

    # Assume `llm` is an instantiated LangChain chat model client.
    decision: ReleaseDecision = llm.with_structured_output(ReleaseDecision).invoke(prompt)

    return {
        "next_action": decision,
        "events": state["events"]
        + [{"type": "planner_decision", "data": decision}],
    }


def collect_context_node(state: ReleaseState) -> dict[str, Any]:
    context = fetch_release_context(state["release_id"])
    return {
        "events": state["events"] + [{"type": "release_context", "data": context}],
    }


def approval_gate_node(state: ReleaseState) -> dict[str, Any]:
    approval = interrupt(
        {
            "kind": "deploy_approval",
            "release_id": state["release_id"],
            "tag": state["tag"],
            "environment": state["environment"],
            "requested_by": state["requested_by"],
            "reason": state["next_action"]["reason"],
        }
    )

    return {
        "approval": approval,
        "events": state["events"] + [{"type": "approval_response", "data": approval}],
    }


def deploy_node(state: ReleaseState) -> dict[str, Any]:
    try:
        result = deploy_release(state["tag"], state["environment"])
        return {
            "deployment_result": result,
            "consecutive_errors": 0,
            "events": state["events"] + [{"type": "deploy_result", "data": result}],
        }
    except Exception as exc:
        compact_error = {
            "tool": "deploy_release",
            "message": str(exc)[:240],
        }
        return {
            "consecutive_errors": state["consecutive_errors"] + 1,
            "events": state["events"] + [{"type": "error", "data": compact_error}],
        }


def verify_node(state: ReleaseState) -> dict[str, Any]:
    verification = {
        "environment": state["environment"],
        "healthy": True,
    }
    return {
        "events": state["events"] + [{"type": "post_deploy_check_result", "data": verification}],
    }


def route_from_planner(state: ReleaseState) -> str:
    intent = state["next_action"]["intent"]
    if intent == "collect_release_data":
        return "collect_context"
    if intent == "request_approval":
        return "approval_gate"
    if intent == "deploy_release":
        return "deploy"
    if intent == "post_deploy_check":
        return "verify"
    return END


builder = StateGraph(ReleaseState)
builder.add_node("planner", planner_node)
builder.add_node("collect_context", collect_context_node)
builder.add_node("approval_gate", approval_gate_node)
builder.add_node("deploy", deploy_node)
builder.add_node("verify", verify_node)

builder.add_edge(START, "planner")
builder.add_conditional_edges("planner", route_from_planner)
builder.add_edge("collect_context", "planner")
builder.add_edge("approval_gate", "planner")
builder.add_edge("deploy", "planner")
builder.add_edge("verify", "planner")

graph = builder.compile(checkpointer=InMemorySaver())  # Demo-only saver; use a durable checkpointer in production.

config = {"configurable": {"thread_id": "release-847"}}

initial_state: ReleaseState = {
    "release_id": "rel_847",
    "tag": "release-2026.03.31",
    "environment": "production",
    "requested_by": "slack:alice",
    "events": [
        {
            "type": "trigger",
            "data": {
                "channel": "slack",
                "message": "Deploy release-2026.03.31 to production",
            },
        }
    ],
    "consecutive_errors": 0,
}

# First run pauses at approval_gate_node and persists thread state for this in-process demo.
graph.invoke(initial_state, config=config)

# Later, in the same process; with a durable checkpointer this can also resume from a webhook or admin UI:
graph.invoke(
    Command(resume={"approved": True, "approved_by": "sre-oncall"}),
    config=config,
)

Why this example matters:

  • shared typed state is explicit
  • the node/tool boundary is explicit
  • approval is a real runtime pause, not assistant prose
  • compact errors are just more serialized context
  • thread_id is the durable pointer

This is very close to the HumanLayer philosophy.

Agno/AgentOS example: session-centric runtime with built-in approvals

Now look at the same scenario through Agno/AgentOS.

from typing import Literal

from agno.agent import Agent
from agno.db.postgres import PostgresDb
from agno.models.openai import OpenAIResponses
from agno.os import AgentOS
from agno.tools import tool
from pydantic import BaseModel, Field


db = PostgresDb(
    db_url="postgresql+psycopg://ai:ai@localhost:5532/ai"
)


class ReleaseDecision(BaseModel):
    action: Literal["collect_context", "wait_for_approval", "deploy", "verify", "done"]
    summary: str
    tag: str
    environment: Literal["staging", "production"]
    risk: Literal["low", "medium", "high"]
    rollback_hint: str | None = Field(default=None)


@tool
def fetch_release_context(release_id: str) -> dict:
    return {
        "release_id": release_id,
        "change_ticket": "CHG-8471",
        "smoke_tests_passed": True,
        "approver_group": "sre-oncall",
    }


@tool(requires_confirmation=True)
def deploy_release(tag: str, environment: str) -> dict:
    return {"tag": tag, "environment": environment, "status": "success"}


@tool
def run_post_deploy_checks(environment: str) -> dict:
    return {"environment": environment, "healthy": True}


release_agent = Agent(
    id="release-coordinator",
    name="Release Coordinator",
    model=OpenAIResponses(id="gpt-5.2"),
    db=db,
    tools=[fetch_release_context, deploy_release, run_post_deploy_checks],
    output_schema=ReleaseDecision,
    instructions=[
        "Coordinate releases conservatively.",
        "Always gather release context before any production action.",
        "Treat production deploys as high risk and require confirmation.",
        "After deployment, run verification before declaring success.",
    ],
    add_history_to_context=True,
    enable_session_summaries=True,
    num_history_runs=3,
    markdown=True,
)


agent_os = AgentOS(
    id="release-agent-os",
    agents=[release_agent],
    db=db,
)

app = agent_os.get_app()


run_response = release_agent.run(
    "Deploy rel_847 to production and verify it afterward.",
    user_id="slack:alice",
    session_id="release-847",
)

if run_response.is_paused:
    for requirement in run_response.active_requirements:
        print(requirement.tool.tool_name, requirement.tool.tool_args)
        requirement.confirmed = True

    final_response = release_agent.continue_run(
        run_id=run_response.run_id,
        requirements=run_response.requirements,
    )

And if you serve the agent through AgentOS, the runtime story becomes even more operational:

# Request enters through the AgentOS API with a stable session_id
curl -X POST http://localhost:7777/agents/release-coordinator/runs \
  -F "message=Deploy rel_847 to production" \
  -F "user_id=slack:alice" \
  -F "session_id=release-847"

# If deploy_release requires confirmation, the run pauses.
# An admin can resolve the approval in the AgentOS control plane
# or through the approvals API, then the run resumes.

Why this example matters:

  • session_id is the durable conversation handle
  • output_schema gives you typed decisions
  • requires_confirmation=True gives you built-in pause points for risky tools
  • session summaries help keep context from growing without bound
  • AgentOS adds APIs, approvals, traces, and an operational surface around the agent

This is not as raw as LangGraph, but it is often a more realistic fit for teams that need the runtime to be a product, not a bespoke orchestration project.

4. Design for Failure and Narrow Scope

Factors 9 and 10 are less glamorous, but they are where many agent systems actually succeed or fail.

HumanLayer’s argument is:

  • feed compact, useful error state back into the runtime
  • do not let one agent sprawl into a 40-tool general-purpose blob

That is exactly right.

Factor-by-factor examples

Factor 9: Compact Errors Into Context Window

Factor 9 says failures should be compressed into runtime-readable state so the next step can adapt instead of starting blind.

def record_deploy_error(events: list[dict], exc: Exception, retries: int) -> tuple[list[dict], int]:
    compact_error = {
        "tool": "deploy_release",
        "message": str(exc)[:180],
        "retryable": retries < 2,
    }
    return (
        events + [{"type": "error", "data": compact_error}],
        retries + 1,
    )

This pattern fits both frameworks: LangGraph via state updates, Agno via appended context plus a resumed run.

Factor 10: Small, Focused Agents

Factor 10 is less about “multi-agent” branding and more about preserving narrow responsibilities in the release system.

release_components = {
    "planner": "decide next release step",
    "deployer": "execute deployment only",
    "verifier": "run post-deploy checks only",
    "escalator": "page humans when risk or failure is high",
}

active_component = release_components["planner"]

In LangGraph these can become nodes or subgraphs; in Agno they can become separate agents, workflow steps, or team members without changing the external product surface.

Compact errors are runtime input, not just logs

If your deploy action fails because the artifact is missing or a health check times out, the worst thing you can do is bury that inside observability only. The next useful step is often shaped by the error itself.

The useful pattern is:

  • compact the error
  • append it to state
  • let the planner decide whether to retry, escalate, or stop

In other words: errors belong in both observability and context.
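The "let the planner decide" step does not have to be a model call. A deterministic policy over the compacted error events is often enough; this sketch assumes the same event-log shape as earlier examples, and the function name is illustrative:

```python
def decide_after_error(events: list[dict], max_retries: int = 2) -> str:
    """Deterministic policy: retry while the last error is retryable and
    we are under the retry budget; otherwise escalate to a human."""
    errors = [event for event in events if event["type"] == "error"]
    if not errors:
        return "continue"
    if len(errors) > max_retries:
        return "escalate"
    if errors[-1]["data"].get("retryable", False):
        return "retry"
    return "escalate"


events = [
    {"type": "deploy_attempted", "data": {}},
    {
        "type": "error",
        "data": {"tool": "deploy_release", "message": "artifact missing", "retryable": True},
    },
]

decision = decide_after_error(events)  # -> "retry"
```

Because the decision reads only serialized state, the same policy works as a LangGraph routing function or as deterministic glue around an Agno run.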

Small, focused agents still win

This section of the upstream repo is one of the least controversial and most correct. Bigger context, more tools, and broader scope do not just increase cost. They also degrade reliability and make failures harder to localize.

For the release scenario, a practical split is:

  • release planner
  • deployment executor
  • post-deploy verifier
  • incident escalator

You do not necessarily need four separately deployed agents. But you should preserve those boundaries in the design, whether as LangGraph subgraphs, Agno teams/workflows, or simple deterministic modules behind one user-facing interface.

A compact operational rule

If a tool can mutate production state, it should usually satisfy all three:

  1. The intent is typed.
  2. The action is approval-aware or at least policy-gated.
  3. The failure mode is serialized back into context in compact form.

That rule alone will improve many real systems more than chasing a fancier orchestration abstraction.
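All three parts of that rule fit in one small guard. This is a framework-agnostic sketch with illustrative names, not an API from LangGraph or Agno:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeployIntent:
    """Part 1: the intent is typed, not free text."""
    tag: str
    environment: str


def guarded_deploy(
    intent: DeployIntent,
    approved: bool,
    deploy_fn: Callable[[str, str], dict],
    events: list[dict],
) -> list[dict]:
    # Part 2: the action is policy-gated before anything mutates production.
    if intent.environment == "production" and not approved:
        return events + [{"type": "approval_required", "data": vars(intent)}]
    try:
        result = deploy_fn(intent.tag, intent.environment)
        return events + [{"type": "deploy_result", "data": result}]
    except Exception as exc:
        # Part 3: the failure mode is serialized back into context, compactly.
        compact = {"tool": "deploy", "message": str(exc)[:180]}
        return events + [{"type": "error", "data": compact}]


log = guarded_deploy(
    DeployIntent(tag="release-2026.03.31", environment="production"),
    approved=False,
    deploy_fn=lambda tag, env: {"status": "success"},
    events=[],
)
# log[-1]["type"] == "approval_required": the runtime paused instead of deploying.
```

The guard returns events rather than performing hidden side effects, so the surrounding runtime, whether a graph node or an agent tool wrapper, stays in charge of what happens next.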

Where LangGraph Fits the 12-Factor Philosophy Best

This section is intentionally editorial.

LangGraph is the better fit when the job is:

  • building a durable execution runtime
  • modeling explicit state transitions
  • preserving a strong distinction between model choice and runtime semantics
  • making pause/resume, replay, thread history, and branching first-class

The current docs support that reading directly:

  • persistence is built around threads and checkpoints
  • interrupts pause execution and resume with external input
  • durable execution keeps progress rather than replaying from zero
  • subgraphs let you isolate reusable or multi-agent components

That architecture is unusually aligned with the HumanLayer principles.

My stronger take is this:

If you read 12-factor-agents and feel energized by the idea of owning your runtime semantics, you probably want LangGraph more than a higher-level agent wrapper.

The cost, of course, is that LangGraph will let you be explicit about almost everything, which means you also get to own almost everything:

  • state design
  • graph boundaries
  • idempotency discipline
  • serialization choices
  • human approval UX
  • production APIs around the graph

That is power, but it is also real engineering work.

Where Agno/AgentOS Is More Practical Than Purist

Agno/AgentOS fits a different team shape.

The current docs are explicit that AgentOS is a production runtime and control plane with:

  • ready-to-use APIs
  • DB-backed sessions, memory, and traces
  • approvals and human-in-the-loop flows
  • SSE-friendly streaming
  • RBAC and request isolation

This matters because many teams do not actually want to construct the perfect explicit runtime from primitives. They want:

  • a strong session model
  • approval-aware tool execution
  • a control plane
  • production endpoints
  • operational visibility

Agno gives you much more of that out of the box.

This is where I disagree slightly with a purist reading of HumanLayer.

The repo is correct that teams often need deeper control than frameworks expose. But some teams overshoot in the other direction and rebuild orchestration infrastructure that was never their product’s moat.

Agno/AgentOS is often the right answer when:

  • you are Python-first
  • you want agent teams and workflows served as APIs quickly
  • you need sessions, approvals, and traces early
  • you are willing to accept a more opinionated runtime in exchange for shipping speed

I would summarize the difference like this:

  • LangGraph is better when your hardest problem is runtime semantics.
  • Agno/AgentOS is better when your hardest problem is operationalizing agent services.

That is not a small distinction.

Final Take

The most valuable thing about 12-factor-agents is not the number 12.

It is the insistence that reliable agent systems are mostly a question of:

  • structured boundaries
  • state design
  • pause/resume semantics
  • explicit control flow
  • narrow scope

That is the right battle.

If I had to compress the translation into one sentence, it would be this:

HumanLayer tells you to own the runtime; LangGraph helps you do that directly; Agno/AgentOS helps you ship a managed version of it.

For expert engineers, the decision is not “which framework is best?”

It is:

Do we want to build a runtime, or do we want to run one?

That is the architectural fork hiding underneath most agent-framework debates.

Sources