humanlayer/12-factor-agents is one of the few agent-engineering documents I would hand to senior engineers without apologizing first.
Not because it is perfectly complete, and not because every factor should be implemented literally, but because it is arguing about the right thing: runtime semantics, not prompt theater.
As of March 31, 2026, I audited the upstream repository and the current official LangGraph and Agno/AgentOS docs for this piece. This article is not a rewrite of the original guide. It is a translation layer for engineers who need to decide: which of these principles should become explicit architecture, and which framework makes that easier?
TL;DR
- `12-factor-agents` is best read as a production runtime manifesto, not a framework tutorial.
- The factors are not equally important. In practice, the center of gravity is:
- structured tool and human interfaces
- owning context and state
- pause/resume plus explicit control flow
- failure handling plus narrow agent scope
- LangGraph is usually the closer fit when you want the framework to expose durable state, checkpoints, interrupts, and graph-level control directly.
- Agno/AgentOS is usually the more practical fit when you want sessions, approvals, traces, APIs, and control-plane ergonomics without rebuilding that runtime surface yourself.
- My strongest editorial take: the HumanLayer philosophy is directionally right, but it also assumes your team is willing to own more orchestration detail than many product teams actually want to maintain.
What You Will Learn Here
- What `12-factor-agents` is actually saying once you strip away the branding and factor numbering.
- Which factors matter most in production systems, and which ones are better understood as extensions or style choices.
- How to map the key factors onto LangGraph concepts like threads, checkpoints, interrupts, and subgraphs.
- How to map the same ideas onto Agno/AgentOS concepts like sessions, summaries, approvals, tools, and the control plane.
- A practical release/deployment agent example in Python for both ecosystems.
- Where LangGraph is closer to the philosophy, and where Agno/AgentOS is simply easier to ship.
The Research Audit: What 12-Factor Agents Actually Says
The source-backed reading of the repository is fairly consistent.
First, the repo is explicitly skeptical of the default “LLM plus tools in a loop” story. The README argues that strong production agents are mostly software, with LLM decisions inserted at the right points, and that most teams chasing quality eventually need deeper control over prompts, flow, and state than frameworks expose by default.
Second, the repo treats tool calls as structured outputs, context as the real runtime, and control flow as an application concern. Factors 1, 3, 4, 5, 6, 7, 8, and 9 all orbit that same idea from different angles.
Third, the repo is heavily biased toward:
- serializable thread-like state
- explicit pause/resume semantics
- durable human approval points
- compact error feedback
- small agent scope
That bias is not subtle. It is the heart of the guide.
Finally, the repo does not really read like twelve equally weighted laws. Some factors are core architectural guidance. Others are more like implementation heuristics. Factor 12 literally says it is “mostly just for fun,” which is a useful clue for how seriously to operationalize each section.
My inference from the source material is this:
`12-factor-agents` is not really a manifesto for “agentic” software in general. It is a manifesto for durable, inspectable, interruption-friendly, human-aware agent runtimes.
That is exactly why it maps unusually well to LangGraph, and only partially to simpler agent wrappers.
A Better Mental Model: 12 Factors as 4 Runtime Concerns
Instead of walking factor-by-factor, I think the more useful translation is to regroup them around four runtime concerns.
```
Slack / API / Cron trigger
              |
              v
+-----------------------------+
| Planner / structured output |
| "what should happen next?"  |
+-------------+---------------+
              |
              v
+-----------------------------+
| State store / thread        |
| events, approvals, errors,  |
| summaries, session history  |
+------+-------+--------------+
       |       |
       |       +------------------+
       |                          |
       v                          v
+--------------+        +------------------+
| Tool action  |        | Human approval   |
| fetch/deploy |        | reject/approve   |
+------+-------+        +---------+--------+
       |                          |
       +------------+-------------+
                    |
                    v
            resume next step
```
| Runtime concern | Factors | LangGraph fit | Agno/AgentOS fit | Practical tradeoff |
|---|---|---|---|---|
| Structured tool and human interfaces | 1, 4, 7 | Strong, explicit, low-level | Strong, ergonomic, batteries included | LangGraph gives more raw control; Agno gives less ceremony |
| Context and state as runtime | 2, 3, 5, 12 | Very strong via thread state and checkpoints | Strong via sessions, summaries, DB-backed context | LangGraph is more explicit; Agno is more managed |
| Pause/resume and control flow | 6, 8, 11 | Excellent with interrupts and durable execution | Strong with confirmations, approvals, AgentOS APIs | LangGraph feels closer to the philosophy; Agno is easier to operate |
| Failure handling and narrow scope | 9, 10 | Strong if you model errors in state deliberately | Strong if you keep tools and teams narrow | Neither framework saves you from bad scope decisions |
That table is the whole article in compressed form.
1. Structured Tool and Human Interfaces
The source-backed principle across factors 1, 4, and 7 is simple:
- the model should emit structured intent
- deterministic code should decide how that intent is executed
- asking a human for approval or clarification is just another kind of structured intent
That is an underrated point.
Many frameworks still treat “tool calling” as a magical API feature. HumanLayer is arguing for a more useful view: a tool call is just serialized intent at the LLM/runtime boundary.
That gives you freedom to do things frameworks do not always model directly:
- turn tool intent into approval records
- translate one model output into several deterministic actions
- convert “contact a human” into a first-class step instead of unstructured assistant text
- inspect or reject a proposed tool before it executes
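The last point, inspecting or rejecting a proposed tool before it executes, is easy to sketch once intent is just serialized data. Here is a minimal, framework-free illustration; the policy rules and function names are hypothetical, not from HumanLayer, LangGraph, or Agno:

```python
# Hypothetical sketch: a deterministic policy gate over model-proposed tool intent.
# The RISKY set and gate_intent name are illustrative assumptions.
RISKY = {("deploy", "production"), ("rollback", "production")}


def gate_intent(intent: dict) -> dict:
    """Inspect serialized intent before execution; rewrite risky calls
    into approval records instead of executing them directly."""
    key = (intent.get("action"), intent.get("environment"))
    if key in RISKY:
        return {
            "intent": "request_human_input",
            "question": f"Approve {intent['action']} of {intent['tag']} to {intent['environment']}?",
            "original": intent,
        }
    return {"intent": "execute", "call": intent}


proposed = {"action": "deploy", "tag": "release-2026.03.31", "environment": "production"}
decision = gate_intent(proposed)
```

Because the tool call is just data at this point, the gate can live entirely in deterministic code, which is exactly the boundary the repo is arguing for.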
Why this matters in the release-agent scenario
Suppose a release agent gets a Slack message:
Deploy `release-2026.03.31` to production and page me if health checks fail.
The first useful question is not “which framework has tools?”
It is:
Can the system represent this as structured intent in a way that lets me inspect, pause, approve, deny, retry, and resume?
That is the 12-factor question.
Factor-by-factor examples
Factor 1: Natural Language to Tool Calls
In this framing, factor 1 means the model should turn a human release request into a typed intent object before any deploy logic runs.
```python
from typing import Literal

from pydantic import BaseModel


class ReleaseIntent(BaseModel):
    action: Literal["deploy", "verify", "rollback"]
    tag: str
    environment: Literal["staging", "production"]
    notify_on_failure: bool


# Assume `llm` is an instantiated LangChain chat model client.
intent = llm.with_structured_output(ReleaseIntent).invoke(
    "Deploy release-2026.03.31 to production and page me if health checks fail."
)
```
In LangGraph this intent usually becomes a state update; in Agno it can become an `output_schema` result or tool-selection input.
Factor 4: Tools Are Just Structured Outputs
Factor 4 says the model is deciding what should happen next, while deterministic code still owns how that action is executed.
```python
def dispatch_release_intent(intent: ReleaseIntent) -> str:
    if intent.action == "deploy":
        return "request_approval" if intent.environment == "production" else "deploy_now"
    if intent.action == "verify":
        return "run_checks"
    if intent.action == "rollback":
        return "open_incident"
    return "done"


next_step = dispatch_release_intent(intent)
```
That split maps naturally to LangGraph routing functions and to Agno code around tools, approvals, or workflow steps.
Factor 7: Contact Humans With Tool Calls
For factor 7, “ask a human” should be a typed step in the runtime, not a vague assistant sentence buried in chat history.
```python
def build_approval_request(intent: ReleaseIntent, requested_by: str) -> dict:
    return {
        "intent": "request_human_input",
        "question": f"Approve deploy of {intent.tag} to {intent.environment}?",
        "context": "Production release with customer-facing impact.",
        "options": ["approve", "reject"],
        "requested_by": requested_by,
    }


approval_event = build_approval_request(intent, requested_by="slack:alice")
```
In LangGraph this becomes an interrupt payload or event; in Agno/AgentOS it maps cleanly to confirmation-gated tools and approval records.
LangGraph reading
LangGraph fits this well because it already assumes that state updates and node transitions are the real runtime. The model can emit structured intent, but the graph decides what node runs next and whether to pause before a high-stakes side effect.
In practice, LangGraph encourages a healthy split:
- model decides the next semantic step
- nodes and edges decide execution semantics
- interrupts decide when humans re-enter the loop
Agno/AgentOS reading
Agno fits the same principle from a more productized angle:
- `output_schema` makes structured output first-class
- `@tool(requires_confirmation=True)` makes approval-gated execution first-class
- AgentOS persists and exposes the approval flow operationally
This is one of the first places where Agno/AgentOS feels more practical than purist. HumanLayer would like you to own the runtime semantics yourself. Agno says: fine, but most teams would rather have a supported approval system.
That is a real tradeoff, not a philosophical failure.
2. Context Is the Runtime
Factors 2, 3, 5, and 12 all point at the same architectural claim:
The best source of truth for an agent is often the serialized history of what has happened so far, not a separate hidden execution object.
This is where the repo gets genuinely interesting.
Most agent discussions stop at prompt engineering or memory. HumanLayer pushes further and says:
- own the prompt
- own the context format
- unify execution state and business state when possible
- think of the agent as a reducer over serialized events
That is a much more systems-oriented framing.
The practical payoff
If your release agent keeps a serialized event log like:
- trigger received
- release metadata loaded
- approval requested
- approval granted
- deploy attempted
- health checks failed
then several hard production features become simpler:
- replay
- debugging
- UI rendering
- forking
- resumability
- summaries and compaction
Factor-by-factor examples
Factor 2: Own Your Prompts
Factor 2 means the planner prompt should be explicit application code, not a hidden bundle of “role” and “goal” defaults.
```python
def build_release_planner_prompt(events: list[dict], environment: str) -> str:
    history = "\n".join(f"- {event['type']}: {event['data']}" for event in events[-6:])
    return f"""You are a release planner for {environment}.

Rules:
- Never deploy to production without approval.
- Prefer verify before declaring success.

History:
{history}
"""
```
That pattern works the same way in both frameworks because prompt ownership sits above the orchestration layer.
Factor 3: Own Your Context Window
Factor 3 says you should serialize only the events that matter instead of dumping every prior token back into the model.
```python
def event_to_prompt(event: dict) -> str:
    return f"<{event['type']}>\n{event['data']}\n</{event['type']}>"


def thread_to_prompt(events: list[dict]) -> str:
    keep = {"trigger", "release_context", "approval_response", "error"}
    selected = [event for event in events if event["type"] in keep]
    return "\n\n".join(event_to_prompt(event) for event in selected)
```
LangGraph encourages this through explicit state shaping; Agno supports it through custom context assembly plus summaries and history controls.
Factor 5: Unify Execution State and Business State
Factor 5 argues that one event log can often explain both “what happened” and “what the runtime should do next.”
```python
events = [
    {"type": "trigger", "data": "deploy release-2026.03.31"},
    {"type": "release_context", "data": {"tag": "release-2026.03.31"}},
    {"type": "approval_requested", "data": {"channel": "slack"}},
    {"type": "approval_response", "data": {"approved": True}},
]

waiting_for_approval = events[-1]["type"] == "approval_requested"
ready_to_deploy = events[-1]["type"] == "approval_response" and events[-1]["data"]["approved"]
```
LangGraph exposes this style more directly, while Agno tends to wrap more of it inside session and run abstractions.
Factor 12: Make Your Agent a Stateless Reducer
Factor 12 is the most functional reading of the same idea: derive current release state by reducing prior events.
```python
def reduce_release_events(events: list[dict]) -> dict:
    state = {"approved": False, "deployed": False, "last_error": None}
    for event in events:
        if event["type"] == "approval_response":
            state["approved"] = event["data"]["approved"]
        elif event["type"] == "deploy_result":
            state["deployed"] = event["data"]["status"] == "success"
        elif event["type"] == "error":
            state["last_error"] = event["data"]["message"]
    return state
```
You do not have to implement your system this way, but LangGraph’s state-first model is closer to it than most agent wrappers.
LangGraph reading
This is the strongest LangGraph fit in the entire article.
The persistence docs are explicit that LangGraph saves graph state as checkpoints organized into threads, and that these checkpoints enable human-in-the-loop workflows, memory, time travel, and fault-tolerant execution. The durable execution docs are also explicit that with a checkpointer in place, a workflow can resume from saved progress instead of replaying from zero.
That is very close to the HumanLayer worldview.
LangGraph does not force you into a hidden “agent object with vibes.” It encourages you to model state directly, checkpoint it, reload it by thread_id, and keep progressing from there.
Agno/AgentOS reading
Agno lands on a different but still useful abstraction.
The current docs for Agent.run() describe the runtime building context from system instructions, user input, chat history, user memories, session state, and other relevant inputs. Session summaries then compress long histories into shorter running summaries. AgentOS adds DB-backed sessions, memory, traces, and APIs on top of that.
So Agno absolutely supports the underlying runtime idea. It just packages more of it behind session-centric APIs.
This is the key difference:
- LangGraph says: state is the graph runtime; model it explicitly.
- Agno says: state is the agent/session runtime; we will help manage it.
If you like explicit orchestration, LangGraph feels cleaner. If you want a production service quickly, Agno is often the faster move.
3. Pause/Resume Beats Magical Autonomy
This is where factors 6, 8, and 11 become the real production story.
The source-backed argument is straightforward:
- agents should launch from simple APIs
- they should pause when a human or long-running dependency is needed
- they should resume later from an external trigger
- humans should be reachable through the same runtime surface, not bolted on as an afterthought
This matters more than most “multi-agent” conversations.
A release agent that can choose tools but cannot pause before deploy, wait for approval, and resume later without losing its place is not production-grade. It is a demo with credentials.
Factor-by-factor examples
Factor 6: Launch/Pause/Resume With Simple APIs
Factor 6 means the outside world should be able to start and resume the runtime with small, predictable APIs.
```python
from langgraph.types import Command

# Assume `graph` is a compiled LangGraph graph with a checkpointer attached.

def start_release_run(message: str, thread_id: str) -> dict:
    state = {"events": [{"type": "trigger", "data": message}]}
    return graph.invoke(state, config={"configurable": {"thread_id": thread_id}})


def resume_release_run(thread_id: str, approved: bool, approver: str) -> dict:
    return graph.invoke(
        Command(resume={"approved": approved, "approved_by": approver}),
        config={"configurable": {"thread_id": thread_id}},
    )
```
In LangGraph this is literally a thread plus `Command(resume=...)`; in Agno the equivalent is `session_id`, paused runs, and `continue_run()`.
Factor 8: Own Your Control Flow
Factor 8 is the insistence that routing rules should remain visible deterministic code, not opaque framework behavior.
```python
def route_release_step(state: dict) -> str:
    if state.get("last_error"):
        return "escalate"
    if not state.get("context_loaded"):
        return "collect_context"
    if state["environment"] == "production" and not state.get("approved"):
        return "approval_gate"
    if not state.get("deployed"):
        return "deploy"
    return "verify"
```
This is idiomatic LangGraph and still a useful mental model when wrapping Agno tools or workflow steps.
Factor 11: Trigger From Anywhere
Factor 11 means Slack, cron, and API calls should normalize into the same internal event format before the agent loop sees them.
```python
def normalize_trigger(source: str, payload: dict) -> dict:
    return {
        "type": "trigger",
        "data": {
            "source": source,
            "requested_by": payload.get("user", source),
            "message": payload["message"],
            "thread_key": payload.get("thread_id") or payload.get("session_id"),
        },
    }
```
That keeps channel-specific glue outside the core runtime whether you are using LangGraph threads or AgentOS sessions.
LangGraph example: explicit state, explicit interrupt, explicit resume
This example stays intentionally close to the 12-factor mindset: the model emits structured intent, deterministic code owns execution, and approval is a real pause point.
```python
from __future__ import annotations

from typing import Any, Literal, NotRequired, TypedDict

from langgraph.checkpoint.memory import InMemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.types import Command, interrupt


class ReleaseDecision(TypedDict):
    intent: Literal[
        "collect_release_data",
        "request_approval",
        "deploy_release",
        "post_deploy_check",
        "done",
    ]
    reason: str


class ReleaseState(TypedDict):
    release_id: str
    tag: str
    environment: Literal["staging", "production"]
    requested_by: str
    events: list[dict[str, Any]]
    next_action: NotRequired[ReleaseDecision]
    approval: NotRequired[dict[str, Any]]
    deployment_result: NotRequired[dict[str, Any]]
    consecutive_errors: int


def fetch_release_context(release_id: str) -> dict[str, Any]:
    # Deterministic tool boundary: this is not model logic.
    return {
        "release_id": release_id,
        "tag": "release-2026.03.31",
        "change_ticket": "CHG-8471",
        "smoke_tests_passed": True,
    }


def deploy_release(tag: str, environment: str) -> dict[str, Any]:
    # Replace with your deploy system.
    return {"tag": tag, "environment": environment, "status": "success"}


def build_prompt(events: list[dict[str, Any]]) -> str:
    rendered = "\n".join(f"- {event['type']}: {event['data']}" for event in events)
    return f"""You coordinate software releases.

Here is the execution history so far:
{rendered}

Return the next structured step only.
"""


def planner_node(state: ReleaseState) -> dict[str, Any]:
    prompt = build_prompt(state["events"])
    # Assume `llm` is an instantiated LangChain chat model client.
    decision: ReleaseDecision = llm.with_structured_output(ReleaseDecision).invoke(prompt)
    return {
        "next_action": decision,
        "events": state["events"] + [{"type": "planner_decision", "data": decision}],
    }


def collect_context_node(state: ReleaseState) -> dict[str, Any]:
    context = fetch_release_context(state["release_id"])
    return {
        "events": state["events"] + [{"type": "release_context", "data": context}],
    }


def approval_gate_node(state: ReleaseState) -> dict[str, Any]:
    approval = interrupt(
        {
            "kind": "deploy_approval",
            "release_id": state["release_id"],
            "tag": state["tag"],
            "environment": state["environment"],
            "requested_by": state["requested_by"],
            "reason": state["next_action"]["reason"],
        }
    )
    return {
        "approval": approval,
        "events": state["events"] + [{"type": "approval_response", "data": approval}],
    }


def deploy_node(state: ReleaseState) -> dict[str, Any]:
    try:
        result = deploy_release(state["tag"], state["environment"])
        return {
            "deployment_result": result,
            "consecutive_errors": 0,
            "events": state["events"] + [{"type": "deploy_result", "data": result}],
        }
    except Exception as exc:
        compact_error = {
            "tool": "deploy_release",
            "message": str(exc)[:240],
        }
        return {
            "consecutive_errors": state["consecutive_errors"] + 1,
            "events": state["events"] + [{"type": "error", "data": compact_error}],
        }


def verify_node(state: ReleaseState) -> dict[str, Any]:
    verification = {
        "environment": state["environment"],
        "healthy": True,
    }
    return {
        "events": state["events"] + [{"type": "post_deploy_check_result", "data": verification}],
    }


def route_from_planner(state: ReleaseState) -> str:
    intent = state["next_action"]["intent"]
    if intent == "collect_release_data":
        return "collect_context"
    if intent == "request_approval":
        return "approval_gate"
    if intent == "deploy_release":
        return "deploy"
    if intent == "post_deploy_check":
        return "verify"
    return END


builder = StateGraph(ReleaseState)
builder.add_node("planner", planner_node)
builder.add_node("collect_context", collect_context_node)
builder.add_node("approval_gate", approval_gate_node)
builder.add_node("deploy", deploy_node)
builder.add_node("verify", verify_node)

builder.add_edge(START, "planner")
builder.add_conditional_edges("planner", route_from_planner)
builder.add_edge("collect_context", "planner")
builder.add_edge("approval_gate", "planner")
builder.add_edge("deploy", "planner")
builder.add_edge("verify", "planner")

# Demo-only saver; use a durable checkpointer in production.
graph = builder.compile(checkpointer=InMemorySaver())

config = {"configurable": {"thread_id": "release-847"}}
initial_state: ReleaseState = {
    "release_id": "rel_847",
    "tag": "release-2026.03.31",
    "environment": "production",
    "requested_by": "slack:alice",
    "events": [
        {
            "type": "trigger",
            "data": {
                "channel": "slack",
                "message": "Deploy release-2026.03.31 to production",
            },
        }
    ],
    "consecutive_errors": 0,
}

# First run pauses at approval_gate_node and persists thread state for this in-process demo.
graph.invoke(initial_state, config=config)

# Later, in the same process; with a durable checkpointer this can also
# resume from a webhook or admin UI:
graph.invoke(
    Command(resume={"approved": True, "approved_by": "sre-oncall"}),
    config=config,
)
```
Why this example matters:
- shared typed state is explicit
- the node/tool boundary is explicit
- approval is a real runtime pause, not assistant prose
- compact errors are just more serialized context
- `thread_id` is the durable pointer
This is very close to the HumanLayer philosophy.
Agno/AgentOS example: session-centric runtime with built-in approvals
Now look at the same scenario through Agno/AgentOS.
```python
from typing import Literal

from agno.agent import Agent
from agno.db.postgres import PostgresDb
from agno.models.openai import OpenAIResponses
from agno.os import AgentOS
from agno.tools import tool
from pydantic import BaseModel, Field

db = PostgresDb(
    db_url="postgresql+psycopg://ai:ai@localhost:5532/ai"
)


class ReleaseDecision(BaseModel):
    action: Literal["collect_context", "wait_for_approval", "deploy", "verify", "done"]
    summary: str
    tag: str
    environment: Literal["staging", "production"]
    risk: Literal["low", "medium", "high"]
    rollback_hint: str | None = Field(default=None)


@tool
def fetch_release_context(release_id: str) -> dict:
    return {
        "release_id": release_id,
        "change_ticket": "CHG-8471",
        "smoke_tests_passed": True,
        "approver_group": "sre-oncall",
    }


@tool(requires_confirmation=True)
def deploy_release(tag: str, environment: str) -> dict:
    return {"tag": tag, "environment": environment, "status": "success"}


@tool
def run_post_deploy_checks(environment: str) -> dict:
    return {"environment": environment, "healthy": True}


release_agent = Agent(
    id="release-coordinator",
    name="Release Coordinator",
    model=OpenAIResponses(id="gpt-5.2"),
    db=db,
    tools=[fetch_release_context, deploy_release, run_post_deploy_checks],
    output_schema=ReleaseDecision,
    instructions=[
        "Coordinate releases conservatively.",
        "Always gather release context before any production action.",
        "Treat production deploys as high risk and require confirmation.",
        "After deployment, run verification before declaring success.",
    ],
    add_history_to_context=True,
    enable_session_summaries=True,
    num_history_runs=3,
    markdown=True,
)

agent_os = AgentOS(
    id="release-agent-os",
    agents=[release_agent],
    db=db,
)
app = agent_os.get_app()

run_response = release_agent.run(
    "Deploy rel_847 to production and verify it afterward.",
    user_id="slack:alice",
    session_id="release-847",
)

if run_response.is_paused:
    for requirement in run_response.active_requirements:
        print(requirement.tool.tool_name, requirement.tool.tool_args)
        requirement.confirmed = True

    final_response = release_agent.continue_run(
        run_id=run_response.run_id,
        requirements=run_response.requirements,
    )
```
And if you serve the agent through AgentOS, the runtime story becomes even more operational:
```bash
# Request enters through the AgentOS API with a stable session_id
curl -X POST http://localhost:7777/agents/release-coordinator/runs \
  -F "message=Deploy rel_847 to production" \
  -F "user_id=slack:alice" \
  -F "session_id=release-847"

# If deploy_release requires confirmation, the run pauses.
# An admin can resolve the approval in the AgentOS control plane
# or through the approvals API, then the run resumes.
```
Why this example matters:
- `session_id` is the durable conversation handle
- `output_schema` gives you typed decisions
- `requires_confirmation=True` gives you built-in pause points for risky tools
- session summaries help keep context from growing without bound
- AgentOS adds APIs, approvals, traces, and an operational surface around the agent
This is not as raw as LangGraph, but it is often a more realistic fit for teams that need the runtime to be a product, not a bespoke orchestration project.
4. Design for Failure and Narrow Scope
Factors 9 and 10 are less glamorous, but they are where many agent systems actually succeed or fail.
HumanLayer’s argument is:
- feed compact, useful error state back into the runtime
- do not let one agent sprawl into a 40-tool general-purpose blob
That is exactly right.
Factor-by-factor examples
Factor 9: Compact Errors Into Context Window
Factor 9 says failures should be compressed into runtime-readable state so the next step can adapt instead of starting blind.
```python
def record_deploy_error(events: list[dict], exc: Exception, retries: int) -> tuple[list[dict], int]:
    compact_error = {
        "tool": "deploy_release",
        "message": str(exc)[:180],
        "retryable": retries < 2,
    }
    return (
        events + [{"type": "error", "data": compact_error}],
        retries + 1,
    )
```
This pattern fits both frameworks: LangGraph via state updates, Agno via appended context plus a resumed run.
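To close the loop, the compact record has to actually change routing. Here is a small, framework-free sketch of a planner rule that consumes that kind of error record; the function name and retry threshold are illustrative assumptions:

```python
# Hypothetical routing sketch: read compact error records from the event log
# and decide retry vs escalate deterministically.
def route_after_error(events: list[dict], max_retries: int = 2) -> str:
    errors = [e for e in events if e["type"] == "error"]
    if not errors:
        return "continue"
    last = errors[-1]["data"]
    if last.get("retryable") and len(errors) <= max_retries:
        return "retry_deploy"
    return "escalate"


events = [
    {"type": "trigger", "data": "deploy"},
    {"type": "error", "data": {"tool": "deploy_release", "message": "artifact missing", "retryable": True}},
]
next_step = route_after_error(events)
```

Because the error lives in serialized state rather than only in logs, this decision is replayable and inspectable like every other step.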
Factor 10: Small, Focused Agents
Factor 10 is less about “multi-agent” branding and more about preserving narrow responsibilities in the release system.
```python
release_components = {
    "planner": "decide next release step",
    "deployer": "execute deployment only",
    "verifier": "run post-deploy checks only",
    "escalator": "page humans when risk or failure is high",
}

active_component = release_components["planner"]
```
In LangGraph these can become nodes or subgraphs; in Agno they can become separate agents, workflow steps, or team members without changing the external product surface.
Compact errors are runtime input, not just logs
If your deploy action fails because the artifact is missing or a health check times out, the worst thing you can do is bury that inside observability only. The next useful step is often shaped by the error itself.
The useful pattern is:
- compact the error
- append it to state
- let the planner decide whether to retry, escalate, or stop
In other words: errors belong in both observability and context.
Small, focused agents still win
This section of the upstream repo is one of the least controversial and most correct. Bigger context, more tools, and broader scope do not just increase cost. They also degrade reliability and make failures harder to localize.
For the release scenario, a practical split is:
- release planner
- deployment executor
- post-deploy verifier
- incident escalator
You do not necessarily need four separately deployed agents. But you should preserve those boundaries in the design, whether as LangGraph subgraphs, Agno teams/workflows, or simple deterministic modules behind one user-facing interface.
A compact operational rule
If a tool can mutate production state, it should usually satisfy all three:
- The intent is typed.
- The action is approval-aware or at least policy-gated.
- The failure mode is serialized back into context in compact form.
That rule alone will improve many real systems more than chasing a fancier orchestration abstraction.
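The three conditions can be enforced at a single boundary. A framework-free sketch, combining the article's patterns; the types, the policy check, and the stand-in deploy call are all illustrative assumptions, not a real library API:

```python
# Illustrative enforcement of the three-part rule for state-mutating tools.
from dataclasses import dataclass


@dataclass
class DeployIntent:  # 1. the intent is typed
    tag: str
    environment: str


def guarded_deploy(intent: DeployIntent, approved: bool, events: list[dict]) -> list[dict]:
    # 2. the action is policy-gated: production mutations require approval
    if intent.environment == "production" and not approved:
        return events + [{"type": "approval_requested", "data": {"tag": intent.tag}}]
    try:
        result = {"tag": intent.tag, "status": "success"}  # stand-in for the real deploy call
        return events + [{"type": "deploy_result", "data": result}]
    except Exception as exc:
        # 3. the failure mode is serialized back into context in compact form
        return events + [{"type": "error", "data": {"tool": "deploy", "message": str(exc)[:180]}}]


log = guarded_deploy(DeployIntent("release-2026.03.31", "production"), approved=False, events=[])
```

Everything the guard does lands in the event log, so approval, success, and failure all become ordinary serialized context for the next planner step.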
Where LangGraph Fits the 12-Factor Philosophy Best
This section is intentionally editorial.
LangGraph is the better fit when the job is:
- building a durable execution runtime
- modeling explicit state transitions
- preserving a strong distinction between model choice and runtime semantics
- making pause/resume, replay, thread history, and branching first-class
The current docs support that reading directly:
- persistence is built around threads and checkpoints
- interrupts pause execution and resume with external input
- durable execution keeps progress rather than replaying from zero
- subgraphs let you isolate reusable or multi-agent components
That architecture is unusually aligned with the HumanLayer principles.
My stronger take is this:
If you read `12-factor-agents` and feel energized by the idea of owning your runtime semantics, you probably want LangGraph more than a higher-level agent wrapper.
The cost, of course, is that LangGraph will let you be explicit about almost everything, which means you also get to own almost everything:
- state design
- graph boundaries
- idempotency discipline
- serialization choices
- human approval UX
- production APIs around the graph
That is power, but it is also real engineering work.
Where Agno/AgentOS Is More Practical Than Purist
Agno/AgentOS fits a different team shape.
The current docs are explicit that AgentOS is a production runtime and control plane with:
- ready-to-use APIs
- DB-backed sessions, memory, and traces
- approvals and human-in-the-loop flows
- SSE-friendly streaming
- RBAC and request isolation
This matters because many teams do not actually want to construct the perfect explicit runtime from primitives. They want:
- a strong session model
- approval-aware tool execution
- a control plane
- production endpoints
- operational visibility
Agno gives you much more of that out of the box.
This is where I disagree slightly with a purist reading of HumanLayer.
The repo is correct that teams often need deeper control than frameworks expose. But some teams overshoot in the other direction and rebuild orchestration infrastructure that was never their product’s moat.
Agno/AgentOS is often the right answer when:
- you are Python-first
- you want agent teams and workflows served as APIs quickly
- you need sessions, approvals, and traces early
- you are willing to accept a more opinionated runtime in exchange for shipping speed
I would summarize the difference like this:
- LangGraph is better when your hardest problem is runtime semantics.
- Agno/AgentOS is better when your hardest problem is operationalizing agent services.
That is not a small distinction.
Final Take
The most valuable thing about 12-factor-agents is not the number 12.
It is the insistence that reliable agent systems are mostly a question of:
- structured boundaries
- state design
- pause/resume semantics
- explicit control flow
- narrow scope
That is the right battle.
If I had to compress the translation into one sentence, it would be this:
HumanLayer tells you to own the runtime; LangGraph helps you do that directly; Agno/AgentOS helps you ship a managed version of it.
For expert engineers, the decision is not “which framework is best?”
It is:
Do we want to build a runtime, or do we want to run one?
That is the architectural fork hiding underneath most agent-framework debates.
Sources
- HumanLayer: 12-Factor Agents README
- HumanLayer: Factor 1 - Natural Language to Tool Calls
- HumanLayer: Factor 2 - Own your prompts
- HumanLayer: Factor 3 - Own your context window
- HumanLayer: Factor 4 - Tools are just structured outputs
- HumanLayer: Factor 5 - Unify execution state and business state
- HumanLayer: Factor 6 - Launch/Pause/Resume with simple APIs
- HumanLayer: Factor 7 - Contact humans with tool calls
- HumanLayer: Factor 8 - Own your control flow
- HumanLayer: Factor 9 - Compact errors into context window
- HumanLayer: Factor 10 - Small, focused agents
- HumanLayer: Factor 11 - Trigger from anywhere, meet users where they are
- HumanLayer: Factor 12 - Make your agent a stateless reducer
- LangGraph Docs: Persistence
- LangGraph Docs: Durable execution
- LangGraph Docs: Interrupts
- LangGraph Docs: Subgraphs
- Agno Docs: Running Agents
- Agno Docs: Session Summaries
- Agno Docs: Teams Overview
- Agno Docs: Workflows Overview
- Agno Docs: User Confirmation
- AgentOS Docs: What is AgentOS?
- AgentOS Docs: Control Plane
- AgentOS Docs: Approvals
- AgentOS Docs: Human-in-the-Loop Example