Most production agent failures are not really “model failures.” They are state failures.
The agent forgot the user. Two workers disagreed about what happened in the last step. A tool retried after a crash and duplicated a side effect. A chat product felt memory-less because the framework was stateless by default and nobody added persistence. Or the opposite happened: a team built a very stateful orchestration layer for a use case that only needed a simple stateless request-response loop.
This is why “stateful vs stateless agents” is not an academic debate. It is an architecture decision with direct consequences for cost, latency, reliability, debuggability, and team complexity.
I audited the official docs and changelogs for Vercel AI SDK, LangChain/LangGraph, Agno, and AgentOS on March 24, 2026 for this piece. The goal here is not to crown one framework. The goal is to help engineers, architects, and product managers decide where state should live in a real system.
TL;DR
- AI SDK is the most stateless-first of the three. It gives you a great tool loop, but you own persistence unless you explicitly add memory providers, memory tools, or a chat/message store.
- LangChain/LangGraph is the most stateful-orchestration-first. Threads, checkpoints, stores, interrupts, and durable execution are first-class concepts.
- Agno v2 + AgentOS takes a hybrid stance: agents, teams, and workflows are stateless runtime objects, but sessions, summaries, memory, and traces live in your database and are served through a production runtime.
- In practice, the strongest production pattern is usually: stateless workers, persisted state.
- If your app is mostly a web product with streaming chat, start with AI SDK. If you need resumable threads and durable execution, look hard at LangChain/LangGraph. If you want a Python-first, self-hosted agent runtime with sessions and APIs built in, Agno/AgentOS is compelling.
What You Will Learn Here
- What “stateful” and “stateless” actually mean in agent systems.
- Why the same product can be stateful to the user while remaining stateless at the runtime layer.
- How AI SDK, LangChain, and Agno/AgentOS each model state differently.
- What these differences mean for retries, resumability, horizontal scaling, and observability.
- Concrete implementation patterns with code examples and ASCII flows.
- A practical decision framework for engineers and PMs planning production systems.
The Research Audit: What the Docs Clearly Support
Here is the cleanest reading of the official docs.
1. AI SDK does not assume persistent memory by default
The AI SDK docs are explicit: without memory, every conversation starts fresh. The framework gives you a reusable ToolLoopAgent, tool calling, loop control, streaming, testing helpers, and UI integration, but long-lived memory is something you add deliberately.
The docs show three explicit memory approaches:
- provider-defined tools
- external memory providers
- custom memory tools
The UI docs also show message persistence as an application concern: store UIMessage[], reload them, validate them, and save them in onFinish.
My inference: AI SDK is best understood as a stateless agent loop toolkit plus optional persistence patterns, not as a built-in durable runtime.
2. LangChain/LangGraph treats thread state as a first-class primitive
LangChain’s current docs say short-term memory is part of the agent’s state and is persisted via a checkpointer. A thread ID identifies the conversation, and the runtime reloads that state on future invocations. The docs are also very explicit about durable execution, resumability, and the need to wrap side effects carefully for replay safety.
My inference: LangChain/LangGraph is the strongest fit when state is central to the product, not just an implementation detail.
3. Agno v2 made the runtime stateless, while keeping sessions and memory persistent
Agno’s v2 changelog is unusually clear here: agents, teams, and workflows are now fully stateless. At the same time, the docs say that once you configure a database, Agno automatically stores conversation history, session state, run metadata, and optionally tool calls and media. Session summaries, history management, and session IDs are all built-in concepts.
AgentOS then serves those agents and workflows as a production API with sessions, memory, knowledge, traces, RBAC, and background hooks.
My inference: Agno/AgentOS is architected around stateless runtime objects backed by persistent session infrastructure. That is a very production-friendly split.
First, Fix the Vocabulary
Teams often mix up three different meanings of “state.”
1. Model-call state
Every LLM call is fundamentally request-scoped: the model only knows what is inside the current prompt, the tools you pass, and any provider-side context features.
2. Conversation state
This is the running memory of a thread:
- prior messages
- tool results
- summaries
- user preferences
- workflow variables
3. Runtime state
This is what lets a system survive real production conditions:
- retries
- crashes
- restarts
- human approval pauses
- multi-worker execution
This distinction matters because a system can be:
- stateless at the runtime layer
- stateful at the conversation layer
- and still feel fully continuous to the user
That is actually the production shape I trust most.
The Architecture Pattern I Trust Most
```
User / API Client
            |
            v
+-------------------------+
|    Stateless worker     |
| API route / pod / task  |
+-----------+-------------+
            |
            | load thread/session/workflow state
            v
+-------------------------+
|   Shared state layer    |
|  Postgres / Redis / DB  |
| checkpoints / sessions  |
+-----------+-------------+
            |
            | run model + tools
            v
+-------------------------+
|   Save updated state    |
|  messages / summary /   |
|  checkpoints / metrics  |
+-------------------------+
```
That gives you:
- horizontal scaling without sticky sessions
- debuggable persistence
- better failure recovery
- cleaner separation between orchestration and infrastructure
The frameworks in this article mostly differ in how much of that architecture they give you out of the box.
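The split above can be sketched in a few lines of framework-agnostic TypeScript. Everything here is illustrative: `StateStore`, `MemoryStore`, and `handleTurn` are hypothetical names, the in-memory map stands in for Postgres or Redis, and the model/tool step is stubbed out.

```ts
// Hypothetical sketch of "stateless workers, persisted state".
// StateStore, MemoryStore, and handleTurn are illustrative names,
// and the model + tools step is a stub.

type ThreadState = { messages: string[] };

interface StateStore {
  load(threadId: string): Promise<ThreadState>;
  save(threadId: string, state: ThreadState): Promise<void>;
}

// In-memory stand-in for the shared state layer (Postgres / Redis).
class MemoryStore implements StateStore {
  private data = new Map<string, ThreadState>();
  async load(threadId: string): Promise<ThreadState> {
    return this.data.get(threadId) ?? { messages: [] };
  }
  async save(threadId: string, state: ThreadState): Promise<void> {
    this.data.set(threadId, state);
  }
}

// Any worker can serve any request: load state, run one turn, persist state.
async function handleTurn(
  store: StateStore,
  threadId: string,
  userMessage: string
): Promise<string> {
  const state = await store.load(threadId); // 1. load thread state
  const reply = `echo: ${userMessage}`;     // 2. run model + tools (stubbed)
  state.messages.push(userMessage, reply);  // 3. update conversation state
  await store.save(threadId, state);        // 4. persist before responding
  return reply;
}
```

Because no state lives on the worker between calls, any pod or serverless instance can pick up any thread, which is what makes horizontal scaling without sticky sessions work.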
The Three Frameworks, in One Table
| Framework | Default posture | Where state usually lives | Best fit |
|---|---|---|---|
| AI SDK | Stateless-first | Your app DB, message store, or external memory tool/provider | Web apps and chat products that already have a TypeScript app layer |
| LangChain / LangGraph | Stateful thread/workflow orchestration | Checkpointer + store + thread ID | Long-running, resumable, multi-step agents with durable execution |
| Agno / AgentOS | Stateless runtime, stateful sessions | Agno DB-backed sessions/memories + AgentOS runtime APIs | Python teams building self-hosted agent APIs and workflows |
Vercel AI SDK: Stateless First, State Is Your Job
The AI SDK gives you a very clean agent loop abstraction. That is its superpower.
You define a ToolLoopAgent once, wire in tools, set stopWhen, and use it anywhere in your app. But that does not automatically mean the agent remembers anything across requests.
The stateless default
This is the simplest shape:
```ts
import { ToolLoopAgent, stepCountIs, tool } from "ai";
import { z } from "zod";

const lookupOrder = tool({
  description: "Look up an order by ID",
  inputSchema: z.object({
    orderId: z.string(),
  }),
  execute: async ({ orderId }) => {
    return { orderId, status: "shipped" };
  },
});

const supportAgent = new ToolLoopAgent({
  model: "openai/gpt-5",
  instructions: "You are a support triage agent.",
  tools: { lookupOrder },
  stopWhen: stepCountIs(8),
});

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const result = await supportAgent.generate({ prompt });
  return Response.json({
    text: result.text,
    steps: result.steps,
  });
}
```
This is clean and perfectly useful. It is also stateless across requests. If the same user returns later, this agent knows nothing unless you rehydrate context yourself.
Turning AI SDK into a stateful product
The official persistence pattern is to store and reload chat messages explicitly:
```ts
import { openai } from "@ai-sdk/openai";
import {
  convertToModelMessages,
  streamText,
  type UIMessage,
} from "ai";
import { loadChat, saveChat } from "./chat-store";

export async function POST(req: Request) {
  const { message, id }: { message: UIMessage; id: string } = await req.json();

  const previousMessages = await loadChat(id);
  const messages = [...previousMessages, message];

  const result = streamText({
    model: openai("gpt-5-mini"),
    messages: await convertToModelMessages(messages),
  });

  return result.toUIMessageStreamResponse({
    originalMessages: messages,
    onFinish: ({ messages }) => {
      saveChat({ chatId: id, messages });
    },
  });
}
```
Now the product behaves statefully because the chat has continuity, but the runtime can still be stateless because any worker can load the prior UIMessage[] from storage.
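The route imports `loadChat` and `saveChat` from a `./chat-store` module that the docs leave as an application concern. Here is a minimal sketch of what that module might look like, assuming an in-memory map in place of your real database and a simplified message shape instead of the SDK's `UIMessage` type:

```ts
// Minimal in-memory sketch of the chat-store module the route imports.
// In production, back this with your database; the message type would be
// `UIMessage` from the `ai` package rather than this placeholder.

type StoredMessage = { id: string; role: "user" | "assistant"; parts: unknown[] };

const chats = new Map<string, StoredMessage[]>();

// Load the full message history for a chat, or an empty thread if new.
export async function loadChat(chatId: string): Promise<StoredMessage[]> {
  return chats.get(chatId) ?? [];
}

// Overwrite the stored history with the latest messages from onFinish.
export async function saveChat(args: {
  chatId: string;
  messages: StoredMessage[];
}): Promise<void> {
  chats.set(args.chatId, args.messages);
}
```

The key design choice is that the store, not the worker, is the system of record: `onFinish` always writes the full updated history, so the next request can land on any worker.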
AI SDK production read
```
Browser / app
      |
      v
  API route
      |
      +--> load messages from DB
      +--> run ToolLoopAgent / streamText
      +--> save updated messages
      |
      v
stream response
```
Where AI SDK shines
- You already have a Next.js or TypeScript app and want the agent loop embedded in your existing architecture.
- You want full control over what gets persisted and when.
- You want a framework that stays close to web app patterns instead of imposing a large runtime model.
Where AI SDK gets risky
- Teams sometimes assume “agent” implies “memory.” In AI SDK, it does not.
- If you bolt on memory too late, you end up with a lot of app-specific persistence code scattered across routes and components.
- If you need resumable human approval flows, replay-aware side effects, or workflow-level recovery, you will probably build quite a bit yourself or pair AI SDK with another workflow layer.
LangChain / LangGraph: Stateful Threads and Durable Execution
LangChain’s current architecture is different in spirit. The key abstraction is not just “a tool loop.” It is graph state over time.
The docs describe short-term memory as part of the agent’s state, persisted with a checkpointer. A thread becomes the durable identity of the conversation. That is a much stronger opinion about state than AI SDK takes.
Stateful short-term memory with a thread
```ts
import { createAgent } from "langchain";
import { MemorySaver } from "@langchain/langgraph";

const checkpointer = new MemorySaver();

const agent = createAgent({
  model: "openai:gpt-5",
  tools: [],
  checkpointer,
});

await agent.invoke(
  { messages: [{ role: "user", content: "Hi, I am Bob." }] },
  { configurable: { thread_id: "support-thread-1" } }
);

await agent.invoke(
  { messages: [{ role: "user", content: "What's my name?" }] },
  { configurable: { thread_id: "support-thread-1" } }
);
```
Reusing the same thread_id is not a convenience. It is the core state mechanism.
Production checkpointer
```ts
import { createAgent } from "langchain";
import { PostgresSaver } from "@langchain/langgraph-checkpoint-postgres";

const DB_URI =
  "postgresql://postgres:postgres@localhost:5442/postgres?sslmode=disable";
const checkpointer = PostgresSaver.fromConnString(DB_URI);

const agent = createAgent({
  model: "openai:gpt-5",
  tools: [],
  checkpointer,
});
```
The docs also separate short-term thread state from long-term memory via a store. That matters because “same thread” and “same user across threads” are not the same problem.
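The difference between the two scopes is easy to sketch without any framework code. This is an illustrative toy, not LangGraph's actual store API: thread-scoped state is keyed by thread ID and dies with the thread, while user-scoped memory is keyed by user ID and survives across threads.

```ts
// Toy illustration of thread scope vs user scope (not LangGraph's store API).
// `remember` is a hypothetical helper name.

const checkpoints = new Map<string, string[]>();                 // thread_id -> messages
const userMemories = new Map<string, Record<string, string>>();  // user_id -> facts

function remember(threadId: string, userId: string, message: string): void {
  // Short-term: belongs to this thread only.
  const thread = checkpoints.get(threadId) ?? [];
  thread.push(message);
  checkpoints.set(threadId, thread);

  // Long-term: extracted facts survive across threads for the same user.
  if (message.startsWith("My name is ")) {
    const facts = userMemories.get(userId) ?? {};
    facts.name = message.slice("My name is ".length);
    userMemories.set(userId, facts);
  }
}
```

A brand-new thread for the same user starts with an empty checkpoint, but the user-scoped facts are still there, which is exactly the "same user across threads" problem the store solves.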
Durable execution is the real production differentiator
This is where LangGraph moves beyond chat memory.
The durable execution docs explicitly say:
- checkpointing enables resume
- workflows should be deterministic and idempotent
- side effects should be wrapped carefully
- resumed workflows replay from a safe starting point, not from an exact line of code
That is exactly the kind of detail architects need to care about.
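One common way to make side effects replay-safe is an idempotency ledger: record the result of each effect under a stable (workflow, step) key, and on replay return the recorded result instead of re-executing. A hedged sketch, with a hypothetical `runOnce` helper and an in-memory ledger standing in for durable storage:

```ts
// Hedged sketch of replay-safe side effects via an idempotency ledger.
// `sendEmail`, `runOnce`, and the ledger are illustrative; in a real
// system the ledger would live in the same durable store as checkpoints.

const effectLedger = new Map<string, unknown>();
let emailsSent = 0;

// The side effect we must not duplicate on replay.
async function sendEmail(to: string): Promise<string> {
  emailsSent += 1;
  return `sent-to-${to}`;
}

// Run a side effect at most once per stable (workflow, step) key.
// On replay, return the recorded result instead of re-executing.
async function runOnce<T>(key: string, effect: () => Promise<T>): Promise<T> {
  if (effectLedger.has(key)) {
    return effectLedger.get(key) as T;
  }
  const result = await effect();
  effectLedger.set(key, result);
  return result;
}
```

This is the same discipline the durable execution docs are pointing at: the replay can safely re-enter the workflow, because each wrapped effect either executes for the first time or returns its recorded result.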
LangChain production read
```
Client
   |
   v
LangChain agent
   |
   +--> load thread state via checkpointer
   +--> run model step
   +--> run tool step(s)
   +--> checkpoint after step(s)
   |
   v
resume later with same thread_id
```
When LangChain is the right answer
- You need pause/resume, interrupts, or human-in-the-loop review.
- You care about durable execution and explicit replay semantics.
- The conversation thread itself is a product primitive, not just a UI convenience.
- You want memory, state, and workflow control to be part of the framework instead of custom app code.
Where teams get into trouble
- LangChain’s power can tempt teams into over-orchestrating simple use cases.
- Once you adopt thread state and replay semantics, you also inherit the need to reason carefully about idempotency and side effects.
- It is easy to confuse “we can persist everything” with “we should persist everything.” Long threads still need trimming, summaries, or deletion strategies.
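A simple compaction strategy for long threads is to keep the last N messages verbatim and fold everything older into a summary. The sketch below stubs the summarizer with a string; in a real system that step would be a model call, and `compactThread` is a hypothetical helper, not a framework API:

```ts
// Illustrative thread compaction: keep the last N messages verbatim and
// fold older ones into a summary. The summarizer is a stub here; in
// production it would be an LLM call.

type Msg = { role: string; content: string };

function compactThread(
  messages: Msg[],
  keepLast: number
): { summary: string | null; recent: Msg[] } {
  if (messages.length <= keepLast) {
    return { summary: null, recent: messages };
  }
  const older = messages.slice(0, messages.length - keepLast);
  const recent = messages.slice(messages.length - keepLast);
  // Stub summarizer: a real one would summarize `older` with a model.
  const summary = `Summary of ${older.length} earlier messages`;
  return { summary, recent };
}
```

Running this at checkpoint-write time keeps prompt size bounded while the full history can still live in the checkpointer for audit purposes.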
Agno + AgentOS: Stateless Runtime, Stateful Sessions
Agno’s recent architecture is one of the most interesting in the current ecosystem because it is very explicit about the split between runtime statelessness and persistent session state.
The v2 changelog says agents, teams, and workflows are now fully stateless. At the same time, the session docs say that once you provide a database, Agno automatically stores:
- conversation history
- session state
- run metadata
- optional tool calls
- optional media
That is a strong production pattern.
A stateful product with a stateless agent object
```python
from agno.agent import Agent
from agno.db.postgres import PostgresDb
from agno.models.openai import OpenAIResponses

db = PostgresDb(db_url="postgresql+psycopg://ai:ai@localhost:5532/ai")

support_agent = Agent(
    name="support-agent",
    model=OpenAIResponses(id="gpt-5.2"),
    db=db,
    add_history_to_context=True,
    num_history_runs=3,
    enable_session_summaries=True,
)

support_agent.run(
    "My name is Alice and I need help with order ORD-9001",
    session_id="ticket_9001",
    user_id="alice@example.com",
)

support_agent.run(
    "What order am I asking about?",
    session_id="ticket_9001",
    user_id="alice@example.com",
)
```
This feels stateful to the user because the same session_id rehydrates the conversation. But the runtime object itself is not the system of record.
Exposing it as a production API with AgentOS
```python
from agno.os import AgentOS

agent_os = AgentOS(
    agents=[support_agent],
    run_hooks_in_background=True,
)

app = agent_os.get_app()
```
Now you have a self-hosted runtime surface for agents, sessions, traces, and operational features.
Agno production read
```
Client / product
      |
      v
 AgentOS API
      |
      +--> load session from DB
      +--> run stateless agent/workflow
      +--> persist messages, state, metrics, summary
      |
      v
response + operational trace
```
The important nuance
Agno’s session caching docs explicitly say in-memory caching is for development and testing and is not recommended for production use.
That is a good sign. It reinforces the intended architecture:
- DB-backed persistence for truth
- stateless runtime for scale
- session features for continuity
AgentOS adds operational structure
AgentOS is not just a wrapper. The docs position it as the production runtime and control plane for multi-agent systems, with:
- API endpoints
- SSE-compatible streaming
- sessions, memory, knowledge, traces in your DB
- JWT/RBAC
- background hooks
One subtle but important production detail: background hooks are useful for logging, analytics, notifications, and non-critical processing, but the docs explicitly say they are not suitable for guardrails because they execute after the response is sent.
That is the kind of operational honesty I like seeing in framework docs.
When Agno/AgentOS is the right answer
- You want a Python-first agent platform with batteries included for sessions and runtime APIs.
- You want a framework that embraces stateless runtime objects while still giving you structured conversation continuity.
- You are building teams, workflows, or self-hosted internal agent services.
Where teams get into trouble
- If your use case is really just a simple chat route inside a web app, AgentOS can be more runtime than you need.
- If you rely on session caching instead of durable storage, you are working against the framework’s intended production model.
- You still need good session ID design, tenancy boundaries, and state compaction strategy.
The Real Comparison: Where Does State Live?
This is the question I would ask in every architecture review.
AI SDK
State mostly lives in:
- your frontend chat transport
- your API route
- your database
- optional memory tools/providers
The framework is not trying to be your durable runtime.
LangChain / LangGraph
State mostly lives in:
- thread checkpoints
- graph state
- long-term stores
- replayable workflow execution
The framework assumes state is central.
Agno / AgentOS
State mostly lives in:
- DB-backed sessions
- session history and summaries
- optional memory and knowledge
- AgentOS runtime APIs
The runtime layer stays stateless; the session layer is where continuity lives.
My Practical Decision Framework
If I were advising a team today, I would use this rule set.
Choose a stateless-first design when:
- each request can be answered from current input plus a small retrieved context
- retries do not need resume semantics
- your product is mainly a web app, not a workflow engine
- you want the simplest horizontal-scaling story
Choose a thread-stateful design when:
- the same conversation must evolve over many turns
- tool outputs affect later decisions
- human approvals pause and resume the same task
- you need auditability for what the agent knew at each step
Choose workflow durability when:
- tasks can run for minutes or hours
- failures and restarts are normal
- tool side effects are meaningful
- you need replay-safe orchestration, not just chat history
The Best Production Pattern Is Usually Hybrid
This is the punchline.
The mature production architecture is usually not:
- “everything stateless”
and it is usually not:
- “keep all state inside a sticky long-lived worker”
It is this:
Stateless compute
+
Durable shared state
+
Clear replay rules
+
Explicit session/thread IDs
That is why all three frameworks can work in serious systems, even though they start from different assumptions.
What I Would Recommend by Use Case
1. Customer support copilot inside a web app
Start with AI SDK if your team already lives in TypeScript and React. Keep the runtime stateless, persist chat messages yourself, and only add memory beyond the thread if the product truly needs it.
2. Research or operations agent that pauses, resumes, and calls many tools
Reach for LangChain/LangGraph if checkpointing, replay, and thread semantics are part of the core workflow.
3. Internal platform for multiple agent APIs and workflows
Use Agno + AgentOS if you want a Python-native runtime with built-in session management and production-serving concerns already modeled.
Final Take
The most useful question is not “Should my agent be stateful or stateless?”
The better question is:
Which parts of the system need memory, and which parts should stay disposable?
That leads to better architecture decisions.
AI SDK says: keep the loop simple and let the app own persistence.
LangChain says: model threads, checkpoints, and durable execution explicitly.
Agno/AgentOS says: keep runtime objects stateless, but persist sessions and serve them through a production API.
Those are all defensible choices. The right one depends on whether your main problem is:
- web app integration
- workflow durability
- or agent runtime operations
If you get that answer right, the state model usually follows.
Sources
- Vercel AI SDK, Building Agents
- Vercel AI SDK, Memory
- Vercel AI SDK, Chatbot Message Persistence
- Vercel Knowledge Base, How to build AI agents with Vercel and the AI SDK
- LangChain JS Docs, Agents
- LangChain JS Docs, Short-term memory
- LangGraph JS Docs, Memory
- LangGraph JS Docs, Persistence
- LangGraph Python Docs, Durable execution
- Agno v2.0 Changelog
- Agno, Persisting Sessions
- Agno, Session Management
- Agno, History Management
- Agno, Session Summaries
- AgentOS, What is AgentOS?
- AgentOS, Background Hooks