Modern Agent Engineering

Stateful vs Stateless Agents in Production: AI SDK, LangChain, and Agno/AgentOS

A source-audited, architecture-first guide to deciding where agent state should live in production, with code examples in Vercel AI SDK, LangChain, and Agno/AgentOS.


Most production agent failures are not really “model failures.” They are state failures.

The agent forgot the user. Two workers disagreed about what happened in the last step. A tool retried after a crash and duplicated a side effect. A chat product felt memory-less because the framework was stateless by default and nobody added persistence. Or the opposite happened: a team built a very stateful orchestration layer for a use case that only needed a simple stateless request-response loop.

This is why “stateful vs stateless agents” is not an academic debate. It is an architecture decision with direct consequences for cost, latency, reliability, debuggability, and team complexity.

I audited the official docs and changelogs for Vercel AI SDK, LangChain/LangGraph, Agno, and AgentOS on March 24, 2026 for this piece. The goal here is not to crown one framework. The goal is to help engineers, architects, and PMs decide where state should live in a real system.

TL;DR

  • AI SDK is the most stateless-first of the three. It gives you a great tool loop, but you own persistence unless you explicitly add memory providers, memory tools, or a chat/message store.
  • LangChain/LangGraph is the most stateful-orchestration-first. Threads, checkpoints, stores, interrupts, and durable execution are first-class concepts.
  • Agno v2 + AgentOS takes a hybrid stance: agents, teams, and workflows are stateless runtime objects, but sessions, summaries, memory, and traces live in your database and are served through a production runtime.
  • In practice, the strongest production pattern is usually: stateless workers, persisted state.
  • If your app is mostly a web product with streaming chat, start with AI SDK. If you need resumable threads and durable execution, look hard at LangChain/LangGraph. If you want a Python-first, self-hosted agent runtime with sessions and APIs built in, Agno/AgentOS is compelling.

What You Will Learn Here

  • What “stateful” and “stateless” actually mean in agent systems.
  • Why the same product can be stateful to the user while remaining stateless at the runtime layer.
  • How AI SDK, LangChain, and Agno/AgentOS each model state differently.
  • What these differences mean for retries, resumability, horizontal scaling, and observability.
  • Concrete implementation patterns with code examples and ASCII flows.
  • A practical decision framework for engineers and PMs planning production systems.

The Research Audit: What the Docs Clearly Support

Here is the cleanest reading of the official docs.

1. AI SDK does not assume persistent memory by default

The AI SDK docs are explicit: without memory, every conversation starts fresh. The framework gives you a reusable ToolLoopAgent, tool calling, loop control, streaming, testing helpers, and UI integration, but long-lived memory is something you add deliberately.

The docs show three explicit memory approaches:

  • provider-defined tools
  • external memory providers
  • custom memory tools

The UI docs also show message persistence as an application concern: store UIMessage[], reload them, validate them, and save them in onFinish.

My inference: AI SDK is best understood as a stateless agent loop toolkit plus optional persistence patterns, not as a built-in durable runtime.

2. LangChain/LangGraph treats thread state as a first-class primitive

LangChain’s current docs say short-term memory is part of the agent’s state and is persisted via a checkpointer. A thread ID identifies the conversation, and the runtime reloads that state on future invocations. The docs are also very explicit about durable execution, resumability, and the need to wrap side effects carefully for replay safety.

My inference: LangChain/LangGraph is the strongest fit when state is central to the product, not just an implementation detail.

3. Agno v2 made the runtime stateless, while keeping sessions and memory persistent

Agno’s v2 changelog is unusually clear here: agents, teams, and workflows are now fully stateless. At the same time, the docs say that once you configure a database, Agno automatically stores conversation history, session state, run metadata, and optionally tool calls and media. Session summaries, history management, and session IDs are all built-in concepts.

AgentOS then serves those agents and workflows as a production API with sessions, memory, knowledge, traces, RBAC, and background hooks.

My inference: Agno/AgentOS is architected around stateless runtime objects backed by persistent session infrastructure. That is a very production-friendly split.

First, Fix the Vocabulary

Teams often mix up three different meanings of “state.”

1. Model-call state

Every LLM call is fundamentally request-based. The model only knows what is inside the current prompt, tools, and provider-side context features.

2. Conversation state

This is the running memory of a thread:

  • prior messages
  • tool results
  • summaries
  • user preferences
  • workflow variables

3. Runtime state

This is what lets a system survive real production conditions:

  • retries
  • crashes
  • restarts
  • human approval pauses
  • multi-worker execution

This distinction matters because a system can be:

  • stateless at the runtime layer
  • stateful at the conversation layer
  • and still feel fully continuous to the user

That is actually the production shape I trust most.

The Architecture Pattern I Trust Most

User / API Client
      |
      v
+-------------------------+
| Stateless worker        |
| API route / pod / task  |
+-----------+-------------+
            |
            | load thread/session/workflow state
            v
+-------------------------+
| Shared state layer      |
| Postgres / Redis / DB   |
| checkpoints / sessions  |
+-----------+-------------+
            |
            | run model + tools
            v
+-------------------------+
| Save updated state      |
| messages / summary /    |
| checkpoints / metrics   |
+-------------------------+

That gives you:

  • horizontal scaling without sticky sessions
  • debuggable persistence
  • better failure recovery
  • cleaner separation between orchestration and infrastructure

The frameworks in this article mostly differ in how much of that architecture they give you out of the box.
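Stripped of any framework, that load-run-save loop fits in a few lines. Everything here is illustrative: the `Map` stands in for Postgres or Redis, `callModel` is a placeholder for a real model call, and the function names are my own, not any framework's API.

```typescript
// Illustrative "stateless worker, persisted state" loop.
// The Map stands in for a shared database; the worker itself holds no state.

type Message = { role: "user" | "assistant"; content: string };

const db = new Map<string, Message[]>(); // stand-in for durable storage

async function loadThread(threadId: string): Promise<Message[]> {
  return db.get(threadId) ?? [];
}

async function saveThread(threadId: string, messages: Message[]): Promise<void> {
  db.set(threadId, messages);
}

// Placeholder model call: all context arrives as input, so any worker can run it.
async function callModel(messages: Message[]): Promise<string> {
  return `reply #${messages.length}`;
}

export async function handleTurn(threadId: string, userText: string): Promise<string> {
  const history = await loadThread(threadId); // 1. load thread state
  const messages = [...history, { role: "user" as const, content: userText }];
  const reply = await callModel(messages); // 2. run model + tools
  await saveThread(threadId, [
    ...messages,
    { role: "assistant" as const, content: reply },
  ]); // 3. save updated state
  return reply;
}
```

Because every turn starts by loading state and ends by saving it, the worker that handles turn two does not need to be the worker that handled turn one.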

The Three Frameworks, in One Table

| Framework | Default posture | Where state usually lives | Best fit |
| --- | --- | --- | --- |
| AI SDK | Stateless-first | Your app DB, message store, or external memory tool/provider | Web apps and chat products that already have a TypeScript app layer |
| LangChain / LangGraph | Stateful thread/workflow orchestration | Checkpointer + store + thread ID | Long-running, resumable, multi-step agents with durable execution |
| Agno / AgentOS | Stateless runtime, stateful sessions | Agno DB-backed sessions/memories + AgentOS runtime APIs | Python teams building self-hosted agent APIs and workflows |

Vercel AI SDK: Stateless First, State Is Your Job

The AI SDK gives you a very clean agent loop abstraction. That is its superpower.

You define a ToolLoopAgent once, wire in tools, set stopWhen, and use it anywhere in your app. But that does not automatically mean the agent remembers anything across requests.

The stateless default

This is the simplest shape:

import { ToolLoopAgent, stepCountIs, tool } from "ai";
import { z } from "zod";

const lookupOrder = tool({
  description: "Look up an order by ID",
  inputSchema: z.object({
    orderId: z.string(),
  }),
  execute: async ({ orderId }) => {
    return { orderId, status: "shipped" };
  },
});

const supportAgent = new ToolLoopAgent({
  model: "openai/gpt-5",
  instructions: "You are a support triage agent.",
  tools: { lookupOrder },
  stopWhen: stepCountIs(8),
});

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const result = await supportAgent.generate({ prompt });

  return Response.json({
    text: result.text,
    steps: result.steps,
  });
}

This is clean and perfectly useful. It is also stateless across requests. If the same user returns later, this agent knows nothing unless you rehydrate context yourself.

Turning AI SDK into a stateful product

The official persistence pattern is to store and reload chat messages explicitly:

import { openai } from "@ai-sdk/openai";
import {
  convertToModelMessages,
  streamText,
  type UIMessage,
} from "ai";

import { loadChat, saveChat } from "./chat-store";

export async function POST(req: Request) {
  const { message, id }: { message: UIMessage; id: string } = await req.json();

  const previousMessages = await loadChat(id);
  const messages = [...previousMessages, message];

const result = streamText({
  model: openai("gpt-5-mini"),
  messages: convertToModelMessages(messages),
});

return result.toUIMessageStreamResponse({
  originalMessages: messages,
  onFinish: async ({ messages }) => {
    await saveChat({ chatId: id, messages });
  },
});
}

Now the product behaves statefully because the chat has continuity, but the runtime can still be stateless because any worker can load the prior UIMessage[] from storage.
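The route above imports `loadChat` and `saveChat` from a chat-store module that the docs leave to you. Here is a minimal sketch, using an in-memory `Map` and a simplified message shape where production code would use a database table and the SDK's `UIMessage` type:

```typescript
// Minimal sketch of the chat-store module the route imports.
// In production this would be a database table; a Map keeps the example runnable.

type StoredMessage = { id: string; role: string; parts: unknown[] };

const chats = new Map<string, StoredMessage[]>();

export async function loadChat(chatId: string): Promise<StoredMessage[]> {
  // An unknown chat ID yields an empty thread, so new chats start cleanly.
  return chats.get(chatId) ?? [];
}

export async function saveChat(args: {
  chatId: string;
  messages: StoredMessage[];
}): Promise<void> {
  // Overwrite the whole thread: onFinish hands back the complete message list,
  // so a full replace keeps storage consistent with what the UI rendered.
  chats.set(args.chatId, args.messages);
}
```

The full-replace strategy is deliberate: persisting the exact list the stream produced avoids drift between what the user saw and what the next request reloads.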

AI SDK production read

Browser / app
    |
    v
API route
    |
    +--> load messages from DB
    +--> run ToolLoopAgent / streamText
    +--> save updated messages
    |
    v
stream response

Where AI SDK shines

  • You already have a Next.js or TypeScript app and want the agent loop embedded in your existing architecture.
  • You want full control over what gets persisted and when.
  • You want a framework that stays close to web app patterns instead of imposing a large runtime model.

Where AI SDK gets risky

  • Teams sometimes assume “agent” implies “memory.” In AI SDK, it does not.
  • If you bolt on memory too late, you end up with a lot of app-specific persistence code scattered across routes and components.
  • If you need resumable human approval flows, replay-aware side effects, or workflow-level recovery, you will probably build quite a bit yourself or pair AI SDK with another workflow layer.

LangChain / LangGraph: Stateful Threads and Durable Execution

LangChain’s current architecture is different in spirit. The key abstraction is not just “a tool loop.” It is graph state over time.

The docs describe short-term memory as part of the agent’s state, persisted with a checkpointer. A thread becomes the durable identity of the conversation. That is a much stronger opinion about state than AI SDK takes.

Stateful short-term memory with a thread

import { createAgent } from "langchain";
import { MemorySaver } from "@langchain/langgraph";

const checkpointer = new MemorySaver();

const agent = createAgent({
  model: "openai:gpt-5",
  tools: [],
  checkpointer,
});

await agent.invoke(
  {
    messages: [{ role: "user", content: "Hi, I am Bob." }],
  },
  {
    configurable: { thread_id: "support-thread-1" },
  }
);

await agent.invoke(
  {
    messages: [{ role: "user", content: "What's my name?" }],
  },
  {
    configurable: { thread_id: "support-thread-1" },
  }
);

Reusing the same thread_id is not a convenience. It is the core state mechanism.

Production checkpointer

import { createAgent } from "langchain";
import { PostgresSaver } from "@langchain/langgraph-checkpoint-postgres";

const DB_URI =
  "postgresql://postgres:postgres@localhost:5442/postgres?sslmode=disable";

const checkpointer = PostgresSaver.fromConnString(DB_URI);

// Create the checkpoint tables on first run.
await checkpointer.setup();

const agent = createAgent({
  model: "openai:gpt-5",
  tools: [],
  checkpointer,
});

The docs also separate short-term thread state from long-term memory via a store. That matters because “same thread” and “same user across threads” are not the same problem.
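The distinction can be sketched without any LangGraph APIs. This is just the shape of the two keyspaces, with plain `Map`s standing in for the checkpointer and the store; none of these function names are LangChain's:

```typescript
// Illustrative only: the two keyspaces LangChain separates.
// Short-term state is keyed by thread; long-term memory is keyed by user,
// so the same user can be recognized across different threads.

const threadMessages = new Map<string, string[]>(); // checkpointer stand-in
const userMemories = new Map<string, Record<string, string>>(); // store stand-in

export function appendToThread(threadId: string, message: string): void {
  const msgs = threadMessages.get(threadId) ?? [];
  msgs.push(message);
  threadMessages.set(threadId, msgs);
}

export function rememberAboutUser(userId: string, key: string, value: string): void {
  const memo = userMemories.get(userId) ?? {};
  memo[key] = value;
  userMemories.set(userId, memo);
}

export function recallAboutUser(userId: string, key: string): string | undefined {
  return userMemories.get(userId)?.[key];
}
```

If you only have the thread keyspace, "the agent remembers Bob" stops being true the moment Bob opens a second thread.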

Durable execution is the real production differentiator

This is where LangGraph moves beyond chat memory.

The durable execution docs explicitly say:

  • checkpointing enables resume
  • workflows should be deterministic and idempotent
  • side effects should be wrapped carefully
  • resumed workflows replay from a safe starting point, not from an exact line of code

That is exactly the kind of detail architects need to care about.
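The warning about wrapping side effects boils down to idempotency. A hypothetical `runOnce` helper (my own sketch, not a LangGraph API) shows the idea: record each completed effect under a stable key, so a replayed step returns the recorded result instead of re-running the action.

```typescript
// Sketch of a replay-safe side effect: record completed effects by key so a
// resumed workflow that replays the step does not duplicate the action.
// Hypothetical helper, not a LangGraph API; the Map stands in for a DB table.

const completedEffects = new Map<string, unknown>();

export async function runOnce<T>(
  effectKey: string,
  effect: () => Promise<T>
): Promise<T> {
  if (completedEffects.has(effectKey)) {
    // Replay path: return the recorded result instead of re-running the effect.
    return completedEffects.get(effectKey) as T;
  }
  const result = await effect();
  completedEffects.set(effectKey, result);
  return result;
}
```

The key design question is what goes into `effectKey`: it must be stable across replays (e.g. derived from the workflow and step identity), not something regenerated per attempt.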

LangChain production read

Client
  |
  v
LangChain agent
  |
  +--> load thread state via checkpointer
  +--> run model step
  +--> run tool step(s)
  +--> checkpoint after step(s)
  |
  v
resume later with same thread_id

When LangChain is the right answer

  • You need pause/resume, interrupts, or human-in-the-loop review.
  • You care about durable execution and explicit replay semantics.
  • The conversation thread itself is a product primitive, not just a UI convenience.
  • You want memory, state, and workflow control to be part of the framework instead of custom app code.

Where teams get into trouble

  • LangChain’s power can tempt teams into over-orchestrating simple use cases.
  • Once you adopt thread state and replay semantics, you also inherit the need to reason carefully about idempotency and side effects.
  • It is easy to confuse “we can persist everything” with “we should persist everything.” Long threads still need trimming, summaries, or deletion strategies.

Agno + AgentOS: Stateless Runtime, Stateful Sessions

Agno’s recent architecture is one of the most interesting in the current ecosystem because it is very explicit about the split between runtime statelessness and persistent session state.

The v2 changelog says agents, teams, and workflows are now fully stateless. At the same time, the session docs say that once you provide a database, Agno automatically stores:

  • conversation history
  • session state
  • run metadata
  • optional tool calls
  • optional media

That is a strong production pattern.

A stateful product with a stateless agent object

from agno.agent import Agent
from agno.db.postgres import PostgresDb
from agno.models.openai import OpenAIResponses

db = PostgresDb(db_url="postgresql+psycopg://ai:ai@localhost:5532/ai")

support_agent = Agent(
    name="support-agent",
    model=OpenAIResponses(id="gpt-5.2"),
    db=db,
    add_history_to_context=True,
    num_history_runs=3,
    enable_session_summaries=True,
)

support_agent.run(
    "My name is Alice and I need help with order ORD-9001",
    session_id="ticket_9001",
    user_id="alice@example.com",
)

support_agent.run(
    "What order am I asking about?",
    session_id="ticket_9001",
    user_id="alice@example.com",
)

This feels stateful to the user because the same session_id rehydrates the conversation. But the runtime object itself is not the system of record.

Exposing it as a production API with AgentOS

from agno.os import AgentOS

agent_os = AgentOS(
    agents=[support_agent],
    run_hooks_in_background=True,
)

app = agent_os.get_app()

Now you have a self-hosted runtime surface for agents, sessions, traces, and operational features.

Agno production read

Client / product
      |
      v
AgentOS API
      |
      +--> load session from DB
      +--> run stateless agent/workflow
      +--> persist messages, state, metrics, summary
      |
      v
response + operational trace

The important nuance

Agno’s session caching docs explicitly say in-memory caching is for development and testing and is not recommended for production use.

That is a good sign. It reinforces the intended architecture:

  • DB-backed persistence for truth
  • stateless runtime for scale
  • session features for continuity

AgentOS adds operational structure

AgentOS is not just a wrapper. The docs position it as the production runtime and control plane for multi-agent systems, with:

  • API endpoints
  • SSE-compatible streaming
  • sessions, memory, knowledge, traces in your DB
  • JWT/RBAC
  • background hooks

One subtle but important production detail: background hooks are useful for logging, analytics, notifications, and non-critical processing, but the docs explicitly say they are not suitable for guardrails because they execute after the response is sent.

That is the kind of operational honesty I like seeing in framework docs.

When Agno/AgentOS is the right answer

  • You want a Python-first agent platform with batteries included for sessions and runtime APIs.
  • You want a framework that embraces stateless runtime objects while still giving you structured conversation continuity.
  • You are building teams, workflows, or self-hosted internal agent services.

Where teams get into trouble

  • If your use case is really just a simple chat route inside a web app, AgentOS can be more runtime than you need.
  • If you rely on session caching instead of durable storage, you are working against the framework’s intended production model.
  • You still need good session ID design, tenancy boundaries, and state compaction strategy.
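Session ID design is worth making explicit early. A small sketch, in TypeScript for consistency with the earlier examples; the `tenant:user:thread` scheme and both function names are assumptions of mine, not an Agno convention:

```typescript
// Illustrative session ID scheme with explicit tenancy boundaries.
// Encoding the tenant into the ID makes cross-tenant leakage a parse-time
// error instead of a query-time accident.

export function makeSessionId(
  tenantId: string,
  userId: string,
  threadKey: string
): string {
  const parts = [tenantId, userId, threadKey];
  if (parts.some((p) => p.length === 0 || p.includes(":"))) {
    throw new Error("session ID parts must be non-empty and delimiter-free");
  }
  return parts.join(":");
}

export function parseSessionId(sessionId: string): {
  tenantId: string;
  userId: string;
  threadKey: string;
} {
  const [tenantId, userId, threadKey] = sessionId.split(":");
  return { tenantId, userId, threadKey };
}
```

Whatever scheme you choose, the point is that it is a scheme: documented, validated, and enforced at the API boundary rather than improvised per call site.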

The Real Comparison: Where Does State Live?

This is the question I would ask in every architecture review.

AI SDK

State mostly lives in:

  • your frontend chat transport
  • your API route
  • your database
  • optional memory tools/providers

The framework is not trying to be your durable runtime.

LangChain / LangGraph

State mostly lives in:

  • thread checkpoints
  • graph state
  • long-term stores
  • replayable workflow execution

The framework assumes state is central.

Agno / AgentOS

State mostly lives in:

  • DB-backed sessions
  • session history and summaries
  • optional memory and knowledge
  • AgentOS runtime APIs

The runtime layer stays stateless; the session layer is where continuity lives.

My Practical Decision Framework

If I were advising a team today, I would use this rule set.

Choose a stateless-first design when:

  • each request can be answered from current input plus a small retrieved context
  • retries do not need resume semantics
  • your product is mainly a web app, not a workflow engine
  • you want the simplest horizontal-scaling story

Choose a thread-stateful design when:

  • the same conversation must evolve over many turns
  • tool outputs affect later decisions
  • human approvals pause and resume the same task
  • you need auditability for what the agent knew at each step

Choose workflow durability when:

  • tasks can run for minutes or hours
  • failures and restarts are normal
  • tool side effects are meaningful
  • you need replay-safe orchestration, not just chat history

The Best Production Pattern Is Usually Hybrid

This is the punchline.

The mature production architecture is usually not:

  • “everything stateless”

and it is usually not:

  • “keep all state inside a sticky long-lived worker”

It is this:

Stateless compute
    +
Durable shared state
    +
Clear replay rules
    +
Explicit session/thread IDs

That is why all three frameworks can work in serious systems, even though they start from different assumptions.

What I Would Recommend by Use Case

1. Customer support copilot inside a web app

Start with AI SDK if your team already lives in TypeScript and React. Keep the runtime stateless, persist chat messages yourself, and only add memory beyond the thread if the product truly needs it.

2. Research or operations agent that pauses, resumes, and calls many tools

Reach for LangChain/LangGraph if checkpointing, replay, and thread semantics are part of the core workflow.

3. Internal platform for multiple agent APIs and workflows

Use Agno + AgentOS if you want a Python-native runtime with built-in session management and production-serving concerns already modeled.

Final Take

The most useful question is not “Should my agent be stateful or stateless?”

The better question is:

Which parts of the system need memory, and which parts should stay disposable?

That leads to better architecture decisions.

AI SDK says: keep the loop simple and let the app own persistence.

LangChain says: model threads, checkpoints, and durable execution explicitly.

Agno/AgentOS says: keep runtime objects stateless, but persist sessions and serve them through a production API.

Those are all defensible choices. The right one depends on whether your main problem is:

  • web app integration
  • workflow durability
  • or agent runtime operations

If you get that answer right, the state model usually follows.

Sources