Most engineers first meet AI agents through prompts, tool calling, and a quick demo UI. That is enough to build something interesting, but not enough to build something you can trust in production. Official guidance from OpenAI, Anthropic, Google Cloud, and Microsoft points in the same direction: production-grade agent engineering is a systems discipline, not a prompt trick. It spans workflow design, context management, tool contracts, memory, approvals, evaluation, tracing, governance, security, reliability, cost, and operations (OpenAI AgentKit, Anthropic context engineering, Google Cloud architecture guidance, Microsoft Agent Framework overview).
TL;DR
- Building agents is broader than prompts plus tool calling. The hard part is designing the runtime, context, tools, guardrails, evals, observability, and operating model around the model.
- The most-missed topics are context engineering, evals, observability, human approvals, safety/security, reliability engineering, and business metrics.
- The best study order is: foundations -> single-agent loops -> context and memory -> tool design -> UI and approvals -> evals -> tracing -> safety and reliability -> deployment and cost -> multi-agent.
- Start with a single agent and a strong eval harness. OpenAI, Google Cloud, and Microsoft all explicitly recommend starting simple and only introducing more orchestration when the task actually demands it (OpenAI practical guide, Google Cloud architecture guidance, Microsoft Agent Framework overview).
What you’ll learn here
- How to distinguish chatbots, tool-using agents, workflow-based systems, and long-running or multi-agent systems
- The full topic map for modern agent engineering
- Which topics your current list already covers well, and which important gaps remain
- A phased learning roadmap and a 14-week study plan
- Three portfolio projects that prove real production skills instead of only demo skills
A simple mental model
Chatbot
-> single-turn or short-memory conversation
Tool-using agent
-> model decides which tool to call next
Workflow-based agent system
-> explicit orchestration + tool calls + approvals + branching
Long-running / multi-agent system
-> state, memory, background work, delegation, tracing, governance
OpenAI frames the jump clearly: workflows follow predefined steps, while agents start with a goal, plan, use tools, adapt, and request clarification when needed (OpenAI business leader guide). Microsoft makes the same distinction and adds a useful practical rule: if you can solve the task with a normal function or workflow, do that first (Microsoft Agent Framework overview).
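Microsoft's rule of thumb is easy to make concrete. The sketch below contrasts a fixed workflow, where the branching lives in ordinary code, with the adaptive loop an agent would need; `classifyTicket` is a hypothetical helper standing in for a single model or API call, not a real library function.

```typescript
// A fixed workflow: the steps and branches are decided in code, not by the
// model, so the control flow is fully visible and unit-testable.
type Ticket = { subject: string; body: string };

function classifyTicket(t: Ticket): "billing" | "technical" | "other" {
  // Stand-in for one LLM classification call.
  return t.subject.toLowerCase().includes("invoice") ? "billing" : "other";
}

function handleTicketWorkflow(t: Ticket): string {
  // Predefined branching: if this is all the task needs, an agent is overkill.
  const category = classifyTicket(t);
  if (category === "billing") return "route:billing-queue";
  if (category === "technical") return "route:tech-queue";
  return "route:human-triage";
}
```

If every ticket fits one of these branches, ship the workflow. An agent earns its place only when the next step genuinely cannot be enumerated in advance.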
A production agent stack
User / Trigger
|
v
UI stream or API request
|
v
Agent runtime loop
|
+--> context loader (state, retrieval, memory)
+--> model reasoning / planning
+--> policy + approval gate
+--> tools / APIs / MCP / connectors
+--> traces, eval hooks, logs, metrics
|
v
Final response or handoff to human
This is why agent engineering is broader than prompt engineering. Anthropic explicitly argues that the center of gravity has shifted from prompt engineering to context engineering, meaning the whole configuration of state, tools, examples, memory, and retrieved information that reaches the model at each step (Anthropic context engineering).
A tiny runtime you should understand before using frameworks
type StepResult = {
  text?: string;
  toolCalls?: Array<{ name: string; args: unknown }>;
  needsApproval?: boolean;
};

async function runAgent(goal: string) {
  const trace: Array<{ step: number; result: StepResult }> = [];
  let state = await loadConversationState(goal);
  // Hard step limit keeps the loop bounded even if the model never converges.
  for (let step = 0; step < 8; step++) {
    const context = await buildContext(state);
    const result: StepResult = await model.planAct(context);
    trace.push({ step, result });
    // Policy gate: hand risky actions to a human before executing them.
    if (result.needsApproval) {
      return requestHumanApproval(trace);
    }
    // No tool calls means the model is done: persist the trace and answer.
    if (!result.toolCalls?.length) {
      await saveTrace(trace);
      return result.text ?? "No answer";
    }
    const toolResults = await runToolsSafely(result.toolCalls);
    state = updateState(state, toolResults);
  }
  // Step budget exhausted: escalate instead of looping forever.
  return escalateToHuman("step_limit_exceeded");
}
You do not need to ship this exact loop by hand forever. But you do need to understand it. OpenAI, Anthropic, Google Cloud, and Microsoft all expose different frameworks and managed runtimes, yet they all keep the same fundamentals: state, tools, memory, approvals, observability, and explicit control over when the loop should stop (OpenAI practical guide, Anthropic writing tools, Google Cloud ADK docs, Microsoft Agent Framework overview).
1. Executive summary
Building agents is broader than prompts and tool calling because the agent is only one piece of the system. Production systems also need state management, memory, retrieval, tool contracts, UI streaming, approvals, evaluation, tracing, safety, reliability, and governance. OpenAI now splits those concerns across Agent Builder, Evals, trace grading, conversation state, background mode, and safety guidance; Google Cloud does the same with architecture components, ADK, sessions, Memory Bank, tracing, logging, monitoring, and access control; Microsoft separates agents, workflows, observability, governance, and AI platform controls; Anthropic explicitly reframes the problem as context engineering, not just better prompts (OpenAI AgentKit, Google Cloud architecture guidance, Microsoft governance for AI agents, Anthropic context engineering).
The most important missing topics are usually:
- Context engineering: what the model sees each turn matters more than endlessly rewriting a system prompt (Anthropic context engineering).
- Evals and trace analysis: without them, teams mistake “worked in a demo” for “works reliably” (OpenAI trace grading, Microsoft Foundry evaluation results).
- Observability: agent systems need logs, traces, and metrics at the tool-call and workflow level, not just API success/failure (Microsoft observability, Google Cloud ADK docs).
- Safety, security, and governance: prompt injection, over-broad permissions, risky MCP integrations, and untracked production agents are operational problems, not optional extras (OpenAI agent safety, OpenAI MCP and connectors, Microsoft governance for AI agents).
- Reliability engineering: retries, timeouts, idempotency, backpressure, cost ceilings, and fallback behavior are where real systems either survive or fail.
The highest-level lesson is simple: learn to build one good single-agent system with strong context, tools, evals, and observability before you chase multi-agent complexity. OpenAI recommends starting with a single agent and evolving only when needed, Google Cloud calls single-agent systems the effective starting point, and Microsoft says to prefer a workflow or even a plain function when that is enough (OpenAI practical guide, Google Cloud architecture guidance, Microsoft Agent Framework overview).
2. Complete topic map
Foundations
- Model capabilities and limits: know what models are good at, where they drift, and what kinds of reasoning, latency, and tool use they support.
- System design basics: queues, retries, idempotency, auth, rate limits, and event-driven design matter as much here as in any backend system.
- Workflow vs agent thinking: understand when a rule-based workflow is enough and when adaptive planning is worth it (OpenAI business leader guide, Microsoft Agent Framework overview).
Single-agent basics
- Agent loop: prompt -> tool decision -> tool result -> next step -> stop condition.
- Tool use fundamentals: function schemas, tool descriptions, result handling, step limits, and failure exits.
- Statefulness basics: learn both stateless requests and session-based or threaded continuations (OpenAI conversation state).
Context engineering
- Prompt structure: clear sections, examples, output formats, and constraints.
- Context budgeting: keep only the highest-signal tokens in play.
- Runtime context loading: decide what to preload and what to fetch just in time.
- Compaction and summarization: maintain coherence over long tasks without dragging full history forever (Anthropic context engineering).
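The compaction bullet above can be sketched in a few lines: keep the most recent turns verbatim and fold everything older into a summary once a token budget is exceeded. `countTokens` and `summarize` are hypothetical stand-ins here; a real system would call a tokenizer and a summarization model.

```typescript
type Turn = { role: "user" | "assistant" | "tool"; content: string };

// Rough heuristic stand-in for a real tokenizer (~4 chars per token).
const countTokens = (text: string): number => Math.ceil(text.length / 4);

function summarize(turns: Turn[]): string {
  // Stand-in for an LLM summarization call.
  return `Summary of ${turns.length} earlier turns.`;
}

function compactHistory(history: Turn[], budget: number, keepRecent = 4): Turn[] {
  const total = history.reduce((n, t) => n + countTokens(t.content), 0);
  if (total <= budget || history.length <= keepRecent) return history;
  // Fold older turns into one summary turn; keep recent turns verbatim.
  const older = history.slice(0, history.length - keepRecent);
  const recent = history.slice(history.length - keepRecent);
  return [{ role: "assistant", content: summarize(older) }, ...recent];
}
```

The design choice to watch is where the cut line sits: summarize too aggressively and the agent loses decisions it already made; too lazily and the budget is wasted on stale turns.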
Memory and retrieval
- Short-term memory: conversation/session state for the current task.
- Long-term memory: cross-session user preferences, work state, and task history.
- Retrieval design: when to preload memories, when to let the model call a retrieval tool, and how to avoid stale or noisy recall.
- Hybrid memory strategies: combine externalized state, retrieval, and note-taking instead of dumping everything into one prompt (Google Cloud Memory Bank, Anthropic context engineering).
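One minimal way to sketch the preload-vs-on-demand tradeoff from the bullets above: pin a small, stable set of memories into every prompt, and expose the larger pool behind a search function the model can call as a retrieval tool. All names here are illustrative, not a real memory API.

```typescript
type Memory = { id: string; text: string; pinned: boolean };

class MemoryStore {
  constructor(private items: Memory[]) {}

  // Small, high-signal set injected into every turn's context.
  preload(): Memory[] {
    return this.items.filter((m) => m.pinned);
  }

  // Larger pool fetched just in time via a retrieval tool call.
  // A real store would use embeddings; substring match keeps the sketch small.
  search(query: string, limit = 3): Memory[] {
    const q = query.toLowerCase();
    return this.items
      .filter((m) => !m.pinned && m.text.toLowerCase().includes(q))
      .slice(0, limit);
  }
}
```

The split keeps the per-turn token cost flat while the memory pool grows, which is the whole point of the hybrid strategy.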
Tooling and integrations
- Tool contract design: make tools self-contained, distinct, ergonomic, and easy for the model to choose correctly.
- Integration patterns: raw function tools, API wrappers, MCP, connectors, agent-as-a-tool, and API gateways.
- Operational integration quality: auth, scopes, rate limits, retries, idempotency, monitoring, and audit logging (Anthropic writing tools, Google Cloud architecture guidance, OpenAI MCP and connectors).
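A tool contract in the sense above can be made explicit as a name, a description written for the model, and input validation that runs before execution. The shape loosely mirrors function-calling schemas, but every identifier here (`ToolDef`, `lookup_order`) is illustrative rather than a real platform API.

```typescript
type ToolDef<Args> = {
  name: string;
  description: string; // written for the model, not for humans
  validate: (raw: unknown) => Args | null;
  run: (args: Args) => Promise<string>;
};

const lookupOrder: ToolDef<{ orderId: string }> = {
  name: "lookup_order",
  description: "Fetch the current status of a single order by its ID.",
  validate: (raw) => {
    const r = raw as { orderId?: unknown } | null;
    return typeof r?.orderId === "string" && r.orderId.length > 0
      ? { orderId: r.orderId }
      : null;
  },
  run: async ({ orderId }) => `status(${orderId}): shipped`, // stand-in for a real API call
};

async function callTool<A>(tool: ToolDef<A>, raw: unknown): Promise<string> {
  // Validate model-produced arguments before touching any real system, and
  // return errors as text the model can recover from instead of crashing.
  const args = tool.validate(raw);
  if (args === null) return `error: invalid arguments for ${tool.name}`;
  return tool.run(args);
}
```

Returning a structured error string instead of throwing is deliberate: it gives the model a chance to self-correct on the next step.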
UI and streaming UX
- Streaming responses: tokens, tool events, partial updates, and step progress.
- Agent-native UI: approvals, inline actions, stateful chat, and structured widgets rather than plain text bubbles.
- Realtime transports: WebSocket and WebRTC matter when latency is part of the product experience (OpenAI AgentKit, OpenAI voice agents, Google Cloud architecture guidance).
Human-in-the-loop workflows
- Approval nodes: when users must confirm reads, writes, purchases, messages, or risky actions.
- Escalation design: define clear handoff points to humans.
- Recovery: allow pause, resume, retry, edit, and override.
- Reviewability: make the agent explain what it wants to do before it does it (OpenAI practical guide, OpenAI agent safety).
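An approval node can start as something this small: classify each proposed tool call as auto-approved or requiring human confirmation before execution. The risky-action list is illustrative; a real policy would also consider scopes, amounts, and targets, not just the tool name.

```typescript
type ProposedAction = { tool: string; args: Record<string, unknown> };

// Illustrative policy: anything with external side effects needs a human.
const REQUIRES_APPROVAL = new Set(["send_email", "issue_refund", "update_ticket"]);

function approvalDecision(action: ProposedAction): "auto" | "needs_human" {
  return REQUIRES_APPROVAL.has(action.tool) ? "needs_human" : "auto";
}
```

Keeping the policy in one pure function makes it trivially testable and gives the reviewability bullet teeth: the agent can show both the proposed action and the decision before anything runs.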
Evaluations and testing
- Deterministic tests: tool wrappers, parsers, policies, serializers, and business logic.
- Scenario tests: multi-step tasks with expected outcomes.
- LLM-as-judge and graders: useful, but should be calibrated and combined with deterministic checks.
- Offline and online evaluation: benchmark before launch, then keep scoring production traces after launch.
- Regression discipline: compare runs statistically, not by “felt better” (OpenAI trace grading, Microsoft Foundry evaluation results, Anthropic writing tools).
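One way to wire the bullets above together: let deterministic validators veto first, and only consult the judge score when the hard checks pass. `judgeScore` here is a stub standing in for an LLM-as-judge call that would be calibrated against human labels.

```typescript
type EvalCase = { input: string; output: string; mustContain: string[] };

// Hard checks: cheap, deterministic, and allowed to veto outright.
function deterministicPass(c: EvalCase): boolean {
  return c.mustContain.every((s) => c.output.includes(s));
}

async function judgeScore(c: EvalCase): Promise<number> {
  // Stand-in for an LLM-as-judge call returning a 0..1 quality score.
  return c.output.length > 0 ? 0.8 : 0;
}

async function gradeCase(c: EvalCase): Promise<{ pass: boolean; score: number }> {
  if (!deterministicPass(c)) return { pass: false, score: 0 }; // veto before judging
  return { pass: true, score: await judgeScore(c) };
}
```

The ordering matters: a judge should never be able to rescue an output that fails a hard requirement, which is the calibration discipline the bullet list calls for.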
Observability and tracing
- Trace every step: model decisions, tool calls, latencies, retries, approvals, failures, and outcomes.
- Centralize logs and metrics: do not bury agent behavior in app logs only.
- Semantic conventions: use OpenTelemetry-style traces when possible.
- Debugging loops: inspect traces to see why the agent chose the wrong tool or went down the wrong branch (Microsoft observability, OpenAI trace grading).
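To make "trace every step" concrete without pulling in a framework, here is a hand-rolled span recorder; a real system would emit OpenTelemetry spans instead, but the shape is the same: every model decision, tool call, and approval becomes a timed span with attributes.

```typescript
type Span = { name: string; attrs: Record<string, unknown>; ms: number };

const spans: Span[] = [];

// Wrap any async step so its name, attributes, and duration are recorded
// even when the wrapped function throws.
async function withSpan<T>(
  name: string,
  attrs: Record<string, unknown>,
  fn: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    spans.push({ name, attrs, ms: Date.now() - start });
  }
}
```

Usage would look like `withSpan("tool.call", { tool: "search" }, () => runSearch(q))`; wrapping the model call, each tool call, and each approval gate the same way is what turns an opaque agent run into a debuggable trace.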
Safety, security, and governance
- Prompt injection defense: assume external inputs are hostile.
- Least privilege: limit tool scopes, use approvals, validate inputs, and isolate risky actions.
- Secure integrations: prefer trusted MCP servers and review data flows to third parties.
- Organizational governance: inventory agents, centralize logs, tag costs, and define ownership and lifecycle policies (OpenAI agent safety, OpenAI MCP and connectors, Microsoft governance for AI agents).
Reliability engineering
- Failure handling: tool failures, partial failures, retries, dead letters, fallback models, and safe abort paths.
- Operational SLOs: latency, step count, failure rate, escalation rate, and unit economics.
- State correctness: idempotency, deduplication, session consistency, and replay safety.
- Controlled autonomy: constrain what the agent can do when confidence is low or dependencies are flaky.
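The retry/timeout/idempotency bullets can be sketched without any framework. The idempotency key is generated once and reused on every attempt, so a downstream service can deduplicate if a retry fires after a call that actually succeeded; all names and defaults here are illustrative.

```typescript
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer); // always clear so the process can exit cleanly
  }
}

async function callWithRetry<T>(
  fn: (idempotencyKey: string) => Promise<T>,
  { attempts = 3, timeoutMs = 5000 } = {},
): Promise<T> {
  // One key for the whole logical request, reused across retries.
  const key = `req-${Date.now()}-${Math.random().toString(36).slice(2)}`;
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await withTimeout(fn(key), timeoutMs);
    } catch (err) {
      lastErr = err; // a real system would also back off and log here
    }
  }
  throw lastErr;
}
```

Note what is missing on purpose: backoff, jitter, and dead-letter handling would all layer onto this same skeleton.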
Multi-agent orchestration
- Real need first: split into multiple agents only when the task genuinely benefits from specialization, isolation, or parallel exploration.
- Delegation patterns: manager-worker, agent-as-tool, orchestrator-router, and Agent2Agent (A2A)-based collaboration.
- Boundary design: each subagent should have a clear role, context budget, toolset, and permission surface.
- Coordination overhead: multi-agent systems add evaluation, security, latency, and cost complexity (Google Cloud architecture guidance, Anthropic subagents).
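The agent-as-tool pattern above can be sketched by giving the orchestrator a specialist behind the same interface as an ordinary tool, so delegation stays explicit and auditable. `researchSpecialist.run` is a stand-in for a full sub-agent loop with its own context budget and permission surface.

```typescript
type SubAgentTool = {
  name: string;
  allowedTools: string[]; // the specialist's own, narrower permission surface
  run: (task: string) => Promise<string>;
};

const researchSpecialist: SubAgentTool = {
  name: "research_specialist",
  allowedTools: ["search_docs", "read_page"],
  run: async (task) => `findings for: ${task}`, // stand-in for the sub-agent loop
};

async function delegate(tool: SubAgentTool, task: string): Promise<string> {
  // The orchestrator only sees the specialist's final report, not its
  // intermediate context, which keeps each agent's context budget isolated.
  return tool.run(task);
}
```

That isolation is the benefit the coordination overhead has to pay for; if the report boundary adds no value, a single agent was the right call.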
Cost and latency optimization
- Prompt caching: cache stable instructions, tools, and long context where your platform supports it.
- Model mix: route cheaper/faster models to simpler work.
- Retrieval discipline: avoid sending large irrelevant context every turn.
- Runtime optimization: use background jobs, streaming, batching, and async tool execution where appropriate (Anthropic prompt caching, OpenAI voice agents).
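Model routing from the list above can start as a single pure function: send short, low-risk requests to a cheaper model and everything else to a stronger one. The model names and the length threshold are placeholders, not real identifiers.

```typescript
type Route = { model: string; reason: string };

function routeRequest(prompt: string, needsTools: boolean): Route {
  // Illustrative policy: cheap model for short, tool-free requests.
  if (!needsTools && prompt.length < 500) {
    return { model: "small-fast-model", reason: "short, no tools" };
  }
  return { model: "large-capable-model", reason: "tools or long context" };
}
```

Keeping routing in one function also makes it evaluable: you can replay a trace set through two routing policies and compare cost and quality before changing production.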
Deployment, lifecycle, and product metrics
- Deployment lifecycle: dev, staging, shadow mode, limited rollout, production, rollback.
- Lifecycle ownership: versioning, approvals, inventory, retirement, and compliance checks.
- Product metrics: task completion, containment, escalation, time to resolution, acceptance rate, CSAT, revenue or cost impact.
- Leadership visibility: agents should earn their place by measurable business impact, not novelty (Microsoft governance for AI agents).
3. What is missing from this topic list?
The starting list is:
- Agents and subagents, stateless and stateful agents
- API wrappers, WebSockets, webhooks
- UI streaming integration, tool-calling UI
- Testing scenarios and LLM-as-judge
- Agent observability, tracing and monitoring
- Agent design optimization
What is already strong
- Agents and subagents, stateless and stateful agents: strong coverage of runtime shape and system topology.
- UI streaming integration and tool-calling UI: strong early signal that you are thinking beyond text generation.
- Testing scenarios and LLM-as-judge: strong start on evaluation.
- Observability, tracing, and monitoring: strong start on production readiness.
What is partially covered
- API wrappers, WebSockets, and webhooks: good on transport mechanics, but not enough on tool contracts, auth scopes, approvals, MCP/connectors, error semantics, or enterprise integration patterns.
- Agent design optimization: good umbrella term, but too broad. Right now it hides several topics that deserve their own lanes: context engineering, cost/latency tuning, tool ergonomics, prompt structure, and reliability improvements.
- Agents and subagents: partially covers orchestration, but it does not automatically cover when not to use multi-agent, which is one of the most important practical judgments (Google Cloud architecture guidance, OpenAI practical guide).
What is missing
- Foundations: model behavior, reasoning limits, workflow-vs-agent selection, and backend systems basics.
- Context engineering: probably the single biggest missing topic.
- Memory and retrieval: short-term state, long-term memory, retrieval design, compaction, note-taking.
- Tooling and integrations as a design discipline: tool ergonomics, naming, overlap reduction, MCP governance, auth, rate limiting, and monitoring.
- Human-in-the-loop flows: approvals, escalation, pause/resume, safe handoff.
- Safety, security, and governance: prompt injection, least privilege, external integrations, agent inventory, cost accountability.
- Reliability engineering: retries, timeouts, fallback behavior, idempotency, failure budgets.
- Cost and latency optimization: caching, retrieval minimization, model routing, async/background work.
- Deployment and lifecycle management: staging, shadow mode, rollout, rollback, versioning, ownership.
- Product and business metrics: whether the agent helps the business, not just whether it produces a plausible answer.
What deserves to be split into its own category
Agent design optimization should be split into:
- context engineering
- tool design
- prompt and output design
- cost and latency optimization
- reliability improvements
Testing scenarios and LLM-as-judge should be split into:
- deterministic testing
- scenario testing
- offline evals
- online evals
- trace grading
UI streaming integration, tool-calling UI should be split into:
- streaming UX
- human approval flows
- realtime transports
- agent-native UI patterns
Agents and subagents should be split into:
- single-agent basics
- long-running agents
- multi-agent orchestration
- inter-agent protocols like Agent2Agent (A2A)
4. Learning roadmap
Phase 0. Foundations
- What to learn:
- the difference between chatbots, workflows, LLM-powered workflow steps, and agents
- core backend ideas: queues, retries, auth, rate limits, idempotency, tracing
- the basic agent loop and stop conditions
- Why it matters:
- if you cannot explain when not to use an agent, you will overbuild
- production agent engineering is applied systems engineering, not a prompt hobby
- Suggested mini-projects:
- build a plain workflow that classifies support tickets
- build a single function-calling assistant and compare it to the workflow
- write a short memo: when is the workflow enough, and when do you need an agent?
Phase 1. Single-agent basics
- What to learn:
- tool schemas
- agent loops
- step limits
- session state
- structured outputs
- Why it matters:
- this is the irreducible core; every framework is abstracting this
- Suggested mini-projects:
- support copilot with 3 tools
- local research agent that can search docs and summarize
- “agent as CLI helper” with one approval step
Phase 2. Context engineering
- What to learn:
- prompt structure
- example selection
- compaction
- just-in-time retrieval
- note-taking and memory patterns
- Why it matters:
- most real agent failures are context failures, not model failures
- Suggested mini-projects:
- compare a naive long prompt with a compacted prompt
- add note-taking memory to a research agent
- build a context budget dashboard for each turn
Phase 3. Memory and retrieval
- What to learn:
- short-term state vs long-term memory
- memory preload vs tool-triggered retrieval
- retrieval latency and stale context tradeoffs
- Why it matters:
- stateful agents only work when memory is precise, cheap, and retrievable
- Suggested mini-projects:
- personal preference assistant with cross-session recall
- issue triage agent that remembers prior decisions
- retrieval benchmark: compare preload-all vs retrieve-on-demand
Phase 4. Tooling and integrations
- What to learn:
- designing LLM-friendly tools
- MCP/connectors
- auth scopes
- error handling
- monitoring and logging for tool calls
- Why it matters:
- bad tools make good models look bad
- Suggested mini-projects:
- wrap a real third-party API as tools
- refactor overlapping tools into clearer, higher-signal tools
- add approval policies for sensitive tool calls
Phase 5. UI and human-in-the-loop
- What to learn:
- streaming
- progress updates
- approval UX
- fallback to humans
- asynchronous/background jobs
- Why it matters:
- agent products succeed or fail at the human interface, not only in backend traces
- Suggested mini-projects:
- chat UI with streaming tool status
- approval flow for email sending or ticket updates
- long-running job that can pause, resume, and notify
Phase 6. Evaluations and testing
- What to learn:
- golden sets
- deterministic validators
- LLM-as-judge
- trace grading
- regression comparisons
- Why it matters:
- without evals, you cannot improve safely
- Suggested mini-projects:
- create a 50-case dataset for your support agent
- grade full traces, not just final answers
- compare two prompt or tool versions statistically
Phase 7. Observability, safety, and reliability
- What to learn:
- OpenTelemetry-style traces
- centralized logs
- token and cost metrics
- prompt injection defenses
- least privilege
- retries, timeouts, idempotency
- Why it matters:
- this is the boundary between “cool demo” and “safe to run for customers”
- Suggested mini-projects:
- instrument your agent with traces and dashboards
- simulate tool outages and verify fallback behavior
- add content filtering, approvals, and scope-limited credentials
Phase 8. Deployment, cost, and lifecycle
- What to learn:
- rollout strategies
- staging and shadow mode
- prompt caching
- cost controls
- versioning
- ownership and retirement
- Why it matters:
- production quality is maintained operationally, not only coded once
- Suggested mini-projects:
- deploy an agent with feature flags and rollback
- add token and latency budgets per request
- create an ops dashboard with business KPIs
Phase 9. Multi-agent and long-running systems
- What to learn:
- manager-worker patterns
- subagents
- A2A
- long-horizon context strategies
- specialized permissions and isolation
- Why it matters:
- this is useful only after you can run a strong single-agent system
- Suggested mini-projects:
- research orchestrator with one planner and two specialists
- background incident analysis agent with resumable state
- multi-agent comparison harness measuring whether specialization actually helps
5. 12–16 week study plan
This is a 14-week plan. It is aggressive, but realistic for a working software engineer who wants to actually build.
Week 1
- Focus areas: workflows vs agents, basic agent loop, system design basics
- Deliverables: one-page architecture notes and a toy single-agent loop
- What to read: OpenAI business leader guide, Microsoft Agent Framework overview
- What to build: one workflow and one agent solving the same task
Week 2
- Focus areas: tool schemas, stop conditions, structured outputs, session state
- Deliverables: a single-agent assistant with 2-3 tools
- What to read: OpenAI practical guide, Anthropic writing tools
- What to build: support copilot with calendar/docs/search tools
Week 3
- Focus areas: context engineering basics, prompt structure, examples, output constraints
- Deliverables: before/after prompt and context experiments
- What to read: Anthropic context engineering
- What to build: prompt lab with a simple scorecard
Week 4
- Focus areas: compaction, context budgets, note-taking
- Deliverables: compaction strategy and long-task notes format
- What to read: Anthropic context engineering, OpenAI conversation state
- What to build: long-running research agent that summarizes itself every N steps
Week 5
- Focus areas: short-term memory, long-term memory, preload vs on-demand retrieval
- Deliverables: memory-enabled agent with cross-session recall
- What to read: Google Cloud Memory Bank, Google Cloud architecture guidance
- What to build: preference-aware assistant with long-term memory
Week 6
- Focus areas: tool ergonomics, auth, rate limits, retries, observability at the integration boundary
- Deliverables: one polished tool package with logs and failure handling
- What to read: Anthropic writing tools, OpenAI MCP and connectors
- What to build: tool wrapper for a real SaaS API with approval and retry logic
Week 7
- Focus areas: streaming UX, progress events, agent-native UI
- Deliverables: streaming interface with tool-progress timeline
- What to read: OpenAI AgentKit, Google Cloud architecture guidance
- What to build: chat UI that shows tool state, not just final text
Week 8
- Focus areas: human approvals, escalation, pause/resume
- Deliverables: approval policy matrix and HITL flow
- What to read: OpenAI practical guide, OpenAI agent safety
- What to build: agent that drafts an email or ticket update but requires human approval to send
Week 9
- Focus areas: deterministic tests, datasets, scenario tests
- Deliverables: 30-50 case eval dataset
- What to read: OpenAI agent evals, Anthropic writing tools
- What to build: eval harness for your existing single-agent project
Week 10
- Focus areas: trace grading, regression analysis, run comparison
- Deliverables: a grading report comparing two prompt/tool versions
- What to read: OpenAI trace grading, Microsoft Foundry evaluation results
- What to build: trace grader that finds where the agent fails, not just whether it fails
Week 11
- Focus areas: tracing, logs, metrics, dashboards
- Deliverables: trace explorer and cost/latency dashboard
- What to read: Microsoft observability, Google Cloud ADK docs
- What to build: OpenTelemetry instrumentation for model, tool, and approval steps
Week 12
- Focus areas: prompt injection, least privilege, secure external integrations, governance
- Deliverables: threat model for your agent
- What to read: OpenAI agent safety, Microsoft governance for AI agents
- What to build: hardened version of your project with scoped credentials and approval gates
Week 13
- Focus areas: cost and latency optimization, caching, async/background work
- Deliverables: cost budget and latency budget per task type
- What to read: Anthropic prompt caching, OpenAI voice agents
- What to build: optimized runtime with cache-aware prompts and background execution for long tasks
Week 14
- Focus areas: multi-agent only where justified, delegation, A2A, specialized permissions
- Deliverables: side-by-side comparison of single-agent vs multi-agent performance
- What to read: Google Cloud architecture guidance, Anthropic subagents, Microsoft Copilot Studio A2A
- What to build: planner + specialist research system, then measure whether it actually beats the single-agent baseline
6. Portfolio projects
Beginner project: Support Copilot
- Description: a customer support assistant that can read policy docs, look up order status, draft replies, and ask for approval before sending.
- Required components:
- single-agent loop
- 3-4 tools
- session state
- streaming UI
- approval step
- basic eval dataset
- What skills it proves:
- single-agent fundamentals
- tool integration
- human-in-the-loop design
- basic evaluation discipline
Intermediate project: Stateful Research Analyst
- Description: a research agent that works across sessions, remembers preferences, loads relevant docs or notes on demand, and produces structured reports.
- Required components:
- short-term and long-term memory
- retrieval and note-taking
- compaction
- trace logging
- offline evals and trace grading
- cost and latency dashboard
- What skills it proves:
- context engineering
- memory and retrieval design
- observability
- evaluation at the workflow level
Advanced project: Production-style Incident or Operations Agent
- Description: an internal agent that investigates incidents or operational anomalies, gathers logs and telemetry, proposes actions, and routes risky actions through approvals. It can run long tasks in the background and delegate focused subtasks to specialists.
- Required components:
- workflow orchestration
- background execution
- specialist subagents
- approvals and escalation
- robust retries/timeouts/idempotency
- security controls and cost tagging
- business KPI dashboard
- What skills it proves:
- production operations thinking
- long-running agent design
- multi-agent tradeoff judgment
- governance, reliability, and lifecycle management
7. Recommended study order
This is the practical order I would recommend:
- Learn when not to use an agent.
- Build one single-agent loop with a few tools.
- Learn context engineering before chasing bigger architectures.
- Add state, memory, and retrieval.
- Improve tool design and integration quality.
- Add streaming UI and human approvals.
- Build evals before expanding scope.
- Add tracing, metrics, and dashboards.
- Harden security, governance, and reliability.
- Optimize cost and latency.
- Only then experiment with long-running and multi-agent systems.
Why this order works:
- It front-loads judgment, not just implementation.
- It teaches the core failure modes early.
- It avoids the most common mistake in modern agent work: jumping to multi-agent before mastering single-agent quality, context, and evals (OpenAI practical guide, Google Cloud architecture guidance).
8. Best resources by topic
Foundations
- OpenAI business leader guide
- Microsoft Agent Framework overview
Single-agent basics
- OpenAI practical guide
- OpenAI conversation state
Context engineering
- Anthropic context engineering
- OpenAI conversation state
Memory and retrieval
- Google Cloud Memory Bank
- Google Cloud architecture guidance
- Anthropic context engineering
Tooling and integrations
- Anthropic writing tools
- OpenAI MCP and connectors
- Google Cloud architecture guidance
UI and streaming UX
- OpenAI AgentKit
- OpenAI voice agents
- Google Cloud architecture guidance
Human-in-the-loop workflows
- OpenAI practical guide
- OpenAI agent safety
Evaluations and testing
- OpenAI agent evals
- OpenAI trace grading
- Microsoft Foundry evaluation results
Observability and tracing
- Microsoft observability
- Google Cloud ADK docs
- OpenAI trace grading
Safety, security, and governance
- OpenAI agent safety
- OpenAI MCP and connectors
- Microsoft governance for AI agents
- Microsoft AI platform governance
Reliability engineering
Multi-agent orchestration
- Google Cloud architecture guidance
- Anthropic subagents
- Microsoft Copilot Studio A2A
Cost and latency optimization
- Anthropic prompt caching
- OpenAI voice agents
Deployment, lifecycle, and product metrics
- Microsoft governance for AI agents
9. Final conclusion
The biggest lessons are not glamorous, but they are what matter:
- Start simple. A strong single-agent system beats a premature multi-agent design most of the time.
- Context engineering matters more than most teams expect. Prompting is only one part of the problem.
- Evals and observability are essential. If you cannot measure behavior and inspect traces, you cannot improve safely.
- Security and reliability are underestimated. Prompt injection, over-broad permissions, missing approvals, retries, and fallback behavior are first-class engineering work.
- Multi-agent is not the first step. It is an advanced optimization for specialization, isolation, or parallel exploration, not a default architecture.
If I were guiding a team from zero, I would insist on this path:
- Build one narrow agent that solves one expensive workflow.
- Give it a few excellent tools, not many mediocre ones.
- Add evals and tracing before adding more autonomy.
- Add memory only when the product truly needs state across turns or sessions.
- Add multi-agent only after you can prove the single-agent baseline is understood, measured, and limited.
That is how you move from “we built a cool demo” to “we know how to ship agents that help the business.”
Sources
OpenAI
- A practical guide to building agents
- A business leader’s guide to working with agents
- Introducing AgentKit
- Agent evals
- Trace grading
- Conversation state
- Safety in building agents
- MCP and connectors
- Voice agents
Anthropic
- Effective context engineering for AI agents
- Writing effective tools for AI agents
- Prompt caching
- Reduce hallucinations
- Create custom subagents
Google Cloud
- Choose your agentic AI architecture components
- Develop an Agent Development Kit agent
- Quickstart with Agent Development Kit Memory Bank