How AI Helps Engineers Evolve and Scale Modern Apps
AI can help teams move faster, but the real unlock is designing agents, skills, evals, traces, and self-correction loops around your app.
Engineering Manager / Technical Lead
A topic hub collecting every article tagged AI Agents. Use it to explore related posts and follow this theme across the site.
35 articles
A practical ladder for growing a software system from one app and one database into observable, event-driven, permission-aware, AI-ready architecture.
A practical guide to Auth0 Client ID Metadata Documents, Auth for MCP, on-behalf-of token exchange, and secure agent access patterns for modern web apps.
A practical guide to using OpenFGA for fine-grained authorization in SaaS apps, MCP servers, workflow agents, and agent orchestrators.
A source-backed comparison of Agno's latest native eval docs and LangWatch Scenario's simulation-based testing model, with practical guidance on when to use each and how to combine them.
AI agents feel like they can touch every system. The practical answer is not more trust in the model, but database roles, row-level policies, semantic layers, tool scopes, approval gates, and audit trails.
The current evidence does not support trusting model size alone for secure code generation. Secure agentic coding needs threat modeling, constrained tools, scanners, evals, and human approval gates.
There is no truly bulletproof system prompt. But there is a practical engineering standard for making prompts far more robust across Sonnet, Haiku, GPT, and reasoning-style models.
AI agents are the new crawlers. Learn how to signal discoverability, serve machine-readable content, control bot access, and expose capabilities through MCP — with practical code examples.
Most AI code review bots fail for a simple reason: they optimize for visible comments instead of reviewer trust. This guide pulls together current benchmarks, practitioner reports, product limitations, and design patterns for building a code review agent that is faster, quieter, less biased, and less prone to hallucination.
A source-audited, side-by-side guide to choosing between Agno/AgentOS + FastAPI and LangChain/LangGraph + FastAPI for production Python agent backends.
A practical guide for engineers and PMs on how to lead fast iteration cycles for agentic systems, design useful sandbox UIs and APIs, and graduate the best prototypes into production experiments and real betas.
A practical guide to A2UI with real deployments, official sample patterns, and complete code examples that take you from a static renderer to server-driven, interactive agent UI.
Persistent agents are becoming product features, not just backend architecture. Learn how Dispatch, Cowork, computer use, and OpenAI background mode fit together, with complete TypeScript examples from zero to advanced.
A practical guide for engineers and architects designing an Astro blog platform with draft/publish workflows in D1, a lightweight markdown editor, and a Cloudflare Containers deployment for an Agno/AgentOS research service.
A practical engineer-first guide to turning a messy enterprise feature into specs, acceptance criteria, Beads task graphs, git worktrees, and parallel delivery across Cursor and Claude Code.
A source-audited translation of HumanLayer's 12-factor agent principles into practical LangGraph and Agno/AgentOS architecture, with production-minded Python examples.
A practical roadmap for software engineers who want to move from toy chatbots to production-grade AI agents, with the right study order, common gaps, and portfolio projects.
A practical guide for implementing LangWatch evaluations in Agno and AgentOS systems, from first traces and batch experiments to structured-output scoring, production monitors, and background evaluation hooks.
A practical, source-audited guide to designing subagents for real cloud workloads, with concrete patterns and tradeoffs across Agno, LangChain, and Vercel AI SDK.
Agent UI is becoming its own stack. Here's how ChatKit, A2UI, and MCP Apps fit together, where plain chat breaks down, and how to design structured interaction surfaces that actually help users get work done.
A source-audited, practical guide to building streaming APIs with LangChain and LangGraph, then consuming them cleanly with AI SDK from simple chat to durable agents.
LLM-as-a-judge can be one of the most useful patterns in agent evaluation, but only if you understand where it breaks: order bias, self-preference, verbosity bias, weak judges, and evidence-free scoring. This guide explains the pattern, the common traps, and the fixes that make it practical.
Why production agent evaluation is moving beyond output-only checks, how trace-aware grading complements scenario testing, and how LangWatch, LangSmith, and Langfuse compare.
A source-audited guide to where computer-use agents are already practical, where they still break, and how to deploy them safely for QA, legacy enterprise workflows, and browser automation.
A grounded look at how Cursor's subagents and skills fit with Claude Code's subagents, worktrees, and the new /simplify command for research, implementation, and cleanup.
A source-audited, architecture-first guide to deciding where agent state should live in production, with code examples in Vercel AI SDK, LangChain, and Agno/AgentOS.
A deep dive into why Astro is the best framework for fast, content-rich sites and why Agno is the simplest path to production-grade AI agents — from a Hello World agent to a full multi-agent workflow dashboard.
A practical, multi-level guide to building an agent app that researches topics, drafts markdown articles, writes them into an Astro site, monitors reliability with LangWatch, and opens a GitHub PR automatically using Agno, AgentOS, and OpenRouter.
How AI-assisted engineering workflows mature from a simple system prompt into skills, MCP tools, and full plugins — with real engineering examples at every stage.
Unit tests tell you if your code works. Scenario tests tell you if your agent behaves. But how do you measure quality across hundreds of examples and track it over time? LangWatch evaluations fill that gap.
A deep dive into LangWatch MCP Server — from basic setup to legendary architectural patterns for test-driven AI development and production inference monitoring.
A comprehensive, multi-level deep dive into testing and quality assurance for modern AI-powered systems — from static web apps to agentic pipelines and MCP servers — using LiteLLM, LangWatch, Agno, AgentOS, and the Claude Agent SDK.
From vibes to verification — how to build testable, reliable agent skills using LangWatch's simulation-based Scenario framework with multi-turn conversations, judges, and CI/CD integration.
A deep comparison of four approaches to building AI agents — OpenAI with raw fetch, Vercel AI SDK, Claude Agent SDK, and Agno with FastAPI — and which one you should pick.