LLM

A topic hub collecting every article tagged LLM. Use it to explore related posts and follow this theme across the site.

20 articles

AI LLM AI Engineering Evaluation OpenAI Anthropic Google Software Architecture

Do New Frontier LLMs Really Resolve Ambiguity Better?

Frontier labs increasingly claim their models understand intent with less prompting. Here is what the evidence supports, what training changed, and how to test ask-versus-guess behavior in your own agents.

Jul 17, 2026 21 min read

Machine Translation Google Translate NMT LLM AI Engineering Architecture Production

Building Google Translate: Before LLMs and With Modern AI

How Google Translate evolved from rules and statistical phrase models to neural MT and today’s Translation LLM — and how to build a translation app in 2026 with the right mix of NMT, multilingual models, and LLMs.

Jul 3, 2026 19 min read

AI LLM AI Engineering OpenAI Inference Benchmarks Open Source

How Fast Is gpt-oss-120b? Speed, Quality, and Routing Tradeoffs in 2026

For engineers evaluating gpt-oss-120b: OpenAI's open-weight model is fast for an open reasoning checkpoint, but throughput depends heavily on the provider and benchmark view. Here is how to read Artificial Analysis, OpenRouter trends, and the cost tradeoffs behind fast inference.

Jul 3, 2026 20 min read

Graph RAG RAG AI Engineering Knowledge Graphs LLM Architecture Production

Graph RAG Apps: A Production Deep Dive and Recommended Stack

How to design a production Graph RAG app with graph construction, hybrid retrieval, durable hand-offs, evaluation gates, and a practical recommended stack for engineers and PMs.

Jul 2, 2026 22 min read

AI Agents Subagents RAG LLM Orchestration Production AI Engineering

How to Run 300+ Subagent Jobs: Is It Really Possible, and Are LLM/RAG Apps Ready?

A practical feasibility guide for engineers and PMs on processing hundreds of subagent-style jobs, where the real limits are, and what RAG quality, evals, queues, and budget controls must exist before this is production-safe.

Jun 26, 2026 21 min read

AI LLM Evaluation Software Architecture Anthropic Product

How to Improve LLM Behavior and Personality in 2026: Big Models, Small Models, and Modern Alignment

A source-backed guide for engineers and PMs on shaping model behavior and personality across frontier LLMs and small language models, from prompts and DPO to persona vectors and distillation chains.

Jun 24, 2026 21 min read

AI Engineering Learning Students Roadmap LLM RAG Agents Career

The Parallel Track: Learning AI Engineering While Your CS Degree Catches Up

A friendlier roadmap for curious CS students who want to build real AI products now: what to learn, what to build, which job signals matter, and how to stay grounded while the market moves fast.

Jun 16, 2026 13 min read

RAG AI Engineering Evaluation LLM Knowledge Management AI Agents

RAG Apps in Practice: Structured Knowledge, Better Retrieval, and Real Evals

A practical guide for engineers and PMs designing RAG apps over well-structured, chained knowledge: how to chunk, retrieve, rerank, cite, evaluate, and improve with evidence instead of vibes.

Jun 15, 2026 28 min read

AI Personalization LLM Product Engineering RAG Agents

LLM-Based Personalization in 2026: Cheap, Quick, and Efficient

A practical guide for engineers and PMs who want useful LLM personalization without a giant recommender team, expensive fine-tuning program, or fragile memory layer.

Jun 9, 2026 15 min read

AI LLM Claude Cursor AI Engineering Agents Coding

Claude Sonnet 4.6 1M vs Composer 2.5 Fast: A Practical LLM Comparison

A friendly, evidence-based comparison of Claude Sonnet 4.6 1M and Cursor Composer 2.5 Fast across speed, intelligence, coding, agents, cost, and product fit.

May 27, 2026 14 min read

AI LLM AI Engineering Context Engineering Claude Gemini DeepSeek Kimi

Part 2: Alternatives to Claude Opus 4.7 1M for Engineers in 2026

A practical engineering comparison of Claude Opus 4.7 1M alternatives: GPT-5.5, DeepSeek V4 Pro, DeepSeek V4 Flash, Gemini 3.1 Pro, Gemini 3.5 Flash, and Kimi K2.6.

May 22, 2026 20 min read

AI LLM Context Engineering RAG Product AI Engineering

1M vs 200K Context Windows: What Actually Changes for LLM Apps

A practical comparison of 1M-token and 200K-token LLM context windows: what gets easier, what still breaks, and how engineers and PMs should choose an architecture.

May 21, 2026 18 min read

AI Security AI Agents Secure Coding DevSecOps LLM Loop Engineering Hardening

A Secure Agentic Coding Process Is Not Just a Bigger LLM

The current evidence does not support trusting model size alone for secure code generation. Secure agentic coding needs threat modeling, constrained tools, scanners, evals, and human approval gates.

Apr 28, 2026 20 min read

AI Agents Prompt Engineering LLM Software Architecture Evaluation Anthropic OpenAI Loop Engineering Hardening

How to Write Robust System Prompts for AI Agents Across LLMs

There is no truly bulletproof system prompt. But there is a practical engineering standard for making prompts far more robust across Sonnet, Haiku, GPT, and reasoning-style models.

Apr 22, 2026 20 min read

Code Review AI Agents LLM Software Engineering Evaluation Loop Engineering Hardening

How to Build a Good Agentic Code Reviewer

Most AI code review bots fail for a simple reason: they optimize for visible comments instead of reviewer trust. This guide pulls together current benchmarks, practitioner reports, product limitations, and design patterns for building a code review agent that is faster, quieter, less biased, and less hallucination-prone.

Apr 15, 2026 23 min read

AI UX Communication LLM Product

Why Concise AI Responses Work Better: Evidence, Biases, and a Better Default for Engineers and PMs

The strongest evidence does not say 'always write less.' It says humans do better with lower cognitive load, clearer cues, and progressive disclosure. Here's how that changes how we should generate AI responses and reports.

Apr 9, 2026 11 min read

AI Evaluation LLM Testing LLMOps

How to Review AI-Generated Responses: A Practical Rubric for Engineers and PMs

A practical, source-backed workflow for reviewing AI-generated responses for factual accuracy and relevance, scoring them with structured rubrics, and turning feedback into better prompts, evals, and product decisions.

Apr 7, 2026 15 min read

AI Agents Evaluation LLM Testing LLMOps Observability

LLM-as-a-Judge for Agent Apps: Biases, Blind Spots, and Fixes

LLM-as-a-judge can be one of the most useful patterns in agent evaluation, but only if you understand where it breaks: order bias, self-preference, verbosity bias, weak judges, and evidence-free scoring. This guide explains the pattern, the common traps, and the fixes that make it practical.

Mar 30, 2026 17 min read

AI UX Conversational AI SaaS Security LLM

The Chat Pivot: Why Web Apps Are Replacing Menus with Conversations

From Cloudflare's Cloudy agent to GitHub Copilot in your issue tracker, the web is shifting toward conversational interfaces. Here's what's driving it, who's doing it right, and what security challenges remain.

Mar 20, 2026 9 min read

LangWatch AI Agents Evaluation LLM Testing Observability

Evaluating AI Agents with LangWatch: From Vibes to Scores

Unit tests tell you if your code works. Scenario tests tell you if your agent behaves. But how do you measure quality across hundreds of examples and track it over time? LangWatch evaluations fill that gap.

Mar 19, 2026 11 min read

Luis Mori Guerra
