Modern software design is not one giant architecture diagram.
It is a set of pieces you add when the system earns them: isolated environments, clear layers, observability, read models, event streams, contracts, authorization, failure handling, migration paths, and eventually AI-aware workflows.
The trap is adding all of them at the beginning. The opposite trap is waiting so long that every feature becomes risky. Good architecture lives between those mistakes.
This article is a practical ladder. It starts with one server and one database, then adds each design piece only when a real pain appears.
TL;DR
- Start with one app and one database unless the problem is already distributed.
- Separate development and production early. Most “it worked in the demo” pain is environment pain.
- Keep a single app modular before reaching for microservices.
- Add observability before optimization. You cannot tune what you cannot see.
- Split read and write paths when reads dominate and query shape becomes different from write shape.
- Use events when features need to react to state changes without editing the core service every time.
- Treat APIs and event schemas as contracts. Version them and test them.
- Keep authorization separate from data access once roles, teams, tenants, or agents enter the product.
- Design for failure with timeouts, retries, idempotency, circuit breakers, dead-letter queues, and graceful degradation.
- Modernize legacy systems slice by slice with a strangler fig pattern, not a big-bang rewrite.
- Treat AI as another system layer: model, tools, memory, permissions, evals, observability, and fallback behavior.
The shortest version: modern architecture is the discipline of adding boundaries at the moment they reduce risk more than they add complexity.
What You Will Learn Here
- How a system can grow from a simple CRUD app into a production architecture without over-engineering.
- Which pain usually justifies each architectural pattern.
- How NATS, OpenSearch, and OpenFGA map to common modern system-design problems.
- Where AI features and AI coding agents fit into the same architecture.
- How to decide which piece to add next, and which pieces to avoid for now.
The Ladder
Think of the system as moving through six stages:
Act 0     Act 1        Act 2         Act 3          Act 4        Act 5
simple -> organized -> observable -> distributed -> resilient -> evolving
app       app          data          services       system       system
Each act is a response to a pain:
| Pain | Design piece |
|---|---|
| “Testing on real user data is terrifying.” | Isolated environments |
| “Nobody knows where logic belongs.” | Layered modular structure |
| “Users say it is slow, but we are guessing.” | Observability |
| “Reads are drowning writes.” | CQRS and read models |
| “Every new feature edits the core service.” | Events |
| “Changing one service breaks another.” | Contracts and versioning |
| “Permissions are copied everywhere.” | Externalized authorization |
| “One outage breaks everything.” | Failure design |
| “The legacy system cannot be replaced safely.” | Strangler fig migration |
| “AI features need data, tools, and guardrails.” | AI as a first-class layer |
That table is the article in miniature. The rest explains how to use it.
Act 0: Start With One App and One Database
The best first architecture is usually boring:
Browser -> Server -> Database
One deployable app. One source of truth. One path to understand.
This is not a toy. A well-built monolith can carry a serious product for a long time, especially when one team owns the whole system. The problem is not starting simple. The problem is starting simple while leaving yourself no way to grow.
At this stage, avoid:
- a message bus with no asynchronous work
- microservices with one team
- CQRS before queries hurt
- a custom authorization service before roles are complex
- AI orchestration before there is a clear user problem
The useful discipline is to keep the first system small but not messy.
User
|
v
API / Web App
|
v
Database
The first real pain usually appears when people depend on the app and you need to change it safely.
Act 1: Make One App Safe to Change
The first design pieces are not about scale. They are about safety.
Isolate Environments
Development and production should be separate copies of the system:
| Development environment | Production environment |
|---|---|
| newest code | released code |
| sample or seeded data | real user data |
| test credentials | production credentials |
| safe to break | must stay available |
The Twelve-Factor App calls the first half of this dev/prod parity: keep development, staging, and production as similar as possible. The second half is isolation: each environment gets its own data, credentials, and runtime state.
This distinction explains a common product misunderstanding. “It works in dev” and “customers can use it” are different milestones. The missing step is release.
For PMs, this is not engineering bureaucracy. It is the difference between a demo, a staging validation, and a production launch.
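One way to make that isolation hard to bypass is to validate configuration at startup. The sketch below is illustrative only: the variable names APP_ENV and DATABASE_URL, and the convention that production database hosts contain "prod", are assumptions made for the example rather than anything the Twelve-Factor App prescribes.

```typescript
// Hypothetical startup check. APP_ENV and DATABASE_URL are assumed names.
type Environment = "development" | "staging" | "production";

type AppConfig = {
  environment: Environment;
  databaseUrl: string;
};

const ENVIRONMENTS: readonly Environment[] = ["development", "staging", "production"];

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const environment = env["APP_ENV"] as Environment | undefined;
  if (!environment || !ENVIRONMENTS.includes(environment)) {
    throw new Error(`APP_ENV must be one of: ${ENVIRONMENTS.join(", ")}`);
  }
  const databaseUrl = env["DATABASE_URL"];
  if (!databaseUrl) {
    throw new Error("DATABASE_URL is required");
  }
  // Assumed naming convention: production database hosts contain "prod".
  if (environment !== "production" && databaseUrl.includes("prod")) {
    throw new Error(`${environment} must not connect to a production database`);
  }
  return { environment, databaseUrl };
}
```

Passing the environment in as a plain record keeps the check easy to test. A real service would likely hand it the process environment once, at boot, and refuse to start on failure.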
Keep the App Modular
Before splitting into services, split the codebase into clear layers:
| Layer | Responsibilities |
|---|---|
| Edge | routing, auth entry points, rate limits |
| Application | use cases, workflows, commands |
| Domain | business rules and invariants |
| Data | repositories, queries, transactions |
| Async | jobs, queues, outbox, retries |
| Ops | logging, metrics, tracing, health checks |
Those layers can live inside one deployable app. That is a modular monolith.
The point is not ceremony. The point is that a future engineer should know where a new rule belongs. If “who can approve an invoice” appears in a controller, a SQL query, a React component, and a background job, the system is already drifting.
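As a rough illustration of giving that invoice rule a single home, the sketch below keeps the decision in one domain function and lets the application layer call it. The roles, the self-approval rule, and the amount threshold are invented for the example.

```typescript
// Domain layer: the only place that knows who may approve an invoice.
// Roles and the 100000-cent manager limit are illustrative assumptions.
type User = { id: string; role: "member" | "manager" | "finance" };
type Invoice = {
  id: string;
  amountCents: number;
  createdBy: string;
  status: "pending" | "approved";
};

function canApproveInvoice(user: User, invoice: Invoice): boolean {
  if (invoice.status !== "pending") return false;
  if (invoice.createdBy === user.id) return false; // no self-approval
  if (user.role === "finance") return true;
  return user.role === "manager" && invoice.amountCents <= 100000;
}

// Application layer: runs the use case and defers to the domain rule.
function approveInvoice(user: User, invoice: Invoice): Invoice {
  if (!canApproveInvoice(user, invoice)) {
    throw new Error(`user ${user.id} may not approve invoice ${invoice.id}`);
  }
  return { ...invoice, status: "approved" };
}
```

A controller, a background job, and the UI can all ask canApproveInvoice the same question instead of each carrying its own copy of the rule.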
At the end of Act 1, the app still looks simple from the outside:
User -> Edge -> Modular App -> Database
But inside, the concerns have homes. That makes the next stage possible.
Act 2: Measure Before You Scale
Scale problems are often described emotionally:
- “Search feels slow.”
- “The dashboard hangs sometimes.”
- “Checkout was weird last night.”
- “Customers say data is missing.”
Those are not yet engineering facts. Observability turns them into facts.
Add Observability
For a production app, you need at least three kinds of telemetry:
| Signal | What it tells you |
|---|---|
| Logs | what happened |
| Metrics | how often, how fast, how many |
| Traces | where a request went across system boundaries |
OpenTelemetry describes these as telemetry signals that can be generated, collected, and exported through a vendor-neutral framework. The tooling matters less than the habit: every important flow should have enough evidence to debug it later.
For each critical user journey, track:
- request rate
- error rate
- latency, especially p95 and p99
- saturation, such as queue depth or connection pool usage
- a correctness signal, such as “orders created” or “todos indexed”
That last one is easy to skip and painful later. A system can return HTTP 200 while silently doing the wrong thing.
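To make the latency and error-rate bullets concrete, here is a minimal in-memory sketch that summarizes request samples. It assumes the nearest-rank percentile definition and a hypothetical RequestSample shape. A production system would normally get these numbers from its metrics backend rather than compute them by hand.

```typescript
// Hypothetical sample shape: one entry per handled request.
type RequestSample = { durationMs: number; ok: boolean };

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function summarize(samples: RequestSample[]) {
  const durations = samples.map((s) => s.durationMs);
  const errors = samples.filter((s) => !s.ok).length;
  return {
    count: samples.length,
    errorRate: errors / samples.length,
    p95Ms: percentile(durations, 95),
    p99Ms: percentile(durations, 99),
  };
}
```

Averages hide the slow tail, which is why the summary reports p95 and p99 rather than a mean.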
Split Reads From Writes When Query Shape Demands It
At some point, the write model and the read model want different shapes.
Writes want correctness:
- normalized tables
- transactions
- constraints
- invariants
Reads want speed:
- denormalized documents
- precomputed views
- search indexes
- cached aggregates
CQRS, or Command Query Responsibility Segregation, names that split.
Write path
----------
Create todo -> App service -> Write database
                                 |
                                 | project changes
                                 v
Read path                   Read model
---------                   ----------
Search todos -> Read API -> OpenSearch / cache / materialized view
The read model is not the source of truth. It is a purpose-built copy.
This design buys speed and query flexibility, but it introduces a new truth: eventual consistency. A user may create something, then wait a moment before it appears in search. That is acceptable for some flows and unacceptable for others.
Use this pattern when the read side has genuinely outgrown the write side. Do not use it because a diagram looks more modern with two databases.
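A small in-memory sketch of the idea, under the assumption of a todo app with separate user and todo tables: the write side stays normalized, and a projection function builds the denormalized document the read side serves. All type and field names here are hypothetical.

```typescript
// Write-side rows: normalized, one fact per table.
type TodoRow = { id: string; userId: string; title: string; completed: boolean };
type UserRow = { id: string; displayName: string };

// Read-side document: denormalized for the query the UI actually runs.
type TodoDocument = { id: string; title: string; completed: boolean; ownerName: string };

function projectTodos(todos: TodoRow[], users: UserRow[]): TodoDocument[] {
  const namesById = new Map<string, string>();
  for (const user of users) namesById.set(user.id, user.displayName);
  return todos.map((todo) => ({
    id: todo.id,
    title: todo.title,
    completed: todo.completed,
    ownerName: namesById.get(todo.userId) ?? "unknown",
  }));
}

// The read side answers queries without joining at request time.
function searchTodos(docs: TodoDocument[], text: string): TodoDocument[] {
  const needle = text.toLowerCase();
  return docs.filter((doc) => doc.title.toLowerCase().includes(needle));
}
```

The projection can be rebuilt from the write model at any time, which is what makes the read model a disposable copy rather than a second source of truth.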
Concrete Implementation: A Read Model Pipeline
Here is a small implementation shape that appears in many real systems:
Write database -> Outbox table -> Publisher -> NATS -> Projector -> OpenSearch
                                                                        |
                                                                        v
                                                                     Read API
                                                                        |
                                                                        v
                                                                  OpenFGA check
The write transaction stores both the business change and an outbox event:
BEGIN;
INSERT INTO todos (id, user_id, title, completed)
VALUES (:id, :user_id, :title, false);
INSERT INTO outbox_events (id, topic, payload, created_at)
VALUES (
  :event_id,
  'todo.created',
  json_build_object(
    'todo_id', :id,
    'user_id', :user_id,
    'title', :title
  ),
  now()
);
COMMIT;
A publisher reads unpublished outbox rows and sends them to the event bus:
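type OutboxEvent = {
  id: string;
  topic: string;
  payload: unknown;
  createdAt: Date; // maps to outbox_events.created_at
};
async function publishOutboxBatch(events: OutboxEvent[]) {
  for (const event of events) {
    await messageBus.publish(event.topic, {
      id: event.id,
      // Use the time the change was recorded, not the time it was published.
      occurredAt: event.createdAt.toISOString(),
      data: event.payload,
    });
    // A crash between publish and markPublished re-sends the event on the
    // next run (at-least-once delivery), so consumers must be idempotent.
    await markPublished(event.id);
  }
}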
A projector updates the read model idempotently:
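type TodoCreated = {
  id: string;
  data: {
    todo_id: string;
    user_id: string;
    title: string;
  };
};
async function handleTodoCreated(event: TodoCreated) {
  // Skip events that were already applied (delivery is at-least-once).
  if (await alreadyProcessed(event.id)) {
    return;
  }
  // Indexing by todo_id means a repeated write overwrites the same document.
  // Assuming the official OpenSearch JavaScript client, which takes the
  // document under `body`.
  await openSearch.index({
    index: "todos",
    id: event.data.todo_id,
    body: {
      title: event.data.title,
      owner_id: event.data.user_id,
      completed: false,
    },
  });
  await markProcessed(event.id);
}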
The details change by stack, but the shape is stable:
- write once to the source of truth
- publish changes reliably
- update read models in the background
- make projectors idempotent
- measure lag between write and read visibility
That lag becomes an operational signal.
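One simple way to turn that lag into a number, sketched under assumptions: treat lag as the age of the oldest outbox event that has not yet reached the read model, and compare it to alert thresholds. The function names and the 5-second and 60-second thresholds are made up for illustration.

```typescript
// Lag = age of the oldest event still waiting to be projected.
// Pass null when nothing is pending.
function projectionLagMs(oldestPendingCreatedAt: Date | null, now: Date): number {
  if (oldestPendingCreatedAt === null) return 0;
  return Math.max(0, now.getTime() - oldestPendingCreatedAt.getTime());
}

// Thresholds are assumptions; pick them from what users can tolerate.
function lagStatus(
  lagMs: number,
  warnAfterMs = 5000,
  pageAfterMs = 60000,
): "ok" | "warn" | "page" {
  if (lagMs >= pageAfterMs) return "page";
  if (lagMs >= warnAfterMs) return "warn";
  return "ok";
}
```

Exported as a gauge, this value answers "how stale is search right now?" without anyone having to reproduce a user report.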
Act 3: Split Systems Along Real Boundaries
Microservices are not the next step after “we have files.” They are the next step after independent parts of the system need independent ownership, scaling, release cadence, or failure isolation.
The design problem changes once there is more than one service. Calls cross boundaries. Schemas drift. Permissions duplicate. Debugging gets harder.
Use Events When Services Need to React
Direct calls couple services tightly:
Todo service -> Email service
Todo service -> Analytics service
Todo service -> Search projector
Todo service -> Notification service
Events invert that relationship:
Todo service -> todo.completed event -> Message bus
                                          -> Email worker
                                          -> Analytics worker
                                          -> Search projector
                                          -> Notification worker
The producer announces what happened. Consumers decide what to do.
NATS is one practical open-source option here. Core NATS supports pub/sub and request/reply. JetStream adds persistence, durable consumers, replayable streams, and key-value capabilities. That makes it useful when you need lightweight messaging without adopting a heavier event platform immediately.
The tradeoff: events make flows more flexible, but less obvious. You need naming conventions, schema ownership, tracing, dead-letter handling, and replay procedures.
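The inversion is easier to see in code. The sketch below uses a tiny in-memory bus as a stand-in; it is not the NATS client API, and the class and subject names are hypothetical. A real bus would also deliver asynchronously and persist messages.

```typescript
type EventEnvelope = { id: string; subject: string; data: unknown };
type Handler = (event: EventEnvelope) => void;

// In-memory stand-in for a message bus (not the NATS API).
class InMemoryBus {
  private handlers = new Map<string, Handler[]>();

  subscribe(subject: string, handler: Handler): void {
    const existing = this.handlers.get(subject) ?? [];
    this.handlers.set(subject, [...existing, handler]);
  }

  publish(event: EventEnvelope): void {
    for (const handler of this.handlers.get(event.subject) ?? []) {
      handler(event);
    }
  }
}

const bus = new InMemoryBus();
const sideEffects: string[] = [];

// Consumers register themselves; the producer does not know they exist.
bus.subscribe("todo.completed", (e) => sideEffects.push(`email:${e.id}`));
bus.subscribe("todo.completed", (e) => sideEffects.push(`analytics:${e.id}`));

// The producer only announces what happened.
bus.publish({ id: "evt-1", subject: "todo.completed", data: { todoId: "t-1" } });
```

Adding a notification worker later means one more subscribe call, with no change to the code that publishes.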
Treat Contracts as Code
Once multiple services share APIs or events, every boundary is a contract.
Examples:
GET /v1/todos
POST /v1/todos
event todo.created.v1
event todo.completed.v2
Version numbers are not decoration. They tell consumers what can change safely.
The executable version of a contract is a test:
- OpenAPI validation for HTTP APIs
- schema validation for events
- contract tests between producers and consumers
- integration tests for the highest-risk flows
This matters even more with AI coding agents. Agents can change code quickly, but they still need an oracle. Contracts and tests are the oracle.
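As a minimal sketch of schema validation for an event, the type guard below checks an incoming payload against a todo.created.v1 shape before a consumer trusts it. The data fields mirror the outbox payload used earlier in this article; the type field carrying the version is an assumption, and many teams would use a schema library instead of hand-written checks.

```typescript
type TodoCreatedV1 = {
  type: "todo.created.v1"; // assumed envelope field carrying the version
  data: { todo_id: string; user_id: string; title: string };
};

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Runtime check for a payload that arrives as untyped JSON.
function isTodoCreatedV1(value: unknown): value is TodoCreatedV1 {
  if (!isRecord(value) || value["type"] !== "todo.created.v1") return false;
  const data = value["data"];
  return (
    isRecord(data) &&
    typeof data["todo_id"] === "string" &&
    typeof data["user_id"] === "string" &&
    typeof data["title"] === "string"
  );
}
```

Running the same guard in the producer's test suite and at the consumer's entry point is one cheap form of contract test: a producer change that drops a field fails before it ships.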
Separate Authorization From Data Access
Authorization gets complicated when the product gets useful.
At the beginning, this may be enough:
if (user.role !== "admin") {
  throw new ForbiddenError();
}
Later, the question becomes:
Can this user, service account, or agent
perform this action
on this workspace, project, document, invoice, or tool
under this task context?
At that point, permission logic scattered through services becomes dangerous.
A cleaner design separates data retrieval from authorization:
Read API -> OpenSearch "Which records match the query?"
Read API -> OpenFGA "Which matching records can this subject see?"
Read API -> User "Return only authorized records."
OpenFGA is one open-source implementation of relationship-based access control inspired by Google’s Zanzibar paper. It stores relationship tuples, evaluates an authorization model, and answers permission checks such as:
Can user:luis view document:roadmap?
Can agent:triage read ticket:123?
Can service:billing refund invoice:456?
The separation gives you a debugging advantage:
| Data exists? | Permission exists? | Likely problem |
|---|---|---|
| yes | yes | API or UI bug |
| yes | no | authorization bug |
| no | yes/no | data sync or creation bug |
This is especially useful in systems with read models, search indexes, tenants, and agents. You can inspect the data path and permission path independently.
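Here is a rough sketch of that two-step read, with an in-memory tuple store standing in for the authorization service. It is not the OpenFGA SDK: the TupleStore class, the viewer relation, and the document: prefix are assumptions chosen to resemble the checks listed above.

```typescript
type Tuple = { subject: string; relation: string; object: string };

// In-memory stand-in for an external authorization service.
class TupleStore {
  private tuples = new Set<string>();

  private key(t: Tuple): string {
    return `${t.subject}#${t.relation}@${t.object}`;
  }

  write(t: Tuple): void {
    this.tuples.add(this.key(t));
  }

  check(t: Tuple): boolean {
    return this.tuples.has(this.key(t));
  }
}

type Candidate = { id: string; title: string };

// Step 1 (not shown): the search index returns candidates for the query.
// Step 2: keep only the candidates this subject is allowed to view.
function authorizedResults(
  subject: string,
  candidates: Candidate[],
  authz: TupleStore,
): Candidate[] {
  return candidates.filter((doc) =>
    authz.check({ subject, relation: "viewer", object: `document:${doc.id}` }),
  );
}
```

Because the two steps are separate, a missing result can be traced to either the candidate list or the permission check, which is exactly the split the debugging table relies on.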
Act 4: Design for Failure
Distributed systems fail in ordinary ways:
- the network drops
- a dependency slows down
- a message arrives twice
- a consumer crashes halfway through
- a search index lags
- a permission tuple is missing
- an AI model times out
Production design assumes this will happen.
The basic toolkit:
| Piece | What it prevents |
|---|---|
| Timeouts | Waiting forever |
| Retries | Failing on a transient error |
| Idempotency | Double-processing messages |
| Circuit breakers | Cascading dependency failure |
| Dead-letter queues | One bad message blocking all work |
| Graceful degradation | Optional features taking down core flows |
| Rollbacks | Bad deployments staying bad |
Idempotency is the most important one once events exist:
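async function handleEvent(event: Event) {
  // Skip events that have already been handled.
  if (await processedEvents.has(event.id)) {
    return;
  }
  await doWork(event);
  // Record the ID only after the work succeeds, so a failed attempt is
  // retried. doWork itself must tolerate running twice after a crash here.
  await processedEvents.add(event.id);
}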
Graceful degradation is equally important for product experience:
async function createTodo(input: CreateTodoInput) {
  const todo = await saveTodo(input);
  try {
    // Optional enrichment: a failure here must not fail the request.
    const suggestions = await aiSuggestions.forTodo(todo);
    await saveSuggestions(todo.id, suggestions);
  } catch (error) {
    // Log the degraded path so the failure stays visible to the team.
    logger.warn({ error, todoId: todo.id }, "AI suggestions unavailable");
  }
  return todo;
}
The core action succeeds even if the optional AI feature fails.
But graceful degradation has a trap: it hides failure from users, so it can hide failure from the team. Every degraded path should emit logs, metrics, or traces. Otherwise the app can be broken quietly.
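Circuit breakers from the toolkit table follow the same spirit. The sketch below is a deliberately small, synchronous version with an injectable clock so its behavior is easy to test; real dependency calls are asynchronous, and the threshold and cooldown values are placeholders rather than recommendations.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  call<T>(operation: () => T, fallback: () => T): T {
    if (this.state === "open") {
      // While open, skip the dependency entirely until the cooldown ends.
      if (this.now() - this.openedAt < this.cooldownMs) return fallback();
      this.state = "half-open"; // cooldown elapsed: allow one trial call
    }
    try {
      const result = operation();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch {
      this.failures += 1;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}
```

The breaker's state is itself worth exporting as a metric, for the reason given above: a fallback that works well can otherwise hide an outage.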
Act 5: Evolve a Living System
Once a system is valuable, replacing it becomes risky. The goal shifts from “build the new thing” to “change the running thing without breaking it.”
Use the Strangler Fig Pattern for Legacy Migration
The strangler fig pattern puts a facade in front of the old system, then routes one capability at a time to the new system:
Request -> Facade / router
|
+-> legacy system for old routes
|
+-> new system for migrated routes
The default should usually be legacy until a slice is proven safe. Useful techniques include:
- routing by endpoint, tenant, feature flag, or cohort
- X-Served-By headers to identify which system handled a response
- shadow traffic to compare old and new behavior
- change data capture to keep new read models in sync
- canary rollout before broad migration
- rollback paths for every migrated slice
This is how modernization ships value continuously instead of disappearing into a rewrite.
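The routing decision inside the facade can be sketched as a pure function. Everything here is hypothetical (the MigrationRule shape, the tenant names, the header value), but it shows the two properties that matter: legacy is the default, and switching a flag off is the rollback.

```typescript
type Target = "legacy" | "new";

type MigrationRule = {
  pathPrefix: string;
  tenants: "all" | string[]; // cohort that has been migrated
  enabled: boolean; // feature flag; turning it off is the rollback path
};

function chooseTarget(path: string, tenantId: string, rules: MigrationRule[]): Target {
  for (const rule of rules) {
    if (!rule.enabled || !path.startsWith(rule.pathPrefix)) continue;
    if (rule.tenants === "all" || rule.tenants.includes(tenantId)) return "new";
  }
  return "legacy"; // default until a slice is proven safe
}

// The facade tags each response so anyone debugging can see who served it.
function route(path: string, tenantId: string, rules: MigrationRule[]) {
  const target = chooseTarget(path, tenantId, rules);
  return { target, headers: { "X-Served-By": target } };
}
```

Widening a rollout is then a data change (add a tenant, or set tenants to "all") rather than a code change.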
Treat AI as a First-Class Layer
AI features are not just “call the model here.”
A production AI feature usually has several pieces:
User request
|
v
AI orchestration layer
|
+-> model
+-> tools
+-> retrieval / vector store
+-> authorization checks
+-> evaluation and tracing
+-> fallback behavior
The vector store is just another read model, optimized for semantic retrieval. OpenSearch can serve this role through vector search, and Postgres with pgvector can be enough when staying close to the primary database is more valuable than specialized search infrastructure.
The same architecture rules still apply:
- AI calls need timeouts and fallbacks.
- AI tools need authorization.
- AI output needs evaluation, not just uptime checks.
- AI traces should show which tools, documents, prompts, and model calls influenced the result.
- Sensitive data should be filtered before retrieval and before tool invocation.
AI does not remove system design. It raises the cost of sloppy boundaries.
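To illustrate the authorization and tracing rules together, the sketch below wraps every tool call in a permission check and records the attempt either way. The ToolCall shape, the policy callback, and the tool names are assumptions for the example; in a real system the policy callback would delegate to whatever authorization service the rest of the product uses.

```typescript
type ToolCall = { agent: string; tool: string; resource: string };
type TraceEntry = ToolCall & { allowed: boolean };
type Tool = (resource: string) => string;

function invokeTool(
  call: ToolCall,
  tools: Record<string, Tool>,
  isAllowed: (call: ToolCall) => boolean,
  trace: TraceEntry[],
): string {
  const allowed = isAllowed(call);
  trace.push({ ...call, allowed }); // record denied attempts too
  if (!allowed) {
    throw new Error(`${call.agent} may not use ${call.tool} on ${call.resource}`);
  }
  const tool = tools[call.tool];
  if (!tool) {
    throw new Error(`unknown tool: ${call.tool}`);
  }
  return tool(call.resource);
}
```

The model never calls a tool directly. It asks the orchestration layer, which decides, records, and only then executes.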
Let Agents Help Engineering, With Guardrails
AI coding agents fit into this architecture too. A useful agent loop looks like this:
Task -> Agent reads context -> edits code -> runs tests -> fixes failures -> opens PR
                                   ^                             |
                                   |                             v
                                   +------ test feedback <-------+
That loop is only safe when earlier pieces exist:
- isolated dev environments
- clear architecture boundaries
- tests and contract checks
- source control
- CI
- human review
- permission boundaries for tools and secrets
A stronger self-correction loop adds production evidence:
Observe regression -> classify cause -> propose patch -> test in dev
        ^                                                    |
        |                                                    v
Rollback if unhealthy <- canary release <- human approval <- PR
This is not magic self-healing. It is architecture wired into a feedback loop. Observability detects the problem. Contracts define expected behavior. The dev environment contains the experiment. Tests judge the candidate fix. Canary rollout limits risk. Rollback stops the bleeding.
The more disciplined the system, the more useful autonomy becomes.
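The canary step in that loop comes down to a comparison between the current release and the candidate. A minimal sketch, assuming hypothetical metric names and thresholds (500 requests of evidence, at most one percentage point more errors, at most 20% slower at p95):

```typescript
type ReleaseMetrics = { requests: number; errors: number; p95Ms: number };
type CanaryDecision = "promote" | "rollback" | "wait";

function decideCanary(
  baseline: ReleaseMetrics,
  canary: ReleaseMetrics,
  minRequests = 500,
  maxErrorRateIncrease = 0.01,
  maxLatencyRatio = 1.2,
): CanaryDecision {
  if (canary.requests < minRequests) return "wait"; // not enough evidence yet
  const baselineErrorRate = baseline.errors / Math.max(baseline.requests, 1);
  const canaryErrorRate = canary.errors / canary.requests;
  if (canaryErrorRate - baselineErrorRate > maxErrorRateIncrease) return "rollback";
  if (canary.p95Ms > baseline.p95Ms * maxLatencyRatio) return "rollback";
  return "promote";
}
```

Whether the patch came from a person or an agent, the gate is the same. That is the sense in which autonomy depends on the earlier pieces.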
The Full Architecture in One Picture
A mature version of the original simple app might look like this:
One environment: dev, staging, or prod

User / Client
     |
     v
Edge: auth entry point, routing, rate limits
     |
     v
Application services
     |
     +-> Write database
     |        |
     |        v
     |   Outbox events
     |        |
     |        v
     +-> Message bus: NATS / Kafka / Pub/Sub
              |
              +-> Search projector -> OpenSearch read model
              +-> Email worker
              +-> Analytics worker
              +-> AI orchestration
                       |
                       +-> model
                       +-> vector store
                       +-> tools

Read API
     |
     +-> OpenSearch for candidate data
     +-> OpenFGA for permission checks
     |
     v
Authorized response

Ops across everything:
     logs, metrics, traces, alerts, SLOs, deploys, rollbacks
Do not read this as the starting point. Read it as the result of many justified steps.
A Practical Decision Matrix
Use symptoms, not fashion, to choose the next piece.
| Symptom | Reach for | Avoid |
|---|---|---|
| Live data is being used for testing | Environment isolation | More manual caution |
| Logic is duplicated across UI, API, and jobs | Modular layers | A service split too early |
| Nobody can explain a slowdown | Observability | Blind caching |
| Queries need search, filtering, and aggregation at scale | Read model / CQRS | Making the write schema serve every read |
| Features react to the same state change | Event bus | More direct service calls |
| Consumers break after producer changes | Contracts and versioning | Slack-based coordination |
| Permissions differ by workspace, object, role, or agent | Externalized authorization | Copy-pasted if checks |
| Messages arrive twice or dependencies flap | Idempotency, retries, timeouts | Assuming the happy path |
| Legacy replacement is risky | Strangler fig migration | Big-bang rewrite |
| AI features need private data or tools | AI layer with auth, evals, tracing | Raw model calls from random services |
| AI coding agents create risky diffs | Specs, tests, CI, review gates | Trusting generated code because it compiles |
The point is not to use every pattern. The point is to know what pain each pattern is meant to cure.
Common Mistakes
Splitting Services Before Splitting Concepts
If the monolith is tangled, microservices will usually distribute the tangle. First clarify module boundaries inside one app. Then split the parts that need independent ownership or scaling.
Adding CQRS Without Measuring
CQRS adds a synchronization problem. Use it when reads genuinely need a different model, not because read/write separation sounds sophisticated.
Treating Events as Invisible Function Calls
Events need ownership, schemas, replay rules, and observability. If nobody owns an event contract, every consumer owns the fallout.
Putting Authorization Only at the UI
UI checks improve experience. They do not protect data. Permission enforcement belongs server-side, close to every read, write, tool call, and background action.
Bolting AI Onto the Side
AI features still need security, evaluation, observability, and fallback behavior. A model call is an integration point, not a product architecture.
Conclusion
Modern software design is the art of adding structure at the right time.
Start simple. Keep the code modular. Isolate environments. Measure the system. Split reads from writes when the data demands it. Use events when reactions multiply. Treat contracts and permissions as first-class boundaries. Design for failure. Migrate incrementally. Add AI as a governed layer, not a shortcut around engineering.
None of these pieces are exotic on their own. The skill is sequencing them.
Architecture becomes less scary when every box in the diagram answers one question: what pain does this solve?
Sources
- The Twelve-Factor App: Dev/Prod Parity - source for the dev/prod parity framing.
- OpenTelemetry documentation - vendor-neutral observability framework for logs, metrics, and traces.
- NATS JetStream documentation - persistence, durable messaging, and replayable streams in NATS.
- NATS JetStream Key/Value Store - official documentation for JetStream-backed key-value buckets.
- OpenSearch vector search documentation - official documentation for vector search with OpenSearch.
- OpenFGA documentation - relationship-based access control concepts inspired by Zanzibar.
- OpenFGA CNCF project page - CNCF status and project description.
- Martin Fowler: Strangler Fig Application - original framing of the strangler fig application pattern.
- Azure Architecture Center: Strangler Fig Pattern - practical guidance for gradual migration using the pattern.
- Google Zanzibar paper - foundational paper behind large-scale relationship-based authorization systems.