The New AI Layer in Software Architecture: LLMs, RAG, Agents, and MCP Explained for Real-World Engineers
Part 1 — Architectural Patterns for Generative AI (SEArt Series)
Every few years, something arrives that forces us to rethink the way we design systems.
Microservices changed how we structure complexity.
Event-driven architecture changed how we communicate.
Cloud changed how we deliver and scale.
And now AI is disrupting—again—but with far more noise, confusion, and buzzwords than any shift we’ve seen before.
LLMs.
RAG.
Agents.
Guardrails.
MCP.
Vector DBs.
Tool-use.
Everyone talks about these things, but very few explain how they actually fit into a real software architecture—not a demo, not a toy notebook, but a true system with bounded contexts, SLAs, budgets, observability, failure modes, and users who will complain when something breaks at the worst possible time.
This post aims to do just that:
Give you a mental map of the AI layer from the perspective of a software architect.
No hype. No magic. No unicorn dust.
Just architecture.
Let’s dive in.
LLMs: A New Probabilistic Component in Our Architecture
A Large Language Model isn’t magic.
From an architectural perspective:
An LLM is a probabilistic function with expensive inference and unpredictable output.
It does reason.
But not like a rules engine.
Not like code.
And definitely not like a deterministic microservice.
This is why you don’t put an LLM inside your core workflows.
Instead, you isolate it behind:
an LLM Gateway (OpenAI, Bedrock, Vertex AI, etc.)
a Prompt Library
safety filters
rate limits
logging/observability
The LLM layer becomes a boundary, not a dependency.
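To make that concrete, here's a minimal sketch of such a gateway. Everything in it is illustrative: `provider_call` stands in for whatever SDK you actually use (OpenAI, Bedrock, Vertex AI), and the limits are placeholders, not recommendations.

```python
import logging
import time

logger = logging.getLogger("llm_gateway")

class LLMGateway:
    """A thin boundary around a model provider: one place for rate limits,
    logging, and basic output checks."""

    def __init__(self, provider_call, max_calls_per_minute: int = 60):
        self.provider_call = provider_call      # injected function that hits the real provider
        self.min_interval = 60.0 / max_calls_per_minute
        self._last_call = 0.0

    def complete(self, prompt_id: str, prompt: str) -> str:
        # Naive client-side rate limit: space calls out over time.
        wait = self.min_interval - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.monotonic()

        logger.info("llm_call prompt_id=%s prompt_chars=%d", prompt_id, len(prompt))
        output = self.provider_call(prompt)

        # Minimal output check: the model's answer is untrusted input to the rest of the system.
        if not output or len(output) > 20_000:
            raise ValueError(f"rejected LLM output for prompt {prompt_id}")
        return output
```

The point is the choke point: every LLM call in the system passes through one place where you can observe, constrain, and reject it.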
RAG: The Pattern That Gives LLMs Real Knowledge
LLMs know a lot… about the public internet.
But they don’t know your domain:
your internal APIs
your codebase
your architecture
your business rules
your documentation
your support tickets
your catalog
your incidents
That’s where Retrieval-Augmented Generation (RAG) comes in.
RAG enriches the model with context retrieved from your knowledge sources.
The architecture consists of:
Embedder — converts text into vectors
Vector Database — stores embeddings (Pinecone, Weaviate, Qdrant)
Retriever — finds the most relevant pieces at runtime
LLM — uses retrieved knowledge to generate grounded answers
RAG is not “just” a technique—it’s a full architectural pattern of knowledge retrieval.
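As a minimal sketch of that pipeline (with `embed` and `llm` injected as stand-ins for your real embedding model and LLM gateway, and an in-memory list standing in for a real vector database):

```python
import math

def cosine(a, b):
    # Similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """In-memory stand-in for a real vector database (Pinecone, Weaviate, Qdrant)."""

    def __init__(self, embed):
        self.embed = embed              # injected Embedder: str -> list[float]
        self.items = []                 # (vector, original text) pairs

    def index(self, texts):
        for text in texts:
            self.items.append((self.embed(text), text))

    def retrieve(self, query, k=3):
        # The Retriever: rank stored chunks by similarity to the query.
        qv = self.embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(item[0], qv), reverse=True)
        return [text for _, text in ranked[:k]]

def answer_with_rag(llm, store, question):
    # Ground the model: retrieved chunks go into the prompt as context.
    context = "\n".join(store.retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```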
Agentic Patterns: When the LLM Doesn’t Just Answer… It Acts
RAG gives an LLM memory.
Agents give an LLM capabilities.
An agent is:
An LLM that can choose tools and perform actions autonomously within defined boundaries.
Tools can be:
APIs
functions
calculations
database queries
workflow steps
MCP tools
Agents can:
take multiple steps
plan
decide
execute tasks
correct themselves
This is powerful—and dangerous—if not constrained.
Good agents behave like smart assistants.
Bad agents behave like unsupervised interns with admin privileges.
Agents are a full architectural pattern: autonomous reasoning with tool execution.
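The skeleton behind most agent frameworks is a loop like the sketch below. The `llm_decide` function and the decision format are assumptions for illustration, not any specific framework's API:

```python
def run_agent(llm_decide, tools: dict, goal: str, max_steps: int = 5):
    """Minimal agent loop. llm_decide asks the model for the next step and
    returns either {"tool": name, "args": {...}} or {"final": answer}."""
    history = []
    for _ in range(max_steps):                       # hard step budget: one of the "defined boundaries"
        decision = llm_decide(goal, history)
        if "final" in decision:                      # the model decided it is done
            return decision["final"]
        tool = tools.get(decision["tool"])           # only pre-registered tools are callable
        if tool is None:
            history.append(("error", f"unknown tool: {decision['tool']}"))
            continue
        result = tool(**decision.get("args", {}))
        history.append((decision["tool"], result))   # feed results back so the model can self-correct
    raise RuntimeError("agent exceeded its step budget")
```

Notice the two boundaries: a hard step budget and a registry of pre-approved tools. That is the difference between the smart assistant and the unsupervised intern.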
MCP: The Enabler That Makes Tool-Use and Agentic Workflows Actually Scalable
There’s a piece of this whole story that isn’t an architectural pattern by itself, but has quickly become one of the most important enablers in modern AI systems:
MCP — Model Context Protocol.
If LLMs are the reasoning layer,
and RAG is the memory layer,
and agents are the autonomy layer,
MCP is the interface layer — the part that allows models and agents to interact with real systems in a standardized, safe and introspectable way.
Before MCP, integrating tools into an LLM or an agent was fragile and inconsistent:
Each provider (OpenAI, Anthropic, local models) had its own API format.
Tool definitions lived inside prompts (yes, really).
Schema mismatches caused silent failures.
There was no discovery mechanism: you had to “tell” the model what it could do.
Tool-use was tightly coupled to specific model vendors.
Security controls were improvised and scattered.
MCP changes this dynamic completely.
It introduces a unified contract describing:
what tools exist
what actions they expose
what parameters they require
what schemas they follow
what capabilities are allowed
With this, an LLM or an agent can discover tools, understand their structure, and use them safely — without relying on brittle prompt instructions.
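To give a feel for that contract, here's a descriptor for one of the tools we'll use in the example below, in the name/description/inputSchema shape MCP servers advertise (the tool itself is invented for this article):

```python
# A tool contract in the shape an MCP server advertises:
# a name, a human-readable description, and a JSON Schema for its input.
payments_query_tool = {
    "name": "payments-query",
    "description": "Return payments filtered by due date.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "due_date": {"type": "string", "format": "date"},
        },
        "required": ["due_date"],
    },
}
```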
In other words:
MCP is not an architecture pattern — it’s the enabler that makes agentic systems, tool-use, automation and safe AI orchestration actually viable.
It becomes a clean, vendor-independent bridge between:
your microservices
your automations
your data sources
your business actions
and the AI layer
A Real Use Case Where I Applied MCP (High-Level View)
To make this more tangible, here’s a simplified real scenario I implemented.
Imagine a user asks the system:
“Send the list of payments due today to Charles Sant.”
An LLM alone cannot perform this task — it needs real system capabilities.
Through MCP, the LLM can:
discover which tools exist
inspect their schemas
call the right tool with structured data
use the result to perform a follow-up action
At a high level, the flow works end-to-end like this: an LLM, an MCP Client, and multiple MCP Tool Servers coordinate to fulfill a real user request:
1. The LLM discovers available tools
It doesn’t “guess” or rely on a prompt description.
It asks the MCP Client:
“What tools do I have?”
The MCP Client replies with two tools: payments-query and people-notifier.
These tools are introspectable because MCP provides structured schemas.
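Illustratively, the discovery result is just structured data (the descriptions are invented for this example):

```python
# What the MCP Client can report back after listing tools (illustrative).
available_tools = [
    {"name": "payments-query",  "description": "Return payments filtered by due date."},
    {"name": "people-notifier", "description": "Send a message to a person."},
]
```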
2. The LLM chooses the right tool based on schema
It selects payments-query first, sending structured data (JSON compliant with the tool’s schema).
The MCP Client calls the appropriate MCP Tool Server, which returns the required payments.
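On the wire, MCP speaks JSON-RPC, so that first call looks roughly like this (the argument values are invented for the example):

```python
# Illustrative MCP tool call: a JSON-RPC request sent on the model's behalf.
call_payments_query = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "payments-query",
        "arguments": {"due_date": "2025-01-15"},   # must conform to the tool's inputSchema
    },
}
```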
3. The LLM composes a second action
Now that it has the list of payments, it prepares a second structured request: a call to people-notifier.
Again, the MCP Client relays the request, and the MCP Tool Server returns the execution status.
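The follow-up request chains the result of the first call into the second tool (again, the argument names are invented for the example):

```python
# Second illustrative call: notify the person with the payments retrieved in step 2.
call_people_notifier = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "people-notifier",
        "arguments": {
            "recipient": "Charles Sant",
            "message": "Payments due today: ...",   # composed from the payments-query result
        },
    },
}
```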
4. The LLM returns the final result to the user
All tool interactions are chained through the MCP Client, keeping:
your microservices safe
your API surface clean
your tool definitions standardized
your agentic workflow predictable
Additional Tips for Real-World AI Architecture
Before closing, here are a few high-impact recommendations that can help you design AI-enabled systems that remain safe, predictable and maintainable as they grow:
1. Always validate input and output
LLMs produce probabilistic output, not guaranteed structure.
Add lightweight validation to avoid malformed requests reaching your internal systems.
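For example, a few lines of schema validation (here with Pydantic v2, as one option; the request shape matches the notification example above) keep malformed output away from your internal systems:

```python
from pydantic import BaseModel, ValidationError

class NotifyRequest(BaseModel):
    recipient: str
    message: str

def parse_llm_output(raw: str) -> NotifyRequest | None:
    # The model was asked for JSON, but nothing guarantees it complied.
    try:
        return NotifyRequest.model_validate_json(raw)
    except ValidationError:
        return None   # reject, retry, or fall back instead of forwarding garbage downstream
```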
2. Version everything
Prompts, embeddings, RAG pipelines, tool schemas, agent behaviors.
Treat these assets like code — because they are.
3. Keep agents narrow and supervised
A “general-purpose agent” is a nice idea, but in practice, small task-specific agents with well-defined tools behave far better.
4. Apply caching everywhere
Embedding cache, retrieval cache, and generation cache reduce latency and cost dramatically.
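The embedding cache is usually the easiest win, since identical text always produces the same vector. A minimal sketch, with `embed` injected as your real embedding call and a dict standing in for Redis:

```python
import hashlib

class EmbeddingCache:
    """Memoize embedding calls: identical text costs one provider request, not many."""

    def __init__(self, embed):
        self.embed = embed        # injected real embedding call
        self.store = {}           # in production this would be Redis or similar

    def get(self, text: str):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in self.store:
            self.store[key] = self.embed(text)
        return self.store[key]
```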
5. Don’t forget security
AI introduces new attack surfaces.
Instead of going deep here, I’ll simply recommend this:
study the OWASP Top 10 for LLM Applications when designing any generative AI feature.
It provides practical guidance for avoiding prompt injection, unsafe tool execution, and other real-world risks.
(You don’t need to memorize every item—but awareness goes a long way.)
6. Separate deterministic logic from AI logic
Critical flows should not depend entirely on an LLM.
Build fallbacks, guardrails, and human-in-the-loop paths for sensitive operations.
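One simple shape for this, with illustrative function names: deterministic rules first, the LLM as an enhancement, and a human path when both fail:

```python
def classify_ticket(text: str, keyword_classify, llm_classify) -> str:
    # Deterministic rules first: cheap, predictable, good enough for the common cases.
    label = keyword_classify(text)
    if label is not None:
        return label
    try:
        return llm_classify(text)        # the probabilistic path
    except Exception:
        return "needs_human_review"      # human-in-the-loop fallback, never a hard failure
```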
7. Measure what your AI layer is actually doing
Logging tool calls, recording decisions, tracking latency and cost — this turns the AI layer from a “black box” into something observable.
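Even a small wrapper around tool execution gets you most of the way. An illustrative sketch:

```python
import logging
import time

logger = logging.getLogger("ai_layer")

def observed(tool_name: str, fn, *args, **kwargs):
    # Record every tool call: what ran, whether it worked, and how long it took.
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
        logger.info("tool=%s status=ok latency_ms=%.0f",
                    tool_name, (time.monotonic() - start) * 1000)
        return result
    except Exception:
        logger.exception("tool=%s status=error", tool_name)
        raise
```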
These small practices compound over time and make a massive difference in production.
Conclusion
This was a high-level introduction to how LLMs, RAG, agents, and MCP fit together into a modern software architecture.
My goal wasn’t to go deep on every topic, but to give you a mental map that helps you reason about the AI layer with clarity — without hype, magic, or hand-waving.
If this topic interests you, stay tuned.
The next posts in this series will go deeper into individual patterns, starting with Guardrails, and I’d love for you to join the discussion, share your experiences, and challenge the ideas here.
Thanks for reading — and let’s keep building better systems together.