LLM Engineering in Action

This book teaches the production skills that recur in 2026 LLM engineering job postings and on strong real-world resumes: RAG, agentic workflows, fine-tuning, LLMOps, evaluation, security, cloud deployment, and portfolio-ready product work.

Market-Aligned Outcomes

By the end of the book, readers should be able to build and explain:

A document-grounded RAG assistant with citations and retrieval tests
A hybrid search system with reranking and measurable retrieval quality
A permission-aware enterprise assistant connected to documents and SQL data
A tool-connected LLM app with approval gates and audit logs
A stateful agent workflow with checkpoints, retries, and human review
A fine-tuned open-source model with before/after evaluation
A production LLM API with Docker, CI/CD, monitoring, and cost controls
An evaluation and security harness for RAG and agent systems
A portfolio capstone with architecture notes, metrics, and trade-off analysis

Introduction: Modern LLM Engineering

The LLM Engineer Role
The Production AI Mindset
The LLM Application Stack
Business-to-System Translation
Reliability, Cost, and Risk
How to Use This Book

Part 1: Foundations and First Applications

Chapter 1: LLM Text Generation

Generation Loop
Tokenization and Subwords
Decoder-Only Transformers
Context Windows and Attention
Logits and Sampling
Decoding Controls
Generation Failure Scenarios
Chat Playgrounds
Hands-On Exercise

Chapter 2: Production Model Selection

Model Selection Criteria
Closed and Open Models
API, Hosted, and Local Access
Capability and Task Fit
Benchmarks and Product Tests
LLM Leaderboards
Cost, Latency, and Reliability
Model Decision Records
Hands-On Exercise

Chapter 3: Streaming Chatbot Applications

Chat API Structure
Roles and Message History
Environment and API Keys
Streaming Response Handling
Command-Line Chatbot Design
Context Growth Management
Runtime Error Handling
Part 1 Capstone Project

Part 2: Context, Retrieval, and Enterprise Grounding

Chapter 4: Prompting and Structured Outputs

System and User Prompts
Production Prompt Anatomy
System Prompt Design
Instruction Hierarchy
Prompt Security Boundaries
Prompt Patterns
Prompt Chaining and Review Loops
Few-Shot Examples
Context Assembly
Prompting for Cost and Latency
Prompt Versioning and Debugging
Prompt Anti-Patterns
JSON Schema Outputs
Pydantic Validation Contracts
Hands-On Exercise

Chapter 5: Embeddings and Semantic Search

What Is an Embedding?
Practical Chunking Strategies
Choosing Embedding Models
Embedding Generation Best Practices
Vector Databases and Storage Options
Basic Semantic Search Implementation
Evaluation Metrics for Retrieval
Common Pitfalls and Failure Scenarios
Hands-On Exercise

Chapter 6: Retrieval-Augmented Generation

Search Is Not an Answer
The Basic RAG Loop
Context Construction and Token Budgeting
Grounded Prompting and “I Don’t Know”
Source Citations as a Product Contract
FastAPI and Streamlit Interface
Debugging Retrieval and Context
Minimal Smoke Tests
Hands-On Exercise: PDF Q&A Bot with Citations

Chapter 7: Hybrid Retrieval, Reranking, and RAG Evaluation

Why Basic Vector Search Fails
BM25 and Sparse Retrieval
Metadata Filters and Exact-Match Needs
Hybrid Dense+Sparse Search
Query Rewriting
Reciprocal Rank Fusion
Cross-Encoder Reranking
Golden Query Sets
Retrieval Metrics: Recall@k, Precision@k, MRR, NDCG
RAGAS, DeepEval, and Custom RAG Evaluation
Hands-On Exercise: Support Search System with Measured Retrieval Lift

Chapter 8: Enterprise Data Integration and Permission-Aware RAG

Document Ingestion Pipelines
Parser Selection and Metadata Contracts
Incremental Indexing and Data Freshness
SQL and Postgres Grounding
Permission-Aware Retrieval
Tenant Isolation and Row-Level Security
PII Handling and Audit Logs
Knowledge Graphs and Structured Context
Access-Control Test Suites
Hands-On Exercise: Internal Assistant with Documents, SQL Data, and Role Filters

Part 3: Tools, State, and Agents

Chapter 9: Tool-Connected LLM Apps and MCP-Style Interfaces

Function Calling and Tool Schemas
Argument Validation
Tool-Result Injection
Read Tools vs Write Tools
API Safety and Idempotency
Human Approval for Risky Actions
MCP-Style Tool Boundaries
Tool-Call Logging and Audit Trails
Hands-On Exercise: Customer-Support Copilot with Docs, Order Lookup, and Approval Gates