Curated: March 9, 2026 | Curator: Xiaolong Xia 🦞

This collection covers core readings on AI Agent engineering design, including Claude Code, OpenAI Codex, context management, memory mechanisms, and more. Each entry includes original citations and key takeaways.


Part I — Claude Code Engineering Design

1.1 Introducing Claude Code (Anthropic Official)

Source: Anthropic Official Blog | Link: https://www.anthropic.com/news/claude-code | Published: February 2025

Anthropic released Claude Code, an agentic coding tool that runs in your terminal. It directly manipulates the filesystem, executes commands, accesses codebases, and can interact with GitHub.

Key Quote:

“Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing multi-step tasks autonomously.”

Core Design Highlights:

  • Tool-Calling Architecture: Core tools include bash execution, file read/write, grep search, and git operations
  • Permission-Based Design: Sensitive operations (writing files, executing commands) require explicit user confirmation
  • System-Prompt Driven: Contains detailed operating principles, safety boundaries, and role definitions
  • Agentic Loop: Observe → Think → Act → Observe cycle
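The agentic loop in the last bullet can be sketched in a few lines of TypeScript. This is an illustrative skeleton, not Claude Code's actual implementation; `Tool`, `Step`, `think`, and `runAgentLoop` are hypothetical names invented for the sketch.

```typescript
// Minimal sketch of an Observe → Think → Act loop over a set of tools.
type Tool = (input: string) => string;

interface Step {
  thought: string; // Think: the model's reasoning about what to do next
  tool: string;    // Act: which tool to invoke
  input: string;   // Act: the tool's argument
}

function runAgentLoop(
  goal: string,
  tools: Record<string, Tool>,
  think: (goal: string, observations: string[]) => Step | null,
  maxSteps = 10,
): string[] {
  const observations: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = think(goal, observations); // Think: decide the next action
    if (step === null) break;               // model signals the task is done
    const observation = tools[step.tool](step.input); // Act
    observations.push(observation);         // Observe: feed the result back
  }
  return observations;
}
```

The key property is that each tool result re-enters the model's context before the next decision, which is what makes the loop "agentic" rather than a fixed pipeline.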

1.2 Claude Code System Prompt Analysis (Community)

Source: Anthropic release + community researcher analysis

Anthropic officially published the complete Claude Code system prompt, sparking extensive community analysis.

Core System Prompt Structure:

| Layer | Purpose |
| --- | --- |
| Role Layer | Explicitly defines Claude Code as an engineering-focused AI assistant, distinct from general Claude |
| Tool Declaration Layer | Explicitly lists all available tools and invocation specifications |
| Safety Guardrail Layer | Clear constraints on dangerous operations (rm -rf, network exfiltration, etc.) |
| Behavioral Principle Layer | Includes proactive clarification, minimal footprint, and other principles |
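As a toy illustration of the layering, a system prompt can be assembled from the four layers in a fixed order. The strings below are invented placeholders for the sketch, not Anthropic's actual prompt text.

```typescript
// Hypothetical prompt layers mirroring the table above; the wording is
// invented, not Anthropic's actual system prompt.
const promptLayers = {
  role: "You are an engineering-focused coding assistant.",
  tools: "Available tools: bash, file read/write, grep, git.",
  guardrails: "Never run destructive commands (e.g. rm -rf) without explicit confirmation.",
  principles: "Ask clarifying questions proactively; keep changes minimal and reversible.",
};

// Layers are concatenated in order: role, then tool declarations,
// then safety guardrails, then behavioral principles.
function buildSystemPrompt(layers: typeof promptLayers): string {
  return [layers.role, layers.tools, layers.guardrails, layers.principles].join("\n\n");
}
```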

Key Design Principle Quote:

“The assistant should request only necessary permissions, avoid storing sensitive information beyond immediate needs, prefer reversible over irreversible actions, and err on the side of doing less and confirming with users when uncertain about intended scope.”


1.3 Building Effective Agents (Anthropic Engineering)

Source: Anthropic Engineering Blog | Link: https://www.anthropic.com/engineering/building-effective-agents | Published: December 2024

This is the Anthropic engineering team’s summary of agent-building best practices — essential reading.

Key Quote:

“The most important lesson from deploying agents is to start simple. Many tasks that seem to require complex multi-agent architectures can be solved with simpler approaches.”

Five Core Workflow Patterns:

  1. Prompt Chaining — Decompose tasks into sequential steps; each step’s output becomes the next step’s input. Best for pipeline processing.
  2. Routing — Classify inputs and direct them to different specialized processing flows, improving domain-specific capability.
  3. Parallelization — Execute independent sub-tasks in parallel then aggregate, or use multi-model voting to improve reliability.
  4. Orchestrator-Subagents — A primary Agent decomposes tasks and dispatches sub-Agents for specialized execution.
  5. Evaluator-Optimizer — An iterative optimization loop combining a generator Agent and an evaluator Agent.
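Pattern 1 (prompt chaining) is easy to make concrete. In this sketch, `callModel` stands in for a real LLM call, and the `{input}` placeholder is an assumed template convention, not any particular framework's API.

```typescript
type ModelCall = (prompt: string) => string;

// Each step is a prompt template; "{input}" is filled with the previous
// step's output, so the chain is a simple left fold over the steps.
function promptChain(input: string, steps: string[], callModel: ModelCall): string {
  return steps.reduce(
    (previousOutput, template) => callModel(template.replace("{input}", previousOutput)),
    input,
  );
}
```

Because each step sees only the previous step's output, the chain is easy to test and debug step by step, which is exactly why the post recommends starting with workflows like this before reaching for full agents.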

When to Use Agents:

“Agents are valuable when tasks require dynamic decision-making, when the right sequence of steps is hard to specify upfront, or when the task benefits from checking intermediate results.”


Part II — OpenAI Codex Engineering Design

2.1 Introducing Codex (OpenAI Official)

Source: OpenAI Official Blog | Link: https://openai.com/blog/introducing-codex | Published: 2025

OpenAI released a new generation of Codex, positioned as a cloud-based AI software engineer designed for asynchronous, long-horizon programming tasks, capable of handling multiple tasks in parallel.

Core Design Features:

| Feature | Description |
| --- | --- |
| Sandbox Isolation | Each task runs in an independent containerized sandbox — secure, isolated, non-interfering |
| Async Execution | No need for real-time user supervision; supports long-horizon background tasks |
| Deep GitHub Integration | Autonomously submits PRs, creates branches, and responds to code reviews |
| Limited Network Access | Sandbox can only access pre-whitelisted resources, preventing data exfiltration |
| Parallel Processing | Supports multiple independent tasks simultaneously for significantly improved throughput |

Comparison with Claude Code:

| Dimension | Claude Code | OpenAI Codex |
| --- | --- | --- |
| Deployment | Local terminal | Cloud service |
| Interaction | Real-time | Asynchronous |
| Task Model | Single task | Multi-task parallel |
| Environment | Native machine | Isolated container |

Part III — General AI Agent Engineering Design

3.1 LLM Powered Autonomous Agents (Essential Classic)

Author: Lilian Weng (OpenAI) | Source: Lilian Weng’s Blog | Link: https://lilianweng.github.io/posts/2023-06-23-agent/ | Published: June 2023

The most systematic and comprehensive survey on AI Agent architecture to date, authored by an OpenAI researcher.

Core Framework Quote:

“A LLM-based autonomous agent system is comprised of: Planning, Memory, and Tool Use.”

Three Core Components:

Planning:

  • Chain of Thought (CoT) — Explicit reasoning steps improve complex problem-solving
  • Tree of Thoughts (ToT) — Extends reasoning to tree-structured search, exploring multiple solution paths
  • ReAct — Alternates between Reason and Act to interact with the external environment in real time
  • Reflexion — Corrects past mistakes through self-reflection and verbal feedback, without gradient updates

Memory:

  • Sensory Memory — Raw input buffer (analogous to human sensory memory)
  • Short-term Memory (In-context) — Working memory within the current context window
  • Long-term Memory — External vector database supporting fuzzy semantic retrieval

Tool Use:

  • MRKL — Modular Reasoning, Knowledge, and Language
  • Toolformer — The model learns autonomously when to call APIs
  • HuggingGPT — LLM as controller, orchestrating specialized AI models


3.2 ReAct: Synergizing Reasoning and Acting in Language Models

Authors: Shunyu Yao et al. (Google Brain / Princeton) | Link: https://arxiv.org/abs/2210.03629 | Published: October 2022

Key Quote:

“ReAct prompts LLMs to generate both verbal reasoning traces and actions pertaining to a task in an interleaved manner, allowing the model to perform dynamic reasoning to create, maintain, and adjust high-level plans for acting.”

Core Mechanism:

An alternating Thought → Action → Observation loop tightly couples the reasoning chain with external tool calls, and the explicit reasoning steps make the agent’s behavior significantly more interpretable.

ReAct outperformed pure Chain-of-Thought prompting on the HotpotQA, FEVER, and ALFWorld benchmarks.
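The interleaved trace the paper describes looks roughly like the following. This formatting helper is a sketch of the prompt format commonly used with ReAct, not code from the paper; the type and function names are invented.

```typescript
interface ReActStep {
  thought: string;     // verbal reasoning trace
  action: string;      // tool name, e.g. "Search"
  actionInput: string; // tool argument
  observation: string; // environment's response
}

// Renders steps in the interleaved Thought / Action / Observation format,
// ending with the final answer, as in ReAct-style prompting.
function formatReActTrace(steps: ReActStep[], finalAnswer: string): string {
  const body = steps
    .map(
      (s, i) =>
        `Thought ${i + 1}: ${s.thought}\n` +
        `Action ${i + 1}: ${s.action}[${s.actionInput}]\n` +
        `Observation ${i + 1}: ${s.observation}`,
    )
    .join("\n");
  return `${body}\nFinal Answer: ${finalAnswer}`;
}
```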

Why It Matters: ReAct is a foundational paradigm for modern AI Agents (including Claude Code, LangChain Agents, etc.).


Part IV — Context Management and Memory Mechanisms

4.1 MemGPT: Towards LLMs as Operating Systems

Authors: Charles Packer et al. (UC Berkeley) | Link: https://arxiv.org/abs/2310.08560 | Published: October 2023 | Venue: NeurIPS 2023 Workshop

This paper proposes applying OS memory hierarchy mechanisms to LLM context management — an important reference for Agent memory design.

Core Argument:

“We propose MemGPT, a system that intelligently manages different memory tiers to effectively provide extended context within the LLM’s limited context window, drawing inspiration from traditional OS hierarchical memory systems.”

Memory Layering Architecture:

Main Context (in-window, analogous to RAM):

  • System Instructions
  • Working Context (current task information)
  • FIFO Queue (rolling window of conversation history)

External Storage (analogous to disk):

  • Archival Storage (vector-indexed archive)
  • Recall Storage (complete conversation history)

Core Mechanisms:

  • Active Memory Management — The model autonomously decides when to store/retrieve information from external storage
  • Interrupt-Driven — OS-style event-driven mechanism for memory operations
  • Seamless Extension — Moves past the fixed context window limit, enabling effectively unbounded sessions
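The RAM/disk analogy can be sketched with a crude characters-per-token estimate: when the in-window FIFO queue exceeds its budget, the oldest messages are evicted to archival storage rather than dropped. `TieredMemory` and its fields are invented names for illustration, not MemGPT's API.

```typescript
interface Message { role: string; text: string }

class TieredMemory {
  private fifo: Message[] = [];            // in-window queue (the "RAM")
  readonly archival: Message[] = [];       // stand-in for vector-indexed storage (the "disk")

  constructor(private readonly budget: number) {} // budget in estimated tokens

  private usage(): number {
    // Rough token estimate: 1 token ≈ 4 characters
    return this.fifo.reduce((n, m) => n + Math.ceil(m.text.length / 4), 0);
  }

  append(msg: Message): void {
    this.fifo.push(msg);
    // Evict oldest messages to archival storage until back under budget
    while (this.usage() > this.budget && this.fifo.length > 1) {
      this.archival.push(this.fifo.shift()!);
    }
  }

  window(): Message[] {
    return [...this.fifo];
  }
}
```

In the real system the model itself decides when to page content in and out (the "active memory management" above); this sketch only shows the passive eviction path.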


4.2 Cognitive Architectures for Language Agents (CoALA)

Authors: Theodore R. Sumers et al. (Princeton / Google DeepMind) | Link: https://arxiv.org/abs/2309.02427 | Published: September 2023

An important survey systematically combining cognitive science with LLM Agent design.

Key Quote:

“We present a systematic framework for thinking about and building language agents, drawing on insights from cognitive science, AI planning, and recent work on LLM-based agents.”

Four-Category Memory Framework:

| Memory Type | Description |
| --- | --- |
| Episodic Memory | Specific past events, retrievable chronologically |
| Semantic Memory | General knowledge and facts, context-independent |
| Procedural Memory | Skills and operation methods, manifested as behavioral patterns |
| Working Memory | Active information in current context, limited capacity |

Action Space Classification:

  • External Actions — Environmental interaction (tool calls, API requests)
  • Internal Actions — Reasoning, retrieval, memory read/write
  • Model Storage — Parameter updates (fine-tuning)


4.3 A Survey on Large Language Model based Autonomous Agents

Authors: Lei Wang et al. (Renmin University of China) | Link: https://arxiv.org/abs/2308.11432 | Published: August 2023

One of the most comprehensive LLM Agent surveys, covering three dimensions: construction, application, and evaluation.

Key Quote:

“We propose a unified framework to systematically organize and understand the field of LLM-based autonomous agents, encompassing the construction, application, and evaluation of such agents.”

Three Pillars of Agent Construction:

  • Profiling Module — Defines the Agent’s role, capabilities, and constraints
  • Memory Module — Designs storage, retrieval, and update mechanisms for memory
  • Action Module — Plans action sequences and interacts with the environment


Part V — OpenClaw Engineering Design Deep Dive

This section synthesizes the OpenClaw official docs, source code analysis, and multiple technical blogs for a systematic review of its core engineering design.

Reference Sources:

  • Official Docs: https://docs.openclaw.ai
  • Towards AI Deep Analysis: OpenClaw Architecture Deep Dive
  • Memory System Deep Dive: How OpenClaw’s Memory System Works
  • Three-Layer Architecture: Technical Principles and Extension Practices
  • System Architecture Overview: OpenClaw System Architecture Overview
  • Medium Analysis: How OpenClaw Works


5.1 Three-Layer Architecture Overview

OpenClaw’s core design philosophy is to treat AI as an infrastructure problem, not merely an application-layer wrapper. It adopts a three-layer architecture with clear separation of concerns.

Design Philosophy Quote (eastondev.com):

“OpenClaw divides the entire system into three layers, each managing its own concerns… Without layering, all logic piled together means changing one place might affect everything — completely unmaintainable.”

Layer Responsibilities:

① Gateway Layer (Session Management Hub)

  • Manages the complete lifecycle of user sessions
  • Message queue and scheduling (who gets processed first)
  • Authentication and permission control
  • WebSocket persistent connection maintenance
  • Single-Writer Architecture: only one Agent run per Session at any given time, preventing concurrent-write state races

② Channel Layer (Platform Adapter)

  • Adapts platform-specific message formats (WhatsApp / Telegram / Discord / Feishu differ significantly)
  • Message routing rules (DM vs. group chat, whether an @mention is required)
  • Adapter Pattern: normalizes multi-platform input into a unified internal format, decoupling the Gateway from the LLM layer
  • Each Channel + user combination gets an independently isolated Session, preventing cross-platform context contamination

③ LLM Layer (Model Interface)

  • Unified Provider interface (Claude / GPT / local models are called identically)
  • Tool Calling (Function Calling)
  • Streaming response handling
  • MCP Server integration
  • Provider Plugin System (2026 refactor): supports dynamic registration of different model Providers

Complete Message Flow (Feishu example):

  1. Channel layer receives the webhook → normalizes the message format
  2. Routing decision: DM or group chat, permission check
  3. Gateway finds (or creates) the user’s Session → message enters the queue
  4. LLM layer selects a Provider per config → sends the conversation context
  5. Model returns the result → Channel layer formats the response
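Step 1 of the flow is the adapter pattern at work: each channel maps its platform payload into one internal shape, and the session key isolates each channel + user pair. The field names and the Telegram payload below are assumptions for illustration, not OpenClaw's actual types.

```typescript
// Unified internal message shape produced by every channel adapter
interface InboundMessage {
  channel: string;
  userId: string;
  text: string;
  isDirect: boolean; // DM vs. group chat drives the routing decision
}

// Hypothetical Telegram webhook payload → normalized internal message
function fromTelegram(update: {
  message: { from: { id: number }; chat: { type: string }; text: string };
}): InboundMessage {
  return {
    channel: "telegram",
    userId: String(update.message.from.id),
    text: update.message.text,
    isDirect: update.message.chat.type === "private",
  };
}

// Session key: one isolated Session per channel + user combination
function sessionKey(m: InboundMessage): string {
  return `${m.channel}:${m.userId}`;
}
```

Because the Gateway only ever sees `InboundMessage`, adding a new platform means writing one adapter, with no changes to session management or the LLM layer.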


5.2 Memory System Deep Dive

OpenClaw’s memory system is one of its most valuable engineering designs. Its core philosophy is File-First — Markdown files are the Source of Truth; the LLM’s Context Window is merely a dynamic cache.

Core Design Quote (snowan.gitbook.io):

“Unlike traditional RAG systems that rely on vector databases, OpenClaw takes a file-first approach: Markdown files are the source of truth, and the memory system is designed to help AI agents remember context across conversations.”

Three-Layer Memory Structure:

① Long-Term Core Memory (MEMORY.md)

  • Stores the Agent’s preferences, operating principles, and persistent knowledge
  • Loaded only in private primary sessions — prevents privacy leakage into group chats / sub-Agents
  • Human-readable and editable; highly transparent

② Daily Logs (memory/YYYY-MM-DD.md)

  • Append-only writes; records daily activities, decisions, and conversation summaries
  • Today’s and yesterday’s logs are auto-loaded at Session start
  • Analogous to a rolling window of working memory

③ Session Memory (SQLite + Vector Embeddings)

  • Storage path: ~/.openclaw/memory/<agentId>.sqlite
  • Conversation history is auto-indexed with vector embeddings, supporting semantic retrieval
  • Isolated by Agent ID — each Agent has an independent knowledge base
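The daily-log convention in ② can be sketched directly. Only the memory/YYYY-MM-DD.md naming comes from the text; the helper names are invented, and dates are taken in UTC for simplicity.

```typescript
// Path of the daily log for a given date, using UTC and the
// memory/YYYY-MM-DD.md naming convention described above.
function dailyLogPath(date: Date): string {
  const iso = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return `memory/${iso}.md`;
}

// At Session start, today's and yesterday's logs are loaded.
function startupLogPaths(now: Date): string[] {
  const yesterday = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  return [dailyLogPath(yesterday), dailyLogPath(now)];
}
```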

Core Implementation: MemoryIndexManager

```typescript
export class MemoryIndexManager {
  private readonly agentId: string;
  private readonly workspaceDir: string;
  private provider: EmbeddingProvider;
  private db: DatabaseSync;
  private watcher: FSWatcher | null = null;
  // ...
}
```

Key engineering decisions:

  • Singleton + Cache — Prevents redundant index rebuilding (INDEX_CACHE)
  • File Watching — Debounced file-change sync for real-time index updates
  • Provider Fallback Chain — Local → OpenAI → Gemini → Voyage, graceful degradation
  • Delta-Based Incremental Sync — Only processes changed content, optimizing performance
  • SHA-256 Deduplication — Avoids re-embedding unchanged content
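The fallback chain amounts to trying providers in priority order and falling through on failure. The synchronous `Embedder` signature below is a simplification (real providers would be async), and the names are assumptions for illustration.

```typescript
type Embedder = (text: string) => number[];

interface Provider { name: string; embed: Embedder }

// Try providers in priority order (e.g. local → OpenAI → Gemini → Voyage);
// a failure falls through to the next tier, giving graceful degradation.
function embedWithFallback(
  text: string,
  providers: Provider[],
): { provider: string; vector: number[] } {
  let lastError: unknown = new Error("no providers configured");
  for (const p of providers) {
    try {
      return { provider: p.name, vector: p.embed(text) };
    } catch (err) {
      lastError = err; // remember the failure, try the next provider
    }
  }
  throw lastError; // every tier failed
}
```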

Hybrid Retrieval:

  • BM25 keyword search and vector semantic search run in parallel
  • Combining exact keyword matching with semantic understanding yields recall that significantly outperforms either approach alone
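The text does not name the rule used to merge the two result lists; reciprocal rank fusion (RRF) is one common choice and serves here as an illustrative stand-in.

```typescript
// Reciprocal rank fusion: each document scores 1 / (k + rank) in every
// ranking it appears in; documents ranked well by both BM25 and the
// vector search rise to the top of the fused list.
function reciprocalRankFusion(
  rankings: string[][], // each inner array: doc ids, best first
  k = 60,               // standard RRF damping constant
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```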

Markdown Chunking Algorithm:

OpenClaw uses a sliding window + overlap-preserving algorithm for Markdown chunking:

```typescript
export function chunkMarkdown(
  content: string,
  chunking: { tokens: number; overlap: number },
): MemoryChunk[] {
  // Sliding window, preserving overlap region to prevent semantic truncation
}
```
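Under the assumption of a crude whitespace tokenizer (the real tokenizer and the MemoryChunk shape are not shown in the source), the sliding-window idea looks like this; `chunkByTokens` is an invented name, not OpenClaw's function.

```typescript
interface MemoryChunk { text: string; startToken: number }

// Sliding window over "tokens" (here: whitespace-split words). Each window
// holds `tokens` words and advances by `tokens - overlap`, so consecutive
// chunks share an overlap region and no sentence is cut without context.
function chunkByTokens(
  content: string,
  chunking: { tokens: number; overlap: number },
): MemoryChunk[] {
  const words = content.split(/\s+/).filter(Boolean);
  const step = Math.max(1, chunking.tokens - chunking.overlap);
  const chunks: MemoryChunk[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push({
      text: words.slice(start, start + chunking.tokens).join(" "),
      startToken: start,
    });
    if (start + chunking.tokens >= words.length) break; // last window reaches the tail
  }
  return chunks;
}
```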

Auto-Compaction and Memory Flush:

  • When the Context Window approaches its limit, old conversation turns are automatically compressed (compacted)
  • Pre-Compaction Flush: triggers memory writes before compaction so that important information is not lost
  • Analogous to the OS page-flush mechanism


5.3 Skill Hot-Swap System

The “Skills as Configuration” Philosophy:

Each skill describes itself through a SKILL.md file that declares its trigger conditions and invocation specification. Built-in skills include feishu-doc, feishu-wiki, weather, healthcheck, skill-creator, and others. Community skills are distributed via ClaWHub (https://clawhub.com).

Design Advantages:

  • Self-Describing Interface — SKILL.md defines trigger intents, tool call specifications, and examples
  • On-Demand Loading — SKILL.md is read only when intent matches; zero fixed Context overhead
  • Extensible Ecosystem — Users can install community Skills without modifying core code

Security Warning (Towards AI):

“Security researchers found 400+ malicious plugins in its marketplace within two minutes of looking.”

This incident highlights supply chain security risks in Agent plugin ecosystems — an important lesson for AI Agent engineering.


5.4 Multi-Session / Sub-Agent Architecture

Session Hierarchy Design:

| Session Type | Purpose |
| --- | --- |
| Main Session | Primary session in direct dialogue with the user |
| Subagent | Parallel task execution; isolated toolset; independent state |
| ACP Session | Integration with external coding Agents like Claude Code / Codex |
| Thread Session | Persistent dialogue threads bound to platforms like Discord |

Engineering Highlights:

  • sessions_spawn isolates sub-sessions, preventing state contamination and Context conflicts
  • Sub-Agents push notifications to the main session upon completion, avoiding inefficient polling
  • The ACP Harness uniformly orchestrates external Agents (Claude Code, Gemini, etc.)
  • Single-Writer Queue — only one Run per Session at a time, guaranteeing message-order consistency


5.5 Heartbeat Proactive Scheduling

OpenClaw’s built-in Heartbeat mechanism transforms the Agent from passive response to proactive scheduling:

  • Triggers periodically to check emails, calendar events, and pending notifications
  • Automatically suppresses interruptions during quiet periods (e.g., late night)
  • Supports background work (memory organization, code checks, file sync)
  • Cooperates with Cron system: precise scheduled tasks go through Cron; batch periodic checks go through Heartbeat
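The quiet-period suppression above can be sketched as a simple gate on each tick. `QuietHours` and its wrap-around semantics are assumptions for illustration, not OpenClaw's actual config shape.

```typescript
interface QuietHours { startHour: number; endHour: number } // e.g. 23 → 7 = overnight

// True when `hour` falls inside the quiet window; the window may wrap
// past midnight (startHour > endHour), e.g. 23:00–07:00.
function isQuiet(hour: number, q: QuietHours): boolean {
  return q.startHour <= q.endHour
    ? hour >= q.startHour && hour < q.endHour
    : hour >= q.startHour || hour < q.endHour;
}

// A heartbeat tick runs its periodic checks only outside quiet hours.
function shouldRunHeartbeat(now: Date, q: QuietHours): boolean {
  return !isQuiet(now.getHours(), q);
}
```

Precise scheduled work still belongs to Cron; this gate only decides whether a periodic heartbeat tick is allowed to interrupt the user.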

5.6 Design Highlights Summary

| Engineering Dimension | OpenClaw Design | Academic Concept |
| --- | --- | --- |
| Memory Layering | MEMORY.md / Daily Logs / SQLite | MemGPT memory hierarchy |
| Hybrid Retrieval | BM25 + vector hybrid | Dense-sparse hybrid retrieval |
| Context Management | File as state / on-demand loading | CoALA working memory |
| Multi-Agent Coordination | Subagent + push notification | Orchestrator-subagents pattern |
| Tool Calling | Skill hot-swap / SKILL.md | ReAct tool-use paradigm |
| Proactive Scheduling | Heartbeat mechanism | Proactive agent design |

Further Reading

General LLM/Agent Engineering:

  • Simon Willison’s Blog — LLM/Agent practical analysis: https://simonwillison.net/tags/agents/
  • Eugene Yan — LLM Patterns (engineering-oriented): https://eugeneyan.com/writing/llm-patterns/
  • Chip Huyen — Building LLM Applications for Production: https://huyenchip.com/2023/04/11/llm-engineering.html
  • LangChain Blog — Agent Design Patterns: https://blog.langchain.dev
  • Anthropic Research Blog: https://www.anthropic.com/research

OpenClaw Specific:

  • Official Docs: https://docs.openclaw.ai
  • Memory System Deep Dive (GitBook): https://snowan.gitbook.io/study-notes/ai-blogs/openclaw-memory-system-deep-dive
  • Three-Layer Architecture (eastondev.com): https://eastondev.com/blog/en/posts/ai/20260205-openclaw-architecture-guide/
  • Towards AI Production Agent Architecture: https://towardsai.net/p/machine-learning/openclaw-architecture-deep-dive-building-production-ready-ai-agents-from-scratch
  • System Architecture Overview (Substack): https://ppaolo.substack.com/p/openclaw-system-architecture-overview