Harness Engineering

Qiziqiang · 6/24/26 · About 24 min

Harness (English: [ˈhɑːnɪs]): Originally refers to the equipment placed on a horse for control and guidance. In the AI context, Harness Engineering refers to the control system built around AI Agents, used to guide, constrain, and enhance the agent's capabilities, enabling it to operate stably and controllably in production environments.

Introduction: Why Do We Need Harness Engineering?

When we talk about AI Agents, what often comes to mind is an intelligent entity that can think autonomously, use tools, and complete complex tasks. Real-world Agents are far more complex than this: they must handle a series of engineering problems such as permission control, error recovery, context management, and task scheduling. This is like a fine horse: without a proper harness, it cannot effectively pull a cart or carry people.

The essence of Harness Engineering is to provide a complete "operating system" for AI Agents. It does not change the agent's core capabilities (reasoning, generation, decision-making), but through a series of carefully designed subsystems, organizes these capabilities into a reliable, controllable, and scalable production system.

The Learn Claude Code project provides us with an excellent case study. Through 20 progressive steps, it gradually builds a production-grade agent framework from a minimal agent loop of 102 lines of code to 1708 lines. This process perfectly demonstrates the core idea of Harness Engineering: one loop, surrounded by subsystems that give it production form.

This article will deeply analyze the design philosophy, implementation principles, and future evolution directions of Harness Engineering from 16 dimensions: Agent Loop, Tool Use, Permission, Hooks, Todo, Skill, Context, Memory, Error Recovery, Subagent, Task, Background Tasks, Cron Scheduler, Agent Teams, and more.

Chapter 1: The Core Loop — Agent Loop

1.1 What It Is: The Agent's "Heartbeat"

Agent Loop is the heart of the entire Harness Engineering. Its essence is extremely simple:

while True:
    response = model.generate(messages)
    messages.append(response)  # keep the model's own turn in the history
    if response.has_tool_call:
        result = execute_tool(response.tool_call)
        messages.append(result)
    else:
        break

This loop has only three steps:

Call the model: Send the current conversation history to the language model
Check the response: Does the model request to use a tool?
Execute or end: If there's a tool call, execute it and feed back the result; otherwise, end

1.2 Why: Simplicity Is Power

Why can such a simple loop serve as the foundation of the entire framework? Because it embodies the philosophy of recursive decomposition:

Any complex task can be decomposed into a series of tool calls
Each step can be further decomposed into smaller tool calls
Decomposition continues until every remaining step can be completed directly

The elegance of this design lies in: the loop itself doesn't need to know what problem it's solving. It's just an execution engine that converts the model's intentions into actual actions.

1.3 How to Design: The Minimal Viable Loop

The s01 phase of the Learn Claude Code project demonstrates the implementation of a minimal viable loop:

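# Condensed sketch of the minimal agent loop (the s01 version is 102 lines)
import anthropic

client = anthropic.Anthropic()
# The API needs at least one user message to start the conversation
messages = [{"role": "user", "content": input("You: ")}]
tools = [read_file_tool]  # Only 1 tool; its schema dict is defined elsewhere

while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )
    # Record the assistant turn so the model sees its own tool requests
    messages.append({"role": "assistant", "content": response.content})
    
    if response.stop_reason == "tool_use":
        # Execute each requested tool and feed the results back as a user turn.
        # execute_tool is a helper defined elsewhere that runs one tool_use block
        # and returns its output as a string.
        tool_results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": execute_tool(block)}
            for block in response.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})
    else:
        # Model finished: print its text blocks and exit the loop
        print("".join(b.text for b in response.content if b.type == "text"))
        break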

The essence of this design is: it's an infinite loop, but the model's response determines when to exit. The model is both the decision-maker and the judge of termination conditions.

1.4 Essential Exploration: Loop as State Machine

From a deeper perspective, Agent Loop is essentially a finite state machine:

State: [Thinking] --model response--> [Using Tool] --execution result--> [Thinking]
       [Thinking] --no tool call--> [End]

Each state transition is determined by the model's output. This means:

Agent behavior is entirely driven by the model's reasoning
The framework is only responsible for executing state transitions
Any complex behavior is a combination of this simple state machine

1.5 Current Problems and Improvement Directions

Problem 1: Loop Efficiency
The current loop is serial—only one tool call can be executed at a time. When multiple tool calls have no dependencies between them, this causes unnecessary waiting.

Improvement Direction: Parallel tool execution. The framework can analyze dependencies between tool calls and execute independent calls in parallel.
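
A minimal sketch of parallel execution, assuming the tool calls from one model turn are already known to be independent (the dependency analysis itself is not shown). The function name `run_tools_in_parallel` and the `(tool_name, kwargs)` call shape are illustrative assumptions, not part of the Learn Claude Code project.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tools_in_parallel(calls, handlers, max_workers=4):
    """Run independent tool calls concurrently; results keep the input order.

    calls    -- list of (tool_name, kwargs) pairs (hypothetical shape)
    handlers -- dict mapping tool_name to a callable
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(handlers[name], **kwargs) for name, kwargs in calls]
        # Collect in submission order so each result lines up with its call
        return [future.result() for future in futures]


# Usage with two toy handlers standing in for real tools
handlers = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}
results = run_tools_in_parallel(
    [("add", {"a": 1, "b": 2}), ("upper", {"text": "ok"})], handlers
)
```

Threads suit tools that mostly wait on I/O (file reads, HTTP calls); CPU-heavy tools would likely need a process pool instead.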

Problem 2: Loop Control
The model may fall into an infinite loop—repeatedly calling the same tools without making progress.

Improvement Direction: Introduce loop detection and intervention mechanisms. The framework can monitor tool call patterns and forcibly inject new prompts or terminate the loop when a loop is detected.
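
One possible shape for such a detector, sketched under the assumption that "no progress" can be approximated by the same tool being called with identical arguments several times within a short window. The class name `LoopDetector` and its thresholds are hypothetical.

```python
import json
from collections import deque


class LoopDetector:
    """Flag the agent when one tool call repeats too often in a sliding window."""

    def __init__(self, window=6, max_repeats=3):
        self.recent = deque(maxlen=window)  # signatures of the latest calls
        self.max_repeats = max_repeats

    def record(self, tool_name, arguments):
        """Record a call; return True if it looks like a stuck loop."""
        signature = (tool_name, json.dumps(arguments, sort_keys=True))
        self.recent.append(signature)
        return self.recent.count(signature) >= self.max_repeats


detector = LoopDetector()
flags = [detector.record("read_file", {"path": "a.txt"}) for _ in range(3)]
```

When `record` returns True, the framework could inject a message such as "you have repeated this call several times, try a different approach" or stop the loop altogether.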

Chapter 2: The Tool System — Tool Use

2.1 What It Is: The Agent's "Hands"

If Agent Loop is the heart, then Tool Use is the agent's hands. Tools are the only way for agents to interact with the outside world—whether reading files, executing commands, calling APIs, or operating databases.

2.2 Why: The Boundary of Capability Extension

Language models themselves can only generate text. To make it a true agent, it must be given the ability to act. The core problem the tool system solves is: how to let the model know what tools are available and how to use them correctly.

2.3 How to Design: The Dispatcher Pattern

The Learn Claude Code project uses the dispatcher pattern to design the tool system:

# Tool registry
TOOLS = {
    "read_file": {
        "description": "Read file content",
        "parameters": {"path": "File path"},
        "handler": read_file_handler
    },
    "write_file": {
        "description": "Write file content",
        "parameters": {"path": "File path", "content": "File content"},
        "handler": write_file_handler
    }
}

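def execute_tool(tool_call):
    tool = TOOLS.get(tool_call.name)
    if tool is None:
        # Report unknown tools back to the model instead of crashing the loop
        return f"Error: unknown tool '{tool_call.name}'"
    return tool["handler"](**tool_call.parameters)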

Advantages of this design:

Adding a tool only requires one new entry in the registry
The main loop has no knowledge of specific tools
Tool implementation details are completely encapsulated

2.4 Essential Exploration: Tools as Interfaces

From a software engineering perspective, the tool system is essentially an interface abstraction layer. It defines the contract for agent interaction with the outside world:

Tool description: Tells the model what this tool can do
Parameter definition: Tells the model what information needs to be provided
Return format: Tells the model what result it will get

With this abstraction, the model doesn't need to know how a tool is implemented, only how to express its intent through the tool's contract.
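
To make the contract concrete, here is a small sketch that checks a call against a tool's parameter definition before dispatching it. It assumes the registry shape used earlier (a `parameters` dict keyed by parameter name); `validate_arguments` is a hypothetical helper, not something the project defines.

```python
def validate_arguments(tool_spec, arguments):
    """Return a list of problems; an empty list means the call fits the contract."""
    expected = set(tool_spec["parameters"])
    provided = set(arguments)
    problems = [f"missing parameter: {name}" for name in sorted(expected - provided)]
    problems += [f"unexpected parameter: {name}" for name in sorted(provided - expected)]
    return problems


write_file_spec = {
    "description": "Write file content",
    "parameters": {"path": "File path", "content": "File content"},
}
problems = validate_arguments(write_file_spec, {"path": "notes.txt", "mode": "w"})
```

Returning the problems as text, rather than raising, lets the framework hand them back to the model as a tool result so it can correct the call.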

2.5 Current Problems and Improvement Directions

Problem 1: Tool Discovery
When the number of tools increases, the model may not know which tool is most suitable for the current task.

Improvement Direction: Intelligent tool recommendation. Dynamically recommend the most relevant tool subset based on current context and task type.

Problem 2: Tool Composition
Complex tasks often require the combined use of multiple tools, but the model may not be able to plan the optimal tool call sequence at once.

Improvement Direction: Tool orchestration engine. The framework can provide templates or planners for tool combinations to help models better organize tool calls.

Chapter 3: Permission Control — Permission

3.1 What It Is: The Agent's "Brake"

The permission system is the safety valve in Harness Engineering. It determines what operations the agent can perform under what conditions. An agent without permission control is like a car without brakes—possibly fast, but extremely dangerous.

3.2 Why: The Balance Between Trust and Safety

Agents need to perform various operations, but not all operations are safe. The permission system needs to find a balance between two extremes:

Over-restriction: The agent cannot complete any useful work
Over-permissiveness: The agent may cause irreversible damage

3.3 How to Design: The Decision Point Pattern

Permission control usually inserts a decision point before tool execution:

def execute_tool_with_permission(tool_call):
    # Decision point: check permission
    if needs_permission(tool_call):
        user_decision = ask_user_permission(tool_call)
        if user_decision == "deny":
            # Return the refusal as a tool result so the model can adapt its plan
            return "Permission denied by user"
    
    # Execute tool
    return execute_tool(tool_call)

Permission decisions can be based on multiple factors:

Operation type: Read vs Write vs Delete
Scope of impact: Single file vs Entire directory
Reversibility: Reversible vs Irreversible
User preference: Auto-allow vs Ask every time

3.4 Essential Exploration: Permission as Policy

The permission system is essentially a policy engine. It separates the decision of "what can be done" from the execution logic, so that:

Policies can be adjusted independently of implementation
Different policies can be applied to different scenarios
Policies can be audited and tracked

3.5 Current Problems and Improvement Directions

Problem 1: Permission Granularity
Current permission systems are usually binary—allow or deny. But in actual scenarios, finer-grained control is often needed.

Improvement Direction: Conditional permissions. For example, "allow writing to /tmp directory, but not to /etc directory".
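
A rough sketch of conditional permissions as an ordered rule table, where the first matching rule wins and anything unmatched falls back to asking the user. The rule format and the name `check_permission` are assumptions made for illustration. The sketch compares paths lexically, so a real implementation would need to resolve `..` segments and symlinks before checking.

```python
from pathlib import PurePosixPath

# Hypothetical rule table: (operation, directory, decision); first match wins
RULES = [
    ("write", "/etc", "deny"),
    ("write", "/tmp", "allow"),
    ("read", "/", "allow"),
]


def check_permission(operation, path, rules=RULES):
    """Return "allow", "deny", or "ask" for an operation on a path."""
    target = PurePosixPath(path)  # assumes the path is already normalized
    for rule_op, rule_root, decision in rules:
        root = PurePosixPath(rule_root)
        if operation == rule_op and (target == root or root in target.parents):
            return decision
    return "ask"  # no rule matched: defer to the user
```

For example, `check_permission("write", "/tmp/out.txt")` yields "allow", while a write to "/home/user/notes.txt" matches no rule and yields "ask".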

Problem 2: Permission Learning
Every operation requires user decision, which causes "permission fatigue".

Improvement Direction: Permission learning. The framework can learn user decision patterns and automatically apply known security policies.

Chapter 4: The Hook System — Hooks

4.1 What It Is: The Agent's "Nervous System"

The hook system is the cross-cutting concern processor in Harness Engineering. It allows inserting custom logic at key points in the agent lifecycle without modifying core loop code.

4.2 Why: Separation of Concerns

Agents need to handle many tasks that are not related to core logic but are crucial:

Logging: Recording agent behavior for debugging
Monitoring metrics: Collecting performance and usage data
Audit trail: Recording all operations for compliance
Cache optimization: Caching frequently used tool results

If all of this logic is written directly into the loop, the code becomes bloated and difficult to maintain.

4.3 How to Design: The Event-Driven Pattern

The hook system usually adopts the event-driven pattern:

# Hook registration
hooks = {
    "before_tool_call": [log_tool_call, check_rate_limit],
    "after_tool_call": [cache_result, update_metrics],
    "on_error": [retry_with_backoff, notify_user]
}

def execute_tool_with_hooks(tool_call):
    # Trigger pre-hooks
    for hook in hooks["before_tool_call"]:
        hook(tool_call)
    
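    # Execute tool, giving on_error hooks a chance to react to failures
    try:
        result = execute_tool(tool_call)
    except Exception as error:
        for hook in hooks["on_error"]:
            hook(tool_call, error)
        raise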
    
    # Trigger post-hooks
    for hook in hooks["after_tool_call"]:
        hook(tool_call, result)
    
    return result

4.4 Essential Exploration: Hooks as Middleware

The hook system is essentially a middleware pattern. It decouples cross-cutting concerns from business logic, so that:

Each hook focuses only on a single responsibility
Hooks can be independently added, removed, or modified
The core loop remains clean and focused

4.5 Current Problems and Improvement Directions

Problem 1: Hook Order
When multiple hooks are registered to the same event, execution order may be important.

Improvement Direction: Priority system. Assign priorities to each hook to ensure critical hooks (like security checks) execute first.
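
A priority system could look roughly like the following, assuming lower numbers run first and hooks with equal priority keep their registration order. `HookRegistry` and its method names are hypothetical.

```python
class HookRegistry:
    """Run hooks for an event in ascending priority order (lower runs first)."""

    def __init__(self):
        self._hooks = {}  # event name -> list of (priority, seq, callable)
        self._seq = 0     # registration counter, keeps ties in insertion order

    def register(self, event, hook, priority=100):
        self._hooks.setdefault(event, []).append((priority, self._seq, hook))
        self._seq += 1

    def trigger(self, event, *args):
        # Sort on (priority, seq) only, so callables are never compared
        for _, _, hook in sorted(self._hooks.get(event, []), key=lambda h: h[:2]):
            hook(*args)


order = []
registry = HookRegistry()
registry.register("before_tool_call", lambda call: order.append("log"))
registry.register("before_tool_call", lambda call: order.append("security"), priority=0)
registry.trigger("before_tool_call", {"name": "write_file"})
```

Here the security hook was registered second but runs first because of its lower priority value.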

Problem 2: Hook Performance
Too many hooks may affect the agent's response speed.

Improvement Direction: Asynchronous hooks. Non-critical hooks can be executed asynchronously in the background without blocking the main loop.

Chapter 5: Plan Management — Todo

5.1 What It Is: The Agent's "Task List"

The Todo system provides agents with explicit plan management capabilities. It allows agents to decompose complex tasks into trackable subtasks and track the completion status of each subtask.

5.2 Why: A Map for Complex Tasks

An agent without a plan is like a traveler without a map: it may keep moving forward, but it easily loses direction. The core problems the Todo system solves are:

Task decomposition: Breaking down large goals into executable small steps
Progress tracking: Knowing what has been completed and what remains
Error recovery: Knowing where to restart when a step fails

5.3 How to Design: The State Machine Pattern

The Todo system usually adopts the state machine pattern:

class TodoItem:
    def __init__(self, description):
        self.description = description
        self.status = "pending"  # pending, in_progress, completed, failed
        self.dependencies = []   # TodoItem objects that must complete first
        self.result = None

class TodoManager:
    def __init__(self):
        self.items = []
    
    def add_item(self, description, dependencies=None):
        # Avoid a mutable default argument, which would be shared across calls
        item = TodoItem(description)
        item.dependencies = list(dependencies or [])
        self.items.append(item)
        return item  # Returned so later items can list it as a dependency
    
    def is_completed(self, item):
        return item.status == "completed"
    
    def get_next_task(self):
        # Return the next task that can be executed (all dependencies completed)
        for item in self.items:
            if item.status == "pending":
                if all(self.is_completed(dep) for dep in item.dependencies):
                    return item
        return None
    
    def mark_completed(self, item, result):
        item.status = "completed"
        item.result = result

5.4 Essential Exploration: Plan as State

The Todo system is essentially a state manager. It extracts the agent's "memory" from conversation history and stores it in a structured way, so that:

Plans can be persisted: Even if the conversation is interrupted, the plan won't be lost
Plans can be modified: Adjust subsequent steps based on new information
Plans can be shared: Multiple agents can collaborate to execute the same plan

5.5 Current Problems and Improvement Directions

Problem 1: Plan Rigidity
The agent may strictly follow the plan even when the plan is no longer optimal.

Improvement Direction: Dynamic plan adjustment. The agent can proactively modify subsequent plans based on execution results and new information.

Problem 2: Dependency Management
Complex dependency relationships may lead to deadlocks or circular dependencies.

Improvement Direction: Dependency graph analysis. The framework can detect and prevent dependency issues, ensuring plan feasibility.
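
One way to detect circular dependencies is a depth-first search that tracks which items are on the current path. The sketch below assumes dependencies are given as a dict from item id to the ids it depends on; `find_cycle` is a hypothetical helper, and its recursion is only suitable for plans of modest size.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of ids, or None if there is none.

    dependencies -- dict mapping an item id to the ids it depends on
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in dependencies}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in dependencies.get(node, []):
            if color.get(dep, WHITE) == GRAY:
                # dep is already on the current path: we closed a cycle
                return path[path.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(dependencies):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Running this check whenever an item is added lets the framework reject a plan such as a → b → c → a before the agent starts executing it.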

Chapter 6: Skill Loading — Skill

6.1 What It Is: The Agent's "Professional Knowledge"

The skill system allows agents to load professional domain knowledge and capabilities on demand. It's not an independent functional module, but a knowledge injection mechanism.

6.2 Why: The Scarcity of Context

The context window of language models is limited. If all potentially needed knowledge is placed in the system prompt, it will:

Waste precious context space
Increase the model's processing burden
Reduce response relevance

The skill system solves this problem through on-demand loading.

6.3 How to Design: The Dynamic Injection Pattern

The skill system usually adopts the dynamic injection pattern:

class SkillLoader:
    def __init__(self):
        self.skills = {
            "python": "You are a Python expert...",
            "database": "You are a database expert...",
            "security": "You are a security expert..."
        }
    
    def load_skill(self, skill_name):
        if skill_name in self.skills:
            return self.skills[skill_name]
        return None
    
    def inject_skill(self, messages, skill_name):
        skill_content = self.load_skill(skill_name)
        if skill_content:
            # Inject skill into system prompt
            messages.insert(0, {"role": "system", "content": skill_content})

6.4 Essential Exploration: Skills as Patterns

The skill system is essentially a pattern matching and injection mechanism. It identifies the professional knowledge a task needs from the task's characteristics and injects it into the agent's context, so that:

Agents can dynamically adapt to tasks in different domains
Professional knowledge can be updated independently of the core agent
Different tasks can use different combinations of professional knowledge

6.5 Current Problems and Improvement Directions

Problem 1: Skill Identification
The agent may not accurately determine which skills the current task needs.

Improvement Direction: Intelligent skill recommendation. Automatically recommend the most relevant skills based on task description and historical data.

Problem 2: Skill Conflicts
Multiple skills may provide contradictory guidance.

Improvement Direction: Skill priority and conflict resolution mechanisms. Define priority relationships between skills and arbitrate when conflicts occur.

Chapter 7: System Prompt — Prompt

7.1 What It Is: The Agent's "Operating System"

The system prompt is the agent's "constitution"—it defines the agent's identity, capabilities, limitations, and code of conduct. Unlike skills, the system prompt is persistent and foundational, and doesn't change with tasks.

7.2 Why: Predictability of Behavior

An agent without a system prompt is like an organization without rules—behavior is unpredictable and difficult to control. The core problems the system prompt solves are:

Identity definition: Tells the model who it is and what it can do
Behavior constraints: Defines what can and cannot be done
Output format: Specifies the structure and style of responses

7.3 How to Design: The Runtime Assembly Pattern

Modern Harness Engineering usually adopts the runtime assembly pattern to build system prompts:

def build_system_prompt(context):
    prompt_parts = []
    
    # 1. Basic identity
    prompt_parts.append("You are a professional programming assistant...")
    
    # 2. Capability declaration
    prompt_parts.append("You can read and modify files, execute commands...")
    
    # 3. Behavior constraints
    prompt_parts.append("You must always follow security best practices...")
    
    # 4. Dynamic context
    if context.current_project:
        prompt_parts.append(f"Current project: {context.current_project}")
    
    # 5. Loaded skills
    for skill in context.loaded_skills:
        prompt_parts.append(skill)
    
    return "\n\n".join(prompt_parts)

7.4 Essential Exploration: Prompt as Code

The system prompt is essentially a declarative programming language. It describes the behavior the agent should have in natural language rather than implementing that behavior in code, so that:

Non-technical personnel can also define agent behavior
Behavior can be quickly iterated and adjusted
Different scenarios can use different behavior definitions

7.5 Current Problems and Improvement Directions

Problem 1: Prompt Engineering
Writing effective system prompts requires a lot of experience and trial and error.

Improvement Direction: Prompt generation and optimization tools. Generate more effective system prompts through automated testing and optimization.

Problem 2: Prompt Bloat
As features increase, the system prompt may become very long, affecting model performance.

Improvement Direction: Prompt compression and layering. Divide the system prompt into core and extension layers, dynamically loading as needed.

Chapter 8: Context Management — Context

8.1 What It Is: The Agent's "Short-Term Memory"

Context is the agent's "working memory" within a single session. It contains all historical messages, tool call results, and intermediate states of the current conversation.

8.2 Why: The Limitation of Context Windows

The context window of language models is limited (commonly in the range of 128K to 200K tokens, depending on the model). When conversation history exceeds this limit, earlier information must be dropped or compressed, and the agent effectively "forgets" it. The core problems context management solves are:

How to retain the most important information within a limited window
How to compress when the conversation becomes too long
How to restore lost context when needed

8.3 How to Design: The Compression and Retrieval Pattern

Context management usually adopts the compression and retrieval pattern:

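class ContextManager:
    def __init__(self, max_tokens=100000):
        self.max_tokens = max_tokens
        self.full_history = []
        self.compressed_history = []
    
    def add_message(self, message):
        self.full_history.append(message)
        
        # Check if compression is needed
        if self.total_tokens() > self.max_tokens:
            self.compress()
    
    def total_tokens(self):
        # Rough estimate (about 4 characters per token); use a real tokenizer in practice
        return sum(len(str(m["content"])) for m in self.full_history) // 4
    
    def summarize(self, messages):
        # Placeholder: a real implementation would ask the model for a summary
        return f"[Summary of {len(messages)} earlier messages]"
    
    def compress(self):
        # Compress the older half of the history into a summary
        old_messages = self.full_history[:len(self.full_history)//2]
        if not old_messages:
            return  # A single oversized message cannot be split this way
        summary = self.summarize(old_messages)
        
        # Keep recent messages and summary
        self.compressed_history.append({"role": "system", "content": summary})
        self.full_history = self.full_history[len(old_messages):]
    
    def get_context(self):
        return self.compressed_history + self.full_history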

8.4 Essential Exploration: Context as Cache

Context is essentially a caching system. It caches the agent's "thinking process," so that:

Agents can reference previous work: No need to recompute
Agents can maintain conversation coherence: Users feel like they're talking to the same person
Agents can learn from mistakes: Remembering what methods worked and what didn't

8.5 Current Problems and Improvement Directions

Problem 1: Information Loss
The compression process inevitably loses some information, which may cause the agent to forget important details.

Improvement Direction: Intelligent compression. Use more advanced summarization algorithms to retain key information while reducing token count.

Problem 2: Retrieval Efficiency
When context is very long, retrieving specific information may be slow.

Improvement Direction: Vector retrieval. Convert context to vector representations and quickly locate relevant information through semantic search.
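
The retrieval idea can be illustrated with a toy example. A real system would use an embedding model and a vector index; the sketch below substitutes a bag-of-words count vector for the embedding, purely as a stand-in, and ranks past messages by cosine similarity. All function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, messages, top_k=1):
    """Return the top_k messages most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(messages, key=lambda m: cosine(query_vec, embed(m)), reverse=True)
    return ranked[:top_k]


history = [
    "the build failed because numpy was missing",
    "user prefers tabs over spaces",
    "database migration finished without errors",
]
best = retrieve("why did the build fail", history)
```

Word overlap misses paraphrases ("fail" versus "failed" do not match here), which is exactly the gap that learned embeddings are meant to close.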

Chapter 9: Memory System — Memory

9.1 What It Is: The Agent's "Long-Term Memory"

The memory system is the agent's persistent storage, used to save important information across sessions. Unlike context, memory doesn't disappear when the session ends.

9.2 Why: Learning and Personalization

An agent without memory starts from scratch every session, unable to:

Learn user preferences: Remember user's preferred code style, tools, etc.
Accumulate project knowledge: Remember project structure, conventions, common problems
Avoid repeating mistakes: Remember methods that were tried before but failed

9.3 How to Design: The Layered Storage Pattern

The memory system usually adopts the layered storage pattern:

from datetime import datetime

class MemorySystem:
    def __init__(self):
        self.short_term = {}  # In-session memory
        self.long_term = {}   # Cross-session memory
        self.episodic = []    # Episodic memory
    
    def store(self, key, value, importance="medium"):
        if importance == "high":
            # Store in long-term memory
            self.long_term[key] = {
                "value": value,
                "timestamp": datetime.now(),
                "access_count": 0
            }
        else:
            # Store in short-term memory
            self.short_term[key] = value
    
    def recall(self, key):
        # First check short-term memory
        if key in self.short_term:
            return self.short_term[key]
        
        # Then check long-term memory
        if key in self.long_term:
            self.long_term[key]["access_count"] += 1
            return self.long_term[key]["value"]
        
        return None
    
    def consolidate(self):
        # Transfer frequently accessed short-term memory to long-term memory
        for key, value in self.short_term.items():
            if self.is_important(key):
                self.store(key, value, importance="high")

9.4 Essential Exploration: Memory as Database

The memory system is essentially an intelligent database. It not only stores data, but also:

Automatically determines the importance of information
Optimizes storage based on access patterns
Provides semantic retrieval capabilities

9.5 Current Problems and Improvement Directions

Problem 1: Memory Conflicts
New information may contradict old information, leading to memory inconsistency.

Improvement Direction: Memory version control. Retain historical versions of information and arbitrate when conflicts occur.
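
A minimal sketch of versioned memory, assuming the simplest arbitration rule (the newest value wins) while older values stay available for inspection. `VersionedMemory` is a hypothetical class, separate from the `MemorySystem` shown earlier.

```python
from datetime import datetime, timezone


class VersionedMemory:
    """Keep every value written to a key so conflicts can be inspected later."""

    def __init__(self):
        self._versions = {}  # key -> list of {"value", "timestamp"} records

    def store(self, key, value):
        record = {"value": value, "timestamp": datetime.now(timezone.utc)}
        self._versions.setdefault(key, []).append(record)

    def recall(self, key):
        """Return the newest value, or None if the key was never stored."""
        history = self._versions.get(key)
        return history[-1]["value"] if history else None

    def history(self, key):
        """Return all values for a key, oldest first."""
        return [record["value"] for record in self._versions.get(key, [])]


memory = VersionedMemory()
memory.store("indent_style", "tabs")
memory.store("indent_style", "4 spaces")
```

After the two writes, `recall` returns the latest preference while `history` still shows that it changed, which gives an arbitration step something to reason about.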

Problem 2: Memory Retrieval
When memory volume is large, how to quickly find relevant information is a challenge.

Improvement Direction: Semantic indexing. Use vector databases to store memory and retrieve through semantic similarity.

Chapter 10: Error Recovery — Error Recovery

10.1 What It Is: The Agent's "Immune System"

The error recovery system is the agent's ability to handle exceptional situations. It ensures that the agent doesn't crash when encountering errors, but can gracefully recover and continue working.

10.2 Why: The Complexity of the Real World

In real environments, various errors can occur:

Tool execution failures: File doesn't exist, insufficient permissions, network timeout
Model output errors: Format errors, logical errors, hallucinations
External dependency failures: API unavailable, service down

An agent without error recovery is like a body without an immune system: the first pathogen it encounters can bring it down.

10.3 How to Design: The Classification and Retry Pattern

Error recovery usually adopts the classification and retry pattern:

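import time

class MaxRetriesExceeded(Exception):
    """Raised when every retry attempt has failed."""

class NoAlternativesAvailable(Exception):
    """Raised when no alternative approach succeeds."""

class ErrorRecovery: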
    def __init__(self):
        self.max_retries = 3
        self.retry_strategies = {
            "transient": self.retry_with_backoff,
            "permanent": self.abort_and_report,
            "recoverable": self.try_alternative
        }
    
    def handle_error(self, error, context):
        # Error classification
        error_type = self.classify_error(error)
        
        # Select recovery strategy
        strategy = self.retry_strategies[error_type]
        
        # Execute recovery
        return strategy(error, context)
    
    def retry_with_backoff(self, error, context):
        # Exponential backoff retry
        for attempt in range(self.max_retries):
            time.sleep(2 ** attempt)
            try:
                return self.retry_operation(context)
            except Exception:
                continue
        raise MaxRetriesExceeded()
    
    def try_alternative(self, error, context):
        # Try alternative approaches
        alternatives = self.get_alternatives(context)
        for alt in alternatives:
            try:
                return alt.execute()
            except Exception:
                continue
        raise NoAlternativesAvailable()

10.4 Essential Exploration: Error as Information

The error recovery system is essentially a learning system. It treats errors as valuable information sources, used for:

Identifying system weak points: Which tools fail most often?
Optimizing retry strategies: What types of errors are worth retrying?
Improving future decisions: Avoiding repeating the same mistakes

10.5 Current Problems and Improvement Directions

Problem 1: Error Propagation
An error in one tool may cascade and affect the entire task.

Improvement Direction: Error isolation. Set independent error boundaries for each tool call to prevent error propagation.

Problem 2: Recovery Cost
Retries and recovery may consume significant time and resources.

Improvement Direction: Intelligent retry decisions. Based on error type, historical success rate, and remaining budget, decide whether retrying is worthwhile.
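
Such a decision could be as simple as the following sketch. The inputs mirror the three factors named above; the function name, the threshold, and the idea of expressing budget and cost in one shared unit (seconds, tokens, or money) are assumptions.

```python
def should_retry(error_type, success_rate, remaining_budget, retry_cost,
                 min_success_rate=0.2):
    """Decide whether another attempt is worth its cost.

    error_type       -- "transient", "recoverable", or "permanent"
    success_rate     -- observed fraction of past retries that succeeded (0..1)
    remaining_budget -- budget left, in the same unit as retry_cost
    retry_cost       -- estimated cost of one more attempt
    """
    if error_type == "permanent":
        return False  # retrying cannot help
    if retry_cost > remaining_budget:
        return False  # cannot afford another attempt
    return success_rate >= min_success_rate
```

For instance, a transient network timeout with a 60% historical success rate and ample budget would be retried, while the same error with only a 5% success rate would not.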

Chapter 11: Subagent — Subagent

11.1 What It Is: The Agent's "Avatar"

The subagent system allows the main agent to delegate tasks to specialized subagents. Each subagent has an independent context and toolset, focusing on completing specific subtasks.

11.2 Why: Decomposition of Complexity

When tasks become complex, a single agent may face:

Context overload: All information crammed into one window
Role confusion: One agent handling too many different types of tasks
Inefficiency: Serial processing of tasks that could be parallel

Subagents solve these problems through divide and conquer.

11.3 How to Design: The Delegation and Aggregation Pattern

The subagent system usually adopts the delegation and aggregation pattern:

class MainAgent:
    def __init__(self):
        self.subagent_pool = []
    
    def delegate_task(self, task):
        # Create specialized subagent
        subagent = self.create_subagent(task)
        
        # Assign task
        result = subagent.execute(task)
        
        # Aggregate results
        return self.aggregate_results(result)
    
    def create_subagent(self, task):
        # Create appropriate subagent based on task type
        if task.type == "code_review":
            return CodeReviewSubagent()
        elif task.type == "testing":
            return TestingSubagent()
        else:
            return GeneralSubagent()

class Subagent:
    def __init__(self):
        self.context = []  # Independent context
        self.tools = []    # Specialized toolset
        self.model = None  # Model client; supplied by the concrete subagent class
    
    def execute(self, task):
        # Execute task in independent context
        messages = [{"role": "user", "content": task.description}]
        
        while True:
            response = self.model.generate(messages)
            messages.append(response)  # Keep the subagent's own turn in its history
            if response.has_tool_call:
                result = self.execute_tool(response.tool_call)
                messages.append(result)
            else:
                # Only the final answer goes back to the main agent
                return response.content

11.4 Essential Exploration: Subagents as Microservices

The subagent system is essentially a microservices architecture. Each subagent is an independent "service" with:

Single responsibility: Only handles specific types of tasks
Independent context: Not interfered with by information from other tasks
Composability: Multiple subagents can collaborate to complete complex tasks

11.5 Current Problems and Improvement Directions

Problem 1: Communication Overhead
Communication between the main agent and subagents requires transmitting large amounts of context.

Improvement Direction: Shared memory. Use shared memory space to reduce communication overhead.

Problem 2: Coordination Complexity
When multiple subagents work in parallel, coordinating their activities becomes complex.

Improvement Direction: Coordination protocols. Define clear coordination protocols to ensure effective collaboration between subagents.

Chapter 12: Task System — Task

12.1 What It Is: The Agent's "Project Management"

The task system decomposes large goals into structured task graphs. It's not just a collection of to-do items, but a workflow engine that manages dependencies, priorities, and execution order between tasks.

12.2 Why: The Bridge from Goals to Execution

Users usually provide high-level goals ("fix this bug", "add this feature") rather than specific execution steps. The core problems the task system solves are:

Goal decomposition: Transforming vague goals into concrete steps
Dependency management: Determining which steps can be parallel and which must be serial
Progress tracking: Knowing overall progress and predicting completion time

12.3 How to Design: The Directed Acyclic Graph Pattern

The task system usually adopts the Directed Acyclic Graph (DAG) pattern:

class TaskGraph:
    def __init__(self):
        self.tasks = {}
        self.dependencies = {}
    
    def add_task(self, task_id, description, dependencies=None):
        self.tasks[task_id] = {
            "description": description,
            "status": "pending",
            "result": None
        }
        # Copy the list to avoid sharing one mutable default between tasks
        self.dependencies[task_id] = list(dependencies or [])
    
    def get_executable_tasks(self):
        # Return all tasks whose dependencies are satisfied
        executable = []
        for task_id, deps in self.dependencies.items():
            if self.tasks[task_id]["status"] == "pending":
                if all(self.tasks[dep]["status"] == "completed" for dep in deps):
                    executable.append(task_id)
        return executable
    
    def mark_completed(self, task_id, result):
        self.tasks[task_id]["status"] = "completed"
        self.tasks[task_id]["result"] = result
    
    def is_complete(self):
        return all(t["status"] == "completed" for t in self.tasks.values())

12.4 Essential Exploration: Tasks as Workflow

The task system is essentially a workflow engine. It separates execution logic from task definitions, which means:

Tasks can be visualized: Task graphs can intuitively display work progress
Tasks can be optimized: By analyzing dependencies, find the optimal execution order
Tasks can be recovered: When a task fails, execution can resume from the point of failure instead of starting over
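
The optimization point above, deriving an execution order from dependencies, can be sketched with Python's standard-library graphlib module (Python 3.9 or later). This is a minimal illustration, not the project's implementation: the function name execution_batches and the sample task names are hypothetical, and the input is assumed to have the same shape as TaskGraph.dependencies (each task id mapped to the ids it depends on).

```python
from graphlib import TopologicalSorter, CycleError

def execution_batches(dependencies):
    """Group task ids into batches; tasks in the same batch can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to keep the output stable
        batches.append(ready)
        sorter.done(*ready)  # unlock tasks that depended on this batch
    return batches

# Hypothetical plan for a bug fix
deps = {
    "reproduce_bug": [],
    "write_test": ["reproduce_bug"],
    "fix_code": ["reproduce_bug"],
    "run_tests": ["write_test", "fix_code"],
}
batches = execution_batches(deps)
print(batches)
```

Each inner list is one round of work: everything in it has all of its dependencies satisfied, so a harness could hand the whole round to parallel workers.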

12.5 Current Problems and Improvement Directions

Problem 1: Task Granularity
The granularity of task decomposition is difficult to grasp—too fine leads to management overhead, too coarse loses parallelism.

Improvement Direction: Adaptive granularity. Dynamically adjust task granularity based on task complexity and dependencies.

Problem 2: Dynamic Adjustment
When new information is discovered during execution, the task graph may need to be adjusted.

Improvement Direction: Dynamic task graphs. Allow adding, deleting, or modifying tasks during execution.
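
One thing a dynamic task graph has to guard is the "acyclic" property: an edit made during execution must not introduce a circular dependency. The sketch below shows one way to check this before accepting a change. The helper name would_create_cycle is hypothetical, and the dependency mapping is assumed to have the same shape as TaskGraph.dependencies.

```python
def would_create_cycle(dependencies, task_id, new_deps):
    """Return True if making task_id depend on new_deps would create a cycle.

    Walks upstream from each proposed dependency; reaching task_id again
    means task_id would (indirectly) depend on itself.
    """
    stack = list(new_deps)
    seen = set()
    while stack:
        current = stack.pop()
        if current == task_id:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(dependencies.get(current, []))
    return False

deps = {"design": [], "implement": ["design"], "test": ["implement"]}

# Adding a brand-new task that depends on existing ones is safe
print(would_create_cycle(deps, "document", ["test"]))

# Making "design" depend on "test" would close a loop
print(would_create_cycle(deps, "design", ["test"]))
```

A harness could run this check inside whatever method mutates the graph and reject the change when it returns True.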

Chapter 13: Background Tasks — Background Tasks

13.1 What It Is: The Agent's "Asynchronous Capability"

The background task system allows agents to put time-consuming operations in the background while continuing to handle other work. It's a key mechanism for agents to achieve non-blocking operations.

13.2 Why: Efficiency and Responsiveness

Some operations may take a long time (compiling code, running tests, downloading files). If the agent must wait for these operations to complete, it will cause:

Response delays: Users wait too long
Resource waste: The agent can't do other work while waiting
Poor user experience: The agent feels "stuck"

13.3 How to Design: The Asynchronous Execution Pattern

The background task system usually adopts the asynchronous execution pattern:

import queue
import threading

class BackgroundTaskManager:
    def __init__(self):
        self.tasks = {}
        self.task_queue = queue.Queue()
        self.workers = []
    
    def submit_task(self, task_id, task_func, callback):
        # Submit task to background
        self.tasks[task_id] = {
            "status": "running",
            "result": None,
            "callback": callback
        }
        
        # Execute asynchronously
        thread = threading.Thread(
            target=self._execute_task,
            args=(task_id, task_func)
        )
        thread.start()
    
    def _execute_task(self, task_id, task_func):
        try:
            result = task_func()
        except Exception as e:
            self.tasks[task_id]["status"] = "failed"
            self.tasks[task_id]["error"] = str(e)
            return
        
        self.tasks[task_id]["status"] = "completed"
        self.tasks[task_id]["result"] = result
        
        # Call the callback outside the try block so that an error raised by
        # the callback is not recorded as a failure of the task itself
        callback = self.tasks[task_id]["callback"]
        if callback:
            callback(result)
    
    def get_task_status(self, task_id):
        return self.tasks[task_id]["status"]
    
    def get_task_result(self, task_id):
        task = self.tasks[task_id]
        if task["status"] == "completed":
            return task["result"]
        return None

13.4 Essential Exploration: Background as Concurrency

The background task system is essentially a concurrent execution framework. It allows agents to:

Handle multiple tasks simultaneously: Improve resource utilization
Maintain responsiveness: Users won't feel the agent is "stuck"
Implement pipelines: One task's output can be another task's input

13.5 Current Problems and Improvement Directions

Problem 1: Task Monitoring
Background tasks may fail, but the agent may not know.

Improvement Direction: Proactive monitoring. Regularly check background task status and notify users promptly when tasks fail.
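
As a rough sketch of proactive monitoring, the standard-library concurrent.futures module lets a callback run as soon as a background task finishes, so failures are recorded without polling. The names watch, failures, and flaky_build are hypothetical, and a real harness would notify the user rather than append to a list.

```python
from concurrent.futures import ThreadPoolExecutor

failures = []  # (task_id, error message) pairs collected for reporting

def watch(task_id, future):
    """Attach a callback that records a failure as soon as the task ends."""
    def on_done(fut):
        error = fut.exception()  # None when the task succeeded
        if error is not None:
            failures.append((task_id, str(error)))
    future.add_done_callback(on_done)

def flaky_build():
    raise RuntimeError("compiler exited with status 1")

with ThreadPoolExecutor(max_workers=2) as pool:
    watch("build", pool.submit(flaky_build))
    watch("lint", pool.submit(lambda: "ok"))
# Leaving the with-block waits for both tasks to finish

print(failures)
```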

Problem 2: Resource Management
Too many background tasks may exhaust system resources.

Improvement Direction: Resource limits. Set maximum concurrent task count to prevent system overload.
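
A concurrency cap can be sketched with a bounded thread pool: extra tasks wait in the pool's queue instead of starting new threads. The limit MAX_CONCURRENT and the function tracked_job are assumptions made for illustration; the counters exist only to show that the cap holds.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2  # hypothetical limit; tune for the host machine
active = 0
peak = 0
lock = threading.Lock()

def tracked_job(n):
    """Stand-in for real work; records how many jobs run at the same time."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)
    with lock:
        active -= 1
    return n * n

# Six jobs are submitted, but at most MAX_CONCURRENT run at once
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(tracked_job, range(6)))

print(results, "peak concurrency:", peak)
```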

Chapter 14: Scheduled Execution — Cron Scheduler

14.1 What It Is: The Agent's "Alarm Clock"

The scheduled execution system allows agents to execute tasks on schedule. It's a key mechanism for agents to achieve automation and periodic work.

14.2 Why: From Passive to Active

Without scheduled execution, agents can only passively respond to user requests. Scheduled execution enables agents to:

Proactively execute tasks: No need for users to manually trigger each time
Maintain system health: Regular checks, cleanup, backups
Respond to time events: Execute specific operations at specific times

14.3 How to Design: The Scheduler Pattern

The scheduled execution system usually adopts the scheduler pattern:

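import time
from datetime import datetime

class CronScheduler:
    def __init__(self):
        self.jobs = {}
        self.running = False
    
    def add_job(self, job_id, schedule, task_func, args=None):
        self.jobs[job_id] = {
            "schedule": schedule,
            "task": task_func,
            # Copy into a fresh list: a mutable default argument ([]) would be shared across calls
            "args": list(args or []),
            "last_run": None,
            # calculate_next_run (and handle_job_error below) are assumed to be defined elsewhere
            "next_run": self.calculate_next_run(schedule)
        }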
    
    def start(self):
        self.running = True
        while self.running:
            now = datetime.now()
            
            # Check if any tasks need to be executed; iterate over a snapshot
            # so that jobs added from another thread do not break the loop
            for job_id, job in list(self.jobs.items()):
                if now >= job["next_run"]:
                    # Execute task
                    self.execute_job(job_id)
                    
                    # Update next execution time
                    job["last_run"] = now
                    job["next_run"] = self.calculate_next_run(job["schedule"])
            
            # Wait for a while before checking again
            time.sleep(1)
    
    def execute_job(self, job_id):
        job = self.jobs[job_id]
        try:
            job["task"](*job["args"])
        except Exception as e:
            self.handle_job_error(job_id, e)
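
The scheduler above calls calculate_next_run without defining it. A minimal sketch follows, under a simplifying assumption: the schedule is a fixed interval in seconds rather than a real cron expression. The extra last_run and now parameters are additions made here so the function can be reasoned about in isolation.

```python
from datetime import datetime, timedelta

def calculate_next_run(interval_seconds, last_run=None, now=None):
    """Return the next run time for a fixed-interval schedule."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    now = now or datetime.now()
    step = timedelta(seconds=interval_seconds)
    if last_run is None:
        return now + step  # first run: one interval from now
    next_run = last_run + step
    # If the scheduler missed one or more slots, skip ahead to the next future slot
    while next_run <= now:
        next_run += step
    return next_run

base = datetime(2026, 1, 1, 12, 0, 0)
print(calculate_next_run(60, last_run=None, now=base))
print(calculate_next_run(60, last_run=base, now=base + timedelta(seconds=150)))
```

Skipping missed slots is a design choice: the alternative, running once for every missed slot, can flood the agent with catch-up work after a long pause.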

14.4 Essential Exploration: Scheduling as Time Management

The scheduled execution system is essentially a time management system. It introduces the time dimension into agent behavior, which means:

Agents can plan ahead: Know when to do what
Agents can optimize time: Execute low-priority tasks during idle time
Agents can respond to time events: Trigger specific behaviors at specific times

14.5 Current Problems and Improvement Directions

Problem 1: Schedule Conflicts
Multiple tasks may trigger at the same time, causing resource competition.

Improvement Direction: Priority scheduling. Set priorities for tasks, with high-priority tasks executing first.
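
Priority scheduling for jobs that come due at the same moment can be sketched with a heap. The class PriorityJobQueue and the job names are hypothetical, and the convention that a lower number means higher priority is an assumption of this sketch.

```python
import heapq
import itertools

class PriorityJobQueue:
    """Orders due jobs by priority, then by insertion order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def push(self, job_id, priority):
        # Lower number means higher priority (assumption of this sketch)
        heapq.heappush(self._heap, (priority, next(self._counter), job_id))

    def pop(self):
        _priority, _order, job_id = heapq.heappop(self._heap)
        return job_id

    def __len__(self):
        return len(self._heap)

due = PriorityJobQueue()
due.push("cleanup_tmp", priority=5)
due.push("backup_db", priority=1)
due.push("rotate_logs", priority=5)

order = [due.pop() for _ in range(len(due))]
print(order)
```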

Problem 2: Schedule Failure
Jobs are held only in memory, so all scheduled tasks are lost when the agent restarts.

Improvement Direction: Persistent scheduling. Persistently store scheduling configurations and automatically restore after agent restart.
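
A minimal sketch of persistent scheduling is to write the job configuration to a JSON file and reload it at startup. Functions cannot be serialized to JSON, so this sketch assumes each job stores a task name that the harness maps back to a callable; save_jobs, load_jobs, the task_name field, and the file name are all hypothetical.

```python
import json
import os
import tempfile

def save_jobs(jobs, path):
    """Write job configs atomically: write a temp file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(jobs, f, indent=2)
    os.replace(tmp_path, path)  # a crash mid-write never leaves a half-written file

def load_jobs(path):
    """Return saved job configs, or an empty dict on first start."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)

jobs = {
    "nightly_backup": {"schedule": 86400, "task_name": "backup", "args": ["/data"]},
}

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "jobs.json")
    save_jobs(jobs, path)
    restored = load_jobs(path)  # what a restarted agent would see

print(restored)
```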

Chapter 15: Multi-Agent Collaboration — Agent Teams

15.1 What It Is: The Agent's "Team"

The Agent Teams system allows multiple agents to work together to complete complex tasks. It's the most advanced form of Harness Engineering, extending single-agent capabilities to the team level.

15.2 Why: The Limits of Single Agents

Single agents face inherent limitations:

Context window limitations: Cannot process all relevant information simultaneously
Professional capability limitations: One agent cannot master all domains
Parallelism limitations: Single-threaded execution cannot fully utilize resources

Agent Teams breaks through these limitations through division of labor and collaboration.

15.3 How to Design: The Team Protocol Pattern

Agent Teams usually adopts the team protocol pattern:

class AgentTeam:
    def __init__(self):
        self.agents = {}
        self.shared_memory = SharedMemory()
        self.communication_protocol = CommunicationProtocol()
    
    def add_agent(self, agent_id, agent, role):
        self.agents[agent_id] = {
            "agent": agent,
            "role": role,
            "status": "idle"
        }
    
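    def assign_task(self, task):
        # Analyze task and decide which agent to assign to
        suitable_agent = self.find_suitable_agent(task)
        if suitable_agent is None:
            # find_suitable_agent returns None when no idle agent has the required role
            raise RuntimeError(f"No idle agent with role {task.required_role!r}")
        
        # Assign task; always release the agent afterwards, even if execution fails
        self.agents[suitable_agent]["status"] = "busy"
        try:
            result = self.agents[suitable_agent]["agent"].execute(task)
        finally:
            self.agents[suitable_agent]["status"] = "idle"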
        
        # Update shared memory
        self.shared_memory.store(task.id, result)
        
        # Notify other agents
        self.communication_protocol.broadcast(
            "task_completed",
            {"task_id": task.id, "result": result}
        )
        
        return result
    
    def find_suitable_agent(self, task):
        # Select the most suitable agent based on role and capability
        for agent_id, agent_info in self.agents.items():
            if agent_info["role"] == task.required_role:
                if agent_info["status"] == "idle":
                    return agent_id
        return None

15.4 Essential Exploration: Teams as Distributed Systems

Agent Teams is essentially a distributed system. It distributes the complexity of a single agent across multiple specialized agents, which means:

Each agent focuses only on its area of expertise
Multiple agents can process different subtasks in parallel
The system can scale by adding more agents

15.5 Current Problems and Improvement Directions

Problem 1: Coordination Overhead
Communication and coordination between agents consume significant resources.

Improvement Direction: Efficient protocols. Design more efficient communication protocols to reduce coordination overhead.

Problem 2: Consistency Guarantee
Multiple agents may have different understandings of the same problem, leading to inconsistency.

Improvement Direction: Consensus mechanisms. Introduce blockchain-like consensus mechanisms to ensure team decision consistency.
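
To make the idea of a consensus step concrete, the sketch below uses simple majority voting. This is a much lighter technique than blockchain-style consensus and stands in for it here only for illustration; majority_decision, the quorum parameter, and the agent names are hypothetical.

```python
from collections import Counter

def majority_decision(votes, quorum=0.5):
    """Return the option backed by more than `quorum` of the voters, else None.

    `votes` maps an agent id to the option that agent chose.
    """
    if not votes:
        return None
    option, count = Counter(votes.values()).most_common(1)[0]
    return option if count / len(votes) > quorum else None

votes = {"reviewer": "approve", "tester": "approve", "security": "reject"}
print(majority_decision(votes))

# A tie does not clear the quorum, so the team would need to escalate
print(majority_decision({"reviewer": "approve", "security": "reject"}))
```

Returning None instead of picking a side forces the caller to decide what happens without agreement, for example by asking a human.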

Chapter 16: Comprehensive Agent — Comprehensive Agent Turn

16.1 What It Is: The "Final Form" of Harness Engineering

The comprehensive agent is the integrated form of all Harness Engineering components. It brings together Agent Loop, Tool Use, Permission, Hooks, Todo, Skill, Context, Memory, Error Recovery, Subagent, Task, Background Tasks, Cron Scheduler, Agent Teams, and all other mechanisms into one loop, forming a complete production-grade agent.

16.2 Why: From Components to System

Individual components are like car parts—each has its function, but only when assembled together can they become a usable car. The core problems the comprehensive agent solves are:

Component integration: Ensuring all components can work together
Behavior consistency: Different components' behaviors don't conflict with each other
Performance optimization: Overall performance is greater than the sum of its parts

16.3 How to Design: The Layered Architecture Pattern

The comprehensive agent usually adopts the layered architecture pattern:

class ComprehensiveAgent:
    def __init__(self):
        # Core layer
        self.agent_loop = AgentLoop()
        
        # Tool layer
        self.tool_registry = ToolRegistry()
        self.permission_manager = PermissionManager()
        
        # Planning layer
        self.todo_manager = TodoManager()
        self.task_graph = TaskGraph()
        self.skill_loader = SkillLoader()
        
        # Memory layer
        self.context_manager = ContextManager()
        self.memory_system = MemorySystem()
        
        # Execution layer
        self.background_task_manager = BackgroundTaskManager()
        self.cron_scheduler = CronScheduler()
        
        # Collaboration layer
        self.subagent_manager = SubagentManager()
        self.agent_team = AgentTeam()
        
        # Cross-cutting concerns
        self.hook_system = HookSystem()
        self.error_recovery = ErrorRecovery()
    
    def process_request(self, user_request):
        # 1. Build system prompt
        system_prompt = self.build_system_prompt()
        
        # 2. Load relevant skills
        relevant_skills = self.skill_loader.identify_skills(user_request)
        
        # 3. Create task plan
        task_plan = self.create_task_plan(user_request)
        
        # 4. Execute agent loop
        while not task_plan.is_complete():
            # Get next task
            task = task_plan.get_next_task()
            
            # Decide whether to handle it yourself or delegate
            if self.should_delegate(task):
                result = self.delegate_to_subagent(task)
            else:
                result = self.execute_task(task)
            
            # Update task status
            task_plan.mark_completed(task, result)
            
            # Update memory
            self.memory_system.store(task.id, result)
        
        # 5. Aggregate results
        final_result = self.aggregate_results(task_plan)
        
        return final_result

16.4 Essential Exploration: Comprehensive Agent as Operating System

The comprehensive agent is essentially an operating system. It manages all of the agent's resources (context, memory, tools, time) and provides all services (planning, execution, collaboration, recovery), which means:

Agents can focus on thinking and decision-making: No need to worry about low-level details
The system can guarantee reliability and security: Through permission and error recovery mechanisms
Capabilities can scale horizontally: Through subagent and team mechanisms

16.5 Current Problems and Improvement Directions

Problem 1: Complexity Management
As components are added, the number of possible interactions between them grows rapidly, and system complexity grows with it.

Improvement Direction: Modular design. Break the system down into independent modules, each focusing on a single responsibility.

Problem 2: Performance Bottlenecks
Interaction between multiple components may create performance bottlenecks.

Improvement Direction: Performance analysis and optimization. Use performance analysis tools to identify bottlenecks and perform targeted optimization.
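
Before reaching for a full profiler, a harness can time each component with a small context manager. This is a minimal sketch; the helper timed, the timings dictionary, and the component labels are hypothetical, and the sleeps stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # component name -> list of durations in seconds

@contextmanager
def timed(component):
    """Record how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[component].append(time.perf_counter() - start)

with timed("context_manager"):
    time.sleep(0.01)  # stand-in for context compression
with timed("tool_call"):
    time.sleep(0.05)  # stand-in for a slow tool

slowest = max(timings, key=lambda name: sum(timings[name]))
print("slowest component:", slowest)
```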

Chapter 17: Current Problems and Future Improvements

17.1 Special Challenges of Coding Agents

Coding Agents are the most typical application scenario of Harness Engineering, and also the most challenging domain:

Problem 1: Depth of Code Understanding
Current agents' understanding of code is often superficial: they can identify syntax structures but struggle to grasp deeper semantics and design intent.

Improvement Directions:

Program analysis techniques: Introduce traditional compiler techniques like static analysis and data flow analysis
Knowledge graphs: Build code knowledge graphs to capture complex relationships between code
Multi-modal understanding: Combine code, comments, documentation, tests, and other information sources

Problem 2: Limitations of Long-Term Memory
Programming tasks often require understanding the overall architecture and historical decisions of a project, but current agents' memory capabilities are limited.

Improvement Directions:

Project memory: Maintain dedicated memory spaces for each project
Decision tracing: Record the reasons and context for each design decision
Knowledge evolution: Track codebase evolution and understand reasons for changes

Problem 3: Insufficient Collaboration Capabilities
Large software projects require multi-person collaboration, but current agents have difficulty collaborating effectively with human developers.

Improvement Directions:

Human-machine collaboration protocols: Define clear human-machine collaboration workflows
Intent understanding: Better understand developer intentions and preferences
Conflict resolution: Define how to proceed when agent suggestions conflict with developer opinions

17.2 Improvement Directions for General Agents

Direction 1: Enhancing Autonomy
Current agents are still "passive"—requiring user triggers to act. Future agents should be able to:

Proactively discover problems: Monitor system status and proactively identify potential issues
Proactively make suggestions: Based on observations and experience, proactively suggest improvements
Proactively learn: Learn from each interaction and continuously improve capabilities

Direction 2: Enhancing Explainability
Current agents' decision-making processes are often "black boxes." Future agents should be able to:

Explain decision reasons: Clearly explain why a particular decision was made
Provide alternatives: Show other possible choices and their trade-offs
Support audit trails: Record complete decision processes for post-hoc review
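
An audit trail can start as one structured record per decision, appended to a log. The sketch below shows a possible record shape; audit_record and its field names are assumptions, not an established format.

```python
import json
from datetime import datetime, timezone

def audit_record(action, reason, alternatives):
    """Serialize one decision as a JSON line suitable for an append-only log."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "reason": reason,
        "alternatives": alternatives,  # options considered but not chosen
    })

line = audit_record(
    action="run_tests",
    reason="code changed in the payments module",
    alternatives=["skip tests", "run only unit tests"],
)
print(line)
```

Because each record is a single line of JSON, the log can be filtered with ordinary text tools during a post-hoc review.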

Direction 3: Strengthening Security
As agent capabilities increase, security becomes increasingly important. Future agents should be able to:

Risk assessment: Assess potential risks before executing operations
Sandbox execution: Execute high-risk operations in isolated environments
Anomaly detection: Identify and block abnormal or malicious behavior
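
As a rough illustration of risk assessment before execution, the sketch below classifies a shell command by its program name. The risk lists and the function assess_risk are hypothetical, and matching on names alone is far too weak for production use; it only shows where such a check would sit.

```python
import shlex

HIGH_RISK = {"rm", "dd", "mkfs", "shutdown"}  # hypothetical: block or sandbox these
MEDIUM_RISK = {"git", "pip", "curl", "mv"}    # hypothetical: ask the user first

def assess_risk(command):
    """Return "high", "medium", or "low" for a shell command string."""
    tokens = shlex.split(command)
    if not tokens:
        return "low"
    program = tokens[0]
    if program in HIGH_RISK or "sudo" in tokens:
        return "high"
    if program in MEDIUM_RISK:
        return "medium"
    return "low"

for cmd in ["ls -la", "git push origin main", "rm -rf build"]:
    print(cmd, "->", assess_risk(cmd))
```

A harness could route "high" results to a sandbox or a refusal, "medium" results to a user confirmation, and let "low" results run directly.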

17.3 Technology Evolution Trends

Trend 1: From Rules to Learning
Current Harness Engineering heavily relies on rules (permission rules, error handling rules, scheduling rules, etc.). The future trend is:

Learning-based rules: Automatically discover and optimize rules through machine learning
Adaptive strategies: Dynamically adjust strategies based on environment and historical data
Predictive intervention: Predict potential problems and take preventive measures in advance

Trend 2: From Monolithic to Distributed
Current agents are usually monolithic—all functions concentrated in one process. The future trend is:

Microservices architecture: Break agents down into independent microservices
Edge computing: Offload some computing tasks to edge devices
Cloud-native deployment: Use cloud-native technologies for elastic scaling

Trend 3: From Tool to Partner
Current agents are more like "tools"—passively executing user instructions. The future trend is:

Partnership: Agents become human partners, jointly solving problems
Emotional understanding: Understand users' emotional states and provide more humanized interaction
Value alignment: Ensure agent behavior aligns with human values

Chapter 18: Design Philosophy and Best Practices

18.1 Core Design Principles

Principle 1: Simplicity
The core idea of Harness Engineering is "one loop, surrounded by subsystems." The elegance of this design lies in:

The core loop stays simple: Only does the most basic things
Complexity is pushed to the periphery: Complex situations are handled through subsystems
Easy to understand and maintain: Developers can quickly understand how the system works

Principle 2: Progressiveness
The Learn Claude Code project demonstrates the power of progressive design:

Start with a minimum viable product: Minimal loop of 102 lines of code
Gradually add features: Only one subsystem added at a time
Maintain usability at each step: Each stage is a working system

Principle 3: Composability
Each subsystem is independent and can be freely combined:

Modular design: Each subsystem focuses on a single responsibility
Standard interfaces: Subsystems communicate through standard interfaces
Plugin-based extension: New subsystems can be added through plugins
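
The three points above can be sketched with a structural interface and a registry. This is one possible shape, not the project's design: the Subsystem protocol, its on_turn method, and the Harness and TurnCounter classes are all hypothetical.

```python
from typing import Protocol

class Subsystem(Protocol):
    """Standard interface: any object with a name and an on_turn method qualifies."""
    name: str

    def on_turn(self, state: dict) -> dict: ...

class TurnCounter:
    """A tiny plugin with a single responsibility: counting turns."""
    name = "turn_counter"

    def on_turn(self, state: dict) -> dict:
        return {**state, "turns": state.get("turns", 0) + 1}

class Harness:
    def __init__(self):
        self._subsystems = []

    def register(self, subsystem: Subsystem):
        # New subsystems plug in here without changes to the core loop
        self._subsystems.append(subsystem)

    def run_turn(self, state: dict) -> dict:
        for subsystem in self._subsystems:
            state = subsystem.on_turn(state)
        return state

harness = Harness()
harness.register(TurnCounter())
state = harness.run_turn(harness.run_turn({}))
print(state)
```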

18.2 Implementation Best Practices

Practice 1: Test-Driven Development
Harness Engineering should adopt test-driven development:

Unit tests: Each subsystem should have complete unit tests
Integration tests: Test interactions between subsystems
End-to-end tests: Test complete agent behavior

Practice 2: Monitoring and Observability
Agents in production environments need comprehensive monitoring:

Performance metrics: Response time, throughput, error rate
Business metrics: Task completion rate, user satisfaction
Security metrics: Permission denial rate, abnormal behavior detection

Practice 3: Progressive Deployment
Agent systems should adopt progressive deployment strategies:

Canary releases: Test in a small scope first
A/B testing: Compare performance of old and new versions
Rollback mechanisms: Ability to roll back quickly when problems occur

18.3 Common Pitfalls and How to Avoid Them

Pitfall 1: Over-Engineering
Don't optimize prematurely, don't add unnecessary features.
How to avoid: Follow the YAGNI principle (You Aren't Gonna Need It)

Pitfall 2: Neglecting Error Handling
Error handling is the most easily overlooked part of Harness Engineering.
How to avoid: Treat error handling as a first-class citizen and consider it from the design phase onward

Pitfall 3: Performance Bottlenecks
Agent systems may face various performance bottlenecks.
How to avoid: Consider performance from the design phase onward, and regularly run performance tests and optimize based on the results

Conclusion: The Future of Harness Engineering

Harness Engineering represents a critical step for AI Agents moving from the laboratory to production environments. It's not a single technology, but a complete engineering methodology covering the entire agent lifecycle: from design, development, testing, deployment, to operations.

Through studying the Learn Claude Code project, we can see:

The core of Harness Engineering is simplicity: One loop, surrounded by subsystems
The key to Harness Engineering is progressiveness: Start with a minimum viable product and gradually add features
The value of Harness Engineering is reliability: Transforming unstable AI models into reliable production systems

Future Harness Engineering will develop in the following directions:

More intelligent: Automatically optimize agent behavior through machine learning
More autonomous: Agents can proactively discover problems and opportunities
More collaborative: Agents become true human partners
More secure: Ensure agent behavior aligns with human values

Harness Engineering is not just a technical problem, but a philosophical one: How do we build a system that can both leverage AI's powerful capabilities and be effectively controlled by humans? The answer to this question will determine the role and impact of AI Agents in the future.

As developers, we are in an exciting era. Harness Engineering provides us with tools and methods to build truly useful AI Agents. Let's embrace this challenge and shape the future of AI together.

Appendix: Harness Engineering Checklist

When building your own harness, you can refer to the following checklist:

Core Loop

Implement a minimal viable agent loop
Ensure the loop can terminate correctly
Implement basic error handling

Tool System

Design tool registration mechanism
Implement tool execution framework
Add tool documentation and examples

Permission Control

Define permission policies
Implement permission checking mechanism
Add user confirmation workflow

Hook System

Identify key points that need hooks
Implement hook registration and execution mechanism
Add logging and monitoring hooks

Plan Management

Implement basic Todo system
Add task state management
Implement dependency management

Skill Loading

Define skill format
Implement skill loading mechanism
Add skill recommendation functionality

System Prompt

Design basic system prompt
Implement runtime assembly mechanism
Add dynamic context injection

Context Management

Implement context window management
Add context compression mechanism
Implement important information retention strategy

Memory System

Design memory storage structure
Implement memory retrieval mechanism
Add memory importance assessment

Error Recovery

Define error classification
Implement error handling strategy
Add retry and recovery mechanisms

Subagent

Design subagent interface
Implement task delegation mechanism
Add result aggregation functionality

Task System

Implement task graph structure
Add dependency management
Implement progress tracking

Background Tasks

Design asynchronous execution framework
Implement task monitoring
Add resource management

Scheduled Execution

Implement basic scheduler
Add persistence mechanism
Implement schedule conflict handling

Multi-Agent Collaboration

Design team protocol
Implement inter-agent communication
Add conflict resolution mechanism

By following this checklist, you can build a complete, reliable, and scalable harness that gives AI Agents a solid infrastructure foundation.

Prompt

Write an article about AI Harness Engineering, referencing the Learn Claude Code project, covering aspects such as Agent Loop, Prompt, Tool Use, MCP, Permission, Hooks, Todo, Skill, Context, Memory, Error Recovery, Subagent, Task, Background Tasks, Cron Scheduler, Agent Teams, etc. The article should be easy to understand, hit the key points, and connect each part with what comes before and after it. First explain what it is, why, and how to design it, then deeply explore its essence based on current latest research, and explain the current problems of Agents, especially Coding Agents, in this regard and how they will be improved in the future. The article has no word limit, aiming to thoroughly explain Harness Engineering in one article.