Harness Engineering
Harness (pronounced [ˈhɑːnɪs]) originally refers to the equipment placed on a horse to control and guide it. In the AI context, Harness Engineering refers to the control system built around AI Agents to guide, constrain, and enhance their capabilities, so that they operate stably and controllably in production environments.
Introduction: Why Do We Need Harness Engineering?
When we talk about AI Agents, what often comes to mind is an intelligent entity that can think autonomously, use tools, and complete complex tasks. Real-world Agents are far more complex than this: they must handle a series of engineering problems such as permission control, error recovery, context management, and task scheduling. An agent is like a fine horse: without a proper harness, it cannot effectively pull a cart or carry people.
The essence of Harness Engineering is to provide a complete "operating system" for AI Agents. It does not change the agent's core capabilities (reasoning, generation, decision-making), but through a series of carefully designed subsystems, organizes these capabilities into a reliable, controllable, and scalable production system.
The Learn Claude Code project provides us with an excellent case study. Through 20 progressive steps, it gradually builds a production-grade agent framework from a minimal agent loop of 102 lines of code to 1708 lines. This process perfectly demonstrates the core idea of Harness Engineering: one loop, surrounded by subsystems that give it production form.
This article will deeply analyze the design philosophy, implementation principles, and future evolution directions of Harness Engineering from 16 dimensions: Agent Loop, Tool Use, Permission, Hooks, Todo, Skill, Context, Memory, Error Recovery, Subagent, Task, Background Tasks, Cron Scheduler, Agent Teams, and more.
Chapter 1: The Core Loop — Agent Loop
1.1 What It Is: The Agent's "Heartbeat"
The Agent Loop is the heart of Harness Engineering. Its essence is extremely simple:

    while True:
        response = model.generate(messages)
        if response.has_tool_call:
            result = execute_tool(response.tool_call)
            messages.append(result)
        else:
            break

This loop has only three steps:
- Call the model: Send the current conversation history to the language model
- Check the response: Does the model request to use a tool?
- Execute or end: If there's a tool call, execute it and feed back the result; otherwise, end
1.2 Why: Simplicity Is Power
Why can such a simple loop serve as the foundation of the entire framework? Because it embodies the philosophy of recursive decomposition:
- Any complex task can be decomposed into a series of tool calls
- Each tool call can be further decomposed into smaller tool calls
- This continues until the remaining work can be completed directly
The elegance of this design is that the loop itself doesn't need to know what problem it's solving. It's just an execution engine that converts the model's intentions into actual actions.
1.3 How to Design: The Minimal Viable Loop
The s01 phase of the Learn Claude Code project demonstrates the implementation of a minimal viable loop:

    # Minimal agent loop (102 lines of code in the project; abridged here)
    import anthropic

    client = anthropic.Anthropic()
    messages = []  # must hold at least one user message before the first call
    tools = [read_file_tool]  # Only 1 tool

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        # Keep the assistant turn in the history so the tool result has a matching request
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "tool_use":
            # Execute tool and continue loop; execute_tool is assumed to return tool_result blocks
            tool_result = execute_tool(response.content)
            messages.append({"role": "user", "content": tool_result})
        else:
            # Model finished, exit loop
            print(response.content)
            break

The essence of this design is that the loop is infinite, but the model's response determines when to exit. The model is both the decision-maker and the judge of the termination condition.
1.4 Essential Exploration: Loop as State Machine
From a deeper perspective, Agent Loop is essentially a finite state machine:

    State: [Thinking] --model response--> [Using Tool] --execution result--> [Thinking]
           [Thinking] --no tool call--> [End]

Each state transition is determined by the model's output. This means:
- Agent behavior is entirely driven by the model's reasoning
- The framework is only responsible for executing state transitions
- Any complex behavior is a combination of this simple state machine
1.5 Current Problems and Improvement Directions
Problem 1: Loop Efficiency
The current loop is serial—only one tool call can be executed at a time. When multiple tool calls have no dependencies between them, this causes unnecessary waiting.
Improvement Direction: Parallel tool execution. The framework can analyze dependencies between tool calls and execute independent calls in parallel.
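The parallel execution idea can be sketched with a thread pool. This is only an illustration under the assumption that the framework has already decided the calls are independent; `run_tools_in_parallel` and the two handlers are hypothetical names, not part of the Learn Claude Code project.

```python
from concurrent.futures import ThreadPoolExecutor

def run_tools_in_parallel(tool_calls, handlers, max_workers=4):
    """Run independent tool calls concurrently; results keep the input order."""
    def run_one(call):
        name, params = call
        return handlers[name](**params)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as tool_calls
        return list(pool.map(run_one, tool_calls))

# Hypothetical handlers standing in for real tools
handlers = {
    "word_count": lambda text: len(text.split()),
    "upper": lambda text: text.upper(),
}
results = run_tools_in_parallel(
    [("word_count", {"text": "a b c"}), ("upper", {"text": "ok"})],
    handlers,
)
```

Because the results come back in request order, the loop can append them to the conversation exactly as it would in the serial case.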
Problem 2: Loop Control
The model may fall into an infinite loop—repeatedly calling the same tools without making progress.
Improvement Direction: Introduce loop detection and intervention mechanisms. The framework can monitor tool call patterns and forcibly inject new prompts or terminate the loop when a loop is detected.
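One simple form of loop detection is to watch for the same tool call repeating several times in a row. The sketch below assumes tool parameters are hashable values; `LoopDetector` and its `threshold` are illustrative names, and a real framework would likely use a richer notion of "no progress".

```python
from collections import deque

class LoopDetector:
    """Flags when the same tool call repeats `threshold` times in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.recent = deque(maxlen=threshold)

    def record(self, tool_name, params):
        # Sort items so that parameter order does not affect the signature
        signature = (tool_name, tuple(sorted(params.items())))
        self.recent.append(signature)
        return self.is_looping()

    def is_looping(self):
        # Looping means the window is full and holds one distinct signature
        return (
            len(self.recent) == self.threshold
            and len(set(self.recent)) == 1
        )

detector = LoopDetector(threshold=3)
flags = [detector.record("read_file", {"path": "a.txt"}) for _ in range(3)]
```

When `record` returns `True`, the framework can inject a corrective prompt or stop the loop.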
Chapter 2: The Tool System — Tool Use
2.1 What It Is: The Agent's "Hands"
If Agent Loop is the heart, then Tool Use is the agent's hands. Tools are the only way for agents to interact with the outside world—whether reading files, executing commands, calling APIs, or operating databases.
2.2 Why: The Boundary of Capability Extension
Language models themselves can only generate text. To make it a true agent, it must be given the ability to act. The core problem the tool system solves is: how to let the model know what tools are available and how to use them correctly.
2.3 How to Design: The Dispatcher Pattern
The Learn Claude Code project uses the dispatcher pattern to design the tool system:

    # Tool registry
    TOOLS = {
        "read_file": {
            "description": "Read file content",
            "parameters": {"path": "File path"},
            "handler": read_file_handler
        },
        "write_file": {
            "description": "Write file content",
            "parameters": {"path": "File path", "content": "File content"},
            "handler": write_file_handler
        }
    }

    def execute_tool(tool_call):
        # Look up the handler by name and call it with the model-supplied parameters
        tool = TOOLS[tool_call.name]
        return tool["handler"](**tool_call.parameters)

Advantages of this design:
- Adding a tool only requires registering one entry in the registry
- The main loop has no knowledge of specific tools
- Tool implementation details are completely encapsulated
2.4 Essential Exploration: Tools as Interfaces
From a software engineering perspective, the tool system is essentially an interface abstraction layer. It defines the contract for agent interaction with the outside world:
- Tool description: Tells the model what this tool can do
- Parameter definition: Tells the model what information needs to be provided
- Return format: Tells the model what result it will get
With this abstraction, the model doesn't need to know how a tool is implemented, only how to describe its intent.
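Treating the tool definition as a contract also means a call can be checked against it before dispatch. The following is a minimal sketch that reuses the registry shape from section 2.3; `validate_call` is a hypothetical helper and checks parameter names only, not types.

```python
def validate_call(tool_spec, arguments):
    """Return a list of problems; an empty list means the call matches the contract."""
    problems = []
    expected = tool_spec["parameters"]
    for name in expected:
        if name not in arguments:
            problems.append(f"missing parameter: {name}")
    for name in arguments:
        if name not in expected:
            problems.append(f"unexpected parameter: {name}")
    return problems

write_file_spec = {
    "description": "Write file content",
    "parameters": {"path": "File path", "content": "File content"},
}
ok = validate_call(write_file_spec, {"path": "a.txt", "content": "hi"})
bad = validate_call(write_file_spec, {"path": "a.txt", "mode": "w"})
```

Returning the problems as text lets the framework feed them back to the model as a tool result, so it can correct the call on the next turn.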
2.5 Current Problems and Improvement Directions
Problem 1: Tool Discovery
When the number of tools increases, the model may not know which tool is most suitable for the current task.
Improvement Direction: Intelligent tool recommendation. Dynamically recommend the most relevant tool subset based on current context and task type.
Problem 2: Tool Composition
Complex tasks often require the combined use of multiple tools, but the model may not be able to plan the optimal tool call sequence at once.
Improvement Direction: Tool orchestration engine. The framework can provide templates or planners for tool combinations to help models better organize tool calls.
Chapter 3: Permission Control — Permission
3.1 What It Is: The Agent's "Brake"
The permission system is the safety valve in Harness Engineering. It determines what operations the agent can perform under what conditions. An agent without permission control is like a car without brakes—possibly fast, but extremely dangerous.
3.2 Why: The Balance Between Trust and Safety
Agents need to perform various operations, but not all operations are safe. The permission system needs to find a balance between two extremes:
- Over-restriction: The agent cannot complete any useful work
- Over-permissiveness: The agent may cause irreversible damage
3.3 How to Design: The Decision Point Pattern
Permission control usually inserts a decision point before tool execution:

    def execute_tool_with_permission(tool_call):
        # Decision point: check permission
        if needs_permission(tool_call):
            user_decision = ask_user_permission(tool_call)
            if user_decision == "deny":
                return PermissionDeniedError()
        # Execute tool
        return execute_tool(tool_call)

Permission decisions can be based on multiple factors:
- Operation type: Read vs Write vs Delete
- Scope of impact: Single file vs Entire directory
- Reversibility: Reversible vs Irreversible
- User preference: Auto-allow vs Ask every time
3.4 Essential Exploration: Permission as Policy
The permission system is essentially a policy engine. It separates the decision of "what can be done" from the execution logic, so that:
- Policies can be adjusted independently of implementation
- Different policies can be applied to different scenarios
- Policies can be audited and tracked
3.5 Current Problems and Improvement Directions
Problem 1: Permission Granularity
Current permission systems are usually binary—allow or deny. But in actual scenarios, finer-grained control is often needed.
Improvement Direction: Conditional permissions. For example, "allow writing to /tmp directory, but not to /etc directory".
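The /tmp versus /etc example can be sketched as a small rule table. The rule format, the `decide` function, and the `"ask"` default are all assumptions made for illustration. The sketch normalizes `..` segments but does not resolve symlinks, which a real implementation would also need to handle.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical policy: (operation, directory prefix, decision); first match wins
RULES = [
    ("write", "/etc", "deny"),
    ("write", "/tmp", "allow"),
]

def decide(operation, path, default="ask"):
    # Collapse ".." segments so "/tmp/../etc/x" is judged as "/etc/x"
    target = PurePosixPath(posixpath.normpath(path))
    for rule_op, prefix, decision in RULES:
        # is_relative_to compares whole path components (Python 3.9+)
        if rule_op == operation and target.is_relative_to(prefix):
            return decision
    return default
```

Anything that matches no rule falls back to asking the user, which keeps the behavior of the binary decision point for unknown cases.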
Problem 2: Permission Learning
Every operation requires user decision, which causes "permission fatigue".
Improvement Direction: Permission learning. The framework can learn user decision patterns and automatically apply known security policies.
Chapter 4: The Hook System — Hooks
4.1 What It Is: The Agent's "Nervous System"
The hook system is the cross-cutting concern processor in Harness Engineering. It allows inserting custom logic at key points in the agent lifecycle without modifying core loop code.
4.2 Why: Separation of Concerns
Agents need to handle many tasks that are not related to core logic but are crucial:
- Logging: Recording agent behavior for debugging
- Monitoring metrics: Collecting performance and usage data
- Audit trail: Recording all operations for compliance
- Cache optimization: Caching frequently used tool results
If this logic is written directly into the loop, the code becomes bloated and difficult to maintain.
4.3 How to Design: The Event-Driven Pattern
The hook system usually adopts the event-driven pattern:

    # Hook registration
    hooks = {
        "before_tool_call": [log_tool_call, check_rate_limit],
        "after_tool_call": [cache_result, update_metrics],
        "on_error": [retry_with_backoff, notify_user]
    }

    def execute_tool_with_hooks(tool_call):
        # Trigger pre-hooks
        for hook in hooks["before_tool_call"]:
            hook(tool_call)
        # Execute tool
        try:
            result = execute_tool(tool_call)
        except Exception as error:
            # Error hooks receive the call and the exception (assumed signature),
            # then the error propagates to the caller
            for hook in hooks["on_error"]:
                hook(tool_call, error)
            raise
        # Trigger post-hooks
        for hook in hooks["after_tool_call"]:
            hook(tool_call, result)
        return result

4.4 Essential Exploration: Hooks as Middleware
The hook system is essentially a middleware pattern. It decouples cross-cutting concerns from business logic, so that:
- Each hook focuses only on a single responsibility
- Hooks can be independently added, removed, or modified
- The core loop remains clean and focused
4.5 Current Problems and Improvement Directions
Problem 1: Hook Order
When multiple hooks are registered to the same event, execution order may be important.
Improvement Direction: Priority system. Assign priorities to each hook to ensure critical hooks (like security checks) execute first.
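A priority system for hooks can be as small as a sorted registry. In this sketch, lower numbers run first and ties keep registration order; `HookRegistry` and the default priority of 100 are illustrative choices rather than an established convention.

```python
class HookRegistry:
    def __init__(self):
        self._hooks = {}  # event name -> list of (priority, order, hook)
        self._counter = 0

    def register(self, event, hook, priority=100):
        # Lower numbers run first; the counter keeps registration order for ties
        self._hooks.setdefault(event, []).append((priority, self._counter, hook))
        self._counter += 1

    def trigger(self, event, *args):
        ordered = sorted(self._hooks.get(event, []), key=lambda entry: entry[:2])
        for _, _, hook in ordered:
            hook(*args)

calls = []
registry = HookRegistry()
registry.register("before_tool_call", lambda call: calls.append("log"))
registry.register("before_tool_call", lambda call: calls.append("security"), priority=0)
registry.trigger("before_tool_call", {"name": "read_file"})
```

Here the security hook runs before the logging hook even though it was registered later.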
Problem 2: Hook Performance
Too many hooks may affect the agent's response speed.
Improvement Direction: Asynchronous hooks. Non-critical hooks can be executed asynchronously in the background without blocking the main loop.
Chapter 5: Plan Management — Todo
5.1 What It Is: The Agent's "Task List"
The Todo system provides agents with explicit plan management capabilities. It allows agents to decompose complex tasks into trackable subtasks and track the completion status of each subtask.
5.2 Why: Navigation for Long-Term Tasks
An agent without a plan is like a traveler without a map: it may keep moving forward, but it easily loses direction. The core problems the Todo system solves are:
- Task decomposition: Breaking down large goals into executable small steps
- Progress tracking: Knowing what has been completed and what remains
- Error recovery: Knowing where to restart when a step fails
5.3 How to Design: The State Machine Pattern
The Todo system usually adopts the state machine pattern:

    class TodoItem:
        def __init__(self, description):
            self.description = description
            self.status = "pending"  # pending, in_progress, completed, failed
            self.dependencies = []
            self.result = None

    class TodoManager:
        def __init__(self):
            self.items = []

        def add_item(self, description, dependencies=None):
            item = TodoItem(description)
            # Avoid a shared mutable default: each item gets its own list
            item.dependencies = list(dependencies or [])
            self.items.append(item)
            return item

        def is_completed(self, item):
            # Dependencies are assumed to be TodoItem objects
            return item.status == "completed"

        def get_next_task(self):
            # Return the next task that can be executed (all dependencies completed)
            for item in self.items:
                if item.status == "pending":
                    if all(self.is_completed(dep) for dep in item.dependencies):
                        return item
            return None

        def mark_completed(self, item, result):
            item.status = "completed"
            item.result = result

5.4 Essential Exploration: Plan as State
The Todo system is essentially a state manager. It extracts the agent's "memory" from conversation history and stores it in a structured way, so that:
- Plans can be persisted: Even if the conversation is interrupted, the plan won't be lost
- Plans can be modified: Adjust subsequent steps based on new information
- Plans can be shared: Multiple agents can collaborate to execute the same plan
5.5 Current Problems and Improvement Directions
Problem 1: Plan Rigidity
The agent may strictly follow the plan even when the plan is no longer optimal.
Improvement Direction: Dynamic plan adjustment. The agent can proactively modify subsequent plans based on execution results and new information.
Problem 2: Dependency Management
Complex dependency relationships may lead to deadlocks or circular dependencies.
Improvement Direction: Dependency graph analysis. The framework can detect and prevent dependency issues, ensuring plan feasibility.
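Circular dependencies can be detected before execution with a topological sort. The sketch below uses the standard library's `graphlib` module (Python 3.9+); `plan_order` and the step names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def plan_order(dependencies):
    """dependencies maps each item to the items it depends on.

    Returns a valid execution order, or raises ValueError on a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the detected cycle
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc

order = plan_order({"test": ["build"], "build": ["fetch"], "fetch": []})
```

Running this check whenever the plan changes means a cycle is reported as soon as it is introduced, rather than surfacing later as a plan in which no task is ever ready.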
Chapter 6: Skill Loading — Skill
6.1 What It Is: The Agent's "Professional Knowledge"
The skill system allows agents to load professional domain knowledge and capabilities on demand. It's not an independent functional module, but a knowledge injection mechanism.
6.2 Why: The Scarcity of Context
The context window of language models is limited. If all potentially needed knowledge is placed in the system prompt, it will:
- Waste precious context space
- Increase the model's processing burden
- Reduce response relevance
The skill system solves this problem through on-demand loading.
6.3 How to Design: The Dynamic Injection Pattern
The skill system usually adopts the dynamic injection pattern:

    class SkillLoader:
        def __init__(self):
            self.skills = {
                "python": "You are a Python expert...",
                "database": "You are a database expert...",
                "security": "You are a security expert..."
            }

        def load_skill(self, skill_name):
            if skill_name in self.skills:
                return self.skills[skill_name]
            return None

        def inject_skill(self, messages, skill_name):
            skill_content = self.load_skill(skill_name)
            if skill_content:
                # Inject skill into system prompt
                messages.insert(0, {"role": "system", "content": skill_content})

6.4 Essential Exploration: Skills as Patterns
The skill system is essentially a pattern matching and injection mechanism. It identifies the professional knowledge a task needs based on the task's characteristics and injects it into the agent's context, so that:
- Agents can dynamically adapt to tasks in different domains
- Professional knowledge can be updated independently of the core agent
- Different tasks can use different combinations of professional knowledge
6.5 Current Problems and Improvement Directions
Problem 1: Skill Identification
The agent may not accurately determine which skills the current task needs.
Improvement Direction: Intelligent skill recommendation. Automatically recommend the most relevant skills based on task description and historical data.
Problem 2: Skill Conflicts
Multiple skills may provide contradictory guidance.
Improvement Direction: Skill priority and conflict resolution mechanisms. Define priority relationships between skills and arbitrate when conflicts occur.
Chapter 7: System Prompt — Prompt
7.1 What It Is: The Agent's "Constitution"
The system prompt is the agent's "constitution"—it defines the agent's identity, capabilities, limitations, and code of conduct. Unlike skills, the system prompt is persistent and foundational, and doesn't change with tasks.
7.2 Why: Predictability of Behavior
An agent without a system prompt is like an organization without rules—behavior is unpredictable and difficult to control. The core problems the system prompt solves are:
- Identity definition: Tells the model who it is and what it can do
- Behavior constraints: Defines what can and cannot be done
- Output format: Specifies the structure and style of responses
7.3 How to Design: The Runtime Assembly Pattern
Modern Harness Engineering usually adopts the runtime assembly pattern to build system prompts:

    def build_system_prompt(context):
        prompt_parts = []
        # 1. Basic identity
        prompt_parts.append("You are a professional programming assistant...")
        # 2. Capability declaration
        prompt_parts.append("You can read and modify files, execute commands...")
        # 3. Behavior constraints
        prompt_parts.append("You must always follow security best practices...")
        # 4. Dynamic context
        if context.current_project:
            prompt_parts.append(f"Current project: {context.current_project}")
        # 5. Loaded skills
        for skill in context.loaded_skills:
            prompt_parts.append(skill)
        return "\n\n".join(prompt_parts)

7.4 Essential Exploration: Prompt as Code
The system prompt works like a declarative program. It describes the behavior the agent should have in natural language, rather than implementing that behavior in code, so that:
- Non-technical personnel can also define agent behavior
- Behavior can be quickly iterated and adjusted
- Different scenarios can use different behavior definitions
7.5 Current Problems and Improvement Directions
Problem 1: Prompt Engineering
Writing effective system prompts requires a lot of experience and trial and error.
Improvement Direction: Prompt generation and optimization tools. Generate more effective system prompts through automated testing and optimization.
Problem 2: Prompt Bloat
As features increase, the system prompt may become very long, affecting model performance.
Improvement Direction: Prompt compression and layering. Divide the system prompt into core and extension layers, dynamically loading as needed.
Chapter 8: Context Management — Context
8.1 What It Is: The Agent's "Short-Term Memory"
Context is the agent's "working memory" within a single session. It contains all historical messages, tool call results, and intermediate states of the current conversation.
8.2 Why: The Limitation of Context Windows
The context window of language models is limited (commonly on the order of 128K-200K tokens, depending on the model). When conversation history exceeds this limit, the agent "forgets" earlier information. The core problems context management solves are:
- How to retain the most important information within a limited window
- How to compress when the conversation becomes too long
- How to restore lost context when needed
8.3 How to Design: The Compression and Retrieval Pattern
Context management usually adopts the compression and retrieval pattern:

    class ContextManager:
        # total_tokens() and summarize() are helper methods not shown here
        def __init__(self, max_tokens=100000):
            self.max_tokens = max_tokens
            self.full_history = []
            self.compressed_history = []

        def add_message(self, message):
            self.full_history.append(message)
            # Check if compression is needed
            if self.total_tokens() > self.max_tokens:
                self.compress()

        def compress(self):
            # Compress the older half of the messages into a summary
            old_messages = self.full_history[:len(self.full_history)//2]
            summary = self.summarize(old_messages)
            # Keep recent messages and summary
            self.compressed_history.append({"role": "system", "content": summary})
            self.full_history = self.full_history[len(old_messages):]

        def get_context(self):
            return self.compressed_history + self.full_history

8.4 Essential Exploration: Context as Cache
Context is essentially a caching system. It caches the agent's "thinking process," so that:
- Agents can reference previous work: No need to recompute
- Agents can maintain conversation coherence: Users feel like they're talking to the same person
- Agents can learn from mistakes: Remembering what methods worked and what didn't
8.5 Current Problems and Improvement Directions
Problem 1: Information Loss
The compression process inevitably loses some information, which may cause the agent to forget important details.
Improvement Direction: Intelligent compression. Use more advanced summarization algorithms to retain key information while reducing token count.
Problem 2: Retrieval Efficiency
When context is very long, retrieving specific information may be slow.
Improvement Direction: Vector retrieval. Convert context to vector representations and quickly locate relevant information through semantic search.
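The retrieval idea can be illustrated with a toy example. A real system would use an embedding model and a vector index; here `embed` is a deliberately simple bag-of-words stand-in, so the ranking reflects word overlap rather than true semantic similarity. All names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, messages, top_k=1):
    # Rank stored messages by similarity to the query and keep the best ones
    query_vector = embed(query)
    ranked = sorted(messages, key=lambda m: cosine(query_vector, embed(m)), reverse=True)
    return ranked[:top_k]

history = [
    "the build failed because of a missing import",
    "user prefers tabs over spaces",
    "tests pass on the main branch",
]
best = retrieve("why did the build fail", history)
```

Swapping `embed` for a real embedding function leaves `cosine` and `retrieve` unchanged, which is the main point of the sketch.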
Chapter 9: Memory System — Memory
9.1 What It Is: The Agent's "Long-Term Memory"
The memory system is the agent's persistent storage, used to save important information across sessions. Unlike context, memory doesn't disappear when the session ends.
9.2 Why: Learning and Personalization
An agent without memory starts from scratch every session, unable to:
- Learn user preferences: Remember user's preferred code style, tools, etc.
- Accumulate project knowledge: Remember project structure, conventions, common problems
- Avoid repeating mistakes: Remember methods that were tried before but failed
9.3 How to Design: The Layered Storage Pattern
The memory system usually adopts the layered storage pattern:

    from datetime import datetime

    class MemorySystem:
        def __init__(self):
            self.short_term = {}  # In-session memory
            self.long_term = {}   # Cross-session memory
            self.episodic = []    # Episodic memory

        def store(self, key, value, importance="medium"):
            if importance == "high":
                # Store in long-term memory
                self.long_term[key] = {
                    "value": value,
                    "timestamp": datetime.now(),
                    "access_count": 0
                }
            else:
                # Store in short-term memory
                self.short_term[key] = value

        def recall(self, key):
            # First check short-term memory
            if key in self.short_term:
                return self.short_term[key]
            # Then check long-term memory
            if key in self.long_term:
                self.long_term[key]["access_count"] += 1
                return self.long_term[key]["value"]
            return None

        def consolidate(self):
            # Transfer important short-term memory to long-term memory
            # (is_important() is a helper method not shown here)
            for key, value in self.short_term.items():
                if self.is_important(key):
                    self.store(key, value, importance="high")

9.4 Essential Exploration: Memory as Database
The memory system is essentially an intelligent database. It not only stores data, but also:
- Automatically determines the importance of information
- Optimizes storage based on access patterns
- Provides semantic retrieval capabilities
9.5 Current Problems and Improvement Directions
Problem 1: Memory Conflicts
New information may contradict old information, leading to memory inconsistency.
Improvement Direction: Memory version control. Retain historical versions of information and arbitrate when conflicts occur.
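Version control for memory can be sketched by keeping every write instead of overwriting. The arbitration rule used here, "the most recent write wins", is just one possible assumption; `VersionedMemory` is a hypothetical class.

```python
from datetime import datetime, timezone

class VersionedMemory:
    def __init__(self):
        self._versions = {}  # key -> list of (timestamp, value), oldest first

    def store(self, key, value):
        stamp = datetime.now(timezone.utc)
        self._versions.setdefault(key, []).append((stamp, value))

    def recall(self, key):
        # Arbitration rule assumed here: the most recent write wins
        versions = self._versions.get(key)
        return versions[-1][1] if versions else None

    def history(self, key):
        # Older values stay available for auditing or rollback
        return [value for _, value in self._versions.get(key, [])]

memory = VersionedMemory()
memory.store("indent_style", "tabs")
memory.store("indent_style", "spaces")
```

Because older values are retained, a different arbitration rule (for example, preferring values confirmed by the user) can be applied later without losing data.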
Problem 2: Memory Retrieval
When the volume of memory is large, quickly finding relevant information becomes a challenge.
Improvement Direction: Semantic indexing. Use vector databases to store memory and retrieve through semantic similarity.
Chapter 10: Error Recovery
10.1 What It Is: The Agent's "Immune System"
The error recovery system is the agent's ability to handle exceptional situations. It ensures that the agent doesn't crash when encountering errors, but can gracefully recover and continue working.
10.2 Why: The Complexity of the Real World
In real environments, various errors can occur:
- Tool execution failures: File doesn't exist, insufficient permissions, network timeout
- Model output errors: Format errors, logical errors, hallucinations
- External dependency failures: API unavailable, service down
An agent without error recovery is like a body without an immune system: the first pathogen it encounters will bring it down.
10.3 How to Design: The Classification and Retry Pattern
Error recovery usually adopts the classification and retry pattern:

    import time

    class MaxRetriesExceeded(Exception):
        pass

    class NoAlternativesAvailable(Exception):
        pass

    class ErrorRecovery:
        # classify_error(), retry_operation(), abort_and_report() and
        # get_alternatives() are helper methods not shown here
        def __init__(self):
            self.max_retries = 3
            self.retry_strategies = {
                "transient": self.retry_with_backoff,
                "permanent": self.abort_and_report,
                "recoverable": self.try_alternative
            }

        def handle_error(self, error, context):
            # Error classification
            error_type = self.classify_error(error)
            # Select recovery strategy
            strategy = self.retry_strategies[error_type]
            # Execute recovery
            return strategy(error, context)

        def retry_with_backoff(self, error, context):
            # Exponential backoff retry: wait 1s, 2s, 4s before each attempt
            for attempt in range(self.max_retries):
                time.sleep(2 ** attempt)
                try:
                    return self.retry_operation(context)
                except Exception:
                    continue
            raise MaxRetriesExceeded()

        def try_alternative(self, error, context):
            # Try alternative approaches
            alternatives = self.get_alternatives(context)
            for alt in alternatives:
                try:
                    return alt.execute()
                except Exception:
                    continue
            raise NoAlternativesAvailable()

10.4 Essential Exploration: Error as Information
The error recovery system is essentially a learning system. It treats errors as valuable information sources, used for:
- Identifying system weak points: Which tools fail most often?
- Optimizing retry strategies: What types of errors are worth retrying?
- Improving future decisions: Avoiding repeating the same mistakes
10.5 Current Problems and Improvement Directions
Problem 1: Error Propagation
An error in one tool may cascade and affect the entire task.
Improvement Direction: Error isolation. Set independent error boundaries for each tool call to prevent error propagation.
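An error boundary around each tool call can be sketched as a wrapper that turns exceptions into structured results. `run_isolated` and the `divide` tool are hypothetical, and the result shape is an assumption.

```python
def run_isolated(handler, **params):
    """Run one tool call inside its own error boundary.

    Failures become a structured result instead of an exception, so one
    failing tool cannot unwind the whole loop.
    """
    try:
        return {"ok": True, "value": handler(**params)}
    except Exception as exc:
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}

def divide(a, b):  # hypothetical tool used only for the demonstration
    return a / b

good = run_isolated(divide, a=6, b=3)
bad = run_isolated(divide, a=1, b=0)
```

The failed result can be appended to the conversation like any other tool result, which gives the model a chance to react to the error instead of the loop crashing.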
Problem 2: Recovery Cost
Retries and recovery may consume significant time and resources.
Improvement Direction: Intelligent retry decisions. Based on error type, historical success rate, and remaining budget, decide whether retrying is worthwhile.
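The three inputs named above (error type, historical success rate, remaining budget) can be combined in a small decision function. The thresholds and the unit of "budget" are illustrative assumptions, not recommended values.

```python
def should_retry(error_type, success_rate, remaining_budget, retry_cost,
                 min_success_rate=0.2):
    """Decide whether one more attempt is worth its cost.

    error_type uses the labels from the classification step
    ("transient", "permanent", "recoverable").
    """
    if error_type == "permanent":
        return False  # retrying cannot change the outcome
    if retry_cost > remaining_budget:
        return False  # another attempt is not affordable
    # Retry only when past attempts of this kind succeeded often enough
    return success_rate >= min_success_rate
```

For example, `should_retry("transient", 0.6, remaining_budget=10, retry_cost=2)` returns `True`, while the same call with `remaining_budget=1` returns `False`.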
Chapter 11: Subagent
11.1 What It Is: The Agent's "Avatar"
The subagent system allows the main agent to delegate tasks to specialized subagents. Each subagent has an independent context and toolset, focusing on completing specific subtasks.
11.2 Why: Decomposition of Complexity
When tasks become complex, a single agent may face:
- Context overload: All information crammed into one window
- Role confusion: One agent handling too many different types of tasks
- Inefficiency: Serial processing of tasks that could be parallel
Subagents solve these problems through divide and conquer.
11.3 How to Design: The Delegation and Aggregation Pattern
The subagent system usually adopts the delegation and aggregation pattern:

    class MainAgent:
        def __init__(self):
            self.subagent_pool = []

        def delegate_task(self, task):
            # Create specialized subagent
            subagent = self.create_subagent(task)
            # Assign task
            result = subagent.execute(task)
            # Aggregate results
            return self.aggregate_results(result)

        def create_subagent(self, task):
            # Create appropriate subagent based on task type
            if task.type == "code_review":
                return CodeReviewSubagent()
            elif task.type == "testing":
                return TestingSubagent()
            else:
                return GeneralSubagent()

    class Subagent:
        # self.model and self.execute_tool are assumed to be provided elsewhere
        def __init__(self):
            self.context = []  # Independent context
            self.tools = []    # Specialized toolset

        def execute(self, task):
            # Execute task in independent context
            messages = [{"role": "user", "content": task.description}]
            while True:
                response = self.model.generate(messages)
                if response.has_tool_call:
                    result = self.execute_tool(response.tool_call)
                    messages.append(result)
                else:
                    return response.content

11.4 Essential Exploration: Subagents as Microservices
The subagent system is essentially a microservices architecture. Each subagent is an independent "service" with:
- Single responsibility: Only handles specific types of tasks
- Independent context: Not interfered with by information from other tasks
- Composability: Multiple subagents can collaborate to complete complex tasks
11.5 Current Problems and Improvement Directions
Problem 1: Communication Overhead
Communication between the main agent and subagents requires transmitting large amounts of context.
Improvement Direction: Shared memory. Use shared memory space to reduce communication overhead.
Problem 2: Coordination Complexity
When multiple subagents work in parallel, coordinating their activities becomes complex.
Improvement Direction: Coordination protocols. Define clear coordination protocols to ensure effective collaboration between subagents.
Chapter 12: Task System — Task
12.1 What It Is: The Agent's "Project Management"
The task system decomposes large goals into structured task graphs. It's not just a collection of to-do items, but a workflow engine that manages dependencies, priorities, and execution order between tasks.
12.2 Why: The Bridge from Goals to Execution
Users usually provide high-level goals ("fix this bug", "add this feature") rather than specific execution steps. The core problems the task system solves are:
- Goal decomposition: Transforming vague goals into concrete steps
- Dependency management: Determining which steps can be parallel and which must be serial
- Progress tracking: Knowing overall progress and predicting completion time
12.3 How to Design: The Directed Acyclic Graph Pattern
The task system usually adopts the Directed Acyclic Graph (DAG) pattern:

    class TaskGraph:
        def __init__(self):
            self.tasks = {}
            self.dependencies = {}

        def add_task(self, task_id, description, dependencies=None):
            self.tasks[task_id] = {
                "description": description,
                "status": "pending",
                "result": None
            }
            # Avoid a shared mutable default: each task gets its own list
            self.dependencies[task_id] = list(dependencies or [])

        def get_executable_tasks(self):
            # Return all tasks whose dependencies are satisfied
            executable = []
            for task_id, deps in self.dependencies.items():
                if self.tasks[task_id]["status"] == "pending":
                    if all(self.tasks[dep]["status"] == "completed" for dep in deps):
                        executable.append(task_id)
            return executable

        def mark_completed(self, task_id, result):
            self.tasks[task_id]["status"] = "completed"
            self.tasks[task_id]["result"] = result

        def is_complete(self):
            return all(t["status"] == "completed" for t in self.tasks.values())

12.4 Essential Exploration: Tasks as Workflow
The task system is essentially a workflow engine. It separates execution logic from task definitions, so that:
- Tasks can be visualized: Task graphs can intuitively display work progress
- Tasks can be optimized: By analyzing dependencies, find the optimal execution order
- Tasks can be recovered: When a task fails, it can continue from the breakpoint
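One way to read "find the optimal execution order" is to group tasks into waves, where every task in a wave can run in parallel. The sketch below assumes every dependency is itself a key of the mapping; `execution_waves` and the task names are hypothetical.

```python
def execution_waves(dependencies):
    """Group tasks into waves; tasks within a wave have no unmet dependencies.

    dependencies maps each task id to the ids it depends on.
    """
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    waves = []
    while remaining:
        ready = sorted(task for task, deps in remaining.items() if not deps)
        if not ready:
            # Nothing can start: the remaining tasks depend on each other
            raise ValueError("circular dependency among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves

waves = execution_waves({
    "lint": [],
    "build": [],
    "test": ["build"],
    "package": ["lint", "test"],
})
```

In this example "build" and "lint" form the first wave, followed by "test" and then "package", so the number of waves gives a lower bound on the number of sequential steps.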
12.5 Current Problems and Improvement Directions
Problem 1: Task Granularity
The granularity of task decomposition is difficult to grasp—too fine leads to management overhead, too coarse loses parallelism.
Improvement Direction: Adaptive granularity. Dynamically adjust task granularity based on task complexity and dependencies.
Problem 2: Dynamic Adjustment
When new information is discovered during execution, the task graph may need to be adjusted.
Improvement Direction: Dynamic task graphs. Allow adding, deleting, or modifying tasks during execution.
Chapter 13: Background Tasks
13.1 What It Is: The Agent's "Asynchronous Capability"
The background task system allows agents to put time-consuming operations in the background while continuing to handle other work. It's a key mechanism for agents to achieve non-blocking operations.
13.2 Why: Efficiency and Responsiveness
Some operations may take a long time (compiling code, running tests, downloading files). If the agent must wait for these operations to complete, it will cause:
- Response delays: Users wait too long
- Resource waste: The agent can't do other work while waiting
- Poor user experience: The agent feels "stuck"
13.3 How to Design: The Asynchronous Execution Pattern
The background task system usually adopts the asynchronous execution pattern:
import threading

class BackgroundTaskManager:
    def __init__(self):
        self.tasks = {}
        self.workers = []              # threads started so far
        self._lock = threading.Lock()  # guards self.tasks across threads

    def submit_task(self, task_id, task_func, callback=None):
        # Register the task, then run it on a background thread
        with self._lock:
            self.tasks[task_id] = {
                "status": "running",
                "result": None,
                "error": None,
                "callback": callback
            }
        thread = threading.Thread(
            target=self._execute_task,
            args=(task_id, task_func)
        )
        self.workers.append(thread)
        thread.start()

    def _execute_task(self, task_id, task_func):
        try:
            result = task_func()
        except Exception as e:
            with self._lock:
                self.tasks[task_id]["status"] = "failed"
                self.tasks[task_id]["error"] = str(e)
            return
        with self._lock:
            self.tasks[task_id]["status"] = "completed"
            self.tasks[task_id]["result"] = result
            callback = self.tasks[task_id]["callback"]
        # Call the callback outside the lock and after the status is recorded,
        # so a failing callback cannot mark a finished task as failed
        if callback:
            callback(result)

    def get_task_status(self, task_id):
        with self._lock:
            return self.tasks[task_id]["status"]

    def get_task_result(self, task_id):
        # Returns None until the task has completed successfully
        with self._lock:
            task = self.tasks[task_id]
            if task["status"] == "completed":
                return task["result"]
            return None
13.4 Essential Exploration: Background as Concurrency
The background task system is essentially a concurrent execution framework. It allows agents to:
- Handle multiple tasks simultaneously: Improve resource utilization
- Maintain responsiveness: Users won't feel the agent is "stuck"
- Implement pipelines: One task's output can be another task's input
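The pipeline idea in the last bullet can be sketched with the standard `concurrent.futures` module. The stage functions `compile_project` and `run_tests` and the helper `run_pipeline` are hypothetical stand-ins. The whole pipeline runs on a worker thread, so the calling thread stays free until it asks for the result.

```python
from concurrent.futures import ThreadPoolExecutor

def compile_project(source_dir):
    # Stand-in for a slow build step (hypothetical).
    return f"build of {source_dir}"

def run_tests(build):
    # Consumes the output of the previous stage.
    return f"tests passed for {build}"

def run_pipeline(stages, initial):
    """Feed each stage's output into the next stage."""
    value = initial
    for stage in stages:
        value = stage(value)
    return value

with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(run_pipeline, [compile_project, run_tests], "src/")
    # The agent loop could handle other work here while the pipeline runs.
    pipeline_output = future.result()  # blocks only when the result is needed
```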
13.5 Current Problems and Improvement Directions
Problem 1: Task Monitoring
Background tasks may fail silently, leaving the agent unaware of the failure.
Improvement Direction: Proactive monitoring. Regularly check background task status and notify users promptly when tasks fail.
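One possible way to surface failures is sketched below. Note that it swaps periodic polling for completion callbacks: `Future.add_done_callback` from the standard library fires as soon as a task ends. The `watch` helper, the `failures` list, and the task names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

failures = []  # (task_id, error message) pairs to report to the user

def watch(task_id, future):
    """Attach a callback that records a failure as soon as the task ends."""
    def on_done(done_future):
        error = done_future.exception()
        if error is not None:
            failures.append((task_id, str(error)))
    future.add_done_callback(on_done)

def flaky_download():
    # Hypothetical task that always fails.
    raise ConnectionError("network unreachable")

with ThreadPoolExecutor(max_workers=2) as pool:
    watch("download", pool.submit(flaky_download))
    watch("compile", pool.submit(lambda: "ok"))
# Leaving the with-block waits for both tasks to finish.
```

In a real harness the callback would more likely enqueue a notification for the agent loop than append to a list.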
Problem 2: Resource Management
Too many background tasks may exhaust system resources.
Improvement Direction: Resource limits. Set maximum concurrent task count to prevent system overload.
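A simple way to cap concurrency is a fixed-size thread pool. The sketch below assumes a limit of two workers (`MAX_CONCURRENT` is an illustrative value) and records the peak number of tasks running at once to show that the cap holds.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2  # assumed limit; tune for the host machine

lock = threading.Lock()
stats = {"active": 0, "peak": 0}

def tracked_task(n):
    """Hypothetical task that records how many tasks run at the same time."""
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.05)  # simulate work
    with lock:
        stats["active"] -= 1
    return n * n

# The pool never runs more than MAX_CONCURRENT tasks at once; the rest queue up.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(tracked_task, range(6)))
```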
Chapter 14: Scheduled Execution — Cron Scheduler
14.1 What It Is: The Agent's "Alarm Clock"
The scheduled execution system allows agents to execute tasks on schedule. It's a key mechanism for agents to achieve automation and periodic work.
14.2 Why: From Passive to Active
Without scheduled execution, agents can only passively respond to user requests. Scheduled execution enables agents to:
- Proactively execute tasks: No need for users to manually trigger each time
- Maintain system health: Regular checks, cleanup, backups
- Respond to time events: Execute specific operations at specific times
14.3 How to Design: The Scheduler Pattern
The scheduled execution system usually adopts the scheduler pattern:
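import time
from datetime import datetime, timedelta

class CronScheduler:
    def __init__(self):
        self.jobs = {}
        self.running = False

    def add_job(self, job_id, schedule, task_func, args=None):
        # args defaults to None: a mutable default ([]) would be shared across jobs
        self.jobs[job_id] = {
            "schedule": schedule,
            "task": task_func,
            "args": list(args or []),
            "last_run": None,
            "next_run": self.calculate_next_run(schedule)
        }

    def calculate_next_run(self, schedule):
        # Assumption: schedule is an interval in seconds.
        # Parsing real cron expressions is left out of this sketch.
        return datetime.now() + timedelta(seconds=schedule)

    def start(self):
        self.running = True
        while self.running:
            now = datetime.now()
            # Check whether any jobs are due (iterate over a copy so jobs can be added meanwhile)
            for job_id, job in list(self.jobs.items()):
                if now >= job["next_run"]:
                    # Execute the job
                    self.execute_job(job_id)
                    # Update the next execution time
                    job["last_run"] = now
                    job["next_run"] = self.calculate_next_run(job["schedule"])
            # Wait a while before checking again
            time.sleep(1)

    def stop(self):
        self.running = False

    def execute_job(self, job_id):
        job = self.jobs[job_id]
        try:
            job["task"](*job["args"])
        except Exception as e:
            self.handle_job_error(job_id, e)

    def handle_job_error(self, job_id, error):
        # Assumption: errors are only reported; a failed job stays scheduled
        print(f"Job {job_id} failed: {error}")
14.4 Essential Exploration: Scheduling as Time Management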
The scheduled execution system is essentially a time management system. It introduces the time dimension into agent behavior, so that:
- Agents can plan ahead: They know when to do what
- Agents can optimize time: Low-priority tasks run during idle periods
- Agents can respond to time events: Specific behaviors trigger at specific times
14.5 Current Problems and Improvement Directions
Problem 1: Schedule Conflicts
Multiple tasks may trigger at the same time, causing resource competition.
Improvement Direction: Priority scheduling. Set priorities for tasks, with high-priority tasks executing first.
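Priority scheduling can be sketched with a heap. In the example below the `priority` field (lower number means more important), the numeric timestamps, and the job names are all assumptions for illustration. Jobs that are due at the same moment are returned in priority order.

```python
import heapq

def order_due_jobs(jobs, now):
    """Return the IDs of due jobs, highest priority (lowest number) first."""
    heap = [
        (job["priority"], job["next_run"], job_id)
        for job_id, job in jobs.items()
        if job["next_run"] <= now
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Hypothetical jobs; next_run is a plain number to keep the sketch short.
jobs = {
    "cleanup": {"priority": 5, "next_run": 100},
    "backup": {"priority": 1, "next_run": 100},
    "report": {"priority": 3, "next_run": 200},  # not due yet
}
due = order_due_jobs(jobs, now=100)
```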
Problem 2: Schedule Failure
Because jobs are held only in memory, all scheduled tasks are lost when the agent restarts.
Improvement Direction: Persistent scheduling. Persistently store scheduling configurations and automatically restore after agent restart.
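A minimal sketch of persistent scheduling, assuming job configurations are stored as JSON: the file name, the job fields, and the helper names `save_jobs` and `load_jobs` are hypothetical. Task functions cannot be serialized directly, so the sketch stores a task name that a real scheduler would look up in a registry on restart.

```python
import json
import os
import tempfile

def save_jobs(path, jobs):
    """Write job configurations via a temp file so a crash cannot corrupt them."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(jobs, f)
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem

def load_jobs(path):
    """Return the saved jobs, or an empty dict on first start."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Usage with a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    store = os.path.join(workdir, "jobs.json")
    first_start = load_jobs(store)
    save_jobs(store, {"backup": {"schedule": "0 3 * * *", "task": "run_backup"}})
    restored = load_jobs(store)
```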
Chapter 15: Multi-Agent Collaboration — Agent Teams
15.1 What It Is: The Agent's "Team"
The Agent Teams system allows multiple agents to work together to complete complex tasks. It's the most advanced form of Harness Engineering, extending single-agent capabilities to the team level.
15.2 Why: The Limits of Single Agents
Single agents face inherent limitations:
- Context window limitations: Cannot process all relevant information simultaneously
- Professional capability limitations: One agent cannot master all domains
- Parallelism limitations: Single-threaded execution cannot fully utilize resources
Agent Teams breaks through these limitations through division of labor and collaboration.
15.3 How to Design: The Team Protocol Pattern
Agent Teams usually adopts the team protocol pattern:
class AgentTeam:
    def __init__(self):
        self.agents = {}
        # SharedMemory and CommunicationProtocol are assumed to be defined elsewhere
        self.shared_memory = SharedMemory()
        self.communication_protocol = CommunicationProtocol()

    def add_agent(self, agent_id, agent, role):
        self.agents[agent_id] = {
            "agent": agent,
            "role": role,
            "status": "idle"
        }

    def assign_task(self, task):
        # Analyze the task and decide which agent to assign it to
        suitable_agent = self.find_suitable_agent(task)
        if suitable_agent is None:
            raise RuntimeError(f"No idle agent with role {task.required_role!r}")
        # Assign the task; always release the agent, even if execution fails
        self.agents[suitable_agent]["status"] = "busy"
        try:
            result = self.agents[suitable_agent]["agent"].execute(task)
        finally:
            self.agents[suitable_agent]["status"] = "idle"
        # Update shared memory
        self.shared_memory.store(task.id, result)
        # Notify other agents
        self.communication_protocol.broadcast(
            "task_completed",
            {"task_id": task.id, "result": result}
        )
        return result

    def find_suitable_agent(self, task):
        # Select the first idle agent whose role matches the task
        for agent_id, agent_info in self.agents.items():
            if agent_info["role"] == task.required_role:
                if agent_info["status"] == "idle":
                    return agent_id
        return None
15.4 Essential Exploration: Teams as Distributed Systems
Agent Teams is essentially a distributed system. It distributes the complexity of a single agent across multiple specialized agents, so that:
- Each agent focuses only on its area of expertise
- Multiple agents can process different subtasks in parallel
- The system can scale by adding more agents
15.5 Current Problems and Improvement Directions
Problem 1: Coordination Overhead
Communication and coordination between agents consume significant resources.
Improvement Direction: Efficient protocols. Design more efficient communication protocols to reduce coordination overhead.
Problem 2: Consistency Guarantee
Multiple agents may have different understandings of the same problem, leading to inconsistency.
Improvement Direction: Consensus mechanisms. Introduce consensus mechanisms borrowed from distributed systems, such as voting or quorum-based agreement, to keep team decisions consistent.
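One of the simplest consensus mechanisms is a majority vote. The sketch below is only an illustration of that idea: the function name `majority_decision`, the `quorum` parameter, and the agent roles are hypothetical.

```python
from collections import Counter

def majority_decision(votes, quorum=0.5):
    """Return the answer backed by more than `quorum` of the agents, else None."""
    if not votes:
        return None
    answer, count = Counter(votes.values()).most_common(1)[0]
    return answer if count / len(votes) > quorum else None

# Hypothetical votes from three agents reviewing the same change.
votes = {"reviewer": "approve", "tester": "approve", "security": "reject"}
decision = majority_decision(votes)
```

When no answer clears the quorum, the function returns `None`, which a harness could treat as a signal to escalate to a human.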
Chapter 16: Comprehensive Agent — Comprehensive Agent Turn
16.1 What It Is: The "Final Form" of Harness Engineering
The comprehensive agent is the integrated form of all Harness Engineering components. It brings together Agent Loop, Tool Use, Permission, Hooks, Todo, Skill, Context, Memory, Error Recovery, Subagent, Task, Background Tasks, Cron Scheduler, Agent Teams, and all other mechanisms into one loop, forming a complete production-grade agent.
16.2 Why: From Components to System
Individual components are like car parts—each has its function, but only when assembled together can they become a usable car. The core problems the comprehensive agent solves are:
- Component integration: Ensuring all components can work together
- Behavior consistency: Different components' behaviors don't conflict with each other
- Performance optimization: Overall performance is greater than the sum of its parts
16.3 How to Design: The Layered Architecture Pattern
The comprehensive agent usually adopts the layered architecture pattern:
class ComprehensiveAgent:
    def __init__(self):
        # Core layer
        self.agent_loop = AgentLoop()
        # Tool layer
        self.tool_registry = ToolRegistry()
        self.permission_manager = PermissionManager()
        # Planning layer
        self.todo_manager = TodoManager()
        self.task_graph = TaskGraph()
        self.skill_loader = SkillLoader()
        # Memory layer
        self.context_manager = ContextManager()
        self.memory_system = MemorySystem()
        # Execution layer
        self.background_task_manager = BackgroundTaskManager()
        self.cron_scheduler = CronScheduler()
        # Collaboration layer
        self.subagent_manager = SubagentManager()
        self.agent_team = AgentTeam()
        # Cross-cutting concerns
        self.hook_system = HookSystem()
        self.error_recovery = ErrorRecovery()

    def process_request(self, user_request):
        # 1. Build system prompt
        system_prompt = self.build_system_prompt()
        # 2. Load relevant skills
        relevant_skills = self.skill_loader.identify_skills(user_request)
        # 3. Create task plan
        task_plan = self.create_task_plan(user_request)
        # 4. Execute agent loop
        while not task_plan.is_complete():
            # Get next task
            task = task_plan.get_next_task()
            # Decide whether to handle it directly or delegate
            if self.should_delegate(task):
                result = self.delegate_to_subagent(task)
            else:
                result = self.execute_task(task)
            # Update task status
            task_plan.mark_completed(task, result)
            # Update memory
            self.memory_system.store(task.id, result)
        # 5. Aggregate results
        final_result = self.aggregate_results(task_plan)
        return final_result
16.4 Essential Exploration: Comprehensive Agent as Operating System
The comprehensive agent is essentially an operating system. It manages all of the agent's resources (context, memory, tools, time) and provides all of its services (planning, execution, collaboration, recovery), so that:
- Agents can focus on thinking and decision-making: No need to worry about low-level details
- The system can guarantee reliability and security: Through permission and error recovery mechanisms
- Capabilities can scale horizontally: Through subagent and team mechanisms
16.5 Current Problems and Improvement Directions
Problem 1: Complexity Management
As components are added, system complexity grows quickly: with n components there are up to n(n-1)/2 pairwise interactions to reason about.
Improvement Direction: Modular design. Break the system down into independent modules, each focusing on a single responsibility.
Problem 2: Performance Bottlenecks
Interaction between multiple components may create performance bottlenecks.
Improvement Direction: Performance analysis and optimization. Use performance analysis tools to identify bottlenecks and perform targeted optimization.
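As a rough illustration of locating a bottleneck, the sketch below times named sections with `time.perf_counter` and reports the slowest one. The `timed` context manager, the `slowest` helper, and the subsystem names are hypothetical. For deeper analysis Python's built-in `cProfile` module is the usual next step.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # subsystem name -> list of durations in seconds

@contextmanager
def timed(subsystem):
    """Record how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[subsystem].append(time.perf_counter() - start)

def slowest(all_timings):
    """Return the subsystem with the largest total time."""
    return max(all_timings, key=lambda name: sum(all_timings[name]))

# Simulated work; the sleeps stand in for real subsystem calls.
with timed("memory"):
    time.sleep(0.001)
with timed("tools"):
    time.sleep(0.05)

bottleneck = slowest(timings)
```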
Chapter 17: Current Problems and Future Improvements
17.1 Special Challenges of Coding Agents
Coding Agents are the most typical application scenario of Harness Engineering, and also the most challenging domain:
Problem 1: Depth of Code Understanding
Current agents' understanding of code is often superficial: they can identify syntactic structures but struggle to grasp deeper semantics and design intent.
Improvement Directions:
- Program analysis techniques: Introduce traditional compiler techniques like static analysis and data flow analysis
- Knowledge graphs: Build code knowledge graphs to capture complex relationships between code
- Multi-modal understanding: Combine code, comments, documentation, tests, and other information sources
Problem 2: Limitations of Long-Term Memory
Programming tasks often require understanding the overall architecture and historical decisions of a project, but current agents' memory capabilities are limited.
Improvement Directions:
- Project memory: Maintain dedicated memory spaces for each project
- Decision tracing: Record the reasons and context for each design decision
- Knowledge evolution: Track codebase evolution and understand reasons for changes
Problem 3: Insufficient Collaboration Capabilities
Large software projects require multi-person collaboration, but current agents have difficulty collaborating effectively with human developers.
Improvement Directions:
- Human-machine collaboration protocols: Define clear human-machine collaboration workflows
- Intent understanding: Better understand developer intentions and preferences
- Conflict resolution: Define how to proceed when agent suggestions conflict with developer opinions
17.2 Improvement Directions for General Agents
Direction 1: Enhancing Autonomy
Current agents are still "passive": they act only when a user triggers them. Future agents should be able to:
- Proactively discover problems: Monitor system status and proactively identify potential issues
- Proactively make suggestions: Based on observations and experience, proactively suggest improvements
- Proactively learn: Learn from each interaction and continuously improve capabilities
Direction 2: Enhancing Explainability
Current agents' decision-making processes are often "black boxes." Future agents should be able to:
- Explain decision reasons: Clearly explain why a particular decision was made
- Provide alternatives: Show other possible choices and their trade-offs
- Support audit trails: Record complete decision processes for post-hoc review
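An audit trail can be as simple as an append-only log of decisions. The sketch below is one possible shape, not a prescribed format: the `DecisionLog` class and its field names are assumptions. It records the reason and the alternatives considered, and exports them as JSON Lines for later review.

```python
import json
import time

class DecisionLog:
    """Append-only record of agent decisions (hypothetical format)."""

    def __init__(self):
        self.entries = []

    def record(self, action, reason, alternatives=()):
        entry = {
            "timestamp": time.time(),
            "action": action,
            "reason": reason,
            "alternatives": list(alternatives),
        }
        self.entries.append(entry)
        return entry

    def export(self):
        """Serialize as JSON Lines, one decision per line."""
        return "\n".join(json.dumps(entry) for entry in self.entries)

log = DecisionLog()
log.record(
    action="run_tests",
    reason="source files changed since the last test run",
    alternatives=["skip tests", "run only changed tests"],
)
exported = log.export()
```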
Direction 3: Strengthening Security
As agent capabilities increase, security becomes increasingly important. Future agents should support:
- Risk assessment: Assess potential risks before executing operations
- Sandbox execution: Execute high-risk operations in isolated environments
- Anomaly detection: Identify and block abnormal or malicious behavior
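Risk assessment before execution can be sketched as a lookup that maps a command to a risk level and then to an action. Everything in the example is an assumption for illustration: the rule table `RISK_LEVELS`, the three risk levels, and the action names. Unknown commands are treated as high risk by default.

```python
import shlex

# Hypothetical rule table: program name -> risk level.
RISK_LEVELS = {"ls": "low", "cat": "low", "git": "medium", "rm": "high", "curl": "high"}

ACTIONS = {"low": "allow", "medium": "ask_user", "high": "sandbox"}

def assess_risk(command):
    """Classify a shell command by its program name; unknown programs are high risk."""
    parts = shlex.split(command)
    if not parts:
        return "low"
    return RISK_LEVELS.get(parts[0], "high")

def decide(command):
    """Map the risk level to an action: run directly, ask the user, or sandbox."""
    return ACTIONS[assess_risk(command)]

decisions = {cmd: decide(cmd) for cmd in ["ls -la", "git push", "rm -rf build", "mytool --x"]}
```

This toy classifier looks only at the first token, so a chained command such as `ls && rm -rf build` would be misjudged. A real implementation would need to parse shell operators and inspect arguments as well.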
17.3 Technology Evolution Trends
Trend 1: From Rules to Learning
Current Harness Engineering heavily relies on rules (permission rules, error handling rules, scheduling rules, etc.). The future trend is:
- Learning-based rules: Automatically discover and optimize rules through machine learning
- Adaptive strategies: Dynamically adjust strategies based on environment and historical data
- Predictive intervention: Predict potential problems and take preventive measures in advance
Trend 2: From Monolithic to Distributed
Current agents are usually monolithic, with all functions concentrated in one process. The future trend is:
- Microservices architecture: Break agents down into independent microservices
- Edge computing: Offload some computing tasks to edge devices
- Cloud-native deployment: Use cloud-native technologies for elastic scaling
Trend 3: From Tool to Partner
Current agents are more like "tools" that passively execute user instructions. The future trend is:
- Partnership: Agents become human partners, jointly solving problems
- Emotional understanding: Understand users' emotional states and provide more humanized interaction
- Value alignment: Ensure agent behavior aligns with human values
Chapter 18: Design Philosophy and Best Practices
18.1 Core Design Principles
Principle 1: Simplicity
The core idea of Harness Engineering is "one loop, surrounded by subsystems." The elegance of this design lies in:
- The core loop stays simple: Only does the most basic things
- Complexity is pushed to the periphery: Complex situations are handled through subsystems
- Easy to understand and maintain: Developers can quickly understand how the system works
Principle 2: Progressiveness
The Learn Claude Code project demonstrates the power of progressive design:
- Start with a minimum viable product: Minimal loop of 102 lines of code
- Gradually add features: Only one subsystem added at a time
- Maintain usability at each step: Each stage is a working system
Principle 3: Composability
Each subsystem is independent and can be freely combined:
- Modular design: Each subsystem focuses on a single responsibility
- Standard interfaces: Subsystems communicate through standard interfaces
- Plugin-based extension: New subsystems can be added through plugins
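The three composability points can be sketched with a shared interface and a registry. The `Subsystem` protocol, the `on_turn` method, and the two example subsystems are hypothetical. The point is only that the harness depends on the interface, not on any concrete subsystem.

```python
from typing import Protocol

class Subsystem(Protocol):
    """Hypothetical standard interface that every subsystem implements."""
    name: str

    def on_turn(self, state: dict) -> dict: ...

class TodoSubsystem:
    name = "todo"

    def on_turn(self, state):
        return {**state, "todos": state.get("todos", []) + ["review plan"]}

class MemorySubsystem:
    name = "memory"

    def on_turn(self, state):
        return {**state, "remembered": len(state.get("todos", []))}

class Harness:
    def __init__(self):
        self.subsystems = []

    def register(self, subsystem: Subsystem):
        # Plugin-style extension: any object with the right shape can be added.
        self.subsystems.append(subsystem)

    def run_turn(self, state):
        # Each subsystem receives the state produced by the previous one.
        for subsystem in self.subsystems:
            state = subsystem.on_turn(state)
        return state

harness = Harness()
harness.register(TodoSubsystem())
harness.register(MemorySubsystem())
final_state = harness.run_turn({})
```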
18.2 Implementation Best Practices
Practice 1: Test-Driven Development
Harness Engineering should adopt test-driven development:
- Unit tests: Each subsystem should have complete unit tests
- Integration tests: Test interactions between subsystems
- End-to-end tests: Test complete agent behavior
Practice 2: Monitoring and Observability
Agents in production environments need comprehensive monitoring:
- Performance metrics: Response time, throughput, error rate
- Business metrics: Task completion rate, user satisfaction
- Security metrics: Permission denial rate, abnormal behavior detection
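The performance metrics listed above can be collected with very little code. The following in-memory collector is a sketch under simple assumptions: the `Metrics` class and its method names are hypothetical, and a production system would export to a monitoring backend instead.

```python
class Metrics:
    """Hypothetical in-memory metrics collector for agent tasks."""

    def __init__(self):
        self.latencies = []  # seconds per finished task
        self.errors = 0

    def observe(self, seconds, ok=True):
        self.latencies.append(seconds)
        if not ok:
            self.errors += 1

    def error_rate(self):
        return self.errors / len(self.latencies) if self.latencies else 0.0

    def mean_latency(self):
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0

metrics = Metrics()
for seconds, ok in [(1.0, True), (2.0, True), (3.0, False), (2.0, True)]:
    metrics.observe(seconds, ok)
```

With one failure in four observations, the collector reports an error rate of 0.25 and a mean latency of 2.0 seconds.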
Practice 3: Progressive Deployment
Agent systems should adopt progressive deployment strategies:
- Canary releases: Test in a small scope first
- A/B testing: Compare performance of old and new versions
- Rollback mechanisms: Ability to quickly roll back when problems occur
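A canary release needs a stable way to decide which users see the new version. One common approach, sketched here with a hypothetical `use_canary` helper, hashes the user ID into one of 100 buckets so the same user always gets the same answer. Setting the percentage back to 0 then acts as a rollback.

```python
import hashlib

def use_canary(user_id, percent):
    """Deterministically route roughly `percent`% of users to the new version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent

# Hypothetical user IDs; count how many fall into a 10% canary.
users = [f"user-{i}" for i in range(1000)]
canary_count = sum(use_canary(user, 10) for user in users)
```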
18.3 Common Pitfalls and How to Avoid Them
Pitfall 1: Over-Engineering
Don't optimize prematurely, and don't add unnecessary features.
How to avoid: Follow the YAGNI principle (You Aren't Gonna Need It)
Pitfall 2: Neglecting Error Handling
Error handling is the most easily overlooked part of Harness Engineering.
How to avoid: Treat error handling as a first-class citizen, consider it from the design phase
Pitfall 3: Performance Bottlenecks
Agent systems may face various performance bottlenecks.
How to avoid: Consider performance from the design phase, regularly perform performance testing and optimization
Conclusion: The Future of Harness Engineering
Harness Engineering represents a critical step for AI Agents moving from the laboratory to production environments. It's not a single technology, but a complete engineering methodology covering the entire agent lifecycle, from design, development, and testing to deployment and operations.
Through studying the Learn Claude Code project, we can see:
- The core of Harness Engineering is simplicity: One loop, surrounded by subsystems
- The key to Harness Engineering is progressiveness: Start with a minimum viable product and gradually add features
- The value of Harness Engineering is reliability: Transforming unstable AI models into reliable production systems
Future Harness Engineering will develop in the following directions:
- More intelligent: Automatically optimize agent behavior through machine learning
- More autonomous: Agents can proactively discover problems and opportunities
- More collaborative: Agents become true human partners
- More secure: Ensure agent behavior aligns with human values
Harness Engineering is not just a technical problem, but a philosophical one: How do we build a system that can both leverage AI's powerful capabilities and be effectively controlled by humans? The answer to this question will determine the role and impact of AI Agents in the future.
As developers, we are in an exciting era. Harness Engineering provides us with tools and methods to build truly useful AI Agents. Let's embrace this challenge and shape the future of AI together.
Appendix: Harness Engineering Checklist
When building your own harness, you can refer to the following checklist:
- Core Loop
- Tool System
- Permission Control
- Hook System
- Plan Management
- Skill Loading
- System Prompt
- Context Management
- Memory System
- Error Recovery
- Subagent
- Task System
- Background Tasks
- Scheduled Execution
- Multi-Agent Collaboration
By following this checklist, you can build a complete, reliable, and scalable harness that provides a solid infrastructure foundation for AI Agents.
Prompt
Write an article about AI Harness Engineering, referencing the Learn Claude Code project, covering aspects such as Agent Loop, Prompt, Tool Use, MCP, Permission, Hooks, Todo, Skill, Context, Memory, Error Recovery, Subagent, Task, Background Tasks, Cron Scheduler, Agent Teams, etc. The article should be easy to understand, hit the key points, and flow coherently from one section to the next. First explain what it is, why, and how to design it, then deeply explore its essence based on the latest research, and explain the current problems of Agents, especially Coding Agents, in this regard and how they will be improved in the future. The article has no word limit, aiming to thoroughly explain Harness Engineering in one article.