Building Multi-Agent Swarms
How to spawn child agents, pass messages, use resource locks, and manage swarm lifecycle.
This guide covers how to build agents that coordinate with other agents via the AEGIS swarm system. Parent agents can spawn children, await completion, exchange messages, and use resource locks via SEAL-secured MCP tool calls.
When to Use Swarms
Use swarms when a single agent's task is best decomposed into parallel or sequential sub-tasks:
- Parallel processing: Analyze multiple files simultaneously with one agent per file.
- Specialization: Route sub-tasks to agents with different capabilities (e.g., a python-expert and a security-reviewer in parallel).
- Pipeline decomposition: Break a large task into sequential stages, each with isolated state and its own iteration loop.
If your decomposition is static and predictable, consider a Workflow FSM instead — it is easier to monitor and debug. Use swarms when decomposition needs to be dynamic (e.g., spawn one child per input item).
Basic Spawn and Await Pattern
```python
import json

from aegis import AegisClient

client = AegisClient()
task = client.get_task()
files = task.input.get("files", [])

# Spawn a child agent for each file
# (per-file context is passed separately; see Passing Context to Children)
spawned = []
for file_path in files:
    result = client.call_tool("aegis.spawn_child", {
        "manifest_yaml": f"""
apiVersion: 100monkeys.ai/v1
kind: Agent
metadata:
  name: file-analyzer-child
spec:
  image: myregistry/file-analyzer:latest
  capabilities:
    - fs.read
    - cmd.run
  security:
    security_context: default
  resources:
    timeout_secs: 120
""",
    })
    spawned.append({
        "file": file_path,
        "execution_id": result["execution_id"],
        "swarm_id": result["swarm_id"],
    })

# Await all children
results = []
for item in spawned:
    outcome = client.call_tool("aegis.await_child", {
        "execution_id": item["execution_id"],
        "timeout_secs": 150,
    })
    results.append({
        "file": item["file"],
        "status": outcome["status"],
        "output": outcome.get("output", ""),
    })

# Write summary
with open("/workspace/analysis_results.json", "w") as f:
    json.dump(results, f, indent=2)

print(json.dumps({"processed": len(results), "results_path": "/workspace/analysis_results.json"}))
```

Passing Context to Children
A child execution receives only what is passed via task.input. You can inject per-child context by embedding it in the manifest's environment section, or by having the child read from a shared volume that the parent writes first.
Option A: Volume-Backed Context
Write a context file to the shared workspace before spawning:
```python
# Parent writes per-child task files, then spawns one child per task
for i, file_path in enumerate(files):
    context_path = f"/workspace/tasks/task_{i}.json"
    client.call_tool("fs.write", {
        "path": context_path,
        "content": json.dumps({"target_file": file_path}),
    })
    client.call_tool("aegis.spawn_child", {
        "manifest_yaml": child_manifest(task_index=i),
    })
```

The child reads its task file from /workspace/tasks/task_{i}.json at startup.
Option B: Environment Variables in the Manifest
Embed small parameters directly in the dynamically generated manifest:
```python
def child_manifest(file_path: str) -> str:
    return f"""
apiVersion: 100monkeys.ai/v1
kind: Agent
metadata:
  name: worker
spec:
  image: myregistry/worker:latest
  capabilities:
    - fs.read
    - fs.write
  security:
    security_context: default
  resources:
    timeout_secs: 120
  environment:
    TARGET_FILE: "{file_path}"
"""
```

The child reads TARGET_FILE from its environment at runtime.
Resource Locking
When multiple children might write to the same file or shared state, use resource locks.
Each lock is internally bound to the calling execution's execution_id. This means locks are automatically released when the holding execution completes or is cancelled, in addition to the TTL-based expiry. You do not need to pass execution_id explicitly — the orchestrator resolves it from the caller's context.
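Because acquisition can fail while another execution holds the lock, a retry loop with backoff is often useful before giving up. The helper below is a sketch: it assumes a contended aegis.acquire_lock call raises an exception, which is an assumption about the tool's error behavior, so adjust the failure check to match your runtime. The MCP invoker is injected as `call_tool` (e.g., `client.call_tool`).

```python
import time

def acquire_with_retry(call_tool, resource: str, ttl_secs: int = 30,
                       attempts: int = 5, base_delay: float = 0.5):
    """Retry lock acquisition with exponential backoff.

    Assumes a contended aegis.acquire_lock call raises an exception
    (an assumption; adapt if the tool signals contention differently).
    """
    for attempt in range(attempts):
        try:
            return call_tool("aegis.acquire_lock", {
                "resource": resource,
                "ttl_secs": ttl_secs,
            })
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the last failure
            time.sleep(base_delay * (2 ** attempt))

# usage: lock = acquire_with_retry(client.call_tool, "workspace/aggregated_output.json")
```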
```python
import json
import os

# Child agent acquires a lock before writing to shared output
lock = client.call_tool("aegis.acquire_lock", {
    "resource": "workspace/aggregated_output.json",
    "ttl_secs": 30,
})
try:
    # Read current state; start fresh if the file is missing or unparseable
    try:
        current = json.loads(client.call_tool("fs.read", {"path": "/workspace/aggregated_output.json"})["content"])
    except Exception:
        current = []
    # Append this child's result (my_result computed earlier by this child)
    current.append({"file": os.environ["TARGET_FILE"], "result": my_result})
    # Write back
    client.call_tool("fs.write", {
        "path": "/workspace/aggregated_output.json",
        "content": json.dumps(current),
    })
finally:
    client.call_tool("aegis.release_lock", {
        "lock_token": lock["lock_token"],
    })
```

Inter-Agent Messaging
Use messaging for lightweight coordination between long-running agents in the same swarm:
```python
# Parent sends work items to a child
client.call_tool("aegis.send_message", {
    "to_agent_id": child_agent_id,
    "payload": json.dumps({"command": "analyze", "target": "/workspace/file.py"}).encode(),
})

# Broadcast a cancellation signal to all agents in the swarm
client.call_tool("aegis.broadcast_message", {
    "swarm_id": swarm_id,
    "payload": b'{"command": "stop"}',
})
```

Messages are delivered in send order between the same sender-receiver pair. There is no ordering guarantee across different sender-receiver pairs.
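Because ordering holds only within a single sender-receiver pair, a receiver that merges messages from several senders may want to detect gaps or reordering. One application-level convention is to tag each payload with a per-recipient sequence number; the `seq` field below is that convention, not part of any AEGIS payload format.

```python
import json
from collections import defaultdict

class SequencedSender:
    """Attach a per-recipient sequence number to outgoing payloads."""
    def __init__(self):
        self._next_seq = defaultdict(int)

    def wrap(self, to_agent_id: str, body: dict) -> bytes:
        seq = self._next_seq[to_agent_id]
        self._next_seq[to_agent_id] += 1
        return json.dumps({"seq": seq, "body": body}).encode()

sender = SequencedSender()
payload = sender.wrap("child-1", {"command": "analyze"})
# pass `payload` to aegis.send_message as usual; the receiver checks
# that seq values from each sender increase without gaps
```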
Security Context Ceiling
A child agent's security_context must be a subset of the parent's. If your parent runs with security_context: default and you attempt to spawn a child with security_context: privileged, the spawn will be rejected:
```
SpawnError: ContextExceedsParentCeiling (requested=privileged, parent=default)
```

Phase 1 (current): Enforcement uses name-based comparison: the child's security_context name must exactly match the parent's. A child requesting a different security context name is rejected. Phase 2 will introduce capability-lattice comparison, enabling strict-subset requests under different context names.
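Under the Phase 1 rule, the check reduces to name equality, so a parent can pre-validate a child manifest before attempting the spawn. The helper below is hypothetical (not an AEGIS API), shown only to make the rule concrete:

```python
def check_spawn_context(parent_ctx: str, child_ctx: str) -> None:
    """Phase 1 rule: the child's security_context name must equal the parent's.

    Hypothetical local pre-check mirroring the orchestrator's rejection.
    """
    if child_ctx != parent_ctx:
        raise ValueError(
            f"ContextExceedsParentCeiling (requested={child_ctx}, parent={parent_ctx})"
        )

check_spawn_context("default", "default")  # same name: allowed
```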
Always use the same or more restrictive security context for child agents.
Depth Limit
AEGIS enforces a maximum swarm depth of 3. A root agent (depth 0) can spawn depth-1 children. Depth-1 children can spawn depth-2 children. Depth-2 children can spawn depth-3 children. Attempting to spawn from depth-3 returns SpawnError::MaxDepthExceeded.
Design your decomposition to stay within this limit. If you need deeper hierarchies, flatten the decomposition into wider (more parallel) rather than deeper (more sequential) structures.
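A parent can guard against the limit before attempting a spawn. The sketch below assumes the agent can learn its own depth at runtime; the AEGIS_SWARM_DEPTH environment variable used here is a hypothetical mechanism, so check how your runtime actually exposes depth.

```python
import os

MAX_SWARM_DEPTH = 3  # depths 0..2 may spawn; depth 3 is the floor of the tree

def can_spawn(current_depth: int) -> bool:
    """True if an agent at this depth is still allowed to spawn children."""
    return current_depth < MAX_SWARM_DEPTH

# AEGIS_SWARM_DEPTH is a hypothetical variable, named here for illustration
depth = int(os.environ.get("AEGIS_SWARM_DEPTH", "0"))
if not can_spawn(depth):
    raise RuntimeError("at max swarm depth; flatten the decomposition instead")
```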
Monitoring Swarms
Swarm-specific listing and cancellation are not exposed as dedicated CLI commands in the current release. Use execution logs/streams for observability and POST /v1/executions/{execution_id}/cancel for cancellation. Cancelling a parent execution automatically cascades to all children via SwarmCancellationPort.
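For example, the cancellation endpoint can be called with the standard library alone; the base URL below is deployment-specific and the execution ID is illustrative.

```python
import urllib.request

def cancel_request(base_url: str, execution_id: str) -> urllib.request.Request:
    """Build a POST to /v1/executions/{execution_id}/cancel.

    Cancelling a parent execution cascades to all of its children.
    """
    url = f"{base_url.rstrip('/')}/v1/executions/{execution_id}/cancel"
    return urllib.request.Request(url, method="POST")

req = cancel_request("http://localhost:8080", "exec-42")
# urllib.request.urlopen(req)  # uncomment to send; requires a reachable orchestrator
```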
Six SwarmEvent variants are published to the event bus and available via gRPC streaming:
- SwarmCreated: emitted when a new swarm is initialized
- ChildSpawned: emitted for each child execution added to the swarm
- SwarmDissolved: emitted when the swarm enters the Dissolved state (includes reason and dissolved_at)
- LockAcquired: emitted when a resource lock is granted (includes the execution_id of the holder)
- LockReleased: emitted when a resource lock is released or expires
- MessageBroadcast: emitted for broadcast messages (includes recipient_count)
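A monitoring consumer can dispatch on the variant name. The sketch below operates on plain dicts carrying a type field, which is an assumption about the decoded event shape rather than the actual gRPC message schema:

```python
from collections import Counter

SWARM_EVENT_TYPES = {
    "SwarmCreated", "ChildSpawned", "SwarmDissolved",
    "LockAcquired", "LockReleased", "MessageBroadcast",
}

def summarize(events):
    """Count swarm events by variant, ignoring unknown event types."""
    return Counter(e["type"] for e in events if e.get("type") in SWARM_EVENT_TYPES)

counts = summarize([
    {"type": "SwarmCreated"},
    {"type": "ChildSpawned"},
    {"type": "ChildSpawned"},
    {"type": "Unrelated"},
])
```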