Agents
The agent manifest format, lifecycle states, runtime selection, and BYOLLM model alias system.
An Agent in AEGIS is a stateless compute process defined entirely by a declarative YAML manifest (kind: Agent). The orchestrator reads the manifest to determine the runtime environment, what tools the agent is allowed to use, what security constraints to enforce, what resources to allocate, and how to validate the output.
Agents do not maintain state between executions. Context is injected by the orchestrator at the start of each execution and tool results are returned to the agent via the orchestrator proxy.
Execution Context Overrides
Direct agent executions can inject a structured dictionary of context variables at start time using context_overrides.
- The dictionary must be a JSON or YAML object.
- Keys are exposed as top-level template variables during prompt rendering.
- The same normalized object is forwarded into the runtime task context so the agent receives the structured values beyond prompt interpolation.
- Reserved built-in keys cannot be overridden. Attempts to replace fields such as `instruction`, `input`, or iteration metadata fail validation.
This makes it possible to keep a reusable agent manifest while swapping execution-specific variables from a file or API call.
```shell
aegis task execute python-coder \
  --input @input.json \
  --context @context.yaml
```

Example context.yaml:

```yaml
repo_url: https://github.com/example/service
branch: main
review:
  severity: high
```

If the prompt template references `{{repo_url}}` or `{{review.severity}}`, the override values win over same-named non-reserved context variables for that execution.
Manifest Structure
All agent manifests follow a Kubernetes-style format:
```yaml
apiVersion: 100monkeys.ai/v1
kind: Agent
metadata:
  name: code-reviewer
  version: "1.0.0"
  labels:
    team: platform
    environment: production
spec:
  runtime:
    language: python
    version: "3.11"
    isolation: docker
  task:
    instruction: "Reviews pull requests and outputs structured feedback."
  security:
    network:
      mode: allow
      allowlist:
        - api.github.com
        - api.openai.com
    filesystem:
      read:
        - /workspace
      write:
        - /workspace
    resources:
      cpu: 1000
      memory: "1Gi"
      timeout: "300s"
  execution:
    mode: iterative
    max_iterations: 10
    validation:
      system:
        must_succeed: true
      output:
        format: json
  env:
    LOG_LEVEL: info
  volumes:
    - name: workspace
      storage_class: ephemeral
      mount_path: /workspace
      access_mode: read-write
      ttl_hours: 1
```

Manifest Fields
The table below covers the most common fields. For the complete field-by-field specification including all options, defaults, and validation rules, see the Agent Manifest Reference.
metadata
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | ✓ | Unique identifier for the agent on this node. Used in CLI commands and gRPC calls. |
| labels | map[string]string | | Arbitrary key-value tags for filtering and organization. |
spec
| Field | Type | Required | Description |
|---|---|---|---|
| description | string | | Human-readable description injected into the system prompt. |
| runtime | object | ✓ | Language, version, and isolation mode. |
| task | object | | Instruction and prompt template. Community skills can be imported via the skill-import workflow. |
| security | object | | Network policy, filesystem policy, resource limits (deny-by-default). |
| execution | object | | Iteration mode, max iterations, and validation criteria. |
| tools | object[] or string[] | | MCP tools the agent may invoke. |
| env | map[string]string | | Environment variables injected into the agent container. |
| volumes | object[] | | Volume mounts. See Configuring Storage. |
spec.security
| Field | Type | Description |
|---|---|---|
| network.mode | `allow` \| `deny` \| `none` | Policy mode. `allow` = allowlist; `none` = no network. |
| network.allowlist | string[] | Allowed domain names and CIDR blocks. |
| network.denylist | string[] | Explicitly blocked domains. |
| filesystem.read | string[] | Readable paths inside the container. Glob patterns supported. |
| filesystem.write | string[] | Writable paths inside the container. Glob patterns supported. |
| resources.cpu | integer | CPU in millicores (1000 = 1 core). Default: 1000. |
| resources.memory | string | Memory limit. Human-readable: "512Mi", "1Gi". Default: "512Mi". |
| resources.timeout | string | Total execution timeout. Human-readable: "300s", "5m". Max "1h". |
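Combining these fields, a fully offline agent can make the deny-by-default posture explicit. A sketch reusing only the fields documented above:

```yaml
security:
  network:
    mode: none          # no outbound network at all
  filesystem:
    read:
      - /workspace      # read-only workspace; no write paths granted
  resources:
    cpu: 500            # half a core (millicores)
    memory: "512Mi"
    timeout: "5m"
```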
spec.execution
| Field | Type | Description |
|---|---|---|
| mode | `one-shot` \| `iterative` | Execution strategy. `iterative` enables the 100monkeys refinement loop. Default: `one-shot`. |
| max_iterations | integer | Maximum refinement loops (for iterative mode). Default: 5. |
| iteration_timeout | string | Per-iteration timeout. Human-readable: "30s", "60s", "5m". Default: "300s" (5 minutes). Each iteration gets this much time for LLM calls plus tool invocations. |
| llm_timeout_seconds | integer | HTTP socket timeout for bootstrap.py LLM calls. Default: 300 seconds. |
| memory | boolean | Enable the Cortex memory system. Default: false. |
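Putting these fields together, an iterative agent with a tight per-loop budget might be configured as follows (values are illustrative):

```yaml
execution:
  mode: iterative
  max_iterations: 8          # stop after 8 refinement loops
  iteration_timeout: "60s"   # per-loop budget for LLM calls + tool invocations
  memory: true               # enable the Cortex memory system
```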
Agent Lifecycle
An agent transitions through the following states after deploy:
```
deployed → paused → deployed
   │
   └──────────────────────→ archived
```

| State | Description |
|---|---|
| deployed | The manifest is registered. New executions can be started against this agent. |
| paused | The manifest is retained but no new executions are accepted. Running executions complete normally. |
| archived | The manifest is soft-deleted and cannot be unarchived. Historical execution records are retained. |
```shell
aegis agent deploy ./agent.yaml
aegis agent list
aegis agent show <id>
aegis agent remove <id>
```

Semantic Discovery (Enterprise)
On nodes with the discovery service configured, deployed agents are automatically indexed for semantic search. This allows agents and operators to find existing agents by natural-language intent (e.g. "review code for security issues") rather than by exact name, using the aegis.agent.search MCP tool. This is an enterprise feature; nodes without discovery configured use aegis.agent.list for enumeration.
Runtime Selection
Two separate concerns govern how an agent runs:
- The container image — defined entirely by the agent manifest (`spec.runtime`). Each agent declares its own image, independently of other agents on the same node.
- The isolation technology — determined by node configuration (`aegis-config.yaml`). This controls whether images run inside Docker containers or Firecracker microVMs.
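A node-level isolation setting might look like the fragment below. Only `container_socket_path` is named in this document; the surrounding key names are assumptions for illustration:

```yaml
# aegis-config.yaml (sketch; key layout assumed, container_socket_path
# is the documented setting)
runtime:
  isolation: docker
  container_socket_path: /var/run/docker.sock
```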
Container Image: Manifest-Driven
AEGIS supports two mutually exclusive ways to specify the container image in spec.runtime:
| Mode | Manifest Fields | How the Image Is Resolved |
|---|---|---|
| StandardRuntime | language + version | Orchestrator resolves to a pinned official image (e.g., python:3.11-slim). No image to build or maintain. |
| CustomRuntime | image | Orchestrator pulls directly from your registry. You build and maintain the image. |
See Standard Runtime Registry for the full language-version table, and Custom Runtime Agents for the custom image path.
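For comparison with the StandardRuntime manifest shown earlier, a CustomRuntime `spec.runtime` block swaps `language`/`version` for `image` (the registry path below is a placeholder):

```yaml
# CustomRuntime: you build and maintain the image yourself
spec:
  runtime:
    image: registry.example.com/agents/code-reviewer:1.2.0
    isolation: docker
```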
Isolation Technology: Node-Config-Driven
The AgentRuntime trait abstracts all isolation backends. Switching a node from Docker to Podman or Firecracker requires only a config change — no agent manifest changes needed.
| Isolation | Use Case | Requirement |
|---|---|---|
docker | Development, staging, and production | Docker or Podman runtime accessible via container_socket_path |
firecracker | Hardened production | Bare-metal or KVM-passthrough host, Linux kernel 5.10+ |
The docker isolation type works with both Docker and Podman. The orchestrator auto-detects the container engine from the configured socket. Podman rootless mode provides additional security by eliminating the privileged daemon.
See Docker Deployment, Podman Deployment, and Firecracker Runtime for deployment details.
BYOLLM: Model Alias System
Agent manifests reference LLM models by alias, not by provider name. This decouples agent definitions from infrastructure choices.
Specify the model alias in your manifest (for example in semantic validation settings), and let the node operator map aliases to providers in node configuration.
In aegis-config.yaml, the node operator maps aliases to providers:
```yaml
llm:
  providers:
    - name: openai-gpt4o
      type: openai
      api_key: "env:OPENAI_API_KEY"
      model: gpt-4o
    - name: claude-sonnet-4-5
      type: anthropic
      api_key: "env:ANTHROPIC_API_KEY"
      model: claude-sonnet-4-5
  aliases:
    default: openai-gpt4o
    fast: claude-sonnet-4-5
    reasoning: openai-gpt4o
```

To swap the model backing the `default` alias for all agents on the node, change the alias mapping in config and restart the daemon — no manifest changes required.
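Conceptually, alias resolution is a two-step lookup: alias to provider name, then provider name to provider entry. A minimal Python sketch of that idea (not the daemon's actual implementation):

```python
def resolve_alias(llm_config: dict, alias: str) -> dict:
    """Map a model alias to its provider entry from the llm config."""
    provider_name = llm_config["aliases"][alias]       # alias -> provider name
    providers = {p["name"]: p for p in llm_config["providers"]}
    return providers[provider_name]                    # name -> provider entry

# Mirrors the aegis-config.yaml example above (api_key fields omitted).
llm = {
    "providers": [
        {"name": "openai-gpt4o", "type": "openai", "model": "gpt-4o"},
        {"name": "claude-sonnet-4-5", "type": "anthropic",
         "model": "claude-sonnet-4-5"},
    ],
    "aliases": {"default": "openai-gpt4o", "fast": "claude-sonnet-4-5"},
}

print(resolve_alias(llm, "fast")["model"])  # claude-sonnet-4-5
```

Because agents only name the alias, re-pointing `default` at a different provider changes every agent's model without touching any manifest.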
See Configuring LLM Providers for the full provider configuration reference.
See Also
- Agent Manifest Reference — Complete field-by-field specification for `kind: Agent` manifests
- Workflow Manifest Reference — Declarative FSM manifest for multi-agent coordination
- Executions — How agents are run and iterated
- Swarms — Coordinating multiple agents in a parent-child hierarchy