
Agents

The agent manifest format, lifecycle states, runtime selection, and BYOLLM model alias system.


An Agent in AEGIS is a stateless compute process defined entirely by a declarative YAML manifest (kind: Agent). The orchestrator reads the manifest to determine the runtime environment, what tools the agent is allowed to use, what security constraints to enforce, what resources to allocate, and how to validate the output.

Agents do not maintain state between executions. Context is injected by the orchestrator at the start of each execution and tool results are returned to the agent via the orchestrator proxy.


Execution Context Overrides

Direct agent executions can inject a structured dictionary of context variables at start time using context_overrides.

  • The dictionary must be a JSON or YAML object.
  • Keys are exposed as top-level template variables during prompt rendering.
  • The same normalized object is forwarded into the runtime task context, so the agent receives the structured values directly, not only through prompt interpolation.
  • Reserved built-in keys cannot be overridden. Attempts to replace fields such as instruction, input, or iteration metadata fail validation.

This makes it possible to keep a reusable agent manifest while swapping execution-specific variables from a file or API call.

aegis task execute python-coder \
  --input @input.json \
  --context @context.yaml

Example context.yaml:

repo_url: https://github.com/example/service
branch: main
review:
  severity: high

If the prompt template references {{repo_url}} or {{review.severity}}, the override values win over same-named non-reserved context variables for that execution.
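The override merge described above can be sketched in a few lines of Python. The RESERVED_KEYS set here is an illustrative subset; the actual reserved names are defined by the orchestrator:

```python
# Illustrative subset of reserved built-in keys; the orchestrator defines the real set.
RESERVED_KEYS = {"instruction", "input"}

def merge_context(base: dict, overrides: dict) -> dict:
    """Merge execution-time context overrides into the base template context.

    Reserved built-in keys cannot be replaced; attempting to do so is a
    validation error, mirroring the behaviour described above.
    """
    clash = RESERVED_KEYS & overrides.keys()
    if clash:
        raise ValueError(f"cannot override reserved keys: {sorted(clash)}")
    merged = dict(base)
    merged.update(overrides)  # non-reserved override values win for this execution
    return merged
```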


Manifest Structure

All agent manifests follow a Kubernetes-style format:

apiVersion: 100monkeys.ai/v1
kind: Agent
metadata:
  name: code-reviewer
  version: "1.0.0"
  labels:
    team: platform
    environment: production
spec:
  runtime:
    language: python
    version: "3.11"
    isolation: docker

  task:
    instruction: "Reviews pull requests and outputs structured feedback."

  security:
    network:
      mode: allow
      allowlist:
        - api.github.com
        - api.openai.com
    filesystem:
      read:
        - /workspace
      write:
        - /workspace
    resources:
      cpu: 1000
      memory: "1Gi"
      timeout: "300s"

  execution:
    mode: iterative
    max_iterations: 10
    validation:
      system:
        must_succeed: true
      output:
        format: json

  env:
    LOG_LEVEL: info

  volumes:
    - name: workspace
      storage_class: ephemeral
      mount_path: /workspace
      access_mode: read-write
      ttl_hours: 1
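As a rough illustration of what gets checked on deploy, a minimal manifest validator might look like the following. This is a sketch over a plain dict, not the orchestrator's actual validation logic:

```python
# Top-level fields every agent manifest carries, per the example above.
REQUIRED_TOP = {"apiVersion", "kind", "metadata", "spec"}

def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the manifest
    passes these basic structural checks."""
    errors = [f"missing field: {f}" for f in sorted(REQUIRED_TOP - manifest.keys())]
    if manifest.get("kind") != "Agent":
        errors.append("kind must be 'Agent'")
    if manifest.get("apiVersion") != "100monkeys.ai/v1":
        errors.append("unsupported apiVersion")
    if not manifest.get("metadata", {}).get("name"):
        errors.append("metadata.name is required")
    return errors
```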

Manifest Fields

The table below covers the most common fields. For the complete field-by-field specification including all options, defaults, and validation rules, see the Agent Manifest Reference.

metadata

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier for the agent on this node. Used in CLI commands and gRPC calls. |
| labels | map[string]string | No | Arbitrary key-value tags for filtering and organization. |

spec

| Field | Type | Required | Description |
|---|---|---|---|
| description | string | No | Human-readable description injected into the system prompt. |
| runtime | object | Yes | Language, version, and isolation mode. |
| task | object | Yes | Instruction and prompt template. Community skills can be imported via the skill-import workflow. |
| security | object | No | Network policy, filesystem policy, resource limits (deny-by-default). |
| execution | object | No | Iteration mode, max iterations, and validation criteria. |
| tools | object[] or string[] | No | MCP tools the agent may invoke. |
| env | map[string]string | No | Environment variables injected into the agent container. |
| volumes | object[] | No | Volume mounts. See Configuring Storage. |

spec.security

| Field | Type | Description |
|---|---|---|
| network.mode | allow \| deny \| none | Policy mode. allow = allowlist; none = no network. |
| network.allowlist | string[] | Allowed domain names and CIDR blocks. |
| network.denylist | string[] | Explicitly blocked domains. |
| filesystem.read | string[] | Readable paths inside the container. Glob patterns supported. |
| filesystem.write | string[] | Writable paths inside the container. Glob patterns supported. |
| resources.cpu | integer | CPU in millicores (1000 = 1 core). Default: 1000. |
| resources.memory | string | Memory limit. Human-readable: "512Mi", "1Gi". Default: "512Mi". |
| resources.timeout | string | Total execution timeout. Human-readable: "300s", "5m". Max "1h". |
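The human-readable resource values above can be parsed as in this sketch. The unit sets shown are assumptions based on the examples in the table:

```python
import re

_MEM_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
_TIME_UNITS = {"s": 1, "m": 60, "h": 3600}

def parse_memory(value: str) -> int:
    """Parse a "512Mi" / "1Gi" style limit into bytes."""
    m = re.fullmatch(r"(\d+)(Ki|Mi|Gi)", value)
    if not m:
        raise ValueError(f"bad memory limit: {value!r}")
    return int(m.group(1)) * _MEM_UNITS[m.group(2)]

def parse_timeout(value: str) -> int:
    """Parse a "300s" / "5m" style timeout into seconds, capped at the 1h max."""
    m = re.fullmatch(r"(\d+)([smh])", value)
    if not m:
        raise ValueError(f"bad timeout: {value!r}")
    seconds = int(m.group(1)) * _TIME_UNITS[m.group(2)]
    if seconds > 3600:
        raise ValueError("timeout exceeds the 1h maximum")
    return seconds
```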

spec.execution

| Field | Type | Description |
|---|---|---|
| mode | one-shot \| iterative | Execution strategy. iterative enables the 100monkeys refinement loop. Default: one-shot. |
| max_iterations | integer | Maximum refinement loops (for iterative mode). Default: 5. |
| iteration_timeout | string | Per-iteration timeout. Human-readable: "30s", "60s", "5m". Default: "300s" (5 minutes). Each iteration gets this much time for LLM calls + tool invocations. |
| llm_timeout_seconds | integer | HTTP socket timeout for bootstrap.py LLM calls. Default: 300 seconds. |
| memory | boolean | Enable Cortex memory system. Default: false. |
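Iterative mode can be pictured as a loop like the following sketch. The step and validate callables stand in for the LLM call and the validation criteria; the real loop lives in the orchestrator and enforces timeouts more strictly:

```python
import time

def run_iterative(step, validate, max_iterations=5, iteration_timeout=300.0):
    """Iterative refinement loop: run a step, validate the output, and retry
    until validation passes or the iteration budget is exhausted."""
    output = None
    for i in range(1, max_iterations + 1):
        started = time.monotonic()
        output = step(output)  # stands in for the LLM call + tool invocations
        # A real implementation enforces the deadline while the step runs;
        # this sketch only checks elapsed time afterwards.
        if time.monotonic() - started > iteration_timeout:
            raise TimeoutError(f"iteration {i} exceeded {iteration_timeout}s")
        if validate(output):
            return output, i
    raise RuntimeError(f"no valid output after {max_iterations} iterations")
```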

Agent Lifecycle

An agent transitions through the following states after deploy:

deployed ⇄ paused
    └──────────────────────→ archived
| State | Description |
|---|---|
| deployed | The manifest is registered. New executions can be started against this agent. |
| paused | The manifest is retained but no new executions are accepted. Running executions complete normally. |
| archived | The manifest is soft-deleted. Cannot be unarchived. Historical execution records are retained. |

Lifecycle operations are performed via the CLI:

aegis agent deploy ./agent.yaml
aegis agent list
aegis agent show <id>
aegis agent remove <id>
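The transitions above can be modeled as a small state machine. Whether a paused agent can be archived directly is an assumption in this sketch; the diagram only shows archiving from deployed:

```python
from enum import Enum

class AgentState(Enum):
    DEPLOYED = "deployed"
    PAUSED = "paused"
    ARCHIVED = "archived"

# deploy ⇄ pause; archiving is terminal (archived agents cannot be unarchived).
# PAUSED → ARCHIVED is assumed here, not confirmed by the diagram.
_TRANSITIONS = {
    AgentState.DEPLOYED: {AgentState.PAUSED, AgentState.ARCHIVED},
    AgentState.PAUSED: {AgentState.DEPLOYED, AgentState.ARCHIVED},
    AgentState.ARCHIVED: set(),
}

def transition(current: AgentState, target: AgentState) -> AgentState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in _TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```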

Semantic Discovery (Enterprise)

On nodes with the discovery service configured, deployed agents are automatically indexed for semantic search. This allows agents and operators to find existing agents by natural-language intent (e.g. "review code for security issues") rather than by exact name, using the aegis.agent.search MCP tool. This is an enterprise feature; nodes without discovery configured use aegis.agent.list for enumeration.


Runtime Selection

Two separate concerns govern how an agent runs:

  1. The container image — defined entirely by the agent manifest (spec.runtime). Each agent declares its own image, independently of other agents on the same node.
  2. The isolation technology — determined by node configuration (aegis-config.yaml). This controls whether images run inside Docker containers or Firecracker microVMs.

Container Image: Manifest-Driven

AEGIS supports two mutually exclusive ways to specify the container image in spec.runtime:

| Mode | Manifest Fields | How the Image Is Resolved |
|---|---|---|
| Standard Runtime | language + version | Orchestrator resolves to a pinned official image (e.g., python:3.11-slim). No image to build or maintain. |
| Custom Runtime | image | Orchestrator pulls directly from your registry. You build and maintain the image. |

See Standard Runtime Registry for the full language-version table, and Custom Runtime Agents for the custom image path.
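Resolution of the two mutually exclusive modes can be sketched as follows. STANDARD_IMAGES here is a one-entry stand-in for the Standard Runtime Registry:

```python
# One-entry stand-in for the pinned-image table in the Standard Runtime Registry.
STANDARD_IMAGES = {("python", "3.11"): "python:3.11-slim"}

def resolve_image(runtime: dict) -> str:
    """Resolve a spec.runtime block to a container image reference."""
    if "image" in runtime and ("language" in runtime or "version" in runtime):
        raise ValueError("image and language/version are mutually exclusive")
    if "image" in runtime:
        return runtime["image"]  # Custom Runtime: pulled from your registry
    key = (runtime["language"], runtime["version"])
    if key not in STANDARD_IMAGES:
        raise ValueError(f"no standard image for {key}")
    return STANDARD_IMAGES[key]  # Standard Runtime: pinned official image
```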

Isolation Technology: Node-Config-Driven

The AgentRuntime trait abstracts all isolation backends. Switching a node from Docker to Podman or Firecracker requires only a config change — no agent manifest changes needed.

| Isolation | Use Case | Requirement |
|---|---|---|
| docker | Development, staging, and production | Docker or Podman runtime accessible via container_socket_path |
| firecracker | Hardened production | Bare-metal or KVM-passthrough host, Linux kernel 5.10+ |

The docker isolation type works with both Docker and Podman. The orchestrator auto-detects the container engine from the configured socket. Podman rootless mode provides additional security by eliminating the privileged daemon.

See Docker Deployment, Podman Deployment, and Firecracker Runtime for deployment details.


BYOLLM: Model Alias System

Agent manifests reference LLM models by alias, not by provider name. This decouples agent definitions from infrastructure choices.

Specify the model alias in your manifest (for example in semantic validation settings), and let the node operator map aliases to providers in node configuration.

In aegis-config.yaml, the node operator maps aliases to providers:

llm:
  providers:
    - name: openai-gpt4o
      type: openai
      api_key: "env:OPENAI_API_KEY"
      model: gpt-4o
    - name: claude-sonnet-4-5
      type: anthropic
      api_key: "env:ANTHROPIC_API_KEY"
      model: claude-sonnet-4-5
  aliases:
    default: openai-gpt4o
    fast: claude-sonnet-4-5
    reasoning: openai-gpt4o

To swap the model backing the default alias for all agents on the node, change the alias mapping in config and restart the daemon — no manifest changes required.
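Alias resolution against an aegis-config.yaml-style structure can be sketched like this; the env: prefix handling mirrors the api_key convention shown in the config above:

```python
import os

def resolve_alias(llm_config: dict, alias: str) -> dict:
    """Resolve a model alias to its provider entry, expanding env:-prefixed
    api_key values from the environment."""
    providers = {p["name"]: p for p in llm_config["providers"]}
    provider = dict(providers[llm_config["aliases"][alias]])
    key = provider.get("api_key", "")
    if key.startswith("env:"):
        provider["api_key"] = os.environ.get(key[4:], "")
    return provider
```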

See Configuring LLM Providers for the full provider configuration reference.

