
Node Configuration Reference

Complete specification for the NodeConfig YAML format (v1.0) — schema, field definitions, credential resolution, model alias system, and example configurations.

API Version: 100monkeys.ai/v1 | Kind: NodeConfig | Status: Canonical

The Node Configuration defines the capabilities, resources, and LLM providers available on an AEGIS Agent Host (Orchestrator Node or Edge Node). It uses the same Kubernetes-style declarative format (apiVersion/kind/metadata/spec) as the Agent Manifest and Workflow Manifest.

Key capabilities:

  • BYOLLM (Bring Your Own LLM) — use any provider (OpenAI, Anthropic, Ollama, LM Studio)
  • Air-gapped operation — local LLMs (Ollama) for fully offline deployments
  • Provider abstraction — agent manifests use model aliases, not hardcoded provider names
  • Hot-swappable models — change underlying LLM without updating agent manifests

For an annotated walkthrough of every field, see Daemon Configuration.


Annotated Full Example

apiVersion: 100monkeys.ai/v1        # required; must be exactly this value
kind: NodeConfig                     # required; must be exactly "NodeConfig"

metadata:
  name: production-node-01           # required; unique human-readable node name
  version: "1.0.0"                   # optional; configuration version for tracking
  labels:                            # optional; key-value pairs for categorization
    environment: production
    region: us-west-2

spec:
  # ─── Node Identity ──────────────────────────────────────────────────────────
  node:
    id: "550e8400-e29b-41d4-a716-446655440000"  # required; stable UUID
    type: orchestrator               # required; edge | orchestrator | hybrid
    region: us-west-2                # optional; geographic region
    tags:                            # optional; for execution_targets matching
      - production
      - gpu
    resources:                       # optional; available compute resources
      cpu_cores: 8
      memory_gb: 32
      disk_gb: 500
      gpu: true

  # ─── Image Tag ──────────────────────────────────────────────────────────────
  image_tag: "0.1.0-pre-alpha"      # optional; written by aegis init --tag / aegis update

  # ─── LLM Providers ──────────────────────────────────────────────────────────
  llm_providers:
    - name: openai-primary
      type: openai
      endpoint: "https://api.openai.com/v1"
      api_key: "env:OPENAI_API_KEY"
      enabled: true
      models:
        - alias: default
          model: gpt-4o
          capabilities: [chat, code, reasoning]
          context_window: 128000
          cost_per_1k_tokens: 0.005
        - alias: fast
          model: gpt-4o-mini
          capabilities: [chat, code]
          context_window: 128000
          cost_per_1k_tokens: 0.00015

    - name: anthropic-primary
      type: anthropic
      endpoint: "https://api.anthropic.com/v1"
      api_key: "secret:aegis-system/llm/anthropic-api-key"
      enabled: true
      models:
        - alias: smart
          model: claude-sonnet-4-5
          capabilities: [chat, code, reasoning]
          context_window: 200000
          cost_per_1k_tokens: 0.003

    - name: ollama-local
      type: ollama
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: local
          model: qwen2.5-coder:32b
          capabilities: [chat, code]
          context_window: 32000
          cost_per_1k_tokens: 0.0

  # ─── LLM Selection Strategy ─────────────────────────────────────────────────
  llm_selection:
    strategy: prefer-local           # prefer-local | prefer-cloud | cost-optimized | latency-optimized
    default_provider: openai-primary
    fallback_provider: ollama-local
    max_retries: 3
    retry_delay_ms: 1000

  # ─── Runtime ────────────────────────────────────────────────────────────────
  runtime:
    bootstrap_script: "assets/bootstrap.py"
    default_isolation: docker        # docker | firecracker | inherit | process
    container_socket_path: "/var/run/docker.sock"  # or Podman: /run/user/1000/podman/podman.sock
    container_network_mode: "aegis-network"
    orchestrator_url: "env:AEGIS_ORCHESTRATOR_URL"
    nfs_server_host: "env:AEGIS_NFS_HOST"
    nfs_port: 2049
    nfs_mountport: 2049
    runtime_registry_path: "runtime-registry.yaml"  # default value

  # ─── Network ────────────────────────────────────────────────────────────────
  network:
    bind_address: "0.0.0.0"
    port: 8088
    grpc_port: 50051
    orchestrator_endpoint: null      # WebSocket URL for edge → orchestrator (edge nodes only)
    heartbeat_interval_seconds: 30
    tls:
      cert_path: "/etc/aegis/tls/server.crt"
      key_path: "/etc/aegis/tls/server.key"

  # ─── Storage ────────────────────────────────────────────────────────────────
  storage:
    backend: seaweedfs               # seaweedfs | local_host | opendal
    fallback_to_local: true
    nfs_port: 2049
    seaweedfs:
      filer_url: "http://localhost:8888"
      mount_point: "/var/lib/aegis/storage"
      default_ttl_hours: 24
      default_size_limit_mb: 1000
      max_size_limit_mb: 10000
      gc_interval_minutes: 60
    local_host:
      mount_point: "/data/shared_llm_weights"
    opendal:
      provider: "memory"

  # ─── Deploy Built-In Templates ──────────────────────────────────────────────
  # Deploy vendored built-in agent and workflow templates on startup.
  # Includes agent-creator-agent, workflow-generator-planner-agent, judge agents,
  # intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator,
  # and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows.
  # Required for aegis.agent.generate, aegis.workflow.generate, and aegis.execute.intent to function.
  deploy_builtins: false

  # ─── MCP Tool Servers ───────────────────────────────────────────────────────
  mcp_servers:
    - name: web-search
      enabled: true
      executable: "node"
      args: ["/opt/aegis-tools/web-search/index.js"]
      capabilities:
        - name: web.search
          skip_judge: true   # read-only lookup — skip inner-loop judge overhead
        - name: web.fetch
          skip_judge: true   # read-only fetch — skip inner-loop judge overhead
      credentials:
        SEARCH_API_KEY: "secret:aegis-system/tools/search-api-key"
      environment:
        LOG_LEVEL: "info"
      health_check:
        interval_seconds: 60
        timeout_seconds: 5
        method: "tools/list"
      resource_limits:
        cpu_millicores: 1000
        memory_mb: 512

  # ─── SEAL ───────────────────────────────────────────────────────────────────
  seal:
    private_key_path: "/etc/aegis/seal/private.pem"
    public_key_path: "/etc/aegis/seal/public.pem"
    issuer: "aegis-orchestrator"
    audiences: ["aegis-agents"]
    token_ttl_seconds: 3600

  # ─── Security Contexts ──────────────────────────────────────────────────────
  security_contexts:
    - name: coder-default
      description: "Standard coder context — filesystem + commands + safe package registries"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist: [/workspace, /agent]
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            git: [clone, add, commit, push, pull, status, diff]
            cargo: [build, test, fmt, clippy, check, run]
            npm: [install, run, test, build, ci]
            python: ["-m"]
        - tool_pattern: "web.fetch"
          domain_allowlist: [pypi.org, crates.io, npmjs.com]
          rate_limit:
            calls: 30
            per_seconds: 60
      deny_list: []

    - name: aegis-system-operator
      description: "Platform operator — all safe tools plus destructive and orchestrator commands"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist: [/workspace, /agent, /shared]
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            git: [clone, add, commit, push, pull, status, diff, stash]
            cargo: [build, test, fmt, clippy, check, run]
            npm: [install, run, test, build, ci]
            python: ["-m"]
        - tool_pattern: "web.*"
        - tool_pattern: "aegis.agent.delete"
        - tool_pattern: "aegis.workflow.delete"
        - tool_pattern: "aegis.task.remove"
        - tool_pattern: "aegis.system.info"
        - tool_pattern: "aegis.system.config"
      deny_list: []

  # ─── Builtin Dispatchers ────────────────────────────────────────────────────
  builtin_dispatchers:
    - name: "cmd"
      description: "Execute shell commands inside the agent container via Dispatch Protocol"
      enabled: true
      capabilities:
        - name: cmd.run
          skip_judge: false  # state-mutating — always validate
    - name: "fs"
      description: "Filesystem operations routed through AegisFSAL"
      enabled: true
      capabilities:
        - name: fs.read
          skip_judge: true   # read-only — skip inner-loop judge overhead
        - name: fs.write
          skip_judge: false  # state-mutating — always validate
        - name: fs.list
          skip_judge: true   # read-only — skip inner-loop judge overhead
        - name: fs.grep
          skip_judge: true   # read-only — skip inner-loop judge overhead
        - name: fs.glob
          skip_judge: true   # read-only — skip inner-loop judge overhead
        - name: fs.edit
          skip_judge: false  # state-mutating — always validate
        - name: fs.multi_edit
          skip_judge: false  # state-mutating — always validate
        - name: fs.create_dir
          skip_judge: false  # state-mutating — always validate
        - name: fs.delete
          skip_judge: false  # state-mutating — always validate

  # ─── IAM (OIDC) ─────────────────────────────────────────────────────────────
  iam:
    realms:
      - slug: aegis-system
        issuer_url: "https://auth.myzaru.com/realms/aegis-system"
        jwks_uri: "https://auth.myzaru.com/realms/aegis-system/protocol/openid-connect/certs"
        audience: "aegis-orchestrator"
        kind: system
    jwks_cache_ttl_seconds: 300
    claims:
      zaru_tier: "zaru_tier"
      aegis_role: "aegis_role"

  # ─── gRPC Auth ──────────────────────────────────────────────────────────────
  grpc_auth:
    enabled: true
    exempt_methods:
      - "/aegis.v1.InnerLoop/Generate"

  # ─── Secrets (OpenBao) ──────────────────────────────────────────────────────
  secrets:
    backend:
      address: "https://openbao.internal:8200"
      auth_method: approle
      approle:
        role_id: "env:OPENBAO_ROLE_ID"
        secret_id_env_var: "OPENBAO_SECRET_ID"
      namespace: "aegis-system"
      tls:
        ca_cert: "/etc/aegis/openbao-ca.pem"

  # ─── Database ───────────────────────────────────────────────────────────────
  database:
    url: "env:AEGIS_DATABASE_URL"
    max_connections: 10
    connect_timeout_seconds: 5

  # ─── Temporal ───────────────────────────────────────────────────────────────
  temporal:
    address: "temporal:7233"
    worker_http_endpoint: "http://temporal-worker:3000"
    worker_secret: "env:TEMPORAL_WORKER_SECRET"
    namespace: "default"
    task_queue: "aegis-agents"
    max_connection_retries: 30

  # ─── Cortex ─────────────────────────────────────────────────────────────────
  cortex:
    grpc_url: "http://cortex:50052"
    api_key: "env:CORTEX_API_KEY"  # Required for Zaru SaaS; absent = unauthenticated (local/open Cortex)

  # ─── External SEAL Tooling Gateway ──────────────────────────────────────────
  seal_gateway:
    # gRPC endpoint URL for aegis-seal-gateway
    url: "http://aegis-seal-gateway:50055"

  # ─── Execution Limits ───────────────────────────────────────────────────────
  max_execution_list_limit: 1000

  # ─── Cluster Protocol ───────────────────────────────────────────────────────
  # Configures this node's role in the multi-node cluster topology.
  cluster:
    # Enable cluster mode. Default: false.
    enabled: true
    
    # Node role in cluster. Options: controller | worker | hybrid. Default: hybrid.
    role: worker
    
    # Controller settings (required for workers)
    controller:
      # gRPC endpoint of the controller node.
      endpoint: "grpc://aegis-controller:50056"
      # Bootstrap token for initial attestation (Step 0).
      token: "env:AEGIS_CLUSTER_TOKEN"

    # Port for NodeClusterService gRPC (controller only). Default: 50056.
    cluster_grpc_port: 50056
    # Static list of peer controller addresses. Default: [].
    peers: []
    # Path to the persistent Ed25519 keypair file for node identity.
    # Generated automatically on first startup if missing.
    node_keypair_path: "/etc/aegis/node_keypair.pem"
    # Interval in seconds for worker heartbeats to the controller. Default: 30.
    heartbeat_interval_secs: 30
    # Re-attest this many seconds before the security token expires. Default: 120.
    token_refresh_margin_secs: 120
    # TLS configuration for cluster communication (mTLS).
    tls:
      enabled: true
      cert_path: "/etc/aegis/certs/node.crt"
      key_path: "/etc/aegis/certs/node.key"
      ca_cert: "/etc/aegis/certs/ca.crt"

  # ─── Observability ──────────────────────────────────────────────────────────
  observability:
    logging:
      level: info
      format: json
      # ── OTLP Log Export ─────────────────────────────────────────
      # Set otlp_endpoint to start shipping logs to any OpenTelemetry-compatible
      # backend (Grafana Cloud, Datadog, self-hosted OTEL Collector, etc.).
      # otlp_endpoint: "http://otel-collector:4317"   # grpc (default)
      # otlp_endpoint: "https://otlp-gateway.grafana.net/v1/logs"  # Grafana Cloud
      # otlp_protocol: grpc                           # grpc (default) | http
      # otlp_headers:                                 # API keys / auth headers
      #   Authorization: "env:OTLP_AUTH_TOKEN"
      # otlp_min_level: info                          # min log level exported (default: info)
      # otlp_service_name: aegis-orchestrator         # service.name resource attr
      # batch:
      #   max_queue_size: 2048
      #   scheduled_delay_ms: 5000
      #   max_export_batch_size: 512
      #   export_timeout_ms: 10000
      # tls:
      #   verify: true                                # set false to skip cert verify (dev only)
      #   ca_cert_path: null                          # custom CA cert for self-signed backends
    metrics:
      enabled: true
      port: 9091
      path: "/metrics"
    tracing:
      enabled: false

Manifest Envelope

All node configuration files use the Kubernetes-style envelope:

| Field | Type | Required | Value |
| --- | --- | --- | --- |
| apiVersion | string | Yes | 100monkeys.ai/v1 |
| kind | string | Yes | NodeConfig |
| metadata.name | string | Yes | Unique human-readable node name |
| metadata.version | string | No | Semantic version for tracking |
| metadata.labels | map | No | Key-value pairs for categorization |
| spec | object | Yes | All configuration sections documented below |

Credential Resolution

Any string value in the config supports credential prefixes:

| Prefix | Example | Resolution |
| --- | --- | --- |
| env:VAR_NAME | env:OPENAI_API_KEY | Read from the daemon process environment at startup |
| secret:path | secret:aegis-system/kv/api-key | Resolved from OpenBao at runtime (requires spec.secrets.backend) |
| literal:value | literal:test-key | Use the literal string (not recommended for production) |
| (bare string) | sk-abc123... | Plaintext. Avoid for secrets. |
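In practice the resolution modes look like this side by side (provider entries are illustrative, adapted from the annotated example above):

```yaml
llm_providers:
  - name: openai-primary
    type: openai
    endpoint: "https://api.openai.com/v1"
    api_key: "env:OPENAI_API_KEY"        # read from the daemon environment at startup
  - name: anthropic-primary
    type: anthropic
    endpoint: "https://api.anthropic.com/v1"
    api_key: "secret:aegis-system/llm/anthropic-api-key"   # fetched from OpenBao at runtime
  - name: dev-only
    type: openai-compatible
    endpoint: "http://localhost:1234/v1"
    api_key: "literal:test-key"          # literal string; acceptable for local testing only
```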

Model Alias System

Agent manifests reference model aliases, not provider-specific model names. The node configuration maps aliases to real models, enabling hot-swapping and provider independence.

Standard Aliases

| Alias | Purpose |
| --- | --- |
| default | General-purpose model (balanced cost/performance) |
| fast | Low-latency model (quick responses) |
| smart | High-capability model (complex reasoning) |
| cheap | Cost-optimized model |
| local | Local-only model (air-gapped) |

How It Works

Agent manifest references an alias:

# agent.yaml
spec:
  task:
    prompt_template: ...
  # The agent uses whatever model is mapped to "default" on the node

Node A (cloud) maps default → GPT-4o:

llm_providers:
  - name: openai
    type: openai
    models:
      - alias: default
        model: gpt-4o

Node B (air-gapped) maps default → Llama 3.2:

llm_providers:
  - name: ollama
    type: ollama
    models:
      - alias: default
        model: llama3.2:latest

Same agent manifest runs on both nodes without changes.


Section Reference

spec.node

Required. Identifies this node within the AEGIS cluster.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | | Unique stable node identifier. UUID recommended. |
| type | enum | Yes | | edge \| orchestrator \| hybrid |
| region | string | No | null | Geographic region (e.g., us-east-1) |
| tags | string[] | No | [] | Capability tags matched against execution_targets in agent manifests |
| resources.cpu_cores | u32 | No | | Available CPU cores |
| resources.memory_gb | u32 | No | | Available RAM in GB |
| resources.disk_gb | u32 | No | | Available disk in GB |
| resources.gpu | bool | No | false | GPU available |

spec.image_tag

Optional. The Docker image tag used for all AEGIS-owned service containers. Written by aegis init --tag <TAG> at initialization and updated automatically by aegis update. When absent, both commands default to the version string embedded in the aegis binary.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| image_tag | string | No | <binary version> | Tag applied to all AEGIS-owned Docker images (e.g. ghcr.io/100monkeys-ai/aegis-orchestrator:<tag>). Written by aegis init --tag and updated by aegis update. |

spec.llm_providers

Required. At least one provider entry with at least one model must be defined.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | Yes | | Unique provider name |
| type | enum | Yes | | openai \| anthropic \| ollama \| openai-compatible |
| endpoint | string | Yes | | API endpoint URL |
| api_key | string | No | null | API key. Supports env: and secret: prefixes. |
| enabled | bool | No | true | Whether this provider is active |
| models[].alias | string | Yes | | Alias referenced in agent manifests |
| models[].model | string | Yes | | Provider-side model identifier |
| models[].capabilities | string[] | No | | chat \| embedding \| reasoning \| vision \| code |
| models[].context_window | u32 | No | | Max context window in tokens |
| models[].cost_per_1k_tokens | f64 | No | 0.0 | Cost per 1K tokens (0.0 for free/local) |

Provider Types

| Type | Use Case | API Key Required |
| --- | --- | --- |
| openai | OpenAI API | Yes |
| anthropic | Anthropic API | Yes |
| ollama | Local Ollama server | No |
| openai-compatible | LM Studio, vLLM, or any OpenAI-compatible API | Depends |
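For the openai-compatible type, a minimal sketch pointing at a local LM Studio server might look like the following (the endpoint port and model identifier are assumptions; use whatever your server actually exposes):

```yaml
llm_providers:
  - name: lmstudio-local
    type: openai-compatible
    endpoint: "http://localhost:1234/v1"     # assumed LM Studio OpenAI-compatible endpoint
    enabled: true
    models:
      - alias: local
        model: "qwen2.5-coder-14b-instruct"  # hypothetical model name reported by the server
        capabilities: [chat, code]
        context_window: 32000
        cost_per_1k_tokens: 0.0
```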

spec.llm_selection

Optional. Controls runtime provider selection strategy.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| strategy | enum | prefer-local | prefer-local \| prefer-cloud \| cost-optimized \| latency-optimized |
| default_provider | string | null | Provider to use when no preference is specified |
| fallback_provider | string | null | Provider to use if the primary fails |
| max_retries | u32 | 3 | Maximum retry attempts on LLM failure |
| retry_delay_ms | u64 | 1000 | Delay between retries in milliseconds |
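As a sketch, a node that prefers the free local model and falls back to a cloud provider (provider names must match entries under spec.llm_providers):

```yaml
llm_selection:
  strategy: prefer-local
  default_provider: ollama-local      # primary choice; references llm_providers[].name
  fallback_provider: openai-primary   # used when the local provider fails
  max_retries: 3
  retry_delay_ms: 1000
```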

spec.runtime

Optional. Controls how agent containers are launched.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| bootstrap_script | string | assets/bootstrap.py | Path to bootstrap script relative to orchestrator binary |
| default_isolation | enum | inherit | docker \| firecracker \| inherit \| process. The docker value works with both Docker and Podman runtimes — the orchestrator auto-detects the engine from the configured socket. |
| container_socket_path | string | (platform default) | Container runtime socket path. The orchestrator auto-detects whether the socket belongs to Docker or Podman. Common values: /var/run/docker.sock (Docker), /run/user/<UID>/podman/podman.sock (Podman rootless), /run/podman/podman.sock (Podman rootful). If omitted, the orchestrator checks CONTAINER_HOST, then DOCKER_HOST, then falls back to the platform default (/var/run/docker.sock). |
| container_network_mode | string | null | Container network name for agent containers. Supports env:. Works with both Docker and Podman networks. |
| orchestrator_url | string | http://localhost:8088 | Callback URL reachable from inside agent containers. Supports env:. |
| nfs_server_host | string | null | NFS server host as seen by the Docker daemon host OS. Supports env:. |
| nfs_port | u16 | 2049 | NFS server port |
| nfs_mountport | u16 | 2049 | NFS mountd port |
| runtime_registry_path | string | runtime-registry.yaml | Path to the StandardRuntime registry YAML. Resolved relative to the daemon working directory. Hard-fails at startup if missing. |

nfs_server_host by environment:

| Environment | Value |
| --- | --- |
| WSL2 / Linux native | "127.0.0.1" |
| Docker Desktop (macOS) | "host.docker.internal" |
| Linux bridge network | "172.17.0.1" (Docker bridge gateway) |
| Remote / VM host | <physical host IP> |
| Via env var | "env:AEGIS_NFS_HOST" |
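For example, on Docker Desktop for macOS the table above translates into:

```yaml
runtime:
  nfs_server_host: "host.docker.internal"  # use "127.0.0.1" on WSL2 / Linux native
  nfs_port: 2049
  nfs_mountport: 2049
```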

spec.network

Optional. Configures ports and TLS.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| bind_address | string | 0.0.0.0 | Network interface to bind all listeners |
| port | u16 | 8088 | HTTP REST API port |
| grpc_port | u16 | 50051 | gRPC API port |
| orchestrator_endpoint | string | null | WebSocket URL for edge → orchestrator connection (edge nodes only) |
| heartbeat_interval_seconds | u64 | 30 | Health check ping interval |
| tls.cert_path | string | | TLS certificate path |
| tls.key_path | string | | TLS private key path |
| tls.ca_path | string | null | CA certificate path (optional) |
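An edge node additionally sets orchestrator_endpoint so it can reach its orchestrator; the hostname below is illustrative:

```yaml
network:
  bind_address: "0.0.0.0"
  port: 8088
  grpc_port: 50051
  orchestrator_endpoint: "wss://orchestrator.example.com:8088"  # illustrative WebSocket URL
  heartbeat_interval_seconds: 30
```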

spec.storage

Optional. Defaults to the local_host backend.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| backend | enum | local_host | seaweedfs \| local_host \| opendal |
| fallback_to_local | bool | true | Gracefully fall back to local storage when SeaweedFS is unreachable |
| nfs_port | u16 | 2049 | NFS Server Gateway listen port |
| seaweedfs.filer_url | string | http://localhost:8888 | SeaweedFS Filer endpoint |
| seaweedfs.mount_point | string | /var/lib/aegis/storage | Host filesystem mount point |
| seaweedfs.default_ttl_hours | u32 | 24 | Default TTL for ephemeral volumes (hours) |
| seaweedfs.default_size_limit_mb | u64 | 1000 | Default per-volume size quota (MB) |
| seaweedfs.max_size_limit_mb | u64 | 10000 | Hard ceiling on volume size (MB) |
| seaweedfs.gc_interval_minutes | u32 | 60 | Expired volume GC interval (minutes) |
| seaweedfs.s3_endpoint | string | null | Optional SeaweedFS S3 gateway endpoint |
| seaweedfs.s3_region | string | us-east-1 | S3 gateway region |
| local_host.mount_point | string | /var/lib/aegis/local-host-volumes | Host filesystem mount point for local volumes |
| opendal.provider | string | memory | OpenDAL scheme provider |
| opendal.options | map | {} | OpenDAL provider options |
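A minimal configuration using the default local_host backend can be as small as:

```yaml
storage:
  backend: local_host
  local_host:
    mount_point: "/var/lib/aegis/local-host-volumes"  # the documented default location
```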

spec.deploy_builtins

Optional. Default: false.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| deploy_builtins | bool | false | Deploy vendored built-in agent and workflow templates on startup. Includes agent-creator-agent, workflow-generator-planner-agent, judge agents, intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator, and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows. Required for aegis.agent.generate, aegis.workflow.generate, and aegis.execute.intent to function. |

spec.force_deploy_builtins

Optional. Default: disabled.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| force_deploy_builtins | string | | Force re-register all built-in agents and workflows on startup. Accepts "true" or "env:VAR_NAME". Use after upgrades to flush stale definitions. |
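Combined with deploy_builtins, a typical post-upgrade sketch (the env var name is illustrative):

```yaml
deploy_builtins: true
force_deploy_builtins: "env:AEGIS_FORCE_BUILTINS"  # or simply "true"
```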

spec.mcp_servers

Optional array. Each entry defines an external MCP Tool Server process.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Unique server name on this node |
| enabled | bool | true | Whether to start this server |
| executable | string | | Executable path |
| args | string[] | [] | Command-line arguments |
| capabilities | CapabilityConfig[] | [] | Per-tool capability objects (see below) |
| credentials | map | {} | API keys/tokens injected as env vars. Values support secret:. |
| environment | map | {} | Non-secret env vars for the server process |
| health_check.interval_seconds | u64 | 60 | Health check interval |
| health_check.timeout_seconds | u64 | 5 | Health check timeout |
| health_check.method | string | tools/list | MCP method used to health-check the server |
| resource_limits.cpu_millicores | u32 | 1000 | CPU limit (1000 = 1 core) |
| resource_limits.memory_mb | u32 | 512 | Memory limit (MB) |

Each CapabilityConfig entry:

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Tool name exposed to agents (e.g. "web.search", "gmail.read") |
| skip_judge | bool | false | When true, the orchestrator bypasses the inner-loop semantic pre-execution judge for this tool even if spec.execution.tool_validation is enabled in the agent manifest. Set true only for read-only or otherwise idempotent tools. Set false for any state-mutating tool. |

skip_judge is an operator override. It does not disable SEAL authentication, SecurityContext policy, argument validation, or routing. It only removes the extra semantic review step before dispatch.


spec.seal

Optional. Enables cryptographic agent authorization via SEAL. Required in production.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| private_key_path | string | | Path to RSA private key PEM for signing SecurityToken JWTs |
| public_key_path | string | | Path to RSA public key PEM for verifying SecurityToken JWTs |
| issuer | string | aegis-orchestrator | JWT iss claim |
| audiences | string[] | [aegis-agents] | JWT aud claims |
| token_ttl_seconds | u64 | 3600 | SecurityToken lifetime in seconds |

spec.security_contexts

Optional array. Named permission boundaries assigned to agents at execution time.

The platform ships with the following built-in contexts that are registered automatically and do not need to be defined in aegis-config.yaml:

| Name | Surface | Description |
| --- | --- | --- |
| aegis-system-agent-runtime | Execution | Agent containers — fs.* scoped to /workspace, cmd.run, web.*, aegis read/execution tools |
| aegis-system-default | Authoring | Platform authoring agents — unrestricted tool access for manifest authoring and validation |
| zaru-free | Chat/MCP | Zaru Free tier chat surface |
| zaru-pro | Chat/MCP | Zaru Pro tier chat surface |
| zaru-business | Chat/MCP | Zaru Business tier chat surface |
| zaru-enterprise | Chat/MCP | Zaru Enterprise tier chat surface |
| aegis-system-operator | Operator | Platform operators — all safe tools plus destructive and orchestrator commands |

Entries in spec.security_contexts define operator-provided contexts that supplement the built-in list. Built-in context names cannot be redefined here.

Each entry (SecurityContextDefinition):

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Unique context name, referenced in agent manifests |
| description | string | "" | Human-readable description |
| capabilities | array | [] | Tool permissions granted by this context |
| deny_list | string[] | [] | Explicit tool deny list; overrides any matching capability |

Each capabilities entry (CapabilityDefinition):

| Key | Type | Description |
| --- | --- | --- |
| tool_pattern | string | Tool name pattern (e.g., "fs.*", "cmd.run", "web.fetch") |
| path_allowlist | string[] | Allowed filesystem path prefixes (for fs.* tools) |
| subcommand_allowlist | object | Map of base command → allowed first positional arguments (for cmd.run). Example: {cargo: ["build","test"]}. |
| domain_allowlist | string[] | Allowed network domain suffixes (for web.* tools) |
| rate_limit.calls | u32 | Number of calls allowed per window |
| rate_limit.per_seconds | u32 | Window size in seconds |
| max_response_size | u64 | Max response size in bytes |

spec.builtin_dispatchers

Optional array. Configures the built-in in-process tool handlers. These are not external MCP server processes — they are implemented directly inside the orchestrator binary and dispatched via the Dispatch Protocol.

Each entry:

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Dispatcher identifier (e.g. "cmd", "fs") |
| description | string | "" | Human-readable description forwarded to the LLM tool schema |
| enabled | bool | true | Activate or deactivate this dispatcher |
| capabilities | CapabilityConfig[] | [] | Per-tool capability objects (same schema as spec.mcp_servers[].capabilities) |

Each CapabilityConfig entry follows the same schema described under spec.mcp_servers above.

skip_judge defaults by tool:

| Tool | Default skip_judge | Rationale |
| --- | --- | --- |
| cmd.run | false | State-mutating — subprocess output must always be validated |
| fs.read | true | Read-only — file contents are deterministic |
| fs.write | false | State-mutating — written content must be validated |
| fs.list | true | Read-only — directory listings are deterministic |
| fs.grep | true | Read-only — search results are deterministic |
| fs.glob | true | Read-only — glob matches are deterministic |
| fs.edit | false | State-mutating — edits must be validated |
| fs.multi_edit | false | State-mutating — edits must be validated |
| fs.create_dir | false | State-mutating |
| fs.delete | false | State-mutating — destructive operation |
| web.search | true | Read-only external lookup |
| web.fetch | true | Read-only HTTP fetch |

If spec.builtin_dispatchers is omitted, the orchestrator uses the compiled-in defaults shown above. Explicit configuration is only needed when overriding those defaults.
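An override is a small sketch like the following, which forces fs.read through the judge (whether tools left unlisted retain their compiled-in defaults should be verified for your version):

```yaml
builtin_dispatchers:
  - name: "fs"
    enabled: true
    capabilities:
      - name: fs.read
        skip_judge: false   # stricter than the compiled-in default (true)
```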

For external MCP servers, skip_judge follows the same rule: true for deterministic reads, false for anything that can mutate state or trigger side effects. The node configuration is the source of truth for the bypass decision.


spec.iam

Optional. Configures IAM/OIDC as the trusted JWT issuer. Omit to disable JWT validation (dev only).

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| realms[].slug | string | | Realm name matching the Keycloak configuration |
| realms[].issuer_url | string | | OIDC issuer URL |
| realms[].jwks_uri | string | | JWKS endpoint for JWT signature verification |
| realms[].audience | string | | Expected aud claim in tokens from this realm |
| realms[].kind | enum | | system \| consumer \| tenant |
| jwks_cache_ttl_seconds | u32 | 300 | JWKS key cache TTL |
| claims.zaru_tier | string | zaru_tier | Keycloak claim name carrying ZaruTier |
| claims.aegis_role | string | aegis_role | Keycloak claim name carrying AegisRole |

spec.grpc_auth

Optional. Controls IAM/OIDC JWT enforcement on the gRPC endpoint. Requires spec.iam.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enforce JWT validation on gRPC methods |
| exempt_methods | string[] | [/aegis.v1.InnerLoop/Generate] | gRPC method full paths exempt from auth |

spec.secrets

Optional. Configures OpenBao as the secrets backend. Follows the Keymaster Pattern — agents never access OpenBao directly.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| backend.address | string | | OpenBao server URL |
| backend.auth_method | string | approle | Only approle is currently supported |
| backend.approle.role_id | string | | AppRole Role ID (public; safe to commit) |
| backend.approle.secret_id_env_var | string | OPENBAO_SECRET_ID | Env var name containing the AppRole Secret ID |
| backend.namespace | string | | OpenBao namespace (maps 1:1 to an IAM realm) |
| backend.tls.ca_cert | string | null | CA certificate path |
| backend.tls.client_cert | string | null | mTLS client certificate path |
| backend.tls.client_key | string | null | mTLS client key path |
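Example (mirroring the annotated configuration at the top of this page):

```yaml
secrets:
  backend:
    address: "https://openbao.internal:8200"
    auth_method: approle
    approle:
      role_id: "env:OPENBAO_ROLE_ID"
      secret_id_env_var: "OPENBAO_SECRET_ID"
    namespace: "aegis-system"
    tls:
      ca_cert: "/etc/aegis/openbao-ca.pem"
```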

spec.database

Optional. PostgreSQL connection for persistent state (executions, patterns, workflows). If omitted, the daemon uses in-memory repositories (development mode only).

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | | PostgreSQL connection URL. Supports env: and secret:. |
| max_connections | u32 | 5 | Maximum connections in the pool |
| connect_timeout_seconds | u64 | 5 | Connection timeout |

Example:

database:
  url: "env:AEGIS_DATABASE_URL"
  max_connections: 10
  connect_timeout_seconds: 5

spec.temporal

Optional. Temporal workflow engine configuration for durable workflow execution. If omitted, workflow orchestration features are unavailable.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| address | string | temporal:7233 | Temporal gRPC server address |
| worker_http_endpoint | string | http://localhost:3000 | HTTP endpoint for Temporal worker callbacks. Supports env:. |
| worker_secret | string | null | Shared secret for authenticating worker callbacks. Supports env:. |
| namespace | string | default | Temporal namespace |
| task_queue | string | aegis-agents | Temporal task queue name |
| max_connection_retries | i32 | 30 | Maximum number of connection retries when establishing the Temporal client |

Example:

temporal:
  address: "temporal:7233"
  worker_http_endpoint: "http://aegis-runtime:3000"
  worker_secret: "env:TEMPORAL_WORKER_SECRET"
  namespace: "default"
  task_queue: "aegis-agents"
  max_connection_retries: 30

spec.cortex

Optional. Cortex memory and learning service configuration. If omitted or grpc_url is null, the daemon runs in memoryless mode — no error, no retry, patterns are simply not stored.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| grpc_url | string | null | Cortex gRPC service URL. Supports env:. |
| api_key | string | null | API key for 100monkeys hosted Cortex (Zaru SaaS). Supports env: and secret: prefixes. When absent, the orchestrator connects without authentication (local/open cortex). |

Example:

cortex:
  grpc_url: "env:CORTEX_GRPC_URL"
  api_key: "env:CORTEX_API_KEY"  # Required for Zaru SaaS

spec.discovery

Semantic agent and workflow search (aegis.agent.search, aegis.workflow.search) is powered by the Cortex service. When spec.cortex is configured with a valid grpc_url and api_key, discovery is available automatically. No separate spec.discovery configuration is required.


spec.seal_gateway

Optional. Configures forwarding of external tool invocations to the standalone SEAL tooling gateway.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | null | gRPC endpoint URL of aegis-seal-gateway (example: http://aegis-seal-gateway:50055) |

If omitted, the orchestrator does not forward unknown or external tools to the gateway and falls back to built-in routing only.
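Example (the endpoint value repeats the illustrative address from the table above):

```yaml
seal_gateway:
  url: "http://aegis-seal-gateway:50055"
```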


spec.max_execution_list_limit

Optional. Upper bound on executions returned by a single list_executions request.

| Key | Type | Default | Description |
|---|---|---|---|
| `max_execution_list_limit` | usize | `1000` | Maximum number of executions returned by a single `list_executions` request. Protects against excessive memory usage. |

spec.cluster

Optional. Configures this node's role in the multi-node cluster topology. If omitted, the node defaults to hybrid and operates as a standalone single-node deployment.

spec.cluster.role is orthogonal to spec.node.type. A node with type: orchestrator can have cluster.role: worker.

| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `enabled` | bool | — | `false` | Enable cluster mode. |
| `role` | enum | — | `hybrid` | Node role: `controller` \| `worker` \| `hybrid` |
| `controller_endpoint` | string | for `role: worker` | — | gRPC endpoint of the controller node. |
| `cluster_grpc_port` | u16 | — | `50056` | Port for `NodeClusterService` gRPC (controller only). |
| `peers` | string[] | — | `[]` | Static list of peer controller addresses. |
| `node_keypair_path` | string | — | — | Path to the persistent Ed25519 keypair file for node identity. |
| `heartbeat_interval_secs` | u64 | — | `30` | Interval in seconds for worker heartbeats. |
| `token_refresh_margin_secs` | u64 | — | `120` | Token re-attestation margin in seconds. |
| `tls.enabled` | bool | — | `true` | Enable TLS for cluster communication. |
| `tls.cert_path` | string | — | — | Path to node TLS certificate. |
| `tls.key_path` | string | — | — | Path to node TLS private key. |
| `tls.ca_cert` | string | — | — | Path to CA certificate for peer verification. |
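Example worker-node cluster configuration (hostnames and file paths are illustrative):

```yaml
cluster:
  enabled: true
  role: worker
  controller_endpoint: "http://controller-01:50056"  # illustrative hostname
  node_keypair_path: "/var/lib/aegis/node_key"       # illustrative path
  heartbeat_interval_secs: 30
  token_refresh_margin_secs: 120
  tls:
    enabled: true
    cert_path: "/etc/aegis/tls/node.crt"
    key_path: "/etc/aegis/tls/node.key"
    ca_cert: "/etc/aegis/tls/ca.crt"
```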

Cluster Roles

| Role | Exposes | Connects To |
|---|---|---|
| `controller` | `NodeClusterService` on `cluster_grpc_port` (default 50056) | Nothing (it is the authority) |
| `worker` | `ForwardExecution` stream on `spec.network.grpc_port` (50051) | Controller at `controller_endpoint` |
| `hybrid` | Both 50051 and 50056 | Itself (no external cluster required) |

Node Attestation

On startup, a worker node:

  1. Loads (or generates) its Ed25519 keypair from node_keypair_path
  2. Calls AttestNode on the controller
  3. Receives a ChallengeNode call — signs the challenge with its private key
  4. Receives a NodeSecurityToken (signed JWT)
  5. Wraps all subsequent cluster RPCs in a secure envelope
  6. Sends periodic heartbeats to the controller

The NodeSecurityToken is refreshed automatically before expiry. The keypair persists across restarts, enabling the controller to recognize reconnecting workers.


spec.observability

Optional.

logging

| Key | Type | Default | Env Override | Description |
|---|---|---|---|---|
| `logging.level` | enum | `info` | `RUST_LOG` | `error` \| `warn` \| `info` \| `debug` \| `trace` |
| `logging.format` | enum | `json` | `AEGIS_LOG_FORMAT` | `json` \| `text` |
| `logging.file` | string | null | — | Log file path. Omit to write to stdout. |
| `logging.otlp_endpoint` | string | null | `AEGIS_OTLP_ENDPOINT` | OTLP collector endpoint. Setting this value enables OTLP log export. Omit or null to disable. |
| `logging.otlp_protocol` | enum | `grpc` | `AEGIS_OTLP_PROTOCOL` | `grpc` \| `http`. `grpc` uses port 4317 (default); `http` uses port 4318 with Protobuf over HTTP/1.1. |
| `logging.otlp_headers` | map | `{}` | `AEGIS_OTLP_HEADERS` | Key-value HTTP/gRPC metadata headers sent on every export RPC (e.g. `Authorization`, `api-key`). Values support `env:` prefixes. When set via env var, use comma-separated key=value pairs: `Authorization=Bearer token,x-scope-orgid=12345`. |
| `logging.otlp_min_level` | string | `info` | `AEGIS_OTLP_LOG_LEVEL` | Minimum log level forwarded to OTLP. Does not affect stdout output. |
| `logging.otlp_service_name` | string | `aegis-orchestrator` | `AEGIS_OTLP_SERVICE_NAME` | Value of the `service.name` OpenTelemetry resource attribute. |
| `logging.batch.max_queue_size` | u32 | `2048` | — | Maximum number of log records buffered before export. |
| `logging.batch.scheduled_delay_ms` | u64 | `5000` | — | Interval (ms) between batch export flushes. |
| `logging.batch.max_export_batch_size` | u32 | `512` | — | Maximum records per export RPC. |
| `logging.batch.export_timeout_ms` | u64 | `10000` | — | Timeout (ms) for a single export RPC. |
| `logging.tls.verify` | bool | `true` | — | Whether to verify the OTLP endpoint's TLS certificate. Set `false` only for development. |
| `logging.tls.ca_cert_path` | string | null | — | Path to a custom CA certificate PEM for self-signed OTLP backends. |
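The AEGIS_OTLP_HEADERS env-var format described above (comma-separated key=value pairs) can be parsed as in the following sketch; this is illustrative, not the daemon's actual parser:

```python
def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a header map.

    Splits each pair on the first '=' only, so values may themselves
    contain '=' (e.g. base64 padding in an Authorization header).
    """
    headers: dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed header pair (no '='): {pair!r}")
        headers[key.strip()] = value.strip()
    return headers

print(parse_otlp_headers("Authorization=Bearer token,x-scope-orgid=12345"))
# → {'Authorization': 'Bearer token', 'x-scope-orgid': '12345'}
```

Note the caveat baked into this format: header *values* containing commas cannot be expressed this way.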

metrics

| Key | Type | Default | Description |
|---|---|---|---|
| `metrics.enabled` | bool | `true` | Enable Prometheus metrics |
| `metrics.port` | u16 | `9091` | Prometheus metrics exposition port |
| `metrics.path` | string | `/metrics` | HTTP path for scraping |

tracing

| Key | Type | Default | Description |
|---|---|---|---|
| `tracing.enabled` | bool | `false` | Enable distributed tracing via OpenTelemetry |

Config Discovery Order

The daemon searches for a configuration file in this order (first match wins):

  1. --config <path> CLI flag
  2. AEGIS_CONFIG_PATH environment variable
  3. ./aegis-config.yaml (working directory)
  4. ~/.aegis/config.yaml
  5. /etc/aegis/config.yaml (Linux/macOS)

Example Configurations

Minimal (Local Development)

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: dev-laptop
spec:
  node:
    id: "dev-local"
    type: edge
  llm_providers:
    - name: ollama
      type: ollama
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: default
          model: llama3.2:latest
          capabilities: [chat, code]
          context_window: 8192
          cost_per_1k_tokens: 0.0
  llm_selection:
    strategy: prefer-local
    default_provider: ollama

Air-Gapped Production

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: prod-airgap-001
  version: "1.0.0"
  labels:
    environment: production
    deployment: air-gapped
spec:
  node:
    id: "550e8400-e29b-41d4-a716-446655440001"
    type: edge
    tags: [production, air-gapped, local-llm]
  llm_providers:
    - name: ollama
      type: ollama
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: default
          model: llama3.2:latest
          capabilities: [chat, code, reasoning]
          context_window: 8192
          cost_per_1k_tokens: 0.0
        - alias: fast
          model: phi3:mini
          capabilities: [chat, code]
          context_window: 4096
          cost_per_1k_tokens: 0.0
  llm_selection:
    strategy: prefer-local
    default_provider: ollama

Cloud Multi-Provider

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: cloud-multi-001
  version: "1.0.0"
  labels:
    environment: production
    deployment: cloud
spec:
  node:
    id: "550e8400-e29b-41d4-a716-446655440002"
    type: orchestrator
    region: us-west-2
    tags: [production, cloud, multi-provider]
  llm_providers:
    - name: openai
      type: openai
      endpoint: "https://api.openai.com/v1"
      api_key: "env:OPENAI_API_KEY"
      enabled: true
      models:
        - alias: default
          model: gpt-4o
          capabilities: [chat, code, reasoning]
          context_window: 128000
          cost_per_1k_tokens: 0.005
        - alias: fast
          model: gpt-4o-mini
          capabilities: [chat, code]
          context_window: 128000
          cost_per_1k_tokens: 0.00015
    - name: anthropic
      type: anthropic
      endpoint: "https://api.anthropic.com/v1"
      api_key: "env:ANTHROPIC_API_KEY"
      enabled: true
      models:
        - alias: smart
          model: claude-sonnet-4-5
          capabilities: [chat, code, reasoning]
          context_window: 200000
          cost_per_1k_tokens: 0.003
  llm_selection:
    strategy: cost-optimized
    default_provider: openai
    fallback_provider: anthropic

Docker Compose Deployment

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: docker-compose-node
  version: "1.0.0"
spec:
  node:
    id: "env:AEGIS_NODE_ID"
    type: orchestrator
  llm_providers:
    - name: ollama
      type: ollama
      endpoint: "http://ollama:11434"
      enabled: true
      models:
        - alias: default
          model: phi3:mini
          capabilities: [chat, code, reasoning]
          context_window: 4096
          cost_per_1k_tokens: 0.0
  llm_selection:
    strategy: prefer-local
    default_provider: ollama
  runtime:
    default_isolation: docker
    container_network_mode: "env:AEGIS_CONTAINER_NETWORK"
    orchestrator_url: "env:AEGIS_ORCHESTRATOR_URL"
    nfs_server_host: "env:AEGIS_NFS_HOST"
    runtime_registry_path: "runtime-registry.yaml"
  storage:
    backend: seaweedfs
    fallback_to_local: true
    seaweedfs:
      filer_url: "http://seaweedfs-filer:8888"
      mount_point: "/var/lib/aegis/storage"
      default_ttl_hours: 24
      default_size_limit_mb: 1000
      max_size_limit_mb: 10000
      gc_interval_minutes: 60
  database:
    url: "env:AEGIS_DATABASE_URL"
    max_connections: 5
    connect_timeout_seconds: 5
  temporal:
    address: "temporal:7233"
    worker_http_endpoint: "http://aegis-runtime:3000"
    worker_secret: "env:TEMPORAL_WORKER_SECRET"
    namespace: "default"
    task_queue: "aegis-agents"
  cortex:
    grpc_url: "env:CORTEX_GRPC_URL"
    api_key: "env:CORTEX_API_KEY"  # Required for Zaru SaaS
  observability:
    logging:
      level: info
