
Node Configuration Reference

Complete specification for the NodeConfig YAML format (v1.0) — schema, field definitions, credential resolution, model alias system, and example configurations.

API Version: 100monkeys.ai/v1 | Kind: NodeConfig | Status: Canonical

The Node Configuration defines the capabilities, resources, and LLM providers available on an AEGIS Agent Host (Orchestrator Node or Edge Node). It uses the same Kubernetes-style declarative format (apiVersion/kind/metadata/spec) as the Agent Manifest and Workflow Manifest.

Key capabilities:

  • BYOLLM (Bring Your Own LLM) — use any provider (OpenAI, Anthropic, Ollama, LM Studio)
  • Air-gapped operation — local LLMs (Ollama) for fully offline deployments
  • Provider abstraction — agent manifests use model aliases, not hardcoded provider names
  • Hot-swappable models — change underlying LLM without updating agent manifests

For an annotated walkthrough of every field, see Daemon Configuration.


Annotated Full Example

apiVersion: 100monkeys.ai/v1        # required; must be exactly this value
kind: NodeConfig                     # required; must be exactly "NodeConfig"

metadata:
  name: production-node-01           # required; unique human-readable node name
  version: "1.0.0"                   # optional; configuration version for tracking
  labels:                            # optional; key-value pairs for categorization
    environment: production
    region: us-west-2

spec:
  # ─── Node Identity ──────────────────────────────────────────────────────────
  node:
    id: "550e8400-e29b-41d4-a716-446655440000"  # required; stable UUID
    type: orchestrator               # required; edge | orchestrator | hybrid
    region: us-west-2                # optional; geographic region
    tags:                            # optional; for execution_targets matching
      - production
      - gpu
    resources:                       # optional; available compute resources
      cpu_cores: 8
      memory_gb: 32
      disk_gb: 500
      gpu: true

  # ─── Image Tag ──────────────────────────────────────────────────────────────
  image_tag: "0.1.0-pre-alpha"      # optional; written by aegis init --tag / aegis update

  # ─── LLM Providers ──────────────────────────────────────────────────────────
  llm_providers:
    - name: openai-primary
      type: openai
      endpoint: "https://api.openai.com/v1"
      api_key: "env:OPENAI_API_KEY"
      enabled: true
      models:
        - alias: default
          model: gpt-4o
          capabilities: [chat, code, reasoning]
          context_window: 128000
          cost_per_1k_tokens: 0.005
        - alias: fast
          model: gpt-4o-mini
          capabilities: [chat, code]
          context_window: 128000
          cost_per_1k_tokens: 0.00015

    - name: anthropic-primary
      type: anthropic
      endpoint: "https://api.anthropic.com/v1"
      api_key: "secret:aegis-system/llm/anthropic-api-key"
      enabled: true
      models:
        - alias: smart
          model: claude-sonnet-4-5
          capabilities: [chat, code, reasoning]
          context_window: 200000
          cost_per_1k_tokens: 0.003

    - name: ollama-local
      type: ollama
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: local
          model: qwen2.5-coder:32b
          capabilities: [chat, code]
          context_window: 32000
          cost_per_1k_tokens: 0.0

  # ─── LLM Selection Strategy ─────────────────────────────────────────────────
  llm_selection:
    strategy: prefer-local           # prefer-local | prefer-cloud | cost-optimized | latency-optimized
    default_provider: openai-primary
    fallback_provider: ollama-local
    max_retries: 3
    retry_delay_ms: 1000

  # ─── Runtime ────────────────────────────────────────────────────────────────
  runtime:
    bootstrap_script: "assets/bootstrap.py"
    default_isolation: docker        # docker | firecracker | inherit | process
    container_socket_path: "/var/run/docker.sock"  # or Podman: /run/user/1000/podman/podman.sock
    container_network_mode: "aegis-network"
    orchestrator_url: "env:AEGIS_ORCHESTRATOR_URL"
    nfs_server_host: "env:AEGIS_NFS_HOST"
    nfs_port: 2049
    nfs_mountport: 2049
    runtime_registry_path: "runtime-registry.yaml"  # default value

  # ─── Network ────────────────────────────────────────────────────────────────
  network:
    bind_address: "0.0.0.0"
    port: 8088
    grpc_port: 50051
    orchestrator_endpoint: null      # WebSocket URL for edge → orchestrator (edge nodes only)
    heartbeat_interval_seconds: 30
    tls:
      cert_path: "/etc/aegis/tls/server.crt"
      key_path: "/etc/aegis/tls/server.key"

  # ─── Storage ────────────────────────────────────────────────────────────────
  storage:
    backend: seaweedfs               # seaweedfs | local_host | opendal
    fallback_to_local: true
    nfs_port: 2049
    seaweedfs:
      filer_url: "http://localhost:8888"
      mount_point: "/var/lib/aegis/storage"
      default_ttl_hours: 24
      default_size_limit_mb: 1000
      max_size_limit_mb: 10000
      gc_interval_minutes: 60
    local_host:
      mount_point: "/data/shared_llm_weights"
    opendal:
      provider: "memory"

  # ─── Deploy Built-In Templates ──────────────────────────────────────────────
  # Deploy vendored built-in agent and workflow templates on startup.
  # Includes agent-creator-agent, workflow-generator-planner-agent, judge agents,
  # intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator,
  # and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows.
  # Required for aegis.agent.generate, aegis.workflow.generate, and aegis.execute.intent to function.
  deploy_builtins: false

  # ─── MCP Tool Servers ───────────────────────────────────────────────────────
  mcp_servers:
    - name: web-search
      enabled: true
      executable: "node"
      args: ["/opt/aegis-tools/web-search/index.js"]
      capabilities:
        - name: web.search
          skip_judge: true   # read-only lookup — skip inner-loop judge overhead
        - name: web.fetch
          skip_judge: true   # read-only fetch — skip inner-loop judge overhead
      credentials:
        SEARCH_API_KEY: "secret:aegis-system/tools/search-api-key"
      environment:
        LOG_LEVEL: "info"
      health_check:
        interval_seconds: 60
        timeout_seconds: 5
        method: "tools/list"
      resource_limits:
        cpu_millicores: 1000
        memory_mb: 512

  # ─── SEAL ───────────────────────────────────────────────────────────────────
  seal:
    private_key_path: "/etc/aegis/seal/private.pem"
    public_key_path: "/etc/aegis/seal/public.pem"
    issuer: "aegis-orchestrator"
    audiences: ["aegis-agents"]
    token_ttl_seconds: 3600

  # ─── Security Contexts ──────────────────────────────────────────────────────
  security_contexts:
    - name: coder-default
      description: "Standard coder context — filesystem + commands + safe package registries"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist: [/workspace, /agent]
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            git: [clone, add, commit, push, pull, status, diff]
            cargo: [build, test, fmt, clippy, check, run]
            npm: [install, run, test, build, ci]
            python: ["-m"]
        - tool_pattern: "web.fetch"
          domain_allowlist: [pypi.org, crates.io, npmjs.com]
          rate_limit:
            calls: 30
            per_seconds: 60
      deny_list: []

    - name: aegis-system-operator
      description: "Platform operator — all safe tools plus destructive and orchestrator commands"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist: [/workspace, /agent, /shared]
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            git: [clone, add, commit, push, pull, status, diff, stash]
            cargo: [build, test, fmt, clippy, check, run]
            npm: [install, run, test, build, ci]
            python: ["-m"]
        - tool_pattern: "web.*"
        - tool_pattern: "aegis.agent.delete"
        - tool_pattern: "aegis.workflow.delete"
        - tool_pattern: "aegis.task.remove"
        - tool_pattern: "aegis.system.info"
        - tool_pattern: "aegis.system.config"
      deny_list: []

  # ─── Builtin Dispatchers ────────────────────────────────────────────────────
  builtin_dispatchers:
    - name: "cmd"
      description: "Execute shell commands inside the agent container via Dispatch Protocol"
      enabled: true
      capabilities:
        - name: cmd.run
          skip_judge: false  # state-mutating — always validate
    - name: "fs"
      description: "Filesystem operations routed through AegisFSAL"
      enabled: true
      capabilities:
        - name: fs.read
          skip_judge: true   # read-only — skip inner-loop judge overhead
        - name: fs.write
          skip_judge: false  # state-mutating — always validate
        - name: fs.list
          skip_judge: true   # read-only — skip inner-loop judge overhead
        - name: fs.grep
          skip_judge: true   # read-only — skip inner-loop judge overhead
        - name: fs.glob
          skip_judge: true   # read-only — skip inner-loop judge overhead
        - name: fs.edit
          skip_judge: false  # state-mutating — always validate
        - name: fs.multi_edit
          skip_judge: false  # state-mutating — always validate
        - name: fs.create_dir
          skip_judge: false  # state-mutating — always validate
        - name: fs.delete
          skip_judge: false  # state-mutating — always validate

  # ─── IAM (OIDC) ─────────────────────────────────────────────────────────────
  iam:
    realms:
      - slug: aegis-system
        issuer_url: "https://auth.myzaru.com/realms/aegis-system"
        jwks_uri: "https://auth.myzaru.com/realms/aegis-system/protocol/openid-connect/certs"
        audience: "aegis-orchestrator"
        kind: system
    jwks_cache_ttl_seconds: 300
    claims:
      zaru_tier: "zaru_tier"
      aegis_role: "aegis_role"

  # ─── gRPC Auth ──────────────────────────────────────────────────────────────
  grpc_auth:
    enabled: true
    exempt_methods:
      - "/aegis.v1.InnerLoop/Generate"

  # ─── Secrets (OpenBao) ──────────────────────────────────────────────────────
  secrets:
    backend:
      address: "https://openbao.internal:8200"
      auth_method: approle
      approle:
        role_id: "env:OPENBAO_ROLE_ID"
        secret_id_env_var: "OPENBAO_SECRET_ID"
      namespace: "aegis-system"
      tls:
        ca_cert: "/etc/aegis/openbao-ca.pem"

  # ─── Database ───────────────────────────────────────────────────────────────
  database:
    url: "env:AEGIS_DATABASE_URL"
    max_connections: 10
    connect_timeout_seconds: 5

  # ─── Temporal ───────────────────────────────────────────────────────────────
  temporal:
    address: "temporal:7233"
    worker_http_endpoint: "http://temporal-worker:3000"
    worker_secret: "env:TEMPORAL_WORKER_SECRET"
    namespace: "default"
    task_queue: "aegis-agents"
    max_connection_retries: 30

  # ─── Cortex ─────────────────────────────────────────────────────────────────
  cortex:
    grpc_url: "http://cortex:50052"
    api_key: "env:CORTEX_API_KEY"  # Required for Zaru SaaS; absent = unauthenticated (local/open Cortex)

  # ─── External SEAL Tooling Gateway ──────────────────────────────────────────
  seal_gateway:
    # gRPC endpoint URL for aegis-seal-gateway
    url: "http://aegis-seal-gateway:50055"

  # ─── Execution Limits ───────────────────────────────────────────────────────
  max_execution_list_limit: 1000

  # ─── Cluster Protocol ───────────────────────────────────────────────────────
  # Configures this node's role in the multi-node cluster topology.
  cluster:
    # Enable cluster mode. Default: false.
    enabled: true
    
    # Node role in cluster. Options: controller | worker | hybrid. Default: hybrid.
    role: worker
    
    # Controller settings (required for workers)
    controller:
      # gRPC endpoint of the controller node.
      endpoint: "grpc://aegis-controller:50056"
      # Bootstrap token for initial attestation (Step 0).
      token: "env:AEGIS_CLUSTER_TOKEN"

    # Port for NodeClusterService gRPC (controller only). Default: 50056.
    cluster_grpc_port: 50056
    # Static list of peer controller addresses. Default: [].
    peers: []
    # Path to the persistent Ed25519 keypair file for node identity.
    # Generated automatically on first startup if missing.
    node_keypair_path: "/etc/aegis/node_keypair.pem"
    # Interval in seconds for worker heartbeats to the controller. Default: 30.
    heartbeat_interval_secs: 30
    # Re-attest this many seconds before the security token expires. Default: 120.
    token_refresh_margin_secs: 120
    # TLS configuration for cluster communication (mTLS).
    tls:
      enabled: true
      cert_path: "/etc/aegis/certs/node.crt"
      key_path: "/etc/aegis/certs/node.key"
      ca_cert: "/etc/aegis/certs/ca.crt"

  # ─── Observability ──────────────────────────────────────────────────────────
  observability:
    logging:
      level: info
      format: json
      # ── OTLP Log Export ─────────────────────────────────────────
      # Set otlp_endpoint to start shipping logs to any OpenTelemetry-compatible
      # backend (Grafana Cloud, Datadog, self-hosted OTEL Collector, etc.).
      # otlp_endpoint: "http://otel-collector:4317"   # grpc (default)
      # otlp_endpoint: "https://otlp-gateway.grafana.net/v1/logs"  # Grafana Cloud
      # otlp_protocol: grpc                           # grpc (default) | http
      # otlp_headers:                                 # API keys / auth headers
      #   Authorization: "env:OTLP_AUTH_TOKEN"
      # otlp_min_level: info                          # min log level exported (default: info)
      # otlp_service_name: aegis-orchestrator         # service.name resource attr
      # batch:
      #   max_queue_size: 2048
      #   scheduled_delay_ms: 5000
      #   max_export_batch_size: 512
      #   export_timeout_ms: 10000
      # tls:
      #   verify: true                                # set false to skip cert verify (dev only)
      #   ca_cert_path: null                          # custom CA cert for self-signed backends
    metrics:
      enabled: true
      port: 9091
      path: "/metrics"
    tracing:
      enabled: false

Manifest Envelope

All node configuration files use the Kubernetes-style envelope:

| Field | Type | Required | Value |
| --- | --- | --- | --- |
| apiVersion | string | Yes | 100monkeys.ai/v1 |
| kind | string | Yes | NodeConfig |
| metadata.name | string | Yes | Unique human-readable node name |
| metadata.version | string | No | Semantic version for tracking |
| metadata.labels | map | No | Key-value pairs for categorization |
| spec | object | Yes | All configuration sections documented below |

Credential Resolution

Any string value in the config supports credential prefixes:

| Prefix | Example | Resolution |
| --- | --- | --- |
| env:VAR_NAME | env:OPENAI_API_KEY | Read from the daemon process environment at startup |
| secret:path | secret:aegis-system/kv/api-key | Resolved from OpenBao at runtime (requires spec.secrets.backend) |
| literal:value | literal:test-key | Use the literal string (not recommended for production) |
| (bare string) | sk-abc123... | Plaintext. Avoid for secrets. |
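In practice the resolution modes look like this side by side (provider entries are illustrative, adapted from the annotated example above):

```yaml
llm_providers:
  - name: openai-primary
    type: openai
    endpoint: "https://api.openai.com/v1"
    api_key: "env:OPENAI_API_KEY"        # read from the daemon environment at startup
  - name: anthropic-primary
    type: anthropic
    endpoint: "https://api.anthropic.com/v1"
    api_key: "secret:aegis-system/llm/anthropic-api-key"   # fetched from OpenBao at runtime
  - name: dev-only
    type: openai-compatible
    endpoint: "http://localhost:1234/v1"
    api_key: "literal:test-key"          # literal string; acceptable for local testing only
```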

Model Alias System

Agent manifests reference model aliases, not provider-specific model names. The node configuration maps aliases to real models, enabling hot-swapping and provider independence.

Standard Aliases

| Alias | Purpose |
| --- | --- |
| default | General-purpose model (balanced cost/performance) |
| fast | Low-latency model (quick responses) |
| smart | High-capability model (complex reasoning) |
| cheap | Cost-optimized model |
| local | Local-only model (air-gapped) |

How It Works

Agent manifest references an alias:

# agent.yaml
spec:
  task:
    prompt_template: ...
  # The agent uses whatever model is mapped to "default" on the node

Node A (cloud) maps default → GPT-4o:

llm_providers:
  - name: openai
    type: openai
    models:
      - alias: default
        model: gpt-4o

Node B (air-gapped) maps default → Llama 3.2:

llm_providers:
  - name: ollama
    type: ollama
    models:
      - alias: default
        model: llama3.2:latest

Same agent manifest runs on both nodes without changes.


Section Reference

spec.node

Required. Identifies this node within the AEGIS cluster.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | | Unique stable node identifier. UUID recommended. |
| type | enum | Yes | | edge \| orchestrator \| hybrid |
| region | string | No | null | Geographic region (e.g., us-east-1) |
| tags | string[] | No | [] | Capability tags matched against execution_targets in agent manifests |
| resources.cpu_cores | u32 | No | | Available CPU cores |
| resources.memory_gb | u32 | No | | Available RAM in GB |
| resources.disk_gb | u32 | No | | Available disk in GB |
| resources.gpu | bool | No | false | GPU available |

spec.image_tag

Optional. The Docker image tag used for all AEGIS-owned service containers. Written by aegis init --tag <TAG> at initialization and updated automatically by aegis update. When absent, both commands default to the version string embedded in the aegis binary.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| image_tag | string | No | <binary version> | Tag applied to all AEGIS-owned Docker images (e.g. ghcr.io/100monkeys-ai/aegis-orchestrator:<tag>). Written by aegis init --tag and updated by aegis update. |

spec.llm_providers

Required. At least one provider entry with at least one model must be defined.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | Yes | | Unique provider name |
| type | enum | Yes | | openai \| anthropic \| ollama \| openai-compatible |
| endpoint | string | Yes | | API endpoint URL |
| api_key | string | No | null | API key. Supports env: and secret: prefixes. |
| enabled | bool | No | true | Whether this provider is active |
| models[].alias | string | Yes | | Alias referenced in agent manifests |
| models[].model | string | Yes | | Provider-side model identifier |
| models[].capabilities | string[] | No | | chat \| embedding \| reasoning \| vision \| code |
| models[].context_window | u32 | No | | Max context window in tokens |
| models[].cost_per_1k_tokens | f64 | No | 0.0 | Cost per 1K tokens (0.0 for free/local) |

Provider Types

| Type | Use Case | API Key Required |
| --- | --- | --- |
| openai | OpenAI API | Yes |
| anthropic | Anthropic API | Yes |
| ollama | Local Ollama server | No |
| openai-compatible | LM Studio, vLLM, or any OpenAI-compatible API | Depends |
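For the openai-compatible type, a minimal sketch pointing at a local LM Studio server might look like the following (the endpoint port and model identifier are assumptions; use whatever your server actually exposes):

```yaml
llm_providers:
  - name: lmstudio-local
    type: openai-compatible
    endpoint: "http://localhost:1234/v1"     # assumed LM Studio OpenAI-compatible endpoint
    enabled: true
    models:
      - alias: local
        model: "qwen2.5-coder-14b-instruct"  # hypothetical model name reported by the server
        capabilities: [chat, code]
        context_window: 32000
        cost_per_1k_tokens: 0.0
```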

spec.llm_selection

Optional. Controls runtime provider selection strategy.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| strategy | enum | prefer-local | prefer-local \| prefer-cloud \| cost-optimized \| latency-optimized |
| default_provider | string | null | Provider to use when no preference is specified |
| fallback_provider | string | null | Provider to use if the primary fails |
| max_retries | u32 | 3 | Maximum retry attempts on LLM failure |
| retry_delay_ms | u64 | 1000 | Delay between retries in milliseconds |
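As a sketch, a node that prefers the free local model and falls back to a cloud provider (provider names must match entries under spec.llm_providers):

```yaml
llm_selection:
  strategy: prefer-local
  default_provider: ollama-local      # primary choice; references llm_providers[].name
  fallback_provider: openai-primary   # used when the local provider fails
  max_retries: 3
  retry_delay_ms: 1000
```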

spec.runtime

Optional. Controls how agent containers are launched.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| bootstrap_script | string | assets/bootstrap.py | Path to bootstrap script relative to orchestrator binary |
| default_isolation | enum | inherit | docker \| firecracker \| inherit \| process. The docker value works with both Docker and Podman runtimes — the orchestrator auto-detects the engine from the configured socket. |
| container_socket_path | string | (platform default) | Container runtime socket path. The orchestrator auto-detects whether the socket belongs to Docker or Podman. Common values: /var/run/docker.sock (Docker), /run/user/<UID>/podman/podman.sock (Podman rootless), /run/podman/podman.sock (Podman rootful). If omitted, the orchestrator checks CONTAINER_HOST, then DOCKER_HOST, then falls back to the platform default (/var/run/docker.sock). |
| container_network_mode | string | null | Container network name for agent containers. Supports env:. Works with both Docker and Podman networks. |
| orchestrator_url | string | http://localhost:8088 | Callback URL reachable from inside agent containers. Supports env:. |
| nfs_server_host | string | null | NFS server host as seen by the Docker daemon host OS. Supports env:. |
| nfs_port | u16 | 2049 | NFS server port |
| nfs_mountport | u16 | 2049 | NFS mountd port |
| runtime_registry_path | string | runtime-registry.yaml | Path to the StandardRuntime registry YAML. Resolved relative to the daemon working directory. Hard-fails at startup if missing. |

nfs_server_host by environment:

| Environment | Value |
| --- | --- |
| WSL2 / Linux native | "127.0.0.1" |
| Docker Desktop (macOS) | "host.docker.internal" |
| Linux bridge network | "172.17.0.1" (Docker bridge gateway) |
| Remote / VM host | <physical host IP> |
| Via env var | "env:AEGIS_NFS_HOST" |
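For example, on Docker Desktop for macOS the table above translates into:

```yaml
runtime:
  nfs_server_host: "host.docker.internal"  # use "127.0.0.1" on WSL2 / Linux native
  nfs_port: 2049
  nfs_mountport: 2049
```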

spec.network

Optional. Configures ports and TLS.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| bind_address | string | 0.0.0.0 | Network interface to bind all listeners |
| port | u16 | 8088 | HTTP REST API port |
| grpc_port | u16 | 50051 | gRPC API port |
| orchestrator_endpoint | string | null | WebSocket URL for edge → orchestrator connection (edge nodes only) |
| heartbeat_interval_seconds | u64 | 30 | Health check ping interval |
| tls.cert_path | string | | TLS certificate path |
| tls.key_path | string | | TLS private key path |
| tls.ca_path | string | null | CA certificate path (optional) |
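An edge node additionally sets orchestrator_endpoint so it can reach its orchestrator; the hostname below is illustrative:

```yaml
network:
  bind_address: "0.0.0.0"
  port: 8088
  grpc_port: 50051
  orchestrator_endpoint: "wss://orchestrator.example.com:8088"  # illustrative WebSocket URL
  heartbeat_interval_seconds: 30
```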

spec.storage

Optional. Defaults to the local_host backend.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| backend | enum | local_host | seaweedfs \| local_host \| opendal |
| fallback_to_local | bool | true | Gracefully fall back to local storage when SeaweedFS is unreachable |
| nfs_port | u16 | 2049 | NFS Server Gateway listen port |
| seaweedfs.filer_url | string | http://localhost:8888 | SeaweedFS Filer endpoint |
| seaweedfs.mount_point | string | /var/lib/aegis/storage | Host filesystem mount point |
| seaweedfs.default_ttl_hours | u32 | 24 | Default TTL for ephemeral volumes (hours) |
| seaweedfs.default_size_limit_mb | u64 | 1000 | Default per-volume size quota (MB) |
| seaweedfs.max_size_limit_mb | u64 | 10000 | Hard ceiling on volume size (MB) |
| seaweedfs.gc_interval_minutes | u32 | 60 | Expired volume GC interval (minutes) |
| seaweedfs.s3_endpoint | string | null | Optional SeaweedFS S3 gateway endpoint |
| seaweedfs.s3_region | string | us-east-1 | S3 gateway region |
| local_host.mount_point | string | /var/lib/aegis/local-host-volumes | Host filesystem mount point for local volumes |
| opendal.provider | string | memory | OpenDAL scheme provider |
| opendal.options | map | {} | OpenDAL provider options |
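A minimal configuration using the default local_host backend can be as small as:

```yaml
storage:
  backend: local_host
  local_host:
    mount_point: "/var/lib/aegis/local-host-volumes"  # the documented default location
```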

spec.deploy_builtins

Optional. Default: false.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| deploy_builtins | bool | false | Deploy vendored built-in agent and workflow templates on startup. Includes agent-creator-agent, workflow-generator-planner-agent, judge agents, intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator, and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows. Required for aegis.agent.generate, aegis.workflow.generate, and aegis.execute.intent to function. |

spec.force_deploy_builtins

Optional. Default: disabled.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| force_deploy_builtins | string | | Force re-register all built-in agents and workflows on startup. Accepts "true" or "env:VAR_NAME". Use after upgrades to flush stale definitions. |
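Combined with deploy_builtins, a typical post-upgrade sketch (the env var name is illustrative):

```yaml
deploy_builtins: true
force_deploy_builtins: "env:AEGIS_FORCE_BUILTINS"  # or simply "true"
```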

spec.mcp_servers

Optional array. Each entry defines an external MCP Tool Server process.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Unique server name on this node |
| enabled | bool | true | Whether to start this server |
| executable | string | | Executable path |
| args | string[] | [] | Command-line arguments |
| capabilities | CapabilityConfig[] | [] | Per-tool capability objects (see below) |
| credentials | map | {} | API keys/tokens injected as env vars. Values support secret:. |
| environment | map | {} | Non-secret env vars for the server process |
| health_check.interval_seconds | u64 | 60 | Health check interval |
| health_check.timeout_seconds | u64 | 5 | Health check timeout |
| health_check.method | string | tools/list | MCP method used to health-check the server |
| resource_limits.cpu_millicores | u32 | 1000 | CPU limit (1000 = 1 core) |
| resource_limits.memory_mb | u32 | 512 | Memory limit (MB) |

Each CapabilityConfig entry:

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Tool name exposed to agents (e.g. "web.search", "gmail.read") |
| skip_judge | bool | false | When true, the orchestrator bypasses the inner-loop semantic pre-execution judge for this tool even if spec.execution.tool_validation is enabled in the agent manifest. Set true only for read-only or otherwise idempotent tools. Set false for any state-mutating tool. |

skip_judge is an operator override. It does not disable SEAL authentication, SecurityContext policy, argument validation, or routing. It only removes the extra semantic review step before dispatch.


spec.seal

Optional. Enables cryptographic agent authorization via SEAL. Required in production.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| private_key_path | string | | Path to RSA private key PEM for signing SecurityToken JWTs |
| public_key_path | string | | Path to RSA public key PEM for verifying SecurityToken JWTs |
| issuer | string | aegis-orchestrator | JWT iss claim |
| audiences | string[] | [aegis-agents] | JWT aud claims |
| token_ttl_seconds | u64 | 3600 | SecurityToken lifetime in seconds |

spec.security_contexts

Optional array. Named permission boundaries assigned to agents at execution time.

The platform ships with the following built-in contexts that are registered automatically and do not need to be defined in aegis-config.yaml:

| Name | Surface | Description |
| --- | --- | --- |
| aegis-system-agent-runtime | Execution | Agent containers — fs.* scoped to /workspace, cmd.run, web.*, aegis read/execution tools |
| aegis-system-default | Authoring | Platform authoring agents — unrestricted tool access for manifest authoring and validation |
| zaru-free | Chat/MCP | Zaru Free tier chat surface |
| zaru-pro | Chat/MCP | Zaru Pro tier chat surface |
| zaru-business | Chat/MCP | Zaru Business tier chat surface |
| zaru-enterprise | Chat/MCP | Zaru Enterprise tier chat surface |
| aegis-system-operator | Operator | Platform operators — all safe tools plus destructive and orchestrator commands |

Entries in spec.security_contexts define operator-provided contexts that supplement the built-in list. Built-in context names cannot be redefined here.

Each entry (SecurityContextDefinition):

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Unique context name, referenced in agent manifests |
| description | string | "" | Human-readable description |
| capabilities | array | [] | Tool permissions granted by this context |
| deny_list | string[] | [] | Explicit tool deny list; overrides any matching capability |

Each capabilities entry (CapabilityDefinition):

| Key | Type | Description |
| --- | --- | --- |
| tool_pattern | string | Tool name pattern (e.g., "fs.*", "cmd.run", "web.fetch") |
| path_allowlist | string[] | Allowed filesystem path prefixes (for fs.* tools) |
| subcommand_allowlist | object | Map of base command → allowed first positional arguments (for cmd.run). Example: {cargo: ["build","test"]}. |
| domain_allowlist | string[] | Allowed network domain suffixes (for web.* tools) |
| rate_limit.calls | u32 | Number of calls allowed per window |
| rate_limit.per_seconds | u32 | Window size in seconds |
| max_response_size | u64 | Max response size in bytes |

spec.builtin_dispatchers

Optional array. Configures the built-in in-process tool handlers. These are not external MCP server processes — they are implemented directly inside the orchestrator binary and dispatched via the Dispatch Protocol.

Each entry:

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Dispatcher identifier (e.g. "cmd", "fs") |
| description | string | "" | Human-readable description forwarded to the LLM tool schema |
| enabled | bool | true | Activate or deactivate this dispatcher |
| capabilities | CapabilityConfig[] | [] | Per-tool capability objects (same schema as spec.mcp_servers[].capabilities) |

Each CapabilityConfig entry follows the same schema described under spec.mcp_servers above.

skip_judge defaults by tool:

| Tool | Default skip_judge | Rationale |
| --- | --- | --- |
| cmd.run | false | State-mutating — subprocess output must always be validated |
| fs.read | true | Read-only — file contents are deterministic |
| fs.write | false | State-mutating — written content must be validated |
| fs.list | true | Read-only — directory listings are deterministic |
| fs.grep | true | Read-only — search results are deterministic |
| fs.glob | true | Read-only — glob matches are deterministic |
| fs.edit | false | State-mutating — edits must be validated |
| fs.multi_edit | false | State-mutating — edits must be validated |
| fs.create_dir | false | State-mutating |
| fs.delete | false | State-mutating — destructive operation |
| web.search | true | Read-only external lookup |
| web.fetch | true | Read-only HTTP fetch |

If spec.builtin_dispatchers is omitted, the orchestrator uses the compiled-in defaults shown above. Explicit configuration is only needed when overriding those defaults.
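An override is a small sketch like the following, which forces fs.read through the judge (whether tools left unlisted retain their compiled-in defaults should be verified for your version):

```yaml
builtin_dispatchers:
  - name: "fs"
    enabled: true
    capabilities:
      - name: fs.read
        skip_judge: false   # stricter than the compiled-in default (true)
```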

For external MCP servers, skip_judge follows the same rule: true for deterministic reads, false for anything that can mutate state or trigger side effects. The node configuration is the source of truth for the bypass decision.


spec.iam

Optional. Configures IAM/OIDC as the trusted JWT issuer. Omit to disable JWT validation (dev only).

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| realms[].slug | string | | Realm name matching the Keycloak configuration |
| realms[].issuer_url | string | | OIDC issuer URL |
| realms[].jwks_uri | string | | JWKS endpoint for JWT signature verification |
| realms[].audience | string | | Expected aud claim in tokens from this realm |
| realms[].kind | enum | | system \| consumer \| tenant |
| jwks_cache_ttl_seconds | u32 | 300 | JWKS key cache TTL |
| claims.zaru_tier | string | zaru_tier | Keycloak claim name carrying ZaruTier |
| claims.aegis_role | string | aegis_role | Keycloak claim name carrying AegisRole |

spec.grpc_auth

Optional. Controls IAM/OIDC JWT enforcement on the gRPC endpoint. Requires spec.iam.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enforce JWT validation on gRPC methods |
| exempt_methods | string[] | [/aegis.v1.InnerLoop/Generate] | gRPC method full paths exempt from auth |

spec.secrets

Optional. Configures OpenBao as the secrets backend. Follows the Keymaster Pattern — agents never access OpenBao directly.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| backend.address | string | | OpenBao server URL |
| backend.auth_method | string | approle | Only approle is currently supported |
| backend.approle.role_id | string | | AppRole Role ID (public; safe to commit) |
| backend.approle.secret_id_env_var | string | OPENBAO_SECRET_ID | Env var name containing the AppRole Secret ID |
| backend.namespace | string | | OpenBao namespace (maps 1:1 to an IAM realm) |
| backend.tls.ca_cert | string | null | CA certificate path |
| backend.tls.client_cert | string | null | mTLS client certificate path |
| backend.tls.client_key | string | null | mTLS client key path |
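Example (mirroring the annotated configuration at the top of this page):

```yaml
secrets:
  backend:
    address: "https://openbao.internal:8200"
    auth_method: approle
    approle:
      role_id: "env:OPENBAO_ROLE_ID"
      secret_id_env_var: "OPENBAO_SECRET_ID"
    namespace: "aegis-system"
    tls:
      ca_cert: "/etc/aegis/openbao-ca.pem"
```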

spec.database

Optional. PostgreSQL connection for persistent state (executions, patterns, workflows). If omitted, the daemon uses in-memory repositories (development mode only).

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | | PostgreSQL connection URL. Supports env: and secret:. |
| max_connections | u32 | 5 | Maximum connections in the pool |
| connect_timeout_seconds | u64 | 5 | Connection timeout |

Example:

database:
  url: "env:AEGIS_DATABASE_URL"
  max_connections: 10
  connect_timeout_seconds: 5

spec.temporal

Optional. Temporal workflow engine configuration for durable workflow execution. If omitted, workflow orchestration features are unavailable.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| address | string | temporal:7233 | Temporal gRPC server address |
| worker_http_endpoint | string | http://localhost:3000 | HTTP endpoint for Temporal worker callbacks. Supports env:. |
| worker_secret | string | null | Shared secret for authenticating worker callbacks. Supports env:. |
| namespace | string | default | Temporal namespace |
| task_queue | string | aegis-agents | Temporal task queue name |
| max_connection_retries | i32 | 30 | Maximum number of connection retries when establishing the Temporal client |

Example:

temporal:
  address: "temporal:7233"
  worker_http_endpoint: "http://aegis-runtime:3000"
  worker_secret: "env:TEMPORAL_WORKER_SECRET"
  namespace: "default"
  task_queue: "aegis-agents"
  max_connection_retries: 30

spec.cortex

Optional. Cortex memory and learning service configuration. If omitted or grpc_url is null, the daemon runs in memoryless mode — no error, no retry, patterns are simply not stored.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| grpc_url | string | null | Cortex gRPC service URL. Supports env:. |
| api_key | string | null | API key for 100monkeys hosted Cortex (Zaru SaaS). Supports env: and secret: prefixes. When absent, the orchestrator connects without authentication (local/open cortex). |

Example:

cortex:
  grpc_url: "env:CORTEX_GRPC_URL"
  api_key: "env:CORTEX_API_KEY"  # Required for Zaru SaaS

spec.discovery

Semantic agent and workflow search (aegis.agent.search, aegis.workflow.search) is powered by the Cortex service. When spec.cortex is configured with a valid grpc_url and api_key, discovery is available automatically. No separate spec.discovery configuration is required.


spec.seal_gateway

Optional. Configures forwarding of external tool invocations to the standalone SEAL tooling gateway.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | null | gRPC endpoint URL of aegis-seal-gateway (example: http://aegis-seal-gateway:50055) |

If omitted, the orchestrator does not forward unknown or external tools to the gateway and falls back to built-in routing only.
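Example (the endpoint value repeats the illustrative address from the table above):

```yaml
seal_gateway:
  url: "http://aegis-seal-gateway:50055"
```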


spec.max_execution_list_limit

Optional. Upper bound on executions returned by a single list_executions request.

| Key | Type | Default | Description |
|---|---|---|---|
| `max_execution_list_limit` | usize | `1000` | Maximum number of executions returned by a single `list_executions` request. Protects against excessive memory usage. |

spec.cluster

Optional. Configures this node's role in the multi-node cluster topology. If omitted, the node defaults to hybrid and operates as a standalone single-node deployment.

spec.cluster.role is orthogonal to spec.node.type. A node with type: orchestrator can have cluster.role: worker.

| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `enabled` | bool | — | `false` | Enable cluster mode. |
| `role` | enum | — | `hybrid` | Node role: `controller` \| `worker` \| `hybrid` |
| `controller_endpoint` | string | for `role: worker` | — | gRPC endpoint of the controller node. |
| `cluster_grpc_port` | u16 | — | `50056` | Port for `NodeClusterService` gRPC (controller only). |
| `peers` | string[] | — | `[]` | Static list of peer controller addresses. |
| `node_keypair_path` | string | — | — | Path to the persistent Ed25519 keypair file for node identity. |
| `heartbeat_interval_secs` | u64 | — | `30` | Interval in seconds for worker heartbeats. |
| `token_refresh_margin_secs` | u64 | — | `120` | Token re-attestation margin in seconds. |
| `tls.enabled` | bool | — | `true` | Enable TLS for cluster communication. |
| `tls.cert_path` | string | — | — | Path to node TLS certificate. |
| `tls.key_path` | string | — | — | Path to node TLS private key. |
| `tls.ca_cert` | string | — | — | Path to CA certificate for peer verification. |
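Example worker-node cluster configuration (hostnames and file paths are illustrative):

```yaml
cluster:
  enabled: true
  role: worker
  controller_endpoint: "http://controller-01:50056"  # illustrative hostname
  node_keypair_path: "/var/lib/aegis/node_key"       # illustrative path
  heartbeat_interval_secs: 30
  token_refresh_margin_secs: 120
  tls:
    enabled: true
    cert_path: "/etc/aegis/tls/node.crt"
    key_path: "/etc/aegis/tls/node.key"
    ca_cert: "/etc/aegis/tls/ca.crt"
```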

Cluster Roles

| Role | Exposes | Connects To |
|---|---|---|
| `controller` | `NodeClusterService` on `cluster_grpc_port` (default 50056) | Nothing (it is the authority) |
| `worker` | `ForwardExecution` stream on `spec.network.grpc_port` (50051) | Controller at `controller_endpoint` |
| `hybrid` | Both 50051 and 50056 | Itself (no external cluster required) |

Node Attestation

On startup, a worker node:

  1. Loads (or generates) its Ed25519 keypair from node_keypair_path
  2. Calls AttestNode on the controller
  3. Receives a ChallengeNode call — signs the challenge with its private key
  4. Receives a NodeSecurityToken (signed JWT)
  5. Wraps all subsequent cluster RPCs in a secure envelope
  6. Sends periodic heartbeats to the controller

The NodeSecurityToken is refreshed automatically before expiry. The keypair persists across restarts, enabling the controller to recognize reconnecting workers.


spec.observability

Optional.

logging

| Key | Type | Default | Env Override | Description |
|---|---|---|---|---|
| `logging.level` | enum | `info` | `RUST_LOG` | `error` \| `warn` \| `info` \| `debug` \| `trace` |
| `logging.format` | enum | `json` | `AEGIS_LOG_FORMAT` | `json` \| `text` |
| `logging.file` | string | null | — | Log file path. Omit to write to stdout. |
| `logging.otlp_endpoint` | string | null | `AEGIS_OTLP_ENDPOINT` | OTLP collector endpoint. Setting this value enables OTLP log export. Omit or null to disable. |
| `logging.otlp_protocol` | enum | `grpc` | `AEGIS_OTLP_PROTOCOL` | `grpc` \| `http`. `grpc` uses port 4317 (default); `http` uses port 4318 with Protobuf over HTTP/1.1. |
| `logging.otlp_headers` | map | `{}` | `AEGIS_OTLP_HEADERS` | Key-value HTTP/gRPC metadata headers sent on every export RPC (e.g. `Authorization`, `api-key`). Values support `env:` prefixes. When set via env var, use comma-separated key=value pairs: `Authorization=Bearer token,x-scope-orgid=12345`. |
| `logging.otlp_min_level` | string | `info` | `AEGIS_OTLP_LOG_LEVEL` | Minimum log level forwarded to OTLP. Does not affect stdout output. |
| `logging.otlp_service_name` | string | `aegis-orchestrator` | `AEGIS_OTLP_SERVICE_NAME` | Value of the `service.name` OpenTelemetry resource attribute. |
| `logging.batch.max_queue_size` | u32 | `2048` | — | Maximum number of log records buffered before export. |
| `logging.batch.scheduled_delay_ms` | u64 | `5000` | — | Interval (ms) between batch export flushes. |
| `logging.batch.max_export_batch_size` | u32 | `512` | — | Maximum records per export RPC. |
| `logging.batch.export_timeout_ms` | u64 | `10000` | — | Timeout (ms) for a single export RPC. |
| `logging.tls.verify` | bool | `true` | — | Whether to verify the OTLP endpoint's TLS certificate. Set `false` only for development. |
| `logging.tls.ca_cert_path` | string | null | — | Path to a custom CA certificate PEM for self-signed OTLP backends. |
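The AEGIS_OTLP_HEADERS env-var format described above (comma-separated key=value pairs) can be parsed as in the following sketch; this is illustrative, not the daemon's actual parser:

```python
def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a header map.

    Splits each pair on the first '=' only, so values may themselves
    contain '=' (e.g. base64 padding in an Authorization header).
    """
    headers: dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed header pair (no '='): {pair!r}")
        headers[key.strip()] = value.strip()
    return headers

print(parse_otlp_headers("Authorization=Bearer token,x-scope-orgid=12345"))
# → {'Authorization': 'Bearer token', 'x-scope-orgid': '12345'}
```

Note the caveat baked into this format: header *values* containing commas cannot be expressed this way.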

metrics

| Key | Type | Default | Description |
|---|---|---|---|
| `metrics.enabled` | bool | `true` | Enable Prometheus metrics |
| `metrics.port` | u16 | `9091` | Prometheus metrics exposition port |
| `metrics.path` | string | `/metrics` | HTTP path for scraping |

tracing

| Key | Type | Default | Description |
|---|---|---|---|
| `tracing.enabled` | bool | `false` | Enable distributed tracing via OpenTelemetry |

Config Discovery Order

The daemon searches for a configuration file in this order (first match wins):

  1. --config <path> CLI flag
  2. AEGIS_CONFIG_PATH environment variable
  3. ./aegis-config.yaml (working directory)
  4. ~/.aegis/config.yaml
  5. /etc/aegis/config.yaml (Linux/macOS)

Example Configurations

Minimal (Local Development)

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: dev-laptop
spec:
  node:
    id: "dev-local"
    type: edge
  llm_providers:
    - name: ollama
      type: ollama
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: default
          model: llama3.2:latest
          capabilities: [chat, code]
          context_window: 8192
          cost_per_1k_tokens: 0.0
  llm_selection:
    strategy: prefer-local
    default_provider: ollama

Air-Gapped Production

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: prod-airgap-001
  version: "1.0.0"
  labels:
    environment: production
    deployment: air-gapped
spec:
  node:
    id: "550e8400-e29b-41d4-a716-446655440001"
    type: edge
    tags: [production, air-gapped, local-llm]
  llm_providers:
    - name: ollama
      type: ollama
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: default
          model: llama3.2:latest
          capabilities: [chat, code, reasoning]
          context_window: 8192
          cost_per_1k_tokens: 0.0
        - alias: fast
          model: phi3:mini
          capabilities: [chat, code]
          context_window: 4096
          cost_per_1k_tokens: 0.0
  llm_selection:
    strategy: prefer-local
    default_provider: ollama

Cloud Multi-Provider

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: cloud-multi-001
  version: "1.0.0"
  labels:
    environment: production
    deployment: cloud
spec:
  node:
    id: "550e8400-e29b-41d4-a716-446655440002"
    type: orchestrator
    region: us-west-2
    tags: [production, cloud, multi-provider]
  llm_providers:
    - name: openai
      type: openai
      endpoint: "https://api.openai.com/v1"
      api_key: "env:OPENAI_API_KEY"
      enabled: true
      models:
        - alias: default
          model: gpt-4o
          capabilities: [chat, code, reasoning]
          context_window: 128000
          cost_per_1k_tokens: 0.005
        - alias: fast
          model: gpt-4o-mini
          capabilities: [chat, code]
          context_window: 128000
          cost_per_1k_tokens: 0.00015
    - name: anthropic
      type: anthropic
      endpoint: "https://api.anthropic.com/v1"
      api_key: "env:ANTHROPIC_API_KEY"
      enabled: true
      models:
        - alias: smart
          model: claude-sonnet-4-5
          capabilities: [chat, code, reasoning]
          context_window: 200000
          cost_per_1k_tokens: 0.003
  llm_selection:
    strategy: cost-optimized
    default_provider: openai
    fallback_provider: anthropic

Docker Compose Deployment

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: docker-compose-node
  version: "1.0.0"
spec:
  node:
    id: "env:AEGIS_NODE_ID"
    type: orchestrator
  llm_providers:
    - name: ollama
      type: ollama
      endpoint: "http://ollama:11434"
      enabled: true
      models:
        - alias: default
          model: phi3:mini
          capabilities: [chat, code, reasoning]
          context_window: 4096
          cost_per_1k_tokens: 0.0
  llm_selection:
    strategy: prefer-local
    default_provider: ollama
  runtime:
    default_isolation: docker
    container_network_mode: "env:AEGIS_CONTAINER_NETWORK"
    orchestrator_url: "env:AEGIS_ORCHESTRATOR_URL"
    nfs_server_host: "env:AEGIS_NFS_HOST"
    runtime_registry_path: "runtime-registry.yaml"
  storage:
    backend: seaweedfs
    fallback_to_local: true
    seaweedfs:
      filer_url: "http://seaweedfs-filer:8888"
      mount_point: "/var/lib/aegis/storage"
      default_ttl_hours: 24
      default_size_limit_mb: 1000
      max_size_limit_mb: 10000
      gc_interval_minutes: 60
  database:
    url: "env:AEGIS_DATABASE_URL"
    max_connections: 5
    connect_timeout_seconds: 5
  temporal:
    address: "temporal:7233"
    worker_http_endpoint: "http://aegis-runtime:3000"
    worker_secret: "env:TEMPORAL_WORKER_SECRET"
    namespace: "default"
    task_queue: "aegis-agents"
  cortex:
    grpc_url: "env:CORTEX_GRPC_URL"
    api_key: "env:CORTEX_API_KEY"  # Required for Zaru SaaS
  observability:
    logging:
      level: info
