
Configuring Storage Volumes

Declaring ephemeral and persistent volumes, volume mounts, access modes, quotas, and managing volumes via CLI.


AEGIS provides agents with filesystem storage via Volumes. Volumes are first-class domain entities managed by the orchestrator and backed by SeaweedFS. Agents access their volumes as a standard POSIX filesystem — the NFS mount is transparent to agent code.

All agent volume mount paths must be rooted at /workspace. Valid examples: /workspace, /workspace/datasets, /workspace/cache.


Volume Types

| Storage Class | Description | TTL |
| --- | --- | --- |
| Ephemeral | Temporary workspace; auto-deleted after execution or when the TTL expires. | Required via ttl_hours; e.g., 1, 2, 24 hours |
| Persistent | Survives across executions; must be explicitly deleted. | Not applicable |

Use ephemeral volumes for scratch space, build artifacts, and intermediate results that do not need to outlive the execution.

Use persistent volumes for data that must be shared across multiple executions, stored long-term, or readable by other agents.


Declaring Volumes in the Manifest

spec:
  volumes:
    # Ephemeral workspace scratch volume — deleted after 1 hour
    - name: workspace
      mount_path: /workspace
      access_mode: read-write
      storage_class: ephemeral
      ttl_hours: 1

    # Persistent output volume — survives across executions
    - name: output-store
      mount_path: /workspace/output
      access_mode: read-write
      storage_class: persistent

    # Read-only access to a shared reference dataset
    - name: reference-data
      mount_path: /workspace/reference
      access_mode: read-only
      storage_class: persistent
      source:
        volume_id: "vol-a1b2c3d4-..."   # reference an existing volume by ID

Volume Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Local identifier used to reference this mount within the manifest. |
| mount_path | string | Yes | Absolute path inside the container, rooted at /workspace. |
| access_mode | read-write or read-only | Yes | Access mode enforced by the storage gateway (AegisFSAL). |
| storage_class | ephemeral or persistent | Yes | Lifetime of the volume. |
| ttl_hours | integer | Required for ephemeral | Hours until auto-deletion (e.g., 1, 24, 48). |
| size_limit | string | No | Maximum volume size as a Kubernetes resource string (e.g., "500Mi", "5Gi"). Writes that exceed this emit a VolumeQuotaExceeded event and return ENOSPC. |
| source.volume_id | string | No (persistent only) | Pin to a specific existing volume by UUID. Supports Handlebars: {{input.volume_id}}. |

mount_path values must be unique, and no path may nest inside another, with one exception: paths may nest under the canonical root /workspace (for example, /workspace plus /workspace/datasets is valid, while /workspace/data plus /workspace/data/sub is not).
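The mount-path rules above can be expressed as a small validation helper. This is an illustrative sketch only; the orchestrator performs the real validation server-side:

```python
def validate_mount_paths(paths):
    """Check the manifest mount-path rules: every path is rooted at
    /workspace, paths are unique, and no path nests inside another
    path except directly under the canonical root /workspace."""
    if len(set(paths)) != len(paths):
        return False  # duplicate mount_path
    for p in paths:
        if p != "/workspace" and not p.startswith("/workspace/"):
            return False  # not rooted at /workspace
    for a in paths:
        for b in paths:
            # nesting is only allowed under the canonical root itself
            if a != b and b.startswith(a + "/") and a != "/workspace":
                return False  # e.g. /workspace/data vs /workspace/data/sub
    return True
```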


How Volumes Are Mounted

When an execution starts, the orchestrator:

  1. Creates the declared volumes in SeaweedFS (or resolves existing ones via volume_id).
  2. Starts the NFS server gateway for each volume.
  3. Mounts the volumes into the agent container via the kernel NFS client (nfsvers=3).
  4. Exposes each mount path to the agent container as a standard filesystem directory.

No special code is required in bootstrap.py. The agent can use ordinary Python open(), os.path, shutil, etc. to access volume contents.

# Agents read and write volumes like any filesystem
with open("/workspace/solution.py", "w") as f:
    f.write(code_content)

# Files written are immediately visible to the orchestrator
# for FSAL tool operations (fs.read, fs.write, etc.)

Storage Gateway Security

The AEGIS storage gateway (AegisFSAL) intercepts every POSIX operation on the volume:

  • Authorization: Verifies the requesting execution holds the volume's FileHandle (encoding execution_id + volume_id).
  • Path canonicalization: All paths are normalized server-side. Any .. component is rejected before reaching SeaweedFS.
  • Filesystem policy enforcement: Read/write path allowlists from spec.security.filesystem are enforced per-operation.
  • Quota enforcement: Writes that would exceed size_limit are blocked and emit VolumeQuotaExceeded.
  • Audit logging: Every operation (open, read, write, create, delete, readdir) is published as a StorageEvent domain event to the event bus.
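The path canonicalization step can be sketched as follows. This is a minimal illustration assuming POSIX-style paths; the actual gateway normalizes server-side before any request reaches SeaweedFS:

```python
import posixpath

def canonicalize(volume_root, requested):
    """Normalize a client-supplied path and reject traversal attempts.
    Any '..' component is refused outright, and the normalized result
    must remain under the volume root."""
    if ".." in requested.split("/"):
        raise PermissionError(f"path traversal blocked: {requested}")
    normalized = posixpath.normpath(
        posixpath.join(volume_root, requested.lstrip("/"))
    )
    if normalized != volume_root and not normalized.startswith(volume_root + "/"):
        raise PermissionError(f"path escapes volume root: {requested}")
    return normalized
```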

Quota Configuration

Set a maximum volume size using a Kubernetes resource quantity:

volumes:
  - name: workspace
    mount_path: /workspace
    access_mode: read-write
    storage_class: ephemeral
    ttl_hours: 2
    size_limit: "5Gi"

If the agent writes data that would exceed the quota, the write fails with ENOSPC inside the container and a VolumeQuotaExceeded event is emitted.
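Inside the agent, a quota breach surfaces as an ordinary OSError. A hedged handling pattern (write_artifact is a hypothetical helper, not an AEGIS API):

```python
import errno

def write_artifact(path, data):
    """Write bytes to a volume path, reporting quota exhaustion
    instead of crashing the agent."""
    try:
        with open(path, "wb") as f:
            f.write(data)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # size_limit reached; the VolumeQuotaExceeded event is
            # emitted by the storage gateway, not by this code
            return False
        raise
    return True
```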


Phase 1 Constraints

Single-writer constraint: In Phase 1, a persistent volume with read-write access can only be mounted by one execution at a time. Attempting to mount an already-in-use read-write persistent volume will cause the second execution to fail at startup with a VolumeAlreadyMounted error.
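The single-writer rule amounts to a small admission check at mount time. An illustrative sketch (structure and names are assumptions, not the orchestrator's actual code):

```python
class VolumeAlreadyMounted(Exception):
    """Raised when a second read-write mount is attempted in Phase 1."""

def admit_mount(active_rw_mounts, volume_id, access_mode):
    """Phase 1 admission: at most one read-write mount per persistent
    volume; any number of read-only mounts may coexist."""
    if access_mode == "read-write":
        if volume_id in active_rw_mounts:
            raise VolumeAlreadyMounted(volume_id)
        active_rw_mounts.add(volume_id)
    # read-only mounts are always admitted
```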

File locking: NFS mounts use the nolock option. POSIX advisory locks (flock, fcntl) are not coordinated between agents. For multi-agent coordination on shared files, use the ResourceLock mechanism instead.


Volume Ownership

Every volume has an owner that determines its lifetime and access rules:

| Ownership | Created By | Lifetime |
| --- | --- | --- |
| Execution | Orchestrator at execution start (one per spec.volumes entry) | Ephemeral volumes are deleted when the execution ends; persistent volumes survive and require manual deletion |
| WorkflowExecution | Orchestrator at workflow start (one per spec.storage entry) | Ephemeral volumes are deleted when the workflow execution completes; persistent volumes survive |
| Persistent | API | No automatic cleanup; must be explicitly deleted via volume management APIs |

When a volume is owned by an execution, only that execution can perform write operations through AegisFSAL. Other executions may mount the same persistent volume read-only by referencing it via source.volume_id — this does not change ownership.


TTL and Garbage Collection

The TTL clock starts at volume creation, not at execution start. If provisioning pauses between volume creation and container startup, those seconds count against the TTL.
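Because the clock starts at creation, remaining lifetime can be computed as follows (a sketch; field names are assumptions):

```python
from datetime import datetime, timedelta, timezone

def remaining_ttl(created_at, ttl_hours, now=None):
    """Time left before an ephemeral volume expires. The TTL window
    opens at volume creation, so any provisioning delay between
    creation and container startup counts against it."""
    now = now or datetime.now(timezone.utc)
    expires_at = created_at + timedelta(hours=ttl_hours)
    return max(expires_at - now, timedelta(0))
```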

The background garbage collector runs every gc_interval_minutes (default: 60 minutes, configurable in the node configuration under spec.storage.seaweedfs.gc_interval_minutes).

| Scenario | What happens |
| --- | --- |
| Execution completes normally and volume is ephemeral | Volume deleted immediately on execution teardown, not waiting for GC |
| Execution is cancelled mid-run | Volume marked expired; deleted on next GC pass |
| Orchestrator restarts unexpectedly | GC picks up all orphaned expired volumes on the next scheduled pass |
| ttl_hours not set on an ephemeral volume | default_ttl_hours from node configuration applies (default: 24) |

Cleanup is a two-phase soft-delete: the orchestrator transitions the volume to Deleting, removes the SeaweedFS directory, then transitions to Deleted. If the SeaweedFS deletion fails, the volume stays in Deleting state and the next GC pass retries.
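The two-phase soft delete can be sketched as a small state transition, where delete_directory stands in for the SeaweedFS call (names hypothetical):

```python
def gc_delete(volume, delete_directory):
    """Two-phase soft delete: mark Deleting first, so a failed backend
    deletion leaves the volume in Deleting for the next GC pass to retry."""
    volume["state"] = "Deleting"
    try:
        delete_directory(volume["id"])
    except OSError:
        return volume["state"]  # stays "Deleting"; next GC pass retries
    volume["state"] = "Deleted"
    return volume["state"]
```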


Managing Volumes

Volume lifecycle is managed by orchestrator APIs and execution/workflow manifests in the current release. Dedicated aegis volume ... CLI commands are not exposed yet.

Volumes used by an active execution cannot be deleted; a delete request against a currently mounted volume returns an error.


Pinning to an Existing Volume

To pass data between executions via a persistent volume:

# Get execution metadata and copy the referenced volume ID
EXEC_ID=<execution-id>
curl http://localhost:8088/v1/executions/${EXEC_ID}

In the next agent manifest:

volumes:
    - name: previous-output
      mount_path: /workspace/previous
      access_mode: read-only
      storage_class: persistent
      source:
        volume_id: "vol-a1b2c3d4-..."

The previous execution's output is now readable at /workspace/previous in the new agent container.


Passing Volumes Between Agents in a Workflow

In a workflow, the standard pattern is for a writer agent to populate a persistent volume and declare its volume_id on the blackboard, so that subsequent agents can mount it read-only.

Writer agent manifest (first workflow state):

spec:
  volumes:
    - name: output
      mount_path: /workspace/output
      access_mode: read-write
      storage_class: persistent

After the writer completes, a system state in the workflow writes the volume ID to the blackboard:

states:
  - name: write-output-id
    kind: system
    action: blackboard.set
    params:
      key: output_volume_id
      value: "{{executions.writer.volumes.output.id}}"

Reader agent manifest (subsequent workflow state):

spec:
  volumes:
    - name: previous-output
      mount_path: /workspace/input
      access_mode: read-only
      storage_class: persistent
      source:
        volume_id: "{{blackboard.output_volume_id}}"

The reader agent sees the writer's files at /workspace/input. Because it mounts read-only, it is compatible with the single-writer constraint and can run concurrently with other readers on the same volume.


Inspecting Volume Artifacts

When seaweedfs.s3_endpoint is configured in the node configuration, all volume contents are also accessible via any S3-compatible client. This is useful for inspecting execution artifacts, debugging failed agents, or archiving outputs without running a new execution.

Volumes are stored at the following path in the SeaweedFS S3 namespace:

/<tenant_id>/<volume_id>/

Using the AWS CLI:

# Retrieve the volume ID from a previous execution
TENANT_ID="00000000-0000-0000-0000-000000000001"  # default single-tenant ID
VOL_ID="<volume-id-from-execution-metadata>"

# List files in the volume
aws s3 ls s3://${TENANT_ID}/${VOL_ID}/ \
  --endpoint-url http://seaweedfs-s3:8333

# Download all artifacts locally
aws s3 cp s3://${TENANT_ID}/${VOL_ID}/ ./artifacts/ --recursive \
  --endpoint-url http://seaweedfs-s3:8333

The S3 gateway does not enforce manifest FilesystemPolicy rules. Restrict access to the S3 gateway endpoint at the network level in production environments — it should not be reachable from agent containers or public networks.


Troubleshooting

Permission denied (EACCES) on file operations

The agent's code is attempting to read or write a path that is not covered by the manifest's spec.security.filesystem allowlists, or the path contains a .. component.

Check the FilesystemPolicyViolation or PathTraversalBlocked events in the execution log to identify the exact path. Then update the manifest's read or write allowlist to include it:

spec:
  security:
    filesystem:
      write:
        - /workspace
        - /tmp   # add paths your agent needs

No space left on device (ENOSPC) during a write

The volume's size_limit quota has been reached. A VolumeQuotaExceeded event is emitted with the volume ID and the byte counts. Either increase size_limit or clean up files before writing more data.

Note that quota accounting tracks cumulative bytes written, not net storage used. Deleting files does not reduce the recorded quota usage in Phase 1.
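The Phase 1 accounting model (cumulative bytes written, never decremented on delete) can be sketched as:

```python
class QuotaTracker:
    """Phase 1 quota accounting sketch: every write adds to the
    counter; deleting files does not reclaim quota."""

    def __init__(self, size_limit_bytes):
        self.size_limit = size_limit_bytes
        self.bytes_written = 0

    def record_write(self, nbytes):
        if self.bytes_written + nbytes > self.size_limit:
            raise OSError(28, "No space left on device")  # ENOSPC
        self.bytes_written += nbytes

    def record_delete(self, nbytes):
        pass  # cumulative accounting: usage is never reduced
```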

VolumeAlreadyMounted at execution startup

A persistent volume declared with read-write access is already mounted by another running execution. Persistent read-write volumes can only be held by one execution at a time.

Options:

  • Wait for the existing execution to complete before starting a new one that needs write access.
  • Switch the new execution to read-only access if it only needs to read the volume.
  • Use execution/workflow metadata APIs to identify which execution currently owns the volume.

Mount hangs or times out

NFS mounts use soft mode with a 1-second timeout and 2 retries. If the orchestrator NFS server is unreachable, the mount fails with ETIMEDOUT rather than hanging indefinitely. Check that the AEGIS orchestrator process is healthy and that port 2049 is reachable from the agent container's network.

For current CLI vs API storage operation coverage, see the CLI Capability Matrix.
