Container Registry & Image Management
How AEGIS discovers, pulls, caches, and authenticates container images for standard and custom runtimes — including ImagePullPolicy, private registry credentials, failure scenarios, and pre-caching for airgapped environments.
Every AEGIS agent execution requires a container image. AEGIS delegates image discovery, pulling, and caching to the native mechanisms of the container runtime (Docker or Podman), with explicit ImagePullPolicy control and support for both public and private registries.
Image Resolution
AEGIS resolves the container image for each execution from one of two sources depending on the runtime mode declared in the agent manifest:
| Runtime Mode | Image Source |
|---|---|
| StandardRuntime (language + version) | Resolved at execution time via the StandardRuntime Registry (runtime-registry.yaml). Example: language: python + version: "3.11" → python:3.11-slim. See Standard Runtime Registry. |
| CustomRuntime (image) | Taken directly from spec.runtime.image in the manifest. Must be a fully-qualified reference that includes a registry component (e.g. ghcr.io/myorg/agent:v1.0). |
Once the image reference is known, the orchestrator applies the ImagePullPolicy to decide whether to pull from the registry or use the local container runtime cache.
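The two resolution paths can be illustrated with a minimal Python sketch. It is not the orchestrator's implementation: the function names, the in-memory `RUNTIME_REGISTRY` dict standing in for runtime-registry.yaml, and the registry-component heuristic (first path segment contains `.` or `:`, or is `localhost`, roughly the convention Docker uses when parsing references) are all assumptions made for illustration.

```python
# Hypothetical stand-in for a parsed runtime-registry.yaml.
# Only the python 3.11 mapping is taken from the table above.
RUNTIME_REGISTRY = {("python", "3.11"): "python:3.11-slim"}


def has_registry_component(image: str) -> bool:
    """Heuristic check that the first path segment looks like a registry host."""
    first, sep, _rest = image.partition("/")
    if not sep:
        return False  # e.g. "agent:v1.0" has no registry component at all
    return "." in first or ":" in first or first == "localhost"


def resolve_image(runtime: dict) -> str:
    """Return the image reference for a spec.runtime mapping."""
    if "image" in runtime:  # CustomRuntime: use the manifest value as-is
        image = runtime["image"]
        if not has_registry_component(image):
            raise ValueError(f"image must include a registry: {image!r}")
        return image
    # StandardRuntime: look up language + version in the registry mapping
    key = (runtime["language"], str(runtime["version"]))
    try:
        return RUNTIME_REGISTRY[key]
    except KeyError:
        raise ValueError(f"no standard runtime for {key}") from None
```

For example, `resolve_image({"language": "python", "version": "3.11"})` yields the slim Python image from the table, while a custom image without a registry host is rejected before any pull is attempted.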
ImagePullPolicy
Set image_pull_policy in spec.runtime to control when the orchestrator pulls images:
spec:
  runtime:
    image: "ghcr.io/myorg/agent:v1.0"
    image_pull_policy: "IfNotPresent" # Always | IfNotPresent | Never

Always
Pulls from the registry before every execution, even if the image is already cached locally.
image_pull_policy: "Always"
Use when: Your image uses a mutable tag (e.g., :latest) and you need every execution to use the current push. Slower due to the network round-trip.
IfNotPresent (Default)
Uses the local container runtime cache if the image is already present. Pulls from the registry only if the image is missing locally.
image_pull_policy: "IfNotPresent"
Use when: Standard production deployments with pinned version tags. Fast on repeated executions; requires one initial pull.
Never
Uses only the local container runtime cache. Fails immediately if the image is not already present — no network attempt is made.
image_pull_policy: "Never"
Use when: Airgapped or offline environments where network access to a registry is unavailable or prohibited. Requires images to be pre-cached before execution. See Pre-Caching for Airgapped Environments below.
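The three policies reduce to one decision over two inputs: the policy and whether the image is already in the local cache. The sketch below expresses that decision in Python. `PullPolicy` and `plan_pull` are hypothetical names, and `ImageNotFound` only mirrors the error name used on this page; none of this is taken from the orchestrator source.

```python
from enum import Enum


class PullPolicy(str, Enum):
    ALWAYS = "Always"
    IF_NOT_PRESENT = "IfNotPresent"
    NEVER = "Never"


class ImageNotFound(Exception):
    """Raised when policy is Never and the image is not cached locally."""


def plan_pull(policy: str = "IfNotPresent", cached: bool = False) -> bool:
    """Return True if a registry pull should be attempted."""
    policy = PullPolicy(policy)  # raises ValueError on an unknown policy
    if policy is PullPolicy.ALWAYS:
        return True  # pull even when a cached copy exists
    if policy is PullPolicy.IF_NOT_PRESENT:
        return not cached  # pull only when the cache misses
    # Never: no network attempt is made; a cache miss is an immediate failure
    if not cached:
        raise ImageNotFound("image not cached and image_pull_policy is Never")
    return False
```

Modelling the policy as an enum means a typo such as "IfNotPresnt" fails loudly instead of silently falling through to a default.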
Registry Authentication
Public Registries (Phase 1)
Standard runtime images pull from Docker Hub without authentication. Custom runtime images from public repositories (Docker Hub, GHCR public repos) also require no credentials.
Private Registries (Phase 1)
Credentials for private registries are injected via node configuration. The intended configuration shape uses a dockerconfigjson-format secret.
Docker stores credentials in ~/.docker/config.json. Podman uses ~/.config/containers/auth.json (or ${XDG_RUNTIME_DIR}/containers/auth.json). Both formats are compatible -- the orchestrator reads the appropriate file based on the active container runtime.
# In node-config
secrets:
  ghcr-credentials:
    type: dockercfg
    data:
      .dockerconfigjson: |
        {
          "auths": {
            "ghcr.io": {
              "username": "user@example.com",
              "password": "ghp_xxxxxxxxxxxx",
              "auth": "<base64(username:password)>"
            },
            "docker.io": {
              "username": "dockerhub_user",
              "password": "dckr_pat_xxxx",
              "auth": "<base64(username:password)>"
            }
          }
        }

The orchestrator passes these credentials to the container runtime (Docker or Podman) API when pulling the image. Credentials are never exposed to the agent container.
Note: The registry_credentials field in NodeConfigSpec is a planned Phase 1 feature and is not yet fully wired in the current release. Track progress in the orchestrator repository.
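The auth value in each registry entry is the Base64 encoding of username:password. As a hedged illustration, the Python sketch below builds a document in the same shape as the example secret. The helper names `auth_entry` and `docker_config_json` are hypothetical, and real tokens should come from a secrets store rather than source code.

```python
import base64
import json


def auth_entry(username: str, password: str) -> dict:
    """Build one registry entry, including the base64(username:password) field."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"username": username, "password": password, "auth": token}


def docker_config_json(credentials: dict) -> str:
    """Render {"auths": {...}} from a mapping of registry host -> (user, password)."""
    auths = {host: auth_entry(user, pw) for host, (user, pw) in credentials.items()}
    return json.dumps({"auths": auths}, indent=2)
```

Calling `docker_config_json({"ghcr.io": ("user@example.com", "ghp_xxxxxxxxxxxx")})` returns JSON text suitable for pasting under the .dockerconfigjson key.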
Phase 2: Dynamic Credentials via OpenBao
A future phase will support short-lived dynamic credentials sourced from OpenBao (an open-source secrets manager). The credential retrieval happens entirely in the orchestrator — agents never access the secrets store directly.
Image Caching
Images pulled by the container runtime are stored in the local image cache. AEGIS does not manage its own image cache layer -- it delegates entirely to Docker or Podman.
# View cached images on your node
docker images # or: podman images
# Remove dangling (untagged) images to free disk space
docker image prune # or: podman image prune
# Remove all images not used by any container
docker image prune -a # or: podman image prune -a

Note: podman commands are CLI-compatible with their docker equivalents for all image operations shown above.
For StandardRuntime images, each distinct language+version pair resolves to a pinned, immutable image tag — the same tag is always used for a given version, so images are effectively cached after the first execution on a node.
Pre-Caching for Airgapped Environments
When using image_pull_policy: Never, images must be present in the local container runtime cache before any execution attempt. Pre-cache images on each node manually or as part of your CI/CD provisioning pipeline:
# Pull StandardRuntime images (example: the Python versions your agents use)
docker pull python:3.11-slim # or: podman pull python:3.11-slim
docker pull python:3.10-slim # or: podman pull python:3.10-slim
# Pull your custom runtime images
docker pull ghcr.io/myorg/agent:v1.0.0 # or: podman pull ghcr.io/myorg/agent:v1.0.0
# Verify images are present
docker images # or: podman images

If the image is not found locally and image_pull_policy: Never is set, the execution fails immediately with an ImageNotFound error.
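A provisioning pipeline can check the cache before any agent runs, so that a missing image surfaces during provisioning rather than as an ImageNotFound failure. The following Python sketch is one possible approach, not part of AEGIS: the function names are hypothetical, and it relies on the `--format` Go-template option of `docker images` (also accepted by `podman images`). Note that Podman typically reports fully-qualified names such as docker.io/library/python:3.11-slim, so the required list should be written the way your runtime reports images.

```python
import subprocess


def parse_image_list(output: str) -> set[str]:
    """Turn newline-separated repo:tag output into a set of names."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def list_local_images(cli: str = "docker") -> set[str]:
    """Ask the container runtime CLI for the locally cached repo:tag names."""
    result = subprocess.run(
        [cli, "images", "--format", "{{.Repository}}:{{.Tag}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_image_list(result.stdout)


def missing_images(required, local) -> list[str]:
    """Return the required images that are absent from the local cache."""
    return sorted(set(required) - set(local))
```

A provisioning step could call `missing_images(required, list_local_images("podman"))` and fail the pipeline if the result is non-empty.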
Pull Failure Scenarios
When an image pull fails, the execution is rejected and no container is started. Phase 1 makes a single attempt with no automatic retries.
| Scenario | Cause | Resolution |
|---|---|---|
| Image not found | Typo in image name; image deleted from registry | Verify the image exists in the registry and the reference is correct |
| Authentication failed | Invalid or missing credentials; insufficient permissions | Update node-config registry credentials; verify token scopes |
| Network timeout | Registry unreachable; slow network | Verify network connectivity to the registry; consider IfNotPresent with a pre-pulled image |
| Rate limited | Docker Hub free-tier pull rate limit exceeded | Authenticate to Docker Hub (authenticated users have higher limits); use a private registry mirror |
| Disk full | Container runtime storage exhausted | Run docker image prune (or podman image prune) on the node to free space |
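When triaging a failed pull from node logs, it can help to map the runtime's error text onto the scenarios in the table. The Python sketch below does this with substring matching. It is an assumption-heavy illustration: the substrings are typical of Docker daemon and registry error messages but vary by version and runtime, the scenario labels are hypothetical, and this page does not describe how AEGIS itself classifies failures.

```python
# (scenario label, substrings that suggest it); checked in order, first match wins.
_PATTERNS = [
    ("rate_limited", ("toomanyrequests", "rate limit")),
    ("disk_full", ("no space left on device",)),
    ("network_timeout", ("i/o timeout", "timeout exceeded", "no such host")),
    ("authentication_failed", ("unauthorized", "denied")),
    ("image_not_found", ("manifest unknown", "not found")),
]


def classify_pull_error(message: str) -> str:
    """Return a scenario label for a pull error message, or 'unknown'."""
    text = message.lower()
    for scenario, needles in _PATTERNS:
        if any(needle in text for needle in needles):
            return scenario
    return "unknown"
```

More specific patterns are checked first so that a broad substring such as "not found" does not shadow them.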
Observability
AEGIS publishes domain events for all image operations. These events are available on the event bus and can be consumed by monitoring integrations:
| Event | When Published |
|---|---|
| ImagePullStarted | Orchestrator begins a registry pull |
| ImagePullCompleted | Pull succeeded; includes whether the image came from cache (Cached) or was freshly downloaded (Downloaded) |
| ImagePullFailed | Pull failed; includes failure reason |
| ImageCached | Image successfully stored in local container runtime cache |
| ImageRemoved | Image removed from local cache (e.g., after prune) |
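A monitoring integration might aggregate these events into counters. The Python sketch below shows one way to do that. The event payload shape is an assumption: the field names `event`, `source`, and `reason` are hypothetical, since this page lists only the event names and what each one includes, not its schema.

```python
from collections import Counter


def summarize_image_events(events) -> dict:
    """Count image events, splitting completed pulls by source and failures by reason."""
    counts = Counter()
    for event in events:
        name = event["event"]
        if name == "ImagePullCompleted":
            # "source" is assumed to carry Cached or Downloaded
            counts[f"{name}:{event.get('source', 'unknown')}"] += 1
        elif name == "ImagePullFailed":
            # "reason" is assumed to carry the failure reason
            counts[f"{name}:{event.get('reason', 'unknown')}"] += 1
        else:
            counts[name] += 1
    return dict(counts)
```

The resulting counts can indicate, for instance, how often executions are served from cache versus a fresh download on a given node.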
AEGIS Platform Images
All AEGIS platform service images are published to the GitHub Container Registry under the 100monkeys-ai organization.
Image Registry
| Image | Repository | Description |
|---|---|---|
| aegis-runtime | ghcr.io/100monkeys-ai/aegis-runtime | Core orchestrator |
| aegis-temporal-worker | ghcr.io/100monkeys-ai/aegis-temporal-worker | Temporal workflow worker |
| aegis-seal-gateway | ghcr.io/100monkeys-ai/aegis-seal-gateway | SEAL tooling gateway |
Tagging Strategy
| Trigger | Tags Applied |
|---|---|
| Push to main | :latest, sha-<short-commit> |
| Semver tag (e.g., v1.2.3) | 1.2.3, 1.2, 1 |
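The tagging rules in the table can be written as two small functions. This Python sketch is only an illustration of the mapping, not the publishing workflow itself: the function names are hypothetical, and the 7-character short SHA length is an assumption, since the table does not state it.

```python
import re

_SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def tags_for_semver(git_tag: str) -> list[str]:
    """Map a git tag such as v1.2.3 to the image tags 1.2.3, 1.2, and 1."""
    match = _SEMVER_TAG.match(git_tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {git_tag!r}")
    major, minor, patch = match.groups()
    return [f"{major}.{minor}.{patch}", f"{major}.{minor}", major]


def tags_for_main_push(commit_sha: str, short_len: int = 7) -> list[str]:
    """Map a push to main to :latest plus sha-<short-commit>.

    short_len=7 is an assumed default, not taken from the workflow.
    """
    return ["latest", f"sha-{commit_sha[:short_len]}"]
```

Because the minor and major tags move with each release, deployments that need reproducibility should pin the full patch version or the sha tag.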
Authenticating with GHCR
# Using the aegis-deploy Makefile
make registry-login
# Manual authentication
echo "$GHCR_TOKEN" | podman login ghcr.io -u "$GHCR_USERNAME" --password-stdin
# Or with Docker
echo "$GHCR_TOKEN" | docker login ghcr.io -u "$GHCR_USERNAME" --password-stdin

Requires a GitHub Personal Access Token (PAT) with read:packages scope. Set GHCR_USERNAME and GHCR_TOKEN in your .env file.
CI/CD Pipeline
Images are built automatically via GitHub Actions on every push to main and on version tags. Each repository (aegis-orchestrator, aegis-temporal-worker, aegis-seal-gateway) has its own .github/workflows/docker-publish.yml workflow that:
- Builds a multi-stage Docker image
- Tags with :latest and the short commit SHA
- Pushes to ghcr.io/100monkeys-ai/
- On semver tags, additionally tags with major, minor, and patch versions
See Also
- Standard Runtime Registry: Full language-version-to-image mapping table
- Custom Runtime Agents: Building and using your own container images
- Agent Manifest Reference: spec.runtime field definitions including image_pull_policy
- Docker Deployment: Docker daemon setup and container lifecycle