Arena Architecture
Arena is Omnia’s distributed testing framework for evaluating PromptKit bundles at scale. It enables systematic testing of prompts against datasets, tracking results, and comparing performance across versions.
Overview
Arena provides:
- Distributed execution: Run tests across multiple workers
- GitOps integration: Source bundles from Git, OCI, or ConfigMaps
- Revision tracking: Track which bundle versions were tested
- Result aggregation: Collect and analyze test results
- Scalability: Handle large test datasets efficiently
- Unified provider model: LLM providers and agents are interchangeable in the evaluation matrix
Core Concepts
PromptKit Bundles
Arena tests PromptKit bundles - structured collections of prompts with versioning, templating, and parameter definitions. Bundles are fetched from external sources and tested against datasets.
Sources
An ArenaSource defines where to fetch PromptKit bundles from:

    ┌─────────────────────────────────────────────────────────────┐
    │                         ArenaSource                         │
    ├─────────────────────────────────────────────────────────────┤
    │  • Git repository (branch, tag, or commit)                  │
    │  • OCI registry (container image format)                    │
    │  • Kubernetes ConfigMap (for simple cases)                  │
    ├─────────────────────────────────────────────────────────────┤
    │  Polls source at interval → Updates artifact revision       │
    │  Provides download URL for workers                          │
    └─────────────────────────────────────────────────────────────┘

The controller automatically:
- Polls the source at the configured interval
- Detects changes (new commits, tags, or versions)
- Updates the artifact URL and revision
- Triggers downstream jobs when sources change
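
The polling behaviour listed above can be pictured with a minimal Python sketch. This is not the controller's actual code: `SourceState`, `poll_once`, the callables passed to it, and the `artifacts.example` URL are all hypothetical names chosen for illustration. It only shows the general idea of comparing the latest revision with the last one seen and acting when they differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SourceState:
    """Hypothetical record of what was last observed for one source."""
    revision: Optional[str] = None
    artifact_url: Optional[str] = None


def poll_once(
    state: SourceState,
    fetch_revision: Callable[[], str],
    publish_artifact: Callable[[str], str],
    trigger_jobs: Callable[[str], None],
) -> bool:
    """Run one polling cycle; return True if the source changed."""
    latest = fetch_revision()
    if latest == state.revision:
        return False  # nothing new, so no downstream work
    # A change was detected: refresh the artifact, then notify dependents.
    state.artifact_url = publish_artifact(latest)
    state.revision = latest
    trigger_jobs(latest)
    return True


# Usage with stand-in callables (all assumed, none are Arena APIs).
triggered = []
state = SourceState()
revisions = iter(["main@sha1:abc123", "main@sha1:abc123", "main@sha1:def456"])
results = [
    poll_once(
        state,
        lambda: next(revisions),
        lambda rev: f"http://artifacts.example/{rev}.tar.gz",
        triggered.append,
    )
    for _ in range(3)
]
print(results)    # [True, False, True]
print(triggered)  # ['main@sha1:abc123', 'main@sha1:def456']
```

The second cycle sees an unchanged revision and does nothing, which is why only two downstream triggers are recorded for three polls.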
Providers and Agents
An ArenaJob references providers: the LLM backends or agents that execute prompts during a test run. Providers are organized into named groups (e.g., default, judge) that correspond to roles defined in the arena configuration file.
Each entry in a provider group is either:
- A providerRef pointing to a Provider CRD (direct LLM access)
- An agentRef pointing to an AgentRuntime CRD (the worker connects via WebSocket)
Agents and LLM providers are interchangeable: either kind can appear in any provider position within the scenario-by-provider evaluation matrix. There is no separate “fleet mode”; an agent is simply another provider entry.
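
To make the scenario-by-provider matrix concrete, here is a small Python sketch. It assumes each group entry is a dictionary shaped like the YAML entries on this page; the helper names `entry_name` and `evaluation_matrix` and the scenario names are hypothetical, and Arena's real expansion logic may differ.

```python
from itertools import product


def entry_name(entry: dict) -> str:
    """Label a provider-group entry, whichever of the two kinds it is."""
    if "providerRef" in entry:
        return f"provider/{entry['providerRef']['name']}"
    if "agentRef" in entry:
        return f"agent/{entry['agentRef']['name']}"
    raise ValueError(f"entry needs providerRef or agentRef: {entry!r}")


def evaluation_matrix(scenarios: list, group: list) -> list:
    """Pair every scenario with every entry in the group."""
    return [(s, entry_name(e)) for s, e in product(scenarios, group)]


# One LLM provider and one agent share the same group.
default_group = [
    {"providerRef": {"name": "claude-sonnet"}},
    {"agentRef": {"name": "my-custom-agent"}},
]
cells = evaluation_matrix(["greeting", "refund-request"], default_group)
for cell in cells:
    print(cell)
```

Both kinds of entry go through the same code path, so the agent simply contributes one more column to the matrix.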
An ArenaJob executes a test run:
- References an ArenaSource via sourceRef
- Maps provider groups to Provider or AgentRuntime CRDs
- Partitions work across workers
- Tracks progress and collects results
- Stores aggregated results
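
How work is split between workers is not specified here. As one illustration only, the following Python sketch deals work items out round-robin and then merges per-worker results; `partition`, `aggregate`, and the pass/fail result shape are assumptions, not Arena's API.

```python
from collections import Counter


def partition(items: list, workers: int) -> list:
    """Deal items out round-robin so shard sizes differ by at most one."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    shards = [[] for _ in range(workers)]
    for index, item in enumerate(items):
        shards[index % workers].append(item)
    return shards


def aggregate(per_worker_results: list) -> Counter:
    """Merge each worker's list of 'pass'/'fail' outcomes into one tally."""
    totals = Counter()
    for outcomes in per_worker_results:
        totals.update(outcomes)
    return totals


shards = partition(list(range(7)), workers=3)
print(shards)  # [[0, 3, 6], [1, 4], [2, 5]]

totals = aggregate([["pass", "pass", "fail"], ["pass", "pass"], ["fail", "pass"]])
print(totals["pass"], totals["fail"])  # 5 2
```

A queue-based design, where workers pull items as they become free, would balance uneven workloads better than this static split; the sketch is only meant to show the partition-then-aggregate shape.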
Architecture

    ┌─────────────────────────────────────────────────────────────┐
    │                      Arena Controllers                      │
    ├─────────────────────────────────────────────────────────────┤
    │                                                             │
    │  ┌──────────────┐                        ┌──────────────┐   │
    │  │ ArenaSource  │───────────────────────▶│   ArenaJob   │   │
    │  │  Controller  │                        │  Controller  │   │
    │  └──────┬───────┘                        └──────┬───────┘   │
    │         │                                       │           │
    │         ▼                                       ▼           │
    │  ┌──────────────┐                        ┌──────────────┐   │
    │  │   Fetcher    │                        │  Work Queue  │   │
    │  │  (Git/OCI)   │                        │   (Redis)    │   │
    │  └──────┬───────┘                        └──────┬───────┘   │
    │         │                                       │           │
    │         ▼                                       ▼           │
    │  ┌──────────────┐                        ┌──────────────┐   │
    │  │  Artifacts   │                        │   Workers    │   │
    │  │   Storage    │                        │   (Pods)     │   │
    │  └──────────────┘                        └──────────────┘   │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘

Workflow
1. Define Sources
Create ArenaSource resources pointing to your PromptKit bundles:

    apiVersion: omnia.altairalabs.ai/v1alpha1
    kind: ArenaSource
    metadata:
      name: my-prompts
    spec:
      type: git
      interval: 5m
      git:
        url: https://github.com/acme/prompts
        ref:
          branch: main

2. Run Jobs
Create an ArenaJob that references the source and maps provider groups:

    apiVersion: omnia.altairalabs.ai/v1alpha1
    kind: ArenaJob
    metadata:
      name: evaluation-run-001
    spec:
      sourceRef:
        name: my-prompts
      providers:
        default:
          - providerRef:
              name: claude-sonnet

The providers field maps group names (here, default) to lists of provider or agent entries. Groups correspond to the roles defined in the arena configuration file within the source. You can mix LLM providers and agents in the same group:

    providers:
      default:
        - providerRef:
            name: claude-sonnet
        - providerRef:
            name: gpt-4o
        - agentRef:
            name: my-custom-agent
      judge:
        - providerRef:
            name: claude-opus

3. Monitor Results
Check job status and retrieve results:

    kubectl get arenajob evaluation-run-001 -o yaml

Revision Tracking
Arena tracks source revisions for reproducibility:
| Source Type | Revision Format | Example |
|---|---|---|
| Git | branch@sha1:commit | main@sha1:abc123 |
| OCI | tag@sha256:digest | v1.0@sha256:def456 |
| ConfigMap | resourceVersion | 12345 |
This enables:
- Reproducible tests: Re-run with the exact same bundle version
- Change detection: Only re-test when sources change
- Audit trail: Track which versions were tested
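
The revision formats in the table can be split into their parts with a few lines of Python. The sketch below is illustrative only: the `Revision` tuple, `parse_revision`, and the lowercase source-type strings are hypothetical, and it assumes revisions always follow the shapes shown in the table.

```python
from typing import NamedTuple, Optional


class Revision(NamedTuple):
    ref: Optional[str]        # branch or tag; None for ConfigMaps
    algorithm: Optional[str]  # "sha1" or "sha256"; None for ConfigMaps
    digest: str               # commit, image digest, or resourceVersion


def parse_revision(source_type: str, revision: str) -> Revision:
    """Split a revision string according to its source type."""
    if source_type == "configmap":
        # ConfigMap revisions are a bare resourceVersion.
        return Revision(None, None, revision)
    # Git and OCI share the shape <ref>@<algorithm>:<digest>.
    ref, at, pinned = revision.rpartition("@")
    algorithm, colon, digest = pinned.partition(":")
    if not at or not colon or not digest:
        raise ValueError(f"unexpected revision format: {revision!r}")
    return Revision(ref, algorithm, digest)


git_rev = parse_revision("git", "main@sha1:abc123")
oci_rev = parse_revision("oci", "v1.0@sha256:def456")
cm_rev = parse_revision("configmap", "12345")
print(git_rev.ref, git_rev.digest)  # main abc123

# Change detection: re-test only when the pinned digest differs.
previous = parse_revision("git", "main@sha1:abc123")
print(previous.digest != git_rev.digest)  # False
```

Comparing digests rather than branch or tag names is what makes a re-run reproducible: a moving ref such as main can point at different content over time, while the digest cannot.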
GitOps Integration
Arena integrates naturally with GitOps workflows:
- Developers push prompt changes to Git
- ArenaSource detects changes and updates artifacts
- ArenaJob runs tests against the new version
- Results inform whether to promote changes

    Developer → Git Push → ArenaSource → ArenaJob → Results
                                ↓
                         Artifact Update

Next Steps
- ArenaSource CRD Reference: Complete spec details
- ArenaJob CRD Reference: Job execution details