ArenaJob CRD
The ArenaJob custom resource defines a test execution that runs scenarios from an ArenaConfig. It supports evaluation, load testing, and data generation job types with configurable workers and output destinations.
API Version
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
```

Overview

ArenaJob provides:
- Multiple job types: Evaluation, load testing, and data generation
- Worker scaling: Configure replicas and autoscaling
- Flexible output: Store results in S3 or PVC
- Scheduling support: Cron-based recurring execution
- Progress tracking: Real-time status and progress updates
Spec Fields
sourceRef

Reference to the ArenaSource containing test scenarios and configuration.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name of the ArenaSource |

```yaml
spec:
  sourceRef:
    name: my-evaluation-source
```

type

The type of job to execute.
| Value | Description |
|---|---|
| evaluation | Run prompt evaluation against test scenarios (default) |
| loadtest | Run load testing against providers |
| datagen | Generate synthetic data using prompts |
```yaml
spec:
  type: evaluation
```

scenarios

Override scenario selection from the ArenaConfig. If not specified, the ArenaConfig’s scenario settings are used.
| Field | Type | Description |
|---|---|---|
| include | []string | Glob patterns for scenarios to include |
| exclude | []string | Glob patterns for scenarios to exclude |
```yaml
spec:
  scenarios:
    include:
      - "scenarios/critical-*.yaml"
    exclude:
      - "*-slow.yaml"
```

evaluation

Settings specific to evaluation jobs.
| Field | Type | Description |
|---|---|---|
| outputFormats | []string | Result formats: junit, json, csv |
```yaml
spec:
  type: evaluation
  evaluation:
    outputFormats:
      - junit
      - json
```

loadTest

Settings specific to load testing jobs.
| Field | Type | Default | Description |
|---|---|---|---|
| rampUp | string | "30s" | Duration to ramp up to the target rate |
| duration | string | "5m" | Total test duration |
| targetRPS | integer | - | Target requests per second |
```yaml
spec:
  type: loadtest
  loadTest:
    rampUp: 1m
    duration: 10m
    targetRPS: 100
```

dataGen

Settings specific to data generation jobs.
| Field | Type | Default | Description |
|---|---|---|---|
| count | integer | 100 | Number of items to generate |
| format | string | "jsonl" | Output format: json, jsonl, csv |
```yaml
spec:
  type: datagen
  dataGen:
    count: 1000
    format: jsonl
```

workers

Configure the worker pool for job execution.
| Field | Type | Default | Description |
|---|---|---|---|
| replicas | integer | 1 | Number of worker replicas |
| minReplicas | integer | - | Minimum for autoscaling |
| maxReplicas | integer | - | Maximum for autoscaling |
```yaml
spec:
  workers:
    replicas: 10
```

For autoscaling:

```yaml
spec:
  workers:
    minReplicas: 2
    maxReplicas: 20
```

providers

Map of group names to lists of provider/agent entries. Groups correspond to the arena config’s provider groups (e.g., "default", "judge", "selfplay"). When set, provider YAML files from the arena project are ignored and the worker resolves providers directly from CRDs.
Each entry is an ArenaProviderEntry with exactly one of the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| providerRef | object | Conditional | Reference to a Provider CRD |
| providerRef.name | string | Yes | Name of the Provider resource |
| providerRef.namespace | string | No | Namespace (defaults to the ArenaJob’s namespace) |
| agentRef | object | Conditional | Reference to an AgentRuntime CRD |
| agentRef.name | string | Yes | Name of the AgentRuntime resource |
A CEL validation rule enforces that exactly one of providerRef or agentRef is set on each entry. Setting both or neither will be rejected at admission time.
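In a generated CRD schema, this kind of mutual-exclusion rule is typically expressed with an `x-kubernetes-validations` entry. The following is an illustrative sketch of what such a rule could look like, not the exact rule shipped with the operator:

```yaml
# Illustrative only — the operator's actual schema may phrase this differently.
x-kubernetes-validations:
  - rule: "has(self.providerRef) != has(self.agentRef)"
    message: "exactly one of providerRef or agentRef must be set"
```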
Agents and LLM providers are interchangeable in the scenario × provider matrix. An agentRef entry causes the worker to connect to the agent over WebSocket instead of making direct LLM API calls.
Example: Single Provider Group
```yaml
spec:
  providers:
    default:
      - providerRef:
          name: gpt4-prod
```

Example: Multiple Providers in a Group
Section titled “Example: Multiple Providers in a Group”When a group contains multiple entries, each provider is evaluated against every scenario:
```yaml
spec:
  providers:
    default:
      - providerRef:
          name: gpt4-prod
      - providerRef:
          name: claude-sonnet
      - providerRef:
          name: gemini-pro
```

Example: Separate Judge Provider
Section titled “Example: Separate Judge Provider”Use a dedicated provider group for the judge (evaluator) model:
```yaml
spec:
  providers:
    default:
      - providerRef:
          name: gpt4-prod
      - providerRef:
          name: claude-sonnet
    judge:
      - providerRef:
          name: claude-opus
```

Example: Agent Entry
Section titled “Example: Agent Entry”Reference a deployed AgentRuntime instead of a raw LLM provider. The worker connects to the agent’s WebSocket endpoint:
```yaml
spec:
  providers:
    default:
      - agentRef:
          name: my-support-agent
```

Example: Self-Play with Mixed Types
Section titled “Example: Self-Play with Mixed Types”Mix LLM providers and agents in a self-play evaluation:
```yaml
spec:
  providers:
    selfplay:
      - providerRef:
          name: gpt4-prod
      - agentRef:
          name: my-agent-v2
    judge:
      - providerRef:
          name: claude-opus
```

Example: Cross-Namespace Provider
Section titled “Example: Cross-Namespace Provider”Reference a Provider in a different namespace:
```yaml
spec:
  providers:
    default:
      - providerRef:
          name: shared-gpt4
          namespace: shared-providers
```

toolRegistries

List of ToolRegistry CRD references whose discovered tools replace the arena config’s tool and MCP server file references. When set, tool YAML files from the arena project are ignored.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name of the ToolRegistry resource |
```yaml
spec:
  toolRegistries:
    - name: production-tools
```

How Tool Registries Work

- The controller reads each referenced ToolRegistry CRD
- Discovered tools from each registry’s status are extracted
- These tools replace any tools defined in the arena config files
- The worker receives the resolved tool endpoints via configuration
This is useful for:
- Switching between mock and real tool implementations per environment
- Routing tool calls to different endpoints
- Dynamic service discovery for tool handlers
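For illustration, the discovered tools the controller reads might surface in a ToolRegistry status shaped like the following. The field names and endpoints here are assumptions for the sketch, not the actual ToolRegistry schema:

```yaml
# Hypothetical ToolRegistry status shape — consult the ToolRegistry reference
# for the real field names.
status:
  discoveredTools:
    - name: lookup-order
      endpoint: http://core-tools.tools.svc:8080/lookup-order
    - name: process-refund
      endpoint: http://billing-tools.tools.svc:8080/process-refund
```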
Example: Multiple Tool Registries
Section titled “Example: Multiple Tool Registries”spec: toolRegistries: - name: core-tools - name: billing-toolsCombining Providers and Tool Registries
Section titled “Combining Providers and Tool Registries”You can use both providers and toolRegistries together for complete CRD-based runtime configuration:
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: production-eval
spec:
  sourceRef:
    name: my-source
  providers:
    default:
      - providerRef:
          name: gpt4-prod
      - providerRef:
          name: claude-sonnet
    judge:
      - providerRef:
          name: claude-opus
  toolRegistries:
    - name: production-tools
  workers:
    replicas: 5
  output:
    type: s3
    s3:
      bucket: arena-results
      prefix: "evals/"
```

output

Configure where job results are stored.
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Destination type: s3, pvc |
| s3 | object | Conditional | S3 configuration (when type is s3) |
| pvc | object | Conditional | PVC configuration (when type is pvc) |
S3 Output
| Field | Type | Required | Description |
|---|---|---|---|
| bucket | string | Yes | S3 bucket name |
| prefix | string | No | Key prefix for objects |
| region | string | No | AWS region |
| endpoint | string | No | Custom S3-compatible endpoint |
| secretRef | object | No | Credentials secret reference |
```yaml
spec:
  output:
    type: s3
    s3:
      bucket: arena-results
      prefix: "evals/nightly/"
      region: us-west-2
      secretRef:
        name: s3-credentials
```

PVC Output

| Field | Type | Required | Description |
|---|---|---|---|
| claimName | string | Yes | PVC name |
| subPath | string | No | Subdirectory within the PVC |
```yaml
spec:
  output:
    type: pvc
    pvc:
      claimName: arena-results-pvc
      subPath: "evals/"
```

schedule

Configure scheduled/recurring job execution.
| Field | Type | Default | Description |
|---|---|---|---|
| cron | string | - | Cron expression for scheduling |
| timezone | string | "UTC" | Timezone for the cron expression |
| concurrencyPolicy | string | "Forbid" | Allow, Forbid, or Replace |
```yaml
spec:
  schedule:
    cron: "0 2 * * *"  # 2am daily
    timezone: "America/New_York"
    concurrencyPolicy: Forbid
```

ttlSecondsAfterFinished

How long to keep completed jobs before automatic cleanup.
```yaml
spec:
  ttlSecondsAfterFinished: 86400  # 24 hours
```

Status Fields

phase

The current lifecycle phase of the job.

| Value | Description |
|---|---|
| Pending | Job is waiting to start |
| Running | Job is actively executing |
| Succeeded | Job completed successfully |
| Failed | Job failed |
| Cancelled | Job was cancelled |
progress
Tracks job execution progress.
| Field | Description |
|---|---|
| total | Total number of work items |
| completed | Successfully completed items |
| failed | Failed items |
| pending | Pending items |
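Putting the fields above together, the status of a mid-run job might look like this (illustrative values):

```yaml
status:
  phase: Running
  progress:
    total: 120
    completed: 85
    failed: 2
    pending: 33
  activeWorkers: 10
```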
result
Contains summary results for completed jobs.
| Field | Description |
|---|---|
url | URL to access detailed results |
summary | Aggregated result metrics |
conditions
| Type | Description |
|---|---|
| Ready | Overall readiness of the job |
| ConfigValid | ArenaConfig reference is valid and ready |
| JobCreated | Worker K8s Job has been created |
| Progressing | Job is actively executing workers |
Timing Fields
| Field | Description |
|---|---|
| startTime | When the job started |
| completionTime | When the job completed |
| lastScheduleTime | Last scheduled job trigger |
| nextScheduleTime | Next scheduled execution |
activeWorkers
Current number of active worker pods.
Complete Examples
Basic Evaluation Job
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: basic-eval
  namespace: arena
spec:
  sourceRef:
    name: my-config
```

Multi-Worker Evaluation
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: parallel-eval
  namespace: arena
spec:
  sourceRef:
    name: provider-comparison
  type: evaluation
  evaluation:
    outputFormats:
      - junit
      - json
  workers:
    replicas: 10
  output:
    type: s3
    s3:
      bucket: arena-results
      prefix: "evals/parallel/"
```

Scheduled Nightly Evaluation
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: nightly-eval
  namespace: arena
spec:
  sourceRef:
    name: production-tests
  type: evaluation
  workers:
    replicas: 5
  output:
    type: s3
    s3:
      bucket: arena-results
      prefix: "evals/nightly/"
  schedule:
    cron: "0 2 * * *"
    timezone: "UTC"
  ttlSecondsAfterFinished: 604800  # 7 days
```

Load Testing Job
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: provider-loadtest
  namespace: arena
spec:
  sourceRef:
    name: load-test-config
  type: loadtest
  loadTest:
    rampUp: 2m
    duration: 30m
    targetRPS: 500
  workers:
    minReplicas: 5
    maxReplicas: 50
  output:
    type: s3
    s3:
      bucket: loadtest-results
      prefix: "loadtests/"
```

Data Generation Job
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: synthetic-data
  namespace: arena
spec:
  sourceRef:
    name: datagen-config
  type: datagen
  dataGen:
    count: 10000
    format: jsonl
  workers:
    replicas: 4
  output:
    type: pvc
    pvc:
      claimName: generated-data
      subPath: "batch-001/"
```

Workflow
Section titled “Workflow”- Create ArenaConfig - Define test configuration with providers and settings
- Create ArenaJob - Reference the config and specify execution parameters
- Monitor Progress - Watch status.progress for completion
- Retrieve Results - Access results from configured output destination
```
ArenaConfig ──▶ ArenaJob ──▶ Workers ──▶ Results
                                │
                                ├──▶ Progress tracking
                                └──▶ Output storage
```

Related Resources
- ArenaSource: Defines bundle sources
- ArenaConfig: Test configuration
- Provider: LLM provider configuration