
ArenaJob CRD

The ArenaJob custom resource defines a test execution that runs scenarios from an ArenaConfig. It supports evaluation, load testing, and data generation job types with configurable workers and output destinations.

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob

ArenaJob provides:

  • Multiple job types: Evaluation, load testing, and data generation
  • Worker scaling: Configure replicas and autoscaling
  • Flexible output: Store results in S3 or PVC
  • Scheduling support: Cron-based recurring execution
  • Progress tracking: Real-time status and progress updates

spec.sourceRef

Reference to the ArenaSource containing test scenarios and configuration.

Field   Type     Required   Description
name    string   Yes        Name of the ArenaSource

spec:
  sourceRef:
    name: my-evaluation-source

spec.type

The type of job to execute.

Value        Description
evaluation   Run prompt evaluation against test scenarios (default)
loadtest     Run load testing against providers
datagen      Generate synthetic data using prompts

spec:
  type: evaluation

spec.scenarios

Override scenario selection from the ArenaConfig. If not specified, the ArenaConfig’s scenario settings are used.

Field     Type       Description
include   []string   Glob patterns for scenarios to include
exclude   []string   Glob patterns for scenarios to exclude

spec:
  scenarios:
    include:
      - "scenarios/critical-*.yaml"
    exclude:
      - "*-slow.yaml"
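The include/exclude semantics can be sketched with Python's fnmatch; the exclude-takes-precedence ordering here is an assumption for illustration, not a documented guarantee:

```python
from fnmatch import fnmatch

def select_scenarios(paths, include, exclude):
    """Illustrative glob filtering: include first, then drop excludes."""
    picked = [p for p in paths if any(fnmatch(p, g) for g in include)]
    return [p for p in picked if not any(fnmatch(p, g) for g in exclude)]

paths = [
    "scenarios/critical-login.yaml",
    "scenarios/critical-checkout-slow.yaml",
    "scenarios/smoke.yaml",
]
print(select_scenarios(paths, ["scenarios/critical-*.yaml"], ["*-slow.yaml"]))
# → ['scenarios/critical-login.yaml']
```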

spec.evaluation

Settings specific to evaluation jobs.

Field           Type       Description
outputFormats   []string   Result formats: junit, json, csv

spec:
  type: evaluation
  evaluation:
    outputFormats:
      - junit
      - json

spec.loadTest

Settings specific to load testing jobs.

Field       Type      Default   Description
rampUp      string    "30s"     Duration to ramp up to the target rate
duration    string    "5m"      Total test duration
targetRPS   integer   -         Target requests per second

spec:
  type: loadtest
  loadTest:
    rampUp: 1m
    duration: 10m
    targetRPS: 100
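A common reading of rampUp is a linear increase to targetRPS over the ramp window; the sketch below illustrates that interpretation (an assumption, not the worker's documented algorithm):

```python
def target_rps_at(t_seconds, ramp_up_seconds, target_rps):
    """Linear ramp: scale the target rate until the ramp-up completes."""
    if t_seconds >= ramp_up_seconds:
        return float(target_rps)
    return target_rps * t_seconds / ramp_up_seconds

# rampUp: 1m, targetRPS: 100 -> halfway through the ramp, 50 RPS
print(target_rps_at(30, 60, 100))  # → 50.0
```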

spec.dataGen

Settings specific to data generation jobs.

Field    Type      Default   Description
count    integer   100       Number of items to generate
format   string    "jsonl"   Output format: json, jsonl, csv

spec:
  type: datagen
  dataGen:
    count: 1000
    format: jsonl

spec.workers

Configure the worker pool for job execution.

Field         Type      Default   Description
replicas      integer   1         Number of worker replicas
minReplicas   integer   -         Minimum for autoscaling
maxReplicas   integer   -         Maximum for autoscaling

spec:
  workers:
    replicas: 10

For autoscaling:

spec:
  workers:
    minReplicas: 2
    maxReplicas: 20

spec.providers

Map of group names to lists of provider/agent entries. Groups correspond to the arena config’s provider groups (e.g., "default", "judge", "selfplay"). When set, provider YAML files from the arena project are ignored and the worker resolves providers directly from CRDs.

Each entry is an ArenaProviderEntry with exactly one of the following fields:

Field                   Type     Required      Description
providerRef             object   Conditional   Reference to a Provider CRD
providerRef.name        string   Yes           Name of the Provider resource
providerRef.namespace   string   No            Namespace (defaults to the ArenaJob’s namespace)
agentRef                object   Conditional   Reference to an AgentRuntime CRD
agentRef.name           string   Yes           Name of the AgentRuntime resource

A CEL validation rule enforces that exactly one of providerRef or agentRef is set on each entry. Setting both or neither will be rejected at admission time.
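The constraint described above could be expressed with a CEL rule of roughly this shape (an illustrative sketch, not the literal CRD manifest):

```yaml
# Illustrative x-kubernetes-validations sketch of the one-of constraint;
# the actual rule in the CRD schema may be written differently.
x-kubernetes-validations:
  - rule: "(has(self.providerRef) ? 1 : 0) + (has(self.agentRef) ? 1 : 0) == 1"
    message: "exactly one of providerRef or agentRef must be set"
```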

Agents and LLM providers are interchangeable in the scenario × provider matrix. An agentRef entry causes the worker to connect to the agent over WebSocket instead of making direct LLM API calls.

A single provider in the default group:

spec:
  providers:
    default:
      - providerRef:
          name: gpt4-prod

When a group contains multiple entries, each provider is evaluated against every scenario:

spec:
  providers:
    default:
      - providerRef:
          name: gpt4-prod
      - providerRef:
          name: claude-sonnet
      - providerRef:
          name: gemini-pro
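The resulting fan-out is the Cartesian product of scenarios and group entries; a minimal sketch with hypothetical scenario and provider names:

```python
from itertools import product

scenarios = ["checkout.yaml", "refund.yaml"]
providers = ["gpt4-prod", "claude-sonnet", "gemini-pro"]

# Every provider in the group runs against every scenario.
work_items = list(product(scenarios, providers))
print(len(work_items))  # → 6 (2 scenarios x 3 providers)
```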

Use a dedicated provider group for the judge (evaluator) model:

spec:
  providers:
    default:
      - providerRef:
          name: gpt4-prod
      - providerRef:
          name: claude-sonnet
    judge:
      - providerRef:
          name: claude-opus

Reference a deployed AgentRuntime instead of a raw LLM provider. The worker connects to the agent’s WebSocket endpoint:

spec:
  providers:
    default:
      - agentRef:
          name: my-support-agent

Mix LLM providers and agents in a self-play evaluation:

spec:
  providers:
    selfplay:
      - providerRef:
          name: gpt4-prod
      - agentRef:
          name: my-agent-v2
    judge:
      - providerRef:
          name: claude-opus

Reference a Provider in a different namespace:

spec:
  providers:
    default:
      - providerRef:
          name: shared-gpt4
          namespace: shared-providers

spec.toolRegistries

List of ToolRegistry CRD references whose discovered tools replace the arena config’s tool and MCP server file references. When set, tool YAML files from the arena project are ignored.

Field   Type     Required   Description
name    string   Yes        Name of the ToolRegistry resource

spec:
  toolRegistries:
    - name: production-tools

Resolution works as follows:

  1. The controller reads each referenced ToolRegistry CRD
  2. Discovered tools from each registry’s status are extracted
  3. These tools replace any tools defined in the arena config files
  4. The worker receives the resolved tool endpoints via configuration

This is useful for:

  • Switching between mock and real tool implementations per environment
  • Routing tool calls to different endpoints
  • Dynamic service discovery for tool handlers
Multiple registries can be referenced together:

spec:
  toolRegistries:
    - name: core-tools
    - name: billing-tools
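The replace (not merge) semantics can be sketched as follows; the dictionary shapes and the discoveredTools field name are illustrative assumptions, not the controller's actual types:

```python
def resolve_tools(file_tools, registries):
    """If any ToolRegistry is referenced, its discovered tools replace
    the file-based tools entirely; otherwise the files are used as-is."""
    if not registries:
        return list(file_tools)
    return [t for reg in registries for t in reg["discoveredTools"]]

file_tools = ["mock-search", "mock-billing"]
registries = [
    {"name": "core-tools", "discoveredTools": ["search"]},
    {"name": "billing-tools", "discoveredTools": ["billing"]},
]
print(resolve_tools(file_tools, registries))  # → ['search', 'billing']
```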

You can use both providers and toolRegistries together for complete CRD-based runtime configuration:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: production-eval
spec:
  sourceRef:
    name: my-source
  providers:
    default:
      - providerRef:
          name: gpt4-prod
      - providerRef:
          name: claude-sonnet
    judge:
      - providerRef:
          name: claude-opus
  toolRegistries:
    - name: production-tools
  workers:
    replicas: 5
  output:
    type: s3
    s3:
      bucket: arena-results
      prefix: "evals/"

spec.output

Configure where job results are stored.

Field   Type     Required      Description
type    string   Yes           Destination type: s3, pvc
s3      object   Conditional   S3 configuration (when type is s3)
pvc     object   Conditional   PVC configuration (when type is pvc)

S3 options:

Field       Type     Required   Description
bucket      string   Yes        S3 bucket name
prefix      string   No         Key prefix for objects
region      string   No         AWS region
endpoint    string   No         Custom S3-compatible endpoint
secretRef   object   No         Credentials secret reference

spec:
  output:
    type: s3
    s3:
      bucket: arena-results
      prefix: "evals/nightly/"
      region: us-west-2
      secretRef:
        name: s3-credentials

PVC options:

Field       Type     Required   Description
claimName   string   Yes        PVC name
subPath     string   No         Subdirectory within the PVC

spec:
  output:
    type: pvc
    pvc:
      claimName: arena-results-pvc
      subPath: "evals/"

spec.schedule

Configure scheduled/recurring job execution.

Field               Type     Default    Description
cron                string   -          Cron expression for scheduling
timezone            string   "UTC"      Timezone for the cron expression
concurrencyPolicy   string   "Forbid"   Allow, Forbid, or Replace

spec:
  schedule:
    cron: "0 2 * * *"  # 2am daily
    timezone: "America/New_York"
    concurrencyPolicy: Forbid

spec.ttlSecondsAfterFinished

How long to keep completed jobs before automatic cleanup.

spec:
  ttlSecondsAfterFinished: 86400  # 24 hours

status.phase

Value       Description
Pending     Job is waiting to start
Running     Job is actively executing
Succeeded   Job completed successfully
Failed      Job failed
Cancelled   Job was cancelled

status.progress

Tracks job execution progress.

Field       Description
total       Total number of work items
completed   Successfully completed items
failed      Failed items
pending     Pending items

status.results

Contains summary results for completed jobs.

Field     Description
url       URL to access detailed results
summary   Aggregated result metrics

status.conditions

Type          Description
Ready         Overall readiness of the job
ConfigValid   ArenaConfig reference is valid and ready
JobCreated    Worker K8s Job has been created
Progressing   Job is actively executing workers

Timestamps:

Field              Description
startTime          When the job started
completionTime     When the job completed
lastScheduleTime   Last scheduled job trigger
nextScheduleTime   Next scheduled execution

status.activeWorkers

Current number of active worker pods.
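Putting the status fields together, a finished job's status might look roughly like this (field values are illustrative, not output from a real job):

```yaml
# Illustrative status block; values and the results URL shape are made up.
status:
  phase: Succeeded
  progress:
    total: 120
    completed: 118
    failed: 2
    pending: 0
  startTime: "2025-01-01T02:00:00Z"
  completionTime: "2025-01-01T02:14:30Z"
  activeWorkers: 0
```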

Examples

Basic evaluation:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: basic-eval
  namespace: arena
spec:
  sourceRef:
    name: my-config
Parallel evaluation across workers:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: parallel-eval
  namespace: arena
spec:
  sourceRef:
    name: provider-comparison
  type: evaluation
  evaluation:
    outputFormats:
      - junit
      - json
  workers:
    replicas: 10
  output:
    type: s3
    s3:
      bucket: arena-results
      prefix: "evals/parallel/"
Nightly scheduled evaluation:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: nightly-eval
  namespace: arena
spec:
  sourceRef:
    name: production-tests
  type: evaluation
  workers:
    replicas: 5
  output:
    type: s3
    s3:
      bucket: arena-results
      prefix: "evals/nightly/"
  schedule:
    cron: "0 2 * * *"
    timezone: "UTC"
  ttlSecondsAfterFinished: 604800  # 7 days
Load test with autoscaling workers:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: provider-loadtest
  namespace: arena
spec:
  sourceRef:
    name: load-test-config
  type: loadtest
  loadTest:
    rampUp: 2m
    duration: 30m
    targetRPS: 500
  workers:
    minReplicas: 5
    maxReplicas: 50
  output:
    type: s3
    s3:
      bucket: loadtest-results
      prefix: "loadtests/"
Data generation to a PVC:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: synthetic-data
  namespace: arena
spec:
  sourceRef:
    name: datagen-config
  type: datagen
  dataGen:
    count: 10000
    format: jsonl
  workers:
    replicas: 4
  output:
    type: pvc
    pvc:
      claimName: generated-data
      subPath: "batch-001/"
Typical workflow:

  1. Create ArenaConfig - Define test configuration with providers and settings
  2. Create ArenaJob - Reference the config and specify execution parameters
  3. Monitor Progress - Watch status.progress for completion
  4. Retrieve Results - Access results from the configured output destination

ArenaConfig ──▶ ArenaJob ──▶ Workers ──▶ Results
                             ├──▶ Progress tracking
                             └──▶ Output storage