Skip to content

Arena Fleet: Run Your First Evaluation

This tutorial walks you through running your first prompt evaluation using Arena Fleet. By the end, you’ll have a complete evaluation pipeline running in your cluster.

Before you begin, ensure you have:

  • A Kubernetes cluster with Omnia installed
  • kubectl configured to access your cluster
  • An LLM provider configured (or use the demo Ollama setup)

Arena Fleet evaluates prompts through three CRDs:

ArenaSource → ArenaConfig → ArenaJob → Results
│ │ │
│ │ └── Executes the evaluation
│ └── Defines what to test and how
└── Fetches your PromptKit bundle

An ArenaSource defines where to fetch your PromptKit bundle from. For this tutorial, we’ll use a ConfigMap source.

First, create a ConfigMap with a simple PromptKit bundle containing a test scenario:

apiVersion: v1
kind: ConfigMap
metadata:
name: greeting-prompts
namespace: default
data:
pack.json: |
{
"$schema": "https://promptpack.org/schema/latest/promptpack.schema.json",
"id": "greeting-prompts",
"name": "Greeting Prompts",
"version": "1.0.0",
"template_engine": {
"version": "v1",
"syntax": "{{variable}}"
},
"prompts": {
"greeting": {
"id": "greeting",
"name": "Greeting Prompt",
"version": "1.0.0",
"system_template": "You are a friendly assistant. Respond warmly to greetings.",
"user_template": "Say hello to {{name}}.",
"parameters": {
"temperature": 0.7
}
}
},
"scenarios": {
"greeting-test": {
"id": "greeting-test",
"name": "Greeting Test",
"prompt_ref": "greeting",
"variables": {
"name": "World"
},
"assertions": [
{
"type": "contains",
"value": "hello",
"case_insensitive": true
}
]
}
}
}

Now create the ArenaSource:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaSource
metadata:
name: greeting-source
namespace: default
spec:
type: configmap
configMap:
name: greeting-prompts
key: pack.json
interval: 5m

Apply both resources:

Terminal window
kubectl apply -f configmap.yaml
kubectl apply -f arenasource.yaml

Verify the source is ready:

Terminal window
kubectl get arenasource greeting-source

You should see:

NAME TYPE PHASE REVISION AGE
greeting-source configmap Ready 12345 10s

If you don’t already have a Provider configured, create one:

apiVersion: v1
kind: Secret
metadata:
name: llm-credentials
namespace: default
type: Opaque
stringData:
ANTHROPIC_API_KEY: "sk-ant-..." # Or OPENAI_API_KEY
---
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: Provider
metadata:
name: claude-provider
namespace: default
spec:
type: claude
model: claude-sonnet-4-20250514
secretRef:
name: llm-credentials
Terminal window
kubectl apply -f provider.yaml

Verify the provider is ready:

Terminal window
kubectl get provider claude-provider

The ArenaConfig combines your source with providers and evaluation settings:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaConfig
metadata:
name: greeting-eval
namespace: default
spec:
sourceRef:
name: greeting-source
providers:
- name: claude-provider
evaluation:
timeout: "2m"
maxRetries: 2
concurrency: 1
metrics:
- latency
- tokens
Terminal window
kubectl apply -f arenaconfig.yaml

Verify the config is ready:

Terminal window
kubectl get arenaconfig greeting-eval

Create an ArenaJob to execute the evaluation:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
name: greeting-eval-001
namespace: default
spec:
sourceRef:
name: greeting-eval
type: evaluation
evaluation:
outputFormats:
- json
workers:
replicas: 1
ttlSecondsAfterFinished: 3600
Terminal window
kubectl apply -f arenajob.yaml

Watch the job progress:

Terminal window
kubectl get arenajob greeting-eval-001 -w

You’ll see the job progress through phases:

NAME PHASE PROGRESS AGE
greeting-eval-001 Pending 0/1 5s
greeting-eval-001 Running 0/1 10s
greeting-eval-001 Running 1/1 25s
greeting-eval-001 Succeeded 1/1 30s

Get detailed status:

Terminal window
kubectl get arenajob greeting-eval-001 -o yaml

The status section shows:

status:
phase: Succeeded
progress:
total: 1
completed: 1
failed: 0
result:
summary:
passed: 1
failed: 0
duration: "5.2s"

For jobs with S3 or PVC output configured, results are stored at the configured location. For this simple example, view results in the job status:

Terminal window
kubectl describe arenajob greeting-eval-001

To see worker logs:

Terminal window
kubectl logs -l arena.omnia.altairalabs.ai/job=greeting-eval-001

Arena Fleet evaluations produce results showing:

  • Pass/Fail: Whether assertions passed
  • Latency: Response time from the LLM
  • Tokens: Input/output token counts
  • Cost: Estimated cost (if pricing configured)

Example result summary:

{
"job": "greeting-eval-001",
"scenarios": [
{
"id": "greeting-test",
"provider": "claude-provider",
"passed": true,
"latency_ms": 1234,
"tokens": {
"input": 45,
"output": 28
},
"assertions": [
{
"type": "contains",
"expected": "hello",
"actual": "Hello, World! How can I help you today?",
"passed": true
}
]
}
]
}

Now that you’ve run your first evaluation:

Remove the resources created in this tutorial:

Terminal window
kubectl delete arenajob greeting-eval-001
kubectl delete arenaconfig greeting-eval
kubectl delete arenasource greeting-source
kubectl delete configmap greeting-prompts
kubectl delete provider claude-provider
kubectl delete secret llm-credentials