Arena Fleet: Run Your First Evaluation
This tutorial walks you through running your first prompt evaluation using Arena Fleet. By the end, you’ll have a complete evaluation pipeline running in your cluster.
Prerequisites
Before you begin, ensure you have:
- A Kubernetes cluster with Omnia installed
- kubectl configured to access your cluster
- An LLM provider configured (or use the demo Ollama setup)
Overview
Arena Fleet evaluates prompts through three CRDs:
```text
ArenaSource → ArenaConfig → ArenaJob → Results
     │             │            │
     │             │            └── Executes the evaluation
     │             └── Defines what to test and how
     └── Fetches your PromptKit bundle
```

Step 1: Create an ArenaSource
An ArenaSource defines where to fetch your PromptKit bundle from. For this tutorial, we’ll use a ConfigMap source.
First, create a ConfigMap with a simple PromptKit bundle containing a test scenario:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: greeting-prompts
  namespace: default
data:
  pack.json: |
    {
      "$schema": "https://promptpack.org/schema/latest/promptpack.schema.json",
      "id": "greeting-prompts",
      "name": "Greeting Prompts",
      "version": "1.0.0",
      "template_engine": {
        "version": "v1",
        "syntax": "{{variable}}"
      },
      "prompts": {
        "greeting": {
          "id": "greeting",
          "name": "Greeting Prompt",
          "version": "1.0.0",
          "system_template": "You are a friendly assistant. Respond warmly to greetings.",
          "user_template": "Say hello to {{name}}.",
          "parameters": {
            "temperature": 0.7
          }
        }
      },
      "scenarios": {
        "greeting-test": {
          "id": "greeting-test",
          "name": "Greeting Test",
          "prompt_ref": "greeting",
          "variables": {
            "name": "World"
          },
          "assertions": [
            {
              "type": "contains",
              "value": "hello",
              "case_insensitive": true
            }
          ]
        }
      }
    }
```

Now create the ArenaSource:
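The `contains` assertion above passes when the expected value appears anywhere in the model's response; with `case_insensitive: true`, case is ignored. A minimal Python sketch of that check (an approximation of the semantics, not Arena Fleet's actual assertion engine):

```python
def contains_assertion(response: str, value: str, case_insensitive: bool = False) -> bool:
    """Approximate a 'contains' assertion: pass if `value` occurs in `response`."""
    if case_insensitive:
        return value.lower() in response.lower()
    return value in response

# The greeting-test scenario asserts "hello" with case_insensitive: true,
# so a response starting with "Hello" still passes.
print(contains_assertion("Hello, World! How can I help you today?", "hello", case_insensitive=True))  # → True
```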
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaSource
metadata:
  name: greeting-source
  namespace: default
spec:
  type: configmap
  configMap:
    name: greeting-prompts
    key: pack.json
  interval: 5m
```

Apply both resources:
```sh
kubectl apply -f configmap.yaml
kubectl apply -f arenasource.yaml
```

Verify the source is ready:
```sh
kubectl get arenasource greeting-source
```

You should see:
```text
NAME              TYPE        PHASE   REVISION   AGE
greeting-source   configmap   Ready   12345      10s
```

Step 2: Configure a Provider
If you don’t already have a Provider configured, create one:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: llm-credentials
  namespace: default
type: Opaque
stringData:
  ANTHROPIC_API_KEY: "sk-ant-..."  # Or OPENAI_API_KEY
---
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: claude-provider
  namespace: default
spec:
  type: claude
  model: claude-sonnet-4-20250514
  secretRef:
    name: llm-credentials
```

```sh
kubectl apply -f provider.yaml
```

Verify the provider is ready:
```sh
kubectl get provider claude-provider
```

Step 3: Create an ArenaConfig
The ArenaConfig combines your source with providers and evaluation settings:
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaConfig
metadata:
  name: greeting-eval
  namespace: default
spec:
  sourceRef:
    name: greeting-source
  providers:
    - name: claude-provider
  evaluation:
    timeout: "2m"
    maxRetries: 2
    concurrency: 1
    metrics:
      - latency
      - tokens
```

```sh
kubectl apply -f arenaconfig.yaml
```

Verify the config is ready:
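Conceptually, `timeout` bounds each evaluation call and `maxRetries` allows that many additional attempts after a failure, so a scenario is tried at most `1 + maxRetries` times. A Python sketch of those semantics (an illustration under assumed behavior, not the actual controller code):

```python
def run_with_retries(call, max_retries: int = 2):
    """Sketch: attempt `call` up to 1 + max_retries times, retrying on any
    failure (including timeouts). Returns the first successful result."""
    last_err = None
    for attempt in range(1 + max_retries):
        try:
            return call()
        except Exception as err:  # a real controller would distinguish error kinds
            last_err = err
    raise RuntimeError(f"all {1 + max_retries} attempts failed") from last_err

# Example: a call that fails once, then succeeds on the first retry.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise ConnectionError("transient provider error")
    return "ok"

print(run_with_retries(flaky, max_retries=2))  # → ok
```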
```sh
kubectl get arenaconfig greeting-eval
```

Step 4: Run an ArenaJob
Create an ArenaJob to execute the evaluation:
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ArenaJob
metadata:
  name: greeting-eval-001
  namespace: default
spec:
  sourceRef:
    name: greeting-eval
  type: evaluation
  evaluation:
    outputFormats:
      - json
  workers:
    replicas: 1
  ttlSecondsAfterFinished: 3600
```

```sh
kubectl apply -f arenajob.yaml
```

Step 5: Monitor the Job
Watch the job progress:
```sh
kubectl get arenajob greeting-eval-001 -w
```

You’ll see the job progress through phases:
```text
NAME                PHASE       PROGRESS   AGE
greeting-eval-001   Pending     0/1        5s
greeting-eval-001   Running     0/1        10s
greeting-eval-001   Running     1/1        25s
greeting-eval-001   Succeeded   1/1        30s
```

Get detailed status:
```sh
kubectl get arenajob greeting-eval-001 -o yaml
```

The status section shows:
```yaml
status:
  phase: Succeeded
  progress:
    total: 1
    completed: 1
    failed: 0
  result:
    summary:
      passed: 1
      failed: 0
      duration: "5.2s"
```

Step 6: View Results
For jobs with S3 or PVC output configured, results are stored at the configured location. For this simple example, view results in the job status:
```sh
kubectl describe arenajob greeting-eval-001
```

To see worker logs:
```sh
kubectl logs -l arena.omnia.altairalabs.ai/job=greeting-eval-001
```

Understanding the Results
Arena Fleet evaluations produce results showing:
- Pass/Fail: Whether assertions passed
- Latency: Response time from the LLM
- Tokens: Input/output token counts
- Cost: Estimated cost (if pricing configured)
Example result summary:
```json
{
  "job": "greeting-eval-001",
  "scenarios": [
    {
      "id": "greeting-test",
      "provider": "claude-provider",
      "passed": true,
      "latency_ms": 1234,
      "tokens": {
        "input": 45,
        "output": 28
      },
      "assertions": [
        {
          "type": "contains",
          "expected": "hello",
          "actual": "Hello, World! How can I help you today?",
          "passed": true
        }
      ]
    }
  ]
}
```

Next Steps
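With `outputFormats: json` configured, a short script can summarize an exported result document. A sketch assuming the result shape shown above (field names like `scenarios`, `passed`, and `latency_ms` are taken from that example):

```python
import json

def summarize(results_json: str) -> str:
    """Summarize an Arena Fleet result document: pass rate and mean latency."""
    doc = json.loads(results_json)
    scenarios = doc["scenarios"]
    passed = sum(1 for s in scenarios if s["passed"])
    mean_latency = sum(s["latency_ms"] for s in scenarios) / len(scenarios)
    return f"{passed}/{len(scenarios)} scenarios passed, mean latency {mean_latency:.0f} ms"

example = """{
  "job": "greeting-eval-001",
  "scenarios": [
    {"id": "greeting-test", "provider": "claude-provider",
     "passed": true, "latency_ms": 1234,
     "tokens": {"input": 45, "output": 28}}
  ]
}"""
print(summarize(example))  # → 1/1 scenarios passed, mean latency 1234 ms
```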
Now that you’ve run your first evaluation:
- Configure S3 Storage: Store results in S3 for persistence
- Set Up Scheduled Jobs: Run evaluations on a schedule
- Monitor Job Progress: Track evaluations in real-time
- Use Git Sources: Fetch bundles from Git repositories
- Compare Providers: Test against multiple LLMs
Cleanup
Remove the resources created in this tutorial:
```sh
kubectl delete arenajob greeting-eval-001
kubectl delete arenaconfig greeting-eval
kubectl delete arenasource greeting-source
kubectl delete configmap greeting-prompts
kubectl delete provider claude-provider
kubectl delete secret llm-credentials
```