WebSocket Protocol
This document describes the WebSocket protocol used by Omnia agent facades.
Connection
URL Format

ws://host:port?agent=<agent-name>&namespace=<namespace>&binary=<true|false>

| Parameter | Required | Description |
|---|---|---|
| agent | Yes | Name of the AgentRuntime |
| namespace | No | Namespace (defaults to default) |
| binary | No | Enable binary WebSocket frame support (defaults to false) |
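As a rough illustration of the URL format above, the sketch below assembles a connection URL from its parts. The names `WsConnectOptions` and `buildConnectionUrl` are assumptions made for this example and are not part of Omnia.

```typescript
// Hypothetical helper; the names here are illustrative, not part of Omnia.
interface WsConnectOptions {
  host: string;
  port: number;
  agent: string;       // required: name of the AgentRuntime
  namespace?: string;  // optional: the server defaults to "default"
  binary?: boolean;    // optional: the server defaults to false
}

function buildConnectionUrl(opts: WsConnectOptions): string {
  // Percent-encode values so unusual agent or namespace names stay valid.
  const params: string[] = [`agent=${encodeURIComponent(opts.agent)}`];
  if (opts.namespace !== undefined) {
    params.push(`namespace=${encodeURIComponent(opts.namespace)}`);
  }
  if (opts.binary !== undefined) {
    params.push(`binary=${opts.binary ? "true" : "false"}`);
  }
  return `ws://${opts.host}:${opts.port}?${params.join("&")}`;
}

const exampleUrl = buildConnectionUrl({
  host: "localhost",
  port: 8080,
  agent: "my-agent",
  namespace: "production",
});
```

Optional parameters are omitted when unset, so the server-side defaults listed in the table apply.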
Example
websocat "ws://localhost:8080?agent=my-agent&namespace=production"
Message Types
Client Messages
Messages sent from client to server.
Message
Send a user message to the agent:

{
  "type": "message",
  "content": "Hello, how are you?",
  "session_id": "optional-session-id",
  "metadata": {
    "user_id": "user-123"
  }
}

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Must be "message" |
| content | string | No | User message content (text-only) |
| parts | array | No | Multi-modal content parts (see below) |
| session_id | string | No | Resume existing session |
| metadata | object | No | Custom metadata |

Note: Either content or parts should be provided. If both are present, parts takes precedence.
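The precedence rule in the note can be sketched as a small function. The type and function names below are assumptions for illustration, and treating an empty `parts` array as "present" is a guess, since the protocol description does not say how that case is handled.

```typescript
// Illustrative types; only the fields used in this sketch are modeled.
interface TextPart {
  type: "text";
  text: string;
}
interface MediaPart {
  type: "image" | "audio" | "video" | "file";
  media: { mime_type: string; url?: string; data?: string; storage_ref?: string };
}
type MessagePart = TextPart | MediaPart;

interface ClientChatMessage {
  type: "message";
  content?: string;
  parts?: MessagePart[];
  session_id?: string;
  metadata?: Record<string, unknown>;
}

// Resolve what a receiver would act on: `parts` wins when present,
// otherwise `content` is treated as a single text part.
function effectiveParts(msg: ClientChatMessage): MessagePart[] {
  if (msg.parts !== undefined) {
    return msg.parts; // assumption: any defined array counts as "present"
  }
  if (msg.content !== undefined) {
    return [{ type: "text", text: msg.content }];
  }
  return [];
}
```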
Multi-Modal Message
Send a message with images or other media:

{
  "type": "message",
  "session_id": "sess-abc123",
  "parts": [
    { "type": "text", "text": "What's in this image?" },
    {
      "type": "image",
      "media": {
        "url": "https://example.com/photo.jpg",
        "mime_type": "image/jpeg"
      }
    }
  ]
}

ContentPart Types

| Type | Description |
|---|---|
| text | Plain text content |
| image | Image (JPEG, PNG, GIF, WebP) |
| audio | Audio file (MP3, WAV, OGG) |
| video | Video file (MP4, WebM) |
| file | Generic file attachment |
ContentPart Structure
interface ContentPart {
  type: "text" | "image" | "audio" | "video" | "file"
  text?: string          // For type: "text"
  media?: MediaContent   // For media types
}

interface MediaContent {
  // Data source (exactly one required)
  data?: string          // Base64-encoded (< 256KB recommended)
  url?: string           // HTTP/HTTPS URL
  storage_ref?: string   // Backend storage reference

  // Required
  mime_type: string      // e.g., "image/jpeg", "audio/mp3"

  // Optional metadata
  filename?: string
  size_bytes?: number

  // Image-specific
  width?: number
  height?: number
  detail?: "low" | "high" | "auto"  // Vision model hint

  // Audio/Video-specific
  duration_ms?: number
  sample_rate?: number   // Audio: Hz
  channels?: number      // Audio: 1=mono, 2=stereo
}

Example: Image with Base64 Data

{
  "type": "message",
  "parts": [
    { "type": "text", "text": "Describe this image" },
    {
      "type": "image",
      "media": {
        "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR...",
        "mime_type": "image/png"
      }
    }
  ]
}

Upload Request
Request an upload URL for a file (requires facade media storage to be enabled):

{
  "type": "upload_request",
  "session_id": "sess-abc123",
  "upload_request": {
    "filename": "photo.jpg",
    "mime_type": "image/jpeg",
    "size_bytes": 102400
  }
}

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Must be "upload_request" |
| session_id | string | No | Resume existing session |
| upload_request.filename | string | Yes | Original filename |
| upload_request.mime_type | string | Yes | MIME type of the file |
| upload_request.size_bytes | number | Yes | File size in bytes |
The server responds with an upload_ready message containing the upload URL. After uploading the file via HTTP PUT, the client can reference it using the storage_ref in subsequent messages.
Server Messages
Messages sent from server to client.
Connected
Sent immediately after connection:

{
  "type": "connected",
  "session_id": "sess-abc123"
}

When binary=true is specified in the connection URL, the connected message includes capabilities:

{
  "type": "connected",
  "session_id": "sess-abc123",
  "connected": {
    "capabilities": {
      "binary_frames": true,
      "max_payload_size": 524288,
      "protocol_version": 1
    }
  }
}

| Field | Type | Description |
|---|---|---|
| connected.capabilities.binary_frames | boolean | Server supports binary WebSocket frames |
| connected.capabilities.max_payload_size | number | Maximum payload size in bytes |
| connected.capabilities.protocol_version | number | Binary protocol version |
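Because the capabilities object appears only when binary=true was requested, a client may want to read it defensively. The sketch below shows one way to do that; the interface and function names are assumptions for this example.

```typescript
// Illustrative types mirroring the fields documented above.
interface ServerCapabilities {
  binary_frames: boolean;
  max_payload_size: number;
  protocol_version: number;
}

interface ConnectedMessage {
  type: "connected";
  session_id: string;
  connected?: { capabilities?: ServerCapabilities };
}

// True only when the server explicitly advertised binary frame support.
function supportsBinaryFrames(msg: ConnectedMessage): boolean {
  return msg.connected?.capabilities?.binary_frames === true;
}

// The advertised payload limit, or undefined when no capabilities were sent.
function maxPayloadSize(msg: ConnectedMessage): number | undefined {
  return msg.connected?.capabilities?.max_payload_size;
}

const plainConnected = JSON.parse(
  '{"type":"connected","session_id":"sess-abc123"}'
) as ConnectedMessage;

const binaryConnected = JSON.parse(
  '{"type":"connected","session_id":"sess-abc123","connected":{"capabilities":' +
    '{"binary_frames":true,"max_payload_size":524288,"protocol_version":1}}}'
) as ConnectedMessage;
```

With these inputs, `supportsBinaryFrames(plainConnected)` is false and `supportsBinaryFrames(binaryConnected)` is true, so a client can fall back to JSON-only handling when the capabilities are absent.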
Streaming response chunk:
{
  "type": "chunk",
  "content": "Hello! I'm doing"
}

Final response completion:

{
  "type": "done",
  "content": "Hello! I'm doing great, thank you for asking!"
}

Multi-Modal Response
For responses containing media (e.g., generated images), the server uses the parts array:

{
  "type": "done",
  "session_id": "sess-abc123",
  "parts": [
    { "type": "text", "text": "Here's the image you requested:" },
    {
      "type": "image",
      "media": {
        "url": "https://storage.example.com/generated/img-123.png",
        "mime_type": "image/png",
        "width": 1024,
        "height": 1024
      }
    }
  ]
}

Note: When parts is present, it takes precedence over content. For backward compatibility, text-only responses may use either format.
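A client that only needs the text of a response can apply the same precedence rule when reading a done message. This is a minimal sketch; the names are hypothetical, and joining multiple text parts without a separator is an assumption.

```typescript
// Illustrative types; media fields are omitted because only text is read here.
interface ResponsePart {
  type: string;
  text?: string;
}

interface DoneMessage {
  type: "done";
  content?: string;
  parts?: ResponsePart[];
}

// Text of a `done` message: text parts when `parts` is present, else `content`.
function doneText(msg: DoneMessage): string {
  if (msg.parts !== undefined) {
    return msg.parts
      .filter((p) => p.type === "text")
      .map((p) => p.text ?? "")
      .join(""); // assumption: text parts are concatenated as-is
  }
  return msg.content ?? "";
}
```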
Tool Call
Agent is calling a tool:

{
  "type": "tool_call",
  "tool_call": {
    "id": "tc-123",
    "name": "weather",
    "arguments": {
      "location": "San Francisco"
    }
  }
}

Tool Result
Result from a tool call:

{
  "type": "tool_result",
  "tool_result": {
    "id": "tc-123",
    "result": "72°F, Sunny"
  }
}

Upload Ready
Response to an upload_request with the upload URL:

{
  "type": "upload_ready",
  "session_id": "sess-abc123",
  "upload_ready": {
    "upload_id": "upl-xyz789",
    "upload_url": "http://agent.example.com/media/upload/upl-xyz789",
    "storage_ref": "omnia://sessions/sess-abc123/media/med-def456",
    "expires_at": "2025-01-09T12:00:00Z"
  }
}

| Field | Type | Description |
|---|---|---|
| upload_ready.upload_id | string | Unique upload identifier |
| upload_ready.upload_url | string | URL to PUT the file content |
| upload_ready.storage_ref | string | Storage reference for the uploaded file |
| upload_ready.expires_at | string | When the upload URL expires (ISO 8601) |
Upload Complete
Notification that a file upload has completed successfully:

{
  "type": "upload_complete",
  "session_id": "sess-abc123",
  "upload_complete": {
    "upload_id": "upl-xyz789",
    "storage_ref": "omnia://sessions/sess-abc123/media/med-def456",
    "size_bytes": 102400
  }
}

| Field | Type | Description |
|---|---|---|
| upload_complete.upload_id | string | Upload identifier |
| upload_complete.storage_ref | string | Storage reference for the uploaded file |
| upload_complete.size_bytes | number | Actual file size in bytes |
Media Chunk
Streaming media chunk for audio/video responses. Allows playback to begin before the entire media is generated:

{
  "type": "media_chunk",
  "session_id": "sess-abc123",
  "media_chunk": {
    "media_id": "audio-xyz789",
    "sequence": 0,
    "is_last": false,
    "data": "//uQxAAAAAANIAAAAAExBTUUzLjEwMFVVVVVVVVVV...",
    "mime_type": "audio/mp3"
  }
}

| Field | Type | Description |
|---|---|---|
| media_chunk.media_id | string | Unique identifier for the media stream |
| media_chunk.sequence | number | Sequence number for ordering (0-indexed) |
| media_chunk.is_last | boolean | Whether this is the final chunk |
| media_chunk.data | string | Base64-encoded chunk data |
| media_chunk.mime_type | string | MIME type (e.g., "audio/mp3", "video/mp4") |

The client should:
- Buffer chunks by media_id and sequence
- Begin playback once sufficient data is buffered
- Assemble the complete media when is_last: true is received
- The final done message may include a complete media URL for replay
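The buffering steps above could look roughly like the following. This is a sketch under assumptions: the class name `MediaChunkBuffer` is hypothetical, chunks are allowed to arrive out of order, and each chunk is kept as its own base64 string, because the description does not say whether chunks can be concatenated before decoding.

```typescript
// Fields of the `media_chunk` object as documented above.
interface MediaChunkPayload {
  media_id: string;
  sequence: number;
  is_last: boolean;
  data: string; // base64-encoded chunk data
  mime_type: string;
}

class MediaChunkBuffer {
  private streams = new Map<string, { chunks: Map<number, string>; lastSeq?: number }>();

  // Stores a chunk. Returns the base64 chunks in sequence order once the
  // stream is complete, otherwise undefined.
  add(chunk: MediaChunkPayload): string[] | undefined {
    let stream = this.streams.get(chunk.media_id);
    if (stream === undefined) {
      stream = { chunks: new Map<number, string>() };
      this.streams.set(chunk.media_id, stream);
    }
    stream.chunks.set(chunk.sequence, chunk.data);
    if (chunk.is_last) {
      stream.lastSeq = chunk.sequence;
    }

    // Wait until the final chunk is known and every sequence number is filled.
    if (stream.lastSeq === undefined || stream.chunks.size !== stream.lastSeq + 1) {
      return undefined;
    }
    const ordered: string[] = [];
    for (let seq = 0; seq <= stream.lastSeq; seq++) {
      const data = stream.chunks.get(seq);
      if (data === undefined) {
        return undefined; // defensive: unexpected gap in the sequence
      }
      ordered.push(data);
    }
    this.streams.delete(chunk.media_id);
    return ordered;
  }
}
```

A real player would typically start decoding before the stream completes; this sketch covers only the ordering and completion check.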
Error message:
{
  "type": "error",
  "error": {
    "code": "INVALID_MESSAGE",
    "message": "Failed to parse message"
  }
}

Error Codes

| Code | Description |
|---|---|
| INVALID_MESSAGE | Message format is invalid |
| SESSION_NOT_FOUND | Specified session doesn’t exist |
| PROVIDER_ERROR | LLM provider returned an error |
| TOOL_ERROR | Tool execution failed |
| INTERNAL_ERROR | Internal server error |
| UPLOAD_FAILED | File upload operation failed |
| MEDIA_NOT_ENABLED | Media storage is not enabled on the facade |
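One way a client might recognize error frames and check the code against the table above is sketched here. The helper names are assumptions, and the list of codes is copied from the table, so it may lag behind the server.

```typescript
// Codes copied from the table above; the server may add others over time.
const KNOWN_ERROR_CODES = [
  "INVALID_MESSAGE",
  "SESSION_NOT_FOUND",
  "PROVIDER_ERROR",
  "TOOL_ERROR",
  "INTERNAL_ERROR",
  "UPLOAD_FAILED",
  "MEDIA_NOT_ENABLED",
] as const;
type KnownErrorCode = (typeof KNOWN_ERROR_CODES)[number];

function isKnownErrorCode(code: string): code is KnownErrorCode {
  return (KNOWN_ERROR_CODES as readonly string[]).indexOf(code) !== -1;
}

interface ServerErrorInfo {
  code: string;
  message: string;
}

// Parses a raw JSON text frame. Returns the error payload, or undefined
// when the frame is some other message type or is malformed.
function parseServerError(raw: string): ServerErrorInfo | undefined {
  const msg = JSON.parse(raw) as {
    type?: unknown;
    error?: { code?: unknown; message?: unknown } | null;
  } | null;
  const err = msg?.type === "error" ? msg.error : undefined;
  if (err === undefined || err === null) {
    return undefined;
  }
  if (typeof err.code !== "string" || typeof err.message !== "string") {
    return undefined;
  }
  return { code: err.code, message: err.message };
}
```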
Message Flow
New Conversation

sequenceDiagram
    participant C as Client
    participant S as Server

    C->>S: WebSocket connect
    S-->>C: connected (session_id)
    C->>S: message
    S-->>C: chunk
    S-->>C: chunk
    S-->>C: done

With Tool Calls

sequenceDiagram
    participant C as Client
    participant S as Server
    participant T as Tool Service

    C->>S: message
    S->>T: Execute tool
    S-->>C: tool_call
    T-->>S: Result
    S-->>C: tool_result
    S-->>C: chunk
    S-->>C: done

With File Upload (WebSocket)

sequenceDiagram
    participant C as Client
    participant S as Server

    C->>S: upload_request
    S-->>C: upload_ready (upload_url, storage_ref)
    C->>S: PUT file to upload_url (HTTP)
    C->>S: message with storage_ref
    S-->>C: chunk
    S-->>C: done

This flow shows uploading a file via WebSocket before sending a message that references it. The client:
- Sends an upload_request via WebSocket
- Receives upload_ready with the upload URL
- PUTs the file content to the upload URL via HTTP
- Sends a message with the storage_ref in the media content
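The WebSocket side of this flow can be sketched as a few message builders. The function names are assumptions for illustration, and the HTTP PUT itself is left out because it depends on the client's HTTP library.

```typescript
// Message shapes as documented in the Upload Request and Upload Ready sections.
interface UploadRequestMessage {
  type: "upload_request";
  session_id?: string;
  upload_request: { filename: string; mime_type: string; size_bytes: number };
}

interface UploadReadyMessage {
  type: "upload_ready";
  session_id: string;
  upload_ready: { upload_id: string; upload_url: string; storage_ref: string; expires_at: string };
}

// Step 1: ask the server for an upload URL.
function buildUploadRequest(
  filename: string,
  mimeType: string,
  sizeBytes: number,
  sessionId?: string
): UploadRequestMessage {
  const msg: UploadRequestMessage = {
    type: "upload_request",
    upload_request: { filename, mime_type: mimeType, size_bytes: sizeBytes },
  };
  if (sessionId !== undefined) {
    msg.session_id = sessionId;
  }
  return msg;
}

// Before step 3: check that the upload URL has not expired relative to `now`.
function isUploadExpired(ready: UploadReadyMessage, now: Date): boolean {
  return Date.parse(ready.upload_ready.expires_at) <= now.getTime();
}

// Step 4: after the HTTP PUT succeeds, reference the file by storage_ref.
function buildImageMessage(ready: UploadReadyMessage, mimeType: string, prompt: string) {
  return {
    type: "message" as const,
    session_id: ready.session_id,
    parts: [
      { type: "text" as const, text: prompt },
      {
        type: "image" as const,
        media: { storage_ref: ready.upload_ready.storage_ref, mime_type: mimeType },
      },
    ],
  };
}
```

Each builder returns a plain object, so the result can be passed to `JSON.stringify` and sent as a text frame.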
With Streaming Media Response
sequenceDiagram
    participant C as Client
    participant S as Server
    participant A as Agent (TTS/Video Gen)

    C->>S: message ("Read this text aloud")
    S->>A: Generate audio
    A-->>S: Audio chunk 1
    S-->>C: media_chunk (seq=0)
    Note over C: Begin playback
    A-->>S: Audio chunk 2
    S-->>C: media_chunk (seq=1)
    A-->>S: Audio chunk 3 (final)
    S-->>C: media_chunk (seq=2, is_last=true)
    S-->>C: done (with complete media URL)

This flow shows streaming audio/video responses. The client:
- Sends a message requesting audio/video generation
- Receives media_chunk messages as data becomes available
- Buffers chunks by media_id and sequence number
- Begins playback once sufficient data is buffered
- Assembles the complete media when is_last: true is received
- Optionally uses the complete media URL from the done message for replay
Session Resumption
sequenceDiagram
    participant C as Client
    participant S as Server
    participant R as Session Store

    C->>S: WebSocket connect
    S-->>C: connected (session_id)
    C->>S: message (with session_id)
    S->>R: Load session history
    R-->>S: History
    S-->>C: done (with context)

Session Handling
New Session
Omit session_id to create a new session:

{"type": "message", "content": "Hello"}

The server responds with a connected message containing the new session ID.
Resume Session
Include session_id to resume:

{
  "type": "message",
  "session_id": "sess-abc123",
  "content": "Continue our conversation"
}

If the session exists and hasn’t expired, conversation history is preserved.
Session Expiration
Sessions expire based on the AgentRuntime’s session.ttl configuration. Attempting to resume an expired session creates a new one.
Media Upload (Optional)
When facade media storage is enabled, clients can upload files via HTTP before referencing them in WebSocket messages. This avoids base64-encoding large files in the WebSocket protocol.
Upload Flow

sequenceDiagram
    participant C as Client
    participant F as Facade

    C->>F: POST /media/request-upload
    F-->>C: {upload_url, storage_ref}
    C->>F: PUT /media/upload/{id} (file content)
    F-->>C: 204 No Content
    C->>F: WebSocket message with storage_ref

Step 1: Request Upload URL

POST /media/request-upload
Content-Type: application/json

{
  "session_id": "sess-abc123",
  "filename": "photo.jpg",
  "mime_type": "image/jpeg",
  "size_bytes": 102400
}

Response:

{
  "upload_url": "http://agent.example.com/media/upload/upl-xyz789",
  "upload_id": "upl-xyz789",
  "storage_ref": "omnia://sessions/sess-abc123/media/med-def456",
  "expires_at": "2025-01-09T12:00:00Z"
}

Step 2: Upload File

PUT /media/upload/upl-xyz789
Content-Type: image/jpeg

<binary file content>

Response: 204 No Content on success.
Step 3: Reference in WebSocket Message

{
  "type": "message",
  "session_id": "sess-abc123",
  "parts": [
    { "type": "text", "text": "What's in this image?" },
    {
      "type": "image",
      "media": {
        "storage_ref": "omnia://sessions/sess-abc123/media/med-def456",
        "mime_type": "image/jpeg"
      }
    }
  ]
}

Media Info Endpoint
Retrieve metadata about uploaded media:

GET /media/info/{session-id}/{media-id}

Response:

{
  "filename": "photo.jpg",
  "mime_type": "image/jpeg",
  "size_bytes": 102400,
  "created_at": "2025-01-09T11:00:00Z",
  "expires_at": "2025-01-10T11:00:00Z"
}

Media Download Endpoint
Download previously uploaded media:

GET /media/download/{session-id}/{media-id}

Returns the file with appropriate Content-Type and Content-Disposition headers.
Note: Media upload is only available when the facade is configured with media storage. See AgentRuntime facade.media configuration for details.
Binary WebSocket Frames
When binary frame support is enabled (binary=true query parameter), the server can send binary WebSocket frames for efficient media streaming. Base64 inflates binary data by about 33%, so sending raw bytes makes media payloads roughly 25% smaller than the same data in base64-encoded JSON.
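The size difference follows from base64 encoding every 3 bytes as 4 characters. The sketch below works through that arithmetic for the media bytes only; it ignores the 32-byte header and the JSON envelope, and the function names are illustrative.

```typescript
// Length of the padded base64 text that encodes `rawBytes` bytes.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Fraction of media bytes saved by sending raw bytes instead of base64 text.
// Expects rawBytes > 0.
function binarySavings(rawBytes: number): number {
  return 1 - rawBytes / base64Length(rawBytes);
}

const rawMediaSize = 393216; // 384 KiB of raw media
const encodedMediaSize = base64Length(rawMediaSize); // 524288 characters
const savedFraction = binarySavings(rawMediaSize); // 0.25
```

Put differently, base64 adds about one third on top of the raw size, which is the same as the raw bytes being about one quarter smaller than the base64 text.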
Binary Frame Structure
┌──────────────────┬─────────────────┬──────────────────────────┐
│ Header (32 bytes)│ Metadata (JSON) │ Binary Payload           │
└──────────────────┴─────────────────┴──────────────────────────┘

Header Layout

| Field | Offset | Size | Type | Description |
|---|---|---|---|---|
| Magic | 0 | 4 | bytes | "OMNI" magic bytes |
| Version | 4 | 1 | uint8 | Protocol version (currently 1) |
| Flags | 5 | 1 | uint8 | Bit flags (see below) |
| MessageType | 6 | 2 | uint16 | Message type (big-endian) |
| MetadataLen | 8 | 4 | uint32 | JSON metadata length (big-endian) |
| PayloadLen | 12 | 4 | uint32 | Binary payload length (big-endian) |
| Sequence | 16 | 4 | uint32 | Sequence number (big-endian) |
| MediaID | 20 | 12 | bytes | Media stream identifier |
Flags
| Bit | Name | Description |
|---|---|---|
| 0 | Compressed | Payload is compressed (reserved) |
| 1 | Chunked | Part of a chunked transfer |
| 2 | IsLast | Last chunk in a stream |
Message Types
| Value | Name | Description |
|---|---|---|
| 1 | MediaChunk | Streaming media chunk |
| 2 | Upload | Binary upload data (reserved) |
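Putting the header layout, flags, and message types together, a header parser might look like the sketch below. The names `BinaryFrameHeader` and `parseFrameHeader` are assumptions, bit 0 is taken to be the least significant bit, and the MediaID is returned as raw bytes because its encoding is not described here.

```typescript
interface BinaryFrameHeader {
  version: number;
  compressed: boolean; // flag bit 0 (reserved)
  chunked: boolean;    // flag bit 1
  isLast: boolean;     // flag bit 2
  messageType: number; // 1 = MediaChunk, 2 = Upload (reserved)
  metadataLen: number;
  payloadLen: number;
  sequence: number;
  mediaId: Uint8Array; // 12 raw bytes
}

const HEADER_SIZE = 32;

function parseFrameHeader(frame: ArrayBuffer): BinaryFrameHeader {
  if (frame.byteLength < HEADER_SIZE) {
    throw new Error("frame is shorter than the 32-byte header");
  }
  const view = new DataView(frame);
  const magic = String.fromCharCode(
    view.getUint8(0),
    view.getUint8(1),
    view.getUint8(2),
    view.getUint8(3)
  );
  if (magic !== "OMNI") {
    throw new Error("bad magic bytes");
  }
  const flags = view.getUint8(5);
  const header: BinaryFrameHeader = {
    version: view.getUint8(4),
    compressed: (flags & 0x01) !== 0,
    chunked: (flags & 0x02) !== 0,
    isLast: (flags & 0x04) !== 0,
    messageType: view.getUint16(6, false), // false = big-endian
    metadataLen: view.getUint32(8, false),
    payloadLen: view.getUint32(12, false),
    sequence: view.getUint32(16, false),
    mediaId: new Uint8Array(frame.slice(20, 32)),
  };
  // Reject frames whose declared lengths run past the end of the buffer.
  if (HEADER_SIZE + header.metadataLen + header.payloadLen > frame.byteLength) {
    throw new Error("declared lengths exceed frame size");
  }
  return header;
}
```

The metadata JSON then occupies bytes 32 to 32 + metadataLen, and the payload follows it for payloadLen bytes.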
Binary Media Chunk
When the binary_frames capability is enabled, media_chunk messages may be sent as binary frames instead of JSON. The metadata contains:

{
  "session_id": "sess-abc123",
  "mime_type": "audio/mp3"
}

The payload contains raw binary audio/video data (not base64-encoded).
Example: JavaScript Binary Frame Handling
const ws = new WebSocket('ws://localhost:8080?agent=my-agent&binary=true');
ws.binaryType = 'arraybuffer';

ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Binary frame
    const view = new DataView(event.data);
    const magic = new TextDecoder().decode(new Uint8Array(event.data, 0, 4));

    if (magic === 'OMNI') {
      const metadataLen = view.getUint32(8, false);
      const payloadLen = view.getUint32(12, false);
      const sequence = view.getUint32(16, false);
      const isLast = (view.getUint8(5) & 0x04) !== 0;

      // Extract payload (raw audio/video data), bounded by the declared length
      const payloadStart = 32 + metadataLen;
      const payload = event.data.slice(payloadStart, payloadStart + payloadLen);

      // Process binary media chunk...
    }
  } else {
    // JSON text frame
    const msg = JSON.parse(event.data);
    // Handle JSON message...
  }
};

Fallback Behavior
When a client doesn’t request binary frames (binary=true not set), the server always sends JSON text frames with base64-encoded media data. This ensures backward compatibility with existing clients.
Connection Health
The server sends WebSocket ping frames to maintain connection health. Clients should respond with pong frames automatically (most WebSocket libraries handle this).
Default timeouts:
- Ping interval: 30 seconds
- Pong timeout: 60 seconds
Type Definitions
Source of Truth
The protocol types are defined in multiple places:

| Location | Purpose |
|---|---|
| api/proto/runtime/v1/runtime.proto | Internal gRPC protocol (facade ↔ runtime) |
| internal/facade/protocol.go | WebSocket protocol (client ↔ facade) |
| dashboard/src/types/websocket.ts | TypeScript types for dashboard |
| dashboard/src/lib/proto/ | Generated TypeScript from proto |
Generating TypeScript Types
TypeScript types can be generated from the Protocol Buffer definitions:

# Generate TypeScript from proto files
cd dashboard
npm run generate:proto

# Or from the root
make generate-proto-ts

The generated types are in dashboard/src/lib/proto/runtime/v1/runtime.ts and include:
- ClientMessage / ServerMessage: core message types
- ContentPart / MediaContent: multi-modal content
- ToolCall / ToolResult: tool invocation types
- Helper functions: toJSON(), fromJSON(), encode(), decode()
JSON Field Names
The WebSocket protocol uses snake_case for JSON field names to match Go conventions:

{
  "session_id": "...",
  "mime_type": "image/png",
  "size_bytes": 1024
}

The generated TypeScript types use camelCase for property names but serialize to snake_case JSON:

interface MediaContent {
  mimeType: string;  // TypeScript property
  // Serializes to: { "mime_type": "..." }
}
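The generated toJSON() and fromJSON() helpers are described above as handling this mapping. For hand-written code that does not use them, the conversion might be sketched as follows; these helper names are hypothetical and the conversion is shallow (nested objects are not visited).

```typescript
// "mime_type" -> "mimeType"
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// "sizeBytes" -> "size_bytes"
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (ch: string) => "_" + ch.toLowerCase());
}

// Shallow conversion of an object's keys from snake_case to camelCase.
function keysToCamel(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(obj)) {
    out[snakeToCamel(key)] = obj[key];
  }
  return out;
}
```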