
WebSocket Protocol

This document describes the WebSocket protocol used by Omnia agent facades.

```text
ws://host:port?agent=<agent-name>&namespace=<namespace>&binary=<true|false>
```
| Parameter | Required | Description |
|---|---|---|
| `agent` | Yes | Name of the AgentRuntime |
| `namespace` | No | Namespace (defaults to `default`) |
| `binary` | No | Enable binary WebSocket frame support (defaults to `false`) |
```shell
websocat "ws://localhost:8080?agent=my-agent&namespace=production"
```
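You can build the URL with the standard `URL` and `URLSearchParams` APIs so that agent and namespace names never need hand-escaping. The sketch below is illustrative only. `buildConnectUrl` and `ConnectOptions` are hypothetical names, not part of Omnia. Optional parameters are left out when unset, so the server-side defaults apply.

```typescript
// Hypothetical options type mirroring the documented query parameters.
interface ConnectOptions {
  agent: string;      // required: name of the AgentRuntime
  namespace?: string; // optional: server defaults to "default"
  binary?: boolean;   // optional: server defaults to false
}

// Builds a facade WebSocket URL; URLSearchParams handles percent-encoding.
function buildConnectUrl(base: string, opts: ConnectOptions): string {
  const url = new URL(base);
  url.searchParams.set("agent", opts.agent);
  if (opts.namespace !== undefined) {
    url.searchParams.set("namespace", opts.namespace);
  }
  if (opts.binary !== undefined) {
    url.searchParams.set("binary", String(opts.binary));
  }
  return url.toString();
}

console.log(buildConnectUrl("ws://localhost:8080", { agent: "my-agent", namespace: "production" }));
// ws://localhost:8080/?agent=my-agent&namespace=production
```

The WHATWG URL parser normalizes the empty path to `/`. The result therefore looks slightly different from the websocat example but addresses the same endpoint.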

Messages sent from client to server.

Send a user message to the agent:

```json
{
  "type": "message",
  "content": "Hello, how are you?",
  "session_id": "optional-session-id",
  "metadata": {
    "user_id": "user-123"
  }
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Must be `"message"` |
| `content` | string | No | User message content (text-only) |
| `parts` | array | No | Multi-modal content parts (see below) |
| `session_id` | string | No | Resume existing session |
| `metadata` | object | No | Custom metadata |

Note: Either content or parts should be provided. If both are present, parts takes precedence.
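A client or test harness that normalizes outgoing messages can encode this precedence rule directly. The sketch below uses the hypothetical names `ClientChatMessage` and `effectiveParts`. It treats a `parts` array as authoritative whenever it is present and otherwise wraps `content` as a single text part. It illustrates the documented rule and is not the facade's actual implementation.

```typescript
// Minimal local types for this sketch (a subset of the message fields above).
interface TextPart { type: "text"; text: string }
interface MediaPart {
  type: "image" | "audio" | "video" | "file";
  media: { mime_type: string; url?: string; data?: string; storage_ref?: string };
}
type Part = TextPart | MediaPart;

interface ClientChatMessage {
  type: "message";
  content?: string;
  parts?: Part[];
  session_id?: string;
  metadata?: Record<string, unknown>;
}

// Applies the documented rule: when both are present, parts takes precedence over content.
function effectiveParts(msg: ClientChatMessage): Part[] {
  if (msg.parts !== undefined) {
    return msg.parts;
  }
  if (msg.content !== undefined) {
    return [{ type: "text", text: msg.content }];
  }
  return []; // neither provided: nothing to send
}
```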

Send a message with images or other media:

```json
{
  "type": "message",
  "session_id": "sess-abc123",
  "parts": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image",
      "media": {
        "url": "https://example.com/photo.jpg",
        "mime_type": "image/jpeg"
      }
    }
  ]
}
```
| Type | Description |
|---|---|
| `text` | Plain text content |
| `image` | Image (JPEG, PNG, GIF, WebP) |
| `audio` | Audio file (MP3, WAV, OGG) |
| `video` | Video file (MP4, WebM) |
| `file` | Generic file attachment |
```typescript
interface ContentPart {
  type: "text" | "image" | "audio" | "video" | "file"
  text?: string        // For type: "text"
  media?: MediaContent // For media types
}

interface MediaContent {
  // Data source (exactly one required)
  data?: string        // Base64-encoded (< 256KB recommended)
  url?: string         // HTTP/HTTPS URL
  storage_ref?: string // Backend storage reference

  // Required
  mime_type: string    // e.g., "image/jpeg", "audio/mp3"

  // Optional metadata
  filename?: string
  size_bytes?: number

  // Image-specific
  width?: number
  height?: number
  detail?: "low" | "high" | "auto" // Vision model hint

  // Audio/Video-specific
  duration_ms?: number
  sample_rate?: number // Audio: Hz
  channels?: number    // Audio: 1=mono, 2=stereo
}
```
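The comments on `MediaContent` imply a few checks a client can run before sending a message. The sketch below rests on some assumptions. `validateMedia` is a hypothetical helper. The 256KB guideline is applied to the length of the base64 string, because the document does not say whether it refers to encoded or decoded size. The server may also enforce rules this sketch does not know about.

```typescript
// Subset of MediaContent, repeated so the sketch is self-contained.
interface MediaContent {
  data?: string;
  url?: string;
  storage_ref?: string;
  mime_type: string;
}

// Recommended ceiling for inline data (assumed to apply to the base64 string length).
const INLINE_DATA_SOFT_LIMIT = 256 * 1024;

// Returns a list of problems; an empty list means the documented rules are satisfied.
function validateMedia(media: MediaContent): string[] {
  const problems: string[] = [];
  const sources = [media.data, media.url, media.storage_ref].filter((s) => s !== undefined);
  if (sources.length !== 1) {
    problems.push(`expected exactly one of data/url/storage_ref, got ${sources.length}`);
  }
  if (!media.mime_type) {
    problems.push("mime_type is required");
  }
  if (media.url !== undefined && !/^https?:\/\//.test(media.url)) {
    problems.push("url must be an HTTP or HTTPS URL");
  }
  if (media.data !== undefined && media.data.length > INLINE_DATA_SOFT_LIMIT) {
    problems.push("inline data exceeds the recommended 256KB; consider uploading instead");
  }
  return problems;
}
```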
Example with inline base64-encoded data:

```json
{
  "type": "message",
  "parts": [
    { "type": "text", "text": "Describe this image" },
    {
      "type": "image",
      "media": {
        "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR...",
        "mime_type": "image/png"
      }
    }
  ]
}
```

Request an upload URL for a file (requires facade media storage to be enabled):

```json
{
  "type": "upload_request",
  "session_id": "sess-abc123",
  "upload_request": {
    "filename": "photo.jpg",
    "mime_type": "image/jpeg",
    "size_bytes": 102400
  }
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Must be `"upload_request"` |
| `session_id` | string | No | Resume existing session |
| `upload_request.filename` | string | Yes | Original filename |
| `upload_request.mime_type` | string | Yes | MIME type of the file |
| `upload_request.size_bytes` | number | Yes | File size in bytes |

The server responds with an upload_ready message containing the upload URL. After uploading the file via HTTP PUT, the client can reference it using the storage_ref in subsequent messages.

Messages sent from server to client.

Sent immediately after connection:

```json
{
  "type": "connected",
  "session_id": "sess-abc123"
}
```

When binary=true is specified in the connection URL, the connected message includes capabilities:

```json
{
  "type": "connected",
  "session_id": "sess-abc123",
  "connected": {
    "capabilities": {
      "binary_frames": true,
      "max_payload_size": 524288,
      "protocol_version": 1
    }
  }
}
```
| Field | Type | Description |
|---|---|---|
| `connected.capabilities.binary_frames` | boolean | Server supports binary WebSocket frames |
| `connected.capabilities.max_payload_size` | number | Maximum payload size in bytes |
| `connected.capabilities.protocol_version` | number | Binary protocol version |

Streaming response chunk:

```json
{
  "type": "chunk",
  "content": "Hello! I'm doing"
}
```

Final response completion:

```json
{
  "type": "done",
  "content": "Hello! I'm doing great, thank you for asking!"
}
```
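In the examples above, the `done` message repeats the complete response text rather than only the final fragment. The examples do not make clear whether this always holds. The sketch below therefore prefers `done.content` when it is present and otherwise keeps the concatenated chunks. `ResponseAccumulator` is a hypothetical class, and `parts` in media responses are ignored for brevity.

```typescript
type StreamMessage =
  | { type: "chunk"; content: string }
  | { type: "done"; content?: string };

// Collects streamed text; onUpdate receives the text so far after each message.
class ResponseAccumulator {
  private buffer = "";
  private finished = false;
  private onUpdate: (text: string) => void;

  constructor(onUpdate: (text: string) => void = () => {}) {
    this.onUpdate = onUpdate;
  }

  handle(msg: StreamMessage): void {
    if (this.finished) return; // ignore stray messages after completion
    if (msg.type === "chunk") {
      this.buffer += msg.content;
    } else {
      // Assumption: when done carries content, it is the full response text.
      if (msg.content !== undefined) this.buffer = msg.content;
      this.finished = true;
    }
    this.onUpdate(this.buffer);
  }

  get text(): string {
    return this.buffer;
  }

  get isDone(): boolean {
    return this.finished;
  }
}
```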

For responses containing media (e.g., generated images), the server uses the parts array:

```json
{
  "type": "done",
  "session_id": "sess-abc123",
  "parts": [
    {
      "type": "text",
      "text": "Here's the image you requested:"
    },
    {
      "type": "image",
      "media": {
        "url": "https://storage.example.com/generated/img-123.png",
        "mime_type": "image/png",
        "width": 1024,
        "height": 1024
      }
    }
  ]
}
```

Note: When parts is present, it takes precedence over content. For backward compatibility, text-only responses may use either format.

Agent is calling a tool:

```json
{
  "type": "tool_call",
  "tool_call": {
    "id": "tc-123",
    "name": "weather",
    "arguments": {
      "location": "San Francisco"
    }
  }
}
```

Result from a tool call:

```json
{
  "type": "tool_result",
  "tool_result": {
    "id": "tc-123",
    "result": "72°F, Sunny"
  }
}
```
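The `tool_call` and `tool_result` messages share an `id` (`tc-123` above). A client can use it to pair them, for example to show a pending indicator while a tool runs. Here is a sketch with a hypothetical `ToolCallTracker` class:

```typescript
interface ToolCall { id: string; name: string; arguments: Record<string, unknown> }
interface ToolResult { id: string; result: unknown }

// Pairs each tool_result with the tool_call that has the same id.
class ToolCallTracker {
  private pending = new Map<string, ToolCall>();

  onToolCall(call: ToolCall): void {
    this.pending.set(call.id, call);
  }

  // Returns the matching call (undefined if the id was never seen) and stops tracking it.
  onToolResult(result: ToolResult): { call: ToolCall | undefined; result: unknown } {
    const call = this.pending.get(result.id);
    this.pending.delete(result.id);
    return { call, result: result.result };
  }

  // Number of tool calls still waiting for a result.
  get inFlight(): number {
    return this.pending.size;
  }
}
```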

Response to an upload_request with the upload URL:

```json
{
  "type": "upload_ready",
  "session_id": "sess-abc123",
  "upload_ready": {
    "upload_id": "upl-xyz789",
    "upload_url": "http://agent.example.com/media/upload/upl-xyz789",
    "storage_ref": "omnia://sessions/sess-abc123/media/med-def456",
    "expires_at": "2025-01-09T12:00:00Z"
  }
}
```
| Field | Type | Description |
|---|---|---|
| `upload_ready.upload_id` | string | Unique upload identifier |
| `upload_ready.upload_url` | string | URL to PUT the file content to |
| `upload_ready.storage_ref` | string | Storage reference for the uploaded file |
| `upload_ready.expires_at` | string | When the upload URL expires (ISO 8601) |

Notification that a file upload has completed successfully:

```json
{
  "type": "upload_complete",
  "session_id": "sess-abc123",
  "upload_complete": {
    "upload_id": "upl-xyz789",
    "storage_ref": "omnia://sessions/sess-abc123/media/med-def456",
    "size_bytes": 102400
  }
}
```
| Field | Type | Description |
|---|---|---|
| `upload_complete.upload_id` | string | Upload identifier |
| `upload_complete.storage_ref` | string | Storage reference for the uploaded file |
| `upload_complete.size_bytes` | number | Actual file size in bytes |

Streaming media chunk for audio/video responses, which lets playback begin before the entire media has been generated:

```json
{
  "type": "media_chunk",
  "session_id": "sess-abc123",
  "media_chunk": {
    "media_id": "audio-xyz789",
    "sequence": 0,
    "is_last": false,
    "data": "//uQxAAAAAANIAAAAAExBTUUzLjEwMFVVVVVVVVVV...",
    "mime_type": "audio/mp3"
  }
}
```
| Field | Type | Description |
|---|---|---|
| `media_chunk.media_id` | string | Unique identifier for the media stream |
| `media_chunk.sequence` | number | Sequence number for ordering (0-indexed) |
| `media_chunk.is_last` | boolean | Whether this is the final chunk |
| `media_chunk.data` | string | Base64-encoded chunk data |
| `media_chunk.mime_type` | string | MIME type (e.g., `audio/mp3`, `video/mp4`) |

The client should:

  1. Buffer chunks by media_id and sequence
  2. Begin playback once sufficient data is buffered
  3. Assemble the complete media when is_last: true is received
  4. Optionally use the complete media URL, which the final done message may include, for replay
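The buffering and assembly steps above can be sketched as follows. `MediaChunkAssembler` is a hypothetical class. It tolerates out-of-order delivery and returns media only once the last chunk and every earlier sequence number have arrived. Starting playback early (for example, with the browser's Media Source Extensions) is left out, because how much data is "sufficient" depends on the codec and the player.

```typescript
interface MediaChunk {
  media_id: string;
  sequence: number;
  is_last: boolean;
  data: string; // base64-encoded
  mime_type: string;
}

// Decodes standard base64 into bytes using the global atob (browsers, Node 16+).
function base64ToBytes(b64: string): Uint8Array {
  const bin = atob(b64);
  const out = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) out[i] = bin.charCodeAt(i);
  return out;
}

interface StreamState {
  chunks: Map<number, Uint8Array>;
  lastSeq: number | null;
  mimeType: string;
}

// Buffers chunks per media_id; add() returns the assembled media once it is complete, else null.
class MediaChunkAssembler {
  private streams = new Map<string, StreamState>();

  add(chunk: MediaChunk): { mimeType: string; bytes: Uint8Array } | null {
    let state = this.streams.get(chunk.media_id);
    if (!state) {
      state = { chunks: new Map(), lastSeq: null, mimeType: chunk.mime_type };
      this.streams.set(chunk.media_id, state);
    }
    state.chunks.set(chunk.sequence, base64ToBytes(chunk.data));
    if (chunk.is_last) state.lastSeq = chunk.sequence;

    // Complete only when the last chunk is known and sequences 0..last are all present.
    const last = state.lastSeq;
    if (last === null || state.chunks.size !== last + 1) return null;

    const parts: Uint8Array[] = [];
    for (let seq = 0; seq <= last; seq++) {
      const part = state.chunks.get(seq);
      if (!part) return null;
      parts.push(part);
    }
    const bytes = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
    let offset = 0;
    for (const p of parts) {
      bytes.set(p, offset);
      offset += p.length;
    }
    this.streams.delete(chunk.media_id);
    return { mimeType: state.mimeType, bytes };
  }
}
```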

Error message:

```json
{
  "type": "error",
  "error": {
    "code": "INVALID_MESSAGE",
    "message": "Failed to parse message"
  }
}
```
| Code | Description |
|---|---|
| `INVALID_MESSAGE` | Message format is invalid |
| `SESSION_NOT_FOUND` | Specified session doesn’t exist |
| `PROVIDER_ERROR` | LLM provider returned an error |
| `TOOL_ERROR` | Tool execution failed |
| `INTERNAL_ERROR` | Internal server error |
| `UPLOAD_FAILED` | File upload operation failed |
| `MEDIA_NOT_ENABLED` | Media storage is not enabled on the facade |
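Every server message carries a `type` discriminator, so a TypeScript client can model the catalogue above as a discriminated union and handle it with an exhaustive `switch`. The sketch below abbreviates each payload to a few documented fields. `ServerMessage`, `summarize`, and `parseServerMessage` are hypothetical names. The cast after `JSON.parse` is unchecked, and a production client should validate input before trusting it.

```typescript
type ErrorCode =
  | "INVALID_MESSAGE" | "SESSION_NOT_FOUND" | "PROVIDER_ERROR" | "TOOL_ERROR"
  | "INTERNAL_ERROR" | "UPLOAD_FAILED" | "MEDIA_NOT_ENABLED";

// Abbreviated union of the documented server messages.
type ServerMessage =
  | { type: "connected"; session_id: string }
  | { type: "chunk"; content: string }
  | { type: "done"; content?: string; parts?: unknown[] }
  | { type: "tool_call"; tool_call: { id: string; name: string } }
  | { type: "tool_result"; tool_result: { id: string; result: unknown } }
  | { type: "upload_ready"; upload_ready: { upload_url: string; storage_ref: string } }
  | { type: "upload_complete"; upload_complete: { storage_ref: string } }
  | { type: "media_chunk"; media_chunk: { media_id: string; sequence: number } }
  | { type: "error"; error: { code: ErrorCode; message: string } };

// One-line summary per message; the never check fails to compile if a type is left unhandled.
function summarize(msg: ServerMessage): string {
  switch (msg.type) {
    case "connected": return `connected as ${msg.session_id}`;
    case "chunk": return `chunk: ${msg.content}`;
    case "done": return "response complete";
    case "tool_call": return `calling ${msg.tool_call.name}`;
    case "tool_result": return `result for ${msg.tool_result.id}`;
    case "upload_ready": return `upload to ${msg.upload_ready.upload_url}`;
    case "upload_complete": return `uploaded ${msg.upload_complete.storage_ref}`;
    case "media_chunk": return `media ${msg.media_chunk.media_id} #${msg.media_chunk.sequence}`;
    case "error": return `error ${msg.error.code}: ${msg.error.message}`;
    default: {
      const unreachable: never = msg;
      return unreachable;
    }
  }
}

// Unchecked parse of a JSON text frame.
function parseServerMessage(raw: string): ServerMessage {
  return JSON.parse(raw) as ServerMessage;
}
```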
Basic message flow:

```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server
    C->>S: WebSocket connect
    S-->>C: connected (session_id)
    C->>S: message
    S-->>C: chunk
    S-->>C: chunk
    S-->>C: done
```
Message flow with a tool call:

```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server
    participant T as Tool Service
    C->>S: message
    S->>T: Execute tool
    S-->>C: tool_call
    T-->>S: Result
    S-->>C: tool_result
    S-->>C: chunk
    S-->>C: done
```
```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server
    C->>S: upload_request
    S-->>C: upload_ready (upload_url, storage_ref)
    C->>S: PUT file to upload_url (HTTP)
    C->>S: message with storage_ref
    S-->>C: chunk
    S-->>C: done
```

This flow shows uploading a file via WebSocket before sending a message that references it. The client:

  1. Sends an upload_request via WebSocket
  2. Receives upload_ready with the upload URL
  3. PUTs the file content to the upload URL via HTTP
  4. Sends a message with the storage_ref in the media content
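The message-building parts of this flow can be sketched as pure functions. `buildUploadRequest`, `isUploadUrlValid`, `buildMessageWithUpload`, and `partTypeFor` are hypothetical helpers. Mapping a MIME type prefix to a content part type is an assumption based on the content part types listed earlier. The HTTP PUT itself (step 3) is omitted.

```typescript
interface UploadReady {
  upload_id: string;
  upload_url: string;
  storage_ref: string;
  expires_at: string; // ISO 8601
}

type PartType = "image" | "audio" | "video" | "file";

// Assumed mapping from MIME type to content part type.
function partTypeFor(mimeType: string): PartType {
  if (mimeType.startsWith("image/")) return "image";
  if (mimeType.startsWith("audio/")) return "audio";
  if (mimeType.startsWith("video/")) return "video";
  return "file";
}

// Step 1: the upload_request sent over the WebSocket (a browser File has this shape).
function buildUploadRequest(sessionId: string, file: { name: string; type: string; size: number }) {
  return {
    type: "upload_request" as const,
    session_id: sessionId,
    upload_request: { filename: file.name, mime_type: file.type, size_bytes: file.size },
  };
}

// Before step 3: avoid PUTting to a URL whose expires_at has already passed.
function isUploadUrlValid(ready: UploadReady, now: Date = new Date()): boolean {
  return Date.parse(ready.expires_at) > now.getTime();
}

// Step 4: the follow-up message referencing the uploaded file by storage_ref.
function buildMessageWithUpload(sessionId: string, text: string, ready: UploadReady, mimeType: string) {
  return {
    type: "message" as const,
    session_id: sessionId,
    parts: [
      { type: "text" as const, text },
      { type: partTypeFor(mimeType), media: { storage_ref: ready.storage_ref, mime_type: mimeType } },
    ],
  };
}
```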
```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server
    participant A as Agent (TTS/Video Gen)
    C->>S: message ("Read this text aloud")
    S->>A: Generate audio
    A-->>S: Audio chunk 1
    S-->>C: media_chunk (seq=0)
    Note over C: Begin playback
    A-->>S: Audio chunk 2
    S-->>C: media_chunk (seq=1)
    A-->>S: Audio chunk 3 (final)
    S-->>C: media_chunk (seq=2, is_last=true)
    S-->>C: done (with complete media URL)
```

This flow shows streaming audio/video responses. The client:

  1. Sends a message requesting audio/video generation
  2. Receives media_chunk messages as data becomes available
  3. Buffers chunks by media_id and sequence number
  4. Begins playback once sufficient data is buffered
  5. Assembles the complete media when is_last: true is received
  6. Optionally uses the complete media URL from the done message for replay
Resuming a session:

```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server
    participant R as Session Store
    C->>S: WebSocket connect
    S-->>C: connected (session_id)
    C->>S: message (with session_id)
    S->>R: Load session history
    R-->>S: History
    S-->>C: done (with context)
```

Omit session_id to create a new session:

```json
{"type": "message", "content": "Hello"}
```

The server responds with a connected message containing the new session ID.

Include session_id to resume:

```json
{
  "type": "message",
  "session_id": "sess-abc123",
  "content": "Continue our conversation"
}
```

If the session exists and hasn’t expired, conversation history is preserved.

Sessions expire based on the AgentRuntime’s session.ttl configuration. Attempting to resume an expired session creates a new one.
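A client that wants to resume conversations needs to remember the most recent `session_id` and attach it to outgoing messages. The sketch below assumes a specific behavior: if the server replaces an expired session, the new ID appears in the `session_id` of a later server message. `SessionTracker` is a hypothetical class. Persisting the ID across page loads (for example, in `localStorage`) is left to the caller.

```typescript
// Remembers the latest session_id seen from the server and stamps it onto outgoing messages.
class SessionTracker {
  private sessionId: string | undefined;

  constructor(initial?: string) {
    this.sessionId = initial;
  }

  // Call for every parsed server message; any message carrying session_id updates the tracker.
  observe(msg: { session_id?: string }): void {
    if (msg.session_id) this.sessionId = msg.session_id;
  }

  // Omits session_id until one is known, so the server creates a new session.
  message(content: string): { type: "message"; content: string; session_id?: string } {
    return this.sessionId === undefined
      ? { type: "message", content }
      : { type: "message", content, session_id: this.sessionId };
  }

  get current(): string | undefined {
    return this.sessionId;
  }
}
```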

When facade media storage is enabled, clients can upload files via HTTP before referencing them in WebSocket messages. This avoids base64-encoding large files in the WebSocket protocol.

```mermaid
sequenceDiagram
    participant C as Client
    participant F as Facade
    C->>F: POST /media/request-upload
    F-->>C: {upload_url, storage_ref}
    C->>F: PUT /media/upload/{id} (file content)
    F-->>C: 204 No Content
    C->>F: WebSocket message with storage_ref
```
Request an upload URL:

```http
POST /media/request-upload
Content-Type: application/json

{
  "session_id": "sess-abc123",
  "filename": "photo.jpg",
  "mime_type": "image/jpeg",
  "size_bytes": 102400
}
```

Response:

```json
{
  "upload_url": "http://agent.example.com/media/upload/upl-xyz789",
  "upload_id": "upl-xyz789",
  "storage_ref": "omnia://sessions/sess-abc123/media/med-def456",
  "expires_at": "2025-01-09T12:00:00Z"
}
```
Upload the file content to the returned upload URL:

```http
PUT /media/upload/upl-xyz789
Content-Type: image/jpeg

<binary file content>
```

Response: 204 No Content on success.

Then reference the file in a WebSocket message by its storage_ref:

```json
{
  "type": "message",
  "session_id": "sess-abc123",
  "parts": [
    { "type": "text", "text": "What's in this image?" },
    {
      "type": "image",
      "media": {
        "storage_ref": "omnia://sessions/sess-abc123/media/med-def456",
        "mime_type": "image/jpeg"
      }
    }
  ]
}
```
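The two HTTP steps can be combined into a single helper. The sketch below assumes the facade's HTTP endpoints share one base URL. It takes that base as `facadeBase`, a hypothetical parameter, and takes `fetch` as an argument so it can run against a stub. `uploadMedia` and `FetchLike` are illustrative names, not Omnia APIs.

```typescript
// Minimal structural type for fetch so the sketch can be exercised with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string | Uint8Array },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface RequestUploadResponse {
  upload_url: string;
  upload_id: string;
  storage_ref: string;
  expires_at: string;
}

// Requests an upload slot, PUTs the bytes, and resolves to the storage_ref for use in a message.
async function uploadMedia(
  fetchFn: FetchLike,
  facadeBase: string, // assumed facade HTTP base URL, e.g. "http://agent.example.com"
  sessionId: string,
  filename: string,
  mimeType: string,
  bytes: Uint8Array,
): Promise<string> {
  const slotRes = await fetchFn(new URL("/media/request-upload", facadeBase).toString(), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ session_id: sessionId, filename, mime_type: mimeType, size_bytes: bytes.length }),
  });
  if (!slotRes.ok) throw new Error(`request-upload failed: HTTP ${slotRes.status}`);
  const slot = (await slotRes.json()) as RequestUploadResponse;

  const putRes = await fetchFn(slot.upload_url, {
    method: "PUT",
    headers: { "Content-Type": mimeType },
    body: bytes,
  });
  // The facade answers 204 No Content on success; any 2xx status is accepted here.
  if (!putRes.ok) throw new Error(`upload failed: HTTP ${putRes.status}`);
  return slot.storage_ref;
}
```

In a browser or in Node 18+, you can pass the global `fetch`. The returned value goes into `media.storage_ref`, as in the message above.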

Retrieve metadata about uploaded media:

```http
GET /media/info/{session-id}/{media-id}
```

Response:

```json
{
  "filename": "photo.jpg",
  "mime_type": "image/jpeg",
  "size_bytes": 102400,
  "created_at": "2025-01-09T11:00:00Z",
  "expires_at": "2025-01-10T11:00:00Z"
}
```

Download previously uploaded media:

```http
GET /media/download/{session-id}/{media-id}
```

Returns the file with appropriate Content-Type and Content-Disposition headers.

Note: Media upload is only available when the facade is configured with media storage. See AgentRuntime facade.media configuration for details.

When binary frame support is enabled (binary=true query parameter), the server can send binary WebSocket frames for efficient media streaming. Base64 encoding inflates binary data by about 33%, so sending raw bytes makes media payloads roughly 25% smaller than base64-encoded JSON.

```text
┌──────────────────┬─────────────────┬──────────────────────────┐
│ Header (32 bytes)│ Metadata (JSON) │ Binary Payload           │
└──────────────────┴─────────────────┴──────────────────────────┘
```
| Field | Offset | Size (bytes) | Type | Description |
|---|---|---|---|---|
| Magic | 0 | 4 | bytes | `"OMNI"` magic bytes |
| Version | 4 | 1 | uint8 | Protocol version (currently 1) |
| Flags | 5 | 1 | uint8 | Bit flags (see below) |
| MessageType | 6 | 2 | uint16 | Message type (big-endian) |
| MetadataLen | 8 | 4 | uint32 | JSON metadata length (big-endian) |
| PayloadLen | 12 | 4 | uint32 | Binary payload length (big-endian) |
| Sequence | 16 | 4 | uint32 | Sequence number (big-endian) |
| MediaID | 20 | 12 | bytes | Media stream identifier |
Flag bits (bit 0 is the least significant):

| Bit | Name | Description |
|---|---|---|
| 0 | Compressed | Payload is compressed (reserved) |
| 1 | Chunked | Part of a chunked transfer |
| 2 | IsLast | Last chunk in a stream |
Message types:

| Value | Name | Description |
|---|---|---|
| 1 | MediaChunk | Streaming media chunk |
| 2 | Upload | Binary upload data (reserved) |
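To make the layout concrete, the sketch below encodes and decodes a frame following the tables above. An encoder is mainly useful for test fixtures or a mock server. `encodeFrame`, `decodeFrame`, and the constant names are hypothetical. The padding rule is an assumption, since the document does not specify how shorter IDs are handled: MediaID is treated as UTF-8, zero-padded or truncated to 12 bytes.

```typescript
// Header layout from the tables above; all multi-byte integers are big-endian.
const HEADER_SIZE = 32;
const FLAG_COMPRESSED = 1 << 0; // reserved
const FLAG_CHUNKED = 1 << 1;
const FLAG_IS_LAST = 1 << 2;
const MSG_MEDIA_CHUNK = 1;

interface Frame {
  version: number;
  flags: number;
  messageType: number;
  sequence: number;
  mediaId: string;
  metadata: Record<string, unknown>;
  payload: Uint8Array;
}

// Encodes a version-1 frame. Assumption: MediaID is UTF-8, zero-padded/truncated to 12 bytes.
function encodeFrame(f: Omit<Frame, "version">): Uint8Array {
  const enc = new TextEncoder();
  const meta = enc.encode(JSON.stringify(f.metadata));
  const out = new Uint8Array(HEADER_SIZE + meta.length + f.payload.length); // zero-filled
  const view = new DataView(out.buffer);
  out.set(enc.encode("OMNI"), 0);
  view.setUint8(4, 1); // protocol version
  view.setUint8(5, f.flags);
  view.setUint16(6, f.messageType, false);
  view.setUint32(8, meta.length, false);
  view.setUint32(12, f.payload.length, false);
  view.setUint32(16, f.sequence, false);
  out.set(enc.encode(f.mediaId).subarray(0, 12), 20);
  out.set(meta, HEADER_SIZE);
  out.set(f.payload, HEADER_SIZE + meta.length);
  return out;
}

// Decodes a frame, rejecting bad magic bytes and truncated input.
function decodeFrame(bytes: Uint8Array): Frame {
  if (bytes.length < HEADER_SIZE) throw new Error("frame shorter than header");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const dec = new TextDecoder();
  if (dec.decode(bytes.subarray(0, 4)) !== "OMNI") throw new Error("bad magic");
  const metaLen = view.getUint32(8, false);
  const payloadLen = view.getUint32(12, false);
  if (bytes.length < HEADER_SIZE + metaLen + payloadLen) throw new Error("truncated frame");
  const payloadStart = HEADER_SIZE + metaLen;
  return {
    version: view.getUint8(4),
    flags: view.getUint8(5),
    messageType: view.getUint16(6, false),
    sequence: view.getUint32(16, false),
    mediaId: dec.decode(bytes.subarray(20, 32)).replace(/\0+$/, ""),
    metadata: metaLen > 0 ? JSON.parse(dec.decode(bytes.subarray(HEADER_SIZE, payloadStart))) : {},
    payload: bytes.slice(payloadStart, payloadStart + payloadLen),
  };
}
```

In a browser client, `decodeFrame(new Uint8Array(event.data))` works on the `ArrayBuffer` delivered when `binaryType` is `'arraybuffer'`.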

When binary_frames capability is enabled, media_chunk messages may be sent as binary frames instead of JSON. The metadata contains:

```json
{
  "session_id": "sess-abc123",
  "mime_type": "audio/mp3"
}
```

The payload contains raw binary audio/video data (not base64-encoded).

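```javascript
const ws = new WebSocket('ws://localhost:8080?agent=my-agent&binary=true');
ws.binaryType = 'arraybuffer';

ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Binary frame: 32-byte header, then JSON metadata, then raw payload
    const view = new DataView(event.data);
    const decoder = new TextDecoder();
    const magic = decoder.decode(new Uint8Array(event.data, 0, 4));
    if (magic === 'OMNI') {
      const flags = view.getUint8(5);
      const messageType = view.getUint16(6, false);
      const metadataLen = view.getUint32(8, false);
      const payloadLen = view.getUint32(12, false);
      const sequence = view.getUint32(16, false);
      const isLast = (flags & 0x04) !== 0; // bit 2: IsLast
      // MediaID is 12 bytes; strip trailing NUL padding, if any
      const mediaId = decoder
        .decode(new Uint8Array(event.data, 20, 12))
        .replace(/\0+$/, '');
      const metadata = metadataLen > 0
        ? JSON.parse(decoder.decode(new Uint8Array(event.data, 32, metadataLen)))
        : {};
      // Extract payload (raw audio/video data), bounded by PayloadLen
      const payloadStart = 32 + metadataLen;
      const payload = event.data.slice(payloadStart, payloadStart + payloadLen);
      if (messageType === 1) {
        // MediaChunk: buffer payload by mediaId and sequence; metadata.mime_type gives the format
      }
    }
  } else {
    // JSON text frame
    const msg = JSON.parse(event.data);
    // Handle JSON message...
  }
};
```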

When a client doesn’t request binary frames (binary=true not set), the server always sends JSON text frames with base64-encoded media data. This ensures backward compatibility with existing clients.

The server sends WebSocket ping frames to maintain connection health. Clients should respond with pong frames automatically (most WebSocket libraries handle this).

Default timeouts:

  • Ping interval: 30 seconds
  • Pong timeout: 60 seconds

The protocol types are defined in multiple places:

| Location | Purpose |
|---|---|
| `api/proto/runtime/v1/runtime.proto` | Internal gRPC protocol (facade ↔ runtime) |
| `internal/facade/protocol.go` | WebSocket protocol (client ↔ facade) |
| `dashboard/src/types/websocket.ts` | TypeScript types for dashboard |
| `dashboard/src/lib/proto/` | Generated TypeScript from proto |

TypeScript types can be generated from the Protocol Buffer definitions:

```shell
# Generate TypeScript from proto files
cd dashboard
npm run generate:proto

# Or from the root
make generate-proto-ts
```

The generated types are in dashboard/src/lib/proto/runtime/v1/runtime.ts and include:

  • ClientMessage / ServerMessage - Core message types
  • ContentPart / MediaContent - Multi-modal content
  • ToolCall / ToolResult - Tool invocation types
  • Helper functions: toJSON(), fromJSON(), encode(), decode()

The WebSocket protocol uses snake_case for JSON field names to match Go conventions:

```json
{
  "session_id": "...",
  "mime_type": "image/png",
  "size_bytes": 1024
}
```

The generated TypeScript types use camelCase for property names but serialize to snake_case JSON:

```typescript
interface MediaContent {
  mimeType: string; // TypeScript property
  // Serializes to: { "mime_type": "..." }
}
```
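The generated types handle this mapping through their `toJSON()`/`fromJSON()` helpers. A hand-written client that works with raw JSON may want its own converter instead. The sketch below uses hypothetical helpers (`snakeToCamel`, `camelToSnake`, `convertKeys`) that rename keys recursively. It also renames keys inside free-form fields such as `metadata`, so skip those fields if their key spelling matters.

```typescript
// "mime_type" -> "mimeType"
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_, c: string) => c.toUpperCase());
}

// "mimeType" -> "mime_type"
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (c) => `_${c.toLowerCase()}`);
}

// Recursively renames object keys; arrays are mapped element-wise, primitives pass through.
function convertKeys(value: unknown, convert: (k: string) => string): unknown {
  if (Array.isArray(value)) return value.map((v) => convertKeys(v, convert));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) out[convert(k)] = convertKeys(v, convert);
    return out;
  }
  return value;
}
```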