- requests are discrete HTTP calls, not streaming segments, and
- routing and pricing are by pipeline and model capability instead of by pixel throughput.
This page covers how AI jobs flow through your gateway. For initial setup and startup commands, see Setup → AI Gateway Quickstart. For custom container workloads, see BYOC Pipelines.
Request flow
AISessionManager
The AISessionManager is the gateway component responsible for session tracking, capability matching, and failover. It is the AI equivalent of the video pipeline’s BroadcastSessionsManager.
Source reference: AISessionManager
go-livepeer/server/ai_http.go
AI vs Video Comparison
Available Pipelines
Livepeer AI inference routes across three integration patterns. As a gateway operator, you do not build or implement these - you configure which orchestrators you connect to and which pipeline endpoints you expose. This page covers batch AI (Standard API). For real-time and custom container pipelines, see the links below.

Standard API Pipelines
Pre-built, well-defined inference endpoints. Each pipeline has a fixed URL path and request/response schema. Your gateway routes requests to orchestrators advertising the matching pipeline and model. Example request:
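As an illustration, a Standard API call is a plain HTTP POST to the gateway. The sketch below targets the text-to-image endpoint; the host, port, and model_id are placeholders, and the exact field set for each pipeline is defined in the AI API Reference:

```shell
# Hypothetical gateway at localhost:8935; model_id must match a model
# that your connected orchestrators actually advertise.
curl -X POST http://localhost:8935/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
        "model_id": "stabilityai/sd-turbo",
        "prompt": "a lighthouse at dusk",
        "width": 1024,
        "height": 1024
      }'
```

The response is JSON; its shape (image URLs or inline payloads) depends on the pipeline.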
Pricing Control

Use -maxPricePerCapability to set per-pipeline, per-model price caps. Pass a path to a JSON file or a JSON string directly.
capabilities_prices array format:
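A sketch of the expected JSON, assuming the per-unit price fields used elsewhere in go-livepeer (price_per_unit in wei, pixels_per_unit as the denominator); treat exact field names and values as assumptions to verify against the Pipeline Configuration reference:

```json
{
  "capabilities_prices": [
    { "pipeline": "text-to-image", "model_id": "stabilityai/sd-turbo", "price_per_unit": 1000, "pixels_per_unit": 1 },
    { "pipeline": "text-to-image", "model_id": "default", "price_per_unit": 2000, "pixels_per_unit": 1 },
    { "pipeline": "image-to-video", "model_id": "default", "price_per_unit": 5000, "pixels_per_unit": 1 }
  ]
}
```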
Use "model_id": "default" to set a fallback cap for all models in a pipeline. Specific model entries take precedence over the default. Without this flag, the gateway accepts any price the orchestrator advertises. For full pricing details, see Pipeline Configuration.
Other Pipelines
The live-video-to-video pipeline (real-time AI via ComfyStream) uses the trickle streaming protocol, not the REST API described above. From a gateway operator perspective, ComfyStream workers appear as AI-capable orchestrators connected via -orchAddr. See the Pipelines Guide for how real-time AI fits into the pipeline taxonomy.

ComfyStream
Build real-time AI video workflows with ComfyUI nodes.
BYOC Pipelines
Route custom container workloads by capability - operator responsibilities, model fit, and health tracking.
Orchestrator Discovery
How your AI gateway finds orchestrators depends on your operational mode.
Direct configuration (-orchAddr)
The standard method for off-chain AI gateways. Specify orchestrator addresses directly:

The format is scheme://host:port. All orchestrators must be running ai-runner containers and advertising the pipelines you intend to route. If an orchestrator does not support a requested pipeline, the job fails or falls back to the next orchestrator if one is available.

Most production gateways use this pattern. Operators build relationships with specific orchestrators who run the models and capabilities their applications need.
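Putting this together, a minimal off-chain start command might look like the following sketch. The -gateway, -network, and comma-separated -orchAddr flags follow go-livepeer conventions; the hostnames are placeholders, and your deployment may need additional flags (HTTP ingest address, logging, etc.):

```shell
livepeer -gateway \
  -network offchain \
  -orchAddr https://orch-a.example.com:8935,https://orch-b.example.com:8935
```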
Webhook discovery (-orchWebhookUrl)
Gateways can call an external service to receive a dynamic orchestrator list:

The webhook returns a JSON array of orchestrator addresses. This enables custom filtering, whitelisting, or load balancing without modifying the gateway itself. Used by platform builders (NaaP) and operators with orchestrator tiering or geographic routing requirements.
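As an illustration, the gateway is started with -orchWebhookUrl pointing at your discovery service, and the service responds with an address list. A sketch of the response body, assuming the array-of-objects shape with an address field used by go-livepeer webhook discovery (verify against your gateway version; hostnames are placeholders):

```json
[
  { "address": "https://orch-a.example.com:8935" },
  { "address": "https://orch-b.example.com:8935" }
]
```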
On-chain discovery (-aiServiceRegistry)
On-chain AI Gateways use the AI service registry for automatic Orchestrator discovery:

This is a boolean flag (no argument required). When set, the Gateway registers with the AIServiceRegistry contract on Arbitrum Mainnet and discovers AI-capable Orchestrators on-chain. The contract address is hardcoded in go-livepeer. Requires on-chain operational mode (-network=arbitrum-one-mainnet, -ethUrl, funded ETH account).

The Gateway’s /getNetworkCapabilities endpoint exposes the aggregated capability data from discovered Orchestrators.
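A sketch of the on-chain discovery flags, assuming a funded ETH account and an Arbitrum RPC endpoint (the RPC URL is a placeholder, and additional on-chain flags such as a keystore path may be required):

```shell
livepeer -gateway \
  -network arbitrum-one-mainnet \
  -ethUrl https://arb1.example-rpc.com \
  -aiServiceRegistry
```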
Capability Matching

When a request arrives at your gateway, the AISessionManager matches it to an orchestrator based on two criteria:

- Pipeline: does the orchestrator advertise this pipeline type? (e.g., text-to-image)
- Model: does the orchestrator have the requested model_id available?
AI Gateway Mechanisms
Source reference: ai_mediaserver.go
AISessionManager
Manages AI processing sessions and selects orchestrators with matching AI capabilities. Tracks performance per orchestrator per pipeline. Handles retry and failover when an orchestrator fails mid-request. Defined in server/ai_http.go.
MediaMTX integration
Handles media streaming for real-time AI pipelines (ComfyStream and live-video-to-video). MediaMTX manages the stream lifecycle and frame routing between the gateway and the AI worker.
Trickle protocol
Enables efficient low-latency streaming for real-time AI video pipelines. The Trickle protocol incrementally delivers video frames to the AI worker instead of buffering full segments, keeping end-to-end latency low for live AI effects.
ai_process.go workflow
The server/ai_process.go file defines the core AI job workflow: authenticate the request, select a capable orchestrator, process the payment (off-chain for standard AI), and manage the live AI pipeline session. This is where per-pixel pricing calculations happen for image pipelines.

Retry Logic
If an orchestrator fails to return a result (timeout, model not loaded, GPU error), the AISessionManager retries the request with the next best orchestrator.
The retry timeout is controlled by a duration flag that accepts values such as 30s, 1m, or 2m30s. For pipelines that use slow-loading models (cold starts), increase this value - otherwise the gateway will retry before the orchestrator has had time to load the model.
Off-chain vs On-chain AI
- Off-chain AI (standard)
- On-chain AI (dual gateway)
The default AI gateway mode. No ETH deposit, no TicketBroker interaction, no Arbitrum RPC required. The gateway connects directly to orchestrators via -orchAddr. Payment works in two ways:

- Remote signer with PM tickets (primary): The gateway uses the Livepeer probabilistic micropayment system with an off-chain remote signer. This is the standard production path for SPE operators.
- Direct billing (alternative): Payment is handled entirely outside the Livepeer PM system - typically via a direct billing arrangement with orchestrator operators, or by running your own orchestrator.

Start command (minimum flags for direct billing):
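A sketch of the minimum direct-billing start command, assuming off-chain mode and directly configured orchestrators (hostnames are placeholders). No ETH or payment flags are needed because billing happens entirely outside the gateway:

```shell
livepeer -gateway \
  -network offchain \
  -orchAddr https://orch-a.example.com:8935
```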
Platform Constraints
The AI inference stack requires a CUDA/GPU toolchain that is only available on Linux. The limitation applies to ai-runner containers running on orchestrators.
Workaround for Windows/macOS testing: Run the ai-runner Docker container on a Linux machine or cloud instance. Docker on Linux is the standard production deployment method.
Finding AI Models
There is no single registry that lets operators query all available models and pipelines across the entire network in one place. The -aiServiceRegistry flag (see Orchestrator Discovery) provides on-chain capability data for registered orchestrators, and the gateway’s /getNetworkCapabilities endpoint exposes this data. For off-chain gateways, discovery relies on direct communication and community tools.
Related Pages
Pipelines Guide
Pipeline taxonomy - video, batch AI, real-time AI, and BYOC.
BYOC Pipelines
Route custom container workloads by capability - operator responsibilities, model fit, and health tracking.
Pipeline Configuration
AI routing flags, retry timeouts, and per-pipeline pricing reference.
Model Support
Full compatibility matrix - supported pipeline types, model architectures, and VRAM requirements.
Workload Fit
Decision framework for evaluating whether your AI workload belongs on Livepeer.
AI API Reference
Full endpoint reference for all AI pipelines.