AI Gateways receive HTTP inference requests, match each one to a capable orchestrator, and return the result to the client. No ETH deposit is required for standard off-chain AI operation. The pipeline differs fundamentally from video transcoding:
  • requests are discrete HTTP calls, not streaming segments, and
  • routing and pricing are by pipeline and model capability instead of by pixel throughput.
This page covers how AI jobs flow through your gateway. For initial setup and startup commands, see Setup → AI Gateway Quickstart. For custom container workloads, see BYOC Pipelines.

Request flow

AISessionManager

The AISessionManager is the gateway component responsible for session tracking, capability matching, and failover. It is the AI equivalent of the video pipeline’s BroadcastSessionsManager.

Source reference: AISessionManager in go-livepeer/server/ai_http.go


Available Pipelines

Livepeer AI inference is delivered through three integration patterns. As a gateway operator, you do not build or implement these - you configure which orchestrators you connect to and which pipeline endpoints you expose. This page covers batch AI (Standard API). For real-time and custom container pipelines, see the links below.

Standard API Pipelines

Pre-built, well-defined inference endpoints. Each pipeline has a fixed URL path and request/response schema. Your gateway routes requests to orchestrators advertising the matching pipeline and model. Example request:
curl -X POST http://localhost:8935/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "prompt": "a photograph of a coastal village at dusk",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 6,
    "guidance_scale": 1.5
  }'
Lightning-suffix models (such as RealVisXL_V4.0_Lightning) use fewer inference steps (4-8) and a lower guidance scale (1.0-2.0). Standard SDXL models need 20-50 steps and guidance 7.0-9.0. Use the correct parameters for the model family or image quality will degrade.
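For comparison, a request to a standard (non-Lightning) SDXL-class model might look like the sketch below. The model ID and parameter values are illustrative only - substitute a model your orchestrators actually advertise:

```shell
# Illustrative request for a standard SDXL-class model: more steps,
# higher guidance scale than the Lightning example above.
curl -X POST http://localhost:8935/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "stabilityai/stable-diffusion-xl-base-1.0",
    "prompt": "a photograph of a coastal village at dusk",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 30,
    "guidance_scale": 7.5
  }'
```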

Pricing Control

Use -maxPricePerCapability to set per-pipeline, per-model price caps. Pass a path to a JSON file or a JSON string directly.
livepeer \
  -gateway \
  -orchAddr https://orch1.example.com:8935 \
  -maxPricePerCapability /path/to/aiPricing.json
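The cap can also be passed as a JSON string instead of a file path. A minimal sketch, with placeholder price values and the single-quoting left to your shell:

```shell
# Passing the pricing JSON inline rather than via a file path
livepeer \
  -gateway \
  -orchAddr https://orch1.example.com:8935 \
  -maxPricePerCapability '{"capabilities_prices":[{"pipeline":"text-to-image","model_id":"default","price_per_unit":4768371,"pixels_per_unit":1}]}'
```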
The JSON uses the capabilities_prices array format:
{
  "capabilities_prices": [
    {
      "pipeline": "text-to-image",
      "model_id": "SG161222/RealVisXL_V4.0_Lightning",
      "price_per_unit": 4768371,
      "pixels_per_unit": 1
    }
  ]
}
Use "model_id": "default" to set a fallback cap for all models in a pipeline. Specific model entries take precedence over the default. Without this flag, the gateway accepts whatever price the orchestrator advertises. For full pricing details, see the pricing documentation.
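As an illustration of the precedence rule, a file that combines a pipeline-wide default with a model-specific override might look like this (the price values are placeholders):

```json
{
  "capabilities_prices": [
    {
      "pipeline": "text-to-image",
      "model_id": "default",
      "price_per_unit": 3000000,
      "pixels_per_unit": 1
    },
    {
      "pipeline": "text-to-image",
      "model_id": "SG161222/RealVisXL_V4.0_Lightning",
      "price_per_unit": 4768371,
      "pixels_per_unit": 1
    }
  ]
}
```

Here the Lightning model is priced by its own entry, while every other text-to-image model falls back to the default cap.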

Other Pipelines

The live-video-to-video pipeline (real-time AI via ComfyStream) uses the trickle streaming protocol, not the REST API described above. From a gateway operator perspective, ComfyStream workers appear as AI-capable orchestrators connected via -orchAddr. See the real-time AI documentation for how it fits into the pipeline taxonomy.

Orchestrator Discovery

How your AI gateway finds orchestrators depends on your operational mode.
The standard method for off-chain AI gateways is to specify orchestrator addresses directly:
-orchAddr https://orch1.example.com:8935,https://orch2.example.com:8935
The format is scheme://host:port. All orchestrators must be running ai-runner containers and advertising the pipelines you intend to route. If an orchestrator does not support a requested pipeline, the job fails or falls back to the next orchestrator if one is available.

Most production gateways use this pattern. Operators build relationships with specific orchestrators who run the models and capabilities their applications need.
Gateways can call an external service to receive a dynamic orchestrator list:
-orchWebhookUrl https://your-service.example.com/orchestrators
The webhook returns a JSON array of orchestrator addresses. This enables custom filtering, whitelisting, or load balancing without modifying the gateway itself. Used by platform builders (NaaP) and operators with orchestrator tiering or geographic routing requirements.
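A minimal webhook response is a JSON array of objects carrying an address field, as sketched below. Verify the exact schema against the go-livepeer version you run:

```json
[
  { "address": "https://orch1.example.com:8935" },
  { "address": "https://orch2.example.com:8935" }
]
```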
On-chain AI Gateways use the AI service registry for automatic Orchestrator discovery:
-aiServiceRegistry
This is a boolean flag (no argument required). When set, the Gateway registers with the AIServiceRegistry contract on Arbitrum Mainnet and discovers AI-capable Orchestrators on-chain. The contract address is hardcoded in go-livepeer. Requires on-chain operational mode (-network=arbitrum-one-mainnet, -ethUrl, funded ETH account).

The Gateway’s /getNetworkCapabilities endpoint exposes the aggregated capability data from discovered Orchestrators.
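You can inspect the aggregated capability data yourself by querying the endpoint on a running gateway; this sketch assumes the local HTTP address used in the examples on this page:

```shell
# Query the gateway's aggregated capability data (requires a running gateway)
curl http://localhost:8935/getNetworkCapabilities
```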
Check tools.livepeer.cloud/ai/network-capabilities before selecting a model ID. Models already loaded in GPU memory (warm models) return results significantly faster than cold models that must load from disk first.

Capability Matching

When a request arrives at your gateway, the AISessionManager matches it to an orchestrator based on two criteria:
  1. Pipeline: does the orchestrator advertise this pipeline type? (e.g., text-to-image)
  2. Model: does the orchestrator have the requested model_id available?
If no orchestrator matches both criteria, the request fails with an error. If multiple orchestrators match, the manager routes to the best-performing one based on latency history and current load.

Warm vs cold models: Orchestrators load models into GPU memory. A model loaded and ready to serve immediately is “warm”. A model that must be downloaded or loaded from disk before it can serve is “cold”. Cold starts add seconds to minutes of latency for the first request. Warm models respond in milliseconds. During the current beta phase, orchestrators support one warm model per GPU.

AI Gateway Mechanisms

Source reference: ai_mediaserver.go

The AI pipeline uses multiple components that do not exist in the video pipeline:
AISessionManager: manages AI processing sessions and selects orchestrators with matching AI capabilities. Tracks performance per orchestrator per pipeline. Handles retry and failover when an orchestrator fails mid-request. Defined in server/ai_http.go.
MediaMTX: handles media streaming for real-time AI pipelines (ComfyStream and live-video-to-video). MediaMTX manages the stream lifecycle and frame routing between the gateway and the AI worker.
Trickle protocol: enables efficient low-latency streaming for real-time AI video pipelines. The protocol incrementally delivers video frames to the AI worker instead of buffering full segments, keeping end-to-end latency low for live AI effects.
The server/ai_process.go file defines the core AI job workflow: authenticate the request, select a capable orchestrator, process the payment (off-chain for standard AI), and manage the live AI pipeline session. This is where per-pixel pricing calculations happen for image pipelines.

Retry Logic

If an orchestrator fails to return a result (timeout, model not loaded, GPU error), the AISessionManager retries the request with the next best orchestrator. The retry timeout is controlled by:
-aiProcessingRetryTimeout 30s
The value accepts Go duration format: 30s, 1m, 2m30s. For pipelines that use slow-loading models (cold starts), increase this value - otherwise the gateway will retry before the orchestrator has had time to load the model.
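For example, a gateway that routes to pipelines with slow cold starts might be started with a longer timeout. The 2m value here is illustrative, not a recommendation:

```shell
# Illustrative: allow up to 2 minutes before retrying on another orchestrator
livepeer \
  -gateway \
  -orchAddr https://orch1.example.com:8935 \
  -aiProcessingRetryTimeout 2m
```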

Off-chain vs On-chain AI

The default AI gateway mode. No ETH deposit, no TicketBroker interaction, no Arbitrum RPC required. The gateway connects directly to orchestrators via -orchAddr. Payment works in two ways:
  • Remote signer with PM tickets (primary): The gateway uses the Livepeer probabilistic micropayment system with an off-chain remote signer. This is the standard production path for SPE operators.
  • Direct billing (alternative): Payment is handled entirely outside the Livepeer PM system - typically via a direct billing arrangement with orchestrator operators, or by running your own orchestrator. Start command (minimum flags for direct billing):
livepeer \
  -gateway \
  -orchAddr https://orch1.example.com:8935 \
  -httpAddr 0.0.0.0:8935 \
  -httpIngest \
  -v 6

Platform Constraints

The Livepeer AI binary is Linux-only. Windows and macOS builds are not available. This constraint affects orchestrators, not gateways - but it limits your testing options if you do not have a Linux machine available.
The AI inference stack requires a CUDA/GPU toolchain that is only available on Linux. The limitation applies to ai-runner containers running on orchestrators. Workaround for Windows/macOS testing: Run the ai-runner Docker container on a Linux machine or cloud instance. Docker on Linux is the standard production deployment method.
# Docker on Linux - standard AI gateway deployment
# Host networking exposes 8935 directly, so no -p port mapping is needed;
# -datadir uses the container-side path of the mounted volume.
docker run \
  --name livepeer_ai_gateway \
  -v ~/.lpData2/:/root/.lpData2 \
  --network host \
  livepeer/go-livepeer:master \
  -datadir /root/.lpData2 \
  -gateway \
  -orchAddr https://orch1.example.com:8935 \
  -httpAddr 0.0.0.0:8935 \
  -httpIngest \
  -v 6

Finding AI Models

There is no single registry that lets operators query all available models and pipelines across the entire network in one place. The -aiServiceRegistry flag (see Orchestrator Discovery) provides on-chain capability data for registered orchestrators, and the gateway’s /getNetworkCapabilities endpoint exposes this data. For off-chain gateways, discovery relies on direct communication and community tools.
Last modified on March 16, 2026