AI Gateways receive HTTP inference requests, match each one to a capable orchestrator, and return the result to the client. No ETH deposit is required for standard off-chain AI operation. The pipeline differs fundamentally from video transcoding:
  • requests are discrete HTTP calls, not streaming segments, and
  • routing and pricing are by pipeline and model capability instead of by pixel throughput.
This page covers how AI jobs flow through your gateway. For initial setup and startup commands, see Setup → AI Gateway Quickstart. For custom container workloads, see BYOC Pipelines.

Request flow

AISessionManager

The AISessionManager is the gateway component responsible for session tracking, capability matching, and failover. It is the AI equivalent of the video pipeline’s BroadcastSessionsManager.

Source reference: AISessionManager in go-livepeer/server/ai_http.go


Available Pipelines

Livepeer AI inference is delivered through three integration patterns. As a gateway operator, you do not build or implement these - you configure which orchestrators you connect to and which pipeline endpoints you expose. This page covers batch AI (Standard API). For real-time and custom container pipelines, see the links below.

Standard API Pipelines

Pre-built, well-defined inference endpoints. Each pipeline has a fixed URL path and request/response schema. Your gateway routes requests to orchestrators advertising the matching pipeline and model. Example request:
curl -X POST http://localhost:8935/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "prompt": "a photograph of a coastal village at dusk",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 6,
    "guidance_scale": 1.5
  }'
Lightning-suffix models (such as RealVisXL_V4.0_Lightning) use fewer inference steps (4-8) and a lower guidance scale (1.0-2.0). Standard SDXL models need 20-50 steps and guidance 7.0-9.0. Use the correct parameters for the model family or image quality will degrade.
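For comparison, a request to a standard (non-Lightning) SDXL-class model might look like the sketch below. The model ID and parameter values are illustrative only - substitute a model your orchestrators actually advertise:

```shell
# Illustrative request for a standard SDXL-class model: more steps,
# higher guidance scale than the Lightning example above.
curl -X POST http://localhost:8935/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "stabilityai/stable-diffusion-xl-base-1.0",
    "prompt": "a photograph of a coastal village at dusk",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 30,
    "guidance_scale": 7.5
  }'
```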

Pricing Control

Use -maxPricePerCapability to set per-pipeline, per-model price caps. Pass a path to a JSON file or a JSON string directly.
livepeer \
  -gateway \
  -orchAddr https://orch1.example.com:8935 \
  -maxPricePerCapability /path/to/aiPricing.json
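The cap can also be passed as a JSON string instead of a file path. A minimal sketch, with placeholder price values and the single-quoting left to your shell:

```shell
# Passing the pricing JSON inline rather than via a file path
livepeer \
  -gateway \
  -orchAddr https://orch1.example.com:8935 \
  -maxPricePerCapability '{"capabilities_prices":[{"pipeline":"text-to-image","model_id":"default","price_per_unit":4768371,"pixels_per_unit":1}]}'
```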
The JSON uses the capabilities_prices array format:
{
  "capabilities_prices": [
    {
      "pipeline": "text-to-image",
      "model_id": "SG161222/RealVisXL_V4.0_Lightning",
      "price_per_unit": 4768371,
      "pixels_per_unit": 1
    }
  ]
}
Use "model_id": "default" to set a fallback cap for all models in a pipeline. Specific model entries take precedence over the default. Without this flag, the gateway accepts whatever price the orchestrator advertises. For full pricing details, see the pricing documentation.
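As an illustration of the precedence rule, a file that combines a pipeline-wide default with a model-specific override might look like this (the price values are placeholders):

```json
{
  "capabilities_prices": [
    {
      "pipeline": "text-to-image",
      "model_id": "default",
      "price_per_unit": 3000000,
      "pixels_per_unit": 1
    },
    {
      "pipeline": "text-to-image",
      "model_id": "SG161222/RealVisXL_V4.0_Lightning",
      "price_per_unit": 4768371,
      "pixels_per_unit": 1
    }
  ]
}
```

Here the Lightning model is priced by its own entry, while every other text-to-image model falls back to the default cap.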

Other Pipelines

The live-video-to-video pipeline (real-time AI via ComfyStream) uses the trickle streaming protocol, not the REST API described above. From a gateway operator perspective, ComfyStream workers appear as AI-capable orchestrators connected via -orchAddr. See the real-time AI documentation for how it fits into the pipeline taxonomy.

Orchestrator Discovery

How your AI gateway finds orchestrators depends on your operational mode.
The standard method for off-chain AI gateways is to specify orchestrator addresses directly:
-orchAddr https://orch1.example.com:8935,https://orch2.example.com:8935
The format is scheme://host:port. All orchestrators must be running ai-runner containers and advertising the pipelines you intend to route. If an orchestrator does not support a requested pipeline, the job fails or falls back to the next orchestrator if one is available.

Most production gateways use this pattern. Operators build relationships with specific orchestrators who run the models and capabilities their applications need.
Gateways can call an external service to receive a dynamic orchestrator list:
-orchWebhookUrl https://your-service.example.com/orchestrators
The webhook returns a JSON array of orchestrator addresses. This enables custom filtering, whitelisting, or load balancing without modifying the gateway itself. Used by platform builders (NaaP) and operators with orchestrator tiering or geographic routing requirements.
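A minimal webhook response is a JSON array of objects carrying an address field, as sketched below. Verify the exact schema against the go-livepeer version you run:

```json
[
  { "address": "https://orch1.example.com:8935" },
  { "address": "https://orch2.example.com:8935" }
]
```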
On-chain AI Gateways use the AI service registry for automatic Orchestrator discovery:
-aiServiceRegistry
This is a boolean flag (no argument required). When set, the Gateway registers with the AIServiceRegistry contract on Arbitrum Mainnet and discovers AI-capable Orchestrators on-chain. The contract address is hardcoded in go-livepeer. Requires on-chain operational mode (-network=arbitrum-one-mainnet, -ethUrl, funded ETH account).

The Gateway’s /getNetworkCapabilities endpoint exposes the aggregated capability data from discovered Orchestrators.
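You can inspect the aggregated capability data yourself by querying the endpoint on a running gateway; this sketch assumes the local HTTP address used in the examples on this page:

```shell
# Query the gateway's aggregated capability data (requires a running gateway)
curl http://localhost:8935/getNetworkCapabilities
```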
Check tools.livepeer.cloud/ai/network-capabilities before selecting a model ID. Models already loaded in GPU memory (warm models) return results significantly faster than cold models that must load from disk first.

Capability Matching

When a request arrives at your gateway, the AISessionManager matches it to an orchestrator based on two criteria:
  1. Pipeline: does the orchestrator advertise this pipeline type? (e.g., text-to-image)
  2. Model: does the orchestrator have the requested model_id available?
If no orchestrator matches both criteria, the request fails with an error. If multiple orchestrators match, the manager routes to the best-performing one based on latency history and current load.

Warm vs cold models: Orchestrators load models into GPU memory. A model loaded and ready to serve immediately is “warm”. A model that must be downloaded or loaded from disk before it can serve is “cold”. Cold starts add seconds to minutes of latency for the first request. Warm models respond in milliseconds. During the current beta phase, orchestrators support one warm model per GPU.

AI Gateway Mechanisms

Source reference: ai_mediaserver.go

The AI pipeline uses multiple components that do not exist in the video pipeline:
AISessionManager: manages AI processing sessions and selects orchestrators with matching AI capabilities. Tracks performance per orchestrator per pipeline. Handles retry and failover when an orchestrator fails mid-request. Defined in server/ai_http.go.
MediaMTX: handles media streaming for real-time AI pipelines (ComfyStream and live-video-to-video). MediaMTX manages the stream lifecycle and frame routing between the gateway and the AI worker.
Trickle protocol: enables efficient low-latency streaming for real-time AI video pipelines. The protocol incrementally delivers video frames to the AI worker instead of buffering full segments, keeping end-to-end latency low for live AI effects.
The server/ai_process.go file defines the core AI job workflow: authenticate the request, select a capable orchestrator, process the payment (off-chain for standard AI), and manage the live AI pipeline session. This is where per-pixel pricing calculations happen for image pipelines.

Retry Logic

If an orchestrator fails to return a result (timeout, model not loaded, GPU error), the AISessionManager retries the request with the next best orchestrator. The retry timeout is controlled by:
-aiProcessingRetryTimeout 30s
The value accepts Go duration format: 30s, 1m, 2m30s. For pipelines that use slow-loading models (cold starts), increase this value - otherwise the gateway will retry before the orchestrator has had time to load the model.
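For example, a gateway that routes to pipelines with slow cold starts might be started with a longer timeout. The 2m value here is illustrative, not a recommendation:

```shell
# Illustrative: allow up to 2 minutes before retrying on another orchestrator
livepeer \
  -gateway \
  -orchAddr https://orch1.example.com:8935 \
  -aiProcessingRetryTimeout 2m
```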

Off-chain vs On-chain AI

The default AI gateway mode. No ETH deposit, no TicketBroker interaction, no Arbitrum RPC required. The gateway connects directly to orchestrators via -orchAddr. Payment works in two ways:
  • Remote signer with PM tickets (primary): The gateway uses the Livepeer probabilistic micropayment system with an off-chain remote signer. This is the standard production path for SPE operators.
  • Direct billing (alternative): Payment is handled entirely outside the Livepeer PM system - typically via a direct billing arrangement with orchestrator operators, or by running your own orchestrator. Start command (minimum flags for direct billing):
livepeer \
  -gateway \
  -orchAddr https://orch1.example.com:8935 \
  -httpAddr 0.0.0.0:8935 \
  -httpIngest \
  -v 6

Platform Constraints

The Livepeer AI binary is Linux-only. Windows and macOS builds are not available. This constraint affects orchestrators, not gateways - but it limits your testing options if you do not have a Linux machine available.
The AI inference stack requires a CUDA/GPU toolchain that is only available on Linux. The limitation applies to ai-runner containers running on orchestrators. Workaround for Windows/macOS testing: Run the ai-runner Docker container on a Linux machine or cloud instance. Docker on Linux is the standard production deployment method.
# Docker on Linux - standard AI gateway deployment
# Host networking exposes 8935 directly, so no -p port mapping is needed;
# -datadir uses the container-side path of the mounted volume.
docker run \
  --name livepeer_ai_gateway \
  -v ~/.lpData2/:/root/.lpData2 \
  --network host \
  livepeer/go-livepeer:master \
  -datadir /root/.lpData2 \
  -gateway \
  -orchAddr https://orch1.example.com:8935 \
  -httpAddr 0.0.0.0:8935 \
  -httpIngest \
  -v 6

Finding AI Models

There is no single registry that lets operators query all available models and pipelines across the entire network in one place. The -aiServiceRegistry flag (see Orchestrator Discovery) provides on-chain capability data for registered orchestrators, and the gateway’s /getNetworkCapabilities endpoint exposes this data. For off-chain gateways, discovery relies on direct communication and community tools.
Last modified on March 16, 2026