How AI inference works on the Livepeer network – pipeline types, the batch vs live-video distinction, hardware requirements, and how jobs flow from application to your GPU node.
Livepeer’s AI subnet launched in Q3 2024 and has grown into a major source of new fee revenue for Orchestrators. It turns GPU nodes into open, composable inference infrastructure that serves image generation, live-video effects, and large language model completions.AI workloads reach your node through Gateway routing, capability advertisement, and container-based inference. The core operator distinction is between batch inference and live-video inference because the hardware profile and routing logic differ.
Low-LPT entry path: AI inference is often a better starting point than solo video orchestration when stake is limited. Capability, pricing, and latency matter more than active-set position for many AI jobs.
Applications never communicate with Orchestrators directly. Every request flows through a Gateway, which handles authentication, pricing negotiation, and routing to qualified nodes.
Application / User ↓Gateway (routing, auth, pricing, QoS) ↓Orchestrator (go-livepeer + AI Worker) ↓AI Runner container (your GPU + model) ↓Result returned through gateway
Your Orchestrator advertises capabilities – which pipelines it supports and at what price – and Gateways route matching jobs to it. You never build a marketplace, billing system, or authentication layer. You run excellent inference infrastructure.When a Gateway selects your Orchestrator, it is because your combination of capability, pricing, latency, and uptime made you the best option for that specific request.
Gateways discover Orchestrators through the OrchestratorInfo structure, which your node broadcasts and updates on-chain. The key fields that determine whether you receive AI jobs are:Gateway pricing is a hard gate. Gateways configure a maximum price they will pay per capability using the -maxPricePerCapability JSON flag. A pipeline priced above that maximum receives no jobs from that Gateway, regardless of hardware quality.Before setting prices in aiModels.json, check what prices the major Gateways are using. See Models and VRAM Reference for a pricing reference table and Gateway Orchestrator Offerings for the full capability discovery protocol documentation.For the complete list of supported pipelines and their model architectures, see AI Model Support in the Developers section.
The most important distinction for operators is between batch AI and live-video AI. These are different job types with different hardware profiles, different runtime architectures, and different operational characteristics.
Batch AI
Request-response inference. An application sends a prompt or media file, your node processes it and returns the result. Includes text-to-image, audio-to-text, image-to-video, LLM completions, and more.
Cascade live-video AI
Continuous frame-by-frame video transformation. Live video streams in, processed video streams out with sub-100ms latency. Used for live AI effects, generative video overlays, and streaming AI agents.
Livepeer’s AI worker supports ten pipeline types. Each pipeline handles a specific class of inference task, with its own model format, VRAM floor, and pricing unit.
text-to-image - Generate images from text prompts
The most widely used batch AI pipeline on the network. Takes a text prompt and sampling parameters, returns a generated image.Minimum VRAM: 24 GB
Pricing unit: Per output pixel
Recommended model:SG161222/RealVisXL_V4.0_LightningTypical hardware: RTX 3090, RTX 4090, A5000Diffusion models (Stable Diffusion, SDXL variants) run natively on the managed livepeer/ai-runner container. The Lightning and Turbo variants reduce step count to deliver results in under 2 seconds on an RTX 4090.Source:SG161222/RealVisXL_V4.0_Lightning on HuggingFace
image-to-image - Style transfer and transformation
Takes an input image and applies diffusion-based transformation, style transfer, or enhancement. Used for artistic style application, image enhancement, and controlled generation.Minimum VRAM: 24 GB
Pricing unit: Per output pixel
Recommended model: SDXL variants, ByteDance/SDXL-LightningTypical hardware: RTX 3090, RTX 4090
image-to-video - Animate a still image
Generates a short video clip from a single input image. Significantly more VRAM and compute-intensive than image-to-image.Minimum VRAM: 24 GB
Pricing unit: Per output pixel
Typical hardware: RTX 4090, A100
image-to-text - Vision-language captioning
Takes an image and returns a text description. Lower VRAM floor makes this accessible to operators with older consumer cards.Minimum VRAM: 4 GB
Pricing unit: Per input pixel
Recommended model:Salesforce/blip-image-captioning-largeTypical hardware: RTX 2060, GTX 1080 (as secondary pipeline)
audio-to-text - Speech recognition and transcription
Runs Whisper-class speech recognition with timestamps. Widely used for transcription, captioning, and audio search.Minimum VRAM: 12 GB
Pricing unit: Per millisecond of audio
Recommended model:openai/whisper-large-v3Typical hardware: RTX 3060 12 GB, RTX 3080 10 GBSource:openai/whisper-large-v3 on HuggingFace
segment-anything-2 - Promptable segmentation
Pixel-level object segmentation using SAM2. Takes a prompt (point, box, or mask) and returns a segmentation mask over the input image or video frame.Recommended model: SAM2 variants
Source:facebookresearch/segment-anything-2 on GitHub
text-to-speech - Natural speech synthesis
Converts text to natural speech audio. Growing use case for AI-generated video narration and interactive media.Pricing unit: Per character / per millisecond of output audio
upscale - Resolution enhancement
Upscales low-resolution input to high resolution using diffusion-based super-resolution.Recommended model:stabilityai/stable-diffusion-x4-upscalerPricing unit: Per input pixel
llm - Large language model inference
OpenAI-compatible text completion endpoint backed by an Ollama-based runner. Runs quantised LLMs with as little as 8 GB VRAM, making it accessible to operators with older consumer GPUs that are unsuitable for diffusion pipelines.Minimum VRAM: 8 GB
Pricing unit: Per custom unit (typically per million tokens)
Recommended model:meta-llama/Meta-Llama-3.1-8B-Instruct (via Ollama)
Typical hardware: GTX 1070 Ti, GTX 1080, RTX 2060The LLM pipeline uses a separate runner architecture from the standard livepeer/ai-runner image. See Batch AI Setup for the Ollama deployment guide.Source:Cloud SPE Ollama runner blog post
live-video-to-video - Cascade streaming AI
Continuous frame-by-frame transformation of live video streams. This pipeline takes a WebRTC stream as input and returns a transformed WebRTC stream with sub-100ms per-frame latency.Minimum VRAM: 24 GB recommended
Pricing unit: Per frame
Runtime:livepeer/ai-runner:live-base + ComfyStream
Typical hardware: RTX 4090, A100, H100This pipeline powers the Cascade architecture – Livepeer’s live-video AI system. It supports live AI effects, live style transfer, and streaming AI agents.Source:ComfyStream on GitHub
These are minimum requirements. Running at the minimum will result in longer cold-start times and reduced job competitiveness. The figures below reflect production-ready recommendations.
The network capabilities tool shows all registered Orchestrators and their advertised pipelines currently visible on the network. Before your Orchestrator receives AI jobs, it must appear there under at least one pipeline.