Livepeer’s AI subnet launched in Q3 2024 and has grown into a major source of new fee revenue for orchestrators. It turns GPU nodes into open, composable inference infrastructure that serves image generation, live-video effects, and large language model completions. AI workloads reach your node through gateway routing, capability advertisement, and container-based inference. The core operator distinction is between batch inference and live-video inference because the hardware profile and routing logic differ.
Low-LPT entry path: AI inference is often a better starting point than solo video orchestration when stake is limited. Capability, pricing, and latency matter more than active-set position for many AI jobs.

How the network routes AI jobs

Applications never communicate with orchestrators directly. Every request flows through a gateway, which handles authentication, pricing negotiation, and routing to qualified nodes.
Application / User
    ↓
Gateway (routing, auth, pricing, QoS)
    ↓
Orchestrator (go-livepeer + AI Worker)
    ↓
AI Runner container (your GPU + model)
    ↓
Result returned through gateway
Your orchestrator advertises capabilities — which pipelines it supports and at what price — and gateways route matching jobs to it. You never build a marketplace, billing system, or authentication layer. You run excellent inference infrastructure. When a gateway selects your orchestrator, it is because your combination of capability, pricing, latency, and uptime made you the best option for that specific request.
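The selection flow above can be sketched as a filter-and-rank loop. This is an illustrative model only, not the actual go-livepeer gateway implementation; the field names and the scoring formula are assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Orchestrator:
    name: str
    pipelines: dict      # pipeline name -> offered price per unit
    latency_ms: float
    uptime: float        # fraction of successful jobs, 0..1

def select(orchestrators, pipeline, max_price):
    """Illustrative gateway selection: hard-gate on capability and
    price, then rank the survivors by latency and uptime."""
    eligible = [
        o for o in orchestrators
        if pipeline in o.pipelines and o.pipelines[pipeline] <= max_price
    ]
    # Lower latency and higher uptime both improve the score.
    return min(eligible, key=lambda o: o.latency_ms * (2 - o.uptime),
               default=None)

nodes = [
    Orchestrator("a", {"text-to-image": 80}, latency_ms=40, uptime=0.99),
    Orchestrator("b", {"text-to-image": 120}, latency_ms=20, uptime=0.99),  # priced out
    Orchestrator("c", {"audio-to-text": 50}, latency_ms=10, uptime=1.00),   # wrong pipeline
]
print(select(nodes, "text-to-image", max_price=100).name)
```

Node "b" has the best latency but fails the price gate, and "c" never advertised the pipeline, so only "a" is eligible: capability and price filter first, performance ranks what remains.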

How gateway selection actually works

Gateways discover orchestrators through the OrchestratorInfo structure, which your node advertises to gateways during discovery. Capabilities and pricing are the key fields that determine whether you receive AI jobs.

Gateway pricing is a hard gate. Gateways configure a maximum price they will pay per capability using the -maxPricePerCapability JSON flag. A pipeline priced above that maximum receives no jobs from that gateway, regardless of hardware quality. Before setting prices in aiModels.json, check what prices the major gateways are using. See Models and VRAM Reference for a pricing reference table and Gateway Orchestrator Offerings for the full capability discovery protocol documentation. For the complete list of supported pipelines and their model architectures, see AI Model Support in the Developers section.
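The hard-gate behaviour can be sketched as follows. The JSON shape shown here is an illustrative assumption, not the exact -maxPricePerCapability schema; check the gateway documentation for the real format, and treat the prices as placeholders.

```python
import json

# Illustrative gateway-side price caps; the real
# -maxPricePerCapability schema may differ (assumption).
gateway_max = json.loads("""
[
  {"pipeline": "text-to-image",
   "models": [{"model": "SG161222/RealVisXL_V4.0_Lightning",
               "price_per_unit": 100}]}
]
""")

def passes_price_gate(pipeline, model, offered_price):
    """A pipeline priced above the gateway maximum is a hard fail:
    no jobs from that gateway, regardless of hardware quality."""
    for cap in gateway_max:
        if cap["pipeline"] != pipeline:
            continue
        for m in cap["models"]:
            if m["model"] == model:
                return offered_price <= m["price_per_unit"]
    return False  # a capability the gateway never listed gets nothing

print(passes_price_gate("text-to-image",
                        "SG161222/RealVisXL_V4.0_Lightning", 90))
print(passes_price_gate("text-to-image",
                        "SG161222/RealVisXL_V4.0_Lightning", 120))
```

Note that the gate is binary: an offer one unit over the cap is treated the same as an unsupported pipeline, which is why checking prevailing gateway prices before writing aiModels.json matters.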

The two workload types

The most important distinction for operators is between batch AI and live-video AI. These are different job types with different hardware profiles, different runtime architectures, and different operational characteristics.

AI pipeline types

Livepeer’s AI worker supports ten pipeline types. Each pipeline handles a specific class of inference task, with its own model format, VRAM floor, and pricing unit.
Text-to-image

The most widely used batch AI pipeline on the network. Takes a text prompt and sampling parameters, returns a generated image. Diffusion models (Stable Diffusion, SDXL variants) run natively on the managed livepeer/ai-runner container. The Lightning and Turbo variants reduce step count to deliver results in under 2 seconds on an RTX 4090.
  • Minimum VRAM: 24 GB
  • Pricing unit: Per output pixel
  • Recommended model: SG161222/RealVisXL_V4.0_Lightning
  • Typical hardware: RTX 3090, RTX 4090, A5000
Source: SG161222/RealVisXL_V4.0_Lightning on HuggingFace

Image-to-image

Takes an input image and applies diffusion-based transformation, style transfer, or enhancement. Used for artistic style application, image enhancement, and controlled generation.
  • Minimum VRAM: 24 GB
  • Pricing unit: Per output pixel
  • Recommended model: SDXL variants, ByteDance/SDXL-Lightning
  • Typical hardware: RTX 3090, RTX 4090

Image-to-video

Generates a short video clip from a single input image. Significantly more VRAM- and compute-intensive than image-to-image.
  • Minimum VRAM: 24 GB
  • Pricing unit: Per output pixel
  • Typical hardware: RTX 4090, A100

Image-to-text

Takes an image and returns a text description. A lower VRAM floor makes this accessible to operators with older consumer cards.
  • Minimum VRAM: 4 GB
  • Pricing unit: Per input pixel
  • Recommended model: Salesforce/blip-image-captioning-large
  • Typical hardware: RTX 2060, GTX 1080 (as secondary pipeline)

Audio-to-text

Runs Whisper-class speech recognition with timestamps. Widely used for transcription, captioning, and audio search.
  • Minimum VRAM: 12 GB
  • Pricing unit: Per millisecond of audio
  • Recommended model: openai/whisper-large-v3
  • Typical hardware: RTX 3060 12 GB, RTX 3080 10 GB
Source: openai/whisper-large-v3 on HuggingFace

Segment anything 2

Pixel-level object segmentation using SAM2. Takes a prompt (point, box, or mask) and returns a segmentation mask over the input image or video frame.
  • Recommended model: SAM2 variants
Source: facebookresearch/segment-anything-2 on GitHub

Text-to-speech

Converts text to natural speech audio. A growing use case for AI-generated video narration and interactive media.
  • Pricing unit: Per character / per millisecond of output audio

Upscale

Upscales low-resolution input to high resolution using diffusion-based super-resolution.
  • Recommended model: stabilityai/stable-diffusion-x4-upscaler
  • Pricing unit: Per input pixel

LLM

OpenAI-compatible text completion endpoint backed by an Ollama-based runner. Runs quantised LLMs with as little as 8 GB of VRAM, making it accessible to operators with older consumer GPUs that are unsuitable for diffusion pipelines. The LLM pipeline uses a separate runner architecture from the standard livepeer/ai-runner image; see Batch AI Setup for the Ollama deployment guide.
  • Minimum VRAM: 8 GB
  • Pricing unit: Per custom unit (typically per million tokens)
  • Recommended model: meta-llama/Meta-Llama-3.1-8B-Instruct (via Ollama)
  • Typical hardware: GTX 1070 Ti, GTX 1080, RTX 2060
Source: Cloud SPE Ollama runner blog post

Live video-to-video

Continuous frame-by-frame transformation of live video streams. This pipeline takes a WebRTC stream as input and returns a transformed WebRTC stream with sub-100 ms per-frame latency. It powers the Cascade architecture, Livepeer's live-video AI system, and supports live AI effects, live style transfer, and streaming AI agents.
  • Minimum VRAM: 24 GB recommended
  • Pricing unit: Per frame
  • Runtime: livepeer/ai-runner:live-base + ComfyStream
  • Typical hardware: RTX 4090, A100, H100
Source: ComfyStream on GitHub
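The pricing units above translate directly into job cost: per-pixel pipelines scale with resolution, per-millisecond pipelines with audio length. A minimal sketch, using hypothetical per-unit prices rather than actual network quotes:

```python
def image_job_cost(width, height, price_per_pixel):
    """Per-output-pixel pipelines (text-to-image, image-to-image):
    cost scales with the output resolution."""
    return width * height * price_per_pixel

def audio_job_cost(duration_s, price_per_ms):
    """Per-millisecond pipelines (audio-to-text): cost scales
    with the length of the input audio."""
    return duration_s * 1000 * price_per_ms

# Hypothetical prices, in wei per unit (placeholders, not quotes).
print(image_job_cost(1024, 1024, 70))   # a 1024x1024 generation
print(audio_job_cost(60, 400))          # one minute of audio
```

The practical consequence: a small change to a per-pixel price compounds across millions of output pixels per job, so price in the smallest unit the pipeline bills in, not per job.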

Hardware by workload type

These are minimum requirements. Running at the minimum will result in longer cold-start times and reduced job competitiveness; the per-pipeline figures above reflect production-ready recommendations.
For detailed VRAM planning, warm model strategy, and multi-pipeline configuration, see Model Hosting and VRAM Planning.
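As a first-pass planning aid, the VRAM floors listed per pipeline can be summed to see which combinations fit on one card, since warm models are resident in VRAM simultaneously. This is a simplified sketch: it ignores runtime overhead and batch headroom, which the Model Hosting and VRAM Planning guide covers properly.

```python
# VRAM floors taken from the pipeline list above (GB).
VRAM_MIN = {
    "text-to-image": 24,
    "image-to-image": 24,
    "image-to-video": 24,
    "audio-to-text": 12,
    "llm": 8,
    "image-to-text": 4,
}

def fits(gpu_vram_gb, warm_pipelines):
    """Warm models occupy VRAM at the same time, so their floors
    add up (simplified: no allowance for runtime overhead)."""
    return sum(VRAM_MIN[p] for p in warm_pipelines) <= gpu_vram_gb

print(fits(24, ["text-to-image"]))                   # a 24 GB card at its floor
print(fits(24, ["text-to-image", "audio-to-text"]))  # does not fit warm together
print(fits(16, ["audio-to-text", "image-to-text"]))  # a viable low-VRAM combo
```

This is why low-VRAM pipelines such as image-to-text are listed as secondary pipelines: they can share a card with a larger warm model only when the sum stays under the GPU's capacity.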

What you build and what the network supplies

The Livepeer protocol handles the hard parts of running an inference marketplace. As an orchestrator, you do need to:
  • Run and maintain GPU infrastructure
  • Configure aiModels.json with your supported pipelines and pricing
  • Keep your primary models warm and your node performant
  • Stay competitive on latency and pricing
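The aiModels.json configuration named above maps pipelines to models, prices, and warm/cold status. A minimal sketch follows; the field names (pipeline, model_id, price_per_unit, warm) follow the Livepeer AI docs, but verify them against the current schema before deploying, and treat the prices as placeholders.

```python
import json

# Sketch of an aiModels.json file; verify field names and units
# against the current Livepeer schema before use (assumption).
ai_models = [
    {
        "pipeline": "text-to-image",
        "model_id": "SG161222/RealVisXL_V4.0_Lightning",
        "price_per_unit": 70,   # placeholder, not a network quote
        "warm": True,           # kept loaded in VRAM for fast starts
    },
    {
        "pipeline": "audio-to-text",
        "model_id": "openai/whisper-large-v3",
        "price_per_unit": 400,  # placeholder, not a network quote
        "warm": False,          # loaded on demand; slower first job
    },
]

print(json.dumps(ai_models, indent=2))
```

Marking a model warm trades VRAM for latency: warm pipelines win latency-sensitive jobs, cold ones free the card for your primary workload.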
You do not need to:
  • Build a marketplace or API
  • Implement authentication or billing
  • Handle service discovery
  • Build brand recognition
Gateways provide all of that. Your competitive advantage is performance: lower latency, better-tuned models, higher uptime, specialised capabilities.

Network participation

The network capabilities tool shows all registered orchestrators and the pipelines they currently advertise on the network. Use it to verify that your pipelines are visible and to check live capability coverage: before your orchestrator receives AI jobs, it must appear there under at least one pipeline.

Last modified on March 16, 2026