At a glance
How routing works — and why it differs per job type
The routing mechanics differ significantly by job type. Understanding this is the key insight for operators deciding which workloads to prioritise.

Video transcoding
Transcoding jobs are routed by gateways using a multi-factor algorithm: stake weight + price + performance score. Gateways favour orchestrators with more total stake (own + delegated), competitive pricing, and high historical success rates. This means new or low-stake orchestrators are at a structural disadvantage for transcoding — they compete against established nodes with large delegator bases. Being in the active set (top 100 by stake) is a prerequisite, and even within the active set, high-stake nodes capture disproportionate transcoding volume.

Batch AI inference
AI jobs route on capability match + price ceiling. The gateway checks: does this orchestrator support the requested pipeline and model? Is the price below the gateway's `-maxPricePerCapability` limit? Stake plays a much smaller role.
This matters for operators: a new orchestrator with 24 GB VRAM and correctly configured AI pipelines competes for AI jobs immediately, regardless of stake. The barrier to entry for AI earnings is hardware and configuration, with delegation history playing a much smaller role.
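As a rough illustration, the AI routing check above amounts to a capability-and-price filter with no stake term. The dict shape and field names below are hypothetical, invented for this sketch rather than taken from go-livepeer's internals:

```python
# Illustrative gateway-side filter for batch AI jobs: capability match plus a
# per-capability price ceiling. Stake is deliberately absent from the check.
# Data shapes and names are hypothetical, not go-livepeer's actual structures.
def eligible(orch: dict, pipeline: str, model_id: str, max_price: int) -> bool:
    advertised = orch.get("capabilities", {}).get(pipeline, {})
    return model_id in advertised and advertised[model_id] <= max_price

orch = {"capabilities": {"text-to-image": {"ByteDance/SDXL-Lightning": 4_500_000}}}
print(eligible(orch, "text-to-image", "ByteDance/SDXL-Lightning", 5_000_000))  # True
print(eligible(orch, "text-to-image", "ByteDance/SDXL-Lightning", 4_000_000))  # False: over the ceiling
```

A correctly configured pipeline and a price under the gateway's ceiling is all it takes to enter the candidate pool.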
Cascade live-video AI
Routing adds latency as a primary factor alongside capability and price. Persistent streaming jobs cannot tolerate high jitter or cold model loads, so gateways select orchestrators with the right workflow already warm and with low round-trip latency. Geographic proximity to gateways matters here.

LLM inference
Routes the same way as batch AI, through capability match and price. The capability check targets a specific Ollama model ID instead of a diffusion pipeline. Operators running quantised 7B models on 8 GB cards receive LLM jobs without competing on the same axis as GPU-heavy diffusion operators.

The practical upshot: low-stake nodes still earn AI fees while transcoding demand stays concentrated in higher-stake nodes. A capable GPU plus sound AI configuration is the faster route to active earnings.
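To make the transcoding side of the contrast concrete, here is a toy selection score combining the three factors named above (stake, price, performance). The weighting is invented for illustration; it is not go-livepeer's actual algorithm, and all names and numbers are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Orch:
    name: str
    stake: float            # own + delegated stake
    price_per_pixel: int    # wei
    success_rate: float     # historical fraction of jobs completed

def score(o: Orch, max_price: int) -> float:
    """Toy multi-factor score: stake-weighted, discounted by price,
    scaled by historical performance. Not the real gateway algorithm."""
    if o.price_per_pixel > max_price:
        return 0.0          # over the gateway's price ceiling: never selected
    price_discount = 1 - o.price_per_pixel / max_price
    return o.stake * price_discount * o.success_rate

orchs = [
    Orch("high-stake", stake=500_000, price_per_pixel=1200, success_rate=0.99),
    Orch("new-node",   stake=5_000,   price_per_pixel=800,  success_rate=0.99),
]
best = max(orchs, key=lambda o: score(o, max_price=2000))
print(best.name)  # high-stake: wins despite the higher price
```

Because stake multiplies everything else, the cheaper new node cannot out-score the established one for transcoding, which is exactly the structural disadvantage described above. AI routing drops the stake term, so the same new node competes immediately there.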
Video transcoding
Transcoding has been the Livepeer network's core workload since 2017. When a broadcaster sends a live video stream to a gateway, the gateway routes individual segments — roughly 2 seconds each — to orchestrators for conversion into multiple resolution and bitrate variants. Your node receives raw video, decodes it (NVDEC), re-encodes it to the requested output profiles (NVENC), and returns the results. Sessions often run for hours. Your node processes dozens or hundreds of segments per active stream. GPU-accelerated transcoding via NVIDIA hardware is strongly recommended — CPU transcoding remains viable for tests but rarely stays cost-competitive at current network prices.

Video Transcoding Guide
Pricing configuration (wei and USD), session limits, NVENC caps, output rendition profiles, and benchmarking your setup.
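Back-of-envelope arithmetic helps when translating a USD earnings target into a wei-per-pixel price. The exchange rate and the single 720p30 rendition below are assumptions chosen for the example, not recommended values; see the guide above for actual pricing configuration:

```python
# Convert a USD-per-hour target into a wei-per-pixel price for one rendition.
# ETH_USD is an assumed exchange rate; check a live price before configuring.
ETH_USD = 3000.0
WEI_PER_ETH = 10**18

def wei_per_pixel(target_usd_per_hour: float, width: int, height: int, fps: int) -> float:
    pixels_per_hour = width * height * fps * 3600
    wei_per_hour = target_usd_per_hour / ETH_USD * WEI_PER_ETH
    return wei_per_hour / pixels_per_hour

# One 720p30 rendition at a $0.30/hour target:
print(round(wei_per_pixel(0.30, 1280, 720, 30)))  # ≈ 1005 wei per pixel
```

Real streams request a ladder of several renditions, so total pixel throughput (and thus revenue at a given wei price) is a multiple of the single-rendition figure.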
Batch AI inference
The AI Subnet launched in Q3 2024, adding single-request inference as a job category alongside transcoding. A batch AI job arrives as an HTTP request routed from a gateway to your `ai-runner` Docker container. The container runs inference — generating an image, captioning a photo, transcribing audio — and returns the result. The session ends when the result is returned.
For the nine current pipelines, warm model recommendations, and per-pipeline pricing guidance, see AI Pipelines and the Model and VRAM Reference.
Batch AI Setup
Install and configure the `ai-runner` container, `aiModels.json`, warm models, and pipeline pricing.
Model and VRAM Reference
Per-pipeline VRAM requirements, recommended models, and pricing benchmarks.
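For orientation, a minimal `aiModels.json` might look like the sketch below: one warm diffusion model and one cold low-VRAM pipeline. The model IDs and prices are illustrative, and the field names are given as commonly documented; treat the Batch AI Setup guide as the authoritative schema.

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4768371,
    "warm": true
  },
  {
    "pipeline": "image-to-text",
    "model_id": "Salesforce/blip-image-captioning-large",
    "price_per_unit": 2000000,
    "warm": false
  }
]
```

Warm models stay loaded in VRAM and respond immediately; cold models load on demand, which saves memory at the cost of first-request latency.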
Cascade live-video AI
Cascade live-video AI is architecturally distinct from batch inference. Instead of processing a single request and returning a result, your GPU is allocated to a persistent, continuous stream that receives live video frames, runs inference on each frame, and returns the transformed stream.

Cascade is the network's strategic name for live-video AI. The underlying technical implementation is ComfyStream, which runs ComfyUI as an inference backend and applies ComfyUI workflows to live video frames. Cascade Phase 1 shipped in December 2024, and ComfyStream integrated with Livepeer network payments in January 2025.

The latency constraint is what separates this from batch inference: a 30 fps video stream must process each frame in roughly 33 ms or less. This drives both hardware requirements — faster GPUs, more VRAM for resident tensors — and configuration requirements: the ComfyUI workflow must be compiled and warm throughout the stream. Current use cases on the network include live avatar generation, style transfer overlays, live-video agents, and AI-enhanced scene analysis.

Cascade Setup
ComfyStream setup, workflow configuration, latency requirements, and troubleshooting.
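The 33 ms figure follows directly from the frame rate: per-frame inference plus any fixed per-frame overhead must fit inside the frame interval. The 5 ms overhead below is an assumption for illustration, not a measured network value:

```python
# Frame-budget arithmetic for live-video AI: at a given stream fps, per-frame
# inference plus fixed overhead must fit inside the frame interval.
def frame_budget_ms(fps: float) -> float:
    return 1000.0 / fps

def keeps_up(fps: float, inference_ms: float, overhead_ms: float = 5.0) -> bool:
    return inference_ms + overhead_ms <= frame_budget_ms(fps)

print(round(frame_budget_ms(30), 1))   # 33.3 ms per frame at 30 fps
print(keeps_up(30, inference_ms=25))   # True: 30 ms total fits the budget
print(keeps_up(30, inference_ms=31))   # False: 36 ms blows the budget
```

This is why cold model loads are disqualifying for Cascade: even a one-second load stall drops roughly 30 frames of a 30 fps stream.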
LLM inference
The LLM pipeline serves large language model inference (text completion, chat, and instruction-following) through an Ollama-based runner built by the Cloud SPE. Diffusion pipelines use the standard `ai-runner` Docker image, while LLM inference uses a separate `livepeer-ollama-runner` container.
Quantised LLMs are VRAM-efficient. A 4-bit quantised 7B model fits into 8 GB of VRAM and runs on hardware in the GTX 1080 / RTX 2060 class — machines that are too memory-constrained for most diffusion pipelines. This makes LLM inference the practical AI entry point for operators with older consumer GPUs.
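The VRAM claim can be sanity-checked with rough arithmetic: weight memory scales with parameter count times bits per parameter, plus overhead for the KV cache and activations. The flat 1.5 GB overhead here is an assumption for the sketch; real overhead varies with context length and runtime:

```python
# Rough VRAM estimate for a quantised LLM: weights at n bits per parameter,
# plus a flat overhead (assumed 1.5 GB) for KV cache and activations.
def approx_vram_gb(params_billion: float, bits: int, overhead_gb: float = 1.5) -> float:
    weights_gb = params_billion * bits / 8   # 1e9 params * (bits/8) bytes ≈ GB
    return weights_gb + overhead_gb

print(approx_vram_gb(7, 4))    # 5.0  -> a 4-bit 7B model fits an 8 GB card
print(approx_vram_gb(7, 16))   # 15.5 -> the same model at fp16 needs a 16 GB card
```

The same arithmetic explains the later tier guidance: a 4-bit 13B model lands around 8 GB of weights plus overhead, comfortable at 16 to 24 GB, while 70B models exceed a single 24 GB card even quantised.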
The Ollama catalogue is broad. Start with a network-tested model such as `meta-llama/Meta-Llama-3.1-8B-Instruct`.
LLM jobs route via capability match and price, the same as batch AI. Hardware fit and configuration, not stake or delegation history, are the primary competitive factors here.
What should I run?
Match your hardware to the workloads that fit. Focused coverage beats trying to run every workload poorly.
8–12 GB VRAM — entry-level GPU (GTX 1080, RTX 2060, 3060)
The diffusion pipelines that drive most AI fee volume require 16–24 GB. Your best options:
- Transcoding — modern NVIDIA GPUs handle video transcoding sessions competitively; NVENC session caps on consumer cards still apply. Active-set operators use transcoding as baseline earnings.
- LLM inference — quantised 7–8B models run well at 8 GB. Low barrier to entry for AI earnings.
- `image-to-text`, `segment-anything-2` — the two AI pipelines with sub-8 GB requirements. Lower fee volume than diffusion, but feasible on these cards.
- `text-to-image`, `image-to-image`, `image-to-video`, `audio-to-text` — all require 12–24 GB minimum.
16 GB VRAM — mid-range (RTX 3080, 4080, A4000)
This tier supports the full batch AI suite except the most VRAM-heavy diffusion models:
- Transcoding + batch AI (most pipelines)
- `audio-to-text` (12 GB), `image-to-text` (4 GB), `segment-anything-2` (6 GB) all fit comfortably
- `text-to-image` with smaller Lightning / Turbo variants (some fit at 16 GB)
- LLM inference with 13B quantised models
- Cascade live-video AI — lightweight workflows fit at 16 GB; heavier SDXL-based workflows stay tight
24 GB VRAM — high-end (RTX 3090, 4090, A5000, A6000)
A 24 GB card runs the full suite:
- Transcoding — full output ladder, higher `maxSessions`
- Full batch AI — all pipelines including `text-to-image`, `image-to-image`, `image-to-video` with production SDXL models
- Cascade live-video AI — most ComfyStream workflows fit; latency-optimised workloads are feasible
- LLM inference — 13B quantised models comfortably; 70B quantised models require multiple cards
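The tiers in this section can be summarised as a simple lookup. The thresholds mirror the guidance above; the workload names are illustrative shorthand, not configuration values:

```python
# Hypothetical helper mapping VRAM (GB) to the workload mix suggested in this
# section. Thresholds follow the tier guidance; names are shorthand only.
def suggested_workloads(vram_gb: int) -> list[str]:
    w = ["transcoding"]                      # viable at every tier
    if vram_gb >= 8:
        w += ["llm-7b-quantised", "image-to-text", "segment-anything-2"]
    if vram_gb >= 16:
        w += ["audio-to-text", "text-to-image (Lightning/Turbo)", "cascade (light)"]
    if vram_gb >= 24:
        w += ["text-to-image (SDXL)", "image-to-video", "cascade (most workflows)"]
    return w

print(suggested_workloads(8))   # entry tier: transcoding, LLM, low-VRAM pipelines
```

Each tier is strictly additive: more VRAM never removes an option, it only widens the feasible set.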
Warm the pipelines in highest demand (`text-to-image` is typically the most sought-after), cold the rest. See the Model and VRAM Reference for per-pipeline earnings guidance.
Multiple GPUs or data-centre class
- Pool architecture — route jobs across multiple workers, each on separate GPUs
- Different GPUs serve different pipelines simultaneously (assign per-GPU in `aiModels.json`)
- Fleet operations for multi-machine deployments
- Cascade becomes highly competitive with dedicated GPU per stream