At a glance
How routing works — and why it differs per job type
The routing mechanics differ significantly by job type. Understanding this is the key insight for operators deciding which workloads to prioritise.

Video transcoding
Transcoding jobs are routed by gateways using a multi-factor algorithm: stake weight + price + performance score. Gateways favour orchestrators with more total stake (own + delegated), competitive pricing, and high historical success rates. This means new or low-stake orchestrators are at a structural disadvantage for transcoding — they compete against established nodes with large delegator bases. Being in the active set (top 100 by stake) is a prerequisite, and even within the active set, high-stake nodes capture disproportionate transcoding volume.

Batch AI inference
AI jobs route on capability match + price ceiling. The gateway checks: does this orchestrator support the requested pipeline and model? Is the price below the gateway's `-maxPricePerCapability` limit? Stake plays a much smaller role.
This matters for operators: a new orchestrator with 24 GB VRAM and correctly configured AI pipelines competes for AI jobs immediately, regardless of stake. The barrier to entry for AI earnings is hardware and configuration, with delegation history playing a much smaller role.
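As a rough illustration, the AI routing check above amounts to a capability-and-price filter with no stake term. The dict shape and field names below are hypothetical, invented for this sketch rather than taken from go-livepeer's internals:

```python
# Illustrative gateway-side filter for batch AI jobs: capability match plus a
# per-capability price ceiling. Stake is deliberately absent from the check.
# Data shapes and names are hypothetical, not go-livepeer's actual structures.
def eligible(orch: dict, pipeline: str, model_id: str, max_price: int) -> bool:
    advertised = orch.get("capabilities", {}).get(pipeline, {})
    return model_id in advertised and advertised[model_id] <= max_price

orch = {"capabilities": {"text-to-image": {"ByteDance/SDXL-Lightning": 4_500_000}}}
print(eligible(orch, "text-to-image", "ByteDance/SDXL-Lightning", 5_000_000))  # True
print(eligible(orch, "text-to-image", "ByteDance/SDXL-Lightning", 4_000_000))  # False: over the ceiling
```

A correctly configured pipeline and a price under the gateway's ceiling is all it takes to enter the candidate pool.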
Cascade live-video AI
Routing adds latency as a primary factor alongside capability and price. Persistent streaming jobs cannot tolerate high jitter or cold model loads, so gateways select orchestrators with the right workflow already warm and with low round-trip latency. Geographic proximity to gateways matters here.

LLM inference
Routes the same way as batch AI, through capability match and price. The capability check targets a specific Ollama model ID instead of a diffusion pipeline. Operators running quantised 7B models on 8 GB cards receive LLM jobs without competing on the same axis as GPU-heavy diffusion operators.

The practical upshot: low-stake nodes still earn AI fees while transcoding demand stays concentrated in higher-stake nodes. A capable GPU plus sound AI configuration is the faster route to active earnings.
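To make the transcoding side of the contrast concrete, here is a toy selection score combining the three factors named above (stake, price, performance). The weighting is invented for illustration; it is not go-livepeer's actual algorithm, and all names and numbers are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Orch:
    name: str
    stake: float            # own + delegated stake
    price_per_pixel: int    # wei
    success_rate: float     # historical fraction of jobs completed

def score(o: Orch, max_price: int) -> float:
    """Toy multi-factor score: stake-weighted, discounted by price,
    scaled by historical performance. Not the real gateway algorithm."""
    if o.price_per_pixel > max_price:
        return 0.0          # over the gateway's price ceiling: never selected
    price_discount = 1 - o.price_per_pixel / max_price
    return o.stake * price_discount * o.success_rate

orchs = [
    Orch("high-stake", stake=500_000, price_per_pixel=1200, success_rate=0.99),
    Orch("new-node",   stake=5_000,   price_per_pixel=800,  success_rate=0.99),
]
best = max(orchs, key=lambda o: score(o, max_price=2000))
print(best.name)  # high-stake: wins despite the higher price
```

Because stake multiplies everything else, the cheaper new node cannot out-score the established one for transcoding, which is exactly the structural disadvantage described above. AI routing drops the stake term, so the same new node competes immediately there.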
Video transcoding
Transcoding has been the Livepeer network's core workload since 2017. When a broadcaster sends a live video stream to a gateway, the gateway routes individual segments — roughly 2 seconds each — to orchestrators for conversion into multiple resolution and bitrate variants. Your node receives raw video, decodes it (NVDEC), re-encodes it to the requested output profiles (NVENC), and returns the results. Sessions often run for hours. Your node processes dozens or hundreds of segments per active stream. GPU-accelerated transcoding via NVIDIA hardware is strongly recommended — CPU transcoding remains viable for tests but rarely stays cost-competitive at current network prices.

Video Transcoding Guide
Pricing configuration (wei and USD), session limits, NVENC caps, output rendition profiles, and benchmarking your setup.
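Back-of-envelope arithmetic helps when translating a USD earnings target into a wei-per-pixel price. The exchange rate and the single 720p30 rendition below are assumptions chosen for the example, not recommended values; see the guide above for actual pricing configuration:

```python
# Convert a USD-per-hour target into a wei-per-pixel price for one rendition.
# ETH_USD is an assumed exchange rate; check a live price before configuring.
ETH_USD = 3000.0
WEI_PER_ETH = 10**18

def wei_per_pixel(target_usd_per_hour: float, width: int, height: int, fps: int) -> float:
    pixels_per_hour = width * height * fps * 3600
    wei_per_hour = target_usd_per_hour / ETH_USD * WEI_PER_ETH
    return wei_per_hour / pixels_per_hour

# One 720p30 rendition at a $0.30/hour target:
print(round(wei_per_pixel(0.30, 1280, 720, 30)))  # ≈ 1005 wei per pixel
```

Real streams request a ladder of several renditions, so total pixel throughput (and thus revenue at a given wei price) is a multiple of the single-rendition figure.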
Batch AI inference
The AI Subnet launched in Q3 2024, adding single-request inference as a job category alongside transcoding. A batch AI job arrives as an HTTP request routed from a gateway to your `ai-runner` Docker container. The container runs inference — generating an image, captioning a photo, transcribing audio — and returns the result. The session ends when the result is returned.
For the nine current pipelines, warm model recommendations, and per-pipeline pricing guidance, see AI Pipelines and the Model and VRAM Reference.
Batch AI Setup
Install and configure the `ai-runner` container, `aiModels.json`, warm models, and pipeline pricing.
Model and VRAM Reference
Per-pipeline VRAM requirements, recommended models, and pricing benchmarks.
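For orientation, a minimal `aiModels.json` might look like the sketch below: one warm diffusion model and one cold low-VRAM pipeline. The model IDs and prices are illustrative, and the field names are given as commonly documented; treat the Batch AI Setup guide as the authoritative schema.

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4768371,
    "warm": true
  },
  {
    "pipeline": "image-to-text",
    "model_id": "Salesforce/blip-image-captioning-large",
    "price_per_unit": 2000000,
    "warm": false
  }
]
```

Warm models stay loaded in VRAM and respond immediately; cold models load on demand, which saves memory at the cost of first-request latency.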
Cascade live-video AI
Cascade live-video AI is architecturally distinct from batch inference. Instead of processing a single request and returning a result, your GPU is allocated to a persistent, continuous stream that receives live video frames, runs inference on each frame, and returns the transformed stream.

Cascade is the network's strategic name for live-video AI. The underlying technical implementation is ComfyStream, which runs ComfyUI as an inference backend and applies ComfyUI workflows to live video frames. Cascade Phase 1 shipped in December 2024, and ComfyStream integrated with Livepeer network payments in January 2025.

The latency constraint is what separates this from batch inference: a 30 fps video stream must process each frame in roughly 33 ms or less. This drives both hardware requirements — faster GPUs, more VRAM for resident tensors — and configuration requirements: the ComfyUI workflow must be compiled and warm throughout the stream. Current use cases on the network include live avatar generation, style transfer overlays, live-video agents, and AI-enhanced scene analysis.

Cascade Setup
ComfyStream setup, workflow configuration, latency requirements, and troubleshooting.
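The 33 ms figure follows directly from the frame rate: per-frame inference plus any fixed per-frame overhead must fit inside the frame interval. The 5 ms overhead below is an assumption for illustration, not a measured network value:

```python
# Frame-budget arithmetic for live-video AI: at a given stream fps, per-frame
# inference plus fixed overhead must fit inside the frame interval.
def frame_budget_ms(fps: float) -> float:
    return 1000.0 / fps

def keeps_up(fps: float, inference_ms: float, overhead_ms: float = 5.0) -> bool:
    return inference_ms + overhead_ms <= frame_budget_ms(fps)

print(round(frame_budget_ms(30), 1))   # 33.3 ms per frame at 30 fps
print(keeps_up(30, inference_ms=25))   # True: 30 ms total fits the budget
print(keeps_up(30, inference_ms=31))   # False: 36 ms blows the budget
```

This is why cold model loads are disqualifying for Cascade: even a one-second load stall drops roughly 30 frames of a 30 fps stream.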
LLM inference
The LLM pipeline serves large language model inference (text completion, chat, and instruction-following) through an Ollama-based runner built by the Cloud SPE. Diffusion pipelines use the standard `ai-runner` Docker image, while LLM inference uses a separate `livepeer-ollama-runner` container.
Quantised LLMs are VRAM-efficient. A 4-bit quantised 7B model fits into 8 GB of VRAM and runs on hardware in the GTX 1080 / RTX 2060 class — machines that are too memory-constrained for most diffusion pipelines. This makes LLM inference the practical AI entry point for operators with older consumer GPUs.
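The VRAM claim can be sanity-checked with rough arithmetic: weight memory scales with parameter count times bits per parameter, plus overhead for the KV cache and activations. The flat 1.5 GB overhead here is an assumption for the sketch; real overhead varies with context length and runtime:

```python
# Rough VRAM estimate for a quantised LLM: weights at n bits per parameter,
# plus a flat overhead (assumed 1.5 GB) for KV cache and activations.
def approx_vram_gb(params_billion: float, bits: int, overhead_gb: float = 1.5) -> float:
    weights_gb = params_billion * bits / 8   # 1e9 params * (bits/8) bytes ≈ GB
    return weights_gb + overhead_gb

print(approx_vram_gb(7, 4))    # 5.0  -> a 4-bit 7B model fits an 8 GB card
print(approx_vram_gb(7, 16))   # 15.5 -> the same model at fp16 needs a 16 GB card
```

The same arithmetic explains the later tier guidance: a 4-bit 13B model lands around 8 GB of weights plus overhead, comfortable at 16 to 24 GB, while 70B models exceed a single 24 GB card even quantised.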
The Ollama catalogue is broad. Start with a network-tested model such as `meta-llama/Meta-Llama-3.1-8B-Instruct`.
LLM jobs route via capability match and price, the same as batch AI. Hardware fit and configuration, not stake or delegation history, are the primary competitive factors here.
What should I run?
Match your hardware to the workloads that fit. Focused coverage beats trying to run every workload poorly.
8–12 GB VRAM — entry-level GPU (GTX 1080, RTX 2060, 3060)
The diffusion pipelines that drive most AI fee volume require 16–24 GB. Your best options:
- Transcoding — modern NVIDIA GPUs handle video transcoding sessions competitively; NVENC session caps on consumer cards still apply. Active-set operators use transcoding as baseline earnings.
- LLM inference — quantised 7–8B models run well at 8 GB. Low barrier to entry for AI earnings.
- `image-to-text`, `segment-anything-2` — the two AI pipelines with sub-8 GB requirements. Lower fee volume than diffusion, but feasible on these cards.
- `text-to-image`, `image-to-image`, `image-to-video`, `audio-to-text` — all require 12–24 GB minimum.
16 GB VRAM — mid-range (RTX 3080, 4080, A4000)
This tier supports the full batch AI suite except the most VRAM-heavy diffusion models:
- Transcoding + batch AI (most pipelines)
- `audio-to-text` (12 GB), `image-to-text` (4 GB), `segment-anything-2` (6 GB) all fit comfortably
- `text-to-image` with smaller Lightning / Turbo variants (some fit at 16 GB)
- LLM inference with 13B quantised models
- Cascade live-video AI — lightweight workflows fit at 16 GB; heavier SDXL-based workflows stay tight
24 GB VRAM — high-end (RTX 3090, 4090, A5000, A6000)
A 24 GB card runs the full suite:
- Transcoding — full output ladder, higher `maxSessions`
- Full batch AI — all pipelines including `text-to-image`, `image-to-image`, `image-to-video` with production SDXL models
- Cascade live-video AI — most ComfyStream workflows fit; latency-optimised workloads are feasible
- LLM inference — 13B quantised models comfortably; 70B quantised models require multiple cards
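The tiers in this section can be summarised as a simple lookup. The thresholds mirror the guidance above; the workload names are illustrative shorthand, not configuration values:

```python
# Hypothetical helper mapping VRAM (GB) to the workload mix suggested in this
# section. Thresholds follow the tier guidance; names are shorthand only.
def suggested_workloads(vram_gb: int) -> list[str]:
    w = ["transcoding"]                      # viable at every tier
    if vram_gb >= 8:
        w += ["llm-7b-quantised", "image-to-text", "segment-anything-2"]
    if vram_gb >= 16:
        w += ["audio-to-text", "text-to-image (Lightning/Turbo)", "cascade (light)"]
    if vram_gb >= 24:
        w += ["text-to-image (SDXL)", "image-to-video", "cascade (most workflows)"]
    return w

print(suggested_workloads(8))   # entry tier: transcoding, LLM, low-VRAM pipelines
```

Each tier is strictly additive: more VRAM never removes an option, it only widens the feasible set.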
Warm the pipelines in highest demand (`text-to-image` is typically the most sought-after), cold the rest. See the Model and VRAM Reference for per-pipeline earnings guidance.
Multiple GPUs or data-centre class
- Pool architecture — route jobs across multiple workers, each on separate GPUs
- Different GPUs serve different pipelines simultaneously (assign per-GPU in `aiModels.json`)
- Fleet operations for multi-machine deployments
- Cascade becomes highly competitive with dedicated GPU per stream