The Livepeer network supports three distinct categories of AI pipeline. Each category works differently at the protocol level: different connection models, different billing, different GPU requirements. Understanding which category fits your use case before building prevents rework.

Constraint: Livepeer AI pipelines run on GPU capacity contributed by independent orchestrators. Availability and latency depend on the orchestrator set at any given time. The community gateway at dream-gateway.livepeer.cloud routes to the best available orchestrator for development; production applications use a self-hosted gateway or a gateway provider for routing control.

Pipeline categories at a glance

Batch AI pipelines

Batch AI pipelines follow a request-and-response model: your application sends a job to the network, an orchestrator processes it, and you receive the result. There is no persistent connection. The GPU is assigned to your job, completes the inference, and is released.

Orchestrators keep one model per pipeline “warm” in GPU memory. Requesting a model that no orchestrator currently has warm still works, but the first response is slower while the model loads (30 seconds to 5 minutes depending on model size). Warm model availability per pipeline is listed on the model support page.

Where to start: AI quickstart
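Because a cold model can add up to 5 minutes before the first response, a batch client benefits from a timeout that accounts for model loading. The following is a minimal sketch under stated assumptions: the 60-second inference allowance is a hypothetical figure, the endpoint URL and payload are supplied by the caller, and the response is assumed to be JSON.

```python
import json
import urllib.request

# Upper bound on cold-start model loading quoted in this section (5 minutes).
COLD_START_MAX_SECONDS = 5 * 60
# Hypothetical allowance for the inference itself once the model is loaded.
INFERENCE_ALLOWANCE_SECONDS = 60


def request_timeout(model_is_warm: bool) -> int:
    """Return a client-side timeout in seconds for a single batch job."""
    if model_is_warm:
        return INFERENCE_ALLOWANCE_SECONDS
    return COLD_START_MAX_SECONDS + INFERENCE_ALLOWANCE_SECONDS


def run_batch_job(url: str, payload: dict, model_is_warm: bool) -> dict:
    """POST one job and block until the result arrives (no persistent connection)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    timeout = request_timeout(model_is_warm)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the gateway answers with a JSON body.
        return json.load(response)
```

Checking the model support page first tells you whether to pass `model_is_warm=True`; when in doubt, the longer timeout is the safer default.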

Real-time AI

Real-time AI on Livepeer is built around the live-video-to-video pipeline type. Unlike batch pipelines, real-time AI maintains a persistent stream connection: video frames flow in continuously, inference runs on each frame, and transformed frames flow back out at sub-second latency. The infrastructure model differs from batch processing in four ways:
  • Connection: Persistent WebRTC or trickle stream, not request/response
  • Billing: Per second of compute time (confirmed in the go-livepeer LivePaymentSender interface)
  • GPU assignment: Dedicated to your stream for its full duration
  • Output: Continuous frame-by-frame results, not a single returned asset

Developer tools for real-time AI

Three tools serve different real-time AI use cases:
  • ComfyStream (livepeer/comfystream) is the primary tool for building real-time AI pipelines. It turns ComfyUI’s node-graph workflow editor into a real-time inference engine for live video. Supported models include StreamDiffusion, ControlNet, IPAdapter, FaceID, LoRA, Whisper (audio), Gemma (video understanding), and SuperResolution. See ComfyStream overview.
  • PyTrickle (livepeer/pytrickle) is the Python SDK for building custom real-time processing services outside ComfyUI. Subclass FrameProcessor, implement process_frame(), and PyTrickle handles the trickle protocol transport, session management, and frame serialisation. See PyTrickle overview.
  • ComfyUI-Stream-Pack (livepeer/ComfyUI-Stream-Pack) provides custom ComfyUI nodes for live video and audio input: LoadTensor and LoadAudioTensor nodes that feed real-time media into ComfyUI workflows. See Stream Pack overview.
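The subclass-and-override pattern described for PyTrickle can be illustrated without the SDK installed. The sketch below defines a local stand-in base class; it is not imported from pytrickle, and the real SDK's signatures and frame types may differ (it may be asynchronous, for example). Frames are modelled here as nested lists of 8-bit grayscale values purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List

# Illustrative frame type: rows of 8-bit grayscale pixel values.
Frame = List[List[int]]


class FrameProcessor(ABC):
    """Local stand-in for the SDK base class. NOT the real pytrickle API."""

    @abstractmethod
    def process_frame(self, frame: Frame) -> Frame:
        """Transform one decoded frame and return the result."""


class InvertProcessor(FrameProcessor):
    """Example subclass that inverts every pixel value."""

    def process_frame(self, frame: Frame) -> Frame:
        return [[255 - pixel for pixel in row] for row in frame]


def run_stream(processor: FrameProcessor, frames: Iterable[Frame]) -> Iterator[Frame]:
    """Stand-in for the transport loop an SDK would own: one call per frame."""
    for frame in frames:
        yield processor.process_frame(frame)
```

The point of the pattern is that application code lives entirely in `process_frame()`, while transport, sessions, and serialisation stay in the framework; consult the PyTrickle overview for the actual class and method signatures.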

VTuber and agent avatar infrastructure

VTuber avatar generation requires sub-100 ms latency, face/body tracking input, and a real-time diffusion pipeline running at 20+ FPS. Livepeer’s real-time AI infrastructure supports this via ComfyStream.

The Agent SPE (treasury-funded Special Purpose Entity, approved April 2025 with 30,000 LPT) built the first production VTuber and AI avatar pipeline on Livepeer, delivering:
  • A real-time agent avatar generation pipeline using ComfyStream and StreamDiffusion
  • A Livepeer model provider plugin for the Eliza agent framework (ai16z), enabling Eliza agents to route LLM inference through the Livepeer network
Technical path for VTuber / avatar products:
  1. ComfyStream as the real-time inference engine
  2. live-video-to-video pipeline type via the AI gateway
  3. StreamDiffusion custom nodes from ComfyUI-Stream-Pack for diffusion-based avatar transformation
  4. GPU requirements: NVIDIA RTX 3090 or better; RTX 4090 recommended for 25 FPS
Where to start for real-time AI: ComfyStream quickstart
Where to start for AI agents: Eliza Livepeer plugin tutorial
Real-time AI requires a dedicated GPU for the duration of the stream. At peak network load, orchestrator availability for live-video-to-video is lower than for batch pipelines. Test under expected concurrency before production launch.
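The frame-rate targets in this section translate directly into a per-frame time budget: 20 FPS leaves 50 ms per frame and 25 FPS leaves 40 ms. A minimal sketch of that arithmetic follows, assuming frames are processed one at a time with no batching or pipelining; the function names are illustrative.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a target FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def sustains_fps(inference_ms: float, target_fps: float) -> bool:
    """Check whether a measured per-frame inference time fits the budget.

    Assumption: frames are processed sequentially, so inference time alone
    must not exceed the per-frame budget.
    """
    return inference_ms <= frame_budget_ms(target_fps)
```

For example, a workflow measured at 45 ms per frame fits the 20 FPS budget but not the 25 FPS one, which is worth checking on your target GPU before committing to a frame rate.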

LLM pipeline

The LLM pipeline brings text inference to the Livepeer network using an Ollama-based runner with an OpenAI-compatible API. From a developer’s perspective, it works like any OpenAI-compatible chat completions endpoint, except that requests route to decentralised GPU orchestrators instead of a centralised cloud provider.

The LLM pipeline is currently in beta. It runs on a wider range of GPU hardware than diffusion-based batch pipelines: an orchestrator needs as little as 8 GB of VRAM to serve LLM workloads.

A request through the community gateway looks like this:
curl -X POST https://dream-gateway.livepeer.cloud/llm \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "user", "content": "Explain Livepeer in one sentence."}
    ]
  }'
Supported models include meta-llama/Meta-Llama-3.1-8B-Instruct (warm, 8 GB VRAM), mistralai/Mistral-7B-Instruct-v0.3, google/gemma-2-9b-it, and Qwen/Qwen2.5-7B-Instruct. Any Ollama-compatible model works; a cold start applies to models not currently loaded on any orchestrator.

The LLM SPE built and maintains this pipeline. The Cloud SPE provides managed gateway access to it for production use.

Where to start: AI quickstart for the LLM endpoint; Eliza Livepeer plugin tutorial for the agent integration path.
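The curl request in this section can also be issued from Python using only the standard library. This is a hedged sketch: the URL and request body mirror the curl example, while the response parsing assumes the standard OpenAI chat completions shape (`choices[0].message.content`), which this page implies but does not show. The 360-second default timeout is an illustrative allowance for cold starts.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this section.
GATEWAY_LLM_URL = "https://dream-gateway.livepeer.cloud/llm"


def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        GATEWAY_LLM_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of a chat completions response.

    Assumption: the response follows the OpenAI chat completions shape.
    """
    return response_body["choices"][0]["message"]["content"]


def chat(model: str, user_message: str, timeout: float = 360.0) -> str:
    """Send one chat request and return the assistant's reply."""
    request = build_chat_request(model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

Calling `chat("meta-llama/Meta-Llama-3.1-8B-Instruct", "Explain Livepeer in one sentence.")` sends the same request as the curl command; keeping request construction separate from sending makes the payload easy to inspect or unit test.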

Choose your path

The key question: does your application transform a live stream continuously, or process one piece of media at a time? Continuous live transformation requires real-time AI. One-at-a-time processing uses batch AI. Text inference uses the LLM pipeline. The AI quickstart covers the batch and LLM paths. The ComfyStream quickstart covers the real-time path.
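The decision rule above can be written down as a small function. This is only an illustrative sketch: the names are hypothetical, and the precedence (live transformation first, then text inference, then batch) is an assumption about how to resolve applications that match more than one description.

```python
from enum import Enum


class Pipeline(Enum):
    BATCH = "batch AI"
    REAL_TIME = "real-time AI"
    LLM = "LLM pipeline"


def choose_pipeline(continuous_live_stream: bool, text_inference: bool) -> Pipeline:
    """Map the two questions in this section to a pipeline category."""
    if continuous_live_stream:
        # Continuous live transformation requires a persistent stream.
        return Pipeline.REAL_TIME
    if text_inference:
        # Text inference goes through the OpenAI-compatible LLM endpoint.
        return Pipeline.LLM
    # One-at-a-time media processing is request-and-response.
    return Pipeline.BATCH
```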
Last modified on May 19, 2026