The Livepeer network supports three distinct categories of AI pipelines. Each category works differently at the protocol level: different connection models, different billing, different GPU requirements. Understanding which category fits your use case before you build prevents rework.

Limitation: Livepeer AI pipelines run on GPU capacity contributed by independent orchestrators. Availability and latency depend on the orchestrator set at any given time. The Studio gateway routes to the best available orchestrator; direct gateway access gives more control at the cost of more operational responsibility.

Pipeline categories at a glance

  • Batch AI: request/response jobs. A GPU is assigned per job and released when it completes. Accessed through the AI Gateway API.
  • Real-time AI: persistent live-video-to-video streams. A GPU is dedicated to each stream for its full duration. Built with ComfyStream.
  • LLM pipeline: OpenAI-compatible chat completions routed to decentralised GPU orchestrators. Serves from GPUs with as little as 8 GB of VRAM.

Batch AI pipelines

Batch AI pipelines follow a request-and-response model: your application sends a job to the network, an orchestrator processes it, and you receive the result. There is no persistent connection. The GPU is assigned to your job, completes the inference, and is released. The full set of batch pipelines, with their supported models, is listed in AI Model Support.

Orchestrators keep one model per pipeline “warm”, meaning loaded and ready in GPU memory. Requesting a model that no orchestrator currently has warm still works, but the first response is slower while the model loads (30 seconds to several minutes). Warm model availability per pipeline is listed on each pipeline’s reference page.

Batch pipelines are accessed through the AI Gateway API. Gateway options include the Studio-managed gateway and the free community gateway. See Developer Stack for a full comparison.

Where to start: AI Quickstart
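For orientation, the following Python sketch shows what a single batch call through the Studio-managed gateway might look like. The text-to-image endpoint path and the model_id/prompt field names are assumptions inferred from the LLM endpoint documented further down this page; check the AI Gateway API reference via the AI Quickstart for the exact request shape.

import os
import requests

# Assumed endpoint, following the pattern of the documented LLM endpoint
# (https://livepeer.studio/api/beta/generate/llm); confirm the exact path
# and field names against the AI Gateway API reference.
GATEWAY_URL = "https://livepeer.studio/api/beta/generate/text-to-image"

response = requests.post(
    GATEWAY_URL,
    headers={"Authorization": f"Bearer {os.environ['LIVEPEER_API_KEY']}"},
    json={
        # Example model identifier; any model listed in AI Model Support can
        # be requested, and cold models simply respond more slowly at first.
        "model_id": "SG161222/RealVisXL_V4.0_Lightning",
        "prompt": "A watercolour skyline at dusk",
    },
    # Cold-start can add 30 seconds to several minutes while the model loads,
    # so allow a generous timeout on the first request.
    timeout=300,
)
response.raise_for_status()
print(response.json())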

Real-time AI

Real-time AI on Livepeer is built around the live-video-to-video pipeline type. Unlike batch pipelines, real-time AI maintains a persistent stream connection: video frames flow in continuously, inference runs on each frame, and transformed frames flow back out at sub-second latency. The infrastructure model differs from batch processing in four ways:
  • Connection: Persistent WebRTC or RTMP stream, not request/response
  • Billing: Per second of compute, not per pixel or per output
  • GPU assignment: Dedicated to your stream for its full duration
  • Output: Continuous frame-by-frame results, not a single returned asset
ComfyStream is the primary tool for building real-time AI pipelines on Livepeer. It is an open-source ComfyUI plugin (github.com/livepeer/comfystream) that turns ComfyUI’s node-graph workflow editor into a real-time inference engine for live video. Daydream is built on ComfyStream — if you are using the Daydream API, you are already running on this infrastructure.

Use cases

Real-time AI on Livepeer supports:
  • Live video style transfer and artistic transformation
  • Interactive generative overlays for live streams
  • Real-time scene augmentation and frame-by-frame computer vision

VTuber and agent avatar infrastructure

VTuber avatar generation is the most technically demanding real-time AI use case. It requires sub-100ms latency, face/body tracking input, and a real-time diffusion pipeline running at 20+ FPS. Livepeer’s real-time AI infrastructure supports this via ComfyStream. The Agent SPE — a treasury-funded Special Purpose Entity approved in April 2025 with 30,000 LPT — built the first production VTuber and AI avatar pipeline on Livepeer. Its deliverables are:
  • A real-time agent avatar generation pipeline using ComfyStream and StreamDiffusion
  • A Livepeer model provider plugin for the Eliza agent framework (ai16z), enabling Eliza agents to route LLM inference through the Livepeer network
Technical path for VTuber / avatar products:
  1. ComfyStream as the real-time inference engine — see Build with ComfyStream
  2. live-video-to-video pipeline type via the AI gateway
  3. StreamDiffusion custom nodes from ComfyUI-Stream-Pack for diffusion-based avatar transformation
  4. GPU requirements: NVIDIA RTX 3090 or better; RTX 4090 recommended for 25 FPS
Where to start for VTuber / avatar: ComfyStream Quickstart
Where to start for AI agents with avatar output: Build an AI Agent on Livepeer

Limitation: Real-time AI requires a dedicated GPU for the duration of the stream. At peak network load, orchestrator availability for live-video-to-video is lower than for batch pipelines. Test under expected concurrency before production launch.

LLM pipeline

The LLM pipeline brings text inference to the Livepeer network using an Ollama-based runner with an OpenAI-compatible API. From a developer’s perspective, it works like any OpenAI-compatible chat completions endpoint; requests route to decentralised GPU orchestrators rather than a centralised cloud provider.

The LLM pipeline runs on a wider range of GPU hardware than diffusion-based batch pipelines. An orchestrator needs as little as 8 GB of VRAM to serve LLM workloads, making it accessible to a larger pool of network participants.

The LLM SPE built and maintains this pipeline. The Cloud SPE provides managed gateway access to it, making decentralised LLM inference available at https://livepeer.studio/api/beta/generate/llm with a Studio API key and no infrastructure setup.

Working with the LLM pipeline

The LLM endpoint accepts the OpenAI /v1/chat/completions request format:
curl -X POST https://livepeer.studio/api/beta/generate/llm \
  -H "Authorization: Bearer $LIVEPEER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "user", "content": "Explain Livepeer in one sentence."}
    ]
  }'
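From application code, the same request looks like the Python sketch below, using the requests library against the endpoint documented above. The response parsing assumes the standard OpenAI chat-completions shape (choices[0].message.content); adjust it if the gateway wraps the payload differently.

import os
import requests

resp = requests.post(
    "https://livepeer.studio/api/beta/generate/llm",
    headers={"Authorization": f"Bearer {os.environ['LIVEPEER_API_KEY']}"},
    json={
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "messages": [
            {"role": "user", "content": "Explain Livepeer in one sentence."}
        ],
    },
    timeout=120,
)
resp.raise_for_status()
# Assumes an OpenAI-style chat completions response body.
print(resp.json()["choices"][0]["message"]["content"])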
Supported models include meta-llama/Meta-Llama-3.1-8B-Instruct (warm, 8 GB VRAM), mistralai/Mistral-7B-Instruct-v0.3, google/gemma-2-9b-it, and Qwen/Qwen2.5-7B-Instruct. Any Ollama-compatible model works; cold-start applies to models not currently loaded on any orchestrator.

The LLM pipeline supports applications that need:
  • Text or code generation
  • Conversational agents and chatbots
  • AI copilots embedded in applications
  • Decentralised, open-source model inference without proprietary API dependency
Where to start: AI Quickstart. For the Eliza integration tutorial, see Build an AI Agent on Livepeer.

Choose your path

The key question: does your application transform a live stream continuously, or process one piece of media at a time? Continuous live transformation requires real-time AI. One-at-a-time processing uses batch AI.

AI Quickstart

Make your first batch AI inference call via the AI Gateway API.

ComfyStream Quickstart

Build and run a real-time AI video pipeline with ComfyStream.

Build an AI Agent

Connect an Eliza agent to the Livepeer LLM pipeline.

AI Model Support

All supported models, warm model availability, and VRAM requirements.

BYOC

Deploy a custom model container on the Livepeer network.

Grants & Programmes

Agent SPE, LLM SPE, and Cloud SPE details, plus funded builder programmes.