Pipeline categories at a glance
Batch AI pipelines
Batch AI pipelines follow a request-and-response model: your application sends a job to the network, an orchestrator processes it, and you receive the result. There is no persistent connection. The GPU is assigned to your job, completes the inference, and is released. The Livepeer network supports several batch pipelines.

Orchestrators keep one model per pipeline “warm”: loaded and ready in GPU memory. Requesting a model that no orchestrator currently has warm still works, but the first response is slower while the model loads (30 seconds to several minutes). Warm model availability per pipeline is listed on each pipeline’s reference page.

Batch pipelines are accessed through the AI Gateway API. Gateway options include the Studio-managed gateway and the free community gateway. See Developer Stack for a full comparison. Where to start: AI Quickstart.

Real-time AI
Real-time AI on Livepeer is built around the `live-video-to-video` pipeline type. Unlike batch pipelines, real-time AI maintains a persistent stream connection: video frames flow in continuously, inference runs on each frame, and transformed frames flow back out at sub-second latency.
The infrastructure model differs from batch processing in four ways:
- Connection: Persistent WebRTC or RTMP stream, not request/response
- Billing: Per second of compute, not per pixel or per output
- GPU assignment: Dedicated to your stream for its full duration
- Output: Continuous frame-by-frame results, not a single returned asset
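To make the billing contrast concrete, here is a small illustrative sketch. The prices below are invented placeholders (real rates are set by orchestrators and vary); only the billing shapes, per second of compute versus per unit of output, come from the list above.

```python
# Illustrative only: both prices are made-up placeholders, not network rates.
REALTIME_PRICE_PER_SECOND = 0.001   # hypothetical $/s of dedicated GPU compute
BATCH_PRICE_PER_MEGAPIXEL = 0.0005  # hypothetical $/megapixel of output

def realtime_cost(stream_seconds: int) -> float:
    """Real-time AI bills per second: the GPU is dedicated for the full stream."""
    return stream_seconds * REALTIME_PRICE_PER_SECOND

def batch_cost(width: int, height: int, num_images: int) -> float:
    """Batch AI bills per unit of output (e.g. pixels), not per wall-clock second."""
    megapixels = width * height / 1_000_000
    return megapixels * num_images * BATCH_PRICE_PER_MEGAPIXEL

ten_minute_stream = realtime_cost(10 * 60)   # 600 s of dedicated GPU time
four_images = batch_cost(1024, 1024, 4)      # 4 roughly one-megapixel outputs
```

The point of the contrast: an idle real-time stream still accrues cost because the GPU stays dedicated, while a batch job only costs what its output measures.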
ComfyStream (github.com/livepeer/comfystream) turns ComfyUI’s node-graph workflow editor into a real-time inference engine for live video. Daydream is built on ComfyStream: if you are using the Daydream API, you are already running on this infrastructure.
Use cases
Real-time AI on Livepeer supports:
- Live video style transfer and artistic transformation
- Interactive generative overlays for live streams
- Real-time scene augmentation and frame-by-frame computer vision
VTuber and agent avatar infrastructure
VTuber avatar generation is the most technically demanding real-time AI use case. It requires sub-100 ms latency, face/body tracking input, and a real-time diffusion pipeline running at 20+ FPS. Livepeer’s real-time AI infrastructure supports this via ComfyStream.

The Agent SPE, a treasury-funded Special Purpose Entity approved in April 2025 with 30,000 LPT, built the first production VTuber and AI avatar pipeline on Livepeer. Its deliverables are:
- A real-time agent avatar generation pipeline using ComfyStream and StreamDiffusion
- A Livepeer model provider plugin for the Eliza agent framework (ai16z), enabling Eliza agents to route LLM inference through the Livepeer network
- ComfyStream as the real-time inference engine — see Build with ComfyStream
- The `live-video-to-video` pipeline type via the AI gateway
- StreamDiffusion custom nodes from ComfyUI-Stream-Pack for diffusion-based avatar transformation
- GPU requirements: NVIDIA RTX 3090 or better; RTX 4090 recommended for 25 FPS
Orchestrator availability for `live-video-to-video` is lower than for batch pipelines. Test under expected concurrency before production launch.
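The FPS targets above imply a hard per-frame time budget: tracking input, diffusion inference, and frame encode/return must all fit inside it. A quick sketch of the arithmetic:

```python
def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available per frame at a given frame rate."""
    return 1000.0 / fps

# At the 20 FPS floor, the entire per-frame loop must finish within:
budget_20 = frame_budget_ms(20)   # 50.0 ms
# At the 25 FPS target (RTX 4090), the budget tightens to:
budget_25 = frame_budget_ms(25)   # 40.0 ms
```

Note that this per-frame compute budget is separate from the sub-100 ms end-to-end latency requirement, which also includes network transit in both directions.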
LLM pipeline
The LLM pipeline brings text inference to the Livepeer network using an Ollama-based runner with an OpenAI-compatible API. From a developer’s perspective, it works like any OpenAI-compatible chat completions endpoint; requests route to decentralised GPU orchestrators rather than a centralised cloud provider.

The LLM pipeline runs on a wider range of GPU hardware than diffusion-based batch pipelines. An orchestrator needs as little as 8 GB of VRAM to serve LLM workloads, making it accessible to a larger pool of network participants.

The LLM SPE built and maintains this pipeline. The Cloud SPE provides managed gateway access to it, making decentralised LLM inference available at https://livepeer.studio/api/beta/generate/llm with a Studio API key and no infrastructure setup.
Working with the LLM pipeline
The LLM endpoint accepts the OpenAI `/v1/chat/completions` request format.
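A minimal sketch of such a request against the Studio-managed endpoint, using only the Python standard library. The URL and model name are the ones given on this page; the `model` and `messages` fields follow the standard OpenAI chat format, but check the pipeline reference for the exact fields the gateway accepts.

```python
import json
import urllib.request

# Managed gateway endpoint provided by the Cloud SPE (see above).
LLM_ENDPOINT = "https://livepeer.studio/api/beta/generate/llm"

def build_chat_request(api_key, messages,
                       model="meta-llama/Meta-Llama-3.1-8B-Instruct"):
    """Build an OpenAI-style chat completions request for the LLM pipeline."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        LLM_ENDPOINT,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def chat(api_key, messages, **kwargs):
    """Send the request and return the parsed chat completion response."""
    with urllib.request.urlopen(build_chat_request(api_key, messages, **kwargs)) as resp:
        return json.load(resp)

# Build (but do not send) an example request.
req = build_chat_request(
    "YOUR_STUDIO_API_KEY",
    [{"role": "user",
      "content": "Explain warm vs cold model loading in one sentence."}],
)
```

Calling `chat(api_key, messages)` performs the POST and returns the decoded JSON response; in production you would add error handling and, if the gateway supports it, streaming.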
Available models include `meta-llama/Meta-Llama-3.1-8B-Instruct` (warm, 8 GB VRAM), `mistralai/Mistral-7B-Instruct-v0.3`, `google/gemma-2-9b-it`, and `Qwen/Qwen2.5-7B-Instruct`. Any Ollama-compatible model works; cold-start applies to models not currently loaded on any orchestrator.
The LLM pipeline supports applications that need:
- Text or code generation
- Conversational agents and chatbots
- AI copilots embedded in applications
- Decentralised, open-source model inference without proprietary API dependency
Choose your path
The key question: does your application transform a live stream continuously, or process one piece of media at a time? Continuous live transformation requires real-time AI. One-at-a-time processing uses batch AI.
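That decision rule is simple enough to state as code; the function below is purely illustrative.

```python
def choose_pipeline_category(continuous_live_transformation: bool) -> str:
    """Apply the key question: continuous live transformation -> real-time AI
    (the live-video-to-video pipeline type); one piece of media at a time ->
    batch AI (request/response via the AI Gateway API)."""
    return "real-time AI" if continuous_live_transformation else "batch AI"

print(choose_pipeline_category(True))   # real-time AI
print(choose_pipeline_category(False))  # batch AI
```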
Related pages
AI Quickstart
Make your first batch AI inference call via the AI Gateway API.
ComfyStream Quickstart
Build and run a real-time AI video pipeline with ComfyStream.
Build an AI Agent
Connect an Eliza agent to the Livepeer LLM pipeline.
AI Model Support
All supported models, warm model availability, and VRAM requirements.
BYOC
Deploy a custom model container on the Livepeer network.
Grants & Programmes
Agent SPE, LLM SPE, and Cloud SPE details, plus funded builder programmes.