The Livepeer network supports three distinct categories of AI pipeline. Each one works differently at the protocol level — different connection models, different billing, different GPU requirements — and each one suits a different class of workload. Understanding which category fits your use case before you start building will save you time and rework.

Pipeline Categories at a Glance

| Category | What it does | Best for | Primary tool |
| --- | --- | --- | --- |
| Batch AI | Single request → inference → result | Image generation, transcription, upscaling, captioning | AI Gateway API |
| Real-time AI | Persistent stream → continuous frame-by-frame output | Live video transformation, VTuber avatars, generative overlays | ComfyStream |
| LLM | Text in → text out (OpenAI-compatible) | Chatbots, agents, copilots, text inference | LLM API (Ollama-based) |

Batch AI Pipelines

Batch AI pipelines follow a request-and-response model: your application sends a job to the network, an orchestrator processes it, and you receive the result. There is no persistent connection. The GPU is assigned to your job, completes the inference, and is released. The Livepeer network currently supports the following batch pipelines:
| Pipeline | What it does | Min VRAM |
| --- | --- | --- |
| text-to-image | Generate images from text prompts | 24 GB |
| image-to-image | Style transfer, enhancement, img2img | ~16 GB |
| image-to-video | Animate images into video clips | ~16 GB |
| image-to-text | Generate captions or descriptions for images | 4 GB |
| audio-to-text | Speech recognition (ASR) with timestamps | ~16 GB |
| text-to-speech | Generate natural speech from text | ~16 GB |
| upscale | Upscale low-resolution images without distortion | ~16 GB |
| segment-anything-2 | Promptable visual segmentation for images and video | ~16 GB |
Orchestrators are encouraged to keep one model per pipeline “warm” on their GPU — meaning it stays loaded and ready. Requesting a model that is not currently warm on any orchestrator will still work, but the first response may be slower while the model loads. This is called a cold start. Warm model availability per pipeline is listed on each pipeline’s reference page.
Batch pipelines are accessed through the AI Gateway API. Multiple gateway providers are available — see AI Gateways for options, including the hosted Studio Gateway and the free community gateway.

Where to start: AI Quickstart
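As a concrete illustration of the request-and-response model, a batch job is one HTTP call to a gateway. The sketch below only assembles the endpoint URL and JSON body for a text-to-image job; the gateway URL and parameter names are placeholders to verify against the pipeline's reference page, not a definitive client.

```python
import json

# Placeholder gateway base URL -- substitute your provider's endpoint.
GATEWAY_URL = "https://your-gateway.example.com"

def build_text_to_image_request(prompt: str, model_id: str) -> tuple[str, dict]:
    """Assemble the URL and JSON body for a batch text-to-image job.

    The endpoint path mirrors the pipeline name; the parameter names here
    are assumptions to check against the pipeline reference page.
    """
    url = f"{GATEWAY_URL}/text-to-image"
    body = {"model_id": model_id, "prompt": prompt}
    return url, body

url, body = build_text_to_image_request(
    prompt="a lighthouse at dusk, oil painting",
    model_id="your-model-id",  # placeholder; pick a warm model when possible
)
print(url)
print(json.dumps(body))
# Sending it is a single call with any HTTP client, e.g.:
#   requests.post(url, json=body)
# If the chosen model is not warm on any orchestrator, expect the first
# response to be slower (a cold start) while the model loads.
```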

Real-Time AI

Real-time AI on Livepeer is built around the live-video-to-video pipeline type. Unlike batch pipelines, real-time AI maintains a persistent stream connection: video frames flow in continuously, inference runs on each frame, and transformed frames flow back out — all with sub-second latency. This represents a different infrastructure model from batch processing:
  • Connection: Persistent WebRTC or RTMP stream (not request/response)
  • Billing: Per second of compute (not per pixel or per output)
  • GPU assignment: Dedicated to your stream for its entire duration
  • Output: Continuous frame-by-frame results — not a single returned asset
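The billing difference is easy to quantify: batch jobs are priced per completed output, while a real-time stream holds a dedicated GPU and is billed for every second it runs. A minimal sketch, using illustrative prices (actual network pricing is set by orchestrators and varies):

```python
def batch_cost(num_jobs: int, price_per_job: float) -> float:
    """Batch: pay per completed inference, regardless of wall-clock time."""
    return num_jobs * price_per_job

def realtime_cost(stream_seconds: int, price_per_second: float) -> float:
    """Real-time: the GPU is dedicated to your stream, billed per second."""
    return stream_seconds * price_per_second

# Illustrative prices only -- not actual Livepeer network rates.
print(batch_cost(100, 0.002))         # cost of 100 generated images
print(realtime_cost(10 * 60, 0.001))  # cost of a 10-minute live stream
```

The practical consequence: a real-time stream costs money even during idle frames, so it only makes sense for workloads that genuinely need continuous transformation.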
This shift from batch to real-time is what the Livepeer Cascade vision describes: “a path to transition from a pure streaming and transcoding infrastructure, to an infrastructure that could succeed at providing compute for the future of real-time AI video.” Real-time pipelines enable use cases that are simply impossible with batch processing — live avatars, interactive stream effects, and generative overlays that respond to live input.

ComfyStream is the primary tool for building real-time AI pipelines on Livepeer. It is an open-source ComfyUI plugin (github.com/livepeer/comfystream) that turns ComfyUI’s node-graph workflow editor into a real-time inference engine for live video. Daydream itself is built on ComfyStream — so if you are using the Daydream API, you are already running on this infrastructure. Building with ComfyStream directly gives you full control over the workflow, model selection, and pipeline composition.

Use cases enabled by real-time AI on Livepeer:
  • Live video style transfer and artistic transformation
  • VTuber avatar generation and face/body tracking overlays
  • Interactive generative overlays for live streams
  • Automated video agents and real-time scene augmentation
  • Live analytics and frame-by-frame computer vision
Where to start: ComfyStream Quickstart

LLM Pipeline

The LLM pipeline brings text inference to the Livepeer network using an Ollama-based runner with an OpenAI-compatible API. From a developer’s perspective, it works like any other OpenAI-compatible chat completions endpoint — the difference is that your requests are routed to decentralised GPU operators instead of a centralised cloud provider.

The LLM pipeline also runs on a wider range of GPU hardware than diffusion-based batch pipelines: an orchestrator needs as little as 8 GB of VRAM to serve LLM workloads, making it accessible to a larger pool of network participants.

The LLM pipeline is suited for applications that need:
  • Text or code generation
  • Conversational agents or chatbots
  • AI copilots embedded in applications
  • Decentralised, open-source model inference (no proprietary API dependency)
Where to start: AI Quickstart
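Because the endpoint is OpenAI-compatible, any OpenAI client or plain HTTP call works; only the base URL changes. The sketch below builds a standard chat-completions request body. The base URL path and the model name are placeholders, not confirmed Livepeer values:

```python
import json

# Placeholder base URL -- substitute your gateway's OpenAI-compatible endpoint.
BASE_URL = "https://your-gateway.example.com/v1"

def build_chat_request(model: str, user_message: str) -> tuple[str, dict]:
    """Assemble a standard OpenAI-style chat-completions request."""
    url = f"{BASE_URL}/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return url, body

url, body = build_chat_request(
    model="your-model-name",  # placeholder; Ollama-style names are typical here
    user_message="Summarise the Livepeer pipeline categories.",
)
print(url)
print(json.dumps(body, indent=2))
# With the official OpenAI Python client, only base_url changes:
#   client = OpenAI(base_url=BASE_URL, api_key="...")
#   client.chat.completions.create(model=..., messages=...)
```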

Choose Your Path

| If your workload is… | Use | Latency | Setup complexity |
| --- | --- | --- | --- |
| Generating images or video on demand | Batch AI (text-to-image, image-to-video) | Seconds | Low |
| Processing audio to text | Batch AI (audio-to-text) | Seconds | Low |
| Captioning or analysing images | Batch AI (image-to-text, segment-anything-2) | Seconds | Low |
| Live video transformation, avatars, overlays | Real-time AI (live-video-to-video) | Sub-second | Medium–High |
| Text/code inference, chatbots, agents | LLM pipeline | Seconds | Low–Medium |
| Custom AI model or pipeline (BYOC) | Real-time AI + BYOC | Sub-second | High |
If you are unsure whether your workload is batch or real-time, ask: does your application need to transform a live stream continuously, or does it process one piece of media at a time? Continuous live transformation → real-time AI. One-at-a-time processing → batch AI.
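That rule of thumb can be written down directly. A minimal sketch, where the function and its parameters are illustrative and not part of any Livepeer API:

```python
def pick_pipeline(continuous_live_stream: bool, text_only: bool = False) -> str:
    """Map the two questions from this page onto a pipeline category."""
    if text_only:
        # Text in, text out: the LLM pipeline, regardless of latency needs.
        return "LLM pipeline"
    if continuous_live_stream:
        # Continuous live transformation requires a persistent stream.
        return "real-time AI (live-video-to-video)"
    # One piece of media at a time fits the request-and-response model.
    return "batch AI"

print(pick_pipeline(continuous_live_stream=True))   # live avatar overlay
print(pick_pipeline(continuous_live_stream=False))  # one-off image generation
print(pick_pipeline(False, text_only=True))         # chatbot
```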

Next Steps

Last modified on March 16, 2026