The live-video-to-video pipeline that powers live AI video effects, live style transfer, and streaming AI agents on the Livepeer network.
What Cascade is
Cascade is Livepeer’s live-video AI processing pipeline. The name refers to the architecture — video streams cascade through AI transformation nodes in the network, enabling live applications that previously required centralised infrastructure.
Example applications on Cascade:
- Daydream — generative AI video platform with live style application
- StreamDiffusionTD — live diffusion via TouchDesigner
- ComfyStream — browser-based ComfyUI pipelines with live video input
- OBS plugins — live AI effects applied to streaming content
How it differs from batch AI
The key difference is the continuous frame loop. Your pipeline receives frames as they arrive from the upstream WebRTC stream and must process and emit them quickly enough to avoid buffering. At 30 fps, you have a 33 ms per-frame budget; at 24 fps, 42 ms.
Prerequisites
Cascade has stricter hardware requirements than batch inference:
- GPU: RTX 4090 (24 GB) strongly recommended. RTX 3090 (24 GB) is functional but with less headroom. A100/H100 for production multi-stream setups.
- CPU: 8+ cores recommended. Frame decoding/encoding is CPU-bound.
- Network: Low-latency connection. WebRTC streams are sensitive to packet loss and jitter.
- CUDA: 12.0+
- Docker with NVIDIA Container Toolkit
- go-livepeer running with `-aiWorker` enabled
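The prerequisites above translate into a launch command along these lines. This is a sketch: addresses, ports, and file paths are placeholders for your deployment.

```shell
# Illustrative orchestrator launch with the AI worker enabled.
# Paths and the service address are placeholders, not required values.
livepeer \
  -orchestrator \
  -aiWorker \
  -aiModels /etc/livepeer/aiModels.json \
  -aiModelsDir /models \
  -nvidia all \
  -serviceAddr 0.0.0.0:8935
```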
Architecture overview
Cascade architecture overview
ComfyStream — the live-video pipeline runtime
ComfyStream is the primary runtime for live-video AI inference on Livepeer. It wraps ComfyUI’s node-based workflow system and adapts it for continuous frame processing.
What ComfyStream adds over standard ComfyUI:
- WebRTC frame ingestion and emission
- Async frame queue for continuous processing
- Warm model management to avoid per-frame load latency
- Livepeer AI worker integration via the `Pipeline` interface
Models commonly run in ComfyStream workflows:
- StreamDiffusion (optimised for live diffusion at 30+ fps)
- Standard SDXL / SD 1.5 (lower fps, higher quality)
- ControlNet variants (depth, pose, sketch, canny)
- IP-Adapter (style reference)
- DepthAnything / MiDaS (depth estimation)
- SAM2 (live segmentation)
- Any ComfyUI-compatible model loaded as a DAG node
Setup
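A minimal setup sketch: pull the live-base image and start it with GPU access. The image tag, container name, port, and model path are assumptions based on the container names referenced elsewhere on this page.

```shell
# Pull and start the ComfyStream live-base container.
# Image tag, port, and volume path are assumptions for your deployment.
docker pull livepeer/ai-runner:live-base
docker run -d --name livepeer-ai-runner-live \
  --gpus all \
  -v /models:/models \
  -p 8000:8000 \
  livepeer/ai-runner:live-base
```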
StreamDiffusion — live-video diffusion models
StreamDiffusion is the primary model architecture for live-video AI on Livepeer. It was designed specifically for continuous frame processing and achieves 30+ fps on an RTX 4090.
How StreamDiffusion achieves live performance:
- Stream Batch — processes multiple frames simultaneously as a batch, amortising model overhead across frames
- Residual CFG — approximates classifier-free guidance with fewer forward passes
- Stochastic Similarity Filter — skips inference on frames that are sufficiently similar to the previous frame
- TinyVAE acceleration — uses a compressed VAE encoder/decoder for lower latency
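The similarity-filter idea above can be sketched in a few lines: skip inference when the new frame is close enough to the previous one and reuse the last output. This is a simplified, deterministic illustration; the actual Stochastic Similarity Filter uses a probabilistic skip, and the threshold and metric here are illustrative.

```python
# Simplified sketch of the similarity-filter idea behind StreamDiffusion's
# Stochastic Similarity Filter: skip inference on near-identical frames.
# Frames are flattened lists of floats; threshold/metric are illustrative.
def should_skip(prev, frame, threshold=0.98):
    # Cosine similarity between the previous and current frame.
    dot = sum(a * b for a, b in zip(prev, frame))
    norm = (sum(a * a for a in prev) ** 0.5) * (sum(b * b for b in frame) ** 0.5)
    return norm > 0 and dot / norm >= threshold

prev = [0.5, 0.5, 0.5, 0.5]
print(should_skip(prev, [0.5, 0.5, 0.5, 0.5]))   # True: identical frame, reuse output
print(should_skip(prev, [0.9, 0.1, 0.8, 0.05]))  # False: frame changed, run inference
```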
ComfyUI workflow for StreamDiffusion
A typical ComfyStream workflow for live style application:
ComfyStream workflow example
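As a rough illustration, an API-format ComfyUI workflow for live style application might look like the fragment below. The node class names here are hypothetical placeholders, not the exact ComfyStream node types.

```json
{
  "1": { "class_type": "LoadStreamDiffusionModel",
         "inputs": { "model": "sd-turbo", "t_index_list": [0] } },
  "2": { "class_type": "LiveVideoInput", "inputs": {} },
  "3": { "class_type": "StreamDiffusionSampler",
         "inputs": { "model": ["1", 0], "frames": ["2", 0],
                     "prompt": "anime style portrait" } },
  "4": { "class_type": "LiveVideoOutput", "inputs": { "frames": ["3", 0] } }
}
```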
Step counts of 1–2 support live performance. Quality vs latency is tunable through the workflow.
The Pipeline interface (custom pipelines)
For operators who want to build custom live-video AI processing beyond ComfyUI, the AI runner exposes a Python `Pipeline` interface. Custom pipelines extend this interface and are packaged as Docker images extending `livepeer/ai-runner:live-base`.
Custom live-video Pipeline interface example
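A minimal sketch of what such a pipeline class can look like. The method names (`initialize`, `process_frame`, `update_params`) and the frame representation are assumptions for illustration, not the exact upstream ai-runner API.

```python
# Hypothetical sketch of a live-video Pipeline interface.
# Method names and frame representation are assumptions, not the upstream API.
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Pipeline(ABC):
    @abstractmethod
    def initialize(self, **params: Any) -> None:
        """Load models and warm up before the first frame arrives."""

    @abstractmethod
    def process_frame(self, frame: List[int]) -> List[int]:
        """Transform one frame; must return within the per-frame budget."""

    @abstractmethod
    def update_params(self, **params: Any) -> None:
        """Apply runtime parameter changes (e.g. a new prompt) mid-stream."""


class InvertPipeline(Pipeline):
    """Toy pipeline: inverts 8-bit pixel values. Frame = flat list of ints."""

    def initialize(self, **params: Any) -> None:
        self.params: Dict[str, Any] = dict(params)

    def process_frame(self, frame: List[int]) -> List[int]:
        return [255 - p for p in frame]

    def update_params(self, **params: Any) -> None:
        self.params.update(params)


pipe = InvertPipeline()
pipe.initialize(prompt="none")
print(pipe.process_frame([0, 128, 255]))  # [255, 127, 0]
```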
Integration requirements for custom pipelines
Custom live-video pipelines require two code changes in the upstream repositories:
- `ai-worker/runner/dl_checkpoints.sh` — add your pipeline to the model download script
- `go-livepeer/ai/worker/docker.go` — add your pipeline to the container image map (`livePipelineToImage`)
The Livepeer team is working toward a fully dynamic plugin architecture that eliminates these manual upstream changes. Track progress on the ai-worker GitHub repository.
Model types for live-video inference
StreamDiffusion (primary)
Best for: Continuous style application, generative video effects, live prompt-to-video
- `lcm-lora` variants for fastest inference
- SD 1.5 base with Lightning LoRA
- SDXL Turbo at reduced resolution
ControlNet variants
ControlNet conditioning allows style transfer guided by structure maps extracted from the input frame.
Source: DepthAnything on HuggingFace · DWPose · ControlNet paper
IP-Adapter (style reference)
IP-Adapter conditions generation on a reference image, enabling consistent style application across frames. Effective for brand-consistent visual transformation.
Source: tencent-ailab/IP-Adapter on GitHub
Performance tuning
Maximising fps
Cascade performance is dominated by inference latency per frame. Key levers:
- Model selection: Use 1–2 step LCM or Lightning models instead of 20-step DDIM. The quality difference for streaming is acceptable; the latency difference is not.
- Resolution: Lower resolution dramatically increases fps. 512×512 at 30 fps is achievable on an RTX 4090 with StreamDiffusion. 768×768 drops to ~20 fps. 1024×1024 drops to ~12 fps.
- TensorRT compilation: For production deployments, compile the model to TensorRT engine format. One-time compilation overhead; 2–4× runtime speedup.
TensorRT compilation example
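One sketch of the compilation step uses TensorRT's `trtexec` CLI on an ONNX export of the model. The file paths are placeholders, the ONNX export step is not shown, and StreamDiffusion/ComfyStream also ship their own acceleration helpers.

```shell
# Build a TensorRT engine from an ONNX export of the diffusion UNet.
# Paths are placeholders; one-time cost, then faster per-frame inference.
trtexec --onnx=unet.onnx --saveEngine=unet.plan --fp16
```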
VRAM management
Unlike batch AI, live-video pipelines hold models in VRAM continuously for the duration of the stream. VRAM must be reserved for:
- Model weights (~8–18 GB for SDXL-class)
- Frame buffers (input + output, ~500 MB–1 GB per resolution)
- ControlNet/LoRA adapters (~1–3 GB each)
- Stream batch buffer (StreamDiffusion’s continuous frame queue)
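The components above can be summed as a back-of-envelope budget. The midpoint figures below are assumptions drawn from the ranges in the list, just to show the arithmetic against a 24 GB card.

```python
# Back-of-envelope VRAM budget for one SDXL-class live stream (GB).
# Figures are midpoint assumptions from the ranges listed above.
weights = 10.0        # model weights (~8–18 GB for SDXL-class)
frame_buffers = 1.0   # input + output frame buffers
adapters = 2.0        # one ControlNet or LoRA adapter (~1–3 GB each)
batch_buffer = 1.5    # StreamDiffusion stream batch queue (assumption)

total = weights + frame_buffers + adapters + batch_buffer
print(total)       # 14.5
print(24 - total)  # 9.5 GB headroom on a 24 GB RTX 4090
```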
Multi-stream capacity
A single RTX 4090 usually handles 1–2 concurrent live-video streams depending on resolution and model complexity. For multi-stream capacity:
- Multiple GPUs: The AI worker dispatches streams across multiple physical GPUs
- No model parallelism: ComfyStream does not split a model across GPUs; each stream runs on a single GPU
- Scale-out: Run multiple orchestrator instances, each handling 1–2 streams, behind a gateway load balancer
Scaling to multiple streams also means raising capacity values and maintaining multiple warm pipeline instances.
Watch: Cascade and live-video AI
Encode Club Live Video AI Bootcamp
Full Q1 2025 bootcamp session. Covers ComfyStream, Cascade architecture, and orchestrator setup for live-video inference.
StreamDiffusion Demo
StreamDiffusion GitHub repository includes benchmark videos showing 30+ fps live style transfer.
Troubleshooting
Frames dropping or high latency
- Model is too slow for target fps — try a lower-step model (LCM, Lightning) or reduce output resolution
- VRAM OOM on frame buffer — reduce `stream_batch_size` in StreamDiffusion config
- CPU bottleneck on encode/decode — WebRTC frame codec operations are CPU-bound; monitor CPU usage during streaming
- Network jitter — WebRTC is sensitive to packet loss; check your upstream network quality
Restore live-video job flow
- Confirm `live-video-to-video` appears on tools.livepeer.cloud/ai/network-capabilities under your orchestrator
- Verify the `live-base` container is running and healthy: `docker ps --filter name=livepeer-ai-runner-live`
- Check that your orchestrator’s `serviceAddr` is reachable from gateways — WebRTC ICE negotiation requires bidirectional reachability
- Confirm your node has WebRTC port access (typically UDP 8935 or your configured port)
ComfyStream container failing to start
- Check model weights are present at the expected path (the `-aiModelsDir` location)
- Check CUDA/driver compatibility — ComfyStream requires CUDA 12.0+
- Run the container manually to see startup output:
Check CUDA access inside the live-base container
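A quick way to do this check is to run `nvidia-smi` inside the image. The image tag is an assumption matching the container referenced above.

```shell
# Verify the container can see the GPU; image tag is an assumption.
docker run --rm --gpus all livepeer/ai-runner:live-base nvidia-smi
```

If `nvidia-smi` fails here, the problem is the host driver or the NVIDIA Container Toolkit, not ComfyStream itself.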
Custom pipeline registration issues
- Verify `livePipelineToImage` in `go-livepeer/ai/worker/docker.go` includes your pipeline name
- Confirm `dl_checkpoints.sh` in `ai-runner` includes your pipeline’s model preparation step
- The `model_id` in `aiModels.json` must match the `name` field in your `PipelineSpec` exactly
- After rebuilding and redeploying, re-register capabilities with the network
Related
Batch AI Setup
Configure text-to-image, audio-to-text, LLM, and other batch pipelines.
Model Hosting and VRAM
VRAM planning, warm model strategy, and aiModels.json reference.
ComfyStream on GitHub
Source repository for ComfyStream, including setup scripts, example workflows, and community contributions.
AI Workloads Overview
Batch vs live-video AI, pipeline types, and network routing.