This tutorial sets up a working
live-video-to-video pipeline using the Cascade architecture and ComfyStream. By the end, a live video stream enters the orchestrator, StreamDiffusion transforms each frame in a continuous low-latency pipeline, and the output stream is viewable. Estimated time: 3 hours (most of this is model download time).
What you will verify:
- The `livepeer/ai-runner:live-base` container starts cleanly with GPU access
- The `live-video-to-video` pipeline registers at tools.livepeer.cloud/ai/network-capabilities
- A test stream sends successfully and the transformed output is visible
How realtime AI differs from batch
At 30 fps, the frame budget is 33 ms. The pipeline must receive, process, and emit each frame within that window. StreamDiffusion’s architecture is purpose-built for this: stream batching, residual CFG, and stochastic similarity filtering combine to achieve 30+ fps on an RTX 4090.

Prerequisites
Step 1: Verify GPU and Docker access
Check the NVIDIA driver with `nvidia-smi`; the driver must be 525.60.13 or newer.
Confirm Docker GPU access:
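A typical check, assuming the NVIDIA Container Toolkit is installed (the CUDA base image tag here is just an example):

```shell
# Should print the same nvidia-smi table as on the host
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```

If this fails, fix Docker GPU access before continuing; nothing later in the tutorial will work without it.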
Step 2: Pull the live-base AI runner image
The `live-base` image is separate from the standard `livepeer/ai-runner` used for batch pipelines. It includes ComfyStream, ComfyUI, and StreamDiffusion dependencies:
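A pull-and-check sketch; the CUDA probe assumes PyTorch and a `python` entrypoint are available inside the image, which may differ:

```shell
docker pull livepeer/ai-runner:live-base

# Sanity-check that CUDA is visible from inside the image
docker run --rm --gpus all livepeer/ai-runner:live-base \
  python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}, GPU: {torch.cuda.get_device_name(0)}')"
```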
Expected output of a CUDA check inside the container:
CUDA available: True, GPU: NVIDIA GeForce RTX 4090
Step 3: Download ComfyStream model weights
ComfyStream requires model weights before the container starts. Clone the ComfyStream repository and run the download script, pointing its output at the directory go-livepeer reads via `-aiModelsDir`.
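A sketch of the clone-and-download step; the repository URL, script path, and flag are assumptions, so check the ComfyStream README for the exact command:

```shell
git clone https://github.com/livepeer/comfystream.git
cd comfystream
# Script path and flag are assumptions; point the output at the
# directory go-livepeer will read via -aiModelsDir
./scripts/download_models.sh --dest ~/.lpData/models
```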
Verify the download:
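One way to verify, assuming the go-livepeer default models directory (adjust to your `-aiModelsDir`):

```shell
# Lists the downloaded weights; expect non-empty checkpoint files
ls -lh ~/.lpData/models
```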
Step 4: Configure aiModels.json for live pipeline
For the live pipeline, `model_id` names the ComfyUI workflow or pipeline. The underlying models load inside the ComfyStream container.
`price_per_unit` for the live pipeline is charged per frame, unlike batch pipelines that charge per pixel or per millisecond. Set a value at or below the current gateway caps in `-maxPricePerCapability` for live-video-to-video. Check current rates at tools.livepeer.cloud/ai/network-capabilities.
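A sketch of an aiModels.json entry for the live pipeline. The field names follow the standard aiModels.json schema, but the `model_id` string and the price are illustrative assumptions; set the price per the gateway caps above:

```json
[
  {
    "pipeline": "live-video-to-video",
    "model_id": "comfyui",
    "price_per_unit": 0,
    "warm": true
  }
]
```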
Step 5: Start go-livepeer with live AI flags
Existing AI nodes should stop and restart with the updated `aiModels.json`. Fresh setups should use:
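A sketch of an off-chain start command; flag values and paths are illustrative, so align `-aiModels` and `-aiModelsDir` with the earlier steps:

```shell
livepeer -orchestrator -transcoder -aiWorker \
  -network offchain \
  -serviceAddr 0.0.0.0:8935 \
  -nvidia "all" \
  -aiModels ~/.lpData/aiModels.json \
  -aiModelsDir ~/.lpData/models \
  -v 6
```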
Expected live-runner startup log
`docker ps` should show both `livepeer-orchestrator` and the AI runner container for the live pipeline.
Step 6: Set up the gateway for live routing
Start an off-chain gateway that routes `live-video-to-video` jobs to the orchestrator for this local test:
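A sketch for this local test (`-gateway` on recent go-livepeer releases; older ones used `-broadcaster`). The ports are examples chosen to avoid the orchestrator's 8935:

```shell
livepeer -gateway \
  -network offchain \
  -orchAddr 127.0.0.1:8935 \
  -rtmpAddr 0.0.0.0:1935 \
  -httpAddr 0.0.0.0:8936 \
  -v 6
```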
Expected gateway startup log
For a production gateway routing live AI jobs on-chain, configure `-maxPricePerCapability` with a cap for live-video-to-video. The gateway routes only to orchestrators priced at or below this cap, regardless of hardware capability.

Step 7: Send a test stream
Send a test RTMP stream through the gateway using ffmpeg. This simulates a camera or OBS stream feeding the `live-video-to-video` pipeline. Keep it running while checking the output.
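A sketch that streams a synthetic test pattern to the gateway's RTMP port; the stream key (`test`) is an example, and a camera or OBS source can replace the `lavfi` input:

```shell
ffmpeg -re -f lavfi -i testsrc=size=512x512:rate=30 \
  -c:v libx264 -preset ultrafast -tune zerolatency -g 30 \
  -f flv rtmp://localhost:1935/live/test
```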
In a second terminal, watch the orchestrator process frames:
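One way to follow the live runner's logs; the `name=livepeer` filter matches the container naming used elsewhere in this guide:

```shell
# Follows the first container whose name matches the filter
docker logs -f "$(docker ps --filter name=livepeer --format '{{.Names}}' | head -n 1)"
```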
Expected frame-processing log
Step 8: Verify the transformed output
Retrieve the processed output stream from the gateway. On tools.livepeer.cloud/ai/network-capabilities, the `live-video-to-video` pipeline should appear with Warm status.
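A playback sketch; the port, stream name, and exact HLS path are assumptions, so substitute the values from your gateway setup:

```shell
ffplay "http://localhost:8936/hls/test/index.m3u8"
```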
Latency check:
Monitor frame processing times in the orchestrator logs. At 30 fps, each frame should be processed in under 33 ms; repeated frame times above 33 ms mean the pipeline is falling behind the incoming stream.
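Falling behind can be spotted mechanically. A sketch using a hypothetical log format (the `time_ms=` field is invented; match the field extraction to the real orchestrator log lines):

```shell
# Prints any frame that blew the 33 ms budget
awk -F'time_ms=' '/time_ms=/ && $2 + 0 > 33 { print "SLOW:", $0 }' <<'EOF'
frame=1 time_ms=28.4
frame=2 time_ms=35.2
frame=3 time_ms=31.0
EOF
```

In real use, pipe `docker logs` output into the awk filter instead of the heredoc.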
Troubleshooting
Frames dropping or high latency:
- The model is running too slowly for the target fps. StreamDiffusion at 2 steps is the minimum viable configuration for 30 fps on an RTX 4090. Try reducing output resolution.
- VRAM OOM: reduce `stream_batch_size` in the StreamDiffusion config.
- CPU bottleneck: WebRTC frame encode/decode is CPU-bound. Monitor CPU with `htop`.
Pipeline not registering:
- Confirm `live-video-to-video` appears at tools.livepeer.cloud/ai/network-capabilities.
- Verify the live runner container is running: `docker ps --filter name=livepeer`.
- Check the container started cleanly: `docker logs <live-runner-container-name>`.
- Confirm the CUDA check inside the container prints `True`. Any other result means CUDA is unavailable inside the container, so re-install the NVIDIA Container Toolkit.
What happened
The Cascade architecture processed a live stream end-to-end:
- ffmpeg sent an RTMP stream to the gateway at `:1935`.
- The gateway routed the stream to the orchestrator at `:8935` with a `live-video-to-video` capability match.
- The orchestrator dispatched the stream to the `livepeer/ai-runner:live-base` container.
- ComfyStream received each frame via WebRTC, ran it through the StreamDiffusion workflow, and emitted the processed frame.
- The orchestrator collected processed frames and returned the output stream through the gateway.
- The HLS output was available at the gateway’s `/hls/` endpoint.
Payments for the live pipeline are sent once per interval (`-livePaymentInterval`, default 5 seconds) instead of one ticket per frame. This reduces payment overhead for continuous streams.
Related pages
Realtime AI Setup
Full reference for Cascade architecture, ComfyStream workflows, ControlNet variants, and multi-stream capacity.
Full AI Pipeline Tutorial
Batch inference end-to-end - the alternative pipeline for request-response AI workloads.
Capacity Planning
VRAM budgeting for realtime workloads and the one-warm-model-per-GPU constraint.
Gateway-Orchestrator Interface
Production combined setup with port allocation and pricing alignment.