This tutorial sets up a working
live-video-to-video pipeline using the Cascade architecture and ComfyStream. By the end, a live video stream enters the orchestrator, StreamDiffusion transforms each frame in a continuous low-latency pipeline, and the output stream is viewable. Estimated time: 3 hours (most of this is model download time).
What you will verify:
- The `livepeer/ai-runner:live-base` container starts cleanly with GPU access
- The `live-video-to-video` pipeline registers at tools.livepeer.cloud/ai/network-capabilities
- A test stream sends successfully and the transformed output is visible
How realtime AI differs from batch
At 30 fps, the frame budget is 33 ms. The pipeline must receive, process, and emit each frame within that window. StreamDiffusion’s architecture is purpose-built for this: stream batching, residual CFG, and stochastic similarity filtering combine to achieve 30+ fps on an RTX 4090.

Prerequisites
Step 1: Verify GPU and Docker access
Check the NVIDIA driver with `nvidia-smi`; the driver must be 525.60.13 or newer.
Confirm Docker GPU access:
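A typical check, assuming the NVIDIA Container Toolkit is installed (the CUDA base image tag here is just an example):

```shell
# Should print the same nvidia-smi table as on the host
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```

If this fails, fix Docker GPU access before continuing; nothing later in the tutorial will work without it.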
Step 2: Pull the live-base AI runner image
The `live-base` image is separate from the standard `livepeer/ai-runner` used for batch pipelines. It includes ComfyStream, ComfyUI, and StreamDiffusion dependencies:
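A pull-and-check sketch; the CUDA probe assumes PyTorch and a `python` entrypoint are available inside the image, which may differ:

```shell
docker pull livepeer/ai-runner:live-base

# Sanity-check that CUDA is visible from inside the image
docker run --rm --gpus all livepeer/ai-runner:live-base \
  python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}, GPU: {torch.cuda.get_device_name(0)}')"
```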
Expected output of a CUDA check inside the container:
CUDA available: True, GPU: NVIDIA GeForce RTX 4090
Step 3: Download ComfyStream model weights
ComfyStream requires model weights before the container starts. Clone the ComfyStream repository and run the download script, pointing its output at the directory go-livepeer reads via `-aiModelsDir`.
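A sketch of the clone-and-download step; the repository URL, script path, and flag are assumptions, so check the ComfyStream README for the exact command:

```shell
git clone https://github.com/livepeer/comfystream.git
cd comfystream
# Script path and flag are assumptions; point the output at the
# directory go-livepeer will read via -aiModelsDir
./scripts/download_models.sh --dest ~/.lpData/models
```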
Verify the download:
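One way to verify, assuming the go-livepeer default models directory (adjust to your `-aiModelsDir`):

```shell
# Lists the downloaded weights; expect non-empty checkpoint files
ls -lh ~/.lpData/models
```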
Step 4: Configure aiModels.json for live pipeline
For the live pipeline, `model_id` names the ComfyUI workflow or pipeline. The underlying models load inside the ComfyStream container.
`price_per_unit` for the live pipeline is charged per frame, unlike batch pipelines that charge per pixel or per millisecond. Set a value at or below the current gateway caps in `-maxPricePerCapability` for live-video-to-video. Check current rates at tools.livepeer.cloud/ai/network-capabilities.
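A sketch of an aiModels.json entry for the live pipeline. The field names follow the standard aiModels.json schema, but the `model_id` string and the price are illustrative assumptions; set the price per the gateway caps above:

```json
[
  {
    "pipeline": "live-video-to-video",
    "model_id": "comfyui",
    "price_per_unit": 0,
    "warm": true
  }
]
```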
Step 5: Start go-livepeer with live AI flags
Existing AI nodes should stop and restart with the updated `aiModels.json`. Fresh setups should use:
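A sketch of an off-chain start command; flag values and paths are illustrative, so align `-aiModels` and `-aiModelsDir` with the earlier steps:

```shell
livepeer -orchestrator -transcoder -aiWorker \
  -network offchain \
  -serviceAddr 0.0.0.0:8935 \
  -nvidia "all" \
  -aiModels ~/.lpData/aiModels.json \
  -aiModelsDir ~/.lpData/models \
  -v 6
```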
Expected live-runner startup log
`docker ps` should show both `livepeer-orchestrator` and the AI runner container for the live pipeline.
Step 6: Set up the gateway for live routing
Start an off-chain gateway that routes `live-video-to-video` jobs to the orchestrator for this local test:
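A sketch for this local test (`-gateway` on recent go-livepeer releases; older ones used `-broadcaster`). The ports are examples chosen to avoid the orchestrator's 8935:

```shell
livepeer -gateway \
  -network offchain \
  -orchAddr 127.0.0.1:8935 \
  -rtmpAddr 0.0.0.0:1935 \
  -httpAddr 0.0.0.0:8936 \
  -v 6
```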
Expected gateway startup log
For a production gateway routing live AI jobs on-chain, configure `-maxPricePerCapability` with a cap for live-video-to-video. The gateway routes only to orchestrators priced at or below this cap, regardless of hardware capability.

Step 7: Send a test stream
Send a test RTMP stream through the gateway using ffmpeg. This simulates a camera or OBS stream feeding the `live-video-to-video` pipeline. Keep it running while checking the output.
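A sketch that streams a synthetic test pattern to the gateway's RTMP port; the stream key (`test`) is an example, and a camera or OBS source can replace the `lavfi` input:

```shell
ffmpeg -re -f lavfi -i testsrc=size=512x512:rate=30 \
  -c:v libx264 -preset ultrafast -tune zerolatency -g 30 \
  -f flv rtmp://localhost:1935/live/test
```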
In a second terminal, watch the orchestrator process frames:
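One way to follow the live runner's logs; the `name=livepeer` filter matches the container naming used elsewhere in this guide:

```shell
# Follows the first container whose name matches the filter
docker logs -f "$(docker ps --filter name=livepeer --format '{{.Names}}' | head -n 1)"
```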
Expected frame-processing log
Step 8: Verify the transformed output
Retrieve the processed output stream from the gateway. On tools.livepeer.cloud/ai/network-capabilities, the `live-video-to-video` pipeline should appear with Warm status.
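A playback sketch; the port, stream name, and exact HLS path are assumptions, so substitute the values from your gateway setup:

```shell
ffplay "http://localhost:8936/hls/test/index.m3u8"
```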
Latency check:
Monitor frame processing times in the orchestrator logs. At 30 fps, each frame should be processed in under 33 ms; repeated frame times above 33 ms mean the pipeline is falling behind the incoming stream.
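Falling behind can be spotted mechanically. A sketch using a hypothetical log format (the `time_ms=` field is invented; match the field extraction to the real orchestrator log lines):

```shell
# Prints any frame that blew the 33 ms budget
awk -F'time_ms=' '/time_ms=/ && $2 + 0 > 33 { print "SLOW:", $0 }' <<'EOF'
frame=1 time_ms=28.4
frame=2 time_ms=35.2
frame=3 time_ms=31.0
EOF
```

In real use, pipe `docker logs` output into the awk filter instead of the heredoc.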
Troubleshooting
Frames dropping or high latency:
- The model is running too slowly for the target fps. StreamDiffusion at 2 steps is the minimum viable configuration for 30 fps on an RTX 4090. Try reducing output resolution.
- VRAM OOM: reduce `stream_batch_size` in the StreamDiffusion config.
- CPU bottleneck: WebRTC frame encode/decode is CPU-bound. Monitor CPU with `htop`.
Pipeline not registering:
- Confirm `live-video-to-video` appears at tools.livepeer.cloud/ai/network-capabilities.
- Verify the live runner container is running: `docker ps --filter name=livepeer`.
- Check the container started cleanly: `docker logs <live-runner-container-name>`.
- Confirm the CUDA check inside the container prints `True`. Any other result means CUDA is unavailable inside the container, so re-install the NVIDIA Container Toolkit.
What happened
The Cascade architecture processed a live stream end-to-end:
- ffmpeg sent an RTMP stream to the gateway at `:1935`.
- The gateway routed the stream to the orchestrator at `:8935` with a `live-video-to-video` capability match.
- The orchestrator dispatched the stream to the `livepeer/ai-runner:live-base` container.
- ComfyStream received each frame via WebRTC, ran it through the StreamDiffusion workflow, and emitted the processed frame.
- The orchestrator collected processed frames and returned the output stream through the gateway.
- The HLS output was available at the gateway’s `/hls/` endpoint.
Payments for the live pipeline are sent once per interval (`-livePaymentInterval`, default 5 seconds) instead of one ticket per frame. This reduces payment overhead for continuous streams.
Related pages
Realtime AI Setup
Full reference for Cascade architecture, ComfyStream workflows, ControlNet variants, and multi-stream capacity.
Full AI Pipeline Tutorial
Batch inference end-to-end - the alternative pipeline for request-response AI workloads.
Capacity Planning
VRAM budgeting for realtime workloads and the one-warm-model-per-GPU constraint.
Gateway-Orchestrator Interface
Production combined setup with port allocation and pricing alignment.