This tutorial adds AI inference capability to an existing go-livepeer video orchestrator. Estimated time: 1 hour plus model download time (6 to 10 GB depending on the model chosen). Prerequisites: A working video orchestrator already running on Arbitrum One mainnet. Fresh nodes should start with .
What changes and what stays the same
Video transcoding uses NVENC and NVDEC, the fixed-function hardware blocks built into NVIDIA GPUs. AI inference uses CUDA compute cores. These are separate hardware resources on the same die. Adding AI workloads leaves video transcoding capacity intact and preserves available NVENC sessions.
VRAM headroom check
AI models require VRAM. Video transcoding uses NVENC/NVDEC silicon and consumes negligible VRAM. The VRAM available for AI is the total GPU VRAM minus a small system overhead. Check the current VRAM state with nvidia-smi while the video node is running.
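The headroom arithmetic can be sketched as follows; the figures are illustrative for a 24 GB card, so substitute the numbers your own nvidia-smi query reports:

```shell
# Query total and used VRAM while the video node is running:
#   nvidia-smi --query-gpu=memory.total,memory.used --format=csv,noheader,nounits
# Illustrative figures for a 24 GB card; replace with your own output.
TOTAL_MB=24576      # memory.total
USED_MB=600         # memory.used: transcoding alone leaves this small
OVERHEAD_MB=1024    # reserve ~1 GB for CUDA context and driver overhead

AVAILABLE_MB=$((TOTAL_MB - USED_MB - OVERHEAD_MB))
echo "VRAM available for AI models: ${AVAILABLE_MB} MiB"   # 22952 MiB
```

Compare the result against the VRAM requirement of the model you plan to run before committing to it.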
Step 1: Download model weights
Download the model before restarting the node. AI runner containers mount the model directory at startup, so the weights must already be present. If you choose a different pipeline or model, substitute it for text-to-image and ByteDance/SDXL-Lightning throughout. See the VRAM table above and check tools.livepeer.cloud/ai/network-capabilities for current demand before choosing.
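One way to fetch the weights, assuming the Hugging Face CLI is installed and that the node mounts models from ~/.lpData/models (both the tool and the path are assumptions; adjust to your own layout):

```shell
# Hypothetical model directory; use whatever path your runner containers mount.
MODELS_DIR="$HOME/.lpData/models"
mkdir -p "$MODELS_DIR"

# Fetch the SDXL-Lightning weights. Assumes the Hugging Face CLI is installed
# (pip install -U "huggingface_hub[cli]"); any download tool that preserves
# the repository layout works equally well.
if command -v huggingface-cli >/dev/null 2>&1; then
  huggingface-cli download ByteDance/SDXL-Lightning \
    --local-dir "$MODELS_DIR/SDXL-Lightning"
else
  echo "huggingface-cli not found; install it or fetch the weights another way"
fi
```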
The download is approximately 6 GB for SDXL-Lightning. Verify that the weights are fully on disk before continuing.
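A quick size check, assuming the weights live under ~/.lpData/models (a hypothetical path; point it at your actual model directory):

```shell
# Confirm the weights landed; expect roughly 6G for SDXL-Lightning.
du -sh "$HOME/.lpData/models" 2>/dev/null || echo "model directory missing"
```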
Step 2: Create aiModels.json
Set price_per_unit from current market rates at tools.livepeer.cloud/ai/network-capabilities. Keep it at or below current gateway caps so the node remains routable.
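A minimal aiModels.json sketch for the SDXL-Lightning setup used in this tutorial. The price value is illustrative only, and the exact set of supported fields should be confirmed against your go-livepeer version:

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4700000,
    "warm": true
  }
]
```

warm keeps the model loaded in VRAM between jobs, trading headroom for lower first-request latency.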
Step 3: Add AI flags to the start command
Stop the running video node, then relaunch it with the AI flags added. The -v /var/run/docker.sock:/var/run/docker.sock mount is required for Docker-out-of-Docker: go-livepeer uses the Docker daemon through this socket to start AI runner containers.
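A restart sketch under these assumptions: the container is named livepeer, models and aiModels.json live under ~/.lpData, and the image tag matches your current deployment. Confirm the AI flag names (-aiWorker, -aiModels, -aiModelsDir) against `livepeer -help` for your build:

```shell
# Stop the running video node (container name is hypothetical).
docker stop livepeer

# Relaunch with the existing video flags plus the AI worker flags.
# Image tag, paths, and flag names are illustrative; verify against your build.
docker run -d --name livepeer \
  --gpus all \
  -v ~/.lpData:/root/.lpData \
  -v /var/run/docker.sock:/var/run/docker.sock \
  livepeer/go-livepeer:latest \
  -network arbitrum-one-mainnet \
  -orchestrator -transcoder -nvidia all \
  -aiWorker \
  -aiModels /root/.lpData/aiModels.json \
  -aiModelsDir /root/.lpData/models
```

Keep every flag from your existing video start command; the AI flags are additive.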
Step 4: Verify both workloads
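A verification pass could look like the following; the container name is hypothetical, and port 7935 is go-livepeer's default CLI webserver port (adjust if you changed it):

```shell
# AI runner containers should appear alongside the orchestrator container.
docker ps --format '{{.Names}}\t{{.Image}}'

# go-livepeer serves node status on its CLI webserver (port 7935 by default).
curl -s http://localhost:7935/status

# Video should still hold NVENC sessions while the model weights occupy VRAM.
nvidia-smi
```

If the model is configured warm, nvidia-smi should show several GB of VRAM in use even while no inference job is running.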
What happened
The node now operates in dual-workload configuration: video transcoding and AI inference run simultaneously on the same go-livepeer process. Video transcoding routes through NVENC/NVDEC, dedicated silicon with negligible VRAM demand. AI inference routes through CUDA compute cores with model weights loaded in VRAM. The two workloads use separate hardware paths. Income now flows from two sources:
- ETH from video transcoding - probabilistic micropayment tickets per transcoded segment
- ETH from AI inference - probabilistic micropayment tickets per completed inference job
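Both streams settle through probabilistic micropayments, where the expected value of each ticket is its face value times its win probability. A sketch with illustrative numbers, not current network parameters:

```shell
# Expected payout per PM ticket = faceValue * winProb (illustrative values).
FACE_VALUE_WEI=1000000000000000    # 0.001 ETH face value
WIN_DENOM=100                      # 1-in-100 tickets win
EXPECTED_WEI=$((FACE_VALUE_WEI / WIN_DENOM))
echo "expected value per ticket: $EXPECTED_WEI wei"   # 0.00001 ETH on average
```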
Both streams accrue to the same orchestrator: the same Reward() call for LPT inflation, and the same on-chain registration. The AI income stream uses the existing operator identity.
Related pages
Capacity Planning
Detailed VRAM budgeting and the benchmarking process for setting maxSessions correctly.
AI Model Management
Warm vs cold strategy, model rotation by demand, and optimisation flags.
Dual Mode Configuration
Full reference for dual-workload configuration including multi-GPU assignments.
Pricing Strategy
Set AI pipeline pricing in aiModels.json: per-pipeline, per-model, and USD notation.