Adding AI inference to a running video orchestrator is an additive change. On-chain registration, staking, reward calling, and all transcoding flags stay exactly as they are. Three new flags and one new file are all that change.

This tutorial adds AI inference capability to an existing go-livepeer video orchestrator. Estimated time: 1 hour plus model download time (6 to 10 GB depending on the model chosen). Prerequisites: A working video orchestrator already running on Arbitrum One mainnet. Fresh nodes should start with .

What changes and what stays the same

Video transcoding uses NVENC and NVDEC, the fixed-function hardware blocks built into NVIDIA GPUs. AI inference uses CUDA compute cores. These are separate hardware resources on the same die. Adding AI workloads leaves video transcoding capacity intact and preserves available NVENC sessions.

VRAM headroom check

AI models require VRAM. Video transcoding uses NVENC/NVDEC silicon and consumes negligible VRAM. The available VRAM for AI is the total GPU VRAM minus a small system overhead. Check current VRAM state while the video node is running:
nvidia-smi --query-gpu=name,memory.total,memory.free,memory.used \
  --format=csv,noheader,nounits
Example output:
NVIDIA GeForce RTX 4090, 24564, 22100, 2464
Values are in MB. In this example: 24 GB total, 22 GB free, 2.4 GB used by the running video node. Choose a model that fits within the free VRAM, with at least 2 GB headroom:

Step 1: Download model weights

Download the model before restarting the node. AI runner containers mount the model directory at startup - weights must already be present.
docker run --rm \
  -v ~/.lpData/models:/models \
  --gpus all \
  livepeer/ai-runner:latest \
  bash -c "PIPELINE=text-to-image MODEL_ID=ByteDance/SDXL-Lightning bash /app/dl_checkpoints.sh"
Replace text-to-image and ByteDance/SDXL-Lightning with your chosen pipeline and model. See the VRAM table above and check tools.livepeer.cloud/ai/network-capabilities for current demand before choosing. Download is approximately 6 GB for SDXL-Lightning. Verify:
ls -lh ~/.lpData/models/
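Beyond listing the directory, a quick count of weight files confirms the download actually produced model artifacts. This is a generic sketch: the file extensions and the `MODELS_DIR` default are assumptions about how diffusers-style checkpoints are laid out, not documented runner behavior.

```shell
# Count weight files under the models directory (extensions are assumptions;
# diffusers checkpoints typically ship as .safetensors or .bin files).
MODELS_DIR="${MODELS_DIR:-$HOME/.lpData/models}"
find "$MODELS_DIR" -type f \( -name '*.safetensors' -o -name '*.bin' \) 2>/dev/null | wc -l
```

A result of 0 means the download did not complete; rerun the download command before proceeding.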

Step 2: Create aiModels.json

cat > ~/.lpData/aiModels.json << 'EOF'
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4768371,
    "warm": true
  }
]
EOF
Set price_per_unit from current market rates at tools.livepeer.cloud/ai/network-capabilities. Keep it at or below current gateway caps so the node remains routable.
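Before restarting the node, it is worth confirming the file parses and each entry carries the fields used in the example above. A minimal sketch; the required-key list mirrors this tutorial's example and is not an exhaustive schema for go-livepeer.

```python
import json

# Keys taken from the aiModels.json example above; not a full schema.
REQUIRED_KEYS = {"pipeline", "model_id", "price_per_unit"}


def check_ai_models(text: str) -> list:
    """Parse aiModels.json content and report missing keys per entry."""
    entries = json.loads(text)  # raises on malformed JSON
    if not isinstance(entries, list):
        return ["top-level value must be a JSON array"]
    problems = []
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
    return problems


sample = """[{"pipeline": "text-to-image",
              "model_id": "ByteDance/SDXL-Lightning",
              "price_per_unit": 4768371,
              "warm": true}]"""
print(check_ai_models(sample))  # -> [] (no problems)
```

For a syntax-only check, `python3 -m json.tool ~/.lpData/aiModels.json` is enough on its own.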

Step 3: Add AI flags to the start command

Stop the running video node:
docker stop livepeer-orchestrator
docker rm livepeer-orchestrator
Restart with the three AI flags added. The existing video flags are unchanged:
docker run -d \
  --name livepeer-orchestrator \
  -v ~/.lpData/:/root/.lpData/ \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --network host \
  --gpus all \
  livepeer/go-livepeer:latest \
  -network arbitrum-one-mainnet \
  -ethUrl https://arb-mainnet.g.alchemy.com/v2/YOUR_API_KEY \
  -orchestrator \
  -transcoder \
  -nvidia 0 \
  -maxSessions 10 \
  -pricePerUnit 1000 \
  -serviceAddr YOUR_PUBLIC_IP:8935 \
  -aiWorker \
  -aiModels /root/.lpData/aiModels.json \
  -aiModelsDir /root/.lpData/models
The restart command includes the -v /var/run/docker.sock:/var/run/docker.sock mount required for Docker-out-of-Docker. go-livepeer uses the Docker daemon to start AI runner containers through this socket.
Use the host path for -aiModelsDir (~/.lpData/models). Docker mounts that directory into each AI runner container it creates.

Step 4: Verify both workloads
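As a hedged sketch of what to check after the restart: the container name and image below come from the commands in this tutorial, but the log grep pattern is an assumption rather than a documented log format.

```shell
# 1. The orchestrator container is up and stable (not restart-looping).
docker ps --filter name=livepeer-orchestrator --format '{{.Names}}: {{.Status}}'

# 2. AI runner containers exist. go-livepeer starts these on demand through
#    the docker.sock mount, so they may only appear after the first job
#    (or at startup for models marked "warm": true).
docker ps --filter ancestor=livepeer/ai-runner:latest --format '{{.Names}}'

# 3. Orchestrator log lines mentioning AI worker startup or model loading
#    (grep pattern is a guess; inspect the full log if nothing matches).
docker logs --tail 100 livepeer-orchestrator 2>&1 | grep -iE 'ai|model' || true

# 4. GPU view: VRAM use should have grown by roughly the model size, while
#    NVENC-based transcoding capacity is unaffected.
nvidia-smi --query-gpu=memory.used,utilization.gpu --format=csv,noheader
```

If the VRAM reported in step 4 is close to the total from the headroom check earlier, revisit the model choice.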

What happened

The node now operates in dual-workload configuration: video transcoding and AI inference run simultaneously on the same go-livepeer process. Video transcoding routes through NVENC/NVDEC, dedicated silicon with negligible VRAM demand. AI inference routes through CUDA compute cores with model weights loaded in VRAM. The two workloads use separate hardware paths. Income now flows from two sources:
  • ETH from video transcoding - probabilistic micropayment tickets per transcoded segment
  • ETH from AI inference - probabilistic micropayment tickets per completed inference job
Both streams use the same wallet, the same Reward() call for LPT inflation, and the same on-chain registration. The AI income stream uses the existing operator identity.
Last modified on March 16, 2026