This tutorial adds AI inference capability to an existing go-livepeer video orchestrator. Estimated time: 1 hour plus model download time (6 to 10 GB depending on the model chosen). Prerequisites: A working video orchestrator already running on Arbitrum One mainnet. Fresh nodes should start with .
What changes and what stays the same
Video transcoding uses NVENC and NVDEC, the fixed-function hardware blocks built into NVIDIA GPUs. AI inference uses CUDA compute cores. These are separate hardware resources on the same die. Adding AI workloads leaves video transcoding capacity intact and preserves available NVENC sessions.
VRAM headroom check
AI models require VRAM. Video transcoding uses NVENC/NVDEC silicon and consumes negligible VRAM. The VRAM available for AI is the total GPU VRAM minus a small system overhead. Check the current VRAM state with nvidia-smi while the video node is running.
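The headroom arithmetic can be sketched as follows; the figures are illustrative for a 24 GB card, so substitute the numbers your own nvidia-smi query reports:

```shell
# Query total and used VRAM while the video node is running:
#   nvidia-smi --query-gpu=memory.total,memory.used --format=csv,noheader,nounits
# Illustrative figures for a 24 GB card; replace with your own output.
TOTAL_MB=24576      # memory.total
USED_MB=600         # memory.used: transcoding alone leaves this small
OVERHEAD_MB=1024    # reserve ~1 GB for CUDA context and driver overhead

AVAILABLE_MB=$((TOTAL_MB - USED_MB - OVERHEAD_MB))
echo "VRAM available for AI models: ${AVAILABLE_MB} MiB"   # 22952 MiB
```

Compare the result against the VRAM requirement of the model you plan to run before committing to it.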
Step 1: Download model weights
Download the model before restarting the node. AI runner containers mount the model directory at startup, so the weights must already be present. If you choose a different pipeline or model, substitute it for text-to-image and ByteDance/SDXL-Lightning throughout. See the VRAM table above and check tools.livepeer.cloud/ai/network-capabilities for current demand before choosing.
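One way to fetch the weights, assuming the Hugging Face CLI is installed and that the node mounts models from ~/.lpData/models (both the tool and the path are assumptions; adjust to your own layout):

```shell
# Hypothetical model directory; use whatever path your runner containers mount.
MODELS_DIR="$HOME/.lpData/models"
mkdir -p "$MODELS_DIR"

# Fetch the SDXL-Lightning weights. Assumes the Hugging Face CLI is installed
# (pip install -U "huggingface_hub[cli]"); any download tool that preserves
# the repository layout works equally well.
if command -v huggingface-cli >/dev/null 2>&1; then
  huggingface-cli download ByteDance/SDXL-Lightning \
    --local-dir "$MODELS_DIR/SDXL-Lightning"
else
  echo "huggingface-cli not found; install it or fetch the weights another way"
fi
```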
The download is approximately 6 GB for SDXL-Lightning. Verify that the weights are fully on disk before continuing.
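A quick size check, assuming the weights live under ~/.lpData/models (a hypothetical path; point it at your actual model directory):

```shell
# Confirm the weights landed; expect roughly 6G for SDXL-Lightning.
du -sh "$HOME/.lpData/models" 2>/dev/null || echo "model directory missing"
```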
Step 2: Create aiModels.json
Set price_per_unit from current market rates at tools.livepeer.cloud/ai/network-capabilities. Keep it at or below current gateway caps so the node remains routable.
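A minimal aiModels.json sketch for the SDXL-Lightning setup used in this tutorial. The price value is illustrative only, and the exact set of supported fields should be confirmed against your go-livepeer version:

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4700000,
    "warm": true
  }
]
```

warm keeps the model loaded in VRAM between jobs, trading headroom for lower first-request latency.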
Step 3: Add AI flags to the start command
Stop the running video node, then relaunch it with the AI flags added. The -v /var/run/docker.sock:/var/run/docker.sock mount is required for Docker-out-of-Docker: go-livepeer uses the Docker daemon through this socket to start AI runner containers.
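A restart sketch under these assumptions: the container is named livepeer, models and aiModels.json live under ~/.lpData, and the image tag matches your current deployment. Confirm the AI flag names (-aiWorker, -aiModels, -aiModelsDir) against `livepeer -help` for your build:

```shell
# Stop the running video node (container name is hypothetical).
docker stop livepeer

# Relaunch with the existing video flags plus the AI worker flags.
# Image tag, paths, and flag names are illustrative; verify against your build.
docker run -d --name livepeer \
  --gpus all \
  -v ~/.lpData:/root/.lpData \
  -v /var/run/docker.sock:/var/run/docker.sock \
  livepeer/go-livepeer:latest \
  -network arbitrum-one-mainnet \
  -orchestrator -transcoder -nvidia all \
  -aiWorker \
  -aiModels /root/.lpData/aiModels.json \
  -aiModelsDir /root/.lpData/models
```

Keep every flag from your existing video start command; the AI flags are additive.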
Step 4: Verify both workloads
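A verification pass could look like the following; the container name is hypothetical, and port 7935 is go-livepeer's default CLI webserver port (adjust if you changed it):

```shell
# AI runner containers should appear alongside the orchestrator container.
docker ps --format '{{.Names}}\t{{.Image}}'

# go-livepeer serves node status on its CLI webserver (port 7935 by default).
curl -s http://localhost:7935/status

# Video should still hold NVENC sessions while the model weights occupy VRAM.
nvidia-smi
```

If the model is configured warm, nvidia-smi should show several GB of VRAM in use even while no inference job is running.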
What happened
The node now operates in dual-workload configuration: video transcoding and AI inference run simultaneously on the same go-livepeer process. Video transcoding routes through NVENC/NVDEC, dedicated silicon with negligible VRAM demand. AI inference routes through CUDA compute cores with model weights loaded in VRAM. The two workloads use separate hardware paths. Income now flows from two sources:
- ETH from video transcoding - probabilistic micropayment tickets per transcoded segment
- ETH from AI inference - probabilistic micropayment tickets per completed inference job
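Both streams settle through probabilistic micropayments, where the expected value of each ticket is its face value times its win probability. A sketch with illustrative numbers, not current network parameters:

```shell
# Expected payout per PM ticket = faceValue * winProb (illustrative values).
FACE_VALUE_WEI=1000000000000000    # 0.001 ETH face value
WIN_DENOM=100                      # 1-in-100 tickets win
EXPECTED_WEI=$((FACE_VALUE_WEI / WIN_DENOM))
echo "expected value per ticket: $EXPECTED_WEI wei"   # 0.00001 ETH on average
```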
Both streams accrue to the same orchestrator: the same Reward() call for LPT inflation, and the same on-chain registration. The AI income stream uses the existing operator identity.
Related pages
Capacity Planning
Detailed VRAM budgeting and the benchmarking process for setting maxSessions correctly.
AI Model Management
Warm vs cold strategy, model rotation by demand, and optimisation flags.
Dual Mode Configuration
Full reference for dual-workload configuration including multi-GPU assignments.
Pricing Strategy
Set AI pipeline pricing in aiModels.json: per-pipeline, per-model, and USD notation.