AI inference routing follows capability and price. A node with a 24 GB GPU and a warm model enters the routing pool as soon as registration propagates, without waiting for active-set entry or a large LPT bond.

This tutorial gets an orchestrator earning from AI inference in under two hours. One GPU, one warm diffusion model, no active-set membership needed. Estimated time: 1.5 to 2.5 hours (most of it is model download time; the weights are approximately 6 GB). You will verify:
  • go-livepeer starts with the AI worker enabled
  • The warm model registers at tools.livepeer.cloud/ai/network-capabilities
  • A local test inference returns a result
  • The node is live on the Livepeer AI network

Prerequisites

Step 1: Install go-livepeer and pull the AI runner
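go-livepeer ships prebuilt binaries on its GitHub releases page, and the AI runner is distributed as a Docker image. A minimal sketch of the pull step (check the go-livepeer releases page for the current binary; the image tag below matches the one used later in this tutorial):

```bash
# Download a go-livepeer release binary from
# https://github.com/livepeer/go-livepeer/releases and place it on your PATH.

# Pull the AI runner image ahead of time so the AI worker
# does not have to fetch it on first startup.
docker pull livepeer/ai-runner:latest
```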

Step 2: Write aiModels.json

aiModels.json declares which AI pipelines and models the node serves, and at what price.
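A minimal sketch for the single pipeline used in this tutorial. The `price_per_unit` value (wei per output pixel) is a placeholder, not a recommended rate; adjust it for your hardware and the going market price:

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 70,
    "warm": true
  }
]
```

`"warm": true` tells the AI worker to load the model into GPU VRAM at startup rather than on the first request.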

Step 3: Pre-download the model

Download model weights before starting the node. The warm model must be available at startup or the AI worker will attempt to download it on-demand, which delays the first inference.
docker run --rm \
  -v ~/.lpData/models:/models \
  --gpus all \
  livepeer/ai-runner:latest \
  bash -c "PIPELINE=text-to-image MODEL_ID=ByteDance/SDXL-Lightning bash /app/dl_checkpoints.sh"
This downloads approximately 6 GB. On a 100 Mbps connection, expect 8 to 12 minutes. Watch for the download progress output. Verify the model files are present:
ls -lh ~/.lpData/models/
A non-empty directory confirms the download completed.

Step 4: Start go-livepeer with AI worker
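A sketch of the startup command, assuming the model weights and aiModels.json live under ~/.lpData as in Step 3. Flag names follow the go-livepeer AI orchestrator setup; confirm them against `livepeer -help` for your version:

```bash
livepeer \
  -orchestrator \
  -aiWorker \
  -aiModels ~/.lpData/aiModels.json \
  -aiModelsDir ~/.lpData/models \
  -nvidia "all" \
  -serviceAddr 0.0.0.0:8935
```

On startup, watch the logs for the AI worker pulling the runner container and loading the warm model; the node is not ready to serve until that completes.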

Step 5: Register AI capabilities

Step 6: Test a local inference

Send a test inference request directly to the orchestrator to confirm the pipeline is serving:
curl -X POST http://localhost:8935/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "ByteDance/SDXL-Lightning",
    "prompt": "a blue mountain lake at sunrise",
    "width": 512,
    "height": 512,
    "num_inference_steps": 4
  }' \
  -o test-output.png
Verify the response:
file test-output.png
ls -lh test-output.png
Expected: test-output.png: PNG image data with a non-zero file size. The PNG is the generated image.
The first inference after startup is slower than later requests because the CUDA kernels warm up. That is normal. From the second request onward, SDXL-Lightning at 512×512 should complete in under 5 seconds.

What happened

The node completed the AI inference path:
  1. go-livepeer started with -aiWorker: it read aiModels.json, pulled the livepeer/ai-runner container, mounted the model weights from the host, and loaded ByteDance/SDXL-Lightning into GPU VRAM as a warm model.
  2. Capability advertisement: go-livepeer registered the text-to-image pipeline and warm model status on-chain (via the AI Service Registry) and announced it to the network. Gateways that query for text-to-image capability now see this node in their routing pool.
  3. The test inference travelled from the local HTTP call to the AI worker, to the AI runner container, through SDXL-Lightning’s inference pipeline, and back as a PNG. In production, this same path is triggered by a gateway routing an inference request.
  4. Earning begins when a gateway routes a job to this node. Each completed job sends a PM ticket worth approximately price_per_unit × pixels_in_output wei; winning tickets are redeemed on-chain for ETH.
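As a rough sketch of that payout formula, using the 512×512 test image from Step 6 (the 70 wei/pixel price is a hypothetical value, not a network rate):

```shell
# Hypothetical price per output pixel, as declared in aiModels.json
PRICE_PER_UNIT=70
WIDTH=512
HEIGHT=512

# Approximate PM ticket value in wei for one generated image
echo $(( PRICE_PER_UNIT * WIDTH * HEIGHT ))   # → 18350080
```

At this placeholder price, each 512×512 image is worth about 1.8×10^7 wei; higher resolutions pay proportionally more because the unit is the output pixel.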
AI routing ignores active-set rank. It checks capability advertisement and price instead. A new node with a warm model and a competitive price starts receiving AI jobs on the same day it registers.
Last modified on March 16, 2026