Prerequisites
Before you begin:

- go-livepeer is installed and running as a transcoding orchestrator on Arbitrum mainnet (see Install go-livepeer and Get Started)
- Your orchestrator is in the top 100 active set on the Livepeer network
- Docker is installed with `nvidia-container-toolkit` enabled (GPU passthrough is required for the AI runner containers)
- Your GPU has at least 4GB of VRAM available to run at least one AI pipeline (see the hardware check below)
- Model weights are pre-downloaded for the pipeline(s) you want to serve (see Download AI Models)
This guide adds AI inference to an existing transcoding node. If you are setting up from scratch, start with Install go-livepeer.
Check your hardware
AI inference runs in a separate Docker container alongside your transcoding process. If both share the same GPU, VRAM is split between them. Before configuring anything, confirm how much VRAM your GPU has available (`nvidia-smi` lists your GPUs and their VRAM). The table below shows the minimum VRAM for each pipeline:

| Pipeline | Min VRAM | Notes |
|---|---|---|
| `image-to-text` | 4GB | Caption generation; lowest barrier to entry |
| `segment-anything-2` | 6GB | Object segmentation |
| LLM (`llm`) | 8GB | Requires Ollama runner; 7–8B quantised models |
| `audio-to-text` | 12GB | Speech transcription; Whisper-based |
| `image-to-video` | 16GB+ | Animated video from image |
| `image-to-image` | 20GB | Style transfer, image manipulation |
| `text-to-image` | 24GB | Text-to-image generation (Stable Diffusion, SDXL) |
| `upscale` | | Image upscaling |
| `text-to-speech` | | Speech synthesis |
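To check how much VRAM each GPU has, you can query `nvidia-smi` directly. A quick check, assuming the NVIDIA driver is installed on the host:

```bash
# List each GPU with its total and currently used VRAM (in MiB)
nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv
```

Subtract `memory.used` from `memory.total` on the GPU you plan to share to see what is actually left for an AI pipeline.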
Step 1 — Pull the AI runner image
The AI subnet uses a separate Docker image (`livepeer/ai-runner`) to run inference. Pull it before starting your node:
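For example (the tag here is an assumption; pin whichever release the current Livepeer docs recommend):

```bash
docker pull livepeer/ai-runner:latest
```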
If you plan to serve the `segment-anything-2` pipeline, also pull its pipeline-specific image:
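A sketch; the exact pipeline-specific tag may differ between releases, so check Docker Hub for the current one:

```bash
# Pipeline-specific image for segment-anything-2 (tag is an assumption)
docker pull livepeer/ai-runner:segment-anything-2
```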
Step 2 — Configure `aiModels.json`
The `aiModels.json` file tells your orchestrator which AI pipelines and models to serve, what to charge, and whether to keep models warm in VRAM.

Create the file at `~/.lpData/aiModels.json`. This example serves the `text-to-image` pipeline with a warm model — the minimal working configuration:
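A minimal sketch of such a file. The model ID and price here are illustrative, not recommendations:

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4768371,
    "warm": true
  }
]
```

The file is a JSON array, so you can list additional pipeline entries alongside this one.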
Field reference
| Field | Required | Description |
|---|---|---|
| `pipeline` | Yes | Pipeline name (e.g. `"text-to-image"`, `"audio-to-text"`, `"llm"`) |
| `model_id` | Yes | Hugging Face model ID |
| `price_per_unit` | Yes | Price in wei per unit (integer), or a USD string, e.g. `"0.5e-2USD"` |
| `warm` | No | If `true`, the model is preloaded into VRAM on startup |
| `capacity` | No | Max concurrent inference requests (default: 1) |
| `optimization_flags` | No | Performance flags: `SFAST` (up to +25% speed) and/or `DEEPCACHE` (up to +50% speed) |
| `url` | No | For external containers only — URL of a separately managed runner |
| `token` | No | Bearer token for external container authentication |
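For an externally managed runner, `url` and `token` point the orchestrator at a container you run yourself instead of one it starts locally. The hostname and token below are placeholders:

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4768371,
    "url": "http://runner-host:8000",
    "token": "my-secret-token"
  }
]
```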
During Beta, only one warm model per GPU is supported. Set `"warm": true` for the model you want preloaded; additional models will load on demand when requested.

Step 3 — Update your startup command

Stop your current go-livepeer process, then restart it with the following additions. Three flags enable AI:

- `-aiWorker` — enables the AI worker functionality
- `-aiModels` — path to your `aiModels.json` file
- `-aiModelsDir` — directory where model weights are stored on the host machine
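A sketch of the restart command. The first three flags stand in for whatever your existing orchestrator setup already uses; only the last three are new:

```bash
livepeer \
  -network arbitrum-one-mainnet \
  -orchestrator \
  -transcoder \
  -aiWorker \
  -aiModels ~/.lpData/aiModels.json \
  -aiModelsDir ~/.lpData/models   # host path where model weights live (assumed location)
```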
The `-aiModelsDir` path must be the host machine path, not the path inside the Docker container. The orchestrator uses docker-out-of-docker to start ai-runner containers and passes this path directly to them.

Step 4 — Verify AI is active
Check the logs
Within a few seconds of startup, you should see a log line for each model configured as warm. If you don't, check that:

- `aiModels.json` is valid JSON and at the path specified in `-aiModels`
- The model weights are present in `-aiModelsDir`
- The Docker socket is mounted (Docker mode only)
Test the AI runner directly
Once running, confirm the AI runner responds by sending a test inference request. Navigate to `http://localhost:8000/docs` in your browser to access the Swagger UI for the ai-runner container.
Alternatively, use curl:
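A sketch of a direct request to the runner. The endpoint path and payload are assumptions based on the runner's OpenAPI schema served at `/docs`; adjust for your version:

```bash
curl -s http://localhost:8000/text-to-image \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a photo of a cat", "model_id": "ByteDance/SDXL-Lightning"}'
```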
A successful response returns an `images` array containing a base64-encoded PNG URL.
Confirm pipelines are advertised
Your AI pipelines will appear in the Livepeer Explorer on your orchestrator's profile once on-chain capability advertisement is configured. See Publish Offerings for that step.

Choose your AI path

Your AI runner is active. The next step depends on which pipeline type you want to specialise in.

Set up batch AI inference
Configure image, audio, and video generation pipelines. Covers model downloads, pricing, and on-chain registration for batch inference.
Set up real-time AI (Cascade)
Configure ComfyStream for persistent video stream processing. Covers ComfyUI workflow deployment and GPU allocation.
Related
- Job Types — understand the difference between transcoding, batch AI, real-time AI, and LLM inference before choosing a path
- AI Pipeline Configuration — advanced `aiModels.json` options, multi-GPU setup, external containers, and optimization flags