Documentation Index
Fetch the complete documentation index at: https://docs.livepeer.org/llms.txt
Use this file to discover all available pages before exploring further.
By the end of this tutorial, a Hugging Face model is running on your Livepeer orchestrator, advertised to the network, and callable through a gateway you operate. The example model is
SG161222/RealVisXL_V4.0_Lightning, served through the text-to-image pipeline.
What you will verify:
aiModels.jsonparses cleanly at orchestrator startup- The runner container loads the model into VRAM
- The model is advertised on
tools.livepeer.cloud/ai/network-capabilities - A request through your self-hosted gateway returns a successful inference result
Scope and intent
This is the simplest path: your model conforms to one of the ten pipeline shapes the Livepeer AI worker supports out of the box. The runner does the model loading, inference, and response formatting. You only declare the model and the price. This is the right tutorial if your model is, for example, an SDXL fine-tune, a BLIP variant, or a Whisper variant. It is not the right tutorial if:- your model needs custom Python code (preprocessing, postprocessing, novel architecture, or non-standard input or output shape). See the custom pipeline path.
- your model ships as an arbitrary container with its own protocol. See the BYOC path.
- your model is an LLM you want to run via Ollama instead of the standard
livepeer/ai-runnerimage. The same overall flow applies but the runner image andaiModels.jsonentry differ. See the LLM variant note at the end.
Built-in pipelines
The Livepeer AI worker ships with a fixed set of pipeline implementations underlivepeer/ai-worker/runner/src/runner/pipelines/.
Each file defines the input schema, the output schema, and the model-loading conventions for one class of
inference task.
If your model fits the input and output shape of one of these, take this tutorial. If not, the model needs
either a custom pipeline or a BYOC container.
Prerequisites
Each requirement is a hard prerequisite, not a soft one. Stop here if any is not in place.Step 1: Choose the model directory
Pick a host path for model weights. The AI worker mounts this path into the runner container at/models.
export-model-dir.sh
go-livepeer via -aiModelsDir. The runner reads weights from /models inside
the container, which maps to this directory on the host.
Step 2: Declare the model in aiModels.json
Create anaiModels.json file. The orchestrator parses this file at startup and advertises every pipeline it
lists.
aiModels.json
go-livepeer:
Step 3: Pre-download the model weights
The model needs to land on disk before the runner starts. Otherwise warm load fails and lazy load stalls the first request. The canonical script inlivepeer/ai-worker is
runner/dl_checkpoints.sh. It
reads pipeline names from environment variables, calls huggingface_hub.snapshot_download for each model, and
places weights at $MODEL_DIR/<model_id>/.
download-weights.sh
- Mounts your host model directory at
/modelsinside the container - Mounts the
runner/directory so the script and helpers are available - Sets
MODEL_DIR=/modelsso the script knows where to write - Sets
PIPELINEandMODEL_IDso the script knows what to fetch - Runs the script, which uses
huggingface_hub(already installed in the runner image) to pull the weights
verify-weights.sh
model_index.json, unet/, vae/, text_encoder/, text_encoder_2/,
tokenizer/, tokenizer_2/, scheduler/. If the directory is empty or partial, re-run the command.
huggingface_hub resumes partial downloads.
Step 4: Start the orchestrator with the new model
Stop your existinggo-livepeer orchestrator and restart with the AI flags:
start-orchestrator.sh
livepeer/go-livepeer/cmd/livepeer/livepeer.go:
At startup, go-livepeer:
- Parses
aiModels.json - For each entry with
warm: true, looks up the runner image from the pipeline-to-image map inlivepeer/go-livepeer/ai/worker/docker.go, pulls it if absent, and starts a container - Mounts
$LP_AI_MODELS_DIRinto the container at/models - Waits for the runner’s
/healthendpoint to report ready - Begins advertising the pipeline plus model plus price as a capability
livepeer/ai-worker/runner/src/runner/main.py
(FastAPI app definition).
Step 5: Verify on the network capabilities tool
Opentools.livepeer.cloud/ai/network-capabilities in
a browser. This dashboard reads live capability advertisements from active orchestrators on the network.
Find your orchestrator address. You should see:
- the
text-to-imagepipeline listed under your orchestrator SG161222/RealVisXL_V4.0_Lightninglisted under that pipeline- a warm indicator, if the dashboard surfaces it
Orchestrator not in the active set
Orchestrator not in the active set
Confirm on
explorer.livepeer.org that your address shows as active.
Capability advertisement requires on-chain registration with sufficient stake.Runner container failed to start
Runner container failed to start
Check
docker ps -a for an exited container, then docker logs <container-id> for the failure reason.
The most common is CUDA out-of-memory at warm load.aiModels.json did not parse
aiModels.json did not parse
go-livepeer was started without -aiWorker, or aiModels.json did not parse. Check the orchestrator
startup logs for parse errors.Step 6: Send a test inference request
Two paths verify the model end-to-end without touching Studio or Daydream. Use both in order: localhost first, gateway second.Step 6a: Hit the runner directly on localhost
The runner is a FastAPI service. Source:livepeer/ai-worker/runner/src/runner/main.py.
The orchestrator runs it on a port internal to the host (printed in startup logs as the AI worker port).
runner-direct.sh
huggingface.co/SG161222/RealVisXL_V4.0_Lightning.
A successful response is a JSON object with an images array. Each image is base64-encoded or referenced by
URL depending on runner version. Decode and inspect the output:
inspect-output.sh
Step 6b: Self-hosted gateway test
go-livepeer runs as a gateway when started with -gateway. On a separate process or machine:
start-gateway.sh
-orchAddr flag pins discovery to your own orchestrator, removing the variability of network-wide
selection. This is what makes the test deterministic: the gateway can only route to your node.
Then send the inference request to the gateway:
gateway-request.sh
The Livepeer Cloud Community Gateway is a free public gateway maintained by the Cloud SPE (Titan Node).
Sending a request to it tests routing from outside your own infrastructure. The downside is non-determinism:
it selects an orchestrator from the active set and may not select yours. Use it only as a cross-check after
Step 6b succeeds, never as the primary verification.
Step 7: Confirm the loop is closed
The tutorial is complete when all four are observable:aiModels.jsondeclares the model andgo-livepeerparsed it cleanly at startup (orchestrator logs)- The runner container is running and the model is loaded into VRAM (
docker ps,nvidia-smi) - The orchestrator advertises the model on
tools.livepeer.cloud/ai/network-capabilities - A request through your self-hosted gateway returns a successful inference result
Operational notes
Pricing
Pricing
Setting price-per-pixel above the network median means your orchestrator receives no jobs. Gateway
selection in
go-livepeer filters by price competitiveness. Compare against the rates visible on the
network capabilities dashboard before going live.Warm and cold trade-off
Warm and cold trade-off
warm: true holds the model in VRAM continuously. SDXL-class models occupy roughly 12 GB; on a 24 GB card
you can warm one SDXL plus, perhaps, a smaller pipeline like image-to-text (4 GB floor per
Salesforce/blip-image-captioning-large) but not two SDXL variants. Cold models (warm: false) share
VRAM via swap on first request; price them lower because the cold-start latency makes them less attractive
to gateways.Same flow, different model
Same flow, different model
Replace the
model_id in aiModels.json and the MODEL_ID in the download command with your chosen
model. The pipeline name stays the same as long as the model fits the same I/O shape. For example,
swapping SG161222/RealVisXL_V4.0_Lightning for ByteDance/SDXL-Lightning (also a text-to-image model)
requires no other changes.LLM variant via Ollama
LLM models follow the same overall flow but use a different runner image. The Cloud SPE maintainstztcloud/livepeer-ollama-runner, which wraps
Ollama for OpenAI-compatible completions.
The aiModels.json entry for an LLM:
aiModels-llm.json
ollama pull llama3.1:8b) inside the Ollama runner container. The mapping
between HF identifier and Ollama tag for each LLM is the only piece that does not generalise from the
standard runner. Reference: the Ollama tag library at ollama.com/library.
Otherwise the pattern is identical: declare in aiModels.json, ensure the runner image is available, restart
go-livepeer, verify on the capabilities tool, test through your self-hosted gateway with an
OpenAI-compatible chat completion request.
Troubleshooting
Runner container exits immediately
Runner container exits immediately
Run
docker logs <container-id>. Three common causes: model files missing or partial (re-run Step 3);
CUDA out-of-memory at load (insufficient VRAM, downgrade to warm: false or pick a smaller variant);
image pull failed (check Docker Hub connectivity).Orchestrator absent from capabilities tool but runner loaded
Orchestrator absent from capabilities tool but runner loaded
Check
explorer.livepeer.org that your orchestrator is in the active
set. Capability advertisement requires on-chain registration with sufficient stake.Localhost works but gateway fails
Localhost works but gateway fails
Confirm
serviceAddr is reachable from outside your network. Open the relevant port at the firewall,
confirm DNS, and confirm the orchestrator is binding to a public interface instead of localhost.Inference returns low-quality output
Inference returns low-quality output
Check that you are using the SDXL Lightning recommended sampling (4 steps, low guidance). Different SDXL
fine-tunes have different recommended schedulers and step counts. Consult the model card.
Sources
Every claim in this tutorial is grounded in one of the following readable references:Livepeer source
Livepeer source
github.com/livepeer/ai-worker– runner architecture, pipeline implementations,dl_checkpoints.shlivepeer/ai-worker/runner/src/runner/pipelines– supported pipeline list and their I/O shapeslivepeer/ai-worker/runner/dl_checkpoints.sh– model download script, environment variables, HF integrationlivepeer/ai-worker/runner/src/runner/main.py– FastAPI app,/healthendpoint, port bindinggithub.com/livepeer/go-livepeer– orchestrator, gateway, AI worker modelivepeer/go-livepeer/cmd/livepeer/livepeer.go– flag definitions for-aiWorker,-aiModels,-aiModelsDir,-gateway,-orchAddr,-serviceAddrlivepeer/go-livepeer/ai/worker/docker.go– pipeline-to-image map keyed on canonical pipeline name strings
External source
External source
huggingface_hub–snapshot_downloadsemanticshuggingface.co/SG161222/RealVisXL_V4.0_Lightning– model card, recommended samplinghub.docker.com/r/livepeer/ai-runner– runner image, tagshub.docker.com/r/tztcloud/livepeer-ollama-runner– Ollama-based LLM runnertools.livepeer.cloud/ai/network-capabilities– live capability dashboardexplorer.livepeer.org– orchestrator active-set status
AI agent prompt
Related pages
Advanced HuggingFace paths
Three structurally different paths: existing pipeline, custom pipeline, BYOC.
Full AI Pipeline Tutorial
Local end-to-end pipeline: gateway routes inference to orchestrator and the result returns through the full pipeline.
Realtime AI Tutorial
Live video-to-video pipeline: continuous WebRTC stream in, transformed stream out.
ComfyStream Quickstart
Stand up a ComfyStream pipeline for real-time AI workloads.