Your Hugging Face model already fits one of the ten built-in Livepeer pipelines. You declare it, pre-download the weights, restart the orchestrator with the AI flags, and verify through your own self-hosted gateway. No Studio. No Daydream. No code written.

By the end of this tutorial, a Hugging Face model is running on your Livepeer orchestrator, advertised to the network, and callable through a gateway you operate. The example model is SG161222/RealVisXL_V4.0_Lightning, served through the text-to-image pipeline. What you will verify:
  • aiModels.json parses cleanly at orchestrator startup
  • The runner container loads the model into VRAM
  • The model is advertised on tools.livepeer.cloud/ai/network-capabilities
  • A request through your self-hosted gateway returns a successful inference result

Scope and intent

This is the simplest path: your model conforms to one of the ten pipeline shapes the Livepeer AI worker supports out of the box. The runner does the model loading, inference, and response formatting. You only declare the model and the price. This is the right tutorial if your model is, for example, an SDXL fine-tune, a BLIP variant, or a Whisper variant. It is not the right tutorial if:
  • your model needs custom Python code (preprocessing, postprocessing, novel architecture, or non-standard input or output shape). See the custom pipeline path.
  • your model ships as an arbitrary container with its own protocol. See the BYOC path.
  • your model is an LLM you want to run via Ollama instead of the standard livepeer/ai-runner image. The same overall flow applies but the runner image and aiModels.json entry differ. See the LLM variant note at the end.

Built-in pipelines

The Livepeer AI worker ships with a fixed set of pipeline implementations under livepeer/ai-worker/runner/src/runner/pipelines/. Each file defines the input schema, the output schema, and the model-loading conventions for one class of inference task. If your model fits the input and output shape of one of these, this tutorial applies. If not, the model needs either a custom pipeline or a BYOC container.

Prerequisites

Each requirement is a hard prerequisite, not a soft one. Stop here if any is not in place.

Step 1: Choose the model directory

Pick a host path for model weights. The AI worker mounts this path into the runner container at /models.
export-model-dir.sh
export LP_AI_MODELS_DIR=/data/livepeer-ai-models
mkdir -p "$LP_AI_MODELS_DIR"
This is the path you pass to go-livepeer via -aiModelsDir. The runner reads weights from /models inside the container, which maps to this directory on the host.
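Before pointing -aiModelsDir at the directory, it can help to confirm that it exists, is writable, and has room for the weights. The following is a minimal sketch, not part of go-livepeer: check_models_dir is a hypothetical helper, and the 20 GB default threshold is an illustrative assumption, not a Livepeer requirement.

```shell
# Sketch: sanity-check a model directory before using it with -aiModelsDir.
# check_models_dir is a hypothetical helper; the 20 GB default is an assumption.
check_models_dir() {
  local dir="$1" min_free_gb="${2:-20}"
  [ -d "$dir" ] || { echo "missing: $dir" >&2; return 1; }
  [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }
  # df -Pk prints one POSIX-format line per filesystem in 1024-byte blocks;
  # column 4 is the available space.
  local free_kb
  free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  local free_gb=$(( ${free_kb:-0} / 1024 / 1024 ))
  if [ "$free_gb" -lt "$min_free_gb" ]; then
    echo "only ${free_gb} GB free in $dir, want at least ${min_free_gb} GB" >&2
    return 1
  fi
  echo "ok: $dir (${free_gb} GB free)"
}
```

Calling check_models_dir "$LP_AI_MODELS_DIR" 20 after the mkdir above prints an ok line or explains what is wrong on stderr.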

Step 2: Declare the model in aiModels.json

Create an aiModels.json file. The orchestrator parses this file at startup and advertises every pipeline it lists.
aiModels.json
[
  {
    "pipeline": "text-to-image",
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "price_per_unit": 4768371,
    "pixels_per_unit": 1,
    "currency": "wei",
    "warm": true
  }
]
Each field, grounded in the schema parsed by go-livepeer:
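To get a feel for what the price in the example means, the arithmetic below estimates the fee for a single image. It is a rough sketch that assumes the fee scales with output pixels (width × height) at price_per_unit wei per pixels_per_unit pixels; job_fee_wei is a hypothetical helper, and the exact accounting in go-livepeer may differ.

```shell
# Sketch: estimate the fee for one text-to-image job, in wei.
# Assumption: fee = width * height * price_per_unit / pixels_per_unit.
job_fee_wei() {
  local width="$1" height="$2" price_per_unit="$3" pixels_per_unit="$4"
  echo $(( width * height * price_per_unit / pixels_per_unit ))
}

# Values from the aiModels.json example and a 1024x1024 request.
job_fee_wei 1024 1024 4768371 1   # prints 4999999389696
```

Since 1 ETH is 10^18 wei, 4999999389696 wei is roughly 0.000005 ETH per 1024×1024 image under that assumption.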

Step 3: Pre-download the model weights

The model needs to land on disk before the runner starts. Otherwise warm load fails and lazy load stalls the first request. The canonical script in livepeer/ai-worker is runner/dl_checkpoints.sh. It reads pipeline names from environment variables, calls huggingface_hub.snapshot_download for each model, and places weights at $MODEL_DIR/<model_id>/.
download-weights.sh
git clone https://github.com/livepeer/ai-worker.git
cd ai-worker

docker run --rm \
  -v "$LP_AI_MODELS_DIR:/models" \
  -v "$(pwd)/runner:/runner" \
  -e MODEL_DIR=/models \
  -e PIPELINE=text-to-image \
  -e MODEL_ID=SG161222/RealVisXL_V4.0_Lightning \
  livepeer/ai-runner:latest \
  bash /runner/dl_checkpoints.sh
The command:
  1. Mounts your host model directory at /models inside the container
  2. Mounts the runner/ directory so the script and helpers are available
  3. Sets MODEL_DIR=/models so the script knows where to write
  4. Sets PIPELINE and MODEL_ID so the script knows what to fetch
  5. Runs the script, which uses huggingface_hub (already installed in the runner image) to pull the weights
Verify the download:
verify-weights.sh
ls -la "$LP_AI_MODELS_DIR/SG161222/RealVisXL_V4.0_Lightning/"
Expect SDXL’s standard layout: model_index.json, unet/, vae/, text_encoder/, text_encoder_2/, tokenizer/, tokenizer_2/, scheduler/. If the directory is empty or partial, re-run the command. huggingface_hub resumes partial downloads.
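The layout check can be scripted instead of read off the ls output. A minimal sketch follows, using only the entry names listed above; check_sdxl_layout is a hypothetical helper, and it only tests that each entry exists, not that the files inside are complete.

```shell
# Sketch: confirm the expected SDXL entries are present in a model directory.
# Presence only; this does not validate file sizes or checksums.
check_sdxl_layout() {
  local model_dir="$1" missing=0 entry
  for entry in model_index.json unet vae text_encoder text_encoder_2 \
               tokenizer tokenizer_2 scheduler; do
    if [ ! -e "$model_dir/$entry" ]; then
      echo "missing: $entry" >&2
      missing=$(( missing + 1 ))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For the example model, check_sdxl_layout "$LP_AI_MODELS_DIR/SG161222/RealVisXL_V4.0_Lightning" returns non-zero and names each missing entry if the download is partial.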

Step 4: Start the orchestrator with the new model

Stop your existing go-livepeer orchestrator and restart with the AI flags:
start-orchestrator.sh
go-livepeer \
  -orchestrator \
  -transcoder \
  -nvidia all \
  -aiWorker \
  -aiModels /path/to/aiModels.json \
  -aiModelsDir "$LP_AI_MODELS_DIR" \
  -ethUrl <your-arbitrum-rpc> \
  -serviceAddr <your-public-host>:<port> \
  -pricePerUnit 0
The relevant flags are defined in livepeer/go-livepeer/cmd/livepeer/livepeer.go.
At startup, go-livepeer:
  1. Parses aiModels.json
  2. For each entry with warm: true, looks up the runner image from the pipeline-to-image map in livepeer/go-livepeer/ai/worker/docker.go, pulls it if absent, and starts a container
  3. Mounts $LP_AI_MODELS_DIR into the container at /models
  4. Waits for the runner’s /health endpoint to report ready
  5. Begins advertising the pipeline, model, and price as a capability
Watch the logs. A successful warm load looks like a runner-container start, a model-load log line, and a “capability advertised” or equivalent message. Source for the runner’s health and readiness contract: livepeer/ai-worker/runner/src/runner/main.py (FastAPI app definition).
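Rather than watching the logs by eye, you can poll until the runner answers. The sketch below is a generic retry loop; wait_until is a hypothetical helper, and the usage line assumes that a 2xx response from /health means the runner is ready, which the source does not spell out.

```shell
# Sketch: retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Success of the wrapped command ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1
}

# Example with a placeholder port (assumption: HTTP 2xx from /health means ready):
#   wait_until 60 5 curl -fsS "http://localhost:<runner-port>/health"
```

The -f flag makes curl exit non-zero on HTTP errors, so the loop keeps retrying until the endpoint responds successfully or the attempt budget is spent.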

Step 5: Verify on the network capabilities tool

Open tools.livepeer.cloud/ai/network-capabilities in a browser. This dashboard reads live capability advertisements from active orchestrators on the network. Find your orchestrator address. You should see:
  • the text-to-image pipeline listed under your orchestrator
  • SG161222/RealVisXL_V4.0_Lightning listed under that pipeline
  • a warm indicator, if the dashboard surfaces it
If your orchestrator is not in the list, the model is not visible to the network. The three usual causes:
  • The orchestrator is not in the active set. Confirm on explorer.livepeer.org that your address shows as active. Capability advertisement requires on-chain registration with sufficient stake.
  • The runner container exited. Check docker ps -a for an exited container, then docker logs <container-id> for the failure reason. The most common is CUDA out-of-memory at warm load.
  • The AI worker is not enabled or its configuration failed. go-livepeer was started without -aiWorker, or aiModels.json did not parse. Check the orchestrator startup logs for parse errors.
Resolve any of these before continuing.

Step 6: Send a test inference request

Two paths verify the model end-to-end without touching Studio or Daydream. Use both in order: localhost first, gateway second.

Step 6a: Hit the runner directly on localhost

The runner is a FastAPI service. Source: livepeer/ai-worker/runner/src/runner/main.py. The orchestrator runs it on a port internal to the host (printed in startup logs as the AI worker port).
runner-direct.sh
curl -X POST http://localhost:<runner-port>/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "prompt": "a quiet harbour at dawn, photo realistic",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 4,
    "guidance_scale": 2.0
  }' \
  --output result.json
The four-step inference and low guidance scale follow the SDXL Lightning recommendations on the model card at huggingface.co/SG161222/RealVisXL_V4.0_Lightning. A successful response is a JSON object with an images array. Each image is base64-encoded or referenced by URL depending on runner version. Print the first 200 characters of the first image entry to see which form you received:
inspect-output.sh
jq -r '.images[0].url // .images[0]' result.json | head -c 200
This step confirms the model is loaded and inference works. It does not confirm that the model is reachable through the Livepeer network. That is Step 6b.
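To look at the image itself, the base64 form has to be decoded to a file. The sketch below handles the cases the text mentions: a data URI, a bare base64 string, or a plain URL. decode_image is a hypothetical helper, and the data URI prefix format is an assumption about how the runner encodes inline images.

```shell
# Sketch: write an image entry from the response to a file.
# Handles "data:<mime>;base64,<payload>", bare base64, and plain URLs.
decode_image() {
  local value="$1" out="$2"
  case "$value" in
    data:*\;base64,*)
      # Strip everything up to and including ";base64," and decode the rest.
      printf '%s' "${value#*;base64,}" | base64 -d > "$out" ;;
    http://*|https://*)
      echo "URL reference, fetch it separately: $value" >&2
      return 2 ;;
    *)
      printf '%s' "$value" | base64 -d > "$out" ;;
  esac
}
```

With the jq expression from this step, decode_image "$(jq -r '.images[0].url // .images[0]' result.json)" result.png writes the first image to result.png, or returns 2 if the entry is a URL.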

Step 6b: Self-hosted gateway test

go-livepeer runs as a gateway when started with -gateway. On a separate process or machine:
start-gateway.sh
go-livepeer \
  -gateway \
  -httpAddr 0.0.0.0:8935 \
  -orchAddr <your-orch-host>:<port> \
  -ethUrl <your-arbitrum-rpc>
The -orchAddr flag pins discovery to your own orchestrator, removing the variability of network-wide selection. This is what makes the test deterministic: the gateway can only route to your node. Then send the inference request to the gateway:
gateway-request.sh
curl -X POST http://localhost:8935/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "prompt": "a quiet harbour at dawn, photo realistic",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 4,
    "guidance_scale": 2.0
  }' \
  --output gateway-result.json
The gateway handles discovery, capability matching, ticket-based payment, and routing to the orchestrator. The response includes the inference output and a settlement record for the probabilistic micropayment ticket. A successful response means your model is reachable across the protocol layer through your own infrastructure.
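Because --output writes whatever the gateway returns, including error bodies, it is worth checking the HTTP status as well as the file. The sketch below is one way to do that; response_ok and request.json are hypothetical names, and the check for an "images" field assumes the response shape described in Step 6a.

```shell
# Sketch: decide whether a saved gateway response looks successful.
# Usage: response_ok <http-status> <body-file>
response_ok() {
  local status="$1" body_file="$2"
  case "$status" in
    2[0-9][0-9]) ;;  # any 2xx status counts as success
    *) echo "HTTP $status" >&2; return 1 ;;
  esac
  # Assumption: a successful text-to-image response contains an "images" field.
  grep -q '"images"' "$body_file" || {
    echo "no images field in $body_file" >&2
    return 1
  }
}

# Example (request.json is a hypothetical file holding the JSON payload):
#   status=$(curl -sS -o gateway-result.json -w '%{http_code}' \
#     -X POST http://localhost:8935/text-to-image \
#     -H "Content-Type: application/json" -d @request.json)
#   response_ok "$status" gateway-result.json
```

The -w '%{http_code}' option makes curl print the status code after the transfer, so the body and the status can be checked separately.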
The Livepeer Cloud Community Gateway is a free public gateway maintained by the Cloud SPE (Titan Node). Sending a request to it tests routing from outside your own infrastructure. The downside is non-determinism: it selects an orchestrator from the active set and may not select yours. Use it only as a cross-check after Step 6b succeeds, never as the primary verification.

Step 7: Confirm the loop is closed

The tutorial is complete when all four are observable:
  1. aiModels.json declares the model and go-livepeer parsed it cleanly at startup (orchestrator logs)
  2. The runner container is running and the model is loaded into VRAM (docker ps, nvidia-smi)
  3. The orchestrator advertises the model on tools.livepeer.cloud/ai/network-capabilities
  4. A request through your self-hosted gateway returns a successful inference result
If any one of these is missing, the model is not yet on the network. Resolve before relying on the path for paid traffic.

Operational notes

Setting price-per-pixel above the network median means your orchestrator receives no jobs. Gateway selection in go-livepeer filters by price competitiveness. Compare against the rates visible on the network capabilities dashboard before going live.
warm: true holds the model in VRAM continuously. SDXL-class models occupy roughly 12 GB; on a 24 GB card you can warm one SDXL plus, perhaps, a smaller pipeline like image-to-text (4 GB floor per Salesforce/blip-image-captioning-large) but not two SDXL variants. Cold models (warm: false) share VRAM via swap on first request; price them lower because the cold-start latency makes them less attractive to gateways.
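The VRAM reasoning above can be written as a quick budget check. This is a sketch only: fits_in_vram is a hypothetical helper, the model sizes are the rough figures quoted above, and the 2 GB headroom for CUDA context and activations is an illustrative assumption.

```shell
# Sketch: check whether a set of warm models fits on one card.
# Usage: fits_in_vram <card-gb> <headroom-gb> <model-gb>...
fits_in_vram() {
  local card_gb="$1" headroom_gb="$2" total=0 size
  shift 2
  for size in "$@"; do
    total=$(( total + size ))
  done
  # Fits only if the models plus the reserved headroom stay within the card.
  [ $(( total + headroom_gb )) -le "$card_gb" ]
}

fits_in_vram 24 2 12 4  && echo "SDXL + image-to-text: fits"
fits_in_vram 24 2 12 12 || echo "two SDXL models: does not fit"
```

Actual usage depends on resolution, batch size, and runner version, so treat nvidia-smi readings after warm load as the final word.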
Replace the model_id in aiModels.json and the MODEL_ID in the download command with your chosen model. The pipeline name stays the same as long as the model fits the same I/O shape. For example, swapping SG161222/RealVisXL_V4.0_Lightning for ByteDance/SDXL-Lightning (also a text-to-image model) requires no other changes.

LLM variant via Ollama

LLM models follow the same overall flow but use a different runner image. The Cloud SPE maintains tztcloud/livepeer-ollama-runner, which wraps Ollama for OpenAI-compatible completions. The aiModels.json entry for an LLM:
aiModels-llm.json
{
  "pipeline": "llm",
  "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "price_per_unit": 1,
  "pixels_per_unit": 1000000,
  "currency": "wei",
  "warm": true
}
The model identifier is the Hugging Face repo for documentation purposes; the actual model pull happens through Ollama’s tagging system (ollama pull llama3.1:8b) inside the Ollama runner container. The mapping between HF identifier and Ollama tag for each LLM is the only piece that does not generalise from the standard runner. Reference: the Ollama tag library at ollama.com/library. Otherwise the pattern is identical: declare in aiModels.json, ensure the runner image is available, restart go-livepeer, verify on the capabilities tool, test through your self-hosted gateway with an OpenAI-compatible chat completion request.
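One way to keep the HF-to-Ollama mapping explicit is a small lookup that fails loudly for unknown models. The sketch below records only the pair mentioned above; ollama_tag_for is a hypothetical helper, and any further entries would have to be taken from ollama.com/library.

```shell
# Sketch: map a Hugging Face model_id to the Ollama tag used to pull it.
# Only the pair named in this section is recorded; add others by hand.
ollama_tag_for() {
  case "$1" in
    meta-llama/Meta-Llama-3.1-8B-Instruct) echo "llama3.1:8b" ;;
    *)
      echo "no Ollama tag recorded for $1" >&2
      return 1 ;;
  esac
}

ollama_tag_for "meta-llama/Meta-Llama-3.1-8B-Instruct"   # prints llama3.1:8b
```

Failing on unknown identifiers avoids silently pulling a different model than the one declared in aiModels.json.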

Troubleshooting

If the runner container fails to start or exits, run docker logs <container-id>. Three common causes: model files missing or partial (re-run Step 3); CUDA out-of-memory at load (insufficient VRAM; switch to warm: false or pick a smaller variant); image pull failed (check Docker Hub connectivity).
If the model does not appear on the capabilities tool, check on explorer.livepeer.org that your orchestrator is in the active set. Capability advertisement requires on-chain registration with sufficient stake.
If the gateway cannot reach the orchestrator, confirm serviceAddr is reachable from outside your network. Open the relevant port at the firewall, confirm DNS, and confirm the orchestrator is binding to a public interface instead of localhost.
If the output looks wrong or low quality, check that you are using the SDXL Lightning recommended sampling (4 steps, low guidance). Different SDXL fine-tunes have different recommended schedulers and step counts. Consult the model card.

Sources

Every claim in this tutorial is grounded in one of the following readable references:
Your model is now running on the Livepeer network, advertised to gateways, and callable through your self-hosted gateway. For custom architectures that do not fit a native pipeline, see the advanced paths.

AI agent prompt

Complete the "Add a Hugging Face Model to Livepeer" tutorial for a model that fits an existing Livepeer AI pipeline. Use placeholders for MODEL_ID=<huggingface org/repo>, PIPELINE=<canonical pipeline name>, LP_AI_MODELS_DIR=/data/livepeer-ai-models, ORCH_SERVICE_ADDR=<orchestrator service address>, ORCH_ETH_ADDR=<orchestrator ETH address>, GATEWAY_PORT=8935, and ORCH_ADDR=<orchestrator address>. Clone livepeer/ai-worker only for the checkpoint script, use livepeer/ai-runner images, write aiModels.json, pre-download weights, start go-livepeer with -aiWorker -aiModels -aiModelsDir, verify the runner container and tools.livepeer.cloud capability listing, then start a self-hosted go-livepeer -gateway pinned to the orchestrator and send a test inference request. Do not use Studio or Daydream.

Advanced HuggingFace paths

Three structurally different paths: existing pipeline, custom pipeline, BYOC.

Full AI Pipeline Tutorial

Local end-to-end pipeline: gateway routes inference to orchestrator and the result returns through the full pipeline.

Realtime AI Tutorial

Live video-to-video pipeline: continuous WebRTC stream in, transformed stream out.

ComfyStream Quickstart

Stand up a ComfyStream pipeline for real-time AI workloads.
Last modified on May 19, 2026