Add a Hugging Face Model to Livepeer

Your Hugging Face model already fits one of the ten built-in Livepeer pipelines. You declare it, pre-download the weights, restart the orchestrator with the AI flags, and verify through your own self-hosted gateway. No Studio. No Daydream. No code written.

By the end of this tutorial, a Hugging Face model is running on your Livepeer orchestrator, advertised to the network, and callable through a gateway you operate. The example model is SG161222/RealVisXL_V4.0_Lightning, served through the text-to-image pipeline. What you will verify:

aiModels.json parses cleanly at orchestrator startup
The runner container loads the model into VRAM
The model is advertised on tools.livepeer.cloud/ai/network-capabilities
A request through your self-hosted gateway returns a successful inference result

Scope and intent

This is the simplest path: your model conforms to one of the ten pipeline shapes the Livepeer AI worker supports out of the box. The runner does the model loading, inference, and response formatting. You only declare the model and the price. This is the right tutorial if your model is, for example, an SDXL fine-tune, a BLIP variant, or a Whisper variant. It is not the right tutorial if:

your model needs custom Python code (preprocessing, postprocessing, novel architecture, or non-standard input or output shape). See the custom pipeline path.
your model ships as an arbitrary container with its own protocol. See the BYOC path.
your model is an LLM you want to run via Ollama instead of the standard livepeer/ai-runner image. The same overall flow applies but the runner image and aiModels.json entry differ. See the LLM variant note at the end.

Built-in pipelines

The Livepeer AI worker ships with a fixed set of pipeline implementations under livepeer/ai-worker/runner/src/runner/pipelines/. Each file defines the input schema, the output schema, and the model-loading conventions for one class of inference task. If your model fits the input and output shape of one of these, follow this tutorial. If not, the model needs either a custom pipeline or a BYOC container.

Prerequisites

Each requirement is a hard prerequisite, not a soft one. Stop here if any is not in place.

Step 1: Choose the model directory

Pick a host path for model weights. The AI worker mounts this path into the runner container at /models.

export-model-dir.sh

export LP_AI_MODELS_DIR=/data/livepeer-ai-models
mkdir -p "$LP_AI_MODELS_DIR"

This is the path you pass to go-livepeer via -aiModelsDir. The runner reads weights from /models inside the container, which maps to this directory on the host.
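Before moving on, it can help to confirm the directory is usable. The following is a minimal sketch, assuming a POSIX-style shell with df and awk available; the helper name check_models_dir is hypothetical and not part of go-livepeer or the AI worker.

```shell
# Hypothetical helper: confirm the model directory exists and is writable,
# then report free space on the filesystem that holds it.
check_models_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  # df -Pk prints POSIX-format output in 1 KiB blocks; column 4 is available space.
  local free_kib
  free_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  echo "ok: $dir ($((free_kib / 1024 / 1024)) GiB free)"
}

# Example:
#   check_models_dir "$LP_AI_MODELS_DIR"
```

SDXL-class checkpoints run to several gigabytes, so a low free-space figure here is worth resolving before Step 3.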

Step 2: Declare the model in aiModels.json

Create an aiModels.json file. The orchestrator parses this file at startup and advertises every pipeline it lists.

aiModels.json

[
  {
    "pipeline": "text-to-image",
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "price_per_unit": 4768371,
    "pixels_per_unit": 1,
    "currency": "wei",
    "warm": true
  }
]

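Each field, as used in this example (the schema parsed by go-livepeer remains the authoritative definition):

pipeline: the canonical name of the built-in pipeline that serves the model, here text-to-image
model_id: the Hugging Face repository ID of the model
price_per_unit: the price charged per unit of work, expressed in the given currency
pixels_per_unit: the number of pixels that make up one unit, so the effective price per pixel is price_per_unit divided by pixels_per_unit
currency: the denomination of price_per_unit, here wei
warm: whether the model is loaded into VRAM at startup (true) or on first request (false)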
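Because a parse failure only surfaces at orchestrator startup, the file can be sanity-checked beforehand. The following is a rough pre-flight sketch using jq; the helper name validate_ai_models is hypothetical, it only looks at fields shown in this tutorial's example, and it is not a substitute for go-livepeer's own parsing.

```shell
# Hypothetical pre-flight check: the file must be a non-empty JSON array whose
# entries carry the fields used in this tutorial, with plausible types.
validate_ai_models() {
  local file="$1"
  # jq -e exits non-zero when the final result is false or null.
  jq -e '
    type == "array" and length > 0 and
    all(.[];
      (.pipeline | type == "string") and
      (.model_id | type == "string") and
      (.price_per_unit | (type == "number" or type == "string")) and
      ((.warm // false) | type == "boolean"))
  ' "$file" > /dev/null
}

# Example:
#   validate_ai_models /path/to/aiModels.json && echo "aiModels.json looks well-formed"
```

A non-zero exit status means either invalid JSON or an entry missing one of the checked fields.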

Step 3: Pre-download the model weights

The model needs to land on disk before the runner starts. Otherwise warm load fails and lazy load stalls the first request. The canonical script in livepeer/ai-worker is runner/dl_checkpoints.sh. It reads pipeline names from environment variables, calls huggingface_hub.snapshot_download for each model, and places weights at $MODEL_DIR/<model_id>/.

download-weights.sh

git clone https://github.com/livepeer/ai-worker.git
cd ai-worker

docker run --rm \
  -v "$LP_AI_MODELS_DIR:/models" \
  -v "$(pwd)/runner:/runner" \
  -e MODEL_DIR=/models \
  -e PIPELINE=text-to-image \
  -e MODEL_ID=SG161222/RealVisXL_V4.0_Lightning \
  livepeer/ai-runner:latest \
  bash /runner/dl_checkpoints.sh

The command:

Mounts your host model directory at /models inside the container
Mounts the runner/ directory so the script and helpers are available
Sets MODEL_DIR=/models so the script knows where to write
Sets PIPELINE and MODEL_ID so the script knows what to fetch
Runs the script, which uses huggingface_hub (already installed in the runner image) to pull the weights

Verify the download:

verify-weights.sh

ls -la "$LP_AI_MODELS_DIR/SG161222/RealVisXL_V4.0_Lightning/"

Expect SDXL’s standard layout: model_index.json, unet/, vae/, text_encoder/, text_encoder_2/, tokenizer/, tokenizer_2/, scheduler/. If the directory is empty or partial, re-run the command. huggingface_hub resumes partial downloads.
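The layout check can also be scripted rather than read off the ls output. This is a minimal sketch that tests for the entries listed above; the helper name check_sdxl_layout is hypothetical, and the expected entries apply to SDXL-style models only.

```shell
# Hypothetical helper: report which of the expected SDXL entries are missing.
# Returns 0 when everything is present, 1 otherwise.
check_sdxl_layout() {
  local dir="$1" missing=0 entry
  for entry in model_index.json unet vae text_encoder text_encoder_2 \
               tokenizer tokenizer_2 scheduler; do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $entry"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_sdxl_layout "$LP_AI_MODELS_DIR/SG161222/RealVisXL_V4.0_Lightning"
```

Any "missing" line suggests a partial download, in which case re-running the Step 3 command is the first thing to try.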

Step 4: Start the orchestrator with the new model

Stop your existing go-livepeer orchestrator and restart with the AI flags:

start-orchestrator.sh

go-livepeer \
  -orchestrator \
  -transcoder \
  -nvidia all \
  -aiWorker \
  -aiModels /path/to/aiModels.json \
  -aiModelsDir "$LP_AI_MODELS_DIR" \
  -ethUrl <your-arbitrum-rpc> \
  -serviceAddr <your-public-host>:<port> \
  -pricePerUnit 0

The relevant flags are defined in livepeer/go-livepeer/cmd/livepeer/livepeer.go.

At startup, go-livepeer:

Parses aiModels.json
For each entry with warm: true, looks up the runner image from the pipeline-to-image map in livepeer/go-livepeer/ai/worker/docker.go, pulls it if absent, and starts a container
Mounts $LP_AI_MODELS_DIR into the container at /models
Waits for the runner’s /health endpoint to report ready
Begins advertising the pipeline plus model plus price as a capability

Watch the logs. A successful warm load looks like a runner-container start, a model-load log line, and a “capability advertised” or equivalent message. Source for the runner’s health and readiness contract: livepeer/ai-worker/runner/src/runner/main.py (FastAPI app definition).
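Instead of watching the logs by eye, the readiness wait can be approximated with a small polling loop. This is a sketch only: retry_until is a hypothetical helper, <runner-port> must be replaced with the port printed in your startup logs, and treating any 2xx response from /health as "ready" is an assumption about the runner's contract.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds
# or the number of attempts is exhausted.
retry_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}

# Example: poll every 2 seconds, up to 30 times.
# curl -f makes HTTP errors (4xx/5xx) count as failures.
#   retry_until 30 2 curl -fsS "http://localhost:<runner-port>/health"
```

A warm load of a large model can take a while, so a generous attempt count is safer than a short one.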

Step 5: Verify on the network capabilities tool

Open tools.livepeer.cloud/ai/network-capabilities in a browser. This dashboard reads live capability advertisements from active orchestrators on the network. Find your orchestrator address. You should see:

the text-to-image pipeline listed under your orchestrator
SG161222/RealVisXL_V4.0_Lightning listed under that pipeline
a warm indicator, if the dashboard surfaces it

If your orchestrator is not in the list, the model is not visible to the network. The three usual causes:

Orchestrator not in the active set

Confirm on explorer.livepeer.org that your address shows as active. Capability advertisement requires on-chain registration with sufficient stake.

Runner container failed to start

Check docker ps -a for an exited container, then docker logs <container-id> for the failure reason. The most common is CUDA out-of-memory at warm load.

aiModels.json did not parse

go-livepeer was started without -aiWorker, or aiModels.json did not parse. Check the orchestrator startup logs for parse errors.

Resolve any of these before continuing.

Step 6: Send a test inference request

Two paths verify the model end-to-end without touching Studio or Daydream. Use both in order: localhost first, gateway second.

Step 6a: Hit the runner directly on localhost

The runner is a FastAPI service. Source: livepeer/ai-worker/runner/src/runner/main.py. The orchestrator runs it on a port internal to the host (printed in startup logs as the AI worker port).

runner-direct.sh

curl -X POST http://localhost:<runner-port>/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "prompt": "a quiet harbour at dawn, photo realistic",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 4,
    "guidance_scale": 2.0
  }' \
  --output result.json

The four-step inference and low guidance scale follow the SDXL Lightning recommendations on the model card at huggingface.co/SG161222/RealVisXL_V4.0_Lightning. A successful response is a JSON object with an images array. Each image is base64-encoded or referenced by URL depending on runner version. Decode and inspect the output:

inspect-output.sh

jq -r '.images[0].url // .images[0]' result.json | head -c 200
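To look at the picture itself rather than the first 200 characters, the payload can be decoded to a file. This sketch assumes the runner returned base64 data, optionally wrapped in a data: URL prefix, and that base64 accepts -d (as GNU coreutils does); decode_data_url is a hypothetical helper. If your runner version returns an http(s) URL instead, fetch that URL with curl rather than decoding.

```shell
# Hypothetical helper: read a base64 payload on stdin, strip an optional
# "data:<mime>;base64," prefix, and write the decoded bytes to stdout.
decode_data_url() {
  sed 's/^data:[^,]*,//' | base64 -d
}

# Example:
#   jq -r '.images[0].url // .images[0]' result.json | decode_data_url > result.png
```

Opening result.png in an image viewer then gives a direct visual check of the output.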

This step confirms the model is loaded and inference works. It does not confirm that the model is reachable through the Livepeer network. That is Step 6b.

Step 6b: Self-hosted gateway test

go-livepeer runs as a gateway when started with -gateway. On a separate process or machine:

start-gateway.sh

go-livepeer \
  -gateway \
  -httpAddr 0.0.0.0:8935 \
  -orchAddr <your-orch-host>:<port> \
  -ethUrl <your-arbitrum-rpc>

The -orchAddr flag pins discovery to your own orchestrator, removing the variability of network-wide selection. This is what makes the test deterministic: the gateway can only route to your node. Then send the inference request to the gateway:

gateway-request.sh

curl -X POST http://localhost:8935/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "prompt": "a quiet harbour at dawn, photo realistic",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 4,
    "guidance_scale": 2.0
  }' \
  --output gateway-result.json

The gateway handles discovery, capability matching, ticket-based payment, and routing to the orchestrator. The response includes the inference output and a settlement record for the probabilistic micropayment ticket. A successful response means your model is reachable across the protocol layer through your own infrastructure.
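A quick way to tell a real result from an error body is to test the saved file for a non-empty images array. This is a minimal sketch assuming the gateway response has the same images array as the direct runner response in Step 6a; has_images is a hypothetical helper.

```shell
# Hypothetical check: succeed only if the response file contains
# an "images" field that is a non-empty array.
has_images() {
  jq -e '(.images | type == "array") and (.images | length > 0)' "$1" > /dev/null
}

# Example:
#   has_images gateway-result.json && echo "gateway returned an inference result"
```

If the check fails, inspecting gateway-result.json directly usually shows the error the gateway returned.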

The Livepeer Cloud Community Gateway is a free public gateway maintained by the Cloud SPE (Titan Node). Sending a request to it tests routing from outside your own infrastructure. The downside is non-determinism: it selects an orchestrator from the active set and may not select yours. Use it only as a cross-check after Step 6b succeeds, never as the primary verification.

Step 7: Confirm the loop is closed

The tutorial is complete when all four are observable:

aiModels.json declares the model and go-livepeer parsed it cleanly at startup (orchestrator logs)
The runner container is running and the model is loaded into VRAM (docker ps, nvidia-smi)
The orchestrator advertises the model on tools.livepeer.cloud/ai/network-capabilities
A request through your self-hosted gateway returns a successful inference result

If any one of these is missing, the model is not yet on the network. Resolve before relying on the path for paid traffic.
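Three of the four checks can be run from a terminal in one pass; the capabilities dashboard remains a manual check. The sketch below is illustrative only: check is a hypothetical helper, and the grep patterns in the examples are guesses that may need adjusting for your image tag and GPU setup.

```shell
# Hypothetical helper: run one command quietly and print PASS or FAIL
# next to a human-readable label.
check() {
  local label="$1"
  shift
  if "$@" > /dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}

# Examples:
#   check "runner container running" sh -c 'docker ps --format "{{.Image}}" | grep -q ai-runner'
#   check "GPU memory in use" sh -c 'nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | grep -qv "^0$"'
#   check "gateway result has images" jq -e '.images | length > 0' gateway-result.json
```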

Operational notes

Pricing

Setting the effective price per pixel (price_per_unit divided by pixels_per_unit) well above the network median means your orchestrator is likely to receive few or no jobs. Gateway selection in go-livepeer filters by price competitiveness. Compare against the rates visible on the network capabilities dashboard before going live.
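To make the numbers concrete, the cost of a single request can be worked out from the aiModels.json values. This is a back-of-the-envelope sketch that assumes billing on output pixels (width times height); image_cost_wei is a hypothetical helper, and the exact accounting in go-livepeer may differ.

```shell
# Hypothetical helper: cost in wei of one image, assuming the request is
# billed on output pixels. Uses the shell's 64-bit integer arithmetic.
image_cost_wei() {
  local price_per_unit="$1" pixels_per_unit="$2" width="$3" height="$4"
  # Multiply before dividing so integer division does not discard precision early.
  echo $(( width * height * price_per_unit / pixels_per_unit ))
}

image_cost_wei 4768371 1 1024 1024   # prints 4999999389696
```

With the example entry from Step 2, a 1024 by 1024 image therefore comes to roughly 5 x 10^12 wei, or about 0.000005 ETH.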

Warm and cold trade-off

warm: true holds the model in VRAM continuously. SDXL-class models occupy roughly 12 GB. On a 24 GB card you can warm one SDXL model plus, perhaps, a smaller pipeline such as image-to-text (4 GB floor per Salesforce/blip-image-captioning-large), but not two SDXL variants. Cold models (warm: false) are not resident in VRAM at startup; they are swapped in when the first request arrives, so several can share one card. Price them lower because the cold-start latency makes them less attractive to gateways.
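The VRAM budget above can be checked with simple arithmetic. In this sketch the model sizes are the rough figures quoted in this section, the 2 GB headroom for CUDA context and activations is an assumption, and vram_fits is a hypothetical helper.

```shell
# Hypothetical helper: do the listed model sizes (in GB) fit on a card of
# total_gb while leaving headroom_gb free? All figures are rough estimates.
vram_fits() {
  local total_gb="$1" headroom_gb="$2" sum=0 size
  shift 2
  for size in "$@"; do
    sum=$(( sum + size ))
  done
  [ "$sum" -le $(( total_gb - headroom_gb )) ]
}

vram_fits 24 2 12 4  && echo "SDXL + image-to-text: fits"
vram_fits 24 2 12 12 || echo "two SDXL variants: does not fit"
```

For measured rather than estimated figures, nvidia-smi reports actual usage once a model is loaded.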

Same flow, different model

Replace the model_id in aiModels.json and the MODEL_ID in the download command with your chosen model. The pipeline name stays the same as long as the model fits the same I/O shape. For example, swapping SG161222/RealVisXL_V4.0_Lightning for ByteDance/SDXL-Lightning (also a text-to-image model) requires no other changes.
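The swap in aiModels.json can be done with jq instead of hand-editing. This is a minimal sketch that assumes the model you want to change is the first entry in the array; swap_model_id is a hypothetical helper.

```shell
# Hypothetical helper: print the contents of an aiModels.json file with the
# first entry's model_id replaced. The input file is not modified.
swap_model_id() {
  local file="$1" new_id="$2"
  jq --arg id "$new_id" '.[0].model_id = $id' "$file"
}

# Example (write to a new file and review it before replacing the original):
#   swap_model_id aiModels.json "ByteDance/SDXL-Lightning" > aiModels.new.json
```

The weights for the new model still have to be downloaded as in Step 3 before the orchestrator restarts.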

LLM variant via Ollama

LLMs follow the same overall flow but use a different runner image. The Cloud SPE maintains tztcloud/livepeer-ollama-runner, which wraps Ollama for OpenAI-compatible completions. The aiModels.json entry for an LLM:

aiModels-llm.json

{
  "pipeline": "llm",
  "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "price_per_unit": 1,
  "pixels_per_unit": 1000000,
  "currency": "wei",
  "warm": true
}

The model identifier is the Hugging Face repo for documentation purposes; the actual model pull happens through Ollama’s tagging system (ollama pull llama3.1:8b) inside the Ollama runner container. The mapping between HF identifier and Ollama tag for each LLM is the only piece that does not generalise from the standard runner. Reference: the Ollama tag library at ollama.com/library. Otherwise the pattern is identical: declare in aiModels.json, ensure the runner image is available, restart go-livepeer, verify on the capabilities tool, test through your self-hosted gateway with an OpenAI-compatible chat completion request.

Troubleshooting

Runner container exits immediately

Run docker logs <container-id>. Three common causes: model files missing or partial (re-run Step 3); CUDA out-of-memory at load (insufficient VRAM, downgrade to warm: false or pick a smaller variant); image pull failed (check Docker Hub connectivity).

Orchestrator absent from capabilities tool but runner loaded

Check explorer.livepeer.org that your orchestrator is in the active set. Capability advertisement requires on-chain registration with sufficient stake.

Localhost works but gateway fails

Confirm that the address passed to -serviceAddr is reachable from outside your network. Open the relevant port at the firewall, confirm DNS, and confirm the orchestrator is binding to a public interface instead of localhost.

Inference returns low-quality output

Check that you are using the SDXL Lightning recommended sampling (4 steps, low guidance). Different SDXL fine-tunes have different recommended schedulers and step counts. Consult the model card.

Sources

Every claim in this tutorial is grounded in one of the following readable references:

Livepeer source

github.com/livepeer/ai-worker – runner architecture, pipeline implementations, dl_checkpoints.sh
livepeer/ai-worker/runner/src/runner/pipelines – supported pipeline list and their I/O shapes
livepeer/ai-worker/runner/dl_checkpoints.sh – model download script, environment variables, HF integration
livepeer/ai-worker/runner/src/runner/main.py – FastAPI app, /health endpoint, port binding
github.com/livepeer/go-livepeer – orchestrator, gateway, AI worker mode
livepeer/go-livepeer/cmd/livepeer/livepeer.go – flag definitions for -aiWorker, -aiModels, -aiModelsDir, -gateway, -orchAddr, -serviceAddr
livepeer/go-livepeer/ai/worker/docker.go – pipeline-to-image map keyed on canonical pipeline name strings

External source

huggingface_hub – snapshot_download semantics
huggingface.co/SG161222/RealVisXL_V4.0_Lightning – model card, recommended sampling
hub.docker.com/r/livepeer/ai-runner – runner image, tags
hub.docker.com/r/tztcloud/livepeer-ollama-runner – Ollama-based LLM runner
tools.livepeer.cloud/ai/network-capabilities – live capability dashboard
explorer.livepeer.org – orchestrator active-set status

Your model is now running on the Livepeer network, advertised to gateways, and callable through your self-hosted gateway. For custom architectures that do not fit a native pipeline, see the advanced paths.

AI agent prompt

Complete the "Add a Hugging Face Model to Livepeer" tutorial for a model that fits an existing Livepeer AI pipeline. Use placeholders for MODEL_ID=<huggingface org/repo>, PIPELINE=<canonical pipeline name>, LP_AI_MODELS_DIR=/data/livepeer-ai-models, ORCH_SERVICE_ADDR=<orchestrator service address>, ORCH_ETH_ADDR=<orchestrator ETH address>, GATEWAY_PORT=8935, and ORCH_ADDR=<orchestrator address>. Clone livepeer/ai-worker only for the checkpoint script, use livepeer/ai-runner images, write aiModels.json, pre-download weights, start go-livepeer with -aiWorker -aiModels -aiModelsDir, verify the runner container and tools.livepeer.cloud capability listing, then start a self-hosted go-livepeer -gateway pinned to the orchestrator and send a test inference request. Do not use Studio or Daydream.

Related pages

Advanced HuggingFace paths

Three structurally different paths: existing pipeline, custom pipeline, BYOC.

Full AI Pipeline Tutorial

Local end-to-end pipeline: gateway routes inference to orchestrator and the result returns through the full pipeline.

Realtime AI Tutorial

Live video-to-video pipeline: continuous WebRTC stream in, transformed stream out.

ComfyStream Quickstart

Stand up a ComfyStream pipeline for real-time AI workloads.
