Livepeer’s AI inference layer is implemented as a set of pipeline runners (livepeer/ai-worker) coordinated by
the orchestrator process (livepeer/go-livepeer). Where your model fits in that layer determines which path
you take.
What you will verify (whichever path you take):
- Your model loads cleanly inside its runner or container
- Your orchestrator advertises the capability on tools.livepeer.cloud/ai/network-capabilities
- A request through your self-hosted gateway returns a successful inference result
Path summary
Decision flow
Built-in pipeline shapes (Question 1)
The built-in pipelines, readable from livepeer/ai-worker/runner/src/runner/pipelines/:
If your model is, say, an SDXL fine-tune, a BLIP variant, or a Whisper variant, the answer is yes. The
pipeline already handles your I/O. Take Path 1.
If your model is a diffusion model that needs custom preprocessing the built-in pipeline does not do (a novel
ControlNet, a non-standard scheduler, multi-stage inference), the answer is no. The I/O looks similar but the
runtime behaviour does not fit. Continue to Question 2.
If your model is something else entirely (a protein folder, an audio classifier, a multi-modal model with
three inputs), the answer is no. Continue to Question 2.
Custom pipeline scope (Question 2)
Custom pipelines extend the Pipeline interface
for real-time pipelines or the equivalent batch base class for batch pipelines. The package is a normal
Python project managed with uv, shipped as a Docker image extending
livepeer/ai-runner:live-base.
Two upstream PRs are required today, because the pipeline registry is not yet dynamic:
- one to livepeer/ai-worker/runner/dl_checkpoints.sh, adding your pipeline to the model preparation switch
- one to livepeer/go-livepeer/ai/worker/docker.go, adding your pipeline name to the livePipelineToImage map so the orchestrator knows which container to launch
BYOC container (Question 3)
The BYOC contract, defined in livepeer/go-livepeer and visible in the orchestrator’s external-capability
handling code, requires:
- a /health endpoint that returns 200 when the container is ready
- one or more job-handling endpoints whose protocol is whatever you publish to gateways using your capability
- a stable container image and version
BYOC does not require modifying livepeer/ai-worker or livepeer/go-livepeer.
The trade-off: BYOC requires gateway-side implementation work for any gateway operator who wants to call your
capability. You are shipping both a model and a small protocol that gateways must adopt.
Take Path 3 if your model needs a non-standard protocol and you are willing to coordinate with gateway
operators (or run your own gateway) to drive adoption.
Shared prerequisites
These are identical regardless of path. They are prerequisites, not part of any individual path.
Path differences at a glance
Path 1: Configure an existing pipeline
By the end of Path 1, a Hugging Face model conforming to one of the built-in pipeline shapes is running on your Livepeer orchestrator, advertised to the network, and callable through your self-hosted gateway. The example is SG161222/RealVisXL_V4.0_Lightning on the text-to-image pipeline.
You are not writing code. You are declaring the model, pre-downloading weights, and restarting the
orchestrator with the AI flags. The runner does the rest.
Step 1: Pick the model directory
export-model-dir.sh
/models.
Step 2: Write aiModels.json
aiModels.json
- pipeline: the canonical pipeline name (hyphenated form). Source: keys in livePipelineToImage in livepeer/go-livepeer/ai/worker/docker.go.
- model_id: the Hugging Face repository slug, exactly as it appears in huggingface.co/<org>/<repo>. Used by the runner as both the download target and the inference-routing key.
- price_per_unit and pixels_per_unit: together set the rate. For pixel-priced pipelines, the rate is price_per_unit / pixels_per_unit wei per pixel. The wei figure is illustrative; set yours by comparing live rates on tools.livepeer.cloud/ai/network-capabilities.
- currency: "wei" (Arbitrum-native ETH).
- warm: true keeps the model in VRAM continuously, eliminating cold-start latency on the first request. Required to compete on latency.
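As a rough illustration of how these fields fit together, the sketch below builds one entry and computes the per-pixel rate. The price figure is a hypothetical placeholder, not a recommended rate, and the top-level array shape of the file is an assumption to check against the go-livepeer source.

```python
import json


def per_pixel_rate_wei(price_per_unit: int, pixels_per_unit: int) -> float:
    """Rate in wei per pixel: price_per_unit / pixels_per_unit."""
    if pixels_per_unit <= 0:
        raise ValueError("pixels_per_unit must be positive")
    return price_per_unit / pixels_per_unit


# Illustrative entry using the example model from this path.
entry = {
    "pipeline": "text-to-image",
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "price_per_unit": 4768371,  # hypothetical wei figure, for illustration only
    "pixels_per_unit": 1,
    "currency": "wei",
    "warm": True,
}

# Assumption: the config file is a JSON array with one object per model.
config_text = json.dumps([entry], indent=2)
rate = per_pixel_rate_wei(entry["price_per_unit"], entry["pixels_per_unit"])
```

Writing `config_text` to the path you later pass to the orchestrator keeps the declared model and the pricing arithmetic in one reviewable place.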
Step 3: Pre-download weights
The canonical script is livepeer/ai-worker/runner/dl_checkpoints.sh.
It uses huggingface_hub.snapshot_download to fetch model files into $MODEL_DIR/<model_id>/.
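For readers who prefer to see the download step in Python, here is a minimal sketch of the same idea. It is not the canonical script; the helper names are hypothetical, and it assumes the `huggingface_hub` package is installed and that weights belong under `<model_dir>/<model_id>/` as described above.

```python
from pathlib import Path


def target_dir(model_dir: str, model_id: str) -> Path:
    """Directory the weights land in: <model_dir>/<model_id>/."""
    return Path(model_dir) / model_id


def download_weights(model_dir: str, model_id: str) -> Path:
    """Fetch (or resume) a full snapshot of model_id into target_dir."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    dest = target_dir(model_dir, model_id)
    dest.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=model_id, local_dir=str(dest))
    return dest
```

Calling `download_weights` with your model directory and `SG161222/RealVisXL_V4.0_Lightning` would fetch the example model; a second call resumes rather than restarts a partial download.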
download-weights.sh
verify-weights.sh
The model directory should contain model_index.json, unet/, vae/, text_encoder/, text_encoder_2/,
tokenizer/, tokenizer_2/, and scheduler/. If it is empty or partial, re-run the download; huggingface_hub
resumes partial downloads.
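A small check like the following can confirm the download before you start the orchestrator. It is a sketch only: the expected entries are the ones listed above for this SDXL-style example, the function name is hypothetical, and other model families will have a different layout.

```python
from pathlib import Path

# Entries the text lists for an SDXL-style checkpoint.
EXPECTED_ENTRIES = (
    "model_index.json", "unet", "vae", "text_encoder", "text_encoder_2",
    "tokenizer", "tokenizer_2", "scheduler",
)


def missing_entries(weights_dir: str, expected=EXPECTED_ENTRIES) -> list[str]:
    """Return expected entries that are absent, or are empty directories."""
    root = Path(weights_dir)
    missing = []
    for name in expected:
        path = root / name
        if not path.exists():
            missing.append(name)
        elif path.is_dir() and not any(path.iterdir()):
            # An empty directory usually means an interrupted download.
            missing.append(name)
    return missing
```

An empty return value means every listed entry is present and non-empty; anything else is a signal to re-run the download.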
Step 4: Start go-livepeer with the AI flags
start-orchestrator.sh
Flags, as defined in livepeer/go-livepeer/cmd/livepeer/livepeer.go:
- -aiWorker: declares that this node serves AI jobs. Without it, aiModels.json is ignored.
- -aiModels: path to your config file.
- -aiModelsDir: host directory with the weights. Mounts to /models inside the runner.
- -nvidia: GPU index (or all).
go-livepeer parses aiModels.json, pulls the runner image from livePipelineToImage for each
declared pipeline, mounts the models directory, starts the runner container, waits for /health to return
200, and begins advertising the capability.
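The wait-for-health step can be reproduced by hand when debugging a runner that never becomes available. The sketch below is an assumption about how such a poll might look, not go-livepeer's implementation; the function names, attempt count, and interval are all hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if GET url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 30,
                       interval: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

In practice you would pass something like `lambda: http_probe("http://localhost:<port>/health")`, substituting the port the orchestrator prints at startup. Injecting the probe keeps the retry logic testable without a running container.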
Step 5: Verify on the capabilities tool
tools.livepeer.cloud/ai/network-capabilities shows
live capability advertisements from active orchestrators.
Find your orchestrator address and check that text-to-image appears with
SG161222/RealVisXL_V4.0_Lightning under it.
If it does not appear:
Active set check
Confirm your orchestrator is in the active set on explorer.livepeer.org.
Runner container check
Check docker ps -a for an exited runner container, then docker logs <id> for the cause. CUDA
out-of-memory at warm load is the most common.
aiModels.json parse check
Check go-livepeer logs for aiModels.json parse errors.
Step 6: Test through your own gateway
Direct runner test
The runner is a FastAPI app. See livepeer/ai-worker/runner/src/runner/main.py.
It runs on a port the orchestrator prints at startup.
runner-direct.sh
Self-hosted gateway test
Run a second go-livepeer instance as a gateway pinned to your orchestrator:
start-gateway.sh
-orchAddr removes the variability of network-wide selection, so the test is deterministic.
gateway-request.sh
Path 1 done
You have completed Path 1 when:
- The runner container is up and the model is in VRAM (docker ps, nvidia-smi)
- The model appears on tools.livepeer.cloud/ai/network-capabilities under your orchestrator
- A request through your self-hosted gateway returns a successful inference result
To use a different model, change model_id in aiModels.json and MODEL_ID in the download command. The pipeline
name stays the same as long as the new model fits the same I/O shape.
LLM variant. LLM models follow the same flow but use the Cloud SPE-maintained
tztcloud/livepeer-ollama-runner image (Docker Hub). The aiModels.json entry uses
pipeline: "llm" and the Hugging Face model_id is the slug for documentation purposes; the actual model
pull happens through Ollama’s tag system. Reference: Ollama tag library.
Path 2: Build a custom pipeline
By the end of Path 2, you have a Python package implementing the Livepeer AI runner Pipeline interface, a
Docker image built on top of livepeer/ai-runner:live-base, and the upstream PRs prepared against
livepeer/ai-worker and livepeer/go-livepeer. After those PRs merge, your pipeline runs on the network the
same way the built-in pipelines do.
The reference implementation throughout is
daydreamlive/scope-runner. When in doubt, read the
equivalent file in scope-runner.
Step 1: Initialise the project
The Livepeer AI runner uses uv for dependency management. Source: pyproject.toml and uv.lock in
livepeer/ai-worker/runner.
init-project.sh
pyproject.toml with:
pyproject.toml
Pin the ai-runner revision to a tagged release for reproducibility. Use ai-runner[batch] instead of
ai-runner[realtime] for batch (request/response) pipelines.
Project layout:
project-layout
Step 2: Implement the Pipeline interface
The interface lives at livepeer/ai-worker/runner/src/runner/live/pipelines/interface.py.
Parameters
src/my_pipeline/pipeline/params.py
Pipeline class
src/my_pipeline/pipeline/pipeline.py
The scope-runner pipeline.py is the working reference for how to wire frame_queue, asyncio.to_thread, and
warm-load patterns.
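To make those three patterns concrete without reproducing the real interface, here is a toy sketch. `SketchPipeline` and its methods are hypothetical and do not match the Livepeer Pipeline interface; the point is only the shape: load the model once off the event loop, run blocking inference with `asyncio.to_thread`, and hand results over through a queue.

```python
import asyncio


class SketchPipeline:
    """Hypothetical pipeline; not the Livepeer Pipeline interface."""

    def __init__(self) -> None:
        self.model = None
        self.frame_queue: asyncio.Queue = asyncio.Queue(maxsize=8)

    def _load_model(self):
        # Stand-in for a slow, blocking model load (e.g. weights into VRAM).
        return lambda frame: frame.upper()

    async def warm(self) -> None:
        # Warm load: done once, off the event loop, before any frame arrives.
        self.model = await asyncio.to_thread(self._load_model)

    async def put_frame(self, frame: str) -> None:
        if self.model is None:
            raise RuntimeError("call warm() before sending frames")
        # Blocking inference runs in a worker thread; the loop stays responsive.
        result = await asyncio.to_thread(self.model, frame)
        await self.frame_queue.put(result)

    async def get_frame(self) -> str:
        return await self.frame_queue.get()


async def demo() -> list:
    pipeline = SketchPipeline()
    await pipeline.warm()
    for frame in ("a", "b"):
        await pipeline.put_frame(frame)
    return [await pipeline.get_frame() for _ in range(2)]
```

Running `asyncio.run(demo())` exercises the whole path. In a real pipeline the strings would be video frames and the lambda a model call, but the division of work between the event loop and worker threads would look similar.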
Keep __init__.py files minimal. Do not export Pipeline or Params from __init__.py. The runner
loader imports them by full path (module.path:ClassName); re-exporting triggers expensive imports (torch,
transformers) when only the params class is needed.
Step 3: Application entrypoint
src/my_pipeline/main.py
The name field is the wire identifier. It must match the entry you add to livePipelineToImage in Step 6.
Step 4: Dockerfile
Dockerfile
HF_HUB_OFFLINE=1 blocks Hugging Face Hub access at runtime. Weights must be present from the prepare step.
dl_checkpoints.sh overrides this during model preparation.
Step 5: Test locally
Build:
build-image.sh
prepare-models.sh
run-pipeline.sh
Check /health:
check-health.sh
This is the endpoint go-livepeer polls before declaring the capability available.
Step 6: Upstream integration PRs
Two PRs are required because the pipeline registry is not yet dynamic.
PR 1: livepeer/ai-worker
Edit runner/dl_checkpoints.sh.
Add your image variable near the top:
dl_checkpoints.sh (additions)
dl_checkpoints.sh (case branch)
PR 2: livepeer/go-livepeer
Edit ai/worker/docker.go and add
your pipeline name to livePipelineToImage:
ai/worker/docker.go
"my-pipeline" must match the name in your PipelineSpec and the value an orchestrator places
in aiModels.json.
Step 7: Configure your orchestrator (after PRs merge)
Once both PRs merge and a new go-livepeer release is built, declare the pipeline in aiModels.json:
aiModels.json
Start go-livepeer with the AI flags as in Path 1, Step 4. From here the verification flow is identical to
Path 1: check tools.livepeer.cloud/ai/network-capabilities, then test through a self-hosted
go-livepeer -gateway.
Path 2 done
You have completed the local part of Path 2 when:
- docker build produces an image
- PREPARE_MODELS=1 populates the models directory with the expected weights
- The container starts, /health returns 200, and your pipeline endpoints respond
- Both upstream PRs are filed with reproducible test instructions
Path 3: Bring Your Own Container
By the end of Path 3, your Hugging Face model is wrapped in a container of your design, registered as a BYOC external capability on your Livepeer orchestrator, and reachable through a gateway that has implemented the matching client side.
BYOC is the path that does not require modifying livepeer/ai-worker or livepeer/go-livepeer. The
trade-off is that gateways must implement your capability’s protocol on their side. You are coordinating with
gateway operators or running your own gateway.
BYOC Fit Criteria
BYOC fits if at least one of the following is true:
- your model needs a non-FastAPI protocol (gRPC, WebSocket-only, custom binary)
- your model is part of a larger application stack you want to ship as a single container
- your inference shape does not fit any built-in pipeline AND you do not want to maintain a Python package against the ai-runner interface
- you are already running an inference service in production and want to expose it through Livepeer rather than re-implement it
The BYOC contract
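The orchestrator’s BYOC integration requires the items listed under BYOC container (Question 3): a /health
endpoint, one or more job-handling endpoints, and a stable container image and version. The orchestrator does
not care what runs inside the container as long as /health and the job endpoints behave.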
Step 1: Wrap your model in a container
The minimum viable wrapper is your model behind any HTTP server. A FastAPI example:
server.py
Dockerfile
build-byoc.sh
test-byoc-local.sh
If /health returns 200 only after the model has loaded, and /infer returns sensible output, the container
itself is sound.
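The readiness rule is the part most often gotten wrong, so here is a minimal sketch of it using only the Python standard library rather than FastAPI. Everything in it is an assumption for illustration: the `ModelState` class, the JSON body with a `prompt` field, the 503 status while loading, and the string-reversing stand-in model.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ModelState:
    """Tracks whether the (hypothetical) model has finished loading."""

    def __init__(self) -> None:
        self.ready = threading.Event()
        self.model = None

    def load(self) -> None:
        # Stand-in for loading real weights; replace with your model.
        self.model = lambda text: text[::-1]
        self.ready.set()


def route(state: ModelState, method: str, path: str, body: bytes):
    """Pure routing logic: returns (status_code, json_payload)."""
    if method == "GET" and path == "/health":
        if state.ready.is_set():
            return 200, {"status": "ok"}
        return 503, {"status": "loading"}
    if method == "POST" and path == "/infer":
        if not state.ready.is_set():
            return 503, {"error": "model not loaded"}
        try:
            prompt = json.loads(body)["prompt"]
        except (ValueError, KeyError, TypeError):
            prompt = None
        if not isinstance(prompt, str):
            return 400, {"error": "expected JSON body with a string 'prompt'"}
        return 200, {"output": state.model(prompt)}
    return 404, {"error": "not found"}


def make_handler(state: ModelState):
    class Handler(BaseHTTPRequestHandler):
        def _reply(self, method: str) -> None:
            length = int(self.headers.get("Content-Length") or 0)
            status, payload = route(state, method, self.path, self.rfile.read(length))
            data = json.dumps(payload).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def do_GET(self) -> None:
            self._reply("GET")

        def do_POST(self) -> None:
            self._reply("POST")

    return Handler


def serve(port: int = 8000) -> None:
    state = ModelState()
    # Load in the background so /health can answer 503 while loading.
    threading.Thread(target=state.load, daemon=True).start()
    HTTPServer(("0.0.0.0", port), make_handler(state)).serve_forever()
```

Calling `serve()` as the container entrypoint starts the server on port 8000. Keeping `route` free of I/O means the health-gating behaviour can be checked in a unit test before any image is built.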
Step 2: Run the container alongside your orchestrator
The orchestrator launches an external capability container or connects to an already-running one (depending on your BYOC configuration). The container must be on the same host (or a private network reachable from the orchestrator host) and addressable by hostname or IP. A docker-compose example for orchestrator-side hosting:
docker-compose.yml
Step 3: Register the capability with go-livepeer
Configure go-livepeer with the external capability. The exact flag and config-file shape is documented
inline in livepeer/go-livepeer. Search the repository for ExternalCapability and the BYOC capability
registration in the orchestrator startup path. The configuration declares:
- the capability name (your wire identifier)
- the URL where the orchestrator reaches your container (typically http://localhost:8000 for same-host setups)
- the price (currency, units, rate)
- the URL fragment or path for /health
Start go-livepeer with the BYOC flags. The orchestrator polls your container’s /health, and once it
returns 200, advertises the capability.
Step 4: Verify the capability is advertised
tools.livepeer.cloud/ai/network-capabilities shows
external capabilities alongside built-in pipelines for active orchestrators. Find your orchestrator and
confirm the capability name appears.
If it does not:
Active set status
Confirm orchestrator active-set status on explorer.livepeer.org.
Health from orchestrator host
Confirm /health returns 200 from the orchestrator’s perspective: curl http://<container-host>:8000/health
from the orchestrator host.
Capability registration logs
Check go-livepeer startup logs for capability registration messages and errors.
Step 5: Test through a self-hosted gateway
This is the step where BYOC differs most from Paths 1 and 2. The gateway must know how to call your capability. There is no built-in gateway behaviour for unknown capabilities.
Run a self-hosted gateway
start-gateway.sh
Implement the BYOC client
The gateway-side BYOC client is currently the active development surface. Reference: the SDK work at
j0sh/livepeer-python-gateway and the BYOC support PR at livepeer/go-livepeer#3866.
For initial verification, the simplest gateway-side test is to use go-livepeer’s BYOC API directly,
bypassing custom SDK selection logic. Send a job through the gateway’s BYOC endpoint, naming your capability
and supplying the request body your container expects.
A working request through the gateway means:
- the gateway discovered your orchestrator’s capability advertisement
- the gateway negotiated a payment ticket with your orchestrator
- the orchestrator routed the job to your container
- your container produced a response
- the response made it back through the gateway to the caller
Path 3 done
You have completed Path 3 when:
- The container starts cleanly with NVIDIA GPU access and /health only returns 200 after model load
- go-livepeer advertises the capability and it appears on the network capabilities tool
- A request through your self-hosted gateway, addressed to your capability name, returns the expected output from your container
Operational notes
Discovery and selection (BYOC)
BYOC currently uses “first response wins” selection at the gateway. The start-stream request can include
an allowlist or blocklist of orchestrators.
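The selection behaviour described here can be pictured with a small sketch. It is not the gateway's code: the function names are hypothetical, and the filter semantics (an allowlist restricts the candidate set, a blocklist then removes entries) are an assumption about how the two lists combine.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional


def filter_orchestrators(candidates: Iterable[str],
                         allowlist: Optional[set] = None,
                         blocklist: Optional[set] = None) -> list:
    """Assumed semantics: allowlist restricts, blocklist then excludes."""
    kept = []
    for addr in candidates:
        if allowlist is not None and addr not in allowlist:
            continue
        if blocklist and addr in blocklist:
            continue
        kept.append(addr)
    return kept


async def first_response(candidates: list,
                         ask: Callable[[str], Awaitable[str]]) -> Optional[str]:
    """Send the same request to every candidate; keep the first success."""
    tasks = [asyncio.ensure_future(ask(addr)) for addr in candidates]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception:
                continue  # a failed orchestrator does not end the race
        return None
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
```

One consequence of racing like this is that a fast, lightly loaded orchestrator tends to win regardless of other qualities, which is worth keeping in mind when you read your own job volume.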
Reach (BYOC)
Your capability is callable only by gateways that have implemented your client-side protocol. Until other
gateway operators adopt your capability, you are running both ends – orchestrator and
gateway – yourself. This is normal for BYOC during bootstrap.
Iterating on the protocol (BYOC)
You control the request and response schemas. Version them explicitly (path-prefix /v1/infer, etc.) so
changes do not silently break gateway clients.
Pricing (all paths)
Setting price-per-pixel above the network median means your orchestrator receives no jobs. Compare
against the rates visible on the network capabilities dashboard before going live.
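A quick way to do that comparison, once you have noted a handful of rates by hand, is sketched below. The function name is hypothetical and the rates in any call are whatever you read off the dashboard; nothing here fetches data from the network.

```python
from statistics import median


def compare_to_median(my_rate: float, observed_rates: list) -> dict:
    """Compare a wei-per-pixel rate with the median of observed rates."""
    if not observed_rates:
        raise ValueError("need at least one observed rate")
    mid = median(observed_rates)
    return {"median": mid, "above_median": my_rate > mid}
```

Use the per-pixel rate (price_per_unit divided by pixels_per_unit) on both sides of the comparison so that entries with different unit sizes are comparable.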
Warm and cold trade-off (Paths 1 and 2)
warm: true holds the model in VRAM continuously. SDXL-class models occupy roughly 12 GB; on a 24 GB
card you can warm one SDXL plus a smaller pipeline. Cold models share VRAM via swap on first request;
price them lower because the cold-start latency makes them less attractive to gateways.
Scope exclusions
- Studio. Not used in any verification step. All inference verification runs through a self-hosted go-livepeer -gateway.
- Daydream. Not referenced as a runtime, a verification surface, or a recommended gateway. The custom-pipeline reference repo (daydreamlive/scope-runner) is cited as a code example, not as a runtime path the reader uses.
- VRAM thresholds without a source. Where a VRAM figure appears, it is grounded in the model card or the model architecture. Vague “minimum VRAM” claims that did not have a source were left out.
- Pricing recommendations. No specific wei value is recommended as competitive. The reader is sent to the live capabilities dashboard to compare. The wei figures shown in JSON examples are illustrative.
Sources
Path 1 sources
- livepeer/ai-worker – runner architecture, pipelines, dl_checkpoints.sh
- livepeer/go-livepeer – orchestrator, gateway, AI worker flags
- livepeer/go-livepeer/ai/worker/docker.go – pipeline-to-image map
- livepeer/ai-worker/runner/src/runner/main.py – FastAPI app
- huggingface.co/SG161222/RealVisXL_V4.0_Lightning – model card, sampling params
Path 2 sources
- livepeer/ai-worker/runner/src/runner/live/pipelines/interface.py – Pipeline interface
- livepeer/ai-worker/runner/dl_checkpoints.sh – model preparation switch
- livepeer/go-livepeer/ai/worker/docker.go – pipeline-to-image map
- daydreamlive/scope-runner – reference custom pipeline implementation
- huggingface_hub – snapshot_download for model preparation
Path 3 sources
- livepeer/go-livepeer – orchestrator, gateway, external capability handling
- livepeer/go-livepeer#3866 – BYOC gateway support PR
- j0sh/livepeer-python-gateway – Python gateway SDK including BYOC client work
- nvidia/cuda Docker images – base layer for GPU containers
Common references
- tools.livepeer.cloud/ai/network-capabilities – capability dashboard
- explorer.livepeer.org – active-set status
- hub.docker.com/r/livepeer/ai-runner – runner image, tags
- hub.docker.com/r/tztcloud/livepeer-ollama-runner – Ollama-based LLM runner
Related pages
HuggingFace basic path
The single canonical Path 1 walkthrough on its own page, without the multi-path scaffolding.
Full AI Pipeline Tutorial
Local end-to-end pipeline: gateway routes inference to orchestrator and the result returns through the full pipeline.
Realtime AI Tutorial
Live video-to-video pipeline: continuous WebRTC stream in, transformed stream out.
BYOC CPU Tutorial
BYOC end-to-end on CPU: a focused BYOC walkthrough from the orchestrator side.