Model hosting covers how AI models reach your GPU: where they come from, how they download, where they are stored, and how to verify they are loaded and serving correctly. Warm/cold strategy and runtime model selection are covered in AI Model Management.
Model sources
HuggingFace (primary)
The primary source for all standard Livepeer AI pipelines is HuggingFace. The model_id field in aiModels.json is a HuggingFace model identifier in the format organisation/model-name.
Examples:
- SG161222/RealVisXL_V4.0_Lightning: text-to-image diffusion model
- openai/whisper-large-v3: audio-to-text transcription model
- Salesforce/blip-image-captioning-large: image-to-text vision model
- meta-llama/Meta-Llama-3.1-8B-Instruct: LLM (served via Ollama runner)
model_id is case-sensitive, including the organisation prefix. A typo here causes the container to fail at model load time; the only indication is an error in the container logs, with no user-facing warning.
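As an illustration, a minimal aiModels.json entry for a HuggingFace-hosted pipeline might look like the sketch below. The pipeline and model_id values are examples; check your pipeline's setup page for the fields it requires.

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "warm": true
  }
]
```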
External containers (BYOC)
The url field in an aiModels.json entry points to an external container that handles inference independently of the standard livepeer/ai-runner. The AI worker passes jobs to the external container and polls its /health endpoint at startup.
External container entry
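A sketch of what such an entry might look like; the pipeline, model_id, and url values here are placeholders for your own container's details:

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "my-org/my-custom-model",
    "url": "http://localhost:9000"
  }
]
```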
Common external container setups include:
- Ollama runner for LLM inference (see LLM Pipeline Setup)
- Custom PyTorch, TensorRT, or ONNX inference servers
- GPU clusters or auto-scaling stacks behind a load balancer
- Fine-tuned or proprietary model checkpoints outside HuggingFace
The external container must expose a /health endpoint that returns HTTP 200, and the model must be loaded inside the container before the AI worker starts. A failed health check at startup causes the entry to be skipped.
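You can check the endpoint manually before registering the container. The port below is an example; substitute wherever your container listens:

```shell
# Expect "200" if the container is ready to serve
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9000/health
```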
Download mechanics
Automatic download on first start
For standard pipelines, the livepeer/ai-runner container downloads model weights from HuggingFace automatically on first use. The download triggers when:
- The container starts with a cold model configured (no "warm": true) and a job arrives for that model
- The container starts with "warm": true set; the download happens immediately at container startup
Manual pre-download
Pre-download model weights before the container starts to avoid per-request download latency:
Pre-download a model into the go-livepeer model directory
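One way to do this is with the huggingface-cli tool from the huggingface_hub package; this sketch assumes the runner reads the standard HuggingFace cache layout from the default model directory, and the model and path are examples:

```shell
# Install the CLI if needed: pip install -U "huggingface_hub[cli]"
huggingface-cli download SG161222/RealVisXL_V4.0_Lightning \
  --cache-dir ~/.lpData/models
```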
Pre-downloading matters most for:
- Large models (5 GB+) where per-request download creates unacceptable first-request latency
- Environments with unreliable internet connectivity during inference
- Production deployments where startup time predictability matters
Storage location
Models are stored in the directory specified by the -aiModelsDir flag. Default location:
Default aiModelsDir location
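Assuming go-livepeer's usual ~/.lpData data directory, the default resolves to:

```shell
~/.lpData/models
```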
Override the default by passing the -aiModelsDir flag at startup:
Override aiModelsDir on startup
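A sketch of such an invocation; the companion flags depend on your node role, and the orchestrator/worker flags and paths here are illustrative:

```shell
livepeer -orchestrator -aiWorker \
  -aiModels ~/.lpData/aiModels.json \
  -aiModelsDir /mnt/fast-ssd/models
```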
When using Docker-out-of-Docker, the -aiModelsDir path must refer to a path on the host machine: Docker uses that path to mount model files into the spawned AI Runner containers, so only a host path keeps the mount target resolvable.
Gated model access
Some HuggingFace models require authentication before download. These are called gated models: the model creator requires you to accept usage terms through a HuggingFace account before granting access.
Getting access
- Create a HuggingFace account at huggingface.co
- Navigate to the model page (e.g. meta-llama/Meta-Llama-3.1-8B-Instruct)
- Accept the model's usage terms when prompted
- Generate an access token at huggingface.co/settings/tokens with at least Read scope
Using the token in aiModels.json
Add the token field to the relevant aiModels.json entry:
Gated model with HuggingFace token
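A sketch of such an entry; the token value is a placeholder and the other fields mirror a standard pipeline entry:

```json
[
  {
    "pipeline": "llm",
    "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "token": "hf_your_token_here"
  }
]
```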
The token field provides the bearer token used to authenticate with HuggingFace during model download.
Livepeer verified model list
In practice, gateways only route the model and pipeline combinations they recognise, price against, and currently request. The live network set is therefore the most useful operational reference: check tools.livepeer.cloud/ai/network-capabilities to see which models are currently active on the network. Configuring a model outside the verified list in aiModels.json is permitted, but gateways will route no traffic to it.
Verifying model load
Container status
List AI runner containers
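The name filter below is an assumption based on the default ai-runner container naming; adjust it if your containers are named differently:

```shell
docker ps --filter "name=ai-runner"
```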
Healthy containers show Up status. A container stuck in a restart loop indicates a model load failure. Check the logs:
Inspect AI runner logs
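Replace the placeholder with a container name from the docker ps output:

```shell
docker logs -f <container-name>
```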
Common failure signatures:
- OOM or CUDA out of memory: the model exceeds available VRAM; reduce the warm model count or switch to a smaller model variant
- Failed to load model: model_id mismatch or a network error during download
- model lookup failed: HuggingFace cannot find the model_id, or gated-model access is missing
Network registration
Verify your pipelines appear registered at tools.livepeer.cloud/ai/network-capabilities. Search by your orchestrator address; each configured pipeline should show its status (Warm or Cold). Registration usually takes 2 to 5 minutes after the AI worker starts. If a pipeline is still missing after 10 minutes, check that:
- The container is running (docker ps)
- The model loaded without errors (docker logs)
- Your orchestrator is reachable and advertising the expected pipeline capability
Related pages
AI Model Management
Warm vs cold strategy, VRAM allocation, model rotation, and optimisation flags.
AI Inference Operations
Full aiModels.json reference including all fields and pipeline configuration.
Diffusion Pipeline Setup
Recommended models, VRAM requirements, and configuration for diffusion pipelines.
LLM Pipeline Setup
Ollama-based LLM runner configuration and model download via Ollama tags.