Model hosting covers how AI models reach your GPU: where they come from, how they download, where they are stored, and how to verify they are loaded and serving correctly. Warm/cold strategy and runtime model selection are covered in AI Model Management.
Model sources
HuggingFace (primary)
The primary source for all standard Livepeer AI pipelines is HuggingFace. The model_id field in aiModels.json is a HuggingFace model identifier in the format organisation/model-name.
Examples:
- SG161222/RealVisXL_V4.0_Lightning: text-to-image diffusion model
- openai/whisper-large-v3: audio-to-text transcription model
- Salesforce/blip-image-captioning-large: image-to-text vision model
- meta-llama/Meta-Llama-3.1-8B-Instruct: LLM (served via Ollama runner)
model_id is case-sensitive, including the organisation prefix. A typo here causes the container to fail at model load time; the only indication is an error in the container logs, with no user-facing warning.
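As an illustration, a minimal aiModels.json entry for a HuggingFace-hosted pipeline might look like the sketch below. The pipeline and model_id values are examples; check your pipeline's setup page for the fields it requires.

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "warm": true
  }
]
```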
External containers (BYOC)
The url field in an aiModels.json entry points to an external container that handles inference independently of the standard livepeer/ai-runner. The AI worker passes jobs to the external container and polls its /health endpoint at startup.
External container entry
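A sketch of what such an entry might look like; the pipeline, model_id, and url values here are placeholders for your own container's details:

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "my-org/my-custom-model",
    "url": "http://localhost:9000"
  }
]
```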
Common external container setups include:
- Ollama runner for LLM inference (see LLM Pipeline Setup)
- Custom PyTorch, TensorRT, or ONNX inference servers
- GPU clusters or auto-scaling stacks behind a load balancer
- Fine-tuned or proprietary model checkpoints outside HuggingFace
The external container must expose a /health endpoint that returns HTTP 200, and the model must be loaded inside the container before the AI worker starts. A failed health check at startup causes the entry to be skipped.
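You can check the endpoint manually before registering the container. The port below is an example; substitute wherever your container listens:

```shell
# Expect "200" if the container is ready to serve
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9000/health
```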
Download mechanics
Automatic download on first start
For standard pipelines, the livepeer/ai-runner container downloads model weights from HuggingFace automatically on first use. The download triggers when:
- The container starts with a cold model configured (no "warm": true) and a job arrives for that model
- The container starts with "warm": true set; the download happens immediately at container startup
Manual pre-download
Pre-download model weights before the container starts to avoid per-request download latency:
Pre-download a model into the go-livepeer model directory
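One way to do this is with the huggingface-cli tool from the huggingface_hub package; this sketch assumes the runner reads the standard HuggingFace cache layout from the default model directory, and the model and path are examples:

```shell
# Install the CLI if needed: pip install -U "huggingface_hub[cli]"
huggingface-cli download SG161222/RealVisXL_V4.0_Lightning \
  --cache-dir ~/.lpData/models
```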
Pre-downloading matters most for:
- Large models (5 GB+) where per-request download creates unacceptable first-request latency
- Environments with unreliable internet connectivity during inference
- Production deployments where startup time predictability matters
Storage location
Models are stored in the directory specified by the -aiModelsDir flag. Default location:
Default aiModelsDir location
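Assuming go-livepeer's usual ~/.lpData data directory, the default resolves to:

```shell
~/.lpData/models
```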
Override the default by passing the -aiModelsDir flag at startup:
Override aiModelsDir on startup
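A sketch of such an invocation; the companion flags depend on your node role, and the orchestrator/worker flags and paths here are illustrative:

```shell
livepeer -orchestrator -aiWorker \
  -aiModels ~/.lpData/aiModels.json \
  -aiModelsDir /mnt/fast-ssd/models
```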
When using Docker-out-of-Docker, the -aiModelsDir path must refer to a path on the host machine: Docker uses that path to mount model files into the spawned AI Runner containers, so only a host path keeps the mount target resolvable.
Gated model access
Some HuggingFace models require authentication before download. These are called gated models: the model creator requires you to accept usage terms through a HuggingFace account before granting access.
Getting access
- Create a HuggingFace account at huggingface.co
- Navigate to the model page (e.g. meta-llama/Meta-Llama-3.1-8B-Instruct)
- Accept the model's usage terms when prompted
- Generate an access token at huggingface.co/settings/tokens with at least Read scope
Using the token in aiModels.json
Add the token field to the relevant aiModels.json entry:
Gated model with HuggingFace token
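A sketch of such an entry; the token value is a placeholder and the other fields mirror a standard pipeline entry:

```json
[
  {
    "pipeline": "llm",
    "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "token": "hf_your_token_here"
  }
]
```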
The token field provides the bearer token used to authenticate with HuggingFace during model download.
Livepeer verified model list
In practice, gateways only route the model and pipeline combinations they recognise, price against, and currently request. The live network set is therefore the most useful operational reference: check tools.livepeer.cloud/ai/network-capabilities to see which models are currently active on the network. Configuring a model outside the verified list in aiModels.json is permitted, but gateways will route no traffic to it.
Verifying model load
Container status
List AI runner containers
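The name filter below is an assumption based on the default ai-runner container naming; adjust it if your containers are named differently:

```shell
docker ps --filter "name=ai-runner"
```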
Healthy containers show Up status. A container stuck in a restart loop indicates a model load failure. Check the logs:
Inspect AI runner logs
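Replace the placeholder with a container name from the docker ps output:

```shell
docker logs -f <container-name>
```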
Common failure signatures:
- OOM or CUDA out of memory: the model exceeds available VRAM; reduce the warm model count or switch to a smaller model variant
- Failed to load model: model_id mismatch or a network error during download
- model lookup failed: HuggingFace cannot find the model_id, or gated-model access is missing
Network registration
Verify your pipelines appear registered at tools.livepeer.cloud/ai/network-capabilities. Search by your orchestrator address; each configured pipeline should show its status (Warm or Cold). Registration usually takes 2 to 5 minutes after the AI worker starts. If a pipeline is still missing after 10 minutes, check that:
- The container is running (docker ps)
- The model loaded without errors (docker logs)
- Your orchestrator is reachable and advertising the expected pipeline capability
Related pages
AI Model Management
Warm vs cold strategy, VRAM allocation, model rotation, and optimisation flags.
AI Inference Operations
Full aiModels.json reference including all fields and pipeline configuration.
Diffusion Pipeline Setup
Recommended models, VRAM requirements, and configuration for diffusion pipelines.
LLM Pipeline Setup
Ollama-based LLM runner configuration and model download via Ollama tags.