AI model management covers the operational decisions made after models are downloaded: which models to keep warm in VRAM, how to allocate VRAM across multiple pipelines, when to rotate warm models based on demand changes, and which optimisation flags to apply for throughput gains. Model sourcing and downloading are covered separately in .
Warm vs cold strategy
Warm means the model weights are loaded into GPU VRAM at container startup, so job requests are served immediately with no loading latency. Cold keeps the container available while leaving the weights out of VRAM; the first request triggers a model load (typically 10 to 60 seconds, depending on model size and NVMe storage speed) before inference begins.
Impact on job routing
Gateways track first-response latency per orchestrator. Nodes with fast first responses win more jobs. For latency-sensitive pipelines, particularly text-to-image and image-to-image, cold loading creates a competitive disadvantage on the first request of each session.
The practical rule: warm your primary revenue pipeline. Keep secondary pipelines cold until VRAM capacity allows warming them.
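As a sketch of that rule, an aiModels.json might mark the primary revenue pipeline warm and leave a secondary one cold. The model IDs and price here are illustrative, not recommendations:

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4500000,
    "warm": true
  },
  {
    "pipeline": "image-to-image",
    "model_id": "timbrooks/instruct-pix2pix",
    "price_per_unit": 4500000,
    "warm": false
  }
]
```

The cold entry still advertises the capability to gateways; it simply pays the first-request loading penalty described above.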
Beta constraint: one warm model per GPU
During the Beta phase, only one warm model per GPU is supported. Setting "warm": true on more entries than you have GPUs causes the AI worker to log a conflict at startup and skip the excess entries.
Check logs on startup for:
Check warm model startup logs
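The exact command depends on your deployment; assuming a Docker setup with an AI worker container named livepeer-ai-worker (the name is an assumption, substitute your own), a minimal sketch:

```shell
# Search the AI worker's startup logs for warm-model conflicts.
# Container name is an assumption; adjust to your deployment.
docker logs livepeer-ai-worker 2>&1 | grep "Error loading warm model"
```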
An "Error loading warm model" message indicates a warm model conflict. Reduce the number of "warm": true entries to match your GPU count.
VRAM allocation
A 24 GB GPU holds one large diffusion model warm, with a small pipeline (Whisper or BLIP) warm on the same card when using a multi-GPU system. Keep multiple large diffusion models off a single 24 GB GPU: the Beta constraint blocks it, and the VRAM is insufficient anyway.
Model rotation by demand
Demand on the Livepeer AI network shifts over time. A model leading one week often falls back the next as new models are listed or gateway preferences change. Warm model selection should track demand and revenue opportunity.
Checking current demand
Visit tools.livepeer.cloud/ai/network-capabilities weekly. Filter by pipeline to see which models active gateways are requesting. Models with the most gateway registrations are receiving the most routing traffic. The Livepeer Explorer AI leaderboard shows per-orchestrator earnings data, which reveals which price tiers and pipelines are winning the most jobs.
Rotating the warm model
To swap which model is warm, update aiModels.json and restart the AI worker:
aiModels.json — rotate warm model
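A minimal sketch of a rotated file, assuming two text-to-image entries (the model IDs are illustrative):

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "warm": true
  },
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "warm": false
  }
]
```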
"warm": true loads at startup. The cold entry is available but will incur first-request latency. After restarting the AI worker container, verify the new warm model appears with Warm status at tools.livepeer.cloud/ai/network-capabilities.
Restart the AI worker container and leave the full go-livepeer process running to minimise downtime:
Restart the AI worker
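Assuming the same hypothetical container name as above, a sketch of restarting only the worker container while the go-livepeer process keeps serving:

```shell
# Restart only the AI worker container; the main go-livepeer
# process stays up. Container name is an assumption.
docker restart livepeer-ai-worker

# Confirm the container is back up before expecting the warm model.
docker ps --filter "name=livepeer-ai-worker" --format "{{.Names}}: {{.Status}}"
```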
Optimisation flags
Optimisation flags apply to warm diffusion models only: text-to-image, image-to-image, and upscale entries with "warm": true. They have no effect on cold models or on non-diffusion pipelines.
Both flags are experimental. Apply one at a time and verify output quality before serving jobs.
SFAST and DEEPCACHE cannot be combined. Set only one, or neither, in optimization_flags.
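As a sketch of how a single flag might look on a warm diffusion entry — the optimization_flags field and flag names follow the aiModels.json schema as described here, and the model ID is illustrative:

```json
{
  "pipeline": "text-to-image",
  "model_id": "ByteDance/SDXL-Lightning",
  "warm": true,
  "optimization_flags": {
    "SFAST": true
  }
}
```

To try DEEPCACHE instead, replace the SFAST key; never set both at once.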
Monitoring model loading
Verify model state after startup by checking the AI runner container logs:
Check model loading logs
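Assuming a Docker deployment with a runner container named livepeer-ai-runner (an assumption; adjust to your naming), a sketch:

```shell
# Tail recent AI runner logs and surface load-related lines.
docker logs --tail 200 livepeer-ai-runner 2>&1 \
  | grep -Ei "loading|warm|error"
```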
- Error loading warm model — warm model conflict (too many "warm": true entries for available GPUs)
- Container restart loops — check docker ps for restart counts
- Model download still in progress — warm models must finish downloading before the container loads them
Related pages
Model Hosting
Download mechanics, storage layout, HuggingFace sourcing, and gated model access.
AI Inference Operations
Full aiModels.json reference and pipeline architecture.
Model Demand Reference
VRAM requirements and demand context for all supported pipelines.
Capacity Planning
VRAM budgeting and the capacity field for AI pipeline concurrency.