This is not model hosting in the Hugging Face sense. You are hosting an inference service, not a model artefact. The distinction matters - see How Livepeer routes by capability, not model below.

What BYOC is (and isn’t)

BYOC (Bring Your Own Container) lets you run your own AI inference server inside a Docker container on a Livepeer orchestrator; the network treats it as a callable AI capability. Livepeer does not restrict you to a fixed model catalogue or pre-approved models - technically, any Hugging Face model can be containerised and run via BYOC.

However, Livepeer is optimised for low-latency, GPU-bound, real-time inference - especially video and vision workloads. Models that violate these assumptions will be inefficient, poorly routed, or uneconomic. Rule of thumb: if the workload is frame-based or stream-based, it fits Livepeer well.
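The frame-in → frame-out shape this rule of thumb describes can be sketched in a few lines (`run_capability` and `stylise` are illustrative names, not Livepeer APIs):

```python
from typing import Callable, Iterable, Iterator

def run_capability(frames: Iterable[bytes], infer: Callable[[bytes], bytes]) -> Iterator[bytes]:
    """Frame-in -> frame-out: each frame is an independent, repeatable unit of work."""
    for frame in frames:
        yield infer(frame)

# Hypothetical stand-in for a GPU inference call:
stylise = lambda frame: frame.upper()
print(list(run_capability([b"frame1", b"frame2"], stylise)))  # [b'FRAME1', b'FRAME2']
```

Each frame is processed independently, which is exactly what makes the work retriable and routable.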

How Livepeer routes by capability, not model

Livepeer intentionally avoids model marketplaces, model-branded APIs, and centralised catalogues. Instead, it routes by capability descriptors:
  • image-to-image
  • video-to-video
  • depth
  • segmentation
  • style-transfer
Your orchestrator advertises capabilities, not model names. Gateways route on capability, price, and performance - not on which Hugging Face weights you load internally. This means:
  • Models can be swapped or updated without breaking downstream apps
  • No vendor lock-in at the model layer
  • Performance-based competition between orchestrators
  • Apps never need direct knowledge of which model runs their job
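Capability-based routing can be sketched as a filter-then-rank over advertised descriptors. The descriptor fields and ranking below are assumptions for illustration, not the actual Livepeer routing protocol:

```python
# Illustrative only: field names and selection logic are assumptions,
# not the Livepeer wire format.
orchestrators = [
    {"addr": "orch-a", "capabilities": {"image-to-image", "depth"}, "price_per_frame": 3, "latency_ms": 45},
    {"addr": "orch-b", "capabilities": {"image-to-image"}, "price_per_frame": 2, "latency_ms": 80},
]

def route(capability, pool):
    """Gateway-style selection: filter by capability, then rank on price and latency."""
    candidates = [o for o in pool if capability in o["capabilities"]]
    return min(candidates, key=lambda o: (o["price_per_frame"], o["latency_ms"]))

print(route("image-to-image", orchestrators)["addr"])  # orch-b: the cheaper match
```

Note that no model name appears anywhere in the routing decision - either orchestrator could be running entirely different Hugging Face weights behind the same descriptor.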

Implementation patterns

Pattern A - Real-time diffusion

Best for style transfer, image-to-image, live video effects.
  • Hugging Face SD / SDXL weights
  • StreamDiffusion or ComfyUI-style pipelines
  • Frame-in → frame-out processing
  • Persistent GPU residency
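Persistent GPU residency amounts to loading weights once at container startup and reusing the pipeline for every frame. A minimal sketch, with a stand-in in place of a real SD/SDXL pipeline:

```python
class DiffusionWorker:
    """Loads the (stand-in) pipeline once and keeps it resident across frames."""
    load_count = 0

    def __init__(self):
        DiffusionWorker.load_count += 1            # stands in for loading SD/SDXL weights onto the GPU
        self.pipeline = lambda frame: frame[::-1]  # stand-in for a StreamDiffusion call

    def process(self, frame: bytes) -> bytes:
        return self.pipeline(frame)

worker = DiffusionWorker()  # created once at container startup, not per request
results = [worker.process(f) for f in (b"abc", b"def", b"ghi")]
print(DiffusionWorker.load_count, results)  # 1 load, 3 frames served
```

The anti-pattern is constructing the pipeline inside the request handler - that re-pays the load cost on every job and shows up as a cold start.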

Pattern B - Vision utility node

Best for sub-tasks inside larger video pipelines.
  • Depth, segmentation, or pose models
  • Extremely fast per-frame inference
  • Used as conditioning steps feeding into diffusion
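A vision utility node is just a fast, per-frame function whose output feeds a later stage. A toy sketch, with a cheap computation standing in for a real depth model:

```python
def depth_map(frame: list[int]) -> list[float]:
    """Per-frame estimate - in practice a small vision model runs here."""
    peak = max(frame)
    return [v / peak for v in frame]

print(depth_map([2, 4, 8]))  # [0.25, 0.5, 1.0] - normalised, ready to feed a diffusion step
```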

Pattern C - Hybrid pipeline

Best for differentiated orchestrator offerings.
  • Vision model output feeds conditioning into diffusion
  • Vision → condition → generation chain
  • Strong competitive differentiation in the marketplace
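The vision → condition → generation chain composes the two previous patterns into one pipeline. A sketch with hypothetical stand-ins for each stage:

```python
def vision(frame):            # stand-in for depth / segmentation / pose
    return {"depth": len(frame)}

def condition(frame, signal): # attach the vision output as conditioning
    return (frame, signal["depth"])

def generate(conditioned):    # stand-in for a conditioned diffusion step
    frame, depth = conditioned
    return f"{frame}:{depth}"

def hybrid_pipeline(frame):
    """Vision -> condition -> generation, all inside one capability."""
    return generate(condition(frame, vision(frame)))

print(hybrid_pipeline("frame"))  # frame:5
```

To the network this whole chain is still a single advertised capability; the internal staging is your differentiation.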

Hard constraints

Ignoring these will degrade routing priority and reduce job assignment:
  • Cold starts reduce job assignment. Keep models warm: containers that take >10s to serve a first inference will be deprioritised.
  • Excess VRAM usage limits parallelism. Efficient memory management means more concurrent jobs per GPU.
  • Slow endpoints are deprioritised. Gateways track latency per orchestrator and route accordingly.
  • Stateful jobs break retry and failover semantics. The network assumes short, repeatable units of work. Long-lived state breaks this.
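The cold-start constraint is usually handled by running a throwaway warm-up inference at container startup, so the first real job is served hot. A sketch, assuming the 10-second budget stated above:

```python
import time

COLD_START_BUDGET_S = 10.0  # from the constraint above: >10s to first inference is deprioritised

def warm_up(infer, dummy_frame):
    """Run one throwaway inference at startup so the first real job is served hot."""
    t0 = time.monotonic()
    infer(dummy_frame)
    elapsed = time.monotonic() - t0
    return elapsed <= COLD_START_BUDGET_S

print(warm_up(lambda f: f, b"\x00" * 16))  # True: this stand-in model warms instantly
```

With a real model, the warm-up also forces lazy initialisation (CUDA context, weight loading, kernel compilation) to happen before the orchestrator starts sending jobs.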

Setup
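Whatever the exact setup steps, a BYOC container ultimately exposes an HTTP inference endpoint that the orchestrator calls per job. A self-contained stdlib sketch - the port, route, and byte-for-byte payload shape are illustrative assumptions, not the Livepeer job interface:

```python
import threading, urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read one frame, run the (stand-in) model, return the processed frame.
        frame = self.rfile.read(int(self.headers["Content-Length"]))
        result = frame[::-1]          # stand-in for real model inference
        self.send_response(200)
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)

    def log_message(self, *args):     # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), InferenceHandler)  # in a container: bind 0.0.0.0 on a fixed port
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_address[1]}/", data=b"frame", method="POST"
)
print(urllib.request.urlopen(req).read())
server.shutdown()
```

In production you would use a proper ASGI server rather than `http.server`, but the contract is the same: stateless, short-lived, frame-sized requests.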


Pricing and discovery

  • Set pricing per request, frame, or second
  • Pricing is advertised off-chain
  • Settlement occurs via Livepeer tickets
  • Gateways discover and route to you automatically
  • Applications never interact with Hugging Face or your orchestrator directly
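Per-request, per-frame, and per-second pricing are interconvertible: a per-frame price implies a per-second rate at a given frame rate. The numbers and units below are illustrative only:

```python
# Back-of-envelope conversion; units here are illustrative, not a Livepeer price schedule.
price_per_frame = 1_200   # in your settlement unit of choice
fps = 30                  # stream frame rate

price_per_second = price_per_frame * fps
print(price_per_second)   # 36000: the equivalent per-second rate for a 30 fps stream
```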

Last modified on March 9, 2026