BYOC (Bring Your Own Container) extends the range of AI (and other) workloads a gateway can route by allowing orchestrators to run custom Docker inference containers and advertise them as capabilities. BYOC is a routing and policy concern, not a model-hosting concern. Gateways configure how requests reach BYOC-capable orchestrators; the orchestrators handle everything inside the container.
This page covers the gateway operator perspective on BYOC. For the orchestrator and developer side - building containers, registering capabilities, and deploying inference servers - see the BYOC developer documentation.

BYOC Routing

BYOC orchestrators advertise custom capabilities using the same protocol as standard AI orchestrators. The difference is that instead of running a managed ai-runner container for a known pipeline (such as text-to-image), they run a custom Docker container that exposes any inference API they choose. The gateway routes to them using the same -orchAddr flag and the same AISessionManager used for standard AI pipelines; the distinction lies in how you think about the routing contract.

Gateway Responsibilities

What you do

  • Route requests by capability and service policy.
  • Monitor per-capability health and error rates.
  • Configure retry policy for BYOC-specific failure modes (cold starts, model loading).
  • Set price ceilings per capability.
  • Maintain failover to alternative orchestrators when a BYOC node degrades.
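The failover duty above can be sketched as a simple priority loop. This is an illustrative sketch, not go-livepeer code: the `send` transport callable and the function name are assumptions.

```python
def route_with_failover(capability, orchestrators, send, max_attempts=3):
    """Try candidate orchestrators in priority order, failing over on error.

    `send` is a hypothetical transport callable: send(addr, capability) -> result.
    Raises RuntimeError when every attempt fails, mirroring the behaviour of
    returning an error to the client when no alternative is available.
    """
    errors = []
    for addr in orchestrators[:max_attempts]:
        try:
            return send(addr, capability)
        except Exception as exc:
            # Record the failure and fall through to the next candidate.
            errors.append((addr, str(exc)))
    raise RuntimeError(f"no orchestrator served {capability!r}: {errors}")
```

A gateway would feed this loop from its per-capability candidate list, ordered by observed latency and price.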

What you don't do

  • Run model containers.
  • Host model weights.
  • Expose orchestrator-internal model identifiers as public API contracts.
  • Manage GPU allocation.
  • Control what runs inside the BYOC container.
Prioritise real-time, GPU-bound, frame-based capabilities when selecting BYOC orchestrators to connect to.

Poor-fit batch workloads (large LLMs, multi-minute jobs, stateful pipelines) behind BYOC will degrade routing quality and increase latency for all jobs on the same orchestrator.

Capability Contracts

BYOC routing treats capabilities as stable API contracts, not model names. Orchestrators advertise capability descriptors (image-to-image, depth, segmentation, style-transfer). Your gateway routes on the capability - the orchestrator decides which model or container implementation serves it. This means orchestrators can update models without breaking your routing, multiple orchestrators can compete to serve the same capability, and performance-based routing automatically favours faster or cheaper implementations.
Anti-pattern: Coupling your routing to model names. If you are making routing decisions based on SG161222/RealVisXL_V4.0_Lightning instead of image-to-image, you are working against the architecture.
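The contract above can be made concrete with a small selection helper. The orchestrator directory, addresses, and function name below are hypothetical illustrations, not go-livepeer APIs.

```python
# Hypothetical directory of orchestrators and their advertised capabilities.
ORCHESTRATORS = [
    {"addr": "orch-a:8935", "capabilities": {"image-to-image", "depth"}},
    {"addr": "orch-b:8935", "capabilities": {"image-to-image"}},
    {"addr": "orch-c:8935", "capabilities": {"segmentation"}},
]

def candidates_for(capability):
    """Select orchestrators by advertised capability, never by model name.

    Which model backs the capability is the orchestrator's concern; it can
    swap implementations without changing this routing decision.
    """
    return [o["addr"] for o in ORCHESTRATORS if capability in o["capabilities"]]
```

Note that the model identifier never appears in the routing decision: two orchestrators serving image-to-image with different models compete on latency and price alone.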

Routing Profiles

BYOC capabilities on the network fall into broad latency profiles. Understanding these helps you set appropriate retry timeouts and price ceilings.
Capabilities like style-transfer, image-to-image, and video-to-video, where orchestrators keep models warm with persistent GPU residency. Expect sub-second latency per frame. Routing guidance: low retry timeout (5-10s) and stable latency; high variance indicates GPU contention. A low price ceiling is acceptable.
Capabilities like depth, segmentation, and pose: fast per frame (milliseconds) but with a possible cold-start cost on the first request. Once warm, throughput is high. Routing guidance: a slightly higher retry timeout to account for cold starts. Very low per-request cost relative to diffusion capabilities. Monitor for latency spikes that indicate model eviction.
Orchestrators that chain multiple capabilities in sequence (e.g. depth estimation feeding into a diffusion step) incur higher latency and VRAM usage than either capability alone. Routing guidance: allow higher latency per request, and set a price ceiling that accounts for the combined compute. Expect higher per-pixel rates.
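These profiles can be captured as a per-capability settings table. The structure, numbers, and price units below are illustrative assumptions, not defaults shipped with go-livepeer.

```python
# Illustrative per-capability routing settings; timeouts follow the profile
# guidance above, price ceilings are in abstract units.
ROUTING_PROFILES = {
    # Warm diffusion capabilities: persistent GPU residency, tight timeouts.
    "style-transfer": {"retry_timeout_s": 10, "price_ceiling": 1.0},
    "image-to-image": {"retry_timeout_s": 10, "price_ceiling": 1.0},
    "video-to-video": {"retry_timeout_s": 10, "price_ceiling": 1.0},
    # Fast vision capabilities: headroom for cold starts, much lower cost.
    "depth":        {"retry_timeout_s": 20, "price_ceiling": 0.1},
    "segmentation": {"retry_timeout_s": 20, "price_ceiling": 0.1},
    "pose":         {"retry_timeout_s": 20, "price_ceiling": 0.1},
}

def profile_for(capability):
    """Unknown capabilities (e.g. composed pipelines) get conservative
    settings: a generous timeout and a ceiling covering combined compute."""
    return ROUTING_PROFILES.get(
        capability, {"retry_timeout_s": 30, "price_ceiling": 2.0}
    )
```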
BYOC works best for frame-based, stream-based, and short-lived GPU workloads. See the workload decision framework for the full decision criteria, and the pipeline taxonomy for how BYOC fits into it.

BYOC Requirements

These constraints apply to BYOC containers on the network. Gateway operators enforce them through routing priority; developers must meet them for their containers to be routable.
The network assumes short, repeatable, stateless units of work. BYOC containers that maintain long-lived state between requests break retry and failover semantics. If a request fails and the gateway retries on a different orchestrator, a stateful container will produce inconsistent results.
Containers that take more than 10 seconds to serve their first inference will be deprioritised by gateways tracking per-orchestrator latency. Prefer orchestrators that keep models warm. When evaluating a new BYOC orchestrator, send a test request and measure cold-start latency before committing to high-traffic routing.
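The evaluation step above can be sketched as a simple probe. The function names and the caller-supplied `send_test_request` callable are hypothetical; a real gateway would issue an actual inference request here.

```python
import time

def probe_cold_start(send_test_request, warm_requests=3):
    """Measure cold-start vs warm latency for a new BYOC orchestrator.

    `send_test_request` performs one inference request against the
    orchestrator. Returns (cold_latency_s, mean_warm_latency_s).
    """
    start = time.monotonic()
    send_test_request()           # first request pays any model-load cost
    cold = time.monotonic() - start

    warm = []
    for _ in range(warm_requests):
        start = time.monotonic()
        send_test_request()       # subsequent requests hit a warm model
        warm.append(time.monotonic() - start)
    return cold, sum(warm) / len(warm)

def routable(cold_latency_s, threshold_s=10.0):
    """Deprioritise orchestrators whose first inference exceeds the threshold."""
    return cold_latency_s <= threshold_s
```

Running the probe a few times at different hours also reveals whether the orchestrator evicts models under load.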
Containers that use excessive VRAM reduce the orchestrator’s ability to serve concurrent jobs. For real-time pipelines, prefer fp16 or quantised models - they use less VRAM with minimal quality loss.
The capability descriptor advertised must match what the container can actually serve. Mismatches cause routing failures and degrade the orchestrator’s reputation score.
The container must expose an HTTP endpoint implementing the Livepeer AI worker API. See the BYOC developer documentation for the full API contract and container requirements.

Health Tracking

BYOC routing requires per-capability health tracking rather than per-orchestrator tracking: an orchestrator may serve image-to-image perfectly while its depth capability is degraded, so track them independently. BYOC containers may also have longer cold-start times than standard ai-runner pipelines; see the gateway configuration reference for retry timeout settings (-aiProcessingRetryTimeout). Failure modes specific to BYOC:
  • Cold-start delays: container loading model from disk for the first time
  • GPU out-of-memory: container allocated too much VRAM, evicting other models
  • Container crash: Docker container exited, orchestrator not yet restarting it
  • API mismatch: container endpoint returns unexpected schema
For each failure mode, the AISessionManager will attempt to route to an alternative orchestrator. If no alternative is available, the request fails with an error returned to the client.
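The per-capability tracking described above can be sketched as follows. The class name, thresholds, and health rule are illustrative assumptions, not go-livepeer internals.

```python
from collections import defaultdict

class CapabilityHealth:
    """Track health per (orchestrator, capability) pair, not per node.

    An orchestrator can be healthy for one capability and degraded for
    another, so success/error counts are kept per pair.
    """

    def __init__(self, max_error_rate=0.2, min_samples=5):
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples
        self._counts = defaultdict(lambda: {"ok": 0, "err": 0})

    def record(self, orch, capability, success):
        self._counts[(orch, capability)]["ok" if success else "err"] += 1

    def healthy(self, orch, capability):
        c = self._counts[(orch, capability)]
        total = c["ok"] + c["err"]
        if total < self.min_samples:
            return True  # too few samples; assume healthy until proven otherwise
        return c["err"] / total <= self.max_error_rate
```

A router would consult `healthy()` before adding an orchestrator to the candidate list for a given capability, leaving its other capabilities routable.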

Capability Discovery

BYOC capability discovery uses the same mechanisms as standard AI discovery; see the AI discovery documentation for the full list of discovery methods and tools. When you identify a BYOC-capable orchestrator, add them to your -orchAddr list. The AISessionManager will route BYOC requests to them when their advertised capability matches the request.
Last modified on March 16, 2026