This page covers the gateway operator perspective on BYOC. For the orchestrator and developer side - building containers, registering capabilities, and deploying inference servers - see the Developer BYOC Guide.
BYOC Routing
BYOC orchestrators advertise custom capabilities using the same protocol as standard AI orchestrators. The difference is that instead of running a managed ai-runner container for a known pipeline (such as text-to-image), they run a custom Docker container that exposes any inference API they choose.
The gateway routes to them using the same -orchAddr flag and the same AISessionManager used for standard AI pipelines. The distinction is in how you think about the routing contract.
Gateway Responsibilities
What you do
- Route requests by capability and service policy
- Monitor per-capability health and error rates
- Configure retry policy for BYOC-specific failure modes (cold starts, model loading)
- Set price ceilings per capability
- Maintain failover to alternative orchestrators when a BYOC node degrades
What you don't do
- Run model containers
- Host model weights
- Expose orchestrator-internal model identifiers as public API contracts
- Manage GPU allocation
- Control what runs inside the BYOC container
Prioritise real-time, GPU-bound, frame-based capabilities when selecting BYOC orchestrators to connect to.
Poor-fit batch workloads (large LLMs, multi-minute jobs, stateful pipelines) behind BYOC will degrade routing quality and increase latency for all jobs on the same orchestrator.
Capability Contracts
BYOC routing treats capabilities as stable API contracts, not model names. Orchestrators advertise capability descriptors (image-to-image, depth, segmentation, style-transfer). Your gateway routes on the capability - the orchestrator decides which model or container implementation serves it.
This means orchestrators can update models without breaking your routing, multiple orchestrators can compete to serve the same capability, and performance-based routing automatically favours faster or cheaper implementations.
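The capability-keyed lookup described above can be sketched as follows. This is a minimal illustration, not gateway code: the class, orchestrator addresses, and capability names are all hypothetical.

```python
# Hypothetical sketch: routing keyed by capability descriptor, not model name.
from collections import defaultdict

class CapabilityRouter:
    """Maps capability descriptors to the orchestrators advertising them."""

    def __init__(self):
        self._by_capability = defaultdict(list)

    def register(self, orch_addr: str, capabilities: list[str]) -> None:
        # An orchestrator may advertise several capabilities at once.
        for cap in capabilities:
            self._by_capability[cap].append(orch_addr)

    def candidates(self, capability: str) -> list[str]:
        # Route on the capability; which model serves it is the
        # orchestrator's concern, so model names never appear here.
        return list(self._by_capability[capability])

router = CapabilityRouter()
router.register("https://orch-a.example:8935", ["image-to-image", "depth"])
router.register("https://orch-b.example:8935", ["image-to-image"])

print(router.candidates("image-to-image"))
# Both orchestrators are candidates for the same capability, so an
# orchestrator can swap its underlying model without breaking routing.
```

Because the table never mentions a model identifier, multiple orchestrators naturally compete to serve the same capability, which is what enables the performance-based selection described above.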
Anti-pattern: Coupling your routing to model names. If you are making routing decisions based on SG161222/RealVisXL_V4.0_Lightning instead of image-to-image, you are working against the architecture.
Routing Profiles
BYOC capabilities on the network fall into broad latency profiles. Understanding these helps you set appropriate retry timeouts and price ceilings.
Low-latency capabilities
Capabilities like style-transfer, image-to-image, and video-to-video where orchestrators keep models warm with persistent GPU residency. Expect sub-second latency per frame.
Routing guidance: low retry timeout (5-10s) and stable latency - high variance indicates GPU contention. A low price ceiling is acceptable.
Per-frame utility capabilities
Capabilities like depth, segmentation, and pose - fast per frame (milliseconds) but may have a cold-start cost on the first request. Once warm, throughput is high.
Routing guidance: slightly higher retry timeout to account for cold starts. Very low per-request cost relative to diffusion capabilities. Monitor for latency spikes indicating model eviction.
Chained capabilities
Orchestrators that chain multiple capabilities in sequence (e.g. depth estimation feeding into a diffusion step). These have higher latency and VRAM usage than either capability alone.
Routing guidance: expect higher latency per request, and set a price ceiling that accounts for the combined compute. Expect higher per-pixel rates.
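One way to reason about a ceiling for a chained capability is to start from the ceilings of its components. The additive-plus-margin model and all of the numbers below are illustrative assumptions, not a network pricing rule.

```python
# Illustrative arithmetic only: the additive model and the overhead
# factor are assumptions for sizing a ceiling, not a protocol rule.
def chained_ceiling(component_ceilings, overhead=1.2):
    """Ceiling for a chained capability: the sum of the component
    ceilings plus a margin for chaining overhead (extra VRAM, transfers)."""
    return sum(component_ceilings) * overhead

# e.g. depth estimation feeding a diffusion step (hypothetical per-pixel rates)
depth_ceiling = 150
diffusion_ceiling = 3000
print(chained_ceiling([depth_ceiling, diffusion_ceiling]))  # 3780.0
```

Starting from component ceilings keeps the chained ceiling consistent with what you already pay for each capability served individually.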
BYOC Requirements
These constraints apply to BYOC containers on the network. Gateway operators enforce them through routing priority; developers must meet them for their containers to be routable.
Stateless execution
The network assumes short, repeatable, stateless units of work. BYOC containers that maintain long-lived state between requests break retry and failover semantics. If a request fails and the gateway retries on a different orchestrator, a stateful container will produce inconsistent results.
Cold start under 10s
Containers that take more than 10 seconds to serve their first inference will be deprioritised by gateways tracking per-orchestrator latency. Prefer orchestrators that keep models warm. When evaluating a new BYOC orchestrator, send a test request and measure cold-start latency before committing to high-traffic routing.
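The cold-start check above can be as simple as timing one request. In this sketch the HTTP call is passed in as a callable (and simulated with a sleep) so the timing logic stands alone; in practice you would POST a small test payload to the orchestrator's BYOC endpoint.

```python
import time

def measure_cold_start(send_request) -> float:
    """Time a single test inference; send_request performs the HTTP call."""
    start = time.monotonic()
    send_request()
    return time.monotonic() - start

# In practice send_request would POST to the orchestrator's endpoint
# (e.g. via urllib or requests). Simulated here with a short sleep:
latency = measure_cold_start(lambda: time.sleep(0.05))
print(f"{latency:.2f}s")
if latency > 10.0:
    print("cold start exceeds the 10s budget; deprioritise this orchestrator")
```

Measure the first request separately from subsequent ones: the gap between them is the cold-start cost, and a small warm-request latency with a large first-request latency indicates the orchestrator is not keeping the model resident.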
VRAM efficiency
Containers that use excessive VRAM reduce the orchestrator’s ability to serve concurrent jobs. For real-time pipelines, prefer fp16 or quantised models - they use less VRAM with minimal quality loss.
Capability accuracy
The capability descriptor advertised must match what the container can actually serve. Mismatches cause routing failures and degrade the orchestrator’s reputation score.
Livepeer AI worker API
The container must expose an HTTP endpoint implementing the Livepeer AI worker API. See the Developer BYOC Guide for the full API contract and container requirements.
Health Tracking
BYOC routing requires per-capability health tracking instead of per-orchestrator tracking. An orchestrator may serve image-to-image perfectly while its depth capability is degraded. Track them independently.
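Tracking health per (orchestrator, capability) pair can be sketched as below. The class, the error-rate threshold, and the minimum-sample rule are illustrative assumptions, not gateway internals.

```python
# Sketch of per-capability health tracking; thresholds are illustrative.
from collections import defaultdict

class CapabilityHealth:
    """Tracks error rates per (orchestrator, capability) pair, so one
    degraded capability does not mark the whole orchestrator unhealthy."""

    def __init__(self, max_error_rate=0.2, min_samples=5):
        self._ok = defaultdict(int)
        self._err = defaultdict(int)
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples

    def record(self, orch: str, capability: str, success: bool) -> None:
        key = (orch, capability)
        if success:
            self._ok[key] += 1
        else:
            self._err[key] += 1

    def healthy(self, orch: str, capability: str) -> bool:
        key = (orch, capability)
        total = self._ok[key] + self._err[key]
        if total < self.min_samples:
            return True  # Not enough data yet; assume healthy.
        return self._err[key] / total <= self.max_error_rate

health = CapabilityHealth()
for _ in range(10):
    health.record("orch-a", "image-to-image", success=True)
    health.record("orch-a", "depth", success=False)

print(health.healthy("orch-a", "image-to-image"))  # True
print(health.healthy("orch-a", "depth"))           # False
```

Note that orch-a stays routable for image-to-image even while its depth capability is failing, which is exactly the behaviour per-orchestrator tracking cannot express.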
BYOC containers may have longer cold-start times than standard ai-runner pipelines. See Pipeline Configuration for retry timeout settings (-aiProcessingRetryTimeout).
Failure modes specific to BYOC:
- Cold-start delays: container loading model from disk for the first time
- GPU out-of-memory: container allocated too much VRAM, evicting other models
- Container crash: Docker container exited, orchestrator not yet restarting it
- API mismatch: container endpoint returns unexpected schema
AISessionManager will attempt to route to an alternative orchestrator. If no alternative is available, the request fails with an error returned to the client.
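The failover behaviour described above can be sketched as trying each candidate orchestrator in turn. This is an illustration of the routing contract, not AISessionManager code; the orchestrator names and the send() callable are hypothetical.

```python
# Sketch of capability-level failover across candidate orchestrators.
def route_with_failover(candidates, send):
    """Try each candidate in turn; raise if none can serve the request."""
    last_error = None
    for orch in candidates:
        try:
            return send(orch)
        except RuntimeError as exc:  # e.g. cold-start timeout, container crash
            last_error = exc
    raise RuntimeError(f"no orchestrator could serve the request: {last_error}")

def fake_send(orch):
    # Simulated transport: orch-a's container has crashed, orch-b is healthy.
    if orch == "orch-a":
        raise RuntimeError("container crashed")
    return f"result from {orch}"

print(route_with_failover(["orch-a", "orch-b"], fake_send))
# → result from orch-b
```

When the candidate list is exhausted, the error propagates to the caller, matching the failure path described above when no alternative orchestrator is available.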
Capability Discovery
BYOC capability discovery uses the same mechanisms as standard AI discovery. See AI Pipelines and Monitoring Setup for the full list of discovery methods and tools. When you identify a BYOC-capable orchestrator, add them to your -orchAddr list. The AISessionManager will route BYOC requests to them when their advertised capability matches the request.
Related Pages
Pipelines Guide
Pipeline taxonomy - video, batch AI, real-time AI, and BYOC.
AI Pipelines
Standard AI pipeline routing, orchestrator discovery, and AISessionManager details.
Pipeline Configuration
Retry timeouts, AI routing flags, and per-capability price ceiling configuration.
Workload Fit
Decision framework for evaluating whether your AI workload belongs on Livepeer.
Developer BYOC Guide
Full architecture, container requirements, and setup for teams building BYOC containers.
Monitoring Setup
Per-capability health tracking, discovery error metrics, and alert configuration.