BYOC (Bring Your Own Container) extends the range of AI (and other) workloads a gateway can route by allowing orchestrators to run custom Docker inference containers and advertise them as capabilities. BYOC is a routing and policy concern, not a model-hosting concern. Gateways configure how requests reach BYOC-capable orchestrators; the orchestrators handle everything inside the container.
This page covers the gateway operator perspective on BYOC. For the orchestrator and developer side - building containers, registering capabilities, and deploying inference servers - see the BYOC developer documentation.

BYOC Routing

BYOC orchestrators advertise custom capabilities using the same protocol as standard AI orchestrators. The difference is that instead of running a managed ai-runner container for a known pipeline (such as text-to-image), they run a custom Docker container that exposes any inference API they choose. The gateway routes to them using the same -orchAddr flag and the same AISessionManager used for standard AI pipelines; the distinction lies in how you think about the routing contract.

Gateway Responsibilities

What you do

  • Route requests by capability and service policy.
  • Monitor per-capability health and error rates.
  • Configure retry policy for BYOC-specific failure modes (cold starts, model loading).
  • Set price ceilings per capability.
  • Maintain failover to alternative orchestrators when a BYOC node degrades.
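The failover duty above can be sketched as a simple priority loop. This is an illustrative sketch, not go-livepeer code: the `send` transport callable and the function name are assumptions.

```python
def route_with_failover(capability, orchestrators, send, max_attempts=3):
    """Try candidate orchestrators in priority order, failing over on error.

    `send` is a hypothetical transport callable: send(addr, capability) -> result.
    Raises RuntimeError when every attempt fails, mirroring the behaviour of
    returning an error to the client when no alternative is available.
    """
    errors = []
    for addr in orchestrators[:max_attempts]:
        try:
            return send(addr, capability)
        except Exception as exc:
            # Record the failure and fall through to the next candidate.
            errors.append((addr, str(exc)))
    raise RuntimeError(f"no orchestrator served {capability!r}: {errors}")
```

A gateway would feed this loop from its per-capability candidate list, ordered by observed latency and price.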

What you don't do

  • Run model containers.
  • Host model weights.
  • Expose orchestrator-internal model identifiers as public API contracts.
  • Manage GPU allocation.
  • Control what runs inside the BYOC container.
Prioritise real-time, GPU-bound, frame-based capabilities when selecting BYOC orchestrators to connect to.

Poor-fit batch workloads (large LLMs, multi-minute jobs, stateful pipelines) behind BYOC will degrade routing quality and increase latency for all jobs on the same orchestrator.

Capability Contracts

BYOC routing treats capabilities as stable API contracts, not model names. Orchestrators advertise capability descriptors (image-to-image, depth, segmentation, style-transfer). Your gateway routes on the capability - the orchestrator decides which model or container implementation serves it. This means orchestrators can update models without breaking your routing, multiple orchestrators can compete to serve the same capability, and performance-based routing automatically favours faster or cheaper implementations.
Anti-pattern: Coupling your routing to model names. If you are making routing decisions based on SG161222/RealVisXL_V4.0_Lightning instead of image-to-image, you are working against the architecture.
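The contract above can be made concrete with a small selection helper. The orchestrator directory, addresses, and function name below are hypothetical illustrations, not go-livepeer APIs.

```python
# Hypothetical directory of orchestrators and their advertised capabilities.
ORCHESTRATORS = [
    {"addr": "orch-a:8935", "capabilities": {"image-to-image", "depth"}},
    {"addr": "orch-b:8935", "capabilities": {"image-to-image"}},
    {"addr": "orch-c:8935", "capabilities": {"segmentation"}},
]

def candidates_for(capability):
    """Select orchestrators by advertised capability, never by model name.

    Which model backs the capability is the orchestrator's concern; it can
    swap implementations without changing this routing decision.
    """
    return [o["addr"] for o in ORCHESTRATORS if capability in o["capabilities"]]
```

Note that the model identifier never appears in the routing decision: two orchestrators serving image-to-image with different models compete on latency and price alone.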

Routing Profiles

BYOC capabilities on the network fall into broad latency profiles. Understanding these helps you set appropriate retry timeouts and price ceilings.
Capabilities like style-transfer, image-to-image, and video-to-video, where orchestrators keep models warm with persistent GPU residency. Expect sub-second latency per frame. Routing guidance: low retry timeout (5-10s) and stable latency; high variance indicates GPU contention. A low price ceiling is acceptable.
Capabilities like depth, segmentation, and pose: fast per frame (milliseconds) but with a possible cold-start cost on the first request. Once warm, throughput is high. Routing guidance: a slightly higher retry timeout to account for cold starts. Very low per-request cost relative to diffusion capabilities. Monitor for latency spikes that indicate model eviction.
Orchestrators that chain multiple capabilities in sequence (e.g. depth estimation feeding into a diffusion step) incur higher latency and VRAM usage than either capability alone. Routing guidance: allow higher latency per request, and set a price ceiling that accounts for the combined compute. Expect higher per-pixel rates.
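These profiles can be captured as a per-capability settings table. The structure, numbers, and price units below are illustrative assumptions, not defaults shipped with go-livepeer.

```python
# Illustrative per-capability routing settings; timeouts follow the profile
# guidance above, price ceilings are in abstract units.
ROUTING_PROFILES = {
    # Warm diffusion capabilities: persistent GPU residency, tight timeouts.
    "style-transfer": {"retry_timeout_s": 10, "price_ceiling": 1.0},
    "image-to-image": {"retry_timeout_s": 10, "price_ceiling": 1.0},
    "video-to-video": {"retry_timeout_s": 10, "price_ceiling": 1.0},
    # Fast vision capabilities: headroom for cold starts, much lower cost.
    "depth":        {"retry_timeout_s": 20, "price_ceiling": 0.1},
    "segmentation": {"retry_timeout_s": 20, "price_ceiling": 0.1},
    "pose":         {"retry_timeout_s": 20, "price_ceiling": 0.1},
}

def profile_for(capability):
    """Unknown capabilities (e.g. composed pipelines) get conservative
    settings: a generous timeout and a ceiling covering combined compute."""
    return ROUTING_PROFILES.get(
        capability, {"retry_timeout_s": 30, "price_ceiling": 2.0}
    )
```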
BYOC works best for frame-based, stream-based, and short-lived GPU workloads. See the workload decision framework for the full decision criteria, and the pipeline taxonomy for how BYOC fits into it.

BYOC Requirements

These constraints apply to BYOC containers on the network. Gateway operators enforce them through routing priority; developers must meet them for their containers to be routable.
The network assumes short, repeatable, stateless units of work. BYOC containers that maintain long-lived state between requests break retry and failover semantics. If a request fails and the gateway retries on a different orchestrator, a stateful container will produce inconsistent results.
Containers that take more than 10 seconds to serve their first inference will be deprioritised by gateways tracking per-orchestrator latency. Prefer orchestrators that keep models warm. When evaluating a new BYOC orchestrator, send a test request and measure cold-start latency before committing to high-traffic routing.
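The evaluation step above can be sketched as a simple probe. The function names and the caller-supplied `send_test_request` callable are hypothetical; a real gateway would issue an actual inference request here.

```python
import time

def probe_cold_start(send_test_request, warm_requests=3):
    """Measure cold-start vs warm latency for a new BYOC orchestrator.

    `send_test_request` performs one inference request against the
    orchestrator. Returns (cold_latency_s, mean_warm_latency_s).
    """
    start = time.monotonic()
    send_test_request()           # first request pays any model-load cost
    cold = time.monotonic() - start

    warm = []
    for _ in range(warm_requests):
        start = time.monotonic()
        send_test_request()       # subsequent requests hit a warm model
        warm.append(time.monotonic() - start)
    return cold, sum(warm) / len(warm)

def routable(cold_latency_s, threshold_s=10.0):
    """Deprioritise orchestrators whose first inference exceeds the threshold."""
    return cold_latency_s <= threshold_s
```

Running the probe a few times at different hours also reveals whether the orchestrator evicts models under load.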
Containers that use excessive VRAM reduce the orchestrator’s ability to serve concurrent jobs. For real-time pipelines, prefer fp16 or quantised models - they use less VRAM with minimal quality loss.
The capability descriptor advertised must match what the container can actually serve. Mismatches cause routing failures and degrade the orchestrator’s reputation score.
The container must expose an HTTP endpoint implementing the Livepeer AI worker API. See the BYOC developer documentation for the full API contract and container requirements.

Health Tracking

BYOC routing requires per-capability health tracking rather than per-orchestrator tracking: an orchestrator may serve image-to-image perfectly while its depth capability is degraded, so track them independently. BYOC containers may also have longer cold-start times than standard ai-runner pipelines; see the gateway configuration reference for retry timeout settings (-aiProcessingRetryTimeout). Failure modes specific to BYOC:
  • Cold-start delays: container loading model from disk for the first time
  • GPU out-of-memory: container allocated too much VRAM, evicting other models
  • Container crash: Docker container exited, orchestrator not yet restarting it
  • API mismatch: container endpoint returns unexpected schema
For each failure mode, the AISessionManager will attempt to route to an alternative orchestrator. If no alternative is available, the request fails with an error returned to the client.
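The per-capability tracking described above can be sketched as follows. The class name, thresholds, and health rule are illustrative assumptions, not go-livepeer internals.

```python
from collections import defaultdict

class CapabilityHealth:
    """Track health per (orchestrator, capability) pair, not per node.

    An orchestrator can be healthy for one capability and degraded for
    another, so success/error counts are kept per pair.
    """

    def __init__(self, max_error_rate=0.2, min_samples=5):
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples
        self._counts = defaultdict(lambda: {"ok": 0, "err": 0})

    def record(self, orch, capability, success):
        self._counts[(orch, capability)]["ok" if success else "err"] += 1

    def healthy(self, orch, capability):
        c = self._counts[(orch, capability)]
        total = c["ok"] + c["err"]
        if total < self.min_samples:
            return True  # too few samples; assume healthy until proven otherwise
        return c["err"] / total <= self.max_error_rate
```

A router would consult `healthy()` before adding an orchestrator to the candidate list for a given capability, leaving its other capabilities routable.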

Capability Discovery

BYOC capability discovery uses the same mechanisms as standard AI discovery; see the AI discovery documentation for the full list of discovery methods and tools. When you identify a BYOC-capable orchestrator, add them to your -orchAddr list. The AISessionManager will route BYOC requests to them when their advertised capability matches the request.
Last modified on March 16, 2026