BYOC production

A BYOC container that works locally may fail in production under concurrent sessions, GPU memory pressure, or ungraceful restarts. Use this checklist to verify container behaviour before registering on mainnet.

GPU memory profiling

Profile your container under the expected concurrent session count:

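# Monitor GPU memory during the load test (run in a separate terminal)
watch -n 1 nvidia-smi

# Run multiple concurrent sessions against the local orchestrator
for i in $(seq 1 5); do
  curl -X POST http://localhost:8935/live-video-to-video \
    -H 'Content-Type: application/json' \
    -d '{"model_id":"my-model"}' &
done
wait  # block until all background requests have finished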

Measure peak VRAM usage per session and multiply by the expected concurrency. If the result exceeds your GPU’s VRAM, either reduce per-session memory (smaller batch size, lower resolution) or limit the orchestrator’s maxSessions configuration.
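The arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not part of any orchestrator tooling: the function name, the separate baseline term (memory held with zero sessions, such as model weights) and the 10% headroom are all assumptions, and the numbers in the example are hypothetical.

```python
import math


def max_sessions(total_vram_mib: int, baseline_mib: int,
                 peak_per_session_mib: int,
                 headroom_fraction: float = 0.10) -> int:
    """Return how many concurrent sessions fit in VRAM.

    baseline_mib: memory held with zero sessions (weights, CUDA context).
    headroom_fraction: share of total VRAM kept free (assumed value).
    """
    usable = total_vram_mib * (1.0 - headroom_fraction) - baseline_mib
    if usable <= 0 or peak_per_session_mib <= 0:
        return 0
    return math.floor(usable / peak_per_session_mib)


# Hypothetical numbers: 24 GiB card, 6 GiB baseline, 3.5 GiB peak per session.
print(max_sessions(24576, 6144, 3584))  # prints 4
```

If the result is lower than the concurrency you expect, that is the signal to shrink per-session memory or cap sessions on the orchestrator side.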

Graceful shutdown

The orchestrator sends SIGTERM when stopping a container. Handle it:

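import asyncio
import signal

async def shutdown(server):
    # Close active sessions so clients see a clean end of stream
    await server.close_all_sessions()
    # Flush any buffered output
    await server.flush()

def install_sigterm_handler(loop, server):
    # add_signal_handler (Unix only) runs the callback inside the event loop,
    # so scheduling a task from it is safe, unlike a plain signal.signal handler.
    loop.add_signal_handler(
        signal.SIGTERM,
        lambda: loop.create_task(shutdown(server)),
    )

# Call once at startup, from inside the running loop:
# install_sigterm_handler(asyncio.get_running_loop(), server)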

A container that does not handle SIGTERM is killed after a timeout (default 10 seconds). Active sessions receive no graceful close and may produce incomplete output.
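Because the container is killed once the timeout expires, it can help to bound the shutdown work explicitly. The sketch below assumes the same `close_all_sessions()` and `flush()` methods used above; `drain`, `shutdown_with_deadline` and the 8 second budget are hypothetical names and values chosen for illustration.

```python
import asyncio

GRACE_SECONDS = 8.0  # assumed budget, kept below the 10 s default timeout


async def drain(server) -> None:
    """Close sessions, then flush buffered output."""
    await server.close_all_sessions()
    await server.flush()


async def shutdown_with_deadline(server,
                                 grace_seconds: float = GRACE_SECONDS) -> bool:
    """Return True if the drain finished in time, False if it was cut short."""
    try:
        await asyncio.wait_for(drain(server), timeout=grace_seconds)
        return True
    except asyncio.TimeoutError:
        # The drain was cancelled; log this so incomplete output can be traced.
        return False
```

Returning a boolean lets the caller log whether sessions were closed cleanly before the process exits.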

Health check under load

The /health endpoint must return {"status": "ok"} even under full GPU load. If health checks fail, the orchestrator stops advertising the capability and gateways route elsewhere. Common failure: the health check handler shares the GPU inference thread and blocks during heavy processing. Run health checks on a separate thread or async task.
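One way to keep the health check off the inference path is to serve it from its own thread. The following is a minimal standard-library sketch, not the interface the orchestrator requires: `HealthHandler`, `start_health_server` and the default port 8000 are assumptions, and only the `/health` path and the `{"status": "ok"}` body come from the text above.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep health probes out of the container logs


def start_health_server(host: str = "0.0.0.0", port: int = 8000):
    """Serve /health from a daemon thread, separate from inference work."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

A separate thread only helps if the inference code releases the Python GIL while the GPU is busy, which is typical for CUDA-backed libraries but worth confirming under load.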

Monitoring

Expose Prometheus metrics from your container for the orchestrator’s monitoring stack:

Metric	Description
`byoc_sessions_active`	Current concurrent sessions
`byoc_frame_latency_ms`	Per-frame processing latency histogram
`byoc_gpu_memory_bytes`	Current GPU memory usage
`byoc_errors_total`	Processing errors by type
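To make the table concrete, the sketch below renders those four metrics in the Prometheus text exposition format using only the standard library. It is an illustration under assumptions: the function name, the `type` label on the error counter and the bucket boundaries are hypothetical. In practice a client library such as `prometheus_client` would normally handle this.

```python
def render_metrics(sessions_active, gpu_memory_bytes, errors_by_type,
                   latencies_ms, buckets=(5, 10, 25, 50, 100, 250)):
    """Return the four BYOC metrics as Prometheus text exposition format."""
    lines = [
        "# TYPE byoc_sessions_active gauge",
        f"byoc_sessions_active {sessions_active}",
        "# TYPE byoc_gpu_memory_bytes gauge",
        f"byoc_gpu_memory_bytes {gpu_memory_bytes}",
        "# TYPE byoc_errors_total counter",
    ]
    for error_type, count in sorted(errors_by_type.items()):
        lines.append(f'byoc_errors_total{{type="{error_type}"}} {count}')

    lines.append("# TYPE byoc_frame_latency_ms histogram")
    for upper in buckets:
        # Histogram buckets are cumulative: count every sample <= upper.
        in_bucket = sum(1 for value in latencies_ms if value <= upper)
        lines.append(f'byoc_frame_latency_ms_bucket{{le="{upper}"}} {in_bucket}')
    lines.append(f'byoc_frame_latency_ms_bucket{{le="+Inf"}} {len(latencies_ms)}')
    lines.append(f"byoc_frame_latency_ms_sum {sum(latencies_ms)}")
    lines.append(f"byoc_frame_latency_ms_count {len(latencies_ms)}")
    return "\n".join(lines) + "\n"


# Hypothetical snapshot: 2 sessions, 1 OOM error, three frame latencies.
print(render_metrics(2, 1024, {"oom": 1}, [4.0, 12.0, 300.0]))
```

The returned string would be served from a metrics endpoint such as `/metrics`, which is the conventional path but an assumption here.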

The orchestrator’s Prometheus scraper picks up metrics from containers on the same Docker network.

For the container interface, see the BYOC architecture page. For gateway-side production requirements, see the production checklist.
