What gateway operators need to do
- Route requests by capability and policy (price, latency, reliability)
- Prefer orchestrators with stable warm-start behavior
- Monitor p95 latency and error rates by capability
- Configure retries and failover so requests remain serviceable during node churn
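The monitoring and failover items above can be sketched together. This is a minimal illustration and not a gateway API: `CapabilityStats`, `dispatch_with_failover`, the `send` callback, and the `max_attempts` default are all hypothetical names and values chosen for the example. It assumes the caller already has an ordered list of candidate orchestrators for the capability.

```python
import time
from collections import defaultdict
from statistics import quantiles
from typing import Optional


class CapabilityStats:
    """Per-capability latency samples and error counts (hypothetical helper)."""

    def __init__(self):
        self.latencies_ms = defaultdict(list)  # capability -> successful latencies
        self.requests = defaultdict(int)       # capability -> total attempts
        self.errors = defaultdict(int)         # capability -> failed attempts

    def record(self, capability, latency_ms, ok):
        self.requests[capability] += 1
        if ok:
            self.latencies_ms[capability].append(latency_ms)
        else:
            self.errors[capability] += 1

    def p95_ms(self, capability) -> Optional[float]:
        samples = self.latencies_ms[capability]
        if len(samples) < 2:
            return None  # statistics.quantiles needs at least two samples
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        return quantiles(samples, n=20)[-1]

    def error_rate(self, capability) -> float:
        total = self.requests[capability]
        return self.errors[capability] / total if total else 0.0


def dispatch_with_failover(capability, candidates, send, stats, max_attempts=3):
    """Try candidates in order and move to the next orchestrator on failure.

    `send(orchestrator, capability)` is an assumed callback that raises on error.
    """
    last_error = None
    for attempt, orchestrator in enumerate(candidates):
        if attempt >= max_attempts:
            break
        started = time.monotonic()
        try:
            result = send(orchestrator, capability)
        except Exception as exc:
            elapsed_ms = (time.monotonic() - started) * 1000
            stats.record(capability, elapsed_ms, ok=False)
            last_error = exc
            continue  # node churn: fall through to the next candidate
        elapsed_ms = (time.monotonic() - started) * 1000
        stats.record(capability, elapsed_ms, ok=True)
        return result
    raise RuntimeError(f"no orchestrator could serve {capability!r}") from last_error
```

Every attempt, failed or not, is recorded under the capability name, so `stats.p95_ms("depth")` and `stats.error_rate("depth")` reflect what callers of that capability actually experienced.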
What gateway operators do not do
- Run model containers directly
- Host model weights as the primary inference service
- Expose orchestrator-internal model identifiers as public API contracts
Routing best practices
- Treat capabilities as the API contract (image-to-image, depth, segmentation, etc.)
- Avoid coupling routing to specific model names
- Maintain per-capability health and route around degraded nodes
- Set explicit max-price limits so jobs are not assigned at uneconomic rates
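A minimal sketch of how these practices might combine when choosing orchestrators for a request. The `Offer` record, its field names, and the ranking order (price first, then p95 latency) are assumptions made for illustration; they are not taken from any gateway implementation. Routing is keyed on the capability string only, never on a model name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One orchestrator's offer for one capability (hypothetical shape)."""
    orchestrator: str
    capability: str        # e.g. "image-to-image", "depth", "segmentation"
    price: float           # price per unit of work, in whatever unit the gateway uses
    p95_latency_ms: float  # recent p95 latency for this capability
    healthy: bool          # per-capability health, not whole-node health


def rank_offers(offers, capability, max_price):
    """Return eligible offers for a capability, cheapest and then fastest first."""
    eligible = [
        offer for offer in offers
        if offer.capability == capability  # capability is the contract
        and offer.healthy                  # route around degraded nodes
        and offer.price <= max_price       # enforce the max-price setting
    ]
    return sorted(eligible, key=lambda o: (o.price, o.p95_latency_ms))


offers = [
    Offer("orch-a", "depth", price=5.0, p95_latency_ms=800.0, healthy=True),
    Offer("orch-b", "depth", price=3.0, p95_latency_ms=1200.0, healthy=True),
    Offer("orch-c", "depth", price=2.0, p95_latency_ms=500.0, healthy=False),
    Offer("orch-d", "depth", price=9.0, p95_latency_ms=300.0, healthy=True),
    Offer("orch-e", "segmentation", price=1.0, p95_latency_ms=200.0, healthy=True),
]

ranked = rank_offers(offers, "depth", max_price=6.0)
print([offer.orchestrator for offer in ranked])  # ['orch-b', 'orch-a']
```

Here `orch-c` is skipped because it is unhealthy for `depth`, `orch-d` because it exceeds the max price, and `orch-e` because it offers a different capability. A gateway that values latency over price would only need a different sort key.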
Operational flow
Developer handoff
For BYOC implementation and container design details, use the developer guide: Developer BYOC
Full architecture and setup for teams deploying BYOC containers.