For production workloads, operators often need more control. This guide covers manual selection, quality tiering, and failover configuration.
Selection Algorithm
Before tuning selection, it helps to understand what the Gateway does by default. The scoring algorithm is a weighted combination of four factors, each adjustable via flags.
For AI Gateways using an explicit Orchestrator list (-orchAddr), the selection algorithm is simpler: the Gateway round-robins across the listed Orchestrators while respecting capability and price filters.
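The round-robin behaviour for an explicit Orchestrator list can be pictured with a small sketch. This is not go-livepeer code: the Orchestrator class, its fields, and the RoundRobinSelector name are hypothetical and chosen only for illustration. The sketch assumes each listed Orchestrator advertises a set of capabilities and a single price.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Orchestrator:
    # Hypothetical record; these fields are not go-livepeer types.
    addr: str
    capabilities: FrozenSet[str]
    price_per_unit: int


class RoundRobinSelector:
    """Cycle through an explicit list, skipping entries that fail the filters."""

    def __init__(self, orchestrators: Iterable[Orchestrator]) -> None:
        self._orchs = list(orchestrators)
        self._next = 0  # index at which the next scan starts

    def pick(self, capability: str, max_price: int) -> Optional[Orchestrator]:
        n = len(self._orchs)
        # Scan at most one full lap, starting where the previous pick stopped.
        for offset in range(n):
            idx = (self._next + offset) % n
            orch = self._orchs[idx]
            if capability in orch.capabilities and orch.price_per_unit <= max_price:
                self._next = (idx + 1) % n
                return orch
        return None  # nothing in the list passes both filters
```

Filtering inside the scan, rather than before it, keeps the rotation position stable when prices or capabilities change between requests.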
Workload Criteria
Different workloads have different priorities:
Orchestrator Settings
Tiering Strategy
Operators running a gateway-as-service with SLA commitments route different customer tiers to different Orchestrator quality levels. Since go-livepeer does not natively support named tiers, the pattern is to run separate Gateway instances per tier, each with a different configuration.
Failover Behaviour
Automatic swaps
When an Orchestrator fails mid-job, the Gateway automatically swaps to the next candidate in its selection pool. For video transcoding, the stream continues on a new Orchestrator with segments re-attempted. For AI inference, the request is retried on a different Orchestrator.
The number of retry attempts before the job fails is controlled by -maxAttempts (video transcoding). For AI jobs, the retry behaviour depends on the pipeline.
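The swap-and-retry behaviour described above can be sketched as a bounded loop over the candidate pool. This is an illustration only, not the Gateway's implementation: run_with_failover, JobFailed, and the submit callback are hypothetical names, and max_attempts merely stands in for the role that -maxAttempts plays for video transcoding.

```python
from typing import Any, Callable, Iterable, List, Tuple


class JobFailed(Exception):
    """Raised when every permitted attempt has failed."""


def run_with_failover(
    candidates: Iterable[str],
    submit: Callable[[str], Any],
    max_attempts: int,
) -> Tuple[str, Any]:
    """Try candidates in order; return (orchestrator, result) on first success."""
    errors: List[Tuple[str, Exception]] = []
    for attempt, orch in enumerate(candidates):
        if attempt >= max_attempts:
            break  # attempt budget spent, stop swapping
        try:
            return orch, submit(orch)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append((orch, exc))
    raise JobFailed(f"job failed after {len(errors)} attempt(s): {errors}")
```

Under this sketch, a lower attempt limit fails faster, while a higher one gives a flaky pool more chances at the cost of added latency.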
Reducing swaps
High Orchestrator swap rates indicate instability in the Orchestrator pool, such as under-resourced machines or network issues. To reduce swaps:
- Increase -minPerfScore to exclude poorly performing Orchestrators proactively
- Add -orchMinLivepeerVersion to exclude outdated nodes
- Review the -maxPricePerUnit or -maxPricePerCapability ceiling: if it is too low, the Gateway may be cycling through marginal Orchestrators
Use the livepeer_orchestrator_swaps Prometheus counter to track swap frequency over time.
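To turn counter readings into a swap rate, the increase between samples can be divided by the elapsed time. The sketch below assumes the livepeer_orchestrator_swaps values have already been scraped into (unix_seconds, value) pairs; the function name and sample format are hypothetical. It treats any drop in value as a counter reset, similar in spirit to how Prometheus handles resets, but without extrapolation.

```python
from typing import List, Tuple


def swaps_per_minute(samples: List[Tuple[float, float]]) -> float:
    """samples: (unix_seconds, counter_value) pairs ordered by time."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero (e.g. process restart),
        # so the whole current value counts as new swaps.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed * 60 if elapsed > 0 else 0.0
```

A sustained rise in this rate after a configuration change is a hint that the new filters are admitting less stable Orchestrators.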
Discovery timeout
If the Gateway cannot find a suitable Orchestrator within the discovery window, the job fails. The timeout is configurable.
Increase it if jobs fail because no Orchestrator was found, particularly at startup or after a blocklist change. Decrease it for faster failure detection in latency-sensitive applications.
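The trade-off can be seen in a minimal sketch of a discovery window: candidates are probed in parallel, and only those that answer before the deadline are kept. The discover function, the probe callback, and NoOrchestratorFound are hypothetical names, and the sketch makes no claim about how the Gateway implements discovery.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List, Sequence


class NoOrchestratorFound(Exception):
    """Raised when no candidate answers inside the discovery window."""


def discover(
    addrs: Sequence[str],
    probe: Callable[[str], bool],
    timeout_s: float,
) -> List[str]:
    """Return the addresses whose probe succeeded within timeout_s seconds."""
    if not addrs:
        raise NoOrchestratorFound("no candidates to probe")
    pool = ThreadPoolExecutor(max_workers=len(addrs))
    try:
        futures = {pool.submit(probe, addr): addr for addr in addrs}
        # Block until every probe finishes or the window closes.
        done, _late = wait(futures, timeout=timeout_s)
        ok = {futures[f] for f in done if f.exception() is None and f.result()}
    finally:
        pool.shutdown(wait=False)  # do not hold the caller for slow probes
    found = [addr for addr in addrs if addr in ok]  # keep the caller's ordering
    if not found:
        raise NoOrchestratorFound(f"no response within {timeout_s}s")
    return found
```

A longer window admits slower Orchestrators into the result, while a shorter one surfaces the failure sooner.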
AI Capability Matching
For AI Gateways, Orchestrator selection is driven by capability matching before any price or performance scoring applies. The Gateway only considers Orchestrators that declare support for the requested pipeline and model. Capability information is returned by /getNetworkCapabilities. To inspect which Orchestrators support a specific model:
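The matching step itself can be sketched as follows, assuming capability data has already been fetched and flattened into a list of dictionaries. The field names (addr, capabilities) and the (pipeline, model) keys are hypothetical and do not reflect the actual response shape of /getNetworkCapabilities; the pipeline and model names in the usage lines are placeholders.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape: one dict per Orchestrator, with a price for each
# (pipeline, model) pair it declares support for.
OrchInfo = Dict[str, object]


def match_capability(
    orchestrators: List[OrchInfo],
    pipeline: str,
    model_id: str,
    max_price: Optional[int] = None,
) -> List[str]:
    """Return addresses that support the pipeline/model, then apply the price cap."""
    matches: List[str] = []
    for orch in orchestrators:
        caps: Dict[Tuple[str, str], int] = orch["capabilities"]  # type: ignore[assignment]
        price = caps.get((pipeline, model_id))
        if price is None:
            continue  # capability filter runs first: unsupported model
        if max_price is not None and price > max_price:
            continue  # price filter only applies to capable Orchestrators
        matches.append(str(orch["addr"]))
    return matches


network = [
    {"addr": "orch-a", "capabilities": {("text-to-image", "model-x"): 40}},
    {"addr": "orch-b", "capabilities": {("text-to-image", "model-y"): 10}},
    {"addr": "orch-c", "capabilities": {("text-to-image", "model-x"): 90}},
]
supported = match_capability(network, "text-to-image", "model-x")
affordable = match_capability(network, "text-to-image", "model-x", max_price=50)
```

Here orch-b never reaches the price check for model-x, even though it is the cheapest entry, because it does not declare that model.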
/getNetworkCapabilities
-maxPricePerCapability
-ignoreMaxPriceIfNeeded