A Gateway routes jobs to Orchestrators instead of processing them locally, but it still has resource limits: session capacity, network throughput, and (for Dual Gateways) the GPU resources shared between video and AI workloads.
This guide covers how to measure those limits, decide when they have been reached, and scale in the right direction.

Scaling Signals

Before adjusting anything, confirm there is actually a capacity problem. These are the reliable indicators.
When the Prometheus metric livepeer_current_sessions_total approaches livepeer_max_sessions_total, the Gateway is at capacity and will start rejecting new sessions. This is the clearest signal.
curl http://localhost:8935/metrics | grep livepeer_current_sessions
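The session-capacity check above can be automated with a small polling script. A minimal sketch in Python: the metric names are the ones from this page, but the parsing helper and the idea of alerting on a utilization ratio are illustrative, not part of the Gateway itself.

```python
import re
import urllib.request

def parse_metric(metrics_text: str, name: str) -> float:
    """Extract the first sample value for a Prometheus metric by name.

    Matches lines like:  livepeer_current_sessions_total 42
    or with labels:      livepeer_current_sessions_total{kind="gateway"} 42
    """
    pattern = rf"^{re.escape(name)}(?:\{{[^}}]*\}})?\s+([0-9.eE+-]+)"
    m = re.search(pattern, metrics_text, re.MULTILINE)
    if m is None:
        raise KeyError(f"metric {name} not found")
    return float(m.group(1))

def session_utilization(url: str = "http://localhost:8935/metrics") -> float:
    """Return current/max session ratio; values near 1.0 mean the
    Gateway is about to start rejecting new sessions."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode()
    return (parse_metric(text, "livepeer_current_sessions_total")
            / parse_metric(text, "livepeer_max_sessions_total"))
```

Run `session_utilization()` from cron or a monitoring sidecar and page when the ratio crosses whatever threshold you choose.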
For Dual Gateways, the /hardware/stats endpoint reports GPU utilisation and memory:
curl http://localhost:8935/hardware/stats
If VRAM usage is consistently above 85%, the Gateway is at risk of OOM failures on new model loads (AI) or performance degradation on concurrent transcoding segments (video).
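The 85% VRAM rule can be evaluated programmatically against the /hardware/stats response. A sketch, with a loud caveat: the field names (`gpu_info`, `memory_total`, `memory_free`) are assumptions about the response shape, so verify them against your node's actual JSON output before relying on this.

```python
import json
import urllib.request

def vram_alert(gpu: dict, threshold: float = 0.85) -> bool:
    """Flag a GPU whose VRAM usage exceeds the threshold (default 85%).

    NOTE: `memory_total` / `memory_free` are assumed field names --
    check the real /hardware/stats payload on your node.
    """
    used = gpu["memory_total"] - gpu["memory_free"]
    return used / gpu["memory_total"] > threshold

def check_gateway_vram(url: str = "http://localhost:8935/hardware/stats") -> list:
    """Fetch hardware stats and evaluate every reported GPU."""
    with urllib.request.urlopen(url) as resp:
        stats = json.load(resp)
    # `gpu_info` is likewise an assumed key for the per-GPU list.
    return [vram_alert(g) for g in stats.get("gpu_info", [])]
```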
Rising transcoding latency or AI inference latency under load, combined with high Orchestrator swap rates, suggests the Orchestrators being routed to are also under pressure. This is an Orchestrator-side scaling problem: the fix is to expand the Orchestrator pool, not to resize the Gateway.
Errors with OrchestratorCapped in the Gateway log indicate that a downstream Orchestrator has hit its own session limit and rejected the job. Either expand the Orchestrator pool or negotiate higher capacity with preferred Orchestrators.

Capacity Planning

Per GPU, the practical limit for 1080p to 720p/480p/360p transcoding is approximately 8-12 concurrent sessions on a modern NVIDIA T4 and 15-25 on an RTX 3080 or equivalent. Run livepeer_bench on the specific hardware to get a precise number.

Multiply by the number of GPUs in the Orchestrator pool to get total network capacity. The Gateway can route to all of them.

For deposit sizing: video transcoding payments are per-pixel. Estimate the expected pixel throughput (resolution x frame rate x concurrent sessions x hours per day) and size the Arbitrum deposit to cover at least 24 hours of expected traffic.
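The deposit arithmetic above can be sketched as a one-liner. `wei_per_pixel` is a hypothetical parameter standing in for whatever per-pixel price the Orchestrators in your pool actually advertise; read the real value from their pricing, not from this example.

```python
def required_deposit_wei(width: int, height: int, fps: int,
                         sessions: int, hours: float,
                         wei_per_pixel: int) -> int:
    """Estimate the Arbitrum deposit needed to cover `hours` of traffic.

    Pixels processed = source resolution x frame rate x concurrent
    sessions x duration in seconds; deposit = pixels x per-pixel price.
    `wei_per_pixel` is a placeholder for the pool's advertised pricing.
    """
    pixels = width * height * fps * sessions * int(hours * 3600)
    return pixels * wei_per_pixel
```

For example, 10 concurrent 1080p30 sessions for 24 hours is about 5.4e13 source pixels, before multiplying by the price.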
AI inference capacity depends entirely on the model and the Orchestrators in the pool. FLUX.1-dev requires approximately 12-16 GB of VRAM per concurrent inference; smaller SD 1.5 models can fit in 6 GB.

For off-chain AI Gateways, the bottleneck is the number of AI-capable Orchestrators available. Expand the -orchAddr list or use a discovery URL that returns a larger pool.

Set a monitoring alert at 70% of observed capacity. This gives time to provision additional Orchestrators before hitting the ceiling.
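The VRAM figures and the 70% rule above can be turned into rough planning helpers. This is a sketch only: dividing total VRAM by per-model VRAM ignores activation and framework overhead, so treat the result as an upper bound, not a safe operating point.

```python
def inference_slots(gpu_vram_gb: float, model_vram_gb: float) -> int:
    """Upper bound on concurrent inferences of one model per GPU,
    ignoring activation/overhead memory."""
    return int(gpu_vram_gb // model_vram_gb)

def alert_threshold(observed_capacity: int, fraction: float = 0.7) -> int:
    """Session count at which to trigger a scale-out alert,
    per the 70%-of-observed-capacity rule."""
    return round(observed_capacity * fraction)
```

So a 24 GB GPU holds at most one FLUX.1-dev inference at the 16 GB upper estimate, but up to four SD 1.5 inferences at 6 GB each.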
Last modified on March 16, 2026