-maxSessions is the ceiling on concurrent transcoding streams your orchestrator accepts. go-livepeer defaults to 10 sessions. For most hardware, this is either too conservative (leaving GPU capacity unused) or too aggressive (exceeding available bandwidth). The correct value requires measurement.
Two constraints determine the session limit independently:
- Hardware limit — the number of concurrent sessions the GPU transcodes within live segment timing, measured with livepeer_bench
- Bandwidth limit — the number of concurrent sessions your connection carries within available upload bandwidth

Set -maxSessions to min(hardware_limit, bandwidth_limit).
Video transcoding capacity and AI inference capacity use separate limits and separate mechanisms. This page covers video transcoding sessions. AI capacity is configured per pipeline via the capacity field in aiModels.json.
Measuring hardware capacity
livepeer_bench simulates network workloads with segments arriving at live pace across multiple concurrent sessions. It reports a duration ratio: total transcoding time divided by total source duration. A ratio below 1.0 means the GPU is keeping up. A ratio above 1.0 means it is falling behind.
Installing livepeer_bench
livepeer_bench ships with go-livepeer and is distributed alongside the livepeer and livepeer_cli binaries. Verify it is on PATH before running the benchmark.
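A quick availability check might look like the following (assuming the release archive was extracted onto PATH):

```shell
# Confirm the benchmark binary resolves on PATH; prints its location or fails
which livepeer_bench

# Print usage to confirm the binary runs and to review its flags
livepeer_bench -help
```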
Setting up the benchmark
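A typical GPU invocation is sketched below. The input playlist path and transcoding options file are placeholders for your own test assets; -concurrentSessions and -nvidia are the flags referenced elsewhere on this page, but verify the full flag set against your build's -help output:

```shell
# Run 5 concurrent simulated sessions on GPU 0 against a local test source.
# -in: source HLS playlist; -transcodingOptions: JSON list of output renditions.
livepeer_bench \
  -in bbb/source.m3u8 \
  -transcodingOptions transcodingOptions.json \
  -nvidia 0 \
  -concurrentSessions 5
```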
Reading the output
The summary table appears after each run. The column to watch is the duration ratio: the highest -concurrentSessions value that keeps the ratio below 1.0 is your hardware limit.
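To find the ceiling, the benchmark can be repeated at increasing concurrency until the duration ratio crosses 1.0. A sketch, using the same placeholder input paths as assumptions:

```shell
# Sweep session counts; the highest count whose ratio stays below 1.0
# is the hardware limit.
for n in 5 10 15 20 25 30; do
  echo "=== concurrentSessions=$n ==="
  livepeer_bench \
    -in bbb/source.m3u8 \
    -transcodingOptions transcodingOptions.json \
    -nvidia 0 \
    -concurrentSessions "$n"
done
```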
NVENC hardware session caps
Consumer NVIDIA GPUs enforce a hard limit of 3 to 8 concurrent NVENC sessions in the driver, regardless of VRAM or compute capacity. NVIDIA imposes this limit in consumer-grade drivers to differentiate them from professional-grade Quadro and datacenter cards. The benchmark reflects the cap as a sharp ratio jump at the NVENC ceiling, even when VRAM and compute remain available. Professional and datacenter GPUs are outside this consumer driver restriction.

CPU transcoding
For CPU-only setups, omit the -nvidia flag from the benchmark command. Start at -concurrentSessions 1 and increase. CPU transcoding produces significantly higher ratios per session than GPU transcoding. Use the benchmark result directly, and treat older rule-of-thumb figures as historical context only: modern CPUs (Ryzen 9 7950X, Threadripper PRO) handle more sessions than older guidance suggests.
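A CPU-only run is the same command without -nvidia, starting from a single session (input paths are placeholders):

```shell
# CPU benchmark: no -nvidia flag; raise -concurrentSessions until
# the duration ratio approaches 1.0
livepeer_bench \
  -in bbb/source.m3u8 \
  -transcodingOptions transcodingOptions.json \
  -concurrentSessions 1
```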
Calculating bandwidth capacity
Every transcoding session consumes upload and download bandwidth. The current standard rendition set totals approximately 5.65 Mbps upload per stream (sum of all output renditions). Source resolution determines download volume, so budget ~6 Mbps symmetric per stream to cover both directions with margin. Use the upload rate as the primary constraint. Residential connections with 100 Mbps download commonly have 20 to 30 Mbps upload, so the upload cap usually dominates.

Setting maxSessions
The session limit is min(hardware_limit, bandwidth_limit).
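A worked example of the min() calculation, using assumed figures: a 25 Mbps upload link and a benchmark-measured hardware limit of 12 sessions.

```shell
# Bandwidth limit: budget ~6 Mbps per stream against measured upload capacity
upload_mbps=25      # assumed measured upload rate
per_stream_mbps=6   # ~5.65 Mbps of renditions plus margin
bandwidth_limit=$(( upload_mbps / per_stream_mbps ))

hardware_limit=12   # assumed livepeer_bench result (highest count with ratio < 1.0)

# maxSessions = min(hardware_limit, bandwidth_limit)
max_sessions=$(( hardware_limit < bandwidth_limit ? hardware_limit : bandwidth_limit ))
echo "$max_sessions"   # prints 4: the upload cap dominates
```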
Apply the limit in your startup command by passing -maxSessions. In a split deployment, set -maxSessions on both nodes: the orchestrator uses it to track total capacity, and the transcoder uses it to control how many concurrent jobs it accepts.
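For a combined orchestrator/transcoder, the flag is passed at startup. The other flags below are illustrative placeholders for an existing configuration, not a complete recommended command:

```shell
# Combined orchestrator + transcoder with the measured session ceiling
livepeer \
  -orchestrator \
  -transcoder \
  -nvidia 0 \
  -maxSessions 4 \
  -serviceAddr 0.0.0.0:8935
```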
AI inference and VRAM capacity
AI inference capacity is separate from video transcoding capacity. -maxSessions has no effect on AI pipeline concurrency. The capacity field in each aiModels.json entry controls how many concurrent inference requests that pipeline accepts.
VRAM is the binding constraint for AI capacity. A 24 GB GPU holds one large diffusion model warm, or multiple smaller pipelines simultaneously:
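An illustrative aiModels.json shape with per-pipeline capacity. The pipeline names, model IDs, and capacity values are placeholders, not recommendations; note only one entry is warm, per the Beta constraint below.

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "stabilityai/sd-turbo",
    "warm": true,
    "capacity": 2
  },
  {
    "pipeline": "upscale",
    "model_id": "stabilityai/stable-diffusion-x4-upscaler",
    "warm": false,
    "capacity": 1
  }
]
```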
Beta constraint: Only one warm model per GPU is supported during the Beta phase. Additional entries with "warm": true beyond the number of GPUs will cause a conflict at startup. Keep additional pipelines cold or assign them to separate GPUs.
Video vs AI VRAM: NVENC and NVDEC use dedicated hardware blocks and consume minimal VRAM for video transcoding. Running video sessions alongside warm AI models on the same GPU is supported, and AI model footprint remains the main VRAM constraint.
Tuning after going live
The benchmark estimate is a starting point. Live network conditions add variables the benchmark omits:
- Actual segment sizes and bitrates from gateways vary from the test stream
- Upload latency and jitter add overhead beyond raw bandwidth measurements
- Reward calls and ticket redemptions consume CPU and network intermittently
If the node runs cleanly at the initial limit, raise -maxSessions by 1 to 2 and observe. A sudden drop in gateway traffic should send you to the logs first: look for OrchestratorCapped errors, which show the session ceiling is blocking new jobs.
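A quick log check for that condition; the systemd unit name and time window are assumptions about your deployment:

```shell
# Look for capacity errors in recent orchestrator logs (unit name assumed)
journalctl -u livepeer-orchestrator --since "1 hour ago" | grep -i "OrchestratorCapped"
```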
Related pages
AI Inference Operations
Full aiModels.json reference, including the capacity field for AI pipeline concurrency.
Model Management
Warm vs cold strategy and VRAM allocation across multiple AI pipelines.
Metrics and Alerting
Prometheus metrics for transcoding throughput, alerting, and session health.
GPU Support Reference
NVENC session caps by GPU tier and supported hardware matrix.