-maxSessions is the ceiling on concurrent transcoding streams your orchestrator accepts. go-livepeer defaults to 10 sessions. For most hardware, this is either too conservative (leaving GPU capacity unused) or too aggressive (exceeding available bandwidth). The correct value requires measurement.
Two constraints determine the session limit independently:
- Hardware limit — the number of concurrent sessions the GPU transcodes within live segment timing, measured with livepeer_bench
- Bandwidth limit — the number of concurrent sessions your connection carries within available upload bandwidth

Set -maxSessions to min(hardware_limit, bandwidth_limit).
Video transcoding capacity and AI inference capacity use separate limits and separate mechanisms. This page covers video transcoding sessions. AI capacity is configured per pipeline via the capacity field in aiModels.json.
Measuring hardware capacity
livepeer_bench simulates network workloads with segments arriving at live pace across multiple concurrent sessions. It reports a duration ratio: total transcoding time divided by total source duration. A ratio below 1.0 means the GPU is keeping up. A ratio above 1.0 means it is falling behind.
Installing livepeer_bench
livepeer_bench ships with go-livepeer and is distributed alongside the livepeer and livepeer_cli binaries. Verify it is on PATH before running the benchmark.
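A quick availability check might look like the following (assuming the release archive was extracted onto PATH):

```shell
# Confirm the benchmark binary resolves on PATH; prints its location or fails
which livepeer_bench

# Print usage to confirm the binary runs and to review its flags
livepeer_bench -help
```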
Setting up the benchmark
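A typical GPU invocation is sketched below. The input playlist path and transcoding options file are placeholders for your own test assets; -concurrentSessions and -nvidia are the flags referenced elsewhere on this page, but verify the full flag set against your build's -help output:

```shell
# Run 5 concurrent simulated sessions on GPU 0 against a local test source.
# -in: source HLS playlist; -transcodingOptions: JSON list of output renditions.
livepeer_bench \
  -in bbb/source.m3u8 \
  -transcodingOptions transcodingOptions.json \
  -nvidia 0 \
  -concurrentSessions 5
```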
Reading the output
The summary table appears after each run. The column to watch is the duration ratio: the highest -concurrentSessions value that keeps the ratio below 1.0 is your hardware limit.
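To find the ceiling, the benchmark can be repeated at increasing concurrency until the duration ratio crosses 1.0. A sketch, using the same placeholder input paths as assumptions:

```shell
# Sweep session counts; the highest count whose ratio stays below 1.0
# is the hardware limit.
for n in 5 10 15 20 25 30; do
  echo "=== concurrentSessions=$n ==="
  livepeer_bench \
    -in bbb/source.m3u8 \
    -transcodingOptions transcodingOptions.json \
    -nvidia 0 \
    -concurrentSessions "$n"
done
```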
NVENC hardware session caps
Consumer NVIDIA GPUs enforce a hard limit of 3 to 8 concurrent NVENC sessions in the driver, regardless of VRAM or compute capacity. NVIDIA imposes this limit in consumer-grade drivers to differentiate them from professional-grade Quadro and datacenter cards. The benchmark reflects the cap as a sharp ratio jump at the NVENC ceiling, even when VRAM and compute remain available. Professional and datacenter GPUs are outside this consumer driver restriction.

CPU transcoding
For CPU-only setups, omit the -nvidia flag from the benchmark command. Start at -concurrentSessions 1 and increase. CPU transcoding produces significantly higher ratios per session than GPU transcoding. Use the benchmark result directly, and treat older rule-of-thumb figures as historical context only: modern CPUs (Ryzen 9 7950X, Threadripper PRO) handle more sessions than older guidance suggests.
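A CPU-only run is the same command without -nvidia, starting from a single session (input paths are placeholders):

```shell
# CPU benchmark: no -nvidia flag; raise -concurrentSessions until
# the duration ratio approaches 1.0
livepeer_bench \
  -in bbb/source.m3u8 \
  -transcodingOptions transcodingOptions.json \
  -concurrentSessions 1
```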
Calculating bandwidth capacity
Every transcoding session consumes upload and download bandwidth. The current standard rendition set totals approximately 5.65 Mbps upload per stream (sum of all output renditions). Source resolution determines download volume, so budget ~6 Mbps symmetric per stream to cover both directions with margin. Use the upload rate as the primary constraint. Residential connections with 100 Mbps download commonly have 20 to 30 Mbps upload, so the upload cap usually dominates.

Setting maxSessions
The session limit is min(hardware_limit, bandwidth_limit).
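A worked example of the min() calculation, using assumed figures: a 25 Mbps upload link and a benchmark-measured hardware limit of 12 sessions.

```shell
# Bandwidth limit: budget ~6 Mbps per stream against measured upload capacity
upload_mbps=25      # assumed measured upload rate
per_stream_mbps=6   # ~5.65 Mbps of renditions plus margin
bandwidth_limit=$(( upload_mbps / per_stream_mbps ))

hardware_limit=12   # assumed livepeer_bench result (highest count with ratio < 1.0)

# maxSessions = min(hardware_limit, bandwidth_limit)
max_sessions=$(( hardware_limit < bandwidth_limit ? hardware_limit : bandwidth_limit ))
echo "$max_sessions"   # prints 4: the upload cap dominates
```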
Apply the limit in your startup command by passing -maxSessions. In a split deployment, set -maxSessions on both nodes: the orchestrator uses it to track total capacity, and the transcoder uses it to control how many concurrent jobs it accepts.
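For a combined orchestrator/transcoder, the flag is passed at startup. The other flags below are illustrative placeholders for an existing configuration, not a complete recommended command:

```shell
# Combined orchestrator + transcoder with the measured session ceiling
livepeer \
  -orchestrator \
  -transcoder \
  -nvidia 0 \
  -maxSessions 4 \
  -serviceAddr 0.0.0.0:8935
```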
AI inference and VRAM capacity
AI inference capacity is separate from video transcoding capacity. -maxSessions has no effect on AI pipeline concurrency. The capacity field in each aiModels.json entry controls how many concurrent inference requests that pipeline accepts.
VRAM is the binding constraint for AI capacity. A 24 GB GPU holds one large diffusion model warm, or multiple smaller pipelines simultaneously:
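An illustrative aiModels.json shape with per-pipeline capacity. The pipeline names, model IDs, and capacity values are placeholders, not recommendations; note only one entry is warm, per the Beta constraint below.

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "stabilityai/sd-turbo",
    "warm": true,
    "capacity": 2
  },
  {
    "pipeline": "upscale",
    "model_id": "stabilityai/stable-diffusion-x4-upscaler",
    "warm": false,
    "capacity": 1
  }
]
```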
Beta constraint: Only one warm model per GPU is supported during the Beta phase. Additional entries with "warm": true beyond the number of GPUs will cause a conflict at startup. Keep additional pipelines cold or assign them to separate GPUs.
Video vs AI VRAM: NVENC and NVDEC use dedicated hardware blocks and consume minimal VRAM for video transcoding. Running video sessions alongside warm AI models on the same GPU is supported, and AI model footprint remains the main VRAM constraint.
Tuning after going live
The benchmark estimate is a starting point. Live network conditions add variables the benchmark omits:
- Actual segment sizes and bitrates from gateways vary from the test stream
- Upload latency and jitter add overhead beyond raw bandwidth measurements
- Reward calls and ticket redemptions consume CPU and network intermittently
If the node runs cleanly at the initial limit, raise -maxSessions by 1 to 2 and observe. A sudden drop in gateway traffic should send you to the logs first: look for OrchestratorCapped errors, which show the session ceiling is blocking new jobs.
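A quick log check for that condition; the systemd unit name and time window are assumptions about your deployment:

```shell
# Look for capacity errors in recent orchestrator logs (unit name assumed)
journalctl -u livepeer-orchestrator --since "1 hour ago" | grep -i "OrchestratorCapped"
```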
Related pages
AI Inference Operations
Full aiModels.json reference, including the capacity field for AI pipeline concurrency.
Model Management
Warm vs cold strategy and VRAM allocation across multiple AI pipelines.
Metrics and Alerting
Prometheus metrics for transcoding throughput, alerting, and session health.
GPU Support Reference
NVENC session caps by GPU tier and supported hardware matrix.