When you need fleet operations
Single-node operation is appropriate for most orchestrators. You are in fleet territory when these conditions start to apply:
- Your workload requires more GPU capacity than fits in one machine
- You need geographic distribution for latency-sensitive AI workloads
- You want to separate concerns — one node for reward calling, one for ticket redemption, separate GPU workers
- You are operating at data-centre scale with SLA commitments to gateway operators
Multi-orchestrator architecture
go-livepeer supports running multiple orchestrator nodes behind a single on-chain identity. This is documented in doc/multi-o.md in the go-livepeer repository.
The key insight: each orchestrator node has its own keypair and accepts payments on behalf of the same on-chain registered Ethereum address. Node separation allows you to assign specific functions to specific nodes:
Separation patterns:
- A dedicated node for reward calling
- A dedicated node for ticket redemption
- Separate GPU worker nodes for the actual transcoding and AI workloads
This pattern is architecturally similar to the Siphon split setup. The same principle separates reward calling from workload processing, but the split is implemented entirely within go-livepeer instead of through OrchestratorSiphon.
doc/multi-o.md — go-livepeer
The canonical multi-orchestrator architecture documentation in the go-livepeer repository.
Scaling GPU workers
Whether you are running a single orchestrator or a fleet, GPU workers scale horizontally. Each worker connects to an orchestrator with -orchSecret, and the orchestrator distributes segments across all connected workers.
Adding capacity:
- Provision a new machine with NVIDIA GPU and drivers
- Install go-livepeer
- Start in transcoder mode:
Add a worker to the fleet
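A minimal sketch of the worker start command. The -transcoder, -orchAddr, -orchSecret, -nvidia, and -maxSessions flags are standard go-livepeer flags; the hostname, secret variable, and session count below are illustrative placeholders.

```shell
# Start go-livepeer in transcoder (worker) mode and point it at the
# orchestrator. Hostname and secret are placeholders for your fleet.
livepeer \
  -transcoder \
  -orchAddr orch.example.com:8935 \
  -orchSecret "$ORCH_SECRET" \
  -nvidia all \
  -maxSessions 10
```

Once the process starts and authenticates with the shared secret, the orchestrator begins routing work to it.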
- The orchestrator immediately begins routing to the new worker — no configuration change on the orchestrator required
Capacity management at fleet scale
Each worker advertises its capacity (the -maxSessions value). The orchestrator tracks capacity across all connected workers and routes jobs accordingly. There is no manual load balancing step — go-livepeer handles distribution internally.
What to monitor fleet-wide:
- Session load per worker relative to its -maxSessions capacity
- Worker connect and disconnect events
- Payment and ticket redemption activity per node
- GPU utilisation and health on each worker
For Prometheus fleet monitoring, run the livepeer/livepeer-monitoring Docker image configured with all worker node addresses:
Run the monitoring stack for a fleet
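One way to wire this up, sketched under the assumption that each node runs with -monitor and serves Prometheus metrics on its CLI webserver port (7935 by default). The hostnames are placeholders, and the exact configuration format expected by the livepeer-monitoring image may differ; this writes a plain Prometheus scrape config you can point any Prometheus instance at.

```shell
# Write a Prometheus scrape config listing every node in the fleet.
# Hostnames are placeholders; go-livepeer exposes /metrics on the CLI
# webserver port (7935 by default) when started with -monitor.
cat > prometheus.yml <<'EOF'
scrape_configs:
  - job_name: livepeer-fleet
    metrics_path: /metrics
    static_configs:
      - targets:
          - orch-1.example.com:7935
          - worker-1.example.com:7935
          - worker-2.example.com:7935
EOF
```

Mount this file into the monitoring stack (or a stock prom/prometheus container) so every node in the fleet is scraped from one place.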
Rolling updates
Updating a single-node orchestrator drops all in-flight sessions. At fleet scale, you can do rolling updates to minimise disruption.
Basic rolling update procedure:
- Remove one node from the rotation. Update the load balancer or -orchAddr configs to stop routing to the node being updated. Wait for in-flight sessions to complete (typically a few minutes).
- Update the node. Pull the new go-livepeer binary and restart the service.
- Verify the updated node. Confirm it connects and is receiving sessions before proceeding.
- Repeat for remaining nodes.
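The steps above can be sketched as a shell loop, assuming each node runs go-livepeer as a systemd service reachable over SSH; the hostnames, service name, drain time, and release URL placeholder are all illustrative:

```shell
#!/bin/sh
# Rolling update sketch. Hostnames, service name, and the release URL
# are placeholders; adapt the drain step to your routing setup.
set -eu

NODES="orch-1.example.com orch-2.example.com orch-3.example.com"

for node in $NODES; do
  echo "Updating $node"
  # 1. Stop routing to this node first (load balancer / -orchAddr
  #    config), then wait for in-flight sessions to drain.
  sleep 300

  # 2. Replace the binary and restart the service.
  ssh "$node" 'curl -fsSL -o /usr/local/bin/livepeer.new "<RELEASE_URL>" \
    && chmod +x /usr/local/bin/livepeer.new \
    && mv /usr/local/bin/livepeer.new /usr/local/bin/livepeer \
    && systemctl restart livepeer'

  # 3. Verify the node is back up before moving to the next one.
  ssh "$node" 'systemctl is-active livepeer'
done
```

Pausing between nodes until the updated one is confirmed healthy keeps at least N-1 nodes serving at all times.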
Workers reconnect automatically when an orchestrator restarts. From a worker’s perspective, the orchestrator briefly disappears and then reappears. No manual action is needed on the worker side.
Network and key management at scale
Fleet operations introduce key management complexity that does not exist on single-node deployments. Key considerations:
- Each orchestrator node needs access to the same Ethereum keystore to accept payments on behalf of your on-chain address. Distribute the keystore file carefully — only over encrypted channels, with restricted file permissions on each machine.
- For reward calling, you want exactly one node calling reward() per round. Running reward calling on multiple nodes risks duplicate submissions and wasted gas. Designate a single node for reward calling and set -reward=false on all others.
- For ticket redemption, the Redeemer can be run as a separate process. See doc/redeemer.md in the go-livepeer repository.
- Static IPs or stable DNS names are essential at fleet scale — your service URI is stored on-chain and must resolve consistently.
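A sketch of the keystore distribution and reward split, assuming the default go-livepeer data directory layout (~/.lpData/&lt;network&gt;/keystore); the hostname and network directory are placeholders:

```shell
# Copy the keystore over SSH only (an encrypted channel) and restrict
# file permissions on the destination. Hostname and network directory
# are placeholders; adjust to your data directory layout.
scp ~/.lpData/arbitrum-one-mainnet/keystore/* \
  node-2.example.com:~/.lpData/arbitrum-one-mainnet/keystore/
ssh node-2.example.com \
  'chmod 600 ~/.lpData/arbitrum-one-mainnet/keystore/*'

# On every node except the designated reward caller, disable reward
# calling explicitly (add your usual orchestrator flags).
livepeer -orchestrator -reward=false
```

With this split, only the designated node submits reward() each round; every other node accepts work and payments without competing to call reward.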
Enterprise and data-centre onboarding
If you are operating at data-centre scale, across multiple co-location sites, or with commercial-grade SLA requirements, the Livepeer Foundation offers direct engagement support.
Contact Livepeer Foundation
For enterprise and data-centre operators. Direct support for fleet integration, custom gateway relationships, and commercial partnership discussions.
Run a Pool
Pool operations — accepting worker connections and managing off-chain payouts.
Split O-T Setup
The foundational split between orchestrator and transcoder processes.
Siphon Setup
The reward-safe split setup using OrchestratorSiphon.
Metrics and Monitoring
Scaling Prometheus monitoring to a multi-node fleet.