Orchestrator Monitoring

Orchestrator health on the Livepeer Network is evaluated by Gateways in real time: nodes that fail to return segments or AI inference results are removed from the active selection pool. Monitoring your Orchestrator means tracking the same signals Gateways use to evaluate it, so you can act before they start routing work away.

Transcode Score

The transcode score is the ratio of successfully returned segments to submitted segments. The broadcaster side reports it as livepeer_broadcaster_transcode_score; on the Orchestrator side it can be derived from the node's own segment counts. A score of 1.0 means every segment is completing; a score below 0.9 indicates a failure pattern that needs investigation. On the Orchestrator side, watch:

livepeer_orchestrator_segments_transcoded_total: total completions
livepeer_orchestrator_transcode_time_seconds: per-segment latency histogram
livepeer_monitor_num_orchestrators: whether the Orchestrator appears in the broadcaster’s active pool
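
As a rough illustration of the ratio described above, the following Python sketch computes a transcode score from two segment counts and compares it with the 0.9 threshold. The function names and the sample counts are hypothetical; how you obtain the counts (for example, from scraped counter values) is left out.

```python
def transcode_score(returned: int, submitted: int) -> float:
    """Ratio of successfully returned segments to submitted segments."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive segment count")
    return returned / submitted


def needs_investigation(score: float, threshold: float = 0.9) -> bool:
    """True when the score falls below the threshold named in the text."""
    return score < threshold


# Illustrative counts only, not measurements from a real node.
for returned, submitted in [(1000, 1000), (940, 1000), (850, 1000)]:
    score = transcode_score(returned, submitted)
    print(f"score={score:.2f} investigate={needs_investigation(score)}")
```

Only the last of the three sample cases falls below 0.9 and would be flagged.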

If the Orchestrator disappears from a Gateway’s active pool unexpectedly, check: price configuration (has -pricePerUnit moved above the Gateway’s maximum?), connectivity (is -serviceAddr reachable?), and segment failure rate.
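
The three checks above can be sketched as a small triage helper. This is a hypothetical function, not part of go-livepeer: the inputs are values you would gather yourself (your configured price, the Gateway's maximum if you know it, the result of a reachability probe, and an observed failure rate), and the 0.1 default simply mirrors the 0.9 score threshold.

```python
from typing import List


def diagnose_pool_dropout(
    price_per_unit: int,
    gateway_max_price: int,
    service_addr_reachable: bool,
    segment_failure_rate: float,
    max_failure_rate: float = 0.1,  # assumption: complements the 0.9 score threshold
) -> List[str]:
    """Return the likely causes, in the order the text lists them."""
    findings = []
    if price_per_unit > gateway_max_price:
        findings.append("price: -pricePerUnit exceeds the Gateway's maximum")
    if not service_addr_reachable:
        findings.append("connectivity: -serviceAddr is not reachable")
    if segment_failure_rate > max_failure_rate:
        findings.append("failures: segment failure rate is too high")
    return findings


# Illustrative values only.
for finding in diagnose_pool_dropout(1200, 1000, True, 0.02):
    print(finding)
```

An empty result means none of the three checks explains the dropout, which points to something outside this list.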

AI Inference Health

For AI-enabled Orchestrators, additional signals indicate pipeline health:

livepeer_ai_requests_total: total AI inference requests received
livepeer_ai_request_duration_seconds: inference latency histogram per pipeline
livepeer_ai_errors_total: failed inference requests

A rising error rate on a specific pipeline type usually indicates a model load failure, VRAM exhaustion, or a container crash. Check the ai-runner container logs alongside the metric.
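
To spot which pipeline is failing, one option is to compare two snapshots of the request and error counters and compute an error ratio per pipeline. The sketch below assumes the counters can be broken down per pipeline (the text only states this for the latency histogram), and the pipeline names and numbers are purely illustrative.

```python
from typing import Dict, Tuple

# pipeline name -> (requests_total, errors_total) at one point in time
Snapshot = Dict[str, Tuple[int, int]]


def pipeline_error_ratios(before: Snapshot, after: Snapshot) -> Dict[str, float]:
    """Error ratio per pipeline over the interval between two snapshots."""
    ratios = {}
    for pipeline, (req_after, err_after) in after.items():
        req_before, err_before = before.get(pipeline, (0, 0))
        new_requests = req_after - req_before
        new_errors = err_after - err_before
        if new_requests <= 0 or new_errors < 0:
            # No traffic in the interval, or a counter reset after a restart.
            continue
        ratios[pipeline] = new_errors / new_requests
    return ratios


# Hypothetical counter values, not real measurements.
before = {"text-to-image": (100, 2), "image-to-video": (50, 1)}
after = {"text-to-image": (200, 4), "image-to-video": (60, 6)}
for name, ratio in pipeline_error_ratios(before, after).items():
    print(f"{name}: {ratio:.0%} of requests failed")
```

A pipeline whose ratio stands out from the others is the one whose ai-runner container logs are worth reading first.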

GPU Metrics

go-livepeer does not expose NVIDIA GPU metrics directly. Supplement Prometheus scraping with the nvidia_gpu_exporter (or nvidia-smi exporter) to track:

GPU utilisation percentage
VRAM usage vs capacity
GPU temperature
Power draw

High VRAM utilisation during model loading periods is expected; persistent saturation during inference indicates the Orchestrator is accepting more concurrent sessions than available VRAM supports. Community Grafana dashboards from the Livepeer Orchestrator community (available at grafana.com/grafana/dashboards) combine go-livepeer metrics with NVIDIA GPU exporter data.
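
The difference between a load-time spike and persistent saturation can be checked with a few samples. The sketch below parses lines in the shape produced by `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu,power.draw --format=csv,noheader,nounits` and flags saturation only when most samples are near capacity. The 95% and 80% thresholds are assumptions chosen for illustration, and the sample lines are made up.

```python
from typing import Dict, List

FIELDS = ["utilization_pct", "vram_used_mib", "vram_total_mib", "temperature_c", "power_w"]


def parse_gpu_line(line: str) -> Dict[str, float]:
    """Parse one CSV line of GPU readings into named fields.

    Raises ValueError on malformed input, including non-numeric
    readings such as "[N/A]" that some GPUs report for power draw.
    """
    values = [float(v.strip()) for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def vram_persistently_saturated(
    samples: List[Dict[str, float]],
    threshold: float = 0.95,    # assumption: "saturated" means >= 95% of VRAM in use
    min_fraction: float = 0.8,  # assumption: "persistent" means >= 80% of samples
) -> bool:
    """True when most samples sit at or above the VRAM threshold."""
    if not samples:
        return False
    saturated = sum(
        1 for s in samples if s["vram_used_mib"] / s["vram_total_mib"] >= threshold
    )
    return saturated / len(samples) >= min_fraction


# Made-up readings: one spike during model load, then normal usage.
lines = ["40, 24000, 24576, 65, 200.5"] + ["55, 12000, 24576, 68, 230.0"] * 4
samples = [parse_gpu_line(line) for line in lines]
print(vram_persistently_saturated(samples))
```

With a single spike among five samples the check stays quiet; it only fires when near-capacity readings dominate the window.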

External Health Tools

Two community tools provide external visibility into Orchestrator performance:

Stream Tester (livepeer.tools): Sends test streams to your Orchestrator from multiple regions and reports segment success rates, latency, and quality scores. The test stream dashboard is public; your Orchestrator’s performance history is visible to Delegators evaluating whether to stake with you.
AI Inference Tester (livepeer.tools): Sends test AI inference requests to your Orchestrator and reports response times and error rates per pipeline type.

Both tools query your Orchestrator’s public -serviceAddr endpoint. They are external to your node and provide ground-truth measurements from the network’s perspective.

Alerting

Configure Prometheus alerting rules for the signals that matter most:

groups:
  - name: livepeer-orchestrator
    rules:
      - alert: OrchestratorHighErrorRate
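        # rate() on its own yields errors per second, so divide by attempted segments to get a ratio.
        # Assumes livepeer_orchestrator_segments_transcoded_total counts only successful completions.
        expr: |
          rate(livepeer_orchestrator_transcode_errors_total[5m])
            / (
              rate(livepeer_orchestrator_transcode_errors_total[5m])
              + rate(livepeer_orchestrator_segments_transcoded_total[5m])
            ) > 0.1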
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Orchestrator error rate above 10% for 2 minutes"

      - alert: OrchestratorNotInActivePool
        expr: livepeer_monitor_num_orchestrators == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Orchestrator absent from broadcaster active pool for 5 minutes"

Route alerts to PagerDuty, Slack, or email via Alertmanager. The Livepeer community also maintains a Telegram bot (Livepeer Reward Watcher) that alerts when an Orchestrator misses a reward call.

The transcode score is the single most useful health indicator. A score below 0.9 means segments are not returning successfully and Gateways will deprioritise the Orchestrator.

Related Pages

Tooling and Metrics: Setting up Prometheus and the Livepeer/monitor Docker container.
AI Pipelines: Pipeline types and the aiModels.json configuration that affects AI metrics.
