Orchestrator health on the Livepeer network is evaluated by gateways in real time: nodes that fail to return segments or AI inference results are removed from the active selection pool. Monitoring your orchestrator means tracking the same signals gateways use to evaluate it, so that you can catch problems before gateways start routing work away.

Transcode Score

The transcode score is the ratio of successfully returned segments to submitted segments. It is exposed on the broadcaster side as livepeer_broadcaster_transcode_score, and can be estimated on the orchestrator from the node’s own segment counts. A score of 1.0 means every segment is completing; a score below 0.9 indicates a failure pattern that needs investigation. On the orchestrator side, watch:
  • livepeer_orchestrator_segments_transcoded_total: total completions
  • livepeer_orchestrator_transcode_time_seconds: per-segment latency histogram
  • livepeer_monitor_num_orchestrators: whether the orchestrator appears in the broadcaster’s active pool
If the orchestrator disappears from a gateway’s active pool unexpectedly, check: price configuration (has -pricePerUnit moved above the gateway’s maximum?), connectivity (is -serviceAddr reachable?), and segment failure rate.
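
The score calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not go-livepeer's implementation: SegmentCounts, transcode_score, and needs_investigation are hypothetical names, and the sketch assumes you can sample a submitted-segment count and a returned-segment count at two points in time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SegmentCounts:
    """Cumulative segment counters sampled at one moment (hypothetical container)."""
    submitted: int
    returned: int


def transcode_score(earlier: SegmentCounts, later: SegmentCounts) -> Optional[float]:
    """Ratio of returned to submitted segments between two samples.

    Returns None when no segments were submitted in the window (or a counter
    reset made the delta negative), because the ratio is undefined there.
    """
    submitted = later.submitted - earlier.submitted
    returned = later.returned - earlier.returned
    if submitted <= 0 or returned < 0:
        return None
    return returned / submitted


def needs_investigation(score: Optional[float], threshold: float = 0.9) -> bool:
    """True when a defined score falls below the 0.9 threshold named in the text."""
    return score is not None and score < threshold
```

For example, 160 returned out of 200 submitted segments in a window gives a score of 0.8, which falls below the 0.9 threshold and would be flagged.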

AI Inference Health

For AI-enabled orchestrators, additional signals indicate pipeline health:
  • livepeer_ai_requests_total: total AI inference requests received
  • livepeer_ai_request_duration_seconds: inference latency histogram per pipeline
  • livepeer_ai_errors_total: failed inference requests
A rising error rate on a specific pipeline type usually indicates a model load failure, VRAM exhaustion, or a container crash. Check the ai-runner container logs alongside the metric.
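
To see which pipeline is failing, the two counters above can be compared per pipeline. The sketch below parses text in the Prometheus exposition format and computes errors divided by requests. It assumes the samples carry a label named pipeline, which is an assumption of this example and not confirmed by the source; error_ratio_by_pipeline is a hypothetical helper. Because it reads cumulative counters, the ratio covers the whole process lifetime, so compare two scrapes if you need a recent window.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches a sample line such as:
#   livepeer_ai_errors_total{pipeline="text-to-image"} 3
# The "pipeline" label name is an assumption made for this sketch.
SAMPLE = re.compile(
    r'^(?P<name>\w+)\{[^}]*pipeline="(?P<pipeline>[^"]+)"[^}]*\}\s+(?P<value>\S+)'
)


def error_ratio_by_pipeline(metrics_text: str) -> Dict[str, float]:
    """Return errors/requests for each pipeline that has received requests."""
    requests = defaultdict(float)
    errors = defaultdict(float)
    for line in metrics_text.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # comments, blank lines, and samples without the label
        value = float(match["value"])
        if match["name"] == "livepeer_ai_requests_total":
            requests[match["pipeline"]] += value
        elif match["name"] == "livepeer_ai_errors_total":
            errors[match["pipeline"]] += value
    # Pipelines with no recorded errors default to a ratio of 0.0.
    return {p: errors[p] / total for p, total in requests.items() if total > 0}
```

A pipeline whose ratio climbs between scrapes while the others stay flat is the one whose ai-runner container logs are worth reading first.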

GPU Metrics

go-livepeer does not expose NVIDIA GPU metrics directly. Supplement Prometheus scraping with the nvidia_gpu_exporter (or nvidia-smi exporter) to track:
  • GPU utilisation percentage
  • VRAM usage vs capacity
  • GPU temperature
  • Power draw
High VRAM utilisation during model loading periods is expected; persistent saturation during inference indicates the orchestrator is accepting more concurrent sessions than available VRAM supports. Community Grafana dashboards from the Livepeer orchestrator community (available at grafana.com/grafana/dashboards) combine go-livepeer metrics with NVIDIA GPU exporter data.
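
The distinction between a brief VRAM spike during model loading and persistent saturation can be expressed as a check over recent samples. This is a rough sketch: the 0.95 threshold and the five-sample window are illustrative assumptions, and vram_fraction and persistently_saturated are hypothetical helpers that expect readings you have already collected from your GPU exporter.

```python
from typing import Sequence


def vram_fraction(used_bytes: float, total_bytes: float) -> float:
    """Fraction of VRAM in use, between 0.0 and 1.0 for valid readings."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def persistently_saturated(
    fractions: Sequence[float],
    threshold: float = 0.95,  # assumed cut-off for "saturated"
    min_samples: int = 5,     # assumed number of consecutive readings
) -> bool:
    """True when the most recent min_samples readings are all at or above threshold.

    A single spike (for example while a model loads) does not trigger this;
    only a sustained run of high readings does.
    """
    if len(fractions) < min_samples:
        return False
    return all(f >= threshold for f in fractions[-min_samples:])
```

If the check stays true while inference is running, reducing the number of concurrent sessions the orchestrator accepts is the first thing to try.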

External Health Tools

Two community tools provide external visibility into orchestrator performance:
  • Stream Tester (livepeer.tools): sends test streams to your orchestrator from multiple regions and reports segment success rates, latency, and quality scores. The test stream dashboard is public; your orchestrator’s performance history is visible to delegators evaluating whether to stake with you.
  • AI Inference Tester (livepeer.tools): sends test AI inference requests to your orchestrator and reports response times and error rates per pipeline type.
Both tools query your orchestrator’s public -serviceAddr endpoint. They are external to your node and provide ground-truth measurements from the network’s perspective.
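
Because external tools reach the node through its public -serviceAddr endpoint, a basic first check is whether that address accepts TCP connections from outside your network. The sketch below uses only the Python standard library; split_service_addr and is_reachable are hypothetical helper names, and a successful connection only shows the port is open, not that the orchestrator is serving work correctly.

```python
import socket
from typing import Tuple


def split_service_addr(service_addr: str) -> Tuple[str, int]:
    """Split a host:port string into its parts.

    Bracketed IPv6 literals are not handled in this sketch.
    """
    host, sep, port = service_addr.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {service_addr!r}")
    return host, int(port)


def is_reachable(service_addr: str, timeout: float = 3.0) -> bool:
    """True if a TCP connection to the address succeeds within the timeout."""
    host, port = split_service_addr(service_addr)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run it from a machine outside the orchestrator's network, for example `is_reachable("orchestrator.example.com:8935")` with your own address substituted for the placeholder, so the result reflects what gateways see.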

Alerting

Configure Prometheus alerting rules for the signals that matter most:
groups:
  - name: livepeer-orchestrator
    rules:
      - alert: OrchestratorHighErrorRate
        expr: rate(livepeer_orchestrator_transcode_errors_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Orchestrator transcode errors above 0.1 per second for 2 minutes"

      - alert: OrchestratorNotInActivePool
        expr: livepeer_monitor_num_orchestrators == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Orchestrator absent from broadcaster active pool for 5 minutes"
Route alerts to PagerDuty, Slack, or email via Alertmanager. The Livepeer community also maintains a Telegram bot (Livepeer Reward Watcher) that alerts when an orchestrator misses a reward call.

The transcode score is the single most useful health indicator. A score below 0.9 means segments are not returning successfully and gateways will deprioritise the orchestrator.

Tooling and Metrics

Setting up Prometheus and the livepeer/monitor Docker container.

AI Pipelines

Pipeline types and the aiModels.json configuration that affects AI metrics.
Last modified on May 19, 2026