Use this page to instrument the node, collect the metrics that matter, and wire up alerts before failures turn into missed jobs or missed rewards. Logs explain individual incidents; metrics show whether throughput, latency, capacity, and ticket flow are moving in the right direction.

Enabling node metrics

go-livepeer exposes a standard Prometheus endpoint when you pass the -monitor flag at startup.
Enable monitoring flags
livepeer \
  -orchestrator \
  -transcoder \
  -monitor \
  -metricsPerStream \
  -network arbitrum-one-mainnet \
  # ... other flags
Once the node is running, the metrics endpoint is available at http://localhost:7935/metrics. This is the same port as the go-livepeer CLI (7935 by default); the -monitor flag activates the /metrics path on that port, and -metricsPerStream additionally labels metrics per stream.
Split orchestrator and transcoder setups should pass -monitor on both processes when both sides need to be scraped. Each process exposes its own /metrics endpoint on its respective CLI port.
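As a sketch of a split setup (the host and secret values are placeholders; -cliAddr sets each process's CLI/metrics port):

```shell
# Orchestrator process: metrics at http://127.0.0.1:7935/metrics
livepeer -orchestrator -monitor -cliAddr 127.0.0.1:7935 # ... other flags

# Standalone transcoder process: metrics at http://127.0.0.1:7936/metrics
livepeer -transcoder -monitor -cliAddr 127.0.0.1:7936 \
  -orchAddr <orchestrator-host>:8935 -orchSecret <shared-secret>
```

Giving each process a distinct -cliAddr port keeps the two /metrics endpoints from colliding when both run on one machine.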

What metrics are exposed

go-livepeer exposes metrics across multiple categories. The full reference is at Prometheus Metrics Reference. The metrics you will actually act on:

Session metrics

Current active sessions, max session capacity, sessions per GPU. These tell you whether your node is at capacity or idle.

Segment metrics

Segments received, transcoded, and failed. Success rate over time is your core transcoding health signal.

Ticket metrics

Winning tickets received and redeemed. A gap between the two indicates ETH balance or redemption issues.

Latency metrics

Processing time per segment. High latency means your GPU is saturated or a pipeline is slow — both affect gateway scoring.
Other available metrics:
  • GPU utilisation (where reported by the NVIDIA driver)
  • ETH balance and pending fees
  • Round number and reward call status
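A few example PromQL expressions built on these categories. The metric names below are illustrative placeholders; substitute the exact names from your node's /metrics output or the Prometheus Metrics Reference before using them:

```promql
# Segment success rate over the last 5 minutes (core health signal)
sum(rate(livepeer_segments_transcoded_total[5m]))
  / sum(rate(livepeer_segments_received_total[5m]))

# Session headroom: how far the node is from capacity
livepeer_max_sessions - livepeer_current_sessions_total

# Unredeemed winning tickets (a growing gap signals redemption issues)
livepeer_winning_tickets_received_total - livepeer_winning_tickets_redeemed_total
```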

Option A: Docker monitoring stack (fastest setup)

Livepeer maintains a Docker image that bundles Prometheus, Grafana, and starter dashboard templates. This is the quickest path from zero to a working dashboard.
Run the Docker monitoring stack
# Pull the image
docker pull livepeer/monitoring

# Run against a single local node
docker run --net=host \
  --env LP_MODE=standalone \
  --env LP_NODES=localhost:7935 \
  livepeer/monitoring:latest
Then open Grafana at http://localhost:3000 (default credentials: admin / admin).

To monitor multiple nodes, list each one in LP_NODES:
Run the monitoring stack for multiple nodes
docker run --net=host \
  --env LP_MODE=standalone \
  --env LP_NODES=node1.yourdomain.com:7935,node2.yourdomain.com:7935 \
  livepeer/monitoring:latest
The full list of environment variable options is documented in the livepeer/livepeer-monitoring repository. For Kubernetes deployments, pods with the prometheus.io/scrape label are discovered automatically without specifying LP_NODES.
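For reference, a pod-template fragment carrying the discovery hint might look like the following; check prometheus.yml in the livepeer/livepeer-monitoring repository for the exact label or annotation keys its service discovery matches on:

```yaml
# Illustrative Deployment pod-template fragment
metadata:
  labels:
    prometheus.io/scrape: "true"
```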

livepeer/livepeer-monitoring

Source, Dockerfile, prometheus.yml, and Grafana dashboard templates.

Option B: Custom Prometheus and Grafana

Operators already running a monitoring stack should add go-livepeer as a scrape target:
Prometheus scrape target
# prometheus.yml
scrape_configs:
  - job_name: 'livepeer-orchestrator'
    static_configs:
      - targets: ['localhost:7935']
    scrape_interval: 15s
    metrics_path: /metrics
Reload Prometheus after editing: kill -HUP <prometheus-pid> or use the reload API at http://localhost:9090/-/reload. Useful Grafana panels to build from these metrics include active sessions against max capacity, segment success rate, winning tickets received versus redeemed, and per-segment latency.
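Before building panels, it can help to sanity-check what the scrape target actually returns. A minimal Python sketch that parses the Prometheus text exposition format (the sample metric names are illustrative, not exact go-livepeer names):

```python
def parse_metrics(text):
    """Parse Prometheus text exposition into {metric_name: value}, ignoring labels."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_part, _, value = line.rpartition(" ")
        metrics[name_part.split("{", 1)[0]] = float(value)
    return metrics

# Illustrative sample -- real names come from your node's /metrics output
sample = """\
# HELP livepeer_current_sessions_total Currently active sessions
livepeer_current_sessions_total 3
livepeer_max_sessions 10
"""
m = parse_metrics(sample)
print("utilisation:", m["livepeer_current_sessions_total"] / m["livepeer_max_sessions"])
```

Pipe the output of curl http://localhost:7935/metrics into a script like this to confirm the gauges you plan to chart are present and moving.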

Monitoring AI runner containers

AI inference workloads run in the ai-runner Docker container alongside go-livepeer. Monitoring the container separately from the node gives you a faster signal for AI-specific issues. Check container health:
Check AI runner health
# List running containers and their status
docker ps --filter name=livepeer-ai-runner

# Follow container logs live
docker logs -f livepeer-ai-runner

# Check resource usage
docker stats livepeer-ai-runner
Watch the container logs for startup and model-loading errors. After starting your AI runner and go-livepeer, verify that your pipelines are registered and visible to the network:
Query registered capabilities
# Query your node's registered capabilities
curl http://localhost:7935/getNetworkCapabilities | jq
Also check the community tool at tools.livepeer.cloud/ai/network-capabilities — this shows which AI-capable orchestrators are visible network-wide and which models are warm.
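If you script this check, a small parser over the capabilities JSON can flag missing or cold models. The JSON shape below is an assumption for illustration; inspect your node's actual response and adjust the keys:

```python
import json

def warm_models(caps):
    """Return (pipeline, model) pairs marked warm.

    NOTE: the document shape here is assumed for illustration;
    adjust the keys to match your node's real capabilities output.
    """
    pairs = []
    for pipeline, models in caps.get("pipelines", {}).items():
        for model, info in models.items():
            if info.get("warm"):
                pairs.append((pipeline, model))
    return pairs

sample = json.loads(
    '{"pipelines": {"text-to-image": '
    '{"model-a": {"warm": true}, "model-b": {"warm": false}}}}'
)
print(warm_models(sample))  # [('text-to-image', 'model-a')]
```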

Log capture and verbose output

By default, livepeer writes all logs to the console (stdout/stderr) only. For long-running production nodes, capture them to a file:
Capture logs with tee
# Pipe to both terminal and a log file
livepeer \
  -orchestrator \
  -transcoder \
  -monitor \
  ... \
  2>&1 | tee /var/log/livepeer/livepeer.log
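A file captured this way grows without bound, so pair it with log rotation. A logrotate sketch (the /etc/logrotate.d/livepeer path is an assumed convention; copytruncate matters because tee keeps the file handle open):

```text
# /etc/logrotate.d/livepeer
/var/log/livepeer/livepeer.log {
    daily
    rotate 14
    compress
    missingok
    notifempty
    copytruncate
}
```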
Verbose logging for debugging:
Enable verbose logging
# -v 6 adds detailed transcoding activity logs
livepeer \
  -orchestrator \
  -transcoder \
  -v 6 \
  ...
With -v 6 you will see individual segment reception and transcoding activity, which is the fastest way to confirm your node is receiving and processing work without a full Prometheus setup. Useful log search patterns:
Search logs for key events
# Check for reward calls
grep -i "reward" /var/log/livepeer/livepeer.log

# Check for transcoding sessions
grep -i "transcode\|session" /var/log/livepeer/livepeer.log

# Check for errors only
grep -i "error\|fail\|crit" /var/log/livepeer/livepeer.log

# Check for incoming jobs
grep -i "received\|segment" /var/log/livepeer/livepeer.log
For systemd-managed services, use journalctl instead:
Inspect systemd logs
journalctl -u livepeer -f               # follow live
journalctl -u livepeer --since "1 hour ago"   # recent
journalctl -u livepeer -p err           # errors only
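If the service unit does not exist yet, a minimal sketch is below; the binary path, user, and flags are placeholders to adjust for your setup:

```ini
# /etc/systemd/system/livepeer.service
[Unit]
Description=Livepeer orchestrator
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/livepeer -orchestrator -transcoder -monitor
Restart=on-failure
User=livepeer

[Install]
WantedBy=multi-user.target
```

With this in place, logs flow to the journal and the journalctl commands above apply directly.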

Alerting

Prometheus users should add alerting rules for the most impactful failure modes:
Prometheus alert rules
# prometheus-alerts.yml — add to your Prometheus config
groups:
  - name: livepeer-orchestrator
    rules:

      - alert: OrchestratorAtCapacity
        expr: "livepeer_current_sessions_total >= livepeer_max_sessions"
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Orchestrator at session capacity for 5+ minutes"

      - alert: LowETHBalance
        expr: "livepeer_eth_balance < 0.02"
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ETH balance below 0.02 — reward calls may fail"

      - alert: TranscodeFailureRate
        expr: "rate(livepeer_transcode_failed_total[10m]) > 0.1"
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Transcode failure rate above 10% over 10 minutes"
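Before reloading Prometheus, validate the rule file with promtool, which ships with Prometheus:

```shell
promtool check rules prometheus-alerts.yml
```

This catches YAML and PromQL syntax errors that would otherwise silently prevent the rules from loading.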
Last modified on March 16, 2026