# Metrics and Alerting

> How to set up Prometheus, Grafana, and the Docker monitoring stack for your Livepeer orchestrator. Covers the -monitor flag, key metrics, AI runner container monitoring, and log capture.

Use this page to instrument the node, collect the metrics that matter, and wire up alerts before failures turn into missed jobs or missed rewards. Logs explain individual incidents; metrics show whether throughput, latency, capacity, and ticket flow are moving in the right direction.

## Enabling node metrics

go-livepeer exposes a standard Prometheus endpoint when you pass the `-monitor` flag at startup.

```bash icon="terminal" title="Enable monitoring flags" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
livepeer \
  -orchestrator \
  -transcoder \
  -monitor \
  -metricsPerStream \
  -network arbitrum-one-mainnet \
  # ... Other flags
```

**Metrics endpoint:** `http://localhost:7935/metrics`

The endpoint is served on the same port as the go-livepeer CLI (`7935` by default). The `-monitor` flag activates the `/metrics` path on that port.

**Monitoring flags:**

| Flag | What it does |
| --- | --- |
| `-monitor` | Enables the `/metrics` endpoint. Required for any Prometheus scraping. |
| `-metricsPerStream` | Groups performance metrics per stream. Useful for diagnosing individual session issues alongside aggregate counts. |
| `-metricsClientIP` | Includes the client IP address in metrics labels. Useful for identifying which Gateway is routing work to you. |

Split Orchestrator and transcoder setups should pass `-monitor` on both processes when both sides need to be scraped. Each process exposes its own `/metrics` endpoint on its respective CLI port.

## What metrics are exposed

go-livepeer exposes metrics across multiple categories. The full reference is at [Prometheus Metrics Reference](/v2/Orchestrators/guides/monitoring-and-tooling/metrics-and-alerting).

**The metrics you will actually act on:**

* Current active sessions, max session capacity, sessions per GPU. These tell you whether your node is at capacity or idle.
* Segments received, transcoded, and failed. Success rate over time is your core transcoding health signal.
* Winning tickets received and redeemed. A gap between the two indicates ETH balance or redemption issues.
* Processing time per segment. High latency means your GPU is saturated or a pipeline is slow; both affect Gateway scoring.

**Other available metrics:**

* GPU utilisation (where reported by the NVIDIA driver)
* ETH balance and pending fees
* Round number and reward call status

## Option A: Docker monitoring stack (fastest setup)

Livepeer maintains a Docker image that bundles Prometheus, Grafana, and starter dashboard templates. This is the quickest path from zero to a working dashboard.

```bash icon="terminal" title="Run the Docker monitoring stack" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Pull the image
docker pull livepeer/monitoring

# Run against a single local node
docker run --net=host \
  --env LP_MODE=standalone \
  --env LP_NODES=localhost:7935 \
  livepeer/monitoring:latest
```

Then open Grafana at `http://localhost:3000` (default credentials: `admin` / `admin`).

**Multi-node:**

```bash icon="terminal" title="Run the monitoring stack for multiple nodes" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker run --net=host \
  --env LP_MODE=standalone \
  --env LP_NODES=node1.yourdomain.com:7935,node2.yourdomain.com:7935 \
  livepeer/monitoring:latest
```

**All environment variable options:**

| Variable | Values | Description |
| --- | --- | --- |
| `LP_MODE` | `standalone` | Local or bare metal single-machine |
| `LP_MODE` | `docker-compose` | Running as part of a Docker Compose stack |
| `LP_MODE` | `kubernetes` | Pod-based deployment; auto-discovery via labels |
| `LP_NODES` | Comma-separated `host:port` | Nodes to monitor (not needed for Kubernetes) |
| `LP_KUBE_NAMESPACES` | Comma-separated namespaces | Kubernetes namespace filter |
| `LP_PROMETHEUS_KUBE_SCRAPE` | Annotation name | Custom scrape annotation for Kubernetes pods |

For Kubernetes deployments, pods carrying the `prometheus.io/scrape` annotation are discovered automatically without specifying `LP_NODES`.

The monitoring image's source repository includes the Dockerfile, `prometheus.yml`, and Grafana dashboard templates.

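For the multi-node case above, `LP_NODES` is a comma-separated list of `host:port` pairs. The following is a minimal sketch for assembling that value from a list of hostnames, assuming every node serves metrics on the same port. The helper name `build_lp_nodes` and the variable `LP_NODES_VALUE` are hypothetical and not part of the monitoring image.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join hostnames into the comma-separated host:port
# list that LP_NODES expects. Assumes all nodes share one metrics port.
build_lp_nodes() {
  local port="$1"; shift
  local nodes="" host
  for host in "$@"; do
    # Prefix a comma only when the list already has an entry.
    nodes="${nodes:+${nodes},}${host}:${port}"
  done
  printf '%s\n' "$nodes"
}

LP_NODES_VALUE="$(build_lp_nodes 7935 node1.yourdomain.com node2.yourdomain.com)"
echo "$LP_NODES_VALUE"

# The value can then be passed to the container, for example:
#   docker run --net=host --env LP_MODE=standalone \
#     --env LP_NODES="$LP_NODES_VALUE" livepeer/monitoring:latest
```

Keeping the host list in one place makes it harder for the value passed to the container to drift from the set of nodes you actually run.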
## Option B: Custom Prometheus and Grafana

Operators already running a monitoring stack should add go-livepeer as a scrape target:

```yaml icon="terminal" title="Prometheus scrape target" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# prometheus.yml
scrape_configs:
  - job_name: 'livepeer-orchestrator'
    static_configs:
      - targets: ['localhost:7935']
    scrape_interval: 15s
    metrics_path: /metrics
```

Reload Prometheus after editing: send it a `SIGHUP` with `kill -HUP <pid>`, or send a `POST` request to the reload API at `http://localhost:9090/-/reload`. The reload API is only available when Prometheus is started with `--web.enable-lifecycle`.

**Useful Grafana panels to build:**

| Panel | Query pattern | What it shows |
| --- | --- | --- |
| Session capacity | `livepeer_current_sessions_total` vs `livepeer_max_sessions` | How close to capacity you are |
| Segment success rate | `rate(livepeer_transcode_success_total[5m])` | Transcoding health over time |
| Reward call tracker | Alerts on missed rounds | Whether reward is being called reliably |
| AI job latency | AI pipeline processing time | Performance per pipeline |

## Monitoring AI Runner containers

AI inference workloads run in the `ai-runner` Docker container alongside go-livepeer. Monitoring the container separately from the node gives you a faster signal for AI-specific issues.

**Check container health:**

```bash icon="terminal" title="Check AI runner health" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# List running containers and their status
docker ps --filter name=livepeer-ai-runner

# Follow container logs live
docker logs -f livepeer-ai-runner

# Check resource usage
docker stats livepeer-ai-runner
```

**Key log messages to watch for:**

| Message | Meaning |
| --- | --- |
| `Starting AI worker` | Container is initialising. Normal. |
| `Loaded model ` | Model loaded into VRAM. Ready to process. |
| `RunAI request pipeline=` | Job received. Processing. |
| `Error loading model` | Model failed to load. Check VRAM and the model ID spelling. |
| `Container health check failed` | Container is alive but not responding to health checks. |
| `CUDA out of memory` | VRAM exhausted. Reduce `-maxSessions` or capacity in `aiModels.json`. |

**Verify AI pipelines are registered on the network:**

After starting your AI Runner and go-livepeer, check that your pipelines are visible to the network:

```bash icon="terminal" title="Query registered capabilities" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Query your node's registered capabilities
curl http://localhost:7935/getNetworkCapabilities | jq
```

Also check the community tool at [tools.livepeer.cloud/ai/network-capabilities](https://tools.livepeer.cloud/ai/network-capabilities), which shows which AI-capable Orchestrators are visible network-wide and which models are warm.

## Log capture and verbose output

By default, `livepeer` writes its logs only to the console, not to a file. For long-running production nodes, capture logs to a file:

```bash icon="terminal" title="Capture logs with tee" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Pipe to both terminal and a log file.
# 2>&1 merges stderr into stdout so tee captures both streams.
# tee overwrites the file on each start; use `tee -a` to append instead.
livepeer \
  -orchestrator \
  -transcoder \
  -monitor \
  ... \
  2>&1 | tee /var/log/livepeer/livepeer.log
```

**Verbose logging for debugging:**

```bash icon="terminal" title="Enable verbose logging" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# -v 6 adds detailed transcoding activity logs
livepeer \
  -orchestrator \
  -transcoder \
  -v 6 \
  ...
```

With `-v 6` you will see individual segment reception and transcoding activity, which is the fastest way to confirm your node is receiving and processing work without a full Prometheus setup.

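Captured output can also be summarised rather than read line by line. The sketch below counts the three AI runner failure messages listed in the table earlier on this page. It assumes the log text arrives on stdin; the function name `count_ai_runner_failures` is hypothetical, and the message strings are taken from that table rather than from the runner's source.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count AI runner failure messages read from stdin.
# The three patterns are the failure messages listed on this page.
count_ai_runner_failures() {
  local input pattern count
  input="$(cat)"   # read stdin once so every pattern scans the same text
  for pattern in "Error loading model" \
                 "Container health check failed" \
                 "CUDA out of memory"; do
    # grep -c prints the number of matching lines; -F matches literally.
    # grep exits 1 when nothing matches, so "|| true" keeps the loop going.
    count="$(printf '%s\n' "$input" | grep -cF -- "$pattern" || true)"
    printf '%s: %s\n' "$pattern" "$count"
  done
}

# Example usage against the container named in the commands above:
#   docker logs livepeer-ai-runner 2>&1 | count_ai_runner_failures
```

A non-zero count for any of the three lines is a prompt to open the full log around that message.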
**Useful log search patterns:**

```bash icon="terminal" title="Search logs for key events" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Check for reward calls
grep -i "reward" /var/log/livepeer/livepeer.log

# Check for transcoding sessions
grep -i "transcode\|session" /var/log/livepeer/livepeer.log

# Check for errors only
grep -i "error\|fail\|crit" /var/log/livepeer/livepeer.log

# Check for incoming jobs
grep -i "received\|segment" /var/log/livepeer/livepeer.log
```

For systemd-managed services, use `journalctl` instead:

```bash icon="terminal" title="Inspect systemd logs" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
journalctl -u livepeer -f                      # follow live
journalctl -u livepeer --since "1 hour ago"    # recent
journalctl -u livepeer -p err                  # errors only
```

## Alerting

Prometheus users should add alerting rules for the most impactful failure modes:

```yaml icon="terminal" title="Prometheus alert rules" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# prometheus-alerts.yml: reference this file under rule_files in prometheus.yml
groups:
  - name: livepeer-orchestrator
    rules:
      - alert: OrchestratorAtCapacity
        expr: "livepeer_current_sessions_total >= livepeer_max_sessions"
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Orchestrator at session capacity for 5+ minutes"

      - alert: LowETHBalance
        expr: "livepeer_eth_balance < 0.02"
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ETH balance below 0.02: reward calls may fail"

      - alert: TranscodeFailureRate
        # rate() alone is failures per second, so divide by all transcodes to get a share.
        # Assumes the success counter (named as elsewhere on this page) excludes failures.
        expr: "rate(livepeer_transcode_failed_total[10m]) / (rate(livepeer_transcode_failed_total[10m]) + rate(livepeer_transcode_success_total[10m])) > 0.1"
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Transcode failure rate above 10% over 10 minutes"
```

**Related guides:**

* Full list of every metric exported by go-livepeer.
* When metrics reveal a problem: how to diagnose and fix it.
* Setting up `aiModels.json` and the AI Runner container.
* Configuring `-maxSessions` to match your hardware capacity.

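The `OrchestratorAtCapacity` condition from the alert rules above can be spot-checked without a running Prometheus. The sketch below reads Prometheus text-format metrics on stdin and prints session utilisation as a percentage. It assumes the two metric names used in that rule appear in the `/metrics` output exactly as written on this page; the function name `session_utilisation` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compute session utilisation (percent) from
# Prometheus text-format metrics on stdin. Metric names are the ones
# used in the alert rule on this page and are assumed, not verified.
session_utilisation() {
  awk '
    # Match the bare metric name or the name followed by a label set.
    $1 ~ /^livepeer_current_sessions_total([{]|$)/ { cur += $NF }
    $1 ~ /^livepeer_max_sessions([{]|$)/           { max += $NF }
    END {
      if (max > 0) printf "%.0f\n", 100 * cur / max
      else print "unknown"
    }'
}

# Example usage against the endpoint enabled by -monitor:
#   curl -s http://localhost:7935/metrics | session_utilisation
```

A value at or near 100 corresponds to the condition the capacity alert fires on. `unknown` means the maximum-sessions metric was not found in the input, which may indicate that the metric name differs on your build.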