# Metrics and Alerting

> How to set up Prometheus, Grafana, and the Docker monitoring stack for your Livepeer orchestrator. Covers the -monitor flag, key metrics, AI runner container monitoring, and log capture.

export const CustomDivider = ({color = "var(--lp-color-border-default)", middleText = "", spacing = "default", style = {}, className = "", ...rest}) => {
  const spacingPresets = {
    default: {
      margin: "24px 0"
    },
    overlap: {
      margin: "-1rem 0 -1rem 0"
    },
    tight: {
      margin: "0 0 -1rem 0"
    },
    section: {
      margin: "0 0 -2rem 0"
    },
    sectionOverlap: {
      margin: "-1rem 0 -2rem 0"
    },
    deepOverlap: {
      margin: "-1rem 0 -1.5rem 0"
    }
  };
  const spacingStyle = spacingPresets[spacing] || spacingPresets.default;
  return <div role="separator" aria-orientation="horizontal" className={className} style={{
    display: "flex",
    alignItems: "center",
    ...spacingStyle,
    fontSize: style?.fontSize || "16px",
    height: "fit-content",
    ...style
  }} {...rest}>
      <span style={{
    marginRight: "var(--lp-spacing-px-8)",
    opacity: 0.2
  }}>
        <Icon icon="/snippets/assets/logos/Livepeer-Logo-Symbol-Theme.svg" />
      </span>
      <div style={{
    flex: 1,
    height: "1px",
    background: "var(--lp-color-border-default)",
    opacity: 0.4
  }}></div>
      {middleText && <>
          <Icon icon="circle" size={2} />
          <span style={{
    margin: "0 8px",
    fontWeight: "bold",
    color: color,
    opacity: 0.7
  }}>
            {middleText}
          </span>
          <Icon icon="circle" size={2} />
        </>}
      <div style={{
    flex: 1,
    height: "1px",
    background: "var(--lp-color-border-default)",
    opacity: 0.4
  }}></div>
      <span style={{
    marginLeft: "var(--lp-spacing-px-8)",
    opacity: 0.2
  }}>
        <span style={{
    display: "inline-block",
    transform: "scaleX(-1)"
  }}>
          <Icon icon="/snippets/assets/logos/Livepeer-Logo-Symbol-Theme.svg" />
        </span>
      </span>
    </div>;
};

export const TableCell = ({children, align = "left", header = false, style = {}, className = "", ...rest}) => {
  const Component = header ? "th" : "td";
  return <Component className={className} style={{
    padding: "0.75rem 1rem",
    textAlign: align,
    border: header ? "none" : "1px solid var(--lp-color-border-default)",
    ...style
  }} {...rest}>
      {children}
    </Component>;
};

export const TableRow = ({children, header = false, hover = false, style = {}, className = "", ...rest}) => {
  const rowId = `table-row-${Math.random().toString(36).substr(2, 9)}`;
  return <>
      {hover && <style>{`
          #${rowId}:hover {
            background-color: var(--lp-color-bg-card);
          }
        `}</style>}
      <tr id={rowId} className={className} style={{
    ...header && ({
      backgroundColor: "var(--lp-color-accent-strong)",
      color: "var(--lp-color-on-accent)",
      fontWeight: "bold"
    }),
    ...style
  }} {...rest}>
        {children}
      </tr>
    </>;
};

export const StyledTable = ({children, variant = "default", style = {}, className = "", ...rest}) => {
  const wrapperVariants = {
    default: {
      border: "1px solid var(--lp-color-border-default)",
      backgroundColor: "var(--lp-color-bg-card)",
      overflow: "hidden"
    },
    bordered: {
      border: "2px solid var(--lp-color-accent)",
      backgroundColor: "var(--lp-color-bg-page)",
      overflow: "hidden"
    },
    minimal: {
      border: "none",
      backgroundColor: "transparent",
      overflow: "visible"
    }
  };
  return <div data-docs-styled-table-shell className={className} style={{
    width: "100%",
    padding: 0,
    margin: 0,
    ...wrapperVariants[variant],
    ...style
  }} {...rest}>
      <table data-docs-styled-table style={{
    width: "100%",
    borderCollapse: "collapse",
    borderSpacing: 0,
    margin: 0,
    backgroundColor: "transparent"
  }}>
        {children}
      </table>
    </div>;
};

Use this page to instrument the node, collect the metrics that matter, and wire up alerts before failures turn into missed jobs or missed rewards. Logs explain individual incidents; metrics show whether throughput, latency, capacity, and ticket flow are moving in the right direction.

<CustomDivider />

## Enabling node metrics

go-livepeer exposes a standard Prometheus endpoint when you pass the `-monitor` flag at startup.

```bash icon="terminal" title="Enable monitoring flags" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
livepeer \
  -orchestrator \
  -transcoder \
  -monitor \
  -metricsPerStream \
  -network arbitrum-one-mainnet \
  # ... Other flags
```

**Metrics endpoint:** `http://localhost:7935/metrics`

The endpoint shares its port with the go-livepeer CLI (`7935` by default). The `-monitor` flag activates the `/metrics` path on that port.
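
A quick way to confirm that `-monitor` took effect is to list the metric names the endpoint returns. The helper below is a minimal sketch: the function name `list_livepeer_metrics` is hypothetical, and it assumes the exported metric names start with `livepeer_`, as the names used elsewhere on this page do.

```bash icon="terminal" title="List exported metric names" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# list_livepeer_metrics: read Prometheus text-format metrics on stdin and
# print each unique metric name that starts with "livepeer_".
# (Hypothetical helper; the "livepeer_" prefix is an assumption.)
list_livepeer_metrics() {
  grep -v '^#' \
    | grep -o '^livepeer_[A-Za-z0-9_:]*' \
    | sort -u
}
```

Run it against a live node with `curl -s http://localhost:7935/metrics | list_livepeer_metrics`. Empty output suggests the node was started without `-monitor` or is listening on a different port.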

**Monitoring flags:**

<StyledTable variant="bordered">
  <TableRow header>
    <TableCell header>Flag</TableCell>
    <TableCell header>What it does</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`-monitor`</TableCell>
    <TableCell>Enables the `/metrics` endpoint. Required for any Prometheus scraping.</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`-metricsPerStream`</TableCell>
    <TableCell>Groups performance metrics per stream. Useful for diagnosing individual session issues alongside aggregate counts.</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`-metricsClientIP`</TableCell>
    <TableCell>Includes the client IP address in metrics labels. Useful for identifying which Gateway is routing work to you.</TableCell>
  </TableRow>
</StyledTable>

<Note>
  Split Orchestrator and transcoder setups should pass `-monitor` on both processes when both sides need to be scraped. Each process exposes its own `/metrics` endpoint on its respective CLI port.
</Note>

<CustomDivider />

## What metrics are exposed

go-livepeer exposes metrics across multiple categories. The full reference is at [Prometheus Metrics Reference](/v2/orchestrators/guides/monitoring-and-tooling/metrics-and-alerting).

**The metrics you will actually act on:**

<CardGroup cols={2}>
  <Card title="Session metrics" icon="users">
    Current active sessions, max session capacity, sessions per GPU. These tell you whether your node is at capacity or idle.
  </Card>

  <Card title="Segment metrics" icon="film">
    Segments received, transcoded, and failed. Success rate over time is your core transcoding health signal.
  </Card>

  <Card title="Ticket metrics" icon="ticket">
    Winning tickets received and redeemed. A gap between the two indicates ETH balance or redemption issues.
  </Card>

  <Card title="Latency metrics" icon="gauge-high">
    Processing time per segment. High latency means your GPU is saturated or a pipeline is slow – both affect Gateway scoring.
  </Card>
</CardGroup>

**Other available metrics:**

* GPU utilisation (where reported by the NVIDIA driver)
* ETH balance and pending fees
* Round number and reward call status

<CustomDivider />

## Option A: Docker monitoring stack (fastest setup)

Livepeer maintains a Docker image that bundles Prometheus, Grafana, and starter dashboard templates. This is the quickest path from zero to a working dashboard.

```bash icon="terminal" title="Run the Docker monitoring stack" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Pull the image
docker pull livepeer/monitoring

# Run against a single local node
docker run --net=host \
  --env LP_MODE=standalone \
  --env LP_NODES=localhost:7935 \
  livepeer/monitoring:latest
```

Then open Grafana at `http://localhost:3000` (default credentials: `admin` / `admin`).

**Multi-node:**

```bash icon="terminal" title="Run the monitoring stack for multiple nodes" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
docker run --net=host \
  --env LP_MODE=standalone \
  --env LP_NODES=node1.yourdomain.com:7935,node2.yourdomain.com:7935 \
  livepeer/monitoring:latest
```
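
`LP_NODES` is a comma-separated list of `host:port` pairs, so for larger fleets it can be generated instead of typed by hand. A minimal sketch, assuming every node serves metrics on the default port `7935`; the `build_lp_nodes` helper and the `METRICS_PORT` override are hypothetical and not part of the monitoring image.

```bash icon="terminal" title="Build the LP_NODES value" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# build_lp_nodes: join hostnames into the comma-separated host:port list
# that LP_NODES expects. The port defaults to 7935; set METRICS_PORT
# (a hypothetical variable used only by this helper) to override it.
build_lp_nodes() {
  local port="${METRICS_PORT:-7935}"
  local out="" host
  for host in "$@"; do
    # Prepend a comma only when the list already has an entry
    out="${out:+$out,}${host}:${port}"
  done
  printf '%s\n' "$out"
}
```

The result can be passed straight to the container, for example `--env LP_NODES="$(build_lp_nodes node1.yourdomain.com node2.yourdomain.com)"`.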

**All environment variable options:**

<StyledTable variant="bordered">
  <TableRow header>
    <TableCell header>Variable</TableCell>
    <TableCell header>Values</TableCell>
    <TableCell header>Description</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`LP_MODE`</TableCell>
    <TableCell>`standalone`</TableCell>
    <TableCell>Local or bare-metal single-machine deployment</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`LP_MODE`</TableCell>
    <TableCell>`docker-compose`</TableCell>
    <TableCell>Running as part of a Docker Compose stack</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`LP_MODE`</TableCell>
    <TableCell>`kubernetes`</TableCell>
    <TableCell>Pod-based deployment; auto-discovery via labels</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`LP_NODES`</TableCell>
    <TableCell>Comma-separated `host:port`</TableCell>
    <TableCell>Nodes to monitor (not needed for Kubernetes)</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`LP_KUBE_NAMESPACES`</TableCell>
    <TableCell>Comma-separated namespaces</TableCell>
    <TableCell>Kubernetes namespace filter</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`LP_PROMETHEUS_KUBE_SCRAPE`</TableCell>
    <TableCell>Annotation name</TableCell>
    <TableCell>Custom scrape annotation for Kubernetes pods</TableCell>
  </TableRow>
</StyledTable>

For Kubernetes deployments, pods carrying the `prometheus.io/scrape` annotation are discovered automatically, so `LP_NODES` is not needed.

<Card title="livepeer/livepeer-monitoring" icon="github" href="https://github.com/livepeer/livepeer-monitoring">
  Source, Dockerfile, `prometheus.yml`, and Grafana dashboard templates.
</Card>

<CustomDivider />

## Option B: Custom Prometheus and Grafana

Operators already running a monitoring stack should add go-livepeer as a scrape target:

```yaml icon="terminal" title="Prometheus scrape target" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# prometheus.yml
scrape_configs:
  - job_name: 'livepeer-orchestrator'
    static_configs:
      - targets: ['localhost:7935']
    scrape_interval: 15s
    metrics_path: /metrics
```

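Reload Prometheus after editing: send `SIGHUP` with `kill -HUP <prometheus-pid>`, or send an HTTP `POST` to `http://localhost:9090/-/reload`. The reload endpoint is available only when Prometheus is started with `--web.enable-lifecycle`.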

**Useful Grafana panels to build:**

<StyledTable variant="bordered">
  <TableRow header>
    <TableCell header>Panel</TableCell>
    <TableCell header>Query pattern</TableCell>
    <TableCell header>What it shows</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>Session capacity</TableCell>
    <TableCell>`livepeer_current_sessions_total` vs `livepeer_max_sessions`</TableCell>
    <TableCell>How close to capacity you are</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>Segment success rate</TableCell>
    <TableCell>`rate(livepeer_transcode_success_total[5m])`</TableCell>
    <TableCell>Transcoding health over time</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>Reward call tracker</TableCell>
    <TableCell>Alerts on missed rounds</TableCell>
    <TableCell>Whether reward is being called reliably</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>AI job latency</TableCell>
    <TableCell>AI pipeline processing time</TableCell>
    <TableCell>Performance per pipeline</TableCell>
  </TableRow>
</StyledTable>
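
The session capacity panel compares two gauges, and the same ratio can be spot-checked from a raw scrape before any dashboard exists. A minimal sketch: `session_utilisation` is a hypothetical helper, the metric names are the ones from the table above, and it assumes the endpoint emits samples without trailing timestamps.

```bash icon="terminal" title="Compute session utilisation from a scrape" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# session_utilisation: read Prometheus text-format metrics on stdin and print
# current sessions as a percentage of max sessions, summed across label sets.
# Prints "n/a" when no max-sessions sample is found.
session_utilisation() {
  awk '
    /^livepeer_current_sessions_total[{ ]/ { cur += $NF }
    /^livepeer_max_sessions[{ ]/           { max += $NF }
    END {
      if (max > 0) printf "%.1f\n", 100 * cur / max
      else         print "n/a"
    }
  '
}
```

For example, `curl -s http://localhost:7935/metrics | session_utilisation` prints a single percentage that can be compared with what the Grafana panel shows.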

<CustomDivider />

## Monitoring AI Runner containers

AI inference workloads run in the `ai-runner` Docker container alongside go-livepeer. Monitoring the container separately from the node gives you a faster signal for AI-specific issues.

**Check container health:**

```bash icon="terminal" title="Check AI runner health" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# List running containers and their status
docker ps --filter name=livepeer-ai-runner

# Follow container logs live
docker logs -f livepeer-ai-runner

# Check resource usage
docker stats livepeer-ai-runner
```

**Key log messages to watch for:**

<StyledTable variant="bordered">
  <TableRow header>
    <TableCell header>Message</TableCell>
    <TableCell header>Meaning</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`Starting AI worker`</TableCell>
    <TableCell>Container is initialising. Normal.</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`Loaded model <model-id>`</TableCell>
    <TableCell>Model loaded into VRAM. Ready to process.</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`RunAI request pipeline=<name>`</TableCell>
    <TableCell>Job received. Processing.</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`Error loading model`</TableCell>
    <TableCell>Model failed to load. Check available VRAM and the model ID spelling.</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`Container health check failed`</TableCell>
    <TableCell>Container is alive but not responding to health checks.</TableCell>
  </TableRow>

  <TableRow>
    <TableCell>`CUDA out of memory`</TableCell>
    <TableCell>VRAM exhausted. Reduce `-maxSessions` or the capacity set in `aiModels.json`.</TableCell>
  </TableRow>
</StyledTable>
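
The failure messages in the table can be tallied from a log stream to spot a recurring problem. A minimal sketch: `scan_ai_runner_logs` is a hypothetical helper, and it matches only the message strings listed above.

```bash icon="terminal" title="Count AI runner failure messages" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# scan_ai_runner_logs: read container logs on stdin and print one count per
# failure message from the table above.
scan_ai_runner_logs() {
  awk '
    /Error loading model/           { load_err++ }
    /Container health check failed/ { health++ }
    /CUDA out of memory/            { oom++ }
    END {
      printf "model_load_errors=%d\n",     load_err
      printf "health_check_failures=%d\n", health
      printf "cuda_oom=%d\n",              oom
    }
  '
}
```

Feed it the container output with `docker logs livepeer-ai-runner 2>&1 | scan_ai_runner_logs`. Any non-zero count is worth following up in the full log.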

**Verify AI pipelines are registered on the network:**

After starting your AI Runner and go-livepeer, check that your pipelines are visible to the network:

```bash icon="terminal" title="Query registered capabilities" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Query your node's registered capabilities
curl http://localhost:7935/getNetworkCapabilities | jq
```

Also check the community tool at [tools.livepeer.cloud/ai/network-capabilities](https://tools.livepeer.cloud/ai/network-capabilities) – this shows which AI-capable Orchestrators are visible network-wide and which models are warm.

<CustomDivider />

## Log capture and verbose output

By default, `livepeer` writes its logs to the terminal only, not to a file. For long-running production nodes, capture them to a file:

```bash icon="terminal" title="Capture logs with tee" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Pipe to both terminal and a log file
livepeer \
  -orchestrator \
  -transcoder \
  -monitor \
  ... \
  2>&1 | tee /var/log/livepeer/livepeer.log
```

**Verbose logging for debugging:**

```bash icon="terminal" title="Enable verbose logging" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# -v 6 adds detailed transcoding activity logs
livepeer \
  -orchestrator \
  -transcoder \
  -v 6 \
  ...
```

With `-v 6` you will see individual segment reception and transcoding activity, which is the fastest way to confirm your node is receiving and processing work without a full Prometheus setup.

**Useful log search patterns:**

```bash icon="terminal" title="Search logs for key events" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Check for reward calls
grep -i "reward" /var/log/livepeer/livepeer.log

# Check for transcoding sessions
grep -i "transcode\|session" /var/log/livepeer/livepeer.log

# Check for errors only
grep -i "error\|fail\|crit" /var/log/livepeer/livepeer.log

# Check for incoming jobs
grep -i "received\|segment" /var/log/livepeer/livepeer.log
```
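
Beyond keyword searches, a per-severity tally gives a quick read on how noisy a log file is. The sketch below assumes glog-style lines, where the first character is the severity (`I`, `W`, `E`, or `F`) followed by a four-digit date; that format is an assumption here, so check a few lines of your own log first. `severity_counts` is a hypothetical helper.

```bash icon="terminal" title="Tally log lines by severity" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# severity_counts: read log lines on stdin and count them by leading severity
# letter. Assumes glog-style prefixes such as "E0115 12:00:01.000002 ...".
# Lines without that prefix are ignored.
severity_counts() {
  awk '
    /^[IWEF][0-9][0-9][0-9][0-9] / { n[substr($0, 1, 1)]++ }
    END {
      printf "info=%d warning=%d error=%d fatal=%d\n", n["I"], n["W"], n["E"], n["F"]
    }
  '
}
```

Run it with `severity_counts < /var/log/livepeer/livepeer.log`. If every count is zero on a busy node, the log format differs from the assumption and the pattern needs adjusting.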

For systemd-managed services, use `journalctl` instead:

```bash icon="terminal" title="Inspect systemd logs" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
journalctl -u livepeer -f               # follow live
journalctl -u livepeer --since "1 hour ago"   # recent
journalctl -u livepeer -p err           # errors only
```

<CustomDivider />

## Alerting

Prometheus users should add alerting rules for the most impactful failure modes:

```yaml icon="terminal" title="Prometheus alert rules" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# prometheus-alerts.yml — add to your Prometheus config
groups:
  - name: livepeer-orchestrator
    rules:

      - alert: OrchestratorAtCapacity
        expr: "livepeer_current_sessions_total >= livepeer_max_sessions"
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Orchestrator at session capacity for 5+ minutes"

      - alert: LowETHBalance
        expr: "livepeer_eth_balance < 0.02"
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ETH balance below 0.02 — reward calls may fail"

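      - alert: TranscodeFailureRate
        # Failed transcodes as a fraction of all transcodes (failed + successful).
        # rate() on the failure counter alone yields failures per second, not a percentage.
        expr: >-
          sum by (instance) (rate(livepeer_transcode_failed_total[10m]))
          / (sum by (instance) (rate(livepeer_transcode_failed_total[10m]))
          + sum by (instance) (rate(livepeer_transcode_success_total[10m])))
          > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Transcode failure rate above 10% over 10 minutes"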
```
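
A failure percentage is a ratio of failed to total transcodes, whereas `rate()` on a single counter yields events per second. The arithmetic behind a 10% threshold can be sketched as follows; `failure_ratio` is a hypothetical helper, and its two inputs are the increases of the failed and successful transcode counters over the same window.

```bash icon="terminal" title="Work out a failure percentage" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# failure_ratio FAILED SUCCEEDED: print failed / (failed + succeeded) as a
# percentage with one decimal place, or "n/a" when there was no work at all.
failure_ratio() {
  awk -v failed="$1" -v ok="$2" 'BEGIN {
    total = failed + ok
    if (total > 0) printf "%.1f\n", 100 * failed / total
    else           print "n/a"
  }'
}
```

For instance, `failure_ratio 5 45` prints `10.0`, which sits right at the threshold. By contrast, six failures in a minute is 0.1 failures per second, yet next to 600 successes in the same minute `failure_ratio 6 600` prints `1.0`, so a per-second threshold and a percentage threshold can disagree widely.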

<CustomDivider />

<CardGroup cols={2}>
  <Card title="Prometheus Metrics Reference" icon="table" href="/v2/orchestrators/guides/monitoring-and-tooling/metrics-and-alerting">
    Full list of every metric exported by go-livepeer.
  </Card>

  <Card title="Troubleshooting" icon="triangle-exclamation" href="/v2/orchestrators/guides/monitoring-and-tooling/troubleshooting">
    When metrics reveal a problem – how to diagnose and fix it.
  </Card>

  <Card title="AI Configuration" icon="robot" href="/v2/orchestrators/guides/ai-and-job-workloads/ai-inference-operations">
    Setting up aiModels.json and the AI Runner container.
  </Card>

  <Card title="Session Limits" icon="gauge" href="/v2/orchestrators/guides/config-and-optimisation/capacity-planning">
    Configuring -maxSessions to match your hardware capacity.
  </Card>
</CardGroup>
