# LLM Pipeline Setup

> Set up the llm pipeline on a Livepeer orchestrator using the Ollama-based runner. Covers Docker Compose configuration, model selection for 8 GB VRAM GPUs, model ID mapping, aiModels.json entries, and USD pricing for token-based workloads.

export const StyledStep = ({title, icon, titleSize = 'h3', iconColor = null, titleColor = null, children, className = '', style = {}, ...rest}) => {
  const styledTitle = titleColor ? <span style={{
    color: titleColor
  }}>{title}</span> : title;
  return <Step title={styledTitle} icon={icon} iconColor={iconColor || undefined} titleSize={titleSize} className={className} style={style} {...rest}>
      {children}
    </Step>;
};

export const StyledSteps = ({children, iconColor, titleColor, lineColor, iconSize = '24px', className = '', style = {}, ...rest}) => {
  const resolvedIconColor = iconColor || 'var(--accent-dark, #18794E)';
  const resolvedTitleColor = titleColor || 'var(--lp-color-accent)';
  const resolvedLineColor = lineColor || 'var(--lp-color-accent)';
  return <div className={['docs-styled-steps', className].filter(Boolean).join(' ')} style={style} {...rest}>
      <style>{`
        .docs-styled-steps .steps > div > div.absolute > div {
          background-color: ${resolvedIconColor};
        }
        .docs-styled-steps .steps > div > div.w-full > p {
          color: ${resolvedTitleColor};
        }
        .docs-styled-steps .steps > div > div.absolute.w-px {
          background-color: ${resolvedLineColor};
        }
        .docs-styled-steps .steps > div:last-child > div.absolute.w-px::after {
          content: '';
          position: absolute;
          bottom: 0;
          left: 50%;
          width: 6px;
          height: 6px;
          background-color: ${resolvedLineColor};
          transform: translateX(-50%) rotate(45deg);
        }
      `}</style>
      <div>
        <Steps>{children}</Steps>
      </div>
    </div>;
};

export const TableCell = ({children, align = "left", header = false, style = {}, className = "", ...rest}) => {
  const Component = header ? "th" : "td";
  return <Component className={className} style={{
    padding: "0.75rem 1rem",
    textAlign: align,
    border: header ? "none" : "1px solid var(--lp-color-border-default)",
    ...style
  }} {...rest}>
      {children}
    </Component>;
};

export const TableRow = ({children, header = false, hover = false, style = {}, className = "", ...rest}) => {
  const rowId = `table-row-${Math.random().toString(36).substr(2, 9)}`;
  return <>
      {hover && <style>{`
          #${rowId}:hover {
            background-color: var(--lp-color-bg-card);
          }
        `}</style>}
      <tr id={rowId} className={className} style={{
    ...header && ({
      backgroundColor: "var(--lp-color-accent-strong)",
      color: "var(--lp-color-on-accent)",
      fontWeight: "bold"
    }),
    ...style
  }} {...rest}>
        {children}
      </tr>
    </>;
};

export const StyledTable = ({children, variant = "default", style = {}, className = "", ...rest}) => {
  const wrapperVariants = {
    default: {
      border: "1px solid var(--lp-color-border-default)",
      backgroundColor: "var(--lp-color-bg-card)",
      overflow: "hidden"
    },
    bordered: {
      border: "2px solid var(--lp-color-accent)",
      backgroundColor: "var(--lp-color-bg-page)",
      overflow: "hidden"
    },
    minimal: {
      border: "none",
      backgroundColor: "transparent",
      overflow: "visible"
    }
  };
  return <div data-docs-styled-table-shell className={className} style={{
    width: "100%",
    padding: 0,
    margin: 0,
    ...wrapperVariants[variant],
    ...style
  }} {...rest}>
      <table data-docs-styled-table style={{
    width: "100%",
    borderCollapse: "collapse",
    borderSpacing: 0,
    margin: 0,
    backgroundColor: "transparent"
  }}>
        {children}
      </table>
    </div>;
};

export const LinkArrow = ({href, label, description, newline = true, borderColor, className = '', style = {}, ...rest}) => {
  const linkArrowStyle = {
    display: 'inline-flex',
    alignItems: 'center',
    justifyContent: 'center',
    gap: "var(--lp-spacing-1)",
    width: 'fit-content',
    ...borderColor && ({
      borderColor
    })
  };
  return <span className={className} style={style} {...rest}>
      {newline && <br />}
      <span style={linkArrowStyle}>
        <a href={href} target="_blank" rel="noopener noreferrer">
          {label}
        </a>
        <Icon icon="arrow-up-right" size={14} color="var(--lp-color-accent)" />
      </span>
      {description && description}
      {description && <div style={{
    height: "var(--lp-spacing-3)"
  }} />}
    </span>;
};

export const CustomDivider = ({color = "var(--lp-color-border-default)", middleText = "", spacing = "default", style = {}, className = "", ...rest}) => {
  const spacingPresets = {
    default: {
      margin: "24px 0"
    },
    overlap: {
      margin: "-1rem 0 -1rem 0"
    },
    tight: {
      margin: "0 0 -1rem 0"
    },
    section: {
      margin: "0 0 -2rem 0"
    },
    sectionOverlap: {
      margin: "-1rem 0 -2rem 0"
    },
    deepOverlap: {
      margin: "-1rem 0 -1.5rem 0"
    }
  };
  const spacingStyle = spacingPresets[spacing] || spacingPresets.default;
  return <div role="separator" aria-orientation="horizontal" className={className} style={{
    display: "flex",
    alignItems: "center",
    ...spacingStyle,
    fontSize: style?.fontSize || "16px",
    height: "fit-content",
    ...style
  }} {...rest}>
      <span style={{
    marginRight: "var(--lp-spacing-px-8)",
    opacity: 0.2
  }}>
        <Icon icon="/snippets/assets/logos/Livepeer-Logo-Symbol-Theme.svg" />
      </span>
      <div style={{
    flex: 1,
    height: "1px",
    background: "var(--lp-color-border-default)",
    opacity: 0.4
  }}></div>
      {middleText && <>
          <Icon icon="circle" size={2} />
          <span style={{
    margin: "0 8px",
    fontWeight: "bold",
    color: color,
    opacity: 0.7
  }}>
            {middleText}
          </span>
          <Icon icon="circle" size={2} />
        </>}
      <div style={{
    flex: 1,
    height: "1px",
    background: "var(--lp-color-border-default)",
    opacity: 0.4
  }}></div>
      <span style={{
    marginLeft: "var(--lp-spacing-px-8)",
    opacity: 0.2
  }}>
        <span style={{
    display: "inline-block",
    transform: "scaleX(-1)"
  }}>
          <Icon icon="/snippets/assets/logos/Livepeer-Logo-Symbol-Theme.svg" />
        </span>
      </span>
    </div>;
};

<Tip>
  The `llm` pipeline is the entry point for operators with older or smaller GPUs. A quantised 8B parameter model runs within 8 GB VRAM – opening AI participation to cards that cannot run diffusion models at all.
</Tip>

***

The `llm` pipeline uses a different architecture from all other Livepeer AI pipelines. Where diffusion and audio pipelines use the standard `livepeer/ai-runner` container, the LLM pipeline routes through an **Ollama-based runner** maintained by Cloud SPE. This enables quantised large language models to run on consumer GPUs with 8 GB of VRAM or more.

The pipeline flow is:

```text icon="terminal" title="LLM pipeline flow" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
go-livepeer → livepeer-ollama-runner → ollama container → quantised model
```

go-livepeer reaches the LLM stack over HTTP instead of managing model weights directly. The Ollama runner and Ollama container run as separate Docker services, and go-livepeer connects to them via the `url` field in `aiModels.json`.

<CustomDivider />

## Architecture split

All other batch AI pipelines (text-to-image, audio-to-text, segment-anything-2, text-to-speech) use the `livepeer/ai-runner` container. go-livepeer spawns that container automatically based on `aiModels.json` and manages its lifecycle.

The `llm` pipeline requires you to run the Ollama stack manually:

* **Ollama container** – the model runtime that loads and serves quantised LLM weights
* **livepeer-ollama-runner** – a shim container that translates between go-livepeer's AI worker protocol and the Ollama API

go-livepeer connects to the `livepeer-ollama-runner` via the `url` field. The runner must be reachable on a shared Docker network.
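
The split is visible in `aiModels.json` itself. The sketch below is a minimal Python illustration, assuming the file holds a list of entry objects and that an entry with a `url` field points at a container you run yourself, while an entry without one is spawned by go-livepeer. The function name `split_by_runner` and the text-to-image `model_id` are hypothetical placeholders, not part of go-livepeer.

```python
import json

# Illustrative aiModels.json content. The text-to-image model_id is a
# placeholder; the llm entry mirrors the one used in this guide.
AI_MODELS_JSON = """
[
  {"pipeline": "text-to-image", "model_id": "example/diffusion-model", "warm": true},
  {"pipeline": "llm", "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
   "warm": true, "url": "http://llm_runner:8000"}
]
"""


def split_by_runner(entries):
    """Return (external, managed) lists of pipeline names.

    Assumption: an entry with a non-empty `url` is served by a container
    you run yourself; every other entry is spawned by go-livepeer.
    """
    external = [e["pipeline"] for e in entries if e.get("url")]
    managed = [e["pipeline"] for e in entries if not e.get("url")]
    return external, managed


external, managed = split_by_runner(json.loads(AI_MODELS_JSON))
print("external runner:", external)
print("managed by go-livepeer:", managed)
```

With the sample data, only the `llm` entry lands in the external list, which matches the manual Ollama stack described above.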

<CustomDivider />

## Setup

### Prerequisites

* Docker and Docker Compose installed
* NVIDIA Container Toolkit configured (for GPU passthrough)
* An existing go-livepeer Orchestrator with `-aiWorker` enabled
* 8 GB or more of GPU VRAM (minimum for quantised 7B/8B models)

<StyledSteps>
  <StyledStep title="Create a Docker volume for model persistence">
    ```bash icon="terminal" title="Create the Ollama volume" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
    docker volume create ollama
    ```

    This volume persists model weights across container restarts. Without it, models must be re-downloaded every time the Ollama container restarts.
  </StyledStep>

  <StyledStep title="Create docker-compose.yml for the Ollama stack">
    ```yaml icon="code" title="docker-compose.yml" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
    services:
      ollama-ai-runner:
        image: tztcloud/livepeer-ollama-runner:0.1.1
        container_name: llm_runner
        restart: unless-stopped
        runtime: nvidia
        networks:
          - livepeer-ai

      ollama:
        image: ollama/ollama:latest
        container_name: ollama
        restart: unless-stopped
        runtime: nvidia
        volumes:
          - ollama:/root/.ollama
        environment:
          - OLLAMA_GPU_ENABLED=true
        deploy:
          resources:
            reservations:
              devices:
                - capabilities: [gpu]
                  driver: nvidia
                  count: all
        networks:
          - livepeer-ai

    networks:
      livepeer-ai:
        external: true

    volumes:
      ollama:
        external: true
    ```

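    The `livepeer-ai` network must be the same network your go-livepeer container is on. Because the compose file declares both the network and the `ollama` volume as `external`, they must exist before you start the stack; create the network with `docker network create livepeer-ai` if you have not already. go-livepeer reaches the runner at the hostname `llm_runner`, which is the runner's `container_name` (the Compose service name is `ollama-ai-runner`); Docker's embedded DNS resolves it on the shared network.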
  </StyledStep>

  <StyledStep title="Start the stack">
    ```bash icon="terminal" title="Start the Ollama stack" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
    docker compose up -d
    ```
  </StyledStep>

  <StyledStep title="Pull your LLM model">
    ```bash icon="terminal" title="Pull the first Ollama model" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
    docker exec -it ollama ollama pull llama3.1:8b
    ```

    Replace `llama3.1:8b` with your chosen model tag. The model downloads into the `ollama` volume and persists across restarts.
  </StyledStep>

  <StyledStep title="Add the LLM entry to aiModels.json">
    ```json icon="code" title="~/.lpData/aiModels.json" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
    {
      "pipeline": "llm",
      "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
      "warm": true,
      "price_per_unit": 0.18,
      "currency": "USD",
      "pixels_per_unit": 1000000,
      "url": "http://llm_runner:8000"
    }
    ```

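    Add this object to the top-level array in `aiModels.json`, alongside any existing entries. The `url` points at the runner's container name, `llm_runner`, as set by `container_name` in the compose file. go-livepeer and the runner must both be attached to the `livepeer-ai` network for this hostname to resolve.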
  </StyledStep>

  <StyledStep title="Restart the AI worker and verify">
    Restart your go-livepeer process, or restart the AI worker component, to load the new `aiModels.json` entry.

    After 2 to 3 minutes, check [tools.livepeer.cloud/ai/network-capabilities](https://tools.livepeer.cloud/ai/network-capabilities) and search for your Orchestrator address. The `llm` pipeline should appear with **Warm** status.
  </StyledStep>
</StyledSteps>
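
Before restarting, it can help to sanity-check the new entry. The following Python sketch is one way to do that under stated assumptions: the field list simply mirrors the example entry above (go-livepeer's own validation rules are not documented here), and `check_llm_entry` is a hypothetical helper, not a go-livepeer tool.

```python
from urllib.parse import urlparse

# Fields taken from the example entry in this guide; treating all of them
# as required is an assumption made for this sketch.
REQUIRED_FIELDS = ("pipeline", "model_id", "price_per_unit", "pixels_per_unit", "url")

ENTRY = {
    "pipeline": "llm",
    "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "warm": True,
    "price_per_unit": 0.18,
    "currency": "USD",
    "pixels_per_unit": 1000000,
    "url": "http://llm_runner:8000",
}


def check_llm_entry(entry, runner_host="llm_runner"):
    """Return a list of problems; an empty list means the entry looks consistent."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if entry.get("pipeline") != "llm":
        problems.append("pipeline should be 'llm'")
    # The url host must match the runner's container name on the shared network.
    host = urlparse(entry.get("url", "")).hostname
    if host != runner_host:
        problems.append(f"url host is {host!r}, expected {runner_host!r}")
    return problems


print(check_llm_entry(ENTRY) or "entry looks consistent")
```

The host check catches the most common slip in this setup: a `url` that points at `localhost` or at a name other than the runner's `container_name`.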

<CustomDivider />

## Model selection for 8 GB VRAM

Quantised models store weights at reduced precision (typically 4-bit integers instead of 16-bit floating point) so they fit within smaller VRAM budgets, usually with only a small loss in output quality. Ollama's model tags pull pre-quantised builds, so there is no separate quantisation step.

<StyledTable variant="bordered">
  <thead>
    <TableRow header>
      <TableCell header>Model</TableCell>
      <TableCell header>Ollama tag</TableCell>
      <TableCell header>Hugging Face model\_id</TableCell>
      <TableCell header>VRAM</TableCell>
    </TableRow>
  </thead>

  <tbody>
    <TableRow>
      <TableCell>Llama 3.1 8B</TableCell>
      <TableCell>`llama3.1:8b`</TableCell>
      <TableCell>`meta-llama/Meta-Llama-3.1-8B-Instruct`</TableCell>
      <TableCell>\~8 GB</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>Mistral 7B</TableCell>
      <TableCell>`mistral:7b`</TableCell>
      <TableCell>`mistralai/Mistral-7B-Instruct-v0.3`</TableCell>
      <TableCell>\~8 GB</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>Gemma 2 9B</TableCell>
      <TableCell>`gemma2:9b`</TableCell>
      <TableCell>`google/gemma-2-9b-it`</TableCell>
      <TableCell>\~10 GB</TableCell>
    </TableRow>

    <TableRow>
      <TableCell>Llama 3.1 70B Q4</TableCell>
      <TableCell>`llama3.1:70b`</TableCell>
      <TableCell>`meta-llama/Meta-Llama-3.1-70B-Instruct`</TableCell>
      <TableCell>\~40 GB</TableCell>
    </TableRow>
  </tbody>
</StyledTable>

For 8 GB VRAM GPUs, use `llama3.1:8b` or `mistral:7b`. Gemma 2 9B typically needs closer to 10 GB, so single 8 GB cards should stay in the 7B to 8B class. Llama 3.1 70B is listed for comparison only and needs roughly 40 GB.
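
The effect of quantisation on memory can be estimated with simple arithmetic: parameters multiplied by bits per weight, divided by eight to get bytes. The Python sketch below covers the weights only. The KV cache and runtime buffers come on top, which is one reason the VRAM figures in the table are higher than the raw weight size, and real 4-bit formats may average slightly more than 4 bits per weight. The function name `weights_gb` is illustrative.

```python
def weights_gb(parameters, bits_per_weight):
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return parameters * bits_per_weight / 8 / 1e9


# Compare an 8B-parameter model at three precisions.
for bits in (16, 8, 4):
    print(f"8B parameters at {bits}-bit: ~{weights_gb(8e9, bits):.0f} GB of weights")
```

At 16-bit precision the weights of an 8B model alone exceed an 8 GB card; at 4-bit they take roughly half of it, leaving headroom for context.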

### Model ID mapping

The **Ollama tag** (`llama3.1:8b`) and the **Livepeer model\_id** (`meta-llama/Meta-Llama-3.1-8B-Instruct`) are two naming conventions for the same model. Ollama uses its own tag format internally; go-livepeer uses Hugging Face IDs for on-chain capability advertisement.

Use the Hugging Face ID in the `model_id` field of `aiModels.json`, and the Ollama tag in the `ollama pull` command.
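
If you manage several models, keeping the pairs in one place avoids mismatches between what was pulled and what is advertised. The Python sketch below copies the pairs from the table above into a lookup; `model_id_for` is a hypothetical helper for your own tooling, and it says nothing about how the runner maps names internally.

```python
# Ollama tag -> model_id, copied from the table in this section.
OLLAMA_TAG_TO_MODEL_ID = {
    "llama3.1:8b": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "mistral:7b": "mistralai/Mistral-7B-Instruct-v0.3",
    "gemma2:9b": "google/gemma-2-9b-it",
    "llama3.1:70b": "meta-llama/Meta-Llama-3.1-70B-Instruct",
}


def model_id_for(tag):
    """Return the model_id to put in aiModels.json for a pulled Ollama tag."""
    try:
        return OLLAMA_TAG_TO_MODEL_ID[tag]
    except KeyError:
        # Fail loudly for tags that are not in the table.
        raise ValueError(f"no known model_id for Ollama tag {tag!r}") from None


print(model_id_for("llama3.1:8b"))
```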

<CustomDivider />

## Pricing the LLM pipeline

LLM pricing is per token, not per pixel. Set `currency` to `USD` and use `pixels_per_unit` as a token-count proxy, so that `price_per_unit` is the price for that many tokens:

```json icon="code" title="LLM pricing in aiModels.json" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
{
  "pipeline": "llm",
  "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "price_per_unit": 0.18,
  "currency": "USD",
  "pixels_per_unit": 1000000,
  "warm": true,
  "url": "http://llm_runner:8000"
}
```

This example sets a rate of $0.18 per million tokens, a competitive rate for 8B parameter models as of early 2026. Adjust it based on your GPU's inference throughput and current market rates.

Check [tools.livepeer.cloud/ai/network-capabilities](https://tools.livepeer.cloud/ai/network-capabilities) for current LLM pipeline pricing from other Orchestrators before setting your rate.
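
To see what a given rate means per job, the arithmetic is tokens multiplied by `price_per_unit` and divided by `pixels_per_unit`. The Python sketch below works that through with the values from the example entry. It assumes simple linear pricing; which tokens go-livepeer counts for billing and how it converts USD for payment are not covered here, and `job_price_usd` is an illustrative name.

```python
from decimal import Decimal


def job_price_usd(tokens, price_per_unit, pixels_per_unit):
    """Price of `tokens` tokens when `price_per_unit` USD buys `pixels_per_unit` tokens."""
    # Decimal avoids binary floating-point noise in small currency amounts.
    return Decimal(str(price_per_unit)) * tokens / pixels_per_unit


# Values from the example entry: 0.18 USD per 1,000,000 tokens.
print("1,000,000 tokens:", job_price_usd(1_000_000, 0.18, 1_000_000), "USD")
print("2,500 tokens:", job_price_usd(2_500, 0.18, 1_000_000), "USD")
```

Running the numbers for typical request sizes makes it easier to compare your rate with those listed by other Orchestrators.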

<CustomDivider />

## Testing locally

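After the stack is running, test the Ollama runner directly before routing live traffic. The compose file above does not publish port 8000 to the host, so either add a `ports` mapping such as `8000:8000` to the runner service, or run these commands from a container attached to the `livepeer-ai` network and replace `localhost` with `llm_runner`: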

```bash icon="terminal" title="Test LLM inference locally" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
# Check Ollama is running and the model has been pulled
docker exec -it ollama ollama list

# Test inference via the runner (adjust port if different)
curl -X POST http://localhost:8000/llm \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "prompt": "Hello"}'
```

Verify the runner health endpoint is responding:

```bash icon="terminal" title="Check the runner health endpoint" theme={"theme":{"light":"github-light","dark":"dark-plus"}}
curl http://localhost:8000/health
# Expected: HTTP 200
```
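
The runner may take a little while to come up after `docker compose up -d`, so a single `curl` can fail even when the stack is healthy. The Python sketch below polls the health endpoint until it returns HTTP 200 or the attempts run out. It uses only the standard library; the function name `wait_for_health`, the retry counts, and the injectable `opener` parameter (there so the function can be exercised without a live runner) are choices made for this example.

```python
import time
import urllib.request


def wait_for_health(url, attempts=10, delay=3.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200; return True on success, False otherwise."""
    for _ in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and urllib's URLError/HTTPError.
            pass
        time.sleep(delay)
    return False
```

For example, `wait_for_health("http://localhost:8000/health")` polls the same endpoint as the `curl` command above, subject to the same note about where port 8000 is reachable from.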

<CustomDivider />

## Related pages

<CardGroup cols={2}>
  <Card title="AI Inference Operations" icon="microchip" href="/v2/orchestrators/guides/ai-and-job-workloads/ai-inference-operations" arrow horizontal>
    aiModels.json reference and full pipeline architecture including the url field for external containers.
  </Card>

  <Card title="Diffusion Pipeline Setup" icon="image" href="/v2/orchestrators/guides/ai-and-job-workloads/diffusion-pipeline-setup" arrow horizontal>
    text-to-image, image-to-image, and other diffusion pipelines requiring the standard ai-runner.
  </Card>

  <Card title="Audio and Vision Pipelines" icon="waveform-lines" href="/v2/orchestrators/guides/ai-and-job-workloads/audio-and-vision-pipelines" arrow horizontal>
    audio-to-text, text-to-speech, image-to-text, and segment-anything-2 setup.
  </Card>

  <Card title="AI Model Management" icon="sliders" href="/v2/orchestrators/guides/config-and-optimisation/ai-model-management" arrow horizontal>
    Warm vs cold strategy and optimisation flags for AI pipelines.
  </Card>
</CardGroup>
