Both tests run entirely off-chain. No blockchain, no staking, no ETH. The only goal is confirming your GPU works with Livepeer before committing to production setup.
Two smoke tests on this page: video transcoding (20-30 min) and AI inference (35-65 min). Both run with Docker only. Complete the video test first - it shares the Docker prerequisites with the AI test.
Use this page to verify hardware. When you are ready for on-chain activation and earning, continue to the .

Prerequisites

Confirm each item before starting:

Video transcoding test

What this proves: the orchestrator accepts video segments, transcodes them on the GPU, and delivers HLS output.
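The test can be sketched as three local processes. The flags below follow go-livepeer's conventions, but the addresses, ports, GPU index, and input file are assumptions for this sketch - adapt them to your machine (older releases use `-broadcaster` where newer ones use `-gateway`):

```shell
# Terminal 1: orchestrator + transcoder, off-chain, using the first NVIDIA GPU.
livepeer -orchestrator -transcoder -network offchain \
  -nvidia 0 -serviceAddr 127.0.0.1:8936

# Terminal 2: gateway, pointed at the local orchestrator.
livepeer -gateway -network offchain \
  -orchAddr 127.0.0.1:8936 -rtmpAddr 127.0.0.1:1935 -httpAddr 127.0.0.1:8935

# Terminal 3: push a local test file into the gateway as an RTMP stream.
ffmpeg -re -i test.mp4 -c copy -f flv rtmp://127.0.0.1:1935/movie
```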

Video flow summary

The gateway received the RTMP stream, split it into segments, routed them to the orchestrator, the orchestrator transcoded each segment on the GPU, and the gateway reassembled the output as an HLS stream. The -network offchain flag kept the test local and bypassed blockchain interaction. GPU transcoding works on this machine. Continue to the AI test, or skip to the for production configuration.
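You can confirm the HLS output directly from the gateway. The port and stream name below are assumptions matching the sketch above; substitute the manifest ID your gateway logs print:

```shell
# Fetch the master playlist to see the available renditions.
curl -s http://127.0.0.1:8935/stream/movie.m3u8

# Probe one rendition's codec and resolution to confirm real transcoded video.
ffprobe -v error -select_streams v:0 \
  -show_entries stream=codec_name,width,height -of csv=p=0 \
  http://127.0.0.1:8935/stream/movie/source.m3u8
```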

AI inference test

What this proves: the AI runner container downloads and serves a warm model, and the orchestrator routes inference requests correctly. Time estimate: 35-65 minutes - most of that is the model download (~6 GB for SDXL-Lightning); the inference test itself takes under 2 minutes once the model is loaded.
The diffusion test requires 24 GB of VRAM. For GPUs with 8-16 GB of VRAM, skip to the LLM alternative below, and use http://llm_runner:8000 as the url value in aiModels.json.
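Once the model is warm, the inference check is a single request. The gateway address here is an assumption for this sketch; the request shape follows the text-to-image pipeline:

```shell
# Send a text-to-image request to the locally running gateway (assumed address).
curl -s http://127.0.0.1:8935/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
        "model_id": "ByteDance/SDXL-Lightning",
        "prompt": "a lighthouse at dusk",
        "width": 1024,
        "height": 1024
      }'
# A successful response is JSON referencing the generated PNG.
```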
The llm pipeline uses the Ollama runner instead of livepeer/ai-runner, serving quantised LLMs within 8 GB of VRAM. Pull the Ollama runner:
docker pull tztcloud/livepeer-ollama-runner:0.1.1
docker pull ollama/ollama:latest
Create a Docker volume and pull a model:
docker volume create ollama
docker run -d --name ollama --gpus all \
  -v ollama:/root/.ollama \
  ollama/ollama:latest

docker exec -it ollama ollama pull llama3.1:8b
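Before wiring the runner into the orchestrator, it is worth confirming the model actually loaded and can answer a prompt on the GPU. Since the container above does not publish a port, run the checks inside it:

```shell
# List models stored in the ollama volume; llama3.1:8b should appear.
docker exec ollama ollama list

# Run a one-off prompt to verify inference works end to end.
docker exec ollama ollama run llama3.1:8b "Reply with the single word: ready"
```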
Add the LLM entry to aiModels.json:
[
  {
    "pipeline": "llm",
    "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "warm": true,
    "price_per_unit": 0.18,
    "currency": "USD",
    "pixels_per_unit": 1000000,
    "url": "http://llm_runner:8000"
  }
]
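A malformed aiModels.json is a common cause of silent startup failures, so it is worth sanity-checking the file before restarting the orchestrator. A minimal check using Python's standard json module - the required-key set here is an assumption based on the entry above:

```shell
python3 - <<'EOF'
import json

# Fail fast if aiModels.json is malformed or an entry lacks expected keys.
models = json.load(open("aiModels.json"))
for m in models:
    missing = {"pipeline", "model_id", "url"} - m.keys()
    assert not missing, f"{m.get('pipeline')}: missing {missing}"
print("aiModels.json OK:", [m["pipeline"] for m in models])
EOF
```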
For the full LLM pipeline setup including the Docker network configuration, see .

AI flow summary

The orchestrator started in off-chain mode with the AI worker enabled. On first start, livepeer/ai-runner was spawned as a child Docker container via Docker-out-of-Docker, downloaded model weights from HuggingFace, and loaded them into GPU VRAM. The test inference request travelled from curl to the orchestrator, through the AI runner container, and returned a generated PNG. AI inference works on this machine. Model weights remain cached in ~/.lpData/models/ for production use.

Next steps

Last modified on March 16, 2026