To ensure your GPU(s) perform optimally and to identify any bottlenecks, you can run a benchmark test. Optimal performance significantly increases your chances of being selected for jobs, since inference speed is a crucial factor in the selection process. This guide shows you how to benchmark your GPU(s) to determine the best pipeline/model combination to serve, thereby maximizing your revenue.

Prerequisites

Before running a benchmark test, make sure you have met all the prerequisites for running an Orchestrator node.
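
As a quick sanity check (this assumes an NVIDIA GPU and that the NVIDIA Container Toolkit is installed, as covered in the prerequisites), you can confirm that both the host driver and Docker can see your GPU(s):

# Verify the host driver sees your GPU(s)
nvidia-smi
# Verify Docker can access them (the CUDA image tag below is only an example)
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi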

Benchmarking Steps

1. Pull the AI Runner Docker Image

First, pull the latest AI runner Docker image from Docker Hub. This image contains the necessary tools for benchmarking.

docker pull livepeer/ai-runner:latest
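
To confirm the image downloaded successfully, you can list the local copies:

docker images livepeer/ai-runner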

2. Pull Pipeline-Specific Images (optional)

Next, pull any pipeline-specific images if needed. Check the pipelines documentation for more information. For example, to pull the image for the segment-anything-2 pipeline:

docker pull livepeer/ai-runner:segment-anything-2
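
When benchmarking such a pipeline, point the run command in step 4 at the matching image tag instead of latest. A sketch, assuming bench.py supports this pipeline (the model ID here is illustrative; check the pipelines documentation for the exact values):

docker run --gpus '"device=0"' -v ./models:/models livepeer/ai-runner:segment-anything-2 python bench.py --pipeline segment-anything-2 --model_id facebook/sam2-hiera-large --runs 3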

3. Download AI Models

Download the AI models you want to benchmark. For more information see the Download AI Models guide.
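
Whichever download method the guide prescribes, the benchmark in the next step expects the checkpoints in a local models directory, which is mounted into the container as /models. A minimal sketch of preparing that directory (the path is an assumption based on the mount used below):

mkdir -p ./models   # mounted into the container via -v ./models:/models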

4. Execute the Benchmark Test

Run the benchmark test using the following command. This runs the specified pipeline/model combination on your GPU(s) and measures its performance.

docker run --gpus <GPU_IDs> -v ./models:/models livepeer/ai-runner:latest python bench.py --pipeline <PIPELINE> --model_id <MODEL_ID> --runs <RUNS> --num_inference_steps <NUM_INFERENCE_STEPS> --batch_size <BATCH_SIZE>

Example command:

docker run --gpus '"device=0"' -v ./models:/models livepeer/ai-runner:latest python bench.py --pipeline text-to-image --model_id stabilityai/sd-turbo --runs 3 --batch_size 3

In this command, the following parameters are used:

  • <GPU_IDs>: Specify which GPU(s) to use. For example, '"device=0"' for GPU 0, '"device=0,1"' for GPU 0 and GPU 1, or '"device=all"' for all GPUs (see the multi-GPU example after this list).
  • <PIPELINE>: The pipeline to benchmark (e.g., text-to-image).
  • <MODEL_ID>: The model ID to use for benchmarking (e.g., stabilityai/sd-turbo).
  • <RUNS>: The number of benchmark runs to perform.
  • <NUM_INFERENCE_STEPS>: The number of inference steps to perform per run (optional).
  • <BATCH_SIZE>: The batch size to use during benchmarking (optional).
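
For example, to benchmark the same model on two GPUs with an explicit number of inference steps (the flag values here are illustrative, not tuned recommendations):

docker run --gpus '"device=0,1"' -v ./models:/models livepeer/ai-runner:latest python bench.py --pipeline text-to-image --model_id stabilityai/sd-turbo --runs 3 --num_inference_steps 2 --batch_size 1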

For full usage information on the benchmarking script, run:

docker run livepeer/ai-runner:latest python bench.py -h

5. Analyze the Benchmark Results

After running the benchmark, you should see output similar to the following:

----AGGREGATE METRICS----

pipeline load time: 5.199s
pipeline load max GPU memory allocated: 5.876GiB
pipeline load max GPU memory reserved: 6.043GiB
avg inference time: 7.538s
avg inference max GPU memory allocated: 6.407GiB
avg inference max GPU memory reserved: 6.729GiB

This output reports the pipeline load time, load-time GPU memory usage, and the average inference time and memory usage across runs, helping you determine the best pipeline/model combination to serve on your GPU(s).
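
To compare several candidate models, you can loop over them and save each run's output to a log file for side-by-side review (the model list and log naming here are illustrative):

for model in stabilityai/sd-turbo stabilityai/sdxl-turbo; do
  docker run --gpus '"device=0"' -v ./models:/models livepeer/ai-runner:latest \
    python bench.py --pipeline text-to-image --model_id "$model" --runs 3 --batch_size 1 \
    | tee "bench-$(echo "$model" | tr '/' '-').log"
done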