Overview

The segment-anything-2 pipeline provides direct access to Segment Anything 2 (SAM 2), developed by Meta AI Research. In its current version, it supports only image segmentation, enabling it to segment any object in an image. Future versions will also support direct video input, allowing an object to be tracked consistently across all frames of a video in real time. This advancement will unlock new possibilities for video editing and enhance experiences in mixed reality. The pipeline is powered by the facebook/sam2-hiera-large model from Hugging Face.

Models

Warm Models

The current warm model requested for the segment-anything-2 pipeline is:

  • facebook/sam2-hiera-large: The largest model in the Segment Anything 2 model suite, designed for the most accurate image segmentation.

For faster responses with different segment-anything-2 models, ask Orchestrators to load them on their GPUs via the ai-video channel in the Discord server.

On-Demand Models

The following models have been tested and verified for the segment-anything-2 pipeline:

If a specific model you wish to use is not listed, please submit a feature request on GitHub to get the model verified and added to the list.

Basic Usage Instructions

For a detailed understanding of the segment-anything-2 endpoint and to experiment with the API, see the Livepeer AI API Reference.

To segment an object in an image with the segment-anything-2 pipeline, send a POST request to the Gateway’s segment-anything-2 API endpoint:

curl -X POST http://<GATEWAY_IP>/segment-anything-2 \
    -F model_id="facebook/sam2-hiera-large" \
    -F point_coords="[[120,100],[120,50]]" \
    -F point_labels="[1,0]" \
    -F image=@<PATH_TO_IMAGE>/cool-cat.png

In this command:

  • <GATEWAY_IP> should be replaced with your AI Gateway’s IP address.
  • model_id is the model used for image segmentation.
  • The point_coords field holds the coordinates of the prompt points.
  • The point_labels field holds the labels for those points: 1 marks a foreground point to include, 0 marks a background point to exclude.
  • The image field holds the path to the image file to be segmented.

For additional optional parameters, refer to the Livepeer AI API Reference.
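
For a fully scripted version of the request, the placeholders can be replaced with shell variables. The address and file path below are illustrative only, and -o simply saves the raw JSON response for later inspection:

GATEWAY_IP="192.0.2.10"                  # placeholder; use your AI Gateway's address
IMAGE_PATH="$HOME/images/cool-cat.png"   # placeholder; path to your input image

curl -s -X POST "http://${GATEWAY_IP}/segment-anything-2" \
    -F model_id="facebook/sam2-hiera-large" \
    -F point_coords="[[120,100],[120,50]]" \
    -F point_labels="[1,0]" \
    -F image=@"${IMAGE_PATH}" \
    -o response.json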

After execution, the Orchestrator processes the request and returns the response to the Gateway:

{
  "masks": "[[[2.84, 2.83, ...], [2.92, 2.91, ...], [3.22, 3.56, ...], ...]]",
  "scores": "[0.50, 0.37, ...]",
  "logits": "[[[2.84, 2.66, ...], [3.59, 5.20, ...], [5.07, 5.68, ...], ...]]"
}
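
Note that masks, scores, and logits are returned as JSON-encoded strings rather than nested JSON arrays. As a minimal sketch, assuming the response was saved to response.json (for example with curl's -o flag as shown earlier), jq's fromjson filter can decode these fields to report the number of masks and the best confidence score:

jq '{num_masks: (.masks | fromjson | length),
     top_score: (.scores | fromjson | max)}' response.json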

Orchestrator Configuration

To configure your Orchestrator to serve the segment-anything-2 pipeline, refer to the Orchestrator Configuration guide.

System Requirements

The following system requirements are recommended for optimal performance:

Recommended Pricing

We are planning to simplify pricing in the future so that Orchestrators can set a single AI price per compute unit and have the system automatically scale it based on a model’s compute requirements.

The pricing for the segment-anything-2 pipeline is currently based on competitor pricing. However, we strongly encourage Orchestrators to set their own pricing based on their costs and requirements. Setting a competitive price helps attract more jobs, since Gateways can set a maximum price for a job. The current recommended price for this pipeline is 3.22e-11 USD per input pixel (height * width).
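
As a rough illustration of that rate (the image dimensions below are arbitrary examples), the cost of a single request can be estimated directly from the pixel count:

# Estimated cost for a 1024 x 576 input image at 3.22e-11 USD per pixel
awk 'BEGIN { printf "%.2e USD\n", 1024 * 576 * 3.22e-11 }'
# prints approximately 1.90e-05 USD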

Pipeline-Specific Image

To serve the segment-anything-2 pipeline, you must use a pipeline-specific AI Runner container. Pull the required container from Docker Hub using the following command:

docker pull livepeer/ai-runner:segment-anything-2
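
The Orchestrator normally starts this runner container for you as part of serving the pipeline. If you want to sanity-check the image locally first, a standalone run along the following lines can work; the PIPELINE and MODEL_ID environment variables and the /models mount path are assumptions based on the general AI Runner interface, so check the AI Runner documentation for the exact invocation:

docker run --rm --name segment-anything-2 \
    --gpus all \
    -e PIPELINE=segment-anything-2 \
    -e MODEL_ID=facebook/sam2-hiera-large \
    -p 8000:8000 \
    -v ~/.lpData/models:/models \
    livepeer/ai-runner:segment-anything-2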

API Reference

Explore the segment-anything-2 endpoint and experiment with the API in the Livepeer AI API Reference.