Overview - Livepeer Docs

The Livepeer AI network offers a variety of generative AI pipelines that applications can use to request AI inference jobs on the Livepeer network. Currently, the focus is on Diffusion models developed using Huggingface’s Diffusers library, but future updates will extend support to other model types. This section introduces the available pipelines, the models they support, and provides a basic usage example. For a comprehensive guide on integrating the Livepeer AI network into your application, refer to the Building on Livepeer AI section.

Models on the Livepeer AI network

Warm Models

During the Beta phase of the Livepeer AI network, Orchestrators are encouraged to keep at least one model per pipeline active on their GPUs (“warm models”). This approach ensures quicker response times for early builders. We’re optimizing GPU model loading/unloading to relax this requirement. The current warm models for each pipeline are listed on their respective pages.

For faster responses with different Diffusion models, request Orchestrators to load it on their GPU via the ai-video channel in Discord Server.

On-Demand Models

Orchestrators can theoretically load any diffusion model from Hugging Face on-demand, optimizing GPU resources by loading models only when needed. However, during the Beta phase, Orchestrators need to pre-download a model.

If a specific model you wish to use is not listed on the respective pipeline page, submit a feature request on GitHub to get the model verified and added to the list.

Generative AI Pipelines

The Livepeer AI network currently supports the following generative AI pipelines:

Audio-to-Text

The audio-to-text pipeline uses automatic speech recognition (ASR) to translate audio to text with timestamps

Image-to-Image

The image-to-image pipeline enables advanced image manipulations, including style transfer, image enhancement, and more

Image-to-Text

The image-to-text pipeline generates captions for input images, with an optional prompt to guide the process.

Image-to-Video

The image-to-video pipeline creates animated high-quality videos from images

Segment-Anything-2

The segment-anything-2 pipeline offers promptable visual segmentation for images and videos.

Text-to-Image

The text-to-image pipeline generates high-quality images from text descriptions

Text-to-Speech

The text-to-speech pipeline generates high-quality, natural sounding speech in the style of a given speaker (gender, pitch, speaking style, etc).

Upscale

The upscale pipeline transforms low-resolution images into high-quality ones without distortion

LLM

The LLM pipeline provides an OpenAI-compatible interface for text generation, enabling seamless integration into media workflows.

AI Video

​Models on the Livepeer AI network

​Warm Models

​On-Demand Models

​Generative AI Pipelines

Audio-to-Text

Image-to-Image

Image-to-Text

Image-to-Video

Segment-Anything-2

Text-to-Image

Text-to-Speech

Upscale

LLM

Models on the Livepeer AI network

Warm Models

On-Demand Models

Generative AI Pipelines