Overview
The Livepeer AI network offers a variety of generative AI pipelines that applications can use to request AI inference jobs on the Livepeer network. Currently, the focus is on diffusion models built with Hugging Face’s Diffusers library, but future updates will extend support to other model types. This section introduces the available pipelines and the models they support, and provides a basic usage example. For a comprehensive guide on integrating the Livepeer AI network into your application, refer to the Building on Livepeer AI section.
Models on the Livepeer AI network
Warm Models
During the Beta phase of the Livepeer AI network, Orchestrators are encouraged to keep at least one model per pipeline active on their GPUs (“warm models”). This approach ensures quicker response times for early builders. We’re optimizing GPU model loading/unloading to relax this requirement. The current warm models for each pipeline are listed on their respective pages.
For faster responses with a different diffusion model, ask Orchestrators to load it on their GPUs via the ai-video channel on the Discord server.
On-Demand Models
Orchestrators can theoretically load any diffusion model from Hugging Face on demand, optimizing GPU resources by loading models only when needed. However, during the Beta phase, Orchestrators must download a model in advance before they can serve it.
If a specific model you wish to use is not listed on the respective pipeline page, submit a feature request on GitHub to get the model verified and added to the list.
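The difference between warm and on-demand models can be sketched from the application side. The Python below is only an illustration: the WARM_MODELS table, its model IDs, and the choose_model helper are hypothetical placeholders, not part of any Livepeer API. In practice the warm models would be read from the respective pipeline pages.

```python
from typing import Dict, List, Tuple

# Hypothetical data: pipeline name -> model IDs currently listed as warm.
# These IDs are placeholders, not real entries from the pipeline pages.
WARM_MODELS: Dict[str, List[str]] = {
    "text-to-image": ["example-org/warm-model-a"],
    "image-to-video": ["example-org/warm-model-b"],
}


def choose_model(pipeline: str, preferred: str) -> Tuple[str, bool]:
    """Return (model_id, is_warm) for a request.

    The preferred model is always kept. The flag tells the caller whether
    it is warm, or whether an Orchestrator may first have to load it
    on demand, which can add latency to the first response.
    """
    warm = WARM_MODELS.get(pipeline, [])
    return preferred, preferred in warm
```

An application could use the flag to show a longer loading state, or to fall back to a warm model when response time matters more than the specific model.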
Generative AI Pipelines
The Livepeer AI network currently supports the following generative AI pipelines:
Text-to-Image
The text-to-image pipeline generates high-quality images from text descriptions.
Image-to-Image
The image-to-image pipeline enables advanced image manipulations, including style transfer, image enhancement, and more.
Image-to-Video
The image-to-video pipeline creates animated high-quality videos from images.
Upscale
The upscale pipeline transforms low-resolution images into high-quality ones without distortion.
Audio-to-Text
The audio-to-text pipeline uses automatic speech recognition (ASR) to transcribe audio into text with timestamps.
Segment-Anything-2
The segment-anything-2 pipeline offers promptable visual segmentation for images and videos.
Text-to-Speech
The text-to-speech pipeline generates high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.).
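As a minimal sketch of what requesting a job from one of these pipelines could look like, the Python below builds a JSON POST request. It rests on several assumptions: that a gateway exposes each pipeline at a path matching its name, and that the request body carries model_id and prompt fields. GATEWAY_URL and the model ID are placeholders, and build_request and send are illustrative helper names. Check the relevant pipeline page for the actual endpoint and parameters before relying on this.

```python
import json
import urllib.request

# Assumption: replace with the base URL of the gateway you actually use.
GATEWAY_URL = "https://gateway.example.com"


def build_request(pipeline: str, model_id: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a pipeline job."""
    # Assumed body shape: a model identifier plus a text prompt.
    payload = json.dumps({"model_id": model_id, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{GATEWAY_URL}/{pipeline}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made; calling send(build_request("text-to-image", "example-org/some-model", "a lighthouse at dusk")) would then submit the job to the configured gateway.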