Models on the Livepeer AI network
Warm Models
During the Beta phase of the Livepeer AI network, Orchestrators are encouraged to keep at least one model per pipeline active on their GPUs (“warm models”). This approach ensures quicker response times for early builders. We’re optimizing GPU model loading/unloading to relax this requirement. The current warm models for each pipeline are listed on their respective pages.For faster responses with different Diffusion
models, request Orchestrators
to load it on their GPU via the
ai-video
channel in Discord
Server.On-Demand Models
Orchestrators can theoretically load any diffusion model from Hugging Face on-demand, optimizing GPU resources by loading models only when needed. However, during the Beta phase, Orchestrators need to pre-download a model.If a specific model you wish to use is not listed on the respective pipeline
page, submit a feature
request
on GitHub to get the model verified and added to the list.
Generative AI Pipelines
The Livepeer AI network currently supports the following generative AI pipelines:Audio-to-Text
The audio-to-text pipeline uses automatic speech recognition (ASR) to
translate audio to text with timestamps
Image-to-Image
The image-to-image pipeline enables advanced image manipulations, including
style transfer, image enhancement, and more
Image-to-Text
The image-to-text pipeline generates captions for input images, with an
optional prompt to guide the process.
Image-to-Video
The image-to-video pipeline creates animated high-quality videos from images
Segment-Anything-2
The segment-anything-2 pipeline offers promptable visual segmentation for
images and videos.
Text-to-Image
The text-to-image pipeline generates high-quality images from text
descriptions
Text-to-Speech
The text-to-speech pipeline generates high-quality, natural sounding speech
in the style of a given speaker (gender, pitch, speaking style, etc).
Upscale
The upscale pipeline transforms low-resolution images into high-quality ones
without distortion
LLM
The LLM pipeline provides an OpenAI-compatible interface for text
generation, enabling seamless integration into media workflows.