The Livepeer AI gateway exposes eight batch pipelines and one LLM pipeline through HTTP POST endpoints. Each pipeline accepts a request body (JSON, or multipart/form-data when a file is uploaded) carrying model_id and pipeline-specific fields, and returns a JSON response with the result. Real-time video AI (live-video-to-video) runs through the trickle protocol and is covered separately in the real-time AI overview. For warm models, VRAM requirements, and architecture support per pipeline, see model support. For SDK wrappers, see AI SDKs.

Shared conventions

Base URL: Any Livepeer gateway endpoint. The community gateway at https://dream-gateway.livepeer.cloud accepts unauthenticated requests for development.

Authentication: Bearer token when the gateway requires it. The community gateway does not require a token.

Request format: POST /<pipeline-endpoint> with Content-Type: application/json, or multipart/form-data for pipelines that take a file upload.

model_id field: Every batch pipeline accepts a model_id field specifying the Hugging Face model ID. The LLM pipeline takes an Ollama-compatible model ID in a field named model instead. Omitting model_id uses the pipeline’s default warm model.

Error responses: 400 for malformed requests, 422 for validation errors (invalid model_id, missing required fields), 500 for inference failures. Error bodies include a detail field with the failure reason.

Cold model latency: If no orchestrator has the requested model warm in GPU memory, the first request triggers a model load (30 seconds to 5 minutes depending on model size). Subsequent requests to the same model on the same orchestrator skip the load and pay only inference time.
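The conventions above can be sketched in Python using only the standard library. This is an illustration, not an official client: the helper names build_json_request, describe_error, and post are hypothetical, and the 300-second timeout is an assumed value chosen to cover the cold-load range quoted above.

```python
import json
import urllib.error
import urllib.request

GATEWAY_URL = "https://dream-gateway.livepeer.cloud"  # community gateway named above


def build_json_request(pipeline, payload, token=None, base_url=GATEWAY_URL):
    """Build (but do not send) a POST request for a JSON pipeline."""
    headers = {"Content-Type": "application/json"}
    if token:  # only needed when the gateway requires authentication
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{pipeline}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def describe_error(status, body):
    """Turn a status code and raw error body into a short message."""
    kinds = {400: "malformed request", 422: "validation error", 500: "inference failure"}
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, AttributeError):
        detail = body  # body was not a JSON object
    return f"{status} {kinds.get(status, 'unexpected error')}: {detail}"


def post(request, timeout=300):
    """Send the request; the long timeout (an assumption) allows for a cold model load."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", "replace")
        raise RuntimeError(describe_error(err.code, body)) from err
```

A call would look like post(build_json_request("text-to-image", {"prompt": "a lighthouse"})); pass token only for gateways that require one.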

Pipeline reference

text-to-image

Generate images from text prompts using diffusion models (SDXL, SD 1.5, Flux).
curl -X POST https://dream-gateway.livepeer.cloud/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "prompt": "a glowing neural network in a dark room",
    "width": 1024,
    "height": 1024,
    "guidance_scale": 7.5,
    "num_inference_steps": 8,
    "seed": 42
  }'
| Field | Type | Required | Description |
|---|---|---|---|
| model_id | string | No | Hugging Face model ID. Default: SG161222/RealVisXL_V4.0_Lightning |
| prompt | string | Yes | Text prompt for generation |
| negative_prompt | string | No | Terms to avoid in generation |
| width | integer | No | Output width in pixels (default: 1024) |
| height | integer | No | Output height in pixels (default: 1024) |
| guidance_scale | number | No | Classifier-free guidance scale (default: 7.5) |
| num_inference_steps | integer | No | Denoising steps (default depends on model; Lightning models use 4-8) |
| seed | integer | No | Random seed for reproducibility |
| num_images_per_prompt | integer | No | Number of images to generate (default: 1) |
| safety_check | boolean | No | Run NSFW safety filter (default: true) |
Response: JSON object with images array. Each image is a { url, seed } object.
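As a rough sketch of working with this pipeline from Python, the helpers below assemble a request body from the fields in the table and pull (url, seed) pairs out of a response. The function names are hypothetical, and the response shape is taken from the one-line description above rather than from a schema.

```python
import json

# Optional fields listed in the text-to-image table.
OPTIONAL_FIELDS = {
    "negative_prompt", "width", "height", "guidance_scale",
    "num_inference_steps", "seed", "num_images_per_prompt", "safety_check",
}


def text_to_image_payload(prompt, model_id=None, **options):
    """Assemble a text-to-image request body, rejecting unknown field names."""
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"prompt": prompt, **options}
    if model_id is not None:  # omit to use the pipeline's default warm model
        payload["model_id"] = model_id
    return payload


def image_urls(response_body):
    """Return (url, seed) pairs from a body shaped like {"images": [{"url", "seed"}]}."""
    data = json.loads(response_body)
    return [(image["url"], image.get("seed")) for image in data.get("images", [])]
```

Rejecting unknown field names locally catches typos such as steps instead of num_inference_steps before they turn into a 422 from the gateway.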

image-to-image

Transform images using style transfer, enhancement, or img2img diffusion.
curl -X POST https://dream-gateway.livepeer.cloud/image-to-image \
  -F "model_id=timbrooks/instruct-pix2pix" \
  -F "prompt=make it look like a watercolour painting" \
  -F "image=@input.png" \
  -F "strength=0.8"
| Field | Type | Required | Description |
|---|---|---|---|
| model_id | string | No | Default: timbrooks/instruct-pix2pix |
| image | file | Yes | Input image (multipart form upload) |
| prompt | string | Yes | Transformation instruction |
| strength | number | No | How much to transform (0.0 = no change, 1.0 = full regeneration) |
| guidance_scale | number | No | Guidance scale (default: 7.5) |
| num_inference_steps | integer | No | Denoising steps |
| seed | integer | No | Random seed |
| safety_check | boolean | No | NSFW filter (default: true) |
Response: JSON with images array, same format as text-to-image.
image-to-image uses multipart/form-data, not application/json. The image is uploaded as a file field.
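For readers not using curl, the multipart upload can be approximated with the Python standard library. The sketch below hand-encodes a multipart/form-data body; encode_multipart is a hypothetical helper, and it assumes field names and filenames contain no quotes or line breaks.

```python
import mimetypes
import uuid


def encode_multipart(fields, files):
    """Encode text fields and files as multipart/form-data.

    fields: dict of name -> value (values are converted with str formatting).
    files: dict of name -> (filename, content_bytes).
    Returns (content_type_header, body_bytes).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        part = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
        parts.append(part.encode("utf-8"))
    for name, (filename, content) in files.items():
        # Guess the MIME type from the filename, falling back to a generic type.
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

The returned header value goes in Content-Type and the bytes go in the request body, for example encode_multipart({"prompt": "make it look like a watercolour painting", "strength": 0.8}, {"image": ("input.png", image_bytes)}).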

image-to-video

Animate a still image into a short video clip using Stable Video Diffusion.
curl -X POST https://dream-gateway.livepeer.cloud/image-to-video \
  -F "model_id=stabilityai/stable-video-diffusion-img2vid-xt" \
  -F "image=@input.png" \
  -F "fps=6" \
  -F "motion_bucket_id=127"
| Field | Type | Required | Description |
|---|---|---|---|
| model_id | string | No | Default: stabilityai/stable-video-diffusion-img2vid-xt |
| image | file | Yes | Input image (multipart form upload) |
| fps | integer | No | Output frames per second (default: 6) |
| motion_bucket_id | integer | No | Motion intensity (0-255; default: 127) |
| seed | integer | No | Random seed |
| safety_check | boolean | No | NSFW filter (default: true) |
Response: JSON with frames array containing frame URLs, or a video URL.
SVD outputs 14 to 25 frames at 1024x576 pixels (width x height). Text prompts are not used; the image is the sole conditioning input.

image-to-text

Generate captions or descriptions for images using BLIP or vision-language models.
curl -X POST https://dream-gateway.livepeer.cloud/image-to-text \
  -F "model_id=Salesforce/blip-image-captioning-large" \
  -F "image=@photo.jpg"
| Field | Type | Required | Description |
|---|---|---|---|
| model_id | string | No | Default: Salesforce/blip-image-captioning-large |
| image | file | Yes | Input image (multipart form upload) |
| prompt | string | No | Optional prompt to guide caption content |
Response: JSON with text field containing the generated caption.

audio-to-text

Transcribe audio to text with per-chunk timestamps using Whisper.
curl -X POST https://dream-gateway.livepeer.cloud/audio-to-text \
  -F "model_id=openai/whisper-large-v3" \
  -F "audio=@recording.mp3"
| Field | Type | Required | Description |
|---|---|---|---|
| model_id | string | No | Default: openai/whisper-large-v3 |
| audio | file | Yes | Audio file (mp4, webm, mp3, flac, wav, m4a). Max 50 MB. |
Response: JSON with text (full transcript) and chunks array (per-segment timestamps and text).
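The format and size limits in the table can be checked before uploading. The following is a minimal pre-flight sketch; check_audio is a hypothetical helper, and it assumes "50 MB" means 50 MiB, which the table does not specify.

```python
import os

# Extensions listed in the audio-to-text table.
ALLOWED_AUDIO_EXTENSIONS = {".mp4", ".webm", ".mp3", ".flac", ".wav", ".m4a"}
MAX_AUDIO_BYTES = 50 * 1024 * 1024  # assumption: 50 MB interpreted as 50 MiB


def check_audio(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_AUDIO_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_AUDIO_BYTES}")
    return problems
```

In practice size_bytes would come from os.path.getsize(path). The check only inspects the file name and size, so the gateway may still reject a file whose contents do not match its extension.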

text-to-speech

Generate natural speech from text using Parler-TTS.
curl -X POST https://dream-gateway.livepeer.cloud/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "parler-tts/parler-tts-large-v1",
    "text": "Livepeer is a decentralised video infrastructure network.",
    "description": "A female speaker with a warm, clear voice and moderate pace."
  }'
| Field | Type | Required | Description |
|---|---|---|---|
| model_id | string | No | Default: parler-tts/parler-tts-large-v1 |
| text | string | Yes | Text to synthesise. Max ~600 characters; chunk longer text. |
| description | string | No | Voice characteristics (speaker identity, style, audio quality) |
Response: JSON with audio object containing a URL to the generated audio file.
The text-to-speech pipeline requires a pipeline-specific AI Runner container, so not all orchestrators have it active.
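Because the text field is limited to roughly 600 characters, longer input has to be split client-side and sent as several requests. One possible chunking approach is sketched below; chunk_text is a hypothetical helper, and splitting at sentence boundaries is a design choice made here, not something the pipeline requires.

```python
import re


def chunk_text(text, limit=600):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # flush and start a new chunk
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then sent as its own text-to-speech request with the same description, and the resulting audio files are concatenated in order.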

upscale

Upscale low-resolution images using the SD x4-Upscaler (4x super-resolution).
curl -X POST https://dream-gateway.livepeer.cloud/upscale \
  -F "model_id=stabilityai/stable-diffusion-x4-upscaler" \
  -F "image=@lowres.png" \
  -F "prompt=high quality, sharp details"
| Field | Type | Required | Description |
|---|---|---|---|
| model_id | string | No | Default: stabilityai/stable-diffusion-x4-upscaler |
| image | file | Yes | Input image (multipart form upload) |
| prompt | string | No | Optional quality guidance prompt |
| seed | integer | No | Random seed |
| safety_check | boolean | No | NSFW filter (default: true) |
Response: JSON with images array, same format as text-to-image.

segment-anything-2

Promptable visual segmentation for images using SAM 2 (Meta AI).
curl -X POST https://dream-gateway.livepeer.cloud/segment-anything-2 \
  -F "model_id=facebook/sam2-hiera-large" \
  -F "image=@photo.jpg" \
  -F 'point_coords=[[500,375]]' \
  -F 'point_labels=[1]'
| Field | Type | Required | Description |
|---|---|---|---|
| model_id | string | No | Default: facebook/sam2-hiera-large |
| image | file | Yes | Input image |
| point_coords | array | No | Point prompts as [[x,y], ...] |
| point_labels | array | No | Labels for points (1 = foreground, 0 = background) |
| box | array | No | Bounding box prompt [x1, y1, x2, y2] |
Response: JSON with masks, scores, and logits arrays.
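In the curl example the array prompts travel as JSON-encoded strings inside form fields. A small sketch of building those fields in Python follows; sam2_fields is a hypothetical helper, and defaulting missing labels to foreground is an assumption made for convenience, not documented gateway behaviour.

```python
import json


def _compact(value):
    """JSON-encode without spaces, matching the style of the curl example."""
    return json.dumps(value, separators=(",", ":"))


def sam2_fields(point_coords=None, point_labels=None, box=None, model_id=None):
    """Build the text form fields for a segment-anything-2 request."""
    fields = {}
    if model_id is not None:
        fields["model_id"] = model_id
    if point_coords is not None:
        # Assumption: treat every point as foreground when no labels are given.
        labels = point_labels if point_labels is not None else [1] * len(point_coords)
        if len(labels) != len(point_coords):
            raise ValueError("point_labels must have one entry per point")
        if any(label not in (0, 1) for label in labels):
            raise ValueError("labels must be 1 (foreground) or 0 (background)")
        fields["point_coords"] = _compact(point_coords)
        fields["point_labels"] = _compact(labels)
    if box is not None:
        if len(box) != 4:
            raise ValueError("box must be [x1, y1, x2, y2]")
        fields["box"] = _compact(box)
    return fields
```

The returned dictionary holds only the text fields; the image itself still has to be attached as a file part of the multipart body.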

llm

OpenAI-compatible chat completions using an Ollama-based runner.
curl -X POST https://dream-gateway.livepeer.cloud/llm \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "user", "content": "Explain Livepeer in one sentence."}
    ]
  }'
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Ollama-compatible model ID |
| messages | array | Yes | OpenAI-format message array (role + content) |
| max_tokens | integer | No | Maximum output tokens |
| temperature | number | No | Sampling temperature (0.0-2.0) |
| stream | boolean | No | Stream response tokens (SSE) |
Response: OpenAI-compatible chat completion object with choices[0].message.content.
The LLM pipeline is in beta. The request format follows the OpenAI /v1/chat/completions shape. Supported models include Meta-Llama-3.1-8B-Instruct (warm, 8 GB VRAM), Mistral-7B-Instruct-v0.3, Gemma-2-9b-it, and Qwen2.5-7B-Instruct.
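A hedged sketch of the request and response handling in Python is shown below. The helper names are hypothetical. reply_text follows the choices[0].message.content path stated above, while stream_text assumes the stream uses the usual OpenAI server-sent-event framing (data: lines carrying choices[0].delta.content, ending with data: [DONE]), which this page does not confirm for the Livepeer gateway.

```python
import json


def chat_payload(model, user_message, system=None, max_tokens=None,
                 temperature=None, stream=False):
    """Build an OpenAI-style chat completion body for the llm endpoint."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_message})
    payload = {"model": model, "messages": messages}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0.0 <= temperature <= 2.0:  # range from the table above
            raise ValueError("temperature must be between 0.0 and 2.0")
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    return payload


def reply_text(response_body):
    """Extract the assistant message from a non-streaming response."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def stream_text(sse_lines):
    """Join content deltas from SSE lines (assumed OpenAI-style framing)."""
    pieces = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        pieces.append(delta.get("content") or "")
    return "".join(pieces)
```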

Operational notes

Multipart vs JSON. Pipelines that accept file uploads (image-to-image, image-to-video, image-to-text, audio-to-text, upscale, segment-anything-2) use multipart/form-data. Pipelines that accept only text input (text-to-image, text-to-speech, LLM) use application/json.

Gateway selection. The community gateway routes to whichever orchestrator in the active set has the requested model warm. For production, operate a self-hosted gateway with -maxPricePerUnit to control costs, or use a gateway provider with an API key.

safety_check filter. Enabled by default on image-generating pipelines. Set to false to disable. The filter runs on the orchestrator side; disabling it does not affect content moderation policies that the gateway operator may enforce.

The AI quickstart walks through the first inference call end-to-end with error handling.
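The multipart versus JSON split described in these notes can be captured as a small lookup so that client code picks the right encoding per endpoint. This is a sketch with hypothetical names; the two sets simply restate the lists above, using the endpoint path llm for the LLM pipeline.

```python
# Endpoints that take a file upload, per the operational notes.
MULTIPART_PIPELINES = {
    "image-to-image", "image-to-video", "image-to-text",
    "audio-to-text", "upscale", "segment-anything-2",
}
# Endpoints that take text-only input.
JSON_PIPELINES = {"text-to-image", "text-to-speech", "llm"}


def request_encoding(pipeline):
    """Return the Content-Type family a pipeline endpoint expects."""
    if pipeline in MULTIPART_PIPELINES:
        return "multipart/form-data"
    if pipeline in JSON_PIPELINES:
        return "application/json"
    raise ValueError(f"unknown pipeline: {pipeline}")
```

Raising on an unknown name, rather than defaulting to JSON, keeps a mistyped endpoint from failing later with a less obvious 400 or 422.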
Last modified on May 19, 2026