POST /audio-to-text

The default Gateway used in this guide is the public Livepeer.cloud Gateway. It is free to use but is not intended for production applications. For production use, consider the Livepeer Studio Gateway, which requires an API token. Alternatively, you can set up your own Gateway node or partner with one via the ai-video channel on Discord.

The exact parameters, default values, and responses may vary between models, and not every parameter is available for every model. For model-specific parameters, refer to the respective model documentation available in the audio-to-text pipeline.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
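
As a minimal sketch of the header format described above, the helper below builds the Authorization header from a token. The function name auth_headers is hypothetical and not part of any Livepeer SDK.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Return the Authorization header in the form 'Bearer <token>'."""
    token = token.strip()
    if not token:
        raise ValueError("an auth token is required")
    return {"Authorization": f"Bearer {token}"}
```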

Body

multipart/form-data
audio
file
required

Uploaded audio file to be transcribed.

model_id
string
required

Hugging Face model ID used for transcription.

return_timestamps
string
default: true

Controls whether timestamps are returned with the transcribed text. Supported values: 'sentence', 'word', or a string boolean ('true' or 'false'). The default, 'true', is equivalent to 'sentence' and returns sentence-level timestamps. 'word' returns word-level timestamps. 'false' returns no timestamps.
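
The body fields above can be assembled into a multipart/form-data request roughly as follows. This is a sketch using only the Python standard library, not an official client. The helper names (build_multipart, build_request), the gateway URL passed in by the caller, and the generic application/octet-stream content type for the audio part are all assumptions made for illustration.

```python
import urllib.request
import uuid

# Values documented for return_timestamps.
ALLOWED_TIMESTAMPS = {"true", "false", "sentence", "word"}


def build_multipart(audio_bytes, filename, model_id, return_timestamps="true"):
    """Encode the three documented fields as a multipart/form-data body."""
    if return_timestamps not in ALLOWED_TIMESTAMPS:
        raise ValueError(f"unsupported return_timestamps: {return_timestamps!r}")
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields.
    for name, value in (("model_id", model_id),
                        ("return_timestamps", return_timestamps)):
        field = (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                 f"{value}\r\n")
        parts.append(field.encode("utf-8"))
    # File field; the content type here is a generic assumption.
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="audio"; '
                   f'filename="{filename}"\r\n'
                   "Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_header.encode("utf-8"))
    parts.append(audio_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(gateway_url, token, audio_bytes, filename, model_id,
                  return_timestamps="true"):
    """Build (but do not send) a POST request to <gateway_url>/audio-to-text."""
    body, content_type = build_multipart(
        audio_bytes, filename, model_id, return_timestamps)
    return urllib.request.Request(
        gateway_url.rstrip("/") + "/audio-to-text",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": content_type},
        method="POST",
    )
```

To send the request, pass the result to urllib.request.urlopen and read the JSON body from the response. Substitute your own Gateway base URL, token, and a model ID supported by that Gateway.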

Response

200 - application/json

Response model for text generation.

text
string
required

The generated text.

chunks
object[]
required

The generated text chunks.
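
A response with the two required fields above might be handled as in the sketch below. The function name parse_transcription is hypothetical, and because this reference does not list the fields inside each chunk object, the chunks are returned as plain dictionaries without further interpretation.

```python
import json


def parse_transcription(payload: str):
    """Return (text, chunks) from a 200 response body, checking required fields."""
    data = json.loads(payload)
    missing = [key for key in ("text", "chunks") if key not in data]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    if not isinstance(data["chunks"], list):
        raise ValueError("'chunks' should be a list of objects")
    return data["text"], data["chunks"]
```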