POST /audio-to-text
import { Livepeer } from "@livepeer/ai";
import { openAsBlob } from "node:fs";

// Authenticate against the Gateway with your bearer token.
const livepeer = new Livepeer({
  httpBearer: "<YOUR_BEARER_TOKEN_HERE>",
});

async function run() {
  const result = await livepeer.generate.audioToText({
    // Audio file to transcribe; openAsBlob requires Node.js 19.8+.
    audio: await openAsBlob("example.file"),
    // Hugging Face model ID used for transcription.
    modelId: "",
    // Passed as a string: "true" (sentence-level), "word", or "false".
    returnTimestamps: "true",
  });

  // Handle the result
  console.log(result);
}

run();
Example response (200):

{
  "text": "<string>",
  "chunks": [
    {
      "timestamp": [
        "<any>"
      ],
      "text": "<string>"
    }
  ]
}

The default Gateway used in this guide is the public Livepeer.cloud Gateway. It is free to use but not intended for production workloads. For production applications, consider the Livepeer Studio Gateway, which requires an API token. Alternatively, you can run your own Gateway node or partner with one via the ai-video channel on Discord.
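If you switch Gateways, the SDK can be pointed at a different base URL when it is constructed. A minimal sketch, assuming the SDK exposes the serverURL option common to its generated clients; the host below is a placeholder for your own or a partner Gateway:

import { Livepeer } from "@livepeer/ai";

// Target a non-default Gateway (placeholder host) and authenticate
// with your token (a Livepeer Studio API token, for that Gateway).
const livepeer = new Livepeer({
  serverURL: "https://<your-gateway-host>", // assumption: placeholder URL
  httpBearer: "<YOUR_BEARER_TOKEN_HERE>",
});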

Note that exact parameters, default values, and responses may vary between models, and not every parameter is available for every model. For model-specific parameters, refer to the respective model documentation in the audio-to-text pipeline.

Authorizations

Authorization
string · header · required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
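If you are not using the SDK, the same header can be sent on a raw multipart request. A hedged sketch using the global fetch and FormData available in Node.js 18+ (run as an ES module for top-level await); the Gateway host is a placeholder:

import { openAsBlob } from "node:fs";

// Build the multipart/form-data body by hand.
const form = new FormData();
form.append("audio", await openAsBlob("example.file"), "example.file");
form.append("model_id", "");
form.append("return_timestamps", "true");

const res = await fetch("https://<your-gateway-host>/audio-to-text", {
  method: "POST",
  // Bearer authentication header, as documented above.
  headers: { Authorization: "Bearer <YOUR_BEARER_TOKEN_HERE>" },
  body: form, // fetch sets the multipart boundary automatically
});
console.log(await res.json());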

Body

multipart/form-data

audio
file · required

Uploaded audio file to be transcribed.

model_id
string · required · default: ""

Hugging Face model ID used for transcription.
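For instance, to request a specific model rather than the default (openai/whisper-large-v3 is used here purely as an illustrative transcription model; which models are actually served depends on your Gateway's orchestrators):

// Request a specific transcription model; availability is Gateway-dependent.
const result = await livepeer.generate.audioToText({
  audio: await openAsBlob("example.file"),
  modelId: "openai/whisper-large-v3", // illustrative model ID
});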

return_timestamps
string · default: "true"

Return timestamps for the transcribed text. Supported values: "sentence", "word", or a string boolean ("true" or "false"). The default "true" is equivalent to "sentence"; "false" disables timestamps; "word" returns word-level timestamps.
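For example, requesting word-level timestamps with the SDK client from the snippet above (note the value is passed as a string, mirroring the form field):

// Each returned chunk's timestamp then spans a single word.
const result = await livepeer.generate.audioToText({
  audio: await openAsBlob("example.file"),
  modelId: "",
  returnTimestamps: "word", // or "sentence", "true", "false"
});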

Response

200 · application/json

Successful Response. Response model for text generation.

text
string · required

The generated text.

chunks
object[] · required

The generated text chunks. Each chunk is a piece of transcribed text with a timestamp.
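Putting the response together, it can be modeled and consumed as below. This is a sketch: the field names mirror the JSON example above, but the schema types each timestamp element as any, so the [start, end] seconds tuple is an assumption:

// Shape mirroring the documented JSON response. The timestamp tuple
// shape ([start, end] in seconds) is an assumption; the schema says "any".
interface TextChunk {
  timestamp: [number, number];
  text: string;
}

interface TextResponse {
  text: string;
  chunks: TextChunk[];
}

function printTranscript(res: TextResponse): void {
  console.log(res.text);
  for (const { timestamp: [start, end], text } of res.chunks) {
    console.log(`[${start}s - ${end}s] ${text}`);
  }
}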