The audio-to-text pipeline converts audio from media files into text,
using cutting-edge models from HuggingFace’s
automatic-speech-recognition (ASR) pipeline.
audio-to-text pipeline is:
ai-video channel in Discord Server.
audio-to-text pipeline:
Tested and Verified Diffusion Models
For details on the audio-to-text endpoint and to experiment
with the API, see the Livepeer AI API Reference.

To use the audio-to-text pipeline, submit a POST request to the Gateway’s
audio-to-text API endpoint:
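As a rough illustration of such a request, the sketch below builds the POST in Python using only the standard library. The text names only the model_id and audio parameters; the https scheme, the /audio-to-text path, the multipart/form-data encoding, and the helper name build_audio_to_text_request are assumptions made for illustration and should be checked against the Livepeer AI API Reference.

```python
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request


def build_audio_to_text_request(gateway_ip: str, model_id: str, audio_path: str) -> Request:
    """Build (but do not send) a multipart POST for the audio-to-text endpoint."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"

    # One text field (model_id) followed by one file field (audio).
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model_id"\r\n\r\n',
        model_id.encode() + b"\r\n",
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="audio"; filename="{path.name}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        path.read_bytes() + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])

    # Assumption: HTTPS and the path "/audio-to-text"; adjust to your Gateway.
    url = f"https://{gateway_ip}/audio-to-text"
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Example (not executed here): send the request and print the raw response body.
# from urllib.request import urlopen
# req = build_audio_to_text_request("<GATEWAY_IP>", "<MODEL_ID>", "audio.mp3")
# with urlopen(req) as resp:
#     print(resp.read().decode())
```

Building the request separately from sending it makes it easy to inspect the URL and body before any network call is made.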
- <GATEWAY_IP> should be replaced with your AI Gateway’s IP address.
- model_id is the model used for audio transcription.
- audio is the path to the audio file to be transcribed.

- Supported formats: mp4, webm, mp3, flac, wav and m4a
- Maximum request size: 50 MB

To configure an Orchestrator to serve the audio-to-text pipeline, refer to
the Orchestrator Configuration guide.
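The file formats and the 50 MB request limit listed above can be checked on the client before uploading. The following is a minimal sketch: the helper name check_audio_file is hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and the file size is used as an approximation of the request size (the real request is slightly larger because of form encoding).

```python
from pathlib import Path

# Formats listed in this guide.
SUPPORTED_EXTENSIONS = {".mp4", ".webm", ".mp3", ".flac", ".wav", ".m4a"}
# Assumption: "50 MB" is interpreted as 50 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = 50 * 1024 * 1024


def check_audio_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_REQUEST_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, over the {MAX_REQUEST_BYTES}-byte limit"
        )
    return problems
```

For a file on disk, the size can be passed as Path(name).stat().st_size.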
The recommended pricing for the audio-to-text pipeline is based on competitor
pricing. However, we strongly encourage orchestrators to set their own pricing
based on their costs and requirements. Setting a competitive price will help
attract more jobs, as Gateways can set their maximum price for a job. The
currently recommended pricing for this pipeline is 0.02e-6 USD (0.00000002 USD)
per millisecond of audio input.
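To make the per-millisecond rate concrete, the arithmetic can be sketched as follows. The function name estimate_cost_usd is hypothetical, and the default rate is only the recommended figure quoted above, which an orchestrator may replace with its own.

```python
# Recommended rate from this guide: 0.02e-6 USD per millisecond of audio input.
PRICE_USD_PER_MS = 0.02e-6


def estimate_cost_usd(audio_seconds: float, price_per_ms: float = PRICE_USD_PER_MS) -> float:
    """Estimate the price of transcribing a clip of the given duration."""
    audio_ms = audio_seconds * 1000  # convert seconds to milliseconds
    return audio_ms * price_per_ms
```

At the recommended rate, one minute of audio (60,000 ms) comes to about 0.0012 USD and one hour (3,600,000 ms) to about 0.072 USD.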
Explore the audio-to-text endpoint and experiment with the API in the
Livepeer AI API Reference.