This is Tutorial 2 of 3.
- Tutorial 1: (start here if not completed)
- Tutorial 3:
Architecture
Trickle Protocol
The trickle protocol is a simple HTTP-based streaming convention:

- The Orchestrator calls the container's `PUT /live/{job_id}/source` with the input stream (frames, audio, or arbitrary bytes)
- The container processes the data and writes results to `GET /live/{job_id}/output`, the output stream the Orchestrator pulls
- PyTrickle abstracts both sides: implement a `FrameProcessor` that receives data and returns data
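PyTrickle handles this wiring for you. Purely to illustrate the convention itself (this is not PyTrickle's actual API), here is a minimal stand-alone sketch of the two endpoints using only the Python standard library; the upper-casing `process()` function is a hypothetical stand-in for a real `FrameProcessor`:

```python
# Minimal trickle-style sketch: the Orchestrator PUTs input bytes to
# /live/{job_id}/source and pulls results from /live/{job_id}/output.
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

outputs: dict[str, queue.Queue] = {}  # per-job result queues

def process(data: bytes) -> bytes:
    """Hypothetical stand-in for a FrameProcessor: upper-case the payload."""
    return data.upper()

class TrickleHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # /live/{job_id}/source : the Orchestrator pushes input bytes here
        job_id = self.path.split("/")[2]
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        outputs.setdefault(job_id, queue.Queue()).put(process(body))
        self.send_response(200)
        self.end_headers()

    def do_GET(self):
        # /live/{job_id}/output : the Orchestrator pulls results from here
        job_id = self.path.split("/")[2]
        result = outputs.setdefault(job_id, queue.Queue()).get(timeout=5)
        self.send_response(200)
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)

    def log_message(self, *_):
        pass  # silence per-request logging
```

Serving this with `HTTPServer(("", 8000), TrickleHandler).serve_forever()` gives a container endpoint the Orchestrator could push to and pull from; PyTrickle replaces all of this boilerplate with a single `FrameProcessor` implementation.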
Pattern
Prerequisites
From Tutorial 1:

- `./livepeer` binary installed and working
- Off-chain Orchestrator + Gateway tested
- Docker Engine 24+
- Python 3.10+ with pip
- Optional: `pip install openai-whisper` for the Whisper-tiny step
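Before starting the Whisper-tiny step, it can be worth confirming the optional package is importable. A small sketch using only the standard library (the module name `whisper` is what `openai-whisper` installs):

```python
# Check whether the optional openai-whisper package is importable
# without actually importing it (which would load the model code).
import importlib.util

def whisper_available() -> bool:
    """Return True if the `whisper` module can be imported."""
    return importlib.util.find_spec("whisper") is not None
```

If this returns `False`, the Whisper-tiny step can simply be skipped; the rest of the tutorial does not depend on it.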
Steps
What Happened
A complete custom AI pipeline was built and deployed on the Livepeer network without a GPU:

- BYOC containers use the trickle HTTP protocol, not gRPC and not the ai-runner Pipeline interface
- Any Docker container that speaks trickle can be a Livepeer pipeline
- The Gateway-Orchestrator routing logic is identical for BYOC and standard pipelines
- CPU-based AI inference (Whisper-tiny, scikit-learn, etc.) works without any GPU changes
Troubleshooting
Gateway returns 404 or 'model not found'
Check that `-byocModelID` on the Orchestrator matches `X-Model-Id` in the test job. They must be identical strings. Confirm the Orchestrator log shows the BYOC capability registered: `green-tint-cpu`.
Orchestrator cannot reach the BYOC container
Verify the container is running and reachable:

Check Container

If `--network host` was not used, check Docker bridge network connectivity to port 8000.
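A quick way to test reachability from the Orchestrator host is a plain TCP probe. A minimal sketch, assuming the container's trickle server is expected on `localhost:8000` (adjust host and port to your Docker network setup):

```python
# Probe whether a TCP connection to the BYOC container's port succeeds.
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_open("127.0.0.1", 8000)` returns `False`, the container is either not running or its port is not exposed on the bridge network.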
Container exits immediately
Check Logs
Process receives empty bytes
The Orchestrator may send a keepalive ping before the actual payload:
Handle Empty
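One robust pattern is to short-circuit on empty input so the keepalive never reaches the model. A generic sketch (the `run_inference` helper is a hypothetical stand-in, not PyTrickle's API):

```python
def run_inference(data: bytes) -> bytes:
    """Hypothetical stand-in for the real model call; echoes the payload."""
    return data

def process(data: bytes) -> bytes:
    # A zero-length payload is a keepalive ping, not real input:
    # skip inference entirely and return empty bytes.
    if not data:
        return b""
    return run_inference(data)
```

The same guard also protects against running inference on truncated or dropped frames at stream shutdown.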
Whisper-tiny is very slow
On CPU, Whisper-tiny processes approximately 1 second of audio in ~10 seconds. This is expected. For real-time inference, a GPU is needed (Tutorial 3).
Related Pages
Tutorial 3: Go Production
On-chain registration, GPU acceleration, and the public Orchestrator network.
BYOC Pipelines
Full BYOC reference: discovery, capability advertisement, and container requirements.
ai-runner Pipelines
For GPU models needing the full ai-runner stack, use the Pipeline interface instead of BYOC.
Python Gateway SDK
Send jobs programmatically with session management and remote signer payments.