By the end of this tutorial you’ll have a Next.js 15 chatbot that takes user messages, streams responses from the Livepeer LLM pipeline token by token, and maintains conversation history. The LLM pipeline is OpenAI-compatible at the wire level: it accepts messages arrays, returns choices[0].delta.content chunks, and behaves like any other chat completions endpoint. The orchestrator pool runs Ollama-backed inference on GPUs as small as 8 GB.

The image generation tutorial proved the batch path; this one proves the streaming path. The wire format you’ll handle here works against any OpenAI-compatible endpoint, which means swapping providers is a URL change.

Required Tools

  • Node.js 20 or later
  • npm, pnpm, or yarn
  • A code editor
No API key needed for development. The community gateway at dream-gateway.livepeer.cloud accepts unauthenticated POSTs to the LLM endpoint for experimentation.

Project Bootstrap

1. Create the project

npx create-next-app@latest livepeer-chatbot \
  --typescript \
  --tailwind \
  --app \
  --src-dir \
  --import-alias "@/*"
cd livepeer-chatbot

2. Configure environment variables

Save as .env.local:
LIVEPEER_GATEWAY_URL=https://dream-gateway.livepeer.cloud
LIVEPEER_LLM_MODEL=meta-llama/Meta-Llama-3.1-8B-Instruct
The warm model on the community gateway is Llama 3.1 8B Instruct. Cold-start applies to any other model: 30 seconds to a few minutes for the first request while the orchestrator loads the weights.
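
The route handler in the next section reads both variables with non-null assertions, so a missing value would only show up later as a request to a malformed URL. A minimal sketch of an explicit check, assuming a hypothetical readGatewayConfig helper that is not part of the tutorial's files:

```typescript
// Hypothetical helper (not one of the tutorial's files): fail fast when a
// required variable is missing instead of fetching "undefined/llm".
interface GatewayConfig {
  gatewayUrl: string;
  model: string;
}

type Env = Record<string, string | undefined>;

function readGatewayConfig(env: Env): GatewayConfig {
  const gatewayUrl = env.LIVEPEER_GATEWAY_URL?.trim();
  const model = env.LIVEPEER_LLM_MODEL?.trim();

  if (!gatewayUrl) throw new Error('LIVEPEER_GATEWAY_URL is not set');
  if (!model) throw new Error('LIVEPEER_LLM_MODEL is not set');

  // Strip trailing slashes so `${gatewayUrl}/llm` never contains "//llm".
  return { gatewayUrl: gatewayUrl.replace(/\/+$/, ''), model };
}
```

In a route handler this would be called as readGatewayConfig(process.env). Taking the environment as a parameter keeps the function easy to test.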

Streaming Route Handler

Server actions can’t stream responses cleanly. Route handlers can; the standard pattern for chat is a POST /api/chat handler that proxies the request to the LLM endpoint and pipes the SSE response back to the client. Save as src/app/api/chat/route.ts:
import { NextRequest } from 'next/server';

export const runtime = 'edge';

const GATEWAY_URL = process.env.LIVEPEER_GATEWAY_URL!;
const MODEL = process.env.LIVEPEER_LLM_MODEL!;

interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

export async function POST(req: NextRequest) {
  const { messages } = (await req.json()) as { messages: Message[] };

  const response = await fetch(`${GATEWAY_URL}/llm`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: MODEL,
      messages,
      stream: true,
    }),
  });

  if (!response.ok || !response.body) {
    return new Response(`Gateway returned ${response.status}`, {
      status: 502,
    });
  }

  // Pipe the SSE stream straight through to the client.
  return new Response(response.body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    },
  });
}
Three things to notice:

  • export const runtime = 'edge' runs the handler on the Edge runtime, which keeps cold-start low and streams responses without buffering.
  • The stream: true flag in the request body asks the LLM endpoint for Server-Sent Events instead of a single JSON response.
  • The handler pipes the response body straight through: no SSE parsing and no JSON deserialisation on the server. The browser parses the stream.
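
The handler above casts the request body to { messages: Message[] } without checking it, so a malformed body is forwarded to the gateway as-is. One way to guard against that is sketched below. The isMessage and parseChatBody names are hypothetical, and rejecting an empty messages array is an assumption rather than a documented gateway rule.

```typescript
// Hypothetical validation helpers; the tutorial's handler does not include them.
type Role = 'user' | 'assistant' | 'system';

interface Message {
  role: Role;
  content: string;
}

const ROLES: readonly string[] = ['user', 'assistant', 'system'];

function isMessage(value: unknown): value is Message {
  if (typeof value !== 'object' || value === null) return false;
  const { role, content } = value as { role?: unknown; content?: unknown };
  return (
    typeof role === 'string' &&
    ROLES.includes(role) &&
    typeof content === 'string'
  );
}

// Returns the messages when the body is well-formed, otherwise null so the
// caller can answer with a 400 instead of forwarding the body upstream.
function parseChatBody(body: unknown): Message[] | null {
  if (typeof body !== 'object' || body === null) return null;
  const { messages } = body as { messages?: unknown };
  if (!Array.isArray(messages) || messages.length === 0) return null;
  return messages.every(isMessage) ? messages : null;
}
```

In the route handler, a null result would translate into new Response('Invalid messages', { status: 400 }) before any call to the gateway is made.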

SSE Wire Format

The LLM endpoint streams chunks in this shape:
data: {"choices":[{"delta":{"content":"Live","role":"assistant"},"finish_reason":null}]}

data: {"choices":[{"delta":{"content":"peer","role":"assistant"},"finish_reason":null}]}

data: {"choices":[{"delta":{"content":" is","role":"assistant"},"finish_reason":null}]}

...

data: {"choices":[{"delta":{"content":"","role":"assistant"},"finish_reason":"stop"}]}
Each data: line is one token (or a small group of tokens) wrapped in OpenAI’s chat completions chunk shape. The final chunk has empty content and finish_reason: "stop". The client concatenates the content fields as they arrive and renders them incrementally.
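
To make the chunk shape concrete, here is a small sketch that turns one line of the stream into a token and a finished flag. The parseSseLine name and the ParsedChunk type are illustrative, not part of any Livepeer SDK. OpenAI-style streams often end with a data: [DONE] sentinel; whether this gateway sends one is not stated here, so the sketch simply treats any non-JSON payload as a line to skip.

```typescript
interface ParsedChunk {
  token: string; // text to append; may be empty
  finished: boolean; // true once finish_reason is "stop"
}

// Parses one line of the stream. Returns null for anything that is not a
// JSON `data:` line (blank separators, comments, malformed payloads).
function parseSseLine(line: string): ParsedChunk | null {
  if (!line.startsWith('data:')) return null;
  const payload = line.slice('data:'.length).trim();
  if (!payload) return null;

  try {
    const chunk = JSON.parse(payload);
    const choice = chunk?.choices?.[0];
    const content = choice?.delta?.content;
    return {
      token: typeof content === 'string' ? content : '',
      finished: choice?.finish_reason === 'stop',
    };
  } catch {
    // Not valid JSON (for example a "[DONE]" sentinel): skip it.
    return null;
  }
}
```

Feeding the four sample lines above through parseSseLine and concatenating the token fields yields "Livepeer is", with finished set on the last chunk.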

Chat UI Component

The UI maintains a list of messages and appends to the last assistant message as tokens stream in. Save as src/app/components/Chat.tsx:
'use client';

import { useState } from 'react';

interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

export function Chat() {
  const [messages, setMessages] = useState<Message[]>([
    {
      role: 'system',
      content: 'You are a helpful assistant. Keep responses concise.',
    },
  ]);
  const [input, setInput] = useState('');
  const [streaming, setStreaming] = useState(false);

  async function sendMessage() {
    if (!input.trim() || streaming) return;

    const userMessage: Message = { role: 'user', content: input };
    const newMessages = [...messages, userMessage];
    setMessages(newMessages);
    setInput('');
    setStreaming(true);

    // Add an empty assistant message that we'll fill as tokens arrive.
    setMessages((prev) => [...prev, { role: 'assistant', content: '' }]);

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: newMessages }),
    });

    if (!response.ok || !response.body) {
      setStreaming(false);
      return;
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = line.slice(6).trim();
        if (!data) continue;

        try {
          const chunk = JSON.parse(data);
          const token = chunk.choices?.[0]?.delta?.content ?? '';
          const finished = chunk.choices?.[0]?.finish_reason === 'stop';

          if (token) {
            setMessages((prev) => {
              const next = [...prev];
              next[next.length - 1] = {
                ...next[next.length - 1],
                content: next[next.length - 1].content + token,
              };
              return next;
            });
          }

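          if (finished) {
            // A bare `break` here would only leave the for loop, not the read loop.
            await reader.cancel();
            setStreaming(false);
            return;
          }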
        } catch {
          // Skip malformed chunks
        }
      }
    }

    setStreaming(false);
  }

  return (
    <div className="max-w-2xl mx-auto p-4 space-y-4">
      <div className="space-y-2 min-h-[400px]">
        {messages
          .filter((m) => m.role !== 'system')
          .map((m, i) => (
            <div
              key={i}
              className={`p-3 rounded ${
                m.role === 'user' ? 'bg-blue-100' : 'bg-gray-100'
              }`}
            >
              <p className="text-xs text-gray-600 mb-1">{m.role}</p>
              <p className="whitespace-pre-wrap">{m.content}</p>
            </div>
          ))}
      </div>
      <div className="flex gap-2">
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
          placeholder="Ask anything"
          disabled={streaming}
          className="flex-1 border rounded p-2"
        />
        <button
          onClick={sendMessage}
          disabled={streaming}
          className="bg-blue-600 text-white px-4 py-2 rounded disabled:opacity-50"
        >
          {streaming ? 'Streaming…' : 'Send'}
        </button>
      </div>
    </div>
  );
}
The reader loop pulls bytes from the response stream, decodes them, and splits on newlines. The buffer handles the case where a chunk lands mid-line. For each complete data: line, the handler parses the JSON, extracts the token from choices[0].delta.content, and appends it to the last assistant message. The loop exits when finish_reason: "stop" arrives.
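
The buffering step is the part most likely to go wrong, so it helps to look at it in isolation. The sketch below uses a hypothetical LineBuffer class, not part of the component above, to show how a JSON payload split across two network chunks is held back until its newline arrives.

```typescript
// Hypothetical helper that isolates the buffering step of the reader loop.
class LineBuffer {
  private pending = '';

  // Accepts the next decoded chunk and returns every line completed by it.
  push(text: string): string[] {
    this.pending += text;
    const lines = this.pending.split('\n');
    // The last element is '' if the chunk ended on a newline, else a partial line.
    this.pending = lines.pop() ?? '';
    return lines;
  }

  // Returns whatever is left once the stream has ended, and resets the buffer.
  flush(): string {
    const rest = this.pending;
    this.pending = '';
    return rest;
  }
}

const lineBuffer = new LineBuffer();
// The first chunk stops mid-payload, so no complete line is available yet.
const first = lineBuffer.push('data: {"choices":[{"delta":{"con');
// The second chunk completes the line and starts another one.
const second = lineBuffer.push('tent":"Live"}}]}\n\ndata: par');
console.log(first.length, second.length, lineBuffer.flush());
```

The component's loop does not process leftover buffer contents when the stream ends, so a final line without a trailing newline would be dropped. A flush step like the one here is one way to handle that case if it ever matters.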

Page Composition

Save as src/app/page.tsx:
import { Chat } from './components/Chat';

export default function HomePage() {
  return (
    <main className="min-h-screen bg-white">
      <header className="border-b p-4">
        <h1 className="text-xl font-bold">Livepeer LLM Chatbot</h1>
        <p className="text-sm text-gray-600">
          Streaming chat via the decentralised LLM pipeline.
        </p>
      </header>
      <Chat />
    </main>
  );
}
Run the dev server:
npm run dev
Open http://localhost:3000. Type a message, hit Send, and tokens stream into the response bubble.

Model Selection

The community gateway routes any model value to whichever orchestrator has the requested weights warm. Llama 3.1 8B Instruct is the default warm model on the network. Three other Ollama-compatible models are commonly available:
Model | VRAM | Notes
meta-llama/Meta-Llama-3.1-8B-Instruct | 8 GB | Warm default, fastest first response
mistralai/Mistral-7B-Instruct-v0.3 | 8 GB | Strong instruction-following
google/gemma-2-9b-it | 10 GB | Google’s open instruction model
Qwen/Qwen2.5-7B-Instruct | 8 GB | Strong on code and reasoning
Any Ollama-compatible model works. Cold-start (30 seconds to a few minutes) applies to models not currently loaded on any orchestrator. For consistent latency in production, run your own gateway with the target model pre-loaded.
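
If the UI lets users pick a model, the route handler could restrict the model field to known IDs and fall back to the warm default for anything else. The sketch below assumes that design. The allow-list and the resolveModel name are hypothetical, and the gateway itself does not require either. The IDs and VRAM figures are copied from the table above.

```typescript
// Model IDs and VRAM figures copied from the table; the allow-list itself
// is a hypothetical addition, not something the gateway enforces.
const MODEL_VRAM_GB: Record<string, number> = {
  'meta-llama/Meta-Llama-3.1-8B-Instruct': 8,
  'mistralai/Mistral-7B-Instruct-v0.3': 8,
  'google/gemma-2-9b-it': 10,
  'Qwen/Qwen2.5-7B-Instruct': 8,
};

const WARM_DEFAULT = 'meta-llama/Meta-Llama-3.1-8B-Instruct';

// Returns the requested model when it is on the list, else the warm default.
function resolveModel(requested: string | undefined): string {
  if (
    requested !== undefined &&
    Object.prototype.hasOwnProperty.call(MODEL_VRAM_GB, requested)
  ) {
    return requested;
  }
  return WARM_DEFAULT;
}
```

Falling back instead of rejecting keeps the chat usable when a client sends a stale or mistyped model ID, at the cost of silently answering with a different model.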

Production Considerations

The community gateway is shaped for experimentation. Production chat needs four changes:

  • Authentication. Swap to a paid gateway and add Authorization: Bearer ${process.env.LIVEPEER_API_KEY} to the fetch headers in the route handler.
  • Conversation persistence. The current implementation holds messages in client state, so a refresh loses the conversation. Persist to a database keyed by user and session.
  • Token usage and rate limits. The LLM pipeline charges per token of output. Add a per-user token budget enforced server-side, and a per-IP rate limit on the route handler.
  • Cold-start handling. If the requested model is cold, the first response can take a few minutes. Add a warming request on app start that sends a one-token completion in the background, so the model is ready by the time a user opens the chat.

Full hardening guidance is in Production Hardening, listed under Next Steps.
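
The per-IP rate limit mentioned above can be sketched as a fixed-window counter. This is an illustration under stated assumptions, not a recommended production design. FixedWindowLimiter is a hypothetical class, and its state lives in a single process, so on the Edge runtime (where instances do not share memory) a shared store would be needed instead.

```typescript
// Hypothetical in-memory limiter: at most `limit` requests per key in each
// fixed window. State is per process, so this only illustrates the idea.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, which makes the limiter testable
  }

  // Returns true when the request may proceed, false when it should get a 429.
  allow(key: string): boolean {
    const t = this.now();
    const current = this.windows.get(key);

    // First request for this key, or the previous window has expired.
    if (!current || t - current.start >= this.windowMs) {
      this.windows.set(key, { start: t, count: 1 });
      return true;
    }

    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

In the route handler, the key would typically be the client IP or an authenticated user ID, and a false result would map to a 429 response before the gateway is called. The same shape works for a per-user token budget if count is replaced by tokens consumed.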

Common Errors

  • The route handler couldn’t reach the gateway. Confirm LIVEPEER_GATEWAY_URL is set; the Edge runtime doesn’t read variables from .env.local in production unless they’re declared in next.config.ts or as Edge-runtime env vars.
  • The orchestrator timed out or the model unloaded. Retry the request; the network routes to a different orchestrator on retry.
  • The response arrives all at once instead of streaming. A proxy (Cloudflare, nginx, Vercel) is buffering. Confirm the Cache-Control: no-cache and Content-Type: text/event-stream headers are set on the response. For Cloudflare, disable response buffering on the route.
  • The client hits JSON parse errors. Some chunks contain comments or empty lines. The handler skips empty lines and wraps the parse in try/catch; if you see frequent parse errors, log the raw line to identify the format drift.
  • The first response is slow. This is expected for non-warm models. Either use the warm default (meta-llama/Meta-Llama-3.1-8B-Instruct) or send a warming request on app start.
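
Since a retry is routed to a different orchestrator, wrapping the initial gateway call in a small retry helper is one way to absorb timeouts. The sketch below is illustrative only. withRetry and backoffDelayMs are hypothetical names, and the attempt count and delays are assumptions, not values recommended by Livepeer.

```typescript
// Hypothetical retry wrapper; attempt counts and delays are illustrative.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  // attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, and so on, capped at maxMs.
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  delayFor: (attempt: number) => number = backoffDelayMs,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final failure.
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayFor(attempt)));
      }
    }
  }
  throw lastError;
}
```

Retrying only makes sense before the first token has been forwarded to the browser. Once part of a response has streamed, a retry would restart the answer from the beginning.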
You have a streaming chatbot on the Livepeer LLM pipeline. The same endpoint shape works for any Ollama-compatible model; switch the model field to try Mistral, Gemma, or Qwen variants.

AI agent prompt

Build the "Chatbot with Livepeer LLM" tutorial as a Next.js App Router project. Create a TypeScript app, add LIVEPEER_GATEWAY_URL=https://dream-gateway.livepeer.cloud to .env.local, implement src/app/api/chat/route.ts as a streaming Server-Sent Events route that forwards OpenAI-compatible chat completion requests to the Livepeer LLM endpoint, and build a client chat UI that appends streamed tokens in place. Use model "meta-llama/Meta-Llama-3.1-8B-Instruct" by default and expose a small model selector for Mistral, Gemma, and Qwen variants. Include run commands, a curl test for the route, browser verification at http://localhost:3000, and production notes that any LIVEPEER_API_KEY must stay server-side. Do not use Studio.

Next Steps

  • Eliza Plugin Tutorial: Build a full agent with character files, RAG, and multi-agent swarms.
  • AI Pipelines: The other ten pipelines: image gen, audio, vision, segmentation.
  • Model Support: Warm models, VRAM requirements, custom model paths.
  • Production Hardening: Rate limits, auth, observability, cold-start handling.

Last modified on May 19, 2026