
Unifies diverse LLM providers behind a single OpenAI-style API: streaming-first chat/embeddings/voice/realtime/batch modalities, per-model health-aware fallback chains, outbound control-plane auth and extensibility.
llmleaf is an LLM proxy: it sits in front of several LLM providers, whose APIs each differ slightly, and exposes them all through a single API surface.
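Behind that single surface, each model is served by a health-aware fallback chain of providers. The following is a minimal sketch of that idea under assumed rules, not llmleaf's implementation: it assumes a provider counts as unhealthy after a fixed number of consecutive failures and that the chain is tried strictly in order. `Health`, `Attempt`, `route`, and the provider names are hypothetical.

```rust
use std::collections::HashMap;

/// Outcome of one upstream call (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Attempt {
    Success(String),
    Failure(String),
}

/// Hypothetical health tracker: a provider is unhealthy once it has failed
/// `threshold` times in a row. A real implementation would also let providers
/// recover (for example after a cooldown); that is omitted here.
struct Health {
    consecutive_failures: HashMap<String, u32>,
    threshold: u32,
}

impl Health {
    fn new(threshold: u32) -> Self {
        Health { consecutive_failures: HashMap::new(), threshold }
    }

    fn is_healthy(&self, provider: &str) -> bool {
        self.consecutive_failures.get(provider).copied().unwrap_or(0) < self.threshold
    }

    fn record(&mut self, provider: &str, ok: bool) {
        if ok {
            // Any success resets the failure streak.
            self.consecutive_failures.remove(provider);
        } else {
            *self.consecutive_failures.entry(provider.to_string()).or_insert(0) += 1;
        }
    }
}

/// Walk a model's fallback chain in order, skipping unhealthy providers, and
/// return (provider, response) for the first success, or every error seen.
fn route<F>(chain: &[&str], health: &mut Health, mut call: F) -> Result<(String, String), Vec<String>>
where
    F: FnMut(&str) -> Attempt,
{
    let mut errors = Vec::new();
    for &provider in chain {
        if !health.is_healthy(provider) {
            errors.push(format!("{provider}: skipped (unhealthy)"));
            continue;
        }
        match call(provider) {
            Attempt::Success(body) => {
                health.record(provider, true);
                return Ok((provider.to_string(), body));
            }
            Attempt::Failure(e) => {
                health.record(provider, false);
                errors.push(format!("{provider}: {e}"));
            }
        }
    }
    Err(errors)
}

/// Simulated upstream: "primary" is down, everything else answers.
fn fake_upstream(provider: &str) -> Attempt {
    if provider == "primary" {
        Attempt::Failure("503 Service Unavailable".to_string())
    } else {
        Attempt::Success(format!("hello from {provider}"))
    }
}

fn main() {
    let chain = ["primary", "secondary", "echo"];
    let mut health = Health::new(2);
    // After two failures "primary" is skipped without being called.
    for _ in 0..3 {
        println!("{:?}", route(&chain, &mut health, fake_upstream));
    }
}
```

Skipping unhealthy providers up front, rather than waiting for them to time out again, is what makes the chain "health-aware" in this sketch.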
Without a config argument, llmleaf runs an embedded dev config that uses the `echo` provider for local testing.

```sh
# Run with the embedded dev config (echo provider, key `local-dev:s3cret`)
cargo run -p llmleaf
# …or point at your own config
cargo run -p llmleaf -- llmleaf.toml
```

Copy `llmleaf.example.toml`, fill in provider credentials (use `env:VAR` indirection; secrets never live in the file), and pass it as the argument. A container image is built with `docker buildx bake image` (it listens on `:8080`). Send a request:

```sh
curl localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $(printf 'local-dev:s3cret' | base64)" \
  -H "Content-Type: application/json" \
  -d '{"model":"demo","messages":[{"role":"user","content":"hi"}]}'
```

See `llmleaf.example.toml` for the full configuration surface (providers, routes, keys, control plane).
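The `Authorization` header in the curl call is the base64 encoding of the whole key string, sent with the `Bearer` scheme. The following sketch builds the same value in Rust, assuming (as the dev key `local-dev:s3cret` suggests) that a key is an identifier and a secret joined by a colon. `base64_encode` and `bearer_header` are hypothetical helpers; the encoder is a std-only illustration of standard padded base64, and a real client would normally use an existing base64 crate instead.

```rust
/// Standard base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode bytes as padded standard base64, equivalent to `base64` for short input.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32.
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        // Emit four 6-bit groups, padding with '=' past the end of the input.
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Build the header value the curl example sends: "Bearer " + base64("<id>:<secret>").
fn bearer_header(key_id: &str, secret: &str) -> String {
    let raw = format!("{key_id}:{secret}");
    format!("Bearer {}", base64_encode(raw.as_bytes()))
}

fn main() {
    // Matches `printf 'local-dev:s3cret' | base64` (note: printf adds no newline).
    println!("Authorization: {}", bearer_header("local-dev", "s3cret"));
}
```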
Consumer endpoints (OpenAI-compatible unless noted):
| Endpoint | Purpose |
|---|---|
| `POST /v1/chat/completions` | Chat (SSE streaming) |
| `POST /v1/messages` | Anthropic Messages dialect |
| `POST /v1/embeddings` | Embeddings |
| `POST /v1/audio/speech`, `GET /v1/audio/voices` | Text-to-speech |
| `POST /v1/audio/transcriptions` | Speech-to-text |
| `GET /v1/realtime` | OpenAI Realtime (WebSocket) |
| `POST /v1/batches`, `GET /v1/batches/{id}[/results]` | Batch jobs |
| `GET /v1/models`, `GET /v1/openapi.json`, `GET /healthz` | Discovery & health |
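`POST /v1/messages` accepts the Anthropic dialect, which places the system prompt in a top-level `system` field instead of a `system`-role message as OpenAI chat does. The following minimal sketch shows the kind of "map in" step this implies, translating a text-only Anthropic-style request into an OpenAI-style one. The target shape is chosen only because the main surface is OpenAI-compatible; llmleaf's internal representation may differ, and `Message`, `AnthropicRequest`, `ChatRequest`, and `map_in` are hypothetical. Content blocks, tools, and a `system` given as an array of blocks are deliberately ignored.

```rust
/// Simplified, text-only chat message (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Subset of an Anthropic Messages request: the system prompt is top-level.
struct AnthropicRequest {
    model: String,
    system: Option<String>,
    messages: Vec<Message>,
    max_tokens: u32,
}

/// Subset of an OpenAI-style chat completion request.
#[derive(Debug, PartialEq)]
struct ChatRequest {
    model: String,
    messages: Vec<Message>,
    max_tokens: Option<u32>,
}

/// Turn the top-level system prompt into a leading `system` message and keep
/// the user/assistant turns in order.
fn map_in(req: AnthropicRequest) -> ChatRequest {
    let mut messages = Vec::with_capacity(req.messages.len() + 1);
    if let Some(system) = req.system {
        messages.push(Message { role: "system".to_string(), content: system });
    }
    messages.extend(req.messages);
    ChatRequest { model: req.model, messages, max_tokens: Some(req.max_tokens) }
}

fn main() {
    let req = AnthropicRequest {
        model: "demo".to_string(),
        system: Some("Answer briefly.".to_string()),
        messages: vec![Message { role: "user".to_string(), content: "hi".to_string() }],
        max_tokens: 64,
    };
    println!("{:?}", map_in(req));
}
```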
Read-only admin (optional token): GET /admin/routes, /admin/health, /admin/keys.
Official client SDKs for 6 languages live in clients/.
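Clients that stream `POST /v1/chat/completions` without one of the SDKs have to parse the Server-Sent Events themselves. The following sketch assumes llmleaf follows the OpenAI streaming convention (each chunk in a `data:` line, events separated by blank lines, the stream ending with `data: [DONE]`); this README only states that the endpoint is OpenAI-compatible with SSE streaming. `sse_data_events` is a hypothetical helper that returns the raw JSON payloads and leaves JSON decoding to the caller.

```rust
/// Collect the `data` payload of each SSE event, stopping at `[DONE]`.
fn sse_data_events(body: &str) -> Vec<String> {
    let mut events = Vec::new();
    let mut data: Vec<&str> = Vec::new();
    for line in body.lines() {
        if line.is_empty() {
            // A blank line dispatches the event accumulated so far.
            if !data.is_empty() {
                let payload = data.join("\n");
                data.clear();
                if payload == "[DONE]" {
                    return events;
                }
                events.push(payload);
            }
        } else if let Some(rest) = line.strip_prefix("data:") {
            // SSE strips one optional space after the colon.
            data.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
        // Comment lines (":...") and other fields (event:, id:, retry:) are ignored.
    }
    // Per the SSE rules, an event not terminated by a blank line is discarded.
    events
}

fn main() {
    // Illustrative stream shaped like OpenAI chat-completion chunks (made-up content).
    let body = "data: {\"choices\":[{\"delta\":{\"content\":\"Hel\"}}]}\n\n\
                data: {\"choices\":[{\"delta\":{\"content\":\"lo\"}}]}\n\n\
                data: [DONE]\n\n";
    for payload in sse_data_events(body) {
        println!("{payload}");
    }
}
```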
Two strictly separated planes. The core (data plane) is the proxy; the control plane is reached only outbound — the core pulls identity/verdicts and pushes usage, never the reverse. See SOUL.md for the full design constitution.
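The outbound-only rule can be expressed as an interface whose every method is a call from the core to the control plane. The following minimal sketch shows the request path in the diagram below (authenticate by pulling a verdict, serve, then push usage) with an in-memory stand-in for the control plane. In practice these would be network calls whose protocol is not described here; `ControlPlane`, `Verdict`, `UsageEvent`, `InMemoryControlPlane`, and `handle` are hypothetical names, and the provider call is replaced by echoing the prompt.

```rust
use std::collections::HashSet;

/// Verdict the core pulls for an authenticated key (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Verdict {
    Allow,
    Deny(String),
}

/// Usage record the core pushes after serving a request (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct UsageEvent {
    key_id: String,
    model: String,
    input_tokens: u64,
    output_tokens: u64,
}

/// Everything the data plane needs from the control plane. Both methods are
/// initiated by the core; nothing here lets the control plane call into it.
trait ControlPlane {
    fn pull_verdict(&self, key_id: &str) -> Verdict;
    fn push_usage(&mut self, event: UsageEvent);
}

/// In-memory stand-in used for illustration and tests.
struct InMemoryControlPlane {
    revoked: HashSet<String>,
    usage: Vec<UsageEvent>,
}

impl ControlPlane for InMemoryControlPlane {
    fn pull_verdict(&self, key_id: &str) -> Verdict {
        if self.revoked.contains(key_id) {
            Verdict::Deny("key revoked".to_string())
        } else {
            Verdict::Allow
        }
    }

    fn push_usage(&mut self, event: UsageEvent) {
        self.usage.push(event);
    }
}

/// Core request path: pull a verdict, serve the request, push usage.
fn handle<C: ControlPlane>(cp: &mut C, key_id: &str, model: &str, prompt: &str) -> Result<String, String> {
    match cp.pull_verdict(key_id) {
        Verdict::Deny(reason) => Err(reason),
        Verdict::Allow => {
            // Stand-in for routing to a provider: echo the prompt back.
            let reply = prompt.to_string();
            cp.push_usage(UsageEvent {
                key_id: key_id.to_string(),
                model: model.to_string(),
                // Whitespace word count as a crude, purely illustrative token count.
                input_tokens: prompt.split_whitespace().count() as u64,
                output_tokens: reply.split_whitespace().count() as u64,
            });
            Ok(reply)
        }
    }
}

fn main() {
    let mut cp = InMemoryControlPlane {
        revoked: HashSet::from(["old-key".to_string()]),
        usage: Vec::new(),
    };
    println!("{:?}", handle(&mut cp, "local-dev", "demo", "hi there"));
    println!("{:?}", handle(&mut cp, "old-key", "demo", "hi"));
    println!("usage pushed: {:?}", cp.usage);
}
```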
```mermaid
flowchart LR
    Cons["Consumers<br/>OpenAI · OpenRouter · Anthropic"] --> Surf["Compat surfaces"]
    subgraph Core["llmleaf core — data plane"]
        direction LR
        Surf --> Auth["authenticate"] --> In["map in"] --> Route["route + fallback"] --> Stream["stream"] --> Out["map out"] --> Ev["emit events"]
    end
    Route --> Prov["Providers<br/>compiled-in traits · WASM plugins"]
    Prov --> Up["LLM providers"]
    Ctrl[["Control plane (outbound)"]]
    Auth -. "pull identity / verdicts" .-> Ctrl
    Ev -. "push usage" .-> Ctrl
```

Copyright (C) 2026 Fionn Langhans fionnlanghans@codefionn.eu.
llmleaf is free software licensed under the GNU Lesser General Public License,
version 3 or later (LGPL-3.0-or-later). The full text is in COPYING.LESSER
(the LGPLv3 terms) together with COPYING (the GPLv3 it builds on).
The client SDKs are licensed under the MIT and Apache-2.0 licenses.