
Koog-compatible on-device LLM client for offline Gemini Nano and Apple Foundation Models inference without API keys. Supports hybrid cloud fallback, model download, warm-up, and routing.
Koog LLMClient for on-device inference — Gemini Nano (Android) & Apple Foundation Models (iOS).
Drop-in replacement for cloud providers in your Koog agents.
This library provides a Koog-compatible `LLMClient` that runs inference on-device using platform-native APIs: Gemini Nano on Android and Apple Foundation Models on iOS.
No API keys. No network calls. No cloud costs. Your data stays on the device.

| | koog-ondevice | koog-edge |
|---|---|---|
| Model source | OS-provided (system-managed) | App-bundled (GGUF) |
| Backend | Gemini Nano, Apple FM | Cactus Compute, Leap SDK |
| App size impact | None | +hundreds of MB |
| Model updates | Automatic (OS updates) | Manual (app update) |
| Model choice | Platform default only | Any GGUF model |
They are complementary, not competing.

```kotlin
// build.gradle.kts
dependencies {
    implementation("dev.ynagai.koog:koog-ondevice:0.1.0")
}
```

```kotlin
val executor = simpleOnDeviceExecutor()
val service = AIAgentService(
    // Resolves to the platform's native model (Gemini Nano on Android,
    // Apple Foundation on iOS), so no platform branching is needed.
    llmModel = OnDeviceModels.Default,
    promptExecutor = executor,
)
val response = with(service) {
    createAgentAndRun("Summarize today's workout")
}
```

Need a specific model? `OnDeviceModels.GeminiNano` and `OnDeviceModels.AppleFoundation` are available explicitly.

```kotlin
val executor = hybridExecutor(
    onDevice = simpleOnDeviceExecutor(),
    cloud = simpleFirebaseExecutor(), // from koog-firebase
)
val service = AIAgentService(
    llmModel = FirebaseModels.Gemini2_5Flash,
    promptExecutor = executor,
)
```

`checkOnDeviceStatus()` is a suspend function, so call it from a coroutine.
`downloadOnDeviceModel()` returns a `Flow<OnDeviceDownload>` of progress frames;
`awaitAvailable()` collects it and suspends until the model is ready.

```kotlin
when (checkOnDeviceStatus()) {
    OnDeviceStatus.AVAILABLE -> { /* ready to run on-device */ }
    OnDeviceStatus.DOWNLOADABLE -> { downloadOnDeviceModel().awaitAvailable() }
    OnDeviceStatus.DOWNLOADING -> { /* a download is already in progress */ }
    OnDeviceStatus.UNAVAILABLE -> { /* not supported; fall back to cloud */ }
}
```

Pre-load the model to avoid the first-generation cold-start cost. The call is best-effort and safe to repeat:

```kotlin
warmUpOnDevice() // e.g. when entering a screen that will use the model
```

When routing with `hybridExecutor`, `DefaultPromptRouter` uses the platform's
exact token count where available (Android) and falls back to a character-based
estimate otherwise (iOS), so prompts that would overflow the on-device context
go to the cloud automatically.
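
The overflow rule described above can be illustrated with a small, self-contained sketch. It does not call or reproduce the library's `DefaultPromptRouter`: the names `Route`, `estimateTokens`, and `chooseRoute`, the four-characters-per-token ratio, and the caller-supplied context limit are all assumptions made for illustration.

```kotlin
// Hypothetical sketch of overflow-aware routing; not the library's DefaultPromptRouter.

enum class Route { ON_DEVICE, CLOUD }

// Character-based fallback estimate. The 4-chars-per-token ratio is an assumed
// heuristic, not a value taken from the library.
fun estimateTokens(text: String, charsPerToken: Int = 4): Int {
    require(charsPerToken > 0) { "charsPerToken must be positive" }
    // Round up so any non-empty prompt counts as at least one token.
    return (text.length + charsPerToken - 1) / charsPerToken
}

// Use an exact count when the platform supplies one; otherwise estimate.
fun chooseRoute(
    prompt: String,
    contextLimitTokens: Int,
    exactTokenCount: ((String) -> Int)? = null,
): Route {
    val tokens = exactTokenCount?.invoke(prompt) ?: estimateTokens(prompt)
    return if (tokens <= contextLimitTokens) Route.ON_DEVICE else Route.CLOUD
}
```

Under these assumptions, `chooseRoute(prompt, contextLimitTokens = 4096)` keeps short prompts on-device and sends oversized ones to the cloud. Passing `exactTokenCount` mirrors a platform that exposes a real tokenizer, while omitting it mirrors the character-based fallback.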
On-device models are smaller and more constrained than cloud LLMs:

- `execute` rejects any prompt that passes tools.
- `moderate()` is unsupported.

Route prompts that need any of the above to a cloud executor (see Hybrid).
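
One way to apply these limits before a request reaches the on-device client is a simple capability check. The sketch below is only illustrative: `PromptRequest`, `Target`, and `targetFor` are hypothetical names that do not come from Koog or from this library.

```kotlin
// Hypothetical request description; the field names are assumptions for illustration.
data class PromptRequest(
    val text: String,
    val toolNames: List<String> = emptyList(),
    val needsModeration: Boolean = false,
)

enum class Target { ON_DEVICE, CLOUD }

// Sends anything the on-device client cannot serve (tools, moderation) to the cloud.
fun targetFor(request: PromptRequest): Target =
    if (request.toolNames.isNotEmpty() || request.needsModeration) Target.CLOUD
    else Target.ON_DEVICE
```

A check like this could run before choosing between the on-device and cloud executors, so tool-using agents never hit the rejection path on-device.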

| Platform | Minimum OS version | Runtime |
|---|---|---|
| Android | API 26 | Google Play Services with AICore (Gemini Nano) |
| iOS | 26.0 | Apple Intelligence-capable device (e.g. iPhone 15 Pro+, Apple Silicon iPad/Mac) |
Building for iOS requires the iOS 26 SDK (Xcode 26, Swift 6.2), since Apple Foundation Models ships only in that SDK.
Copyright 2026 Yuki Nagai
Licensed under the Apache License, Version 2.0