koog-ondevice

A Koog-compatible on-device LLM client for offline inference with Gemini Nano and Apple Foundation Models, with no API keys required. It also supports hybrid cloud fallback, model download, warm-up, and routing.

#router
#llm
#kotlin-flow
#coroutines
#client
#apple
#ai


Android JVM, Kotlin/Native
GitHub stars: 0
Author: suny
Dependents: 0
License: Apache License 2.0
Creation date: 2 months ago
Last activity: 2 months ago
Latest release: 0.1.0 (2 months ago)

koog-ondevice

Koog LLMClient for on-device inference — Gemini Nano (Android) & Apple Foundation Models (iOS).

Drop-in replacement for cloud providers in your Koog agents.

What is this?

This library provides a Koog-compatible LLMClient that runs inference on-device using platform-native APIs:

Android: ML Kit GenAI Prompt API (Gemini Nano via AICore)
iOS: Apple Foundation Models (iOS 26+)

No API keys. No network calls. No cloud costs. Your data stays on the device.

How is this different from koog-edge?

Aspect	koog-ondevice	koog-edge
Model source	OS-provided (system-managed)	App-bundled (GGUF)
Backend	Gemini Nano, Apple FM	Cactus Compute, Leap SDK
App size impact	None	+hundreds of MB
Model updates	Automatic (OS updates)	Manual (app update)
Model choice	Platform default only	Any GGUF model

They are complementary, not competing.

Setup

// build.gradle.kts
dependencies {
    implementation("dev.ynagai.koog:koog-ondevice:0.1.0")
}
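In a Kotlin Multiplatform module, the same coordinate would usually be declared once in the common source set. The fragment below is a sketch under assumptions: it presumes the artifact publishes Gradle module metadata for both Android and iOS targets (not confirmed here), that the Kotlin Multiplatform and Android Gradle plugins are already applied, and that the listed iOS targets match your project.

```kotlin
// build.gradle.kts of a shared module (illustrative sketch, not from the library's docs)
kotlin {
    // Targets are assumptions; declare the ones your project actually builds.
    androidTarget()
    iosArm64()
    iosSimulatorArm64()

    sourceSets {
        // Declaring the dependency in commonMain makes it visible to every target.
        commonMain.dependencies {
            implementation("dev.ynagai.koog:koog-ondevice:0.1.0")
        }
    }
}
```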

Quick Start

Pure On-Device

val executor = simpleOnDeviceExecutor()
val service = AIAgentService(
    // Resolves to the platform's native model (Gemini Nano on Android,
    // Apple Foundation on iOS) — no platform branching needed.
    llmModel = OnDeviceModels.Default,
    promptExecutor = executor,
)

val response = with(service) {
    createAgentAndRun("Summarize today's workout")
}

Need a specific model? OnDeviceModels.GeminiNano and OnDeviceModels.AppleFoundation are available explicitly.

Hybrid (On-Device + Cloud)

val executor = hybridExecutor(
    onDevice = simpleOnDeviceExecutor(),
    cloud = simpleFirebaseExecutor(),  // from koog-firebase
)

val service = AIAgentService(
    llmModel = FirebaseModels.Gemini2_5Flash,
    promptExecutor = executor,
)

Availability Check

checkOnDeviceStatus() is a suspend function — call it from a coroutine. downloadOnDeviceModel() returns a Flow<OnDeviceDownload> of progress frames; awaitAvailable() collects it and suspends until the model is ready.

when (checkOnDeviceStatus()) {
    OnDeviceStatus.AVAILABLE -> { /* ready to run on-device */ }
    OnDeviceStatus.DOWNLOADABLE -> { downloadOnDeviceModel().awaitAvailable() }
    OnDeviceStatus.DOWNLOADING -> { /* a download is already in progress */ }
    OnDeviceStatus.UNAVAILABLE -> { /* not supported — fall back to cloud */ }
}
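An app will often turn the four statuses into a single decision about where the next prompt should run. The following is a minimal sketch of such a policy. `Status`, `Plan`, `planFor`, and the `allowDownload` flag are hypothetical stand-ins defined locally for illustration; they are not part of koog-ondevice.

```kotlin
// Local stand-in mirroring the four statuses named above; not the library's enum.
enum class Status { AVAILABLE, DOWNLOADABLE, DOWNLOADING, UNAVAILABLE }

// Hypothetical application-level decisions.
enum class Plan { RUN_ON_DEVICE, DOWNLOAD_THEN_RUN, USE_CLOUD_FOR_NOW, USE_CLOUD }

/**
 * Maps a status to a plan. allowDownload is an app policy flag, for example
 * false while the user is on a metered connection.
 */
fun planFor(status: Status, allowDownload: Boolean): Plan = when (status) {
    Status.AVAILABLE -> Plan.RUN_ON_DEVICE
    Status.DOWNLOADABLE -> if (allowDownload) Plan.DOWNLOAD_THEN_RUN else Plan.USE_CLOUD
    // A download is already running: serve this request from the cloud and retry later.
    Status.DOWNLOADING -> Plan.USE_CLOUD_FOR_NOW
    Status.UNAVAILABLE -> Plan.USE_CLOUD
}

fun main() {
    println(planFor(Status.AVAILABLE, allowDownload = true))     // RUN_ON_DEVICE
    println(planFor(Status.DOWNLOADABLE, allowDownload = false)) // USE_CLOUD
}
```

Keeping the policy in a pure function like this makes it easy to unit-test separately from the suspend calls that fetch the real status.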

Warm-up (optional)

Pre-load the model to avoid the first-generation cold-start cost — best-effort, safe to call repeatedly:

warmUpOnDevice()  // e.g. when entering a screen that will use the model

When routing with hybridExecutor, DefaultPromptRouter uses the platform's exact token count where available (Android) and falls back to a character-based estimate otherwise (iOS), so prompts that would overflow the on-device context go to the cloud automatically.
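To make the routing idea concrete, here is a rough sketch of a size-based decision that prefers an exact token count and falls back to a character-based estimate. It is not the library's `DefaultPromptRouter`. The names `Route`, `estimateTokens`, and `routePrompt`, the ratio of four characters per token, and the context and output budgets are all assumptions chosen for illustration.

```kotlin
// Hypothetical names throughout; this is not the library's DefaultPromptRouter.
enum class Route { ON_DEVICE, CLOUD }

/** Rough estimate: about 4 characters per token, rounded up (an assumed heuristic). */
fun estimateTokens(text: String, charsPerToken: Int = 4): Int =
    (text.length + charsPerToken - 1) / charsPerToken

/**
 * Picks a route for a prompt.
 * exactCount is the platform tokenizer's result when one is available, else null.
 * contextLimit and reservedForOutput are illustrative values, not real model limits.
 */
fun routePrompt(
    prompt: String,
    exactCount: Int? = null,
    contextLimit: Int = 4096,
    reservedForOutput: Int = 512,
): Route {
    // Prefer the exact count; otherwise fall back to the character-based estimate.
    val tokens = exactCount ?: estimateTokens(prompt)
    return if (tokens + reservedForOutput <= contextLimit) Route.ON_DEVICE else Route.CLOUD
}

fun main() {
    println(routePrompt("Summarize today's workout"))     // ON_DEVICE
    println(routePrompt("x".repeat(20_000)))              // CLOUD
    println(routePrompt("short", exactCount = 5_000))     // CLOUD
}
```

Reserving part of the context for the model's output is a design choice in this sketch: a prompt that just fits leaves no room for the answer, so the check compares the prompt size plus an output budget against the limit.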

Limitations

On-device models are smaller and more constrained than cloud LLMs:

No tool / function calling — execute rejects any prompt that passes tools.
No moderation — moderate() is unsupported.
No structured output — only plain text completion (with temperature) is supported.
iOS serializes generation — a single on-device Foundation Models session backs the client, so concurrent requests run one at a time.

Route prompts that need any of the above to a cloud executor (see Hybrid).
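One way to apply these limits is a pre-flight check that lists the reasons a request cannot run on-device before an executor is chosen. The sketch below uses a hypothetical `RequestNeeds` type and `onDeviceBlockers` function defined locally; Koog's real prompt and tool types are different, so treat this as an outline of the check rather than working integration code.

```kotlin
// Hypothetical request model for illustration; Koog's real prompt types differ.
data class RequestNeeds(
    val toolNames: List<String> = emptyList(),
    val wantsStructuredOutput: Boolean = false,
    val wantsModeration: Boolean = false,
)

/** Returns the reasons a request cannot run on-device; an empty list means it can. */
fun onDeviceBlockers(needs: RequestNeeds): List<String> = buildList {
    if (needs.toolNames.isNotEmpty()) add("tool calling")
    if (needs.wantsStructuredOutput) add("structured output")
    if (needs.wantsModeration) add("moderation")
}

fun main() {
    val plain = RequestNeeds()
    val withTools = RequestNeeds(toolNames = listOf("getWeather"))
    println(onDeviceBlockers(plain))      // []
    println(onDeviceBlockers(withTools))  // [tool calling]
}
```

Returning the reasons instead of a Boolean keeps the decision explainable in logs when a prompt is sent to the cloud.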

Requirements

Platform	Minimum	Runtime
Android	API 26	Google Play Services with AICore (Gemini Nano)
iOS	26.0	Apple Intelligence-capable device (e.g. iPhone 15 Pro+, Apple Silicon iPad/Mac)

Building for iOS requires the iOS 26 SDK (Xcode 26, Swift 6.2), since Apple Foundation Models ships only in that SDK.

Related

Koog: JetBrains AI Agent Framework for Kotlin
koog-firebase: Firebase Gemini integration for Koog
koog-edge: GGUF-based on-device inference for Koog

License

Copyright 2026 Yuki Nagai

Licensed under the Apache License, Version 2.0
