
Framework-agnostic on-device LLM inference with a unified OnDeviceGenerator API: single-shot and streaming generation, model download/progress tracking, systemInstruction handling, and simple cancellation semantics.
Framework-agnostic, on-device LLM inference for Kotlin Multiplatform — Gemini Nano (Android, via ML Kit GenAI) & Apple Foundation Models (iOS).
A single OnDeviceGenerator API for native on-device text generation. No agent framework required — most on-device use cases (summarize, classify, rewrite, extract) are single-shot generate() calls. To use this with Koog, see koog-ondevice.
| Platform | Backend | Min OS | Hardware |
|---|---|---|---|
| Android | Gemini Nano (ML Kit GenAI Prompt API, via AICore) | API 26+ | Pixel 9+ and other AICore-capable devices |
| iOS | Apple Foundation Models | iOS 26+ | Apple Intelligence-capable devices (iPhone 15 Pro+, etc.) |

`gradle/libs.versions.toml`:

```toml
[libraries]
ondevice-llm = { module = "dev.ynagai.ondevice:ondevice-llm", version = "0.1.0" }
```

`build.gradle.kts`:

```kotlin
kotlin {
    sourceSets {
        commonMain.dependencies {
            implementation(libs.ondevice.llm)
        }
    }
}
```

```kotlin
val generator = createOnDeviceGenerator()

if (checkOnDeviceStatus() == OnDeviceStatus.AVAILABLE) {
    val response = generator.generate(
        OnDeviceRequest(prompt = "Summarize: ...", maxOutputTokens = 200)
    )
    println("${response.text} [${response.finishReason}]")
}

// streaming
generator.generateStream(OnDeviceRequest(prompt = "...")).collect { chunk ->
    when (chunk) {
        is OnDeviceChunk.Delta -> print(chunk.text)
        is OnDeviceChunk.Done -> println("\n[${chunk.finishReason}]")
    }
}

generator.close()
```

The `systemInstruction` field is delivered through each platform's native channel
(Foundation Models `Instructions` on iOS; prepended to the prompt on Android). Do
not inline system text into `prompt` yourself.
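
As a rough illustration of keeping system text out of `prompt`, the sketch below builds a request with the two fields kept separate. `OnDeviceRequest` is redeclared here as a stand-in so the snippet compiles on its own. That `systemInstruction` is an optional `String`, and the default values shown, are assumptions rather than something the library confirms, and `summarizeRequest` is a hypothetical helper.

```kotlin
// Stand-in for the library's request type so this sketch compiles on its own.
// Assumption: systemInstruction is an optional String; check the real class.
data class OnDeviceRequest(
    val prompt: String,
    val systemInstruction: String? = null,
    val maxOutputTokens: Int? = null,
)

// Hypothetical helper: system text goes in systemInstruction, user text in prompt.
fun summarizeRequest(article: String): OnDeviceRequest =
    OnDeviceRequest(
        prompt = "Summarize: $article",
        systemInstruction = "You are a concise assistant. Reply in one sentence.",
        maxOutputTokens = 200,
    )

fun main() {
    val request = summarizeRequest("Kotlin Multiplatform shares code across platforms.")
    // The prompt carries only the user-facing task, not the system text.
    println(request.prompt)
    println(request.systemInstruction)
}
```

In a real project you would drop the stand-in class, import the library's `OnDeviceRequest`, and pass the result to `generator.generate(...)`.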
On platforms that support it, trigger a model download with `downloadOnDeviceModel()`,
which emits `OnDeviceDownload` progress until a terminal `Completed`/`Failed`. Collect it
to drive a progress UI, or use `awaitAvailable()` if you only need the final result:

```kotlin
val ready = downloadOnDeviceModel().awaitAvailable()
```

The iOS backend links against Apple's FoundationModels framework through a small
Swift bridge resolved via Gradle's `swiftPMDependencies` (Experimental in Kotlin 2.4.0):

- The bridge (`swift/`) is built as part of the Kotlin/Native compile.
- Cancelling `generate()` / stream collection forwards the cancellation to Foundation Models, which stops generation at the next token boundary and frees the device for the next call.
- `finishReason`: Android can report `LENGTH` (token cap hit); iOS cannot detect a `maxOutputTokens` cutoff and always reports `STOP`.

The unit tests are platform-independent. Real inference requires capable hardware
(AICore/Gemini Nano on Android; an Apple Intelligence device on iOS). CI emulators
and simulators have no model and report `UNAVAILABLE`, so there are no integration
tests. Verify on a real device.
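
Because emulators and simulators report `UNAVAILABLE`, device-dependent checks may want to skip rather than fail when no model is present. Below is a minimal sketch of that guard. The enum is redeclared as a stand-in (the real `OnDeviceStatus` may have more values), and `runIfAvailable` is a hypothetical helper, not part of the library.

```kotlin
// Stand-in for the library's status enum; the real one may define more values.
enum class OnDeviceStatus { AVAILABLE, UNAVAILABLE }

// Hypothetical helper: runs block only when a model is present, otherwise
// returns null so callers (for example a device test) can skip instead of failing.
// It is inline so the lambda may contain suspending calls when invoked from a
// suspend context.
inline fun <T : Any> runIfAvailable(status: OnDeviceStatus, block: () -> T): T? =
    if (status == OnDeviceStatus.AVAILABLE) block() else null

fun main() {
    // On a CI emulator the status would be UNAVAILABLE, so the block never runs.
    val skipped = runIfAvailable(OnDeviceStatus.UNAVAILABLE) { "inference result" }
    println(skipped ?: "skipped: no on-device model")

    // On capable hardware the block runs and its result is returned.
    val ran = runIfAvailable(OnDeviceStatus.AVAILABLE) { "inference result" }
    println(ran ?: "skipped: no on-device model")
}
```

In a real project the status argument would come from `checkOnDeviceStatus()` and the block would wrap the `generate()` call.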
Apache 2.0