ondevice-llm

Framework-agnostic on-device LLM inference with a unified OnDeviceGenerator API: single-shot and streaming generation, model download/progress tracking, systemInstruction handling, and simple cancellation semantics.

#llm

Platforms: Android JVM, Kotlin/Native
GitHub stars: 1
Author: suny
Dependents: 1
License: Apache License 2.0
Creation date: 2 months ago
Last activity: 2 months ago
Latest release: 0.1.4 (2 months ago)

ondevice-llm

Framework-agnostic, on-device LLM inference for Kotlin Multiplatform — Gemini Nano (Android, via ML Kit GenAI) & Apple Foundation Models (iOS).

A single OnDeviceGenerator API for native on-device text generation. No agent framework required — most on-device use cases (summarize, classify, rewrite, extract) are single-shot generate() calls. To use this with Koog, see koog-ondevice.

Supported platforms

Platform	Backend	Min OS	Hardware
Android	Gemini Nano (ML Kit GenAI Prompt API, via AICore)	API 26+	Pixel 9+ and other AICore-capable devices
iOS	Apple Foundation Models	iOS 26+	Apple Intelligence-capable devices (iPhone 15 Pro+, etc.)

Installation

gradle/libs.versions.toml:

[libraries]
ondevice-llm = { module = "dev.ynagai.ondevice:ondevice-llm", version = "0.1.4" }

build.gradle.kts:

kotlin {
    sourceSets {
        commonMain.dependencies {
            implementation(libs.ondevice.llm)
        }
    }
}

Usage

val generator = createOnDeviceGenerator()
if (checkOnDeviceStatus() == OnDeviceStatus.AVAILABLE) {
    val response = generator.generate(
        OnDeviceRequest(prompt = "Summarize: ...", maxOutputTokens = 200)
    )
    println("${response.text} [${response.finishReason}]")
}

// streaming
generator.generateStream(OnDeviceRequest(prompt = "...")).collect { chunk ->
    when (chunk) {
        is OnDeviceChunk.Delta -> print(chunk.text)
        is OnDeviceChunk.Done -> println("\n[${chunk.finishReason}]")
    }
}

generator.close()
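
When the full text is needed after streaming (for example, to cache a summary), the deltas can be folded into one string. The following is a minimal sketch of that accumulation logic, not the library's API: `Chunk`, `FinishReason`, `Collected`, and `collectChunks` are stand-in names invented for illustration, and a plain `Iterable` replaces the flow returned by generateStream so the snippet runs without any dependencies.

```kotlin
// Stand-in types for illustration only. The library's real OnDeviceChunk is
// collected from a stream and may carry different or additional fields.
enum class FinishReason { STOP, LENGTH }

sealed interface Chunk {
    data class Delta(val text: String) : Chunk
    data class Done(val finishReason: FinishReason) : Chunk
}

data class Collected(val text: String, val finishReason: FinishReason?)

// Concatenates Delta texts in arrival order and records the finish reason
// carried by Done. finishReason stays null if no Done chunk was seen,
// for example because collection was cancelled early.
fun collectChunks(chunks: Iterable<Chunk>): Collected {
    val sb = StringBuilder()
    var reason: FinishReason? = null
    for (chunk in chunks) {
        when (chunk) {
            is Chunk.Delta -> sb.append(chunk.text)
            is Chunk.Done -> reason = chunk.finishReason
        }
    }
    return Collected(sb.toString(), reason)
}
```

With the real API, the same `when` branches would presumably sit inside the `collect { ... }` lambda shown above, appending on Delta and storing the reason on Done.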

The systemInstruction field is delivered through each platform's native channel (Foundation Models Instructions on iOS; prepended to the prompt on Android). Do not inline system text into prompt yourself.
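
To see why inlining is discouraged, it helps to picture what "prepended to the prompt" could look like on Android. The helper below is a hypothetical approximation: the name `effectiveAndroidPrompt` and the blank-line separator are assumptions, and the library's actual formatting may differ.

```kotlin
// Hypothetical helper that approximates "prepended to the prompt" on Android.
// The blank-line separator is an assumption, not the library's documented format.
fun effectiveAndroidPrompt(systemInstruction: String?, prompt: String): String =
    if (systemInstruction.isNullOrBlank()) prompt
    else systemInstruction.trim() + "\n\n" + prompt
```

Under this model, a caller who also pastes the system text into prompt would send it twice on Android, while on iOS the pasted copy would bypass the Instructions channel.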

On platforms that support it, trigger a model download with downloadOnDeviceModel(), which emits OnDeviceDownload progress until a terminal Completed/Failed. Collect it to drive a progress UI, or use awaitAvailable() if you only need the final result:

val ready = downloadOnDeviceModel().awaitAvailable()
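
For a progress UI, each emitted event has to be mapped to something displayable. A minimal sketch of that mapping follows. Only the terminal Completed/Failed states come from the description above; the `Download` type, the `Progress` state, and its byte-count fields are stand-ins invented here, so check the real OnDeviceDownload type before adapting it.

```kotlin
// Stand-in for OnDeviceDownload. Only the Completed/Failed terminal states are
// taken from the README; Progress and its fields are assumptions.
sealed interface Download {
    data class Progress(val bytesDownloaded: Long, val totalBytes: Long) : Download
    object Completed : Download
    data class Failed(val message: String) : Download
}

// Maps one download event to a label suitable for a progress indicator.
fun progressLabel(event: Download): String = when (event) {
    is Download.Progress ->
        if (event.totalBytes > 0) {
            "Downloading ${event.bytesDownloaded * 100 / event.totalBytes}%"
        } else {
            "Downloading..." // total size not known yet
        }
    Download.Completed -> "Model ready"
    is Download.Failed -> "Download failed: ${event.message}"
}
```

Because the `when` is exhaustive over a sealed type, adding a new state to the stand-in produces a compile error here instead of a silently unhandled event.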

iOS toolchain requirements

The iOS backend links against Apple's FoundationModels framework through a small Swift bridge resolved via Gradle's swiftPMDependencies (Experimental in Kotlin 2.4.0):

Xcode 26.x with the Swift 6.2 / iOS 26 SDK
A local SwiftPM package (swift/) is built as part of the Kotlin/Native compile

Limitations

No tool calling, moderation, or structured-output abstraction (platform APIs are immature; out of scope for this library).
iOS serializes generations (single Foundation Models session, one at a time).
Cancelling an in-flight iOS generate() / stream collection forwards the cancellation to Foundation Models, which stops generation at the next token boundary and frees the device for the next call.
Finish reason: Android can report LENGTH (token cap hit); iOS cannot detect a maxOutputTokens cutoff and always reports STOP.
No desktop / web / JVM targets — Android and iOS only.
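
The finish-reason asymmetry above matters for shared code that wants to warn about cut-off output. One way to make it explicit is a three-valued result, sketched below. `Platform`, `FinishReason`, `Truncation`, and `truncation` are illustrative names rather than library types, and treating STOP on Android as "not truncated" is an assumption.

```kotlin
// Illustrative stand-ins; the library exposes its own finish-reason type.
enum class Platform { ANDROID, IOS }
enum class FinishReason { STOP, LENGTH }
enum class Truncation { YES, NO, UNKNOWN }

// Interprets a finish reason per platform:
// - LENGTH always means the token cap was hit.
// - STOP on iOS is inconclusive, because a maxOutputTokens cutoff is not detected there.
// - STOP on Android is treated as a natural stop (an assumption).
fun truncation(platform: Platform, reason: FinishReason): Truncation = when {
    reason == FinishReason.LENGTH -> Truncation.YES
    platform == Platform.IOS -> Truncation.UNKNOWN
    else -> Truncation.NO
}
```

Callers can then show a "possibly truncated" hint only for `UNKNOWN`, rather than trusting STOP on iOS.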

Testing

The unit tests are platform-independent. Real inference requires capable hardware: AICore with Gemini Nano on Android, or an Apple Intelligence device on iOS. CI emulators and simulators have no model and report UNAVAILABLE, so the project has no integration tests; verify inference on a real device.

License

Apache 2.0
