deviceai

On-device AI runtime enabling speech recognition, TTS, and local LLM inference with offline RAG, auto model downloads, streaming generation, and GPU acceleration for low-latency, privacy-preserving apps.

#wrapper
#sdk
#llm
#kotlin-native
#kotlin-flow
#file
#compose-multiplatform
#compose
#audio
#ai

Platforms: Android JVM, JVM, Kotlin/Native
GitHub stars: 5
Authors: deviceai-labs
Dependents: 0
License: Other
Creation date: 5 months ago
Last activity: 4 months ago
Latest release: 0.0.1 (2 months ago)

DeviceAI

On-device AI for Android & iOS — speech recognition, text-to-speech, and LLM chat. Zero cloud latency, zero privacy risk. Optional cloud backend for OTA model updates, telemetry, and device management.

Install

Android (Kotlin)

// build.gradle.kts
implementation("dev.deviceai:core:0.0.1")
implementation("dev.deviceai:speech:0.0.1")   // STT + TTS
implementation("dev.deviceai:llm:0.0.1")      // LLM + RAG

iOS / macOS (Swift Package Manager)

Add the DeviceAI package to your Xcode project or Package.swift:

// Package.swift
dependencies: [
    .package(url: "https://github.com/deviceai-labs/deviceai", from: "0.0.1")
]

Then add the modules you need:

.target(
    name: "YourApp",
    dependencies: [
        .product(name: "DeviceAI", package: "deviceai"),
        .product(name: "DeviceAISpeech", package: "deviceai"),   // STT + TTS
        .product(name: "DeviceAILLM", package: "deviceai"),      // LLM + RAG
    ]
)

Or in Xcode: File → Add Package Dependencies → paste https://github.com/deviceai-labs/deviceai → select the modules you need.

Initialize

Android

class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        PlatformStorage.initialize(this)
        DeviceAI.initialize(context = this)
    }
}

iOS / macOS

import DeviceAI

// Local mode — no cloud, fully offline
DeviceAI.initialize()

// With cloud backend (optional)
DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
    $0.telemetry = .minimal
}

That's it. The SDK runs fully on-device with no backend required.

With cloud backend (optional)

Android:

DeviceAI.initialize(context = this, apiKey = "<YOUR_API_KEY>") {
    telemetry = TelemetryLevel.Minimal
    appVersion = BuildConfig.VERSION_NAME
}

iOS:

DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
    $0.telemetry = .minimal
    $0.appVersion = Bundle.main.infoDictionary?["CFBundleShortVersionString"] as? String
}

The API key connects the SDK to the DeviceAI cloud backend. Device hardware (RAM, CPU, SoC) is detected automatically — no manual configuration needed.

Speech-to-Text

Android

SpeechBridge.initStt(modelPath, SttConfig(language = "en", useGpu = true))

// From raw audio samples
val text = SpeechBridge.transcribeAudio(samples)  // FloatArray, 16kHz mono

// From a WAV file
val textFromFile = SpeechBridge.transcribe("/path/to/audio.wav")

SpeechBridge.shutdownStt()
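
transcribeAudio takes a FloatArray at 16 kHz mono, while many recorders deliver 16-bit PCM. The sketch below shows one way to convert. It assumes the engine wants samples normalized to the [-1.0, 1.0) range, which is what whisper.cpp consumes; this README does not state the expected range, so verify it before relying on this. pcm16ToFloat is a hypothetical helper, not part of the SDK.

```kotlin
// Hypothetical helper (not part of the DeviceAI SDK).
// Converts signed 16-bit PCM samples to floats, assuming the STT engine
// expects values normalized to [-1.0, 1.0).
fun pcm16ToFloat(pcm: ShortArray): FloatArray =
    FloatArray(pcm.size) { i -> pcm[i] / 32768.0f }
```

The result could then be passed to SpeechBridge.transcribeAudio(...), provided the recording was already captured at 16 kHz mono; this helper does no resampling or channel mixing.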

iOS

let engine = try await SttEngine(modelPath: path, config: .init(language: "en"))

// From raw audio samples
let text = try await engine.transcribe(samples: audioBuffer)  // [Float], 16kHz mono

// From a WAV file
let textFromFile = try await engine.transcribe(audioPath: "/path/to/audio.wav")

engine.shutdown()

Text-to-Speech

Android

SpeechBridge.initTts(modelPath, tokensPath, TtsConfig(speechRate = 1.0f))

val pcm: ShortArray = SpeechBridge.synthesize("Hello from DeviceAI.")
// Play with AudioTrack

SpeechBridge.shutdownTts()

iOS

let tts = try await TtsEngine(modelPath: path, tokensPath: tokens)
let audio = try await tts.synthesize("Hello from DeviceAI")
tts.shutdown()

// Or use Apple's built-in voices (zero setup, no model download):
let systemTts = SystemTTSEngine()
try await systemTts.speak("Hello from DeviceAI")

LLM Chat

Android

val session = DeviceAI.llm.chat("/path/to/model.gguf") {
    systemPrompt = "You are a helpful assistant."
    maxTokens = 512
    temperature = 0.7f
    useGpu = true
}

// Streaming (recommended for UI)
session.send("What is Kotlin?").collect { token -> print(token) }

// Multi-turn — history managed automatically
session.send("Give me an example.").collect { print(it) }

// Lifecycle
session.cancel()        // abort generation
session.clearHistory()  // fresh conversation
session.close()         // unload model

iOS

let session = try await ChatSession(modelPath: path) {
    $0.systemPrompt = "You are a helpful assistant."
    $0.maxTokens = 512
    $0.temperature = 0.7
}

// Streaming
for try await token in try session.send("What is Swift?") {
    print(token, terminator: "")
}

// Multi-turn — history managed automatically
for try await token in try session.send("Give me an example.") {
    print(token, terminator: "")
}

// Lifecycle
session.cancel()        // abort generation
session.clearHistory()  // fresh conversation
session.close()         // unload model

Offline RAG

Android

val store = BM25RagStore(rawChunks = listOf(
    "DeviceAI supports Android and iOS.",
    "LLM inference uses llama.cpp with Vulkan GPU."
))
val session = DeviceAI.llm.chat("/path/to/model.gguf") { ragStore = store }
session.send("What GPU does DeviceAI use?").collect { print(it) }

iOS

let store = BM25RagStore(chunks: [
    "DeviceAI supports Android and iOS.",
    "LLM inference uses llama.cpp with Metal GPU."
])
let session = try await ChatSession(modelPath: path) {
    $0.ragStore = store
}
for try await token in try session.send("What GPU does DeviceAI use?") {
    print(token, terminator: "")
}

No embedding model needed — BM25 keyword retrieval runs entirely on-device.
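
To illustrate why keyword retrieval needs no embedding model, here is a minimal sketch of textbook Okapi BM25 scoring. It is not the implementation behind BM25RagStore. The tokenizer, the function names, and the k1 and b defaults are assumptions made for this example.

```kotlin
import kotlin.math.ln

// Illustrative only: textbook Okapi BM25, not the internals of BM25RagStore.
// Assumption: lowercase, split on anything that is not a letter or digit.
fun tokenize(text: String): List<String> =
    text.lowercase().split(Regex("[^a-z0-9]+")).filter { it.isNotEmpty() }

// Returns one relevance score per chunk, in the same order as `chunks`.
// k1 and b are common textbook defaults, chosen here as assumptions.
fun bm25Scores(
    chunks: List<String>,
    query: String,
    k1: Double = 1.2,
    b: Double = 0.75,
): List<Double> {
    val docs = chunks.map(::tokenize)
    val avgLen = docs.map { it.size }.average()
    val queryTerms = tokenize(query).distinct()
    return docs.map { doc ->
        queryTerms.sumOf { term ->
            val tf = doc.count { it == term }          // term frequency in this chunk
            val df = docs.count { term in it }         // chunks containing the term
            val idf = ln(1.0 + (docs.size - df + 0.5) / (df + 0.5))
            // Length normalization: long chunks are penalized relative to the average.
            idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc.size / avgLen))
        }
    }
}
```

Scoring uses only term counts and chunk lengths, so it runs with plain arithmetic on the device. A chunk that shares no terms with the query scores 0.0, and the highest-scoring chunks are the ones a RAG pipeline would typically prepend to the prompt.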

Telemetry

When telemetry is enabled, the SDK automatically tracks performance metrics for all modules:

What's collected

Module	Metrics
STT	Model load time, transcription latency, audio duration (input_length_ms)
TTS	Model load time, synthesis latency, text length (output_chars)
LLM	Model load time, inference latency, time-to-first-token, tokens/sec, token counts, finish reason

What's NEVER collected

Prompt or response text content
Audio recordings or transcript content
PII by default

Apps should avoid putting PII in appAttributes, since developer-provided attributes are sent in the capability profile.

Telemetry levels

Android:

DeviceAI.initialize(context = this, apiKey = "<YOUR_API_KEY>") {
    // Set exactly one level:
    telemetry = TelemetryLevel.Off         // default: nothing sent
    // telemetry = TelemetryLevel.Minimal  // model load/unload + inference metrics
    // telemetry = TelemetryLevel.Full     // also includes OTA downloads + manifest syncs
}

iOS:

DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
    // Set exactly one level:
    $0.telemetry = .off         // default: nothing sent
    // $0.telemetry = .minimal  // model load/unload + inference metrics
    // $0.telemetry = .full     // also includes OTA downloads + manifest syncs
}

Events are batched on-device. Delivery respects the Wi-Fi preference and data-saver mode, and pending events are flushed automatically when the app goes to the background.

Custom telemetry sink

Route events to your own analytics instead of the DeviceAI backend:

Android:

DeviceAI.initialize(context = this, apiKey = "<YOUR_API_KEY>") {
    telemetry = TelemetryLevel.Minimal
    telemetrySink = object : TelemetrySink {
        override suspend fun ingest(events: List<TelemetryEvent>) {
            myAnalytics.track(events)
        }
    }
}

iOS:

DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
    $0.telemetry = .minimal
    $0.telemetrySink = MyAnalyticsSink()  // conforms to TelemetrySink protocol
}

Cloud Backend

The SDK optionally connects to a cloud control plane. When an API key is provided:

Feature	What happens
Device registration	Automatic — hardware profile sent, capability tier assigned
Model manifest	Backend assigns the right model for each device tier, synced every 6h
OTA updates	Push new models with canary rollouts and instant kill-switch
Telemetry	Performance metrics batched and delivered (when enabled)
Device identity	Stable across reinstalls — same device always gets the same ID

No cloud calls are made without an API key. Local mode works fully offline.

Models

Whisper (STT)

Model	Size	Speed	Best for
`ggml-tiny.en.bin`	75 MB	7× real-time	English, mobile-first
`ggml-base.bin`	142 MB	Fast	Multilingual, balanced
`ggml-small.bin`	466 MB	Medium	Higher accuracy

LLM (GGUF via llama.cpp)

Model	Size	Best for
SmolLM2-360M-Instruct (Q4)	~220 MB	Fastest, mobile-first
Qwen2.5-0.5B-Instruct (Q4)	~400 MB	Multilingual, compact
Llama-3.2-1B-Instruct (Q4)	~700 MB	Strong reasoning
SmolLM2-1.7B-Instruct (Q4)	~1 GB	Balanced

Browse LLM models with LlmCatalog. Download Whisper/TTS models via ModelRegistry.

Features

Feature	Android	iOS
Speech-to-Text (whisper.cpp)	✅	✅
Text-to-Speech (sherpa-onnx VITS / Kokoro)	✅	✅
System TTS (Apple AVSpeechSynthesizer)	—	✅
Voice Activity Detection	✅	✅
LLM inference (llama.cpp, GGUF)	✅	✅
Streaming generation	✅	✅
Stateful multi-turn chat	✅	✅
Offline RAG (BM25)	✅	✅
Auto model download (HuggingFace)	✅	🗓
GPU acceleration	✅ Vulkan	✅ Metal
Cloud backend (registration, manifest, telemetry)	✅	✅
Auto hardware detection	✅	✅
Stable device identity (survives reinstall)	✅	✅
Telemetry (STT/TTS/LLM)	✅	✅
Custom telemetry sink	✅	✅
OTA model rollouts + kill switch	✅	✅
Flutter plugin	🗓	🗓
React Native module	🗓	🗓

Platform support

Platform	STT	TTS	LLM	Version	Status
Android (API 26+)	✅	✅	✅	0.0.1	Available
iOS 17+ / macOS 14+	✅	✅	✅	0.0.1	Available
Flutter	—	—	—	—	Planned
React Native	—	—	—	—	Planned

Benchmarks

Device	SoC	Model	Audio	Inference	RTF
Redmi Note 9 Pro	Snapdragon 720G	whisper-tiny.en	5.4s	746ms	0.14x

RTF < 1.0 = faster than real-time. 0.14x = ~7× faster than real-time.
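
The RTF column follows from the two timing columns. As a small worked sketch (the function names are made up for this example), the real-time factor is inference time divided by audio duration, and its reciprocal is the speed-up over real time:

```kotlin
// Real-time factor: processing time divided by the duration of the audio.
// Values below 1.0 mean transcription finishes faster than the audio plays.
fun realTimeFactor(inferenceMs: Double, audioMs: Double): Double =
    inferenceMs / audioMs

// Speed-up over real time is the reciprocal of the RTF.
fun speedUp(rtf: Double): Double = 1.0 / rtf
```

For the Redmi Note 9 Pro row, realTimeFactor(746.0, 5400.0) is about 0.138, which rounds to the 0.14 in the table, and speedUp of that value is about 7.2, matching the "~7×" figure.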

Building from source

Android

git clone https://github.com/deviceai-labs/deviceai.git
cd deviceai
make setup
./gradlew :kotlin:core:compileDebugKotlinAndroid
./gradlew :kotlin:speech:compileDebugKotlinAndroid
./gradlew :kotlin:llm:compileDebugKotlinAndroid

iOS (Swift)

git clone https://github.com/deviceai-labs/deviceai.git
cd deviceai

# Build XCFrameworks (requires Xcode + CMake)
./sdk/deviceai-commons/scripts/build-xcframeworks.sh

# Build the Swift package
cd swift
swift build

Sample App

# Android: Open samples/androidApp/ in Android Studio and run on device/emulator
# iOS: Open samples/iosApp/ in Xcode

Contributing

Issues and PRs welcome. Platform SDK contributions (flutter/, react-native/) are especially welcome.

License

Apache 2.0 — see LICENSE.
