
On-device and remote LLM inference via native llama.cpp bindings, offering embeddings, context-aware text generation (streaming and non-streaming), a lightweight HTTP client/server, and GGUF model support.
Run AI locally on Android, iOS, Desktop and WASM — using a single Kotlin API.
Offline-first · Privacy-preserving · True Kotlin Multiplatform
Llamatik is a true Kotlin Multiplatform AI library that lets you run:

- llama.cpp
- whisper.cpp
- stable-diffusion.cpp

Fully on-device, optionally remote — all behind a unified Kotlin API.
No Python.
No required servers.
Your models, your data, your device.
Designed for privacy-first, offline-capable, and cross-platform AI applications.
Want to see Llamatik in action before integrating it? The Llamatik App showcases the full feature set.

The architecture at a glance:
```
Your App
   │
   ▼
LlamaBridge (shared Kotlin API)
   │
   ├─ llamatik-core    → Native llama.cpp, whisper.cpp and stable-diffusion.cpp (on-device)
   ├─ llamatik-client  → Remote HTTP inference
   └─ llamatik-backend → llama.cpp-compatible server
```
Switching between local and remote inference requires no API changes — only configuration.
Llamatik is published on Maven Central and follows semantic versioning.
```kotlin
// settings.gradle.kts
dependencyResolutionManagement {
    repositories {
        google()
        mavenCentral()
    }
}
```

```kotlin
// build.gradle.kts
commonMain.dependencies {
    implementation("com.llamatik:library:1.0.0")
}
```

```kotlin
// Resolve model path (place GGUF in assets / bundle)
val modelPath = LlamaBridge.getModelPath("phi-2.Q4_0.gguf")
// (Optional) tune parameters before loading — contextLength/useMmap/flashAttention/batchSize
// take effect at model init time; the others can be changed at any time
LlamaBridge.updateGenerateParams(
    temperature = 0.7f,
    maxTokens = 512,
    topP = 0.95f,
    topK = 40,
    repeatPenalty = 1.1f,
    contextLength = 4096,
    numThreads = 4,
    useMmap = true,
    flashAttention = false,
    batchSize = 512, // prompt-processing batch size (documented default)
)
// Load model
LlamaBridge.initGenerateModel(modelPath)
// Generate text
val output = LlamaBridge.generate(
    "Explain Kotlin Multiplatform in one sentence."
)
```
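Streaming works through the same bridge. A minimal sketch using the GenStream callback interface defined below (assumes the model loaded above):

```kotlin
LlamaBridge.generateStream(
    "Explain Kotlin Multiplatform in one sentence.",
    object : GenStream {
        override fun onDelta(text: String) = print(text) // partial output as it arrives
        override fun onComplete() = println("\n[done]")
        override fun onError(message: String) = println("Error: $message")
    }
)
```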
The public Kotlin API is defined in LlamaBridge (an expect object with platform-specific actual implementations).

```kotlin
@Suppress("EXPECT_ACTUAL_CLASSIFIERS_ARE_IN_BETA_WARNING")
expect object LlamaBridge {
    // Utilities
    fun getModelPath(modelFileName: String): String // copy asset/bundle model to app files dir and return absolute path
    fun shutdown() // free native resources

    // Embeddings
    fun initEmbedModel(modelPath: String): Boolean // load embeddings model
    fun embed(input: String): FloatArray // return embedding vector

    // Text generation (non-streaming)
    fun initGenerateModel(modelPath: String): Boolean // load generation model
    fun generate(prompt: String): String
    fun generateWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String
    ): String

    // Text generation (streaming)
    fun generateStream(prompt: String, callback: GenStream)
    fun generateStreamWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String,
        callback: GenStream
    )

    // Convenience streaming overload (lambda callbacks)
    fun generateWithContextStream(
        system: String,
        context: String,
        user: String,
        onDelta: (String) -> Unit,
        onDone: () -> Unit,
        onError: (String) -> Unit
    )

    // Text generation with JSON schema (non-streaming)
    fun generateJson(prompt: String, jsonSchema: String? = null): String
    fun generateJsonWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String,
        jsonSchema: String? = null
    ): String

    // Text generation with JSON schema (streaming)
    fun generateJsonStream(prompt: String, jsonSchema: String? = null, callback: GenStream)
    fun generateJsonStreamWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String,
        jsonSchema: String? = null,
        callback: GenStream
    )

    // KV cache session support
    fun sessionReset(): Boolean // clear KV state, keep model loaded
    fun sessionSave(path: String): Boolean // persist KV state to file
    fun sessionLoad(path: String): Boolean // restore KV state from file
    fun generateContinue(prompt: String): String // generate using existing KV cache

    // Generation parameters (applied on next generate call)
    fun updateGenerateParams(
        temperature: Float, // randomness (0.0–2.0)
        maxTokens: Int, // max output tokens
        topP: Float, // nucleus sampling threshold
        topK: Int, // top-k sampling
        repeatPenalty: Float, // penalty for repeated tokens
        contextLength: Int, // KV context window size (requires model reload)
        numThreads: Int, // CPU threads for inference
        useMmap: Boolean, // memory-map model weights (requires model reload)
        flashAttention: Boolean, // enable Flash Attention (requires model reload)
        batchSize: Int, // token batch size for prompt processing (requires model reload)
    )

    fun nativeCancelGenerate() // cancel ongoing generation
}

interface GenStream {
    fun onDelta(text: String)
    fun onComplete()
    fun onError(message: String)
}
```
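For structured output, pass a JSON Schema string to the generateJson* functions. A minimal sketch using generateJsonWithContext (the schema and prompts here are illustrative, not prescribed by the library):

```kotlin
// Hypothetical schema: constrain the reply to {"answer": ..., "confidence": ...}
val schema = """
{
  "type": "object",
  "properties": {
    "answer": { "type": "string" },
    "confidence": { "type": "number" }
  },
  "required": ["answer", "confidence"]
}
""".trimIndent()

val json = LlamaBridge.generateJsonWithContext(
    systemPrompt = "You answer strictly from the provided context.",
    contextBlock = "Llamatik ships native bindings for llama.cpp, whisper.cpp and stable-diffusion.cpp.",
    userPrompt = "Which native libraries does Llamatik bind?",
    jsonSchema = schema
)
```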
All sampling and hardware parameters are set via updateGenerateParams. Parameters that affect model loading (contextLength, useMmap, flashAttention, batchSize) must be set before calling initGenerateModel to take effect — the others, including numThreads, can be updated at any time.

| Parameter | Default | Description |
|---|---|---|
| temperature | 0.7 | Randomness of outputs (0 = deterministic, 2 = very random) |
| maxTokens | 256 | Maximum number of tokens to generate |
| topP | 0.95 | Nucleus sampling: keep tokens covering this probability mass |
| topK | 40 | Only sample from the top-K most likely tokens |
| repeatPenalty | 1.1 | Penalty multiplier for recently generated tokens |
| contextLength | 4096 | KV cache window size in tokens (reload required) |
| numThreads | 4 | CPU threads used for inference |
| useMmap | true | Memory-map model weights instead of loading into RAM (reload required) |
| flashAttention | false | Enable Flash Attention for faster, more memory-efficient attention (reload required) |
| batchSize | 512 | Token batch size for prompt processing — larger = faster prefill, more RAM (reload required) |
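The embeddings API follows the same load-then-call pattern. A minimal sketch (the model filename is illustrative; the cosine helper is plain Kotlin written for this example, not part of Llamatik):

```kotlin
import kotlin.math.sqrt

// Plain-Kotlin cosine similarity, defined here for the example (not part of Llamatik)
fun cosine(x: FloatArray, y: FloatArray): Float {
    var dot = 0f; var nx = 0f; var ny = 0f
    for (i in x.indices) { dot += x[i] * y[i]; nx += x[i] * x[i]; ny += y[i] * y[i] }
    return dot / (sqrt(nx) * sqrt(ny))
}

// Hypothetical embedding model filename: use any GGUF embedding model you ship
LlamaBridge.initEmbedModel(LlamaBridge.getModelPath("nomic-embed-text-v1.5.Q4_0.gguf"))

val a = LlamaBridge.embed("Kotlin Multiplatform")
val b = LlamaBridge.embed("Cross-platform Kotlin")
println(cosine(a, b)) // closer to 1.0 means more similar
```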
Use the session API to persist and resume conversation state across calls without re-feeding the full prompt:
```kotlin
// Generate and keep the KV state in memory
LlamaBridge.generate("Tell me about Kotlin.")
// Save the KV state to disk
LlamaBridge.sessionSave("/path/to/session.bin")
// ... later or in a new process ...
// Restore state and continue from where you left off
LlamaBridge.sessionLoad("/path/to/session.bin")
val continuation = LlamaBridge.generateContinue("What about multiplatform support?")
// Reset state without unloading the model
LlamaBridge.sessionReset()
```

WhisperBridge exposes a small, platform-friendly wrapper around whisper.cpp for on-device speech-to-text. The workflow is: resolve the model path, load the model once, transcribe WAV files as needed, and release when done.

```kotlin
object WhisperBridge {
    /** Returns a platform-specific absolute path for the model filename. */
    fun getModelPath(modelFileName: String): String

    /** Loads the model at [modelPath]. Returns true if loaded. */
    fun initModel(modelPath: String): Boolean

    /**
     * Transcribes a WAV file and returns text.
     * Tip: record WAV as 16 kHz, mono, 16-bit PCM for best compatibility.
     *
     * @param initialPrompt Optional text prepended to the decoder input (up to 224 tokens).
     *   Use it to bias transcription toward domain-specific vocabulary (e.g. medical terms).
     */
    fun transcribeWav(wavPath: String, language: String? = null, initialPrompt: String? = null): String

    /** Frees native resources. */
    fun release()
}
```

```kotlin
import com.llamatik.library.platform.WhisperBridge
val modelPath = WhisperBridge.getModelPath("ggml-tiny-q8_0.bin")
// 1) Init once (e.g. app start)
WhisperBridge.initModel(modelPath)
// 2) Record to a WAV file (16kHz mono PCM16) using your own recorder
val wavPath: String = "/path/to/recording.wav"
// 3) Transcribe
val text = WhisperBridge.transcribeWav(wavPath, language = null).trim()
println(text)
// 4) Optional: release on app shutdown
WhisperBridge.release()
```

Note: WhisperBridge expects a WAV file path. Llamatik's app uses AudioRecorder + AudioPaths.tempWavPath() to generate the WAV before calling transcribeWav(...).
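The optional initialPrompt biases decoding toward domain vocabulary; a minimal sketch (the prompt terms are illustrative):

```kotlin
val medicalText = WhisperBridge.transcribeWav(
    wavPath,
    language = "en",
    initialPrompt = "tachycardia, myocardial infarction, stent"
)
```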
Llamatik exposes Stable Diffusion through StableDiffusionBridge. The workflow mirrors the other bridges: resolve the model path, load once, generate, release.

```kotlin
object StableDiffusionBridge {
    /** Returns absolute model path (copied from assets/bundle if needed). */
    fun getModelPath(modelFileName: String): String

    /** Loads the Stable Diffusion model. */
    fun initModel(modelPath: String): Boolean

    /**
     * Generates an image from a prompt.
     *
     * @param prompt Text prompt
     * @param width Output width
     * @param height Output height
     * @param steps Inference steps
     * @param cfgScale Guidance scale
     * @return PNG image as ByteArray
     */
    fun generateImage(
        prompt: String,
        width: Int = 512,
        height: Int = 512,
        steps: Int = 20,
        cfgScale: Float = 7.5f
    ): ByteArray

    /** Releases native resources. */
    fun release()
}
```

```kotlin
import com.llamatik.library.platform.StableDiffusionBridge
val modelPath = StableDiffusionBridge.getModelPath("sd-model.bin")
StableDiffusionBridge.initModel(modelPath)
val imageBytes = StableDiffusionBridge.generateImage(
    prompt = "A cyberpunk llama in neon Tokyo",
    width = 512,
    height = 512
)
// Save imageBytes as a PNG file
```
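The returned ByteArray is already PNG-encoded, so saving it is a plain file write. A JVM/Android-only sketch (java.io.File is not available on every Llamatik target):

```kotlin
import java.io.File

// Write the PNG bytes produced above to disk
File("llama.png").writeBytes(imageBytes)
```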
MultimodalBridge wraps llama.cpp's multimodal (VLM) support for on-device image analysis using vision-language models such as SmolVLM. The workflow is: load the vision model together with its matching mmproj file, stream the analysis via GenStream, and release when done.

```kotlin
object MultimodalBridge {
    /**
     * Load the vision model and its multimodal projector (mmproj) side-by-side.
     * Both files must be available on disk before calling this.
     *
     * @param modelPath Absolute path to the GGUF vision model.
     * @param mmprojPath Absolute path to the GGUF mmproj file.
     * @return true on success.
     */
    fun initModel(modelPath: String, mmprojPath: String): Boolean

    /**
     * Analyze an image given as raw bytes (JPEG/PNG/BMP), streaming the response
     * token by token via [callback].
     *
     * Must be called from a background thread/coroutine; blocks until generation completes.
     */
    fun analyzeImageBytesStream(imageBytes: ByteArray, prompt: String, callback: GenStream)

    /** Cancel an in-progress analyzeImageBytesStream call. */
    fun cancelAnalysis()

    /** Free all native resources (model, mmproj context, llama context). */
    fun release()
}
```

```kotlin
import com.llamatik.library.platform.MultimodalBridge
import java.io.File // needed for the readBytes() call below (JVM targets)
// 1) Init once — both model and mmproj must be downloaded first
val loaded = MultimodalBridge.initModel(
    modelPath = "/path/to/SmolVLM-256M-Instruct-Q8_0.gguf",
    mmprojPath = "/path/to/mmproj-SmolVLM-256M-Instruct-f16.gguf"
)
// 2) Analyze an image (e.g. loaded from disk or camera)
val imageBytes: ByteArray = File("/path/to/photo.jpg").readBytes()
MultimodalBridge.analyzeImageBytesStream(
    imageBytes = imageBytes,
    prompt = "Describe what you see in this image.",
    callback = object : GenStream {
        override fun onDelta(text: String) { print(text) }
        override fun onComplete() { println("\n[done]") }
        override fun onError(message: String) { println("Error: $message") }
    }
)
// 3) Optional: cancel mid-stream
MultimodalBridge.cancelAnalysis()
// 4) Optional: release on app shutdown
MultimodalBridge.release()
```

Note: MultimodalBridge requires both a vision model GGUF and a matching mmproj GGUF. Llamatik's app downloads both automatically when you select a VLM model.
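Because analyzeImageBytesStream blocks until generation completes, dispatch it off the main thread. A sketch assuming kotlinx.coroutines is on the classpath (Llamatik does not bundle it):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Collects the streamed deltas into one string on a background dispatcher
suspend fun describeImage(imageBytes: ByteArray, prompt: String): String =
    withContext(Dispatchers.Default) {
        val sb = StringBuilder()
        MultimodalBridge.analyzeImageBytesStream(imageBytes, prompt, object : GenStream {
            override fun onDelta(text: String) { sb.append(text) }
            override fun onComplete() { }
            override fun onError(message: String) { sb.append("[error: $message]") }
        })
        sb.toString() // safe: the call above blocks until generation finishes
    }
```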
The Llamatik backend server is now maintained in a dedicated repository.
👉 [Llamatik Server Repository](https://github.com/ferranpons/Llamatik-Server)
Visit the repository for full setup instructions, configuration options, and usage details.
Llamatik is already used in production apps on Google Play and the App Store.
Want to showcase your app here? Open a PR and add it to the list 🚀
Llamatik is 100% open-source and actively developed.
All contributions are welcome!
This project is licensed under the MIT License.
See LICENSE for details.
Built with ❤️ for the Kotlin community.