
SKaiNET is an open-source deep learning framework that simplifies building modern AI applications. The project follows GitFlow for branching and Semantic Versioning for release management.
SKaiNET aims to democratize "Edge AI / On-device AI" by bridging the gap between high-level application development and low-level hardware optimization. We believe AI should be portable, type-safe, and developer-friendly, enabling seamless intelligence in everything from mobile apps to IoT devices without sacrificing performance.
For architecture details see ARCHITECTURE.md.
Add the core dependencies (Gradle Kotlin DSL):
```kotlin
dependencies {
    implementation("sk.ainet.core:SKaiNET-lang-core:0.20.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.20.0")
}
```

Define a model with the `nn` DSL:

```kotlin
val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}
```

Work with tensors directly:

```kotlin
val a = tensor(shape(2, 2)) { float(1f, 2f, 3f, 4f) }
val b = tensor(shape(2, 2)) { float(5f, 6f, 7f, 8f) }
val c = a matMul b
val d = c.relu()
```

Load GGUF model files:

```kotlin
// Recommended: streaming reader — memory-efficient, supports quantized types
val source = JvmRandomAccessSource.open("model.gguf")
StreamingGGUFReader.open(source).use { reader ->
    println("Tensors: ${reader.tensorCount}")
    // Load a specific tensor on demand (no whole-file loading)
    val bytes = reader.loadTensor("token_embd.weight")
    // Or get a TensorStorage descriptor with encoding/placement metadata
    val storage = reader.loadTensorStorage("token_embd.weight")
}
```

More examples: SKaiNET-examples | SKaiNET-notebook
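Under the hood, a GGUF reader starts by consuming a small fixed header before any tensor data. The sketch below is an illustration of that header layout (per the public GGUF spec: 4-byte magic `"GGUF"`, little-endian `u32` version, `u64` tensor count, `u64` metadata KV count) in plain Kotlin — it is not the SKaiNET implementation, and the names are hypothetical:

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Illustrative GGUF header fields (not SKaiNET's internal types)
data class GgufHeader(val version: Int, val tensorCount: Long, val metadataKvCount: Long)

fun parseGgufHeader(bytes: ByteArray): GgufHeader {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val magic = ByteArray(4).also { buf.get(it) }
    require(magic.contentEquals("GGUF".toByteArray())) { "Not a GGUF file" }
    // Fields appear in spec order: version, tensor count, metadata KV count
    return GgufHeader(buf.int, buf.long, buf.long)
}

fun main() {
    // Synthetic 24-byte header: magic + version 3 + 291 tensors + 24 metadata entries
    val buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN)
    buf.put("GGUF".toByteArray()).putInt(3).putLong(291).putLong(24)
    val header = parseGgufHeader(buf.array())
    println("version=${header.version} tensors=${header.tensorCount}")
}
```

The streaming reader above builds on the same idea: it reads the header and tensor index once, then seeks to individual tensors on demand instead of materializing the whole file.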
SKaiNET is a modular ecosystem. While this repository contains the core engine, specialized high-level libraries are maintained in standalone repositories:
| Project | Description |
|---|---|
| SKaiNET-LLM | Llama, Gemma, and BERT inference runtimes |
| SKaiNET-transformers | Pre-built transformer architectures and layers |
| SKaiNET-examples | Sample projects and integration demos |
| Goal | Start here |
|---|---|
| Examples and sample projects | SKaiNET-examples |
| Interactive notebooks | SKaiNET-notebook |
| LLM inference (Llama, Gemma) | SKaiNET-LLM |
- Quantization presets: `safe-lowbit`, `balanced`, `experimental-max`. See TurboQuantUsage for the integration guide.
- `JavaAgentLoop` (in skainet-lang-java)
- `nn { input(); dense(); relu(); dense() }` sequential model DSL
- `dag { }` for ResNet and YOLO-style architectures
- SKaiNET entry point, `TensorJavaOps`, and builder-pattern model definitions
- BOM (`sk.ainet:skainet-bom`) for one-line version management
- `HloGenerator`
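As a back-of-the-envelope sketch of why these K-quant formats matter, the snippet below estimates storage cost assuming ggml's Q6_K block layout (256 weights packed into a 210-byte block: `ql` 128 + `qh` 64 + `scales` 16 + `d` 2 bytes); the parameter count is illustrative, not a specific model's:

```kotlin
// ggml Q6_K: 256 weights per block, 128 + 64 + 16 + 2 = 210 bytes per block
val QK_K = 256
val Q6_K_BLOCK_BYTES = 128 + 64 + 16 + 2

fun q6kBytes(weightCount: Long): Long {
    require(weightCount % QK_K == 0L) { "weight count must be a multiple of $QK_K" }
    return weightCount / QK_K * Q6_K_BLOCK_BYTES
}

fun main() {
    val weights = 3_000_000_000L // ~3B parameters, illustrative
    val fp32 = weights * 4L      // 4 bytes per FP32 weight
    println("FP32: ${fp32 / 1e9} GB")            // 12.0 GB
    println("Q6_K: ${q6kBytes(weights) / 1e9} GB") // ≈ 2.46 GB
}
```

Keeping weights in packed K-quant blocks rather than dequantizing to FP32 at load is roughly a 5× memory saving, which is the difference between fitting on-device and not.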
- `Q6_KTensorData` stores 210-byte ggml blocks verbatim, and a Vector-API SIMD kernel (`matmulQ6_KVec`) dispatches from `DefaultCpuOpsJvm.chooseQuantizedMatmul`. Together with the existing Q4_K infrastructure, this unblocks running Gemma 4 E2B Q4_K_M (and any mostly-Q4_K + Q6_K checkpoint) through the DSL path without a ~12 GB FP32 dequantization blow-up at load.
- `ops.transpose` on `Q4_KTensorData` / `Q6_KTensorData` now returns a new tensor wrapping the same packed byte array with swapped shape, matching the existing Q4/Q8 MemorySegment path. `linearProject(x, W)` can run `matmul(x, transpose(W))` on Q4_K/Q6_K weights without round-tripping through FP32 (Δ logits = 4.29e-6 vs the FP32 baseline on Gemma).
- `scaledDotProductAttention` is now recorded by `RecordingExecution` and lowered to StableHLO as `dot_general(Q, K.T)` → scale → optional mask → softmax → `dot_general(weights, V)`, so attention blocks compile end-to-end through the SKaiNET → StableHLO → IREE path. (#543)
- Mismatched `head_dim` between Q/K or Q/V (seen in real Gemma 4 E2B with mixed-head-dim layers sharing a KV cache) used to surface as an `ArrayIndexOutOfBoundsException` deep in the dot-product loop; `scaledDotProductAttention` now fails fast with `require()` messages naming the offending dimensions.

See CHANGELOG.md for the full release history.
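The fail-fast shape checks on `scaledDotProductAttention` mentioned above can be sketched in plain Kotlin. This is an illustrative, single-head, array-based version — not the SKaiNET API — showing where `require()` catches a `head_dim` mismatch before any indexing happens:

```kotlin
import kotlin.math.exp
import kotlin.math.sqrt

// Illustrative single-head scaled dot-product attention over plain arrays.
// q: [seqQ][headDim], k: [seqK][headDim], v: [seqK][headDim]
fun scaledDotProductAttention(
    q: Array<FloatArray>,
    k: Array<FloatArray>,
    v: Array<FloatArray>
): Array<FloatArray> {
    val headDim = q[0].size
    // Fail fast with named dimensions instead of an ArrayIndexOutOfBoundsException
    require(k[0].size == headDim) { "head_dim mismatch: Q=$headDim, K=${k[0].size}" }
    require(v.size == k.size) { "sequence mismatch: K=${k.size}, V=${v.size}" }
    val scale = 1f / sqrt(headDim.toFloat())
    return Array(q.size) { i ->
        // scores = Q·Kᵀ / sqrt(head_dim)
        val scores = FloatArray(k.size) { j ->
            var s = 0f
            for (d in 0 until headDim) s += q[i][d] * k[j][d]
            s * scale
        }
        // numerically stable softmax over keys
        val m = scores.maxOrNull()!!
        val e = FloatArray(scores.size) { exp((scores[it] - m).toDouble()).toFloat() }
        val z = e.sum()
        // weighted sum of V rows
        FloatArray(v[0].size) { d ->
            var out = 0f
            for (j in k.indices) out += (e[j] / z) * v[j][d]
            out
        }
    }
}
```

With a single key, the softmax weight is 1 and the output equals the corresponding V row, which makes the function easy to sanity-check in a unit test.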
We love contributions, whether it's a new operator, documentation, or a bug fix.
Browse the full codebase documentation on DeepWiki.
MIT — see LICENCE.