
An open-source deep learning framework that simplifies the creation of modern AI applications. The project follows GitFlow for branching and Semantic Versioning for release management.
For architecture details see ARCHITECTURE.md.
SKaiNET is a Kotlin Multiplatform AI framework. New here? Choose the path that matches what you want to try first.
| Goal | Start here | Time |
|---|---|---|
| Run tensor operations | Quickstart (below) | 2–5 min |
| Build and train a neural net | Hello Neural Net (below) | 5 min |
| Run a local GGUF model | SKaiNET Transformers starter | 5 min after model setup |
| Export a secure MCU bundle | Minerva getting started | 10 min without firmware flashing |
Working in Java? SKaiNET ships first-class Java support — see the Java getting-started guide.
Use the version shown in this README as the source of truth for first-run snippets. If another page shows a different version, please open an issue or PR.
Add the core dependencies (Gradle Kotlin DSL):
```kotlin
dependencies {
    // Recommended: import the umbrella BOM and drop versions on the engine modules.
    implementation(platform("sk.ainet:skainet-bom:0.30.0"))
    implementation("sk.ainet.core:skainet-lang-core")
    implementation("sk.ainet.core:skainet-backend-cpu")
}
```

The BOM was first correctly published to Maven Central in 0.22.2; earlier versions shipped at the wrong coordinates and could not be imported. Pin versions directly if you need an older release.
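Pinning versions directly, as mentioned above, could look roughly like the fragment below. This is a sketch only: it assumes the older release publishes the same two module coordinates, and `skainetVersion` is a hypothetical placeholder rather than a recommended release.

```kotlin
// build.gradle.kts
// Placeholder (assumption): replace with the release you actually need.
val skainetVersion = "<older-version>"

dependencies {
    // No BOM import here: each engine module carries an explicit version.
    implementation("sk.ainet.core:skainet-lang-core:$skainetVersion")
    implementation("sk.ainet.core:skainet-backend-cpu:$skainetVersion")
}
```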
Hello Neural Net:

```kotlin
val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}
```

Tensor operations:

```kotlin
val a = tensor(shape(2, 2)) { float(1f, 2f, 3f, 4f) }
val b = tensor(shape(2, 2)) { float(5f, 6f, 7f, 8f) }
val c = a matMul b
val d = c.relu()
```

```kotlin
// Recommended: streaming reader (memory-efficient, supports quantized types)
val source = JvmRandomAccessSource.open("model.gguf")
StreamingGGUFReader.open(source).use { reader ->
    println("Tensors: ${reader.tensorCount}")
    // Load specific tensor on demand (no whole-file loading)
    val bytes = reader.loadTensor("token_embd.weight")
    // Or get a TensorStorage descriptor with encoding/placement metadata
    val storage = reader.loadTensorStorage("token_embd.weight")
}
```

More examples: SKaiNET-examples | SKaiNET-notebook
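To make the tensor and model snippets above concrete, here is a plain-Kotlin sketch of the arithmetic they describe. None of these helpers are SKaiNET APIs. The sketch assumes the 2×2 tensors are filled row by row and that each dense layer carries a bias term; both are assumptions, not statements about the framework.

```kotlin
// Plain-Kotlin illustration; matMul, relu and denseParams are hypothetical helpers.

/** Multiplies an (n x k) matrix by a (k x m) matrix, both stored as lists of rows. */
fun matMul(a: List<List<Float>>, b: List<List<Float>>): List<List<Float>> {
    require(a.isNotEmpty() && b.isNotEmpty()) { "matrices must be non-empty" }
    require(a.all { it.size == b.size }) { "inner dimensions must match" }
    val cols = b[0].size
    return a.map { row ->
        List(cols) { j ->
            var sum = 0f
            for (i in row.indices) sum += row[i] * b[i][j]
            sum
        }
    }
}

/** Element-wise ReLU: negative entries become zero. */
fun relu(m: List<List<Float>>): List<List<Float>> =
    m.map { row -> row.map { maxOf(it, 0f) } }

/** Parameters of a dense layer, assuming one weight per input/output pair plus one bias per output. */
fun denseParams(inputs: Int, outputs: Int): Int = inputs * outputs + outputs

fun main() {
    val a = listOf(listOf(1f, 2f), listOf(3f, 4f))
    val b = listOf(listOf(5f, 6f), listOf(7f, 8f))
    val c = matMul(a, b)
    println(c)        // [[19.0, 22.0], [43.0, 50.0]]
    println(relu(c))  // unchanged, since every entry is already positive

    // 784 -> 128 -> 10 network from the model snippet
    println(denseParams(28 * 28, 128) + denseParams(128, 10)) // 101770
}
```

Under those assumptions, `c` and `d` in the tensor snippet hold the same values, because ReLU only changes negative entries.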
SKaiNET is a modular ecosystem. While this repository contains the core engine, specialized high-level libraries are maintained in standalone repositories:
| Project | Description |
|---|---|
| SKaiNET-transformers | Pre-built transformer architectures and layers |
| SKaiNET-examples | Sample projects and integration demos |
| Goal | Start here |
|---|---|
| Examples and sample projects | SKaiNET-examples |
| Interactive notebooks | SKaiNET-notebook |
| Eager backends & kernels (what runs where) | Backends & kernels mindmap |
SKaiNET ships an official Phoronix-Test-Suite-compatible benchmark
program for the compute engine. See the
methodology and replay docs,
the release manifest, and the
CI workflow. Smoke runs fire
on every PR via ubuntu-latest; full publishable runs fire on a
self-hosted Linux x86 runner on release.
Quick local replay:
```bash
./gradlew :skainet-backends:benchmarks:jvm-cpu-publish:shadowJar
./scripts/run_engine_smoke.sh
```

SKaiNET is built around one path: a model is defined once in the Kotlin DSL, then either compiled to native code or executed eagerly, without rewriting it.

- Define the model in the Kotlin DSL (nn { } / dag { }).
- Lower it to MLIR / StableHLO (HloGenerator) and compile to native code (IREE-compatible) for native / edge targets.

```mermaid
flowchart LR
    DSL["Model — Kotlin DSL"] --> Graph["Tape / DAG"]
    Graph --> HLO["MLIR / StableHLO"]
    Graph --> Eager["Eager backend (JVM, …)"]
    HLO --> Native["Native code"]
```

The same DSL model feeds both paths: eager execution for development and JVM deployment, the StableHLO path for native and edge targets.
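The "define once, run on either path" idea can be illustrated with a toy graph that has two consumers: an eager evaluator and a lowering pass that emits text. This is a sketch only. The node types and the SSA-style output format are invented for illustration and are not SKaiNET's ComputeGraph, HloGenerator, or real StableHLO syntax.

```kotlin
// Toy graph (hypothetical types): one definition, two consumers.
sealed interface Node
data class Input(val name: String) : Node
data class Add(val lhs: Node, val rhs: Node) : Node
data class Mul(val lhs: Node, val rhs: Node) : Node
data class Relu(val x: Node) : Node

/** Eager path: walk the graph and compute a value immediately. */
fun evalEager(node: Node, inputs: Map<String, Float>): Float = when (node) {
    is Input -> inputs.getValue(node.name)
    is Add -> evalEager(node.lhs, inputs) + evalEager(node.rhs, inputs)
    is Mul -> evalEager(node.lhs, inputs) * evalEager(node.rhs, inputs)
    is Relu -> maxOf(evalEager(node.x, inputs), 0f)
}

/** Lowering path: emit one made-up SSA-style line per operation instead of computing. */
fun lower(root: Node): List<String> {
    val lines = mutableListOf<String>()
    var next = 0

    // Records one operation and returns the name of its result.
    fun emit(op: String, vararg args: String): String {
        val id = "%${next++}"
        lines += "$id = $op ${args.joinToString(", ")}"
        return id
    }

    fun visit(node: Node): String = when (node) {
        is Input -> "%${node.name}"
        is Add -> emit("add", visit(node.lhs), visit(node.rhs))
        is Mul -> emit("mul", visit(node.lhs), visit(node.rhs))
        is Relu -> emit("relu", visit(node.x))
    }

    visit(root)
    return lines
}

fun main() {
    // relu(x * w + b), defined once
    val model = Relu(Add(Mul(Input("x"), Input("w")), Input("b")))

    println(evalEager(model, mapOf("x" to 2f, "w" to 3f, "b" to -1f))) // 5.0

    lower(model).forEach(::println)
    // %0 = mul %x, %w
    // %1 = add %0, %b
    // %2 = relu %1
}
```

The design point is that `model` is never rewritten: the evaluator and the lowering pass each interpret the same node structure in their own way.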
SKaiNET now includes a Minerva export backend for secure MCU deployment. It is a sibling to StableHLO and Arduino/C99 export: it starts from a supported ComputeGraph, lowers static MLPs to a Minerva compiler input, invokes libminerva when configured, and packages generated weights, host fixtures, firmware skeletons, and a fingerprinted manifest.json.
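As a rough illustration of what a fingerprinted manifest provides, the sketch below hashes each artifact with SHA-256 and later detects a changed file by re-hashing. The artifact names, the map-based manifest layout, and the helper functions are all assumptions made for this example; the actual manifest.json schema is not described here.

```kotlin
import java.security.MessageDigest

/** Hex-encoded SHA-256 digest of the given bytes. */
fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }

/** Maps each artifact name to its fingerprint; sorted so the output is deterministic. */
fun fingerprint(artifacts: Map<String, ByteArray>): Map<String, String> =
    artifacts.toSortedMap().mapValues { (_, bytes) -> sha256Hex(bytes) }

/** Returns the names whose current bytes no longer match the recorded fingerprint. */
fun changedArtifacts(recorded: Map<String, String>, current: Map<String, ByteArray>): List<String> =
    recorded.filter { (name, digest) -> current[name]?.let(::sha256Hex) != digest }.keys.toList()

fun main() {
    // Hypothetical bundle contents.
    val artifacts = mapOf(
        "weights.bin" to byteArrayOf(1, 2, 3),
        "firmware/main.c" to "int main(void) { return 0; }".toByteArray()
    )
    val manifest = fingerprint(artifacts)
    manifest.forEach { (name, digest) -> println("$name  $digest") }

    // Tampering with one artifact is detected by re-hashing.
    val tampered = artifacts + ("weights.bin" to byteArrayOf(1, 2, 4))
    println(changedArtifacts(manifest, tampered)) // [weights.bin]
}
```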
Start here:
Runnable examples:
```bash
./gradlew :skainet-compile:skainet-compile-minerva:runMinervaSecureMcuExamples
./gradlew :skainet-compile:skainet-compile-minerva:runMinervaSecureMcuExamples \
    -Pminerva.example=sensor-classifier
```

- … safe-lowbit, balanced, experimental-max. See TurboQuantUsage for integration guide.
- … nn { input(); dense(); relu(); dense() }
- … dag { } for ResNet, YOLO-style architectures
- … HloGenerator

- … Q5_KBlockTensorData packed type and a Q5KMatmulKernel SPI with scalar (commonMain / Kotlin-Native), JVM Panama Vector, and native-C implementations, wired into DefaultCpuOps matmul dispatch + lazy transpose and the GGUF streaming loader. Q5_K weights now stay packed (no FP32 inflation) and dequantize inside the matmul, like Q4_K/Q6_K.
- … __ARM_NEON so x86 keeps its scalar / auto-vectorized path. The native CMake build gains an aarch64 branch (-march=armv8.2-a+fp16+dotprod, dotprod for Cortex-A55) plus an opt-in cross-compile.
- skainet-backend-native-cpu now also builds a static archive and exposes the kernels to Kotlin/Native (linuxX64 + linuxArm64) through a KernelProvider, so on-device (non-JVM) binaries get the same hand-tuned kernels the JVM reaches via FFM. (PR #734)
- sk.ainet.core:skainet-compile-minerva now publishes to Maven Central (packaging fix for the Minerva export module shipped in 0.29.0).
- … .npz compiler input → a libminerva-packaged secure MCU project bundle, with host-side runtime verification and fingerprinted manifest artifacts (runnable sample, examples, ONNX workflow, getting-started docs). Plus packed-quant matmul kernels with Kotlin/Native parity (Q5_0/Q5_1/Q4_K/Q6_K: commonMain scalar + SPI, packed-quant dispatch in DefaultCpuOpsBase, Panama Vector for Q5_1/Q5_0 and Q6_K via the KernelRegistry), and an auto-generated, CI-gated kernel × platform support matrix. (PRs #697–#726)
- … vmfb): inferDagOutputSpecs now infers correct output shapes for shape-changing ops, and reduce_window (pooling) emits IREE's generic region form. (PRs #674, #676)
- … HloGenerator tracing #668) plus non-JVM image runtime support (#671). (PRs #664, #670, #671)
- … vmfb (zero op gaps, verified by GemmaTraceTest): new scaledDotProductAttention (with causal + explicit additive mask), permute, narrow, and multi-output split converters, plus boxing-free FloatArray weight externalization for .irpa baking. (PRs #661 et al.)
- … tanh as a first-class activation primitive, and a CPU tensor convert op, plus test/build/CI hygiene. (PRs #648–#651, #631, #636)
- … pow/log and the conv/pool/upsample/split family, the hybrid adaptive dtype-constraint DSL, the @DarcValidated operator-doc flag, and the SentencePiece special-token splitter. (PRs #595, #605–#628)
- … TensorDataFactory.placeholder(...)); Kotlin/Native can finally load GGUFs over 2 GiB via the new POSIX-pread-backed PosixPreadRandomAccessSource. (Issues #587, #589; PRs #588, #591)
- sk.ainet:skainet-bom now resolves from Maven Central (earlier versions shipped at the wrong coordinates). (Issue #584)
- … StreamingShardedSafeTensorsReader.loadTensorStorageMapped for zero-copy reads of multi-shard tensors above the 2 GB JVM ByteArray limit. (PR #582)
- … KernelRegistry.bestAvailable(). (PR #571)

See CHANGELOG.md for the full release history.
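Several release notes above mention weights that stay packed and are dequantized inside the matmul. The sketch below shows that general idea with a deliberately simplified format: blocks of 32 signed 8-bit values sharing one float scale. This layout is an assumption chosen for readability. It is not the Q5_K, Q4_K, or Q6_K layout, and `QBlock`, `quantize`, and `dotPacked` are hypothetical names rather than SKaiNET APIs.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

const val BLOCK = 32

/** One packed block: 32 signed 8-bit quants sharing a single scale (simplified, hypothetical format). */
class QBlock(val scale: Float, val quants: ByteArray)

/** Quantizes a row whose length is a multiple of BLOCK. */
fun quantize(row: FloatArray): List<QBlock> {
    require(row.size % BLOCK == 0) { "row length must be a multiple of $BLOCK" }
    return (row.indices step BLOCK).map { start ->
        val slice = row.copyOfRange(start, start + BLOCK)
        val maxAbs = slice.maxOf { abs(it) }
        // An all-zero block gets scale 1 to avoid dividing by zero.
        val scale = if (maxAbs == 0f) 1f else maxAbs / 127f
        QBlock(scale, ByteArray(BLOCK) { i -> (slice[i] / scale).roundToInt().toByte() })
    }
}

/** Dot product of a packed row with a dense vector, dequantizing block by block. */
fun dotPacked(packed: List<QBlock>, x: FloatArray): Float {
    require(x.size == packed.size * BLOCK) { "vector length must match the packed row" }
    var acc = 0f
    packed.forEachIndexed { b, block ->
        var blockSum = 0f
        for (i in 0 until BLOCK) blockSum += block.quants[i] * x[b * BLOCK + i]
        acc += block.scale * blockSum // the scale is applied once per block
    }
    return acc
}

fun main() {
    val weights = FloatArray(64) { (it - 32) / 32f } // values in [-1, 1)
    val x = FloatArray(64) { 0.5f }
    val exact = weights.indices.sumOf { (weights[it] * x[it]).toDouble() }
    val approx = dotPacked(quantize(weights), x)
    println("exact=$exact packed=$approx") // close, but not bit-identical
}
```

The full FP32 row is never materialized in `dotPacked`: each block is expanded only inside the inner loop, which is the memory saving the notes refer to as avoiding FP32 inflation.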
We love contributions, whether it's a new operator, documentation, or a bug fix.
Browse the full codebase documentation on DeepWiki.
MIT — see LICENCE.