
SKaiNET is an open-source deep learning framework that simplifies building modern AI applications. The project follows GitFlow for branching and Semantic Versioning for release management.
For architecture details see ARCHITECTURE.md.
Add the core dependencies (Gradle Kotlin DSL):

    dependencies {
        // Recommended: import the umbrella BOM and drop versions on the engine modules.
        implementation(platform("sk.ainet:skainet-bom:0.23.0"))
        implementation("sk.ainet.core:skainet-lang-core")
        implementation("sk.ainet.core:skainet-backend-cpu")
    }

The BOM was first correctly published to Maven Central in 0.22.2; earlier versions shipped at the wrong coordinates and could not be imported. Pin versions directly if you need an older release.
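If you pin versions directly instead of importing the BOM, the dependency block might look like the sketch below. The version string is a placeholder, and the assumption that both engine modules share a single version number is not confirmed by this README; check Maven Central for the coordinates of the release you need.

```kotlin
// build.gradle.kts
// Assumption: both engine modules are published under the same version number.
val skainetVersion = "x.y.z" // placeholder: substitute the release you need

dependencies {
    // No BOM import here, so each module carries an explicit version.
    implementation("sk.ainet.core:skainet-lang-core:$skainetVersion")
    implementation("sk.ainet.core:skainet-backend-cpu:$skainetVersion")
}
```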
Define a small feed-forward network with the nn DSL:

    val model = nn {
        input(28 * 28)
        dense(out = 128)
        relu()
        dense(out = 10)
    }

Create tensors and chain operations:

    val a = tensor(shape(2, 2)) { float(1f, 2f, 3f, 4f) }
    val b = tensor(shape(2, 2)) { float(5f, 6f, 7f, 8f) }
    val c = a matMul b
    val d = c.relu()

Read tensors from a GGUF file:

    // Recommended: streaming reader (memory-efficient, supports quantized types)
    val source = JvmRandomAccessSource.open("model.gguf")
    StreamingGGUFReader.open(source).use { reader ->
        println("Tensors: ${reader.tensorCount}")
        // Load specific tensor on demand (no whole-file loading)
        val bytes = reader.loadTensor("token_embd.weight")
        // Or get a TensorStorage descriptor with encoding/placement metadata
        val storage = reader.loadTensorStorage("token_embd.weight")
    }

More examples: SKaiNET-examples | SKaiNET-notebook
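To give a feel for what "on demand, no whole-file loading" means at the I/O level, here is a minimal sketch using only the JDK. It reads a byte range at an absolute offset with a positional `FileChannel.read`, so the rest of the file is never pulled into memory. The `readRange` helper and the temporary demo file are hypothetical illustrations; this is not how `StreamingGGUFReader` is necessarily implemented.

```kotlin
import java.io.EOFException
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardOpenOption

/**
 * Reads exactly [length] bytes starting at absolute [offset].
 * Hypothetical helper for illustration; not part of SKaiNET.
 */
fun readRange(path: Path, offset: Long, length: Int): ByteArray =
    FileChannel.open(path, StandardOpenOption.READ).use { channel ->
        val buffer = ByteBuffer.allocate(length)
        var position = offset
        // A positional read may return fewer bytes than requested, so loop.
        while (buffer.hasRemaining()) {
            val read = channel.read(buffer, position)
            if (read < 0) throw EOFException("File ended before $length bytes were read")
            position += read
        }
        buffer.array()
    }

fun main() {
    // Stand-in for a model file: 1 KiB holding the byte pattern 0..255 repeated.
    val file = Files.createTempFile("positional-read-demo", ".bin")
    try {
        Files.write(file, ByteArray(1024) { (it % 256).toByte() })
        val slice = readRange(file, offset = 300L, length = 4)
        println(slice.joinToString()) // 44, 45, 46, 47
    } finally {
        Files.deleteIfExists(file)
    }
}
```

Because the offset is a `Long`, this pattern is not bound by the `Int`-indexed size of a single `ByteArray`; only the requested slice has to fit in one array.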
SKaiNET is a modular ecosystem. While this repository contains the core engine, specialized high-level libraries are maintained in standalone repositories:
| Project | Description |
|---|---|
| SKaiNET-transformers | Pre-built transformer architectures and layers |
| SKaiNET-examples | Sample projects and integration demos |

| Goal | Start here |
|---|---|
| Examples and sample projects | SKaiNET-examples |
| Interactive notebooks | SKaiNET-notebook |
SKaiNET ships an official Phoronix-Test-Suite-compatible benchmark
program for the compute engine. See the
methodology and replay docs,
the release manifest, and the
CI workflow. Smoke runs fire
on every PR via ubuntu-latest; full publishable runs fire on a
self-hosted Linux x86 runner on release.
Quick local replay:

    ./gradlew :skainet-backends:benchmarks:jvm-cpu-publish:shadowJar
    ./scripts/run_engine_smoke.sh

- safe-lowbit, balanced, experimental-max. See TurboQuantUsage for the integration guide.
- nn { input(); dense(); relu(); dense() }
- dag { } for ResNet, YOLO-style architectures
- HloGenerator
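Builder syntax such as `nn { input(); dense(); relu(); dense() }` is typically built on Kotlin lambdas with receivers. The sketch below shows that general pattern with entirely hypothetical names (`net`, `NetBuilder`, `LayerSpec`); it is not SKaiNET's implementation and only collects layer descriptions rather than building a real network.

```kotlin
// Hypothetical layer descriptions; SKaiNET's real types are not shown in this README.
sealed interface LayerSpec
data class Input(val size: Int) : LayerSpec
data class Dense(val out: Int) : LayerSpec
object Relu : LayerSpec

// Keeps nested builder scopes from leaking into each other.
@DslMarker
annotation class NetDsl

@NetDsl
class NetBuilder {
    private val layers = mutableListOf<LayerSpec>()

    fun input(size: Int) { layers += Input(size) }
    fun dense(out: Int) { layers += Dense(out) }
    fun relu() { layers += Relu }

    fun build(): List<LayerSpec> = layers.toList()
}

/** Runs [block] with a NetBuilder as receiver and returns the collected layers in order. */
fun net(block: NetBuilder.() -> Unit): List<LayerSpec> = NetBuilder().apply(block).build()

fun main() {
    val model = net {
        input(28 * 28)
        dense(out = 128)
        relu()
        dense(out = 10)
    }
    println(model.size) // 4
}
```

Inside the braces, `this` is the builder, which is why the layer functions can be called without a qualifier.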
- The network builders previously allocated a zero-filled FloatArray(shape.volume) for every Linear / Conv weight at module-creation time, even though downstream loaders overwrite those zeros immediately. For an Apertus-8B Q4_K_S GGUF (4.7 GB on disk) that was ~27 GB of FP32 zeros allocated and thrown away, which OOMed at a 12 GB heap. The new TensorDataFactory.placeholder(...) API addresses this: every eager zeros(...) call site in the network builders routes through it, and lazy materialization fires only if a caller actually reads the tensor (which the load path never does). Verified end-to-end against unsloth/Apertus-8B-Instruct-2509-GGUF, which now loads in a 12 GB heap. The same fix benefits the Gemma / Llama / Qwen / Voxtral DSL paths transparently. (Issue #587, PR #588)
- createRandomAccessSource(filePath) had no native actual; K/N consumers fell through to the legacy slurp-into-ByteArray reader, which capped at Int.MAX_VALUE bytes (~2 GiB). Practical impact: macOS / Linux / iOS native couldn't open Q8 models above ~1B parameters or Q4 above ~3B. The new POSIX-pread-backed PosixPreadRandomAccessSource covers macosArm64, linuxX64, linuxArm64, iosArm64, iosSimulatorArm64. (Issue #589, PR #591)
- sk.ainet:skainet-bom now resolves from Maven Central (earlier versions shipped at the wrong coordinates). (Issue #584)
- StreamingShardedSafeTensorsReader.loadTensorStorageMapped for zero-copy reads of multi-shard tensors above the 2 GB JVM ByteArray limit. (PR #582)
- KernelRegistry.bestAvailable(). (PR #571)

See CHANGELOG.md for the full release history.
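The lazy-placeholder idea from the release notes above can be illustrated with Kotlin's standard `lazy` delegate. This is a minimal sketch with a hypothetical `LazyZeros` class; it is not the `TensorDataFactory.placeholder(...)` implementation, only a demonstration that deferring allocation costs nothing until the data is first read.

```kotlin
/**
 * Hypothetical weight holder: the backing array is allocated only on first read.
 * Illustrates a lazy placeholder; not SKaiNET's TensorDataFactory.
 */
class LazyZeros(val volume: Int) {
    var materialized = false
        private set

    // `lazy` defers the allocation until `data` is first accessed.
    val data: FloatArray by lazy {
        materialized = true
        FloatArray(volume) // zero-filled by default
    }
}

fun main() {
    val weight = LazyZeros(volume = 1_000_000)
    println(weight.materialized) // false: nothing allocated yet

    val first = weight.data[0]   // the first read triggers the allocation
    println(weight.materialized) // true
    println(first)               // 0.0
}
```

A loader that replaces the weight without ever touching `data` never pays for the zero-filled array, which is the behaviour the release note describes for the load path.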
We love contributions, whether it's a new operator, documentation, or a bug fix.
Browse the full codebase documentation on DeepWiki.
MIT — see LICENCE.