
An open-source deep learning framework that simplifies the creation of modern AI applications, following GitFlow for branching and Semantic Versioning for release management.
SKaiNET is an open-source deep learning framework written in Kotlin, designed with developers in mind to enable the creation of modern AI-powered applications with ease.
SKaiNET aims to democratize "Edge AI / On-device AI" by bridging the gap between high-level application development and low-level hardware optimization. We believe AI should be portable, type-safe, and developer-friendly, enabling seamless intelligence in everything from mobile apps to IoT devices without sacrificing performance.
> [!IMPORTANT]
> **About the name**
>
> “SKaiNET” is a working project name chosen early in the project’s life as part of a personal learning and experimentation effort, before any trademark considerations were known.
> The name is not intended to reference, infringe, or imply association with any existing trademarks, companies, or products. It is not a commercial brand and is not claimed or assignable to any company or organization that contributors may be affiliated with.
> If a naming conflict arises, the project name may be changed in the future.
SKaiNET uses a hybrid backend strategy that separates development iteration from production deployment.
- **Datasets:** MNIST, Fashion-MNIST, CIFAR-10
- **Formats:** GGUF, ONNX, SafeTensors, JSON, Image (JPEG, PNG, etc.)
- **Models:** Llama (via KLlama), Gemma, BERT (via KBert)

```kotlin
// Data transformation pipeline
val transform = transforms<PlatformBitmapImage, Tensor<FP32, Float>> {
    resize(224, 224)
    centerCrop(200, 200)
    toTensor(ctx)
    normalize(ctx, mean = floatArrayOf(0.485f, 0.456f, 0.406f), std = floatArrayOf(0.229f, 0.224f, 0.225f))
}
val processedTensor = transform.apply(rawImage)

// Data loaders
val ds = MNIST.load(train = true)
val (x, y) = ds.nextBatch(64)
```
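The `normalize` step above performs channel-wise standardization with the usual ImageNet statistics. A minimal pure-Kotlin sketch of the underlying math (no SKaiNET dependency; the `normalize` function here is illustrative, not the framework API):

```kotlin
// Channel-wise normalization: out[c] = (in[c] - mean[c]) / std[c],
// using the same ImageNet statistics as the pipeline above.
fun normalize(pixel: FloatArray, mean: FloatArray, std: FloatArray): FloatArray =
    FloatArray(pixel.size) { c -> (pixel[c] - mean[c]) / std[c] }

fun main() {
    val mean = floatArrayOf(0.485f, 0.456f, 0.406f)
    val std = floatArrayOf(0.229f, 0.224f, 0.225f)
    val white = floatArrayOf(1f, 1f, 1f) // a pure-white pixel in [0, 1]
    println(normalize(white, mean, std).joinToString()) // roughly 2.249, 2.429, 2.640
}
```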
```kotlin
// Type-safe tensor creation via the tensor DSL
val ctx = DefaultNeuralNetworkExecutionContext()
val mask = data<FP32, Float>(ctx) {
    tensor {
        shape(3, 3) {
            from(
                1f, 0f, 0f,
                1f, 1f, 0f,
                1f, 1f, 1f,
            )
        }
    }
}
val t = tensor<FP32, Float>(ctx, FP32::class) {
    tensor {
        shape(2, 3) {
            from(
                0f, 1f, 2f,
                10f, 11f, 12f
            )
        }
    }
}
println("shape=${t.shape} first=${t.data[0, 0]}")
```

```kotlin
val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}
```

For complex architectures with arbitrary wiring (like YOLO or ResNet), use the Graph DSL:
```kotlin
val program = dag {
    val x = input<FP32>("input", spec)
    val c1 = conv2d(x, w1, b1, padding = 1 to 1)
    val c2 = conv2d(c1, w2, b2, padding = 1 to 1)
    val sum = add(x, c2)
    output(relu(sum))
}
```

Read the Graph DSL Documentation for more details.
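The `add(x, c2)` wiring above is a residual connection: the block's output is `relu(x + F(x))`, and the skip edge from `x` to the add node is exactly what a sequential chain cannot express. A pure-Kotlin sketch of the computation (illustrative only, not SKaiNET API):

```kotlin
// Residual wiring: out = relu(x + f(x)). The skip edge from x to the
// addition is why an arbitrary DAG, not a sequential chain, is needed.
fun residual(x: FloatArray, f: (FloatArray) -> FloatArray): FloatArray {
    val fx = f(x)
    return FloatArray(x.size) { i -> maxOf(0f, x[i] + fx[i]) }
}

fun main() {
    // f doubles its input, so out[i] = relu(3 * x[i])
    val out = residual(floatArrayOf(1f, -3f)) { v -> FloatArray(v.size) { v[it] * 2f } }
    println(out.joinToString()) // 3.0, 0.0
}
```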
```kotlin
// Works smoothly in Kotlin Notebooks
display(model.summary())
println(ds.describe())
```

- `ConstantFoldingPass`: folds arithmetic operations with constant operands.
- `OperationFusionPass`: fuses multiple ops (e.g., Add + ReLU) into efficient kernels.
- `DeadCodeEliminationPass`: removes unused computations.

```kotlin
// Applying compiler optimizations
val optimizer = StableHloOptimizer.createDefault()
val optimizedModule = optimizer.optimize(mlirModule)
```

```kotlin
// Export model to an Arduino library
val facade = CCodegenFacade()
facade.exportToArduinoLibrary(
    model = model,
    forwardPass = { ctx -> model.forward(input, ctx) },
    outputPath = "build/arduino",
    libraryName = "MyModel"
)
```

Read the Deep Technical Explanation for more details.
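The optimization passes listed above share a simple shape: rewrite a program tree until nothing changes. A toy constant-folding sketch over a tiny expression type (a conceptual illustration, not SKaiNET's MLIR/StableHLO implementation):

```kotlin
// Toy constant folding: Add(Const, Const) -> Const, recursively.
// ConstantFoldingPass applies the same idea to StableHLO modules.
sealed interface Expr
data class Const(val value: Float) : Expr
data class Input(val name: String) : Expr
data class Add(val left: Expr, val right: Expr) : Expr

fun fold(e: Expr): Expr = when (e) {
    is Add -> {
        val l = fold(e.left)
        val r = fold(e.right)
        if (l is Const && r is Const) Const(l.value + r.value) else Add(l, r)
    }
    else -> e
}

fun main() {
    val expr = Add(Input("x"), Add(Const(2f), Const(3f)))
    // The constant subtree folds to Const(5.0); the Input-dependent part survives.
    println(fold(expr))
}
```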
```kotlin
dependencies {
    implementation("sk.ainet.core:SKaiNET-lang-core:0.13.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.13.0")
}
```

Generate text with just a few lines of code using any Llama-based GGUF model:

```kotlin
val ctx = DirectCpuExecutionContext()
val ingestion = LlamaIngestion(ctx)

// Load model and tokenizer
val weights = ingestion.load { SystemFileSystem.source(Path("model.gguf")).buffered() }
val tokenizer = GGUFTokenizer.fromSource(SystemFileSystem.source(Path("model.gguf")).buffered())

// Generate!
val runtime = LlamaRuntime(ctx, weights)
runtime.generate(tokenizer.encode("Once upon a time"), steps = 64) { token ->
    print(tokenizer.decode(token))
}
```

Gradle (Kotlin DSL):
```kotlin
dependencyResolutionManagement {
    repositories {
        mavenCentral()
    }
}
```

```kotlin
dependencies {
    // Minimal dependency with simple CPU backend
    implementation("sk.ainet.core:SKaiNET-lang-core:0.13.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.13.0")

    // Simple model zoo
    implementation("sk.ainet.core:SKaiNET-lang-models:0.13.0")

    // Optional I/O (e.g., GGUF loader, SafeTensors, JSON)
    implementation("sk.ainet.core:SKaiNET-io-core:0.13.0")
    implementation("sk.ainet.io:skainet-io-safetensors:0.13.0")
    implementation("sk.ainet.core:SKaiNET-io-gguf:0.13.0")

    // Apps & runtimes
    implementation("sk.ainet.apps:skainet-llm:0.13.0")  // Llama runtime
    implementation("sk.ainet.apps:skainet-bert:0.13.0") // BERT runtime
}
```

Maven:
```xml
<dependency>
    <groupId>sk.ainet.core</groupId>
    <artifactId>SKaiNET-lang-core</artifactId>
    <version>0.13.0</version>
</dependency>
```

- `skainet-kllama-agent` module with support for function calling and tool use.
- `BenchmarkDsl` and `ExecutionObserver` for detailed performance analysis.
- `WeightMapper` and progress tracking.

```kotlin
// Example: KLlama tool calling
val agent = KLlamaAgent(llama, tools = listOf(WeatherTool()))
val response = agent.chat("What's the weather like in London?")
```

```kotlin
// Example: Generating embeddings with KBert
val runtime = BertRuntime(ctx, weights, FP32::class)
val emb = runtime.encode(inputIds, attentionMask, tokenTypeIds)
```

```kotlin
// Example: Loading SafeTensors weights
val loader = SafeTensorsParametersLoader(ctx)
loader.load("model.safetensors", model)
```

See CHANGELOG.md for the full list.
- `mmap` for zero-copy loading.
- `Q8_0`, `Q4_K`, and BitNet/Ternary (`TQ1_0`, `TQ2_0`) formats.
- `LeakyReLU`, `ELU`, `AvgPool2d`, `Conv1d`, and `Conv3d`.
- `Adam`, `AdamW` optimizers and `Accuracy` metrics.
- CIFAR-10, Fashion-MNIST, and a new Data Transform API.

```kotlin
// Example: Streaming inference with KLlama (GGUF)
val llama = KLlama.load("path/to/model.gguf")
llama.generate("Once upon a time") { token ->
    print(token) // streaming output
}
```

See CHANGELOG.md for the full list.
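The quantized formats above trade precision for size by storing a per-block scale plus small integers. A simplified sketch of a `Q8_0`-style round trip (the real GGUF block layout uses fixed 32-element blocks and differs in detail):

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Simplified Q8_0-style quantization: one scale = max|x| / 127 per block,
// values stored as signed bytes. Reconstruction error is at most scale / 2.
fun quantize(x: FloatArray): Pair<Float, ByteArray> {
    val scale = (x.maxOf { abs(it) } / 127f).takeIf { it > 0f } ?: 1f
    return scale to ByteArray(x.size) { (x[it] / scale).roundToInt().toByte() }
}

fun dequantize(scale: Float, q: ByteArray): FloatArray =
    FloatArray(q.size) { q[it] * scale }

fun main() {
    val x = floatArrayOf(0.1f, -0.5f, 0.25f)
    val (scale, q) = quantize(x)
    println(dequantize(scale, q).joinToString()) // close to the input, within scale / 2
}
```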
- `DefaultGradientTape`.
- `SgdOptimizer` and training DSL to build and run training loops.
- `MSELoss` and `CrossEntropyLoss` with configurable reduction strategies.

```kotlin
// Example training step with Autograd
val loss = MSELoss()
val optimizer = sgd(lr = 0.01)
val (tape, l) = record { loss.forward(model.forward(x, ctx), y, ctx) }
tape.computeGradients(targets = listOf(l), sources = model.parameters())
optimizer.step()
```

See CHANGELOG.md for the full list.
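The tape records the forward pass and `computeGradients` backpropagates through it; `optimizer.step()` then applies the update. A hand-rolled SGD loop on a one-parameter MSE problem shows the same mechanics in pure Kotlin (the `fit` helper is illustrative, not SKaiNET API):

```kotlin
// Gradient descent on L(w) = (w * x - y)^2, with dL/dw = 2 * (w * x - y) * x.
fun fit(x: Float, y: Float, lr: Float, steps: Int): Float {
    var w = 0f
    repeat(steps) {
        val grad = 2f * (w * x - y) * x // what the tape would compute
        w -= lr * grad                  // what optimizer.step() would apply
    }
    return w
}

fun main() {
    // With x = 1 the minimizer is w = y = 2; each step contracts the error by 0.8.
    println(fit(x = 1f, y = 2f, lr = 0.1f, steps = 20)) // approaches 2.0
}
```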
```kotlin
// Compile model to StableHLO and run on CUDA
val ir = Compile.toStableHlo(model)
println(ir.pretty())
```

```kotlin
// Export model to an Arduino library
val facade = CCodegenFacade()
facade.exportToArduinoLibrary(
    model = model,
    forwardPass = { ctx -> model.forward(input, ctx) },
    outputPath = "build/arduino",
    libraryName = "MyModel"
)
```

See CHANGELOG.md for the full list.
```kotlin
val model = nn {
    input(64)
    dense(out = 64)
    // KAN layer (preview) with residual when dims match
    kanLayer(outputDim = 64, gridSize = 16, useResidual = true)
    dense(out = 10)
}
```

```kotlin
val base = DefaultNeuralNetworkExecutionContext() // default = EVAL
val yTrain = train(base) { ctx -> model.forward(x, ctx) }
val yEval = eval(base) { ctx -> model.forward(x, ctx) }
```

```kotlin
val y = x
    .let { dropout(p = 0.1).forward(it, ctx) }
    .let { batchNorm(numFeatures = 64).forward(it, ctx) }
```

```kotlin
val model = nn {
    conv2d(outChannels = 16, kernel = 3)
    maxPool2d(kernel = 2)
    dense(out = 10)
}
```

```kotlin
val ds = MNIST.load(train = true) // platform-aware loader
val (batchX, batchY) = ds.nextBatch(64)
```

```kotlin
val gguf = GGUF.read("/path/to/model.gguf")
println("Tensors: ${gguf.tensors.size}")
```

SKaiNET includes an initial KAN layer implementation that you can wire into the NN DSL. A KAN layer expands each input feature over a learnable grid of basis coefficients and then mixes the result with a linear projection, with an optional bias and residual connection.
`gridSize`, `useBias`, `useResidual`, and a custom `baseActivation` are supported. The `degree` parameter is reserved for future spline/basis functions and is not yet used.

Quick usage example:

```kotlin
val model = nn {
    input(64)
    dense(out = 64)
    // Add a KAN layer that keeps the same dimensionality and uses a residual connection
    kanLayer(outputDim = 64, gridSize = 16, useResidual = true)
    dense(out = 10)
}
```

Notes and limitations:

- The residual connection requires `outputDim == inputDim`.

See source for details:
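As described above, a KAN layer expands each input feature over a grid of basis functions before mixing. A toy single-feature sketch with hat (piecewise-linear) bases on a uniform grid over [-1, 1] (conceptual only, not the SKaiNET implementation):

```kotlin
import kotlin.math.abs

// Expand one scalar feature over gridSize hat bases, then mix with
// learned coefficients — the per-feature shape of a KAN-style layer.
fun kanFeature(x: Float, coeffs: FloatArray): Float {
    val gridSize = coeffs.size
    val step = 2f / (gridSize - 1)
    var out = 0f
    for (k in 0 until gridSize) {
        val center = -1f + k * step
        val basis = maxOf(0f, 1f - abs(x - center) / step) // hat function at grid point k
        out += coeffs[k] * basis
    }
    return out
}

fun main() {
    // Hat bases form a partition of unity, so coefficients equal to the
    // grid values make the expansion reproduce the identity on [-1, 1].
    val coeffs = floatArrayOf(-1f, -0.5f, 0f, 0.5f, 1f)
    println(kanFeature(0.25f, coeffs)) // 0.25
}
```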
Minimize cosine distance between tensors with just a few lines:

```kotlin
skainet(ctx) {
    val a = tensor(1f, 0f, 0f).withRequiresGrad()
    val b = tensor(0f, 1f, 0f)

    // Record and compute gradients
    val (tape, distance) = record { a.cosineDistance(b) }
    tape.computeGradients(targets = listOf(distance), sources = listOf(a))

    // Optimize
    val optimizer = sgd(lr = 0.5)
    optimizer.addParameter(a)
    optimizer.step()
    println("Distance decreased to: ${a.cosineDistance(b).data.get()}")
}
```

We love contributions! Whether it's a new operator, documentation, or a bug fix:
MIT — see LICENCE.