
Offers machine learning primitives for building complex neural networks. Features activation functions, layers, optimizers, and training methods, and serves as an educational resource for optimization and visualization techniques.

This project started as a small attempt to recreate some machine learning primitives from scratch in Kotlin. While doing research, I quickly realized that a small set of primitives could be composed into larger components, scaling up complexity rapidly. As I was learning about different optimization and visualization techniques for my own use, I also realized they would make fantastic learning tools for others.
Related: https://github.com/Pointyware/AI-Licensing
| Category | Name | Description |
|---|---|---|
| Tensors | Pools | Pools to store and reuse tensors by dimension |
| Activation Functions | ReLU | Rectified Linear Unit |
| | Logistic | Often referred to as "the sigmoid" |
| | Tanh | Hyperbolic tangent |
| | GELU† | Gaussian Error Linear Unit |
| | Swish† | Linear interpolation between linear and ReLU |
| | SwiGLU† | Swish-based Gated Linear Unit |
| Regularizers | RMSNorm | Normalizes each input by the root mean square across all inputs |
| | LayerNorm† | Normalizes each input by the mean and variance across all inputs |
| Layers | Dense | Linear (Fully Connected) |
| | Convolutional† | Applies shared-weight filters across windows of the input |
| Networks | Sequential Networks | Networks composed entirely of layers, each receiving a single input from the previous layer |
| | Residual Networks | Layer-based networks that allow skip connections |
| Cost/Loss Functions | Mean Squared Error | Average of the squared differences between expected and actual outputs |
| | Cross Entropy Loss | Converts the output predictions to a probability distribution and penalizes the negative log-likelihood of the expected output |
| Optimizers | Gradient Descent | Computes gradients across all samples before updating parameters |
| | Stochastic Gradient Descent | Computes gradients across stochastically selected samples before updating parameters |
| | Adam† | Momentum-based; performs multiple passes over samples and parameter updates in a single epoch |
| Training | Sequential Trainer | Trainer for a Sequential Network |
| | AutoDiff Trainer | Trainer for any network that produces a computation graph |
| | Organic Trainer† | Trainer that modifies a network according to statistics; mimics neurogenesis and ablation at alternating stages |
† - Planned/Experimental
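To give a feel for the scale of these primitives, here is a minimal sketch of how a few of the activation functions and RMSNorm above might look against a single-value interface like the `ActivationFunction` shown in the class diagram below. The interface name and `calculate` signature come from the diagram; the implementations are illustrative, not the project's actual code:

```kotlin
import kotlin.math.exp
import kotlin.math.sqrt
import kotlin.math.tanh

// Single-value activation interface, mirroring the class diagram below.
fun interface ActivationFunction {
    fun calculate(value: Double): Double
}

// Rectified Linear Unit: max(0, x).
val relu = ActivationFunction { x -> if (x > 0.0) x else 0.0 }

// Logistic ("the sigmoid"): 1 / (1 + e^-x).
val logistic = ActivationFunction { x -> 1.0 / (1.0 + exp(-x)) }

// Hyperbolic tangent.
val tanhActivation = ActivationFunction { x -> tanh(x) }

// RMSNorm, per the table: divide each input by the root mean square
// of all inputs (epsilon guards against division by zero).
fun rmsNorm(inputs: DoubleArray, epsilon: Double = 1e-8): DoubleArray {
    val rms = sqrt(inputs.sumOf { it * it } / inputs.size)
    return DoubleArray(inputs.size) { i -> inputs[i] / (rms + epsilon) }
}
```

A `fun interface` keeps each activation a one-line lambda while still matching the single-method `calculate` contract from the diagram.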
```mermaid
classDiagram
class Tensor {
dimensions: List~Int~
get(indices: List~Int~): Double
}
class ActivationFunction {
calculate(value: Double): Double
}
class Layer {
weights: Tensor
biases: Tensor
activation: ActivationFunction
}
Layer *--> Tensor
Layer *--> ActivationFunction
note for Network "A neural network composed of neurons."
class Network
class SequentialNetwork {
layers: List~Layer~
}
SequentialNetwork *--> "1..*" Layer
Network <|-- SequentialNetwork
class Loss {
calculate(expected: Tensor, actual: Tensor): Double
}
note for Optimizer "An optimizer is responsible for <br>adjusting the weights and biases <br>of a layer based on the error <br>gradient."
class Optimizer {
batch()
update()
}
class EpochStatistics {
onEpochStart()
onEpochEnd()
}
class BatchStatistics {
onBatchStart()
onBatchEnd()
}
class SampleStatistics {
onSampleStart()
onSampleEnd()
}
class LayerStatistics {
onLayerStart()
onLayerEnd()
}
class GradientDescent
GradientDescent --|> Optimizer
GradientDescent --|> SampleStatistics
class StochasticGradientDescent
StochasticGradientDescent --|> Optimizer
StochasticGradientDescent --|> BatchStatistics
class Adam
Adam --|> Optimizer
Adam --|> BatchStatistics
note for StudyCase "A study case associates an <br>input with an expected output."
class StudyCase {
input: Tensor
output: Tensor
}
note for SequentialTrainer "A trainer presents cases to <br>a network and tracks gradients <br>for back-propagation."
class SequentialTrainer {
network: SequentialNetwork
cases: List~StudyCase~
lossFunction: Loss
optimizer: Optimizer
}
SequentialTrainer *--> SequentialNetwork
SequentialTrainer *--> "1.." StudyCase
SequentialTrainer *--> Loss
SequentialTrainer *--> Optimizer
class LearningTensor
class SimpleTensor
Tensor <|-- LearningTensor
Tensor <|-- SimpleTensor
class ReLU
class Sigmoid
class Linear
ActivationFunction <|-- ReLU
ActivationFunction <|-- Sigmoid
ActivationFunction <|-- Linear
class MeanSquaredError
Loss <|-- MeanSquaredError
```
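To make these relationships concrete, here is a hedged sketch of one training epoch in the spirit of the `SequentialTrainer`, `StudyCase`, and `Loss` types above. It is simplified to a single linear layer over plain `Double`s rather than the library's `Tensor` type, and every function body is an illustrative stand-in, not the project's actual implementation:

```kotlin
// Simplified stand-ins for the diagram's types (DoubleArray in place of Tensor).
data class StudyCase(val input: DoubleArray, val output: DoubleArray)

class DenseLayer(var weight: Double, var bias: Double) {
    fun forward(x: Double): Double = weight * x + bias // linear activation
}

// Loss.calculate(expected, actual) from the diagram, as mean squared error.
fun meanSquaredError(expected: DoubleArray, actual: DoubleArray): Double =
    expected.indices.sumOf { i -> (expected[i] - actual[i]).let { it * it } } / expected.size

// One epoch of plain gradient descent: accumulate gradients over all
// cases, then apply a single parameter update.
fun trainEpoch(layer: DenseLayer, cases: List<StudyCase>, learningRate: Double): Double {
    var gradW = 0.0
    var gradB = 0.0
    var totalLoss = 0.0
    for (case in cases) {
        val x = case.input[0]
        val y = case.output[0]
        val prediction = layer.forward(x)
        totalLoss += meanSquaredError(doubleArrayOf(y), doubleArrayOf(prediction))
        // d(MSE)/d(prediction) = 2 * (prediction - y); chain rule through w*x + b.
        val dLoss = 2.0 * (prediction - y)
        gradW += dLoss * x
        gradB += dLoss
    }
    layer.weight -= learningRate * gradW / cases.size
    layer.bias -= learningRate * gradB / cases.size
    return totalLoss / cases.size
}

fun main() {
    // Learn y = 2x + 1 from a handful of study cases.
    val cases = (0..4).map { StudyCase(doubleArrayOf(it.toDouble()), doubleArrayOf(2.0 * it + 1.0)) }
    val layer = DenseLayer(weight = 0.0, bias = 0.0)
    repeat(500) { trainEpoch(layer, cases, learningRate = 0.05) }
    println("w=${layer.weight}, b=${layer.bias}") // approaches w=2.0, b=1.0
}
```

Matching the table's description of (non-stochastic) Gradient Descent, gradients here are accumulated over every case before a single parameter update; a stochastic variant would update after each randomly drawn batch instead.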
The structure of this project is based on Clean Architecture applied to Android's MVVM architecture. UI and Data implementations occupy the outermost frameworks/drivers layer. ViewModels, Repository implementations, and Data Source interfaces occupy the adapter/interfaces layer. Interactors and Repository interfaces occupy the application business layer. Entities occupy the enterprise business layer.
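As a rough Kotlin sketch of that dependency rule (every name below is hypothetical, invented for illustration rather than taken from the codebase):

```kotlin
// Enterprise business layer (e.g. :core-entities): plain domain types.
data class TrainingRun(val id: String, val epochs: Int)

// Application business layer (e.g. :core-interactors): use cases depend
// only on repository *interfaces*, never on their implementations.
interface TrainingRepository {
    suspend fun save(run: TrainingRun)
}

class RecordTrainingRunUseCase(private val repository: TrainingRepository) {
    suspend operator fun invoke(run: TrainingRun) = repository.save(run)
}

// Adapter/interfaces layer (e.g. :core-data): concrete repository
// implementation, swappable without touching the layers above.
class InMemoryTrainingRepository : TrainingRepository {
    private val runs = mutableListOf<TrainingRun>()
    override suspend fun save(run: TrainingRun) { runs += run }
}
```

The module graph below shows how these layers map onto Gradle modules.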
```mermaid
graph
subgraph apps
:app-android --> :app-shared
:app-desktop --> :app-shared
end
apps --> features
subgraph features
:feature-training --> :feature-simulation
:feature-simulation-training --> :feature-simulation
:feature-simulation-training --> :feature-training
:feature-evolution --> :feature-simulation
end
features --> core
subgraph core
:core-ui --> :core-viewmodels --> :core-interactors --> :core-data --> :core-entities --> :core-common
end
```
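Assuming a conventional Gradle setup, the graph above suggests module registrations along these lines (a sketch; the actual settings.gradle.kts may differ):

```kotlin
// settings.gradle.kts — module registrations matching the graph above.
include(
    ":app-android",
    ":app-desktop",
    ":app-shared",
    ":feature-training",
    ":feature-simulation",
    ":feature-simulation-training",
    ":feature-evolution",
    ":core-ui",
    ":core-viewmodels",
    ":core-interactors",
    ":core-data",
    ":core-entities",
    ":core-common",
)
```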