
Line-by-line transliteration of a battle-tested base64 implementation offering high-performance encoders/decoders, engine-level APIs, canonical-padding handling, and configurable decoding for non-canonical inputs.
This is a Kotlin Multiplatform line-by-line transliteration port of marshallpierce/rust-base64.
Original Project: This port is based on marshallpierce/rust-base64. All design credit and project intent belong to the upstream authors; this repository is a faithful port to Kotlin Multiplatform with no behavioural changes intended.
This is an in-progress port. The goal is feature parity with the upstream Rust crate while providing a native Kotlin Multiplatform API. Every Kotlin file carries a // port-lint: source <path> header naming its upstream Rust counterpart so the AST-distance tool can track provenance.
The text below is reproduced and lightly edited from
https://github.com/marshallpierce/rust-base64. It is the upstream project's own description and remains under the upstream authors' authorship; links have been rewritten to absolute upstream URLs so they continue to resolve from this repository.
Made with CLion. Thanks to JetBrains for supporting open source!
It's base64. What more could anyone want?
This library's goals are to be correct and fast. It's thoroughly tested and widely used. It exposes functionality at
multiple levels of abstraction so you can choose the level of convenience vs performance that you want,
e.g. decode_engine_slice decodes into an existing &mut [u8] and is pretty fast (2.6 GiB/s for a 3 KiB input),
whereas decode_engine allocates a new Vec<u8> and returns it, which might be more convenient in some cases, but is
slower (although still fast enough for almost any purpose) at 2.1 GiB/s.
See the docs for all the details.
Remove non-base64 characters from your input before decoding.
If you have a Vec of base64, retain can be used to
strip out whatever you need removed.
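As a minimal sketch of the retain approach, the following uses only the Rust standard library. The helper name strip_non_base64 is hypothetical, and it assumes the standard alphabet (A-Z, a-z, 0-9, +, /) plus = padding; adjust the predicate for the url-safe alphabet.

```rust
/// Hypothetical helper (not part of the crate): keeps only bytes that belong
/// to the standard base64 alphabet or are the '=' pad byte.
fn strip_non_base64(input: &mut Vec<u8>) {
    input.retain(|b| b.is_ascii_alphanumeric() || matches!(*b, b'+' | b'/' | b'='));
}

fn main() {
    // "hello world" in base64, with line breaks, a space, a tab and a NUL mixed in.
    let mut data = b"aGVs\r\nbG8g d29y\tbGQ=\0".to_vec();
    strip_non_base64(&mut data);
    assert_eq!(data, b"aGVsbG8gd29ybGQ=".to_vec());
}
```

Because retain works in place, no second buffer is allocated; the cleaned Vec can then be handed to whichever decode function you prefer.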
If you have a Read (e.g. reading a file or network socket), there are various approaches.
- Use Read's bytes() to filter out unwanted bytes.
- Implement Read with a read() impl that delegates to your actual Read, and then drops any bytes you don't want.
If you need to line-wrap base64 output, line-wrap does just that.
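The delegating-Read approach for filtering input might look roughly like the sketch below. The SkipWhitespace type is hypothetical and only drops ASCII whitespace; it is an illustration using the standard library, not something the crate provides.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper: delegates to an inner `Read` and drops ASCII whitespace.
struct SkipWhitespace<R: Read> {
    inner: R,
}

impl<R: Read> Read for SkipWhitespace<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        loop {
            let n = self.inner.read(buf)?;
            if n == 0 {
                return Ok(0); // genuine end of input
            }
            // Compact the bytes we keep to the front of `buf`.
            let mut kept = 0;
            for i in 0..n {
                let b = buf[i];
                if !b.is_ascii_whitespace() {
                    buf[kept] = b;
                    kept += 1;
                }
            }
            // If the whole chunk was filtered out, read again: returning 0
            // here would wrongly signal end of input to the caller.
            if kept > 0 {
                return Ok(kept);
            }
        }
    }
}

fn main() -> io::Result<()> {
    let raw: &[u8] = b"aGVs\nbG8g\r\nd29y bGQ=\n";
    let mut reader = SkipWhitespace { inner: raw };
    let mut cleaned = String::new();
    reader.read_to_string(&mut cleaned)?;
    assert_eq!(cleaned, "aGVsbG8gd29ybGQ=");
    Ok(())
}
```

The loop matters: a read() that returns 0 for a non-empty buffer is interpreted as end of input, so a chunk consisting entirely of dropped bytes has to trigger another read rather than an early return.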
If you want canonical base64 encoding or decoding: first, don't do this. You should no more expect Base64 to be canonical than you should expect compression algorithms to produce canonical output across all usage in the wild (hint: they don't). However, people are drawn to their own destruction like moths to a flame, so here we are.
There are two opportunities for non-canonical encoding (and thus, detection of the same during decoding): the final bits
of the last encoded token in two or three token suffixes, and the = token used to inflate the suffix to a full four
tokens.
The trailing bits issue is unavoidable: with 6 bits available in each encoded token, 1 input byte takes 2 tokens, with the second one having some bits unused. Same for two input bytes: 16 bits, but 3 tokens have 18 bits. Unless we decide to stop shipping whole bytes around, we're stuck with those extra bits that a sneaky or buggy encoder might set to 1 instead of 0.
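The arithmetic can be made concrete with a small sketch. The function names below are hypothetical and operate on the already-decoded 6-bit value of the final token; this illustrates the check, and is not the crate's implementation.

```rust
/// Number of unused low bits in the last token of a partial final group.
/// `tokens` is the number of non-padding tokens in that group (2 or 3).
fn unused_trailing_bits(tokens: u32) -> Option<u32> {
    match tokens {
        2 => Some(2 * 6 - 8),  // 1 byte carried in 12 bits: 4 spare bits
        3 => Some(3 * 6 - 16), // 2 bytes carried in 18 bits: 2 spare bits
        _ => None,             // a full group of 4 tokens has no spare bits
    }
}

/// True when the spare bits of the final token's 6-bit value are all zero.
fn has_canonical_trailing_bits(last_value: u8, tokens: u32) -> Option<bool> {
    let spare = unused_trailing_bits(tokens)?;
    let mask = (1u8 << spare) - 1;
    Some(last_value & mask == 0)
}

fn main() {
    // "QQ==" encodes b"A": the second 'Q' has value 16 (0b010000), spare bits zero.
    assert_eq!(has_canonical_trailing_bits(16, 2), Some(true));
    // "QR==" carries the same data bits, but 'R' is 17 (0b010001): a spare bit is set.
    assert_eq!(has_canonical_trailing_bits(17, 2), Some(false));
    // "QUI=" encodes b"AB": 'I' has value 8 (0b001000), the 2 spare bits are zero.
    assert_eq!(has_canonical_trailing_bits(8, 3), Some(true));
    // 'J' (value 9, 0b001001) in that position would be non-canonical.
    assert_eq!(has_canonical_trailing_bits(9, 3), Some(false));
}
```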
The = pad bytes, on the other hand, are entirely a self-own by the Base64 standard. They do not affect decoding other
than to provide an opportunity to say "that padding is incorrect". Exabytes of storage and transfer have no doubt been
wasted on pointless = bytes. Somehow we all seem to be quite comfortable with, say, hex-encoded data just stopping
when it's done rather than requiring a confirmation that the author of the encoder could count to four. Anyway, there
are two ways to make pad bytes predictable: require canonical padding to the next multiple of four bytes as per the RFC,
or, if you control all producers and consumers, save a few bytes by requiring no padding (especially applicable to the
url-safe alphabet).
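To illustrate the two predictable padding policies, here is a small sketch of the length arithmetic. The function names are hypothetical and the code ignores overflow for very large inputs; it is meant to show the counting, not to mirror the crate's API.

```rust
/// Number of '=' bytes canonical padding adds for `input_len` input bytes.
fn pad_count(input_len: usize) -> usize {
    (3 - input_len % 3) % 3
}

/// Encoded length in bytes, with or without canonical padding.
/// Note: no overflow handling, so this is only a sketch for modest sizes.
fn encoded_len(input_len: usize, padded: bool) -> usize {
    if padded {
        // Every started group of 3 input bytes becomes 4 output bytes.
        (input_len + 2) / 3 * 4
    } else {
        // ceil(8 * n / 6) tokens, with nothing appended.
        (input_len * 4 + 2) / 3
    }
}

fn main() {
    for n in 0..=6 {
        // Padding accounts exactly for the difference between the two policies.
        assert_eq!(encoded_len(n, true), encoded_len(n, false) + pad_count(n));
    }
    assert_eq!(encoded_len(1, true), 4); // e.g. "QQ=="
    assert_eq!(encoded_len(1, false), 2); // e.g. "QQ"
    assert_eq!(encoded_len(2, false), 3); // e.g. "QUI"
}
```

The saving from dropping padding is at most two bytes per encoded value, which is why it mostly matters for many short values such as url-safe tokens.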
All Engine implementations must at a minimum support treating non-canonical padding of both types as an error, and
optionally may allow other behaviors.
The minimum supported Rust version is 1.48.0.
Contributions are very welcome. However, because this library is used widely, and in security-sensitive contexts, all PRs will be carefully scrutinized. Beyond that, this sort of low level library simply needs to be 100% correct. Nobody wants to chase bugs in encoding of any sort.
All this means that it takes me a fair amount of time to review each PR, so it might take quite a while to carve out the free time to give each PR the attention it deserves. I will get to everyone eventually!
Benchmarks are in benches/.
cargo bench
This crate supports no_std. By default the crate targets std via the std feature. You can deactivate
the default-features to target core instead. In that case you lose out on all the functionality revolving
around std::io, std::error::Error, and heap allocations. There is an additional alloc feature that you can activate
to bring back the support for heap allocations.
On Linux, you can use perf for profiling. First, compile the
benchmarks with cargo bench --no-run.
Run the benchmark binary with perf (shown here filtering to one particular benchmark, which will make the results
easier to read). perf is only available to the root user on most systems as it fiddles with event counters in your
CPU, so use sudo. We need to run the actual benchmark binary, hence the path into target. You can see the actual
full path with cargo bench -v; it will print out the commands it runs. If you use the exact path
that bench outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want
to cargo clean so you have only one benchmarks- binary (they tend to accumulate).
sudo perf record target/release/deps/benchmarks-* --bench decode_10mib_reuse
Then analyze the results, again with perf:
sudo perf annotate -l
You'll see a bunch of interleaved Rust source and assembly like this. The section with lib.rs:327 is telling us that
4.02% of samples saw the movzbl (a zero-extending byte load, here the decode table lookup) as the active instruction.
However, this percentage is not as exact as it seems due to a phenomenon called skid. Basically, a consequence of how
fancy modern CPUs are is that this sort of instruction profiling is inherently inaccurate, especially in branch-heavy code.
lib.rs:322 0.70 : 10698: mov %rdi,%rax
2.82 : 1069b: shr $0x38,%rax
: if morsel == decode_tables::INVALID_VALUE {
: bad_byte_index = input_index;
: break;
: };
: accum = (morsel as u64) << 58;
lib.rs:327 4.02 : 1069f: movzbl (%r9,%rax,1),%r15d
: // fast loop of 8 bytes at a time
: while input_index < length_of_full_chunks {
: let mut accum: u64;
:
: let input_chunk = BigEndian::read_u64(&input_bytes[input_index..(input_index + 8)]);
: morsel = decode_table[(input_chunk >> 56) as usize];
lib.rs:322 3.68 : 106a4: cmp $0xff,%r15
: if morsel == decode_tables::INVALID_VALUE {
0.00 : 106ab: je 1090e <base64::decode_config_buf::hbf68a45fefa299c1+0x46e>
Fuzzing uses cargo-fuzz. See fuzz/fuzzers for the available fuzzing scripts.
To run, use an invocation like these:
cargo +nightly fuzz run roundtrip
cargo +nightly fuzz run roundtrip_no_pad
cargo +nightly fuzz run roundtrip_random_config -- -max_len=10240
cargo +nightly fuzz run decode_random
This project is dual-licensed under MIT and Apache 2.0.
dependencies {
    implementation("io.github.kotlinmania:base64-kotlin:0.1.0")
}
./gradlew build
./gradlew test
See AGENTS.md and CLAUDE.md for translator discipline, port-lint header convention, and Rust → Kotlin idiom mapping.
This Kotlin port is distributed under the same MIT license as the upstream marshallpierce/rust-base64. See LICENSE (and any sibling LICENSE-* / NOTICE files mirrored from upstream) for the full text.
Original work copyrighted by the rust-base64 authors.
Kotlin port: Copyright (c) 2026 Sydney Renee and The Solace Project.
Thanks to the marshallpierce/rust-base64 maintainers and contributors for the original Rust implementation. This port reproduces their work in Kotlin Multiplatform; bug reports about upstream design or behavior should go to the upstream repository.