
Unicode-aware string segmentation: iterators for grapheme clusters, words and sentences per UAX#29; includes GraphemeCursor for random/bidirectional access, word indices and performance optimizations.
This is a Kotlin Multiplatform line-by-line transliteration port of unicode-rs/unicode-segmentation.
Original Project: This port is based on unicode-rs/unicode-segmentation. All design credit and project intent belong to the upstream authors; this repository is a faithful port to Kotlin Multiplatform with no behavioural changes intended.
This is an in-progress port. The goal is feature parity with the upstream Rust crate while providing a native Kotlin Multiplatform API. Every Kotlin file carries a // port-lint: source <path> header naming its upstream Rust counterpart so the AST-distance tool can track provenance.
The text below is reproduced and lightly edited from
https://github.com/unicode-rs/unicode-segmentation. It is the upstream project's own description and remains under the upstream authors' authorship; links have been rewritten to absolute upstream URLs so they continue to resolve from this repository.
Iterators which split strings on Grapheme Cluster, Word, or Sentence boundaries, according to the Unicode Standard Annex #29 rules.
    use unicode_segmentation::UnicodeSegmentation;

    fn main() {
        let s = "a̐éö̲\r\n";
        let g = s.graphemes(true).collect::<Vec<&str>>();
        let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
        assert_eq!(g, b);

        let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
        let w = s.unicode_words().collect::<Vec<&str>>();
        let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
        assert_eq!(w, b);

        let s = "The quick (\"brown\") fox";
        let w = s.split_word_bounds().collect::<Vec<&str>>();
        let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", " ", "fox"];
        assert_eq!(w, b);
    }

unicode-segmentation does not depend on libstd, so it can be used in crates
with the #![no_std] attribute.
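For contrast with the upstream example above, here is a minimal sketch of what the Rust standard library gives you without this crate: iteration by code point and splitting on whitespace. It does not use unicode-segmentation at all, and the helper names `code_point_count` and `whitespace_tokens` are hypothetical, introduced only for illustration.

```rust
// Std-only illustration; this does NOT call unicode-segmentation.
// Helper names are hypothetical and exist only for this sketch.

/// Counts Unicode scalar values (code points), not user-perceived characters.
fn code_point_count(s: &str) -> usize {
    s.chars().count()
}

/// Splits on Unicode whitespace only, so punctuation stays attached to tokens.
fn whitespace_tokens(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}

fn main() {
    // "a" followed by U+0310 COMBINING CANDRABINDU: one grapheme cluster,
    // but two code points and three UTF-8 bytes.
    let cluster = "a\u{0310}";
    assert_eq!(code_point_count(cluster), 2);
    assert_eq!(cluster.len(), 3);

    // CR LF is a single grapheme cluster under UAX #29, yet two code points.
    assert_eq!(code_point_count("\r\n"), 2);

    // Whitespace splitting keeps parentheses, quotes, commas and "?" attached,
    // unlike the unicode_words() result shown in the upstream example.
    let tokens = whitespace_tokens("The quick (\"brown\") fox can't jump 32.3 feet, right?");
    assert_eq!(
        tokens,
        ["The", "quick", "(\"brown\")", "fox", "can't", "jump", "32.3", "feet,", "right?"]
    );
}
```

Neither `chars()` nor `split_whitespace()` applies the UAX #29 boundary rules, which is the gap the grapheme, word and sentence iterators are meant to fill.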
You can use this package in your project by adding the following
to your Cargo.toml:
    [dependencies]
    unicode-segmentation = "1"

Yanked due to accidental breakage and MSRV mistag.
#[inline] opportunities, resulting in 15-40% performance improvement.
GraphemeCursor API allows random access and bidirectional iteration.
as_str methods to the iterator types.

    dependencies {
        implementation("io.github.kotlinmania:unicode-segmentation-kotlin:0.1.1")
    }

    ./gradlew build
    ./gradlew test

See AGENTS.md and CLAUDE.md for translator discipline, port-lint header convention, and Rust → Kotlin idiom mapping.
This Kotlin port is distributed under the same MIT license as the upstream unicode-rs/unicode-segmentation. See LICENSE (and any sibling LICENSE-* / NOTICE files mirrored from upstream) for the full text.
Original work copyrighted by the unicode-segmentation authors.
Kotlin port: Copyright (c) 2026 Sydney Renee and The Solace Project.
Thanks to the unicode-rs/unicode-segmentation maintainers and contributors for the original Rust implementation. This port reproduces their work in Kotlin Multiplatform; bug reports about upstream design or behavior should go to the upstream repository.