
Enables creation of text or token parsers using easily combinable building blocks, drawing inspiration from JParsec and Haskell's Parsec for building parser combinators.
Parsek is a library for (and written in) Kotlin for easily building parser combinators. It is based on JParsec and (Haskell) Parsec. It allows you to create a text (or token) parser based on easy-to-combine building blocks.
Parsek is a functional parser combinator library that provides tools to construct complex parsers by combining smaller, reusable components.
Parsek is versatile and can be used for a variety of parsing tasks.
Add Parsek to your project using Gradle or Maven.
Gradle:

dependencies {
    implementation 'nl.w8mr.parsek:core:<latest-version>'
}

Maven:

<dependency>
    <groupId>nl.w8mr.parsek</groupId>
    <artifactId>core</artifactId>
    <version><!-- latest-version --></version>
</dependency>

Replace <latest-version> with the version shown in the badge above.
Here's a minimal example to get you started:
import nl.w8mr.parsek.text.*

val parser = number // Parses a sequence of digits as an Int
val result = parser("123abc") // result: 123

Other predefined parsers are used the same way, for example signedNumber:

val signed = signedNumber
println(signed("-42")) // Output: -42
println(signed("17")) // Output: 17

At its core, Parsek operates on the concept of a Parser. A Parser is a function that takes an input, consumes a part of it, and returns a result along with the remaining input. The result can either be a success or a failure.
interface Parser<Token, R> {
    fun apply(source: ParserSource<Token>): Result<R>

    sealed class Result<R> {
        data class Success<R>(val value: R) : Result<R>()
        data class Failure<R>(val message: String) : Result<R>()
    }
}

While the core library is generic, the nl.w8mr.parsek.text package provides utilities specifically for parsing CharSequence input (e.g., String). These text-specific parsers simplify common tasks like matching characters, strings, or patterns.
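To see what the text utilities abstract away, the following plain-Kotlin sketch hand-writes a single-character parser in the style described above: a function that consumes part of the input and returns a result together with the remaining input. It does not use Parsek at all. The names `SketchResult`, `SketchParser`, and `charWhere` are hypothetical and exist only for this illustration; Parsek's own types may look different.

```kotlin
// Hypothetical sketch, not Parsek's API: a parser is modelled as a function
// from input text to either a success (value + remaining input) or a failure.
sealed class SketchResult<out R> {
    data class Success<out R>(val value: R, val rest: String) : SketchResult<R>()
    data class Failure(val message: String) : SketchResult<Nothing>()
}

typealias SketchParser<R> = (String) -> SketchResult<R>

// Builds a parser that consumes one character if it satisfies the predicate.
fun charWhere(description: String, predicate: (Char) -> Boolean): SketchParser<Char> = { input ->
    val first = input.firstOrNull()
    if (first != null && predicate(first)) {
        // Success: return the matched character and everything after it.
        SketchResult.Success(first, input.drop(1))
    } else {
        // Failure: nothing is consumed, only a message is reported.
        SketchResult.Failure("Expected $description, but found ${first ?: "end of input"}")
    }
}

fun main() {
    val digit = charWhere("a digit") { it.isDigit() }
    println(digit("5abc")) // Success(value=5, rest=abc)
    println(digit("abc"))  // Failure(message=Expected a digit, but found a)
}
```

Even this small sketch needs its own result type, end-of-input handling, and error message, which is the kind of boilerplate a text-specific helper is meant to remove.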
For example, instead of writing a generic parser for a specific character, you can use the char function from the text package:
val digit = char { it.isDigit() }

Text parsers return their result as a String. This ensures consistency when working with text-based data. When text parsers are combined (for example with the repeat or sepBy combinator), the result is automatically concatenated into a single string. For example:

val digit = char { it.isDigit() }
val digits = repeat(digit)

// Input: "123abc"
// Output: "123" (the matched digits concatenated into a single string)
digits("123abc")

There are also predefined parsers for common patterns, such as digit, letter, number, and more.
val parser = digit
parser("5abc") shouldBe "5"

val parser = number
parser("123abc") shouldBe 123

val identifier = letter and some(letter or digit)
identifier("abc123") shouldBe "abc123"

val parser = signedNumber
parser("-42") shouldBe -42
parser("17") shouldBe 17

val digit = char { it.isDigit() }
val number = repeat(digit, min = 1) map { it.joinToString("").toInt() }
val comma = char(',')
val numberList = number sepBy comma
numberList("123,45,6") // Success: ([123, 45, 6], "")

Using the text package, the same parser can be written more concisely with the predefined number parser:

val numberList = number sepBy char(',')
numberList("123,45,6") // Success: ([123, 45, 6], "")

val openBracket = literal('[')
val closeBracket = literal(']')
val comma = char(',')
val number = repeat(char { it.isDigit() }, min = 1) map { it.toInt() }
val value = ref(::list) or number
val list: Parser<Char, List<Any>> = openBracket and (value sepBy comma) and closeBracket
list("[1,[2,3],4]") shouldBe listOf(1, listOf(2, 3), 4)
list("[1,[2,3],4]") // Success: ([1, [2, 3], 4], "")

Parsek provides robust error handling through its Result class. A parser can return either a Success, which carries the parsed value, or a Failure, which carries an error message.
For example:
val parser = char('a')
parser("b") // Failure: Expected 'a', but found 'b'.

General parsers operate on any sequence of tokens. You define what a "token" is, making these parsers highly flexible for non-text inputs (e.g., token streams from a lexer).
Text parsers are optimized for CharSequence inputs. They provide utilities for common text-parsing tasks, such as matching characters, strings, or patterns. These parsers are more concise and easier to use for text-based grammars.
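As a rough illustration of the difference, the sketch below parses a list of lexer tokens rather than characters. It is written in plain Kotlin and does not use Parsek; `Tok`, `TokenResult`, `TokenParser`, `token`, and `then` are hypothetical names chosen for this example and should not be read as the library's API.

```kotlin
// Hypothetical sketch, not Parsek's API: parsers over an arbitrary token type T.
sealed class TokenResult<out T, out R> {
    data class Success<out T, out R>(val value: R, val rest: List<T>) : TokenResult<T, R>()
    data class Failure(val message: String) : TokenResult<Nothing, Nothing>()
}

typealias TokenParser<T, R> = (List<T>) -> TokenResult<T, R>

// Matches exactly one token that satisfies the predicate.
fun <T : Any> token(description: String, predicate: (T) -> Boolean): TokenParser<T, T> = { input ->
    val head = input.firstOrNull()
    if (head != null && predicate(head)) {
        TokenResult.Success(head, input.drop(1))
    } else {
        TokenResult.Failure("Expected $description, but found ${head ?: "end of input"}")
    }
}

// Runs the receiver, then `next` on the remaining tokens, pairing both values.
infix fun <T, A, B> TokenParser<T, A>.then(next: TokenParser<T, B>): TokenParser<T, Pair<A, B>> = { input ->
    when (val first = this.invoke(input)) {
        is TokenResult.Failure -> first
        is TokenResult.Success -> when (val second = next(first.rest)) {
            is TokenResult.Failure -> second
            is TokenResult.Success -> TokenResult.Success(first.value to second.value, second.rest)
        }
    }
}

// Example token type, as a lexer might produce it.
sealed class Tok {
    data class Ident(val name: String) : Tok()
    data class Num(val value: Int) : Tok()
    object Equals : Tok()
}

fun main() {
    val ident = token<Tok>("an identifier") { it is Tok.Ident }
    val equals = token<Tok>("'='") { it is Tok.Equals }
    val num = token<Tok>("a number") { it is Tok.Num }

    // Parses the token sequence: identifier '=' number
    val assignment = ident then equals then num

    val ok = assignment(listOf(Tok.Ident("x"), Tok.Equals, Tok.Num(42)))
    println(ok is TokenResult.Success) // true

    val bad = assignment(listOf(Tok.Ident("x"), Tok.Num(42)))
    println((bad as TokenResult.Failure).message) // Expected '=', but found Num(value=42)
}
```

The same `token` and `then` functions would work unchanged with `Char` as the token type, which is the sense in which text parsing is a special case of token parsing.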
The combinator DSL in Parsek provides a powerful and expressive way to define parsers. It allows you to combine multiple parsers into a single cohesive unit, producing a structured output object. Additionally, the DSL provides mechanisms to handle failure responses explicitly, enabling custom error handling and recovery strategies.
combi: A DSL block for combining multiple parsers, handling their results and errors in a structured way.
bind: Used inside a combi block to run a parser and extract its result, or fail if the parser fails.

val keyValueParser = combi<Char, Pair<String, String>> {
    val key = repeat(char { it.isLetterOrDigit() }).bind()
    -char('=') // consume '=' (its result is not used)
    val value = repeat(char { it.isLetterOrDigit() || it == ' ' }, min = 1).bind()
    key to value
}

keyValueParser("username=John Doe") // Success: ("username" to "John Doe")

In this example:

The key parser extracts the key (e.g., username).
The value parser extracts the value (e.g., John Doe).
The results are combined into a Pair object.

The combinator DSL also allows you to handle failure responses explicitly. This is useful when you want to provide custom error messages or fallback behavior.
val safeKeyValueParser = combi {
    val key = repeat(char { it.isLetterOrDigit() }).bind()
    if (key.isEmpty()) {
        fail("Key cannot be empty")
    }
    -char('=')
    val value = repeat(char { it.isLetterOrDigit() || it == ' ' }, min = 1).bind()
    if (value.isEmpty()) {
        fail("Value cannot be empty")
    }
    key to value
}

val result = safeKeyValueParser("=John Doe")
if (result is Parser.Failure) {
    println("Parsing failed: ${result.message}")
}

The bindAsResult and Result.bind methods allow you to customize how failures are handled within a parser. These methods are particularly useful when you want to propagate or transform failure results explicitly.
val repeatedParser = combi {
    val list = mutableListOf<String>()
    val parser = char { it.isLetter() }
    while (list.size < 3) { // Ensure at least 3 elements are parsed
        when (val result = parser.bindAsResult()) {
            is Parser.Success -> list.add(result.bind())
            is Parser.Failure -> fail("Only ${list.size} elements found, needed at least 3")
        }
    }
    while (list.size < 5) { // Parse up to 5 elements
        when (val result = parser.bindAsResult()) {
            is Parser.Success -> list.add(result.bind())
            is Parser.Failure -> break
        }
    }
    list
}

repeatedParser("abcde") shouldBe listOf("a", "b", "c", "d", "e")
shouldThrowMessage<ParseException>("Combinator failed, parser number 3 with error: Only 2 elements found, needed at least 3") {
    repeatedParser("ab") shouldBe listOf("a", "b")
}

Parsek is designed to be lightweight and efficient for most parsing tasks. For very large inputs or performance-critical applications, consider benchmarking against alternatives. Contributions with benchmarks are welcome!
We welcome contributions! Please follow the Kotlin coding conventions and write tests for new features.
Parsek is licensed under the MIT License. See LICENSE for details.