
Parsek is a parser library for Kotlin consisting of a tokenizer (Lexer and Scanner) and a configurable expression parser. It supports data formats such as JSON and CSV as well as custom languages, offering configurable expression parsing and unlimited dynamic lookahead.
Tokenization is the process of splitting the input into a stream of tokens that is consumed by a parser. In Parsek, this task is split between two classes, Lexer and Scanner.
The lexer (source, kdoc) is basically an iterator for a stream of tokens that is generated by splitting the input using regular expressions.
Regular expressions are mapped to token types using a function which typically just returns a fixed token type inline. The function can be used to implement a second layer of mapping, but this should be fairly uncommon. Input mapped to null (typically whitespace) will not be reported.
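The regex-to-token-type mapping described above can be sketched as follows. This is a standalone, illustrative sketch of the idea, not Parsek's actual API; all names here are invented for the example.

```kotlin
// Illustrative sketch of regex-based tokenization -- the names below are
// invented for this example and are not Parsek's actual API.
enum class TokenType { NUMBER, IDENTIFIER, SYMBOL }

data class Token(val type: TokenType, val text: String, val position: Int)

// Each rule maps a regular expression to a function producing a token type.
// Returning null (here: for whitespace) suppresses the token.
val rules: List<Pair<Regex, (String) -> TokenType?>> = listOf(
    Regex("\\s+") to { _ -> null },                  // whitespace: not reported
    Regex("[0-9]+") to { _ -> TokenType.NUMBER },
    Regex("[A-Za-z_][A-Za-z0-9_]*") to { _ -> TokenType.IDENTIFIER },
    Regex("[+\\-*/()]") to { _ -> TokenType.SYMBOL },
)

fun tokenize(input: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var pos = 0
    while (pos < input.length) {
        var matched = false
        for ((regex, toType) in rules) {
            val match = regex.matchAt(input, pos) ?: continue
            toType(match.value)?.let { tokens.add(Token(it, match.value, pos)) }
            pos = match.range.last + 1
            matched = true
            break
        }
        check(matched) { "Unrecognized input at position $pos" }
    }
    return tokens
}
```

Tokenizing `"foo + 42"` with these rules yields an IDENTIFIER, a SYMBOL, and a NUMBER token; the two whitespace runs are matched but suppressed because their mapping function returns null.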
The lexer is usually not used directly; instead, it's handed in to the Scanner, which in turn is used by the parser.
The reason for the Lexer/Scanner split is to separate "raw" parsing from providing a nice and convenient API. The small API surface of the Lexer allows us to easily install additional processing between the Lexer and Scanner, for instance for context-sensitive newline filtering.
Typically, the Lexer is constructed directly inline where the Scanner is constructed.
The token class (source, kdoc) stores the token type (typically a user-defined enum), the token text and the token position. Token instances are generated by the Lexer.
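As a rough sketch, a token carrying these three pieces of information could look like the following; the field names and the generic type parameter are assumed for illustration, not Parsek's exact declaration.

```kotlin
// Illustrative sketch of a token -- names assumed for this example,
// not Parsek's exact declaration.
enum class MyTokenType { NUMBER, IDENTIFIER, SYMBOL }  // user-defined token types

data class Token<T>(
    val type: T,        // typically a constant of a user-defined enum
    val text: String,   // the matched input text
    val line: Int,      // position information, e.g. for error reporting
    val column: Int,
)

val token = Token(MyTokenType.NUMBER, "42", line = 1, column = 5)
```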
The RegularExpressions object (source, kdoc) contains a set of useful regular expressions for source code and data format tokenization.
The Scanner class (source, kdoc) provides a simple API for convenient access to the token stream generated by the Lexer.
The scanner provides a notion of a "current" token that can be inspected multiple times, as opposed to iterator.next(), where the current token is "gone" after the call. This makes it easy to hand the scanner with the current token down in a recursive descent parser until it is consumed and processed by the corresponding handler.
It provides unlimited dynamic lookahead.
It provides a tryConsume() convenience method that checks the current token against a given token text and, if it matches, consumes the token and returns true.
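The three ideas above (a re-inspectable current token, dynamic lookahead, and tryConsume) can be sketched with a simplified standalone model; this is not Parsek's actual Scanner API, just an illustration of the pattern over a list of token strings.

```kotlin
// Simplified standalone model of the "current token" / lookahead / tryConsume
// pattern -- illustrative only, not Parsek's actual Scanner API.
class SimpleScanner(private val tokens: List<String>) {
    private var pos = 0

    // The current token can be inspected any number of times without consuming it.
    val current: String get() = if (pos < tokens.size) tokens[pos] else "<eof>"

    // Unlimited dynamic lookahead: peek n tokens ahead without consuming.
    fun lookAhead(n: Int): String =
        if (pos + n < tokens.size) tokens[pos + n] else "<eof>"

    // Consume the current token only if it matches [text]; report success.
    fun tryConsume(text: String): Boolean =
        (current == text).also { if (it) pos++ }
}
```

A caller can branch on tryConsume() without separate peek/advance steps, e.g. `if (scanner.tryConsume("(")) { ... }`.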
Typical use cases that only need a scanner and no expression parser are data formats such as JSON or CSV.
For a simple example, please refer to the JSON parser example.
The configurable expression parser (source, kdoc) operates on a tokenizer, is stateless, and should be shared / reused.
A simple example that evaluates mathematical expressions directly (as opposed to building an explicit parse tree) can be found in the tests.
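Direct evaluation without an explicit parse tree can be sketched as follows, using precedence climbing over a token list. This is a self-contained illustration of the technique, not Parsek's ExpressionParser API; it handles only integers and the four basic operators, without whitespace.

```kotlin
// Standalone sketch of direct expression evaluation (no explicit parse tree)
// via precedence climbing -- illustrative only, not Parsek's actual API.
class Evaluator(expr: String) {
    // Minimal tokenizer: integers and single-character operators/parens.
    private val tokens = Regex("[0-9]+|[+*/()-]").findAll(expr).map { it.value }.toList()
    private var pos = 0
    private fun peek() = if (pos < tokens.size) tokens[pos] else ""
    private fun next() = tokens[pos++]

    private val precedence = mapOf("+" to 1, "-" to 1, "*" to 2, "/" to 2)

    fun evaluate(): Double = parseExpression(0)

    // Consume operators at or above [minPrecedence], recursing with a higher
    // minimum for the right operand so that * and / bind tighter than + and -.
    private fun parseExpression(minPrecedence: Int): Double {
        var left = parsePrimary()
        while (precedence[peek()]?.let { it >= minPrecedence } == true) {
            val op = next()
            val right = parseExpression(precedence.getValue(op) + 1)
            left = when (op) {
                "+" -> left + right
                "-" -> left - right
                "*" -> left * right
                else -> left / right
            }
        }
        return left
    }

    private fun parsePrimary(): Double =
        if (peek() == "(") { next(); parseExpression(0).also { next() /* ")" */ } }
        else next().toDouble()
}
```

For example, `Evaluator("1+2*3").evaluate()` yields 7.0, while `Evaluator("(1+2)*3").evaluate()` yields 9.0.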
A complete PL/0 parser is included in the examples module to illustrate how to use the expression parser and tokenizer for a simple but computationally complete language: Parser.kt, Pl0Test.kt
A parser for mathematical expressions: ExpressionParser.kt, ExpressionsTest.kt
A simple example for using the scanner and expression parser to implement a simple indentation-based programming language: mython, MythonTest.kt
A BASIC interpreter using Parsek: https://github.com/stefanhaustein/basik