kotlinx-charset

Minimal charset support for Kotlin Multiplatform that replicates parts of the JDK's Charset API, with more restrictions. It provides foundational components for creating charsets, EBCDIC support, and JS/Wasm-compatible encoding and decoding.

Platforms: JVM, Kotlin/Native, Wasm, JS
GitHub stars: 5
Author: lppedd
Open issues: 0
License: MIT License
Creation date: 10 months ago
Last activity: about 1 month ago
Latest release: 0.1.8 (about 1 month ago)

kotlinx-charset

Minimal charset support for Kotlin Multiplatform

Changelog

See CHANGELOG.md.

Supported Kotlin platforms

All Kotlin platforms are supported, except for Android Native.

Design considerations

kotlinx-charset aims to be straightforward for existing JDK consumers.
It replicates parts of the JDK's Charset API, albeit with more restrictions.

The core types you will work with are:

XCharset - describes a character set
XCharsetDecoder - decodes a byte sequence into characters
XCharsetEncoder - encodes characters into a byte sequence
XCharsetRegistrar - a registry and lookup service for obtaining XCharset instances

core

The core module provides the foundational components for implementing and registering new charsets.

// Create an empty charset registry
private val registrar = XCharsetRegistrar()

// Register a new charset
registrar.registerCharset(YourCharset())

// Retrieve and use a charset
val charset = registrar.getCharset("YourCharsetName")
val decoder = charset.newDecoder()
val encoder = charset.newEncoder()

ebcdic

The ebcdic module adds support for:

IBM037  IBM930   IBM1144
IBM273  IBM937   IBM1146
IBM278  IBM939   IBM1147
IBM280  IBM1047  IBM1390
IBM285  IBM1141  IBM1399
IBM297  IBM1143

You can register the supported EBCDIC charsets with your XCharsetRegistrar via the provideCharsets function.

import com.lppedd.kotlinx.charset.ebcdic.provideCharsets as provideEbcdicCharsets

// Your shared charset registry
private val registrar = XCharsetRegistrar()

provideEbcdicCharsets(registrar)
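
To give a sense of what an EBCDIC charset does at the byte level, here is a minimal illustrative sketch. It is not how kotlinx-charset implements its charsets: `IBM037_SUBSET` and `decodeIbm037Subset` are hypothetical names, and the table is a small hand-written subset of the IBM037 code page covering only a few characters.

```typescript
// Illustration only: a tiny hand-written subset of the IBM037 code page.
// Hypothetical names; not part of kotlinx-charset.
const IBM037_SUBSET = new Map<number, string>([
  [0x40, " "],
  [0xc1, "A"], [0xc2, "B"], [0xc3, "C"],
  [0xf0, "0"], [0xf1, "1"], [0xf2, "2"],
]);

// Decodes bytes using the subset above; unmapped bytes become U+FFFD.
function decodeIbm037Subset(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += IBM037_SUBSET.get(bytes[i]) ?? "\uFFFD";
  }
  return out;
}
```

Note that the same letters and digits sit at different byte values than in ASCII, which is why a dedicated decoder is needed for this data.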

exported

The exported module allows JS consumers to decode bytes and encode strings, using top-level functions exported from ECMAScript modules.

Tip: Avoid using this module when consuming kotlinx-charset from a Kotlin project.

You can depend on the @lppedd/kotlinx-charset npm package.
For example, consuming the library from TypeScript would look like:

import { decode, encode, getCharset } from "@lppedd/kotlinx-charset";

function funsExample(bytes: Uint8Array): Uint8Array {
  const str = decode("ibm037", bytes);
  return encode("ibm037", str);
}

// Alternatively, you can interact with a charset instance directly.
// This allows setting or removing (by passing null) the replacement character.
function instanceExample(bytes: Uint8Array): Uint8Array {
  const charset = getCharset("ibm037");

  const decoder = charset.newDecoder();
  const str = decoder.decode(bytes);

  const encoder = charset.newEncoder();
  return encoder.encode(str);
}

Both the decode and encode functions will throw an Error if the specified charset does not exist or if an error occurs during data processing.
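
Because both functions throw, callers that handle untrusted input may prefer to turn failures into values. The following is a minimal sketch, not part of the library: `tryDecode`, `DecodeFn`, and `DecodeResult` are hypothetical names, and the decode function is passed in as a parameter (assumed to have the `(charsetName, bytes) => string` shape used in the example above) so that the snippet stays self-contained.

```typescript
// Hypothetical helper, not part of @lppedd/kotlinx-charset.
// The shape of DecodeFn is an assumption based on the usage shown above.
type DecodeFn = (charsetName: string, bytes: Uint8Array) => string;

type DecodeResult =
  | { ok: true; value: string }
  | { ok: false; error: Error };

// Calls the given decode function and converts a thrown error into a result value.
function tryDecode(decodeFn: DecodeFn, charsetName: string, bytes: Uint8Array): DecodeResult {
  try {
    return { ok: true, value: decodeFn(charsetName, bytes) };
  } catch (e) {
    // Normalize non-Error throwables so callers always receive an Error instance
    const error = e instanceof Error ? e : new Error(String(e));
    return { ok: false, error };
  }
}
```

With the real package, this could be called as `tryDecode(decode, "ibm037", bytes)`; the same pattern would apply to `encode`.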
