
Composable DSL for async operations implementing Timeout, Retry (with backoffs), Circuit Breaker, Rate Limiter, Bulkhead, Hedging, an in-memory TTL Cache, and Fallback, with a real-time telemetry stream.
A Kotlin Multiplatform library providing resilience patterns (Timeout, Retry, Circuit Breaker, Rate Limiter, Bulkhead, Hedging, Cache, Fallback) for suspend functions. Compose them declaratively with a small DSL and observe runtime telemetry via Flow.
import com.santimattius.resilient.composition.resilient
import com.santimattius.resilient.retry.ExponentialBackoff
import kotlin.time.Duration.Companion.seconds
import kotlin.time.Duration.Companion.milliseconds
val policy = resilient {
    timeout { timeout = 2.seconds }
    retry {
        maxAttempts = 3
        backoffStrategy = ExponentialBackoff(initialDelay = 100.milliseconds)
    }
    circuitBreaker {
        failureThreshold = 5
        successThreshold = 2
        timeout = 30.seconds
    }
}
suspend fun call(): String = policy.execute { fetchData() }
Observe runtime events from policy.events (SharedFlow):
val job = scope.launch {
    policy.events.collect { event -> println(event) }
}
Policies compose Outer → Inner by default. The composition order is configurable: use compositionOrder() to specify a custom order.
import com.santimattius.resilient.composition.OrderablePolicyType
val policy = resilient(scope) {
    compositionOrder(listOf(
        OrderablePolicyType.CACHE,           // Check cache first (after Fallback)
        OrderablePolicyType.TIMEOUT,         // Then apply timeout
        OrderablePolicyType.RETRY,           // Retry on failures
        OrderablePolicyType.CIRCUIT_BREAKER,
        OrderablePolicyType.RATE_LIMITER,
        OrderablePolicyType.BULKHEAD,
        OrderablePolicyType.HEDGING
    ))
    // Fallback is automatically added as the outermost policy
    // ... configure policies
}
Important Notes:
- Fallback is not part of OrderablePolicyType; it is always positioned outermost automatically.
- Use compositionOrder(listOf(..., OrderablePolicyType.RETRY, OrderablePolicyType.TIMEOUT, ...)) if you want a per-attempt timeout.

The timeout policy aborts the operation after a configured duration. onTimeout runs only on actual timeouts. This is a single time limit for the block as wrapped by the policy (see the Timeout vs. Retry ordering note above). For a timeout per retry attempt, use Retry with perAttemptTimeout instead.
val policy = resilient {
    timeout {
        timeout = 3.seconds
        onTimeout = { /* log metric */ }
    }
}
Retries failing operations according to a backoff strategy and predicate. Use shouldRetry to avoid retrying non-transient errors (e.g. 4xx client errors); retry only on 5xx, network/IO, or timeouts. Optional perAttemptTimeout limits each attempt (including the first) so a single slow attempt does not consume the whole retry budget.
import com.santimattius.resilient.retry.*
val policy = resilient {
    retry {
        maxAttempts = 4
        shouldRetry = { it is java.io.IOException } // e.g. only retry IO; do not retry 4xx
        perAttemptTimeout = 5.seconds // optional: timeout per attempt
        backoffStrategy = ExponentialBackoff(
            initialDelay = 200.milliseconds,
            maxDelay = 5.seconds,
            factor = 2.0,
            jitter = true
        )
    }
}
Supported backoffs: ExponentialBackoff, LinearBackoff, FixedBackoff.
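As an illustration of how the three strategies shape the wait between attempts (a simplified sketch, not the library's implementation; the function names and the jitter-free formulas here are assumptions):

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds

// Illustrative delay before attempt n (1-based), ignoring jitter:
// Fixed:       delay(n) = initialDelay
// Linear:      delay(n) = initialDelay * n
// Exponential: delay(n) = initialDelay * factor^(n - 1), capped at maxDelay
fun fixedDelay(initial: Duration): Duration = initial

fun linearDelay(initial: Duration, attempt: Int): Duration = initial * attempt

fun exponentialDelay(
    initial: Duration,
    factor: Double,
    maxDelay: Duration,
    attempt: Int,
): Duration {
    var d = initial
    repeat(attempt - 1) { d *= factor }
    return if (d > maxDelay) maxDelay else d
}
```

With initialDelay = 200.milliseconds and factor = 2.0, the exponential delays run 200 ms, 400 ms, 800 ms, ... until maxDelay caps them; jitter = true would randomize each value to avoid thundering herds.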
Prevents hammering failing downstreams. States: CLOSED → OPEN → HALF_OPEN.
val policy = resilient {
    circuitBreaker {
        failureThreshold = 5
        successThreshold = 2
        halfOpenMaxCalls = 1
        timeout = 60.seconds // OPEN duration
    }
}
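To make the state machine concrete, here is a minimal illustrative sketch of the transitions (not the library's implementation — the real policy also drives OPEN → HALF_OPEN automatically after the configured timeout and enforces halfOpenMaxCalls):

```kotlin
enum class State { CLOSED, OPEN, HALF_OPEN }

// Sketch: trips OPEN after `failureThreshold` consecutive failures,
// closes again after `successThreshold` consecutive successes in HALF_OPEN.
class SketchBreaker(
    private val failureThreshold: Int,
    private val successThreshold: Int,
) {
    var state = State.CLOSED
        private set
    private var failures = 0
    private var successes = 0

    fun onFailure() {
        successes = 0
        failures++
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN
            failures = 0
        }
    }

    fun onSuccess() {
        failures = 0
        if (state == State.HALF_OPEN) {
            successes++
            if (successes >= successThreshold) {
                state = State.CLOSED
                successes = 0
            }
        }
    }

    // In the real policy this happens automatically once `timeout` elapses.
    fun tryHalfOpen() {
        if (state == State.OPEN) state = State.HALF_OPEN
    }
}
```

While OPEN, calls fail fast without reaching the downstream; HALF_OPEN lets a limited number of probe calls through to test recovery.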
Token-bucket rate limiting: tokens refill at the start of each period (fixed-window refill). With maxCalls = 10 and period = 1.seconds, at most 10 calls per second are allowed. One bucket per policy (global limit). Optional max wait when limited.
val policy = resilient {
    rateLimiter {
        maxCalls = 10
        period = 1.seconds
        timeoutWhenLimited = 5.seconds // throw if wait would exceed
        onRateLimited = { /* metric */ }
    }
}
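The fixed-window refill behaviour can be sketched as follows (illustrative only, with an injected clock so the behaviour is deterministic; the library's internals may differ):

```kotlin
// Sketch of a fixed-window limiter: the bucket refills to `maxCalls`
// at the start of each window; tryAcquire() spends one token or is refused.
class SketchRateLimiter(
    private val maxCalls: Int,
    private val periodMillis: Long,
) {
    private var windowStart = 0L
    private var tokens = maxCalls

    fun tryAcquire(nowMillis: Long): Boolean {
        if (nowMillis - windowStart >= periodMillis) {
            // New window: refill all tokens at once (fixed-window refill).
            windowStart = nowMillis
            tokens = maxCalls
        }
        return if (tokens > 0) { tokens--; true } else false
    }
}
```

Note the burst profile this implies: all maxCalls tokens are available at the window start, rather than trickling in continuously.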
Limit concurrent executions and queued waiters; optional acquire timeout.
val policy = resilient {
    bulkhead {
        maxConcurrentCalls = 8
        maxWaitingCalls = 32
        timeout = 2.seconds
    }
}
Launch parallel attempts and return the first successful result. Use carefully (extra load).
val policy = resilient {
    hedging {
        attempts = 3
        stagger = 50.milliseconds
    }
}
Cache successful results per key with TTL. Use a fixed key or a keyProvider (suspend) for dynamic keys (e.g. per user, per request). Invalidation is available via policy.cacheHandle when cache is configured.
val policy = resilient(scope) {
    cache {
        key = "users:123" // fixed key
        // or keyProvider = { "user:${userId}" } // dynamic key
        ttl = 30.seconds
    }
}
// Invalidate by key or prefix (e.g. after write)
policy.cacheHandle?.invalidate("users:123")
policy.cacheHandle?.invalidatePrefix("user:")
The in-memory implementation is one possible CachePolicy; custom backends (e.g. persistent storage or Redis) can implement the same interface.
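To illustrate the TTL and invalidation semantics described above (a minimal sketch with an injected clock, not the actual CachePolicy interface):

```kotlin
// Illustrative TTL cache keyed by String; `nowMillis` is passed in
// so expiry is deterministic. Not the library's CachePolicy interface.
class SketchTtlCache<V>(private val ttlMillis: Long) {
    private data class Entry<V>(val value: V, val expiresAt: Long)
    private val entries = HashMap<String, Entry<V>>()

    fun put(key: String, value: V, nowMillis: Long) {
        entries[key] = Entry(value, nowMillis + ttlMillis)
    }

    fun get(key: String, nowMillis: Long): V? {
        val e = entries[key] ?: return null
        if (nowMillis >= e.expiresAt) { entries.remove(key); return null }
        return e.value
    }

    fun invalidate(key: String) { entries.remove(key) }

    fun invalidatePrefix(prefix: String) {
        entries.keys.removeAll { it.startsWith(prefix) }
    }
}
```

A persistent or Redis-backed implementation would keep the same contract: store-on-success, expire-on-read, and key/prefix invalidation.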
Use policy.getHealthSnapshot() to build health or readiness endpoints (e.g. Kubernetes probes, /health API). The snapshot includes circuit breaker state and counters, and bulkhead usage when configured.
import com.santimattius.resilient.circuitbreaker.CircuitState
val snapshot = policy.getHealthSnapshot()
// Circuit breaker: is the circuit open?
val healthy = snapshot.circuitBreaker?.state != CircuitState.OPEN
// Optional: snapshot.circuitBreaker?.failureCount, successCount for metrics
// Bulkhead: active and waiting counts
snapshot.bulkhead?.let { bh ->
    // bh.activeConcurrentCalls, bh.waitingCalls, bh.maxConcurrentCalls, bh.maxWaitingCalls
}
Return a fallback value when the operation fails. Ensure the fallback return type matches the type returned by the block passed to execute(), otherwise a ClassCastException may occur at runtime.
import com.santimattius.resilient.fallback.FallbackConfig
val policy = resilient {
    fallback(FallbackConfig { err ->
        // produce a fallback from error (log/metric here)
        "fallback-value"
    })
}
val policy = resilient {
    cache { key = "profile:42"; ttl = 20.seconds }
    timeout { timeout = 2.seconds }
    retry {
        maxAttempts = 3
        backoffStrategy = ExponentialBackoff(initialDelay = 100.milliseconds)
    }
    circuitBreaker { failureThreshold = 5; successThreshold = 2; halfOpenMaxCalls = 1 }
    rateLimiter { maxCalls = 50; period = 1.seconds }
    bulkhead { maxConcurrentCalls = 10; maxWaitingCalls = 100 }
    hedging { attempts = 2 }
    fallback(FallbackConfig { "cached-or-default" })
}
suspend fun load(): User = policy.execute { loadProfile() }
A ready-to-run sample is available at:
androidApp/src/main/kotlin/com/santimattius/kmp/ResilientExample.kt

Use it from MainActivity:
setContent { ResilientExample() }