
Composable DSL for async operations implementing Timeout, Retry (with backoffs), Circuit Breaker, Rate Limiter, Bulkhead, Hedging, in-memory TTL Cache and Fallback, with real-time telemetry stream.
A Kotlin Multiplatform library providing resilience patterns (Timeout, Retry, Circuit Breaker, Rate Limiter, Bulkhead, Hedging, Cache, Fallback) for suspend functions. Compose them declaratively with a small DSL and observe runtime telemetry via Flow.
import com.santimattius.resilient.composition.resilient
import com.santimattius.resilient.retry.ExponentialBackoff
import kotlin.time.Duration.Companion.seconds
import kotlin.time.Duration.Companion.milliseconds
val policy = resilient {
timeout { timeout = 2.seconds }
retry {
maxAttempts = 3
backoffStrategy = ExponentialBackoff(initialDelay = 100.milliseconds)
}
circuitBreaker {
failureThreshold = 5
successThreshold = 2
timeout = 30.seconds
}
}
suspend fun call(): String = policy.execute { fetchData() }Observe runtime events from policy.events (SharedFlow):
val job = scope.launch {
policy.events.collect { event -> println(event) }
}Outer → Inner (default):
The composition order is configurable. Use compositionOrder() to specify a custom order:
import com.santimattius.resilient.composition.OrderablePolicyType
val policy = resilient(scope) {
compositionOrder(listOf(
OrderablePolicyType.CACHE, // Check cache first (after Fallback)
OrderablePolicyType.COALESCE, // Deduplicate in-flight requests by key
OrderablePolicyType.TIMEOUT, // Then apply timeout
OrderablePolicyType.RETRY, // Retry on failures
OrderablePolicyType.CIRCUIT_BREAKER,
OrderablePolicyType.RATE_LIMITER,
OrderablePolicyType.BULKHEAD,
OrderablePolicyType.HEDGING
))
// Fallback is automatically added as the outermost policy
// ... configure policies
}Important Notes:
OrderablePolicyType - it is always positioned outermost automaticallycompositionOrder(...) accepts a subset: policy types not included are appended in the default ordercompositionOrder(listOf(..., OrderablePolicyType.RETRY, OrderablePolicyType.TIMEOUT, ...)) if you want per-attempt timeout.Aborts the operation after a configured duration. onTimeout runs only on actual timeouts. This is a single time limit for the block as wrapped by the policy (see Timeout vs Retry order). For a timeout per retry attempt, use Retry with perAttemptTimeout instead.
val policy = resilient {
timeout {
timeout = 3.seconds
onTimeout = { /* log metric */ }
}
}Retries failing operations according to a backoff strategy and predicate. Use shouldRetry to avoid retrying non-transient errors (e.g. 4xx client errors); retry only on 5xx, network/IO, or timeouts. Optional perAttemptTimeout limits each attempt (including the first) so a single slow attempt does not consume the whole retry budget.
Optional shouldRetryResult retries when the block returns successfully but the value is not acceptable (e.g. HTTP 202, empty body, “not ready” flag). Return true from the predicate to request another attempt; it shares the same maxAttempts budget as exception-based retries. onRetry receives RetryableResultException (with lastValue) for telemetry; if attempts are exhausted while the predicate stays true, the last returned value is returned (unlike exception exhaustion, which rethrows).
import com.santimattius.resilient.retry.*
val policy = resilient {
retry {
maxAttempts = 4
shouldRetry = { it is java.io.IOException } // e.g. only retry IO; do not retry 4xx
shouldRetryResult = { status -> (status as Int) >= 500 } // e.g. retry on 5xx-like codes returned as value
perAttemptTimeout = 5.seconds // optional: timeout per attempt
backoffStrategy = ExponentialBackoff(
initialDelay = 200.milliseconds,
maxDelay = 5.seconds,
factor = 2.0,
jitter = true
)
}
}Supported backoffs: ExponentialBackoff, LinearBackoff, FixedBackoff, DecorrelatedJitterBackoff.
DecorrelatedJitterBackoff implements the AWS decorrelated jitter formula — each caller's delay sequence is independent of every other caller's, which breaks synchronized retry waves in high-concurrency scenarios:
import com.santimattius.resilient.retry.DecorrelatedJitterBackoff
val policy = resilient {
retry {
maxAttempts = 4
backoffStrategy = DecorrelatedJitterBackoff(base = 100.milliseconds, cap = 10.seconds)
}
}Prevents hammering failing downstreams. States: CLOSED → OPEN → HALF_OPEN.
Three trip modes are available:
failureThreshold consecutive failures open the circuit; a success resets the count.slidingWindow): failureThreshold failures within a time window open the circuit.failureRateThreshold): circuit opens when the failure rate over the last minimumNumberOfCalls outcomes meets or exceeds a percentage.val policy = resilient(scope) {
circuitBreaker {
failureThreshold = 5
successThreshold = 2
halfOpenMaxCalls = 1
timeout = 60.seconds
// slidingWindow = 30.seconds // time-based mode (mutually exclusive with failureRateThreshold)
// failureRateThreshold = 50.0 // failure-rate mode — opens at 50% over last minimumNumberOfCalls
// minimumNumberOfCalls = 10 // ring-buffer size (default 10)
}
}shouldRecordResult counts a successful return value as a failure (while still returning it to the caller) — useful when the downstream signals errors inside HTTP 200 responses:
val policy = resilient(scope) {
circuitBreaker {
failureThreshold = 3
shouldRecordResult = { result -> result is ApiResponse && result.code == 503 }
}
}Named / shared circuit breaker: use CircuitBreakerRegistry so multiple policies share the same breaker state (e.g. all "payments" calls trip a single breaker). Cannot combine circuitBreaker { } and circuitBreakerNamed(...) in the same policy.
import com.santimattius.resilient.circuitbreaker.CircuitBreakerRegistry
val circuitBreakers = CircuitBreakerRegistry()
val policyA = resilient(scope) {
circuitBreakerNamed(circuitBreakers, "payments") {
failureThreshold = 5
timeout = 30.seconds
}
}
val policyB = resilient(scope) {
circuitBreakerNamed(circuitBreakers, "payments") { } // same instance as policyA
}Token-bucket rate limiting: tokens refill at the start of each period (fixed-window refill). With maxCalls = 10 and period = 1.seconds, at most 10 calls per second are allowed. One bucket per policy (global limit). Optional max wait when limited.
val policy = resilient {
rateLimiter {
maxCalls = 10
period = 1.seconds
timeoutWhenLimited = 5.seconds // throw if wait would exceed
onRateLimited = { /* metric */ }
}
}Named / shared rate limiter: use RateLimiterRegistry so multiple policies share the same token-bucket quota (e.g. all "payments" calls consume from the same pool). Cannot combine rateLimiter { } and rateLimiterNamed(...) in the same policy.
import com.santimattius.resilient.ratelimiter.RateLimiterRegistry
val rateLimiters = RateLimiterRegistry()
val policyA = resilient(scope) {
rateLimiterNamed(rateLimiters, "payments") { maxCalls = 10; period = 1.seconds }
}
val policyB = resilient(scope) {
rateLimiterNamed(rateLimiters, "payments") { } // same instance as policyA
}Limit concurrent executions and queued waiters; optional acquire timeout.
val policy = resilient(scope) {
bulkhead {
maxConcurrentCalls = 8
maxWaitingCalls = 32
timeout = 2.seconds
}
}Named / shared bulkhead: use BulkheadRegistry so several policies share the same pool (e.g. one limit for all "database" calls). Cannot combine bulkhead { } and bulkheadNamed(...) in the same policy.
import com.santimattius.resilient.bulkhead.BulkheadRegistry
val bulkheads = BulkheadRegistry()
val policyA = resilient(scope) {
bulkheadNamed(bulkheads, "database") {
maxConcurrentCalls = 4
maxWaitingCalls = 16
}
}
val policyB = resilient(scope) {
bulkheadNamed(bulkheads, "database") {
maxConcurrentCalls = 4 // first registration wins; same instance as policyA
}
}Launch parallel attempts and return the first successful result. Use carefully (extra load).
val policy = resilient {
hedging {
attempts = 3
stagger = 50.milliseconds
}
}Cache successful results per key with TTL. Use a fixed key or a keyProvider (suspend) for dynamic keys (e.g. per user, per request). Invalidation is available via policy.cacheHandle when cache is configured.
val policy = resilient(scope) {
cache {
key = "users:123" // fixed key
// or keyProvider = { "user:${userId}" } // dynamic key
ttl = 30.seconds
}
}
// Invalidate by key or prefix (e.g. after write)
policy.cacheHandle?.invalidate("users:123")
policy.cacheHandle?.invalidatePrefix("user:")The in-memory implementation is one possible CachePolicy; custom backends (e.g. persistent storage or Redis) can implement the same interface.
→ In-depth documentation
Deduplicates concurrent in-flight executions by key. If multiple callers resolve the same key while the operation is still running, only one block execution happens and all callers share the same result (or error). It does not cache completed results; for TTL caching, use cache { ... }.
val policy = resilient(scope) {
coalesce {
key = "profile:42" // fixed key
// or keyProvider = { "profile:${userId}" } // dynamic key
}
}By default you pass a ResilientScope explicitly. When working inside a framework that already manages a CoroutineScope (e.g. viewModelScope, lifecycleScope, rememberCoroutineScope), you can bind the policy lifecycle directly to that scope — no manual ResilientScope creation or policy.close() needed.
CoroutineScope.asResilientScope() — wraps an existing scope as a ResilientScope. Cancelling the outer scope automatically cancels all internal background jobs (cache cleanup, coalescing deduplication):
import com.santimattius.resilient.composition.asResilientScope
// Android ViewModel
class ProfileViewModel : ViewModel() {
private val scope = viewModelScope.asResilientScope()
private val policy = resilient(scope) {
cache { key = "profile"; ttl = 60.seconds }
retry { maxAttempts = 3 }
}
// When the ViewModel is cleared, viewModelScope is cancelled → scope and all policy
// background jobs are cancelled automatically. No policy.close() needed.
}CoroutineScope.resilient { } — shorthand that combines .asResilientScope() and resilient(scope) { } in one call:
import com.santimattius.resilient.composition.resilient
class ProfileViewModel : ViewModel() {
private val policy = viewModelScope.resilient {
cache { key = "profile"; ttl = 60.seconds }
retry { maxAttempts = 3 }
}
}Lifecycle contract:
CoroutineScope is cancelled → the derived scope and all its background jobs are cancelled automatically via structured concurrency.policy.close() stops only the internal policy jobs — it does not cancel the outer scope.CancellationException.Use policy.getHealthSnapshot() to build health or readiness endpoints (e.g. Kubernetes probes, /health API). The snapshot includes circuit breaker state, bulkhead usage, rate limiter quota, retry config, and cache stats when the corresponding policies are configured.
import com.santimattius.resilient.circuitbreaker.CircuitState
val snapshot = policy.getHealthSnapshot()
// Circuit breaker state
val healthy = snapshot.circuitBreaker?.state != CircuitState.OPEN
// snapshot.circuitBreaker?.failureCount, successCount
// Bulkhead: active and waiting counts
snapshot.bulkhead?.let { bh ->
// bh.activeConcurrentCalls, bh.waitingCalls, bh.maxConcurrentCalls, bh.maxWaitingCalls
}
// Rate limiter: remaining tokens and time until next refill
snapshot.rateLimiter?.let { rl ->
// rl.remainingCalls, rl.timeToRefill
}
// Retry: configured max attempts
snapshot.retry?.let { r ->
// r.maxAttempts
}
// Cache: entry count and cumulative hit rate (Double.NaN when no calls yet)
snapshot.cache?.let { c ->
// c.entryCount, c.hitRate
}Return a fallback value when the operation fails. Ensure the fallback return type matches the type returned by the block passed to execute(), otherwise a ClassCastException may occur at runtime.
import com.santimattius.resilient.fallback.FallbackConfig
val policy = resilient {
fallback(FallbackConfig { err ->
// produce a fallback from error (log/metric here)
"fallback-value"
})
}val policy = resilient {
cache { key = "profile:42"; ttl = 20.seconds }
timeout { timeout = 2.seconds }
retry {
maxAttempts = 3
backoffStrategy = ExponentialBackoff(initialDelay = 100.milliseconds)
}
circuitBreaker { failureThreshold = 5; successThreshold = 2; halfOpenMaxCalls = 1 }
rateLimiter { maxCalls = 50; period = 1.seconds }
bulkhead { maxConcurrentCalls = 10; maxWaitingCalls = 100 }
hedging { attempts = 2 }
fallback(FallbackConfig { "cached-or-default" })
}
suspend fun load(): User = policy.execute { loadProfile() }A ready-to-run sample is available at:
androidApp/src/main/kotlin/com/santimattius/kmp/ResilientExample.ktUse it from MainActivity:
setContent { ResilientExample() }The resilient-test module provides utilities for testing resilience policies with fault injection and simplified policy builders.
Use FaultInjector to simulate failures, delays, and intermittent behavior in your tests. Prefer failCount for unit tests — it produces deterministic results and avoids intermittent failures.
import com.santimattius.resilient.test.FaultInjector
import kotlin.time.Duration.Companion.milliseconds
// Deterministic (preferred for unit tests): fail the first N calls, then succeed
val injector = FaultInjector.builder()
.failCount(3) // fails calls 1–3, succeeds on call 4
.build()
// Probabilistic: use only for chaos / load tests, not unit tests
val chaosInjector = FaultInjector.builder()
.failureRate(0.3) // 30% chance of failure per call
.delay(50.milliseconds) // Add 50ms delay
.delayJitter(true) // Randomize delay ±20%
.exception { CustomException() }
.build()Configuration:
failCount(n: Int): Fail the first n calls deterministically, then always succeed. Takes precedence over failureRate. Preferred for unit tests.
failureRate(rate: Double): Probability of throwing an exception per call (0.0 = never, 1.0 = always). Ignored when failCount > 0.exception(block: () -> Throwable): Factory for the exception to throw (default: FaultInjectedException).delay(duration: Duration): Fixed delay before executing the block (default: Duration.ZERO).delayJitter(enable: Boolean): Randomize delay ±20% (default: false).Use PolicyBuilders to create policies with sensible test defaults:
import com.santimattius.resilient.test.PolicyBuilders
import com.santimattius.resilient.test.TestResilientScope
import kotlinx.coroutines.test.runTest
@Test
fun `test retry with fault injection`() = runTest {
val scope = TestResilientScope()
val policy = PolicyBuilders.retryPolicy(
scope,
maxAttempts = 3,
initialDelay = 10.milliseconds
)
val injector = FaultInjector.builder()
.failCount(2) // fail first 2 calls, succeed on the 3rd
.build()
val result = policy.execute {
injector.execute { "success" }
}
assertEquals("success", result)
}Available builders:
retryPolicy(scope, maxAttempts = 3, initialDelay = 10ms, maxDelay = 100ms, shouldRetry = { true })timeoutPolicy(scope, timeout = 1.second)circuitBreakerPolicy(scope, failureThreshold = 3, successThreshold = 2, timeout = 5.seconds)bulkheadPolicy(scope, maxConcurrentCalls = 2, maxWaitingCalls = 4)rateLimiterPolicy(scope, maxCalls = 5, period = 1.second)Use TestResilientScope to create a test-friendly scope for policies:
import com.santimattius.resilient.test.TestResilientScope
import kotlinx.coroutines.test.runTest
@Test
fun `test policy lifecycle`() = runTest {
val scope = TestResilientScope(this)
val policy = resilient(scope) {
retry { maxAttempts = 3 }
}
// ... test logic
scope.cancel() // cleanup
}Add the resilient-test module to your test dependencies:
// build.gradle.kts
kotlin {
sourceSets {
commonTest.dependencies {
implementation("io.github.santimattius.resilient:resilient-test:1.5.0")
implementation("org.jetbrains.kotlinx:kotlinx-coroutines-test:1.9.0")
}
}
}kotlinx-coroutines-test for virtual time with runTest and TestTimeSource.FaultInjector to simulate realistic failure scenarios (intermittent errors, slow responses).Theoretical and technical deep-dive documentation for each pattern — definition, when to apply, configuration reference, and extended examples:
| Pattern | Document |
|---|---|
| Timeout | docs/patterns/timeout.md |
| Retry | docs/patterns/retry.md |
| Circuit Breaker | docs/patterns/circuit-breaker.md |
| Rate Limiter | docs/patterns/rate-limiter.md |
| Bulkhead | docs/patterns/bulkhead.md |
| Hedging | docs/patterns/hedging.md |
| Cache | docs/patterns/cache.md |
| Fallback | docs/patterns/fallback.md |
→ Full index with descriptions
Contributions are welcome. Please read CONTRIBUTING.md for development setup, coding guidelines, and the pull request process.
This project is licensed under the Apache License 2.0.
A Kotlin Multiplatform library providing resilience patterns (Timeout, Retry, Circuit Breaker, Rate Limiter, Bulkhead, Hedging, Cache, Fallback) for suspend functions. Compose them declaratively with a small DSL and observe runtime telemetry via Flow.
import com.santimattius.resilient.composition.resilient
import com.santimattius.resilient.retry.ExponentialBackoff
import kotlin.time.Duration.Companion.seconds
import kotlin.time.Duration.Companion.milliseconds
val policy = resilient {
timeout { timeout = 2.seconds }
retry {
maxAttempts = 3
backoffStrategy = ExponentialBackoff(initialDelay = 100.milliseconds)
}
circuitBreaker {
failureThreshold = 5
successThreshold = 2
timeout = 30.seconds
}
}
suspend fun call(): String = policy.execute { fetchData() }Observe runtime events from policy.events (SharedFlow):
val job = scope.launch {
policy.events.collect { event -> println(event) }
}Outer → Inner (default):
The composition order is configurable. Use compositionOrder() to specify a custom order:
import com.santimattius.resilient.composition.OrderablePolicyType
val policy = resilient(scope) {
compositionOrder(listOf(
OrderablePolicyType.CACHE, // Check cache first (after Fallback)
OrderablePolicyType.COALESCE, // Deduplicate in-flight requests by key
OrderablePolicyType.TIMEOUT, // Then apply timeout
OrderablePolicyType.RETRY, // Retry on failures
OrderablePolicyType.CIRCUIT_BREAKER,
OrderablePolicyType.RATE_LIMITER,
OrderablePolicyType.BULKHEAD,
OrderablePolicyType.HEDGING
))
// Fallback is automatically added as the outermost policy
// ... configure policies
}Important Notes:
OrderablePolicyType - it is always positioned outermost automaticallycompositionOrder(...) accepts a subset: policy types not included are appended in the default ordercompositionOrder(listOf(..., OrderablePolicyType.RETRY, OrderablePolicyType.TIMEOUT, ...)) if you want per-attempt timeout.Aborts the operation after a configured duration. onTimeout runs only on actual timeouts. This is a single time limit for the block as wrapped by the policy (see Timeout vs Retry order). For a timeout per retry attempt, use Retry with perAttemptTimeout instead.
val policy = resilient {
timeout {
timeout = 3.seconds
onTimeout = { /* log metric */ }
}
}Retries failing operations according to a backoff strategy and predicate. Use shouldRetry to avoid retrying non-transient errors (e.g. 4xx client errors); retry only on 5xx, network/IO, or timeouts. Optional perAttemptTimeout limits each attempt (including the first) so a single slow attempt does not consume the whole retry budget.
Optional shouldRetryResult retries when the block returns successfully but the value is not acceptable (e.g. HTTP 202, empty body, “not ready” flag). Return true from the predicate to request another attempt; it shares the same maxAttempts budget as exception-based retries. onRetry receives RetryableResultException (with lastValue) for telemetry; if attempts are exhausted while the predicate stays true, the last returned value is returned (unlike exception exhaustion, which rethrows).
import com.santimattius.resilient.retry.*
val policy = resilient {
retry {
maxAttempts = 4
shouldRetry = { it is java.io.IOException } // e.g. only retry IO; do not retry 4xx
shouldRetryResult = { status -> (status as Int) >= 500 } // e.g. retry on 5xx-like codes returned as value
perAttemptTimeout = 5.seconds // optional: timeout per attempt
backoffStrategy = ExponentialBackoff(
initialDelay = 200.milliseconds,
maxDelay = 5.seconds,
factor = 2.0,
jitter = true
)
}
}Supported backoffs: ExponentialBackoff, LinearBackoff, FixedBackoff, DecorrelatedJitterBackoff.
DecorrelatedJitterBackoff implements the AWS decorrelated jitter formula — each caller's delay sequence is independent of every other caller's, which breaks synchronized retry waves in high-concurrency scenarios:
import com.santimattius.resilient.retry.DecorrelatedJitterBackoff
val policy = resilient {
retry {
maxAttempts = 4
backoffStrategy = DecorrelatedJitterBackoff(base = 100.milliseconds, cap = 10.seconds)
}
}Prevents hammering failing downstreams. States: CLOSED → OPEN → HALF_OPEN.
Three trip modes are available:
failureThreshold consecutive failures open the circuit; a success resets the count.slidingWindow): failureThreshold failures within a time window open the circuit.failureRateThreshold): circuit opens when the failure rate over the last minimumNumberOfCalls outcomes meets or exceeds a percentage.val policy = resilient(scope) {
circuitBreaker {
failureThreshold = 5
successThreshold = 2
halfOpenMaxCalls = 1
timeout = 60.seconds
// slidingWindow = 30.seconds // time-based mode (mutually exclusive with failureRateThreshold)
// failureRateThreshold = 50.0 // failure-rate mode — opens at 50% over last minimumNumberOfCalls
// minimumNumberOfCalls = 10 // ring-buffer size (default 10)
}
}shouldRecordResult counts a successful return value as a failure (while still returning it to the caller) — useful when the downstream signals errors inside HTTP 200 responses:
val policy = resilient(scope) {
circuitBreaker {
failureThreshold = 3
shouldRecordResult = { result -> result is ApiResponse && result.code == 503 }
}
}Named / shared circuit breaker: use CircuitBreakerRegistry so multiple policies share the same breaker state (e.g. all "payments" calls trip a single breaker). Cannot combine circuitBreaker { } and circuitBreakerNamed(...) in the same policy.
import com.santimattius.resilient.circuitbreaker.CircuitBreakerRegistry
val circuitBreakers = CircuitBreakerRegistry()
val policyA = resilient(scope) {
circuitBreakerNamed(circuitBreakers, "payments") {
failureThreshold = 5
timeout = 30.seconds
}
}
val policyB = resilient(scope) {
circuitBreakerNamed(circuitBreakers, "payments") { } // same instance as policyA
}Token-bucket rate limiting: tokens refill at the start of each period (fixed-window refill). With maxCalls = 10 and period = 1.seconds, at most 10 calls per second are allowed. One bucket per policy (global limit). Optional max wait when limited.
val policy = resilient {
rateLimiter {
maxCalls = 10
period = 1.seconds
timeoutWhenLimited = 5.seconds // throw if wait would exceed
onRateLimited = { /* metric */ }
}
}Named / shared rate limiter: use RateLimiterRegistry so multiple policies share the same token-bucket quota (e.g. all "payments" calls consume from the same pool). Cannot combine rateLimiter { } and rateLimiterNamed(...) in the same policy.
import com.santimattius.resilient.ratelimiter.RateLimiterRegistry
val rateLimiters = RateLimiterRegistry()
val policyA = resilient(scope) {
rateLimiterNamed(rateLimiters, "payments") { maxCalls = 10; period = 1.seconds }
}
val policyB = resilient(scope) {
rateLimiterNamed(rateLimiters, "payments") { } // same instance as policyA
}Limit concurrent executions and queued waiters; optional acquire timeout.
val policy = resilient(scope) {
bulkhead {
maxConcurrentCalls = 8
maxWaitingCalls = 32
timeout = 2.seconds
}
}Named / shared bulkhead: use BulkheadRegistry so several policies share the same pool (e.g. one limit for all "database" calls). Cannot combine bulkhead { } and bulkheadNamed(...) in the same policy.
import com.santimattius.resilient.bulkhead.BulkheadRegistry
val bulkheads = BulkheadRegistry()
val policyA = resilient(scope) {
bulkheadNamed(bulkheads, "database") {
maxConcurrentCalls = 4
maxWaitingCalls = 16
}
}
val policyB = resilient(scope) {
bulkheadNamed(bulkheads, "database") {
maxConcurrentCalls = 4 // first registration wins; same instance as policyA
}
}Launch parallel attempts and return the first successful result. Use carefully (extra load).
val policy = resilient {
hedging {
attempts = 3
stagger = 50.milliseconds
}
}Cache successful results per key with TTL. Use a fixed key or a keyProvider (suspend) for dynamic keys (e.g. per user, per request). Invalidation is available via policy.cacheHandle when cache is configured.
val policy = resilient(scope) {
cache {
key = "users:123" // fixed key
// or keyProvider = { "user:${userId}" } // dynamic key
ttl = 30.seconds
}
}
// Invalidate by key or prefix (e.g. after write)
policy.cacheHandle?.invalidate("users:123")
policy.cacheHandle?.invalidatePrefix("user:")The in-memory implementation is one possible CachePolicy; custom backends (e.g. persistent storage or Redis) can implement the same interface.
→ In-depth documentation
Deduplicates concurrent in-flight executions by key. If multiple callers resolve the same key while the operation is still running, only one block execution happens and all callers share the same result (or error). It does not cache completed results; for TTL caching, use cache { ... }.
val policy = resilient(scope) {
coalesce {
key = "profile:42" // fixed key
// or keyProvider = { "profile:${userId}" } // dynamic key
}
}By default you pass a ResilientScope explicitly. When working inside a framework that already manages a CoroutineScope (e.g. viewModelScope, lifecycleScope, rememberCoroutineScope), you can bind the policy lifecycle directly to that scope — no manual ResilientScope creation or policy.close() needed.
CoroutineScope.asResilientScope() — wraps an existing scope as a ResilientScope. Cancelling the outer scope automatically cancels all internal background jobs (cache cleanup, coalescing deduplication):
import com.santimattius.resilient.composition.asResilientScope
// Android ViewModel
class ProfileViewModel : ViewModel() {
private val scope = viewModelScope.asResilientScope()
private val policy = resilient(scope) {
cache { key = "profile"; ttl = 60.seconds }
retry { maxAttempts = 3 }
}
// When the ViewModel is cleared, viewModelScope is cancelled → scope and all policy
// background jobs are cancelled automatically. No policy.close() needed.
}CoroutineScope.resilient { } — shorthand that combines .asResilientScope() and resilient(scope) { } in one call:
import com.santimattius.resilient.composition.resilient
class ProfileViewModel : ViewModel() {
private val policy = viewModelScope.resilient {
cache { key = "profile"; ttl = 60.seconds }
retry { maxAttempts = 3 }
}
}Lifecycle contract:
CoroutineScope is cancelled → the derived scope and all its background jobs are cancelled automatically via structured concurrency.policy.close() stops only the internal policy jobs — it does not cancel the outer scope.CancellationException.Use policy.getHealthSnapshot() to build health or readiness endpoints (e.g. Kubernetes probes, /health API). The snapshot includes circuit breaker state, bulkhead usage, rate limiter quota, retry config, and cache stats when the corresponding policies are configured.
import com.santimattius.resilient.circuitbreaker.CircuitState
val snapshot = policy.getHealthSnapshot()
// Circuit breaker state
val healthy = snapshot.circuitBreaker?.state != CircuitState.OPEN
// snapshot.circuitBreaker?.failureCount, successCount
// Bulkhead: active and waiting counts
snapshot.bulkhead?.let { bh ->
// bh.activeConcurrentCalls, bh.waitingCalls, bh.maxConcurrentCalls, bh.maxWaitingCalls
}
// Rate limiter: remaining tokens and time until next refill
snapshot.rateLimiter?.let { rl ->
// rl.remainingCalls, rl.timeToRefill
}
// Retry: configured max attempts
snapshot.retry?.let { r ->
// r.maxAttempts
}
// Cache: entry count and cumulative hit rate (Double.NaN when no calls yet)
snapshot.cache?.let { c ->
// c.entryCount, c.hitRate
}Return a fallback value when the operation fails. Ensure the fallback return type matches the type returned by the block passed to execute(), otherwise a ClassCastException may occur at runtime.
import com.santimattius.resilient.fallback.FallbackConfig
val policy = resilient {
fallback(FallbackConfig { err ->
// produce a fallback from error (log/metric here)
"fallback-value"
})
}val policy = resilient {
cache { key = "profile:42"; ttl = 20.seconds }
timeout { timeout = 2.seconds }
retry {
maxAttempts = 3
backoffStrategy = ExponentialBackoff(initialDelay = 100.milliseconds)
}
circuitBreaker { failureThreshold = 5; successThreshold = 2; halfOpenMaxCalls = 1 }
rateLimiter { maxCalls = 50; period = 1.seconds }
bulkhead { maxConcurrentCalls = 10; maxWaitingCalls = 100 }
hedging { attempts = 2 }
fallback(FallbackConfig { "cached-or-default" })
}
suspend fun load(): User = policy.execute { loadProfile() }A ready-to-run sample is available at:
androidApp/src/main/kotlin/com/santimattius/kmp/ResilientExample.ktUse it from MainActivity:
setContent { ResilientExample() }The resilient-test module provides utilities for testing resilience policies with fault injection and simplified policy builders.
Use FaultInjector to simulate failures, delays, and intermittent behavior in your tests. Prefer failCount for unit tests — it produces deterministic results and avoids intermittent failures.
import com.santimattius.resilient.test.FaultInjector
import kotlin.time.Duration.Companion.milliseconds
// Deterministic (preferred for unit tests): fail the first N calls, then succeed
val injector = FaultInjector.builder()
.failCount(3) // fails calls 1–3, succeeds on call 4
.build()
// Probabilistic: use only for chaos / load tests, not unit tests
val chaosInjector = FaultInjector.builder()
.failureRate(0.3) // 30% chance of failure per call
.delay(50.milliseconds) // Add 50ms delay
.delayJitter(true) // Randomize delay ±20%
.exception { CustomException() }
.build()Configuration:
failCount(n: Int): Fail the first n calls deterministically, then always succeed. Takes precedence over failureRate. Preferred for unit tests.
failureRate(rate: Double): Probability of throwing an exception per call (0.0 = never, 1.0 = always). Ignored when failCount > 0.exception(block: () -> Throwable): Factory for the exception to throw (default: FaultInjectedException).delay(duration: Duration): Fixed delay before executing the block (default: Duration.ZERO).delayJitter(enable: Boolean): Randomize delay ±20% (default: false).Use PolicyBuilders to create policies with sensible test defaults:
import com.santimattius.resilient.test.PolicyBuilders
import com.santimattius.resilient.test.TestResilientScope
import kotlinx.coroutines.test.runTest
@Test
fun `test retry with fault injection`() = runTest {
val scope = TestResilientScope()
val policy = PolicyBuilders.retryPolicy(
scope,
maxAttempts = 3,
initialDelay = 10.milliseconds
)
val injector = FaultInjector.builder()
.failCount(2) // fail first 2 calls, succeed on the 3rd
.build()
val result = policy.execute {
injector.execute { "success" }
}
assertEquals("success", result)
}Available builders:
retryPolicy(scope, maxAttempts = 3, initialDelay = 10ms, maxDelay = 100ms, shouldRetry = { true })timeoutPolicy(scope, timeout = 1.second)circuitBreakerPolicy(scope, failureThreshold = 3, successThreshold = 2, timeout = 5.seconds)bulkheadPolicy(scope, maxConcurrentCalls = 2, maxWaitingCalls = 4)rateLimiterPolicy(scope, maxCalls = 5, period = 1.second)Use TestResilientScope to create a test-friendly scope for policies:
import com.santimattius.resilient.test.TestResilientScope
import kotlinx.coroutines.test.runTest
@Test
fun `test policy lifecycle`() = runTest {
val scope = TestResilientScope(this)
val policy = resilient(scope) {
retry { maxAttempts = 3 }
}
// ... test logic
scope.cancel() // cleanup
}Add the resilient-test module to your test dependencies:
// build.gradle.kts
kotlin {
sourceSets {
commonTest.dependencies {
implementation("io.github.santimattius.resilient:resilient-test:1.5.0")
implementation("org.jetbrains.kotlinx:kotlinx-coroutines-test:1.9.0")
}
}
}kotlinx-coroutines-test for virtual time with runTest and TestTimeSource.FaultInjector to simulate realistic failure scenarios (intermittent errors, slow responses).Theoretical and technical deep-dive documentation for each pattern — definition, when to apply, configuration reference, and extended examples:
| Pattern | Document |
|---|---|
| Timeout | docs/patterns/timeout.md |
| Retry | docs/patterns/retry.md |
| Circuit Breaker | docs/patterns/circuit-breaker.md |
| Rate Limiter | docs/patterns/rate-limiter.md |
| Bulkhead | docs/patterns/bulkhead.md |
| Hedging | docs/patterns/hedging.md |
| Cache | docs/patterns/cache.md |
| Fallback | docs/patterns/fallback.md |
→ Full index with descriptions
Contributions are welcome. Please read CONTRIBUTING.md for development setup, coding guidelines, and the pull request process.
This project is licensed under the Apache License 2.0.