
AI-powered mobile testing framework enables authoring and executing tests using natural language, supports on-device execution, offers custom agent tools, and provides detailed reporting with Maestro integration.
Natural-language device control for your coding agent — across iOS, Android, and web. Every session is a replayable trail you can run as a test.
Trailblaze gives your coding agent a single, typed, replayable way to drive any device.
Built-in primitives plus your own typed tools, with a natural-language source of truth
that travels across platforms. Whether you're exploring an app, automating a workflow,
or building a test suite, the artifact is the same: a portable .trail.yaml your CI can
replay deterministically from recorded steps with no LLM at replay time.
Composable, replayable device control for your agent. Your coding agent drives the
device through Trailblaze's primitives — plus first-class commands of your own, like
login or addToCart, defined in a typed language with type-safe bindings. Each step
records its objective. The resulting .trail.yaml is both a natural-language source of
truth — what you're testing or doing — and a deterministic execution artifact —
how it runs. One trail can describe a flow across iOS, Android, and web — same
natural-language steps, platform-specific actions or tools captured for each. Agents
read the what; the framework handles the how.
A cross-platform Trace Viewer. Open any session — yours or your agent's — and inspect view hierarchy, screenshots, video, and platform logs at every step. When you want a different selector than the one Trailblaze auto-picked for a step, the viewer lets you choose from generated alternatives computed against the same captured hierarchy — human judgment, no re-recording. Same viewer for iOS, Android, and web.
Most multi-platform testing tools expose the intersection of what iOS, Android, and
web can do — a generic API that maps onto all three, at the cost of losing
platform-specific capability. Trailblaze takes the opposite approach: full-fidelity
native semantic surfaces, OS-native primitives, and typed tool sets composed per
(target, platform). Your agent works against each platform's native UI tree —
the accessibility tree on Android, native UI semantics on iOS, the DOM on web —
rather than a flattened cross-platform abstraction.
The agent picks elements semantically — "the Sign in button" — from the native hierarchy. Trailblaze computes the platform-specific selectors behind the scenes. The natural-language test stays the same; the execution uses each platform's full power.
This only works because an agent is driving. Exposing full native fidelity to a human is overwhelming — twenty platform-specific selector strategies per element is no one's idea of a good testing SDK. Exposing it to an LLM is the point. The agent handles the complexity; you get the expressive power of native automation with the consistency of a single natural-language test.
You can stop anywhere. Tools you add to your agent's surface become first-class commands the next time your agent drives a device.
Drive a device. Point your coding agent (Claude Code, Cursor, Codex CLI) at the Trailblaze CLI. Natural-language device control across iOS, Android, and web — through built-in primitives plus any custom tools your team has shipped.
Save and replay. Any session becomes a .trail.yaml — replay it ad-hoc, commit
it to your repo as a CI regression test, or open it in the Trace Viewer to see
exactly what happened. Same artifact, three uses, no LLM at replay.
Compose your own agent surface. Give your agent first-class commands like
login or addToCart, named waypoints for your screens, and trailmaps from other
teams. Curate exactly what your agent sees: surface your login and hide the
low-level taps; pick four of twenty primitives if that's what your tests need.
Custom commands are typed and type-safe, with IDE support, replayable capture, and
LLM-facing descriptions you write for the tool and each parameter — your agent
reads those descriptions to decide when and how to call them. Every call — yours,
theirs, or built-in — is a first-class command, recordable and replayable. Your
agent gets dramatically more capable on every task that uses your composition.¹
Recorded trails replay deterministically by default — no LLM in the loop, no flake. When a recorded step genuinely doesn't match the screen anymore, there are two repair paths:
--self-heal and Trailblaze's
built-in agent patches the failing step against the live screen.The natural-language objectives are what make this work. They tell the agent what the step was trying to do, so repair is a matter of updating the how against the current app — not re-deriving intent from a broken selector.
The AI agent ecosystem is moving fast. Whatever it looks like in a year — or five, or ten — your natural-language tests will come with you. Trailblaze captures what you're testing as portable prose; the how (selectors, recordings, agent harness, framework version) adapts as the landscape changes.
curl -fsSL https://raw.githubusercontent.com/block/trailblaze/main/install.sh | bashRequired: curl, java 17+.
Optional (Homebrew users: brew install bun esbuild ffmpeg):
bun + esbuild — needed only when authoring or running trailmap-defined scripted
tools (on disk under trails/config/trailmaps/<trailmap>/tools/*.ts). Without these,
./trailblaze run … will fail at dispatch with Unsupported tool type for RPC execution for any trail that uses a target: trailmap with .ts tool descriptors.ffmpeg — needed only for trail video capture / sprite extraction (visual playback
artifacts under ~/.trailblaze/logs/<session>/). Trails still run without it; only
the rendered video and sprite-strip outputs are missing.# List connected devices (Android emulator, iOS simulator, or web browser)
trailblaze device list
# Pin this terminal to a device + target so subsequent calls inherit both.
# Trailblaze remembers per-terminal — other terminals stay independent.
trailblaze device connect android --target default
# Read the screen — returns a view hierarchy with refs (e.g. ab42) the agent can target
trailblaze snapshot
# Act on a referenced element — your agent picks the element; Trailblaze picks the
# platform-specific selector strategy. Every action takes a --step for self-heal.
trailblaze tool tap ref=ab42 -s "Tap sign in"
# Done — release the device and clear this terminal's pin
trailblaze device disconnectPaste those into Claude Code, Codex, Cursor, Goose, or anything that can run bash and
you're already authoring trails. For CI / scripts that prefer determinism over shell
state, every device-acting command (snapshot, tool, step, ask, verify,
session start/stop, run) also accepts -d <platform> (and --target <app> where
supported — tool, step, session start, mcp) as a per-call override.
(blaze remains accepted as a deprecated alias of step.)
Runnable, standalone workspaces under examples/ — each a complete template
you can copy: typed trailblaze.tool<Input, Output>() tools auto-discovered from bare
.ts files (no .yaml manifests), a trail suite, and tool unit tests.
| Example | Platform | What it teaches |
|---|---|---|
examples/ios-contacts |
iOS | Canonical mobile reference — 9 typed scripted tools, 18 trails, and tool unit tests (*.test.ts). |
examples/wikipedia |
Web | Canonical web reference — 9 typed scripted tools driving live en.wikipedia.org, 28 trails. |
examples/playwright-native |
Web | Smallest end-to-end scripted-tool setup, with a bundled sample app. |
./trailblaze check --workspace examples/ios-contacts # materialize SDK + typed bindings (once)See examples/README.md for the full index, including sample target
apps.
Trailblaze is moving fast. These are landing now and are worth knowing about even if they're not stable yet:
@trailblaze/scripting SDK in a
typed language with type-safe bindings. Tools execute in a QuickJS sandbox on-device
or in a host subprocess. After clone, run
./trailblaze check --workspace examples/playwright-native once to materialize the
workspace SDK + per-trailmap typed bindings.
(devlog)loginAsTestUser trail becomes a one-line setup step inside any other
test. (devlog)trailblaze app # Launch the desktop app for visual trail authoring and report browsingThe desktop app, HTML reports, and session browser all work the same regardless of how the trail was authored.
Full docs at block.github.io/trailblaze.
ios-contacts or wikipedia)(A longer Getting Started walkthrough is being rewritten to match the new direction. Until that lands, the README above is the canonical starting point.)
Trailblaze is not a full coding agent. It ships a functioning agent focused on the natural-language step it's currently executing, with vision into the current screen and past steps — fine for many flows. What it doesn't try to be is a Claude Code, Cursor, or Codex: those have your entire codebase in context, can query logcat at runtime, and bring far more project context to bear. For serious authoring work, you want one of them driving Trailblaze. And Trailblaze is not a SaaS test platform — the trail YAML lives in your repo, you own it, you can read and edit it.
¹ Cross-team and cross-repo trailmap sharing today; npm-based distribution for community-published trailmaps is in active development.
Natural-language device control for your coding agent — across iOS, Android, and web. Every session is a replayable trail you can run as a test.
Trailblaze gives your coding agent a single, typed, replayable way to drive any device.
Built-in primitives plus your own typed tools, with a natural-language source of truth
that travels across platforms. Whether you're exploring an app, automating a workflow,
or building a test suite, the artifact is the same: a portable .trail.yaml your CI can
replay deterministically from recorded steps with no LLM at replay time.
Composable, replayable device control for your agent. Your coding agent drives the
device through Trailblaze's primitives — plus first-class commands of your own, like
login or addToCart, defined in a typed language with type-safe bindings. Each step
records its objective. The resulting .trail.yaml is both a natural-language source of
truth — what you're testing or doing — and a deterministic execution artifact —
how it runs. One trail can describe a flow across iOS, Android, and web — same
natural-language steps, platform-specific actions or tools captured for each. Agents
read the what; the framework handles the how.
A cross-platform Trace Viewer. Open any session — yours or your agent's — and inspect view hierarchy, screenshots, video, and platform logs at every step. When you want a different selector than the one Trailblaze auto-picked for a step, the viewer lets you choose from generated alternatives computed against the same captured hierarchy — human judgment, no re-recording. Same viewer for iOS, Android, and web.
Most multi-platform testing tools expose the intersection of what iOS, Android, and
web can do — a generic API that maps onto all three, at the cost of losing
platform-specific capability. Trailblaze takes the opposite approach: full-fidelity
native semantic surfaces, OS-native primitives, and typed tool sets composed per
(target, platform). Your agent works against each platform's native UI tree —
the accessibility tree on Android, native UI semantics on iOS, the DOM on web —
rather than a flattened cross-platform abstraction.
The agent picks elements semantically — "the Sign in button" — from the native hierarchy. Trailblaze computes the platform-specific selectors behind the scenes. The natural-language test stays the same; the execution uses each platform's full power.
This only works because an agent is driving. Exposing full native fidelity to a human is overwhelming — twenty platform-specific selector strategies per element is no one's idea of a good testing SDK. Exposing it to an LLM is the point. The agent handles the complexity; you get the expressive power of native automation with the consistency of a single natural-language test.
You can stop anywhere. Tools you add to your agent's surface become first-class commands the next time your agent drives a device.
Drive a device. Point your coding agent (Claude Code, Cursor, Codex CLI) at the Trailblaze CLI. Natural-language device control across iOS, Android, and web — through built-in primitives plus any custom tools your team has shipped.
Save and replay. Any session becomes a .trail.yaml — replay it ad-hoc, commit
it to your repo as a CI regression test, or open it in the Trace Viewer to see
exactly what happened. Same artifact, three uses, no LLM at replay.
Compose your own agent surface. Give your agent first-class commands like
login or addToCart, named waypoints for your screens, and trailmaps from other
teams. Curate exactly what your agent sees: surface your login and hide the
low-level taps; pick four of twenty primitives if that's what your tests need.
Custom commands are typed and type-safe, with IDE support, replayable capture, and
LLM-facing descriptions you write for the tool and each parameter — your agent
reads those descriptions to decide when and how to call them. Every call — yours,
theirs, or built-in — is a first-class command, recordable and replayable. Your
agent gets dramatically more capable on every task that uses your composition.¹
Recorded trails replay deterministically by default — no LLM in the loop, no flake. When a recorded step genuinely doesn't match the screen anymore, there are two repair paths:
--self-heal and Trailblaze's
built-in agent patches the failing step against the live screen.The natural-language objectives are what make this work. They tell the agent what the step was trying to do, so repair is a matter of updating the how against the current app — not re-deriving intent from a broken selector.
The AI agent ecosystem is moving fast. Whatever it looks like in a year — or five, or ten — your natural-language tests will come with you. Trailblaze captures what you're testing as portable prose; the how (selectors, recordings, agent harness, framework version) adapts as the landscape changes.
curl -fsSL https://raw.githubusercontent.com/block/trailblaze/main/install.sh | bashRequired: curl, java 17+.
Optional (Homebrew users: brew install bun esbuild ffmpeg):
bun + esbuild — needed only when authoring or running trailmap-defined scripted
tools (on disk under trails/config/trailmaps/<trailmap>/tools/*.ts). Without these,
./trailblaze run … will fail at dispatch with Unsupported tool type for RPC execution for any trail that uses a target: trailmap with .ts tool descriptors.ffmpeg — needed only for trail video capture / sprite extraction (visual playback
artifacts under ~/.trailblaze/logs/<session>/). Trails still run without it; only
the rendered video and sprite-strip outputs are missing.# List connected devices (Android emulator, iOS simulator, or web browser)
trailblaze device list
# Pin this terminal to a device + target so subsequent calls inherit both.
# Trailblaze remembers per-terminal — other terminals stay independent.
trailblaze device connect android --target default
# Read the screen — returns a view hierarchy with refs (e.g. ab42) the agent can target
trailblaze snapshot
# Act on a referenced element — your agent picks the element; Trailblaze picks the
# platform-specific selector strategy. Every action takes a --step for self-heal.
trailblaze tool tap ref=ab42 -s "Tap sign in"
# Done — release the device and clear this terminal's pin
trailblaze device disconnectPaste those into Claude Code, Codex, Cursor, Goose, or anything that can run bash and
you're already authoring trails. For CI / scripts that prefer determinism over shell
state, every device-acting command (snapshot, tool, step, ask, verify,
session start/stop, run) also accepts -d <platform> (and --target <app> where
supported — tool, step, session start, mcp) as a per-call override.
(blaze remains accepted as a deprecated alias of step.)
Runnable, standalone workspaces under examples/ — each a complete template
you can copy: typed trailblaze.tool<Input, Output>() tools auto-discovered from bare
.ts files (no .yaml manifests), a trail suite, and tool unit tests.
| Example | Platform | What it teaches |
|---|---|---|
examples/ios-contacts |
iOS | Canonical mobile reference — 9 typed scripted tools, 18 trails, and tool unit tests (*.test.ts). |
examples/wikipedia |
Web | Canonical web reference — 9 typed scripted tools driving live en.wikipedia.org, 28 trails. |
examples/playwright-native |
Web | Smallest end-to-end scripted-tool setup, with a bundled sample app. |
./trailblaze check --workspace examples/ios-contacts # materialize SDK + typed bindings (once)See examples/README.md for the full index, including sample target
apps.
Trailblaze is moving fast. These are landing now and are worth knowing about even if they're not stable yet:
@trailblaze/scripting SDK in a
typed language with type-safe bindings. Tools execute in a QuickJS sandbox on-device
or in a host subprocess. After clone, run
./trailblaze check --workspace examples/playwright-native once to materialize the
workspace SDK + per-trailmap typed bindings.
(devlog)loginAsTestUser trail becomes a one-line setup step inside any other
test. (devlog)trailblaze app # Launch the desktop app for visual trail authoring and report browsingThe desktop app, HTML reports, and session browser all work the same regardless of how the trail was authored.
Full docs at block.github.io/trailblaze.
ios-contacts or wikipedia)(A longer Getting Started walkthrough is being rewritten to match the new direction. Until that lands, the README above is the canonical starting point.)
Trailblaze is not a full coding agent. It ships a functioning agent focused on the natural-language step it's currently executing, with vision into the current screen and past steps — fine for many flows. What it doesn't try to be is a Claude Code, Cursor, or Codex: those have your entire codebase in context, can query logcat at runtime, and bring far more project context to bear. For serious authoring work, you want one of them driving Trailblaze. And Trailblaze is not a SaaS test platform — the trail YAML lives in your repo, you own it, you can read and edit it.
¹ Cross-team and cross-repo trailmap sharing today; npm-based distribution for community-published trailmaps is in active development.