BroSki: A Task Runner That Tells You Why It Reran

Jun 30, 2026

Tags: engineering, deep-dive, rust, developer-tools, broski

BroSki is a task runner — the same category as Make or Just — built around one specific complaint with that whole category: when a build reruns, or doesn't, you usually can't tell why. Existing shell-first runners are fast to adopt and easy to write, but they offer no real answer to "why did this rebuild," "why did this pass locally and fail in CI," or "why is my local runner corrupting outputs when a task fails halfway through." Heavier build systems answer all three, at the cost of a migration most solo developers and small teams will never sign up for. BroSki's bet is that you can keep Make's simplicity and still get real answers to those three questions, if you're willing to build the underlying machinery — content-addressed caching, transactional writes, and sandboxing — properly instead of skipping it.

This is a deep dive into that machinery: a Rust workspace, five crates, and the engineering decisions behind making a task runner that doesn't lie to you about its own behavior.

The problem

Three failure modes show up constantly in shell-script-based build tooling, and none of them have anything to do with the tasks themselves:

"Why did this rebuild?" — without a record of exactly what changed since the last run, a CI rerun is unexplainable. You either trust the cache blindly or --no-cache everything out of paranoia.
"Why did this pass locally but fail in CI?" — implicit inputs (an env var you forgot you had set, a file the task happened to read that nothing declared as a dependency) make builds nondeterministic between environments.
"Why is my local runner corrupting outputs on failure?" — if a multi-step task dies halfway through, partial output files are often left behind, and the next run can't tell the difference between "this succeeded" and "this is the wreckage of a crash."

BroSki's answer to all three is architectural, not advisory: explicit input/output contracts instead of implicit discovery, content-addressed fingerprints instead of timestamps, and transactional output promotion instead of writing directly into the workspace and hoping nothing goes wrong.

Architecture

BroSki is a Cargo workspace split into five crates, each with a single responsibility:

crates/
├── broski-store   # ArtifactStore trait — the seam for pluggable cache backends
├── broski-cache   # SQLite + content-addressed local object store (implements the trait)
├── broski-core    # DSL parser, dependency graph, fingerprinting, the executor itself
├── broski-cli     # Command routing, argument parsing, error reporting
└── broski-tui     # Ratatui terminal dashboard, a thin client over the executor's event stream

The dependency direction matters. broski-store defines the storage interface and depends on nothing else in the workspace. broski-cache implements that interface against local SQLite. broski-core makes its storage calls through the interface, without knowing whether a local store or something else entirely sits behind it. That seam is what makes a future remote cache backend a plugin, not a rewrite.

Nothing in broski-core imports broski-cache directly for its storage calls — it only ever talks through the ArtifactStore trait broski-store defines. A hosted remote cache (discussed below) is a second implementation of that same trait, not a change to broski-core at all.
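
To make the seam concrete, here is a minimal sketch of what such a trait boundary could look like. The method names (put, get), the MemoryStore backend, and the restore_or_build helper are all hypothetical, since the post does not show the real ArtifactStore definition; only the idea of coding the executor against a trait comes from the text above.

```rust
use std::collections::HashMap;
use std::io;

/// Hypothetical shape for the storage seam; the real trait's methods are not shown in the post.
pub trait ArtifactStore {
    /// Store `bytes` under a content-derived `key`.
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()>;
    /// Fetch previously stored bytes, or `None` on a cache miss.
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>>;
}

/// Stand-in backend for illustration only (an in-memory map, not SQLite).
#[derive(Default)]
pub struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ArtifactStore for MemoryStore {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()> {
        self.objects.insert(key.to_string(), bytes.to_vec());
        Ok(())
    }

    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        Ok(self.objects.get(key).cloned())
    }
}

/// Executor-side code depends only on the trait, never on a concrete backend.
/// Returns the artifact plus a flag saying whether it came from the cache.
pub fn restore_or_build(
    store: &mut dyn ArtifactStore,
    key: &str,
    build: impl FnOnce() -> Vec<u8>,
) -> io::Result<(Vec<u8>, bool)> {
    if let Some(hit) = store.get(key)? {
        return Ok((hit, true)); // cache hit: the build closure never runs
    }
    let built = build();
    store.put(key, &built)?;
    Ok((built, false))
}
```

Under this shape, a remote backend would be one more impl block, and restore_or_build would not change.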

Dependency	Role
blake3	Content-addressed fingerprinting, including keyed hashing for secrets
petgraph	Topological sort of the task dependency graph
rayon	Parallel input hashing for large workspaces
rusqlite (bundled)	Local cache metadata — no external database to install
reflink-copy	Copy-on-write output promotion on btrfs/APFS
winnow	Parser-combinator DSL parsing, instead of regex
ratatui + crossterm	The live terminal dashboard
miette	Source-span-aware error diagnostics

The release profile is tuned hard for a CLI binary that people invoke constantly: lto = true, codegen-units = 1, strip = true, panic = "abort" — full link-time optimization and a lean panic strategy, accepting slower release builds in exchange for faster runtime and a smaller shipped binary. MSRV is pinned at Rust 1.78 and verified in CI on every commit, so "works on my machine" can't quietly creep in via a newer-than-advertised compiler feature.

Design decisions and tradeoffs

Two execution modes, because one mode can't serve both audiences

Every task declares whether it runs in graph mode (staged, cached, deterministic, with validated outputs) or interactive mode (runs directly against the workspace with an inherited TTY — the right shape for a dev server or a REPL). Graph mode is what makes caching and reproducibility possible at all, but it requires explicit @in/@out contracts, which is friction a long-running interactive process doesn't need and shouldn't pay. Forcing every task into one mode means either dev servers get awkwardly shoehorned into a caching model that doesn't apply to them, or every task loses caching to accommodate the few that need a live terminal. Making the mode a per-task declaration instead of a global runner setting means both kinds of task get the model that actually fits them.

Fingerprints built from content, not from clocks

A task's cache key is a BLAKE3 hash over a structured manifest — task definition, resolved variables, environment variables explicitly declared as inputs, and the content hash of every file matched by its @in patterns. Because it's built from actual content rather than file modification timestamps, the fingerprint is identical across machines and across checkouts as long as the content is identical, and a single changed byte in a tracked input produces a completely different hash. This is also what makes --explain possible at all: when a fingerprint differs from the cached one, BroSki can diff the two manifests and report the specific entry that changed — cache miss: input changed: tests/test_api.py — instead of just "cache miss," because the manifest is the explanation, not a side log kept in sync with it.
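
The "manifest is the explanation" idea can be sketched with two ordered maps. This is an illustration under assumptions: the entry naming scheme ("input:...", "env:...", "var:...") and the explain_miss function are invented here, and the hash values are placeholder strings rather than real BLAKE3 digests.

```rust
use std::collections::BTreeMap;

/// A manifest maps entry names (e.g. "input:tests/test_api.py") to content hashes.
/// The key naming scheme is hypothetical.
pub type Manifest = BTreeMap<String, String>;

/// Compare the cached manifest with the current one and describe every difference.
/// An empty result means the fingerprints would match.
pub fn explain_miss(cached: &Manifest, current: &Manifest) -> Vec<String> {
    let mut reasons = Vec::new();
    // Entries that are new or whose hash differs from the cached run.
    for (name, hash) in current {
        match cached.get(name) {
            None => reasons.push(format!("added: {name}")),
            Some(old) if old != hash => reasons.push(format!("changed: {name}")),
            Some(_) => {}
        }
    }
    // Entries that existed in the cached run but are gone now.
    for name in cached.keys() {
        if !current.contains_key(name) {
            reasons.push(format!("removed: {name}"));
        }
    }
    reasons
}
```

Because the same structure feeds both the fingerprint and the diff, the explanation cannot drift out of sync with the cache key.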

Secrets get special handling inside the same manifest: @secret_env values are hashed with keyed BLAKE3, using a 32-byte salt (BLAKE3's key size) generated per-workspace on first run and stored at .broski/config/salt. The manifest needs some representation of a secret's value to detect when it changes, because caching has to be sensitive to secret rotation. But an unkeyed hash of a secret invites a dictionary or rainbow-table attack the moment that manifest shows up in a CI log or an --explain output, especially for short or guessable values. A workspace-local salt means the hash is only meaningful to someone who also holds that salt, closing the leak without giving up cache invalidation correctness.

Transactional output promotion, because "mostly succeeded" should not look like "succeeded"

This is the core guarantee the whole project is built around: atomicity, the "A" in ACID. When a task finishes, its outputs don't get written directly into the workspace. They're staged, then promoted through a sequence designed to leave the workspace in exactly one of two states: fully updated, or completely unchanged.

fn promote_outputs(&self, stage_workspace: &Path, outputs: &[PathBuf]) -> Result<()> {
    // 1. Back up any existing outputs into a transactional temp dir
    // 2. Attempt to atomically rename every staged output into place
    // 3. If ANY rename fails, roll back ALL of them from the backups,
    //    in reverse order, before returning the error
}

Without this, a task that dies on output 3 of 5 leaves a workspace with two new files and three stale ones — a state that's neither "the old build" nor "the new build," and one the next run's cache logic has no honest way to reason about. The transactional version guarantees the next run sees either the complete prior state or the complete new state, with nothing in between, which is exactly the property a cache needs to stay trustworthy after a crash.

There is no third outcome. Every path through the promotion sequence ends at either "fully new" or "fully old", the two states that --explain and the next run's fingerprint comparison are designed to reason about.
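
The three-step sequence can be sketched with nothing but std::fs. This is not BroSki's implementation: the free-function signature, the explicit workspace and backup_dir parameters, and the "{i}.bak" naming are assumptions made so the example is self-contained. It also assumes the stage, backup directory, and workspace share a filesystem, since that is what keeps rename atomic.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Move staged outputs into `workspace`, restoring the previous files if any step fails.
/// `outputs` are paths relative to both `stage` and `workspace`.
pub fn promote_outputs(
    stage: &Path,
    workspace: &Path,
    backup_dir: &Path,
    outputs: &[PathBuf],
) -> io::Result<()> {
    fs::create_dir_all(backup_dir)?;
    // Each entry: (live path, backup path if a previous file existed).
    let mut done: Vec<(PathBuf, Option<PathBuf>)> = Vec::new();

    let result = (|| -> io::Result<()> {
        for (i, rel) in outputs.iter().enumerate() {
            let live = workspace.join(rel);
            if let Some(parent) = live.parent() {
                fs::create_dir_all(parent)?;
            }
            // 1. Back up the existing output, if there is one.
            let backup = if live.exists() {
                let b = backup_dir.join(format!("{i}.bak"));
                fs::rename(&live, &b)?;
                Some(b)
            } else {
                None
            };
            done.push((live.clone(), backup));
            // 2. Move the staged file into place.
            fs::rename(stage.join(rel), &live)?;
        }
        Ok(())
    })();

    if result.is_err() {
        // 3. Roll back in reverse order: drop new files, then restore backups.
        for (live, backup) in done.iter().rev() {
            let _ = fs::remove_file(live);
            if let Some(b) = backup {
                let _ = fs::rename(b, live);
            }
        }
    }
    let _ = fs::remove_dir_all(backup_dir);
    result
}
```

One simplification to note: parent directories created during a failed promotion are not removed by this sketch, so "completely unchanged" holds for files but not for empty directories.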

Staged execution, so a crashed task never gets the chance to corrupt anything

Before a graph-mode task runs at all, its inputs are snapshotted into an ephemeral directory under .broski/stage/<uuid>, and the task executes there, not in the live workspace. Two things fall out of this for free. First, a task with implicit side effects (writing somewhere it wasn't supposed to, relative to its working directory) pollutes only the stage, not the real workspace, even before transactional promotion gets involved. Second, a cache hit can skip execution entirely and jump straight to restoring outputs, since there's no live workspace state to reconcile against. Staging and transactional promotion do different jobs that happen to compound: staging keeps execution away from the workspace, and promotion makes the final write atomic. Together they mean a failed task leaves no partial outputs behind in the workspace.

Isolation that adapts to what the platform actually offers

pub enum IsolationMode { Strict, BestEffort, Off }

fn effective_isolation(&self) -> IsolationMode {
    if let Some(mode) = self.isolation { return mode; }
    if cfg!(target_os = "linux") { IsolationMode::Strict } else { IsolationMode::BestEffort }
}

On Linux, the default is Strict: tasks run inside bubblewrap, with the stage directory mounted as root, only /usr, /bin, /lib, /lib64, and /etc bind-mounted in from the host, and $HOME masked with an empty tmpfs — real sandboxing, not a polite suggestion. That capability doesn't exist the same way on macOS or Windows, so the default there is BestEffort: the staging-directory isolation described above, without the kernel-level namespace guarantees bwrap provides. Off exists too, for tasks that genuinely need direct workspace access and accept the tradeoff explicitly. Defaulting to the strongest isolation each platform can actually support — rather than picking one isolation strategy and pretending it works everywhere — means Linux CI gets real sandboxing without macOS users hitting a runner that simply fails to start because bwrap isn't available.
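
As a rough illustration of the Strict layout, the function below assembles a bubblewrap argument list from the mounts named above. The flags themselves (--bind, --ro-bind, --tmpfs) are real bwrap options, but which flags BroSki actually passes is an assumption, and bwrap_args is a hypothetical helper.

```rust
use std::path::Path;

/// Build a bubblewrap argument list matching the layout described in the text.
/// The flag selection is an assumption; the real invocation may differ.
pub fn bwrap_args(stage: &Path, home: &Path, script: &str) -> Vec<String> {
    let mut args: Vec<String> = Vec::new();
    // The stage directory becomes the sandbox's root filesystem.
    args.extend(["--bind".to_string(), stage.display().to_string(), "/".to_string()]);
    // Read-only views of the host system directories the task may need.
    for dir in ["/usr", "/bin", "/lib", "/lib64", "/etc"] {
        args.extend(["--ro-bind".to_string(), dir.to_string(), dir.to_string()]);
    }
    // Hide the real home directory behind an empty tmpfs.
    args.extend(["--tmpfs".to_string(), home.display().to_string()]);
    // Everything after "--" is the command run inside the sandbox.
    args.extend(["--".to_string(), "sh".to_string(), "-lc".to_string(), script.to_string()]);
    args
}
```

The resulting vector would be handed to something like Command::new("bwrap").args(...). A real invocation would likely add namespace and /proc or /dev options as well, which are omitted here.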

An event stream between the executor and everything that watches it

The terminal dashboard doesn't query the executor's internal state directly — it consumes a stream of ProgressEvents (TaskStarted, TaskPhase, LogLine, TaskFinished, and so on) over an mpsc channel, exactly the same interface a future JSON-output mode or a metrics exporter would use. The executor has no idea the TUI exists. That decoupling is what makes line-by-line log streaming, live ETA estimates from historical per-task duration, and selective force-rerun (press x on one task in the dashboard to bypass just its cache) all additive features on top of the executor rather than special cases wired into it.
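
A minimal sketch of that decoupling with std::sync::mpsc follows. The event names come from the text; the payload fields, run_task, collect, and run_and_collect are assumptions made for illustration.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Event names follow the post; the payload fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum ProgressEvent {
    TaskStarted { task: String },
    LogLine { task: String, line: String },
    TaskFinished { task: String, success: bool },
}

/// Executor side: emits events and knows nothing about who consumes them.
pub fn run_task(task: &str, events: &Sender<ProgressEvent>) {
    let t = task.to_string();
    // Send errors are ignored: a closed receiver should not fail the build.
    let _ = events.send(ProgressEvent::TaskStarted { task: t.clone() });
    let _ = events.send(ProgressEvent::LogLine { task: t.clone(), line: "compiling".into() });
    let _ = events.send(ProgressEvent::TaskFinished { task: t, success: true });
}

/// One possible consumer; a TUI or a JSON writer would read the same receiver.
pub fn collect(rx: Receiver<ProgressEvent>) -> Vec<ProgressEvent> {
    rx.into_iter().collect() // iteration ends once every Sender is dropped
}

/// Run the tasks on a worker thread while the caller drains the event stream.
pub fn run_and_collect(tasks: Vec<String>) -> Vec<ProgressEvent> {
    let (tx, rx) = channel();
    let worker = thread::spawn(move || {
        for t in &tasks {
            run_task(t, &tx);
        }
        // `tx` is dropped when this closure returns, which closes the stream.
    });
    let seen = collect(rx);
    worker.join().expect("executor thread panicked");
    seen
}
```

Swapping collect for a dashboard or an exporter changes nothing on the sending side, which is the property the paragraph above relies on.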

Cooperative cancellation instead of an instant kill

Ctrl-C is two-stage: the first press stops queuing new tasks and waits for whatever's already running to finish naturally; a second press escalates to sending SIGTERM to in-flight child processes. An instant hard-kill is simpler to implement but routinely leaves the same kind of partial-output mess the transactional promotion logic exists specifically to prevent — interrupting a task mid-write is functionally the same failure mode as the task crashing on its own. Two-stage cancellation gives a fast exit path for someone who really needs to leave right now, while making the default "I changed my mind" case finish cleanly instead of needing the next run's transactional rollback to clean up after an interrupt.
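
The two-stage behaviour reduces to a small state machine. The sketch below assumes an interrupt handler that calls on_interrupt on each Ctrl-C; the type and method names are hypothetical, and actually delivering SIGTERM to children is left out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum CancelAction {
    /// First Ctrl-C: stop scheduling, let running tasks finish.
    Drain,
    /// Second or later Ctrl-C: signal in-flight child processes.
    Terminate,
}

/// Shared between the interrupt handler and the scheduler.
#[derive(Default)]
pub struct Cancellation {
    presses: AtomicUsize,
}

impl Cancellation {
    /// Called once per interrupt; safe to call from any thread.
    pub fn on_interrupt(&self) -> CancelAction {
        // fetch_add returns the value held before the increment.
        if self.presses.fetch_add(1, Ordering::SeqCst) == 0 {
            CancelAction::Drain
        } else {
            CancelAction::Terminate
        }
    }

    /// The scheduler checks this before starting each new task.
    pub fn may_start_new_tasks(&self) -> bool {
        self.presses.load(Ordering::SeqCst) == 0
    }
}
```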

Hard problems, solved

A failing command in the middle of a script that still reported success

Multi-line task bodies run through sh -lc "<script>", and the shell's default behavior is to keep executing after a failing command (set +e) unless told otherwise. A task like mkdir dist; cargo fmt --all --check; printf ok > stamp would have its check fail silently — the shell moved on to printf, which always succeeds, so the task exited 0, got cached, and reported as passing. The fix is one line: set -e injected ahead of every multi-line script body, so any intermediate failure stops execution immediately instead of getting buried under whatever ran after it. This is the kind of bug that's invisible until someone notices their CI is green on a build that should have failed, and by then it's already cost real debugging time tracing back through a passing cache entry that should never have existed.
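
The one-line fix might look like the following. with_errexit and run_script are hypothetical names; the sh -lc invocation and the injected set -e are taken from the description above.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Prefix a multi-line script so the first failing command aborts it.
pub fn with_errexit(body: &str) -> String {
    format!("set -e\n{body}")
}

/// Run a task body through `sh -lc`, as the post describes.
pub fn run_script(body: &str) -> io::Result<ExitStatus> {
    Command::new("sh").arg("-lc").arg(with_errexit(body)).status()
}
```

With the prefix in place, a body like "mkdir dist; cargo fmt --all --check; printf ok > stamp" stops at the failing check and exits non-zero, so the task is never cached as a success. Note that set -e has well-known exceptions (for example, commands tested by if or joined with && do not trigger it), so it is a floor, not a guarantee.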

A glob pattern that copied the same file twice and crashed

An @in pattern ending in **/* matches both a directory entry and every file nested inside it — docs/architecture and docs/architecture/cache-explain.mdx both show up as separate matches from the same glob. Stage snapshotting copied each match independently, so a file reached both directly and via its parent directory's recursive walk got copied twice — and the second reflink copy crashed with File exists. The fix dedupes at the level of the actual resolved file path rather than the top-level glob entry, so overlapping matches collapse to one copy regardless of which pattern technically matched them. Glob syntax is user-friendly precisely because it's forgiving about exactly this kind of overlap; the staging logic needed to be equally forgiving, or every workspace with a directory glob next to a file glob inside it would hit this.
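
Deduplicating on the resolved file path can be sketched as below. It assumes the glob expansion has already produced a list of matched paths (files and directories mixed); unique_files and collect_files are hypothetical names, and symlink cycles are not handled.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Expand glob matches (files and directories) into a unique, sorted set of files.
pub fn unique_files(matches: &[PathBuf]) -> io::Result<BTreeSet<PathBuf>> {
    let mut files = BTreeSet::new();
    for m in matches {
        collect_files(m, &mut files)?;
    }
    Ok(files)
}

fn collect_files(path: &Path, files: &mut BTreeSet<PathBuf>) -> io::Result<()> {
    if path.is_dir() {
        // Walk directories recursively; only regular entries are recorded.
        for entry in fs::read_dir(path)? {
            collect_files(&entry?.path(), files)?;
        }
    } else {
        // Canonicalizing makes "docs/a.md" and "docs/./a.md" the same key,
        // so a file reached twice is inserted once.
        files.insert(fs::canonicalize(path)?);
    }
    Ok(())
}
```

Copying from the resulting set means each file is staged exactly once, no matter how many patterns matched it.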

A stale lock that survived process death because PIDs get reused

The runtime lock used to check whether a process holding it was still alive purely by PID liveness: if kill -0 <pid> succeeded, the lock was treated as active. The flaw is that PIDs get reused by the OS, so a process that died and a new, unrelated process that happened to get assigned the same PID were indistinguishable, and a genuinely stale lock could block subsequent runs indefinitely. The Linux-specific fix adds a second signal: the lock record now also stores the process's start time, read from /proc/<pid>/stat (the starttime field, measured in clock ticks since boot). Start time is a timestamp rather than a unique ID, but a recycled PID is very unlikely to have started at exactly the same tick as the original holder. A lock is only considered active if both the PID is alive and its start time matches what was recorded, which closes the reuse window without needing any new external dependency.
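
Reading the start time takes a little care, because the second field of /proc/<pid>/stat (the command name) is parenthesized and may itself contain spaces or parentheses. The sketch below shows one way to do it; LockOwner and lock_is_active are hypothetical names, and the lock file format is not specified in the post.

```rust
use std::fs;

/// Extract `starttime` (field 22) from the contents of /proc/<pid>/stat.
/// The value is expressed in clock ticks since boot.
pub fn parse_start_time(stat: &str) -> Option<u64> {
    // Field 2 (comm) may contain spaces or ')', so split after the LAST ')'.
    let rest = &stat[stat.rfind(')')? + 1..];
    // `rest` begins at field 3, so field 22 is the item at index 19.
    rest.split_whitespace().nth(19)?.parse().ok()
}

/// What a lock record would need to store (field names are hypothetical).
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct LockOwner {
    pub pid: u32,
    pub start_time: u64,
}

/// Linux-only check: the lock is live only if the PID exists AND
/// that process started at the recorded time.
pub fn lock_is_active(owner: LockOwner) -> bool {
    match fs::read_to_string(format!("/proc/{}/stat", owner.pid)) {
        Ok(stat) => parse_start_time(&stat) == Some(owner.start_time),
        Err(_) => false, // no such process (or no /proc on this platform)
    }
}
```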

An interactive task that deadlocked inside the dashboard

Running a task with @mode interactive while the TUI dashboard was active produced a deadlock: the dashboard's raw terminal mode, alternate screen buffer, and mouse capture meant the child process couldn't get a usable TTY to read from, so any prompt or REPL the task launched simply hung. The fix detects interactive-mode tasks ahead of execution and has the dashboard step out of the way for the duration: leave raw mode and the alt screen, restore a normal cooked terminal, hand the child process a fully inherited TTY, wait for it to exit, then re-enter the dashboard state. It's explicitly logged as best-effort rather than a complete solution — an embedded PTY that would let the dashboard stay visible around an interactive task is left as deliberate future work, not something that got silently dropped.

Secret redaction that buffered the entire process output before showing any of it

The first version of secret redaction for interactive tasks used Command::output(), which buffers a child process's entire stdout/stderr until it exits before returning anything. For a long-running or chatty interactive task, that meant no terminal output at all until completion (a dead-looking terminal for the task's whole runtime) and, for genuinely large output, unbounded memory growth, because every byte is held in memory until the process exits. The fix switches to spawn() with dedicated reader threads for stdout and stderr, applying redaction line-by-line as data arrives and flushing immediately, instead of redacting a buffered blob after the fact. Redaction and interactivity looked like they were in tension (show output immediately, or show it safely), and the actual fix was realizing line-granularity redaction gets both at once.
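
The per-line redaction loop is small enough to sketch generically over any reader and writer. In the arrangement described above, each reader thread would call something like this with a BufReader around the child's stdout or stderr. The function name and the "[REDACTED]" marker are assumptions.

```rust
use std::io::{self, BufRead, Write};

/// Copy `reader` to `writer` line by line, masking every secret
/// and flushing after each line so output appears as it arrives.
/// Note: `lines()` returns an error on invalid UTF-8 input.
pub fn redact_stream<R: BufRead, W: Write>(
    reader: R,
    mut writer: W,
    secrets: &[&str],
) -> io::Result<()> {
    for line in reader.lines() {
        let mut line = line?;
        // Skip empty secrets: replacing "" would insert the marker everywhere.
        for secret in secrets.iter().filter(|s| !s.is_empty()) {
            line = line.replace(*secret, "[REDACTED]");
        }
        writeln!(writer, "{line}")?;
        writer.flush()?;
    }
    Ok(())
}
```

One limitation of line granularity is worth stating: a secret that is split across a line break would not be matched by this sketch.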

What I'd do differently

A hosted remote cache — letting a team share build results across machines and CI the way the local SQLite store works for one machine — is explicitly parked, not abandoned. The project's own notes are direct about why: it would mean owning auth, billing, and a usage dashboard, none of which are worth building until there's runway to support paying users on infrastructure that has to stay up. What is already done is the part that makes that decision safe to defer: broski-store's ArtifactStore trait was designed as the seam from day one, so a remote backend is a new implementation of an existing interface, not a retrofit into code that assumed "local" everywhere. Architecting for the optional feature before deciding whether to build it is the right order — it makes "we're not doing this yet" a genuinely revisitable decision instead of a wall the codebase would have to be rebuilt to get past.

Stack at a glance


Language	Rust, 5-crate Cargo workspace
Fingerprinting	BLAKE3, content-addressed, keyed hashing for secrets
Graph resolution	petgraph, topological layer execution
Local cache	Bundled SQLite + content-addressed object store
Output writes	Transactional promotion — atomic rename with full rollback on partial failure
Isolation	bubblewrap (Linux, strict) / staged temp dir (macOS/Windows, best-effort) / off
Terminal UI	Ratatui, decoupled via an executor event stream
Parsing	winnow parser combinators, explicit DSL versioning
Quality bar	MSRV 1.78, strict Clippy, 50%+ coverage floor, full e2e fixture suite

More deep dives in this series: SuperSay, an on-device TTS app built around the exact same instinct (restructure the work so a hard serial constraint doesn't become a user-facing delay); SuperZen, a macOS wellness app where a single heartbeat plays the same "one source of truth" role BroSki's transactional promotion plays here; and ytdld, a terminal downloader that answers a smaller version of the same question (what happens to in-flight work when something interrupts it) by raising an exception from inside a callback instead of building a whole cancellation protocol.