GloriousFlywheel RBE Sprint Gate

Date: 2026-04-26

Related Linear: TIN-650, TIN-658 through TIN-675

Purpose

This note gates the proposed Bazel remote build execution sprint against current repo and Linear truth.

It is not an implementation plan. It is the boundary a future implementation plan must satisfy before GloriousFlywheel claims remote build, remote test, or remote execution in the Bazel sense.

Current Ground Truth

As of c3404992f36e743ddf9f50724903c1db3e02f8a2, the default branch proves shared cache acceleration:

  • Source Bazel Proof passes on tinyland-nix
  • the proof requires GF_BAZEL_SUBSTRATE_MODE=shared-cache-backed
  • the proof passes a real BAZEL_REMOTE_CACHE=grpc://bazel-cache.nix-cache.svc.cluster.local:9092 endpoint through scripts/bazel-cache-backed.sh
  • the proof reports remote cache use, currently including 1 remote cache hit
  • external repository fetch authority remains upstream-with-retries, not repository-cache or distdir authority

That is a real product capability. It is not Bazel remote execution.

Current repo facts:

  • .bazelrc contains build:ci-cached remote-cache settings only
  • there is no --remote_executor config
  • there is no BAZEL_REMOTE_EXECUTOR contract
  • scripts/cache-attachment-contract.sh explicitly says the preflight is not a remote-execution proof
  • scripts/bazel-cache-backed.sh passes only --remote_cache
  • local developer shells remain compatibility-local-only unless an operator provides a routable BAZEL_REMOTE_CACHE
  • ARC and GitHub Actions dispatch run jobs on remote machines, but that is coarse-grained remote CI/job execution. It is not Bazel action-level remote execution and must not be counted as REAPI proof.
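
The distinction these facts draw can be made mechanical. The sketch below is illustrative Python, not code from the repo (the real wrappers are shell scripts). It classifies a Bazel command line by which remote endpoints it wires. The names RemoteMode and classify_remote_mode are hypothetical. It only inspects the --flag=value spelling on the command line, so flags injected through an rc --config are outside its view.

```python
from enum import Enum


class RemoteMode(Enum):
    LOCAL_ONLY = "local-only"
    CACHE_ONLY = "cache-only"
    EXECUTOR_WIRED = "executor-wired"


def _last_value(argv: list[str], flag: str) -> str:
    """Return the value of the last --flag=value occurrence, or "" if absent."""
    prefix = flag + "="
    value = ""
    for arg in argv:
        if arg.startswith(prefix):
            # Later occurrences override earlier ones, as with Bazel options.
            value = arg[len(prefix):]
    return value


def classify_remote_mode(argv: list[str]) -> RemoteMode:
    """Classify an argv by the remote endpoints it passes explicitly."""
    if _last_value(argv, "--remote_executor"):
        return RemoteMode.EXECUTOR_WIRED
    if _last_value(argv, "--remote_cache"):
        return RemoteMode.CACHE_ONLY
    return RemoteMode.LOCAL_ONLY
```

Under this classification, the command line described for scripts/bazel-cache-backed.sh lands in CACHE_ONLY, which is the state this note treats as shared cache acceleration rather than remote execution.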

Useful Parts Of The Proposed RBE Plan

The April 26 RBE planning pass correctly identified several important facts:

  • remote cache is not remote execution
  • --remote_executor or equivalent executor wiring is required before claiming Bazel remote execution
  • build/tofu/rules.bzl still gives tofu_fmt_test a BUILD_WORKSPACE_DIRECTORY local-workspace preference, which must be revisited before counting fmt checks as remote-execution eligible
  • build/tofu/rules.bzl no longer uses BUILD_WORKSPACE_DIRECTORY in tofu_validate, but validation still relies on ambient tofu from PATH and use_default_shell_env = True, which are hermeticity hazards
  • starting proof with a small hermetic JS/TS target is safer than trying to move all Tofu validation remotely at once
  • TIN-620 network continuity and TIN-627 capacity policy matter before any RBE proof can be counted as operationally reliable

Unsafe Or Ungrounded Parts

The same plan also introduced assumptions that must not become authority:

  • Buildbarn, Buildfarm, BuildBuddy, and NativeLink are class-peer projects to compare against, not dependencies or selected backends.
  • Linear issues TIN-658 through TIN-675 are backlog scaffolding, not accepted execution authority.
  • TIN-673 must not be interpreted as permission to update the README to claim remote build before proof exists.
  • A .bazelrc line like build:rbe --remote_executor= is not the right contract pattern. Bazel rc files do not expand shell environment variables in option values; the current cache-backed design intentionally passes endpoints through wrappers after strict validation.
  • --modify_execution_info=TofuFmtCheck=+no-remote-exec is not grounded in the current rule implementation unless the fmt rule is given a matching mnemonic or other reliable execution property. TofuValidate currently has a mnemonic; tofu_fmt_test does not.
  • “All Tofu targets become RBE-eligible” is too broad until every action, toolchain, provider fetch, generated file, and environment assumption is inventoried.
  • Embedding Nix closures in worker images is plausible, but it is a product and supply-chain design decision, not an automatic phase-two detail.
  • tinyland-nix-heavy should not be the default place to run RBE proof merely because it is near compute. Runner label use must remain capability-driven and capacity-aware.
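
Because rc files pass option values through literally, a shell-style placeholder in an endpoint option would reach Bazel as plain text. A small lint can catch that pattern before it becomes a contract. This is a minimal Python sketch under stated assumptions: the function name find_placeholder_endpoints is hypothetical, only whole-line # comments are skipped, and only the two endpoint flags discussed in this note are checked.

```python
import re

# Matches $NAME or ${NAME}; an rc file would hand either to Bazel unexpanded.
_PLACEHOLDER = re.compile(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?")
_ENDPOINT_FLAGS = ("--remote_executor", "--remote_cache")


def find_placeholder_endpoints(rc_text: str) -> list[tuple[int, str]]:
    """Return (line number, token) for endpoint options holding a placeholder."""
    findings = []
    for number, raw in enumerate(rc_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("#"):
            continue  # whole-line comment
        for token in line.split():
            for flag in _ENDPOINT_FLAGS:
                prefix = flag + "="
                if token.startswith(prefix) and _PLACEHOLDER.search(token[len(prefix):]):
                    findings.append((number, token))
    return findings
```

A non-empty result means the rc file is trying to carry an endpoint that only a wrapper can supply after validation.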

Peer Project And Competitive Boundary

Buildbarn, Buildfarm, BuildBuddy, and NativeLink are useful reference points because they live in the same broad product class: shared build cache, CAS, worker/executor management, scheduling, and Bazel Remote Execution API support. They are peers and possible competitors, not normal GloriousFlywheel dependencies.

GloriousFlywheel already overlaps parts of that class through ARC-backed capability lanes, Attic-backed Nix substitution, bazel-remote-backed cache attachment, and owner-overlay enrollment. The missing piece is not “some remote machine can run a job”; GitHub Actions already provides that. The missing piece is a Bazel-compatible action executor that speaks the Remote Execution API and can be attached through a validated --remote_executor contract.

GloriousFlywheel’s product contract should stay backend-neutral: attach Bazel clients to a validated Remote Execution API endpoint, prove the shared cache and executor behavior, and keep the operator-facing substrate independent of any single implementation unless a later architecture decision deliberately chooses one.
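
One way to picture that backend-neutral attachment is a wrapper step that validates each endpoint on its own and only then builds explicit Bazel options. The following is a hedged Python sketch, not the repo's wrapper. BAZEL_REMOTE_EXECUTOR is the proposed variable name from this note and has no contract yet. The function names are hypothetical, and restricting endpoints to grpc:// or grpcs:// with an explicit port is an assumption of this sketch, not a Bazel requirement.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only gRPC endpoints with an explicit port pass.
ALLOWED_SCHEMES = {"grpc", "grpcs"}


def validate_endpoint(name: str, value: str) -> str:
    """Return value unchanged if it looks like a usable endpoint, else raise."""
    if not value:
        raise ValueError(f"{name} is unset or empty")
    parts = urlsplit(value)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"{name} must use one of {sorted(ALLOWED_SCHEMES)}")
    if not parts.hostname or parts.port is None:
        raise ValueError(f"{name} must include a host and an explicit port")
    return value


def remote_flags(env: dict[str, str]) -> list[str]:
    """Validate both endpoints separately, then emit explicit CLI options."""
    cache = validate_endpoint("BAZEL_REMOTE_CACHE", env.get("BAZEL_REMOTE_CACHE", ""))
    executor = validate_endpoint(
        "BAZEL_REMOTE_EXECUTOR", env.get("BAZEL_REMOTE_EXECUTOR", "")
    )
    return [f"--remote_cache={cache}", f"--remote_executor={executor}"]
```

Nothing in this shape names a backend: any implementation that serves the Remote Execution API at the validated endpoint would satisfy it.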

NativeLink remains a reasonable spike target inside that peer set because it has a small operational shape and documented Bazel support. However, the official NativeLink Local Remote Execution documentation describes LRE as experimental and currently limited to x86_64-linux, and the official remote-execution examples separate cache-only, cache-plus-execution, and hybrid execution patterns. Those constraints match this repo’s risk profile: a peer spike is reasonable, but a peer deployment should not be treated as predetermined product truth or a required GloriousFlywheel dependency.

Before implementing tofu/modules/nativelink/ or any equivalent peer-shaped adapter, answer these questions in a short architecture decision:

  • backend posture: backend-neutral contract, NativeLink spike, BuildBuddy, Buildbarn, Buildfarm, or defer
  • license and public-product compatibility
  • CAS/storage backend and retention model
  • auth model for internal runners, local developer machines, and future public consumers
  • scheduler and worker placement, including pod-count and imagefs headroom
  • worker image provenance and update policy
  • remote cache interoperability with the existing Bazel cache
  • BES and observability expectations
  • sandbox, network, and secret isolation model
  • minimum proof targets and exact success counters

Minimum Countable RBE Proof

The first countable RBE proof should be deliberately small.

Required evidence:

  • a wrapper validates BAZEL_REMOTE_EXECUTOR separately from BAZEL_REMOTE_CACHE
  • Bazel receives both endpoints as explicit CLI options, not literal rc placeholders
  • the build log shows remote processes, not only remote cache hits
  • a hermetic target such as //app:build or //app:unit_tests succeeds through the executor
  • unsupported targets are either tagged local-only or explicitly excluded with a documented reason
  • the proof runs on a shared capability label, not a repo-shaped label
  • docs continue to say “shared cache acceleration” until this proof lands on the default branch

Only after that proof should README or public-alpha language expand toward “remote build.”
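
The "remote processes, not only remote cache hits" requirement can be checked against Bazel's end-of-build process summary. The sketch below is illustrative Python and assumes the summary line format printed by recent Bazel versions, for example "INFO: 7 processes: 1 remote cache hit, 2 internal, 4 remote." The function names are hypothetical, and the exact success counters remain a question for the architecture decision.

```python
import re

# Assumed shape: "INFO: <n> processes: <count> <strategy>, <count> <strategy>."
_SUMMARY = re.compile(r"INFO: (\d+) process(?:es)?: (.+?)\.?\s*$")


def parse_process_summary(log_text: str) -> dict[str, int]:
    """Return per-strategy process counts from the last summary line found."""
    counts: dict[str, int] = {}
    for line in log_text.splitlines():
        match = _SUMMARY.search(line)
        if not match:
            continue
        counts = {}  # a later summary line replaces an earlier one
        for part in match.group(2).split(","):
            number, _, strategy = part.strip().partition(" ")
            if number.isdigit() and strategy:
                counts[strategy] = int(number)
    return counts


def counts_as_rbe_proof(counts: dict[str, int]) -> bool:
    """Remote cache hits alone do not count; at least one remote process must."""
    return counts.get("remote", 0) > 0
```

A log that reports only "remote cache hit" entries parses cleanly but fails counts_as_rbe_proof, which matches the current cache-backed proof.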

Linear Hygiene Boundary

The May-Aug productization project now contains a NativeLink-shaped milestone and issue set. Keep it, but treat it as draft scaffolding.

Immediate Linear cleanup should:

  • add a gate issue for “Choose and document RBE backend architecture”
  • block or annotate TIN-658 through TIN-662 on that decision
  • retitle or annotate TIN-673 so the README claim happens only after the proof lands on the default branch
  • keep TIN-650 separate as the nearer-term developer-machine shared-cache attachment proof
  • reconcile the old Truthing project separately; it still contains unresolved backlog items and was not cleaned up by the RBE planning pass

Decision

Do not start RBE implementation from the generated NativeLink task list.

The next correct slice is an architecture decision and proof-contract gate that keeps the existing cache-backed dogfood contract stable while defining the smallest credible --remote_executor proof.
