GloriousFlywheel RBE Platform Sprint Plan, 2026-05-10

Date: 2026-05-09

Related issues: TIN-974, TIN-668, TIN-1012, TIN-1041, TIN-1043, TIN-1046

Goal

Move GloriousFlywheel one week closer to the target product: a clean Bazel/Nix-oriented enterprise build substrate that accelerates resource-constrained developer machines and CI through shared caches, capability-class runners, package authority, and target-scoped Bazel remote execution.

The current product truth remains cache-forward local/CI execution plus narrow explicit REAPI proofs. This sprint must not claim broad/default Bazel RBE, a trusted RustFS write path, official BCR readiness, or repo-specific runner taxonomy.

Success Metrics

  • Default branch stays green across Validate, Secret Detection, Source Bazel Proof, Platform Proof, Deploy Docs, and Publish to FlakeHub.
  • just arc-runtime-audit --fail-on-listener-cap-drift --fail-on-runner-count-drift --fail-on-runner-session-drift either passes after an approved maintenance window or records a concrete runner-control blocker.
  • Attic trusted writes stay quarantined by default; RustFS is not promoted to HA state, BCR, or future RBE CAS/action-cache authority.
  • Bazel external input authority advances beyond an unqualified upstream-with-retries story for at least one real consumer path, or the missing mirror/repository-cache/distdir input is named precisely.
  • One additional RBE target class is selected and either proved with nonzero remote process evidence or blocked with an exact hermeticity/toolchain reason.
  • BCR/Bzlmod work is tracked as package authority, separate from RBE execution authority.

Owners

  • Jess: live operations, backend authority decisions, maintenance windows, and final product tradeoff calls.
  • Codex: repo guards, proof harnesses, RBE target eligibility, external input authority, and CI validation.
  • Claude: docs, Linear/GitHub tracker hygiene, BCR/Bzlmod package posture, and product narrative cleanup.

Workstreams

1. Runner Control Plane

Start with live ARC truth, not green-workflow inference. The first gate is:

just arc-runtime-audit \
  --fail-on-listener-cap-drift \
  --fail-on-runner-count-drift \
  --fail-on-runner-session-drift

If it fails, classify whether the issue is missing listener config, stale EphemeralRunner pods, runner-count drift, broker/session errors, kubelet/node pressure, or queued-job/listener drift. Do not create repo-specific runner labels as a workaround.
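
As a minimal sketch of recording that gate's outcome, the wrapper below runs whatever audit command it is given and prints either a pass line or a blocker line pointing at the captured log. The function name run_audit_gate and the log path are hypothetical and not part of the repository; classifying the failure into the categories above still means reading the log.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run an audit command and print a one-line result.
# Usage: run_audit_gate <log-file> <command> [args...]
run_audit_gate() {
  local log_file="$1"
  shift
  local status=0
  # Capture stdout and stderr so a failure can be classified afterwards.
  "$@" >"$log_file" 2>&1 || status=$?
  if [ "$status" -eq 0 ]; then
    echo "arc-runtime-audit: pass"
  else
    echo "arc-runtime-audit: blocked (exit ${status}), see ${log_file}"
  fi
}
```

It could be invoked as: run_audit_gate arc-audit.log just arc-runtime-audit --fail-on-listener-cap-drift --fail-on-runner-count-drift --fail-on-runner-session-drift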

2. Backend And Cache Authority

Keep role boundaries strict:

  • RustFS may remain guarded interim infrastructure for current reads/state checks.
  • RustFS must not become trusted Attic write-publication authority until the bucket-index failure class has a non-restart repair, clean representative proof, or replacement backend.
  • Future RBE CAS/action-cache storage must be designed separately from the current RustFS singleton.

3. Bazel External Input And Developer Attachment

Promote repository-cache, distdir, approved mirror, or generated injected repository paths as external input authority. BAZEL_REMOTE_CACHE only covers action outputs; it does not make repository archives authoritative.
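
To make that split concrete, here is a minimal sketch that composes the Bazel flags covering external inputs. --repository_cache and --distdir are standard Bazel flags; the helper name and any paths passed to it are hypothetical.

```shell
# Hypothetical helper: print the Bazel flags that cover external inputs.
# --repository_cache and --distdir are consulted for repository archives
# before the network. --remote_cache is deliberately absent: as stated
# above, it covers action outputs, not repository archives.
external_input_flags() {
  local repo_cache="$1"
  local distdir="$2"
  printf '%s\n' "--repository_cache=${repo_cache}" "--distdir=${distdir}"
}
```

A caller could read the output into an array with mapfile -t flags < <(external_input_flags "$repo_cache" "$distdir") and pass "${flags[@]}" to bazel build.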

Current next gate: the generated Node.js 22.13.1 Linux x64 toolchain archive used by the source proof is materialized into an ephemeral verified distdir before Bazel starts. docs/contracts/bazel-distdir-source-proof-coverage.json now makes the one required source-proof archive and the seven deferred generated Node archives machine-checkable. Finishing this lane still means turning that source-proof staging into durable fetch authority, with auth, retention, restore, provenance, and consumer exposure clearly stated, and then covering the remaining generated platform archives. docs/contracts/bazel-external-input-durable-authority.json keeps that status machine-checkable as no-live-durable-authority until a real backend and restore proof exist.
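
A staging step of this kind could look roughly like the sketch below: check an archive against a pinned SHA-256 and only then copy it into the distdir. This is an illustration, not the repository's actual staging script; stage_verified_archive and its arguments are hypothetical, and it assumes GNU sha256sum is on PATH.

```shell
# Hypothetical staging step: copy an archive into a distdir only if its
# SHA-256 matches the pinned value.
# Usage: stage_verified_archive <archive> <expected-sha256> <distdir>
stage_verified_archive() {
  local archive="$1"
  local expected_sha256="$2"
  local distdir="$3"
  local actual
  # sha256sum prints "<hash>  <path>"; keep only the hash field.
  actual="$(sha256sum "$archive" | cut -d' ' -f1)"
  if [ "$actual" != "$expected_sha256" ]; then
    echo "sha256 mismatch for ${archive}: got ${actual}" >&2
    return 1
  fi
  # Create the distdir only after verification succeeds.
  mkdir -p "$distdir"
  # The basename is preserved because Bazel looks up distdir entries by
  # file name.
  cp "$archive" "$distdir/"
}
```

Bazel would then be started with --distdir pointing at the staged directory. Nothing here addresses retention, restore, or provenance, which is why the staged directory is only ephemeral authority.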

Developer-machine attachment remains limited to an operator-provided endpoint until a tailnet/public endpoint and auth story are explicitly selected.

4. RBE Target-Class Expansion

Select one small hermetic target class for the next proof. Prefer a bounded TypeScript/docs-adjacent build or test shape before attempting Rust, Zig, Chapel, or C++ breadth. Avoid OpenTofu, image publication, developer servers, and privileged host-state actions until their blockers are removed.

Promotion requires --remote_executor, --remote_cache, --remote_accept_cached=false, nonzero remote process evidence, and worker provenance. Cache hits, ARC dispatch, and self-hosted runner execution do not count as RBE.
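
One way to check the nonzero-remote-process requirement is to parse Bazel's process summary line, as in the sketch below. It assumes the summary has the shape shown in the comment; the function names are hypothetical, and the check says nothing about worker provenance, which needs separate evidence.

```shell
# Sketch: count remotely executed processes from a Bazel log read on stdin.
# Assumes Bazel's summary line looks like:
#   INFO: 12 processes: 4 remote cache hit, 3 internal, 5 remote.
remote_process_count() {
  local line count
  # Keep the last process summary line in the log.
  line="$(grep -E '[0-9]+ process(es)?: ' | tail -n 1)" || true
  # Match "<n> remote" only when followed by "," "." or end of line,
  # so "<n> remote cache hit" is not counted as execution.
  count="$(printf '%s\n' "$line" \
    | grep -oE '[0-9]+ remote([,.]|$)' \
    | grep -oE '^[0-9]+' \
    | head -n 1)" || true
  echo "${count:-0}"
}

# Fail unless the log shows at least one remotely executed process.
require_remote_evidence() {
  local n
  n="$(remote_process_count)"
  if [ "$n" -gt 0 ]; then
    echo "remote processes: ${n}"
  else
    echo "no remote processes; cache hits do not count as RBE" >&2
    return 1
  fi
}
```

A proof run would pipe the combined output of a Bazel invocation that sets --remote_executor, --remote_cache, and --remote_accept_cached=false into require_remote_evidence. A log containing only remote cache hits fails the check.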

5. BCR/Bzlmod Package Authority

Advance BCR/Bzlmod as package authority:

  • reconcile dependency pins such as TIN-1041
  • keep the attic-iac compatibility module-name boundary explicit
  • decide internal registry versus public BCR posture with release evidence
  • do not treat BCR progress as RBE proof

Week Closeout

End the sprint with one tracker update that states:

  • what is product-grade now
  • what is proof-only
  • what is still blocked
  • the next target class or backend gate
  • any live ARC/RustFS maintenance evidence gathered during the week
