GloriousFlywheel RBE Platform Sprint Plan, 2026-05-10
Date: 2026-05-09
Related issues: TIN-974, TIN-668, TIN-1012, TIN-1041, TIN-1043, TIN-1046
Goal
Move GloriousFlywheel one week closer to the target product: a clean Bazel/Nix-oriented enterprise build substrate that accelerates resource-constrained developer machines and CI through shared caches, capability-class runners, package authority, and target-scoped Bazel remote execution.
The current product truth remains cache-forward local/CI execution plus narrow explicit REAPI proofs. This sprint must not claim broad/default Bazel RBE, a trusted RustFS write path, official BCR readiness, or repo-specific runner taxonomy.
Success Metrics
- Default branch stays green across Validate, Secret Detection, Source Bazel Proof, Platform Proof, Deploy Docs, and Publish to FlakeHub.
- just arc-runtime-audit --fail-on-listener-cap-drift --fail-on-runner-count-drift --fail-on-runner-session-drift either passes after an approved maintenance window or records a concrete runner-control blocker.
- Attic trusted writes stay quarantined by default; RustFS is not promoted to HA state, BCR, or future RBE CAS/action-cache authority.
- Bazel external input authority advances beyond an unqualified upstream-with-retries story for at least one real consumer path, or the missing mirror/repository-cache/distdir input is named precisely.
- One additional RBE target class is selected and either proved with nonzero remote process evidence or blocked with an exact hermeticity/toolchain reason.
- BCR/Bzlmod work is tracked as package authority, separate from RBE execution authority.
Owners
- Jess: live operations, backend authority decisions, maintenance windows, and final product tradeoff calls.
- Codex: repo guards, proof harnesses, RBE target eligibility, external input authority, and CI validation.
- Claude: docs, Linear/GitHub tracker hygiene, BCR/Bzlmod package posture, and product narrative cleanup.
Workstreams
1. Runner Control Plane
Start with live ARC truth, not green-workflow inference. The first gate is:
just arc-runtime-audit \
--fail-on-listener-cap-drift \
--fail-on-runner-count-drift \
--fail-on-runner-session-drift
If it fails, classify whether the issue is missing listener config, stale EphemeralRunner pods, runner-count drift, broker/session errors, kubelet/node pressure, or queued-job/listener drift. Do not create repo-specific runner labels as a workaround.
2. Backend And Cache Authority
Keep role boundaries strict:
- RustFS may remain guarded interim infrastructure for current reads/state checks.
- RustFS must not become trusted Attic write-publication authority until the bucket-index failure class has a non-restart repair, clean representative proof, or replacement backend.
- Future RBE CAS/action-cache storage must be designed separately from the current RustFS singleton.
3. Bazel External Input And Developer Attachment
Promote repository-cache, distdir, approved mirror, or generated injected
repository paths as external input authority. BAZEL_REMOTE_CACHE only covers
action outputs; it does not make repository archives authoritative.
Current next gate: the generated Node.js 22.13.1 Linux x64 toolchain archive
used by the source proof is materialized into an ephemeral verified distdir
before Bazel starts, and
docs/contracts/bazel-distdir-source-proof-coverage.json now makes the one
required source-proof archive plus the seven deferred generated Node archives
machine-checkable. Finishing this lane still means turning that source-proof
staging into durable fetch authority with auth, retention, restore, provenance,
and consumer exposure clearly stated, then covering the remaining generated
platform archives. docs/contracts/bazel-external-input-durable-authority.json
keeps that status machine-checkable as no-live-durable-authority until a real
backend and restore proof exist.
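The "ephemeral verified distdir" step above can be sketched as follows. This is a minimal illustration, not the source-proof harness itself: the function names and the input shape (a mapping from local archive path to expected SHA-256) are assumptions, and the real coverage contract may be structured differently.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file so large toolchain archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_verified_distdir(archives: dict[Path, str]) -> Path:
    """Copy archives into a fresh temp directory, refusing any digest mismatch.

    `archives` maps a local archive path to its expected SHA-256 hex digest.
    This input shape is hypothetical and chosen only for the sketch.
    """
    distdir = Path(tempfile.mkdtemp(prefix="distdir-"))
    try:
        for source, expected in archives.items():
            actual = sha256_of(source)
            if actual != expected.lower():
                raise ValueError(
                    f"digest mismatch for {source.name}: "
                    f"expected {expected}, got {actual}"
                )
            # Bazel matches distdir files by basename, so keep the name intact.
            shutil.copy2(source, distdir / source.name)
    except Exception:
        # Never leave a partially verified distdir behind.
        shutil.rmtree(distdir, ignore_errors=True)
        raise
    return distdir
```

The returned directory would then be handed to Bazel through its --distdir flag before the build starts. Because the directory is temporary, this only covers staging; it does not provide the retention, restore, or provenance guarantees that durable fetch authority requires.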
Developer-machine attachment remains limited to an operator-provided endpoint until a tailnet/public endpoint and auth story are explicitly selected.
4. RBE Target-Class Expansion
Select one small hermetic target class for the next proof. Prefer a bounded TypeScript/docs-adjacent build or test shape before attempting Rust, Zig, Chapel, or C++ breadth. Avoid OpenTofu, image publication, developer servers, and privileged host-state actions until their blockers are removed.
Promotion requires --remote_executor, --remote_cache,
--remote_accept_cached=false, nonzero remote process evidence, and worker
provenance. Cache hits, ARC dispatch, and self-hosted runner execution do not
count as RBE.
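One way to make the promotion rule checkable is a small gate over the Bazel invocation and its build log. The sketch below is an assumption-laden illustration: the function and class names are hypothetical, worker provenance is treated as an opaque non-empty string, and the remote process count is read from the process summary line that Bazel prints at the end of a build (for example "INFO: 12 processes: 3 internal, 9 remote."). Entries reported as "remote cache hit" are deliberately excluded.

```python
import re
from dataclasses import dataclass

REQUIRED_FLAG_PREFIXES = ("--remote_executor=", "--remote_cache=")
# Matches "N remote" in the process summary, but not "N remote cache hit".
REMOTE_PROCESSES = re.compile(r"(\d+) remote(?! cache hit)\b")


@dataclass
class ProofVerdict:
    promoted: bool
    reasons: list[str]


def evaluate_rbe_proof(
    argv: list[str], build_log: str, worker_provenance: str
) -> ProofVerdict:
    """Return a verdict plus every reason the proof falls short of promotion."""
    reasons = []
    for prefix in REQUIRED_FLAG_PREFIXES:
        # The flag must be present and carry a non-empty value.
        if not any(a.startswith(prefix) and len(a) > len(prefix) for a in argv):
            reasons.append(f"missing {prefix.rstrip('=')}")
    if "--remote_accept_cached=false" not in argv:
        reasons.append("missing --remote_accept_cached=false")
    remote = sum(int(n) for n in REMOTE_PROCESSES.findall(build_log))
    if remote == 0:
        reasons.append("no remotely executed processes in the build log")
    if not worker_provenance.strip():
        reasons.append("no worker provenance recorded")
    return ProofVerdict(promoted=not reasons, reasons=reasons)
```

A build whose summary reads "INFO: 7 processes: 4 remote cache hit, 3 internal." would be rejected by this gate even with all three flags set, which matches the rule that cache hits do not count as RBE.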
5. BCR/Bzlmod Package Authority
Advance BCR/Bzlmod as package authority:
- reconcile dependency pins such as TIN-1041
- keep the attic-iac compatibility module-name boundary explicit
- decide internal registry versus public BCR posture with release evidence
- do not treat BCR progress as RBE proof
Week Closeout
End the sprint with one tracker update that states:
- what is product-grade now
- what is proof-only
- what is still blocked
- the next target class or backend gate
- any live ARC/RustFS maintenance evidence gathered during the week