Backend Authority Decision — May 2026
Snapshot date: 2026-05-09, with 2026-05-19 state-authority addendum. Author: J2 of the May 10-16 sprint plan (2026-05-10-cache-forward-toward-rbe.md).
Status: Approved. Drafted by Claude from existing repo truth; Jess signed off on 2026-05-09 as product authority.
Purpose
Lock down what the GloriousFlywheel substrate trusts each backend for, so that future work — Attic publication, HA OpenTofu state, RBE CAS/action-cache — does not silently inherit an interim trust model that hasn’t been re-validated.
This decision is referenced by:
- BCR/Bzlmod posture
- BCR/RBE/RustFS product reality review
- The May 10-16 sprint plan (J2 deliverable)
Decision Summary
- Current RustFS is interim-only. The live singleton RustFS path is acceptable only for guarded reads and non-trusted state probes. It is not the current trusted backend for Attic publication, strict HA OpenTofu state, or future RBE CAS/action-cache because bucket-index visibility has repeatedly failed and restart has been the observed recovery path. A future RustFS promotion would require a separate TIN-1147 repair/topology proof and an explicit replacement decision; green canaries or restart recovery do not count.
- Attic trusted writes stay quarantined. TIN-1043 closed the default-read-only quarantine; TIN-1046 owns any future trusted publication ramp, gated on a non-restart repair path, a different backend, or a clean representative ramp after a backend fix.
- HA OpenTofu state has a candidate but no live endpoint. TIN-1016 selected a managed/appliance S3-compatible service as the proof target; TIN-1026 is the live-endpoint blocker. TIN-1012 stays In Progress until a real HA endpoint or replicated RustFS proof passes strict mode.
- RBE CAS / action-cache is a separate future design. Broad/default RBE is the product goal, but its durable CAS/action-cache authority does not inherit the current RustFS trust model. Backend, auth, retention, quota, tenant isolation, and observability for RBE storage are evaluated as their own production gate. The narrow `gf-reapi-cell` proof uses node-local storage by design and does not pre-commit a CAS choice.
Context
What is true today (verified 2026-05-09; amended 2026-05-19)
- `tests/ha_state_candidate_inventory.sh` classifies the current `attic-rustfs-openebs` RustFS service as interim_only, not HA-ready.
- `.github/actions/nix-job/action.yml:10` defaults `push-cache` to `false`.
- `scripts/validate-attic-write-quarantine.py` enforces the quarantine in CI workflows.
- The 2026-05-06 RustFS bucket-index incident (`docs/research/gloriousflywheel-attic-rustfs-nar-index-incident-2026-05-06.md`) reproduced `NoSuchBucket` + HTTP 500 on both small and medium publication probes; restart restored the S3 view but no non-restart repair path exists.
- The TIN-1016 candidate contract (`docs/contracts/ha-opentofu-state-managed-s3-candidate.json`) chose managed/appliance S3-compatible state. TIN-1026 is the live-endpoint package + scoped `TOFU_HA_STATE_*` proof credentials blocker.
- On 2026-05-19, the active RustFS state path failed the interim readiness guard again: `tofu-state` was absent from S3 `list-buckets` while `/data/tofu-state` and `/data/.rustfs.sys/buckets/tofu-state` remained present. GloriousFlywheel PR #735 also failed `Plan ARC Runners` on the same state-authority guard. The PR fixes parser truth only; it does not make RustFS a deploy/state authority.
- Later on 2026-05-19, post-merge RustFS canary run `26083251931` passed. That is renewed current coherence evidence, not non-restart repair or strict HA proof.
- The narrow `gf-reapi-cell` proof (gf-reapi-cell.md) deliberately uses node-local PVC storage on a `compute-expansion` lane and is explicitly separated from RustFS, Attic, and OpenTofu state buckets.
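The 2026-05-19 failure shape (bucket missing from the S3 listing while its on-disk directories survive) can be illustrated with a small classifier. This is a minimal sketch, not the repo's readiness guard: the `BucketObservation` structure, its field names, and the returned labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketObservation:
    """One point-in-time view of a bucket (hypothetical structure)."""
    name: str
    listed_by_s3: bool      # name appears in the S3 list-buckets response
    data_dir_present: bool  # e.g. /data/<name> exists on the volume
    meta_dir_present: bool  # e.g. /data/.rustfs.sys/buckets/<name> exists


def classify(obs: BucketObservation) -> str:
    """Return a coarse label for the observation (labels are illustrative)."""
    any_on_disk = obs.data_dir_present or obs.meta_dir_present
    all_on_disk = obs.data_dir_present and obs.meta_dir_present
    if obs.listed_by_s3:
        return "coherent" if all_on_disk else "listed_but_incomplete_on_disk"
    # Not listed: either the data survives unseen by the index, or nothing exists.
    return "index_divergence" if any_on_disk else "absent"


# The 2026-05-19 observation for tofu-state, as described above.
may_19 = BucketObservation("tofu-state", listed_by_s3=False,
                           data_dir_present=True, meta_dir_present=True)
print(classify(may_19))
```

Under this sketch a later green canary would classify as `coherent` again, which is why a single coherent reading is treated here as current evidence only and not as a repair proof.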
What this decision is NOT
- Not a vendor decision for HA state. TIN-1016 names a class (managed S3-compatible); the specific vendor decision lands when TIN-1026 unblocks the live endpoint.
- Not a Garage/SeaweedFS/MinIO/managed-S3 selection for RBE. RBE CAS/action-cache backend selection is gated on the broad-RBE work and does not happen in this decision.
- Not a deprecation of RustFS. RustFS continues to back guarded interim read paths where the guard is green. The May 19 state-authority failure means it must not be treated as deploy/state authority, trusted Attic publication, strict HA state, or future CAS/action-cache authority until TIN-1147 repairs or replaces that role.
Decision Detail
Per-role backend posture
| Role | Current backend | Trusted? | Reference |
|---|---|---|---|
| Nix cache reads (Attic) | RustFS via attic-rustfs-openebs | Yes (guarded interim) | tofu/modules/rustfs/main.tf, roadmap “Now” |
| Trusted Attic writes (publication) | RustFS | No (quarantined) | TIN-1043, validate-attic-write-quarantine.py |
| OpenTofu state (tofu-state) | RustFS bucket | Degraded guarded interim; recurring guard failures | TIN-1147 evidence; just tofu-state-ha-readiness |
| Strict HA OpenTofu state | (no live backend) | No backend selected | TIN-1012 In Progress; TIN-1026 blocker |
| Bazel remote action cache | RustFS-backed bazel-cache bucket via attic-rustfs-openebs | Yes for cache-forward acceleration; not for trusted writes | roadmap “Now” |
| RBE CAS / action cache | (none; narrow proof uses node-local PVC) | N/A (separate future design) | docs/build-system/gf-reapi-cell.md “Storage Boundary” |
| WAS-110 public archive mirror | was110-public-inputs RustFS bucket | Yes for read-side public-input pinning | roadmap “Now” |
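For tooling that needs to consult this posture, the table could be encoded as data with a fail-closed lookup. The following is a sketch under stated assumptions: the role keys, the `Trust` enum, and `may_rely_on` are hypothetical names, and collapsing the three "Yes" rows into one scoped category is a simplification of the table, not a restatement of it.

```python
from enum import Enum


class Trust(Enum):
    YES_SCOPED = "yes, within the scope stated in the table"
    DEGRADED = "degraded guarded interim"
    QUARANTINED = "quarantined"
    NO_BACKEND = "no backend selected"
    SEPARATE_DESIGN = "separate future design"


# Hypothetical keys; one entry per row of the per-role posture table.
POSTURE = {
    "nix_cache_reads": Trust.YES_SCOPED,
    "attic_trusted_writes": Trust.QUARANTINED,
    "tofu_state": Trust.DEGRADED,
    "strict_ha_tofu_state": Trust.NO_BACKEND,
    "bazel_remote_action_cache": Trust.YES_SCOPED,
    "rbe_cas_action_cache": Trust.SEPARATE_DESIGN,
    "was110_public_archive_mirror": Trust.YES_SCOPED,
}


def may_rely_on(role: str, guard_green: bool) -> bool:
    """Allow only scoped-yes roles, and only while their guard is green.

    Unknown roles fail closed: POSTURE.get returns None, which never matches.
    """
    return POSTURE.get(role) is Trust.YES_SCOPED and guard_green


print(may_rely_on("nix_cache_reads", guard_green=True))
print(may_rely_on("tofu_state", guard_green=True))
```

The design choice worth noting is the default: a role that is missing from the mapping is denied, so a newly added role does not inherit trust by omission.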
Required next decisions (out of scope here)
These are not J2 decisions; they are flagged so future authors can pick them up cleanly:
- TIN-1026 live HA endpoint package. Names the vendor + endpoint for managed S3-compatible state. Owns the actual cutover.
- RBE CAS / action-cache backend selection. Probably blocks broader RBE rollout. Distinct from HA state because the write profile, retention, and auth model differ. See “Stop/Go Table” in the BCR/RBE/RustFS product reality review.
- Trusted Attic write ramp (TIN-1046). Either a RustFS non-restart repair, a different backend, or a clean representative ramp after a backend fix.

  `rustfs-trusted-publication-backend-gate.json` is the static TIN-1147 stop/go gate for that backend decision. It keeps restart-only recovery, green canary-only coherence, source-only admin-route existence, and unrelated RBE or OpenTofu state evidence from counting as trusted Attic publication backend proof.

  `rustfs-upgrade-topology-proof-plan.json` is the non-mutating beta.4 upgrade/topology operating plan: it narrows the eventual live change to `rustfs_image`, requires `just tofu-plan-guard attic` plus `just rustfs-upgrade-topology-plan-guard` and operator approval, rejects Civo, and keeps TIN-1046 blocked until representative publication evidence clears the current failure classes. The managed `Deploy Attic Stack` workflow exposes this as manual `plan_scope=rustfs_upgrade_topology`: only plan may continue past the expected-red state authority check, and only to produce a saved plan that passes both guards; apply remains strict and still requires operator approval plus post-upgrade evidence.
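The stop/go rule described for the TIN-1147 gate can be sketched as an allow-list check. This does not reflect the actual schema of `rustfs-trusted-publication-backend-gate.json`, which is not shown here; the evidence labels and the function name below are hypothetical, derived only from the evidence classes named in this document.

```python
from typing import Iterable

# Evidence classes this document names as acceptable paths (hypothetical labels).
QUALIFYING = frozenset({
    "non_restart_repair",
    "different_backend",
    "clean_representative_ramp_after_fix",
})

# Evidence classes this document says must not count (hypothetical labels).
NON_QUALIFYING = frozenset({
    "restart_only_recovery",
    "green_canary_only",
    "source_only_admin_route",
    "unrelated_rbe_evidence",
    "unrelated_tofu_state_evidence",
})


def counts_as_backend_proof(evidence: Iterable[str]) -> bool:
    """True only if at least one explicitly qualifying class is present.

    Uses an allow-list: non-qualifying and unrecognized labels are ignored,
    so the check fails closed on anything it does not know.
    """
    return bool(set(evidence) & QUALIFYING)


print(counts_as_backend_proof({"green_canary_only", "restart_only_recovery"}))
print(counts_as_backend_proof({"green_canary_only", "non_restart_repair"}))
```

Passing this check would only make a backend eligible for a decision; operator approval and the explicit replacement decision described above remain separate steps.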
Open questions
These do not block J2 sign-off but should be tracked:
- Does `tofu-state`-on-RustFS need to migrate before TIN-1026’s HA endpoint lands, or is an emergency restoration window needed first? It can no longer be described as simply “interim, read-side trusted” while the guard is red.
- Should `was110-public-inputs` graduate to a different backend if the public-alpha mirror story expands? (Currently fine as a guarded read-side use.)
- When RBE broad-rollout work begins, who owns the CAS backend evaluation: the same person who owns TIN-1016, or a separate evaluation?
Approval
| Role | Name | Sign-off |
|---|---|---|
| Product authority (Jess) | Jess Sullivan | approved 2026-05-09 |
| Drafter (Claude) | n/a | drafted 2026-05-09 |
Related
- TIN-1012 — strict HA state authority, In Progress
- TIN-1016 — selected managed/appliance S3-compatible state candidate
- TIN-1026 — live HA endpoint package (active blocker)
- TIN-1043 — Attic publication quarantine, Done
- TIN-1046 — future trusted Attic publication ramp owner
- TIN-1070 — May 10-16 sprint control list
- `docs/research/gloriousflywheel-attic-rustfs-nar-index-incident-2026-05-06.md`
- `docs/research/gloriousflywheel-bcr-rbe-rustfs-product-reality-2026-05-08.md`
- `docs/build-system/gf-reapi-cell.md` (narrow proof storage boundary)
- `tests/ha_state_candidate_inventory.sh` (interim_only classification)