GloriousFlywheel BCR, RBE, And RustFS Product Reality Review
Date: 2026-05-08
Related issues: TIN-974, TIN-1043, TIN-1046, TIN-1012, TIN-1027, TIN-665, TIN-1041
Executive Reality
GloriousFlywheel is currently a cache-forward local/CI execution substrate.
That is real product value: shared ARC runner lanes, Attic-backed Nix
substitution, Bazel remote-cache acceleration, implementation overlays, and
repo-managed proof workflows are all working on main.
It is not yet the full target product.
The target product is broader:
- local acceleration through shared caches that attach cleanly from devshells, CI, and runner jobs
- BCR/Bzlmod package authority where reusable packages resolve from versioned module releases instead of local copies or ad hoc source pins
- Bazel external input authority through repository-cache, distdir, approved mirrors, or generated injected repositories rather than mutable upstream URLs at build time
- Bazel remote execution through a countable REAPI executor endpoint, separate from remote cache, with remote process evidence
- remote test and remote build expansion after target eligibility is classified
- remote runner capacity that remains capability-class based rather than repo-shaped
- storage authorities that are separated by role: OpenTofu state, Attic cache, Bazel cache/CAS/action-cache, BCR mirrors, and future RBE CAS must not be conflated
The most important current blocker is RustFS reliability debt. RustFS can still serve the cache-forward read path after recovery, but it has reproduced bucket-index loss during trusted Attic publication. That makes it unsafe as a trusted write-publication backend and disqualifies it as a final HA state authority or future RBE CAS/action-cache authority until repair, replacement, or a stronger recovery proof exists.
Where We Are
Pooled Runner Substrate
Current main proves the shared runner model:
- capability classes such as tinyland-nix, tinyland-docker, tinyland-dind, and bounded additive lanes such as tinyland-nix-heavy remain the product taxonomy
- repo-specific runner labels are debt, not normal product structure
- implementation overlays own owner-specific GitHub App installs, tfvars, backend settings, and private registration anchors
- ARC/GitHub Actions jobs are remote CI jobs, but that is coarse-grained runner execution, not Bazel action-level remote execution
This is a viable pooled runner substrate. It is still not the same thing as Bazel remote execution.
Nix And Attic
Current Attic truth:
- Attic reads are part of the intended shared cache substrate
- nix-job defaults push-cache to false
- pull requests remain read-only for Attic publication
- pilot and downstream examples may use default-branch plus ATTIC_TOKEN gated push-cache
- broad GloriousFlywheel proof workflows still keep push-cache: "false" while the RustFS bucket-index recurrence is unresolved
- the manual Attic publication probe remains the only current strict require-cache-push trusted-write workflow exception
- small-check and medium-check are now controlled reproduction profiles, not safe ramp steps
Trusted Attic writes remain quarantined. TIN-1043 closed the immediate
default-read-only safety gate. TIN-1046 owns the trusted publication ramp, but
TIN-1147 is the active stop/go blocker: it must prove non-restart RustFS
repair/reindex, a RustFS upgrade/topology fix, or a replacement backend before
any clean representative write ramp can restore broad push-cache.
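The gating rules above can be read as a single fail-closed predicate. The following is a minimal sketch under stated assumptions: the `JobContext` fields and the `may_publish` name are hypothetical and are not the nix-job input schema, and treating the quarantine as a single boolean is a simplification of how pilots and broad proof workflows differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobContext:
    """Hypothetical job inputs; field names are illustrative only."""
    event: str                          # e.g. "pull_request" or "push"
    ref_is_default_branch: bool
    has_attic_token: bool
    push_cache_requested: bool = False  # mirrors the push-cache: "false" default
    writes_quarantined: bool = True     # assumed True while the RustFS recurrence is open


def may_publish(ctx: JobContext) -> bool:
    """Return True only when every read-only default has been explicitly lifted."""
    if not ctx.push_cache_requested:
        return False  # default posture is read-only
    if ctx.event == "pull_request":
        return False  # pull requests never publish
    if not (ctx.ref_is_default_branch and ctx.has_attic_token):
        return False  # default-branch plus token gate
    return not ctx.writes_quarantined
```

Every branch returns False unless a condition is positively satisfied, so a missing or misconfigured input degrades to read-only rather than to publication.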
Bazel Remote Cache
Current Bazel truth:
- BAZEL_REMOTE_CACHE is the only default Bazel substrate endpoint
- source-repo proof passes cache-backed Bazel through the repo-managed wrapper
- .bazelrc intentionally avoids executor endpoint literals and placeholder --remote_executor= values
- scripts/bazel-cache-backed.sh can enter opt-in executor-backed mode only when BAZEL_REMOTE_EXECUTOR is set and the strict contract validates the shell as executor-backed
- ARC runner endpoint wiring can now inject a backend-neutral BAZEL_REMOTE_EXECUTOR only when bazel_executor_endpoint is explicitly configured; the default runner posture remains cache-backed
- just rbe-boundary-check keeps default operational surfaces cache-backed
This is cache-forward Bazel acceleration. It is not Bazel remote execution.
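The mode selection described above can be sketched as a small flag builder. This is an illustration, not the contents of scripts/bazel-cache-backed.sh: the `remote_flags` function and the `contract_validated` parameter are hypothetical, and raising an error when the executor variable is set without validation is an assumed fail-closed choice. Only the environment variable names and the Bazel flags `--remote_cache` and `--remote_executor` come from the text.

```python
from typing import List, Mapping


def remote_flags(env: Mapping[str, str], contract_validated: bool) -> List[str]:
    """Build Bazel remote flags; cache-backed unless executor mode is fully opted in."""
    cache = env.get("BAZEL_REMOTE_CACHE", "").strip()
    executor = env.get("BAZEL_REMOTE_EXECUTOR", "").strip()
    flags: List[str] = []
    if cache:
        flags.append(f"--remote_cache={cache}")
    if executor:
        # Assumption: an unvalidated executor endpoint is an error, not a silent fallback.
        if not contract_validated:
            raise ValueError("BAZEL_REMOTE_EXECUTOR is set but the strict contract did not validate")
        flags.append(f"--remote_executor={executor}")
    return flags
```

With only BAZEL_REMOTE_CACHE set, the result contains no executor flag, which matches the default cache-backed posture.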
Bazel REAPI Proof Lane
Current RBE proof truth:
- gf-reapi-cell is the first GloriousFlywheel-owned REAPI proof endpoint
- the proof lane is explicit and non-default through GF_RBE_PROOF_MODE=explicit
- PR #564 proved //app:build through --remote_executor with worker image sha256:be2832171ac69cc9a2d012b3c789e8b765afb7cae0df8f7e9677dd6d8542dbc0
- the proof reported 2308 processes: 1439 internal, 869 remote
- app/sveltekit_sync and app/vite_build both ran remotely with exit code 0
- cache-warm proof reruns must use GF_RBE_PROOF_FORCE_EXECUTION=true; remote cache hits alone are endpoint continuity evidence, not fresh remote-worker evidence
- PR #572 made the WAS-110 public input workflow artifact machine-verifiable; main run 25589377905 built //:public_vendor_handoff_fixture with forced execution, 1 remote process, worker provenance, and injected was110_vendor_blobs evidence
- PR #582 made build/test proof mode explicit; main run 25601913985 tested //app:unit_tests with bazel_command=test, forced execution, 527 remote processes, 20 Vitest files, 168 passing tests, and worker evidence for test-setup.sh app/unit_tests_/unit_tests
- main run 25602726443 built //:deployment_bundle with bazel_command=build, forced execution, 1 remote process, and worker evidence for the rules_pkg build_tar action that writes deployment_bundle.tar.gz
- main run 25608601158 built //docs-site:build with bazel_command=build, forced execution, 1046 remote processes, and remote JsRunBinary evidence for docs-site/.svelte-kit and docs-site/build
- PR #604 added Stage 1 rust/c++/go cache-backed test targets; this broadened cache-backed toolchain evidence but did not promote those languages to RBE
- PR #605 fixed gf-reapi-cell output inlining after forced Go proof run 25631848864; retry run 25632300253 reached 2 remote rules_go actions and then failed in runtime/cgo with cc: no such file or directory; run 25634296833 proved pure-Go //examples/hello-go:hello_test with bazel_command=test, forced execution, 11 remote processes, and remote test-setup evidence
- the RBE target eligibility manifest now records the proved target classes and promotes:
  - //app:unit_tests as the first remote-test target class
  - //:deployment_bundle as the first deployment packaging target class
  - //docs-site:build for static docs-site rendering
  - //examples/hello-go:hello_test for one pure-Go rules_go unit-test class
  - //examples/hello-go-cgo:cgo_test for one cgo-backed rules_go unit-test class
  - //examples/hello-rust:hello_test and //examples/hello-cc:hello_test for one trivial unit-test class each
  - //docs-site:playwright_chromium_smoke from run 25712694947 for one Chromium static-site Vite/SvelteKit Playwright smoke class with 1060 remote processes
  - public consumer web target classes from tinyland-inc/omux.xoxd.ai //:puppeteer_chromium_smoke run 25826953857, tinyland-inc/omux.xoxd.ai //:playwright_chromium_smoke run 25897326537 with proof nonce 20260515T024138Z-25897326537-1 and 6 remote processes, and Jesssullivan/jesssullivan.github.io //:puppeteer_chromium_smoke / //:sveltekit_vite_build_smoke runs 25777472760 and 25779597385
  - Jesssullivan/jesssullivan.github.io //:types_unit_tests run 25892939448 for one public SvelteKit/Vite/Vitest target class with 855 remote processes
- later private web proofs also promote narrow tinyland-inc/tinyland.dev app/package classes and Jesssullivan/MassageIthaca classes, including:
  - //:sveltekit_node_build from run 25983800544 with 3193 remote processes, remote sveltekit_sync_bin_/sveltekit_sync_bin, and remote vite_build_bin_/vite_build_bin evidence
  - tinyland-inc/tinyland.dev //packages/tinyland-a11y-engine:typecheck from run 25984827370 with proof nonce 20260517T073751Z-25984827370-1, 2 remote processes, remote esbuild lifecycle-hook execution, and remote TypeScript tsc for packages/tinyland-color-utils
  - tinyland-inc/tinyland.dev //:playwright_local_route_smoke from run 25989829826 with proof nonce 20260517T114200Z-25989829826-1, 53 remote processes, remote test-setup.sh, and a passing local-server Playwright route smoke over loopback-served SvelteKit output
- OpenTofu target classes remain blocked until toolchain, provider, runfiles, and mutable-state behavior are hermetic
This is countable RBE evidence for narrow build, test, and public-input target classes. It is not broad product RBE, not broad Playwright/Puppeteer/E2E RBE, not ARC dispatch, not cache-only execution, and not RustFS-backed CAS/action-cache authority.
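The distinction between remote process evidence and remote cache hits can be checked mechanically from the process summary line Bazel prints at the end of a build. The sketch below is a hedged illustration: the function names are hypothetical, the proof lane's actual evidence parser is not shown in this document, and the summary format is assumed to match the `2308 processes: 1439 internal, 869 remote` shape quoted above.

```python
import re
from typing import Dict

# Matches e.g. "INFO: 2308 processes: 1439 internal, 869 remote."
SUMMARY = re.compile(r"(\d+) process(?:es)?: (.+?)\.?$")


def parse_process_summary(line: str) -> Dict[str, int]:
    """Split a Bazel process summary line into per-strategy counts."""
    match = SUMMARY.search(line.strip())
    if match is None:
        raise ValueError(f"not a process summary line: {line!r}")
    counts = {"total": int(match.group(1))}
    for part in match.group(2).split(","):
        number, kind = part.strip().split(" ", 1)
        counts[kind] = int(number)
    return counts


def has_remote_evidence(counts: Dict[str, int]) -> bool:
    """Only the 'remote' bucket counts; 'remote cache hit' is a separate key."""
    return counts.get("remote", 0) > 0
```

Because `remote cache hit` is parsed as its own key, a cache-warm rerun with zero `remote` processes does not pass the check, which is the reason forced execution is required for reruns.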
Bazel External Fetch Authority
Current external fetch truth:
- the default Source Bazel Proof now materializes a verified ephemeral BAZEL_DISTDIR for the nodejs_linux_amd64:22.13.1:linux_amd64 toolchain archive before Bazel starts
- docs/contracts/bazel-distdir-source-proof-coverage.json validates that required source-proof archive and classifies the other seven generated Node archives as deferred
- repo-wide default status is still not durable mirror authority
- .bazelrc has retry and timeout mitigation
- no repo-wide repository-cache or durable distdir authority is live by default
- BAZEL_REPOSITORY_CACHE, BAZEL_DISTDIR, and GF_BAZEL_INJECT_REPOSITORIES are wired as authority inputs for the cache-backed wrapper and explicit RBE proof wrapper
- docs/contracts/bazel-external-input-mirror-candidates.json records candidate integrity for the eight generated Node.js 22.13.1 toolchain archives, but it is candidate-integrity-only with materialized: false
- docs/contracts/bazel-external-input-durable-authority.json records the promotion gate as no-live-durable-authority: no candidate is durably covered yet, all eight candidates are pending, and future promotion requires auth, retention, restore, provenance, and consumer exposure evidence
- the WAS-110 public-input mirror proof exists, but it is a specific public input staging path, not universal external fetch authority
Remote cache does not cover repository resolution. The verified ephemeral distdir removes raw Node template fetching from the source proof’s Bazel phase, but the product still needs a durable repository-cache/distdir or mirror policy before it can claim broad cache authority for Bazel external inputs. The durable authority contract makes the promotion criteria executable, but it does not make any storage backend live by itself. Candidate hashes reduce ambiguity about what to mirror next; they do not prove offline fetch authority by themselves.
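A verified distdir means each expected archive is present and matches a recorded digest before Bazel starts. The following is a minimal sketch of that check, assuming a simple mapping of archive file name to SHA-256 hex digest; the mapping shape and function names are hypothetical and do not reflect the schema of the JSON contracts named above.

```python
import hashlib
from pathlib import Path
from typing import List, Mapping


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large archives are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_distdir(distdir: Path, expected: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every archive verified."""
    problems: List[str] = []
    for name, want in expected.items():
        archive = distdir / name
        if not archive.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(archive) != want.lower():
            problems.append(f"sha256 mismatch: {name}")
    return problems
```

A check like this establishes integrity for one run only. It says nothing about retention, restore, or consumer exposure, which is why a passing ephemeral distdir is not durable authority.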
BCR And Bzlmod
Current BCR/Bzlmod truth:
- GloriousFlywheel itself is Bzlmod-shaped, but its module name is still attic-iac for compatibility
- Bzlmod currently helps separate reusable core code from implementation overlays
- internal package authority work is happening in package repos and registry follow-ups, not in the RustFS/RBE backend lane
- BCR readiness is not proved by green cache-backed CI
The BCR goal is package/module authority:
- package releases exist at their source authority
- registry entries point to correct versions and dependencies
- consumer repos resolve through Bzlmod without local package copies
- compatibility names and dependency pins are reconciled
- public BCR or internal-registry posture is explicit
That is adjacent to RBE but not the same work. BCR controls module discovery and dependency authority. RBE controls where Bazel actions execute.
RustFS
Current RustFS truth:
- RustFS backs Attic object storage, Bazel cache storage, and the interim OpenTofu state path
- the current active RustFS service is one RustFS Deployment, one service endpoint, one OpenEBS ZFS node, and a bumble-scoped ReadWriteOnce PVC
- after the May 19 recovery, tofu-state is currently readable again, but the same date also showed it can disappear from S3 list-buckets while disk markers persist; strict HA still fails
- live inventory returns NO_LIVE_HA_STATE_CANDIDATE
- the current RustFS image does not expose an obvious non-restart heal/repair/reindex command surface
- Attic publication has reproduced NoSuchBucket, HTTP 500, and InternalServerError while /data/<bucket> and /data/.rustfs.sys/buckets/<bucket> markers existed
- May 14 repair-surface inventory confirmed the deployed rustfs v1.0.0-beta.1 CLI exposes only server and info; the pod has no rustfs-admin, rc, mc, aws, or s5cmd client binary. The tagged source has internal admin heal endpoints. A May 14 signed background-heal status endpoint probe returned HTTP 200 with valid JSON from /rustfs/admin/v3/background-heal/status; this proves observability, not repair. A source semantics audit also found that the bucket/object heal endpoint drops the parsed dryRun option: the handler builds channel work with create_heal_request, that constructor sets dry_run: None, and the heal processor defaults missing dry_run to false. There is still no repo-proved signed repair runbook for the live bucket-index recurrence. A follow-on source audit found export/import-bucket-metadata is not a proved repair path: export depends on the current list_bucket/get_bucket_info API view, while import is a mutating zip-archive path that can call make_bucket(force_create) and does not persist accumulated imported metadata config updates in the current handler. TIN-1147 remains active until that repair path, a RustFS upgrade/topology change, or a replacement backend is proved.
RustFS is currently guarded interim infrastructure. It is not an HA state authority, not trusted write-publication authority, not BCR authority, and not future RBE CAS/action-cache authority.
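The dry-run finding in the source audit is easier to see as a small model. The sketch below is a Python illustration of the failure class only, not RustFS code: apart from the `create_heal_request` and `dry_run` names taken from the audit description above, every type and function here is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealOptions:
    """What the handler parsed from the caller's request (hypothetical shape)."""
    dry_run: Optional[bool] = None


@dataclass
class HealRequest:
    """What is queued for the heal processor (hypothetical shape)."""
    bucket: str
    dry_run: Optional[bool] = None


def create_heal_request_lossy(bucket: str, opts: HealOptions) -> HealRequest:
    # Models the audited behaviour: the parsed option is not copied across.
    return HealRequest(bucket=bucket, dry_run=None)


def create_heal_request_preserving(bucket: str, opts: HealOptions) -> HealRequest:
    # What a safe dry-run surface would need: carry the caller's choice through.
    return HealRequest(bucket=bucket, dry_run=opts.dry_run)


def will_mutate(request: HealRequest) -> bool:
    # Processor default: a missing dry_run is treated as False, i.e. a real heal.
    dry_run = request.dry_run if request.dry_run is not None else False
    return not dry_run
```

In this model a caller who asks for `dry_run=True` through the lossy constructor still gets a mutating heal, which is why a dry-run request cannot be used as a safe proof step against the live bucket.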
Where We Want To Be
Product North Star
GloriousFlywheel should become a pooled build substrate where a developer or CI job can attach once and receive the same governed acceleration and execution contract:
- Nix substitution from trusted caches
- Bazel remote cache for action outputs
- Bazel external input authority for repository/archive fetches
- Bzlmod/BCR package authority for reusable package dependencies
- REAPI remote execution for eligible Bazel actions
- capability-class runner capacity for workflows that remain runner-shaped
- explicit local-only escape hatches for actions that are not remote-execution eligible
The product should not claim “remote build” just because a GitHub Actions job
runs on a self-hosted runner. The countable remote-build claim starts when
Bazel actions execute through a validated --remote_executor.
RBE Target
The first countable RBE milestone was intentionally small and is now landed for
//app:build. Subsequent target-class expansions have landed for
//app:unit_tests, //:deployment_bundle, the WAS-110 public injected
repository handoff, and //docs-site:build. TIN-1027 and TIN-665 are closed on
the minimum proof, while TIN-668 owns the continuing target-class gate:
- provision a backend-neutral REAPI executor endpoint
- validate BAZEL_REMOTE_EXECUTOR separately from BAZEL_REMOTE_CACHE
- pass both as explicit Bazel CLI flags through a repo-managed wrapper after strict mode validation
- prove a small hermetic target such as //app:build or //app:unit_tests
- record remote process evidence, not only remote cache hits
- pin or otherwise identify the worker image/provenance
- mark unsupported targets local-only or explicitly excluded
Public/operator language should move from “shared cache acceleration” to “Bazel remote execution” only for explicitly proved target classes and only through the explicit proof lane or opt-in executor-backed wrapper mode until a default product posture is selected.
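The "only for explicitly proved target classes" rule amounts to a default-deny lookup. The sketch below assumes a flat allowlist of labels; the real eligibility manifest format is not shown in this document, and the `Eligibility` enum and `classify` function are hypothetical. The four labels are the targets named in this section.

```python
from enum import Enum
from typing import FrozenSet


class Eligibility(Enum):
    REMOTE = "remote"
    LOCAL_ONLY = "local-only"


# Illustrative allowlist drawn from the targets named in this section.
PROVED: FrozenSet[str] = frozenset({
    "//app:build",
    "//app:unit_tests",
    "//:deployment_bundle",
    "//docs-site:build",
})


def classify(target: str, proved: FrozenSet[str] = PROVED) -> Eligibility:
    """Default-deny: anything not explicitly proved stays local-only."""
    return Eligibility.REMOTE if target in proved else Eligibility.LOCAL_ONLY
```

Exact label matching is deliberate in this sketch: a new target under the same package does not inherit eligibility from a proved sibling.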
BCR Target
The BCR target is a package authority lane:
- source packages publish versioned releases
- internal registry entries are current and dependency pins are reconciled
- consumers resolve through Bzlmod instead of local copies
- compatibility module names are intentionally retired or documented
- official BCR readiness is decided separately from internal registry health
BCR work should run in parallel with backend work, but it should not depend on RustFS being the object store and should not be counted as RBE evidence.
RustFS Target
The RustFS target is a stop/go decision, not endless probing:
- either RustFS gains a proved non-restart repair/reindex path and enough retention/observability to be trusted for the relevant role
- or trusted writes and state authority move to a backend that removes this bucket-index failure class
For OpenTofu state, the selected direction is managed or appliance S3-compatible state authority with scratch and disposable OpenTofu proofs before protected migration.
For Attic cache publication, TIN-1147 is the next gate: backend
repair/reindex, RustFS upgrade/topology fix, or backend replacement must remove
the failure class before a representative clean write ramp can count.
The signed background-heal status probe is useful operator observability, not
repair, so it does not unblock TIN-1046 by itself.
The current tagged RustFS bucket/object heal path is not a safe dry-run proof
surface because dryRun is not preserved into the queued heal request.
The current tagged RustFS bucket-metadata export/import path is also not a
safe repair proof: export uses the current bucket API view, and import is a
mutating archive path rather than a disk-marker reindex.
rustfs-trusted-publication-backend-gate.json
is the static TIN-1147 stop/go gate that keeps the next decision finite:
non-restart RustFS repair/reindex, RustFS upgrade/topology fix, or backend
replacement. It rejects restart-only recovery, canary-only coherence,
source-only admin-route existence, dry-run assumptions, ARC dispatch evidence,
RBE proof evidence, and OpenTofu state-only HA proof as substitutes for
trusted Attic publication backend evidence.
rustfs-upgrade-topology-candidate.json
is the concrete TIN-1152 candidate packet for the upgrade/topology path. It
records upstream RustFS 1.0.0-beta.4 as a candidate because the release
and beta.1…beta.4 comparison touch ListBuckets CreationDate,
filemeta/metacache, bucket metadata, list_object_v2/listing, HeadObject,
scanner/rebalance, and S3 tracing paths, but they explicitly do not claim the
bucket-index recurrence is fixed. The selected Docker Hub beta.4
manifest/platform digests are recorded, but the preflight must now treat
RustFS State Authority Canary as expected-red and preserve the uploaded
evidence artifact alongside normal green main-suite health. Maintenance-window
approval, state readiness, bucket-index RCA, NAR integrity, and representative
small-check/medium-check publication evidence are still required.
rustfs-upgrade-topology-proof-plan.json
turns that candidate into the next source-owned, non-mutating operating plan:
only tofu/stacks/attic/honey.tfvars rustfs_image may change in the eventual
maintenance window, just tofu-plan-guard attic must approve the saved plan,
live secret authorities and OpenEBS/PVC selectors must remain stable, Civo is
not an endpoint or fallback, and TIN-1046 stays blocked until post-upgrade
tofu-state, bucket-index RCA, NAR integrity, and representative publication
evidence clear the current NoSuchBucket, curl 18, and size_download=0
failure classes.
just rustfs-upgrade-topology-plan-guard is the saved-plan guard for that future
maintenance window. It accepts only the digest-pinned beta.1 -> upgrade-topology candidate RustFS
image update on the live Deployment and the drained legacy StatefulSet template
when the shared module input touches both workload templates; it rejects Secret
rotation, selector/PVC/storage/service drift, delete/create actions, wrong image
direction, and plans that do not update the live Deployment.
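The accept/reject rules of such a saved-plan guard can be sketched against the JSON plan representation, where each entry in `resource_changes` carries an `address`, a `type`, and `change.actions`. This is an illustration and not the just rustfs-upgrade-topology-plan-guard implementation: the `guard_plan` name and the resource type names in `ALLOWED_TYPES` are assumptions, and the sketch does not inspect image digests, Secret contents, or selector/PVC drift.

```python
from typing import Any, Dict, List

# Assumption: the workloads are managed as these Kubernetes provider resource types.
ALLOWED_TYPES = {"kubernetes_deployment", "kubernetes_stateful_set"}


def guard_plan(plan: Dict[str, Any]) -> List[str]:
    """Return violations for a saved plan; an empty list means the shape is acceptable."""
    violations: List[str] = []
    touched_deployment = False
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue
        if actions != ["update"]:
            # Rejects create, delete, and replace (delete/create) actions.
            violations.append(f"{rc['address']}: disallowed actions {actions}")
            continue
        if rc["type"] not in ALLOWED_TYPES:
            violations.append(f"{rc['address']}: unexpected update to {rc['type']}")
            continue
        if rc["type"] == "kubernetes_deployment":
            touched_deployment = True
    if not touched_deployment:
        violations.append("plan does not update the live Deployment")
    return violations
```

A complete guard would additionally diff `change.before` and `change.after` to confirm that the image field is the only changed attribute and that it moves in the expected direction.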
The managed Deploy Attic Stack workflow now has a manual
plan_scope=rustfs_upgrade_topology path for this candidate: plan may continue
past expected-red TIN-1147 state authority only to produce the saved plan and
run both guards, while apply keeps strict state authority and adds post-apply
candidate-image verification.
For future RBE, CAS/action-cache storage should be designed separately and must not inherit the current RustFS trust gap by accident.
Stop/Go Table
| Area | Current state | Go condition |
|---|---|---|
| Nix cache reads | usable cache-forward acceleration | keep active |
| Trusted Attic writes | quarantined after small/medium reproduction failures | non-restart repair/reindex, RustFS upgrade/topology fix, or backend replacement, followed by a clean representative ramp |
| OpenTofu state | readable guarded interim RustFS path | strict HA proof through TIN-1012/TIN-1017 path |
| Bazel remote cache | proved shared cache acceleration; 2026-05-25 recovered one remote-cache digest mismatch, and the cache-object delete reproduced RustFS bucket-index loss until restart recovery | keep active as acceleration only; run decoded CAS integrity audit on digest-mismatch incidents |
| Bazel external inputs | upstream-with-retries | repository-cache/distdir/mirror authority with retention and consumer policy |
| BCR/Bzlmod | internal Bzlmod/package authority in progress | reconciled package releases, registry entries, dependency pins, and public/internal BCR decision |
| RBE | narrow explicit //app:build, //app:unit_tests, //:deployment_bundle, //docs-site:build, WAS-110 public-input, and target-scoped web/private consumer proofs with worker provenance | next target inventory, product wrapper posture, and backend authority before broad/default RBE |
| RBE CAS/action-cache | not designed as product authority | separate backend/storage/auth/retention design, not inherited from current RustFS |
Sprint Implications
- Keep the green cache-forward baseline green.
- Keep Attic read-only by default, with trusted writes quarantined, until TIN-1147 proves a repaired, upgraded, or replaced backend path and TIN-1046 then records a clean representative ramp.
- Move OpenTofu state off the interim RustFS singleton through the selected HA candidate proof path.
- Continue BCR/package authority work independently from backend repair.
- Harden Bazel external input authority before broad remote-build claims.
- Keep the minimum REAPI executor proof lane alive while deciding backend and product wrapper posture.
- Expand remote test/build eligibility one target class at a time.
Boundary Statement
The current product is cache-forward local/CI execution with shared runners, shared caches, and a narrow explicit REAPI proof lane. It is valuable, but it is not yet full RBE, not official BCR readiness, not HA state authority, and not a trusted RustFS write-publication backend.