GloriousFlywheel Attic RustFS NAR Index Incident - 2026-05-06
Summary
The main Platform Proof for GloriousFlywheel commit
15afd127a3074443e6968fa14525a0279abbd869 passed only because the normal Nix
proof now uses nix build ... --fallback.
The underlying cache failure was still live: Attic returned a valid narinfo for
/nix/store/8iv5j0f2difw6wg9vwj9r2raacb08fkv-runner-dashboard-0.1.0, but the
advertised NAR body stream failed from the RustFS storage path.
This is cache-object integrity and bucket-index reliability debt. It is not Bazel remote execution, and it is not proof of a remote execution backend.
Evidence
The failing narinfo advertised:
- store path: /nix/store/8iv5j0f2difw6wg9vwj9r2raacb08fkv-runner-dashboard-0.1.0
- URL: nar/8iv5j0f2difw6wg9vwj9r2raacb08fkv.nar
- compression: zstd
- NAR size: 3330176
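For orientation, a narinfo is a short text document of "Key: value" lines. The following is a minimal sketch of reading the fields listed above, assuming the standard Nix narinfo key names (StorePath, URL, Compression, NarSize). The function name parse_narinfo is illustrative and is not part of Attic or Nix, and the sample text carries only the four values quoted in this report.

```python
def parse_narinfo(text: str) -> dict[str, str]:
    """Parse the 'Key: value' lines of a narinfo document into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed narinfo line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields


# Only the four fields quoted in this report. A real narinfo carries more
# (hashes, references, signatures); those are omitted here, not invented.
SAMPLE = """\
StorePath: /nix/store/8iv5j0f2difw6wg9vwj9r2raacb08fkv-runner-dashboard-0.1.0
URL: nar/8iv5j0f2difw6wg9vwj9r2raacb08fkv.nar
Compression: zstd
NarSize: 3330176
"""

info = parse_narinfo(SAMPLE)
nar_size = int(info["NarSize"])  # uncompressed NAR size in bytes
```

The URL field is relative to the cache root, which is why a valid narinfo says nothing about whether the body behind it can actually be streamed.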
Before repair:
- GET /main/nar/8iv5j0f2difw6wg9vwj9r2raacb08fkv.nar returned HTTP 200 but transferred zero bytes, and curl exited with code 18.
- Attic logs reported Storage error: service error: unhandled error (NoSuchBucket) from attic_server::api::binary_cache.
- Direct RustFS checks showed both attic and tofu-state absent from S3 ListBuckets while disk markers existed under /data/<bucket> and /data/.rustfs.sys/buckets/<bucket>.
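The RustFS observation is a disagreement between two views of the same buckets: what S3 ListBuckets reports and what has markers on disk. A minimal sketch of that comparison follows, assuming both name sets have already been collected by some other means. The function name and the way the sets are populated are hypothetical and do not reflect the actual RCA tooling.

```python
def buckets_missing_from_index(listed: set[str], on_disk: set[str]) -> list[str]:
    """Return buckets that have disk markers but are absent from ListBuckets."""
    return sorted(on_disk - listed)


# Simplification: only the two buckets named in this report are modelled.
listed_before_repair: set[str] = set()    # neither bucket was listed
on_disk = {"attic", "tofu-state"}         # markers existed for both

drift = buckets_missing_from_index(listed_before_repair, on_disk)
```

A non-empty result means the data is still present but unreachable through the S3 API, which matches the NoSuchBucket error Attic surfaced.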
Repair action:
- A controlled restart of nix-cache/deployment/attic-rustfs-openebs restored the RustFS bucket index.
After repair:
- just tofu-state-ha-readiness --expect-interim passed again for the current guarded-but-not-HA state path.
- The attic bucket reappeared in RustFS ListBuckets.
- The incident NAR body streamed successfully from Attic with HTTP 200 and a nonzero compressed download.
Guardrail Added
just attic-nar-integrity-check now probes the exact Nix substituter body path:
just attic-nar-integrity-check --store-hash 8iv5j0f2difw6wg9vwj9r2raacb08fkv
The check mints a short-lived read-only Attic token from the in-cluster signing
secret, fetches <cache>/<hash>.narinfo, downloads the advertised NAR body, and
fails if the body stream errors or transfers zero bytes.
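The core of that check can be sketched as follows. This is an illustration under stated assumptions, not the recipe's actual implementation: token minting is left out, the fetch callable is a stand-in for an authenticated HTTP client, and the cache URL https://cache.example/main is hypothetical.

```python
from typing import Callable, Iterable

# Stand-in for an authenticated HTTP GET: URL -> iterator of body chunks.
Fetch = Callable[[str], Iterable[bytes]]


class NarIntegrityError(Exception):
    """The advertised NAR body could not be located or streamed."""


def check_nar_body(fetch: Fetch, cache_url: str, store_hash: str) -> int:
    """Return the number of compressed bytes streamed for store_hash."""
    base = cache_url.rstrip("/")
    narinfo = b"".join(fetch(f"{base}/{store_hash}.narinfo")).decode()
    nar_path = None
    for line in narinfo.splitlines():
        if line.startswith("URL:"):
            nar_path = line.split(":", 1)[1].strip()
    if not nar_path:
        raise NarIntegrityError("narinfo has no URL field")
    received = 0
    try:
        for chunk in fetch(f"{base}/{nar_path}"):
            received += len(chunk)
    except OSError as exc:  # the stream broke mid-transfer
        raise NarIntegrityError(f"NAR body stream failed: {exc}") from exc
    if received == 0:
        raise NarIntegrityError("NAR body transferred zero bytes")
    return received


HASH = "8iv5j0f2difw6wg9vwj9r2raacb08fkv"
CACHE = "https://cache.example/main"  # hypothetical cache URL


def fetch_empty_body(url: str) -> Iterable[bytes]:
    """Fake client reproducing the incident: valid narinfo, empty NAR body."""
    if url.endswith(".narinfo"):
        return [f"URL: nar/{HASH}.nar\n".encode()]
    return []


try:
    check_nar_body(fetch_empty_body, CACHE, HASH)
    outcome = "passed"
except NarIntegrityError as exc:
    outcome = str(exc)
```

Counting streamed bytes rather than trusting the status code is the point: the incident response was HTTP 200 with an empty body.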
2026-05-08 Publication Regression
PR #542 added a manual Attic Trusted Publication Probe, and run
25530760619 proved that a one-path synthetic builtins.toFile publication
could succeed with clean pre/post tofu-state and attic bucket-index
evidence.
That bounded synthetic proof was not representative enough. PR #543 added the
manual medium-check profile, which builds the repo’s bounded deadnix flake
check and publishes the resulting closure with require-cache-push: "true".
Default-branch run 25531269981 failed:
- profile: medium-check
- probe output: /nix/store/7mj2v3i73gca7437kz5ikhvhpnw7zak6-deadnix-check
- Attic attempted to push 22 new Nix store paths
- the check output path deduplicated successfully
- dependency/source path uploads failed with InternalServerError
The ramp then added a small-check profile between the one-path synthetic
probe and medium-check. It publishes the repo’s bounded statix flake check
with the same pre/post bucket-index and tofu-state evidence, using
confirm=probe-attic-publication-small-check. Default-branch run
25559359638 failed too:
- profile: small-check
- probe output: /nix/store/aqk7n3kkr3shakz09f4lszgcp0wi1208-statix-check
- Attic attempted to push 22 Nix store paths
- the check output path deduplicated successfully
- dependency/source path uploads failed with InternalServerError
That result removed the assumption that smaller real-output closures are safe
staging steps. small-check and medium-check are now controlled reproduction
profiles only.
- Attic logs showed RustFS/S3 NoSuchBucket at server/src/api/v1/upload_path.rs:364 and HTTP 500
On 2026-05-13, current-main run 25817881900 repeated the representative
medium-check proof after two one-path synthetic probes had passed. It
requested
/nix/store/ljsjzk9xcwb9rsfwk6p8dgcjcviqiwlj-deadnix-check, attempted a
22-path Attic push delta, and failed pushing
/nix/store/mdxyb7p8m5sfqhcks25pvs25k90prmis-source with
InternalServerError.
Pre-publication evidence passed for both tofu-state and attic.
Post-publication evidence failed: both buckets were absent from S3
list-buckets while /data/<bucket> and
/data/.rustfs.sys/buckets/<bucket> existed.
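Comparing the two snapshots is what ties the index loss to the publication window rather than to a pre-existing fault. A minimal sketch of that classification follows. The BucketEvidence type and the verdict strings are hypothetical names chosen for illustration, and the pre-publication disk-marker value is assumed, since the report only states that pre-publication evidence passed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketEvidence:
    bucket: str
    listed: bool        # bucket appears in S3 list-buckets
    disk_markers: bool  # /data/<bucket> and /data/.rustfs.sys/buckets/<bucket> exist


def classify(pre: BucketEvidence, post: BucketEvidence) -> str:
    """Attribute a bucket's state to the publication window, if possible."""
    if not pre.listed:
        return "unhealthy-before-publication"
    if post.listed:
        return "healthy"
    if post.disk_markers:
        return "index-lost-during-publication"
    return "bucket-data-missing"


# The 2026-05-13 run as described above (pre disk_markers=True is assumed).
verdicts = {
    name: classify(
        BucketEvidence(name, listed=True, disk_markers=True),
        BucketEvidence(name, listed=False, disk_markers=True),
    )
    for name in ("tofu-state", "attic")
}
```

Both buckets fall into the same class here, which is consistent with a shared bucket-index failure rather than damage to one bucket's data.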
A controlled restart of nix-cache/deployment/attic-rustfs-openebs restored
current service health. After restart,
just tofu-state-ha-readiness --expect-interim passed again and
just rustfs-bucket-index-rca --bucket attic --scratch-probe --strict-scratch-disk-markers passed.
The conclusion did not change: broad trusted Attic writes stay quarantined until RustFS has a non-restart repair path or the backend is replaced with one that removes this bucket-index failure class.
Boundary
The correct product statement remains:
- Attic/RustFS is the mutable cache-forward Nix binary-cache path.
- Bazel currently has shared remote cache proof, not full RBE.
- OpenTofu state needs a dedicated HA authority path beyond bumble-local RustFS.
- Cache fallback is required because cache acceleration must not become build authority.
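The fallback rule in the last point can be illustrated with a conceptual sketch. This models the policy only, not Nix internals: realise, SubstituteError, and the two callables are hypothetical names. The function also reports where the result came from, so that a proof which succeeded through fallback is not mistaken for evidence of a healthy cache.

```python
from typing import Callable, Tuple


class SubstituteError(Exception):
    """The cache could not deliver a usable store path."""


def realise(
    store_path: str,
    substitute: Callable[[str], str],
    build: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the cache first; on cache failure, build locally instead."""
    try:
        return substitute(store_path), "cache"
    except SubstituteError:
        return build(store_path), "local-build"


PATH = "/nix/store/8iv5j0f2difw6wg9vwj9r2raacb08fkv-runner-dashboard-0.1.0"


def broken_cache(path: str) -> str:
    # Stand-in for the incident: narinfo is served, the body is not.
    raise SubstituteError("NAR body transferred zero bytes")


def local_build(path: str) -> str:
    return path  # stand-in for a successful local build


result = realise(PATH, broken_cache, local_build)
```

In this model the build still succeeds, but the second element of the result records that the cache contributed nothing, which is the situation the Summary describes.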