Gloriousflywheel Attic Rustfs Nar Index Incident 2026 05 06

GloriousFlywheel Attic RustFS NAR Index Incident - 2026-05-06

Summary

The main Platform Proof for GloriousFlywheel commit 15afd127a3074443e6968fa14525a0279abbd869 passed only because the normal Nix proof now uses nix build ... --fallback.

The underlying cache failure was still live: Attic returned a valid narinfo for /nix/store/8iv5j0f2difw6wg9vwj9r2raacb08fkv-runner-dashboard-0.1.0, but the advertised NAR body stream failed from the RustFS storage path.

This is cache-object integrity and bucket-index reliability debt. It is not Bazel remote execution, and it is not proof of a remote execution backend.

Evidence

The failing narinfo advertised:

  • store path: /nix/store/8iv5j0f2difw6wg9vwj9r2raacb08fkv-runner-dashboard-0.1.0
  • URL: nar/8iv5j0f2difw6wg9vwj9r2raacb08fkv.nar
  • compression: zstd
  • NAR size: 3330176

Before repair:

  • GET /main/nar/8iv5j0f2difw6wg9vwj9r2raacb08fkv.nar returned HTTP 200 but transferred zero bytes and curl exited with code 18.
  • Attic logs reported Storage error: service error: unhandled error (NoSuchBucket) from attic_server::api::binary_cache.
  • Direct RustFS checks showed both attic and tofu-state absent from S3 ListBuckets while disk markers existed under /data/<bucket> and /data/.rustfs.sys/buckets/<bucket>.

Repair action:

  • A controlled restart of nix-cache/deployment/attic-rustfs-openebs restored the RustFS bucket index.

After repair:

  • just tofu-state-ha-readiness --expect-interim passed again for the current guarded-but-not-HA state path.
  • The attic bucket reappeared in RustFS ListBuckets.
  • The incident NAR body streamed successfully from Attic with HTTP 200 and a nonzero compressed download.

Guardrail Added

just attic-nar-integrity-check now probes the exact Nix substituter body path:

just attic-nar-integrity-check --store-hash 8iv5j0f2difw6wg9vwj9r2raacb08fkv

The check mints a short-lived read-only Attic token from the in-cluster signing secret, fetches <cache>/<hash>.narinfo, downloads the advertised NAR body, and fails if the body stream errors or transfers zero bytes.

2026-05-08 Publication Regression

PR #542 added a manual Attic Trusted Publication Probe and run 25530760619 proved that a one-path synthetic builtins.toFile publication could succeed with clean pre/post tofu-state and attic bucket-index evidence.

That bounded synthetic proof was not representative enough. PR #543 added the manual medium-check profile, which builds the repo’s bounded deadnix flake check and publishes the resulting closure with require-cache-push: "true". Default-branch run 25531269981 failed:

  • profile: medium-check
  • probe output: /nix/store/7mj2v3i73gca7437kz5ikhvhpnw7zak6-deadnix-check
  • Attic attempted to push 22 new Nix store paths
  • the check output path deduplicated successfully
  • dependency/source path uploads failed with InternalServerError

The ramp then added a small-check profile between the one-path synthetic probe and medium-check. It publishes the repo’s bounded statix flake check with the same pre/post bucket-index and tofu-state evidence, using confirm=probe-attic-publication-small-check. Default-branch run 25559359638 failed too:

  • profile: small-check
  • probe output: /nix/store/aqk7n3kkr3shakz09f4lszgcp0wi1208-statix-check
  • Attic attempted to push 22 Nix store paths
  • the check output path deduplicated successfully
  • dependency/source path uploads failed with InternalServerError

That result removed the assumption that smaller real-output closures are safe staging steps. small-check and medium-check are now controlled reproduction profiles only.

  • Attic logs showed RustFS/S3 NoSuchBucket at server/src/api/v1/upload_path.rs:364 and HTTP 500

On 2026-05-13, current-main run 25817881900 repeated the representative medium-check proof after two one-path synthetic probes had passed. It requested /nix/store/ljsjzk9xcwb9rsfwk6p8dgcjcviqiwlj-deadnix-check, attempted a 22-path Attic push delta, and failed pushing /nix/store/mdxyb7p8m5sfqhcks25pvs25k90prmis-source with InternalServerError.

Pre-publication evidence passed for both tofu-state and attic. Post-publication evidence failed: both buckets were absent from S3 list-buckets while /data/<bucket> and /data/.rustfs.sys/buckets/<bucket> existed.

A controlled restart of nix-cache/deployment/attic-rustfs-openebs restored current service health. After restart, just tofu-state-ha-readiness --expect-interim passed again and just rustfs-bucket-index-rca --bucket attic --scratch-probe --strict-scratch-disk-markers passed.

The conclusion did not change: broad trusted Attic writes stay quarantined until RustFS has a non-restart repair path or the backend is replaced with one that removes this bucket-index failure class.

Boundary

The correct product statement remains:

  • Attic/RustFS is the mutable cache-forward Nix binary-cache path.
  • Bazel currently has shared remote cache proof, not full RBE.
  • OpenTofu state needs a dedicated HA authority path beyond bumble-local RustFS.
  • Cache fallback is required because cache acceleration must not become build authority.

GloriousFlywheel