KVM Cache Prewarm

The tinyland-nix-kvm lane runs KVM-backed VM proofs with shared Nix cache reads and local execution on the runner pod. That is not remote execution, and it is not a remote-builder contract.

The KVM lane needs a separate trusted publication path for heavy closures that are expensive to rebuild from cold ephemeral pods. PR jobs stay read-only against the cache: they may consume the shared Attic substituter, but they must not receive ATTIC_TOKEN.

The prewarm helper starts attic watch-store during trusted builds so build-time store paths are streamed as they are realized. It also publishes the recursive closure of each built output as a final check. Both pieces matter for VM harnesses: final test outputs can have tiny runtime closures, while cold dependencies such as libguestfs are expensive build-time inputs that still rebuild unless their store paths are explicitly present in Attic.

Trusted Publisher

Run cache prewarm only from a trusted operator context:

  • a protected default-branch workflow with ATTIC_TOKEN
  • a scheduled operator job with ATTIC_TOKEN
  • a manual operator shell with ATTIC_TOKEN

Do not run cache publication from pull_request jobs or untrusted forks. The token must have write permission for the target Attic cache; read-side consumer tokens are expected to fail during attic push. The helper writes ATTIC_TOKEN into a private Attic token-file under a temporary HOME and points the Attic CLI at that config for watch-store and final push publication. Operator wrappers should avoid expanding the token into long-running shell command strings.
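The token-file handling described above can be sketched as follows. This is a minimal illustration, not the contents of scripts/cache-warm.sh: the function name write_attic_token_config is hypothetical, and both the config location ($HOME/.config/attic/config.toml) and the token-file key are assumptions that should be checked against the Attic client version in use.

```shell
# Hypothetical helper: write the token to a private file under a throwaway HOME
# and emit an Attic client config that refers to the token by path.
# Assumption: the Attic client reads $HOME/.config/attic/config.toml and accepts
# a `token-file` key per server entry.
write_attic_token_config() {
  local server_name="$1" endpoint="$2" token="$3"
  local home_dir config_dir
  home_dir="$(mktemp -d)" || return 1
  config_dir="$home_dir/.config/attic"

  # Tighten the umask before any secret material is written, so directories
  # are created 0700 and files 0600. printf is a shell builtin, so the token
  # does not appear in any process argument list.
  (
    umask 077
    mkdir -p "$config_dir" &&
    printf '%s' "$token" > "$config_dir/token" &&
    cat > "$config_dir/config.toml" <<EOF
default-server = "$server_name"

[servers.$server_name]
endpoint = "$endpoint"
token-file = "$config_dir/token"
EOF
  ) || return 1

  # Print the temporary HOME so the caller can scope it to Attic invocations.
  printf '%s\n' "$home_dir"
}
```

A wrapper would then set HOME to the printed directory only for the Attic invocations and remove the directory when publication finishes.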

watch-store is a best-effort stream for build-time paths. The authoritative publication step is the explicit recursive attic push after each successful build, so a transient watch-store exit should not prevent closure publication.
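The split between the best-effort stream and the authoritative push might look roughly like the sketch below. It is an illustration under stated assumptions rather than the helper's actual code: prewarm_one and the ATTIC and NIX override variables are hypothetical names introduced so the flow can be dry-run with stand-in commands, and the cache reference is assumed to take the server:cache form.

```shell
# Hypothetical sketch of one prewarm step. ATTIC and NIX are indirection
# variables introduced here; they default to the real CLIs.
ATTIC="${ATTIC:-attic}"
NIX="${NIX:-nix}"

prewarm_one() {
  local cache_ref="$1" installable="$2"
  local watch_pid out_paths closure

  # Best-effort stream: upload store paths as they are realized during the build.
  "$ATTIC" watch-store "$cache_ref" &
  watch_pid=$!

  # Build and capture the output paths. A failed build stops here, before any push.
  if ! out_paths="$("$NIX" build --no-link --print-out-paths "$installable")"; then
    kill "$watch_pid" 2>/dev/null || true
    wait "$watch_pid" 2>/dev/null || true
    return 1
  fi

  # Stop the watcher and ignore its exit status: a transient watch-store
  # failure must not block the authoritative push below.
  kill "$watch_pid" 2>/dev/null || true
  wait "$watch_pid" 2>/dev/null || true

  # Authoritative step: push the recursive closure of every built output.
  # $out_paths is left unquoted on purpose; store paths contain no whitespace.
  closure="$("$NIX" path-info --recursive $out_paths)" || return 1
  printf '%s\n' "$closure" | xargs "$ATTIC" push "$cache_ref"
}
```

The final push covers only the runtime closure of the outputs, which is why the stream still matters for build-time inputs such as libguestfs.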

First KVM Closure Set

The first built-in KVM profile warms the cold dependencies that surfaced during the nix-vm-test KVM proof work:

  • libguestfs
  • guestfs-tools
  • qemu
  • qemu_kvm

The built-in profile uses github:NixOS/nixpkgs/nixos-unstable by default so the libguestfs package line matches the newer 1.56.x closure observed in the KVM signal-9 investigation. Operators may override this with --nixpkgs-ref when a consumer repo needs a different Nixpkgs pin. qemu_kvm is included because fresh KVM preflight shells still had to realize a visible QEMU KVM closure after the heavier libguestfs and guestfs-tools paths were warm.

Command

From this repo, after entering the dev shell:

ATTIC_SERVER=https://nix-cache.tinyland.dev \
ATTIC_SERVER_NAME=tinyland \
ATTIC_TOKEN="$ATTIC_TOKEN" \
scripts/cache-warm.sh --profile kvm-vm-test --cache main

For an in-cluster trusted job, the server should normally be the cluster-local Attic API endpoint:

ATTIC_SERVER=http://attic.nix-cache.svc.cluster.local \
ATTIC_SERVER_NAME=tinyland \
ATTIC_TOKEN="$ATTIC_TOKEN" \
scripts/cache-warm.sh --profile kvm-vm-test --cache main

To include a downstream VM harness closure, append the exact installable that the consuming repo needs:

ATTIC_SERVER=http://attic.nix-cache.svc.cluster.local \
ATTIC_SERVER_NAME=tinyland \
ATTIC_TOKEN="$ATTIC_TOKEN" \
scripts/cache-warm.sh \
  --profile kvm-vm-test \
  --cache main \
  --installable 'github:Jesssullivan/nix-vm-test?ref=main#checks.x86_64-linux.rocky-10_1-budgie-graphical-login-manager-persistence-test'

Use --no-push for build-only proof runs. That mode still builds locally, but it does not require ATTIC_SERVER, does not authenticate to Attic, and does not publish cache paths.

In-Cluster Publisher Job

The repo-managed in-cluster wrapper is the preferred operator shape for the trusted KVM publisher. It renders a ConfigMap containing scripts/cache-warm.sh and a single Kubernetes Job with the proven KVM envelope:

  • nixos/nix:latest
  • privileged container
  • /dev/kvm mounted as a CharDevice
  • KVM node selector plus the current Honey hostname pin
  • explicit emptyDir scratch for /nix/var/nix/builds and /tmp
  • /nix/var/nix/builds forced to mode 0755
  • Attic public key passed explicitly for Nix substituter trust
  • Attic write token referenced through secretKeyRef
  • cgroup evidence including memory.peak, memory.events, and pids.peak
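The cgroup evidence in the last item could be gathered with something like the sketch below. The function name collect_cgroup_evidence is hypothetical, and the sketch assumes a unified cgroup v2 hierarchy mounted at /sys/fs/cgroup inside the container; memory.peak and pids.peak exist only on sufficiently recent kernels, so a missing file is reported rather than treated as an error.

```shell
# Hypothetical helper: print the cgroup v2 counters named above for the
# current container. The optional argument overrides the cgroup root.
collect_cgroup_evidence() {
  local root="${1:-/sys/fs/cgroup}" name
  for name in memory.peak memory.events pids.peak; do
    printf '== %s ==\n' "$name"
    if [ -r "$root/$name" ]; then
      cat "$root/$name"
    else
      # Older kernels may not expose this file.
      printf 'unavailable\n'
    fi
  done
}
```

Calling it at the end of the job, after a failed build as well as a successful one, keeps the counters next to the build log.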

Printing the manifests is non-mutating:

just kvm-cache-prewarm-job \
  --attic-token-secret attic-cache-publisher \
  --attic-token-key token \
  --attic-public-key "$ATTIC_PUBLIC_KEY"

Applying the trusted publisher is explicit:

just kvm-cache-prewarm-job \
  --apply \
  --wait \
  --logs \
  --attic-token-secret attic-cache-publisher \
  --attic-token-key token \
  --attic-public-key "$ATTIC_PUBLIC_KEY"

--attic-token-key defaults to token; pass the actual Secret key when the publisher Secret uses a different key such as ATTIC_TOKEN.

Use --no-push when the same in-cluster envelope is needed for a build-only scratch or KVM proof. In that mode the job does not require --attic-token-secret and forwards --no-push to cache-warm.sh.

Proof Of Warmth

A successful prewarm is not enough by itself. The close-out evidence must also include a fresh ephemeral tinyland-nix-kvm PR-style run showing that the warmed paths are substituted instead of the heavy closure being rebuilt from source.

For the current Budgie/Rocky KVM work, the important evidence is that a fresh runner crosses the old cold path without compiling libguestfs locally before entering the VM assertion layer.

For dependency-level close-out, use a fresh read-only pod or PR-style runner with no Attic write token, verify the target is absent before realization, and show exact libguestfs, guestfs-tools, or qemu_kvm paths copying from Attic with no local building lines.
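One way to check the log half of that evidence is sketched below. The function name assert_substituted is hypothetical, and the patterns assume Nix's usual progress wording ("building '/nix/store/...drv'..." for local builds, "copying path '/nix/store/...' from '...'..." for substitutions); confirm the wording against the Nix version on the runner. The sketch does not cover the absence check before realization.

```shell
# Hypothetical check over a captured `nix build` log (stderr saved to a file).
# Usage: assert_substituted <log-file> <package-name>
assert_substituted() {
  local log_file="$1" name="$2"

  # Any local build line for the dependency means the cache was not warm.
  if grep -E "^building '/nix/store/[^']*-${name}[^']*\.drv'" "$log_file" >/dev/null; then
    echo "FAIL: ${name} was built locally" >&2
    return 1
  fi

  # Require positive evidence that the path came from a substituter.
  if ! grep -E "^copying path '/nix/store/[^']*-${name}[^']*' from " "$log_file" >/dev/null; then
    echo "FAIL: no substitution line found for ${name}" >&2
    return 1
  fi

  echo "OK: ${name} substituted"
}
```

Requiring a copying line in addition to the absence of a building line avoids passing a run in which the path was already present in the local store.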

Scratch Isolation

Exact downstream VM-test graphs can put heavy pressure on the Nix build directory before the final store path exists. If a trusted prewarm fails while compiling libguestfs, do not treat a larger memory limit as the first fix without cgroup OOM evidence.

The May 3 isolation proof built the exact nix-vm-test#13 libguestfs-1.56.2 derivation after giving Nix an explicit scratch mount at /nix/var/nix/builds and a separate /tmp. The /nix/var/nix/builds path must be a normal non-world-writable directory; if it is backed by a fresh volume, set its mode to 0755 before invoking Nix. That proof did not publish cache paths and did not mount /dev/kvm, so it is scratch/filesystem evidence, not a complete KVM prewarm substitution proof.
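The mode requirement can be enforced before Nix starts with a step along these lines. The function name prepare_build_dir is hypothetical, and the sketch assumes GNU coreutils stat.

```shell
# Hypothetical helper: ensure a scratch build directory exists with mode 0755
# and fail loudly if the mode could not be applied.
prepare_build_dir() {
  local dir="$1" mode
  mkdir -p "$dir" || return 1
  chmod 0755 "$dir" || return 1
  # stat -c is the GNU coreutils form; BSD stat uses different flags.
  mode="$(stat -c '%a' "$dir")"
  if [ "$mode" != "755" ]; then
    echo "unexpected mode $mode on $dir" >&2
    return 1
  fi
  printf '%s %s\n' "$mode" "$dir"
}
```

In the job this would run as prepare_build_dir /nix/var/nix/builds before the first nix invocation, since a freshly mounted volume may arrive world-writable.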

Boundaries

  • cache reads are safe for PR-style KVM jobs
  • cache writes require trusted ATTIC_TOKEN
  • trusted publication uses an Attic token-file; do not pass the token as a login argument or rely on stdin login for this flow
  • the Attic CLI server alias (ATTIC_SERVER_NAME) is separate from the cache name (ATTIC_CACHE)
  • runner labels remain capability-shaped, such as tinyland-nix-kvm
  • owner or repo names may appear only in private ARC registration anchors or explicit downstream installable refs, not as product runner labels
  • this is cache-backed local execution until a separate remote-builder or remote-executor proof exists
