KVM Capacity Policy


The tinyland-nix-kvm label is the shared capability class for KVM-backed VM execution. It is not a repo-specific runner label, and ARC does not provide a global concurrency cap for the label. ARC caps each scale set independently, so global capacity has to be treated as an operator policy on top of the live scale-set inventory.

This page records the current TIN-912 policy checkpoint for Honey-first KVM capacity with bounded Sting overflow.

Current Live Audit

Run these read-only checks before changing KVM runner counts, placement, or resource envelopes:

just arc-shared-label-capacity-audit --include-label tinyland-nix-kvm
kubectl --context honey get nodes -o json

The 2026-05-05 audit showed three live scale sets advertising tinyland-nix-kvm:

| Scale set | Scope | Node selector | Toleration | Min | Max | Request | Limit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| tinyland-nix-kvm | tinyland-inc | honey plus capability.tinyland.dev/kvm=true | none | 0 | 1 | 2 CPU, 8Gi | 8 CPU, 16Gi |
| jesssullivan-nix-kvm | Jesssullivan/jesssullivan-infra | honey plus capability.tinyland.dev/kvm=true | none | 0 | 1 | 2 CPU, 8Gi | 8 CPU, 16Gi |
| jesssullivan-nix-vm-test-kvm | Jesssullivan/nix-vm-test | capability.tinyland.dev/kvm=true | Sting | 0 | 2 | 4 CPU, 32Gi, 24Gi ephemeral | 8 CPU, 64Gi, 40Gi ephemeral |

Current aggregate label policy:

| Label | Active scale sets | Aggregate max | Aggregate request at max | Aggregate limit at max |
| --- | --- | --- | --- | --- |
| tinyland-nix-kvm | 3 | 4 | 12 CPU, 80Gi | 32 CPU, 160Gi |
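
The aggregate row can be re-derived from the per-scale-set rows. The following is a minimal sketch of that arithmetic, not part of the audit tooling: the `ScaleSet` class and `aggregate` function are hypothetical names introduced for illustration, and the numbers are copied by hand from the 2026-05-05 table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleSet:
    """One row of the audit table (hypothetical helper type)."""
    name: str
    max_runners: int
    req_cpu: int
    req_mem_gi: int
    lim_cpu: int
    lim_mem_gi: int


# Values copied by hand from the 2026-05-05 audit table.
INVENTORY = [
    ScaleSet("tinyland-nix-kvm", 1, 2, 8, 8, 16),
    ScaleSet("jesssullivan-nix-kvm", 1, 2, 8, 8, 16),
    ScaleSet("jesssullivan-nix-vm-test-kvm", 2, 4, 32, 8, 64),
]


def aggregate(inventory):
    """Sum each per-runner figure multiplied by that scale set's maxRunners."""
    return {
        "max_runners": sum(s.max_runners for s in inventory),
        "request_cpu": sum(s.max_runners * s.req_cpu for s in inventory),
        "request_mem_gi": sum(s.max_runners * s.req_mem_gi for s in inventory),
        "limit_cpu": sum(s.max_runners * s.lim_cpu for s in inventory),
        "limit_mem_gi": sum(s.max_runners * s.lim_mem_gi for s in inventory),
    }


print(aggregate(INVENTORY))
```

With the three audited scale sets this yields a max of 4 runners, a 12 CPU / 80Gi aggregate request, and a 32 CPU / 160Gi aggregate limit, matching the policy row.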

All KVM scale sets must keep minRunners = 0 unless a separate warm-pool decision is made. Scale-to-zero is part of the capacity contract.

Node Budget

The 2026-05-03 live node budget was normalized to GiB from the raw Kubernetes allocatable values:

| Node | KVM label | Taints | Allocatable CPU | Allocatable memory | Allocatable ephemeral storage | Allocatable pods |
| --- | --- | --- | --- | --- | --- | --- |
| honey | true | none | 32 | 219.81Gi | 1384.60Gi | 110 |
| sting | true | dedicated.tinyland.dev/compute-expansion=true:NoSchedule | 32 | 54.68Gi | 66.32Gi | 110 |
| bumble | none | none | 4 | 15.06Gi | 66.44Gi | 110 |
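
Kubernetes reports allocatable memory and ephemeral storage as quantity strings (for example with a `Ki` suffix, or as plain bytes), so the GiB figures above require a conversion step. The sketch below shows one way to do that normalization. `quantity_to_gib` is a hypothetical helper, it handles only the common binary and decimal suffixes (not milli-units or the `P`/`E` range), and the sample inputs in the usage lines are illustrative values, not the live node data.

```python
# Binary and decimal suffixes commonly seen in Kubernetes resource quantities.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def quantity_to_gib(quantity: str) -> float:
    """Convert a memory/storage quantity string to GiB, rounded to 2 decimals."""
    # Check two-character binary suffixes first so "Gi" is not read as "G".
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return round(float(quantity[: -len(suffix)]) * factor / 2**30, 2)
    for suffix, factor in _DECIMAL.items():
        if quantity.endswith(suffix):
            return round(float(quantity[: -len(suffix)]) * factor / 2**30, 2)
    # No recognized suffix: treat the value as plain bytes.
    return round(float(quantity) / 2**30, 2)


# Illustrative inputs only; these are not the audited node values.
print(quantity_to_gib("1048576Ki"))
print(quantity_to_gib("1073741824"))
```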

Honey remains the primary KVM payload node for this policy checkpoint. Sting is admitted only as explicitly tolerated compute-expansion overflow with a memory and ephemeral-storage guardrail. Bumble is not eligible because it does not advertise the KVM capability label.
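
The eligibility reasoning above can be sketched as a small filter. This is a simplified model under stated assumptions: the `NODES` dictionary is transcribed by hand from the node table, `eligible_nodes` is a hypothetical function, and taints are matched as exact strings rather than with full Kubernetes toleration semantics. It also does not model the Honey-pinned node selectors on the two smaller scale sets.

```python
KVM_LABEL = "capability.tinyland.dev/kvm"
STING_TAINT = "dedicated.tinyland.dev/compute-expansion=true:NoSchedule"

# Hand-transcribed from the node table; not read from the cluster.
NODES = {
    "honey": {"labels": {KVM_LABEL: "true"}, "taints": []},
    "sting": {"labels": {KVM_LABEL: "true"}, "taints": [STING_TAINT]},
    "bumble": {"labels": {}, "taints": []},
}


def eligible_nodes(nodes, tolerations=()):
    """Return nodes that carry the KVM label and whose taints are all tolerated."""
    tolerated = set(tolerations)
    return sorted(
        name
        for name, node in nodes.items()
        if node["labels"].get(KVM_LABEL) == "true"
        and all(taint in tolerated for taint in node["taints"])
    )


print(eligible_nodes(NODES))                 # lane without a toleration
print(eligible_nodes(NODES, [STING_TAINT]))  # lane that tolerates Sting's taint
```

A lane without the toleration resolves to Honey only; adding the compute-expansion toleration admits Sting as well, and Bumble is excluded in both cases because it lacks the label.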

Current Policy

  • Keep workflow-facing labels capability-shaped: tinyland-nix-kvm, not repo-shaped labels.
  • Keep owner or repo names only in private ARC registration anchors.
  • Preserve minRunners = 0 on every KVM scale set.
  • Treat the current aggregate tinyland-nix-kvm max of 4 as the live KVM ceiling until a fresh node-budget audit justifies a higher number.
  • New owner overlays advertising tinyland-nix-kvm default to maxRunners = 1 unless the operator explicitly budgets the aggregate label ceiling upward.
  • Heavier VM-test overlays may use the proven 4 CPU / 32Gi request and 8 CPU / 64Gi limit envelope, but only when the aggregate request still fits the admitted node budget. On Sting, that envelope must include an explicit ephemeral-storage request and limit; the current guardrail is 24Gi requested and 40Gi limited.
  • This is cache-backed local KVM execution. It is not Bazel remote execution, not Nix remote builders, and not another remote executor.
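
The runner-count bullets above can be expressed as a pre-merge check. The sketch below is one reading of the policy, not an existing tool: `check_overlay` is a hypothetical function, and it assumes that a new overlay is acceptable only if `minRunners` is 0 and the aggregate `maxRunners` across all scale sets stays within the operator-budgeted ceiling.

```python
LABEL_CEILING = 4    # current aggregate tinyland-nix-kvm max from this page
DEFAULT_NEW_MAX = 1  # default maxRunners for a new owner overlay


def check_overlay(current_max_by_set, new_min,
                  new_max=DEFAULT_NEW_MAX, ceiling=LABEL_CEILING):
    """Return a list of policy violations for a proposed overlay (empty means ok)."""
    problems = []
    if new_min != 0:
        problems.append("minRunners must stay 0")
    total = sum(current_max_by_set.values()) + new_max
    if total > ceiling:
        problems.append(f"aggregate max {total} exceeds label ceiling {ceiling}")
    return problems


# maxRunners per scale set, copied from the audit table.
CURRENT = {
    "tinyland-nix-kvm": 1,
    "jesssullivan-nix-kvm": 1,
    "jesssullivan-nix-vm-test-kvm": 2,
}

print(check_overlay(CURRENT, new_min=0))             # ceiling still 4
print(check_overlay(CURRENT, new_min=0, ceiling=5))  # operator raised the ceiling
```

Under this reading, the live inventory already sits at the ceiling of 4, so any additional overlay fails the check until the operator explicitly budgets the ceiling upward.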

Sting Admission Gate

Sting has a KVM label and is now admitted only as bounded overflow for KVM payloads that satisfy the gates below. The default posture is still Honey-first; Sting admission is an explicit operator decision, not a reason to mint Sting-specific or repo-specific workflow labels.

Before any KVM payload schedules on Sting, all of these must remain true:

  • the lane has an explicit toleration for dedicated.tinyland.dev/compute-expansion=true:NoSchedule;
  • the runner envelope fits Sting’s memory budget, not just Honey’s;
  • a kubelet root/imagefs capacity audit is healthy for the target envelope;
  • cache/prewarm work has reduced cold libguestfs and VM-image rebuild risk;
  • the proof keeps the same tinyland-nix-kvm capability label instead of minting a Sting-specific or repo-specific label;
  • the rollout has a rollback path that returns the lane to Honey-only placement.

Sting’s 2026-05-03 budget is 54.68Gi of allocatable memory and 66.32Gi of allocatable ephemeral storage, so only one 32Gi memory request fits at a time. The proven 32Gi request / 64Gi limit VM-test envelope is therefore a one-runner-at-a-time Sting overflow candidate, not a default multi-runner Sting payload. A smaller KVM envelope may be tested separately, but it still needs the gates above.
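
The one-runner conclusion follows from dividing Sting's allocatable budget by the VM-test envelope. The sketch below reproduces that arithmetic with figures copied from the tables on this page. `runners_that_fit` is a hypothetical helper, and the calculation assumes the node is otherwise empty, which a live node will not be.

```python
import math

# Sting allocatable values from the 2026-05-03 node budget table.
STING = {"cpu": 32, "mem_gi": 54.68, "eph_gi": 66.32}

# VM-test envelope, including the Sting ephemeral-storage guardrail.
VM_TEST_REQUEST = {"cpu": 4, "mem_gi": 32, "eph_gi": 24}
VM_TEST_LIMIT = {"cpu": 8, "mem_gi": 64, "eph_gi": 40}


def runners_that_fit(node, envelope):
    """Whole runners that fit, bounded by the tightest resource dimension."""
    return min(math.floor(node[key] / envelope[key]) for key in envelope)


print(runners_that_fit(STING, VM_TEST_REQUEST))  # bounded by the 32Gi memory request
print(runners_that_fit(STING, VM_TEST_LIMIT))    # 64Gi limit exceeds allocatable memory
```

By requests, memory is the binding dimension and exactly one runner fits. By limits, the 64Gi memory limit is larger than Sting's allocatable memory, so even a single runner cannot burst to its full limit there.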

Related issues:

  • TIN-912: KVM runner capacity policy.
  • TIN-908: trusted cache prewarm and publication for heavy KVM closures.
  • TIN-627: broader shared-label capacity policy.
