KVM Capacity Policy
The tinyland-nix-kvm label is the shared capability class for KVM-backed VM
execution. It is not a repo-specific runner label, and ARC does not provide a
global concurrency cap for the label. ARC caps each scale set independently, so
global capacity has to be treated as an operator policy on top of the live
scale-set inventory.
This page records the current TIN-912 policy checkpoint for Honey-first KVM capacity with bounded Sting overflow.
Current Live Audit
Run these read-only checks before changing KVM runner counts, placement, or resource envelopes:
just arc-shared-label-capacity-audit --include-label tinyland-nix-kvm
kubectl --context honey get nodes -o json
The 2026-05-05 audit showed three live scale sets advertising
tinyland-nix-kvm:
| Scale set | Scope | Node selector | Toleration | Min | Max | Request | Limit |
|---|---|---|---|---|---|---|---|
| tinyland-nix-kvm | tinyland-inc | honey plus capability.tinyland.dev/kvm=true | none | 0 | 1 | 2 CPU, 8Gi | 8 CPU, 16Gi |
| jesssullivan-nix-kvm | Jesssullivan/jesssullivan-infra | honey plus capability.tinyland.dev/kvm=true | none | 0 | 1 | 2 CPU, 8Gi | 8 CPU, 16Gi |
| jesssullivan-nix-vm-test-kvm | Jesssullivan/nix-vm-test | capability.tinyland.dev/kvm=true | Sting | 0 | 2 | 4 CPU, 32Gi, 24Gi ephemeral | 8 CPU, 64Gi, 40Gi ephemeral |
Current aggregate label policy:
| Label | Active scale sets | Aggregate max | Aggregate request at max | Aggregate limit at max |
|---|---|---|---|---|
| tinyland-nix-kvm | 3 | 4 | 12 CPU, 80Gi | 32 CPU, 160Gi |
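The aggregate row follows from the per-scale-set rows: multiply each envelope by its max runner count and sum across scale sets. A minimal Python sketch of that arithmetic, assuming the audit values are copied by hand into a list. The `ScaleSet` class and `aggregate` function are illustrative names, not part of the `arc-shared-label-capacity-audit` recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleSet:
    name: str
    max_runners: int
    req_cpu: int
    req_mem_gi: int
    lim_cpu: int
    lim_mem_gi: int


# Values copied from the 2026-05-05 audit table (hand-maintained assumption).
INVENTORY = [
    ScaleSet("tinyland-nix-kvm", 1, 2, 8, 8, 16),
    ScaleSet("jesssullivan-nix-kvm", 1, 2, 8, 8, 16),
    ScaleSet("jesssullivan-nix-vm-test-kvm", 2, 4, 32, 8, 64),
]


def aggregate(inventory):
    """Sum runner counts and per-runner envelopes scaled by max runners."""
    return {
        "max_runners": sum(s.max_runners for s in inventory),
        "request_cpu": sum(s.max_runners * s.req_cpu for s in inventory),
        "request_mem_gi": sum(s.max_runners * s.req_mem_gi for s in inventory),
        "limit_cpu": sum(s.max_runners * s.lim_cpu for s in inventory),
        "limit_mem_gi": sum(s.max_runners * s.lim_mem_gi for s in inventory),
    }


print(aggregate(INVENTORY))
```

With the three audited scale sets this yields a max of 4 runners, a 12 CPU / 80Gi aggregate request, and a 32 CPU / 160Gi aggregate limit, matching the policy row.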
All KVM scale sets must keep minRunners = 0 unless a separate warm-pool
decision is made. Scale-to-zero is part of the capacity contract.
Node Budget
The 2026-05-03 live node budget was normalized to GiB from the raw Kubernetes allocatable values:
| Node | KVM label | Taints | Allocatable CPU | Allocatable memory | Allocatable ephemeral storage | Allocatable pods |
|---|---|---|---|---|---|---|
| honey | true | none | 32 | 219.81Gi | 1384.60Gi | 110 |
| sting | true | dedicated.tinyland.dev/compute-expansion=true:NoSchedule | 32 | 54.68Gi | 66.32Gi | 110 |
| bumble | none | none | 4 | 15.06Gi | 66.44Gi | 110 |
Honey remains the primary KVM payload node for this policy checkpoint. Sting is admitted only as explicitly tolerated compute-expansion overflow with a memory and ephemeral-storage guardrail. Bumble is not eligible because it does not advertise the KVM capability label.
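The GiB normalization mentioned for the node budget can be sketched as follows. This is an illustrative Python helper, not the tooling that produced the table: it assumes the standard `kubectl get nodes -o json` shape (`items[].status.allocatable`), and the function names and the `example-node` sample values are hypothetical.

```python
from decimal import Decimal

# Kubernetes quantity suffixes: binary (power of two) and decimal (power of ten).
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}


def parse_quantity(q):
    """Parse a Kubernetes quantity string into a Decimal base value."""
    for suffix, factor in BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    for suffix, factor in DECIMAL.items():
        if q.endswith(suffix):
            return Decimal(q[:-1]) * factor
    if q.endswith("m"):  # milli-units, typically CPU such as "3500m"
        return Decimal(q[:-1]) / 1000
    return Decimal(q)  # plain number, e.g. bytes or whole cores


def to_gib(q):
    """Convert a byte-valued quantity to GiB, rounded to two decimals."""
    return float(round(parse_quantity(q) / Decimal(2**30), 2))


def node_budget(node_list):
    """Build one normalized row per node from a `kubectl get nodes -o json` document."""
    rows = []
    for node in node_list["items"]:
        alloc = node["status"]["allocatable"]
        rows.append({
            "node": node["metadata"]["name"],
            "cpu": float(parse_quantity(alloc["cpu"])),
            "memory_gi": to_gib(alloc["memory"]),
            "ephemeral_gi": to_gib(alloc["ephemeral-storage"]),
            "pods": int(alloc["pods"]),
        })
    return rows


# Hypothetical sample input; real values come from the live cluster.
SAMPLE = {"items": [{
    "metadata": {"name": "example-node"},
    "status": {"allocatable": {
        "cpu": "3500m", "memory": "16Gi",
        "ephemeral-storage": "10737418240", "pods": "110",
    }},
}]}

print(node_budget(SAMPLE))
```

In practice the input would be the parsed output of the read-only `kubectl --context honey get nodes -o json` check rather than the inline sample.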
Current Policy
- Keep workflow-facing labels capability-shaped: tinyland-nix-kvm, not repo-shaped labels.
- Keep owner or repo names only in private ARC registration anchors.
- Preserve minRunners = 0 on every KVM scale set.
- Treat the current aggregate tinyland-nix-kvm max of 4 as the live KVM ceiling until a fresh node-budget audit justifies a higher number.
- New owner overlays advertising tinyland-nix-kvm default to maxRunners = 1 unless the operator explicitly budgets the aggregate label ceiling upward.
- Heavier VM-test overlays may use the proven 4 CPU / 32Gi request and 8 CPU / 64Gi limit envelope, but only when the aggregate request still fits the admitted node budget. On Sting, that envelope must include an explicit ephemeral-storage request and limit; the current guardrail is 24Gi requested and 40Gi limited.
- This is cache-backed local KVM execution. It is not Bazel remote execution, not Nix remote builders, and not another remote executor.
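Because ARC caps each scale set independently, the ceiling and scale-to-zero rules have to be checked by the operator. One way to express them is a small Python check, sketched here under the assumption that the scale-set inventory is available as plain dictionaries. The `policy_violations` function and the `example-owner-nix-kvm` overlay are hypothetical and exist only for illustration.

```python
LABEL_CEILING = 4  # current aggregate tinyland-nix-kvm max from this policy

# Min/max values copied from the 2026-05-05 audit.
CURRENT = [
    {"name": "tinyland-nix-kvm", "min_runners": 0, "max_runners": 1},
    {"name": "jesssullivan-nix-kvm", "min_runners": 0, "max_runners": 1},
    {"name": "jesssullivan-nix-vm-test-kvm", "min_runners": 0, "max_runners": 2},
]


def policy_violations(scale_sets, ceiling=LABEL_CEILING):
    """Return human-readable policy violations; an empty list means compliant."""
    problems = []
    for s in scale_sets:
        if s["min_runners"] != 0:
            problems.append(f"{s['name']}: minRunners must be 0")
    total = sum(s["max_runners"] for s in scale_sets)
    if total > ceiling:
        problems.append(f"aggregate max {total} exceeds ceiling {ceiling}")
    return problems


# Hypothetical new owner overlay using the default maxRunners = 1.
NEW_OVERLAY = {"name": "example-owner-nix-kvm", "min_runners": 0, "max_runners": 1}

print(policy_violations(CURRENT))
print(policy_violations(CURRENT + [NEW_OVERLAY]))
```

The current inventory sits exactly at the ceiling, so even a default one-runner overlay is flagged until the operator budgets the ceiling upward or lowers another scale set's max.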
Sting Admission Gate
Sting has a KVM label and is now admitted only as bounded overflow for KVM payloads that satisfy the gates below. The default posture is still Honey-first; Sting admission is an explicit operator decision, not a reason to mint Sting-specific or repo-specific workflow labels.
Before any KVM payload schedules on Sting, all of these must remain true:
- the lane has an explicit toleration for dedicated.tinyland.dev/compute-expansion=true:NoSchedule;
- the runner envelope fits Sting’s memory budget, not just Honey’s;
- a kubelet root/imagefs capacity audit is healthy for the target envelope;
- cache/prewarm work has reduced cold libguestfs and VM-image rebuild risk;
- the proof keeps the same tinyland-nix-kvm capability label instead of minting a Sting-specific or repo-specific label;
- the rollout has a rollback path that returns the lane to Honey-only placement.
Given Sting’s 2026-05-03 memory and ephemeral-storage budget, the proven
32Gi request / 64Gi limit VM-test envelope is a one-runner-at-a-time Sting
overflow candidate, not a default multi-runner Sting payload. A smaller KVM
envelope may be tested separately, but it still needs the gates above.
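The one-runner-at-a-time conclusion can be checked with simple division of Sting's allocatable budget by the VM-test request envelope. The Python sketch below uses the figures recorded on this page and assumes, for simplicity, that no other pods hold requests on Sting; the function names are illustrative.

```python
import math

# Sting allocatable values from the 2026-05-03 node budget.
STING_ALLOCATABLE = {"cpu": 32, "memory_gi": 54.68, "ephemeral_gi": 66.32}

# Proven VM-test request envelope, including the Sting ephemeral-storage guardrail.
VM_TEST_REQUEST = {"cpu": 4, "memory_gi": 32, "ephemeral_gi": 24}


def runners_per_resource(allocatable, request):
    """How many runners fit by request, considering each resource in isolation."""
    return {k: math.floor(allocatable[k] / request[k]) for k in request}


def max_runners_by_request(allocatable, request):
    """The schedulable count is bounded by the tightest resource."""
    return min(runners_per_resource(allocatable, request).values())


print(runners_per_resource(STING_ALLOCATABLE, VM_TEST_REQUEST))
print(max_runners_by_request(STING_ALLOCATABLE, VM_TEST_REQUEST))
```

Memory is the binding resource: CPU would allow 8 runners and ephemeral storage 2, but only one 32Gi request fits in 54.68Gi. The 64Gi memory limit also exceeds Sting's allocatable memory on its own, which is a further reason to treat this envelope as overflow rather than a default Sting payload.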
Related Work
- TIN-912: KVM runner capacity policy.
- TIN-908: trusted cache prewarm and publication for heavy KVM closures.
- TIN-627: broader shared-label capacity policy.