Runner Topology

Runner Topology Model

Post-Liqo ARC topology for the GloriousFlywheel runner platform. Supersedes the prior multi-cluster burst model that used Liqo virtual nodes.

Current State: Single-Cluster ARC

All runners deploy to the on-prem honey RKE2 cluster. The ARC runner scale sets are managed by a single ARC controller in arc-systems.

The current placement contract is split on purpose:

  • ARC controller pods, baseline Nix payloads, DinD payloads, and Jess implementation-overlay listeners are pinned to honey
  • stateless Docker payloads and their listener, plus bounded heavy Nix compute lanes, are admitted to sting with the compute-expansion toleration
  • bumble remains the storage-biased OpenEBS/ZFS node, not the ARC scheduling authority

That split exists because storage capacity and kubelet eviction capacity are different things. bumble has the durable ZFS-backed storage plane, but its node root/image filesystem can still hit DiskPressure from RKE2/containerd and host Nix store churn. Baseline ARC control and shared runner scheduling must not depend on that storage node being free of rootfs pressure.
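The placement split can be read as a selector-plus-toleration rule. The following is a minimal sketch of that rule, not the platform's implementation: the taint key `compute-expansion`, the `Node` and `Lane` types, and the cordoned state shown for bumble are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    # Taint keys carried by the node (key name is an assumption).
    taints: frozenset = frozenset()
    cordoned: bool = False


@dataclass(frozen=True)
class Lane:
    name: str
    # Node the lane is pinned to, e.g. through a nodeSelector.
    node_selector: str
    tolerations: frozenset = frozenset()


def admits(node: Node, lane: Lane) -> bool:
    """Return True if pods from the lane may schedule onto the node."""
    if node.cordoned or lane.node_selector != node.name:
        return False
    # Every taint on the node must be tolerated by the lane.
    return node.taints <= lane.tolerations


honey = Node("honey")
sting = Node("sting", taints=frozenset({"compute-expansion"}))
bumble = Node("bumble", cordoned=True)  # assumed state, for illustration

docker = Lane("tinyland-docker", "sting", frozenset({"compute-expansion"}))
nix = Lane("tinyland-nix", "honey")

print(admits(sting, docker))  # Docker lane carries the toleration
print(admits(sting, Lane("untolerated", "sting")))  # blocked by the taint
```

The point of the model is that storage capacity never appears in it: a lane reaches a node only through its selector and tolerations.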

honey RKE2 cluster
+-- arc-systems namespace
|   +-- ARC Controller (v0.14.0)
|       - manages all AutoScalingRunnerSets
|       - Prometheus metrics on :8080
|       - 5 alert rules (PrometheusRule)
|
+-- arc-runners namespace
|   +-- tinyland-nix (ARC ScaleSet label)
|   |   - Nix builds, Attic cache integration
|   |   - Runner image ships Determinate Nix
|   |   - No shared /nix/store PVC requirement for baseline scheduling
|   |   - Warm pool: CronJob scales up during business hours
|   |   - CPU: 4 limit, Memory: 8Gi limit
|   |
|   +-- tinyland-docker (ARC ScaleSet label)
|   |   - General CI workloads
|   |   - Runtime placement: sting compute-expansion lane
|   |   - CPU: 2 limit, Memory: 4Gi limit
|   |
|   +-- tinyland-dind (ARC ScaleSet label)
|   |   - Container builds (Docker-in-Docker)
|   |   - 40-50Gi ephemeral storage
|   |
|   +-- tinyland-nix-gpu (ARC ScaleSet, extra)
|   |   - Shared bounded GPU lane on honey
|   |   - Host /dev/dri pass-through + Vulkan and Dawn/WebGPU userspace canaries
|   |   - Max 1 runner
|   |
|   +-- any live repo-shaped residue found by audit
|       - migration debt to retire, not committed product taxonomy
|
+-- gitlab-runners namespace
|   +-- GitLab HPA runners (docker, dind, nix)
|       - HPA-managed deployments
|       - Separate from ARC, managed by gitlab-runner Helm chart
|
+-- nix-cache namespace
    +-- Attic binary cache server
    +-- Bazel remote cache
    +-- CNPG PostgreSQL cluster
    +-- RustFS object storage

Runner Classes

Workflow-facing labels are shared capability classes. Implementation object names may differ inside the ARC stack, but those names are not the consumer contract and must not encode project identity.

| Label | Forge | Type | Scale Model | Storage | Use Case |
|---|---|---|---|---|---|
| tinyland-nix | GitHub | ARC ScaleSet | min 0 + scheduled warm pool, core max 16 + sting overflow max 8 | Ephemeral rootfs + Attic/Bazel acceleration | Nix builds, flake checks |
| tinyland-docker | GitHub | ARC ScaleSet | Scale-to-zero, core max 20 | Ephemeral | General CI, tests, linting |
| tinyland-dind | GitHub | ARC ScaleSet | Scale-to-zero, core max 20 + sting overflow max 16 | Honey large ephemeral; sting fast-local PVC scratch | Container image builds |
| tinyland-nix-operator | GitHub | ARC ScaleSet (extra) | Scale-to-zero, max 1 | Ephemeral rootfs + Attic/Bazel acceleration | ARC deploy/operator maintenance |
| tinyland-nix-heavy | GitHub | ARC ScaleSet (extra) | Scale-to-zero | Ephemeral rootfs + Attic/Bazel acceleration | Memory-heavy Nix/Rust builds |
| tinyland-nix-kvm | GitHub | ARC ScaleSet (extra) | Max 1 | Ephemeral + host /dev/kvm | Shared VM execution |
| tinyland-nix-gpu | GitHub | ARC ScaleSet (extra) | Max 1 | Ephemeral + host /dev/dri | Bounded GPU/Vulkan/WebGPU smoke |
| docker | GitLab | HPA Deployment | Min 1 replica | Ephemeral | Compatibility GitLab CI |
| dind | GitLab | HPA Deployment | Min 1 replica | Ephemeral | GitLab container builds |
| nix | GitLab | HPA Deployment | Min 1 replica | Ephemeral | GitLab Nix builds |

Repo-shaped add-ons found in live cluster audits or historical stack values are migration debt. They must either be removed or replaced by shared capability labels with explicit runtime, architecture, privilege, or resource semantics. Owner-specific GitHub App installs belong in implementation overlays, not in this runner topology as product classes.

When multiple implementation overlays attach to one physical backend, ARC runner caps remain local to each owner-specific scale set. They do not create a global cap for the shared workflow-facing label. Current Honey behavior relies on Kubernetes scheduling as the final global backpressure, so two owner overlays can each request one tinyland-nix-heavy runner and the second pod will queue if the backend cannot admit another heavy runner.
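A short sketch of why per-overlay caps do not add up to a global cap. The overlay names and cap values mirror the two-overlay heavy-lane example in the paragraph above; the function names and the single backend slot are hypothetical.

```python
from collections import defaultdict

# (owner overlay, workflow-facing label, maxRunners); illustrative values.
scale_sets = [
    ("tinyland", "tinyland-nix-heavy", 1),
    ("jess", "tinyland-nix-heavy", 1),
]


def requested_ceiling(sets):
    """Sum per-scale-set caps by shared label.

    ARC enforces each cap locally, so the sum is what the overlays can
    request in total, not what the backend can admit.
    """
    totals = defaultdict(int)
    for _owner, label, max_runners in sets:
        totals[label] += max_runners
    return dict(totals)


def queued(requested: int, backend_slots: int) -> int:
    """Pods left Pending when Kubernetes scheduling is the final backpressure."""
    return max(0, requested - backend_slots)


ceiling = requested_ceiling(scale_sets)
print(ceiling)
print(queued(ceiling["tinyland-nix-heavy"], backend_slots=1))
```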

Scale-Up and Scale-Down Lifecycle

ARC Runners (GitHub)

Job queued in GitHub Actions
  -> Scale-set listener (long poll to GitHub Actions) sees the queued job
  -> ARC Controller creates runner pod in scale set
  -> Pod starts (pulls image, exposes runtime hints)
  -> Runner registers with GitHub
  -> Job executes
  -> Job completes -> pod terminates (ephemeral)
  -> Scale set returns to minRunners (default: 0)

Warm pool override (tinyland-nix):

13:00 UTC weekdays (business hours start)
  -> CronJob patches AutoScalingRunnerSet: minRunners=1
  -> ARC pre-creates 1 idle runner pod
  -> Pods already carry the runner image + Nix toolchain
  -> Jobs are assigned within seconds (no registration cold start)

01:00 UTC daily (business hours end)
  -> CronJob patches AutoScalingRunnerSet: minRunners=0
  -> Idle pods drain and terminate
  -> Attic remains the reusable acceleration layer between jobs
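The two schedules above imply a minRunners value for any instant: whichever CronJob fired last wins. A minimal sketch of that rule follows, assuming the scale-up runs Monday to Friday and the scale-down runs every day. The function name is hypothetical and the input is expected to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def warm_pool_min_runners(now: datetime) -> int:
    """minRunners implied by the two schedules at a given instant.

    Scale-up fires at 13:00 UTC Monday-Friday; scale-down fires at
    01:00 UTC every day. The value in effect is whichever fired last.
    """
    now = now.astimezone(timezone.utc)
    # Shift back one hour so a window that crosses midnight
    # (13:00 -> 01:00) maps onto a single calendar day.
    shifted = now - timedelta(hours=1)
    started_on_weekday = shifted.weekday() < 5  # Monday=0 ... Friday=4
    in_window = shifted.hour >= 12  # 13:00..00:59 UTC before the shift
    return 1 if (started_on_weekday and in_window) else 0


# 2026-05-04 is a Monday.
print(warm_pool_min_runners(datetime(2026, 5, 4, 14, 0, tzinfo=timezone.utc)))
```

One consequence worth noticing: a window that opens on Friday stays warm until 01:00 UTC on Saturday, because the scale-down job is the only thing that closes it.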

GitLab Runners (HPA)

Always-on: minimum 1 replica per runner type
  -> HPA monitors CPU/memory utilization
  -> Scales up when CPU > target (default 70%)
  -> Scale-down stabilization: 300s window
  -> Never scales below min replicas
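For reference, the replica count in the flow above follows the standard Kubernetes HPA calculation, desired = ceil(current × currentMetric / targetMetric). The sketch below applies it with the 70% target and the minimum of 1 replica from this section. The `max_replicas` value is a hypothetical ceiling, and the 300s scale-down stabilization window is not modeled.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 70.0,
    min_replicas: int = 1,
    max_replicas: int = 5,   # hypothetical ceiling, not taken from this document
    tolerance: float = 0.1,  # Kubernetes default HPA tolerance
) -> int:
    """Replica count the HPA would ask for, before stabilization."""
    ratio = current_utilization / target_utilization
    # Within tolerance of the target, the HPA leaves the count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(hpa_desired_replicas(1, 140.0))  # CPU at twice the target
print(hpa_desired_replicas(2, 10.0))   # idle, clamps at min_replicas
```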

Cold Start Times

| Runner | Cold Start | Warm Start | Notes |
|---|---|---|---|
| tinyland-docker | ~20-30s | N/A (ephemeral) | Image pull + registration |
| tinyland-dind | ~25-35s | N/A (ephemeral) | Larger image, DinD setup |
| tinyland-nix (cold) | ~90-120s | ~5-10s | Runner image already contains Nix; cache misses dominate |
| tinyland-nix (warm pool) | ~5-10s | ~5-10s | Warm pool avoids registration cold start; Attic still carries reuse |
| GitLab compatibility runner | ~5s | ~5s | Always running (HPA min replicas) |

What Changed from Liqo Era

The prior topology used Liqo to extend capacity across multiple clusters (blahaj, honey) via virtual nodes. This created several problems:

  1. Scheduling complexity: Liqo virtual nodes required affinity rules and tolerations that differed per cluster, making runner configs non-portable.
  2. Network partitions: Cross-cluster pod communication was fragile when Liqo peering connections dropped.
  3. State management: Shared volumes (like /nix/store) couldn’t span Liqo-peered clusters without additional infrastructure.
  4. Debugging difficulty: Pod failures on virtual nodes were hard to diagnose — logs and events were split across clusters.

Resolution: All runners now deploy to a single cluster. Multi-cluster burst is deferred until a clear need emerges. When it does, the approach will be federated ARC (multiple independent ARC controllers with a shared GitHub App) rather than virtual-node scheduling.

Advanced Runner Classes

| Runner class | Current state | Next proof surface | Anchor |
|---|---|---|---|
| KVM | Shared lane with bounded proof floor | Broader rockies and graphical VM execution if a fresh slice is reopened beyond the current terminal-first floor | #312 |
| GPU / WebGPU / Dawn | Shared tinyland-nix-gpu host-device lane on honey, one bounded Dawn/WebGPU compute-plus-render userspace proof floor, and one downstream default-branch proof on the shared lane | Wider downstream adoption only when an authoritative workload needs it; keep local NVIDIA-fabric ideas as future design context unless a real product requirement revives them | #342, #347, tinyland-inc/lab#163 |
| macOS | Tightened bounded self-hosted Darwin proof floor | Decide whether to promote beyond the current proof floor into a platform-owned shared macOS lane | #320, #335 |
| riscv | Not started | Name the first repo, workload, and operator boundary instead of ambient demand | #333 |
| Cross-forge follow-on | Compatibility-only, not product | Repeatable GitHub-first runtime pattern before any parity claim | #333 |

Capacity Planning

Current cluster: 3 nodes (on-prem RKE2) — honey (control plane), bumble (storage/ZFS), sting (stateless compute).

Current ARC headroom baseline after the bumble rootfs follow-up, listener-placement apply on 2026-04-25, and Docker placement apply on 2026-04-27:

  • honey: primary ARC controller, baseline Nix payloads, DinD payloads, honey-backed tinyland-nix-heavy, Jess overlay listeners, and storage-sensitive runner payloads
  • bumble: storage-biased OpenEBS/ZFS services plus live owner/repo-shaped ARC residue and older in-flight runner pods; currently observed as schedulable only when explicitly uncordoned/admitted
  • sting: bounded compute-expansion surface for stateless tinyland-docker and explicit compute-expansion lanes; not the default baseline Nix, heavy Nix, or DinD runtime surface

The 2026-04-25 audit found bumble DiskPressure=False but still tight on rootfs headroom after supported CRI and Nix cleanup. The durable fix is placement and filesystem architecture, not treating raw storage capacity as kubelet imagefs capacity.

The 2026-04-29 checkpoint, taken with just kubelet-imagefs-capacity-audit --node bumble, kept that boundary active: bumble remained Ready=True and DiskPressure=False, but kubelet rootfs, imagefs, and containerfs each had only 11.4 GiB available (16.3%) on a 69.9 GiB filesystem. The node can be a 30 TiB-class storage node and still be unsafe as default bursty ARC imagefs capacity. The 2026-05-01 offline fixture guard now keeps healthy, warning, and critical rootfs/imagefs/containerfs boundaries covered in CI; it is not a live capacity remediation by itself.
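The headroom figure above is a plain ratio of available to total filesystem size. The sketch below reproduces it and shows one way a healthy/warning/critical boundary could be expressed. The 20% and 10% thresholds are hypothetical placeholders, since this document does not state the audit's actual values.

```python
def headroom_percent(available_gib: float, capacity_gib: float) -> float:
    """Percentage of the filesystem still available, to one decimal place."""
    return round(100.0 * available_gib / capacity_gib, 1)


def classify_headroom(
    percent_available: float,
    warning_below: float = 20.0,   # hypothetical threshold
    critical_below: float = 10.0,  # hypothetical threshold
) -> str:
    """Map a headroom percentage onto healthy / warning / critical."""
    if percent_available < critical_below:
        return "critical"
    if percent_available < warning_below:
        return "warning"
    return "healthy"


pct = headroom_percent(11.4, 69.9)  # the 2026-04-29 bumble figures
print(pct, classify_headroom(pct))
```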

The 2026-05-02 read-only audit still showed bumble below the warning threshold: 12.0 GiB available (17.1%) for kubelet rootfs, imagefs, and containerfs, while Ready=True and DiskPressure=False. The operating decision is therefore a hybrid one: default ARC and GitLab runner placement continues to avoid bumble, and host-level RKE2/containerd or /nix reshaping is a later maintenance action. The just runner-scale-contract-check recipe guards the committed ARC and GitLab selectors so bumble cannot silently return as default runner burst capacity before that remediation is explicit.

The 2026-04-25 overlay validation queue also exposed a separate scheduler limit: honey can fill its pod slots while bumble is cordoned and sting is protected by the compute-expansion taint. That is shared pool capacity/placement debt, not a reason to mint repo-specific runner labels. The first bounded relief paths admitted tinyland-nix-heavy and stateless tinyland-docker to sting with the compute-expansion toleration. Baseline Nix and DinD lanes stay on honey until their runtime and storage envelopes are proven separately.

The 2026-05-11 outage confirmed the same limit in a more direct way: broad cluster resources can be available while a honey-pinned lane is still blocked by the honey node pod-count ceiling. Kubernetes pod capacity, selectors, taints, tolerations, and per-lane storage envelopes are the real admission contract. Do not read spare CPU or memory on sting as runner availability unless that runner class has a reviewed sting placement and scratch-storage contract. Do not read bumble OpenEBS/ZFS capacity as runner availability; it is the durable PVC plane.
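The pod-count ceiling described above can be illustrated with a small admission check. This is a sketch under stated assumptions: the `NodeState` type and every number in the example are illustrative rather than measured, and 110 is the kubelet's default pod limit, which may not be the value configured on these nodes.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    name: str
    max_pods: int          # kubelet pod-count ceiling (Kubernetes default: 110)
    running_pods: int
    free_cpu: float        # cores
    free_memory_gib: float


def can_admit(lane_node: str, nodes: dict, cpu: float, memory_gib: float) -> bool:
    """A lane pinned to one node is admitted only by that node's own limits."""
    node = nodes[lane_node]
    has_slot = node.running_pods < node.max_pods
    fits = node.free_cpu >= cpu and node.free_memory_gib >= memory_gib
    return has_slot and fits


nodes = {
    # Illustrative numbers only.
    "honey": NodeState("honey", max_pods=110, running_pods=110,
                       free_cpu=8.0, free_memory_gib=32.0),
    "sting": NodeState("sting", max_pods=110, running_pods=20,
                       free_cpu=24.0, free_memory_gib=96.0),
}

# A honey-pinned lane is blocked by the pod ceiling despite free CPU and memory.
print(can_admit("honey", nodes, cpu=4.0, memory_gib=8.0))
```

Spare capacity on sting does not change the answer for a honey-pinned lane, which is the behavior the outage exposed.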

The 2026-05-12 post-merge burst showed that completed runner utility Jobs can also consume honey’s finite pod slots long after their useful work is done. The ARC runner stack now enables the runner-cleanup CronJob in arc-runners so Succeeded and Failed runner-namespace pods age out through the repo-managed control plane instead of relying on ad hoc live deletion.
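The selection rule behind such a cleanup job can be sketched as follows. This is not the runner-cleanup CronJob's actual logic: the `PodInfo` type, the function name, and the one-hour age threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

TERMINAL_PHASES = {"Succeeded", "Failed"}


@dataclass(frozen=True)
class PodInfo:
    name: str
    phase: str
    finished_at: Optional[datetime]  # None while the pod is still running


def pods_to_reap(pods, now: datetime, ttl: timedelta = timedelta(hours=1)):
    """Names of terminal pods older than ttl (ttl is a hypothetical value)."""
    return [
        p.name
        for p in pods
        if p.phase in TERMINAL_PHASES
        and p.finished_at is not None
        and now - p.finished_at >= ttl
    ]


now = datetime(2026, 5, 12, 12, 0, tzinfo=timezone.utc)
pods = [
    PodInfo("utility-a", "Succeeded", now - timedelta(hours=2)),
    PodInfo("runner-b", "Running", None),
    PodInfo("utility-c", "Failed", now - timedelta(minutes=5)),
]
print(pods_to_reap(pods, now))
```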

The 2026-05-15 managed-apply recurrence exposed a second listener-continuity edge: ARC can hold listener recreation after a scale-set spec change until existing runners drain, and freezing only minRunners still allows queued work to refill the shared lane through an existing listener. The managed ARC apply workflow now:

  • keeps plan/apply/verify off the labels it quiesces
  • max-freezes the shared Nix/Docker/DinD scale sets before mutation
  • records a cap snapshot
  • generates and guards a fresh post-quiesce apply plan
  • restores caps from source tfvars targets on success, before listener proof
  • keeps the cap snapshot only as the failure rollback
  • keeps best-effort failure restore in the workflow trap
  • gives active shared jobs a bounded 20-minute drain window
  • treats a missing listener with active runners as a failed post-apply proof

tinyland-nix-operator is the dedicated control-plane lane; the workflow falls back to tinyland-nix-heavy only until that lane is bootstrapped live.

The 2026-06-09 managed apply exposed two further edges in that lane:

  1. The post-apply listener-cap prove step treated mid-recreation listener churn as drift and went red on a successful apply. The prove step is now the settle-aware scripts/arc-prove-listener-caps.sh, which classifies drift as transient while a set’s AutoscalingListener CR or listener-config secret is missing or younger than a grace window, hard-fails fast only on a stable drift signature with a settled listener, and still hard-fails anything unresolved at its overall deadline.
  2. Idle leaked EphemeralRunner CRs (zero job fields with their owning EphemeralRunnerSet at replicas=0, or excess beyond desired) kept Running pods alive and stalled the freeze drain for its full 20-minute window. scripts/reap-idle-leaked-ephemeral-runners.sh now runs between quiesce scoping and the cap freeze, deleting only provably leaked no-job CRs (graceful CR deletion, just-in-time per-CR job re-check, warm minRunners pools at current==desired untouchable) and waiting a bounded time for the runner sets and listeners to settle before the freeze proceeds.
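The settle-aware drift classification described above can be modeled as a small decision function. This is a sketch of the stated rules only, not the contents of scripts/arc-prove-listener-caps.sh: the type names, field names, and the grace and deadline values are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    OK = "ok"
    TRANSIENT = "transient"  # keep polling
    FAIL = "fail"


@dataclass(frozen=True)
class ListenerObservation:
    caps_match: bool                 # live caps equal the source targets
    listener_age_s: Optional[float]  # None if the AutoscalingListener CR is missing
    secret_age_s: Optional[float]    # None if the listener-config secret is missing
    same_drift_as_last_poll: bool    # drift signature unchanged since the last poll


def classify_drift(obs, elapsed_s, grace_s=120.0, deadline_s=900.0):
    """Classify one poll; grace_s and deadline_s are hypothetical values."""
    if obs.caps_match:
        return Verdict.OK
    if elapsed_s >= deadline_s:
        return Verdict.FAIL  # unresolved at the overall deadline
    ages = (obs.listener_age_s, obs.secret_age_s)
    settling = any(age is None or age < grace_s for age in ages)
    if settling:
        return Verdict.TRANSIENT  # listener is still being recreated
    # Settled listener: fail fast only when the drift signature is stable.
    return Verdict.FAIL if obs.same_drift_as_last_poll else Verdict.TRANSIENT


print(classify_drift(ListenerObservation(False, None, 300.0, False), elapsed_s=10.0))
```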

The 2026-04-29, 2026-05-10, 2026-05-12, and 2026-05-15 cap expansions keep that placement model but raise the source-owned ceilings for the primary shared lanes:

  • tinyland-docker: 20 on sting
  • tinyland-nix: 16 on honey plus 8 sting overflow slots
  • tinyland-dind: 20 on honey plus 16 sting overflow slots

The additive tinyland-nix-compute-expansion scale set contributes those shared tinyland-nix overflow slots on sting; its TIN-1400 storage model uses per-pod generic ephemeral PVCs on local-path-sting-fast-ephemeral for /nix and /home/runner/_work, then copies the baked image /nix into the PVC before runner startup so the image’s Nix installation survives the mount.

The additive tinyland-dind-compute-expansion scale set contributes those shared tinyland-dind overflow slots on sting; those pods use generic ephemeral PVCs on local-path-sting-fast-ephemeral for /home/runner/_work and /var/lib/docker because sting’s fast disks are not the kubelet root ephemeral-storage filesystem. Those pods therefore keep root ephemeral-storage requests small and reserve the large scratch budget through the fast-local PVCs.

The Docker lane also carries an explicit 1Gi/8Gi ephemeral-storage request/limit so large stateless CI bursts are no longer invisible to scheduler disk accounting. The honey DinD lane splits ephemeral-storage admission between a 4Gi/8Gi runner workspace container and a 24Gi/40Gi Docker daemon sidecar so neither side of the build pod inherits the namespace default. Heavy, KVM, GPU, and compatibility-residue lanes remain separately bounded.
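As a worked example of the honey DinD split: Kubernetes schedules on the sum of container requests, so the two containers together reserve 28Gi and are limited to 48Gi per build pod. The container names below are hypothetical labels for the two roles named in the text.

```python
# (request GiB, limit GiB) per container, taken from the honey DinD lane
# figures in the text. Container names are hypothetical.
DIND_POD = {
    "runner-workspace": (4, 8),
    "docker-daemon-sidecar": (24, 40),
}


def pod_ephemeral_storage(containers):
    """Sum container requests and limits into pod-level totals."""
    request = sum(req for req, _ in containers.values())
    limit = sum(lim for _, lim in containers.values())
    return request, limit


request_gib, limit_gib = pod_ephemeral_storage(DIND_POD)
print(f"request={request_gib}Gi limit={limit_gib}Gi")
```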

The 2026-05-24 first-party dogfood surge made the remaining Sting storage truth visible: tinyland-nix-compute-expansion can have the fast-local PVC model present and still hit scheduler Insufficient ephemeral-storage if the kubelet root/nodefs surface on sting only advertises roughly 71GB. That is not a reason to route GloriousFlywheel validation to GitHub-hosted runners, and it is not proof that the physical SSD/NVMe pool is exhausted. It is a node storage integration problem: kubelet/local-path/root ephemeral accounting must match the intended fast-local runner substrate before a larger overflow cap is treated as fully usable live capacity.

The maxRunners = 1 cap on tinyland-nix-heavy is still per ARC scale set. With both Tinyland and Jess owner overlays installed, simultaneous heavy jobs can request two heavy pods for the same shared label. Until GloriousFlywheel has a higher-level cross-overlay capacity controller or more sting-class capacity, this is honest queueing and should be described as global capacity debt rather than runner label debt.

Run the just arc-shared-label-capacity-audit recipe to make that boundary visible from live state. It groups Helm release values by workflow-facing tinyland-* labels and reports which owner overlay scale sets publish each label, their per-scale-set caps, current runner counts, resource envelopes, and placement.

| Scenario | Concurrent Runners | Estimated Resource Usage |
|---|---|---|
| Quiet (off-hours) | 3 GitLab + 0 ARC | ~3 vCPU / 6Gi |
| Normal (business hours) | 3 GitLab + 2 warm Nix | ~11 vCPU / 22Gi |
| Burst (multiple PRs) | 3 GitLab + 4-6 ARC | Exceeds cluster, queue builds |
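The Quiet and Normal estimates can be reproduced from per-runner envelopes. In the sketch below, the GitLab envelope of 1 vCPU / 2Gi is inferred from the Quiet scenario (3 runners at ~3 vCPU / 6Gi) and is an assumption. The Nix envelope is the tinyland-nix limit of 4 CPU / 8Gi.

```python
# Per-runner envelopes as (vCPU, GiB).
GITLAB = (1, 2)  # inferred from the Quiet scenario; an assumption
NIX = (4, 8)     # tinyland-nix CPU and memory limits


def usage(gitlab_runners: int, nix_runners: int):
    """Estimated (vCPU, GiB) for a mix of GitLab and warm Nix runners."""
    cpu = gitlab_runners * GITLAB[0] + nix_runners * NIX[0]
    mem = gitlab_runners * GITLAB[1] + nix_runners * NIX[1]
    return cpu, mem


print(usage(3, 0))  # Quiet scenario
print(usage(3, 2))  # Normal scenario
```

Because these use limits rather than requests, they are upper-bound estimates, not scheduler reservations.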

Scaling strategy: Vertical (larger nodes) before horizontal (more nodes). ARC minRunners = 0 means burst capacity is only consumed when needed, except for explicitly scheduled warm-pool windows that pre-scale selected lanes.

Ownership

| Component | Owner | Change Process |
|---|---|---|
| ARC Controller config | Platform Engineer | PR to tofu/stacks/arc-runners/ |
| Runner scale set params | Platform Engineer | PR to tofu/stacks/arc-runners/ |
| Warm pool schedules | Operator | PR to tofu/stacks/arc-runners/ |
| HPA policies | Platform Engineer | PR to tofu/modules/gitlab-runner/ |
| Cluster nodes | Org Admin | On-prem provisioning (honey, bumble, sting) |
| Prometheus alerts | Platform Engineer | PR to tofu/modules/arc-controller/monitoring.tf |
