Resource Limits Reference

Default resource limits for CI job pods, by runner type.

Important boundary:

  • these are pod-level cgroup requests and limits
  • they are not the same thing as total RAM or CPU available in the honey cluster
  • a job can still be OOM-killed inside an 8Gi runner pod even when the cluster as a whole has abundant free memory

Default Job Pod Limits

These are the module defaults. Overlay deployments override these values in their *.tfvars files.

Runner   CPU Request   CPU Limit   Memory Request   Memory Limit
docker   100m          2           256Mi            2Gi
dind     500m          4           1Gi              8Gi
rocky8   100m          2           256Mi            2Gi
rocky9   100m          2           256Mi            2Gi
nix      500m          4           1Gi              8Gi

For the current ARC stack, the committed baseline tinyland-nix lane still matches the 8Gi memory limit of the nix runner default.

The current repo-owned additive canary for heavier Nix work is tinyland-nix-heavy, defined with a 16Gi memory limit in tofu/stacks/arc-runners/dev-extra-runner-sets.tfvars.

Run just arc-runtime-audit to inspect the live ARC runner-set envelopes and confirm that the cluster actually matches the repo contract after rollout.

Typical Workload Profiles

Workload                 CPU (typical)   Memory (typical)   Recommended Runner
Python lint (ruff)       50-200m         128-256Mi          docker
Python tests (pytest)    100-500m        256-512Mi          docker
Nix flake check          200-500m        256-512Mi          nix
GHC build (warm cache)   500m-1          512Mi-1Gi          nix
GHC build (cold cache)   2-4             2-4Gi              nix
MUSL static build        1-2             1-2Gi              nix
FPM RPM packaging        100-500m        256-512Mi          rocky8/rocky9
Docker image build       500m-2          512Mi-2Gi          dind
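
As a rough illustration of how a workload profile maps onto a runner, the sketch below compares a job's expected peak usage against the default limits from the Default Job Pod Limits table. The function name exceeded_limits and the data layout are hypothetical and not part of the module; the numbers are the documented defaults and may differ in an overlay deployment.

```python
# Default limits copied from the "Default Job Pod Limits" table.
# Values are (CPU limit in cores, memory limit in MiB).
# Overlay *.tfvars files may override these, so treat them as assumptions.
DEFAULT_LIMITS = {
    "docker": (2.0, 2 * 1024),
    "dind": (4.0, 8 * 1024),
    "rocky8": (2.0, 2 * 1024),
    "rocky9": (2.0, 2 * 1024),
    "nix": (4.0, 8 * 1024),
}


def exceeded_limits(runner, peak_cpu_cores, peak_memory_mib):
    """Return which default limits a workload's peak usage would exceed."""
    cpu_limit, memory_limit_mib = DEFAULT_LIMITS[runner]
    exceeded = []
    if peak_cpu_cores > cpu_limit:
        exceeded.append("cpu")      # CPU above the limit is throttled
    if peak_memory_mib > memory_limit_mib:
        exceeded.append("memory")   # memory above the limit is OOM-killed
    return exceeded


# Upper end of the "GHC build (cold cache)" profile: 4 cores, 4Gi.
print(exceeded_limits("nix", 4.0, 4 * 1024))     # []
print(exceeded_limits("docker", 4.0, 4 * 1024))  # ['cpu', 'memory']
```

The check uses the upper end of each typical range, since a memory peak above the pod limit is what triggers an OOM kill, while CPU above the limit only slows the job down.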

Namespace Quota

The runner namespace has a total resource quota shared across all concurrent job pods:

Resource          Default   Description
CPU requests      16        Total CPU requests across all pods
Memory requests   32Gi      Total memory requests across all pods
Max pods          50        Maximum concurrent pods

This quota caps the combined requests and pod count of the namespace for scheduling and accounting purposes. It does not override the memory limit of an individual runner pod.
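
To make the quota arithmetic concrete, the sketch below estimates how many job pods of a single runner type fit under the default quota. It assumes the namespace runs only that runner type and that every pod uses the default requests from the Default Job Pod Limits table. The names QUOTA, DEFAULT_REQUESTS, and max_concurrent_pods are illustrative, not part of the module.

```python
# Default namespace quota from the table above.
QUOTA = {
    "cpu_requests_millicores": 16 * 1000,  # 16 CPUs
    "memory_requests_mib": 32 * 1024,      # 32Gi
    "max_pods": 50,
}

# Default per-pod requests: (CPU in millicores, memory in MiB).
DEFAULT_REQUESTS = {
    "docker": (100, 256),
    "dind": (500, 1024),
    "rocky8": (100, 256),
    "rocky9": (100, 256),
    "nix": (500, 1024),
}


def max_concurrent_pods(runner):
    """Largest pod count that stays within all three quota dimensions."""
    cpu_m, memory_mib = DEFAULT_REQUESTS[runner]
    return min(
        QUOTA["cpu_requests_millicores"] // cpu_m,
        QUOTA["memory_requests_mib"] // memory_mib,
        QUOTA["max_pods"],
    )


print(max_concurrent_pods("docker"))  # 50 (bounded by the pod count)
print(max_concurrent_pods("dind"))    # 32 (bounded by CPU and memory requests)
```

Under these assumptions the pod-count cap is the binding constraint for the lighter runners, while dind and nix pods run out of CPU and memory requests first. The quota counts requests, so raising a pod's memory limit alone does not change this result.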

Requesting Limit Increases

If your jobs are being OOM-killed or throttled:

  1. Check pod resource usage: kubectl top pods -n <runner-namespace>
  2. Review the job logs for OOM or throttling messages
  3. Update the overlay *.tfvars file with higher limits for the relevant runner type
  4. Run tofu plan to review the change, then tofu apply to roll it out
  5. The apply updates the runner Helm release with the new pod resource templates
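
For step 1, a small helper can flag pods whose memory usage is close to the limit. This is a minimal sketch that assumes the usual three-column text output of kubectl top pods (NAME, CPU(cores), MEMORY(bytes)). The pod names and figures in SAMPLE_OUTPUT are made up for illustration, and the 90% threshold is an arbitrary assumption.

```python
# Conversion factors from Kubernetes binary suffixes to MiB.
MEMORY_UNITS_MIB = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}


def parse_memory_mib(value):
    """Convert a quantity such as '1900Mi' or '2Gi' to MiB."""
    for suffix, factor in MEMORY_UNITS_MIB.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {value}")


def pods_near_memory_limit(top_output, limit_mib, threshold=0.9):
    """Return names of pods using at least threshold * limit_mib of memory."""
    flagged = []
    for line in top_output.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and anything unexpected.
        if len(parts) != 3 or parts[0] == "NAME":
            continue
        name, _cpu, memory = parts
        if parse_memory_mib(memory) >= threshold * limit_mib:
            flagged.append(name)
    return flagged


# Hypothetical output; real pod names and values will differ.
SAMPLE_OUTPUT = """\
NAME               CPU(cores)   MEMORY(bytes)
example-runner-a   250m         1900Mi
example-runner-b   1200m        512Mi
"""

# Compare against the default 2Gi limit of the docker runner.
print(pods_near_memory_limit(SAMPLE_OUTPUT, limit_mib=2 * 1024))
# ['example-runner-a']
```

kubectl top reports current usage only, so a pod that was already OOM-killed will not appear here. The job logs in step 2 remain the place to confirm a past kill.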
