HPA Tuning
This page applies to the legacy GitLab runner/HPA path.
GitHub Actions runner scale sets are autoscaled by ARC, not by a Kubernetes HPA. ARC scales the number of runner pods up and down, but it does not automatically change the CPU or memory limits of an individual runner pod.
Each runner type has an independent HorizontalPodAutoscaler (HPA) that manages replica count based on CPU utilization.
Default Configuration
| Parameter | Value |
|---|---|
| Minimum replicas | 1 |
| Maximum replicas | 5 |
| Scale-up stabilization window | 15 seconds |
| Scale-down stabilization window | 5 minutes |
| Target metric | CPU utilization at 70% |
The asymmetric stabilization windows are intentional. The short scale-up window allows the cluster to respond quickly to load spikes. The longer scale-down window prevents thrashing when load fluctuates around the threshold.
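For reference, here is a minimal sketch of how the defaults in the table map onto a standard Kubernetes autoscaling/v2 HorizontalPodAutoscaler manifest. The runner-docker name and the Deployment target are illustrative assumptions; the actual objects are generated from organization.yaml, so check the live resource rather than copying this verbatim.

```yaml
# Minimal sketch of the default HPA shape (autoscaling/v2 API).
# The name and scaleTargetRef are illustrative assumptions; the real
# objects are generated from organization.yaml.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: runner-docker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: runner-docker
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 15    # react quickly to load spikes
    scaleDown:
      stabilizationWindowSeconds: 300   # 5 minutes, avoids thrashing
```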
Overriding Defaults
HPA parameters are configured per runner type in organization.yaml. To change the replica range or stabilization windows for a specific runner, edit the corresponding entry and run tofu apply from the overlay.
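As an illustration only, an override might look like the sketch below. The key names (runners, hpa, minReplicas, and so on) are assumptions about the organization.yaml schema, not its confirmed shape; compare against an existing entry before editing.

```yaml
# Hypothetical organization.yaml fragment: key names are assumptions,
# not the confirmed schema. Mirror an existing runner entry instead.
runners:
  dind:
    hpa:
      minReplicas: 1
      maxReplicas: 8               # raise the ceiling for bursty container builds
      scaleUpWindowSeconds: 15
      scaleDownWindowSeconds: 300
      targetCPUUtilization: 70
```

After running tofu apply, the new range should be visible in kubectl get hpa (see Monitoring below).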
Monitoring
Check HPA status for all runners in the namespace:
kubectl get hpa -n {org}-runners
This shows current and desired replica counts, current CPU utilization, and whether the autoscaler is actively scaling.
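Illustrative output, with names and numbers that will differ in your cluster:

```
NAME            REFERENCE                  TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
runner-docker   Deployment/runner-docker   cpu: 45%/70%   1         5         2          30d
runner-dind     Deployment/runner-dind     cpu: 88%/70%   1         5         5          30d
```

In this example, runner-dind is above the 70% target while already at its maximum of 5 replicas, which is the pattern that suggests raising the maximum (see Scaling Considerations below).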
For more detail on a specific runner:
kubectl describe hpa runner-docker -n {org}-runners
Scaling Considerations
- docker: Scales frequently under mixed workloads. The default range (1–5) is usually sufficient.
- dind: Container builds are bursty. If builds queue during peak hours, consider increasing the maximum.
- rocky8/rocky9: Typically low utilization. Minimum of 1 keeps a warm pod available.
- nix: CPU-intensive builds can saturate a pod quickly. Monitor utilization and adjust the maximum if builds are queuing.
Related
- Runbook — procedures for scaling up and emergency stop
- Troubleshooting — diagnosing pod crashes from resource pressure