Common issues with the runner infrastructure and how to resolve them.
Symptom: Runner pod starts but does not appear in the GitLab group runner list.
Causes and fixes:
- tofu apply to recreate it.

Symptom: Runner pods restart repeatedly. kubectl describe pod shows
OOMKilled as the termination reason.
Fix: Increase the memory limit for the affected runner type in
organization.yaml and run tofu apply. See HPA Tuning
for resource limit configuration.
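
Before raising limits, it can help to confirm which container was killed and why. The sketch below wraps kubectl get pod with a JSONPath template that prints each container's last termination reason. The function name and the pod and namespace arguments are placeholders for illustration, not names from this repository.

```shell
# Sketch: print "<container><TAB><last termination reason>" for each container in a pod.
# last_termination_reasons is a hypothetical helper; pass your own pod name and namespace.
last_termination_reasons() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

Call it as last_termination_reasons <pod-name> <namespace>. A container that exceeded its memory limit should report OOMKilled; containers that have not been terminated print an empty reason.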
Common memory-hungry workloads:
- dind: Container builds with large build contexts.
- nix: Derivations that compile large packages from source.

Symptom: Nix builds download or compile everything from scratch despite previous builds having populated the cache.
Causes and fixes:
- {org}-runners namespace.
- attic-cache-dev namespace. Test connectivity from a runner pod
  with curl $ATTIC_SERVER.
- ATTIC_CACHE matches the cache name
  used in attic push commands.

The GitLab Runner TOML configuration has several pitfalls in Runner 17.x:
- cpu_limit, memory_limit, cpu_request, and memory_request must be specified as
  flat keys in the [[runners.kubernetes]] section. Do not nest them inside
  a TOML table.
- pod_spec with a containers field causes a type mismatch error in Runner 17.x.
  Instead, use environment = [...] on the [[runners]] section to inject
  environment variables.

Symptom: Pods stay in Pending state and are not scheduled.
Causes and fixes:
- kubectl describe nodes. The cluster may need more nodes or the runner
  resource requests may be too high.
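
The node check above can be paired with a look at the pending pods and their scheduling events. The helpers below are a minimal sketch that assumes kubectl access to the cluster. The function names are hypothetical, and the grep context length is a guess at how many lines the "Allocated resources" block spans, so it may need adjusting.

```shell
# Sketch: helpers for investigating Pending runner pods (hypothetical names).

# List pods in the given namespace that are still in the Pending phase.
pending_pods() {
  kubectl get pods -n "$1" --field-selector=status.phase=Pending
}

# Show scheduler events that explain why pods could not be placed.
scheduling_failures() {
  kubectl get events -n "$1" --field-selector=reason=FailedScheduling
}

# Show the "Allocated resources" block (requests and limits) for each node.
node_allocations() {
  kubectl describe nodes | grep -A 8 'Allocated resources'
}
```

Pass the runner namespace as the argument to the first two helpers. If the events mention insufficient CPU or memory, compare the runner resource requests against the per-node allocations.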