Honey Runner Workdir Contract
Canonical lifecycle contract for persistent runner `_work/*` state on the
`honey` GitHub Actions runner hosts.
Use this when a job dies inside `actions/checkout` before repository code runs,
especially for `EACCES` unlink failures, stale read-only files, or ownership
drift under `_work/<repo>`.
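On Linux, `unlink(2)` fails with `EACCES` when the process lacks write or search permission on the directory that contains the entry. A read-only or foreign-owned directory anywhere inside `_work/<repo>` can therefore break checkout's cleanup even when the files themselves look harmless. The sketch below is a hypothetical, read-only illustration of that check. It is not the behavior of `scripts/honey-runner-workdir-audit.sh`, and `scan_workdir` and `expected_uid` (the runner account's uid) are assumptions.

```python
import os
import stat
from pathlib import Path


def scan_workdir(root: Path, expected_uid: int) -> dict[str, list[Path]]:
    """Flag entries under one repo workdir that could make checkout cleanup fail.

    Hypothetical helper: it only reads metadata and never modifies the tree.
    """
    findings: dict[str, list[Path]] = {"foreign_owner": [], "readonly_dir": []}
    needed = stat.S_IWUSR | stat.S_IXUSR  # owner write + search on a directory
    # os.walk does not follow symlinks by default; lstat keeps it that way.
    for dirpath, _dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        st = current.lstat()
        if st.st_uid != expected_uid:
            findings["foreign_owner"].append(current)
        # Check mode bits rather than os.access so the result does not
        # change when this runs as root.
        if (st.st_mode & needed) != needed:
            findings["readonly_dir"].append(current)
        for name in filenames:
            entry = current / name
            if entry.lstat().st_uid != expected_uid:
                findings["foreign_owner"].append(entry)
    return findings
```

A directory listed under `readonly_dir` is a candidate cause of `EACCES` for every entry inside it; `foreign_owner` entries point at ownership drift.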
Scope
This contract applies to:
- the persistent `_work/*` trees on `honey-am-*` runner hosts
- failures that happen before downstream repository code starts
- operator recovery and escalation for contaminated repo workdirs
This contract does not redefine:
- ARC pod memory or placement behavior
- downstream repository cleanup after checkout succeeds
- GitHub-hosted workflow paths
Contract
- `honey` runner hosts may be long-lived, but `_work/<repo>` trees are disposable scratch, not durable repo state
- checkout failures before repo code runs are platform-owned incidents, not downstream patch requests
- the default recovery unit is one repo workdir on one host, not the whole `_work/*` root
- bounded remediation happens only after the affected runner has been stopped or drained
- `unlock` is an inspection and recovery aid, not a steady-state fix
- if contamination is broader than one repo workdir, or ownership drift remains after bounded remediation, replace the affected runner service or host instead of widening salvage
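Taken together, these rules reduce to a small decision: who owns the failure, whether bounded remediation is allowed yet, and when to stop salvaging. The sketch below encodes that reading under stated assumptions. `WorkdirIncident`, its fields, and `next_action` are hypothetical names, not part of the repo's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class WorkdirIncident:
    """Hypothetical summary of one checkout failure on one honey host."""
    host: str
    failed_before_repo_code: bool           # job died before downstream repo code ran
    contaminated_repos: list[str] = field(default_factory=list)
    runner_drained: bool = False            # runner service stopped or drained
    drift_after_remediation: bool = False   # ownership drift survived bounded cleanup


def next_action(incident: WorkdirIncident) -> str:
    # Failures after checkout succeeds are outside this contract.
    if not incident.failed_before_repo_code:
        return "out-of-scope: downstream cleanup"
    # Wider than one repo workdir, or drift that survived bounded
    # remediation: replace instead of widening salvage.
    if len(incident.contaminated_repos) > 1 or incident.drift_after_remediation:
        return "replace runner service or host"
    if not incident.contaminated_repos:
        return "audit: no contaminated repo workdir identified yet"
    # Bounded remediation only after the runner is stopped or drained.
    if not incident.runner_drained:
        return "drain first"
    return f"remediate {incident.host} {incident.contaminated_repos[0]}"
```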
Default Recovery Flow
- If you have the failing GitHub Actions run URL or run id, start with `just honey-runner-checkout-triage <run-url|run-id>`; otherwise, audit the affected host set with `just honey-runner-workdir-audit`.
- Reconcile the affected host set with `just honey-runner-workdir-reconcile` when you need the direct host view or want to confirm that the run-driven extraction still matches the live host state.
- Stop or drain the affected runner service or host.
- Preview bounded recovery with `just honey-runner-workdir-remediate <host> <repo>`, or, after drain, re-run `just honey-runner-workdir-reconcile` if you want the repo-owned automation path to choose the safe single-repo candidates.
- Apply bounded recovery:
  - `--mode unlock --apply` if inspection or manual follow-up is still needed
  - `--apply` with the default `remove` mode when the repo workdir should be discarded
- Restart or replace the affected runner.
- Re-run the blocked downstream job.
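The ordering in this flow matters more than any single command: nothing is applied before the runner is drained and the change has been previewed. A minimal sketch of that ordering check follows. It assumes hypothetical step labels for the default flow only; the replacement path under escalation is deliberately left out.

```python
# Hypothetical labels for the default (non-escalation) recovery flow.
# Each step maps to the steps that must already have happened.
REQUIRES: dict[str, tuple[str, ...]] = {
    "triage-or-audit": (),
    "reconcile": ("triage-or-audit",),  # optional direct host view
    "drain": ("triage-or-audit",),
    "preview": ("drain",),              # remediate dry run, or reconcile after drain
    "apply": ("drain", "preview"),
    "restart": ("apply",),
    "rerun-job": ("restart",),
}


def order_violations(executed: list[str]) -> list[str]:
    """Return every step that ran before one of its prerequisites."""
    seen: set[str] = set()
    violations: list[str] = []
    for step in executed:
        if step not in REQUIRES:
            violations.append(f"unknown step: {step}")
            continue
        for prereq in REQUIRES[step]:
            if prereq not in seen:
                violations.append(f"{step} before {prereq}")
        seen.add(step)
    return violations
```

For example, `order_violations(["triage-or-audit", "preview", "apply"])` flags both `preview` and `apply` as having run before `drain`.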
Escalate To Replacement
Treat replacement as the default next step when any of the following are true:
- more than one repo workdir on the same host shows contamination
- the same repo contamination recurs after bounded cleanup
- ownership drift remains after `unlock`
- the operator would otherwise need broad `chown -R` or root-level cleanup over `_work/*`
Meaning:
- host identity may persist, but contaminated workspace state does not deserve preservation
- do not normalize repo-local or operator-manual salvage as the primary steady state
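The escalation triggers are independent, so any one of them is enough to move to replacement. The hedged sketch below expresses the checklist as a function; `replacement_reasons` and its parameter names are hypothetical stand-ins for operator observations.

```python
def replacement_reasons(
    contaminated_repo_count: int,
    recurred_after_cleanup: bool,
    drift_after_unlock: bool,
    needs_broad_cleanup: bool,
) -> list[str]:
    """List the escalation triggers that apply; empty means none apply."""
    reasons: list[str] = []
    if contaminated_repo_count > 1:
        reasons.append("more than one repo workdir contaminated on this host")
    if recurred_after_cleanup:
        reasons.append("same repo contamination recurred after bounded cleanup")
    if drift_after_unlock:
        reasons.append("ownership drift remains after unlock")
    if needs_broad_cleanup:
        reasons.append("would need broad chown -R or root-level cleanup over _work/*")
    return reasons
```

A non-empty result means replacement, not another round of salvage.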
Allowed Operator Surfaces
- `scripts/honey-runner-workdir-audit.sh`
- `scripts/honey-runner-workdir-remediate.sh`
- `scripts/honey-runner-workdir-reconcile.sh`
- `scripts/honey-runner-checkout-triage.py`
- `just honey-runner-workdir-audit`
- `just honey-runner-workdir-remediate <host> <repo> [--mode unlock|remove] [--apply]`
- `just honey-runner-workdir-reconcile [--apply --confirm-drained]`
- `just honey-runner-checkout-triage <run-url|run-id> [--repo <owner/name>]`
- the runner Runbook and Troubleshooting guides
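When these surfaces are wrapped in other tooling, building argument lists from the documented signatures keeps the wrapper from inventing flags. This is a sketch under assumptions: the helper names are hypothetical, omitting `--mode` is assumed to select the default `remove` mode, and `--apply --confirm-drained` is assumed to be passed as a pair, as the bracketed form suggests. The helpers only build lists; they do not run `just`.

```python
from typing import List, Optional


def remediate_argv(host: str, repo: str, mode: str = "remove", apply: bool = False) -> List[str]:
    if mode not in ("unlock", "remove"):
        raise ValueError(f"unsupported mode: {mode}")
    argv = ["just", "honey-runner-workdir-remediate", host, repo]
    if mode != "remove":  # assumption: remove is the default, so it is not spelled out
        argv += ["--mode", mode]
    if apply:
        argv.append("--apply")
    return argv


def reconcile_argv(apply: bool = False, confirm_drained: bool = False) -> List[str]:
    argv = ["just", "honey-runner-workdir-reconcile"]
    if apply:
        # Assumption: applying is only allowed together with the drain confirmation.
        if not confirm_drained:
            raise ValueError("--apply requires --confirm-drained")
        argv += ["--apply", "--confirm-drained"]
    return argv


def triage_argv(run_ref: str, repo: Optional[str] = None) -> List[str]:
    argv = ["just", "honey-runner-checkout-triage", run_ref]
    if repo is not None:
        argv += ["--repo", repo]  # <owner/name>
    return argv
```

For example, `remediate_argv("honey-am-1", "example-repo", mode="unlock", apply=True)` yields the `--mode unlock --apply` form from the recovery flow; `honey-am-1` and `example-repo` are placeholder names.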
Non-Goals
- do not treat this as a downstream repository contract fix
- do not widen one-repo remediation into arbitrary cleanup of the full `_work/*` root without escalation
- do not depend on post-checkout cleanup inside the downstream repository for this failure class
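One way to keep remediation from widening is to resolve the target path once and refuse anything that is not exactly one child of the `_work` root. The helper below is a hypothetical sketch. The runner's actual `_work` location can differ between hosts, so it is passed in rather than assumed.

```python
from pathlib import PurePosixPath


def single_repo_workdir(work_root: str, repo: str) -> PurePosixPath:
    """Return <work_root>/<repo>, refusing targets wider than one repo workdir."""
    root = PurePosixPath(work_root)
    if root.name != "_work":
        raise ValueError(f"not a runner _work root: {work_root}")
    # Reject empty names, path separators, self/parent references and glob
    # characters, any of which could turn one-repo cleanup into _work/* cleanup.
    if not repo or "/" in repo or repo in {".", ".."} or any(ch in repo for ch in "*?["):
        raise ValueError(f"not a single repo workdir name: {repo!r}")
    return root / repo
```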
Related Docs
- Current State — public operating contract and active gaps
- Runbook — operator recovery steps
- Troubleshooting — incident entry point
- GloriousFlywheel Honey Runner Workspace Hygiene 2026-04-16 — incident evidence and original problem framing