GloriousFlywheel Honey Runner Workspace Hygiene 2026-04-16
Snapshot date: 2026-04-16
Purpose
Capture a new class of platform failure exposed by downstream dogfooding:
runner-host workspace state on honey can block jobs before repository code
even runs.
This is not a downstream repo contract problem. It is a GloriousFlywheel runner hygiene and lifecycle problem.
Triggering Evidence
Downstream repo:
- Jesssullivan/acuity-middleware
- PR #37
- latest failed run inspected: 24525417273
Failure shape:
- both failing jobs died inside actions/checkout@v6
- no repo build or test logic executed
- both jobs failed on honey runner hosts with the same EACCES unlink error
Observed paths from the failed logs:
- /home/jess/am-runners/honey-am-1/_work/acuity-middleware/acuity-middleware/pkg/LICENSE
- /home/jess/am-runners/honey-am-2/_work/acuity-middleware/acuity-middleware/pkg/LICENSE
Observed error:
File was unable to be removed Error: EACCES: permission denied, unlink .../pkg/LICENSE
Read
Current platform read:
- GloriousFlywheel runner pickup is working
- repo contract is not the blocker here
- stale read-only files in persistent runner work directories can prevent actions/checkout from cleaning a prior workspace
- a repo-side cleanup step after checkout cannot fix this class of failure, because checkout itself is where the job dies
Meaning:
- the immediate fix is runner-host cleanup or runner replacement
- the durable fix belongs in GloriousFlywheel runner lifecycle hygiene
What This Blocks
Current direct downstream consequence:
- acuity-middleware#37 is blocked even though the latest repo patch already attempted post-checkout cleanup
Broader implication:
- any honey-hosted self-hosted runner with a stale read-only file in _work/ can fail identically
- this can surface as random downstream CI instability even when the runner label contract and cache contract are correct
Immediate Operator Action
For the affected runners:
- stop, drain, or replace honey-am-1 and honey-am-2
- remove or chown the stale workdirs under _work/acuity-middleware
- restart the affected runners
- rerun acuity-middleware#37
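The remove-or-salvage step above can be sketched as a small helper that is dry-run by default and refuses to touch anything except a single repo tree directly under a _work directory. This is an illustrative sketch only, not the contents of any script in the repo: the function name, the unlock and remove mode names, and the --apply flag are assumptions made for the example, and it presumes the runner service has already been stopped or drained. It restores owner write permission on directories as well as files, because on POSIX systems removing a file requires write permission on its parent directory.

```shell
# Hypothetical helper: remediate_workdir <repo-workdir> <unlock|remove> [--apply]
# Assumes the runner that owns this workdir is already stopped or drained.
remediate_workdir() {
  local root="$1" mode="$2" apply="${3:-}"

  # Bounded scope: only accept a directory that sits directly under a _work root.
  case "$(dirname "$root")" in
    */_work) ;;
    *) echo "refusing: $root is not directly under a _work directory" >&2; return 2 ;;
  esac
  [ -d "$root" ] || { echo "refusing: $root is not a directory" >&2; return 2; }

  case "$mode" in
    unlock|remove) ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac

  # Without --apply, only report what would happen.
  if [ "$apply" != "--apply" ]; then
    echo "dry-run: would $mode $root"
    return 0
  fi

  # Give the owner read/write (and directory search) permission on the whole tree,
  # so that neither read-only files nor read-only directories block cleanup.
  chmod -R u+rwX "$root" || return 1

  if [ "$mode" = "remove" ]; then
    rm -rf -- "$root" || return 1
  fi
  echo "applied: $mode $root"
}
```

A dry run such as `remediate_workdir /home/jess/am-runners/honey-am-2/_work/acuity-middleware remove` only prints the intended action; appending `--apply` performs it.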
GloriousFlywheel Follow-On Work
The platform needs one explicit hygiene lane for runner-host workdirs:
- decide whether honey runners are cattle or pets for workspace state
- define a cleanup contract for _work/* between jobs or on failure
- decide whether runner replacement is cheaper than in-place salvage
- add an operator-facing audit and remediation path for stale workspace state
- make sure this does not depend on downstream repo patches to recover
Current repo-owned operator surfaces added after this note:
- scripts/honey-runner-workdir-audit.sh
- just honey-runner-workdir-audit
- scripts/honey-runner-workdir-remediate.sh
- just honey-runner-workdir-remediate <host> <repo> [--mode unlock|remove] [--apply]
- docs/runners/runbook.md
- docs/runners/troubleshooting.md
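One possible shape for a between-jobs cleanup contract that does not depend on downstream repo patches is a runner-level pre-job hook. The sketch below is an assumption, not something this note says is deployed: it relies on the GitHub Actions self-hosted runner feature that runs a script named by ACTIONS_RUNNER_HOOK_JOB_STARTED before each job, and it assumes the hook can see GITHUB_WORKSPACE. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-job hook sketch: make the previous job's workspace writable by
# its owner so that actions/checkout can clean it.

# unlock_workspace [dir]
# Falls back to GITHUB_WORKSPACE (assumed to be visible to runner hooks).
unlock_workspace() {
  local ws="${1:-${GITHUB_WORKSPACE:-}}"
  # Nothing to do if no workspace is known or it has not been created yet.
  if [ -z "$ws" ] || [ ! -d "$ws" ]; then
    return 0
  fi
  # Restore owner read/write on files and read/write/search on directories.
  chmod -R u+rwX "$ws"
}

unlock_workspace "$@"
```

Because the hook runs before checkout, it addresses the failure at the point where a repo-side step cannot. The trade-off is a recursive chmod over the workspace on every job, which may be noticeable on multi-gigabyte trees.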
Contract Update 2026-04-19
The repo now carries one explicit honey runner workdir lifecycle contract:
- honey hosts may be long-lived, but _work/<repo> trees are disposable scratch
- checkout failures before downstream code runs are platform-owned incidents
- bounded remediation targets one repo workdir on one host at a time
- default recovery is drain, remediate, restart or replace, then rerun
- if contamination is broader than one repo tree, or ownership drift remains after bounded remediation, the runner service or host should be replaced instead of widening salvage
Canonical contract surface:
docs/architecture/honey-runner-workdir-contract.md
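The bounded-versus-replace rule in the contract above can be illustrated with a small decision helper. This is a simplified sketch, not the contract's canonical implementation: the function name and output words are hypothetical, it treats any ownership drift as grounds for replacement (the contract only says drift that remains after bounded remediation), and it assumes underscore-prefixed directories under _work (such as the runner's own _temp or _actions) are not repo trees.

```shell
# Hypothetical helper: recovery_plan <_work-root> [expected-owner]
# Prints one of: "clean", "remediate <repo>", or "replace".
recovery_plan() {
  local work="$1" owner="${2:-$(id -u)}"
  local dirty=0 last="" tree

  for tree in "$work"/*/; do
    [ -d "$tree" ] || continue
    tree="${tree%/}"
    # Assumption: underscore-prefixed directories are runner internals, not repos.
    case "$(basename "$tree")" in _*) continue ;; esac

    # Simplification: any entry with an unexpected owner means replace the runner.
    if [ -n "$(find "$tree" ! -user "$owner" | head -n 1)" ]; then
      echo "replace"
      return 0
    fi

    # A tree counts as contaminated if it holds anything its owner cannot write.
    if [ -n "$(find "$tree" ! -type l ! -perm -200 | head -n 1)" ]; then
      dirty=$((dirty + 1))
      last="$(basename "$tree")"
    fi
  done

  case "$dirty" in
    0) echo "clean" ;;
    1) echo "remediate $last" ;;
    *) echo "replace" ;;   # contamination is broader than one repo tree
  esac
}
```

Keeping the decision separate from the remediation itself makes it easy to run as a read-only check before anyone chooses between salvage and replacement.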
Live Audit Update 2026-04-17
Live host audit on jess@100.113.89.12 narrowed the problem further:
- the affected runner service roots are:
  - /home/jess/am-runners/honey-am-1/_work
  - /home/jess/am-runners/honey-am-2/_work
- both currently contain only one repo workdir: acuity-middleware
- both workdirs are large, about 5.9G
- no stale .git/index.lock was present
- no ownership mismatch was observed; expected owner remained jess:jess
- both trees contain many non-writable .git/objects/* files
- the original pkg/LICENSE path still exists on both hosts
- honey-am-2 still has the exact non-writable symptom from the GitHub log:
  - /home/jess/am-runners/honey-am-2/_work/acuity-middleware/acuity-middleware/pkg/LICENSE
  - mode 0555 (-r-xr-xr-x)
- honey-am-1 has the same path but it is currently writable by owner:
  - mode 0755 (-rwxr-xr-x)
Meaning:
- this is not generic ownership drift
- it is stale persisted checkout state inside the acuity-middleware workdirs
- honey-am-2 still has the concrete checkout blocker
- both hosts also have broader read-only git-object contamination
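The checks behind this audit (non-writable entries, ownership, stale index.lock) can be approximated with a read-only helper like the one below. It is a sketch under assumptions, not the contents of scripts/honey-runner-workdir-audit.sh: the function name and output format are invented for the example. Non-writable directories are counted separately from files because, on POSIX systems, unlinking a file requires write permission on its parent directory.

```shell
# Hypothetical helper: audit_workdir <repo-workdir> [expected-owner]
# Prints one summary line; it never modifies the tree.
audit_workdir() {
  local root="$1" owner="${2:-$(id -u)}"
  local ro_files ro_dirs foreign locks

  # Regular files whose owner-write bit is missing (for example mode 0555 or 0444).
  ro_files=$(find "$root" -type f ! -perm -200 | wc -l | tr -d ' ')
  # Directories whose owner-write bit is missing; these block removal of their entries.
  ro_dirs=$(find "$root" -type d ! -perm -200 | wc -l | tr -d ' ')
  # Entries not owned by the expected user (ownership drift).
  foreign=$(find "$root" ! -user "$owner" | wc -l | tr -d ' ')
  # Stale git lock files left by an interrupted checkout.
  locks=$(find "$root" -path '*/.git/*' -name index.lock | wc -l | tr -d ' ')

  printf 'ro_files=%s ro_dirs=%s foreign=%s index_locks=%s\n' \
    "$ro_files" "$ro_dirs" "$foreign" "$locks"
}
```

For a tree in the state described for honey-am-2, the expectation would be a non-zero ro_files count with foreign and index_locks both at zero, which matches the reading that this is persisted checkout state rather than ownership drift.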
Non-Goal
Do not treat this as a downstream repo fix request.
The failure happens before downstream code runs, so the platform should own the recovery and prevention path.