GloriousFlywheel

GloriousFlywheel Honey Runner Workspace Hygiene 2026-04-16

Snapshot date: 2026-04-16

Purpose

Capture a new class of platform failure exposed by downstream dogfooding: runner-host workspace state on honey can block jobs before repository code even runs.

This is not a downstream repo contract problem. It is a GloriousFlywheel runner hygiene and lifecycle problem.

Triggering Evidence

Downstream repo:

Jesssullivan/acuity-middleware
PR #37
latest failed run inspected: 24525417273

Failure shape:

both failing jobs died inside actions/checkout@v6
no repo build or test logic executed
both jobs failed on honey runner hosts with the same EACCES unlink error

Observed paths from the failed logs:

/home/jess/am-runners/honey-am-1/_work/acuity-middleware/acuity-middleware/pkg/LICENSE
/home/jess/am-runners/honey-am-2/_work/acuity-middleware/acuity-middleware/pkg/LICENSE

Observed error:

File was unable to be removed Error: EACCES: permission denied, unlink .../pkg/LICENSE

Read

Current platform read:

GloriousFlywheel runner pickup is working
repo contract is not the blocker here
stale read-only state (files, or the directories that contain them) in persistent runner work directories can prevent actions/checkout from cleaning up a prior job's workspace
a repo-side cleanup step after checkout cannot fix this class of failure, because checkout itself is where the job dies
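One detail matters for auditing this. On Linux, unlink(2) returns EACCES when the caller lacks write or search permission on the directory that contains the entry. The entry's own mode bits do not decide it. A read-only pkg/LICENSE is therefore the visible symptom, and a non-writable parent such as pkg/ is what actually stops actions/checkout from deleting it. Below is a minimal Python sketch that checks for both. It assumes the runner user owns the tree, and find_unlink_blockers is a hypothetical name, not one of the repo's scripts:

```python
import os
import stat


def _dir_blocks_unlink(mode: int) -> bool:
    # Deleting an entry needs owner write + search (execute) on its parent dir.
    return not (mode & stat.S_IWUSR and mode & stat.S_IXUSR)


def find_unlink_blockers(root: str) -> dict[str, list[str]]:
    """Report what would stop the (non-root) owner from deleting `root`.

    "locked_dirs": directories missing owner write or search bits; unlinking
        anything inside them fails with EACCES.
    "readonly_files": regular files with no owner write bit; deletable on their
        own, but the same symptom the checkout log reported.
    Mode bits are read directly instead of calling os.access, so the answer
    describes the owner even when the audit itself runs as root.
    """
    report: dict[str, list[str]] = {"locked_dirs": [], "readonly_files": []}
    if _dir_blocks_unlink(os.lstat(root).st_mode):
        report["locked_dirs"].append(".")
    # For a non-root caller, os.walk silently skips directories it cannot list;
    # such directories are still reported from their parent's listing.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            mode = os.lstat(full).st_mode  # lstat: do not follow symlinks
            rel = os.path.relpath(full, root)
            if stat.S_ISDIR(mode) and _dir_blocks_unlink(mode):
                report["locked_dirs"].append(rel)
            elif stat.S_ISREG(mode) and not mode & stat.S_IWUSR:
                report["readonly_files"].append(rel)
    return report
```

If you point it at one affected workdir, such as /home/jess/am-runners/honey-am-2/_work/acuity-middleware, it shows whether pkg/ itself is locked as well as pkg/LICENSE.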

Meaning:

the immediate fix is runner-host cleanup or runner replacement
the durable fix belongs in GloriousFlywheel runner lifecycle hygiene

What This Blocks

Current direct downstream consequence:

acuity-middleware#37 is blocked even though the latest repo patch already attempted post-checkout cleanup

Broader implication:

any self-hosted runner on honey with stale read-only files or directories under _work/ can fail the same way
this can surface as seemingly random downstream CI instability even when the runner label contract and cache contract are correct

Immediate Operator Action

For the affected runners:

stop, drain, or replace honey-am-1 and honey-am-2
remove or chown the stale workdirs under _work/acuity-middleware
restart the affected runners
rerun acuity-middleware#37
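The cleanup step can stay bounded to one repo workdir on one host. The sketch below is hypothetical Python. It borrows the unlock/remove modes and the explicit apply switch from the just honey-runner-workdir-remediate usage, but it illustrates the idea and is not that script's implementation. It restores owner permission bits instead of calling chown, because changing ownership generally needs root. It assumes the runner is stopped or drained while it runs.

```python
import os
import shutil
import stat
from pathlib import Path


def remediate_workdir(work_root: str, repo: str, mode: str = "unlock",
                      apply: bool = False) -> list[str]:
    """Hypothetical bounded remediation for one _work/<repo> tree.

    mode="unlock": add owner write (plus read/search for directories) so a
                   later checkout can delete stale entries.
    mode="remove": unlock, then delete the whole repo workdir.
    Without apply=True nothing changes; the planned actions are returned.
    """
    if mode not in ("unlock", "remove"):
        raise ValueError(f"unknown mode: {mode}")
    root = Path(work_root).resolve()
    target = (root / repo).resolve()
    # Bound the blast radius: only a direct child of the _work root is allowed.
    if target.parent != root or not target.is_dir():
        raise ValueError(f"{target} is not a repo workdir under {root}")

    actions: list[str] = []

    def unlock(path: str) -> None:
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            return  # chmod would follow the link; leave symlinks alone
        wanted = st.st_mode | stat.S_IWUSR
        if stat.S_ISDIR(st.st_mode):
            wanted |= stat.S_IRUSR | stat.S_IXUSR
        if wanted != st.st_mode:
            actions.append(f"chmod {stat.S_IMODE(wanted):o} {path}")
            if apply:
                os.chmod(path, stat.S_IMODE(wanted))

    unlock(str(target))
    # Top-down walk: each child directory is unlocked (when apply=True) before
    # os.walk descends into it. In dry-run mode, directories the caller cannot
    # list are planned but not descended into.
    for dirpath, dirnames, filenames in os.walk(target):
        for name in dirnames + filenames:
            unlock(os.path.join(dirpath, name))

    if mode == "remove":
        actions.append(f"rm -rf {target}")
        if apply:
            shutil.rmtree(target)
    return actions
```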

GloriousFlywheel Follow-On Work

The platform needs one explicit hygiene lane for runner-host workdirs:

decide whether honey runners are cattle or pets for workspace state
define a cleanup contract for _work/* between jobs or on failure
decide whether runner replacement is cheaper than in-place salvage
add an operator-facing audit and remediation path for stale workspace state
make sure this does not depend on downstream repo patches to recover
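Any cleanup contract for _work/* first has to separate per-repo checkout trees from the runner's own state. The small sketch below assumes the GitHub Actions runner convention that its internal directories under _work start with an underscore (such as _actions, _temp and _tool). split_work_root is a hypothetical helper.

```python
from pathlib import Path


def split_work_root(work_root: str) -> tuple[list[str], list[str]]:
    """Split a runner _work root into disposable repo trees and runner state.

    Assumption: names starting with "_" belong to the runner itself and must
    survive cleanup; every other real directory is a per-repo checkout tree.
    Plain files and symlinks are ignored.
    """
    repo_trees: list[str] = []
    runner_state: list[str] = []
    for entry in sorted(Path(work_root).iterdir()):
        if entry.is_symlink() or not entry.is_dir():
            continue
        if entry.name.startswith("_"):
            runner_state.append(entry.name)
        else:
            repo_trees.append(entry.name)
    return repo_trees, runner_state
```

A between-job sweep would then act only on repo_trees. One possible hook point is the self-hosted runner's job hooks (ACTIONS_RUNNER_HOOK_JOB_STARTED and ACTIONS_RUNNER_HOOK_JOB_COMPLETED). Whether GloriousFlywheel wires cleanup there is still an open design choice.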

Current repo-owned operator surfaces added after this note:

scripts/honey-runner-workdir-audit.sh
just honey-runner-workdir-audit
scripts/honey-runner-workdir-remediate.sh
just honey-runner-workdir-remediate <host> <repo> [--mode unlock|remove] [--apply]
docs/runners/runbook.md
docs/runners/troubleshooting.md
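As an illustration, an audit pass over one runner's _work root could collect the facts that matter here: per-repo size, any stale .git/index.lock, entries not owned by the runner user, locked directories, and read-only files. This is a Python sketch under stated assumptions, not the contents of scripts/honey-runner-workdir-audit.sh. The caller passes in the expected owner as a uid, and the sketch treats underscore-prefixed directories as runner-internal. It counts read-only files under .git/objects separately, because git writes object files read-only by design; they only matter for cleanup when a containing directory is also locked. Sizes are apparent byte totals, so they will not match du exactly.

```python
import os
import stat
from pathlib import Path

GIT_OBJECTS = f"{os.sep}.git{os.sep}objects{os.sep}"


def audit_work_root(work_root: str, expected_uid: int) -> dict[str, dict]:
    """Summarise every repo workdir under one runner's _work root."""
    summary: dict[str, dict] = {}
    for entry in sorted(Path(work_root).iterdir()):
        # Skip files, symlinks and runner-internal dirs such as _actions or _temp.
        if entry.is_symlink() or not entry.is_dir() or entry.name.startswith("_"):
            continue
        info = {"bytes": 0, "index_locks": [], "foreign_owned": 0,
                "locked_dirs": [], "readonly_other": [], "readonly_git_objects": 0}
        for dirpath, dirnames, filenames in os.walk(entry):
            for name in dirnames + filenames:
                full = os.path.join(dirpath, name)
                st = os.lstat(full)
                rel = os.path.relpath(full, entry)
                if st.st_uid != expected_uid:
                    info["foreign_owned"] += 1
                if stat.S_ISDIR(st.st_mode):
                    # Without owner write + search bits, children cannot be unlinked.
                    if not (st.st_mode & stat.S_IWUSR and st.st_mode & stat.S_IXUSR):
                        info["locked_dirs"].append(rel)
                elif stat.S_ISREG(st.st_mode):
                    info["bytes"] += st.st_size  # apparent size, not du's block count
                    if name == "index.lock" and os.path.basename(dirpath) == ".git":
                        info["index_locks"].append(rel)
                    if not st.st_mode & stat.S_IWUSR:
                        if GIT_OBJECTS in os.sep + rel:
                            info["readonly_git_objects"] += 1  # git writes these read-only
                        else:
                            info["readonly_other"].append(rel)
        summary[entry.name] = info
    return summary
```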

Contract Update 2026-04-19

The repo now carries one explicit honey runner workdir lifecycle contract:

honey hosts may be long-lived, but _work/<repo> trees are disposable scratch
checkout failures before downstream code runs are platform-owned incidents
bounded remediation targets one repo workdir on one host at a time
default recovery is drain, remediate, restart or replace, then rerun
if contamination is broader than one repo tree, or ownership drift remains after bounded remediation, the runner service or host should be replaced instead of widening salvage
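The escalation rule in the last two points is small enough to express as code. The sketch below uses a hypothetical WorkdirFindings record. Its field names are assumptions, and the returned strings only paraphrase the contract:

```python
from dataclasses import dataclass


@dataclass
class WorkdirFindings:
    """Hypothetical audit summary for one runner host."""
    contaminated_repo_trees: int      # _work/<repo> trees with cleanup blockers
    ownership_drift: bool             # any path not owned by the runner user
    remediation_attempted: bool = False


def choose_recovery(f: WorkdirFindings) -> str:
    """Map audit findings to the contract's recovery path."""
    if f.contaminated_repo_trees == 0 and not f.ownership_drift:
        return "no action"
    # Escalate instead of widening salvage.
    if f.contaminated_repo_trees > 1:
        return "replace runner or host"
    if f.remediation_attempted and f.ownership_drift:
        return "replace runner or host"
    # Default path: one repo workdir on one host.
    return "drain, remediate one repo workdir, restart or replace, rerun"
```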

Canonical contract surface:

docs/architecture/honey-runner-workdir-contract.md

Live Audit Update 2026-04-17

Live host audit on jess@100.113.89.12 narrowed the problem further:

the affected runner work roots are:
- /home/jess/am-runners/honey-am-1/_work
- /home/jess/am-runners/honey-am-2/_work
both currently contain only one repo workdir: acuity-middleware
both workdirs are large, about 5.9G
no stale .git/index.lock was present
no ownership mismatch was observed; expected owner remained jess:jess
both trees contain many non-writable .git/objects/* files
the original pkg/LICENSE path still exists on both hosts
honey-am-2 still has the exact non-writable symptom from the GitHub log:
- /home/jess/am-runners/honey-am-2/_work/acuity-middleware/acuity-middleware/pkg/LICENSE
- mode 0555 (-r-xr-xr-x)
honey-am-1 has the same path but it is currently writable by owner:
- mode 0755 (-rwxr-xr-x)

Meaning:

this is not generic ownership drift
it is stale persisted checkout state inside the acuity-middleware workdirs
honey-am-2 still has the concrete checkout blocker
both hosts also have broader read-only git-object contamination
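The distinction drawn in these points can be made mechanical. Below is a hypothetical classifier over per-repo audit counts; the key names are assumptions. It checks ownership drift first, then locked directories or read-only files outside .git/objects, which is the checkout-blocker case. Read-only git objects on their own get a separate label, because git creates object files read-only by design.

```python
def classify_workdir(info: dict) -> str:
    """Label one repo workdir from audit counts (key names are hypothetical)."""
    if info.get("foreign_owned", 0) > 0:
        return "ownership drift"
    if info.get("locked_dirs") or info.get("readonly_other"):
        return "stale checkout state"
    if info.get("readonly_git_objects", 0) > 0:
        return "read-only git objects only"
    return "clean"
```

With honey-am-2's read-only pkg/LICENSE recorded under readonly_other, the label is "stale checkout state".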

Non-Goal

Do not treat this as a downstream repo fix request.

The failure happens before downstream code runs, so the platform should own the recovery and prevention path.