Cleanup Program

This page records the completed GloriousFlywheel cleanup and convergence cycle and the current follow-on lane: pooled-substrate dogfood reset plus authority-truth cleanup across the docs and admin surfaces.

Use this page when the question is not “what is GloriousFlywheel?” but “what still needs to be reconciled, in what order, and under which boundaries?”

Snapshot date: 2026-05-01

Purpose

The repo is past both foundational truthing and the longevity sprint, which is now complete:

  • the major sprint execution slices have landed
  • honey is the real on-prem baseline
  • the follow-on hardening and expansion lanes are materially complete on the GitHub execution surface
  • the square-one platform-definition lane is complete

That foundational cleanup is no longer the main uncertainty.

The follow-on problem, since completed, was narrower and more operational:

  • the default-branch proof package is green again after both bounded recovery and the PR #404 / TIN-545 heavy-lane hardening fix
  • repo-managed dogfood still depends on self-hosted cache/env injection, cluster-local DNS reachability, and finite runner capacity
  • docs and admin surfaces still overstate enrollment, authority, or readiness in places where the live contract is more bounded
  • repo-shaped compatibility debt is still too easy to mistake for the product direction
  • owner-specific GitHub App and tfvars plumbing needs to be isolated in implementation overlays so it does not reappear as core runner taxonomy
  • the active Product Execution board has completed its stability, pooled substrate, cache authority, and auth authority milestones; future product work should be promoted intentionally as fresh productionization slices

This page now serves mainly as:

  • the structured record of the completed cleanup cycle
  • the handoff into the current pooled-substrate reset and authority-truth lane
  • the place to keep operator/docs/admin wording aligned with that narrower reality

Current Follow-On Lane

Status: closeout after the completed public-surface parity, pooled-substrate, cache authority, auth authority, TIN-545 hardening, public-alpha export, and PR #444 Docker placement slices

Focus:

  • public-docs/ is now the future public entrypoint
  • the public export/scrub/generation lane is no longer the active execution board
  • keep the completed proof recovery and heavy-lane hardening described honestly on the management and docs surfaces
  • keep the proof story explicit: PR #479 is the latest audited default-branch proof package before docs/status-only syncs. Do not fall back to older PR #404-, PR #470-, PR #477-, or PR #478-era recovery anchors as the current baseline.
  • keep current pooled-substrate wording matched to the live contract: self-hosted cache injection, cluster-local DNS reachability, bounded runner envelopes, and active owner-scope debt remain part of the baseline
  • preserve the current placement boundary: stateless Docker work and bounded ordinary Nix overflow may use sting, and kube-API mutation workflows must use the in-cluster Kubernetes service endpoint when they run inside ARC rather than relying on node-specific tailnet API reachability
  • stop letting nominal shared-label config, template consumption, repo-shaped compatibility sets, or historical exceptions read like broad solved enrollment or universal capacity
  • keep the completed ARC state rehome honest: massageithaca-browser, massageithaca-dind, and the personal-* / personal-package-* compatibility lanes are now Jess-overlay-owned quarantine, not core residue. #412 still tracks their eventual retirement. Closed TIN-681 / #438 records the separate Docker-capable MassageIthaca image-publication proof through shared capability labels rather than repo-shaped workflow labels.
  • keep the May-Aug RBE scaffold gated: current proof is shared cache acceleration, not Bazel remote execution
  • keep TIN-650 and TIN-758 closed as proof/policy-complete: the negative strict-contract path fails without BAZEL_REMOTE_CACHE, the positive operator path proves bounded read-only developer cache attachment through a Honey svc/bazel-cache localhost port-forward, and the supported developer-machine exposure policy is operator-provided endpoint only. GitHub #417 is closed after the downstream lab package-canary cache proof. This is not Bazel remote execution.
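Two of the boundaries above are environment-driven preconditions, and a minimal sketch can make them concrete. The block below assumes a hypothetical preflight helper (the names ContractError, require_remote_cache, and in_cluster_api_endpoint are illustrative and do not come from the repo). BAZEL_REMOTE_CACHE is the variable named on this page; KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT are the variables Kubernetes injects into pods for the in-cluster API service. The real workflows may check these differently.

```python
from typing import Mapping


class ContractError(RuntimeError):
    """Raised when a self-hosted lane precondition is not met (hypothetical type)."""


def require_remote_cache(env: Mapping[str, str]) -> str:
    """Strict-contract path: fail when no operator-provided cache endpoint is set."""
    endpoint = env.get("BAZEL_REMOTE_CACHE", "").strip()
    if not endpoint:
        raise ContractError(
            "BAZEL_REMOTE_CACHE is not set; an operator-provided endpoint is required"
        )
    return endpoint


def in_cluster_api_endpoint(env: Mapping[str, str]) -> str:
    """Resolve the in-cluster Kubernetes API endpoint from pod-injected variables."""
    host = env.get("KUBERNETES_SERVICE_HOST", "").strip()
    port = env.get("KUBERNETES_SERVICE_PORT", "").strip()
    if not host or not port:
        raise ContractError(
            "not running in-cluster: KUBERNETES_SERVICE_HOST/PORT are missing"
        )
    if ":" in host:
        # IPv6 literals need brackets inside a URL.
        host = f"[{host}]"
    return f"https://{host}:{port}"
```

A workflow step could call both helpers with os.environ before doing any kube-API mutation, so that a missing cache endpoint or a run outside ARC fails early instead of falling back to node-specific tailnet reachability.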

Done means:

  • top-level repo docs stop presenting recovery and authority truth as already settled broad rollout
  • internal and public docs describe the proof baseline as green again but bounded by self-hosted prerequisites and finite runner pressure
  • active planning/admin surfaces reflect stability recovery and authority truth rather than the already-landed public-surface MVP
  • the next productization work starts from a fresh scoped issue or project surface, not from stale cleanup umbrella momentum
  • GitHub #417 remains visible as historical cache exposure proof after the downstream package-canary evidence landed. TIN-613, TIN-620, TIN-643, and TIN-758 are complete. Keep the just arc-listener-queue-drift and just arc-network-continuity-audit commands as future incident diagnostics, not as active cleanup debt. TIN-568, TIN-627, TIN-650, and TIN-681 are complete and should stay recorded as proof, not active runner taxonomy.

Non-Negotiable Boundaries

These are already settled for the current cleanup cycle:

  • honey is the only physical cluster target
  • bumble is the durable-state node inside honey
  • sting is the stateless compute-expansion node inside honey
  • GitHub is the primary forge surface
  • GitLab is the current compatibility runner surface; Woodpecker/Codeberg remain future adapter work, not equal-maturity control planes
  • Nix bootstrap on self-hosted Nix runners is workflow-owned
  • ARC scales runner count horizontally; it does not grow a single runner pod’s memory envelope
  • heavy Rust and Nix workloads belong on tinyland-nix-heavy
  • Attic and Bazel are acceleration layers, not publication surfaces
  • FlakeHub is post-bootstrap publication and discovery work, not runtime or bootstrap
  • cache reachability is part of the proof contract on self-hosted lanes
  • hosted rollback remains a manual emergency escape hatch, not an implemented automatic failover path

Cleanup work should not reopen those decisions casually.
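As an illustration only, the placement boundaries can be read as a small set of rules. The sketch below assumes a hypothetical Workload record and helper names; tinyland-nix-heavy and sting come from this page, while the default runner class is a placeholder and not a real label.

```python
from dataclasses import dataclass

HEAVY_RUNNER_CLASS = "tinyland-nix-heavy"  # runner class named on this page
DEFAULT_RUNNER_CLASS = "example-default"   # hypothetical placeholder, not a real label


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a job for placement purposes."""
    name: str
    stateful: bool = False
    heavy_rust_or_nix: bool = False


def runner_class(workload: Workload) -> str:
    """Heavy Rust and Nix workloads belong on the heavy runner class."""
    if workload.heavy_rust_or_nix:
        return HEAVY_RUNNER_CLASS
    return DEFAULT_RUNNER_CLASS


def may_use_sting(workload: Workload) -> bool:
    """sting is the stateless compute-expansion node, so only stateless work qualifies."""
    return not workload.stateful
```

For example, runner_class(Workload("rust-build", heavy_rust_or_nix=True)) returns tinyland-nix-heavy, and may_use_sting(Workload("state-migration", stateful=True)) returns False. The actual enforcement in the repo may live in workflow labels or overlays rather than in code like this.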

Previously Completed Cleanup Cycle

Workstream 1: Backend Authority To Environment-Owned S3 State

Owner issues:

  • #276
  • TIN-278

Focus:

  • migrate backend authority away from transitional HTTP
  • lock the active stacks onto the now-proven environment-owned S3-compatible backend path on honey
  • update operator docs and env-var docs so the new authority contract is primary

Done means:

  • transitional HTTP is no longer the active backend authority surface
  • one S3-compatible state path is the canonical authority
  • primary docs and validation surfaces agree on that contract

Workstream 2: Broader Downstream Rollout

Status: completed via #277 / TIN-279

Focus:

  • move beyond tranche-1 canary proof into a wider rollout set
  • turn the landed adoption evidence into a repeatable migration path
  • keep pooled capability classes, owner-scope control-plane debt, and intentionally hosted exceptions honest as the rollout widens

Meaning:

  • the rollout lane no longer depends on vague follow-on language
  • further rollout execution should now happen through fresh issue slices rather than by pretending this completed workstream is still an active umbrella

Workstream 3: Dashboard Control-Plane Debt Reduction

Status: completed via #278 / TIN-280

Focus:

  • clarify dashboard read and mutation authority
  • reduce remaining GitLab-shaped or compatibility-only control-plane assumptions
  • align dashboard docs, API surfaces, and operator expectations

Workstream 4: Runner Hygiene And Orgwide Enrollment Automation

Status: completed via #279 / TIN-281

Focus:

  • automate the honey runner hygiene loop beyond manual remediation
  • tighten orgwide enrollment reporting and promotion rules
  • reduce the gap between recent repo activity and real runner enrollment

Meaning:

  • this workstream is complete as a month-scale cleanup lane
  • residual autonomy and rollout-execution gaps are now normal next-horizon platform work, not open tracker debt inside this completed cleanup program

Adjacent Platform Ask: KVM-Capable Shared Runner For VM Execution

Status: completed via #312 / TIN-330

Meaning:

  • terminal-first Rocky proof, the later Budgie graphical gate, and the capacity hardening behind them now exist as a bounded KVM floor rather than an open substrate-definition lane

Multi-Month Productization Horizon

Status: management framing completed via #320 / TIN-335

Planning surfaces:

  • GloriousFlywheel Productization Horizon (May–Aug 2026)
  • GloriousFlywheel Productization — Cache-First Dogfood and Advanced Runners

Meaning:

  • the horizon and proof ordering are explicit
  • the next execution board should only become active once the current stability-recovery and authority-truth work has settled
