GloriousFlywheel Benchmark Scorecard 2026-04-19

Snapshot date: 2026-04-19

GitHub owner: #212

Purpose

Record the currently measured benchmark evidence for the release baseline, as produced on the merged main branch.

This note is intentionally narrower than the full competitive benchmark plan:

  • it captures the currently measured GloriousFlywheel runner lanes
  • it does not invent missing GitHub-hosted or commercial comparison data
  • it keeps release gating tied to real runs instead of the scorecard template

Measured commit: e6c5871310435c387720de55b74e7ddcddfd258a

Current Measured Pack

  • workflow: Runner Benchmarks
  • dispatch date: 2026-04-19
  • workload selection: all
  • measured lanes:
    • tinyland-nix
    • tinyland-nix-heavy
  • measured workloads:
    • nix-build via nix build .#runner-dashboard-image --no-link
    • flake-check via nix flake check
  • bootstrap mode: DeterminateSystems/determinate-nix-action@v3 through .github/actions/nix-job/action.yml
  • cache posture during these runs:
    • Attic cache: unknown in artifact output
    • Bazel cache: available in artifact output
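As a rough illustration of how the timed command bodies are wrapped, the pattern can be sketched as a thin shell function. This is a hypothetical sketch, not the actual scripts/benchmark/runner-benchmark.sh implementation; the function name `time_workload` and the output format are assumptions:

```shell
# Hypothetical sketch of the per-workload timing wrapper; the real
# scripts/benchmark/runner-benchmark.sh may differ in naming and output.
time_workload() {
  workload_id="$1"; shift
  t0=$(date +%s.%N)               # total-runtime timer starts
  tb0=$(date +%s.%N)              # build-runtime timer starts
  "$@"                            # timed command body, e.g. nix flake check
  tb1=$(date +%s.%N)
  t1=$(date +%s.%N)               # total-runtime timer stops
  build=$(awk -v a="$tb0" -v b="$tb1" 'BEGIN { printf "%.3f", b - a }')
  total=$(awk -v a="$t0" -v b="$t1" 'BEGIN { printf "%.3f", b - a }')
  overhead=$(awk -v t="$total" -v b="$build" 'BEGIN { printf "%.3f", t - b }')
  echo "$workload_id total=${total}s build=${build}s overhead=${overhead}s"
}

# Example with a trivial command standing in for the real workload body:
time_workload gf-nix-build true
```

The small gap between the two timer pairs is what the workload tables report as in-script overhead.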

Run Summary

| Run id | Lane | Queue latency | Time to first step | Benchmark step duration | Timed workload total | Approx bootstrap/setup before timers | Result |
| 24641466958 | tinyland-nix | 47s | 48s | 26.000s | 13.930s | 12.070s | success |
| 24641466963 | tinyland-nix-heavy | 29s | 30s | 1383.000s | 1368.150s | 14.850s | success |
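Queue latency here is the gap from workflow creation to job start. A minimal check of that arithmetic using GNU date, with illustrative timestamps chosen to match the measured 47s (the actual run timestamps are not reproduced here):

```shell
# Illustrative recomputation of queue latency for run 24641466958:
# workflow creation time to job start time. These timestamps are made
# up to match the measured 47s; they are not taken from the artifact.
created="2026-04-19T12:00:00Z"
started="2026-04-19T12:00:47Z"
queue=$(( $(date -u -d "$started" +%s) - $(date -u -d "$created" +%s) ))
echo "queue latency: ${queue}s"
```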

Notes:

  • queue latency is measured from workflow creation to job start
  • time to first step is measured from workflow creation to the first job step
  • bootstrap/setup before timers is the benchmark-step wall clock minus the summed workload timers from the artifact JSON
  • the in-workload timers come from scripts/benchmark/runner-benchmark.sh and only cover the timed command body plus the script’s own minimal wrapper
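The bootstrap/setup figure is therefore derived, not measured directly. Reproducing the run 24641466958 row from the numbers above:

```shell
# Derive "approx bootstrap/setup before timers" for run 24641466958:
# benchmark-step wall clock minus the summed in-workload timers.
step_wall=26.000                                                  # benchmark step duration
workload_total=$(awk 'BEGIN { printf "%.3f", 1.222 + 12.708 }')   # gf-nix-build + gf-flake-check
bootstrap=$(awk -v s="$step_wall" -v w="$workload_total" 'BEGIN { printf "%.3f", s - w }')
echo "timed workload total: ${workload_total}s"    # 13.930s, matching the run summary
echo "bootstrap before timers: ${bootstrap}s"      # 12.070s, matching the run summary
```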

Workload Results

| Run id | Lane | Workload id | Total runtime | Build runtime | In-script overhead | Attic cache | Bazel cache | Nix store size | Hostname |
| 24641466958 | tinyland-nix | gf-nix-build | 1.222s | 1.220s | 0.002s | unknown | available | 5021 MiB | tinyland-nix-fhdlf-runner-75fdt |
| 24641466958 | tinyland-nix | gf-flake-check | 12.708s | 12.706s | 0.002s | unknown | available | 5021 MiB | tinyland-nix-fhdlf-runner-75fdt |
| 24641466963 | tinyland-nix-heavy | gf-nix-build | 1m 25.411s | 1m 25.408s | 0.003s | unknown | available | 1813 MiB | tinyland-nix-heavy-7n5bq-runner-wnctg |
| 24641466963 | tinyland-nix-heavy | gf-flake-check | 21m 22.739s | 21m 22.736s | 0.003s | unknown | available | 6440 MiB | tinyland-nix-heavy-7n5bq-runner-wnctg |

Current Read

What Is Proven

  • the repo-owned benchmark workflow runs successfully on merged main
  • both currently documented Nix lanes produced artifact-backed results
  • the heavy lane is not theoretical; it completed a real nix flake check benchmark on main
  • explicit Nix bootstrap overhead exists and is now separated from the timed workload body

What Is Not Yet Proven

  • no GitHub-hosted baseline is included in this scorecard yet
  • no commercial comparison lane is included yet
  • no warm-cache versus cold-cache split is controlled yet
  • this is still only one repo and two workload shapes, not the full pack described in the methodology note

What We Currently Win On

  • GloriousFlywheel has real measured Nix lanes on merged main; the source repo is not relying on hypothetical runner claims
  • the heavy lane is real and can complete a full nix flake check benchmark, which is materially stronger evidence than a light smoke-only contract
  • bootstrap/setup overhead is now separated from the timed workload body, so self-hosted runner cost is not hidden inside one opaque wall-clock number
  • the benchmark workflow produces reproducible artifacts and a parsable scorecard instead of ad hoc timing notes

What We Currently Lose On

  • queue latency is still well above the aspirational < 15s target in the current measured runs (29s to 47s)
  • the current measured pack is too narrow to support broad competitiveness claims against GitHub-hosted or commercial alternatives
  • cache behavior is only visible as coarse availability metadata in the current artifacts; real hit-rate and restore/save timing evidence is still missing

Where Evidence Is Still Missing

  • GitHub-hosted baseline runs on the same named workloads
  • commercial trial or clearly separated vendor-claim comparison rows
  • warm-cache versus cold-cache splits
  • a broader workload pack beyond the source repo’s two current Nix shapes

Release-Gate Read

This scorecard is current enough for the release baseline checklist because it documents the latest successful benchmark evidence on merged main.

It is not enough to make broad competitive claims for #212. Public claims should remain limited to:

  • tinyland-nix and tinyland-nix-heavy are real measured lanes
  • benchmark automation and artifact capture exist on the source repo
  • broader GitHub-hosted and commercial comparisons are still pending
