# GloriousFlywheel Benchmark Scorecard 2026-04-19

Snapshot date: 2026-04-19
GitHub owner: #212
## Purpose

Record the current measured benchmark evidence for the release baseline on the
actual merged `main` branch.

This note is intentionally narrower than the full competitive benchmark plan:

- it captures the currently measured `GloriousFlywheel` runner lanes
- it does not invent missing GitHub-hosted or commercial comparison data
- it keeps release gating tied to real runs instead of the scorecard template

Measured commit: `e6c5871310435c387720de55b74e7ddcddfd258a`
## Current Measured Pack

- workflow: `Runner Benchmarks`
- dispatch date: 2026-04-19
- workload selection: `all`
- measured lanes: `tinyland-nix`, `tinyland-nix-heavy`
- measured workloads:
  - `nix-build` via `nix build .#runner-dashboard-image --no-link`
  - `flake-check` via `nix flake check`
- bootstrap mode: `DeterminateSystems/determinate-nix-action@v3` through `.github/actions/nix-job/action.yml`
- cache posture during these runs:
  - Attic cache: `unknown` in artifact output
  - Bazel cache: `available` in artifact output
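The measurement boundary described above (only the workload command body is timed, not runner bootstrap) can be sketched as a thin wrapper. This is an illustration only: the `time_workload` helper, its output format, and the nanosecond `date` arithmetic are assumptions, not the actual contents of `scripts/benchmark/runner-benchmark.sh`.

```shell
#!/usr/bin/env bash
# Illustrative timing wrapper: only the workload command body sits between
# the two timestamps, so bootstrap/setup cost is excluded from the reported
# runtime. Helper name and output format are hypothetical.
set -euo pipefail

time_workload() {
  local id="$1"; shift
  local start_ns end_ns elapsed_ns
  start_ns=$(date +%s%N)
  "$@" >/dev/null
  end_ns=$(date +%s%N)
  elapsed_ns=$((end_ns - start_ns))
  printf '%s %d.%03ds\n' "$id" \
    $((elapsed_ns / 1000000000)) \
    $((elapsed_ns / 1000000 % 1000))
}

# The two workload bodies measured in this scorecard would be invoked as:
#   time_workload nix-build   nix build .#runner-dashboard-image --no-link
#   time_workload flake-check nix flake check
time_workload demo sleep 0.1
```

Because the wrapper itself does almost nothing between the timestamps, its own contribution shows up as the small "in-script overhead" column in the workload table rather than being folded into the build runtime.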
## Run Summary
| Run id | Lane | Queue latency | Time to first step | Benchmark step duration | Timed workload total | Approx bootstrap/setup before timers | Result |
|---|---|---|---|---|---|---|---|
| 24641466958 | `tinyland-nix` | 47s | 48s | 26.000s | 13.930s | 12.070s | success |
| 24641466963 | `tinyland-nix-heavy` | 29s | 30s | 1383.000s | 1368.150s | 14.850s | success |
Notes:

- queue latency is measured from workflow creation to job start
- time to first step is measured from workflow creation to the first job step
- bootstrap/setup before timers is the benchmark-step wall clock minus the summed workload timers from the artifact JSON
- the in-workload timers come from `scripts/benchmark/runner-benchmark.sh` and only cover the timed command body plus the script's own minimal wrapper
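As a concrete check of that subtraction, the bootstrap/setup column for run 24641466958 can be recomputed from the numbers published in this scorecard. This is a sketch over the scorecard's own values, not the actual artifact-JSON parsing, and the variable names are illustrative.

```shell
# Recompute "approx bootstrap/setup before timers" for run 24641466958:
# benchmark-step wall clock minus the summed per-workload timers.
step_duration=26.000            # benchmark step duration (Run Summary row)
workload_totals="1.222 12.708"  # gf-nix-build + gf-flake-check totals

# Sum the per-workload timers, then subtract from the step wall clock.
timed_total=$(awk 'BEGIN{s=0; for(i=1;i<ARGC;i++) s+=ARGV[i]; printf "%.3f", s}' $workload_totals)
bootstrap=$(awk -v step="$step_duration" -v timed="$timed_total" 'BEGIN{printf "%.3f", step - timed}')
echo "timed_total=${timed_total}s bootstrap=${bootstrap}s"
# -> timed_total=13.930s bootstrap=12.070s
```

The result matches the 12.070s shown in the Run Summary row for the light lane; the same arithmetic on the heavy lane (1383.000s minus 1368.150s) yields its 14.850s figure.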
## Workload Results

| Run id | Lane | Workload id | Total runtime | Build runtime | In-script overhead | Attic | Bazel cache | Nix store size | Hostname |
|---|---|---|---|---|---|---|---|---|---|
| 24641466958 | `tinyland-nix` | `gf-nix-build` | 1.222s | 1.220s | 0.002s | unknown | available | 5021 MiB | tinyland-nix-fhdlf-runner-75fdt |
| 24641466958 | `tinyland-nix` | `gf-flake-check` | 12.708s | 12.706s | 0.002s | unknown | available | 5021 MiB | tinyland-nix-fhdlf-runner-75fdt |
| 24641466963 | `tinyland-nix-heavy` | `gf-nix-build` | 1m 25.411s | 1m 25.408s | 0.003s | unknown | available | 1813 MiB | tinyland-nix-heavy-7n5bq-runner-wnctg |
| 24641466963 | `tinyland-nix-heavy` | `gf-flake-check` | 21m 22.739s | 21m 22.736s | 0.003s | unknown | available | 6440 MiB | tinyland-nix-heavy-7n5bq-runner-wnctg |
## Current Read

### What Is Proven

- the repo-owned benchmark workflow runs successfully on merged `main`
- both currently documented Nix lanes produced artifact-backed results
- the heavy lane is not theoretical; it completed a real `nix flake check` benchmark on `main`
- explicit Nix bootstrap overhead exists and is now separated from the timed workload body
### What Is Not Yet Proven
- no GitHub-hosted baseline is included in this scorecard yet
- no commercial comparison lane is included yet
- no warm-cache versus cold-cache split is controlled yet
- this is still only one repo and two workload shapes, not the full pack described in the methodology note
### What We Currently Win On

- GloriousFlywheel has real measured Nix lanes on merged `main`; the source repo is not relying on hypothetical runner claims
- the heavy lane is real and can complete a full `nix flake check` benchmark, which is materially stronger evidence than a light smoke-only contract
- bootstrap/setup overhead is now separated from the timed workload body, so self-hosted runner cost is not hidden inside one opaque wall-clock number
- the benchmark workflow produces reproducible artifacts and a parsable scorecard instead of ad hoc timing notes
### What We Currently Lose On

- queue latency is still well above the aspirational `< 15s` target in the current measured runs (`29s` to `47s`)
- the current measured pack is too narrow to support broad competitiveness claims against GitHub-hosted or commercial alternatives
- cache behavior is only visible as coarse availability metadata in the current artifacts; real hit-rate and restore/save timing evidence is still missing
## Where Evidence Is Still Missing
- GitHub-hosted baseline runs on the same named workloads
- commercial trial or clearly separated vendor-claim comparison rows
- warm-cache versus cold-cache splits
- a broader workload pack beyond the source repo’s two current Nix shapes
## Release-Gate Read

This scorecard is current enough for the release baseline checklist because it
documents the latest successful benchmark evidence on merged `main`.

It is not enough to make broad competitive claims for #212. Public claims
should remain limited to:

- `tinyland-nix` and `tinyland-nix-heavy` are real measured lanes
- benchmark automation and artifact capture exist on the source repo
- broader GitHub-hosted and commercial comparisons are still pending