GloriousFlywheel Benchmark Scorecard Template 2026-04-16
Snapshot date: 2026-04-16
Purpose
Provide a concrete scorecard format for #212 so benchmark work produces
comparable evidence instead of ad hoc notes.
GitHub issue: #212
Scorecard Rules
- keep cold-cache and warm-cache runs separate
- compare like with like on the same named workload
- distinguish measured GloriousFlywheel and GitHub-hosted results from vendor claims or trial evidence
- record enough environment detail to reproduce the run
Workload Pack
The first benchmark pack should cover at least these workloads:
| Workload id | Repo | Workflow shape | Why it matters |
|---|---|---|---|
| gf-validate | tinyland-inc/GloriousFlywheel | validation / repo health | source-of-truth platform workload |
| gf-nix-build | tinyland-inc/GloriousFlywheel | Nix derivation build | clean-derivation and cache behavior |
| td-site | tinyland-inc/tinyland.dev | representative product workflow | org product-repo consumer |
| lab-validate | tinyland-inc/lab | operator / validation workflow | operator-tooling consumer |
| xw-canary | Jesssullivan/XoxdWM | user-repo canary workflow | public-contract and user-owned canary |
| linux-xr-builder | linux-xr | Linux builder workload | named builder canary when available |
Result Table
| Run id | Workload id | Lane | Cold/warm | Bootstrap mode | Bootstrap overhead | Queue latency | Time to first step | Total runtime | Cache notes | Result | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| example | gf-validate | GitHub-hosted | cold | preinstalled | 0 | | | | | | |
| example | gf-validate | tinyland-docker | cold | preinstalled | 0 | | | | | | |
| example | gf-nix-build | tinyland-nix | cold | determinate install | | | | | | | |
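The latency columns in the result table can be derived from GitHub's REST API timestamps. A minimal sketch, assuming a workflow-run payload (`GET /repos/{owner}/{repo}/actions/runs/{id}`) and the first step of its first job (from the run's jobs endpoint); the function name is hypothetical, and using `updated_at` as an end-of-run proxy is this sketch's simplification, not part of the scorecard:

```python
from datetime import datetime

def _parse(ts: str) -> datetime:
    # GitHub's REST API returns ISO-8601 timestamps like "2026-04-16T12:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def scorecard_timings(run: dict, first_step: dict) -> dict:
    """Derive the scorecard latency columns (seconds) from one workflow-run
    payload and its first job step. Field names (created_at, run_started_at,
    updated_at, started_at) are the API's own; updated_at is only a rough
    proxy for run completion."""
    created = _parse(run["created_at"])           # run queued
    started = _parse(run["run_started_at"])       # runner picked it up
    step_start = _parse(first_step["started_at"]) # first step began
    run_done = _parse(run["updated_at"])          # end-of-run proxy
    return {
        "queue_latency_s": (started - created).total_seconds(),
        "time_to_first_step_s": (step_start - created).total_seconds(),
        "total_runtime_s": (run_done - created).total_seconds(),
    }
```

Recording the computed seconds directly in the result table keeps cold- and warm-cache runs comparable across lanes.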
Lane Definitions
| Lane | Type | Interpretation |
|---|---|---|
| GitHub-hosted | measured baseline | first-party baseline |
| tinyland-docker | measured product lane | default Linux CI builder |
| tinyland-nix | measured product lane | reproducible Nix lane |
| tinyland-dind | measured product lane | privileged container-build lane |
| Namespace | vendor claim or trial | breadth competitor |
| Blacksmith | vendor claim or trial | narrow GitHub drop-in competitor |
| RWX | vendor claim or trial | CI-platform comparison set |
Bootstrap mode guidance:
- preinstalled: runner image already has the required toolchain
- determinate install: workflow bootstraps Nix with DeterminateSystems/determinate-nix-action@v3
- other workflow bootstrap: workflow installs or verifies the toolchain via a different explicit step
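A "determinate install" run might bootstrap Nix as below. This is a sketch only: the job id, runner label, and build command are placeholders for the lane under test; only the `DeterminateSystems/determinate-nix-action@v3` reference comes from the guidance above.

```yaml
jobs:
  gf-nix-build:
    runs-on: tinyland-nix   # placeholder label for the lane under test
    steps:
      - uses: actions/checkout@v4
      - name: Bootstrap Nix (determinate install)
        uses: DeterminateSystems/determinate-nix-action@v3
      - name: Build derivation
        run: nix build .    # placeholder workload command
```

The bootstrap step's duration is what the "Bootstrap overhead" column records; preinstalled lanes record 0.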
Summary Table
| Dimension | GitHub-hosted | GloriousFlywheel best measured lane | Commercial reference | Current read |
|---|---|---|---|---|
| Queue latency | | | | |
| Cold-start latency | | | | |
| Total runtime | | | | |
| Cache behavior | | | | |
| Debuggability | | | | |
| Private-network fit | | | | |
| Operator overhead | | | | |
Win/Lose Prompt
After the first measured pack, write one short summary:
- where GloriousFlywheel clearly wins
- where it currently loses
- where evidence is still missing
- which claims should stay out of public docs until measured