GloriousFlywheel Benchmark Scorecard Template 2026-04-16

Snapshot date: 2026-04-16

Purpose

Provide a concrete scorecard format for #212 so benchmark work produces comparable evidence instead of ad hoc notes.

GitHub owner: #212

Scorecard Rules

  • keep cold-cache and warm-cache runs separate
  • compare like with like on the same named workload
  • distinguish measured GloriousFlywheel and GitHub-hosted results from vendor claims or trial evidence
  • record enough environment detail to reproduce the run
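The first two rules can be sketched in code. This is a minimal illustration, assuming runs are recorded as simple tuples; the tuple layout, the helper names, and the runtime values are hypothetical placeholders, not measurements or an existing tool.

```python
from collections import defaultdict
from statistics import median

# Hypothetical run records: (workload_id, lane, cache_state, total_runtime_seconds).
# The numbers are placeholders for illustration, not measurements.
RUNS = [
    ("gf-validate", "GitHub-hosted", "cold", 100.0),
    ("gf-validate", "GitHub-hosted", "cold", 120.0),
    ("gf-validate", "tinyland-docker", "cold", 90.0),
    ("gf-validate", "tinyland-docker", "warm", 40.0),
]


def group_runs(runs):
    """Bucket runtimes by (workload, cache state), then by lane."""
    groups = defaultdict(lambda: defaultdict(list))
    for workload_id, lane, cache_state, runtime in runs:
        if cache_state not in ("cold", "warm"):
            raise ValueError(f"unknown cache state: {cache_state!r}")
        groups[(workload_id, cache_state)][lane].append(runtime)
    return groups


def median_by_lane(runs):
    """Median runtime per lane, never mixing workloads or cache states."""
    return {
        key: {lane: median(values) for lane, values in lanes.items()}
        for key, lanes in group_runs(runs).items()
    }


SUMMARY = median_by_lane(RUNS)
```

Because the grouping key is the pair (workload id, cache state), a warm run can never be averaged into a cold comparison, and lanes are only compared on the same named workload.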

Workload Pack

The first benchmark pack should cover at least these workloads:

| Workload id | Repo | Workflow shape | Why it matters |
|---|---|---|---|
| gf-validate | tinyland-inc/GloriousFlywheel | validation / repo health | source-of-truth platform workload |
| gf-nix-build | tinyland-inc/GloriousFlywheel | Nix derivation build | clean-derivation and cache behavior |
| td-site | tinyland-inc/tinyland.dev | representative product workflow | org product-repo consumer |
| lab-validate | tinyland-inc/lab | operator / validation workflow | operator-tooling consumer |
| xw-canary | Jesssullivan/XoxdWM | user-repo canary workflow | public-contract and user-owned canary |
| linux-xr-builder | linux-xr | Linux builder workload | named builder canary when available |

Result Table

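| Run id | Workload id | Lane | Cold/warm | Bootstrap mode | Bootstrap overhead | Queue latency | Time to first step | Total runtime | Cache notes | Result | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| example | gf-validate | GitHub-hosted | cold | preinstalled | 0 | | | | | | |
| example | gf-validate | tinyland-docker | cold | preinstalled | 0 | | | | | | |
| example | gf-nix-build | tinyland-nix | cold | determinate install | | | | | | | |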
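One way to keep rows comparable is to record each run in a fixed structure that mirrors the Result Table columns. The sketch below is an assumption about how that could look: the class name, field names, and the choice of seconds as the unit are hypothetical and are not prescribed by this template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultRow:
    """One Result Table row; timing fields stay None until measured."""
    run_id: str
    workload_id: str
    lane: str
    cache_state: str  # "cold" or "warm"
    bootstrap_mode: str
    bootstrap_overhead_s: Optional[float] = None  # seconds (assumed unit)
    queue_latency_s: Optional[float] = None
    time_to_first_step_s: Optional[float] = None
    total_runtime_s: Optional[float] = None
    cache_notes: str = ""
    result: str = ""
    notes: str = ""

    def missing_timings(self):
        """Names of timing columns that have not been filled in yet."""
        timing_fields = (
            "bootstrap_overhead_s",
            "queue_latency_s",
            "time_to_first_step_s",
            "total_runtime_s",
        )
        return [name for name in timing_fields if getattr(self, name) is None]


# The two distinct shapes of example row from the table above.
hosted = ResultRow("example", "gf-validate", "GitHub-hosted", "cold",
                   "preinstalled", bootstrap_overhead_s=0.0)
nix = ResultRow("example", "gf-nix-build", "tinyland-nix", "cold",
                "determinate install")
```

Using None rather than 0 for unmeasured cells keeps a blank cell distinguishable from a measured zero, such as the zero bootstrap overhead on the preinstalled example rows.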

Lane Definitions

| Lane | Type | Interpretation |
|---|---|---|
| GitHub-hosted | measured baseline | first-party baseline |
| tinyland-docker | measured product lane | default Linux CI builder |
| tinyland-nix | measured product lane | reproducible Nix lane |
| tinyland-dind | measured product lane | privileged container-build lane |
| Namespace | vendor claim or trial | breadth competitor |
| Blacksmith | vendor claim or trial | narrow GitHub drop-in competitor |
| RWX | vendor claim or trial | CI-platform comparison set |
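The lane types above can be used to keep measured results apart from vendor claims or trial evidence. The following is a small sketch under that assumption; the mapping is copied from the lane definitions, while the function names are hypothetical.

```python
# Lane types copied from the Lane Definitions table.
LANE_TYPES = {
    "GitHub-hosted": "measured baseline",
    "tinyland-docker": "measured product lane",
    "tinyland-nix": "measured product lane",
    "tinyland-dind": "measured product lane",
    "Namespace": "vendor claim or trial",
    "Blacksmith": "vendor claim or trial",
    "RWX": "vendor claim or trial",
}


def is_measured(lane):
    """True for lanes whose type starts with "measured"."""
    try:
        return LANE_TYPES[lane].startswith("measured")
    except KeyError:
        raise ValueError(f"lane not in Lane Definitions: {lane!r}") from None


def split_evidence(lanes):
    """Split lane names into (measured, reference-only), preserving order."""
    measured = [lane for lane in lanes if is_measured(lane)]
    reference = [lane for lane in lanes if not is_measured(lane)]
    return measured, reference
```

Rejecting unknown lane names, instead of silently treating them as measured, keeps a typo from promoting a vendor figure into the measured set.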

Bootstrap mode guidance:

  • preinstalled: runner image already has the required toolchain
  • determinate install: workflow bootstraps Nix with DeterminateSystems/determinate-nix-action@v3
  • other workflow bootstrap: workflow installs or verifies the toolchain by a different explicit step

Summary Table

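| Dimension | GitHub-hosted | GloriousFlywheel best measured lane | Commercial reference | Current read |
|---|---|---|---|---|
| Queue latency | | | | |
| Cold-start latency | | | | |
| Total runtime | | | | |
| Cache behavior | | | | |
| Debugability | | | | |
| Private-network fit | | | | |
| Operator overhead | | | | |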
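For the numeric dimensions, the "best measured lane" and "current read" cells could be derived rather than hand-picked. The sketch below assumes the median as the summary statistic and lower-is-better values; both are assumptions, as are the function names and the placeholder runtimes, which are not measurements. The input is expected to hold measured lanes only.

```python
from statistics import median

# Hypothetical cold-cache total runtimes in seconds for one workload.
# Placeholder values for illustration only, not measurements.
COLD_RUNTIMES = {
    "GitHub-hosted": [100.0, 120.0],
    "tinyland-docker": [90.0, 95.0, 110.0],
    "tinyland-nix": [130.0],
}


def best_measured_lane(runtimes_by_lane, baseline="GitHub-hosted"):
    """Return (lane, median) for the fastest non-baseline lane, or None."""
    candidates = {
        lane: median(values)
        for lane, values in runtimes_by_lane.items()
        if lane != baseline and values
    }
    if not candidates:
        return None
    lane = min(candidates, key=candidates.get)
    return lane, candidates[lane]


def current_read(runtimes_by_lane, baseline="GitHub-hosted"):
    """One-word read for a lower-is-better dimension."""
    best = best_measured_lane(runtimes_by_lane, baseline)
    base = runtimes_by_lane.get(baseline)
    if best is None or not base:
        return "missing"
    base_median = median(base)
    if best[1] < base_median:
        return "win"
    if best[1] > base_median:
        return "lose"
    return "tie"
```

Qualitative dimensions such as debugability or operator overhead do not reduce to a single number and would still need a written read.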

Win/Lose Prompt

After the first measured pack, write one short summary:

  • where GloriousFlywheel clearly wins
  • where it currently loses
  • where evidence is still missing
  • which claims should stay out of public docs until measured
