GloriousFlywheel Benchmark Scorecard Template 2026-04-16
Snapshot date: 2026-04-16
Purpose
Provide a concrete scorecard format for #212 so benchmark work produces
comparable evidence instead of ad hoc notes.
GitHub issue: #212
Scorecard Rules
- keep cold-cache and warm-cache runs separate
- compare like with like on the same named workload
- distinguish measured GloriousFlywheel and GitHub-hosted results from vendor claims or trial evidence
- record enough environment detail to reproduce the run
Workload Pack
The first benchmark pack should cover at least these workloads:
| Workload id | Repo | Workflow shape | Why it matters |
|---|---|---|---|
| gf-validate | tinyland-inc/GloriousFlywheel | validation / repo health | source-of-truth platform workload |
| gf-nix-build | tinyland-inc/GloriousFlywheel | Nix derivation build | clean-derivation and cache behavior |
| td-site | tinyland-inc/tinyland.dev | representative product workflow | org product-repo consumer |
| lab-validate | tinyland-inc/lab | operator / validation workflow | operator-tooling consumer |
| xw-canary | Jesssullivan/XoxdWM | user-repo canary workflow | public-contract and user-owned canary |
| linux-xr-builder | linux-xr | Linux builder workload | named builder canary when available |
Result Table
| Run id | Workload id | Lane | Cold/warm | Bootstrap mode | Bootstrap overhead | Queue latency | Time to first step | Total runtime | Cache notes | Result | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| example | gf-validate | GitHub-hosted | cold | preinstalled | 0 | | | | | | |
| example | gf-validate | tinyland-docker | cold | preinstalled | 0 | | | | | | |
| example | gf-nix-build | tinyland-nix | cold | determinate install | | | | | | | |
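The latency columns in the result table can be derived from GitHub's REST API timestamps. A minimal sketch, assuming a workflow-run payload (`GET /repos/{owner}/{repo}/actions/runs/{id}`) and the first step of its first job (from the run's jobs endpoint); the function name is hypothetical, and using `updated_at` as an end-of-run proxy is this sketch's simplification, not part of the scorecard:

```python
from datetime import datetime

def _parse(ts: str) -> datetime:
    # GitHub's REST API returns ISO-8601 timestamps like "2026-04-16T12:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def scorecard_timings(run: dict, first_step: dict) -> dict:
    """Derive the scorecard latency columns (seconds) from one workflow-run
    payload and its first job step. Field names (created_at, run_started_at,
    updated_at, started_at) are the API's own; updated_at is only a rough
    proxy for run completion."""
    created = _parse(run["created_at"])           # run queued
    started = _parse(run["run_started_at"])       # runner picked it up
    step_start = _parse(first_step["started_at"]) # first step began
    run_done = _parse(run["updated_at"])          # end-of-run proxy
    return {
        "queue_latency_s": (started - created).total_seconds(),
        "time_to_first_step_s": (step_start - created).total_seconds(),
        "total_runtime_s": (run_done - created).total_seconds(),
    }
```

Recording the computed seconds directly in the result table keeps cold- and warm-cache runs comparable across lanes.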
Lane Definitions
| Lane | Type | Interpretation |
|---|---|---|
| GitHub-hosted | measured baseline | first-party baseline |
| tinyland-docker | measured product lane | default Linux CI builder |
| tinyland-nix | measured product lane | reproducible Nix lane |
| tinyland-dind | measured product lane | privileged container-build lane |
| Namespace | vendor claim or trial | breadth competitor |
| Blacksmith | vendor claim or trial | narrow GitHub drop-in competitor |
| RWX | vendor claim or trial | CI-platform comparison set |
Bootstrap mode guidance:
- preinstalled: runner image already has the required toolchain
- determinate install: workflow bootstraps Nix with DeterminateSystems/determinate-nix-action@v3
- other workflow bootstrap: workflow installs or verifies the toolchain via a different explicit step
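A "determinate install" run might bootstrap Nix as below. This is a sketch only: the job id, runner label, and build command are placeholders for the lane under test; only the `DeterminateSystems/determinate-nix-action@v3` reference comes from the guidance above.

```yaml
jobs:
  gf-nix-build:
    runs-on: tinyland-nix   # placeholder label for the lane under test
    steps:
      - uses: actions/checkout@v4
      - name: Bootstrap Nix (determinate install)
        uses: DeterminateSystems/determinate-nix-action@v3
      - name: Build derivation
        run: nix build .    # placeholder workload command
```

The bootstrap step's duration is what the "Bootstrap overhead" column records; preinstalled lanes record 0.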
Summary Table
| Dimension | GitHub-hosted | GloriousFlywheel best measured lane | Commercial reference | Current read |
|---|---|---|---|---|
| Queue latency | | | | |
| Cold-start latency | | | | |
| Total runtime | | | | |
| Cache behavior | | | | |
| Debuggability | | | | |
| Private-network fit | | | | |
| Operator overhead | | | | |
Win/Lose Prompt
After the first measured pack, write one short summary:
- where GloriousFlywheel clearly wins
- where it currently loses
- where evidence is still missing
- which claims should stay out of public docs until measured