GloriousFlywheel Runner Benchmark Methodology 2026-04-16

Snapshot date: 2026-04-16

Purpose

Define a benchmark method that can answer a practical question:

How competitive is GloriousFlywheel, on real workloads, against:

  • GitHub-hosted runners
  • GloriousFlywheel ARC runners on honey
  • commercial alternatives such as Namespace, Blacksmith, and RWX

This note is about methodology, not marketing claims.

GitHub owner: #212

Core Principle

GloriousFlywheel should not claim competitiveness on speed, reliability, cost, or observability without measured evidence on named workloads.

Comparison Lanes

Baseline

  • GitHub-hosted standard runners

Product-under-test

  • GloriousFlywheel tinyland-docker
  • GloriousFlywheel tinyland-nix
  • GloriousFlywheel tinyland-dind

Commercial reference set

  • Namespace
  • Blacksmith
  • RWX

The commercial set may be evaluated through trial accounts, public benchmark material, or downstream customer evidence; in every case, vendor-claimed numbers and our own measured results must be reported as distinct categories.

Benchmark Dimensions

Performance

  • queue latency
  • cold-start latency
  • toolchain bootstrap time
  • time-to-first-step
  • total wall-clock job duration
  • container build duration
  • Nix build duration
  • test-suite duration
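
The latency metrics above can be derived from job timestamps. A minimal sketch, assuming the `created_at`, `started_at`, and `completed_at` fields as returned by the GitHub Actions jobs API (the sample values are invented):

```python
from datetime import datetime

def _ts(s: str) -> datetime:
    # GitHub timestamps look like "2026-04-16T12:00:05Z"
    return datetime.fromisoformat(s.replace("Z", "+00:00"))

def queue_latency_seconds(job: dict) -> float:
    """Seconds a job waited for a runner: started_at - created_at."""
    return (_ts(job["started_at"]) - _ts(job["created_at"])).total_seconds()

def wall_clock_seconds(job: dict) -> float:
    """Total job duration: completed_at - started_at."""
    return (_ts(job["completed_at"]) - _ts(job["started_at"])).total_seconds()

# Illustrative job record, not real data.
job = {
    "created_at": "2026-04-16T12:00:00Z",
    "started_at": "2026-04-16T12:00:42Z",
    "completed_at": "2026-04-16T12:09:42Z",
}
print(queue_latency_seconds(job))  # 42.0
print(wall_clock_seconds(job))     # 540.0
```

Cold-start latency and time-to-first-step need step-level timestamps from the same jobs endpoint, but follow the same subtraction pattern.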

Cache Behavior

  • cache restore time
  • cache save time
  • cache hit rate
  • remote-cache throughput
  • Docker layer reuse effectiveness
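
Hit rate and throughput are simple ratios once per-run cache records exist. A sketch with hypothetical record fields (`cache_hit`, restored byte counts are invented for illustration):

```python
def cache_hit_rate(runs: list[dict]) -> float:
    """Fraction of runs whose cache restore produced a usable hit."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if r["cache_hit"]) / len(runs)

def restore_throughput_mib_s(restored_bytes: int, restore_seconds: float) -> float:
    """Effective remote-cache restore throughput in MiB/s."""
    return (restored_bytes / (1024 * 1024)) / restore_seconds

# Illustrative records, not measurements.
runs = [{"cache_hit": True}, {"cache_hit": True}, {"cache_hit": False}]
print(round(cache_hit_rate(runs), 3))                    # 0.667
print(restore_throughput_mib_s(512 * 1024 * 1024, 8.0))  # 64.0
```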

Reliability

  • flaky job rate
  • failed-run rate attributable to runner platform
  • retry success rate
  • time lost to platform-induced failures
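
These reliability dimensions reduce to ratios over per-run records. A sketch with a hypothetical record shape (the `ok` / `platform_fault` / `retried` / `retry_ok` fields are assumptions, not an existing schema):

```python
def reliability_summary(runs: list[dict]) -> dict:
    """Aggregate platform-attributable failure rate and retry success rate."""
    total = len(runs)
    platform_failures = sum(1 for r in runs if not r["ok"] and r["platform_fault"])
    retried = [r for r in runs if r["retried"]]
    return {
        # failed-run rate attributable to the runner platform
        "platform_failure_rate": platform_failures / total if total else 0.0,
        # retry success rate; None when nothing was retried
        "retry_success_rate": (
            sum(1 for r in retried if r["retry_ok"]) / len(retried)
            if retried else None
        ),
    }

# Illustrative records, not measurements.
runs = [
    {"ok": True,  "platform_fault": False, "retried": False, "retry_ok": False},
    {"ok": False, "platform_fault": True,  "retried": True,  "retry_ok": True},
    {"ok": False, "platform_fault": False, "retried": True,  "retry_ok": False},
    {"ok": True,  "platform_fault": False, "retried": False, "retry_ok": False},
]
print(reliability_summary(runs))
# {'platform_failure_rate': 0.25, 'retry_success_rate': 0.5}
```

Classifying a failure as platform-attributable is an operator judgment call; the function only aggregates whatever classification was recorded.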

Operator Experience

  • debug path quality
  • log quality
  • visibility into queueing and failure causes
  • time to isolate an infrastructure-caused failure

Private-Network Fit

  • ability to reach tailnet-only services
  • ability to keep cluster management private
  • secrets and identity overhead

Cost And Overhead

  • direct compute cost
  • cache/storage cost
  • artifact or transfer cost
  • operator-maintenance cost
  • one-time setup effort

Candidate Workloads

The first benchmark pack should use named repos that already matter:

  • tinyland-inc/GloriousFlywheel
    • validation workflow
    • Nix derivation build
    • container/image path where relevant
  • tinyland-inc/tinyland.dev
    • representative Nix or site-validation workflow
  • tinyland-inc/lab
    • operator/validation workflow
    • Nix build or devshell workflow as the self-hosted Nix bootstrap canary
  • Jesssullivan/XoxdWM
    • user-repo canary workflow
  • linux-xr as the named Linux-builder canary, if the workload is available for controlled comparison

Benchmark Rules

  1. compare like with like
  2. separate cold-cache and warm-cache runs
  3. do not blend GitHub-hosted and self-hosted results into one number
  4. distinguish measured result from vendor-claimed result
  5. record enough config context that another operator could reproduce the run
  6. separate runner bootstrap overhead from repo build logic when self-hosted lanes install toolchains during the workflow
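
Rules 2-4 and 6 can be enforced by the record type itself rather than by reviewer discipline. A sketch of one possible record; the class and its fields are hypothetical, not an existing GloriousFlywheel schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BenchmarkRun:
    workload: str             # e.g. a named workflow in a named repo
    lane: str                 # "github-hosted", "tinyland-nix", ...
    cache_mode: str           # "cold" or "warm" -- rule 2: never blended
    bootstrap_seconds: float  # runner/toolchain setup (rule 6: kept separate)
    build_seconds: float      # repo build and test logic only
    measured: bool            # False marks a vendor-claimed number (rule 4)

    def __post_init__(self) -> None:
        if self.cache_mode not in ("cold", "warm"):
            raise ValueError(f"cache_mode must be cold or warm, got {self.cache_mode!r}")

    @property
    def total_seconds(self) -> float:
        # Total runtime is derived, so bootstrap overhead is never hidden in it.
        return self.bootstrap_seconds + self.build_seconds

# Illustrative values, not a measurement.
run = BenchmarkRun("validation", "tinyland-nix", "cold", 95.0, 310.0, True)
print(run.total_seconds)  # 405.0
```

Keeping `measured` on every record makes rule 4 mechanical: any aggregation can filter or label vendor-claimed rows instead of trusting the table author to remember.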

Minimum Output

The first benchmark report should include:

  • workload name
  • repo
  • runner lane
  • bootstrap mode
  • bootstrap overhead
  • run date
  • cold or warm cache mode
  • total runtime
  • queue latency
  • major cache notes
  • success or failure
  • key operator observations
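
The field list above can be pinned down as a CSV schema so every lane emits comparable rows. A sketch; the column names are a proposed mapping of the list above, not an existing format:

```python
import csv
import io

# One column per "Minimum Output" item; proposed names, not an existing schema.
REPORT_FIELDS = [
    "workload", "repo", "runner_lane", "bootstrap_mode", "bootstrap_overhead_s",
    "run_date", "cache_mode", "total_runtime_s", "queue_latency_s",
    "cache_notes", "outcome", "operator_notes",
]

def report_csv(rows: list[dict]) -> str:
    """Serialize benchmark rows to CSV.

    Unknown extra keys raise; missing keys become empty cells, so gaps
    stay visible in the table rather than failing silently.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=REPORT_FIELDS, extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

A fixed header also makes the later comparison table trivial to assemble: one file per lane, concatenated and grouped by `runner_lane` and `cache_mode`.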

Acceptance Criteria

  • at least 3 representative workloads are benchmarked locally against GitHub-hosted and GloriousFlywheel lanes
  • at least one tinyland-nix workload captures explicit Nix bootstrap cost instead of hiding it inside total runtime
  • one comparison table exists for:
    • GitHub-hosted baseline
    • GloriousFlywheel ARC lane
    • commercial reference claims or measured trial results
  • the repo has a written “what we currently win on / what we currently lose on” summary grounded in those runs

Non-Goals

  • do not publish vendor takedowns
  • do not turn public marketing claims into internal truth without measurement
  • do not benchmark every repo before the first comparison set is useful
