GloriousFlywheel

Bazel Benchmarking

GloriousFlywheel benchmark evidence must keep three Bazel modes separate:

cold-local: isolated local Bazel output base, no remote cache, no executor
shared-cache-backed: shared Bazel remote cache, no executor
executor-backed: shared Bazel remote cache plus explicit BAZEL_REMOTE_EXECUTOR

Do not combine these modes into one speed claim. Cache-backed local execution and executor-backed action execution answer different product questions.

Harness

Use the repo-managed harness:

scripts/benchmark/gf-bazel-mode-benchmark.sh \
  --mode shared-cache-backed \
  --command build \
  --target //app:build \
  --runs 3

For executor-backed samples:

GF_BAZEL_SUBSTRATE_MODE=executor-backed \
BAZEL_REMOTE_CACHE=grpc://bazel-cache.nix-cache.svc.cluster.local:9092 \
BAZEL_REMOTE_EXECUTOR=grpc://gf-reapi-cell.gf-rbe.svc.cluster.local:8980 \
scripts/benchmark/gf-bazel-mode-benchmark.sh \
  --mode executor-backed \
  --command build \
  --target //app:build \
  --runs 3

Executor-backed samples default to GF_BENCHMARK_FORCE_EXECUTION=true, which passes --remote_accept_cached=false to Bazel. This prevents a run served entirely from cache hits from being cited as remote build execution (RBE) evidence.
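As a rough illustration of how the three modes differ at the Bazel command line, the sketch below maps a mode name to an argument list. It is not the harness's actual implementation. The function name `bazel_invocation`, its parameters, and the default output base path are hypothetical. It also assumes that isolation is achieved with the `--output_base` startup option. The flags `--remote_cache`, `--remote_executor`, and `--remote_accept_cached` are standard Bazel options.

```python
from typing import List, Optional

MODES = ("cold-local", "shared-cache-backed", "executor-backed")


def bazel_invocation(
    mode: str,
    command: str,
    target: str,
    cache: Optional[str] = None,
    executor: Optional[str] = None,
    force_execution: bool = True,
    output_base: str = "/tmp/gf-cold-output-base",  # hypothetical default path
) -> List[str]:
    """Build a Bazel argv for one benchmark mode (illustrative sketch only)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")

    startup: List[str] = []  # startup options go before the Bazel command
    flags: List[str] = []    # command options go after it

    if mode == "cold-local":
        # Assumption: isolation means a dedicated output base and no remote endpoints.
        startup.append(f"--output_base={output_base}")
    else:
        if not cache:
            raise ValueError(f"{mode} requires a remote cache endpoint")
        flags.append(f"--remote_cache={cache}")

    if mode == "executor-backed":
        if not executor:
            raise ValueError("executor-backed requires a remote executor endpoint")
        flags.append(f"--remote_executor={executor}")
        if force_execution:
            # Reject cached results so actions have to execute remotely.
            flags.append("--remote_accept_cached=false")

    return ["bazel", *startup, command, *flags, target]
```

Only the executor-backed branch ever emits `--remote_executor`, which is what keeps a shared-cache-backed run from being mistaken for remote execution.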

To run every mode:

scripts/benchmark/gf-bazel-mode-benchmark.sh \
  --mode all \
  --command build \
  --target //app:build \
  --runs 3

Output

The harness writes one JSON file and one log file per sample into RESULTS_DIR (default /tmp/gf-benchmark). Each JSON sample records:

bazel_mode
target and Bazel command
total/build duration
remote cache endpoint used for the sample
remote executor endpoint used for the sample
whether execution was forced
whether the Bazel output base was isolated
log file path
success or failure status
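The per-sample JSON files can be summarized without mixing modes. The sketch below groups successful samples by `bazel_mode` and reports a median per mode. The field names `status` and `duration_seconds` are assumptions made for illustration, since the exact keys are not listed here. The helper names are hypothetical as well.

```python
import json
from collections import defaultdict
from pathlib import Path
from statistics import median
from typing import Dict, Iterable, List


def load_samples(results_dir: str) -> List[dict]:
    """Read every *.json sample file in a results directory."""
    return [json.loads(p.read_text()) for p in sorted(Path(results_dir).glob("*.json"))]


def median_duration_by_mode(samples: Iterable[dict]) -> Dict[str, float]:
    """Return the median duration per bazel_mode, never across modes."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for sample in samples:
        # Assumed keys: "status" and "duration_seconds" are illustrative names.
        if sample.get("status") != "success":
            continue
        grouped[sample["bazel_mode"]].append(float(sample["duration_seconds"]))
    return {mode: median(values) for mode, values in grouped.items()}
```

A typical call would be `median_duration_by_mode(load_samples("/tmp/gf-benchmark"))`, which yields one number per mode rather than a single combined figure.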

Generate a markdown scorecard from a result directory:

scripts/benchmark/parse-results.sh /tmp/gf-benchmark

GitHub Workflow

The manual Runner Benchmarks workflow supports bazel-build and bazel-test. Select bazel_mode separately from the workload. If executor-backed is selected, the chosen runner lane must already receive BAZEL_REMOTE_CACHE, BAZEL_REMOTE_EXECUTOR, and GF_BAZEL_SUBSTRATE_MODE=executor-backed.

Evidence Rules

A shared-cache-backed sample is cache-forward evidence, not RBE evidence.
An executor-backed sample counts as RBE evidence only when the sample JSON contains a non-empty remote_executor and forced_execution=true, or when the run otherwise includes explicit proof that actions executed remotely.
A cold-local sample is a baseline, not the intended product path.
RustFS state or Attic health does not imply RBE CAS/action-cache readiness.
Publish benchmark numbers only with the raw JSON artifact or a cited scorecard generated from it.
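The rules above can be approximated as a mechanical check on one sample record. This is a minimal sketch, not the project's contract check. It uses the `bazel_mode`, `remote_executor`, and `forced_execution` keys named above, and assumes `forced_execution` is stored either as a JSON boolean or as the string "true". The function names and the returned labels are hypothetical. Only the JSON-based branch of the executor-backed rule is covered, so any other proof of remote execution still needs manual review.

```python
def is_countable_rbe_evidence(sample: dict) -> bool:
    """True when the sample JSON alone qualifies as RBE evidence."""
    executor = str(sample.get("remote_executor") or "").strip()
    forced = sample.get("forced_execution") in (True, "true")
    return sample.get("bazel_mode") == "executor-backed" and bool(executor) and forced


def evidence_label(sample: dict) -> str:
    """Classify a sample by what it can be cited for (labels are illustrative)."""
    mode = sample.get("bazel_mode")
    if mode == "cold-local":
        return "baseline"
    if mode == "shared-cache-backed":
        return "cache-forward"
    if is_countable_rbe_evidence(sample):
        return "rbe"
    # Executor-backed without forced execution, or an unknown mode.
    return "needs-manual-proof"
```

Under this sketch, a shared-cache-backed sample can never be labeled `rbe`, even if it happens to carry a `remote_executor` value.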

Validate harness wiring with:

just bazel-benchmark-modes-contract-check
