Bazel Benchmarking

GloriousFlywheel benchmark evidence must keep three Bazel modes separate:

  • cold-local: isolated local Bazel output base, no remote cache, no executor
  • shared-cache-backed: shared Bazel remote cache, no executor
  • executor-backed: shared Bazel remote cache plus explicit BAZEL_REMOTE_EXECUTOR

Do not combine these modes into one speed claim. Cache-backed local execution and executor-backed action execution answer different product questions.
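
To make the distinction concrete, the sketch below maps each mode to the standard Bazel flags it implies. The function name `mode_flags` is hypothetical and is not part of the harness; the harness's real flag handling may differ. Only `--remote_cache`, `--remote_executor`, and `--remote_accept_cached` are actual Bazel options.

```shell
# Hypothetical helper: print the Bazel command flags implied by each mode.
# Assumes BAZEL_REMOTE_CACHE and BAZEL_REMOTE_EXECUTOR are already set where needed.
mode_flags() {
  case "$1" in
    cold-local)
      # No remote cache and no executor, so no remote flags at all.
      # Output-base isolation is a Bazel startup option (--output_base)
      # and is not modeled here.
      ;;
    shared-cache-backed)
      printf '%s\n' "--remote_cache=${BAZEL_REMOTE_CACHE}"
      ;;
    executor-backed)
      # Refuse cached action results so that actions really execute remotely.
      printf '%s\n' "--remote_cache=${BAZEL_REMOTE_CACHE} --remote_executor=${BAZEL_REMOTE_EXECUTOR} --remote_accept_cached=false"
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}
```

Because the three flag sets differ, timings taken under one mode say nothing about the others.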

Harness

Use the repo-managed harness:

scripts/benchmark/gf-bazel-mode-benchmark.sh \
  --mode shared-cache-backed \
  --command build \
  --target //app:build \
  --runs 3

For executor-backed samples:

GF_BAZEL_SUBSTRATE_MODE=executor-backed \
BAZEL_REMOTE_CACHE=grpc://bazel-cache.nix-cache.svc.cluster.local:9092 \
BAZEL_REMOTE_EXECUTOR=grpc://gf-reapi-cell.gf-rbe.svc.cluster.local:8980 \
scripts/benchmark/gf-bazel-mode-benchmark.sh \
  --mode executor-backed \
  --command build \
  --target //app:build \
  --runs 3

Executor-backed samples default to GF_BENCHMARK_FORCE_EXECUTION=true, which passes --remote_accept_cached=false to Bazel so that actions are not satisfied from previously cached results. This prevents a run made up entirely of cache hits from being cited as RBE evidence.

To run every mode:

scripts/benchmark/gf-bazel-mode-benchmark.sh \
  --mode all \
  --command build \
  --target //app:build \
  --runs 3

Output

The harness writes one JSON file and one log file per sample into RESULTS_DIR (default /tmp/gf-benchmark). Each JSON sample records:

  • bazel_mode
  • target and Bazel command
  • total/build duration
  • remote cache endpoint used for the sample
  • remote executor endpoint used for the sample
  • whether execution was forced
  • whether the Bazel output base was isolated
  • log file path
  • success or failure status

Generate a markdown scorecard from a result directory:

scripts/benchmark/parse-results.sh /tmp/gf-benchmark

GitHub Workflow

The manual Runner Benchmarks workflow supports the bazel-build and bazel-test workloads. Select bazel_mode separately from the workload. If executor-backed is selected, the chosen runner lane must already provide BAZEL_REMOTE_CACHE, BAZEL_REMOTE_EXECUTOR, and GF_BAZEL_SUBSTRATE_MODE=executor-backed.
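
A runner lane could guard the executor-backed requirement with a preflight check along these lines. This is a minimal sketch: the function name `check_executor_env` is hypothetical, and it only tests the three variables named above.

```shell
# Hypothetical preflight: fail early if an executor-backed lane lacks its environment.
check_executor_env() {
  missing=""
  [ -n "${BAZEL_REMOTE_CACHE:-}" ] || missing="$missing BAZEL_REMOTE_CACHE"
  [ -n "${BAZEL_REMOTE_EXECUTOR:-}" ] || missing="$missing BAZEL_REMOTE_EXECUTOR"
  # The substrate mode must be set to the exact value, not merely non-empty.
  [ "${GF_BAZEL_SUBSTRATE_MODE:-}" = "executor-backed" ] || missing="$missing GF_BAZEL_SUBSTRATE_MODE"
  if [ -n "$missing" ]; then
    echo "executor-backed lane has missing or wrong values for:$missing" >&2
    return 1
  fi
}
```

Running `check_executor_env || exit 1` before the harness would stop a misconfigured lane before it produces samples.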

Evidence Rules

  • A shared-cache-backed sample is cache-forward evidence, not RBE evidence.
  • An executor-backed sample counts only when its JSON contains a non-empty remote_executor and forced_execution=true, or when the run includes other explicit proof that actions executed remotely.
  • A cold-local sample is a baseline, not the intended product path.
  • RustFS state or Attic health does not imply RBE CAS/action-cache readiness.
  • Publish benchmark numbers only with the raw JSON artifact or a cited scorecard generated from it.
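
The executor-backed rule above can be approximated mechanically. The sketch below is illustrative only: `counts_as_rbe_evidence` is a hypothetical name, it takes the three values as arguments instead of parsing JSON itself, and it does not cover the "other explicit proof" branch, which still needs a human reviewer.

```shell
# Hypothetical helper: decide whether a sample counts as RBE evidence.
#   $1 = bazel_mode, $2 = remote_executor, $3 = forced_execution
counts_as_rbe_evidence() {
  [ "$1" = "executor-backed" ] && [ -n "$2" ] && [ "$3" = "true" ]
}

# Possible usage with jq, assuming the sample JSON uses the keys
# bazel_mode, remote_executor, and forced_execution ("sample.json" is a
# placeholder path):
#   counts_as_rbe_evidence \
#     "$(jq -r '.bazel_mode // ""' sample.json)" \
#     "$(jq -r '.remote_executor // ""' sample.json)" \
#     "$(jq -r '.forced_execution // ""' sample.json)"
```

The `// ""` fallback matters because `jq -r` prints the string `null` for a missing key, which would otherwise pass the non-empty test.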

Validate harness wiring with:

just bazel-benchmark-modes-contract-check
