Bazel Benchmarking
GloriousFlywheel benchmark evidence must keep three Bazel modes separate:
- cold-local: isolated local Bazel output base, no remote cache, no executor
- shared-cache-backed: shared Bazel remote cache, no executor
- executor-backed: shared Bazel remote cache plus explicit BAZEL_REMOTE_EXECUTOR
Do not combine these modes into one speed claim. Cache-backed local execution and executor-backed action execution answer different product questions.
Harness
Use the repo-managed harness:
scripts/benchmark/gf-bazel-mode-benchmark.sh \
--mode shared-cache-backed \
--command build \
--target //app:build \
--runs 3
For executor-backed samples:
GF_BAZEL_SUBSTRATE_MODE=executor-backed \
BAZEL_REMOTE_CACHE=grpc://bazel-cache.nix-cache.svc.cluster.local:9092 \
BAZEL_REMOTE_EXECUTOR=grpc://gf-reapi-cell.gf-rbe.svc.cluster.local:8980 \
scripts/benchmark/gf-bazel-mode-benchmark.sh \
--mode executor-backed \
--command build \
--target //app:build \
--runs 3
Executor-backed samples default to GF_BENCHMARK_FORCE_EXECUTION=true, which
passes --remote_accept_cached=false. That keeps cache-hit-only runs from
being cited as RBE evidence.
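As a rough illustration of how the three modes differ on the Bazel command line, the sketch below maps a mode name to remote flags. The function name gf_mode_flags is hypothetical and this is not the harness's actual implementation. The environment variables and --remote_accept_cached=false come from the text above; --remote_cache and --remote_executor are the standard Bazel flags those endpoints are assumed to feed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Bazel remote flags assumed for each mode.
# A sketch of the mode separation only, not the real harness logic.
gf_mode_flags() {
  local mode="$1"
  case "$mode" in
    cold-local)
      # No remote cache and no executor: nothing to add.
      ;;
    shared-cache-backed)
      if [ -z "${BAZEL_REMOTE_CACHE:-}" ]; then
        echo "BAZEL_REMOTE_CACHE is required for $mode" >&2
        return 1
      fi
      printf '%s\n' "--remote_cache=${BAZEL_REMOTE_CACHE}"
      ;;
    executor-backed)
      if [ -z "${BAZEL_REMOTE_CACHE:-}" ] || [ -z "${BAZEL_REMOTE_EXECUTOR:-}" ]; then
        echo "BAZEL_REMOTE_CACHE and BAZEL_REMOTE_EXECUTOR are required for $mode" >&2
        return 1
      fi
      printf '%s\n' "--remote_cache=${BAZEL_REMOTE_CACHE}"
      printf '%s\n' "--remote_executor=${BAZEL_REMOTE_EXECUTOR}"
      # Forced execution is on unless explicitly disabled, so cached
      # action results are not accepted in place of remote execution.
      if [ "${GF_BENCHMARK_FORCE_EXECUTION:-true}" = "true" ]; then
        printf '%s\n' "--remote_accept_cached=false"
      fi
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2
      ;;
  esac
}
```

Calling gf_mode_flags executor-backed with both endpoints exported prints three flags, one per line. Setting GF_BENCHMARK_FORCE_EXECUTION=false drops the last one, which is the case the evidence rules refuse to count.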
To run every mode:
scripts/benchmark/gf-bazel-mode-benchmark.sh \
--mode all \
--command build \
--target //app:build \
--runs 3
Output
The harness writes one JSON file and one log file per sample into RESULTS_DIR
(default /tmp/gf-benchmark). Each JSON sample records:
- bazel_mode
- target and Bazel command
- total/build duration
- remote cache endpoint used for the sample
- remote executor endpoint used for the sample
- whether execution was forced
- whether the Bazel output base was isolated
- log file path
- success or failure status
Generate a markdown scorecard from a result directory:
scripts/benchmark/parse-results.sh /tmp/gf-benchmark
GitHub Workflow
The manual Runner Benchmarks workflow supports bazel-build and
bazel-test. Select bazel_mode separately from the workload. If
executor-backed is selected, the chosen runner lane must already receive
BAZEL_REMOTE_CACHE, BAZEL_REMOTE_EXECUTOR, and
GF_BAZEL_SUBSTRATE_MODE=executor-backed.
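A runner lane could verify that wiring before any sample is taken. The preflight below is a minimal sketch under that assumption. The function name gf_check_executor_env is hypothetical and not part of the workflow; only the three variable names and the executor-backed value come from the paragraph above.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: fail fast when an executor-backed lane lacks
# the environment described above. Returns 0 only if all three are set.
gf_check_executor_env() {
  local missing=""
  [ -n "${BAZEL_REMOTE_CACHE:-}" ] || missing="$missing BAZEL_REMOTE_CACHE"
  [ -n "${BAZEL_REMOTE_EXECUTOR:-}" ] || missing="$missing BAZEL_REMOTE_EXECUTOR"
  [ "${GF_BAZEL_SUBSTRATE_MODE:-}" = "executor-backed" ] \
    || missing="$missing GF_BAZEL_SUBSTRATE_MODE"
  if [ -n "$missing" ]; then
    echo "executor-backed lane is missing or misconfigured:$missing" >&2
    return 1
  fi
  return 0
}
```

Running this as the first step of an executor-backed job would surface a mis-wired lane as an explicit failure instead of a sample that silently fell back to another mode.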
Evidence Rules
- A shared-cache-backed sample is cache-forward evidence, not RBE evidence.
- An executor-backed sample is only countable when the sample JSON contains a non-empty remote_executor and forced_execution=true, or the run otherwise includes explicit proof that actions executed remotely.
- A cold-local sample is a baseline, not the intended product path.
- RustFS state or Attic health does not imply RBE CAS/action-cache readiness.
- Publish benchmark numbers only with the raw JSON artifact or a cited scorecard generated from it.
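The JSON half of the executor-backed rule can be checked mechanically. The sketch below is one way to do it, assuming jq is installed and that the sample stores the fields as bazel_mode, remote_executor, and forced_execution (boolean or the string "true"). The function name gf_sample_countable is hypothetical. It does not cover the alternative branch of the rule, explicit proof from the run itself.

```shell
#!/usr/bin/env bash
# Hypothetical check: exit 0 only if a sample JSON satisfies the
# executor-backed evidence rule. Field names and value shapes are
# assumptions based on the rule text, not a confirmed schema.
gf_sample_countable() {
  jq -e '
    (.bazel_mode == "executor-backed")
    and ((.remote_executor // "") != "")
    and ((.forced_execution | tostring) == "true")
  ' "$1" >/dev/null
}
```

Looping this over the result directory and keeping only the files for which it succeeds gives the set of samples that may be cited as RBE evidence. Anything else needs the separate proof the rule mentions.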
Validate harness wiring with:
just bazel-benchmark-modes-contract-check