GF REAPI Cell

gf-reapi-cell is the first GloriousFlywheel-owned remote execution proof cell. It exists to turn the RBE lane from planning language into a runnable REAPI endpoint without adopting BuildBuddy, Buildbarn, Buildfarm, or NativeLink as the product authority.

Current status:

  • implements Capabilities, ByteStream, CAS, Action Cache, Execution, and WaitExecution
  • stores CAS and action-cache data on the service-local filesystem, scoped by validated REAPI instance_name (default, system, or spoke-<slug>)
  • can optionally enforce JWT-backed tenant authorization for CAS, AC, ByteStream, Execute, and WaitExecution using Authorization: Bearer <jwt>; the default deployment remains GF_REAPI_AUTHZ_MODE=off until rollout
  • builds through nix build .#gf-reapi-cell
  • exposes an OCI image package as nix build .#gf-reapi-cell-image
  • publishes through .github/workflows/publish-gf-reapi-cell.yml to ghcr.io/tinyland-inc/gf-reapi-cell by digest
  • carries a minimal worker runtime envelope including /bin/sh, /usr/bin/env, Node 22, Python 3, glibc, the /lib64/ld-linux-x86-64.so.2 loader bridge, the compiler C++ runtime needed by hermetic Linux tool inputs such as rules_nodejs Node, and common POSIX archive tools needed by first-proof Bazel actions
  • carries Chromium in the browser-capable image path for the proved bounded Playwright static-site smoke class and the proved public consumer Puppeteer static-output smoke class; additional browser target classes still need forced proof before promotion, and browser binaries must come from the pinned Browser Runtime Authority, not from npm lifecycle downloads
  • can be exercised through the manual .github/workflows/gf-reapi-cell-proof.yml workflow or scripts/run-gf-reapi-cell-proof.sh
  • emits worker, platform, action digest, and command evidence in execution logs
  • is intended only for the explicit proof wrapper, scripts/bazel-rbe-proof.sh
  • is not wired into .bazelrc, ordinary Just build recipes, ARC runners, or public adoption docs as the default path

The initial countable proof target was //app:build. PR #564 proved that target through the explicit wrapper with worker image sha256:be2832171ac69cc9a2d012b3c789e8b765afb7cae0df8f7e9677dd6d8542dbc0; Bazel reported 2308 processes: 1439 internal, 869 remote, and both app/sveltekit_sync and app/vite_build exited 0 on the REAPI worker. PR #565 made the proof strict by default with GF_RBE_PROOF_FORCE_EXECUTION=true, --remote_accept_cached=false, fresh-window worker logs, and cache-hit-only rejection. A proof counts only when Bazel is invoked with a non-empty executor endpoint, the explicit wrapper passes both cache and executor flags, and logs show an action running through the REAPI worker rather than a cache hit or remote CI runner.

PR #582 added explicit build/test proof selection and proved //app:unit_tests on the default branch with GF_RBE_PROOF_BAZEL_COMMAND=test. Main run 25601913985 reported 1249 processes: 722 internal, 527 remote, 20 Vitest test files, 168 passing tests, and remote worker evidence for external/bazel_tools/tools/test/test-setup.sh app/unit_tests_/unit_tests.

Main run 25602726443 proved //:deployment_bundle with the Bazel build command, forced execution, 7 processes: 6 internal, 1 remote, and remote worker evidence for the rules_pkg build_tar action that writes deployment_bundle.tar.gz.

PR #605 fixed a proof-cell response-contract bug discovered while testing //examples/hello-go:hello_test: the cell was inlining every declared output file into ActionResult.OutputFile.contents instead of honoring only ExecuteRequest.InlineOutputFiles. The fixed cell image, sha256:bb5455a038bdbff2560f22491c131c2163d3089ffafedee08f937d63f35fa848, removed the prior 4 MiB Execute-response failure. The follow-up Go proof run 25632300253 then reached real rules_go remote actions before failing in GoStdlib runtime/cgo with cc: no such file or directory. Run 25634296833 then proved the intentionally pure-Go pure = "on" target with bazel_command=test, forced execution, 11 remote processes, and remote test-setup evidence for //examples/hello-go:hello_test. After the worker image carried the C/C++ wrapper closure, run 25649628233 proved the separate cgo-backed //examples/hello-go-cgo:cgo_test target with remote runtime/cgo, cgo compile, link, and test-setup evidence.

Storage Boundary

Do not back this v0 CAS/action-cache with the current RustFS service. RustFS still has bucket-index reliability debt and remains guarded interim infrastructure for existing cache/state checks. The disqualification is about the current evidence, not the product ambition: RustFS returned NoSuchBucket for existing bucket data/metadata and recovered only after a controlled restart, and there is no proved signed non-restart repair runbook for that recurrence. The proof cell should use separate ephemeral local storage for the first execution proof, then graduate to a separately designed CAS/action-cache authority only after backend reliability, auth, retention, tenant isolation, write admission, restore, and observability are selected and proved.

The durability seam for that graduation now exists in code: CAS and action-cache persistence flow through a provider-neutral BlobStore interface (internal/cell/blobstore.go) selected by GF_REAPI_BLOBSTORE_BACKEND. The default local backend preserves the historical service-local filesystem layout byte-for-byte. The s3 backend (GF_REAPI_S3_ENDPOINT, GF_REAPI_S3_BUCKET, GF_REAPI_S3_*) targets S3-compatible endpoints without selecting a provider. The live GloriousFlywheel storage plane already uses RustFS for existing self-hosted S3-compatible cache/state paths. That does not automatically promote the current RustFS topology to CAS/AC authority; TIN-1147 must prove repair, restore, retention, failure-domain behavior, and bucket-index coherence before any RustFS-backed CAS/action-cache namespace is trusted. The S3 client is dependency-free (Go stdlib plus an AWS SigV4 signer), so no vendor SDK enters the data path and the Nix vendorHash is unchanged. S3 credentials need bucket reachability plus object GET, PUT, and HEAD; /readyz performs a signed HeadBucket. The default backend stays local until an operator selects and proves an S3 endpoint and namespace for CAS/action-cache.

Age-based garbage collection (W1.3/TIN-1460) is also wired: setting GF_REAPI_BLOB_TTL (a Go duration, e.g. 168h) enables a background sweeper that evicts CAS and action-cache entries that have not been accessed within the TTL. The sweep is scoped strictly to instances/<name>/{cas,ac}, so execution scratch, the AC audit log, and quarantine markers are never touched. GF_REAPI_GC_INTERVAL sets the sweep cadence (default 1h). The local backend implements the sweep; the S3 backend deliberately does not, because object expiry there belongs to bucket lifecycle/ILM rules, so the sweeper logs that it is relying on backend policy and exits. Sweep activity is exported as gf_reapi_gc_runs_total, gf_reapi_gc_evicted_objects_total, gf_reapi_gc_evicted_bytes_total, and gf_reapi_gc_errors_total. /readyz returns 503 (with the failing reason) when the configured blob backend is unreachable.
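The scoping rule for the sweeper can be illustrated with a small planning function. This is a sketch only, not the cell's Go implementation: it assumes a flat layout of one file per entry directly under instances/<name>/cas and instances/<name>/ac, and the function name plan_ttl_sweep and the in-memory last-access map are hypothetical.

```python
import re

# Only entries directly under instances/<name>/cas or instances/<name>/ac are
# in scope. The flat one-file-per-entry layout is an assumption of this sketch.
_SWEEP_SCOPE = re.compile(r"instances/[^/]+/(cas|ac)/[^/]+")


def plan_ttl_sweep(last_access_by_path: dict, ttl_seconds: float, now: float) -> list:
    """Return the in-scope paths whose last access is older than the TTL."""
    return sorted(
        path
        for path, last_access in last_access_by_path.items()
        if _SWEEP_SCOPE.fullmatch(path) and now - last_access > ttl_seconds
    )
```

Because the scope pattern requires the cas or ac directory name to be followed directly by a separator, a path such as instances/default/ac-quarantine/x.json never matches, which mirrors the guarantee that quarantine markers and the audit log are left alone.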

GC is reconciled with the Bazel client cache lease (W1.3/TIN-1460): Bazel trusts that a referenced blob stays in the CAS for --experimental_remote_cache_ttl (default 3h, set explicitly on the ci-cached config). If GC evicted a blob inside that window it would break a build mid-flight, so when the deployment declares the served lease via GF_REAPI_MIN_CLIENT_CACHE_TTL, the cell refuses to start unless GF_REAPI_BLOB_TTL >= that floor. just gf-reapi-cell-lease-gc-reconcile-check is the static CI mirror of that guard.
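The startup guard amounts to comparing two durations. The sketch below is an illustration in Python rather than the cell's Go code; parse_go_duration handles only the non-negative subset of Go duration syntax needed here, and check_lease_floor is a hypothetical name.

```python
import re

# Seconds per unit for the subset of Go duration units this sketch accepts.
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|ms|s|m|h)")


def parse_go_duration(text: str) -> float:
    """Parse a non-negative Go-style duration such as '168h' or '1h30m' into seconds."""
    if text == "0":
        return 0.0
    pos, total = 0, 0.0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total


def check_lease_floor(env: dict) -> None:
    """Raise ValueError when the GC TTL is shorter than the declared client lease."""
    blob_ttl = env.get("GF_REAPI_BLOB_TTL", "")
    floor = env.get("GF_REAPI_MIN_CLIENT_CACHE_TTL", "")
    if not blob_ttl or not floor:
        return  # GC disabled or no lease declared: nothing to reconcile
    if parse_go_duration(blob_ttl) < parse_go_duration(floor):
        raise ValueError(
            f"GF_REAPI_BLOB_TTL={blob_ttl} is below GF_REAPI_MIN_CLIENT_CACHE_TTL={floor}"
        )
```

With GF_REAPI_BLOB_TTL=168h and a 3h lease floor the check passes; with a 1h TTL against the same floor it raises, which corresponds to the refuse-to-start behavior.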

The first W1.4/TIN-1461 size-bound primitive is present for the local backend: GF_REAPI_CAS_MAX_BYTES accepts a byte count with optional Ki/Mi/Gi/Ti suffix. It requires GF_REAPI_MIN_CLIENT_CACHE_TTL, evicts only CAS blobs older than that lease floor, orders eligible candidates by LRU, and reconciles durable quota counters after reclamation. It emits gf_reapi_size_eviction_runs_total, gf_reapi_size_evicted_objects_total, gf_reapi_size_evicted_bytes_total, gf_reapi_size_eviction_errors_total, and the gf_reapi_evicted_while_referenced_total tripwire that should remain zero by construction. Sharded/replicated topology, p99-tuned TTL from live dashboards, and threading request contexts into blob I/O remain separate gates.
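A minimal sketch of the size-bound selection, under stated assumptions: blob age is measured from last access, binary suffixes are powers of 1024, and the names Blob, parse_byte_count, and plan_size_eviction are hypothetical. It shows why the referenced-blob tripwire should stay at zero: blobs inside the lease floor are never candidates, even when the store is over its limit.

```python
import re
from dataclasses import dataclass

_SUFFIX = {"": 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_byte_count(text: str) -> int:
    """Parse a byte count with an optional Ki/Mi/Gi/Ti suffix, e.g. '50Gi'."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", text.strip())
    if match is None:
        raise ValueError(f"invalid byte count: {text!r}")
    return int(match.group(1)) * _SUFFIX[match.group(2) or ""]


@dataclass(frozen=True)
class Blob:
    digest: str
    size: int
    last_access: float  # Unix seconds; using last access as "age" is an assumption


def plan_size_eviction(blobs: list, max_bytes: int, lease_floor_seconds: float, now: float) -> list:
    """Pick least-recently-used blobs outside the lease floor until the store fits."""
    total = sum(blob.size for blob in blobs)
    eligible = sorted(
        (blob for blob in blobs if now - blob.last_access > lease_floor_seconds),
        key=lambda blob: blob.last_access,
    )
    evicted = []
    for blob in eligible:
        if total <= max_bytes:
            break
        evicted.append(blob)
        total -= blob.size
    return evicted
```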

The first W4.4/TIN-1475 quota primitive is present as an in-process policy behind GF_REAPI_QUOTAS. The policy is JSON with a default object and optional instances map keyed by REAPI instance_name; each entry can set maxConcurrentExecutions and maxBlobBytes, where zero means unlimited. Incoming CAS blob-size breaches and Execute concurrency breaches return gRPC ResourceExhausted and increment gf_reapi_quota_rejected_total{dimension="execution_concurrency|blob_size"}.
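As an illustration of the policy shape, the sketch below resolves limits for an instance and applies the zero-means-unlimited rule. It uses the field names given above; whether a per-instance entry merges field by field with the default object or replaces it wholesale is not stated here, so the merge shown is an assumption, and the function names are hypothetical.

```python
import json


def load_quota_policy(raw: str) -> dict:
    """Parse the GF_REAPI_QUOTAS JSON; an empty value means no quotas."""
    return json.loads(raw) if raw else {}


def limits_for(policy: dict, instance_name: str) -> dict:
    """Overlay the per-instance entry on the default object (assumed merge rule)."""
    merged = dict(policy.get("default", {}))
    merged.update(policy.get("instances", {}).get(instance_name, {}))
    return merged


def admit_blob(policy: dict, instance_name: str, blob_size: int) -> bool:
    """False corresponds to a ResourceExhausted rejection for blob size."""
    limit = limits_for(policy, instance_name).get("maxBlobBytes", 0)
    return limit == 0 or blob_size <= limit


def admit_execution(policy: dict, instance_name: str, inflight: int) -> bool:
    """False corresponds to a ResourceExhausted rejection for concurrency."""
    limit = limits_for(policy, instance_name).get("maxConcurrentExecutions", 0)
    return limit == 0 or inflight < limit
```

A policy such as {"default": {"maxBlobBytes": 1024}, "instances": {"spoke-alpha": {"maxConcurrentExecutions": 2}}} would, under this merge assumption, cap spoke-alpha at two concurrent executions while leaving default-instance concurrency unlimited.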

maxCasBytes and maxAcEntries (W4.6/TIN-1718) add durable per-tenant limits on total stored CAS bytes and action-cache entries. Their counters are seeded from an authoritative backend scan at startup, maintained live on each new write (dedup and overwrites do not double-count), and reconciled after every GC sweep, so the accounting survives process restarts. Breaches emit gf_reapi_quota_rejected_total{dimension="cas_bytes|ac_entries"}, and current usage is exported as the gf_reapi_tenant_cas_bytes / gf_reapi_tenant_ac_entries gauges (the data source for a per-tenant quota dashboard panel, which lives in the observability lane). Durable accounting requires a usage-scannable backend: the local backend supports it, while configuring a byte/entry quota on the s3 backend fails startup because that retention belongs to bucket policy.

The first W4.3/TIN-1474 executor-pool admission primitive is present behind GF_REAPI_EXECUTOR_POOLS. The policy is JSON with an optional propertyName (default Pool), a default rule, and optional instances overrides keyed by REAPI instance_name; each rule lists allowedPools. When a rule has pools, Execute loads the stored Action, reads its Action.platform exec property, and rejects missing, duplicated, or unauthorized pool values before AC lookup or execution. Rejections return gRPC PermissionDenied/InvalidArgument and increment gf_reapi_executor_pool_rejected_total{reason="missing|unauthorized|ambiguous"}.

Admitted executions then pass through the first scheduler/placement seam, which records gf_reapi_scheduler_enqueued_total, gf_reapi_scheduler_started_total, gf_reapi_scheduler_completed_total, gf_reapi_scheduler_queued, gf_reapi_scheduler_inflight, and gf_reapi_scheduler_queue_seconds_bucket/_sum/_count by REAPI instance_name and pool before acquiring a scheduler worker lease. The first operator-facing W5.3 fairness view lives in docs/monitoring/gf-reapi-fairness-dashboard.json; it derives p95 queue time, max/median tenant skew, queued/running executions, throughput, and worker-pool saturation from those metrics. The first runner-side W5.4 TTFCH view lives in docs/monitoring/gf-runner-ttfch-dashboard.json and is fed by .github/workflows/ttfch-probe.yml rather than by the cell itself.

GF_REAPI_WORKER_POOLS adds the first local worker-pool dispatch guardrail: a JSON policy with a default rule and optional pools overrides keyed by admitted executor-pool name, where slots bounds concurrent local worker leases for that pool and optional static workers give the lease concrete worker identity/provenance. Saturated pools queue before worker start, and /metrics exports gf_reapi_worker_pool_slots, gf_reapi_worker_pool_available_slots, and gf_reapi_worker_pool_registered_workers. This is measurable scheduler plumbing and a real local worker-pool inventory boundary.
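The executor-pool admission check described for GF_REAPI_EXECUTOR_POOLS can be sketched as a pure function over the policy and the action's platform properties. This is an illustration, not the cell's code: the function name is hypothetical, platform properties are modeled as (name, value) pairs, and a per-instance rule is assumed to replace the default rule rather than merge with it.

```python
from typing import Optional


def pool_rejection_reason(policy: dict, instance_name: str, platform_properties: list) -> Optional[str]:
    """Return 'missing', 'ambiguous', or 'unauthorized', or None when admitted."""
    # Assumption: an instance override replaces the default rule entirely.
    rule = policy.get("instances", {}).get(instance_name, policy.get("default", {}))
    allowed = rule.get("allowedPools", [])
    if not allowed:
        return None  # no pools configured for this rule: admission is not enforced
    property_name = policy.get("propertyName", "Pool")
    values = [value for name, value in platform_properties if name == property_name]
    if not values:
        return "missing"
    if len(values) > 1:
        return "ambiguous"
    if values[0] not in allowed:
        return "unauthorized"
    return None
```

The returned strings match the reason label values on gf_reapi_executor_pool_rejected_total; which gRPC code accompanies each reason is left out because it is not spelled out above.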

GF_REAPI_WORKER_REGISTRY_TTL adds the first live heartbeat registry seam. When enabled with GF_REAPI_WORKER_REGISTRY_TOKEN, proof-cell workers can post authenticated heartbeats to /worker/heartbeat; non-expired live workers are preferred for scheduler leases and reflected in Execute worker provenance. Aggregate known/live/stale/leased/available counts are exported under gf_reapi_worker_registry_*. This is still an in-memory, single-cell scheduler registry. It proves live worker identity and heartbeat plumbing, not distributed remote dispatch or a durable worker-control plane.
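The in-memory nature of that registry is easy to see in a sketch. The class below is a hypothetical Python model, not the cell's implementation: it keeps one last-seen timestamp per worker, authenticates heartbeats with a shared token, and treats a worker as live only while its last heartbeat is within the TTL. Nothing survives a restart, which is the limitation called out above.

```python
import hmac


class WorkerRegistry:
    """Single-process heartbeat registry; all state is lost on restart."""

    def __init__(self, ttl_seconds: float, token: str) -> None:
        self._ttl = ttl_seconds
        self._token = token.encode()
        self._last_seen = {}  # worker id -> Unix seconds of last heartbeat

    def heartbeat(self, worker_id: str, presented_token: str, now: float) -> bool:
        """Record a heartbeat if the token matches; return whether it was accepted."""
        if not hmac.compare_digest(presented_token.encode(), self._token):
            return False
        self._last_seen[worker_id] = now
        return True

    def live_workers(self, now: float) -> list:
        """Workers whose last heartbeat is within the TTL."""
        return sorted(w for w, seen in self._last_seen.items() if now - seen <= self._ttl)

    def stale_workers(self, now: float) -> list:
        """Known workers whose last heartbeat has expired."""
        return sorted(w for w, seen in self._last_seen.items() if now - seen > self._ttl)
```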

Within that proof-local boundary, CAS and action-cache entries are now keyed by REAPI instance_name. Empty request fields map to default; explicit spoke-<slug> traffic lands under that spoke’s local namespace; ByteStream uses the leading path segment for instance routing. This closes the first routing primitive for TIN-1472.
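The routing rules above can be sketched as follows. The ByteStream resource-name layouts ({instance_name}/blobs/{hash}/{size} for reads and {instance_name}/uploads/{uuid}/blobs/{hash}/{size} for writes, with the instance segment omitted when empty) come from the REAPI specification. The slug grammar for spoke-<slug> is not given here, so the lowercase-alphanumeric-with-hyphens pattern is an assumption, as are the function names.

```python
import re

# Assumed slug grammar: lowercase alphanumeric groups separated by single hyphens.
_SPOKE = re.compile(r"spoke-[a-z0-9]+(?:-[a-z0-9]+)*")


def normalize_instance_name(name: str) -> str:
    """Map an empty field to 'default' and reject anything outside the allowed set."""
    if name == "":
        return "default"
    if name in ("default", "system") or _SPOKE.fullmatch(name):
        return name
    raise ValueError(f"invalid instance_name: {name!r}")


def instance_from_bytestream(resource_name: str) -> str:
    """Derive the instance from the leading path segment of a ByteStream resource."""
    first = resource_name.split("/", 1)[0]
    if first in ("blobs", "uploads"):
        return "default"  # no instance segment present
    return normalize_instance_name(first)
```

Validating before the name is used as a path component matters because the instance name becomes part of the on-disk namespace; a value such as ".." is rejected rather than joined into a storage path.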

The first W4.2/TIN-1473 authz primitive is also now present in code, but it is explicitly opt-in. GF_REAPI_AUTHZ_MODE=warn|enforce enables validation of RSA-signed OIDC-shaped bearer JWTs from configured JWKS issuers. In enforce mode, the cell maps RPCs to scopes (cas:Read, cas:Write, actioncache:Read, actioncache:Write, remoteexecution:Run) and rejects a request whose token tenant/scope set does not authorize the request instance_name. Execute can run with remoteexecution:Run, but it only reads or populates the action cache when the same caller also has the corresponding actioncache:Read or actioncache:Write scope. This is still not the full tenant model. The first Bazel credential-helper slice is present as gf-reapi-credhelper: it implements Bazel’s get protocol, reads a short-lived JWT from GF_REAPI_CREDENTIAL_HELPER_TOKEN_FILE, GF_REAPI_CREDENTIAL_HELPER_TOKEN, or the default projected-token path, and returns an Authorization: Bearer header with an expiry one minute before the token’s exp claim. That makes projected-token and explicit-token callers able to exercise the authz middleware without putting credentials into .bazelrc. Token exchange, default read-only policy, full IAM/OIDC tenant-claim rollout, remote worker dispatch, and multi-replica behavior remain separate production gates.
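The credential-helper response can be sketched in a few lines. This is a Python illustration of the behavior described for gf-reapi-credhelper, not its source: the helper's get command is expected to print a JSON object with a headers map and, as far as the helper protocol is understood here, an optional RFC 3339 expires field. The function names are hypothetical, and the token payload is decoded without signature verification because the helper only needs the exp claim; verification is the server's job.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def jwt_expiry(token: str) -> datetime:
    """Read the exp claim from a JWT payload without verifying the signature."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)


def credential_helper_response(token: str) -> dict:
    """Build the JSON body for a `get` request, expiring one minute before exp."""
    expires = jwt_expiry(token) - timedelta(minutes=1)
    return {
        "headers": {"Authorization": [f"Bearer {token}"]},
        "expires": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Returning an expiry slightly ahead of the token's own exp gives the client a reason to re-invoke the helper before the server starts rejecting the token.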

The first W2.1/TIN-1462 AC writer-attestation primitive is also present in code behind GF_REAPI_AC_WRITE_ATTESTATION_MODE=off|warn|enforce. When enabled, actioncache:Write is necessary but not sufficient: the caller’s validated JWT sub must also match GF_REAPI_AC_WRITE_TRUSTED_SUBJECTS. In enforce mode, direct UpdateActionResult writes from untrusted subjects return PermissionDenied; Execute still returns the execution result but refuses to populate the action cache when the writer subject is not trusted. The cell emits gf_reapi_ac_write_rejected_total for rejected attempts and structured log lines with the source RPC, instance name, tenant, subject, JTI, action digest, and reject reason. This is AC-only by design; CAS writes remain governed by digest verification and tenant authz.
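The "necessary but not sufficient" rule can be expressed as a small decision function. This sketch is hypothetical: the trusted subjects are modeled as a set, and warn mode is assumed to allow the write while logging it, which is the usual meaning of a warn stage but is not spelled out above.

```python
def ac_write_decision(mode: str, scopes: set, subject: str, trusted_subjects: set) -> str:
    """Return 'allow', 'warn', or 'reject' for an action-cache write attempt."""
    if mode not in ("off", "warn", "enforce"):
        raise ValueError(f"unknown attestation mode: {mode!r}")
    if "actioncache:Write" not in scopes:
        return "reject"  # the scope is always required
    if mode == "off" or subject in trusted_subjects:
        return "allow"
    # Untrusted subject: assumed to be logged in warn mode, refused in enforce mode.
    return "warn" if mode == "warn" else "reject"
```

For a direct UpdateActionResult call a reject maps to PermissionDenied; for Execute it means the result is still returned but the action cache is left unpopulated.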

The first W2.3/TIN-1464 AC audit-log primitive is present as a cell-owned JSONL append log. By default, rows land at ${GF_REAPI_STORE_ROOT}/audit/ac-writes.jsonl; GF_REAPI_AC_AUDIT_LOG_PATH can point at an absolute path, a store-root-relative path, or off. Each row records timestamp, source RPC, REAPI instance_name, JWT tenant/subject/JTI, worker image digest, platform digest, action digest, outcome, gRPC code, reject reason, and attestation mode. Accepted AC writes fail closed if the audit append fails. Rejected writes keep the original PermissionDenied behavior while the cell logs any audit append error. This is durable local evidence and an in-process tenant query primitive; the operator Resource Usage API, 30-day retention policy, and dashboard surface remain follow-on production gates.

The first W2.2/TIN-1463 AC entry-tagging primitive is present in code. The cell strips caller-supplied GF platform tags and stores a server-attached tag in ActionResult.execution_metadata.auxiliary_metadata with worker_image_digest, platform_digest, platform_digest_recipe_version, written_at, writer_subject, and writer_token_id. Direct GetActionResult reads fail closed with FailedPrecondition when the tag is missing, malformed, written by a different worker image digest, or mismatched against the action platform digest. Execute treats the same refusal as an unsafe cache miss and re-runs the action instead of returning a poisoned AC hit. Refusals increment gf_reapi_ac_platform_tag_mismatch_total and append a refused AC audit row. This is still a first primitive: heterogeneous worker allow-lists, platform-recipe migration policy, and dashboard surfacing remain follow-on gates.
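The read-side tag check reduces to a short sequence of comparisons. In this sketch the tag is modeled as a plain dict with the field names listed above; the refusal reason strings and function names are hypothetical, and only the two digest fields are inspected.

```python
from typing import Optional


def tag_refusal_reason(tag, worker_image_digest: str, platform_digest: str) -> Optional[str]:
    """Return why a cached ActionResult must be refused, or None if it is safe."""
    if tag is None:
        return "missing"
    required = ("worker_image_digest", "platform_digest")
    if not isinstance(tag, dict) or any(not isinstance(tag.get(key), str) for key in required):
        return "malformed"
    if tag["worker_image_digest"] != worker_image_digest:
        return "worker_image_mismatch"
    if tag["platform_digest"] != platform_digest:
        return "platform_mismatch"
    return None


def execute_may_use_cache(tag, worker_image_digest: str, platform_digest: str) -> bool:
    """Execute treats any refusal as a cache miss and re-runs the action."""
    return tag_refusal_reason(tag, worker_image_digest, platform_digest) is None
```

The same predicate serves both paths: GetActionResult surfaces a refusal as FailedPrecondition, while Execute falls through to a fresh run.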

The first W2.4/TIN-1465 nuke-key primitive is also present for surgical AC invalidation. scripts/gf-reapi-ac-nuke-key.py nuke requires an instance_name, an action_digest, and a matching AC audit row by default. When executed, it backs up and removes exactly one instances/<instance_name>/ac/<hash>-<size>.pb entry, appends a nuke-key event, and writes a quarantine tombstone under instances/<instance_name>/ac-quarantine/<hash>-<size>.json. The server checks that tombstone on AC writes: direct UpdateActionResult fails with FailedPrecondition, while Execute returns the remote execution result but does not populate the quarantined AC key. rollback restores from the backup and removes the tombstone. See gf-reapi-cell AC Nuke-Key Runbook.
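To make the "exactly one entry" property concrete, the sketch below derives the entry and tombstone paths for a single action digest. It is an illustration and does not reproduce scripts/gf-reapi-ac-nuke-key.py: the function name is hypothetical, the instances/ tree is assumed to sit directly under the store root, and the hash is assumed to be a 64-character lowercase SHA-256 hex string.

```python
import re
from pathlib import PurePosixPath

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def ac_entry_paths(store_root: str, instance_name: str, action_hash: str, size_bytes: int) -> dict:
    """Return the AC entry path and its quarantine tombstone path for one digest."""
    if not _SHA256_HEX.fullmatch(action_hash):
        raise ValueError("action hash must be 64 lowercase hex characters")
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    base = PurePosixPath(store_root) / "instances" / instance_name
    key = f"{action_hash}-{size_bytes}"
    return {
        "entry": str(base / "ac" / f"{key}.pb"),
        "tombstone": str(base / "ac-quarantine" / f"{key}.json"),
    }
```

Validating the digest shape before building a path keeps a malformed argument from resolving to anything other than one AC key.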

The first W2.5/TIN-1466 chaos gate is also present as a dedicated workflow and contract test. just gf-reapi-ac-attestation-chaos-check runs the TestActionCacheAttestationChaosRejectsNonAttestedWriterWithPermissionDenied probe and validates the nightly/path-triggered workflow wiring. The probe gives an authenticated caller actioncache:Write tenant:spoke-alpha but uses a subject outside GF_REAPI_AC_WRITE_TRUSTED_SUBJECTS. Expected result: UpdateActionResult returns gRPC PermissionDenied (HTTP 403 equivalent), no AC entry is created, gf_reapi_ac_write_rejected_total increments, and the AC audit log records one outcome=rejected / reject_reason=untrusted_subject row. Authentication failures such as no token or wrong audience still fail before AC attestation; this chaos gate is the non-attested writer case named by TIN-1466.

For the live lab lane, the intended deployment boundary is:

  • namespace: gf-rbe
  • node class: sting or another explicit compute-expansion KVM/worker lane when available
  • storage: node-local proof PVC such as local-path-sting-fast-ephemeral
  • resource envelope: 4 CPU / 8Gi requested and 16 CPU / 16Gi limited for the single-cell proof deployment, because web TypeScript fanout can run many remote actions concurrently inside the proof cell. The memory limit preserves the accepted TypeScript proof envelope; the lower request keeps the pod schedulable while ARC runner pods drain after adjacent proof jobs.
  • residency policy: the committed manifest and local/operator script default remain scale-to-zero between proof runs; --apply proof runs temporarily scale the deployment to one replica and wait for rollout. The GitHub gf-reapi-cell-proof.yml workflow default keeps the cell resident after a successful apply so GF dogfoods a stable REAPI endpoint for TTFCH and follow-on proofs. Operators can opt back into teardown with the scale_to_zero_after_proof input.
  • no RustFS bucket, Attic bucket, or OpenTofu state bucket reuse

The scale-to-zero policy is the TIN-1249 capacity boundary for the committed manifest and for local/operator bounded proof windows. It exists because a resident gf-reapi-cell on sting can block the sting-pinned tinyland-nix-heavy lane used by Platform Proof if its request is treated as a standing reservation. The current GF-operated workflow intentionally differs: while TTFCH and RBE dogfood are active, its workflow default keeps the cell resident so hourly probes and back-to-back proofs do not race a missing service endpoint. If capacity pressure returns, operators should either dispatch with scale_to_zero_after_proof=true or move the cell behind the next scheduler / worker-pool placement gate rather than letting TTFCH silently measure an absent endpoint.

The reference manifest is deploy/gf-rbe/gf-reapi-cell.yaml. It is guarded by just gf-reapi-cell-manifest-check, which verifies the proof namespace, local proof storage class, digest-pinned image shape, platform identity, ingress boundary, deny-egress policy, and idle replica policy. The proof-window capacity assumption is guarded by just gf-reapi-cell-capacity-policy-check. The publication/rendering path is guarded by just gf-reapi-cell-publish-contract-check.

The image publication path writes registry credentials to a temporary Docker authfile and passes that file to the nix2container.copyTo/skopeo copy path. Do not replace that with --dest-creds or another token-bearing argv form: runner process listings are operator-visible during live incident work, and the publisher must keep registry write credentials out of child process arguments.
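The authfile approach can be sketched as follows. This is an illustration of the pattern, not the publisher's code: the function name is hypothetical, and the file uses the standard Docker auth layout with a base64 user:password pair under an auths map. tempfile.mkstemp creates the file readable and writable only by the owner on POSIX systems, so the credential is in neither argv nor a world-readable file.

```python
import base64
import json
import os
import tempfile


def write_registry_authfile(registry: str, username: str, password: str) -> str:
    """Write a Docker-style auth file with owner-only permissions; return its path."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    fd, path = tempfile.mkstemp(prefix="registry-auth-", suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump({"auths": {registry: {"auth": auth}}}, handle)
    return path
```

The caller would pass only the returned path to the copy tool and delete the file afterwards, so a process listing shows a filename rather than a token.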

The proof-cell image carries only a bounded worker compatibility runtime. It includes common POSIX tools, Node, Python, glibc, the Nix C/C++ wrapper closure, the C++ runtime, zlib, and now the Chromium runtime path so remote actions can run the currently proved JavaScript, packaging, pure-Go, cgo-backed Go, Rust, C++, browser/web, and private app typecheck target-class probes. The current Worker Toolchain Model records this runtime envelope. That does not make every language target eligible: target classes still need forced proof evidence before promotion, and missing worker runtime dependencies must stay recorded as blockers in config/rbe-target-eligibility.json.

Render the manifest with the published gf-reapi-cell image digest before applying it:

GF_REAPI_CELL_DIGEST="$(just gf-reapi-cell-resolve-digest --tag latest)"

GF_REAPI_CELL_DIGEST="${GF_REAPI_CELL_DIGEST}" \
bash ./scripts/render-gf-reapi-cell-manifest.sh | kubectl apply -f -

latest is only a lookup convenience. Proofs and manifests must keep recording the resolved immutable digest, not a floating tag. The resolver reads GitHub Packages/GHCR package metadata and rejects cosign signature objects as proof image inputs.
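A guard for the digest-only rule might look like the sketch below. The function names are hypothetical; the check simply requires the sha256:<64 hex> shape and builds a repository@digest reference, so a floating tag such as latest can never reach a manifest or proof record.

```python
import re

_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def require_immutable_digest(value: str) -> str:
    """Return the value unchanged if it is a sha256 digest, else raise ValueError."""
    if not _DIGEST.fullmatch(value):
        raise ValueError(f"not an immutable image digest: {value!r}")
    return value


def image_reference(repository: str, digest: str) -> str:
    """Build a digest-pinned reference such as ghcr.io/org/image@sha256:..."""
    return f"{repository}@{require_immutable_digest(digest)}"
```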

The end-to-end proof harness records the rendered manifest, Kubernetes status, Bazel output, profile, and worker logs under an evidence directory:

GF_RBE_PROOF_MODE=explicit \
GF_RBE_PROOF_FORCE_EXECUTION=true \
GF_RBE_PROOF_BAZEL_COMMAND=build \
GF_REAPI_CELL_DIGEST=sha256:<published digest> \
bash ./scripts/run-gf-reapi-cell-proof.sh --apply --target //app:build

GF_RBE_PROOF_FORCE_EXECUTION=true is the default workflow posture. It passes --remote_accept_cached=false, adds --nocache_test_results for test proofs, injects a non-secret GF_RBE_PROOF_NONCE action environment value so cache-warm target classes get fresh action keys, and rejects proof runs that only show remote cache hits, because cache-hit continuity is not fresh remote-worker evidence.
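One of the signals behind cache-hit-only rejection is Bazel's process summary line. The sketch below parses that line and requires a non-zero remote count. It is a simplified illustration with hypothetical function names, not the artifact verifier: the real proof rule also depends on worker logs and the executor endpoint, which this check does not look at.

```python
import re

_SUMMARY = re.compile(r"(\d+) process(?:es)?: (.+?)\.?$")


def parse_process_summary(line: str) -> dict:
    """Parse 'INFO: N processes: 3 internal, 2 remote.' into {'internal': 3, 'remote': 2}."""
    match = _SUMMARY.search(line.strip())
    if match is None:
        raise ValueError(f"not a process summary line: {line!r}")
    counts = {}
    for part in match.group(2).split(", "):
        number, label = part.split(" ", 1)
        counts[label] = int(number)
    return counts


def shows_fresh_remote_execution(counts: dict) -> bool:
    """Cache hits alone do not count; at least one action must have run remotely."""
    return counts.get("remote", 0) > 0
```

Note that "remote cache hit" and "remote" are distinct labels in the summary, which is exactly the distinction the forced-execution posture relies on.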

The current proof cell does not advertise Bazel remote cache compression, so the proof harness also passes --noremote_cache_compression after consumer configs. Broad/default RBE must either keep that compatibility override or teach the production cell to advertise and serve compressed remote cache traffic before accepting consumer .bazelrc compression defaults.

After a successful run, verify the downloaded artifact before citing it as countable evidence:

bash ./scripts/verify-gf-reapi-proof-artifact.sh \
  --evidence-dir /path/to/gf-reapi-cell-proof \
  --target //:public_vendor_handoff_fixture \
  --require-force-execution \
  --require-injected-repo was110_vendor_blobs \
  --require-platform gloriousflywheel-rbe-linux-x86_64 \
  --require-image-digest sha256:<published digest>

The manual workflow runs this verifier before uploading the artifact.

For future remote test candidates, set the workflow bazel_command input or GF_RBE_PROOF_BAZEL_COMMAND=test. A build-mode proof of a js_test target is not remote test evidence.

The first web proof is target=//docs-site:playwright_chromium_smoke with bazel_command=test on browser-capable image digest sha256:a567696e341f6eb0589ece9efd6014a2133a4f10831bdad31e8dd84055eff8a0. Run 25712694947 reported 2549 processes: 1489 internal, 1060 remote, remote sveltekit_sync, remote vite_build, remote external/bazel_tools/tools/test/test-setup.sh docs-site/playwright_chromium_smoke_/playwright_chromium_smoke, and docs-site Playwright Chromium smoke passed with /bin/chromium. The test uses playwright-core against a system Chromium from the worker image; it must not download a browser during the action. The smoke harness must also create writable HOME, XDG_CONFIG_HOME, and XDG_CACHE_HOME directories in remote scratch space before launching Chromium; the non-root proof cell runs with a read-only root filesystem, and Chromium crashpad fails without a writable profile/cache home.

The first public consumer Puppeteer proof is target=//:puppeteer_chromium_smoke in tinyland-inc/omux.xoxd.ai on the same browser-capable image digest. Run 25826953857 reported 3162 processes: 1 action cache hit, 1043 remote cache hit, 1982 internal, 137 remote, remote sveltekit_sync, remote vite_build, remote external/bazel_tools/tools/test/test-setup.sh puppeteer_chromium_smoke_/puppeteer_chromium_smoke, and a passing puppeteer-core smoke with /bin/chromium. That proof only counts because the consumer disabled Puppeteer lifecycle browser downloads and launched the pinned worker Chromium by explicit executable path.

The first public consumer standalone SvelteKit/Vite build proof is target=//:build in tinyland-inc/omux.xoxd.ai on the same browser-capable image digest. Run 25891956165 reported 3155 processes: 1 action cache hit, 1173 remote cache hit, 1978 internal, 4 remote, recorded proof nonce 20260514T234057Z-25891956165-1, and showed remote lifecycle-hook actions for @tailwindcss/oxide and esbuild, remote sveltekit_sync, and remote vite_build. That proof only counts because the harness injected the non-secret GF_RBE_PROOF_NONCE action environment value and the artifact verifier rejected cache-hit-only evidence.

The public omux Playwright static-output proof is target=//:playwright_chromium_smoke in tinyland-inc/omux.xoxd.ai on the same browser-capable image digest. Run 25897326537 reported 3162 processes: 1 action cache hit, 1174 remote cache hit, 1982 internal, 6 remote, recorded proof nonce 20260515T024138Z-25897326537-1, and showed remote @tailwindcss/oxide and esbuild lifecycle hooks, remote sveltekit_sync, remote vite_build, remote external/bazel_tools/tools/test/test-setup.sh playwright_chromium_smoke_/playwright_chromium_smoke, remote generate-xml.sh, and a passing Playwright Chromium smoke with /bin/chromium. That proof only promotes one public consumer target class; it does not prove broad Playwright, Vitest browser mode, hosted E2E, or Firefox (WebKit is now proved separately for one consumer static-smoke class via run 27330688866).

The public jesssullivan.github.io consumer proofs add a second Puppeteer class, a Playwright runtime-smoke class, and a SvelteKit/Vite build-smoke class on the same browser-capable image. Runs 25777472760, 25894297074, and 25779597385 reported 2331 processes: 1477 internal, 855 remote each with bazel_command=test, forced execution, and remote test-setup evidence for puppeteer_chromium_smoke, playwright_chromium_smoke, and sveltekit_vite_build_smoke. The Playwright proof recorded proof nonce 20260515T005745Z-25894297074-1 and remote test-setup.sh playwright_chromium_smoke_/playwright_chromium_smoke with exit_code=0. These are target-class proofs only: the Puppeteer and Playwright proofs depend on disabled browser lifecycle downloads, and the SvelteKit/Vite proof does not imply publication, deployment, or hosted E2E.

The public jesssullivan.github.io Vitest refresh is target=//:types_unit_tests on the same browser-capable image digest. Run 25892939448 reported 2331 processes: 1477 internal, 855 remote, recorded proof nonce 20260515T001050Z-25892939448-1, and showed remote npm extraction, remote lifecycle-hook actions for esbuild, sharp, and puppeteer, and remote test-setup.sh types_unit_tests_/types_unit_tests with exit_code=0. This is a public SvelteKit/Vite/Vitest unit-test target class only, not broad/default web RBE.

Browser runtime authority for that proof is Chromium 138.0.7204.49 from pkgs.chromium at locked nixpkgs revision 9b008d60392981ad674e04016d25619281550a9d, exposed as GF_RBE_CHROMIUM_EXECUTABLE=/bin/chromium in the worker image. Playwright and Puppeteer target classes must consume that or another explicit browser authority by executablePath; they must not run playwright install, Puppeteer postinstall Chrome downloads, or npm/pnpm lifecycle browser downloads inside REAPI actions.

Workflow Consumer Canary

The manual GF REAPI Cell Proof workflow can also run a public-input consumer canary against the WAS-110 firmware workspace. Dispatch it with:

  • image_digest: published gf-reapi-cell image digest
  • target: //:public_vendor_handoff_fixture
  • consumer_repository: Jesssullivan/8311-was-110-firmware-builder
  • consumer_ref: main by default; override only for consumer repositories whose default branch is not main.
  • was110_public_handoff: true
  • force_execution: true

For the Darwin proof wrapper, resolve the same immutable digest before running readiness against the operator-provided macOS REAPI endpoint:

GF_REAPI_CELL_DIGEST="$(just gf-reapi-cell-resolve-digest --tag latest)"

just darwin-rbe-proof-readiness \
  --image-digest "${GF_REAPI_CELL_DIGEST}" \
  --target //build/macos:darwin_package_release_artifacts_unsigned \
  --bazel-command build \
  --remote-executor grpcs://<macos-reapi-host>:8980 \
  --consumer-repository tinyland-inc/tummycrypt \
  --check-gh-workflow \
  --probe-endpoint

The WAS-110 consumer canary path checks out the consumer repo, uses the consumer-owned public input pinning scripts to materialize and verify vendor-blobs/public-community-repo, passes it as GF_BAZEL_INJECT_REPOSITORIES=was110_vendor_blobs=/absolute/vendor/repo, and sets GF_RBE_PROOF_BAZEL_CONFIG= so the proof does not require GloriousFlywheel’s .bazelrc inside the consumer workspace.

For private consumer repositories, set require_consumer_app_token=true. The Actions workflow then requires the existing tranche-proof GitHub App secrets and mints a repository-scoped checkout token for supported owners (tinyland-inc and Jesssullivan) with contents: read. The token is used only for the consumer checkout and is not persisted into the checked-out workspace. If those secrets are absent, the workflow fails before actions/checkout instead of producing a misleading RBE failure. Public consumer proofs should leave require_consumer_app_token=false; those checkouts use the workflow’s normal read token and do not mint an App token.

If the GitHub App permission update is not available yet, operators may choose the explicit alternate authority with consumer_checkout_authority=repo-scoped-deploy-key or consumer_checkout_authority=owner-scoped-secret instead of require_consumer_app_token=true. The deploy-key path is preferred when the operator can create read-only deploy keys on the consumer repositories. It only supports fixed per-repo secrets: GF_REAPI_CONSUMER_CHECKOUT_SSH_KEY_TINYLAND_DEV and GF_REAPI_CONSUMER_CHECKOUT_SSH_KEY_MASSAGEITHACA. The token path only supports the fixed owner-scoped secrets GF_REAPI_CONSUMER_CHECKOUT_TOKEN_TINYLAND_INC and GF_REAPI_CONSUMER_CHECKOUT_TOKEN_JESSSULLIVAN; the token must be a repository-scoped read credential for the consumer repo. It is proof-only, still uses persist-credentials: false, and must not be replaced by a broad PAT or workflow input secret.

If the token mint step fails with The permissions requested are not granted to this installation., the GitHub App installation does not currently grant repository Contents: Read-only for the requested consumer repo. That is an App permission and installation-approval blocker, not a Bazel target, worker, or remote-execution failure. After the App permission is updated and the installation is approved, rerun the same dispatch command and evaluate only the new proof artifact.

Current private consumer evidence is explicit. MassageIthaca run 25928429263 used consumer_checkout_authority=repo-scoped-deploy-key, checked out the private repo, forced execution, reported 3319 remote processes, and passed //:booking_operation_unit_tests; that promotes one private booking-operation Vitest class only.

MassageIthaca run 25938855554 then used the same repo-scoped deploy-key authority, forced execution, proof nonce 20260515T200641Z-25938855554-1, and passed //:svelte_check_test with 3319 remote processes plus remote sveltekit_sync_bin_/sveltekit_sync_bin, test-setup.sh svelte_check_test_/svelte_check_test, and generate-xml.sh evidence. That promotes one private SvelteKit/svelte-check class only.

MassageIthaca run 25948484331 used the same repo-scoped deploy-key authority, forced execution, proof nonce 20260516T005553Z-25948484331-1, and passed //:tsc_noemit_test with 3319 remote processes plus remote sveltekit_sync_bin_/sveltekit_sync_bin, test-setup.sh tsc_noemit_test_/tsc_noemit_test, generate-xml.sh, and a 24.2s passing TypeScript no-emit action. That promotes one private TypeScript no-emit class only.

MassageIthaca run 25953478878 used the same repo-scoped deploy-key authority, forced execution, proof nonce 20260516T050753Z-25953478878-1, and passed //:playwright_tmd_smoke with 3318 remote processes plus remote sveltekit_sync_bin_/sveltekit_sync_bin, vite_build_bin_/vite_build_bin, test-setup.sh playwright_tmd_smoke_/playwright_tmd_smoke, generate-xml.sh, and a 4.5s passing Playwright TMD smoke action. That promotes one private browser-smoke class only.

MassageIthaca run 25983800544 used the same repo-scoped deploy-key authority, forced execution, proof nonce 20260517T064447Z-25983800544-1, and passed //:sveltekit_node_build with 3193 remote processes plus remote lifecycle-hook execution for esbuild, msw, and sharp, remote sveltekit_sync_bin_/sveltekit_sync_bin, remote vite_build_bin_/vite_build_bin, proof artifact verifier success, and Kubernetes restart evidence that stayed at 0. That promotes one private SvelteKit/Vite production-build class only.

tinyland.dev run 25928429273 also checked out through the repo-scoped deploy-key path, then failed before target analysis because the private tinyland-schemas archive URL returned 404 Not Found to Bazel’s unauthenticated external repository fetch. The v0.2.4 tag/release exists; the missing piece is private external-input auth, verified distdir placement, or a future approved mirror/repository-cache authority.

PR #682 added the codeload handoff and forced remote-first proof lane, and tinyland.dev PR #401 fixed the Grafana Vitest Kubernetes-environment assertion. Main proof 25935041748 then passed //packages/tinyland-grafana:test with 1531 processes: 468 remote cache hit, 1059 internal, 4 remote, verified tummycrypt_tinyland_schemas:0.2.4 in the proof distdir, and remote test-setup.sh packages/tinyland-grafana/test_/test evidence. That promotes one private tinyland.dev Grafana package Vitest target class only; it is not durable private mirror authority or broad tinyland.dev RBE.

The first root tinyland.dev app typecheck proof attempt after the source target was cleaned is not a successful target-class proof. Main run 25969813133 checked out tinyland-inc/tinyland.dev@main through the GitHub App path, materialized the private tinyland-schemas distdir input, analyzed //:app_typecheck, and reported 5472 processes: 2346 remote cache hit, 2988 internal, 138 remote. It failed because the gf-reapi-cell pod was OOMKilled under the old 4Gi memory limit while remote TsProject actions were running. That run is capacity/observability evidence for the proof cell, not acceptance for the //:app_typecheck target class.

After the proof-cell memory envelope was corrected, main run 25970619559 passed tinyland-inc/tinyland.dev //:app_typecheck with GitHub App checkout authority, the verified private tummycrypt_tinyland_schemas:0.2.4 distdir handoff, forced execution, proof nonce 20260516T191944Z-25970619559-1, 5578 processes: 1 action cache hit, 2567 remote cache hit, 2955 internal, 56 remote, remote TypeScript tsc, remote Svelte build tool, remote Vite build tool, remote app_typecheck_tool, proof artifact verifier success, and Kubernetes restart evidence that stayed at 0 before and after the proof. That promotes one private tinyland.dev root app typecheck target class only; it is not all tinyland.dev builds, all tinyland.dev tests, browser E2E, the Vite production build class, durable private mirror/repository-cache authority, broad/default web RBE, or CAS/action-cache backend suitability.

Run 25978934708 then passed tinyland-inc/tinyland.dev //:app_build with GitHub App checkout authority, the same verified private tummycrypt_tinyland_schemas:0.2.4 distdir handoff, forced execution, proof nonce 20260517T021820Z-25978934708-1, 6146 processes: 3125 remote cache hit, 2959 internal, 62 remote, remote TypeScript package fanout, remote JsRunBinary app_build.log, proof artifact verifier success, and Kubernetes restart evidence that stayed at 0 before and after the proof. That promotes one private tinyland.dev root Vite/SvelteKit production-build target class only; it is not all tinyland.dev builds/tests, browser E2E, deployed app behavior, durable private mirror/repository-cache authority, broad/default web RBE, or CAS/action-cache backend suitability.

Run 25981546207 then passed tinyland-inc/tinyland.dev //packages/tinyland-activitypub:test with GitHub App checkout authority, the same verified private tummycrypt_tinyland_schemas:0.2.4 distdir handoff, workspace_path=consumer-workspace, forced execution, proof nonce 20260517T044208Z-25981546207-1, 728 processes: 1 action cache hit, 299 remote cache hit, 415 internal, 14 remote, remote esbuild lifecycle-hook execution, remote TypeScript tsc for packages/tinyland-content-types, remote test-setup.sh packages/tinyland-activitypub/test_/test, remote generate-xml.sh, proof artifact verifier success, and Kubernetes restart evidence that stayed at 0 before and after the proof. That promotes one private tinyland.dev ActivityPub package Vitest target class only; it is not all tinyland.dev package tests, browser E2E, deployed app behavior, durable private mirror/repository-cache authority, broad/default web RBE, or CAS/action-cache backend suitability.

Run 25984827370 then passed tinyland-inc/tinyland.dev //packages/tinyland-a11y-engine:typecheck with GitHub App checkout authority, the same verified private tummycrypt_tinyland_schemas:0.2.4 distdir handoff, workspace_path=consumer-workspace, forced execution, proof nonce 20260517T073751Z-25984827370-1, consumer checkout commit 3730c6966d5e069cff92abc7c606fca9db5b54af, 553 processes: 223 remote cache hit, 328 internal, 2 remote, remote esbuild lifecycle-hook execution, remote TypeScript tsc for packages/tinyland-color-utils, proof artifact verifier success, and Kubernetes restart evidence that stayed at 0 before and after the proof. That promotes one private tinyland.dev package TypeScript typecheck target class only; it is not all tinyland.dev package typechecks, all TypeScript, Vite/SvelteKit builds, durable private mirror/repository-cache authority, broad/default web RBE, or CAS/action-cache backend suitability.
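The process summaries quoted in these runs follow Bazel's `N processes: ...` line, where `remote cache hit` and `remote` are separate buckets. As a hedged illustration of how the plain `remote` count can be read without confusing it with cache hits, the sketch below parses one such line; the function name `remote_process_count` is hypothetical and this is not the proof artifact verifier.

```bash
# Hypothetical helper: extract the "N remote" bucket from a Bazel process
# summary, ignoring "remote cache hit". Prints 0 when no remote bucket exists.
remote_process_count() {
  local summary="$1" breakdown entry
  # Drop the leading "<total> processes: " prefix.
  breakdown="${summary#*: }"
  local IFS=','
  for entry in $breakdown; do
    entry="${entry# }"  # trim the space that follows each comma
    if [[ "$entry" =~ ^([0-9]+)\ remote$ ]]; then
      echo "${BASH_REMATCH[1]}"
      return 0
    fi
  done
  echo 0
}
```

For example, `remote_process_count "5578 processes: 1 action cache hit, 2567 remote cache hit, 2955 internal, 56 remote"` prints `56`. A non-zero remote count is necessary but not sufficient: the uploaded proof artifact, not this line alone, is what promotes a target class.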

Operators can render or dispatch the workflow through:

just gf-reapi-cell-proof-dispatch -- \
  --image-digest sha256:<published digest> \
  --target //packages/tinyland-grafana:test \
  --bazel-command test \
  --workspace-path consumer-workspace \
  --consumer-repository tinyland-inc/tinyland.dev \
  --consumer-ref main \
  --consumer-checkout-authority repo-scoped-deploy-key \
  --tinyland-schemas-private-handoff \
  --apply

Add --dry-run to print the gh workflow run command without dispatching. The dispatch itself is not RBE evidence; only the uploaded proof artifact can promote a target class.

--tinyland-schemas-private-handoff mints a GitHub App token scoped to tinyland-inc/tinyland-schemas, downloads the GitHub codeload tag archive that matches the BCR-recorded archive/refs/tags/v0.2.4.tar.gz sha256 and tinyland-schemas-0.2.4/ prefix, verifies that sha256, and places it in BAZEL_DISTDIR before Bazel starts. It is proof-run staging only: it is not durable mirror authority, repository-cache retention, CAS/action-cache authority, or broad/default RBE.
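The verify-then-stage step can be sketched as follows. This is a minimal illustration under stated assumptions, not the handoff script itself: `stage_verified_archive` is a hypothetical name, it relies on the coreutils `sha256sum` tool, and the expected digest is passed in by the caller rather than read from the BCR.

```bash
# Hypothetical sketch: copy an already-downloaded archive into a distdir only
# if its sha256 matches the expected digest; otherwise leave the distdir untouched.
stage_verified_archive() {
  local archive="$1" expected_sha256="$2" distdir="$3"
  local actual_sha256
  actual_sha256="$(sha256sum "$archive" | awk '{print $1}')"
  if [[ "$actual_sha256" != "$expected_sha256" ]]; then
    echo "sha256 mismatch for ${archive}: got ${actual_sha256}" >&2
    return 1
  fi
  mkdir -p "$distdir"
  # The file keeps its base name; Bazel's distdir lookup matches on that name
  # and re-checks the recorded hash.
  cp "$archive" "$distdir/"
}
```

Verifying before the copy means a tampered or truncated download never reaches the directory Bazel reads from, which keeps the staging step fail-closed.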

Branch proof 25930423009 showed the first live gate for this path: the gloriousflywheel GitHub App installation lacked contents: read, so token minting for tinyland-inc/tinyland-schemas failed before Bazel started. After that permission was approved, main proof 25932330729 minted the token but showed that the github.com/.../archive/refs/tags/... web URL still returned 404 to the installation token. The handoff therefore fetches the equivalent direct codeload tag archive and still verifies the BCR-recorded sha256 before Bazel sees the file.

Branch proof 25932703830 then reached the target and the test passed, but it did not count as RBE because tinyland.dev’s cache-backed .bazelrc selected sandboxed,worker,local spawn strategy and remote-local fallback. Branch proof 25933145419 then forced remote-first spawn strategy and disabled remote-local fallback. That run reached //packages/tinyland-grafana:test, reported 1531 processes: 1 action cache hit, 468 remote cache hit, 1059 internal, 4 remote, and executed the Vitest test action on gf-reapi-cell-ff5f7699f-2td2v, but the remote test failed with exit_code=1 because tests/grafana-config.test.ts hit GRAFANA_SERVICE_ACCOUNT_TOKEN is not set under kubernetes environment semantics.

tinyland.dev PR #401 fixed that test-environment assumption, and main proof 25935041748 passed after the merge with the same codeload handoff and forced remote-first execution. The codeload distdir handoff is still proof-run staging, not durable mirror, repository-cache, CAS/action-cache, or broad RBE authority.
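For orientation, forcing remote-first spawn strategy and disabling remote-local fallback corresponds to two standard Bazel flags. The sketch below only collects them in an array; the array name is hypothetical, and the exact flags the proof wrapper passes are not shown in this document, so treat this as an assumption about the general mechanism.

```bash
# Assumed illustration: standard Bazel flags that route spawns to the remote
# executor and refuse to fall back to local execution when remote execution fails.
forced_remote_flags=(
  --spawn_strategy=remote
  --remote_local_fallback=false
)
```

Appending flags like these after the consumer's own .bazelrc settings is what lets a proof override a sandboxed,worker,local strategy; a passing test under the consumer defaults would otherwise say nothing about remote execution.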

This canary is public-community input evidence only. Private WAS-110 blobs still require a separate private input and worker trust boundary before they can count as product RBE evidence.

Operator Invocation

The proof wrapper is non-default and requires an explicit opt-in:

GF_RBE_PROOF_MODE=explicit \
GF_BAZEL_SUBSTRATE_MODE=executor-backed \
BAZEL_REMOTE_CACHE=grpc://bazel-cache.nix-cache.svc.cluster.local:9092 \
BAZEL_REMOTE_EXECUTOR=grpc://gf-reapi-cell.gf-rbe.svc.cluster.local:8980 \
bash ./scripts/bazel-rbe-proof.sh --target //app:build
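The opt-in variables can be checked before invoking the wrapper. The following preflight is a sketch only: `check_proof_env` is a hypothetical function, and the checks the real scripts/bazel-rbe-proof.sh performs may differ.

```bash
# Hypothetical preflight: confirm the explicit opt-in environment shown above
# is present before calling the proof wrapper. Returns non-zero on any gap.
check_proof_env() {
  local failures=0
  if [[ "${GF_RBE_PROOF_MODE:-}" != "explicit" ]]; then
    echo "GF_RBE_PROOF_MODE must be set to explicit" >&2
    failures=$((failures + 1))
  fi
  if [[ "${GF_BAZEL_SUBSTRATE_MODE:-}" != "executor-backed" ]]; then
    echo "GF_BAZEL_SUBSTRATE_MODE must be set to executor-backed" >&2
    failures=$((failures + 1))
  fi
  if [[ -z "${BAZEL_REMOTE_EXECUTOR:-}" ]]; then
    echo "BAZEL_REMOTE_EXECUTOR must point at the REAPI endpoint" >&2
    failures=$((failures + 1))
  fi
  [[ "$failures" -eq 0 ]]
}
```

Running `check_proof_env && bash ./scripts/bazel-rbe-proof.sh --target //app:build` keeps an accidental cache-backed run from being mistaken for an executor-backed proof.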

For a checked-out consumer repo, keep the same proof wrapper and pass the consumer workspace explicitly:

GF_RBE_PROOF_MODE=explicit \
GF_BAZEL_SUBSTRATE_MODE=executor-backed \
GF_RBE_PROOF_BAZEL_CONFIG= \
BAZEL_REMOTE_CACHE=grpc://bazel-cache.nix-cache.svc.cluster.local:9092 \
BAZEL_REMOTE_EXECUTOR=grpc://gf-reapi-cell.gf-rbe.svc.cluster.local:8980 \
GF_BAZEL_INJECT_REPOSITORIES=was110_vendor_blobs=/absolute/vendor/repo \
bash ./scripts/bazel-rbe-proof.sh \
  --workspace /absolute/consumer/workspace \
  --target //:public_vendor_handoff_fixture

GF_RBE_PROOF_BAZEL_CONFIG= intentionally omits --config=ci-cached for consumer workspaces that do not define GloriousFlywheel’s .bazelrc config.

The normal product path is still scripts/bazel-cache-backed.sh; without BAZEL_REMOTE_EXECUTOR, it remains the shared cache-backed contract. With BAZEL_REMOTE_EXECUTOR and GF_BAZEL_SUBSTRATE_MODE=executor-backed, the same wrapper can exercise the opt-in executor-backed path. The landed proofs are real RBE implementation evidence: //app:build, //app:unit_tests, //:deployment_bundle, //docs-site:build, and //docs-site:playwright_chromium_smoke; the public Puppeteer/SvelteKit and Playwright consumer proofs, including the public omux Playwright static smoke; the private MassageIthaca SvelteKit/svelte-check, TypeScript no-emit, Playwright TMD smoke, and SvelteKit/Vite production-build proofs; and the WAS-110 public-input proofs. They are still narrow target-class proofs, not a product claim that GloriousFlywheel broadly provides Bazel remote execution.

The latest MassageIthaca TypeScript no-emit proof is Jesssullivan/MassageIthaca //:tsc_noemit_test. Run 25948484331 used consumer_checkout_authority=repo-scoped-deploy-key, checked out the private repo, used bazel_command=test, forced execution, proof nonce 20260516T005553Z-25948484331-1, and the browser-capable worker image recorded in the manifest. Bazel reported 7662 processes: 4 action cache hit, 4343 internal, 3319 remote; worker logs show remote lifecycle-hook execution for esbuild, sharp, @sparticuz/chromium, msw, and @vercel/speed-insights, remote sveltekit_sync_bin_/sveltekit_sync_bin, remote external/bazel_tools/tools/test/test-setup.sh tsc_noemit_test_/tsc_noemit_test, and remote generate-xml.sh. The test passed in 24.2s. This proves one private TypeScript no-emit target class only, not all MassageIthaca tests, Playwright/Puppeteer browser tests, deployed flows, or broad/default web RBE.

The latest MassageIthaca Playwright TMD proof is Jesssullivan/MassageIthaca //:playwright_tmd_smoke. Run 25953478878 used consumer_checkout_authority=repo-scoped-deploy-key, checked out consumer commit 08555e16b9ee0504b1b23e6373b5b6bbfb799f5f, used bazel_command=test, forced execution, proof nonce 20260516T050753Z-25953478878-1, and the browser-capable worker image recorded in the manifest. Bazel reported 7670 processes: 3 action cache hit, 4352 internal, 3318 remote; worker logs show remote sveltekit_sync_bin_/sveltekit_sync_bin, remote vite_build_bin_/vite_build_bin, remote external/bazel_tools/tools/test/test-setup.sh playwright_tmd_smoke_/playwright_tmd_smoke, and remote generate-xml.sh. The test passed in 4.5s. This proves one private Playwright TMD browser-smoke target class only, not all MassageIthaca tests, deployed flows, or broad/default web RBE.

The latest MassageIthaca production-build proof is Jesssullivan/MassageIthaca //:sveltekit_node_build. Run 25983800544 used consumer_checkout_authority=repo-scoped-deploy-key, checked out consumer commit e06a70d12417f04568092a62e225b6c6595c3b39, used bazel_command=build, forced execution, proof nonce 20260517T064447Z-25983800544-1, and the browser-capable worker image recorded in the manifest. Bazel reported 7379 processes: 2 action cache hit, 4186 internal, 3193 remote; worker logs show remote lifecycle-hook execution for esbuild, msw, and sharp, remote sveltekit_sync_bin_/sveltekit_sync_bin, and remote vite_build_bin_/vite_build_bin. The proof artifact verifier passed and Kubernetes restart evidence stayed at 0. This proves one private SvelteKit/Vite production-build target class only, not all MassageIthaca builds/tests, deployed booking E2E, image publication, durable private mirror authority, or broad/default web RBE.

The latest tinyland.dev package typecheck proof is tinyland-inc/tinyland.dev //packages/tinyland-a11y-engine:typecheck. Run 25984827370 used consumer_checkout_authority=github-app, workspace_path=consumer-workspace, checked out consumer commit 3730c6966d5e069cff92abc7c606fca9db5b54af, staged the verified private tummycrypt_tinyland_schemas:0.2.4 distdir input, used bazel_command=build, forced execution, proof nonce 20260517T073751Z-25984827370-1, and the browser-capable worker image recorded in the manifest. Bazel reported 553 processes: 223 remote cache hit, 328 internal, 2 remote; worker logs show remote esbuild lifecycle-hook execution and remote TypeScript tsc for packages/tinyland-color-utils. The proof artifact verifier passed and Kubernetes restart evidence stayed at 0. This proves one private package TypeScript typecheck target class only, not all tinyland.dev packages, all TypeScript, durable private mirror authority, or broad/default web RBE.
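Every proof nonce quoted above has the shape of a UTC timestamp, the Actions run id, and a trailing counter. Assuming that shape holds generally (it is inferred from the examples here, not from a documented format), a reviewer could confirm that a nonce belongs to the run being evaluated with a sketch like the following; `nonce_matches_run` is a hypothetical name.

```bash
# Hypothetical check: succeed only when the nonce has the inferred
# <YYYYMMDD>T<HHMMSS>Z-<run id>-<counter> shape and embeds the given run id.
nonce_matches_run() {
  local nonce="$1" run_id="$2"
  if [[ "$nonce" =~ ^[0-9]{8}T[0-9]{6}Z-([0-9]+)-[0-9]+$ ]]; then
    [[ "${BASH_REMATCH[1]}" == "$run_id" ]]
  else
    return 1
  fi
}
```

For instance, `nonce_matches_run 20260517T073751Z-25984827370-1 25984827370` succeeds, while pairing the same nonce with run 25983800544 fails, which guards against evaluating an artifact from a different dispatch.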

Main run 25608601158 proved //docs-site:build with the Bazel build command, forced execution, 2529 processes: 1483 internal, 1046 remote, and remote JsRunBinary evidence for docs-site/.svelte-kit and docs-site/build. This proves static docs-site rendering only, not docs publication or deployment. Its earlier default-branch proof attempt, run 25607350105, remains inventory evidence only because it failed before remote execution on the old parent-package markdown glob.
