Per-Spoke `remote_instance_name` Routing Design

Decision summary

  • Status: First gf-reapi-cell implementation slice in progress (W4.1 / TIN-1472, under parent E4 / TIN-1448).
  • Rule: Each spoke gets its own remote_instance_name = spoke-<slug>. Cross-tenant reads and writes default-deny.
  • Wiring: Client .bazelrc sets --remote_instance_name=spoke-<slug> per spoke; gf-reapi-cell extracts instance_name from the standard REAPI request field and ByteStream path prefix, validates it, keys CAS/AC storage by it, and logs instance_name on CAS/AC/ByteStream/Execute operations.
  • What blocks if absent: E4/TIN-1448 cannot close. The tenant model is decoration without a routing primitive, and W4.2 IAM (TIN-1473), W4.3 executor pools (TIN-1474), W4.4 quota enforcement (TIN-1475), W2.3 audit log (TIN-1464), and W5.3 fairness panel (TIN-1479) all need this primitive before they can attach to it.
  • Scope: REAPI-side routing semantics for instance_name. Not IAM enforcement, not pool selection, not quota math, not durable production CAS storage — those siblings call out to this primitive but own their own contracts.

The Problem

The three (of five) spoke modules relevant to instance-name routing that landed on this branch (tofu/modules/spoke-cache-quota, tofu/modules/spoke-runner-binding, tofu/modules/spoke-state-namespace, commit 721af25) ship a tenant declaration layer: a per-spoke ConfigMap named spoke-cache-quota-<slug> labelled tinyland.dev/spoke=<slug>, with attic_namespace, bazel_cache_prefix, cache_gib, and ttl_days. The declaration is operator-grade and idempotent. What it is not is enforcement. Today, a spoke ConfigMap saying “I have 50 GiB of cache budget under spokes/elders/” is a hint to the cache services, not a wall the REAPI layer holds up. The Bazel client doesn’t know about the prefix; the gf-reapi-cell CAS handler doesn’t read it; nothing in the request path keys storage by spoke.

Without instance_name routing, every spoke shares one global CAS and AC namespace. The consequences are concrete: (1) noisy-neighbor eviction: a spoke with a hot working set evicts a quieter spoke’s digests on the same LRU clock; (2) digest-guess info disclosure: GetBlob(digest) returns 200 for any caller who guesses or learns a digest, regardless of which spoke authored the blob, because there is no namespace key in the lookup; (3) no per-tenant audit: every operation looks like it came from “the cell,” not “spoke-elders”; (4) no path to per-tenant quotas: W4.4 cannot enforce a budget against a tenant whose identity is not on the request. The instance_name field is REAPI’s native routing primitive for exactly this problem. This doc turns it on, picks semantics, and walks the wire.

What remote_instance_name Actually Is

remote_instance_name is the Bazel client flag (--remote_instance_name) that populates the instance_name field in REAPI v2’s build.bazel.remote.execution.v2.*Request messages and the leading segment of the ByteStream resource-name path. The REAPI spec defines the field as a free-form string: “the instance of the execution system to operate against.” That’s the entire specification. Concretely:

  • It appears on every CAS request (FindMissingBlobs, BatchReadBlobs, BatchUpdateBlobs), every AC request (GetActionResult, UpdateActionResult), and Execute / WaitExecution.
  • For ByteStream, it is the leading segment of the resource name ({instance_name}/blobs/{hash}/{size} for reads, {instance_name}/uploads/... for writes).
  • The server side decides what the string means. Bazel clients pass whatever --remote_instance_name says; the server picks routing, namespacing, and authz semantics on top of the field.
  • Real implementations diverge: BuildBuddy treats instance_name as a namespace prefix in storage and a routing key for executor pools. Buildbarn (bb-storage) uses it to select a storage configuration block. buchgr/bazel-remote honors it as a literal on-disk path component, with no per-instance auth.

We pick the semantics. The REAPI spec gives us the field; everything else in this doc is GloriousFlywheel’s contract on top of it.
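As an illustration of the ByteStream shape described above, the following is a minimal Go sketch of splitting the leading instance segment off a resource name. The function name splitInstance is hypothetical, and the sketch assumes instance names never contain "/" (true for spoke-<slug>, default, and system, but not guaranteed by REAPI in general). The write example follows the upstream REAPI convention of placing the instance name ahead of uploads/.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitInstance separates the leading instance_name segment of a ByteStream
// resource name from the rest of the path. It assumes instance names contain
// no "/" (an assumption of this sketch, not of REAPI itself).
func splitInstance(resource string) (string, string, error) {
	instance, rest, found := strings.Cut(resource, "/")
	if !found || instance == "" {
		return "", "", errors.New("resource name has no instance segment")
	}
	// Reads continue with blobs/..., writes continue with uploads/...
	if !strings.HasPrefix(rest, "blobs/") && !strings.HasPrefix(rest, "uploads/") {
		return "", "", fmt.Errorf("unexpected path after instance segment: %q", rest)
	}
	return instance, rest, nil
}

func main() {
	for _, r := range []string{
		"spoke-elders/blobs/abc123/42",
		"spoke-blahaj/uploads/some-upload-id/blobs/abc123/42",
		"blobs/abc123/42", // instance segment omitted by the client
	} {
		instance, rest, err := splitInstance(r)
		fmt.Printf("instance=%q rest=%q err=%v\n", instance, rest, err)
	}
}
```

The extracted segment is only a candidate: it still has to pass the same validation as the instance_name field on the non-ByteStream RPCs.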

GloriousFlywheel Semantics

Concrete decisions, made now, written so Codex and the operator can both grep for them later:

  • Format. spoke-<slug> where <slug> matches the ^[a-z][a-z0-9-]{1,62}$ regex enforced by every spoke-* tofu module (spoke-cache-quota/variables.tf, spoke-runner-binding/variables.tf, spoke-state-namespace/variables.tf). The spoke- literal prefix is part of the wire value, not implied — so the field carries spoke-elders, not bare elders. This matches the existing attic_namespace = "spoke-${var.spoke_slug}" convention in spoke-cache-quota/main.tf and the bazel_cache_prefix = "spokes/${var.spoke_slug}/" convention; the spoke- prefix is the no-collision signal that this is a tenant identity, not a route or a path.
  • Reserved instances. Two non-spoke instance names are reserved:
    • default — for non-spoke dogfood traffic during migration, e.g. the existing bazel-cache lane and any caller that has not yet adopted --remote_instance_name. Writable during migration, read-only in the steady state, deleted after the 30-day tail (see “The Non-Spoke default Instance” below).
    • system — for internal health checks, synthetic probes (TTFCH from slo.md), and gf-reapi-cell self-tests. Writable by the cell only; rejected from external client routes.
  • Slug source of truth. The spoke slug is the same string the spoke-* tofu modules consume — there is exactly one slug per spoke, and it shows up in: the ConfigMap name (spoke-cache-quota-<slug>), the Attic namespace, the Bazel cache prefix, the env-reaper IAM role name, the tinyland.dev/spoke label, and now remote_instance_name. New slugs are not minted by this doc; they are minted by lanes.json (referenced in spoke-runner-binding/variables.tf) and consumed here.
  • Lifetime. Per-spoke. The instance exists from the first spoke apply to env-reaper takedown. Cleanup is the env-reaper’s job (already scoped per spoke in spoke-state-namespace/variables.tf’s iam_role_name_prefix/create_reaper_iam shape). When a spoke goes away, its instance_name becomes read-only for a documented tail (default 7 days) and then deletable.
  • Migration shape. default stays writable through rollout. Each spoke flips one at a time: the spoke’s CI starts sending --remote_instance_name=spoke-<slug> while default continues to accept traffic from un-migrated callers. Once every active caller has migrated, default flips to read-only and is then deleted.
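The "one slug, many derived names" rule above can be sketched in Go as follows. The type SpokeNames and the function NamesForSlug are hypothetical names used only for illustration; the slug regex and the derived string shapes are the ones quoted in the bullets above.

```go
package main

import (
	"fmt"
	"regexp"
)

// slugPattern is the slug regex quoted from the spoke-* tofu modules.
var slugPattern = regexp.MustCompile(`^[a-z][a-z0-9-]{1,62}$`)

// SpokeNames collects identifiers this doc derives from a single slug.
type SpokeNames struct {
	InstanceName     string // wire value of remote_instance_name
	AtticNamespace   string
	BazelCachePrefix string
	QuotaConfigMap   string
}

// NamesForSlug validates the slug and derives the per-spoke names from it.
func NamesForSlug(slug string) (SpokeNames, error) {
	if !slugPattern.MatchString(slug) {
		return SpokeNames{}, fmt.Errorf("invalid spoke slug %q", slug)
	}
	return SpokeNames{
		InstanceName:     "spoke-" + slug,
		AtticNamespace:   "spoke-" + slug,
		BazelCachePrefix: "spokes/" + slug + "/",
		QuotaConfigMap:   "spoke-cache-quota-" + slug,
	}, nil
}

func main() {
	names, err := NamesForSlug("elders")
	fmt.Printf("%+v err=%v\n", names, err)
}
```

Deriving every name from one validated slug keeps the wire value and the ConfigMap, Attic, and cache-prefix names from drifting apart.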

End-to-End Wiring

Walking the request path top to bottom, with the concrete change at each layer. The principle: instance_name is set once at the edge (the Bazel client) and carried as context everywhere else. Nothing in the middle invents an instance; nothing at the bottom guesses one.

Client side — .bazelrc

Two acceptable shapes. The doc does not pre-decide; the W4.1 implementation picks one.

  • Per-spoke .bazelrc.<slug>. Each spoke’s repo (or each spoke’s CI invocation) imports .bazelrc.<slug> which sets:
    build --remote_instance_name=spoke-<slug>
    build --remote_executor=grpc://gf-reapi-cell.gf-rbe.svc.cluster.local:8980
    build --remote_cache=grpc://gf-reapi-cell.gf-rbe.svc.cluster.local:8980
  • Unified .bazelrc with --config=spoke-<slug>. One shared .bazelrc with named config blocks:
    build:spoke-elders --remote_instance_name=spoke-elders
    build:spoke-blahaj --remote_instance_name=spoke-blahaj
    Callers invoke bazel build --config=spoke-<slug> //....

Either way: there is no default value baked into the client. A Bazel invocation with no --remote_instance_name falls through to default on the server side, which is observable on the dashboard and (after migration) rejected.

Server side — gf-reapi-cell

The cell already implements Capabilities, ByteStream, CAS, AC, Execution, and WaitExecution (docs/build-system/gf-reapi-cell.md). The first implementation slice makes those proof-cell handlers branch on instance_name and key local proof CAS/AC storage by it. The change:

  1. Extract. On every inbound RPC, read request.instance_name (for ByteStream, parse the leading path segment of the resource name).
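  2. Validate. Match against ^(spoke-[a-z][a-z0-9-]{1,62}|default|system)$. Reject (gRPC INVALID_ARGUMENT) on miss. A malformed name is never silently rewritten to default, because that would mask client misconfiguration. The one exception is a request that carries no instance_name at all, which the migration treats as default (see the client-side section above and step 1 of the migration plan).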
  3. Propagate. Every handler — CAS, AC, Execution, audit emitter, metric labeler — uses the validated instance name. The first Go slice uses explicit helper parameters rather than a hidden context value so tests can prove the storage key directly.
  4. Log. Structured log line per RPC includes instance_name as a first-class field, alongside the existing worker, platform, action_digest, and command evidence.
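A minimal Go sketch of the Validate step, using the regex quoted in step 2. ValidateInstanceName and ErrInvalidInstance are hypothetical names; the error is a stand-in for a gRPC INVALID_ARGUMENT status, since the sketch deliberately uses only the standard library. How a request with no instance_name at all is handled is outside this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// instancePattern is the validator regex quoted in the Validate step.
var instancePattern = regexp.MustCompile(`^(spoke-[a-z][a-z0-9-]{1,62}|default|system)$`)

// ErrInvalidInstance stands in for a gRPC INVALID_ARGUMENT status.
var ErrInvalidInstance = errors.New("INVALID_ARGUMENT: instance_name failed validation")

// ValidateInstanceName returns nil only for names the regex accepts.
// It never rewrites a bad name to "default".
func ValidateInstanceName(name string) error {
	if !instancePattern.MatchString(name) {
		return fmt.Errorf("%w: %q", ErrInvalidInstance, name)
	}
	return nil
}

func main() {
	for _, n := range []string{
		"spoke-elders", "default", "system",
		"Spoke-Elders", "evil/../system", "elders",
	} {
		fmt.Printf("%-18q valid=%v\n", n, ValidateInstanceName(n) == nil)
	}
}
```

Returning an error value rather than a corrected name keeps the "no silent fallthrough" rule visible at every call site.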

CAS layer

The CAS becomes (instance_name, digest)-keyed instead of digest-keyed. Whatever storage primitives gf-reapi-cell lands first under cas-primitives.md, the keying shape is the same:

  • FindMissingBlobs(instance_name, [digests]) consults only that instance’s namespace.
  • BatchReadBlobs(instance_name, [digests]) returns blobs from that instance’s namespace. A digest that exists in spoke-A’s namespace but not spoke-B’s is reported as missing for a spoke-B caller — not as a cross-instance fallthrough.
  • BatchUpdateBlobs(instance_name, [(digest, content)]) writes into that instance’s namespace.
  • ByteStream Read and Write parse the leading instance_name from the resource name and route accordingly.

Peer designs (inspiration only — these are class peers, not adoption candidates) take this primitive at different layers: BuildBuddy keys CAS by API-key-scoped namespace, Buildbarn routes instance_name to a per-instance bb-storage config block, bazel-remote uses the on-disk path component with no enforced isolation. gf-reapi-cell owns its CAS storage primitive end-to-end (see cas-primitives.md); the keying shape this doc requires — (instance_name, digest) as the lookup pair — is what gf-reapi-cell adds. Mechanical path-component prefixing on its own is not enough; isolation at the auth layer is what makes the wall real (see “Integration with Siblings” below: E4/W4.2 IAM).
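To make the (instance_name, digest) keying concrete, here is a small in-memory Go sketch. It is not the gf-reapi-cell storage primitive (that belongs to cas-primitives.md); the CAS type and its methods are hypothetical, the store is not safe for concurrent use, and digests are plain SHA-256 hex strings for brevity.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// casKey is the lookup pair: the instance is part of the key, so a lookup
// has no way to fall through to another instance.
type casKey struct{ instance, digest string }

// CAS is a toy in-memory store keyed by (instance_name, digest).
type CAS struct{ blobs map[casKey][]byte }

func NewCAS() *CAS { return &CAS{blobs: make(map[casKey][]byte)} }

// Put stores content in the caller's instance and returns its digest.
func (c *CAS) Put(instance string, content []byte) string {
	sum := sha256.Sum256(content)
	digest := hex.EncodeToString(sum[:])
	c.blobs[casKey{instance, digest}] = append([]byte(nil), content...)
	return digest
}

// Get reads only from the caller's instance.
func (c *CAS) Get(instance, digest string) ([]byte, bool) {
	b, ok := c.blobs[casKey{instance, digest}]
	return b, ok
}

// FindMissing reports digests absent from the caller's instance, even if
// another instance holds them.
func (c *CAS) FindMissing(instance string, digests []string) []string {
	var missing []string
	for _, d := range digests {
		if _, ok := c.blobs[casKey{instance, d}]; !ok {
			missing = append(missing, d)
		}
	}
	return missing
}

func main() {
	cas := NewCAS()
	d := cas.Put("spoke-test-a", []byte("hello"))
	_, okA := cas.Get("spoke-test-a", d)
	_, okB := cas.Get("spoke-test-b", d)
	fmt.Println("spoke-test-a sees blob:", okA)
	fmt.Println("spoke-test-b sees blob:", okB)
	fmt.Println("missing for spoke-test-b:", len(cas.FindMissing("spoke-test-b", []string{d})))
}
```

Because the instance is half of the map key, the cross-tenant case needs no special branch: the lookup simply misses.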

AC layer

Same shape, narrower surface:

  • GetActionResult(instance_name, action_digest) is keyed by the pair.
  • UpdateActionResult(instance_name, action_digest, action_result) writes into the pair’s namespace.
  • Cross-tenant action result reuse is default-denied. A spoke-B caller asking for an action result that exists in spoke-A’s AC gets NOT_FOUND — not the action result.

This is the AC-side wall that makes E2/TIN-1446 honest under a tenant model. Without it, a digest-guessing tenant could trick the AC into surfacing another tenant’s output. With it, the AC’s “did we compute this before?” question is scoped per tenant by construction.

Audit log

Every CAS/AC operation emits an audit record with instance_name as a required field. The schema aligns with W2.3 audit log (TIN-1464); this doc names the field, that doc owns the rest of the schema. Minimum payload shape:

{
  "ts": "2026-05-18T22:17:03Z",
  "rpc": "BatchReadBlobs",
  "instance_name": "spoke-elders",
  "client_id": "<from IAM, W4.2>",
  "digests": ["sha256:..."],
  "bytes": 12345,
  "result": "ok | not_found | denied | error"
}

Audit records ship to the same log surface gf-reapi-cell already writes worker / platform / action evidence to. The field is on every record, not optional, not nullable.
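One way to hold the "not optional, not nullable" rule in code is to refuse to encode a record without the field. The Go sketch below mirrors the minimum payload shape above; the AuditRecord type and its Encode method are hypothetical, and the full schema remains W2.3's contract.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// AuditRecord mirrors the minimum payload shape shown above.
type AuditRecord struct {
	TS           string   `json:"ts"`
	RPC          string   `json:"rpc"`
	InstanceName string   `json:"instance_name"`
	ClientID     string   `json:"client_id"`
	Digests      []string `json:"digests"`
	Bytes        int64    `json:"bytes"`
	Result       string   `json:"result"`
}

// Encode serializes the record as JSON and refuses records that lack
// instance_name, so the field cannot be dropped by accident.
func (r AuditRecord) Encode() ([]byte, error) {
	if r.InstanceName == "" {
		return nil, errors.New("audit record is missing instance_name")
	}
	return json.Marshal(r)
}

func main() {
	rec := AuditRecord{
		TS:           "2026-05-18T22:17:03Z",
		RPC:          "BatchReadBlobs",
		InstanceName: "spoke-elders",
		ClientID:     "example-client", // placeholder; real value comes from IAM (W4.2)
		Digests:      []string{"sha256:..."},
		Bytes:        12345,
		Result:       "ok",
	}
	line, err := rec.Encode()
	fmt.Println(string(line), err)
}
```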

Default-Deny Semantics Matrix

The wall this doc puts up, in one grid. “Cross-tenant” = a caller routing instance_name=spoke-A querying a digest or action authored by instance_name=spoke-B.

| Operation | Same-tenant behavior | Cross-tenant behavior | Why |
| --- | --- | --- | --- |
| FindMissingBlobs | normal | default-deny (digests reported missing) | digest-guess info disclosure defense |
| BatchReadBlobs (CAS) | normal | default-deny (NOT_FOUND per blob) | same |
| ByteStream.Read | normal | default-deny (NOT_FOUND) | same |
| BatchUpdateBlobs (CAS) | normal | N/A (writes target the caller’s instance) | writes always land in the caller’s namespace |
| ByteStream.Write | normal | N/A | same |
| GetActionResult (AC) | normal | default-deny (NOT_FOUND) | cross-tenant action reuse is a correctness risk, not a feature |
| UpdateActionResult (AC) | normal | N/A | writes target the caller’s instance |
| Execute | normal | default-deny (PERMISSION_DENIED) | execution requests for another tenant’s action namespace rejected |
| WaitExecution | normal | default-deny (NOT_FOUND) | operation IDs are scoped per instance |
| GetCapabilities | allowed | allowed | capabilities are not tenant-specific |

“Default-deny” here means the server answers the cross-tenant query as if the digest or action does not exist in this instance. It does not answer “unauthorized” with metadata that confirms the digest exists in another instance, because that itself is the info-disclosure channel we’re closing. Optional second mode (configurable, default off): structured PERMISSION_DENIED with a non-leaking reason code, useful in dev for debugging misconfigured clients. Production stays on quiet NOT_FOUND.

The Non-Spoke default Instance

default is a deliberate construct, not an oversight. It exists for three narrow uses:

  1. Dogfood traffic during migration. The existing bazel-cache lane (see tofu/modules/bazel-cache/) and any developer machine pointing at the cell before per-spoke .bazelrc lands routes through default. This keeps migration incremental — no big-bang cutover.
  2. Operator-driven smoke probes. Cell-internal health checks and synthetic probes (e.g. the TTFCH probe in slo.md) route through system; smoke probes that an operator runs from outside the cell, and that no spoke owns, route through default.
  3. Migration tail. After all spokes are migrated, default shrinks to zero traffic and gets deleted.

default is not a forever escape hatch. The discipline is:

  • 30-day migration window from this doc’s recommendation landing.
  • During the window, default accepts both reads and writes.
  • At end-of-window, default flips to read-only. New CI/dev callers that have not adopted --remote_instance_name=spoke-<slug> will see writes fail; the failure mode is loud and observable on the W5.3 fairness panel (TIN-1479).
  • 7 days after that, default reads are also disabled. Callers must route through a real spoke instance or be rejected.
  • Migration progress lives on the dashboard: per-day count of operations on default vs spoke-* instances. The metric tells the operator whether the window can close.

The shape is: default is the airlock, not the lobby.
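The window discipline in the list above can be written as a small state function. This Go sketch follows that list's schedule (30 days writable, then 7 days read-only, then disabled); the Phase type, DefaultPhase, and AllowDefault are hypothetical names, and counting whole days from the start of the window is an assumption of the sketch.

```go
package main

import "fmt"

// Phase is the state of the default instance on a given day.
type Phase string

const (
	PhaseWritable Phase = "read-write"
	PhaseReadOnly Phase = "read-only"
	PhaseDisabled Phase = "disabled"
)

const (
	migrationWindowDays = 30 // default accepts reads and writes
	readOnlyTailDays    = 7  // default accepts reads only
)

// DefaultPhase maps days elapsed since the window opened to a phase.
func DefaultPhase(day int) Phase {
	switch {
	case day < migrationWindowDays:
		return PhaseWritable
	case day < migrationWindowDays+readOnlyTailDays:
		return PhaseReadOnly
	default:
		return PhaseDisabled
	}
}

// AllowDefault reports whether a read or write against default is accepted.
func AllowDefault(day int, isWrite bool) bool {
	switch DefaultPhase(day) {
	case PhaseWritable:
		return true
	case PhaseReadOnly:
		return !isWrite
	default:
		return false
	}
}

func main() {
	for _, day := range []int{0, 29, 30, 36, 37} {
		fmt.Printf("day %2d: %-10s write=%v read=%v\n",
			day, DefaultPhase(day), AllowDefault(day, true), AllowDefault(day, false))
	}
}
```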

Integration with Siblings

This doc provides the routing primitive. Each sibling workstream consumes it and adds its own enforcement layer on top. Cross-references for the closing gate audit:

  • E4/W4.2 IAM (TIN-1473). IAM scopes are written as tenant:<slug> and instance:<slug>. A caller presents a token; the IAM layer reads the scopes, compares against the instance_name on the request, and enforces. This doc routes; that doc authorizes. Without IAM, instance_name is unauthenticated routing — a caller can claim any instance. IAM closes that gap by binding the token to the legal instance set.
  • E4/W4.3 executor pools (TIN-1474). Pool selection reads instance_name from the request context (set by this doc’s middleware) together with the action’s capability label and routes to the right executor pool. The pool selector does not duplicate the extraction — it depends on the context value this doc puts there.
  • E4/W4.4 quota enforcement (TIN-1475). Quotas are per-instance_name. The cache_gib, ttl_days, and eviction_policy values declared in spoke-cache-quota ConfigMaps become enforceable budgets indexed by this doc’s instance_name. The quota enforcer joins ConfigMap declarations to live CAS bytes-used metrics, both keyed by instance_name.
  • E2/W2.3 audit log (TIN-1464). instance_name is a required field on every audit record. The audit schema is W2.3’s contract; this doc commits to emitting the field.
  • E5/W5.3 fairness panel (TIN-1479). All dashboards group by instance_name. The per-tenant queue-skew SLO from slo.md (max(p95_queue_per_tenant) / median(p95_queue_per_tenant) < 2×) is computed across instance_name. The dashboard panel is a downstream consumer of this routing decision.
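As a worked example of the queue-skew ratio quoted in the fairness bullet, the Go sketch below computes max(p95) / median(p95) across instances. QueueSkew is a hypothetical helper, the sample values are illustrative rather than measured, and averaging the two middle values for an even count is an assumption; slo.md owns the actual definition.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// QueueSkew returns max(p95) / median(p95) over per-instance p95 queue times.
func QueueSkew(p95ByInstance map[string]float64) (float64, error) {
	if len(p95ByInstance) == 0 {
		return 0, errors.New("no instances to compare")
	}
	vals := make([]float64, 0, len(p95ByInstance))
	for _, v := range p95ByInstance {
		vals = append(vals, v)
	}
	sort.Float64s(vals)
	n := len(vals)
	median := vals[n/2]
	if n%2 == 0 {
		// Even count: average the two middle values (assumed convention).
		median = (vals[n/2-1] + vals[n/2]) / 2
	}
	if median <= 0 {
		return 0, errors.New("median p95 must be positive")
	}
	return vals[n-1] / median, nil
}

func main() {
	// Illustrative p95 queue times in seconds, keyed by instance_name.
	skew, err := QueueSkew(map[string]float64{
		"spoke-elders": 10, "spoke-blahaj": 20, "default": 30,
	})
	fmt.Printf("skew=%.2f within_slo=%v err=%v\n", skew, skew < 2, err)
}
```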

Migration Plan

Six numbered steps, sequential. Each step is small enough that a rollback is a tofu apply -target=... away.

  1. Make instance_name explicit on the dogfood path. Add --remote_instance_name=default to whatever .bazelrc the dogfood CI currently uses. Semantically a no-op — the cell already treats absent instance_name as default — but it makes the field appear on every request and gives the migration metric something to count.
  2. Land gf-reapi-cell instance routing. Handlers read instance_name, validate it, log it, and route CAS/AC/ByteStream/Execute through instance-scoped helper paths. This first implementation stores proof data under instances/<instance_name>/cas/... and instances/<instance_name>/ac/....
  3. Accept proof-store cold start. The proof cell’s prior un-prefixed local data is not migrated into default; the service is still explicitly proof-local and scale-to-zero. A deployment may lose old proof-cache warmth, which is acceptable until the durable CAS substrate is selected and proved.
  4. Stand up the first real spoke. For each spoke in lanes.json (e.g. elders, blahaj), apply spoke-cache-quota, spoke-runner-binding, spoke-state-namespace, and then wire --config=spoke-<slug> into the spoke’s CI. Run the five proved target classes (//app:build, //app:unit_tests, //:deployment_bundle, //docs-site:build, //:public_vendor_handoff_fixture) through the spoke instance and confirm isolated reads and writes.
  5. Drop default writability. After every active caller has migrated (visible on the dashboard as zero writes against default for 7 consecutive days), set default to read-only. New un-migrated callers fail loudly.
  6. Drop default reads. 30 days later, disable default reads. The only paths that remain are spoke-* and system. default is deleted from the validator regex.
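The gate in step 5 (zero writes against default for 7 consecutive days) is simple enough to state as code. In this Go sketch, DefaultWritesQuiet is a hypothetical helper and the input is assumed to be a per-day write count for default, oldest first, as the dashboard metric might export it.

```go
package main

import "fmt"

// DefaultWritesQuiet reports whether the most recent quietDays entries of
// dailyWrites (oldest first) are all zero. It returns false when fewer than
// quietDays days of data exist, so a new metric cannot pass the gate early.
func DefaultWritesQuiet(dailyWrites []int, quietDays int) bool {
	if quietDays <= 0 || len(dailyWrites) < quietDays {
		return false
	}
	for _, writes := range dailyWrites[len(dailyWrites)-quietDays:] {
		if writes != 0 {
			return false
		}
	}
	return true
}

func main() {
	history := []int{120, 40, 3, 0, 0, 0, 0, 0, 0, 0}
	fmt.Println("safe to flip default to read-only:", DefaultWritesQuiet(history, 7))
}
```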

Integration Test Contract

Concrete tests gf-reapi-cell (and ongoing CI) must pass. The first implementation slice carries Go unit coverage in services/gf-reapi-cell/internal/cell/server_test.go; a later production acceptance wrapper may promote the same checks into a shell-level contract matching the existing gf_reapi_cell_publish_contract.sh and rustfs_openebs_restart_hygiene.sh shape.

Setup. Two synthetic spokes: spoke-test-a, spoke-test-b. Both have ConfigMaps applied via the spoke modules. Both have valid IAM tokens under the first W4.2/TIN-1473 GF_REAPI_AUTHZ_MODE=enforce slice.

Test 1 — same-tenant read succeeds.

  1. spoke-test-a writes a 4 KiB blob with known digest D.
  2. spoke-test-a BatchReadBlobs([D]) → expect OK, body matches.

Test 2 — cross-tenant read is denied.

  1. (Reusing blob D written by spoke-test-a in Test 1.)
  2. spoke-test-b issues BatchReadBlobs([D]) → expect NOT_FOUND for D.
  3. Confirm the audit log shows instance_name=spoke-test-b, result=not_found.

Test 3 — cross-tenant write is independent.

  1. spoke-test-b writes the same content as Test 1 (same digest D).
  2. Expect OK. The blob lands in spoke-test-b’s namespace, does not collide with spoke-test-a’s copy, and does not make spoke-test-a’s reads of D go away.
  3. Confirm both spoke-test-a and spoke-test-b can BatchReadBlobs([D]) independently.

Test 4 — cross-tenant AC isolation.

  1. spoke-test-a writes an ActionResult for action digest A.
  2. spoke-test-b GetActionResult(A) → expect NOT_FOUND.
  3. spoke-test-a GetActionResult(A) → expect OK.

Test 5 — default migration window behavior.

  1. Caller with no --remote_instance_name (or =default) writes blob E.
  2. Same caller reads E → expect OK during migration window.
  3. After default flips read-only: write E2 → expect PERMISSION_DENIED.
  4. After default reads disabled: read E → expect NOT_FOUND / PERMISSION_DENIED.

Test 6 — invalid instance_name rejected.

  1. Caller sends --remote_instance_name=evil/../system or =Spoke-Elders (wrong case) → expect INVALID_ARGUMENT. No silent fallthrough.

The test suite is the closing gate for this workstream alongside the documentation: passes mean W4.1 is real, not just declared.

Failure Modes

| Failure | Current exposure | Design defense | Residual risk |
| --- | --- | --- | --- |
| Client forgets --remote_instance_name | Operates against default silently; behavior changes after migration window closes | Explicit =default in dogfood .bazelrc during migration; observable on fairness panel (W5.3); loud failure after window | None during window; after window, caller must adopt or be rejected (acceptable) |
| Compromised credentials in spoke-A reach spoke-B | Trivially impersonates spoke-B by sending instance_name=spoke-B if authz is off | First E4/W4.2 IAM (TIN-1473) slice binds token tenant/scope → legal instance set in enforce mode. Routing alone cannot fix this | Live rollout still needs token exchange, credential helper, and enforce-mode deployment proof |
| Digest collision across tenants | Today: same digest = same blob, shared (any tenant reads any tenant’s blob with that digest) | Per-instance prefixing makes the namespaces independent. Both copies exist in parallel, neither shadows the other | Storage cost: same content stored N times across N tenants; documented trade-off below |
| Hot digest accidentally shared across tenants | Today: free cross-tenant cache warmup as a side effect | Per-instance prefixing loses this warmup. Spoke B re-uploads what spoke A already cached, identical bytes | Cache-warming benefit lost. Not a correctness issue; a documented operational trade-off |
| instance_name typo in .bazelrc | Today: any string is honored, blobs land in a typoed namespace and look lost | Strict validator regex (^(spoke-[a-z][a-z0-9-]{1,62}\|default\|system)$) rejects typos as INVALID_ARGUMENT | None if validator is on; full mitigation depends on shipping the regex |
| Operator deletes a spoke while traffic in-flight | Today: ConfigMap removal does not block in-flight CAS reads | env-reaper sequence: instance marked read-only for 7 days before deletion; CI catches stale references | 7-day window is a policy not a hard guarantee; revisit if a hard-cut is ever needed |
| default lingers past migration window | Easy to forget; becomes a permanent escape hatch | Calendar gate: 30-day window from doc landing, then read-only, then deleted; metric on dashboard tracks default traffic | Operator discipline required to actually flip the switch on schedule |

Open Questions

These must be answered before W4.1 fully closes. Each is named so it can become a sub-ticket under TIN-1472.

  1. Wire format. Decided for the first implementation slice: standard REAPI request fields for CAS, AC, GetTree, and Execute; leading ByteStream path segment for ByteStream Read, Write, and QueryWriteStatus. Metadata-only instance hints are rejected by omission rather than accepted.
  2. Storage cost of per-instance namespacing. Cross-tenant cache warmup is lost, hot digests are duplicated. What’s the budget? At N spokes and a working set of W bytes, worst case is N×W storage. Recommendation pending: model against the gf-reapi-cell CAS primitives plan (cas-primitives.md), accept a 2–3× duplication ceiling, revisit if the metric trips.
  3. Cross-tenant cache warmup as an explicit feature. Should there be a blessed “this digest is public” lane that lets multiple spokes share a single physical blob? Recommendation: no. That is a coordination cost we don’t want, and it re-opens the digest-guess channel. If a class of blobs (e.g. the worker image’s base layer) is truly shared, it belongs in a different system (Attic, a separate bucket), not in the CAS-per-tenant contract.
  4. Interaction with TIN-1458 (gf-reapi-cell CAS primitives build-out). cas-primitives.md is the in-house plan for the CAS storage substrate gf-reapi-cell owns. Peer designs (inspiration only) key instance_name at different layers — Buildbarn at config-routing, BuildBuddy at the API-key + instance combination, bazel-remote at on-disk path. The (instance_name, digest) keying contract in this doc is the shape gf-reapi-cell implements regardless of which storage primitives the CAS build-out lands first. Refine the wiring section as cas-primitives.md lands its storage and keying decisions.
  5. system instance lockdown. system is reserved for cell-internal probes. Should the cell refuse system from any non-loopback origin, or rely on IAM (W4.2) to gate it? Recommendation pending: hard refuse at the cell perimeter, belt and suspenders with IAM.
  6. Multi-cluster instance scope. If gf-reapi-cell ever runs on both Honey and a future off-cluster environment simultaneously (it doesn’t today; this is post-current architecture), is instance_name cluster-scoped or global? Pending: this doc assumes single-cell. Revisit when multi-cell becomes real.
  7. Instance name on config/rbe-target-eligibility.json. Proofs today are recorded per target class without tenant context. Should the proof shape eventually carry instance_name so the eligibility validator can say “this target class is proved under spoke-elders specifically”? Pending: add an optional instance_name field to the proof schema; do not require it until E4 closes.
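For open question 2, a first-pass model of the duplication cost can be as small as the Go sketch below. It compares bytes stored under per-instance keying with the bytes a single shared namespace would hold. DuplicationFactor is a hypothetical helper, the inputs are illustrative, and the 3× ceiling in the usage is the upper end of the 2–3× range suggested above, not a measured budget.

```go
package main

import "fmt"

// DuplicationFactor returns stored bytes under per-instance keying divided
// by the bytes a single shared namespace would hold. Input maps
// instance_name -> digest -> blob size in bytes.
func DuplicationFactor(perInstance map[string]map[string]int64) float64 {
	var stored, unique int64
	seen := make(map[string]bool)
	for _, blobs := range perInstance {
		for digest, size := range blobs {
			stored += size // every instance pays for its own copy
			if !seen[digest] {
				seen[digest] = true
				unique += size // counted once in a shared namespace
			}
		}
	}
	if unique == 0 {
		return 0
	}
	return float64(stored) / float64(unique)
}

func main() {
	// Three spokes share one 100-byte blob and each hold one private blob.
	factor := DuplicationFactor(map[string]map[string]int64{
		"spoke-a": {"shared": 100, "only-a": 100},
		"spoke-b": {"shared": 100, "only-b": 100},
		"spoke-c": {"shared": 100, "only-c": 100},
	})
	fmt.Printf("duplication factor %.2f, under 3x ceiling: %v\n", factor, factor <= 3)
}
```

The worst case named in question 2 falls out directly: if all N spokes hold exactly the same working set, the factor equals N.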

References

GloriousFlywheel