CAS Primitives — In-House Build Plan for gf-reapi-cell
Framing note (load-bearing). This doc replaces the deleted
cas-backend-decision.md, which incorrectly framed the work as a Buildbarn / BuildBuddy / bazel-remote vendor bake-off. GloriousFlywheel is a peer to those projects, not a consumer. gf-reapi-cell owns every REAPI byte. Peers are studied as reference architectures for inspiration; they are never adoption candidates. Any future PR or doc that drifts back toward vendor adoption is a defect.
Decision summary
- Status: Working draft (W1.1 / TIN-1458, under parent E1 / TIN-1445).
- Frame: Pure in-house. Buildbarn, BuildBuddy, NativeLink, bazel-remote, and EngFlow are class peers, studied as inspiration only. No third-party REAPI server, CAS daemon, or storage daemon lives in the data path.
- What gf-reapi-cell ships today: Capabilities, ByteStream, CAS, AC, Execution, WaitExecution against ephemeral node-local PVC (local-path-sting-fast-ephemeral), digest verification on read/write paths, first-slice instance_name CAS/AC routing on proof-local disk, opt-in JWT tenant/scope authz, no eviction policy, no TTL contract, single-replica, no AC writer attestation.
- What’s being added in E1: S3-compatible storage substrate; digest verification on both read and write paths; namespace routing keyed by instance_name; bounded eviction (LRU or LFU); TTL + FindMissingBlobs keep-alive contract; sharding/replication topology decision; write-side attestation parallel to the AC writer.
- Explicitly out: Adopting any peer’s CAS server or storage daemon. Promoting the current RustFS topology to CAS/AC authority without TIN-1147 repair/proof evidence. Per-tenant per-mnemonic SLOs (E4 territory). Multi-cell topology (single-cell horizon).
Frame
GloriousFlywheel competes with Buildbarn, BuildBuddy, NativeLink, and
bazel-remote on the same primitive: a REAPI surface that turns Bazel’s
content-addressable build model into a shared substrate for local development
and CI. Those projects are class peers — they solve the same problem, with
their own architectural choices, in their own languages, against their own
storage stacks. gf-reapi-cell is the GloriousFlywheel-owned REAPI surface. It
already implements Capabilities, ByteStream, CAS, AC, Execution, and
WaitExecution end-to-end (see docs/build-system/gf-reapi-cell.md); the
default-branch proofs for //app:build, //app:unit_tests,
//:deployment_bundle, //docs-site:build, and the WAS-110 public-input
fixture all run through it. The data path is in-house from the first byte to
the last.
The work in E1 is to extend gf-reapi-cell’s primitives in-house so its
CAS data path is production-grade. “Production-grade” here means: durable
storage substrate that survives node loss; digest verification on every byte
crossing the boundary; namespace routing that makes tenant isolation honest;
bounded eviction that doesn’t poison referenced digests; an explicit TTL
contract Bazel’s --experimental_remote_cache_ttl can hold the cell to;
sharding/replication that survives a single-replica failure; and
write-side attestation so the audit log can attribute every byte. Each of
these primitives is owned by gf-reapi-cell. Peers are read for “how does
Buildbarn solve eviction across multiple bb-storage backends?” — not for
“which one do we adopt?” The answer to the second question is always: write
it in-house. The peer-frame discipline is captured in
feedback_rbe_backend and is non-negotiable.
What “Primitive” Means Here
The CAS layer decomposes into seven primitives, and gf-reapi-cell owns all
seven: storage substrate (where bytes live), digest verification
(every byte hashed and checked on read and write), namespace routing
(instance_name → tenant-keyed storage), eviction policy (bounded
capacity with LRU or LFU), TTL / lease (how long a blob is promised to
survive after last touch), sharding / replication (how the cell stays
available across replicas), and TLS / write attestation (who wrote this
byte and can we prove it). Each primitive has its own contract, its own
failure mode, and its own gating ticket. Decomposing this way lets the build
plan attack each primitive in isolation, name a library boundary where one
exists, and refuse a vendor adoption at the REAPI server level.
Layer-by-Layer Build Plan
One section per primitive. For each: what gf-reapi-cell ships today, what’s
missing, the in-house build recommendation, a short peer reference (tagged
inspiration only), and the failure mode if the primitive is not built.
a. Storage substrate
What gf-reapi-cell ships today. Ephemeral node-local PVC. The
deploy/gf-rbe/gf-reapi-cell.yaml manifest mounts
local-path-sting-fast-ephemeral (or another explicit
compute-expansion KVM/worker class), as documented in
docs/build-system/gf-reapi-cell.md under “Storage Boundary.” Storage
survives pod restart on the same node, does not survive node loss, does
not support cross-replica access, and has no quota or eviction layer.
What’s missing for production. Durability across node loss; a single
storage authority that multiple gf-reapi-cell replicas can write to and
read from; bounded capacity with an explicit eviction story; a backup or
disaster-recovery shape that doesn’t depend on the PVC’s underlying disk.
In-house build recommendation. Target an S3-compatible object store whose
repair, restore, quota, tenant-isolation, and lifecycle behavior has been
proved before it becomes the CAS/action-cache authority. The CAS layer in
gf-reapi-cell talks to it through a thin storage interface
(type CASStore interface { Get(ctx, digest) ([]byte, error); Put(ctx, digest, []byte) error; Stat(ctx, digest) (Metadata, error); Delete(ctx, digest) error; List(ctx, prefix) ([]Entry, error) }) so the substrate can be swapped without
touching the REAPI surface. No provider is selected in this doc. Civo Object
Storage is explicitly not an option. The candidate family is a
managed/appliance S3-compatible service, a self-hosted S3-compatible
object-store class, or a repaired/topology-changed RustFS path only if TIN-1147
supplies a separate promotion decision and evidence stronger than restart
recovery. See
the dedicated Storage substrate decision section below for the current
picking grid. The storage SDK is borrowable at the library level (AWS SDK for
Go’s s3 client, equivalent S3 client libraries) — these are clients to a substrate we operate,
not REAPI servers. Current RustFS is not the CAS/AC authority. TIN-1147 /
the RustFS RCA
(docs/research/gloriousflywheel-rustfs-state-backend-rca-gate-2026-05-06.md)
blocks it today: bucket-index reliability debt, restart-as-recovery, and
single-pod failure modes that recurred during the 2026-05-10 and 2026-05-11
incidents.
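A minimal sketch of the storage interface named above, written out as compilable Go. The text lists the five methods but not the field layouts, so Digest, Metadata, Entry, ErrNotFound, and the in-memory memStore are all hypothetical stand-ins for illustration; a real S3-compatible client would sit behind the same five methods.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
	"sync"
)

// Digest, Metadata, and Entry are hypothetical shapes: the text names these
// types but does not define their fields.
type Digest struct {
	Hash      string // lowercase hex SHA-256
	SizeBytes int64
}

type Metadata struct{ SizeBytes int64 }

type Entry struct{ Key string; SizeBytes int64 }

// ErrNotFound is an assumed sentinel for a missing blob.
var ErrNotFound = errors.New("cas: blob not found")

// CASStore carries the five methods listed in the text, with Go types filled in.
type CASStore interface {
	Get(ctx context.Context, d Digest) ([]byte, error)
	Put(ctx context.Context, d Digest, data []byte) error
	Stat(ctx context.Context, d Digest) (Metadata, error)
	Delete(ctx context.Context, d Digest) error
	List(ctx context.Context, prefix string) ([]Entry, error)
}

// memStore is an in-memory stand-in for an S3-compatible substrate.
type memStore struct {
	mu    sync.RWMutex
	blobs map[string][]byte // keyed by digest hash
}

func newMemStore() *memStore { return &memStore{blobs: make(map[string][]byte)} }

func (m *memStore) Get(_ context.Context, d Digest) ([]byte, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	b, ok := m.blobs[d.Hash]
	if !ok {
		return nil, ErrNotFound
	}
	return append([]byte(nil), b...), nil // copy so callers cannot mutate stored bytes
}

func (m *memStore) Put(_ context.Context, d Digest, data []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.blobs[d.Hash] = append([]byte(nil), data...)
	return nil
}

func (m *memStore) Stat(_ context.Context, d Digest) (Metadata, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	b, ok := m.blobs[d.Hash]
	if !ok {
		return Metadata{}, ErrNotFound
	}
	return Metadata{SizeBytes: int64(len(b))}, nil
}

func (m *memStore) Delete(_ context.Context, d Digest) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.blobs, d.Hash)
	return nil
}

// List returns entries whose key starts with prefix, in unspecified order.
func (m *memStore) List(_ context.Context, prefix string) ([]Entry, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	var out []Entry
	for k, b := range m.blobs {
		if strings.HasPrefix(k, prefix) {
			out = append(out, Entry{Key: k, SizeBytes: int64(len(b))})
		}
	}
	return out, nil
}

func main() {
	var store CASStore = newMemStore() // handlers depend only on the interface
	ctx := context.Background()
	d := Digest{Hash: "abc123", SizeBytes: 5}
	if err := store.Put(ctx, d, []byte("hello")); err != nil {
		panic(err)
	}
	md, err := store.Stat(ctx, d)
	fmt.Println(md.SizeBytes, err)
}
```

Because handlers hold only a CASStore value, replacing memStore with an S3-backed type would leave the REAPI surface untouched, which is the property the recommendation is after.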
Peer reference (inspiration only). Buildbarn separates its storage backend
(bb-storage with pluggable blobstore configs: local disk, S3, GCS, Redis,
sharded combinations) from its REAPI frontend. BuildBuddy’s OSS server uses
local disk or S3, keyed by API key + instance_name. bazel-remote uses local
disk with optional S3 proxy upload. NativeLink uses a stack of Store
implementations composed in TOML. EngFlow runs on managed object storage. The
inspiration: separate the storage interface from the protocol surface, pick
S3 semantics as the wire, swap substrates without touching handlers. The
adoption: none.
Failure mode if not built. Ephemeral storage means every node-loss event
re-poisons the CAS hit-rate SLO (slo.md target ≥ 90%), every cell restart
loses warm working set, and the cell cannot horizontally scale because
replicas have no shared storage to read from. This is the dominant gating
failure for E1/TIN-1445.
b. Digest verification
What gf-reapi-cell ships today. The first implementation slice re-hashes
CAS bytes on BatchUpdateBlobs / ByteStream.Write, BatchReadBlobs /
ByteStream.Read, directory materialization, tree walking, and presence-style
checks (FindMissingBlobs, QueryWriteStatus, AC write precondition). Write
mismatches return structured DIGEST_MISMATCH write / INVALID_ARGUMENT;
read mismatches return structured DIGEST_MISMATCH read / DATA_LOSS where
the REAPI method can surface a status. The cell exports the initial poison
counter as gf_reapi_digest_mismatch_total{path="read|write"} from /metrics.
What’s missing for production. The current counter is intentionally minimal.
Production still needs labels for {hash_function="sha256", instance_name="..."}, dashboard wiring, alert routing, an AC lookup audit path
that can prove referenced output digests were read through verified CAS, and a
durable CAS substrate with restore/retention evidence. The mismatch counter is
the gf_reapi_digest_mismatch_total SLI from slo.md; any nonzero value is a
paged incident.
In-house build recommendation. Keep the verification primitive owned inside
gf-reapi-cell’s CAS package. Structured error remains gRPC
INVALID_ARGUMENT on write mismatch (client sent bad bytes) and gRPC
DATA_LOSS on read mismatch (storage substrate corrupted the bytes; this is
the poison case). The hash function is borrowable at the library level (Go
crypto/sha256, BLAKE3 if/when the cell advertises BLAKE3 digests). The
verification logic itself is ours. Cross-link
TIN-1459 (W1.2 digest
verification + write-side attestation primitive).
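The write/read split described above can be sketched with the standard library alone. This is an illustration, not the cell's implementation: the Digest struct, the function names, and the two sentinel errors are hypothetical, and the error strings only stand in for the gRPC INVALID_ARGUMENT and DATA_LOSS statuses the text names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// Assumed stand-ins for the structured gRPC statuses named in the text.
var (
	ErrWriteMismatch = errors.New("INVALID_ARGUMENT: DIGEST_MISMATCH write")
	ErrReadMismatch  = errors.New("DATA_LOSS: DIGEST_MISMATCH read")
)

// Digest is a hypothetical pairing of hex SHA-256 and byte size.
type Digest struct {
	Hash      string
	SizeBytes int64
}

// DigestOf computes the SHA-256 digest of data.
func DigestOf(data []byte) Digest {
	sum := sha256.Sum256(data)
	return Digest{Hash: hex.EncodeToString(sum[:]), SizeBytes: int64(len(data))}
}

// matches re-hashes data and compares size and hash to the declared digest.
func matches(d Digest, data []byte) bool {
	return DigestOf(data) == d
}

// VerifyWrite rejects client bytes that do not match the declared digest.
func VerifyWrite(d Digest, data []byte) error {
	if !matches(d, data) {
		return ErrWriteMismatch
	}
	return nil
}

// VerifyRead rejects substrate bytes that no longer match the digest
// (the poison case).
func VerifyRead(d Digest, data []byte) error {
	if !matches(d, data) {
		return ErrReadMismatch
	}
	return nil
}

func main() {
	blob := []byte("hello")
	d := DigestOf(blob)
	fmt.Println(VerifyWrite(d, blob))
	fmt.Println(VerifyRead(d, []byte("hellp"))) // simulated substrate corruption
}
```

The same comparison runs on both paths; only the returned error differs, because a write mismatch blames the client while a read mismatch blames the substrate.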
Peer reference (inspiration only). Buildbarn re-hashes on read by default
through its VerifyingBlobAccess decorator; BuildBuddy re-hashes on write but
skips on read for latency unless verification mode is enabled; bazel-remote
verifies on write and trusts on read; NativeLink composes verification into
its store-decorator stack. The inspiration: the read path verification is
the one that catches storage substrate corruption — it’s expensive and
mandatory. The adoption: none.
Failure mode if not built. Storage corruption silently propagates into
Bazel actions; one bit-flip in object storage becomes a poisoned action result
that gets cached in the AC and re-served forever. This is the dominant
correctness failure for E1/TIN-1445 and feeds directly into the
digest-mismatch rate poison signal in slo.md (no error budget; any event
pages).
c. Namespace routing
What gf-reapi-cell ships today. The first implementation slice validates
instance_name as default, system, or spoke-<slug>, reads it from the
standard REAPI request field or ByteStream path prefix, and keys proof-local
CAS/AC files under instances/<instance_name>/cas/... and
instances/<instance_name>/ac/.... Cross-instance CAS and AC lookups miss by
construction. Existing empty instance_name traffic maps to default.
What’s missing for production. The implementation is still proof-local.
Production still needs IAM binding from caller identity to allowed
instance_name, a durable shared CAS substrate, per-instance quota/eviction,
audit records, metrics by instance, default read-only/deletion policy, and
multi-replica behavior. The wire-level design is
docs/build-system/instance-name-routing-design.md (W4.1 / TIN-1472); this
primitive is the CAS-side implementation of that contract.
In-house build recommendation. Keep the current proof-cell route shape and
lift it behind a storage interface before production durability lands. Storage
keys remain instance-scoped; the future CASStore interface gains an
instance_name parameter on every method. The cell validates against the
^(spoke-[a-z][a-z0-9-]{1,62}|default|system)$ regex and routes accordingly.
Cross-tenant reads return NOT_FOUND (not PERMISSION_DENIED — that confirms
existence in another tenant’s namespace, which is the info-disclosure channel
we close). The routing logic is in-house; no library applies. Cross-link
docs/build-system/instance-name-routing-design.md (sibling W4.1) and
TIN-1472.
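A minimal sketch of the validation and key-prefixing step, assuming only what the text states: the regex, the mapping of an empty instance_name to default, and the instances/<instance_name>/cas/ prefix. The function names and the "<hash>-<size>" key suffix are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// instanceRE is the validation regex quoted in the text.
var instanceRE = regexp.MustCompile(`^(spoke-[a-z][a-z0-9-]{1,62}|default|system)$`)

// ErrInvalidInstance is an assumed sentinel for a rejected instance_name.
var ErrInvalidInstance = errors.New("invalid instance_name")

// NormalizeInstance maps the empty instance_name to "default" and validates
// everything else against instanceRE.
func NormalizeInstance(name string) (string, error) {
	if name == "" {
		return "default", nil
	}
	if !instanceRE.MatchString(name) {
		return "", ErrInvalidInstance
	}
	return name, nil
}

// CASKey builds an instance-scoped storage key. The prefix follows the text;
// the "<hash>-<size>" suffix is an assumption made for this sketch.
func CASKey(instance, hash string, size int64) (string, error) {
	n, err := NormalizeInstance(instance)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("instances/%s/cas/%s-%d", n, hash, size), nil
}

func main() {
	for _, name := range []string{"", "spoke-alpha", "Spoke-Alpha", "system"} {
		key, err := CASKey(name, "abc123", 5)
		fmt.Printf("%q -> %q, %v\n", name, key, err)
	}
}
```

Since the instance is part of the key, a lookup from spoke-alpha for a digest that only spoke-beta wrote resolves to a key that does not exist, which is how the cross-tenant read surfaces as NOT_FOUND without a separate permission check.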
Peer reference (inspiration only). Buildbarn maps instance_name to a
storage-config block via its BlobAccessConfiguration switch. BuildBuddy
prefixes storage keys with the API key + instance_name pair. bazel-remote
uses instance_name as a literal on-disk path component with no auth.
NativeLink composes it through its Store stack. The inspiration: prefix
the storage key, route at the middleware boundary, default-deny
cross-tenant. The adoption: none.
Failure mode if not built. Every spoke shares one global CAS. Noisy neighbors evict quiet spokes’ working set. Digest-guess info disclosure becomes a real attack surface. Per-tenant quota enforcement (E4/W4.4 / TIN-1475) has nothing to count. The tenant model in E4 cannot close without this primitive landing in the CAS layer.
d. Eviction policy
What gf-reapi-cell ships today. The first local-backend size bound exists:
GF_REAPI_CAS_MAX_BYTES enables a lease-protected, LRU-ordered CAS evictor
that skips blobs inside GF_REAPI_MIN_CLIENT_CACHE_TTL, reconciles durable
quota counters after reclamation, and emits gf_reapi_size_eviction_* plus the
gf_reapi_evicted_while_referenced_total poison tripwire. It is disabled by
default and applies to CAS blobs on the local backend only; S3/object-store
size policy still belongs to the backend’s lifecycle/ILM/quota layer.
What’s missing for production. A per-instance_name size budget (initial
default reads from spoke-cache-quota’s cache_gib ConfigMap value; fallback
global default if absent), high/low-water hysteresis, S3/object-store lifecycle
policy proof, and a distributed referenced-set when multiple cell replicas or
worker pools are executing against the same CAS. LRU is the recommended
default because Bazel’s access pattern is dominated by “recently-built ⇒
likely-rebuilt”; LFU is documented as the alternative and left as an open
question pending real workload data.
In-house build recommendation. Owned by gf-reapi-cell’s storage layer.
The eviction loop runs as a background goroutine per instance_name, reads
bytes-used from the storage substrate (S3 ListObjectsV2 with prefix), holds
an in-memory access-time index (or atime/last-touch-time stored as object
metadata on the substrate — the implementation picks one and documents it),
and issues Delete calls in least-recent-first order until utilization is
back under the low-water mark. The LRU index data structure is borrowable at
the library level (github.com/hashicorp/golang-lru/v2, generic Go LRU
implementations); the policy wiring is ours. Critical invariant: eviction
must never delete a digest that is currently referenced by an unfinished
action. The bytes-evicted-while-referenced poison signal in slo.md (no
error budget; any event pages) is the contract this primitive must hold up.
The referenced-set lives in gf-reapi-cell’s in-memory action tracker —
every Execute and WaitExecution registers the action’s input digests as
referenced, releases them on action completion or timeout.
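The eviction pass and its referenced-set invariant can be sketched as a pure planning function. Everything here is illustrative: the blob index entry, the refSet type, and planEviction are hypothetical names, and the index is a plain slice rather than whichever access-time store the implementation eventually picks.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// blob is a hypothetical index entry; the text leaves the index layout open.
type blob struct {
	Digest    string
	SizeBytes int64
	LastTouch time.Time
}

// refSet counts unfinished actions that reference each digest.
type refSet map[string]int

// Acquire registers an action's input digests as referenced.
func (r refSet) Acquire(digests ...string) {
	for _, d := range digests {
		r[d]++
	}
}

// Release drops the references on action completion or timeout.
func (r refSet) Release(digests ...string) {
	for _, d := range digests {
		if r[d] <= 1 {
			delete(r, d)
		} else {
			r[d]--
		}
	}
}

// planEviction returns digests to delete, least recently touched first, until
// usage would fall to lowWater. It plans nothing unless usage exceeds
// highWater, and it never selects a referenced digest.
func planEviction(blobs []blob, refs refSet, highWater, lowWater int64) []string {
	var used int64
	for _, b := range blobs {
		used += b.SizeBytes
	}
	if used <= highWater {
		return nil
	}
	sorted := append([]blob(nil), blobs...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].LastTouch.Before(sorted[j].LastTouch)
	})
	var victims []string
	for _, b := range sorted {
		if used <= lowWater {
			break
		}
		if refs[b.Digest] > 0 {
			continue // referenced by an unfinished action: never evict
		}
		victims = append(victims, b.Digest)
		used -= b.SizeBytes
	}
	return victims
}

func main() {
	t0 := time.Unix(0, 0)
	blobs := []blob{
		{"a", 40, t0},
		{"b", 40, t0.Add(time.Minute)},
		{"c", 40, t0.Add(2 * time.Minute)},
	}
	refs := refSet{}
	refs.Acquire("a") // an in-flight action depends on the oldest blob
	fmt.Println(planEviction(blobs, refs, 100, 60))
}
```

Separating planning from the Delete calls keeps the invariant testable without a substrate: the caller issues deletes only for the returned digests, and a referenced digest can never appear in that list.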
Peer reference (inspiration only). Buildbarn’s SizeDistinguishingBlobAccess
splits small and large blobs with separate eviction policies. BuildBuddy uses
LRU at the row level with TTL overlay. bazel-remote uses LRU with a
disk-bound --max_size. NativeLink composes via FilesystemStore + size
limits. EngFlow runs LRU with hot-tier pinning. The inspiration: bound the
cache, evict on a high-water trigger, separate the referenced-set from the
LRU set. The adoption: none.
Failure mode if not built. The substrate fills and writes start failing.
Either Bazel clients enter retry storms (the action retry rate SLI from
slo.md trips and its < 1% budget is consumed in hours), or the entire cell
becomes unwritable until manual operator intervention. Worse: an ad-hoc
eviction (operator-driven rm) hits a referenced digest and poisons an
in-flight action.
e. TTL / lease
What gf-reapi-cell ships today. GF_REAPI_BLOB_TTL enables local TTL
eviction for instances/<name>/{cas,ac}, and
GF_REAPI_MIN_CLIENT_CACHE_TTL is a startup guard: TTL must be greater than or
equal to the Bazel client cache lease. The local backend also touches CAS/AC
objects on read so the filesystem mtime is an LRU signal for TTL and size
eviction. GF_REAPI_CAS_MAX_BYTES requires the same lease floor before it will
start.
What’s missing for production. A documented TTL contract Bazel’s
--experimental_remote_cache_ttl can hold the cell to: after a successful
FindMissingBlobs returns “not missing” for a digest, the cell promises the
digest survives for at least N seconds. The Bazel client uses
FindMissingBlobs as a keep-alive: as long as a build periodically refreshes
the action’s input digests, the cell agrees not to evict them. The minimum
contractually-guaranteed lifetime is the TTL.
In-house build recommendation. Add an expires_at field to the CAS
metadata (stored as object metadata on the storage substrate, or in an
adjacent index). On Put, expires_at = now() + ttl_default. On every
FindMissingBlobs hit, refresh: expires_at = max(expires_at, now() + ttl_default). The eviction loop from primitive d treats expires_at as a
floor — never evict a blob whose expires_at is in the future, regardless of
LRU position. Default ttl_default is 7 days (matches the planned
ttl_days field on spoke-cache-quota ConfigMaps); per-instance override
reads from the ConfigMap. The contract is named in
--experimental_remote_cache_ttl semantics on the Bazel client side; cell
side it is enforced in the eviction loop. Cross-link
TIN-1460 (W1.3 TTL + lease
contract).
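The expires_at rules above reduce to three small operations. The sketch below assumes a hypothetical lease record and function names; only the two formulas and the "never evict before expires_at" floor come from the text.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a hypothetical per-blob metadata record holding expires_at.
type lease struct {
	ExpiresAt time.Time
}

// onPut sets expires_at = now + ttl for a freshly written blob.
func onPut(now time.Time, ttl time.Duration) lease {
	return lease{ExpiresAt: now.Add(ttl)}
}

// renew applies expires_at = max(expires_at, now + ttl) on a
// FindMissingBlobs hit, so a renewal can extend a lease but never shorten it.
func (l *lease) renew(now time.Time, ttl time.Duration) {
	if candidate := now.Add(ttl); candidate.After(l.ExpiresAt) {
		l.ExpiresAt = candidate
	}
}

// evictable reports whether the TTL floor has passed. The eviction loop
// would consult this before considering LRU position.
func (l *lease) evictable(now time.Time) bool {
	return !l.ExpiresAt.After(now)
}

func main() {
	const ttl = 7 * 24 * time.Hour // the 7-day default named in the text
	day := 24 * time.Hour
	start := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)

	l := onPut(start, ttl)
	l.renew(start.Add(3*day), ttl) // keep-alive on day 3 moves expiry to day 10

	fmt.Println(l.evictable(start.Add(8 * day)))
	fmt.Println(l.evictable(start.Add(10 * day)))
}
```

Taking the maximum rather than overwriting matters if ttl_default is ever lowered through the ConfigMap: a blob that was already promised a longer lifetime keeps it.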
Peer reference (inspiration only). Buildbarn’s CompletenessCheckingBlobAccess
extends TTL on FindMissingBlobs hits. BuildBuddy implements
--experimental_remote_cache_ttl natively with per-org overrides.
bazel-remote does not honor TTL beyond raw LRU. NativeLink supports
TTL through its Store decorators. The inspiration: FindMissingBlobs is
the keep-alive primitive; treat every hit as a lease renewal. The
adoption: none.
Failure mode if not built. Long-running builds (CI cold paths, multi-hour
RBE jobs) lose digests mid-action; the AC then surfaces an action result
whose referenced inputs are gone; the next read returns NOT_FOUND and the
client falls back to local execution or fails. This trips the action retry rate and digest-mismatch rate SLIs in slo.md simultaneously.
f. Sharding / replication
What gf-reapi-cell ships today. Single-replica Deployment. One pod, one
PVC, one node. Pod restart loses any in-memory state (action tracker,
referenced-set, LRU index); node loss loses all storage.
What’s missing for production. A deployment topology that survives single
pod / single node loss without losing the working set. With S3-compatible
storage as the substrate (primitive a), the data tier is already
horizontally available; what’s missing is multi-replica gf-reapi-cell
pods that share that substrate. Replicas must agree on the referenced-set
(so eviction in replica A doesn’t poison an action running in replica B) and
on the LRU index (so eviction decisions are consistent).
In-house build recommendation. Phase 1: stateless replicas. Multiple
gf-reapi-cell pods, all reading and writing the same S3 substrate. Each
replica maintains its own in-memory referenced-set scoped to actions it
itself is executing; the eviction loop runs on a single elected leader
(Kubernetes lease via coordination.k8s.io/v1/Lease). Phase 2 (deferred):
shared referenced-set via Redis or an in-cell consensus layer if leader
eviction becomes a bottleneck. The leader-election library is borrowable
(k8s.io/client-go/tools/leaderelection); the eviction policy and
referenced-set logic are ours. Sharding by instance_name is implicit
because storage is already prefixed (primitive c); no additional sharding
layer is needed at the CAS level for the single-cell horizon. Cross-link
TIN-1461 (W1.4 sharding /
replication topology).
Peer reference (inspiration only). Buildbarn shards via
ShardingBlobAccess with weighted backends. BuildBuddy scales horizontally
on its API layer with shared object storage. bazel-remote does not shard.
NativeLink composes ShardStore for backend distribution. EngFlow runs a
managed sharded fleet. The inspiration: stateless REAPI replicas on shared
durable storage is the simplest path to HA; leader-elect the eviction
loop. The adoption: none.
Failure mode if not built. Single-pod failure = full cell outage. The
gf-reapi-cell availability SLI (not yet named in slo.md; tracked as part
of E5/TIN-1449) trips immediately. Cold-start time after pod restart pushes
TTFCH past its < 90s SLO.
g. TLS / write attestation
What gf-reapi-cell ships today. mTLS terminates at the cell ingress
(per the deploy/gf-rbe/gf-reapi-cell.yaml ingress boundary); CAS writes
have no per-blob attestation. The AC side has a parallel writer-attestation
design landing in docs/build-system/ac-writer-attestation-design.md.
What’s missing for production. Per-blob write attestation: the audit
record for every BatchUpdateBlobs / ByteStream.Write carries the writing
identity (from the mTLS peer cert or the IAM token from E4/W4.2 once landed),
the digest, the byte count, the instance_name, and a timestamp. The audit
log is the same surface instance-name-routing-design.md names as W2.3 /
TIN-1464.
In-house build recommendation. Mirror the AC-side design from
ac-writer-attestation-design.md: same audit envelope, same identity
extraction, same field names. The CAS-side handler emits one audit record
per blob written (batched if multiple blobs in one RPC, but each blob gets
its own record). The signing of audit records, if/when audit records get
signed, is the same primitive as AC; both share whatever signing library
lands there. TLS termination remains at the ingress; in-cluster traffic
between gf-reapi-cell replicas and the storage substrate uses cluster mTLS
(NetworkPolicy already in gf-reapi-cell.yaml). Cross-link
docs/build-system/ac-writer-attestation-design.md (sibling in this corpus).
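A sketch of the one-record-per-blob rule, assuming a hypothetical envelope. The five fields are the ones the text lists; the struct name, JSON field names, and the example identity string are placeholders until ac-writer-attestation-design.md fixes the shared envelope.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// casWriteAudit is a hypothetical audit envelope. The fields come from the
// text; the JSON names are assumptions, not the AC-side design's names.
type casWriteAudit struct {
	Identity     string    `json:"identity"`
	Digest       string    `json:"digest"`
	SizeBytes    int64     `json:"size_bytes"`
	InstanceName string    `json:"instance_name"`
	Timestamp    time.Time `json:"timestamp"`
}

// writtenBlob is a hypothetical summary of one blob in a write RPC.
type writtenBlob struct {
	Digest    string
	SizeBytes int64
}

// auditBatch returns one record per blob, even when several blobs arrive in a
// single BatchUpdateBlobs-style request.
func auditBatch(identity, instance string, now time.Time, blobs []writtenBlob) []casWriteAudit {
	recs := make([]casWriteAudit, 0, len(blobs))
	for _, b := range blobs {
		recs = append(recs, casWriteAudit{
			Identity:     identity,
			Digest:       b.Digest,
			SizeBytes:    b.SizeBytes,
			InstanceName: instance,
			Timestamp:    now.UTC(),
		})
	}
	return recs
}

func main() {
	now := time.Date(2026, 1, 1, 12, 0, 0, 0, time.UTC)
	// The identity value is a made-up example of an mTLS peer identity.
	recs := auditBatch("spiffe://example/ci-runner", "spoke-alpha", now,
		[]writtenBlob{{"abc123", 5}, {"def456", 9}})
	for _, r := range recs {
		line, err := json.Marshal(r)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(line))
	}
}
```

Emitting each record as its own structured line keeps attribution per digest, so a later digest-mismatch page can be joined to exactly one write.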
Peer reference (inspiration only). Buildbarn does not emit write
attestation by default; it logs at the storage backend layer. BuildBuddy
attaches write attestation to its audit log keyed by API key. bazel-remote
has no write attestation. NativeLink can emit through its
StoreFilter stack. EngFlow has enterprise audit. The inspiration: the
write attestation primitive is the AC and CAS sharing one audit envelope;
don’t invent two. The adoption: none.
Failure mode if not built. No forensic trail when a poisoned digest
surfaces. The digest-mismatch rate SLI in slo.md becomes a paged
incident with no way to attribute the bad write. Tenant-attribution for
quota and noisy-neighbor RCAs has no source data.
The Build-vs-Borrow Line
Explicit, because this is where the deleted predecessor doc went wrong.
| Borrowable (library level only) | Not borrowable (REAPI / CAS server level) |
|---|---|
| AWS SDK for Go s3 client; equivalent S3 client libraries | Buildbarn bb-storage daemon |
| crypto/sha256, BLAKE3 (lukechampine.com/blake3): hash function implementations | BuildBuddy OSS server |
| github.com/hashicorp/golang-lru/v2: generic LRU data structure | bazel-remote binary |
| k8s.io/client-go/tools/leaderelection: Kubernetes lease primitive | NativeLink REAPI binary |
| google.golang.org/grpc: gRPC server framework (already in use) | EngFlow scheduler |
| github.com/prometheus/client_golang/prometheus: metric emission | Any peer’s CAS routing layer |
| go.opentelemetry.io/otel: tracing emission | Any peer’s eviction policy implementation |
| Standard structured-log libraries (zap, zerolog, slog) | Any peer’s digest-verification middleware |
| TLS / mTLS primitives from the Go standard library | Any peer’s audit log envelope |
The rule. When in doubt, write it in-house. A library is
borrowable when it implements a generic primitive (LRU, SHA-256, gRPC, S3
client). A library is not borrowable when it implements REAPI semantics,
CAS protocol behavior, or any decision-layer logic specific to remote build
execution. That second class is where strategic coupling to a competitor
hides — buildbarn/bb-storage’s BlobAccess interface looks like a generic
storage shim until you notice it carries Buildbarn-shaped semantics for
instance_name, eviction, and verification, at which point adopting it
adopts Buildbarn’s product decisions. Don’t.
Storage Substrate Decision
Picking the storage substrate in one section, so the operator reading just this part can see the answer.
No default provider selected yet. The Week-1 decision is to keep CAS on a proved S3-compatible substrate and to require a concrete endpoint package before implementation. Civo Object Storage is explicitly excluded. Current RustFS is blocked by TIN-1147 and does not qualify from green canaries or restart recovery. A future candidate must name the endpoint family, authentication model, lifecycle policy, restore proof, regional/failure-domain behavior, and tenant-isolation shape before it can become the CAS authority.
Managed/appliance S3-compatible candidate. This is the preferred class if
it gives us operator-owned credentials, lifecycle policy, restore evidence,
and enough availability without adding a second storage service we run
ourselves. The implementation should still use the same CASStore interface
and an S3-compatible client library so the REAPI surface is not coupled to a
provider.
Self-hosted S3-compatible candidate. Keep this as a candidate class when operator control and predictable latency matter more than reducing the storage operations surface. The current live member of that class is RustFS for existing cache/state paths; using RustFS for CAS/AC requires TIN-1147 repair, restore, lifecycle, bucket-index coherence, and failure-domain evidence before promotion.
Test-environment fallback. CI or integration environments may use a scoped
S3-compatible test bucket with ephemeral credentials. The test endpoint must
exercise the same CASStore contract and must not become the production
authority by accident.
Current RustFS promotion is gated today. TIN-1147 / the RustFS RCA
(docs/research/gloriousflywheel-rustfs-state-backend-rca-gate-2026-05-06.md)
documents the bucket-index reliability failure that recurred on 2026-05-10
and the storage-node recovery failure on 2026-05-11. Restart-as-recovery is
an incident response, not an availability design. CAS hit rate, action retry
rate, and digest-mismatch rate SLOs cannot be met on a substrate whose
bucket-index visibility depends on a clean restart. Current RustFS is not the
CAS/AC authority. TIN-1147 may still produce a repaired RustFS topology or a
replacement backend, but current RustFS CAS/AC promotion remains blocked by
TIN-1147 until a new promotion decision and evidence clear the recurrence
class.
Explicitly disqualified: any peer’s storage daemon as the substrate.
Running bb-storage as the S3 endpoint would be adopting Buildbarn at the
data-tier layer — the CASStore interface would be talking to a
Buildbarn-flavored API, and Buildbarn’s eviction / sharding / verification
choices would propagate up into our cell. That’s the strategic coupling the
peer-frame discipline refuses.
Prioritized Backlog
Ordered by gating-power for the E1/TIN-1445 close criteria (CAS hit rate
≥ 90% for 14 days, digest-mismatch rate = 0 for 14 days,
evicted-while-referenced = 0 for 14 days, cas-primitives-static-gate
passes).
- Storage substrate (primitive a): TIN-1458 / W1.1 (this doc). Without durable substrate, every other primitive measures noise. Require a concrete S3-compatible endpoint package with restore evidence. Keep managed/appliance, self-hosted S3-compatible, and any separately promoted RustFS candidate explicit; Civo is excluded and current RustFS CAS/AC promotion remains blocked by TIN-1147.
- Digest verification (primitive b): TIN-1459 / W1.2. The first correctness slice is implemented in gf-reapi-cell read/write and presence-check paths with a minimal poison counter. Remaining production work is richer labels, dashboard/alert wiring, and AC lookup provenance.
- TTL / lease (primitive e): TIN-1460 / W1.3. Required before any bounded substrate (a) can run an eviction loop (d) without poisoning long-running actions.
- Eviction policy (primitive d): under TIN-1460’s neighborhood; schedule the eviction-loop sub-ticket explicitly. LRU default; LFU deferred. Referenced-set invariant is the hard contract.
- Sharding / replication (primitive f): TIN-1461 / W1.4. Phase 1 stateless replicas + leader-elected eviction. Phase 2 deferred.
- Namespace routing (primitive c): W4.1 / TIN-1472 (already drafted in instance-name-routing-design.md). The CAS-side implementation lands alongside that doc; this primitive’s gating-power is on E4, not E1, so it’s lower in the E1 backlog even though it touches every CAS request.
- TLS / write attestation (primitive g): co-lands with ac-writer-attestation-design.md. Audit envelope is shared with AC; no separate ticket needed for E1 close, but the audit record must include CAS writes by the time E5/TIN-1449 closes.
Failure Modes — At the Primitive Level
| Failure | Which primitive owns the defense | Residual risk |
|---|---|---|
| Node loss wipes warm CAS | a. storage substrate | Substrate outage or object-store host loss; mitigated by explicit restore and fallback substrate selection |
| Storage substrate returns corrupted bytes | b. digest verification (read path) | Pre-verification corruption between substrate write and substrate read (in-flight memory corruption); rare |
| Client uploads bytes that don’t match the declared digest | b. digest verification (write path) | None — write-side is fully covered by re-hashing |
| Spoke A reads spoke B’s blob by digest-guess | c. namespace routing | First JWT authz slice exists behind GF_REAPI_AUTHZ_MODE; live rollout still needs token exchange, credential helper, and enforce-mode proof |
| Substrate fills, writes start failing | d. eviction policy | Eviction loop fails to keep up under burst write load; mitigated by sizing high-water mark conservatively |
| Eviction deletes a referenced digest mid-action | d. eviction policy (referenced-set invariant) | Referenced-set is per-replica in Phase 1; cross-replica reference leak deferred to Phase 2 + Redis |
| Long-running build loses digests mid-action | e. TTL / lease | Client must actually call FindMissingBlobs periodically; older Bazel versions don’t always |
| Single pod failure takes down the cell | f. sharding / replication | Leader election failure during eviction is recoverable on next lease cycle; brief eviction pause acceptable |
| Poisoned blob surfaces without forensic trail | g. TLS / write attestation | Audit log is append-only but not yet signed; signing deferred (see Open Questions) |
| Storage substrate provider compromise | a. + g. (substrate selection + audit) | Outside the trust boundary of this doc; named in Open Questions |
| Cross-cell instance_name collision | c. namespace routing | Single-cell horizon; multi-cell named in Open Questions |
Open Questions
These must be answered before W1.1 closes. Each is named so it can become a sub-ticket under TIN-1458.
- Managed/appliance S3-compatible service vs self-hosted S3-compatible service? Civo Object Storage is not a candidate. The open decision is whether a managed/appliance endpoint can give us a better operator contract than running a self-hosted S3-compatible service ourselves or promoting a repaired RustFS topology. Pending: run a measured availability, restore, and P99 read/write-latency probe against the real candidate set.
- LRU vs LFU for eviction in our access pattern? LRU is the default recommendation. LFU may win on workloads where a small hot set is re-accessed across many builds (compile-cache-dominant) and a long tail of one-shot blobs (test fixtures, generated configs) gets discarded correctly. Pending: instrument the access-time histogram on the first 30 days post-substrate-cutover, decide LRU vs LFU vs hybrid based on data, not vibes.
- Where does the CAS hot-blob set live across
gf-reapi-cellreplicas? Phase 1: per-replica in-memory referenced-set, leader-elected eviction. Phase 2: shared referenced-set in Redis or equivalent. The question: when does Phase 1 stop being sufficient? Pending: if eviction-leader pod restart causes a measurable eviction-pause SLI breach, promote to Phase 2. - When does TIN-1147 RustFS recovery change anything here? Only if it produces more than restart recovery: a repaired or topology- changed RustFS authority with retention, restore, quota, tenant isolation, and recurrence-clearing evidence, plus an explicit promotion decision. Otherwise CAS/AC stays on a different selected S3-compatible substrate.
- How does this interact with the spoke-cache-quota tenant model? The spoke-cache-quota ConfigMap declares cache_gib, ttl_days, and (as of the most recent module variant) attic_namespace and bazel_cache_prefix. The CAS eviction loop reads these as the per-instance_name quota; the TTL contract reads ttl_days. Open question: what happens when an operator updates the ConfigMap mid-flight? Does the eviction loop watch for changes or re-read on each pass? Pending: pick “watch” because update-during-burn is the operator-recovery path.
- Audit-log signing. The write-attestation primitive emits an audit record per blob; the AC side does the same. Are records signed? By what key? Today: append-only structured log on the cell’s existing log surface. Pending: when the audit log graduates from “evidence for operators” to “evidence for compliance,” pick a signing scheme; for now, unsigned structured records are sufficient.
- Cross-substrate migration shape. If we cut from one selected S3-compatible substrate to another, what’s the data move? Pending: write a one-page cutover runbook when the substrate decision lands. Likely answer: dual-write window, read-fallthrough during cutover, single-substrate cutoff once the new substrate’s hit-rate matches the old.
- Hash function plurality. REAPI v2 supports SHA-256 as the default, with BLAKE3 and others optionally. Bazel today uses SHA-256. gf-reapi-cell’s capabilities response should declare its supported hash functions; today it declares only SHA-256. Pending: leave as SHA-256-only until a client asks for BLAKE3.
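For the LRU vs LFU question above, the hypothesised failure mode can be reproduced on a toy trace before any real histogram exists. The sketch below is illustrative only: `synthetic_trace`, `lru_hits`, and `lfu_hits` are hypothetical names, the trace shape (each build touches a small hot set twice, then a run of one-shot blobs) is an assumption rather than measured data, and the LFU tie-break (evict the oldest-inserted among the least-frequent) is one arbitrary choice among several.

```python
from collections import OrderedDict


def synthetic_trace(builds, hot, tail):
    """Assumed workload: every build reads the hot set twice, then one-shot blobs."""
    trace = []
    for b in range(builds):
        hot_keys = [f"hot-{i}" for i in range(hot)]
        trace += hot_keys + hot_keys
        trace += [f"build{b}-tail-{i}" for i in range(tail)]
    return trace


def lru_hits(trace, capacity):
    """Count hits for a least-recently-used cache holding `capacity` blobs."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used blob
            cache[key] = True
    return hits


def lfu_hits(trace, capacity):
    """Count hits for a least-frequently-used cache; ties evict the oldest insert."""
    freq = {}   # access count per resident blob
    order = {}  # insertion tick per resident blob, used only as a tie-break
    hits = 0
    for tick, key in enumerate(trace):
        if key in freq:
            hits += 1
            freq[key] += 1
        else:
            if len(freq) >= capacity:
                victim = min(freq, key=lambda k: (freq[k], order[k]))
                del freq[victim]
                del order[victim]
            freq[key] = 1
            order[key] = tick
    return hits


if __name__ == "__main__":
    trace = synthetic_trace(builds=5, hot=4, tail=6)
    print("accesses:", len(trace))
    print("LRU hits:", lru_hits(trace, capacity=6))
    print("LFU hits:", lfu_hits(trace, capacity=6))
```

On this constructed trace the one-shot tail flushes the hot set out of the LRU cache on every build, while LFU keeps it resident once its counts pull ahead. That only shows the hypothesis is coherent; a trace without a stable hot set, or one where old high-count blobs go stale, can reverse the result, which is why the decision stays pending on the real access-time histogram.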
References
- Repo-local: docs/research/gloriousflywheel-rustfs-state-backend-rca-gate-2026-05-06.md (why current RustFS is not CAS/AC authority)
- Repo-local: docs/build-system/gf-reapi-cell.md (current shape of the cell this doc extends)
- Repo-local: docs/build-system/ac-writer-attestation-design.md (sibling primitive; CAS write-attestation shares its envelope)
- Repo-local: docs/build-system/instance-name-routing-design.md (sibling primitive; CAS namespace routing is its CAS-side implementation)
- Repo-local: docs/build-system/slo.md (the SLOs this primitives plan must hold up)
- Repo-local: config/rbe-target-eligibility.json (proof shape for what currently runs through the cell)
- Repo-local: deploy/gf-rbe/gf-reapi-cell.yaml (current deployment manifest the substrate decision will edit)
- Repo-local: tofu/modules/spoke-cache-quota/ (tenant quota declarations the eviction loop consumes)
- REAPI v2 spec (canonical CAS / AC / ByteStream message definitions)
- Linear epic: TIN-1445 (E1 CAS authority — parent)
- Linear siblings: TIN-1458 (W1.1 this doc), TIN-1459 (W1.2 digest verification), TIN-1460 (W1.3 TTL / lease), TIN-1461 (W1.4 sharding / replication)
- Linear adjacent: TIN-1472 (W4.1 instance-name routing — CAS-side implementation site), TIN-1147 (RustFS RCA; disqualification source), TIN-1449 (E5 observability; consumes the metrics this doc names)
- Repo-local memory: feedback_rbe_backend (peer-frame discipline; this doc is its load-bearing application to CAS)
Peer architectures — INSPIRATION ONLY — these are peers, not adoption candidates
- Buildbarn bb-deployments: reference deployment shapes for a peer’s CAS / scheduler / worker topology. INSPIRATION ONLY: peer, not adoption candidate.
- Buildbarn bb-storage blobstore configuration: peer’s storage interface decomposition. INSPIRATION ONLY: peer, not adoption candidate.
- BuildBuddy OSS: peer’s monolith REAPI server with per-org tenant model. INSPIRATION ONLY: peer, not adoption candidate.
- buchgr/bazel-remote: peer’s minimalist cache-only server. INSPIRATION ONLY: peer, not adoption candidate.
- NativeLink: peer’s Rust-implemented composable Store stack. INSPIRATION ONLY: peer, not adoption candidate.
- EngFlow: peer’s managed-service patterns for IAM scope composition with instance_name. INSPIRATION ONLY: peer, not adoption candidate.