Cache and State Backend Roles

Canonical reference for cache systems and state backend authority in GloriousFlywheel. Each system has a specific role; this document is the single place that defines what each does, where it runs, and what its limits are.

For the FlakeHub vs Attic evaluation and hybrid architecture rationale, see FlakeHub vs Attic.

Cache Roles

| System | Role | Endpoint | Backing Store | Notes |
| --- | --- | --- | --- | --- |
| Attic | Internal CI binary cache for Nix store paths | nix-cache.tinyland.dev (HTTPS) / attic.nix-cache.svc (cluster-internal) | RustFS S3 on bumble (OpenEBS ZFS) | Primary CI cache. Chunk-level NAR dedup. Signing keys per machine/runner. |
| FlakeHub | Public Nix flake publication and discovery | flakehub.com/f/tinyland-inc/GloriousFlywheel/* | Determinate Systems SaaS | Publication only; not in the active runtime path. Used for downstream discoverability, not as a CI cache. |
| Bazel remote cache | Optional build acceleration for Bazel targets | bazel-cache.tinyland.dev | Disk-backed on honey (not S3-backed currently) | Cache miss = rebuild. Non-durable. |
| RustFS | S3-compatible object storage backend for Attic | Not user-facing | OpenEBS ZFS on bumble | NOT a user-facing cache. Provides the S3 API that Attic writes to. |
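CI runners consume the Attic cache as an ordinary Nix substituter. A minimal sketch of the runner-side configuration, using the cluster-internal endpoint from the table above; the cache name and the Attic public key are placeholders, not values from this document:

```ini
# /etc/nix/nix.conf on a CI runner -- illustrative values.
# "gloriousflywheel" and the attic key are placeholders; the
# cache.nixos.org key is the well-known upstream key.
substituters = http://attic.nix-cache.svc/gloriousflywheel https://cache.nixos.org
trusted-public-keys = gloriousflywheel:PLACEHOLDER_KEY= cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=
```

Attic serves each cache under a path segment of the server URL, so adding a second cache is just another substituter entry.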

Relationship diagram

```
CI Runners (honey)
    |
    v
  Attic  ──S3 API──>  RustFS (bumble, OpenEBS ZFS)
    |
    v
  /nix/store PVC

Bazel targets ──gRPC──> Bazel remote cache (honey, disk)

Post-merge publish ──OIDC──> FlakeHub (SaaS)
```
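The Bazel path in the diagram is opt-in per checkout via .bazelrc. A sketch using standard Bazel flags; the endpoint is from the cache table above, while the port is an assumption:

```
# .bazelrc -- illustrative; port and plaintext gRPC are assumptions
build --remote_cache=grpc://bazel-cache.tinyland.dev:9092
# Upload local build results so other runners can reuse them
build --remote_upload_local_results=true
```

Because the cache is non-durable, a miss or an outage simply means Bazel rebuilds locally; no configuration change is needed on failure.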

State Backend

| Property | Current | Planned |
| --- | --- | --- |
| Backend type | HTTP backend (GitLab Managed Terraform State) | S3-compatible backend (RustFS or similar) |
| Rationale | Transitional; inherited from GitLab-primary era | Local-first operation; no SaaS dependency in the critical path |
| Stack isolation | Per-stack state isolation | Per-stack state isolation (unchanged) |

OpenTofu stacks use per-stack state files. The backend type is orthogonal to stack isolation — migrating from HTTP to S3 changes the transport, not the state partitioning.
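Concretely, the migration touches only each stack's backend block. A before/after sketch; all addresses, bucket names, and the port are placeholders, and exact S3-backend attribute names vary across OpenTofu versions:

```hcl
# Current: GitLab-managed HTTP state backend (address illustrative)
terraform {
  backend "http" {
    address = "https://gitlab.example.com/api/v4/projects/<id>/terraform/state/<stack>"
  }
}
```

```hcl
# Planned: S3-compatible backend against RustFS (all values illustrative)
terraform {
  backend "s3" {
    bucket = "tofu-state"
    key    = "stacks/<stack>/terraform.tfstate"
    region = "us-east-1" # required by the backend, ignored by RustFS
    endpoints {
      s3 = "http://rustfs.example.svc:9000"
    }
    # Typical settings for non-AWS S3 endpoints:
    use_path_style              = true
    skip_credentials_validation = true
    skip_region_validation      = true
  }
}
```

In both cases the key (or HTTP address) stays unique per stack, which is what preserves the per-stack state isolation noted above.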

HA and Durability Limits

None of these systems are HA today. This section is intentionally honest about what fails when a node goes down.

| Component | Deployment | HA | Replication | Backup |
| --- | --- | --- | --- | --- |
| Attic PostgreSQL | Single replica on bumble (OpenEBS ZFS) | No | None | None off-site |
| RustFS | Single replica on bumble (OpenEBS ZFS) | No | None | None off-site |
| Bazel remote cache | Disk-backed on honey | No | None | None (non-durable by design) |
| ZFS volumes | lz4 compression on bumble | No | None | None off-site |

Impact of bumble outage: Attic and RustFS are unavailable. CI builds fall back to uncached Nix evaluation (slow but functional). Bazel cache on honey survives independently but is also non-durable.
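The uncached fallback during a bumble outage is standard Nix substituter behavior, and it can be made explicit on the runners. `fallback` and `connect-timeout` are real nix.conf settings; enabling them here is an assumption about this deployment, and the timeout value is illustrative:

```ini
# nix.conf on CI runners -- build locally when a substituter is unreachable
fallback = true
# Shorten the wait on a dead substituter (seconds)
connect-timeout = 5
```

Without a short connect timeout, each uncached build may stall while Nix waits on the unreachable Attic endpoint before falling back.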

Impact of honey outage: All caches and runners are unavailable. No failover cluster exists.
