Deployment Contract

How GloriousFlywheel stacks land in the target cluster, who owns what, and where state authority lives.

Stack Topology

GloriousFlywheel deploys 4 OpenTofu stacks into the honey RKE2 cluster. Each stack has its own state file and can be planned/applied independently.

Stack            | Namespaces                   | State Backend                  | Purpose
-----------------|------------------------------|--------------------------------|------------------------------------------
arc-runners      | arc-systems, arc-runners     | GitLab HTTP (project 79706605) | ARC controller + GitHub runner scale sets
attic            | cnpg-system, attic-cache-dev | GitLab HTTP (project 79706605) | Nix binary cache + PostgreSQL + RustFS
gitlab-runners   | gitlab-runners               | GitLab HTTP (project 79706605) | GitLab Runner Helm deployments
runner-dashboard | runner-dashboard             | GitLab HTTP (project 79706605) | SvelteKit monitoring dashboard

State Authority

Source of truth hierarchy:

1. OpenTofu state (GitLab Managed Terraform State)
   - Authoritative for all managed resources
   - HTTP backend, locked per-stack

2. Repo code (tofu/stacks/*, tofu/modules/*)
   - Desired state definition
   - Drift = difference between (2) and (1)

3. Cluster live state (kubectl get)
   - Observed state
   - Drift = difference between (1) and (3)
   - Dashboard /api/gitops/drift compares (2) vs (3)

Rule: Never modify cluster resources directly. All changes flow through OpenTofu. Manual kubectl edits will be overwritten on next apply.

Exception: Emergency operator actions (pause runner, scale to zero) via the dashboard API are intentional drift. They modify live state only, so the next apply reverts them to the state defined in the repo.

Deployment Flow

Local Development

# Prerequisites
export TF_HTTP_PASSWORD="<gitlab-pat-with-api-scope>"
export HONEY_KUBECONFIG="<path-to-honey-kubeconfig>"

# Plan a stack
just tofu-plan arc-runners

# Apply (requires review of plan output)
just tofu-apply arc-runners
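
Before planning, it can help to confirm that both prerequisites are actually set. The following is a minimal sketch, not part of the repo: flywheel_preflight is a hypothetical helper name, and the only facts it relies on are the two variable names above and that HONEY_KUBECONFIG holds a file path.

```shell
# Hypothetical preflight helper (not shipped with GloriousFlywheel).
# Returns 0 when TF_HTTP_PASSWORD and HONEY_KUBECONFIG are both usable,
# otherwise reports each problem on stderr and returns 1.
flywheel_preflight() {
  _fw_missing=0
  for _fw_name in TF_HTTP_PASSWORD HONEY_KUBECONFIG; do
    # Look up the variable whose name is stored in _fw_name (POSIX-safe indirection).
    eval "_fw_value=\${$_fw_name:-}"
    if [ -z "$_fw_value" ]; then
      echo "missing required variable: $_fw_name" >&2
      _fw_missing=1
    fi
  done
  # HONEY_KUBECONFIG is a path, so also check that the file exists.
  if [ -n "${HONEY_KUBECONFIG:-}" ] && [ ! -f "$HONEY_KUBECONFIG" ]; then
    echo "kubeconfig not found: $HONEY_KUBECONFIG" >&2
    _fw_missing=1
  fi
  return "$_fw_missing"
}
```

A possible use is to chain it in front of a plan: flywheel_preflight && just tofu-plan arc-runners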

CI/CD (GitHub Actions)

PR opened
  -> validate.yml: tofu init -backend=false && tofu validate (all modules + stacks)
  -> No plan/apply on PR (read-only validation)

Push to main (after merge)
  -> deploy-arc-runners.yml: tofu plan -> tofu apply (arc-runners stack only)
  -> Other stacks: manual deployment via `just tofu-apply <stack>`
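
The PR validation pass can be approximated locally. This is a sketch under assumptions, not the contents of validate.yml: validate_all is a hypothetical helper, the directory layout is taken from the tofu/stacks/* and tofu/modules/* paths named in this document, and DRY_RUN is an illustrative switch that defaults to printing the commands instead of running them.

```shell
# Hypothetical local equivalent of the PR validation step.
# Usage: validate_all [root]   (root defaults to ./tofu)
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them.
validate_all() {
  _va_root="${1:-tofu}"
  for _va_dir in "$_va_root"/modules/*/ "$_va_root"/stacks/*/; do
    [ -d "$_va_dir" ] || continue      # skip globs that matched nothing
    _va_dir="${_va_dir%/}"             # drop the trailing slash
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "tofu -chdir=$_va_dir init -backend=false && tofu -chdir=$_va_dir validate"
    else
      # -backend=false skips state backend setup, so no GitLab credentials are needed.
      tofu -chdir="$_va_dir" init -backend=false &&
        tofu -chdir="$_va_dir" validate || return 1
    fi
  done
}
```

Because validation never touches the state backend, this can run without TF_HTTP_PASSWORD, which matches the read-only intent of the PR stage.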

Deployment Dependencies

OpenEBS ZFS (storage)
  -> must be deployed before any PVC-using workload
  -> provisioned on bumble node (ZFS pool)

CNPG Operator (database)
  -> must be running before the attic stack creates its PostgreSQL cluster
  -> deployed as part of attic stack (self-bootstraps), ahead of the database resources

ARC Controller (runner orchestration)
  -> must be deployed before any ARC scale set
  -> deployed as part of arc-runners stack, before scale sets

Dependency order for fresh cluster:
  1. arc-runners (OpenEBS ZFS + ARC controller + scale sets)
  2. attic (CNPG + Attic + RustFS)
  3. gitlab-runners (GitLab Runner deployments)
  4. runner-dashboard (SvelteKit app)
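
The fresh-cluster order above could be scripted roughly as follows. This is an illustrative sketch only: bootstrap_cluster, FLYWHEEL_STACK_ORDER, and the DRY_RUN switch are hypothetical names, and the only real commands used are the just recipes shown under Local Development.

```shell
# Stack order copied from the fresh-cluster dependency list.
FLYWHEEL_STACK_ORDER="arc-runners attic gitlab-runners runner-dashboard"

# Hypothetical wrapper: plan and apply each stack in dependency order.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 runs them
# and asks for confirmation after each plan, stopping at the first failure.
bootstrap_cluster() {
  for _bc_stack in $FLYWHEEL_STACK_ORDER; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "just tofu-plan $_bc_stack"
      echo "just tofu-apply $_bc_stack"
    else
      just tofu-plan "$_bc_stack" || return 1
      # Apply requires review of the plan output, so wait for an explicit "y".
      printf 'Apply %s? [y/N] ' "$_bc_stack"
      read -r _bc_answer
      [ "$_bc_answer" = "y" ] || return 1
      just tofu-apply "$_bc_stack" || return 1
    fi
  done
}
```

Stopping at the first failure matters here because later stacks assume the storage and operators from earlier ones already exist.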

Cluster Requirements

Target Cluster: honey

Property           | Value
-------------------|--------------------------------------------------------------------------
Provider           | On-prem
Distribution       | RKE2
Nodes              | 3 (honey: control plane, bumble: storage/ZFS, sting: stateless compute)
Kubernetes version | 1.29+
Storage            | OpenEBS ZFS (on bumble node)
Ingress            | nginx-ingress (RKE2 bundled)
CNI                | Canal (RKE2 default)
Context name       | honey

Required Cluster Features

  • OpenEBS ZFS StorageClass: durable storage for stateful services on honey (the baseline Nix runner lanes should not depend on it for scheduling)
  • Metrics Server: Required for HPA CPU/memory scaling
  • Prometheus Operator CRDs: ServiceMonitor and PrometheusRule for metrics
  • Cert-Manager (optional): TLS for dashboard ingress
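
One way to spot-check the non-optional features is with kubectl. This is a sketch under assumptions: check_cluster_features and DRY_RUN are hypothetical, the context name honey comes from the table above, and the API service and CRD names are the upstream defaults for Metrics Server and the Prometheus Operator, which may differ in a customised install. The StorageClass check only lists classes, since this document does not name the class.

```shell
# Hypothetical feature check for the target cluster.
# Usage: check_cluster_features [context]   (context defaults to honey)
# DRY_RUN=1 (default) prints the kubectl commands; DRY_RUN=0 runs them.
check_cluster_features() {
  _cf_ctx="${1:-honey}"
  _cf_run="kubectl --context $_cf_ctx"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    _cf_run="echo $_cf_run"
  fi
  # Unquoted on purpose so the command string splits into words.
  $_cf_run get storageclass &&
    $_cf_run get apiservice v1beta1.metrics.k8s.io &&
    $_cf_run get crd servicemonitors.monitoring.coreos.com prometheusrules.monitoring.coreos.com
}
```

With DRY_RUN=0, a non-zero exit status indicates that at least one lookup failed, which is a reasonable cue to inspect the cluster before applying any stack.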

Required Secrets

Secret                 | Scope          | Source
-----------------------|----------------|------------------------------------
TF_HTTP_PASSWORD       | All stacks     | GitLab PAT with api scope
HONEY_KUBECONFIG       | CI deployment  | Kubeconfig for honey RKE2 cluster
GitHub App credentials | arc-runners    | GitHub App installation for ARC
GitLab runner tokens   | gitlab-runners | GitLab group runner tokens
Attic signing key      | attic          | Generated during attic setup
PostgreSQL credentials | attic          | Generated by CNPG operator

Residual Assumptions

Current (honey cluster)

  • cluster_context = "honey" in tfvars and environment config
  • 10.43.0.0/16 service CIDR (shared by k3s and RKE2 defaults)
  • OpenEBS ZFS on bumble node for persistent storage

Transitional (may change)

  • GitLab state backend: Works but couples GloriousFlywheel to GitLab infrastructure. Migration to S3 backend or OpenTofu Cloud is a future option.
  • Single cluster: Current model is single-cluster. Multi-cluster would require per-cluster state files and a federation layer (see runner-topology.md).

Removed (no longer valid)

  • Liqo virtual nodes: Removed. No cross-cluster scheduling.
  • Civo Object Storage for backups: Disabled in attic stack. Using RustFS/external S3. Civo decommissioned April 2026.
  • Civo as provider: Migrated to on-prem honey RKE2 cluster (Civo decommissioned April 2026). Civo-specific config (civo.tfvars, Civo CLI in CI) has been removed.
  • Multiple cluster contexts: All stacks target one cluster.

Adding a New Stack

  1. Create tofu/stacks/<name>/ with main.tf, variables.tf, outputs.tf
  2. Add <name>.tfvars with production values; keep sensitive values in a gitignored file rather than committing them
  3. Configure GitLab HTTP state backend in main.tf:
    terraform {
      backend "http" {}
    }
  4. Add validation job in .github/workflows/validate.yml
  5. Confirm that just tofu-plan <name> and just tofu-apply <name> work (the Justfile picks up new stacks automatically)
  6. Document namespace ownership in this file
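
Steps 1 and 3 can be sketched as a small scaffold function. This is an assumption-laden illustration rather than repo tooling: scaffold_stack is a hypothetical name, and it creates only the three files listed in step 1 plus the empty HTTP backend block from step 3. The tfvars file, validation job, and documentation still need to be added by hand.

```shell
# Hypothetical scaffold for a new stack directory.
# Usage: scaffold_stack <name> [stacks-dir]   (stacks-dir defaults to tofu/stacks)
# Prints the created directory on success; refuses to overwrite an existing one.
scaffold_stack() {
  _ss_name="${1:-}"
  _ss_root="${2:-tofu/stacks}"
  if [ -z "$_ss_name" ]; then
    echo "usage: scaffold_stack <name> [stacks-dir]" >&2
    return 1
  fi
  _ss_dir="$_ss_root/$_ss_name"
  if [ -e "$_ss_dir" ]; then
    echo "already exists: $_ss_dir" >&2
    return 1
  fi
  mkdir -p "$_ss_dir" || return 1
  # Empty backend block: connection details are supplied at init time.
  cat > "$_ss_dir/main.tf" <<'EOF'
terraform {
  backend "http" {}
}
EOF
  : > "$_ss_dir/variables.tf"
  : > "$_ss_dir/outputs.tf"
  echo "$_ss_dir"
}
```

Refusing to overwrite an existing directory keeps the helper safe to run against a checkout that already contains the four production stacks.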
