2026-04-16 Honey Migration

Honey Cluster Migration Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Migrate all GloriousFlywheel IaC stacks from Civo to the on-prem honey RKE2 cluster, completing the OpenEBS storage cutover and deploying ARC runners, GitLab runners, and the runner dashboard on-prem with tailnet-only access.

Architecture: Three-node on-prem RKE2 v1.32.12 cluster (honey=control-plane, bumble=ZFS durable storage, sting=stateless compute). All operator access via Tailscale MagicDNS (taila4c78d.ts.net). OpenEBS ZFS CSI on bumble (openebs-bumble-zfs StorageClass, pool tank/openebs). No ingress controllers, no CNPG operator. Canal CNI with pod CIDR 10.42.0.0/16 and service CIDR 10.43.0.0/16.

Tech Stack: OpenTofu 1.8.x, RKE2 v1.32.12, Tailscale Operator (Helm), GitHub ARC 0.14.0, OpenEBS ZFS CSI, SvelteKit 5 + Skeleton v4, kubectl, Helm

Tracker References:

  • TIN-78: Workload migration from Civo to honey/bumble/sting (In Progress)
  • TIN-125 / #167: Cache/state contract convergence (P0)
  • TIN-128 / #169: Local-first Tofu + blahaj deployment (P1)
  • TIN-140 / #178: Install Tailscale K8s Operator (P1)
  • TIN-148 / #185: Fix Attic cache deployment (P0)

File Structure

New files (create)

| File | Responsibility |
| --- | --- |
| tofu/stacks/tailscale/honey.tfvars | Tailscale stack config for honey cluster |
| tofu/stacks/attic/honey.tfvars | Attic stack config for honey cluster |
| tofu/stacks/arc-runners/honey.tfvars | ARC runners config for honey cluster |
| tofu/stacks/runner-dashboard/honey.tfvars | Dashboard config for honey cluster |
| tofu/stacks/gitlab-runners/honey.tfvars | GitLab runners config for honey cluster |

Modified files

| File | Change |
| --- | --- |
| .github/workflows/validate.yml | Add honey.tfvars format validation |
| tofu/stacks/attic/main.tf | Add adopt_existing_namespace plumbing if missing |

Existing files (reference only, no modifications)

| File | Used for |
| --- | --- |
| tofu/modules/tailscale-operator/main.tf | Understanding Helm + Connector CRD wiring |
| tofu/modules/tailscale-operator/variables.tf | Variable defaults and validation rules |
| tofu/stacks/tailscale/main.tf | Stack-to-module variable pass-through |
| tofu/stacks/attic/variables.tf | All variable definitions for honey.tfvars |
| tofu/stacks/arc-runners/variables.tf | All variable definitions for honey.tfvars |
| tofu/stacks/tailscale/civo.tfvars | Reference for honey.tfvars structure |
| tofu/stacks/attic/civo.tfvars | Reference for honey.tfvars structure |
| tofu/stacks/arc-runners/civo.tfvars | Reference for honey.tfvars structure |

Cluster Context

Nodes:

| Node | Role | IP (tailnet) | Workload placement |
| --- | --- | --- | --- |
| honey | control-plane | 100.86.x.x | Control plane, lightweight pods |
| bumble | worker | 100.93.x.x | ZFS-backed durable state (PG, RustFS, PVCs) |
| sting | worker | 100.121.x.x | Stateless compute (runners, API servers, dashboards) |

Storage:

| StorageClass | Provisioner | Node | Notes |
| --- | --- | --- | --- |
| openebs-bumble-zfs | zfs.csi.openebs.io | bumble | lz4 compression, 128k recordsize, tank/openebs pool |
| local-path | rancher local-path | honey | Legacy, being migrated away |

Existing services on honey (nix-cache namespace):

| Service | Type | Backend | Status |
| --- | --- | --- | --- |
| attic-pg-rw | PostgreSQL | local-path on honey | Running (9d, has restarts) |
| attic-pg-openebs-rw | PostgreSQL | openebs-bumble-zfs on bumble | Running (~21h, zero restarts) |
| attic-rustfs-hl | RustFS/MinIO | local-path on honey | Running |
| attic-rustfs-openebs | RustFS/MinIO | openebs-bumble-zfs on bumble | Running (~21h, zero restarts) |
| attic-api | Attic server | N/A | Points to attic-pg-rw + attic-rustfs-hl (STALE) |

Tailscale operator: Already installed on honey, managing 21 proxy pods. No Connector CRD deployed yet.


Task 1: Complete Attic OpenEBS Storage Cutover

Context: Attic on honey currently points to attic-pg-rw (local-path on honey node) and attic-rustfs-hl (local-path on honey node). The OpenEBS-backed equivalents (attic-pg-openebs-rw, attic-rustfs-openebs) are running with zero restarts on bumble. This task switches the config to use the durable ZFS-backed services.

Files:

  • Modify: attic-config ConfigMap in nix-cache namespace on honey cluster (kubectl, not Tofu)

Pre-requisites: kubectl context set to honey (kubeconfig at ~/.kube/config with context name matching honey cluster).

  • Step 1: Verify both PostgreSQL services are healthy
kubectl --context honey get pods -n nix-cache -l app=postgresql -o wide
kubectl --context honey exec -n nix-cache deploy/attic-pg-openebs -- pg_isready -U attic

Expected: Both attic-pg-rw and attic-pg-openebs-rw pods Running, pg_isready returns “accepting connections”.

  • Step 2: Verify RustFS OpenEBS service is healthy
kubectl --context honey exec -n nix-cache deploy/attic-rustfs-openebs -- mc ready local

If mc is not available in the container, use:

kubectl --context honey port-forward -n nix-cache svc/attic-rustfs-openebs 9001:9000 &
curl -s http://localhost:9001/minio/health/live
kill %1

Expected: health check passes or “mc ready” returns 0.

  • Step 3: Dump current configmap for backup
kubectl --context honey get configmap attic-config -n nix-cache -o yaml > /tmp/attic-config-backup-$(date +%Y%m%d).yaml

Expected: YAML file saved with current configuration.

  • Step 4: Migrate data from local-path PostgreSQL to OpenEBS PostgreSQL
# Dump from the old local-path PG
kubectl --context honey exec -n nix-cache deploy/attic-pg -- pg_dump -U attic -d attic -Fc -f /tmp/attic-dump.pgfc

# Copy the dump out
kubectl --context honey cp nix-cache/$(kubectl --context honey get pod -n nix-cache -l app=postgresql,storage=local-path -o jsonpath='{.items[0].metadata.name}'):/tmp/attic-dump.pgfc /tmp/attic-dump.pgfc

# Copy into the OpenEBS PG pod
kubectl --context honey cp /tmp/attic-dump.pgfc nix-cache/$(kubectl --context honey get pod -n nix-cache -l app=postgresql,storage=openebs -o jsonpath='{.items[0].metadata.name}'):/tmp/attic-dump.pgfc

# Restore into OpenEBS PG
kubectl --context honey exec -n nix-cache deploy/attic-pg-openebs -- pg_restore -U attic -d attic --clean --if-exists /tmp/attic-dump.pgfc

Note: The exact pod label selectors above are illustrative. Adjust the selectors or use pod names directly based on the actual labels on honey. If the databases are already synced (e.g., via replication), skip this step. Verify by comparing row counts:

kubectl --context honey exec -n nix-cache deploy/attic-pg -- psql -U attic -d attic -c "SELECT count(*) FROM nars;"
kubectl --context honey exec -n nix-cache deploy/attic-pg-openebs -- psql -U attic -d attic -c "SELECT count(*) FROM nars;"
  • Step 5: Update the attic-config ConfigMap

Patch the configmap to point to OpenEBS-backed services:

kubectl --context honey get configmap attic-config -n nix-cache -o jsonpath='{.data.server\.toml}' > /tmp/attic-server.toml

Edit /tmp/attic-server.toml to replace:

  • attic-pg-rw.nix-cache.svc.cluster.local with attic-pg-openebs-rw.nix-cache.svc.cluster.local
  • attic-rustfs-hl.nix-cache.svc with attic-rustfs-openebs.nix-cache.svc
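
These two swaps can be scripted rather than edited by hand. A sketch, using the service-name mapping listed above (verify the names against the live ConfigMap before applying the result):

```shell
# Sketch: filter that performs both service-name swaps on stdin.
# Service names follow the mapping listed above; verify against the live ConfigMap.
swap_backends() {
  sed \
    -e 's/attic-pg-rw\.nix-cache\.svc/attic-pg-openebs-rw.nix-cache.svc/g' \
    -e 's/attic-rustfs-hl\.nix-cache\.svc/attic-rustfs-openebs.nix-cache.svc/g'
}

# Usage: swap_backends < /tmp/attic-server.toml > /tmp/attic-server-new.toml
# then diff the two files and move the new one into place once reviewed.
```

Because the replacement names do not contain the original names as a `.nix-cache.svc`-suffixed substring, running the filter twice is harmless.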

The [database] section should become:

[database]
url = "postgresql://attic:<PERCENT-ENCODED-PASSWORD>@attic-pg-openebs-rw.nix-cache.svc.cluster.local:5432/attic?sslmode=disable"

Reuse the existing password from the backup taken in Step 3; it is redacted here so the credential is not carried in this plan.
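
The password must be percent-encoded when embedded in the connection URL (e.g. + becomes %2B, * becomes %2A, = becomes %3D). A small helper for encoding a raw password (a sketch; shells out to python3's stdlib):

```shell
# Percent-encode a raw password for use in a connection URL.
# Assumes python3 is on PATH.
urlencode() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}

urlencode 'p+ss*w=rd'   # -> p%2Bss%2Aw%3Drd
```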

The [storage] section should become:

[storage]
type = "s3"
endpoint = "http://attic-rustfs-openebs.nix-cache.svc:9000"

Apply:

kubectl --context honey create configmap attic-config -n nix-cache \
  --from-file=server.toml=/tmp/attic-server.toml \
  --dry-run=client -o yaml | kubectl --context honey apply -f -
  • Step 6: Restart Attic pods to pick up new config
kubectl --context honey rollout restart deployment/attic-api -n nix-cache
kubectl --context honey rollout restart deployment/attic-gc -n nix-cache 2>/dev/null || true
kubectl --context honey rollout status deployment/attic-api -n nix-cache --timeout=120s

Expected: Pods restart and reach Running state within 2 minutes.

  • Step 7: Validate Attic is serving from OpenEBS backends
# Check API health
kubectl --context honey exec -n nix-cache deploy/attic-api -- curl -sf http://localhost:8080/_attic/api/v1/cache-config/main || echo "FAIL"

# Check logs for database connection
kubectl --context honey logs deploy/attic-api -n nix-cache --tail=20 | grep -i "database\|postgres\|connected"

Expected: API responds, logs show successful connection to attic-pg-openebs-rw.

  • Step 8: Commit

No file changes to commit for this task (all changes were kubectl operations). Record the cutover in a commit message for audit:

git add -A && git commit --allow-empty -m "ops(attic): complete OpenEBS storage cutover on honey

Switched attic-config ConfigMap from local-path services to
OpenEBS ZFS-backed services on bumble node:
- attic-pg-rw -> attic-pg-openebs-rw
- attic-rustfs-hl -> attic-rustfs-openebs

Ref: TIN-125, TIN-148, #167, #185"

Task 2: Create honey.tfvars for Tailscale Stack

Context: The Tailscale operator is already running on honey. This tfvars file configures the Tofu stack to manage it going forward and deploy a Connector CRD that advertises honey’s pod and service CIDRs to the tailnet.

Files:

  • Create: tofu/stacks/tailscale/honey.tfvars

  • Step 1: Write honey.tfvars

Create tofu/stacks/tailscale/honey.tfvars:

# Tailscale Stack - Honey On-Prem Deployment
# RKE2 v1.32.12 cluster (honey/bumble/sting)

cluster_context  = "honey"
namespace        = "tailscale"
create_namespace = false
chart_version    = "1.94.2"

# Tags
default_tags = ["tag:k8s"]

# Connector - advertise RKE2 Canal CNI pod and service CIDRs to tailnet
enable_connector     = true
connector_name       = "honey-connector"
connector_hostname   = "honey-cluster"
connector_tags       = ["tag:k8s-operator"]
enable_subnet_router = true
subnet_routes        = ["10.42.0.0/16", "10.43.0.0/16"]
enable_exit_node     = false
  • Step 2: Format the file
cd tofu/stacks/tailscale && tofu fmt honey.tfvars

Expected: File formatted, no diff or only whitespace changes.

  • Step 3: Validate syntax
cd tofu/stacks/tailscale && tofu init -backend=false && tofu validate

Expected: “Success! The configuration is valid.”

  • Step 4: Commit
git add tofu/stacks/tailscale/honey.tfvars
git commit -m "feat(tailscale): add honey.tfvars for on-prem cluster

Configures Tailscale operator for honey RKE2 cluster with:
- Connector CRD advertising Canal CNI CIDRs (10.42/16, 10.43/16)
- hostname honey-cluster on tailnet
- Operator already running, namespace pre-exists

Ref: TIN-140, TIN-78, #178"

Task 3: Create honey.tfvars for Attic Stack

Context: Attic is already running on honey via manual kubectl deployments. This tfvars file prepares the stack for future Tofu adoption. Key differences from Civo: no CNPG (plain StatefulSet PG), no ingress (tailnet-only), OpenEBS ZFS storage, existing namespace must be adopted.

Files:

  • Create: tofu/stacks/attic/honey.tfvars

  • Step 1: Write honey.tfvars

Create tofu/stacks/attic/honey.tfvars:

# Attic Stack - Honey On-Prem Deployment
# RKE2 v1.32.12 cluster (honey/bumble/sting)
#
# IMPORTANT: Attic is already running on honey via manual kubectl deployments.
# This tfvars is for future Tofu adoption. Do NOT apply without importing
# existing resources first (namespace, configmap, secrets, deployments).

cluster_context        = "honey"
namespace              = "nix-cache"
environment            = "production"
adopt_existing_namespace = true

# Storage: OpenEBS ZFS on bumble node
use_rustfs          = true
rustfs_image        = "minio/minio:latest"
minio_storage_class = "openebs-bumble-zfs"
rustfs_volume_size  = "50Gi"
minio_root_user     = "minioadmin"

# PostgreSQL: No CNPG operator on honey - use existing StatefulSet PG
# The database_url points to the OpenEBS-backed PG after Task 1 cutover
use_cnpg_postgres   = false
install_cnpg_operator = false
pg_storage_class    = "openebs-bumble-zfs"

# API Server
attic_image      = "ghcr.io/zhaofengli/attic:latest"
api_min_replicas = 1
api_max_replicas = 2

# Ingress: DISABLED - all access via Tailscale proxy pods
enable_ingress = false
enable_tls     = false

# Monitoring
enable_prometheus_monitoring = false

# Bootstrap safety
api_wait_for_rollout = false
  • Step 2: Format the file
cd tofu/stacks/attic && tofu fmt honey.tfvars

Expected: File formatted.

  • Step 3: Validate syntax
cd tofu/stacks/attic && tofu init -backend=false && tofu validate

Expected: “Success! The configuration is valid.”

Note: adopt_existing_namespace is already defined in the stack’s variables.tf (lines 53-57), so validation should pass. If it fails because the variable is missing, add the definition first (see the Modified files entry for tofu/stacks/attic/main.tf).

  • Step 4: Commit
git add tofu/stacks/attic/honey.tfvars
git commit -m "feat(attic): add honey.tfvars for on-prem cluster

Configures Attic stack for honey RKE2 cluster with:
- OpenEBS ZFS storage on bumble (openebs-bumble-zfs)
- No CNPG operator (plain StatefulSet PG, already deployed)
- No ingress (tailnet-only access via Tailscale proxy)
- Adopts existing nix-cache namespace
- NOT safe to apply yet (needs resource import)

Ref: TIN-125, TIN-148, TIN-78, #167, #185"

Task 4: Create honey.tfvars for ARC Runners Stack

Context: ARC runners are the primary GitHub CI workload. On honey, runners should schedule on sting (stateless compute). No Longhorn needed (OpenEBS ZFS available for persistent nix stores). Persistent /nix/store PVCs go on bumble via OpenEBS ZFS.

Files:

  • Create: tofu/stacks/arc-runners/honey.tfvars

  • Step 1: Write honey.tfvars

Create tofu/stacks/arc-runners/honey.tfvars:

# ARC Runners - Honey On-Prem Deployment
# RKE2 v1.32.12 cluster (honey/bumble/sting)
#
# Runner pods schedule on sting (stateless compute).
# Persistent /nix/store PVCs use OpenEBS ZFS on bumble.
# No Longhorn - OpenEBS ZFS provides durable storage.

cluster_context      = "honey"
github_config_url    = "https://github.com/tinyland-inc"
github_config_secret = "github-app-secret"

# Controller
controller_chart_version = "0.14.0"

# Nix runners (primary workload: compositor builds, Elisp CI, flake checks)
nix_min_runners  = 0
nix_max_runners  = 5
nix_cpu_limit    = "4"
nix_memory_limit = "8Gi"

# Persistent Nix store on OpenEBS ZFS (bumble node)
nix_store_enabled          = true
nix_store_size             = "50Gi"
nix_store_init_derivations = "nixpkgs#bash nixpkgs#coreutils nixpkgs#git nixpkgs#cacert"

# Warm pool: keep 1 runner warm during business hours
nix_warm_pool_enabled = true
nix_warm_min_runners  = 1
nix_warm_schedule     = "0 13 * * 1-5"
nix_cold_schedule     = "0 1 * * *"

# Docker runners (general CI)
deploy_docker_runner = true
docker_cpu_limit     = "2"
docker_memory_limit  = "4Gi"

# DinD runners (container: directive support)
deploy_dind_runner             = true
dind_ephemeral_storage_request = "40Gi"
dind_ephemeral_storage_limit   = "50Gi"

# No Longhorn - OpenEBS ZFS provides storage layer
deploy_longhorn = false

# Cache integration - Attic on-cluster via tailnet
attic_server = "http://attic-api.nix-cache.svc:8080"
attic_cache  = "main"

# Extra runner scale sets for external repos
extra_runner_sets = {
  linux-xr-docker = {
    github_config_url         = "https://github.com/tinyland-inc"
    runner_label              = "linux-xr-docker"
    runner_type               = "dind"
    container_mode            = "dind"
    max_runners               = 2
    cpu_request               = "2"
    memory_request            = "8Gi"
    cpu_limit                 = "4"
    memory_limit              = "16Gi"
    ephemeral_storage_request = "40Gi"
    ephemeral_storage_limit   = "50Gi"
  }
}
  • Step 2: Format the file
cd tofu/stacks/arc-runners && tofu fmt honey.tfvars

Expected: File formatted.

  • Step 3: Validate syntax
cd tofu/stacks/arc-runners && tofu init -backend=false && tofu validate

Expected: “Success! The configuration is valid.”

  • Step 4: Commit
git add tofu/stacks/arc-runners/honey.tfvars
git commit -m "feat(arc): add honey.tfvars for on-prem runners

Configures ARC runners for honey RKE2 cluster with:
- Nix/Docker/DinD runner scale sets on sting (stateless compute)
- Persistent /nix/store on OpenEBS ZFS (bumble)
- Warm pool (1 runner during business hours)
- No Longhorn (OpenEBS ZFS instead)
- Attic cache integration via cluster-internal service

Ref: TIN-126, TIN-78, #170"

Task 5: Create honey.tfvars for Runner Dashboard Stack

Context: The SvelteKit runner dashboard monitors runner fleet status. On honey, it should be tailnet-only (no ingress, access via Tailscale proxy pod). No GitLab OAuth needed if using Tailscale identity.

Files:

  • Create: tofu/stacks/runner-dashboard/honey.tfvars

  • Step 1: Verify runner-dashboard variables.tf for required vars

Read tofu/stacks/runner-dashboard/variables.tf to identify required variables (those without defaults):

cd tofu/stacks/runner-dashboard && grep -n -E 'variable "|default' variables.tf

Variable blocks with no default line beneath them are required; those determine the minimal honey.tfvars.

  • Step 2: Write honey.tfvars

Create tofu/stacks/runner-dashboard/honey.tfvars:

# Runner Dashboard - Honey On-Prem Deployment
# RKE2 v1.32.12 cluster (honey/bumble/sting)
#
# Tailnet-only access via Tailscale proxy pod.
# No ingress, no public DNS.

cluster_context = "honey"
namespace       = "runner-dashboard"

# Dashboard image (built and pushed by CI)
image = "ghcr.io/tinyland-inc/runner-dashboard:latest"

# Ingress: DISABLED - tailnet-only
enable_ingress = false

# Caddy sidecar for Tailscale auth
enable_caddy_sidecar = true

# Resources (sting node - stateless compute)
cpu_request    = "100m"
cpu_limit      = "500m"
memory_request = "128Mi"
memory_limit   = "256Mi"

Note: This is a minimal config. Additional variables (GitLab OAuth, Prometheus URL, session secret) should be added via -var flags from secrets or expanded after verifying which are actually required by the module.

  • Step 3: Format the file
cd tofu/stacks/runner-dashboard && tofu fmt honey.tfvars

Expected: File formatted.

  • Step 4: Validate syntax
cd tofu/stacks/runner-dashboard && tofu init -backend=false && tofu validate

Expected: “Success! The configuration is valid.”

If validation fails due to missing required variables, add placeholder values with comments noting they must come from secrets at apply time.
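
A placeholder pattern for such variables (a sketch; the variable names below are hypothetical until confirmed against variables.tf):

```hcl
# Hypothetical required variables - real values are injected at apply time
# via TF_VAR_* environment variables or -var flags, never committed.
session_secret         = "REPLACE_AT_APPLY" # from sops/env at apply time
gitlab_oauth_client_id = ""                 # confirm the actual name in variables.tf
```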

  • Step 5: Commit
git add tofu/stacks/runner-dashboard/honey.tfvars
git commit -m "feat(dashboard): add honey.tfvars for on-prem deployment

Configures runner dashboard for honey RKE2 cluster with:
- Tailnet-only access (no ingress)
- Caddy sidecar for Tailscale auth
- Minimal resource allocation on sting node

Ref: TIN-78"

Task 6: Create honey.tfvars for GitLab Runners Stack

Context: GitLab runners provide parity with ARC for GitLab CI pipelines. Same placement strategy: compute on sting, persistent storage on bumble.

Files:

  • Create: tofu/stacks/gitlab-runners/honey.tfvars

  • Step 1: Verify gitlab-runners variables.tf for required vars

Read tofu/stacks/gitlab-runners/variables.tf to identify required variables:

cd tofu/stacks/gitlab-runners && grep -B1 -A5 'variable' variables.tf | head -80
  • Step 2: Write honey.tfvars

Create tofu/stacks/gitlab-runners/honey.tfvars:

# GitLab Runners - Honey On-Prem Deployment
# RKE2 v1.32.12 cluster (honey/bumble/sting)
#
# Runner pods schedule on sting (stateless compute).
# Secrets (runner token) passed via -var flags.

cluster_context = "honey"
namespace       = "gitlab-runners"

# Nix runner (Nix builds, flake checks)
deploy_nix_runner    = true
nix_runner_type      = "nix"
nix_cpu_limit        = "4"
nix_memory_limit     = "8Gi"
nix_concurrent_jobs  = 2

# Docker runner (general CI)
deploy_docker_runner = true
docker_cpu_limit     = "2"
docker_memory_limit  = "4Gi"

# DinD runner (container: directive support)
deploy_dind_runner = true

# Attic cache integration
attic_server = "http://attic-api.nix-cache.svc:8080"
attic_cache  = "main"

# HPA scaling
nix_min_replicas    = 0
nix_max_replicas    = 3
docker_min_replicas = 0
docker_max_replicas = 3

Note: The gitlab_token variable is sensitive and must be passed via TF_VAR_gitlab_token or -var flag at apply time. Check variables.tf for the exact variable name.

  • Step 3: Format the file
cd tofu/stacks/gitlab-runners && tofu fmt honey.tfvars

Expected: File formatted.

  • Step 4: Validate syntax
cd tofu/stacks/gitlab-runners && tofu init -backend=false && tofu validate

Expected: “Success! The configuration is valid.”

If validation fails due to variables not matching the module’s expected names, adjust the tfvars to match the actual variable names in variables.tf.

  • Step 5: Commit
git add tofu/stacks/gitlab-runners/honey.tfvars
git commit -m "feat(gitlab-runners): add honey.tfvars for on-prem runners

Configures GitLab runners for honey RKE2 cluster with:
- Nix/Docker/DinD runners on sting (stateless compute)
- Attic cache integration via cluster-internal service
- HPA scaling (0-3 per type)

Ref: TIN-78"

Task 7: Update CI to Validate honey.tfvars Files

Context: The .github/workflows/validate.yml CI pipeline runs tofu fmt -check and tofu validate on all stacks. The honey.tfvars files must pass these checks.

Files:

  • Modify: .github/workflows/validate.yml

  • Step 1: Write the failing test

Push the branch and verify CI fails if any honey.tfvars is malformatted:

# Deliberately malformat a file
echo "  cluster_context=\"honey\"" >> tofu/stacks/tailscale/honey.tfvars
cd tofu/stacks/tailscale && tofu fmt -check honey.tfvars

Expected: Exit code 1 (format check fails).

  • Step 2: Revert the deliberate malformation
git checkout -- tofu/stacks/tailscale/honey.tfvars

Expected: File restored to its committed state. (Running tofu fmt instead would fix the indentation but leave a duplicate cluster_context assignment, which tofu validate rejects.)

  • Step 3: Verify CI validate job covers tfvars

The existing validate-stacks job in .github/workflows/validate.yml already runs:

- name: Format check
  run: |
    cd tofu/stacks/${{ matrix.stack }}
    tofu fmt -check -recursive

The -recursive flag means it checks all .tf and .tfvars files in the stack directory. No CI change needed as long as honey.tfvars files are properly formatted.

Verify locally:

for stack in tailscale attic arc-runners runner-dashboard gitlab-runners; do
  echo "=== $stack ==="
  cd tofu/stacks/$stack && tofu fmt -check -recursive && echo "OK" || echo "FAIL"
  cd -
done

Expected: All stacks report “OK”.

  • Step 4: Commit (if any CI changes were needed)

If the existing CI already covers tfvars (it should via -recursive), no commit needed for this task.


Task 8: Deploy Tailscale Connector CRD on Honey

Context: The Tailscale operator is already running on honey. This task deploys the Connector CRD to advertise honey’s pod and service CIDRs to the tailnet, enabling direct pod-to-pod access from other tailnet devices.

Files:

  • No file changes (kubectl operations against honey cluster)

Pre-requisites: Tasks 1-2 completed. Tailscale OAuth credentials available as TF_VAR_oauth_client_id and TF_VAR_oauth_client_secret.

  • Step 1: Verify Tailscale operator is running
kubectl --context honey get pods -n tailscale -l app.kubernetes.io/name=tailscale-operator

Expected: Operator pod in Running state.

  • Step 2: Check if Connector CRD type is registered
kubectl --context honey get crd connectors.tailscale.com

Expected: CRD exists (installed by operator Helm chart).

  • Step 3: Deploy Connector via Tofu

Option A (Tofu - recommended if operator was installed via Helm with the same release name):

cd tofu/stacks/tailscale
tofu init -backend=false
tofu plan -var-file=honey.tfvars \
  -var="oauth_client_id=$TF_VAR_oauth_client_id" \
  -var="oauth_client_secret=$TF_VAR_oauth_client_secret"

Review the plan. It should show:

  • helm_release.tailscale_operator - may show changes if existing install differs
  • kubectl_manifest.connector[0] - CREATE (new Connector CRD)

If the Helm release conflicts with the existing operator installation, use Option B instead.

Option B (kubectl - apply Connector CRD directly):

cat <<'EOF' | kubectl --context honey apply -f -
apiVersion: tailscale.com/v1alpha1
kind: Connector
metadata:
  name: honey-connector
spec:
  tags:
    - tag:k8s-operator
  hostname: honey-cluster
  subnetRouter:
    advertiseRoutes:
      - 10.42.0.0/16
      - 10.43.0.0/16
  exitNode: false
EOF
  • Step 4: Verify Connector pod starts
kubectl --context honey get pods -n tailscale -l tailscale.com/parent-resource=honey-connector

Expected: Connector proxy pod in Running state within 60 seconds.

  • Step 5: Verify routes appear on tailnet
# From any tailnet device (e.g., macbook-neo)
tailscale status | grep honey-cluster

Expected: honey-cluster appears as a subnet router advertising 10.42.0.0/16 and 10.43.0.0/16.

  • Step 6: Test pod-level connectivity from tailnet
# Get a pod IP on honey
POD_IP=$(kubectl --context honey get pod -n nix-cache -l app=attic-api -o jsonpath='{.items[0].status.podIP}')
echo "Testing connectivity to pod $POD_IP:8080"
curl -sf --connect-timeout 5 "http://$POD_IP:8080/_attic/api/v1/cache-config/main" && echo "OK" || echo "FAIL"

Expected: If subnet routes are approved in Tailscale admin, direct pod access works from tailnet devices.

  • Step 7: Commit
git commit --allow-empty -m "ops(tailscale): deploy Connector CRD on honey cluster

Deployed honey-connector advertising RKE2 Canal CNI CIDRs:
- 10.42.0.0/16 (pod CIDR)
- 10.43.0.0/16 (service CIDR)

Connector hostname: honey-cluster
Routes require approval in Tailscale admin console.

Ref: TIN-140, #178"

Task 9: Deploy ARC Controller and Runners on Honey

Context: This is the primary new deployment. ARC controller and runner scale sets need to be installed fresh on honey. The GitHub App secret must be pre-created in the cluster.

Files:

  • No file changes (Tofu apply operations)

Pre-requisites: Tasks 2, 4 completed. github-app-secret K8s secret exists in arc-systems namespace.

  • Step 1: Create the GitHub App secret on honey
kubectl --context honey create namespace arc-systems --dry-run=client -o yaml | kubectl --context honey apply -f -
kubectl --context honey create namespace arc-runners --dry-run=client -o yaml | kubectl --context honey apply -f -

# Create GitHub App secret (values from sops or env)
kubectl --context honey create secret generic github-app-secret \
  -n arc-systems \
  --from-literal=github_app_id="$GITHUB_APP_ID" \
  --from-literal=github_app_installation_id="$GITHUB_APP_INSTALLATION_ID" \
  --from-literal=github_app_private_key="$GITHUB_APP_PRIVATE_KEY" \
  --dry-run=client -o yaml | kubectl --context honey apply -f -

Expected: Namespaces and secret created.

  • Step 2: Initialize and plan
cd tofu/stacks/arc-runners
tofu init -backend=false
tofu plan -var-file=honey.tfvars \
  -out=honey.tfplan

Review the plan carefully. Expected resources:

  • helm_release.arc_controller - ARC controller Helm chart

  • helm_release.gh_nix - Nix runner scale set

  • helm_release.gh_docker - Docker runner scale set

  • helm_release.gh_dind - DinD runner scale set

  • helm_release.linux_xr_docker - Extra runner scale set

  • Various RBAC, NetworkPolicy, HPA resources

  • Step 3: Apply

cd tofu/stacks/arc-runners
tofu apply honey.tfplan

Expected: All resources created successfully.

  • Step 4: Verify ARC controller is running
kubectl --context honey get pods -n arc-systems
kubectl --context honey get pods -n arc-runners

Expected: Controller pod Running in arc-systems. Runner scale set listener pods Running in arc-runners.

  • Step 5: Trigger a test workflow

Push a trivial change to a tinyland-inc repo with a workflow that targets the Nix scale set. With ARC runner scale sets, runs-on must name the scale set itself (runs-on: gh-nix); the classic self-hosted label is not applied to scale-set runners. Verify the runner picks up the job.

gh run list --repo tinyland-inc/GloriousFlywheel --limit 3

Expected: Workflow runs pick up self-hosted runners from honey.
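
A minimal dispatch-only smoke workflow for this check (a sketch; assumes the Nix scale set is installed under the name gh-nix, matching the plan output in Step 2):

```yaml
# .github/workflows/runner-smoke.yml
name: runner-smoke
on: workflow_dispatch
jobs:
  smoke:
    runs-on: gh-nix # ARC scale-set installation name, not a classic self-hosted label
    steps:
      - run: nix --version
```

Trigger it with gh workflow run runner-smoke --repo tinyland-inc/GloriousFlywheel and confirm a runner pod appears in arc-runners.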

  • Step 6: Commit
git commit --allow-empty -m "ops(arc): deploy ARC controller and runners on honey

Deployed ARC 0.14.0 with scale sets:
- gh-nix (0-5, persistent /nix/store on OpenEBS ZFS)
- gh-docker (0-5)
- gh-dind (0-5)
- linux-xr-docker (0-2)

Warm pool: 1 nix runner during business hours (M-F 13:00-01:00 UTC)

Ref: TIN-126, TIN-78, #170"

Task 10: Validate Full Deployment

Context: End-to-end validation that the honey cluster is serving all workloads correctly.

  • Step 1: Verify all pods healthy
kubectl --context honey get pods -A --field-selector status.phase!=Running,status.phase!=Succeeded | grep -v Completed

Expected: No pods in CrashLoopBackOff, Pending, or Error state (except any known pre-existing issues).

  • Step 2: Verify Attic cache is functional
# From a tailnet device, test the cache endpoint. Cluster-internal DNS names
# (*.svc) do not resolve on tailnet devices, so target the service ClusterIP,
# which is reachable once the Connector's 10.43.0.0/16 route is approved.
SVC_IP=$(kubectl --context honey get svc attic-api -n nix-cache -o jsonpath='{.spec.clusterIP}')
nix store ping --store "http://$SVC_IP:8080" || \
  echo "FAIL - verify Connector routes are approved in the Tailscale admin console"
  • Step 3: Verify Tailscale proxy access to all services
# Check all Tailscale proxy pods
kubectl --context honey get pods -A -l tailscale.com/parent-resource

Expected: All 21+ proxy pods Running.

  • Step 4: Verify ARC runners register with GitHub
gh api /orgs/tinyland-inc/actions/runners --jq '.runners[] | select(.labels[].name == "self-hosted") | .name + " " + .status'

Expected: honey-based runners show as “online”. ARC scale-set runners register just-in-time, so the list may be empty while idle; trigger a workflow first if nothing appears.

  • Step 5: Document deployment state

Create a snapshot of the deployment state for the operations log:

echo "=== Honey Cluster State $(date -Iseconds) ===" > /tmp/honey-state.txt
kubectl --context honey get nodes -o wide >> /tmp/honey-state.txt
kubectl --context honey get pods -A -o wide >> /tmp/honey-state.txt
kubectl --context honey get pvc -A >> /tmp/honey-state.txt
kubectl --context honey get svc -A >> /tmp/honey-state.txt
helm --kube-context honey list -A >> /tmp/honey-state.txt
cat /tmp/honey-state.txt
  • Step 6: Final commit with all honey.tfvars files

If any files were modified during validation, commit them:

git status
# Stage any remaining changes
git add -A
git commit -m "feat(honey): complete on-prem migration for all stacks

All five Tofu stacks have honey.tfvars:
- tailscale: Connector CRD with Canal CNI CIDRs
- attic: OpenEBS ZFS storage, no CNPG, tailnet-only
- arc-runners: ARC 0.14.0, persistent nix store, warm pool
- runner-dashboard: Tailnet-only, Caddy sidecar
- gitlab-runners: Nix/Docker/DinD on sting

Completed:
- OpenEBS storage cutover (attic-config -> openebs services)
- Tailscale Connector advertising 10.42/16 + 10.43/16
- ARC controller + runner scale sets deployed
- All access via Tailscale MagicDNS (no ingress)

Ref: TIN-78, TIN-125, TIN-126, TIN-140, TIN-148"

Deployment Order

Execute tasks in this order to minimize risk:

Task 1 (Attic cutover)      <- No Tofu, kubectl only, lowest risk
  |
Task 2 (tailscale tfvars)   <- File creation only
Task 3 (attic tfvars)       <- File creation only
Task 4 (arc-runners tfvars) <- File creation only
Task 5 (dashboard tfvars)   <- File creation only
Task 6 (gitlab-runners tfvars) <- File creation only
  |
Task 7 (CI validation)      <- Verify format checks pass
  |
Task 8 (Deploy Connector)   <- First Tofu/kubectl apply
  |
Task 9 (Deploy ARC)         <- Primary new deployment
  |
Task 10 (Validate)          <- End-to-end verification

Tasks 2-6 are independent and can be done in parallel.


Rollback Plan

| Component | Rollback action |
| --- | --- |
| Attic configmap (Task 1) | kubectl apply -f /tmp/attic-config-backup-*.yaml, then restart pods |
| Tailscale Connector (Task 8) | kubectl delete connector honey-connector |
| ARC runners (Task 9) | cd tofu/stacks/arc-runners && tofu destroy -var-file=honey.tfvars |
| Any tfvars file | git checkout -- tofu/stacks/*/honey.tfvars |

Out of Scope (Deferred)

  • Runner dashboard deployment (Task 5 creates tfvars only; actual apply requires image build + secrets)
  • GitLab runners deployment (Task 6 creates tfvars only; actual apply requires runner token)
  • Importing existing honey resources into Tofu state (manual kubectl deployments predate Tofu)
  • Tearing down Civo deployments (separate decision, separate issue)
  • PostgreSQL ZFS tuning (recordsize=16k, full_page_writes=off) - operational follow-up
  • Cleaning stuck liqo-storage namespace on honey
  • Fixing tinyland-staging Pending pods (4 pods, 9 days)
  • Operational runbook creation (7 docs referenced by user)
