Honey Cluster Migration Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Migrate all GloriousFlywheel IaC stacks from Civo to the on-prem honey RKE2 cluster, completing the OpenEBS storage cutover and deploying ARC runners, GitLab runners, and the runner dashboard on-prem with tailnet-only access.
Architecture: Three-node on-prem RKE2 v1.32.12 cluster (honey=control-plane, bumble=ZFS durable storage, sting=stateless compute). All operator access via Tailscale MagicDNS (taila4c78d.ts.net). OpenEBS ZFS CSI on bumble (openebs-bumble-zfs StorageClass, pool tank/openebs). No ingress controllers, no CNPG operator. Canal CNI with pod CIDR 10.42.0.0/16 and service CIDR 10.43.0.0/16.
Tech Stack: OpenTofu 1.8.x, RKE2 v1.32.12, Tailscale Operator (Helm), GitHub ARC 0.14.0, OpenEBS ZFS CSI, SvelteKit 5 + Skeleton v4, kubectl, Helm
Tracker References:
- TIN-78: Workload migration from Civo to honey/bumble/sting (In Progress)
- TIN-125 / #167: Cache/state contract convergence (P0)
- TIN-128 / #169: Local-first Tofu + blahaj deployment (P1)
- TIN-140 / #178: Install Tailscale K8s Operator (P1)
- TIN-148 / #185: Fix Attic cache deployment (P0)
File Structure
New files (create)
| File | Responsibility |
|---|---|
| tofu/stacks/tailscale/honey.tfvars | Tailscale stack config for honey cluster |
| tofu/stacks/attic/honey.tfvars | Attic stack config for honey cluster |
| tofu/stacks/arc-runners/honey.tfvars | ARC runners config for honey cluster |
| tofu/stacks/runner-dashboard/honey.tfvars | Dashboard config for honey cluster |
| tofu/stacks/gitlab-runners/honey.tfvars | GitLab runners config for honey cluster |
Modified files
| File | Change |
|---|---|
| .github/workflows/validate.yml | Add honey.tfvars format validation |
| tofu/stacks/attic/main.tf | Add adopt_existing_namespace plumbing if missing |
Existing files (reference only, no modifications)
| File | Used for |
|---|---|
| tofu/modules/tailscale-operator/main.tf | Understanding Helm + Connector CRD wiring |
| tofu/modules/tailscale-operator/variables.tf | Variable defaults and validation rules |
| tofu/stacks/tailscale/main.tf | Stack-to-module variable pass-through |
| tofu/stacks/attic/variables.tf | All variable definitions for honey.tfvars |
| tofu/stacks/arc-runners/variables.tf | All variable definitions for honey.tfvars |
| tofu/stacks/tailscale/civo.tfvars | Reference for honey.tfvars structure |
| tofu/stacks/attic/civo.tfvars | Reference for honey.tfvars structure |
| tofu/stacks/arc-runners/civo.tfvars | Reference for honey.tfvars structure |
Cluster Context
Nodes:
| Node | Role | IP (tailnet) | Workload placement |
|---|---|---|---|
| honey | control-plane | 100.86.x.x | Control plane, lightweight pods |
| bumble | worker | 100.93.x.x | ZFS-backed durable state (PG, RustFS, PVCs) |
| sting | worker | 100.121.x.x | Stateless compute (runners, API servers, dashboards) |
Storage:
| StorageClass | Provisioner | Node | Notes |
|---|---|---|---|
| openebs-bumble-zfs | zfs.csi.openebs.io | bumble | lz4 compression, 128k recordsize, tank/openebs pool |
| local-path | rancher local-path | honey | Legacy, being migrated away |
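Before any PVC work, the storage layout in the table above can be confirmed with a read-only sketch (standard kubectl only; nothing here mutates the cluster):

```shell
# Read-only check: both StorageClasses from the table above should exist.
kubectl --context honey get storageclass openebs-bumble-zfs local-path -o wide
# List any PVCs still bound to the legacy local-path class.
kubectl --context honey get pvc -A -o wide | grep local-path || echo "no local-path PVCs remaining"
```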
Existing services on honey (nix-cache namespace):
| Service | Type | Backend | Status |
|---|---|---|---|
| attic-pg-rw | PostgreSQL | local-path on honey | Running (9d, has restarts) |
| attic-pg-openebs-rw | PostgreSQL | openebs-bumble-zfs on bumble | Running (~21h, zero restarts) |
| attic-rustfs-hl | RustFS/MinIO | local-path on honey | Running |
| attic-rustfs-openebs | RustFS/MinIO | openebs-bumble-zfs on bumble | Running (~21h, zero restarts) |
| attic-api | Attic server | N/A | Points to attic-pg-rw + attic-rustfs-hl (STALE) |
Tailscale operator: Already installed on honey, managing 21 proxy pods. No Connector CRD deployed yet.
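The operator's pre-existing state can be snapshotted before the migration starts; a read-only sketch:

```shell
# Snapshot the pre-existing Tailscale operator state (read-only).
kubectl --context honey get pods -n tailscale --no-headers | wc -l   # expect ~22 (operator + 21 proxies)
kubectl --context honey get connectors.tailscale.com 2>/dev/null \
  || echo "no Connector resources yet (expected before Task 8)"
```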
Task 1: Complete Attic OpenEBS Storage Cutover
Context: Attic on honey currently points to attic-pg-rw (local-path on honey node) and attic-rustfs-hl (local-path on honey node). The OpenEBS-backed equivalents (attic-pg-openebs-rw, attic-rustfs-openebs) are running with zero restarts on bumble. This task switches the config to use the durable ZFS-backed services.
Files:
- Modify: attic-config ConfigMap in nix-cache namespace on honey cluster (kubectl, not Tofu)
Pre-requisites: kubectl context set to honey (kubeconfig at ~/.kube/config with context name matching honey cluster).
- Step 1: Verify both PostgreSQL services are healthy
kubectl --context honey get pods -n nix-cache -l app=postgresql -o wide
kubectl --context honey exec -n nix-cache deploy/attic-pg-openebs -- pg_isready -U attic
Expected: Both attic-pg-rw and attic-pg-openebs-rw pods Running, pg_isready returns “accepting connections”.
- Step 2: Verify RustFS OpenEBS service is healthy
kubectl --context honey exec -n nix-cache deploy/attic-rustfs-openebs -- mc ready local
If mc is not available in the container, use:
kubectl --context honey port-forward -n nix-cache svc/attic-rustfs-openebs 9001:9000 &
curl -s http://localhost:9001/minio/health/live
kill %1
Expected: health check passes or “mc ready” returns 0.
- Step 3: Dump current configmap for backup
kubectl --context honey get configmap attic-config -n nix-cache -o yaml > /tmp/attic-config-backup-$(date +%Y%m%d).yaml
Expected: YAML file saved with current configuration.
- Step 4: Migrate data from local-path PostgreSQL to OpenEBS PostgreSQL
# Dump from the old local-path PG
kubectl --context honey exec -n nix-cache deploy/attic-pg -- pg_dump -U attic -d attic -Fc -f /tmp/attic-dump.pgfc
# Copy the dump out
kubectl --context honey cp nix-cache/$(kubectl --context honey get pod -n nix-cache -l app=postgresql,storage=local-path -o jsonpath='{.items[0].metadata.name}'):/tmp/attic-dump.pgfc /tmp/attic-dump.pgfc
# Copy into the OpenEBS PG pod
kubectl --context honey cp /tmp/attic-dump.pgfc nix-cache/$(kubectl --context honey get pod -n nix-cache -l app=postgresql,storage=openebs -o jsonpath='{.items[0].metadata.name}'):/tmp/attic-dump.pgfc
# Restore into OpenEBS PG
kubectl --context honey exec -n nix-cache deploy/attic-pg-openebs -- pg_restore -U attic -d attic --clean --if-exists /tmp/attic-dump.pgfc
Note: The exact pod label selectors above are illustrative. Adjust the selectors or use pod names directly based on the actual labels on honey. If the databases are already synced (e.g., via replication), skip this step. Verify by comparing row counts:
kubectl --context honey exec -n nix-cache deploy/attic-pg -- psql -U attic -d attic -c "SELECT count(*) FROM nars;"
kubectl --context honey exec -n nix-cache deploy/attic-pg-openebs -- psql -U attic -d attic -c "SELECT count(*) FROM nars;"
- Step 5: Update the attic-config ConfigMap
Patch the configmap to point to OpenEBS-backed services:
kubectl --context honey get configmap attic-config -n nix-cache -o jsonpath='{.data.server\.toml}' > /tmp/attic-server.toml
Edit /tmp/attic-server.toml to replace:
- attic-pg-rw.nix-cache.svc.cluster.local with attic-pg-openebs-rw.nix-cache.svc.cluster.local
- attic-rustfs-hl.nix-cache.svc with attic-rustfs-openebs.nix-cache.svc
The [database] section should become:
[database]
url = "postgresql://attic:ZilWsA_e%2BPU0rm2M%2A4a-DT%3D_947wSwAT@attic-pg-openebs-rw.nix-cache.svc.cluster.local:5432/attic?sslmode=disable"
The [storage] section should become:
[storage]
type = "s3"
endpoint = "http://attic-rustfs-openebs.nix-cache.svc:9000"
Apply:
kubectl --context honey create configmap attic-config -n nix-cache \
--from-file=server.toml=/tmp/attic-server.toml \
--dry-run=client -o yaml | kubectl --context honey apply -f -
- Step 6: Restart Attic pods to pick up new config
kubectl --context honey rollout restart deployment/attic-api -n nix-cache
kubectl --context honey rollout restart deployment/attic-gc -n nix-cache 2>/dev/null || true
kubectl --context honey rollout status deployment/attic-api -n nix-cache --timeout=120s
Expected: Pods restart and reach Running state within 2 minutes.
- Step 7: Validate Attic is serving from OpenEBS backends
# Check API health
kubectl --context honey exec -n nix-cache deploy/attic-api -- curl -sf http://localhost:8080/_attic/api/v1/cache-config/main || echo "FAIL"
# Check logs for database connection
kubectl --context honey logs deploy/attic-api -n nix-cache --tail=20 | grep -i "database\|postgres\|connected"
Expected: API responds, logs show successful connection to attic-pg-openebs-rw.
- Step 8: Commit
No file changes to commit for this task (all changes were kubectl operations). Record the cutover in a commit message for audit:
git add -A && git commit --allow-empty -m "ops(attic): complete OpenEBS storage cutover on honey
Switched attic-config ConfigMap from local-path services to
OpenEBS ZFS-backed services on bumble node:
- attic-pg-rw -> attic-pg-openebs-rw
- attic-rustfs-hl -> attic-rustfs-openebs
Ref: TIN-125, TIN-148, #167, #185"
Task 2: Create honey.tfvars for Tailscale Stack
Context: The Tailscale operator is already running on honey. This tfvars file configures the Tofu stack to manage it going forward and deploy a Connector CRD that advertises honey’s pod and service CIDRs to the tailnet.
Files:
- Create: tofu/stacks/tailscale/honey.tfvars
- Step 1: Write honey.tfvars
Create tofu/stacks/tailscale/honey.tfvars:
# Tailscale Stack - Honey On-Prem Deployment
# RKE2 v1.32.12 cluster (honey/bumble/sting)
cluster_context = "honey"
namespace = "tailscale"
create_namespace = false
chart_version = "1.94.2"
# Tags
default_tags = ["tag:k8s"]
# Connector - advertise RKE2 Canal CNI pod and service CIDRs to tailnet
enable_connector = true
connector_name = "honey-connector"
connector_hostname = "honey-cluster"
connector_tags = ["tag:k8s-operator"]
enable_subnet_router = true
subnet_routes = ["10.42.0.0/16", "10.43.0.0/16"]
enable_exit_node = false
- Step 2: Format the file
cd tofu/stacks/tailscale && tofu fmt honey.tfvars
Expected: File formatted, no diff or only whitespace changes.
- Step 3: Validate syntax
cd tofu/stacks/tailscale && tofu init -backend=false && tofu validate
Expected: “Success! The configuration is valid.”
- Step 4: Commit
git add tofu/stacks/tailscale/honey.tfvars
git commit -m "feat(tailscale): add honey.tfvars for on-prem cluster
Configures Tailscale operator for honey RKE2 cluster with:
- Connector CRD advertising Canal CNI CIDRs (10.42/16, 10.43/16)
- hostname honey-cluster on tailnet
- Operator already running, namespace pre-exists
Ref: TIN-140, TIN-78, #178"
Task 3: Create honey.tfvars for Attic Stack
Context: Attic is already running on honey via manual kubectl deployments. This tfvars file prepares the stack for future Tofu adoption. Key differences from Civo: no CNPG (plain StatefulSet PG), no ingress (tailnet-only), OpenEBS ZFS storage, existing namespace must be adopted.
Files:
- Create: tofu/stacks/attic/honey.tfvars
- Step 1: Write honey.tfvars
Create tofu/stacks/attic/honey.tfvars:
# Attic Stack - Honey On-Prem Deployment
# RKE2 v1.32.12 cluster (honey/bumble/sting)
#
# IMPORTANT: Attic is already running on honey via manual kubectl deployments.
# This tfvars is for future Tofu adoption. Do NOT apply without importing
# existing resources first (namespace, configmap, secrets, deployments).
cluster_context = "honey"
namespace = "nix-cache"
environment = "production"
adopt_existing_namespace = true
# Storage: OpenEBS ZFS on bumble node
use_rustfs = true
rustfs_image = "minio/minio:latest"
minio_storage_class = "openebs-bumble-zfs"
rustfs_volume_size = "50Gi"
minio_root_user = "minioadmin"
# PostgreSQL: No CNPG operator on honey - use existing StatefulSet PG
# The database_url points to the OpenEBS-backed PG after Task 1 cutover
use_cnpg_postgres = false
install_cnpg_operator = false
pg_storage_class = "openebs-bumble-zfs"
# API Server
attic_image = "ghcr.io/zhaofengli/attic:latest"
api_min_replicas = 1
api_max_replicas = 2
# Ingress: DISABLED - all access via Tailscale proxy pods
enable_ingress = false
enable_tls = false
# Monitoring
enable_prometheus_monitoring = false
# Bootstrap safety
api_wait_for_rollout = false
- Step 2: Format the file
cd tofu/stacks/attic && tofu fmt honey.tfvars
Expected: File formatted.
- Step 3: Validate syntax
cd tofu/stacks/attic && tofu init -backend=false && tofu validate
Expected: “Success! The configuration is valid.”
Note: adopt_existing_namespace is already defined in the stack’s variables.tf (lines 53-57), so this should pass. If validation fails because the variable is missing, add the declaration first (see the tofu/stacks/attic/main.tf entry under Modified files).
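A quick check for the declaration, run from the repo root (grep only; no Tofu state touched):

```shell
# Confirm the variable declaration exists before relying on it.
cd tofu/stacks/attic
grep -n 'variable "adopt_existing_namespace"' variables.tf \
  || echo "declaration missing - add it to variables.tf before validating"
```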
- Step 4: Commit
git add tofu/stacks/attic/honey.tfvars
git commit -m "feat(attic): add honey.tfvars for on-prem cluster
Configures Attic stack for honey RKE2 cluster with:
- OpenEBS ZFS storage on bumble (openebs-bumble-zfs)
- No CNPG operator (plain StatefulSet PG, already deployed)
- No ingress (tailnet-only access via Tailscale proxy)
- Adopts existing nix-cache namespace
- NOT safe to apply yet (needs resource import)
Ref: TIN-125, TIN-148, TIN-78, #167, #185"
Task 4: Create honey.tfvars for ARC Runners Stack
Context: ARC runners are the primary GitHub CI workload. On honey, runners should schedule on sting (stateless compute). No Longhorn needed (OpenEBS ZFS available for persistent nix stores). Persistent /nix/store PVCs go on bumble via OpenEBS ZFS.
Files:
- Create: tofu/stacks/arc-runners/honey.tfvars
- Step 1: Write honey.tfvars
Create tofu/stacks/arc-runners/honey.tfvars:
# ARC Runners - Honey On-Prem Deployment
# RKE2 v1.32.12 cluster (honey/bumble/sting)
#
# Runner pods schedule on sting (stateless compute).
# Persistent /nix/store PVCs use OpenEBS ZFS on bumble.
# No Longhorn - OpenEBS ZFS provides durable storage.
cluster_context = "honey"
github_config_url = "https://github.com/tinyland-inc"
github_config_secret = "github-app-secret"
# Controller
controller_chart_version = "0.14.0"
# Nix runners (primary workload: compositor builds, Elisp CI, flake checks)
nix_min_runners = 0
nix_max_runners = 5
nix_cpu_limit = "4"
nix_memory_limit = "8Gi"
# Persistent Nix store on OpenEBS ZFS (bumble node)
nix_store_enabled = true
nix_store_size = "50Gi"
nix_store_init_derivations = "nixpkgs#bash nixpkgs#coreutils nixpkgs#git nixpkgs#cacert"
# Warm pool: keep 1 runner warm during business hours
nix_warm_pool_enabled = true
nix_warm_min_runners = 1
nix_warm_schedule = "0 13 * * 1-5"
nix_cold_schedule = "0 1 * * *"
# Docker runners (general CI)
deploy_docker_runner = true
docker_cpu_limit = "2"
docker_memory_limit = "4Gi"
# DinD runners (container: directive support)
deploy_dind_runner = true
dind_ephemeral_storage_request = "40Gi"
dind_ephemeral_storage_limit = "50Gi"
# No Longhorn - OpenEBS ZFS provides storage layer
deploy_longhorn = false
# Cache integration - Attic on-cluster via tailnet
attic_server = "http://attic-api.nix-cache.svc:8080"
attic_cache = "main"
# Extra runner scale sets for external repos
extra_runner_sets = {
linux-xr-docker = {
github_config_url = "https://github.com/tinyland-inc"
runner_label = "linux-xr-docker"
runner_type = "dind"
container_mode = "dind"
max_runners = 2
cpu_request = "2"
memory_request = "8Gi"
cpu_limit = "4"
memory_limit = "16Gi"
ephemeral_storage_request = "40Gi"
ephemeral_storage_limit = "50Gi"
}
}
- Step 2: Format the file
cd tofu/stacks/arc-runners && tofu fmt honey.tfvars
Expected: File formatted.
- Step 3: Validate syntax
cd tofu/stacks/arc-runners && tofu init -backend=false && tofu validate
Expected: “Success! The configuration is valid.”
- Step 4: Commit
git add tofu/stacks/arc-runners/honey.tfvars
git commit -m "feat(arc): add honey.tfvars for on-prem runners
Configures ARC runners for honey RKE2 cluster with:
- Nix/Docker/DinD runner scale sets on sting (stateless compute)
- Persistent /nix/store on OpenEBS ZFS (bumble)
- Warm pool (1 runner during business hours)
- No Longhorn (OpenEBS ZFS instead)
- Attic cache integration via cluster-internal service
Ref: TIN-126, TIN-78, #170"
Task 5: Create honey.tfvars for Runner Dashboard Stack
Context: The SvelteKit runner dashboard monitors runner fleet status. On honey, it should be tailnet-only (no ingress, access via Tailscale proxy pod). No GitLab OAuth needed if using Tailscale identity.
Files:
- Create: tofu/stacks/runner-dashboard/honey.tfvars
- Step 1: Verify runner-dashboard variables.tf for required vars
Read tofu/stacks/runner-dashboard/variables.tf to identify required variables (those without defaults):
cd tofu/stacks/runner-dashboard && grep -A2 'variable.*{' variables.tf | grep -B1 'description'
Identify which variables are required vs optional to determine the minimal honey.tfvars.
- Step 2: Write honey.tfvars
Create tofu/stacks/runner-dashboard/honey.tfvars:
# Runner Dashboard - Honey On-Prem Deployment
# RKE2 v1.32.12 cluster (honey/bumble/sting)
#
# Tailnet-only access via Tailscale proxy pod.
# No ingress, no public DNS.
cluster_context = "honey"
namespace = "runner-dashboard"
# Dashboard image (built and pushed by CI)
image = "ghcr.io/tinyland-inc/runner-dashboard:latest"
# Ingress: DISABLED - tailnet-only
enable_ingress = false
# Caddy sidecar for Tailscale auth
enable_caddy_sidecar = true
# Resources (sting node - stateless compute)
cpu_request = "100m"
cpu_limit = "500m"
memory_request = "128Mi"
memory_limit = "256Mi"
Note: This is a minimal config. Additional variables (GitLab OAuth, Prometheus URL, session secret) should be added via -var flags from secrets or expanded after verifying which are actually required by the module.
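A sketch of supplying those values at plan time without writing them to disk (the Tofu variable names session_secret and prometheus_url, and the PROMETHEUS_URL environment variable, are assumptions here; confirm the real names against the stack's variables.tf):

```shell
# Hypothetical variable names - verify against variables.tf before use.
export TF_VAR_session_secret="$(openssl rand -hex 32)"   # 64-char random secret
cd tofu/stacks/runner-dashboard
tofu plan -var-file=honey.tfvars -var="prometheus_url=$PROMETHEUS_URL"
```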
- Step 3: Format the file
cd tofu/stacks/runner-dashboard && tofu fmt honey.tfvars
Expected: File formatted.
- Step 4: Validate syntax
cd tofu/stacks/runner-dashboard && tofu init -backend=false && tofu validate
Expected: “Success! The configuration is valid.”
If validation fails due to missing required variables, add placeholder values with comments noting they must come from secrets at apply time.
- Step 5: Commit
git add tofu/stacks/runner-dashboard/honey.tfvars
git commit -m "feat(dashboard): add honey.tfvars for on-prem deployment
Configures runner dashboard for honey RKE2 cluster with:
- Tailnet-only access (no ingress)
- Caddy sidecar for Tailscale auth
- Minimal resource allocation on sting node
Ref: TIN-78"
Task 6: Create honey.tfvars for GitLab Runners Stack
Context: GitLab runners provide parity with ARC for GitLab CI pipelines. Same placement strategy: compute on sting, persistent storage on bumble.
Files:
- Create: tofu/stacks/gitlab-runners/honey.tfvars
- Step 1: Verify gitlab-runners variables.tf for required vars
Read tofu/stacks/gitlab-runners/variables.tf to identify required variables:
cd tofu/stacks/gitlab-runners && grep -B1 -A5 'variable' variables.tf | head -80
- Step 2: Write honey.tfvars
Create tofu/stacks/gitlab-runners/honey.tfvars:
# GitLab Runners - Honey On-Prem Deployment
# RKE2 v1.32.12 cluster (honey/bumble/sting)
#
# Runner pods schedule on sting (stateless compute).
# Secrets (runner token) passed via -var flags.
cluster_context = "honey"
namespace = "gitlab-runners"
# Nix runner (Nix builds, flake checks)
deploy_nix_runner = true
nix_runner_type = "nix"
nix_cpu_limit = "4"
nix_memory_limit = "8Gi"
nix_concurrent_jobs = 2
# Docker runner (general CI)
deploy_docker_runner = true
docker_cpu_limit = "2"
docker_memory_limit = "4Gi"
# DinD runner (container: directive support)
deploy_dind_runner = true
# Attic cache integration
attic_server = "http://attic-api.nix-cache.svc:8080"
attic_cache = "main"
# HPA scaling
nix_min_replicas = 0
nix_max_replicas = 3
docker_min_replicas = 0
docker_max_replicas = 3
Note: The gitlab_token variable is sensitive and must be passed via TF_VAR_gitlab_token or -var flag at apply time. Check variables.tf for the exact variable name.
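A sketch of keeping the token out of tfvars and shell history (gitlab_token is the assumed variable name; adjust to whatever variables.tf actually declares):

```shell
# Prompt for the sensitive runner token instead of committing it anywhere.
# "gitlab_token" is an assumed name - check variables.tf for the real one.
read -rs -p "GitLab runner token: " TF_VAR_gitlab_token; echo
export TF_VAR_gitlab_token
cd tofu/stacks/gitlab-runners
tofu plan -var-file=honey.tfvars
```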
- Step 3: Format the file
cd tofu/stacks/gitlab-runners && tofu fmt honey.tfvars
Expected: File formatted.
- Step 4: Validate syntax
cd tofu/stacks/gitlab-runners && tofu init -backend=false && tofu validate
Expected: “Success! The configuration is valid.”
If validation fails due to variables not matching the module’s expected names, adjust the tfvars to match the actual variable names in variables.tf.
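The reconciliation can be mechanized with a declared-vs-used diff; a sketch (assumes top-level tfvars assignments start at column one, which is what tofu fmt produces):

```shell
# Diff variable names declared in variables.tf against names used in honey.tfvars.
# Any name printed is used in the tfvars but not declared by the stack.
cd tofu/stacks/gitlab-runners
grep -oE '^variable "[^"]+"' variables.tf | cut -d'"' -f2 | sort -u > /tmp/declared.txt
grep -oE '^[a-z0-9_]+' honey.tfvars | sort -u > /tmp/used.txt
comm -13 /tmp/declared.txt /tmp/used.txt
```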
- Step 5: Commit
git add tofu/stacks/gitlab-runners/honey.tfvars
git commit -m "feat(gitlab-runners): add honey.tfvars for on-prem runners
Configures GitLab runners for honey RKE2 cluster with:
- Nix/Docker/DinD runners on sting (stateless compute)
- Attic cache integration via cluster-internal service
- HPA scaling (0-3 per type)
Ref: TIN-78"
Task 7: Update CI to Validate honey.tfvars Files
Context: The .github/workflows/validate.yml CI pipeline runs tofu fmt -check and tofu validate on all stacks. The honey.tfvars files must pass these checks.
Files:
- Modify: .github/workflows/validate.yml
- Step 1: Write the failing test
Push the branch and verify CI fails if any honey.tfvars is malformatted:
# Deliberately malformat a file
echo " cluster_context=\"honey\"" >> tofu/stacks/tailscale/honey.tfvars
cd tofu/stacks/tailscale && tofu fmt -check honey.tfvars
Expected: Exit code 1 (format check fails).
- Step 2: Fix the formatting
cd tofu/stacks/tailscale && tofu fmt honey.tfvars
Expected: File reformatted.
- Step 3: Verify CI validate job covers tfvars
The existing validate-stacks job in .github/workflows/validate.yml already runs:
- name: Format check
run: |
cd tofu/stacks/${{ matrix.stack }}
tofu fmt -check -recursive
tofu fmt already checks both .tf and .tfvars files in each stack directory; the -recursive flag additionally descends into subdirectories. No CI change is needed as long as the honey.tfvars files are properly formatted.
Verify locally:
for stack in tailscale attic arc-runners runner-dashboard gitlab-runners; do
echo "=== $stack ==="
cd tofu/stacks/$stack && tofu fmt -check -recursive && echo "OK" || echo "FAIL"
cd -
done
Expected: All stacks report “OK”.
- Step 4: Commit (if any CI changes were needed)
If the existing CI already covers tfvars (it should via -recursive), no commit needed for this task.
Task 8: Deploy Tailscale Connector CRD on Honey
Context: The Tailscale operator is already running on honey. This task deploys the Connector CRD to advertise honey’s pod and service CIDRs to the tailnet, enabling direct pod-to-pod access from other tailnet devices.
Files:
- No file changes (kubectl operations against honey cluster)
Pre-requisites: Tasks 1-2 completed. Tailscale OAuth credentials available as TF_VAR_oauth_client_id and TF_VAR_oauth_client_secret.
- Step 1: Verify Tailscale operator is running
kubectl --context honey get pods -n tailscale -l app.kubernetes.io/name=tailscale-operator
Expected: Operator pod in Running state.
- Step 2: Check if Connector CRD type is registered
kubectl --context honey get crd connectors.tailscale.com
Expected: CRD exists (installed by operator Helm chart).
- Step 3: Deploy Connector via Tofu
Option A (Tofu - recommended if operator was installed via Helm with the same release name):
cd tofu/stacks/tailscale
tofu init -backend=false
tofu plan -var-file=honey.tfvars \
-var="oauth_client_id=$TF_VAR_oauth_client_id" \
-var="oauth_client_secret=$TF_VAR_oauth_client_secret"
Review the plan. It should show:
- helm_release.tailscale_operator - may show changes if the existing install differs
- kubectl_manifest.connector[0] - CREATE (new Connector CRD)
If the Helm release conflicts with the existing operator installation, use Option B instead.
Option B (kubectl - apply Connector CRD directly):
cat <<'EOF' | kubectl --context honey apply -f -
apiVersion: tailscale.com/v1alpha1
kind: Connector
metadata:
name: honey-connector
spec:
tags:
- tag:k8s-operator
hostname: honey-cluster
subnetRouter:
advertiseRoutes:
- 10.42.0.0/16
- 10.43.0.0/16
exitNode: false
EOF
- Step 4: Verify Connector pod starts
kubectl --context honey get pods -n tailscale -l tailscale.com/parent-resource=honey-connector
Expected: Connector proxy pod in Running state within 60 seconds.
- Step 5: Verify routes appear on tailnet
# From any tailnet device (e.g., macbook-neo)
tailscale status | grep honey-cluster
Expected: honey-cluster appears as a subnet router advertising 10.42.0.0/16 and 10.43.0.0/16.
- Step 6: Test pod-level connectivity from tailnet
# Get a pod IP on honey
POD_IP=$(kubectl --context honey get pod -n nix-cache -l app=attic-api -o jsonpath='{.items[0].status.podIP}')
echo "Testing connectivity to pod $POD_IP:8080"
curl -sf --connect-timeout 5 "http://$POD_IP:8080/_attic/api/v1/cache-config/main" && echo "OK" || echo "FAIL"
Expected: If subnet routes are approved in Tailscale admin, direct pod access works from tailnet devices.
- Step 7: Commit
git commit --allow-empty -m "ops(tailscale): deploy Connector CRD on honey cluster
Deployed honey-connector advertising RKE2 Canal CNI CIDRs:
- 10.42.0.0/16 (pod CIDR)
- 10.43.0.0/16 (service CIDR)
Connector hostname: honey-cluster
Routes require approval in Tailscale admin console.
Ref: TIN-140, #178"
Task 9: Deploy ARC Controller and Runners on Honey
Context: This is the primary new deployment. ARC controller and runner scale sets need to be installed fresh on honey. The GitHub App secret must be pre-created in the cluster.
Files:
- No file changes (Tofu apply operations)
Pre-requisites: Tasks 2, 4 completed. github-app-secret K8s secret exists in arc-systems namespace.
- Step 1: Create the GitHub App secret on honey
kubectl --context honey create namespace arc-systems --dry-run=client -o yaml | kubectl --context honey apply -f -
kubectl --context honey create namespace arc-runners --dry-run=client -o yaml | kubectl --context honey apply -f -
# Create GitHub App secret (values from sops or env)
kubectl --context honey create secret generic github-app-secret \
-n arc-systems \
--from-literal=github_app_id="$GITHUB_APP_ID" \
--from-literal=github_app_installation_id="$GITHUB_APP_INSTALLATION_ID" \
--from-literal=github_app_private_key="$GITHUB_APP_PRIVATE_KEY" \
--dry-run=client -o yaml | kubectl --context honey apply -f -
Expected: Namespaces and secret created.
- Step 2: Initialize and plan
cd tofu/stacks/arc-runners
tofu init -backend=false
tofu plan -var-file=honey.tfvars \
-out=honey.tfplan
Review the plan carefully. Expected resources:
- helm_release.arc_controller - ARC controller Helm chart
- helm_release.gh_nix - Nix runner scale set
- helm_release.gh_docker - Docker runner scale set
- helm_release.gh_dind - DinD runner scale set
- helm_release.linux_xr_docker - Extra runner scale set
- Various RBAC, NetworkPolicy, HPA resources
- Step 3: Apply
cd tofu/stacks/arc-runners
tofu apply honey.tfplan
Expected: All resources created successfully.
- Step 4: Verify ARC controller is running
kubectl --context honey get pods -n arc-systems
kubectl --context honey get pods -n arc-runners
Expected: Controller pod Running in arc-systems. Runner scale set listener pods Running in arc-runners.
- Step 5: Trigger a test workflow
Push a trivial change to a tinyland-inc repo with a workflow that uses runs-on: [self-hosted, gh-nix]. Verify the runner picks up the job.
gh run list --repo tinyland-inc/GloriousFlywheel --limit 3
Expected: Workflow runs pick up self-hosted runners from honey.
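To watch the scale set react in real time, a sketch (assumes gh is authenticated for tinyland-inc; Ctrl-C the pod watch when done):

```shell
# Watch ephemeral runner pods spin up while the test workflow runs.
kubectl --context honey get pods -n arc-runners -w &
RUN_ID=$(gh run list --repo tinyland-inc/GloriousFlywheel --limit 1 --json databaseId --jq '.[0].databaseId')
gh run watch --repo tinyland-inc/GloriousFlywheel "$RUN_ID"
kill %1
```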
- Step 6: Commit
git commit --allow-empty -m "ops(arc): deploy ARC controller and runners on honey
Deployed ARC 0.14.0 with scale sets:
- gh-nix (0-5, persistent /nix/store on OpenEBS ZFS)
- gh-docker (0-5)
- gh-dind (0-5)
- linux-xr-docker (0-2)
Warm pool: 1 nix runner during business hours (M-F 13:00-01:00 UTC)
Ref: TIN-126, TIN-78, #170"
Task 10: Validate Full Deployment
Context: End-to-end validation that the honey cluster is serving all workloads correctly.
- Step 1: Verify all pods healthy
kubectl --context honey get pods -A --field-selector status.phase!=Running,status.phase!=Succeeded | grep -v Completed
Expected: No pods in CrashLoopBackOff, Pending, or Error state (except any known pre-existing issues).
- Step 2: Verify Attic cache is functional
# From a tailnet device, test cache push/pull
nix store ping --store http://attic-api.nix-cache.svc:8080 2>/dev/null || \
echo "Direct pod access via tailnet - verify Connector routes are approved"
- Step 3: Verify Tailscale proxy access to all services
# Check all Tailscale proxy pods
kubectl --context honey get pods -A -l tailscale.com/parent-resource
Expected: All 21+ proxy pods Running.
- Step 4: Verify ARC runners register with GitHub
gh api /orgs/tinyland-inc/actions/runners --jq '.runners[] | select(.labels[].name == "self-hosted") | .name + " " + .status'
Expected: honey-based runners show as “online”.
- Step 5: Document deployment state
Create a snapshot of the deployment state for the operations log:
echo "=== Honey Cluster State $(date -Iseconds) ===" > /tmp/honey-state.txt
kubectl --context honey get nodes -o wide >> /tmp/honey-state.txt
kubectl --context honey get pods -A -o wide >> /tmp/honey-state.txt
kubectl --context honey get pvc -A >> /tmp/honey-state.txt
kubectl --context honey get svc -A >> /tmp/honey-state.txt
helm --kube-context honey list -A >> /tmp/honey-state.txt
cat /tmp/honey-state.txt
- Step 6: Final commit with all honey.tfvars files
If any files were modified during validation, commit them:
git status
# Stage any remaining changes
git add -A
git commit -m "feat(honey): complete on-prem migration for all stacks
All five Tofu stacks have honey.tfvars:
- tailscale: Connector CRD with Canal CNI CIDRs
- attic: OpenEBS ZFS storage, no CNPG, tailnet-only
- arc-runners: ARC 0.14.0, persistent nix store, warm pool
- runner-dashboard: Tailnet-only, Caddy sidecar
- gitlab-runners: Nix/Docker/DinD on sting
Completed:
- OpenEBS storage cutover (attic-config -> openebs services)
- Tailscale Connector advertising 10.42/16 + 10.43/16
- ARC controller + runner scale sets deployed
- All access via Tailscale MagicDNS (no ingress)
Ref: TIN-78, TIN-125, TIN-126, TIN-140, TIN-148"
Deployment Order
Execute tasks in this order to minimize risk:
Task 1 (Attic cutover) <- No Tofu, kubectl only, lowest risk
|
Task 2 (tailscale tfvars) <- File creation only
Task 3 (attic tfvars) <- File creation only
Task 4 (arc-runners tfvars) <- File creation only
Task 5 (dashboard tfvars) <- File creation only
Task 6 (gitlab-runners tfvars) <- File creation only
|
Task 7 (CI validation) <- Verify format checks pass
|
Task 8 (Deploy Connector) <- First Tofu/kubectl apply
|
Task 9 (Deploy ARC) <- Primary new deployment
|
Task 10 (Validate) <- End-to-end verification
Tasks 2-6 are independent and can be done in parallel.
Rollback Plan
| Component | Rollback action |
|---|---|
| Attic configmap (Task 1) | kubectl apply -f /tmp/attic-config-backup-*.yaml then restart pods |
| Tailscale Connector (Task 8) | kubectl delete connector honey-connector |
| ARC runners (Task 9) | cd tofu/stacks/arc-runners && tofu destroy -var-file=honey.tfvars |
| Any tfvars file | git checkout -- tofu/stacks/*/honey.tfvars |
Out of Scope (Deferred)
- Runner dashboard deployment (Task 5 creates tfvars only; actual apply requires image build + secrets)
- GitLab runners deployment (Task 6 creates tfvars only; actual apply requires runner token)
- Importing existing honey resources into Tofu state (manual kubectl deployments predate Tofu)
- Tearing down Civo deployments (separate decision, separate issue)
- PostgreSQL ZFS tuning (recordsize=16k, full_page_writes=off) - operational follow-up
- Cleaning stuck liqo-storage namespace on honey
- Fixing tinyland-staging Pending pods (4 pods, 9 days)
- Operational runbook creation (7 docs referenced by user)