
Tailnet-First Operator Plane

Defines the tailnet-first access model, dashboard exposure boundaries, and multi-org runner enrollment for the GloriousFlywheel platform.

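Current boundary: tailnet access is an authentication envelope around the operator plane, not a complete mutation authority. Reaching the dashboard over the tailnet does not by itself authorize writes. See Auth and Mutation Authority for the current read/write split.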

Design Principle

The operator plane is private and tailnet-first. Dashboard auth is mixed-mode today: trusted Tailscale or mTLS proxy identity is preferred, while GitLab OAuth and WebAuthn remain interactive compatibility paths.

Operator (tailnet device)
  -> Tailscale tunnel
  -> Caddy reverse proxy (Tailscale or mTLS mode)
  -> Dashboard (SvelteKit) / API / MCP server

Access Model

Who Can Access What

| Role | Access Method | Scope |
|---|---|---|
| Operator | Dashboard UI over tailnet | View fleet; use compatibility pause/resume or config submission when configured |
| Org Admin | Dashboard UI + forge review | Review compatibility GitOps MRs and enrollment requests |
| Platform Engineer | CLI (kubectl, tofu, MCP) over tailnet | Full cluster access, infrastructure changes |
| Downstream Consumer | GitHub/GitLab CI only | Submit jobs; no platform access |
| External User | None (tailnet-only) | No access to operator plane |
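The table can be read as a lookup from role to permitted access methods. The following TypeScript sketch encodes it that way. The role and method identifiers are hypothetical and were chosen for illustration; they do not come from the GloriousFlywheel codebase.

```typescript
// Hypothetical identifiers mirroring the "Who Can Access What" table.
type Role =
  | "operator"
  | "org-admin"
  | "platform-engineer"
  | "downstream-consumer"
  | "external-user";

type AccessMethod = "dashboard-ui" | "forge-review" | "cli-over-tailnet" | "forge-ci";

// Each role lists the only access methods it is expected to use.
const ACCESS: Record<Role, readonly AccessMethod[]> = {
  operator: ["dashboard-ui"],
  "org-admin": ["dashboard-ui", "forge-review"],
  "platform-engineer": ["cli-over-tailnet"],
  "downstream-consumer": ["forge-ci"],
  "external-user": [], // tailnet-only: no path to the operator plane
};

function canUse(role: Role, method: AccessMethod): boolean {
  return ACCESS[role].includes(method);
}
```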

Authentication Flow

User opens dashboard URL (e.g. https://dashboard.tail12345.ts.net)
  -> Caddy checks Tailscale or mTLS identity
  -> Emits trusted identity headers
  -> Maps to platform role:
     - Org owner email -> admin
     - Org member email -> operator
     - Unknown -> default role or explicit deny at proxy/policy layer
  -> SvelteKit reads trusted headers when TRUST_PROXY_HEADERS=true
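The role-mapping step above can be sketched as a pure function. The sketch assumes the proxy has already produced a verified email and that org owner and member emails are available as lowercase sets. `OrgDirectory`, `mapToRole`, and the `"deny"` result are hypothetical names, not the dashboard's actual code.

```typescript
type PlatformRole = "admin" | "operator";

interface OrgDirectory {
  owners: ReadonlySet<string>;  // org owner emails, stored lowercase (assumed)
  members: ReadonlySet<string>; // org member emails, stored lowercase (assumed)
}

// Owners are checked first, so an owner who is also a member maps to admin.
// Unknown users get the configured default role, or "deny" when none is set.
function mapToRole(
  email: string,
  org: OrgDirectory,
  defaultRole?: PlatformRole,
): PlatformRole | "deny" {
  const normalized = email.trim().toLowerCase();
  if (org.owners.has(normalized)) return "admin";
  if (org.members.has(normalized)) return "operator";
  return defaultRole ?? "deny";
}
```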

Proxy identity outranks stored interactive sessions when trusted headers are enabled. Interactive GitLab OAuth and WebAuthn sessions still exist as compatibility paths.
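The precedence rule can be sketched as follows. The header name `x-operator-email` is a placeholder; use whatever header the Caddy config actually emits. The `trustProxyHeaders` parameter stands in for the `TRUST_PROXY_HEADERS=true` setting. The function is illustrative and is not the SvelteKit hook the dashboard uses.

```typescript
// Placeholder header name (assumption): replace with the header Caddy emits.
const IDENTITY_HEADER = "x-operator-email";

type HeaderBag = Record<string, string | undefined>; // keys lowercased

interface ResolvedIdentity {
  email: string;
  source: "proxy" | "session";
}

function resolveIdentity(
  headers: HeaderBag,
  sessionEmail: string | null,
  trustProxyHeaders: boolean,
): ResolvedIdentity | null {
  // Honour the header only when explicitly trusted. This is safe only if
  // the proxy overwrites any client-supplied copy of the header.
  if (trustProxyHeaders) {
    const proxied = headers[IDENTITY_HEADER]?.trim();
    if (proxied) return { email: proxied, source: "proxy" };
  }
  // Fall back to a stored GitLab OAuth / WebAuthn compatibility session.
  if (sessionEmail) return { email: sessionEmail, source: "session" };
  return null;
}
```

Checking the proxy header first makes trusted proxy identity win over a stored session. When the header is absent, the session still works as a compatibility path.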

Exposure Boundaries

What’s on the Tailnet

| Service | URL Pattern | Port | Auth |
|---|---|---|---|
| Dashboard UI | `dashboard.tail*.ts.net` | 443 (Caddy TLS) | `tailscale_auth` |
| Dashboard API | `dashboard.tail*.ts.net/api/*` | 443 | `tailscale_auth` + role check |
| MCP Server | Local/tooling process | N/A | Current connector/tool auth |
| Prometheus | `prometheus.tail*.ts.net` | 443 | `tailscale_auth` |
| Grafana | `grafana.tail*.ts.net` | 443 | `tailscale_auth` |
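The sketch below shows how a request could be classified against this table. It assumes hosts follow the `<service>.tail<id>.ts.net` shape of the example URL. `authLayersFor` is a hypothetical helper; in practice Caddy enforces `tailscale_auth`, not application code.

```typescript
type AuthLayer = "tailscale_auth" | "role_check";

// Assumed host shape: <service>.tail<id>.ts.net, as in the example dashboard URL.
const TAILNET_HOST = /^(dashboard|prometheus|grafana)\.tail[a-z0-9]+\.ts\.net$/;

// Returns the auth layers a request passes through, or null when the host
// is not one of the operator-plane services exposed on the tailnet.
function authLayersFor(hostname: string, pathname: string): AuthLayer[] | null {
  const match = TAILNET_HOST.exec(hostname.toLowerCase());
  if (!match) return null;
  const service = match[1];
  const isApi = pathname === "/api" || pathname.startsWith("/api/");
  if (service === "dashboard" && isApi) {
    return ["tailscale_auth", "role_check"]; // API adds a role check
  }
  return ["tailscale_auth"];
}
```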

What’s NOT on the Tailnet

| Service | Access | Reason |
|---|---|---|
| ARC Controller | Cluster-internal only | No external API needed |
| Runner pods | Cluster-internal only | Ephemeral, no direct access |
| Attic cache | Cluster-internal + tailnet | Runners use cluster DNS, devs use tailnet |
| PostgreSQL | Cluster-internal only | CNPG manages access via mTLS |
| RustFS | Cluster-internal only | S3 API for Attic only |

MCP Server Access

The MCP server runs as a local stdio process on the operator’s machine. It calls the dashboard API over the tailnet:

Claude Code -> MCP Server (stdio, local)
  -> HTTP to dashboard.tail*.ts.net/api/*
  -> Dashboard request path
  -> Dashboard resolves trusted proxy/session identity and checks role
  -> Returns envelope response
  -> MCP Server formats for Claude
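The MCP-side call can be sketched with an injected fetch-like function, so the sketch runs without network access. The envelope shape `{ ok, data } | { ok, error }` is an assumption about what "envelope response" means. The sketch sends no credentials of its own, on the assumption that the tailnet proxy identifies the calling device.

```typescript
// Hypothetical envelope shape; the real dashboard response format may differ.
type Envelope<T> = { ok: true; data: T } | { ok: false; error: string };

// Minimal fetch-like signature so the sketch does not depend on a specific runtime.
type Fetcher = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ status: number; json(): Promise<unknown> }>;

async function callDashboard<T>(
  fetcher: Fetcher,
  baseUrl: string, // e.g. "https://dashboard.tail12345.ts.net"
  path: string,    // e.g. "/api/runners"
): Promise<Envelope<T>> {
  const res = await fetcher(baseUrl.replace(/\/+$/, "") + path, {
    method: "GET",
    headers: { accept: "application/json" },
  });
  if (res.status < 200 || res.status >= 300) {
    return { ok: false, error: `HTTP ${res.status}` };
  }
  return (await res.json()) as Envelope<T>;
}

// Turn the envelope into plain text for the MCP client.
function formatForClient<T>(env: Envelope<T>): string {
  return env.ok ? JSON.stringify(env.data, null, 2) : `error: ${env.error}`;
}
```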

Multi-Org Runner Enrollment

Model

GloriousFlywheel supports runner sharing across multiple GitHub organizations and GitLab groups through a single platform instance.

Platform Instance (single cluster)
  +-- Org A (GitHub)
  |   +-- tinyland-nix (shared)
  |   +-- tinyland-docker (shared)
  |   +-- tinyland-dind (shared)
  |
  +-- Org B (GitHub)
  |   +-- tinyland-docker (shared)
  |   +-- tinyland-nix-gpu (shared additive capability when enabled)
  |
  +-- Group C (GitLab)
      +-- gl-docker (shared)
      +-- gl-nix (shared)
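The same tree can be expressed as data. This sketch assumes one record per GitHub org or GitLab group; the names `org-a`, `org-b`, and `group-c` are placeholders. A label that appears in more than one record is a shared lane.

```typescript
type Forge = "github" | "gitlab";

interface Enrollment {
  forge: Forge;
  name: string;              // GitHub org or GitLab group (placeholder names)
  labels: readonly string[]; // runner labels (GitHub) or tags (GitLab)
}

// Mirrors the tree above.
const platformEnrollments: readonly Enrollment[] = [
  { forge: "github", name: "org-a", labels: ["tinyland-nix", "tinyland-docker", "tinyland-dind"] },
  { forge: "github", name: "org-b", labels: ["tinyland-docker", "tinyland-nix-gpu"] },
  { forge: "gitlab", name: "group-c", labels: ["gl-docker", "gl-nix"] },
];

// Which enrollments can target a given label? Shared labels return more than one.
function enrollmentsWithLabel(label: string): string[] {
  return platformEnrollments
    .filter((e) => e.labels.includes(label))
    .map((e) => `${e.forge}:${e.name}`);
}
```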

Enrollment Types

| Type | Scope | Registration | Lifecycle |
|---|---|---|---|
| Shared | All enrolled orgs | Single GitHub App installation per org; all orgs route to the same scale set | Platform manages, orgs consume |
| Capability add-on | Enrolled orgs with approved need | Shared label with an explicit runtime, architecture, privilege, or resource reason | Platform provisions |
| Org-plus-user | Org + specific repos | GitHub App with restricted repo access | Org admin configures repo list |
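The following is a hypothetical validation step for incoming enrollment requests. It enforces the table's two preconditions: a capability add-on must state its reason, and an org-plus-user enrollment must name at least one repo. The request shape is assumed and is not taken from the platform.

```typescript
type EnrollmentRequest =
  | { type: "shared"; org: string }
  | { type: "capability-add-on"; org: string; label: string; reason: string }
  | { type: "org-plus-user"; org: string; repos: readonly string[] };

// Returns a list of problems; an empty list means the request is well-formed.
function validateRequest(req: EnrollmentRequest): string[] {
  const problems: string[] = [];
  if (req.org.trim() === "") problems.push("org is required");
  switch (req.type) {
    case "shared":
      break;
    case "capability-add-on":
      if (req.label.trim() === "") problems.push("capability add-on needs a label");
      // Add-on lanes need an explicit runtime, architecture, privilege, or resource reason.
      if (req.reason.trim() === "") problems.push("capability add-on needs a stated reason");
      break;
    case "org-plus-user":
      if (req.repos.length === 0) problems.push("org-plus-user needs at least one repo");
      break;
  }
  return problems;
}
```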

Registration Flow

GitHub (ARC):

1. Org admin installs GloriousFlywheel GitHub App
2. App installation generates credentials
3. Platform engineer adds org to arc-runners stack:
   - New GitHub App secret in cluster
   - ARC scale set configured with org's app credentials
4. Runners appear in org's GitHub Actions runner list
5. Org's workflows use shared labels such as `tinyland-nix`
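In ARC's `gha-runner-scale-set` Helm chart, each release takes a single `githubConfigUrl` and reads GitHub App credentials from the Secret named by `githubConfigSecret`. That Secret holds the keys `github_app_id`, `github_app_installation_id`, and `github_app_private_key`. `runnerScaleSetName` is the label workflows target with `runs-on`. Step 3 could be sketched as rendering one values object per org and label. The `<org>-github-app` Secret naming and the default of 10 runners are assumptions.

```typescript
// Subset of gha-runner-scale-set chart values used in this sketch.
interface ScaleSetValues {
  githubConfigUrl: string;
  githubConfigSecret: string; // name of a pre-created Kubernetes Secret
  runnerScaleSetName: string; // the label workflows put in `runs-on`
  minRunners: number;
  maxRunners: number;
}

// Hypothetical helper: one scale set per (org, label), using the org's GitHub App secret.
function scaleSetValuesFor(org: string, label: string, maxRunners = 10): ScaleSetValues {
  return {
    githubConfigUrl: `https://github.com/${org}`,
    githubConfigSecret: `${org}-github-app`, // assumed naming convention
    runnerScaleSetName: label,
    minRunners: 0,
    maxRunners,
  };
}
```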

GitLab:

1. Group admin creates group runner token
2. Platform engineer adds token to gitlab-runners stack
3. Runner registers with GitLab group
4. Group's pipelines pick up the runner via tags
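GitLab assigns a job to a runner only when the runner carries every tag the job lists. Untagged jobs go only to runners with "Run untagged jobs" enabled. The following is a minimal sketch of that matching rule; the `GitLabRunner` shape is hypothetical.

```typescript
interface GitLabRunner {
  tags: readonly string[];
  runUntagged: boolean; // mirrors GitLab's "Run untagged jobs" setting
}

// A runner can take a job only if it carries every tag the job lists;
// a job with no tags needs a runner that accepts untagged jobs.
function canPickUp(runner: GitLabRunner, jobTags: readonly string[]): boolean {
  if (jobTags.length === 0) return runner.runUntagged;
  const have = new Set(runner.tags);
  return jobTags.every((t) => have.has(t));
}
```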

Runner Isolation

| Isolation Level | Mechanism | Use Case |
|---|---|---|
| None (shared pool) | All orgs share the same scale set pods | Trusted orgs, cost efficiency |
| Namespace | Separate Kubernetes namespace per org | Untrusted orgs, resource isolation |
| Node | Dedicated node pool per org | Strict isolation, compliance |

Default: shared pool with ephemeral pods. Each job gets a fresh pod that is destroyed after completion, so no workspace or pod filesystem state carries over from one job to the next.
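The sketch below shows how each isolation level could translate into pod placement. The namespace names and the `gloriousflywheel/pool` node label are hypothetical. A real dedicated node pool would also need taints and matching tolerations, which are omitted here.

```typescript
type IsolationLevel = "shared" | "namespace" | "node";

interface Placement {
  namespace: string;
  nodeSelector?: Record<string, string>;
}

// Hypothetical naming: "arc-runners" for the shared pool, "arc-<org>" otherwise.
// Node isolation adds a node selector; taints/tolerations are not modelled.
function placementFor(org: string, level: IsolationLevel): Placement {
  switch (level) {
    case "shared":
      return { namespace: "arc-runners" };
    case "namespace":
      return { namespace: `arc-${org}` };
    case "node":
      return { namespace: `arc-${org}`, nodeSelector: { "gloriousflywheel/pool": org } };
  }
}
```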

Enrollment Lifecycle

Enroll:
  Org admin requests enrollment
  -> Platform engineer adds org config to tofu stack
  -> PR + review + merge + apply
  -> Runner appears in org's forge

Monitor:
  Dashboard shows enrollment status per org
  -> /api/runners groups by forge
  -> Metrics tracked per org via runner labels

Offboard:
  Org admin requests removal
  -> Platform engineer removes org config
  -> PR + review + merge + apply
  -> Runner deregisters from org's forge
  -> Pods drain, secrets deleted
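The enroll and offboard paths can be modelled as a small state machine in which every change goes through a reviewed PR before it is applied. The state and event names are hypothetical. Monitoring happens while an enrollment is `active`.

```typescript
type EnrollmentState =
  | "requested"         // org admin asked to enroll
  | "pr-open"           // org config PR under review
  | "active"            // merged + applied; runners visible in the forge
  | "removal-requested" // org admin asked to offboard
  | "removal-pr-open"   // removal PR under review
  | "offboarded";       // runners deregistered, pods drained, secrets deleted

type LifecycleEvent = "open-pr" | "merge-and-apply" | "request-removal";

// Allowed transitions; anything not listed is rejected.
const TRANSITIONS: Record<EnrollmentState, Partial<Record<LifecycleEvent, EnrollmentState>>> = {
  requested: { "open-pr": "pr-open" },
  "pr-open": { "merge-and-apply": "active" },
  active: { "request-removal": "removal-requested" },
  "removal-requested": { "open-pr": "removal-pr-open" },
  "removal-pr-open": { "merge-and-apply": "offboarded" },
  offboarded: {},
};

function step(state: EnrollmentState, event: LifecycleEvent): EnrollmentState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) throw new Error(`invalid transition: ${event} from ${state}`);
  return next;
}
```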

Ownership Matrix

| Decision | Owner | Process |
|---|---|---|
| Grant org enrollment | Org Admin | Request via issue, approved by platform team |
| Provision shared runner | Platform Engineer | PR to `tofu/stacks/arc-runners/` |
| Provision additive capability lane | Platform Engineer | PR with explicit runtime, privilege, architecture, or bounded resource reason |
| Set resource quotas per org | Org Admin | PR to stack variables |
| Manage GitHub App installation | Org Admin (per org) | GitHub org settings |
| Rotate runner credentials | Platform Engineer | Scheduled or on-demand via runbook |
| Emergency pause (compatibility runners) | Operator | Dashboard compatibility flow when GitLab backend is configured |
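If the matrix needs to drive tooling, such as a per-role checklist, it could be encoded as a typed lookup. The decision keys below are hypothetical shorthand for the table rows.

```typescript
type Owner = "org-admin" | "platform-engineer" | "operator";

// Hypothetical keys, one per row of the ownership matrix.
type Decision =
  | "grant-org-enrollment"
  | "provision-shared-runner"
  | "provision-capability-lane"
  | "set-org-resource-quotas"
  | "manage-github-app-installation"
  | "rotate-runner-credentials"
  | "emergency-pause-compatibility-runners";

const OWNERSHIP: Record<Decision, Owner> = {
  "grant-org-enrollment": "org-admin",
  "provision-shared-runner": "platform-engineer",
  "provision-capability-lane": "platform-engineer",
  "set-org-resource-quotas": "org-admin",
  "manage-github-app-installation": "org-admin",
  "rotate-runner-credentials": "platform-engineer",
  "emergency-pause-compatibility-runners": "operator",
};

// List every decision a given role owns, in table order.
function decisionsOwnedBy(owner: Owner): Decision[] {
  return (Object.keys(OWNERSHIP) as Decision[]).filter((d) => OWNERSHIP[d] === owner);
}
```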
