Tailnet-First Operator Plane

Defines the tailnet-first access model, dashboard exposure boundaries, and multi-org runner enrollment for the GloriousFlywheel platform.

Design Principle

The operator plane (dashboard, API, MCP server) is exposed exclusively over the tailnet. No public endpoints. Authentication is handled by Tailscale’s identity provider (tsidp), not application-level passwords.

Operator (tailnet device)
  -> Tailscale tunnel
  -> Caddy reverse proxy (tailscale_auth directive)
  -> Dashboard (SvelteKit) / API / MCP server
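
This chain can be sketched as a Caddyfile. The sketch assumes the caddy-tailscale plugin (which provides the tailscale_auth directive); the upstream port and the identity placeholder are illustrative:

```caddyfile
dashboard.tail12345.ts.net {
	# Reject any request that arrives without a Tailscale identity
	tailscale_auth

	reverse_proxy localhost:3000 {
		# Forward the authenticated tailnet identity to the app
		header_up X-Tailscale-User {http.auth.user.id}
	}
}
```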

Access Model

Who Can Access What

| Role | Access Method | Scope |
| --- | --- | --- |
| Operator | Dashboard UI over tailnet | View fleet, pause/resume, edit config, view metrics |
| Org Admin | Dashboard UI + GitOps MRs | All operator actions + approve config changes, manage enrollment |
| Platform Engineer | CLI (kubectl, tofu, MCP) over tailnet | Full cluster access, infrastructure changes |
| Downstream Consumer | GitHub/GitLab CI only | Submit jobs, no platform access |
| External User | None (tailnet-only) | No access to operator plane |

Authentication Flow

User opens dashboard URL (e.g. https://dashboard.tail12345.ts.net)
  -> Caddy checks Tailscale identity via tailscale_auth
  -> Extracts tailnet user identity (email, node name)
  -> Maps to platform role:
     - Org owner email -> admin
     - Org member email -> operator
     - Unknown -> denied (not on tailnet = no access)
  -> Sets X-Tailscale-User header
  -> SvelteKit reads identity from header, populates session

No passwords. No OAuth flows. No session cookies to manage. If you’re on the tailnet, you’re authenticated. Role comes from identity.
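
The identity-to-role mapping in the flow above can be sketched as a pure function. The role names and the deny-by-default behavior follow the flow; the owner/member sets are illustrative inputs that in practice would come from org configuration:

```typescript
type Role = 'admin' | 'operator';

// Maps a tailnet identity (from the X-Tailscale-User header) to a platform role.
// Returns null for unknown identities; in practice Caddy has already rejected
// any request that arrived without a Tailscale identity.
function mapRole(
  email: string | null,
  owners: ReadonlySet<string>,
  members: ReadonlySet<string>,
): Role | null {
  if (email === null) return null;           // not on tailnet = no access
  if (owners.has(email)) return 'admin';     // org owner email -> admin
  if (members.has(email)) return 'operator'; // org member email -> operator
  return null;                               // unknown -> denied
}
```

The SvelteKit server hook would call this with the header value and put the result in the session.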

Exposure Boundaries

What’s on the Tailnet

| Service | URL Pattern | Port | Auth |
| --- | --- | --- | --- |
| Dashboard UI | dashboard.tail*.ts.net | 443 (Caddy TLS) | tailscale_auth |
| Dashboard API | dashboard.tail*.ts.net/api/* | 443 | tailscale_auth + role check |
| MCP Server | stdio (local process) | N/A | DASHBOARD_TOKEN env var |
| Prometheus | prometheus.tail*.ts.net | 443 | tailscale_auth |
| Grafana | grafana.tail*.ts.net | 443 | tailscale_auth |

What’s NOT on the Tailnet

| Service | Access | Reason |
| --- | --- | --- |
| ARC Controller | Cluster-internal only | No external API needed |
| Runner pods | Cluster-internal only | Ephemeral, no direct access |
| Attic cache | Cluster-internal + tailnet | Runners use cluster DNS, devs use tailnet |
| PostgreSQL | Cluster-internal only | CNPG manages access via mTLS |
| RustFS | Cluster-internal only | S3 API for Attic only |

MCP Server Access

The MCP server runs as a local stdio process on the operator’s machine. It calls the dashboard API over the tailnet:

Claude Code -> MCP Server (stdio, local)
  -> HTTP to dashboard.tail*.ts.net/api/*
  -> Bearer token = tsidp-issued JWT
  -> Dashboard validates token, checks role
  -> Returns envelope response
  -> MCP Server formats for Claude
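
The MCP server's half of this flow can be sketched as two small helpers: one that builds the tailnet request carrying the JWT, and one that unwraps the envelope before formatting for Claude. The envelope shape ({ ok, data, error }) and helper names are assumptions, not the actual API contract:

```typescript
interface Envelope<T> {
  ok: boolean;
  data?: T;
  error?: string;
}

// Build the request the MCP server sends to the dashboard API over the tailnet.
// The token is the tsidp-issued JWT, supplied via the DASHBOARD_TOKEN env var.
function buildDashboardRequest(base: string, path: string, token: string) {
  return {
    url: `${base}/api${path}`,
    headers: { Authorization: `Bearer ${token}` },
  };
}

// Unwrap the envelope response; throw so the MCP layer can surface the error.
function unwrap<T>(env: Envelope<T>): T {
  if (!env.ok || env.data === undefined) {
    throw new Error(env.error ?? 'dashboard API error');
  }
  return env.data;
}
```

In the real server these would feed a fetch call; retry and timeout handling are elided here.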

Multi-Org Runner Enrollment

Model

GloriousFlywheel supports runner sharing across multiple GitHub organizations and GitLab groups through a single platform instance.

Platform Instance (single cluster)
  +-- Org A (GitHub)
  |   +-- gh-nix (shared)
  |   +-- gh-docker (shared)
  |   +-- gh-dind (shared)
  |
  +-- Org B (GitHub)
  |   +-- gh-docker (shared, same scale set)
  |   +-- org-b-gpu (dedicated, org-specific)
  |
  +-- Group C (GitLab)
      +-- gl-docker (shared)
      +-- gl-nix (shared)

Enrollment Types

| Type | Scope | Registration | Lifecycle |
| --- | --- | --- | --- |
| Shared | All enrolled orgs | One GitHub App installation per org; all orgs route to the same scale set | Platform manages, orgs consume |
| Dedicated | Single org | Org-specific GitHub App or runner group | Org requests, platform provisions |
| Org-plus-user | Org + specific repos | GitHub App with restricted repo access | Org admin configures repo list |

Registration Flow

GitHub (ARC):

1. Org admin installs GloriousFlywheel GitHub App
2. App installation generates credentials
3. Platform engineer adds org to arc-runners stack:
   - New GitHub App secret in cluster
   - ARC scale set configured with org's app credentials
4. Runners appear in org's GitHub Actions runner list
5. Org's workflows use `runs-on: [self-hosted, nix]`
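
Step 3 might look like the following module call in the arc-runners stack. The module path, variable names, and secret name are all illustrative, not the stack's actual interface:

```hcl
# Hypothetical: provisioning Org B's dedicated GPU scale set in tofu/stacks/arc-runners/
module "org_b_gpu" {
  source = "../../modules/arc-scale-set" # illustrative module path

  github_org        = "org-b"
  scale_set_name    = "org-b-gpu"
  github_app_secret = "arc-org-b-app"    # cluster secret holding the App credentials from step 2
  namespace         = "runners-org-b"
  node_selector     = { pool = "gpu" }   # dedicated node pool for strict isolation
}
```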

GitLab:

1. Group admin creates group runner token
2. Platform engineer adds token to gitlab-runners stack
3. Runner registers with GitLab group
4. Group's pipelines pick up the runner via tags
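
The gitlab-runners stack entry might resemble a gitlab-runner Helm chart values fragment like this; the secret name, namespace, and image are illustrative:

```yaml
# Hypothetical values for the gitlab-runner chart in the gitlab-runners stack
gitlabUrl: https://gitlab.example.com
runners:
  # Group runner token sourced from a cluster secret, never committed to the repo
  secret: gitlab-group-c-runner-token
  config: |
    [[runners]]
      executor = "kubernetes"
      [runners.kubernetes]
        namespace = "gitlab-runners"
        image = "alpine:3.20"
```

The tags (gl-docker, gl-nix) are attached to the runner at registration, which is how step 4's pipelines select it.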

Runner Isolation

| Isolation Level | Mechanism | Use Case |
| --- | --- | --- |
| None (shared pool) | All orgs share the same scale set pods | Trusted orgs, cost efficiency |
| Namespace | Separate Kubernetes namespace per org | Untrusted orgs, resource isolation |
| Node | Dedicated node pool per org | Strict isolation, compliance |

Default: shared pool with ephemeral pods. Each job gets a fresh pod that is destroyed after completion. No cross-job data leakage.
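
For namespace-level isolation, a per-org ResourceQuota is the usual bounding mechanism. A minimal sketch; the namespace name and limits are placeholders:

```yaml
# Hypothetical quota bounding Org B's runner namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: runner-quota
  namespace: runners-org-b
spec:
  hard:
    requests.cpu: "32"
    requests.memory: 64Gi
    pods: "50"
```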

Enrollment Lifecycle

Enroll:
  Org admin requests enrollment
  -> Platform engineer adds org config to tofu stack
  -> PR + review + merge + apply
  -> Runner appears in org's forge

Monitor:
  Dashboard shows enrollment status per org
  -> /api/runners groups by forge
  -> Metrics tracked per org via runner labels

Offboard:
  Org admin requests removal
  -> Platform engineer removes org config
  -> PR + review + merge + apply
  -> Runner deregisters from org's forge
  -> Pods drain, secrets deleted
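
The /api/runners call in the monitor step, grouped by forge and wrapped in the envelope, might return something shaped like this (a guess at the payload; field names are illustrative):

```json
{
  "ok": true,
  "data": {
    "github": { "org-a": ["gh-nix", "gh-docker", "gh-dind"] },
    "gitlab": { "group-c": ["gl-docker", "gl-nix"] }
  }
}
```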

Ownership Matrix

| Decision | Owner | Process |
| --- | --- | --- |
| Grant org enrollment | Org Admin | Request via issue, approved by platform team |
| Provision shared runner | Platform Engineer | PR to tofu/stacks/arc-runners/ |
| Provision dedicated runner | Platform Engineer | PR with new module call + namespace |
| Set resource quotas per org | Org Admin | PR to stack variables |
| Manage GitHub App installation | Org Admin (per org) | GitHub org settings |
| Rotate runner credentials | Platform Engineer | Scheduled or on-demand via runbook |
| Emergency pause (all orgs) | Operator | Dashboard or MCP pause_runner |
