Tailnet-First Operator Plane
Defines the tailnet-first access model, dashboard exposure boundaries, and multi-org runner enrollment for the GloriousFlywheel platform.
Design Principle
The operator plane (dashboard, API, MCP server) is exposed exclusively over the tailnet. No public endpoints. Authentication is handled by Tailscale’s identity provider (tsidp), not application-level passwords.
Operator (tailnet device)
-> Tailscale tunnel
-> Caddy reverse proxy (tailscale_auth directive)
-> Dashboard (SvelteKit) / API / MCP server
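The proxy layer above can be sketched as a Caddyfile site block. This is a hypothetical sketch, not the platform's actual config: the hostname and upstream port are illustrative, and `tailscale_auth` is assumed to come from the caddy-tailscale plugin, whose exact configuration options may differ.

```caddyfile
# Hypothetical sketch of the operator-plane proxy.
# Hostname and upstream port are illustrative.
dashboard.tail12345.ts.net {
    # Verify the caller's tailnet identity before proxying;
    # requests from outside the tailnet never get this far.
    tailscale_auth

    # Forward to the SvelteKit dashboard process.
    reverse_proxy localhost:3000
}
```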
Access Model
Who Can Access What
| Role | Access Method | Scope |
|---|---|---|
| Operator | Dashboard UI over tailnet | View fleet, pause/resume, edit config, view metrics |
| Org Admin | Dashboard UI + GitOps MRs | All operator actions + approve config changes, manage enrollment |
| Platform Engineer | CLI (kubectl, tofu, MCP) over tailnet | Full cluster access, infrastructure changes |
| Downstream Consumer | GitHub/GitLab CI only | Submit jobs, no platform access |
| External User | None (tailnet-only) | No access to operator plane |
Authentication Flow
User opens dashboard URL (e.g. https://dashboard.tail12345.ts.net)
-> Caddy checks Tailscale identity via tailscale_auth
-> Extracts tailnet user identity (email, node name)
-> Maps to platform role:
- Org owner email -> admin
- Org member email -> operator
- Unknown -> denied (not on tailnet = no access)
-> Sets X-Tailscale-User header
-> SvelteKit reads identity from header, populates session
No passwords. No OAuth flows. No session cookies to manage. If you’re on the tailnet, you’re authenticated. Role comes from identity.
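The identity-to-role mapping above can be sketched as a small pure function, suitable for calling from a SvelteKit server hook. The email sets (`ORG_ADMINS`, `ORG_MEMBERS`) are hypothetical placeholders; in practice they would come from the platform's org configuration.

```typescript
// Sketch of the X-Tailscale-User header-to-role mapping.
// ORG_ADMINS / ORG_MEMBERS are illustrative stand-ins for the
// platform's real org membership configuration.
type Role = "admin" | "operator";

const ORG_ADMINS = new Set(["owner@example.com"]);
const ORG_MEMBERS = new Set(["dev@example.com"]);

export function mapIdentityToRole(header: string | null): Role | null {
  // Caddy only sets the header after verifying tailnet identity,
  // but deny defensively if it is missing or unknown.
  if (!header) return null; // not on the tailnet -> no access
  const email = header.toLowerCase();
  if (ORG_ADMINS.has(email)) return "admin";
  if (ORG_MEMBERS.has(email)) return "operator";
  return null; // unknown identity -> denied
}
```

A SvelteKit `handle` hook would call this with `event.request.headers.get("X-Tailscale-User")` and populate the session from the result.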
Exposure Boundaries
What’s on the Tailnet
| Service | URL Pattern | Port | Auth |
|---|---|---|---|
| Dashboard UI | dashboard.tail*.ts.net | 443 (Caddy TLS) | tailscale_auth |
| Dashboard API | dashboard.tail*.ts.net/api/* | 443 | tailscale_auth + role check |
| MCP Server | stdio (local process) | N/A | DASHBOARD_TOKEN env var |
| Prometheus | prometheus.tail*.ts.net | 443 | tailscale_auth |
| Grafana | grafana.tail*.ts.net | 443 | tailscale_auth |
What’s NOT on the Tailnet
| Service | Access | Reason |
|---|---|---|
| ARC Controller | Cluster-internal only | No external API needed |
| Runner pods | Cluster-internal only | Ephemeral, no direct access |
| Attic cache | Cluster-internal + tailnet | Runners use cluster DNS, devs use tailnet |
| PostgreSQL | Cluster-internal only | CNPG manages access via mTLS |
| RustFS | Cluster-internal only | S3 API for Attic only |
MCP Server Access
The MCP server runs as a local stdio process on the operator’s machine. It calls the dashboard API over the tailnet:
Claude Code -> MCP Server (stdio, local)
-> HTTP to dashboard.tail*.ts.net/api/*
-> Bearer token = tsidp-issued JWT
-> Dashboard validates token, checks role
-> Returns envelope response
-> MCP Server formats for Claude
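The MCP-to-dashboard call chain above can be sketched in TypeScript. This is a minimal sketch under stated assumptions: the hostname is illustrative, `DASHBOARD_TOKEN` follows the env var named in this document, and the envelope field names (`ok`, `data`, `error`) are hypothetical.

```typescript
// Illustrative envelope shape; field names are assumptions.
interface Envelope<T> {
  ok: boolean;
  data?: T;
  error?: string;
}

// Call the dashboard API over the tailnet with the tsidp-issued
// JWT from the DASHBOARD_TOKEN env var.
async function callDashboard<T>(path: string): Promise<Envelope<T>> {
  const token = process.env.DASHBOARD_TOKEN;
  const res = await fetch(`https://dashboard.tail12345.ts.net/api${path}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) return { ok: false, error: `HTTP ${res.status}` };
  return (await res.json()) as Envelope<T>;
}

// Render an envelope as text for Claude.
export function formatForClaude(env: Envelope<unknown>): string {
  return env.ok ? JSON.stringify(env.data) : `Error: ${env.error}`;
}
```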
Multi-Org Runner Enrollment
Model
GloriousFlywheel supports runner sharing across multiple GitHub organizations and GitLab groups through a single platform instance.
Platform Instance (single cluster)
+-- Org A (GitHub)
| +-- gh-nix (shared)
| +-- gh-docker (shared)
| +-- gh-dind (shared)
|
+-- Org B (GitHub)
| +-- gh-docker (shared, same scale set)
| +-- org-b-gpu (dedicated, org-specific)
|
+-- Group C (GitLab)
+-- gl-docker (shared)
+-- gl-nix (shared)
Enrollment Types
| Type | Scope | Registration | Lifecycle |
|---|---|---|---|
| Shared | All enrolled orgs | Single GitHub App installation per org; all orgs route to the same scale set | Platform manages, orgs consume |
| Dedicated | Single org | Org-specific GitHub App or runner group | Org requests, platform provisions |
| Org-plus-user | Org + specific repos | GitHub App with restricted repo access | Org admin configures repo list |
Registration Flow
GitHub (ARC):
1. Org admin installs GloriousFlywheel GitHub App
2. App installation generates credentials
3. Platform engineer adds org to arc-runners stack:
- New GitHub App secret in cluster
- ARC scale set configured with org's app credentials
4. Runners appear in org's GitHub Actions runner list
5. Org's workflows use `runs-on: [self-hosted, nix]`
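Step 3 above can be sketched as an OpenTofu module call in the arc-runners stack. This is a hypothetical sketch: the module path, variable names, and secret reference are illustrative, not the platform's actual module interface.

```hcl
# Hypothetical org enrollment in the arc-runners stack.
# Module source and variable names are illustrative.
module "org_b_runners" {
  source = "../modules/arc-scale-set"

  github_org   = "org-b"
  runner_label = "nix"

  # GitHub App credentials from step 2, stored as a cluster secret.
  app_credentials_secret = var.org_b_github_app_secret
}
```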
GitLab:
1. Group admin creates group runner token
2. Platform engineer adds token to gitlab-runners stack
3. Runner registers with GitLab group
4. Group's pipelines pick up the runner via tags
Runner Isolation
| Isolation Level | Mechanism | Use Case |
|---|---|---|
| None (shared pool) | All orgs share same scale set pods | Trusted orgs, cost efficiency |
| Namespace | Separate Kubernetes namespace per org | Untrusted orgs, resource isolation |
| Node | Dedicated node pool per org | Strict isolation, compliance |
Default: shared pool with ephemeral pods. Each job gets a fresh pod that is destroyed after completion. No cross-job data leakage.
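The "Namespace" isolation level above can be sketched as a per-org namespace with a resource quota. Names and limits are illustrative; the real stack would generate these from org configuration.

```yaml
# Hypothetical per-org namespace isolation; names and limits
# are illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: runners-org-b
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: org-b-quota
  namespace: runners-org-b
spec:
  hard:
    limits.cpu: "32"
    limits.memory: 128Gi
```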
Enrollment Lifecycle
Enroll:
Org admin requests enrollment
-> Platform engineer adds org config to tofu stack
-> PR + review + merge + apply
-> Runner appears in org's forge
Monitor:
Dashboard shows enrollment status per org
-> /api/runners groups by forge
-> Metrics tracked per org via runner labels
Offboard:
Org admin requests removal
-> Platform engineer removes org config
-> PR + review + merge + apply
-> Runner deregisters from org's forge
-> Pods drain, secrets deleted
Ownership Matrix
| Decision | Owner | Process |
|---|---|---|
| Grant org enrollment | Org Admin | Request via issue, approved by platform team |
| Provision shared runner | Platform Engineer | PR to tofu/stacks/arc-runners/ |
| Provision dedicated runner | Platform Engineer | PR with new module call + namespace |
| Set resource quotas per org | Org Admin | PR to stack variables |
| Manage GitHub App installation | Org Admin (per org) | GitHub org settings |
| Rotate runner credentials | Platform Engineer | Scheduled or on-demand via runbook |
| Emergency pause (all orgs) | Operator | Dashboard or MCP pause_runner |