Replaced Headscale (too buggy in 0.28.x: random node drops) with direct WireGuard hub-and-spoke + full mesh.

7 Proxmox VMs across 3 hosts form a K3s v1.34.6 cluster: 3 control-plane/etcd nodes, 4 workers. Running services: postgres, mariadb, ghost (x3), forgejo, authentik. All unpinned services use local-path StorageClass. Databases pinned to pve-worker and adder-worker with local PVs.

Includes VM provisioning scripts (create-debian-template.sh, clone-vm.sh), K3s manifests for all services, and full deployment docs in k3s/README.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# K3s Session State
# Saved: 2026-04-06 (end of session 3)

## Current State

New Proxmox-based K3s cluster in progress; the VirtualBox cluster is being retired.
All 7 Proxmox VMs created and on WireGuard mesh. K3s not yet installed.
Old VirtualBox services (ghost, forgejo, postgres, mariadb) still running on old cluster until migration complete.

## Proxmox VMs

| Node | vmbr1 IP | WG IP | Proxmox Host | Role |
|---|---|---|---|---|
| pve-control | 10.10.10.151 | 10.0.0.6 | pve | k3s control plane |
| pve-worker | 10.10.10.126 | 10.0.0.7 | pve | k3s worker |
| adder-control | 10.10.10.185 | 10.0.0.8 | adder | k3s control plane |
| adder-worker | 10.10.10.83 | 10.0.0.9 | adder | k3s worker |
| game-control | 10.10.10.158 | 10.0.0.10 | game | k3s control plane |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game | k3s worker (local-lvm/HDD) |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game | k3s worker (game-ssd/NVMe) |

WG IPs 10.0.0.2–10.0.0.5 reserved (old VirtualBox nodes, do not reuse).
Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1

## VM Specs

| Node | vCPUs | RAM | Disk | Storage |
|---|---|---|---|---|
| pve-control | 2 | 2GB | 20G | local-lvm |
| pve-worker | 6 | 8GB | 100G | local-lvm |
| adder-control | 2 | 2GB | 20G | local-lvm |
| adder-worker | 6 | 8GB | 100G | local-lvm |
| game-control | 2 | 2GB | 20G | local-lvm |
| game-worker-hdd | 6 | 8GB | 200G | local-lvm (HDD) |
| game-worker-ssd | 10 | 8GB | 200G | game-ssd (NVMe) |

## Network Architecture

- All VMs on vmbr1 (10.10.10.0/24), DHCP
- WireGuard mesh via DO hub: all nodes have static WG IPs (10.0.0.0/24)
- Full mesh: all nodes have each other as explicit WireGuard peers (not just hub-and-spoke)
- K3s will use --flannel-iface=wg0 so all cluster traffic runs over WireGuard
- Caddy at DO hub proxies external traffic to any node's WG IP + NodePort
- Tailscale/Headscale abandoned: too unreliable for cluster networking

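A minimal sketch of how the full-mesh peer list for one node's `wg0.conf` could be generated from the WG IP table. The hub endpoint and the IPs come from these notes; the `<pubkey-of-...>` placeholders and the `PersistentKeepalive` value are assumptions, and `Endpoint` lines for the VM peers are left out because they are not recorded here.

```shell
#!/bin/sh
# name:wg_ip pairs copied from the Proxmox VMs table.
NODES="pve-control:10.0.0.6 pve-worker:10.0.0.7 adder-control:10.0.0.8 adder-worker:10.0.0.9 game-control:10.0.0.10 game-worker-hdd:10.0.0.11 game-worker-ssd:10.0.0.12"

# Print the [Peer] stanzas for one node: the hub plus every other VM.
# <pubkey-of-...> values and PersistentKeepalive are placeholders (assumptions).
mesh_peers() {
    self="$1"
    printf '[Peer]\n# hub\nPublicKey = <pubkey-of-hub>\n'
    printf 'Endpoint = 138.197.87.251:51820\nAllowedIPs = 10.0.0.1/32\n'
    printf 'PersistentKeepalive = 25\n\n'
    for entry in $NODES; do
        name="${entry%%:*}"
        ip="${entry##*:}"
        # A node never lists itself as a peer.
        if [ "$name" = "$self" ]; then
            continue
        fi
        printf '[Peer]\n# %s\nPublicKey = <pubkey-of-%s>\nAllowedIPs = %s/32\n\n' "$name" "$name" "$ip"
    done
}

# Peer section for pve-control: hub + the 6 other VMs.
mesh_peers pve-control
```

Each node ends up with 7 peers (hub plus 6 VMs), which is what distinguishes the full mesh from plain hub-and-spoke, where each VM would list only the hub.
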
## Proxmox Host Specs

- pve: workstation i9-13900KF, 96GB RAM
- adder: Proxmox node with RTX 2070, 4TB NVMe available
- game: Proxmox node with RTX 2070, 16GB RAM, 256GB NVMe (game-ssd) + 2TB HDD (local-lvm)

## VM Provisioning

### Template & Clone Scripts
Scripts at `~/private/Knowledge/repos/homelab/proxmox/scripts/`:
- `create-debian-template.sh <VMID> <NAME> [STORAGE] [BRIDGE]`
  - Defaults: STORAGE=local-lvm, BRIDGE=vmbr1
  - Bakes in: qemu-guest-agent, curl, wget, nano, rsync, htop, tmux, emacs-nox, nfs-common, tailscale
  - Zeroes /etc/machine-id, removes /etc/ssh/ssh_host_* (Cloud-Init regenerates on first boot)
  - Does NOT create .ssh or set keys; that is done post-boot via `qm set`
- `clone-vm.sh <TEMPLATE_VMID> <NEW_VMID> <NAME> [CORES] [MEMORY_MB] [DISK_SIZE] [STORAGE]`
  - Defaults: 2 cores, 2048MB RAM, 20G disk, local-lvm storage
  - Full clone, auto-starts the VM

### Post-Clone Formula (confirmed working)
1. Clone: `./clone-vm.sh <template> <vmid> <name> [cores] [mem] [disk] [storage]`
2. Get IP: `qm guest cmd <vmid> network-get-interfaces`
3. Set SSH key: `qm set <vmid> --sshkeys <pubkey-file>`
4. Reboot VM: `qm reboot <vmid>`
5. SSH in: `ssh samantha@<ip>`
6. Configure WireGuard on the VM

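Steps 1 to 4 of the formula could be wrapped as below. This is a sketch, not one of the repo scripts: `run`, `post_clone`, the `DRY_RUN` switch, VMID 101 and the public key path are all made up for illustration. Only the `clone-vm.sh` argument order and the `qm` subcommands come from these notes.

```shell
#!/bin/sh
# Run a command, or with DRY_RUN=1 just print it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# post_clone <template> <vmid> <name> <pubkey-file> [cores] [mem] [disk] [storage]
# Extra arguments are passed through to clone-vm.sh unchanged.
post_clone() {
    template="$1"; vmid="$2"; name="$3"; pubkey="$4"
    shift 4
    run ./clone-vm.sh "$template" "$vmid" "$name" "$@"   # step 1
    run qm guest cmd "$vmid" network-get-interfaces      # step 2
    run qm set "$vmid" --sshkeys "$pubkey"               # step 3
    run qm reboot "$vmid"                                # step 4
}

# Dry run for pve-worker (specs from the VM Specs table, template 199
# from the VMID convention; VMID 101 and the key path are hypothetical).
DRY_RUN=1
post_clone 199 101 pve-worker "$HOME/.ssh/id_ed25519.pub" 6 8192 100G local-lvm
```

In a real run, step 2 only answers once qemu-guest-agent is up inside the VM, so a short wait or retry after the clone may be needed. Steps 5 and 6 stay manual.
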
### VMID Convention
- pve: 100-199 (templates at 199)
- adder: 200-299 (templates at 299; a template currently exists at 200, destroy after use)
- game: 300-399 (templates at 399; a template currently exists at 300, destroy after use)

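The convention is simple enough to check mechanically before cloning. A small sketch, with hypothetical helper names `vmid_host` and `is_template_slot`:

```shell
#!/bin/sh
# Map a VMID to the Proxmox host that owns its range.
vmid_host() {
    if [ "$1" -ge 100 ] && [ "$1" -le 199 ]; then
        echo pve
    elif [ "$1" -ge 200 ] && [ "$1" -le 299 ]; then
        echo adder
    elif [ "$1" -ge 300 ] && [ "$1" -le 399 ]; then
        echo game
    else
        echo unassigned
        return 1
    fi
}

# True when the VMID is the template slot of its range (199, 299, 399).
is_template_slot() {
    [ $(( $1 % 100 )) -eq 99 ] && vmid_host "$1" >/dev/null
}

vmid_host 200   # prints: adder
```

By this check the existing templates at 200 and 300 sit outside their intended slots, which matches the note that they are temporary.
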
### Useful Proxmox CLI
- `qm guest cmd <VMID> network-get-interfaces`: get VM IP
- `qm set <VMID> --vga std --delete serial0`: fix serial console
- `qm destroy <VMID> --purge`: remove VM
- `qm list`: list all VMs
- `vgs`: check local-lvm free space
- `pvesh get /nodes/<nodename>/status`: CPU/memory usage

## Immediate Next Steps
1. Install K3s on pve-control first (--cluster-init)
2. Join adder-control and game-control as control plane peers
3. Join all 4 workers
4. Label workers and GPU nodes
5. Create namespaces: sjasoft, fulfillment, privacy-practice
6. Migrate services from old VirtualBox cluster

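Steps 4 and 5 might look roughly like this once the cluster is up. The sketch only prints the `kubectl` commands so they can be reviewed first. It assumes K3s node names match the VM names; the `storage=` label key and its values are hypothetical, and GPU labels are omitted because these notes do not say which VMs get a GPU.

```shell
#!/bin/sh
# Print the kubectl commands for labelling workers and creating namespaces.
bootstrap_cmds() {
    # Step 4: mark the four workers with the conventional worker role label.
    for node in pve-worker adder-worker game-worker-hdd game-worker-ssd; do
        printf 'kubectl label node %s node-role.kubernetes.io/worker=worker\n' "$node"
    done
    # Hypothetical storage labels for the two game workers.
    printf 'kubectl label node game-worker-hdd storage=hdd\n'
    printf 'kubectl label node game-worker-ssd storage=nvme\n'
    # Step 5: namespaces named in these notes.
    for ns in sjasoft fulfillment privacy-practice; do
        printf 'kubectl create namespace %s\n' "$ns"
    done
}

bootstrap_cmds
```

Piping the output to `sh` on a control-plane node would apply it.
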
## K3s Install (see k3s/README.md for full commands)

- Control plane uses --cluster-init on first node, --server on subsequent nodes
- All nodes use --flannel-iface=wg0 and --node-ip=<wg-ip>
- Traefik disabled on all nodes
- 3 control plane nodes for HA etcd (tolerates 1 failure)

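The authoritative commands live in k3s/README.md, which is not reproduced here. As a rough sketch under that caveat, the flags listed above combine with the standard `get.k3s.io` installer as follows. `k3s_install_cmd` and `etcd_tolerated_failures` are hypothetical helpers, and passing the join secret via `K3S_TOKEN` is an assumption. `--disable` is a server-only flag, so it appears on the control-plane nodes only.

```shell
#!/bin/sh
# Print the installer command line for one node.
#   $1 role:  init (first server), server (joining server), agent (worker)
#   $2 wg ip: this node's WireGuard address
#   $3 join:  WG IP of an existing server (unused for init)
k3s_install_cmd() {
    role="$1"; wg_ip="$2"; join_ip="${3:-}"
    common="--flannel-iface=wg0 --node-ip=$wg_ip"
    case "$role" in
        init)   args="server --cluster-init --disable=traefik $common" ;;
        server) args="server --server https://$join_ip:6443 --disable=traefik $common" ;;
        agent)  args="agent --server https://$join_ip:6443 $common" ;;
        *)      echo "unknown role: $role" >&2; return 1 ;;
    esac
    # $K3S_TOKEN is printed literally; set it in the shell that runs the command.
    printf 'curl -sfL https://get.k3s.io | K3S_TOKEN="$K3S_TOKEN" sh -s - %s\n' "$args"
}

# etcd needs floor(n/2)+1 members for quorum; the rest may fail.
etcd_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

k3s_install_cmd init 10.0.0.6              # pve-control
k3s_install_cmd server 10.0.0.8 10.0.0.6   # adder-control
k3s_install_cmd agent 10.0.0.7 10.0.0.6    # pve-worker
etcd_tolerated_failures 3                  # prints 1
```

The quorum arithmetic is why 3 control-plane nodes tolerate 1 failure, and why a fourth would not raise that number.
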
## Running Services (old VirtualBox cluster, not yet migrated)

- postgres:16 (ClusterIP:5432)
- mariadb:11 (ClusterIP:3306)
- ghost1/2/3 (NodePorts 32368/32369/32370)
- forgejo:9 (NodePort 32371, git.sjasoft.com)

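For the Caddy hop on the DO hub, a site block for forgejo could be generated like this. The hostname and NodePort are the ones listed above, on the assumption that they carry over to the new cluster; the choice of pve-worker and adder-worker as upstreams is arbitrary, since a NodePort answers on any node. `caddy_site` is a hypothetical helper.

```shell
#!/bin/sh
# Print a Caddyfile site block proxying HOST to PORT on each given WG IP.
# caddy_site <host> <nodeport> <wg-ip>...
caddy_site() {
    host="$1"; port="$2"
    shift 2
    upstreams=""
    for ip in "$@"; do
        upstreams="$upstreams $ip:$port"
    done
    printf '%s {\n\treverse_proxy%s\n}\n' "$host" "$upstreams"
}

# forgejo via pve-worker (10.0.0.7) and adder-worker (10.0.0.9).
caddy_site git.sjasoft.com 32371 10.0.0.7 10.0.0.9
```

Listing more than one upstream lets Caddy keep serving if one node's tunnel is down.
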
## NodePort Registry

| Port | Service | Namespace |
|---|---|---|
| 32368 | ghost1 | fulfillment |
| 32369 | ghost2 | fulfillment |
| 32370 | ghost3 | fulfillment |
| 32371 | forgejo | sjasoft |

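A registry like this is easy to sanity-check before adding a service. The sketch below copies the rows above and verifies that no port repeats and that each falls in 30000-32767, the default Kubernetes NodePort range. `check_registry` and `next_port` are hypothetical helpers.

```shell
#!/bin/sh
# port:service:namespace rows, copied from the registry table.
REGISTRY="32368:ghost1:fulfillment 32369:ghost2:fulfillment 32370:ghost3:fulfillment 32371:forgejo:sjasoft"

# Fail if a port is listed twice or is outside the default NodePort range.
check_registry() {
    seen=" "
    for row in $1; do
        port="${row%%:*}"
        if [ "$port" -lt 30000 ] || [ "$port" -gt 32767 ]; then
            echo "out of range: $row" >&2
            return 1
        fi
        case "$seen" in
            *" $port "*) echo "duplicate port: $row" >&2; return 1 ;;
        esac
        seen="$seen$port "
    done
}

# Suggest the port after the highest one registered.
next_port() {
    max=0
    for row in $1; do
        port="${row%%:*}"
        if [ "$port" -gt "$max" ]; then
            max="$port"
        fi
    done
    echo $(( max + 1 ))
}

check_registry "$REGISTRY" && next_port "$REGISTRY"   # prints 32372
```
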
## Manifests

All in Knowledge/repos/homelab/k3s/ (paths below are relative to the homelab repo root):
- k3s/postgres/postgres.yaml
- k3s/mariadb/mariadb.yaml
- k3s/ghost/ghost.yaml
- k3s/forgejo/forgejo.yaml
- k3s/README.md (authoritative WG mesh table + K3s install commands)

## Remaining Services to Port (from Proxmox Docker stack)
- authentik.yml: SSO (postgres)
- n8n.yml: automation (postgres)
- vaultwarden.yml: passwords
- nats.yml: messaging
- monerod.yml: monero node
- snikket.yml: XMPP
- synapse.yml: Matrix

## Known Issues / Notes
- Tailscale/Headscale abandoned: unreliable, randomly drops nodes, requires manual reconnect
- WireGuard full mesh is the correct approach for K3s cluster networking
- kubectl requires KUBECONFIG=~/.kube/config in ~/.bashrc on control nodes
- Cross-namespace secrets not supported (a pod can only reference Secrets in its own namespace): keep secrets in same namespace as consumer
- game node only has 16GB RAM: allocate worker VMs conservatively
- game-ssd is only 256GB NVMe: keep disk allocations conservative on game-worker-ssd
- Templates should be destroyed after all clones are complete on each node

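The two capacity notes for the game host can be checked against the VM Specs table with a little arithmetic. A sketch, with `sum_field` as a hypothetical helper; the rows are copied from the table:

```shell
#!/bin/sh
# name:ram_gb:disk_g:storage rows for the VMs on the game host.
GAME_VMS="game-control:2:20:local-lvm game-worker-hdd:8:200:local-lvm game-worker-ssd:8:200:game-ssd"

# Print the total of field N (2 = RAM, 3 = disk), optionally
# restricted to VMs on one storage.
sum_field() {
    field="$1"; storage="${2:-}"
    total=0
    for row in $GAME_VMS; do
        if [ -n "$storage" ] && [ "${row##*:}" != "$storage" ]; then
            continue
        fi
        value="$(printf '%s' "$row" | cut -d: -f"$field")"
        total=$(( total + value ))
    done
    echo "$total"
}

echo "RAM configured on game: $(sum_field 2) GB of 16 GB"
echo "Disk on game-ssd: $(sum_field 3 game-ssd) G of 256 GB"
```

As specified, the three game VMs are configured with 18 GB against 16 GB of host RAM, and game-worker-ssd takes 200G of the 256GB NVMe. Whether the RAM overcommit is acceptable (for example with ballooning) is not stated in these notes.
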