homelab/K3s-SESSION-STATE.md
Samantha Atkins 759ef949bc K3s cluster on Proxmox with WireGuard mesh networking
Replaced Headscale (too buggy in 0.28.x — random node drops) with direct
WireGuard hub-and-spoke + full mesh. 7 Proxmox VMs across 3 hosts form a
K3s v1.34.6 cluster: 3 control-plane/etcd nodes, 4 workers.

Running services: postgres, mariadb, ghost (x3), forgejo, authentik.
All unpinned services use local-path StorageClass. Databases pinned to
pve-worker and adder-worker with local PVs.

Includes VM provisioning scripts (create-debian-template.sh, clone-vm.sh),
K3s manifests for all services, and full deployment docs in k3s/README.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 01:23:13 -04:00


# K3s Session State
# Saved: 2026-04-06 (end of session 3)
## Current State
New Proxmox-based K3s cluster in progress; the VirtualBox cluster is being retired.
All 7 Proxmox VMs are created and on the WireGuard mesh. K3s is not yet installed.
Old VirtualBox services (ghost, forgejo, postgres, mariadb) keep running on the old cluster until migration is complete.
## Proxmox VMs
| Node | vmbr1 IP | WG IP | Proxmox Host | Role |
|---|---|---|---|---|
| pve-control | 10.10.10.151 | 10.0.0.6 | pve | k3s control plane |
| pve-worker | 10.10.10.126 | 10.0.0.7 | pve | k3s worker |
| adder-control | 10.10.10.185 | 10.0.0.8 | adder | k3s control plane |
| adder-worker | 10.10.10.83 | 10.0.0.9 | adder | k3s worker |
| game-control | 10.10.10.158 | 10.0.0.10 | game | k3s control plane |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game | k3s worker (local-lvm/HDD) |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game | k3s worker (game-ssd/NVMe) |

WG IPs 10.0.0.2 through 10.0.0.5 are reserved (old VirtualBox nodes, do not reuse).
Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1
## VM Specs
| Node | vCPUs | RAM | Disk | Storage |
|---|---|---|---|---|
| pve-control | 2 | 2GB | 20G | local-lvm |
| pve-worker | 6 | 8GB | 100G | local-lvm |
| adder-control | 2 | 2GB | 20G | local-lvm |
| adder-worker | 6 | 8GB | 100G | local-lvm |
| game-control | 2 | 2GB | 20G | local-lvm |
| game-worker-hdd | 6 | 8GB | 200G | local-lvm (HDD) |
| game-worker-ssd | 10 | 8GB | 200G | game-ssd (NVMe) |
## Network Architecture
- All VMs on vmbr1 (10.10.10.0/24), DHCP
- WireGuard mesh via DO hub — all nodes have static WG IPs (10.0.0.0/24)
- Full mesh: all nodes have each other as explicit WireGuard peers (not just hub-and-spoke)
- K3s will use --flannel-iface=wg0 so all cluster traffic runs over WireGuard
- Caddy at DO hub proxies external traffic to any node's WG IP + NodePort
- Tailscale/Headscale abandoned — too unreliable for cluster networking
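
The full-mesh bullet above implies that each node's `wg0.conf` lists the hub plus the six other nodes as peers. A minimal sketch of generating those `[Peer]` stanzas follows. The node names, WG IPs, and hub endpoint come from the Proxmox VMs section; the public keys are placeholders, the `/32` AllowedIPs and the keepalive value are assumptions, and `Endpoint` lines for node-to-node peers are left out because these notes do not say which address each peer dials. The table in k3s/README.md remains authoritative.

```shell
#!/bin/sh
# Sketch: print the [Peer] stanzas for one node's wg0.conf in a full mesh.
# Keys are placeholders; AllowedIPs /32 and PersistentKeepalive are assumptions.

MESH_NODES="pve-control:10.0.0.6
pve-worker:10.0.0.7
adder-control:10.0.0.8
adder-worker:10.0.0.9
game-control:10.0.0.10
game-worker-hdd:10.0.0.11
game-worker-ssd:10.0.0.12"

HUB_ENDPOINT="138.197.87.251:51820"
HUB_WG_IP="10.0.0.1"

# mesh_peers SELF: emit the hub peer, then one peer per node other than SELF.
mesh_peers() (
    self="$1"
    printf '[Peer]\n# hub (DO droplet)\nPublicKey = <hub-public-key>\n'
    printf 'Endpoint = %s\nAllowedIPs = %s/32\nPersistentKeepalive = 25\n' \
        "$HUB_ENDPOINT" "$HUB_WG_IP"
    echo "$MESH_NODES" | while IFS=: read -r name ip; do
        # A node never lists itself as a peer.
        [ "$name" = "$self" ] && continue
        printf '\n[Peer]\n# %s\nPublicKey = <%s-public-key>\nAllowedIPs = %s/32\n' \
            "$name" "$name" "$ip"
    done
)

mesh_peers pve-control
```

Keeping the node list in one place means adding an eighth VM is a one-line change, after which every node's peer list can be regenerated.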
## Proxmox Host Specs
- pve: workstation i9-13900KF, 96GB RAM
- adder: Proxmox node with RTX 2070, 4TB NVMe available
- game: Proxmox node with RTX 2070, 16GB RAM, 256GB NVMe (game-ssd) + 2TB HDD (local-lvm)
## VM Provisioning
### Template & Clone Scripts
Scripts at `~/private/Knowledge/repos/homelab/proxmox/scripts/`:
- `create-debian-template.sh <VMID> <NAME> [STORAGE] [BRIDGE]`
  - Defaults: STORAGE=local-lvm, BRIDGE=vmbr1
  - Bakes in: qemu-guest-agent, curl, wget, nano, rsync, htop, tmux, emacs-nox, nfs-common, tailscale
  - Zeroes /etc/machine-id, removes /etc/ssh/ssh_host_* (Cloud-Init regenerates on first boot)
  - Does NOT create .ssh or set keys; that is done post-boot via `qm set`
- `clone-vm.sh <TEMPLATE_VMID> <NEW_VMID> <NAME> [CORES] [MEMORY_MB] [DISK_SIZE] [STORAGE]`
  - Defaults: 2 cores, 2048MB RAM, 20G disk, local-lvm storage
  - Full clone, auto-starts the VM
### Post-Clone Formula (confirmed working)
1. Clone: `./clone-vm.sh <template> <vmid> <name> [cores] [mem] [disk] [storage]`
2. Get IP: `qm guest cmd <vmid> network-get-interfaces`
3. Set SSH key: `qm set <vmid> --sshkeys <pubkey-file>`
4. Reboot VM: `qm reboot <vmid>`
5. SSH in: `ssh samantha@<ip>`
6. Configure WireGuard on the VM
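
The formula above can be printed as a per-VM checklist before anything is run on the Proxmox host. This is a dry-run sketch only: the function name, the VMID 101, and the key path in the example call are hypothetical, while the commands themselves are the ones listed in the steps. The sizing in the example matches the pve-worker row of the VM Specs table.

```shell
#!/bin/sh
# Sketch: print the post-clone steps for one VM as a list of commands.
# Nothing is executed here; VMID 101 and the key path are hypothetical.

# post_clone_plan TEMPLATE VMID NAME PUBKEY [CORES MEM_MB DISK STORAGE]
post_clone_plan() (
    template="$1"; vmid="$2"; name="$3"; pubkey="$4"
    shift 4
    # Remaining arguments are passed through to clone-vm.sh unchanged.
    echo "./clone-vm.sh $template $vmid $name $*"
    echo "qm guest cmd $vmid network-get-interfaces"
    echo "qm set $vmid --sshkeys $pubkey"
    echo "qm reboot $vmid"
    echo "# then: ssh samantha@<ip> and configure WireGuard"
)

post_clone_plan 199 101 pve-worker "$HOME/.ssh/id_ed25519.pub" 6 8192 100G local-lvm
```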
### VMID Convention
- pve: 100-199 (templates at 199)
- adder: 200-299 (templates at 299; a template currently exists at 200, destroy after use)
- game: 300-399 (templates at 399; a template currently exists at 300, destroy after use)
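
As a sanity check before cloning, the convention above can be expressed as a small lookup. This is a sketch; the function name is hypothetical and it encodes only the three ranges listed here.

```shell
#!/bin/sh
# Sketch: map a VMID to its Proxmox host using the ranges in this section.

# host_for_vmid VMID: print the host name, or fail for an out-of-range VMID.
host_for_vmid() {
    case "$1" in
        1[0-9][0-9]) echo pve ;;
        2[0-9][0-9]) echo adder ;;
        3[0-9][0-9]) echo game ;;
        *) echo "VMID $1 is outside 100-399" >&2; return 1 ;;
    esac
}

host_for_vmid 199
host_for_vmid 250
```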
### Useful Proxmox CLI
- `qm guest cmd <VMID> network-get-interfaces` — get VM IP
- `qm set <VMID> --vga std --delete serial0` — fix serial console
- `qm destroy <VMID> --purge` — remove VM
- `qm list` — list all VMs
- `vgs` — check local-lvm free space
- `pvesh get /nodes/<nodename>/status` — CPU/memory usage
## Immediate Next Steps
1. Install K3s on pve-control first (--cluster-init)
2. Join adder-control and game-control as control plane peers
3. Join all 4 workers
4. Label workers and GPU nodes
5. Create namespaces: sjasoft, fulfillment, privacy-practice
6. Migrate services from old VirtualBox cluster
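
Steps 4 and 5 reduce to a handful of `kubectl` commands. The sketch below only prints them. The namespace names are the ones listed in step 5; the `disk` label key and its values are hypothetical, chosen here to mirror the storage column of the VM Specs table, and the real labeling scheme may differ.

```shell
#!/bin/sh
# Sketch: print the kubectl commands for labeling nodes and creating namespaces.
# The "disk" label key and its values are hypothetical.

NAMESPACES="sjasoft fulfillment privacy-practice"

# namespace_cmds: one create command per namespace.
namespace_cmds() {
    for ns in $NAMESPACES; do
        echo "kubectl create namespace $ns"
    done
}

# label_cmd NODE KEY VALUE: print one node-label command.
label_cmd() {
    echo "kubectl label node $1 $2=$3"
}

namespace_cmds
label_cmd game-worker-hdd disk hdd
label_cmd game-worker-ssd disk nvme
```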
## K3s Install — see k3s/README.md for full commands
- Control plane uses --cluster-init on first node, --server on subsequent nodes
- All nodes use --flannel-iface=wg0 and --node-ip=<wg-ip>
- Traefik disabled on all nodes
- 3 control plane nodes for HA etcd (tolerates 1 failure)
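
For orientation, the bullets above translate roughly into the install lines printed by the sketch below. It assumes the standard `get.k3s.io` installer and leaves `<token>` as a placeholder (on a default install the join token is read from `/var/lib/rancher/k3s/server/node-token` on the first server). `--disable=traefik` is a server option, so the agent line omits it. The function name is hypothetical, and k3s/README.md remains the authoritative source for the exact commands.

```shell
#!/bin/sh
# Sketch: print a K3s install command for a given role and WG IP.
# Assumes the get.k3s.io installer; <token> is a placeholder.

# k3s_install_cmd ROLE WG_IP [FIRST_SERVER_WG_IP]
#   ROLE is init (first server), server (joining server), or agent (worker).
k3s_install_cmd() (
    role="$1"; wg_ip="$2"; first="${3:-}"
    common="--flannel-iface=wg0 --node-ip=$wg_ip"
    case "$role" in
        init)
            echo "curl -sfL https://get.k3s.io | sh -s - server --cluster-init --disable=traefik $common" ;;
        server)
            echo "curl -sfL https://get.k3s.io | K3S_TOKEN=<token> sh -s - server --server https://$first:6443 --disable=traefik $common" ;;
        agent)
            echo "curl -sfL https://get.k3s.io | K3S_URL=https://$first:6443 K3S_TOKEN=<token> sh -s - agent $common" ;;
        *)
            echo "usage: k3s_install_cmd init|server|agent WG_IP [FIRST_SERVER_WG_IP]" >&2
            exit 1 ;;
    esac
)

k3s_install_cmd init 10.0.0.6               # pve-control
k3s_install_cmd server 10.0.0.8 10.0.0.6    # adder-control
k3s_install_cmd agent 10.0.0.7 10.0.0.6     # pve-worker
```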
## Running Services (old VirtualBox cluster — not yet migrated)
- postgres:16 — ClusterIP:5432
- mariadb:11 — ClusterIP:3306
- ghost1/2/3 — NodePorts 32368/32369/32370
- forgejo:9 — NodePort 32371, git.sjasoft.com
## NodePort Registry
| Port | Service | Namespace |
|---|---|---|
| 32368 | ghost1 | fulfillment |
| 32369 | ghost2 | fulfillment |
| 32370 | ghost3 | fulfillment |
| 32371 | forgejo | sjasoft |
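
When a new service needs a NodePort, the registry above can be checked mechanically. The sketch below copies the four rows into a variable, reports duplicates, and proposes the next port after the highest one in use. The function names are hypothetical, and 32767 is the Kubernetes default upper bound for NodePorts, which is assumed to be unchanged on this cluster.

```shell
#!/bin/sh
# Sketch: find duplicate NodePorts and the next free one in the registry.
# Assumes the default NodePort range, which ends at 32767.

REGISTRY="32368 ghost1 fulfillment
32369 ghost2 fulfillment
32370 ghost3 fulfillment
32371 forgejo sjasoft"

# duplicate_ports: print any port that appears more than once.
duplicate_ports() {
    echo "$REGISTRY" | awk '{ print $1 }' | sort | uniq -d
}

# next_free_nodeport: print the highest registered port plus one.
next_free_nodeport() (
    port=$(echo "$REGISTRY" | awk '{ if ($1 + 0 > max) max = $1 + 0 } END { print max + 1 }')
    [ "$port" -le 32767 ] || { echo "NodePort range exhausted" >&2; exit 1; }
    echo "$port"
)

duplicate_ports
next_free_nodeport
```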
## Manifests
All under Knowledge/repos/homelab/ (paths below are relative to it):
- k3s/postgres/postgres.yaml
- k3s/mariadb/mariadb.yaml
- k3s/ghost/ghost.yaml
- k3s/forgejo/forgejo.yaml
- k3s/README.md (authoritative WG mesh table + K3s install commands)
## Remaining Services to Port (from Proxmox Docker stack)
- authentik.yml — SSO (postgres)
- n8n.yml — automation (postgres)
- vaultwarden.yml — passwords
- nats.yml — messaging
- monerod.yml — monero node
- snikket.yml — XMPP
- synapse.yml — Matrix
## Known Issues / Notes
- Tailscale/Headscale abandoned — unreliable, randomly drops nodes, requires manual reconnect
- WireGuard full mesh is the correct approach for K3s cluster networking
- kubectl on control nodes needs `export KUBECONFIG=~/.kube/config` in ~/.bashrc
- Secrets are namespace-scoped: a pod cannot reference a Secret in another namespace, so keep each Secret in the same namespace as its consumer
- game node only has 16GB RAM — allocate worker VMs conservatively
- game-ssd is only 256GB NVMe — keep disk allocations conservative on game-worker-ssd
- Templates should be destroyed after all clones are complete on each node
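
The RAM note for game can be checked against the VM Specs table. The sketch below sums the RAM assigned to VMs per host; the data rows are copied from that table and the function name is hypothetical. For game it gives 2 + 8 + 8 = 18GB against 16GB of physical RAM, which suggests the current allocation already relies on overcommit, assuming the table is current and the VMs use their full assignment.

```shell
#!/bin/sh
# Sketch: sum VM RAM (GB) per Proxmox host from the VM Specs table.
# Columns: node, host, RAM in GB.

VM_RAM="pve-control pve 2
pve-worker pve 8
adder-control adder 2
adder-worker adder 8
game-control game 2
game-worker-hdd game 8
game-worker-ssd game 8"

# vm_ram_total HOST: print the total RAM in GB assigned to VMs on HOST.
vm_ram_total() {
    echo "$VM_RAM" | awk -v h="$1" '$2 == h { sum += $3 } END { print sum + 0 }'
}

echo "game: $(vm_ram_total game)GB assigned to VMs, 16GB physical"
```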