K3s Session State
Saved: 2026-04-14
Current State
K3s v1.34.6 cluster fully operational on Proxmox VMs + KVM worker over WireGuard mesh. fat_mama migrated from VirtualBox to KVM/libvirt on workstation 2026-04-14. All Proxmox K3s VMs have onboot: 1 set (fixed 2026-04-12).
Proxmox VMs
| Node | vmbr1 / LAN IP | WG IP | Host | Role |
|---|---|---|---|---|
| pve-control | 10.10.10.151 | 10.0.0.6 | pve | k3s control plane |
| pve-worker | 10.10.10.126 | 10.0.0.7 | pve | k3s worker |
| adder-control | 10.10.10.185 | 10.0.0.8 | adder | k3s control plane |
| adder-worker | 10.10.10.83 | 10.0.0.9 | adder | k3s worker |
| game-control | 10.10.10.158 | 10.0.0.10 | game | k3s control plane |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game | k3s worker (local-lvm/HDD) |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game | k3s worker (game-ssd/NVMe) |
| fat_mama | 192.168.40.220 | 10.0.0.13 | workstation (KVM/libvirt, macvtap enp4s0) | k3s worker |
WG IPs 10.0.0.2–10.0.0.5 reserved (old VirtualBox nodes, do not reuse). Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1
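When a new node is added, the next WireGuard address has to skip the hub, the reserved block, and every node already in the table. A minimal sketch of that lookup follows; the function name `next_wg_ip` and the `USED_OCTETS` list are illustrative and not part of the homelab repo.

```shell
#!/bin/sh
# Last octets already taken in 10.0.0.0/24, copied from the notes:
# .1 is the hub, .2-.5 are reserved, .6-.13 are the current nodes.
USED_OCTETS="1 2 3 4 5 6 7 8 9 10 11 12 13"

# next_wg_ip: print the lowest 10.0.0.x address whose octet is not in USED_OCTETS.
next_wg_ip() {
    octet=2
    while [ "$octet" -le 254 ]; do
        case " $USED_OCTETS " in
            *" $octet "*) ;;                          # taken or reserved, skip it
            *) echo "10.0.0.$octet"; return 0 ;;
        esac
        octet=$((octet + 1))
    done
    return 1   # no free address left in the /24
}

next_wg_ip
```

With the table as it stands, this prints 10.0.0.14.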
VM Specs
| Node | vCPUs | RAM | Disk | Storage |
|---|---|---|---|---|
| pve-control | 2 | 2GB | 20G | local-lvm |
| pve-worker | 6 | 8GB | 100G | local-lvm |
| adder-control | 2 | 2GB | 20G | local-lvm |
| adder-worker | 6 | 8GB | 100G | local-lvm |
| game-control | 2 | 2GB | 20G | local-lvm |
| game-worker-hdd | 6 | 8GB | 200G | local-lvm (HDD) |
| game-worker-ssd | 10 | 8GB | 200G | game-ssd (NVMe) |
| fat_mama | 12 | 20GB | 200G | /var/lib/libvirt/images (qcow2) |
Network Architecture
- Proxmox VMs on vmbr1 (10.10.10.0/24), DHCP
- fat_mama on LAN (192.168.40.0/24) via macvtap on enp4s0 — workstation host cannot directly ping/SSH to it; reachable from rest of LAN and via WireGuard at 10.0.0.13
- WireGuard mesh via DO hub — all nodes have static WG IPs (10.0.0.0/24)
- Full mesh: all nodes have each other as explicit WireGuard peers (not just hub-and-spoke)
- K3s uses --flannel-iface=wg0 so all cluster traffic runs over WireGuard
- Caddy at DO hub proxies external traffic to any node's WG IP + NodePort
- Tailscale/Headscale abandoned — too unreliable for cluster networking
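The full-mesh layout described above means each node's wg0 config carries one [Peer] stanza for the hub plus one for every other node. The sketch below prints such stanzas as a template. Public keys are placeholders, PersistentKeepalive = 25 is an assumed value, and Endpoint is emitted only for the hub because the notes do not record endpoints for the other peers.

```shell
#!/bin/sh
# Node name and WG IP pairs, taken from the node table in these notes.
NODES="pve-control:10.0.0.6 pve-worker:10.0.0.7 adder-control:10.0.0.8 adder-worker:10.0.0.9 game-control:10.0.0.10 game-worker-hdd:10.0.0.11 game-worker-ssd:10.0.0.12 fat_mama:10.0.0.13"

# wg_peers SELF: print [Peer] stanzas for the hub and every node except SELF.
wg_peers() {
    self=$1
    # Hub endpoint and WG IP come from the notes; the key is a placeholder
    # and the keepalive interval is an assumption.
    printf '[Peer]\n# hub\nPublicKey = <hub-public-key>\n'
    printf 'Endpoint = 138.197.87.251:51820\nAllowedIPs = 10.0.0.1/32\n'
    printf 'PersistentKeepalive = 25\n\n'
    for entry in $NODES; do
        name=${entry%%:*}
        ip=${entry##*:}
        if [ "$name" != "$self" ]; then
            printf '[Peer]\n# %s\nPublicKey = <%s-public-key>\nAllowedIPs = %s/32\n\n' \
                "$name" "$name" "$ip"
        fi
    done
}

wg_peers pve-control
```

The output is a starting point to paste below the node's own [Interface] section; the placeholders must be replaced with real keys before wg-quick will accept the file.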
Proxmox Host Specs
- pve: Meerkat NUC, 64GB RAM, 4TB NVMe
- adder: Adder WS laptop, 32GB RAM, 2TB NVMe, RTX 2070
- game: old gaming PC, 16GB RAM, 256GB NVMe (game-ssd) + 2TB HDD (local-lvm)
- workstation: i9-13900KF, 96GB RAM, RTX 4090, Fedora (runs fat_mama via KVM/libvirt)
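The host RAM figures can be cross-checked against the VM Specs table by summing configured VM memory per host. The rows in this sketch are transcribed by hand from the two lists; the function name `allocated_ram` is illustrative.

```shell
#!/bin/sh
# allocated_ram: sum configured VM RAM (GB) per host and compare it with
# the host's physical RAM. Input rows are "host vm_ram_gb host_ram_gb".
allocated_ram() {
    awk '
        { alloc[$1] += $2; phys[$1] = $3 }
        END {
            for (h in alloc)
                printf "%s %d/%dGB%s\n", h, alloc[h], phys[h],
                       (alloc[h] > phys[h] ? " OVERCOMMITTED" : "")
        }
    ' <<'EOF' | sort
pve 2 64
pve 8 64
adder 2 32
adder 8 32
game 2 16
game 8 16
game 8 16
workstation 20 96
EOF
}

allocated_ram
```

The three VMs on game add up to 18GB of configured RAM against 16GB physical, so that host is the only one flagged. Whether this matters in practice depends on actual guest usage, which the notes do not record.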
VM Provisioning
Template & Clone Scripts
Scripts at ~/private/Knowledge/repos/homelab/proxmox/scripts/:
- create-debian-template.sh <VMID> <n> [STORAGE] [BRIDGE]
  - Defaults: STORAGE=local-lvm, BRIDGE=vmbr1
  - Bakes in: qemu-guest-agent, curl, wget, nano, rsync, htop, tmux, emacs-nox, nfs-common, tailscale
  - Zeroes /etc/machine-id, removes /etc/ssh/ssh_host_* (Cloud-Init regenerates on first boot)
  - Does NOT create .ssh or set keys; that is done post-boot via qm set
- clone-vm.sh <TEMPLATE_VMID> <NEW_VMID> <n> [CORES] [MEMORY_MB] [DISK_SIZE] [STORAGE]
  - Defaults: 2 cores, 2048MB RAM, 20G disk, local-lvm storage
  - Full clone, auto-starts the VM
Post-Clone Formula (confirmed working)
- Clone: ./clone-vm.sh <template> <vmid> <n> [cores] [mem] [disk] [storage]
- Get IP: qm guest cmd <vmid> network-get-interfaces
- Set SSH key: qm set <vmid> --sshkeys <pubkey-file>
- Reboot VM: qm reboot <vmid>
- SSH in: ssh samantha@<ip>
- Configure WireGuard on the VM
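The post-clone steps can be written out for one concrete VM before anything is run. The sketch below only prints the commands, in order. VMID 101 and the key path are hypothetical; the template VMID and the sizing follow the VMID convention and the pve-worker row of the specs table.

```shell
#!/bin/sh
# post_clone_plan TEMPLATE VMID NAME PUBKEY_FILE [clone-vm.sh sizing args...]
# Prints the post-clone commands in order; it does not execute them.
post_clone_plan() {
    template=$1
    vmid=$2
    name=$3
    pubkey=$4
    shift 4
    echo "./clone-vm.sh $template $vmid $name $*"
    echo "qm guest cmd $vmid network-get-interfaces"
    echo "qm set $vmid --sshkeys $pubkey"
    echo "qm reboot $vmid"
    echo "ssh samantha@<ip>"   # <ip> comes from the guest-agent output above
}

# Hypothetical example: VMID 101 and the key path are illustrative only.
post_clone_plan 199 101 pve-worker /root/samantha.pub 6 8192 100G local-lvm
```

Printing instead of executing keeps the sketch safe to run anywhere; on a Proxmox host the lines can be reviewed and pasted one at a time, with WireGuard configured on the VM as the final manual step.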
VMID Convention
- pve: 100-199 (templates at 199)
- adder: 200-299 (templates at 299; a template currently exists at 200, destroy it after use)
- game: 300-399 (templates at 399; a template currently exists at 300, destroy it after use)
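The convention is easy to encode as a guard that can be run before cloning. This is a sketch only; the function names `vmid_host` and `is_template_slot` are made up for illustration.

```shell
#!/bin/sh
# vmid_host VMID: print the Proxmox host that owns VMID under the
# convention above, or return non-zero if it is outside all three ranges.
vmid_host() {
    if   [ "$1" -ge 100 ] && [ "$1" -le 199 ]; then echo pve
    elif [ "$1" -ge 200 ] && [ "$1" -le 299 ]; then echo adder
    elif [ "$1" -ge 300 ] && [ "$1" -le 399 ]; then echo game
    else return 1
    fi
}

# is_template_slot VMID: succeed only for the x99 template slot of a known host.
is_template_slot() {
    [ $(( $1 % 100 )) -eq 99 ] && vmid_host "$1" >/dev/null
}

vmid_host 299   # prints: adder
```

For example, `is_template_slot 200` fails, which matches the note that the templates currently sitting at 200 and 300 are outside their intended slots.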
Useful Proxmox CLI
- Get VM IP: qm guest cmd <VMID> network-get-interfaces
- Fix serial console: qm set <VMID> --vga std --delete serial0
- Remove VM: qm destroy <VMID> --purge
- List all VMs: qm list
- Check local-lvm free space: vgs
- CPU/memory usage: pvesh get /nodes/<nodename>/status
K3s Install — see k3s/README.md for full commands
- Control plane uses --cluster-init on first node, --server on subsequent nodes
- All nodes use --flannel-iface=wg0 and --node-ip=
- Traefik disabled on all nodes
- 3 control plane nodes for HA etcd (tolerates 1 failure)
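As a rough illustration of how these flags combine, the sketch below prints an install command per role. It is not a copy of k3s/README.md, which remains authoritative. It assumes that --node-ip is set to the node's WG IP, that joining nodes reach the first server on port 6443 over WireGuard, and that the token is passed with --token; the token value is a placeholder. In K3s, --disable is a server flag, so the sketch applies it to control-plane nodes only.

```shell
#!/bin/sh
# k3s_install_cmd ROLE NODE_WG_IP [FIRST_SERVER_WG_IP]
# ROLE: init (first control plane), server (additional control plane),
# or agent (worker). Prints the install command; does not run it.
k3s_install_cmd() {
    role=$1
    node_ip=$2
    join_ip=${3:-}
    common="--flannel-iface=wg0 --node-ip=$node_ip"
    case $role in
        init)   args="server --cluster-init --disable=traefik $common" ;;
        server) args="server --server https://$join_ip:6443 --disable=traefik $common" ;;
        agent)  args="agent --server https://$join_ip:6443 $common" ;;
        *)      echo "unknown role: $role" >&2; return 1 ;;
    esac
    # Joining nodes need the cluster token; the value here is a placeholder.
    if [ "$role" != init ]; then
        args="$args --token <cluster-token>"
    fi
    echo "curl -sfL https://get.k3s.io | sh -s - $args"
}

k3s_install_cmd init   10.0.0.6             # pve-control
k3s_install_cmd server 10.0.0.8 10.0.0.6    # adder-control joins pve-control
k3s_install_cmd agent  10.0.0.7 10.0.0.6    # pve-worker joins as a worker
```

Which control-plane node was actually initialised first is not recorded here; pve-control is used purely as an example.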
Running Services
| Service | NodePort | Domain | Namespace |
|---|---|---|---|
| ghost1 | 32368 | — | fulfillment |
| ghost2 | 32369 | — | fulfillment |
| ghost3 | 32370 | — | fulfillment |
| forgejo | 32371 | git.sjasoft.com | sjasoft |
| postgres | ClusterIP:5432 | — | default |
| mariadb | ClusterIP:3306 | — | default |
| authentik-server | — | — | default |
| authentik-worker | — | — | default |
| n8n | — | — | default |
| listmonk | — | — | default |
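Since Caddy on the hub forwards to a node's WG IP plus the service's NodePort, a site block for a row of this table can be generated mechanically. The sketch below prints one for forgejo; the choice of pve-worker and adder-worker as upstreams is an assumption made for illustration, and the real Caddyfile on the hub may look different.

```shell
#!/bin/sh
# caddy_site DOMAIN NODEPORT WG_IP...
# Prints a Caddyfile site block that proxies DOMAIN to NODEPORT on each WG IP.
caddy_site() {
    domain=$1
    port=$2
    shift 2
    upstreams=""
    for ip in "$@"; do
        upstreams="$upstreams $ip:$port"
    done
    printf '%s {\n\treverse_proxy%s\n}\n' "$domain" "$upstreams"
}

# forgejo row from the table; upstream nodes are illustrative.
caddy_site git.sjasoft.com 32371 10.0.0.7 10.0.0.9
```

Listing more than one upstream lets Caddy spread requests across nodes, which fits the note that any node's WG IP can serve a NodePort.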
Remaining Services to Deploy
- vaultwarden.yml — passwords (ACTIVE)
- mattermost.yml — chat (ACTIVE)
- nats.yml — messaging
- monerod.yml — monero node
- snikket.yml — XMPP
- synapse.yml — Matrix
Manifests
All in Knowledge/repos/homelab/k3s/:
- k3s/postgres/postgres.yaml
- k3s/mariadb/mariadb.yaml
- k3s/ghost/ghost.yaml
- k3s/forgejo/forgejo.yaml
- k3s/README.md (authoritative WG mesh table + K3s install commands)
Known Issues / Notes
- Tailscale/Headscale abandoned — unreliable, randomly drops nodes, requires manual reconnect
- WireGuard full mesh is the correct approach for K3s cluster networking
- kubectl requires KUBECONFIG=~/.kube/config in ~/.bashrc on control nodes
- Cross-namespace Secret references are not supported: a pod can only use Secrets from its own namespace, so keep each Secret in the same namespace as its consumer
- game node only has 16GB RAM — allocate worker VMs conservatively
- game-ssd is only 256GB NVMe — keep disk allocations conservative on game-worker-ssd
- Templates should be destroyed after all clones are complete on each node
- fat_mama macvtap: workstation host cannot directly ping/SSH to fat_mama; reachable from rest of LAN and via WireGuard at 10.0.0.13; SSH from pve-control or other LAN machines works fine
- fat_mama disk image at /var/lib/libvirt/images/fat_mama.qcow2 on workstation
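For the kubectl note above, one way to add the KUBECONFIG line on a freshly built control node without duplicating it on repeat runs is sketched below. The function name `ensure_kubeconfig_export` is illustrative, and "$HOME/.kube/config" is used as the shell-safe spelling of ~/.kube/config.

```shell
#!/bin/sh
# ensure_kubeconfig_export RCFILE
# Append the KUBECONFIG export to RCFILE only if that exact line is absent.
ensure_kubeconfig_export() {
    rc=$1
    line='export KUBECONFIG="$HOME/.kube/config"'
    touch "$rc"
    grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}

# Usage on a control node (not executed here):
#   ensure_kubeconfig_export "$HOME/.bashrc"
```

Because the check matches the whole line exactly, running it a second time leaves the file unchanged.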