homelab/K3s-SESSION-STATE.md
Samantha Atkins b7c9dc81a0 cleanup
2026-04-17 20:33:17 -04:00


K3s Session State

Saved: 2026-04-14

Current State

K3s v1.34.6 cluster fully operational on Proxmox VMs + KVM worker over WireGuard mesh. fat_mama migrated from VirtualBox to KVM/libvirt on workstation 2026-04-14. All Proxmox K3s VMs have onboot: 1 set (fixed 2026-04-12).

Proxmox VMs

| Node | vmbr1 IP | WG IP | Proxmox Host | Role |
|---|---|---|---|---|
| pve-control | 10.10.10.151 | 10.0.0.6 | pve | k3s control plane |
| pve-worker | 10.10.10.126 | 10.0.0.7 | pve | k3s worker |
| adder-control | 10.10.10.185 | 10.0.0.8 | adder | k3s control plane |
| adder-worker | 10.10.10.83 | 10.0.0.9 | adder | k3s worker |
| game-control | 10.10.10.158 | 10.0.0.10 | game | k3s control plane |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game | k3s worker (local-lvm/HDD) |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game | k3s worker (game-ssd/NVMe) |
| fat_mama | 192.168.40.220 | 10.0.0.13 | workstation (KVM/libvirt, macvtap enp4s0) | k3s worker |

WG IPs 10.0.0.2-10.0.0.5 are reserved (old VirtualBox nodes, do not reuse). Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1.
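
When a new node joins, its WG IP has to skip the hub, the reserved block, and every address in the table above. A minimal sketch of that lookup, assuming the reserved block is 10.0.0.2 through 10.0.0.5; the function name `next_free_wg_ip` is hypothetical and not part of the repo:

```python
import ipaddress

# Addresses taken from the node table and the reservation note above.
HUB = ipaddress.ip_address("10.0.0.1")
# Old VirtualBox nodes, never to be reused (assumed to be .2 through .5).
RESERVED = {ipaddress.ip_address(f"10.0.0.{n}") for n in range(2, 6)}
# pve-control (.6) through fat_mama (.13).
IN_USE = {ipaddress.ip_address(f"10.0.0.{n}") for n in range(6, 14)}


def next_free_wg_ip(network="10.0.0.0/24"):
    """Return the lowest host address that is not the hub, reserved, or in use."""
    taken = {HUB} | RESERVED | IN_USE
    for host in ipaddress.ip_network(network).hosts():
        if host not in taken:
            return host
    raise RuntimeError("no free WireGuard addresses left")


print(next_free_wg_ip())
```

k3s/README.md remains the authoritative WG table; this only illustrates the allocation rule.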

VM Specs

| Node | vCPUs | RAM | Disk | Storage |
|---|---|---|---|---|
| pve-control | 2 | 2GB | 20G | local-lvm |
| pve-worker | 6 | 8GB | 100G | local-lvm |
| adder-control | 2 | 2GB | 20G | local-lvm |
| adder-worker | 6 | 8GB | 100G | local-lvm |
| game-control | 2 | 2GB | 20G | local-lvm |
| game-worker-hdd | 6 | 8GB | 200G | local-lvm (HDD) |
| game-worker-ssd | 10 | 8GB | 200G | game-ssd (NVMe) |
| fat_mama | 12 | 20GB | 200G | /var/lib/libvirt/images (qcow2) |

Network Architecture

  • Proxmox VMs on vmbr1 (10.10.10.0/24), DHCP
  • fat_mama on LAN (192.168.40.0/24) via macvtap on enp4s0 — workstation host cannot directly ping/SSH to it; reachable from rest of LAN and via WireGuard at 10.0.0.13
  • WireGuard mesh via DO hub — all nodes have static WG IPs (10.0.0.0/24)
  • Full mesh: all nodes have each other as explicit WireGuard peers (not just hub-and-spoke)
  • K3s uses --flannel-iface=wg0 so all cluster traffic runs over WireGuard
  • Caddy at DO hub proxies external traffic to any node's WG IP + NodePort
  • Tailscale/Headscale abandoned — too unreliable for cluster networking
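
For illustration, the full-mesh peer layout can be sketched as below. This assumes the hub counts as a mesh member and that each peer is given a /32 AllowedIPs entry; neither detail is confirmed here, and the helper names are hypothetical. The authoritative peer table is in k3s/README.md.

```python
from itertools import combinations

# WG IPs from the node table; the hub is included as an assumption.
WG_NODES = {
    "hub": "10.0.0.1",
    "pve-control": "10.0.0.6",
    "pve-worker": "10.0.0.7",
    "adder-control": "10.0.0.8",
    "adder-worker": "10.0.0.9",
    "game-control": "10.0.0.10",
    "game-worker-hdd": "10.0.0.11",
    "game-worker-ssd": "10.0.0.12",
    "fat_mama": "10.0.0.13",
}


def peers_for(node):
    """Every other node, mapped to an assumed /32 AllowedIPs value."""
    return {name: f"{ip}/32" for name, ip in WG_NODES.items() if name != node}


def tunnel_count():
    """Number of distinct node pairs in a full mesh: n * (n - 1) / 2."""
    return len(list(combinations(WG_NODES, 2)))


print(len(peers_for("fat_mama")), "peers per node,", tunnel_count(), "tunnels total")
```

Adding a node means touching the WireGuard config on every existing node, which is the cost of full mesh over hub-and-spoke.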

Proxmox Host Specs

  • pve: Meerkat NUC, 64GB RAM, 4TB NVMe
  • adder: Adder WS laptop, 32GB RAM, 2TB NVMe, RTX 2070
  • game: old gaming PC, 16GB RAM, 256GB NVMe (game-ssd) + 2TB HDD (local-lvm)
  • workstation: i9-13900KF, 96GB RAM, RTX 4090, Fedora (runs fat_mama via KVM/libvirt)
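
A quick way to check RAM allocation against each host is to subtract the VM Specs figures from the host totals above. This is a rough sketch only: it ignores hypervisor overhead, and the names `HOST_RAM_GB`, `VM_RAM_GB`, and `ram_headroom` are made up for illustration.

```python
# Host RAM from the Proxmox Host Specs list, in GB.
HOST_RAM_GB = {"pve": 64, "adder": 32, "game": 16, "workstation": 96}

# Per-VM RAM from the VM Specs table, grouped by host, in GB.
VM_RAM_GB = {
    "pve": {"pve-control": 2, "pve-worker": 8},
    "adder": {"adder-control": 2, "adder-worker": 8},
    "game": {"game-control": 2, "game-worker-hdd": 8, "game-worker-ssd": 8},
    "workstation": {"fat_mama": 20},
}


def ram_headroom(host):
    """Host RAM minus the sum of RAM allocated to its VMs (negative = overcommitted)."""
    return HOST_RAM_GB[host] - sum(VM_RAM_GB[host].values())


for host in HOST_RAM_GB:
    print(f"{host}: {ram_headroom(host)} GB headroom")
```

With the figures as listed, this gives pve 54, adder 22, game -2, workstation 76. If the table is accurate, game is allocated 2GB more than it physically has, which KVM permits but which only works while the guests stay below their full allocation.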

VM Provisioning

Template & Clone Scripts

Scripts at ~/private/Knowledge/repos/homelab/proxmox/scripts/:

  • create-debian-template.sh <VMID> <n> [STORAGE] [BRIDGE]
    • Defaults: STORAGE=local-lvm, BRIDGE=vmbr1
    • Bakes in: qemu-guest-agent, curl, wget, nano, rsync, htop, tmux, emacs-nox, nfs-common, tailscale
    • Zeroes /etc/machine-id, removes /etc/ssh/ssh_host_* (Cloud-Init regenerates on first boot)
    • Does NOT create .ssh or set keys — done post-boot via qm set
  • clone-vm.sh <TEMPLATE_VMID> <NEW_VMID> <n> [CORES] [MEMORY_MB] [DISK_SIZE] [STORAGE]
    • Defaults: 2 cores, 2048MB RAM, 20G disk, local-lvm storage
    • Full clone, auto-starts the VM

Post-Clone Formula (confirmed working)

  1. Clone: ./clone-vm.sh <template> <vmid> <n> [cores] [mem] [disk] [storage]
  2. Get IP: qm guest cmd <vmid> network-get-interfaces
  3. Set SSH key: qm set <vmid> --sshkeys <pubkey-file>
  4. Reboot VM: qm reboot <vmid>
  5. SSH in: ssh samantha@<ip>
  6. Configure WireGuard on the VM
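
Steps 1 to 4 of the formula can be written out as the exact command lines for a given VM. The sketch below only builds the strings and does not run them. The function name, the example VMID 101, and the key file `samantha.pub` are hypothetical; the commands themselves are the ones listed above.

```python
import shlex


def post_clone_commands(template, vmid, name, pubkey_file,
                        cores=2, memory_mb=2048, disk="20G", storage="local-lvm"):
    """Return the shell commands for steps 1-4, in order, as strings."""
    steps = [
        # 1. Full clone (the script auto-starts the VM).
        ["./clone-vm.sh", str(template), str(vmid), name,
         str(cores), str(memory_mb), disk, storage],
        # 2. Ask the guest agent for the DHCP-assigned IP.
        ["qm", "guest", "cmd", str(vmid), "network-get-interfaces"],
        # 3. Inject the SSH public key via Cloud-Init.
        ["qm", "set", str(vmid), "--sshkeys", pubkey_file],
        # 4. Reboot so the key is applied.
        ["qm", "reboot", str(vmid)],
    ]
    return [shlex.join(step) for step in steps]


# Example values: a worker sized like pve-worker, cloned from template 199.
for cmd in post_clone_commands(199, 101, "pve-worker", "samantha.pub",
                               cores=6, memory_mb=8192, disk="100G"):
    print(cmd)
```

Steps 5 and 6 (SSH in, configure WireGuard) stay manual because they depend on the IP returned in step 2.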

VMID Convention

  • pve: 100-199 (templates at 199)
  • adder: 200-299 (templates at 299; a template currently exists at 200, destroy after use)
  • game: 300-399 (templates at 399; a template currently exists at 300, destroy after use)
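
The convention is easy to check mechanically before picking a VMID. A small sketch, with hypothetical helper names:

```python
# VMID ranges per Proxmox host, from the convention above.
VMID_RANGES = {
    "pve": range(100, 200),
    "adder": range(200, 300),
    "game": range(300, 400),
}


def host_for_vmid(vmid):
    """Return the host whose range contains this VMID."""
    for host, vmids in VMID_RANGES.items():
        if vmid in vmids:
            return host
    raise ValueError(f"VMID {vmid} is outside every host range")


def template_vmid(host):
    """Conventional template VMID: the last ID in the host's range.

    Note: per the list above, adder and game currently hold a template
    at 200 and 300 instead, pending cleanup.
    """
    return VMID_RANGES[host][-1]


print(host_for_vmid(250), template_vmid("game"))
```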

Useful Proxmox CLI

  • qm guest cmd <VMID> network-get-interfaces — get VM IP
  • qm set <VMID> --vga std --delete serial0 — fix serial console
  • qm destroy <VMID> --purge — remove VM
  • qm list — list all VMs
  • vgs — check local-lvm free space
  • pvesh get /nodes/<nodename>/status — CPU/memory usage

K3s Install — see k3s/README.md for full commands

  • Control plane uses --cluster-init on first node, --server on subsequent nodes
  • All nodes use --flannel-iface=wg0 and --node-ip=
  • Traefik disabled on all nodes
  • 3 control plane nodes for HA etcd (tolerates 1 failure)
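
The "tolerates 1 failure" figure follows from etcd's majority quorum. A short worked sketch (function names are illustrative):

```python
def etcd_quorum(members):
    """Smallest majority of the member count."""
    return members // 2 + 1


def failures_tolerated(members):
    """Members that can be lost while a majority still remains."""
    return members - etcd_quorum(members)


for n in (1, 3, 4, 5):
    print(f"{n} members: quorum {etcd_quorum(n)}, tolerates {failures_tolerated(n)}")
```

A fourth control plane node would raise quorum to 3 without tolerating any more failures, so the next useful step up from 3 is 5.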

Running Services

| Service | NodePort | Domain | Namespace |
|---|---|---|---|
| ghost1 | 32368 | | fulfillment |
| ghost2 | 32369 | | fulfillment |
| ghost3 | 32370 | | fulfillment |
| forgejo | 32371 | git.sjasoft.com | sjasoft |
| postgres | ClusterIP:5432 | | default |
| mariadb | ClusterIP:3306 | | default |
| authentik-server | | | default |
| authentik-worker | | | default |
| n8n | | | default |
| listmonk | | | default |
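
Since Caddy on the hub forwards to a node's WG IP plus the service's NodePort, a site block for a NodePort service might look like the output of the sketch below. This is not the deployed Caddyfile: the choice of upstream nodes (the pve and adder workers) is an assumption, and `caddy_site` is a hypothetical helper.

```python
def caddy_site(domain, node_port, wg_ips):
    """Render a Caddyfile site block that reverse-proxies to WG IP:NodePort upstreams."""
    upstreams = " ".join(f"{ip}:{node_port}" for ip in wg_ips)
    return f"{domain} {{\n\treverse_proxy {upstreams}\n}}\n"


# forgejo from the table above; upstream nodes are assumed for illustration.
print(caddy_site("git.sjasoft.com", 32371, ["10.0.0.7", "10.0.0.9"]))
```

A NodePort is open on every node, so any WG IP works as an upstream; listing more than one lets Caddy spread requests across them.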

Remaining Services to Deploy

  • vaultwarden.yml — passwords (ACTIVE)
  • mattermost.yml — chat (ACTIVE)
  • nats.yml — messaging
  • monerod.yml — monero node
  • snikket.yml — XMPP
  • synapse.yml — Matrix

Manifests

All in Knowledge/repos/homelab/k3s/:

  • k3s/postgres/postgres.yaml
  • k3s/mariadb/mariadb.yaml
  • k3s/ghost/ghost.yaml
  • k3s/forgejo/forgejo.yaml
  • k3s/README.md (authoritative WG mesh table + K3s install commands)

Known Issues / Notes

  • Tailscale/Headscale abandoned — unreliable, randomly drops nodes, requires manual reconnect
  • WireGuard full mesh is the correct approach for K3s cluster networking
  • kubectl requires KUBECONFIG=~/.kube/config in ~/.bashrc on control nodes
  • Cross-namespace secrets not supported — keep secrets in same namespace as consumer
  • game node only has 16GB RAM — allocate worker VMs conservatively
  • game-ssd is only 256GB NVMe — keep disk allocations conservative on game-worker-ssd
  • Templates should be destroyed after all clones are complete on each node
  • fat_mama macvtap: workstation host cannot directly ping/SSH to fat_mama; reachable from rest of LAN and via WireGuard at 10.0.0.13; SSH from pve-control or other LAN machines works fine
  • fat_mama disk image at /var/lib/libvirt/images/fat_mama.qcow2 on workstation