# K3s Session State
# Saved: 2026-04-14

## Current State

K3s v1.34.6 cluster fully operational on Proxmox VMs + KVM worker over WireGuard mesh.
fat_mama migrated from VirtualBox to KVM/libvirt on workstation 2026-04-14.
All Proxmox K3s VMs have onboot: 1 set (fixed 2026-04-12).

## Proxmox VMs

| Node | vmbr1 IP | WG IP | Proxmox Host | Role |
|---|---|---|---|---|
| pve-control | 10.10.10.151 | 10.0.0.6 | pve | k3s control plane |
| pve-worker | 10.10.10.126 | 10.0.0.7 | pve | k3s worker |
| adder-control | 10.10.10.185 | 10.0.0.8 | adder | k3s control plane |
| adder-worker | 10.10.10.83 | 10.0.0.9 | adder | k3s worker |
| game-control | 10.10.10.158 | 10.0.0.10 | game | k3s control plane |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game | k3s worker (local-lvm/HDD) |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game | k3s worker (game-ssd/NVMe) |
| fat_mama | 192.168.40.220 | 10.0.0.13 | workstation (KVM/libvirt, macvtap enp4s0) | k3s worker |

WG IPs 10.0.0.2–10.0.0.5 reserved (old VirtualBox nodes, do not reuse).
Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1

## VM Specs

| Node | vCPUs | RAM | Disk | Storage |
|---|---|---|---|---|
| pve-control | 2 | 2GB | 20G | local-lvm |
| pve-worker | 6 | 8GB | 100G | local-lvm |
| adder-control | 2 | 2GB | 20G | local-lvm |
| adder-worker | 6 | 8GB | 100G | local-lvm |
| game-control | 2 | 2GB | 20G | local-lvm |
| game-worker-hdd | 6 | 8GB | 200G | local-lvm (HDD) |
| game-worker-ssd | 10 | 8GB | 200G | game-ssd (NVMe) |
| fat_mama | 12 | 20GB | 200G | /var/lib/libvirt/images (qcow2) |

## Network Architecture

- Proxmox VMs on vmbr1 (10.10.10.0/24), DHCP
- fat_mama on LAN (192.168.40.0/24) via macvtap on enp4s0: the workstation host cannot ping or SSH to it directly; it is reachable from the rest of the LAN and via WireGuard at 10.0.0.13
- WireGuard mesh via DO hub: all nodes have static WG IPs (10.0.0.0/24)
- Full mesh: all nodes have each other as explicit WireGuard peers (not just hub-and-spoke)
- K3s uses `--flannel-iface=wg0` so all cluster traffic runs over WireGuard
- Caddy at DO hub proxies external traffic to any node's WG IP + NodePort
- Tailscale/Headscale abandoned: too unreliable for cluster networking

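To make the full-mesh point concrete, here is a rough sketch that prints the `[Peer]` stanzas one node would carry in its `wg0.conf`. The node names and WG IPs are taken from the table above. The public keys are placeholders, `Endpoint` lines are left out because these notes do not record them, and treating the hub as one more peer is an assumption.

```shell
#!/bin/sh
# Sketch only: list the [Peer] stanzas a single node needs in a full mesh.
# Names and WG IPs are from the node table; keys are placeholders and
# Endpoint lines are omitted (not recorded in these notes).

# name:wg-ip pairs, hub first (including the hub here is an assumption)
MESH_NODES="hub:10.0.0.1
pve-control:10.0.0.6 pve-worker:10.0.0.7
adder-control:10.0.0.8 adder-worker:10.0.0.9
game-control:10.0.0.10 game-worker-hdd:10.0.0.11 game-worker-ssd:10.0.0.12
fat_mama:10.0.0.13"

# peers_for SELF: emit one [Peer] stanza for every node except SELF
peers_for() {
    self="$1"
    for entry in $MESH_NODES; do
        name="${entry%%:*}"   # text before the colon
        ip="${entry##*:}"     # text after the colon
        if [ "$name" = "$self" ]; then
            continue          # a node never lists itself as a peer
        fi
        printf '[Peer]\n# %s\nPublicKey = <public-key-of-%s>\nAllowedIPs = %s/32\n\n' \
            "$name" "$name" "$ip"
    done
}

peers_for pve-control
```

With nine entries in the list, every node ends up with eight stanzas, which is why adding a node means touching the config on all the others.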
## Proxmox Host Specs

- pve: Meerkat NUC, 64GB RAM, 4TB NVMe
- adder: Adder WS laptop, 32GB RAM, 2TB NVMe, RTX 2070
- game: old gaming PC, 16GB RAM, 256GB NVMe (game-ssd) + 2TB HDD (local-lvm)
- workstation: i9-13900KF, 96GB RAM, RTX 4090, Fedora (runs fat_mama via KVM/libvirt)

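A quick way to sanity-check RAM headroom is to sum the VM Specs allocations per host and compare them with the host RAM listed here. The sketch below does only that arithmetic; the function and variable names are made up for illustration, and the numbers are copied from the two tables.

```shell
#!/bin/sh
# Sketch: configured VM RAM per host versus physical RAM.
# All figures are copied from the VM Specs and host spec lists.

# host_ram_gb HOST: physical RAM in GB
host_ram_gb() {
    case "$1" in
        pve)         echo 64 ;;
        adder)       echo 32 ;;
        game)        echo 16 ;;
        workstation) echo 96 ;;
        *)           return 1 ;;
    esac
}

# host:vm:GB triples, one per VM
VM_RAM="pve:pve-control:2 pve:pve-worker:8
adder:adder-control:2 adder:adder-worker:8
game:game-control:2 game:game-worker-hdd:8 game:game-worker-ssd:8
workstation:fat_mama:20"

# allocated_gb HOST: sum of configured VM RAM on HOST
allocated_gb() {
    total=0
    for entry in $VM_RAM; do
        host="${entry%%:*}"
        gb="${entry##*:}"
        if [ "$host" = "$1" ]; then
            total=$((total + gb))
        fi
    done
    echo "$total"
}

for h in pve adder game workstation; do
    printf '%s: %s GB allocated of %s GB\n' "$h" "$(allocated_gb "$h")" "$(host_ram_gb "$h")"
done
```

Going by the tables, game comes out at 18 GB configured against 16 GB physical, so the VMs there can only coexist if they do not all use their full allocation at once.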
## VM Provisioning

### Template & Clone Scripts
Scripts at `~/private/Knowledge/repos/homelab/proxmox/scripts/`:
- `create-debian-template.sh <VMID> <n> [STORAGE] [BRIDGE]`
  - Defaults: STORAGE=local-lvm, BRIDGE=vmbr1
  - Bakes in: qemu-guest-agent, curl, wget, nano, rsync, htop, tmux, emacs-nox, nfs-common, tailscale
  - Zeroes /etc/machine-id, removes /etc/ssh/ssh_host_* (Cloud-Init regenerates on first boot)
  - Does NOT create .ssh or set keys; that is done post-boot via `qm set`
- `clone-vm.sh <TEMPLATE_VMID> <NEW_VMID> <n> [CORES] [MEMORY_MB] [DISK_SIZE] [STORAGE]`
  - Defaults: 2 cores, 2048MB RAM, 20G disk, local-lvm storage
  - Full clone, auto-starts the VM

### Post-Clone Formula (confirmed working)
1. Clone: `./clone-vm.sh <template> <vmid> <n> [cores] [mem] [disk] [storage]`
2. Get IP: `qm guest cmd <vmid> network-get-interfaces`
3. Set SSH key: `qm set <vmid> --sshkeys <pubkey-file>`
4. Reboot VM: `qm reboot <vmid>`
5. SSH in: `ssh samantha@<ip>`
6. Configure WireGuard on the VM

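Steps 1 to 4 above could be chained in a small wrapper like the sketch below. It defaults to a dry run that only prints each command, so it is safe to read through on any machine. The `run` and `post_clone` helpers, VMID 101, the name `demo-node`, and the key path are all hypothetical; steps 5 and 6 stay manual.

```shell
#!/bin/sh
# Sketch: chain post-clone steps 1-4. DRY_RUN=1 (the default) prints the
# commands instead of running them; set DRY_RUN=0 on a Proxmox host to execute.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# post_clone TEMPLATE VMID NAME PUBKEY_FILE
post_clone() {
    template="$1" vmid="$2" name="$3" pubkey="$4"
    run ./clone-vm.sh "$template" "$vmid" "$name"      # step 1: full clone, auto-starts
    # step 2: the guest agent may need a moment after boot before this answers
    run qm guest cmd "$vmid" network-get-interfaces
    run qm set "$vmid" --sshkeys "$pubkey"             # step 3
    run qm reboot "$vmid"                              # step 4
}

# Hypothetical values for illustration only
post_clone 199 101 demo-node /root/demo-key.pub
```

In a real run the IP still has to be read out of the step 2 output by hand before moving on to SSH and the WireGuard setup.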
### VMID Convention
- pve: 100-199 (templates at 199)
- adder: 200-299 (templates at 299; currently 200 exists, destroy after use)
- game: 300-399 (templates at 399; currently 300 exists, destroy after use)

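The convention is easy to check mechanically before cloning. This is a small sketch with hypothetical helper names (`vmid_range`, `vmid_ok`); the ranges are the ones listed above.

```shell
#!/bin/sh
# Sketch: validate a VMID against the per-host blocks listed above.

# vmid_range HOST: print "LOW HIGH" for HOST's VMID block
vmid_range() {
    case "$1" in
        pve)   echo "100 199" ;;
        adder) echo "200 299" ;;
        game)  echo "300 399" ;;
        *)     echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# vmid_ok HOST VMID: succeed only if VMID falls inside HOST's block
vmid_ok() {
    range="$(vmid_range "$1")" || return 1
    lo="${range% *}"
    hi="${range#* }"
    [ "$2" -ge "$lo" ] && [ "$2" -le "$hi" ]
}

if vmid_ok game 305; then
    echo "305 is valid on game"
fi
```

A check like this could sit at the top of a clone wrapper so that a VMID typed for the wrong host is rejected before anything is created.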
### Useful Proxmox CLI
- `qm guest cmd <VMID> network-get-interfaces`: get VM IP
- `qm set <VMID> --vga std --delete serial0`: fix serial console
- `qm destroy <VMID> --purge`: remove VM
- `qm list`: list all VMs
- `vgs`: check local-lvm free space
- `pvesh get /nodes/<nodename>/status`: CPU/memory usage

## K3s Install (see k3s/README.md for full commands)

- Control plane uses `--cluster-init` on first node, `--server` on subsequent nodes
- All nodes use `--flannel-iface=wg0` and `--node-ip=<wg-ip>`
- Traefik disabled on all nodes
- 3 control plane nodes for HA etcd (tolerates 1 failure)

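As a rough illustration of how the flags above combine per role, the sketch below prints the k3s argument list for a first server, a joining server, and a worker. It is not the contents of k3s/README.md. The `k3s_args` helper is hypothetical, picking pve-control (10.0.0.6) as the first server is an assumption, `--disable=traefik` is applied to servers only since it is a server-side option, and the join token every real join needs is left out.

```shell
#!/bin/sh
# Sketch: assemble k3s arguments per role from the flags listed above.
# The join token is omitted; a real join needs one as well.

# k3s_args ROLE WG_IP [JOIN_URL]
k3s_args() {
    role="$1" wg_ip="$2" join_url="${3:-}"
    common="--flannel-iface=wg0 --node-ip=$wg_ip"
    case "$role" in
        first-server) echo "server --cluster-init --disable=traefik $common" ;;
        server)       echo "server --server $join_url --disable=traefik $common" ;;
        agent)        echo "agent --server $join_url $common" ;;
        *)            echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Assumption: pve-control is the first control plane node
k3s_args first-server 10.0.0.6
k3s_args server 10.0.0.8 https://10.0.0.6:6443
k3s_args agent 10.0.0.7 https://10.0.0.6:6443
```

Pointing `--server` at a WG IP keeps the join traffic on the same WireGuard interface that flannel uses.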
## Running Services

| Service | NodePort | Domain | Namespace |
|---|---|---|---|
| ghost1 | 32368 | - | fulfillment |
| ghost2 | 32369 | - | fulfillment |
| ghost3 | 32370 | - | fulfillment |
| forgejo | 32371 | git.sjasoft.com | sjasoft |
| postgres | ClusterIP:5432 | - | default |
| mariadb | ClusterIP:3306 | - | default |
| authentik-server | - | - | default |
| authentik-worker | - | - | default |
| n8n | - | - | default |
| listmonk | - | - | default |

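For the one service with a domain, the Caddy side on the hub might look roughly like what this sketch prints. The actual Caddyfile is not part of these notes, so the `caddy_site` helper and the choice of 10.0.0.7 as the upstream are assumptions; any node's WG IP should serve the same NodePort.

```shell
#!/bin/sh
# Sketch: print a Caddyfile site block that forwards a domain to a
# node's WG IP and NodePort. Upstream node choice is arbitrary.

# caddy_site DOMAIN WG_IP NODE_PORT
caddy_site() {
    domain="$1" wg_ip="$2" node_port="$3"
    printf '%s {\n\treverse_proxy %s:%s\n}\n' "$domain" "$wg_ip" "$node_port"
}

# forgejo: domain and NodePort from the table above
caddy_site git.sjasoft.com 10.0.0.7 32371
```

Services without a domain in the table would need one assigned before a block like this makes sense for them.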
## Remaining Services to Deploy
- vaultwarden.yml: passwords (ACTIVE)
- mattermost.yml: chat (ACTIVE)
- nats.yml: messaging
- monerod.yml: monero node
- snikket.yml: XMPP
- synapse.yml: Matrix

## Manifests

All in Knowledge/repos/homelab/k3s/:
- k3s/postgres/postgres.yaml
- k3s/mariadb/mariadb.yaml
- k3s/ghost/ghost.yaml
- k3s/forgejo/forgejo.yaml
- k3s/README.md (authoritative WG mesh table + K3s install commands)

## Known Issues / Notes
- Tailscale/Headscale abandoned: unreliable, randomly drops nodes, requires manual reconnect
- WireGuard full mesh is the correct approach for K3s cluster networking
- kubectl requires `KUBECONFIG=~/.kube/config` in `~/.bashrc` on control nodes
- Cross-namespace secrets not supported; keep each secret in the same namespace as its consumer
- game node only has 16GB RAM; allocate worker VMs conservatively
- game-ssd is only 256GB NVMe; keep disk allocations conservative on game-worker-ssd
- Templates should be destroyed after all clones are complete on each node
- fat_mama macvtap: workstation host cannot directly ping/SSH to fat_mama; reachable from rest of LAN and via WireGuard at 10.0.0.13; SSH from pve-control or other LAN machines works fine
- fat_mama disk image at /var/lib/libvirt/images/fat_mama.qcow2 on workstation

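The KUBECONFIG note lends itself to a small idempotent helper, sketched here. The function name `ensure_kubeconfig` is hypothetical; the exported line is the one from the note above, and the function only defines behavior, it does not touch any file until called.

```shell
#!/bin/sh
# Sketch: append the KUBECONFIG export to a shell rc file exactly once.

# ensure_kubeconfig RC_FILE
ensure_kubeconfig() {
    rc="$1"
    line='export KUBECONFIG=~/.kube/config'
    touch "$rc"                                   # create the file if missing
    # -x whole-line match, -F literal string: append only if not already there
    grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}
```

On a control node this would be called as `ensure_kubeconfig "$HOME/.bashrc"`; running it again leaves the file unchanged.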