homelab/K3s-SESSION-STATE.md
Samantha Atkins b7c9dc81a0 cleanup
2026-04-17 20:33:17 -04:00


# K3s Session State
# Saved: 2026-04-14
## Current State
K3s v1.34.6 cluster fully operational on Proxmox VMs + KVM worker over WireGuard mesh.
fat_mama migrated from VirtualBox to KVM/libvirt on workstation 2026-04-14.
All Proxmox K3s VMs have onboot: 1 set (fixed 2026-04-12).
## Proxmox VMs
| Node | vmbr1 IP | WG IP | Proxmox Host | Role |
|---|---|---|---|---|
| pve-control | 10.10.10.151 | 10.0.0.6 | pve | k3s control plane |
| pve-worker | 10.10.10.126 | 10.0.0.7 | pve | k3s worker |
| adder-control | 10.10.10.185 | 10.0.0.8 | adder | k3s control plane |
| adder-worker | 10.10.10.83 | 10.0.0.9 | adder | k3s worker |
| game-control | 10.10.10.158 | 10.0.0.10 | game | k3s control plane |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game | k3s worker (local-lvm/HDD) |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game | k3s worker (game-ssd/NVMe) |
| fat_mama | 192.168.40.220 | 10.0.0.13 | workstation (KVM/libvirt, macvtap enp4s0) | k3s worker |
WG IPs 10.0.0.2-10.0.0.5 reserved (old VirtualBox nodes, do not reuse).
Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1
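
When a node is added, its WG address has to skip both the reserved range and every address already in the table. A minimal sketch of that lookup follows; the function name `next_wg_ip` and the hardcoded lists are illustrative only, copied from the table above, and k3s/README.md stays the authoritative source.

```sh
#!/bin/sh
# Hypothetical helper: suggest the next free WireGuard address in 10.0.0.0/24.
# The lists below are copied by hand from the node table; update them with it.

USED_OCTETS="1 6 7 8 9 10 11 12 13"   # hub (.1) plus current nodes (.6-.13)
RESERVED_OCTETS="2 3 4 5"             # old VirtualBox nodes, do not reuse

next_wg_ip() {
    octet=2
    while [ "$octet" -le 254 ]; do
        case " $USED_OCTETS $RESERVED_OCTETS " in
            *" $octet "*) ;;                          # taken or reserved, skip
            *) echo "10.0.0.$octet"; return 0 ;;      # first free address
        esac
        octet=$((octet + 1))
    done
    return 1   # no free address left in the /24
}
```

With the lists as given, `next_wg_ip` prints `10.0.0.14`.
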
## VM Specs
| Node | vCPUs | RAM | Disk | Storage |
|---|---|---|---|---|
| pve-control | 2 | 2GB | 20G | local-lvm |
| pve-worker | 6 | 8GB | 100G | local-lvm |
| adder-control | 2 | 2GB | 20G | local-lvm |
| adder-worker | 6 | 8GB | 100G | local-lvm |
| game-control | 2 | 2GB | 20G | local-lvm |
| game-worker-hdd | 6 | 8GB | 200G | local-lvm (HDD) |
| game-worker-ssd | 10 | 8GB | 200G | game-ssd (NVMe) |
| fat_mama | 12 | 20GB | 200G | /var/lib/libvirt/images (qcow2) |
## Network Architecture
- Proxmox VMs on vmbr1 (10.10.10.0/24), DHCP
- fat_mama on LAN (192.168.40.0/24) via macvtap on enp4s0 — workstation host cannot directly ping/SSH to it; reachable from rest of LAN and via WireGuard at 10.0.0.13
- WireGuard mesh via DO hub — all nodes have static WG IPs (10.0.0.0/24)
- Full mesh: all nodes have each other as explicit WireGuard peers (not just hub-and-spoke)
- K3s uses --flannel-iface=wg0 so all cluster traffic runs over WireGuard
- Caddy at DO hub proxies external traffic to any node's WG IP + NodePort
- Tailscale/Headscale abandoned — too unreliable for cluster networking
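
In a full mesh, every node's `wg0.conf` carries one `[Peer]` stanza per other node. The sketch below prints such a stanza; the function name `wg_peer_stanza`, the keepalive value, and the choice of a `/32` per peer are assumptions for illustration, not the configuration actually deployed here. How worker nodes reach each other's endpoints is not covered by this file, so the endpoint argument is optional.

```sh
#!/bin/sh
# Sketch: print one [Peer] stanza for wg0.conf.
# Arguments: <public-key> <wg-ip> [endpoint host:port]
# Keys are placeholders; the keepalive interval is an assumption.

wg_peer_stanza() {
    pubkey=$1
    wg_ip=$2
    endpoint=${3:-}

    echo "[Peer]"
    echo "PublicKey = $pubkey"
    echo "AllowedIPs = $wg_ip/32"        # one /32 per peer keeps routing unambiguous
    if [ -n "$endpoint" ]; then
        echo "Endpoint = $endpoint"      # only for peers with a known address
    fi
    echo "PersistentKeepalive = 25"      # assumption: the node sits behind NAT
}
```

For the hub this might be called as `wg_peer_stanza '<hub-public-key>' 10.0.0.1 138.197.87.251:51820`.
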
## Proxmox Host Specs
- pve: Meerkat NUC, 64GB RAM, 4TB NVMe
- adder: Adder WS laptop, 32GB RAM, 2TB NVMe, RTX 2070
- game: old gaming PC, 16GB RAM, 256GB NVMe (game-ssd) + 2TB HDD (local-lvm)
- workstation: i9-13900KF, 96GB RAM, RTX 4090, Fedora (runs fat_mama via KVM/libvirt)
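
The two spec lists can be cross-checked by summing the RAM assigned to VMs on each host. The sketch below does that with the figures from the VM Specs and host specs sections; the function name `ram_budget` is hypothetical, and the numbers are nominal assignments, not measured usage.

```sh
#!/bin/sh
# Sketch: compare RAM assigned to VMs against each host's physical RAM.
# All figures are in GB and copied by hand from the tables in this file.

ram_budget() {
    awk '
        $1 == "host" { total[$2] = $3; next }   # host <name> <physical GB>
        $1 == "vm"   { used[$2] += $3 }         # vm <host> <assigned GB>
        END {
            for (h in total) {
                status = (used[h] > total[h]) ? "OVER" : "ok"
                printf "%s %d/%d GB %s\n", h, used[h], total[h], status
            }
        }
    ' <<'EOF'
host pve 64
host adder 32
host game 16
host workstation 96
vm pve 2
vm pve 8
vm adder 2
vm adder 8
vm game 2
vm game 8
vm game 8
vm workstation 20
EOF
}
```

With these figures, `game` comes out at 18 GB assigned against 16 GB physical, while the other hosts stay well under their totals. Whether that overcommit is acceptable depends on actual guest usage, which this file does not record.
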
## VM Provisioning
### Template & Clone Scripts
Scripts at `~/private/Knowledge/repos/homelab/proxmox/scripts/`:
- `create-debian-template.sh <VMID> <n> [STORAGE] [BRIDGE]`
  - Defaults: STORAGE=local-lvm, BRIDGE=vmbr1
  - Bakes in: qemu-guest-agent, curl, wget, nano, rsync, htop, tmux, emacs-nox, nfs-common, tailscale
  - Zeroes /etc/machine-id, removes /etc/ssh/ssh_host_* (Cloud-Init regenerates on first boot)
  - Does NOT create .ssh or set keys; that is done post-boot via `qm set`
- `clone-vm.sh <TEMPLATE_VMID> <NEW_VMID> <n> [CORES] [MEMORY_MB] [DISK_SIZE] [STORAGE]`
  - Defaults: 2 cores, 2048MB RAM, 20G disk, local-lvm storage
  - Full clone, auto-starts the VM
### Post-Clone Formula (confirmed working)
1. Clone: `./clone-vm.sh <template> <vmid> <n> [cores] [mem] [disk] [storage]`
2. Get IP: `qm guest cmd <vmid> network-get-interfaces`
3. Set SSH key: `qm set <vmid> --sshkeys <pubkey-file>`
4. Reboot VM: `qm reboot <vmid>`
5. SSH in: `ssh samantha@<ip>`
6. Configure WireGuard on the VM
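
Steps 1 to 4 above can be chained in a small wrapper. This is a sketch only: the `run` helper and the `DRY_RUN` switch are hypothetical conventions added here so the sequence can be previewed without touching Proxmox, and steps 5 and 6 stay manual because they need the IP reported in step 2.

```sh
#!/bin/sh
# Sketch: post-clone steps 1-4 as one function.
# DRY_RUN=1 prints each command instead of running it (hypothetical convention).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

post_clone() {
    template=$1 vmid=$2 name=$3 pubkey=$4

    run ./clone-vm.sh "$template" "$vmid" "$name"      # step 1, using the script defaults
    run qm guest cmd "$vmid" network-get-interfaces    # step 2, reports the VM's interfaces
    run qm set "$vmid" --sshkeys "$pubkey"             # step 3
    run qm reboot "$vmid"                              # step 4
}
```

A preview run could look like `DRY_RUN=1 post_clone 199 101 demo-node ~/.ssh/id_ed25519.pub`, where the VMID, node name, and key path are made-up values. In a real run the guest agent may need some time after the clone starts before step 2 answers, so a retry or a short wait may be needed there.
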
### VMID Convention
- pve: 100-199 (templates at 199)
- adder: 200-299 (templates at 299 — currently 200 exists, destroy after use)
- game: 300-399 (templates at 399 — currently 300 exists, destroy after use)
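
The convention maps directly to a range check, which can guard clone scripts against using a VMID on the wrong host. A minimal sketch, with `vmid_host` as a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: map a VMID to its Proxmox host using the ranges listed above.

vmid_host() {
    vmid=$1
    if [ "$vmid" -ge 100 ] && [ "$vmid" -le 199 ]; then
        echo pve
    elif [ "$vmid" -ge 200 ] && [ "$vmid" -le 299 ]; then
        echo adder
    elif [ "$vmid" -ge 300 ] && [ "$vmid" -le 399 ]; then
        echo game
    else
        echo unknown      # outside every documented range
        return 1
    fi
}
```

For example, `vmid_host 199` prints `pve` and `vmid_host 300` prints `game`; anything outside 100-399 prints `unknown` and returns a non-zero status.
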
### Useful Proxmox CLI
- `qm guest cmd <VMID> network-get-interfaces` — get VM IP
- `qm set <VMID> --vga std --delete serial0` — fix serial console
- `qm destroy <VMID> --purge` — remove VM
- `qm list` — list all VMs
- `vgs` — check local-lvm free space
- `pvesh get /nodes/<nodename>/status` — CPU/memory usage
## K3s Install — see k3s/README.md for full commands
- Control plane uses --cluster-init on first node, --server on subsequent nodes
- All nodes use --flannel-iface=wg0 and --node-ip=<wg-ip>
- Traefik disabled on all nodes
- 3 control plane nodes for HA etcd (tolerates 1 failure)
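
The rules above translate into three argument sets, one per node role. The sketch below assembles them; `k3s_args` and its role names are hypothetical, and the commands actually used are the ones in k3s/README.md. Traefik is disabled only in the server variants because `--disable` is a server option.

```sh
#!/bin/sh
# Sketch: assemble k3s arguments per role from the rules listed above.
# Hypothetical helper; k3s/README.md holds the real install commands.

k3s_args() {
    role=$1          # first-server | server | agent
    wg_ip=$2         # this node's WireGuard IP
    join_ip=${3:-}   # WG IP of an existing control plane node (unused for first-server)

    common="--flannel-iface=wg0 --node-ip=$wg_ip"
    case "$role" in
        first-server) echo "server --cluster-init --disable=traefik $common" ;;
        server)       echo "server --server https://$join_ip:6443 --disable=traefik $common" ;;
        agent)        echo "agent --server https://$join_ip:6443 $common" ;;
        *)            echo "unknown role: $role" >&2; return 1 ;;
    esac
}
```

For example, `k3s_args agent 10.0.0.7 10.0.0.6` prints `agent --server https://10.0.0.6:6443 --flannel-iface=wg0 --node-ip=10.0.0.7`. Joining nodes also need the cluster token, for instance through the `K3S_TOKEN` environment variable, which is left out of this sketch.
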
## Running Services
| Service | NodePort | Domain | Namespace |
|---|---|---|---|
| ghost1 | 32368 | — | fulfillment |
| ghost2 | 32369 | — | fulfillment |
| ghost3 | 32370 | — | fulfillment |
| forgejo | 32371 | git.sjasoft.com | sjasoft |
| postgres | ClusterIP:5432 | — | default |
| mariadb | ClusterIP:3306 | — | default |
| authentik-server | — | — | default |
| authentik-worker | — | — | default |
| n8n | — | — | default |
| listmonk | — | — | default |
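
Because Caddy on the hub forwards to a node's WG IP plus the service's NodePort, each public domain needs one site block. The sketch below generates one; `caddy_site` is a hypothetical helper, and the upstream nodes in the example are picked arbitrarily from the node table, not taken from the hub's real Caddyfile.

```sh
#!/bin/sh
# Sketch: print a Caddyfile site block that proxies a domain to a NodePort
# on one or more nodes. Arguments: <domain> <nodeport> <wg-ip>...

caddy_site() {
    domain=$1
    port=$2
    shift 2

    upstreams=""
    for ip in "$@"; do
        upstreams="$upstreams $ip:$port"    # every node serves the NodePort
    done

    printf '%s {\n\treverse_proxy%s\n}\n' "$domain" "$upstreams"
}
```

Calling `caddy_site git.sjasoft.com 32371 10.0.0.7 10.0.0.9` prints a block for forgejo with two upstreams; listing more than one lets Caddy spread requests across them.
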
## Remaining Services to Deploy
- vaultwarden.yml — passwords (ACTIVE)
- mattermost.yml — chat (ACTIVE)
- nats.yml — messaging
- monerod.yml — monero node
- snikket.yml — XMPP
- synapse.yml — Matrix
## Manifests
All in Knowledge/repos/homelab/k3s/:
- k3s/postgres/postgres.yaml
- k3s/mariadb/mariadb.yaml
- k3s/ghost/ghost.yaml
- k3s/forgejo/forgejo.yaml
- k3s/README.md (authoritative WG mesh table + K3s install commands)
## Known Issues / Notes
- Tailscale/Headscale abandoned — unreliable, randomly drops nodes, requires manual reconnect
- WireGuard full mesh is the correct approach for K3s cluster networking
- kubectl needs `export KUBECONFIG=~/.kube/config` in `~/.bashrc` on control nodes
- Secrets are namespace-scoped: a pod cannot reference a Secret from another namespace, so keep each Secret in the same namespace as its consumer
- game node only has 16GB RAM — allocate worker VMs conservatively
- game-ssd is only 256GB NVMe — keep disk allocations conservative on game-worker-ssd
- Templates should be destroyed after all clones are complete on each node
- fat_mama macvtap: workstation host cannot directly ping/SSH to fat_mama; reachable from rest of LAN and via WireGuard at 10.0.0.13; SSH from pve-control or other LAN machines works fine
- fat_mama disk image at /var/lib/libvirt/images/fat_mama.qcow2 on workstation