homelab/k3s/README.md
2026-04-18 18:28:55 -04:00

# K3s Cluster — Setup & Deployment Notes
This is the production cluster running on Proxmox VMs, connected via WireGuard hub-and-spoke.
The VirtualBox learning cluster this replaced is retired.
---
## WireGuard Mesh — Node Assignments
Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1/24
| Node | vmbr1 IP | WG IP | Proxmox Host |
|---|---|---|---|
| pve-control | 10.10.10.151 | 10.0.0.6 | pve |
| pve-worker | 10.10.10.126 | 10.0.0.7 | pve |
| adder-control | 10.10.10.185 | 10.0.0.8 | adder |
| adder-worker | 10.10.10.83 | 10.0.0.9 | adder |
| game-control | 10.10.10.158 | 10.0.0.10 | game |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game |
| fat_mama | 192.168.40.220 | 10.0.0.13 | workstation (VBox, bridged LAN) |
IPs 10.0.0.2 through 10.0.0.5 are reserved (old VirtualBox K3s nodes; leave them alone).
All VMs are Debian Trixie on vmbr1 (10.10.10.0/24). Inter-node traffic runs over WireGuard (10.0.0.0/24).
---
## K3s Install
### Prerequisites — each VM must be on the WireGuard mesh first
WireGuard is configured via wg0.conf on each node (hub-and-spoke through DO droplet).
Verify connectivity: `ping 10.0.0.1` from the node.
### First control plane node (cluster init)
```bash
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--cluster-init --disable traefik \
--node-ip=<10.0.0.x> --flannel-iface=wg0" sh -
# Get token for other nodes to join
sudo cat /var/lib/rancher/k3s/server/node-token
```
### Second and third control plane nodes
```bash
curl -sfL https://get.k3s.io | K3S_TOKEN=<token> \
INSTALL_K3S_EXEC="server --server https://<control-1-mesh-ip>:6443 --disable traefik \
--node-ip=<this-node-mesh-ip> --flannel-iface=wg0" sh -
```
Note: pass the `server` subcommand together with `--server`, and leave `K3S_URL` unset. When `K3S_URL` is set and no subcommand is given, the install script defaults to `agent`, so the node joins as a worker instead of a control plane peer.
etcd needs a majority of members for quorum, so use an odd number of control nodes: 3 tolerate 1 failure, while 2 tolerate none. Never stop at 2.
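The quorum arithmetic behind that rule can be sketched in a few lines of shell. The function names `etcd_quorum` and `etcd_tolerance` are hypothetical helpers for illustration, not part of K3s or etcd.

```bash
# Quorum is a strict majority of members: floor(n/2) + 1.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Failures tolerated = members minus quorum.
etcd_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

# Print the trade-off for small control plane sizes.
for n in 1 2 3 4 5; do
  printf '%s members: quorum %s, tolerates %s failure(s)\n' \
    "$n" "$(etcd_quorum "$n")" "$(etcd_tolerance "$n")"
done
```

Going from 3 to 4 members raises the quorum without raising the tolerance, which is why even sizes add cost but no resilience.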
### Workers
```bash
curl -sfL https://get.k3s.io | K3S_URL=https://<any-control-mesh-ip>:6443 K3S_TOKEN=<token> \
INSTALL_K3S_EXEC="--node-ip=<this-node-mesh-ip> --flannel-iface=wg0" sh -
```
### kubeconfig for normal user (on any control node)
```bash
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown "$(id -u):$(id -g)" ~/.kube/config # current user, e.g. samantha:samantha
export KUBECONFIG=~/.kube/config # also add to ~/.bashrc
# Update server IP in config if needed:
sed -i 's/127.0.0.1/<control-1-mesh-ip>/' ~/.kube/config
```
### Label workers
```bash
kubectl label node <name> node-role.kubernetes.io/worker=worker
```
---
## GPU Worker Nodes — adder and game
Both Proxmox hosts `adder` and `game` have RTX 2070 GPUs available for PCIe passthrough.
### Proxmox PCIe passthrough setup (on each Proxmox host)
```bash
# Enable IOMMU: in /etc/default/grub set
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# (use amd_iommu=on for AMD hosts), then apply and reboot:
update-grub
reboot
# Blacklist nvidia drivers on host so GPU is free for passthrough:
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u
reboot
```
In Proxmox UI: VM Hardware → Add → PCI Device → select the RTX 2070 → check "All Functions" and "Primary GPU" if it is the only GPU.
### Inside the GPU worker VM — install NVIDIA drivers
```bash
apt-get install -y linux-headers-$(uname -r)
# Add non-free repo if needed:
apt-get install -y nvidia-driver firmware-misc-nonfree
reboot
# Verify:
nvidia-smi
```
### Install NVIDIA device plugin in K3s
```bash
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
```
### Label GPU nodes
```bash
kubectl label node k3s-adder nvidia.com/gpu=true
kubectl label node k3s-game nvidia.com/gpu=true
```
### Verify GPU is schedulable
```bash
kubectl get nodes -o json | jq '.items[].status.capacity'
# Should show nvidia.com/gpu: "1" on adder and game
```
### Scheduling a workload to a GPU node
```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```
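To show where that `resources` fragment sits in a full spec, here is a minimal sketch of a one-shot test pod. The function name `gpu_pod_manifest`, the pod name, and the image argument are all hypothetical. Depending on how the container runtime is set up in the VM, a `runtimeClassName` may also be needed; that is an assumption these notes do not confirm.

```bash
# gpu_pod_manifest POD_NAME IMAGE: print a Pod manifest that requests one GPU.
# POD_NAME and IMAGE are placeholders supplied by the caller.
gpu_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: $2
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}

# Print the manifest; pipe it to `kubectl apply -f -` to actually schedule it.
gpu_pod_manifest gpu-smoke-test "<cuda-image>"
```

If the pod stays `Pending`, the node is probably not advertising `nvidia.com/gpu` in its capacity yet.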
---
## Namespaces — one per venture
```bash
kubectl create namespace sjasoft
kubectl create namespace fulfillment
kubectl create namespace privacy-practice
```
Secrets are always created per namespace — never share secrets across namespaces.
---
## Secrets
Real secret values are never stored in files. Always create secrets directly on a control node.
```bash
# Pattern — adapt per service and namespace
kubectl create secret generic <name> \
  --namespace <namespace> \
  --from-literal=<key>='<value>'
# Generate passwords with:
openssl rand -base64 24
```
---
## NodePort Registry
NodePorts must be unique across the entire cluster (range 30000-32767).
Any NodePort is reachable on any node's WireGuard IP — K3s routes internally.
Caddy on each venture ingress VPS proxies to any node's WG IP + NodePort.
| Port | Service | Notes |
|---|---|---|
| 32368 | ghost1 | blog.the-fulfillment.org |
| 32369 | ghost2 | blog.privacy-practice.com |
| 32370 | ghost3 | blog.sjasoft.com |
| 32371 | forgejo | git.sjasoft.com |
| 32372 | authentik (HTTP) | auth.sjasoft.com — use this behind Caddy |
| 32373 | authentik (HTTPS) | skip — Caddy handles TLS |
| 32374 | mattermost | planned |
| 32375 | listmonk | deployed |
| 32376 | n8n | deployed |
| 32377 | vaultwarden | planned |
| 32379 | monerod (RPC) | planned |
| 32380 | monerod (P2P) | planned |
| 32381 | snikket (HTTP) | planned |
| 32382 | snikket (C2S) | planned |
| 32383 | snikket (S2S) | planned |
| 32384 | snikket (proxy65) | planned |
| 32385 | synapse | planned |
| 32386 | nats (client) | planned |
| 32387 | nats (websocket) | planned |
| 32388 | nats (monitoring) | planned |
| 32389 | nats (leafnode) | planned |
| 32390 | garage (S3 API) | deployed |
| 32391 | garage-webui | deployed |
| 32392 | mediawiki | deployed |
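Since the registry is maintained by hand, a small check can catch duplicate or out-of-range ports before a manifest is applied. This is a sketch only: `check_nodeports` is a hypothetical helper, it needs bash 4 or newer for associative arrays, and it expects `port service` pairs on stdin rather than parsing the table itself.

```bash
# check_nodeports: read "port service" pairs on stdin, report problems,
# and return non-zero if any port is duplicated or outside 30000-32767.
check_nodeports() {
  local port name rc=0
  local -A seen=()
  while read -r port name; do
    [ -z "$port" ] && continue
    if (( port < 30000 || port > 32767 )); then
      echo "out of range: $port ($name)"
      rc=1
    fi
    if [ -n "${seen[$port]:-}" ]; then
      echo "duplicate: $port (${seen[$port]} and $name)"
      rc=1
    fi
    seen[$port]=$name
  done
  return "$rc"
}

# Example run against a few entries from the registry.
check_nodeports <<'EOF' && echo "NodePort registry OK"
32368 ghost1
32369 ghost2
32371 forgejo
EOF
```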
---
## Caddy Pattern — venture ingress VPS
Each venture has its own ingress VPS with its own public IP. Caddy on each proxies
to a different node's mesh IP for the same cluster — ventures look unrelated from outside.
```
# Example — any node's WG IP works for any NodePort
blog.the-fulfillment.org {
    reverse_proxy 10.0.0.6:32368
}
git.sjasoft.com {
    reverse_proxy 10.0.0.8:32371
}
auth.sjasoft.com {
    reverse_proxy 10.0.0.10:32372
}
```
Any node's WG IP works for any service, so give each venture a different node.
See the WireGuard mesh table above for IPs.
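Because every site block has the same shape, the entries can be generated from domain, node IP, and NodePort. A minimal sketch follows; `caddy_site` is a hypothetical helper, not something Caddy provides.

```bash
# caddy_site DOMAIN NODE_WG_IP NODEPORT: print one Caddyfile site block.
caddy_site() {
  printf '%s {\n    reverse_proxy %s:%s\n}\n' "$1" "$2" "$3"
}

# Values taken from the mesh table and the NodePort registry.
caddy_site blog.the-fulfillment.org 10.0.0.6 32368
caddy_site git.sjasoft.com 10.0.0.8 32371
```

Append the output to the Caddyfile on the relevant ingress VPS and reload Caddy there.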
---
## Current Deployment Status (2026-04-16)
K3s v1.34.6 cluster fully operational. WireGuard full mesh (direct peer-to-peer over vmbr1,
hub for external traffic). Headscale removed — too buggy (0.28.x dropped nodes randomly).
### Cluster Nodes
| Node | Role | WG IP | Proxmox Host | Resources |
|---|---|---|---|---|
| pve-control | control-plane, etcd | 10.0.0.6 | pve | 2 CPU, 2GB RAM, 20GB |
| pve-worker | worker | 10.0.0.7 | pve | 8 CPU, 58GB RAM, 3.3TB |
| adder-control | control-plane, etcd | 10.0.0.8 | adder | 2 CPU, 2GB RAM, 20GB |
| adder-worker | worker | 10.0.0.9 | adder | 10 CPU, 58GB RAM, 1.7TB |
| game-control | control-plane, etcd | 10.0.0.10 | game | 2 CPU, 2GB RAM, 20GB |
| game-worker-hdd | worker | 10.0.0.11 | game | 4 CPU, 6GB RAM, 1.4TB HDD |
| game-worker-ssd | worker | 10.0.0.12 | game | 10 CPU, 8GB RAM, 200GB SSD |
| fat_mama | worker | 10.0.0.13 | workstation (VBox) | 20 CPU, 21GB RAM, 200GB |
### Running Services
Scheduler-assigned node in parens reflects current placement (unpinned services may
move on restart). Pinned services have `nodeName` in their manifest.
| Service | Node | NodePort | Domain | Status |
|---|---|---|---|---|
| postgres:16 | pve-worker (pinned) | ClusterIP | — | running |
| mariadb:11 | adder-worker (pinned) | ClusterIP | — | running |
| ghost1 | unpinned (game-worker-ssd) | 32368 | blog.the-fulfillment.org | running |
| ghost2 | unpinned (pve-worker) | 32369 | blog.privacy-practice.com | running |
| ghost3 | unpinned (adder-worker) | 32370 | blog.sjasoft.com | running |
| forgejo:9 | unpinned (pve-worker) | 32371 | git.sjasoft.com | running |
| authentik server | unpinned (adder-worker) | 32372 | auth.sjasoft.com | running |
| authentik worker | unpinned (adder-worker) | — | — | running |
| listmonk | unpinned (pve-worker) | 32375 | — | running |
| n8n | unpinned (game-worker-ssd) | 32376 | — | running |
### Remaining Services to Deploy
nats, vaultwarden, synapse, snikket, monerod, mattermost
### Next Steps
- Add VirtualBox workstation VMs as workers to this cluster
- Wire up remaining Ghost blogs in Caddy
- Deploy remaining services from k3s/ manifests
### Install Method
K3s was installed using `/etc/rancher/k3s/config.yaml` on each node (not INSTALL_K3S_EXEC env vars,
which get lost in nested SSH). Binary was downloaded once to pve and distributed via scp.
Use `INSTALL_K3S_SKIP_DOWNLOAD=true` when binary is pre-staged.
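The actual config files are not reproduced in these notes, so the following is only a sketch of what they might contain. It assumes the keys mirror the CLI flags used in the install section above. `k3s_config` is a hypothetical helper, and the join token is left out on purpose; supply it separately, for example through `K3S_TOKEN` or a `token-file` entry.

```bash
# k3s_config ROLE NODE_IP [SERVER_URL]: print a config.yaml for one node.
# ROLE is init (first control node), server (joining control node), or agent.
k3s_config() {
  local role=$1 node_ip=$2 server_url=${3:-}
  echo "node-ip: $node_ip"
  echo "flannel-iface: wg0"
  case "$role" in
    init)         echo "cluster-init: true" ;;
    server|agent) echo "server: $server_url" ;;
    *)            echo "unknown role: $role" >&2; return 1 ;;
  esac
  # Only control plane nodes accept the disable option.
  if [ "$role" != agent ]; then
    printf 'disable:\n  - traefik\n'
  fi
}

# Print the config for the first control node.
k3s_config init 10.0.0.6

# Hypothetical usage on that node, with the binary already staged:
#   sudo mkdir -p /etc/rancher/k3s
#   k3s_config init 10.0.0.6 | sudo tee /etc/rancher/k3s/config.yaml
#   curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_DOWNLOAD=true sh -s - server
```

For workers, pass `agent` instead of `server` as the final argument so the install script does not have to guess the role.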