# K3s Cluster — Setup & Deployment Notes

This is the production cluster running on Proxmox VMs, connected via WireGuard hub-and-spoke.
The VirtualBox learning cluster this replaced is retired.

---

## WireGuard Mesh — Node Assignments

Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1/24

| Node | vmbr1 IP | WG IP | Proxmox Host |
|---|---|---|---|
| pve-control | 10.10.10.151 | 10.0.0.6 | pve |
| pve-worker | 10.10.10.126 | 10.0.0.7 | pve |
| adder-control | 10.10.10.185 | 10.0.0.8 | adder |
| adder-worker | 10.10.10.83 | 10.0.0.9 | adder |
| game-control | 10.10.10.158 | 10.0.0.10 | game |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game |
| fat_mama | 192.168.40.220 | 10.0.0.13 | workstation (VBox, bridged LAN) |

IPs 10.0.0.2–10.0.0.5 are reserved (old VirtualBox K3s nodes, leave alone).

All Proxmox VMs run Debian Trixie on vmbr1 (10.10.10.0/24); fat_mama is the exception, a VirtualBox VM bridged to the workstation LAN. Inter-node traffic runs over WireGuard (10.0.0.0/24).

---

## K3s Install

### Prerequisites — each VM must be on the WireGuard mesh first

WireGuard is configured via wg0.conf on each node (hub-and-spoke through DO droplet).
Verify connectivity: `ping 10.0.0.1` from the node.
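
As an illustration of the spoke side, here is a minimal sketch of a helper that prints a `wg0.conf` for one node, assuming the hub-and-spoke layout described above. The hub endpoint and the 10.0.0.0/24 range come from the node table; the function name `render_wg_conf`, the key arguments, and the `PersistentKeepalive` value are assumptions, not copied from the real node configs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a spoke wg0.conf to stdout.
# Usage: render_wg_conf <node-wg-ip> <node-private-key> <hub-public-key>
render_wg_conf() {
  local wg_ip="$1" private_key="$2" hub_pubkey="$3"
  cat <<EOF
[Interface]
Address = ${wg_ip}/24
PrivateKey = ${private_key}

[Peer]
# Hub (DO droplet): the whole mesh range is routed through it
PublicKey = ${hub_pubkey}
Endpoint = 138.197.87.251:51820
AllowedIPs = 10.0.0.0/24
# Assumed value: keeps NAT mappings open for nodes behind a router
PersistentKeepalive = 25
EOF
}
```

One possible use is `render_wg_conf 10.0.0.6 "$(cat privatekey)" '<hub-public-key>' | sudo tee /etc/wireguard/wg0.conf`, followed by `sudo wg-quick up wg0` and the ping check above. The `/etc/wireguard/` path is the usual wg-quick location and is assumed here.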

### First control plane node (cluster init)
```bash
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--cluster-init --disable traefik \
  --node-ip=<10.0.0.x> --flannel-iface=wg0" sh -

# Get token for other nodes to join
sudo cat /var/lib/rancher/k3s/server/node-token
```

### Second and third control plane nodes
```bash
# No K3S_URL here: with K3S_URL set, the install script defaults to `k3s agent`
curl -sfL https://get.k3s.io | K3S_TOKEN=<token> \
  INSTALL_K3S_EXEC="server --server https://<control-1-mesh-ip>:6443 --disable traefik \
  --node-ip=<this-node-mesh-ip> --flannel-iface=wg0" sh -
```

Note: use the `server` subcommand with `--server`, not `K3S_URL`. That is what makes the node a control plane peer instead of a worker.
etcd needs a majority of its members (quorum) to stay up, so use an odd number of control nodes: 3 tolerate 1 failure, while 2 tolerate none. Never stop at 2.
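
The quorum arithmetic behind that rule can be sketched in a few lines of shell. The function names are illustrative; the formula, quorum = floor(n/2) + 1, is the standard majority rule.

```bash
#!/usr/bin/env bash
# Majority needed for an etcd cluster of n members: floor(n/2) + 1
etcd_quorum() {
  local n="$1"
  echo $(( n / 2 + 1 ))
}

# Members that can fail while quorum is still met: n - quorum
etcd_fault_tolerance() {
  local n="$1"
  echo $(( n - (n / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  printf '%d control nodes: quorum %d, tolerates %d failure(s)\n' \
    "$n" "$(etcd_quorum "$n")" "$(etcd_fault_tolerance "$n")"
done
```

Two nodes tolerate no more failures than one, and four no more than three, which is why the control plane count is kept odd.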

### Workers
```bash
curl -sfL https://get.k3s.io | K3S_URL=https://<any-control-mesh-ip>:6443 K3S_TOKEN=<token> \
  INSTALL_K3S_EXEC="--node-ip=<this-node-mesh-ip> --flannel-iface=wg0" sh -
```

### kubeconfig for normal user (on any control node)
```bash
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown samantha:samantha ~/.kube/config
export KUBECONFIG=~/.kube/config   # also add to ~/.bashrc
# Update server IP in config if needed:
sed -i 's/127.0.0.1/<control-1-mesh-ip>/' ~/.kube/config
```

### Label workers
```bash
kubectl label node <name> node-role.kubernetes.io/worker=worker
```

---

## GPU Worker Nodes — adder and game

Both Proxmox hosts `adder` and `game` have RTX 2070 GPUs available for PCIe passthrough.

### Proxmox PCIe passthrough setup (on each Proxmox host)
```bash
# Enable IOMMU: edit /etc/default/grub so it contains this line
# (file content, not a shell command):
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# (use amd_iommu=on for AMD hosts)
update-grub
reboot

# Blacklist nvidia drivers on host so GPU is free for passthrough:
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u
reboot
```

In Proxmox UI: VM Hardware → Add → PCI Device → select the RTX 2070 → check "All Functions" and "Primary GPU" if it is the only GPU.

### Inside the GPU worker VM — install NVIDIA drivers
```bash
apt-get install -y linux-headers-$(uname -r)
# Enable the contrib, non-free and non-free-firmware APT components first if they are not already on:
apt-get install -y nvidia-driver firmware-misc-nonfree
reboot
# Verify:
nvidia-smi
```

### Install NVIDIA device plugin in K3s
```bash
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
```

### Label GPU nodes
```bash
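# Node names must match `kubectl get nodes` (the GPU workers on adder and game)
kubectl label node <gpu-worker-on-adder> nvidia.com/gpu=true
kubectl label node <gpu-worker-on-game> nvidia.com/gpu=true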
```

### Verify GPU is schedulable
```bash
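# Print each node name with its GPU capacity (0 when the node exposes none)
kubectl get nodes -o json | jq -r '.items[] | "\(.metadata.name)\t\(.status.capacity["nvidia.com/gpu"] // "0")"'
# The GPU workers on adder and game should show 1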
```

### Scheduling a workload to a GPU node
```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```
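
The fragment above only shows the resource limit. As a rough sketch of a complete test Pod, the helper below prints a manifest that requests one GPU and selects nodes carrying the `nvidia.com/gpu=true` label from the labeling step. The function name, the Pod layout, and the commented-out `runtimeClassName` are assumptions; the image is passed in because this README does not name one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a one-shot GPU test Pod manifest to stdout.
# Usage: render_gpu_pod <pod-name> <image>
render_gpu_pod() {
  local name="$1" image="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  # Assumption: uncomment if the cluster uses a dedicated NVIDIA RuntimeClass
  # runtimeClassName: nvidia
  nodeSelector:
    nvidia.com/gpu: "true"    # label applied in "Label GPU nodes"
  containers:
    - name: cuda
      image: ${image}
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # one GPU from the device plugin
EOF
}
```

For example, `render_gpu_pod gpu-smoke-test <cuda-image> | kubectl apply -f -`, then `kubectl logs gpu-smoke-test` should print the same table `nvidia-smi` shows inside the VM, provided the image ships `nvidia-smi`.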

---

## Namespaces — one per venture

```bash
kubectl create namespace sjasoft
kubectl create namespace fulfillment
kubectl create namespace privacy-practice
```

Secrets are always created per namespace — never share secrets across namespaces.

---

## Secrets

Never stored in files with real values. Always create directly on a control node.

```bash
# Pattern — adapt per service and namespace
kubectl create secret generic <name> \
  --namespace <namespace> \
  --from-literal=<key>='<value>'

# Generate passwords with:
openssl rand -base64 24
```
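
The two steps can be combined so the generated value is never typed or saved. This is a minimal sketch; the function names are hypothetical and the single-key layout is an assumption about how a service reads its secret.

```bash
#!/usr/bin/env bash
# Generate a 32-character password (24 random bytes, base64-encoded)
gen_password() {
  openssl rand -base64 24
}

# Hypothetical helper: create a single-key secret holding a fresh password.
# The value is generated at run time, so it never lands in a file or in
# shell history.
# Usage: create_password_secret <name> <namespace> <key>
create_password_secret() {
  local name="$1" namespace="$2" key="$3"
  kubectl create secret generic "$name" \
    --namespace "$namespace" \
    --from-literal="$key=$(gen_password)"
}
```

A call such as `create_password_secret <name> <namespace> password` creates the secret. To read the value back later, `kubectl get secret <name> -n <namespace> -o jsonpath='{.data.password}' | base64 -d` works on a control node.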

---

## NodePort Registry

NodePorts must be unique across the entire cluster (range 30000-32767).
Any NodePort is reachable on any node's WireGuard IP; kube-proxy forwards the traffic to whichever node runs the pod.
Caddy on each venture ingress VPS proxies to any node's WG IP + NodePort.

| Port | Service | Notes |
|---|---|---|
| 32368 | ghost1 | blog.the-fulfillment.org |
| 32369 | ghost2 | blog.privacy-practice.com |
| 32370 | ghost3 | blog.sjasoft.com |
| 32371 | forgejo | git.sjasoft.com |
| 32372 | authentik (HTTP) | auth.sjasoft.com — use this behind Caddy |
| 32373 | authentik (HTTPS) | skip — Caddy handles TLS |
| 32374 | mattermost | planned |
| 32375 | listmonk | deployed |
| 32376 | n8n | deployed |
| 32377 | vaultwarden | planned |
| 32379 | monerod (RPC) | planned |
| 32380 | monerod (P2P) | planned |
| 32381 | snikket (HTTP) | planned |
| 32382 | snikket (C2S) | planned |
| 32383 | snikket (S2S) | planned |
| 32384 | snikket (proxy65) | planned |
| 32385 | synapse | planned |
| 32386 | nats (client) | planned |
| 32387 | nats (websocket) | planned |
| 32388 | nats (monitoring) | planned |
| 32389 | nats (leafnode) | planned |
| 32390 | garage (S3 API) | deployed |
| 32391 | garage-webui | deployed |
| 32392 | mediawiki | deployed |

---

## Caddy Pattern — venture ingress VPS

Each venture has its own ingress VPS with its own public IP. Caddy on each VPS proxies over
WireGuard to a node's mesh IP in the same cluster, so the ventures look unrelated from outside.

```
# Example — any node's WG IP works for any NodePort
blog.the-fulfillment.org {
    reverse_proxy 10.0.0.6:32368
}

git.sjasoft.com {
    reverse_proxy 10.0.0.8:32371
}

auth.sjasoft.com {
    reverse_proxy 10.0.0.10:32372
}
```

Pick any node's WG IP per service; they all work. Prefer a different node for each venture.
See the WireGuard mesh table above for IPs.

---

## Current Deployment Status (2026-04-16)

K3s v1.34.6 cluster fully operational. WireGuard full mesh (direct peer-to-peer over vmbr1,
hub for external traffic). Headscale removed — too buggy (0.28.x dropped nodes randomly).

### Cluster Nodes

| Node | Role | WG IP | Proxmox Host | Resources |
|---|---|---|---|---|
| pve-control | control-plane, etcd | 10.0.0.6 | pve | 2 CPU, 2GB RAM, 20GB |
| pve-worker | worker | 10.0.0.7 | pve | 8 CPU, 58GB RAM, 3.3TB |
| adder-control | control-plane, etcd | 10.0.0.8 | adder | 2 CPU, 2GB RAM, 20GB |
| adder-worker | worker | 10.0.0.9 | adder | 10 CPU, 58GB RAM, 1.7TB |
| game-control | control-plane, etcd | 10.0.0.10 | game | 2 CPU, 2GB RAM, 20GB |
| game-worker-hdd | worker | 10.0.0.11 | game | 4 CPU, 6GB RAM, 1.4TB HDD |
| game-worker-ssd | worker | 10.0.0.12 | game | 10 CPU, 8GB RAM, 200GB SSD |
| fat_mama | worker | 10.0.0.13 | workstation (VBox) | 20 CPU, 21GB RAM, 200GB |

### Running Services

Scheduler-assigned node in parens reflects current placement (unpinned services may
move on restart). Pinned services have `nodeName` in their manifest.

| Service | Node | NodePort | Domain | Status |
|---|---|---|---|---|
| postgres:16 | pve-worker (pinned) | ClusterIP | — | running |
| mariadb:11 | adder-worker (pinned) | ClusterIP | — | running |
| ghost1 | unpinned (game-worker-ssd) | 32368 | blog.the-fulfillment.org | running |
| ghost2 | unpinned (pve-worker) | 32369 | blog.privacy-practice.com | running |
| ghost3 | unpinned (adder-worker) | 32370 | blog.sjasoft.com | running |
| forgejo:9 | unpinned (pve-worker) | 32371 | git.sjasoft.com | running |
| authentik server | unpinned (adder-worker) | 32372 | auth.sjasoft.com | running |
| authentik worker | unpinned (adder-worker) | — | — | running |
| listmonk | unpinned (pve-worker) | 32375 | — | running |
| n8n | unpinned (game-worker-ssd) | 32376 | — | running |

### Remaining Services to Deploy

nats, vaultwarden, synapse, snikket, monerod, mattermost

### Next Steps

- Add VirtualBox workstation VMs as workers to this cluster
- Wire up remaining Ghost blogs in Caddy
- Deploy remaining services from k3s/ manifests

### Install Method

K3s was installed using `/etc/rancher/k3s/config.yaml` on each node (not INSTALL_K3S_EXEC env vars,
which get lost in nested SSH). Binary was downloaded once to pve and distributed via scp.
Use `INSTALL_K3S_SKIP_DOWNLOAD=true` when binary is pre-staged.
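
As a sketch of what such a `config.yaml` could look like for the control plane nodes, the helper below prints the file-based equivalent of the flags used in the K3s Install section. The function name is hypothetical and the real files on the nodes may contain more keys.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a K3s server config.yaml to stdout.
# Usage: render_k3s_server_config <node-wg-ip> [<first-server-url> <token>]
# One argument renders the cluster-init node; three render a joining server.
render_k3s_server_config() {
  local node_ip="$1" server_url="${2:-}" token="${3:-}"
  if [ -z "$server_url" ]; then
    echo "cluster-init: true"
  else
    echo "server: ${server_url}"
    echo "token: ${token}"
  fi
  cat <<EOF
disable:
  - traefik
node-ip: ${node_ip}
flannel-iface: wg0
EOF
}
```

On the first control node this might be used as `render_k3s_server_config 10.0.0.6 | sudo tee /etc/rancher/k3s/config.yaml`, followed by running a saved copy of the install script with `INSTALL_K3S_SKIP_DOWNLOAD=true`. The script then finds the pre-staged binary and K3s reads its options from the file, so nothing depends on environment variables surviving nested SSH.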