Replaced Headscale (too buggy in 0.28.x — random node drops) with direct WireGuard hub-and-spoke + full mesh. 7 Proxmox VMs across 3 hosts form a K3s v1.34.6 cluster: 3 control-plane/etcd nodes, 4 workers. Running services: postgres, mariadb, ghost (x3), forgejo, authentik. All unpinned services use local-path StorageClass. Databases pinned to pve-worker and adder-worker with local PVs. Includes VM provisioning scripts (create-debian-template.sh, clone-vm.sh), K3s manifests for all services, and full deployment docs in k3s/README.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# K3s Cluster: Setup & Deployment Notes

This is the production cluster running on Proxmox VMs, connected via WireGuard: a full mesh between the nodes plus a hub for external traffic.
The VirtualBox learning cluster this replaced is retired.

---
## WireGuard Mesh: Node Assignments

Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1/24

| Node | vmbr1 IP | WG IP | Proxmox Host |
|---|---|---|---|
| pve-control | 10.10.10.151 | 10.0.0.6 | pve |
| pve-worker | 10.10.10.126 | 10.0.0.7 | pve |
| adder-control | 10.10.10.185 | 10.0.0.8 | adder |
| adder-worker | 10.10.10.83 | 10.0.0.9 | adder |
| game-control | 10.10.10.158 | 10.0.0.10 | game |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game |

IPs 10.0.0.2–10.0.0.5 are reserved (old VirtualBox K3s nodes, leave alone).

All VMs are Debian Trixie on vmbr1 (10.10.10.0/24). Inter-node traffic runs over WireGuard (10.0.0.0/24).

---
## K3s Install

### Prerequisites: each VM must be on the WireGuard mesh first

WireGuard is configured via `wg0.conf` on each node (hub-and-spoke through the DO droplet).
Verify connectivity: `ping 10.0.0.1` from the node.

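
The single ping above can be extended to every peer in the node table. This is a minimal sketch, assuming the WG IPs listed in this document and a `ping` that accepts `-c` and `-W`. The function names `mesh_ips` and `check_mesh` are illustrative and not part of any tool.

```bash
# WG IPs from the node table: the hub first, then the seven cluster nodes.
mesh_ips() {
  printf '%s\n' 10.0.0.1 10.0.0.6 10.0.0.7 10.0.0.8 10.0.0.9 10.0.0.10 10.0.0.11 10.0.0.12
}

# Ping each mesh IP once (2 second timeout) and report the result per peer.
# Returns non-zero if any peer did not answer.
check_mesh() {
  local ip failed=0
  for ip in $(mesh_ips); do
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
      echo "ok   $ip"
    else
      echo "FAIL $ip"
      failed=1
    fi
  done
  return "$failed"
}
```

Running `check_mesh` on a node before installing K3s shows at a glance which peers are unreachable.
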
### First control plane node (cluster init)

```bash
# No K3S_URL is set, so the install script defaults to the `server` command.
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--cluster-init --disable traefik \
  --node-ip=<10.0.0.x> --flannel-iface=wg0" sh -

# Get token for other nodes to join
sudo cat /var/lib/rancher/k3s/server/node-token
```

### Second and third control plane nodes

```bash
# K3S_URL is deliberately not set: when it is set and no subcommand is given,
# the install script defaults to `agent`. Name `server` explicitly instead.
curl -sfL https://get.k3s.io | K3S_TOKEN=<token> \
  INSTALL_K3S_EXEC="server --server https://<control-1-mesh-ip>:6443 --disable traefik \
  --node-ip=<this-node-mesh-ip> --flannel-iface=wg0" sh -
```

Note: pass `server --server <url>`, not just `K3S_URL`. With `K3S_URL` set and no explicit subcommand, the install script sets the node up as an agent (worker) rather than a control plane peer.
etcd needs a majority of its members (quorum) to stay writable, so run an odd number of control nodes: 3 tolerate 1 failure, while 2 tolerate none. Never stop at 2.

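
The quorum arithmetic behind that rule can be worked out directly. In this small sketch the function name `etcd_fault_tolerance` is illustrative. Quorum is `floor(N/2) + 1`, and the cluster survives `N - quorum` member failures.

```bash
# Number of member failures an etcd cluster of N members can survive.
etcd_fault_tolerance() {
  local members=$1
  local quorum=$(( members / 2 + 1 ))
  echo $(( members - quorum ))
}

for n in 1 2 3 4 5; do
  echo "members=$n tolerated_failures=$(etcd_fault_tolerance "$n")"
done
```

Going from 1 to 2 members, or from 3 to 4, adds no fault tolerance, which is why even member counts are avoided.
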
### Workers

```bash
curl -sfL https://get.k3s.io | K3S_URL=https://<any-control-mesh-ip>:6443 K3S_TOKEN=<token> \
  INSTALL_K3S_EXEC="--node-ip=<this-node-mesh-ip> --flannel-iface=wg0" sh -
```

### kubeconfig for normal user (on any control node)

```bash
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
# Hand the copy to the current user (e.g. samantha:samantha):
sudo chown "$(id -u):$(id -g)" ~/.kube/config
export KUBECONFIG=~/.kube/config  # also add to ~/.bashrc
# Update server IP in config if needed:
sed -i 's/127\.0\.0\.1/<control-1-mesh-ip>/' ~/.kube/config
```

### Label workers

```bash
kubectl label node <name> node-role.kubernetes.io/worker=worker
```
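
To label all four workers in one pass, a loop over the worker names from the node table is enough. This sketch only prints the commands, so they can be reviewed first and then piped to `sh` on a control node. The function name `label_worker_cmds` is illustrative, and `--overwrite` is added so that re-running it does not fail on an existing label.

```bash
# Worker node names from the node table in this document.
workers="pve-worker adder-worker game-worker-hdd game-worker-ssd"

# Print one label command per worker instead of running it.
label_worker_cmds() {
  local node
  for node in $workers; do
    echo "kubectl label node $node node-role.kubernetes.io/worker=worker --overwrite"
  done
}

label_worker_cmds
```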
---

## GPU Worker Nodes: adder and game

Both Proxmox hosts `adder` and `game` have RTX 2070 GPUs available for PCIe passthrough.

### Proxmox PCIe passthrough setup (on each Proxmox host)

```bash
# Enable IOMMU: set this line in /etc/default/grub (edit the file, it is not a command):
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# (use amd_iommu=on for AMD hosts)
update-grub
reboot

# Blacklist nouveau and nvidia drivers on host so GPU is free for passthrough:
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u
reboot
```

In Proxmox UI: VM Hardware → Add → PCI Device → select the RTX 2070 → check "All Functions" and "Primary GPU" if it is the only GPU.

### Inside the GPU worker VM: install NVIDIA drivers

```bash
apt-get install -y "linux-headers-$(uname -r)"
# Enable Debian's non-free and non-free-firmware components in the APT sources first if needed:
apt-get install -y nvidia-driver firmware-misc-nonfree
reboot
# Verify:
nvidia-smi
```

### Install NVIDIA device plugin in K3s

```bash
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
```

### Label GPU nodes

```bash
# k3s-adder and k3s-game do not appear in the node tables in this document;
# substitute the names that `kubectl get nodes` shows for the GPU VMs.
kubectl label node k3s-adder nvidia.com/gpu=true
kubectl label node k3s-game nvidia.com/gpu=true
```

### Verify GPU is schedulable

```bash
kubectl get nodes -o json | jq '.items[].status.capacity'
# Should show nvidia.com/gpu: "1" on adder and game
```

### Scheduling a workload to a GPU node

```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```
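
For context, the `resources` fragment above belongs under a container in a Pod spec. The sketch below writes a complete throwaway Pod manifest. The Pod name `gpu-smoke-test`, the output path, and the CUDA image tag are assumptions for illustration, and the `nodeSelector` relies on the `nvidia.com/gpu=true` label from "Label GPU nodes". Depending on how the NVIDIA container runtime is set up in K3s, the Pod may also need a `runtimeClassName`, which this document does not cover.

```bash
# Write a test Pod manifest that requests one GPU and runs nvidia-smi once.
# Prints the path of the file it wrote.
write_gpu_test_pod() {
  local out=${1:-gpu-smoke-test.yaml}
  cat > "$out" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test            # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    nvidia.com/gpu: "true"        # label applied in "Label GPU nodes"
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed tag; any CUDA base image should do
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
  echo "$out"
}

# Usage (not run here):
#   kubectl apply -f "$(write_gpu_test_pod)"
#   kubectl logs gpu-smoke-test
```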
---

## Namespaces: one per venture

```bash
kubectl create namespace sjasoft
kubectl create namespace fulfillment
kubectl create namespace privacy-practice
```

Secrets are always created per namespace. Never share secrets across namespaces.

---

## Secrets

Secret values are never stored in files. Always create secrets directly on a control node.

```bash
# Pattern: adapt per service and namespace
kubectl create secret generic <name> \
  --namespace <namespace> \
  --from-literal=<key>='<value>'

# Generate passwords with:
openssl rand -base64 24
```
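
The two commands above can be combined so that a generated password goes straight into the secret and is never written to a file. This is a minimal sketch: the function names are illustrative, and the namespace, secret name, and key in the usage comment are placeholders.

```bash
# Generate a password: 24 random bytes, base64-encoded (32 characters).
gen_password() {
  openssl rand -base64 24
}

# Create a secret holding a freshly generated password.
# Arguments: <namespace> <secret-name> <key>
create_generated_secret() {
  local ns=$1 name=$2 key=$3
  kubectl create secret generic "$name" \
    --namespace "$ns" \
    --from-literal="$key=$(gen_password)"
}

# Usage (not run here; names are placeholders):
#   create_generated_secret sjasoft forgejo-db password
```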
---

## NodePort Registry

NodePorts must be unique across the entire cluster (range 30000-32767).
Any NodePort is reachable on any node's WireGuard IP; K3s routes the traffic internally.
Caddy on each venture ingress VPS proxies to any node's WG IP + NodePort.

| Port | Service | Notes |
|---|---|---|
| 32368 | ghost1 | blog.the-fulfillment.org |
| 32369 | ghost2 | blog.privacy-practice.com |
| 32370 | ghost3 | blog.sjasoft.com |
| 32371 | forgejo | git.sjasoft.com |
| 32372 | authentik (HTTP) | auth.sjasoft.com (use this behind Caddy) |
| 32373 | authentik (HTTPS) | skip, Caddy handles TLS |
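
Before adding a port to the registry, the uniqueness and range rules can be checked mechanically. This sketch uses the ports from the table above. The function name `check_nodeports` is illustrative, and the check covers only this hand-maintained list, not what the cluster has actually allocated.

```bash
# Ports from the registry table; append a candidate port here before deploying.
registry_ports="32368 32369 32370 32371 32372 32373"

# Succeeds only if every port is within 30000-32767 and appears exactly once.
check_nodeports() {
  local port dupes
  for port in "$@"; do
    if [ "$port" -lt 30000 ] || [ "$port" -gt 32767 ]; then
      echo "out of range: $port"
      return 1
    fi
  done
  dupes=$(printf '%s\n' "$@" | sort | uniq -d)
  if [ -n "$dupes" ]; then
    echo "duplicate: $dupes"
    return 1
  fi
  echo "ok"
}

check_nodeports $registry_ports
```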
---

## Caddy Pattern: venture ingress VPS

Each venture has its own ingress VPS with its own public IP. Caddy on each VPS proxies
to a different node's mesh IP in the same cluster, so ventures look unrelated from outside.

```
# Example: any node's WG IP works for any NodePort
blog.the-fulfillment.org {
    reverse_proxy 10.0.0.6:32368
}

git.sjasoft.com {
    reverse_proxy 10.0.0.8:32371
}

auth.sjasoft.com {
    reverse_proxy 10.0.0.10:32372
}
```

Pick any node's WG IP per service; they all work. Use different nodes per venture
so ventures look unrelated from outside. See the WireGuard mesh table above for IPs.

---
## Current Deployment Status (2026-04-07)

The K3s v1.34.6 cluster is fully operational. WireGuard runs as a full mesh (direct peer-to-peer over vmbr1,
with the hub for external traffic). Headscale was removed because it was too buggy (0.28.x dropped nodes randomly).

### Cluster Nodes

| Node | Role | WG IP | Proxmox Host | Resources |
|---|---|---|---|---|
| pve-control | control-plane, etcd | 10.0.0.6 | pve | 2 CPU, 2GB RAM, 20GB |
| pve-worker | worker | 10.0.0.7 | pve | 8 CPU, 58GB RAM, 3.3TB |
| adder-control | control-plane, etcd | 10.0.0.8 | adder | 2 CPU, 2GB RAM, 20GB |
| adder-worker | worker | 10.0.0.9 | adder | 10 CPU, 58GB RAM, 1.7TB |
| game-control | control-plane, etcd | 10.0.0.10 | game | 2 CPU, 2GB RAM, 20GB |
| game-worker-hdd | worker | 10.0.0.11 | game | 4 CPU, 6GB RAM, 1.4TB HDD |
| game-worker-ssd | worker | 10.0.0.12 | game | 10 CPU, 8GB RAM, 200GB SSD |

### Running Services

| Service | Node | NodePort | Domain | Status |
|---|---|---|---|---|
| postgres:16 | pve-worker (pinned) | ClusterIP | - | running |
| mariadb:11 | adder-worker (pinned) | ClusterIP | - | running |
| ghost1 | unpinned | 32368 | blog.the-fulfillment.org | running |
| ghost2 | unpinned | 32369 | blog.privacy-practice.com | running |
| ghost3 | unpinned | 32370 | blog.sjasoft.com | running |
| forgejo:9 | unpinned | 32371 | git.sjasoft.com | running |
| authentik server | unpinned | 32372 | auth.sjasoft.com | running |
| authentik worker | unpinned | - | - | running |

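
The table marks the databases as pinned but does not show the mechanism. One common way is a `nodeSelector` on the built-in `kubernetes.io/hostname` label. The sketch below assumes that mechanism, a Deployment named `postgres`, and the `sjasoft` namespace, none of which this document confirms. The actual manifests in `k3s/` may pin differently, for example through the node affinity of a local PV.

```bash
# Build a patch that pins a workload's pod template to one node by hostname.
pin_patch() {
  local node=$1
  printf '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"%s"}}}}}\n' "$node"
}

pin_patch pve-worker

# Usage (not run here; resource name and namespace are hypothetical):
#   kubectl -n sjasoft patch deployment postgres --patch "$(pin_patch pve-worker)"
```
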
### Remaining Services to Deploy

n8n, nats, vaultwarden, synapse, snikket, monerod

### Next Steps

- Add VirtualBox workstation VMs as workers to this cluster
- Wire up remaining Ghost blogs in Caddy
- Deploy remaining services from k3s/ manifests

### Install Method

K3s was installed using `/etc/rancher/k3s/config.yaml` on each node (not `INSTALL_K3S_EXEC` env vars,
which get lost in nested SSH). The binary was downloaded once to pve and distributed via scp.
Use `INSTALL_K3S_SKIP_DOWNLOAD=true` when the binary is pre-staged.
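
The actual `config.yaml` files are not reproduced in this document. As a rough sketch, the flags from the install commands earlier map onto config file keys of the same name. The function `render_k3s_config` and its role names (`init`, `server`, `agent`) are illustrative, not part of K3s.

```bash
# Print a K3s config.yaml for one node.
# Arguments: <role: init|server|agent> <node-wg-ip> [server-url] [token]
render_k3s_config() {
  local role=$1 node_ip=$2 server_url=${3:-} token=${4:-}
  echo "node-ip: $node_ip"
  echo "flannel-iface: wg0"
  case "$role" in
    init)    # first control plane node
      echo "cluster-init: true"
      echo "disable:"
      echo "  - traefik"
      ;;
    server)  # additional control plane node
      echo "server: $server_url"
      echo "token: $token"
      echo "disable:"
      echo "  - traefik"
      ;;
    agent)   # worker
      echo "server: $server_url"
      echo "token: $token"
      ;;
    *)
      echo "unknown role: $role" >&2
      return 1
      ;;
  esac
}

render_k3s_config init 10.0.0.6
```

The output would be written to `/etc/rancher/k3s/config.yaml` on the node before running the install script with `INSTALL_K3S_SKIP_DOWNLOAD=true` and an explicit `server` or `agent` subcommand.
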