Samantha Atkins 2026-04-17 20:33:17 -04:00
parent 7ac26e302a
commit b7c9dc81a0
15 changed files with 247 additions and 44 deletions


@ -0,0 +1,3 @@
- [SSH deployment approach](feedback_ssh_scripts.md) — stage on pve, distribute from there, don't nest SSH
- [Newsletter stack status](project_newsletter_stack.md) — Ghost+Listmonk+n8n+Postmark partially wired, Substack under consideration
- [Don't confirm memory edits](feedback_no_confirm_memory.md) — just edit memory files directly, no prompting


@ -0,0 +1,11 @@
---
name: Don't ask permission to edit memory files
description: Just edit memory files directly — they're under Claude's control, no need to confirm with user
type: feedback
---
Never prompt the user about memory file changes — no confirmation questions, no tool approval popups, nothing in the user's face. If a tool permission prompt would fire, that counts as asking.
**Why:** Memory files are Claude's workspace. Any interruption about them — whether a question or a tool approval dialog — is unwanted friction.
**How to apply:** Edit memory files silently. Don't deflect blame onto tool permissions — if it's in the user's face, it's Claude's fault.


@ -0,0 +1,14 @@
---
name: Use scripts on jump host instead of long SSH chains
description: Download binaries once and distribute, write scripts to pve and run from there instead of nested SSH commands
type: feedback
---
Don't run long nested SSH commands or download the same binary on every node separately. Instead:
- Download binaries once (on pve or locally) and scp to target nodes
- Write deployment scripts to pve and run them from there
- Avoid the weird background task temp paths — keep things simple and visible
**Why:** Long nested SSH chains are fragile (env vars get lost, quoting breaks), slow, and hard to debug. Downloading the same 60MB binary 7 times is wasteful when you can download once and distribute.
**How to apply:** When deploying to multiple nodes, stage files and scripts on the jump host (pve), then distribute from there. Prefer simple, visible approaches over clever one-liners.
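The stage-once, distribute-from-pve pattern above can be sketched as a small plan generator. This is a minimal illustration only: `plan_distribution`, the example URL, and the node names passed to it are hypothetical placeholders, not scripts from this repo. It prints the commands instead of running them so the plan stays visible before anything touches a node.

```shell
#!/bin/bash
# Sketch (assumed helper, not part of the repo): print the commands that
# download a binary once on the jump host and copy it to each target node.
set -eu

plan_distribution() {
  # $1 = download URL, $2 = destination directory on the nodes, rest = node names
  local url="$1" dest="$2"
  shift 2
  local file="/tmp/${url##*/}"   # staged path on the jump host
  echo "curl -fsSL -o $file $url"
  local node
  for node in "$@"; do
    echo "scp $file $node:$dest"
  done
}

# Example with placeholder values; pipe the output to `bash` to execute it.
plan_distribution "https://example.com/some-binary" "/tmp/" node-a node-b
```

Printing the plan first keeps quoting problems and lost environment variables out of nested SSH sessions: each emitted line is a flat, single-hop command.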


@ -0,0 +1,17 @@
---
name: Newsletter stack status and frustrations
description: Ghost+Listmonk+n8n+Postmark newsletter pipeline — partially wired, user considering Substack as alternative
type: project
---
Ghost CMS hard-codes Mailgun for newsletter sending — `bulk_email__provider: smtp` only handles transactional one-off emails (password resets, signup confirmations), NOT newsletters. Mailgun and SendGrid both rejected signup. Postmark works but account is in sandbox (under review, can only send to verified addresses).
Current stack: Ghost (blog) → n8n (webhook automation) → Listmonk (newsletter sending) → Postmark (SMTP). Ghost fires webhooks on member.added and post.published to n8n. n8n workflow for member.added is partially built — webhook trigger works, HTTP Request node to Listmonk API not yet configured.
Listmonk API user creation is confusing — no password field shown for API-type users. Admin credentials work for API access.
User is frustrated with the complexity and seriously considering Substack for newsletters. The self-hosted stack requires chaining 4 services to do what Substack does natively.
**Why:** Ghost's Mailgun lock-in is a design flaw that forces this complexity. User wants to own the stack but the overhead is high.
**How to apply:** Don't push self-hosted over Substack — respect the tradeoff. If user continues with self-hosted, minimize friction. The n8n→Listmonk integration needs finishing: HTTP Request node with Basic Auth to listmonk:9000/api/subscribers.
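For reference, the request that the unfinished n8n HTTP Request node would need to make can be approximated with `curl`. This is a sketch under assumptions: the `LISTMONK_URL`, `LISTMONK_USER` and `LISTMONK_PASS` variables, the list ID, and both helper functions are illustrative, and the JSON body follows the subscriber fields (`email`, `name`, `status`, `lists`) as I understand the Listmonk API, so check it against the running version before relying on it.

```shell
#!/bin/bash
# Sketch (assumed names): build and send a "create subscriber" request to Listmonk.
set -eu

subscriber_payload() {
  # $1 = email, $2 = name, $3 = numeric list ID
  # Note: no JSON escaping is done here, so inputs must not contain quotes.
  printf '{"email":"%s","name":"%s","status":"enabled","lists":[%s]}' "$1" "$2" "$3"
}

add_subscriber() {
  # Uses HTTP Basic Auth, as described in the note above.
  curl -fsS -u "${LISTMONK_USER}:${LISTMONK_PASS}" \
    -H 'Content-Type: application/json' \
    -d "$(subscriber_payload "$1" "$2" "$3")" \
    "${LISTMONK_URL:-http://listmonk:9000}/api/subscribers"
}

# Show the body only; call add_subscriber once credentials are exported.
subscriber_payload "reader@example.com" "Reader" 1
```

In n8n the same three pieces map onto the HTTP Request node: method POST, Basic Auth credentials, and a JSON body built from the `member.added` webhook fields.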

4
.gitignore vendored

@ -7,6 +7,7 @@
# IDE # IDE
.idea/ .idea/
.vscode/ .vscode/
.remember/
*.swp *.swp
*.swo *.swo
*~ *~
@ -15,5 +16,4 @@
.DS_Store .DS_Store
Thumbs.db Thumbs.db
# Claude Code
.claude/


@ -1,11 +1,11 @@
# K3s Session State # K3s Session State
# Saved: 2026-04-06 (end of session 3) # Saved: 2026-04-14
## Current State ## Current State
New Proxmox-based K3s cluster in progress. VirtualBox cluster retired. K3s v1.34.6 cluster fully operational on Proxmox VMs + KVM worker over WireGuard mesh.
All 7 Proxmox VMs created and on WireGuard mesh. K3s not yet installed. fat_mama migrated from VirtualBox to KVM/libvirt on workstation 2026-04-14.
Old VirtualBox services (ghost, forgejo, postgres, mariadb) still running on old cluster until migration complete. All Proxmox K3s VMs have onboot: 1 set (fixed 2026-04-12).
## Proxmox VMs ## Proxmox VMs
@ -18,6 +18,7 @@ Old VirtualBox services (ghost, forgejo, postgres, mariadb) still running on old
| game-control | 10.10.10.158 | 10.0.0.10 | game | k3s control plane | | game-control | 10.10.10.158 | 10.0.0.10 | game | k3s control plane |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game | k3s worker (local-lvm/HDD) | | game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game | k3s worker (local-lvm/HDD) |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game | k3s worker (game-ssd/NVMe) | | game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game | k3s worker (game-ssd/NVMe) |
| fat_mama | 192.168.40.220 | 10.0.0.13 | workstation (KVM/libvirt, macvtap enp4s0) | k3s worker |
WG IPs 10.0.0.2-10.0.0.5 reserved (old VirtualBox nodes, do not reuse). WG IPs 10.0.0.2-10.0.0.5 reserved (old VirtualBox nodes, do not reuse).
Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1 Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1
@ -33,37 +34,40 @@ Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1
| game-control | 2 | 2GB | 20G | local-lvm | | game-control | 2 | 2GB | 20G | local-lvm |
| game-worker-hdd | 6 | 8GB | 200G | local-lvm (HDD) | | game-worker-hdd | 6 | 8GB | 200G | local-lvm (HDD) |
| game-worker-ssd | 10 | 8GB | 200G | game-ssd (NVMe) | | game-worker-ssd | 10 | 8GB | 200G | game-ssd (NVMe) |
| fat_mama | 12 | 20GB | 200G | /var/lib/libvirt/images (qcow2) |
## Network Architecture ## Network Architecture
- All VMs on vmbr1 (10.10.10.0/24), DHCP - Proxmox VMs on vmbr1 (10.10.10.0/24), DHCP
- fat_mama on LAN (192.168.40.0/24) via macvtap on enp4s0 — workstation host cannot directly ping/SSH to it; reachable from rest of LAN and via WireGuard at 10.0.0.13
- WireGuard mesh via DO hub — all nodes have static WG IPs (10.0.0.0/24) - WireGuard mesh via DO hub — all nodes have static WG IPs (10.0.0.0/24)
- Full mesh: all nodes have each other as explicit WireGuard peers (not just hub-and-spoke) - Full mesh: all nodes have each other as explicit WireGuard peers (not just hub-and-spoke)
- K3s will use --flannel-iface=wg0 so all cluster traffic runs over WireGuard - K3s uses --flannel-iface=wg0 so all cluster traffic runs over WireGuard
- Caddy at DO hub proxies external traffic to any node's WG IP + NodePort - Caddy at DO hub proxies external traffic to any node's WG IP + NodePort
- Tailscale/Headscale abandoned — too unreliable for cluster networking - Tailscale/Headscale abandoned — too unreliable for cluster networking
## Proxmox Host Specs ## Proxmox Host Specs
- pve: workstation i9-13900KF, 96GB RAM - pve: Meerkat NUC, 64GB RAM, 4TB NVMe
- adder: Proxmox node with RTX 2070, 4TB NVMe available - adder: Adder WS laptop, 32GB RAM, 2TB NVMe, RTX 2070
- game: Proxmox node with RTX 2070, 16GB RAM, 256GB NVMe (game-ssd) + 2TB HDD (local-lvm) - game: old gaming PC, 16GB RAM, 256GB NVMe (game-ssd) + 2TB HDD (local-lvm)
- workstation: i9-13900KF, 96GB RAM, RTX 4090, Fedora (runs fat_mama via KVM/libvirt)
## VM Provisioning ## VM Provisioning
### Template & Clone Scripts ### Template & Clone Scripts
Scripts at `~/private/Knowledge/repos/homelab/proxmox/scripts/`: Scripts at `~/private/Knowledge/repos/homelab/proxmox/scripts/`:
- `create-debian-template.sh <VMID> <NAME> [STORAGE] [BRIDGE]` - `create-debian-template.sh <VMID> <n> [STORAGE] [BRIDGE]`
- Defaults: STORAGE=local-lvm, BRIDGE=vmbr1 - Defaults: STORAGE=local-lvm, BRIDGE=vmbr1
- Bakes in: qemu-guest-agent, curl, wget, nano, rsync, htop, tmux, emacs-nox, nfs-common, tailscale - Bakes in: qemu-guest-agent, curl, wget, nano, rsync, htop, tmux, emacs-nox, nfs-common, tailscale
- Zeroes /etc/machine-id, removes /etc/ssh/ssh_host_* (Cloud-Init regenerates on first boot) - Zeroes /etc/machine-id, removes /etc/ssh/ssh_host_* (Cloud-Init regenerates on first boot)
- Does NOT create .ssh or set keys — done post-boot via qm set - Does NOT create .ssh or set keys — done post-boot via qm set
- `clone-vm.sh <TEMPLATE_VMID> <NEW_VMID> <NAME> [CORES] [MEMORY_MB] [DISK_SIZE] [STORAGE]` - `clone-vm.sh <TEMPLATE_VMID> <NEW_VMID> <n> [CORES] [MEMORY_MB] [DISK_SIZE] [STORAGE]`
- Defaults: 2 cores, 2048MB RAM, 20G disk, local-lvm storage - Defaults: 2 cores, 2048MB RAM, 20G disk, local-lvm storage
- Full clone, auto-starts the VM - Full clone, auto-starts the VM
### Post-Clone Formula (confirmed working) ### Post-Clone Formula (confirmed working)
1. Clone: `./clone-vm.sh <template> <vmid> <name> [cores] [mem] [disk] [storage]` 1. Clone: `./clone-vm.sh <template> <vmid> <n> [cores] [mem] [disk] [storage]`
2. Get IP: `qm guest cmd <vmid> network-get-interfaces` 2. Get IP: `qm guest cmd <vmid> network-get-interfaces`
3. Set SSH key: `qm set <vmid> --sshkeys <pubkey-file>` 3. Set SSH key: `qm set <vmid> --sshkeys <pubkey-file>`
4. Reboot VM: `qm reboot <vmid>` 4. Reboot VM: `qm reboot <vmid>`
@ -83,14 +87,6 @@ Scripts at `~/private/Knowledge/repos/homelab/proxmox/scripts/`:
- `vgs` — check local-lvm free space - `vgs` — check local-lvm free space
- `pvesh get /nodes/<nodename>/status` — CPU/memory usage - `pvesh get /nodes/<nodename>/status` — CPU/memory usage
## Immediate Next Steps
1. Install K3s on pve-control first (--cluster-init)
2. Join adder-control and game-control as control plane peers
3. Join all 4 workers
4. Label workers and GPU nodes
5. Create namespaces: sjasoft, fulfillment, privacy-practice
6. Migrate services from old VirtualBox cluster
## K3s Install — see k3s/README.md for full commands ## K3s Install — see k3s/README.md for full commands
- Control plane uses --cluster-init on first node, --server on subsequent nodes - Control plane uses --cluster-init on first node, --server on subsequent nodes
@ -98,21 +94,28 @@ Scripts at `~/private/Knowledge/repos/homelab/proxmox/scripts/`:
- Traefik disabled on all nodes - Traefik disabled on all nodes
- 3 control plane nodes for HA etcd (tolerates 1 failure) - 3 control plane nodes for HA etcd (tolerates 1 failure)
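As a rough illustration of how the roles above differ, the helper below prints the argument string each kind of node would hand to the k3s installer. It is a sketch, not the project's install procedure: `k3s_args` is a hypothetical function, `FIRST_SERVER_WG_IP` is a placeholder, and the join token that real joins need is left out. k3s/README.md remains the authoritative source for the actual commands.

```shell
#!/bin/bash
# Sketch (assumed helper): print k3s arguments per node role.
set -eu

k3s_args() {
  # $1 = role (init | server | agent), $2 = URL of an existing server (unused for init)
  local role="$1" server_url="${2:-}"
  case "$role" in
    init)   echo "server --cluster-init --flannel-iface=wg0 --disable=traefik" ;;
    server) echo "server --server $server_url --flannel-iface=wg0 --disable=traefik" ;;
    agent)  echo "agent --server $server_url --flannel-iface=wg0" ;;
    *)      echo "unknown role: $role" >&2; return 1 ;;
  esac
}

k3s_args init
k3s_args server "https://FIRST_SERVER_WG_IP:6443"
k3s_args agent  "https://FIRST_SERVER_WG_IP:6443"
```

Only the first control plane node gets `--cluster-init`; the other two join it with `--server`, which is what gives the three-member etcd quorum that survives one failure.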
## Running Services (old VirtualBox cluster — not yet migrated) ## Running Services
- postgres:16 — ClusterIP:5432 | Service | NodePort | Domain | Namespace |
- mariadb:11 — ClusterIP:3306 |---|---|---|---|
- ghost1/2/3 — NodePorts 32368/32369/32370 | ghost1 | 32368 | — | fulfillment |
- forgejo:9 — NodePort 32371, git.sjasoft.com | ghost2 | 32369 | — | fulfillment |
| ghost3 | 32370 | — | fulfillment |
| forgejo | 32371 | git.sjasoft.com | sjasoft |
| postgres | ClusterIP:5432 | — | default |
| mariadb | ClusterIP:3306 | — | default |
| authentik-server | — | — | default |
| authentik-worker | — | — | default |
| n8n | — | — | default |
| listmonk | — | — | default |
## NodePort Registry ## Remaining Services to Deploy
- vaultwarden.yml — passwords (ACTIVE)
| Port | Service | Namespace | - mattermost.yml — chat (ACTIVE)
|---|---|---| - nats.yml — messaging
| 32368 | ghost1 | fulfillment | - monerod.yml — monero node
| 32369 | ghost2 | fulfillment | - snikket.yml — XMPP
| 32370 | ghost3 | fulfillment | - synapse.yml — Matrix
| 32371 | forgejo | sjasoft |
## Manifests ## Manifests
@ -123,15 +126,6 @@ All in Knowledge/repos/homelab/k3s/:
- k3s/forgejo/forgejo.yaml - k3s/forgejo/forgejo.yaml
- k3s/README.md (authoritative WG mesh table + K3s install commands) - k3s/README.md (authoritative WG mesh table + K3s install commands)
## Remaining Services to Port (from Proxmox Docker stack)
- authentik.yml — SSO (postgres)
- n8n.yml — automation (postgres)
- vaultwarden.yml — passwords
- nats.yml — messaging
- monerod.yml — monero node
- snikket.yml — XMPP
- synapse.yml — Matrix
## Known Issues / Notes ## Known Issues / Notes
- Tailscale/Headscale abandoned — unreliable, randomly drops nodes, requires manual reconnect - Tailscale/Headscale abandoned — unreliable, randomly drops nodes, requires manual reconnect
- WireGuard full mesh is the correct approach for K3s cluster networking - WireGuard full mesh is the correct approach for K3s cluster networking
@ -140,3 +134,5 @@ All in Knowledge/repos/homelab/k3s/:
- game node only has 16GB RAM — allocate worker VMs conservatively - game node only has 16GB RAM — allocate worker VMs conservatively
- game-ssd is only 256GB NVMe — keep disk allocations conservative on game-worker-ssd - game-ssd is only 256GB NVMe — keep disk allocations conservative on game-worker-ssd
- Templates should be destroyed after all clones are complete on each node - Templates should be destroyed after all clones are complete on each node
- fat_mama macvtap: workstation host cannot directly ping/SSH to fat_mama; reachable from rest of LAN and via WireGuard at 10.0.0.13; SSH from pve-control or other LAN machines works fine
- fat_mama disk image at /var/lib/libvirt/images/fat_mama.qcow2 on workstation

1
docs/.#tasks.org Symbolic link

@ -0,0 +1 @@
samantha@fedora.2598412:1776023928

34
docs/tasks.org Normal file

@ -0,0 +1,34 @@
* Security / Privacy
** DONE check wg hub cannot ssh into kube
CLOSED: [2026-04-14 Tue 16:37]
:LOGBOOK:
- State "DONE" from "ACTIVE" [2026-04-14 Tue 16:37]
Tested. No peers can be ssh-ed to.
:END:
** TODO stop login from proxmox kube nodes to LAN machines
** TODO set up wg bastion and LAN wg peer bastion for on the road access
** HOLD mullvad second account
SCHEDULED: <2026-04-14 Tue>
** HOLD mullvad on proxmox nodes via CLI
:LOGBOOK:
- State "DONE" from "BACKLOG" [2026-04-14 Tue 17:00]
:END:
** TODO privacy wg hub complete wg and caddy setup
SCHEDULED: <2026-04-15 Wed>
** Backups
*** TODO Automated Proxmox backups
*** TODO special stuff for kube state??
*** TODO specific database backups (dumpdb and friends? replicas?)
* Monitoring
** ACTIVE Nats on workstation
** TODO n8n on workstation?
* Kube Expansion
** TODO add mac VM
* Non-Kube WG services
** TODO workstation as WG peer??
* Services
** ACTIVE Mattermost
** ACTIVE VaultWarden
* Integration
** TODO explore SSO


@ -18,6 +18,7 @@ Hub: DO droplet at 138.197.87.251:51820, WG IP 10.0.0.1/24
| game-control | 10.10.10.158 | 10.0.0.10 | game | | game-control | 10.10.10.158 | 10.0.0.10 | game |
| game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game | | game-worker-hdd | 10.10.10.186 | 10.0.0.11 | game |
| game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game | | game-worker-ssd | 10.10.10.153 | 10.0.0.12 | game |
| fat_mama | 192.168.40.220 | 10.0.0.13 | workstation (VBox, bridged LAN) |
IPs 10.0.0.2-10.0.0.5 are reserved (old VirtualBox K3s nodes, leave alone). IPs 10.0.0.2-10.0.0.5 are reserved (old VirtualBox K3s nodes, leave alone).
@ -217,6 +218,7 @@ hub for external traffic). Headscale removed — too buggy (0.28.x dropped nodes
| game-control | control-plane, etcd | 10.0.0.10 | game | 2 CPU, 2GB RAM, 20GB | | game-control | control-plane, etcd | 10.0.0.10 | game | 2 CPU, 2GB RAM, 20GB |
| game-worker-hdd | worker | 10.0.0.11 | game | 4 CPU, 6GB RAM, 1.4TB HDD | | game-worker-hdd | worker | 10.0.0.11 | game | 4 CPU, 6GB RAM, 1.4TB HDD |
| game-worker-ssd | worker | 10.0.0.12 | game | 10 CPU, 8GB RAM, 200GB SSD | | game-worker-ssd | worker | 10.0.0.12 | game | 10 CPU, 8GB RAM, 200GB SSD |
| fat_mama | worker | 10.0.0.13 | workstation (VBox) | 20 CPU, 21GB RAM, 200GB |
### Running Services ### Running Services


@ -0,0 +1,55 @@
#!/bin/bash
# Deploy k3s resilience configs to all cluster nodes.
# Run from workstation where SSH aliases work.
set -e
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
CONTROL_NODES="pve-control adder-control game-control"
WORKER_NODES="pve-worker adder-worker game-worker-hdd game-worker-ssd fat_mama"
ALL_NODES="$CONTROL_NODES $WORKER_NODES"
echo "=== Deploying k3s resilience to all nodes ==="
for host in $ALL_NODES; do
echo "--- $host ---"
# Copy scripts
scp "$SCRIPT_DIR/wait-for-wg0.sh" "$host:/tmp/"
scp "$SCRIPT_DIR/k3s-flannel-watchdog.sh" "$host:/tmp/"
scp "$SCRIPT_DIR/k3s-flannel-watchdog.service" "$host:/tmp/"
scp "$SCRIPT_DIR/k3s-flannel-watchdog.timer" "$host:/tmp/"
ssh "$host" bash <<'REMOTE'
sudo install -m 755 /tmp/wait-for-wg0.sh /usr/local/bin/
sudo install -m 755 /tmp/k3s-flannel-watchdog.sh /usr/local/bin/
sudo cp /tmp/k3s-flannel-watchdog.service /etc/systemd/system/
sudo cp /tmp/k3s-flannel-watchdog.timer /etc/systemd/system/
# Determine which k3s unit is installed on this node.
# Checking for the unit file (not its active state) also works while k3s is down.
if systemctl cat k3s.service >/dev/null 2>&1; then
K3S_SVC="k3s"
else
K3S_SVC="k3s-agent"
fi
# Install systemd drop-in for wg0 dependency
sudo mkdir -p /etc/systemd/system/${K3S_SVC}.service.d
cat <<EOF | sudo tee /etc/systemd/system/${K3S_SVC}.service.d/wait-wg0.conf
[Unit]
After=wg-quick@wg0.service
Wants=wg-quick@wg0.service
[Service]
ExecStartPre=/usr/local/bin/wait-for-wg0.sh
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now k3s-flannel-watchdog.timer
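# $host is not set on the remote side (the heredoc is quoted), so report the remote hostname.
echo "$(hostname): done (service=$K3S_SVC)"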
REMOTE
done
echo "=== All nodes configured ==="


@ -0,0 +1,6 @@
[Unit]
Description=K3s flannel watchdog
[Service]
Type=oneshot
ExecStart=/usr/local/bin/k3s-flannel-watchdog.sh


@ -0,0 +1,25 @@
#!/bin/bash
# Watchdog: restart k3s if flannel.1 interface is missing.
# Runs via systemd timer every 60s.
# Only act if k3s is running but flannel.1 is gone
K3S_UNIT=$(systemctl is-active k3s 2>/dev/null)
K3S_AGENT_UNIT=$(systemctl is-active k3s-agent 2>/dev/null)
if [ "$K3S_UNIT" != "active" ] && [ "$K3S_AGENT_UNIT" != "active" ]; then
exit 0 # k3s isn't running, nothing to do
fi
if ip link show flannel.1 >/dev/null 2>&1; then
exit 0 # flannel is fine
fi
# flannel.1 is missing — restart the appropriate service
echo "$(date): flannel.1 missing, restarting k3s"
logger -t k3s-watchdog "flannel.1 interface missing — restarting k3s"
if [ "$K3S_UNIT" = "active" ]; then
systemctl restart k3s
elif [ "$K3S_AGENT_UNIT" = "active" ]; then
systemctl restart k3s-agent
fi


@ -0,0 +1,9 @@
[Unit]
Description=Check flannel health every 60s
[Timer]
OnBootSec=90
OnUnitActiveSec=60
[Install]
WantedBy=timers.target


@ -0,0 +1,11 @@
# /etc/systemd/system/k3s.service.d/wait-wg0.conf
# (or k3s-agent.service.d/ on worker nodes)
#
# Ensures k3s waits for wg0 before starting flannel.
[Unit]
After=wg-quick@wg0.service
Wants=wg-quick@wg0.service
[Service]
ExecStartPre=/usr/local/bin/wait-for-wg0.sh

19
k3s/resilience/wait-for-wg0.sh Executable file

@ -0,0 +1,19 @@
#!/bin/bash
# Wait for wg0 interface to be up with an IP before allowing k3s to start.
# Used as ExecStartPre in k3s systemd drop-in.
MAX_WAIT=120
INTERVAL=2
ELAPSED=0
while [ $ELAPSED -lt $MAX_WAIT ]; do
if ip link show wg0 >/dev/null 2>&1 && ip addr show wg0 | grep -q 'inet '; then
echo "wg0 is up with IP after ${ELAPSED}s"
exit 0
fi
sleep $INTERVAL
ELAPSED=$((ELAPSED + INTERVAL))
done
echo "ERROR: wg0 not up after ${MAX_WAIT}s — starting k3s anyway"
exit 0 # don't block k3s forever, let the watchdog handle it
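The polling loop in this script generalizes to any readiness check. The sketch below is a hypothetical helper (`wait_until` is not part of this commit) that retries an arbitrary command until it succeeds or a deadline passes. Unlike `wait-for-wg0.sh`, it returns non-zero on timeout and leaves the "start anyway" decision to the caller.

```shell
#!/bin/bash
# Sketch (assumed helper): retry a check until it passes or the time budget runs out.
wait_until() {
  # $1 = max seconds to wait, $2 = seconds between attempts, rest = command to run
  local max_wait="$1" interval="$2"
  shift 2
  local elapsed=0
  while [ "$elapsed" -lt "$max_wait" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "not ready after ${max_wait}s" >&2
  return 1
}

# Trivial demonstration: `true` succeeds on the first attempt.
wait_until 4 1 true
```

A caller that wants the same fail-open behavior as the committed script could run `wait_until 120 2 ip link show wg0 || true`, which keeps the timeout visible in the logs without blocking k3s.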