Furnace | niclydon.io

Furnace schematic — a desktop computer with a glowing GPU die

Furnace is the primary compute node and the gateway everything else talks to. It’s a GMKtec EVO-X2 built on AMD’s Strix Halo platform (Ryzen AI Max+ 395), with 128 GB of unified memory, 96 GB of which is allocated to the GPU as VRAM. That VRAM figure is what makes Strix Halo interesting for home lab AI: an 80B MoE chat model, an 8B embedder, and a 30B vision model all fit on one box without splitting across hosts, and there’s still headroom for a 4B prompt enhancer or a stylistic adapter to swap in on demand.
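A rough way to check the “it all fits” claim: the 80B model’s 46 GB of Q4_K_M weights works out to about 4.6 bits per weight. The sketch below assumes, hypothetically, that the embedder and vision model are quantized to the same effective density, and it counts weights only (no KV cache, context buffers, or runtime overhead), so treat the result as a lower bound on real usage rather than a measurement.

```python
# Quick VRAM budget check for the three llama-swap models on Furnace.
# Sizes are weights only: KV cache, context buffers, and runtime overhead are ignored.

VRAM_GB = 96.0

# Derived from the page: 46 GB of Q4_K_M weights for 80B parameters -> 4.6 bits/weight.
EFFECTIVE_BPW = 46.0 * 8 / 80.0

# Hypothetical assumption: the embedder and vision model use the same effective density.
MODELS = {
    "qwen3-next-chat-80b": 80.0,
    "qwen3-embed-8b": 8.0,
    "qwen3-vl-30b-a3b": 30.0,
}


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size in GB (1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def budget(models: dict[str, float], bpw: float, vram_gb: float) -> tuple[float, float]:
    """Return (GB used by weights, GB of VRAM headroom left over)."""
    used = sum(weights_gb(p, bpw) for p in models.values())
    return used, vram_gb - used


used, headroom = budget(MODELS, EFFECTIVE_BPW, VRAM_GB)
print(f"weights: {used:.2f} GB, headroom: {headroom:.2f} GB")
```

Under these assumptions the three models need about 67.85 GB, leaving roughly 28 GB, comfortably more than the ~2.3 GB a 4B model at the same density would take.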

Eight always-on services run here. Three LLM-family models sit behind llama-swap on port 8099 (qwen3-next-chat-80b, qwen3-vl-30b-a3b, qwen3-embed-8b), and five specialists each run under their own systemd unit (Whisper large-v3-turbo, InsightFace buffalo_l, a cross-encoder reranker, Florence-2 OCR, Chatterbox TTS). llama-swap preloads the 80B and the embedder at startup and keeps them resident; the 30B vision model swaps in on demand with an 1800 s TTL.

qwen3-next-chat-80b is the primary reasoning model: 46 GB of Q4_K_M weights, 80B total parameters with 3B active per token, running entirely on Furnace with no RPC split. In April it replaced the earlier Llama-3.3-70B deployment, which was split over RPC between Furnace and Crucible, after Phase 0 validation clocked 49 tok/s decode with zero Vulkan errors. The RPC tail over Thunderbolt worked, but a single-host 80B MoE runs faster and has one fewer failure mode.

Forge is the OpenAI-compatible gateway that fans requests out across all of this, with the permissive and code tiers living over on Crucible.
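The residency rules for the llama-swap models can be modeled as a tiny simulation. This is a hypothetical illustration of the behavior described above, not llama-swap’s implementation or configuration: the model names and the 1800 s TTL come from this page, while `ResidencyModel`, `request`, and `evict_idle` are made-up names.

```python
from dataclasses import dataclass, field


@dataclass
class ResidencyModel:
    """Toy model of the load/unload behavior described above (not llama-swap's code).

    Pinned models are loaded at startup and never evicted. Any other model
    loads on its first request and is unloaded once it has been idle for at
    least its TTL in seconds. Non-pinned models without a TTL entry stay loaded.
    """

    pinned: set[str]
    ttl_s: dict[str, float]
    last_used: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Preload pinned models at t = 0, mirroring the startup preload.
        for name in self.pinned:
            self.last_used[name] = 0.0

    def request(self, model: str, now: float) -> None:
        # Loads the model if it is absent; either way, resets its idle clock.
        self.last_used[model] = now

    def evict_idle(self, now: float) -> set[str]:
        """Unload non-pinned models idle for at least their TTL; return what stays loaded."""
        for name, last in list(self.last_used.items()):
            ttl = self.ttl_s.get(name)
            if name not in self.pinned and ttl is not None and now - last >= ttl:
                del self.last_used[name]
        return set(self.last_used)


furnace = ResidencyModel(
    pinned={"qwen3-next-chat-80b", "qwen3-embed-8b"},
    ttl_s={"qwen3-vl-30b-a3b": 1800},
)
furnace.request("qwen3-vl-30b-a3b", now=100)
print(sorted(furnace.evict_idle(now=1000)))
```

With no further vision traffic after t = 100 s, qwen3-vl-30b-a3b stays loaded until t = 1900 s, while the chat model and embedder never leave memory.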

Furnace also hosts the PostgreSQL 16 instance that ARIA, Nexus, and every other platform service reads from and writes to: ~191 tables, 162K ingested photos, 131K knowledge facts, and 58K emails, all on local NVMe with no cloud round-trip. Caddy terminates TLS for every *.niclydon.io subdomain with a wildcard certificate issued through the ACME DNS-01 challenge on Cloudflare DNS. CoreDNS resolves internal names to tailnet IPs. Prometheus scrapes everything every 15 seconds, and Grafana renders 25 dashboards on top. When a watchdog spots an AMD Vulkan memory-corruption pattern in the forge-api journal, it restarts the service automatically within 60 seconds. It’s the node that can’t go down, and after a year of iteration, it mostly doesn’t.
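The watchdog’s job can be sketched as a poll loop over the forge-api journal. This is a hypothetical illustration, not the actual watchdog. The log signature is an assumption: VK_ERROR_DEVICE_LOST is a real Vulkan error code, but the page doesn’t say which lines the real pattern matches. The unit name forge-api comes from the text, and the 30 s poll interval is an assumed value chosen so a fault is caught inside the 60-second window. The loop shells out to `journalctl` and `systemctl`, so it needs permission to restart the unit.

```python
import math
import re
import subprocess
import time

# Hypothetical signature: the page doesn't specify what the real watchdog matches.
VULKAN_FAULT = re.compile(r"VK_ERROR_DEVICE_LOST|vk::DeviceLostError")


def needs_restart(lines: list[str], pattern: re.Pattern = VULKAN_FAULT) -> bool:
    """True if any journal line matches the fault signature."""
    return any(pattern.search(line) for line in lines)


def journal_since(unit: str, since_epoch: int) -> list[str]:
    """Message text of the unit's journal entries since a UNIX timestamp."""
    result = subprocess.run(
        ["journalctl", "-u", unit, "--since", f"@{since_epoch}", "-o", "cat", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def watch(unit: str = "forge-api", interval_s: int = 30) -> None:
    """Poll the unit's journal and restart it when the fault signature appears."""
    since = int(time.time())
    while True:
        time.sleep(interval_s)
        window_start = int(time.time())  # taken before the query, so no lines fall in a gap
        if needs_restart(journal_since(unit, since)):
            subprocess.run(["systemctl", "restart", unit], check=True)
            # Skip everything logged before the restart so the old fault isn't re-read.
            since = math.ceil(time.time())
        else:
            since = window_start
```

Moving `since` past the restart is the detail that matters: if the next poll re-read the old fault lines, the watchdog would restart forge-api on every cycle.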