
Smithy

Image & video generation workbench

Status
Experimental
Primary Stack
FastAPI · React · TypeScript
Depends On
Forge

Every time I wanted an image I ended up on SDXL subreddits learning about samplers, LoRAs, and anatomy negatives. I wanted one interface that knew all of that so I didn't have to.

Smithy schematic — a blacksmith anvil with a single amber spark

The existing image-gen tools sit at two extremes. On one end, a prompt box that says “FLUX” and gives you no control over samplers, LoRAs, negatives, or model selection. On the other end, a full ComfyUI node graph that requires a weekend to learn and a spreadsheet to debug. I wanted something in the middle — a natural-language prompt on the front, a small router model that actually knows about samplers and anatomy negatives and Pony’s score_9 tags, and the actual generation pipeline behind it. Also: Wan 2.1 video generation is genuinely good now and nobody’s built a clean UI for it.

Three entry points. A React/FastAPI SPA at smithy.niclydon.io with eight tabs — Generate, Expand, Video, Restore, Gallery, Catalog, Train, Dataset. A CLI chat REPL for quick one-shot prompts from the terminal. A direct gen.sh for explicit control when the router is getting in the way. Under the hood, all three send the prompt to an abliterated Qwen 3.5 35B-A3B MoE — the “permissive” build — running on Crucible. That router emits a structured plan — model family, LoRA stack, negatives, dimensions, sampler flags — and Forge runs sd-cli on Furnace to execute it. The router knows the family-specific quirks for FLUX, SDXL, Pony, Wan 2.1, Chroma, Z-Image, and Qwen-Image, and has absorbed OpenAI’s gpt-image-2 prompting discipline — structural prompt order, letter-by-letter spelling for tricky text, explicit preserve-vs-change lists for img2img iterations, and a single-change rule that stops it from silently swapping LoRAs when you ask for “warmer.” CodeFormer restores faces, Real-ESRGAN upscales 4×, and ffmpeg converts video when Wan writes MJPG AVI. Results land in a timestamped run directory with a metadata sidecar and show up in the gallery within seconds.
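To make the plan concrete: the shape below is a minimal sketch of what a structured plan like this might look like, assuming illustrative field names (`GenPlan`, `to_cli_flags`) rather than Smithy's actual schema. The flags it emits follow the stable-diffusion.cpp-style `sd-cli` conventions mentioned in this writeup, but the mapping is my assumption.

```python
# Hypothetical sketch of a router-emitted generation plan.
# Field and method names are illustrative, not Smithy's real schema.
from dataclasses import dataclass, field


@dataclass
class GenPlan:
    family: str                 # e.g. "sdxl", "flux", "wan2.1"
    prompt: str
    negative: str = ""
    # (lora_name, weight) pairs chosen by the router
    loras: list = field(default_factory=list)
    width: int = 1024
    height: int = 1024
    sampler: str = "euler"
    steps: int = 28
    cfg_scale: float = 5.0

    def to_cli_flags(self) -> list:
        """Translate the plan into sd-cli-style flags (illustrative mapping)."""
        flags = [
            "-p", self.prompt,
            "-W", str(self.width), "-H", str(self.height),
            "--sampling-method", self.sampler,
            "--steps", str(self.steps),
            "--cfg-scale", str(self.cfg_scale),
        ]
        if self.negative:
            flags += ["-n", self.negative]
        return flags
```

A structured plan like this is what makes the three entry points interchangeable: the SPA, the REPL, and gen.sh can all hand the same object to the executor.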

The hard parts are all AMD Vulkan stability lessons, and they are all in the code. --vae-tiling is mandatory on SDXL or the compute buffer allocation fails at decode time. --clip-on-cpu is required for the SDXL family even with 96 GB of VRAM — the allocator hits “radv/amdgpu: Not enough memory for command submission” errors that have nothing to do with actual free memory. Wan video has a hard ~2.7M voxel ceiling (width × height × frames) before the device buffer blows up, and Forge auto-clamps. Real-ESRGAN silently produces half-corrupted output under memory pressure and returns exit code 0 anyway, so there’s retry-on-suspicious-size logic that compares output file size to input and re-runs if the ratio is wrong. And there’s a running ledger of multi-base Civitai LoRA gotchas — an unpinned manifest entry can silently download a different base architecture (Flux.2 Klein 9B instead of Flux.1 D) and crash sd-cli with a raw SIGABRT during weight init. Every one of those is a night I lost to bad error messages.
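Two of those guardrails can be sketched in a few lines. This is a minimal illustration under stated assumptions: the ~2.7M voxel ceiling and the 4× upscale come from the text above, but the constant name, the divisible-by-16 snapping, and the 2× size-ratio threshold are my inventions, not Forge's actual code.

```python
import math

# ~2.7M voxels (width * height * frames) before the Vulkan device
# buffer allocation fails on Wan video. Constant name is hypothetical.
VOXEL_CEILING = 2_700_000


def clamp_video_dims(width: int, height: int, frames: int):
    """Auto-clamp: scale width/height down, preserving aspect ratio and
    snapping to multiples of 16, until the voxel count fits the ceiling."""
    if width * height * frames <= VOXEL_CEILING:
        return width, height, frames
    scale = math.sqrt(VOXEL_CEILING / (width * height * frames))
    w = max(16, int(width * scale) // 16 * 16)
    h = max(16, int(height * scale) // 16 * 16)
    return w, h, frames


def upscale_looks_corrupt(in_bytes: int, out_bytes: int,
                          min_ratio: float = 2.0) -> bool:
    """Real-ESRGAN can exit 0 with half-corrupted output under memory
    pressure. Heuristic (threshold is an assumption): a 4x upscale whose
    file isn't even min_ratio times the input size gets re-run."""
    return out_bytes < in_bytes * min_ratio
```

The retry loop would call `upscale_looks_corrupt` after each run and re-invoke Real-ESRGAN a bounded number of times before surfacing an error.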

FastAPI · React · TypeScript · Tailwind · sd-cli · FLUX · SDXL · Wan 2.1 · Qwen3.5 35B permissive
smithy.niclydon.io (tailnet) ↗