This started because the cloud AI bill kept climbing. OpenAI for chat, Anthropic for reasoning, Google for embeddings, Replicate for image gen, Runway for video, ElevenLabs for voice — every interesting idea meant another metered API and another monthly invoice that scaled with use. Building something ambitious meant signing up to pay forever.
So Nexus runs on a cluster of mini-PCs in a closet. The models are local. The database is local. Every token, embedding, photo description, transcription, face match, and reranked result happens on hardware I own outright. The hardware paid for itself in a few months of saved API spend, and now the marginal cost of an idea is zero — which turns out to be the only constraint that matters for actually building things.
The lab is four named nodes. Furnace is the one that can't go down; Crucible is its bulk-processing twin; Anvil reads the data sources Apple only exposes to a real Mac; and Bellows is the cheap insurance that a 2 AM kernel panic never costs a trip to the lab.
The whole platform runs out of one Postgres database on one machine — no cloud warehouse, no separate metadata service, no managed orchestrator.
Nexus is a TypeScript monorepo — core, API, an MCP server, and a job worker — sitting on more than 700 tables, with dbt handling transformations right inside the same database. Models are served by Forge, a Python gateway that speaks the OpenAI API and hides which model is running where; they run on llama.cpp and stable-diffusion.cpp compiled natively for AMD's GPUs. The operator console, Desk, is a tailnet-only Next.js app. Prometheus and Grafana watch all of it on a 15-second heartbeat, and a watchdog restarts the model gateway within a minute if it ever trips a known GPU memory bug. None of it needs a human in the loop to stay alive.
Forge is the front door for every model in the lab. It speaks the standard OpenAI API, so anything — an agent cycle, a one-off script, a coding assistant — asks for chat, vision, embeddings, transcription, speech, reranking, or OCR without ever needing to know which model is loaded or which machine it lives on.
As of the latest snapshot it routes across 25 models spanning roughly 275 GB of GPU memory: sixteen text and reasoning models, three that can see images, plus dedicated models for embeddings, transcription, face detection, reranking, OCR, and speech. The headliner is an 80-billion-parameter Qwen3 mixture-of-experts model kept resident on Furnace, flanked by a vision model and an embedder. A second, deliberately unfiltered router model on Crucible turns plain-language prompts into structured image-generation plans, and a stack of FLUX, Z-Image and Wan models handles the pictures and video.
Everything I generate flows into one database organized like a real data lakehouse — raw landing tables, source-normalized views, then cross-source serving layers for knowledge, relationships, and behavior.
The breadth is the point: the source manifest declares 77 distinct feeds across eight categories — two decades of iMessages and email, Apple's on-device signals (photos, health, notes, location), music and movie history, wearable and audio-recorder capture, the full history of AI-tool conversations and coding sessions, social and web archives going back to old Blogger posts, travel and flight logs, even the home-security cameras. Each source lands its raw data and stops; the rest of the platform reads it on its own schedule, so adding the next feed never destabilizes the last one.
Nexus keeps a typed graph of the things that make up a life. Nodes are entities — people, organizations, places, concepts — and the edges between them are facts: timestamped, typed assertions like “works at,” “traveled to,” or “knows.” Every fact carries provenance back to the exact source record that produced it, plus a confidence score, so the graph can always explain why it believes something.
Facts are versioned, never deleted: most relationship facts are bi-temporal, recording not just what's true now but when it was true, so the graph answers “where did I work in 2018?” against history instead of overwriting it. Deterministic per-source extractors project typed facts into the graph — no fact is written without a source pointer and a confidence value — and an append-only event log records the full lifecycle (created, confirmed, corrected, superseded, rejected). Normal reasoning reads a curated active view; a nightly quality gate scores identity, evidence coverage, and brief quality before any new claim goes live, staying fail-closed until it can prove the facts are sound.
A person shows up across a life under many identifiers: phone numbers, email addresses, chat handles, contact cards, speaker labels in transcripts. Identity Fusion is the resolver that decides when two of those are the same real human — the difference between good evidence and a graph that lies with confidence. Every candidate match is scored on explainable, cross-source evidence and weighed against negative signals like generic names, shared devices, or conflicting verified details.
The governing rule is strict: no automatic, permanent merge ever happens from a name-only match. High-confidence matches with at least two independent strong signals can auto-link; ambiguous ones route to a human review queue as interactive cards; weak or vendor-shaped matches are blocked by default. Merges preserve every alias and prior provenance, and are measured against a versioned benchmark before automation is widened — targeting 99%+ precision on auto-links and a 0% name-only rate. It's deliberately the opposite of move-fast: missing a merge is annoying, but merging two different people is how the whole graph starts lying.
Nine autonomous agents, each scoped to a distinct domain. They wake on their own schedules — anywhere from every ten minutes to twice a day — reason over deterministic scan evidence, and surface reviewable findings rather than mutating data silently. Every one operates inside an explicit tool-grant boundary with full provenance.
The front door to the roster. ARIA interprets requests, synthesizes across the platform's memory and timeline, and routes work to specialist agents. It carries the widest tool-grant footprint because it brokers across many domains — and leans on the same provenance and review boundaries as every other agent rather than acting silently.
Agents keep a dedicated operational memory — the lessons, facts, and procedures they pick up while doing work, assembled into per-agent memory bundles that ride along in every future run. Memory is typed into six kinds: working snapshots, semantic facts, episodic experiences, procedural know-how, tool lessons, and eval feedback. It's the agents' own scratchpad for getting better at their jobs, never a place where they invent truths about a person's life.
A background consolidation process watches sanitized runtime events and proposes candidate memories — deduped against what the agent already knows, risk-rated, and, for anything touching prompts, behavior, or durable state, held behind a human review card before it can change a bundle. Low-stakes facts settle on their own; the changes that matter cannot. Nothing rewrites an agent's prompt or grants autonomously, and every accepted memory carries evidence references back to the exact bundle that shaped a behavior.
Two memory systems, deliberately kept apart. The personal knowledge graph is fact-checked truth about a life; agent memory is operational know-how about doing the work. Agents read the graph through governed retrieval — they never write their hunches back into it as if they were personal fact. Good demos come from letting an agent improvise; good forensics come from making truth earn its place.
Retrieval is a hybrid stack. Every meaningful text record across dozens of sources is embedded into a shared vector space through an opt-in pipeline — register a source once, and the embedding sweep keeps it covered with zero new code. Search then fuses four signals — vector similarity, full-text lexical matching, knowledge-graph connectivity, and recency — into a single Reciprocal Rank Fusion score.
The top candidates are then re-scored by a cross-encoder reranker for precision before they reach an agent or surface a result. Embeddings are model-versioned, coverage is tracked per source, and the whole pipeline is source-agnostic: the same machinery serves agent retrieval, semantic search, and knowledge lookups alike.
Nexus runs a dedicated secret-redaction layer at its ingest boundaries. Before AI-conversation transcripts, command history, or pasted content are persisted, a shared high-confidence redactor scans for credential-shaped strings — API keys, tokens, webhooks, private-key blocks — and replaces them with a labeled marker that records the class of value removed without ever storing the value itself.
The redaction is wired across many write paths so older producers can't slip past it, and an audit sidecar records only what was scrubbed — counts, rule names, field paths, never the matched secret. A separate publication-time safeguard protects personal data: anything bound for a human-facing review card is paraphrased by a model instructed to describe the topic without quoting text or naming people, so raw messages never reach a rendered surface. Defense in depth, applied before exposure rather than after.
An agent's system prompt is no longer a single file. Under the current runtime, identity is composed at request time from five governed, independently versioned inputs — all stored in Postgres, all hashed for provenance. Any decision can be traced back to the exact prompt, tools, memory, and model that produced it.
The runtime stitches these into ordered context layers and records a hashes-only manifest for every run — never raw prompt, memory, inbox, or tool-result content. Profiles, grants, and memory are never mutated autonomously; risky changes are staged as review cards and applied only with explicit approval.
A standing ProjectManager agent watches for new work, keeps a clean machine-readable backlog, and posts a live coordination board to chat only when something has actually changed — no busywork pings.
The few questions that genuinely need a human are surfaced as decision threads, answered in plain language, and distilled into a structured decision behind a single confirm button. Two execution loops do the rest: one drains the pre-approved queue end to end — branch, merge, deploy, smoke-test, close — and a diagnosis loop investigates open questions and routes each to its real next step. A hard safety fence means nothing risky ever ships without an explicit decision on record, and a scoping instruction is never a license to ship.
JobFoundry is the platform's job orchestration layer, built straight on top of Postgres instead of a third-party workflow engine.
Every job type lives in one registry contract — owner, queue lane, retry policy, schedule, lifecycle — with a build-time gate that refuses to let the code and the database drift apart. Every job run moves through a formal lifecycle: workers claim a lease, heartbeat while they work, back off and retry on recoverable failures, and a crashed worker's work automatically requeues. Dead-lettered jobs wait safely for an explicit, approval-gated retry. Operators get clean read-only views into queues, workers, schedules, and a sanitized event feed — never raw payloads. It went fully live across the worker fleet at the end of May 2026.
It started as a refusal to keep paying per token, and grew into a standing platform: a lakehouse of seventy-seven personal data sources, a local model gateway, a roster of nine agents that reason over it, and the self-management machinery — a project office and a Postgres-native job orchestrator — that keeps the whole thing advancing without a human babysitting each step. Every piece runs on hardware in a closet, and the marginal cost of the next idea is still zero.