Nexus — Personal AI Agent Platform

Why local

Curiosity, mostly. And a refusal to keep paying per token.

This started because the cloud AI bill kept climbing. OpenAI for chat, Anthropic for reasoning, Google for embeddings, Replicate for image gen, Runway for video, ElevenLabs for voice — every interesting idea meant another metered API and another monthly invoice that scaled with use. Building something ambitious meant signing up to pay forever.

So Nexus runs on a cluster of mini-PCs in a closet. The models are local. The database is local. Every token, embedding, photo description, transcription, face match, and reranked result happens on hardware I own outright. The hardware paid for itself in a few months of saved API spend, and now the marginal cost of an idea is zero — which turns out to be the only constraint that matters for actually building things.

Free per token

Zero marginal cost on every chat, embedding, photo description, transcription. Run a million inferences a day, the bill is the same.

Fast iteration

Try wild ideas without watching a meter. The interesting experiments — bulk reprocessing, full-corpus reembedding, daily life-chapter generation — only happen when there's no per-call cost.

Real engineering

Tensor-splitting an 80B reasoning model across two GPUs over Thunderbolt, building an autoscaler for ephemeral workers, wiring nine agents into a coordination protocol — the project is the point.

Yours

No vendor changes a price, deprecates a model, or rate-limits a key. The whole stack runs even if every cloud AI company disappeared tomorrow.

Infrastructure

Four machines, four jobs

The lab is four named nodes. Furnace is the one that can't go down; Crucible is its bulk-processing twin; Anvil reads the data sources Apple only exposes to a real Mac; and Bellows is the cheap insurance that a 2 AM kernel panic never costs a trip to the lab.

Furnace

Primary Compute & Gateway

CPUAMD Strix Halo

Memory128 GB

VRAM~96 GB Unified

RunsLLMs, Postgres, agents

↔

Thunderbolt

40 Gbps

sub-ms latency

Crucible

Satellite / Bulk Compute

CPUAMD Strix Halo

Memory64 GB

VRAM64 GB Unified

RunsElastic bulk workers, media

Anvil

Dev Box & Apple Reader

HardwareM4 Mac Mini

Memory16 GB

RoleiMessage / Photos / contacts reader

Bellows

Out-of-Band Management

HardwareGL.iNet IP-KVM

GivesHDMI console + power relay

RoleReach & power-cycle Furnace remotely

160 GB

Total unified VRAM across the two compute nodes

Two enclosures, one machine: the headline reasoning model is an 80B-class Qwen3 mixture-of-experts that stays resident on Furnace. When a model is too large for one box, llama.cpp's RPC protocol splits its tensor layers across Furnace and Crucible over the direct Thunderbolt cable — 40 Gbps at sub-millisecond latency, fast enough that the split is nearly invisible to inference speed.

Platform

A commercial-scale data platform, folded into a closet

The whole platform runs out of one Postgres database on one machine — no cloud warehouse, no separate metadata service, no managed orchestrator.

Nexus is a TypeScript monorepo — core, API, an MCP server, and a job worker — sitting on more than 700 tables, with dbt handling transformations right inside the same database. Models are served by Forge, a Python gateway that speaks the OpenAI API and hides which model is running where; they run on llama.cpp and stable-diffusion.cpp compiled natively for AMD's GPUs. The operator console, Desk, is a tailnet-only Next.js app. Prometheus and Grafana watch all of it on a 15-second heartbeat, and a watchdog restarts the model gateway within a minute if it ever trips a known GPU memory bug. None of it needs a human in the loop to stay alive.

LLM Gateway

Forge — one endpoint for every model

Forge is the front door for every model in the lab. It speaks the standard OpenAI API, so anything — an agent cycle, a one-off script, a coding assistant — asks for chat, vision, embeddings, transcription, speech, reranking, or OCR without ever needing to know which model is loaded or which machine it lives on.

As of the latest snapshot it routes across 25 models spanning roughly 275 GB of GPU memory: sixteen text and reasoning models, three that can see images, plus dedicated models for embeddings, transcription, face detection, reranking, OCR, and speech. The headliner is an 80-billion-parameter Qwen3 mixture-of-experts model kept resident on Furnace, flanked by a vision model and an embedder. A second, deliberately unfiltered router model on Crucible turns plain-language prompts into structured image-generation plans, and a stack of FLUX, Z-Image and Wan models handles the pictures and video.

Data Pipeline

Seventy-seven sources, one graph

Everything I generate flows into one database organized like a real data lakehouse — raw landing tables, source-normalized views, then cross-source serving layers for knowledge, relationships, and behavior.

The breadth is the point: the source manifest declares 77 distinct feeds across eight categories — two decades of iMessages and email, Apple's on-device signals (photos, health, notes, location), music and movie history, wearable and audio-recorder capture, the full history of AI-tool conversations and coding sessions, social and web archives going back to old Blogger posts, travel and flight logs, even the home-security cameras. Each source lands its raw data and stops; the rest of the platform reads it on its own schedule, so adding the next feed never destabilizes the last one.

Personal · 30 Media · 15 Communication · 13 Productivity · 8 Health · 4 Location · 3 Social · 3 Code · 1

Cognition

The personal knowledge graph

Nexus keeps a typed graph of the things that make up a life. Nodes are entities — people, organizations, places, concepts — and the edges between them are facts: timestamped, typed assertions like “works at,” “traveled to,” or “knows.” Every fact carries provenance back to the exact source record that produced it, plus a confidence score, so the graph can always explain why it believes something.

Facts are versioned, never deleted: most relationship facts are bi-temporal, recording not just what's true now but when it was true, so the graph answers “where did I work in 2018?” against history instead of overwriting it. Deterministic per-source extractors project typed facts into the graph — no fact is written without a source pointer and a confidence value — and an append-only event log records the full lifecycle (created, confirmed, corrected, superseded, rejected). Normal reasoning reads a curated active view; a nightly quality gate scores identity, evidence coverage, and brief quality before any new claim goes live, staying fail-closed until it can prove the facts are sound.

Resolution

Knowing who is who

A person shows up across a life under many identifiers: phone numbers, email addresses, chat handles, contact cards, speaker labels in transcripts. Identity Fusion is the resolver that decides when two of those are the same real human — the difference between good evidence and a graph that lies with confidence. Every candidate match is scored on explainable, cross-source evidence and weighed against negative signals like generic names, shared devices, or conflicting verified details.

The governing rule is strict: no automatic, permanent merge ever happens from a name-only match. High-confidence matches with at least two independent strong signals can auto-link; ambiguous ones route to a human review queue as interactive cards; weak or vendor-shaped matches are blocked by default. Merges preserve every alias and prior provenance, and are measured against a versioned benchmark before automation is widened — targeting 99%+ precision on auto-links and a 0% name-only rate. It's deliberately the opposite of move-fast: missing a merge is annoying, but merging two different people is how the whole graph starts lying.

The Team

Meet the agents

Nine autonomous agents, each scoped to a distinct domain. They wake on their own schedules — anywhere from every ten minutes to twice a day — reason over deterministic scan evidence, and surface reviewable findings rather than mutating data silently. Every one operates inside an explicit tool-grant boundary with full provenance.

ARIA

User-facing coordinator & synthesis layer

The front door to the roster. ARIA interprets requests, synthesizes across the platform's memory and timeline, and routes work to specialist agents. It carries the widest tool-grant footprint because it brokers across many domains — and leans on the same provenance and review boundaries as every other agent rather than acting silently.

SignalCorrelator

Joins evidence across timeline, health, communication, and AI-usage signals to find correlations single-domain agents would miss.

BehavioralInsightAnalyst

Aggregates behavioral patterns, anomalies, and slow trends into reviewable findings — a bounded reasoning layer over deterministic scans.

RelationshipHealthAnalyst

Watches relationship drift, birthdays, unanswered messages and reciprocity, producing relationship-health candidates.

BiographyCurator

Surfaces reviewable candidates about people, residences, career and family context — feeding fact-checks, never writing truth itself.

DatabaseKeeper

The tightly scoped database-health agent — watches schema state, maintenance signals, and database incidents.

InfrastructureAnalyst

Interprets disk, memory, connectivity, capacity and recurring incidents — the slow operational trend-reader.

SecurityAnalyst

Reasons over security-relevant evidence and posture, producing findings and review notes within a bounded grant.

ProjectManager

Discovery, triage, priorities and next-action synthesis for project work — it frames work for decision rather than executing it.

Cognition

How the agents remember

Agents keep a dedicated operational memory — the lessons, facts, and procedures they pick up while doing work, assembled into per-agent memory bundles that ride along in every future run. Memory is typed into six kinds: working snapshots, semantic facts, episodic experiences, procedural know-how, tool lessons, and eval feedback. It's the agents' own scratchpad for getting better at their jobs, never a place where they invent truths about a person's life.

A background consolidation process watches sanitized runtime events and proposes candidate memories — deduped against what the agent already knows, risk-rated, and, for anything touching prompts, behavior, or durable state, held behind a human review card before it can change a bundle. Low-stakes facts settle on their own; the changes that matter cannot. Nothing rewrites an agent's prompt or grants autonomously, and every accepted memory carries evidence references back to the exact bundle that shaped a behavior.

Two memory systems, deliberately kept apart. The personal knowledge graph is fact-checked truth about a life; agent memory is operational know-how about doing the work. Agents read the graph through governed retrieval — they never write their hunches back into it as if they were personal fact. Good demos come from letting an agent improvise; good forensics come from making truth earn its place.

Retrieval

Finding the right thing, fast

Retrieval is a hybrid stack. Every meaningful text record across dozens of sources is embedded into a shared vector space through an opt-in pipeline — register a source once, and the embedding sweep keeps it covered with zero new code. Search then fuses four signals — vector similarity, full-text lexical matching, knowledge-graph connectivity, and recency — into a single Reciprocal Rank Fusion score.

The top candidates are then re-scored by a cross-encoder reranker for precision before they reach an agent or surface a result. Embeddings are model-versioned, coverage is tracked per source, and the whole pipeline is source-agnostic: the same machinery serves agent retrieval, semantic search, and knowledge lookups alike.

Vector Lexical (BM25) Graph connectivity Recency Cross-encoder rerank

Privacy

Scrubbing the sensitive before it lands

Nexus runs a dedicated secret-redaction layer at its ingest boundaries. Before AI-conversation transcripts, command history, or pasted content are persisted, a shared high-confidence redactor scans for credential-shaped strings — API keys, tokens, webhooks, private-key blocks — and replaces them with a labeled marker that records the class of value removed without ever storing the value itself.

The redaction is wired across many write paths so older producers can't slip past it, and an audit sidecar records only what was scrubbed — counts, rule names, field paths, never the matched secret. A separate publication-time safeguard protects personal data: anything bound for a human-facing review card is paraphrased by a model instructed to describe the topic without quoting text or naming people, so raw messages never reach a rendered surface. Defense in depth, applied before exposure rather than after.

Under the Hood

How an agent is assembled

An agent's system prompt is no longer a single file. Under the current runtime, identity is composed at request time from five governed, independently versioned inputs — all stored in Postgres, all hashed for provenance. Any decision can be traced back to the exact prompt, tools, memory, and model that produced it.

Runtime profile

The operating policy: lifecycle state, schedule, executor, model selection, and which other components bind to this run. The assembly anchor.

Prompt profile

The versioned persona content, sourced from a soul-spec package and assembled with progressive disclosure so only the needed depth loads each cycle.

Tool-grant profile

Versioned tool/MCP access with rationale, owner, and review — enforced in the executor and gateway, not in prompt prose. No aspirational wildcards.

Memory bundle

A versioned set of approved operational memories, kept strictly separate from the personal knowledge graph. New memories require review before they change a bundle.

The runtime stitches these into ordered context layers and records a hashes-only manifest for every run — never raw prompt, memory, inbox, or tool-result content. Profiles, grants, and memory are never mutated autonomously; risky changes are staged as review cards and applied only with explicit approval.

Self-Management

The platform runs its own project office

A standing ProjectManager agent watches for new work, keeps a clean machine-readable backlog, and posts a live coordination board to chat only when something has actually changed — no busywork pings.

The few questions that genuinely need a human are surfaced as decision threads, answered in plain language, and distilled into a structured decision behind a single confirm button. Two execution loops do the rest: one drains the pre-approved queue end to end — branch, merge, deploy, smoke-test, close — and a diagnosis loop investigates open questions and routes each to its real next step. A hard safety fence means nothing risky ever ships without an explicit decision on record, and a scoping instruction is never a license to ship.

Operator Backlog Decision elicitation ProjectWorker loop ProjectInvestigator loop Run packs

Orchestration

JobFoundry — Postgres-native job orchestration

JobFoundry is the platform's job orchestration layer, built straight on top of Postgres instead of a third-party workflow engine.

Every job type lives in one registry contract — owner, queue lane, retry policy, schedule, lifecycle — with a build-time gate that refuses to let the code and the database drift apart. Every job run moves through a formal lifecycle: workers claim a lease, heartbeat while they work, back off and retry on recoverable failures, and a crashed worker's work automatically requeues. Dead-lettered jobs wait safely for an explicit, approval-gated retry. Operators get clean read-only views into queues, workers, schedules, and a sanitized event feed — never raw payloads. It went fully live across the worker fleet at the end of May 2026.

322

Registered job types — registry-to-routing parity verified, zero drift

Origins

How it happened

It started as a refusal to keep paying per token, and grew into a standing platform: a lakehouse of seventy-seven personal data sources, a local model gateway, a roster of nine agents that reason over it, and the self-management machinery — a project office and a Postgres-native job orchestrator — that keeps the whole thing advancing without a human babysitting each step. Every piece runs on hardware in a closet, and the marginal cost of the next idea is still zero.