Beyond the Research Wiki: What Happens When the Data Is You
What changes when a personal knowledge base is about you: decisions, energy, health, inner life. Domain policy files, behavioral contracts for AI agents, and why one librarian with one behavior breaks down.
Also on Medium.
On April 2, 2026, Andrej Karpathy published a short document called llm-wiki.md. It described a pattern for building personal knowledge bases with LLMs: markdown over RAG, structure over embeddings, an agent that navigates rather than searches.
My project’s dependency folder has a timestamp of March 11, three weeks earlier. That’s when I ran pnpm install, when the system was already running, already reading my vault, already showing me my week. My first git commit, dated April 4, is named “Complete Alice Dashboard v1.” I hadn’t set up version control yet. The dashboard existed before I thought to track it.
We converged on the same core insight from different directions. His problem was research data.
Mine was myself.
The pattern that works
Karpathy’s central observation is precise: “A large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge.” The insight isn’t just about tools; it’s about where LLM effort is actually valuable. Retrieval-Augmented Generation solves the wrong problem at personal scale. You don’t need vector embeddings and similarity search for a hundred articles. You need good structure, maintained index files, and a model that can navigate a filesystem. The research wiki still has a search engine, but it’s naive text search, not RAG.
The system is clean: a /raw folder for incoming sources (papers, articles, repos), an LLM that incrementally compiles them into an interlinked wiki, and health checks that find inconsistencies and fill gaps. The LLM writes everything; you read it. Every interaction compounds. Nothing is lost. You can also connect the LLM to enrich the library further, generating summary notes, diagrams, or cross-reference pages, and mix in your own annotations alongside the generated content. It works because the data is uniform: external knowledge, authored by others, all the same type of thing. One librarian, one set of rules, one behavior throughout.
That assumption breaks the moment the data is about you.
The problem with personal data
When I started building my dashboard, I wasn’t trying to solve a research retrieval problem. I was trying to build a system that could hold context about my own life (work sessions, energy state, weekly plans, health patterns) and surface it without me having to reconstruct it from scratch every morning.
The vault I work with has several distinct domains: Work, Health, Home, Future, Career, Learning, and a private area for self-observation notes. Each of these contains a different kind of data. Work notes document decisions and blockers. Health logs track patterns over time. The self-observation area contains notes about how I function, energy load, recovery state, what’s draining and what isn’t.
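For orientation, the top level looks something like this (the domain names are the ones above; the tree itself is a sketch):

```
vault/
├── Work/       # decisions, blockers, weekly reviews
├── Health/     # logs, patterns over time
├── Home/
├── Future/
├── Career/
├── Learning/
└── Self/       # private self-observation notes
```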
These are not interchangeable. A weekly review in Work can be summarized, cross-referenced, and compiled freely. A note in the self-observation area cannot be “interpreted.” If an agent reads that I had a high-load week and responds with suggestions for managing it better, that’s not helpful; it’s a violation of what the folder is for.
One librarian with one behavior doesn’t work here. The data class determines what the agent is allowed to do with it.
Three architectural decisions
1. Domain policy files
Each domain in my vault has an AGENT.md file. These are not README files or documentation; they are behavioral contracts.
Self/AGENT.md contains rules like: no interpretation, no suggestions unless explicitly asked, observations only when requested. Career/AGENT.md is permissive: summarize, suggest, cross-reference freely.
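A condensed, hypothetical version of the restrictive contract (the real file is longer; only the rules quoted above are from my vault):

```markdown
# Agent contract: Self

Purpose: self-observation notes. Energy load, recovery state,
what's draining and what isn't.

Rules:
- No interpretation of entries.
- No suggestions unless explicitly asked.
- Observations only when requested, and only as neutral restatements.
- Never fold this folder's content into generated summaries.
```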
These files are the load-bearing decision in this architecture. Everything else (the conversation flow, file edits, merge logic) depends on them being good. Early versions were thin, and the agent’s behavior was correspondingly generic. Once I rewrote them to actually describe what each area is for, what success looks like, and what should be flagged, the quality of every review changed sharply.
The enforcement is explicit: when the Architect agent starts a weekly review, all five domain AGENT.md files are read and injected into the system prompt under domain headers, each truncated to 1500-2500 characters. The LLM receives the behavioral rules for every domain before it touches any content. There’s no runtime routing; the constraints are baked into the context the model reasons from.
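A minimal sketch of that injection step, assuming Node.js. Which five domains carry contracts is an assumption beyond Self and Career:

```js
import { readFile } from "node:fs/promises";
import path from "node:path";

// Assumed set of contract-bearing domains; only Self and Career are named above.
const DOMAINS = ["Work", "Health", "Career", "Learning", "Self"];
const MAX_CHARS = 2500; // per-domain truncation; the article cites 1500-2500

async function buildDomainRules(vaultRoot) {
  const sections = [];
  for (const domain of DOMAINS) {
    const contract = await readFile(
      path.join(vaultRoot, domain, "AGENT.md"),
      "utf8"
    );
    // Each contract goes in under its own domain header, truncated, so the
    // model sees every domain's rules before it reads any vault content.
    sections.push(`## ${domain}\n${contract.slice(0, MAX_CHARS)}`);
  }
  return `# Domain rules\n\n${sections.join("\n\n")}`;
}
```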
This is a policy layer on top of structure. The moment you include self-observation data alongside curated knowledge, you need to specify not just what the agent can read, but how it’s allowed to respond to what it reads.
2. Authorship classification
The research wiki’s golden rule: the LLM writes everything. It works because everything is derived from external sources; the authorship question doesn’t arise.
In a vault that contains personal logs, it matters enormously who wrote what.
My daily notes are mine. My weekly reviews are mine. The Architect-generated files, the daily letters and weekly spreads the agent produces at the start of each week from vault state and current priorities, are not mine, even though they live in the same vault. (The daily letter I read each morning comes from here.) If the agent reads a generated letter and treats it as my own reflection, it’s reasoning over its own previous output as if it were primary data. That’s a feedback loop, not a knowledge base.
Architect-generated files live in dedicated directories; the agent finds them by path, not by frontmatter. The type field in frontmatter exists for Obsidian search and Dataview; it documents intent but enforces nothing. The authorship distinction is enforced by where files live, not by how they’re tagged. The separation is enforced; the retention policy is not, which is why the generated content ratio appears in the gaps section below.
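A sketch of the path-based check, with hypothetical directory names (the article only says generated files live in dedicated directories):

```js
// Authorship is decided by where a file lives, not by how it's tagged.
const GENERATED_DIRS = ["Architect/Letters", "Architect/Weeks"]; // hypothetical

function authorship(relPath) {
  const generated = GENERATED_DIRS.some((dir) => relPath.startsWith(dir + "/"));
  // Generated files are the agent's own prior output; they must never be
  // read back as the user's primary reflections.
  return generated ? "architect-generated" : "human-authored";
}
```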
3. A second consumer
The research wiki is agent-only. You ask a question; the agent reads the wiki and answers. Obsidian serves as a viewer for compiled, static output: you navigate to a topic, read, close. There’s no state that changes between sessions in ways you need to see immediately.
For a personal operating system, that model doesn’t work.
I need to see my capacity level when I open my laptop in the morning, not after a conversation. I need my next three actions visible without asking. My current week, my work sessions, whether today’s daily note exists: these are ambient data, not query results. They need to be surfaced continuously, not on demand. Obsidian isn’t built for live state, so I needed a separate UI.
The dashboard is a React SPA backed by a Hono.js server. Both the agent and the UI read from the same lib/ layer: lib/vault-parser.js, lib/vault-weekly.js, lib/calendar.js. The agent accesses these through MCP endpoints that mirror the HTTP API (MCP being a protocol that lets the agent call the same data functions the UI uses). Two consumers, one data layer. When I update a weekly review through the UI, the agent sees the same change the next time it reads the vault.
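A sketch of the shared-layer shape. The lib/vault-weekly.js module is from above; the parseWeek export, the route path, and the tool name are assumptions:

```js
import { Hono } from "hono";
import { parseWeek } from "./lib/vault-weekly.js"; // hypothetical export

const app = new Hono();

// Consumer 1: the dashboard UI, over HTTP.
app.get("/api/week/current", (c) => c.json(parseWeek(new Date())));

// Consumer 2: the agent, via an MCP tool whose handler calls the exact
// same function, so both consumers observe the same vault state.
// (MCP server wiring omitted; the shared handler is the point.)
const weekTool = {
  name: "get_current_week",
  handler: () => parseWeek(new Date()),
};

export default app;
```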
This architecture adds complexity. It also means the vault is genuinely live: not a static corpus to query, but a system with state that multiple consumers observe simultaneously. In practice, concurrent access isn’t an issue: the vault is 593 markdown files (~251K words), one user, and a read-mostly agent. Two readers on flat files over iCloud sync is a non-problem at this scale.
All data stays on disk. During a review session, vault content is sent to the model API and is not retained beyond the session.
What this makes possible
The three layers work together in a specific way.
When the agent opens a weekly review session, it doesn’t just read my notes. It reads my current capacity (1-5, updated at each session transition), the domain-specific rules for how to engage with different parts of my life, and the difference between what I wrote and what it generated previously. It adjusts its behavior based on where I actually am, not just what I’ve documented.
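Put together, the start of a review session amounts to something like this. Every identifier here is illustrative, not the project’s actual API; only the three inputs are from the text:

```js
// Context assembled when a weekly review session opens.
async function openReviewSession(vault) {
  return [
    `Current capacity: ${await vault.readCapacity()}/5`, // updated at session transitions
    await buildDomainRules(vault.root), // per-domain AGENT.md contracts (see above)
    "Files under generated directories are prior agent output, not primary data.",
  ].join("\n\n");
}
```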
A typical day looks like this. I open the dashboard and log my energy level and any health data. I read the daily letter, a short piece the agent wrote at the start of the week, specific to where I am in the project cycle. I run through the small Home tasks that form a morning ritual. Then I switch to the Work tab and start a timer. Throughout the day I drop completed tasks into the “done today” list; there’s a context field at the top where I pin the main thing I’m working on, because context switches are frequent. A few times a day I check Daily Minimum, a collapsed progress bar showing the non-negotiables for the day. After work I go to Career or Learning, pick one item from the week’s plan, read something, watch something, write something, and mark it done.
None of this requires reconstructing where I am from scratch. The dashboard holds the state. The agent wrote the plan. I execute.
A research library is a place you go to find things. A personal operating substrate is something that holds context about how you function and surfaces it at the right time.
Where this differs from what exists
Several comparable systems exist. MCP servers over Obsidian vaults are a growing category: generic CRUD bridges that expose notes as agent-readable data. External dashboards over markdown exist as Obsidian plugins. LLM-maintained knowledge bases in the spirit of Karpathy’s pattern are being built across the community.
None of them combine these elements in the same way, and the research confirms three specific gaps:
Per-domain behavioral contracts for personal data. The AGENTS.md convention exists for code repositories: per-folder instructions for coding agents. Applying the same pattern to life domains with structurally different rules (Career: suggest freely; Health: observe only; self-observation: never interpret) is not documented elsewhere. The ethical asymmetry between domains is what makes this non-trivial: the agent needs different behavior not because the topics differ, but because some data requires protection from interpretation.
External live dashboard alongside an AI agent, on the same data layer. Obsidian plugins provide dashboards within the app. What doesn’t exist, and wasn’t found in any comparable project, is an external React SPA that reads and writes the same vault the AI agent uses, through shared parsing code, with both consumers staying in sync. Two interfaces, one source of truth.
Markdown-native personal operating system without RAG. Tools like Khoj use vector search over personal notes. This system uses none: 593 files and ~251K words fit in a context window. The structural approach (maintained index files, navigation over retrieval) is the right choice at this scale, and it’s the same choice the research wiki pattern made. The difference is the data class it’s applied to.
What doesn’t work yet
Not everything is resolved.
Index staleness. Maintained index files are a commitment. My vault currently has dozens of orphan notes and files missing frontmatter. Keeping the indexes current requires regular maintenance that doesn’t happen automatically.
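A sketch of what an automated check could look like: find notes nothing links to and files with no frontmatter block. These are heuristics, not the project’s actual tooling:

```js
import { readFile, readdir } from "node:fs/promises";
import path from "node:path";

// Recursively collect every markdown file under the vault root.
async function listMarkdown(root, out = []) {
  for (const entry of await readdir(root, { withFileTypes: true })) {
    const p = path.join(root, entry.name);
    if (entry.isDirectory()) await listMarkdown(p, out);
    else if (entry.name.endsWith(".md")) out.push(p);
  }
  return out;
}

async function healthCheck(root) {
  const files = await listMarkdown(root);
  const texts = await Promise.all(files.map((f) => readFile(f, "utf8")));
  // Collect every [[wiki-link]] target across the vault.
  const linked = new Set(
    texts.flatMap((t) => [...t.matchAll(/\[\[([^\]|#]+)/g)].map((m) => m[1].trim()))
  );
  // Orphan: a note whose name is never the target of a link.
  const orphans = files.filter((f) => !linked.has(path.basename(f, ".md")));
  // Missing frontmatter: file doesn't open with a YAML block.
  const missingFrontmatter = files.filter((_, i) => !texts[i].startsWith("---\n"));
  return { orphans, missingFrontmatter };
}
```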
Generated content ratio. Each week the Architect produces a week plan, seven daily letters, and a review session file. Over time, if generated content accumulates without a retention policy, the agent increasingly reads its own previous output alongside authored notes. I don’t have a formal archiving policy for Architect-generated files yet. This needs to be an explicit architectural decision, not something I defer.
Evaluation is informal. I know the system is working because I use it and it feels useful. That’s not a measurement. The same gap exists in most personal tooling at this scale, but it’s still a gap.
The constraint that drives the architecture
The same structural constraints point to the same structural decisions. Local-first storage at personal scale, a corpus small enough to fit in a context window, heterogeneous data that needs navigation rather than similarity search: these constraints eliminate RAG before you even consider it. That’s where the pattern published in April 2026 and this system converge: both arrived at the same foundation from different starting points.
Where the data diverges, the architecture diverges. External research data is flat: one type, one author, one set of rules. Personal data has authorship asymmetries, domain-specific behavioral constraints, and a live-state UI requirement that a pure research corpus never needs. These aren’t refinements; they’re additional layers that only make sense once you’ve changed the data class.
The foundation is correct. What you put on it depends on what the data actually is.
Built with Hono.js, React, and an MCP server over a local Obsidian vault.