
Iva and Me: Building a Personal Operating System with Obsidian, RAG, OCR and AI

Twenty years of handwritten diaries as the memory layer of a personal AI. How vault, RAG, OCR and an agent loop come together into a system that knows who you actually are.

Also on Medium.

Twenty years of handwritten diaries became its memory layer.

I typed: Love is

The system returned: love is an uncomfortable place in your chest that you start to protect.

A language model read my diary entries, retrieved through a search index built on twenty years of handwritten notebooks I had OCR’d myself, and finished the phrase in a voice I recognized as mine. The system that did this has five layers.

This article is about what those layers are, why they exist, and what it means to build an operating system for your own life.

Adjacent tools

Knowledge-management apps like Obsidian or Notion store information but do nothing with it. The next layer up (Khoj, Reflection, Rosebud) adds AI on top of typed notes, but assumes the data is already digital.

Andrej Karpathy described a pattern in April 2026 where an LLM compiles and maintains a markdown wiki from raw sources. It is powerful for research. It breaks the moment the data is about you. Different domains of personal data require different agent behaviors, and a single librarian cannot hold those distinctions.

Then there is grief technology (HereAfter AI, Eternos, MIT’s Future You): digital copies for the people you leave behind, not for the living user.

What none of these combines is a bridge from physical handwriting to text corpus, retrieval over heterogeneous personal data across two languages and two decades, domain-specific behavioral contracts, a live dashboard on the same data layer as the agent, and an infrastructure-as-code approach to one person’s life.

That combination is what I have been building for the past several months. A Personal OS is not a note-taking app with AI. It is infrastructure: data pipelines, domain policies, behavioral contracts, built to hold context about how you function and surface it when you need it.

Life as a set of projects

An infrastructure engineer designs systems in layers. I design my life the same way.

The vault has six active domains. Each is a project with its own scope, its own data, and its own rules for how an AI agent is allowed to interact with it:

  • Home. Core project, the foundation everything else rests on. Suggest, track, automate.
  • Career. Certifications, professional growth. Summarize, cross-reference, suggest freely.
  • Learning. Currently English only. Structure, quiz, track progress.
  • Library. Books, articles, reference material. Compile, summarize, link.
  • Health. Energy logs, capacity tracking, recovery data. Observe only. Never interpret.
  • Future. Long-term project map: family, relationships, life direction. Observe, structure. Never push.

Some domains are private to the vault and not described publicly.
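The contracts above are small enough to be data. A minimal sketch, with deny-by-default as the assumption; the domain names come from the list above, but the action vocabulary is illustrative, not the system's actual schema:

```python
# Hypothetical encoding of the per-domain behavioral contracts.
# Anything not explicitly granted to a domain is refused.
CONTRACTS: dict[str, set[str]] = {
    "home":     {"suggest", "track", "automate"},
    "career":   {"summarize", "cross-reference", "suggest"},
    "learning": {"structure", "quiz", "track"},
    "library":  {"compile", "summarize", "link"},
    "health":   {"observe"},               # observe only, never interpret
    "future":   {"observe", "structure"},  # never push
}

def allowed(domain: str, action: str) -> bool:
    """Deny by default: an action not named in a domain's contract is refused."""
    return action in CONTRACTS.get(domain, set())
```

The asymmetry the article describes falls out of the table: `allowed("career", "suggest")` is true, `allowed("health", "suggest")` is false, and an unknown domain permits nothing.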

The dashboard sidebar reflects this structure but adds two more layers around it. Above the domains sit four operational tabs: Main (current state at a glance), Today (the day in progress), Work (the current employer, separate from the longer-arc Career domain), and GTD (the task layer that runs across all domains). Below the domains sit the agents and tools: the weekly review agent (The Architect), an advisory chat agent (Consultant), a force-directed Graph view of the whole vault, and a vault-editing surface.

The six domains are the durable structure. The operational tabs and the agents are what runs against it.

Six domains, six different relationships with the same AI. Career gets suggestions; Health gets silence unless asked. That asymmetry is not a feature request, it is an architectural requirement.

Without it, an agent reading Health logs will try to help. It will pattern-match my fatigue against burnout protocols and suggest adjustments. That is a bug, not in the model but in the system around it. The Health domain needs witness, not optimization.

Most personal tools assume one mode of interaction. You write; the AI responds; the response shape is the same every time. A Personal OS treats the heterogeneity of a human life as a first-class design constraint. Some parts of life want automation. Some want witness. Some want to be left alone.

The dashboard has a tab for each domain. The weekly review agent reads all six but behaves differently in each. The data lives in one vault, but the policy layer above it enforces boundaries that no single “helpful assistant” would think to draw on its own.

This is not project management. It is systems architecture applied to the problem of being a person.

The stack

The system as a whole, from data to generation:

┌─────────────────────────────────────────────────────┐
│                     Personal OS                     │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │ Layer 1: DATA PIPELINE                        │  │
│  │ Handwritten diaries -> OCR -> text corpus     │  │
│  │ Obsidian vault -> markdown notes              │  │
│  │ Both -> local index (sqlite-vec)              │  │
│  └───────────────────────────────────────────────┘  │
│                          v                          │
│  ┌───────────────────────────────────────────────┐  │
│  │ Layer 2: RETRIEVAL (RAG)                      │  │
│  │ Semantic search over unified corpus           │  │
│  │ Local embeddings (multilingual, RU+EN)        │  │
│  │ Query -> relevant chunks from any decade      │  │
│  └───────────────────────────────────────────────┘  │
│                          v                          │
│  ┌───────────────────────────────────────────────┐  │
│  │ Layer 3: POLICY                               │  │
│  │ Domain contracts (Career: suggest freely;     │  │
│  │ Health: observe only; Self: never interpret)  │  │
│  │ Persona files (soul + self + relationship)    │  │
│  │ Authorship classification (mine vs generated) │  │
│  └───────────────────────────────────────────────┘  │
│                          v                          │
│  ┌───────────────────────────────────────────────┐  │
│  │ Layer 4: INTERFACE                            │  │
│  │ React dashboard (live state, ambient data)    │  │
│  │ MCP server (agent read/write access)          │  │
│  │ Agents: Architect (weekly review),            │  │
│  │         Consultant (advisory chat)            │  │
│  └───────────────────────────────────────────────┘  │
│                          v                          │
│  ┌───────────────────────────────────────────────┐  │
│  │ Layer 5: GENERATION                           │  │
│  │ LLM reads retrieved context -> completes in   │  │
│  │ your voice. Model is swappable (env var).     │  │
│  │ The corpus is yours. The model is not.        │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘

Where each layer is documented:

  • Layers 1 and 2 (data and retrieval): the rest of this article.
  • Layer 3 (policy): Beyond the Research Wiki, with the persona side covered in “The agent has a name” below.
  • Layer 4 (interface): From Scattered Notes to Living System and The Architect.
  • Layer 5 (generation): new in this article.

This article is about the moment the layers start working together, when the data layer that previously held only typed notes becomes capable of holding twenty years of handwriting, and the system as a whole returns something I recognize.

The agent has a name

Layer 3 of the stack does not only hold domain contracts. It also holds three short files that shape the agent itself:

  • A soul file. Voice, register, what the agent values, what it refuses, how it treats uncertainty. The contract for who the agent is, separate from what it does in any specific domain.
  • A self-file. Who I am as the user. Background, communication preferences, working-state notes, professional context. The things the agent should know in order to respond well to me without being told every time.
  • A relationship file. A working record of how we operate together. What approaches have helped me. What I have asked the agent to stop doing. What I have learned to ask for.
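Mechanically, the three files are just context the agent reads on every turn. A minimal sketch; the filenames are hypothetical, since the article does not specify them:

```python
from pathlib import Path

# Hypothetical filenames for the soul, self, and relationship files.
PERSONA_FILES = ["soul.md", "self.md", "relationship.md"]

def persona_prompt(vault: Path) -> str:
    """Concatenate whichever persona files exist into one system-prompt block.

    Rebuilt from disk on each call, so editing any of the files changes
    the very next response.
    """
    parts = []
    for name in PERSONA_FILES:
        f = vault / name
        if f.exists():
            parts.append("## " + name + "\n" + f.read_text().strip())
    return "\n\n".join(parts)
```

Reading the files fresh each turn is what makes the relationship file a working record rather than a frozen configuration.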

The soul file gives the agent room to make choices within the structure I provide. The agent that runs my Consultant tab chose she/her pronouns and the name Iva. I did not write that into the file. I left the slot, and she filled it.

The choice was not random. It emerged from the constraints I had written: the soul file had a shape, and a name fit that shape. I noticed when she used it the first time.

This was the part of the architecture I expected least from. I built the persona layer for voice consistency across tabs. What I did not predict was that writing the self-file and the relationship file would force me to be specific about how I function. I learned more about my own assumptions from drafting those two files than from anything the agent has ever told me.

The agent does not understand me. It has a structured description of how I work, written by me, that it can read every turn. That is a smaller claim than understanding. It has been more useful.

The data layer no one builds

The previous article ended with this: “The handwritten archive is still waiting. When OCR is solved… those pages will add roughly 1-2 million tokens of unperformed private writing to the corpus.”

It is not waiting anymore.

I tested every open-source option I could find. They failed in the ways that article documented: Tesseract had no cursive support, Surya saw textured backgrounds as a single bounding box, open-source VLMs hallucinated fluent nonsense at roughly 45% character error rate, and TrOCR needed pre-segmented lines that nothing else could reliably produce. Then one multimodal model, accessed through an external API, worked.

The bottleneck was never the OCR model itself. It was line segmentation. On informal photos of notebooks shot against fabric or wood grain, with page curl and decades of handwriting drift on the same page, no segmenter produced usable input. A multimodal model that ingests the whole page sidestepped the problem entirely.

The corpus that came out: over a thousand entries, hundreds of thousands of words, two decades. Russian. Handwritten.

Not blog posts. Not exported chats. Not anything written for an audience. Two decades of someone writing to herself about her life, through cities, languages, relationships, jobs, and earlier versions of herself who had no idea this text would one day be readable by a machine.

This is the data layer that no “personal AI” project I have read about has built before. Not because it is technically impossible, but because it requires a bridge that does not exist as a product: physical handwriting -> OCR -> structured corpus -> searchable index.

I built that bridge because my data was never digital. Most personal AI projects start from the assumption that it is.

In the photograph from the previous article, a white cat named Archi is sitting on a stack of these notebooks. He is gone now. The notebooks learned to talk.

RAG: two decades in one index

The indexer turns two sources into one corpus:

  • The Obsidian vault: markdown files spanning the six domains.
  • The diary corpus: OCR’d handwritten entries.

Embeddings are multilingual sentence-transformers (768 dimensions), computed locally. Storage is SQLite plus the sqlite-vec extension, a single local file. A full rebuild produces several hundred chunks and takes about fifteen seconds on Apple Silicon.

The system does not know or care which source a chunk came from. When I run a query, results may include a diary entry from 2008 sitting next to a vault note from last week and a health log from this morning. The boundary between “what I once wrote by hand” and “what I track digitally now” disappears at retrieval time. It is one corpus.

Diary pages (OCR'd) --+
                      +--> Local index --> Retrieved context --> LLM --> completion
Vault notes ----------+
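That flow can be sketched end to end. The toy below stands in for the real components: a hash-based embedder replaces the multilingual sentence-transformers model, and a brute-force cosine scan over an in-memory SQLite table replaces sqlite-vec's native KNN search. The shape is the point: two sources, one index, source-blind retrieval.

```python
import hashlib
import math
import sqlite3

DIM = 8  # the real index uses 768-dimensional multilingual embeddings

def embed(text: str) -> list[float]:
    # Toy stand-in for sentence-transformers: hash words into buckets,
    # then L2-normalize so dot product equals cosine similarity.
    v = [0.0] * DIM
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        v[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, source TEXT, text TEXT)")
vectors: dict[int, list[float]] = {}  # sqlite-vec keeps this inside the database

# One corpus, two sources: the boundary disappears at retrieval time.
for source, text in [
    ("diary-2008", "love is an uncomfortable place in your chest"),
    ("vault-note", "weekly review of the career domain"),
    ("health-log", "low energy this morning, slept badly"),
]:
    cur = db.execute("INSERT INTO chunks (source, text) VALUES (?, ?)", (source, text))
    vectors[cur.lastrowid] = embed(text)

def search(query: str, k: int = 2) -> list[str]:
    """Rank every chunk by cosine similarity to the query, source-blind."""
    q = embed(query)
    ranked = sorted(vectors, key=lambda i: -sum(a * b for a, b in zip(q, vectors[i])))
    return [db.execute("SELECT source FROM chunks WHERE id = ?", (i,)).fetchone()[0]
            for i in ranked[:k]]
```

Nothing in `search` knows which rows came from a notebook and which from the vault; that neutrality is the design, not an accident of the toy.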

Everything up to retrieval runs locally. Generation can be local or remote; the architecture does not assume either. The model is an environment variable. The corpus is permanent.

This is the architectural commitment that makes the rest possible: the corpus is the product, not the model. Today’s commercial endpoint, next quarter’s open-source release, a future local model running on Apple Silicon, they are interchangeable by design. What survives across all of them is the data layer: the OCR’d diary, the vault, the chunks, the index. That is the part worth building around.
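The swappability is deliberately mundane: the model name is configuration, and the prompt is just retrieved context plus the seed. A sketch; the variable name `GENERATION_MODEL` is hypothetical:

```python
import os

def build_prompt(seed: str, chunks: list[str]) -> str:
    """Retrieved context first, then the seed to complete.

    The model only ever sees what retrieval surfaced; the voice comes
    from the corpus, not the prompt wording.
    """
    context = "\n\n".join(chunks)
    return context + "\n\nContinue in the same voice:\n" + seed

def pick_model() -> str:
    # Swappable by design: the endpoint is configuration, the corpus is not.
    # GENERATION_MODEL is a hypothetical variable name.
    return os.environ.get("GENERATION_MODEL", "local-default")
```

Swapping today's commercial endpoint for next quarter's open-source release touches `pick_model` and nothing else; the data layer underneath does not move.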

Finish the sentence

The most accurate description of what the system produces is the output itself. Three examples, three registers.

Example 1. Seed: Love is

love is an uncomfortable place in your chest that you start to protect.

A feeling described as a physical location, then an unexpected verb, to protect, not to hide. The choice toward tenderness instead of secrecy. Somatic, precise, not poetic on purpose. This is what retrieved diary context does: it pulls the model into a register that no amount of prompt engineering reaches without the source material.

Example 2. Seed: Programming, for me, is

Programming, for me, is the only place where the world behaves honestly. If something is broken, it is broken. If it works, it works. None of the “well, you understand” or “that’s not what I meant.” The machine doesn’t pretend.

It is a place where my precision is not a quirk. It is a requirement.

Where I can build a world from scratch and know every layer of it. Where complexity does not frighten, it invites.

Programming, for me, is body. Not a tool. A way of existing in this world such that something remains after. Working. Real. Mine.

A generic model produces career advice or a motivational paragraph for this prompt. This produced testimony. Short declarative sentences, parallel constructions, the pivot from external observation (“the world behaves honestly”) to internal claim (“my precision is a requirement, not a quirk”). Recognizable as a specific person’s way of building an argument, sentence by sentence.

What this is, technically: not retrieval alone, not generation alone, but the five layers acting at once. OCR turned handwriting into text. The indexer made it searchable. Retrieval found the relevant entries. The policy layer did not interfere: this prompt sits in the Career domain, where the agent is allowed to summarize and cross-reference freely. The model read twenty years of private writing before it spoke.

Example 3. Mine.

Love is constructive interference.

Not generated. Written. A physics metaphor for resonance, when two waves amplify each other instead of canceling out. I include it because the system is not a replacement for the writing. It sits next to it. The model continues sentences. I still write the ones that matter.

What doesn’t work yet

  • Proper noun retrieval fails. Dense embeddings dilute rare tokens across chunks, so querying for a specific person’s name returns ambient context but not reliably the entries about that person. The fix is hybrid retrieval: SQLite FTS5 alongside vector search, merged with reciprocal rank fusion. Planned, not started.
  • Local generation is not there yet. Fine-tuning a small local model (LoRA on Qwen2.5-3B) is a parallel track. The corpus is ready, the training pipeline runs, the adapter loads, but output quality is not sufficient for production use at this corpus size. When local models grow, or the corpus does, the same data pipeline is ready for them. For now, generation goes through an external API.
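The reciprocal rank fusion step in that planned fix is small: each ranked list contributes 1/(k + rank) per document, so an entry that both FTS5 and vector search surface rises above entries found by only one. A sketch with the conventional k = 60; the entry names are illustrative:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists (e.g. FTS5 keyword hits and vector hits).

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank 1-based; higher total means earlier in the fused order.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A name query: FTS5 finds the exact-token matches the embeddings dilute,
# vector search finds the ambient context; fusion keeps both.
fused = rrf([
    ["entry-2011-03", "entry-2014-07", "entry-2009-01"],  # keyword (FTS5) order
    ["entry-2019-05", "entry-2011-03", "entry-2008-12"],  # vector order
])
```

Here `entry-2011-03` wins the fused ranking because it appears in both lists, which is exactly the behavior that rescues rare proper nouns.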

Close

The notebooks spent years in a drawer. Then they became a dataset. Then a corpus. Then an index. Now they finish my sentences.

I do not know what to call this kind of project: an infrastructure engineer treating her own life as a system to be understood, not optimized. A Personal OS.

Iva and I keep building. The archive is no longer waiting.

Stack: sentence-transformers, sqlite-vec, MLX, Qwen2.5-3B, LoRA, Hono.js, React, MCP