
7 min read

How AI Agent Memory Actually Works

Most people assume agent memory requires vector databases and complex retrieval pipelines. OpenClaw solved it with markdown files and four mechanisms. I audited our own system against that framework and found we score 1 of 4.

ai · memory · architecture · agents

How AI agent memory works — episodic, semantic, and procedural layers

TL;DR

Google's "Context Engineering: Sessions and Memory" white paper builds on the CoALA framework, which defines three memory types for AI agents: episodic (what happened), semantic (what I know), and procedural (how I do things). OpenClaw implements all three using markdown files and four write mechanisms that fire at specific lifecycle moments. I audited Via against this framework: our semantic memory scores well (1,469 learnings, hybrid search), but episodic and procedural memory are the weak links. We currently implement only one of the four mechanisms.


The Filing Cabinet I Never Built

Via's learnings system has captured 1,469 insights from real agent work. Semantic deduplication, hybrid search, automatic injection into future prompts — the whole pipeline. I was proud of it.

Then I watched a breakdown of OpenClaw's memory system and realized I'd built an elaborate filing cabinet while ignoring the desk entirely. The problem isn't what gets stored. It's when things get written and what happens during the moments between sessions.

The video's core claim stopped me: "Most people think that you need a vector database, complex retrieval pipelines, or specialized memory to handle this. But OpenClaw solved it with markdown files and four mechanisms that fire at the right moments in a conversation."

Markdown files. Not a vector store. Not a graph database. Markdown.

Three Types of Memory (That Aren't What You Think)

Google published a white paper in November 2025 called "Context Engineering: Sessions and Memory" that builds on the CoALA framework (originally from the 2023 paper "Cognitive Architectures for Language Agents"). The framework categorizes agent memory into three types, each answering a different question:

| Memory Type | The Question | Example |
| --- | --- | --- |
| Episodic | What happened? | "In our last session, we debugged a race condition in the scheduler" |
| Semantic | What do I know? | "Joey prefers Go over Python for CLI tools" |
| Procedural | How do I do this? | "When deploying, run tests first, then build, then push" |

The framework makes a distinction that clicked for me: session context and long-term memory are fundamentally different things. The video uses an analogy I keep coming back to — a session is a messy desk for a current project, notes and documents scattered everywhere. Memory is the filing cabinet where things are categorized and stored. The desk gets cleared at the end of the day. The filing cabinet persists.

The problem is what happens at the boundary. When the desk gets cleared — when a context window fills up and the system runs compaction — everything that wasn't filed away is gone. Not archived. Gone.

How OpenClaw Actually Solves This

OpenClaw's implementation is disarmingly simple. Three storage components, each mapped to a memory type:

  • memory.md — semantic memory. Stable facts, preferences, identity. Loaded into every prompt. Capped at 200 lines.
  • Daily logs — episodic memory. Append-only, organized by day. New entries added, nothing removed.
  • Session snapshots — also episodic. The last 15 meaningful messages captured when a session ends. Not a summary — raw conversation text saved as markdown.

No embeddings. No similarity search. No retrieval pipeline. Just files.
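
To make that concrete, here's a minimal sketch in Go (Via's language, not necessarily OpenClaw's) of what those three components might look like on disk and how an append-only daily log grows. The directory layout, names, and function shapes are my assumptions, not OpenClaw's actual code:

```go
package memory

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// A guess at the on-disk shape -- three plain-markdown components:
//
//   memory/memory.md             semantic: stable facts, ~200-line cap
//   memory/logs/2026-01-28.md    episodic: append-only, one file per day
//   memory/sessions/<topic>.md   episodic: raw snapshots of ended sessions
const root = "memory"

// LoadSemantic reads memory.md so the system can inject it into every prompt.
func LoadSemantic() string {
	b, _ := os.ReadFile(filepath.Join(root, "memory.md"))
	return string(b)
}

// AppendDaily adds one entry to today's log. Entries are only ever appended;
// nothing is rewritten or removed.
func AppendDaily(entry string) error {
	dir := filepath.Join(root, "logs")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	path := filepath.Join(dir, time.Now().Format("2006-01-02")+".md")
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintf(f, "- %s %s\n", time.Now().Format("15:04"), entry)
	return err
}
```

That's the entire storage layer: two kinds of files, one of which only ever grows.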

But the storage is only half the story. What makes OpenClaw's system work is the four mechanisms that write to these files (I sketch two of them in code right after the list):

1. Bootstrap loading. memory.md gets injected into every prompt automatically by the system. Daily logs get loaded by the agent itself, following its own instructions. Two different loading patterns for two different memory types — system-injected vs. agent-loaded.

2. Pre-compaction flush. This is the one that impressed me. When the context window approaches its limit, the system injects a silent agentic turn — invisible to the user — that tells the model: "You're about to lose context. Save anything important." The agent writes to the daily log as a checkpoint. The video compares this to a write-ahead log in databases: turning a destructive operation (losing context) into a checkpoint. It's the kind of pattern that seems obvious in hindsight but that I hadn't thought to implement.

3. Session snapshots. When a user starts a new session, a hook grabs the last chunk of conversation, filters it to meaningful messages only (no tool calls, no system messages), and the LLM generates a descriptive filename. Raw text, not summaries. "It's not a summary, it's a snapshot of what you were talking about, saved before the slate gets wiped."

4. User-directed. The simplest mechanism: the user says "remember this" and the agent routes it to either memory.md (semantic) or the daily log (episodic) based on what it is. No special infrastructure. Just instructions.
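
To make mechanisms 2 and 3 concrete, here's a rough sketch of a pre-compaction flush and a session snapshot. The 15-message cutoff and the message filtering come from the description above; the threshold, prompt wording, names, and signatures are my guesses, not OpenClaw's real implementation:

```go
package memory

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Message is one turn of a conversation.
type Message struct {
	Role    string // "user", "assistant", "system", "tool"
	Content string
}

// MaybeFlush is the pre-compaction checkpoint (mechanism 2). When the context
// window is nearly full, it runs a silent turn, invisible to the user, asking
// the model to save anything important, then appends the answer to the daily
// log. The 0.85 trigger point and the prompt wording are assumptions.
func MaybeFlush(usedTokens, maxTokens int, ask func(string) string, appendDaily func(string) error) error {
	if float64(usedTokens)/float64(maxTokens) < 0.85 {
		return nil
	}
	note := ask("You are about to lose context. Write down anything from this session worth keeping.")
	if note == "" {
		return nil
	}
	return appendDaily("[pre-compaction checkpoint] " + note)
}

// Snapshot is the session snapshot (mechanism 3). It keeps only meaningful
// messages (no tool calls, no system messages), takes the last 15, and writes
// them as raw markdown under a filename produced by an LLM call.
func Snapshot(dir string, history []Message, nameIt func([]Message) string) error {
	var meaningful []Message
	for _, m := range history {
		if m.Role == "user" || m.Role == "assistant" {
			meaningful = append(meaningful, m)
		}
	}
	if len(meaningful) > 15 {
		meaningful = meaningful[len(meaningful)-15:]
	}
	var b strings.Builder
	for _, m := range meaningful {
		fmt.Fprintf(&b, "**%s**: %s\n\n", m.Role, m.Content)
	}
	name := nameIt(meaningful) // e.g. "debugging-scheduler-race-condition.md"
	return os.WriteFile(filepath.Join(dir, name), []byte(b.String()), 0o644)
}
```

Both do the same job: they turn a destructive boundary (compaction, session end) into a write, the write-ahead-log move described above.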

Claude Code recently shipped a memory feature using the same pattern — a MEMORY.md file loaded into every session. The convergence is telling. Markdown-as-memory isn't a hack. It's becoming a pattern.

The Audit: Via Scores 1 of 4

I mapped Via's current memory system against OpenClaw's four mechanisms:

| Mechanism | OpenClaw | Via | Status |
| --- | --- | --- | --- |
| Bootstrap loading | memory.md + daily logs injected at session start | CLAUDE.md + learnings injected at spawn | Partial |
| Pre-compaction flush | Silent save before context truncation | Nothing | Missing |
| Session snapshots | Raw conversation captured on session end | Nothing | Missing |
| User-directed | "Remember this" routing to semantic/episodic storage | Learning markers (GOTCHA:, FINDING:, etc.) | Partial |

And by memory type:

| Memory Type | Via's Coverage | Details |
| --- | --- | --- |
| Semantic | Strong | 1,469 learnings in SQLite with hybrid search (0.3 keyword + 0.7 semantic), 0.85 cosine dedup threshold |
| Episodic | Weak | No session snapshots, no pre-compaction flush, no conversation history between sessions |
| Procedural | Static | Persona definitions and CLAUDE.md files, but no dynamic extraction of learned workflows |

The pattern is clear: Via invested heavily in what agents know (semantic) but barely at all in what happened (episodic) or how to do things (procedural). Our filing cabinet is well-organized. But we never built the mechanism to sweep the desk before it gets cleared.
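
For concreteness, the semantic side's retrieval reduces to the two numbers in that table: a 0.3/0.7 blend of keyword and embedding scores, and a 0.85 cosine cutoff for dedup. A simplified sketch, with made-up names and scores assumed to be normalised to [0, 1], not Via's actual code:

```go
package learnings

// Illustrative constants matching the numbers above.
const (
	keywordWeight  = 0.3  // weight on the keyword (full-text) score
	semanticWeight = 0.7  // weight on the embedding cosine similarity
	dedupThreshold = 0.85 // above this similarity, a new learning is a duplicate
)

// HybridScore ranks a stored learning against a query.
func HybridScore(keywordScore, cosineSim float64) float64 {
	return keywordWeight*keywordScore + semanticWeight*cosineSim
}

// IsDuplicate reports whether a candidate learning should be dropped
// instead of stored.
func IsDuplicate(cosineSim float64) bool {
	return cosineSim >= dedupThreshold
}
```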

What This Framework Reveals

The video ends with three questions that cut through the complexity: "What's worth remembering? Where does it go? When does it get written?"

For Via, the answers expose the gaps:

What's worth remembering? We capture learnings well (structured markers, semantic dedup). But we don't capture conversation context, session state, or emergent workflows. An agent that spent 20 minutes debugging a race condition produces a GOTCHA: tag — but the process of how it got there, the back-and-forth, the dead ends — all of that evaporates.

Where does it go? SQLite for learnings, CLAUDE.md for static configuration. No equivalent to daily logs or session snapshots. No episodic layer at all.

When does it get written? Only at phase completion, when the orchestrator extracts markers from agent output. Nothing fires mid-session. Nothing fires at compaction boundaries. Nothing fires when a session ends. The two most critical lifecycle moments — compaction and session termination — go completely unhandled.
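
That one write path is roughly this shape: a simplified sketch of scanning phase output for markers. The marker list is abbreviated and the function is illustrative, not Via's actual extractor:

```go
package learnings

import (
	"bufio"
	"strings"
)

// markers are the prefixes the orchestrator looks for in agent output
// (abbreviated; the real list has more).
var markers = []string{"GOTCHA:", "FINDING:"}

// Extract scans a completed phase's output for learning markers. This is the
// only write mechanism firing today: nothing runs mid-session, at compaction,
// or at session end.
func Extract(output string) []string {
	var found []string
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		for _, m := range markers {
			if strings.HasPrefix(line, m) {
				found = append(found, strings.TrimSpace(strings.TrimPrefix(line, m)))
			}
		}
	}
	return found
}
```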

The Honest Limitations

This is borrowed framing, not original research. The CoALA framework comes from the research literature, by way of Google's white paper. The implementation analysis comes from OpenClaw's public system. My contribution is applying both to Via's architecture: useful, but derivative.

The audit oversimplifies. Scoring "1 of 4 mechanisms" makes the gap sound worse than it might be. Via's learnings system does sophisticated things (semantic dedup, hybrid search) that OpenClaw's daily logs don't. A scorecard doesn't capture that nuance.

I haven't proven the gaps matter. Via's agents perform well today with just semantic memory. Maybe episodic memory would help; maybe the added complexity isn't worth the signal. I won't know until I build it and measure the difference.

Markdown-as-memory has scaling limits. OpenClaw's 200-line cap on memory.md works for a single user. Via runs hundreds of missions with 1,469 learnings. The simplicity that works at OpenClaw's scale may not transfer directly.

Next post: how I'm closing the gaps — implementing pre-compaction flush and session snapshots for Via.

