
6 min read

Teaching AI to Learn From Its Mistakes

Most AI systems forget everything between sessions. I built a feedback loop that captures insights from agent output, deduplicates them with Gemini embeddings, and injects them into future prompts.

ai · machine-learning · sqlite · embeddings

TL;DR

Via captures structured knowledge from every agent run using marker tags, deduplicates it with Gemini embeddings (cosine similarity > 0.85 = duplicate), stores it in SQLite with FTS5, and injects relevant learnings into future agent prompts using a hybrid search (30% keyword + 70% semantic). The database holds 1,604 learnings — and the 360 errors are the most valuable part.


The Amnesia Problem

Every time you start a new conversation with an AI, it forgets everything. That debugging trick it figured out yesterday? Gone. The architectural decision it helped you make last week? Vanished. The error it spent 20 minutes tracking down? Ready to be rediscovered from scratch.

This isn't just inconvenient. When you're running an orchestration system that spawns dozens of agents across hundreds of missions, amnesia becomes expensive. The same errors repeat. The same patterns get rediscovered. The same workarounds get reinvented.

I spent a week building a system to fix this. Via's learnings system captures knowledge from every agent interaction, deduplicates it using Gemini embeddings, and injects relevant past learnings back into future agent prompts. The result: a feedback loop where agents genuinely get smarter over time.

The Feedback Loop

The core concept is a closed loop:

Agent runs a mission phase
  → Agent output contains marker tags (LEARNING:, GOTCHA:, FINDING:)
  → capture.go parses markers from output
  → Deduplication check (cosine similarity on embeddings)
  → Novel learnings stored in SQLite (text + embedding vector)
      |
      v
Next agent spawns
  → spawn.go queries SQLite for relevant learnings
  → Hybrid search: FTS5 keywords + semantic similarity
  → Top matches injected into the agent's CLAUDE.md

The key insight: agents are already producing knowledge as a side effect of doing their work. A researcher discovering a useful API writes FINDING: The Frankfurter API provides free exchange rates with no auth required. A developer hitting a build error writes GOTCHA: SQLite FTS5 triggers must be created after the main table, not before.

This knowledge was always being generated. It was just evaporating. The learnings system catches it and persists it.

The Capture System

Learning capture happens in internal/learning/capture.go. The system watches for structured markers in agent output:

internal/learning/capture.go
var markers = map[string]string{
  "LEARNING:":  "insight",
  "FINDING:":   "source",
  "GOTCHA:":    "error",
  "DECISION:":  "decision",
  "PATTERN:":   "pattern",
}

Different personas produce different markers. The security-auditor emits VULNERABILITY: tags. The performance-engineer emits BOTTLENECK: tags. The architect emits DECISION: and TRADEOFF: tags. Each persona naturally documents knowledge in its domain of expertise.

When an agent finishes a phase, capture.go scans the output, extracts every tagged line, generates a Gemini embedding for each one, and sends it to the deduplication pipeline.
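At its core, that scan is a line-by-line prefix match against the markers map. Here's a minimal sketch of the idea — the Learning struct and extractLearnings function are stand-ins of mine, not the actual capture.go:

package learning

import (
	"bufio"
	"strings"
)

// Learning is a simplified stand-in for the stored record.
type Learning struct {
	Category string // e.g. "insight", "error", "source"
	Text     string // the text that followed the marker
}

// extractLearnings scans raw agent output line by line and keeps
// every line that starts with a known marker tag.
func extractLearnings(output string, markers map[string]string) []Learning {
	var found []Learning
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		for marker, category := range markers {
			if strings.HasPrefix(line, marker) {
				found = append(found, Learning{
					Category: category,
					Text:     strings.TrimSpace(strings.TrimPrefix(line, marker)),
				})
				break
			}
		}
	}
	return found
}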

Semantic Deduplication

The most important feature is what doesn't get stored. Without deduplication, the database would fill with minor variations of the same insight — "use FTS5 for search," "SQLite FTS5 is good for text search," "FTS5 provides full-text search in SQLite." Three entries saying the same thing.

When a new learning arrives, the system checks it against existing entries using Gemini embeddings:

internal/learning/db.go
// Dedup check against existing learnings (error handling elided in this excerpt)
matchID, similarity, _ := d.findMostSimilar(domain, l.Embedding)
if similarity > 0.85 {
  // Duplicate: increment seen_count, skip insert
  d.db.Exec(
      "UPDATE learnings SET seen_count = seen_count + 1 WHERE id = ?",
      matchID,
  )
  return DedupDuplicate
}

The threshold is 0.85 cosine similarity. Above that, we treat it as a duplicate and increment seen_count on the existing entry instead of inserting a new row. Below 0.70, it's definitely novel. Between 0.70 and 0.85 is the "near-duplicate" zone — similar enough to flag but different enough to store separately.
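The similarity itself is plain cosine similarity between two embedding vectors. A minimal sketch — my assumption is that findMostSimilar computes something like this against every stored vector in the domain and keeps the best match:

import "math"

// cosineSimilarity returns a value in [-1, 1]; values near 1 mean the
// two embeddings point in nearly the same direction, i.e. the texts
// are semantically close.
func cosineSimilarity(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}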

This dedup mechanism creates an unexpected signal: repetition frequency. If a learning has been "seen" 10 times across different missions, it's likely a fundamental truth about the project or the tooling. High seen_count entries are our most reliable insights. They're the things agents keep rediscovering because they matter.

Hybrid Search Injection

When spawning a new agent, the system needs to find relevant learnings from the database. Not all 1,604 — just the ones that matter for this specific phase.

It uses a hybrid search strategy combining two approaches:

  1. Keyword Search (FTS5): Fast and precise. Great for specific error messages, library names, and exact terminology. If the phase mentions "FTS5," keyword search will find every learning that mentions "FTS5."

  2. Semantic Search (Cosine Similarity on Embeddings): Catches conceptual matches. "Authentication error" matches "login failure." "Performance bottleneck" matches "slow query." The semantic layer understands meaning, not just words.

The final score blends both: 0.3 * keyword_score + 0.7 * semantic_score. The 70/30 weighting toward semantic search ensures we catch conceptually relevant learnings even when the terminology differs, while the keyword component prevents semantic drift and respects specific technical terms.
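In code, the blend is a one-liner. A sketch, assuming both scores have been normalized to the 0–1 range (the type and field names are illustrative, not Via's actual API):

const (
	keywordWeight  = 0.3 // FTS5 keyword match
	semanticWeight = 0.7 // embedding cosine similarity
)

// scoredLearning is an illustrative candidate row with both scores
// already normalized to 0–1.
type scoredLearning struct {
	ID            int64
	KeywordScore  float64
	SemanticScore float64
}

func (s scoredLearning) hybridScore() float64 {
	return keywordWeight*s.KeywordScore + semanticWeight*s.SemanticScore
}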

The top-scoring learnings get injected into the agent's prompt, formatted as categorized advice:

## Prior Learnings (Apply / Avoid / Consider)

- Apply: FTS5 + semantic embeddings hybrid search outperforms either alone
- Avoid: go build fails silently when embedding directive references missing file
- Consider: SQLite over Postgres for CLI tools — simpler deployment, sufficient for single-user
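The formatting step is simple string building. A sketch, reusing the Learning stand-in from the capture sketch above — the mapping from category to Apply/Avoid/Consider is my guess at how spawn.go labels each entry:

import (
	"fmt"
	"strings"
)

// formatLearnings renders retrieved learnings as the advice block that
// gets appended to the agent's CLAUDE.md.
func formatLearnings(learnings []Learning) string {
	labels := map[string]string{
		"insight":  "Apply",
		"pattern":  "Apply",
		"error":    "Avoid",
		"decision": "Consider",
		"source":   "Consider",
	}
	var b strings.Builder
	b.WriteString("## Prior Learnings (Apply / Avoid / Consider)\n\n")
	for _, l := range learnings {
		label := labels[l.Category]
		if label == "" {
			label = "Consider"
		}
		fmt.Fprintf(&b, "- %s: %s\n", label, l.Text)
	}
	return b.String()
}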

What the Data Shows

The database currently holds 1,604 learnings and 54 meta-learnings. Here's the breakdown:

Category     Count   %       What it captures
Insights     703     43.8%   Techniques, patterns, "this works well"
Errors       360     22.4%   Mistakes, gotchas, "this broke"
Sources      195     12.2%   Useful APIs, docs, references
Decisions    185     11.5%   Architectural choices with rationale
Patterns     148     9.2%    Reusable approaches across domains

The 360 errors are the most valuable category. Each represents a mistake that an agent made, documented, and that no future agent should repeat. "Avoid: go build fails silently when embedding directive references a missing file" saves a future agent 15 minutes of debugging — multiplied across hundreds of missions.

The Honest Limitations

The system works, but it has clear gaps:

No learning decay. Old learnings persist indefinitely. A workaround for a bug that's since been patched still gets injected into agent prompts. I need a mechanism to mark learnings as stale or to weight recent learnings higher.

No quality feedback. The system tracks how often a learning is seen (via dedup), but not whether it was helpful. An agent might receive a learning and ignore it because it's irrelevant. There's no signal for this.

Embedding quality variance. Gemini embeddings are good but not perfect. Occasionally, two genuinely different insights get flagged as duplicates because their surface-level language is similar, even though the underlying concepts differ.

Despite the noise, the signal is strong enough to matter. Agents running today are measurably better than agents running two weeks ago, because they stand on the shoulders of 1,604 prior insights. The feedback loop works — it just needs refinement.

Next in series: Six Plugins, One Brain

