
6 min read

Teaching AI to Learn From Its Mistakes

Most AI systems forget everything between sessions. I built a feedback loop that captures insights from agent output, deduplicates them with Gemini embeddings, and injects them into future prompts.

ai · machine-learning · sqlite · embeddings

TL;DR

Via captures structured knowledge from every agent run using marker tags, deduplicates it with Gemini embeddings (cosine similarity > 0.85 = duplicate), stores it in SQLite with FTS5, and injects relevant learnings into future agent prompts using a hybrid search (30% keyword + 70% semantic). The database holds 1,604 learnings — and the 360 errors are the most valuable part.


The Amnesia Problem

Every time you start a new conversation with an AI, it forgets everything. That debugging trick it figured out yesterday? Gone. The architectural decision it helped you make last week? Vanished. The error it spent 20 minutes tracking down? Ready to be rediscovered from scratch.

This isn't just inconvenient. When you're running an orchestration system that spawns dozens of agents across hundreds of missions, amnesia becomes expensive. The same errors repeat. The same patterns get rediscovered. The same workarounds get reinvented.

I spent a week building a system to fix this. Via's learnings system captures knowledge from every agent interaction, deduplicates it using Gemini embeddings, and injects relevant past learnings back into future agent prompts. The result: a feedback loop where agents genuinely get smarter over time.

The Feedback Loop

The core concept is a closed loop:

Agent runs a mission phase
  → Agent output contains marker tags (LEARNING:, GOTCHA:, FINDING:)
  → capture.go parses markers from output
  → Deduplication check (cosine similarity on embeddings)
  → Novel learnings stored in SQLite (text + embedding vector)
      |
      v
Next agent spawns
  → spawn.go queries SQLite for relevant learnings
  → Hybrid search: FTS5 keywords + semantic similarity
  → Top matches injected into the agent's CLAUDE.md

The key insight: agents are already producing knowledge as a side effect of doing their work. A researcher discovering a useful API writes FINDING: The Frankfurter API provides free exchange rates with no auth required. A developer hitting a build error writes GOTCHA: SQLite FTS5 triggers must be created after the main table, not before.

This knowledge was always being generated. It was just evaporating. The learnings system catches it and persists it.

The Capture System

Learning capture happens in internal/learning/capture.go. The system watches for structured markers in agent output:

internal/learning/capture.go
var markers = map[string]string{
  "LEARNING:":  "insight",
  "FINDING:":   "source",
  "GOTCHA:":    "error",
  "DECISION:":  "decision",
  "PATTERN:":   "pattern",
}

Different personas produce different markers. The security-auditor emits VULNERABILITY: tags. The performance-engineer emits BOTTLENECK: tags. The architect emits DECISION: and TRADEOFF: tags. Each persona naturally documents knowledge in its domain of expertise.

When an agent finishes a phase, capture.go scans the output, extracts every tagged line, generates a Gemini embedding for each one, and sends it to the deduplication pipeline.
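At its core, that scan is a line-by-line prefix match against the markers map. Here's a minimal sketch of the idea — the Learning struct and extractLearnings function are stand-ins of mine, not the actual capture.go:

package learning

import (
	"bufio"
	"strings"
)

// Learning is a simplified stand-in for the stored record.
type Learning struct {
	Category string // e.g. "insight", "error", "source"
	Text     string // the text that followed the marker
}

// extractLearnings scans raw agent output line by line and keeps
// every line that starts with a known marker tag.
func extractLearnings(output string, markers map[string]string) []Learning {
	var found []Learning
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		for marker, category := range markers {
			if strings.HasPrefix(line, marker) {
				found = append(found, Learning{
					Category: category,
					Text:     strings.TrimSpace(strings.TrimPrefix(line, marker)),
				})
				break
			}
		}
	}
	return found
}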

Semantic Deduplication

The most important feature is what doesn't get stored. Without deduplication, the database would fill with minor variations of the same insight — "use FTS5 for search," "SQLite FTS5 is good for text search," "FTS5 provides full-text search in SQLite." Three entries saying the same thing.

When a new learning arrives, the system checks it against existing entries using Gemini embeddings:

internal/learning/db.go
// Dedup check against existing learnings (error handling elided in this excerpt)
matchID, similarity, _ := d.findMostSimilar(domain, l.Embedding)
if similarity > 0.85 {
  // Duplicate: increment seen_count, skip insert
  d.db.Exec(
      "UPDATE learnings SET seen_count = seen_count + 1 WHERE id = ?",
      matchID,
  )
  return DedupDuplicate
}

The threshold is 0.85 cosine similarity. Above that, we treat it as a duplicate and increment seen_count on the existing entry instead of inserting a new row. Below 0.70, it's definitely novel. Between 0.70 and 0.85 is the "near-duplicate" zone — similar enough to flag but different enough to store separately.
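The similarity itself is plain cosine similarity between two embedding vectors. A minimal sketch — my assumption is that findMostSimilar computes something like this against every stored vector in the domain and keeps the best match:

import "math"

// cosineSimilarity returns a value in [-1, 1]; values near 1 mean the
// two embeddings point in nearly the same direction, i.e. the texts
// are semantically close.
func cosineSimilarity(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}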

This dedup mechanism creates an unexpected signal: repetition frequency. If a learning has been "seen" 10 times across different missions, it's likely a fundamental truth about the project or the tooling. High seen_count entries are our most reliable insights. They're the things agents keep rediscovering because they matter.

Hybrid Search Injection

When spawning a new agent, the system needs to find relevant learnings from the database. Not all 1,604 — just the ones that matter for this specific phase.

It uses a hybrid search strategy combining two approaches:

  1. Keyword Search (FTS5): Fast and precise. Great for specific error messages, library names, and exact terminology. If the phase mentions "FTS5," keyword search will find every learning that mentions "FTS5."

  2. Semantic Search (Cosine Similarity on Embeddings): Catches conceptual matches. "Authentication error" matches "login failure." "Performance bottleneck" matches "slow query." The semantic layer understands meaning, not just words.

The final score blends both: 0.3 * keyword_score + 0.7 * semantic_score. The 70/30 weighting toward semantic search ensures we catch conceptually relevant learnings even when the terminology differs, while the keyword component prevents semantic drift and respects specific technical terms.
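In code, the blend is a one-liner. A sketch, assuming both scores have been normalized to the 0–1 range (the type and field names are illustrative, not Via's actual API):

const (
	keywordWeight  = 0.3 // FTS5 keyword match
	semanticWeight = 0.7 // embedding cosine similarity
)

// scoredLearning is an illustrative candidate row with both scores
// already normalized to 0–1.
type scoredLearning struct {
	ID            int64
	KeywordScore  float64
	SemanticScore float64
}

func (s scoredLearning) hybridScore() float64 {
	return keywordWeight*s.KeywordScore + semanticWeight*s.SemanticScore
}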

The top-scoring learnings get injected into the agent's prompt, formatted as categorized advice:

## Prior Learnings (Apply / Avoid / Consider)

- Apply: FTS5 + semantic embeddings hybrid search outperforms either alone
- Avoid: go build fails silently when embedding directive references missing file
- Consider: SQLite over Postgres for CLI tools — simpler deployment, sufficient for single-user
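The formatting step is simple string building. A sketch, reusing the Learning stand-in from the capture sketch above — the mapping from category to Apply/Avoid/Consider is my guess at how spawn.go labels each entry:

import (
	"fmt"
	"strings"
)

// formatLearnings renders retrieved learnings as the advice block that
// gets appended to the agent's CLAUDE.md.
func formatLearnings(learnings []Learning) string {
	labels := map[string]string{
		"insight":  "Apply",
		"pattern":  "Apply",
		"error":    "Avoid",
		"decision": "Consider",
		"source":   "Consider",
	}
	var b strings.Builder
	b.WriteString("## Prior Learnings (Apply / Avoid / Consider)\n\n")
	for _, l := range learnings {
		label := labels[l.Category]
		if label == "" {
			label = "Consider"
		}
		fmt.Fprintf(&b, "- %s: %s\n", label, l.Text)
	}
	return b.String()
}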

What the Data Shows

The database currently holds 1,604 learnings and 54 meta-learnings. Here's the breakdown:

Category     Count   %       What it captures
Insights     703     43.8%   Techniques, patterns, "this works well"
Errors       360     22.4%   Mistakes, gotchas, "this broke"
Sources      195     12.2%   Useful APIs, docs, references
Decisions    185     11.5%   Architectural choices with rationale
Patterns     148     9.2%    Reusable approaches across domains

The 360 errors are the most valuable category. Each represents a mistake that an agent made, documented, and that no future agent should repeat. "Avoid: go build fails silently when embedding directive references a missing file" saves a future agent 15 minutes of debugging — multiplied across hundreds of missions.

The Honest Limitations

The system works, but it has clear gaps:

No learning decay. Old learnings persist indefinitely. A workaround for a bug that's since been patched still gets injected into agent prompts. I need a mechanism to mark learnings as stale or to weight recent learnings higher.

No quality feedback. The system tracks how often a learning is seen (via dedup), but not whether it was helpful. An agent might receive a learning and ignore it because it's irrelevant. There's no signal for this.

Embedding quality variance. Gemini embeddings are good but not perfect. Occasionally, two genuinely different insights get flagged as duplicates because their surface-level language is similar, even though the underlying concepts differ.

Despite the noise, the signal is strong enough to matter. Agents running today are measurably better than agents running two weeks ago, because they stand on the shoulders of 1,604 prior insights. The feedback loop works — it just needs refinement.

Next in series: Six Plugins, One Brain

