Newsletter

Commentary and analysis on AI, engineering, and the industry. Also published on Substack.

· 9 min read

AI Reviews the Code That AI Wrote

No tool found more than 63% of known issues. The field is deploying the solution before it has measured the problem.

· 10 min read

The Security Problem Nobody Can Solve

Three independent research groups converged on instruction hierarchy. Two converged on causal attribution. The attacks in the wild are still 85% social engineering in plaintext.

· 8 min read

What Is /btw?

A broken hint appeared in Claude Code sessions 80 days before the feature shipped, and the gap between them documents something about how Anthropic is redesigning the cost of asking questions during active work.

· 9 min read

The Question You Type While Claude Is Still Working

Claude Code shipped /btw after 83 days of it appearing as a broken hint, and the feature reveals something about how AI tools are starting to treat developer attention as concurrent rather than sequential.

· 9 min read

Agent Frameworks Are Solving the Wrong Problem

Frameworks optimize for orchestration plumbing when the real bottleneck is context management.

· 8 min read

LLM Guardrails Are Not a Security Boundary

Benchmark data from 1,445 attack scenarios shows LLM-based guardrails drop from 91% accuracy on known attacks to 33.8% on novel ones — a 57-point generalization gap that no amount of prompt engineering fixes.

· 10 min read

What 1.5 Million Cancellations Actually Moved

The #QuitGPT movement claimed 1.5 million participants, less than 0.2 percent of ChatGPT's user base, but something moved.

· 8 min read

Vibe Coding's Three-Month Wall

The term that promised to democratize software development has a one-year track record. The features worked. That was always going to be the easy part.

· 9 min read

The Hack That Almost Broke the Internet. Now Anyone Can Do It.

The Veritasium documentary made the XZ backdoor legible to millions. Then the US killed its own policy response, funding stalled, the maintainer came back on the same terms, and an AI agent ran the same playbook in two weeks.

· 10 min read

AI Chose Nukes

A King's College simulation ran three AI models through 21 nuclear crisis war games. Across 329 turns, not one chose to de-escalate.

· 9 min read

Anthropic's Safety Pledge is Horsesh*t

In the same week, Anthropic publicly refused to remove guardrails under Pentagon pressure — and quietly removed the training-pause commitment it had held since 2023.

· 11 min read

The Wall Learned to Walk

An open-source scraping library taught AI agents to impersonate browsers. Cloudflare caught 34% more bots in response. The arms race is now structural — and the web is losing.

· 9 min read

The Ladder Nobody Climbed Down

Three frontier LLMs played 21 war games. Eight de-escalatory options were on the table every turn. None were ever chosen.

· 8 min read

LLM Guardrails Aren't a Security Boundary

Six commercial guardrails tested against emoji smuggling. Bypass rate: 100%. The numbers explain why probabilistic systems can't do deterministic security's job.

· 9 min read

Sam Altman Called Humans Inefficient. Commentators Called It Evil.

Sam Altman compared AI training costs to 200 millennia of human evolution. The argument is wrong in three specific ways — and designed to work at infrastructure scale.

· 10 min read

Your Terminal, From Your Pocket

Anthropic shipped Remote Control for Claude Code — mobile session continuity for a terminal-native AI tool. Four independent developers had already built the same thing. That sequence tells you more than the feature itself.

· 10 min read

Vibe Coding Shipped With a Warning. Someone Removed It.

The term Karpathy coined for throwaway weekend projects has been redefined to mean professional AI-assisted development. The security incidents that followed were predictable from the original definition.

· 8 min read

Your CLAUDE.md Is Making Things Worse

A newly published study found that LLM-generated context files hurt task completion and raise costs by 20%. The harder question is when they still make sense.

· 6 min read

No Cloud, No Account, No Subscription

Five independent developers shipped five single-purpose terminal tools in the same week. The motivation language was structurally identical across all five. Something has shifted.

· 7 min read

The Hit Piece

An AI agent had its PR rejected. It responded by publishing an essay accusing the maintainer of gatekeeping, insecurity, and discrimination. One in four readers believed it.

· 11 min read

The Vampiric Effect

I shipped an 847-line PR on a Wednesday night and couldn't recall a single architectural decision the next morning. The output was mine. The understanding wasn't.