The orphan rate
How a fleet-wide audit replaced four elegant predictions, and why the safety net never fired
The question came from a parallel session.
I had just shipped a fix for a different problem. Handoffs between sessions had been silently overwriting each other, and a four-day debugging trajectory had cost a working week before I figured out the actual mechanism. Filename grammar plus an ownership marker plus a PreToolUse hook stopped the silent overwrites at the substrate. The post about it shipped this morning. The session that wrote that post asked, on its way out, a different question: do we have a similar issue with how memory files are being written?
I was sure I knew what the answer would look like. I sketched four predictions about how memory-file drift would manifest. Each one was elegant, symmetric, and matched the shape of the handoff bug.
Three were wrong.
The fourth was the entire story, and the safety net I had built months earlier to catch it had been structurally inert under my own rules.
01
The Question
Why do we write memory files at all?
A session learns something. The thing is durable. Future sessions will need to know it. The session writes a markdown file under a per-project memory directory and an index pointer in MEMORY.md. The next session globs the directory at startup, reads the index, and gets a working understanding of what it inherits.
The convention has the same shape as a handoff. Single-author writes, cross-session reads, no enforcement layer. The handoff bug from earlier this week was that two parallel sessions writing to the same filename silently overwrote each other. The question from the outgoing session was the right one to ask. A clobber pattern that exists for handoffs would, by symmetry, exist for memory files. Same write semantics, same lack of enforcement, same multi-session pressure. If the substrate had one of those bugs, it probably had the other.
That was the prediction. It was a symmetric, elegant, plausible prediction. It produced four candidate failure modes. I was about to spend the next several hours probing each one.
02
Four Predictions
The four predictions I wrote down were these.
One. MEMORY.md clobber-revert. Two parallel sessions edit MEMORY.md at the same time. Both load the file, both add their entry, both save. One write wins, the other loses. The losing session's entry never makes it to disk. Mirror of the handoff bug, applied at the index layer.
Two. Topic-file content drift. Two sessions write to the same memory topic file, and one session's edits silently overwrite the other's content. A layer-down clobber, content-level rather than index-level.
Three. Stale-but-plausible pointers. MEMORY.md references a file that was renamed or moved, and the link silently rots. Future sessions follow the link, find nothing, and either give up or fall back to wrong information. The "memory you can't trust" pattern from earlier this week, applied to the memory layer itself.
Four. Orphan topic files. A session writes a new typed memory file (feedback, project, lessons, reference, user) and forgets to add the corresponding line to MEMORY.md. The file is on disk, but no future session ever globs it because they read the index, not the directory. A single-party omission failure, less elegant than the others.
All four predictions were elegant. All four mirrored some bug I had already debugged at a different layer. All four felt structurally correct in the way a hypothesis feels correct when you have not yet probed it.
This was mistake number 1.
I had treated my own predictions as load-bearing. The asymmetric-probe lesson from earlier this week says exactly the opposite. Predictions are cooperative smoke tests, not localizations. They tell you what you would believe if no evidence arrived. They tell you nothing about what is actually true. If three predictions are equally plausible, you have learned three things about your own intuitions and zero things about the substrate. The only way out is a probe that returns different answers for different hypotheses.
I had four predictions and zero probes. The next move was to fix that ratio.
03
The Probes
I designed one probe per prediction.
Probe 1, MEMORY.md clobber-revert. Walk the git history of every project's MEMORY.md and look for entries added in one commit and removed in the next without a deliberate refactor in between. The signature of a silent overwrite is an added-and-removed pair across consecutive commits, not within a single deliberate edit. Run on jd-dev, the largest fleet member, with 58 H2 headings in MEMORY.md. Result: zero hits. No clobber-revert pattern anywhere in the visible history.
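The signature the first probe hunts for can be sketched as a pure function. This is illustrative, not the actual probe: the function name is mine, and the per-commit heading lists would in practice come from something like `git show <rev>:MEMORY.md` parsed for H2 lines. An entry that appears in one commit and is gone in the very next one, oldest commits first, is the clobber-revert shape.

```javascript
// Illustrative sketch of the clobber-revert signature, not the real probe.
// headingsByCommit: arrays of MEMORY.md H2 headings, one array per commit,
// ordered oldest to newest. A hit is an entry added in commit i (absent in
// i-1) that vanishes again in commit i+1 without a deliberate refactor.
function findClobberReverts(headingsByCommit) {
  const hits = [];
  for (let i = 1; i < headingsByCommit.length - 1; i++) {
    const prev = new Set(headingsByCommit[i - 1]);
    const cur = new Set(headingsByCommit[i]);
    const next = new Set(headingsByCommit[i + 1]);
    for (const heading of cur) {
      if (!prev.has(heading) && !next.has(heading)) {
        hits.push({ commitIndex: i, heading });
      }
    }
  }
  return hits;
}
```

A steadily growing index returns no hits; an entry that blinks in and out across consecutive commits returns one.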
Probe 2, topic-file content drift. Walk the lessons history for the same project and look for large-deletion signatures. A silent content-clobber would show up as a large-block removal not paired with a corresponding rewrite. Result: no large-deletion signature. The history shows accretion, not replacement.
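The second probe's signature is even simpler to sketch. Again illustrative, with hypothetical names and thresholds: the stats would come from something like `git log --numstat` on the lessons file, and what counts as "large" and "not paired with a rewrite" is a judgment call baked into two assumed constants.

```javascript
// Illustrative sketch of the large-deletion signature, assumed thresholds.
// diffStats: per-commit { added, removed } line counts for one topic file.
// A silent content-clobber shows up as a big removal with no comparable
// rewrite beside it; accretive history never trips the filter.
function findLargeDeletions(diffStats, minRemoved = 20) {
  return diffStats.filter(
    (d) => d.removed >= minRemoved && d.added < d.removed / 2
  );
}
```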
Probe 3, stale pointers. Enumerate every relative link in every MEMORY.md across the fleet and check whether the linked file exists. Result: one suspicious candidate. Investigation showed the candidate was a probe bug. The link was real; the probe's path-resolution was wrong. After fixing the probe, zero hits.
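The stale-pointer check reduces to link extraction plus an existence test. A minimal sketch, with an assumed function name and a deliberately simple markdown-link regex; the real probe walks the filesystem, where this sketch takes the set of existing paths as input:

```javascript
// Illustrative sketch of the stale-pointer check. memoryMd is the raw
// MEMORY.md text; existingPaths is the set of files actually on disk,
// relative to the same base directory the links are written against.
function findStaleLinks(memoryMd, existingPaths) {
  const onDisk = new Set(existingPaths);
  const stale = [];
  const linkRe = /\[[^\]]*\]\(([^)]+)\)/g;
  for (const match of memoryMd.matchAll(linkRe)) {
    const target = match[1];
    if (/^[a-z]+:\/\//i.test(target)) continue; // skip absolute URLs
    if (!onDisk.has(target)) stale.push(target);
  }
  return stale;
}
```

Note the base-directory caveat in the comment: the real probe's one false hit was exactly a path-resolution bug, and a helper like this reintroduces the same class of bug the moment the caller and the index disagree on what the links are relative to.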
Probe 4, orphans. Glob every file in every project memory directory, check whether each file's basename appears anywhere in MEMORY.md. Skip MEMORY.md itself. Skip the handoff-archive subdirectory.
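The orphan check itself fits in a few lines. This is a sketch of the logic, not the audit script: paths are assumed relative to the project's memory directory, and the archive-directory name is taken from the skip rule above.

```javascript
// Illustrative sketch of the orphan check: a typed memory file is an
// orphan when its basename appears nowhere in MEMORY.md. The index file
// itself and the handoff-archive subdirectory are skipped, mirroring the
// audit's exclusions.
function findOrphans(relativePaths, memoryMdText) {
  return relativePaths.filter((p) => {
    if (p === 'MEMORY.md') return false;
    if (p.startsWith('handoff-archive/')) return false;
    const basename = p.split('/').pop();
    return !memoryMdText.includes(basename);
  });
}
```

A basename substring match is deliberately loose: any mention in the index, linked or not, counts as registered, so the check under-reports rather than over-reports.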
The fourth probe returned a number. It returned several numbers, in fact, one per project, and the numbers were not small.
Three probes returned no evidence. One returned a fleet-wide finding. The asymmetry was the localization. The variable that produced different outcomes across the four probes was which prediction was being tested. The answer was: prediction four.
04
The Asymmetry
Here is what the audit returned, by project, on April 29.
Alpha-engine: 6 of 12. Fifty percent.
Changemanaged-io: 79 of 169. Forty-six percent.
Joshduffy-dev: 34 of 143. Twenty-three percent.
Aclaris-therapeutics: 1 of 38. Two percent.
Fourteen other projects: zero each.
Cumulative across the fleet: 120 orphan files. About a quarter of the typed-memory corpus, unreachable from the index.
The shape of the failure was not what I had predicted. It was simpler than I had predicted, and worse than I had predicted, and louder than I had predicted, all at the same time. There was no clever clobber mechanism. There was no two-parties-disagreeing pattern. There was a session that wrote a feedback or lessons or project file, and never edited MEMORY.md to register it, and stopped. The file existed. It was on disk. It was correct. And no future session would ever read it, because future sessions glob the index, not the directory. Each orphan was the silent omission of a single one-line edit, repeated across hundreds of sessions over the life of the fleet.
The reason three of my predictions were wrong is that they assumed sophistication. They assumed a clobber mechanism, a content-drift mechanism, a pointer-rot mechanism. The actual mechanism was nothing happening. A session finishes writing a file. The session ends. The MEMORY.md edit, which is a separate tool call, never happens. There is no clobber to detect, no race to win, no signal to fire. The only signal is the orphan rate. And the orphan rate is not visible until you compute it.
05
The Safety Net That Wasn't
There was supposed to be a tool that caught this.
I had written a memory-cleanup utility months earlier called /remsleep. It was named after REM sleep because it was supposed to do for the memory directory what sleep does for biological memory: prune stale entries, merge duplicates, resolve contradictions, re-index everything. It existed. It had run.
It had run on exactly one project, exactly once, on April 5. The project was changemanaged-io. The project's current orphan rate is forty-six percent.
Eighteen of the twenty-one projects with memory directories had never had it run at all.
So the question was no longer "does this fail?" The question was "why was nobody running the tool that would have caught it?"
The reason is in /remsleep's own configuration. Its agent definition declares Bash in its tool list, and it is designed to be invoked through the Agent tool as a subagent. Months ago I had written a rule, in my own user-rules directory, called no-bash-in-agents. The rule says: never launch agents that need to run Bash commands, because agent Bash commands require separate permission prompts that are invisible to the user, causing silent hangs.
I had written the rule. I had also written the tool that violated it. And then I had spent months not running the tool, without ever connecting the not-running to the rule that forbade it.
This was mistake number 2.
The capability existed. The trigger was gated by my own policy. It was not that /remsleep was broken; it was that /remsleep was structurally inert under my own rules, and I had never put those two facts in the same room.
The safety net was not catching the orphan rate. The orphan rate had been accumulating for the entire window during which the safety net was supposed to catch it. The orphan rate was the only signal that the safety net was not running. And the orphan rate was not visible to anyone, because no one was computing it.
06
Rule, Hook, Audit
The fix had three faces, the same way the handoff fix had three faces.
The first face was an audit script. About 190 lines of Node, walking every project memory directory, classifying every orphan by name pattern, sorting by orphan rate, surfacing the per-project breakdown. Path: ~/.claude/scripts/memory-orphan-audit.mjs. Smoke-tested by creating a synthetic orphan named feedback_synthetic_test_orphan_DELETEME.md, running the audit, confirming the synthetic appeared in the output classified as "feedback / high priority," and then deleting it. The smoke test was failure-mode-faithful. It created the actual condition the audit was supposed to detect, not a cooperative variant. If the audit had returned no hit on the synthetic, it would have proven the audit broken before anyone trusted it.
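The classification step can be sketched as a prefix lookup. Illustrative only: the source confirms that a `feedback`-prefixed orphan classifies as high priority, but the mapping for the other typed prefixes and the function name are my assumptions, and the real script's rules may differ.

```javascript
// Illustrative sketch of name-pattern classification. Only the
// feedback -> high mapping is confirmed by the smoke test; the rest of
// the priority table is an assumed placeholder.
const TYPE_PRIORITY = {
  feedback: 'high',
  lessons: 'high',
  project: 'medium',
  reference: 'low',
  user: 'low',
};

function classifyOrphan(basename) {
  const prefix = basename.split(/[_-]/)[0];
  return { type: prefix, priority: TYPE_PRIORITY[prefix] ?? 'unknown' };
}
```

Run against the synthetic orphan from the smoke test, it buckets the file as feedback, high priority, which is the output the smoke test asserted before the synthetic was deleted.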
The second face was a rule. ~/.claude/rules/memory-files-mandatory.md. The rule says: when you Write a new typed memory file, you MUST add a one-line MEMORY.md entry in the same session. Handoffs are exempt because handoffs have their own grammar enforcement. Both faces shipped together as commit f46ecf2.
The third face is the part that has not shipped yet. A scheduled remote audit that fires once a week, runs the audit script, and opens a pull request if any project has more than ten typed orphans. It mirrors the pattern of the handoff-guard audit that fires every Monday. Setup needs the routines UI, so it remains one click of follow-up work away.
A fake heckler in the back is shouting that obviously the right move was a PreToolUse hook that blocks any Write of a memory file unless MEMORY.md is also being edited. You are right, oddly-well-informed heckler, except that hook would false-positive on every legitimate refactor that splits a memory entry across two writes, on every cleanup pass that deletes an orphan rather than indexing it, and on every legitimate case where the index entry is added after a quick verify-the-content pass. The decision-point list explicitly rejected it for false-positive surface. You can sit down now.
This was mistake number 3.
I had been treating the rule as the structural prevention. It is not. The rule is a model-quality nudge that depends on me reading and following it in the moment of writing a memory file. The audit is the structural prevention. The audit catches whatever the rule misses, regardless of session quality, within seven days. The rule reduces the rate of new orphans from frequently to occasionally. The audit makes occasionally visible. The two together reduce drift; either one alone is insufficient. Per the planning rule about structural prevention versus model-quality: name what the fix covers and name what it does not cover. The rule does not cover the omission. The audit does.
07
Three Layers Of Silence
Three layers of silent failure had been stacked, and the audit was the first thing that put them in the same field of view.
Layer one. Handoffs were clobbering each other. Three months, four repos, thirty-two silent overwrites. Caught and fixed last week with filename grammar plus an ownership marker plus a PreToolUse hook.
Layer two. Memory files were being orphaned. About 120 files across the fleet, roughly a quarter of the typed-memory corpus, unreachable from the index. Caught this week with the audit script and a rule.
Layer three. The cleanup tool that was supposed to catch layer two had been structurally inert under my own rules for the entire accumulation window. Caught this week, almost by accident, while looking at why nobody had been running it.
This is the fourth post in a small series about substrate failures of LLM-driven multi-session development. The first was about a hash function that drifted because the page it was hashing turned out to contain telemetry. The second was about a supervisor and an indexer that agreed on a cursor file's name and disagreed on what its absence meant. The third was about two parallel sessions writing to the same handoff filename, with no signal that the first one's content was being silently lost. This one is about a memory index where about a quarter of the typed entries silently failed to register, plus the cleanup tool that was supposed to catch it, plus my own rule that had quietly disabled the cleanup tool.
Four posts. Four substrate failures. Same pattern in every one. A convention exists, between two parties, or between a session and itself, or between a function and the page it is operating on. The convention has no enforcement layer. Violation is silent. Accumulation is at scale. Eventual cascading failure surfaces as something else at the symptom layer.
The corpus continues. The failure modes have not been exhausted. I expect more will surface, because the substrate is the same substrate, and the pattern of conventions-without-enforcement is not specific to any one of these layers.
The safety net was non-executable, and the only signal that it had ever been non-executable was the orphan rate. The orphan rate was the canary nobody was watching, in a coal mine nobody knew was there.