Writing | May 9, 2026 | 7 min read

Discipline doesn't scale

I had a rule. I broke it a hundred times in three months and didn't notice.

Josh Duffy fMLops

I had a rule.

The rule was three sentences long and I read it at the start of every session. It said: when you write a new typed memory file, you must also add a one-line entry to MEMORY.md. Memory files are how I let yesterday's session brief tomorrow's. The index is how tomorrow's session knows yesterday's file exists. Without the index entry, the file is on disk but invisible. The session writing it does fine. The session reading it never sees it.

I broke this rule a hundred times across three months and didn't notice.

The Audit

The script that found out was 190 lines of Node. It walked every project's memory directory, classified each file by name pattern, checked whether the filename appeared anywhere in MEMORY.md, and counted the orphans.

Repo A: 6 of 12. Fifty percent.

Repo B: 79 of 169. Forty-seven percent.

Repo C: 34 of 143. Twenty-four percent.

Repo D: 1 of 38. Three percent.

Fourteen other projects: zero each.

Around a hundred files across the fleet. Every one of them written by some past version of me, on some past day, while reading a rule that said "also add a line to MEMORY.md," nodding, and not adding the line.
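The core of that audit fits in a few lines. What follows is a minimal sketch in Python rather than the original 190-line Node script. It assumes memory files are `.md` files sitting next to `MEMORY.md`, and it skips the name-pattern classification step. The function name and the demo filenames are hypothetical.

```python
from pathlib import Path
import tempfile


def find_orphans(memory_dir: Path, index_name: str = "MEMORY.md") -> list[str]:
    """Return memory files whose filename never appears in the index."""
    index_path = memory_dir / index_name
    index_text = index_path.read_text(encoding="utf-8") if index_path.exists() else ""
    orphans = []
    for path in sorted(memory_dir.glob("*.md")):
        if path.name == index_name:
            continue  # the index does not need to list itself
        # Same check the post describes: does the filename appear anywhere?
        if path.name not in index_text:
            orphans.append(path.name)
    return orphans


def demo() -> list[str]:
    """Build a throwaway memory directory with one indexed file and one orphan."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "feedback_testing.md").write_text("notes", encoding="utf-8")
        (root / "project_indexer.md").write_text("notes", encoding="utf-8")
        (root / "MEMORY.md").write_text(
            "- [Testing](feedback_testing.md)\n", encoding="utf-8"
        )
        return find_orphans(root)
```

Running `find_orphans` over each project's memory directory and dividing by the file count would give per-repo rates like the ones above.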

What I Want To Be True

Here is what I want to be true about myself: I am careful. I read the rules. I follow them.

Here is what is actually true: I write thirty memory files a week, in fifteen-minute spurts of wrapping up some chunk of work. The rule lives in a markdown file I have been reading at the start of every session for three months. The act of writing the memory file feels like the conclusion of the work. Adding the index entry feels like cleanup. Cleanup is exactly the kind of thing that gets dropped when you're tired or interrupted or deciding what to make for dinner.

The rule was readable, sensible, and unenforced. So it ran at about seventy-six percent compliance. The remaining twenty-four percent accumulated silently, for a hundred days, until the script ran.

This was mistake number 1.

I had been treating discipline as the structural prevention.

Discipline is good for things that fire once a quarter. It is not a policy for things that fire fifty times a day.

When Discipline Works

Discipline works for things that fire once a quarter. "Don't accidentally drop the production database." "Read the warning before deleting the customer table." "Hold the elevator door for the person carrying coffee." Things rare enough to stop and think about. Things you can deliberate over for thirty seconds before acting.

Discipline is not a policy for things that fire fifty times a day. Writing a memory file fires several times a day. So does pushing a commit. So does answering a Slack message. So does deciding which of seven Claude sessions a fix belongs in. The rule that depends on you remembering to do something every single time, forever, on a high-frequency action, will fail at the rate that matches your worst day. And it will fail silently, because you weren't watching, because there were too many of them to watch.

A fake heckler in the back is shouting that obviously the answer is to be more careful. You're right, oddly-well-informed heckler. I should be more careful. I should also exercise more, drink less coffee, and respond to emails the day they arrive. The list of things I should do more of is endless. The list of things I will reliably do more of is much shorter. It does not scale by adding new items.

Two Modes That Work

The fixes that worked, once I started shipping fixes that worked, looked the same in every case. They either made the hazard impossible, or they made failure loud.

Make impossible. A month ago I found that parallel sessions writing to the same handoff filename had silently overwritten each other thirty-two times across three months. The rule said "use a topic suffix." I broke it constantly because the rule was a pious wish. The fix was a filename grammar that requires a topic suffix and a hook that rejects writes without one. The collision doesn't happen because a filename without a suffix can't exist. There is nothing to remember.
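A filename grammar of that kind can be very small. This is a sketch under stated assumptions: the pattern `handoff-YYYY-MM-DD-<topic>.md` is hypothetical (the actual grammar isn't spelled out here), and so are the names `HANDOFF_NAME`, `RejectedWrite`, and `check_handoff_name`. How the check gets wired into a write hook depends on the tooling and is left out.

```python
import re

# Hypothetical grammar: handoff-YYYY-MM-DD-<topic>.md, where <topic> is a
# lowercase slug such as "indexer" or "pricing-hash".
HANDOFF_NAME = re.compile(
    r"handoff-\d{4}-\d{2}-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md"
)


class RejectedWrite(ValueError):
    """Raised when a handoff filename lacks the required topic suffix."""


def check_handoff_name(filename: str) -> str:
    """Return the filename unchanged if it fits the grammar, else raise."""
    if HANDOFF_NAME.fullmatch(filename) is None:
        raise RejectedWrite(
            f"{filename!r} needs a topic suffix, "
            "e.g. handoff-2026-05-09-indexer.md"
        )
    return filename
```

A hook would call `check_handoff_name` before the write and refuse it on `RejectedWrite`, so the session gets an error at the moment of the mistake instead of a silent overwrite later.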

Make loud. The orphan-rate finding stays a finding because the rule still says "add an index entry," and I will continue to break it. So an audit script runs every week and opens a pull request listing every new orphan. The rule still depends on me. The audit doesn't. The audit catches whatever the rule misses, within seven days, regardless of how careful I was that week.

Either is dramatically more reliable than asking the operator to be perpetually careful about a hazard they cannot see.
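The "make loud" half can be sketched the same way. The assumption here is that the weekly job keeps a baseline of orphans it has already reported, so it only surfaces new ones. The function names and the report wording are hypothetical, and the step that actually opens the pull request is omitted.

```python
def new_orphans(current: set[str], baseline: set[str]) -> list[str]:
    """Orphans present this week that were not in last week's baseline."""
    return sorted(current - baseline)


def report(current: set[str], baseline: set[str]) -> tuple[int, str]:
    """Return (exit_code, body). A non-zero code means there is something to look at."""
    fresh = new_orphans(current, baseline)
    if not fresh:
        return 0, "No new orphans this week."
    lines = [f"{len(fresh)} new orphan memory file(s) missing from MEMORY.md:"]
    lines += [f"- {name}" for name in fresh]
    return 1, "\n".join(lines)
```

The body is what would go into the pull request description. The exit code lets a scheduler or CI job fail visibly even if nobody reads the text.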

Five Hours Of Wasted Compute

A few months earlier, a different version of this lesson cost me five hours of compute and five gigabytes of disk.

I had a supervisor script. Eight parallel slices, each running an indexer, each writing a cursor file to track its progress. The supervisor read the cursor file on resume. The indexer deleted the cursor file on success. From the supervisor's point of view, no cursor file meant "start over." From the indexer's point of view, no cursor file meant "done." The two definitions of "no cursor file" were never written down anywhere, because the file's meaning felt obvious. Cursors are read on resume. Cursors are written on progress. You don't write a memo about that.

The system ran the same six hours of work three times before I noticed.

The fix wasn't a more careful supervisor. The fix was making the supervisor own the cursor file outright, with state the indexer couldn't touch. The hazard of "two interpretations of one file" was made impossible. There was nothing left to remember.
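One way to picture that ownership: the supervisor keeps a single state file in which "done" is written down explicitly, so a missing file can only mean "never started." This is a sketch, not the original supervisor. The JSON layout, the `SliceStatus` names, and the sequential loop (the real slices ran in parallel) are all assumptions.

```python
import json
import tempfile
from enum import Enum
from pathlib import Path
from typing import Callable


class SliceStatus(str, Enum):
    PENDING = "pending"
    DONE = "done"


def load_state(path: Path, slices: int) -> dict[str, str]:
    """A missing state file means 'never started', and nothing else."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {str(i): SliceStatus.PENDING.value for i in range(slices)}


def run_supervisor(
    path: Path, slices: int, run_slice: Callable[[int], None]
) -> list[int]:
    """Run every slice not yet marked done; return the slices that ran."""
    state = load_state(path, slices)
    ran = []
    for key in list(state):
        if state[key] == SliceStatus.DONE.value:
            continue  # completion is recorded, never inferred from absence
        run_slice(int(key))  # the indexer does its work and touches no state
        state[key] = SliceStatus.DONE.value
        path.write_text(json.dumps(state), encoding="utf-8")
        ran.append(int(key))
    return ran
```

Calling `run_supervisor` a second time against the same state file runs nothing, which is the property the original cursor files could not give.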

$2.40 A Day

Or this. I had a hash function that decided whether a freshly fetched vendor pricing page needed an LLM call. If the page hadn't changed, skip the call. The function was four lines of code. It hashed the response body and compared it with the previous hash.

The vendor's response body, I learned eventually, contained a session-specific tracking pixel. So the hash flipped on every fetch. So the LLM call ran on every fetch. So I burned $2.40 a day for two months on Sonnet calls that returned the same answer every time. I caught it because the Anthropic usage dashboard happened to draw a bar chart that exposed the pattern, not because I was being careful.

Same shape. The hazard was "telemetry inside the response body changes the hash." I couldn't see the hazard. I would not have predicted it. The fix was hashing only the parts of the body the LLM was going to read, which made the hazard impossible because the telemetry was no longer in scope.
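Here is roughly what "hash only what the LLM reads" can look like. The sketch assumes the LLM only ever sees the page's visible text, which may not match the real pipeline. `VisibleText` and `content_hash` are hypothetical names. It uses Python's standard `html.parser` and `hashlib`.

```python
import hashlib
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes only. Tags and their attributes (such as an
    <img src=...> tracking pixel) are never part of the output."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))  # normalize whitespace


def content_hash(html: str) -> str:
    """SHA-256 of the visible text, ignoring markup and telemetry."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Two fetches that differ only in the pixel's session ID hash to the same value, while a changed price still changes the hash.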

The Same Lesson, Four Times

I would like to claim I learned a general lesson the first time and applied it to the second, and the third. The truth is I learned the same lesson four separate times before I noticed it was the same lesson.

The rules I read at the start of every session are an attempt to compensate for a missing piece of infrastructure. Every time I write a rule that says "remember to do X," the rule is admitting that the system doesn't enforce X, and hoping I will. The rule is the system asking me to be more careful in lieu of being engineered.

The number 32 is what you get when prevention gets delegated to discipline. None of those thirty-two silent overwrites was a discipline failure. Every one happened because the writing session genuinely could not see the collision. Same filename, no marker, no hook, equals silent data loss every time, no matter how careful any one session was. The cost of one hook plus one rule is far below the cumulative cost of the next thirty-two incidents. It is also far below the cost of the four days I spent debugging the wrong layer because the previous incidents had been silent.

Discipline doesn't scale. Hooks do.