Git as an External Brain for Claude Code: Beyond MEMORY.md

A friend dropped this in a dev community: “Git isn’t just useful for humans — it’s equally useful for Claude Code. Good git habits for humans are good git habits for AI agents.”

My first thought: “I’ve got MEMORY.md. Isn’t that enough?”

Two weeks of experiments later, it turns out the two solve completely different problems.

Memory isn’t just two layers

I used to think AI agent memory came down to two options: when the context window runs out, open MEMORY.md; when MEMORY.md gets bloated, trim it somehow.

Turns out Claude Code already has three layers of memory built in. Git is a fourth:

| Layer | What it is | How it works | What goes here |
| --- | --- | --- | --- |
| CLAUDE.md | Project rules file | Fully loaded every session | Stable rules, conventions, build commands |
| Auto Memory | `~/.claude/.../memory/` | First 200 lines of MEMORY.md auto-loaded; topic files loaded on demand | Things Claude learned, past mistakes |
| Session Memory | Auto-generated summaries | Injected as background knowledge, no manual effort | What happened last session |
| Git History | Commit log + diff + blame | Queried on demand | Full change history, decision context |

The first three are built by Anthropic. The fourth has always been there — most people just haven’t thought of Git deliberately as an AI memory tool.

The key to this layering isn’t the type of information. It’s query frequency:

Need it every time → CLAUDE.md
Brought in automatically (you don’t manage it) → Session Memory
Occasionally need to go deep → Auto Memory topic files
Look it up when needed → Git

Anthropic’s best practices are direct about CLAUDE.md: keep it under 200 lines, and for each line ask yourself “would removing this cause Claude to make mistakes?” If not, cut it. Information that changes frequently — like project status — doesn’t belong here.
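
One way to keep that line budget honest is a quick count. This is a minimal sketch, not an official tool: the helper name check_claude_md is my own, and it assumes CLAUDE.md sits in the directory you run it from unless you pass a path.

```shell
# Warn when a rules file grows past a line budget.
# check_claude_md is a hypothetical helper; 200 is the limit quoted above.
check_claude_md() {
  file="${1:-CLAUDE.md}"
  limit="${2:-200}"
  if [ ! -f "$file" ]; then
    echo "$file not found" >&2
    return 2
  fi
  # wc -l counts newline characters; tr strips the padding some systems add.
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -gt "$limit" ]; then
    echo "$file has $lines lines (limit $limit): prune it"
    return 1
  fi
  echo "$file has $lines lines (limit $limit): ok"
}
```

Run it as check_claude_md, or check_claude_md path/to/CLAUDE.md 150 to use a different file and limit. A nonzero exit status means the file is missing or over budget.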

So what does Git add that the other layers don’t?

Session Memory already summarizes what happened last time. Auto Memory remembers past mistakes. What’s Git for?

The difference is granularity and queryability.

Session Memory is a summary: it tells you “we were working on the login feature yesterday.” But if you need to know “why does the token refresh logic use a fixed wait instead of exponential backoff?”, Session Memory doesn’t have that detail. git blame points you to the commit that does.

Auto Memory captures general lessons learned. Git captures “this specific file, this specific line, at this specific time, was changed for this specific reason.”

In plain terms: Session Memory and Auto Memory are notes. Git is the full security footage. You don’t watch security footage every day, but when something goes wrong it’s the only thing that can tell you exactly what happened.

How I actually use this: four patterns

Commit messages as compressed summaries

I added a rule to my CLAUDE.md: commit messages must explain what was done and why.

The result: next session, run git log --oneline -15. Fifteen lines, full recent context. The AI’s own log entries are more accurate than my verbal descriptions: each commit records a timestamp and the exact files touched (git log --stat shows both), and the message reflects its understanding at the moment of completion, not a hazy recollection the next day.

To be clear: “commit after every small change” is not some industry consensus. Anthropic’s best practices place commit as the final step in an explore → plan → implement cycle. There’s also academic disagreement — the Lore paper (arXiv 2603.15566) argues commits should be fewer but richer, each carrying full decision context (why A over B, what constraints applied).

My own practice falls somewhere in between: I don’t commit every line change, but I commit after each logically complete piece of work. This is personal preference, not a best practice.
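
For reference, here is roughly what a what-and-why commit looks like in practice. It's a sketch under my own naming: commit_with_reason, recent_context, and recent_reasons are hypothetical helpers, and the "Why:" prefix is just a convention I picked, not something Git or Claude Code requires.

```shell
# Commit staged changes with a subject (what) and a body (why).
# Each -m becomes its own paragraph in the commit message.
commit_with_reason() {   # usage: commit_with_reason "<what>" "<why>"
  git commit -q -m "$1" -m "Why: $2"
}

# Recent context for the start of a session: one subject line per commit.
recent_context() {       # usage: recent_context [count]
  git log --oneline -"${1:-15}"
}

# Hash, date, subject, and body, for when the subject line is not enough.
recent_reasons() {       # usage: recent_reasons [count]
  git log -"${1:-5}" --date=short --format='%h %ad %s%n%b'
}
```

For example, after staging a change: commit_with_reason "Wait 5s after 429 instead of exponential backoff" "API rate limiting rule is wait 5s after 429". The subject shows up in recent_context; the reason only shows up in recent_reasons, which is why the subject line has to carry the "what" on its own.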

`git diff` before commit — self-review

Claude Code will sometimes confidently report “done” when changes didn’t actually land correctly. Hallucination isn’t just an answering-questions problem — it happens during file edits too.

My Pre-Commit Checklist includes: before committing, run git diff and review your own changes.

This has caught: editing file A but forgetting to sync file B, thinking a line was deleted when it wasn’t, adding a new dependency without updating the build config.

Let it write first, then check its own work. Verification is easier than generation — true for humans, true for AI.
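
One detail that matters for the self-review step: plain git diff only shows unstaged changes, git diff --staged shows what will actually be committed, and brand-new untracked files appear in neither, only in git status. A minimal sketch that prints all three (pre_commit_review is a hypothetical helper name of mine):

```shell
# Print everything a reviewer (human or agent) should read before committing.
pre_commit_review() {
  echo "== files touched (including untracked) =="
  git status --short
  echo "== unstaged changes =="
  git diff
  echo "== staged changes (what the commit will contain) =="
  git diff --staged
}
```

If a file you expected to change is missing from the first section, the edit never landed, no matter what the agent reported.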

`git blame` for decision archaeology

I once spotted some retry logic with weird backoff parameters. My instinct: “just change it to standard exponential backoff.”

Then I ran git blame and opened the commit it pointed to. The message, from three months back, read: “API rate limiting rule is wait 5s after 429, not exponential backoff.”

Without Git, this code would’ve been “fixed” and then actually broken. These kinds of design decisions don’t get captured by Session Memory, Auto Memory, or MEMORY.md. But Git has all of them, as long as you wrote decent commit messages at the time.

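# Find the commit that last touched each line of the file
git blame src/auth/token-manager.ts
# Show that commit's full message without the diff (abc1234 stands in for the hash from the blame output)
git show -s abc1234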
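
When you only care about one suspicious line, the two steps can be folded together. A sketch, with helper names (blame_lines, why_line) that are mine rather than Git's; it assumes the line has been committed, since uncommitted lines blame to an all-zero hash that git show can't resolve.

```shell
# Who last changed lines START..END of FILE, and in which commit?
blame_lines() {   # usage: blame_lines <file> <start> <end>
  git blame -L "$2,$3" -- "$1"
}

# Print the full message of the commit that last touched one line.
why_line() {      # usage: why_line <file> <line>
  # In --porcelain output the first field of the first line is the commit hash.
  commit=$(git blame -L "$2,$2" --porcelain -- "$1" | head -n 1 | cut -d' ' -f1)
  git show -s --date=short --format='%h %ad%n%s%n%n%b' "$commit"
}
```

So why_line src/auth/token-manager.ts 42 would print the hash, date, subject, and body of whichever commit last changed line 42 (the line number here is only an example).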

`git worktree` for parallel agents

Run two Claude Code sessions simultaneously, each working on a different approach, no interference:

git worktree add ../myproject-experiment -b experiment/approach-A
cd ../myproject-experiment && claude

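Main branch stays untouched. Experiment blows up? Remove the worktree with git worktree remove (deleting the directory by hand leaves a stale entry until you run git worktree prune) and delete the branch. Works out? Merge it back.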
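
The full lifecycle, including cleanup, can be sketched as three small helpers. The function names are hypothetical, and keep_experiment assumes you run it from the main checkout with the branch you want to merge into already checked out.

```shell
# Create an isolated worktree on a new branch.
start_experiment() {     # usage: start_experiment <dir> <branch>
  git worktree add "$1" -b "$2"
}

# Experiment failed: remove the worktree (even if dirty) and delete its branch.
discard_experiment() {   # usage: discard_experiment <dir> <branch>
  git worktree remove --force "$1" &&
  git branch -q -D "$2"
}

# Experiment worked: merge the branch into the current one, then clean up.
keep_experiment() {      # usage: keep_experiment <dir> <branch>
  git merge -q --no-edit "$2" &&
  git worktree remove "$1" &&
  git branch -q -d "$2"
}
```

I left --force out of keep_experiment on purpose: if the worktree still has uncommitted changes, git worktree remove refuses, which is the reminder you want before throwing work away.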

This isn’t something I invented — VILA-Lab’s analysis of Claude Code’s architecture (arXiv 2604.14228) found that Claude Code internally runs sub-agents in isolated git worktrees for complex tasks. Letta’s Context Repository system uses the same pattern.

A side benefit: pre-commit hooks

Not really about memory, but a bonus that comes with using Git deliberately.

With Git’s undo safety net, the AI does get bolder about making changes. Pre-commit hooks run lint and type-check; if they fail, the commit doesn’t go through:

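#!/bin/sh
# .git/hooks/pre-commit (must be executable) or via husky
set -e  # abort at the first failing check; otherwise only the last exit code counts
npm run lint
npm run type-check
npm run test -- --bail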

One gotcha: in my experience, if your hooks take more than ~30 seconds, Claude Code may try to skip them with --no-verify. Keep hooks under 10 seconds.
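
To find out whether a hook fits that budget, time it. A rough sketch with whole-second resolution; time_hook is a hypothetical helper of mine, and the budget is whatever number you pass in.

```shell
# Run a command, report how long it took, and fail if it exceeds the budget.
time_hook() {   # usage: time_hook <budget-seconds> <command> [args...]
  budget="$1"; shift
  start=$(date +%s)
  status=0
  "$@" || status=$?
  elapsed=$(( $(date +%s) - start ))
  echo "exit=$status elapsed=${elapsed}s budget=${budget}s"
  if [ "$elapsed" -gt "$budget" ]; then
    echo "hook is over budget" >&2
    return 1
  fi
  return "$status"
}
```

For example, time_hook 10 .git/hooks/pre-commit runs the hook by hand and returns nonzero if it either fails or takes longer than ten seconds.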

How I split responsibilities

After using this setup for a while, my division of labor looks roughly like this:

CLAUDE.md (stable, lean, loaded every time):

Build and test commands
Code style rules
Hard constraints and landmines
Architecture decisions (current, not historical)

Auto Memory (let Claude manage it):

Mistakes and lessons learned
Debug patterns and solutions
Project-specific patterns

Session Memory (hands-off, automatic):

Summary of last session

Git (query when needed):

“Where did we leave off?” → git log
“How was that bug fixed?” → git log --grep="keyword" (searches commit messages) or git log -S "keyword" (searches code changes)
“Why does this code look like that?” → git blame
“What changed between v2 and v3?” → git diff v2 v3

The rule of thumb: if you need this information every session, put it higher (CLAUDE.md). If you only need it occasionally, put it lower (Git). The cost of getting it wrong: too high wastes context tokens, too low means you can’t find it when you need it.
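
The lookups in the Git list above map onto a few one-liners. A sketch with helper names of my own choosing; what_changed assumes v2 and v3 exist as tags or other refs in your repo.

```shell
# Search commit messages, case-insensitively: "how was that bug fixed?"
search_messages() {   # usage: search_messages <text>
  git log --oneline -i --grep="$1"
}

# Search code changes: commits that added or removed occurrences of a string.
search_changes() {    # usage: search_changes <string>
  git log --oneline -S"$1"
}

# Summarize which files changed between two tags or commits.
what_changed() {      # usage: what_changed <old-ref> <new-ref>
  git diff --stat "$1" "$2"
}
```

The first two answer different questions: --grep only finds a fix if the commit message mentions it, while -S finds the commit where a string entered or left the code even when the message is useless.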

Is anyone actually researching this?

Yes. And not just one group.

A few worth tracking:

Lore (arXiv 2603.15566) — restructures git commit messages into a knowledge protocol so AI can query “why was A chosen over B” from commit history
Git Context Controller (arXiv 2508.00031) — organizes agent memory using git operations (commit, branch, merge), achieved 48% on SWE-Bench-Lite
Letta Context Repositories (blog) — git-backed filesystem as shared agent memory, sub-agents work in isolated worktrees
DiffMem (GitHub) — manages conversational AI memory as a git repo; current state in editable files, history in the commit graph

The common conclusion across all these systems: separating “current state” from “historical record” is the right call, and Git is a natural fit for the latter.

Closing thought

Git was built in 2005 by Linus Torvalds to manage the Linux kernel. Twenty years later it picked up a new use case: helping a perpetually amnesiac AI remember what it did.

Torvalds probably didn’t see that one coming.

But “complete history + precise queries + arbitrary restoration” — those three properties happen to be exactly what AI agents need most. Maybe that’s what good tools do: they solve one problem at design time, but a well-chosen abstraction gets rediscovered in contexts nobody anticipated.
