
Memory as File: Letting Coding Agents Manage Their Own Memory

February 2026 · 12 min read · Concepts & Architecture
What happens when an AI coding assistant doesn't just write code, but also manages its own "working memory"?

Starting with Coding Agents

Since 2025, a new category of tools has been reshaping how developers work — Coding Agents.

These are not traditional AI chatbots. You don't need to copy-paste code snippets or manually apply suggestions. You simply describe the task, and the agent autonomously reads code, makes edits, runs tests, and fixes bugs in a loop until the task is done.

| | Traditional AI Chat | Coding Agent |
| --- | --- | --- |
| Interaction | You ask, it answers, done | Autonomous loop, multi-step |
| File ops | Cannot | Read, write, edit files |
| Commands | Cannot | Run shell commands |
| Awareness | Cannot | Observes output, decides next step |

A Coding Agent is built on three pillars:

  1. LLM brain — understands tasks, reasons about solutions
  2. Tools — lets the LLM take action in the real environment
  3. Loop — continuous "think → act → observe" cycle until completion
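
The loop is easiest to see in code. Below is a minimal sketch in Python, assuming a stubbed `call_llm` and a toy tool set; it illustrates the shape of the loop, not any vendor's actual API:

```python
import subprocess
from pathlib import Path

# Pillar 2 (Tools): actions the LLM can take in the real environment.
def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"

def run_bash(command: str) -> str:
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {"read": read_file, "write": write_file, "bash": run_bash}

# Pillar 1 (LLM brain): stand-in for a real model call. It should return
# either {"type": "tool", "tool": name, "args": [...]} to act, or
# {"type": "done", "content": answer} to finish.
def call_llm(history: list[dict]) -> dict:
    raise NotImplementedError("plug in your model provider here")

# Pillar 3 (Loop): think -> act -> observe until the task is done.
def agent_loop(task: str, max_steps: int = 20) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_llm(history)                                 # think
        if action["type"] == "done":
            return action["content"]
        observation = TOOLS[action["tool"]](*action["args"])       # act
        history.append({"role": "tool", "content": observation})   # observe
    return "stopped: step budget exhausted"
```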

Major products in this space include Anthropic's Claude Code, the open-source OpenCode, as well as Cursor Agent, Aider, Cline, and others. What they all share: they equip an LLM with a set of file manipulation and command execution tools, transforming the model from "can only chat" to "can actually do work."

Taking Claude Code as an example, its built-in tools include:

| Tool | What it does |
| --- | --- |
| Read / Write / Edit | Read, create, and precisely modify files |
| Glob / Grep | Find files by pattern, search code content |
| Bash | Execute shell commands |
| Task | Spawn sub-agents to handle subtasks |
| WebSearch / WebFetch | Search the internet, fetch web content |

Looking at this table, a natural idea emerges —

An Observation: Coding Agents Are Natural Memory Managers

If we treat the conversation between a user and an agent as "memory", and persist that memory as files, then a Coding Agent already has everything it needs to manage that memory:

| Memory Operation | Corresponding Tool |
| --- | --- |
| Recall | Read / Grep / Glob |
| Record | Write |
| Update | Edit |
| Search | Grep / Glob |
| Organize & Summarize | Read + Edit |
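
To make the mapping concrete, here is a small sketch assuming a `memory/` directory of Markdown files (the layout and helper names are invented for illustration):

```python
from pathlib import Path

MEMORY_ROOT = Path("memory")  # hypothetical memory directory

def record(topic: str, note: str) -> None:
    """Record -> Write: append a note to the topic's memory file."""
    path = MEMORY_ROOT / f"{topic}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"- {note}\n")

def recall(topic: str) -> str:
    """Recall -> Read: load one topic's memory in full."""
    return (MEMORY_ROOT / f"{topic}.md").read_text()

def search(keyword: str) -> list[str]:
    """Search -> Grep/Glob: scan all memory files for a keyword."""
    hits = []
    for path in MEMORY_ROOT.rglob("*.md"):
        for line in path.read_text().splitlines():
            if keyword.lower() in line.lower():
                hits.append(f"{path}: {line.strip()}")
    return hits

# record("package-manager", "use bun, not npm")
# search("bun")  ->  ["memory/package-manager.md: - use bun, not npm"]
```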

The key insight

No need to reinvent the wheel. Memory management is essentially file management, and Coding Agents are naturally the best file managers around.

This is the core idea behind Memory as File:

User ↔ Agent conversation
        ↓ persisted as
    Files (Markdown)
        ↓ managed via
    Coding Agent's built-in Tools
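
Concretely, such a memory tree might look like this (the layout is invented for illustration; the directory names echo the architecture section below):

memory/
├── index.md                ← summaries of every file below
├── skills/
│   ├── database/sql.md
│   └── frontend/react.md
├── mcp/
└── project-context/
    └── auth.md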

Why This Works

Existing Validation

This is not purely hypothetical. Claude Code already ships with a minimal Memory as File mechanism: MEMORY.md.

Each project gets a dedicated directory at ~/.claude/projects/<path>/memory/, where MEMORY.md is automatically loaded into the system prompt of every new conversation. The agent can use Write/Edit tools to persist information, and the content survives across sessions.

Session 1: User says "This project uses bun, not npm"
            → Agent writes to MEMORY.md: "Package manager: use bun"

Session 2: New conversation starts
            → System prompt automatically includes MEMORY.md
            → Agent knows to use bun without being told again

This proves that "file as memory" works. But it's also minimal — a single file, 200-line limit, loaded in full, no structure. Which leads to a deeper question.

The real question

Who Should Manage the Agent's Attention?

MEMORY.md's approach is to dump all memory into the context window at once. This works for small projects, but hits a wall as memory grows and tasks become more complex.

An LLM's context window is a finite and expensive resource. Current approaches sit at two extremes:

| Approach | Problem |
| --- | --- |
| Load everything | Context overflows, tokens wasted, attention diluted, quality degrades |
| User specifies | Requires the user to decide what to load — high cognitive burden |

Is there a third way?

Let the agent manage its own working memory.

Agent-Driven Memory Management

Humans don't load all their knowledge into working memory at once. When writing frontend code, you're not simultaneously thinking about database index optimization. When debugging an API, you're not recalling CSS layout tricks. Human working memory is loaded on demand and released when done.

Coding Agents should work the same way.

Consider a large task — "Build a full-stack application" — that spans multiple knowledge domains, but each subtask only needs a subset:

Task: "Build a full-stack application"
├── Subtask 1: Database design
│   → Load: SQL skill, DB schema memory
│   → Unload: frontend-related memory
├── Subtask 2: API development
│   → Load: REST skill, auth memory
│   → Unload: SQL skill (no longer needed)
├── Subtask 3: Frontend pages
│   → Load: React skill, UI memory
│   → Unload: API skill
└── ...
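
Such a plan is just data: a map from each subtask to the memory files it needs. A sketch, with hypothetical file paths:

```python
# Which memory each subtask loads; everything else stays on disk.
# Paths are invented for illustration.
PLAN: dict[str, list[str]] = {
    "Database design": ["memory/skills/database/sql.md",
                        "memory/project-context/schema.md"],
    "API development": ["memory/skills/rest.md",
                        "memory/project-context/auth.md"],
    "Frontend pages":  ["memory/skills/frontend/react.md",
                        "memory/project-context/ui.md"],
}
```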

Architecture

┌─────────────────────────────────────┐
│         Long-term Memory            │
│   (File system — all memory files)  │
│   skills/, mcp/, project-context/   │
│   ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐   │
│   │ A │ │ B │ │ C │ │ D │ │ E │   │
│   └───┘ └───┘ └───┘ └───┘ └───┘   │
└─────────────┬───────────────────────┘
              │ Agent decides load/unload
              ▼
┌─────────────────────────────────────┐
│         Working Memory              │
│   (Current context window)          │
│   ┌───┐ ┌───┐                       │
│   │ B │ │ D │  ← Only what the      │
│   └───┘ └───┘    current subtask    │
│                   needs             │
└─────────────────────────────────────┘

How Does the Agent Know What to Load?

| Approach | Description |
| --- | --- |
| Index file | Maintain an index.md with summaries; the agent reads the index first, then decides which files to load |
| Directory conventions | Structured directories (skills/database/, skills/frontend/) for quick Glob-based lookup |
| Metadata tags | Tags in each memory file's header; the agent filters by tag using Grep |
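
The metadata-tag approach is simple to sketch. Assume each memory file opens with a `tags:` header line; this convention is an assumption for illustration, not an existing feature:

```python
from pathlib import Path

def read_tags(path: Path) -> set[str]:
    """Parse a `tags: a, b, c` line near the top of a memory file."""
    for line in path.read_text().splitlines()[:5]:
        if line.startswith("tags:"):
            return {t.strip() for t in line.removeprefix("tags:").split(",")}
    return set()

def select_memory(task_tags: set[str], root: Path = Path("memory")) -> list[Path]:
    """Return only the files whose tags overlap the current subtask's needs."""
    return [p for p in root.rglob("*.md") if read_tags(p) & task_tags]

# select_memory({"database", "sql"}) might return memory/skills/database/sql.md
# while leaving all frontend memory untouched.
```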

How Is "Unloading" Implemented?

Under current LLM architectures, content cannot be truly removed from the context window once added. But there are practical workarounds:

| Approach | Feasibility | Description |
| --- | --- | --- |
| Sub-agent (fork) | Best | Each subtask spawns a new sub-agent with a naturally clean context |
| Conversation split | Viable | Split large tasks into multiple conversations, loading only relevant memory |
| Explicit prompting | Marginal | Tell the agent "ignore the following"; results are unstable |

The sub-agent model aligns best with existing architectures. In Claude Code, the Task tool already supports spawning sub-agents:

Main Agent (coordinator)
├── Analyzes the large task, breaks it into subtasks
├── Determines which memory each subtask needs
└── Spawns Sub-Agent (Task tool)
     → Injects only the relevant memory file contents
     → Sub-agent works in a clean context
     → Returns results to the main agent
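
A sketch of that coordinator, where `spawn_subagent` is a hypothetical stand-in for something like the Task tool rather than its actual API:

```python
from pathlib import Path

def spawn_subagent(prompt: str) -> str:
    """Run one subtask in a fresh context; plug in your agent runtime here."""
    raise NotImplementedError

def run_subtask(description: str, memory_files: list[Path]) -> str:
    """Inject only the relevant memory, then fork a clean-context sub-agent."""
    memory = "\n\n".join(p.read_text() for p in memory_files)
    prompt = f"Relevant memory:\n{memory}\n\nSubtask: {description}"
    return spawn_subagent(prompt)  # result flows back to the main agent

# Hypothetical decomposition of "Build a full-stack application":
# run_subtask("Design the database schema",
#             [Path("memory/skills/database/sql.md")])
# run_subtask("Build the REST API",
#             [Path("memory/skills/rest.md"),
#              Path("memory/project-context/auth.md")])
```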

From prototype to vision

From MEMORY.md to a Complete Solution

| | Claude Code's MEMORY.md | Full Memory as File |
| --- | --- | --- |
| Essence | File as memory | File as memory |
| Scale | Single file, 200-line limit | Multi-file, multi-directory, unlimited |
| Structure | Unstructured | Organized by topic, tags, and indexes |
| Retrieval | Loaded in full, no search needed | Grep/Glob or semantic search |
| Manager | User / agent passively records | Agent autonomously decides what to load and unload |

MEMORY.md is the minimal viable proof. The complete solution answers a fundamental question:

Who should be responsible for managing the agent's attention?

The answer shifts from "the user manages manually" to "the agent manages autonomously." This is a critical step toward truly autonomous agents.

Open Questions

Several questions remain worth exploring:

  1. Memory structure — Store raw conversation transcripts, or distilled summaries? Raw is complete but noisy; distilled is concise but loses information.
  2. Memory granularity — One file per conversation? Split by topic? By timeline? Granularity determines retrieval efficiency.
  3. Semantic retrieval — Grep/Glob uses keyword matching, which may miss semantically similar but differently worded memories. Do we need vector search? Or is the LLM's own comprehension sufficient?
  4. Memory lifecycle — How to handle outdated memory? Automatic decay, agent-initiated cleanup, or manual user management?
  5. Active vs. passive — Should the agent wait for the user to say "remember this," or proactively decide which information is worth persisting?

Conclusion

The emergence of Coding Agents has turned LLMs from "chat tools" into "programming partners." But their memory capabilities remain primitive — either no memory at all, or a single small file loaded in full.

Memory as File proposes a simple yet powerful abstraction: memory is files, file management is memory management, and Coding Agents happen to be the best file managers available.

Building on this, Agent-driven Memory Management takes it further by letting agents autonomously manage their own working memory — loading on demand, unloading when done, efficiently utilizing limited attention resources just as humans do.

This requires no additional infrastructure, no new tools — just a shift in perspective on the capabilities Coding Agents already possess.


The ideas in this post originated from observations and experiments while using Claude Code for day-to-day development. All concepts discussed (Memory as File, Agent-driven Memory Management) are still in the early conceptual stage. Discussion and feedback are welcome.