What happens when an AI coding assistant doesn't just write code, but also manages its own "working memory"?
Starting with Coding Agents
Since 2025, a new category of tools has been reshaping how developers work — Coding Agents.
These are not traditional AI chatbots. You don't need to copy-paste code snippets or manually apply suggestions. You simply describe the task, and the agent autonomously reads code, makes edits, runs tests, and fixes bugs in a loop until the task is done.
| | Traditional AI Chat | Coding Agent |
|---|---|---|
| Interaction | You ask, it answers, done | Autonomous loop, multi-step |
| File ops | Cannot | Read, write, edit files |
| Commands | Cannot | Run shell commands |
| Awareness | Cannot | Observes output, decides next step |
A Coding Agent is built on three pillars:
- LLM brain — understands tasks, reasons about solutions
- Tools — lets the LLM take action in the real environment
- Loop — continuous "think → act → observe" cycle until completion (sketched below)
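Stripped to essentials, that loop fits in a dozen lines. Here is a minimal sketch in Python, where `call_llm` and `run_tool` are hypothetical stand-ins for a real model API and a real tool dispatcher:

```python
# Minimal "think -> act -> observe" loop. `call_llm` and `run_tool`
# are hypothetical stand-ins for a model API and a tool dispatcher.
def agent_loop(task: str, call_llm, run_tool, max_steps: int = 20):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_llm(history)          # think: model picks the next action
        if action["type"] == "finish":      # model declares the task complete
            return action["result"]
        observation = run_tool(action)      # act: edit a file, run a command...
        history.append({"role": "tool", "content": observation})  # observe
    raise RuntimeError("Step budget exhausted before the task completed")
```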
Major products in this space include Anthropic's Claude Code, the open-source OpenCode, as well as Cursor Agent, Aider, Cline, and others. What they all share: they equip an LLM with a set of file manipulation and command execution tools, transforming the model from "can only chat" to "can actually do work."
Taking Claude Code as an example, its built-in tools include:
| Tool | What it does |
|---|---|
| Read / Write / Edit | Read, create, and precisely modify files |
| Glob / Grep | Find files by pattern, search code content |
| Bash | Execute shell commands |
| Task | Spawn sub-agents to handle subtasks |
| WebSearch / WebFetch | Search the internet, fetch web content |
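From the harness's side, a toolset like this is essentially a dispatch table: the model emits a tool name plus arguments, the harness routes the call and feeds the output back. A simplified sketch, illustrative rather than Claude Code's actual implementation:

```python
import subprocess
from pathlib import Path

# Illustrative dispatch table; not Claude Code's actual implementation.
TOOLS = {
    "Read":  lambda path: Path(path).read_text(encoding="utf-8"),
    "Write": lambda path, text: Path(path).write_text(text, encoding="utf-8"),
    "Glob":  lambda pattern: [str(p) for p in Path(".").rglob(pattern)],
    "Bash":  lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True
    ).stdout,
}

def dispatch(tool: str, **kwargs):
    """Route a model-emitted tool call to its implementation."""
    return TOOLS[tool](**kwargs)

# e.g. dispatch("Glob", pattern="*.md")
```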
Looking at this toolset, a natural idea emerges —
An Observation: Coding Agents Are Natural Memory Managers
If we treat the conversation between a user and an agent as "memory", and persist that memory as files, then a Coding Agent already has everything it needs to manage that memory:
| Memory Operation | Corresponding Tool |
|---|---|
| Recall | Read / Grep / Glob |
| Record | Write |
| Update | Edit |
| Search | Grep / Glob |
| Organize & Summarize | Read + Edit |
The key insight
No need to reinvent the wheel. Memory management is essentially file management, and Coding Agents are naturally the best file managers around.
This is the core idea behind Memory as File:
User ↔ Agent conversation
↓ persisted as
Files (Markdown)
↓ managed via
Coding Agent's built-in Tools
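In code, the whole mapping table reduces to a handful of file operations. A sketch, assuming a `memory/` directory (the layout and function names are illustrative, not a fixed convention):

```python
from pathlib import Path

MEMORY_DIR = Path("memory")  # assumed layout, not a fixed convention

def record(topic: str, content: str) -> None:
    """Record -> Write: persist a new memory file."""
    path = MEMORY_DIR / f"{topic}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")

def recall(topic: str) -> str:
    """Recall -> Read: load a memory file back into context."""
    return (MEMORY_DIR / f"{topic}.md").read_text(encoding="utf-8")

def update(topic: str, old: str, new: str) -> None:
    """Update -> Edit: precise in-place replacement, like the Edit tool."""
    path = MEMORY_DIR / f"{topic}.md"
    path.write_text(path.read_text(encoding="utf-8").replace(old, new),
                    encoding="utf-8")

def search(keyword: str) -> list[Path]:
    """Search -> Grep/Glob: find memory files mentioning a keyword."""
    return [p for p in MEMORY_DIR.rglob("*.md")
            if keyword.lower() in p.read_text(encoding="utf-8").lower()]
```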
Why This Works
- Zero additional infrastructure — no databases, vector stores, or external services needed
- Human-readable and auditable — memory is just Markdown files; users can inspect and manually edit anytime
- Native version control — put it in a Git repo and the full history of memory changes is transparent (see the sketch after this list)
- Reuses existing capabilities — no new tools to build; the Coding Agent's current toolset is sufficient
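The version-control point in particular costs almost nothing: if the memory directory is a Git repository, snapshotting a memory change is two commands. A sketch (`snapshot_memory` is a hypothetical helper, not an existing tool):

```python
import subprocess
from pathlib import Path

def snapshot_memory(root: Path, message: str) -> None:
    """Commit the current state of the memory directory.
    Assumes `root` has already been initialized with `git init`."""
    subprocess.run(["git", "-C", str(root), "add", "-A"], check=True)
    subprocess.run(["git", "-C", str(root), "commit", "-m", message], check=True)

# e.g. snapshot_memory(Path("memory"), "Record package manager preference")
```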
Existing Validation
This is not purely hypothetical. Claude Code already ships with a minimal Memory as File mechanism: MEMORY.md.
Each project gets a dedicated directory at ~/.claude/projects/<path>/memory/, where MEMORY.md is automatically loaded into the system prompt of every new conversation. The agent can use Write/Edit tools to persist information, and the content survives across sessions.
Session 1: User says "This project uses bun, not npm"
→ Agent writes to MEMORY.md: "Package manager: use bun"
Session 2: New conversation starts
→ System prompt automatically includes MEMORY.md
→ Agent knows to use bun without being told again
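Mechanically there is very little to it: read the file if it exists, prepend it to the system prompt. A sketch of the idea (`build_system_prompt` is a hypothetical name; the real loading logic lives inside Claude Code):

```python
from pathlib import Path

def build_system_prompt(base_prompt: str, memory_file: Path) -> str:
    """Prepend persisted memory to a new session's system prompt,
    mirroring how MEMORY.md is auto-loaded at conversation start."""
    if memory_file.exists():
        memory = memory_file.read_text(encoding="utf-8")
        return f"{base_prompt}\n\n# Persisted memory\n{memory}"
    return base_prompt
```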
This mechanism proves that "file as memory" works. But it is also minimal: a single file, a 200-line limit, loaded in full, with no structure. That minimalism leads to a deeper question.
Who Should Manage the Agent's Attention?
MEMORY.md's approach is to dump all memory into the context window at once. This works for small projects, but hits a wall as memory grows and tasks become more complex.
An LLM's context window is a finite and expensive resource. Current approaches sit at two extremes:
| Approach | Problem |
|---|---|
| Load everything | Context overflows, tokens wasted, attention diluted, quality degrades |
| User specifies | Requires the user to decide what to load — high cognitive burden |
Is there a third way?
Let the agent manage its own working memory.
Agent-Driven Memory Management
Humans don't load all their knowledge into working memory at once. When writing frontend code, you're not simultaneously thinking about database index optimization. When debugging an API, you're not recalling CSS layout tricks. Human working memory is loaded on demand and released when done.
Coding Agents should work the same way.
Consider a large task — "Build a full-stack application" — that spans multiple knowledge domains, but each subtask only needs a subset:
Task: "Build a full-stack application"
├── Subtask 1: Database design
│     → Load: SQL skill, DB schema memory
│     → Unload: frontend-related memory
├── Subtask 2: API development
│     → Load: REST skill, auth memory
│     → Unload: SQL skill (no longer needed)
├── Subtask 3: Frontend pages
│     → Load: React skill, UI memory
│     → Unload: API skill
└── ...
Architecture
┌─────────────────────────────────────┐
│ Long-term Memory │
│ (File system — all memory files) │
│ skills/, mcp/, project-context/ │
│ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ │
│ │ A │ │ B │ │ C │ │ D │ │ E │ │
│ └───┘ └───┘ └───┘ └───┘ └───┘ │
└─────────────┬───────────────────────┘
│ Agent decides load/unload
▼
┌─────────────────────────────────────┐
│ Working Memory │
│ (Current context window) │
│ ┌───┐ ┌───┐ │
│ │ B │ │ D │ ← Only what the │
│ └───┘ └───┘ current subtask │
│ needs │
└─────────────────────────────────────┘
- Long-term Memory: all memory files in the file system, unlimited capacity
- Working Memory: current context window, limited capacity, requires careful management (see the sketch below)
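One way to make the two tiers concrete is a small working-memory manager with an explicit budget: `load` pulls a file from long-term memory into context, `unload` releases it when the subtask is done. A sketch, using character count as a crude stand-in for real token counting:

```python
from pathlib import Path

class WorkingMemory:
    """Tracks which long-term memory files are currently 'in context'.
    Character count is a crude stand-in for real token counting."""

    def __init__(self, budget_chars: int = 40_000):
        self.budget = budget_chars
        self.loaded: dict[str, str] = {}   # path -> file content

    def load(self, path: Path) -> None:
        content = path.read_text(encoding="utf-8")
        used = sum(len(c) for c in self.loaded.values())
        if used + len(content) > self.budget:
            raise RuntimeError(f"Loading {path} would exceed the context budget")
        self.loaded[str(path)] = content

    def unload(self, path: Path) -> None:
        self.loaded.pop(str(path), None)   # release when a subtask is done

    def render(self) -> str:
        """Concatenate loaded memory for injection into the prompt."""
        return "\n\n".join(self.loaded.values())
```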
How Does the Agent Know What to Load?
| Approach | Description |
|---|---|
| Index file | Maintain an index.md with summaries. Agent reads the index first, then decides which files to load |
| Directory conventions | Structured directories (skills/database/, skills/frontend/) for quick Glob-based lookup |
| Metadata tags | Tags in each memory file header; the agent filters by tag using Grep |
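The metadata-tag variant is easy to picture: each memory file opens with a small header, and the agent filters by tag before loading anything. A sketch that assumes a `tags:` line at the top of each file (this header format is an assumption, not an existing standard):

```python
from pathlib import Path

def find_by_tag(root: Path, tag: str) -> list[Path]:
    """Grep-style lookup: return memory files whose header declares `tag`.
    Assumes each file starts with a line like 'tags: database, schema'."""
    matches = []
    for path in root.rglob("*.md"):
        first_line = path.read_text(encoding="utf-8").splitlines()[0:1]
        if first_line and first_line[0].startswith("tags:") and tag in first_line[0]:
            matches.append(path)
    return matches

# e.g. find_by_tag(Path("memory/skills"), "database")
```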
How Is "Unloading" Implemented?
Under current LLM architectures, content cannot be truly removed from the context window once added. But there are practical workarounds:
| Approach | Feasibility | Description |
|---|---|---|
| Sub-agent (fork) | Best | Each subtask spawns a new sub-agent with a naturally clean context |
| Conversation split | Viable | Split large tasks into multiple conversations, loading only relevant memory |
| Explicit prompting | Marginal | Tell the agent "ignore the following" — unstable results |
The sub-agent model aligns best with existing architectures. In Claude Code, the Task tool already supports spawning sub-agents:
Main Agent (coordinator)
├── Analyzes the large task, breaks it into subtasks
├── Determines which memory each subtask needs
└── Spawns Sub-Agent (Task tool)
→ Injects only the relevant memory file contents
→ Sub-agent works in a clean context
→ Returns results to the main agent
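In code, the pattern is: pick the memory files a subtask needs, inject only those into the sub-agent's prompt, and let it start with a clean context. A sketch, with `spawn_agent` as a hypothetical stand-in for a mechanism like the Task tool:

```python
from pathlib import Path

def run_subtask(subtask: str, memory_files: list[Path], spawn_agent) -> str:
    """Spawn a sub-agent with a clean context, injecting only the memory
    the subtask needs. `spawn_agent` is a hypothetical stand-in for a
    mechanism like Claude Code's Task tool."""
    injected = "\n\n".join(p.read_text(encoding="utf-8") for p in memory_files)
    prompt = f"{subtask}\n\n# Relevant memory\n{injected}"
    return spawn_agent(prompt)   # result flows back to the main agent

# The coordinator then walks its plan:
# for subtask, files in plan:
#     result = run_subtask(subtask, files, spawn_agent)
```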
From MEMORY.md to a Complete Solution
| | Claude Code's MEMORY.md | Full Memory as File |
|---|---|---|
| Essence | File as memory | File as memory |
| Scale | Single file, 200-line limit | Multi-file, multi-directory, unlimited |
| Structure | Unstructured | Organized by topic, tags, and indexes |
| Retrieval | Loaded in full, no search needed | Grep/Glob or semantic search |
| Manager | User / Agent passively records | Agent autonomously decides what to load and unload |
MEMORY.md is the minimal viable proof. The complete solution answers a fundamental question:
Who should be responsible for managing the agent's attention?
The answer shifts from "the user manages manually" to "the agent manages autonomously." This is a critical step toward truly autonomous agents.
Open Questions
Several questions remain worth exploring:
- Memory structure — Store raw conversation transcripts, or distilled summaries? Raw is complete but noisy; distilled is concise but loses information.
- Memory granularity — One file per conversation? Split by topic? By timeline? Granularity determines retrieval efficiency.
- Semantic retrieval — Grep/Glob uses keyword matching, which may miss semantically similar but differently worded memories. Do we need vector search? Or is the LLM's own comprehension sufficient?
- Memory lifecycle — How to handle outdated memory? Automatic decay, agent-initiated cleanup, or manual user management?
- Active vs. passive — Should the agent wait for the user to say "remember this," or proactively decide which information is worth persisting?
Conclusion
The emergence of Coding Agents has evolved LLMs from "chat tools" into "programming partners." But their memory capabilities remain primitive — either no memory at all, or loading a single small file in full.
Memory as File proposes a simple yet powerful abstraction: memory is files, file management is memory management, and Coding Agents happen to be the best file managers available.
Building on this, Agent-driven Memory Management takes it further by letting agents autonomously manage their own working memory — loading on demand, unloading when done, efficiently utilizing limited attention resources just as humans do.
This requires no additional infrastructure, no new tools — just a shift in perspective on the capabilities Coding Agents already possess.
The ideas in this post originated from observations and experiments while using Claude Code for day-to-day development. All concepts discussed (Memory as File, Agent-driven Memory Management) are still in the early conceptual stage. Discussion and feedback are welcome.