
Memory as File: Letting Coding Agents Manage Their Own Memory

February 2026 · 12 min read · Concepts & Architecture
What happens when an AI coding assistant doesn't just write code, but also manages its own "working memory"?

Starting with Coding Agents

Since 2025, a new category of tools has been reshaping how developers work — Coding Agents.

These are not traditional AI chatbots. You don't need to copy-paste code snippets or manually apply suggestions. You simply describe the task, and the agent autonomously reads code, makes edits, runs tests, and fixes bugs in a loop until the task is done.

| | Traditional AI Chat | Coding Agent |
| --- | --- | --- |
| Interaction | You ask, it answers, done | Autonomous loop, multi-step |
| File ops | Cannot | Read, write, edit files |
| Commands | Cannot | Run shell commands |
| Awareness | Cannot | Observes output, decides next step |

A Coding Agent is built on three pillars:

  1. LLM brain — understands tasks, reasons about solutions
  2. Tools — lets the LLM take action in the real environment
  3. Loop — continuous "think → act → observe" cycle until completion
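
The loop is easiest to see in code. Below is a minimal sketch in Python, assuming a stubbed `call_llm` and a toy tool set; it illustrates the shape of the loop, not any vendor's actual API:

```python
import subprocess
from pathlib import Path

# Pillar 2 (Tools): actions the LLM can take in the real environment.
def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"

def run_bash(command: str) -> str:
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {"read": read_file, "write": write_file, "bash": run_bash}

# Pillar 1 (LLM brain): stand-in for a real model call. It should return
# either {"type": "tool", "tool": name, "args": [...]} to act, or
# {"type": "done", "content": answer} to finish.
def call_llm(history: list[dict]) -> dict:
    raise NotImplementedError("plug in your model provider here")

# Pillar 3 (Loop): think -> act -> observe until the task is done.
def agent_loop(task: str, max_steps: int = 20) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_llm(history)                                 # think
        if action["type"] == "done":
            return action["content"]
        observation = TOOLS[action["tool"]](*action["args"])       # act
        history.append({"role": "tool", "content": observation})   # observe
    return "stopped: step budget exhausted"
```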

Major products in this space include Anthropic's Claude Code, the open-source OpenCode, as well as Cursor Agent, Aider, Cline, and others. What they all share: they equip an LLM with a set of file manipulation and command execution tools, transforming the model from "can only chat" to "can actually do work."

Taking Claude Code as an example, its built-in tools include:

| Tool | What it does |
| --- | --- |
| Read / Write / Edit | Read, create, and precisely modify files |
| Glob / Grep | Find files by pattern, search code content |
| Bash | Execute shell commands |
| Task | Spawn sub-agents to handle subtasks |
| WebSearch / WebFetch | Search the internet, fetch web content |

Looking at this table, a natural idea emerges —

An Observation: Coding Agents Are Natural Memory Managers

If we treat the conversation between a user and an agent as "memory", and persist that memory as files, then a Coding Agent already has everything it needs to manage that memory:

| Memory Operation | Corresponding Tool |
| --- | --- |
| Recall | Read / Grep / Glob |
| Record | Write |
| Update | Edit |
| Search | Grep / Glob |
| Organize & Summarize | Read + Edit |
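
To make the mapping concrete, here is a small sketch assuming a `memory/` directory of Markdown files (the layout and helper names are invented for illustration):

```python
from pathlib import Path

MEMORY_ROOT = Path("memory")  # hypothetical memory directory

def record(topic: str, note: str) -> None:
    """Record -> Write: append a note to the topic's memory file."""
    path = MEMORY_ROOT / f"{topic}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"- {note}\n")

def recall(topic: str) -> str:
    """Recall -> Read: load one topic's memory in full."""
    return (MEMORY_ROOT / f"{topic}.md").read_text()

def search(keyword: str) -> list[str]:
    """Search -> Grep/Glob: scan all memory files for a keyword."""
    hits = []
    for path in MEMORY_ROOT.rglob("*.md"):
        for line in path.read_text().splitlines():
            if keyword.lower() in line.lower():
                hits.append(f"{path}: {line.strip()}")
    return hits

# record("package-manager", "use bun, not npm")
# search("bun")  ->  ["memory/package-manager.md: - use bun, not npm"]
```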

The key insight

No need to reinvent the wheel. Memory management is essentially file management, and Coding Agents are naturally the best file managers around.

This is the core idea behind Memory as File:

User ↔ Agent conversation
        ↓ persisted as
    Files (Markdown)
        ↓ managed via
    Coding Agent's built-in Tools
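
Concretely, such a memory tree might look like this (the layout is invented for illustration; the directory names echo the architecture section below):

memory/
├── index.md                ← summaries of every file below
├── skills/
│   ├── database/sql.md
│   └── frontend/react.md
├── mcp/
└── project-context/
    └── auth.md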

Why This Works

Existing Validation

This is not purely hypothetical. Claude Code already ships with a minimal Memory as File mechanism: MEMORY.md.

Each project gets a dedicated directory at ~/.claude/projects/<path>/memory/, where MEMORY.md is automatically loaded into the system prompt of every new conversation. The agent can use Write/Edit tools to persist information, and the content survives across sessions.

Session 1: User says "This project uses bun, not npm"
            → Agent writes to MEMORY.md: "Package manager: use bun"

Session 2: New conversation starts
            → System prompt automatically includes MEMORY.md
            → Agent knows to use bun without being told again

This proves that "file as memory" works. But it's also minimal — a single file, 200-line limit, loaded in full, no structure. Which leads to a deeper question.

The real question

Who Should Manage the Agent's Attention?

MEMORY.md's approach is to dump all memory into the context window at once. This works for small projects, but hits a wall as memory grows and tasks become more complex.

An LLM's context window is a finite and expensive resource. Current approaches sit at two extremes:

| Approach | Problem |
| --- | --- |
| Load everything | Context overflows, tokens wasted, attention diluted, quality degrades |
| User specifies | Requires the user to decide what to load — high cognitive burden |

Is there a third way?

Let the agent manage its own working memory.

Agent-Driven Memory Management

Humans don't load all their knowledge into working memory at once. When writing frontend code, you're not simultaneously thinking about database index optimization. When debugging an API, you're not recalling CSS layout tricks. Human working memory is loaded on demand and released when done.

Coding Agents should work the same way.

Consider a large task — "Build a full-stack application" — that spans multiple knowledge domains, but each subtask only needs a subset:

Task: "Build a full-stack application"
├── Subtask 1: Database design
│   → Load: SQL skill, DB schema memory
│   → Unload: frontend-related memory
├── Subtask 2: API development
│   → Load: REST skill, auth memory
│   → Unload: SQL skill (no longer needed)
├── Subtask 3: Frontend pages
│   → Load: React skill, UI memory
│   → Unload: API skill
└── ...
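
Such a plan is just data: a map from each subtask to the memory files it needs. A sketch, with hypothetical file paths:

```python
# Which memory each subtask loads; everything else stays on disk.
# Paths are invented for illustration.
PLAN: dict[str, list[str]] = {
    "Database design": ["memory/skills/database/sql.md",
                        "memory/project-context/schema.md"],
    "API development": ["memory/skills/rest.md",
                        "memory/project-context/auth.md"],
    "Frontend pages":  ["memory/skills/frontend/react.md",
                        "memory/project-context/ui.md"],
}
```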

Architecture

┌─────────────────────────────────────┐
│         Long-term Memory            │
│   (File system — all memory files)  │
│   skills/, mcp/, project-context/   │
│   ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐   │
│   │ A │ │ B │ │ C │ │ D │ │ E │   │
│   └───┘ └───┘ └───┘ └───┘ └───┘   │
└─────────────┬───────────────────────┘
              │ Agent decides load/unload
              ▼
┌─────────────────────────────────────┐
│         Working Memory              │
│   (Current context window)          │
│   ┌───┐ ┌───┐                       │
│   │ B │ │ D │  ← Only what the      │
│   └───┘ └───┘    current subtask    │
│                   needs             │
└─────────────────────────────────────┘

How Does the Agent Know What to Load?

| Approach | Description |
| --- | --- |
| Index file | Maintain an index.md with summaries; the agent reads the index first, then decides which files to load |
| Directory conventions | Structured directories (skills/database/, skills/frontend/) for quick Glob-based lookup |
| Metadata tags | Tags in each memory file's header; the agent filters by tag using Grep |
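
The metadata-tag approach is simple to sketch. Assume each memory file opens with a `tags:` header line; this convention is an assumption for illustration, not an existing feature:

```python
from pathlib import Path

def read_tags(path: Path) -> set[str]:
    """Parse a `tags: a, b, c` line near the top of a memory file."""
    for line in path.read_text().splitlines()[:5]:
        if line.startswith("tags:"):
            return {t.strip() for t in line.removeprefix("tags:").split(",")}
    return set()

def select_memory(task_tags: set[str], root: Path = Path("memory")) -> list[Path]:
    """Return only the files whose tags overlap the current subtask's needs."""
    return [p for p in root.rglob("*.md") if read_tags(p) & task_tags]

# select_memory({"database", "sql"}) might return memory/skills/database/sql.md
# while leaving all frontend memory untouched.
```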

How Is "Unloading" Implemented?

Under current LLM architectures, content cannot be truly removed from the context window once added. But there are practical workarounds:

| Approach | Feasibility | Description |
| --- | --- | --- |
| Sub-agent (fork) | Best | Each subtask spawns a new sub-agent with a naturally clean context |
| Conversation split | Viable | Split large tasks into multiple conversations, loading only relevant memory |
| Explicit prompting | Marginal | Tell the agent "ignore the following"; results are unstable |

The sub-agent model aligns best with existing architectures. In Claude Code, the Task tool already supports spawning sub-agents:

Main Agent (coordinator)
├── Analyzes the large task, breaks it into subtasks
├── Determines which memory each subtask needs
└── Spawns Sub-Agent (Task tool)
     → Injects only the relevant memory file contents
     → Sub-agent works in a clean context
     → Returns results to the main agent
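
A sketch of that coordinator, where `spawn_subagent` is a hypothetical stand-in for something like the Task tool rather than its actual API:

```python
from pathlib import Path

def spawn_subagent(prompt: str) -> str:
    """Run one subtask in a fresh context; plug in your agent runtime here."""
    raise NotImplementedError

def run_subtask(description: str, memory_files: list[Path]) -> str:
    """Inject only the relevant memory, then fork a clean-context sub-agent."""
    memory = "\n\n".join(p.read_text() for p in memory_files)
    prompt = f"Relevant memory:\n{memory}\n\nSubtask: {description}"
    return spawn_subagent(prompt)  # result flows back to the main agent

# Hypothetical decomposition of "Build a full-stack application":
# run_subtask("Design the database schema",
#             [Path("memory/skills/database/sql.md")])
# run_subtask("Build the REST API",
#             [Path("memory/skills/rest.md"),
#              Path("memory/project-context/auth.md")])
```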

From prototype to vision

From MEMORY.md to a Complete Solution

| | Claude Code's MEMORY.md | Full Memory as File |
| --- | --- | --- |
| Essence | File as memory | File as memory |
| Scale | Single file, 200-line limit | Multi-file, multi-directory, unlimited |
| Structure | Unstructured | Organized by topic, tags, and indexes |
| Retrieval | Loaded in full, no search needed | Grep/Glob or semantic search |
| Manager | User / agent passively records | Agent autonomously decides what to load and unload |

MEMORY.md is the minimal viable proof. The complete solution answers a fundamental question:

Who should be responsible for managing the agent's attention?

The answer shifts from "the user manages manually" to "the agent manages autonomously." This is a critical step toward truly autonomous agents.

Open Questions

Several questions remain worth exploring:

  1. Memory structure — Store raw conversation transcripts, or distilled summaries? Raw is complete but noisy; distilled is concise but loses information.
  2. Memory granularity — One file per conversation? Split by topic? By timeline? Granularity determines retrieval efficiency.
  3. Semantic retrieval — Grep/Glob uses keyword matching, which may miss semantically similar but differently worded memories. Do we need vector search? Or is the LLM's own comprehension sufficient?
  4. Memory lifecycle — How to handle outdated memory? Automatic decay, agent-initiated cleanup, or manual user management?
  5. Active vs. passive — Should the agent wait for the user to say "remember this," or proactively decide which information is worth persisting?

Conclusion

The emergence of Coding Agents has turned LLMs from "chat tools" into "programming partners." But their memory capabilities remain primitive — either no memory at all, or a single small file loaded in full.

Memory as File proposes a simple yet powerful abstraction: memory is files, file management is memory management, and Coding Agents happen to be the best file managers available.

Building on this, Agent-driven Memory Management takes it further by letting agents autonomously manage their own working memory — loading on demand, unloading when done, efficiently utilizing limited attention resources just as humans do.

This requires no additional infrastructure, no new tools — just a shift in perspective on the capabilities Coding Agents already possess.


The ideas in this post originated from observations and experiments while using Claude Code for day-to-day development. All concepts discussed (Memory as File, Agent-driven Memory Management) are still in the early conceptual stage. Discussion and feedback are welcome.