How Agent Memory Works
Why Memory Matters
Imagine hiring an assistant who forgets everything you tell them every time they leave the room. Every morning, you would have to re-explain your name, your job, your preferences, and every task you need done. That would be incredibly frustrating and unproductive.
Without memory, AI agents work exactly like that. Each new conversation starts from zero. The AI does not know who you are, what you like, what mistakes it made yesterday, or what project you are working on. Every session is a fresh start, which means you waste time repeating yourself and the agent keeps making the same errors.
Memory is what transforms a forgetful chatbot into a reliable agent. When an AI agent can remember context, it delivers consistent results, learns from past mistakes, and adapts to your preferences over time. Understanding how memory works is essential for building agents that are truly useful.
The Three Layers of Agent Memory
AI agent memory works in three layers. Each layer serves a different purpose, lasts for a different amount of time, and stores a different kind of information.
Layer 1: The Context Window (Working Memory)
The context window is the AI’s working memory. It holds everything the AI is actively thinking about right now: your current message, the system prompt, recent conversation history, and any files or data you have shared in this session.
Every AI model has a context window with a fixed size, measured in tokens (roughly, a token is about three-quarters of a word). Here are some approximate sizes:
| Model | Context Window Size | Roughly Equivalent To |
|---|---|---|
| ChatGPT | 128,000 tokens | ~200 pages of text |
| Claude Sonnet | 200,000 tokens | ~300 pages of text |
| Gemini Pro | 1,000,000 tokens | ~1,500 pages of text |
| Llama 3 (Ollama) | 8,000 - 128,000 tokens | ~12 - 200 pages |
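Using the rough “one token is about three-quarters of a word” rule above, you can estimate whether a piece of text will fit in a given window. A minimal sketch (the 4/3 ratio is only an approximation; real tokenizers count differently):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 1 token per 0.75 words."""
    return round(len(text.split()) * 4 / 3)

def fits_in_window(text: str, window_tokens: int) -> bool:
    """True if the text should fit in a window of the given size."""
    return estimate_tokens(text) <= window_tokens

report = "word " * 90_000                 # ~90,000 words ≈ 120,000 tokens
print(fits_in_window(report, 128_000))    # True: fits a 128K window
print(fits_in_window(report, 8_000))      # False: far too big for 8K
```

This is only a planning heuristic; when the exact count matters, use the platform’s own tokenizer.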
The context window is like a desk. You can spread out papers, notes, and documents on it, but the desk has a fixed size. If you pile too much on it, older items fall off the edge and the AI forgets them. When the context window fills up, the AI starts dropping the earliest messages in the conversation to make room for new ones.
Key characteristics:
- Temporary: it only exists during the current session.
- Everything in the conversation is stored here, including the system prompt.
- When it fills up, older content is quietly removed.
- Once the session ends, the context window is cleared completely.
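The “older items fall off the edge” behavior can be sketched as a sliding window over the message list. This is illustrative only, not any vendor’s actual eviction policy, and it reuses the rough word-based token estimate:

```python
def trim_to_window(messages: list[str], window_tokens: int) -> list[str]:
    """Drop the OLDEST messages until the rest fits in the window."""
    def tokens(msg: str) -> int:
        return round(len(msg.split()) * 4 / 3)

    kept = list(messages)
    while kept and sum(tokens(m) for m in kept) > window_tokens:
        kept.pop(0)   # the earliest message "falls off the desk"
    return kept

history = ["use blue instead of red", "long report " * 50, "what colour?"]
print(trim_to_window(history, 100))   # ['what colour?'] — the early
                                      # instruction was silently dropped
```

Notice that the important instruction from the start of the conversation is exactly what gets lost, which is why permanent rules belong in a configuration file.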
Layer 2: Conversation History (Short-Term Memory)
Conversation history is the record of messages exchanged between you and the AI during a single session. It includes everything you have said and everything the AI has responded with, in order.
This is like a notebook that you and the AI are writing in together during your conversation. As long as the conversation continues, the AI can look back at earlier messages to remember what you discussed, what decisions were made, and what tasks were completed.
Key characteristics:
- Exists for the duration of one conversation session.
- Grows with every message you send and every response the AI gives.
- Stored inside the context window (so it counts toward the size limit).
- On most platforms, you can scroll up to re-read it, but the AI may have “forgotten” very early messages if the context window overflowed.
- When you start a new conversation, the history from the previous one is gone from the AI’s perspective (though the platform may save it in your account for you to review).
Layer 3: Configuration Files (Long-Term Memory)
Configuration files are the AI’s long-term memory. These are the .md files (like CLAUDE.md, AGENTS.md) that persist on your computer or in your platform settings between sessions. The AI reads them at the start of every new conversation.
This is like a reference binder that the AI consults before starting work each day. The binder contains the agent’s role, rules, preferences, and lessons learned from past sessions. Unlike the context window and conversation history, configuration files survive after the session ends. They are permanent until you change them.
Key characteristics:
- Persistent: they exist across sessions, for as long as you keep the file.
- Read at the start of every conversation, so the AI always has this information.
- Can be updated by the AI (in the self-modifying prompts pattern) or by you manually.
- Stored as files on your computer or in platform settings, not inside the AI model itself.
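Because a configuration file is just text on disk, “updated by the AI or by you” simply means appending or editing lines. A minimal sketch of recording a lesson so it persists into future sessions (the file name CLAUDE.md follows the examples above; the helper names are made up for illustration):

```python
from pathlib import Path

CONFIG = Path("CLAUDE.md")

def record_lesson(lesson: str) -> None:
    """Append a new rule so it survives into every future session."""
    with CONFIG.open("a", encoding="utf-8") as f:
        f.write(f"- {lesson}\n")

def load_config() -> str:
    """Read the whole config at session start (empty if missing)."""
    return CONFIG.read_text(encoding="utf-8") if CONFIG.exists() else ""

record_lesson("Always use metric units in reports.")
print(load_config())
```

Appending a rule like this is the core of the self-modifying prompts pattern: the lesson is learned once, then read back at the start of every future session.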
The Desk Analogy
Here is a simple analogy that ties all three layers together:
| Memory Layer | Analogy | Lasts For | What It Holds |
|---|---|---|---|
| Context Window | Your desk | Current moment | Everything the AI is thinking about right now |
| Conversation History | Your notebook | One session | The full record of the current conversation |
| Configuration Files | Your reference binder | Permanently | Role, rules, preferences, and lessons learned |
When you start a new session:
- The AI opens its reference binder (reads the configuration file).
- It places the binder contents on its desk (loads them into the context window).
- As you start chatting, it writes in its notebook (builds the conversation history).
- The notebook pages pile up on the desk (conversation history uses context window space).
- If the desk gets too full, older notebook pages slide off (early conversation messages are dropped).
- When you end the session, the desk is cleared and the notebook is closed. But the reference binder stays on the shelf for next time.
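The steps above can be sketched as a tiny session lifecycle: load the binder, chat into the notebook, clear the desk at the end. The file name CLAUDE.md follows the earlier examples; the functions are illustrative, not a real agent framework:

```python
from pathlib import Path

def start_session(config_path: str = "CLAUDE.md") -> dict:
    """Steps 1-2: read the reference binder, place it on the desk."""
    p = Path(config_path)
    config = p.read_text(encoding="utf-8") if p.exists() else ""
    return {"context": [config], "history": []}   # desk + notebook

def chat(session: dict, user_msg: str, ai_reply: str) -> None:
    """Steps 3-5: notebook pages pile up on the desk."""
    session["history"].append((user_msg, ai_reply))
    session["context"].extend([user_msg, ai_reply])

def end_session(session: dict) -> None:
    """Step 6: desk cleared, notebook closed; only the file survives."""
    session["context"].clear()
    session["history"].clear()

s = start_session()
chat(s, "use blue instead of red", "Understood.")
end_session(s)
print(s)   # {'context': [], 'history': []} — only CLAUDE.md persists
```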
Why Each Layer Matters for Your Agents
Context Window Matters Because…
If your configuration file is too large, it takes up too much desk space, leaving less room for the actual conversation. This is why you should keep your configuration files focused and concise. A 10,000-word config file might push out parts of your conversation before you are done working.
Conversation History Matters Because…
The AI uses conversation history to stay coherent within a session. If you tell the AI “use blue instead of red” on message 3, it should remember that on message 20. But if the conversation is very long and the context window overflows, it might forget that instruction. For important instructions, put them in the configuration file rather than relying on conversation history.
Configuration Files Matter Because…
Configuration files are the only layer that truly persists. If you want your agent to remember something tomorrow, next week, or next month, it needs to be in the configuration file. Everything else vanishes when the session ends.
Memory Across Platforms
Different platforms handle memory slightly differently, but the three-layer model applies to all of them.
Claude
- Context window: Up to 200K tokens depending on the model.
- Conversation history: Saved in your Claude account; you can revisit past chats, but the AI treats each new session as a fresh start.
- Long-term memory: CLAUDE.md files (manual) and Claude’s built-in memory feature (automatic). Projects also provide shared context.
ChatGPT
- Context window: Up to 128K tokens.
- Conversation history: Saved in your account. You can continue previous conversations, which loads past messages back into the context window.
- Long-term memory: Custom Instructions (free), Custom GPTs (requires paid plan), and ChatGPT’s automatic Memory feature that remembers facts across conversations.
Gemini
- Context window: Up to 1 million tokens with Gemini Pro, the largest of any major platform.
- Conversation history: Saved in your Google account.
- Long-term memory: Gems (custom system prompts, requires Gemini Advanced) and GEMINI.md for developer tools.
Ollama
- Context window: Varies by model, typically 8K to 128K tokens.
- Conversation history: Not saved between sessions by default (runs locally in the terminal).
- Long-term memory: Modelfile with SYSTEM instructions. You must manually update the Modelfile to add new rules.
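For instance, a minimal Modelfile might bake persistent rules into a local model via a SYSTEM instruction (the base model name and the rules themselves are illustrative):

```
FROM llama3
SYSTEM """
You are a careful research assistant.
Always use metric units in reports.
"""
```

Rebuilding the model with `ollama create my-agent -f Modelfile` makes the updated rules take effect in every future session.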
LM Studio
- Context window: Varies by model, configurable in settings.
- Conversation history: Saved locally on your computer.
- Long-term memory: System prompt presets. You can save and load different system prompts for different tasks.
Common Memory-Related Problems
Understanding memory helps you diagnose problems when your agent misbehaves.
Problem: The agent forgot an instruction from earlier in the conversation.
Likely cause: The context window overflowed and the instruction was dropped.
Fix: Add important instructions to the configuration file instead of relying on conversation memory.

Problem: The agent keeps making the same mistake in every new session.
Likely cause: There is no rule about it in the configuration file.
Fix: Use the self-modifying prompt pattern to add a rule that prevents the mistake.

Problem: The agent seems slow or starts ignoring some of its rules.
Likely cause: The configuration file may be too large, consuming too much of the context window.
Fix: Review and prune the configuration file, keeping only the most important rules.
Key Takeaways
- AI agent memory has three layers: context window (working memory), conversation history (session memory), and configuration files (persistent memory).
- The context window has a fixed size. When it fills up, the AI forgets older messages.
- Conversation history only lasts for one session. When you start a new chat, it is gone.
- Configuration files are the only layer that persists across sessions. This is where all permanent knowledge belongs.
- Understanding memory helps you build better agents and diagnose problems when they occur.