How Agent Memory Works
Why Memory Matters
Imagine hiring an assistant who forgets everything you tell them every time they leave the room. Every morning, you would have to re-explain your name, your job, your preferences, and every task you need done. That would be incredibly frustrating and unproductive.
Without memory, AI agents work exactly like that. Each new conversation starts from zero. The AI does not know who you are, what you like, what mistakes it made yesterday, or what project you are working on. Every session is a fresh start, which means you waste time repeating yourself and the agent keeps making the same errors.
Memory is what transforms a forgetful chatbot into a reliable agent. When an AI agent can remember context, it delivers consistent results, learns from past mistakes, and adapts to your preferences over time. Understanding how memory works is essential for building agents that are truly useful.
The Three Layers of Agent Memory
AI agent memory works in three layers. Each layer serves a different purpose, lasts for a different amount of time, and stores a different kind of information.
Layer 1: The Context Window (Working Memory)
The context window is the AI’s working memory. It holds everything the AI is actively thinking about right now: your current message, the system prompt, recent conversation history, and any files or data you have shared in this session.
Every AI model has a context window with a fixed size, measured in tokens (roughly, a token is about three-quarters of a word). Here are some approximate sizes:
| Model | Context Window Size | Roughly Equivalent To |
|---|---|---|
| ChatGPT | 128,000 tokens | ~200 pages of text |
| Claude Sonnet | 200,000 tokens | ~300 pages of text |
| Gemini Pro | 1,000,000 tokens | ~1,500 pages of text |
| Llama 3 (Ollama) | 8,000 - 128,000 tokens | ~12 - 200 pages |
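Using the rough “one token is about three-quarters of a word” rule above, you can estimate whether a piece of text will fit in a given window. A minimal sketch (the 4/3 ratio is only an approximation; real tokenizers count differently):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 1 token per 0.75 words."""
    return round(len(text.split()) * 4 / 3)

def fits_in_window(text: str, window_tokens: int) -> bool:
    """True if the text should fit in a window of the given size."""
    return estimate_tokens(text) <= window_tokens

report = "word " * 90_000                 # ~90,000 words ≈ 120,000 tokens
print(fits_in_window(report, 128_000))    # True: fits a 128K window
print(fits_in_window(report, 8_000))      # False: far too big for 8K
```

This is only a planning heuristic; when the exact count matters, use the platform’s own tokenizer.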
The context window is like a desk. You can spread out papers, notes, and documents on it, but the desk has a fixed size. If you pile too much on it, older items fall off the edge and the AI forgets them. When the context window fills up, the AI starts dropping the earliest messages in the conversation to make room for new ones.
Key characteristics:
- Temporary: it only exists during the current session.
- Everything in the conversation is stored here, including the system prompt.
- When it fills up, older content is quietly removed.
- Once the session ends, the context window is cleared completely.
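The “older items fall off the edge” behavior can be sketched as a sliding window over the message list. This is illustrative only, not any vendor’s actual eviction policy, and it reuses the rough word-based token estimate:

```python
def trim_to_window(messages: list[str], window_tokens: int) -> list[str]:
    """Drop the OLDEST messages until the rest fits in the window."""
    def tokens(msg: str) -> int:
        return round(len(msg.split()) * 4 / 3)

    kept = list(messages)
    while kept and sum(tokens(m) for m in kept) > window_tokens:
        kept.pop(0)   # the earliest message "falls off the desk"
    return kept

history = ["use blue instead of red", "long report " * 50, "what colour?"]
print(trim_to_window(history, 100))   # ['what colour?'] — the early
                                      # instruction was silently dropped
```

Notice that the important instruction from the start of the conversation is exactly what gets lost, which is why permanent rules belong in a configuration file.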
Layer 2: Conversation History (Short-Term Memory)
Conversation history is the record of messages exchanged between you and the AI during a single session. It includes everything you have said and everything the AI has responded with, in order.
This is like a notebook that you and the AI are writing in together during your conversation. As long as the conversation continues, the AI can look back at earlier messages to remember what you discussed, what decisions were made, and what tasks were completed.
Key characteristics:
- Exists for the duration of one conversation session.
- Grows with every message you send and every response the AI gives.
- Stored inside the context window (so it counts toward the size limit).
- On most platforms, you can scroll up to re-read it, but the AI may have “forgotten” very early messages if the context window overflowed.
- When you start a new conversation, the history from the previous one is gone from the AI’s perspective (though the platform may save it in your account for you to review).
Layer 3: Configuration Files (Long-Term Memory)
Configuration files are the AI’s long-term memory. These are the .md files (like CLAUDE.md, AGENTS.md) that persist on your computer or in your platform settings between sessions. The AI reads them at the start of every new conversation.
This is like a reference binder that the AI consults before starting work each day. The binder contains the agent’s role, rules, preferences, and lessons learned from past sessions. Unlike the context window and conversation history, configuration files survive after the session ends. They are permanent until you change them.
Key characteristics:
- Persistent: they exist across sessions, for as long as you keep the file.
- Read at the start of every conversation, so the AI always has this information.
- Can be updated by the AI (in the self-modifying prompts pattern) or by you manually.
- Stored as files on your computer or in platform settings, not inside the AI model itself.
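Because a configuration file is just text on disk, “updated by the AI or by you” simply means appending or editing lines. A minimal sketch of recording a lesson so it persists into future sessions (the file name CLAUDE.md follows the examples above; the helper names are made up for illustration):

```python
from pathlib import Path

CONFIG = Path("CLAUDE.md")

def record_lesson(lesson: str) -> None:
    """Append a new rule so it survives into every future session."""
    with CONFIG.open("a", encoding="utf-8") as f:
        f.write(f"- {lesson}\n")

def load_config() -> str:
    """Read the whole config at session start (empty if missing)."""
    return CONFIG.read_text(encoding="utf-8") if CONFIG.exists() else ""

record_lesson("Always use metric units in reports.")
print(load_config())
```

Appending a rule like this is the core of the self-modifying prompts pattern: the lesson is learned once, then read back at the start of every future session.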
The Desk Analogy
Here is a simple analogy that ties all three layers together:
| Memory Layer | Analogy | Lasts For | What It Holds |
|---|---|---|---|
| Context Window | Your desk | Current moment | Everything the AI is thinking about right now |
| Conversation History | Your notebook | One session | The full record of the current conversation |
| Configuration Files | Your reference binder | Permanently | Role, rules, preferences, and lessons learned |
When you start a new session:
- The AI opens its reference binder (reads the configuration file).
- It places the binder contents on its desk (loads them into the context window).
- As you start chatting, it writes in its notebook (builds the conversation history).
- The notebook pages pile up on the desk (conversation history uses context window space).
- If the desk gets too full, older notebook pages slide off (early conversation messages are dropped).
- When you end the session, the desk is cleared and the notebook is closed. But the reference binder stays on the shelf for next time.
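The steps above can be sketched as a tiny session lifecycle: load the binder, chat into the notebook, clear the desk at the end. The file name CLAUDE.md follows the earlier examples; the functions are illustrative, not a real agent framework:

```python
from pathlib import Path

def start_session(config_path: str = "CLAUDE.md") -> dict:
    """Steps 1-2: read the reference binder, place it on the desk."""
    p = Path(config_path)
    config = p.read_text(encoding="utf-8") if p.exists() else ""
    return {"context": [config], "history": []}   # desk + notebook

def chat(session: dict, user_msg: str, ai_reply: str) -> None:
    """Steps 3-5: notebook pages pile up on the desk."""
    session["history"].append((user_msg, ai_reply))
    session["context"].extend([user_msg, ai_reply])

def end_session(session: dict) -> None:
    """Step 6: desk cleared, notebook closed; only the file survives."""
    session["context"].clear()
    session["history"].clear()

s = start_session()
chat(s, "use blue instead of red", "Understood.")
end_session(s)
print(s)   # {'context': [], 'history': []} — only CLAUDE.md persists
```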
Why Each Layer Matters for Your Agents
Context Window Matters Because…
If your configuration file is too large, it takes up too much desk space, leaving less room for the actual conversation. This is why you should keep your configuration files focused and concise. A 10,000-word config file might push out parts of your conversation before you are done working.
Conversation History Matters Because…
The AI uses conversation history to stay coherent within a session. If you tell the AI “use blue instead of red” on message 3, it should remember that on message 20. But if the conversation is very long and the context window overflows, it might forget that instruction. For important instructions, put them in the configuration file rather than relying on conversation history.
Configuration Files Matter Because…
Configuration files are the only layer that truly persists. If you want your agent to remember something tomorrow, next week, or next month, it needs to be in the configuration file. Everything else vanishes when the session ends.
Memory Across Platforms
Different platforms handle memory slightly differently, but the three-layer model applies to all of them.
Claude
- Context window: Up to 200K tokens depending on the model.
- Conversation history: Saved in your Claude account; you can revisit past chats, but the AI treats each new session as a fresh start.
- Long-term memory: CLAUDE.md files (manual) and Claude’s built-in memory feature (automatic). Projects also provide shared context.
ChatGPT
- Context window: Up to 128K tokens.
- Conversation history: Saved in your account. You can continue previous conversations, which loads past messages back into the context window.
- Long-term memory: Custom Instructions (free), Custom GPTs (requires paid plan), and ChatGPT’s automatic Memory feature that remembers facts across conversations.
Gemini
- Context window: Up to 1 million tokens with Gemini Pro, the largest of any major platform.
- Conversation history: Saved in your Google account.
- Long-term memory: Gems (custom system prompts, requires Gemini Advanced) and GEMINI.md for developer tools.
Ollama
- Context window: Varies by model, typically 8K to 128K tokens.
- Conversation history: Not saved between sessions by default (runs locally in the terminal).
- Long-term memory: Modelfile with SYSTEM instructions. You must manually update the Modelfile to add new rules.
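For instance, a minimal Modelfile might bake persistent rules into a local model via a SYSTEM instruction (the base model name and the rules themselves are illustrative):

```
FROM llama3
SYSTEM """
You are a careful research assistant.
Always use metric units in reports.
"""
```

Rebuilding the model with `ollama create my-agent -f Modelfile` makes the updated rules take effect in every future session.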
LM Studio
- Context window: Varies by model, configurable in settings.
- Conversation history: Saved locally on your computer.
- Long-term memory: System prompt presets. You can save and load different system prompts for different tasks.
Common Memory-Related Problems
Understanding memory helps you diagnose problems when your agent misbehaves.
Problem: The agent forgot an instruction from earlier in the conversation.
Likely cause: The context window overflowed and the instruction was dropped.
Fix: Add important instructions to the configuration file instead of relying on conversation memory.

Problem: The agent keeps making the same mistake in every new session.
Likely cause: There is no rule about it in the configuration file.
Fix: Use the self-modifying prompt pattern to add a rule that prevents the mistake.

Problem: The agent seems slow or starts ignoring some of its rules.
Likely cause: The configuration file may be too large, consuming too much of the context window.
Fix: Review and prune the configuration file, keeping only the most important rules.
Key Takeaways
- AI agent memory has three layers: context window (working memory), conversation history (session memory), and configuration files (persistent memory).
- The context window has a fixed size. When it fills up, the AI forgets older messages.
- Conversation history only lasts for one session. When you start a new chat, it is gone.
- Configuration files are the only layer that persists across sessions. This is where all permanent knowledge belongs.
- Understanding memory helps you build better agents and diagnose problems when they occur.