What is AI Agent Memory?
Why your AI CEO remembers last quarter and ChatGPT forgets last Tuesday.
AI agent memory is the persistent state an agent carries across sessions — facts about the user, past decisions, preferences, prior outputs — beyond what fits in a single LLM context window. It typically combines a short-term working buffer, a long-term store in a vector database or structured DB, and a retrieval policy that decides what to surface on each turn. Memory is what separates an AI employee from a chatbot.
In depth
Examples
- An AI CMO remembering your brand voice rules after being corrected once, not 50 times
- An AI support agent recalling that Customer X had a billing issue last month and referring to the resolution without asking
- Tycoon's AI CEO Astra remembering which investors are in your round and what they asked for last pitch
- Letta (MemGPT) using a two-tier memory architecture — main context + archival — to handle unbounded conversation length
- Mem0 storing user preferences as structured key-value pairs plus embedding-based fuzzy retrieval
- ChatGPT's memory feature storing persistent notes like 'user is vegetarian' and retrieving them on relevant queries
- An AI developer remembering a codebase's naming convention after one PR correction and applying it to all future code
Frequently asked questions
How is agent memory different from a long context window?
A long context window (say Gemini 1.5 Pro's 2M tokens) lets you stuff a lot of information into a single prompt, but that information is reloaded on every call, billed per token, and subject to the 'lost in the middle' problem, where models tend to ignore content buried in the middle of long contexts. Memory is persistent storage outside the prompt, retrieved selectively on each turn. Long context is cheap for one-off analysis of a big document; memory is cheap for ongoing work where you need continuity across thousands of interactions. Production agents use both: long context for the current turn, memory for continuity across turns.
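The cost side of this trade-off can be made concrete with back-of-envelope arithmetic. All prices and sizes below are hypothetical, chosen only to make the shape of the comparison visible:

```python
# Rough shape of the cost trade-off between resending a huge prompt
# every turn vs. retrieving a few memories into a short prompt.
PRICE_PER_M_TOKENS = 3.00  # assumed input price, USD per 1M tokens
turns = 1_000

# Long context: a 200k-token prompt is resent and re-billed every turn.
long_context_tokens = 200_000 * turns
# Memory: a short prompt plus ~500 tokens of selectively retrieved facts.
memory_tokens = (2_000 + 500) * turns

cost_long = long_context_tokens / 1_000_000 * PRICE_PER_M_TOKENS
cost_mem = memory_tokens / 1_000_000 * PRICE_PER_M_TOKENS
print(f"long context: ${cost_long:,.0f}  vs  memory: ${cost_mem:,.2f}")
# → long context: $600  vs  memory: $7.50
```

The two-orders-of-magnitude gap is why continuity-heavy agents retrieve rather than reload.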
What's the difference between short-term and long-term memory?
Short-term (or working) memory is the current session — typically the recent conversation buffer, currently retrieved facts, and scratchpad state the agent needs to complete the current task. It's often wiped at session end or summarized into long-term memory. Long-term memory is persistent across sessions — facts about the user, completed tasks, learned preferences, archived conversations — stored in a vector DB or structured DB. The boundary is fuzzy; most systems have a graduation policy where important short-term content is promoted to long-term storage based on recency, repetition, or explicit signals like 'remember this'.
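A graduation policy like the one described can be sketched in a few lines. The `MemoryItem` shape and the mention threshold are illustrative assumptions, not any particular framework's API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    text: str
    created: float = field(default_factory=time.time)
    mentions: int = 1
    explicit: bool = False  # user said "remember this"

def graduate(short_term: list[MemoryItem],
             long_term: list[MemoryItem],
             min_mentions: int = 3) -> None:
    """Promote items marked explicitly or repeated often;
    everything else is dropped when the session ends."""
    for item in short_term:
        if item.explicit or item.mentions >= min_mentions:
            long_term.append(item)
    short_term.clear()

# Session end: two items survive, the one-off mention does not.
st = [MemoryItem("prefers weekly summaries", mentions=4),
      MemoryItem("asked about Q3 once"),
      MemoryItem("targeting enterprise", explicit=True)]
lt: list[MemoryItem] = []
graduate(st, lt)
```

Real systems would usually summarize the buffer before promotion rather than copy items verbatim, but the promote-or-drop decision has this structure.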
Do I need a vector database for agent memory?
Not always. For small agents with a few hundred memories per user, a SQLite table with keyword search works fine and is dramatically simpler. You need a vector DB when (a) memory grows beyond a few thousand entries per user, (b) retrieval by semantic similarity matters (natural-language queries that don't keyword-match), or (c) you want hybrid search combining keywords and embeddings. pgvector, Pinecone, Weaviate, and Qdrant are common production choices. Tycoon uses pgvector because the rest of the platform runs on PostgreSQL and a separate vector service would add operational overhead for modest benefit.
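A minimal version of the SQLite-table approach, using plain `LIKE` keyword matching. The schema and function names are made up for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real agent would use a file path
conn.execute("CREATE TABLE memories (user_id TEXT, text TEXT)")

def remember(user_id: str, text: str) -> None:
    conn.execute("INSERT INTO memories VALUES (?, ?)", (user_id, text))

def recall(user_id: str, keyword: str, k: int = 3) -> list[str]:
    # Substring keyword match — no embeddings, no extra service.
    rows = conn.execute(
        "SELECT text FROM memories WHERE user_id = ? AND text LIKE ? LIMIT ?",
        (user_id, f"%{keyword}%", k))
    return [r[0] for r in rows]

remember("cust_x", "billing issue resolved with a credit on 2024-05-02")
remember("cust_x", "prefers email over phone")
recall("cust_x", "billing")
# → ["billing issue resolved with a credit on 2024-05-02"]
```

The failure mode that eventually forces a move to embeddings is exactly the one named above: a query like "payment problem" won't keyword-match "billing issue".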
How do AI employees decide what to remember?
Three common policies: (1) Explicit: the user says 'remember X' and the agent writes X to long-term memory. (2) Implicit, via classification: after each turn, a secondary LLM call classifies what was discussed and extracts any durable facts, preferences, or decisions worth keeping. (3) Frequency-based: repeated mentions of the same fact across conversations promote it to memory. Production systems usually combine all three. The secondary-LLM extraction approach is popular because it catches things the user wouldn't think to mark explicitly, but it adds inference cost. Tycoon uses a hybrid: explicit 'remember' commands for high-value facts, implicit extraction for decisions and preferences.
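Policy (2), implicit extraction via a secondary LLM call, might look like the sketch below. `call_llm` stands in for whatever model client you use, and the prompt wording is illustrative:

```python
import json

EXTRACT_PROMPT = """Review this conversation turn and list any durable
facts, preferences, or decisions worth remembering, as a JSON array of
strings. Return [] if nothing qualifies.

Turn: {turn}"""

def extract_memories(turn: str, call_llm) -> list[str]:
    """Secondary-LLM extraction: distill a turn into durable facts.
    `call_llm` is a placeholder for your model client."""
    raw = call_llm(EXTRACT_PROMPT.format(turn=turn))
    try:
        facts = json.loads(raw)
        return [f for f in facts if isinstance(f, str)]
    except json.JSONDecodeError:
        return []  # model returned malformed JSON; skip this turn

# Stubbed model for illustration:
fake_llm = lambda prompt: '["user prefers async standups"]'
extract_memories("Let's drop the daily call, async works better.", fake_llm)
# → ["user prefers async standups"]
```

The try/except matters in practice: an extraction pipeline that crashes on one malformed model response loses the whole turn.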
What happens when memory gets contradictory?
This is the hardest problem in production agent memory. A user says 'we're targeting enterprise' in March, 'we're pivoting to SMB' in April — naive retrieval surfaces both and confuses the agent. Good systems handle it with (a) timestamps on every memory so the retrieval layer can prefer recent, (b) entity-level reconciliation where facts about the same entity get merged and old versions archived, and (c) an occasional 'memory consolidation' pass (often overnight) where a small LLM reviews conflicts and resolves them. Letta's pattern of an explicit reflection step is a good reference design. Tycoon runs nightly consolidation per project so Astra always works from a clean, current picture.
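A toy version of (a) and (b): timestamped facts keyed by entity and attribute, where reconciliation keeps only the newest version of each fact. The record shape is an assumption for the sketch, and a production system would archive old versions rather than drop them:

```python
from datetime import datetime

def reconcile(memories: list[dict]) -> list[dict]:
    """Keep the most recent fact per (entity, attribute) pair."""
    latest: dict[tuple, dict] = {}
    for m in sorted(memories, key=lambda m: m["ts"]):
        latest[(m["entity"], m["attr"])] = m  # later timestamps overwrite
    return list(latest.values())

# The enterprise-to-SMB pivot from the example above:
mems = [
    {"entity": "company", "attr": "target_market", "value": "enterprise",
     "ts": datetime(2025, 3, 1)},
    {"entity": "company", "attr": "target_market", "value": "SMB",
     "ts": datetime(2025, 4, 12)},
]
reconcile(mems)  # only the April "SMB" fact survives
```

This is the kind of pass a nightly consolidation job runs, with an LLM handling the conflicts that don't reduce to a simple newest-wins rule.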
Run your one-person company.
Hire your AI team in 30 seconds. Start for free.
Free to start · No credit card required · Set up in 30 seconds