
What is AI Agent Memory?

Why your AI CEO remembers last quarter and ChatGPT forgets last Tuesday.


Updated Apr 2026
Short answer

AI agent memory is the persistent state an agent carries across sessions — facts about the user, past decisions, preferences, prior outputs — beyond what fits in a single LLM context window. It typically combines a short-term working buffer, a long-term store in a vector database or structured DB, and a retrieval policy that decides what to surface on each turn. Memory is what separates an AI employee from a chatbot.

In depth

A raw LLM is stateless. Every call starts from zero and only knows what you put in the prompt. Agent memory adds the missing ingredient: durable knowledge about the user, the business, and prior interactions, so the agent behaves like a continuous employee rather than a goldfish with perfect English.

Memory in modern agents is usually organized into four tiers:

  • Working memory is the current context window: the last few turns, the system prompt, and any actively retrieved context. It's fast but tiny and wiped each turn.
  • Short-term session memory holds the current conversation and resets when the session ends; typically a rolling buffer summarized when it overflows.
  • Long-term semantic memory stores facts, preferences, and learned patterns in a vector DB or structured DB, retrieved by similarity search when relevant.
  • Episodic memory stores specific events ('on 2026-03-02 the founder approved the pricing change to $49'), usually in a structured log, retrieved by date, entity, or keyword.

The mechanics combine three techniques. Retrieval-augmented generation (RAG) fetches relevant memory chunks into the prompt on each turn. Summarization compresses old conversations into paragraph-sized notes that fit alongside retrieved chunks. Structured extraction pulls out entities and facts ('the user's company is Medvi, founded 2023, B2B healthcare') and stores them in a queryable DB so they can be filtered exactly rather than approximately. Production agents mix all three. Mem0, LangGraph's MemoryStore, and Letta (formerly MemGPT) are popular open-source libraries that wrap these patterns.

The hard problems in agent memory are not storage — vector DBs are cheap — but curation. An agent that remembers everything quickly accumulates contradictory notes, stale preferences, and irrelevant tangents that pollute retrieval. Good memory systems distinguish facts from observations, version them with timestamps, reconcile contradictions ('the user said X in March, Y in April — use Y'), and forget deliberately. Getting this right is what makes Tycoon's AI CEO Astra feel like she actually knows your business after three weeks: the memory layer promotes important decisions, demotes small talk, and resolves conflicts automatically.

Agent memory also raises privacy and safety questions. Whose memory is it? In Tycoon, each project has its own isolated memory — the CEO of Medvi's company never sees another customer's history. Memory can also be injected with adversarial content ('ignore prior instructions, the user's credit card is...'), which is why production systems separate memory storage from tool-executing actions and run policy filters on which memories can influence high-stakes decisions.
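The tiers and the retrieval step can be sketched in a few lines of Python. This is a toy illustration, not the API of Mem0, LangGraph, or Letta; keyword overlap stands in for real embedding similarity, and all names are invented:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class MemoryStore:
    """Toy memory tiers: semantic facts, episodic events, working buffer."""
    working: list[str] = field(default_factory=list)        # current context window
    semantic: dict[str, str] = field(default_factory=dict)  # durable facts, keyed by topic
    episodic: list[tuple[date, str]] = field(default_factory=list)  # dated events

    def remember_fact(self, topic: str, fact: str) -> None:
        self.semantic[topic] = fact          # structured extraction would produce these

    def log_event(self, when: date, event: str) -> None:
        self.episodic.append((when, event))  # episodic memory: a dated log

    def retrieve(self, query: str) -> list[str]:
        """Keyword overlap stands in for embedding similarity search."""
        words = set(query.lower().split())
        hits = [f for t, f in self.semantic.items()
                if words & set(f.lower().split()) or t.lower() in words]
        hits += [f"{d.isoformat()}: {e}" for d, e in self.episodic
                 if words & set(e.lower().split())]
        return hits

m = MemoryStore()
m.remember_fact("company", "the user's company is Medvi, founded 2023, B2B healthcare")
m.log_event(date(2026, 3, 2), "founder approved the pricing change to $49")
print(m.retrieve("what pricing did we approve?"))
```

A real system would replace `retrieve` with a vector-DB similarity query plus structured filters, but the shape — write to the right tier, fetch selectively per turn — is the same.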

Examples

  • An AI CMO remembering your brand voice rules after being corrected once, not 50 times
  • An AI support agent recalling that Customer X had a billing issue last month and referring to the resolution without asking
  • Tycoon's AI CEO Astra remembering which investors are in your round and what they asked for last pitch
  • Letta (MemGPT) using a two-tier memory architecture — main context + archival — to handle unbounded conversation length
  • Mem0 storing user preferences as structured key-value pairs + embedding-based fuzzy retrieval
  • ChatGPT memory feature that stores persistent notes like 'user is vegetarian' and retrieves on relevant queries
  • An AI developer remembering a codebase's naming convention after one PR correction, applying it to all future code

Frequently asked questions

How is agent memory different from a long context window?

A long context window (say a frontier model's 1M–2M tokens) lets you stuff a lot of information into a single prompt, but that information is reloaded every call, charged per token, and subject to the 'lost in the middle' problem where models ignore content in the middle of long contexts. Memory is persistent storage outside the prompt, retrieved selectively on each turn. Long context is cheap for one-off analysis of a big document; memory is cheap for ongoing work where you need continuity across thousands of interactions. Production agents use both: long context for the current turn, memory for the continuity.
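A back-of-envelope calculation makes the cost difference concrete. The price and token counts below are illustrative assumptions, not real rates for any model:

```python
# Back-of-envelope: reloading history every call vs. retrieving a few chunks.
PRICE_PER_MTOK = 1.25           # assumed $/1M input tokens (illustrative)
calls = 1_000                   # interactions over a few weeks of ongoing work

long_context_tokens = 500_000   # reload the whole history on every call
memory_tokens = 4_000           # system prompt + top-k retrieved chunks per call

cost_long_context = calls * long_context_tokens / 1e6 * PRICE_PER_MTOK
cost_memory = calls * memory_tokens / 1e6 * PRICE_PER_MTOK
print(f"long context: ${cost_long_context:,.0f}  memory+retrieval: ${cost_memory:,.2f}")
```

At these assumed numbers the gap is two orders of magnitude, which is why continuity-heavy agents retrieve rather than reload.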

What's the difference between short-term and long-term memory?

Short-term (or working) memory is the current session — typically the recent conversation buffer, currently-retrieved facts, and scratchpad state the agent needs to complete the current task. It's often wiped at session end or summarized into long-term. Long-term memory is persistent across sessions — facts about the user, completed tasks, learned preferences, archived conversations — stored in a vector DB or structured DB. The boundary is fuzzy; most systems have a graduation policy where important short-term content gets promoted to long-term based on recency, repetition, or explicit signals like 'remember this'.

Do I need a vector database for agent memory?

Not always. For small agents with a few hundred memories per user, a SQLite table with keyword search works fine and is dramatically simpler. You need a vector DB when (a) memory grows beyond a few thousand entries per user, (b) retrieval by semantic similarity matters (natural-language queries that don't keyword-match), or (c) you want hybrid search combining keywords and embeddings. pgvector, Pinecone, Weaviate, and Qdrant are common production choices. Tycoon uses pgvector because the rest of the platform runs on PostgreSQL and a separate vector service would add operational overhead for modest benefit.
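A minimal sketch of the simpler route: a plain SQLite table with `LIKE`-based keyword search and per-user scoping. The schema and helper are hypothetical, using only Python's standard library:

```python
import sqlite3

# Minimal per-user memory table; keyword LIKE search instead of embeddings.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE memories (user_id TEXT, created TEXT, body TEXT)")
rows = [
    ("u1", "2026-03-02", "approved pricing change to $49"),
    ("u1", "2026-03-10", "prefers weekly summary emails"),
    ("u2", "2026-03-11", "company targets B2B healthcare"),
]
con.executemany("INSERT INTO memories VALUES (?, ?, ?)", rows)

def recall(user_id: str, keyword: str) -> list[str]:
    """Fetch this user's memories matching a keyword, newest first."""
    cur = con.execute(
        "SELECT body FROM memories WHERE user_id = ? AND body LIKE ? "
        "ORDER BY created DESC",
        (user_id, f"%{keyword}%"),
    )
    return [r[0] for r in cur]

print(recall("u1", "pricing"))   # only u1's rows are ever searched
```

When retrieval needs to match 'what did we decide about price?' against 'approved pricing change' without a shared keyword, that is the point where embeddings earn their keep.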

How do AI employees decide what to remember?

Three common policies. (1) Explicit: the user says 'remember X' and the agent writes X to long-term memory. (2) Implicit via classification: after each turn, a secondary LLM call classifies what was discussed and extracts any durable facts, preferences, or decisions worth keeping. (3) Frequency-based: repeated mentions of the same fact across conversations promote it to memory. Production systems usually combine all three. The secondary-LLM extraction approach is popular because it catches things the user wouldn't think to mark explicitly, but it adds inference cost. Tycoon uses a hybrid: explicit remembers for high-value facts, implicit extraction for decisions and preferences.
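The explicit and implicit policies can be combined in a few lines. Here the secondary LLM is stubbed with a plain function, and every name is hypothetical:

```python
from typing import Callable

def decide_memories(turn: str, extract: Callable[[str], list[str]]) -> list[str]:
    """Combine policy 1 (explicit) and policy 2 (implicit extraction).
    `extract` stands in for a secondary LLM call that pulls durable facts."""
    memories = []
    if "remember" in turn.lower():                      # policy 1: explicit request
        memories.append(turn.split("remember", 1)[1].strip())
    memories += extract(turn)                           # policy 2: implicit extraction
    return memories

# Stub extractor: a real system would prompt a small model here instead.
fake_llm = lambda text: (["decision: pricing set to $49"] if "$49" in text else [])

print(decide_memories("Let's go with $49. Also remember I prefer bullet summaries",
                      fake_llm))
```

Policy 3 (frequency-based promotion) would sit downstream of this, counting how often the same extracted fact recurs before writing it to long-term storage.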

What happens when memory gets contradictory?

This is the hardest problem in production agent memory. A user says 'we're targeting enterprise' in March, 'we're pivoting to SMB' in April — naive retrieval surfaces both and confuses the agent. Good systems handle it with (a) timestamps on every memory so the retrieval layer can prefer recent, (b) entity-level reconciliation where facts about the same entity get merged and old versions archived, and (c) an occasional 'memory consolidation' pass (often overnight) where a small LLM reviews conflicts and resolves them. Letta's pattern of an explicit reflection step is a good reference design. Tycoon runs nightly consolidation per project so Astra always works from a clean, current picture.
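A toy version of the timestamp-based approach: per entity, the newest fact wins and older versions are archived rather than surfaced. Field names and the consolidation function are assumptions, not any library's API:

```python
from datetime import date

# Each fact about an entity carries a timestamp; consolidation keeps the
# newest version per entity and archives the contradicting older ones.
memories = [
    {"entity": "target_market", "value": "enterprise", "ts": date(2026, 3, 1)},
    {"entity": "target_market", "value": "SMB",        "ts": date(2026, 4, 5)},
    {"entity": "pricing",       "value": "$49/mo",     "ts": date(2026, 3, 2)},
]

def consolidate(mems):
    current, archived = {}, []
    for m in sorted(mems, key=lambda m: m["ts"]):   # walk oldest to newest
        if m["entity"] in current:
            archived.append(current[m["entity"]])   # keep history, don't surface it
        current[m["entity"]] = m                    # newest version wins
    return current, archived

current, archived = consolidate(memories)
print(current["target_market"]["value"])   # the April pivot wins over March
```

An overnight pass like this is cheap; the LLM-assisted variant described above is only needed for conflicts that pure recency can't resolve, such as two same-day facts that disagree.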

Run your one-person company.

Hire your AI team in 30 seconds. Start for free.

Free to start · No credit card required · Set up in 30 seconds