Design a memory system for a long-running AI agent — covering in-context working memory, episodic recall, semantic knowledge, and retrieval strategies.

Learn how to implement memory for long-running AI agents. Covers working memory, episodic memory, semantic memory, retrieval strategies, and context management.

AI Agent Memory Systems - Interview Question

Why This Is Asked

LLMs have finite context windows. A long-running agent that can't remember what happened 10 messages ago is useless for real-world tasks. Interviewers ask this to see if you can design around one of the fundamental limitations of current LLMs.

Key Concepts to Cover

Working memory — what's in the context window right now
Episodic memory — recall of past conversations or task executions
Semantic memory — stored facts and knowledge about the world or user
Skill memory — reusable action patterns and tool-use sequences for recurring task types
Retrieval strategies — how to decide what to pull into context
Memory decay — when and how to forget or compress old memories
Privacy — handling sensitive information in persistent memory

How to Approach This

1. The Memory Taxonomy

Borrow from cognitive science — agents need different types of memory:

Working memory (in-context):

The current conversation + recent tool call results
Limited by context window size
Lost when the session ends

Episodic memory (external storage):

Summaries of past conversations and task executions
Stored in a database, retrieved when relevant
"Last time we discussed your project, you were working on the authentication flow"

Semantic memory (knowledge base):

Facts about the user, preferences, ongoing projects
Structured storage, updated as new information is learned
"User prefers Python. Current project: e-commerce platform."

Skill memory (reusable action templates):

Reusable action sequences or tool-use patterns that worked for certain task types
In cognitive science, "procedural memory" refers to implicit motor skills (how to ride a bike). In AI agents, the analogous concept is learned action policies or skill programs — for example, a stored sequence of tool calls for "look up a company's SEC filings" that can be retrieved and reused when a similar task arises

2. Handling Context Length

The working memory fits in the context window. As conversations grow:

Summarization: Periodically summarize older turns:

System: The following is a summary of the conversation so far:
[summary of first 20 turns]

Recent conversation:
[last 5 turns verbatim]

Sliding window: Keep only the last N messages verbatim, drop older ones.

Hierarchical compression: Detailed recent memory, summary of older memory, key facts only for oldest memory.

3. Retrieval-Based Episodic Memory

For multi-session agents:

After each session, generate a summary and store it with an embedding
At the start of a new session, retrieve relevant past sessions using the current context as the query
Inject retrieved memories into the context: "Based on your previous sessions: [retrieved summaries]"

This is essentially RAG applied to conversational history.

4. Semantic Memory (User Profile)

Maintain a structured user/world model that updates over time:

{
  "user_id": "user_123",
  "name": "Sarah",
  "preferences": {
    "language": "Python",
    "editor": "VS Code",
    "prefers_concise_explanations": true
  },
  "current_projects": [
    {
      "name": "E-commerce Platform",
      "tech_stack": ["Next.js", "Go", "PostgreSQL"],
      "current_focus": "checkout flow"
    }
  ],
  "last_updated": "2025-02-01"
}

The agent reads this at session start and updates it when new information is learned.

5. Privacy Considerations

Persistent memory raises privacy concerns:

Users must be aware that information is being stored
Provide a way to view and delete stored memories
Do not store sensitive information (passwords, payment details)
Apply data retention policies (auto-expire old memories)
Consider user consent before inferring and storing preferences

Common Follow-ups

"How do you prevent the agent from learning incorrect information?" Before updating semantic memory, validate new information against what's already stored. For important facts, require explicit confirmation ("I've learned that you prefer Python — is that correct?"). Use confidence scores and decay old facts that haven't been reinforced.
"How would you implement 'forgetting' — removing memories that are no longer relevant?" Time-based decay: reduce the retrieval probability of old memories based on age. Relevance-based pruning: if a memory hasn't been retrieved in N sessions, archive or delete it. Explicit user control: let users flag memories to forget.
"How does memory retrieval work in real time without adding latency?" Pre-fetch: start retrieving relevant memories as soon as the user begins typing. Cache frequently accessed memories in Redis. Run retrieval in parallel with the LLM's initial response (for streaming). Keep the semantic memory small enough to include in full at session start, avoiding runtime retrieval entirely.

How Would You Implement Memory for a Long-Running AI Agent?

Why This Is Asked

Key Concepts to Cover

How to Approach This

1. The Memory Taxonomy

2. Handling Context Length

3. Retrieval-Based Episodic Memory

4. Semantic Memory (User Profile)

5. Privacy Considerations

Common Follow-ups

Related Questions

Explain the ReAct Pattern and When You Would Use It

How Do You Decide What Tools to Give an AI Agent?

Design a RAG Pipeline from Scratch

Prep for the full interview loop