Intermediate · 4 min read

How Would You Implement Memory for a Long-Running AI Agent?

Design a memory system for a long-running AI agent — covering in-context working memory, episodic recall, semantic knowledge, and retrieval strategies.

Why This Is Asked

LLMs have finite context windows. A long-running agent that can't remember what happened 10 messages ago is useless for real-world tasks. Interviewers ask this to see if you can design around one of the fundamental limitations of current LLMs.

Key Concepts to Cover

  • Working memory — what's in the context window right now
  • Episodic memory — recall of past conversations or task executions
  • Semantic memory — stored facts and knowledge about the world or user
  • Skill memory — reusable action patterns and tool-use sequences for recurring task types
  • Retrieval strategies — how to decide what to pull into context
  • Memory decay — when and how to forget or compress old memories
  • Privacy — handling sensitive information in persistent memory

How to Approach This

1. The Memory Taxonomy

Borrow from cognitive science — agents need different types of memory:

Working memory (in-context):

  • The current conversation + recent tool call results
  • Limited by context window size
  • Lost when the session ends

Episodic memory (external storage):

  • Summaries of past conversations and task executions
  • Stored in a database, retrieved when relevant
  • "Last time we discussed your project, you were working on the authentication flow"

Semantic memory (knowledge base):

  • Facts about the user, preferences, ongoing projects
  • Structured storage, updated as new information is learned
  • "User prefers Python. Current project: e-commerce platform."

Skill memory (reusable action templates):

  • Reusable action sequences or tool-use patterns that worked for certain task types
  • In cognitive science, "procedural memory" refers to implicit motor skills (how to ride a bike). In AI agents, the analogous concept is learned action policies or skill programs — for example, a stored sequence of tool calls for "look up a company's SEC filings" that can be retrieved and reused when a similar task arises
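
As a rough illustration of skill memory, the sketch below stores a tool-call sequence and looks it up by keyword overlap. The `Skill` schema, the matching rule, and the tool names (`resolve_ticker`, `search_filings`, `summarize_document`) are all assumptions made for this example, not part of any real agent framework.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skill:
    """A reusable tool-call sequence for a recurring task type (hypothetical schema)."""
    name: str
    trigger_keywords: set
    tool_calls: list = field(default_factory=list)


class SkillMemory:
    def __init__(self) -> None:
        self._skills = []

    def add(self, skill: Skill) -> None:
        self._skills.append(skill)

    def match(self, task: str) -> Optional[Skill]:
        """Return the skill whose keywords overlap most with the task, or None."""
        words = set(task.lower().split())
        best, best_overlap = None, 0
        for skill in self._skills:
            overlap = len(words & skill.trigger_keywords)
            if overlap > best_overlap:
                best, best_overlap = skill, overlap
        return best


skills = SkillMemory()
skills.add(Skill(
    name="sec_filings_lookup",
    trigger_keywords={"sec", "filings", "10-k"},
    # Placeholder tool names; substitute whatever tools the agent actually exposes.
    tool_calls=["resolve_ticker", "search_filings", "summarize_document"],
))
```

A production system would more likely match tasks to skills with embeddings or a classifier; keyword overlap is used here only to keep the example self-contained.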

2. Handling Context Length

Working memory has to fit inside the context window. As conversations grow, there are three common ways to keep it within budget:

Summarization: Periodically summarize older turns:

System: The following is a summary of the conversation so far:
[summary of first 20 turns]

Recent conversation:
[last 5 turns verbatim]
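
The prompt layout above could be assembled along these lines. This is a minimal sketch: `build_context` and `stub_summarize` are hypothetical names, and the stub stands in for what would normally be an LLM summarization call.

```python
def build_context(turns, summarize, keep_recent=5):
    """Summarize older turns and keep the most recent ones verbatim.

    keep_recent is assumed to be a positive integer.
    """
    if len(turns) <= keep_recent:
        return "Recent conversation:\n" + "\n".join(turns)
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarize(older)  # in practice, a call to a summarization model
    return (
        "The following is a summary of the conversation so far:\n"
        f"{summary}\n\n"
        "Recent conversation:\n" + "\n".join(recent)
    )


def stub_summarize(turns):
    # Placeholder: a real system would prompt a model to summarize these turns.
    return f"[summary of first {len(turns)} turns]"


turns = [f"turn {i}" for i in range(1, 26)]
context = build_context(turns, stub_summarize)
```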

Sliding window: Keep only the last N messages verbatim, drop older ones.

Hierarchical compression: Detailed recent memory, summary of older memory, key facts only for oldest memory.
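
One way to picture hierarchical compression is as three age tiers with decreasing detail. The sketch below assumes hypothetical `summarize` and `extract_facts` callables (in practice, model calls) and arbitrary tier sizes of 5 and 15 turns.

```python
def compress_hierarchically(turns, summarize, extract_facts, recent_n=5, middle_n=15):
    """Split history into three tiers: key facts, summary, and verbatim turns."""
    cutoff = recent_n + middle_n
    oldest = turns[:-cutoff]            # oldest tier: key facts only
    middle = turns[-cutoff:-recent_n]   # middle tier: summarized
    recent = turns[-recent_n:]          # newest tier: kept verbatim
    return {
        "key_facts": extract_facts(oldest) if oldest else [],
        "summary": summarize(middle) if middle else "",
        "recent": recent,
    }


# Stand-ins for model calls, used only to make the example runnable.
def fake_summarize(ts):
    return f"{len(ts)} turns summarized"


def fake_extract_facts(ts):
    return [t for t in ts if "prefers" in t]


history = ["user prefers Python"] + [f"turn {i}" for i in range(2, 31)]
tiers = compress_hierarchically(history, fake_summarize, fake_extract_facts)
```

A plain sliding window is the special case where the two older tiers are simply discarded.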

3. Retrieval-Based Episodic Memory

For multi-session agents:

  • After each session, generate a summary and store it with an embedding
  • At the start of a new session, retrieve relevant past sessions using the current context (for example, the user's first message) as the query
  • Inject retrieved memories into the context: "Based on your previous sessions: [retrieved summaries]"

This is essentially RAG applied to conversational history.
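
The store-then-retrieve loop can be sketched as follows. To stay self-contained, this example uses a toy bag-of-words vector in place of a real embedding model and an in-memory list in place of a vector database; `EpisodicStore` and its methods are hypothetical names.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class EpisodicStore:
    def __init__(self):
        self._episodes = []  # list of (summary, vector) pairs

    def add_session(self, summary):
        self._episodes.append((summary, embed(summary)))

    def retrieve(self, query, k=2):
        """Return the k stored summaries most similar to the query."""
        q = embed(query)
        ranked = sorted(self._episodes, key=lambda e: cosine(q, e[1]), reverse=True)
        return [summary for summary, _ in ranked[:k]]


store = EpisodicStore()
store.add_session("Worked on the authentication flow for the project")
store.add_session("Discussed vacation plans in Portugal")
store.add_session("Debugged the PostgreSQL checkout query")

memories = store.retrieve("help with the authentication bug", k=1)
prompt_prefix = "Based on your previous sessions: " + "; ".join(memories)
```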

4. Semantic Memory (User Profile)

Maintain a structured user/world model that updates over time:

{
  "user_id": "user_123",
  "name": "Sarah",
  "preferences": {
    "language": "Python",
    "editor": "VS Code",
    "prefers_concise_explanations": true
  },
  "current_projects": [
    {
      "name": "E-commerce Platform",
      "tech_stack": ["Next.js", "Go", "PostgreSQL"],
      "current_focus": "checkout flow"
    }
  ],
  "last_updated": "2025-02-01"
}

The agent reads this at session start and updates it when new information is learned.
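
A profile update might look roughly like the sketch below. `update_profile` is a hypothetical helper, and the new editor value is invented for the example. It returns a modified copy rather than mutating in place, which makes it easier to diff or audit changes before persisting them.

```python
import copy
from datetime import date


def update_profile(profile, path, value, today):
    """Return a copy of the profile with one nested field set and the date refreshed."""
    updated = copy.deepcopy(profile)
    node = updated
    for key in path[:-1]:
        node = node.setdefault(key, {})  # create intermediate objects if missing
    node[path[-1]] = value
    updated["last_updated"] = today.isoformat()
    return updated


profile = {
    "user_id": "user_123",
    "preferences": {"language": "Python", "editor": "VS Code"},
    "last_updated": "2025-02-01",
}
new_profile = update_profile(
    profile, ["preferences", "editor"], "Neovim", date(2025, 3, 10)
)
```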

5. Privacy Considerations

Persistent memory raises privacy concerns:

  • Users must be aware that information is being stored
  • Provide a way to view and delete stored memories
  • Do not store sensitive information (passwords, payment details)
  • Apply data retention policies (auto-expire old memories)
  • Consider user consent before inferring and storing preferences
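
Two of the points above, avoiding sensitive data and expiring old memories, can be sketched in a few lines. The regular expressions are deliberately simple illustrations, not reliable detectors, and the 90-day retention window is an assumed value.

```python
import re
from datetime import datetime, timedelta

# Illustrative patterns only; real systems need far more robust detection.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
PASSWORD_PATTERN = re.compile(r"(password\s*(?:is|:|=)\s*)\S+", re.IGNORECASE)


def redact(text):
    """Mask card-like digit runs and anything that follows 'password is/:/='."""
    text = CARD_PATTERN.sub("[REDACTED_CARD]", text)
    return PASSWORD_PATTERN.sub(r"\1[REDACTED]", text)


def expire(memories, now, max_age_days=90):
    """Drop memories older than the retention window."""
    cutoff = now - timedelta(days=max_age_days)
    return [m for m in memories if m["created_at"] >= cutoff]


clean = redact("my card is 4111 1111 1111 1111 and my password is hunter2")
```

Redaction is applied before anything is written to persistent storage, so the raw value never reaches the memory store.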

Common Follow-ups

  1. "How do you prevent the agent from learning incorrect information?" Before updating semantic memory, validate new information against what's already stored. For important facts, require explicit confirmation ("I've learned that you prefer Python — is that correct?"). Use confidence scores and decay old facts that haven't been reinforced.

  2. "How would you implement 'forgetting' — removing memories that are no longer relevant?" Time-based decay: lower the retrieval score of a memory as it ages, so older memories are less likely to be pulled into context. Relevance-based pruning: if a memory hasn't been retrieved in N sessions, archive or delete it. Explicit user control: let users flag memories to forget.

  3. "How does memory retrieval work in real time without adding latency?" Pre-fetch: start retrieving relevant memories as soon as the user begins typing. Cache frequently accessed memories in Redis. Run retrieval in parallel with the LLM's initial response (for streaming). Keep the semantic memory small enough to include in full at session start, avoiding runtime retrieval entirely.
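
The forgetting strategies from the second follow-up can be sketched as a scoring function plus a pruning pass. The exponential half-life form, the 30-day half-life, and the 20-session idle limit are assumptions chosen for illustration; the `Memory` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    relevance: float               # similarity to the current query, in [0, 1]
    age_days: float
    sessions_since_retrieved: int


def decayed_score(m, half_life_days=30.0):
    """Halve the retrieval score every half_life_days (an assumed tuning knob)."""
    return m.relevance * 0.5 ** (m.age_days / half_life_days)


def prune(memories, max_idle_sessions=20):
    """Keep only memories retrieved within the last max_idle_sessions sessions."""
    return [m for m in memories if m.sessions_since_retrieved <= max_idle_sessions]


old = Memory("Was using Flask", relevance=0.9, age_days=90, sessions_since_retrieved=25)
fresh = Memory("Now using FastAPI", relevance=0.6, age_days=0, sessions_since_retrieved=1)
ranked = sorted([old, fresh], key=decayed_score, reverse=True)
kept = prune([old, fresh])
```

With these settings the fresh memory outranks the older one despite its lower raw relevance, and the older one is pruned for going unused.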
