# Agent Memory Glossary

> 78 terms across memory taxonomy, RAG, vector stores, knowledge graphs, agent-memory patterns, context engineering, and forgetting strategies. Each term cited from a canonical primary source. Maintained by AgentsBooks (https://agentsbooks.com). Last refreshed 2026-05-08.

## TL;DR

78 terms covering the memory architecture of LLM agents: working / short / long-term / episodic / semantic / procedural memory; RAG (retriever, ranker, reranker, chunking, hybrid search, query decomposition, fusion decoding, GraphRAG); vector stores (HNSW, IVF, ANN, pgvector, Qdrant, Weaviate, Milvus, Pinecone); knowledge graphs (entity, relation, triple store, property graph, Cypher, SPARQL); agent-memory patterns (MemGPT/Letta, Mem0, MemoryBank, A-Mem, Generative Agents, Voyager, Reflexion); context engineering (context window, prompt caching, lost-in-the-middle, NIAH, compaction); forgetting strategies (decay, supersession, eviction, freshness floor).

Each entry has a primary-source citation and a date accessed. Flagship terms link to the deep glossary for longer narrative entries.

## Memory taxonomy (9)

### Working memory
The active scratchpad an agent uses inside a single turn or short reasoning loop — the portion of context the model is currently attending to. In LLM agents, working memory is implemented as the live prompt window plus any inline scratchpad tokens, and is bounded by the context window.

Source: [Anthropic — Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) — accessed 2026-05-08. See also: Context window, Long-term memory, Scratchpad, Episodic memory.

### Short-term memory
Information retained across a small number of turns within a single session — typically the last N exchanges held verbatim or summarised in the prompt. Distinct from working memory by spanning multiple turns; distinct from long-term memory by not persisting after the session.

Source: [LangChain — Memory in Agents](https://python.langchain.com/docs/how_to/migrate_agent/) — accessed 2026-05-08. See also: Working memory, Long-term memory, Summarisation memory, Context window.

### Long-term memory
Persistent stores attached to an agent that survive between sessions — relational, blob, vector, filesystem, or knowledge graph. Long-term memory is what separates an agent from a stateless chatbot.

Source: [AgentsBooks — Anatomy of a Firm](https://agentsbooks.com/anatomy) — accessed 2026-05-08. Compare with the deep glossary: https://agentic-glossary.roei-020.workers.dev/. See also: Memory (primitive), RAG, Vector database, Episodic memory, Semantic memory.

### Episodic memory _[Foundational]_
Time-stamped, event-shaped memories of specific past interactions — "what happened, with whom, when, why." In agent systems, episodic memory is typically a log of past dialogues or task traces indexed for later retrieval, modelled on the Generative Agents memory stream.

Source: [Park et al. — Generative Agents (2023)](https://arxiv.org/abs/2304.03442) — accessed 2026-05-08. See also: Memory stream, Reflection, Semantic memory, Long-term memory.

### Semantic memory
Generalised facts and concepts an agent has learned — "what is true," decoupled from when it was learned. Implemented as a vector store of distilled knowledge, a knowledge graph of entities, or a fine-tuned model checkpoint.

Source: [Mem0 — Memory architecture for AI agents](https://docs.mem0.ai/overview) — accessed 2026-05-08. See also: Episodic memory, Knowledge graph, RAG, Long-term memory.

### Procedural memory
Learned skills and routines — "how to do things." In agent systems, procedural memory is typically the agent's library of saved tool-use programs, code snippets, or task plans that can be reused, e.g. Voyager's skill library.

Source: [Wang et al. — Voyager (2023)](https://arxiv.org/abs/2305.16291) — accessed 2026-05-08. See also: Skill library, Voyager, Semantic memory.

### Memory (primitive) _(also: memory layer)_
Fourth of the AgentsBooks 8 primitives. Long-term persistent stores attached to an agent: relational, blob, vector, filesystem (PostgreSQL, Redis, Firestore, S3, Pinecone, MongoDB). What separates an agent from a stateless chatbot.

Source: [AgentsBooks — Anatomy of a Firm](https://agentsbooks.com/anatomy) — accessed 2026-05-08. Compare with the deep glossary: https://agentic-glossary.roei-020.workers.dev/. See also: Long-term memory, RAG, Vector database, Knowledge (primitive).

### Knowledge (primitive)
Sixth of the AgentsBooks 8 primitives. Expertise-domains layer: structured knowledge bases, scraped documentation, RSS feeds, ingested URLs, and the retrieval pipelines that surface them at inference time. Configurable subject-matter expertise.

Source: [AgentsBooks — Anatomy of a Firm](https://agentsbooks.com/anatomy) — accessed 2026-05-08. Compare with the deep glossary: https://agentic-glossary.roei-020.workers.dev/. See also: Memory (primitive), RAG, Knowledge graph, Chunking.

### Memory stream _[Foundational]_
A time-ordered log of all observations and reflections an agent has produced, scored at retrieval time on recency, importance, and relevance. Introduced by Park et al. as the substrate of believable agent behaviour over long horizons.

Source: [Park et al. — Generative Agents (2023)](https://arxiv.org/abs/2304.03442) — accessed 2026-05-08. See also: Episodic memory, Reflection, Importance score, Recency bias.
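
The retrieval scoring described above can be sketched as follows. This is a minimal illustration, not the paper's code: the `Memory` fields, the equal weighting of the three signals, and the per-hour decay factor are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float          # assumed to be in 0..1, e.g. an LLM-graded score
    relevance: float           # assumed to be in 0..1, e.g. similarity to the query
    hours_since_access: float  # time since the memory was last retrieved


def retrieval_score(m: Memory, decay: float = 0.995) -> float:
    """Equal-weight sum of recency, importance, and relevance (weights are an assumption)."""
    recency = decay ** m.hours_since_access  # exponential decay per hour
    return recency + m.importance + m.relevance


def top_k(memories: list[Memory], k: int) -> list[Memory]:
    """Return the k highest-scoring memories."""
    return sorted(memories, key=retrieval_score, reverse=True)[:k]
```

With these weights, an old but important and relevant memory can still outrank a fresh but trivial one.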

## RAG (19)

### RAG _(also: Retrieval-Augmented Generation)_ _[Foundational]_
An architecture that combines a non-parametric retriever (typically a vector store) with a parametric generator (an LLM), so the model conditions its output on documents fetched at inference time. Coined by Lewis et al. (Facebook AI) in 2020.

Source: [Lewis et al. — Retrieval-Augmented Generation (2020)](https://arxiv.org/abs/2005.11401) — accessed 2026-05-08. Compare with the deep glossary: https://agentic-glossary.roei-020.workers.dev/. See also: Embedding, Vector database, Retriever, Chunking, Hybrid search.

### Retriever
The component of a RAG pipeline that turns a query into a ranked set of candidate documents. Retrievers may be sparse (BM25), dense (vector), or hybrid; quality of the retriever sets the ceiling on the generator's answer.

Source: [Lewis et al. — Retrieval-Augmented Generation (2020)](https://arxiv.org/abs/2005.11401) — accessed 2026-05-08. See also: RAG, Reranker, Hybrid search, BM25, Dense retrieval.

### Ranker
The first-stage scoring function that orders candidate documents by predicted relevance to a query. In modern RAG, the ranker is usually a vector-similarity score plus optional metadata filters; it feeds the reranker.

Source: [Pinecone — Reranking guide](https://docs.pinecone.io/guides/search/rerank-results) — accessed 2026-05-08. See also: Retriever, Reranker, Similarity search, BM25.

### Reranker _(also: cross-encoder)_
A second-stage model that re-scores the retriever's top-K candidates for finer relevance, typically a cross-encoder that reads (query, document) jointly. Rerankers trade latency for precision and are often among the most effective levers on RAG quality.

Source: [Cohere — Rerank model documentation](https://docs.cohere.com/docs/rerank-overview) — accessed 2026-05-08. See also: Retriever, Ranker, RAG, BM25.

### Chunking
Splitting source documents into retrieval-sized passages before embedding. Chunk size, overlap, and boundary policy (fixed-token, sentence, semantic, structural) materially affect retrieval recall and answer quality.

Source: [Pinecone — Chunking strategies](https://www.pinecone.io/learn/chunking-strategies/) — accessed 2026-05-08. See also: Embedding, RAG, Semantic chunking, Context window.
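
A minimal sketch of the fixed-token policy with overlap, assuming the document has already been tokenised into a list of strings (plain whitespace words here; a real pipeline would use the embedding model's tokeniser). The function name and parameters are illustrative.

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks
```

For a 10-token document, `chunk_tokens(tokens, size=4, overlap=1)` yields three chunks, each starting on the last token of the previous one.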

### Semantic chunking
A chunking strategy that places chunk boundaries where embedding similarity between adjacent sentences drops, rather than at fixed token counts. Produces semantically coherent passages at the cost of preprocessing time.

Source: [LangChain — Semantic chunker](https://python.langchain.com/docs/how_to/semantic-chunker/) — accessed 2026-05-08. See also: Chunking, Embedding, RAG.

### Embedding
A dense vector representation of text, image, or audio that places semantically similar inputs near each other in vector space. Embeddings are the substrate of dense retrieval and similarity search.

Source: [OpenAI — Embeddings guide](https://platform.openai.com/docs/guides/embeddings) — accessed 2026-05-08. See also: Vector database, RAG, Similarity search, Dense retrieval.

### Dense retrieval
Retrieval that scores documents by vector similarity between dense embeddings of query and document. Strong on semantic match ("Boris Yeltsin" ≈ "Russian president"), weaker on rare keywords and exact codes.

Source: [Karpukhin et al. — Dense Passage Retrieval (2020)](https://arxiv.org/abs/2004.04906) — accessed 2026-05-08. See also: Sparse retrieval, Hybrid search, Embedding, RAG.

### Sparse retrieval _[Foundational]_
Retrieval that scores documents by lexical overlap between query and document terms — most commonly BM25. Strong on rare terms, identifiers, and exact-match queries; weaker on paraphrase.

Source: [Robertson & Zaragoza — BM25 (2009)](https://www.staff.city.ac.uk/~sbrp622/papers/foundations_bm25_review.pdf) — accessed 2026-05-08. See also: BM25, Dense retrieval, Hybrid search.

### BM25 _(also: Okapi BM25)_ _[Foundational]_
The default sparse-retrieval scoring function: a TF-IDF-style ranker with term-saturation and length-normalisation parameters. Still the strongest single-method baseline for many production search workloads, especially those with rare terms.

Source: [Robertson & Zaragoza — BM25 (2009)](https://www.staff.city.ac.uk/~sbrp622/papers/foundations_bm25_review.pdf) — accessed 2026-05-08. See also: Sparse retrieval, Hybrid search, Dense retrieval.
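
A compact sketch of BM25 scoring over pre-tokenised documents. It assumes the commonly used non-negative IDF variant and the conventional defaults `k1 = 1.2` and `b = 0.75`; production engines differ in details, so treat the numbers as illustrative.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document in a non-empty corpus against the query terms."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # k1 controls term-frequency saturation, b controls length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A rare identifier in the query contributes a large IDF, which is why BM25 holds up on exact codes and rare terms.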

### Hybrid search
Retrieval that combines a dense (vector) score with a sparse (BM25) score, optionally fused with metadata filters. Outperforms pure-vector retrieval on most production workloads because the two signals fail in different places.

Source: [Weaviate — Hybrid search documentation](https://weaviate.io/developers/weaviate/search/hybrid) — accessed 2026-05-08. See also: Dense retrieval, Sparse retrieval, BM25, Reciprocal Rank Fusion.
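
One simple way to combine the two signals is a weighted blend of min-max normalised scores. The sketch below assumes each retriever returns a `{doc_id: score}` mapping; the `alpha` weight and the normalisation choice are assumptions for illustration, not any specific engine's behaviour.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to 0..1 so dense and sparse values become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (v - lo) / (hi - lo) for doc_id, v in scores.items()}


def hybrid_rank(dense: dict[str, float], sparse: dict[str, float],
                alpha: float = 0.5) -> list[str]:
    """Blend normalised scores: alpha weights dense, (1 - alpha) weights sparse."""
    d, s = min_max(dense), min_max(sparse)
    fused = {doc_id: alpha * d.get(doc_id, 0.0) + (1 - alpha) * s.get(doc_id, 0.0)
             for doc_id in set(d) | set(s)}
    # Sort by fused score, breaking ties by id for deterministic output.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))
```

A document that scores well under both retrievers rises above one that scores well under only one of them.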

### Reciprocal Rank Fusion _(also: RRF)_ _[Foundational]_
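A score-free way to merge multiple ranked result lists by summing 1/(k + rank) for each document across rankers, where rank is the document's position in each list and k is a smoothing constant. Available as a fusion strategy in Weaviate and Elasticsearch hybrid search, and popular because it works without calibrating dense and sparse scores.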

Source: [Cormack et al. — RRF (2009)](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — accessed 2026-05-08. See also: Hybrid search, Ranker, Reranker.
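
A short sketch of the fusion rule, assuming each ranker returns an ordered list of document ids. The constant `k = 60` follows the value used in the Cormack et al. paper; the function name is illustrative.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists by summing 1 / (k + rank) per document, with rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Only positions matter, so a dense cosine score and a sparse BM25 score never need to share a scale.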

### Query decomposition
Splitting a complex user query into multiple sub-queries, each retrieved separately, before fusing the results. A common pattern when one question requires answers from multiple documents or sub-topics.

Source: [LangChain — Query analysis (decomposition)](https://python.langchain.com/docs/how_to/query_multiple_queries/) — accessed 2026-05-08. See also: Query expansion, HyDE, RAG, Fusion decoding.

### Query expansion
Rewriting or augmenting a query with synonyms, near-paraphrases, or hypothetical answers before retrieval, to broaden recall. HyDE (Hypothetical Document Embeddings) is a popular zero-shot variant.

Source: [Gao et al. — HyDE (2022)](https://arxiv.org/abs/2212.10496) — accessed 2026-05-08. See also: HyDE, Query decomposition, RAG.

### HyDE _(also: Hypothetical Document Embeddings)_
A query-expansion technique that asks the LLM to generate a hypothetical answer document, embeds the answer (not the question), and uses that as the retrieval query. Often improves recall on under-specified queries.

Source: [Gao et al. — Precise Zero-Shot Dense Retrieval without Relevance Labels (2022)](https://arxiv.org/abs/2212.10496) — accessed 2026-05-08. See also: Query expansion, Embedding, RAG.

### Fusion decoding _(also: Fusion-in-Decoder, FiD)_ _[Foundational]_
A generation pattern where retrieved passages are encoded independently, then their representations are fused inside the decoder. Originally proposed by Izacard & Grave (Fusion-in-Decoder), it scales to many passages because encoder self-attention runs per passage, so encoding cost grows linearly rather than quadratically with the number of passages.

Source: [Izacard & Grave — Fusion-in-Decoder (2020)](https://arxiv.org/abs/2007.01282) — accessed 2026-05-08. See also: RAG, Query decomposition.

### Agentic RAG _[Emerging 2026]_
RAG where the retriever is invoked as a tool by an agent loop, allowing iterative re-querying, multi-hop retrieval, and self-correction. The dominant production pattern for non-trivial document QA in 2026.

Source: [Anthropic — Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) — accessed 2026-05-08. See also: RAG, Self-RAG.

### Self-RAG
A pattern where the LLM emits special reflection tokens (a retrieval token that signals when to fetch passages, plus critique tokens that grade relevance, support, and usefulness), deciding when to retrieve and grading both its retrievals and its own generations. Reduces hallucination on long-form generation when a static retriever underfits.

Source: [Asai et al. — Self-RAG (2023)](https://arxiv.org/abs/2310.11511) — accessed 2026-05-08. See also: Agentic RAG, RAG, Reflection.

### GraphRAG _[Emerging 2026]_
A retrieval pattern that builds a knowledge graph from a corpus, runs community detection over it, and serves graph-aware summaries at query time. Open-sourced by Microsoft Research; outperforms vector RAG on holistic / dataset-scope questions.

Source: [Microsoft Research — GraphRAG](https://microsoft.github.io/graphrag/) — accessed 2026-05-08. See also: Knowledge graph, RAG, Context graph, Entity resolution.

## Vector store (12)

### Vector database _(also: vector store, VectorDB)_
A specialised database for storing and querying high-dimensional vectors with metadata, typically supporting approximate-nearest-neighbour (ANN) search and metadata filters. Production options in 2026 include Pinecone, Weaviate, Qdrant, Milvus, Chroma, pgvector, and Vespa.

Source: [Pinecone — What is a vector database?](https://www.pinecone.io/learn/vector-database/) — accessed 2026-05-08. See also: Embedding, ANN search, HNSW, pgvector, RAG.

### ANN search _(also: Approximate Nearest Neighbour)_
Sub-linear search for the K vectors closest to a query, trading exact recall for speed. ANN is the only practical option above ~10⁵ vectors; common index families are HNSW, IVF, and product quantisation.

Source: [Faiss — Approximate nearest neighbour wiki](https://github.com/facebookresearch/faiss/wiki) — accessed 2026-05-08. See also: HNSW, IVF, Vector database, Similarity search.

### HNSW _(also: Hierarchical Navigable Small World)_ _[Foundational]_
A graph-based ANN index that organises vectors into a hierarchy of small-world graphs and greedily descends from the top layer. The default or most commonly used high-recall index in Qdrant, Weaviate, Milvus, pgvector, and most modern vector stores.

Source: [Malkov & Yashunin — HNSW (2018)](https://arxiv.org/abs/1603.09320) — accessed 2026-05-08. See also: ANN search, IVF, Vector database.

### IVF _(also: Inverted File Index)_
An ANN index that clusters vectors with k-means, then searches only the nprobe nearest clusters. Lower memory than HNSW; strong fit when the corpus is too large to keep a graph in RAM. Often combined with product quantisation (IVF-PQ).

Source: [Faiss — IVF documentation](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes) — accessed 2026-05-08. See also: HNSW, ANN search, Product quantisation.
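
A toy sketch of the IVF idea in pure Python. It assumes the centroids have already been produced by k-means (training is omitted), and all function names are hypothetical; it is not how Faiss is implemented.

```python
import math


def nearest_centroids(centroids, vector, n=1):
    """Indices of the n centroids closest to the vector."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(centroids[i], vector))
    return order[:n]


def build_ivf(centroids, vectors):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        lists[nearest_centroids(centroids, vec)[0]].append(vec_id)
    return lists


def ivf_search(query, centroids, lists, vectors, nprobe=1, k=1):
    """Scan only the nprobe nearest clusters, then rank their members exactly."""
    candidates = [vec_id
                  for c in nearest_centroids(centroids, query, nprobe)
                  for vec_id in lists[c]]
    return sorted(candidates, key=lambda vec_id: math.dist(vectors[vec_id], query))[:k]
```

Raising `nprobe` scans more clusters, which improves recall at the cost of more distance computations.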

### Product quantisation _(also: PQ)_ _[Foundational]_
A vector compression technique that splits each embedding into sub-vectors and replaces each sub-vector with the ID of its nearest centroid. 8–32× memory savings at modest recall cost; standard for billion-scale ANN.

Source: [Jégou et al. — Product Quantization (2011)](https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf) — accessed 2026-05-08. See also: IVF, ANN search, Vector database.
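
A toy sketch of PQ encoding and decoding. It assumes one pre-trained codebook per sub-vector (codebook training is omitted) and a vector length that divides evenly by the number of codebooks; the names are illustrative.

```python
import math


def pq_encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    sub_len = len(vector) // len(codebooks)
    codes = []
    for j, codebook in enumerate(codebooks):
        sub = tuple(vector[j * sub_len:(j + 1) * sub_len])
        codes.append(min(range(len(codebook)),
                         key=lambda c: math.dist(codebook[c], sub)))
    return codes


def pq_decode(codes, codebooks):
    """Approximate the original vector by concatenating the chosen centroids."""
    out = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out
```

With 256 centroids per codebook each code fits in one byte, so a vector is stored as a handful of bytes instead of a full array of floats.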

### Similarity search
Returning items that are "close" to a query under a chosen distance metric — cosine, dot product, or Euclidean. The primitive operation a vector database is built around.

Source: [Pinecone — Similarity search](https://docs.pinecone.io/guides/search/semantic-search) — accessed 2026-05-08. See also: Embedding, ANN search, Vector database, Cosine similarity.

### Cosine similarity
Similarity measured as the cosine of the angle between two vectors — invariant to magnitude. The default similarity metric for normalised text embeddings.

Source: [Pinecone — Vector similarity explained](https://www.pinecone.io/learn/vector-similarity/) — accessed 2026-05-08. See also: Similarity search, Embedding.
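
A minimal sketch of the metric in plain Python, with no library dependencies. Real systems would typically use a vectorised implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; unaffected by vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

When both vectors are already normalised to unit length, the denominator is 1 and cosine similarity reduces to the dot product.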

### pgvector
A PostgreSQL extension that adds a vector column type, multiple distance operators, and HNSW / IVFFlat indexes. The pragmatic choice when you already operate PostgreSQL and don't want a second datastore.

Source: [pgvector — GitHub](https://github.com/pgvector/pgvector) — accessed 2026-05-08. See also: Vector database, HNSW, IVF.

### Qdrant
An open-source Rust-based vector database with strong filtered-search performance and binary quantisation. Often cited as the open-source speed leader at p99 in third-party benchmarks.

Source: [Qdrant — Documentation](https://qdrant.tech/documentation/) — accessed 2026-05-08. See also: Vector database, Weaviate, Milvus, HNSW.

### Weaviate
An open-source vector database with native hybrid search (vector + BM25 + metadata filters), generative-feedback modules, and built-in schema. Defaults to HNSW indexing and supports Reciprocal Rank Fusion among its hybrid fusion methods.

Source: [Weaviate — Documentation](https://weaviate.io/developers/weaviate) — accessed 2026-05-08. See also: Vector database, Hybrid search, Reciprocal Rank Fusion, HNSW.

### Milvus
An open-source distributed vector database with separation of compute and storage, supporting HNSW, IVF, DiskANN, and GPU indexes. Common pick for billion-scale enterprise deployments.

Source: [Milvus — Documentation](https://milvus.io/docs) — accessed 2026-05-08. See also: Vector database, Qdrant, ANN search.

### Pinecone
A managed serverless vector database with hybrid search, namespaces, and integrated reranking. Common pick for teams that want a vector store as a first-class managed service rather than self-hosted infrastructure.

Source: [Pinecone — Documentation](https://docs.pinecone.io/) — accessed 2026-05-08. See also: Vector database, Weaviate, Qdrant.

## Knowledge graph (9)

### Knowledge graph _[Foundational]_
A structured representation of entities and the relations between them, traversable as a graph and queryable by graph patterns. Knowledge graphs power GraphRAG, entity-aware retrieval, and reasoning over multi-hop relations that vector retrieval cannot follow.

Source: [Hogan et al. — Knowledge Graphs (2021)](https://arxiv.org/abs/2003.02320) — accessed 2026-05-08. See also: GraphRAG, Entity, Relation, Context graph.

### Context graph _[Emerging 2026]_
Gartner's term for an agent-managed, structured representation of entities, relationships, and recent events that an agent uses as long-term memory beyond a vector store. Cited in Gartner's Hype Cycle for Agentic AI as a maturing capability for agent management platforms.

Source: [Gartner — Hype Cycle for Agentic AI](https://www.gartner.com/en/articles/hype-cycle-for-agentic-ai) — accessed 2026-05-08. Compare with the deep glossary: https://agentic-glossary.roei-020.workers.dev/. See also: Knowledge graph, GraphRAG, Memory (primitive), Long-term memory.

### Entity
A discrete object in a knowledge graph — a person, company, document, transaction, or concept — typically identified by a stable URI or ID. Entities are the nodes; relations are the edges.

Source: [W3C — RDF Concepts](https://www.w3.org/TR/rdf11-concepts/) — accessed 2026-05-08. See also: Relation, Knowledge graph, Triple store, Entity resolution.

### Relation _(also: edge, predicate)_
A typed, directed link between two entities in a knowledge graph (subject-predicate-object). Relations are how multi-hop reasoning is expressed and traversed.

Source: [W3C — RDF Concepts](https://www.w3.org/TR/rdf11-concepts/) — accessed 2026-05-08. See also: Entity, Triple store, Knowledge graph, Property graph.

### Triple store
A database optimised for storing and querying RDF triples (subject, predicate, object). Triple stores are queried in SPARQL and back many open-data knowledge graphs (DBpedia, Wikidata).

Source: [W3C — RDF 1.1 Primer](https://www.w3.org/TR/rdf11-primer/) — accessed 2026-05-08. See also: SPARQL, Property graph, Knowledge graph, Entity.
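
To make the triple model concrete, here is a toy in-memory sketch of pattern matching over (subject, predicate, object) tuples, with a two-hop traversal built on top. The data and function names are hypothetical; a real triple store would be queried with SPARQL.

```python
Triple = tuple[str, str, str]


def match(triples: list[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def two_hop(triples: list[Triple], start: str, p1: str, p2: str) -> list[str]:
    """Follow start -p1-> middle -p2-> end and return the end entities."""
    middles = [t[2] for t in match(triples, s=start, p=p1)]
    return [t[2] for mid in middles for t in match(triples, s=mid, p=p2)]


KG: list[Triple] = [
    ("alice", "works_at", "acme"),
    ("acme", "located_in", "berlin"),
    ("bob", "works_at", "initech"),
]
```

Here `two_hop(KG, "alice", "works_at", "located_in")` answers "where is Alice's employer located?" by chaining two patterns.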

### Property graph
A graph model where both nodes and edges carry typed property bags, queried in Cypher or Gremlin. The native model in Neo4j, Memgraph, and TigerGraph; usually a more practical fit for application graphs than RDF.

Source: [Neo4j — Property graph model](https://neo4j.com/docs/getting-started/appendix/graphdb-concepts/) — accessed 2026-05-08. See also: Cypher, Triple store, Knowledge graph.

### Cypher
Neo4j's declarative graph query language, since opened up through the openCypher project. Pattern-matches on (node)-[edge]->(node) shapes; the property-graph counterpart to SPARQL.

Source: [Neo4j — Cypher Manual](https://neo4j.com/docs/cypher-manual/current/) — accessed 2026-05-08. See also: Property graph, SPARQL, Knowledge graph.

### SPARQL _[In force]_
The W3C-standardised query language for RDF triple stores. Pattern-matches triples, supports federation across endpoints; the lingua franca for open-data knowledge graphs.

Source: [W3C — SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/) — accessed 2026-05-08. See also: Triple store, Cypher, Knowledge graph.

### Entity resolution _(also: entity linking)_
The process of deciding when two mentions in different sources refer to the same real-world entity, and merging them. The single hardest correctness problem in any knowledge graph that ingests heterogeneous data.

Source: [Microsoft Research — GraphRAG (entity resolution)](https://microsoft.github.io/graphrag/) — accessed 2026-05-08. See also: Entity, Knowledge graph, GraphRAG.

## Memory patterns (12)

### MemGPT _(also: Letta)_ _[Foundational]_
An OS-inspired memory architecture that gives an LLM a small "main context" plus paged virtual memory it manages with explicit function calls. Originated the metaphor of an LLM as a CPU with a memory hierarchy; now developed under the name Letta.

Source: [Packer et al. — MemGPT (2023)](https://arxiv.org/abs/2310.08560) — accessed 2026-05-08. See also: Letta, Memory stream, MemoryBank, Long-term memory.

### Letta
An open-source agent runtime descended from MemGPT, providing stateful agents with paged memory, archival memory, and tool use over a Postgres or SQLite backend.

Source: [Letta — Documentation](https://docs.letta.com/) — accessed 2026-05-08. See also: MemGPT, Long-term memory.

### Mem0 _[Emerging 2026]_
An open-source memory layer that extracts facts from agent-user dialogue, deduplicates them, and stores them for retrieval — surfaces "what the user told me before" across sessions. Pairs an LLM-extractor with a vector store and optional graph store.

Source: [Mem0 — Documentation](https://docs.mem0.ai/overview) — accessed 2026-05-08. See also: MemoryBank, MemGPT, Long-term memory, Semantic memory.

### MemoryBank
A long-term memory mechanism for LLM agents that stores conversation summaries, reflects on the user's personality over time, and applies a forgetting curve modelled on Ebbinghaus to weight retrieval.

Source: [Zhong et al. — MemoryBank (2023)](https://arxiv.org/abs/2305.10250) — accessed 2026-05-08. See also: Forgetting curve, Summarisation memory, Mem0, Long-term memory.

### A-Mem _(also: Agentic Memory)_ _[Emerging 2026]_
An agentic memory architecture inspired by the Zettelkasten method that links new memory notes to existing ones via embeddings, builds an evolving memory network, and lets the agent itself decide what to remember.

Source: [Xu et al. — A-Mem (2025)](https://arxiv.org/abs/2502.12110) — accessed 2026-05-08. See also: Mem0, MemoryBank, Memory stream, Knowledge graph.

### Generative agents _[Foundational]_
Park et al.'s 2023 architecture for believable simulacra of human behaviour: LLM agents with a memory stream, a reflection mechanism, and a planning loop. Demonstrated emergent social behaviour in a Sims-like environment and seeded the modern episodic-memory pattern.

Source: [Park et al. — Generative Agents (2023)](https://arxiv.org/abs/2304.03442) — accessed 2026-05-08. See also: Memory stream, Reflection, Episodic memory.

### Voyager _[Foundational]_
A lifelong-learning Minecraft agent that grows a library of executable skills (JavaScript programs) over time and retrieves them from a vector skill library. Introduced procedural-memory-as-skill-library to LLM agents.

Source: [Wang et al. — Voyager (2023)](https://arxiv.org/abs/2305.16291) — accessed 2026-05-08. See also: Procedural memory, Skill library.

### Skill library
A persisted, retrievable collection of an agent's learned skills (typically tool-use programs or sub-plans) that is grown over time. Voyager popularised the pattern; later agent runtimes (e.g., AutoGen, OpenHands) adopted variants.

Source: [Wang et al. — Voyager (2023)](https://arxiv.org/abs/2305.16291) — accessed 2026-05-08. See also: Voyager, Procedural memory.

### Reflection
An agent's review of its own recent actions or memories to extract higher-level conclusions, which it then writes back into long-term memory. Generative Agents and Reflexion both use reflection to compress long horizons into stable behaviour.

Source: [Park et al. — Generative Agents (2023)](https://arxiv.org/abs/2304.03442) — accessed 2026-05-08. See also: Reflexion, Memory stream, Self-RAG.

### Reflexion _[Foundational]_
Shinn et al.'s framework where an agent verbalises feedback about its own failures and stores those reflections in episodic memory to improve on retries. "Verbal reinforcement learning" without parameter updates.

Source: [Shinn et al. — Reflexion (2023)](https://arxiv.org/abs/2303.11366) — accessed 2026-05-08. See also: Reflection, Episodic memory, Self-RAG.

### Summarisation memory _(also: summary-as-memory)_
A long-context strategy that periodically replaces older turns with an LLM-written summary, freeing context tokens while keeping the gist. The default fallback when a context window is exceeded; trades fidelity for survival.

Source: [LangChain — ConversationSummaryMemory](https://python.langchain.com/docs/versions/migrating_memory/conversation_summary_memory/) — accessed 2026-05-08. See also: Short-term memory, Compaction, Context window.
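
A minimal sketch of the pattern, independent of any framework. The `summarise` callable stands in for an LLM call and is an assumption of this example, as is the `[summary]` prefix.

```python
from typing import Callable


def compact_history(turns: list[str], keep_last: int,
                    summarise: Callable[[list[str]], str]) -> list[str]:
    """Replace all but the last `keep_last` turns with a single summary entry."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    if len(turns) <= keep_last:
        return list(turns)  # nothing old enough to summarise
    cut = len(turns) - keep_last
    older, recent = turns[:cut], turns[cut:]
    return ["[summary] " + summarise(older)] + recent
```

The recent turns stay verbatim, so the model keeps exact wording where it matters most and only the older context loses fidelity.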

### Shared memory _(also: agent-fleet memory)_
A memory store visible to multiple agents in a fleet, used to coordinate handoffs, share context, and avoid duplicate work. The substrate for A2A delegation patterns; usually a vector store or graph plus a coordination event log.

Source: [AgentsBooks — Anatomy of a Firm](https://agentsbooks.com/anatomy) — accessed 2026-05-08. See also: Memory (primitive), Long-term memory.

## Context engineering (9)

### Context window
The maximum number of tokens an LLM can attend to in a single forward pass. Anthropic's Claude offers a 1M-token window in 2026; even so, attention quality degrades non-uniformly across the window, and bigger windows are not a substitute for memory architecture.

Source: [Anthropic — Claude long-context model card](https://docs.claude.com/en/docs/build-with-claude/context-windows) — accessed 2026-05-08. See also: Lost in the middle, Needle in a Haystack, Prompt caching, Compaction.

### Prompt caching _[In force]_
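A model-provider feature that caches the prefix of a prompt so subsequent calls reuse compute and pay a reduced rate for the cached portion (on Anthropic's published pricing, cache reads cost roughly a tenth of the base input-token price, while cache writes carry a surcharge). The most cost-effective lever for agents that send large stable system prompts on every step.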

Source: [Anthropic — Prompt caching](https://docs.claude.com/en/docs/build-with-claude/prompt-caching) — accessed 2026-05-08. See also: Context window, Compaction, KV cache.

### KV cache
The per-token key/value tensors a transformer caches during autoregressive generation so it does not re-encode prior tokens. Server-side prompt caching exposes the KV cache as a billing primitive.

Source: [Anthropic — Prompt caching](https://docs.claude.com/en/docs/build-with-claude/prompt-caching) — accessed 2026-05-08. See also: Prompt caching, Context window.

### Lost in the middle _[Foundational]_
Liu et al.'s finding that LLMs attend most reliably to the start and end of a long context, and degrade in the middle — the U-shaped recall curve. Drives the practice of placing the most relevant retrieved passages at the edges of the prompt.

Source: [Liu et al. — Lost in the Middle (2023)](https://arxiv.org/abs/2307.03172) — accessed 2026-05-08. See also: Context window, Needle in a Haystack, Attention dilution.
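
The edge-placement practice can be sketched as a simple reordering. This assumes the passages arrive sorted from most to least relevant; the function name is illustrative and the alternating scheme is one of several reasonable choices.

```python
def edges_first(ranked: list[str]) -> list[str]:
    """Place the strongest passages at the start and end, the weakest in the middle."""
    front, back = [], []
    for i, passage in enumerate(ranked):
        # Alternate: ranks 1, 3, 5... fill the front; ranks 2, 4... fill the back.
        (front if i % 2 == 0 else back).append(passage)
    return front + back[::-1]
```

For five passages ranked `p1` to `p5`, the result is `p1, p3, p5, p4, p2`: the top two sit at the edges and the weakest lands in the centre.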

### Needle in a Haystack _(also: NIAH)_
A long-context evaluation that embeds a single fact ("the needle") at a random depth in a long irrelevant document and tests whether the model can retrieve it. The de-facto smoke test for context-window claims in 2026.

Source: [Anthropic — Long context prompting tips](https://www.anthropic.com/news/prompting-long-context) — accessed 2026-05-08. See also: Lost in the middle, Context window.
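
A small sketch of how a test case might be constructed and graded. It assumes the filler is a list of sentences and that a substring match is an acceptable grader; real harnesses sweep many depths and context lengths and often use stricter scoring.

```python
def build_haystack(filler: list[str], needle: str, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def needle_found(model_answer: str, expected: str) -> bool:
    """Lenient grader: case-insensitive substring match (an assumption of this sketch)."""
    return expected.lower() in model_answer.lower()
```

Running the same needle at several depths is what exposes position-dependent recall rather than a single pass or fail.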

### Attention dilution
Degradation of attention quality on any single token as the number of competing tokens grows. Why naive concatenation of large RAG outputs into a 1M-token window often performs worse than a smaller, well-curated context.

Source: [Anthropic — Long context prompting tips](https://www.anthropic.com/news/prompting-long-context) — accessed 2026-05-08. See also: Context window, Lost in the middle, Compaction.

### Context engineering _[Emerging 2026]_
The practice of curating, ordering, compressing, and refreshing the context an LLM sees on each call — the operating discipline of building production agents on long-context models. Distinct from "prompt engineering" by treating context as an evolving system, not a one-shot string.

Source: [Anthropic — Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) — accessed 2026-05-08. See also: Context window, Compaction, Prompt caching, Long-term memory.

### Compaction
Mid-trajectory compression of an agent's context — summarising older steps, dropping resolved branches, or rewriting tool outputs — so the agent can continue past its original window. A near-universal pattern in multi-hour agent runs.

Source: [Anthropic — Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) — accessed 2026-05-08. See also: Summarisation memory, Context window, Context engineering.

### Scratchpad
A region of the prompt the model writes to and reads from while reasoning — the chain-of-thought made explicit and persistent within a turn. Typically held in working memory; can be persisted to long-term memory at task end.

Source: [Anthropic — Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) — accessed 2026-05-08. See also: Working memory, Compaction.

## Forgetting & freshness (8)

### Memory decay
Down-weighting older memories over time so retrieval favours recent items. Implemented as an exponential time penalty on memory scores; modelled in MemoryBank on Ebbinghaus's forgetting curve.

Source: [Zhong et al. — MemoryBank (2023)](https://arxiv.org/abs/2305.10250) — accessed 2026-05-08. See also: Forgetting curve, Recency bias, Eviction.
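
A minimal sketch of an exponential time penalty expressed as a half-life. The 72-hour default is an arbitrary illustrative value, not a figure from MemoryBank.

```python
def decayed_score(base_score: float, age_hours: float,
                  half_life_hours: float = 72.0) -> float:
    """Halve a memory's retrieval score for every half-life that has elapsed."""
    if age_hours < 0 or half_life_hours <= 0:
        raise ValueError("age must be non-negative and half-life positive")
    return base_score * 0.5 ** (age_hours / half_life_hours)
```

Expressing decay as a half-life keeps the tuning knob interpretable: a memory one half-life old counts half as much as a fresh one with the same base score.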

### Forgetting curve
Ebbinghaus's empirical observation that memory retention falls roughly exponentially with time since encoding, slowed by repeated retrieval. Memory systems that explicitly model this (e.g., MemoryBank) keep useful memories alive longer than naive recency cutoffs.

Source: [Zhong et al. — MemoryBank (2023)](https://arxiv.org/abs/2305.10250) — accessed 2026-05-08. See also: Memory decay, Eviction, Recency bias.
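
The curve is often written as R = exp(-t / S), where t is time since last recall and S is memory strength. The sketch below models that form with reinforcement on recall; the day-based time unit, the initial strength, and the +1 increment are assumptions of this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Trace:
    strength: float = 1.0      # S: larger values mean slower forgetting
    elapsed_days: float = 0.0  # t: time since the memory was last recalled

    def retention(self) -> float:
        """Estimated retention R = exp(-t / S)."""
        return math.exp(-self.elapsed_days / self.strength)

    def recall(self) -> None:
        """Recalling a memory strengthens it and resets the clock."""
        self.strength += 1.0
        self.elapsed_days = 0.0
```

After one recall the same two-day gap costs far less retention than it would for a memory that was never revisited, which is how repeated retrieval slows forgetting.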

### Supersession
Replacing an older memory with a newer one when the two contradict — "the user's job title is now CTO, not VP of Eng." Supersession is required to keep semantic memory consistent; without it, retrieval surfaces stale facts alongside true ones.

Source: [Mem0 — Memory updates and consolidation](https://docs.mem0.ai/overview) — accessed 2026-05-08. See also: Memory decay, Staleness, Mem0, Semantic memory.
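
A minimal sketch of supersession keyed on (subject, attribute), assuming each fact carries an observation timestamp. The class and field names are hypothetical and do not reflect Mem0's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str
    observed_at: int  # e.g. unix seconds


@dataclass
class FactStore:
    current: dict = field(default_factory=dict)
    superseded: list = field(default_factory=list)

    def write(self, fact: Fact) -> None:
        """Keep only the newest fact per (subject, attribute); archive the rest."""
        key = (fact.subject, fact.attribute)
        held = self.current.get(key)
        if held is None:
            self.current[key] = fact
        elif fact.observed_at >= held.observed_at:
            self.superseded.append(held)  # the newer fact replaces the older one
            self.current[key] = fact
        else:
            self.superseded.append(fact)  # a late-arriving older fact never wins

    def read(self, subject: str, attribute: str):
        fact = self.current.get((subject, attribute))
        return fact.value if fact else None
```

Archiving superseded facts instead of deleting them keeps an audit trail while ensuring retrieval only sees the current value.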

### Eviction _(also: eviction policy)_
Removing memories from the active set when capacity is exceeded — by recency (LRU), importance, or composite scores. Eviction policy determines which stale items survive and which evergreen ones are accidentally lost.

Source: [Packer et al. — MemGPT (2023)](https://arxiv.org/abs/2310.08560) — accessed 2026-05-08. See also: Memory decay, Compaction, MemGPT.
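
A short sketch of the recency (LRU) policy using the standard library's `OrderedDict`. The class name and the choice to return evicted keys are illustrative; importance-based or composite policies would replace the ordering rule.

```python
from collections import OrderedDict


class LRUMemory:
    """Fixed-capacity active set that evicts the least recently used item."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # reading counts as use
        return self.items[key]

    def put(self, key, value) -> list:
        """Store an item and return the keys evicted to make room."""
        self.items[key] = value
        self.items.move_to_end(key)
        evicted = []
        while len(self.items) > self.capacity:
            evicted.append(self.items.popitem(last=False)[0])
        return evicted
```

Returning the evicted keys lets a caller move those memories to an archival store rather than losing them outright.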

### Staleness
The condition of a memory or retrieved document being out of date relative to the world. Staleness drives most production-RAG correctness failures — a confidently-cited but expired fact is worse than no fact at all.

Source: [Anthropic — Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) — accessed 2026-05-08. See also: Supersession, Freshness floor, Memory decay.

### Freshness floor
A site- or product-level rule that any cited claim must be no older than N months unless explicitly flagged as foundational or in-force. The discipline that keeps an authority property from quietly degrading into a stale archive.

Source: [AgentsBooks — Build Authority Property skill](https://agentsbooks.com/anatomy) — accessed 2026-05-08. See also: Staleness, Supersession, Memory decay.

### Recency bias
An inclination to over-weight recent memories in retrieval scoring. Useful for conversational continuity, dangerous for stable preferences — "the user once mentioned vegetarian" should not be lost just because it's older than yesterday's coffee preference.

Source: [Park et al. — Generative Agents (2023)](https://arxiv.org/abs/2304.03442) — accessed 2026-05-08. See also: Memory decay, Importance score, Memory stream.

### Importance score
An LLM-graded weight assigned to each memory at write-time, reflecting how core the memory is to the agent's identity or task. Combined with recency and relevance at retrieval to balance "new" against "important."

Source: [Park et al. — Generative Agents (2023)](https://arxiv.org/abs/2304.03442) — accessed 2026-05-08. See also: Recency bias, Memory stream, Memory decay.

## Sibling references on agentsbooks.com

- Pillar essay: [Agent memory and knowledge](https://agentsbooks.com/blog/agent-memory) — the canonical home for this vocabulary on the AgentsBooks hub.
- Pillar essay: [Context graph, explained](https://agentsbooks.com/blog/context-graph-explained) — sibling pairing for the knowledge-graph and context-graph terms.
- Worldview: [Anatomy of a Firm](https://agentsbooks.com/anatomy) — the 8 primitives every service firm needs.
- Compare with: [the deep agentic glossary](https://agentic-glossary.roei-020.workers.dev/) — longer narrative entries for flagship terms.

## See also

- AgentsBooks pillar (Memory & Knowledge): https://agentsbooks.com/blog/agent-memory
- AgentsBooks Anatomy of a Firm: https://agentsbooks.com/anatomy

---

*Last refresh: 2026-05-08. Built by AgentsBooks. Archetype: glossary.*