Context Window Management with Letta

Letta (formerly known as MemGPT) is an open source framework for building stateful agents with advanced reasoning capabilities and transparent long-term memory.

The MemGPT package and Docker image have been renamed to letta to clarify the distinction between MemGPT agents and the Letta API server / runtime that runs LLM agents as services.

How It Works: Virtual Context Management

Letta uses virtual context management, a technique inspired by the hierarchical memory systems of traditional operating systems, which give the appearance of large memory resources by moving data between fast and slow memory. Just as an OS coordinates virtual memory, physical memory, and disk, deciding what stays in RAM, what goes to disk, and when to swap it back, Letta decides what stays in the LLM's context window and what moves to external storage. (Sources: arXiv, Medium)

The system creates a memory hierarchy with two main tiers, the second of which is split into two stores:

Main Context: Analogous to an OS's main memory (RAM). This is the standard fixed-length context window that the LLM processes during inference. (Source: Letta+Zilliz Cloud, "Building RAG Agents with Extended LLM Context Window")
External Context: Analogous to secondary storage in an OS. It holds out-of-context information that can be selectively moved into the main context through explicit function calls. (Source: Letta+Zilliz Cloud, "Building RAG Agents with Extended LLM Context Window")
Recall Storage: The part of external context that stores recently evicted data for quick retrieval
Archival Storage: The part of external context that holds long-term, less frequently accessed information
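The tiers described above can be pictured as plain data structures. The following is a minimal sketch, not Letta's implementation: the class names, the word-count "tokenizer", and the substring search are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MainContext:
    """Fixed-size window the LLM sees on every inference call (hypothetical class)."""
    token_limit: int
    messages: list[str] = field(default_factory=list)

    def token_count(self) -> int:
        # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
        return sum(len(m.split()) for m in self.messages)


@dataclass
class ExternalContext:
    """Out-of-context storage, split into recall and archival stores (hypothetical class)."""
    recall: list[str] = field(default_factory=list)    # evicted conversation data
    archival: list[str] = field(default_factory=list)  # long-term, rarely accessed data

    def search_archival(self, query: str) -> list[str]:
        # Naive case-insensitive substring match; a real system would
        # likely use embedding-based search instead.
        return [doc for doc in self.archival if query.lower() in doc.lower()]


main = MainContext(token_limit=8)
external = ExternalContext()
external.archival.append("User's favorite language is Rust.")

# Page a matching archival entry into the main context.
hits = external.search_archival("rust")
main.messages.extend(hits)
```

The design point is that only `MainContext` is bounded by the model's window; everything in `ExternalContext` is invisible to the LLM until something copies it in.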

Intelligent Memory Management

In MemGPT, the Queue Manager plays a central role in managing the limited space of the main context (the LLM's context window). When the window fills up, it evicts the oldest messages (typically around 50% of the context window) and replaces them with a recursive summary, meaning a summary that folds the previous summary together with the newly evicted messages. (Source: Neeraj Kumar, "Extending LLM Context Through OS-Inspired Virtual Memory and Hierarchical Storage", Medium)
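The eviction step can be sketched as follows. This is an illustrative approximation under stated assumptions, not MemGPT's actual code: `flush_queue`, `summarize`, and `count_tokens` are hypothetical names, tokens are counted as whitespace-separated words, and the summarizer is a string-joining placeholder where a real system would call an LLM.

```python
def count_tokens(text: str) -> int:
    # Assumption: one token per word. A real system would use the model's tokenizer.
    return len(text.split())


def summarize(previous_summary: str, evicted: list[str]) -> str:
    # Placeholder for an LLM call. It is "recursive" because the previous
    # summary is folded into the new one together with the evicted messages.
    parts = ([previous_summary] if previous_summary else []) + evicted
    return "SUMMARY(" + " | ".join(parts) + ")"


def flush_queue(queue, summary, recall, token_limit, evict_fraction=0.5):
    """Evict the oldest messages once the queue exceeds token_limit.

    Returns (kept_messages, new_summary). Evicted messages are appended
    to `recall` so they remain retrievable later.
    """
    total = sum(count_tokens(m) for m in queue)
    if total <= token_limit:
        return list(queue), summary  # still fits, nothing to do

    target = token_limit * evict_fraction  # tokens to free, about half the window
    freed, cut = 0, 0
    while cut < len(queue) and freed < target:
        freed += count_tokens(queue[cut])
        cut += 1

    evicted, kept = queue[:cut], queue[cut:]
    recall.extend(evicted)
    return kept, summarize(summary, evicted)


recall_store: list[str] = []
queue = ["a b c", "d e f", "g h i", "j k l"]  # 12 tokens in total
kept, new_summary = flush_queue(queue, "", recall_store, token_limit=10)
```

With a 10-token limit, the 12-token queue overflows, so the two oldest messages (6 tokens, enough to meet the 5-token target) are moved to the recall store and replaced by a summary.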

MemGPT lets the LLM itself control data movement between the main and external context through self-generated function calls. The model learns when to use these functions based on its current goals and context. (Source: Letta+Zilliz Cloud, "Building RAG Agents with Extended LLM Context Window")
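To make the function-call mechanism concrete, here is a minimal dispatcher sketch. It assumes the model emits a JSON object with `name` and `arguments` fields; that format, the tool names, and their signatures are illustrative assumptions modeled loosely on MemGPT-style memory tools, not Letta's actual API.

```python
import json

# Hypothetical in-memory stores standing in for real storage backends.
archival: list[str] = []
core_memory: dict[str, str] = {"persona": "A helpful assistant.", "human": ""}


def archival_memory_insert(content: str) -> str:
    """Move information out of the context window into long-term storage."""
    archival.append(content)
    return "ok"


def archival_memory_search(query: str) -> str:
    """Bring matching long-term entries back as a JSON list of strings."""
    hits = [doc for doc in archival if query.lower() in doc.lower()]
    return json.dumps(hits)


def core_memory_append(label: str, content: str) -> str:
    """Edit the in-context memory block identified by `label`."""
    core_memory[label] = (core_memory[label] + "\n" + content).strip()
    return "ok"


TOOLS = {
    "archival_memory_insert": archival_memory_insert,
    "archival_memory_search": archival_memory_search,
    "core_memory_append": core_memory_append,
}


def dispatch(tool_call_json: str) -> str:
    """Run one model-generated function call and return its result as text."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"error: unknown function {call['name']}"
    return fn(**call["arguments"])


# Simulated model output: store a fact, then retrieve it later.
dispatch('{"name": "archival_memory_insert", "arguments": {"content": "Project deadline is Friday."}}')
result = dispatch('{"name": "archival_memory_search", "arguments": {"query": "deadline"}}')
```

The return value of each call would be appended to the context window, which is how retrieved data re-enters the main context.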

Key Features

Self-editing memory: The agent uses LLM tools to edit both its own context window ("core memory") and external storage ("archival memory")
Automatic swapping: The system decides which information is most relevant and should stay in the active context window, and moves the rest to external storage
Persistent memory: Unlike a plain LLM, which forgets everything between calls, agents built with Letta can remember information across sessions
Model-agnostic: Works with various LLM providers including OpenAI, local models, etc.
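Persistence across sessions comes down to writing agent state somewhere durable and loading it again later. The sketch below illustrates the idea with a JSON file; the file layout and the `save_state` / `load_state` helpers are hypothetical, and a real deployment would presumably use a database rather than a flat file.

```python
import json
import tempfile
from pathlib import Path


def save_state(path: Path, core_memory: dict[str, str], archival: list[str]) -> None:
    """Write the agent's memory to disk at the end of a session."""
    path.write_text(json.dumps({"core_memory": core_memory, "archival": archival}))


def load_state(path: Path) -> tuple[dict[str, str], list[str]]:
    """Restore the agent's memory at the start of the next session."""
    state = json.loads(path.read_text())
    return state["core_memory"], state["archival"]


# Session 1: the agent learns something and the state is saved.
state_file = Path(tempfile.mkdtemp()) / "agent_state.json"
save_state(state_file, {"human": "Name: Ada"}, ["Ada prefers concise answers."])

# Session 2: a fresh process reloads the same memory.
core, archival_entries = load_state(state_file)
```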

You can find the project on GitHub at letta-ai/letta and install it via Docker or pip (pip install -U letta).