AI Engineering

Building Production-Grade Agent Memory

HatiData Team · 8 min read

The Prototype-to-Production Gap

Building agent memory for a prototype is easy. Store memories in a Python dictionary. Use a list for retrieval. Maybe add FAISS for vector search if you are feeling ambitious. The agent remembers things within a session, and that is enough for a demo.
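That prototype pattern fits in a few lines — an in-memory dict plus naive keyword retrieval (illustrative only; swap in FAISS for real vector search):

```python
# Prototype-grade agent memory: fine for a demo, gone on restart.
memories = {}  # memory_id -> content

def store_memory(memory_id, content):
    memories[memory_id] = content

def search_memory(query):
    # Naive keyword retrieval; a prototype might use FAISS here instead.
    return [c for c in memories.values() if query.lower() in c.lower()]

store_memory("m1", "Customer prefers annual billing")
store_memory("m2", "Escalated ticket about SSO outage")
print(search_memory("billing"))  # ['Customer prefers annual billing']
```

Every weakness described below — no persistence, no isolation, no access control — is already visible in those ten lines.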

Production is different. The agent runs across multiple processes, on multiple machines, serving multiple customers. The dictionary is gone when the process restarts. The FAISS index is lost when the container scales down. Customer A's memories are mixed with Customer B's memories. There is no encryption, no access control, no audit trail, no way to delete a specific user's data for GDPR compliance.

The gap between prototype memory and production memory is not a small incremental step — it is a fundamental architectural shift. This guide describes the seven properties that define production-grade agent memory and how HatiData implements each one.

Property 1: Persistence Across Restarts

The most basic requirement: memories must survive process restarts, container recycling, node failures, and deployments. If your agent learns something important about a customer on Monday, it must still know it on Friday, regardless of how many times the infrastructure has been recycled in between.

HatiData stores all memories in its embedded columnar engine, which writes to persistent storage (high-performance local SSD in cloud deployments, local disk for development). The engine uses a write-ahead log (WAL) for crash recovery — even if the process is killed mid-write, the data is either fully committed or fully rolled back on restart.

Vector embeddings are stored in a dedicated vector index with its own persistence layer, and the memory-to-embedding link is maintained by the shared memory_id UUID. If the vector index restarts, it reloads from persistent storage. If the embedding for a memory is lost (rare, but possible during a corruption event), HatiData detects the has_embedding = false flag and re-queues the memory for embedding.
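The recovery sweep amounts to scanning for records whose flag is unset and re-queuing them — sketched here against hypothetical rows (the table shape is an assumption, not HatiData's schema):

```python
# Hypothetical recovery sweep: find memories whose embeddings were lost.
# Rows mimic what a scan like
#   SELECT memory_id, has_embedding FROM memories
# might return; names are illustrative.
def find_missing_embeddings(rows):
    return [memory_id for memory_id, has_embedding in rows if not has_embedding]

rows = [("a1", True), ("b2", False), ("c3", True)]
print(find_missing_embeddings(rows))  # ['b2']
```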

Property 2: Namespace Isolation

Production agents serve multiple customers, teams, or use cases. A customer support agent handling Company A's tickets must not see Company B's interaction history. A research agent in the marketing namespace must not access memories in the engineering namespace.

HatiData enforces namespace isolation at the query pipeline level. Every API key specifies which namespaces it can access, and every query is automatically filtered to include only those namespaces. This is not a convention that agents must follow — it is an enforcement that agents cannot bypass.

API Key: hd_live_support_agent_A
Namespaces: [company_a/support, company_a/product]

store_memory(namespace="company_a/support", content="...") → Success
store_memory(namespace="company_b/support", content="...") → Permission Denied
search_memory(namespace="company_a/support", query="...") → Results from A only

Namespace isolation applies to all operations: memory storage, retrieval, search, deletion, and analytics queries. An agent cannot escape its namespace, even with crafted SQL injection attempts, because the filter is applied in the query pipeline after SQL parsing.
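The enforcement model can be sketched as a server-side check whose allowed set comes from the API key, never from the query — a simplified illustration, not HatiData's actual pipeline code:

```python
# Simplified sketch of namespace enforcement applied in the query pipeline.
# The allowed set is derived from the API key, never from query text,
# so a crafted query cannot widen its own scope.
KEY_NAMESPACES = {
    "hd_live_support_agent_A": {"company_a/support", "company_a/product"},
}

class PermissionDenied(Exception):
    pass

def enforce_namespace(api_key, requested_namespace):
    allowed = KEY_NAMESPACES.get(api_key, set())
    if requested_namespace not in allowed:
        raise PermissionDenied(requested_namespace)
    return requested_namespace

enforce_namespace("hd_live_support_agent_A", "company_a/support")  # ok
try:
    enforce_namespace("hd_live_support_agent_A", "company_b/support")
except PermissionDenied:
    print("Permission Denied")
```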

Property 3: Encryption

Production data must be encrypted at rest and in transit. This is a basic security requirement, but it has implications for memory architecture:

  • At rest — All memory data is encrypted with AES-256. CMEK (Customer-Managed Encryption Keys) is supported through AWS KMS, GCP Cloud KMS, or Azure Key Vault, giving customers full control over their encryption keys.
  • In transit — All connections use TLS 1.3. The MCP server, REST API, and Postgres wire protocol all enforce encryption. Plaintext connections are rejected.
  • Embeddings — Vector embeddings are encrypted at rest using the same key management infrastructure.

The encryption is transparent to agents — they store and retrieve memories through the same API regardless of encryption configuration. The performance impact is minimal because encryption/decryption is hardware-accelerated on modern CPUs.

Property 4: Access Controls

Not all agents should have the same access to the memory store. A ReadOnly analytics agent should be able to search memories for reporting but not store new ones. An Admin agent should have full access. A supervisor agent should be able to read across namespaces for monitoring.

HatiData's access control model has three layers:

  1. Key scope — ReadOnly keys can search and retrieve. Admin keys can store, search, retrieve, and delete.
  2. Namespace grants — Each key specifies its accessible namespaces. Cross-namespace access requires explicit grants.
  3. ABAC policies — Fine-grained rules can restrict memory operations based on attributes like time of day, content sensitivity tags, or agent identity.

These layers compose — an agent must have the right scope AND the right namespace AND pass all applicable policies for an operation to succeed.
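The composition rule can be sketched as a single AND over the three layers (the key and policy shapes are illustrative assumptions, not HatiData's API):

```python
# Sketch of the three composed layers: scope AND namespace AND policies.
# Key shape and policy signature are illustrative, not HatiData's API.
def authorize(key, op, namespace, policies, context):
    scope_ok = op in key["scope"]                  # layer 1: key scope
    ns_ok = namespace in key["namespaces"]         # layer 2: namespace grant
    policy_ok = all(p(context) for p in policies)  # layer 3: ABAC policies
    return scope_ok and ns_ok and policy_ok

readonly_key = {"scope": {"search", "retrieve"},
                "namespaces": {"company_a/support"}}
business_hours = lambda ctx: 9 <= ctx["hour"] < 17

print(authorize(readonly_key, "search", "company_a/support",
                [business_hours], {"hour": 10}))  # True
print(authorize(readonly_key, "store", "company_a/support",
                [business_hours], {"hour": 10}))  # False: scope denies writes
```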

Property 5: Hybrid Search

Production agents need more than vector similarity search. They need to combine semantic understanding with structured filters — "find memories similar to 'pricing concern' from the last 7 days in the enterprise namespace with high importance."

HatiData's hybrid search runs queries against both the columnar engine (structured data) and the vector index (vector embeddings), joining results by memory_id. The agent can use:

  • MCP tools — search_memory with query text, namespace filter, and similarity threshold
  • SQL functions — semantic_match() and semantic_rank() in standard SQL queries with arbitrary WHERE clauses, JOINs, and aggregations

The hybrid approach means agents are never forced to choose between semantic relevance and structured precision. They get both in a single operation.
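The example query above ("pricing concern", last 7 days, enterprise namespace, high importance) might look like this in SQL form — a hedged sketch: semantic_match() and semantic_rank() are the functions named above, but the table and column names are assumptions:

```python
# Illustrative hybrid-search SQL. semantic_match()/semantic_rank() are the
# functions described above; "memories" and its columns are assumed names.
query = """
SELECT memory_id, content,
       semantic_rank(content, 'pricing concern') AS score
FROM memories
WHERE namespace = 'enterprise'
  AND created_at > now() - INTERVAL '7 days'
  AND importance = 'high'
  AND semantic_match(content, 'pricing concern')
ORDER BY score DESC
LIMIT 10;
"""
print(query)
```

The structured predicates (namespace, recency, importance) run against the columnar engine; the semantic predicates run against the vector index, joined by memory_id.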

Property 6: Embedding Pipeline

Every memory needs a vector embedding for semantic search. In production, the embedding pipeline must be:

  • Asynchronous — Memory writes should not block on embedding computation
  • Batched — Multiple memories should be embedded in a single model call for efficiency
  • Resilient — Failed embeddings should be retried, not silently dropped
  • Configurable — Different embedding models for different use cases

HatiData's embedding pipeline consists of three components:

Embedding Service

The embedding model runs as a service alongside the data proxy. It accepts batches of text and returns vector embeddings. The model is configurable — you can use different model sizes (small for speed, large for accuracy) or swap in a different model entirely.

Batch Processor

An asynchronous batch processor receives embedding requests through a bounded channel. It accumulates requests up to a configurable batch size (default 32) or flush interval (default 100ms), then sends the batch to the embedding service. This amortizes the model invocation overhead across multiple memories.
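The accumulate-then-flush loop can be sketched with a bounded asyncio queue, using the default batch size and flush interval quoted above (all other details — the embedding stand-in, the demo exit — are illustrative):

```python
import asyncio

BATCH_SIZE = 32       # default batch size described above
FLUSH_INTERVAL = 0.1  # 100 ms default flush interval

async def embed_batch(texts):
    """Stand-in for the embedding service: one call embeds many texts."""
    return [[float(len(t))] for t in texts]

async def batch_worker(queue, results):
    batch = []
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=FLUSH_INTERVAL)
        except asyncio.TimeoutError:
            item = None
        if item is not None:
            batch.append(item)
        # Flush when the batch is full or the flush interval expires.
        if batch and (len(batch) >= BATCH_SIZE or item is None):
            embeddings = await embed_batch([text for _, text in batch])
            for (memory_id, _), emb in zip(batch, embeddings):
                results[memory_id] = emb
            batch.clear()
        if item is None and queue.empty():
            return  # demo exit; a production worker runs until shutdown

def run_demo(n=5):
    async def main():
        queue = asyncio.Queue(maxsize=1024)  # bounded channel
        results = {}
        for i in range(n):
            await queue.put((f"m{i}", f"memory text {i}"))
        await batch_worker(queue, results)
        return results
    return asyncio.run(main())

print(len(run_demo()))  # 5
```

Because the channel is bounded, a flood of writes applies backpressure to producers instead of exhausting memory.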

Write Coordinator

The write coordinator manages the memory storage flow:

  1. Write the memory record to the query engine (immediate, synchronous)
  2. Dispatch the embedding request to the batch processor (asynchronous, non-blocking)
  3. When the embedding completes, update has_embedding = true in the query engine and store the vector in the vector index

This design means store_memory returns immediately — the agent does not wait for embedding computation. The memory is queryable via SQL right away, and becomes semantically searchable once the embedding pipeline processes it (typically within milliseconds for low-load scenarios, seconds for high-load).
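The three-step flow above can be sketched with dict stand-ins for the two storage backends (illustrative only, not HatiData internals):

```python
# Sketch of the write-coordinator flow; dicts stand in for real backends.
engine = {}        # columnar engine: memory records
vector_index = {}  # vector index: memory_id -> embedding
pending = []       # embedding requests handed to the batch processor

def store_memory(memory_id, content):
    # Step 1: synchronous write; the memory is SQL-queryable immediately.
    engine[memory_id] = {"content": content, "has_embedding": False}
    # Step 2: dispatch the embedding request (non-blocking).
    pending.append((memory_id, content))
    return memory_id  # returns before any embedding work happens

def embedding_completed(memory_id, embedding):
    # Step 3: store the vector and flip the flag.
    vector_index[memory_id] = embedding
    engine[memory_id]["has_embedding"] = True

store_memory("m1", "Customer prefers annual billing")
print(engine["m1"]["has_embedding"])  # False: queryable, not yet searchable
embedding_completed("m1", [0.1, 0.2])
print(engine["m1"]["has_embedding"])  # True: now semantically searchable
```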

Access Tracking

The access tracker monitors memory access patterns using concurrent atomic counters. Each memory's access count is incremented on retrieval and periodically flushed to the query engine in batches. This data powers memory analytics — which memories are accessed most frequently, which namespaces have the highest retrieval rates, and which memories might be candidates for archival.
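The increment-then-batch-flush pattern can be sketched as follows (illustrative: HatiData uses lock-free atomic counters, whereas this Python sketch uses a lock for the same effect):

```python
import threading
from collections import defaultdict

# Sketch of access tracking: per-memory counters incremented on retrieval,
# periodically flushed to the engine in one batch. A lock stands in for
# the concurrent atomic counters described above.
class AccessTracker:
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)
        self.flushed = defaultdict(int)  # stand-in for engine-side totals

    def record_access(self, memory_id):
        with self._lock:
            self._counts[memory_id] += 1

    def flush(self):
        # Called periodically; swaps the live counters out, then writes
        # the accumulated counts as a single batch.
        with self._lock:
            batch, self._counts = self._counts, defaultdict(int)
        for memory_id, n in batch.items():
            self.flushed[memory_id] += n

tracker = AccessTracker()
for _ in range(3):
    tracker.record_access("m1")
tracker.record_access("m2")
tracker.flush()
print(dict(tracker.flushed))  # {'m1': 3, 'm2': 1}
```

Swapping the counter map under the lock keeps the flush cheap: retrievals never wait on the engine write.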

Property 7: Scalability

Production memory stores grow continuously. An agent that stores 100 memories per day accumulates 36,500 memories per year. A fleet of 50 agents accumulates nearly 2 million memories per year. The memory system must handle this growth without degrading search performance.

HatiData scales memory storage through:

  • Columnar storage — Efficient compression for structured memory data, with sub-millisecond point lookups and fast analytical scans
  • Vector index — Approximate nearest neighbor search that maintains sub-5ms latency regardless of collection size, up to millions of vectors
  • Partitioning by namespace — Large deployments can partition memory tables by namespace, ensuring that queries only scan relevant data
  • Archival policies — Memories can be automatically archived (moved to cold storage) based on age or access frequency, reducing the hot data footprint

The target performance characteristics:

Metric                     Target          Notes
Memory write               <1ms            Synchronous engine insert
Memory search (semantic)   <5ms p50        Vector index + engine join
Memory search (SQL only)   <2ms p50        Engine only
Embedding pipeline         <100ms p99      Async, batched
Collection size            1M+ memories    Per namespace

The Checklist

Before deploying agent memory to production, verify these properties:

  • [ ] Memories persist across process restarts and deployments
  • [ ] Namespaces isolate data between customers/teams/use cases
  • [ ] Data is encrypted at rest (AES-256) and in transit (TLS 1.3)
  • [ ] Access controls enforce least-privilege per agent
  • [ ] Hybrid search combines semantic and structured queries
  • [ ] Embedding pipeline is asynchronous, batched, and resilient
  • [ ] System scales to expected memory volume without degradation
  • [ ] GDPR compliance: memories can be deleted by user/customer
  • [ ] Audit trail records all memory access for compliance

HatiData addresses every item on this checklist as a built-in feature, not an add-on configuration.

Next Steps

For a hands-on introduction to production memory, see the LangChain persistent memory cookbook. For architecture details, see the persistent memory documentation. For multi-agent memory sharing patterns, see the CrewAI shared memory cookbook.


Ready to see the difference?

Run the free audit script in 5 minutes. Or start Shadow Mode and see HatiData run your actual workloads side-by-side.