Engineering

Building Agentic RAG Without a Vector Database

HatiData Team · 6 min read

The Two-Database Problem

Every team building agentic RAG hits the same architectural wall. Your structured data lives in a warehouse. Your embeddings live in a vector database. Your agent needs both. So you build an orchestration layer that queries two systems, merges the results, manages two sets of credentials, monitors two sets of metrics, and pays two bills.

This architecture is not just expensive — it is fragile. Every cross-system query is a potential failure point. Every schema change in the warehouse requires a corresponding update in the vector index. Every new data source must be ingested into both systems. The operational overhead compounds with every component you add.

What if you did not need the second database at all?

Agent Memory as a First-Class Primitive

HatiData includes vector-indexed agent memory as a native capability of the query engine. This is not a bolt-on integration or a separate service — it is a storage primitive that sits alongside your tables, governed by the same access controls, backed by the same audit trail, and queryable with the same SQL interface.

Here is what that looks like in practice:

```python
from hatidata import Client

client = Client(host="your-cluster.vpc.hatidata.com", port=5439)

# Store a memory with embedding, tags, and TTL
client.memory.store(
    session_id="agent-session-42",
    content="Customer prefers email communication. Last order was SKU-8821.",
    tags=["customer-pref", "order-history"],
    ttl_hours=72
)

# Semantic search across agent memories
results = client.memory.search(
    query="What are this customer's communication preferences?",
    session_id="agent-session-42",
    limit=5
)

for memory in results:
    print(f"[{memory.similarity:.2f}] {memory.content}")
```

The memory store handles embedding generation, vector indexing, and similarity search internally. You do not need to manage an embedding model, configure an index, or tune HNSW parameters. Store text, search by meaning.
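To make "search by meaning" concrete, here is a toy sketch of the underlying idea — ranking memories by cosine similarity between embedding vectors. This is an illustration of the general technique, not HatiData's internals, and the three-dimensional vectors are hand-picked stand-ins for what an embedding model would produce:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings; in practice these come from an embedding model
memories = [
    ("Customer prefers email communication.", [0.9, 0.1, 0.2]),
    ("Last order was SKU-8821.", [0.2, 0.8, 0.3]),
]
query_vec = [0.85, 0.15, 0.25]  # embedding of the communication-preferences query

# Rank memories by similarity to the query, most similar first
ranked = sorted(memories, key=lambda m: cosine(m[1], query_vec), reverse=True)
print(ranked[0][0])  # → Customer prefers email communication.
```

A production index (HNSW or similar) avoids the brute-force scan, but the ranking criterion is the same.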

Combining SQL and Memory in a Single Query

The real power emerges when you combine structured SQL queries with semantic memory retrieval. Traditional architectures require you to query both systems separately and merge the results in application code. With HatiData, it is a single operation:

```python
# Retrieve customer data AND relevant agent memories in one call
context = client.query_with_memory(
    sql="SELECT customer_id, plan_tier, last_login FROM customers WHERE customer_id = 'C-1042'",
    memory_query="Previous interactions and preferences for this customer",
    session_id="agent-session-42",
    memory_limit=3
)

# context.rows contains the SQL results
# context.memories contains semantically relevant memories
# Both are returned in a single round trip
```

This is not syntactic sugar over two separate calls. The query engine executes both operations in the same process, on the same node, with shared access to the session context. The latency is a single round trip, not two.
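In application code, the combined result typically feeds straight into an LLM prompt. Here is a minimal sketch of that assembly step, using plain dicts and strings as stand-ins for `context.rows` and `context.memories` (the prompt format is our own, not a HatiData API):

```python
def build_prompt(rows, memories, question):
    """Assemble an LLM prompt from SQL rows and retrieved memories."""
    facts = "\n".join(
        ", ".join(f"{k}={v}" for k, v in row.items()) for row in rows
    )
    recall = "\n".join(f"- {m}" for m in memories)
    return (
        f"Structured data:\n{facts}\n\n"
        f"Relevant memories:\n{recall}\n\n"
        f"Question: {question}"
    )

# Stand-ins for context.rows and context.memories
rows = [{"customer_id": "C-1042", "plan_tier": "pro", "last_login": "2024-05-01"}]
memories = ["Customer prefers email communication."]

prompt = build_prompt(rows, memories, "How should we reach out to this customer?")
print(prompt)
```

Because both halves of the context arrive in one response, this assembly step is the only merge logic the application has to carry.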

Session Context That Persists

Agentic RAG is not a single query — it is a conversation. The agent retrieves context, reasons about it, generates a response, and then needs to remember what it learned for the next iteration. Traditional architectures require an external state store (Redis, DynamoDB) to maintain this session context.

HatiData's session layer maintains context natively:

```python
# Store intermediate reasoning results
client.memory.store(
    session_id="agent-session-42",
    content="Analysis complete: customer C-1042 is at risk of churn based on declining login frequency",
    tags=["analysis", "churn-risk"],
    ttl_hours=24
)

# Later in the session, the agent can retrieve its own reasoning
prior_analysis = client.memory.search(
    query="What did I determine about this customer's churn risk?",
    session_id="agent-session-42",
    tags=["analysis"],
    limit=1
)
```

The agent's working memory, long-term knowledge, and structured data all live in the same system. No Redis. No DynamoDB. No glue code.
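The pattern generalizes to the whole agent loop: each iteration writes its conclusions back, and the next iteration reads them. A toy in-memory stand-in (plain Python, no HatiData client, tag filtering in place of semantic ranking) shows the shape of that loop:

```python
class SessionMemory:
    """Toy stand-in for a session-scoped memory store."""

    def __init__(self):
        self._items = []  # list of (content, tags) pairs

    def store(self, content, tags=()):
        self._items.append((content, set(tags)))

    def search(self, tags=(), limit=1):
        # Naive tag filter; a real store would rank by semantic similarity
        wanted = set(tags)
        hits = [content for content, t in self._items if wanted <= t]
        return hits[-limit:]

memory = SessionMemory()

# Iteration 1: the agent records its reasoning
memory.store("C-1042 is at risk of churn", tags=["analysis", "churn-risk"])

# Iteration 2: the agent retrieves what it learned earlier
prior = memory.search(tags=["analysis"], limit=1)
print(prior)  # → ['C-1042 is at risk of churn']
```

The point of the native session layer is that this state machine lives inside the database, so the loop survives process restarts without any external store.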

TTL-Based Memory Management

Not all memories should live forever. Customer preferences might be relevant for days. Session-specific reasoning might only matter for hours. Cached API responses might expire in minutes.

HatiData's memory store supports TTL at the individual memory level. Expired memories are automatically cleaned up — no cron jobs, no manual garbage collection, no storage bloat from stale embeddings.
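The expiry rule itself is easy to reason about. Here is a sketch of the check a store like this might apply — illustrative only, not HatiData's implementation — assuming each memory records a timezone-aware `stored_at` timestamp:

```python
from datetime import datetime, timedelta, timezone

def is_expired(stored_at, ttl_hours, now=None):
    # A memory expires once `now` reaches stored_at + ttl_hours
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= stored_at + timedelta(hours=ttl_hours)

stored_at = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)

# 72-hour TTL: still live after 71 hours, expired after 73
assert not is_expired(stored_at, 72, now=stored_at + timedelta(hours=71))
assert is_expired(stored_at, 72, now=stored_at + timedelta(hours=73))
```

Whether expiry is enforced at read time, by background compaction, or both is an engine-level detail; the contract to the application is simply that expired memories never come back from a search.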

The Architecture Simplification

Eliminating the vector database is not just about cost savings, though those are significant. It is about architectural simplification:

  • One set of credentials instead of two
  • One audit trail instead of fragmented logs across systems
  • One access control layer instead of separate RBAC configurations
  • One failure domain instead of cross-system failure modes
  • One bill instead of warehouse plus vector DB plus session store

For teams building production agentic RAG systems, this simplification translates directly into faster development, easier debugging, and more reliable production deployments.

Explore the full implementation guide in our Agentic RAG Playbook, including patterns for multi-agent memory sharing, cross-session knowledge persistence, and memory-augmented SQL generation.

