LlamaIndex + HatiData: Agentic RAG with SQL
Beyond Document RAG
Most RAG implementations follow the same pattern: chunk documents, embed chunks, store in a vector database, retrieve by similarity, stuff into a prompt. This works well for unstructured text — product documentation, support articles, knowledge base entries. But a large fraction of enterprise knowledge lives in structured data: databases, data warehouses, analytics platforms.
An agent asked "What were our top 5 products by revenue last quarter?" cannot answer that question from document chunks. It needs to write and execute a SQL query against a database. An agent asked "Find customer complaints similar to 'slow API response times' from enterprise accounts" needs both semantic search and structured filtering. This is agentic RAG — retrieval that combines multiple strategies, with the agent deciding which approach to use for each question.
LlamaIndex provides the building blocks for agentic RAG through its QueryEngine, SQLTableRetriever, and AgentRunner abstractions. HatiData provides the backend that unifies structured and semantic retrieval in a single system. Together, they enable agents that can answer questions requiring SQL queries, semantic search, or both.
Connecting LlamaIndex to HatiData
HatiData speaks the Postgres wire protocol, so LlamaIndex's SQL integration works out of the box:
from llama_index.core import SQLDatabase
from sqlalchemy import create_engine
# Connect to HatiData via Postgres wire protocol
engine = create_engine("postgresql://admin:password@localhost:5439/hatidata")
sql_database = SQLDatabase(engine, include_tables=["customers", "orders", "products", "agent_memories"])

This connection gives LlamaIndex full access to HatiData's SQL capabilities, including tables, views, and HatiData's custom semantic functions.
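Before handing the engine to LlamaIndex, it can be worth confirming the wire protocol is actually reachable. A minimal sketch using SQLAlchemy (the DSN is the example one from above; substitute your own):

```python
from sqlalchemy import create_engine, text

def check_connection(engine) -> bool:
    """Return True if the database answers a trivial query over the wire."""
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar() == 1

# Hypothetical HatiData DSN, matching the example above:
# engine = create_engine("postgresql://admin:password@localhost:5439/hatidata")
# print(check_connection(engine))
```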
Text-to-SQL with NLSQLTableQueryEngine
LlamaIndex's NLSQLTableQueryEngine converts natural language questions into SQL queries. With HatiData as the backend, the agent can query any table in its namespace:
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4o")
query_engine = NLSQLTableQueryEngine(
sql_database=sql_database,
tables=["customers", "orders", "products"],
llm=llm,
)
# Natural language to SQL
response = query_engine.query("What were the top 5 products by revenue last quarter?")
print(response)
# The engine generates SQL, executes it against HatiData, and returns a natural language answer

The LLM generates the SQL query, HatiData executes it through its multi-stage pipeline (including policy checks, namespace filtering, and column masking), and the results flow back through LlamaIndex for natural language synthesis.
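When debugging text-to-SQL, it helps to log the statement the engine actually executed. Recent LlamaIndex versions expose it on the response metadata; a small helper (the `sql_query` key name is an assumption to verify against your installed version):

```python
def extract_sql(response) -> str:
    """Pull the generated SQL out of a query-engine response.

    Assumes NLSQLTableQueryEngine stores the executed statement under the
    response's ``metadata["sql_query"]`` key.
    """
    return response.metadata.get("sql_query", "<no sql recorded>")

# Usage: print(extract_sql(query_engine.query("Top 5 products by revenue?")))
```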
Hybrid Retrieval: SQL + Vector
The real power of HatiData with LlamaIndex is hybrid retrieval — combining text-to-SQL for structured questions with vector search for semantic questions. This requires a router that decides which retrieval strategy to use:
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool
# SQL query engine for structured data
sql_tool = QueryEngineTool.from_defaults(
query_engine=sql_query_engine,
description="Useful for answering questions about specific metrics, counts, aggregations, and structured data queries. Use this for questions like 'how many', 'what is the total', 'top N by', etc.",
)
# Vector query engine for semantic search
vector_tool = QueryEngineTool.from_defaults(
query_engine=vector_query_engine,
description="Useful for finding information by meaning or concept. Use this for questions like 'find similar', 'what do we know about', 'summarize insights on', etc.",
)
# Router decides which engine to use
router_engine = RouterQueryEngine(
selector=LLMSingleSelector.from_defaults(llm=llm),
query_engine_tools=[sql_tool, vector_tool],
)
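For intuition about the decision the LLM selector is making, the routing choice can be caricatured as a keyword heuristic. This is an illustration only, not a substitute for LLMSingleSelector:

```python
def pick_engine(question: str) -> str:
    """Crude stand-in for the router's choice: structured-metric phrasing
    goes to the SQL engine, everything else to vector search."""
    sql_markers = ("how many", "total", "top", "count", "average", "revenue")
    q = question.lower()
    return "sql" if any(marker in q for marker in sql_markers) else "vector"
```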
# The router handles both types of questions
response = router_engine.query("What were our top products by revenue?") # Routes to SQL
response = router_engine.query("What do customers say about our pricing?") # Routes to vector

Using HatiData's Semantic SQL Functions
For questions that genuinely require both structured and semantic retrieval, you can use HatiData's semantic_match() and semantic_rank() functions directly in SQL:
from llama_index.core.query_engine import NLSQLTableQueryEngine
# Custom prompt that teaches the LLM about semantic SQL functions
custom_prompt = """You are a SQL expert. You have access to HatiData's semantic SQL functions:
- semantic_match(column, query, threshold) - returns true if content is semantically similar
- semantic_rank(column, query) - returns similarity score (0-1)
Use these functions when the question requires finding data by meaning, not exact values.
Example: SELECT content, semantic_rank(content, 'pricing concerns') as relevance
FROM agent_memories WHERE semantic_rank(content, 'pricing concerns') > 0.7
ORDER BY relevance DESC"""
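If you issue semantic queries yourself outside the engine, parameterize the search text rather than interpolating it into the SQL string. A sketch using SQLAlchemy bind parameters, assuming the semantic_rank function and agent_memories table shown above:

```python
from sqlalchemy import text

# Parameterized form of the example query from the prompt above.
semantic_query = text(
    "SELECT content, semantic_rank(content, :q) AS relevance "
    "FROM agent_memories "
    "WHERE semantic_rank(content, :q) > :threshold "
    "ORDER BY relevance DESC"
)

# Usage against a live connection:
# with engine.connect() as conn:
#     rows = conn.execute(semantic_query, {"q": "pricing concerns", "threshold": 0.7})
```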
hybrid_engine = NLSQLTableQueryEngine(
sql_database=sql_database,
tables=["customers", "orders", "agent_memories"],
llm=llm,
text_to_sql_prompt=custom_prompt,
)
# Now the engine can generate hybrid queries
response = hybrid_engine.query(
"Find enterprise customers who have expressed pricing concerns in the last month"
)
# Generates SQL that joins customers table with semantic search on agent_memories

Building an Agentic RAG System
For a fully agentic RAG system, combine multiple query engines with LlamaIndex's agent framework:
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, FunctionTool
# Tool 1: SQL queries
sql_tool = QueryEngineTool.from_defaults(
query_engine=sql_query_engine,
name="sql_query",
description="Query structured data with SQL",
)
# Tool 2: Memory search
memory_tool = QueryEngineTool.from_defaults(
query_engine=memory_query_engine,
name="memory_search",
description="Search agent memories by meaning",
)
# Tool 3: Store memory (custom function)
def store_finding(content: str) -> str:
    """Store an important finding for future reference."""
    import httpx
    response = httpx.post(
        "http://localhost:5439/v1/memory/store",
        headers={"Authorization": "Bearer hd_live_your_key"},
        json={"content": content, "namespace": "research"},
    )
    response.raise_for_status()  # surface HTTP errors instead of a confusing KeyError
    return f"Stored: {response.json()['memory_id']}"
store_tool = FunctionTool.from_defaults(fn=store_finding)
# Create ReAct agent with all tools
agent = ReActAgent.from_tools(
tools=[sql_tool, memory_tool, store_tool],
llm=llm,
verbose=True,
)
# The agent decides which tools to use
response = agent.chat("Analyze our enterprise customer churn and compare with past findings")
# Agent: 1) memory_search for past churn analysis
#        2) sql_query for current churn data
#        3) Synthesize findings
#        4) store_finding with new insights

SubQuestion Query Engine
For complex questions that require multiple data sources, LlamaIndex's SubQuestion engine breaks the question into sub-questions and routes each to the appropriate engine:
from llama_index.core.query_engine import SubQuestionQueryEngine
sub_question_engine = SubQuestionQueryEngine.from_defaults(
query_engine_tools=[sql_tool, memory_tool],
llm=llm,
)
response = sub_question_engine.query(
"How does our current Q1 churn rate compare to what we predicted last quarter, and what do customer memories suggest about the main drivers?"
)
# Generates sub-questions:
# 1. "What is the current Q1 churn rate?" → SQL query
# 2. "What churn prediction did we make last quarter?" → Memory search
# 3. "What do customer memories say about churn drivers?" → Memory search
# Synthesizes all sub-answers into a coherent response

Production Architecture
For production agentic RAG with LlamaIndex and HatiData:
- Connection pooling — Use SQLAlchemy connection pooling with pool_size=10 and max_overflow=20 for concurrent agent queries
- Query caching — LlamaIndex supports response caching. Enable it for frequently asked questions to reduce HatiData query load
- Error handling — HatiData's AI heal step (pipeline step 10) automatically corrects minor SQL errors generated by the LLM, reducing the failure rate of text-to-SQL generation
- Namespace scoping — Use different HatiData API keys for different LlamaIndex agents, ensuring each agent only accesses its authorized data
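The connection-pooling recommendation can be sketched as engine configuration. pool_pre_ping is a standard SQLAlchemy option added here beyond the checklist; it validates connections before reuse:

```python
# Pool settings from the checklist above, ready to pass to SQLAlchemy.
POOL_KWARGS = {
    "pool_size": 10,        # persistent connections kept open
    "max_overflow": 20,     # extra connections allowed under burst load
    "pool_pre_ping": True,  # validate each connection before handing it out
}

# Usage (DSN is the example one from earlier in this guide):
# from sqlalchemy import create_engine
# engine = create_engine(
#     "postgresql://admin:password@localhost:5439/hatidata", **POOL_KWARGS
# )
```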
Next Steps
The LlamaIndex integration leverages HatiData's Postgres wire protocol compatibility and semantic SQL functions. For more memory-focused patterns, see the LangChain deep dive. For multi-agent memory sharing, see the CrewAI guide. For the full range of 24 MCP tools, see the Claude MCP integration guide.