LlamaIndex + HatiData: Agentic RAG with SQL
Beyond Document RAG
Most RAG implementations follow the same pattern: chunk documents, embed chunks, store in a vector database, retrieve by similarity, stuff into a prompt. This works well for unstructured text — product documentation, support articles, knowledge base entries. But a large fraction of enterprise knowledge lives in structured data: databases, data warehouses, analytics platforms.
An agent asked "What were our top 5 products by revenue last quarter?" cannot answer that question from document chunks. It needs to write and execute a SQL query against a database. An agent asked "Find customer complaints similar to 'slow API response times' from enterprise accounts" needs both semantic search and structured filtering. This is agentic RAG — retrieval that combines multiple strategies, with the agent deciding which approach to use for each question.
LlamaIndex provides the building blocks for agentic RAG through its QueryEngine, SQLTableRetriever, and AgentRunner abstractions. HatiData provides the backend that unifies structured and semantic retrieval in a single system. Together, they enable agents that can answer questions requiring SQL queries, semantic search, or both.
Connecting LlamaIndex to HatiData
HatiData speaks the Postgres wire protocol, so LlamaIndex's SQL integration works out of the box:
from llama_index.core import SQLDatabase
from sqlalchemy import create_engine
# Connect to HatiData via Postgres wire protocol
engine = create_engine("postgresql://admin:password@localhost:5439/hatidata")
sql_database = SQLDatabase(engine, include_tables=["customers", "orders", "products", "agent_memories"])

This connection gives LlamaIndex full access to HatiData's SQL capabilities, including tables, views, and HatiData's custom semantic functions.
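Before handing the engine to LlamaIndex, it can be worth confirming the wire protocol is actually reachable. A minimal sketch using SQLAlchemy (the DSN is the example one from above; substitute your own):

```python
from sqlalchemy import create_engine, text

def check_connection(engine) -> bool:
    """Return True if the database answers a trivial query over the wire."""
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar() == 1

# Hypothetical HatiData DSN, matching the example above:
# engine = create_engine("postgresql://admin:password@localhost:5439/hatidata")
# print(check_connection(engine))
```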
Text-to-SQL with NLSQLTableQueryEngine
LlamaIndex's NLSQLTableQueryEngine converts natural language questions into SQL queries. With HatiData as the backend, the agent can query any table in its namespace:
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4o")
query_engine = NLSQLTableQueryEngine(
sql_database=sql_database,
tables=["customers", "orders", "products"],
llm=llm,
)
# Natural language to SQL
response = query_engine.query("What were the top 5 products by revenue last quarter?")
print(response)
# The engine generates SQL, executes it against HatiData, and returns a natural language answer

The LLM generates the SQL query, HatiData executes it through its multi-stage pipeline (including policy checks, namespace filtering, and column masking), and the results flow back through LlamaIndex for natural language synthesis.
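When debugging text-to-SQL, it helps to log the statement the engine actually executed. Recent LlamaIndex versions expose it on the response metadata; a small helper (the `sql_query` key name is an assumption to verify against your installed version):

```python
def extract_sql(response) -> str:
    """Pull the generated SQL out of a query-engine response.

    Assumes NLSQLTableQueryEngine stores the executed statement under the
    response's ``metadata["sql_query"]`` key.
    """
    return response.metadata.get("sql_query", "<no sql recorded>")

# Usage: print(extract_sql(query_engine.query("Top 5 products by revenue?")))
```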
Hybrid Retrieval: SQL + Vector
The real power of HatiData with LlamaIndex is hybrid retrieval — combining text-to-SQL for structured questions with vector search for semantic questions. This requires a router that decides which retrieval strategy to use:
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool
# SQL query engine for structured data
sql_tool = QueryEngineTool.from_defaults(
query_engine=sql_query_engine,
description="Useful for answering questions about specific metrics, counts, aggregations, and structured data queries. Use this for questions like 'how many', 'what is the total', 'top N by', etc.",
)
# Vector query engine for semantic search
vector_tool = QueryEngineTool.from_defaults(
query_engine=vector_query_engine,
description="Useful for finding information by meaning or concept. Use this for questions like 'find similar', 'what do we know about', 'summarize insights on', etc.",
)
# Router decides which engine to use
router_engine = RouterQueryEngine(
selector=LLMSingleSelector.from_defaults(llm=llm),
query_engine_tools=[sql_tool, vector_tool],
)
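For intuition about the decision the LLM selector is making, the routing choice can be caricatured as a keyword heuristic. This is an illustration only, not a substitute for LLMSingleSelector:

```python
def pick_engine(question: str) -> str:
    """Crude stand-in for the router's choice: structured-metric phrasing
    goes to the SQL engine, everything else to vector search."""
    sql_markers = ("how many", "total", "top", "count", "average", "revenue")
    q = question.lower()
    return "sql" if any(marker in q for marker in sql_markers) else "vector"
```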
# The router handles both types of questions
response = router_engine.query("What were our top products by revenue?") # Routes to SQL
response = router_engine.query("What do customers say about our pricing?") # Routes to vector

Using HatiData's Semantic SQL Functions
For questions that genuinely require both structured and semantic retrieval, you can use HatiData's semantic_match() and semantic_rank() functions directly in SQL:
from llama_index.core.query_engine import NLSQLTableQueryEngine
# Custom prompt that teaches the LLM about semantic SQL functions
custom_prompt = """You are a SQL expert. You have access to HatiData's semantic SQL functions:
- semantic_match(column, query, threshold) - returns true if content is semantically similar
- semantic_rank(column, query) - returns similarity score (0-1)
Use these functions when the question requires finding data by meaning, not exact values.
Example: SELECT content, semantic_rank(content, 'pricing concerns') as relevance
FROM agent_memories WHERE semantic_rank(content, 'pricing concerns') > 0.7
ORDER BY relevance DESC"""
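If you issue semantic queries yourself outside the engine, parameterize the search text rather than interpolating it into the SQL string. A sketch using SQLAlchemy bind parameters, assuming the semantic_rank function and agent_memories table shown above:

```python
from sqlalchemy import text

# Parameterized form of the example query from the prompt above.
semantic_query = text(
    "SELECT content, semantic_rank(content, :q) AS relevance "
    "FROM agent_memories "
    "WHERE semantic_rank(content, :q) > :threshold "
    "ORDER BY relevance DESC"
)

# Usage against a live connection:
# with engine.connect() as conn:
#     rows = conn.execute(semantic_query, {"q": "pricing concerns", "threshold": 0.7})
```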
hybrid_engine = NLSQLTableQueryEngine(
sql_database=sql_database,
tables=["customers", "orders", "agent_memories"],
llm=llm,
text_to_sql_prompt=custom_prompt,
)
# Now the engine can generate hybrid queries
response = hybrid_engine.query(
"Find enterprise customers who have expressed pricing concerns in the last month"
)
# Generates SQL that joins customers table with semantic search on agent_memories

Building an Agentic RAG System
For a fully agentic RAG system, combine multiple query engines with LlamaIndex's agent framework:
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, FunctionTool
# Tool 1: SQL queries
sql_tool = QueryEngineTool.from_defaults(
query_engine=sql_query_engine,
name="sql_query",
description="Query structured data with SQL",
)
# Tool 2: Memory search
memory_tool = QueryEngineTool.from_defaults(
query_engine=memory_query_engine,
name="memory_search",
description="Search agent memories by meaning",
)
# Tool 3: Store memory (custom function)
def store_finding(content: str) -> str:
    """Store an important finding for future reference."""
    import httpx
    response = httpx.post(
        "http://localhost:5439/v1/memory/store",
        headers={"Authorization": "Bearer hd_live_your_key"},
        json={"content": content, "namespace": "research"},
    )
    response.raise_for_status()  # surface HTTP errors instead of a confusing KeyError
    return f"Stored: {response.json()['memory_id']}"
store_tool = FunctionTool.from_defaults(fn=store_finding)
# Create ReAct agent with all tools
agent = ReActAgent.from_tools(
tools=[sql_tool, memory_tool, store_tool],
llm=llm,
verbose=True,
)
# The agent decides which tools to use
response = agent.chat("Analyze our enterprise customer churn and compare with past findings")
# Agent: 1) memory_search for past churn analysis
#        2) sql_query for current churn data
#        3) Synthesize findings
#        4) store_finding with new insights

SubQuestion Query Engine
For complex questions that require multiple data sources, LlamaIndex's SubQuestion engine breaks the question into sub-questions and routes each to the appropriate engine:
from llama_index.core.query_engine import SubQuestionQueryEngine
sub_question_engine = SubQuestionQueryEngine.from_defaults(
query_engine_tools=[sql_tool, memory_tool],
llm=llm,
)
response = sub_question_engine.query(
"How does our current Q1 churn rate compare to what we predicted last quarter, and what do customer memories suggest about the main drivers?"
)
# Generates sub-questions:
# 1. "What is the current Q1 churn rate?" → SQL query
# 2. "What churn prediction did we make last quarter?" → Memory search
# 3. "What do customer memories say about churn drivers?" → Memory search
# Synthesizes all sub-answers into a coherent response

Production Architecture
For production agentic RAG with LlamaIndex and HatiData:
- Connection pooling — Use SQLAlchemy connection pooling with pool_size=10 and max_overflow=20 for concurrent agent queries
- Query caching — LlamaIndex supports response caching. Enable it for frequently asked questions to reduce HatiData query load
- Error handling — HatiData's AI heal step (pipeline step 10) automatically corrects minor SQL errors generated by the LLM, reducing the failure rate of text-to-SQL generation
- Namespace scoping — Use different HatiData API keys for different LlamaIndex agents, ensuring each agent only accesses its authorized data
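The connection-pooling recommendation can be sketched as engine configuration. pool_pre_ping is a standard SQLAlchemy option added here beyond the checklist; it validates connections before reuse:

```python
# Pool settings from the checklist above, ready to pass to SQLAlchemy.
POOL_KWARGS = {
    "pool_size": 10,        # persistent connections kept open
    "max_overflow": 20,     # extra connections allowed under burst load
    "pool_pre_ping": True,  # validate each connection before handing it out
}

# Usage (DSN is the example one from earlier in this guide):
# from sqlalchemy import create_engine
# engine = create_engine(
#     "postgresql://admin:password@localhost:5439/hatidata", **POOL_KWARGS
# )
```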
Next Steps
The LlamaIndex integration leverages HatiData's Postgres wire protocol compatibility and semantic SQL functions. For more memory-focused patterns, see the LangChain deep dive. For multi-agent memory sharing, see the CrewAI guide. For the full range of 24 MCP tools, see the Claude MCP integration guide.