LlamaIndex + HatiData: SQL-Based RAG Pipeline
Build a RAG pipeline that uses HatiData's SQL interface for structured data retrieval alongside vector search for unstructured content.
What You'll Build
A LlamaIndex RAG pipeline that uses HatiData's SQL interface for structured data retrieval alongside vector search.
Prerequisites
- pip install llama-index hatidata-agent
- hati init
- An OpenAI API key
Architecture
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  LlamaIndex  │───▶│   HatiData   │───▶│    Engine    │
│ Query Engine │     │  SQL + Vec   │     │ (structured) │
└──────────────┘     │              │     └──────────────┘
 NL → SQL + vector   │              │───▶┌──────────────┐
                     └──────────────┘    │ Vector Index │
                                         │(unstructured)│
                                         └──────────────┘
Key Concepts
- Hybrid RAG: combine SQL queries for structured data with vector search for unstructured content in a single pipeline
- SQL tables vs. memory layer: structured data lives in regular SQL tables, unstructured data lives in the memory layer with embeddings
- Single data platform: no need to orchestrate separate SQL and vector databases — HatiData serves both from one endpoint
- LlamaIndex integration: use LlamaIndex query engines and LLMs on top of HatiData's hybrid retrieval
Step-by-Step Implementation
Install Dependencies
Install LlamaIndex and the HatiData agent SDK.
pip install llama-index llama-index-llms-openai hatidata-agent
hati init
HatiData initialized successfully.
Proxy running on localhost:5439

Note: LlamaIndex provides the query engine abstractions. HatiData provides the SQL and vector storage backends.
Load Structured Data into HatiData
Create tables with structured data that the RAG pipeline will query.
from hatidata_agent import HatiDataAgent
hati = HatiDataAgent(host="localhost", port=5439, agent_id="llamaindex-rag")
# Create product catalog
hati.execute("""
CREATE TABLE products (
id INTEGER, name VARCHAR, category VARCHAR,
price DOUBLE, rating DOUBLE, description VARCHAR
)
""")
hati.execute("""
INSERT INTO products VALUES
(1, 'HatiData Cloud', 'Data Platform', 299.0, 4.8, 'Agent-native data warehouse with SQL+vector hybrid'),
(2, 'VectorDB Pro', 'Database', 199.0, 4.5, 'Standalone vector database for embeddings'),
(3, 'QueryMaster', 'Analytics', 149.0, 4.2, 'SQL analytics tool with BI dashboards'),
(4, 'DataPipe', 'ETL', 99.0, 4.0, 'ETL pipeline for batch data processing'),
(5, 'StreamFlow', 'Streaming', 249.0, 4.6, 'Real-time data streaming platform')
""")
print("Loaded 5 products into HatiData")
Loaded 5 products into HatiData
Store Unstructured Content as Memories
Store product reviews and documentation as vector-searchable memories.
# Store unstructured reviews as memories
reviews = [
"HatiData Cloud made our RAG pipeline 5x faster with the hybrid SQL+vector queries",
"VectorDB Pro is great for pure embedding search but lacks SQL capabilities",
"QueryMaster handles our BI needs but cannot do semantic search on documents",
"We switched from DataPipe to StreamFlow for real-time agent event processing",
]
for review in reviews:
    hati.execute(f"""
        SELECT store_memory(
            '{review}',
            'product-reviews'
        )
    """)
print(f"Stored {len(reviews)} reviews as vector-searchable memories")
Stored 4 reviews as vector-searchable memories

Note: Structured data lives in SQL tables. Unstructured data lives in the memory layer with vector embeddings.
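One caveat with the loop above: it interpolates review text directly into a SQL string, so a review containing a single quote would break the statement. A minimal escaping helper that can wrap each value before interpolation (the name `sql_quote` is ours, not part of the HatiData SDK; it follows standard SQL quote-doubling):

```python
def sql_quote(text: str) -> str:
    """Return text as a single-quoted SQL literal, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"
```

For example, build the statement with `sql_quote(review)` in place of `'{review}'`.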
Build the Hybrid Query Engine
Create a function that combines SQL queries for structured data with vector search for unstructured content.
def hybrid_rag_query(question: str) -> dict:
    # Step 1: SQL query for structured data
    sql_results = hati.query("""
        SELECT name, category, price, rating
        FROM products
        WHERE rating >= 4.5
        ORDER BY rating DESC
    """)
    # Step 2: Vector search for relevant reviews
    safe_question = question.replace("'", "''")  # escape for the SQL literal
    vector_results = hati.query(f"""
        SELECT content
        FROM _hatidata_memory.memories
        WHERE namespace = 'product-reviews'
          AND semantic_match(embedding, '{safe_question}', 0.6)
        ORDER BY semantic_rank(embedding, '{safe_question}') DESC
        LIMIT 3
    """)
    return {
        "structured": sql_results,
        "unstructured": [r["content"] for r in vector_results],
    }
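Note that the structured query above hard-codes the `rating >= 4.5` filter; the "NL → SQL" step in the architecture diagram would normally derive the filter from the question itself. As a rough illustration of that idea, with simple keyword matching standing in for a real text-to-SQL model (the helper name and thresholds are ours):

```python
def filters_from_question(question: str) -> str:
    """Map question keywords to a SQL WHERE clause (illustrative only;
    a production pipeline would use an LLM text-to-SQL step)."""
    q = question.lower()
    clauses = []
    if "best" in q or "top" in q:
        clauses.append("rating >= 4.5")
    if "cheap" in q or "budget" in q:
        clauses.append("price <= 150")
    return " AND ".join(clauses) if clauses else "TRUE"
```

The returned clause could then replace the fixed filter when building the SQL string in `hybrid_rag_query`.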
result = hybrid_rag_query("Which product is best for AI agent data needs?")
print("=== Structured Results (SQL) ===")
for r in result["structured"]:
    print(f"  {r['name']} ({r['category']}): ${r['price']} - {r['rating']} stars")
print("\n=== Unstructured Results (Vector) ===")
for review in result["unstructured"]:
    print(f"  {review[:80]}...")
=== Structured Results (SQL) ===
HatiData Cloud (Data Platform): $299.0 - 4.8 stars
StreamFlow (Streaming): $249.0 - 4.6 stars
VectorDB Pro (Database): $199.0 - 4.5 stars
=== Unstructured Results (Vector) ===
HatiData Cloud made our RAG pipeline 5x faster with the hybrid SQL+vector que...
VectorDB Pro is great for pure embedding search but lacks SQL capabilities...
Generate LLM Response with Combined Context
Feed both SQL and vector results into an LLM to generate a comprehensive answer.
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4o")
def answer_with_hybrid_rag(question: str) -> str:
    data = hybrid_rag_query(question)
    # Format context from both sources
    sql_context = "\n".join(
        f"- {r['name']} ({r['category']}): ${r['price']}, {r['rating']} stars"
        for r in data["structured"]
    )
    review_context = "\n".join(f"- {r}" for r in data["unstructured"])
    prompt = f"""Based on the following data, answer the question.

Product Data (from database):
{sql_context}

User Reviews (from semantic search):
{review_context}

Question: {question}"""
    response = llm.complete(prompt)
    return response.text
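Prompt assembly can also be factored into a pure helper, which makes it unit-testable without an LLM call or a running HatiData instance (a sketch; the name `build_prompt` is ours):

```python
def build_prompt(question: str, products: list, reviews: list) -> str:
    """Assemble the hybrid-RAG prompt from SQL result rows and review strings."""
    sql_context = "\n".join(
        f"- {r['name']} ({r['category']}): ${r['price']}, {r['rating']} stars"
        for r in products
    )
    review_context = "\n".join(f"- {r}" for r in reviews)
    return (
        "Based on the following data, answer the question.\n\n"
        f"Product Data (from database):\n{sql_context}\n\n"
        f"User Reviews (from semantic search):\n{review_context}\n\n"
        f"Question: {question}"
    )
```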
answer = answer_with_hybrid_rag("Which product is best for AI agent data needs?")
print(answer)
Based on the product data and user reviews, HatiData Cloud is the best choice for AI agent data needs. It has the highest rating (4.8 stars) and uniquely combines SQL and vector search in a single platform. User reviews confirm it made RAG pipelines 5x faster with hybrid queries. While VectorDB Pro offers vector search, it lacks the SQL capabilities that agents need for structured data access.

Note: The hybrid approach gives the LLM both quantitative data (price, ratings) and qualitative data (reviews) for richer answers.
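One last practical point: the OpenAI client expects `OPENAI_API_KEY` in the environment, and a missing key otherwise surfaces as a confusing auth error mid-pipeline. A small fail-fast guard (a hypothetical helper, not part of either SDK) to run before constructing `OpenAI(model="gpt-4o")`:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the pipeline")
    return value

# e.g. require_env("OPENAI_API_KEY") at startup, before any LLM calls
```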