Ollama + HatiData: Local LLM + Memory
Build a fully local AI agent setup using Ollama for inference and HatiData for persistent memory — no cloud APIs, no data leaving your machine.
What You'll Build
A fully local AI agent setup using Ollama for inference and HatiData for persistent memory — no cloud required.
Prerequisites
- Ollama installed (ollama.com)
- HatiData installed (hati init)
- 8GB+ RAM
Architecture
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    Ollama    │      │   HatiData   │─────▶│    Engine    │
│ (local LLM)  │      │  Memory API  │      │  + Vectors   │
└──────┬───────┘      └──────▲───────┘      └──────────────┘
       │                     │
       └─────────────────────┘

100% local — no cloud APIs

Key Concepts
- 100% local: both inference (Ollama) and data storage (HatiData) run on your machine — no cloud APIs needed
- Local embeddings: HatiData uses a local embedding service for embeddings when no cloud API key is configured
- Privacy by design: no data leaves your machine, making this ideal for sensitive or regulated environments
- Standard interface: the same HatiData SQL + memory API works whether running locally or in the cloud
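The last point can be sketched as a simple configuration toggle: the client code stays the same and only the endpoint changes. (The cloud host below is a hypothetical placeholder, not a real HatiData endpoint.)

```python
import os

# Hypothetical sketch: the same client settings serve a local or a cloud
# deployment; only the host changes, the SQL + memory API stays identical.
LOCAL = {"host": "localhost", "port": 5439}
CLOUD = {"host": "hati.example.com", "port": 5439}  # hypothetical endpoint

# Default to the fully local setup unless HATI_ENV says otherwise
settings = LOCAL if os.environ.get("HATI_ENV", "local") == "local" else CLOUD
print(settings["host"])
```

Everything that follows assumes the local settings.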
Step-by-Step Implementation
Install Ollama and HatiData
Set up both Ollama and HatiData locally.
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.2:3b
# Install and start HatiData
curl -fsSL https://hatidata.com/install.sh | sh
hati init

Ollama installed successfully
pulling llama3.2:3b... done
HatiData initialized.
Proxy running on localhost:5439
MCP server running on localhost:5440

Note: llama3.2:3b is a good balance of quality and speed for local use. It needs about 4GB of RAM.
Configure the Connection
Set up the Python client to talk to both Ollama and HatiData.
import requests
from hatidata_agent import HatiDataAgent
# Connect to local HatiData
hati = HatiDataAgent(host="localhost", port=5439, agent_id="local-agent")
# Verify Ollama is running
response = requests.get("http://localhost:11434/api/tags")
models = response.json().get("models", [])
print("Available Ollama models:")
for m in models:
    print(f"  {m['name']} ({m['size'] / 1e9:.1f} GB)")
# Verify HatiData is running
result = hati.query("SELECT 1 AS health_check")
print(f"\nHatiData: healthy ({result[0]['health_check']})")

Available Ollama models:
llama3.2:3b (2.0 GB)
HatiData: healthy (1)

Store Memories Locally
Store knowledge that the local LLM will use as context.
# Store project-specific knowledge
memories = [
"Our API rate limit is 1000 requests per minute per client",
"Database backups run at 2 AM UTC using pg_dump to S3",
"The deployment process: PR merge → CI build → staging deploy → smoke test → prod deploy",
"Team standup is at 9:30 AM PST on Monday, Wednesday, Friday",
"Production incidents go to #incidents Slack channel, page on-call via PagerDuty",
]
for mem in memories:
    # Escape single quotes before interpolating into the SQL literal
    safe = mem.replace("'", "''")
    hati.execute(f"""
        SELECT store_memory('{safe}', 'team-knowledge')
    """)
print(f"Stored {len(memories)} memories locally")
print("All data stays on your machine — no cloud APIs called")

Stored 5 memories locally
All data stays on your machine — no cloud APIs called

Note: HatiData uses a local embedding service for vector search when no cloud API key is configured.
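To make the local vector search concrete, here is a minimal sketch of how embedding-based ranking works in principle: memories and query are embedded as vectors, and the closest memories by cosine similarity win. The toy 3-dimensional vectors below stand in for real embeddings; this is an illustration, not HatiData's actual implementation.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings" (real ones have hundreds of dimensions)
memory_vectors = {
    "backups run at 2 AM UTC": [0.9, 0.1, 0.0],
    "standup at 9:30 AM PST": [0.1, 0.9, 0.2],
}
query = [0.85, 0.2, 0.05]  # pretend embedding of "when do backups run?"

best = max(memory_vectors, key=lambda m: cosine(memory_vectors[m], query))
print(best)  # → backups run at 2 AM UTC
```

Semantic matching with a threshold, as in the queries below, is the same idea: only memories whose similarity to the query exceeds the cutoff are returned.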
Chat with Local LLM + Memory
Build a chat function that retrieves relevant memories and sends them to Ollama.
def chat_local(user_input: str) -> str:
    # Escape single quotes before interpolating into the SQL literal
    safe = user_input.replace("'", "''")
    # Retrieve relevant memories from HatiData
    memories = hati.query(f"""
        SELECT content
        FROM _hatidata_memory.memories
        WHERE namespace = 'team-knowledge'
          AND semantic_match(embedding, '{safe}', 0.6)
        ORDER BY semantic_rank(embedding, '{safe}') DESC
        LIMIT 3
    """)
    context = "\n".join(m["content"] for m in memories)
    # Send to Ollama with memory context
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3.2:3b",
        "prompt": f"Context from memory:\n{context}\n\nQuestion: {user_input}\nAnswer concisely:",
        "stream": False,
    })
    return response.json()["response"]
# Test it
answer = chat_local("How do we handle production incidents?")
print(f"Q: How do we handle production incidents?")
print(f"A: {answer}")
answer = chat_local("When is the team standup?")
print(f"\nQ: When is the team standup?")
print(f"A: {answer}")

Q: How do we handle production incidents?
A: Production incidents are reported in the #incidents Slack channel. The on-call engineer is paged via PagerDuty for immediate response.
Q: When is the team standup?
A: Team standup is at 9:30 AM PST on Monday, Wednesday, and Friday.

Verify Everything is Local
Confirm no data leaves your machine by checking network activity.
# Check that only localhost connections were made
import socket
# All connections are to localhost
services = [
("Ollama", "localhost", 11434),
("HatiData Proxy", "localhost", 5439),
("HatiData MCP", "localhost", 5440),
]
print("=== Local Service Check ===")
for name, host, port in services:
    try:
        sock = socket.create_connection((host, port), timeout=2)
        sock.close()
        print(f"  {name}: localhost:{port} — connected (local)")
    except OSError:
        print(f"  {name}: localhost:{port} — not running")
print("\nAll services are local. No cloud APIs called.")
print("Your data never left your machine.")

=== Local Service Check ===
Ollama: localhost:11434 — connected (local)
HatiData Proxy: localhost:5439 — connected (local)
HatiData MCP: localhost:5440 — connected (local)
All services are local. No cloud APIs called.
Your data never left your machine.

Note: This setup is ideal for sensitive data, air-gapped environments, or when you simply want zero cloud dependency.
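As an extra belt-and-braces check, you can assert that every host the agent is configured to talk to resolves to a loopback address before sending any traffic. A small sketch using only the standard library:

```python
import ipaddress
import socket

# Hosts the agent is allowed to contact; anything non-loopback fails fast
hosts = ["localhost"]
for h in hosts:
    ip = socket.gethostbyname(h)
    assert ipaddress.ip_address(ip).is_loopback, f"{h} resolves off-box: {ip}"
print("All configured hosts resolve to loopback addresses")
```

Running this before the agent starts guarantees a misconfigured hostname cannot silently route requests off the machine.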
Related Use Case
Security
Private RAG in Your VPC