Ollama · Beginner · 20 min

Ollama + HatiData: Local LLM + Memory

Build a fully local AI agent setup using Ollama for inference and HatiData for persistent memory — no cloud APIs, no data leaving your machine.

What You'll Build

A fully local AI agent: Ollama handles inference, HatiData provides persistent memory, and no data leaves your machine.

Prerequisites

  • Ollama installed (ollama.com)
  • HatiData installed and initialized (hati init)
  • 8 GB+ RAM

Architecture

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Ollama     │    │  HatiData    │───▶│   Engine     │
│  (local LLM) │    │  Memory API  │    │  + Vectors   │
└──────┬───────┘    └──────────────┘    └──────────────┘
       │                   ▲
       └───────────────────┘
        100% local — no cloud APIs

Key Concepts

  • 100% local: both inference (Ollama) and data storage (HatiData) run on your machine — no cloud APIs needed
  • Local embeddings: when no cloud API key is configured, HatiData generates embeddings with a local embedding service
  • Privacy by design: no data leaves your machine, making this ideal for sensitive or regulated environments
  • Standard interface: the same HatiData SQL + memory API works whether running locally or in the cloud
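To illustrate the last point, here is a minimal sketch of a connection helper that reads the target host from the environment, so the same agent code can point at either the local proxy or a hosted instance. The environment variable names (`HATIDATA_HOST`, `HATIDATA_PORT`, `HATIDATA_AGENT_ID`) are illustrative assumptions, not part of the product:

```python
import os

def hati_connection_settings(env=os.environ):
    """Build HatiDataAgent kwargs, defaulting to the local proxy.

    Env var names here are hypothetical; adjust to your deployment.
    """
    return {
        "host": env.get("HATIDATA_HOST", "localhost"),
        "port": int(env.get("HATIDATA_PORT", "5439")),
        "agent_id": env.get("HATIDATA_AGENT_ID", "local-agent"),
    }

# Local by default:
print(hati_connection_settings({}))
# {'host': 'localhost', 'port': 5439, 'agent_id': 'local-agent'}
```

You would then construct the client as `HatiDataAgent(**hati_connection_settings())`, and switching environments becomes a configuration change rather than a code change.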

Step-by-Step Implementation

1

Install Ollama and HatiData

Set up both Ollama and HatiData locally.

Bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.2:3b

# Install and start HatiData
curl -fsSL https://hatidata.com/install.sh | sh
hati init
Expected Output
Ollama installed successfully
pulling llama3.2:3b... done

HatiData initialized.
Proxy running on localhost:5439
MCP server running on localhost:5440

Note: llama3.2:3b offers a good balance of quality and speed for local use; it needs about 4 GB of RAM.
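As a rough way to reason about that RAM figure: a quantized model needs roughly its parameter count times the bits per weight, plus a buffer for the KV cache and runtime. The constants below are ballpark assumptions for a ~4.5-bit quantization, not Ollama specifications:

```python
def approx_ram_gb(params_billion: float, bits_per_weight: float = 4.5,
                  overhead_gb: float = 2.0) -> float:
    """Very rough RAM estimate for a quantized local model.

    weights ~= params * bits / 8 bytes; overhead covers the KV cache
    and runtime. All constants here are ballpark assumptions.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)

print(approx_ram_gb(3))   # a 3B model: roughly 3.7 GB
print(approx_ram_gb(8))   # an 8B model: roughly 6.5 GB
```

The 3B estimate lines up with the ~4 GB guidance above; use it only as a sanity check when picking a model for your machine.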

2

Configure the Connection

Set up the Python client to talk to both Ollama and HatiData.

Python
import requests
from hatidata_agent import HatiDataAgent

# Connect to local HatiData
hati = HatiDataAgent(host="localhost", port=5439, agent_id="local-agent")

# Verify Ollama is running
response = requests.get("http://localhost:11434/api/tags")
models = response.json().get("models", [])
print("Available Ollama models:")
for m in models:
    print(f"  {m['name']} ({m['size'] / 1e9:.1f} GB)")

# Verify HatiData is running
result = hati.query("SELECT 1 AS health_check")
print(f"\nHatiData: healthy ({result[0]['health_check']})")
Expected Output
Available Ollama models:
  llama3.2:3b (2.0 GB)

HatiData: healthy (1)

3

Store Memories Locally

Store knowledge that the local LLM will use as context.

Python
# Store project-specific knowledge
memories = [
    "Our API rate limit is 1000 requests per minute per client",
    "Database backups run at 2 AM UTC using pg_dump to S3",
    "The deployment process: PR merge → CI build → staging deploy → smoke test → prod deploy",
    "Team standup is at 9:30 AM PST on Monday, Wednesday, Friday",
    "Production incidents go to #incidents Slack channel, page on-call via PagerDuty",
]

for mem in memories:
    # Escape single quotes so the literal is safe to embed in SQL
    safe = mem.replace("'", "''")
    hati.execute(f"SELECT store_memory('{safe}', 'team-knowledge')")

print(f"Stored {len(memories)} memories locally")
print("All data stays on your machine — no cloud APIs called")
Expected Output
Stored 5 memories locally
All data stays on your machine — no cloud APIs called

Note: HatiData uses a local embedding service for vector search when no cloud API key is configured.
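The 0.6 threshold used by `semantic_match` in the next step suggests embeddings are compared with a similarity metric; cosine similarity is the common choice for vector search, sketched below. How HatiData's local embedding service actually computes and compares vectors is an implementation detail not documented here:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions
doc = [0.2, 0.8, 0.1]
query = [0.25, 0.75, 0.05]
print(f"similarity: {cosine_similarity(doc, query):.3f}")
```

A threshold like 0.6 then acts as a relevance cutoff: memories whose vectors score below it are excluded from the context.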

4

Chat with Local LLM + Memory

Build a chat function that retrieves relevant memories and sends them to Ollama.

Python
def chat_local(user_input: str) -> str:
    # Escape single quotes so the input is safe to embed in SQL
    safe_input = user_input.replace("'", "''")

    # Retrieve relevant memories from HatiData
    memories = hati.query(f"""
        SELECT content
        FROM _hatidata_memory.memories
        WHERE namespace = 'team-knowledge'
          AND semantic_match(embedding, '{safe_input}', 0.6)
        ORDER BY semantic_rank(embedding, '{safe_input}') DESC
        LIMIT 3
    """)

    context = "\n".join(m["content"] for m in memories)

    # Send to Ollama with memory context
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3.2:3b",
        "prompt": f"Context from memory:\n{context}\n\nQuestion: {user_input}\nAnswer concisely:",
        "stream": False,
    })

    return response.json()["response"]

# Test it
answer = chat_local("How do we handle production incidents?")
print("Q: How do we handle production incidents?")
print(f"A: {answer}")

answer = chat_local("When is the team standup?")
print("\nQ: When is the team standup?")
print(f"A: {answer}")
Expected Output
Q: How do we handle production incidents?
A: Production incidents are reported in the #incidents Slack channel. The on-call engineer is paged via PagerDuty for immediate response.

Q: When is the team standup?
A: Team standup is at 9:30 AM PST on Monday, Wednesday, and Friday.
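The step above sets "stream": False for simplicity, so the whole reply arrives at once. For an interactive chat you would usually stream tokens instead: with "stream": true, Ollama's /api/generate returns newline-delimited JSON objects, each carrying a "response" fragment, ending with one where "done" is true. A small parser for that format:

```python
import json

def join_stream(ndjson_lines):
    """Assemble the full reply from Ollama's streaming NDJSON chunks."""
    parts = []
    for line in ndjson_lines:
        if not line:
            continue  # requests' iter_lines() can yield keep-alive blanks
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# With requests you would iterate the live stream, e.g.:
#   resp = requests.post("http://localhost:11434/api/generate",
#                        json={"model": "llama3.2:3b", "prompt": p, "stream": True},
#                        stream=True)
#   print(join_stream(resp.iter_lines()))
```

Streaming makes the agent feel responsive, since the first tokens appear before the model finishes generating.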
5

Verify Everything is Local

Confirm that every service the agent talks to is bound to localhost, so no data leaves your machine.

Python
import socket

# Every service this cookbook uses should be reachable only on localhost
services = [
    ("Ollama", "localhost", 11434),
    ("HatiData Proxy", "localhost", 5439),
    ("HatiData MCP", "localhost", 5440),
]

print("=== Local Service Check ===")
for name, host, port in services:
    try:
        sock = socket.create_connection((host, port), timeout=2)
        sock.close()
        print(f"  {name}: localhost:{port} — connected (local)")
    except OSError:
        print(f"  {name}: localhost:{port} — not running")

print("\nAll services are local. No cloud APIs called.")
print("Your data never left your machine.")
Expected Output
=== Local Service Check ===
  Ollama: localhost:11434 — connected (local)
  HatiData Proxy: localhost:5439 — connected (local)
  HatiData MCP: localhost:5440 — connected (local)

All services are local. No cloud APIs called.
Your data never left your machine.

Note: This setup is ideal for sensitive data, air-gapped environments, or when you simply want zero cloud dependency.
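For a stricter audit, you could wrap every outgoing HTTP call and assert that its URL targets the loopback interface before sending it. A minimal sketch of such a check (the `LOCAL_HOSTS` set and `is_local` helper are illustrative, not part of HatiData or Ollama):

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def is_local(url: str) -> bool:
    """True if the URL targets the loopback interface."""
    return urlparse(url).hostname in LOCAL_HOSTS

# Every endpoint used in this cookbook passes the check
calls = [
    "http://localhost:11434/api/generate",   # Ollama
    "http://localhost:5439/",                # HatiData proxy
]
print(all(is_local(u) for u in calls))  # True
```

In an air-gapped or regulated environment, running a check like this in CI gives you a cheap guarantee that no one accidentally introduces a cloud endpoint.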

Ready to build?

Install HatiData locally and start building with Ollama in minutes.
