Ollama + HatiData: Local LLM + Memory
Build a fully local AI agent setup using Ollama for inference and HatiData for persistent memory — no cloud APIs, no data leaving your machine.
What You'll Build
A fully local AI agent setup using Ollama for inference and HatiData for persistent memory — no cloud required.
Prerequisites
- Ollama installed (ollama.com)
- HatiData installed (hati init)
- 8GB+ RAM
Architecture
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    Ollama    │      │   HatiData   │─────▶│    Engine    │
│ (local LLM)  │      │  Memory API  │      │  + Vectors   │
└──────┬───────┘      └──────▲───────┘      └──────────────┘
       │                     │
       └─────────────────────┘

100% local — no cloud APIs

Key Concepts
- 100% local: both inference (Ollama) and data storage (HatiData) run on your machine — no cloud APIs needed
- Local embeddings: HatiData uses a local embedding service for embeddings when no cloud API key is configured
- Privacy by design: no data leaves your machine, making this ideal for sensitive or regulated environments
- Standard interface: the same HatiData SQL + memory API works whether running locally or in the cloud
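The last point can be sketched as a simple configuration toggle: the client code stays the same and only the endpoint changes. (The cloud host below is a hypothetical placeholder, not a real HatiData endpoint.)

```python
import os

# Hypothetical sketch: the same client settings serve a local or a cloud
# deployment; only the host changes, the SQL + memory API stays identical.
LOCAL = {"host": "localhost", "port": 5439}
CLOUD = {"host": "hati.example.com", "port": 5439}  # hypothetical endpoint

# Default to the fully local setup unless HATI_ENV says otherwise
settings = LOCAL if os.environ.get("HATI_ENV", "local") == "local" else CLOUD
print(settings["host"])
```

Everything that follows assumes the local settings.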
Step-by-Step Implementation
Install Ollama and HatiData
Set up both Ollama and HatiData locally.
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.2:3b
# Install and start HatiData
curl -fsSL https://hatidata.com/install.sh | sh
hati init

Ollama installed successfully
pulling llama3.2:3b... done
HatiData initialized.
Proxy running on localhost:5439
MCP server running on localhost:5440

Note: llama3.2:3b is a good balance of quality and speed for local use. It needs about 4GB of RAM.
Configure the Connection
Set up the Python client to talk to both Ollama and HatiData.
import requests
from hatidata_agent import HatiDataAgent
# Connect to local HatiData
hati = HatiDataAgent(host="localhost", port=5439, agent_id="local-agent")
# Verify Ollama is running
response = requests.get("http://localhost:11434/api/tags")
models = response.json().get("models", [])
print("Available Ollama models:")
for m in models:
    print(f"  {m['name']} ({m['size'] / 1e9:.1f} GB)")
# Verify HatiData is running
result = hati.query("SELECT 1 AS health_check")
print(f"\nHatiData: healthy ({result[0]['health_check']})")

Available Ollama models:
llama3.2:3b (2.0 GB)
HatiData: healthy (1)

Store Memories Locally
Store knowledge that the local LLM will use as context.
# Store project-specific knowledge
memories = [
"Our API rate limit is 1000 requests per minute per client",
"Database backups run at 2 AM UTC using pg_dump to S3",
"The deployment process: PR merge → CI build → staging deploy → smoke test → prod deploy",
"Team standup is at 9:30 AM PST on Monday, Wednesday, Friday",
"Production incidents go to #incidents Slack channel, page on-call via PagerDuty",
]
for mem in memories:
    # Escape single quotes before interpolating into the SQL literal
    safe = mem.replace("'", "''")
    hati.execute(f"""
        SELECT store_memory('{safe}', 'team-knowledge')
    """)
print(f"Stored {len(memories)} memories locally")
print("All data stays on your machine — no cloud APIs called")

Stored 5 memories locally
All data stays on your machine — no cloud APIs called

Note: HatiData uses a local embedding service for vector search when no cloud API key is configured.
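To make the local vector search concrete, here is a minimal sketch of how embedding-based ranking works in principle: memories and query are embedded as vectors, and the closest memories by cosine similarity win. The toy 3-dimensional vectors below stand in for real embeddings; this is an illustration, not HatiData's actual implementation.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings" (real ones have hundreds of dimensions)
memory_vectors = {
    "backups run at 2 AM UTC": [0.9, 0.1, 0.0],
    "standup at 9:30 AM PST": [0.1, 0.9, 0.2],
}
query = [0.85, 0.2, 0.05]  # pretend embedding of "when do backups run?"

best = max(memory_vectors, key=lambda m: cosine(memory_vectors[m], query))
print(best)  # → backups run at 2 AM UTC
```

Semantic matching with a threshold, as in the queries below, is the same idea: only memories whose similarity to the query exceeds the cutoff are returned.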
Chat with Local LLM + Memory
Build a chat function that retrieves relevant memories and sends them to Ollama.
def chat_local(user_input: str) -> str:
    # Escape single quotes before interpolating into the SQL literal
    safe = user_input.replace("'", "''")
    # Retrieve relevant memories from HatiData
    memories = hati.query(f"""
        SELECT content
        FROM _hatidata_memory.memories
        WHERE namespace = 'team-knowledge'
          AND semantic_match(embedding, '{safe}', 0.6)
        ORDER BY semantic_rank(embedding, '{safe}') DESC
        LIMIT 3
    """)
    context = "\n".join(m["content"] for m in memories)
    # Send to Ollama with memory context
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3.2:3b",
        "prompt": f"Context from memory:\n{context}\n\nQuestion: {user_input}\nAnswer concisely:",
        "stream": False,
    })
    return response.json()["response"]
# Test it
answer = chat_local("How do we handle production incidents?")
print(f"Q: How do we handle production incidents?")
print(f"A: {answer}")
answer = chat_local("When is the team standup?")
print(f"\nQ: When is the team standup?")
print(f"A: {answer}")

Q: How do we handle production incidents?
A: Production incidents are reported in the #incidents Slack channel. The on-call engineer is paged via PagerDuty for immediate response.
Q: When is the team standup?
A: Team standup is at 9:30 AM PST on Monday, Wednesday, and Friday.

Verify Everything is Local
Confirm no data leaves your machine by checking network activity.
# Check that only localhost connections were made
import socket
# All connections are to localhost
services = [
("Ollama", "localhost", 11434),
("HatiData Proxy", "localhost", 5439),
("HatiData MCP", "localhost", 5440),
]
print("=== Local Service Check ===")
for name, host, port in services:
    try:
        sock = socket.create_connection((host, port), timeout=2)
        sock.close()
        print(f"  {name}: localhost:{port} — connected (local)")
    except OSError:
        print(f"  {name}: localhost:{port} — not running")
print("\nAll services are local. No cloud APIs called.")
print("Your data never left your machine.")

=== Local Service Check ===
Ollama: localhost:11434 — connected (local)
HatiData Proxy: localhost:5439 — connected (local)
HatiData MCP: localhost:5440 — connected (local)
All services are local. No cloud APIs called.
Your data never left your machine.

Note: This setup is ideal for sensitive data, air-gapped environments, or when you simply want zero cloud dependency.
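As an extra belt-and-braces check, you can assert that every host the agent is configured to talk to resolves to a loopback address before sending any traffic. A small sketch using only the standard library:

```python
import ipaddress
import socket

# Hosts the agent is allowed to contact; anything non-loopback fails fast
hosts = ["localhost"]
for h in hosts:
    ip = socket.gethostbyname(h)
    assert ipaddress.ip_address(ip).is_loopback, f"{h} resolves off-box: {ip}"
print("All configured hosts resolve to loopback addresses")
```

Running this before the agent starts guarantees a misconfigured hostname cannot silently route requests off the machine.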
Related Use Case
Security
Private RAG in Your VPC