Ollama · Beginner · 15 min

Ollama + HatiData: Air-Gapped Local AI

Run AI agents completely locally with Ollama and HatiData. Zero cloud dependencies.

What You'll Build

A fully local AI agent with Ollama for inference and HatiData for persistent memory and SQL queries.

Prerequisites

$ pip install hatidata-agent ollama

$ hati init

$ ollama pull llama3

Architecture

┌──────────────┐    ┌──────────────┐
│   Ollama     │───▶│  HatiData    │
│  (Local LLM) │    │  (Local DB)  │
└──────────────┘    └──────────────┘
     100% Local — No Cloud Required

Key Concepts

  • Zero cloud dependencies: Ollama runs LLM inference locally, HatiData stores data in a local DuckDB file, and the built-in ONNX model handles embeddings — nothing touches the network
  • Air-gapped operation: the entire stack works without internet access, making it suitable for classified environments, edge deployments, and data-sovereign workloads
  • Local-first data sovereignty: all agent memories, embeddings, and query results stay on your machine in a single DuckDB file that you fully control
  • Ollama + DuckDB performance: local LLM inference avoids API latency while DuckDB provides sub-millisecond SQL queries — often faster than cloud round-trips (a quick timing sketch follows this list)
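
To sanity-check the latency claim on your own hardware, you can time a single local SQL round-trip once the setup in the steps below is complete. This sketch reuses the HatiDataAgent client from the later examples; the connection settings and the agent_id are assumptions that should match your local configuration.

Python
import time
from hatidata_agent import HatiDataAgent

# Assumes the local HatiData proxy is listening on 127.0.0.1:5439,
# as configured in the steps below. The agent_id is arbitrary.
hati = HatiDataAgent(host="localhost", port=5439, agent_id="latency-check", framework="ollama")

start = time.perf_counter()
hati.query("SELECT COUNT(*) FROM _hatidata_memory.memories")  # one local SQL round-trip
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Local query round-trip: {elapsed_ms:.2f} ms")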

Step-by-Step Implementation

1. Install Ollama and HatiData

Install both Ollama for local LLM inference and HatiData for persistent agent memory. Pull a model and initialize the database.

Bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a local model
ollama pull llama3

# Install HatiData agent library
pip install hatidata-agent ollama

# Initialize the local HatiData database
hati init
Expected Output
>>> pulling llama3... done
HatiData initialized at ./hatidata/
  Storage: ./hatidata/data.duckdb
  Config:  ./.hati/config.toml
  MCP:     ready on port 8741

Note: Ollama runs entirely on your machine with no API keys. HatiData stores everything in a local DuckDB file. Zero cloud dependencies from the start.
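
Before moving on, a quick check confirms both pieces are actually in place. This is a minimal sketch: ollama.list() queries the local Ollama daemon (which serves on 127.0.0.1:11434 by default, though the response shape can vary between ollama-python versions), and the DuckDB path comes from the hati init output above and may differ on your machine.

Python
from pathlib import Path
import ollama

# Ask the local Ollama daemon which models are installed.
models = ollama.list()
print("Ollama reachable, models installed:", len(models["models"]))

# Confirm hati init created the local DuckDB store reported above.
db_path = Path("./hatidata/data.duckdb")
print("HatiData store present:", db_path.exists())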

2. Configure for Fully Local Operation

Create a HatiData configuration file that explicitly disables all cloud features for air-gapped environments.

TOML
# .hati/config.toml — fully local configuration
# No cloud endpoints, no telemetry, no external calls

[storage]
path = "./agent_data"

[memory]
default_namespace = "local_agent"
embedding_dimensions = 384
embedding_provider = "local"  # uses built-in ONNX model

[proxy]
port = 5439
host = "127.0.0.1"

[cloud]
enabled = false
telemetry = false

Note: Setting embedding_provider to 'local' uses the built-in ONNX embedding model. No OpenAI or Cohere API calls. The entire stack runs offline.
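
Before going offline, it is worth verifying that the file actually disables everything cloud-facing. A small sketch, assuming Python 3.11+ for the standard-library tomllib module:

Python
import tomllib  # standard library in Python 3.11+

with open(".hati/config.toml", "rb") as f:
    config = tomllib.load(f)

# Fail fast if anything cloud-facing is still switched on.
assert config["cloud"]["enabled"] is False, "cloud access is still enabled"
assert config["cloud"]["telemetry"] is False, "telemetry is still enabled"
assert config["memory"]["embedding_provider"] == "local", "embeddings would call an external API"
print("Config verified: fully local operation")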

3. Build the Local Agent

Create an agent that uses Ollama for LLM inference and HatiData MCP tools for persistent memory, running entirely on your machine.

Python
import ollama
from hatidata_agent import HatiDataAgent

# Connect to local HatiData
hati = HatiDataAgent(host="localhost", port=5439, agent_id="local-agent", framework="ollama")

# Agent system prompt
SYSTEM_PROMPT = """You are a helpful local AI assistant with persistent memory.
You can remember information across conversations.

When the user shares important facts, store them in memory.
When answering questions, check your memory first for relevant context.
You run 100% locally — no data ever leaves this machine."""


def store_memory(content: str, tags: list[str]) -> str:
    """Store a memory in HatiData."""
    hati.execute(
        "SELECT store_memory(?, ?, 'local_agent')",
        [content, ",".join(tags)]
    )
    return f"Stored: {content}"


def search_memory(query: str, top_k: int = 5) -> list[dict]:
    """Search memories using semantic similarity."""
    results = hati.query(f"""
        SELECT content, tags,
               semantic_rank(embedding, '{query}') AS relevance
        FROM _hatidata_memory.memories
        WHERE namespace = 'local_agent'
        ORDER BY relevance DESC
        LIMIT {top_k}
    """)
    return results


def chat(user_message: str) -> str:
    """Run one turn of conversation with memory."""
    # Check memory for relevant context
    memories = search_memory(user_message, top_k=3)
    context = ""
    if memories and memories[0]["relevance"] > 0.5:
        context = "\nRelevant memories:\n" + "\n".join(
            f"- {m['content']} (relevance: {m['relevance']:.2f})"
            for m in memories
        )

    # Call Ollama locally
    response = ollama.chat(
        model="llama3",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT + context},
            {"role": "user", "content": user_message},
        ],
    )
    reply = response["message"]["content"]

    # Store important facts from the conversation
    if any(kw in user_message.lower() for kw in ["my name", "i prefer", "remember"]):
        store_memory(user_message, ["user_fact", "conversation"])

    return reply


# Test the local agent
print(chat("My name is Marcus and I work on embedded systems."))
print(chat("What do you know about me?"))
Expected Output
I'll remember that, Marcus! I've noted that you work on embedded systems.
Since we're running entirely locally, your data stays on this machine.

Based on my memory, your name is Marcus and you work on embedded systems.
This was stored from our earlier conversation. (relevance: 0.94)

Note: The ollama.chat() call runs locally on your GPU or CPU. The HatiDataAgent connects to the local DuckDB instance. Nothing touches the network.
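
For hands-on testing, the chat() function above can be wrapped in a simple terminal loop. This sketch only reuses code defined in this step:

Python
# Interactive loop around the chat() function defined above.
# Type 'quit' or 'exit' to stop; everything still runs locally.
if __name__ == "__main__":
    while True:
        user_input = input("you> ").strip()
        if user_input.lower() in ("quit", "exit"):
            break
        print("agent>", chat(user_input))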

4. Test Offline Memory Persistence

Verify that memories persist across agent restarts without any internet connection. Simulate an air-gapped environment.

Python
# Simulate restart: create a brand new client connection
from hatidata_agent import HatiDataAgent

# New connection — simulates agent restart
client = HatiDataAgent(host="localhost", port=5439, agent_id="local-agent", framework="ollama")

# Verify memories survived the restart
memories = client.query("""
    SELECT memory_id, content, tags, created_at
    FROM _hatidata_memory.memories
    WHERE namespace = 'local_agent'
    ORDER BY created_at DESC
    LIMIT 10
""")

print(f"Found {len(memories)} persisted memories:\n")
for m in memories:
    print(f"  [{m['created_at']}] {m['content']}")
    print(f"    Tags: {m['tags']}")

# Semantic search still works without internet
relevant = client.query("""
    SELECT content,
           semantic_rank(embedding, 'what does the user work on') AS score
    FROM _hatidata_memory.memories
    WHERE namespace = 'local_agent'
    ORDER BY score DESC
    LIMIT 3
""")

print("\nSemantic search (offline):")
for r in relevant:
    print(f"  [{r['score']:.3f}] {r['content']}")
Expected Output
Found 1 persisted memories:

  [2025-01-15 14:21:58] My name is Marcus and I work on embedded systems.
    Tags: user_fact,conversation

Semantic search (offline):
  [0.891] My name is Marcus and I work on embedded systems.

Note: Memories persist in the local DuckDB file. The built-in ONNX embedding model runs semantic search without any network calls. Disconnect your WiFi to verify.
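
If unplugging the network is impractical, you can approximate an air gap in-process by refusing every socket connection that is not loopback. This is a generic Python sketch, not a HatiData feature, and it only guards connections opened by this process:

Python
import socket

_original_connect = socket.socket.connect

def _loopback_only(self, address):
    """Refuse any TCP connection that does not target localhost."""
    # Unix domain sockets use str/bytes addresses and are local by definition.
    if isinstance(address, tuple) and address[0] not in ("127.0.0.1", "::1", "localhost"):
        raise ConnectionError(f"Blocked non-local connection to {address[0]}")
    return _original_connect(self, address)

socket.socket.connect = _loopback_only

# Local queries keep working; anything that tried to reach the internet
# from this process would now raise immediately.
print(client.query("SELECT COUNT(*) FROM _hatidata_memory.memories"))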

5. Run SQL Queries on Local Data

Connect directly to HatiData on port 5439 and run SQL analytics on all stored agent data.

Python
from hatidata_agent import HatiDataAgent

client = HatiDataAgent(host="localhost", port=5439, agent_id="local-agent", framework="ollama")

# Count memories by namespace
stats = client.query("""
    SELECT namespace, COUNT(*) AS memory_count,
           MIN(created_at) AS first_memory,
           MAX(created_at) AS last_memory
    FROM _hatidata_memory.memories
    GROUP BY namespace
    ORDER BY memory_count DESC
""")

print("=== Memory Statistics ===")
for s in stats:
    print(f"  {s['namespace']}: {s['memory_count']} memories")
    print(f"    First: {s['first_memory']}")
    print(f"    Last:  {s['last_memory']}")

# Run analytics on stored data
tag_analysis = client.query("""
    SELECT UNNEST(string_split(tags, ',')) AS tag,
           COUNT(*) AS count
    FROM _hatidata_memory.memories
    GROUP BY tag
    ORDER BY count DESC
    LIMIT 10
""")

print("\n=== Tag Distribution ===")
for t in tag_analysis:
    print(f"  {t['tag']}: {t['count']} occurrences")

# Export all memories as JSON for backup
client.execute("""
    COPY (
        SELECT memory_id, namespace, content, tags, created_at
        FROM _hatidata_memory.memories
    ) TO './backup_memories.json' (FORMAT JSON)
""")
print("\nBackup exported to ./backup_memories.json")
Expected Output
=== Memory Statistics ===
  local_agent: 1 memories
    First: 2025-01-15 14:21:58
    Last:  2025-01-15 14:21:58

=== Tag Distribution ===
  user_fact: 1 occurrences
  conversation: 1 occurrences

Backup exported to ./backup_memories.json

Note: HatiData exposes all data as standard SQL tables on port 5439. You can connect with psql, DBeaver, or any Postgres-compatible tool for ad-hoc queries and exports.
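
As one example of ad-hoc access, the sketch below connects with the psycopg driver. The database name and user are placeholders, not values documented here; adjust them to whatever your local HatiData proxy expects.

Python
import psycopg  # any Postgres-compatible driver or tool works the same way

# host/port match the proxy settings from step 2; dbname and user are placeholders.
with psycopg.connect(host="127.0.0.1", port=5439, dbname="hatidata", user="hati") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM _hatidata_memory.memories")
        print("Total memories:", cur.fetchone()[0])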

Ready to build?

Install HatiData locally and start building with Ollama in minutes.

Join Waitlist