
What is Agent-Native Data Infrastructure (ANDI)?

HatiData Team · 10 min read

The Infrastructure Gap

AI agents are getting remarkably capable. They can reason over complex problems, call tools, generate code, and orchestrate multi-step workflows. GPT-4, Claude, Gemini, and their open-source counterparts improve every quarter. But the infrastructure these agents run on has not evolved at the same pace.

Today, most AI agents operate on platforms designed for human analysts. They query traditional cloud data platforms built for batch analytics — platforms where every query starts a cluster, bills in 60-second minimums, and returns results through dashboards. There is no concept of agent identity. No persistent memory between sessions. No way to audit the reasoning chain that led to a decision. No mechanism to isolate an agent's experimental state from production data.

This is the infrastructure gap: the distance between what AI agents can do and what their underlying data platforms allow them to do safely, efficiently, and at scale.

Defining ANDI

Agent-Native Data Infrastructure (ANDI) is a new category of platforms purpose-built for autonomous AI agents. Rather than retrofitting analytics tools with agent-friendly wrappers, ANDI platforms are designed from the ground up around the requirements of software that thinks, acts, and persists state across sessions.

An ANDI platform provides five core capabilities through a standard SQL interface: persistent memory, verifiable reasoning, isolated state management, semantic triggers, and governance. These are not add-on features — they are foundational primitives baked into the query engine, the storage layer, and the access control system.

The key insight behind ANDI is that AI agents are not humans using dashboards. They are software processes that need programmatic, low-latency, secure access to data — with guarantees about isolation, auditability, and cost that traditional platforms were never designed to provide.

The 5 Pillars of ANDI

1. Persistent Memory

AI agents need to remember. Not just the last conversation, but everything they have learned across hundreds or thousands of interactions. ANDI platforms provide hybrid SQL and vector search that allows agents to store, retrieve, and reason over memories using standard SQL syntax.

This goes beyond a simple key-value store. Persistent memory in ANDI means:

  • store_memory() to persist observations, preferences, and learned facts
  • semantic_match() to retrieve memories by meaning, not just keywords
  • semantic_rank() to order results by relevance to the current context
  • Full SQL JOIN capability to combine memories with structured data

An agent can store that "this customer prefers concise responses" and retrieve that memory six months later when the same customer returns — without any external vector database, without Redis, without a separate memory service.
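The retrieval flow above can be sketched in a few lines of Python. This is a conceptual stand-in, not HatiData's implementation: the `MemoryStore` class, the word-count "embedding," and the 0.3 threshold are all illustrative assumptions (a real system would use a learned embedding model and run inside the SQL engine), but the store/match/rank shape mirrors the primitives named above.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a word-count vector. A real platform would use a
    # learned embedding model; this only illustrates the retrieval flow.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Minimal stand-in for store_memory() / semantic_match() / semantic_rank()."""
    def __init__(self):
        self.memories = []  # (text, embedding) pairs persisted across sessions

    def store_memory(self, text: str) -> None:
        self.memories.append((text, embed(text)))

    def semantic_match(self, query: str, threshold: float = 0.3):
        # Retrieve by meaning: keep memories whose similarity clears the threshold.
        q = embed(query)
        return [t for t, e in self.memories if cosine(q, e) >= threshold]

    def semantic_rank(self, query: str):
        # Order all memories by relevance to the current context.
        q = embed(query)
        return sorted(self.memories, key=lambda m: cosine(q, m[1]), reverse=True)

store = MemoryStore()
store.store_memory("this customer prefers concise responses")
store.store_memory("quarterly revenue report is due in March")
hits = store.semantic_match("what does this customer prefer")
```

Note that the query shares no exact phrasing requirement with the stored memory: "prefer" vs. "prefers" never match as tokens, yet the memory is still retrieved on overall similarity.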

2. Chain-of-Thought Ledger

When an AI agent approves a loan, flags a transaction, or recommends a treatment plan, someone will eventually ask: why? The Chain-of-Thought (CoT) Ledger provides an immutable, hash-chained record of every reasoning step an agent takes.

Each step in the ledger includes:

  • Step type: observation, hypothesis, tool_call, verification, decision, or one of several other semantic step types
  • Input summary: what the agent considered
  • Output summary: what the agent concluded
  • Cryptographic hash: a hash that chains to the previous step

If any step is modified after the fact — even a single character — the hash chain breaks. This provides the same integrity guarantee that blockchains offer, but applied to agent reasoning rather than financial transactions. The ledger is append-only: no UPDATE, no DELETE, no TRUNCATE. Once a reasoning step is recorded, it is permanent.
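The hash-chaining mechanism can be demonstrated with a short sketch. The `CoTLedger` class and its field names are illustrative assumptions, not HatiData's schema, but the integrity property is exactly the one described: each step's hash covers the previous step's hash, so editing any recorded step invalidates everything after it.

```python
import hashlib
import json

class CoTLedger:
    """Append-only, hash-chained reasoning log (conceptual sketch)."""
    GENESIS = "0" * 64

    def __init__(self):
        self.steps = []

    def append(self, step_type: str, input_summary: str, output_summary: str) -> str:
        # Each step's hash covers its content AND the previous step's hash.
        prev_hash = self.steps[-1]["hash"] if self.steps else self.GENESIS
        payload = json.dumps(
            {"type": step_type, "in": input_summary,
             "out": output_summary, "prev": prev_hash},
            sort_keys=True,
        )
        step_hash = hashlib.sha256(payload.encode()).hexdigest()
        self.steps.append({"type": step_type, "in": input_summary,
                           "out": output_summary, "prev": prev_hash,
                           "hash": step_hash})
        return step_hash

    def verify(self) -> bool:
        # Recompute every hash; any post-hoc edit breaks the chain.
        prev = self.GENESIS
        for s in self.steps:
            payload = json.dumps(
                {"type": s["type"], "in": s["in"],
                 "out": s["out"], "prev": prev},
                sort_keys=True,
            )
            recomputed = hashlib.sha256(payload.encode()).hexdigest()
            if s["prev"] != prev or recomputed != s["hash"]:
                return False
            prev = s["hash"]
        return True

ledger = CoTLedger()
ledger.append("observation", "loan application #1042", "income verified")
ledger.append("decision", "risk score 0.12", "approve loan")
```

Tampering with the first step's output summary after the fact makes `verify()` return `False`, because the recorded hash no longer matches the recomputed one.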

3. Semantic Triggers

Traditional event systems fire on exact matches: "if column X equals 42, send a webhook." Semantic triggers fire on meaning. An ANDI platform evaluates incoming data against concept embeddings and fires actions when the semantic similarity exceeds a threshold.

For example, a semantic trigger can detect that an agent's output is "expressing frustration with a customer" even if the exact words have never been seen before. The trigger evaluation uses a two-stage pipeline: a fast approximate nearest neighbor (ANN) pre-filter followed by exact cosine similarity verification. This keeps evaluation latency under 50 milliseconds even with thousands of active triggers.
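The two-stage pipeline can be sketched as follows. This is a conceptual illustration, not HatiData's implementation: a production stage one would use a real ANN index (e.g. HNSW) rather than the brute-force dot-product shortlist used here, and the threshold and vectors are made-up examples.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def evaluate_triggers(event_vec, triggers, threshold=0.8, shortlist=10):
    """Two-stage evaluation: cheap pre-filter, then exact verification."""
    # Stage 1: approximate pre-filter — keep the top-k candidates by raw
    # dot product (stand-in for an ANN index lookup).
    scored = sorted(
        triggers,
        key=lambda t: -sum(x * y for x, y in zip(event_vec, t["concept"])),
    )
    candidates = scored[:shortlist]
    # Stage 2: exact cosine similarity against each surviving candidate.
    return [t["action"] for t in candidates
            if cosine(event_vec, t["concept"]) >= threshold]

triggers = [
    {"concept": [0.9, 0.1, 0.0], "action": "flag_frustration"},
    {"concept": [0.0, 0.0, 1.0], "action": "notify_billing"},
]
fired = evaluate_triggers([0.8, 0.2, 0.05], triggers)
```

The pre-filter keeps per-event cost low even with thousands of registered triggers, while the exact second stage prevents approximate-search false positives from firing actions.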

Trigger actions include webhook delivery (with HMAC-SHA256 signatures), agent notifications, event logging, and flagging for human review.

4. Branch Isolation

Production AI agents need to experiment safely. Branch isolation allows an agent to create a copy-on-write branch of its current state, make changes, evaluate results, and either merge back to the main state or discard the branch — without affecting production data.

Under the hood, branches are implemented as schema-level isolation. When a branch is created, the platform creates views that reference the main tables (zero-copy). The first time the agent writes to a branched table, the platform materializes a copy (copy-on-write). This means branch creation is instantaneous and branch storage cost is proportional to the changes made, not the total data size.
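The zero-copy-then-materialize behavior can be modeled with a small sketch. The `BranchStore` class is an illustrative assumption standing in for the platform's schema-level mechanism: unwritten tables read straight from main (the "view"), and the first write to a table materializes a private copy.

```python
class BranchStore:
    """Copy-on-write branch over a set of main tables (conceptual sketch)."""
    def __init__(self, main_tables):
        self.main = main_tables      # table name -> list of rows (production)
        self.branch = {}             # only tables the branch has written to

    def read(self, table):
        # Zero-copy: unwritten tables fall through to main.
        return self.branch.get(table, self.main[table])

    def write(self, table, row):
        # Copy-on-write: materialize a private copy on first write only.
        if table not in self.branch:
            self.branch[table] = list(self.main[table])
        self.branch[table].append(row)

    def merge(self, strategy="branch_wins"):
        # Simplified merge: 'branch_wins' replaces main with branched copies.
        if strategy == "branch_wins":
            for table, rowsin in self.branch.items():
                self.main[table] = rowsin
        self.branch.clear()

main = {"orders": [{"id": 1}], "customers": [{"id": "a"}]}
b = BranchStore(main)
b.write("orders", {"id": 2})   # materializes a copy of 'orders' only
```

Because only `orders` was written, only `orders` consumes branch storage; `customers` remains a zero-copy read against main, which is why branch cost scales with changes rather than total data size.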

Merge operations include conflict detection and configurable resolution strategies: branch wins, main wins, manual resolution, or abort on conflict.

5. Agent Governance

ANDI platforms treat agents as first-class identities with attribute-based access control (ABAC). Each agent has a defined set of permissions that control which tables it can read, which columns it can see, which operations it can perform, and how much compute it can consume.

Governance in ANDI includes:

  • ABAC policies: per-agent, per-table, per-column access control
  • Immutable audit trails: every query logged with agent identity, timestamp, and result metadata
  • CMEK encryption: customer-managed encryption keys for data at rest
  • In-VPC deployment: the data plane runs in the customer's network, not a shared multi-tenant cloud
  • Cost governance: per-agent compute quotas, per-second billing, and automatic suspend after idle periods
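A per-agent policy check combining several of these controls might take the following shape. The `AgentPolicy` class and its fields are illustrative assumptions, not HatiData's actual policy schema; they show how operation, table/column, and compute-quota checks compose into a single authorization decision.

```python
from dataclasses import dataclass

@dataclass
class AgentPolicy:
    """Per-agent ABAC policy (illustrative shape only)."""
    agent_id: str
    table_columns: dict      # table -> set of columns the agent may read
    operations: set          # allowed SQL operations, e.g. {"SELECT"}
    compute_quota_s: float   # compute budget for the period, in seconds
    used_s: float = 0.0

    def authorize(self, operation, table, columns, est_seconds):
        # Check operation, then table/column scope, then compute budget.
        if operation not in self.operations:
            return False, "operation not permitted"
        allowed = self.table_columns.get(table)
        if allowed is None or not set(columns) <= allowed:
            return False, "table/column not permitted"
        if self.used_s + est_seconds > self.compute_quota_s:
            return False, "compute quota exceeded"
        self.used_s += est_seconds
        return True, "ok"

# Agent B from the governance example: aggregate columns only, read-only.
policy = AgentPolicy(
    agent_id="agent-b",
    table_columns={"customers": {"region", "segment"}},
    operations={"SELECT"},
    compute_quota_s=10.0,
)
ok, reason = policy.authorize("SELECT", "customers", ["region"], est_seconds=0.2)
```

The same check that admits a `SELECT` on `region` rejects a read of an unlisted column like `email`, and rejects any non-`SELECT` operation outright, regardless of remaining quota.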

The 4-Layer Stack

ANDI platforms are organized into four layers, each building on the one below:

Data Layer: The foundation. A high-performance query engine with open table format storage. This layer handles SQL execution, columnar storage, and data lifecycle management. Because the storage format is open, there is no vendor lock-in — data can be read by any compatible tool.

Intelligence Layer: Vector search and embedding management layered on top of the data layer. This enables hybrid queries that combine structured SQL filters with semantic similarity search. Embeddings are stored alongside structured data, not in a separate system.

Reasoning Layer: The Chain-of-Thought Ledger and semantic triggers. This layer provides the auditability and automation primitives that production agents require. Reasoning traces are queryable with standard SQL, and semantic triggers are evaluated in real-time as data flows through the system.

Governance Layer: ABAC, audit trails, encryption, and deployment controls. This layer ensures that agents operate within defined boundaries and that every action is recorded for compliance review.

Why Legacy Infrastructure Fails Agents

Traditional cloud data platforms were built for a different era — one where human analysts ran batch queries, examined dashboards, and made decisions themselves. These platforms fail agents in five specific ways:

60-second billing minimums: An AI agent that runs a 200-millisecond query gets billed for 60 seconds of compute. For agents running hundreds of small queries per hour, the overcharge is 300x or more. ANDI platforms bill per-second with sub-second granularity.
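The 300x figure follows directly from the arithmetic; the query rate below is an assumed example, but the overcharge ratio depends only on the per-query numbers.

```python
# Cost comparison: 60-second billing minimum vs. sub-second billing.
query_seconds = 0.2          # one 200 ms agent query
queries_per_hour = 300       # an agent running hundreds of small queries

legacy_billed = queries_per_hour * 60.0          # 60 s minimum per query
andi_billed = queries_per_hour * query_seconds   # billed for actual runtime
overcharge = legacy_billed / andi_billed         # 60 / 0.2 = 300x
```

Shorter queries push the ratio even higher, which is why the text says "300x or more."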

No agent identity: Legacy platforms authenticate users, not agents. There is no way to give Agent A read access to customer data while restricting Agent B to aggregate statistics only. ANDI platforms have agent-level ABAC from the ground up.

No persistent memory: When an agent's session ends, everything it learned disappears. There is no built-in mechanism for agents to store and retrieve memories across sessions. ANDI platforms make memory a core SQL primitive.

No reasoning audit trail: Traditional platforms log query text and execution metadata. They do not capture the reasoning chain that led an agent to run a particular query. ANDI platforms record the full chain of thought with cryptographic integrity.

Data leaves the VPC: Most legacy cloud platforms process data in shared multi-tenant infrastructure. For regulated industries — financial services, healthcare, government — this is a non-starter. ANDI platforms deploy the data plane in the customer's own VPC, ensuring data never crosses network boundaries.

Getting Started with ANDI

HatiData is the first ANDI platform. It implements all five pillars — persistent memory, chain-of-thought ledger, semantic triggers, branch isolation, and agent governance — in a single deployment that speaks standard Postgres wire protocol.

Install locally and try it:

  curl -fsSL https://hatidata.com/install.sh | sh
  hati init

Connect with any Postgres client, SQL tool, or AI framework. The 24 MCP tools work with Claude, Cursor, and any MCP-compatible client out of the box.
