SOC 2 Compliance for AI Agent Infrastructure
Why SOC 2 Matters for Agent Infrastructure
A SOC 2 Type II report (strictly an attestation, though commonly called a certification) is the standard trust signal for enterprise software. When a CISO evaluates whether to allow AI agents to access sensitive data, one of the first questions is: "Is the agent infrastructure SOC 2 compliant?" Without a credible answer, the conversation ends before it begins.
But AI agents create new challenges for SOC 2 compliance that did not exist with traditional software. Agents make autonomous decisions that need audit trails. They store learned knowledge that needs access controls. They execute queries that need monitoring. And they reason through complex logic that needs to be explainable.
Traditional SOC 2 controls were designed for human-operated systems where a person initiates every action. Agent infrastructure needs controls that work when the actor is an autonomous AI system operating at machine speed.
This guide maps the five SOC 2 Trust Service Criteria to specific HatiData capabilities, showing how each requirement is addressed by the platform's built-in features.
Security: Protecting Agent Data
The Security criterion requires protection against unauthorized access to systems and data. For agent infrastructure, this means controlling what each agent can access, encrypting data at rest and in transit, and monitoring for anomalous behavior.
Access Controls (RBAC + ABAC)
HatiData implements two layers of access control:
RBAC (Role-Based Access Control) — Every agent has an API key with a scope (ReadOnly or Admin) and a set of allowed namespaces. The scope determines which operations the agent can perform, and the namespaces determine which data the agent can access. This is the coarse-grained control layer.
ABAC (Attribute-Based Access Control) — For fine-grained control, the policy engine evaluates attributes of the request, the agent, and the target data. Policies can restrict access based on time of day, data sensitivity tags, query complexity, or any other attribute. This handles requirements like "PII columns can only be accessed by agents with explicit PII clearance."
Together, RBAC and ABAC provide defense in depth. Even if an agent's API key is compromised, the attacker is limited to the key's namespace and scope, and ABAC policies add further restrictions based on the request context.
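The layered decision described above can be sketched in a few lines. This is an illustrative model, not HatiData's actual policy engine: the `AgentKey` fields, the `pii_clearance` attribute, and the function names are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentKey:
    scope: str                  # "ReadOnly" or "Admin"
    namespaces: frozenset       # namespaces this key may touch

def rbac_allows(key: AgentKey, operation: str, namespace: str) -> bool:
    # Coarse layer: scope gates the operation, namespaces gate the data.
    if namespace not in key.namespaces:
        return False
    return key.scope == "Admin" or operation == "read"

def abac_allows(attributes: dict) -> bool:
    # Fine layer: illustrative policy -- PII-tagged data requires
    # explicit PII clearance on the requesting agent.
    if attributes.get("sensitivity") == "pii":
        return bool(attributes.get("pii_clearance"))
    return True

def authorize(key: AgentKey, operation: str, namespace: str,
              attributes: dict) -> bool:
    # Defense in depth: BOTH layers must pass.
    return rbac_allows(key, operation, namespace) and abac_allows(attributes)
```

Note that the two checks are independent: a stolen key still fails the ABAC layer on any request whose attributes violate policy.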
Encryption at Rest (CMEK)
HatiData encrypts all data at rest using AES-256. For customers who require control over their own encryption keys, Customer-Managed Encryption Keys (CMEK) are supported through cloud-native key management:
- AWS — AWS KMS customer-managed keys
- GCP — Cloud KMS customer-managed keys
- Azure — Azure Key Vault customer-managed keys
With CMEK, the customer retains full control over the encryption keys. Revoking the key immediately renders all encrypted data unreadable, providing an effective "kill switch" via cryptographic erasure if data destruction is ever required.
Encryption in Transit
All communication between agents and HatiData uses TLS 1.3. For deployments using PrivateLink, traffic never traverses the public internet — it stays entirely within the cloud provider's backbone network.
The MCP server, REST API, and Postgres wire protocol all enforce TLS. Unencrypted connections are rejected by default.
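On the client side, the same floor can be enforced locally. The sketch below uses Python's standard `ssl` module to build a context that refuses anything older than TLS 1.3; it is a generic client-side precaution, not a documented HatiData SDK call.

```python
import ssl

def tls13_context() -> ssl.SSLContext:
    # create_default_context() enables certificate and hostname
    # verification; raising minimum_version pins the floor at TLS 1.3.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

Any connection wrapped with this context will fail the handshake against a server that only offers TLS 1.2 or below.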
Network Isolation
HatiData's two-plane architecture provides strong network isolation:
- The data plane runs inside the customer's VPC, with no public internet exposure
- PrivateLink connectivity ensures that agent-to-data-plane traffic stays within the cloud provider's network
- The control plane communicates with the data plane through a narrow, encrypted control channel that carries only metadata (no query data or results)
Availability: Ensuring Agent Uptime
The Availability criterion requires that systems are operational and accessible as committed. For agent infrastructure, this means the query engine, memory store, and MCP tools must be available when agents need them.
Auto-Healing Query Pipeline
HatiData's query pipeline includes an auto-heal step. When a query fails due to a transient error or minor syntax issue, the pipeline automatically attempts to correct and re-execute the query. This reduces the impact of intermittent failures on agent availability.
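The control flow of an auto-heal step can be sketched as a bounded retry loop. The `execute` and `heal` callables here are stand-ins for the real engine and correction logic, assumed for illustration:

```python
def run_with_autoheal(execute, heal, sql: str, max_attempts: int = 2):
    """Execute a query; on failure, ask the heal step for a corrected
    query and retry, up to max_attempts total tries."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return execute(sql)
        except Exception as exc:
            last_error = exc
            # e.g. fix a typo'd keyword, or simply retry a transient error
            sql = heal(sql, exc)
    raise last_error
```

Bounding the attempts matters: an unhealable query should surface its error quickly rather than loop.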
Graceful Degradation
When the vector search component is unavailable, HatiData falls back to SQL-only search. Semantic functions return neutral results (no matches / zero similarity), and pure SQL queries continue to work normally. Agents experience reduced capability but not a full outage.
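A minimal sketch of that fallback behavior, with `vector_index` standing in for the real search component (an assumption of this example):

```python
def semantic_search(query: str, vector_index):
    """Try vector search; if the component is down, return neutral
    results (no matches) so SQL-only paths keep working."""
    try:
        return vector_index(query)
    except ConnectionError:
        # Degraded mode: zero matches, but no outage for the agent.
        return []
```

The key design choice is that degradation is silent to pure-SQL callers and explicit (empty results) to semantic callers, rather than an error either way.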
Per-Second Billing with Auto-Suspend
The auto-suspend feature ensures that compute resources are available when needed without manual scaling. When queries arrive, the system resumes within milliseconds. When queries stop, compute scales down. This eliminates the availability risk of manually-managed capacity — there is no "forgot to scale up before the batch job" failure mode.
Processing Integrity: Accurate Agent Results
The Processing Integrity criterion requires that system processing is complete, valid, accurate, and authorized. For agent infrastructure, this means queries must return correct results, memory operations must be reliable, and reasoning traces must be complete.
Query Pipeline Validation
The multi-stage query pipeline includes multiple validation checkpoints:
- Step 3 (Policy Check) — Ensures the query is authorized before execution
- Step 5 (Quota Check) — Ensures the agent has sufficient quota
- Step 6 (Row Filter) — Ensures namespace isolation is enforced
- Step 11 (Column Mask) — Ensures sensitive data is masked per policy
Each step either passes the query through or blocks it with an explanatory error. No query reaches the execution engine without passing all upstream checks.
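The pass-through-or-block contract can be modeled as a chain of checkpoint functions. The two checks shown mirror the policy and quota steps above, but their internals are invented for this sketch:

```python
class QueryBlocked(Exception):
    """Raised with an explanatory message when a checkpoint fails."""

def run_checks(query: dict, checks) -> dict:
    # Each checkpoint either returns the (possibly rewritten) query
    # or raises QueryBlocked; nothing reaches execution otherwise.
    for check in checks:
        query = check(query)
    return query

def policy_check(q: dict) -> dict:
    if not q.get("authorized"):
        raise QueryBlocked("policy: query not authorized")
    return q

def quota_check(q: dict) -> dict:
    if q.get("quota", 0) <= 0:
        raise QueryBlocked("quota: agent quota exhausted")
    return q
```

Because each checkpoint returns the query it validated, later steps (row filters, column masks) can also rewrite it in place.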
Shadow Mode Validation
For customers migrating to HatiData, Shadow Mode provides automated result validation against their existing warehouse. Every query runs on both systems, and results are compared automatically. This provides statistical evidence that HatiData produces accurate results for the customer's specific workload.
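The comparison at the heart of such a dual-run setup is simple to sketch. The runner callables below are stand-ins, not a documented API, and the order-insensitive comparison is one possible policy:

```python
def shadow_compare(query: str, run_primary, run_shadow) -> dict:
    """Run the same query on both systems and report whether the
    result sets match, ignoring row order."""
    a = run_primary(query)
    b = run_shadow(query)
    return {
        "query": query,
        "match": sorted(map(repr, a)) == sorted(map(repr, b)),
        "primary_rows": len(a),
        "shadow_rows": len(b),
    }
```

Aggregating `match` over thousands of production queries is what turns a one-off spot check into workload-level evidence.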
Hash-Chained Reasoning Traces
The CoT ledger's hash chain provides cryptographic proof that reasoning traces are complete and unmodified. Any attempt to insert, modify, or delete steps in the chain breaks the hash sequence, which the verify_chain function detects.
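The mechanism is standard hash chaining: each entry commits to both its own payload and the previous entry's hash. The sketch below shows the principle with SHA-256; the ledger's actual encoding and the internals of its verify_chain are not documented here, so treat this as illustrative:

```python
import hashlib
import json

def step_hash(prev_hash: str, payload: dict) -> str:
    # Commit to the payload AND the previous link.
    body = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()

def append_step(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else "genesis"
    chain.append({"payload": payload, "hash": step_hash(prev, payload)})

def verify_chain(chain: list) -> bool:
    # Recompute every link; any insert, edit, or delete breaks the sequence.
    prev = "genesis"
    for entry in chain:
        if entry["hash"] != step_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash depends transitively on everything before it, tampering with step 1 invalidates every later link, not just its own.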
Confidentiality: Protecting Sensitive Data
The Confidentiality criterion requires that data designated as confidential is protected. For agent infrastructure, this means agent memories, query results, and reasoning traces must be accessible only to authorized parties.
Namespace Isolation
Every piece of data in HatiData — memories, reasoning traces, state entries, query results — belongs to a namespace. Each agent key specifies which namespaces it can access. The query pipeline automatically injects namespace filters into every query, ensuring an agent never sees data outside its authorized namespaces.
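The effect of filter injection can be illustrated with a string-level sketch. A real implementation rewrites the query plan, not the SQL text, and the `namespace` column name is an assumption of this example:

```python
def inject_namespace_filter(sql: str, namespaces: set) -> str:
    """Wrap the agent's query so only rows in its authorized
    namespaces are visible, regardless of what the agent wrote."""
    allowed = ", ".join(f"'{ns}'" for ns in sorted(namespaces))
    return (f"SELECT * FROM ({sql}) AS q "
            f"WHERE q.namespace IN ({allowed})")
```

The point of injecting the filter outside the agent's query is that no clause the agent writes can widen its own visibility.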
Column Masking
For tables containing mixed-sensitivity data, column masking allows specific columns to be redacted or anonymized based on the querying agent's permissions. A support agent might see customer names but not credit card numbers, while an analytics agent sees aggregated statistics but no individual records.
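A per-column masking pass might look like the sketch below. The rule table and clearance model are invented for illustration; the real policy lives in the platform's policy engine:

```python
# column -> masking function (illustrative policy)
MASK_RULES = {
    "credit_card": lambda v: "****" + v[-4:],
    "email": lambda v: "<redacted>",
}

def mask_row(row: dict, cleared_columns: set) -> dict:
    """Columns the agent is cleared for pass through unchanged;
    other sensitive columns are masked per MASK_RULES."""
    return {
        col: (val if col in cleared_columns or col not in MASK_RULES
              else MASK_RULES[col](val))
        for col, val in row.items()
    }
```

Masking at the row level (rather than denying the whole query) is what lets a support agent and an analytics agent share one table with different views of it.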
Data Residency
HatiData's in-VPC deployment model means data resides in the customer's chosen region. For organizations with data residency requirements (GDPR, data localization laws), this provides a simple guarantee: the data physically exists in the same region as the VPC, managed by the same cloud provider, subject to the same jurisdictional rules.
Privacy: Handling Personal Data
The Privacy criterion addresses the collection, use, retention, and disposal of personal information. For agent infrastructure, this primarily concerns agent memories that may contain personal data.
Memory Lifecycle Management
HatiData supports configurable retention policies for agent memories. Memories can be automatically expired after a specified duration, ensuring that personal data does not persist indefinitely. Expired memories are permanently deleted from both the query engine and the vector index.
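A retention sweep reduces to a timestamp comparison. The memory record shape below (a dict with a `created_at` field) is assumed for the sketch:

```python
from datetime import datetime, timedelta, timezone

def expired(memory: dict, retention: timedelta, now=None) -> bool:
    # A memory past its retention window is eligible for deletion
    # from both the query engine and the vector index.
    now = now or datetime.now(timezone.utc)
    return now - memory["created_at"] > retention

def sweep(memories, retention: timedelta, now=None):
    # Return only the memories that survive the retention policy.
    return [m for m in memories if not expired(m, retention, now)]
```
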
Right to Erasure
For GDPR compliance, HatiData supports targeted deletion of memories associated with a specific individual. The delete_memory MCP tool removes the memory from the query engine and its embedding from the vector index; any references in the CoT ledger are annotated as redacted, and the hash chain is preserved by recording the deletion as a special step type.
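Recording a deletion as a new chain entry, rather than rewriting history, is what reconciles erasure with tamper evidence. A minimal sketch, with the entry encoding and the `"redaction"` step type assumed for illustration:

```python
import hashlib
import json

def entry_hash(prev: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev.encode() + body).hexdigest()

def erase_memory(chain: list, memory_id: str) -> None:
    """Append an explicit redaction step instead of mutating earlier
    entries, so every existing hash in the chain stays valid."""
    prev = chain[-1]["hash"] if chain else "genesis"
    payload = {"type": "redaction", "memory_id": memory_id}
    chain.append({"payload": payload, "hash": entry_hash(prev, payload)})
```

The individual's data is gone from the stores, while the ledger keeps verifiable evidence that an erasure happened and when.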
Audit Trail
Every data access — whether a query, a memory retrieval, or a CoT replay — is recorded in the audit trail with the agent identity, timestamp, and operation type. This provides a complete record of who accessed what data and when, supporting GDPR's accountability requirements.
Putting It Together
SOC 2 compliance for agent infrastructure requires controls across all five trust criteria that work for autonomous agents, not just human users. HatiData's architecture addresses these requirements through built-in features rather than bolt-on additions:
| SOC 2 Criterion | HatiData Feature |
|---|---|
| Security — Access Control | RBAC (key scopes) + ABAC (policy engine) |
| Security — Encryption | AES-256 at rest, TLS 1.3 in transit, CMEK |
| Security — Network | In-VPC deployment, PrivateLink, no public exposure |
| Availability — Uptime | Auto-heal, graceful degradation, auto-suspend |
| Processing Integrity | Multi-stage pipeline validation, Shadow Mode, hash chains |
| Confidentiality | Namespace isolation, column masking, data residency |
| Privacy | Retention policies, right to erasure, audit trail |
Next Steps
For organizations pursuing SOC 2 Type II certification, HatiData provides a compliance package that includes architecture diagrams, control descriptions, and evidence collection guidance mapped to each trust criterion. See the SOC 2 architecture documentation for the complete control mapping.