Branch Isolation: Safe Exploration for AI Agents
The Exploration Problem
AI agents need to explore. A financial analysis agent might want to test what happens to a portfolio under different market scenarios. A data engineering agent might want to try different transformation strategies before committing to one. A research agent might want to pursue multiple hypotheses simultaneously.
But exploration is dangerous when it happens on production data. If the agent writes incorrect results, corrupts a table, or inserts millions of junk rows, the damage is immediate and potentially irreversible. Traditional solutions — copying entire databases, using transactions with savepoints, or running separate database instances — are either too slow, too expensive, or too complex for agent-scale workloads.
HatiData solves this with branch isolation: a lightweight mechanism for creating isolated copies of an agent's data environment that are cheap to create, fast to query, and safe to discard. Branches use the engine's native schema system, which means read-only exploration copies no data and writes copy only the tables they touch.
How Branch Isolation Works
Schema-Based Isolation
Each branch in HatiData is an isolated schema with a UUID-based name: branch_{uuid}. When an agent creates a branch, HatiData creates a new schema and populates it with views that point to the tables in the main schema. No data is copied — the views are zero-cost references to the existing data.
-- What HatiData does internally when creating a branch
CREATE SCHEMA branch_a1b2c3d4;
-- For each table in the main schema, create a view
CREATE VIEW branch_a1b2c3d4.customers AS
SELECT * FROM main.customers;
CREATE VIEW branch_a1b2c3d4.orders AS
SELECT * FROM main.orders;
CREATE VIEW branch_a1b2c3d4.products AS
SELECT * FROM main.products;
Queries within the branch see exactly the same data as the main schema, but in a completely isolated namespace. The agent can reference customers within its branch context without any ambiguity, because HatiData sets the search_path to the branch schema before executing each query.
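As a minimal sketch of the view-creation step, the statements above can be generated from a list of table names. The function name `branch_creation_sql` and its parameters are hypothetical, chosen for illustration; this is not HatiData's internal code, and a real implementation would need to quote or validate identifiers.

```python
def branch_creation_sql(branch_suffix, tables, source_schema="main"):
    """Return the SQL statements that set up a zero-copy branch schema.

    Hypothetical helper: it only builds strings and does not touch a database.
    Identifiers are interpolated as-is, so callers must pass trusted names.
    """
    schema = f"branch_{branch_suffix}"
    statements = [f"CREATE SCHEMA {schema};"]
    for table in tables:
        # One view per source table; no rows are copied at this point.
        statements.append(
            f"CREATE VIEW {schema}.{table} AS SELECT * FROM {source_schema}.{table};"
        )
    return statements


statements = branch_creation_sql("a1b2c3d4", ["customers", "orders", "products"])
for statement in statements:
    print(statement)
```

The number of statements grows with the number of tables rather than with the number of rows, which is consistent with branch creation not depending on data volume.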
Copy-on-Write Materialization
The zero-copy views work perfectly for read-only exploration. But when an agent writes to a table in a branch — inserting rows, updating values, or deleting records — HatiData needs to materialize that table. This is the copy-on-write step: the first write to any table triggers a full copy from the main schema into the branch schema, replacing the view with a real table.
-- First write to customers in the branch triggers materialization
DROP VIEW branch_a1b2c3d4.customers;
CREATE TABLE branch_a1b2c3d4.customers AS
SELECT * FROM main.customers;
-- Now the write can proceed on the materialized copy
INSERT INTO branch_a1b2c3d4.customers (id, name, segment)
VALUES ('cust_new', 'Acme Corp', 'enterprise');
After materialization, the branch has its own independent copy of that table. Changes to the branch table do not affect the main schema, and later changes to that table in the main schema do not affect the branch copy. Tables that the agent never writes to remain zero-cost views that point at the live main data.
This copy-on-write approach means that a branch that writes to 1 table out of 50 copies only that 1 table. The other 49 tables remain views and consume no additional storage.
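The copy-on-write behavior can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: tables are Python lists of dicts, the `Branch` class and its method names are hypothetical, and a shared reference plays the role of a view.

```python
import copy


class Branch:
    """In-memory stand-in for a branch schema (illustrative only)."""

    def __init__(self, main_tables):
        self.main = main_tables   # shared reference, plays the role of the views
        self.materialized = {}    # tables copied into the branch on first write

    def read(self, table):
        # A materialized table shadows main; everything else reads through.
        return self.materialized.get(table, self.main[table])

    def insert(self, table, row):
        if table not in self.materialized:
            # First write: copy the whole table, like DROP VIEW + CREATE TABLE AS.
            self.materialized[table] = copy.deepcopy(self.main[table])
        self.materialized[table].append(row)


main = {
    "customers": [{"id": "cust_1", "name": "Initech"}],
    "orders": [{"id": "ord_1", "customer_id": "cust_1"}],
}
branch = Branch(main)
branch.insert("customers", {"id": "cust_new", "name": "Acme Corp"})

print(len(main["customers"]), len(branch.read("customers")))
print(sorted(branch.materialized))
```

Only `customers` ends up in `materialized`; `orders` is still read directly from the main data, mirroring the untouched views.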
The Branch Lifecycle
A complete branch lifecycle follows this pattern:
1. Create: The agent calls branch_create with an optional description. HatiData creates the schema, populates it with views, and returns a branch ID.
2. Query: The agent calls branch_query to read data within the branch. Reads go through the views for unmodified tables and through materialized copies for modified tables.
3. Write: The agent writes to tables within the branch. The first write to each table triggers copy-on-write materialization. Subsequent writes to the same table operate on the materialized copy.
4. Merge or Discard: The agent either merges changes back to main with branch_merge, or discards the branch with branch_discard.
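The lifecycle steps can be sketched as a sequence of tool-call payloads. The tool names come from the steps above, but several details here are assumptions made for illustration: the `branch_id` and `sql` argument names for branch_query, the `branch_id` argument for branch_discard, and the helper names `tool_call` and `lifecycle_calls`. In practice the branch ID would come from the branch_create response rather than being passed in, and this article does not specify how writes are submitted, so the sketch shows only a read.

```python
import json


def tool_call(tool, **arguments):
    """Build a tool-call payload in the {"tool": ..., "arguments": ...} shape."""
    return {"tool": tool, "arguments": arguments}


def lifecycle_calls(branch_id, sql, keep_changes):
    """Return the payloads for one create / query / merge-or-discard cycle."""
    calls = [
        tool_call("branch_create", description="Testing pricing model changes"),
        # Argument names below are assumed, not taken from HatiData documentation.
        tool_call("branch_query", branch_id=branch_id, sql=sql),
    ]
    if keep_changes:
        calls.append(tool_call("branch_merge", branch_id=branch_id, strategy="branch_wins"))
    else:
        calls.append(tool_call("branch_discard", branch_id=branch_id))
    return calls


merge_path = lifecycle_calls("br_a1b2c3d4", "SELECT count(*) FROM customers", True)
discard_path = lifecycle_calls("br_a1b2c3d4", "SELECT count(*) FROM customers", False)
print(json.dumps(merge_path[-1], indent=2))
```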
{
"tool": "branch_create",
"arguments": {
"description": "Testing pricing model changes for enterprise tier"
}
}
Response:
{
"branch_id": "br_a1b2c3d4",
"schema": "branch_a1b2c3d4",
"tables": 12,
"created_at": "2026-03-01T10:00:00Z"
}
Merge Strategies
When an agent is satisfied with the changes in a branch, it merges them back into the main schema. But merging is not always straightforward — the main schema may have changed since the branch was created, creating conflicts. HatiData supports four merge strategies to handle different conflict scenarios.
BranchWins
The simplest strategy: if a conflict exists, the branch version takes precedence. Any rows in the main schema that conflict with branch rows are overwritten. This is appropriate when the agent's branch work is authoritative and should supersede any concurrent changes.
MainWins
The opposite: main schema data takes precedence. Only branch changes that do not conflict with main are applied. This is useful for tentative exploration where the agent wants to keep non-conflicting additions without overwriting concurrent updates.
Manual
HatiData detects conflicts and reports them without merging. The response includes a list of conflicting rows with both the branch and main versions. A human operator (or a different agent) can then resolve each conflict individually.
Abort
If any conflicts are detected, the entire merge is aborted. No changes are applied to the main schema. This is the safest option for workflows where conflicts indicate a problem that needs investigation.
{
"tool": "branch_merge",
"arguments": {
"branch_id": "br_a1b2c3d4",
"strategy": "branch_wins"
}
}
Conflict detection works at the row level using primary keys. If a row exists in both the main schema and the branch with the same primary key but different values, that is a conflict. New rows in the branch (no matching primary key in main) are always added without conflict.
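The conflict rule and the four strategies can be sketched over plain dictionaries keyed by primary key. This follows the definition exactly as stated above and is not HatiData's implementation. The function names are hypothetical, the strategy strings other than "branch_wins" are assumed spellings, deletions are not modeled, and the sketch assumes that "manual" applies non-conflicting rows when no conflicts exist.

```python
STRATEGIES = {"branch_wins", "main_wins", "manual", "abort"}


def find_conflicts(main_rows, branch_rows):
    """Primary keys present on both sides whose row values differ."""
    return sorted(
        pk for pk in branch_rows
        if pk in main_rows and main_rows[pk] != branch_rows[pk]
    )


def merge(main_rows, branch_rows, strategy):
    """Return (merged_rows, conflicts) without mutating either input."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    conflicts = find_conflicts(main_rows, branch_rows)
    if conflicts and strategy in ("abort", "manual"):
        # Nothing is applied; the caller inspects the conflict list.
        return dict(main_rows), conflicts
    merged = dict(main_rows)
    for pk, row in branch_rows.items():
        if pk not in main_rows or strategy == "branch_wins":
            merged[pk] = row  # new row, or branch overrides main
        # Otherwise main's row is kept (main_wins, or the rows are identical).
    return merged, conflicts


main_rows = {1: {"price": 100}, 2: {"price": 200}}
branch_rows = {1: {"price": 100}, 2: {"price": 250}, 3: {"price": 300}}

branch_wins = merge(main_rows, branch_rows, "branch_wins")
main_wins = merge(main_rows, branch_rows, "main_wins")
aborted = merge(main_rows, branch_rows, "abort")
print(branch_wins)
print(main_wins)
print(aborted)
```

In this example row 2 is the only conflict and row 3 is new in the branch, so the new row is added under both branch_wins and main_wins, while abort leaves main unchanged.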
Garbage Collection
Branches consume resources — schemas, materialized tables, and metadata entries. Left unchecked, abandoned branches could accumulate and waste storage. HatiData includes automatic garbage collection that cleans up unused branches.
The garbage collector tracks two metrics per branch:
- Reference count — An atomic counter incremented when a query targets the branch and decremented when the query completes. A branch with zero references has no active queries.
- Last accessed timestamp — Updated on every query or write to the branch.
Branches are eligible for garbage collection when they have zero active references and have not been accessed for longer than the configured TTL (default 24 hours). The collector runs periodically (default every hour) and drops eligible branch schemas with all their contents.
Garbage Collector
|
+── Check reference counts (per branch)
|
+── Check last_accessed timestamps
|
+── For eligible branches:
| DROP SCHEMA branch_{uuid} CASCADE;
| Remove metadata entries
|
+── Log cleanup summary
Agents can also explicitly discard branches with branch_discard, which bypasses the TTL and immediately cleans up the branch resources. This is the recommended approach for well-behaved agents that know when their exploration is complete.
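The eligibility rule used by the collector (zero active references and no access for longer than the TTL) can be sketched as a single predicate. The function name `eligible_for_gc`, the tuple layout, and the branch names are hypothetical. Only the 24-hour default TTL is taken from the text.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(hours=24)  # default TTL stated in the text


def eligible_for_gc(ref_count, last_accessed, now, ttl=DEFAULT_TTL):
    """True when no query is using the branch and it has been idle past the TTL."""
    return ref_count == 0 and (now - last_accessed) > ttl


now = datetime(2026, 3, 2, 12, 0, tzinfo=timezone.utc)
branches = {
    # name: (active reference count, last accessed)
    "branch_idle": (0, now - timedelta(hours=30)),
    "branch_busy": (2, now - timedelta(hours=30)),
    "branch_recent": (0, now - timedelta(hours=1)),
}

to_drop = [
    name for name, (refs, seen) in branches.items()
    if eligible_for_gc(refs, seen, now)
]
print(to_drop)
```

Both conditions must hold: a long-idle branch with an in-flight query is kept, as is an unreferenced branch that was touched recently.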
Practical Patterns
What-If Analysis
A financial agent creates a branch, modifies pricing assumptions in a configuration table, runs revenue projections, and compares results against the main schema. If the new pricing looks better, it merges the configuration change. If not, it discards the branch and tries different assumptions.
Safe Data Transformation
A data engineering agent creates a branch, applies a series of transformations to a staging table, validates the output against quality rules, and only merges if all validations pass. The main data is never at risk during the transformation development process.
Parallel Hypothesis Testing
A research agent creates multiple branches simultaneously, each pursuing a different hypothesis. Each branch modifies data independently — adding derived columns, filtering datasets, computing aggregations. The agent queries all branches, compares results, and merges the most promising one while discarding the rest.
A/B Testing Agent Strategies
Two agent instances each get their own branch with the same starting data. They apply different strategies, and the results are compared. The winning strategy's branch is merged while the other is discarded. This is useful for evaluating agent behavior without the agents interfering with each other.
Performance Characteristics
Branch operations are designed to be fast enough for interactive agent workflows:
- Branch creation: Under 10ms for schemas with up to 100 tables, since no data is copied
- Read queries: Identical performance to main schema queries, since views are transparent to the query optimizer
- First write (materialization): Proportional to table size, as the full table must be copied. For a 1M row table with 10 columns, expect 200-500ms.
- Subsequent writes: Identical performance to main schema writes, since the table is already materialized
- Merge: Proportional to the number of changed rows, not total table size. Conflict detection uses primary key indexes.
- Discard: Under 5ms, as DROP SCHEMA CASCADE is very fast
Next Steps
Branch isolation works particularly well with chain-of-thought logging — every reasoning step within a branch is tagged with the branch ID, so you can replay the exact thinking that led to each branch's results. See the branch isolation documentation for configuration options, and the OpenAI branch isolation cookbook for a complete agent implementation.