Shadow Mode: Zero-Risk Data Warehouse Migration
The Migration Fear
Data warehouse migrations fail for predictable reasons. The new system handles 95% of queries perfectly, but the other 5% produce subtly different results: rounding differences in floating point arithmetic, different null handling semantics, timezone conversion discrepancies, or edge cases in SQL dialect translation. These differences are hard to find through pre-migration testing alone, because a test suite rarely covers every query and data edge case that production sees. They tend to surface in production, weeks or months after the migration, when someone notices that a report no longer matches expectations.
This fear keeps organizations on legacy infrastructure long after they have decided to move. The technical evaluation is complete, the cost analysis is favorable, but the migration itself is too risky. What if something breaks? What if the numbers do not match? What if we have to roll back after agents are already running on the new system?
Shadow Mode eliminates this fear by removing the need for a hard cutover. Instead of migrating all at once, you run both systems simultaneously and let the data prove that the new system is ready.
How Shadow Mode Works
Shadow Mode operates as a transparent proxy between your agents and your data infrastructure. Every query your agents send is executed twice — once against your existing warehouse and once against HatiData — and the results are automatically compared.
Agent sends query
|
v
Shadow Mode Proxy
|
+──> Existing Warehouse ──> Results A
|
+──> HatiData ──> Results B
|
v
Comparison Engine
|
+── Match: Results identical (or within tolerance)
+── Mismatch: Differences logged with full context
|
v
Return Results A to Agent (existing warehouse is still primary)
The critical property: agents always receive results from the existing warehouse. HatiData's results are used purely for comparison. This means Shadow Mode never changes what your production workloads see: if HatiData produces different results, the agent never receives them. The differences are logged for analysis, and the agent continues operating normally on the existing system.
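The flow above can be sketched in a few lines of Python. This is an illustration of the pattern only, not HatiData's implementation: the ShadowProxy class, the QueryFn callables, and the mismatch record layout are all hypothetical, and the comparison here is plain equality rather than the tolerant comparison described later.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Row = Tuple
QueryFn = Callable[[str], List[Row]]  # hypothetical: runs SQL, returns rows


class ShadowProxy:
    """Illustrative proxy: the primary answers, the shadow is only compared."""

    def __init__(self, primary: QueryFn, shadow: QueryFn):
        self.primary = primary
        self.shadow = shadow
        self.mismatches: List[dict] = []
        self._pool = ThreadPoolExecutor(max_workers=4)

    def execute(self, sql: str) -> List[Row]:
        rows_a = self.primary(sql)  # the agent waits only for this call
        # Shadow execution and comparison run on a worker thread,
        # off the request path.
        self._pool.submit(self._compare, sql, rows_a)
        return rows_a

    def _compare(self, sql: str, rows_a: List[Row]) -> None:
        try:
            rows_b = self.shadow(sql)
        except Exception as exc:  # a shadow failure must never reach the agent
            self.mismatches.append({"sql": sql, "error": repr(exc)})
            return
        if rows_a != rows_b:
            self.mismatches.append(
                {"sql": sql, "primary": rows_a, "shadow": rows_b}
            )

    def drain(self) -> None:
        """Wait for pending comparisons (useful in tests and at shutdown)."""
        self._pool.shutdown(wait=True)
```

The key design point is that the return value of execute depends only on the primary. A slow, failing, or wrong shadow can add entries to the mismatch log but cannot alter or delay what the agent receives.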
Setting Up Shadow Mode
Shadow Mode requires three things:
1. Your existing warehouse connection: the connection string for your current database, with read-only credentials
2. A HatiData instance: either a local installation or a cloud deployment
3. Data sync: your data replicated to HatiData (via the CLI sync tools or direct import)
The proxy is configured with a simple YAML file:
shadow_mode:
  enabled: true
  primary: existing_warehouse
  shadow: hatidata
  comparison:
    float_tolerance: 0.0001
    null_equality: true
    order_sensitive: false
    max_rows_to_compare: 10000
existing_warehouse:
  type: postgres
  host: your-warehouse.example.com
  port: 5432
  database: production
  user: readonly_user
hatidata:
  host: localhost
  port: 5439
  user: admin
The Comparison Engine
Not all result differences are meaningful. Shadow Mode's comparison engine understands the common categories of harmless differences and filters them from the mismatch reports.
Floating Point Tolerance
Different databases compute floating point arithmetic with slightly different precision. A SUM operation might return 1234567.890001 from one system and 1234567.890002 from another. Shadow Mode applies a configurable tolerance (default 0.0001) to floating point comparisons, marking results as matching if they are within tolerance.
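As a rough sketch of what such a check might look like, the function below compares two floats with an absolute tolerance using Python's math.isclose. The function name is hypothetical, and treating the tolerance as absolute (rather than relative) and treating NaN as equal to NaN are assumptions made for the example, not documented HatiData behavior.

```python
import math


def floats_match(a: float, b: float, tolerance: float = 0.0001) -> bool:
    """Absolute-tolerance comparison; NaN is treated as matching NaN."""
    if math.isnan(a) and math.isnan(b):
        return True  # assumption: two NaNs count as the same result
    # rel_tol=0.0 makes this a purely absolute comparison.
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tolerance)


# The SUM example from the text differs by about 0.000001, well within tolerance.
print(floats_match(1234567.890001, 1234567.890002))
# A difference of 0.001 exceeds the 0.0001 tolerance.
print(floats_match(1.0, 1.001))
```

One caveat with an absolute tolerance: around 10^12, adjacent float64 values are already about 0.0001 apart, so very large aggregates may need a relative tolerance instead.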
Row Ordering
When a query does not include an ORDER BY clause, the row order is undefined. Different databases return rows in different orders based on internal storage layout and query execution strategy. Shadow Mode compares result sets without regard to order by default, only checking order when the query explicitly includes ORDER BY.
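One common way to compare result sets without regard to order is to treat them as multisets, so that duplicate rows still count. The sketch below assumes rows arrive as sequences of hashable values; the function name is hypothetical, and the order_sensitive parameter simply mirrors the config key of the same name.

```python
from collections import Counter
from typing import Iterable, Sequence


def result_sets_match(
    rows_a: Iterable[Sequence],
    rows_b: Iterable[Sequence],
    order_sensitive: bool = False,
) -> bool:
    """Compare two result sets, optionally ignoring row order."""
    a = [tuple(row) for row in rows_a]
    b = [tuple(row) for row in rows_b]
    if order_sensitive:
        return a == b  # same rows in the same positions
    # Counter keeps multiplicities, so a duplicated row on one side only
    # is still reported as a difference.
    return Counter(a) == Counter(b)
```

A plain set comparison would be simpler but would hide row count mismatches caused by duplicates, which is exactly the kind of difference a migration check needs to catch.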
Null Handling
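Different SQL engines handle NULL differently in some edge cases, and in SQL itself the comparison NULL = NULL evaluates to unknown rather than true. Shadow Mode provides a null_equality flag that controls whether two NULL values in the same position are considered equal when result sets are compared.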
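The effect of the flag can be illustrated with a small cell-level comparison. This sketch assumes NULL is represented as Python's None, as database drivers following the DB-API convention do; the function name is hypothetical.

```python
def cells_match(a, b, null_equality: bool = True) -> bool:
    """Compare two cells, with configurable treatment of NULL (None)."""
    if a is None or b is None:
        # Two NULLs match only when null_equality is on;
        # a NULL never matches a non-NULL value.
        return null_equality and a is None and b is None
    return a == b
```

With null_equality set to false, every pair of NULLs is reported as a mismatch, which is rarely useful for migration checks but matches strict SQL comparison semantics.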
Type Coercion
A column might be returned as INTEGER from one system and BIGINT from another, or as FLOAT32 vs FLOAT64. Shadow Mode compares values after normalizing to common types, so type differences that do not affect value accuracy are not flagged as mismatches.
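A minimal sketch of this kind of normalization, assuming values arrive as Python objects from two different drivers. The mapping rules here (integers to int, whole-number decimals to int, other numerics to float) are illustrative choices, not HatiData's documented rules.

```python
from decimal import Decimal
from numbers import Integral, Real


def normalize(value):
    """Map driver-specific numeric types onto a small set of Python types."""
    if value is None or isinstance(value, bool):
        return value
    if isinstance(value, Integral):
        return int(value)  # INTEGER and BIGINT both become int
    if isinstance(value, Decimal):
        # Whole-number decimals compare as ints, everything else as float.
        if value.is_finite() and value == value.to_integral_value():
            return int(value)
        return float(value)
    if isinstance(value, Real):
        return float(value)  # single and double precision both become float
    return value  # strings, dates, bytes, etc. pass through unchanged


def normalize_row(row):
    return tuple(normalize(v) for v in row)
```

Converting Decimal to float can lose precision for high-scale values, so a stricter comparison might keep decimals as Decimal and convert floats toward them instead. Values widened from single precision will also differ slightly from their double-precision counterparts, so normalization of this kind is normally paired with a float tolerance.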
The Migration Dashboard
Shadow Mode includes a dedicated dashboard view that tracks comparison results over time:
- Match rate — Percentage of queries that produce identical results (the target is 100%)
- Mismatch categories — Breakdown of differences by type (value, type, row count, column count, error)
- Latency comparison — Side-by-side latency for each query on both systems
- Trending — Match rate over time, showing whether the rate is improving as dialect issues are resolved
- Mismatch drill-down — For each mismatched query, the exact SQL, both result sets, and highlighted differences
The dashboard makes it easy to identify which queries need attention. Typically, the first week of Shadow Mode reveals a small number of SQL dialect issues that need transpiler fixes. Once those are resolved, the match rate approaches 100% and stays there.
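The two headline numbers, match rate and mismatch breakdown, are simple aggregations over the comparison log. The sketch below assumes a hypothetical Comparison record in which category is None for a match and otherwise one of the mismatch types listed above; the actual log format is not specified here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Comparison:
    sql: str
    category: Optional[str] = None  # None means the results matched


def summarize(comparisons: Iterable[Comparison]) -> dict:
    """Compute the match rate (percent) and a mismatch count per category."""
    items = list(comparisons)
    mismatches = Counter(c.category for c in items if c.category is not None)
    matched = len(items) - sum(mismatches.values())
    rate = 100.0 * matched / len(items) if items else 0.0
    return {"match_rate": rate, "mismatches": dict(mismatches)}
```

Grouping the same records by day instead of by category would give the trending view.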
Migration Phases
Phase 1: Shadow (Week 1-2)
Enable Shadow Mode with your existing warehouse as primary. All agents continue using the existing warehouse. HatiData runs every query in parallel and results are compared. Fix any dialect or transpilation issues that cause mismatches.
Phase 2: Validation (Week 2-3)
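Once the match rate has held at 100% for at least 7 consecutive days, you have strong evidence that HatiData produces matching results for the queries your workload actually ran in that window. Queries that run less often than weekly, such as month-end reports, may not have been exercised yet, so confirm they appear in the comparison log or extend the window. Review the latency comparison to ensure HatiData meets your performance requirements.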
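The seven-day criterion is easy to turn into an automated gate. The sketch below assumes a hypothetical mapping from calendar day to that day's match rate in percent; a day with no data counts as a failure, so a gap in the log cannot pass the gate by accident.

```python
from datetime import date, timedelta
from typing import Dict


def ready_to_flip(
    daily_match_rate: Dict[date, float],
    today: date,
    required_days: int = 7,
) -> bool:
    """True if each of the last `required_days` days, ending today,
    has a recorded match rate of exactly 100%."""
    for offset in range(required_days):
        day = today - timedelta(days=offset)
        if daily_match_rate.get(day) != 100.0:
            return False  # missing day or any mismatch resets the streak
    return True
```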
Phase 3: Flip (Day 1 of migration)
Switch the primary to HatiData. Agents now receive results from HatiData, and the existing warehouse becomes the shadow. This is a single configuration change — no agent code modifications needed.
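In terms of the YAML shown earlier, the flip amounts to swapping the primary and shadow values. A minimal sketch, assuming the config has been loaded into a Python dict with the same keys; the flip function itself is hypothetical.

```python
import copy


def flip(config: dict) -> dict:
    """Return a copy of the config with primary and shadow swapped."""
    flipped = copy.deepcopy(config)  # leave the original untouched
    mode = flipped["shadow_mode"]
    mode["primary"], mode["shadow"] = mode["shadow"], mode["primary"]
    return flipped


before = {"shadow_mode": {"enabled": True,
                          "primary": "existing_warehouse",
                          "shadow": "hatidata"}}
after = flip(before)
print(after["shadow_mode"]["primary"])  # hatidata
```

Because the change is symmetric, applying flip a second time restores the original configuration, which is what makes rollback a single configuration change as well.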
Phase 4: Verify (Week 1 post-flip)
Run with HatiData as primary and your existing warehouse as shadow for at least one week. Confirm that all agents continue operating correctly. The shadow comparison provides an automatic safety net — if any query produces different results, you have immediate visibility.
Phase 5: Decommission
Once you are confident in the migration, disable Shadow Mode and decommission the existing warehouse connection. The migration is complete.
Zero Downtime Guarantees
At no point during the Shadow Mode migration does any agent experience downtime. The proxy handles the primary/shadow flip transparently:
- No connection string changes for agents
- No query syntax changes
- No schema modifications
- No application code changes
- No deployment coordination
The agents are unaware that a migration is happening. They send the same queries to the same endpoint and receive the same results. The only change is which system is providing those results.
Performance Impact
Shadow Mode does add overhead: every query runs twice, and the comparison engine processes both result sets. However, this overhead shows up as extra network and compute cost, not as added agent-facing latency or changed results:
- Latency — Agents experience the latency of the primary system only. Shadow execution is asynchronous and does not block the response.
- Network — Additional bandwidth for the shadow queries. For most workloads, this is negligible.
- Compute — The shadow system (HatiData) consumes compute resources for its parallel execution. This is the cost of validation.
For cost-sensitive environments, Shadow Mode supports sampling — only a configurable percentage of queries (e.g., 10%) are executed on both systems. This reduces the compute cost while still providing statistical confidence in result equivalence.
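One way to implement such sampling is to hash a per-request identifier into a fixed number of buckets and shadow only the requests that land below the configured threshold. This is a sketch of the general technique; the function name, the request ID input, and the bucket count are assumptions, not HatiData's mechanism.

```python
import hashlib


def should_shadow(request_id: str, sample_percent: float) -> bool:
    """Decide deterministically whether a request is also sent to the shadow."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the hash to one of 10,000 buckets (0.01% granularity).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sample_percent * 100


# With sample_percent=10, roughly one request in ten is shadowed.
sampled = sum(should_shadow(f"req-{i}", 10) for i in range(10_000))
print(sampled)
```

Hashing a per-request ID rather than the SQL text matters: hashing the SQL would mean a given query is either always or never compared, leaving permanent blind spots in coverage.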
Next Steps
Shadow Mode is available on all HatiData tiers, including the free local installation. To start a Shadow Mode evaluation, install HatiData locally, sync your data with the CLI, and configure the shadow proxy. The entire setup takes under an hour for most workloads.