Multi-Cloud Agent Deployment: AWS, GCP, Azure
The Multi-Cloud Reality
Organizations do not choose a single cloud anymore. An acquisition brings in a team that runs on GCP while the parent company is on AWS. A data residency requirement mandates Azure in Europe. A cost optimization initiative moves dev workloads to one cloud while production stays on another. The reality is multi-cloud, whether planned or not.
Agent infrastructure that only runs on one cloud becomes a constraint. If your agent data warehouse only deploys on AWS, your GCP-native team cannot use it without cross-cloud networking — adding latency, cost, and operational complexity. If your agents are locked to one cloud's identity system, you cannot leverage managed identities on other clouds.
HatiData runs the same native binary on all three major clouds. The same codebase, the same feature set, the same API. Cloud-specific implementations are abstracted behind clean interfaces and selected at compile time.
Compile-Time Cloud Selection
HatiData supports three cloud targets: AWS, GCP, and Azure. The default build includes support for all three clouds. For deployments that only need one cloud, a targeted build reduces binary size and eliminates unnecessary dependencies.
The local-only build is useful for development and testing — it includes the query engine, the query pipeline, MCP tools, and all agent features, but uses local filesystem storage instead of cloud blob stores, and local key files instead of cloud KMS.
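Compile-time selection like this is typically expressed through Cargo feature flags. The sketch below is a hypothetical `Cargo.toml` layout, with feature names taken from the `cloud-aws`/`cloud-gcp`/`cloud-azure` labels used later in this page; the actual dependency names and manifest structure are assumptions:

```toml
# Hypothetical feature layout (names assumed, not HatiData's actual manifest).
[features]
default     = ["cloud-aws", "cloud-gcp", "cloud-azure"]  # all-clouds build
cloud-aws   = ["dep:aws-sdk-s3", "dep:aws-sdk-kms"]
cloud-gcp   = ["dep:google-cloud-storage"]
cloud-azure = ["dep:azure_storage_blobs"]
# Enabling no cloud feature yields the local-only build.
```

A single-cloud build would then be something like `cargo build --no-default-features --features cloud-gcp`, pulling in only that cloud's SDK dependencies.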
Cloud-Agnostic Abstractions
Three core infrastructure concerns vary between clouds: blob storage, key management, and identity federation. HatiData abstracts each behind a clean interface with cloud-specific implementations.
BlobStorageClient
The BlobStorageClient trait provides a unified interface for object storage:
```rust
// Simplified trait definition
trait BlobStorageClient: Send + Sync {
    async fn put_object(&self, key: &str, data: &[u8]) -> Result<()>;
    async fn get_object(&self, key: &str) -> Result<Vec<u8>>;
    async fn delete_object(&self, key: &str) -> Result<()>;
    async fn list_objects(&self, prefix: &str) -> Result<Vec<String>>;
}
```

Implementations:
- S3Client (cloud-aws) — Uses AWS SDK for S3
- GcsClient (cloud-gcp) — Uses Google Cloud Storage client
- AzureBlobClient (cloud-azure) — Uses Azure Storage Blobs SDK
- LocalFsClient (always available) — Uses local filesystem
A factory function selects the implementation based on the configured cloud provider. The rest of the codebase uses the interface, never the concrete implementation.
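A minimal sketch of that factory pattern, simplified to a synchronous trait and an in-memory stand-in for LocalFsClient; the function names and the `provider` config value are illustrative, not HatiData's actual API:

```rust
use std::collections::HashMap;

// Simplified, synchronous version of the storage trait for illustration.
trait BlobStorageClient {
    fn put_object(&mut self, key: &str, data: &[u8]);
    fn get_object(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory stand-in for the always-available LocalFsClient.
struct LocalFsClient {
    objects: HashMap<String, Vec<u8>>,
}

impl BlobStorageClient for LocalFsClient {
    fn put_object(&mut self, key: &str, data: &[u8]) {
        self.objects.insert(key.to_string(), data.to_vec());
    }
    fn get_object(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.get(key).cloned()
    }
}

/// Factory: pick an implementation from configuration. The cloud-backed
/// arms would be gated behind #[cfg(feature = "cloud-aws")] and friends,
/// so a local-only build compiles without any cloud SDK.
fn blob_client(provider: &str) -> Box<dyn BlobStorageClient> {
    match provider {
        // "aws" => Box::new(S3Client::new(...)),   // behind cloud-aws
        // "gcp" => Box::new(GcsClient::new(...)),  // behind cloud-gcp
        _ => Box::new(LocalFsClient { objects: HashMap::new() }),
    }
}

fn main() {
    // Callers only ever see the trait object, never the concrete type.
    let mut client = blob_client("local");
    client.put_object("datasets/demo.parquet", b"bytes");
    assert_eq!(
        client.get_object("datasets/demo.parquet"),
        Some(b"bytes".to_vec())
    );
    println!("round-trip ok");
}
```

The design choice to return `Box<dyn BlobStorageClient>` is what keeps the rest of the codebase cloud-agnostic: call sites depend on the interface, and the concrete client is decided once, at startup.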
KeyManager
The KeyManager trait provides encryption key management:
```rust
trait KeyManager: Send + Sync {
    async fn encrypt(&self, plaintext: &[u8]) -> Result<Vec<u8>>;
    async fn decrypt(&self, ciphertext: &[u8]) -> Result<Vec<u8>>;
    async fn generate_data_key(&self) -> Result<DataKey>;
}
```

Implementations:
- AwsKmsKeyManager (cloud-aws) — Uses AWS KMS for envelope encryption
- GcpKmsKeyManager (cloud-gcp) — Uses Google Cloud KMS
- AzureKeyVaultKeyManager (cloud-azure) — Uses Azure Key Vault
- LocalKeyManager (always available) — Uses local key files for development
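The envelope-encryption shape behind `generate_data_key` can be sketched as follows. This is a toy illustration: the XOR "cipher" stands in for real AES and KMS calls and must never be used for actual encryption; only the key-wrapping flow is the point:

```rust
// Toy envelope-encryption flow. XOR is a placeholder for a real cipher;
// in the cloud backends, wrapping/unwrapping would be KMS API calls.

struct DataKey {
    plaintext: Vec<u8>, // used locally to encrypt data, then discarded
    wrapped: Vec<u8>,   // stored alongside the ciphertext
}

fn xor(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

struct LocalKeyManager {
    master_key: Vec<u8>,
}

impl LocalKeyManager {
    /// Generate a fresh data key and wrap it with the master key.
    /// A cloud backend would call the KMS GenerateDataKey API here.
    fn generate_data_key(&self) -> DataKey {
        let plaintext = vec![0x42; 32]; // real code: random bytes
        let wrapped = xor(&plaintext, &self.master_key);
        DataKey { plaintext, wrapped }
    }

    /// Unwrap a stored data key (a KMS Decrypt call in cloud backends).
    fn unwrap_data_key(&self, wrapped: &[u8]) -> Vec<u8> {
        xor(wrapped, &self.master_key)
    }
}

fn main() {
    let km = LocalKeyManager { master_key: b"master-secret".to_vec() };
    let dk = km.generate_data_key();

    // Encrypt the payload with the data key; persist ciphertext + wrapped key.
    let ciphertext = xor(b"agent telemetry", &dk.plaintext);

    // Later: unwrap the data key with the master key, then decrypt.
    let recovered = xor(&ciphertext, &km.unwrap_data_key(&dk.wrapped));
    assert_eq!(recovered, b"agent telemetry");
    println!("envelope round-trip ok");
}
```

The master key never touches the data itself, which is why swapping the key backend (local file, AWS KMS, Cloud KMS, Key Vault) does not require re-encrypting stored data, only re-wrapping data keys.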
FederationProvider
The FederationProvider trait handles cloud-native identity:
- GCP Workload Identity — Allows pods running on GKE to authenticate as GCP service accounts without managing keys
- Azure Managed Identity — Allows VMs and containers to authenticate to Azure services without credentials in code
These integrations mean that HatiData running on GKE uses GCP's native identity system, and HatiData running on AKS uses Azure's native identity system. No cloud credentials need to be stored in configuration files or environment variables.
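On GKE, for example, Workload Identity is wired up by annotating the Kubernetes service account with the GCP service account it should impersonate. The annotation key below is the standard GKE one; the service account names and namespace are hypothetical:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hatidata-data-plane   # hypothetical name
  namespace: hatidata         # hypothetical namespace
  annotations:
    # Standard GKE Workload Identity annotation: pods using this service
    # account authenticate as the named GCP service account, no key files.
    iam.gke.io/gcp-service-account: hatidata@my-project.iam.gserviceaccount.com
```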
Terraform Modules
Each cloud has a dedicated set of Terraform modules in the terraform/ directory:
```
terraform/
  aws/
    modules/
      eks/          # Amazon EKS cluster
      privatelink/  # VPC endpoint service
      s3/           # S3 buckets with encryption
      kms/          # KMS key management
      iam/          # IAM roles and policies
  gcp/
    modules/
      gke/          # Google Kubernetes Engine
      cloud_run/    # Serverless deployment option
      cloudsql/     # PostgreSQL for control plane
      secrets/      # Secret Manager integration
      registry/     # Artifact Registry for images
      cloud_kms/    # Cloud KMS key management
  azure/
    modules/
      aks/          # Azure Kubernetes Service
      private_link/ # Private endpoint configuration
      blob/         # Azure Blob Storage
      key_vault/    # Key Vault for secrets and keys
      identity/     # Managed Identity assignments
      networking/   # VNet, NSG, peering
  shared/
    variables.tf    # Common variables
    outputs.tf      # Common outputs
```

Each cloud's modules are independent: you can deploy on GCP without any AWS or Azure Terraform state. The shared directory contains common variable definitions for consistency.
Deployment is configured through environment-specific variable files:
```sh
# Deploy to GCP production
cd terraform/gcp
terraform plan -var-file=environments/production.tfvars

# Deploy to AWS staging
cd terraform/aws
terraform plan -var-file=environments/staging.tfvars
```

Configuration Fallback Chains
To ease migration between clouds, HatiData's configuration supports fallback chains. Cloud-agnostic configuration names (e.g., storage bucket, cloud region, storage endpoint) are preferred, but legacy AWS-specific names are automatically detected and used if the new names are not set.
This means existing AWS deployments continue to work with their current configuration, while new deployments use the cloud-agnostic names.
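The lookup logic behind such a chain is small: try the cloud-agnostic key, then fall back to the legacy one. The sketch below is illustrative, and the key names in it are invented, not HatiData's actual configuration keys:

```rust
use std::collections::HashMap;

/// Resolve a setting with a fallback chain: prefer the cloud-agnostic key,
/// fall back to the legacy AWS-specific key if the new one is absent.
/// Key names here are illustrative only.
fn lookup(cfg: &HashMap<&str, &str>, agnostic: &str, legacy: &str) -> Option<String> {
    cfg.get(agnostic)
        .or_else(|| cfg.get(legacy))
        .map(|s| s.to_string())
}

fn main() {
    // Existing AWS deployment: only the legacy key is set, and it still works.
    let mut cfg = HashMap::new();
    cfg.insert("legacy_s3_bucket", "agent-data");
    assert_eq!(
        lookup(&cfg, "storage_bucket", "legacy_s3_bucket").as_deref(),
        Some("agent-data")
    );

    // After migration: the cloud-agnostic key wins when both are present.
    cfg.insert("storage_bucket", "agent-data-gcs");
    assert_eq!(
        lookup(&cfg, "storage_bucket", "legacy_s3_bucket").as_deref(),
        Some("agent-data-gcs")
    );
    println!("fallback chain ok");
}
```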
CI Matrix Testing
HatiData's CI pipeline tests five build variants on every push:
| Variant | What It Tests |
|---|---|
| all-clouds | Full build with all cloud support |
| aws-only | AWS-specific deployment |
| gcp-only | GCP-specific deployment |
| azure-only | Azure-specific deployment |
| local-only | No cloud dependencies |
This matrix ensures that cloud-specific code does not accidentally depend on another cloud's types, and that local-only builds do not break when cloud code changes. The CI also runs terraform validate against each cloud's Terraform modules.
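One way to express such a five-variant matrix, assuming GitHub Actions and Cargo feature flags; HatiData's actual CI system and build flags are not shown in this doc, so everything below is a sketch:

```yaml
# Hypothetical CI matrix covering the five build variants.
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          - { variant: all-clouds, flags: "" }
          - { variant: aws-only,   flags: "--no-default-features --features cloud-aws" }
          - { variant: gcp-only,   flags: "--no-default-features --features cloud-gcp" }
          - { variant: azure-only, flags: "--no-default-features --features cloud-azure" }
          - { variant: local-only, flags: "--no-default-features" }
    steps:
      - uses: actions/checkout@v4
      - run: cargo build ${{ matrix.flags }}
      - run: cargo test ${{ matrix.flags }}
```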
Practical Multi-Cloud Patterns
Primary Cloud + Edge Regions
Run your primary data plane on your main cloud (e.g., AWS us-east-1) and deploy edge data planes on other clouds for specific regions (e.g., GCP asia-southeast1 for APAC, Azure westeurope for EU). The control plane manages all data planes from a single dashboard.
Disaster Recovery
Run a standby data plane on a different cloud from your primary. If the primary cloud experiences an outage, failover to the standby. Since HatiData uses standard storage formats (Parquet, Arrow), data replication between clouds uses cloud-native tools like cross-cloud object replication.
Cost Arbitrage
Different clouds offer different pricing for the same workloads. Run batch analytics on whichever cloud offers the best spot pricing today. HatiData's cloud-agnostic interface means your agents do not need to know which cloud they are querying — the connection string changes, but the queries stay the same.
Next Steps
For cloud-specific deployment guides with step-by-step instructions, see the multi-cloud documentation. The Terraform modules include example variable files for dev, staging, and production environments on each cloud.