AI Engineering

Google Vertex AI Agent Builder + HatiData

HatiData Team · 6 min read

Vertex AI Agent Builder and the Memory Gap

Google's Vertex AI Agent Builder provides a managed platform for building AI agents powered by Gemini models. It handles model serving, tool orchestration, and conversation management. But like most agent frameworks, it treats each conversation as stateless — the agent has no memory of previous interactions, no accumulated knowledge, and no persistent state.

For enterprise deployments where agents handle complex workflows over weeks and months — customer onboarding sequences, incident management across shifts, ongoing compliance monitoring — this statelessness is a significant limitation. The agent that helped a customer yesterday has no recollection of it today.

HatiData adds persistent memory to Vertex AI agents through its GCP-native deployment. Since HatiData runs on Cloud Run in the same GCP project as your Vertex AI resources, the integration is seamless — no cross-cloud networking, no additional credentials, and no data leaving your GCP environment.

GCP-Native Deployment

HatiData's GCP deployment uses Cloud Run for the control plane and GKE (or a standalone VM) for the data plane. Both run in your GCP project, authenticated via Workload Identity.

Your GCP Project
+--------------------------------------------------+
|                                                  |
|  Vertex AI Agent Builder                         |
|       |                                          |
|       | Function calls                           |
|       v                                          |
|  Cloud Functions (tools)                         |
|       |                                          |
|       | HTTP API                                 |
|       v                                          |
|  HatiData Data Plane (GKE/VM)                    |
|    +-- Query Engine (SSD-backed)                 |
|    +-- Vector Index                              |
|    +-- MCP Server                                |
|                                                  |
|  HatiData Control Plane (Cloud Run)              |
|    +-- Dashboard                                 |
|    +-- Billing                                   |
|    +-- Auth (WorkOS)                             |
|                                                  |
+--------------------------------------------------+

Workload Identity

HatiData on GKE uses GCP Workload Identity for authentication. The HatiData pod runs as a Kubernetes service account that is bound to a GCP service account via Workload Identity Federation. This means:

  • No service account key files stored in environment variables or secrets
  • No credential rotation — GCP handles token issuance and renewal automatically
  • IAM policies control access to Cloud Storage, Cloud KMS, and other GCP services
  • Full audit trail in Cloud Audit Logs

Building a Vertex AI Agent with Memory

Vertex AI Agent Builder agents use tool definitions to interact with external systems. HatiData is integrated as a set of Cloud Functions that the agent can call:

python
# Cloud Function: store_memory
import os

import functions_framework
import httpx

HATIDATA_URL = "http://hatidata-proxy.internal:5439"  # Internal GCP endpoint
HATIDATA_KEY = os.environ["HATIDATA_KEY"]  # Injected from Secret Manager; never hardcode keys

@functions_framework.http
def store_memory(request):
    """Store a memory in HatiData."""
    data = request.get_json()
    response = httpx.post(
        f"{HATIDATA_URL}/v1/memory/store",
        headers={"Authorization": f"Bearer {HATIDATA_KEY}"},
        json={
            "content": data["content"],
            "namespace": data.get("namespace", "vertex-agent"),
            "metadata": data.get("metadata", {}),
        },
    )
    return response.json()

@functions_framework.http
def search_memory(request):
    """Search memories in HatiData."""
    data = request.get_json()
    response = httpx.post(
        f"{HATIDATA_URL}/v1/memory/search",
        headers={"Authorization": f"Bearer {HATIDATA_KEY}"},
        json={
            "query": data["query"],
            "namespace": data.get("namespace", "vertex-agent"),
            "limit": data.get("limit", 5),
        },
    )
    return response.json()

@functions_framework.http
def query_data(request):
    """Execute a SQL query against HatiData."""
    data = request.get_json()
    response = httpx.post(
        f"{HATIDATA_URL}/v1/query",
        headers={"Authorization": f"Bearer {HATIDATA_KEY}"},
        json={"sql": data["sql"]},
    )
    return response.json()
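On the agent side, every tool call ultimately reduces to picking the right endpoint and building the JSON body those functions expect. A minimal sketch of that dispatch logic (the tool names, paths, and default namespace mirror the functions above; the dispatcher itself is illustrative, not part of the HatiData SDK):

```python
# Hypothetical dispatcher mapping tool calls to the Cloud Function payloads above.
TOOL_ROUTES = {
    "store_memory": "/v1/memory/store",
    "search_memory": "/v1/memory/search",
    "query_data": "/v1/query",
}

def build_tool_request(tool_name: str, args: dict) -> dict:
    """Translate a tool call into the HTTP path and JSON body HatiData expects."""
    if tool_name not in TOOL_ROUTES:
        raise ValueError(f"Unknown tool: {tool_name}")
    body = dict(args)
    if tool_name in ("store_memory", "search_memory"):
        body.setdefault("namespace", "vertex-agent")  # same default as the functions above
    return {"path": TOOL_ROUTES[tool_name], "json": body}

req = build_tool_request("search_memory", {"query": "onboarding progress", "limit": 3})
```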

Agent Builder Tool Configuration

In the Vertex AI Agent Builder console, configure these Cloud Functions as tools:

yaml
tools:
  - name: store_memory
    description: "Store important information for future conversations"
    parameters:
      content:
        type: string
        description: "The information to remember"
      namespace:
        type: string
        description: "Category for the memory"
      metadata:
        type: object
        description: "Additional structured data"
    endpoint: https://us-central1-your-project.cloudfunctions.net/store_memory

  - name: search_memory
    description: "Search past conversations and stored knowledge"
    parameters:
      query:
        type: string
        description: "What to search for"
      namespace:
        type: string
        description: "Category to search within"
      limit:
        type: integer
        description: "Maximum number of results"
    endpoint: https://us-central1-your-project.cloudfunctions.net/search_memory

  - name: query_data
    description: "Run SQL queries against the data warehouse"
    parameters:
      sql:
        type: string
        description: "SQL query to execute"
    endpoint: https://us-central1-your-project.cloudfunctions.net/query_data
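Under the hood, these console entries correspond to JSON-schema-style function declarations passed to Gemini. As a rough illustration (structure simplified; consult the Agent Builder docs for the exact declaration format), the store_memory entry maps to something like:

```python
# Illustrative only: convert a console tool entry into a JSON-schema-style declaration.
def to_declaration(name: str, description: str, params: dict) -> dict:
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                p: {"type": spec["type"], "description": spec["description"]}
                for p, spec in params.items()
            },
        },
    }

decl = to_declaration(
    "store_memory",
    "Store important information for future conversations",
    {
        "content": {"type": "string", "description": "The information to remember"},
        "namespace": {"type": "string", "description": "Category for the memory"},
    },
)
```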

Gemini Agent Patterns

Pattern 1: Memory-Augmented Customer Support

A Vertex AI agent handles customer support with persistent memory:

Customer: "Hi, I'm having trouble with my integration again"

Agent thinking:
1. search_memory("customer integration trouble") -> Retrieves:
   "Customer had OAuth token refresh issue on 2026-02-15, resolved by updating redirect URI"
2. Agent responds: "I see you had an OAuth integration issue last month.
   Is this the same redirect URI problem, or a different integration issue?"

Customer: "Different issue - the webhook isn't firing"

Agent thinking:
1. store_memory("Customer reporting webhook not firing, different from previous OAuth issue")
2. query_data("SELECT * FROM webhook_logs WHERE customer_id = 'cust_123' ORDER BY created_at DESC LIMIT 10")
3. Agent responds with specific diagnostic information based on webhook logs
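The flow above can be sketched with the memory tools stubbed out as an in-memory list (illustrative only; a real agent calls the search_memory and store_memory Cloud Functions, and HatiData does semantic vector search rather than keyword matching):

```python
# In-memory stand-ins for the search_memory / store_memory Cloud Functions.
MEMORIES: list[str] = [
    "Customer had OAuth token refresh issue on 2026-02-15, "
    "resolved by updating redirect URI"
]

def search_memory(query: str) -> list[str]:
    # Toy keyword overlap; HatiData would use vector similarity here.
    words = set(query.lower().split())
    return [m for m in MEMORIES if words & set(m.lower().split())]

def store_memory(content: str) -> None:
    MEMORIES.append(content)

# Turn 1: recall prior context before answering.
hits = search_memory("customer OAuth integration trouble")
# Turn 2: record the new, distinct issue for future sessions.
store_memory("Customer reporting webhook not firing, different from previous OAuth issue")
```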

Pattern 2: Enterprise Onboarding Sequences

A Vertex AI agent guides customers through a multi-step onboarding process that spans days:

Day 1:
  Agent: search_memory("onboarding progress customer_456") -> No results (new customer)
  Agent: Guides through Step 1 (account setup)
  Agent: store_memory("customer_456 completed Step 1: account setup, preferences: enterprise tier, APAC region")

Day 3:
  Agent: search_memory("onboarding progress customer_456") -> Retrieves Day 1 progress
  Agent: "Welcome back! You completed account setup on Monday. Ready for Step 2: data import?"
  Agent: store_memory("customer_456 completed Step 2: data import, 3 tables imported, 1.2M rows total")

Day 5:
  Agent: search_memory("onboarding progress customer_456") -> Retrieves Day 1 and Day 3 progress
  Agent: "Great progress! Account setup and data import are complete. Let's move to Step 3: query testing."

Pattern 3: Compliance Monitoring

A Vertex AI agent monitors data access patterns and flags anomalies:

python
# Agent runs on a schedule (Cloud Scheduler -> Cloud Function -> Vertex AI)
# Each run:
# 1. search_memory("recent compliance alerts") for context
# 2. query_data("SELECT * FROM audit_log WHERE ...") for current data
# 3. Compare against policies
# 4. store_memory("Compliance check: 3 anomalies detected - ...") for history
# 5. If critical: trigger alert via webhook
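With the HatiData calls stubbed out, one scheduled pass might look like the following (the policy, thresholds, and audit-row shape are made up for illustration):

```python
def run_compliance_check(prior_alerts: list, audit_rows: list, max_rows_per_user: int = 1000) -> dict:
    """One scheduled pass: compare current audit data against a simple row-read policy."""
    anomalies = [r for r in audit_rows if r["rows_read"] > max_rows_per_user]
    summary = f"Compliance check: {len(anomalies)} anomalies detected"
    # Escalate when any anomaly exceeds the policy limit by 10x.
    critical = any(r["rows_read"] > 10 * max_rows_per_user for r in anomalies)
    return {"summary": summary, "anomalies": anomalies, "critical": critical}

result = run_compliance_check(
    prior_alerts=[],  # would come from search_memory("recent compliance alerts")
    audit_rows=[      # would come from query_data("SELECT * FROM audit_log WHERE ...")
        {"user": "svc-etl", "rows_read": 500},
        {"user": "alice", "rows_read": 15000},
    ],
)
# result["summary"] would be passed to store_memory; result["critical"] gates the webhook alert.
```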

Private Networking

For production deployments, all communication between Vertex AI, Cloud Functions, and HatiData stays within GCP's internal network:

  • Cloud Functions connect to HatiData via VPC connector (Serverless VPC Access)
  • HatiData runs on an internal IP address with no public exposure
  • Vertex AI Agent Builder calls Cloud Functions via their HTTPS endpoint (within GCP backbone)

This means no data traverses the public internet. The entire pipeline — from user request to memory storage — stays within your GCP project's network boundary.

Terraform Deployment

HatiData's GCP Terraform modules deploy the full stack:

bash
cd terraform/gcp
terraform init
terraform plan -var-file=environments/production.tfvars
terraform apply

The deployment creates:

  • GKE cluster (or Cloud Run service) for the data plane
  • Cloud Run service for the control plane
  • Cloud SQL for control plane metadata
  • Secret Manager entries for API keys and configuration
  • Artifact Registry for container images
  • IAM bindings for Workload Identity
  • VPC networking with private subnets

Cost Optimization on GCP

Running HatiData alongside Vertex AI on GCP provides several cost advantages:

  • No cross-cloud data transfer — All data stays within GCP, avoiding egress charges
  • Spot/preemptible nodes — The HatiData data plane can run on spot instances for dev/staging environments (roughly 60-70% savings)
  • Committed use discounts — GCP CUD pricing applies to the compute resources running HatiData
  • Auto-scaling — Cloud Run scales the control plane to zero when not in use, and GKE node auto-scaler adjusts data plane capacity
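As a back-of-envelope illustration of the spot-instance savings above (the hourly rate is a placeholder; actual GCP pricing varies by machine type and region):

```python
# Placeholder on-demand rate for a data-plane node; not a real GCP price.
on_demand_monthly = 730 * 0.20  # 730 hours/month x $0.20/hour

# Applying the roughly 60-70% spot discount cited above:
spot_monthly_high_discount = on_demand_monthly * (1 - 0.70)
spot_monthly_low_discount = on_demand_monthly * (1 - 0.60)
```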

Next Steps

The Vertex AI integration is ideal for GCP-native organizations building enterprise agents with Gemini models. For multi-cloud deployments, see the multi-cloud architecture guide. For the complete set of 24 MCP tools (which can also be exposed as Cloud Functions), see the MCP tools reference. For other framework integrations, see the LangChain, CrewAI, and OpenAI Agents SDK guides.

Ready to see the difference?

Run the free audit script in 5 minutes. Or start Shadow Mode and see HatiData run your actual workloads side-by-side.