dbt · Intermediate · 25 min

dbt + HatiData: dbt Adapter Transforms

Use the HatiData dbt adapter to run SQL-based data transformations in the agent data warehouse. Create models, run tests, and build data pipelines.

What You'll Build

A dbt project using the HatiData adapter for SQL-based data transformations in the agent data warehouse.

Prerequisites

$ pip install dbt-core dbt-hatidata

$ hati init

Architecture

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  dbt Models  │───▶│  HatiData    │───▶│   Engine     │
│  (SQL)       │    │  Adapter     │    │  (execute)   │
└──────────────┘    └──────────────┘    └──────────────┘
  models/ → compiled SQL → proxy (:5439) → materialized tables

Key Concepts

  • Postgres wire protocol: HatiData's proxy speaks the Postgres wire protocol, so Postgres-compatible dbt tooling works against it out of the box
  • Model layering: staging models clean raw data, marts models aggregate for analytics — standard dbt best practice
  • Fast execution: SQL transformations run on HatiData's query engine for fast analytical processing
  • Data testing: dbt tests validate data quality after every transformation run
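
The layering convention above can be encoded in the project file, so every model in a folder gets the right materialization by default. A minimal sketch, assuming the project name `hatidata_project` (matching the profile configured later):

```yaml
# dbt_project.yml (sketch; folder names are illustrative)
name: hatidata_project
version: "1.0.0"
profile: hatidata_project

models:
  hatidata_project:
    staging:
      +materialized: view    # lightweight cleaning layer
    marts:
      +materialized: table   # analytics-ready aggregates
```

With this in place, individual models only need a `{{ config(...) }}` block when they deviate from the folder default.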

Step-by-Step Implementation

1

Install the dbt Adapter

Install dbt-core and the HatiData adapter.

Bash
pip install dbt-core dbt-hatidata
hati init
dbt --version
Expected Output
Core:
  - installed: 1.7.0

Plugins:
  - hatidata: 1.0.0

Note: The dbt-hatidata adapter connects to HatiData's Postgres-compatible proxy on port 5439.

2

Configure profiles.yml

Set up the dbt connection profile for HatiData.

YAML
# ~/.dbt/profiles.yml
hatidata_project:
  target: dev
  outputs:
    dev:
      type: hatidata
      host: localhost
      port: 5439
      user: admin
      password: ""
      dbname: main
      schema: analytics
      threads: 4
Expected Output
# Save to ~/.dbt/profiles.yml

Note: HatiData's proxy speaks Postgres wire protocol, so the dbt adapter uses standard Postgres connection settings.

3

Create dbt Models

Write SQL models that transform raw agent data into analytics-ready tables.

SQL
-- models/staging/stg_agent_events.sql
WITH raw_events AS (
    SELECT
        agent_id,
        event_type,
        payload,
        created_at
    FROM raw.agent_events
    WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
)

SELECT
    agent_id,
    event_type,
    json_extract_string(payload, '$.duration_ms')::DOUBLE AS duration_ms,
    json_extract_string(payload, '$.token_count')::INTEGER AS token_count,
    json_extract_string(payload, '$.model')::VARCHAR AS model_name,
    created_at
FROM raw_events

-- models/marts/agent_daily_summary.sql
SELECT
    agent_id,
    DATE_TRUNC('day', created_at) AS date,
    COUNT(*) AS total_events,
    AVG(duration_ms) AS avg_duration_ms,
    SUM(token_count) AS total_tokens,
    COUNT(CASE WHEN event_type = 'error' THEN 1 END) AS error_count,
    1.0 - (COUNT(CASE WHEN event_type = 'error' THEN 1 END)::DOUBLE / COUNT(*)) AS success_rate
FROM {{ ref('stg_agent_events') }}
GROUP BY agent_id, DATE_TRUNC('day', created_at)
Expected Output
-- Models created in models/staging/ and models/marts/

Note: dbt models are pure SQL with Jinja templating. The {{ ref() }} function handles dependency ordering automatically.
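
Running `dbt compile` writes the rendered SQL to `target/compiled/`, which is a quick way to see what `{{ ref() }}` resolves to. For the profile above, the compiled mart would look roughly like this (quoting and qualification may vary by adapter):

```sql
-- target/compiled/hatidata_project/models/marts/agent_daily_summary.sql
-- (illustrative; ref() has been replaced with the fully qualified relation)
SELECT
    agent_id,
    DATE_TRUNC('day', created_at) AS date,
    COUNT(*) AS total_events
FROM "main"."analytics"."stg_agent_events"
GROUP BY agent_id, DATE_TRUNC('day', created_at)
```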

4

Run Transformations

Execute dbt run to build all models in the correct dependency order.

Bash
dbt run --profiles-dir ~/.dbt --project-dir .

# Rebuild a single model and its upstream dependencies if needed
dbt run --select agent_daily_summary
Expected Output
Running with dbt=1.7.0
Found 2 models, 0 tests, 0 snapshots, 0 seeds

Concurrency: 4 threads (target='dev')

1 of 2 START sql table model analytics.stg_agent_events ................ [RUN]
1 of 2 OK created sql table model analytics.stg_agent_events .......... [OK in 0.82s]
2 of 2 START sql table model analytics.agent_daily_summary ............ [RUN]
2 of 2 OK created sql table model analytics.agent_daily_summary ....... [OK in 0.45s]

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
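
For high-volume event streams, rebuilding the full staging table on every run gets expensive. dbt's standard incremental materialization only processes new rows; a sketch of the staging model rewritten this way (assuming the adapter supports incremental models):

```sql
-- models/staging/stg_agent_events.sql (incremental variant, illustrative)
{{ config(materialized='incremental') }}

SELECT
    agent_id,
    event_type,
    payload,
    created_at
FROM raw.agent_events
{% if is_incremental() %}
-- on incremental runs, only pick up rows newer than what the table already holds
WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
```
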
5

Add dbt Tests

Write data quality tests to validate the transformed data.

YAML
# models/marts/schema.yml
version: 2

models:
  - name: agent_daily_summary
    description: "Daily aggregated agent performance metrics"
    columns:
      - name: agent_id
        tests:
          - not_null
      - name: date
        tests:
          - not_null
      - name: success_rate
        tests:
          - not_null
          # accepted_range needs the dbt-utils package (add it to packages.yml, run dbt deps)
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1
              inclusive: true
              config:
                severity: error
Expected Output
# Save to models/marts/schema.yml

Note: dbt tests run as SQL queries that assert conditions. Failed tests produce warnings or errors based on severity.
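
Schema tests cover the common cases; bespoke assertions can live in a singular test: a SQL file under tests/ that selects the rows violating the rule, where an empty result means pass. A sketch, with an illustrative file name:

```sql
-- tests/assert_success_rate_in_range.sql (illustrative singular test)
-- Selects rows whose success_rate falls outside [0, 1];
-- dbt fails the test if any rows are returned.
SELECT agent_id, date, success_rate
FROM {{ ref('agent_daily_summary') }}
WHERE success_rate < 0 OR success_rate > 1
```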

6

Run Tests and Generate Docs

Execute dbt tests and generate documentation.

Bash
# Run all tests
dbt test --profiles-dir ~/.dbt --project-dir .

# Generate and serve docs
dbt docs generate
dbt docs serve --port 8081
Expected Output
Running with dbt=1.7.0
Found 2 models, 3 tests, 0 snapshots, 0 seeds

1 of 3 PASS not_null_agent_daily_summary_agent_id ..................... [PASS in 0.12s]
2 of 3 PASS not_null_agent_daily_summary_date ......................... [PASS in 0.11s]
3 of 3 PASS dbt_utils_accepted_range_agent_daily_summary_success_rate . [PASS in 0.15s]

Completed successfully

Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

Catalog written to target/catalog.json
Serving docs at http://localhost:8081

Note: dbt docs generate creates a browsable data catalog. Great for onboarding new team members to the data model.

Ready to build?

Install HatiData locally and start building with dbt in minutes.

Join Waitlist