dbt · Intermediate · 25 min

dbt + HatiData: dbt Adapter Transforms

Use the HatiData dbt adapter to run SQL-based data transformations in the agent data warehouse. Create models, run tests, and build data pipelines.

What You'll Build

A dbt project using the HatiData adapter for SQL-based data transformations in the agent data warehouse.

Prerequisites

$ pip install dbt-core dbt-hatidata

$ hati init

Architecture

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  dbt Models  │───▶│  HatiData    │───▶│   Engine     │
│  (SQL)       │    │  Adapter     │    │  (execute)   │
└──────────────┘    └──────────────┘    └──────────────┘
  models/ → compiled SQL → proxy (:5439) → materialized tables

Key Concepts

  • Postgres wire protocol: HatiData's proxy speaks the Postgres wire protocol, so Postgres-compatible dbt tooling works against it out of the box
  • Model layering: staging models clean raw data, marts models aggregate for analytics — standard dbt best practice
  • Fast execution: SQL transformations run on HatiData's query engine for fast analytical processing
  • Data testing: dbt tests validate data quality after every transformation run
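
The layering convention above can be encoded in the project file, so every model in a folder gets the right materialization by default. A minimal sketch, assuming the project name `hatidata_project` (matching the profile configured later):

```yaml
# dbt_project.yml (sketch; folder names are illustrative)
name: hatidata_project
version: "1.0.0"
profile: hatidata_project

models:
  hatidata_project:
    staging:
      +materialized: view    # lightweight cleaning layer
    marts:
      +materialized: table   # analytics-ready aggregates
```

With this in place, individual models only need a `{{ config(...) }}` block when they deviate from the folder default.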

Step-by-Step Implementation

1

Install the dbt Adapter

Install dbt-core and the HatiData adapter.

Bash
pip install dbt-core dbt-hatidata
hati init
dbt --version
Expected Output
Core:
  - installed: 1.7.0

Plugins:
  - hatidata: 1.0.0

Note: The dbt-hatidata adapter connects to HatiData's Postgres-compatible proxy on port 5439.

2

Configure profiles.yml

Set up the dbt connection profile for HatiData.

YAML
# ~/.dbt/profiles.yml
hatidata_project:
  target: dev
  outputs:
    dev:
      type: hatidata
      host: localhost
      port: 5439
      user: admin
      password: ""
      dbname: main
      schema: analytics
      threads: 4
Expected Output
# Save to ~/.dbt/profiles.yml

Note: HatiData's proxy speaks Postgres wire protocol, so the dbt adapter uses standard Postgres connection settings.

3

Create dbt Models

Write SQL models that transform raw agent data into analytics-ready tables.

SQL
-- models/staging/stg_agent_events.sql
WITH raw_events AS (
    SELECT
        agent_id,
        event_type,
        payload,
        created_at
    FROM raw.agent_events
    WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
)

SELECT
    agent_id,
    event_type,
    json_extract_string(payload, '$.duration_ms')::DOUBLE AS duration_ms,
    json_extract_string(payload, '$.token_count')::INTEGER AS token_count,
    json_extract_string(payload, '$.model')::VARCHAR AS model_name,
    created_at
FROM raw_events

-- models/marts/agent_daily_summary.sql
SELECT
    agent_id,
    DATE_TRUNC('day', created_at) AS date,
    COUNT(*) AS total_events,
    AVG(duration_ms) AS avg_duration_ms,
    SUM(token_count) AS total_tokens,
    COUNT(CASE WHEN event_type = 'error' THEN 1 END) AS error_count,
    1.0 - (COUNT(CASE WHEN event_type = 'error' THEN 1 END)::DOUBLE / COUNT(*)) AS success_rate
FROM {{ ref('stg_agent_events') }}
GROUP BY agent_id, DATE_TRUNC('day', created_at)
Expected Output
-- Models created in models/staging/ and models/marts/

Note: dbt models are pure SQL with Jinja templating. The {{ ref() }} function handles dependency ordering automatically.
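
Running `dbt compile` writes the rendered SQL to `target/compiled/`, which is a quick way to see what `{{ ref() }}` resolves to. For the profile above, the compiled mart would look roughly like this (quoting and qualification may vary by adapter):

```sql
-- target/compiled/hatidata_project/models/marts/agent_daily_summary.sql
-- (illustrative; ref() has been replaced with the fully qualified relation)
SELECT
    agent_id,
    DATE_TRUNC('day', created_at) AS date,
    COUNT(*) AS total_events
FROM "main"."analytics"."stg_agent_events"
GROUP BY agent_id, DATE_TRUNC('day', created_at)
```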

4

Run Transformations

Execute dbt run to build all models in the correct dependency order.

Bash
dbt run --profiles-dir ~/.dbt --project-dir .

# Rebuild a single model and its upstream dependencies if needed
dbt run --select agent_daily_summary
Expected Output
Running with dbt=1.7.0
Found 2 models, 0 tests, 0 snapshots, 0 seeds

Concurrency: 4 threads (target='dev')

1 of 2 START sql table model analytics.stg_agent_events ................ [RUN]
1 of 2 OK created sql table model analytics.stg_agent_events .......... [OK in 0.82s]
2 of 2 START sql table model analytics.agent_daily_summary ............ [RUN]
2 of 2 OK created sql table model analytics.agent_daily_summary ....... [OK in 0.45s]

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
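
For high-volume event streams, rebuilding the full staging table on every run gets expensive. dbt's standard incremental materialization only processes new rows; a sketch of the staging model rewritten this way (assuming the adapter supports incremental models):

```sql
-- models/staging/stg_agent_events.sql (incremental variant, illustrative)
{{ config(materialized='incremental') }}

SELECT
    agent_id,
    event_type,
    payload,
    created_at
FROM raw.agent_events
{% if is_incremental() %}
-- on incremental runs, only pick up rows newer than what the table already holds
WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
```
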
5

Add dbt Tests

Write data quality tests to validate the transformed data.

YAML
# models/marts/schema.yml
version: 2

models:
  - name: agent_daily_summary
    description: "Daily aggregated agent performance metrics"
    columns:
      - name: agent_id
        tests:
          - not_null
      - name: date
        tests:
          - not_null
      - name: success_rate
        tests:
          - not_null
          # accepted_range needs the dbt-utils package (add it to packages.yml, run dbt deps)
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1
              inclusive: true
              config:
                severity: error
Expected Output
# Save to models/marts/schema.yml

Note: dbt tests run as SQL queries that assert conditions. Failed tests produce warnings or errors based on severity.
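
Schema tests cover the common cases; bespoke assertions can live in a singular test: a SQL file under tests/ that selects the rows violating the rule, where an empty result means pass. A sketch, with an illustrative file name:

```sql
-- tests/assert_success_rate_in_range.sql (illustrative singular test)
-- Selects rows whose success_rate falls outside [0, 1];
-- dbt fails the test if any rows are returned.
SELECT agent_id, date, success_rate
FROM {{ ref('agent_daily_summary') }}
WHERE success_rate < 0 OR success_rate > 1
```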

6

Run Tests and Generate Docs

Execute dbt tests and generate documentation.

Bash
# Run all tests
dbt test --profiles-dir ~/.dbt --project-dir .

# Generate and serve docs
dbt docs generate
dbt docs serve --port 8081
Expected Output
Running with dbt=1.7.0
Found 2 models, 3 tests, 0 snapshots, 0 seeds

1 of 3 PASS not_null_agent_daily_summary_agent_id ..................... [PASS in 0.12s]
2 of 3 PASS not_null_agent_daily_summary_date ......................... [PASS in 0.11s]
3 of 3 PASS dbt_utils_accepted_range_agent_daily_summary_success_rate . [PASS in 0.15s]

Completed successfully

Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

Catalog written to target/catalog.json
Serving docs at http://localhost:8081

Note: dbt docs generate creates a browsable data catalog. Great for onboarding new team members to the data model.

Ready to build?

Install HatiData locally and start building with dbt in minutes.

Join Waitlist