LangChain · Intermediate · 30 min

LangChain + HatiData: Contract Auditor

Build a contract review agent with LangChain that analyzes clauses and maintains an immutable audit trail.

What You'll Build

A LangChain agent that uses HatiData tools to review contracts, log its chain-of-thought (CoT), and produce compliance reports.

Prerequisites

$ pip install hatidata-agent langchain langchain-openai langchain-hatidata

$ hati init

Architecture

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  LangChain   │───▶│  HatiData    │───▶│  Contracts   │
│  Agent       │    │  MCP Server  │    │   (DuckDB)   │
└──────┬───────┘    └──────┬───────┘    └──────────────┘
       │            ┌──────▼───────┐
       └───────────▶│  CoT Ledger  │
                    └──────────────┘

Key Concepts

  • Immutable audit trails (CoT): every reasoning step during contract review is SHA-256 hash-chained, creating a tamper-evident log that proves how the agent reached each conclusion
  • Clause-level semantic search: store contract clauses with embeddings so future reviews can find similar risky language across the entire contract corpus using semantic_match()
  • Compliance reporting with SQL: aggregate flags, severities, and remediations using standard SQL queries against HatiData tables, enabling dashboards and automated reports
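  • LangChain tool integration: the langchain-hatidata toolkit provides drop-in tools for SQL queries, schema inspection, and semantic context search that work with any LangChain agent architecture; in this cookbook the agent writes flags and memories through the query tool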
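
To make the hash-chaining idea concrete, here is a minimal, self-contained sketch of a tamper-evident log. It is not HatiData's implementation: the payload layout, the all-zero genesis value, and the helper names (`entry_hash`, `build_chain`, `first_broken_index`) are assumptions chosen for illustration. Each entry's hash covers the previous entry's hash, so editing any earlier step invalidates every hash after it.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder for the first entry's previous_hash


def entry_hash(previous_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Link each payload to its predecessor through previous_hash."""
    chain, previous = [], GENESIS
    for payload in payloads:
        current = entry_hash(previous, payload)
        chain.append({"payload": payload, "previous_hash": previous, "hash": current})
        previous = current
    return chain


def first_broken_index(chain: list) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None."""
    previous = GENESIS
    for index, entry in enumerate(chain):
        if entry["previous_hash"] != previous:
            return index
        if entry_hash(previous, entry["payload"]) != entry["hash"]:
            return index
        previous = entry["hash"]
    return None


steps = [
    {"step_number": 1, "content": "Begin review of CTR_001"},
    {"step_number": 2, "content": "Section 4.2: liability cap looks low"},
    {"step_number": 3, "content": "Section 9.3: unilateral pricing"},
]
chain = build_chain(steps)
print(first_broken_index(chain))  # None: the untouched chain verifies

chain[1]["payload"]["content"] = "Section 4.2: no issues"  # simulate tampering
print(first_broken_index(chain))  # 1: the edited step no longer matches its hash
```

The check recomputes every hash from the stored payload, which is what makes the log tamper-evident rather than merely append-only.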

Step-by-Step Implementation

1

Install and configure

Install LangChain, the HatiData LangChain integration, and initialize the local database.

Bash
pip install hatidata-agent langchain langchain-openai langchain-hatidata
hati init

Note: hati init creates a .hati/config.toml file and initializes a local DuckDB database. langchain-hatidata provides pre-built LangChain tools for queries and semantic search.

2

Set up HatiData schema for contracts

Create tables for contracts, individual clauses, and compliance flags that the auditing agent will populate.

Python
from hatidata_agent import HatiDataAgent

client = HatiDataAgent(host="localhost", port=5439, agent_id="contract-auditor", framework="langchain")

# Create contracts table
client.execute("""
    CREATE TABLE IF NOT EXISTS contracts (
        contract_id VARCHAR PRIMARY KEY,
        title VARCHAR NOT NULL,
        counterparty VARCHAR,
        effective_date DATE,
        expiry_date DATE,
        contract_type VARCHAR,
        status VARCHAR DEFAULT 'pending_review'
    )
""")

# Create clauses table for granular analysis
client.execute("""
    CREATE TABLE IF NOT EXISTS clauses (
        clause_id VARCHAR PRIMARY KEY,
        contract_id VARCHAR NOT NULL,
        section VARCHAR,
        clause_text TEXT,
        risk_level VARCHAR DEFAULT 'low',
        review_notes TEXT
    )
""")

# Create compliance flags table
client.execute("""
    CREATE TABLE IF NOT EXISTS compliance_flags (
        flag_id VARCHAR PRIMARY KEY,
        contract_id VARCHAR NOT NULL,
        clause_id VARCHAR,
        flag_type VARCHAR,
        severity VARCHAR,
        description TEXT,
        remediation TEXT,
        flagged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

# Insert a sample contract with clauses
client.execute("""
    INSERT INTO contracts VALUES
        ('CTR_001', 'Cloud Services Agreement', 'Acme Corp',
         '2025-01-01', '2026-12-31', 'SaaS', 'pending_review')
""")

client.execute("""
    INSERT INTO clauses VALUES
        ('CL_001', 'CTR_001', '4.2', 'Vendor shall indemnify Client against
         all claims arising from data breaches, with liability capped at the
         total fees paid in the preceding 12 months.', 'medium', NULL),
        ('CL_002', 'CTR_001', '7.1', 'Either party may terminate this
         agreement with 30 days written notice. Upon termination, Vendor
         shall delete all Client data within 90 days.', 'low', NULL),
        ('CL_003', 'CTR_001', '9.3', 'Vendor reserves the right to modify
         pricing with 15 days notice. Client acceptance is implied by
         continued use of the service.', 'high', NULL),
        ('CL_004', 'CTR_001', '11.1', 'This agreement shall be governed by
         the laws of the State of Delaware. Any disputes shall be resolved
         through binding arbitration.', 'medium', NULL)
""")

print("Contract schema created with sample data.")
Expected Output
Contract schema created with sample data.
3

Create LangChain tools wrapping HatiData

Use the langchain-hatidata toolkit to create query and search tools for the agent.

Python
from langchain_hatidata import HatiDataToolkit
from hatidata_agent import HatiDataAgent

# Create an agent and the full toolkit
agent = HatiDataAgent(host="localhost", port=5439, agent_id="contract-auditor", framework="langchain")
toolkit = HatiDataToolkit(agent=agent)
tools = toolkit.get_tools()

# The toolkit provides these tools:
# - hatidata_query: Execute SQL queries against HatiData
# - hatidata_list_tables: List all available tables
# - hatidata_describe_table: Get schema for a specific table
# - hatidata_context_search: Semantic search for relevant context

for tool in tools:
    print(f"  Tool: {tool.name}")
    print(f"    {tool.description[:80]}...")
    print()

print(f"Created {len(tools)} LangChain tools for contract auditing.")
Expected Output
  Tool: hatidata_query
    Execute SQL queries against HatiData. Supports JOIN_VECTOR for semantic...

  Tool: hatidata_list_tables
    List all tables available in the HatiData database. Returns table names...

  Tool: hatidata_describe_table
    Get the schema and column details for a specific table. Returns column ...

  Tool: hatidata_context_search
    Search for relevant context using semantic similarity. Returns matching...

Created 4 LangChain tools for contract auditing.

Note: The toolkit wraps a HatiDataAgent instance. Each tool handles serialization and error handling internally.

4

Build the contract review chain

Create a LangChain agent with the HatiData tools and a system prompt tailored for legal contract review.

Python
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_hatidata import HatiDataToolkit
from hatidata_agent import HatiDataAgent

# Create the toolkit and LLM
lc_agent = HatiDataAgent(host="localhost", port=5439, agent_id="contract-auditor", framework="langchain")
toolkit = HatiDataToolkit(agent=lc_agent)
tools = toolkit.get_tools()
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# System prompt for legal review
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert contract auditor. Your job is to review
contracts clause-by-clause and identify legal risks.

WORKFLOW:
1. Query the clauses table to get all clauses for the contract
2. For each clause, analyze the risk using hatidata_query
3. If a clause is risky, store a compliance flag using hatidata_query (INSERT)
4. Store your overall findings in memory using hatidata_query (INSERT INTO _hatidata_memory.memories)
5. After reviewing all clauses, generate a compliance summary

RISK INDICATORS TO WATCH:
- Liability caps below industry standard (should be 2x annual fees)
- Unilateral price change clauses
- Short termination notice periods
- Implied acceptance clauses
- Missing data protection provisions
- Binding arbitration without carve-outs

Always log your reasoning for every clause reviewed."""),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Create the agent
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=20,
    return_intermediate_steps=True,
)

print("Contract review agent ready.")
Expected Output
Contract review agent ready.
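
Because the executor is created with return_intermediate_steps=True, the dictionary returned by executor.invoke() also carries an "intermediate_steps" list of (action, observation) pairs, where each action exposes the tool name as .tool. The sketch below counts tool calls from such a list. The `tool_usage` helper is hypothetical, and the sample data uses SimpleNamespace stand-ins rather than a real agent run so the snippet executes without a model or database.

```python
from collections import Counter
from types import SimpleNamespace


def tool_usage(intermediate_steps) -> Counter:
    """Count how often each tool was called during one executor run."""
    return Counter(action.tool for action, _observation in intermediate_steps)


# Stand-in data shaped like result["intermediate_steps"]; not real agent output.
fake_steps = [
    (SimpleNamespace(tool="hatidata_list_tables", tool_input=""), "contracts, clauses"),
    (SimpleNamespace(tool="hatidata_query", tool_input="SELECT * FROM clauses"), "4 rows"),
    (SimpleNamespace(tool="hatidata_query", tool_input="SELECT * FROM contracts"), "1 row"),
]

usage = tool_usage(fake_steps)
for name, count in usage.most_common():
    print(f"{name}: {count} call(s)")
```

With a real run, the same helper would be called as tool_usage(result["intermediate_steps"]), which gives a quick check that the agent actually queried the clauses table before drawing conclusions.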
5

Process a contract with clause analysis

Run the agent on a sample contract. It reviews each clause, stores findings in memory, and logs reasoning steps.

Python
# Run the agent on contract CTR_001
result = executor.invoke({
    "input": "Review contract CTR_001 (Cloud Services Agreement with Acme Corp). "
             "Analyze every clause for legal risks. Log your reasoning for each "
             "clause and flag any compliance issues. Use session_id 'review_CTR_001'."
})

print("\nAgent review complete.")
print(f"Final output: {result['output'][:200]}...")

# Verify the chain-of-thought was logged
from hatidata_agent import HatiDataAgent
client = HatiDataAgent(host="localhost", port=5439, agent_id="contract-auditor", framework="langchain")

cot_entries = client.query("""
    SELECT step_number, step_type, content, confidence
    FROM _hatidata_cot.agent_traces
    WHERE session_id = 'review_CTR_001'
    ORDER BY step_number ASC
""")

print(f"\nChain-of-thought: {len(cot_entries)} reasoning steps logged")
for entry in cot_entries:
    print(f"  Step {entry['step_number']} [{entry['step_type']}] "
          f"(confidence: {entry['confidence']:.2f}): "
          f"{entry['content'][:80]}...")

# Check compliance flags
flags = client.query("""
    SELECT flag_type, severity, description
    FROM compliance_flags
    WHERE contract_id = 'CTR_001'
    ORDER BY CASE severity WHEN 'high' THEN 1 WHEN 'medium' THEN 2 ELSE 3 END
""")

print(f"\nCompliance flags raised: {len(flags)}")
for flag in flags:
    print(f"  [{flag['severity'].upper()}] {flag['flag_type']}: "
          f"{flag['description'][:80]}...")
Expected Output
Agent review complete.
Final output: Contract CTR_001 review complete. Found 2 high-risk clauses requiring attention: Section 9.3 (unilateral pricing) and Section 4.2 (low liability cap)...

Chain-of-thought: 8 reasoning steps logged
  Step 1 [observation] (confidence: 1.00): Beginning review of CTR_001 - Cloud Services Agreement with Acme Corp. 4 clau...
  Step 2 [analysis] (confidence: 0.75): Section 4.2 - Indemnification clause caps liability at 12 months fees. Industry...
  Step 3 [analysis] (confidence: 0.60): Section 7.1 - Termination with 30 days notice is acceptable. 90-day data deleti...
  Step 4 [analysis] (confidence: 0.30): Section 9.3 - CRITICAL: Unilateral pricing changes with only 15 days notice and...
  Step 5 [analysis] (confidence: 0.70): Section 11.1 - Binding arbitration in Delaware. No carve-out for IP disputes or...
  Step 6 [recommendation] (confidence: 0.85): FLAG: Section 9.3 requires immediate renegotiation. Implied acceptance clause...
  Step 7 [recommendation] (confidence: 0.80): FLAG: Section 4.2 liability cap should be increased to 2x annual contract val...
  Step 8 [conclusion] (confidence: 0.90): Overall assessment: CTR_001 has 2 high-risk, 1 medium-risk clauses. Recommend ...

Compliance flags raised: 3
  [HIGH] unilateral_pricing: Section 9.3 allows vendor to modify pricing with only 15 days notice. Impli...
  [HIGH] low_liability_cap: Section 4.2 caps indemnification at 12 months fees. Industry standard is 24 m...
  [MEDIUM] binding_arbitration: Section 11.1 uses binding arbitration without carve-outs for injunctive reli...

Note: Every reasoning step is SHA-256 hash-chained. To inspect the chain, run client.query("SELECT step_number, hash, previous_hash FROM _hatidata_cot.agent_traces WHERE session_id = 'review_CTR_001' ORDER BY step_number"); each row's previous_hash should match the hash of the row before it.
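
A linkage check over the queried rows might look like the sketch below. It assumes the rows come back as dictionaries with `hash` and `previous_hash` keys, ordered by step number, and it only confirms that each row points at its predecessor. It does not recompute the hashes, because the exact bytes HatiData hashes are not described here. The `first_broken_link` helper and the sample hash values are illustrative.

```python
from typing import Optional


def first_broken_link(rows: list) -> Optional[int]:
    """Return the index of the first row whose previous_hash does not match
    the hash of the row before it, or None if every link holds."""
    for index in range(1, len(rows)):
        if rows[index]["previous_hash"] != rows[index - 1]["hash"]:
            return index
    return None


# Stand-in rows; real ones would come from a query against
# _hatidata_cot.agent_traces ordered by step_number.
rows = [
    {"step_number": 1, "hash": "a1", "previous_hash": "00"},
    {"step_number": 2, "hash": "b2", "previous_hash": "a1"},
    {"step_number": 3, "hash": "c3", "previous_hash": "b2"},
]
print(first_broken_link(rows))  # None: every row links to its predecessor

rows[2]["previous_hash"] = "zz"  # simulate a broken link
print(first_broken_link(rows))  # 2
```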

6

Generate compliance report

Query HatiData to aggregate all findings into a structured compliance report using SQL.

Python
from hatidata_agent import HatiDataAgent

client = HatiDataAgent(host="localhost", port=5439, agent_id="contract-auditor", framework="langchain")

# Aggregate compliance report
report = client.query("""
    SELECT
        c.contract_id,
        c.title,
        c.counterparty,
        COUNT(DISTINCT cl.clause_id) AS total_clauses,
        COUNT(DISTINCT f.flag_id) AS total_flags,
        COUNT(DISTINCT CASE WHEN f.severity = 'high' THEN f.flag_id END) AS high_flags,
        COUNT(DISTINCT CASE WHEN f.severity = 'medium' THEN f.flag_id END) AS med_flags,
        COUNT(DISTINCT CASE WHEN f.severity = 'low' THEN f.flag_id END) AS low_flags
    FROM contracts c
    LEFT JOIN clauses cl ON c.contract_id = cl.contract_id
    LEFT JOIN compliance_flags f ON c.contract_id = f.contract_id
    WHERE c.contract_id = 'CTR_001'
    GROUP BY c.contract_id, c.title, c.counterparty
""")

r = report[0]
print("=" * 60)
print("COMPLIANCE REPORT")
print("=" * 60)
print(f"Contract:    {r['title']}")
print(f"Counterparty: {r['counterparty']}")
print(f"Contract ID: {r['contract_id']}")
print(f"Clauses reviewed: {r['total_clauses']}")
print(f"Flags raised:     {r['total_flags']} "
      f"(High: {r['high_flags']}, Medium: {r['med_flags']}, Low: {r['low_flags']})")
print()

# Get detailed flags with remediation
details = client.query("""
    SELECT f.severity, f.flag_type, f.description, f.remediation,
           cl.section, cl.clause_text
    FROM compliance_flags f
    JOIN clauses cl ON f.clause_id = cl.clause_id
    WHERE f.contract_id = 'CTR_001'
    ORDER BY
        CASE f.severity WHEN 'high' THEN 1 WHEN 'medium' THEN 2 ELSE 3 END
""")

for i, d in enumerate(details, 1):
    print(f"{i}. [{d['severity'].upper()}] Section {d['section']}: {d['flag_type']}")
    print(f"   Issue: {d['description']}")
    print(f"   Remediation: {d['remediation']}")
    print()

# Store the report in memory for future reference
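# Build the strings first: reusing double quotes inside a double-quoted f-string fails before Python 3.12.
title = r['title'].replace("'", "''")  # escape single quotes for the SQL literal
contract_id = r['contract_id']
summary = (
    f"Compliance report for {title} ({contract_id}): "
    f"{r['total_flags']} flags ({r['high_flags']} high, {r['med_flags']} medium). "
    "Key issues: unilateral pricing (s9.3), low liability cap (s4.2)."
)
tags = f"compliance-report,{contract_id},completed"
client.execute(
    "INSERT INTO _hatidata_memory.memories (content, tags, namespace) "
    f"VALUES ('{summary}', '{tags}', 'contract_auditor')"
)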

print("Report stored in agent memory for future reference.")
Expected Output
============================================================
COMPLIANCE REPORT
============================================================
Contract:    Cloud Services Agreement
Counterparty: Acme Corp
Contract ID: CTR_001
Clauses reviewed: 4
Flags raised:     3 (High: 2, Medium: 1, Low: 0)

1. [HIGH] Section 9.3: unilateral_pricing
   Issue: Vendor can modify pricing with 15 days notice. Implied acceptance by continued use.
   Remediation: Require 90-day notice and explicit written consent for price changes.

2. [HIGH] Section 4.2: low_liability_cap
   Issue: Indemnification capped at 12 months fees. Below industry standard.
   Remediation: Negotiate cap to 24 months (2x annual) or uncapped for data breaches.

3. [MEDIUM] Section 11.1: binding_arbitration
   Issue: Binding arbitration without carve-outs for injunctive relief or IP disputes.
   Remediation: Add carve-out allowing court proceedings for IP and injunctive matters.

Report stored in agent memory for future reference.
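
The aggregate query in this step joins both clauses and compliance_flags to the contract on contract_id, so every clause row is paired with every flag row. The sketch below reproduces that fan-out to show why the report uses COUNT(DISTINCT ...). It uses Python's built-in sqlite3 as a stand-in so it runs without a HatiData server. The flag IDs (`F1` to `F3`) and the trimmed-down tables are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clauses (clause_id TEXT PRIMARY KEY, contract_id TEXT);
    CREATE TABLE compliance_flags (
        flag_id TEXT PRIMARY KEY, contract_id TEXT, severity TEXT
    );
    INSERT INTO clauses VALUES
        ('CL_001', 'CTR_001'), ('CL_002', 'CTR_001'),
        ('CL_003', 'CTR_001'), ('CL_004', 'CTR_001');
    INSERT INTO compliance_flags VALUES
        ('F1', 'CTR_001', 'high'), ('F2', 'CTR_001', 'high'),
        ('F3', 'CTR_001', 'medium');
""")

query = """
    SELECT COUNT(*)                     AS joined_rows,
           COUNT(cl.clause_id)          AS naive_clauses,
           COUNT(DISTINCT cl.clause_id) AS total_clauses,
           COUNT(DISTINCT f.flag_id)    AS total_flags,
           COUNT(DISTINCT CASE WHEN f.severity = 'high' THEN f.flag_id END) AS high_flags
    FROM clauses cl
    LEFT JOIN compliance_flags f ON cl.contract_id = f.contract_id
    WHERE cl.contract_id = 'CTR_001'
"""
joined_rows, naive_clauses, total_clauses, total_flags, high_flags = (
    conn.execute(query).fetchone()
)
# 4 clauses x 3 flags = 12 joined rows; only the DISTINCT counts are correct.
print(joined_rows, naive_clauses, total_clauses, total_flags, high_flags)
# prints: 12 12 4 3 2
```

A plain COUNT would report 12 clauses here. An alternative design is to aggregate each table in its own subquery and join the results, which avoids the fan-out entirely.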
