Vercel AI SDK + HatiData: Streaming with Persistent Memory
Streaming AI with Memory
The Vercel AI SDK makes it easy to build streaming AI interfaces in Next.js. The useChat hook, streamText function, and route handler patterns provide a clean API for real-time AI responses. But streaming complicates memory: the response is generated token by token, so there is no complete response to store until the stream finishes.
HatiData integrates with the Vercel AI SDK at the route handler level, providing memory retrieval before streaming begins and memory storage after streaming completes. The user experiences real-time streaming while the backend manages persistent memory seamlessly.
Route Handler with Memory Retrieval
The core pattern: retrieve relevant memories before calling the LLM, inject them into the system prompt, stream the response, and store the interaction afterward.
// app/api/chat/route.ts
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

const HATIDATA_URL = process.env.HATIDATA_URL || "http://localhost:5439";
const HATIDATA_KEY = process.env.HATIDATA_API_KEY || "";

async function searchMemories(query: string, userId: string) {
  const response = await fetch(`${HATIDATA_URL}/v1/memory/search`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${HATIDATA_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      query,
      namespace: `user/${userId}`,
      limit: 5,
      min_similarity: 0.65,
    }),
  });
  const data = await response.json();
  return data.memories || [];
}

async function storeMemory(content: string, userId: string) {
  await fetch(`${HATIDATA_URL}/v1/memory/store`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${HATIDATA_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      content,
      namespace: `user/${userId}`,
      metadata: { source: "vercel-ai-sdk" },
    }),
  });
}
export async function POST(req: Request) {
  const { messages } = await req.json();
  const userId = req.headers.get("x-user-id") || "anonymous";
  const lastMessage = messages[messages.length - 1]?.content || "";

  // Retrieve relevant memories before streaming
  const memories = await searchMemories(lastMessage, userId);
  const memoryContext = memories.length > 0
    ? memories.map((m: any) => `- ${m.content}`).join("\n")
    : "No relevant past context found.";

  // Stream response with memory-augmented context
  const result = streamText({
    model: openai("gpt-4o"),
    system: `You are a helpful assistant with persistent memory.

Relevant context from past conversations:
${memoryContext}

Use this context to provide personalized, consistent responses.`,
    messages,
    onFinish: async ({ text }) => {
      // Store the interaction after streaming completes
      const summary = `User: ${lastMessage} | Assistant: ${text.slice(0, 500)}`;
      await storeMemory(summary, userId);
    },
  });

  return result.toDataStreamResponse();
}

The onFinish callback fires after the full response has been streamed to the client. This is where the interaction is stored as a memory — after the user has received the complete response, so memory storage does not add latency to the streaming experience.
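The summary stored in onFinish can be factored into a small helper. The 500-character cap on the assistant text matches the route handler above; it is an arbitrary prompt budget, not a HatiData limit:

```typescript
// Builds the "User: ... | Assistant: ..." summary that onFinish stores.
// Truncating the assistant text keeps long responses from bloating memories.
export function buildInteractionSummary(
  userMessage: string,
  assistantText: string,
  maxAssistantChars = 500,
): string {
  return `User: ${userMessage} | Assistant: ${assistantText.slice(0, maxAssistantChars)}`;
}
```

A richer variant could summarize with a cheap model instead of truncating, at the cost of an extra LLM call per turn.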
Client-Side Integration
The client uses the standard Vercel AI SDK useChat hook — no modifications needed:
// app/page.tsx
"use client";
import { useChat } from "ai/react";

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: "/api/chat",
    headers: {
      "x-user-id": "user-123", // Pass user identity for namespace isolation
    },
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask anything..."
          disabled={isLoading}
        />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}

The user experiences a standard streaming chat interface. Behind the scenes, every conversation is enriched with relevant past context and stored for future retrieval.
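The client above hardcodes user-123; in a real app the id comes from your auth session. Whatever its source, the id ends up inside a namespace path, so a normalization step is worth sketching. The allowed character set below is an assumption for illustration, not a documented HatiData constraint:

```typescript
// Normalizes an arbitrary user id into a predictable namespace segment,
// e.g. for use as `user/${toNamespaceKey(session.user.id)}`.
export function toNamespaceKey(userId: string): string {
  const key = userId.toLowerCase().replace(/[^a-z0-9_-]/g, "-");
  return key || "anonymous"; // never emit an empty namespace segment
}
```

Normalizing on the server rather than trusting the raw header also prevents a client from injecting path separators into another user's namespace.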
Tool Use with Memory
The Vercel AI SDK supports tool definitions that the model can invoke. Combine this with HatiData for explicit memory operations:
import { openai } from "@ai-sdk/openai";
import { streamText, tool } from "ai";
import { z } from "zod";

// Assumes the searchMemories and storeMemory helpers defined earlier are in scope.

export async function POST(req: Request) {
  const { messages } = await req.json();
  const userId = req.headers.get("x-user-id") || "anonymous";

  const result = streamText({
    model: openai("gpt-4o"),
    system: "You are a helpful assistant. Use the remember tool to store important information and the recall tool to search past conversations.",
    messages,
    tools: {
      remember: tool({
        description: "Store important information for future conversations",
        parameters: z.object({
          content: z.string().describe("The information to remember"),
          importance: z.enum(["low", "normal", "high"]),
        }),
        execute: async ({ content, importance }) => {
          // storeMemory takes (content, userId), so fold the importance
          // level into the content rather than silently dropping it
          await storeMemory(`[importance: ${importance}] ${content}`, userId);
          return { stored: true };
        },
      }),
      recall: tool({
        description: "Search past conversations for relevant information",
        parameters: z.object({
          query: z.string().describe("What to search for"),
        }),
        execute: async ({ query }) => {
          const memories = await searchMemories(query, userId);
          return { memories: memories.map((m: any) => m.content) };
        },
      }),
    },
  });

  return result.toDataStreamResponse();
}

With tool definitions, the model decides when to store and retrieve memories. It might recall past preferences before making a recommendation, or store a user's stated preference for future sessions.
Edge Runtime Compatibility
HatiData's HTTP API is compatible with Vercel's Edge Runtime. The memory retrieval and storage calls use standard fetch, which is available in edge functions:
// app/api/chat/route.ts
export const runtime = "edge"; // Runs on Vercel Edge Network

export async function POST(req: Request) {
  // Same implementation as above — fetch-based HatiData calls work on edge
  // ...
}

Running on the edge reduces latency for globally distributed users — the route handler executes close to the user, and HatiData calls go to the nearest data plane deployment.
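Edge functions also have tight execution limits, so a slow memory lookup should fail fast rather than stall the stream. One sketch, assuming a runtime that provides AbortSignal.timeout (the Edge Runtime and Node 18+ do):

```typescript
// Wraps fetch with a deadline: the request aborts if the server does not
// respond within `ms` milliseconds.
export async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  ms = 2000,
): Promise<Response> {
  return fetch(url, { ...init, signal: AbortSignal.timeout(ms) });
}
```

Swapping this in for the fetch inside searchMemories, and catching the abort to fall back to empty memories, lets the chat keep streaming even when the memory backend is slow.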
Middleware Pattern: Memory Enrichment
For applications with multiple AI routes that all need memory context, use Next.js middleware to pre-fetch memories:
// middleware.ts
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

// Note: middleware runs on the Edge Runtime, so searchMemories must be
// imported from an edge-compatible module.

export async function middleware(request: NextRequest) {
  if (request.nextUrl.pathname.startsWith("/api/chat")) {
    const body = await request.clone().json();
    const lastMessage = body.messages?.[body.messages.length - 1]?.content;
    const userId = request.headers.get("x-user-id") || "anonymous";

    if (lastMessage) {
      const memories = await searchMemories(lastMessage, userId);
      // Forward memories by overriding the *request* headers: headers set
      // on the response are not visible to the route handler
      const requestHeaders = new Headers(request.headers);
      requestHeaders.set("x-memories", JSON.stringify(memories));
      return NextResponse.next({ request: { headers: requestHeaders } });
    }
  }
  return NextResponse.next();
}

This pattern centralizes memory retrieval logic so individual route handlers do not need to implement it.
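On the receiving side, each route handler reads the pre-fetched memories back out of the x-memories request header. A defensive parser is worth the few lines, since a missing or malformed header should degrade to "no context" rather than crash the handler (the Memory shape below assumes the search response items carry a content field, as in the examples above):

```typescript
type Memory = { content: string };

// Parses the JSON array forwarded by the middleware; returns an empty
// list for missing, non-array, or malformed header values.
export function parseForwardedMemories(headerValue: string | null): Memory[] {
  if (!headerValue) return [];
  try {
    const parsed = JSON.parse(headerValue);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}
```

In the route handler: `const memories = parseForwardedMemories(req.headers.get("x-memories"));`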
Multi-User Memory Isolation
In a SaaS application where multiple users interact with the same AI assistant, namespace isolation ensures each user's memories are private:
// Each user gets their own namespace
const namespace = `user/${session.user.id}`;

// Memories stored by user A are invisible to user B
// Even if user B asks the same question, they get different (or no) context

For team-shared knowledge (company wiki, product documentation), use a separate namespace that all users' agents can read from:
const personalNamespace = `user/${userId}`;
const sharedNamespace = "team/knowledge-base";

// Search both namespaces and combine results. Note: the searchMemories
// helper above builds its namespace from a user id, so searching an
// arbitrary namespace requires a variant that accepts the namespace directly.
const personalMemories = await searchMemories(query, personalNamespace);
const sharedMemories = await searchMemories(query, sharedNamespace);

Next Steps
The Vercel AI SDK integration is ideal for web applications that need streaming AI with persistent context. For Python-based agents, see the LangChain or CrewAI integrations. For the full MCP tool suite, see the Claude MCP guide. For production deployment with access controls and encryption, see the governance documentation.