AI-powered applications have become the dominant trend in 2026, and if you're self-hosting Supabase, you're probably wondering how to tap into vector search capabilities without paying for a separate vector database. The good news: pgvector ships with Supabase's Docker images, giving you a production-ready vector database inside your existing PostgreSQL instance.
This guide walks you through enabling pgvector, storing embeddings, building semantic search, and deploying RAG (Retrieval-Augmented Generation) applications—all on your self-hosted infrastructure.
Why pgvector for Self-Hosted Supabase?
Vector databases have exploded in popularity alongside LLM adoption. Services like Pinecone, Weaviate, and Qdrant offer dedicated vector storage, but they add operational complexity and cost. For self-hosters, pgvector offers a compelling alternative:
Single database architecture: Store your relational data and vector embeddings together. No syncing between systems, no additional infrastructure to manage.
SQL-native queries: Query embeddings using familiar SQL syntax. Join vector similarity results with your existing tables without learning a new query language.
Cost efficiency: You're already running PostgreSQL. pgvector adds vector capabilities without additional licensing or hosting costs.
Data locality: For compliance-sensitive applications, keeping embeddings in your existing database simplifies data governance.
The trade-off? pgvector won't match the raw performance of purpose-built vector databases at massive scale. But for most applications—semantic search, RAG chatbots, recommendation engines—pgvector handles millions of vectors comfortably.
Enabling pgvector in Self-Hosted Supabase
If you're running Supabase via Docker Compose (the standard self-hosting approach), pgvector comes pre-installed in the supabase/postgres image. You just need to enable the extension.
Step 1: Enable the Extension
Connect to your database and run:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';
If you're using Supabase Studio (which runs on port 3000 in self-hosted deployments), navigate to Database → Extensions and search for "vector" to enable it through the UI.
Step 2: Verify the Version
Check which pgvector version you're running:
SELECT extversion FROM pg_extension WHERE extname = 'vector';
As of early 2026, self-hosted Supabase ships with pgvector 0.7.x. Some users have reported issues when attempting to upgrade to newer versions (like 0.8.x) due to breaking changes in configuration files. Unless you need specific features from newer versions, stick with the bundled version.
Step 3: Create a Vector Table
Here's a standard schema for storing document embeddings:
-- Create documents table with vector column
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
embedding vector(1536), -- Matches OpenAI text-embedding-3-small dimensions
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create HNSW index for fast similarity search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
The vector(1536) type stores 1536-dimensional vectors, matching OpenAI's text-embedding-3-small model. If you're using a different embedding model, adjust the dimension accordingly.
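If you insert embeddings over raw SQL rather than through supabase-js, pgvector expects the vector as a bracketed text literal like '[0.1,0.2,...]'. A minimal sketch of two helpers (the names toVectorLiteral and assertDimensions are hypothetical, not part of any library):

```typescript
// Serialize a JS number[] into pgvector's text literal format, e.g. "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}

// Guard against dimension mismatches before hitting the database, which would
// otherwise fail with an "expected 1536 dimensions, not N" error on insert.
function assertDimensions(embedding: number[], expected = 1536): void {
  if (embedding.length !== expected) {
    throw new Error(`expected ${expected} dimensions, got ${embedding.length}`);
  }
}
```

Checking dimensions in application code gives you a clearer error at the call site instead of a failed database round trip.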
Understanding Indexing Options
pgvector offers two index types for approximate nearest neighbor search:
IVFFlat Index
Faster to build, uses less memory, but requires training on existing data:
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
Best for: Static datasets where you don't frequently add new documents.
HNSW Index
Slower to build, uses more memory, but handles dynamic data better:
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Best for: Applications with frequent inserts, RAG systems with growing document stores.
For most self-hosted deployments, HNSW provides the best balance of performance and operational simplicity. Tune the m parameter (connections per node) and ef_construction (build-time accuracy) based on your memory constraints and accuracy requirements.
Building Semantic Search
With pgvector configured, you can implement semantic search that understands meaning, not just keywords. Here's a complete example using Node.js and OpenAI embeddings:
import { createClient } from '@supabase/supabase-js';
import OpenAI from 'openai';
const supabase = createClient(
process.env.SUPABASE_URL!,
process.env.SUPABASE_SERVICE_KEY!
);
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Generate embedding for a query
async function getEmbedding(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
return response.data[0].embedding;
}
// Search for similar documents
async function semanticSearch(query: string, limit = 5) {
const queryEmbedding = await getEmbedding(query);
const { data, error } = await supabase.rpc('match_documents', {
query_embedding: queryEmbedding,
match_threshold: 0.7,
match_count: limit,
});
if (error) throw error;
return data;
}
Create the matching function in PostgreSQL:
CREATE OR REPLACE FUNCTION match_documents(
query_embedding vector(1536),
match_threshold float,
match_count int
)
RETURNS TABLE (
id uuid,
content text,
metadata jsonb,
similarity float
)
LANGUAGE sql STABLE
AS $$
SELECT
documents.id,
documents.content,
documents.metadata,
1 - (documents.embedding <=> query_embedding) AS similarity
FROM documents
WHERE 1 - (documents.embedding <=> query_embedding) > match_threshold
ORDER BY documents.embedding <=> query_embedding
LIMIT match_count;
$$;
The <=> operator calculates cosine distance. Subtracting from 1 converts it to similarity (higher = more similar).
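The same conversion is easy to reproduce client-side when you want to sanity-check what the database returns. A small sketch of cosine similarity over plain arrays:

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
// pgvector's <=> operator returns cosine *distance*, i.e. 1 minus this value.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0, which is why a match_threshold of 0.7 filters for reasonably close matches.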
Implementing RAG for Your Self-Hosted Stack
RAG combines retrieval (finding relevant documents) with generation (using an LLM to synthesize answers). Here's how to implement it with your self-hosted Supabase:
Document Ingestion Pipeline
async function ingestDocument(content: string, metadata: Record<string, any>) {
// Split into chunks (simple approach - use LangChain for production)
const chunks = splitIntoChunks(content, 800, 100); // 800 chars, 100 overlap
for (const [index, chunk] of chunks.entries()) {
const embedding = await getEmbedding(chunk);
await supabase.from('documents').insert({
content: chunk,
metadata: { ...metadata, chunk_index: index },
embedding,
});
}
}
function splitIntoChunks(text: string, size: number, overlap: number): string[] {
const chunks: string[] = [];
let start = 0;
while (start < text.length) {
const end = Math.min(start + size, text.length);
chunks.push(text.slice(start, end));
if (end === text.length) break; // avoid a redundant tail chunk made only of overlap
start += size - overlap;
}
return chunks;
}
RAG Query Function
async function ragQuery(question: string): Promise<string> {
// 1. Find relevant documents
const relevantDocs = await semanticSearch(question, 5);
// 2. Build context from retrieved documents
const context = relevantDocs
.map((doc: any) => doc.content)
.join('\n\n---\n\n');
// 3. Generate answer using LLM
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `Answer questions based on the provided context.
If the context doesn't contain relevant information, say so.`,
},
{
role: 'user',
content: `Context:\n${context}\n\nQuestion: ${question}`,
},
],
temperature: 0.7,
});
return completion.choices[0].message.content || 'Unable to generate response';
}
Hybrid Search: Combining Keywords and Vectors
In 2026, hybrid search has become the standard approach—combining traditional full-text search with semantic similarity. This catches cases where exact keyword matches matter (product names, technical terms) while still understanding context.
-- Enable full-text search
ALTER TABLE documents ADD COLUMN fts tsvector
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
CREATE INDEX ON documents USING gin(fts);
-- Hybrid search function
CREATE OR REPLACE FUNCTION hybrid_search(
query_text text,
query_embedding vector(1536),
match_count int,
full_text_weight float DEFAULT 0.3,
semantic_weight float DEFAULT 0.7
)
RETURNS TABLE (
id uuid,
content text,
metadata jsonb,
combined_score float
)
LANGUAGE sql STABLE
AS $$
WITH semantic_search AS (
SELECT id, content, metadata,
1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT match_count * 2
),
keyword_search AS (
SELECT id, content, metadata,
ts_rank(fts, plainto_tsquery('english', query_text)) AS rank
FROM documents
WHERE fts @@ plainto_tsquery('english', query_text)
LIMIT match_count * 2
)
SELECT
COALESCE(s.id, k.id) AS id,
COALESCE(s.content, k.content) AS content,
COALESCE(s.metadata, k.metadata) AS metadata,
(COALESCE(s.similarity, 0) * semantic_weight +
COALESCE(k.rank, 0) * full_text_weight) AS combined_score
FROM semantic_search s
FULL OUTER JOIN keyword_search k ON s.id = k.id
ORDER BY combined_score DESC
LIMIT match_count;
$$;
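If you prefer to run the two searches separately and merge in application code, the FULL OUTER JOIN plus COALESCE logic above can be mirrored client-side. One caveat either way: ts_rank scores are not normalized to the [0, 1] range the way cosine similarity is, so the weights produce a rough blend rather than a calibrated one. A sketch under that assumption (the ScoredDoc type and mergeHybridResults name are hypothetical):

```typescript
interface ScoredDoc {
  id: string;
  content: string;
  score: number;
}

// Merge semantic and keyword result lists by id, weighting each score.
// Documents found by only one search keep a zero for the missing score,
// mirroring COALESCE(..., 0) in the SQL function above.
function mergeHybridResults(
  semantic: ScoredDoc[],
  keyword: ScoredDoc[],
  semanticWeight = 0.7,
  fullTextWeight = 0.3,
  matchCount = 5,
): ScoredDoc[] {
  const combined = new Map<string, ScoredDoc>();
  for (const doc of semantic) {
    combined.set(doc.id, { ...doc, score: doc.score * semanticWeight });
  }
  for (const doc of keyword) {
    const existing = combined.get(doc.id);
    if (existing) {
      existing.score += doc.score * fullTextWeight;
    } else {
      combined.set(doc.id, { ...doc, score: doc.score * fullTextWeight });
    }
  }
  return [...combined.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, matchCount);
}
```

Doing the merge in SQL keeps one round trip; the client-side version is easier to instrument when you're tuning the weights.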
Performance Tuning for Self-Hosted Deployments
Self-hosted means you control the resources. Here are key optimizations:
Memory Configuration
pgvector benefits from adequate shared_buffers and work_mem. In your PostgreSQL configuration:
# For a server with 8GB RAM dedicated to Postgres
shared_buffers = 2GB
work_mem = 256MB
maintenance_work_mem = 512MB
effective_cache_size = 6GB
Index Tuning
For HNSW indexes, the ef_search parameter controls query-time accuracy vs. speed:
-- Higher = more accurate but slower
SET hnsw.ef_search = 100;
-- Lower = faster but less accurate (default: 40)
SET hnsw.ef_search = 40;
Batch Embedding Inserts
Instead of inserting embeddings one at a time, batch them:
async function batchInsertEmbeddings(documents: Array<{content: string, metadata: any}>) {
const embeddings = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: documents.map(d => d.content),
});
const rows = documents.map((doc, i) => ({
content: doc.content,
metadata: doc.metadata,
embedding: embeddings.data[i].embedding,
}));
await supabase.from('documents').insert(rows);
}
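One practical limit: OpenAI's embeddings endpoint caps the number of inputs per request (2048 at the time of writing — verify against the current API reference), so large ingests need to be split into sub-batches first. A generic helper (the name toBatches is hypothetical):

```typescript
// Split an array into consecutive batches of at most batchSize items,
// so each batch fits within the embeddings API's per-request input limit.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error('batchSize must be >= 1');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Feed each batch to batchInsertEmbeddings sequentially, or with limited concurrency, to stay inside your API rate limits.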
Common Issues and Solutions
Extension not available: If CREATE EXTENSION vector fails, ensure you're using the official supabase/postgres Docker image. Custom PostgreSQL images may not include pgvector.
Dimension mismatch: The error "expected X dimensions, not Y" means your embedding model's output doesn't match your column definition. Update your table schema or switch models.
Slow queries: Without an index, vector similarity search scans the entire table. Always create HNSW or IVFFlat indexes for tables with more than a few thousand rows.
High memory usage: HNSW indexes are memory-intensive. Monitor your PostgreSQL memory consumption and scale your server or reduce index parameters (m value) if needed.
Why Self-Hosting Matters for AI Applications
Building AI applications on self-hosted infrastructure gives you complete control over your data pipeline. When embeddings represent sensitive documents—customer data, proprietary content, internal knowledge bases—keeping everything on your own servers simplifies compliance and reduces data exposure risks.
With Supascale, managing self-hosted Supabase becomes straightforward. You get automated backups (including your vector data), custom domain configuration, and OAuth setup—without the operational overhead of managing raw Docker deployments. Check our pricing for details on what's included.
Further Reading
- Supabase Self-Hosted Backup Guide - Ensure your embeddings are protected
- PostgreSQL Performance Tuning - Optimize your database for AI workloads
- Docker Compose Best Practices - Production deployment patterns
- Official pgvector Documentation - Upstream reference for operators, index types, and tuning parameters
