Setting Up pgvector for Self-Hosted Supabase: Build AI Apps with Vector Search

Learn how to configure pgvector for self-hosted Supabase to enable vector embeddings, semantic search, and RAG applications.


AI-powered applications have become the dominant trend in 2026, and if you're self-hosting Supabase, you're probably wondering how to tap into vector search capabilities without paying for a separate vector database. The good news: pgvector ships with Supabase's Docker images, giving you a production-ready vector database inside your existing PostgreSQL instance.

This guide walks you through enabling pgvector, storing embeddings, building semantic search, and deploying RAG (Retrieval-Augmented Generation) applications—all on your self-hosted infrastructure.

Why pgvector for Self-Hosted Supabase?

Vector databases have exploded in popularity alongside LLM adoption. Services like Pinecone, Weaviate, and Qdrant offer dedicated vector storage, but they add operational complexity and cost. For self-hosters, pgvector offers a compelling alternative:

Single database architecture: Store your relational data and vector embeddings together. No syncing between systems, no additional infrastructure to manage.

SQL-native queries: Query embeddings using familiar SQL syntax. Join vector similarity results with your existing tables without learning a new query language.

Cost efficiency: You're already running PostgreSQL. pgvector adds vector capabilities without additional licensing or hosting costs.

Data locality: For compliance-sensitive applications, keeping embeddings in your existing database simplifies data governance.

The trade-off? pgvector won't match the raw performance of purpose-built vector databases at massive scale. But for most applications—semantic search, RAG chatbots, recommendation engines—pgvector handles millions of vectors comfortably.

Enabling pgvector in Self-Hosted Supabase

If you're running Supabase via Docker Compose (the standard self-hosting approach), pgvector comes pre-installed in the supabase/postgres image. You just need to enable the extension.

Step 1: Enable the Extension

Connect to your database and run:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';

If you're using Supabase Studio (which runs on port 3000 in self-hosted deployments), navigate to Database → Extensions and search for "vector" to enable it through the UI.

Step 2: Verify the Version

Check which pgvector version you're running:

SELECT extversion FROM pg_extension WHERE extname = 'vector';

As of early 2026, self-hosted Supabase ships with pgvector 0.7.x. Some users have reported issues when attempting to upgrade to newer versions (like 0.8.x) due to breaking changes in configuration files. Unless you need specific features from newer versions, stick with the bundled version.

Step 3: Create a Vector Table

Here's a standard schema for storing document embeddings:

-- Create documents table with vector column
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,
  metadata JSONB DEFAULT '{}',
  embedding vector(1536),  -- Matches OpenAI text-embedding-3-small dimensions
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents 
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

The vector(1536) type stores 1536-dimensional vectors, matching OpenAI's text-embedding-3-small model. If you're using a different embedding model, adjust the dimension accordingly.
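When you insert through supabase-js, a JavaScript `number[]` is serialized for you. If you ever insert through raw SQL, pgvector expects its bracketed text form instead. A minimal sketch (`toVectorLiteral` is our own illustrative helper, not part of any library):

```typescript
// Hypothetical helper: format a JS number[] as a pgvector text literal,
// e.g. "[0.1,0.2,0.3]", for use in hand-written SQL inserts.
// pgvector parses this bracketed form as its input syntax.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}
```

The resulting string can be cast with `::vector` in a raw SQL statement.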

Understanding Indexing Options

pgvector offers two index types for approximate nearest neighbor search:

IVFFlat Index

Faster to build, uses less memory, but requires training on existing data:

CREATE INDEX ON documents 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Best for: Static datasets where you don't frequently add new documents.

HNSW Index

Slower to build, uses more memory, but handles dynamic data better:

CREATE INDEX ON documents 
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Best for: Applications with frequent inserts, RAG systems with growing document stores.

For most self-hosted deployments, HNSW provides the best balance of performance and operational simplicity. Tune the m parameter (connections per node) and ef_construction (build-time accuracy) based on your memory constraints and accuracy requirements.

Building Semantic Search

With pgvector configured, you can implement semantic search that understands meaning, not just keywords. Here's a complete example using Node.js and OpenAI embeddings:

import { createClient } from '@supabase/supabase-js';
import OpenAI from 'openai';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Generate embedding for a query
async function getEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return response.data[0].embedding;
}

// Search for similar documents
async function semanticSearch(query: string, limit = 5) {
  const queryEmbedding = await getEmbedding(query);
  
  const { data, error } = await supabase.rpc('match_documents', {
    query_embedding: queryEmbedding,
    match_threshold: 0.7,
    match_count: limit,
  });

  if (error) throw error;
  return data;
}

Create the matching function in PostgreSQL:

CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(1536),
  match_threshold float,
  match_count int
)
RETURNS TABLE (
  id uuid,
  content text,
  metadata jsonb,
  similarity float
)
LANGUAGE sql STABLE
AS $$
  SELECT
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM documents
  WHERE 1 - (documents.embedding <=> query_embedding) > match_threshold
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
$$;

The <=> operator calculates cosine distance. Subtracting from 1 converts it to similarity (higher = more similar).
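To build intuition for what the SQL computes, here is the same math in plain TypeScript. This is purely illustrative; in production the database does this work, index-accelerated:

```typescript
// Cosine similarity: dot product of the vectors divided by the
// product of their magnitudes. Ranges from -1 to 1; 1 = identical direction.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance, as returned by pgvector's <=> operator:
// 1 - similarity, so smaller = more similar.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}
```

This is why the SQL function subtracts from 1: it converts the distance `<=>` returns back into a similarity score for the caller.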

Implementing RAG for Your Self-Hosted Stack

RAG combines retrieval (finding relevant documents) with generation (using an LLM to synthesize answers). Here's how to implement it with your self-hosted Supabase:

Document Ingestion Pipeline

async function ingestDocument(content: string, metadata: Record<string, any>) {
  // Split into chunks (simple approach - use LangChain for production)
  const chunks = splitIntoChunks(content, 800, 100); // 800 chars, 100 overlap
  
  for (const [i, chunk] of chunks.entries()) {
    const embedding = await getEmbedding(chunk);
    
    const { error } = await supabase.from('documents').insert({
      content: chunk,
      metadata: { ...metadata, chunk_index: i },
      embedding,
    });
    if (error) throw error;
  }
}

function splitIntoChunks(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error('overlap must be smaller than chunk size');
  
  const chunks: string[] = [];
  let start = 0;
  
  while (start < text.length) {
    const end = Math.min(start + size, text.length);
    chunks.push(text.slice(start, end));
    start += size - overlap;
  }
  
  return chunks;
}

RAG Query Function

async function ragQuery(question: string): Promise<string> {
  // 1. Find relevant documents
  const relevantDocs = await semanticSearch(question, 5);
  
  // 2. Build context from retrieved documents
  const context = relevantDocs
    .map((doc: any) => doc.content)
    .join('\n\n---\n\n');
  
  // 3. Generate answer using LLM
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `Answer questions based on the provided context. If the context doesn't contain relevant information, say so.`,
      },
      {
        role: 'user',
        content: `Context:\n${context}\n\nQuestion: ${question}`,
      },
    ],
    temperature: 0.7,
  });

  return completion.choices[0].message.content || 'Unable to generate response';
}

Hybrid Search: Combining Keywords and Vectors

In 2026, hybrid search has become the standard approach—combining traditional full-text search with semantic similarity. This catches cases where exact keyword matches matter (product names, technical terms) while still understanding context.

-- Enable full-text search
ALTER TABLE documents ADD COLUMN fts tsvector
  GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;

CREATE INDEX ON documents USING gin(fts);

-- Hybrid search function
CREATE OR REPLACE FUNCTION hybrid_search(
  query_text text,
  query_embedding vector(1536),
  match_count int,
  full_text_weight float DEFAULT 0.3,
  semantic_weight float DEFAULT 0.7
)
RETURNS TABLE (
  id uuid,
  content text,
  metadata jsonb,
  combined_score float
)
LANGUAGE sql STABLE
AS $$
  WITH semantic_search AS (
    SELECT id, content, metadata,
           1 - (embedding <=> query_embedding) AS similarity
    FROM documents
    ORDER BY embedding <=> query_embedding
    LIMIT match_count * 2
  ),
  keyword_search AS (
    SELECT id, content, metadata,
           ts_rank(fts, plainto_tsquery('english', query_text)) AS rank
    FROM documents
    WHERE fts @@ plainto_tsquery('english', query_text)
    ORDER BY rank DESC
    LIMIT match_count * 2
  )
  SELECT 
    COALESCE(s.id, k.id) AS id,
    COALESCE(s.content, k.content) AS content,
    COALESCE(s.metadata, k.metadata) AS metadata,
    (COALESCE(s.similarity, 0) * semantic_weight + 
     COALESCE(k.rank, 0) * full_text_weight) AS combined_score
  FROM semantic_search s
  FULL OUTER JOIN keyword_search k ON s.id = k.id
  ORDER BY combined_score DESC
  LIMIT match_count;
$$;
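The scoring logic in the `FULL OUTER JOIN` is easy to misread, so here it is as a standalone TypeScript sketch: a document may appear in only one of the two result sets, which is why the SQL wraps both scores in `COALESCE(..., 0)`:

```typescript
// Illustrative mirror of the combined_score expression above.
// A null score means the document was absent from that result set,
// so it contributes 0 (the COALESCE in the SQL function).
function combinedScore(
  similarity: number | null,   // from the semantic CTE, or null if absent
  keywordRank: number | null,  // from the keyword CTE, or null if absent
  semanticWeight = 0.7,
  fullTextWeight = 0.3,
): number {
  return (similarity ?? 0) * semanticWeight + (keywordRank ?? 0) * fullTextWeight;
}
```

One caveat worth knowing: `ts_rank` scores are not normalized to the [0, 1] range the way cosine similarity is, so the default weights are a heuristic starting point; tune them against your own data.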

Performance Tuning for Self-Hosted Deployments

Self-hosted means you control the resources. Here are key optimizations:

Memory Configuration

pgvector benefits from adequate shared_buffers and work_mem. In your PostgreSQL configuration:

# For a server with 8GB RAM dedicated to Postgres
shared_buffers = 2GB
work_mem = 256MB
maintenance_work_mem = 512MB
effective_cache_size = 6GB

Index Tuning

For HNSW indexes, the ef_search parameter controls query-time accuracy vs. speed:

-- Higher = more accurate but slower
SET hnsw.ef_search = 100;

-- Lower = faster but less accurate (default: 40)
SET hnsw.ef_search = 40;

Batch Embedding Inserts

Instead of inserting embeddings one at a time, batch them:

async function batchInsertEmbeddings(documents: Array<{content: string, metadata: any}>) {
  const embeddings = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: documents.map(d => d.content),
  });

  const rows = documents.map((doc, i) => ({
    content: doc.content,
    metadata: doc.metadata,
    embedding: embeddings.data[i].embedding,
  }));

  const { error } = await supabase.from('documents').insert(rows);
  if (error) throw error;
}
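Embedding APIs cap how many inputs one request can carry (check your provider's documentation for the exact limit), so for large document sets you'll want to split the list into fixed-size batches first. A small generic helper (`toBatches` is our own, illustrative):

```typescript
// Hypothetical helper: split a list into batches of at most batchSize,
// so each embeddings request stays under the provider's input cap.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

You would then call `batchInsertEmbeddings` once per batch rather than once for the whole corpus.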

Common Issues and Solutions

Extension not available: If CREATE EXTENSION vector fails, ensure you're using the official supabase/postgres Docker image. Custom PostgreSQL images may not include pgvector.

Dimension mismatch: The error "expected X dimensions, not Y" means your embedding model's output doesn't match your column definition. Update your table schema or switch models.

Slow queries: Without an index, vector similarity search scans the entire table. Always create HNSW or IVFFlat indexes for tables with more than a few thousand rows.
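To check whether a similarity query is actually using your index, run it under EXPLAIN (a quick diagnostic sketch; bind a real query embedding for `$1`). An "Index Scan" node naming your HNSW or IVFFlat index means the index is in play; a "Seq Scan" on `documents` means it isn't:

```sql
-- Diagnostic: verify the planner chooses the vector index.
EXPLAIN ANALYZE
SELECT id
FROM documents
ORDER BY embedding <=> $1  -- a real query embedding, cast to vector(1536)
LIMIT 5;
```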

High memory usage: HNSW indexes are memory-intensive. Monitor your PostgreSQL memory consumption and scale your server or reduce index parameters (m value) if needed.

Why Self-Hosting Matters for AI Applications

Building AI applications on self-hosted infrastructure gives you complete control over your data pipeline. When embeddings represent sensitive documents—customer data, proprietary content, internal knowledge bases—keeping everything on your own servers simplifies compliance and reduces data exposure risks.

With Supascale, managing self-hosted Supabase becomes straightforward. You get automated backups (including your vector data), custom domain configuration, and OAuth setup—without the operational overhead of managing raw Docker deployments. Check our pricing for details on what's included.

Further Reading