Full Text Search for Self-Hosted Supabase: A Complete Guide

Learn how to implement powerful full text search in self-hosted Supabase using PostgreSQL's native search capabilities without external services.


If you're self-hosting Supabase, you already know the appeal: complete control, predictable costs, and no vendor lock-in. But what happens when your users need to search through thousands of records? Do you spin up Elasticsearch or Algolia, adding more infrastructure complexity to your stack?

You don't have to. PostgreSQL's native full text search is surprisingly capable, and it comes baked into your self-hosted Supabase instance. This guide walks you through implementing full text search from scratch, covering everything from basic queries to production-ready indexed searches.

When Postgres Full Text Search Makes Sense

Before diving into implementation, let's address the elephant in the room: when does Postgres full text search make sense versus dedicated search services?

Choose PostgreSQL full text search when:

  • Your dataset is under 10 million rows
  • You need real-time consistency between data and search results
  • You want to minimize infrastructure complexity
  • Your search requirements are straightforward (no ML-powered relevance ranking)
  • You're optimizing for cost efficiency

Consider external search services when:

  • You need typo tolerance and fuzzy matching at scale
  • Your search requires ML-based ranking or personalization
  • You're indexing unstructured data from multiple sources
  • Sub-10ms response times are critical

For most self-hosted Supabase deployments, the built-in search handles 80% of use cases without adding operational burden.

Understanding Text Search Fundamentals

PostgreSQL full text search works by converting text into tsvector (a searchable document format) and queries into tsquery (a search pattern format). The @@ operator checks if a document matches a query.

Here's a minimal example:

SELECT to_tsvector('english', 'The quick brown fox')
       @@ to_tsquery('english', 'quick & fox');
-- Returns: true

The magic happens in how Postgres processes text:

  1. Tokenization - Breaking text into individual words
  2. Normalization - Converting to lowercase, stemming (running → run)
  3. Stop word removal - Filtering common words (the, a, is)
  4. Lexeme creation - Building the searchable representation
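You can see all four steps at work by inspecting the tsvector Postgres produces:

```sql
-- Stop words ("The", "are") are dropped, words are lowercased and stemmed,
-- and each lexeme keeps its position in the original text.
SELECT to_tsvector('english', 'The quick brown foxes are running');
-- Returns: 'brown':3 'fox':4 'quick':2 'run':6
```

Note that "foxes" and "running" are reduced to their lexemes, which is why a later search for "fox" or "run" still matches.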

Setting Up Full Text Search in Self-Hosted Supabase

Let's build a practical search system for a products table. This example works identically whether you're using Docker Compose or another deployment method.

Step 1: Create Your Table with a Search Column

CREATE TABLE products (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL,
  description TEXT,
  category TEXT,
  price DECIMAL(10, 2),
  created_at TIMESTAMPTZ DEFAULT NOW(),
  -- Generated column for search
  fts TSVECTOR GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(category, '')), 'C')
  ) STORED
);

The fts column is a generated column that automatically updates whenever name, description, or category changes. The setweight function assigns importance levels (A highest, D lowest), so matches in the product name rank higher than description matches.
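To verify the weighting, inspect a weighted vector directly; the weight label appears as a suffix on each lexeme position:

```sql
-- Each lexeme position carries its weight label (A here)
SELECT setweight(to_tsvector('english', 'Wireless Headphones'), 'A');
-- Returns: 'headphon':2A 'wireless':1A
```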

Step 2: Create a GIN Index

Without an index, Postgres scans every row. A GIN (Generalized Inverted Index) makes searches nearly instant:

CREATE INDEX idx_products_fts ON products USING GIN (fts);

This is critical for performance. On a 100,000 row table, unindexed search might take 200ms while indexed search takes under 5ms.

Step 3: Create a Search Function

For cleaner API access, wrap the search logic in a database function:

CREATE OR REPLACE FUNCTION search_products(search_query TEXT)
RETURNS TABLE (
  id UUID,
  name TEXT,
  description TEXT,
  category TEXT,
  price DECIMAL,
  rank REAL
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT 
    p.id,
    p.name,
    p.description,
    p.category,
    p.price,
    ts_rank(p.fts, websearch_to_tsquery('english', search_query)) AS rank
  FROM products p
  WHERE p.fts @@ websearch_to_tsquery('english', search_query)
  ORDER BY rank DESC
  LIMIT 50;
END;
$$;
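You can test the function directly in the SQL editor before wiring it into your application:

```sql
SELECT id, name, rank
FROM search_products('wireless headphones');
```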

The websearch_to_tsquery function parses queries like users actually type them:

  • wireless headphones → match both terms
  • "wireless headphones" → match exact phrase
  • wireless OR bluetooth → match either term
  • wireless -cheap → match wireless, exclude cheap
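You can inspect how each of these forms is parsed into a tsquery:

```sql
SELECT websearch_to_tsquery('english', 'wireless headphones');
-- Returns: 'wireless' & 'headphon'

SELECT websearch_to_tsquery('english', '"wireless headphones"');
-- Returns: 'wireless' <-> 'headphon'

SELECT websearch_to_tsquery('english', 'wireless -cheap');
-- Returns: 'wireless' & !'cheap'
```

The `<->` operator in the phrase case requires the lexemes to be adjacent, which is how exact-phrase matching works under the hood.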

Step 4: Call from Your Application

With Supabase's RPC support, calling this function is straightforward:

const { data, error } = await supabase
  .rpc('search_products', { search_query: 'wireless headphones' });

Handling Multiple Languages

The default english configuration works for English text. For multilingual support, you have several options.

Option 1: Specify Language Per Column

fts TSVECTOR GENERATED ALWAYS AS (
  to_tsvector('french', coalesce(name_fr, '')) ||
  to_tsvector('german', coalesce(name_de, ''))
) STORED
Option 2: Use the simple Configuration

to_tsvector('simple', coalesce(name, ''))

The simple configuration performs no stemming or stop word removal, making it work across languages at the cost of reduced accuracy.

Option 3: Install PGroonga for CJK Languages

For Chinese, Japanese, and Korean text, PostgreSQL's native search falls short. Self-hosted Supabase supports installing extensions:

CREATE EXTENSION IF NOT EXISTS pgroonga;

PGroonga uses Groonga's tokenizers, which properly handle non-space-separated languages like Japanese and Chinese.
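A minimal sketch of a PGroonga-backed search, assuming the extension installed successfully (PGroonga provides its own index access method and the &@~ full text search operator):

```sql
-- PGroonga index on the raw text column
CREATE INDEX idx_products_name_pgroonga ON products USING pgroonga (name);

-- &@~ performs full text search with query syntax, including CJK text
SELECT id, name FROM products WHERE name &@~ 'ワイヤレス';
```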

Combining Full Text Search with Filters

Real applications rarely use pure text search. Users filter by price range, category, or date. PostgreSQL handles this efficiently when you structure queries correctly:

CREATE OR REPLACE FUNCTION search_products_filtered(
  search_query TEXT,
  min_price DECIMAL DEFAULT 0,
  max_price DECIMAL DEFAULT 999999,
  category_filter TEXT DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  name TEXT,
  description TEXT,
  category TEXT,
  price DECIMAL,
  rank REAL
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT 
    p.id,
    p.name,
    p.description,
    p.category,
    p.price,
    ts_rank(p.fts, websearch_to_tsquery('english', search_query)) AS rank
  FROM products p
  WHERE 
    p.fts @@ websearch_to_tsquery('english', search_query)
    AND p.price BETWEEN min_price AND max_price
    AND (category_filter IS NULL OR p.category = category_filter)
  ORDER BY rank DESC
  LIMIT 50;
END;
$$;
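Calling the filtered function works the same way, whether from SQL or through the RPC endpoint:

```sql
-- Only Electronics between $20 and $200 that match the query
SELECT id, name, price, rank
FROM search_products_filtered('wireless headphones', 20, 200, 'Electronics');
```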

For optimal performance with combined filters, create a partial index:

CREATE INDEX idx_products_electronics_fts 
ON products USING GIN (fts) 
WHERE category = 'Electronics';
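The planner only considers a partial index when the query's WHERE clause implies the index predicate, so the filter must appear explicitly:

```sql
-- The category filter matches the index predicate,
-- so idx_products_electronics_fts is a candidate here
SELECT id, name
FROM products
WHERE fts @@ websearch_to_tsquery('english', 'headphones')
  AND category = 'Electronics';
```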

Implementing Search Suggestions

Autocomplete requires a different approach than full text search. For prefix matching, use trigrams:

-- Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Create a trigram index
CREATE INDEX idx_products_name_trgm 
ON products USING GIN (name gin_trgm_ops);

-- Query for suggestions
SELECT name, similarity(name, 'wire') AS score
FROM products
WHERE name ILIKE '%wire%'
ORDER BY score DESC
LIMIT 10;

(Note: with SELECT DISTINCT, the ORDER BY expression must appear in the select list, so we include the similarity score rather than deduplicating.)

This handles prefix and substring matches that standard full text search misses. ILIKE alone won't catch typos, though; for typo tolerance, filter with pg_trgm's similarity operator (%) instead.
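For genuine typo tolerance, swap the ILIKE filter for pg_trgm's similarity operator, which matches strings above a configurable similarity threshold (0.3 by default):

```sql
-- Lower the threshold slightly to be more forgiving of typos
SET pg_trgm.similarity_threshold = 0.2;

-- % matches when trigram similarity exceeds the threshold,
-- so the misspelled 'wirless' can still find 'wireless' products
SELECT name, similarity(name, 'wirless') AS score
FROM products
WHERE name % 'wirless'
ORDER BY score DESC
LIMIT 10;
```

The trigram GIN index created above accelerates the % operator as well as ILIKE.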

For semantic search (finding conceptually similar content), combine full text search with pgvector. This hybrid approach delivers both keyword precision and semantic understanding:

-- Assuming you have an embedding column
SELECT 
  id,
  name,
  -- Combine text rank and vector similarity
  (0.7 * ts_rank(fts, websearch_to_tsquery('english', $1))) +
  (0.3 * (1 - (embedding <=> $2))) AS combined_score
FROM products
WHERE fts @@ websearch_to_tsquery('english', $1)
ORDER BY combined_score DESC
LIMIT 20;

This approach is particularly effective for product catalogs and content search where users might describe what they want rather than knowing exact terms.

Performance Optimization

As your data grows, monitor and optimize search performance.

Analyze Query Plans

EXPLAIN ANALYZE
SELECT * FROM products 
WHERE fts @@ websearch_to_tsquery('english', 'wireless headphones');

Look for "Bitmap Index Scan on idx_products_fts" in the output (GIN indexes are used through bitmap scans). If you see "Seq Scan" on a sizable table, your index isn't being used.

Update Statistics

PostgreSQL uses table statistics for query planning. After bulk inserts or major changes:

ANALYZE products;

Monitor Index Size

GIN indexes can grow large. Check their size:

SELECT 
  pg_size_pretty(pg_relation_size('idx_products_fts')) as index_size,
  pg_size_pretty(pg_table_size('products')) as table_size;

If the index exceeds 20% of table size, consider limiting indexed columns or using partial indexes.

For deeper performance insights, see our guide on PostgreSQL performance tuning.

Triggering Search Index Updates

When your search needs to include data from related tables, generated columns won't work, since they can only reference the row's own columns. Instead, define fts as a plain TSVECTOR column (not GENERATED) and maintain it with a trigger:

CREATE OR REPLACE FUNCTION update_product_fts()
RETURNS TRIGGER AS $$
BEGIN
  NEW.fts := 
    setweight(to_tsvector('english', coalesce(NEW.name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(
      (SELECT string_agg(t.name, ' ') 
       FROM product_tags pt 
       JOIN tags t ON pt.tag_id = t.id 
       WHERE pt.product_id = NEW.id), 
      '')), 'C');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_products_fts
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION update_product_fts();
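One caveat: this trigger only fires when a products row changes. If tags are edited independently, re-touch the affected product so its vector is recomputed (a sketch, assuming the product_tags table referenced above):

```sql
-- Propagate tag changes to the product's search vector
CREATE OR REPLACE FUNCTION touch_product_fts()
RETURNS TRIGGER AS $$
BEGIN
  -- A no-op update on products fires trg_products_fts
  UPDATE products SET name = name
  WHERE id = COALESCE(NEW.product_id, OLD.product_id);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_product_tags_fts
AFTER INSERT OR UPDATE OR DELETE ON product_tags
FOR EACH ROW EXECUTE FUNCTION touch_product_fts();
```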

Common Pitfalls and Solutions

Empty Results for Valid Searches

The most common issue is language mismatch. If your data is in Spanish but you're using english configuration, stemming won't work correctly:

-- Check what tokens are being generated
SELECT to_tsvector('english', 'running');
-- Returns: 'run':1 (correct)

SELECT to_tsvector('english', 'corriendo');
-- Returns: 'corriendo':1 (not stemmed)

SELECT to_tsvector('spanish', 'corriendo');
-- Returns: 'corr':1 (correctly stemmed)

Slow Queries Despite Indexing

If queries are slow with a GIN index present, check if you're inadvertently bypassing it:

-- This bypasses the index (coalesce prevents index usage)
WHERE coalesce(fts, '') @@ to_tsquery('search term')

-- This uses the index
WHERE fts @@ to_tsquery('search term')

Memory Issues on Large Indexes

GIN index maintenance happens in memory. For large imports, increase maintenance_work_mem temporarily:

SET maintenance_work_mem = '1GB';
-- Perform bulk operations
RESET maintenance_work_mem;

Managing Search with Supascale

If you're using Supascale to manage your self-hosted Supabase deployment, the SQL editor provides a straightforward way to create and test these search functions. The one-time purchase model means you're not paying per-query for search like you would with external search services.


Conclusion

PostgreSQL's full text search is a pragmatic choice for self-hosted Supabase deployments. It eliminates external dependencies, keeps search results perfectly synchronized with your data, and handles most use cases without specialized infrastructure.

Start with the generated column approach for simple implementations. As requirements grow, layer in trigram search for autocomplete and vector search for semantic queries. The composability of PostgreSQL means you're not locked into any single approach.

For teams evaluating search solutions, try the built-in approach first. You can always migrate to a dedicated service later if you outgrow it—but many teams find that PostgreSQL's capabilities exceed their expectations.

