Self-hosted Supabase gives you unlimited database growth—but that's a double-edged sword. Without Supabase Cloud's managed limits nudging you toward cost-conscious decisions, databases can balloon quickly. Logs accumulate, old sessions pile up, and analytics tables grow unchecked. Before you know it, backups take hours and queries slow to a crawl.
Unlike managed services, Supabase won't automatically delete your data after a certain time. You need to implement retention policies yourself. This guide covers three approaches to managing database growth in self-hosted Supabase: simple scheduled deletions with pg_cron, efficient partitioning with pg_partman, and storage object lifecycle management.
Why Data Retention Matters for Self-Hosted Instances
On Supabase Cloud, database size directly impacts your bill—there's a built-in incentive to keep things lean. Self-hosted deployments have different economics: storage is cheap, but operational costs add up.
Large databases create several problems:
- Slower backups: A 500GB database takes significantly longer to dump and restore than a 50GB one, and the procedures in our backup and restore guide become harder to execute reliably with oversized databases.
- Degraded query performance: Indexes grow, table scans take longer, and even well-optimized queries suffer
- Higher resource requirements: More RAM for caching, more CPU for vacuum operations, more I/O during maintenance windows
- Compliance risks: Retaining user data longer than necessary can violate GDPR, CCPA, or industry regulations
The solution isn't to never store data—it's to define how long data should live and automate its cleanup.
Defining Your Retention Requirements
Before writing any code, map out what data you have and how long you need it:
| Data Type | Typical Retention | Notes |
|---|---|---|
| User sessions | 30-90 days | Balance UX (staying logged in) with security |
| Application logs | 7-30 days | Keep longer if needed for debugging patterns |
| Analytics events | 90-365 days | Consider aggregating before archiving |
| Audit trails | 1-7 years | Often legally mandated |
| Realtime presence | 24 hours | Ephemeral by nature |
| Auth tokens | Until expiry | Clean up expired tokens promptly |
Your requirements will vary based on your application, industry, and compliance needs. The key is being intentional rather than keeping everything forever "just in case."
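One way to make those decisions explicit is a small policy registry inside the database itself. The sketch below is our own convention, not a Supabase feature; the table and column names are illustrative. It records each table's retention period in one queryable place:

```sql
-- Hypothetical policy registry: the table name and columns are our own convention
CREATE TABLE IF NOT EXISTS public.retention_policies (
    table_name       TEXT PRIMARY KEY,
    retention_period INTERVAL NOT NULL,
    rationale        TEXT,
    updated_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

INSERT INTO public.retention_policies (table_name, retention_period, rationale) VALUES
    ('auth.sessions',           '30 days', 'Balance UX with security'),
    ('public.app_logs',         '7 days',  'Debugging window'),
    ('public.analytics_events', '90 days', 'Product analytics');
```

Cleanup jobs can read their cutoffs from this table, so changing a policy becomes a single UPDATE rather than an edit to every scheduled job.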
Approach 1: Scheduled Deletions with pg_cron
For most self-hosted Supabase deployments, pg_cron provides the simplest path to automated data cleanup. It's a PostgreSQL extension that runs cron jobs inside the database—no external schedulers required.
Enabling pg_cron in Self-Hosted Supabase
If you followed our guide on scheduling cron jobs for self-hosted Supabase, you'll already have pg_cron configured. If not, you'll need to add it to your PostgreSQL configuration.
First, enable the extension:
CREATE EXTENSION IF NOT EXISTS pg_cron;
Then add pg_cron to shared_preload_libraries in your postgresql.conf (requires a restart):
shared_preload_libraries = 'pg_cron'
cron.database_name = 'postgres'
Creating Retention Jobs
Here's a practical example for cleaning up old sessions:
-- Delete sessions older than 30 days, runs daily at 3 AM
SELECT cron.schedule(
'cleanup-old-sessions',
'0 3 * * *',
$$DELETE FROM auth.sessions WHERE created_at < NOW() - INTERVAL '30 days'$$
);
For application logs:
-- Delete logs older than 7 days, runs every 6 hours
SELECT cron.schedule(
'cleanup-app-logs',
'0 */6 * * *',
$$DELETE FROM public.app_logs WHERE created_at < NOW() - INTERVAL '7 days'$$
);
For analytics events with batch limits (prevents long-running transactions):
-- Delete up to 10,000 old events per run
SELECT cron.schedule(
'cleanup-analytics',
'*/15 * * * *',
$$DELETE FROM public.analytics_events
WHERE id IN (
SELECT id FROM public.analytics_events
WHERE created_at < NOW() - INTERVAL '90 days'
LIMIT 10000
)$$
);
Monitoring Your Jobs
Check if jobs are running successfully:
SELECT jobid, jobname, schedule, command,
(SELECT max(start_time) FROM cron.job_run_details WHERE jobid = cron.job.jobid) as last_run,
(SELECT status FROM cron.job_run_details WHERE jobid = cron.job.jobid ORDER BY start_time DESC LIMIT 1) as last_status
FROM cron.job;
Review recent job execution details:
SELECT jobid, jobname, start_time, end_time,
status, return_message
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 20;
Jobs that fail silently let backlogs build up, and once enough old data accumulates, clearing it becomes an expensive operation in its own right. Monitor cron.job_run_details regularly.
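As a starting point for that monitoring, a query like the following surfaces jobs whose most recent run did not succeed, using the cron.job and cron.job_run_details tables that pg_cron ships with:

```sql
-- Jobs whose latest recorded run did not succeed
SELECT j.jobname, d.start_time, d.status, d.return_message
FROM cron.job j
JOIN LATERAL (
    SELECT status, start_time, return_message
    FROM cron.job_run_details
    WHERE jobid = j.jobid
    ORDER BY start_time DESC
    LIMIT 1
) d ON true
WHERE d.status <> 'succeeded';
```

An empty result means every job's last run completed; anything else deserves a look before backlogs build up.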
Performance Considerations
For tables with millions of rows, bulk DELETEs create problems:
- Long-running transactions block other operations
- Large amounts of dead tuples trigger expensive autovacuum
- Write amplification impacts disk I/O
As a rough reference point, a 16 CPU/64GB machine can handle batch deletes every 5 minutes without issue, with individual delete queries typically completing in under 35ms. Your numbers will depend on your specific workload and table structure.
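One way to keep individual transactions short on large tables is a procedure that deletes in batches and commits between them (PostgreSQL 11+ allows COMMIT inside procedures). The procedure name, batch size, and schedule below are illustrative, a sketch rather than a prescription:

```sql
-- Hypothetical batched-delete procedure; commits between batches
CREATE OR REPLACE PROCEDURE purge_old_analytics(cutoff INTERVAL, batch_size INT DEFAULT 10000)
LANGUAGE plpgsql AS $$
DECLARE
    deleted INT;
BEGIN
    LOOP
        DELETE FROM public.analytics_events
        WHERE id IN (
            SELECT id FROM public.analytics_events
            WHERE created_at < NOW() - cutoff
            LIMIT batch_size
        );
        GET DIAGNOSTICS deleted = ROW_COUNT;
        COMMIT;  -- release locks and let autovacuum keep pace between batches
        EXIT WHEN deleted = 0;
    END LOOP;
END;
$$;

-- Schedule it with pg_cron
SELECT cron.schedule('purge-analytics', '0 2 * * *', $$CALL purge_old_analytics('90 days')$$);
```

Each batch is its own transaction, so a long backlog gets cleared without ever holding a multi-minute lock.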
For high-volume tables, consider Approach 2 instead.
Approach 2: Partition Management with pg_partman
When DELETE statements can't keep up with your data growth, partitioning offers a more efficient alternative. Instead of deleting rows, you drop entire partitions—a nearly instant operation regardless of partition size.
pg_partman automates partition creation and maintenance. It removes the need to manually create and detach partitions for tables partitioned by time or an integer series.
When to Use Partitioning
Partitioning makes sense when:
- Tables grow by millions of rows per day
- Retention periods are well-defined (30 days, 90 days, etc.)
- Query patterns naturally align with time ranges
- DELETE operations are causing performance issues
Partitioning adds complexity. For tables under 10 million rows with predictable growth, pg_cron deletions are simpler to maintain.
Setting Up pg_partman
Enable the extension:
CREATE EXTENSION IF NOT EXISTS pg_partman;
Let's partition an analytics events table by month:
-- Create the parent table with partitioning
CREATE TABLE public.analytics_events (
id UUID DEFAULT gen_random_uuid(),
event_type TEXT NOT NULL,
payload JSONB,
user_id UUID REFERENCES auth.users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);
-- Configure pg_partman to manage partitions
SELECT partman.create_parent(
p_parent_table := 'public.analytics_events',
p_control := 'created_at',
p_type := 'native',  -- omit this parameter on pg_partman 5.x, where native partitioning is the only mode
p_interval := 'monthly',
p_premake := 3 -- Create 3 months ahead
);
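After create_parent runs, you can confirm the premade partitions exist. With pg_partman 4.x and later, show_partitions returns the schema and table name of each managed partition:

```sql
-- List the partitions pg_partman is managing for this parent
SELECT partition_schemaname, partition_tablename
FROM partman.show_partitions('public.analytics_events');
```

With p_premake := 3 you should see the current month plus three future months listed.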
Configuring Retention
Set up automatic partition archival after 6 months:
UPDATE partman.part_config
SET retention = '6 months',
retention_keep_table = false -- Actually drop old partitions
WHERE parent_table = 'public.analytics_events';
If you want to archive partitions to a separate schema before dropping:
UPDATE partman.part_config
SET retention = '6 months',
retention_keep_table = true,
retention_schema = 'archive'
WHERE parent_table = 'public.analytics_events';
Scheduling Maintenance
pg_partman requires regular maintenance runs. Schedule this with pg_cron:
-- Run maintenance daily at 4 AM
SELECT cron.schedule(
'partman-maintenance',
'0 4 * * *',
$$CALL partman.run_maintenance_proc()$$
);
The maintenance procedure creates new partitions as needed and drops or archives old ones based on your retention settings.
Migrating Existing Tables to Partitioned Tables
Converting an existing large table to a partitioned one requires planning. Supabase provides guidance on this process: you essentially create a new partitioned table, migrate data in batches, and swap the tables.
This is an advanced operation that can cause downtime if not handled carefully. Test thoroughly in a staging environment first.
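At a very high level, the create-migrate-swap sequence looks like the sketch below. This is a simplification that ignores foreign keys, triggers, and writes arriving during the copy, and the table names and date range are illustrative:

```sql
-- 1. Create the new partitioned table alongside the old one
--    (unique constraints on a partitioned table must include the partition key,
--     so copy defaults only and recreate constraints deliberately)
CREATE TABLE public.app_logs_new (LIKE public.app_logs INCLUDING DEFAULTS)
PARTITION BY RANGE (created_at);

-- 2. Configure pg_partman, then backfill one time slice per batch
-- SELECT partman.create_parent('public.app_logs_new', 'created_at', ...);
INSERT INTO public.app_logs_new
SELECT * FROM public.app_logs
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';

-- 3. Swap names in one short transaction during a low-traffic window
BEGIN;
ALTER TABLE public.app_logs RENAME TO app_logs_old;
ALTER TABLE public.app_logs_new RENAME TO app_logs;
COMMIT;
```

Keep app_logs_old around until you have verified row counts and application behavior against the new table.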
Approach 3: Storage Object Lifecycle Management
Database retention only covers half the picture. Supabase Storage objects—uploads, images, documents—accumulate separately and won't be cleaned up by database retention policies.
The Challenge
When uploading files to S3-compatible storage, you'd typically use lifecycle policies to expire objects automatically. Supabase Storage doesn't expose this API directly, requiring workarounds for self-hosted deployments.
Using Metadata for Expiration
One community approach uses storage object metadata to track expiration:
-- After uploading a file via the Storage API, tag its object row with an expiration
UPDATE storage.objects
SET metadata = COALESCE(metadata, '{}'::jsonb)
    || jsonb_build_object('expires_at', (NOW() + INTERVAL '7 days')::text)
WHERE bucket_id = 'user-uploads'
  AND name = 'temp-file.pdf';
Then create a cleanup function:
CREATE OR REPLACE FUNCTION cleanup_expired_storage()
RETURNS void AS $$
DECLARE
expired_object RECORD;
BEGIN
FOR expired_object IN
SELECT id, name, bucket_id
FROM storage.objects
WHERE (metadata->>'expires_at')::timestamptz < NOW()
LOOP
-- Deleting the row removes only the database record; the underlying file
-- can be left orphaned in the backing store, so prefer deleting through
-- the Storage API where possible
DELETE FROM storage.objects WHERE id = expired_object.id;
END LOOP;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
Schedule with pg_cron:
SELECT cron.schedule(
'cleanup-expired-storage',
'0 5 * * *',
$$SELECT cleanup_expired_storage()$$
);
Direct S3 Lifecycle Rules
If your self-hosted Supabase uses MinIO or another S3-compatible backend for storage, you can configure lifecycle rules at the storage layer:
# Using mc (MinIO client)
mc ilm rule add myminio/supabase-storage \
--prefix "temp/" \
--expiry-days 7
This handles cleanup at the storage level, independent of Supabase's metadata. Check our guide on storage backup for self-hosted Supabase for more on storage architecture.
Archival Strategies for Compliance
Some data can't simply be deleted—audit logs, financial records, and user activity may need long-term retention for compliance. Archiving moves this data out of your active database while keeping it accessible when needed.
Simple Archive Schema
Create an archive schema and move old data there:
CREATE SCHEMA IF NOT EXISTS archive;

-- One-time setup: archive table with the same structure as the source
CREATE TABLE IF NOT EXISTS archive.audit_logs (LIKE public.audit_logs INCLUDING ALL);

-- Archive old audit logs
INSERT INTO archive.audit_logs
SELECT * FROM public.audit_logs
WHERE created_at < NOW() - INTERVAL '1 year';

-- Then delete from active table
DELETE FROM public.audit_logs
WHERE created_at < NOW() - INTERVAL '1 year';
Wrap this in a function and schedule with pg_cron for automated archival.
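The archive-then-delete steps can also be collapsed into a single atomic statement using a data-modifying CTE, which removes any window between the copy and the delete:

```sql
-- Move rows older than 1 year in one statement
WITH moved AS (
    DELETE FROM public.audit_logs
    WHERE created_at < NOW() - INTERVAL '1 year'
    RETURNING *
)
INSERT INTO archive.audit_logs
SELECT * FROM moved;
```

Because the DELETE and INSERT run in the same statement, a row is either archived and removed, or untouched; there is no state where it has been deleted but not archived.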
External Archival
For data that must be retained but rarely accessed, consider:
- Exporting to cold storage (S3 Glacier, Backblaze B2)
- Dumping to separate archive database
- Converting to Parquet files for analytical queries
A rotation strategy might look like:
| Age | Storage Tier | Access Pattern |
|---|---|---|
| 0-7 days | Primary database | Real-time |
| 7-30 days | Hot archive | Minutes |
| 30-365 days | Cold storage (S3) | Hours |
| 1+ years | Deep archive (Glacier) | Days |
This balances cost with recovery needs across different data ages.
Putting It Together: A Complete Retention Strategy
Here's how these pieces fit together in practice:
- Audit your data: Identify tables, growth rates, and retention requirements
- Choose your approach: pg_cron for small-medium tables, pg_partman for high-volume
- Implement incrementally: Start with one table, monitor, then expand
- Set up monitoring: Track job success rates and table sizes over time
- Document policies: Make retention periods explicit for compliance
For most self-hosted Supabase deployments, a combination works best:
- pg_cron deletions for auth sessions, tokens, and app logs
- pg_partman for high-volume analytics or time-series data
- Storage lifecycle rules for temporary file uploads
- Archive schema for compliance-required historical data
Simplifying Data Management with Supascale
Implementing retention policies manually works, but it's another operational task to maintain. You're writing SQL, scheduling jobs, monitoring execution, and updating policies as requirements change.
Supascale can help reduce this operational burden for self-hosted Supabase. While you'll still need to define your retention requirements, having automated backups with configurable schedules means you're not losing data while you implement cleanup policies. One-click restore capability provides a safety net as you test retention jobs.
Check out our documentation on backup storage to see how automated S3 backups complement your data lifecycle management.
Summary
Data retention isn't glamorous, but it's essential for healthy self-hosted Supabase deployments:
- Use pg_cron for straightforward scheduled deletions on smaller tables
- Use pg_partman when table sizes or growth rates make DELETE operations impractical
- Don't forget storage—implement expiration for uploaded files separately
- Archive strategically—move compliance data rather than keeping it in active tables
- Monitor continuously—silent failures lead to backlogs that are hard to clear
The unlimited growth of self-hosted databases is a feature, not a bug. But like any feature, it requires intentional management to work in your favor.
