Alerting and Notifications for Self-Hosted Supabase

Set up production-grade alerting for self-hosted Supabase with Alertmanager, Slack, PagerDuty, and Discord notifications.


You've deployed self-hosted Supabase, configured monitoring with Prometheus and Grafana, and everything looks great. But what happens at 3 AM when your database runs out of connections or disk space fills up? Without proper alerting, you won't know until users start complaining.

This guide covers setting up production-grade alerting for self-hosted Supabase—from defining meaningful alert rules to routing notifications to Slack, Discord, PagerDuty, and email.

Why Alerting Matters for Self-Hosted Supabase

When you self-host Supabase, you take on operational responsibility that Supabase Cloud handles for you. The platform won't automatically notify you when:

  • PostgreSQL connections are exhausted
  • Disk usage exceeds safe thresholds
  • Realtime subscriptions spike beyond capacity
  • Long-running queries block other operations
  • Auth service response times degrade

Without alerts, these issues silently compound until they cause downtime. The goal isn't to eliminate all problems—it's to catch them before they impact users.

The Alerting Stack Architecture

Self-hosted Supabase monitoring typically uses this stack:

Supabase Services → Prometheus → Alertmanager → Notification Channels
                        ↓
                    Grafana (dashboards + optional alerting)

Prometheus scrapes metrics from Supabase services and evaluates alert rules. When rules fire, it sends alerts to Alertmanager, which handles deduplication, grouping, and routing to your notification channels.

You can also use Grafana Alerting as an alternative or complement to Alertmanager. Grafana's built-in alerting works well for smaller deployments and integrates directly with your dashboards.
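Note that Prometheus does not discover Alertmanager on its own; you must point it there explicitly. A minimal stanza for prometheus.yml (the hostname assumes Alertmanager runs as a Docker service named `alertmanager` on its default port 9093):

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']  # use localhost:9093 outside Docker
```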

Setting Up Prometheus Alert Rules

First, define what conditions warrant alerts. Create a file called supabase-alerts.yml in your Prometheus rules directory. The metric names below assume common exporters (postgres_exporter for the database, plus whatever each Supabase service exposes); adjust them to match the metrics your setup actually emits:

groups:
  - name: supabase-database
    rules:
      # High connection usage - warning
      - alert: PostgresConnectionsHigh
        expr: pg_stat_activity_count / pg_settings_max_connections * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL connections above 80%"
          description: "{{ $value | printf \"%.1f\" }}% of max connections used"

      # Critical connection saturation
      - alert: PostgresConnectionsCritical
        expr: pg_stat_activity_count / pg_settings_max_connections * 100 > 95
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL connections nearly exhausted"
          description: "{{ $value | printf \"%.1f\" }}% of connections used - immediate action required"

      # Disk space warning
      - alert: PostgresDiskSpaceLow
        expr: pg_volume_free_bytes / pg_volume_total_bytes * 100 < 20
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL disk space below 20%"
          description: "Only {{ $value | printf \"%.1f\" }}% disk space remaining"

      # Long-running transactions
      - alert: LongRunningTransaction
        expr: pg_stat_activity_max_tx_duration_seconds > 300
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Transaction running longer than 5 minutes"
          description: "Longest transaction has been running for {{ $value | humanizeDuration }}"

  - name: supabase-services
    rules:
      # Auth service down
      - alert: AuthServiceDown
        expr: up{job="supabase-auth"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "GoTrue auth service is down"
          description: "Authentication will fail for all users"

      # Realtime connection spike
      - alert: RealtimeConnectionsHigh
        expr: realtime_connected_clients > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High number of Realtime connections"
          description: "{{ $value }} clients connected to Realtime"

      # Storage service errors
      - alert: StorageErrorRate
        expr: rate(storage_api_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Elevated storage API error rate"
          description: "{{ $value | printf \"%.2f\" }} errors per second"

Reference these rules in your prometheus.yml:

rule_files:
  - /etc/prometheus/rules/supabase-alerts.yml
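Before reloading Prometheus, it's worth validating the rule file with promtool, which ships with Prometheus (paths and ports are illustrative):

```bash
# Validate syntax and expressions in the rule file
promtool check rules /etc/prometheus/rules/supabase-alerts.yml

# Reload Prometheus without a restart (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
```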

Configuring Alertmanager

Alertmanager receives alerts from Prometheus and routes them to appropriate channels. Create an alertmanager.yml configuration:

global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'

route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'slack-notifications'
  
  routes:
    # Critical alerts go to PagerDuty
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      repeat_interval: 1h
    
    # Warning alerts go to Slack
    - match:
        severity: warning
      receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#supabase-alerts'
        send_resolved: true
        title: '{{ if eq .Status "firing" }}🚨{{ else }}✅{{ end }} {{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'

  - name: 'pagerduty-critical'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'  # Events API v2 key
        send_resolved: true
        description: '{{ .CommonAnnotations.summary }}'
        details:
          description: '{{ .CommonAnnotations.description }}'
          num_alerts: '{{ len .Alerts }}'

inhibit_rules:
  # Don't send warning if critical is already firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname']

Key configuration concepts:

  • group_by: Combines related alerts into single notifications
  • group_wait: Time to wait before sending first notification (allows grouping)
  • repeat_interval: How often to resend unresolved alerts
  • inhibit_rules: Prevents duplicate notifications when critical supersedes warning
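Email, mentioned in the introduction, is configured the same way as the other receivers. A sketch assuming an SMTP relay (host, addresses, and credentials are placeholders):

```yaml
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'oncall@yourdomain.com'
        from: 'alertmanager@yourdomain.com'
        smarthost: 'smtp.yourdomain.com:587'
        auth_username: 'alertmanager@yourdomain.com'
        auth_password: 'YOUR_SMTP_PASSWORD'
        send_resolved: true
```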

Setting Up Slack Notifications

Slack is the most common notification channel for development teams. Here's how to set it up:

1. Create a Slack Webhook

  1. Go to api.slack.com/apps and create a new app
  2. Enable Incoming Webhooks
  3. Add a webhook to your desired channel
  4. Copy the webhook URL

2. Configure Rich Notifications

For more informative Slack alerts, use attachments:

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#supabase-alerts'
        send_resolved: true
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '{{ .CommonAnnotations.summary }}'
        title_link: 'https://grafana.yourdomain.com/alerting/list'
        text: '{{ .CommonAnnotations.description }}'
        fields:
          - title: 'Severity'
            value: '{{ .CommonLabels.severity }}'
            short: true
          - title: 'Status'
            value: '{{ .Status }}'
            short: true

Setting Up Discord Notifications

Discord uses webhooks similar to Slack. Create a webhook in your server settings, then configure Alertmanager:

receivers:
  - name: 'discord-notifications'
    webhook_configs:
      - url: 'https://discord.com/api/webhooks/YOUR/WEBHOOK'
        send_resolved: true
        http_config:
          follow_redirects: true

For Discord, you'll need to format the payload. Consider using a webhook proxy like Alertmanager-Discord for better formatting.
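The core of such a proxy is a small translation from Alertmanager's webhook JSON to Discord's embed format. A minimal sketch (field names follow Alertmanager's webhook payload and Discord's webhook API; colors and wording are arbitrary choices):

```python
def to_discord(payload: dict) -> dict:
    """Convert an Alertmanager webhook payload to a Discord webhook body."""
    firing = payload.get("status") == "firing"
    embeds = []
    for alert in payload.get("alerts", []):
        ann = alert.get("annotations", {})
        embeds.append({
            "title": ann.get("summary",
                            alert.get("labels", {}).get("alertname", "alert")),
            "description": ann.get("description", ""),
            # red for firing, green for resolved
            "color": 0xE01E5A if firing else 0x2EB67D,
        })
    return {"content": "🚨 Firing" if firing else "✅ Resolved",
            "embeds": embeds}
```

POST the returned dict as JSON to your Discord webhook URL, and point an Alertmanager webhook_config at the proxy instead of at Discord directly.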

Setting Up PagerDuty for Critical Alerts

PagerDuty should handle critical alerts that require immediate human intervention. Reserve it for:

  • Service outages
  • Database connection exhaustion
  • Disk space critical (below 5%)
  • Auth or API service failures
Configure the receiver with your PagerDuty Events API v2 integration key:

receivers:
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
        send_resolved: true
        severity: '{{ .CommonLabels.severity }}'
        client: 'Supabase Alertmanager'
        client_url: 'https://grafana.yourdomain.com'
        description: '{{ .CommonAnnotations.summary }}'
        details:
          firing: '{{ .Alerts.Firing | len }}'
          resolved: '{{ .Alerts.Resolved | len }}'
          description: '{{ .CommonAnnotations.description }}'
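As with the Prometheus rules, the Alertmanager configuration can be validated before deploying. amtool ships with Alertmanager (the path and port are illustrative):

```bash
# Validate the Alertmanager configuration file
amtool check-config /etc/alertmanager/alertmanager.yml

# Reload Alertmanager without a restart
curl -X POST http://localhost:9093/-/reload
```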

Using Grafana Alerting as an Alternative

If you prefer keeping alerting within Grafana, you can skip Alertmanager and use Grafana's built-in alerting:

1. Create Alert Rules in Grafana

Navigate to Alerting → Alert rules → Create alert rule:

  1. Define the query (same PromQL as Prometheus rules)
  2. Set conditions (threshold, duration)
  3. Add labels and annotations
  4. Link to a notification policy

2. Configure Contact Points

In Alerting → Contact points, add your notification channels:

  • Slack: Requires webhook URL
  • PagerDuty: Requires integration key
  • Email: Requires SMTP configuration
  • Discord: Use webhook integration

3. Set Up Notification Policies

Route alerts to contact points based on labels:

- severity = "critical" → PagerDuty
- severity = "warning" → Slack
- default policy → Email
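Grafana can also provision contact points from files rather than the UI. A sketch of the file-provisioning format (Grafana 9+; the file path, names, and uid are assumptions):

```yaml
# provisioning/alerting/contact-points.yml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack-warnings
    receivers:
      - uid: slack-warn
        type: slack
        settings:
          url: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
```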

Essential Alerts for Self-Hosted Supabase

Based on common failure modes, here are the alerts every self-hosted deployment should have:

Database Alerts

| Alert | Condition | Severity | Why It Matters |
|---|---|---|---|
| Connection saturation | >80% used | Warning | Clients will start failing |
| Disk space low | <20% free | Warning | Prevents writes, crashes |
| Long transactions | >5 min | Warning | Causes lock contention |
| Replication lag | >30 sec | Warning | Replicas serve stale data |
| Dead tuples high | >10% | Warning | Needs VACUUM |

Service Alerts

| Alert | Condition | Severity | Why It Matters |
|---|---|---|---|
| Auth down | Service unreachable | Critical | All auth fails |
| Realtime down | Service unreachable | Critical | No live updates |
| Storage errors | Error rate >1% | Warning | File uploads fail |
| API latency | p95 >2 sec | Warning | Poor user experience |

Infrastructure Alerts

| Alert | Condition | Severity | Why It Matters |
|---|---|---|---|
| Container restart | >3 in 10 min | Warning | Service instability |
| Memory pressure | >90% used | Warning | OOM kills imminent |
| CPU sustained high | >80% for 15 min | Warning | Performance degradation |
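The infrastructure rows above map naturally to node_exporter metrics. A sketch of the memory and CPU rules (thresholds taken from the table; metric names assume node_exporter is scraped):

```yaml
groups:
  - name: supabase-infrastructure
    rules:
      - alert: MemoryPressureHigh
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 90%"

      - alert: CPUSustainedHigh
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "CPU above 80% for 15 minutes"
```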

Testing Your Alert Pipeline

Before relying on alerts in production, test the entire pipeline:

1. Trigger a Test Alert

Create a rule that always fires:

- alert: TestAlert
  expr: vector(1)
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "Test alert - please ignore"
    description: "Verifying alerting pipeline"

2. Verify Delivery

Check that notifications arrive in all configured channels. Then remove the test rule.
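You can also exercise Alertmanager directly, bypassing Prometheus entirely, by posting a synthetic alert to its v2 API (assumes Alertmanager on localhost:9093):

```bash
curl -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{
        "labels": {"alertname": "TestAlert", "severity": "warning"},
        "annotations": {"summary": "Test alert - please ignore"}
      }]'
```

This isolates failures: if the synthetic alert reaches Slack but the rule-based test does not, the problem is between Prometheus and Alertmanager, not in your receivers.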

3. Test Resolved Notifications

Ensure you receive "resolved" notifications when alerts clear. This confirms the full lifecycle works.

Reducing Alert Fatigue

Alert fatigue is real—too many notifications and you'll start ignoring them. Follow these principles:

Be specific: Alert on conditions that require action, not just anomalies.

Use appropriate severities: Not everything is critical. Reserve pager alerts for true emergencies.

Set meaningful thresholds: 50% disk usage isn't urgent. 90% is.

Tune for durations: Short spikes often resolve themselves. Require conditions to persist before alerting.

Group related alerts: One notification for multiple related issues, not a flood.

How Supascale Simplifies Alerting

Setting up alerting infrastructure is time-consuming. With Supascale, you get built-in monitoring and can focus on what matters—building your application.

While Supascale doesn't replace dedicated alerting for custom metrics, it provides:

  • Health status visibility for all Supabase services
  • Easy access to service logs for troubleshooting
  • One-click management that reduces operational surprises

For teams that want self-hosting benefits without the full operational burden, Supascale's approach offers a middle ground between raw Docker Compose and fully managed cloud.

Conclusion

Effective alerting transforms self-hosted Supabase from a liability into a reliable platform. The key principles:

  1. Alert on actionable conditions—not everything that looks unusual
  2. Route by severity—Slack for warnings, PagerDuty for critical
  3. Test your pipeline—before you need it in production
  4. Iterate on thresholds—tune based on real operational experience

Start with the essential alerts listed above, then expand based on your specific failure modes. Remember: the best alert is one you never receive because you've built a stable system.

Further Reading