Alerting and Notifications for Self-Hosted Supabase

Set up production-grade alerting for self-hosted Supabase with Alertmanager, Slack, PagerDuty, and Discord notifications.


You've deployed self-hosted Supabase, configured monitoring with Prometheus and Grafana, and everything looks great. But what happens at 3 AM when your database runs out of connections or disk space fills up? Without proper alerting, you won't know until users start complaining.

This guide covers setting up production-grade alerting for self-hosted Supabase—from defining meaningful alert rules to routing notifications to Slack, Discord, PagerDuty, and email.

Why Alerting Matters for Self-Hosted Supabase

When you self-host Supabase, you take on operational responsibility that Supabase Cloud handles for you. The platform won't automatically notify you when:

  • PostgreSQL connections are exhausted
  • Disk usage exceeds safe thresholds
  • Realtime subscriptions spike beyond capacity
  • Long-running queries block other operations
  • Auth service response times degrade

Without alerts, these issues silently compound until they cause downtime. The goal isn't to eliminate all problems—it's to catch them before they impact users.

The Alerting Stack Architecture

Self-hosted Supabase monitoring typically uses this stack:

Supabase Services → Prometheus → Alertmanager → Notification Channels
                        ↓
                    Grafana (dashboards + optional alerting)

Prometheus scrapes metrics from Supabase services and evaluates alert rules. When rules fire, it sends alerts to Alertmanager, which handles deduplication, grouping, and routing to your notification channels.

You can also use Grafana Alerting as an alternative or complement to Alertmanager. Grafana's built-in alerting works well for smaller deployments and integrates directly with your dashboards.
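Note that Prometheus does not discover Alertmanager on its own; you must point it there explicitly. A minimal stanza for prometheus.yml (the hostname assumes Alertmanager runs as a Docker service named `alertmanager` on its default port 9093):

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']  # use localhost:9093 outside Docker
```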

Setting Up Prometheus Alert Rules

First, define what conditions warrant alerts. Create a file called supabase-alerts.yml in your Prometheus rules directory. The metric names below assume common exporters (postgres_exporter for the database, plus whatever each Supabase service exposes); adjust them to match the metrics your setup actually emits:

groups:
  - name: supabase-database
    rules:
      # High connection usage - warning
      - alert: PostgresConnectionsHigh
        expr: pg_stat_activity_count / pg_settings_max_connections * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL connections above 80%"
          description: "{{ $value | printf \"%.1f\" }}% of max connections used"

      # Critical connection saturation
      - alert: PostgresConnectionsCritical
        expr: pg_stat_activity_count / pg_settings_max_connections * 100 > 95
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL connections nearly exhausted"
          description: "{{ $value | printf \"%.1f\" }}% of connections used - immediate action required"

      # Disk space warning
      - alert: PostgresDiskSpaceLow
        expr: pg_volume_free_bytes / pg_volume_total_bytes * 100 < 20
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL disk space below 20%"
          description: "Only {{ $value | printf \"%.1f\" }}% disk space remaining"

      # Long-running transactions
      - alert: LongRunningTransaction
        expr: pg_stat_activity_max_tx_duration_seconds > 300
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Transaction running longer than 5 minutes"
          description: "Longest transaction has been running for {{ $value | humanizeDuration }}"

  - name: supabase-services
    rules:
      # Auth service down
      - alert: AuthServiceDown
        expr: up{job="supabase-auth"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "GoTrue auth service is down"
          description: "Authentication will fail for all users"

      # Realtime connection spike
      - alert: RealtimeConnectionsHigh
        expr: realtime_connected_clients > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High number of Realtime connections"
          description: "{{ $value }} clients connected to Realtime"

      # Storage service errors
      - alert: StorageErrorRate
        expr: rate(storage_api_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Elevated storage API error rate"
          description: "{{ $value | printf \"%.2f\" }} errors per second"

Reference these rules in your prometheus.yml:

rule_files:
  - /etc/prometheus/rules/supabase-alerts.yml
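Before reloading Prometheus, it's worth validating the rule file with promtool, which ships with Prometheus (paths and ports are illustrative):

```bash
# Validate syntax and expressions in the rule file
promtool check rules /etc/prometheus/rules/supabase-alerts.yml

# Reload Prometheus without a restart (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
```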

Configuring Alertmanager

Alertmanager receives alerts from Prometheus and routes them to appropriate channels. Create an alertmanager.yml configuration:

global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'

route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'slack-notifications'
  
  routes:
    # Critical alerts go to PagerDuty
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      repeat_interval: 1h
    
    # Warning alerts go to Slack
    - match:
        severity: warning
      receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#supabase-alerts'
        send_resolved: true
        title: '{{ if eq .Status "firing" }}🚨{{ else }}✅{{ end }} {{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'

  - name: 'pagerduty-critical'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'  # Events API v2 key
        send_resolved: true
        description: '{{ .CommonAnnotations.summary }}'
        details:
          description: '{{ .CommonAnnotations.description }}'
          num_alerts: '{{ len .Alerts }}'

inhibit_rules:
  # Don't send warning if critical is already firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname']

Key configuration concepts:

  • group_by: Combines related alerts into single notifications
  • group_wait: Time to wait before sending first notification (allows grouping)
  • repeat_interval: How often to resend unresolved alerts
  • inhibit_rules: Prevents duplicate notifications when critical supersedes warning
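Email, mentioned in the introduction, is configured the same way as the other receivers. A sketch assuming an SMTP relay (host, addresses, and credentials are placeholders):

```yaml
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'oncall@yourdomain.com'
        from: 'alertmanager@yourdomain.com'
        smarthost: 'smtp.yourdomain.com:587'
        auth_username: 'alertmanager@yourdomain.com'
        auth_password: 'YOUR_SMTP_PASSWORD'
        send_resolved: true
```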

Setting Up Slack Notifications

Slack is the most common notification channel for development teams. Here's how to set it up:

1. Create a Slack Webhook

  1. Go to api.slack.com/apps and create a new app
  2. Enable Incoming Webhooks
  3. Add a webhook to your desired channel
  4. Copy the webhook URL

2. Configure Rich Notifications

For more informative Slack alerts, use attachments:

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#supabase-alerts'
        send_resolved: true
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '{{ .CommonAnnotations.summary }}'
        title_link: 'https://grafana.yourdomain.com/alerting/list'
        text: '{{ .CommonAnnotations.description }}'
        fields:
          - title: 'Severity'
            value: '{{ .CommonLabels.severity }}'
            short: true
          - title: 'Status'
            value: '{{ .Status }}'
            short: true

Setting Up Discord Notifications

Discord uses webhooks similar to Slack. Create a webhook in your server settings, then configure Alertmanager:

receivers:
  - name: 'discord-notifications'
    webhook_configs:
      - url: 'https://discord.com/api/webhooks/YOUR/WEBHOOK'
        send_resolved: true
        http_config:
          follow_redirects: true

For Discord, you'll need to format the payload. Consider using a webhook proxy like Alertmanager-Discord for better formatting.
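The core of such a proxy is a small translation from Alertmanager's webhook JSON to Discord's embed format. A minimal sketch (field names follow Alertmanager's webhook payload and Discord's webhook API; colors and wording are arbitrary choices):

```python
def to_discord(payload: dict) -> dict:
    """Convert an Alertmanager webhook payload to a Discord webhook body."""
    firing = payload.get("status") == "firing"
    embeds = []
    for alert in payload.get("alerts", []):
        ann = alert.get("annotations", {})
        embeds.append({
            "title": ann.get("summary",
                            alert.get("labels", {}).get("alertname", "alert")),
            "description": ann.get("description", ""),
            # red for firing, green for resolved
            "color": 0xE01E5A if firing else 0x2EB67D,
        })
    return {"content": "🚨 Firing" if firing else "✅ Resolved",
            "embeds": embeds}
```

POST the returned dict as JSON to your Discord webhook URL, and point an Alertmanager webhook_config at the proxy instead of at Discord directly.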

Setting Up PagerDuty for Critical Alerts

PagerDuty should handle critical alerts that require immediate human intervention. Reserve it for:

  • Service outages
  • Database connection exhaustion
  • Disk space critical (below 5%)
  • Auth or API service failures
Configure the receiver with your PagerDuty Events API v2 integration key:

receivers:
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
        send_resolved: true
        severity: '{{ .CommonLabels.severity }}'
        client: 'Supabase Alertmanager'
        client_url: 'https://grafana.yourdomain.com'
        description: '{{ .CommonAnnotations.summary }}'
        details:
          firing: '{{ .Alerts.Firing | len }}'
          resolved: '{{ .Alerts.Resolved | len }}'
          description: '{{ .CommonAnnotations.description }}'
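As with the Prometheus rules, the Alertmanager configuration can be validated before deploying. amtool ships with Alertmanager (the path and port are illustrative):

```bash
# Validate the Alertmanager configuration file
amtool check-config /etc/alertmanager/alertmanager.yml

# Reload Alertmanager without a restart
curl -X POST http://localhost:9093/-/reload
```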

Using Grafana Alerting as an Alternative

If you prefer keeping alerting within Grafana, you can skip Alertmanager and use Grafana's built-in alerting:

1. Create Alert Rules in Grafana

Navigate to Alerting → Alert rules → Create alert rule:

  1. Define the query (same PromQL as Prometheus rules)
  2. Set conditions (threshold, duration)
  3. Add labels and annotations
  4. Link to a notification policy

2. Configure Contact Points

In Alerting → Contact points, add your notification channels:

  • Slack: Requires webhook URL
  • PagerDuty: Requires integration key
  • Email: Requires SMTP configuration
  • Discord: Use webhook integration

3. Set Up Notification Policies

Route alerts to contact points based on labels:

- severity = "critical" → PagerDuty
- severity = "warning" → Slack
- default policy → Email
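Grafana can also provision contact points from files rather than the UI. A sketch of the file-provisioning format (Grafana 9+; the file path, names, and uid are assumptions):

```yaml
# provisioning/alerting/contact-points.yml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack-warnings
    receivers:
      - uid: slack-warn
        type: slack
        settings:
          url: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
```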

Essential Alerts for Self-Hosted Supabase

Based on common failure modes, here are the alerts every self-hosted deployment should have:

Database Alerts

| Alert | Condition | Severity | Why It Matters |
|---|---|---|---|
| Connection saturation | >80% used | Warning | Clients will start failing |
| Disk space low | <20% free | Warning | Prevents writes, crashes |
| Long transactions | >5 min | Warning | Causes lock contention |
| Replication lag | >30 sec | Warning | Replicas serve stale data |
| Dead tuples high | >10% | Warning | Needs VACUUM |

Service Alerts

| Alert | Condition | Severity | Why It Matters |
|---|---|---|---|
| Auth down | Service unreachable | Critical | All auth fails |
| Realtime down | Service unreachable | Critical | No live updates |
| Storage errors | Error rate >1% | Warning | File uploads fail |
| API latency | p95 >2 sec | Warning | Poor user experience |

Infrastructure Alerts

| Alert | Condition | Severity | Why It Matters |
|---|---|---|---|
| Container restart | >3 in 10 min | Warning | Service instability |
| Memory pressure | >90% used | Warning | OOM kills imminent |
| CPU sustained high | >80% for 15 min | Warning | Performance degradation |
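The infrastructure rows above map naturally to node_exporter metrics. A sketch of the memory and CPU rules (thresholds taken from the table; metric names assume node_exporter is scraped):

```yaml
groups:
  - name: supabase-infrastructure
    rules:
      - alert: MemoryPressureHigh
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 90%"

      - alert: CPUSustainedHigh
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "CPU above 80% for 15 minutes"
```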

Testing Your Alert Pipeline

Before relying on alerts in production, test the entire pipeline:

1. Trigger a Test Alert

Create a rule that always fires:

- alert: TestAlert
  expr: vector(1)
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "Test alert - please ignore"
    description: "Verifying alerting pipeline"

2. Verify Delivery

Check that notifications arrive in all configured channels. Then remove the test rule.
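You can also exercise Alertmanager directly, bypassing Prometheus entirely, by posting a synthetic alert to its v2 API (assumes Alertmanager on localhost:9093):

```bash
curl -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{
        "labels": {"alertname": "TestAlert", "severity": "warning"},
        "annotations": {"summary": "Test alert - please ignore"}
      }]'
```

This isolates failures: if the synthetic alert reaches Slack but the rule-based test does not, the problem is between Prometheus and Alertmanager, not in your receivers.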

3. Test Resolved Notifications

Ensure you receive "resolved" notifications when alerts clear. This confirms the full lifecycle works.

Reducing Alert Fatigue

Alert fatigue is real—too many notifications and you'll start ignoring them. Follow these principles:

Be specific: Alert on conditions that require action, not just anomalies.

Use appropriate severities: Not everything is critical. Reserve pager alerts for true emergencies.

Set meaningful thresholds: 50% disk usage isn't urgent. 90% is.

Tune for durations: Short spikes often resolve themselves. Require conditions to persist before alerting.

Group related alerts: One notification for multiple related issues, not a flood.

How Supascale Simplifies Alerting

Setting up alerting infrastructure is time-consuming. With Supascale, you get built-in monitoring and can focus on what matters—building your application.

While Supascale doesn't replace dedicated alerting for custom metrics, it provides:

  • Health status visibility for all Supabase services
  • Easy access to service logs for troubleshooting
  • One-click management that reduces operational surprises

For teams that want self-hosting benefits without the full operational burden, Supascale's approach offers a middle ground between raw Docker Compose and fully managed cloud.

Conclusion

Effective alerting transforms self-hosted Supabase from a liability into a reliable platform. The key principles:

  1. Alert on actionable conditions—not everything that looks unusual
  2. Route by severity—Slack for warnings, PagerDuty for critical
  3. Test your pipeline—before you need it in production
  4. Iterate on thresholds—tune based on real operational experience

Start with the essential alerts listed above, then expand based on your specific failure modes. Remember: the best alert is one you never receive because you've built a stable system.

Further Reading