PostgreSQL PITR backup
Daily base backups via pgBackRest plus continuous WAL archiving via WAL-G. 30-day PITR window, with WAL shipped to the archive at least every 5 minutes. Tested against the actual PostgreSQL version and extensions in use on your infrastructure.
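Here is a minimal sketch of the archiving half. It assumes WAL-G is already installed and configured for its storage target through its own environment, and it lists the `postgresql.conf` parameters involved. `wal_level`, `archive_mode`, `archive_command` and `archive_timeout` are standard PostgreSQL settings, and `wal-g wal-push` is WAL-G's archiving command. The Python wrapper and its function names are illustrative only.

```python
# Minimal sketch: postgresql.conf parameters for continuous WAL archiving through WAL-G.
# Assumption: WAL-G is installed and its storage target/credentials are configured
# through its own environment (not shown here).

ARCHIVE_TIMEOUT_SECONDS = 300  # 5 minutes


def archiving_settings(archive_timeout_s: int = ARCHIVE_TIMEOUT_SECONDS) -> dict[str, str]:
    """Return the postgresql.conf parameters that turn on WAL archiving."""
    return {
        "wal_level": "replica",                  # lowest level that supports archiving and PITR
        "archive_mode": "on",                    # changing this requires a server restart
        "archive_command": "wal-g wal-push %p",  # PostgreSQL substitutes %p with the segment path
        # Switch to a new WAL segment after this long if there has been activity,
        # so a quiet period cannot leave commits sitting in an unarchived segment.
        "archive_timeout": f"{archive_timeout_s}s",
    }


def render_conf(settings: dict[str, str]) -> str:
    """Render settings as postgresql.conf lines with single-quoted values."""
    return "\n".join(f"{key} = '{value}'" for key, value in settings.items())


if __name__ == "__main__":
    print(render_conf(archiving_settings()))
```

`archive_timeout` bounds the archiving lag. Without it, a lightly loaded server can hold a partially filled WAL segment (16 MB by default) for a long time before archiving it.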
The typical email DR failure mode is not "the server caught fire at 3am" but "at 14:32 today someone ran a query without a WHERE clause and flagged 200,000 subscribers as bounced when they were not". A full-backup-only strategy means restoring to 02:00 last night and losing more than 12 hours of legitimate signups, opens, clicks, and unsubscribes that happened between the backup and the mistake. Point-in-time recovery (PITR) restores PostgreSQL to any second within the retention window. The Tiger Data PostgreSQL backup guide (updated December 2025) and the Bytebase 2026 backup tools comparison both document PITR as the production baseline for operations where database state has business value beyond the last 24 hours. For email infrastructure, that describes every operation.
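The arithmetic behind that comparison fits in a small sketch. The date is arbitrary and `lost_window` is a hypothetical helper. With PITR, the gap shrinks to whatever lies between the chosen recovery target and the mistake.

```python
from datetime import datetime, timedelta


def lost_window(restore_point: datetime, incident: datetime) -> timedelta:
    """Legitimate activity thrown away by restoring to `restore_point` after `incident`."""
    return incident - restore_point


# Illustrative timeline for the scenario described above (the date itself is arbitrary).
incident = datetime(2025, 6, 3, 14, 32)      # bad UPDATE without a WHERE clause
nightly_full = datetime(2025, 6, 3, 2, 0)    # last full backup
pitr_target = datetime(2025, 6, 3, 14, 31)   # PITR target just before the mistake

print(lost_window(nightly_full, incident))  # 12:32:00
print(lost_window(pitr_target, incident))   # 0:01:00
```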
Disaster Recovery Setup deploys tested backup and restore across the four data classes that matter for email infrastructure:

- **PostgreSQL**: PITR via continuous WAL archiving plus daily base backups.
- **MailWizz application state**: the database plus the filesystem holding templates, lists, and configuration, backed up atomically to prevent cross-component inconsistency.
- **PowerMTA**: configuration backed up on every change, plus persistent queue state.
- **Suppression lists** (unsubscribes, hard bounces, complaints): backed up separately as an append-only log and applied as the last step of any restore. Recipients who opted out after the restore point therefore never receive mail once the restore is done.

Backups use AES-256-GCM encryption with separately stored keys, and dual-region offsite storage following the 3-2-1 rule. One full restore drill into a separate test environment is included, with measured RTO and RPO documented. One-time EUR 1,299, delivery in 10 business days.
Most DR plans pick one retention period and apply it uniformly. The reality is that recovery scenarios have different time horizons. Operational mistakes surface within days; regulatory inquiry surfaces within months; compliance retention requirements span years. Three tiers cover all three scenarios at appropriate storage cost.
| Tier | Cadence | Retention | Recovery scenario | Storage class |
|---|---|---|---|---|
| Hot | Daily full + 5-min WAL | 30 days | Operational mistakes (bad query, accidental delete, mistaken bulk action) | Local SSD with PITR enabled |
| Warm | Weekly full | 90 days | Regulatory inquiry, audit request, customer dispute about past state | Object storage, encrypted |
| Cold | Monthly full | 365 days | Compliance retention (GDPR records, SOC 2 evidence, legal hold) | Cold object storage, encrypted, immutable |
Suppression lists run on a separate continuous schedule with retention matched to the longest tier (365 days), because suppression state cannot be lost regardless of how far back the operational restore goes. The three-tier model and per-tier retention are configurable per customer compliance and storage cost preference; the defaults above suit most operations.
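One way to express the default tiers is as a pruning rule. This sketch assumes daily full backups. It treats Sunday fulls as the weekly copies and first-of-month fulls as the monthly copies, and it does not model WAL retention for the hot tier. All names are hypothetical. pgBackRest and WAL-G have their own retention settings, which would normally enforce this in practice.

```python
from datetime import date, timedelta
from typing import Optional

HOT_DAYS, WARM_DAYS, COLD_DAYS = 30, 90, 365


def tier_of(backup: date, today: date) -> Optional[str]:
    """Return the tier that justifies keeping a full backup, or None to prune it.

    Illustrative conventions (assumptions, not a fixed rule): every daily full
    counts toward the hot tier, Sunday fulls are the weekly (warm) copies, and
    first-of-month fulls are the monthly (cold) copies.
    """
    age = (today - backup).days
    if age < 0:
        raise ValueError(f"backup {backup} is dated after {today}")
    if age < HOT_DAYS:
        return "hot"
    if age < WARM_DAYS and backup.weekday() == 6:  # Monday == 0, Sunday == 6
        return "warm"
    if age < COLD_DAYS and backup.day == 1:
        return "cold"
    return None


def prune_plan(backups: list[date], today: date) -> dict[str, list[date]]:
    """Group backups by the tier that keeps them; anything unclaimed goes to 'delete'."""
    plan: dict[str, list[date]] = {"hot": [], "warm": [], "cold": [], "delete": []}
    for backup in sorted(backups):
        plan[tier_of(backup, today) or "delete"].append(backup)
    return plan
```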
Most cloud platforms offer VM-level snapshots as the default backup option. The marketing pitch is appealing: one-click backup of the whole machine, one-click restore to a fresh instance. The reality is more nuanced. VM snapshots work well for stateless or near-stateless workloads where any point-in-time copy is acceptable. They work poorly for stateful database workloads where the snapshot must capture a transactionally consistent state.

For PostgreSQL specifically, the snapshot must be taken at the same instant across every volume the database uses (data directory, pg_wal, tablespaces). If it is not, it catches the data files at one moment and the WAL segments at a slightly different moment. Dirty pages still held in shared buffers have not reached disk at all, so they are absent from the snapshot entirely. PostgreSQL can start from a truly atomic snapshot by treating it like a crash and replaying WAL, but that repair only works if the data files and WAL match. The result is a snapshot that may or may not restore, depending on what PostgreSQL was doing at the snapshot moment and how the snapshot was taken.
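PostgreSQL's documentation on file-system-level backups frames the condition narrowly. A snapshot is usable only if every filesystem holding the data directory, pg_wal, and any tablespaces is frozen at the same instant. Startup then treats the snapshot like crash recovery. As a sketch, this pre-check counts how many filesystems are involved. The function names are hypothetical. A single device ID is necessary but not sufficient, because whether the platform's snapshot is actually atomic is a storage property this code cannot see.

```python
import os
from pathlib import Path


def snapshot_devices(pgdata: str) -> set[int]:
    """Device IDs of every filesystem a crash-consistent snapshot must freeze together.

    Covers the data directory, pg_wal (often a symlink to another volume), and each
    tablespace link under pg_tblspc. os.stat() follows symlinks, so a link pointing
    to another volume reports that volume's device ID.
    """
    root = Path(pgdata)
    paths = [root, root / "pg_wal"]
    tablespaces = root / "pg_tblspc"
    if tablespaces.is_dir():
        paths.extend(tablespaces.iterdir())
    return {os.stat(p).st_dev for p in paths}


def single_volume_snapshot_is_enough(pgdata: str) -> bool:
    """True if one snapshot of a single filesystem covers everything PostgreSQL needs."""
    return len(snapshot_devices(pgdata)) == 1
```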
The PostgreSQL community has produced specific backup tools that handle this correctly. The Bytebase 2026 backup tools comparison covers the current mature options:

- pgBackRest: most popular; archived April 2026 but stable.
- Barman: originally from 2ndQuadrant, now maintained by EDB.
- WAL-G: modern, Go-based, with parallel compression.
- pgmoneta: a newer, daemon-based approach.

All four implement the same fundamental pattern: a consistent base backup using pg_basebackup or an equivalent PostgreSQL-native mechanism, plus continuous WAL archiving for point-in-time recovery. The deployment choice depends on operational preference and existing tool familiarity. The engagement defaults to pgBackRest plus WAL-G for parallel WAL shipping but adapts to customer preference.
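Here is a minimal sketch of the nightly base-backup step. It assumes a pgBackRest stanza named `main` (hypothetical) is already defined in `pgbackrest.conf`. `--stanza`, `--type=full` and the `backup` command are pgBackRest's own. The wrapper is illustrative and only builds the command unless `dry_run=False`.

This setup ships WAL with WAL-G rather than pgBackRest. pgBackRest's default archive check waits for the backup's WAL to reach pgBackRest's own repository, so it would not see those segments. The sketch therefore exposes `--no-archive-check` for that case, on the assumption that WAL continuity is then verified on the WAL-G side.

```python
import subprocess

STANZA = "main"  # hypothetical stanza name; must match the one defined in pgbackrest.conf


def full_backup_argv(stanza: str = STANZA, archive_check: bool = True) -> list[str]:
    """Build the pgBackRest command line for a full base backup."""
    argv = ["pgbackrest", f"--stanza={stanza}", "--type=full"]
    if not archive_check:
        # pgBackRest normally waits for the backup's WAL to appear in its own archive;
        # if WAL is shipped by another tool, that check cannot succeed.
        argv.append("--no-archive-check")
    argv.append("backup")
    return argv


def run_full_backup(stanza: str = STANZA, archive_check: bool = True, dry_run: bool = True) -> list[str]:
    """Run (or, by default, only build) the nightly full backup command."""
    argv = full_backup_argv(stanza, archive_check)
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit, so a failed
        # backup surfaces to the scheduler instead of passing silently.
        subprocess.run(argv, check=True)
    return argv
```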
MailWizz adds the cross-component consistency problem. The application state lives in two places:

- a database holding lists, subscribers, campaigns, and metadata;
- a filesystem directory holding uploaded templates, images, customer configuration files, and asset attachments.

A backup that captures the database at 03:00:15 and the filesystem at 03:00:42 can end up inconsistent in two ways. It can contain a template file uploaded during those 27 seconds that the database snapshot has no record of. It can also contain a campaign whose template file was deleted in the same interval and is therefore missing from the filesystem snapshot. Restoring it produces an inconsistent state: orphaned files at best, and application errors for missing-but-referenced objects at worst.

The engagement handles this by quiescing MailWizz background jobs during the backup window (typically 30-90 seconds). It then captures the database and filesystem as one coordinated job: either both captures succeed or the partial backup set is discarded.
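The quiesce-then-capture pattern can be sketched as follows. The four hooks (pausing MailWizz cron and queue workers, dumping the database, archiving the upload directories) are hypothetical callables the operator would supply. The sketch only shows the control flow: jobs are always resumed, and a set missing either half is deleted rather than left behind for a future restore to pick up.

```python
from pathlib import Path
from typing import Callable


def coordinated_backup(
    pause_jobs: Callable[[], None],
    resume_jobs: Callable[[], None],
    dump_database: Callable[[Path], None],
    archive_files: Callable[[Path], None],
    dest: Path,
) -> Path:
    """Write database dump and file archive as one backup set, or leave no set at all.

    The four callables are hypothetical hooks. A COMPLETE marker is written last;
    restore tooling should ignore any set without it.
    """
    dest.mkdir(parents=True, exist_ok=True)
    parts = [dest / "db.dump", dest / "files.tar", dest / "COMPLETE"]
    pause_jobs()
    try:
        dump_database(parts[0])
        archive_files(parts[1])
        parts[2].touch()  # only reached when both captures succeeded
        return dest
    except Exception:
        for part in parts:  # discard whatever was written before the failure
            part.unlink(missing_ok=True)
        raise
    finally:
        resume_jobs()  # background jobs resume whether the backup succeeded or not
```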
PowerMTA configuration backup is simpler because the state changes less frequently. The configuration file, virtual-mta definitions, and IP pool assignments change on operator action rather than continuously. Event-triggered backup on every configuration change captures each change immediately, versioned with timestamp and operator attribution. The 90-day version history lets operations roll back to a prior configuration if a change introduced an issue. The persistent queue state (messages awaiting delivery, deferred messages being retried) is snapshotted at backup time. It is less critical, because PowerMTA can rebuild queue state from recent message logs if needed during recovery.
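Event-triggered versioning reduces to "store a new copy whenever the content hash changes". This sketch assumes something else (a file watcher or a deploy hook, not shown) calls `record` after each edit. `ConfigHistory` is a hypothetical name. One design choice is worth noting: pruning keeps the newest version older than 90 days, because that version was still live inside the window and is the one a rollback would need.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ConfigVersion:
    taken_at: datetime
    operator: str
    sha256: str
    content: bytes


class ConfigHistory:
    """Versioned copies of one configuration file, pruned to a retention window."""

    def __init__(self, keep: timedelta = timedelta(days=90)) -> None:
        self.keep = keep
        self.versions: list[ConfigVersion] = []  # assumed to be appended in time order

    def record(self, content: bytes, operator: str, now: datetime) -> bool:
        """Store a new version if the content changed; return True if one was stored."""
        digest = hashlib.sha256(content).hexdigest()
        if self.versions and self.versions[-1].sha256 == digest:
            return False  # change event fired, but the bytes are identical
        self.versions.append(ConfigVersion(now, operator, digest, content))
        self._prune(now)
        return True

    def _prune(self, now: datetime) -> None:
        cutoff = now - self.keep
        older = [v for v in self.versions if v.taken_at < cutoff]
        recent = [v for v in self.versions if v.taken_at >= cutoff]
        # Keep the newest pre-cutoff version: it was still live inside the window.
        self.versions = older[-1:] + recent
```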
The suppression list backup deserves specific attention because it guards against the most consequential single restore failure mode. Every email operation accumulates a list of recipients who unsubscribed, bounced hard, or complained. Restoring to a backup from three days ago wipes any suppression events captured in those three days. Re-sending to those recipients violates:

- GDPR Article 7(3): consent can be withdrawn at any time.
- CAN-SPAM section 5: opt-outs must be honored within 10 business days and permanently.
- CASL consent withdrawal rules.

The penalty for individual incidents varies by jurisdiction, but the pattern is consistent: regulators treat post-restore re-sending as evidence of inadequate technical measures rather than as an accident.

The engagement handles this by running suppression backup as a continuous append-only log, with every unsubscribe, hard bounce, and complaint event captured immediately, with cryptographic ordering. The log is applied as the last step of any restore, overriding the restored database's suppression state with the current authoritative log. The result: operational state matches the restore target, but suppression state always reflects present-moment opt-outs.
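"Cryptographic ordering" can be implemented as a hash chain, where each entry includes the hash of the previous one. Here is a minimal sketch under that assumption. The field names, reason codes, and address normalization are illustrative, not a fixed format.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64
REASONS = {"unsubscribe", "hard_bounce", "complaint"}


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class SuppressionLog:
    """Append-only suppression log in which each entry commits to the one before it."""

    entries: list[dict] = field(default_factory=list)

    def append(self, ts: float, address: str, reason: str) -> dict:
        if reason not in REASONS:
            raise ValueError(f"unknown reason: {reason}")
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        # Lower-casing the whole address is a simplifying assumption.
        body = {"ts": ts, "address": address.strip().lower(), "reason": reason, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain from the start.

        Editing any entry, or deleting or reordering entries before the tail, breaks
        the chain; a truncated tail is only detectable against a head hash kept elsewhere.
        """
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("ts", "address", "reason", "prev")}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```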
The Acronis 2026 backup guide and the Medium PostgreSQL backup production-patterns guide both formalize the offsite requirement as the 3-2-1 rule: three copies of data, on two different media types, with one copy offsite from the primary location. The engagement implements this as:

- a primary backup local to the operation's primary region, with fast restore capability;
- a secondary backup in a different datacenter region, for geographic separation;
- tertiary long-term cold storage, for compliance retention.

The dual-region offsite defends against regional events (datacenter outage, regional cloud failure, jurisdiction-level regulatory event) that would destroy single-region backups. The cold tier provides defense against scenarios requiring data from months or years ago. All three storage tiers use AES-256-GCM encryption with keys stored separately from the backup data, so disk theft or unauthorized storage access does not yield readable backups.
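The 3-2-1 condition is simple enough to check mechanically against an inventory of backup copies. This sketch uses a hypothetical `BackupCopy` record. Whether the live database counts as one of the three copies is a policy choice; here only the listed backup copies are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str  # region or datacenter label (illustrative)
    medium: str    # e.g. "local-ssd", "object-storage", "cold-object-storage"
    offsite: bool  # True if the copy is outside the primary site


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )
```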
Database and filesystem captured as one coordinated backup, with background jobs quiesced during the 30-90 second backup window. Daily cadence. Template, list, and configuration state preserved consistently.
Event-triggered backup on every configuration change with operator attribution and timestamp. 90-day version history. Includes virtual-mta definitions, IP pool assignments, and recipient domain policies.
Append-only log capturing every unsubscribe, hard bounce, and complaint event in real time. 365-day retention. Applied as the final step of any restore so post-restore state honors all opt-outs through the present moment.
AES-256-GCM with a per-backup nonce; keys stored on separate infrastructure with their own access controls and audit logging. Disk theft or unauthorized storage access does not yield readable backups. Annual key rotation.
Three-tier retention (30/90/365 days) replicated across two geographic regions matched to your compliance posture. Default EU primary plus non-EU secondary; EU-only or non-EU-only configurations supported.
One full end-to-end restore drill into a separate test environment within 14 days of backup deployment. RTO and RPO measured against your specific stack. Written drill report covering procedure deviations and corrective actions.
Written restore runbook with step-by-step procedure, decision criteria, escalation contacts, and rollback steps. Encryption key access procedures, regional failover guidance, and validation checklist for post-restore verification.
If your subscriber base, campaign history, or list segmentation took months or years to build, losing it to a non-recoverable backup means rebuilding from zero. The engagement cost is small relative to the rebuild cost.
Operations that have experienced a real data incident (corrupted database, accidental delete, ransomware, ESP migration mishap) and need a tested DR posture before continuing operations. The drill validates that recovery actually works before relying on it again.
Operations preparing SOC 2 or ISO 27001 evidence packages need documented DR procedures, tested recovery, and offsite storage with documented retention. The engagement produces the artefacts auditors expect to see.
Operations launching new email infrastructure where DR should be in place at day zero rather than retrofitted later after the first incident. The engagement integrates with initial PowerMTA + MailWizz setup.
Financial services, healthcare, insurance, and other regulated industries where data loss has direct regulatory consequence. The dual-region offsite and 365-day cold tier match typical regulatory retention expectations.
Operations preparing for acquisition or investment due diligence where backup and DR posture appears on the diligence checklist. The engagement produces the documented evidence that diligence teams expect.
A one-time engagement deploying tested backup and restore for your email infrastructure stack. The deliverables cover four data classes. First: PostgreSQL database backup using point-in-time recovery (PITR) with continuous WAL archiving, allowing restoration to any second within the retention window rather than just to the last full backup point. Second: MailWizz application state including configuration, templates, lists, segments, and uploaded assets. Third: PowerMTA configuration, virtual MTA definitions, IP pool assignments, and persistent queue state. Fourth: suppression lists (hard bounces, complaints, unsubscribes) which must be preserved across any restore to avoid re-sending to recipients who already opted out. Encryption: AES-256-GCM with per-backup nonce, keys stored separately from backup data. Storage: dual-region offsite (one EU, one non-EU by default; configurable per customer compliance posture). Testing: one full restore drill into a separate environment with documented RTO and RPO measurement, included in the engagement. Documentation: written runbook with restore procedure, decision criteria, escalation contacts, and rollback steps. One-time engagement EUR 1,299, delivery in 10 business days.
Three structural reasons.

First: PostgreSQL has specific backup requirements that generic file-level snapshots do not satisfy. A naive filesystem snapshot of a running PostgreSQL data directory often produces an inconsistent backup that PostgreSQL cannot start from. The WAL, the on-disk pages, and the changes still held in shared buffers were not captured at a single consistent instant. Real PostgreSQL DR uses pg_basebackup or equivalent for the base backup, plus continuous WAL archiving for PITR.

Second: MailWizz has cross-component state that must be backed up together. The database holds list metadata and campaign state; the filesystem holds uploaded templates, images, and per-customer configuration. A snapshot that catches the database mid-write but the filesystem at a different point produces an inconsistent restore.

Third: suppression lists are operational and legal data that must survive any restore. If a customer unsubscribed yesterday and you restore to a backup from three days ago, you re-send to them and violate the GDPR Article 7 right of withdrawal or US CAN-SPAM section 5 opt-out requirements. Suppression lists need a separate continuous backup that always represents current state, applied during restore to override any list state from the base backup.
Point-in-time recovery (PITR) is PostgreSQL's mechanism for restoring the database to any specific moment within a retention window, rather than just to the timestamp of the last full backup. The mechanism:

1. A full base backup serves as the starting point.
2. Continuous WAL (Write-Ahead Log) archiving captures every transaction.
3. Restore replays WAL from the base backup forward to the target time.

For email infrastructure specifically, PITR matters because the typical disaster recovery scenario is not "the server caught fire at 3am". It is "at 14:32 today someone ran a query without a WHERE clause and flagged 200,000 subscribers as bounced when they were not". A full-backup-only DR strategy means restoring to 02:00 last night and losing 12 hours of legitimate signups, opens, clicks, and unsubscribes that happened between the backup and the mistake. PITR lets us restore to 14:31, immediately before the mistake, preserving 12 hours and 31 minutes of legitimate state.

The retention window for PITR is configurable (typically 7-30 days depending on storage budget). A recovery target older than the window falls back to the nearest earlier full backup in the longer-term retention tiers.
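For PostgreSQL 12 and later, targeted recovery is driven by a `recovery.signal` file in the data directory plus recovery settings in the server configuration. This sketch assumes WAL was archived with WAL-G, so `wal-g wal-fetch` retrieves it, and that the base backup has already been restored. The helper names are hypothetical. Where the returned lines are written depends on the installation's config layout.

```python
from datetime import datetime
from pathlib import Path


def recovery_settings(target: datetime) -> str:
    """postgresql.conf lines that replay archived WAL up to `target` (PostgreSQL 12+).

    Assumes WAL was archived with WAL-G, so WAL-G also fetches it back.
    """
    if target.tzinfo is None:
        raise ValueError("target must be timezone-aware")
    settings = {
        "restore_command": 'wal-g wal-fetch "%f" "%p"',  # %f = segment name, %p = destination path
        "recovery_target_time": target.isoformat(sep=" "),
        "recovery_target_inclusive": "off",  # stop just before commits at exactly the target time
        "recovery_target_action": "pause",   # inspect first; pg_wal_replay_resume() then ends recovery
    }
    return "".join(f"{key} = '{value}'\n" for key, value in settings.items())


def mark_for_recovery(pgdata: Path) -> None:
    """An empty recovery.signal file makes PostgreSQL start in targeted recovery mode."""
    (pgdata / "recovery.signal").touch()
```

Pausing at the target, rather than promoting immediately, leaves room to query the restored data and confirm that 14:31 really precedes the bad statement before the server leaves recovery.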
Suppression lists are backed up separately from the main database and applied as the last step of any restore. The mechanism:

1. A separate continuous backup process captures unsubscribe, hard bounce, and complaint events as they occur.
2. The events go into an append-only log that always represents current operational state. The log is stored separately from main database backups, with its own encryption and retention.
3. During a restore, after the main database is restored to the target point in time, the suppression log is replayed to apply any opt-outs that occurred after the restore point.

The result: the operational database state matches the restore target, but the suppression state reflects all opt-outs through the present moment, regardless of when they happened. This pattern prevents the most consequential restore failure mode: re-sending to recipients who opted out after the backup point. It is the technical means of honoring the GDPR Article 7 right of withdrawal and CAN-SPAM section 5 opt-out obligations across DR scenarios.
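Reduced to its core, the replay step is a set union, which is what makes it safe to apply last. This sketch uses hypothetical names and represents the log as `(timestamp, address)` pairs. Because replay only ever adds addresses, running it twice, or with an earlier cutoff than necessary, cannot un-suppress anyone.

```python
from typing import Iterable


def apply_suppressions(
    restored: set[str],
    events: Iterable[tuple[float, str]],
    restore_point: float,
) -> set[str]:
    """Return the suppression set to use after restoring to `restore_point`.

    `restored` is the suppression state inside the restored database; `events`
    are (timestamp, address) pairs from the append-only log. Events at or after
    the restore point are the ones the restored database cannot contain.
    """
    result = set(restored)  # leave the caller's set untouched
    for ts, address in events:
        if ts >= restore_point:
            result.add(address)
    return result
```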
Three-tier retention covers different recovery scenarios:

- Hot tier: continuous WAL archiving (segments shipped at least every 5 minutes) plus daily full PostgreSQL base backups, retained 30 days. Covers the common case of recovering from operational mistakes within the past month.
- Warm tier: weekly full backups retained 90 days. Covers regulatory inquiry scenarios where data from 1-3 months ago is requested.
- Cold tier: monthly full backups retained 365 days. Covers long-term retention for compliance frameworks that require 12-month retrievability.

Suppression list backups run continuously (append-only log) with retention matched to the longest tier (365 days). MailWizz files (templates, assets, configuration) back up daily with the same three-tier retention. PowerMTA configuration backs up on every change (event-triggered), with versioned history preserved for 90 days.
One full end-to-end restore drill is included in the engagement, conducted within 14 days of backup deployment. The drill restores backups into a separate test environment (not production), runs the full email infrastructure stack against the restored data, measures actual RTO (time from drill start to functional system) and RPO (data loss versus the drill start time), validates that suppression lists apply correctly post-restore, and produces a written drill report with measured outcomes. The drill matters because untested backups are theoretical insurance. Backups can complete successfully without errors and still fail to restore for reasons that only surface when you try: incompatible PostgreSQL major version between source and restore environment, missing extensions, file permission issues, encryption key access problems, application-level state expecting specific paths or hostnames. The drill catches these issues in a controlled environment rather than during a real incident.
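Both drill metrics are differences between three timestamps. This sketch uses hypothetical field names. On the PostgreSQL side, `pg_last_xact_replay_timestamp()` reports the commit time of the last transaction replayed during recovery, which is one way to obtain the "last recovered commit" value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrillTimings:
    drill_start: datetime            # simulated incident; the clock starts here
    system_functional: datetime      # restored stack accepts and delivers test mail
    last_recovered_commit: datetime  # newest transaction present after the restore

    @property
    def rto(self) -> timedelta:
        """Recovery time: from the simulated incident to a working system."""
        return self.system_functional - self.drill_start

    @property
    def rpo(self) -> timedelta:
        """Recovery point: how much recent data the restore could not bring back."""
        return self.drill_start - self.last_recovered_commit

    def meets(self, rto_target: timedelta, rpo_target: timedelta) -> bool:
        """True if both measured values are within the stated objectives."""
        return self.rto <= rto_target and self.rpo <= rpo_target
```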
Dual-region offsite storage, with locations chosen based on your compliance posture:

- Default: primary backup in your operation's primary region, secondary backup in a different region with a separate operator. The two regions provide geographic separation against regional disasters.
- GDPR-only data residency: both regions stay within the EU.
- Non-EU, non-US backup posture: both regions can be configured outside both jurisdictions (Panama primary, Singapore secondary).

Storage encryption is AES-256-GCM with a per-backup nonce. Encryption keys are stored separately from backup data, on infrastructure that cannot read the backup blobs directly. Keys are rotated annually.
Different scope and pricing model. Disaster Recovery Setup is a one-time engagement that builds the backup infrastructure, configures the three-tier retention, deploys encryption and offsite storage, runs the initial restore drill, and delivers documentation. Backup-as-a-Service is the ongoing recurring service that operates the backup infrastructure after setup: daily verification that backups completed successfully, monthly restore drills into the test environment, alert on backup failures, periodic key rotation, retention policy enforcement, restore assistance when needed. Many customers buy DR Setup once and then subscribe to BaaS for ongoing operations. Customers who prefer to operate backups themselves can buy DR Setup alone and skip BaaS; the runbook documents everything needed to run backups independently.
Multi-jurisdictional disaster recovery raises operational considerations that single-jurisdiction recovery does not. Running the primary and secondary sites in different jurisdictions brings both protective properties and operational complications that operators should understand before deployment.
The protective properties:

- Cross-jurisdiction failover survives legal-process events affecting either jurisdiction independently.
- Regulatory actions limited to one jurisdiction do not automatically affect the other.
- Infrastructure-level events (datacenter outages, regional network issues) affect at most one site.

These properties match the broader operational philosophy of multi-jurisdictional structure, which distributes risk across several jurisdictions rather than concentrating it in one.
The operational complications:

- Replicating data across jurisdictions can conflict with the data residency requirements some compliance frameworks impose.
- Latency between sites affects synchronous replication, forcing a tradeoff between recovery point objective and write performance.
- Regulatory coordination during a recovery event requires working within multiple regulatory frameworks at once.
Our disaster recovery setup includes the operational coordination across jurisdictions plus the technical infrastructure:

- a primary site in a customer-selected jurisdiction;
- a secondary site in a different jurisdiction, with appropriate replication;
- documented failover procedures covering both technical and legal coordination;
- regular failover testing to verify that procedures work as documented.
A disaster recovery setup that has not been tested is not actually disaster recovery; it is documentation of intent without verification of capability. Recovery procedures contain assumptions, and only testing reveals whether those assumptions hold. Operations relying on untested procedures discover during real incidents that specific assumptions did not hold, and an incident is the worst possible time to find out.
Our standard testing cadence: quarterly failover testing during scheduled windows, with documented test outcomes and identified issues. Annual full-recovery testing including data restoration from backups, with substantially expanded scope versus quarterly tests. The cadence balances ongoing validation against the operational cost of testing; more frequent testing produces marginal additional confidence at substantial operational cost.
Customer involvement during testing is modest but matters. The customer team should participate in at least the annual full-recovery test, to validate that customer-side procedures (DNS coordination, vendor communication, internal escalation) work as documented. Tests without the customer team verify only the provider-side procedures; the customer-side procedures remain untested until a real incident forces their execution.
Testing outcomes are documented in customer-accessible records covering what was tested, what worked as expected, what produced unexpected behavior, and what remediation is planned to close identified gaps. The documentation supports both operational improvement and the disaster recovery testing evidence that some compliance frameworks require.
Telegram conversation establishes current infrastructure stack (PostgreSQL version, MailWizz version, PowerMTA version), database size estimates, compliance posture for storage region selection, and any prior backup configuration to replace. Engagement begins within 3 business days of confirmation. Delivery in 10 business days including configuration, initial drill, and documentation. One-time EUR 1,299 fixed.
Median Telegram response: 12 minutes during operating hours