RESILIENCE BUNDLE | 2-3 REGIONS | DNS FAILOVER | QUARTERLY DRILLS

Email infrastructure that stays up when one datacenter does not.
Active-passive or active-active across regions, measured RTO and RPO, drilled quarterly.

On October 20, 2025, an automated DNS management failure in AWS US-EAST-1 took down a long list of major customer-facing services worldwide for several hours. On March 1, 2026, AWS UAE went partially offline after physical incidents at the datacenter forced workload migration. The pattern is clear: regional outages happen, they affect entire regions simultaneously, and operations without tested failover stay down until the upstream provider restores service. The most common misconception in disaster recovery planning is that Multi-AZ delivers multi-region resilience; it does not. Multi-AZ protects against datacenter failures within one region but shares regional control-plane dependencies that fail together during regional events.

Multi-Jurisdiction Redundancy Pack deploys your email infrastructure across two or three genuinely separate datacenter regions with independent IP pools, independent control planes, independent upstream networks, and independent regulatory jurisdictions. DNS health-check failover detects primary-region failure within 90 seconds (or 30 seconds with tightened settings) and shifts traffic to the secondary. Cross-region suppression sync propagates unsubscribes and hard bounces within 60 seconds. Bounce and complaint events aggregate across regions into a unified deliverability view. The bundle includes quarterly DR drills with measured RTO and RPO outcomes documented against your specific targets. EUR 2,999 setup, EUR 599 monthly. For operations where email downtime translates directly into lost revenue, missed compliance windows, or customer churn, the bundle pays for itself the first time it prevents an outage that would have lasted hours.

Setup EUR 2,999
Monthly EUR 599
Failover detection 90 sec default
DR drills Quarterly
rto/rpo dr pattern calculator

Calculate which DR pattern fits your downtime tolerance and revenue exposure.

The four standard DR patterns (Backup & Restore, Pilot Light, Warm Standby, Active-Active) trade cost against recovery time. The calculator below uses your inputs to recommend the pattern that fits your operational profile, with honest cost implications and the realistic RTO and RPO that pattern delivers.

six failure mode playbook

The six scenarios the bundle is engineered to handle.

Real-world failure modes from 2024-2026 incident archives. Each scenario carries its own detection signal, response procedure, and rollback path. The bundle includes a written playbook covering all six; the quarterly drill rotates through them.

why this exists

The cost of an outage versus the cost of redundancy.

The economics are stark. Hostperl reported in March 2026 that database failures cost on average EUR 5,600 per minute across industries; e-commerce sites during peak hours reach EUR 50,000 per minute. For operations where email is the backbone for transactional notifications, marketing campaign delivery, or customer support communication, an email infrastructure outage during a product launch, regulatory notification window, or peak commercial period produces revenue loss and compliance exposure that vastly exceeds the cost of redundancy infrastructure. Multi-Jurisdiction Redundancy Pack costs EUR 2,999 setup plus EUR 599 monthly: EUR 10,187 in the first year and EUR 7,188 in each year after that. Three years of the bundle cost EUR 24,563, which a single avoided three-hour outage covers at an impact of just under EUR 137 per minute, a small fraction of the averages above.
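The pricing arithmetic, as a short Python sketch. The function names are illustrative, and the per-minute impact is an input you choose: the EUR 5,600 figure is a cross-industry database average, not an email-specific measurement.

```python
# Break-even arithmetic for the bundle pricing (prices in EUR).
SETUP_EUR = 2_999
MONTHLY_EUR = 599


def bundle_cost(years: int) -> int:
    """Total spend over `years` years: one setup fee plus 12 monthly fees per year."""
    return SETUP_EUR + MONTHLY_EUR * 12 * years


def breakeven_rate_per_minute(outage_minutes: int, years: int) -> float:
    """Per-minute outage cost at which one avoided outage pays for `years` of the bundle."""
    return bundle_cost(years) / outage_minutes


def years_covered(outage_minutes: int, eur_per_minute: float) -> float:
    """Years of monthly fees one avoided outage pays for, after deducting the setup fee."""
    avoided_loss = outage_minutes * eur_per_minute
    return (avoided_loss - SETUP_EUR) / (MONTHLY_EUR * 12)


if __name__ == "__main__":
    print(bundle_cost(1))                               # 10187 (first year)
    print(round(breakeven_rate_per_minute(180, 3), 2))  # 136.46 EUR/min over 3 hours
    print(round(years_covered(180, 5_600), 1))          # years covered at the cited average
```

Plug in your own per-minute impact for `years_covered`; the break-even rate is the more conservative figure to reason from.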

The October 20, 2025 AWS US-EAST-1 outage is a recent large-scale reminder. An automated DNS management issue tied to DynamoDB regional endpoints required manual intervention to resolve and brought down customer-facing services across multiple industries simultaneously. Companies with tested multi-region failover continued operating from secondary regions; companies relying on Multi-AZ within US-EAST-1 stayed down until AWS restored the DNS infrastructure. The financial press covered the outage extensively because the impacted services included household names. The operational lesson received less coverage: the difference between a 15-minute failover and a 6-hour outage came down to whether the DR plan had been tested in advance.

For email specifically, several characteristics increase the value of cross-region redundancy beyond the general DR case. First, email reputation is path-dependent and slow to recover. A 6-hour outage where transactional emails fail or get retried aggressively can trigger ISP throttling, blacklist listings, or reputation damage that persists for weeks after the underlying infrastructure recovers. Cross-region redundancy keeps the sending pattern steady through regional incidents. Second, email deadlines are typically hard rather than soft. A delayed password reset is a security incident, a delayed order confirmation looks like a fraud signal, a delayed marketing send misses the window. The operational tolerance for queue-and-retry behaviour is lower than for many other workloads. Third, email jurisdictional positioning matters for compliance and customer trust independently of redundancy; if you are already running production in EU jurisdictions for GDPR positioning, deploying a second EU region extends both compliance posture and operational resilience with a single architectural decision.

The bundle is engineered around real operational mechanics rather than theoretical patterns. DNS health checks default to a 90-second detection window and can be tightened to 30 seconds for customers who need faster failover; the trade-off is health check sensitivity (false positives during normal network flutter) versus failover speed. The choice is documented per customer based on their actual observed network conditions. Suppression list sync runs every 60 seconds in async mode by default because email workloads rarely need stricter consistency; synchronous mode is available for customers requiring strict cross-region consistency on unsubscribes. The secondary region IPs maintain warm reputation through scheduled traffic, so failover does not add reputation degradation on top of the original outage; cold secondary IPs are the most common operational failure mode in untested DR plans.

The quarterly drill is the accountability layer that separates real redundancy from theoretical redundancy. Untested failover procedures are fiction rather than insurance. The drill schedule begins with a first drill within 60 days of bundle onboarding to validate that the deployed architecture meets the documented RTO and RPO targets before relying on it in a real incident. Subsequent drills rotate through the six failure modes documented in the playbook over an 18-month cycle, ensuring all scenarios receive tested coverage rather than only the most familiar ones. Drill reports document RTO achieved, RPO achieved, procedure deviations, and corrective actions. The reports are delivered to the customer within 10 business days and inform the next drill design. Customers who skip drills because real incidents have not happened recently are the customers most likely to fail their first real test; the drill cadence is non-negotiable in the bundle design.

eight components

What you receive, with the engineering rationale.

01

Two or three region selection

Datacenter regions chosen for your customer footprint, regulatory posture, and threat model. Typical pairings: two EU regions (RO+BG), or an EU primary plus a non-EU secondary (RO+PA, BG+HK). Three-region option for customers needing additional resilience or wider geographic coverage.

02

Independent IP pools per region

Separate IP allocations per region with independent reputation maintenance. A deliverability problem in one region (RBL listing, ISP throttling, reputation event) does not propagate to the other region. Secondary region IPs warm through scheduled traffic so they have established reputation at failover time.

03

DNS health-check failover

Default 90-second detection window using three consecutive health check failures at 30-second intervals. Tightenable to 30 seconds (three checks at 10-second intervals) for critical workloads. On failover the authoritative DNS records update within seconds, but recursive resolvers keep serving cached answers until the record TTL expires, so the published TTL determines when client traffic actually shifts.
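The detection arithmetic can be sketched in Python. `HealthCheckPolicy`, `FailoverDetector`, and `worst_case_shift_s` are hypothetical names, and the probe itself (what is checked, from where) is left out; the sketch only shows why three failures at 30-second intervals give a 90-second window and why the TTL adds to the time before clients move.

```python
from dataclasses import dataclass


@dataclass
class HealthCheckPolicy:
    interval_s: int         # seconds between probes
    failure_threshold: int  # consecutive failures required before failover

    @property
    def detection_window_s(self) -> int:
        # Approximate worst case from failure to decision: up to one interval
        # until the first failed probe, then (threshold - 1) further intervals.
        return self.interval_s * self.failure_threshold


DEFAULT = HealthCheckPolicy(interval_s=30, failure_threshold=3)    # 90 s
TIGHTENED = HealthCheckPolicy(interval_s=10, failure_threshold=3)  # 30 s


class FailoverDetector:
    """Counts consecutive failed probes; any successful probe resets the streak."""

    def __init__(self, policy: HealthCheckPolicy) -> None:
        self.policy = policy
        self.consecutive_failures = 0
        self.failed_over = False  # stays True once set (no automatic failback here)

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result; returns True once failover has been triggered."""
        if probe_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.policy.failure_threshold:
                self.failed_over = True
        return self.failed_over


def worst_case_shift_s(policy: HealthCheckPolicy, dns_ttl_s: int) -> int:
    """Detection window plus one full TTL, for a resolver that cached the old
    answer just before the update (ignores the seconds the update itself takes)."""
    return policy.detection_window_s + dns_ttl_s
```

With the default policy and a 60-second TTL, a caching resolver may not see the new answer for about 150 seconds, which is why RTO is stated in minutes while detection is stated in seconds.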

04

Cross-region suppression sync

60-second async synchronisation of unsubscribes, hard bounces, and complaints across all regions. Each region writes to a globally replicated store; each region polls the store every 60 seconds. Synchronous mode available for operations requiring strict cross-region consistency on suppression events.
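The write-then-poll loop described above, as a minimal Python sketch. `GlobalSuppressionStore`, `RegionCache`, and the sequence-number watermark are hypothetical stand-ins: the real store is replicated across regions, while this in-memory log only shows how each region applies new entries once per 60-second cycle.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SuppressionEvent:
    seq: int      # monotonically increasing position in the global log
    address: str
    reason: str   # "unsubscribe", "hard_bounce", or "complaint"


class GlobalSuppressionStore:
    """In-memory stand-in for the globally replicated store: an append-only log."""

    def __init__(self) -> None:
        self._log: list[SuppressionEvent] = []

    def append(self, address: str, reason: str) -> None:
        self._log.append(SuppressionEvent(len(self._log) + 1, address, reason))

    def since(self, seq: int) -> list[SuppressionEvent]:
        return [e for e in self._log if e.seq > seq]


@dataclass
class RegionCache:
    name: str
    last_seq: int = 0                               # watermark of the last applied event
    suppressed: set[str] = field(default_factory=set)

    def poll(self, store: GlobalSuppressionStore) -> int:
        """One sync cycle (every 60 s in async mode). Returns the number of events applied."""
        new_events = store.since(self.last_seq)
        for e in new_events:
            # Lower-casing whole addresses is a simplification for this sketch.
            self.suppressed.add(e.address.lower())
            self.last_seq = e.seq
        return len(new_events)

    def may_send(self, address: str) -> bool:
        # The send path reads only the local cache, so it can lag the
        # global store by up to one poll interval.
        return address.lower() not in self.suppressed
```

The watermark makes each poll incremental, so a region that misses a cycle simply applies a larger batch on the next one.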

05

Bounce and complaint aggregation

Eventual-consistency aggregation of bounce events and complaint feedback loop data across regions into a unified deliverability view. Customers see a single dashboard rather than per-region silos. Aggregation lag typically under 5 minutes for complaint events, under 1 minute for hard bounces.

06

Quarterly DR drill

Scheduled exercise every three months covering one failure mode per drill on an 18-month rotation through all six modes. Drill report documents RTO and RPO achieved against targets, procedure deviations, and corrective actions. First drill within 60 days of bundle onboarding.
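One way to generate that rotation, as a Python sketch. `FAILURE_MODES` mirrors the six playbook entries; treating the first drill as the start of the rotation is our assumption, since the text does not say whether the onboarding drill counts toward the cycle.

```python
from datetime import date

FAILURE_MODES = [
    "primary datacenter outage",
    "network partition between regions",
    "IP blacklist event in primary region",
    "DNS hijack",
    "regional regulatory event forcing emergency departure",
    "BGP hijack",
]


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to 28 so every month has it."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))


def drill_schedule(first_drill: date, count: int) -> list[tuple[date, str]]:
    """Quarterly drills, one failure mode each: six modes x 3 months = 18-month cycle."""
    return [
        (add_months(first_drill, 3 * i), FAILURE_MODES[i % len(FAILURE_MODES)])
        for i in range(count)
    ]
```

The seventh drill lands 18 months after the first and restarts the cycle with the first failure mode.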

07

Replication lag monitoring

Continuous monitoring of replication lag with alert thresholds tied to your RPO target. Alert fires when lag exceeds threshold for sustained window (default 60 seconds), giving the operations team early warning before lag becomes operationally visible. Alerting integrates with customer incident channels (Slack, Telegram, PagerDuty).
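The sustained-window rule can be sketched as follows. `LagAlert` is a hypothetical class fed with lag samples from whatever metric the replication layer exposes; the threshold would be derived from your RPO target, and the window defaults to the 60 seconds mentioned above.

```python
from typing import Optional


class LagAlert:
    """Fires only when replication lag stays above threshold for a full window."""

    def __init__(self, threshold_s: float, sustain_s: float = 60.0) -> None:
        self.threshold_s = threshold_s
        self.sustain_s = sustain_s
        self.breach_started: Optional[float] = None  # time of first sample in the current breach

    def observe(self, now_s: float, lag_s: float) -> bool:
        """Record one lag sample taken at `now_s`; returns True if the alert fires."""
        if lag_s <= self.threshold_s:
            self.breach_started = None  # any healthy sample resets the window
            return False
        if self.breach_started is None:
            self.breach_started = now_s
        return now_s - self.breach_started >= self.sustain_s
```

Requiring the breach to persist filters out short replication spikes, at the cost of alerting up to one window later than a single-sample threshold would.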

08

Incident playbook (6 modes)

Written runbook covering primary datacenter outage, network partition between regions, IP blacklist event in primary region, DNS hijack scenario, regional regulatory event forcing emergency departure, BGP hijack scenario. Each playbook entry includes detection signals, response steps, validation criteria, and rollback procedure.

dr pattern comparison

Four standard DR patterns with realistic cost and recovery figures.

The DR pattern that fits your operation depends on your RTO and RPO targets, not on what the marketing team prefers. We document the four standard patterns with realistic numbers below so you can map your requirements honestly. The bundle implements warm standby (active-passive) by default and active-active on request.

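| Pattern | RTO | RPO | Relative cost | Best fit |
|---|---|---|---|---|
| Backup & Restore | 4-24 hours | Up to 6 hours | 1x (baseline) | Non-critical workloads; analytics; archive |
| Pilot Light | Minutes to hours | Seconds to minutes | 1.3x | Tier 2 operations; cost-conscious DR |
| Warm Standby (active-passive) | Minutes | Seconds to minutes, depending on replication mode | Between Pilot Light and Active-Active | Most email workloads; bundle default |
| Active-Active | Seconds (traffic re-weights, no startup) | Seconds (or zero with sync replication) | 2x or higher | Transactional, security-critical |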

The bundle defaults to Warm Standby because it matches most email workload requirements at substantially lower cost than active-active. Active-active is available for operations where seconds of downtime carry measurable cost (transactional email handling payment confirmations, security-critical notifications, compliance communications with hard SLA penalties). Backup & Restore and Pilot Light fall outside the bundle scope; for those patterns, the standard ASH single-region products combined with backup-as-a-service provide appropriate coverage at lower cost.

when this fits

Operational profiles where redundancy pays for itself.

01

Transactional email at commercial scale

Operations sending order confirmations, password resets, billing notifications, security alerts. Each delayed message has measurable business cost and a meaningful share of recipients treat delivery delays as service failures. The bundle keeps transactional flow live through regional incidents.

02

ESPs and email-as-a-service operators

Customers building their own ESP product on ASH infrastructure inherit our redundancy posture for their own customers. The bundle supports multi-tenant operations with per-tenant deliverability isolation across regions and cross-region tenant suppression sync.

03

Financial services or regulated communications

Banks, insurance, fintech, and other regulated operations where email is part of the regulatory communication path. SLA-bound customer contracts often impose downtime penalties; the bundle keeps you on the right side of those penalties through regional incidents.

04

Multi-region customer base

Operations with customers across multiple geographic regions where latency matters for transactional email delivery. Active-active configuration delivers low latency to all customer regions while also providing redundancy as a side-effect of the architecture.

05

Compliance posture spanning jurisdictions

Operations needing data residency in specific jurisdictions for some traffic and operational independence from those jurisdictions for other traffic. The bundle supports per-tenant or per-list region pinning so different traffic categories route through different regions automatically.
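Region pinning can be pictured as a routing table, sketched here in Python. `PINS` and `route` are hypothetical, the region codes follow the pairings named earlier, and the rule that pinned traffic fails over only within its allowed set is our assumption about how residency constraints would interact with failover.

```python
# Hypothetical pinning table: traffic category -> regions it may use, in preference order.
PINS: dict[str, list[str]] = {
    "eu_residency": ["ro", "bg"],            # traffic that must stay in EU regions
    "global_marketing": ["ro", "pa", "hk"],  # no residency constraint
}


def route(category: str, healthy: set[str], pins: dict[str, list[str]] = PINS) -> str:
    """Return the first healthy region allowed for this traffic category.

    Pinned traffic never leaves its allowed set, even during failover; if no
    allowed region is healthy, the caller must queue instead of rerouting.
    """
    for region in pins[category]:
        if region in healthy:
            return region
    raise LookupError(f"no healthy region allowed for {category!r}")
```

Keeping the preference order in the table means failover behaviour per category is explicit and reviewable in the runbook rather than implied by global routing.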

06

High-stakes campaign windows

Operations with revenue-critical campaign windows (Black Friday, product launches, regulatory notification windows, event triggers). A regional outage during the campaign window has outsize cost; the bundle insures against that specific failure mode at modest annual expense.

questions before you order

Frequently asked.

What does Multi-Jurisdiction Redundancy Pack include?

Eight components engineered together for resilient cross-region email operations. First: two or three datacenter regions selected based on your customer footprint, regulatory posture, and threat model (typical pairings are two EU regions, or an EU primary plus a non-EU secondary). Second: independent IP pools per region with separate reputation maintenance, so a deliverability problem in one region does not propagate to the other. Third: DNS health-check failover with a default 90-second detection window (tightenable to 30 seconds for critical operations). Fourth: cross-region SMTP suppression list synchronisation every 60 seconds, so unsubscribes and hard bounces captured in one region are honoured in all regions within one minute. Fifth: bounce and complaint event aggregation across regions into a unified view. Sixth: quarterly disaster recovery drill with documented RTO and RPO achieved against the targets defined for your operation. Seventh: continuous replication lag monitoring with alert thresholds tied to your RPO target. Eighth: incident response playbook covering six standard failure modes with rollback procedures. Setup EUR 2,999 once, EUR 599 monthly recurring.

Why does email infrastructure need cross-region redundancy?

Real-world data from 2025-2026. On October 20, 2025, an AWS DNS resolution failure in DynamoDB regional endpoints triggered a worldwide business outage affecting major customer-facing services across multiple industries. In March 2026, AWS UAE went partially offline after physical incidents at the datacenter. The pattern is consistent: regional outages happen, they affect entire regions simultaneously, and operations without tested failover stay down until the upstream provider restores service. For email operations specifically, downtime translates directly into business impact. Hostperl reported in 2026 that database failures cost on average EUR 5,600 per minute across industries; e-commerce sites during peak hours reach EUR 50,000 per minute. Email is the backbone for transactional notifications (order confirmations, password resets, security alerts, billing), marketing campaign delivery (revenue events with hard deadlines), and customer support (ticket responses, account updates). An email infrastructure outage during a product launch, regulatory notification window, or peak commercial period creates revenue loss and compliance exposure that vastly exceeds the cost of a redundancy bundle.

What is the difference between active-passive and active-active configurations?

Two architectural patterns with different cost and recovery characteristics. Active-passive runs the primary region serving all production traffic and a secondary region maintained as a warm standby (infrastructure ready, data replicated, but not serving traffic). On failure detection, DNS health checks fail over to the secondary region within 90 seconds (or 30 seconds with tightened settings). RTO measured in minutes, RPO measured in seconds to minutes depending on replication mode. Cost is lower because secondary capacity does not need to match primary capacity at all times. Active-active runs both regions serving production traffic simultaneously with traffic split by latency or weight. On failure, the surviving region absorbs all traffic with RTO measured in seconds because no infrastructure needs to start up; only traffic routing adjusts. Cost is approximately double because both regions run full production capacity. We recommend active-passive for most email operations because the failover RTO is well within acceptable bounds for email workloads and the cost differential is material. Active-active is appropriate for transactional email handling payment confirmations, security-critical notifications, or compliance-driven communications where seconds of downtime have measurable business cost.
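How traffic re-weights on failure, as a Python sketch under our own assumptions: `effective_weights` is a hypothetical helper, and a zero weight stands for the warm standby in active-passive. DNS providers implement weighted and failover routing with their own rules, so treat this as an illustration of the two patterns rather than any provider's behaviour.

```python
def effective_weights(weights: dict[str, float], healthy: set[str]) -> dict[str, float]:
    """Renormalise configured traffic weights over regions passing health checks.

    Active-active: every region has a non-zero weight and survivors absorb the
    failed region's share. Active-passive: the standby has weight 0 and only
    receives traffic when no weighted region is healthy.
    """
    live = {r: w for r, w in weights.items() if r in healthy and w > 0}
    if not live:
        # Active-passive failover: all traffic goes to the healthy standby(s).
        standbys = [r for r in weights if r in healthy]
        if not standbys:
            raise RuntimeError("no healthy region")
        return {r: 1.0 / len(standbys) for r in standbys}
    total = sum(live.values())
    return {r: w / total for r, w in live.items()}
```

In both patterns the routing change itself is instant; the difference is whether the region receiving traffic was already serving production load.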

What replication mode does the bundle use?

Default is asynchronous replication with sub-minute lag targets. Synchronous replication (RPO zero) is available for operations requiring guaranteed zero data loss, but it carries a write latency penalty of 30-100ms depending on inter-region distance, which can affect throughput for high-volume SMTP submission. Asynchronous replication (RPO measured in seconds, low write latency) is the standard configuration for email workloads where a few seconds of replication lag is acceptable in exchange for full local write throughput. Eventual consistency (RPO in minutes, lowest write latency) is available for non-critical aggregation pipelines like deliverability analytics. We document which replication mode applies to which data class in your specific deployment: suppression lists use async replication with a 60-second sync interval; bounce and complaint aggregation for the unified deliverability view uses eventual consistency within 5 minutes; subscriber list changes use async replication with lag under 30 seconds. The choice of mode per data class is documented in the deployment runbook and reviewed during the quarterly DR drill.
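The per-data-class policy above can be captured as configuration. A minimal Python sketch: the class names and keys are hypothetical, the lag budgets are the defaults described in the text, and `within_rpo` is the kind of check a lag monitor or drill report might run.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SYNC = "synchronous"    # RPO zero, adds inter-region write latency
    ASYNC = "asynchronous"  # RPO in seconds, full local write throughput
    EVENTUAL = "eventual"   # RPO in minutes, lowest write latency


@dataclass(frozen=True)
class DataClassPolicy:
    mode: Mode
    max_lag_s: int  # lag budget in seconds (0 would mean synchronous)


# Defaults as described in the text; a real deployment documents its own values.
POLICIES = {
    "suppression_list": DataClassPolicy(Mode.ASYNC, 60),
    "subscriber_changes": DataClassPolicy(Mode.ASYNC, 30),
    "bounce_events": DataClassPolicy(Mode.EVENTUAL, 300),
}


def within_rpo(data_class: str, observed_lag_s: float) -> bool:
    """True if observed replication lag fits the data class's lag budget."""
    return observed_lag_s <= POLICIES[data_class].max_lag_s
```

Keeping mode and budget together per data class makes the drill's RPO check a lookup rather than a judgement call.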

How are quarterly DR drills conducted?

Scheduled exercise with documented procedures and measured outcomes. The drill cycle runs every three months. Each drill targets one failure mode from the six covered in the playbook (primary datacenter outage, network partition between regions, IP blacklist event in primary region, DNS hijack scenario, regional regulatory event forcing emergency departure, BGP hijack). The drill begins with notification to the customer 5 business days before execution, runs during a low-traffic maintenance window agreed in advance, executes the failover procedure end-to-end with measured timing at each step, validates the secondary region handles real production traffic successfully, runs failback to primary after validation, and produces a written report covering RTO achieved (measured), RPO achieved (measured), any procedure deviations, any rollback rough edges identified, and corrective actions for the next drill. The report is delivered within 10 business days of drill completion. The first drill happens within 60 days of bundle onboarding to validate that the deployed architecture meets the documented targets before relying on it in a real incident.

What does "Multi-AZ is not multi-region" mean, and why does it matter?

The most common misconception in disaster recovery planning. Multi-AZ (multiple availability zones within one region) protects against single-datacenter failures and is the default high-availability pattern on cloud platforms. It does not protect against region-wide outages because all availability zones in a region share underlying control-plane dependencies, regional DNS infrastructure, and sometimes physical infrastructure or upstream connectivity. The October 2025 AWS US-EAST-1 outage affected all availability zones in that region simultaneously because the failure was in regional DynamoDB DNS infrastructure. Operations that thought they had high availability through Multi-AZ discovered during the outage that they had high availability against datacenter failures but not against regional failures. Multi-jurisdiction redundancy means deploying production capability in genuinely separate regions with separate upstream providers, separate control planes, separate physical infrastructure, and separate regulatory jurisdictions. The bundle deploys to ASH datacenter regions that are operationally and physically independent: an outage in Romania does not affect Bulgaria, Panama, Hong Kong, or Singapore. That independence is what genuine cross-region redundancy delivers.

How does the suppression list sync work across regions?

Time-sensitive concern with a sub-minute mechanism. When a recipient unsubscribes through any region, that suppression must propagate to all regions before any region attempts to send to that recipient again. The standard implementation uses a centralised suppression authority with sub-minute synchronisation: each region writes suppression events to a globally replicated store, each region polls the store every 60 seconds for new entries, and the SMTP send path consults the local suppression cache before every send. If the primary region fails and DNS failover shifts SMTP submission to the secondary region, the secondary starts with a suppression cache that may be up to 60 seconds stale; the next sync cycle (within 60 seconds of failover completion) catches up. For operations requiring stricter guarantees (paid newsletter operators, regulated communications), we offer a synchronous suppression mode that confirms the write to all regions before acknowledging an unsubscribe to the user; this trades a few hundred milliseconds of unsubscribe latency for strict cross-region consistency. The choice is documented per customer in the onboarding runbook.
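The difference between the two acknowledgement modes, as a Python sketch. `unsubscribe` and the writer callables are hypothetical; each writer stands for one region's durable write and returns True on success. Writes run sequentially here for clarity; a real implementation would likely issue them in parallel, so latency tracks the slowest region.

```python
from typing import Callable

Writer = Callable[[str], bool]  # returns True once the region durably stored the entry


def unsubscribe(address: str, local_region: str,
                writers: dict[str, Writer], synchronous: bool) -> str:
    """Record an unsubscribe and decide what to tell the user.

    Async mode: only the local region's write must succeed; replication and
    the 60-second poll carry it to the other regions.
    Sync mode: every region must confirm before the user sees a confirmation.
    """
    if not synchronous:
        return "confirmed" if writers[local_region](address) else "retry"
    # all() stops at the first failed region; the user is asked to retry.
    ok = all(write(address) for write in writers.values())
    return "confirmed" if ok else "retry"
```

Async mode keeps unsubscribes fast and available during a partition; sync mode refuses to confirm until consistency is guaranteed, which is the trade the text describes.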

What about IP reputation in a secondary region that rarely sends?

Independent and continuously maintained. The most common operational mistake in cross-region email DR is letting the secondary region IPs sit idle until failover, at which point they have no reputation and the failover triggers deliverability problems on top of the original outage. The bundle prevents this by maintaining secondary region IPs as an active sending pool throughout normal operations. Configurations vary: some customers split sending 70/30 between primary and secondary continuously, some send specific traffic categories (transactional, lower-volume) from the secondary, some run scheduled reputation maintenance sends. Whichever pattern fits, the secondary IPs maintain warm reputation with mainstream mailbox providers at all times. When failover triggers, the secondary IPs absorb the full traffic load with established reputation rather than starting from zero. We document the specific reputation maintenance pattern in your deployment runbook and review it during the quarterly DR drill.
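A continuous 70/30 split can be sketched with smooth weighted round robin, the technique nginx uses for weighted upstreams, swapped in here as one reasonable choice. Pool names are hypothetical; the point is that the secondary pool sends a steady, interleaved share of traffic rather than occasional bursts.

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round robin: each pool is picked exactly `weight` times
    per cycle of sum(weights) picks, interleaved rather than bunched."""

    def __init__(self, weights: dict[str, int]) -> None:
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every pool gains its weight, the leader is chosen, and the leader
        # pays back the total so others catch up on later picks.
        for name, weight in self.weights.items():
            self.current[name] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= self.total
        return best
```

At failover, rebuilding the picker with only the surviving pool, for example `SmoothWeightedRoundRobin({"secondary": 1})`, sends all traffic through IPs that have been sending all along.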

Order Multi-Jurisdiction Redundancy Pack.

Telegram conversation establishes primary region, target regions, RTO and RPO targets, traffic split between regions, and replication mode per data class. Topology deployed within 10 business days. First DR drill within 60 days of deployment to validate the architecture before relying on it in a real incident. Quarterly drill cadence thereafter. Cancel anytime; no minimum term.

# Median Telegram response: 12 minutes during operating hours