B2B agency cold outreach

the problem

Domain burnout pace exceeded warmup pipeline capacity.

The agency had built their delivery model around a familiar pattern: register sending domains per client (mybrand-outreach.com, mybrand-sales.com, variations of each client's primary brand), warm those domains for 4-6 weeks, run cold outreach campaigns until deliverability degraded, then register fresh domains and repeat. The pattern worked through 2022 and most of 2023. By Q1 2024 the cycle was tightening visibly. Domains that previously lasted 16-20 weeks were now hitting reputation issues at 8-10 weeks. By summer 2024, three of their 11 clients had simultaneously exhausted current sending domains. Their warmup pipeline, which could handle approximately two new domains per week, was already running at capacity managing the previous quarter's burnouts.

The cause was multi-factorial and not specific to the agency. Receiver-side detection of cold outreach sending patterns had tightened materially through 2023 and 2024. Gmail's spam classifier became more attentive to newly-warmed domains sending at the volume profile typical of cold outreach: 50 daily per mailbox, low engagement rates, similar template content across recipients. Outlook's heuristics followed similar tightening, and at the corporate-recipient end, anti-spam gateways like Proofpoint and Mimecast had started weighting sending-pattern uniformity as a signal independent of content or authentication. Domains that previously could sustain cold outreach for months were now flagged for reduced inbox placement within weeks of reaching production volume.

The agency had tried multiple mitigations. They lowered per-mailbox volume from 50 to 30 daily. They pushed more aggressive list segmentation. They invested in content variation tooling, generating five copy variants per sequence and randomising delivery. They extended warmup periods from 4 weeks to 6 weeks. Each adjustment helped marginally; none solved the underlying acceleration of domain burnout. Their CTO reached out to evaluate whether subdomain rotation architecture, which they had heard about in agency forums but never operationalised, could meaningfully extend domain longevity. Specifically, they wanted to know whether rotation was a genuine structural fix or just a way to redistribute the same problem across more domains.

The structural question was the right framing. Rotation done poorly is exactly what they feared: the same problem spread across more pools. Rotation done correctly is structural because it changes the volume signature that receivers see per-pool, and per-pool signature is what triggers the burnout curve. The distinction between those two outcomes turned out to be the entire engagement.

2026 context

What changed in cold outreach economics between 2024 and 2026.

This case is from 2024 and the framework deployed at the time has continued working. The numbers underneath, however, have shifted enough that any agency reading this should understand where the pressure is now. Four concrete changes matter for the cold outreach economy in 2026.

Inbox crowding has gone past the saturation point. The Sopro 2026 outreach report puts the average B2B buyer at over 120 sales-related emails per week, which is roughly 25 per business day. A typical mid-market decision maker now starts every morning with a queue of cold outreach competing for the same five seconds of attention. Average reply rates have dropped to the 1-5% band, down from approximately 7% two years ago. Top-performing programmes still hit 15-25% reply rates, but the gap between the median and the top quartile has widened: average campaigns are getting worse while disciplined campaigns get slightly better. The 2024 agency was already feeling this pressure; the 2026 baseline is materially harder.

Authentication enforcement has gone categorical. Gmail moved from soft enforcement to outright SMTP-level rejection in November 2025 for senders above 5,000 emails per day to personal Gmail accounts. Microsoft completed the same transition by April 30, 2026, returning 550 5.7.515 rejection codes on non-compliant bulk mail to consumer Outlook properties. The threshold that triggers bulk-sender rules is 5,000 per day to personal accounts; the agency in this case sat well under that threshold per client domain, but the agency's aggregate volume across all clients exceeded it. That distinction matters when running rotation pools, because the receiver does not necessarily attribute the volume to per-client domains. It may instead see one operator behind a coordinated set of sending entities and apply the bulk-sender envelope accordingly.

IP geography has become a stronger signal than it was two years ago. Industry case data from a content marketing agency managing outreach for a German B2B software company found that US-IP Google Workspace inboxes were hitting only 62% primary-tab placement with German and French corporate recipients. Switching to inboxes with dedicated EU IPs moved that number to 91% without changing copy, sequences, or targeting. The IP geography match was the entire fix. For any agency running cross-border outreach, IP-recipient geography alignment is now one of the cheapest single variables to optimise, and the receiver-side weighting on it appears to have increased through 2025 as anti-spam gateways incorporated more refined geographic-plausibility signals.

Mailbox rotation thresholds are well-documented now in a way they were not in 2024. The Laxis 2026 cold outreach playbook, Hypergen's 2026 cold email guide, and multiple agency operational reports converge on the same numbers: safe daily volume per mailbox is 20-50, scaling beyond a single mailbox requires rotation across 3-5 mailboxes per domain, and a total envelope of 60-350 per domain per day stays in the healthy range. Anything above that crosses into territory where the receiver starts treating the domain as a bulk sender regardless of how the volume is distributed across mailboxes. The architecture the agency in this case adopted in 2024 already conformed to those numbers; the framework has aged well because the underlying receiver behaviour has continued in the direction it was already moving.

audit findings

Volume concentration on single subdomain per client was the multiplier.

Our 5-day audit pulled signals from across the agency's 11 active client domains. Sending domains were registered per client with SPF, DKIM and DMARC all configured correctly. Warmup logs showed disciplined ramp from 10 daily per mailbox on day 1 to 50 daily per mailbox by week 4, then sustained 50 daily through week 8 before the first reputation deterioration signals appeared in Postmaster Tools. Authentication was not the cause; warmup discipline was not the cause. The structural issue was concentration.

Every client domain hosted between 3 and 5 mailboxes (sales, partnerships, business development, etc.), and every mailbox ran on the same root domain. Total volume per client domain across all mailboxes ran 150-250 daily during steady-state production. That number sits squarely in the bulk-sender envelope that 2024-era Gmail had begun treating with stricter heuristics, even though no individual mailbox was sending above the 50-daily safety threshold. The agency had calibrated to mailbox-level limits without recognising that the receiver views the domain as the reputation unit, not the mailbox.

The 2026 thresholds confirm this analysis. Mailbox-level limits of 20-50 daily are necessary but not sufficient. Domain-level limits matter independently, and the safe envelope per domain is 60-350 daily depending on the warmup history of that domain and the engagement rates of its mailboxes. The agency was sitting in the higher end of that envelope on every client domain, which means each domain was operating right at the edge of where receiver-side classification escalates. The eight-to-ten week burnout curve aligned cleanly with how long a domain at the high end of the envelope can sustain operation before classification tips against it.
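The gap between mailbox-level and domain-level limits can be made concrete with a short sketch. The thresholds below are the planning figures quoted in this case study (50 daily per mailbox, 60-350 daily per domain), not limits published by any receiver. The `DomainLoad` class and `assess` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Planning figures taken from the text; assumptions, not receiver-published limits.
MAILBOX_DAILY_MAX = 50
DOMAIN_ENVELOPE = (60, 350)


@dataclass
class DomainLoad:
    name: str                  # illustrative label only
    mailbox_daily: List[int]   # daily sends for each mailbox on the domain


def assess(load: DomainLoad) -> dict:
    """Check mailbox-level compliance and locate the domain total in the envelope."""
    low, high = DOMAIN_ENVELOPE
    total = sum(load.mailbox_daily)
    return {
        "mailboxes_within_limit": all(v <= MAILBOX_DAILY_MAX for v in load.mailbox_daily),
        "domain_daily": total,
        # 0.0 is the bottom of the envelope, 1.0 the top; values outside 0..1 fall outside it.
        "envelope_position": round((total - low) / (high - low), 2),
    }


before = DomainLoad("client root, pre-engagement", [50, 50, 50, 50, 50])
after = DomainLoad("one pool entity, post-engagement", [30])
print(assess(before))
print(assess(after))
```

In the first case every mailbox passes its own limit, yet the domain total sits roughly two thirds of the way up the envelope. The single-mailbox entity falls below the envelope floor.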

The other audit observation was IP geography. The agency provisioned mailboxes through a US-based Google Workspace reseller. Their client base included two German SaaS companies, three UK SaaS companies, and one Australian fintech. Inbox placement data segmented by recipient country showed primary-tab placement at corporate recipients in Germany averaging 58%, in the UK averaging 71%, in Australia averaging 64%. By contrast, US corporate recipients averaged 88%. The geographic dispersion was significant and the agency was not factoring it into their pool design. They had been treating mailbox geography as a cost variable rather than a deliverability variable.

The third finding was about pattern uniformity. All 38 mailboxes were configured with similar sending schedules (business hours US Pacific, Tuesday-Thursday weighted), similar follow-up cadence (day 0, day 3, day 7, day 14), similar signature blocks (same template across clients with name and company variations), and similar opening-line patterns (10 variant openings rotated across all sequences). From any single recipient's perspective the messages looked distinct enough. From an anti-spam gateway watching aggregate patterns across thousands of recipients, the coordinated behaviour was identifiable as a single operational entity. That signal compounds across burst windows and contributes to the burnout curve independent of any individual domain's volume.

remediation executed

15-subdomain rotation pool architecture per client.

The remediation plan addressed three layers: domain-level volume distribution, IP geography matching, and pattern decorrelation across mailboxes and clients. Each layer was deployed sequentially across a 90-day rollout, starting with two pilot clients before expanding to all 11.

Layer one was the subdomain rotation pool. For each client we configured the existing root sending domain plus 14 subdomains (s1 through s14, abstracted naming for operational clarity). Each subdomain received its own DKIM signing key. Because SPF records are not inherited by subdomains, each subdomain published its own SPF record that pulled in the root's policy through an include directive. DMARC policy was set to p=quarantine with reporting aggregation at the root domain level for visibility. Each subdomain carried a single dedicated mailbox sending 20-30 daily, which puts each subdomain well below the threshold where receivers begin tightening evaluation. Total volume per client across the 15 pools ran 300-450 daily, but distributed across 15 distinct reputation pools each operating at a quarter of the burnout-risk envelope. The arithmetic was straightforward: more pools at smaller volume rather than fewer pools at larger volume.
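As a rough illustration of the layer-one DNS layout, the sketch below generates the TXT records such a pool would need. Everything concrete in it is an assumption: the domain `mybrand-outreach.example`, the selector naming, the `~all` qualifier, and the reporting address are placeholders, and the root's own SPF and DKIM records are assumed to exist already. It relies on two standard behaviours: SPF is evaluated per hostname, so each subdomain needs its own record, and a subdomain without its own `_dmarc` record falls back to the organizational domain's DMARC policy.

```python
from typing import Dict

ROOT = "mybrand-outreach.example"              # hypothetical sending domain
ENTITIES = [f"s{i}" for i in range(1, 15)]     # s1..s14, as named in the text


def subdomain_records(sub: str, root: str = ROOT) -> Dict[str, str]:
    """TXT records for one pool subdomain (placeholder values)."""
    host = f"{sub}.{root}"
    selector = f"{sub}-2024"                   # hypothetical selector name
    return {
        # Each subdomain publishes its own SPF record that includes the root's.
        host: f"v=spf1 include:{root} ~all",
        # One DKIM key per subdomain; the key material is left as a placeholder.
        f"{selector}._domainkey.{host}": "v=DKIM1; k=rsa; p=<public key for this subdomain>",
    }


def build_zone(root: str = ROOT) -> Dict[str, str]:
    """One root-level DMARC record plus SPF and DKIM for every subdomain."""
    zone = {
        f"_dmarc.{root}": f"v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@{root}",
    }
    for sub in ENTITIES:
        zone.update(subdomain_records(sub, root))
    return zone


zone = build_zone()
for name, value in sorted(zone.items()):
    print(f"{name}\tTXT\t{value}")
```

Keeping DMARC at the root only is one way to get aggregate reports in a single place; publishing per-subdomain DMARC records would be an equally valid design if policies needed to differ.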

Layer two was IP geography. We provisioned EU-located Microsoft 365 mailboxes for the two German clients and the three UK clients, so that sending IP location sat closer to each recipient base. For the Australian fintech we provisioned APAC-based mailboxes. US-located mailboxes remained for the US clients. The cost increment was small (EU and APAC mailboxes carry roughly 15-20% premium over US-based equivalents), and the deliverability impact at corporate recipients was the largest single improvement across the entire engagement. German recipient primary-tab placement moved from 58% to 89% within the first warming cycle on the new infrastructure. UK recipients moved from 71% to 92%. Australian recipients moved from 64% to 86%. The IP geography fix on its own would have been worth the engagement even without the rest of the architecture changes.

Layer three was pattern decorrelation. We rebuilt the mailbox provisioning template so each new mailbox got randomised sending hours within a four-hour band (rather than uniform schedules), randomised follow-up cadence within a three-day variance band (rather than fixed day intervals), unique signature blocks per client (rather than templated variations), and per-client opening-line pools rather than the shared 10-variant pool. The agency accepted some operational overhead for this decorrelation: each client onboarding now took roughly an extra hour of configuration work, and reporting required client-specific dashboards rather than the shared template they had been using. The trade-off was worthwhile because the aggregate pattern that anti-spam gateways had been detecting was dissolved as a signal. Each client now looked like an independent sending entity rather than one of eleven coordinated cohorts behind the same operator.
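A provisioning template along these lines could be sketched as follows. This is not the agency's tooling: `mailbox_profile` is a hypothetical helper, the 07:00-13:00 range for the window start is an assumption, and the "three-day variance band" is interpreted here as shifting each follow-up by -1, 0, or +1 day around the original day 3, day 7, day 14 schedule.

```python
import random
from typing import Dict, List

BASE_CADENCE = [0, 3, 7, 14]   # the fixed follow-up days found in the audit


def mailbox_profile(mailbox_id: str, seed: str = "demo") -> Dict[str, object]:
    """Derive a stable, per-mailbox sending window and follow-up cadence."""
    # Seeding with a string keeps each mailbox's profile reproducible across runs.
    rng = random.Random(f"{seed}:{mailbox_id}")
    start_hour = rng.randint(7, 13)            # assumed range for the window start
    cadence: List[int] = [BASE_CADENCE[0]] + [
        day + rng.choice((-1, 0, 1)) for day in BASE_CADENCE[1:]
    ]
    return {
        "mailbox": mailbox_id,
        "window": (start_hour, start_hour + 4),   # four-hour sending band, local time
        "follow_up_days": cadence,
    }


profiles = [mailbox_profile(f"s{i}") for i in range(1, 15)]
for profile in profiles:
    print(profile)
```

Deriving the profile from the mailbox identifier means a mailbox keeps the same habits from week to week, which is arguably closer to how a human sender behaves than re-rolling the schedule on every send.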

The 90-day rollout sequence ran in this order: weeks 1-2 for pilot configuration on two clients, weeks 3-6 for subdomain warmup on the pilot clients, weeks 7-8 for production volume ramp on the pilots and pattern-decorrelation template development, weeks 9-12 for expansion to the remaining 9 clients in batches of three. Total mailbox count grew from 38 to 165 across the rollout, but with 70% of those new mailboxes running below 30 daily, the operational load per mailbox decreased even as the count increased. The warmup pipeline went from running at capacity managing burnouts to running with significant headroom because the new architecture produces fewer burnouts to manage.

technical deep-dive

How the 15-subdomain pool design absorbs receiver-side detection.

The arithmetic of the pool design is worth walking through explicitly because it explains why this architecture has continued working under tightening receiver conditions. Each client pool consists of 15 sending entities (the root domain plus 14 subdomains). Each entity hosts one mailbox sending 20-30 daily, with sending hours randomised within a four-hour band and follow-up cadence randomised within a three-day variance band. The aggregate per client across all 15 entities runs 300-450 daily. From a receiver perspective, this looks like 15 distinct senders with overlapping but non-uniform behavioural fingerprints, rather than one sender behind a coordinated rotation.

The structural reason this works is that receiver classification primarily operates at the per-pool reputation unit. Gmail evaluates DKIM signing identity, per-domain historical engagement, and per-domain volume relative to historical baseline. A pool sending 20-30 daily with the appropriate signing identity stays comfortably below the threshold where bulk-sender heuristics activate. Receiver systems can correlate across pools, and they do, but correlation requires a clear-enough signal to attribute aggregate volume to a single operator. Randomising the variables that produce that signal (timing, cadence, signature blocks, signing key rotation) dissolves the correlation enough that the receiver evaluation stays at the per-pool level where compliance is maintained.

The count of 15 entities was chosen deliberately, not because it is a round number. Smaller pools (8-10 entities) have less headroom against per-entity reputation deterioration: if two entities in an 8-pool fail simultaneously, 25% of capacity is lost and the remaining 6 entities have to absorb the volume, pushing them toward their thresholds. Larger pools (20-25 entities) add operational overhead in DNS management, DKIM key rotation, and warmup pipeline load without adding meaningful resilience. Fifteen entities provide three layers of buffer: a primary capacity layer of 10 entities running production, a hot-standby layer of 3 entities operating at low volume to maintain reputation without contributing to daily output, and a cold-standby layer of 2 entities pre-warmed but not actively sending, available for immediate substitution if a primary entity shows reputation deterioration.
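The failure arithmetic above can be checked with a few lines. The tier sizes come from the text; the 25-per-day figure is simply the midpoint of the 20-30 range and is an assumption, as are the function names.

```python
TIERS = {"primary": 10, "hot_standby": 3, "cold_standby": 2}


def capacity_lost(active: int, failed: int) -> float:
    """Fraction of sending capacity lost when `failed` of `active` entities drop out."""
    return failed / active


def load_after_failures(active: int, failed: int, daily_each: float, spares: int = 0) -> float:
    """Daily volume each surviving entity must carry to keep total output constant.

    `spares` are pre-warmed standby entities that can replace failed ones.
    """
    replaced = min(failed, spares)
    remaining = active - failed + replaced
    return active * daily_each / remaining


print(sum(TIERS.values()))                                   # 15
print(capacity_lost(8, 2))                                   # 0.25
print(round(load_after_failures(8, 2, 25), 1))               # 33.3
print(load_after_failures(TIERS["primary"], 2, 25,
                          spares=TIERS["cold_standby"]))     # 25.0
```

With no spares, two failures in an 8-entity pool push each survivor from 25 to about 33 sends per day. With two cold-standby entities available, the same two failures leave per-entity load unchanged.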

The rotation cadence within the pool is also deliberately asymmetric. Within any given week, primary entities rotate which days they carry peak volume. Hot-standby entities rotate into the primary role on a 6-week cycle, displacing the longest-active primary entities into the hot-standby role for rest periods. This rotation pattern is invisible to recipients because each entity continues sending throughout, but it produces meaningful reputation cycling for the receiver systems that watch entity-level engagement trends. Each entity gets periodic rest from peak volume, which compounds into longer per-entity lifespan and lower aggregate burnout rate across the pool.
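One way to model the 6-week swap between the primary and hot-standby tiers is with two queues, sketched below. Which entities start in which tier is an assumption made for the example, and the cold-standby tier is left out because it does not take part in the scheduled rotation described here.

```python
from collections import deque

# Assumed starting layout: primaries ordered longest-active first.
primary = deque(["root"] + [f"s{i}" for i in range(1, 10)])   # 10 entities
hot_standby = deque(["s10", "s11", "s12"])                     # 3 entities


def rotate(primary: deque, hot_standby: deque) -> None:
    """Swap the longest-active primaries with the current hot-standby entities."""
    n = len(hot_standby)
    rested = [primary.popleft() for _ in range(n)]   # longest-active primaries step down
    promoted = list(hot_standby)
    hot_standby.clear()
    hot_standby.extend(rested)
    primary.extend(promoted)                         # promoted entities join as newest


rotate(primary, hot_standby)   # run once per 6-week cycle
print(list(primary))
print(list(hot_standby))
```

After one cycle the three longest-serving primaries (here `root`, `s1`, `s2`) are resting at low volume, and the former standby entities sit at the back of the primary queue, so they will be the last to rotate out again.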

measured outcome

18-month results from the rotation pool architecture.

Domain burnout rate

~100% per quarter → ~12% annual

Burnout dropped to single replaced subdomains rather than full client domain cycles

CPM (cost per thousand sends)

~USD 4.20 → ~USD 2.60

38% drop driven by reduced warmup-cycle cost and fewer domain registrations

Reply rate (across all clients)

~2.1% (pre-engagement) → ~4.7% (post)

Recovered to industry-healthy 4-7% band; sustained through 2025 and 2026

Primary-tab placement (EU corporate)

58-71% → 89-92%

IP-geography matching alone accounted for ~85% of EU placement improvement

Mailbox count

38 mailboxes → 165 mailboxes

More mailboxes, less load per mailbox, lower failure rate per pool

Warmup pipeline utilisation

~100% (saturated) → ~35%

Headroom allowed taking on 4 new clients in 2025 without infrastructure expansion

lessons captured

What this case taught us about cold outreach infrastructure.

The single most operationally relevant lesson was that mailbox-level limits are necessary but not sufficient. The agency had calibrated everything they were doing to mailbox-level thresholds, which were correct as far as they went but missed the domain-level evaluation receivers were actually doing. Reframing the unit of analysis from mailbox to domain, and then from domain to operator-aggregate, is what made the rotation architecture viable. Operators running cold outreach today should think in three layers: mailbox volume, domain volume, and aggregate-operator volume. Each layer has independent thresholds and exceeding any of them triggers receiver-side classification regardless of compliance at the other two layers.

The second lesson was about IP geography. The agency had assumed geography was a cost variable to be optimised toward the cheapest provider, which in their case was US-based Google Workspace resellers. That assumption cost roughly 20 to 30 percentage points of primary-tab placement at corporate recipients outside the US. The correction was operationally trivial (provisioning EU and APAC mailboxes through different resellers) and the deliverability gain was the single largest improvement across the entire engagement. Any agency running cross-border outreach should treat IP geography as a primary deliverability variable, not a secondary cost variable. The 2026 industry data confirms this strongly enough that there is no longer any debate about it.

The third lesson was about pattern decorrelation across clients. Most agencies do not think of their book of clients as an aggregate sending entity, but anti-spam gateways do. Coordinated patterns across notionally independent client campaigns produce a signal that gets classified at the operator level, not the client level. The remediation here was to actively make each client look independent, which required spending operational effort to break the patterns that the agency's standard templates had introduced. The underlying point is that operational efficiency from templated client onboarding creates a deliverability cost that agencies typically do not measure. Some templating is unavoidable; deliberately decorrelating the operationally significant variables (timing, cadence, signatures) buys back most of the lost reputation at the cost of an extra hour per client onboarding.

The fourth lesson, which only became visible across the 18-month follow-up window, was about the durability of the rotation architecture under tightening receiver conditions. Through 2025 and into 2026, receiver-side classification continued tightening: Gmail's enforcement shift in November 2025, Microsoft's bulk-sender rules taking full effect by April 2026, the move from soft-enforcement to hard-rejection at SMTP level. Every one of these changes would have accelerated burnout on the agency's pre-engagement architecture. The 15-subdomain pool architecture absorbed every one of them without operational disruption. Per-pool volume stayed well below the new thresholds, per-pool reputation accrued independently, and aggregate-operator detection remained dissolved through the pattern decorrelation. The architecture designed to survive 2024 receiver conditions has continued working through 2026 receiver conditions because the design principles (distribute volume, match geography, decorrelate patterns) align with the direction receivers were already moving.

The final lesson is about reply rates as the early indicator. The agency's first signal that something was wrong was reply rate decline, not delivery failure. Reply rates dropped from their healthy 4-6% band into the 2-3% band over a three-month window before any visible delivery issue appeared. That reply-rate signal, in retrospect, was reporting primary-tab placement degradation that would have escalated to full delivery failure inside another four to six weeks if left uncorrected. Agencies should treat reply-rate trend more seriously than delivery-rate trend as a leading indicator, because by the time delivery rates degrade visibly, the reputation pool has already absorbed enough damage to require a 30-60 day rebuild even if the underlying cause is fixed immediately.
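A reply-rate trend check does not need to be sophisticated to be useful. The sketch below compares a recent window against an earlier baseline window and flags a relative drop. The window lengths, the 25% drop threshold, and the sample series are all illustrative assumptions, not the agency's data or a recommended setting.

```python
from statistics import mean
from typing import Sequence


def reply_rate_alert(
    weekly_rates: Sequence[float],
    baseline_weeks: int = 4,
    recent_weeks: int = 4,
    max_relative_drop: float = 0.25,
) -> bool:
    """Return True when the recent average has fallen too far below the baseline."""
    if len(weekly_rates) < baseline_weeks + recent_weeks:
        return False   # not enough history to compare
    baseline = mean(weekly_rates[:baseline_weeks])
    recent = mean(weekly_rates[-recent_weeks:])
    return recent < baseline * (1 - max_relative_drop)


# Illustrative weekly reply rates in percent (made up for the example).
declining = [5.0, 4.8, 5.1, 4.9, 4.5, 4.1, 3.8, 3.4, 3.0, 2.8, 2.6, 2.4]
stable = [5.0, 4.8, 5.1, 4.9, 5.0, 4.7, 5.2, 4.9]

print(reply_rate_alert(declining))   # True
print(reply_rate_alert(stable))      # False
```

Run weekly per client, a check like this would surface a slide from the healthy band toward the 2-3% band well before delivery metrics move.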

customer reflection

In their words.

"We thought we had a domain-rotation problem. What we actually had was a domain-concentration problem with a mailbox-rotation cover. Once we saw the aggregate volume per client domain, the eight-to-ten week burnout curve made obvious sense. We had been calibrating to the wrong unit of analysis the whole time."

"The IP geography fix was the biggest surprise. We had treated mailbox provisioning as a cost decision for years. Switching our German and UK clients to EU-based mailboxes improved their reply rates more than any copy iteration we have ever shipped. Looking back, we had been throwing away roughly a third of our potential reply volume from cross-border clients for the entire history of those accounts."

"Eighteen months in, the architecture is still holding. Gmail and Microsoft have tightened twice since we deployed this, and we have not had to restructure once. The architecture absorbed both enforcement waves cleanly because the per-pool numbers were already well below the new thresholds. That was unexpected; we built it for the conditions we had in 2024 and it has continued working for the conditions we have in 2026."
Anonymized customer, agency CTO

Agency name, client names, and exact reply rates are withheld at customer request. Volume figures, infrastructure decisions, and rollout timeline are reproducible and discussed openly with prospects under NDA. CPM and burnout-rate figures were customer-disclosed with permission to publish under the anonymisation conditions.

frequently asked

Common questions about subdomain rotation for cold outreach.

How many cold emails per day per mailbox is safe in 2026?

Safe daily volume per mailbox is 20-50 cold emails. To scale beyond a single mailbox, use rotation across 3-5 mailboxes per domain, which gives an effective ceiling of 60-350 emails per domain per day while keeping each individual mailbox in healthy territory. Sudden spikes above the established baseline are themselves a risk signal regardless of absolute volume: consistency matters more than ceiling. The 2026 industry guidance has converged on these numbers across multiple independent publications, which is a good signal that the receiver-side thresholds are stable enough to plan around.

How long does it take to warm up a new sending domain?

Plan for 4-6 weeks minimum. New domains have zero reputation and blasting 50 emails on day one is the fastest way to get flagged. Start at 10-20 per day to known contacts, increase volume gradually with a logarithmic ramp, and target 85% inbox placement across Gmail, Outlook and Yahoo before scaling. Industry data shows new domains face roughly a 30 percentage-point inbox placement penalty compared to established domains during the first warming weeks; that penalty does not lift until the domain has accumulated sufficient positive engagement signals to establish baseline trust.
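A logarithmic ramp can be generated in a few lines. The sketch below assumes a 28-day ramp from 10 to 50 sends per day, matching the warmup logs described in the audit; the exact curve shape is an assumption, and `warmup_schedule` is a hypothetical helper rather than a feature of any warmup tool.

```python
import math
from typing import List


def warmup_schedule(start: int = 10, target: int = 50, days: int = 28) -> List[int]:
    """Daily send counts rising from `start` to `target` along a log curve."""
    last = days - 1
    return [
        round(start + (target - start) * math.log1p(day) / math.log1p(last))
        for day in range(days)
    ]


schedule = warmup_schedule()
for week in range(4):
    print(f"week {week + 1}: {schedule[week * 7:(week + 1) * 7]}")
```

A log curve front-loads the increase and flattens as it nears the target. If that early rise is too steep for a domain with no history, stretching `days` toward the six-week end of the range or lowering `start` are the obvious adjustments.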

Why did this agency see reply rates decline before any visible delivery problem?

The 2026 average B2B buyer receives over 120 sales-related emails per week, roughly 25 per business day. Average cold outreach reply rates have dropped to the 1-5% band, down from approximately 7% two years ago. Receiver inbox-tab placement has tightened: even when delivery succeeds, mail landing in Promotions or Updates tabs draws less attention than primary-tab mail. Reply-rate decline is the early indicator of declining primary-tab placement, which precedes any hard delivery problem by weeks. Treating reply rate as a leading deliverability signal rather than just a copy or targeting signal is a structural improvement most agencies have not made.

Does using subdomain rotation hurt deliverability compared to a single sending domain?

Used correctly, no. Each subdomain accrues its own DKIM and engagement signal while inheriting the parent domain authentication and historical reputation. Rotation distributes volume across more reputation pools, so no single pool hits the threshold where receivers begin reducing placement. Used incorrectly, with weak per-subdomain DKIM, inconsistent SPF alignment, or rotation patterns that look automated and uniform, it can hurt because receivers can detect coordinated subdomain bursts as a single sender behaving unnaturally. The difference between correct and incorrect implementation is the entire engagement value of this case study.

Should cold outreach use the same domain as transactional or marketing email?

No. Cold outreach should always run on dedicated sending domains separate from the primary brand domain. Reputation damage from cold campaigns can contaminate transactional mail like password resets and order confirmations, which then fail to reach customers when they need them most. The standard 2026 architecture uses lookalike or descriptive variations of the brand domain for cold outreach exclusively, with the primary brand domain reserved for customer communication and opt-in mail. An agency that proposes sending cold outreach from a client's primary brand domain should be treated as a red flag.

Why does sending IP geography affect inbox placement at corporate recipients?

Corporate spam filters and receiver heuristics weight geographic plausibility. Mail from US IPs to German corporate recipients is statistically more likely to be cold outreach or spam than mail from EU IPs to the same recipients, and filters treat it accordingly. Industry case data shows that switching from US-IP infrastructure to EU-IP infrastructure for outreach to German and French corporate inboxes moved primary-tab placement from 62% to 91% without changing copy or targeting. IP geography is the cheapest single variable to fix for cross-border outreach and one of the most under-optimised across the industry.

What happens if the agency is sending under 5,000 per day per client domain but over it aggregated across clients?

The 5,000-per-day threshold for Gmail and Microsoft bulk-sender rules nominally applies per sending domain, but anti-spam gateways and receiver heuristics can identify operator-level aggregation through pattern correlation across notionally independent domains. Coordinated timing, identical signatures, shared signing infrastructure, and similar follow-up cadence all produce signals that get classified at the operator level rather than the per-domain level. The practical guidance is to dissolve those correlation signals through deliberate decorrelation, which keeps the receiver evaluation contained at the per-domain or per-subdomain level where compliance is actually being maintained.
