Most of what gets written about cold outreach infrastructure is written by ESPs trying to sell SMTP relays, warmup vendors trying to sell warmup pools, or affiliate marketers trying to sell whichever tool kicks them a commission. The honest operational requirements for an agency running cold outreach across multiple clients are different from any of those three perspectives. I want to walk through what those requirements actually look like.

We have been onboarding cold outreach agencies for the last two years. The pattern of what they ask for versus what they need is consistent enough that it is worth documenting. The mismatch is large. Most agencies arrive with a setup that is either over-built in places that do not matter or under-built in places that will kill their delivery within three months. This is not a criticism of the agencies. The information available to them is filtered through vendor incentives, and the vendors are not selling honest infrastructure design.

The work the agency is actually doing

Before talking about infrastructure, I want to describe the work, because the work shapes everything that follows.

A typical cold outreach agency we serve manages between six and twenty active client campaigns at any time. Each client campaign has a target audience of usually 5,000 to 50,000 prospects, sourced from one or more of: Sales Navigator scraping, Apollo-style databases, ZoomInfo lists, BuiltWith firmographic filtering, or hand-built lists from the client’s own research. The agency writes sequences of usually four to seven touches, spread over two to six weeks. The touches are sent from mailboxes the agency controls, branded to look like they are coming from the client’s employees but actually living in the agency’s infrastructure.

The work happens at three layers. The data layer is list sourcing, enrichment, verification, and segmentation. The sending layer is the actual SMTP infrastructure: mailboxes, sending IPs, warmup, throttling, bounce processing, reply routing. The campaign layer is the sequences, the personalization, the scheduling, the response handling. Most agencies treat the data layer and the campaign layer as their core competence. They treat the sending layer as plumbing that should “just work.”

The mismatch with reality is in that last assumption. The sending layer is not plumbing. It is the layer with the highest operational complexity and the largest performance variance. A bad sending setup will degrade campaign performance by 60% or more compared to a good one. A good sending setup gives the agency leverage that no amount of better copywriting or smarter targeting can replicate.

What agencies typically arrive with

The setup we see most often when an agency comes to us is some combination of the following. They have a primary domain for their agency identity and 15-40 secondary domains for client campaigns. Each secondary domain has 3-10 mailboxes provisioned through a workspace email provider. The workspace provider is usually Google Workspace, sometimes Microsoft 365, occasionally Zoho or Fastmail. The mailboxes are connected to a cold outreach sequencer like Instantly, Smartlead, Lemlist, Quickmail, or one of the smaller competitors.

Warmup is happening through one of three mechanisms. The first is the sequencer’s built-in warmup feature, which simulates inbox engagement by exchanging mail with other accounts in the same pool. The second is a third-party warmup service like Warmup Inbox or Mailwarm, which does the same thing across a larger pool. The third, less commonly, is a manual setup the agency built itself.

The DNS setup is usually correct on SPF (because the workspace provider sets it up automatically) and incorrect on DKIM and DMARC. DKIM is often the workspace provider’s default rather than a custom-aligned key. DMARC, if it exists at all, is at p=none with no aggregate reporting going anywhere useful. Reverse DNS on the sending IPs is set to the workspace provider’s default, not aligned with the sending domain.

The reply handling is split. Replies that come back from prospects route to either the sequencer (which often parses them poorly) or a shared inbox the agency monitors manually. Out-of-office responses and bounces are typically handled by the sequencer through some heuristic that decides whether to retry, skip, or remove the recipient.

This setup is what gets sold to agencies as “the modern cold outreach stack.” It works at low volume. It starts to fail at medium volume. It actively works against the agency at high volume.

Where this setup fails

The failures fall into four categories that compound.

The first failure is reputation contamination through shared infrastructure. When the agency uses Google Workspace mailboxes for sending, the mailboxes share Google’s outbound IP pool with millions of other senders. The sending IP is not under the agency’s control. The IP reputation is a property of the entire pool. If a poorly-behaved sender on the same IP block burns reputation, the agency’s mail starts going to spam without any change in the agency’s behavior. The agency has no levers to pull because they do not control the IPs.

The second failure is warmup that does not warm up. The sequencer’s built-in warmup and the third-party warmup services each rely on a closed pool of accounts that recognize each other, so the warmup mail is delivered between accounts that are all sending warmup mail. The receiving mailbox providers (Gmail, Microsoft, Yahoo, Apple) increasingly recognize this pattern and discount the engagement signals from warmup pools. The result is that the warmup produces inbox-placement metrics inside the pool that do not translate to inbox placement with the real audience.

The third failure is alignment. When the workspace provider signs with its own default key rather than a key published under the agency’s domain, the DKIM signing domain (the d= value in the signature) often does not match the From domain, so the signature does not count toward DMARC alignment, least of all in strict mode. Mail passes DKIM in isolation but fails DMARC’s DKIM alignment check, which receiver reputation systems weight against the sender. The agency has DMARC at p=none so the mail is still delivered, but the alignment failure is logged and influences future delivery decisions invisibly.
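To make the alignment distinction concrete, here is a minimal Python sketch of the DKIM alignment comparison that DMARC performs (RFC 7489): strict mode requires the d= domain to equal the From domain exactly, while relaxed mode only requires the same organizational domain. The function names and example domains are illustrative assumptions, and the organizational-domain helper is a deliberate shortcut; a real checker would consult the Public Suffix List.

```python
def organizational_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels.

    Assumption: good enough for illustration only. A real implementation
    would use the Public Suffix List, because this shortcut is wrong for
    suffixes such as co.uk.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def dkim_aligned(from_domain: str, dkim_d: str, strict: bool = False) -> bool:
    """Return True if the DKIM d= domain aligns with the From domain."""
    f = from_domain.lower().rstrip(".")
    d = dkim_d.lower().rstrip(".")
    if strict:
        # Strict alignment: the two domains must match exactly.
        return f == d
    # Relaxed alignment: the organizational domains must match.
    return organizational_domain(f) == organizational_domain(d)


# Signed with a key under the sending domain: aligned even in strict mode.
print(dkim_aligned("clientco-mail.com", "clientco-mail.com", strict=True))  # True
# Signed with a provider-owned domain (hypothetical name): not aligned at all.
print(dkim_aligned("clientco-mail.com", "provider-default.example"))        # False
# Signed with the parent domain: aligned in relaxed mode only.
print(dkim_aligned("mail.clientco-mail.com", "clientco-mail.com"))          # True
print(dkim_aligned("mail.clientco-mail.com", "clientco-mail.com", strict=True))  # False
```

The second case is the one described above: the signature itself verifies, but it contributes nothing to DMARC because the signing domain belongs to someone else.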

The fourth failure is volume management. Cold outreach sequencers usually let the agency configure “max emails per mailbox per day” as a static number. The number is set conservatively, usually 30-50 per mailbox per day, on the theory that low volume per mailbox stays under provider rate limits. The number is not adjusted dynamically based on what each individual mailbox can actually handle. The result is that some mailboxes are over-sending and being throttled silently by Google or Microsoft, while others are under-sending and not building reputation as fast as they could.

What the infrastructure actually needs to look like

The setup that scales properly for cold outreach has different properties.

The sending IPs need to be dedicated to the agency and ideally segmented by client campaign. This means moving off Google Workspace as the sending mechanism while keeping it as the mailbox receiving mechanism if the agency wants to. Mail is sent via a dedicated SMTP relay with IPs the agency controls. Replies come back to the workspace mailbox as before. The cost is that the agency has to manage SMTP infrastructure or pay someone to manage it. The benefit is that reputation is a property of the agency’s own IPs and not of a shared pool.

The DKIM signing needs to use a key under the actual sending domain rather than a delegated key from the workspace provider. This means publishing your own DKIM record at selector._domainkey.yourdomain.com and configuring the SMTP relay to sign with the matching private key. DMARC alignment then works in strict mode. The mail passes DMARC properly rather than skating through under a relaxed alignment policy that the receivers downgrade.
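As a rough illustration of the records involved, the sketch below builds the DNS name for a DKIM key and a DMARC TXT record that requests strict DKIM alignment (adkim=s). The tag names (v, k, p, adkim, rua) come from the DKIM and DMARC specifications; the selector, domain, and reporting address are hypothetical, and the public key is left as a placeholder for whatever key pair you generate.

```python
from typing import Optional, Tuple

VALID_POLICIES = {"none", "quarantine", "reject"}


def dkim_record_name(selector: str, domain: str) -> str:
    """DNS name where the DKIM public key is published."""
    return f"{selector}._domainkey.{domain}"


def dkim_record_value(public_key_b64: str) -> str:
    """TXT value for an RSA DKIM key; the caller supplies the key material."""
    return f"v=DKIM1; k=rsa; p={public_key_b64}"


def dmarc_record(domain: str, policy: str = "none",
                 rua: Optional[str] = None,
                 strict_dkim: bool = True) -> Tuple[str, str]:
    """Return (record name, TXT value) for a DMARC policy record."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown DMARC policy: {policy!r}")
    parts = ["v=DMARC1", f"p={policy}", f"adkim={'s' if strict_dkim else 'r'}"]
    if rua:
        # Aggregate reports go to this address (hypothetical in the example).
        parts.append(f"rua=mailto:{rua}")
    return f"_dmarc.{domain}", "; ".join(parts)


# Hypothetical selector and domain, for illustration only.
print(dkim_record_name("s1", "clientco-mail.com"))
print(dkim_record_value("<base64 public key>"))
print(dmarc_record("clientco-mail.com", rua="dmarc-reports@clientco-mail.com"))
```

Starting at p=none with a working rua address is a common first step, because the aggregate reports show whether alignment is passing before any stricter policy is applied.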

The warmup needs to look like real engagement rather than warmup pool exchanges. The way we structure this is to seed the agency’s audience with a controlled set of accounts that read, reply, and forward messages in patterns that match real user behavior. The seed accounts are not part of any warmup pool and do not appear in any third-party service’s footprint. The cost is operational complexity and the cost of maintaining the seed set. The benefit is that the engagement signals are accepted by the receivers as real.

The reply handling needs to do three things the typical sequencer setup does not. First, identify bounce notifications and feed them back into list hygiene. Second, identify auto-responder replies and not count them as engagement. Third, identify positive replies and route them to a human within hours rather than days. The third point is where most agencies leave money on the table. A reply that takes three days to get a follow-up converts at roughly one-quarter the rate of a reply that gets a follow-up within four hours. The fix is workflow, not infrastructure, but the workflow is enabled by the infrastructure.
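The first two of those three steps can often be handled from message headers alone. The sketch below uses Python’s standard email package to separate delivery status notifications (multipart/report with report-type=delivery-status, per RFC 3464) and auto-responders (the Auto-Submitted header from RFC 3834, plus the older Precedence convention) from everything else. It is a heuristic under those assumptions: not every bounce or out-of-office message follows the standards, and deciding whether a human reply is positive is a separate problem that is not attempted here.

```python
from email import message_from_string
from email.message import Message


def classify_reply(msg: Message) -> str:
    """Return 'bounce', 'auto_reply', or 'human' using standard headers only."""
    report_type = str(msg.get_param("report-type")).lower()
    if msg.get_content_type() == "multipart/report" and report_type == "delivery-status":
        return "bounce"

    # RFC 3834: any value other than "no" marks an automatically generated message.
    auto_submitted = str(msg.get("Auto-Submitted") or "no")
    if auto_submitted.split(";")[0].strip().lower() != "no":
        return "auto_reply"

    # Older, non-standard convention still used by some auto-responders.
    precedence = str(msg.get("Precedence") or "").strip().lower()
    if precedence in {"auto_reply", "bulk", "junk"}:
        return "auto_reply"

    return "human"


out_of_office = message_from_string(
    "From: prospect@example.org\n"
    "Subject: Out of office\n"
    "Auto-Submitted: auto-replied\n"
    "\n"
    "I am away until Monday.\n"
)
print(classify_reply(out_of_office))  # auto_reply
```

Messages classified as bounces would feed list hygiene, auto-replies would be excluded from engagement counts, and only the remainder would go on to human triage.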

The volume management needs to be dynamic. Each mailbox has a daily ceiling that adjusts based on what the mailbox can actually deliver without bouncing. New mailboxes start at 5-10 per day and ramp up over 14-28 days. Established mailboxes can handle 80-150 per day if the warmup was done correctly. Older mailboxes that have built deep reputation can sometimes handle 200+ per day for genuinely engaged audiences. The static “30 per day” that most sequencers default to leaves enormous capacity unused on the established mailboxes and over-pushes the new ones.
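A minimal sketch of what a dynamic ceiling could look like, using the ranges above as defaults. The linear ramp, the 2% bounce threshold, and the halving penalty are illustrative assumptions rather than a description of any particular system; the point is only that the ceiling is a function of mailbox age and recent delivery results instead of a static number.

```python
def daily_ceiling(age_days: int,
                  recent_bounce_rate: float,
                  start: int = 8,          # new mailboxes: 5-10 per day
                  target: int = 100,       # established mailboxes: 80-150 per day
                  ramp_days: int = 21,     # ramp over 14-28 days
                  bounce_limit: float = 0.02) -> int:
    """Daily send ceiling for one mailbox.

    Assumptions (not from any provider's documentation): the ramp is
    linear, and a bounce rate above bounce_limit halves the ceiling.
    """
    progress = min(max(age_days, 0) / ramp_days, 1.0)
    ceiling = start + (target - start) * progress
    if recent_bounce_rate > bounce_limit:
        ceiling *= 0.5
    return max(int(ceiling), 1)


print(daily_ceiling(age_days=0, recent_bounce_rate=0.0))    # 8
print(daily_ceiling(age_days=21, recent_bounce_rate=0.0))   # 100
print(daily_ceiling(age_days=21, recent_bounce_rate=0.05))  # 50
```

In practice the inputs would be recomputed daily per mailbox, so a mailbox that starts bouncing is pulled back automatically while healthy mailboxes keep ramping.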

The mailbox-to-domain ratio question

This is one of the questions agencies ask us most often, and one where the conventional wisdom, “more mailboxes per domain is better for capacity,” is largely wrong.

The right way to think about the ratio is in terms of risk concentration. Every mailbox you operate feeds signals into the reputation of the domain it sends from. If you have ten mailboxes on one domain and one of them generates a complaint spike, the receiver evaluates the domain’s reputation, sees the spike, and downgrades all ten mailboxes. The reputation that matters here is at the domain level, not the mailbox level.

The right architecture is fewer mailboxes per domain, more domains. Three to five mailboxes per domain is typical. If you need more capacity, add more domains, not more mailboxes on existing domains. Each domain is an independent reputation surface. A complaint spike on one domain does not affect the others if they are properly isolated. The isolation requires that each domain has its own DKIM key, its own DMARC policy, its own SPF record, and ideally its own sending IP.

The downstream effect is that an agency at our volume tier ends up with 15-40 secondary domains, each lightly used, each with strong authentication, each with its own warmup history. This is operationally more expensive than running fewer domains with more mailboxes each, but the deliverability difference is substantial. We have customers who moved from 4 domains with 25 mailboxes each to 20 domains with 5 mailboxes each, kept their total daily volume identical, and saw inbox placement improve by 18-25%.
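The capacity arithmetic behind this is simple enough to sketch. The function below is a hypothetical planning helper: given a target daily volume, a per-mailbox ceiling, and a mailboxes-per-domain limit, it returns how many mailboxes and domains are needed. The 3,000 sends per day and 30 per mailbox in the example are assumed figures chosen to reproduce the 100-mailbox layouts mentioned above.

```python
import math
from typing import Tuple


def domains_needed(daily_volume: int,
                   per_mailbox_ceiling: int,
                   mailboxes_per_domain: int = 5) -> Tuple[int, int]:
    """Return (mailboxes, domains) required to carry daily_volume."""
    if min(daily_volume, per_mailbox_ceiling, mailboxes_per_domain) <= 0:
        raise ValueError("all inputs must be positive")
    mailboxes = math.ceil(daily_volume / per_mailbox_ceiling)
    domains = math.ceil(mailboxes / mailboxes_per_domain)
    return mailboxes, domains


# Same volume, two layouts: concentrated versus spread out.
print(domains_needed(3000, 30, mailboxes_per_domain=25))  # (100, 4)
print(domains_needed(3000, 30, mailboxes_per_domain=5))   # (100, 20)
```

The mailbox count is identical in both layouts; what changes is how many mailboxes are exposed when a single domain’s reputation drops.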

The throttling and dispatch question

Once you have the mailbox-to-domain ratio right, the question of how to dispatch across mailboxes becomes important. The naive approach is round-robin: each mailbox sends one message before the next mailbox sends one. This is what most sequencers do by default. It looks fair but it produces a uniformity in the sending pattern that receivers can detect.

The pattern that works better is randomized weighted dispatch. Each mailbox has a current capacity score (based on its age, its bounce rate, its complaint rate, and its recent delivery success). Each message is dispatched to a randomly-selected mailbox weighted by capacity score. Messages are spaced with random intervals between sends, not regular intervals. The total daily volume per mailbox stays within its ceiling, but the within-day distribution is irregular.
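Here is a minimal sketch of the selection step, assuming the capacity score has already been computed elsewhere from age, bounce rate, complaint rate, and delivery success. The Mailbox class and pick_mailbox function are hypothetical names for illustration and are not any sequencer’s or MTA’s API; the weighted draw uses random.choices from the Python standard library.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mailbox:
    address: str
    capacity_score: float   # higher means healthier; computed elsewhere
    daily_ceiling: int
    sent_today: int = 0


def pick_mailbox(mailboxes: List[Mailbox], rng: random.Random) -> Optional[Mailbox]:
    """Pick a mailbox at random, weighted by capacity score.

    Mailboxes that have reached their daily ceiling are excluded, so the
    per-mailbox limit is never exceeded. Returns None when nothing is left.
    """
    eligible = [m for m in mailboxes
                if m.sent_today < m.daily_ceiling and m.capacity_score > 0]
    if not eligible:
        return None
    weights = [m.capacity_score for m in eligible]
    chosen = rng.choices(eligible, weights=weights, k=1)[0]
    chosen.sent_today += 1
    return chosen


rng = random.Random()
pool = [
    Mailbox("jane@clientco-mail.com", capacity_score=3.0, daily_ceiling=100),
    Mailbox("sam@clientco-mail.com", capacity_score=1.0, daily_ceiling=20),
]
for _ in range(10):
    mailbox = pick_mailbox(pool, rng)
    print(mailbox.address if mailbox else "no capacity left")
```

Unlike round-robin, the order of senders is not predictable from one message to the next, and healthier mailboxes naturally carry more of the load.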

The irregularity matters. Receivers track inter-message intervals as one of many signals. A mailbox that sends one message every 47 seconds for two hours looks like an automation. A mailbox that sends one message, then waits 4 minutes, then sends another, then waits 11 minutes, then sends three in 6 minutes, looks like a person at a keyboard. The mathematical distribution of human send timing is well-studied (it follows a roughly log-normal distribution with some power-law tail behavior), and matching that distribution rather than uniform timing produces noticeably better reputation signals.
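A small sketch of irregular spacing, drawing inter-message gaps from a log-normal distribution with random.lognormvariate. The four-minute median, the sigma of 1.0, and the 20-second floor are assumed values for illustration, and the sketch models only the log-normal body, not the power-law tail mentioned above.

```python
import math
import random
from typing import List, Optional


def sample_gaps(n: int,
                median_seconds: float = 240.0,
                sigma: float = 1.0,
                min_gap: float = 20.0,
                rng: Optional[random.Random] = None) -> List[float]:
    """Draw n inter-message gaps (in seconds) from a log-normal distribution.

    The median of a log-normal is exp(mu), so mu is set to log(median).
    A floor of min_gap avoids implausibly rapid back-to-back sends.
    """
    rng = rng or random.Random()
    mu = math.log(median_seconds)
    return [max(min_gap, rng.lognormvariate(mu, sigma)) for _ in range(n)]


gaps = sample_gaps(8)
print([round(g) for g in gaps])
```

Most gaps land within a few minutes of the median, with an occasional much longer pause, which is the shape that a fixed 47-second interval lacks.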

We do this dispatch through PowerMTA, which has fine-grained scheduling primitives that the typical cold outreach sequencer does not expose. The agencies that come to us and stay tend to be the agencies that have hit the ceiling of what their sequencer can do at the scheduling layer.

The reply routing operational reality

I mentioned this above but want to expand on it because most agencies underinvest here and it is the highest-ROI improvement we typically make.

When a cold outreach reply lands in the agency’s inbox, three things have to happen in quick succession, and together they determine whether the reply becomes a meeting, a deal, or wasted effort. First, the reply has to be detected. Most sequencers detect replies through SMTP forwarding rules and parse them adequately for status changes (positive, negative, out-of-office) but poorly for content extraction. Second, the reply has to be routed to the right human. The right human is whoever is the campaign lead for that client, available right now, capable of carrying the conversation. Third, the human has to compose a follow-up that is contextually relevant.

The total elapsed time from reply received to follow-up sent is the metric that matters. Industry benchmarks suggest that follow-ups within four hours convert at roughly 2.5x the rate of follow-ups within 24 hours, and follow-ups within 24 hours convert at roughly 1.8x the rate of follow-ups within 48 hours. The marginal hour matters enormously, especially for the first follow-up.
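The two benchmark ratios compound, which is easy to miss when reading them separately. The snippet below chains them, normalizing the 48-hour rate to 1.0. Treating the benchmarks as a step function is a simplifying assumption, and nothing is assumed about follow-ups slower than 48 hours because the figures above do not cover them.

```python
# Relative conversion rates implied by the benchmarks quoted above,
# normalized so that a follow-up within 48 hours converts at 1.0.
RATE_48H = 1.0
RATE_24H = 1.8 * RATE_48H   # within 24h: roughly 1.8x the 48h rate
RATE_4H = 2.5 * RATE_24H    # within 4h: roughly 2.5x the 24h rate


def relative_conversion(hours_to_follow_up: float) -> float:
    """Step-function reading of the benchmarks (an assumption)."""
    if hours_to_follow_up < 0:
        raise ValueError("elapsed time cannot be negative")
    if hours_to_follow_up <= 4:
        return RATE_4H
    if hours_to_follow_up <= 24:
        return RATE_24H
    if hours_to_follow_up <= 48:
        return RATE_48H
    raise ValueError("the quoted benchmarks do not cover follow-ups beyond 48 hours")


print(round(relative_conversion(3) / relative_conversion(40), 2))  # 4.5
```

Read together, the benchmarks imply that a follow-up within four hours converts at about 4.5 times the rate of one sent within 48 hours.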

The infrastructure components that matter for this are: a reply detection layer that catches replies in real-time (within seconds), a routing layer that knows which human is responsible for which campaign and which humans are available, an alerting mechanism that reaches the human (typically a Slack notification or push notification, not an email which sits unread for hours), and a context-assembly layer that gives the human everything they need to compose the follow-up in one place (the original message, the prospect’s profile, the previous touches, the campaign goal).
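The routing and context-assembly pieces can be sketched in a few lines. Everything below is hypothetical naming for illustration: an Operator record that knows which campaigns a person covers, a ReplyContext that bundles what the person needs, and a route_reply function that prefers the campaign lead and falls back to any available teammate. Delivering the alert (Slack, push, or otherwise) is left out because it depends on the tools the agency already uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Operator:
    name: str
    campaigns: Set[str]                       # campaigns this person can handle
    lead_for: Set[str] = field(default_factory=set)
    available: bool = True


@dataclass
class ReplyContext:
    campaign_id: str
    campaign_goal: str
    prospect_profile: str
    original_message: str
    previous_touches: List[str]
    reply_text: str


def route_reply(ctx: ReplyContext, operators: List[Operator]) -> Optional[Operator]:
    """Pick an available operator for the campaign, preferring its lead."""
    candidates = [op for op in operators
                  if op.available and ctx.campaign_id in op.campaigns]
    if not candidates:
        return None  # caller should escalate rather than let the reply sit
    # False sorts before True, so campaign leads come first.
    candidates.sort(key=lambda op: ctx.campaign_id not in op.lead_for)
    return candidates[0]


def format_alert(ctx: ReplyContext, operator: Operator) -> str:
    """Assemble the context into one message for the chosen operator."""
    touches = "\n".join(f"  {i}. {t}" for i, t in enumerate(ctx.previous_touches, 1))
    return (
        f"{operator.name}: new reply on campaign {ctx.campaign_id}\n"
        f"Goal: {ctx.campaign_goal}\n"
        f"Prospect: {ctx.prospect_profile}\n"
        f"Previous touches:\n{touches}\n"
        f"Reply: {ctx.reply_text}"
    )
```

The useful property is the explicit None result: when nobody is available, the system knows immediately and can escalate, instead of the reply waiting in a shared inbox.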

Most agencies treat this as a workflow problem. It is partly a workflow problem. The infrastructure side is real and most agencies undersolve it.

What we provide and what we deliberately do not

We provide the SMTP infrastructure layer end-to-end: dedicated sending IPs in jurisdictions that align with the agency’s audience, mailbox-to-domain architecture, correctly configured DKIM, DMARC, and SPF, dynamic volume management, randomized weighted dispatch, bounce processing, complaint feedback loops with the major mailbox providers, and the operational support to keep all of this working.

We deliberately do not provide several pieces of the stack. We do not provide list sourcing, enrichment, or verification. We do not provide the sequencer or the campaign management UI. We do not provide reply parsing, routing to humans, or follow-up composition. We do not provide the writing of the sequences themselves.

The reason for the deliberate exclusions is that those layers are where the agency adds value to its clients. We are infrastructure. We are not trying to be a full-service marketing platform. The agencies we serve already have their data tools and their sequencer of choice and their workflow for handling replies. They come to us because the SMTP layer is where they were getting hurt and they wanted to fix it without rebuilding everything else.

The integration model is that the agency’s sequencer points its outbound SMTP at our relay rather than at the workspace provider’s SMTP. The replies still route through the workspace mailbox the way they did before. Nothing else changes from the agency’s daily workflow perspective. What changes is what happens on the wire.
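From the sending side, the change amounts to a different SMTP host and credentials. The sketch below uses Python’s standard smtplib and email packages to show the shape of it. The relay hostname, port 587 with STARTTLS, and the credentials are placeholder assumptions, not real endpoints; in most sequencers the equivalent is a custom SMTP settings form rather than code.

```python
import smtplib
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class RelayConfig:
    host: str               # hypothetical, e.g. "relay.example.net"
    port: int = 587         # assumption: submission port with STARTTLS
    username: str = ""
    password: str = ""


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build the outbound message.

    The From address stays the workspace mailbox, so replies keep
    arriving there; only the outbound hop changes.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_relay(cfg: RelayConfig, msg: EmailMessage) -> None:
    """Submit one message through the dedicated relay."""
    with smtplib.SMTP(cfg.host, cfg.port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(cfg.username, cfg.password)
        smtp.send_message(msg)


message = build_message(
    "jane@clientco-mail.com", "prospect@example.org",
    "Quick question", "Hi, a short note about your logistics stack.",
)
# send_via_relay(RelayConfig("relay.example.net", username="...", password="..."), message)
```

The send call is commented out because it needs a real relay and credentials; building the message is unchanged from what the sequencer already does.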

The cost question

I am going to be direct about cost because the comparison is something agencies have to do and the honest numbers are not always presented.

A typical agency running 25 mailboxes across 5 domains through Google Workspace pays Google about $150-200 per month for the mailbox subscriptions, plus the cost of the sequencer (Smartlead, Instantly, or similar typically runs $97-297 per month), plus warmup ($50-100 per month). Total roughly $300-600 per month for the stack, depending on the tier.

The same agency moving to dedicated SMTP infrastructure (which is what we provide) pays roughly $300-450 per month for the SMTP layer that replaces the Google Workspace sending capability. They keep Google Workspace for receiving and inbox management (reduces to about $80-120 per month at the Business Starter tier since they no longer need the higher tier for sending volume). They keep their sequencer at the same $97-297 per month. They keep the warmup or replace it with our managed warmup at no extra cost. Total roughly $480-870 per month with managed warmup, depending on the sequencer tier.

On the same sequencer tier, that is roughly $180-270 more per month, or a gross cost about 45-60% higher. The deliverability improvement is typically 18-30%. For an agency whose campaign performance is sensitive to deliverability, the cost increase typically pays for itself within several months. For an agency whose campaign performance is already poor enough that the increase doesn’t matter, we tell them the issue is in their copywriting or targeting and they should fix that first before changing infrastructure.

What I would tell an agency starting today

If you are starting a cold outreach agency today, the order of operations matters.

First, get one campaign working end-to-end on whatever infrastructure you have available, including Google Workspace mailboxes if that is the path of least resistance. Validate that your copywriting works, your targeting works, and your reply handling works. Spend the first three months on this. The infrastructure does not matter at this stage because nothing else is dialed in.

Second, once the campaign is producing predictable results, scale it. Add more campaigns. Add more clients. Run into the deliverability ceiling on your current infrastructure. You will hit it somewhere between 100,000 and 300,000 total monthly sends. The symptoms are inbox placement degrading, reply rates dropping despite the copy not changing, and complaint rates rising despite the targeting not changing.

Third, when you hit the ceiling, that is when you invest in infrastructure. Move off Google Workspace as the sending layer. Build dedicated SMTP. Restructure your domain-to-mailbox ratio. Implement the warmup that does not look like warmup. The investment makes sense at this stage because you have validated revenue to compare against the infrastructure cost.

What you should not do is invest in infrastructure on day one. Most agencies that try this fail before they get to scale because they spent their first six months on infrastructure instead of on campaigns. The infrastructure layer is important. It is not the first layer to perfect.

We hear from agencies at all three stages. The early-stage ones we typically tell to come back in six months. The mid-stage ones we onboard and migrate carefully. The late-stage ones we onboard fast because they understand the urgency.

If you are reading this and you are at the second stage, hitting the ceiling, this is the conversation to have. The remediation timeline from “I notice my deliverability is degrading” to “I have new infrastructure running” is typically four to eight weeks. Start before the ceiling becomes a wall.