Twelve days from today, Gmail will start issuing soft 421 errors to bulk senders who fail the new authentication requirements. Yahoo will follow on a rolling basis through the first half of the year. The enforcement is not theoretical anymore. We have been running compliance audits across the customer base for the last three months, and there is a pattern to the failures that is worth documenting before the deadline arrives.
This is not a “what is DMARC” post. By now anyone running bulk infrastructure either knows what DMARC is or has bigger problems than a blog post can solve. This is the specific audit checklist we are running against every customer’s setup, the failures we are finding most often, and the remediation that actually works in the twelve-day window if you discover something broken today.
The audit sequence in order of frequency-of-failure
When we started the audit rounds in October, I expected the most common failure to be missing DMARC records. It was not. The most common failure, by a significant margin, was DKIM key length. About a third of the customer-managed setups we touched still had 1024-bit DKIM keys generated five to eight years ago, signing every outbound message. None of them realized this was a problem because their mail was getting delivered. Microsoft Outlook started treating 1024-bit signatures with skepticism late in 2023, and Gmail will follow.
The second most common failure was DMARC alignment. The customer had a DMARC record, the receiving server could find it, but the From: domain was not aligned with either the SPF return path or the DKIM signing domain. The message would pass SPF and DKIM in isolation. DMARC would still fail because alignment is the binding requirement. This failure is invisible without aggregate reports. None of the customers we found this in had aggregate reports going to a mailbox anyone read.
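To make the alignment failure concrete, here is a minimal Python sketch of the DMARC pass rule under relaxed alignment. The function names are illustrative, and `org_domain` is a deliberate simplification: it keeps the last two labels of a domain, where a real check would consult the Public Suffix List.

```python
def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels.

    Assumption for illustration only; real implementations use the
    Public Suffix List (this breaks on suffixes such as co.uk).
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(from_domain: str, auth_domain: str) -> bool:
    """Relaxed alignment: both names share an organizational domain."""
    return org_domain(from_domain) == org_domain(auth_domain)


def dmarc_passes(from_domain: str,
                 spf_pass: bool, spf_domain: str,
                 dkim_pass: bool, dkim_domain: str) -> bool:
    """DMARC passes if SPF or DKIM both passes AND aligns with From:."""
    spf_ok = spf_pass and aligned(from_domain, spf_domain)
    dkim_ok = dkim_pass and aligned(from_domain, dkim_domain)
    return spf_ok or dkim_ok


# SPF and DKIM both pass, but only for the ESP's domain, so DMARC fails.
unaligned = dmarc_passes("example.com",
                         True, "bounce.esp-mail.test",
                         True, "esp-mail.test")

# Same message once DKIM signs with the customer's own domain.
fixed = dmarc_passes("news.example.com",
                     True, "bounce.esp-mail.test",
                     True, "example.com")
```

The first call is the pattern described above: both mechanisms pass in isolation and DMARC still fails, because neither authenticated domain matches the From: domain.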
The third most common failure was an SPF record that exceeded the 10-lookup limit. SPF evaluation follows every include:, a:, mx:, and redirect= term recursively, and any evaluation that needs more than ten DNS lookups in total returns a PermError. Most receivers treat PermError as authentication failure. The senders affected had built up their SPF over years, adding includes as they added new services, never auditing what was actually being resolved.
The fourth most common failure involved transactional senders who genuinely were below 5,000 daily messages to Gmail but had not considered that their volume could spike during incidents. A wave of password resets during an outage in November pushed one customer from 2,000 daily transactional messages to 18,000 in a single 24-hour window. They had no DMARC, no proper SPF, and they suddenly tripped the threshold. The Gmail enforcement does not care that you usually send less. The day you send more is the day the requirement applies.
The fifth most common failure, and the one I want to highlight because almost nobody talks about it, was the unsubscribe processing pipeline. The customer had implemented the List-Unsubscribe header. They had even tested it. What they had not tested was whether the actual processing of the unsubscribe POST suppressed the recipient from future sends. In two cases, the unsubscribe endpoint logged the request and returned 200, then did nothing else. Subsequent campaigns sent to the unsubscribed user. This will be a spam complaint generator by April.
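The missing piece in those broken endpoints can be sketched in a few lines of Python. This is an illustration, not any customer's code: `SuppressionList` is a hypothetical in-memory stand-in for a real suppression store, and it assumes the handler already knows which recipient the request is for (typically from a token in the unsubscribe URL).

```python
from urllib.parse import parse_qs


class SuppressionList:
    """In-memory stand-in for a persistent suppression store (hypothetical)."""

    def __init__(self):
        self._addresses = set()

    def add(self, address: str) -> None:
        self._addresses.add(address.strip().lower())

    def __contains__(self, address: str) -> bool:
        return address.strip().lower() in self._addresses


def handle_one_click(body: str, recipient: str,
                     suppressions: SuppressionList) -> int:
    """Process a one-click unsubscribe POST; return the HTTP status."""
    fields = parse_qs(body)
    if fields.get("List-Unsubscribe") != ["One-Click"]:
        return 400
    # The step the broken endpoints skipped: actually record the opt-out.
    suppressions.add(recipient)
    return 200


def sendable(recipients, suppressions: SuppressionList):
    """Filter a campaign's recipient list against the suppression list."""
    return [r for r in recipients if r not in suppressions]
```

The point of the sketch is that returning 200 and suppressing are two separate steps, and the campaign send path has to consult the same store the endpoint writes to.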
How we run the audit on a customer setup
For any sender we onboard or audit, we run the same six-step check. It takes about thirty minutes per sending domain if everything is in order, two to six hours if something is wrong.
Step one is SPF resolution. We use dig and a script that walks the SPF graph counting lookups. We compare against the canonical sending IP list the customer has given us. Any include: that resolves to IPs not in the canonical list goes on the remediation list. Any record whose evaluation needs more than ten lookups goes on the remediation list. Any -all that should be ~all (or vice versa) goes on the remediation list. We also check whether the customer’s third-party services (warmup vendors, escalation tools, support inboxes) are included or whether they are sending unauthenticated under the customer’s domain. We have found this in seven of fifteen audits this quarter.
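A rough Python sketch of that kind of lookup counter follows. It is not our production script. `get_spf` is a hypothetical callable that returns the SPF TXT record for a domain (in practice it would wrap dig or a resolver library); here it is backed by a dictionary so the example runs without touching DNS. The sketch ignores SPF macros and the separate limits on MX and PTR resolution.

```python
def count_spf_lookups(domain, get_spf, path=()):
    """Count the DNS-querying terms needed to evaluate a domain's SPF record.

    include, a, mx, ptr, exists, and redirect each cost one lookup;
    include and redirect are followed recursively.
    """
    if domain in path:  # include loop: stop rather than recurse forever
        return 0
    record = get_spf(domain)
    if not record:
        return 0
    total = 0
    for term in record.split()[1:]:  # skip the leading v=spf1
        term = term.lstrip("+-~?")   # drop the qualifier, if any
        if term.lower().startswith("redirect="):
            target = term.split("=", 1)[1]
            total += 1 + count_spf_lookups(target, get_spf, path + (domain,))
            continue
        name, _, arg = term.partition(":")
        name = name.split("/")[0].lower()  # "a/24" counts as "a"
        if name == "include":
            total += 1 + count_spf_lookups(arg, get_spf, path + (domain,))
        elif name in ("a", "mx", "ptr", "exists"):
            total += 1
    return total


# Illustrative records only; none of these domains are real.
records = {
    "example.com": "v=spf1 include:_spf.a.test include:_spf.b.test mx ip4:192.0.2.0/24 ~all",
    "_spf.a.test": "v=spf1 include:_spf.c.test a -all",
    "_spf.b.test": "v=spf1 ip4:198.51.100.0/24 -all",
    "_spf.c.test": "v=spf1 exists:%{i}.c.test -all",
}
lookups = count_spf_lookups("example.com", records.get)
over_limit = lookups > 10
```

For the sample records the count is 6: three includes, one a, one mx, and one exists. ip4 and ip6 terms cost nothing, which is why flattening includes into IP ranges is the usual way to get back under the limit.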
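Step two is DKIM verification. We pull every selector in use under _domainkey for the sending domain. Selectors cannot be enumerated from DNS, so the list comes from message headers and the customer’s MTA or ESP configuration. We check key length, signing algorithm (RSA is what we expect to see; Ed25519 is defined for DKIM in RFC 8463, but receiver support is uneven, so a domain that signs only with Ed25519 gets flagged), and rotation date. The rotation date is not in DNS; it has to be inferred from PowerMTA or ESP records. Any key older than 18 months gets flagged for rotation. Any key under 1024 bits gets immediate rotation, and any 1024-bit key is scheduled for replacement with a 2048-bit key. Any key without a known recent rotation gets a rotation plan.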
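The key-length check can be done from the published TXT record alone. The sketch below is one way to do it in Python with only the standard library: it reads the p= tag, base64-decodes it, and walks just enough of the DER structure to reach the RSA modulus. It assumes a well-formed RSA SubjectPublicKeyInfo and does no other validation; the function names are ours for illustration.

```python
import base64


def _read_tlv(data: bytes, offset: int):
    """Return (tag, value, next_offset) for the DER element at offset."""
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:  # long form: low bits give the number of length bytes
        n = length & 0x7F
        length = int.from_bytes(data[offset:offset + n], "big")
        offset += n
    return tag, data[offset:offset + length], offset + length


def rsa_key_bits(spki_der: bytes) -> int:
    """Bit length of the modulus in an RSA SubjectPublicKeyInfo."""
    _, spki, _ = _read_tlv(spki_der, 0)           # outer SEQUENCE
    _, _, after_alg = _read_tlv(spki, 0)          # AlgorithmIdentifier
    _, bitstring, _ = _read_tlv(spki, after_alg)  # BIT STRING
    _, rsa_seq, _ = _read_tlv(bitstring, 1)       # skip unused-bits byte
    _, modulus, _ = _read_tlv(rsa_seq, 0)         # INTEGER modulus
    return int.from_bytes(modulus, "big").bit_length()


def dkim_key_bits(txt_record: str):
    """Key size for a DKIM TXT record, or None if the key is not RSA."""
    tags = {}
    for part in txt_record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip()] = "".join(value.split())  # drop whitespace
    if tags.get("k", "rsa") != "rsa":  # k= defaults to rsa when absent
        return None
    return rsa_key_bits(base64.b64decode(tags["p"]))
```

Feeding it the output of a dig TXT query for selector._domainkey.example.com (with the quoted chunks joined) gives a number to compare against the 1024 and 2048 thresholds.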
Step three is DMARC. We pull the _dmarc record. We check it parses correctly (a surprising number have syntax errors that are silently treated as “no policy”), and we check the rua= mailbox is one the customer actually has access to. We then check the last 30 days of aggregate reports if they exist. We classify the mail streams the reports surface, and we identify any stream that is failing alignment or authentication. The aggregate reports are the most valuable diagnostic tool in this entire audit.
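A minimal Python sketch of the record sanity check follows. It covers only the basics this step cares about (version tag, policy, reporting address) and is not a full RFC 7489 parser; the function name and the problem strings are illustrative.

```python
VALID_POLICIES = {"none", "quarantine", "reject"}


def parse_dmarc(record: str):
    """Return (tags, problems) for a _dmarc TXT record."""
    problems = []
    tags = {}
    parts = [p.strip() for p in record.split(";") if p.strip()]
    for part in parts:
        if "=" not in part:
            problems.append(f"malformed tag: {part!r}")
            continue
        key, value = part.split("=", 1)
        tags[key.strip().lower()] = value.strip()
    # The version tag must come first, or receivers ignore the record.
    if not parts or parts[0].replace(" ", "") != "v=DMARC1":
        problems.append("record must start with v=DMARC1")
    if tags.get("p", "").lower() not in VALID_POLICIES:
        problems.append("missing or invalid p= policy")
    rua = tags.get("rua", "")
    if not rua:
        problems.append("no rua= reporting address")
    elif not all(u.strip().lower().startswith("mailto:")
                 for u in rua.split(",")):
        problems.append("rua= entries should be mailto: URIs")
    return tags, problems


# A record with a missing semicolon, the kind that gets treated as no policy.
_, broken = parse_dmarc("v=DMARC1 p=none")
_, clean = parse_dmarc("v=DMARC1; p=none; rua=mailto:dmarc@example.com")
```

The parser cannot tell you whether anyone reads the rua= mailbox. That part of the check is still a conversation with the customer.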
Step four is MTA-STS and TLS-RPT. These are not strict requirements for Gmail enforcement but they are signals receivers use to score senders. We check whether the customer has an MTA-STS policy published, whether it is reachable via HTTPS, and whether TLS-RPT is configured to capture failures. Most customers do not have these. Most do not need them in February. By 2025 they will.
Step five is the unsubscribe pipeline. We send a test message from the customer’s setup to a Gmail account we control. We then issue the one-click unsubscribe POST from Postman with the exact User-Agent the receiving servers use. We verify the endpoint returns 200 or 202. We then send a second message from the same setup and verify it does not arrive. About half the customers we tested for this in November had a working header but a broken pipeline. The pattern is consistent: implementing the header is easy and gets prioritized, implementing the suppression logic is harder and gets deferred.
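If you would rather script the POST than click through Postman, something like the following Python sketch builds the same request with the standard library. The body and content type follow RFC 8058; the header value, URL, and User-Agent string in the example are placeholders, not values any receiver actually sends.

```python
import re
import urllib.request


def one_click_request(list_unsubscribe_header: str,
                      user_agent: str) -> urllib.request.Request:
    """Build the RFC 8058 one-click POST for a List-Unsubscribe header."""
    match = re.search(r"<(https://[^>]+)>", list_unsubscribe_header)
    if match is None:
        raise ValueError("no HTTPS URI in List-Unsubscribe header")
    return urllib.request.Request(
        match.group(1),
        data=b"List-Unsubscribe=One-Click",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "User-Agent": user_agent,
        },
        method="POST",
    )


# Placeholder header value copied from a test message.
header = "<mailto:unsub@example.com>, <https://example.com/unsub?token=abc123>"
request = one_click_request(header, "placeholder-user-agent")

# To actually send it (not run here):
#   with urllib.request.urlopen(request, timeout=10) as response:
#       print(response.status)
```

A 200 or 202 from this request only proves the endpoint answers. The second test message is what proves the suppression happened.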
Step six is Postmaster Tools data. We check whether the customer is signed up. If they are, we look at the spam rate trend over the prior 30 days. If the rate is below 0.10%, they are operationally fine. If it is between 0.10% and 0.30%, they have remediation work to do and the work involves their content and audience, not their infrastructure. If it is above 0.30%, they have urgent work to do, and Gmail is going to start penalizing their mail in February regardless of what infrastructure they have.
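The three bands are simple enough to encode directly. A small Python sketch, where the labels and the choice to judge a 30-day window by its worst day are our assumptions for illustration:

```python
def classify_spam_rate(rate_percent: float) -> str:
    """Map a Postmaster Tools spam rate (in percent) to an audit outcome."""
    if rate_percent < 0.10:
        return "fine"
    if rate_percent <= 0.30:
        return "remediate"
    return "urgent"


def classify_window(daily_rates_percent) -> str:
    """Classify a window of daily rates by its worst day (an assumption)."""
    return classify_spam_rate(max(daily_rates_percent))
```

Judging by the worst day is the conservative choice. Averaging over the window would hide exactly the spikes that get a sender into trouble.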
The remediation in the twelve-day window
If you have just discovered a problem and the deadline is twelve days away, here is what you can fix in that time and what you cannot.
You can rotate DKIM keys in twelve days if you start today. The actual mechanics: generate the new key, publish the new selector to DNS, wait 48 hours for receiver caches to pick up the new public key, switch outbound signing to the new selector, leave the old selector in place for 7 more days while in-flight mail clears, then deprecate the old selector. Total elapsed time is 9-10 days. If you start January 22, you finish February 1 with everything switched over before the enforcement begins.
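The timeline is easy to sanity-check with a few lines of Python. The sketch below only does the date arithmetic from the sequence above (48 hours of propagation, then a 7-day overlap); the year 2024 in the example is an assumption on our part, and the dictionary keys are illustrative names.

```python
from datetime import date, timedelta


def rotation_schedule(start: date,
                      propagation_days: int = 2,
                      overlap_days: int = 7) -> dict:
    """Dates for each step of a DKIM rotation that begins on start."""
    switch = start + timedelta(days=propagation_days)
    retire = switch + timedelta(days=overlap_days)
    return {
        "publish_new_selector": start,
        "switch_signing": switch,
        "retire_old_selector": retire,
    }


# Assumed year; the sequence starts on January 22.
schedule = rotation_schedule(date(2024, 1, 22))
# switch_signing falls on January 24, retire_old_selector on January 31
```

The date that matters for the deadline is switch_signing, since that is when outbound mail starts carrying the new signature. The old selector only needs to stay published until in-flight mail has cleared.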
You can fix SPF records in twelve days. The work is essentially: enumerate every authorized sending IP across all services the domain uses, build a fresh SPF record that lists them with minimal include: nesting, publish it, wait 24 hours for cache propagation, and verify with multiple receivers. Twelve days is more than enough. The hard part is finding all the services. Marketing teams routinely add ESPs and warmup vendors and survey tools without telling operations. You will find at least one you did not know about.
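Once the inventory exists, assembling the record is mechanical. A minimal Python sketch, assuming you have the authorized networks as strings and at most a few includes you have decided to keep; the function name and the sample values are illustrative.

```python
import ipaddress


def build_spf(networks, includes=(), qualifier="~all") -> str:
    """Assemble an SPF record from IP networks and a few includes."""
    terms = ["v=spf1"]
    for net in networks:
        # ip_network validates the input and normalizes host bits.
        parsed = ipaddress.ip_network(net, strict=False)
        mechanism = "ip4" if parsed.version == 4 else "ip6"
        terms.append(f"{mechanism}:{parsed.with_prefixlen}")
    terms.extend(f"include:{domain}" for domain in includes)
    terms.append(qualifier)
    return " ".join(terms)


# Documentation ranges and a made-up include, for illustration only.
record = build_spf(
    ["192.0.2.0/24", "2001:db8::/32"],
    includes=["_spf.esp.example"],
)
```

Listing networks directly costs no DNS lookups, which is the reason to prefer them over include: wherever the sending ranges are stable. A single TXT string is limited to 255 characters, so long records need to be split into multiple quoted strings when published.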
You can publish a DMARC record in twelve days, but you cannot move to p=reject in twelve days. If you have nothing today, you should publish p=none with valid rua= reporting and stay at none through February. Moving to p=quarantine should happen after you have collected and reviewed at least four weeks of aggregate reports. Moving to p=reject should happen after at least four more weeks at p=quarantine with no surprises in the reports. The Gmail enforcement only requires you to have a DMARC record with at least p=none, so publishing none in time for the deadline is fine.
You can implement the List-Unsubscribe header in twelve days. You probably cannot fully build out the suppression pipeline in twelve days if you do not already have one. If this is your situation, prioritize the header for the February deadline and the pipeline for the June one-click enforcement deadline. The header gets you through February; a pipeline that actually suppresses is what keeps you compliant once one-click enforcement begins in June.
You cannot fix a 0.30% spam complaint rate in twelve days. If your spam rate is above the threshold, the only short-term remediation is reducing your daily volume to Gmail addresses to well under 5,000, which limits your exposure and gives you breathing room to rebuild list quality. Be aware that Google’s sender guidelines FAQ says bulk sender status does not expire once a domain has crossed the threshold, so lower volume reduces the damage rather than exempting you from the requirements. This is a serious operational decision and not one we recommend lightly. The longer-term fix is list hygiene, content alignment, and audience reconfirmation, all of which take months.
What we are seeing in the last week before enforcement
The customers who have been doing the work since October are in good shape. They have rotated their keys, fixed their SPF records, published DMARC, built unsubscribe pipelines, and lowered their complaint rates. For them, February 1 will be a non-event. They will see their authentication pass rates in Postmaster Tools, they will see their compliance status as compliant, and their delivery will continue exactly as before.
The customers who have been doing the work since December are in adequate shape. They have not had the luxury of slow DMARC progression. They published p=none and they are staying there through the deadline. They have done the SPF and DKIM work but they have not had four weeks of aggregate reports to catch every unauthenticated stream. They will likely have a handful of legitimate mail flows fail authentication in February that they did not predict. They will fix them in the following weeks. Their delivery will dip briefly and recover.
The customers who are starting the work this week are going to have a rough February. The infrastructure work is doable in twelve days. The reputational work is not. They will hit February with authentication passing but with a backlog of audience hygiene issues. The complaint rate will manifest as deliverability degradation through February and March. Recovery to baseline takes six to twelve weeks.
The customers who are not doing the work and are betting that Gmail will not actually enforce in February will discover that Gmail will actually enforce in February. We have been telling them this since October. Some have listened. Some have not. The ones who have not will be writing post-mortems in March.
Specific things we are watching for on February 1
We have customers across seven jurisdictions and a range of volume profiles. We are setting up monitoring that fires on three specific signals starting January 30.
First, any customer’s 4xx error rate from Gmail jumping above their historical baseline. Gmail’s announced enforcement starts with 4xx soft errors on a small percentage of non-compliant traffic. A meaningful rise in the 4xx rate is the early warning that some traffic is failing the new checks.
Second, any customer’s Postmaster Tools compliance status switching from compliant to non-compliant. The compliance status dashboard is the most direct signal Gmail will give us. We monitor it daily.
Third, any aggregate DMARC report showing a sudden change in failed mail streams. The reports lag by 24-48 hours but they catch the long-tail failures that the compliance dashboard does not surface.
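For anyone who has never opened an aggregate report, the useful part is small. The Python sketch below pulls out the sources where both DKIM and SPF failed DMARC evaluation, using the element names from the RFC 7489 report schema. It assumes the XML has already been decompressed and carries no namespace prefix (some reporters add one); the function name is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def failing_sources(report_xml: str) -> Counter:
    """Message counts per source IP where both DKIM and SPF failed DMARC."""
    failures = Counter()
    for record in ET.fromstring(report_xml).iter("record"):
        row = record.find("row")
        if row is None:
            continue
        evaluated = row.find("policy_evaluated")
        if evaluated is None:
            continue
        if (evaluated.findtext("dkim") == "fail"
                and evaluated.findtext("spf") == "fail"):
            failures[row.findtext("source_ip")] += int(row.findtext("count"))
    return failures
```

Comparing the result for today's reports against the same output from a week earlier is a cheap way to spot a mail stream that has newly started failing.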
We do not expect February to be uneventful. The honest expectation across our customer base is that perhaps three to five customers will have something unexpected fail in the first week of February. The work between now and then is to compress that number by catching as much as possible in advance. The work in the first week of February is to triage and fix what we missed.
If you are reading this and you are not sure where you stand, the audit is something you can do yourself in a long afternoon. The six steps above are not specific to anything we do; they apply to any bulk sender. The tools are all free: dig for SPF and DKIM checks, the receiving servers themselves for DMARC and one-click testing, Postmaster Tools for compliance and complaint rate data.
The deadline is in twelve days. The infrastructure work can be done in that time if it has to be. The reputational work cannot. Now is the moment to be honest with yourself about where you stand. If you discover you are not where you need to be, decide today whether you are going to do the work, reduce your volume, or accept that your delivery is going to degrade. There is no fourth option.
We are still accepting customer audits this week. After January 28 we are no longer taking on remediation work that has to complete before the deadline. Anyone reaching out after that date will be working on a post-February timeline. If you are not sure whether you need an audit, treat the fact that you are asking as a yes.
A note on what the receivers will actually do in February
I have read several pieces of analysis that conflate “Gmail enforcement begins” with “mail starts bouncing.” The actual behavior is more graduated than that, and understanding it matters for triage in the first week.
Gmail’s announced rollout is that non-compliant mail will start receiving temporary 4xx errors on a small percentage of traffic from February 1. The percentage is not specified publicly, but the operational reality, based on what we have learned from a contact at one of the major ESPs, is that it starts very small: a single-digit percentage of non-compliant volume. The rationale, as it was relayed to us, is that Google wants senders to see the errors and have time to react before the errors become hard rejections.
This means February 1 will not look like a wall coming down. It will look like an inflection point. Senders who are non-compliant will see their 421 rate creep up from near zero to something between 0.5% and 3% in the first week. Within two months that number is expected to grow to a meaningful fraction of traffic. By April, Google has said publicly, they will start permanent 5xx rejections.
For triage in the first week, this matters because a small rise in 4xx errors is not by itself proof of a compliance failure. Mail providers have noise floors. We treat the signal as meaningful when a customer’s 4xx rate from Gmail rises more than two standard deviations above their 30-day baseline. Below that threshold we wait another 48 hours before raising an alarm.
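The threshold is a plain two-sigma rule. A minimal Python sketch, assuming the baseline is a list of the customer's daily 4xx rates from Gmail (in percent) for the previous 30 days; the function name and sample numbers are illustrative.

```python
from statistics import mean, stdev


def is_meaningful_rise(baseline_rates, todays_rate, sigmas=2.0) -> bool:
    """True if today's rate exceeds the baseline mean by more than
    `sigmas` sample standard deviations."""
    threshold = mean(baseline_rates) + sigmas * stdev(baseline_rates)
    return todays_rate > threshold


# Made-up 30-day baseline hovering around 0.5%.
baseline = [0.4, 0.5, 0.6] * 10
noise = is_meaningful_rise(baseline, 0.6)    # within normal variation
signal = is_meaningful_rise(baseline, 0.9)   # well above the band
```

A customer with a very quiet baseline has a small standard deviation, so even a modest absolute rise will trip the rule. That is usually the behavior you want, but it is worth looking at the raw numbers before paging anyone.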
The Yahoo behavior is messier. Yahoo’s rollout is “gradual through the first half of the year” without the specific February 1 anchor that Gmail gave. We expect the actual enforcement at Yahoo to start in early February with a small percentage of traffic, mirror Gmail’s progression, and be fully enforced by April or May. Senders who are compliant at Gmail will generally be compliant at Yahoo. The senders who fail Yahoo and pass Gmail are the senders whose authentication has some peculiarity that affects one receiver and not the other. We have not yet seen this combination but expect to in the spring.
What we have already broken trying to fix things
Two specific things have gone wrong in our customer audits that are worth flagging because both are easy to fall into.
The first was a DKIM rotation we did for a customer in early January. We followed the standard sequence: published the new selector, waited for cache propagation, switched signing. What we missed was that the customer’s CRM was relaying through a separate sending service that signed with its own DKIM key under a CNAME we did not know about. When we deprecated the old selector, the CRM mail started failing authentication for the brief window where the CNAME was pointing at the old selector. The fix was to add the CRM service to the audit scope and rotate its key separately. The takeaway is that any service that sends mail using your domain needs to be in the rotation plan. Service inventory is part of compliance.
The second was a DMARC record we published for a customer in November. The record was syntactically valid. The aggregate reports were going to a mailbox we had set up. We checked the reports weekly. What we did not catch was that one of the customer’s affiliates was sending campaigns using the customer’s domain (with permission, but unauthenticated) and the affiliate’s mail was failing DMARC. Because the DMARC policy was p=none, the mail still got delivered, but the customer’s domain reputation was taking a hit from the failures. The affiliate had been doing this for months. The aggregate reports surfaced it. We then had a long conversation with the customer about whether to authenticate the affiliate properly (which they did) or terminate the affiliate program (which they did not). The point is that p=none is a monitoring policy. It does not protect you from your own delegated senders. It surfaces them.
How we are running the change windows for the customers we have to fix
For each remediation cycle, the change window is structured the same way. The 24 hours before the change, we monitor baseline metrics: send volume, bounce rate, complaint rate, Postmaster compliance status. The change itself happens during a low-volume hour for that customer’s audience, typically 02:00 to 04:00 in their primary recipient timezone. We make the change, verify it took effect, and then sit on it for an hour watching for any immediate failures.
The 48 hours after the change, we monitor a more granular set: 4xx and 5xx rates per receiver, authentication pass rates from aggregate reports, any feedback loop spikes from Gmail or Yahoo, any customer reports of failed delivery. We expect normal noise but no sustained changes. If we see a sustained change in either direction we treat it as a signal worth investigating before continuing to the next change.
We do not batch DNS changes. One TXT record change at a time. The temptation to rotate DKIM, update SPF, and publish DMARC all in the same window is strong because each change requires waiting for propagation. The reason we do not batch is that if something fails, we want to know which change caused it. The cost is a few more days of elapsed time. The benefit is a much faster recovery if something goes wrong.
The other thing we do not do is hurry the customer past confidence. If the customer is not certain a change is right, we wait. The deadline is real, but a rushed change that breaks production mail is worse than missing the deadline by two days. We have postponed one customer’s DMARC publication by two days this month because they were not ready to monitor the aggregate reports. We will publish it on January 26. It will be in force by February 1 with a few days to spare. That is enough.