BIG BOX Hosting Guides Procurement Guide 2026 № 60.02

Email infrastructure procurement guide for 2026.

A procurement-grade methodology for email infrastructure vendor selection in 2026. Four developments since 2020 reframed the landscape: Schrems II, the Yahoo/Gmail bulk sender requirements, the Microsoft sender requirements, and the OVH Canada ruling. The guide covers shortlist construction with binary baseline gates, technical evaluation against actual production infrastructure, DPA negotiation timelines as leading indicators, total cost of ownership with engineer-hours counted as a real cost, migration planning, and what this guide deliberately does not cover. About one in five intake conversations we run on the basis of this guide ends with us recommending a different vendor; that is the methodology working as intended.

01  /  Why this guide exists

Procurement-fluent, not vendor-promotional.

Written for procurement teams running an email-infrastructure vendor selection. Synthesises the procurement-side content across this site into a single defensible methodology.

This guide synthesises the procurement-side content published across the rest of this site into a single resource for procurement teams running an email-infrastructure vendor selection in 2026. The sections that follow assume the reader has been delegated the responsibility of running a vendor selection and needs a defensible methodology to take to internal stakeholders. The guide does not assume the reader is a specialist in email deliverability or in the regulatory frameworks that apply to the workload. It does assume the reader can read a contract, can engage with a compliance team, and can distinguish between marketing claims and verifiable infrastructure documentation.

The methodology is the same one we use when we are the customer rather than the vendor. Several of the engagement leads on our side previously ran procurement for in-house email infrastructure at media organisations and fintech operators; the approach below is what we wish we had been handed when we were running those selections, before we had the operational experience to construct it ourselves. The guide is consequently written in a tone that is procurement-fluent rather than vendor-promotional, and the recommendations it produces will not always favour BIG BOX Hosting. About one in five intake conversations we run on the basis of this guide ends with us recommending the customer talk to a different vendor. That is the methodology working as intended.

─────────────────────────────────────────────────────────────────────────
02  /  The 2026 landscape

Schrems II. Bulk sender. OVH Canada. NIS2.

Four developments since 2020 reframed the procurement framework for email infrastructure. Each is recent enough that frameworks built before 2020 likely have gaps.

Four developments since 2020 have reframed the procurement landscape for email infrastructure. The Schrems II ruling of July 2020 (CJEU C-311/18) reframed the lawful-transfer analysis for personal data leaving the EEA. The Yahoo and Gmail bulk sender requirements that took effect in February 2024 hardened the technical baseline that any sender at scale must meet. The Microsoft sender requirements that took effect in May 2025 added comparable obligations for sending to Outlook.com and Hotmail. The OVH Canada ruling of September 2025 reframed the corporate-counterparty analysis for cloud and infrastructure providers more broadly. Each of these developments is recent enough that procurement frameworks built before 2020 are likely to have gaps that need attention.

Schrems II implications. Schrems II invalidated the EU-US Privacy Shield framework and reframed the analysis under which personal data may be transferred outside the EEA. The position has been refined by subsequent case law and regulatory developments (La Quadrature du Net 2020, the EU-US Data Privacy Framework adopted in 2023 as the Privacy Shield successor, the OVH Canada ruling of 2025), but the core analysis remains: a controller transferring personal data to a processor outside the EEA must conduct a transfer impact assessment that includes the legal regime of the recipient country, the corporate structure of the processor, and the practical likelihood that foreign government access could compromise the data. Email infrastructure customers running this analysis since 2020 have produced a measurable shift in vendor selection toward providers whose corporate counterparty is exclusively within the EEA or in adequacy-decision countries. We have run forty-plus migration engagements where this analysis was the primary procurement driver.

Bulk sender requirements. The Yahoo and Gmail bulk sender requirements that took effect in February 2024 specify three baseline obligations for any sender exceeding 5,000 messages per day to a Gmail or Yahoo address: SPF and DKIM authentication on all outbound mail; DMARC published with a policy of at least p=none with an aggregate report destination; and a one-click unsubscribe mechanism (RFC 8058) for marketing mail. The Microsoft requirements that took effect in May 2025 added comparable obligations for Outlook.com and Hotmail with slightly tighter thresholds for one-click unsubscribe placement. A vendor selection that does not explicitly verify the candidate's compliance with these baselines is missing a procurement-grade requirement that is now table stakes.
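The DMARC obligation described above can be checked mechanically once the candidate's `_dmarc` TXT record has been retrieved. The following is a minimal sketch, not a full DMARC parser: the function names are hypothetical, the record strings in the usage note are illustrative, and the requirement for a `rua` destination mirrors this guide's wording of the baseline.

```python
def parse_dmarc(txt: str) -> dict[str, str]:
    """Split a DMARC TXT record into lower-cased tag names and their values."""
    tags: dict[str, str] = {}
    for part in txt.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def meets_bulk_sender_dmarc(txt: str) -> bool:
    """True if the record is DMARC1, carries a valid policy, and names an aggregate report address."""
    tags = parse_dmarc(txt)
    return (
        tags.get("v") == "DMARC1"
        and tags.get("p", "").lower() in {"none", "quarantine", "reject"}
        and tags.get("rua", "").lower().startswith("mailto:")
    )
```

For example, `meets_bulk_sender_dmarc("v=DMARC1; p=none; rua=mailto:dmarc@example.com")` is true, while the same record without the `rua` tag is rejected by this sketch.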

OVH Canada ruling. The OVH Canada ruling of 25 September 2025 produced the largest single shift in our intake conversations across the second half of 2025. The ruling held that a Canadian court could compel OVH (a French EU-domiciled cloud provider) to extract and produce customer data held by group entities outside Canada, on the basis that OVH had a Canadian subsidiary which made the corporate group reachable through Canadian process. The same logic applies to any provider with a corporate footprint in a jurisdiction whose courts can compel production. The procurement consequence: the corporate-counterparty analysis that financial-services teams have done since Schrems II is now the analysis every regulated-industry procurement team needs to do. The DPA negotiation is no longer the only place this analysis surfaces; it surfaces in vendor selection itself.

Sectoral regulatory updates. NIS2 (Directive (EU) 2022/2555) added cybersecurity obligations on digital infrastructure providers across the EU, with national transposition completing across 2024-2025. The Data Act (Regulation (EU) 2023/2854) became applicable on 12 September 2025, adding switching and portability obligations on cloud and edge providers. MiCA (Regulation (EU) 2023/1114) added crypto-asset specific provisions including residency requirements that affect email infrastructure for crypto firms. Each of these adds to the procurement framework rather than replacing the existing GDPR baseline. The compounding effect is that 2026 vendor selections require more documentation than 2020 selections, which in turn rewards vendors who have invested in documentation discipline rather than vendors who have invested in marketing surface.

─────────────────────────────────────────────────────────────────────────
03  /  The procurement framework

Defensible shortlist construction.

Five baseline gates plus vertical-specific gates. Three to five candidates including one unlikely to win. Two procurement modes: formal RFP, or relationship.

The procurement framework starts with a defensible shortlist. The shortlist is defensible if every candidate meets a published baseline that the procurement team can document, and if the published baseline includes the four developments described in section 2. The candidates that fail the baseline are excluded for documented reasons; the candidates that pass are evaluated against the specific workload requirements in subsequent stages. The shortlist construction is the most procurement-political stage of the selection because internal stakeholders frequently advocate for candidates they have personal experience with, and the methodology has to either include those candidates fairly or exclude them transparently.

Baseline gates. The baseline gates we recommend in 2026: the corporate counterparty for the customer contract is established in the EEA or in an adequacy-decision country, with no parent group entity in a non-adequacy jurisdiction that would create CLOUD Act-style process reach; the candidate publishes a sub-processor list at the company name and jurisdiction level, with notification commitments of 30 days minimum for changes; the candidate's standard DPA template references GDPR Article 28 obligations explicitly and includes the Standard Contractual Clauses for the relevant transfer module where transfers outside the EEA occur; the candidate operates infrastructure compliant with the Yahoo/Gmail/Microsoft bulk sender requirements; the candidate publishes a security disclosure file at /.well-known/security.txt with a current contact and GPG fingerprint. Candidates that fail any of these are excluded from the shortlist with the failure documented.
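The binary character of the gates lends itself to a simple record-keeping structure. The sketch below is one possible shape, assuming the five gates above are recorded as true or false per candidate; the gate identifiers, the `Candidate` class and the vendor names are hypothetical, and a gate with no recorded evidence is treated as a failure.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the five baseline gates listed above.
BASELINE_GATES = (
    "eea_or_adequacy_counterparty",
    "subprocessor_list_published",
    "dpa_references_art28_and_sccs",
    "bulk_sender_compliant",
    "security_txt_published",
)


@dataclass
class Candidate:
    name: str
    gates: dict[str, bool] = field(default_factory=dict)


def failed_gates(candidate: Candidate) -> list[str]:
    """Return the gates the candidate fails; missing evidence counts as a failure."""
    return [g for g in BASELINE_GATES if not candidate.gates.get(g, False)]


def build_shortlist(candidates: list[Candidate]) -> tuple[list[str], dict[str, list[str]]]:
    """Split candidates into a shortlist and a documented exclusion record."""
    shortlist: list[str] = []
    exclusions: dict[str, list[str]] = {}
    for candidate in candidates:
        failures = failed_gates(candidate)
        if failures:
            exclusions[candidate.name] = failures
        else:
            shortlist.append(candidate.name)
    return shortlist, exclusions
```

The exclusion record is the part that matters for defensibility: it names which gate each excluded candidate failed, which is what an internal stakeholder advocating for that candidate will ask to see.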

Vertical-specific gates. Add vertical-specific gates per the customer's regulatory profile. Financial-services customers add: explicit reference to FCA SYSC 8 audit-rights latitude (for UK firms), MiFID II Article 16(7) retention support, PSD2 Article 96 incident notification SLAs that match the firm's regulatory window. Media organisations add: source-protection statutory references where the customer's editorial workload requires them, infrastructure metadata hygiene where sources need to be unable to infer the provider from headers. Healthcare customers add: HDS certification or a documented derogation route (the French Healthcare HDS migration case study walks through one such engagement), or the equivalent national health-data hosting framework for non-French customers. Each gate is binary: meets or does not meet. Quantitative criteria belong to subsequent evaluation stages.

Shortlist size and composition. The shortlist should contain three to five candidates. Fewer than three produces insufficient comparison; more than five produces evaluation work that exceeds the value of the additional candidates. Within the shortlist, include at least one candidate that the procurement team genuinely believes is unlikely to win. The presence of that candidate forces the methodology to engage with the boundary cases rather than confirming the pre-selected favourite. We have seen vendor selections where the surprise winner was the candidate the team had originally included as a comparison baseline; the methodology produces this outcome when it is run honestly.

RFP versus relationship. Two procurement modes operate at this point: the formal RFP, where each candidate completes a structured questionnaire and the responses are scored quantitatively, and the relationship mode, where the procurement team takes a 30-minute call with each candidate and evaluates qualitatively. We have seen both modes produce high-quality vendor selections; we have seen both modes produce low-quality selections. The mode is less important than the discipline. Whichever mode is chosen, the methodology must be documented before the conversation begins so that the evaluation cannot be retrofitted to favour a candidate after the conversation has produced a particular impression.

─────────────────────────────────────────────────────────────────────────
04  /  Technical evaluation

Authentication. Deliverability. Responsiveness. Documentation. References.

Verify against actual production infrastructure rather than trusting marketing claims. Test deliverability across two to three days. Communicate enough to observe the realistic response cadence.

Authentication baseline verification. Technical evaluation starts with verifying the authentication baseline against a candidate's actual production infrastructure rather than trusting marketing claims. Use dig +short TXT against the candidate's documented sending domain and its associated record names (the DKIM selector under _domainkey, _dmarc, _mta-sts, and _smtp._tls) and verify that SPF, DKIM, DMARC, MTA-STS, and TLS-RPT records are present and correctly formed. Send a test message to a Gmail account you control and inspect the headers for SPF pass, DKIM pass, DMARC alignment pass. Repeat against Yahoo and Outlook accounts. The candidate that passes this baseline against test accounts is operating the basics correctly; the candidate that fails has a problem that will surface in production within weeks.
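The header inspection step can be made repeatable across the Gmail, Yahoo and Outlook test accounts. The following is a minimal sketch, assuming the receiving provider records its verdicts in an Authentication-Results header in the usual `method=result` form; the function names and the sample header are illustrative, and where a message carries several DKIM signatures this sketch keeps only the first verdict it finds.

```python
import re


def auth_results(header: str) -> dict[str, str]:
    """Extract the first spf, dkim and dmarc verdicts from an Authentication-Results value."""
    verdicts: dict[str, str] = {}
    pattern = r"\b(spf|dkim|dmarc)=(\w+)"
    for method, result in re.findall(pattern, header, flags=re.IGNORECASE):
        verdicts.setdefault(method.lower(), result.lower())
    return verdicts


def passes_baseline(header: str) -> bool:
    """True only if SPF, DKIM and DMARC are all reported as pass."""
    verdicts = auth_results(header)
    return all(verdicts.get(method) == "pass" for method in ("spf", "dkim", "dmarc"))


# Illustrative header value; addresses and domains are placeholders.
SAMPLE = (
    "mx.google.com; dkim=pass header.i=@example.com header.s=s1; "
    "spf=pass (domain of bounce@example.com designates 192.0.2.10 as permitted sender) "
    "smtp.mailfrom=bounce@example.com; "
    "dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=example.com"
)
```

Running `passes_baseline(SAMPLE)` returns true; a header that lacks a DMARC verdict, or that reports `spf=softfail`, returns false and is worth raising with the candidate before the evaluation proceeds.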

Deliverability profile testing. The deliverability profile cannot be evaluated from a single test message. The realistic evaluation requires the candidate to send a representative volume against test accounts on the major mailbox providers across two to three days. We have run this evaluation as part of competitive bids and the results are typically more revealing than the candidate's marketing material. The candidate whose Gmail Postmaster reputation drops into Medium during the test send is unlikely to maintain High reputation under production load. The candidate whose Yahoo throttle threshold trips at 50,000 messages per day per IP has a capacity profile materially different from the one their published documentation claims.

Operational responsiveness. Operational responsiveness is the variable that most vendor selections fail to evaluate before the contract is signed. The candidate that responds to a procurement enquiry within four hours during business days is signalling its standard support pace. The candidate that takes three business days to respond to the first email is signalling something different. We have seen vendors whose pre-sales response was 24 hours and whose post-sales support response was a week, and the gap was a leading indicator of the kind of experience that ends in churn. The procurement team should communicate with the candidate enough times during evaluation that the realistic response cadence is observable. The realistic response cadence is the production-time response cadence.

Documentation depth. Documentation depth is the proxy for operational discipline. Candidates whose documentation surface is thin typically have thin operational discipline behind it. Specifically: a candidate without a published security.txt has not done the 30-minute work to publish a security.txt and the same operational pattern probably extends to less visible work. A candidate without a published sub-processor list has either not maintained one or is unwilling to disclose it; both are procurement risks. A candidate without published incident response runbooks does not necessarily lack runbooks but has chosen not to publish them, which limits the customer's ability to assess the runbook quality before committing.

Reference checks. Reference checks complete the technical evaluation. Ask the candidate for two reference customers in your industry with similar volume. Speak with the references directly rather than reading published case studies. Ask the references three questions: how often does the candidate's infrastructure produce a customer-visible incident, how does the candidate handle the incident when it occurs, and what would the reference change about the relationship if they were starting over. Honest references will answer all three questions concretely. Marketing-coached references will produce general affirmation that does not engage with the specifics. The procurement team that has never had a reference call with a marketing-coached customer has not had enough reference calls.

─────────────────────────────────────────────────────────────────────────
06  /  Cost evaluation

Headline pricing is the smallest variable.

Engineer-hours, deliverability impact, switching cost, migration cost. The four components most procurement teams omit. Each has a defensible methodology for monetisation.

Headline pricing. Headline pricing is the starting point of cost evaluation, not the conclusion. The published per-thousand-message rate or per-month tier is one component of total cost of ownership, and it is rarely the largest component. The components most procurement teams omit: engineer time required to operate the candidate's infrastructure, the cost of sub-optimal deliverability translated into customer-acquisition or revenue impact, the cost of switching if the candidate underperforms, the migration cost to reach the candidate from the current state. Each component has a defensible methodology for monetisation, and a vendor selection that compares only headline pricing has compared the smallest variable in the calculation.

Engineer-hours as a real cost. Engineer time is the single largest hidden cost in most ESP comparisons. A candidate whose service requires four hours of engineer attention per week to operate effectively has an engineer-hour cost of roughly €23,000 per year at the €110 per hour fully-loaded rate that we use as a calculation baseline. A candidate whose service requires forty hours of engineer attention per week (typical for self-managed Postfix or Exim deployments at scale) has an engineer-hour cost of roughly €229,000 per year at the same rate. The headline pricing difference between a fully-managed service and a self-managed alternative is rarely large enough to overcome an engineer-hour delta of that magnitude. The procurement team should model engineer time explicitly, with a published rate the team can defend against finance review, and treat engineer time as part of the contract value rather than as overhead.
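The arithmetic behind the two figures is short enough to keep alongside the cost model. A minimal sketch, assuming 52 operating weeks per year (an assumption of this example; the guide states only the hourly rate and the weekly hours):

```python
WEEKS_PER_YEAR = 52      # assumption: no reduction for holidays or leave
HOURLY_RATE_EUR = 110    # fully-loaded rate used as the guide's baseline


def annual_engineer_cost(hours_per_week: float, rate: float = HOURLY_RATE_EUR) -> float:
    """Annual cost in EUR of a recurring weekly engineer-hour commitment."""
    return hours_per_week * WEEKS_PER_YEAR * rate


managed = annual_engineer_cost(4)        # 4 h/week
self_managed = annual_engineer_cost(40)  # 40 h/week
delta = self_managed - managed
```

With these inputs `managed` is 22,880, `self_managed` is 228,800 and `delta` is 205,920, all in euros per year. Substituting the team's own defended rate changes the totals but not the tenfold ratio between the two scenarios.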

Deliverability cost. Deliverability translates into revenue more directly than most cost models capture. A marketing programme with 78 percent Gmail inbox placement sits 16 percentage points below a comparable programme at 94 percent, and the revenue attached to that gap is recoverable. The recoverable revenue at typical conversion rates (1 to 3 percent on email-driven acquisition) and typical customer lifetime values is meaningful enough to dominate the headline-pricing comparison for any sender above modest scale. The candidate that costs €500 per month more but delivers an additional 12 percentage points of inbox placement has paid for itself within a quarter at most B2C scales. The procurement team should request inbox placement data from each candidate during the technical evaluation and model the revenue impact of the delta against the customer's specific conversion economics.
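One way to model the revenue impact is to convert the placement difference into additional inboxed messages and apply the customer's own conversion economics. The sketch below is illustrative only: the monthly volume and the value per conversion are hypothetical inputs, the 12-point placement gain and the €500 price difference come from the example above, and applying the conversion rate to each additionally inboxed message is a simplifying assumption.

```python
def monthly_revenue_uplift(
    monthly_sends: int,
    placement_gain_pp: float,
    conversion_pct: float,
    value_per_conversion_eur: float,
) -> float:
    """Extra monthly revenue in EUR from a gain in inbox placement.

    placement_gain_pp is in percentage points; conversion_pct is a percentage
    applied to each additionally inboxed message (a simplifying assumption).
    """
    extra_conversions = monthly_sends * placement_gain_pp * conversion_pct / 10_000
    return extra_conversions * value_per_conversion_eur


# Hypothetical sender: 200,000 messages per month, 1 percent conversion, EUR 20 per conversion.
uplift = monthly_revenue_uplift(200_000, 12, 1, 20)
net_gain = uplift - 500  # subtract the extra monthly fee from the guide's example
```

The useful output is the sign and size of `net_gain` under the customer's own numbers; a candidate whose placement advantage does not survive this calculation should not be credited for it in the comparison.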

Switching cost as latent value. Switching cost is the hardest component to quantify because it does not surface until the customer wants to leave. The procurement team should treat the candidate's openness about switching cost as a leading indicator of how easy the actual switch will be. Candidates who publish data export formats, who document standard deprovisioning procedures, who commit to assisting migration at termination — these candidates are signalling that switching cost is low. Candidates whose contracts impose post-termination data-extraction fees, who require advance notice periods longer than the operational reality of the workload, or whose data formats are proprietary — these candidates are signalling that switching cost is high. The Data Act of 2023 (Regulation 2023/2854/EU) tightened the legal baseline on switching obligations, but the contractual specifics still vary materially across vendors, and procurement-time review is the right place to catch the exposure.

Migration cost. Migration cost is the explicit price of leaving the current state to reach the candidate's. Our migration guide describes the typical 21-day pattern in detail, with engineer-hours as the dominant variable. A typical 21-day migration with one dedicated engineer at €110 per hour fully loaded is roughly €18,000 in internal engineer time, plus any vendor-side migration support fee. Migrations beyond the typical envelope (multi-tenant architectures, regulated-industry compliance overlays, in-house Postfix deployments rather than ESP-to-ESP migrations) cost more in proportion to the additional complexity. The procurement team should request a migration-cost estimate from each candidate at the shortlist stage and treat the response as a methodology check rather than as a binding quote.
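The internal-time figure can be reproduced with a small helper. This is a sketch under stated assumptions: the eight-hour day is an assumption of this example, and the roughly €18,000 figure is reproduced only when all 21 days are counted as engineer days. Counting three weeks as 15 working days gives a lower number, so the procurement team should state which reading it is using.

```python
HOURLY_RATE_EUR = 110  # fully-loaded rate used as the guide's baseline
HOURS_PER_DAY = 8      # assumption: one engineer, eight hours per day


def migration_engineer_cost(
    engineer_days: int,
    vendor_fee_eur: float = 0,
    rate: float = HOURLY_RATE_EUR,
) -> float:
    """Internal engineer time plus any vendor-side migration fee, in EUR."""
    return engineer_days * HOURS_PER_DAY * rate + vendor_fee_eur


all_days = migration_engineer_cost(21)      # every day of the 21-day pattern
working_days = migration_engineer_cost(15)  # three five-day working weeks
```

Here `all_days` is 18,480 and `working_days` is 13,200, both in euros. Any vendor-side support fee is passed in as `vendor_fee_eur` and added on top.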

─────────────────────────────────────────────────────────────────────────
07  /  Migration planning

Selection produces a contract. Migration produces reality.

The 21-day pattern fits most ESP-to-dedicated migrations within a moderately complex envelope. The 6-week pattern applies for workloads beyond it. Realistic timelines produce successful migrations.

Vendor selection produces a contract; migration produces operational reality. The two are connected. Procurement teams who treat selection as the end of the work routinely discover that the chosen candidate's marketing material described an architecture different from what the migration actually produces. The procurement team should require the candidate to walk through the migration plan during the evaluation stage rather than after contract signature. The walk-through reveals whether the candidate has run comparable migrations before and whether their estimated timeline matches the customer's operational reality.

The 21-day pattern. The 21-day migration pattern documented in our migration guide applies to most ESP-to-dedicated migrations within a moderately complex envelope: between one and four sending domains, monthly volume between one and ten million messages, one dedicated engineer for three weeks. The pattern breaks into discovery and DNS audit (days 1-3), authentication remediation (days 4-7), IP warmup (days 8-14), two-stage cutover (days 15-18), DPA execution and DNS cleanup (days 19-21). The pattern is achievable but not comfortable; teams that try to run it as a side project alongside ongoing feature work consistently miss the timeline by 50 to 100 percent.

The 6-week alternative. For workloads beyond the 21-day envelope, the 6-week pattern applies. Very high volume senders (above 50 million messages a month), customers under active regulatory inquiry, multi-tenant architectures, and in-house Postfix or Exim migrations all need the longer timeline. The German Media case study documents a six-month engagement that combined regulatory remediation with infrastructure migration, and the timeline reflected the regulatory side rather than the technical side. The procurement team should match the timeline to the workload realistically rather than committing to the headline 21-day number for a migration the workload cannot support in 21 days. Realistic timelines produce successful migrations; aspirational timelines produce migrations that arrive at week six in a hybrid state harder to manage than either endpoint.

─────────────────────────────────────────────────────────────────────────
08  /  Boundaries

What this guide doesn't cover.

Marketing automation. Inbound security gateways. Volumes outside the dedicated-economic envelope. When to engage external advisors.

Marketing-tooling decisions. This guide does not cover marketing automation platform selection. The choice between HubSpot or Marketo or Braze or Customer.io or Iterable, alongside the dozens of comparable tools in the market, is a marketing-team decision rather than an infrastructure-team decision, and the methodology for evaluating those tools is different from the methodology for evaluating the underlying delivery infrastructure. Email infrastructure sits beneath the marketing automation platform; the procurement team should evaluate the two layers separately. A marketing automation platform that integrates well with multiple delivery infrastructures gives the customer flexibility; a marketing automation platform locked to a single delivery infrastructure couples two procurement decisions that should remain decoupled.

Email security gateways. This guide does not cover inbound email security gateway selection. The choice of Mimecast or Proofpoint or Barracuda or any comparable inbound-mail security tool is operationally distinct from outbound delivery infrastructure. Some vendors offer both capabilities; the integration is rarely as deep as the marketing material implies, and the procurement team should evaluate inbound security and outbound delivery as separate decisions even when a single vendor offers both. Where overlap exists (vendors who provide both inbound and outbound on a unified platform), the procurement team should specifically test the integration claims rather than trusting them.

Volumes outside our economic envelope. This guide does not cover vendor selection for senders below 100,000 messages per month. The economics of dedicated infrastructure are not competitive at that volume against shared-IP services, and the procurement framework above is over-engineered for the actual decision. Senders at this volume should evaluate Postmark or SES or another shared-infrastructure service without applying the full procurement framework. The framework starts producing returns above roughly 500,000 messages per month and scales from there. Senders at very high volume (above 100 million messages per month) need procurement frameworks that include per-IP capacity planning, multi-region failover, and contractual capacity commitments that go beyond what this guide covers.

When to engage external advisors. This guide does not replace external procurement advisors where the procurement is genuinely complex. Customers running their first email-infrastructure procurement, customers under active regulatory pressure, customers whose volume sits at the boundary between shared and dedicated, customers whose vertical has sectoral requirements not covered above — each of these benefits from external advisory. We provide a 30-minute discovery call as part of our intake process and route customers to specialist advisors in our network where the customer's situation needs more than what we can provide directly. The honest framing is that not every procurement should be run by the customer's procurement team alone. Recognising when to call in support is itself a procurement skill.

─────────────────────────────────────────────────────────────────────────

Procurement conversation?

The 30-minute discovery call we offer covers the regulatory profile that applies to your workload, the jurisdiction question, the procurement-framework gates, the technical-fit assessment, and the timeline expectations. About one in five intake calls ends with us recommending you talk to a different vendor — that is part of the methodology this guide describes. Procurement teams running their first email infrastructure selection, or selections under active regulatory pressure, are particularly welcome on the call. We have run roughly seventy engagements informing the methodology above and most of the value of the conversation is in the boundary cases.