Multi-cloud is one of those strategies that sounds unambiguously good in a board presentation. Vendor independence, best-of-breed services, resilience against cloud provider outages — what's not to like?
The reality is more complicated. Multi-cloud done right is genuinely valuable. Multi-cloud done carelessly is expensive, complex, and ironically more fragile than a well-executed single-cloud strategy.
Let's work through the trade-offs honestly.
What People Mean by "Multi-Cloud"
The term covers very different architectural patterns:
Active-Active Multi-Cloud
Your application runs simultaneously across AWS and GCP (for example), with traffic split between them. This provides the highest resilience and theoretically eliminates cloud provider lock-in.
Active-Passive Multi-Cloud
Primary workloads run on one cloud. A secondary cloud sits ready for failover, potentially with replicated data. You only use it when the primary has problems.
Workload-Segmented Multi-Cloud
Different services run on different clouds based on where they're strongest. ML training on GCP because of TPUs, customer-facing services on AWS because of CloudFront's global reach, video processing on Azure because of your existing Microsoft licensing.
SaaS-Integration Multi-Cloud
Your core infrastructure is single-cloud, but you integrate with cloud-native SaaS services that happen to run on other providers. This is what most companies actually do.
Each of these has a very different cost-benefit profile.
When Multi-Cloud Actually Makes Sense
Regulatory Requirements
Some industries have explicit requirements about data residency or provider diversification. Financial services in certain regions, government contracts, healthcare data in some jurisdictions. If a regulator or contract requires it, the decision is made for you.
Specific Service Superiority
When one cloud genuinely offers a service that is materially better for your use case, it may be worth the complexity:
GCP BigQuery → Petabyte-scale analytics at low cost
AWS SageMaker → ML model deployment and inference
Azure AD → Enterprise identity when customers use Microsoft
Cloudflare → Edge computing closer to users than any single cloud
The key word is "materially better" — not marginally better with different syntax. The improvement needs to justify the operational complexity of running across multiple clouds.
Negotiating Leverage
A credible multi-cloud capability changes your position in contract negotiations. Cloud providers discount significantly when you demonstrate you're not locked in. If you're spending $2M+/year on cloud, this leverage can be worth millions in savings — even if you never actually run workloads on the secondary cloud.
Acquisition-Driven Heterogeneity
You acquired a company that runs on Azure. Your core business runs on AWS. Migrating everything to one cloud takes 18 months and significant risk. Running multi-cloud temporarily while you migrate is completely rational.
When Multi-Cloud Is the Wrong Choice
Early-Stage Companies
If you're pre-product-market-fit or a small team, multi-cloud will consume engineering time that should go into the product. Single-cloud with good practices is almost always right here.
The cloud provider risk is overstated for most businesses. Major cloud providers have extraordinary uptime records. AWS's S3 availability SLA is 99.9%, which permits roughly 8.8 hours of downtime per year. Most businesses lose far more uptime to their own code.
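That arithmetic is easy to check, since an availability percentage converts directly into a yearly downtime budget:

```python
def max_downtime_hours_per_year(sla_percent: float) -> float:
    """Hours per year of unavailability an SLA still permits."""
    return (1 - sla_percent / 100) * 365 * 24

# A 99.9% SLA permits roughly 8.8 hours of downtime per year;
# one more nine shrinks that to under an hour.
print(round(max_downtime_hours_per_year(99.9), 2))   # 8.76
print(round(max_downtime_hours_per_year(99.99), 2))  # 0.88
```

Note that this is the downtime the SLA tolerates before credits kick in, not the downtime you should actually expect.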
When the Team Lacks Cloud Depth
Multi-cloud requires expertise across multiple platforms. If your team is already stretched learning AWS deeply, adding GCP doesn't double your capability — it distributes your learning across two systems and halves your depth in each.
A strong single-cloud team will outperform a weak multi-cloud team every time.
When Costs Are a Concern
Multi-cloud costs more. Not a little more — often significantly more:
Cost drivers of multi-cloud:
- Egress fees when data moves between clouds
- Duplicate tooling and licensing
- Separate monitoring, security, and compliance stacks
- Engineering time to maintain abstractions
- More complex debugging and incident response
- Training for multiple platforms
Analysis by multiple organizations consistently shows that egress fees alone can add 20-40% to cloud bills when running active-active across providers.
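The 20-40% figure is easy to sanity-check with rough numbers. A back-of-envelope sketch (the flat rate and volumes below are illustrative, not any provider's actual pricing):

```python
def egress_share_of_bill(monthly_bill_usd: float, cross_cloud_gb: float,
                         rate_per_gb: float = 0.09) -> float:
    """Fraction of the monthly bill added by inter-cloud egress at a flat rate."""
    return cross_cloud_gb * rate_per_gb / monthly_bill_usd

# Hypothetical: $100k/month spend, 300 TB/month replicated between clouds
share = egress_share_of_bill(100_000, 300_000)
print(f"Egress adds {share:.0%} on top of the bill")  # → Egress adds 27% on top of the bill
```

Even a modest active-active replication volume lands squarely in that 20-40% band.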
The Lock-In Question
Vendor lock-in is the most commonly cited reason for multi-cloud, and it's frequently misunderstood.
What Actually Locks You In
High lock-in risk:
- Proprietary managed databases (DynamoDB, Cosmos DB, Firestore)
- Cloud-native serverless (Lambda, Cloud Functions, Azure Functions)
- Proprietary ML services with custom APIs
- Cloud-specific networking constructs
Low lock-in risk:
- Compute (EC2, GCE — they're all VMs)
- Kubernetes (largely portable with care)
- Standard databases (RDS PostgreSQL, Cloud SQL — both are Postgres)
- Object storage (S3-compatible APIs are everywhere)
- Standard queueing (SQS vs Pub/Sub are different but not that different)
You can reduce lock-in significantly without going multi-cloud by choosing services with open standards. Running PostgreSQL on RDS locks you in to the RDS service, not to AWS — migrating to Cloud SQL or a self-hosted PostgreSQL is straightforward.
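A concrete way to see this: the connection string is the only thing that changes between managed Postgres offerings. A small sketch (both hostnames are hypothetical):

```python
from urllib.parse import urlparse

# Hypothetical endpoints; everything except the host is identical.
RDS_DSN      = "postgresql://app@mydb.abc123.us-east-1.rds.amazonaws.com:5432/appdb"
CLOUDSQL_DSN = "postgresql://app@10.20.0.5:5432/appdb"

def dsn_shape(dsn: str):
    """Everything the application code depends on, minus the host."""
    u = urlparse(dsn)
    return (u.scheme, u.username, u.port, u.path)

# Migrating means swapping the host, not rewriting the application.
assert dsn_shape(RDS_DSN) == dsn_shape(CLOUDSQL_DSN)
```

The caveats are extensions and engine versions, which you control, not the cloud provider.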
The Hidden Abstraction Tax
Teams trying to stay cloud-agnostic often build abstraction layers:
# Instead of using cloud-native services directly...
from myapp.storage import ObjectStorage # Our abstraction
storage = ObjectStorage()
storage.put("key", data) # Works on S3 or GCS
# Behind the scenes:
class ObjectStorage:
    def put(self, key, data):
        if settings.CLOUD == "aws":
            self.s3_client.put_object(Bucket=self.bucket, Key=key, Body=data)
        elif settings.CLOUD == "gcp":
            blob = self.bucket_ref.blob(key)
            blob.upload_from_string(data)
This abstraction looks reasonable. In practice, it:
- Constrains you to the common denominator of both APIs
- Prevents you from using advanced features of either
- Adds a maintenance burden forever
- Rarely gets tested against the secondary cloud
- Usually doesn't actually work when you try to switch
The portability you paid for often isn't there when you need it.
If You Do Go Multi-Cloud: Practical Architecture
If the business case is solid, here's how to execute it well.
Build on Open Standards
Choose services that have broadly adopted APIs:
# Kubernetes: runs on any cloud
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: myorg/api:1.0.0
# This runs on EKS, GKE, or AKS without changes
Portable technology choices:
- Kubernetes for compute orchestration
- PostgreSQL for relational data (multiple managed options)
- Kafka or Pulsar for event streaming
- Redis for caching (Elasticache, Memorystore, Azure Cache all work)
- Terraform for infrastructure provisioning
- Prometheus + Grafana for observability
Unified Observability Is Non-Negotiable
The worst part of multi-cloud incidents is context-switching between monitoring tools. Invest in a single observability plane:
# Illustrative agent inventory (not literal Datadog agent syntax):
# one agent per cluster, every cluster reporting to the same place
agents:
  - name: aws-cluster-agent
    provider: aws
    region: us-east-1
  - name: gcp-cluster-agent
    provider: gcp
    region: us-central1
# All metrics flow to one Datadog organization
# One dashboard, one alert policy, one on-call rotation
A unified observability tool pays for itself the first time you troubleshoot a cross-cloud incident.
Separate Concerns by Cloud
Rather than running the same workload on both clouds (complex), run different workloads on each (manageable):
AWS:
- Customer-facing API and web app
- Relational database (RDS)
- Media storage (S3)
- Email sending (SES)
GCP:
- Data warehouse (BigQuery)
- ML training pipelines (Vertex AI)
- Analytics ingestion (Pub/Sub → BigQuery)
This is workload-segmented multi-cloud, and it's the most pragmatic form. Each cloud does what it does best. You don't need complex cross-cloud failover because the workloads aren't duplicated.
Cross-Cloud Networking
If you need low-latency connectivity between clouds, use dedicated interconnects rather than traversing the public internet:
Options:
- AWS Direct Connect + GCP Cloud Interconnect meeting at a colocation facility
- Equinix Fabric for any-to-any connectivity
- Aviatrix for multi-cloud network abstraction
- VPN tunnels (simpler but higher latency and variable bandwidth)
Data transfer costs and latency between clouds via the public internet are higher than most teams expect. Budget for this explicitly.
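Putting rough numbers on transfer time helps with that budgeting. A sketch for a bulk cross-cloud copy (link speed and the 70% effective-utilization figure are assumptions):

```python
def bulk_transfer_hours(data_gb: float, link_gbps: float,
                        efficiency: float = 0.7) -> float:
    """Wall-clock hours to move data over a link at assumed utilization."""
    gigabits = data_gb * 8
    seconds = gigabits / (link_gbps * efficiency)
    return seconds / 3600

# 10 TB over a 10 Gbps interconnect at 70% effective utilization
print(round(bulk_transfer_hours(10_000, 10), 1))  # 3.2 (hours)
```

Run the same numbers against a VPN's realistic throughput and the case for a dedicated interconnect usually makes itself.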
The Decision Framework
Before committing to multi-cloud, answer these questions:
1. Is there a regulatory or contractual requirement?
Yes → Scope the minimum required multi-cloud footprint
No → Continue to question 2
2. Is there a specific service that is materially better on another cloud?
Yes → Consider workload-segmented approach for that service
No → Continue to question 3
3. Are you spending enough to justify negotiating leverage?
$2M+/year → The leverage may justify investment
Less → Not worth it yet
4. Do you have the engineering capacity to maintain two clouds?
Yes → Proceed with clear scope
No → Single cloud, invest in portability instead
Default answer: Single cloud, done well.
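The framework above is mechanical enough to encode. A minimal sketch (the $2M threshold and the ordering mirror the questions; the return strings are just labels):

```python
def multicloud_recommendation(regulatory: bool, superior_service: bool,
                              annual_spend_usd: float, team_capacity: bool) -> str:
    """Walks the four questions in order; the default is single cloud."""
    if regulatory:                        # Q1: requirement decides for you
        return "scope the minimum required multi-cloud footprint"
    if superior_service:                  # Q2: materially better service
        candidate = "workload-segmented multi-cloud for that service"
    elif annual_spend_usd >= 2_000_000:   # Q3: negotiating leverage
        candidate = "multi-cloud capability for negotiating leverage"
    else:
        return "single cloud, done well"
    if team_capacity:                     # Q4: can you actually maintain it?
        return candidate
    return "single cloud; invest in portability instead"

print(multicloud_recommendation(False, False, 500_000, True))
# → single cloud, done well
```

Note that capacity (question 4) gates everything except a hard regulatory requirement, which you must meet regardless.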
A Reasonable Middle Path
For most companies, the right answer is:
- Primary cloud: Pick one, go deep, master it
- Portability hygiene: Use Kubernetes, avoid the most proprietary services, prefer open APIs
- Secondary SaaS: Use cloud-native SaaS on other providers freely (Snowflake, Cloudflare, etc.)
- Egress awareness: Design your data architecture to minimize cross-provider data movement
- Documented exit plan: Know theoretically how you'd migrate, even if you never execute it
This gives you most of the benefits of multi-cloud without most of the costs.
The cloud is infrastructure. It should enable your product, not become your product. Make the choice that lets your team stay focused on what creates value for your customers.
Building something that needs to scale? We help teams architect systems that grow with their business. scopeforged.com