Cloud Cost Optimization | Strategies for AWS, Azure, and GCP

Cloud cost optimization balances performance requirements against spending. Without active management, cloud bills grow unpredictably as teams provision resources for peak loads, forget to decommission experiments, or choose expensive services when cheaper alternatives suffice. Systematic cost optimization reduces waste while maintaining application performance.

The pay-as-you-go model that makes cloud attractive also makes costs difficult to predict. Understanding cloud pricing models, identifying waste, and implementing cost controls helps organizations benefit from cloud flexibility without budget surprises.

Understanding Cloud Costs

Cloud costs come from multiple sources: compute (VMs, containers, serverless), storage (block, object, database), network (data transfer, load balancers), and managed services (databases, caches, queues). Each has different pricing dimensions.

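Compute pricing depends on instance type, region, and commitment level. On-demand instances cost the most but offer the most flexibility. Reserved instances (one- or three-year commitments) typically reduce costs by 30-70%. Spot or preemptible instances offer 60-90% discounts, but the provider can reclaim them on short notice (two minutes on AWS, 30 seconds on GCP), so they suit only workloads that tolerate interruption.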

// Cost-aware instance selection
class InstanceRecommender
{
    public function recommend(array $requirements): array
    {
        $options = [];

        // On-demand baseline
        $options['on_demand'] = [
            'type' => $this->findInstance($requirements),
            'cost' => $this->getOnDemandPrice($requirements),
            'availability' => 'guaranteed',
        ];

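        // Reserved instance option
        // (0.6 assumes a 40% discount and 7 months is a rough break-even;
        // both are illustrative, so substitute your provider's actual rates)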
        $options['reserved_1yr'] = [
            'type' => $this->findInstance($requirements),
            'cost' => $this->getOnDemandPrice($requirements) * 0.6,
            'commitment' => '1 year',
            'break_even_months' => 7,
        ];

        // Spot instance option
        $options['spot'] = [
            'type' => $this->findInstance($requirements),
            'cost' => $this->getSpotPrice($requirements),
            'availability' => 'interruptible',
            'suitable_for' => 'fault-tolerant workloads',
        ];

        return $options;
    }
}
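
The hard-coded `break_even_months` in the recommender can be derived rather than assumed. The following is a minimal sketch, assuming an all-upfront reservation; the function name, the 100/month price, and the 60% ratio are illustrative placeholders, not provider figures.

```php
<?php

/**
 * Months of continuous use after which an all-upfront reservation becomes
 * cheaper than paying on-demand. Hypothetical helper for illustration.
 */
function reservationBreakEvenMonths(float $onDemandMonthly, float $upfrontCost): float
{
    if ($onDemandMonthly <= 0) {
        throw new InvalidArgumentException('On-demand price must be positive.');
    }

    // Each month of use avoids one month of on-demand charges
    return $upfrontCost / $onDemandMonthly;
}

// Placeholder prices: 100/month on-demand, reservation at 60% of a year's on-demand spend
$onDemandMonthly = 100.0;
$upfront = $onDemandMonthly * 12 * 0.6;

echo round(reservationBreakEvenMonths($onDemandMonthly, $upfront), 1), " months\n";
```

If the instance is likely to be retired before the break-even month, on-demand is the cheaper choice despite the higher hourly rate.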

Storage costs accumulate continuously. Object storage is cheap per GB, but request and retrieval charges add up. Block storage is billed on provisioned size, not used size. Database storage often carries separate I/O charges.
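
Because block storage is billed on what is provisioned, the gap between provisioned and used space is money spent on nothing. Here is a rough sketch of that arithmetic; `blockStorageWaste` and the 0.10 per GB-month rate are hypothetical, so substitute the real rate for your volume type.

```php
<?php

/**
 * Monthly cost of a block volume and the part of it paying for empty space.
 * Hypothetical helper; $pricePerGbMonth is a placeholder, not a published rate.
 */
function blockStorageWaste(int $provisionedGb, int $usedGb, float $pricePerGbMonth): array
{
    $monthlyCost = $provisionedGb * $pricePerGbMonth;   // billed on provisioned size
    $unusedGb = max($provisionedGb - $usedGb, 0);

    return [
        'monthly_cost' => $monthlyCost,
        'unused_gb' => $unusedGb,
        'wasted_cost' => $unusedGb * $pricePerGbMonth,
    ];
}

$report = blockStorageWaste(provisionedGb: 500, usedGb: 120, pricePerGbMonth: 0.10);

printf(
    "Paying %.2f per month, %.2f of it for unused space\n",
    $report['monthly_cost'],
    $report['wasted_cost']
);
```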

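Network costs are often overlooked. Data transfer between regions or out to the internet is billed per GB. Traffic within the same availability zone is usually free, while cross-zone traffic may carry a small charge, which favors keeping chatty services close together.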
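
A quick estimate makes transfer charges visible before the bill arrives. The sketch below multiplies monthly volume by a per-GB rate for each traffic path. The path names and rates are assumptions made up for illustration; real rates vary by provider, region pair, and volume tier.

```php
<?php

/**
 * Estimate monthly data transfer charges from per-path volumes.
 * Hypothetical helper; the caller supplies the rate table.
 */
function estimateTransferCost(array $gbByPath, array $ratePerGb): float
{
    $total = 0.0;

    foreach ($gbByPath as $path => $gb) {
        if (!array_key_exists($path, $ratePerGb)) {
            throw new InvalidArgumentException("No rate defined for path: {$path}");
        }
        $total += $gb * $ratePerGb[$path];
    }

    return $total;
}

// Placeholder rates per GB and monthly volumes in GB
$rates = ['same_zone' => 0.00, 'cross_zone' => 0.01, 'cross_region' => 0.02, 'internet' => 0.09];
$traffic = ['same_zone' => 5000, 'cross_zone' => 2000, 'cross_region' => 500, 'internet' => 1000];

printf("Estimated transfer cost: %.2f\n", estimateTransferCost($traffic, $rates));
```

In this made-up mix the internet egress path dominates even though it carries far less traffic than the free same-zone path, which is the usual shape of a transfer bill.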

Identifying Waste

Idle resources are the most common source of waste: VMs that run but do no work, databases provisioned for peak load, and storage holding forgotten data.

// Find underutilized resources
class ResourceAnalyzer
{
    public function findIdleInstances(): array
    {
        $instances = $this->cloudProvider->getInstances();

        return collect($instances)
            ->filter(function ($instance) {
                $metrics = $this->getMetrics($instance->id, 'cpu', 'week');
                $avgCpu = collect($metrics)->average();

                // Flag instances with < 5% average CPU; skip instances with
                // no metrics, since average() returns null for an empty set
                return $avgCpu !== null && $avgCpu < 5;
            })
            ->map(fn ($i) => [
                'id' => $i->id,
                'type' => $i->type,
                'monthly_cost' => $this->getMonthlyCost($i),
                'recommendation' => $this->getRecommendation($i),
            ])
            ->values()  // Reindex so the result serializes as a list
            ->toArray();
    }

    private function getRecommendation($instance): string
    {
        // Recommend downsizing or termination
        return $instance->environment === 'development'
            ? 'Consider terminating during off-hours'
            : 'Consider downsizing to smaller instance type';
    }
}

Orphaned resources accumulate over time: load balancers with no targets, EBS volumes detached from instances, and snapshots of resources that no longer exist.

# Find unattached EBS volumes (AWS CLI)
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[*].[VolumeId,Size,CreateTime]'

# Find unused elastic IPs (backticks mark a JMESPath literal null)
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]'
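
The volume list from the first command can be turned into a monthly cost figure. This sketch assumes the command was run with JSON output, which yields a list of `[VolumeId, Size, CreateTime]` rows. `orphanedVolumeCost`, the sample volume IDs, and the 0.08 rate are all made up for illustration.

```php
<?php

/**
 * Sum the monthly cost of unattached volumes from the JSON printed by
 * `aws ec2 describe-volumes --query 'Volumes[*].[VolumeId,Size,CreateTime]'`.
 * $pricePerGbMonth is a placeholder; look up the real rate for the volume type.
 */
function orphanedVolumeCost(string $json, float $pricePerGbMonth): array
{
    $rows = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $totalGb = 0;
    foreach ($rows as [$volumeId, $sizeGb, $createdAt]) {
        $totalGb += $sizeGb;
    }

    return [
        'volumes' => count($rows),
        'total_gb' => $totalGb,
        'monthly_cost' => $totalGb * $pricePerGbMonth,
    ];
}

// Sample output with made-up volume IDs
$sample = '[["vol-0aaa", 100, "2023-01-15T10:00:00+00:00"], ["vol-0bbb", 50, "2022-11-02T08:30:00+00:00"]]';

print_r(orphanedVolumeCost($sample, 0.08));
```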

Over-provisioned resources are sized for peak loads that rarely occur. Right-size based on observed usage, not anticipated maximums.

Right-Sizing

Right-sizing matches resource allocation to actual needs. Over-provisioning wastes money; under-provisioning hurts performance.

class RightSizingRecommender
{
    public function analyze(string $instanceId): array
    {
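        // Metrics are assumed to be in absolute units (vCPUs in use, GiB in use),
        // so they can be compared directly with the instance specs below
        $metrics = $this->getMetrics($instanceId, days: 30);

        $cpuP95 = $this->percentile($metrics['cpu'], 95);
        $memP95 = $this->percentile($metrics['memory'], 95);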

        $currentType = $this->getInstance($instanceId)->type;
        $currentSpecs = $this->getSpecs($currentType);

        $recommendations = [];

        // CPU right-sizing
        if ($cpuP95 < $currentSpecs['vcpu'] * 0.3) {
            $recommendations[] = [
                'metric' => 'cpu',
                'current' => $currentSpecs['vcpu'],
                'recommended' => ceil($cpuP95 / 0.5),  // Target 50% utilization
                'savings' => $this->calculateSavings($currentType, 'cpu', $cpuP95),
            ];
        }

        // Memory right-sizing
        if ($memP95 < $currentSpecs['memory'] * 0.5) {
            $recommendations[] = [
                'metric' => 'memory',
                'current' => $currentSpecs['memory'],
                'recommended' => ceil($memP95 / 0.7),  // Target 70% utilization
                'savings' => $this->calculateSavings($currentType, 'memory', $memP95),
            ];
        }

        return [
            'instance_id' => $instanceId,
            'current_type' => $currentType,
            'recommended_type' => $this->findBestFit($recommendations),
            'monthly_savings' => array_sum(array_column($recommendations, 'savings')),
        ];
    }
}
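
`RightSizingRecommender` calls a `percentile` helper that is not shown. One possible version, using the nearest-rank method, is sketched below as a standalone function. Nearest-rank is one common definition of a percentile and not necessarily the one the original code uses.

```php
<?php

/**
 * Nearest-rank percentile: the smallest sample such that at least
 * $percent percent of all samples are less than or equal to it.
 */
function percentile(array $samples, float $percent): float
{
    if ($samples === []) {
        throw new InvalidArgumentException('Cannot take a percentile of no samples.');
    }
    if ($percent <= 0 || $percent > 100) {
        throw new InvalidArgumentException('Percent must be in the range (0, 100].');
    }

    sort($samples);   // sorts the local copy; the caller's array is untouched
    $rank = (int) ceil($percent * count($samples) / 100);

    return (float) $samples[$rank - 1];
}

// 99 quiet readings and one spike: the 95th percentile ignores the spike
$cpu = array_merge(array_fill(0, 99, 20), [95]);

echo percentile($cpu, 95), "\n";
```

Sizing to the 95th percentile rather than the maximum keeps a single short spike from justifying a larger instance, at the cost of brief saturation during that spike.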

For databases, right-size based on query patterns, not data size. A 100GB database with simple queries needs less compute than a 10GB database with complex analytics.

Reserved Capacity

Reserved instances and savings plans provide significant discounts for committed usage. The tradeoff is reduced flexibility.

use Illuminate\Support\Collection;

class ReservationPlanner
{
    public function recommendReservations(): array
    {
        $usage = $this->getHistoricalUsage(months: 6);

        // Find steady-state baseline
        $baseline = $this->calculateBaseline($usage);

        // Only reserve capacity that's consistently used
        $reservable = collect($usage['instances'])
            ->groupBy('type')
            ->filter(fn ($instances) => $this->isStable($instances))
            ->map(function ($instances, $type) {
                // Reserve only the floor of usage; savings apply to that count
                $count = $this->getMinimumCount($instances);

                return [
                    'type' => $type,
                    'count' => $count,
                    'term' => $this->recommendTerm($instances),
                    'annual_savings' => $this->calculateSavings($type, $count),
                ];
            });

        return [
            'recommendations' => $reservable->toArray(),
            'total_annual_savings' => $reservable->sum('annual_savings'),
            'coverage_percent' => $this->calculateCoverage($reservable, $usage),
        ];
    }

    private function isStable(Collection $instances): bool
    {
        // Instance type used consistently over 6 months
        return $instances->every(fn ($i) => $i['months_active'] >= 5);
    }
}
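
The planner leaves the "steady-state baseline" calculation undefined. One way to approach it, sketched here under the assumption that hourly running-instance counts are available, is to reserve the largest count that was running in at least a chosen share of hours. `reservableBaseline` and the 90% default are hypothetical choices, not part of the original class.

```php
<?php

/**
 * Largest instance count that was running in at least $minCoveragePercent
 * percent of the sampled hours. Hypothetical helper for illustration.
 */
function reservableBaseline(array $hourlyCounts, int $minCoveragePercent = 90): int
{
    if ($minCoveragePercent < 1 || $minCoveragePercent > 100) {
        throw new InvalidArgumentException('Coverage must be between 1 and 100.');
    }
    if ($hourlyCounts === []) {
        return 0;
    }

    sort($hourlyCounts);
    $n = count($hourlyCounts);

    // Number of hours allowed to fall below the baseline (integer math, no rounding drift)
    $allowedBelow = intdiv((100 - $minCoveragePercent) * $n, 100);

    return (int) $hourlyCounts[$allowedBelow];
}

// Ten sampled hours: mostly 4 instances, one dip to 2, a few bursts
$counts = [4, 4, 10, 4, 2, 5, 4, 8, 6, 4];

echo reservableBaseline($counts, 90), " instances at 90% coverage\n";
echo reservableBaseline($counts, 100), " instances at 100% coverage\n";
```

Lowering the coverage threshold reserves more capacity and saves more while demand holds, but a reservation that sits idle during dips is paid for regardless.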

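Savings Plans (AWS) and flexible Committed Use Discounts (GCP) offer more flexibility than reserved instances. The most flexible variants apply the discount to any matching compute usage regardless of instance type or region; narrower variants, such as AWS EC2 Instance Savings Plans, give up some of that flexibility in exchange for a deeper discount.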

Auto-Scaling

Auto-scaling matches capacity to demand, avoiding over-provisioning for peak loads.

# Kubernetes Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait before scaling down
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
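
To see how the autoscaler turns `averageUtilization: 70` into a replica count, it helps to work through the scaling formula from the Kubernetes documentation: desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within a 10% tolerance of 1.0. The sketch below is a simplified illustration with a hypothetical function name; the real controller also accounts for unready pods, missing metrics, and the `behavior` policies.

```php
<?php

/**
 * Simplified version of the HPA replica calculation, clamped to the
 * configured bounds. Illustration only, not the controller's full logic.
 */
function desiredReplicas(
    int $currentReplicas,
    float $currentUtilization,
    float $targetUtilization,
    int $minReplicas,
    int $maxReplicas,
    float $tolerance = 0.1
): int {
    if ($targetUtilization <= 0) {
        throw new InvalidArgumentException('Target utilization must be positive.');
    }

    $ratio = $currentUtilization / $targetUtilization;

    // Within tolerance of the target: leave the replica count alone
    if (abs($ratio - 1.0) <= $tolerance) {
        return $currentReplicas;
    }

    $desired = (int) ceil($currentReplicas * $ratio);

    return max($minReplicas, min($desired, $maxReplicas));
}

// Values from the manifest: target 70%, between 2 and 10 replicas
echo desiredReplicas(4, 90.0, 70.0, 2, 10), " replicas at 90% CPU\n";
echo desiredReplicas(4, 72.0, 70.0, 2, 10), " replicas at 72% CPU\n";
```

With the manifest's `scaleDown` policy, a computed decrease is applied gradually (at most 10% of replicas per minute, after the 300-second stabilization window) rather than in one step.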

For VMs, use managed instance groups or auto-scaling groups with appropriate policies:

// Scale based on queue depth
class ScalingDecider
{
    public function shouldScale(): ScalingDecision
    {
        $queueDepth = Queue::size('default');
        $currentWorkers = $this->getWorkerCount();
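        $processingRate = $this->getProcessingRate();  // Jobs per second, per worker

        if ($processingRate <= 0) {
            // No throughput data yet; hold steady rather than divide by zero
            return new ScalingDecision('none', $currentWorkers);
        }

        $requiredWorkers = (int) ceil($queueDepth / ($processingRate * 300));  // Clear in 5 min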

        if ($requiredWorkers > $currentWorkers * 1.2) {
            return new ScalingDecision('up', min($requiredWorkers, $this->maxWorkers));
        }

        if ($requiredWorkers < $currentWorkers * 0.5 && $currentWorkers > $this->minWorkers) {
            return new ScalingDecision('down', max($requiredWorkers, $this->minWorkers));
        }

        return new ScalingDecision('none', $currentWorkers);
    }
}

Scheduled Scaling

Development and staging environments often don't need 24/7 availability. Schedule shutdowns during off-hours.

// Scheduled environment control
class EnvironmentScheduler
{
    public function applySchedule(): void
    {
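        $now = now();  // Read the clock once so hour and day stay consistent
        $hour = $now->hour;
        $dayOfWeek = $now->dayOfWeek;  // 0 = Sunday, 6 = Saturday

        // Development: off nights and weekends (on 08:00-20:59, Monday-Friday)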
        if ($this->environment === 'development') {
            if ($hour < 8 || $hour > 20 || $dayOfWeek === 0 || $dayOfWeek === 6) {
                $this->scaleDown();
            } else {
                $this->scaleUp();
            }
        }
    }

    private function scaleDown(): void
    {
        // Scale deployments to 0
        $this->kubectl('scale deployment --all --replicas=0');

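        // Stop RDS instances (AWS restarts a stopped instance automatically
        // after seven days, so a longer shutdown needs this to run again)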
        foreach ($this->getRdsInstances() as $instance) {
            $this->stopRdsInstance($instance);
        }

        Log::info('Environment scaled down', ['env' => $this->environment]);
    }
}
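
The payoff from the schedule above is easy to quantify by counting the hours the environment actually runs in a week. This sketch reuses the scheduler's own condition; `weeklyRunningHours` is a hypothetical helper introduced for the illustration.

```php
<?php

/**
 * Count the hours per week for which a schedule keeps an environment running.
 * $isOn receives (dayOfWeek 0-6, hour 0-23) and returns true when running.
 */
function weeklyRunningHours(callable $isOn): int
{
    $hours = 0;

    for ($day = 0; $day < 7; $day++) {
        for ($hour = 0; $hour < 24; $hour++) {
            if ($isOn($day, $hour)) {
                $hours++;
            }
        }
    }

    return $hours;
}

// Same rule as the scheduler: off before 08:00, after the 20:00 hour, and on weekends
$isOn = fn (int $dayOfWeek, int $hour): bool =>
    !($hour < 8 || $hour > 20 || $dayOfWeek === 0 || $dayOfWeek === 6);

$running = weeklyRunningHours($isOn);
$savedFraction = 1 - $running / 168;

printf("Running %d of 168 hours, about %.0f%% fewer compute hours\n", $running, $savedFraction * 100);
```

The reduction applies to compute hours only. Storage attached to stopped instances and databases continues to be billed.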

Cost Visibility

Tagging resources enables cost allocation by team, project, or environment.

// Enforce tagging policy
class TaggingPolicy
{
    private array $requiredTags = ['team', 'project', 'environment', 'cost-center'];

    public function validate(array $resource): array
    {
        $missing = array_diff($this->requiredTags, array_keys($resource['tags'] ?? []));

        if (!empty($missing)) {
            return [
                'valid' => false,
                'missing_tags' => $missing,
                'message' => 'Resource missing required tags: ' . implode(', ', $missing),
            ];
        }

        return ['valid' => true];
    }
}
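
A useful companion metric is how much spend the tagging policy does not yet cover. The sketch below applies the same missing-tag check to a list of resources and returns the share of monthly cost that cannot be fully allocated. The function name and the resource shape (`tags`, `monthly_cost`) are assumptions made for illustration.

```php
<?php

/**
 * Fraction of total monthly cost attached to resources that are missing
 * at least one required tag. Hypothetical helper for illustration.
 */
function untaggedSpendShare(array $resources, array $requiredTags): float
{
    $total = 0.0;
    $untagged = 0.0;

    foreach ($resources as $resource) {
        $total += $resource['monthly_cost'];

        $missing = array_diff($requiredTags, array_keys($resource['tags'] ?? []));
        if ($missing !== []) {
            $untagged += $resource['monthly_cost'];
        }
    }

    return $total > 0 ? $untagged / $total : 0.0;
}

$required = ['team', 'project', 'environment', 'cost-center'];
$resources = [
    [
        'monthly_cost' => 300,
        'tags' => ['team' => 'api', 'project' => 'shop', 'environment' => 'prod', 'cost-center' => 'cc-1'],
    ],
    ['monthly_cost' => 100, 'tags' => ['team' => 'api']],
];

printf("%.0f%% of spend is not fully tagged\n", untaggedSpendShare($resources, $required) * 100);
```

Tracking this share over time shows whether tagging enforcement is improving, independent of how many individual resources pass validation.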

Generate cost reports by tag:

class CostReporter
{
    public function generateReport(string $period = 'monthly'): array
    {
        $costs = $this->getCostData($period);

        return [
            'by_team' => $this->groupBy($costs, 'team'),
            'by_project' => $this->groupBy($costs, 'project'),
            'by_service' => $this->groupByService($costs),
            'by_environment' => $this->groupBy($costs, 'environment'),
            'trends' => $this->calculateTrends($costs),
            'anomalies' => $this->detectAnomalies($costs),
        ];
    }
}
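
The `detectAnomalies` step is left undefined in the reporter. As a rough illustration, one simple approach flags any day whose cost exceeds the trailing-window mean by more than a few standard deviations. The function name, the 7-day window, the 3-sigma threshold, and the sample figures below are all assumptions; production anomaly detection usually also accounts for weekly seasonality.

```php
<?php

/**
 * Flag days whose cost exceeds mean + $threshold standard deviations of the
 * preceding $window days. Hypothetical sketch, not the reporter's actual logic.
 */
function detectCostAnomalies(array $dailyCosts, int $window = 7, float $threshold = 3.0): array
{
    $anomalies = [];
    $days = array_keys($dailyCosts);
    $values = array_values($dailyCosts);

    for ($i = $window; $i < count($values); $i++) {
        $history = array_slice($values, $i - $window, $window);
        $mean = array_sum($history) / $window;
        $variance = array_sum(array_map(fn ($v) => ($v - $mean) ** 2, $history)) / $window;
        $stdDev = sqrt($variance);

        // Floor the spread at 1% of the mean so a perfectly flat history
        // does not turn every small change into an anomaly
        $limit = $mean + $threshold * max($stdDev, 0.01 * $mean);

        if ($values[$i] > $limit) {
            $anomalies[$days[$i]] = ['cost' => $values[$i], 'expected_max' => $limit];
        }
    }

    return $anomalies;
}

// Made-up daily totals: a steady week, one spike, then back to normal
$costs = [
    '2024-03-01' => 100, '2024-03-02' => 102, '2024-03-03' => 98,
    '2024-03-04' => 101, '2024-03-05' => 99,  '2024-03-06' => 100,
    '2024-03-07' => 103, '2024-03-08' => 180, '2024-03-09' => 101,
];

print_r(array_keys(detectCostAnomalies($costs)));
```

Note that a spike inflates the window that follows it, so a second spike shortly after the first is harder to detect with this method.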

Conclusion

Cloud cost optimization requires ongoing attention. Monitor spending continuously. Right-size resources based on actual usage. Use reserved capacity for stable workloads. Scale automatically for variable loads. Tag resources for visibility.

Cost optimization isn't about minimizing spending; it's about maximizing value. Sometimes the cheapest option isn't the best option. Balance cost against performance, reliability, and engineering time. The goal is efficient cloud spending that supports business objectives.
