Cloud cost optimization balances performance requirements against spending. Without active management, cloud bills grow unpredictably as teams provision resources for peak loads, forget to decommission experiments, or choose expensive services when cheaper alternatives suffice. Systematic cost optimization reduces waste while maintaining application performance.
The pay-as-you-go model that makes cloud attractive also makes costs difficult to predict. Understanding cloud pricing models, identifying waste, and implementing cost controls helps organizations benefit from cloud flexibility without budget surprises.
Understanding Cloud Costs
Cloud costs come from multiple sources: compute (VMs, containers, serverless), storage (block, object, database), network (data transfer, load balancers), and managed services (databases, caches, queues). Each has different pricing dimensions.
Compute pricing depends on instance type, region, and commitment level. On-demand instances cost the most but offer full flexibility. Reserved instances (one- or three-year commitments) typically reduce costs by 30-70%. Spot/preemptible instances offer 60-90% discounts but can be reclaimed at short notice (two minutes on AWS, 30 seconds on Google Cloud).
// Cost-aware instance selection
class InstanceRecommender
{
    // Illustrative only: a 1-year reservation priced at 60% of on-demand.
    // Real discounts vary by provider, instance type, and payment option.
    private const RESERVED_1YR_PRICE_FACTOR = 0.6;

    public function recommend(array $requirements): array
    {
        $type = $this->findInstance($requirements);
        $onDemandPrice = $this->getOnDemandPrice($requirements);

        $options = [];

        // On-demand baseline
        $options['on_demand'] = [
            'type' => $type,
            'cost' => $onDemandPrice,
            'availability' => 'guaranteed',
        ];

        // Reserved instance option
        $options['reserved_1yr'] = [
            'type' => $type,
            'cost' => $onDemandPrice * self::RESERVED_1YR_PRICE_FACTOR,
            'commitment' => '1 year',
            // With all-upfront payment, the reservation pays for itself after
            // 12 * 0.6 = 7.2 months of continuous use
            'break_even_months' => round(12 * self::RESERVED_1YR_PRICE_FACTOR, 1),
        ];

        // Spot instance option
        $options['spot'] = [
            'type' => $type,
            'cost' => $this->getSpotPrice($requirements),
            'availability' => 'interruptible',
            'suitable_for' => 'fault-tolerant workloads',
        ];

        return $options;
    }
}
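Whether a reservation pays off depends on how long the instance actually runs. The following is a minimal sketch of that break-even arithmetic, assuming an all-upfront reservation; the `breakEvenMonths` helper and the prices are hypothetical and only illustrate the calculation.

```php
<?php
// Hypothetical helper: months of continuous use after which an all-upfront
// reservation becomes cheaper than paying on-demand for the same instance.
function breakEvenMonths(float $onDemandMonthly, float $reservedUpfront): float
{
    if ($onDemandMonthly <= 0) {
        throw new InvalidArgumentException('On-demand monthly cost must be positive.');
    }

    return $reservedUpfront / $onDemandMonthly;
}

// Illustrative numbers only: $100/month on demand, 1-year term at 60% of that price.
$onDemandMonthly = 100.0;
$reservedUpfront = $onDemandMonthly * 12 * 0.6;

echo round(breakEvenMonths($onDemandMonthly, $reservedUpfront), 1), " months\n";
```

Under these assumptions, an instance that runs for less than about seven months of the year is cheaper on demand, which is why reservations suit steady workloads rather than experiments.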
Storage costs accumulate continuously. Object storage is cheap per GB but access costs add up. Block storage costs depend on provisioned size, not used size. Database storage includes I/O charges.
Network costs are often overlooked. Data transfer between regions or out to the internet is billed per GB. Traffic within a single availability zone is usually free, while cross-zone traffic is often billed, which rewards keeping chatty services close together.
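These storage and network pricing dimensions can be combined into a rough monthly estimate. The sketch below is one way to do that; the function name, the array keys, and every unit price are placeholders chosen for illustration, not real provider rates.

```php
<?php
// Hypothetical estimator. Every unit price is a placeholder supplied by the
// caller; real rates vary by provider, region, and storage class.
function estimateMonthlyStorageAndTransfer(array $usage, array $prices): array
{
    $costs = [
        // Block storage is billed on provisioned size, even if mostly empty.
        'block_storage' => $usage['block_provisioned_gb'] * $prices['block_per_gb'],
        // Object storage bills for stored data plus per-request access charges.
        'object_storage' => $usage['object_stored_gb'] * $prices['object_per_gb']
            + ($usage['object_requests'] / 1000) * $prices['object_per_1k_requests'],
        // Transfer to the internet and between regions is billed per GB.
        'data_transfer' => $usage['egress_internet_gb'] * $prices['egress_internet_per_gb']
            + $usage['cross_region_gb'] * $prices['cross_region_per_gb'],
    ];
    $costs['total'] = array_sum($costs);

    return $costs;
}

$estimate = estimateMonthlyStorageAndTransfer(
    [
        'block_provisioned_gb' => 500,
        'object_stored_gb' => 2000,
        'object_requests' => 1000000,
        'egress_internet_gb' => 200,
        'cross_region_gb' => 100,
    ],
    [
        'block_per_gb' => 0.10,            // placeholder
        'object_per_gb' => 0.02,           // placeholder
        'object_per_1k_requests' => 0.005, // placeholder
        'egress_internet_per_gb' => 0.09,  // placeholder
        'cross_region_per_gb' => 0.02,     // placeholder
    ]
);

print_r($estimate);
```

Breaking the estimate down by category makes it easier to see which dimension dominates before deciding what to optimize.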
Identifying Waste
Idle resources are the most common source of waste: VMs that are running but unused, databases provisioned for peak load, and storage holding forgotten data.
// Find underutilized resources
class ResourceAnalyzer
{
    public function findIdleInstances(): array
    {
        $instances = $this->cloudProvider->getInstances();

        return collect($instances)
            ->filter(function ($instance) {
                $metrics = collect($this->getMetrics($instance->id, 'cpu', 'week'));

                // Missing data is not the same as idle: average() returns null
                // for an empty collection, and null < 5 evaluates to true in PHP
                if ($metrics->isEmpty()) {
                    return false;
                }

                // Flag instances with < 5% average CPU
                return $metrics->average() < 5;
            })
            ->map(fn ($i) => [
                'id' => $i->id,
                'type' => $i->type,
                'monthly_cost' => $this->getMonthlyCost($i),
                'recommendation' => $this->getRecommendation($i),
            ])
            ->values() // Reindex so the result is a plain list after filtering
            ->toArray();
    }

    private function getRecommendation($instance): string
    {
        // Recommend downsizing or termination
        return $instance->environment === 'development'
            ? 'Consider terminating during off-hours'
            : 'Consider downsizing to smaller instance type';
    }
}
Orphaned resources accumulate over time: load balancers with no targets, EBS volumes detached from instances, and snapshots of resources that no longer exist.
# Find unattached EBS volumes (AWS CLI)
aws ec2 describe-volumes \
    --filters "Name=status,Values=available" \
    --query 'Volumes[*].[VolumeId,Size,CreateTime]'

# Find unused elastic IPs
aws ec2 describe-addresses \
    --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]'
Over-provisioned resources are sized for peak loads that rarely occur. Right-size based on actual usage, not anticipated maximums.
Right-Sizing
Right-sizing matches resource allocation to actual needs. Over-provisioning wastes money; under-provisioning hurts performance.
class RightSizingRecommender
{
    public function analyze(string $instanceId): array
    {
        // Assumes metrics are in absolute units (vCPUs in use, GB of memory in
        // use), not utilization percentages. Convert first if your monitoring
        // reports percentages.
        $metrics = $this->getMetrics($instanceId, days: 30);
        $cpuP95 = $this->percentile($metrics['cpu'], 95);
        $memP95 = $this->percentile($metrics['memory'], 95);

        $currentType = $this->getInstance($instanceId)->type;
        $currentSpecs = $this->getSpecs($currentType);
        $recommendations = [];

        // CPU right-sizing: p95 usage is below 30% of provisioned vCPUs
        if ($cpuP95 < $currentSpecs['vcpu'] * 0.3) {
            $recommendations[] = [
                'metric' => 'cpu',
                'current' => $currentSpecs['vcpu'],
                'recommended' => max(1, (int) ceil($cpuP95 / 0.5)), // Target 50% utilization
                'savings' => $this->calculateSavings($currentType, 'cpu', $cpuP95),
            ];
        }

        // Memory right-sizing: p95 usage is below 50% of provisioned memory
        if ($memP95 < $currentSpecs['memory'] * 0.5) {
            $recommendations[] = [
                'metric' => 'memory',
                'current' => $currentSpecs['memory'],
                'recommended' => max(1, (int) ceil($memP95 / 0.7)), // Target 70% utilization
                'savings' => $this->calculateSavings($currentType, 'memory', $memP95),
            ];
        }

        return [
            'instance_id' => $instanceId,
            'current_type' => $currentType,
            'recommended_type' => $this->findBestFit($recommendations),
            'monthly_savings' => array_sum(array_column($recommendations, 'savings')),
        ];
    }
}
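The recommender relies on a `percentile` helper that is not shown. One possible implementation, sketched here as a standalone function using the nearest-rank method (an assumption; interpolating variants would work equally well), looks like this:

```php
<?php
// Nearest-rank percentile: the smallest sample such that at least $p percent
// of all samples are less than or equal to it.
function percentile(array $samples, float $p): float
{
    if ($samples === []) {
        throw new InvalidArgumentException('At least one sample is required.');
    }
    if ($p <= 0 || $p > 100) {
        throw new InvalidArgumentException('Percentile must be in the range (0, 100].');
    }

    sort($samples);
    $rank = (int) ceil($p * count($samples) / 100); // 1-based rank

    return (float) $samples[$rank - 1];
}

// Illustrative hourly CPU readings with one short spike.
$cpu = [12, 15, 11, 14, 13, 90, 12, 16, 14, 13];

echo percentile($cpu, 95), "\n";
echo percentile($cpu, 90), "\n";
```

Sizing to a high percentile rather than the average keeps headroom for routine bursts, while still ignoring the rarest spikes once enough samples are collected. With only ten samples, as here, the 95th percentile is still the spike itself, so a 30-day window of frequent readings gives a more useful result.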
For databases, right-size based on query patterns, not data size. A 100GB database with simple queries needs less compute than a 10GB database with complex analytics.
Reserved Capacity
Reserved instances and savings plans provide significant discounts for committed usage. The tradeoff is reduced flexibility.
use Illuminate\Support\Collection;

class ReservationPlanner
{
    public function recommendReservations(): array
    {
        $usage = $this->getHistoricalUsage(months: 6);

        // Only reserve capacity that's consistently used
        $reservable = collect($usage['instances'])
            ->groupBy('type')
            ->filter(fn ($instances) => $this->isStable($instances))
            ->map(function ($instances, $type) {
                // Reserve the steady-state baseline, and base the savings on
                // that same count rather than on the number of usage records
                $count = $this->getMinimumCount($instances);

                return [
                    'type' => $type,
                    'count' => $count,
                    'term' => $this->recommendTerm($instances),
                    'annual_savings' => $this->calculateSavings($type, $count),
                ];
            });

        return [
            'recommendations' => $reservable->toArray(),
            'total_annual_savings' => $reservable->sum('annual_savings'),
            'coverage_percent' => $this->calculateCoverage($reservable, $usage),
        ];
    }

    private function isStable(Collection $instances): bool
    {
        // Instance type active in at least 5 of the last 6 months
        return $instances->every(fn ($i) => $i['months_active'] >= 5);
    }
}
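Savings Plans (AWS) and Committed Use Discounts (GCP) can offer more flexibility than reserved instances. The most flexible variants (AWS Compute Savings Plans, GCP flexible CUDs) commit you to an hourly spend rather than a specific instance, so the discount follows matching usage across instance families and regions. Narrower variants (EC2 Instance Savings Plans, resource-based CUDs) generally discount more deeply but stay tied to an instance family or machine series in one region.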
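The tradeoff between discount and flexibility shows up when usage varies. The sketch below uses a deliberately simplified model of a spend-based commitment: the commitment is paid every hour whether or not it is used, and usage beyond what it covers is billed on demand. The function name, the 50% discount, and the usage figures are assumptions for illustration, not any provider's actual billing rules.

```php
<?php
// Simplified model of a spend-based commitment (hypothetical, for illustration).
// $hourlyOnDemandSpend: what each hour would cost at on-demand rates.
// $commitPerHour: committed spend per hour at the discounted rate.
// $discount: fraction off the on-demand rate for covered usage (0 <= d < 1).
function costWithCommitment(array $hourlyOnDemandSpend, float $commitPerHour, float $discount): float
{
    if ($discount < 0 || $discount >= 1) {
        throw new InvalidArgumentException('Discount must be in the range [0, 1).');
    }

    $total = 0.0;
    foreach ($hourlyOnDemandSpend as $spend) {
        // On-demand-equivalent usage the commitment can absorb this hour.
        $covered = min($spend, $commitPerHour / (1 - $discount));
        // The commitment is paid in full; the uncovered remainder is on demand.
        $total += $commitPerHour + ($spend - $covered);
    }

    return $total;
}

// Three baseline hours and one peak hour, in on-demand dollars.
$usage = [10.0, 10.0, 10.0, 20.0];

foreach ([0.0, 5.0, 10.0] as $commit) {
    printf("commit %.2f/hour => total %.2f\n", $commit, costWithCommitment($usage, $commit, 0.5));
}
```

In this toy scenario, committing to the baseline is cheaper than either no commitment or committing to the peak, because commitment sized for the peak goes unused in most hours.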
Auto-Scaling
Auto-scaling matches capacity to demand, avoiding over-provisioning for peak loads.
# Kubernetes Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait before scaling down
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
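The autoscaler derives its target replica count from the ratio of the observed metric to the target, as described in the Kubernetes documentation: desired = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The function below is only an illustrative reimplementation of that arithmetic for the manifest above; its name and signature are hypothetical, and it ignores the `behavior` policies, which further limit how fast the replica count moves.

```php
<?php
// Illustrative reimplementation of the HPA replica calculation (hypothetical
// helper, not part of any Kubernetes client library).
function desiredReplicas(
    int $currentReplicas,
    float $currentMetric,
    float $targetMetric,
    int $minReplicas,
    int $maxReplicas,
    float $tolerance = 0.1
): int {
    if ($targetMetric <= 0) {
        throw new InvalidArgumentException('Target metric must be positive.');
    }

    $ratio = $currentMetric / $targetMetric;

    // Close enough to the target: leave the replica count alone.
    if (abs($ratio - 1.0) <= $tolerance) {
        return $currentReplicas;
    }

    $desired = (int) ceil($currentReplicas * $ratio);

    // Clamp to the configured bounds.
    return max($minReplicas, min($maxReplicas, $desired));
}

// With the manifest above: target 70% CPU, 2 to 10 replicas, 4 pods running.
echo desiredReplicas(4, 90.0, 70.0, 2, 10), "\n"; // load above target
echo desiredReplicas(4, 35.0, 70.0, 2, 10), "\n"; // load at half the target
echo desiredReplicas(4, 72.0, 70.0, 2, 10), "\n"; // within tolerance
```

Working through the numbers this way helps when choosing `averageUtilization`: a lower target leaves more headroom per pod but keeps more replicas running.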
For VMs, use managed instance groups or auto-scaling groups with appropriate policies:
// Scale based on queue depth
class ScalingDecider
{
    public function shouldScale(): ScalingDecision
    {
        $queueDepth = Queue::size('default');
        $currentWorkers = $this->getWorkerCount();
        $processingRate = $this->getProcessingRate(); // Assumed: jobs per second per worker

        if ($processingRate > 0) {
            $requiredWorkers = (int) ceil($queueDepth / ($processingRate * 300)); // Clear in 5 min
        } else {
            // No throughput data yet: avoid dividing by zero and assume the
            // worst case only if work is actually waiting
            $requiredWorkers = $queueDepth > 0 ? $this->maxWorkers : $this->minWorkers;
        }

        if ($requiredWorkers > $currentWorkers * 1.2) {
            return new ScalingDecision('up', min($requiredWorkers, $this->maxWorkers));
        }

        if ($requiredWorkers < $currentWorkers * 0.5 && $currentWorkers > $this->minWorkers) {
            return new ScalingDecision('down', max($requiredWorkers, $this->minWorkers));
        }

        return new ScalingDecision('none', $currentWorkers);
    }
}
Scheduled Scaling
Development and staging environments often don't need 24/7 availability. Schedule shutdowns during off-hours.
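// Scheduled environment control
class EnvironmentScheduler
{
    public function applySchedule(): void
    {
        $now = now(); // Read the clock once; uses the application timezone
        $hour = $now->hour;
        $dayOfWeek = $now->dayOfWeek; // 0 = Sunday, 6 = Saturday

        // Development: on 08:00-20:00 on weekdays, off nights and weekends
        if ($this->environment === 'development') {
            if ($hour < 8 || $hour >= 20 || $dayOfWeek === 0 || $dayOfWeek === 6) {
                $this->scaleDown();
            } else {
                // scaleUp() (not shown) must restore the previous replica counts
                $this->scaleUp();
            }
        }
    }

    private function scaleDown(): void
    {
        // Scale deployments to 0. This acts on the current kubectl context and
        // namespace, so set both explicitly in shared clusters.
        $this->kubectl('scale deployment --all --replicas=0');

        // Stop RDS instances. AWS restarts a stopped instance automatically
        // after 7 days, so the schedule must run at least weekly.
        foreach ($this->getRdsInstances() as $instance) {
            $this->stopRdsInstance($instance);
        }

        Log::info('Environment scaled down', ['env' => $this->environment]);
    }
}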
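The potential saving from a schedule like this is simple arithmetic over the 168 hours in a week. A minimal sketch, assuming an environment that runs from 08:00 to 20:00 on weekdays only (the function name and window are illustrative):

```php
<?php
// Fraction of compute hours avoided by running only during a daily window on
// a given number of days per week. Hypothetical helper for illustration.
function scheduledSavingsFraction(int $startHour, int $endHour, int $activeDaysPerWeek): float
{
    if ($startHour < 0 || $endHour > 24 || $startHour >= $endHour) {
        throw new InvalidArgumentException('Expected 0 <= startHour < endHour <= 24.');
    }
    if ($activeDaysPerWeek < 0 || $activeDaysPerWeek > 7) {
        throw new InvalidArgumentException('Active days per week must be between 0 and 7.');
    }

    $hoursOn = ($endHour - $startHour) * $activeDaysPerWeek;

    return 1 - $hoursOn / 168; // 168 hours in a week
}

// 12 hours a day, 5 days a week.
printf("%.1f%% of compute hours avoided\n", scheduledSavingsFraction(8, 20, 5) * 100);
```

This bounds the saving on compute only. Stopped instances and databases typically continue to incur storage charges, so the reduction in the total bill is smaller.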
Cost Visibility
Tagging resources enables cost allocation by team, project, or environment.
// Enforce tagging policy
class TaggingPolicy
{
    private array $requiredTags = ['team', 'project', 'environment', 'cost-center'];

    public function validate(array $resource): array
    {
        // array_values() reindexes the result, since array_diff() preserves keys
        $missing = array_values(
            array_diff($this->requiredTags, array_keys($resource['tags'] ?? []))
        );

        if (!empty($missing)) {
            return [
                'valid' => false,
                'missing_tags' => $missing,
                'message' => 'Resource missing required tags: ' . implode(', ', $missing),
            ];
        }

        return ['valid' => true];
    }
}
Generate cost reports by tag:
class CostReporter
{
    public function generateReport(string $period = 'monthly'): array
    {
        $costs = $this->getCostData($period);

        return [
            'by_team' => $this->groupBy($costs, 'team'),
            'by_project' => $this->groupBy($costs, 'project'),
            'by_service' => $this->groupByService($costs),
            'by_environment' => $this->groupBy($costs, 'environment'),
            'trends' => $this->calculateTrends($costs),
            'anomalies' => $this->detectAnomalies($costs),
        ];
    }
}
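The `groupBy` and `detectAnomalies` helpers are not shown in the report class. The sketch below gives one possible shape for each as standalone functions. The cost-record layout (`amount` plus a `tags` array), the `untagged` bucket, and the trailing-average rule with its 1.5× factor are all assumptions made for illustration.

```php
<?php
// Sum costs per value of one tag; records without the tag land in "untagged"
// so that unallocated spend stays visible. Hypothetical record shape.
function groupCostsByTag(array $costs, string $tag): array
{
    $totals = [];
    foreach ($costs as $item) {
        $key = $item['tags'][$tag] ?? 'untagged';
        $totals[$key] = ($totals[$key] ?? 0.0) + $item['amount'];
    }
    arsort($totals); // Largest spenders first

    return $totals;
}

// Flag days whose cost exceeds the average of the preceding $window days by
// more than $factor. A simple heuristic, not a statistical test.
function detectAnomalies(array $dailyCosts, int $window = 7, float $factor = 1.5): array
{
    $dates = array_keys($dailyCosts);
    $values = array_values($dailyCosts);
    $anomalies = [];

    for ($i = $window; $i < count($values); $i++) {
        $baseline = array_sum(array_slice($values, $i - $window, $window)) / $window;
        if ($baseline > 0 && $values[$i] > $baseline * $factor) {
            $anomalies[$dates[$i]] = ['cost' => $values[$i], 'baseline' => $baseline];
        }
    }

    return $anomalies;
}

$byTeam = groupCostsByTag([
    ['amount' => 120.0, 'tags' => ['team' => 'payments']],
    ['amount' => 80.0, 'tags' => ['team' => 'search']],
    ['amount' => 40.0, 'tags' => ['team' => 'payments']],
    ['amount' => 25.0, 'tags' => []],
], 'team');
print_r($byTeam);

$spikes = detectAnomalies([
    '2024-01-01' => 100.0,
    '2024-01-02' => 110.0,
    '2024-01-03' => 90.0,
    '2024-01-04' => 105.0,
    '2024-01-05' => 240.0,
], 3);
print_r(array_keys($spikes));
```

Keeping an explicit `untagged` bucket turns tagging gaps into a number that can be tracked, and comparing each day against its own recent history avoids hard-coding a spending threshold that goes stale as usage grows.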
Conclusion
Cloud cost optimization requires ongoing attention. Monitor spending continuously. Right-size resources based on actual usage. Use reserved capacity for stable workloads. Scale automatically for variable loads. Tag resources for visibility.
Cost optimization isn't about minimizing spending; it's about maximizing value. Sometimes the cheapest option isn't the best option. Balance cost against performance, reliability, and engineering time. The goal is efficient cloud spending that supports business objectives.