Your weekly dose of actionable cloud wisdom to start the week right
The Problem
Your Kubernetes bill has exploded from £500 to £5,000 per month, but your applications aren’t running any faster. Pods are requesting massive amounts of CPU and memory “just to be safe,” nodes are running at 20% utilization, and you’re paying for expensive storage that nobody’s actually using. Meanwhile, finance is asking hard questions about cloud spend efficiency.
The Solution
Implement systematic Kubernetes cost optimization using resource rightsizing, intelligent scaling, and waste elimination techniques. Most K8s cost problems stem from poor resource requests/limits, oversized clusters, and lack of monitoring – all fixable with the right approach.
Essential Cost Optimization Strategies:
1. Right-Size Resource Requests and Limits
# Before: Overprovisioned (expensive)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-wasteful
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: my-web-app:latest
        resources:
          requests:
            memory: "2Gi"   # Way too much!
            cpu: "1000m"    # Way too much!
          limits:
            memory: "4Gi"   # Dangerous without monitoring
            cpu: "2000m"    # Expensive overkill
---
# After: Right-sized (cost-effective)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-optimized
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: my-web-app:latest
        resources:
          requests:
            memory: "256Mi"  # Based on actual usage
            cpu: "100m"      # Conservative but realistic
          limits:
            memory: "512Mi"  # 2x requests for safety
            cpu: "500m"      # Allow for traffic spikes
        # Add probes for better scheduling
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
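How do you find the "based on actual usage" numbers? One common approach is to take a high percentile of observed usage and add modest headroom. A minimal sketch, assuming you have already exported per-container usage samples from your monitoring stack (the function name and headroom factor are illustrative, not from any library):

```python
import math

def rightsize(usage_samples, headroom=1.2):
    """Suggest a resource request from observed usage samples.

    Takes the 95th-percentile observed usage and multiplies by a
    headroom factor, so the request covers normal peaks without
    the "just to be safe" overprovisioning multiplier.
    """
    ordered = sorted(usage_samples)
    idx = min(len(ordered) - 1, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[idx] * headroom

# e.g. CPU usage samples in millicores collected over a week
cpu_samples = [80, 90, 95, 100, 110, 120, 130, 140, 150, 160]
print(f"Suggested CPU request: {rightsize(cpu_samples):.0f}m")
```

Using p95 rather than peak usage deliberately accepts brief throttling on rare spikes in exchange for a much smaller steady-state request; limits (set above) absorb the spikes.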
2. Implement Horizontal Pod Autoscaling
# HPA configuration for cost-effective scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app-optimized
  minReplicas: 2   # Minimum for availability
  maxReplicas: 20  # Cap to prevent runaway costs
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale up at 70% CPU
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # Scale up at 80% memory
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
      - type: Percent
        value: 50         # Scale down max 50% at once
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60   # Scale up quickly
      policies:
      - type: Percent
        value: 100        # Double pods if needed
        periodSeconds: 15
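To reason about what this HPA will actually do to your bill, it helps to know the core scaling rule the controller applies: desired replicas = ceil(current replicas × current metric / target metric). A small sketch of that calculation:

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization):
    """The HPA scaling rule: desired = ceil(current * currentMetric / target)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# 3 pods averaging 105% CPU against the 70% target above -> scale to 5
print(desired_replicas(3, 105, 70))
```

This is why the `averageUtilization` target is a cost lever: a 70% target means you pay for roughly 30% idle headroom per pod, while a lower target buys more burst tolerance at higher steady-state cost.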
3. Cluster Autoscaling Configuration
# Cluster autoscaler settings (illustrative — in practice these are
# passed to the autoscaler as command-line flags, e.g. --scale-down-unneeded-time)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-config
  namespace: kube-system
data:
  nodes.max: "20"  # Maximum nodes to prevent runaway costs
  nodes.min: "3"   # Minimum nodes for availability
  scale-down-delay-after-add: "10m"
  scale-down-unneeded-time: "10m"
  skip-nodes-with-local-storage: "false"
  skip-nodes-with-system-pods: "false"
---
# Node pool configuration (AWS EKS example — illustrative; node groups are
# defined in your cloud provider or IaC tooling, not via an in-cluster ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-pool-config
data:
  # General workloads - balanced cost/performance
  general-pool: |
    instance-types: ["t3.medium", "t3.large", "m5.large"]
    spot-percentage: 70       # 70% spot instances for cost savings
    on-demand-percentage: 30  # 30% on-demand for stability
  # CPU-intensive workloads
  compute-pool: |
    instance-types: ["c5.large", "c5.xlarge", "c5n.large"]
    spot-percentage: 50  # Lower spot % for more predictable workloads
  # Memory-intensive workloads
  memory-pool: |
    instance-types: ["r5.large", "r5.xlarge", "r6i.large"]
    spot-percentage: 60
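The spot percentage per pool translates directly into a blended node price. A quick sketch, with assumed example prices (real spot discounts vary by instance type and region, typically 60-80%):

```python
def blended_hourly_cost(on_demand_price, spot_price, spot_percentage):
    """Blended per-node hourly cost for a mixed spot/on-demand pool."""
    spot_fraction = spot_percentage / 100
    return spot_price * spot_fraction + on_demand_price * (1 - spot_fraction)

# Assumed prices for a t3.medium-class node (check your region):
# on-demand £0.04/h, spot £0.012/h (~70% discount)
cost = blended_hourly_cost(0.04, 0.012, 70)
print(f"Blended: £{cost:.4f}/h vs £0.0400/h pure on-demand")
```

At 70% spot, the general pool above runs at roughly half the pure on-demand price; the lower spot percentages on the compute and memory pools trade some of that saving for interruption resilience.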
4. Storage Cost Optimization
# Optimized storage classes for different use cases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cost-optimized-ssd
provisioner: ebs.csi.aws.com  # EBS CSI driver (gp3 requires it)
parameters:
  type: gp3          # Latest generation (cheaper than gp2)
  iops: "3000"       # Baseline IOPS
  throughput: "125"  # Baseline throughput (MiB/s)
reclaimPolicy: Delete  # Clean up when PVC deleted
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: archival-storage
provisioner: ebs.csi.aws.com
parameters:
  type: sc1  # Cold HDD for archival data
  encrypted: "true"
reclaimPolicy: Retain  # Keep data for compliance
allowVolumeExpansion: true
---
# Example PVC with cost-conscious sizing
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-optimized
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: cost-optimized-ssd
  resources:
    requests:
      storage: 20Gi  # Start small, expand as needed
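The gp2-to-gp3 saving compounds with volume count. A rough sketch with assumed per-GB monthly prices (gp3 is typically ~20% cheaper per GB than gp2; check your region's actual pricing):

```python
def monthly_volume_cost(size_gb, price_per_gb_month):
    """Flat per-GB monthly cost of an EBS-style volume."""
    return size_gb * price_per_gb_month

# Assumed example prices per GB-month (vary by region)
gp2_price, gp3_price = 0.08, 0.064
size = 500
saving = monthly_volume_cost(size, gp2_price) - monthly_volume_cost(size, gp3_price)
print(f"A 500Gi volume on gp3 saves ~£{saving:.2f}/month vs gp2")
```

Because `allowVolumeExpansion: true` lets you grow volumes later, starting small (as in the 20Gi PVC above) and expanding on demand is strictly cheaper than provisioning for projected peak size up front.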
5. Pod Disruption Budgets for Spot Instance Savings
# Allow spot instance interruptions without service disruption
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2  # Always keep 2 pods running
  selector:
    matchLabels:
      app: web-app
---
# Node affinity to prefer spot instances
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-job-spot-friendly
spec:
  replicas: 5
  selector:
    matchLabels:
      app: batch-job
  template:
    metadata:
      labels:
        app: batch-job
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values: ["t3.medium", "t3.large"]  # Cheaper instance types
          - weight: 50
            preference:
              matchExpressions:
              - key: eks.amazonaws.com/capacityType
                operator: In
                values: ["SPOT"]  # Prefer spot instances
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 30  # Quick recovery from spot termination
      containers:
      - name: batch-job
        image: my-batch-job:latest
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"
Cost Monitoring and Alerting
6. Resource Usage Monitoring
# Prometheus queries for cost monitoring
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-monitoring-queries
data:
  cpu-waste.pql: |
    # Pods requesting more CPU than they use
    (
      kube_pod_container_resource_requests{resource="cpu"} -
      rate(container_cpu_usage_seconds_total[5m])
    ) / kube_pod_container_resource_requests{resource="cpu"} * 100
  memory-waste.pql: |
    # Pods requesting more memory than they use
    (
      kube_pod_container_resource_requests{resource="memory"} -
      container_memory_working_set_bytes
    ) / kube_pod_container_resource_requests{resource="memory"} * 100
  node-utilization.pql: |
    # Node CPU utilization
    100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
  cost-per-namespace.pql: |
    # Estimated cost per namespace (requires cost data)
    sum by (namespace) (
      kube_pod_container_resource_requests{resource="cpu"} * 0.048 +  # £0.048 per CPU hour
      kube_pod_container_resource_requests{resource="memory"} / 1024 / 1024 / 1024 * 0.0053  # £0.0053 per GB hour
    )
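The waste queries all share one arithmetic shape, which is worth internalising: waste % = (requested − used) / requested × 100. The same calculation in Python, for spot-checking numbers from `kubectl top`:

```python
def waste_percentage(requested, used):
    """Same calculation as the PromQL waste queries above:
    (requested - used) / requested * 100.
    Works for CPU (millicores) or memory (bytes) as long as
    both arguments use the same unit."""
    return (requested - used) / requested * 100

# A pod requesting 1000m CPU but using only 100m wastes 90% of its request
print(f"{waste_percentage(1000, 100):.0f}% of the request is unused")
```

A waste figure above ~50% sustained over a week is usually a strong signal that the request can be cut.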
---
# Grafana dashboard configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: k8s-cost-dashboard
data:
  dashboard.json: |
    {
      "dashboard": {
        "title": "Kubernetes Cost Optimization",
        "panels": [
          {
            "title": "Monthly Cost Trend",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(cost_per_namespace)",
                "legendFormat": "Total Monthly Cost"
              }
            ]
          },
          {
            "title": "Resource Waste by Namespace",
            "type": "table",
            "targets": [
              {
                "expr": "cpu_waste",
                "legendFormat": "CPU Waste %"
              }
            ]
          }
        ]
      }
    }
7. Automated Cost Optimization Script
#!/bin/bash
# Kubernetes cost optimization audit script
echo "=== Kubernetes Cost Optimization Audit ==="
echo

# Check for overprovisioned pods
echo "🔍 Checking for overprovisioned pods..."
kubectl top pods --all-namespaces --containers | awk '
BEGIN { print "Namespace\tPod\tContainer\tCPU_Used\tMemory_Used\tStatus" }
NR>1 {
  cpu_used = $4
  memory_used = $5
  gsub(/m/, "", cpu_used)
  gsub(/Mi/, "", memory_used)
  # +0 forces numeric comparison on the stripped strings
  if (cpu_used+0 < 50 && memory_used+0 < 100) {
    print $1 "\t" $2 "\t" $3 "\t" cpu_used "m\t" memory_used "Mi\t⚠️ UNDERUTILIZED"
  }
}'
echo

echo "💰 Checking resource requests vs limits..."
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{range .spec.containers[*]}{.name}:{.resources.requests.cpu}:{.resources.limits.cpu}:{.resources.requests.memory}:{.resources.limits.memory}{"\n"}{end}{end}' | \
while IFS=$'\t' read -r namespace pod container_info; do
  # Consecutive colons mean an empty request or limit field
  if [[ "$container_info" == *::* ]]; then
    echo "$namespace/$pod: $container_info ⚠️ MISSING REQUESTS/LIMITS"
  fi
done
echo

echo "📊 Node utilization summary..."
kubectl top nodes | awk 'NR>1 {
  gsub(/%/, "", $3)
  gsub(/%/, "", $5)
  if ($3+0 < 50 || $5+0 < 50) {
    print $1 "\tCPU: " $3 "% Memory: " $5 "% ⚠️ LOW UTILIZATION"
  }
}'
echo

echo "💾 Storage analysis..."
kubectl get pv -o custom-columns=NAME:.metadata.name,CAPACITY:.spec.capacity.storage,STATUS:.status.phase,CLAIM:.spec.claimRef.name | \
grep -E "(Available|Released)" | \
awk '{ print $1 "\t" $2 "\t" $3 "\t💸 UNUSED STORAGE" }'
echo

echo "🎯 Cost optimization recommendations:"
echo "1. Review underutilized pods and reduce resource requests"
echo "2. Consider spot instances for fault-tolerant workloads"
echo "3. Implement HPA for variable workloads"
echo "4. Clean up unused storage volumes"
echo "5. Use cluster autoscaler to match capacity with demand"
Advanced Cost Optimization Techniques
8. Vertical Pod Autoscaler (VPA) Integration
# VPA for automatic resource rightsizing (requires the VPA add-on;
# avoid "Auto" mode alongside an HPA scaling on CPU/memory for the same workload)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app-optimized
  updatePolicy:
    updateMode: "Auto"  # Automatically apply recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      maxAllowed:
        cpu: "1"        # Cap CPU to prevent expensive scaling
        memory: "1Gi"   # Cap memory
      minAllowed:
        cpu: "50m"      # Minimum viable CPU
        memory: "64Mi"  # Minimum viable memory
      controlledResources: ["cpu", "memory"]
9. Cost Allocation and Chargeback
# Labels for cost allocation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-tagged-app
  labels:
    cost-center: "engineering"
    project: "web-platform"
    environment: "production"
    team: "backend"
spec:
  selector:
    matchLabels:
      app: billing-tagged-app
  template:
    metadata:
      labels:
        app: billing-tagged-app
        cost-center: "engineering"
        project: "web-platform"
        environment: "production"
        team: "backend"
    spec:
      containers:
      - name: app
        image: my-app:latest
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
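Once every workload carries a `cost-center` label, chargeback is just a group-by. A minimal sketch of the aggregation; in practice the `pods` list would come from the Kubernetes API plus your per-pod cost estimates (the function and data shape here are illustrative):

```python
from collections import defaultdict

def cost_by_label(pods, label="cost-center"):
    """Group estimated monthly pod costs by a chargeback label.

    `pods` is a list of (labels_dict, monthly_cost) pairs.
    Unlabelled workloads are surfaced explicitly rather than
    silently dropped, so gaps in tagging show up in the report.
    """
    totals = defaultdict(float)
    for labels, cost in pods:
        totals[labels.get(label, "unlabelled")] += cost
    return dict(totals)

pods = [
    ({"cost-center": "engineering"}, 120.0),
    ({"cost-center": "engineering"}, 80.0),
    ({"cost-center": "data"}, 200.0),
    ({}, 15.0),  # untagged workload — appears under "unlabelled"
]
print(cost_by_label(pods))
```

Surfacing the "unlabelled" bucket is the key design choice: it turns missing tags into a visible line item that teams are motivated to claim.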
Why It Matters
- Cost Control: Right-sizing can reduce K8s costs by 40-70%
- Resource Efficiency: Better utilization = more applications per £ spent
- Environmental Impact: Less waste = smaller carbon footprint
- Budget Predictability: Autoscaling prevents both overspend and outages
Try This Week
- Audit current resource usage – Run the cost optimization script above
- Right-size one application – Start with your most expensive workload
- Implement HPA – Add autoscaling to variable workloads
- Set up cost monitoring – Create alerts for unexpected spend increases
Quick Cost Estimation Calculator
# Python script to estimate K8s costs
def calculate_k8s_costs(cpu_cores, memory_gb, hours_per_month=730):
    """
    Calculate approximate Kubernetes costs.
    Based on average cloud provider pricing.
    """
    # Rough pricing (varies by provider and region)
    cpu_cost_per_hour = 0.048      # £0.048 per vCPU hour
    memory_cost_per_hour = 0.0053  # £0.0053 per GB hour

    monthly_cpu_cost = cpu_cores * cpu_cost_per_hour * hours_per_month
    monthly_memory_cost = memory_gb * memory_cost_per_hour * hours_per_month
    total_monthly_cost = monthly_cpu_cost + monthly_memory_cost

    return {
        'cpu_cost': round(monthly_cpu_cost, 2),
        'memory_cost': round(monthly_memory_cost, 2),
        'total_cost': round(total_monthly_cost, 2),
        'annual_cost': round(total_monthly_cost * 12, 2),
    }

# Example usage
print("Current setup (overprovisioned):")
current = calculate_k8s_costs(cpu_cores=20, memory_gb=80)
print(f"Monthly cost: £{current['total_cost']}")

print("\nOptimized setup:")
optimized = calculate_k8s_costs(cpu_cores=8, memory_gb=32)
print(f"Monthly cost: £{optimized['total_cost']}")

savings = round(current['total_cost'] - optimized['total_cost'], 2)
print(f"\nMonthly savings: £{savings}")
print(f"Annual savings: £{round(savings * 12, 2)}")
Common Cost Traps to Avoid
- No resource limits: Pods can consume unlimited resources
- Overprovisioned requests: “Better safe than sorry” mentality
- Unused persistent volumes: Storage costs accumulating
- Single large nodes: Poor bin packing efficiency
- No spot instance usage: Missing 60-80% cost savings
Advanced Optimization Tools
- KubeCost: Detailed cost breakdown and optimization recommendations
- Goldilocks: VPA recommendations for right-sizing
- Cluster Autoscaler: Automatic node scaling
- KEDA: Event-driven autoscaling for more efficient resource usage
Pro Tip: Start with monitoring before optimizing. Install Prometheus and Grafana to understand your actual resource usage patterns. You can’t optimize what you can’t measure, and most teams are surprised by how little CPU and memory their applications actually need.
Achieved massive Kubernetes cost savings at your organization? I’d love to hear your optimization stories – real cost reduction wins inspire the best Monday tips!