Monday Cloud Tip: Kubernetes Cost Optimization – Stop the Resource Waste

Your weekly dose of actionable cloud wisdom to start the week right

The Problem

Your Kubernetes bill has exploded from £500 to £5,000 per month, but your applications aren’t running any faster. Pods are requesting massive amounts of CPU and memory “just to be safe,” nodes are running at 20% utilization, and you’re paying for expensive storage that nobody’s actually using. Meanwhile, finance is asking hard questions about cloud spend efficiency.

The Solution

Implement systematic Kubernetes cost optimization using resource rightsizing, intelligent scaling, and waste elimination techniques. Most K8s cost problems stem from poor resource requests/limits, oversized clusters, and lack of monitoring – all fixable with the right approach.

Essential Cost Optimization Strategies:

1. Right-Size Resource Requests and Limits

# Before: Overprovisioned (expensive)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-wasteful
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: my-web-app:latest
        resources:
          requests:
            memory: "2Gi"      # Way too much!
            cpu: "1000m"       # Way too much!
          limits:
            memory: "4Gi"      # Dangerous without monitoring
            cpu: "2000m"       # Expensive overkill

---

# After: Right-sized (cost-effective)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-optimized
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: my-web-app:latest
        resources:
          requests:
            memory: "256Mi"    # Based on actual usage
            cpu: "100m"        # Conservative but realistic
          limits:
            memory: "512Mi"    # 2x requests for safety
            cpu: "500m"        # Allow for traffic spikes
        # Add probes for better scheduling
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
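
To quantify what the rightsizing above buys, here's a small standalone helper (illustrative, not part of any Kubernetes library) that parses the resource quantity strings and compares the two manifests' per-replica requests:

```python
# Illustrative helper (not part of any Kubernetes library): parse the
# resource quantity strings used above and compare the per-replica
# requests of the "wasteful" and "optimized" manifests.

def parse_cpu(quantity):
    """Convert a CPU quantity ('1000m' or '1') to cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)

def parse_memory(quantity):
    """Convert a memory quantity ('2Gi', '256Mi', '1024') to bytes."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-2]) * factor)
    return int(quantity)  # plain bytes

# Per-replica requests from the two Deployments above
before = {"cpu": parse_cpu("1000m"), "memory": parse_memory("2Gi")}
after = {"cpu": parse_cpu("100m"), "memory": parse_memory("256Mi")}

print(f"CPU request: {before['cpu'] / after['cpu']:.0f}x smaller")           # 10x
print(f"Memory request: {before['memory'] / after['memory']:.0f}x smaller")  # 8x
```

Multiply those factors across every replica and every deployment, and the bill impact of routine overprovisioning becomes obvious.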

2. Implement Horizontal Pod Autoscaling

# HPA configuration for cost-effective scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app-optimized
  minReplicas: 2              # Minimum for availability
  maxReplicas: 20             # Cap to prevent runaway costs
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale up at 70% CPU
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # Scale up at 80% memory
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
      - type: Percent
        value: 50               # Scale down max 50% at once
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60   # Scale up quickly
      policies:
      - type: Percent
        value: 100              # Double pods if needed
        periodSeconds: 15
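
The scaling decision behind this HPA follows the formula from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A quick sketch with the numbers configured above:

```python
# The HPA's core rule (from the Kubernetes docs):
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# clamped to the minReplicas/maxReplicas bounds set above.
from math import ceil

def desired_replicas(current, utilization, target, min_replicas, max_replicas):
    desired = ceil(current * utilization / target)
    return max(min_replicas, min(desired, max_replicas))

# With target 70% CPU, min 2, max 20 (as configured above):
print(desired_replicas(3, 140, 70, 2, 20))   # spike: 3 pods at 140% -> 6
print(desired_replicas(6, 20, 70, 2, 20))    # quiet: 6 pods at 20% -> 2
print(desired_replicas(3, 700, 70, 2, 20))   # capped at maxReplicas -> 20
```

The maxReplicas cap is your cost circuit breaker: even a runaway metric can't scale you past 20 pods.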

3. Cluster Autoscaling Configuration

# Cluster autoscaler settings (illustrative ConfigMap – in a real
# deployment most of these map to command-line flags on the autoscaler
# itself, such as --max-nodes-total and --scale-down-unneeded-time)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-config
  namespace: kube-system
data:
  nodes.max: "20"              # Maximum nodes to prevent runaway costs
  nodes.min: "3"               # Minimum nodes for availability
  scale-down-delay-after-add: "10m"
  scale-down-unneeded-time: "10m"
  skip-nodes-with-local-storage: "false"
  skip-nodes-with-system-pods: "false"

---

# Node pool configuration (illustrative AWS EKS example – real node
# groups are defined via eksctl or the EKS API, not a ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-pool-config
data:
  # General workloads - balanced cost/performance
  general-pool: |
    instance-types: ["t3.medium", "t3.large", "m5.large"]
    spot-percentage: 70      # 70% spot instances for cost savings
    on-demand-percentage: 30 # 30% on-demand for stability
    
  # CPU-intensive workloads
  compute-pool: |
    instance-types: ["c5.large", "c5.xlarge", "c5n.large"]
    spot-percentage: 50      # Lower spot % for more predictable workloads
    
  # Memory-intensive workloads  
  memory-pool: |
    instance-types: ["r5.large", "r5.xlarge", "r6i.large"]
    spot-percentage: 60
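
To see why the 70/30 spot split in the general pool matters, here's a blended-cost estimate. The on-demand price and spot discount below are assumed placeholder figures; check your provider's pricing pages for real numbers.

```python
# Estimating the blended node cost of the general-pool split above.
# The on-demand price and spot discount are assumed placeholders;
# real numbers come from your provider's pricing pages.

def blended_hourly_cost(on_demand_price, spot_discount, spot_fraction):
    spot_price = on_demand_price * (1 - spot_discount)
    return spot_fraction * spot_price + (1 - spot_fraction) * on_demand_price

on_demand = 0.04   # assumed £/node-hour for a t3.medium-class node
discount = 0.70    # spot commonly trades 60-80% below on-demand

cost = blended_hourly_cost(on_demand, discount, spot_fraction=0.7)
print(f"Blended: £{cost:.4f}/hr vs £{on_demand:.4f}/hr all on-demand "
      f"({1 - cost / on_demand:.0%} saving)")
```

Nearly half the node bill disappears while 30% of capacity stays on stable on-demand instances.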

4. Storage Cost Optimization

# Optimized storage classes for different use cases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cost-optimized-ssd
provisioner: ebs.csi.aws.com   # gp3 requires the EBS CSI driver (the in-tree provisioner only supports gp2)
parameters:
  type: gp3                    # Latest generation (cheaper than gp2)
  iops: "3000"                 # Baseline IOPS
  throughput: "125"            # Baseline throughput
reclaimPolicy: Delete          # Clean up when PVC deleted
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

---

apiVersion: storage.k8s.io/v1  
kind: StorageClass
metadata:
  name: archival-storage
provisioner: ebs.csi.aws.com
parameters:
  type: sc1                    # Cold HDD for archival data
  encrypted: "true"
reclaimPolicy: Retain          # Keep data for compliance
allowVolumeExpansion: true

---

# Example PVC with cost-conscious sizing
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-optimized
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: cost-optimized-ssd
  resources:
    requests:
      storage: 20Gi            # Start small, expand as needed
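
The storage class choice above translates directly into money. The per-GB prices below are assumptions for illustration; actual pricing varies by region and provider.

```python
# Rough monthly cost of the 20Gi claim above on different volume types.
# Per-GB prices are assumed illustration figures, not real quotes.

PRICE_PER_GB_MONTH = {
    "gp2": 0.10,   # previous-generation SSD
    "gp3": 0.08,   # current-generation SSD (cost-optimized-ssd class)
    "sc1": 0.015,  # cold HDD (archival-storage class)
}

def monthly_storage_cost(size_gb, volume_type):
    return round(size_gb * PRICE_PER_GB_MONTH[volume_type], 2)

for vtype in PRICE_PER_GB_MONTH:
    print(f"{vtype}: £{monthly_storage_cost(20, vtype)}/month for 20Gi")
```

Small per-volume differences compound quickly once you have hundreds of PVCs, which is why defaulting to gp3 and reserving sc1 for archival data pays off.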

5. Pod Disruption Budgets for Spot Instance Savings

# Allow spot instance interruptions without service disruption
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2              # Always keep 2 pods running
  selector:
    matchLabels:
      app: web-app

---

# Node affinity to prefer spot instances
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-job-spot-friendly
spec:
  replicas: 5
  selector:
    matchLabels:
      app: batch-job
  template:
    metadata:
      labels:
        app: batch-job
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values: ["t3.medium", "t3.large"]  # Cheaper instance types
          - weight: 50
            preference:
              matchExpressions:
              - key: eks.amazonaws.com/capacityType
                operator: In
                values: ["SPOT"]                   # Prefer spot instances
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 30                      # Evict within 30s if the node disappears (e.g. spot reclaim)
      containers:
      - name: batch-job
        image: my-batch-job:latest
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"

Cost Monitoring and Alerting

6. Resource Usage Monitoring

# Prometheus queries for cost monitoring
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-monitoring-queries
data:
  cpu-waste.pql: |
    # Pods requesting more CPU than they use
    (
      kube_pod_container_resource_requests{resource="cpu"} - 
      rate(container_cpu_usage_seconds_total[5m])
    ) / kube_pod_container_resource_requests{resource="cpu"} * 100
    
  memory-waste.pql: |
    # Pods requesting more memory than they use  
    (
      kube_pod_container_resource_requests{resource="memory"} - 
      container_memory_working_set_bytes
    ) / kube_pod_container_resource_requests{resource="memory"} * 100
    
  node-utilization.pql: |
    # Node CPU utilization
    100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
    
  cost-per-namespace.pql: |
    # Estimated cost per namespace (requires cost data)
    sum by (namespace) (
      kube_pod_container_resource_requests{resource="cpu"} * 0.048 +  # £0.048 per CPU hour
      kube_pod_container_resource_requests{resource="memory"} / 1024/1024/1024 * 0.0053  # £0.0053 per GB hour
    )
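
The CPU and memory waste queries above share one formula, which is easy to sanity-check in plain Python:

```python
# The waste formulas above, restated in plain Python:
#   waste% = (requested - used) / requested * 100

def waste_percent(requested, used):
    return (requested - used) / requested * 100

# A pod requesting 1 core but averaging 0.1 cores wastes 90% of it
print(waste_percent(requested=1.0, used=0.1))      # 90.0
# A pod requesting 2Gi (2048Mi) but using 256Mi wastes 87.5%
print(waste_percent(requested=2048, used=256))     # 87.5
```

Waste figures in the 80-90% range are common on unoptimized clusters, which is exactly where rightsizing recovers the most money.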

---

# Grafana dashboard configuration (the expr values assume recording
# rules built from the queries above)
apiVersion: v1
kind: ConfigMap
metadata:
  name: k8s-cost-dashboard
data:
  dashboard.json: |
    {
      "dashboard": {
        "title": "Kubernetes Cost Optimization",
        "panels": [
          {
            "title": "Monthly Cost Trend",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(cost_per_namespace)",
                "legendFormat": "Total Monthly Cost"
              }
            ]
          },
          {
            "title": "Resource Waste by Namespace", 
            "type": "table",
            "targets": [
              {
                "expr": "cpu_waste",
                "legendFormat": "CPU Waste %"
              }
            ]
          }
        ]
      }
    }

7. Automated Cost Optimization Script

#!/bin/bash
# Kubernetes cost optimization audit script

echo "=== Kubernetes Cost Optimization Audit ==="
echo

# Check for overprovisioned pods
echo "🔍 Checking for overprovisioned pods..."
kubectl top pods --all-namespaces --containers | awk '
BEGIN { print "Namespace\tPod\tContainer\tCPU_Used\tMemory_Used\tStatus" }
NR>1 {
    cpu_used = $4
    memory_used = $5
    gsub(/m/, "", cpu_used)
    gsub(/Mi/, "", memory_used)

    # Force numeric comparison – awk would otherwise compare strings
    if (cpu_used+0 < 50 && memory_used+0 < 100) {
        print $1 "\t" $2 "\t" $3 "\t" cpu_used "m\t" memory_used "Mi\t⚠️ UNDERUTILIZED"
    }
}'

echo
echo "💰 Checking resource requests vs limits..."
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{range .spec.containers[*]}{.name}:{.resources.requests.cpu}:{.resources.limits.cpu}:{.resources.requests.memory}:{.resources.limits.memory}{"\n"}{end}{end}' | \
while IFS=$'\t' read -r namespace pod container_info; do
    # Adjacent or trailing colons mean a request or limit was never set
    if [[ "$container_info" == *::* || "$container_info" == *: ]]; then
        echo "$namespace/$pod: $container_info ⚠️ MISSING REQUESTS/LIMITS"
    fi
done

echo
echo "📊 Node utilization summary..."
kubectl top nodes | awk 'NR>1 {
    gsub(/%/, "", $3)
    gsub(/%/, "", $5)
    # Force numeric comparison after stripping the % signs
    if ($3+0 < 50 || $5+0 < 50) {
        print $1 "\tCPU: " $3 "% Memory: " $5 "% ⚠️ LOW UTILIZATION"
    }
}'

echo
echo "💾 Storage analysis..."
kubectl get pv -o custom-columns=NAME:.metadata.name,CAPACITY:.spec.capacity.storage,STATUS:.status.phase,CLAIM:.spec.claimRef.name | \
grep -E "(Available|Released)" | \
awk '{ print $1 "\t" $2 "\t" $3 "\t💸 UNUSED STORAGE" }'

echo
echo "🎯 Cost optimization recommendations:"
echo "1. Review underutilized pods and reduce resource requests"
echo "2. Consider spot instances for fault-tolerant workloads"  
echo "3. Implement HPA for variable workloads"
echo "4. Clean up unused storage volumes"
echo "5. Use cluster autoscaler to match capacity with demand"

Advanced Cost Optimization Techniques

8. Vertical Pod Autoscaler (VPA) Integration

# VPA for automatic resource rightsizing
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app-optimized
  updatePolicy:
    updateMode: "Auto"         # Apply recommendations automatically (VPA recreates pods to do so)
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      maxAllowed:
        cpu: "1"               # Cap CPU to prevent expensive scaling
        memory: "1Gi"          # Cap memory 
      minAllowed:
        cpu: "50m"             # Minimum viable CPU
        memory: "64Mi"         # Minimum viable memory
      controlledResources: ["cpu", "memory"]
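
The intuition behind VPA's recommendations is simple: look at a high percentile of observed usage and add a safety margin. The sketch below captures only that intuition (the real VPA recommender uses decaying histograms, so treat this as a mental model, not the actual algorithm):

```python
# A rough sketch of the idea behind VPA recommendations: take a high
# percentile of observed usage and add a safety margin. The real VPA
# uses decaying histograms; this only captures the intuition.
import math

def recommend_request(usage_samples, percentile=0.9, safety_margin=0.15):
    ordered = sorted(usage_samples)
    idx = max(0, math.ceil(percentile * len(ordered)) - 1)
    return ordered[idx] * (1 + safety_margin)

# Observed CPU usage (cores) for the web-app container over time
samples = [0.05, 0.06, 0.07, 0.08, 0.08, 0.09, 0.10, 0.12, 0.15, 0.30]
print(f"Recommended CPU request: {recommend_request(samples):.3f} cores")
```

Note how the occasional 0.30-core spike does not drive the request: basing requests on a percentile rather than the peak is what makes VPA-style rightsizing cheaper than "size for the worst case".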

9. Cost Allocation and Chargeback

# Labels for cost allocation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-tagged-app
  labels:
    cost-center: "engineering"
    project: "web-platform"
    environment: "production"
    team: "backend"
spec:
  template:
    metadata:
      labels:
        cost-center: "engineering"
        project: "web-platform"
        environment: "production"
        team: "backend"
    spec:
      containers:
      - name: app
        image: my-app:latest
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"

Why It Matters

  • Cost Control: Right-sizing can reduce K8s costs by 40-70%
  • Resource Efficiency: Better utilization = more applications per £ spent
  • Environmental Impact: Less waste = smaller carbon footprint
  • Budget Predictability: Autoscaling prevents both overspend and outages

Try This Week

  1. Audit current resource usage – Run the cost optimization script above
  2. Right-size one application – Start with your most expensive workload
  3. Implement HPA – Add autoscaling to variable workloads
  4. Set up cost monitoring – Create alerts for unexpected spend increases

Quick Cost Estimation Calculator

# Python script to estimate K8s costs
def calculate_k8s_costs(cpu_cores, memory_gb, hours_per_month=730):
    """
    Calculate approximate Kubernetes costs
    Based on average cloud provider pricing
    """
    
    # Rough pricing (varies by provider and region)
    cpu_cost_per_hour = 0.048    # £0.048 per vCPU hour
    memory_cost_per_hour = 0.0053  # £0.0053 per GB hour
    
    monthly_cpu_cost = cpu_cores * cpu_cost_per_hour * hours_per_month
    monthly_memory_cost = memory_gb * memory_cost_per_hour * hours_per_month
    
    total_monthly_cost = monthly_cpu_cost + monthly_memory_cost
    
    return {
        'cpu_cost': round(monthly_cpu_cost, 2),
        'memory_cost': round(monthly_memory_cost, 2),
        'total_cost': round(total_monthly_cost, 2),
        'annual_cost': round(total_monthly_cost * 12, 2)
    }

# Example usage
print("Current setup (overprovisioned):")
current = calculate_k8s_costs(cpu_cores=20, memory_gb=80)
print(f"Monthly cost: £{current['total_cost']}")

print("\nOptimized setup:")
optimized = calculate_k8s_costs(cpu_cores=8, memory_gb=32)
print(f"Monthly cost: £{optimized['total_cost']}")

savings = round(current['total_cost'] - optimized['total_cost'], 2)
print(f"\nMonthly savings: £{savings}")
print(f"Annual savings: £{round(savings * 12, 2)}")

Common Cost Traps to Avoid

  • No resource limits: Pods can consume unlimited resources
  • Overprovisioned requests: “Better safe than sorry” mentality
  • Unused persistent volumes: Storage costs accumulating
  • Single large nodes: Poor bin packing efficiency
  • No spot instance usage: Missing 60-80% cost savings
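
The bin-packing trap is easy to illustrate with made-up numbers: the same 10 cores of pod requests stranded on one oversized node versus spread over right-sized ones.

```python
# Illustrating the bin-packing trap: the same 10 cores of pod requests
# on one oversized node vs a few right-sized nodes. Node sizes are
# made-up numbers for illustration.

def utilization(requested_cores, node_cores):
    return requested_cores / sum(node_cores) * 100

pods_total = 10.0  # total CPU requested across all pods

# One 32-core node: everything fits, but most capacity sits idle
print(f"Single 32-core node: {utilization(pods_total, [32]):.0f}% utilized")
# Three 4-core nodes: smaller increments, and empty nodes can be removed
print(f"Three 4-core nodes:  {utilization(pods_total, [4, 4, 4]):.0f}% utilized")
```

Smaller nodes also give the cluster autoscaler finer-grained increments to add and remove, so capacity tracks demand more closely.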

Advanced Optimization Tools

  • KubeCost: Detailed cost breakdown and optimization recommendations
  • Goldilocks: VPA recommendations for right-sizing
  • Cluster Autoscaler: Automatic node scaling
  • KEDA: Event-driven autoscaling for more efficient resource usage

Pro Tip: Start with monitoring before optimizing. Install Prometheus and Grafana to understand your actual resource usage patterns. You can’t optimize what you can’t measure, and most teams are surprised by how little CPU and memory their applications actually need.


Achieved massive Kubernetes cost savings at your organization? I’d love to hear your optimization stories – real cost reduction wins inspire the best Monday tips!