
Monday Cloud Tip: GCP Cloud Run Performance Optimization – Serverless That Actually Scales

Your weekly dose of actionable cloud wisdom to start the week right

The Problem

Your Cloud Run services are taking 10+ seconds to cold start, costing you customers who won’t wait for slow responses. You’re paying for CPU and memory you don’t need, instances are scaling chaotically during traffic spikes, and your serverless dream has become a performance nightmare. Meanwhile, other teams have sub-second response times and predictable costs with the same technology.

The Solution

Optimize Cloud Run performance using intelligent resource sizing, startup acceleration, concurrency tuning, and traffic management. Cloud Run can deliver exceptional performance and cost efficiency when properly configured – the key is understanding how serverless containers behave under load and optimizing accordingly.

Essential Cloud Run Performance Optimizations:

1. Container Image and Startup Optimization

# Optimized Dockerfile for fast Cloud Run startup
# Use a minimal base image (alpine, slim, or distroless) to shrink pull and start times

# Multi-stage build: install dependencies in a throwaway stage
FROM node:18-alpine AS builder
WORKDIR /app
# Copy dependency manifests first so this layer is cached until they change
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

FROM node:18-alpine AS runtime
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
# Copy application code last for better layer caching
COPY . .

# Run as a non-root user (Cloud Run runs the container as root unless USER is set)
USER node

# Cloud Run injects the PORT env var (8080 by default); EXPOSE is documentation only
EXPOSE 8080

# Use exec form so the process receives SIGTERM directly for graceful shutdown
CMD ["node", "server.js"]

# Note: Dockerfile HEALTHCHECK is ignored by Cloud Run - configure startup and
# liveness probes in the service configuration instead.

# Cloud Run service configuration for optimal performance
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: optimized-app
  annotations:
    # Control which traffic sources may reach the service
    # (all, internal, or internal-and-cloud-load-balancing)
    run.googleapis.com/ingress: all
spec:
  template:
    metadata:
      annotations:
        # Minimum instances avoid cold starts entirely; maximum caps cost
        # and protects downstream systems during traffic spikes
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "100"
        # Keep CPU allocated between requests to reduce cold-start impact
        run.googleapis.com/cpu-throttling: "false"
        # Second-generation execution environment for better CPU and network performance
        run.googleapis.com/execution-environment: gen2
    spec:
      # Requests served concurrently by one instance; tune against CPU/memory headroom
      containerConcurrency: 80
      timeoutSeconds: 300
      serviceAccountName: cloud-run-optimized-sa
      containers:
      - name: app
        image: gcr.io/project/optimized-app:latest
        ports:
        - name: http1   # use "h2c" instead to enable end-to-end HTTP/2
          containerPort: 8080
        env:
        # PORT is reserved and set by Cloud Run - do not define it here
        - name: NODE_ENV
          value: "production"
        resources:
          # Right-size CPU and memory for your workload.
          # Cloud Run honours limits only; Kubernetes-style requests are ignored.
          limits:
            cpu: "1"         # 1 vCPU
            memory: "512Mi"  # 512 MiB RAM
        # Startup probe: marks the instance serving as soon as /health responds
        startupProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 0
          periodSeconds: 1
          timeoutSeconds: 1
          failureThreshold: 10
        # Liveness probe: restarts instances that stop responding.
        # (Cloud Run supports startup and liveness probes only - no readiness probes.)
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          periodSeconds: 10
          timeoutSeconds: 5
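
A quick way to sanity-check the `containerConcurrency` setting is Little's law: instances needed β‰ˆ request rate Γ— average latency Γ· concurrency. A minimal sketch (Python, with illustrative numbers):

```python
import math

def estimate_instances(requests_per_second, avg_latency_seconds, concurrency):
    """Rough steady-state instance-count estimate via Little's law."""
    in_flight = requests_per_second * avg_latency_seconds  # concurrent requests
    return math.ceil(in_flight / concurrency)

# e.g. 1,000 RPS at 250 ms average latency with concurrency 80
print(estimate_instances(1000, 0.25, 80))  # 4
```

This is only a steady-state approximation: real autoscaling also reacts to CPU utilisation and traffic bursts, so treat the result as a lower bound when choosing `maxScale`.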

2. Application-Level Performance Optimizations

// Node.js optimizations for Cloud Run
const express = require('express');
const compression = require('compression');
const helmet = require('helmet');

const app = express();
const PORT = process.env.PORT || 8080;

// Enable gzip compression
app.use(compression());

// Security headers
app.use(helmet());

// Optimize JSON parsing
app.use(express.json({ limit: '10mb' }));

// Connection pooling and keep-alive for outbound calls to external services
// (pass these agents explicitly, e.g. https.get(url, { agent: httpsAgent }))
const https = require('https');
const http = require('http');

const httpsAgent = new https.Agent({
  keepAlive: true,
  keepAliveMsecs: 10000,
  maxSockets: 50,
  maxFreeSockets: 10,
  timeout: 60000,
});

const httpAgent = new http.Agent({
  keepAlive: true,
  keepAliveMsecs: 10000,
  maxSockets: 50,
  maxFreeSockets: 10,
  timeout: 60000,
});

// Database connection optimization
class DatabaseManager {
  constructor() {
    this.pool = null;
    this.initializePool();
  }

  initializePool() {
    const { Pool } = require('pg');
    
    this.pool = new Pool({
      connectionString: process.env.DATABASE_URL,
      // Optimize connection pool for Cloud Run
      max: 10,           // Maximum connections
      min: 2,            // Minimum connections
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 2000,
      // Enable SSL for secure connections
      ssl: process.env.NODE_ENV === 'production' ? { rejectUnauthorized: false } : false
    });

    // Handle pool errors
    this.pool.on('error', (err) => {
      console.error('Database pool error:', err);
    });
  }

  async query(text, params) {
    const start = Date.now();
    try {
      const res = await this.pool.query(text, params);
      const duration = Date.now() - start;
      console.log(`Query executed in ${duration}ms`);
      return res;
    } catch (error) {
      console.error('Database query error:', error);
      throw error;
    }
  }

  async close() {
    await this.pool.end();
  }
}

const db = new DatabaseManager();

// In-memory caching for frequently accessed data
// (per-instance only: each Cloud Run instance holds its own copy, so use
// Memorystore/Redis when cross-instance consistency matters)
const NodeCache = require('node-cache');
const cache = new NodeCache({ 
  stdTTL: 600,  // 10 minutes default TTL
  checkperiod: 120,  // Check for expired keys every 2 minutes
  useClones: false   // Better performance, but be careful with object mutations
});

// Middleware for caching
function cacheMiddleware(ttl = 300) {
  return (req, res, next) => {
    const key = req.originalUrl;
    const cachedData = cache.get(key);
    
    if (cachedData) {
      console.log(`Cache hit for ${key}`);
      return res.json(cachedData);
    }
    
    // Override res.json to cache the response
    const originalJson = res.json;
    res.json = function(data) {
      cache.set(key, data, ttl);
      console.log(`Cached response for ${key}`);
      return originalJson.call(this, data);
    };
    
    next();
  };
}

// Health check endpoints for Cloud Run
app.get('/health', (req, res) => {
  res.status(200).json({ 
    status: 'healthy', 
    timestamp: new Date().toISOString(),
    uptime: process.uptime()
  });
});

app.get('/ready', async (req, res) => {
  try {
    // Check database connectivity
    await db.query('SELECT 1');
    res.status(200).json({ 
      status: 'ready',
      database: 'connected'
    });
  } catch (error) {
    res.status(503).json({ 
      status: 'not ready',
      error: 'database connection failed'
    });
  }
});

// Optimized API endpoints with caching
app.get('/api/data', cacheMiddleware(600), async (req, res) => {
  try {
    const result = await db.query('SELECT * FROM expensive_query LIMIT 100');
    res.json(result.rows);
  } catch (error) {
    res.status(500).json({ error: 'Internal server error' });
  }
});

// Batch processing endpoint for efficiency
app.post('/api/batch', async (req, res) => {
  try {
    const { items } = req.body;
    
    if (!Array.isArray(items) || items.length === 0) {
      return res.status(400).json({ error: 'Items array is required' });
    }

    // Process items in parallel but with concurrency limit
    const pLimit = require('p-limit');
    const limit = pLimit(10); // Process max 10 items concurrently
    
    const results = await Promise.allSettled(
      items.map(item => 
        limit(() => processItem(item))
      )
    );
    
    const processed = results.map((result, index) => ({
      index,
      status: result.status,
      data: result.status === 'fulfilled' ? result.value : null,
      error: result.status === 'rejected' ? result.reason.message : null
    }));
    
    res.json({ 
      processed: processed.length,
      successful: results.filter(r => r.status === 'fulfilled').length,
      failed: results.filter(r => r.status === 'rejected').length,
      results: processed
    });
    
  } catch (error) {
    res.status(500).json({ error: 'Batch processing failed' });
  }
});

async function processItem(item) {
  // Simulate item processing
  await new Promise(resolve => setTimeout(resolve, 100));
  return { id: item.id, processed: true, timestamp: new Date() };
}

// Graceful shutdown - Cloud Run sends SIGTERM before stopping an instance
process.on('SIGINT', gracefulShutdown);
process.on('SIGTERM', gracefulShutdown);

async function gracefulShutdown(signal) {
  console.log(`Received ${signal}. Starting graceful shutdown...`);
  
  // Stop accepting new requests
  server.close(async () => {
    console.log('HTTP server closed');
    
    // Close database connections
    try {
      await db.close();
      console.log('Database connections closed');
    } catch (error) {
      console.error('Error closing database:', error);
    }
    
    // Clear cache
    cache.flushAll();
    
    process.exit(0);
  });
  
  // Force shutdown after 30 seconds
  setTimeout(() => {
    console.error('Force shutdown');
    process.exit(1);
  }, 30000);
}

const server = app.listen(PORT, '0.0.0.0', () => {
  console.log(`Server running on port ${PORT}`);
  console.log(`Environment: ${process.env.NODE_ENV}`);
  console.log(`Memory limit: ${process.env.MEMORY_LIMIT || 'Not set'}`);
});

module.exports = app;
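
The cache middleware above mainly improves average latency, and a back-of-envelope model shows why the hit rate matters so much. A minimal sketch (Python, with illustrative timings):

```python
def avg_latency_ms(hit_rate, cache_ms, origin_ms):
    """Expected response time under a simple two-tier cache model."""
    return hit_rate * cache_ms + (1 - hit_rate) * origin_ms

# 90% hit rate, 2 ms cache reads, 150 ms database-backed responses
print(round(avg_latency_ms(0.9, 2, 150), 1))  # 16.8
```

Even a modest hit rate dominates the average, which is why caching the few hottest endpoints often beats broad micro-optimisation.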

3. Performance Monitoring and Optimization Script

# Cloud Run performance monitoring and optimization script
import requests
import time
import json
import statistics
from google.cloud import run_v2
from google.cloud import monitoring_v3
from datetime import datetime, timedelta

class CloudRunPerformanceOptimizer:
    def __init__(self, project_id, region='europe-west1'):
        self.project_id = project_id
        self.region = region
        self.client = run_v2.ServicesClient()
        self.monitoring_client = monitoring_v3.MetricServiceClient()
    
    def analyze_service_performance(self, service_name, days_back=7):
        """
        Analyze Cloud Run service performance metrics
        """
        # Get service configuration (Cloud Run Admin API v2 resource name)
        service_path = f"projects/{self.project_id}/locations/{self.region}/services/{service_name}"
        
        try:
            service = self.client.get_service(name=service_path)
        except Exception as e:
            print(f"Error getting service info: {e}")
            return None
        
        # Analyze metrics
        end_time = datetime.utcnow()
        start_time = end_time - timedelta(days=days_back)
        
        metrics_analysis = {
            'service_name': service_name,
            'current_config': self._extract_service_config(service),
            'performance_metrics': self._get_performance_metrics(service_name, start_time, end_time),
            'cost_analysis': self._calculate_cost_metrics(service_name, start_time, end_time),
            'recommendations': []
        }
        
        # Generate recommendations
        recommendations = self._generate_performance_recommendations(metrics_analysis)
        metrics_analysis['recommendations'] = recommendations
        
        return metrics_analysis
    
    def _extract_service_config(self, service):
        """
        Extract current service configuration (Cloud Run Admin API v2 fields)
        """
        template = service.template
        container = template.containers[0]

        return {
            'cpu_limit': container.resources.limits.get('cpu', 'Not set'),
            'memory_limit': container.resources.limits.get('memory', 'Not set'),
            'concurrency': template.max_instance_request_concurrency,
            'min_instances': str(template.scaling.min_instance_count),
            'max_instances': str(template.scaling.max_instance_count),
            # cpu_idle=True means CPU is throttled outside of request processing
            'cpu_throttling': 'true' if container.resources.cpu_idle else 'false',
            'execution_environment': (
                'gen2'
                if template.execution_environment
                == run_v2.ExecutionEnvironment.EXECUTION_ENVIRONMENT_GEN2
                else 'gen1'
            )
        }
    
    def _get_performance_metrics(self, service_name, start_time, end_time):
        """
        Get performance metrics from Cloud Monitoring
        """
        project_name = f"projects/{self.project_id}"
        
        # Define metrics to collect
        metrics = {
            'request_count': 'run.googleapis.com/request_count',
            'request_latencies': 'run.googleapis.com/request_latencies',
            'container_cpu_utilization': 'run.googleapis.com/container/cpu/utilizations',
            'container_memory_utilization': 'run.googleapis.com/container/memory/utilizations',
            'container_instance_count': 'run.googleapis.com/container/instance_count',
            'container_started_count': 'run.googleapis.com/container/started_count'
        }
        
        collected_metrics = {}
        
        for metric_name, metric_type in metrics.items():
            try:
                interval = monitoring_v3.TimeInterval({
                    "end_time": {"seconds": int(end_time.timestamp())},
                    "start_time": {"seconds": int(start_time.timestamp())},
                })
                
                results = self.monitoring_client.list_time_series(
                    request={
                        "name": project_name,
                        "filter": f'metric.type="{metric_type}" AND resource.labels.service_name="{service_name}"',
                        "interval": interval,
                        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
                    }
                )
                
                # Process results (latency and utilization metrics are
                # DISTRIBUTION-valued, so read the distribution mean)
                metric_data = []
                for result in results:
                    for point in result.points:
                        value = point.value
                        if value.distribution_value.count:
                            numeric = value.distribution_value.mean
                        else:
                            numeric = value.double_value or value.int64_value
                        metric_data.append({
                            'timestamp': point.interval.end_time.timestamp(),
                            'value': numeric
                        })
                
                if metric_data:
                    values = [point['value'] for point in metric_data]
                    collected_metrics[metric_name] = {
                        'average': statistics.mean(values) if values else 0,
                        'p95': statistics.quantiles(values, n=20)[18] if len(values) > 1 else 0,
                        'p99': statistics.quantiles(values, n=100)[98] if len(values) > 1 else 0,
                        'max': max(values) if values else 0,
                        'min': min(values) if values else 0,
                        'data_points': len(values)
                    }
                
            except Exception as e:
                print(f"Error getting metric {metric_name}: {e}")
                collected_metrics[metric_name] = None
        
        return collected_metrics
    
    def _calculate_cost_metrics(self, service_name, start_time, end_time):
        """
        Estimate cost metrics for the service
        """
        # Approximate Cloud Run list prices (USD); accurate figures
        # should come from the Cloud Billing API

        pricing = {
            'cpu_per_vcpu_second': 0.00002400,   # $0.000024 per vCPU-second
            'memory_per_gb_second': 0.00000250,  # $0.0000025 per GiB-second
            'request_per_million': 0.40          # $0.40 per million requests
        }

        # This would normally integrate with the Cloud Billing API;
        # for now, return placeholder estimates
        return {
            'estimated_monthly_cost_usd': 50.00,  # Placeholder
            'cost_per_request_usd': 0.00001,
            'cost_breakdown': {
                'cpu_cost_percentage': 60,
                'memory_cost_percentage': 30,
                'request_cost_percentage': 10
            }
        }
    
    def _generate_performance_recommendations(self, metrics_analysis):
        """
        Generate performance optimization recommendations
        """
        recommendations = []
        config = metrics_analysis['current_config']
        metrics = metrics_analysis['performance_metrics']
        
        # CPU utilization recommendations
        if metrics.get('container_cpu_utilization'):
            cpu_util = metrics['container_cpu_utilization']
            if cpu_util['average'] > 80:
                recommendations.append({
                    'priority': 'HIGH',
                    'category': 'CPU',
                    'issue': f"High CPU utilization ({cpu_util['average']:.1f}% average)",
                    'recommendation': "Increase CPU allocation or optimize application code",
                    'implementation': "Update service with higher CPU limit (e.g., '1' -> '2')"
                })
            elif cpu_util['average'] < 20:
                recommendations.append({
                    'priority': 'MEDIUM',
                    'category': 'CPU',
                    'issue': f"Low CPU utilization ({cpu_util['average']:.1f}% average)",
                    'recommendation': "Reduce CPU allocation to save costs",
                    'implementation': "Update service with lower CPU limit"
                })
        
        # Memory utilization recommendations
        if metrics.get('container_memory_utilization'):
            mem_util = metrics['container_memory_utilization']
            if mem_util['average'] > 85:
                recommendations.append({
                    'priority': 'HIGH',
                    'category': 'Memory',
                    'issue': f"High memory utilization ({mem_util['average']:.1f}% average)",
                    'recommendation': "Increase memory allocation or optimize memory usage",
                    'implementation': "Update service with higher memory limit"
                })
        
        # Request latency recommendations
        if metrics.get('request_latencies'):
            latency = metrics['request_latencies']
            if latency['p95'] > 2000:  # 2 seconds
                recommendations.append({
                    'priority': 'HIGH',
                    'category': 'Latency',
                    'issue': f"High request latency (P95: {latency['p95']:.0f}ms)",
                    'recommendation': "Optimize application performance or increase resources",
                    'implementation': "Profile application, add caching, or increase CPU/memory"
                })
        
        # Cold start recommendations
        if config['min_instances'] == '0':
            recommendations.append({
                'priority': 'MEDIUM',
                'category': 'Cold Starts',
                'issue': "No minimum instances configured",
                'recommendation': "Set minimum instances to reduce cold starts for latency-sensitive apps",
                'implementation': "Add annotation: run.googleapis.com/min-instances: '1'"
            })
        
        # Concurrency recommendations
        concurrency = int(config.get('concurrency', 80))
        if concurrency > 100:
            recommendations.append({
                'priority': 'LOW',
                'category': 'Concurrency',
                'issue': f"High concurrency setting ({concurrency})",
                'recommendation': "Consider reducing concurrency for better per-request performance",
                'implementation': "Set containerConcurrency to 80 or lower"
            })
        
        # Execution environment recommendations
        if config['execution_environment'] == 'gen1':
            recommendations.append({
                'priority': 'MEDIUM',
                'category': 'Runtime',
                'issue': "Using first-generation execution environment",
                'recommendation': "Upgrade to second-generation for better performance",
                'implementation': "Add annotation: run.googleapis.com/execution-environment: gen2"
            })
        
        return recommendations
    
    def load_test_service(self, service_url, concurrent_users=10, duration_seconds=60):
        """
        Simple load test for Cloud Run service
        """
        import threading
        import queue
        
        results_queue = queue.Queue()
        start_time = time.time()
        
        def make_requests():
            session = requests.Session()
            request_count = 0
            response_times = []
            errors = 0
            
            while time.time() - start_time < duration_seconds:
                try:
                    response_start = time.time()
                    response = session.get(f"{service_url}/health", timeout=30)
                    response_time = (time.time() - response_start) * 1000  # Convert to ms
                    
                    response_times.append(response_time)
                    request_count += 1
                    
                    if response.status_code != 200:
                        errors += 1
                        
                except Exception as e:
                    errors += 1
                    print(f"Request error: {e}")
                
                time.sleep(0.1)  # Small delay between requests
            
            results_queue.put({
                'request_count': request_count,
                'response_times': response_times,
                'errors': errors
            })
        
        # Start concurrent threads
        threads = []
        for _ in range(concurrent_users):
            thread = threading.Thread(target=make_requests)
            thread.start()
            threads.append(thread)
        
        # Wait for all threads to complete
        for thread in threads:
            thread.join()
        
        # Aggregate results
        total_requests = 0
        all_response_times = []
        total_errors = 0
        
        while not results_queue.empty():
            result = results_queue.get()
            total_requests += result['request_count']
            all_response_times.extend(result['response_times'])
            total_errors += result['errors']
        
        if all_response_times:
            return {
                'duration_seconds': duration_seconds,
                'concurrent_users': concurrent_users,
                'total_requests': total_requests,
                'requests_per_second': total_requests / duration_seconds,
                'total_errors': total_errors,
                'error_rate': (total_errors / total_requests) * 100 if total_requests > 0 else 0,
                'response_times': {
                    'average_ms': statistics.mean(all_response_times),
                    'p50_ms': statistics.median(all_response_times),
                    'p95_ms': statistics.quantiles(all_response_times, n=20)[18] if len(all_response_times) > 1 else 0,
                    'p99_ms': statistics.quantiles(all_response_times, n=100)[98] if len(all_response_times) > 1 else 0,
                    'max_ms': max(all_response_times),
                    'min_ms': min(all_response_times)
                }
            }
        else:
            return {'error': 'No successful requests completed'}

# Usage example
optimizer = CloudRunPerformanceOptimizer('your-project-id', 'europe-west1')

# Analyze service performance
analysis = optimizer.analyze_service_performance('my-service')

if analysis:
    print("=== Cloud Run Performance Analysis ===")
    print(f"Service: {analysis['service_name']}")
    print(f"CPU Limit: {analysis['current_config']['cpu_limit']}")
    print(f"Memory Limit: {analysis['current_config']['memory_limit']}")
    print(f"Concurrency: {analysis['current_config']['concurrency']}")
    print()
    
    if analysis['performance_metrics'].get('request_latencies'):
        latency = analysis['performance_metrics']['request_latencies']
        print(f"Request Latency - Avg: {latency['average']:.0f}ms, P95: {latency['p95']:.0f}ms")
    
    print()
    print("Recommendations:")
    for rec in analysis['recommendations']:
        print(f"πŸ”₯ {rec['priority']}: {rec['issue']}")
        print(f"   πŸ’‘ {rec['recommendation']}")
        print(f"   πŸ”§ {rec['implementation']}")
        print()

# Run load test
print("Running load test...")
load_test_results = optimizer.load_test_service(
    'https://my-service-hash-ew.a.run.app',
    concurrent_users=5,
    duration_seconds=30
)

if 'error' not in load_test_results:
    print("=== Load Test Results ===")
    print(f"Requests per second: {load_test_results['requests_per_second']:.2f}")
    print(f"Average response time: {load_test_results['response_times']['average_ms']:.0f}ms")
    print(f"P95 response time: {load_test_results['response_times']['p95_ms']:.0f}ms")
    print(f"Error rate: {load_test_results['error_rate']:.2f}%")
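
The `min-instances` recommendation above trades a small standing cost for fewer cold starts. Idle minimum instances are billed at reduced rates; the sketch below estimates that standing cost using assumed, illustrative rates rather than current list prices:

```python
def monthly_idle_cost_usd(vcpus, memory_gib,
                          cpu_rate=0.0000025,   # assumed idle USD per vCPU-second
                          mem_rate=0.00000025,  # assumed idle USD per GiB-second
                          seconds=30 * 24 * 3600):
    """Approximate monthly cost of keeping one idle instance warm."""
    return seconds * (vcpus * cpu_rate + memory_gib * mem_rate)

# One warm instance with 1 vCPU and 512 MiB
print(round(monthly_idle_cost_usd(1, 0.5), 2))
```

If that figure is smaller than the revenue impact of slow first requests, setting `min-instances: 1` is usually an easy win for latency-sensitive services.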

4. Cost-Performance Optimization Matrix

# Cloud Run cost-performance optimization calculator
class CloudRunCostOptimizer:
    def __init__(self):
        # GCP Cloud Run pricing (approximate, in GBP)
        self.pricing = {
            'cpu_per_vcpu_second': 0.000018,  # Β£0.000018 per vCPU-second
            'memory_per_gb_second': 0.000002, # Β£0.000002 per GB-second
            'request_per_million': 0.30       # Β£0.30 per million requests
        }
    
    def calculate_monthly_costs(self, config):
        """
        Calculate monthly costs for different configurations
        """
        # Convert CPU limit to vCPUs
        cpu_vcpus = float(config['cpu_limit'].replace('m', '')) / 1000 if 'm' in config['cpu_limit'] else float(config['cpu_limit'])
        
        # Convert memory to GB
        memory_gb = float(config['memory_limit'].replace('Mi', '')) / 1024 if 'Mi' in config['memory_limit'] else float(config['memory_limit'].replace('Gi', ''))
        
        # Calculate based on usage patterns
        monthly_requests = config['monthly_requests']
        avg_request_duration_seconds = config['avg_request_duration_seconds']
        
        # Total compute seconds per month
        total_compute_seconds = monthly_requests * avg_request_duration_seconds
        
        # Calculate costs
        cpu_cost = total_compute_seconds * cpu_vcpus * self.pricing['cpu_per_vcpu_second']
        memory_cost = total_compute_seconds * memory_gb * self.pricing['memory_per_gb_second']
        request_cost = (monthly_requests / 1000000) * self.pricing['request_per_million']
        
        total_cost = cpu_cost + memory_cost + request_cost
        
        return {
            'cpu_cost_gbp': cpu_cost,
            'memory_cost_gbp': memory_cost,
            'request_cost_gbp': request_cost,
            'total_monthly_cost_gbp': total_cost,
            'cost_per_request_gbp': total_cost / monthly_requests if monthly_requests > 0 else 0,
            'cost_breakdown': {
                'cpu_percentage': (cpu_cost / total_cost) * 100 if total_cost > 0 else 0,
                'memory_percentage': (memory_cost / total_cost) * 100 if total_cost > 0 else 0,
                'request_percentage': (request_cost / total_cost) * 100 if total_cost > 0 else 0
            }
        }
    
    def find_optimal_configuration(self, requirements):
        """
        Find optimal CPU/memory configuration for given requirements
        """
        configurations = []
        
        # Test different CPU/memory combinations
        cpu_options = ['0.5', '1', '2', '4']
        memory_options = ['512Mi', '1Gi', '2Gi', '4Gi', '8Gi']
        
        for cpu in cpu_options:
            for memory in memory_options:
                # Skip invalid combinations (memory must be appropriate for CPU)
                cpu_float = float(cpu)
                memory_gb = float(memory.replace('Mi', '')) / 1024 if 'Mi' in memory else float(memory.replace('Gi', ''))
                
                # GCP Cloud Run memory constraints
                if cpu_float == 0.5 and memory_gb > 2:
                    continue
                if cpu_float == 1 and memory_gb > 4:
                    continue
                if cpu_float == 2 and memory_gb > 8:
                    continue
                
                config = {
                    'cpu_limit': cpu,
                    'memory_limit': memory,
                    'monthly_requests': requirements['monthly_requests'],
                    'avg_request_duration_seconds': requirements['avg_request_duration_seconds']
                }
                
                costs = self.calculate_monthly_costs(config)
                
                # Calculate performance score (simplified)
                performance_score = self._calculate_performance_score(cpu_float, memory_gb, requirements)
                
                configurations.append({
                    'cpu': cpu,
                    'memory': memory,
                    'monthly_cost_gbp': costs['total_monthly_cost_gbp'],
                    'cost_per_request_gbp': costs['cost_per_request_gbp'],
                    'performance_score': performance_score,
                    'efficiency_ratio': performance_score / costs['total_monthly_cost_gbp'] if costs['total_monthly_cost_gbp'] > 0 else 0,
                    'full_analysis': costs
                })
        
        # Sort by efficiency ratio (performance per pound)
        configurations.sort(key=lambda x: x['efficiency_ratio'], reverse=True)
        
        return configurations
    
    def _calculate_performance_score(self, cpu_vcpus, memory_gb, requirements):
        """
        Calculate a performance score based on resource allocation
        """
        # This is a simplified performance model
        # In reality, you'd use actual performance testing data
        
        cpu_requirement = requirements.get('cpu_intensive', 5)  # 1-10 scale
        memory_requirement = requirements.get('memory_intensive', 5)  # 1-10 scale
        
        # Calculate how well resources match requirements
        cpu_score = min(cpu_vcpus * 10, cpu_requirement * 2)  # Cap at 2x requirement
        memory_score = min(memory_gb * 2, memory_requirement * 2)  # Cap at 2x requirement
        
        # Weighted average (customize weights based on your workload)
        cpu_weight = 0.6
        memory_weight = 0.4
        
        performance_score = (cpu_score * cpu_weight) + (memory_score * memory_weight)
        
        # Clamp to the 0-10 scale used when reporting the score
        return round(min(performance_score, 10), 2)

# Example usage
optimizer = CloudRunCostOptimizer()

# Define requirements
requirements = {
    'monthly_requests': 1000000,  # 1M requests per month
    'avg_request_duration_seconds': 0.5,  # 500ms average
    'cpu_intensive': 7,  # CPU-intensive workload (1-10 scale)
    'memory_intensive': 4  # Moderate memory usage (1-10 scale)
}

# Find optimal configuration
optimal_configs = optimizer.find_optimal_configuration(requirements)

print("=== Cloud Run Cost-Performance Optimization ===")
print(f"Requirements: {requirements['monthly_requests']:,} requests/month, {requirements['avg_request_duration_seconds']}s avg duration")
print()
print("Top 5 most efficient configurations:")
print()

for i, config in enumerate(optimal_configs[:5], 1):
    print(f"{i}. CPU: {config['cpu']}, Memory: {config['memory']}")
    print(f"   Monthly Cost: Β£{config['monthly_cost_gbp']:.2f}")
    print(f"   Cost per Request: Β£{config['cost_per_request_gbp']:.6f}")
    print(f"   Performance Score: {config['performance_score']}/10")
    print(f"   Efficiency Ratio: {config['efficiency_ratio']:.2f}")
    print()
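As a quick sanity check on the scoring arithmetic, the model can be re-derived standalone for a couple of configurations. This is a stripped-down copy of `_calculate_performance_score` using the sample requirements above (cpu_intensive=7, memory_intensive=4), not part of the optimizer itself:

```python
# Standalone copy of the scoring arithmetic, using the sample
# requirements above (cpu_intensive=7, memory_intensive=4)
def score(cpu_vcpus, memory_gb, cpu_req, memory_req):
    cpu_score = min(cpu_vcpus * 10, cpu_req * 2)       # cap at 2x requirement
    memory_score = min(memory_gb * 2, memory_req * 2)  # cap at 2x requirement
    return round(min(cpu_score * 0.6 + memory_score * 0.4, 10), 2)

print(score(1, 0.5, 7, 4))  # 1 vCPU / 512MiB: min(10,14)*0.6 + min(1,8)*0.4 = 6.4
print(score(2, 1, 7, 4))    # 2 vCPU / 1GiB:   min(20,14)*0.6 + min(2,8)*0.4 = 9.2
```

Note how the 0.6/0.4 weighting rewards CPU for this CPU-intensive workload: the 1 vCPU configuration already scores 6.4 despite minimal memory.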

Why It Matters

  • User Experience: Fast, responsive serverless apps keep users engaged
  • Cost Efficiency: Right-sized Cloud Run services can reduce costs by 50-70%
  • Scalability: Properly configured services handle traffic spikes gracefully
  • Reliability: Optimized containers have fewer timeouts and failures
  • Developer Productivity: Faster deployments and better performance feedback

Try This Week

  1. Audit your Cloud Run services – Run the performance analysis script
  2. Optimize one container image – Reduce size and improve startup time
  3. Right-size resources – Use the cost optimizer to find efficient configurations
  4. Add proper health checks – Implement startup and liveness probes (the probe types Cloud Run supports)
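Cloud Run's configurable probes are startup probes (which gate traffic until the container is ready) and liveness probes (which restart instances that stop responding). They are set on the container in the service YAML and applied with `gcloud run services replace service.yaml` – a minimal sketch, where the service name, image, and `/healthz` path are placeholders:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service                         # placeholder
spec:
  template:
    spec:
      containers:
        - image: gcr.io/PROJECT/my-service  # placeholder
          startupProbe:
            httpGet:
              path: /healthz
            periodSeconds: 1
            failureThreshold: 30
          livenessProbe:
            httpGet:
              path: /healthz
            periodSeconds: 10
```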

Quick Cloud Run Performance Assessment

#!/bin/bash
# Cloud Run performance assessment script

PROJECT_ID="your-project-id"
REGION="europe-west1"

# Requires yq v4 (https://github.com/mikefarah/yq) for YAML parsing
command -v yq >/dev/null 2>&1 || { echo "Error: yq is required"; exit 1; }

echo "=== Cloud Run Performance Assessment ==="
echo

echo "πŸ“Š Current Cloud Run services:"
gcloud run services list --project=$PROJECT_ID --region=$REGION --format="table(metadata.name,status.url,spec.template.spec.containers[0].resources.limits.cpu,spec.template.spec.containers[0].resources.limits.memory,spec.template.spec.containerConcurrency)"

echo
echo "πŸ” Service configurations analysis:"
for service in $(gcloud run services list --project=$PROJECT_ID --region=$REGION --format="value(metadata.name)")
do
    echo "Service: $service"
    
    # Get detailed service info
    gcloud run services describe "$service" --project=$PROJECT_ID --region=$REGION --format="yaml" > /tmp/service_config.yaml
    
    # Extract key configuration values
    echo "  CPU Limit: $(yq '.spec.template.spec.containers[0].resources.limits.cpu // "Not set"' /tmp/service_config.yaml)"
    echo "  Memory Limit: $(yq '.spec.template.spec.containers[0].resources.limits.memory // "Not set"' /tmp/service_config.yaml)"
    echo "  Concurrency: $(yq '.spec.template.spec.containerConcurrency // "80"' /tmp/service_config.yaml)"
    echo "  Min Instances: $(yq '.spec.template.metadata.annotations."autoscaling.knative.dev/minScale" // "0"' /tmp/service_config.yaml)"
    echo "  Max Instances: $(yq '.spec.template.metadata.annotations."autoscaling.knative.dev/maxScale" // "100"' /tmp/service_config.yaml)"
    echo "  CPU Throttling: $(yq '.spec.template.metadata.annotations."run.googleapis.com/cpu-throttling" // "true"' /tmp/service_config.yaml)"
    echo "  Execution Environment: $(yq '.spec.template.metadata.annotations."run.googleapis.com/execution-environment" // "gen1"' /tmp/service_config.yaml)"
    echo
done

rm -f /tmp/service_config.yaml

echo "πŸ’° Recent Cloud Run costs (requires billing export):"
echo "Run this BigQuery query in your billing dataset:"
cat << 'EOF'
SELECT
  service.description,
  SUM(cost) as total_cost,
  SUM(CASE WHEN sku.description LIKE '%CPU%' THEN cost END) as cpu_cost,
  SUM(CASE WHEN sku.description LIKE '%Memory%' THEN cost END) as memory_cost,
  SUM(CASE WHEN sku.description LIKE '%Request%' THEN cost END) as request_cost
FROM `project.dataset.gcp_billing_export_v1_BILLING_ACCOUNT_ID`
WHERE service.description = 'Cloud Run'
  AND usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY service.description
EOF

echo
echo "🎯 Performance optimization checklist:"
echo "1. βœ… Use distroless or alpine base images"
echo "2. βœ… Enable second-generation execution environment (gen2)"
echo "3. βœ… Set appropriate CPU and memory limits"
echo "4. βœ… Configure startup and liveness probes"
echo "5. βœ… Implement proper connection pooling"
echo "6. βœ… Add response caching where appropriate"
echo "7. βœ… Set minimum instances for latency-sensitive services"
echo "8. βœ… Monitor and optimize container concurrency"
echo "9. βœ… Use compression for response data"
echo "10. βœ… Implement graceful shutdown handling"

Common Cloud Run Performance Mistakes

  • Oversized containers: Using large base images that slow cold starts
  • Poor resource allocation: Wrong CPU/memory ratios for the workload
  • No health checks: Missing probes that help Cloud Run manage containers
  • Inefficient connection handling: Not reusing database connections
  • No caching: Making expensive operations on every request
  • Synchronous processing: Blocking on slow external API calls

Advanced Performance Patterns

  • Container warming: Pre-warm containers with background traffic
  • Request batching: Process multiple items per request when possible
  • Async processing: Use Cloud Tasks for long-running operations
  • Regional optimization: Deploy to regions closer to your users
  • Traffic splitting: Gradually roll out performance optimizations

Pro Tip: Start performance optimization with container image size and startup time – these have the biggest impact on cold starts. A 50MB container image pulls and starts significantly faster than a 500MB one, directly improving user experience.