Your weekly dose of actionable cloud wisdom to start the week right
The Problem
Your Cloud Run services are taking 10+ seconds to cold start, costing you customers who won’t wait for slow responses. You’re paying for CPU and memory you don’t need, instances are scaling chaotically during traffic spikes, and your serverless dream has become a performance nightmare. Meanwhile, other teams have sub-second response times and predictable costs with the same technology.
The Solution
Optimize Cloud Run performance using intelligent resource sizing, startup acceleration, concurrency tuning, and traffic management. Cloud Run can deliver exceptional performance and cost efficiency when properly configured – the key is understanding how serverless containers behave under load and optimizing accordingly.
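Before tuning anything, it helps to sanity-check how many instances your traffic actually needs. Little's law gives the back-of-envelope answer: instances ≈ (requests per second × average latency) ÷ concurrency per instance. A small sketch (the traffic numbers are illustrative, not from any real service):

```python
import math

def instances_needed(requests_per_second: float,
                     avg_latency_seconds: float,
                     concurrency_per_instance: int) -> int:
    """Estimate steady-state Cloud Run instances via Little's law:
    in-flight requests = arrival rate x latency, spread across instances."""
    in_flight = requests_per_second * avg_latency_seconds
    return max(1, math.ceil(in_flight / concurrency_per_instance))

# Illustrative numbers: 500 req/s at 200 ms with concurrency 80
print(instances_needed(500, 0.2, 80))   # 2 instances
# Same traffic, but 2 s responses during a slow dependency
print(instances_needed(500, 2.0, 80))   # 13 instances
```

Note how latency drives instance count: a slow downstream dependency multiplies your fleet size, which is why concurrency tuning and latency optimization go hand in hand.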
Essential Cloud Run Performance Optimizations:
1. Container Image and Startup Optimization
# Optimized Dockerfile for fast Cloud Run startup
# Use distroless or alpine base images for minimal size
# (e.g. gcr.io/distroless/java17-debian12 for Java, node:18-alpine for Node.js)

# Use multi-stage builds to minimize final image size
FROM node:18-alpine AS builder
WORKDIR /app
# Optimize layer caching - copy dependency manifests first and application
# code last, so unchanged dependency layers stay cached between builds
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

FROM node:18-alpine AS runtime
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY . .

# Run as a non-root user (Cloud Run runs as root unless you set one)
USER 1000

# Cloud Run routes traffic to the port in the PORT env var (8080 by default);
# EXPOSE is documentation only, and Docker HEALTHCHECK is ignored - use
# startup and liveness probes in the service configuration instead
EXPOSE 8080

# Use exec form so the process receives SIGTERM for graceful shutdown
CMD ["node", "server.js"]
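Startup cost can also be deferred in application code: initialise expensive clients on first use instead of at import time, so the container answers its first request sooner. A minimal Python sketch of the lazy-initialization pattern (`build_client` is a hypothetical stand-in for any heavy setup such as SDK construction or loading a model):

```python
import time

_client = None

def build_client():
    """Stand-in for expensive setup (SDK init, model load, cache warm-up)."""
    time.sleep(0.1)  # simulate slow construction
    return {"ready": True}

def get_client():
    """Create the heavy client lazily, on first use rather than at startup."""
    global _client
    if _client is None:
        _client = build_client()
    return _client

# Import time pays nothing; the first call pays the 0.1 s, later calls are free.
assert get_client() is get_client()
```

The trade-off: the first request after a cold start absorbs the initialization cost, so combine this with a startup probe that stays cheap.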
# Cloud Run service configuration for optimal performance
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: optimized-app
  annotations:
    # Accept traffic from all sources
    run.googleapis.com/ingress: all
spec:
  template:
    metadata:
      annotations:
        # Keep CPU allocated outside of requests (no throttling)
        run.googleapis.com/cpu-throttling: "false"
        # Extra CPU during startup to reduce cold start time
        run.googleapis.com/startup-cpu-boost: "true"
        # Second-generation execution environment for better performance
        run.googleapis.com/execution-environment: gen2
        # Minimum instances avoid cold starts entirely
        autoscaling.knative.dev/minScale: "1"
        # Maximum instances control costs and avoid a thundering herd
        autoscaling.knative.dev/maxScale: "100"
    spec:
      # Requests served concurrently by each instance
      containerConcurrency: 80
      timeoutSeconds: 300
      serviceAccountName: cloud-run-optimized-sa
      containers:
      - name: app
        image: gcr.io/project/optimized-app:latest
        ports:
        - name: http1
          containerPort: 8080
        env:
        # PORT is set by Cloud Run itself and must not be declared here
        - name: NODE_ENV
          value: "production"
        resources:
          limits:
            # Right-size CPU and memory for your workload
            cpu: "1"        # 1 vCPU
            memory: "512Mi" # 512 MB RAM
        # Startup probe for faster readiness detection (Cloud Run supports
        # startup and liveness probes; readiness probes are not supported)
        startupProbe:
          httpGet:
            path: /health
            port: 8080
          periodSeconds: 1
          timeoutSeconds: 1
          failureThreshold: 10
        # Liveness probe restarts unhealthy instances
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          periodSeconds: 10
          timeoutSeconds: 5
2. Application-Level Performance Optimizations
// Node.js optimizations for Cloud Run
const express = require('express');
const compression = require('compression');
const helmet = require('helmet');

const app = express();
const PORT = process.env.PORT || 8080;

// Enable gzip compression
app.use(compression());

// Security headers
app.use(helmet());

// Optimize JSON parsing
app.use(express.json({ limit: '10mb' }));

// Connection pooling and keep-alive for external services
const https = require('https');
const http = require('http');

const httpsAgent = new https.Agent({
  keepAlive: true,
  keepAliveMsecs: 10000,
  maxSockets: 50,
  maxFreeSockets: 10,
  timeout: 60000,
});

const httpAgent = new http.Agent({
  keepAlive: true,
  keepAliveMsecs: 10000,
  maxSockets: 50,
  maxFreeSockets: 10,
  timeout: 60000,
});

// Database connection optimization
class DatabaseManager {
  constructor() {
    this.pool = null;
    this.initializePool();
  }

  initializePool() {
    const { Pool } = require('pg');
    this.pool = new Pool({
      connectionString: process.env.DATABASE_URL,
      // Optimize connection pool for Cloud Run
      max: 10, // Maximum connections
      min: 2,  // Minimum connections
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 2000,
      // Enable SSL for secure connections
      ssl: process.env.NODE_ENV === 'production' ? { rejectUnauthorized: false } : false
    });

    // Handle pool errors
    this.pool.on('error', (err) => {
      console.error('Database pool error:', err);
    });
  }

  async query(text, params) {
    const start = Date.now();
    try {
      const res = await this.pool.query(text, params);
      const duration = Date.now() - start;
      console.log(`Query executed in ${duration}ms`);
      return res;
    } catch (error) {
      console.error('Database query error:', error);
      throw error;
    }
  }

  async close() {
    await this.pool.end();
  }
}

const db = new DatabaseManager();

// In-memory caching for frequently accessed data
const NodeCache = require('node-cache');
const cache = new NodeCache({
  stdTTL: 600,      // 10 minutes default TTL
  checkperiod: 120, // Check for expired keys every 2 minutes
  useClones: false  // Better performance, but be careful with object mutations
});

// Middleware for caching
function cacheMiddleware(ttl = 300) {
  return (req, res, next) => {
    const key = req.originalUrl;
    const cachedData = cache.get(key);
    if (cachedData) {
      console.log(`Cache hit for ${key}`);
      return res.json(cachedData);
    }

    // Override res.json to cache the response
    const originalJson = res.json;
    res.json = function (data) {
      cache.set(key, data, ttl);
      console.log(`Cached response for ${key}`);
      return originalJson.call(this, data);
    };
    next();
  };
}

// Health check endpoints for Cloud Run
app.get('/health', (req, res) => {
  res.status(200).json({
    status: 'healthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime()
  });
});

app.get('/ready', async (req, res) => {
  try {
    // Check database connectivity
    await db.query('SELECT 1');
    res.status(200).json({
      status: 'ready',
      database: 'connected'
    });
  } catch (error) {
    res.status(503).json({
      status: 'not ready',
      error: 'database connection failed'
    });
  }
});

// Optimized API endpoints with caching
app.get('/api/data', cacheMiddleware(600), async (req, res) => {
  try {
    const result = await db.query('SELECT * FROM expensive_query LIMIT 100');
    res.json(result.rows);
  } catch (error) {
    res.status(500).json({ error: 'Internal server error' });
  }
});

// Batch processing endpoint for efficiency
app.post('/api/batch', async (req, res) => {
  try {
    const { items } = req.body;
    if (!Array.isArray(items) || items.length === 0) {
      return res.status(400).json({ error: 'Items array is required' });
    }

    // Process items in parallel, but with a concurrency limit
    const pLimit = require('p-limit');
    const limit = pLimit(10); // Process max 10 items concurrently

    const results = await Promise.allSettled(
      items.map(item => limit(() => processItem(item)))
    );

    const processed = results.map((result, index) => ({
      index,
      status: result.status,
      data: result.status === 'fulfilled' ? result.value : null,
      error: result.status === 'rejected' ? result.reason.message : null
    }));

    res.json({
      processed: processed.length,
      successful: results.filter(r => r.status === 'fulfilled').length,
      failed: results.filter(r => r.status === 'rejected').length,
      results: processed
    });
  } catch (error) {
    res.status(500).json({ error: 'Batch processing failed' });
  }
});

async function processItem(item) {
  // Simulate item processing
  await new Promise(resolve => setTimeout(resolve, 100));
  return { id: item.id, processed: true, timestamp: new Date() };
}

// Graceful shutdown for Cloud Run
process.on('SIGINT', gracefulShutdown);
process.on('SIGTERM', gracefulShutdown);

async function gracefulShutdown(signal) {
  console.log(`Received ${signal}. Starting graceful shutdown...`);

  // Stop accepting new requests
  server.close(async () => {
    console.log('HTTP server closed');

    // Close database connections
    try {
      await db.close();
      console.log('Database connections closed');
    } catch (error) {
      console.error('Error closing database:', error);
    }

    // Clear cache
    cache.flushAll();
    process.exit(0);
  });

  // Force shutdown after 30 seconds
  setTimeout(() => {
    console.error('Force shutdown');
    process.exit(1);
  }, 30000);
}

const server = app.listen(PORT, '0.0.0.0', () => {
  console.log(`Server running on port ${PORT}`);
  console.log(`Environment: ${process.env.NODE_ENV}`);
  console.log(`Memory limit: ${process.env.MEMORY_LIMIT || 'Not set'}`);
});

module.exports = app;
3. Performance Monitoring and Optimization Script
# Cloud Run performance monitoring and optimization script
import time
import json
import statistics
from datetime import datetime, timedelta

import requests
from google.cloud import run_v2
from google.cloud import monitoring_v3


class CloudRunPerformanceOptimizer:
    def __init__(self, project_id, region='europe-west1'):
        self.project_id = project_id
        self.region = region
        self.client = run_v2.ServicesClient()
        self.monitoring_client = monitoring_v3.MetricServiceClient()

    def analyze_service_performance(self, service_name, days_back=7):
        """Analyze Cloud Run service performance metrics."""
        # Get the service configuration
        service_path = f"projects/{self.project_id}/locations/{self.region}/services/{service_name}"
        try:
            service = self.client.get_service(name=service_path)
        except Exception as e:
            print(f"Error getting service info: {e}")
            return None

        # Analyze metrics over the requested window
        end_time = datetime.utcnow()
        start_time = end_time - timedelta(days=days_back)

        metrics_analysis = {
            'service_name': service_name,
            'current_config': self._extract_service_config(service),
            'performance_metrics': self._get_performance_metrics(service_name, start_time, end_time),
            'cost_analysis': self._calculate_cost_metrics(service_name, start_time, end_time),
            'recommendations': []
        }

        # Generate recommendations
        metrics_analysis['recommendations'] = self._generate_performance_recommendations(metrics_analysis)
        return metrics_analysis

    def _extract_service_config(self, service):
        """Extract the current configuration from the run_v2 Service resource."""
        template = service.template
        container = template.containers[0]
        return {
            'cpu_limit': container.resources.limits.get('cpu', 'Not set'),
            'memory_limit': container.resources.limits.get('memory', 'Not set'),
            'concurrency': template.max_instance_request_concurrency,
            'min_instances': str(template.scaling.min_instance_count),
            'max_instances': str(template.scaling.max_instance_count),
            # cpu_idle=True means CPU is throttled outside of requests
            'cpu_throttling': str(container.resources.cpu_idle).lower(),
            'execution_environment': (
                'gen1' if 'GEN1' in template.execution_environment.name else 'gen2'
            )
        }
    def _get_performance_metrics(self, service_name, start_time, end_time):
        """Get performance metrics from Cloud Monitoring."""
        project_name = f"projects/{self.project_id}"

        # Define metrics to collect
        metrics = {
            'request_count': 'run.googleapis.com/request_count',
            'request_latencies': 'run.googleapis.com/request_latencies',
            'container_cpu_utilization': 'run.googleapis.com/container/cpu/utilizations',
            'container_memory_utilization': 'run.googleapis.com/container/memory/utilizations',
            'container_instance_count': 'run.googleapis.com/container/instance_count',
            'container_started_count': 'run.googleapis.com/container/started_count'
        }

        collected_metrics = {}
        for metric_name, metric_type in metrics.items():
            try:
                interval = monitoring_v3.TimeInterval({
                    "end_time": {"seconds": int(end_time.timestamp())},
                    "start_time": {"seconds": int(start_time.timestamp())},
                })
                results = self.monitoring_client.list_time_series(
                    request={
                        "name": project_name,
                        "filter": f'metric.type="{metric_type}" AND resource.labels.service_name="{service_name}"',
                        "interval": interval,
                        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
                    }
                )

                # Process results
                metric_data = []
                for result in results:
                    for point in result.points:
                        metric_data.append({
                            'timestamp': point.interval.end_time.timestamp(),
                            'value': point.value.double_value or point.value.int64_value
                        })

                if metric_data:
                    values = [point['value'] for point in metric_data]
                    collected_metrics[metric_name] = {
                        'average': statistics.mean(values) if values else 0,
                        'p95': statistics.quantiles(values, n=20)[18] if len(values) > 1 else 0,
                        'p99': statistics.quantiles(values, n=100)[98] if len(values) > 1 else 0,
                        'max': max(values) if values else 0,
                        'min': min(values) if values else 0,
                        'data_points': len(values)
                    }
            except Exception as e:
                print(f"Error getting metric {metric_name}: {e}")
                collected_metrics[metric_name] = None

        return collected_metrics

    def _calculate_cost_metrics(self, service_name, start_time, end_time):
        """Calculate cost metrics for the service."""
        # These are approximate costs based on Cloud Run pricing;
        # actual costs would come from the Cloud Billing API
        pricing = {
            'cpu_per_vcpu_second': 0.00002400,   # $0.000024 per vCPU-second
            'memory_per_gb_second': 0.00000250,  # $0.0000025 per GB-second
            'request_per_million': 0.40          # $0.40 per million requests
        }

        # This would normally integrate with the Cloud Billing API;
        # for now, return estimated costs based on metrics
        return {
            'estimated_monthly_cost_gbp': 50.00,  # Placeholder
            'cost_per_request_gbp': 0.00001,
            'cost_breakdown': {
                'cpu_cost_percentage': 60,
                'memory_cost_percentage': 30,
                'request_cost_percentage': 10
            }
        }

    def _generate_performance_recommendations(self, metrics_analysis):
        """Generate performance optimization recommendations."""
        recommendations = []
        config = metrics_analysis['current_config']
        metrics = metrics_analysis['performance_metrics']

        # CPU utilization recommendations
        if metrics.get('container_cpu_utilization'):
            cpu_util = metrics['container_cpu_utilization']
            if cpu_util['average'] > 80:
                recommendations.append({
                    'priority': 'HIGH',
                    'category': 'CPU',
                    'issue': f"High CPU utilization ({cpu_util['average']:.1f}% average)",
                    'recommendation': "Increase CPU allocation or optimize application code",
                    'implementation': "Update service with higher CPU limit (e.g. '1' -> '2')"
                })
            elif cpu_util['average'] < 20:
                recommendations.append({
                    'priority': 'MEDIUM',
                    'category': 'CPU',
                    'issue': f"Low CPU utilization ({cpu_util['average']:.1f}% average)",
                    'recommendation': "Reduce CPU allocation to save costs",
                    'implementation': "Update service with lower CPU limit"
                })

        # Memory utilization recommendations
        if metrics.get('container_memory_utilization'):
            mem_util = metrics['container_memory_utilization']
            if mem_util['average'] > 85:
                recommendations.append({
                    'priority': 'HIGH',
                    'category': 'Memory',
                    'issue': f"High memory utilization ({mem_util['average']:.1f}% average)",
                    'recommendation': "Increase memory allocation or optimize memory usage",
                    'implementation': "Update service with higher memory limit"
                })

        # Request latency recommendations
        if metrics.get('request_latencies'):
            latency = metrics['request_latencies']
            if latency['p95'] > 2000:  # 2 seconds
                recommendations.append({
                    'priority': 'HIGH',
                    'category': 'Latency',
                    'issue': f"High request latency (P95: {latency['p95']:.0f}ms)",
                    'recommendation': "Optimize application performance or increase resources",
                    'implementation': "Profile application, add caching, or increase CPU/memory"
                })

        # Cold start recommendations
        if config['min_instances'] == '0':
            recommendations.append({
                'priority': 'MEDIUM',
                'category': 'Cold Starts',
                'issue': "No minimum instances configured",
                'recommendation': "Set minimum instances to reduce cold starts for latency-sensitive apps",
                'implementation': "Add annotation: autoscaling.knative.dev/minScale: '1'"
            })

        # Concurrency recommendations
        concurrency = int(config.get('concurrency', 80))
        if concurrency > 100:
            recommendations.append({
                'priority': 'LOW',
                'category': 'Concurrency',
                'issue': f"High concurrency setting ({concurrency})",
                'recommendation': "Consider reducing concurrency for better per-request performance",
                'implementation': "Set containerConcurrency to 80 or lower"
            })

        # Execution environment recommendations
        if config['execution_environment'] == 'gen1':
            recommendations.append({
                'priority': 'MEDIUM',
                'category': 'Runtime',
                'issue': "Using first-generation execution environment",
                'recommendation': "Upgrade to second-generation for better performance",
                'implementation': "Add annotation: run.googleapis.com/execution-environment: gen2"
            })

        return recommendations

    def load_test_service(self, service_url, concurrent_users=10, duration_seconds=60):
        """Simple load test for a Cloud Run service."""
        import threading
        import queue

        results_queue = queue.Queue()
        start_time = time.time()

        def make_requests():
            session = requests.Session()
            request_count = 0
            response_times = []
            errors = 0

            while time.time() - start_time < duration_seconds:
                try:
                    response_start = time.time()
                    response = session.get(f"{service_url}/health", timeout=30)
                    response_time = (time.time() - response_start) * 1000  # Convert to ms
                    response_times.append(response_time)
                    request_count += 1
                    if response.status_code != 200:
                        errors += 1
                except Exception as e:
                    errors += 1
                    print(f"Request error: {e}")
                time.sleep(0.1)  # Small delay between requests

            results_queue.put({
                'request_count': request_count,
                'response_times': response_times,
                'errors': errors
            })

        # Start concurrent threads
        threads = []
        for _ in range(concurrent_users):
            thread = threading.Thread(target=make_requests)
            thread.start()
            threads.append(thread)

        # Wait for all threads to complete
        for thread in threads:
            thread.join()

        # Aggregate results
        total_requests = 0
        all_response_times = []
        total_errors = 0
        while not results_queue.empty():
            result = results_queue.get()
            total_requests += result['request_count']
            all_response_times.extend(result['response_times'])
            total_errors += result['errors']

        if all_response_times:
            return {
                'duration_seconds': duration_seconds,
                'concurrent_users': concurrent_users,
                'total_requests': total_requests,
                'requests_per_second': total_requests / duration_seconds,
                'total_errors': total_errors,
                'error_rate': (total_errors / total_requests) * 100 if total_requests > 0 else 0,
                'response_times': {
                    'average_ms': statistics.mean(all_response_times),
                    'p50_ms': statistics.median(all_response_times),
                    'p95_ms': statistics.quantiles(all_response_times, n=20)[18] if len(all_response_times) > 1 else 0,
                    'p99_ms': statistics.quantiles(all_response_times, n=100)[98] if len(all_response_times) > 1 else 0,
                    'max_ms': max(all_response_times),
                    'min_ms': min(all_response_times)
                }
            }
        else:
            return {'error': 'No successful requests completed'}


# Usage example
optimizer = CloudRunPerformanceOptimizer('your-project-id', 'europe-west1')

# Analyze service performance
analysis = optimizer.analyze_service_performance('my-service')
if analysis:
    print("=== Cloud Run Performance Analysis ===")
    print(f"Service: {analysis['service_name']}")
    print(f"CPU Limit: {analysis['current_config']['cpu_limit']}")
    print(f"Memory Limit: {analysis['current_config']['memory_limit']}")
    print(f"Concurrency: {analysis['current_config']['concurrency']}")
    print()

    if analysis['performance_metrics'].get('request_latencies'):
        latency = analysis['performance_metrics']['request_latencies']
        print(f"Request Latency - Avg: {latency['average']:.0f}ms, P95: {latency['p95']:.0f}ms")
        print()

    print("Recommendations:")
    for rec in analysis['recommendations']:
        print(f"🔥 {rec['priority']}: {rec['issue']}")
        print(f"   💡 {rec['recommendation']}")
        print(f"   🔧 {rec['implementation']}")
        print()

# Run load test
print("Running load test...")
load_test_results = optimizer.load_test_service(
    'https://my-service-hash-ew.a.run.app',
    concurrent_users=5,
    duration_seconds=30
)

if 'error' not in load_test_results:
    print("=== Load Test Results ===")
    print(f"Requests per second: {load_test_results['requests_per_second']:.2f}")
    print(f"Average response time: {load_test_results['response_times']['average_ms']:.0f}ms")
    print(f"P95 response time: {load_test_results['response_times']['p95_ms']:.0f}ms")
    print(f"Error rate: {load_test_results['error_rate']:.2f}%")
4. Cost-Performance Optimization Matrix
# Cloud Run cost-performance optimization calculator
class CloudRunCostOptimizer:
    def __init__(self):
        # GCP Cloud Run pricing (approximate, in GBP)
        self.pricing = {
            'cpu_per_vcpu_second': 0.000018,   # £0.000018 per vCPU-second
            'memory_per_gb_second': 0.000002,  # £0.000002 per GB-second
            'request_per_million': 0.30        # £0.30 per million requests
        }

    def calculate_monthly_costs(self, config):
        """Calculate monthly costs for a given configuration."""
        # Convert the CPU limit to vCPUs ('500m' -> 0.5)
        cpu_limit = config['cpu_limit']
        cpu_vcpus = float(cpu_limit.replace('m', '')) / 1000 if 'm' in cpu_limit else float(cpu_limit)

        # Convert the memory limit to GB
        memory_limit = config['memory_limit']
        memory_gb = float(memory_limit.replace('Mi', '')) / 1024 if 'Mi' in memory_limit else float(memory_limit.replace('Gi', ''))

        # Calculate based on usage patterns
        monthly_requests = config['monthly_requests']
        avg_request_duration_seconds = config['avg_request_duration_seconds']

        # Total compute seconds per month
        total_compute_seconds = monthly_requests * avg_request_duration_seconds

        # Calculate costs
        cpu_cost = total_compute_seconds * cpu_vcpus * self.pricing['cpu_per_vcpu_second']
        memory_cost = total_compute_seconds * memory_gb * self.pricing['memory_per_gb_second']
        request_cost = (monthly_requests / 1000000) * self.pricing['request_per_million']
        total_cost = cpu_cost + memory_cost + request_cost

        return {
            'cpu_cost_gbp': cpu_cost,
            'memory_cost_gbp': memory_cost,
            'request_cost_gbp': request_cost,
            'total_monthly_cost_gbp': total_cost,
            'cost_per_request_gbp': total_cost / monthly_requests if monthly_requests > 0 else 0,
            'cost_breakdown': {
                'cpu_percentage': (cpu_cost / total_cost) * 100 if total_cost > 0 else 0,
                'memory_percentage': (memory_cost / total_cost) * 100 if total_cost > 0 else 0,
                'request_percentage': (request_cost / total_cost) * 100 if total_cost > 0 else 0
            }
        }

    def find_optimal_configuration(self, requirements):
        """Find the optimal CPU/memory configuration for the given requirements."""
        configurations = []

        # Test different CPU/memory combinations
        cpu_options = ['0.5', '1', '2', '4']
        memory_options = ['512Mi', '1Gi', '2Gi', '4Gi', '8Gi']

        for cpu in cpu_options:
            for memory in memory_options:
                # Skip invalid combinations (memory must be appropriate for the CPU)
                cpu_float = float(cpu)
                memory_gb = float(memory.replace('Mi', '')) / 1024 if 'Mi' in memory else float(memory.replace('Gi', ''))

                # Simplified Cloud Run memory constraints
                if cpu_float == 0.5 and memory_gb > 2:
                    continue
                if cpu_float == 1 and memory_gb > 4:
                    continue
                if cpu_float == 2 and memory_gb > 8:
                    continue

                config = {
                    'cpu_limit': cpu,
                    'memory_limit': memory,
                    'monthly_requests': requirements['monthly_requests'],
                    'avg_request_duration_seconds': requirements['avg_request_duration_seconds']
                }
                costs = self.calculate_monthly_costs(config)

                # Calculate a performance score (simplified)
                performance_score = self._calculate_performance_score(cpu_float, memory_gb, requirements)

                configurations.append({
                    'cpu': cpu,
                    'memory': memory,
                    'monthly_cost_gbp': costs['total_monthly_cost_gbp'],
                    'cost_per_request_gbp': costs['cost_per_request_gbp'],
                    'performance_score': performance_score,
                    'efficiency_ratio': performance_score / costs['total_monthly_cost_gbp'] if costs['total_monthly_cost_gbp'] > 0 else 0,
                    'full_analysis': costs
                })

        # Sort by efficiency ratio (performance per pound)
        configurations.sort(key=lambda x: x['efficiency_ratio'], reverse=True)
        return configurations

    def _calculate_performance_score(self, cpu_vcpus, memory_gb, requirements):
        """Calculate a performance score based on resource allocation."""
        # This is a simplified performance model; in reality you would
        # use actual performance testing data
        cpu_requirement = requirements.get('cpu_intensive', 5)        # 1-10 scale
        memory_requirement = requirements.get('memory_intensive', 5)  # 1-10 scale

        # Calculate how well resources match requirements
        cpu_score = min(cpu_vcpus * 10, cpu_requirement * 2)       # Cap at 2x requirement
        memory_score = min(memory_gb * 2, memory_requirement * 2)  # Cap at 2x requirement

        # Weighted average (customize weights based on your workload)
        cpu_weight = 0.6
        memory_weight = 0.4
        return round((cpu_score * cpu_weight) + (memory_score * memory_weight), 2)


# Example usage
optimizer = CloudRunCostOptimizer()

# Define requirements
requirements = {
    'monthly_requests': 1000000,          # 1M requests per month
    'avg_request_duration_seconds': 0.5,  # 500ms average
    'cpu_intensive': 7,                   # CPU-intensive workload (1-10 scale)
    'memory_intensive': 4                 # Moderate memory usage (1-10 scale)
}

# Find optimal configurations
optimal_configs = optimizer.find_optimal_configuration(requirements)

print("=== Cloud Run Cost-Performance Optimization ===")
print(f"Requirements: {requirements['monthly_requests']:,} requests/month, {requirements['avg_request_duration_seconds']}s avg duration")
print()
print("Top 5 most efficient configurations:")
print()
for i, config in enumerate(optimal_configs[:5], 1):
    print(f"{i}. CPU: {config['cpu']}, Memory: {config['memory']}")
    print(f"   Monthly Cost: £{config['monthly_cost_gbp']:.2f}")
    print(f"   Cost per Request: £{config['cost_per_request_gbp']:.6f}")
    print(f"   Performance Score: {config['performance_score']}")
    print(f"   Efficiency Ratio: {config['efficiency_ratio']:.2f}")
    print()
Why It Matters
- User Experience: Fast, responsive serverless apps keep users engaged
- Cost Efficiency: Right-sized Cloud Run services can reduce costs by 50-70%
- Scalability: Properly configured services handle traffic spikes gracefully
- Reliability: Optimized containers have fewer timeouts and failures
- Developer Productivity: Faster deployments and better performance feedback
Try This Week
- Audit your Cloud Run services – Run the performance analysis script
- Optimize one container image – Reduce size and improve startup time
- Right-size resources – Use the cost optimizer to find efficient configurations
- Add proper health checks – Implement startup, liveness, and readiness probes
Quick Cloud Run Performance Assessment
#!/bin/bash
# Cloud Run performance assessment script
PROJECT_ID="your-project-id"
REGION="europe-west1"

echo "=== Cloud Run Performance Assessment ==="
echo
echo "📊 Current Cloud Run services:"
gcloud run services list --region=$REGION --format="table(metadata.name,status.url,spec.template.spec.containers[0].resources.limits.cpu,spec.template.spec.containers[0].resources.limits.memory,spec.template.spec.containerConcurrency)"
echo
echo "🔍 Service configuration analysis:"
for service in $(gcloud run services list --region=$REGION --format="value(metadata.name)")
do
  echo "Service: $service"
  # Get detailed service info
  gcloud run services describe $service --region=$REGION --format="yaml" > /tmp/service_config.yaml
  # Extract key configuration values
  echo "  CPU Limit: $(yq '.spec.template.spec.containers[0].resources.limits.cpu // "Not set"' /tmp/service_config.yaml)"
  echo "  Memory Limit: $(yq '.spec.template.spec.containers[0].resources.limits.memory // "Not set"' /tmp/service_config.yaml)"
  echo "  Concurrency: $(yq '.spec.template.spec.containerConcurrency // "80"' /tmp/service_config.yaml)"
  echo "  Min Instances: $(yq '.spec.template.metadata.annotations."autoscaling.knative.dev/minScale" // "0"' /tmp/service_config.yaml)"
  echo "  Max Instances: $(yq '.spec.template.metadata.annotations."autoscaling.knative.dev/maxScale" // "1000"' /tmp/service_config.yaml)"
  echo "  CPU Throttling: $(yq '.spec.template.metadata.annotations."run.googleapis.com/cpu-throttling" // "true"' /tmp/service_config.yaml)"
  echo "  Execution Environment: $(yq '.spec.template.metadata.annotations."run.googleapis.com/execution-environment" // "gen1"' /tmp/service_config.yaml)"
  echo
done
rm -f /tmp/service_config.yaml

echo "💰 Recent Cloud Run costs (requires billing export):"
echo "Run this BigQuery query in your billing dataset:"
cat << 'EOF'
SELECT
  service.description,
  SUM(cost) AS total_cost,
  SUM(CASE WHEN sku.description LIKE '%CPU%' THEN cost END) AS cpu_cost,
  SUM(CASE WHEN sku.description LIKE '%Memory%' THEN cost END) AS memory_cost,
  SUM(CASE WHEN sku.description LIKE '%Request%' THEN cost END) AS request_cost
FROM `project.dataset.gcp_billing_export_v1_BILLING_ACCOUNT_ID`
WHERE service.description = 'Cloud Run'
  AND usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY service.description
EOF
echo

echo "🎯 Performance optimization checklist:"
echo "1.  ✅ Use distroless or alpine base images"
echo "2.  ✅ Enable the second-generation execution environment (gen2)"
echo "3.  ✅ Set appropriate CPU and memory limits"
echo "4.  ✅ Configure startup and liveness probes"
echo "5.  ✅ Implement proper connection pooling"
echo "6.  ✅ Add response caching where appropriate"
echo "7.  ✅ Set minimum instances for latency-sensitive services"
echo "8.  ✅ Monitor and optimize container concurrency"
echo "9.  ✅ Use compression for response data"
echo "10. ✅ Implement graceful shutdown handling"
Common Cloud Run Performance Mistakes
- Oversized containers: Using large base images that slow cold starts
- Poor resource allocation: Wrong CPU/memory ratios for the workload
- No health checks: Missing probes that help Cloud Run manage containers
- Inefficient connection handling: Not reusing database connections
- No caching: Making expensive operations on every request
- Synchronous processing: Blocking on slow external API calls
Advanced Performance Patterns
- Container warming: Pre-warm containers with background traffic
- Request batching: Process multiple items per request when possible
- Async processing: Use Cloud Tasks for long-running operations
- Regional optimization: Deploy to regions closer to your users
- Traffic splitting: Gradually roll out performance optimizations
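The request-batching and async-processing patterns combine naturally: process a batch concurrently but cap in-flight work, mirroring the p-limit approach in the Node.js example earlier. A Python asyncio sketch (the `process_item` coroutine is a placeholder for real per-item work):

```python
import asyncio

async def process_item(item: int) -> int:
    """Stand-in for real per-item work (API call, DB write)."""
    await asyncio.sleep(0.01)
    return item * 2

async def process_batch(items, max_concurrency: int = 10):
    """Run items concurrently, with never more than max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(item):
        async with semaphore:
            return await process_item(item)

    # gather preserves input order even though completion order varies
    return await asyncio.gather(*(bounded(i) for i in items))

results = asyncio.run(process_batch(range(25), max_concurrency=10))
print(results[:3])  # [0, 2, 4]
```

The semaphore is the key design choice: unbounded `gather` can flood a downstream dependency, which is exactly the thundering-herd behaviour the max-instances setting guards against at the infrastructure level.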
Pro Tip: Start performance optimization with container image size and start-up time – these have the biggest impact on cold starts. A 50MB container will start significantly faster than a 500MB container, directly improving user experience.
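The size claim is easy to sanity-check: pull time scales roughly with image size over registry bandwidth. A rough model (the 250 MB/s figure is an assumption for illustration, not a measured Cloud Run number, and real cold starts add scheduling and application boot time on top):

```python
def estimated_pull_seconds(image_mb: float, bandwidth_mb_per_s: float = 250.0) -> float:
    """Rough lower bound on image pull time from registry bandwidth alone."""
    return image_mb / bandwidth_mb_per_s

# Compare the pro tip's two image sizes
for size in (50, 500):
    print(f"{size} MB image: ~{estimated_pull_seconds(size):.1f}s to pull")
```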