Monday Cloud Tip: Set Up AWS CloudWatch Custom Alarms That Actually Matter

Your weekly dose of actionable cloud wisdom to start the week right

The Problem

You’re getting alerts for high CPU usage at 3 AM, but meanwhile your application is throwing 500 errors, users can’t log in, and your payment processing has stopped working. Default CloudWatch alarms monitor infrastructure metrics, but they don’t tell you if your business is actually working.

The Solution

Create custom CloudWatch alarms that monitor what your business actually cares about – user experience, revenue-generating functions, and application health. Stop monitoring servers and start monitoring success.

Business-Critical Custom Metrics:

1. Application Error Rate Alarm

# Create an alarm on a custom error-rate metric (your application must publish ErrorRate itself)
aws cloudwatch put-metric-alarm \
    --alarm-name "HighApplicationErrorRate" \
    --alarm-description "Alert when error rate exceeds 5%" \
    --metric-name "ErrorRate" \
    --namespace "MyApp/Performance" \
    --statistic Average \
    --period 300 \
    --threshold 5.0 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 2
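
The alarm above only fires if something is publishing an ErrorRate metric to the MyApp/Performance namespace. Here is a minimal sketch of one way to do that, assuming your application already counts requests and errors per reporting interval. The helper names (error_rate_percent, build_error_rate_datum, publish_error_rate) are hypothetical, not part of any AWS library.

```python
def error_rate_percent(error_count, request_count):
    """Return errors as a percentage of requests (0.0 when there was no traffic)."""
    if request_count <= 0:
        return 0.0
    return 100.0 * error_count / request_count


def build_error_rate_datum(error_count, request_count):
    """Build one MetricData entry whose name matches the alarm's --metric-name."""
    return {
        "MetricName": "ErrorRate",
        "Value": error_rate_percent(error_count, request_count),
        "Unit": "Percent",
    }


def publish_error_rate(error_count, request_count):
    """Send the datum to CloudWatch; needs boto3 and AWS credentials."""
    import boto3  # imported here so the helpers above work without AWS access

    boto3.client("cloudwatch").put_metric_data(
        Namespace="MyApp/Performance",  # must match the alarm's --namespace
        MetricData=[build_error_rate_datum(error_count, request_count)],
    )
```

Calling publish_error_rate(5, 100) once per interval would publish a value of 5.0, which sits right at the alarm threshold without exceeding it.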

2. User Login Success Rate

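# Monitor authentication health by counting failed-login log lines.
# The inner double quotes make the pattern match the exact phrase; without them,
# any event containing the three words in any order would match.
# Each match adds 1, so despite its name this metric is a count: read it with the Sum statistic.
aws logs put-metric-filter \
    --log-group-name "/aws/lambda/user-auth" \
    --filter-name "LoginFailures" \
    --filter-pattern '"ERROR Login failed"' \
    --metric-transformations \
        metricName="LoginFailureRate",metricNamespace="MyApp/Auth",metricValue="1"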
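
The metric filter above emits a 1 for every failed login, which gives you a failure count rather than a rate. One way to turn it into a rate is a metric math alarm that divides failures by attempts. The sketch below assumes a second metric, LoginAttempts, published to the same namespace by another filter. That metric, the alarm name HighLoginFailureRate, and the 10% threshold are all hypothetical choices for illustration.

```python
def build_login_failure_rate_alarm(threshold_percent=10.0, period=300):
    """Build put_metric_alarm parameters for a metric math failure-rate alarm."""

    def summed(metric_name):
        # Sum the per-event 1s over each period to get a count.
        return {
            "Metric": {"Namespace": "MyApp/Auth", "MetricName": metric_name},
            "Period": period,
            "Stat": "Sum",
        }

    return {
        "AlarmName": "HighLoginFailureRate",  # hypothetical name
        "AlarmDescription": "Alert when the login failure rate is too high",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_percent,
        "EvaluationPeriods": 2,
        "TreatMissingData": "notBreaching",  # quiet periods are not failures
        "Metrics": [
            {"Id": "failures", "MetricStat": summed("LoginFailureRate"), "ReturnData": False},
            # LoginAttempts is an assumed metric, not created anywhere in this tip
            {"Id": "attempts", "MetricStat": summed("LoginAttempts"), "ReturnData": False},
            {
                "Id": "failure_pct",
                "Expression": "100 * failures / attempts",
                "Label": "Login failure %",
                "ReturnData": True,  # the alarm evaluates this expression
            },
        ],
    }


def create_login_failure_rate_alarm():
    """Create the alarm; needs boto3 and AWS credentials."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**build_login_failure_rate_alarm())
```

Exactly one entry in Metrics returns data, and that is the value the alarm compares against the threshold.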

3. API Response Time Alarm

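# Python code to push custom metrics
from datetime import datetime, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')

def log_api_response_time(response_time_ms):
    """Publish one API response-time sample to CloudWatch."""
    cloudwatch.put_metric_data(
        Namespace='MyApp/API',
        MetricData=[
            {
                'MetricName': 'ResponseTime',
                'Value': response_time_ms,
                'Unit': 'Milliseconds',
                # boto3 documents Timestamp as a datetime, so pass an explicit UTC time
                'Timestamp': datetime.now(timezone.utc)
            }
        ]
    )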

# Then create alarm for this metric
# Threshold: Alert if API responses > 2000ms
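
The alarm for this metric could be created as follows. This is a sketch, not the only way to do it: the alarm name SlowResponseTime, the p95 statistic, and the two five-minute evaluation periods are assumptions chosen for illustration. Only the 2000 ms threshold comes from the note above.

```python
def build_response_time_alarm(threshold_ms=2000.0, percentile="p95"):
    """Build put_metric_alarm parameters for the MyApp/API ResponseTime metric."""
    return {
        "AlarmName": "SlowResponseTime",  # hypothetical name
        "AlarmDescription": f"Alert when {percentile} API response time exceeds {threshold_ms:g} ms",
        "Namespace": "MyApp/API",
        "MetricName": "ResponseTime",
        # Percentiles go in ExtendedStatistic; Average, Sum, etc. would use Statistic instead
        "ExtendedStatistic": percentile,
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_response_time_alarm():
    """Create the alarm; needs boto3 and AWS credentials."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**build_response_time_alarm())
```

Add an AlarmActions list with your own SNS topic ARN if the alarm should notify someone.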

4. Queue Depth Monitoring

# Monitor processing backlogs
aws cloudwatch put-metric-alarm \
    --alarm-name "SQSQueueBacklog" \
    --alarm-description "Alert when queue has too many messages" \
    --metric-name "ApproximateNumberOfMessagesVisible" \
    --namespace "AWS/SQS" \
    --statistic Average \
    --period 300 \
    --threshold 100 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 2 \
    --dimensions Name=QueueName,Value=processing-queue

Why It Matters

  • Business Impact: Know when revenue is affected, not just when servers are busy
  • User Experience: Alert on what users actually feel, not infrastructure noise
  • Proactive Response: Fix problems before customers complain
  • Better Sleep: Get woken up for things that actually matter

Try This Week

  1. Identify your critical business metrics – What would break your business if it stopped working?
  2. Create one custom metric – Start with error rates or response times
  3. Set up CloudWatch Logs Insights queries – Find patterns in your application logs
  4. Build a business dashboard – Combine your custom metrics in CloudWatch
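
For step 4, a dashboard can be defined in code so that it lives in version control. The sketch below assumes the ErrorRate and ResponseTime metrics from earlier in this tip. The dashboard name BusinessHealth, the us-east-1 region, and the widget layout are hypothetical placeholders.

```python
import json


def build_dashboard_body(region="us-east-1"):
    """Return a CloudWatch dashboard body (JSON string) with two metric widgets."""

    def widget(title, namespace, metric, stat, x):
        # The dashboard grid is 24 units wide, so two 12-wide widgets sit side by side.
        return {
            "type": "metric",
            "x": x,
            "y": 0,
            "width": 12,
            "height": 6,
            "properties": {
                "title": title,
                "region": region,
                "metrics": [[namespace, metric]],
                "stat": stat,
                "period": 300,
            },
        }

    body = {
        "widgets": [
            widget("Error rate (%)", "MyApp/Performance", "ErrorRate", "Average", 0),
            widget("API response time p95 (ms)", "MyApp/API", "ResponseTime", "p95", 12),
        ]
    }
    return json.dumps(body)


def put_business_dashboard(name="BusinessHealth"):
    """Create or update the dashboard; needs boto3 and AWS credentials."""
    import boto3

    boto3.client("cloudwatch").put_dashboard(
        DashboardName=name,
        DashboardBody=build_dashboard_body(),
    )
```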

Advanced: Composite Alarms

# Create an alarm that combines multiple conditions
# (the alarms named in the rule should already exist; HighApplicationErrorRate is the alarm from step 1)
aws cloudwatch put-composite-alarm \
    --alarm-name "ApplicationHealthCheck" \
    --alarm-rule "ALARM(HighApplicationErrorRate) OR ALARM(SlowResponseTime)" \
    --actions-enabled \
    --alarm-actions "arn:aws:sns:region:account:alert-topic"

Metric Ideas by Industry

  • E-commerce: Successful payments per minute, cart abandonment rate
  • SaaS: Active user sessions, feature usage rates, subscription events
  • Content: Page load times, video stream quality, content delivery success
  • Financial: Transaction processing rate, fraud detection accuracy
  • Healthcare: Patient data sync success, appointment booking rates

Pro Tips

  • Use percentiles: P95 response time tells you more than average
  • Alert fatigue is real: Start with high thresholds and tune down
  • Test your alarms: Force an alarm into the ALARM state with aws cloudwatch set-alarm-state to verify notifications work; it returns to its real state at the next evaluation
  • Document everything: Future you will thank present you for clear alarm descriptions
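
To see why the percentile tip matters, here is a small illustration with made-up numbers: 90 fast requests and 10 very slow ones. The nearest_rank_percentile helper is a hypothetical stand-in for illustration only, and CloudWatch's own percentile calculation may differ in detail.

```python
import statistics


def nearest_rank_percentile(samples, percent):
    """Return the nearest-rank percentile of samples (percent is an integer 1-100)."""
    ordered = sorted(samples)
    # Ceiling division with integers avoids floating-point rounding surprises.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Made-up data: 90 requests at 120 ms and 10 requests at 4000 ms
response_times_ms = [120] * 90 + [4000] * 10

average_ms = statistics.mean(response_times_ms)
p95_ms = nearest_rank_percentile(response_times_ms, 95)

print(f"average: {average_ms} ms, p95: {p95_ms} ms")
```

With a 2000 ms threshold, an alarm on the average (508 ms) stays silent while one request in ten takes four seconds. An alarm on p95 (4000 ms) catches it.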

Hidden Gem: CloudWatch Logs metric filters can turn log patterns into metrics. Use Logs Insights to find frequent patterns such as “ERROR” or “TIMEOUT”, then create a metric filter for each pattern worth tracking and put an alarm on the resulting metric.


Monitoring something unique to your industry? I’d love to hear about creative custom metrics you’ve built – they might inspire next week’s tip!