The Problem
Your development and test environments run 24/7 even though teams only work Monday-Friday, 9-6. You’re burning thousands in EC2 and RDS costs for instances that sit idle 75% of the time. You tried writing Lambda functions to stop instances on a schedule, but maintaining custom scripts across multiple accounts and regions became a nightmare. Your CFO wants to know why non-prod costs are almost as high as production.
The Solution
AWS Instance Scheduler is an official AWS solution that automatically stops and starts EC2 and RDS instances based on schedules you define. Deploy once via CloudFormation, configure schedules in DynamoDB, tag your instances, and the Lambda function handles everything – checking every 5 minutes and executing start/stop actions. A typical dev environment running 45 hours/week instead of 168 saves 73% on compute costs with zero manual intervention.
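That 73% figure is nothing more than the ratio of scheduled hours to hours in the week, which you can check in two lines:

```python
# The headline saving is pure arithmetic: run 45 of the week's 168 hours.
hours_per_week = 24 * 7        # 168 hours in a week
scheduled_hours = 9 * 5        # 9am-6pm, Mon-Fri
savings_pct = (1 - scheduled_hours / hours_per_week) * 100
print(f"{savings_pct:.0f}% saved")  # 73% saved
```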
Essential Instance Scheduler Implementations
1. CloudFormation Deployment Script
#!/bin/bash
# Deploy AWS Instance Scheduler with standard configuration
# Handles EC2 and RDS across multiple regions

STACK_NAME="instance-scheduler"
REGION="eu-west-2"
TEMPLATE_URL="https://s3.amazonaws.com/solutions-reference/instance-scheduler-on-aws/latest/instance-scheduler-on-aws.template"

# Configuration
SCHEDULE_TAG_KEY="Schedule"
DEFAULT_TIMEZONE="Europe/London"
SCHEDULER_FREQUENCY="5"   # Run every 5 minutes
SERVICES="Both"           # EC2, RDS, or Both
# Commas escaped so the CLI shorthand parser treats this as a single value
TARGET_REGIONS="eu-west-2\,eu-west-1\,us-east-1"
ENABLE_CLOUDWATCH_METRICS="Yes"
ENABLE_CLOUDWATCH_LOGS="Yes"

echo "Deploying AWS Instance Scheduler to $REGION"
aws cloudformation create-stack \
  --stack-name "$STACK_NAME" \
  --template-url "$TEMPLATE_URL" \
  --region "$REGION" \
  --parameters \
    ParameterKey=TagName,ParameterValue=$SCHEDULE_TAG_KEY \
    ParameterKey=DefaultTimezone,ParameterValue=$DEFAULT_TIMEZONE \
    ParameterKey=SchedulerFrequency,ParameterValue=$SCHEDULER_FREQUENCY \
    ParameterKey=ScheduledServices,ParameterValue=$SERVICES \
    ParameterKey=Regions,ParameterValue=$TARGET_REGIONS \
    ParameterKey=MemorySize,ParameterValue=128 \
    ParameterKey=Trace,ParameterValue=No \
    ParameterKey=EnableCloudWatchMetrics,ParameterValue=$ENABLE_CLOUDWATCH_METRICS \
    ParameterKey=EnableCloudWatchLogs,ParameterValue=$ENABLE_CLOUDWATCH_LOGS \
    ParameterKey=StartedTags,ParameterValue="SchedulerAction=Started" \
    ParameterKey=StoppedTags,ParameterValue="SchedulerAction=Stopped" \
  --capabilities CAPABILITY_IAM \
  --tags \
    Key=Service,Value=InstanceScheduler \
    Key=ManagedBy,Value=CloudFormation

echo "Waiting for stack creation to complete..."
aws cloudformation wait stack-create-complete \
  --stack-name "$STACK_NAME" \
  --region "$REGION"

if [ $? -eq 0 ]; then
  echo "Instance Scheduler deployed successfully"

  # Get DynamoDB table name
  TABLE_NAME=$(aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" \
    --region "$REGION" \
    --query 'Stacks[0].Outputs[?OutputKey==`ConfigurationTable`].OutputValue' \
    --output text)
  echo "Configuration table: $TABLE_NAME"
  echo "Next: Configure schedules in DynamoDB"
else
  echo "Stack creation failed"
  exit 1
fi
2. Schedule Configuration Script
#!/usr/bin/env python3
"""
Configure Instance Scheduler periods and schedules in DynamoDB
Creates common scheduling patterns for dev/test/prod environments
"""
import boto3

dynamodb = boto3.resource('dynamodb', region_name='eu-west-2')
table_name = 'instance-scheduler-ConfigTable'  # Replace with your table name
table = dynamodb.Table(table_name)

def create_period(name, begintime, endtime, weekdays=None, monthdays=None, months=None):
    """Create a period definition"""
    item = {
        'type': 'period',
        'name': name,
        'begintime': begintime,
        'endtime': endtime
    }
    if weekdays:
        item['weekdays'] = set(weekdays)
    if monthdays:
        item['monthdays'] = set(monthdays)
    if months:
        item['months'] = set(months)
    table.put_item(Item=item)
    print(f"Created period: {name}")

def create_schedule(name, periods, timezone='Europe/London', description=''):
    """Create a schedule that references periods"""
    item = {
        'type': 'schedule',
        'name': name,
        'timezone': timezone,
        'periods': set(periods)
    }
    if description:
        item['description'] = description
    table.put_item(Item=item)
    print(f"Created schedule: {name}")

# Common scheduling patterns

# 1. UK Office Hours (Mon-Fri 9am-6pm)
create_period(
    name='uk-office-hours',
    begintime='09:00',
    endtime='18:00',
    weekdays=['mon-fri']
)
create_schedule(
    name='uk-office-hours',
    periods=['uk-office-hours'],
    timezone='Europe/London',
    description='Standard UK office hours for dev environments'
)

# 2. Extended Development Hours (Mon-Fri 8am-8pm)
create_period(
    name='extended-dev-hours',
    begintime='08:00',
    endtime='20:00',
    weekdays=['mon-fri']
)
create_schedule(
    name='extended-dev',
    periods=['extended-dev-hours'],
    timezone='Europe/London',
    description='Extended hours for active development'
)

# 3. Weekend Testing (Sat-Sun only)
create_period(
    name='weekend-period',
    begintime='00:00',
    endtime='23:59',
    weekdays=['sat-sun']
)
create_schedule(
    name='weekend-only',
    periods=['weekend-period'],
    timezone='Europe/London',
    description='Weekend testing environment'
)

# 4. Business Hours with Maintenance Window (stopped Sun 00:00-06:00)
create_period(
    name='weekday-full',
    begintime='00:00',
    endtime='23:59',
    weekdays=['mon-sat']
)
create_period(
    name='sunday-after-maintenance',
    begintime='06:00',
    endtime='23:59',
    weekdays=['sun']
)
create_schedule(
    name='always-on-with-maintenance',
    periods=['weekday-full', 'sunday-after-maintenance'],
    timezone='Europe/London',
    description='24/7 with Sunday morning maintenance window'
)

# 5. Month-end Processing (only the last days of the month)
create_period(
    name='month-end-period',
    begintime='00:00',
    endtime='23:59',
    monthdays=['28', '29', '30', '31']  # Will only run on days that exist
)
create_schedule(
    name='month-end-only',
    periods=['month-end-period'],
    timezone='Europe/London',
    description='Month-end processing instances'
)

# 6. QA Environment (Mon-Fri 7am-10pm for overnight test runs)
create_period(
    name='qa-hours',
    begintime='07:00',
    endtime='22:00',
    weekdays=['mon-fri']
)
create_schedule(
    name='qa-extended',
    periods=['qa-hours'],
    timezone='Europe/London',
    description='QA environment with extended hours for overnight tests'
)

# 7. US East Coast Hours (for global teams)
create_period(
    name='us-east-hours',
    begintime='09:00',
    endtime='18:00',
    weekdays=['mon-fri']
)
create_schedule(
    name='us-east-office',
    periods=['us-east-hours'],
    timezone='America/New_York',
    description='US East Coast office hours'
)

print("\nAll schedules configured successfully")
print("\nTag your instances with: Schedule=<schedule-name>")
print("Example: aws ec2 create-tags --resources i-1234567890 --tags Key=Schedule,Value=uk-office-hours")
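Before tagging anything, it is worth checking that a period actually yields the runtime you expect. The helper below is a hypothetical standalone sketch, not part of the solution; it assumes `weekdays` entries are either single days or `first-last` ranges, as in the periods above:

```python
# Hypothetical sanity check: convert a period definition into the number
# of hours per week it keeps instances running.
DAYS = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']

def weekly_hours(begintime, endtime, weekdays):
    """Hours per week a period keeps instances running."""
    start_h, start_m = map(int, begintime.split(':'))
    end_h, end_m = map(int, endtime.split(':'))
    hours_per_day = ((end_h * 60 + end_m) - (start_h * 60 + start_m)) / 60

    day_count = 0
    for spec in weekdays:
        if '-' in spec:                       # a range like 'mon-fri'
            first, last = spec.split('-')
            day_count += DAYS.index(last) - DAYS.index(first) + 1
        else:                                 # a single day like 'sun'
            day_count += 1
    return hours_per_day * day_count

print(weekly_hours('09:00', '18:00', ['mon-fri']))   # 45.0
print(weekly_hours('07:00', '22:00', ['mon-fri']))   # 75.0
```

These numbers should line up with the `SCHEDULE_HOURS` table in the cost calculator below.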
3. Bulk Tagging Script
#!/bin/bash
# Bulk tag instances with scheduler schedules
# Tags based on instance name patterns or existing tags

REGION="eu-west-2"
SCHEDULE_TAG_KEY="Schedule"

# Function to tag instances
tag_instances() {
  local instance_ids=$1
  local schedule_value=$2

  if [ -z "$instance_ids" ]; then
    echo "No instances found for schedule: $schedule_value"
    return
  fi

  echo "Tagging instances for schedule: $schedule_value"
  # $instance_ids is intentionally unquoted so each ID becomes a separate argument
  aws ec2 create-tags \
    --region "$REGION" \
    --resources $instance_ids \
    --tags Key="$SCHEDULE_TAG_KEY",Value="$schedule_value"
  echo "Tagged $(echo $instance_ids | wc -w) instances"
}

# Tag dev instances (by Name tag containing 'dev')
DEV_INSTANCES=$(aws ec2 describe-instances \
  --region "$REGION" \
  --filters "Name=tag:Name,Values=*dev*" "Name=instance-state-name,Values=running,stopped" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text)
tag_instances "$DEV_INSTANCES" "uk-office-hours"

# Tag test instances
TEST_INSTANCES=$(aws ec2 describe-instances \
  --region "$REGION" \
  --filters "Name=tag:Name,Values=*test*" "Name=instance-state-name,Values=running,stopped" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text)
tag_instances "$TEST_INSTANCES" "extended-dev"

# Tag QA instances
QA_INSTANCES=$(aws ec2 describe-instances \
  --region "$REGION" \
  --filters "Name=tag:Name,Values=*qa*" "Name=instance-state-name,Values=running,stopped" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text)
tag_instances "$QA_INSTANCES" "qa-extended"

# Tag RDS instances
echo ""
echo "Tagging RDS instances..."

# Dev databases
DEV_DBS=$(aws rds describe-db-instances \
  --region "$REGION" \
  --query 'DBInstances[?contains(DBInstanceIdentifier, `dev`)].DBInstanceArn' \
  --output text)
for db_arn in $DEV_DBS; do
  aws rds add-tags-to-resource \
    --region "$REGION" \
    --resource-name "$db_arn" \
    --tags Key="$SCHEDULE_TAG_KEY",Value=uk-office-hours
  echo "Tagged RDS: $(basename $db_arn)"
done

echo ""
echo "Bulk tagging complete"
echo "Verify tags: aws ec2 describe-instances --filters \"Name=tag:$SCHEDULE_TAG_KEY,Values=*\" --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==\`$SCHEDULE_TAG_KEY\`].Value]' --output table"
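The pattern matching in the bulk-tagging script reduces to a substring-to-schedule mapping, and it can be useful to preview assignments before touching any tags. The function below is a hypothetical sketch of that logic; the patterns and schedule names mirror the script above, so adjust them to your own naming conventions:

```python
# Hypothetical preview of the bulk-tagging rules: map Name-tag substrings
# to schedule names. First match wins; None means "leave untagged".
RULES = [
    ('dev', 'uk-office-hours'),
    ('test', 'extended-dev'),
    ('qa', 'qa-extended'),
]

def schedule_for(name_tag):
    """Return the schedule a Name tag would be assigned, or None."""
    name = name_tag.lower()
    for substring, schedule in RULES:
        if substring in name:
            return schedule
    return None

print(schedule_for('payments-dev-01'))   # uk-office-hours
print(schedule_for('load-test-runner'))  # extended-dev
print(schedule_for('prod-api'))          # None
```

Running names through a mapping like this first makes it obvious when a production instance would accidentally match a pattern such as `*test*`.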
4. Cost Savings Calculator
#!/usr/bin/env python3
"""
Calculate potential cost savings from Instance Scheduler
Analyzes current usage and projects savings based on schedules
"""
import boto3
from datetime import datetime
from collections import defaultdict

ec2 = boto3.client('ec2', region_name='eu-west-2')

# AWS pricing (approximate hourly rates for eu-west-2)
EC2_PRICING = {
    't3.micro': 0.0104,
    't3.small': 0.0208,
    't3.medium': 0.0416,
    't3.large': 0.0832,
    'm5.large': 0.096,
    'm5.xlarge': 0.192,
    'm5.2xlarge': 0.384,
    'r5.large': 0.126,
    'r5.xlarge': 0.252,
}

# Schedule runtime hours per week
SCHEDULE_HOURS = {
    'uk-office-hours': 45,  # 9am-6pm Mon-Fri = 9hrs * 5 days
    'extended-dev': 60,     # 8am-8pm Mon-Fri = 12hrs * 5 days
    'qa-extended': 75,      # 7am-10pm Mon-Fri = 15hrs * 5 days
    'weekend-only': 48,     # Sat-Sun full days
    'always-on': 168,       # 24/7
}

def get_instance_cost(instance_type, hours_per_week):
    """Calculate weekly cost for instance type"""
    hourly_rate = EC2_PRICING.get(instance_type, 0.10)  # Default fallback
    return hourly_rate * hours_per_week

def analyze_savings():
    """Analyze potential savings across all tagged instances"""
    # Get all instances with Schedule tag
    response = ec2.describe_instances(
        Filters=[
            {'Name': 'tag-key', 'Values': ['Schedule']},
            {'Name': 'instance-state-name', 'Values': ['running', 'stopped']}
        ]
    )

    savings_data = defaultdict(lambda: {
        'count': 0,
        'current_weekly_cost': 0,
        'scheduled_weekly_cost': 0,
        'weekly_savings': 0
    })
    total_instances = 0

    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            total_instances += 1
            instance_type = instance['InstanceType']

            # Get schedule tag
            schedule = next(
                (tag['Value'] for tag in instance.get('Tags', []) if tag['Key'] == 'Schedule'),
                None
            )
            if not schedule:
                continue

            # Calculate costs
            current_cost = get_instance_cost(instance_type, 168)  # Current 24/7
            scheduled_hours = SCHEDULE_HOURS.get(schedule, 168)
            scheduled_cost = get_instance_cost(instance_type, scheduled_hours)
            savings = current_cost - scheduled_cost

            # Aggregate by schedule
            savings_data[schedule]['count'] += 1
            savings_data[schedule]['current_weekly_cost'] += current_cost
            savings_data[schedule]['scheduled_weekly_cost'] += scheduled_cost
            savings_data[schedule]['weekly_savings'] += savings

    # Print results
    print("=" * 80)
    print("AWS INSTANCE SCHEDULER - COST SAVINGS ANALYSIS")
    print("=" * 80)
    print(f"\nTotal instances with Schedule tag: {total_instances}")
    print(f"Analysis date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

    if not savings_data:
        print("\nNo tagged instances found - nothing to analyze")
        return

    total_weekly_savings = 0
    total_current_cost = 0
    print("\n" + "-" * 80)
    print(f"{'Schedule':<25} {'Instances':<10} {'Current':<15} {'Scheduled':<15} {'Savings':<15}")
    print("-" * 80)
    for schedule, data in sorted(savings_data.items()):
        print(f"{schedule:<25} {data['count']:<10} "
              f"£{data['current_weekly_cost']:>8.2f}/wk "
              f"£{data['scheduled_weekly_cost']:>8.2f}/wk "
              f"£{data['weekly_savings']:>8.2f}/wk "
              f"({data['weekly_savings'] / data['current_weekly_cost'] * 100:.0f}%)")
        total_weekly_savings += data['weekly_savings']
        total_current_cost += data['current_weekly_cost']
    print("-" * 80)
    print(f"{'TOTAL':<25} {total_instances:<10} "
          f"£{total_current_cost:>8.2f}/wk "
          f"{'':15} "
          f"£{total_weekly_savings:>8.2f}/wk "
          f"({total_weekly_savings / total_current_cost * 100:.0f}%)")
    print("-" * 80)

    # Monthly and annual projections
    monthly_savings = total_weekly_savings * 4.33  # Average weeks per month
    annual_savings = total_weekly_savings * 52
    print("\nPROJECTED SAVINGS:")
    print(f" Monthly: £{monthly_savings:,.2f}")
    print(f" Annual: £{annual_savings:,.2f}")

    # Solution cost (approximate)
    solution_cost_monthly = 2.00
    roi_monthly = ((monthly_savings - solution_cost_monthly) / solution_cost_monthly) * 100
    print("\nROI ANALYSIS:")
    print(f" Solution cost: £{solution_cost_monthly:.2f}/month")
    print(f" Net savings: £{(monthly_savings - solution_cost_monthly):,.2f}/month")
    print(f" ROI: {roi_monthly:,.0f}%")

    # Breakeven analysis
    if monthly_savings > solution_cost_monthly and total_weekly_savings > 0:
        hours_to_breakeven = solution_cost_monthly / (total_weekly_savings / 168)
        print(f" Payback: {hours_to_breakeven:.1f} hours")

if __name__ == '__main__':
    analyze_savings()
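Before trusting the full report, it is worth hand-checking the arithmetic for a single instance. Using the approximate m5.large rate from the pricing table above and the 45-hour uk-office-hours schedule:

```python
# Hand-check for one m5.large moved from 24/7 to uk-office-hours.
rate = 0.096                  # approximate hourly rate from EC2_PRICING
always_on = rate * 168        # weekly cost running 24/7
scheduled = rate * 45         # weekly cost on the 45-hour schedule
saving = always_on - scheduled
print(f"weekly saving: {saving:.2f} ({saving / always_on:.0%})")
# weekly saving: 11.81 (73%)
```

The percentage matches the headline figure because it depends only on the hours ratio, not on the instance type.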
5. Monitoring and Alerting Script
#!/usr/bin/env python3
"""
Monitor Instance Scheduler Lambda execution
Check for errors, track scheduler actions, generate reports
"""
import time
import boto3
from datetime import datetime, timedelta

logs = boto3.client('logs', region_name='eu-west-2')
cloudwatch = boto3.client('cloudwatch', region_name='eu-west-2')

LOG_GROUP = '/aws/lambda/instance-scheduler'  # Replace with your log group
NAMESPACE = 'InstanceScheduler'

def analyze_scheduler_logs(hours=24):
    """Analyze scheduler logs for the past N hours"""
    # CloudWatch Logs Insights expects epoch seconds, not milliseconds
    start_time = int((datetime.now() - timedelta(hours=hours)).timestamp())
    end_time = int(datetime.now().timestamp())

    print(f"Analyzing scheduler logs for past {hours} hours...")

    # Query logs
    query = """
    fields @timestamp, @message
    | filter @message like /Started instance|Stopped instance|Error/
    | sort @timestamp desc
    """
    response = logs.start_query(
        logGroupName=LOG_GROUP,
        startTime=start_time,
        endTime=end_time,
        queryString=query
    )
    query_id = response['queryId']

    # Wait for query to reach a terminal state
    while True:
        result = logs.get_query_results(queryId=query_id)
        if result['status'] in ('Complete', 'Failed', 'Cancelled', 'Timeout'):
            break
        time.sleep(1)

    # Parse results
    started = []
    stopped = []
    errors = []
    for record in result['results']:
        message = next((r['value'] for r in record if r['field'] == '@message'), '')
        if 'Started instance' in message:
            started.append(message)
        elif 'Stopped instance' in message:
            stopped.append(message)
        elif 'Error' in message:
            errors.append(message)

    # Print summary
    print(f"\nInstances started: {len(started)}")
    print(f"Instances stopped: {len(stopped)}")
    print(f"Errors: {len(errors)}")
    if errors:
        print("\nRecent Errors:")
        for error in errors[:5]:
            print(f"  {error}")

    return {
        'started': len(started),
        'stopped': len(stopped),
        'errors': len(errors)
    }

def check_cloudwatch_metrics():
    """Check CloudWatch metrics for scheduler"""
    end_time = datetime.now()
    start_time = end_time - timedelta(hours=24)

    print("\nCloudWatch Metrics (24h):")
    metrics = [
        'RunningInstances',
        'StoppedInstances',
        'ScheduledInstances'
    ]
    for metric_name in metrics:
        response = cloudwatch.get_metric_statistics(
            Namespace=NAMESPACE,
            MetricName=metric_name,
            StartTime=start_time,
            EndTime=end_time,
            Period=3600,  # 1 hour
            Statistics=['Average', 'Maximum']
        )
        if response['Datapoints']:
            latest = sorted(response['Datapoints'], key=lambda x: x['Timestamp'])[-1]
            print(f"  {metric_name}: {latest['Average']:.0f} (max: {latest['Maximum']:.0f})")

def verify_scheduler_health():
    """Verify scheduler is running and healthy"""
    print("\nScheduler Health Check:")

    # Check if Lambda is being invoked
    lambda_client = boto3.client('lambda', region_name='eu-west-2')
    try:
        # Get Lambda function
        response = lambda_client.get_function(FunctionName='instance-scheduler')
        print("  Lambda function exists")
        # Check last modification
        last_modified = response['Configuration']['LastModified']
        print(f"  Last modified: {last_modified}")
    except Exception as e:
        print(f"  Error checking Lambda: {str(e)}")
        return False

    # Check recent invocations
    end_time = datetime.now()
    start_time = end_time - timedelta(minutes=15)
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/Lambda',
        MetricName='Invocations',
        Dimensions=[
            {'Name': 'FunctionName', 'Value': 'instance-scheduler'}
        ],
        StartTime=start_time,
        EndTime=end_time,
        Period=300,
        Statistics=['Sum']
    )
    if response['Datapoints']:
        invocations = sum(dp['Sum'] for dp in response['Datapoints'])
        print(f"  Invocations (last 15min): {invocations:.0f}")
        if invocations == 0:
            print("  WARNING: No recent invocations - scheduler may not be running")
            return False
    else:
        print("  WARNING: No invocation metrics found")
        return False

    # Check for errors
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/Lambda',
        MetricName='Errors',
        Dimensions=[
            {'Name': 'FunctionName', 'Value': 'instance-scheduler'}
        ],
        StartTime=start_time,
        EndTime=end_time,
        Period=300,
        Statistics=['Sum']
    )
    if response['Datapoints']:
        errors = sum(dp['Sum'] for dp in response['Datapoints'])
        if errors > 0:
            print(f"  Errors (last 15min): {errors:.0f}")
        else:
            print("  No errors")

    return True

if __name__ == '__main__':
    # Run all checks
    analyze_scheduler_logs(hours=24)
    check_cloudwatch_metrics()
    is_healthy = verify_scheduler_health()
    if is_healthy:
        print("\nInstance Scheduler is healthy and operating normally")
    else:
        print("\nInstance Scheduler may have issues - investigate logs")
Why It Matters
- Immediate cost reduction: 70% savings on non-prod environments without changing architecture
- Zero maintenance: No custom Lambda code to maintain – AWS-supported solution
- Multi-account support: Manage schedules across AWS Organizations from a single scheduler
- RDS support: Works for both EC2 instances and RDS databases (including Aurora clusters)
- Flexibility: Different schedules for different teams/projects via tagging
- Auditability: CloudWatch logs show every start/stop action with timestamps
Try This Week
- Deploy the scheduler – Run the CloudFormation deployment script (5 minutes)
- Configure schedules – Create 2-3 schedule patterns in DynamoDB (10 minutes)
- Tag dev instances – Tag 5-10 instances with your schedule (5 minutes)
- Calculate savings – Run the cost calculator to see projected savings (2 minutes)
- Monitor first day – Check CloudWatch logs after 24 hours to verify it’s working
Common Instance Scheduler Mistakes
- RDS 7-day limit: RDS auto-starts after 7 days stopped – don’t use for schedules with >7 day gaps
- Wrong timezone: Schedule times are in the timezone you configure, not UTC
- Tag key mismatch: Tag key must exactly match the CloudFormation parameter (default: Schedule)
- Cross-account without proper setup: Need to deploy the spoke stack in secondary accounts
- Not monitoring logs: Check CloudWatch Logs regularly to catch errors early
- Forgetting SNS notifications: Configure SNS topic parameter to get alerts on errors
- Manual start/stop confusion: If you manually start or stop an instance, the scheduler will override that change on its next run
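The timezone pitfall deserves a concrete illustration: a schedule configured for 09:00 Europe/London fires at 08:00 UTC in summer but 09:00 UTC in winter, so scheduler actions shift by an hour in UTC-stamped CloudWatch logs across daylight-saving changes. Python's standard zoneinfo module shows the effect:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo('Europe/London')

# A 09:00 local start on a summer date (BST, UTC+1) vs a winter date (GMT)
summer = datetime(2025, 7, 7, 9, 0, tzinfo=london).astimezone(timezone.utc)
winter = datetime(2025, 1, 6, 9, 0, tzinfo=london).astimezone(timezone.utc)
print(summer.hour, winter.hour)  # 8 9
```

Keep this in mind when comparing scheduler actions against UTC timestamps in CloudWatch.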
Advanced Patterns
- SSM Maintenance Windows integration: Scheduler can start instances before maintenance windows and stop them after
- Cross-account orchestration: Hub-spoke model manages scheduling across AWS Organizations
- Custom Lambda hooks: Extend the scheduler with your own Lambda functions for custom actions
- Cost allocation tags: Add automatic tags when instances start/stop for detailed cost tracking
- Weekend deploy schedules: Create special schedules that run different patterns on deploy days
Pro Tip
Start with one environment (dev) and one schedule (uk-office-hours) to validate the solution works in your environment. Tag 5-10 instances, wait 24 hours, check the logs, and verify instances stopped/started correctly. Once validated, roll out to test, QA, and other non-prod environments. This cautious approach prevents accidentally stopping production instances.








