The Problem
Your development and test environments run 24/7 even though teams only work Monday-Friday, 9-6. You’re burning thousands in EC2 and RDS costs for instances that sit idle 75% of the time. You tried writing Lambda functions to stop instances on a schedule, but maintaining custom scripts across multiple accounts and regions became a nightmare. Your CFO wants to know why non-prod costs are almost as high as production.
The Solution
AWS Instance Scheduler is an official AWS solution that automatically stops and starts EC2 and RDS instances based on schedules you define. Deploy once via CloudFormation, configure schedules in DynamoDB, tag your instances, and the Lambda function handles everything – checking every 5 minutes and executing start/stop actions. A typical dev environment running 45 hours/week instead of 168 saves 73% on compute costs with zero manual intervention.
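That 73% figure is nothing more than the ratio of scheduled hours to hours in the week, which you can check in two lines:

```python
# The headline saving is pure arithmetic: run 45 of the week's 168 hours.
hours_per_week = 24 * 7        # 168 hours in a week
scheduled_hours = 9 * 5        # 9am-6pm, Mon-Fri
savings_pct = (1 - scheduled_hours / hours_per_week) * 100
print(f"{savings_pct:.0f}% saved")  # 73% saved
```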
Essential Instance Scheduler Implementations
1. CloudFormation Deployment Script
#!/bin/bash
# Deploy AWS Instance Scheduler with standard configuration
# Handles EC2 and RDS across multiple regions

STACK_NAME="instance-scheduler"
REGION="eu-west-2"
TEMPLATE_URL="https://s3.amazonaws.com/solutions-reference/instance-scheduler-on-aws/latest/instance-scheduler-on-aws.template"

# Configuration
SCHEDULE_TAG_KEY="Schedule"
DEFAULT_TIMEZONE="Europe/London"
SCHEDULER_FREQUENCY="5"   # Run every 5 minutes
SERVICES="Both"           # EC2, RDS, or Both
# Commas escaped so the CLI shorthand parser treats this as a single value
TARGET_REGIONS="eu-west-2\,eu-west-1\,us-east-1"
ENABLE_CLOUDWATCH_METRICS="Yes"
ENABLE_CLOUDWATCH_LOGS="Yes"

echo "Deploying AWS Instance Scheduler to $REGION"
aws cloudformation create-stack \
  --stack-name "$STACK_NAME" \
  --template-url "$TEMPLATE_URL" \
  --region "$REGION" \
  --parameters \
    ParameterKey=TagName,ParameterValue=$SCHEDULE_TAG_KEY \
    ParameterKey=DefaultTimezone,ParameterValue=$DEFAULT_TIMEZONE \
    ParameterKey=SchedulerFrequency,ParameterValue=$SCHEDULER_FREQUENCY \
    ParameterKey=ScheduledServices,ParameterValue=$SERVICES \
    ParameterKey=Regions,ParameterValue=$TARGET_REGIONS \
    ParameterKey=MemorySize,ParameterValue=128 \
    ParameterKey=Trace,ParameterValue=No \
    ParameterKey=EnableCloudWatchMetrics,ParameterValue=$ENABLE_CLOUDWATCH_METRICS \
    ParameterKey=EnableCloudWatchLogs,ParameterValue=$ENABLE_CLOUDWATCH_LOGS \
    ParameterKey=StartedTags,ParameterValue="SchedulerAction=Started" \
    ParameterKey=StoppedTags,ParameterValue="SchedulerAction=Stopped" \
  --capabilities CAPABILITY_IAM \
  --tags \
    Key=Service,Value=InstanceScheduler \
    Key=ManagedBy,Value=CloudFormation

echo "Waiting for stack creation to complete..."
aws cloudformation wait stack-create-complete \
  --stack-name "$STACK_NAME" \
  --region "$REGION"

if [ $? -eq 0 ]; then
  echo "Instance Scheduler deployed successfully"

  # Get DynamoDB table name
  TABLE_NAME=$(aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" \
    --region "$REGION" \
    --query 'Stacks[0].Outputs[?OutputKey==`ConfigurationTable`].OutputValue' \
    --output text)
  echo "Configuration table: $TABLE_NAME"
  echo "Next: Configure schedules in DynamoDB"
else
  echo "Stack creation failed"
  exit 1
fi
2. Schedule Configuration Script
#!/usr/bin/env python3
"""
Configure Instance Scheduler periods and schedules in DynamoDB
Creates common scheduling patterns for dev/test/prod environments
"""
import boto3

dynamodb = boto3.resource('dynamodb', region_name='eu-west-2')
table_name = 'instance-scheduler-ConfigTable'  # Replace with your table name
table = dynamodb.Table(table_name)

def create_period(name, begintime, endtime, weekdays=None, monthdays=None, months=None):
    """Create a period definition"""
    item = {
        'type': 'period',
        'name': name,
        'begintime': begintime,
        'endtime': endtime
    }
    if weekdays:
        item['weekdays'] = set(weekdays)
    if monthdays:
        item['monthdays'] = set(monthdays)
    if months:
        item['months'] = set(months)
    table.put_item(Item=item)
    print(f"Created period: {name}")

def create_schedule(name, periods, timezone='Europe/London', description=''):
    """Create a schedule that references periods"""
    item = {
        'type': 'schedule',
        'name': name,
        'timezone': timezone,
        'periods': set(periods)
    }
    if description:
        item['description'] = description
    table.put_item(Item=item)
    print(f"Created schedule: {name}")

# Common scheduling patterns

# 1. UK Office Hours (Mon-Fri 9am-6pm)
create_period(
    name='uk-office-hours',
    begintime='09:00',
    endtime='18:00',
    weekdays=['mon-fri']
)
create_schedule(
    name='uk-office-hours',
    periods=['uk-office-hours'],
    timezone='Europe/London',
    description='Standard UK office hours for dev environments'
)

# 2. Extended Development Hours (Mon-Fri 8am-8pm)
create_period(
    name='extended-dev-hours',
    begintime='08:00',
    endtime='20:00',
    weekdays=['mon-fri']
)
create_schedule(
    name='extended-dev',
    periods=['extended-dev-hours'],
    timezone='Europe/London',
    description='Extended hours for active development'
)

# 3. Weekend Testing (Sat-Sun only)
create_period(
    name='weekend-period',
    begintime='00:00',
    endtime='23:59',
    weekdays=['sat-sun']
)
create_schedule(
    name='weekend-only',
    periods=['weekend-period'],
    timezone='Europe/London',
    description='Weekend testing environment'
)

# 4. Business Hours with Maintenance Window (stopped Sun 00:00-06:00)
create_period(
    name='weekday-full',
    begintime='00:00',
    endtime='23:59',
    weekdays=['mon-sat']
)
create_period(
    name='sunday-after-maintenance',
    begintime='06:00',
    endtime='23:59',
    weekdays=['sun']
)
create_schedule(
    name='always-on-with-maintenance',
    periods=['weekday-full', 'sunday-after-maintenance'],
    timezone='Europe/London',
    description='24/7 with Sunday morning maintenance window'
)

# 5. Month-end Processing (only the last days of the month)
create_period(
    name='month-end-period',
    begintime='00:00',
    endtime='23:59',
    monthdays=['28', '29', '30', '31']  # Will only run on days that exist
)
create_schedule(
    name='month-end-only',
    periods=['month-end-period'],
    timezone='Europe/London',
    description='Month-end processing instances'
)

# 6. QA Environment (Mon-Fri 7am-10pm for overnight test runs)
create_period(
    name='qa-hours',
    begintime='07:00',
    endtime='22:00',
    weekdays=['mon-fri']
)
create_schedule(
    name='qa-extended',
    periods=['qa-hours'],
    timezone='Europe/London',
    description='QA environment with extended hours for overnight tests'
)

# 7. US East Coast Hours (for global teams)
create_period(
    name='us-east-hours',
    begintime='09:00',
    endtime='18:00',
    weekdays=['mon-fri']
)
create_schedule(
    name='us-east-office',
    periods=['us-east-hours'],
    timezone='America/New_York',
    description='US East Coast office hours'
)

print("\nAll schedules configured successfully")
print("\nTag your instances with: Schedule=<schedule-name>")
print("Example: aws ec2 create-tags --resources i-1234567890 --tags Key=Schedule,Value=uk-office-hours")
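Before tagging anything, it is worth checking that a period actually yields the runtime you expect. The helper below is a hypothetical standalone sketch, not part of the solution; it assumes `weekdays` entries are either single days or `first-last` ranges, as in the periods above:

```python
# Hypothetical sanity check: convert a period definition into the number
# of hours per week it keeps instances running.
DAYS = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']

def weekly_hours(begintime, endtime, weekdays):
    """Hours per week a period keeps instances running."""
    start_h, start_m = map(int, begintime.split(':'))
    end_h, end_m = map(int, endtime.split(':'))
    hours_per_day = ((end_h * 60 + end_m) - (start_h * 60 + start_m)) / 60

    day_count = 0
    for spec in weekdays:
        if '-' in spec:                       # a range like 'mon-fri'
            first, last = spec.split('-')
            day_count += DAYS.index(last) - DAYS.index(first) + 1
        else:                                 # a single day like 'sun'
            day_count += 1
    return hours_per_day * day_count

print(weekly_hours('09:00', '18:00', ['mon-fri']))   # 45.0
print(weekly_hours('07:00', '22:00', ['mon-fri']))   # 75.0
```

These numbers should line up with the `SCHEDULE_HOURS` table in the cost calculator below.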
3. Bulk Tagging Script
#!/bin/bash
# Bulk tag instances with scheduler schedules
# Tags based on instance name patterns or existing tags

REGION="eu-west-2"
SCHEDULE_TAG_KEY="Schedule"

# Function to tag instances
tag_instances() {
  local instance_ids=$1
  local schedule_value=$2

  if [ -z "$instance_ids" ]; then
    echo "No instances found for schedule: $schedule_value"
    return
  fi

  echo "Tagging instances for schedule: $schedule_value"
  # $instance_ids is intentionally unquoted so each ID becomes a separate argument
  aws ec2 create-tags \
    --region "$REGION" \
    --resources $instance_ids \
    --tags Key="$SCHEDULE_TAG_KEY",Value="$schedule_value"
  echo "Tagged $(echo $instance_ids | wc -w) instances"
}

# Tag dev instances (by Name tag containing 'dev')
DEV_INSTANCES=$(aws ec2 describe-instances \
  --region "$REGION" \
  --filters "Name=tag:Name,Values=*dev*" "Name=instance-state-name,Values=running,stopped" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text)
tag_instances "$DEV_INSTANCES" "uk-office-hours"

# Tag test instances
TEST_INSTANCES=$(aws ec2 describe-instances \
  --region "$REGION" \
  --filters "Name=tag:Name,Values=*test*" "Name=instance-state-name,Values=running,stopped" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text)
tag_instances "$TEST_INSTANCES" "extended-dev"

# Tag QA instances
QA_INSTANCES=$(aws ec2 describe-instances \
  --region "$REGION" \
  --filters "Name=tag:Name,Values=*qa*" "Name=instance-state-name,Values=running,stopped" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text)
tag_instances "$QA_INSTANCES" "qa-extended"

# Tag RDS instances
echo ""
echo "Tagging RDS instances..."

# Dev databases
DEV_DBS=$(aws rds describe-db-instances \
  --region "$REGION" \
  --query 'DBInstances[?contains(DBInstanceIdentifier, `dev`)].DBInstanceArn' \
  --output text)
for db_arn in $DEV_DBS; do
  aws rds add-tags-to-resource \
    --region "$REGION" \
    --resource-name "$db_arn" \
    --tags Key="$SCHEDULE_TAG_KEY",Value=uk-office-hours
  echo "Tagged RDS: $(basename $db_arn)"
done

echo ""
echo "Bulk tagging complete"
echo "Verify tags: aws ec2 describe-instances --filters \"Name=tag:$SCHEDULE_TAG_KEY,Values=*\" --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==\`$SCHEDULE_TAG_KEY\`].Value]' --output table"
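The pattern matching in the bulk-tagging script reduces to a substring-to-schedule mapping, and it can be useful to preview assignments before touching any tags. The function below is a hypothetical sketch of that logic; the patterns and schedule names mirror the script above, so adjust them to your own naming conventions:

```python
# Hypothetical preview of the bulk-tagging rules: map Name-tag substrings
# to schedule names. First match wins; None means "leave untagged".
RULES = [
    ('dev', 'uk-office-hours'),
    ('test', 'extended-dev'),
    ('qa', 'qa-extended'),
]

def schedule_for(name_tag):
    """Return the schedule a Name tag would be assigned, or None."""
    name = name_tag.lower()
    for substring, schedule in RULES:
        if substring in name:
            return schedule
    return None

print(schedule_for('payments-dev-01'))   # uk-office-hours
print(schedule_for('load-test-runner'))  # extended-dev
print(schedule_for('prod-api'))          # None
```

Running names through a mapping like this first makes it obvious when a production instance would accidentally match a pattern such as `*test*`.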
4. Cost Savings Calculator
#!/usr/bin/env python3
"""
Calculate potential cost savings from Instance Scheduler
Analyzes current usage and projects savings based on schedules
"""
import boto3
from datetime import datetime
from collections import defaultdict

ec2 = boto3.client('ec2', region_name='eu-west-2')

# AWS pricing (approximate hourly rates for eu-west-2)
EC2_PRICING = {
    't3.micro': 0.0104,
    't3.small': 0.0208,
    't3.medium': 0.0416,
    't3.large': 0.0832,
    'm5.large': 0.096,
    'm5.xlarge': 0.192,
    'm5.2xlarge': 0.384,
    'r5.large': 0.126,
    'r5.xlarge': 0.252,
}

# Schedule runtime hours per week
SCHEDULE_HOURS = {
    'uk-office-hours': 45,  # 9am-6pm Mon-Fri = 9hrs * 5 days
    'extended-dev': 60,     # 8am-8pm Mon-Fri = 12hrs * 5 days
    'qa-extended': 75,      # 7am-10pm Mon-Fri = 15hrs * 5 days
    'weekend-only': 48,     # Sat-Sun full days
    'always-on': 168,       # 24/7
}

def get_instance_cost(instance_type, hours_per_week):
    """Calculate weekly cost for instance type"""
    hourly_rate = EC2_PRICING.get(instance_type, 0.10)  # Default fallback
    return hourly_rate * hours_per_week

def analyze_savings():
    """Analyze potential savings across all tagged instances"""
    # Get all instances with Schedule tag
    response = ec2.describe_instances(
        Filters=[
            {'Name': 'tag-key', 'Values': ['Schedule']},
            {'Name': 'instance-state-name', 'Values': ['running', 'stopped']}
        ]
    )

    savings_data = defaultdict(lambda: {
        'count': 0,
        'current_weekly_cost': 0,
        'scheduled_weekly_cost': 0,
        'weekly_savings': 0
    })
    total_instances = 0

    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            total_instances += 1
            instance_type = instance['InstanceType']

            # Get schedule tag
            schedule = next(
                (tag['Value'] for tag in instance.get('Tags', []) if tag['Key'] == 'Schedule'),
                None
            )
            if not schedule:
                continue

            # Calculate costs
            current_cost = get_instance_cost(instance_type, 168)  # Current 24/7
            scheduled_hours = SCHEDULE_HOURS.get(schedule, 168)
            scheduled_cost = get_instance_cost(instance_type, scheduled_hours)
            savings = current_cost - scheduled_cost

            # Aggregate by schedule
            savings_data[schedule]['count'] += 1
            savings_data[schedule]['current_weekly_cost'] += current_cost
            savings_data[schedule]['scheduled_weekly_cost'] += scheduled_cost
            savings_data[schedule]['weekly_savings'] += savings

    # Print results
    print("=" * 80)
    print("AWS INSTANCE SCHEDULER - COST SAVINGS ANALYSIS")
    print("=" * 80)
    print(f"\nTotal instances with Schedule tag: {total_instances}")
    print(f"Analysis date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

    if not savings_data:
        print("\nNo tagged instances found - nothing to analyze")
        return

    total_weekly_savings = 0
    total_current_cost = 0
    print("\n" + "-" * 80)
    print(f"{'Schedule':<25} {'Instances':<10} {'Current':<15} {'Scheduled':<15} {'Savings':<15}")
    print("-" * 80)
    for schedule, data in sorted(savings_data.items()):
        print(f"{schedule:<25} {data['count']:<10} "
              f"£{data['current_weekly_cost']:>8.2f}/wk "
              f"£{data['scheduled_weekly_cost']:>8.2f}/wk "
              f"£{data['weekly_savings']:>8.2f}/wk "
              f"({data['weekly_savings'] / data['current_weekly_cost'] * 100:.0f}%)")
        total_weekly_savings += data['weekly_savings']
        total_current_cost += data['current_weekly_cost']
    print("-" * 80)
    print(f"{'TOTAL':<25} {total_instances:<10} "
          f"£{total_current_cost:>8.2f}/wk "
          f"{'':15} "
          f"£{total_weekly_savings:>8.2f}/wk "
          f"({total_weekly_savings / total_current_cost * 100:.0f}%)")
    print("-" * 80)

    # Monthly and annual projections
    monthly_savings = total_weekly_savings * 4.33  # Average weeks per month
    annual_savings = total_weekly_savings * 52
    print("\nPROJECTED SAVINGS:")
    print(f" Monthly: £{monthly_savings:,.2f}")
    print(f" Annual: £{annual_savings:,.2f}")

    # Solution cost (approximate)
    solution_cost_monthly = 2.00
    roi_monthly = ((monthly_savings - solution_cost_monthly) / solution_cost_monthly) * 100
    print("\nROI ANALYSIS:")
    print(f" Solution cost: £{solution_cost_monthly:.2f}/month")
    print(f" Net savings: £{(monthly_savings - solution_cost_monthly):,.2f}/month")
    print(f" ROI: {roi_monthly:,.0f}%")

    # Breakeven analysis
    if monthly_savings > solution_cost_monthly and total_weekly_savings > 0:
        hours_to_breakeven = solution_cost_monthly / (total_weekly_savings / 168)
        print(f" Payback: {hours_to_breakeven:.1f} hours")

if __name__ == '__main__':
    analyze_savings()
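Before trusting the full report, it is worth hand-checking the arithmetic for a single instance. Using the approximate m5.large rate from the pricing table above and the 45-hour uk-office-hours schedule:

```python
# Hand-check for one m5.large moved from 24/7 to uk-office-hours.
rate = 0.096                  # approximate hourly rate from EC2_PRICING
always_on = rate * 168        # weekly cost running 24/7
scheduled = rate * 45         # weekly cost on the 45-hour schedule
saving = always_on - scheduled
print(f"weekly saving: {saving:.2f} ({saving / always_on:.0%})")
# weekly saving: 11.81 (73%)
```

The percentage matches the headline figure because it depends only on the hours ratio, not on the instance type.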
5. Monitoring and Alerting Script
#!/usr/bin/env python3
"""
Monitor Instance Scheduler Lambda execution
Check for errors, track scheduler actions, generate reports
"""
import time
import boto3
from datetime import datetime, timedelta

logs = boto3.client('logs', region_name='eu-west-2')
cloudwatch = boto3.client('cloudwatch', region_name='eu-west-2')

LOG_GROUP = '/aws/lambda/instance-scheduler'  # Replace with your log group
NAMESPACE = 'InstanceScheduler'

def analyze_scheduler_logs(hours=24):
    """Analyze scheduler logs for the past N hours"""
    # CloudWatch Logs Insights expects epoch seconds, not milliseconds
    start_time = int((datetime.now() - timedelta(hours=hours)).timestamp())
    end_time = int(datetime.now().timestamp())

    print(f"Analyzing scheduler logs for past {hours} hours...")

    # Query logs
    query = """
    fields @timestamp, @message
    | filter @message like /Started instance|Stopped instance|Error/
    | sort @timestamp desc
    """
    response = logs.start_query(
        logGroupName=LOG_GROUP,
        startTime=start_time,
        endTime=end_time,
        queryString=query
    )
    query_id = response['queryId']

    # Wait for query to reach a terminal state
    while True:
        result = logs.get_query_results(queryId=query_id)
        if result['status'] in ('Complete', 'Failed', 'Cancelled', 'Timeout'):
            break
        time.sleep(1)

    # Parse results
    started = []
    stopped = []
    errors = []
    for record in result['results']:
        message = next((r['value'] for r in record if r['field'] == '@message'), '')
        if 'Started instance' in message:
            started.append(message)
        elif 'Stopped instance' in message:
            stopped.append(message)
        elif 'Error' in message:
            errors.append(message)

    # Print summary
    print(f"\nInstances started: {len(started)}")
    print(f"Instances stopped: {len(stopped)}")
    print(f"Errors: {len(errors)}")
    if errors:
        print("\nRecent Errors:")
        for error in errors[:5]:
            print(f"  {error}")

    return {
        'started': len(started),
        'stopped': len(stopped),
        'errors': len(errors)
    }

def check_cloudwatch_metrics():
    """Check CloudWatch metrics for scheduler"""
    end_time = datetime.now()
    start_time = end_time - timedelta(hours=24)

    print("\nCloudWatch Metrics (24h):")
    metrics = [
        'RunningInstances',
        'StoppedInstances',
        'ScheduledInstances'
    ]
    for metric_name in metrics:
        response = cloudwatch.get_metric_statistics(
            Namespace=NAMESPACE,
            MetricName=metric_name,
            StartTime=start_time,
            EndTime=end_time,
            Period=3600,  # 1 hour
            Statistics=['Average', 'Maximum']
        )
        if response['Datapoints']:
            latest = sorted(response['Datapoints'], key=lambda x: x['Timestamp'])[-1]
            print(f"  {metric_name}: {latest['Average']:.0f} (max: {latest['Maximum']:.0f})")

def verify_scheduler_health():
    """Verify scheduler is running and healthy"""
    print("\nScheduler Health Check:")

    # Check if Lambda is being invoked
    lambda_client = boto3.client('lambda', region_name='eu-west-2')
    try:
        # Get Lambda function
        response = lambda_client.get_function(FunctionName='instance-scheduler')
        print("  Lambda function exists")
        # Check last modification
        last_modified = response['Configuration']['LastModified']
        print(f"  Last modified: {last_modified}")
    except Exception as e:
        print(f"  Error checking Lambda: {str(e)}")
        return False

    # Check recent invocations
    end_time = datetime.now()
    start_time = end_time - timedelta(minutes=15)
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/Lambda',
        MetricName='Invocations',
        Dimensions=[
            {'Name': 'FunctionName', 'Value': 'instance-scheduler'}
        ],
        StartTime=start_time,
        EndTime=end_time,
        Period=300,
        Statistics=['Sum']
    )
    if response['Datapoints']:
        invocations = sum(dp['Sum'] for dp in response['Datapoints'])
        print(f"  Invocations (last 15min): {invocations:.0f}")
        if invocations == 0:
            print("  WARNING: No recent invocations - scheduler may not be running")
            return False
    else:
        print("  WARNING: No invocation metrics found")
        return False

    # Check for errors
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/Lambda',
        MetricName='Errors',
        Dimensions=[
            {'Name': 'FunctionName', 'Value': 'instance-scheduler'}
        ],
        StartTime=start_time,
        EndTime=end_time,
        Period=300,
        Statistics=['Sum']
    )
    if response['Datapoints']:
        errors = sum(dp['Sum'] for dp in response['Datapoints'])
        if errors > 0:
            print(f"  Errors (last 15min): {errors:.0f}")
        else:
            print("  No errors")

    return True

if __name__ == '__main__':
    # Run all checks
    analyze_scheduler_logs(hours=24)
    check_cloudwatch_metrics()
    is_healthy = verify_scheduler_health()
    if is_healthy:
        print("\nInstance Scheduler is healthy and operating normally")
    else:
        print("\nInstance Scheduler may have issues - investigate logs")
Why It Matters
- Immediate cost reduction: 70% savings on non-prod environments without changing architecture
- Zero maintenance: No custom Lambda code to maintain – AWS-supported solution
- Multi-account support: Manage schedules across AWS Organizations from a single scheduler
- RDS support: Works for both EC2 instances and RDS databases (including Aurora clusters)
- Flexibility: Different schedules for different teams/projects via tagging
- Auditability: CloudWatch logs show every start/stop action with timestamps
Try This Week
- Deploy the scheduler – Run the CloudFormation deployment script (5 minutes)
- Configure schedules – Create 2-3 schedule patterns in DynamoDB (10 minutes)
- Tag dev instances – Tag 5-10 instances with your schedule (5 minutes)
- Calculate savings – Run the cost calculator to see projected savings (2 minutes)
- Monitor first day – Check CloudWatch logs after 24 hours to verify it’s working
Common Instance Scheduler Mistakes
- RDS 7-day limit: RDS auto-starts after 7 days stopped – don’t use for schedules with >7 day gaps
- Wrong timezone: Schedule times are in the timezone you configure, not UTC
- Tag key mismatch: Tag key must exactly match the CloudFormation parameter (default: Schedule)
- Cross-account without proper setup: Need to deploy the spoke stack in secondary accounts
- Not monitoring logs: Check CloudWatch Logs regularly to catch errors early
- Forgetting SNS notifications: Configure SNS topic parameter to get alerts on errors
- Manual start/stop confusion: If you manually start or stop an instance, the scheduler will override that change on its next run
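The timezone pitfall deserves a concrete illustration: a schedule configured for 09:00 Europe/London fires at 08:00 UTC in summer but 09:00 UTC in winter, so scheduler actions shift by an hour in UTC-stamped CloudWatch logs across daylight-saving changes. Python's standard zoneinfo module shows the effect:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo('Europe/London')

# A 09:00 local start on a summer date (BST, UTC+1) vs a winter date (GMT)
summer = datetime(2025, 7, 7, 9, 0, tzinfo=london).astimezone(timezone.utc)
winter = datetime(2025, 1, 6, 9, 0, tzinfo=london).astimezone(timezone.utc)
print(summer.hour, winter.hour)  # 8 9
```

Keep this in mind when comparing scheduler actions against UTC timestamps in CloudWatch.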
Advanced Patterns
- SSM Maintenance Windows integration: Scheduler can start instances before maintenance windows and stop them after
- Cross-account orchestration: Hub-spoke model manages scheduling across AWS Organizations
- Custom Lambda hooks: Extend the scheduler with your own Lambda functions for custom actions
- Cost allocation tags: Add automatic tags when instances start/stop for detailed cost tracking
- Weekend deploy schedules: Create special schedules that run different patterns on deploy days
Pro Tip
Start with one environment (dev) and one schedule (uk-office-hours) to validate the solution works in your environment. Tag 5-10 instances, wait 24 hours, check the logs, and verify instances stopped/started correctly. Once validated, roll out to test, QA, and other non-prod environments. This cautious approach prevents accidentally stopping production instances.








