Your weekly dose of actionable cloud wisdom to start the week right
The Problem
Your team is still SSH-ing into servers to patch systems, check logs, and run maintenance tasks. You have no clear inventory of what’s running where, patch compliance is a manual nightmare, and your security team is asking uncomfortable questions about SSH key management and audit trails. Meanwhile, servers are drifting from their desired configurations, and nobody’s sure which instances need urgent security updates.
The Solution
Replace manual server management with AWS Systems Manager’s automation capabilities. Stop SSH-ing into servers, automate patch management, maintain configuration compliance, and get complete visibility into your infrastructure. Systems Manager provides secure, auditable, and scalable server management without the overhead of traditional tools.
Essential Systems Manager Capabilities:
1. Session Manager – Secure Shell Access Without SSH
# No more SSH keys or bastion hosts needed!
# Connect to instances through Session Manager
aws ssm start-session --target i-1234567890abcdef0
# Port forwarding for database access
aws ssm start-session \
--target i-1234567890abcdef0 \
--document-name AWS-StartPortForwardingSession \
--parameters '{"portNumber":["3306"],"localPortNumber":["3306"]}'
# Run commands on multiple instances
aws ssm send-command \
--document-name "AWS-RunShellScript" \
--parameters 'commands=["sudo yum update -y","sudo systemctl restart httpd"]' \
--targets "Key=tag:Environment,Values=production" \
--comment "Update and restart web servers"
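The port-forwarding call above can also be assembled from a script, which helps when the ports differ per environment. This is a minimal Python sketch, assuming the AWS CLI and the Session Manager plugin are installed locally. The helper name build_port_forward_command is hypothetical; it only builds the argument list and leaves execution to subprocess.

```python
import json


def build_port_forward_command(instance_id, remote_port, local_port=None):
    """Build the argv for an SSM port-forwarding session (hypothetical helper)."""
    if local_port is None:
        local_port = remote_port
    # The session document expects every parameter value as a list of strings
    parameters = {
        "portNumber": [str(remote_port)],
        "localPortNumber": [str(local_port)],
    }
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSession",
        "--parameters", json.dumps(parameters),
    ]


command = build_port_forward_command("i-1234567890abcdef0", 3306, 13306)
print(" ".join(command))
# To open the tunnel: subprocess.run(command, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting problems with the JSON parameters.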
2. Automated Patch Management
{
"PatchGroupName": "ProductionWebServers",
"MaintenanceWindow": {
"Name": "ProductionPatching",
"Description": "Automated patching for production web servers",
"Schedule": "cron(0 2 ? * SUN *)",
"Duration": 4,
"Cutoff": 1,
"AllowUnassociatedTargets": false,
"Tags": [
{
"Key": "Environment",
"Value": "Production"
}
]
},
"PatchBaseline": {
"Name": "SecurityOnlyBaseline",
"Description": "Only install security patches",
"OperatingSystem": "AMAZON_LINUX_2",
"ApprovalRules": {
"PatchRules": [
{
"PatchFilterGroup": {
"PatchFilters": [
{
"Key": "CLASSIFICATION",
"Values": ["Security"]
},
{
"Key": "SEVERITY",
"Values": ["Critical", "Important"]
}
]
},
"ComplianceLevel": "CRITICAL",
"ApproveAfterDays": 0,
"EnableNonSecurity": false
}
]
},
"ApprovedPatches": [],
"RejectedPatches": [],
"Sources": []
}
}
# Create patch baseline
aws ssm create-patch-baseline \
--name "SecurityOnlyBaseline" \
--operating-system "AMAZON_LINUX_2" \
--approval-rules file://patch-rules.json \
--description "Only install security patches"
# Register patch group
aws ssm register-patch-baseline-for-patch-group \
--baseline-id "pb-1234567890abcdef0" \
--patch-group "ProductionWebServers"
# Create maintenance window
aws ssm create-maintenance-window \
--name "ProductionPatching" \
--schedule "cron(0 2 ? * SUN *)" \
--duration 4 \
--cutoff 1 \
--description "Automated patching for production servers" \
--no-allow-unassociated-targets
# Register targets for maintenance window
aws ssm register-target-with-maintenance-window \
--window-id "mw-1234567890abcdef0" \
--resource-type "INSTANCE" \
--targets "Key=tag:PatchGroup,Values=ProductionWebServers"
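Once a baseline and a window exist, the remaining question is which instances still have outstanding patches. Here is a minimal Python sketch, assuming a boto3 SSM client is passed in. The function name find_unpatched_instances is hypothetical, and counting missing plus failed patches as "outstanding" is an illustrative choice.

```python
def find_unpatched_instances(ssm_client, instance_ids):
    """Return {instance_id: outstanding patch count} for non-compliant instances."""
    unpatched = {}
    # Query in batches of 50 instance IDs to keep each request small
    for start in range(0, len(instance_ids), 50):
        kwargs = {"InstanceIds": instance_ids[start:start + 50]}
        while True:
            response = ssm_client.describe_instance_patch_states(**kwargs)
            for state in response["InstancePatchStates"]:
                outstanding = state.get("MissingCount", 0) + state.get("FailedCount", 0)
                if outstanding > 0:
                    unpatched[state["InstanceId"]] = outstanding
            token = response.get("NextToken")
            if not token:
                break
            kwargs["NextToken"] = token
    return unpatched


# Usage (requires AWS credentials and boto3):
#   import boto3
#   print(find_unpatched_instances(boto3.client("ssm"), ["i-1234567890abcdef0"]))
```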
3. Inventory and Compliance Monitoring
# CloudFormation template for compliance monitoring
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Systems Manager compliance monitoring setup'
Resources:
  # Association for inventory collection
  InventoryAssociation:
    Type: AWS::SSM::Association
    Properties:
      Name: AWS-GatherSoftwareInventory
      AssociationName: "InventoryCollection"
      Targets:
        - Key: "tag:Environment"
          Values:
            - "production"
            - "staging"
      ScheduleExpression: "rate(30 days)"
      # Association parameter values are lists of strings
      Parameters:
        applications: ["Enabled"]
        awsComponents: ["Enabled"]
        customInventory: ["Enabled"]
        instanceDetailedInformation: ["Enabled"]
        networkConfig: ["Enabled"]
        services: ["Enabled"]
        windowsRegistry: ["Disabled"]
        windowsRoles: ["Disabled"]
  # Association that keeps the CloudWatch agent installed
  ComplianceAssociation:
    Type: AWS::SSM::Association
    Properties:
      Name: AWS-ConfigureAWSPackage
      Targets:
        - Key: "tag:Environment"
          Values: ["production"]
      ScheduleExpression: "rate(1 day)"
      Parameters:
        action: ["Install"]
        name: ["AmazonCloudWatchAgent"]
        version: ["latest"]
  # AWS Config managed rule for association compliance
  CustomComplianceRule:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: "ssm-agent-compliance"
      Description: "Checks whether State Manager associations on managed instances are compliant"
      Source:
        Owner: "AWS"
        SourceIdentifier: "EC2_MANAGEDINSTANCE_ASSOCIATION_COMPLIANCE_STATUS_CHECK"
      Scope:
        ComplianceResourceTypes:
          - "AWS::SSM::AssociationCompliance"
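Collected inventory is most useful when you can query it, for example to find every server running a particular package. This is a minimal Python sketch, assuming a boto3 SSM client is passed in and inventory collection has already run. The function name instances_with_package is hypothetical.

```python
def instances_with_package(ssm_client, instance_ids, package_name):
    """Map instance ID -> installed version for instances that report the package."""
    found = {}
    for instance_id in instance_ids:
        # AWS:Application is the inventory type that lists installed software
        kwargs = {"InstanceId": instance_id, "TypeName": "AWS:Application"}
        while True:
            page = ssm_client.list_inventory_entries(**kwargs)
            for entry in page.get("Entries", []):
                if entry.get("Name") == package_name:
                    found[instance_id] = entry.get("Version", "unknown")
            token = page.get("NextToken")
            if not token:
                break
            kwargs["NextToken"] = token
    return found


# Usage (requires AWS credentials and boto3):
#   import boto3
#   print(instances_with_package(boto3.client("ssm"), ["i-1234567890abcdef0"], "openssl"))
```

Looping per instance keeps the sketch simple. For large fleets, a fleet-wide inventory query is likely a better fit.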
4. Parameter Store for Configuration Management
# Store configuration parameters securely
aws ssm put-parameter \
--name "/app/database/connection-string" \
--value "postgresql://user:pass@db.example.com:5432/myapp" \
--type "SecureString" \
--description "Database connection string for application"
aws ssm put-parameter \
--name "/app/api/rate-limit" \
--value "1000" \
--type "String" \
--description "API rate limit per minute"
# Create parameter hierarchy
aws ssm put-parameter \
--name "/app/production/database/host" \
--value "prod-db.cluster-xyz.eu-west-1.rds.amazonaws.com" \
--type "String"
aws ssm put-parameter \
--name "/app/staging/database/host" \
--value "staging-db.cluster-abc.eu-west-1.rds.amazonaws.com" \
--type "String"
# Retrieve every production parameter, decrypting SecureString values
aws ssm get-parameters-by-path \
--path "/app/production/" \
--recursive \
--with-decryption
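Inside an application, the same lookup is usually done once at startup and turned into a simple dictionary. Here is a minimal Python sketch, assuming a boto3 SSM client is passed in. The function name load_config and the choice to key values by their path relative to the prefix are illustrative.

```python
def load_config(ssm_client, path):
    """Fetch every parameter under `path` and return {relative_name: value}."""
    prefix = path.rstrip("/") + "/"
    config = {}
    kwargs = {"Path": prefix, "Recursive": True, "WithDecryption": True}
    while True:
        page = ssm_client.get_parameters_by_path(**kwargs)
        for parameter in page["Parameters"]:
            # "/app/production/database/host" becomes "database/host"
            config[parameter["Name"][len(prefix):]] = parameter["Value"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return config


# Usage (requires AWS credentials and boto3):
#   import boto3
#   config = load_config(boto3.client("ssm"), "/app/production")
#   db_host = config["database/host"]
```

Results come back in small pages, so the NextToken loop matters as soon as a hierarchy grows beyond a handful of parameters.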
5. Run Command for Automated Operations
# Python script for automated server management
import time

import boto3

TERMINAL_STATUSES = {'Success', 'Failed', 'Cancelled', 'TimedOut'}


def run_command_on_instances(command, instance_ids=None, tag_filters=None):
    """
    Execute commands on EC2 instances using Systems Manager
    """
    ssm = boto3.client('ssm')
    # Define targets
    if instance_ids:
        targets = [{"Key": "InstanceIds", "Values": instance_ids}]
    elif tag_filters:
        targets = [{"Key": f"tag:{k}", "Values": v} for k, v in tag_filters.items()]
    else:
        raise ValueError("Must specify either instance_ids or tag_filters")
    # Send command
    response = ssm.send_command(
        DocumentName='AWS-RunShellScript',
        Parameters={'commands': command if isinstance(command, list) else [command]},
        Targets=targets,
        Comment='Automated maintenance task',
        TimeoutSeconds=300,
        MaxConcurrency='50%',
        MaxErrors='2'
    )
    command_id = response['Command']['CommandId']
    print(f"Command sent: {command_id}")
    # Wait for completion and get results
    return wait_for_command_completion(ssm, command_id)


def wait_for_command_completion(ssm, command_id, poll_seconds=5, max_wait_seconds=900):
    """
    Wait for command completion and return per-instance results
    """
    deadline = time.monotonic() + max_wait_seconds
    while True:
        time.sleep(poll_seconds)
        # The command-level status covers every target, including instances
        # that have no invocation yet (MaxConcurrency creates them in batches)
        command = ssm.list_commands(CommandId=command_id)['Commands'][0]
        # Collect every page of invocations
        invocations = []
        paginator = ssm.get_paginator('list_command_invocations')
        for page in paginator.paginate(CommandId=command_id):
            invocations.extend(page['CommandInvocations'])
        completed = sum(1 for inv in invocations if inv['Status'] in TERMINAL_STATUSES)
        print(f"Progress: {completed}/{command['TargetCount']} instances completed")
        if command['Status'] in TERMINAL_STATUSES:
            break
        if time.monotonic() > deadline:
            print("Gave up waiting; returning partial results")
            break
    results = []
    for inv in invocations:
        # stdout and stderr are only returned by get_command_invocation
        detail = ssm.get_command_invocation(
            CommandId=command_id,
            InstanceId=inv['InstanceId']
        )
        results.append({
            'InstanceId': inv['InstanceId'],
            'Status': detail['Status'],
            'StandardOutput': detail.get('StandardOutputContent', ''),
            'StandardError': detail.get('StandardErrorContent', '')
        })
    return results


# Example usage
if __name__ == '__main__':
    # Update all production web servers
    web_server_results = run_command_on_instances(
        command=[
            'sudo yum update -y',
            'sudo systemctl restart httpd',
            'curl -f http://localhost/health || echo "Health check failed"'
        ],
        tag_filters={'Environment': ['production'], 'Role': ['webserver']}
    )
    # Check disk space on all servers
    disk_check_results = run_command_on_instances(
        command='df -h | grep -E "(Filesystem|/dev/)" | head -10',
        tag_filters={'Environment': ['production']}
    )
    for result in disk_check_results:
        print(f"Instance {result['InstanceId']}: {result['Status']}")
        if result['StandardOutput']:
            print(f"Output: {result['StandardOutput']}")
6. Maintenance Windows for Scheduled Operations
# Create maintenance window for automated tasks and capture its ID
WINDOW_ID=$(aws ssm create-maintenance-window \
--name "WeeklyMaintenance" \
--description "Weekly maintenance tasks" \
--schedule "cron(0 3 ? * MON *)" \
--duration 3 \
--cutoff 1 \
--allow-unassociated-targets \
--query 'WindowId' \
--output text)
# Register production servers as targets and capture the target ID
WINDOW_TARGET_ID=$(aws ssm register-target-with-maintenance-window \
--window-id "$WINDOW_ID" \
--resource-type "INSTANCE" \
--targets "Key=tag:Environment,Values=production" \
--query 'WindowTargetId' \
--output text)
# Register cleanup task against the registered targets
aws ssm register-task-with-maintenance-window \
--window-id "$WINDOW_ID" \
--targets "Key=WindowTargetIds,Values=$WINDOW_TARGET_ID" \
--task-type "RUN_COMMAND" \
--task-arn "AWS-RunShellScript" \
--service-role-arn "arn:aws:iam::123456789012:role/MaintenanceWindowRole" \
--task-invocation-parameters '{
"RunCommand": {
"Parameters": {
"commands": [
"sudo find /tmp -type f -atime +7 -delete",
"sudo find /var/log -name \"*.log\" -size +100M -exec gzip {} \\;",
"sudo docker system prune -f",
"sudo systemctl restart rsyslog"
]
}
}
}' \
--priority 1 \
--max-concurrency "50%" \
--max-errors "2"
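The duration and cutoff values are easy to misread: the cutoff is the number of hours before the end of the window at which Systems Manager stops scheduling new tasks. A small Python sketch of that arithmetic follows. The function name task_scheduling_window is hypothetical, and the sanity check on the inputs is an assumption of this example rather than a statement of the service's exact limits.

```python
from datetime import datetime, timedelta


def task_scheduling_window(window_start, duration_hours, cutoff_hours):
    """Return (last_task_start, window_end) for a maintenance window."""
    if not 0 <= cutoff_hours < duration_hours:
        raise ValueError("cutoff must be non-negative and shorter than the duration")
    window_end = window_start + timedelta(hours=duration_hours)
    # No new tasks are scheduled once the cutoff point is reached
    last_task_start = window_end - timedelta(hours=cutoff_hours)
    return last_task_start, window_end


start = datetime(2024, 1, 1, 3, 0)  # a Monday at 03:00, matching cron(0 3 ? * MON *)
last_start, end = task_scheduling_window(start, duration_hours=3, cutoff_hours=1)
print(f"New tasks may start until {last_start:%H:%M}; window closes at {end:%H:%M}")
# prints: New tasks may start until 05:00; window closes at 06:00
```

With a 3-hour window and a 1-hour cutoff, only the first two hours are available for starting tasks, so size the window for your slowest task.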
Advanced Automation Patterns
7. State Manager for Configuration Drift Prevention
{
"AssociationName": "EnforceSecurityConfiguration",
"Name": "AWS-ApplyAnsiblePlaybooks",
"Targets": [
{
"Key": "tag:Environment",
"Values": ["production"]
}
],
"ScheduleExpression": "rate(12 hours)",
"Parameters": {
"SourceType": ["S3"],
"SourceInfo": ["{\"path\":\"https://s3.amazonaws.com/my-bucket/security-playbook.yml\"}"],
"InstallDependencies": ["True"],
"PlaybookFile": ["security-playbook.yml"],
"ExtraVariables": ["SSM=True"],
"Check": ["False"],
"Verbose": ["-v"]
},
"ComplianceSeverity": "HIGH",
"OutputLocation": {
"S3Location": {
"OutputS3BucketName": "my-compliance-logs",
"OutputS3KeyPrefix": "security-enforcement/"
}
}
}
8. Custom Documents for Standardized Operations
# Custom SSM document for application deployment
schemaVersion: '2.2'
description: 'Deploy application with health checks and rollback'
parameters:
  applicationVersion:
    type: String
    description: 'Version of application to deploy'
  healthCheckUrl:
    type: String
    description: 'URL for health check verification'
    default: 'http://localhost:8080/health'
  rollbackOnFailure:
    type: String
    description: 'Rollback if deployment fails'
    default: 'true'
    allowedValues:
      - 'true'
      - 'false'
mainSteps:
  - action: 'aws:runShellScript'
    name: 'BackupCurrentVersion'
    inputs:
      runCommand:
        # Clear any failure marker left by a previous run
        - 'rm -f /tmp/myapp.healthcheck.failed'
        - 'sudo cp -r /opt/myapp /opt/myapp.backup.$(date +%Y%m%d_%H%M%S)'
        - 'echo "Backup completed"'
  - action: 'aws:runShellScript'
    name: 'DeployNewVersion'
    inputs:
      runCommand:
        - 'sudo systemctl stop myapp'
        - 'cd /opt && sudo wget -O myapp-{{applicationVersion}}.tar.gz https://releases.mycompany.com/myapp-{{applicationVersion}}.tar.gz'
        - 'sudo tar -xzf myapp-{{applicationVersion}}.tar.gz'
        - 'sudo rm -rf myapp && sudo mv myapp-{{applicationVersion}} myapp'
        - 'sudo systemctl start myapp'
        - 'echo "Deployment completed"'
  - action: 'aws:runShellScript'
    name: 'HealthCheck'
    inputs:
      runCommand:
        - 'sleep 30'
        - 'for i in 1 2 3 4 5; do curl -f {{healthCheckUrl}} && echo "Health check passed" && exit 0; sleep 10; done'
        # Each step runs in its own shell, so record the failure in a marker file
        - 'touch /tmp/myapp.healthcheck.failed'
        - 'echo "Health check failed after 5 attempts" && exit 1'
  - action: 'aws:runShellScript'
    name: 'RollbackIfNeeded'
    precondition:
      StringEquals:
        - '{{ rollbackOnFailure }}'
        - 'true'
    inputs:
      runCommand:
        - 'if [ -f /tmp/myapp.healthcheck.failed ]; then'
        - '  echo "Rolling back due to health check failure"'
        # Restore the most recent backup (timestamps sort lexicographically)
        - '  LATEST_BACKUP=$(ls -1d /opt/myapp.backup.* | sort | tail -n 1)'
        - '  sudo systemctl stop myapp'
        - '  sudo rm -rf /opt/myapp'
        - '  sudo mv "$LATEST_BACKUP" /opt/myapp'
        - '  sudo systemctl start myapp'
        - '  rm -f /tmp/myapp.healthcheck.failed'
        - '  echo "Rollback completed"'
        - 'fi'
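A custom document only becomes useful once it is registered and invoked with parameters. This is a minimal Python sketch of both steps, assuming a boto3 SSM client is passed in. The function names and the document name DeployMyApp are hypothetical, and deploying one instance at a time is an illustrative choice.

```python
def register_document(ssm_client, name, yaml_content):
    """Create a Command document from YAML content and return its name."""
    response = ssm_client.create_document(
        Name=name,
        Content=yaml_content,
        DocumentType="Command",
        DocumentFormat="YAML",
    )
    return response["DocumentDescription"]["Name"]


def deploy_version(ssm_client, document_name, version, environment, rollback=True):
    """Run the deployment document against every instance tagged with the environment."""
    response = ssm_client.send_command(
        DocumentName=document_name,
        Targets=[{"Key": "tag:Environment", "Values": [environment]}],
        # Document parameters are passed as lists of strings
        Parameters={
            "applicationVersion": [version],
            "rollbackOnFailure": ["true" if rollback else "false"],
        },
        MaxConcurrency="1",  # one instance at a time
        MaxErrors="0",       # stop after the first failed instance
    )
    return response["Command"]["CommandId"]


# Usage (requires AWS credentials and boto3):
#   import boto3
#   ssm = boto3.client("ssm")
#   with open("deploy.yml") as handle:
#       register_document(ssm, "DeployMyApp", handle.read())
#   deploy_version(ssm, "DeployMyApp", "1.4.2", "staging")
```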
Cost and Security Benefits
9. Cost Analysis Script
# Calculate cost savings from Systems Manager automation
def calculate_ssm_savings():
    """
    Estimate cost savings from using Systems Manager vs manual operations
    """
    # Assumptions (illustrative figures; adjust to your environment)
    num_servers = 100
    admin_hourly_rate = 50  # £50/hour
    hours_per_month_manual = 40  # Manual maintenance hours across the whole fleet
    # Manual operation costs
    monthly_manual_cost = hours_per_month_manual * admin_hourly_rate
    # Systems Manager costs: standard-tier features, including Session Manager,
    # carry no additional charge for EC2 instances
    ssm_monthly_cost = 0
    # Time savings (assumes an 80% reduction in manual tasks)
    time_savings_percent = 0.8
    monthly_savings = monthly_manual_cost * time_savings_percent - ssm_monthly_cost
    print("=== Systems Manager Cost Analysis ===")
    print(f"Number of servers: {num_servers}")
    print(f"Manual operations cost: £{monthly_manual_cost:.2f}/month")
    print(f"Systems Manager cost: £{ssm_monthly_cost:.2f}/month")
    print(f"Monthly savings: £{monthly_savings:.2f}")
    print(f"Annual savings: £{monthly_savings * 12:.2f}")
    print()
    print("Additional benefits:")
    print("- Improved security (no SSH keys)")
    print("- Complete audit trail")
    print("- Standardized operations")
    print("- Reduced human error")
    print("- 24/7 automated compliance")
    return monthly_savings


calculate_ssm_savings()
Why It Matters
- Security: No SSH keys to manage, complete audit trails, secure access
- Automation: Eliminate manual tasks, reduce human error, ensure consistency
- Compliance: Automated patch management, configuration drift detection
- Cost: Cut manual operational effort (the cost analysis above assumes an 80% reduction; measure your own), with no additional licensing fees
- Scalability: Manage thousands of instances as easily as one
Try This Week
- Set up Session Manager – Replace SSH access for at least one instance
- Create a patch baseline – Automate security updates for development servers
- Implement inventory collection – Get visibility into what’s installed where
- Store one configuration – Move a sensitive config value to Parameter Store
Quick Systems Manager Setup Script
#!/bin/bash
# Quick Systems Manager setup for existing instances
# Ensure instances have required IAM role
echo "=== Systems Manager Setup ==="
echo
# Check SSM Agent status on instances
echo "📊 Checking SSM Agent status..."
aws ssm describe-instance-information \
--query 'InstanceInformationList[*].[InstanceId,Name,PingStatus,PlatformType,AgentVersion]' \
--output table
echo
echo "🔍 Instances not managed by SSM:"
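# Get all running EC2 instances (stopped or terminated ones cannot report to SSM)
ALL_INSTANCES=$(aws ec2 describe-instances \
--filters "Name=instance-state-name,Values=running" \
--query 'Reservations[*].Instances[*].InstanceId' \
--output text)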
# Get SSM managed instances
SSM_INSTANCES=$(aws ssm describe-instance-information \
--query 'InstanceInformationList[*].InstanceId' \
--output text)
# Find unmanaged instances
for instance in $ALL_INSTANCES; do
if ! echo "$SSM_INSTANCES" | grep -q "$instance"; then
echo " $instance - Check IAM role, SSM Agent, and network access"
fi
done
echo
echo "📋 Recent command executions:"
aws ssm list-commands \
--max-items 5 \
--query 'Commands[*].[CommandId,DocumentName,Status,RequestedDateTime]' \
--output table
echo
echo "🎯 Next steps to complete setup:"
echo "1. Attach AmazonSSMManagedInstanceCore policy to EC2 instance role"
echo "2. Restart SSM Agent: sudo systemctl restart amazon-ssm-agent"
echo "3. Verify connectivity: aws ssm describe-instance-information"
echo "4. Test Session Manager: aws ssm start-session --target INSTANCE-ID"
echo "5. Create first maintenance window for patch management"
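The unmanaged-instance check in the script above can also be written as a set difference, which is easier to reuse in reports or alerts. This is a minimal Python sketch, assuming boto3 EC2 and SSM clients are passed in. The function name find_unmanaged_instances is hypothetical, and this variation only considers running instances.

```python
def find_unmanaged_instances(ec2_client, ssm_client):
    """Return sorted IDs of running EC2 instances that SSM does not know about."""
    running = set()
    kwargs = {"Filters": [{"Name": "instance-state-name", "Values": ["running"]}]}
    while True:
        page = ec2_client.describe_instances(**kwargs)
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                running.add(instance["InstanceId"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    managed = set()
    kwargs = {}
    while True:
        page = ssm_client.describe_instance_information(**kwargs)
        for info in page["InstanceInformationList"]:
            managed.add(info["InstanceId"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    # Exact set membership avoids the partial matches a substring search can give
    return sorted(running - managed)


# Usage (requires AWS credentials and boto3):
#   import boto3
#   print(find_unmanaged_instances(boto3.client("ec2"), boto3.client("ssm")))
```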
Common Systems Manager Patterns
- Blue/Green Deployments: Use Run Command to coordinate application updates
- Configuration Management: State Manager ensures servers stay configured correctly
- Incident Response: Quick access to servers during outages without SSH
- Compliance Reporting: Automated inventory and patch compliance auditing
- Cost Optimization: Scheduled shutdown/startup of development environments
Integration with Other AWS Services
- CloudWatch: Monitoring and alerting on Systems Manager activities
- Config: Compliance rules based on Systems Manager inventory
- EventBridge: Trigger workflows based on Systems Manager events
- Lambda: Custom automation triggered by Systems Manager findings
Pro Tip: Start with Session Manager to replace SSH access – it’s the quickest win with immediate security benefits. Once your team experiences the convenience and security of Session Manager, they’ll be eager to adopt other Systems Manager capabilities.








