Amazon CloudWatch can automatically recover an EC2 instance if it fails due to underlying hardware issues using the EC2 Auto Recovery feature.
This works by creating a CloudWatch alarm on the EC2 metric StatusCheckFailed_System.
When this metric becomes 1 (failure), CloudWatch triggers a Recover action, which automatically migrates the instance to healthy hardware without changing:
Instance ID
Private IP
Elastic IP
EBS volumes
IAM role
Metadata
There are two status checks in EC2:
Instance Status Check → OS-level issues (like kernel panic)
System Status Check → AWS hardware failure
⚠️ Auto Recovery works only for System Status Check failures (hardware issue).
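The difference between the two checks can be sketched in Python. The helper below takes one entry shaped like an item of the `InstanceStatuses` list returned by boto3's `describe_instance_status` and reports whether a Recover action would apply. The function name and the sample entries are illustrative assumptions.

```python
def recovery_applies(status_entry: dict) -> bool:
    """Return True when the System Status Check is failing.

    status_entry is assumed to look like one item of the
    'InstanceStatuses' list from EC2 describe_instance_status.
    """
    system = status_entry.get("SystemStatus", {}).get("Status")
    # Only a failed system check (AWS-side hardware problem) qualifies;
    # a failed instance check (OS-level problem) does not.
    return system == "impaired"


# Hypothetical sample entries, for illustration only.
hardware_failure = {
    "InstanceId": "i-0123456789abcdef0",
    "SystemStatus": {"Status": "impaired"},
    "InstanceStatus": {"Status": "ok"},
}
kernel_panic = {
    "InstanceId": "i-0123456789abcdef0",
    "SystemStatus": {"Status": "ok"},
    "InstanceStatus": {"Status": "impaired"},
}

print(recovery_applies(hardware_failure))  # True
print(recovery_applies(kernel_panic))      # False
```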
Select the EC2 instance.
Click Status Checks
Click Create Status Check Alarm
Metric: StatusCheckFailed_System
Threshold: >= 1
Evaluation period: 2 consecutive periods (recommended)
Choose the alarm action: Recover
Done
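The console steps above can also be approximated in code. This is a minimal sketch using boto3's `put_metric_alarm`, assuming a 60-second period, the `Maximum` statistic, and a made-up alarm naming scheme; none of those three choices comes from the steps above. The instance ID and region are placeholders.

```python
def build_recovery_alarm(instance_id: str, region: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"ec2-auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",   # assumption: any failure within the period counts
        "Period": 60,             # assumption: 1-minute periods
        "EvaluationPeriods": 2,   # 2 consecutive periods, as recommended above
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 recover action for the given region.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_id: str, region: str) -> None:
    """Create the alarm; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_recovery_alarm(instance_id, region))


# Placeholder instance ID and region, for illustration only.
params = build_recovery_alarm("i-0123456789abcdef0", "us-east-1")
print(params["AlarmActions"][0])  # arn:aws:automate:us-east-1:ec2:recover
```

Keeping the parameter builder separate from the API call makes the alarm definition easy to review or test before anything is created in the account.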
You have:
EC2 instance running core banking API
Behind ALB
Auto Scaling min=1 (single instance for licensing reason)
Suddenly:
AWS underlying hardware fails
System Status Check fails (StatusCheckFailed_System becomes 1)
Without recovery:
Application downtime
SLA violation
Customer impact
With CloudWatch Auto Recovery:
Instance automatically moved to new hardware
No IP change
No manual intervention
Downtime usually limited to a few minutes (failure detection plus restart on new hardware)
✔ Enable recovery alarms for all standalone production EC2 instances
✔ Use Auto Scaling if possible (better than single instance recovery)
✔ Combine with SNS notifications for alerts
✔ Monitor both Instance & System status checks
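Two of these practices, SNS notifications and monitoring both checks, might look like the following sketch. It only builds parameter dictionaries for boto3's `put_metric_alarm`; the SNS topic ARN, the alarm names, and the 60-second period are assumptions for illustration.

```python
def with_notification(alarm_params: dict, sns_topic_arn: str) -> dict:
    """Return a copy of the alarm parameters with an SNS topic added as an action."""
    updated = dict(alarm_params)
    updated["AlarmActions"] = list(alarm_params.get("AlarmActions", [])) + [sns_topic_arn]
    return updated


def build_instance_check_alarm(instance_id: str, sns_topic_arn: str) -> dict:
    """Notify-only alarm on the Instance Status Check (OS-level issues)."""
    return {
        "AlarmName": f"ec2-instance-check-{instance_id}",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_Instance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,  # assumption: 1-minute periods
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # alert only, no recover action
    }


# Placeholder values, for illustration only.
topic = "arn:aws:sns:us-east-1:123456789012:ec2-alerts"
recovery = {"AlarmActions": ["arn:aws:automate:us-east-1:ec2:recover"]}

print(with_notification(recovery, topic)["AlarmActions"])
print(build_instance_check_alarm("i-0123456789abcdef0", topic)["MetricName"])
```

The recovery alarm gets both the recover action and the notification, while the instance-check alarm only alerts, since a Recover action does not address OS-level failures.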
Auto Recovery does not cover:
Application crash
Disk full
High CPU
OS corruption
Network misconfiguration
For those → use:
Auto Scaling replacement
SSM automation
Self-healing scripts
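As one example of what a self-healing script might start from, here is a small disk-usage check in Python. It is a sketch only: the 90% limit is an assumed threshold, and the actual remediation (log cleanup, an SSM runbook, instance replacement) is deliberately left out.

```python
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Return the percentage of disk space in use at the given path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def needs_attention(used_percent: float, limit: float = 90.0) -> bool:
    """Flag the disk once usage reaches the limit (assumed 90% by default)."""
    return used_percent >= limit


print(needs_attention(95.0))  # True
print(needs_attention(40.0))  # False
```

A scheduled job could call `needs_attention(disk_used_percent())` and trigger whatever cleanup step fits the workload.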
If an interviewer asks:
“Why not just use Auto Scaling instead?”
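Correct answer: Auto Scaling replaces a failed instance with a new one, so the instance ID and private IP change. Recovery keeps the same instance, which matters for a single instance that cannot be scaled out (for example, for licensing reasons).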
CloudWatch EC2 Recovery:
Monitors StatusCheckFailed_System
Triggers automatic recovery
Preserves instance configuration
Limits downtime from hardware failures
Critical for enterprise production reliability