Question 3.

Q3: How do you configure CloudWatch to recover an EC2 instance?

Detailed Answer (Production Explanation)

Amazon CloudWatch can automatically recover an EC2 instance that becomes impaired because of a problem with the underlying AWS hardware. This capability is known as EC2 Auto Recovery.

It works by creating a CloudWatch alarm on the EC2 metric:

StatusCheckFailed_System

When this metric reaches 1 (check failed), the alarm triggers the Recover action, which restarts the instance on healthy hardware. Recovery involves a reboot, so in-memory data is lost, but the following are preserved:

Instance ID
Private IP
Elastic IP
EBS volumes
IAM role
Instance metadata
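One way to confirm that identity really survives a recovery is to capture the relevant fields before and after, then diff them. This is a minimal sketch assuming the AWS CLI is installed and configured; the function name `identity_snapshot`, the `DRY_RUN` switch and the instance ID are illustrative, not part of any AWS tooling. With `DRY_RUN=1` (the default) it only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch: capture the identity fields that recovery is expected to preserve.
# DRY_RUN=1 (default) prints the command; DRY_RUN=0 calls AWS for real.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Hypothetical helper: instance ID, private IP, public/Elastic IP,
# instance profile ARN and attached EBS volume IDs for one instance.
identity_snapshot() {
  local instance_id="$1"
  run aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[].Instances[].[InstanceId,PrivateIpAddress,PublicIpAddress,IamInstanceProfile.Arn,BlockDeviceMappings[].Ebs.VolumeId]' \
    --output json
}
```

Usage: run `DRY_RUN=0 identity_snapshot i-xxxxxxxx > before.json`, repeat into `after.json` once the instance is back, then `diff before.json after.json`. An empty diff is the expected result.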

What Type of Failure Does It Fix?

There are two status checks in EC2:

Instance Status Check → OS-level issues (like kernel panic)
System Status Check → problems with the underlying AWS host (hardware, power, network connectivity)

⚠️ Auto Recovery works only for System Status Check failures (hardware issue).
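To see which of the two checks is failing, both can be read side by side from the CLI. A minimal sketch, assuming a configured AWS CLI; `show_status_checks` and `DRY_RUN` are hypothetical names. `SystemStatus` is the host-level check that Auto Recovery reacts to, while `InstanceStatus` is the OS-level check it does not fix.

```shell
#!/usr/bin/env bash
# Sketch: show the System and Instance status checks for one instance.
# DRY_RUN=1 (default) prints the command; DRY_RUN=0 calls AWS for real.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

show_status_checks() {
  local instance_id="$1"
  # SystemStatus   = AWS host checks (what Auto Recovery reacts to)
  # InstanceStatus = OS-level checks inside your instance
  run aws ec2 describe-instance-status \
    --instance-ids "$instance_id" \
    --include-all-instances \
    --query 'InstanceStatuses[].{System:SystemStatus.Status,Instance:InstanceStatus.Status}' \
    --output table
}
```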

Step-by-Step Configuration (Console Method)

Step 1: Open the EC2 Console

Select the EC2 instance.

Step 2: Open the Status Checks Tab

In the instance details pane, select Status Checks.

Step 3: Create Alarm

Click Create Status Check Alarm

Step 4: Configure Alarm

Metric: StatusCheckFailed_System
Threshold: >= 1
Evaluation periods: 2 consecutive periods of 1 minute (recommended)

Step 5: Select Action

Choose:

Recover this instance

Step 6: Create Alarm

Done

CLI Method (Production Engineers Prefer This)

aws cloudwatch put-metric-alarm \
  --alarm-name EC2-Recovery-Alarm \
  --metric-name StatusCheckFailed_System \
  --namespace AWS/EC2 \
  --statistic Maximum \
  --period 60 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --dimensions Name=InstanceId,Value=i-xxxxxxxx \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:automate:region:ec2:recover

Replace i-xxxxxxxx with the instance ID and region with the instance's Region (for example, us-east-1).
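In practice the same call is usually wrapped so that the Region and instance ID are parameters and the recover ARN is built rather than typed. This is a sketch under assumptions: the function names, the per-instance alarm name suffix and the `DRY_RUN` switch are hypothetical; only the `put-metric-alarm` and `describe-alarms` options come from the AWS CLI.

```shell
#!/usr/bin/env bash
# Sketch: the recovery alarm above, parameterized by Region and instance ID.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them against AWS.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Recover action ARN format: arn:aws:automate:<region>:ec2:recover
recover_arn() {
  printf 'arn:aws:automate:%s:ec2:recover\n' "$1"
}

create_recovery_alarm() {
  local region="$1" instance_id="$2"
  run aws cloudwatch put-metric-alarm \
    --region "$region" \
    --alarm-name "EC2-Recovery-Alarm-${instance_id}" \
    --metric-name StatusCheckFailed_System \
    --namespace AWS/EC2 \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 2 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --alarm-actions "$(recover_arn "$region")"
}

# Confirm the alarm exists and show its state and configured actions.
verify_alarm() {
  local region="$1" instance_id="$2"
  run aws cloudwatch describe-alarms \
    --region "$region" \
    --alarm-names "EC2-Recovery-Alarm-${instance_id}" \
    --query 'MetricAlarms[].{Name:AlarmName,State:StateValue,Actions:AlarmActions}'
}
```

Usage: `DRY_RUN=0 create_recovery_alarm us-east-1 i-xxxxxxxx` followed by `DRY_RUN=0 verify_alarm us-east-1 i-xxxxxxxx`. Naming one alarm per instance keeps alarms from overwriting each other when the script is run across many instances.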

Production Scenario

Scenario: Banking Application Server

You have:

An EC2 instance running the core banking API
Behind an ALB
Single instance only (the license is tied to this instance, so Auto Scaling cannot replace it)

Suddenly:

The underlying AWS hardware fails
StatusCheckFailed_System becomes 1

Without recovery:

Application downtime
SLA violation
Customer impact

With CloudWatch Auto Recovery:

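Instance automatically restarted on new hardware
No IP change
No manual intervention
Downtime limited to alarm detection (about 2 minutes with two 60-second evaluation periods) plus the reboot on new hardware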
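The detection part of that downtime follows from the alarm settings in the CLI example: the failure must persist for `--evaluation-periods` consecutive periods of `--period` seconds before the recover action fires. A tiny helper (hypothetical name) makes the arithmetic explicit; it is only a rough lower bound and deliberately ignores metric delivery delay and the reboot time on the new host.

```shell
#!/usr/bin/env bash
# Sketch: rough time between the first failed System check and the
# recover action firing, derived from the alarm's period settings.
# Reboot time on the new host comes on top and is not modelled here.
detection_seconds() {
  local period="$1" evaluation_periods="$2"
  echo $(( period * evaluation_periods ))
}

detection_seconds 60 2   # prints 120 for the settings in the CLI example
```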

Enterprise Best Practices

✔ Enable recovery alarms for all standalone production EC2 instances
✔ Use Auto Scaling if possible (preferable to single-instance recovery)
✔ Combine with SNS notifications for alerts
✔ Monitor both Instance & System status checks
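The practices above can be combined in one script: for each standalone production instance, a System-check alarm that recovers the instance and notifies SNS, plus an Instance-check alarm. This is a sketch, not a reference setup. The `Environment=production` tag, the SNS topic ARN, the alarm names and the choice of a reboot action for the Instance check are all assumptions for illustration. Recover is paired only with `StatusCheckFailed_System`, in line with the ⚠️ note on status checks. A CloudWatch alarm can list several actions, so the automate ARN and the SNS topic go in the same `--alarm-actions` argument.

```shell
#!/usr/bin/env bash
# Sketch: recovery + notification alarms for tagged production instances.
# DRY_RUN=1 (default) prints the put-metric-alarm commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
REGION="${REGION:-us-east-1}"
# Hypothetical SNS topic; replace with your own ARN.
SNS_TOPIC_ARN="${SNS_TOPIC_ARN:-arn:aws:sns:us-east-1:123456789012:ec2-alerts}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# One status-check alarm with an EC2 automate action plus an SNS notification.
status_alarm() {
  local instance_id="$1" metric="$2" action="$3"
  run aws cloudwatch put-metric-alarm \
    --region "$REGION" \
    --alarm-name "${metric}-${instance_id}" \
    --metric-name "$metric" \
    --namespace AWS/EC2 \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 2 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --alarm-actions "arn:aws:automate:${REGION}:ec2:${action}" "$SNS_TOPIC_ARN"
}

protect_instance() {
  local instance_id="$1"
  status_alarm "$instance_id" StatusCheckFailed_System recover   # host failure
  status_alarm "$instance_id" StatusCheckFailed_Instance reboot  # OS-level failure (assumed policy)
}

# Running instances with the hypothetical Environment=production tag (always calls AWS).
production_instances() {
  aws ec2 describe-instances \
    --region "$REGION" \
    --filters "Name=tag:Environment,Values=production" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}
```

Usage, with credentials configured: `for id in $(production_instances); do DRY_RUN=0 protect_instance "$id"; done`.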

When Auto Recovery Does NOT Help

Application crash
Disk full
High CPU
OS corruption
Network misconfiguration

For those → use:

Auto Scaling replacement
SSM automation
Self-healing scripts
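Two of those fallbacks can be sketched briefly. The first is an on-host self-healing check (for example, run every minute from cron or a systemd timer) that restarts the application when its health endpoint stops answering. The second starts the AWS-managed Systems Manager Automation runbook `AWS-RestartEC2Instance` for a given instance. The service name `banking-api`, the health URL and the function names are hypothetical placeholders; `DRY_RUN=1` (the default) prints the restart commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: fallbacks for failures that Auto Recovery does not cover.
# SERVICE and HEALTH_URL are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"
SERVICE="${SERVICE:-banking-api}"
HEALTH_URL="${HEALTH_URL:-http://127.0.0.1:8080/health}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# 1) On-host self-healing: restart the service if the health check fails
#    (non-2xx response, timeout after 5 s, or connection refused).
heal_if_unhealthy() {
  if curl -fsS --max-time 5 "$HEALTH_URL" >/dev/null 2>&1; then
    echo "healthy"
  else
    run systemctl restart "$SERVICE"
  fi
}

# 2) Remote restart via the AWS-managed Automation runbook AWS-RestartEC2Instance.
restart_via_ssm() {
  local instance_id="$1"
  run aws ssm start-automation-execution \
    --document-name AWS-RestartEC2Instance \
    --parameters "InstanceId=${instance_id}"
}
```

The on-host check covers application crashes without touching the instance, while the SSM restart is a heavier step for when the OS itself is unresponsive.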

Interview Tip:

If the interviewer asks:

“Why not just use Auto Scaling instead?”

Correct answer:

Auto Scaling is better for application-level failure.
CloudWatch Auto Recovery is useful for single-instance workloads or licensing-restricted apps.

Final Summary

CloudWatch EC2 Recovery:

Monitors StatusCheckFailed_System
Triggers automatic recovery
Preserves instance configuration
Minimizes downtime from hardware failures
Critical for enterprise production reliability

AWS 300 Realtime Scenario Based Interview Q&A