CloudWatch Metric Guide
StatusCheckFailed: Amazon EC2 CloudWatch metric
StatusCheckFailed combines the results of both the instance status check (instance software and network configuration) and the system status check (underlying AWS host hardware). A value of 1 means at least one of these checks has failed.
What it measures
About StatusCheckFailed
| Namespace | AWS/EC2 |
| Metric name | StatusCheckFailed |
| Unit | Count |
| AWS docs | Official Amazon EC2 metrics reference |
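As a rough illustration of how the identifiers in the table fit together, the sketch below builds the arguments for a CloudWatch GetMetricStatistics request. The helper name `status_check_query`, the example instance ID, and the 10-minute lookback are assumptions made for this example; the boto3 call appears only as a comment so the snippet runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def status_check_query(instance_id, now=None, lookback_minutes=10):
    """Build GetMetricStatistics arguments for one instance (hypothetical helper)."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=lookback_minutes),
        "EndTime": end,
        "Period": 60,               # one datapoint per minute
        "Statistics": ["Maximum"],  # 1 if any check failed during the period
    }


# With boto3 installed and credentials configured, this could be used as:
#   import boto3
#   resp = boto3.client("cloudwatch").get_metric_statistics(
#       **status_check_query("i-0123456789abcdef0"))
```

Using the Maximum statistic means a single failing datapoint inside a period is not averaged away.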
Why this metric matters
StatusCheckFailed = 1 is an emergency signal, not a warning. Unlike CPU or memory metrics, which indicate stress, a failed status check means the instance itself has a fundamental problem, such as a kernel panic, a hardware failure on the underlying host, or a network misconfiguration that makes the instance unreachable.
AWS distinguishes two types of failure: instance status failures (problems in the OS, which a reboot or other user intervention can often fix) and system status failures (problems with the underlying AWS hardware, which are resolved when the instance moves to a healthy host, either through AWS action or, for EBS-backed instances, by stopping and starting the instance). Both types result in StatusCheckFailed = 1, and both call for immediate action.
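CloudWatch also publishes the two checks separately as StatusCheckFailed_Instance and StatusCheckFailed_System, which makes it possible to tell the two failure types apart. The sketch below maps the latest value of each to the kind of response described above. The function name `classify_failure` and the wording of the suggested actions are illustrative assumptions, not part of any AWS or ConvOps API.

```python
def classify_failure(instance_check, system_check):
    """Map the two per-type metric values (0 or 1) to failure types and next steps.

    instance_check: latest StatusCheckFailed_Instance value
    system_check:   latest StatusCheckFailed_System value
    Returns a list of (failure_type, suggested_action) tuples; empty if both pass.
    """
    findings = []
    if instance_check >= 1:
        findings.append(
            ("instance", "OS-level problem: try a reboot or fix the OS configuration")
        )
    if system_check >= 1:
        findings.append(
            ("system", "host-level problem: the instance needs to move to a healthy host")
        )
    return findings


# Example: only the instance check is failing.
for kind, action in classify_failure(instance_check=1, system_check=0):
    print(f"{kind}: {action}")
```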
Recommended alarm threshold for StatusCheckFailed
Recommended threshold
> 0 — alarm immediately
StatusCheckFailed has no gradual scale: per AWS documentation it is either passing (0) or failing (1). ConvOps recommends treating a value of 1 as a hard infrastructure failure that must be addressed immediately. Set the alarm with a 1-minute period and a single evaluation period to minimize time-to-alert. Treat every alert as real: in a production environment, StatusCheckFailed = 1 always warrants investigation.
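A minimal sketch of that recommendation, expressed as arguments for the CloudWatch PutMetricAlarm API. The helper name `status_check_alarm_params`, the alarm naming scheme, and the optional SNS topic are assumptions for illustration; the boto3 call is left as a comment so the snippet runs without AWS access.

```python
def status_check_alarm_params(instance_id, sns_topic_arn=None):
    """Build PutMetricAlarm arguments for a "> 0, alarm immediately" alarm."""
    params = {
        "AlarmName": f"{instance_id}-StatusCheckFailed",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                # 1-minute period
        "EvaluationPeriods": 1,      # fire on the first failing period
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if sns_topic_arn:
        # Notify this topic when the alarm enters the ALARM state.
        params["AlarmActions"] = [sns_topic_arn]
    return params


# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **status_check_alarm_params("i-0123456789abcdef0"))
```

A single evaluation period gives the fastest alert; teams that prefer to ride out one-off blips could raise `EvaluationPeriods`, at the cost of a slower page.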
Common failures that show up in StatusCheckFailed
When StatusCheckFailed reaches an alarm threshold, these are the most common root causes — in order of how often ConvOps sees them across customer AWS accounts.
System status check failure: the underlying AWS host hardware has failed; AWS detects this and typically migrates the instance automatically, but the migration window causes downtime
Kernel panic or OS crash: the instance OS has crashed; a hard reboot initiated from the EC2 console or via API is required
Network interface misconfiguration: a change to the networking or startup configuration inside the instance, such as a misconfigured network interface or OS routing table, has left the instance unable to respond to the status check
Corrupted disk: the instance's EBS root volume has encountered I/O errors, causing the OS to mount the filesystem in read-only mode and fail status checks
Memory exhaustion: the instance ran out of memory, the OOM killer terminated critical OS processes, and the instance is effectively non-functional without a reboot
How ConvOps debugs StatusCheckFailed alarms
When StatusCheckFailed triggers an alarm, ConvOps Diagnose reads CloudWatch Logs, CloudTrail (recent API calls, deploys, config changes), and the current resource state in parallel. It correlates these with AWS/EC2 metrics on the same resource — giving you a plain-English root cause with numbered fix options, sent to WhatsApp or Slack, usually within 60 seconds of the alarm firing.
Before any anomaly in StatusCheckFailed reaches you as a proactive alert (via ConvOps Watch), it passes through 9 verification checks: a Recovery check (did the metric self-heal?), an AWS Status check (is AWS itself having an incident?), a Deploy check (was there a recent Lambda update, ECS deploy, or RDS parameter change in the last 120 minutes?), a Quota check, an Infrastructure check, a Security check, a Flap check (has this metric been anomalous more than 5 times in the last 24 hours?), a TLS check, and a Vulnerability check. Only anomalies that pass all relevant checks reach you — with full context attached.
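To make two of these checks concrete, here is a sketch of how a Flap check and a Deploy check could be written using the thresholds quoted above (more than 5 anomalies in 24 hours, deploys within the last 120 minutes). This is not ConvOps code: the function names and the shape of the inputs (plain lists of timestamps) are assumptions for illustration.

```python
from datetime import datetime, timedelta


def is_flapping(anomaly_times, now, max_anomalies=5, window=timedelta(hours=24)):
    """True if more than max_anomalies anomalies fall inside the window ending at now."""
    recent = [t for t in anomaly_times if now - window <= t <= now]
    return len(recent) > max_anomalies


def had_recent_deploy(deploy_times, now, window=timedelta(minutes=120)):
    """True if any deploy or config change falls inside the window ending at now."""
    return any(now - window <= t <= now for t in deploy_times)


now = datetime(2024, 1, 2, 12, 0)
anomalies = [now - timedelta(hours=h) for h in (1, 2, 3, 4, 5, 6)]
print(is_flapping(anomalies, now))                               # six anomalies in 24 hours
print(had_recent_deploy([now - timedelta(minutes=90)], now))     # deploy 90 minutes ago
```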
ConvOps Watch
Detects StatusCheckFailed anomalies with z-score against 30-day time-bucketed baselines. 9 verification checks before any alert.
ConvOps Diagnose
When a StatusCheckFailed alarm fires, reads logs, CloudTrail, and resource state. Sends root cause + fix options to WhatsApp or Slack.
ConvOps Audit
Scans your CloudWatch setup for missing or misconfigured StatusCheckFailed alarms. Free, 5-minute read-only scan.
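The z-score approach mentioned for ConvOps Watch can be sketched as follows. The bucketing scheme (weekday and hour), the function names, and the handling of a zero standard deviation are assumptions made for this example, since the exact method ConvOps uses is not described here.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def bucket_key(ts):
    """Assumed bucketing: compare each datapoint to the same weekday and hour."""
    return (ts.weekday(), ts.hour)


def build_baselines(history):
    """history: iterable of (datetime, value) pairs covering the baseline window."""
    buckets = defaultdict(list)
    for ts, value in history:
        buckets[bucket_key(ts)].append(value)
    return {key: (mean(vals), pstdev(vals)) for key, vals in buckets.items()}


def z_score(baselines, ts, value):
    """Z-score of value against its time bucket, or None if the bucket has no baseline."""
    key = bucket_key(ts)
    if key not in baselines:
        return None
    mu, sigma = baselines[key]
    diff = value - mu
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / sigma


# A healthy instance reports 0 for every datapoint in the baseline window.
start = datetime(2024, 1, 1, 9, 0)
history = [(start + timedelta(days=d), 0.0) for d in range(28)]
baselines = build_baselines(history)
print(z_score(baselines, start + timedelta(days=28), 1.0))
```

Because a healthy StatusCheckFailed baseline is flat at 0, any datapoint of 1 stands out immediately, which is consistent with the "> 0" threshold recommended for the alarm.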
Related Amazon EC2 metrics
StatusCheckFailed rarely fails in isolation. These metrics tend to correlate — monitor them together for complete Amazon EC2 coverage.
FAQ
Frequently asked questions about StatusCheckFailed
Common questions about setting up CloudWatch alarms for StatusCheckFailed in Amazon EC2.
What is the recommended CloudWatch alarm threshold for StatusCheckFailed?
> 0: alarm immediately. StatusCheckFailed has no gradual scale: per AWS documentation it is either passing (0) or failing (1). ConvOps recommends treating a value of 1 as a hard infrastructure failure that must be addressed immediately. Set the alarm with a 1-minute period and a single evaluation period to minimize time-to-alert. Treat every alert as real: in a production environment, StatusCheckFailed = 1 always warrants investigation.
Which CloudWatch namespace does StatusCheckFailed belong to?
StatusCheckFailed is published in the AWS/EC2 namespace with a unit of Count. You can find it in the CloudWatch console under "Metrics" → "AWS/EC2". See the Amazon EC2 CloudWatch metrics reference in the AWS documentation.
Does ConvOps automatically create CloudWatch alarms for StatusCheckFailed?
ConvOps does not create alarms for you by default — it debugs the alarms you already have (or identifies missing ones). The free ConvOps Audit scans your CloudWatch setup and tells you which Amazon EC2 resources are missing a StatusCheckFailed alarm. ConvOps Watch then monitors StatusCheckFailed using z-score anomaly detection against a 30-day baseline, running 9 verification checks before alerting you.
Can I use ConvOps without already having a StatusCheckFailed alarm set up?
Yes. ConvOps Watch monitors StatusCheckFailed independently of your CloudWatch alarm configuration — it reads the metric directly from CloudWatch every 5 minutes on the Growth plan. If you run the free Audit first, it will tell you which resources need a StatusCheckFailed alarm and provide the copy-paste AWS CLI command to create it.