Automatic CloudWatch alarm diagnosis

Your CloudWatch alarm fired. Here's what it actually means.

Q: What does 'CloudWatch alarm diagnosis' mean?

When a CloudWatch alarm fires, ConvOps automatically investigates — pulling CloudWatch Logs, CloudTrail events, resource state, and correlated metrics — and produces a structured explanation of what triggered it, why, and what to do next.

Q: Does ConvOps replace CloudWatch?

No. ConvOps connects to your existing CloudWatch setup. Your alarms keep working exactly as before. ConvOps adds a diagnosis layer on top — intercepting the alarm, investigating the cause, and delivering a plain-English summary instead of a raw alarm notification.

Q: How accurate is the diagnosis?

ConvOps includes a confidence field in every diagnosis (high / medium / low). When evidence is insufficient, it says so explicitly rather than guessing. The diagnosis is grounded in data from your own account (CloudWatch metrics and logs, CloudTrail events, and resource state), not generic pattern matching.

Q: Which AWS services does ConvOps diagnose?

EC2, ECS, EKS, Lambda, RDS, DynamoDB, ElastiCache, ALB, NLB, API Gateway, CloudFront, SQS, SNS, Kinesis, Step Functions, and billing/cost anomalies, plus any other service that publishes CloudWatch metrics.

Q: How long does a diagnosis take?

Under 60 seconds from alarm trigger to WhatsApp or Slack message. For proactive anomaly detection (before alarms fire), the pipeline runs every 5 minutes.

CloudWatch tells you that something is wrong. ConvOps tells you what is wrong — by checking your logs, CloudTrail events, and resource state automatically, then delivering a plain-English diagnosis to WhatsApp or Slack before you open a single dashboard.

What you get instead of a raw alarm

Raw CloudWatch alarm (what you have now)

AlarmName: api-gateway-5xx-rate

AlarmDescription: 5xx errors exceeded threshold

AWSAccountId: 123456789012

NewStateValue: ALARM

NewStateReason: Threshold Crossed: 1 datapoint [34.2] was greater than the threshold (5.0)

...and that's it.

You still need to investigate. That takes 20–40 minutes.
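
For reference, the fields above are the ones CloudWatch includes in the JSON message it publishes to SNS when an alarm changes state. The sketch below shows one way to read them in Python and to pull the datapoint and threshold out of the reason text. The names `parse_alarm` and `REASON_PATTERN` are hypothetical and used only for illustration; they are not part of ConvOps or the AWS SDK.

```python
import json
import re

# Sample payload built from the alarm shown above (a real message has more fields).
SAMPLE_MESSAGE = json.dumps({
    "AlarmName": "api-gateway-5xx-rate",
    "AlarmDescription": "5xx errors exceeded threshold",
    "AWSAccountId": "123456789012",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed: 1 datapoint [34.2] was greater than the threshold (5.0)",
})

# Hypothetical pattern: first datapoint in brackets, then the threshold in parentheses.
REASON_PATTERN = re.compile(
    r"\[(?P<value>-?\d+(?:\.\d+)?)[^\]]*\].*?threshold \((?P<threshold>-?\d+(?:\.\d+)?)\)"
)

def parse_alarm(message: str) -> dict:
    """Return the alarm name, state, and the numbers buried in NewStateReason."""
    alarm = json.loads(message)
    match = REASON_PATTERN.search(alarm.get("NewStateReason", ""))
    return {
        "name": alarm["AlarmName"],
        "state": alarm["NewStateValue"],
        # None when the reason text does not follow the expected shape.
        "value": float(match.group("value")) if match else None,
        "threshold": float(match.group("threshold")) if match else None,
    }

parsed = parse_alarm(SAMPLE_MESSAGE)
print(parsed)
```

Even when parsed, the payload only says which number crossed which threshold. Nothing in it points to a cause.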

ConvOps diagnosis (what you get instead)

⚠️ HIGH — api-gateway-5xx-rate

5xx rate jumped from 0.3% → 34% at 02:13 UTC

Root cause

ECS task api-service crashed after deploy at 02:11 UTC (commit abc1234). Memory limit exceeded (exit code 137). No downstream DB issues. No AWS Health events.

Do this first

Roll back to the previous task definition revision.

1 — Roll back deploy

2 — Restart task

3 — Scale out (+1 task)

Delivered in <60s. Reply with a number to act.
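
How ConvOps handles the reply is not described here. As a rough sketch of the idea, assuming Python, a numbered reply could be mapped back to the options offered in the message like this. `ACTIONS` and `resolve_reply` are hypothetical names used for illustration.

```python
from typing import Optional

# The three options offered in the example message above.
ACTIONS = {
    1: "Roll back deploy",
    2: "Restart task",
    3: "Scale out (+1 task)",
}

def resolve_reply(reply: str) -> Optional[str]:
    """Map a chat reply such as '1' to an offered action, or None if it is not valid."""
    try:
        choice = int(reply.strip())
    except ValueError:
        return None  # not a whole number, so treat it as an ordinary message
    return ACTIONS.get(choice)

print(resolve_reply("1"))
```

Returning None for anything that is not one of the offered numbers means an unrelated chat message cannot trigger an action by accident.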

9 checks run before you get the message

ConvOps runs every alarm through 9 verification steps. Most alerts get suppressed here — only genuinely actionable issues make it through to your phone.

Has it already recovered?

Suppresses if the metric self-healed — no false alarm.

Is AWS having an outage?

Checks AWS Health events for service disruptions in your region.

Was there a recent deploy?

CloudTrail scan for code or config changes in the last 2 hours.

Are you hitting a service quota?

Flags if usage is approaching AWS service limits.

Did the resource change recently?

Checks for recent resource modifications that match the timing.

Is there a security finding?

Correlates with GuardDuty and Security Hub findings.

Is this alarm flapping?

Suppresses chronically recurring noise on borderline anomalies.

Is a TLS cert expiring?

Flags certificates expiring within 7 days.

Is there an active vulnerability?

Checks Amazon Inspector findings for the affected resource.
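
The checks above can be pictured as a small pipeline in which every step reports what it found and a few steps are allowed to suppress the alert. The sketch below, assuming Python, models three of the nine. Everything in it is hypothetical: the function names, the `ctx` dictionary, and the flapping threshold of 6 state changes are illustrative assumptions, not ConvOps internals.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class CheckResult:
    name: str
    finding: Optional[str]   # None means the check ran and found nothing
    suppress: bool = False   # True means the alert should not be delivered

def has_recovered(ctx: dict) -> CheckResult:
    recovered = ctx["latest_value"] <= ctx["threshold"]
    finding = "metric is back under the threshold" if recovered else None
    return CheckResult("Has it already recovered?", finding, suppress=recovered)

def recent_deploy(ctx: dict) -> CheckResult:
    window_seconds = 2 * 60 * 60  # the 2-hour lookback described above
    recent = [e for e in ctx["change_events"]
              if 0 <= ctx["alarm_time"] - e["time"] <= window_seconds]
    finding = f"{len(recent)} change event(s) in the last 2 hours" if recent else None
    return CheckResult("Was there a recent deploy?", finding)

def is_flapping(ctx: dict) -> CheckResult:
    states = ctx["recent_states"]
    transitions = sum(1 for a, b in zip(states, states[1:]) if a != b)
    flapping = transitions >= ctx.get("max_transitions", 6)  # assumed threshold
    finding = f"{transitions} state changes in the lookback window" if flapping else None
    return CheckResult("Is this alarm flapping?", finding, suppress=flapping)

CHECKS: List[Callable[[dict], CheckResult]] = [has_recovered, recent_deploy, is_flapping]

def run_checks(ctx: dict, checks=CHECKS) -> Tuple[bool, List[CheckResult]]:
    """Run every check; deliver the alert only if no check asked to suppress it."""
    results = [check(ctx) for check in checks]
    deliver = not any(r.suppress for r in results)
    return deliver, results

alarm_time = 1_700_000_000  # arbitrary epoch seconds for the example
ctx = {
    "latest_value": 34.2,
    "threshold": 5.0,
    "alarm_time": alarm_time,
    "change_events": [{"time": alarm_time - 120}],  # a deploy 2 minutes earlier
    "recent_states": ["OK", "ALARM"],
}
deliver, results = run_checks(ctx)
for r in results:
    print(r.name, "->", r.finding or "nothing found")
```

Every check runs even when an earlier one suppresses the alert, so the full list of results, including the steps that found nothing, is available for the diagnosis.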

Every diagnosis includes

Headline

One-line plain-English summary, max 15 words.

What we observed

3–6 facts with actual values — CPUUtilization at 94%, not 'high CPU'.

What we checked

Every verification step listed honestly — including steps that found nothing.

Likely cause

Null if evidence is insufficient. Never guessed.

Confidence

High / medium / low — so you know how much to trust it.

Recommended action

The single most important next step, not a list of possibilities.

What we don't know

Honest gaps that would help diagnose further.

Severity

Critical / high / medium / low, based on Claude's assessment of the evidence, not just the z-score.
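
The eight fields above map naturally onto a small record type. The sketch below, assuming Python, encodes the stated constraints (headline of at most 15 words, 3 to 6 observations, a cause that may be null) so that a malformed diagnosis can be caught before delivery. The `Diagnosis` class and its field names are hypothetical and are not the ConvOps schema. The example values are taken from the sample message earlier on this page, except the confidence and the empty gaps list, which are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

CONFIDENCE_LEVELS = {"high", "medium", "low"}
SEVERITY_LEVELS = {"critical", "high", "medium", "low"}

@dataclass
class Diagnosis:
    headline: str
    observed: List[str]
    checked: List[str]
    likely_cause: Optional[str]   # None when evidence is insufficient
    confidence: str
    recommended_action: str
    unknowns: List[str]
    severity: str

    def problems(self) -> List[str]:
        """Return a list of rule violations; an empty list means the record is well formed."""
        issues = []
        if len(self.headline.split()) > 15:
            issues.append("headline longer than 15 words")
        if not 3 <= len(self.observed) <= 6:
            issues.append("expected 3 to 6 observations")
        if self.confidence not in CONFIDENCE_LEVELS:
            issues.append("unknown confidence level")
        if self.severity not in SEVERITY_LEVELS:
            issues.append("unknown severity level")
        return issues

example = Diagnosis(
    headline="5xx rate jumped from 0.3% to 34% at 02:13 UTC",
    observed=[
        "5xx rate rose from 0.3% to 34% at 02:13 UTC",
        "ECS task api-service exited with code 137",
        "Deploy at 02:11 UTC (commit abc1234)",
    ],
    checked=["AWS Health: no events", "Downstream DB: no issues"],
    likely_cause="Memory limit exceeded after deploy",
    confidence="high",   # illustrative value, not stated in the sample message
    recommended_action="Roll back to the previous task definition revision.",
    unknowns=[],         # illustrative: no gaps listed in the sample message
    severity="high",
)
print(example.problems())
```

Keeping `likely_cause` optional makes the "never guessed" rule explicit in the type: a diagnosis without enough evidence carries None rather than a placeholder string.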

See what your next CloudWatch alarm actually means.

Connect in 10 minutes. Diagnosis delivered to your phone.

Start free — no card needed →