CloudWatch tells you that something is wrong. ConvOps tells you what is wrong: it checks your logs, CloudTrail events, and resource state automatically, then delivers a plain-English diagnosis to WhatsApp or Slack before you open a single dashboard.
Raw CloudWatch alarm (what you have now)
AlarmName: api-gateway-5xx-rate
AlarmDescription: 5xx errors exceeded threshold
AWSAccountId: 123456789012
NewStateValue: ALARM
NewStateReason: Threshold Crossed: 1 datapoint [34.2] was greater than the threshold (5.0)
...and that's it.
You still need to investigate. That takes 20–40 minutes.
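To make the gap concrete, here is a minimal Python sketch that parses the alarm fields shown above. It assumes the alarm arrives as a JSON message with those keys; the `summarize_alarm` helper and its output shape are hypothetical, for illustration only.

```python
import json

# Only the fields shown in the alarm above; a real notification may carry more.
RAW_MESSAGE = json.dumps({
    "AlarmName": "api-gateway-5xx-rate",
    "AlarmDescription": "5xx errors exceeded threshold",
    "AWSAccountId": "123456789012",
    "NewStateValue": "ALARM",
    "NewStateReason": (
        "Threshold Crossed: 1 datapoint [34.2] was greater than "
        "the threshold (5.0)"
    ),
})


def summarize_alarm(message: str) -> dict:
    """Hypothetical helper: extract what the raw alarm can tell you."""
    alarm = json.loads(message)
    return {
        "name": alarm["AlarmName"],
        "state": alarm["NewStateValue"],
        "reason": alarm["NewStateReason"],
        # The payload says a threshold was crossed, not why it was crossed.
        "root_cause": None,
    }


summary = summarize_alarm(RAW_MESSAGE)
print(summary["name"], summary["state"], summary["root_cause"])
```

Everything beyond the name, state, and reason has to come from somewhere else, which is the investigation work described above.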
ConvOps diagnosis (what you get instead)
⚠️ HIGH — api-gateway-5xx-rate
5xx rate jumped from 0.3% to 34% at 02:13 UTC
Root cause
ECS task api-service crashed after deploy at 02:11 UTC (commit abc1234). No downstream DB issues. No AWS Health events. Memory limit exceeded — exit code 137.
Do this first
Roll back to the previous task definition revision.
1 — Roll back deploy
2 — Restart task
3 — Scale out (+1 task)
Delivered in <60s. Reply with a number to act.
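The reply-by-number flow can be pictured as a simple lookup. This is a sketch under assumptions, not the ConvOps implementation: the `ACTIONS` table and `parse_reply` function are hypothetical names, and the action identifiers are invented to match the three options listed above.

```python
from typing import Optional

# Hypothetical mapping from the numbered options in the message to action IDs.
ACTIONS = {
    "1": "roll_back_deploy",
    "2": "restart_task",
    "3": "scale_out",
}


def parse_reply(text: str) -> Optional[str]:
    """Return the action for a numeric reply, or None if it is not an option."""
    return ACTIONS.get(text.strip())


print(parse_reply("1"))   # roll_back_deploy
print(parse_reply("9"))   # None
```

Returning `None` for anything outside the offered options keeps an unrecognized reply from triggering an action by accident.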
ConvOps runs every alarm through 9 verification steps. Most alerts get suppressed here — only genuinely actionable issues make it through to your phone.
Has it already recovered?
Suppresses the alert if the metric has already returned to normal, so you aren't paged for a false alarm.
Is AWS having an outage?
Checks AWS Health events for service disruptions in your region.
Was there a recent deploy?
Scans CloudTrail for code or config changes in the last 2 hours.
Are you hitting a service quota?
Flags usage that is approaching an AWS service quota.
Did the resource change recently?
Checks for recent resource modifications that match the timing.
Is there a security finding?
Correlates with GuardDuty and Security Hub findings.
Is this alarm flapping?
Suppresses alarms that fire repeatedly on borderline anomalies.
Is a TLS cert expiring?
Flags certificates expiring within 7 days.
Is there an active vulnerability?
Checks Amazon Inspector findings for the affected resource.
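One way to picture the verification steps above is as a list of independent checks that each return "suppress", "flag", or "clear". The Python sketch below implements only two of the nine (recovery and certificate expiry) against a hypothetical context dictionary; all names and the data shape are assumptions, and the real checks would query AWS rather than read from a dict.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


@dataclass
class CheckResult:
    name: str
    outcome: str  # "suppress", "flag", or "clear"
    detail: str


def check_recovered(ctx: dict) -> CheckResult:
    # Suppress when the latest datapoint is back at or under the threshold.
    if ctx["latest_value"] <= ctx["threshold"]:
        return CheckResult("recovered", "suppress", "metric back under threshold")
    return CheckResult("recovered", "clear", "metric still above threshold")


def check_cert_expiry(ctx: dict) -> CheckResult:
    # Flag certificates that expire within 7 days of "now".
    expires = ctx.get("cert_expires_at")
    if expires is not None and expires - ctx["now"] <= timedelta(days=7):
        return CheckResult("tls_cert", "flag", "certificate expires within 7 days")
    return CheckResult("tls_cert", "clear", "no certificate expiring within 7 days")


CHECKS = [check_recovered, check_cert_expiry]


def verify(ctx: dict) -> Tuple[bool, List[CheckResult]]:
    """Run every check; deliver the alert only if none of them suppresses it."""
    results = [check(ctx) for check in CHECKS]
    deliver = not any(r.outcome == "suppress" for r in results)
    return deliver, results


now = datetime(2024, 1, 1, 2, 13, tzinfo=timezone.utc)
deliver, results = verify({
    "latest_value": 34.2,
    "threshold": 5.0,
    "now": now,
    "cert_expires_at": now + timedelta(days=3),
})
print(deliver, [(r.name, r.outcome) for r in results])
```

Keeping the result of every check, including the ones that come back "clear", is what lets a diagnosis later list the steps that found nothing.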
Headline
One-line plain-English summary, max 15 words.
What we observed
3–6 facts with actual values — CPUUtilization at 94%, not 'high CPU'.
What we checked
Every verification step listed honestly — including steps that found nothing.
Likely cause
Null if evidence is insufficient. Never guessed.
Confidence
High / medium / low — so you know how much to trust it.
Recommended action
The single most important next step, not a list of possibilities.
What we don't know
Honest gaps that would help diagnose further.
Severity
Critical / high / medium / low — Claude's assessment, not just the z-score.
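The fields above map naturally onto a small validated record. The following Python sketch is an assumption about shape, not the actual ConvOps schema: the `Diagnosis` class and its field names are hypothetical, and the example values are taken from the sample diagnosis earlier on this page.

```python
from dataclasses import dataclass
from typing import List, Optional

CONFIDENCE_LEVELS = {"high", "medium", "low"}
SEVERITY_LEVELS = {"critical", "high", "medium", "low"}


@dataclass
class Diagnosis:
    headline: str
    observed: List[str]
    checked: List[str]
    likely_cause: Optional[str]  # None when evidence is insufficient
    confidence: str
    recommended_action: str
    unknowns: List[str]
    severity: str

    def __post_init__(self) -> None:
        # Enforce the limits stated in the field descriptions.
        if len(self.headline.split()) > 15:
            raise ValueError("headline must be at most 15 words")
        if not 3 <= len(self.observed) <= 6:
            raise ValueError("observed must contain 3 to 6 facts")
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError("unknown confidence level")
        if self.severity not in SEVERITY_LEVELS:
            raise ValueError("unknown severity level")


diagnosis = Diagnosis(
    headline="API Gateway 5xx rate jumped to 34% after the 02:11 UTC deploy",
    observed=[
        "5xx rate rose from 0.3% to 34% at 02:13 UTC",
        "ECS task api-service exited with code 137",
        "Deploy of commit abc1234 at 02:11 UTC",
    ],
    checked=["AWS Health: no events", "Downstream DB: no issues"],
    likely_cause="Memory limit exceeded after deploy",
    confidence="high",
    recommended_action="Roll back to the previous task definition revision.",
    unknowns=[],
    severity="high",
)
print(diagnosis.headline)
```

Making `likely_cause` optional keeps "we don't know" a first-class answer instead of forcing a guess into the field.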
Connect in 10 minutes. Diagnosis delivered to your phone.