AWS debugging for engineering teams

Your alarm fired.
ConvOps already knows why.

ConvOps audits your AWS posture, watches for anomalies 24/7, and debugs every CloudWatch alarm before it reaches your phone, delivering the root cause, relevant logs, and a numbered fix list via WhatsApp or Slack.

Individual is free forever. Growth $49/mo. No credit card required.

<60s alarm to diagnosis · 9 verification checks · 0 agents to install

Three tools, one platform

Audit. Watch. Diagnose.

Full debugging loop — find issues before they alert, catch the ones with no alarm, explain every alarm that fires.

Audit

Know what's broken before it fires.

One-click read-only scan. 50+ security, cost, and reliability checks. Shareable report. Free — no account required.

Watch

24/7 anomaly detection. No alarms needed.

Monitors every CloudWatch metric against a learned baseline. Runs 9 verification checks before waking you up.

Diagnose

Root cause the moment your alarm fires.

Reads your logs, CloudTrail, and resource state — sends a plain-English diagnosis to WhatsApp or Slack in under 60 seconds.


ConvOps Audit

50+ alarm checks.
One click. No agent.

Deploy a read-only CloudFormation template in your AWS account. ConvOps scans every CloudWatch alarm, surfaces coverage gaps, and returns a shareable hygiene score in under 5 minutes. Free, forever.

  • Alarm classification — every alarm scored GOOD, NOISY, or SUPPRESSED-BAD with reasons
  • Missing alarm coverage — RDS, Lambda, ECS, ALB gaps with copy-pasteable CLI fix commands
  • Unmonitored resources — production resources with zero alarms, surfaced first
  • Security findings — GuardDuty and Security Hub findings with severity context
  • Service quota warnings — limits over 75% flagged before they cause a hard cap
  • Hygiene score — 0–100 weighted by severity, comparable across accounts
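The hygiene score is described only as "0–100 weighted by severity". A minimal sketch of one way such a score could work, assuming each check carries a severity and using hypothetical weights (critical 10, high 5, medium 2, low 1) that ConvOps does not publish:

```python
from dataclasses import dataclass

# Hypothetical severity weights; ConvOps' real weighting is not documented.
SEVERITY_WEIGHTS = {"critical": 10, "high": 5, "medium": 2, "low": 1}


@dataclass
class CheckResult:
    name: str
    severity: str  # one of the SEVERITY_WEIGHTS keys
    passed: bool


def hygiene_score(results: list[CheckResult]) -> int:
    """Return 0-100: the severity-weighted share of checks that passed."""
    total = sum(SEVERITY_WEIGHTS[r.severity] for r in results)
    if total == 0:
        return 100  # nothing checked, so nothing failing
    passed = sum(SEVERITY_WEIGHTS[r.severity] for r in results if r.passed)
    return round(100 * passed / total)


results = [
    CheckResult("rds-cpu-alarm", "high", passed=True),
    CheckResult("lambda-errors-alarm", "critical", passed=False),
    CheckResult("alb-5xx-alarm", "medium", passed=True),
    CheckResult("alarm-actions-present", "low", passed=True),
]
print(hygiene_score(results))  # 44
```

Normalising by the total weight keeps the score on the same 0–100 scale whether an account has ten alarms or a thousand, while one failed critical check still costs more than several failed low-severity ones.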

No account required. Read-only access. Revoke in 30 seconds from IAM.

CloudWatch Audit Report · May 26, 2026 · 101s

Hygiene score: 64/100 · NEEDS ATTENTION: SIGNIFICANT MONITORING GAPS

Alarms: 293 total · 103 healthy · 190 noisy/bad · 1,674 missing

Priority fixes

1. Fix 25 dead alarms (always suppressed or no data) · ⏱ ~10 min · +15 pts

These alarms will never fire — permanently suppressed or set to ignore missing data.

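# put-metric-alarm replaces the whole alarm definition, so also re-supply the alarm's
# existing --metric-name, --namespace, --statistic, --period, --threshold and --comparison-operator.
aws cloudwatch put-metric-alarm \
  --alarm-name "PROD-ContactForm-Failures" \
  --treat-missing-data breaching \
  --evaluation-periods 2 --datapoints-to-alarm 2
# If the alarm's actions are disabled (permanently suppressed), re-enable them:
aws cloudwatch enable-alarm-actions --alarm-names "PROD-ContactForm-Failures"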
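A dead alarm, as described above, is one that is permanently suppressed or ignores missing data it never receives. A sketch of how those two cases could be spotted in the `MetricAlarms` entries returned by CloudWatch `DescribeAlarms`; the two rules are an illustrative heuristic, not ConvOps' actual classifier:

```python
def dead_alarm_reasons(alarm: dict) -> list[str]:
    """Why an alarm can never page anyone; an empty list means it looks alive.

    `alarm` is one MetricAlarms entry from CloudWatch DescribeAlarms.
    The two rules below are an illustrative heuristic, not ConvOps' classifier.
    """
    reasons = []
    if not alarm.get("ActionsEnabled", True):
        reasons.append("actions disabled (permanently suppressed)")
    if (alarm.get("TreatMissingData") == "ignore"
            and alarm.get("StateValue") == "INSUFFICIENT_DATA"):
        reasons.append("no data, and missing data is ignored")
    return reasons


def find_dead_alarms(alarms: list[dict]) -> dict[str, list[str]]:
    """Map alarm name to reasons, for every alarm that looks dead."""
    return {a["AlarmName"]: reasons
            for a in alarms
            if (reasons := dead_alarm_reasons(a))}
```

To run it against a real account, page through `boto3.client("cloudwatch").get_paginator("describe_alarms")` and pass each page's `MetricAlarms` list to `find_dead_alarms`.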

2. Fix 165 noisy alarms · ⏱ ~10 min · +15 pts

Noisy alarms cause alert fatigue and mask real incidents. Fix evaluation periods, thresholds, and TreatMissingData.

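# put-metric-alarm overwrites the alarm, so also pass its existing metric,
# statistic, period, threshold and comparison operator.
aws cloudwatch put-metric-alarm \
  --alarm-name "PROD-Lambda-Function-Errors" \
  --evaluation-periods 2 --datapoints-to-alarm 2 \
  --treat-missing-data notBreaching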
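The `--evaluation-periods` and `--datapoints-to-alarm` flags make an alarm fire only when M of the last N datapoints breach, which is why a 2-of-2 setting quiets single-period spikes. A simplified model of that rule for a `GreaterThanThreshold` alarm, covering only the `breaching` and `notBreaching` missing-data modes (CloudWatch's handling of `missing` and `ignore` is more involved and is not modelled):

```python
from typing import Optional, Sequence


def evaluate_alarm(values: Sequence[Optional[float]], threshold: float,
                   evaluation_periods: int, datapoints_to_alarm: int,
                   treat_missing_data: str = "notBreaching") -> str:
    """Simplified "M out of N" evaluation for a GreaterThanThreshold alarm.

    values: one datapoint per period, oldest first; None marks a missing period.
    """
    if treat_missing_data not in ("breaching", "notBreaching"):
        raise ValueError("sketch only models 'breaching' and 'notBreaching'")
    window = values[-evaluation_periods:]  # the N most recent periods
    breaches = 0
    for v in window:
        if v is None:
            if treat_missing_data == "breaching":
                breaches += 1  # missing data counts as a breach
        elif v > threshold:
            breaches += 1
    return "ALARM" if breaches >= datapoints_to_alarm else "OK"


# One spike in the latest period: a 1-of-1 alarm pages, a 2-of-2 alarm does not.
print(evaluate_alarm([1, 2, 50], threshold=10, evaluation_periods=1, datapoints_to_alarm=1))  # ALARM
print(evaluate_alarm([1, 2, 50], threshold=10, evaluation_periods=2, datapoints_to_alarm=2))  # OK
```

The same model shows why the dead-alarm fix uses `breaching`: an alarm whose metric stops reporting goes to ALARM instead of sitting silently.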
ConvOps Watch

24/7 anomaly detection. No alarms needed.

ConvOps Watch monitors every CloudWatch metric against a learned baseline. When something breaks — even with no alarm configured — it catches it, verifies it's real, and sends you a full debug report.

  • Detects anomalies across all CloudWatch metrics — no alarm needed.
  • Runs 9 verification checks before alerting: transient spikes, deploys, AWS outages, quota exhaustion, and flapping history are all filtered out.
  • Correlates VPC flow logs, CloudTrail events, and GuardDuty findings at detection time.
  • Delivers a numbered action list — not just a threshold breach notification.
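Watch compares each metric with a learned baseline. ConvOps does not document its model, so the following is only a generic z-score sketch: it measures how many standard deviations the latest datapoint sits above the historical mean, with an assumed floor (`min_std`) so a flat-zero baseline, like an idle queue, does not cause division by zero. The floor and the alert threshold of 3 are assumptions, and the sketch will not reproduce the exact z-scores in the sample alert below.

```python
import statistics
from typing import Sequence


def z_score(history: Sequence[float], latest: float, min_std: float = 0.1) -> float:
    """Standard deviations between `latest` and the mean of `history` (non-empty).

    `min_std` floors the deviation so a constant baseline cannot divide by zero.
    """
    mean = statistics.fmean(history)
    std = max(statistics.pstdev(history), min_std)
    return (latest - mean) / std


def is_anomalous(history: Sequence[float], latest: float,
                 z_threshold: float = 3.0, min_std: float = 0.1) -> bool:
    """Flag upward deviations at or beyond `z_threshold` (an assumed default)."""
    return z_score(history, latest, min_std) >= z_threshold


# An idle queue (always 0 in-flight) suddenly holding 4 messages.
print(round(z_score([0.0] * 12, 4.0), 1))  # 40.0
```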
HIGH · ConvOps ML Alert

Messages stuck in egress queue — consumer appears to have stopped

SQS · arn:aws:sqs:eu-central-1:123456789012:prod-egress-queue

Detected: 28 May 2026 03:30 UTC
Diagnosed: 28 May 2026 03:31 UTC
Account: 123456789012
Environment: Production

No CloudWatch alarm was configured for this metric — ConvOps caught it through baseline analysis.

What we observed

  • ApproximateNumberOfMessagesNotVisible rose to 4 (z-score 10) against a baseline of 0, at 03:30 UTC.
  • ApproximateAgeOfOldestMessage hit 484 seconds (z-score 10) at the same timestamp — messages are aging without being processed.
  • A prior anomaly at 03:25 UTC showed 2 in-flight messages and an oldest-message age of 181 seconds, indicating the issue is escalating.
  • VPC flow logs show 10 packet-reject spikes, ranging from 100 to 2,671 rejected packets, all still active.
  • An AssumeRole event for WorkflowIAMRole occurred at 03:14 UTC — 16 minutes before the anomaly was first detected.

What we checked before alerting

  • Metric persistence: still anomalous at re-check (value 3.8, baseline 0.0) — confirmed not a transient spike.
  • Deployments: no UpdateService, UpdateFunctionCode, or equivalent CloudTrail events in the last 2 hours — not deployment-related.
  • Service quotas: all checked quotas below 80% utilisation — quota exhaustion ruled out.
  • Infrastructure changes: no scaling events, task restarts, or config changes in the last 60 minutes.
  • Security tooling: GuardDuty and Security Hub are not enabled — automated threat detection unavailable.
  • Flapping history: 5 occurrences in 24 hours — this is not a chronic noisy metric; treated as a genuine new event.
  • Vulnerabilities: Amazon Inspector found no active findings for this resource.
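The checks above act as a gate between detection and paging. A hypothetical sketch of such a gate, with made-up names (`Evidence`, `verify`) and illustrative thresholds (a 2-hour deploy window, 80% quota utilisation, 10 occurrences per day for flapping): transient and chronically flapping anomalies are dropped, while deploy and quota findings are attached as context, much like the notes in the sample report. Whether ConvOps drops or annotates each case is not stated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Evidence:
    """Hypothetical facts gathered when an anomaly is first detected."""
    detected_at: datetime
    still_anomalous_on_recheck: bool
    deploy_times: list[datetime] = field(default_factory=list)  # e.g. UpdateService events
    max_quota_utilisation: float = 0.0  # highest checked quota, 0.0-1.0
    occurrences_last_24h: int = 0


def verify(ev: Evidence, deploy_window: timedelta = timedelta(hours=2),
           quota_limit: float = 0.8, flap_limit: int = 10) -> tuple[bool, list[str]]:
    """Return (should_alert, notes). All thresholds are illustrative defaults."""
    if not ev.still_anomalous_on_recheck:
        return False, ["transient spike: back to baseline on re-check"]
    if ev.occurrences_last_24h >= flap_limit:
        return False, [f"chronic flapping: {ev.occurrences_last_24h} occurrences in 24h"]
    notes = []
    recent = [t for t in ev.deploy_times
              if ev.detected_at - deploy_window <= t <= ev.detected_at]
    notes.append("deploy-related" if recent else "no deployments in window")
    notes.append("quota exhaustion likely" if ev.max_quota_utilisation >= quota_limit
                 else "quotas below limit")
    return True, notes
```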

What to check first

  1. Check the consumer (Lambda or ECS service) processing this queue: in CloudWatch Metrics, inspect NumberOfMessagesSent, NumberOfMessagesDeleted, and ApproximateNumberOfMessagesVisible for prod-egress-queue over the window 03:00–03:40 UTC to confirm whether deletions stopped.
  2. Review CloudWatch Logs for the consumer service between 03:10–03:35 UTC. Run a Logs Insights query: fields @timestamp, @message | filter @message like /ERROR|error|timeout|refused|reject/ | sort @timestamp desc | limit 50
  3. Investigate the VPC packet-reject findings: identify which security group or NACL is dropping traffic and whether the consumer's outbound or inbound connections to the SQS endpoint are affected. Check the VPC flow logs for the consumer's ENI around 03:10–03:30 UTC.
  4. Examine the AssumeRole event at 03:14 UTC for WorkflowIAMRole: verify in CloudTrail whether subsequent API calls from that role succeeded or returned AccessDenied errors, which could indicate a permissions failure blocking message processing.
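Step 2 can also be run programmatically. A sketch using boto3's CloudWatch Logs client (`start_query`, then polling `get_query_results` until the status leaves `Scheduled`/`Running`), with the same query string; the log group name is yours to supply (for a Lambda consumer it is typically `/aws/lambda/<function-name>`):

```python
import time
from datetime import datetime, timezone

QUERY = ("fields @timestamp, @message "
         "| filter @message like /ERROR|error|timeout|refused|reject/ "
         "| sort @timestamp desc | limit 50")


def run_insights_query(logs_client, log_group: str, start: datetime, end: datetime,
                       query: str = QUERY, poll_seconds: float = 1.0) -> list[dict]:
    """Start a Logs Insights query and poll until it finishes.

    `logs_client` is a boto3 CloudWatch Logs client, e.g. boto3.client("logs").
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=int(start.timestamp()),  # epoch seconds
        endTime=int(end.timestamp()),
        queryString=query,
    )["queryId"]
    while True:
        resp = logs_client.get_query_results(queryId=query_id)
        if resp["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)
    if resp["status"] != "Complete":
        raise RuntimeError(f"query ended with status {resp['status']}")
    # Each result row is a list of {"field": ..., "value": ...} cells.
    return [{cell["field"]: cell["value"] for cell in row} for row in resp["results"]]
```

For the 03:10–03:35 UTC window, pass `datetime(2026, 5, 28, 3, 10, tzinfo=timezone.utc)` and `datetime(2026, 5, 28, 3, 35, tzinfo=timezone.utc)`; timezone-aware datetimes avoid local-time surprises when converting to epoch seconds.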

Recommended action

Check the queue consumer's logs and CloudWatch metrics from 03:10–03:35 UTC for errors or stopped processing.

🔧 Prevent this anomaly permanently

Run this in CloudShell to create a CloudWatch alarm. Once it is in place, ConvOps automatically routes future occurrences through its alarm-triggered Diagnose pipeline.

# The observed peak was 4 in-flight messages, so a --threshold of 100 would not have
# caught this incident; set it from the queue's normal in-flight level.
aws cloudwatch put-metric-alarm \
  --alarm-name "prod-egress-queue-inflight-high" \
  --metric-name ApproximateNumberOfMessagesNotVisible \
  --namespace AWS/SQS --statistic Average \
  --period 300 --threshold 100 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --datapoints-to-alarm 2 \
  --treat-missing-data notBreaching \
  --dimensions Name=QueueName,Value=prod-egress-queue
ConvOps Diagnose

Your alarm fires. You already know why.

When a CloudWatch alarm fires, ConvOps reads your logs, CloudTrail, and resource state in parallel — then sends a plain-English diagnosis to WhatsApp or Slack before you've opened a single tab.

Without ConvOps

  1. Alarm fires at 3am
  2. Open console, hunt through logs
  3. Check CloudTrail manually
  4. Cross-reference metrics by hand
  5. 30+ min to identify root cause

With ConvOps

  1. Alarm fires at 3am
  2. Diagnosis on WhatsApp in < 60s
  3. Root cause + logs excerpt
  4. Numbered fix steps, ready to copy
  5. Reply RESOLVE: done

ConvOps


🚨 PROD-Lambda-Function-Errors fired

prod-api-order-processor · us-east-1 · production

Root cause (91% confidence): memory exhaustion. Lambda is hitting its 512 MB limit under burst load; 340+ invocations timed out.

[ERROR] Task timed out after 900.00s
MemorySize: 512 MB · Used: 510 MB
Errors/min: 47 ↑ (threshold: 5)

💥 Impact: ~340 orders unprocessed in last 15 min

Fix steps:

1 → Increase Lambda memory to 1024 MB

2 → Enable SQS dead-letter queue

3 → Reduce batch size: 100 → 25

Reply: ACKNOWLEDGE · RESOLVE · SNOOZE 30

convops.io · Alert #lmb-4729 · 03:14

ACKNOWLEDGE

03:15 ✓✓

✓ Acknowledged · 03:15 AM

Auto-escalating to on-call in 20 min if not resolved. Reply RESOLVE when fixed.

03:15

Reply commands

One word. Full control.

Reply to any ConvOps message from WhatsApp or Slack. No dashboards, no runbooks, no console.

ACKNOWLEDGE

You're on it. Stops reminder notifications.

INVESTIGATE

Triggers deeper analysis with full log context.

RESOLVE

Marks it resolved and logs time-to-fix.

ADJUST

AI-recommended threshold change. Reply YES to apply.

SILENCE

Suppresses a noisy alarm until you re-enable it with WATCH.

WATCH

Re-enables a silenced alarm.

STATUS

Live summary of active alarms and open diagnoses.
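The commands above form a small text protocol. A hypothetical parser for it, assuming replies are case-insensitive single words, plus `SNOOZE <minutes>` and `YES` as they appear in the sample messages; ConvOps' real parsing rules are not documented here:

```python
from dataclasses import dataclass
from typing import Optional

# Commands listed above, plus SNOOZE <minutes> and YES from the sample messages.
COMMANDS = {"ACKNOWLEDGE", "INVESTIGATE", "RESOLVE", "ADJUST", "SILENCE",
            "WATCH", "STATUS", "SNOOZE", "YES"}


@dataclass
class Reply:
    command: str
    minutes: Optional[int] = None  # only set for SNOOZE


def parse_reply(text: str) -> Optional[Reply]:
    """Parse a chat reply such as 'resolve' or 'SNOOZE 30'; None if unrecognised."""
    parts = text.strip().upper().split()
    if not parts or parts[0] not in COMMANDS:
        return None
    if parts[0] == "SNOOZE":
        # SNOOZE needs exactly one whole-number argument (minutes).
        if len(parts) != 2 or not parts[1].isdigit():
            return None
        return Reply("SNOOZE", int(parts[1]))
    if len(parts) != 1:
        return None  # other commands take no arguments
    return Reply(parts[0])
```

Returning `None` for anything unrecognised lets the bot answer with a help message instead of guessing at the user's intent.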

Security and access

Read-only. Auditable. Revocable.

ConvOps is designed for security-conscious engineering teams. Everything ConvOps can and cannot do is defined in CloudFormation and visible in your CloudTrail.

Read-only by default

ConvOps uses a read-only IAM role deployed via a CloudFormation template you review before deploying. No agents, no daemons, no code running in your account.

No sensitive data stored

ConvOps reads logs and metrics to generate a diagnosis, then discards the raw data. We store the diagnosis result, not your log lines.

Revoke in 30 seconds

Delete the CloudFormation stack from the CloudFormation console and ConvOps loses all access immediately. No support ticket required.

AWS-native access control

Permissions are scoped by the IAM policies defined in the CloudFormation template. Every API call ConvOps makes is visible in your CloudTrail logs.
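Because every call shows up in CloudTrail, you can check for yourself that the ConvOps role only reads. A sketch that filters CloudTrail log records by the role that issued the session (`userIdentity.sessionContext.sessionIssuer.arn`) and lists any call CloudTrail did not mark `readOnly`; the role ARN you pass, such as a hypothetical `arn:aws:iam::123456789012:role/ConvOpsReadOnly`, depends on your stack:

```python
import gzip
import json


def load_records(path: str) -> list[dict]:
    """Read one CloudTrail log file (gzipped JSON with a top-level "Records" list)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)["Records"]


def calls_by_role(records: list[dict], role_arn: str) -> list[dict]:
    """Keep records whose caller was a session of the given IAM role."""
    matched = []
    for record in records:
        issuer = (record.get("userIdentity", {})
                  .get("sessionContext", {})
                  .get("sessionIssuer", {}))
        if issuer.get("arn") == role_arn:
            matched.append(record)
    return matched


def non_read_only_calls(records: list[dict], role_arn: str) -> list[str]:
    """'eventSource:eventName' for each call by the role not marked readOnly.

    Records without a readOnly field are reported too, to err on the safe side.
    """
    return [f'{r["eventSource"]}:{r["eventName"]}'
            for r in calls_by_role(records, role_arn)
            if r.get("readOnly") is not True]
```

An empty result from `non_read_only_calls` over your recent logs is a quick confirmation that only the read-only role is in use.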

SOC 2 Type II in progress

ConvOps is pursuing SOC 2 Type II certification. Security questionnaire available on request.

Optional write access

If you want ConvOps to execute fix commands when you reply YES, deploy a separate, limited-write IAM role. This is opt-in only.

What ConvOps is not

It is a debugging layer, not another tool to manage.

ConvOps is purpose-built to sit between your existing AWS setup and the engineers who need to act on problems. No new dashboards to maintain, no agents to update.

Not a monitoring tool.

ConvOps doesn't collect metrics — CloudWatch already does that. ConvOps makes those metrics useful by detecting anomalies, filtering noise, and explaining what happened.

Not an agent.

Nothing runs in your account except a read-only IAM role. No sidecar containers, no installed software, no persistent processes.

Not a dashboard.

ConvOps sends you a message when something is wrong. You don't log in to check a dashboard; the diagnosis comes to you.

Not a replacement for your on-call rotation.

ConvOps works with PagerDuty, Opsgenie, or any webhook-based on-call tool. Enriched alerts go through your existing rotation.

Pricing

Individual is free. Growth is $49/mo.

Audit, Watch, and Diagnose included in every tier. No caps on diagnoses, alarms, or usage.

Individual

Free

1 AWS account. Audit, Watch, and Diagnose included. No credit card. Free forever.

Growth

$49/mo

₹1,999/mo for India

Up to 5 AWS accounts. Every feature from Individual plus priority support.

More than 5 accounts? Email us and we'll work it out.

FAQ

Common questions.

How ConvOps connects to AWS, what it covers, and how it sits alongside the tools you already use.

Your next 3am alarm. Fixed before you wake up.

Individual is free forever. Connect your AWS account in under 10 minutes. Watch starts debugging immediately — no alarms to configure, no dashboards to build.

Growth is $49/mo. Cancel any time.