AWS debugging for engineering teams
Your alarm fired.
ConvOps already knows why.
ConvOps audits your AWS posture, watches for anomalies 24/7, and debugs every CloudWatch alarm before it reaches your phone, sending the root cause, relevant logs, and a numbered fix list to WhatsApp or Slack.
Individual is free forever. Growth $49/mo. No credit card required.
<60s alarm to diagnosis
9 verification checks
0 agents to install
Three tools, one platform
Audit. Watch. Diagnose.
Full debugging loop — find issues before they alert, catch the ones with no alarm, explain every alarm that fires.
Know what's broken before it fires.
One-click read-only scan. 50+ security, cost, and reliability checks. Shareable report. Free — no account required.
24/7 anomaly detection. No alarms needed.
Monitors every CloudWatch metric against a learned baseline. Runs 9 verification checks before waking you up.
Root cause the moment your alarm fires.
Reads your logs, CloudTrail, and resource state — sends a plain-English diagnosis to WhatsApp or Slack in under 60 seconds.
50+ alarm checks.
One click. No agent.
Deploy a read-only CloudFormation template in your AWS account. ConvOps scans every CloudWatch alarm, surfaces gaps, and returns a shareable hygiene score in under 5 minutes. Free, forever.
- Alarm classification — every alarm scored GOOD, NOISY, or SUPPRESSED-BAD with reasons
- Missing alarm coverage — RDS, Lambda, ECS, ALB gaps with copy-pasteable CLI fix commands
- Unmonitored resources — production resources with zero alarms, surfaced first
- Security findings — GuardDuty and Security Hub findings with severity context
- Service quota warnings — limits over 75% flagged before they cause a hard cap
- Hygiene score — 0–100 weighted by severity, comparable across accounts
No account required. Read-only access. Revoke in 30 seconds from IAM.
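This page doesn't document how ConvOps scores alarms. As a rough illustration only, the sketch below classifies alarms using fields that CloudWatch's DescribeAlarms API returns (`ActionsEnabled`, `TreatMissingData`, `StateValue`) plus a count of recent state changes. The rules, the noisy cut-off, and the `classify_alarm`/`summarize` helpers are hypothetical. They are not ConvOps's actual logic.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical cut-off: more state changes than this in 24 hours counts as
# noisy. This is an illustrative value, not ConvOps's real threshold.
NOISY_TRANSITIONS_PER_DAY = 10


def classify_alarm(alarm: Dict, transitions_last_24h: int) -> str:
    """Classify one alarm dict shaped like a DescribeAlarms MetricAlarms entry.

    Returns "SUPPRESSED-BAD", "NOISY", or "GOOD" using illustrative heuristics.
    """
    # With actions disabled, the alarm can change state but never notifies.
    if not alarm.get("ActionsEnabled", True):
        return "SUPPRESSED-BAD"
    # "ignore" leaves the state unchanged on missing data; combined with
    # INSUFFICIENT_DATA this usually means the metric has no data at all.
    if (alarm.get("TreatMissingData") == "ignore"
            and alarm.get("StateValue") == "INSUFFICIENT_DATA"):
        return "SUPPRESSED-BAD"
    # Frequent flipping between states is a sign of a badly tuned threshold.
    if transitions_last_24h > NOISY_TRANSITIONS_PER_DAY:
        return "NOISY"
    return "GOOD"


def summarize(alarms_with_history: List[Tuple[Dict, int]]) -> Counter:
    """Count alarms per class, e.g. for the totals on a report card."""
    return Counter(classify_alarm(a, n) for a, n in alarms_with_history)
```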
NEEDS ATTENTION: SIGNIFICANT MONITORING GAPS
CloudWatch Audit Report
May 26, 2026 · 101s
293 TOTAL · 103 HEALTHY · 190 NOISY/BAD · 1,674 MISSING
Priority fixes
Fix 25 dead alarms (always suppressed or no data)
These alarms will never fire — permanently suppressed or set to ignore missing data.
aws cloudwatch put-metric-alarm \
  --alarm-name "PROD-ContactForm-Failures" \
  --treat-missing-data breaching \
  --evaluation-periods 2 --datapoints-to-alarm 2
Fix 165 noisy alarms
Noisy alarms cause alert fatigue and mask real incidents. Fix evaluation periods, thresholds, and TreatMissingData.
aws cloudwatch put-metric-alarm \
  --alarm-name "PROD-Lambda-Function-Errors" \
  --evaluation-periods 2 --datapoints-to-alarm 2 \
  --treat-missing-data notBreaching
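One caveat applies to commands like these. PutMetricAlarm replaces the whole alarm definition rather than patching it, so a call that passes only a few settings is either rejected for missing required parameters or resets the settings it omits, such as alarm actions. A safer pattern is to read the current definition with DescribeAlarms, copy the fields PutMetricAlarm accepts, and override only what you want to change. The sketch below assumes the AWS SDK for Python (boto3). `updated_alarm_params` and `update_alarm` are illustrative helpers, not part of boto3.

```python
# Fields that DescribeAlarms returns and PutMetricAlarm also accepts.
# Read-only fields such as StateValue or AlarmArn are deliberately excluded.
PUT_FIELDS = (
    "AlarmName", "AlarmDescription", "ActionsEnabled", "OKActions",
    "AlarmActions", "InsufficientDataActions", "MetricName", "Namespace",
    "Statistic", "ExtendedStatistic", "Dimensions", "Period", "Unit",
    "EvaluationPeriods", "DatapointsToAlarm", "Threshold",
    "ComparisonOperator", "TreatMissingData",
    "EvaluateLowSampleCountPercentile", "Metrics", "ThresholdMetricId",
)


def updated_alarm_params(current: dict, **overrides) -> dict:
    """Build PutMetricAlarm kwargs from an existing alarm plus overrides."""
    params = {k: current[k] for k in PUT_FIELDS if k in current}
    params.update(overrides)
    return params


def update_alarm(cloudwatch, alarm_name: str, **overrides) -> dict:
    """Re-put an existing metric alarm with only the given settings changed.

    `cloudwatch` is a boto3 CloudWatch client, e.g. boto3.client("cloudwatch").
    """
    resp = cloudwatch.describe_alarms(AlarmNames=[alarm_name],
                                      AlarmTypes=["MetricAlarm"])
    alarms = resp["MetricAlarms"]
    if not alarms:
        raise LookupError(f"no metric alarm named {alarm_name!r}")
    params = updated_alarm_params(alarms[0], **overrides)
    cloudwatch.put_metric_alarm(**params)
    return params
```

For example, `update_alarm(boto3.client("cloudwatch"), "PROD-Lambda-Function-Errors", EvaluationPeriods=2, DatapointsToAlarm=2, TreatMissingData="notBreaching")` changes those three settings and keeps the metric, threshold, and actions as they were.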
24/7 anomaly detection. No alarms needed.
ConvOps Watch monitors every CloudWatch metric against a learned baseline. When something breaks — even with no alarm configured — it catches it, verifies it's real, and sends you a full debug report.
- Detects anomalies across all CloudWatch metrics — no alarm needed.
- Runs 9 verification checks before alerting: transient spikes, deploys, AWS outages, quota exhaustion, and flapping history are all filtered out.
- Correlates VPC flow logs, CloudTrail events, and GuardDuty findings at detection time.
- Delivers a numbered action list — not just a threshold breach notification.
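This page doesn't specify how the learned baseline is computed. One common approach, used here purely as an assumption, is a rolling z-score that compares each new datapoint with the mean and standard deviation of recent history. The sketch below also floors the standard deviation (a hypothetical `min_std` parameter) so that a flat baseline of 0, like an idle queue's, doesn't cause a division by zero.

```python
import statistics
from typing import Sequence


def zscore(history: Sequence[float], value: float, min_std: float = 1.0) -> float:
    """Score `value` against `history`, a non-empty rolling baseline window.

    `min_std` floors the standard deviation so a perfectly flat baseline
    (e.g. always 0 messages in flight) still yields a finite score.
    """
    mean = statistics.mean(history)
    std = max(statistics.pstdev(history), min_std)
    return (value - mean) / std


def is_anomalous(history: Sequence[float], value: float,
                 threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` standard deviations above baseline."""
    return zscore(history, value) > threshold
```

The floor is a design trade-off. Without it, a single message on a queue that is normally empty would produce an infinite score. With it, the score for a flat baseline simply equals the deviation in units of `min_std`.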
Messages stuck in egress queue — consumer appears to have stopped
SQS · arn:aws:sqs:eu-central-1:123456789012:prod-egress-queue
DETECTED: 28 May 2026 03:30 UTC
DIAGNOSED: 28 May 2026 03:31 UTC
ACCOUNT: 123456789012
ENVIRONMENT: Production
No CloudWatch alarm was configured for this metric — ConvOps caught it through baseline analysis.
What we observed
- ApproximateNumberOfMessagesNotVisible rose to 4 (z-score 10) against a baseline of 0, at 03:30 UTC.
- ApproximateAgeOfOldestMessage hit 484 seconds (z-score 10) at the same timestamp — messages are aging without being processed.
- An earlier anomaly at 03:25 UTC showed 2 in-flight messages and an oldest-message age of 181 seconds, indicating the issue is escalating.
- VPC flow logs show 10 packet-reject spikes, ranging from 100 to 2,671 rejected packets, all still active.
- An AssumeRole event for WorkflowIAMRole occurred at 03:14 UTC — 16 minutes before the anomaly was first detected.
What we checked before alerting
- Metric persistence: still anomalous at re-check (value 3.8, baseline 0.0) — confirmed not a transient spike.
- Deployments: no UpdateService, UpdateFunctionCode, or equivalent CloudTrail events in the last 2 hours — not deployment-related.
- Service quotas: all checked quotas below 80% utilisation — quota exhaustion ruled out.
- Infrastructure changes: no scaling events, task restarts, or config changes in the last 60 minutes.
- Security tooling: GuardDuty and Security Hub are not enabled — automated threat detection unavailable.
- Flapping history: 5 occurrences in 24 hours, not enough to classify this as a chronically noisy metric, so it is treated as a genuine new event.
- Vulnerabilities: AWS Inspector found no active findings for this resource.
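The deployment check amounts to asking CloudTrail whether any deploy-type API call happened within a recent window. The minimal sketch below assumes you already have CloudTrail events as dicts with `EventName` and `EventTime` keys, which is the shape boto3's CloudTrail `lookup_events` returns. The list of deploy event names beyond `UpdateService` and `UpdateFunctionCode`, and the `recent_deploys` helper itself, are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Illustrative prefixes. Lambda records versioned event names in CloudTrail
# (e.g. "UpdateFunctionCode20150331v2"), so names are matched by prefix.
DEPLOY_EVENT_PREFIXES = (
    "UpdateService",                # ECS
    "UpdateFunctionCode",           # Lambda
    "UpdateFunctionConfiguration",  # Lambda
    "CreateDeployment",             # CodeDeploy
    "UpdateStack",                  # CloudFormation
)


def recent_deploys(events: Iterable[dict], now: datetime,
                   window: timedelta = timedelta(hours=2)) -> List[dict]:
    """Return deploy-like CloudTrail events inside [now - window, now]."""
    cutoff = now - window
    return [
        e for e in events
        if cutoff <= e["EventTime"] <= now
        and e["EventName"].startswith(DEPLOY_EVENT_PREFIXES)
    ]
```

An empty result supports the "not deployment-related" conclusion. In practice you would page through `lookup_events(StartTime=..., EndTime=...)` to collect the events.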
What to check first
1. Check the consumer (Lambda or ECS service) processing this queue: in CloudWatch Metrics, inspect NumberOfMessagesSent, NumberOfMessagesDeleted, and ApproximateNumberOfMessagesVisible for prod-egress-queue over the window 03:00–03:40 UTC to confirm whether deletions stopped.
2. Review CloudWatch Logs for the consumer service between 03:10–03:35 UTC. Run a Logs Insights query: fields @timestamp, @message | filter @message like /ERROR|error|timeout|refused|reject/ | sort @timestamp desc | limit 50
3. Investigate the VPC packet-reject findings: identify which security group or NACL is dropping traffic and whether the consumer's outbound or inbound connections to the SQS endpoint are affected. Check the VPC flow logs for the consumer's ENI around 03:10–03:30 UTC.
4. Examine the AssumeRole event at 03:14 UTC for WorkflowIAMRole: verify in CloudTrail whether subsequent API calls from that role succeeded or returned AccessDenied errors, which could indicate a permissions failure blocking message processing.
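Step 2 can also be scripted. The sketch below runs the same Logs Insights query through boto3's CloudWatch Logs client (`start_query` and `get_query_results`) and polls until the query finishes. You supply the log group name and the time window. `run_insights_query` is a hypothetical helper.

```python
import time
from datetime import datetime
from typing import Dict, List

QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR|error|timeout|refused|reject/ "
    "| sort @timestamp desc | limit 50"
)


def run_insights_query(logs, log_group: str, start: datetime, end: datetime,
                       query: str = QUERY,
                       poll_seconds: float = 1.0) -> List[Dict[str, str]]:
    """Run a Logs Insights query and return rows as {field: value} dicts.

    `logs` is a boto3 CloudWatch Logs client, e.g. boto3.client("logs").
    """
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=int(start.timestamp()),  # epoch seconds
        endTime=int(end.timestamp()),
        queryString=query,
    )["queryId"]
    # Poll until the query reaches a terminal status.
    while True:
        resp = logs.get_query_results(queryId=query_id)
        if resp["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        time.sleep(poll_seconds)
    if resp["status"] != "Complete":
        raise RuntimeError(f"query {query_id} ended with status {resp['status']}")
    # Each result row is a list of {"field": ..., "value": ...} cells.
    return [{c["field"]: c["value"] for c in row} for row in resp["results"]]
```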
Recommended action
Check the queue consumer's logs and CloudWatch metrics from 03:10–03:35 UTC for errors or stopped processing.
🔧 Catch this automatically next time
Run this in CloudShell to create a CloudWatch alarm. Once in place, ConvOps routes future occurrences through the reactive pipeline automatically.
aws cloudwatch put-metric-alarm \
  --alarm-name "prod-egress-queue-inflight-high" \
  --metric-name ApproximateNumberOfMessagesNotVisible \
  --namespace AWS/SQS --statistic Average \
  --period 300 --threshold 100 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --datapoints-to-alarm 2 \
  --treat-missing-data notBreaching \
  --dimensions Name=QueueName,Value=prod-egress-queue
Your alarm fires. You already know why.
When a CloudWatch alarm fires, ConvOps reads your logs, CloudTrail, and resource state in parallel — then sends a plain-English diagnosis to WhatsApp or Slack before you've opened a single tab.
Without ConvOps
1. Alarm fires at 3am
2. Open console, hunt through logs
3. Check CloudTrail manually
4. Cross-reference metrics by hand
5. 30+ min to identify root cause
With ConvOps
1. Alarm fires at 3am
2. Diagnosis on WhatsApp in < 60s
3. Root cause + logs excerpt
4. Numbered fix steps, ready to copy
5. Reply RESOLVE and you're done
ConvOps
online
🚨 PROD-Lambda-Function-Errors fired
prod-api-order-processor · us-east-1 · production
Root cause (91% confidence): memory exhaustion. Lambda is hitting its 512 MB limit under burst load; 340+ invocations timed out.
[ERROR] Task timed out after 900.00s
MemorySize: 512 MB · Used: 510 MB
Errors/min: 47 ↑ (threshold: 5)
💥 Impact: ~340 orders unprocessed in last 15 min
Fix steps:
1 → Increase Lambda memory to 1024 MB
2 → Enable SQS dead-letter queue
3 → Reduce batch size: 100 → 25
Reply: ACKNOWLEDGE · RESOLVE · SNOOZE 30
ACKNOWLEDGE
✓ Acknowledged · 03:15 AM
Auto-escalating to on-call in 20 min if not resolved. Reply RESOLVE when fixed.
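For reference, the three fix steps in this example map to standard AWS API calls. The sketch below assumes the function is triggered by an SQS event source mapping, which the batch-size step implies. Under that assumption, the dead-letter queue is attached to the source queue through a redrive policy rather than to the function. The queue URL, DLQ ARN, mapping UUID, `maxReceiveCount` of 5, and the `apply_fix_steps` helper are placeholders or hypothetical.

```python
import json


def apply_fix_steps(lambda_client, sqs_client, function_name: str,
                    mapping_uuid: str, source_queue_url: str, dlq_arn: str,
                    max_receive_count: int = 5) -> None:
    """Apply the example's three fixes with boto3 clients.

    lambda_client = boto3.client("lambda"); sqs_client = boto3.client("sqs").
    """
    # 1. Raise the memory limit from 512 MB to 1024 MB.
    lambda_client.update_function_configuration(
        FunctionName=function_name, MemorySize=1024)
    # 2. Move messages that fail repeatedly to a dead-letter queue
    #    (assumed SQS trigger, so the redrive policy goes on the source queue).
    sqs_client.set_queue_attributes(
        QueueUrl=source_queue_url,
        Attributes={"RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })},
    )
    # 3. Shrink the batch each invocation receives from 100 to 25.
    lambda_client.update_event_source_mapping(UUID=mapping_uuid, BatchSize=25)
```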
Reply commands
One word. Full control.
Reply to any ConvOps message from WhatsApp or Slack. No dashboards, no runbooks, no console.
ACKNOWLEDGE
You're on it. Stops reminder notifications.
INVESTIGATE
Triggers deeper analysis with full log context.
RESOLVE
Marks it resolved and logs time-to-fix.
ADJUST
AI-recommended threshold change. Reply YES to apply.
SILENCE
Suppresses a noisy alarm until you reply WATCH.
WATCH
Re-enables a silenced alarm.
STATUS
Live summary of active alarms and open diagnoses.
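The reply commands behave like a small state machine on each alert. The sketch below is a hypothetical model of how ACKNOWLEDGE, RESOLVE, SILENCE, and WATCH might update an alert's state. The `Alert` fields and the `handle_reply` function are assumptions for illustration. INVESTIGATE, ADJUST, and STATUS are left out because they request analysis rather than change state.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alert:
    alarm_name: str
    opened_at: datetime
    state: str = "OPEN"       # OPEN -> ACKNOWLEDGED -> RESOLVED
    reminders: bool = True    # reminder notifications while unacknowledged
    silenced: bool = False    # noisy alarm suppressed until WATCH


def handle_reply(alert: Alert, text: str, now: datetime) -> str:
    """Apply a one-word reply to an alert and return a confirmation string."""
    command = text.strip().upper()
    if command == "ACKNOWLEDGE":
        alert.state, alert.reminders = "ACKNOWLEDGED", False
        return "Acknowledged"
    if command == "RESOLVE":
        alert.state, alert.reminders = "RESOLVED", False
        # Time-to-fix is measured from when the alert was opened.
        minutes = (now - alert.opened_at).total_seconds() / 60
        return f"Resolved after {minutes:.0f} min"
    if command == "SILENCE":
        alert.silenced = True
        return f"{alert.alarm_name} silenced; reply WATCH to re-enable"
    if command == "WATCH":
        alert.silenced = False
        return f"{alert.alarm_name} re-enabled"
    return f"Unknown command: {command}"
```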
Security and access
Read-only. Auditable. Revocable.
ConvOps is designed for security-conscious engineering teams. Everything ConvOps can and cannot do is defined in CloudFormation and visible in your CloudTrail.
Read-only by default
ConvOps uses a read-only IAM role deployed via a CloudFormation template you review before deploying. No agents, no daemons, no code running in your account.
No sensitive data stored
ConvOps reads logs and metrics to generate a diagnosis, then discards the raw data. We store the diagnosis result, not your log lines.
Revoke in 30 seconds
Delete the ConvOps CloudFormation stack from the CloudFormation console and ConvOps loses all access immediately. No support ticket required.
AWS-native access control
Permissions are scoped by the IAM policies defined in the CloudFormation template. Every API call ConvOps makes is visible in your CloudTrail logs.
SOC 2 Type II in progress
ConvOps is pursuing SOC 2 Type II certification. Security questionnaire available on request.
Optional write access
If you want ConvOps to execute fix commands (reply YES to apply), you deploy a second limited-write IAM role separately. Opt-in only.
What ConvOps is not
It is a debugging layer, not another tool to manage.
ConvOps is purpose-built to sit between your existing AWS setup and the engineers who need to act on problems. No new dashboards to maintain, no agents to update.
Not a monitoring tool.
ConvOps doesn't collect metrics — CloudWatch already does that. ConvOps makes those metrics useful by detecting anomalies, filtering noise, and explaining what happened.
Not an agent.
Nothing runs in your account except a read-only IAM role. No sidecar containers, no installed software, no persistent processes.
Not a dashboard.
ConvOps sends you a message when something is wrong. You don't log in to check a dashboard; the diagnosis comes to you.
Not a replacement for your on-call rotation.
ConvOps works with PagerDuty, Opsgenie, or any webhook-based on-call tool. Enriched alerts go through your existing rotation.
Pricing
Individual is free.
Growth is $49/mo.
Audit, Watch, and Diagnose included in every tier. No caps on diagnoses, alarms, or usage.
Individual
1 AWS account. Audit, Watch, and Diagnose included. No credit card. Free forever.
Growth
$49/mo (₹1,999/mo in India)
Up to 5 AWS accounts. Every feature from Individual plus priority support.
More than 5 accounts? Email [email protected] and we'll work it out.
FAQ
Common questions.
How ConvOps connects to AWS, what it covers, and how it sits alongside the tools you already use.
Your next 3am alarm. Fixed before you wake up.
Connect your AWS account in under 10 minutes. Watch starts debugging immediately, with no alarms to configure and no dashboards to build.
Individual is free forever. Growth $49/mo. Cancel any time.