
CloudWatch Metric Guide


Errors: AWS Lambda CloudWatch metric





What it measures

About Errors

Errors counts the number of Lambda invocations that resulted in a function error — including exceptions thrown by the function code and runtime errors (timeout, out-of-memory, handler not found).

Namespace	AWS/Lambda
Metric name	Errors
Unit	Count
AWS docs	Official AWS Lambda metrics reference: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html

Why this metric matters

Lambda Errors is the single most critical metric for serverless workloads. Unlike application errors that are only written to a log and may go unnoticed for a while, Lambda Errors are recorded by the Lambda service for every failed invocation: the function did not complete its intended work, and the caller either received an error or the event was retried.

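For functions invoked asynchronously, such as by SNS or EventBridge, Lambda retries a failed event up to two more times by default, then sends it to a dead-letter queue or on-failure destination if one is configured. For SQS-triggered functions, failed messages return to the queue after the visibility timeout and are retried until they reach the queue's maxReceiveCount, then move to the queue's own dead-letter queue. Because every retry is a new invocation, errors can silently compound: one bad event can produce three errors, and a burst of bad events multiplies accordingly. An unchecked error rate can flood your dead-letter queue, and if no dead-letter queue or destination is configured, events that exhaust their retries are dropped permanently.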
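To put rough numbers on the compounding effect, the sketch below assumes a function invoked asynchronously (for example by SNS or EventBridge) with Lambda's default of two retries, and counts how much a batch of always-failing events adds to Errors. It also builds keyword arguments for boto3's `put_function_event_invoke_config`, which caps retries and routes exhausted events to an on-failure destination instead of discarding them. The helper names and values are illustrative assumptions. SQS-triggered functions are governed by the queue's redrive policy instead, so this setting does not apply to them.

```python
# Limits Lambda enforces for asynchronous invocation settings.
MAX_RETRY_ATTEMPTS = 2    # MaximumRetryAttempts accepts 0-2 (default 2)
MIN_EVENT_AGE_S = 60      # MaximumEventAgeInSeconds lower bound
MAX_EVENT_AGE_S = 21_600  # MaximumEventAgeInSeconds upper bound (6 hours, the default)


def worst_case_errors(failing_events: int, max_retry_attempts: int = MAX_RETRY_ATTEMPTS) -> int:
    """Errors added if every attempt for every failing async event fails.

    Each retry is a separate invocation, so it is counted in Errors again.
    """
    return failing_events * (1 + max_retry_attempts)


def async_failure_config(
    function_name: str,
    on_failure_arn: str,
    max_retry_attempts: int = MAX_RETRY_ATTEMPTS,
    max_event_age_s: int = MAX_EVENT_AGE_S,
) -> dict:
    """Keyword arguments for boto3's lambda client put_function_event_invoke_config."""
    if not 0 <= max_retry_attempts <= MAX_RETRY_ATTEMPTS:
        raise ValueError("max_retry_attempts must be between 0 and 2")
    if not MIN_EVENT_AGE_S <= max_event_age_s <= MAX_EVENT_AGE_S:
        raise ValueError("max_event_age_s must be between 60 and 21600")
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retry_attempts,
        "MaximumEventAgeInSeconds": max_event_age_s,
        # Events that exhaust their retries are sent here instead of being discarded.
        "DestinationConfig": {"OnFailure": {"Destination": on_failure_arn}},
    }
```

For example, `worst_case_errors(100)` returns 300, and with credentials configured, `boto3.client("lambda").put_function_event_invoke_config(**async_failure_config("order-processor", queue_arn, max_retry_attempts=1))` would apply the config (the function name and `queue_arn` are hypothetical).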

Recommended alarm threshold for Errors

Recommended threshold

> 0 in any 5-minute window for critical functions; > N errors/minute for high-volume functions (set N based on your acceptable error rate)

For critical functions handling user-facing or transactional workloads, ConvOps recommends zero tolerance as the starting point. For high-volume async processing functions, a percentage-based error-rate threshold (errors / invocations > 1%) may be more appropriate, because it avoids false positives from transient failures. AWS documentation recommends monitoring both Errors and the derived error rate for complete coverage.
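Both thresholds can be expressed as CloudWatch alarms. The sketch below builds keyword arguments for boto3's `cloudwatch.put_metric_alarm`: a static alarm on the Errors Sum (`threshold=0, period_s=300` for critical functions, or `threshold=N, period_s=60` for the errors-per-minute variant), and a metric-math alarm on `100 * Errors / Invocations`. The alarm names and the use of `TreatMissingData="notBreaching"` (assumed here so idle periods without datapoints do not alarm) are choices to adapt; you would also add `AlarmActions` with your own SNS topic ARN to be notified.

```python
NAMESPACE = "AWS/Lambda"


def _dimensions(function_name: str) -> list:
    return [{"Name": "FunctionName", "Value": function_name}]


def errors_count_alarm(function_name: str, threshold: float = 0, period_s: int = 300) -> dict:
    """Static alarm: fires when the Errors Sum over one period exceeds threshold."""
    return {
        "AlarmName": f"{function_name}-errors-gt-{threshold:g}-per-{period_s}s",
        "Namespace": NAMESPACE,
        "MetricName": "Errors",
        "Dimensions": _dimensions(function_name),
        "Statistic": "Sum",
        "Period": period_s,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def error_rate_alarm(function_name: str, rate_pct: float = 1.0, period_s: int = 300) -> dict:
    """Metric-math alarm: fires when 100 * Errors / Invocations exceeds rate_pct."""

    def sum_of(metric_id: str, metric_name: str) -> dict:
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": metric_name,
                    "Dimensions": _dimensions(function_name),
                },
                "Period": period_s,
                "Stat": "Sum",
            },
            "ReturnData": False,  # inputs only; the alarm evaluates the expression
        }

    return {
        "AlarmName": f"{function_name}-error-rate-gt-{rate_pct:g}pct",
        "Metrics": [
            sum_of("errors", "Errors"),
            sum_of("invocations", "Invocations"),
            {
                "Id": "error_rate",
                "Expression": "100 * errors / invocations",
                "Label": "Error rate (%)",
                "ReturnData": True,
            },
        ],
        "EvaluationPeriods": 1,
        "Threshold": float(rate_pct),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }
```

With credentials configured, `boto3.client("cloudwatch").put_metric_alarm(**error_rate_alarm("checkout-api"))` would create the rate alarm (the function name is hypothetical). A metric-math alarm omits the top-level `MetricName`, `Namespace`, `Statistic`, and `Period`, because those live inside each `MetricStat`.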

Is your Errors alarm already set up correctly?

The free ConvOps Audit scans your CloudWatch setup in 5 minutes and flags missing or misconfigured alarms, including alarms on Errors.


Common failures that show up in Errors

When Errors reaches an alarm threshold, these are the most common root causes, listed in order of how often ConvOps sees them across customer AWS accounts.

Unhandled exception in function code: an uncaught TypeError, KeyError, or null reference fails the invocation; the stack trace lands in CloudWatch Logs, but nothing in the code catches or reports it
Timeout: function logic takes longer than the configured timeout, so Lambda terminates it mid-execution
Out-of-memory: the function exceeds its memory allocation, for example when it processes a larger payload than expected
Dependency failure: the function calls RDS, DynamoDB, or an external API that is unavailable, and the code doesn't handle the error gracefully
Slow cold starts on VPC-attached functions: long initialization can push the first invocation after a scale-out past the timeout. ENI attachment used to add seconds here, but since AWS moved Lambda VPC networking to shared Hyperplane ENIs in 2019, ENIs are set up when the function is created or updated rather than at scale-out
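Several of these causes can be contained in code. The sketch below is a hypothetical Python handler: the `order_id` field, `fetch_order`, `DependencyError`, the retry count, and the 1-second safety margin are all illustrative assumptions, and the caught exception types should be adapted to whatever your client library raises. It validates input instead of letting a `KeyError` or `TypeError` escape, retries a flaky dependency with a short backoff, and uses the Lambda context's `get_remaining_time_in_millis()` to stop retrying before the configured timeout. When the dependency stays down it still raises, so the failure is counted in Errors and async retries or dead-letter handling still apply, but with a clear, logged cause.

```python
import json
import logging
import time

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Hypothetical margin: stop retrying when less than this much time remains.
SAFETY_MARGIN_MS = 1_000


class DependencyError(Exception):
    """Raised when a downstream call keeps failing; Lambda counts it in Errors."""


def fetch_order(order_id):
    """Hypothetical stand-in for an RDS, DynamoDB, or HTTP call."""
    raise NotImplementedError("replace with a real client call")


def handler(event, context, fetch=fetch_order, attempts=3, backoff_s=0.2):
    # Validate input up front instead of letting KeyError/TypeError escape.
    order_id = event.get("order_id") if isinstance(event, dict) else None
    if not order_id:
        logger.warning("rejecting event without order_id: %r", event)
        return {"statusCode": 400, "body": json.dumps({"error": "order_id is required"})}

    last_exc = None
    for attempt in range(1, attempts + 1):
        # Give up before Lambda's timeout kills the invocation mid-execution.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            logger.error("stopping before timeout after %d attempt(s)", attempt - 1)
            break
        try:
            return {"statusCode": 200, "body": json.dumps(fetch(order_id))}
        except (ConnectionError, TimeoutError) as exc:
            last_exc = exc
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                time.sleep(backoff_s * attempt)  # linear backoff between attempts

    # Re-raise a clear error so the failure still shows up in Errors.
    raise DependencyError(f"order lookup failed for {order_id}") from last_exc
```

Returning a 400 for malformed input is a deliberate trade-off: an event that can never succeed is not retried and does not inflate Errors, but it also no longer shows up there, so log or route such events explicitly if you need to track them.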

How ConvOps debugs Errors alarms

When Errors triggers an alarm, ConvOps Diagnose reads CloudWatch Logs, CloudTrail (recent API calls, deploys, config changes), and the current resource state in parallel. It correlates these with AWS/Lambda metrics on the same resource — giving you a plain-English root cause with numbered fix options, sent to WhatsApp or Slack, usually within 60 seconds of the alarm firing.
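The parallel read described above can be approximated with two boto3 calls. This is not ConvOps' implementation, only a minimal sketch under stated assumptions: the function logs to the default `/aws/lambda/<function-name>` log group, the filter pattern (matching lines that contain `ERROR`, `Task timed out`, or `Runtime exited`) is an assumed starting point to tune per runtime, the 120-minute lookback mirrors the Deploy check window, and only the first page of each result is read. Clients are passed in so the code can be exercised without AWS access; in practice they would be `boto3.client("logs")` and `boto3.client("cloudtrail")`.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta

# Assumed OR-pattern for common failure lines; tune it for your runtime and logger.
ERROR_PATTERN = '?ERROR ?"Task timed out" ?"Runtime exited"'
LOOKBACK = timedelta(minutes=120)


def gather_context(function_name: str, alarm_time: datetime, logs_client, cloudtrail_client,
                   lookback: timedelta = LOOKBACK) -> dict:
    """Fetch recent error log lines and CloudTrail events for one function concurrently."""
    start = alarm_time - lookback
    log_params = {
        "logGroupName": f"/aws/lambda/{function_name}",  # default Lambda log group name
        "startTime": int(start.timestamp() * 1000),       # CloudWatch Logs uses epoch ms
        "endTime": int(alarm_time.timestamp() * 1000),
        "filterPattern": ERROR_PATTERN,
        "limit": 50,
    }
    trail_params = {
        "LookupAttributes": [{"AttributeKey": "ResourceName", "AttributeValue": function_name}],
        "StartTime": start,
        "EndTime": alarm_time,
        "MaxResults": 50,  # lookup_events returns at most 50 events per page
    }
    # Run both API calls at the same time rather than one after the other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        logs_future = pool.submit(logs_client.filter_log_events, **log_params)
        trail_future = pool.submit(cloudtrail_client.lookup_events, **trail_params)
        logs_resp = logs_future.result()
        trail_resp = trail_future.result()
    return {
        "error_lines": [e["message"] for e in logs_resp.get("events", [])],
        "recent_changes": [(e["EventName"], e["EventTime"]) for e in trail_resp.get("Events", [])],
    }
```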

Before any anomaly in Errors reaches you as a proactive alert (via ConvOps Watch), it passes through 9 verification checks: a Recovery check (did the metric self-heal?), an AWS Status check (is AWS itself having an incident?), a Deploy check (was there a recent Lambda update, ECS deploy, or RDS parameter change in the last 120 minutes?), a Quota check, an Infrastructure check, a Security check, a Flap check (has this metric been anomalous more than 5 times in the last 24 hours?), a TLS check, and a Vulnerability check. Only anomalies that pass all relevant checks reach you — with full context attached.
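Two of these checks reduce to simple time-window rules. The sketch below encodes them with the thresholds stated above (a deploy or config change within 120 minutes before the anomaly, and more than 5 anomalies in 24 hours). The function names and input shapes (lists of datetimes) are hypothetical; this illustrates the rules, not ConvOps' code.

```python
from datetime import datetime, timedelta

DEPLOY_WINDOW = timedelta(minutes=120)
FLAP_WINDOW = timedelta(hours=24)
FLAP_LIMIT = 5  # "more than 5 times in the last 24 hours"


def had_recent_change(change_times, anomaly_time, window=DEPLOY_WINDOW):
    """Deploy check: was there a deploy or config change in the window before the anomaly?"""
    return any(anomaly_time - window <= t <= anomaly_time for t in change_times)


def is_flapping(anomaly_times, now, window=FLAP_WINDOW, limit=FLAP_LIMIT):
    """Flap check: has this metric been anomalous more than `limit` times in the window?"""
    recent = sum(1 for t in anomaly_times if now - window <= t <= now)
    return recent > limit


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 12, 0)
    print(had_recent_change([now - timedelta(minutes=45)], now))  # True
    print(is_flapping([now - timedelta(hours=h) for h in range(6)], now))  # True: 6 > 5
```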

ConvOps Watch

Detects Errors anomalies using z-scores against 30-day, time-bucketed baselines. Runs 9 verification checks before any alert.

ConvOps Diagnose

When an Errors alarm fires, reads logs, CloudTrail, and resource state, then sends the root cause and fix options to WhatsApp or Slack.

ConvOps Audit

Scans your CloudWatch setup for missing or misconfigured Errors alarms. Free, 5-minute read-only scan.
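The Watch card above mentions z-score detection against 30-day time-bucketed baselines. ConvOps' exact bucketing and thresholds are not stated, so the sketch below is a generic version that assumes hour-of-week buckets (each datapoint is compared only with past datapoints from the same weekday and hour), at least 4 samples per bucket, and an alert threshold of z > 3. Only upward deviations are flagged, since a drop in Errors is not a problem.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def bucket(ts: datetime) -> int:
    """Hour-of-week bucket, 0 (Monday 00:00) to 167 (Sunday 23:00)."""
    return ts.weekday() * 24 + ts.hour


def build_baseline(samples, now: datetime, days: int = 30) -> dict:
    """Group (timestamp, value) samples from the last `days` days by bucket."""
    cutoff = now - timedelta(days=days)
    buckets = defaultdict(list)
    for ts, value in samples:
        if cutoff <= ts < now:
            buckets[bucket(ts)].append(value)
    return buckets


def z_score(value, history, min_samples: int = 4):
    """Standard score of value against history; None if history is too short."""
    if len(history) < min_samples:
        return None
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        # Flat history: any change is infinitely many standard deviations away.
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma


def is_anomalous(value, ts: datetime, baseline: dict, threshold: float = 3.0) -> bool:
    """Flag only increases, compared against the bucket matching ts."""
    z = z_score(value, baseline.get(bucket(ts), []))
    return z is not None and z > threshold
```

Bucketing by hour of week keeps a normal Monday-morning traffic peak from looking anomalous against a quiet weekend night, at the cost of needing several weeks of history per bucket.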


Related AWS Lambda metrics

Errors rarely spikes in isolation. These metrics tend to move with it, so monitor them together for complete AWS Lambda coverage.

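Duration: invocations running close to the configured timeout often precede timeout errors
Throttles: throttled invocation requests are counted here, not in Errors
ConcurrentExecutions: concurrency approaching the account or reserved limit leads to throttling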

FAQ

Frequently asked questions about Errors

Common questions about setting up CloudWatch alarms for Errors in AWS Lambda.

What is the recommended CloudWatch alarm threshold for Errors?

> 0 in any 5-minute window for critical user-facing or transactional functions. For high-volume async processing functions, an error-rate threshold (errors / invocations > 1%) avoids false positives on transient failures. See "Recommended alarm threshold for Errors" above for the full reasoning.

Which CloudWatch namespace does Errors belong to?

Errors is published in the AWS/Lambda namespace with a unit of Count. You can find it in the CloudWatch console under "Metrics" → "AWS/Lambda". See the AWS Lambda CloudWatch metrics reference in the AWS documentation.

Does ConvOps automatically create CloudWatch alarms for Errors?

ConvOps does not create alarms for you by default. It debugs the alarms you already have and identifies missing ones: the free ConvOps Audit scans your CloudWatch setup and tells you which AWS Lambda resources are missing an Errors alarm. ConvOps Watch then monitors Errors using z-score anomaly detection against a 30-day baseline, running 9 verification checks before alerting you.

Can I use ConvOps without already having an Errors alarm set up?

Yes. ConvOps Watch monitors Errors independently of your CloudWatch alarm configuration; on the Growth plan it reads the metric directly from CloudWatch every 5 minutes. If you run the free Audit first, it tells you which resources need an Errors alarm and provides the copy-paste AWS CLI command to create it.