Duration: AWS Lambda CloudWatch metric

Duration is published in the AWS/Lambda CloudWatch namespace, with a unit of Milliseconds. See the AWS documentation: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html

Duration measures the elapsed wall-clock time from when the Lambda function handler begins executing to when it returns or times out. CloudWatch publishes minimum, maximum, average, and percentile statistics.

What it measures

About Duration

Namespace	AWS/Lambda
Metric name	Duration
Unit	Milliseconds
AWS docs	Official AWS Lambda metrics reference

Why this metric matters

Duration has two failure modes. The obvious one is timeout: if Duration reaches the function's configured timeout, Lambda terminates the invocation and records an error. The subtler one is cost and performance creep — a function that used to run in 200ms now runs in 2 seconds because a downstream dependency slowed down, and nobody noticed because it didn't error.

For synchronous Lambda (API Gateway, ALB), Duration feeds directly into the latency felt by the user. For async Lambda, Duration determines how many invocations fit inside the concurrency limit: a function that doubles in duration effectively halves your throughput capacity before you hit throttle limits. Monitoring p99 Duration separately from average Duration catches the worst-case outliers that average metrics hide.
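The relationship between Duration and throughput can be made concrete with a little arithmetic. The sketch below assumes a steady state in which every concurrent execution slot is busy; the function name and the example numbers are illustrative, not taken from any AWS API.

```python
def max_invocations_per_second(concurrency_limit: int, avg_duration_ms: float) -> float:
    """Upper bound on sustained invocations per second.

    Each concurrent slot can complete 1000 / avg_duration_ms invocations
    per second, so the ceiling is concurrency_limit times that rate.
    """
    if avg_duration_ms <= 0:
        raise ValueError("avg_duration_ms must be positive")
    return concurrency_limit * 1000 / avg_duration_ms


# Hypothetical account with a concurrency limit of 1000.
before = max_invocations_per_second(1000, 200)  # 5000.0 invocations/s
after = max_invocations_per_second(1000, 400)   # 2500.0 invocations/s
print(before, after)
```

Doubling the average duration from 200ms to 400ms cuts the ceiling in half, which is why a Duration regression can surface as throttling rather than as errors.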

Recommended alarm threshold for Duration

Recommended threshold

p99 Duration > 80% of the function's configured timeout

Alarms on average Duration miss the long tail: a function with a 30-second timeout might average 500ms but have a p99 of 25 seconds, meaning 1% of invocations take 25 seconds or longer and are close to timing out. With a 30-second timeout the 80% threshold is 24 seconds, so this function would already be in alarm. The 80% threshold on p99 (a ConvOps recommendation) gives a buffer to investigate before timeouts become errors. Monitor the p99 statistic specifically, not the default Average.
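As a rough illustration, the threshold rule can be turned into alarm parameters programmatically. The sketch below only builds the keyword arguments; the helper name, the alarm naming scheme, the 60-second period, the 5 evaluation periods, and the missing-data setting are assumptions chosen for the example, not ConvOps or AWS recommendations.

```python
def p99_duration_alarm_params(function_name: str,
                              timeout_seconds: int,
                              threshold_pct: int = 80) -> dict:
    """Build keyword arguments for CloudWatch PutMetricAlarm.

    The threshold is threshold_pct percent of the function timeout,
    converted to milliseconds because Duration is reported in milliseconds.
    """
    threshold_ms = timeout_seconds * 1000 * threshold_pct / 100
    return {
        "AlarmName": f"{function_name}-p99-duration",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "ExtendedStatistic": "p99",   # percentile statistic instead of Average
        "Period": 60,                 # assumed: 1-minute datapoints
        "EvaluationPeriods": 5,       # assumed: 5 consecutive breaching periods
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # assumed: idle function is not an alarm
    }


params = p99_duration_alarm_params("checkout-handler", timeout_seconds=30)
print(params["Threshold"])  # 24000.0
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. The function name `checkout-handler` is a placeholder.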

Common failures that show up in Duration

When Duration reaches an alarm threshold, these are the most common root causes — in order of how often ConvOps sees them across customer AWS accounts.

Downstream API degradation: a call to an external service or RDS that normally completes in 50ms starts taking 5 seconds because the dependency is under load
Cold start + VPC initialization: the first invocation after scale-out pays for connection setup inside the VPC. VPC-attached functions historically incurred 1–10 seconds of ENI attachment time on cold starts; since the 2019 VPC networking change, ENIs are created when the function is configured, and CloudWatch Duration does not include init time itself, so the remaining cost is mostly lazy client and connection initialization inside the handler
Large payload processing: the function receives a larger S3 object or SQS message than expected and processing time grows non-linearly
N+1 database query pattern: each item in a batch triggers a separate query, so Duration scales linearly with batch size
Memory-constrained processing: Lambda allocates CPU in proportion to configured memory, so a function with too little memory gets a smaller CPU share and the same computation takes longer
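The N+1 pattern from the list above is easy to reproduce. The sketch below uses an in-memory SQLite database as a stand-in for RDS; the `orders` table, its columns, and both function names are hypothetical. It counts round trips rather than measuring time, since against a real network database each round trip adds latency to Duration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with 100 sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(i, i * 1.5) for i in range(1, 101)])
    return conn


def totals_n_plus_one(conn, order_ids):
    """N+1 pattern: one query per item in the batch."""
    totals, queries = {}, 0
    for oid in order_ids:
        row = conn.execute("SELECT total FROM orders WHERE id = ?", (oid,)).fetchone()
        queries += 1
        if row is not None:
            totals[oid] = row[0]
    return totals, queries


def totals_batched(conn, order_ids):
    """Batched alternative: a single query with an IN clause."""
    if not order_ids:
        return {}, 0
    # Placeholders keep the query parameterized; very large batches may need
    # chunking because databases cap the number of bound parameters.
    placeholders = ",".join("?" for _ in order_ids)
    rows = conn.execute(
        f"SELECT id, total FROM orders WHERE id IN ({placeholders})",
        list(order_ids),
    ).fetchall()
    return dict(rows), 1


conn = make_db()
batch = list(range(1, 11))
slow, slow_queries = totals_n_plus_one(conn, batch)
fast, fast_queries = totals_batched(conn, batch)
print(slow_queries, fast_queries)  # 10 1
```

Both functions return the same totals, but the first issues one query per batch item while the second issues one query regardless of batch size.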

How ConvOps debugs Duration alarms

When Duration triggers an alarm, ConvOps Diagnose reads CloudWatch Logs, CloudTrail (recent API calls, deploys, config changes), and the current resource state in parallel. It correlates these with AWS/Lambda metrics on the same resource — giving you a plain-English root cause with numbered fix options, sent to WhatsApp or Slack, usually within 60 seconds of the alarm firing.

Before any anomaly in Duration reaches you as a proactive alert (via ConvOps Watch), it passes through 9 verification checks: a Recovery check (did the metric self-heal?), an AWS Status check (is AWS itself having an incident?), a Deploy check (was there a recent Lambda update, ECS deploy, or RDS parameter change in the last 120 minutes?), a Quota check, an Infrastructure check, a Security check, a Flap check (has this metric been anomalous more than 5 times in the last 24 hours?), a TLS check, and a Vulnerability check. Only anomalies that pass all relevant checks reach you — with full context attached.

ConvOps Watch

Detects Duration anomalies with z-score against 30-day time-bucketed baselines. 9 verification checks before any alert.

ConvOps Diagnose

When a Duration alarm fires, reads logs, CloudTrail, and resource state. Sends root cause + fix options to WhatsApp or Slack.

ConvOps Audit

Scans your CloudWatch setup for missing or misconfigured Duration alarms. Free, 5-minute read-only scan.
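The z-score detection against time-bucketed baselines mentioned for ConvOps Watch can be illustrated with a minimal sketch. This is not ConvOps's implementation, which the page does not describe; the hour-of-week bucketing, the threshold of 3.0, the minimum sample count, and all names below are assumptions made for the example.

```python
from datetime import datetime
from statistics import mean, pstdev


def bucket_key(ts: datetime) -> tuple:
    """Assumed bucketing: (weekday, hour), so Monday 09:00 is compared
    only with earlier Monday 09:00 samples."""
    return (ts.weekday(), ts.hour)


def z_score(value: float, baseline: list) -> float:
    """Number of standard deviations between value and the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # Flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def is_anomalous(value: float, ts: datetime, history: dict,
                 threshold: float = 3.0) -> bool:
    """Flag value if it is more than `threshold` deviations from its bucket."""
    samples = history.get(bucket_key(ts), [])
    if len(samples) < 2:
        return False  # not enough baseline data to judge
    return abs(z_score(value, samples)) > threshold


# Hypothetical p99 Duration samples (ms) for Mondays at 09:00.
history = {(0, 9): [200, 210, 190, 205, 195]}
monday_9am = datetime(2024, 1, 1, 9, 0)  # 2024-01-01 is a Monday
print(is_anomalous(2000, monday_9am, history))  # True
print(is_anomalous(204, monday_9am, history))   # False
```

Bucketing by time of week keeps a predictable Monday-morning peak from being flagged just because it is higher than the weekend baseline.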

Related AWS Lambda metrics

Duration rarely fails in isolation. These metrics tend to correlate — monitor them together for complete AWS Lambda coverage.

Errors
Throttles
ConcurrentExecutions

FAQ

Frequently asked questions about Duration

Common questions about setting up CloudWatch alarms for Duration in AWS Lambda.

What is the recommended CloudWatch alarm threshold for Duration?

p99 Duration > 80% of the function's configured timeout. Alarms on average Duration miss the long tail: a function with a 30-second timeout might average 500ms but have a p99 of 25 seconds, meaning 1% of invocations take 25 seconds or longer and are close to timing out. The 80% threshold on p99 (a ConvOps recommendation) gives a buffer to investigate before timeouts become errors. Monitor the p99 statistic specifically, not the default Average.

Which CloudWatch namespace does Duration belong to?

Duration is published in the AWS/Lambda namespace with a unit of Milliseconds. You can find it in the CloudWatch console under "Metrics" → "AWS/Lambda". See the AWS Lambda CloudWatch metrics reference in the AWS documentation.

Does ConvOps automatically create CloudWatch alarms for Duration?

ConvOps does not create alarms for you by default — it debugs the alarms you already have (or identifies missing ones). The free ConvOps Audit scans your CloudWatch setup and tells you which AWS Lambda resources are missing a Duration alarm. ConvOps Watch then monitors Duration using z-score anomaly detection against a 30-day baseline, running 9 verification checks before alerting you.

Can I use ConvOps without already having a Duration alarm set up?

Yes. ConvOps Watch monitors Duration independently of your CloudWatch alarm configuration — it reads the metric directly from CloudWatch every 5 minutes on the Growth plan. If you run the free Audit first, it will tell you which resources need a Duration alarm and provide the copy-paste AWS CLI command to create it.