What are common causes of RunningTaskCount dropping below the desired count in Amazon ECS?

Startup crash on new deploy: the new task definition contains a bug that makes the container exit immediately; ECS keeps retrying, but the service never recovers.
Health check failure: the container starts, but the health check endpoint is still returning non-200 responses when the grace period expires, so ECS marks the task unhealthy and replaces it in a loop.
OOM kill: a container that exceeds its hard memory limit is killed, RunningTaskCount drops, and ECS starts replacement tasks that may be killed in turn.
IAM permissions missing: the task role lacks permissions for a required AWS service call, the container errors on startup, and ECS retries in a crash loop.
Dependency unavailable at startup: the container needs a database connection or a secrets manager call during initialization; when that call fails, the container crash-loops.

CloudWatch Metric Guide

ECS/ContainerInsights · RunningTaskCount · Count

RunningTaskCount: Amazon ECS CloudWatch metric

Q: What is the recommended CloudWatch alarm threshold for RunningTaskCount?

Recommended threshold: < desired task count for the service. Any drop below the desired count means the service is degraded (ConvOps recommendation). The desired count is the capacity you have explicitly declared, so running below it violates your own capacity model. This alarm has very few false positives: RunningTaskCount only drops below desired briefly during deployments, or during failures. Require the breach to persist for 1–2 minutes before alarming so that normal rolling deployments do not trigger it.

Q: Which CloudWatch namespace does RunningTaskCount belong to?

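RunningTaskCount is a Container Insights metric: with Container Insights enabled on the cluster, it is published in the ECS/ContainerInsights CloudWatch namespace with a unit of Count. The default AWS/ECS namespace carries utilization and reservation metrics such as CPUUtilization and MemoryUtilization; there, the number of running tasks in a service can be derived from the SampleCount statistic of CPUUtilization over a 1-minute period. See AWS documentation: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html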



What it measures

About RunningTaskCount

RunningTaskCount reports the number of tasks in the RUNNING state for an ECS service. Tasks not in RUNNING are in another lifecycle state, such as PROVISIONING, PENDING, DEPROVISIONING, or STOPPED.

Namespace	ECS/ContainerInsights (requires Container Insights)
Metric name	RunningTaskCount
Unit	Count
AWS docs	Official Amazon ECS metrics reference

Why this metric matters

RunningTaskCount below your desired count means your service is operating at reduced capacity: tasks are failing health checks, crashing on startup, or being terminated due to memory pressure. In a service with a desired count of 3 that is running only 1 task, that task carries three times its expected share of traffic.
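The arithmetic behind that example can be sketched as a small helper. It assumes traffic is spread evenly across running tasks, which is a simplification; the function name is hypothetical.

```python
def per_task_load_multiplier(desired_count: int, running_count: int) -> float:
    """Return how many times its planned traffic share each running task carries.

    Assumes the load balancer spreads traffic evenly across running tasks.
    """
    if desired_count <= 0:
        raise ValueError("desired_count must be positive")
    if running_count <= 0:
        # No tasks are running, so no task can absorb the traffic.
        return float("inf")
    return desired_count / running_count


# Desired count of 3 with only 1 task running: that task carries 3x its share.
multiplier = per_task_load_multiplier(desired_count=3, running_count=1)
```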

This metric can drop abruptly to zero during a bad deploy. If a new task definition introduces a startup crash, ECS attempts to start tasks, they crash immediately, ECS retries with backoff, and RunningTaskCount oscillates between 0 and 1 while the service is effectively down. An alarm on RunningTaskCount < desired count is one of the fastest ways to catch this scenario.

Recommended alarm threshold for RunningTaskCount

Recommended threshold

< desired task count for the service

Any drop below the desired count means the service is degraded (ConvOps recommendation). The desired count is the capacity you have explicitly declared, so running below it violates your own capacity model. This alarm has very few false positives: RunningTaskCount only drops below desired briefly during deployments, or during failures. Require the breach to persist for 1–2 minutes before alarming so that normal rolling deployments do not trigger it.
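One way to express "RunningTaskCount < desired count" without hard-coding a number is a metric math alarm that subtracts RunningTaskCount from DesiredTaskCount. The sketch below only builds the keyword arguments for CloudWatch's PutMetricAlarm call. It assumes Container Insights is enabled on the cluster, so that both metrics exist in the ECS/ContainerInsights namespace with ClusterName and ServiceName dimensions. The function name, the alarm name format, and the choice of the Average statistic are illustrative assumptions, not a ConvOps or AWS recommendation.

```python
def build_running_below_desired_alarm(cluster, service, minutes=2,
                                      namespace="ECS/ContainerInsights"):
    """Build kwargs for CloudWatch put_metric_alarm (metric math alarm)."""
    dimensions = [
        {"Name": "ClusterName", "Value": cluster},
        {"Name": "ServiceName", "Value": service},
    ]

    def task_metric(metric_id, metric_name):
        # One input series for the math expression; not alarmed on directly.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,  # assumption: Container Insights is on
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": 60,       # one datapoint per minute
                "Stat": "Average",  # assumption: average over each minute
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{cluster}-{service}-running-below-desired",  # hypothetical name
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 0.0,
        # The shortfall must persist for every minute in the window.
        "EvaluationPeriods": minutes,
        "DatapointsToAlarm": minutes,
        "Metrics": [
            task_metric("desired", "DesiredTaskCount"),
            task_metric("running", "RunningTaskCount"),
            {
                "Id": "shortfall",
                "Expression": "desired - running",
                "Label": "Tasks missing versus desired count",
                "ReturnData": True,  # the alarm evaluates this series
            },
        ],
    }
```

With boto3 installed and credentials configured, the result could be passed as boto3.client("cloudwatch").put_metric_alarm(**build_running_below_desired_alarm("prod", "api")). Setting DatapointsToAlarm equal to EvaluationPeriods is what keeps a dip shorter than the window, such as a normal rolling deployment, from alarming.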

Is your RunningTaskCount alarm already set up correctly?

The free ConvOps Audit scans your CloudWatch setup and flags missing or misconfigured alarms — including RunningTaskCount — in 5 minutes.


Common failures that show up in RunningTaskCount

When RunningTaskCount reaches an alarm threshold, these are the most common root causes — in order of how often ConvOps sees them across customer AWS accounts.

Startup crash on new deploy: the new task definition contains a bug that makes the container exit immediately; ECS keeps retrying, but the service never recovers.
Health check failure: the container starts, but the health check endpoint is still returning non-200 responses when the grace period expires, so ECS marks the task unhealthy and replaces it in a loop.
OOM kill: a container that exceeds its hard memory limit is killed, RunningTaskCount drops, and ECS starts replacement tasks that may be killed in turn.
IAM permissions missing: the task role lacks permissions for a required AWS service call, the container errors on startup, and ECS retries in a crash loop.
Dependency unavailable at startup: the container needs a database connection or a secrets manager call during initialization; when that call fails, the container crash-loops.
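A first-pass triage of the causes above can be sketched by inspecting stopped tasks. The function below takes a dict shaped like one entry of the tasks list returned by the ECS DescribeTasks API (fields stoppedReason, containers[].exitCode, containers[].reason). The substring matches and the category names are assumptions chosen for illustration; real reason strings vary, and an application-level failure such as a missing permission or unreachable database often shows up only as a non-zero exit code, so container logs are still needed to tell those apart.

```python
def classify_stopped_task(task: dict) -> str:
    """Guess a root-cause category for a stopped ECS task (heuristic only)."""
    containers = task.get("containers", [])
    reasons = [task.get("stoppedReason") or ""]
    reasons += [c.get("reason") or "" for c in containers]
    text = " ".join(reasons).lower()
    exit_codes = [c.get("exitCode") for c in containers]

    if "outofmemory" in text:
        return "oom_kill"
    if "health check" in text:
        return "health_check_failure"
    if "accessdenied" in text or "not authorized" in text:
        return "iam_permissions"
    if "resourceinitializationerror" in text:
        # Assumption: treat failed task initialization (for example, secrets
        # that could not be fetched) as an unavailable dependency.
        return "dependency_unavailable"
    if any(code not in (None, 0) for code in exit_codes):
        # Non-zero exit with no more specific reason: likely a startup crash.
        return "startup_crash"
    return "unknown"
```

In practice the input would come from describe_tasks called with the ARNs returned by list_tasks using desiredStatus="STOPPED"; stopped tasks are only retained by ECS for a limited time, so this works best shortly after the alarm fires.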

How ConvOps debugs RunningTaskCount alarms

When RunningTaskCount triggers an alarm, ConvOps Diagnose reads CloudWatch Logs, CloudTrail (recent API calls, deploys, config changes), and the current resource state in parallel. It correlates these with AWS/ECS metrics on the same resource — giving you a plain-English root cause with numbered fix options, sent to WhatsApp or Slack, usually within 60 seconds of the alarm firing.

Before any anomaly in RunningTaskCount reaches you as a proactive alert (via ConvOps Watch), it passes through 9 verification checks: a Recovery check (did the metric self-heal?), an AWS Status check (is AWS itself having an incident?), a Deploy check (was there a recent Lambda update, ECS deploy, or RDS parameter change in the last 120 minutes?), a Quota check, an Infrastructure check, a Security check, a Flap check (has this metric been anomalous more than 5 times in the last 24 hours?), a TLS check, and a Vulnerability check. Only anomalies that pass all relevant checks reach you — with full context attached.
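Two of the checks described above have concrete numbers attached (more than 5 anomalies in 24 hours, and a change in the last 120 minutes), so they can be illustrated with a short sketch. This is not ConvOps code: the function names, the inclusive window boundaries, and the idea of passing plain lists of timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def is_flapping(anomaly_times, now, max_anomalies=5, window=timedelta(hours=24)):
    """Flap check: True if more than max_anomalies anomalies fall in the window."""
    recent = [t for t in anomaly_times if now - window <= t <= now]
    return len(recent) > max_anomalies


def has_recent_deploy(deploy_times, now, lookback=timedelta(minutes=120)):
    """Deploy check: True if any deploy or config change falls in the lookback."""
    return any(now - lookback <= t <= now for t in deploy_times)


now = datetime(2024, 1, 1, 12, 0)
# Six anomalies in the last six hours exceed the limit of five.
anomalies = [now - timedelta(hours=h) for h in range(1, 7)]
flapping = is_flapping(anomalies, now)
deployed = has_recent_deploy([now - timedelta(minutes=90)], now)
```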

ConvOps Watch

Detects RunningTaskCount anomalies with z-score against 30-day time-bucketed baselines. 9 verification checks before any alert.

ConvOps Diagnose

When a RunningTaskCount alarm fires, reads logs, CloudTrail, and resource state. Sends root cause + fix options to WhatsApp or Slack.

ConvOps Audit

Scans your CloudWatch setup for missing or misconfigured RunningTaskCount alarms. Free, 5-minute read-only scan.
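The z-score detection against time-bucketed baselines mentioned for ConvOps Watch can be illustrated generically. The sketch below is not ConvOps's implementation: bucketing by (weekday, hour), using the population standard deviation, and the handling of a zero-variance bucket are all assumptions. The zero-variance case matters for RunningTaskCount, because a healthy service often reports exactly its desired count for weeks.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def bucket_key(ts: datetime):
    # Assumption: one baseline bucket per (weekday, hour-of-day) pair.
    return (ts.weekday(), ts.hour)


def build_baseline(samples):
    """Map each time bucket to (mean, population std dev) of its history.

    samples: iterable of (timestamp, value) pairs, e.g. 30 days of datapoints.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[bucket_key(ts)].append(value)
    return {key: (mean(vals), pstdev(vals)) for key, vals in buckets.items()}


def z_score(baseline, ts, value):
    """Return the z-score of value against its bucket, or None if no history."""
    stats = baseline.get(bucket_key(ts))
    if stats is None:
        return None
    mu, sigma = stats
    if sigma == 0:
        # Perfectly flat history: any deviation at all is infinitely unusual.
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma
```

A strongly negative score for RunningTaskCount means fewer tasks are running than is typical for that weekday and hour; the cutoff at which a score counts as an anomaly is a tuning choice that the text above does not specify.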


Related Amazon ECS metrics

RunningTaskCount rarely fails in isolation. These metrics tend to correlate — monitor them together for complete Amazon ECS coverage.

CPUUtilization, MemoryUtilization

FAQ

Frequently asked questions about RunningTaskCount

Common questions about setting up CloudWatch alarms for RunningTaskCount in Amazon ECS.

What is the recommended CloudWatch alarm threshold for RunningTaskCount?

< desired task count for the service. Any drop below the desired count means the service is degraded (ConvOps recommendation). The desired count is the capacity you have explicitly declared, so running below it violates your own capacity model. This alarm has very few false positives: RunningTaskCount only drops below desired briefly during deployments, or during failures. Require the breach to persist for 1–2 minutes before alarming so that normal rolling deployments do not trigger it.

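Which CloudWatch namespace does RunningTaskCount belong to?

RunningTaskCount is a Container Insights metric: with Container Insights enabled on the cluster, it is published in the ECS/ContainerInsights namespace with a unit of Count. You can find it in the CloudWatch console under "Metrics" → "ECS/ContainerInsights". The default AWS/ECS namespace carries utilization metrics such as CPUUtilization and MemoryUtilization. See the Amazon ECS CloudWatch metrics reference in the AWS documentation.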

Does ConvOps automatically create CloudWatch alarms for RunningTaskCount?

ConvOps does not create alarms for you by default — it debugs the alarms you already have (or identifies missing ones). The free ConvOps Audit scans your CloudWatch setup and tells you which Amazon ECS resources are missing a RunningTaskCount alarm. ConvOps Watch then monitors RunningTaskCount using z-score anomaly detection against a 30-day baseline, running 9 verification checks before alerting you.

Can I use ConvOps without already having a RunningTaskCount alarm set up?

Yes. ConvOps Watch monitors RunningTaskCount independently of your CloudWatch alarm configuration — it reads the metric directly from CloudWatch every 5 minutes on the Growth plan. If you run the free Audit first, it will tell you which resources need a RunningTaskCount alarm and provide the copy-paste AWS CLI command to create it.