Fix EC2 Server Resource Exhaustion [Solved]


Symptoms & Diagnosis

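EC2 resource exhaustion often feels like a “battery drain,” especially on T2 or T3 burstable instances. These instances earn CPU credits while running below their baseline and spend them to burst above it. In standard mode, once the credit balance hits zero the instance is held to its baseline CPU, and performance can drop sharply enough to cause application lag or outright timeouts. T2 instances launch in standard mode by default, while T3 instances launch in unlimited mode (see Prevention).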
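
To put numbers on the “battery drain”: one CPU credit is one vCPU running at 100% for one minute, and each instance size earns credits at a fixed hourly rate. The sketch below estimates how many hours a credit balance lasts under a steady load in standard mode. The function name is hypothetical and you supply the inputs; the usage line assumes a t3.micro (2 vCPUs, 12 credits earned per hour according to AWS's published table at the time of writing, so verify the rate for your size) running at 50% average CPU.

```shell
#!/usr/bin/env bash
# Estimate how long a T-series CPU credit balance lasts under a steady load
# (standard mode). One CPU credit = one vCPU at 100% utilization for one minute.
# Arguments (all supplied by you; the function name is hypothetical):
#   $1  current CPUCreditBalance
#   $2  credits earned per hour for the instance size
#   $3  number of vCPUs
#   $4  average CPUUtilization across all vCPUs, 0-100
hours_until_throttle() {
  awk -v bal="$1" -v earn="$2" -v vcpus="$3" -v util="$4" 'BEGIN {
    spend = vcpus * (util / 100) * 60   # credits spent per hour
    net = spend - earn                  # net drain per hour
    if (net <= 0) { print "never"; exit }
    printf "%.1f\n", bal / net          # hours until the balance reaches zero
  }'
}
```

Usage: `hours_until_throttle 144 12 2 50` prints `3.0`. Two vCPUs at 50% spend 60 credits per hour, the instance earns 12 back, so a balance of 144 drains at a net 48 per hour and runs out after three hours. When the load is below the baseline the function prints `never`.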

The first sign of exhaustion is usually a spike in latency or failing health checks. SSH connections may become unresponsive, or your web server may stop serving requests even though the instance still shows as “running” in the AWS Console.

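To diagnose this, check the instance's Amazon CloudWatch metrics. Look specifically at CPUUtilization and CPUCreditBalance: if the balance is near zero on a standard-mode instance, the instance is being throttled to its baseline. (In unlimited mode, watch CPUSurplusCreditBalance instead.) To make sure I/O wait isn't the primary bottleneck, check disk activity too. DiskReadOps and DiskWriteOps only cover instance store volumes, so for EBS volumes use EBSReadOps and EBSWriteOps (Nitro-based instances) or the per-volume VolumeReadOps and VolumeWriteOps metrics in the AWS/EBS namespace. Note that CloudWatch does not report memory usage by default; that requires the CloudWatch agent.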
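
The same metrics can be pulled from the command line with the AWS CLI's `cloudwatch get-metric-statistics` command. The sketch below assumes the AWS CLI is installed and configured with credentials, and that GNU `date` is available for the relative start time. The function name `ec2_metric`, the `AWS_CLI` override (set it to `echo` to print the command instead of running it), and the instance ID are hypothetical.

```shell
#!/usr/bin/env bash
# Fetch the last N hours (default 3) of an AWS/EC2 CloudWatch metric for one
# instance at 5-minute resolution, showing the Average and Minimum per period.
# Set AWS_CLI=echo for a dry run that only prints the arguments.
ec2_metric() {
  local instance_id="$1" metric="$2" hours="${3:-3}"
  "${AWS_CLI:-aws}" cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name "$metric" \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --start-time "$(date -u -d "${hours} hours ago" +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 \
    --statistics Average Minimum \
    --output table
}
```

For example, `ec2_metric i-0123456789abcdef0 CPUCreditBalance 6` shows six hours of credit balance. On a Nitro-based instance, `ec2_metric i-0123456789abcdef0 EBSReadOps` covers the EBS read side.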

Troubleshooting Guide

When your EC2 server is struggling, you need to identify the specific resource being consumed: CPU, RAM, or Disk I/O. Start by logging into your instance via SSH and running real-time monitoring tools.

Use the following commands to identify the offending processes:

# Check overall system load and CPU usage
top

# Check memory usage in megabytes
free -m

# Check disk space availability
df -h
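
Over a lagging SSH session, a one-shot listing can be easier to read (and paste into a ticket) than the interactive `top` screen. This sketch uses the procps `ps` found on most Linux distributions. The function name is hypothetical. Keep in mind that `ps` reports `%cpu` as CPU time divided by the process's lifetime, so it can understate a sudden spike that `top` would show.

```shell
#!/usr/bin/env bash
# Print the N (default 5) heaviest processes by CPU and by resident memory.
# Requires procps ps (for --sort); head keeps the header line plus N rows.
top_processes() {
  local n="${1:-5}"
  echo "== Top ${n} by CPU =="
  ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -n "$((n + 1))"
  echo "== Top ${n} by memory (RSS, KiB) =="
  ps -eo pid,user,rss,%mem,comm --sort=-rss | head -n "$((n + 1))"
}
```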

If you find a specific process consuming too much memory, you can investigate it further or restart the service. Below is a quick reference table for common troubleshooting commands:

Command	Resource Monitored	Description
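htop	CPU/RAM	Interactive process viewer (usually requires installation).
iostat	Disk I/O	Shows per-device I/O statistics and CPU %iowait (part of the sysstat package).
netstat -tulpn	Network	Lists listening TCP and UDP ports with their owning processes (net-tools package; ss -tulpn is the modern equivalent).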
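
If `iostat` is not installed, the kernel's own counters give a quick read on whether the CPU is waiting on disk. This sketch samples the aggregate `cpu` line of /proc/stat twice and reports the share of time spent in iowait and steal over the interval. Steal is worth watching on burstable instances, because a throttled instance commonly shows high steal (the `st` column in `top`). The function name is hypothetical; the script is Linux only.

```shell
#!/usr/bin/env bash
# Sample /proc/stat twice, INTERVAL seconds apart (default 1), and print the
# percentage of CPU time spent in iowait and steal during that window.
# Fields after the "cpu" label: user nice system idle iowait irq softirq steal
# (guest time is already counted in user, so only fields 2-9 are summed).
cpu_wait_pct() {
  local interval="${1:-1}" a b
  a="$(head -n 1 /proc/stat)"
  sleep "$interval"
  b="$(head -n 1 /proc/stat)"
  printf '%s\n%s\n' "$a" "$b" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) t1 += $i; io1 = $6; st1 = $9 }
    NR == 2 { for (i = 2; i <= 9; i++) t2 += $i; io2 = $6; st2 = $9 }
    END {
      total = t2 - t1
      if (total <= 0) { print "iowait=0.0% steal=0.0%"; exit }
      io = io2 - io1; st = st2 - st1
      # proc(5) notes the iowait counter can decrease; clamp to zero.
      if (io < 0) io = 0
      if (st < 0) st = 0
      printf "iowait=%.1f%% steal=%.1f%%\n", 100 * io / total, 100 * st / total
    }'
}
```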

Resolving Memory Leaks

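If free -m shows little memory in the “available” column, or swap is heavily used, the server is running out of RAM. Dropping the page cache (below) only releases memory the kernel could already reclaim on demand, so it will not fix a leak. The lasting fix is to identify the service whose memory use keeps growing, such as Apache or MySQL, and tune or restart it. If the kernel has already started killing processes, the OOM killer logs a message you can find with dmesg.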

# Flush dirty pages, then drop the page cache, dentries, and inodes (use with caution; requires root)
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
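
To tell real memory pressure apart from a full but reclaimable page cache, look at MemAvailable, the kernel's estimate of memory that can be handed to new work without swapping. The sketch below reads it from /proc/meminfo along with swap usage, and a second helper searches the kernel log for OOM-killer messages. Both function names are hypothetical, and `dmesg` may need sudo when `kernel.dmesg_restrict` is enabled.

```shell
#!/usr/bin/env bash
# Summarize available memory and swap usage from /proc/meminfo (values are in KiB).
mem_report() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    /^SwapTotal:/    { stotal = $2 }
    /^SwapFree:/     { sfree = $2 }
    END {
      printf "MemAvailable: %d MiB of %d MiB (%.1f%%)\n", avail / 1024, total / 1024, 100 * avail / total
      if (stotal > 0) {
        printf "Swap used: %d MiB of %d MiB\n", (stotal - sfree) / 1024, stotal / 1024
      } else {
        print "Swap: none configured"
      }
    }' /proc/meminfo
}

# Show the last 10 OOM-killer lines from the kernel ring buffer.
# Prints nothing if dmesg is restricted to root; rerun with sudo in that case.
oom_events() {
  dmesg -T 2>/dev/null | grep -i -E 'out of memory|oom-kill' | tail -n 10
}
```

Running `mem_report` a few times over an hour is a cheap leak check: a steadily shrinking MemAvailable while one process's RSS keeps growing is the typical signature.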

Prevention

Preventing EC2 resource exhaustion requires a proactive approach. Start by setting up CloudWatch alarms: for example, one that notifies you when average CPU utilization stays above 80% for five minutes, and, on burstable instances, one that fires when CPUCreditBalance runs low.
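
Here is a sketch of the CPU alarm using the AWS CLI's `cloudwatch put-metric-alarm` command. With basic monitoring, EC2 metrics arrive in 5-minute periods, so one period with an average above 80 approximates “above 80% for five minutes.” The function name, the `AWS_CLI` dry-run override, the alarm naming scheme, the instance ID, and the SNS topic ARN are hypothetical; the sketch also assumes an existing SNS topic for notifications.

```shell
#!/usr/bin/env bash
# Create an alarm that fires when average CPUUtilization exceeds 80% over one
# 5-minute period and sends the notification to the given SNS topic.
# Set AWS_CLI=echo for a dry run that only prints the arguments.
create_cpu_alarm() {
  local instance_id="$1" sns_topic_arn="$2"
  "${AWS_CLI:-aws}" cloudwatch put-metric-alarm \
    --alarm-name "high-cpu-${instance_id}" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$sns_topic_arn"
}
```

Example: `create_cpu_alarm i-0123456789abcdef0 arn:aws:sns:us-east-1:111122223333:ops-alerts`. For a credit alarm, the same call with `--metric-name CPUCreditBalance`, `--comparison-operator LessThanThreshold`, and a low threshold gives warning before throttling starts.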

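Consider switching T-series instances to “Unlimited” mode. This lets the instance burst beyond its credit balance instead of being throttled to baseline, preventing the “battery drain” effect that kills performance. The trade-off is cost: sustained use above baseline is billed as surplus credits, which can add up on a consistently busy instance. T3, T3a, and T4g instances already launch in unlimited mode by default; T2 instances launch in standard mode.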
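
The credit mode can be checked and changed on a running instance with the AWS CLI. This sketch wraps `ec2 describe-instance-credit-specifications` and `ec2 modify-instance-credit-specification`. The function names and the `AWS_CLI` dry-run override are hypothetical, and the CLI is assumed to be configured with permission to modify the instance.

```shell
#!/usr/bin/env bash
# Show the current CPU credit mode (standard or unlimited) of an instance.
show_credit_mode() {
  "${AWS_CLI:-aws}" ec2 describe-instance-credit-specifications --instance-ids "$1"
}

# Switch an instance between "standard" and "unlimited" credit modes.
# Rejects any other value before calling AWS. Set AWS_CLI=echo for a dry run.
set_credit_mode() {
  local instance_id="$1" mode="$2"
  case "$mode" in
    standard|unlimited) ;;
    *) echo "mode must be standard or unlimited" >&2; return 1 ;;
  esac
  "${AWS_CLI:-aws}" ec2 modify-instance-credit-specification \
    --instance-credit-specification "InstanceId=${instance_id},CpuCredits=${mode}"
}
```

Example: `set_credit_mode i-0123456789abcdef0 unlimited`, followed by `show_credit_mode i-0123456789abcdef0` to confirm the change.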

Horizontal scaling is the more durable fix. An Auto Scaling Group (ASG) with a scaling policy can launch new instances automatically when load increases, so that no single server bears the entire burden of a traffic spike. This works best when the application is stateless and the instances sit behind a load balancer.
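
One way to attach such a policy is a target tracking policy on the predefined `ASGAverageCPUUtilization` metric, created with `autoscaling put-scaling-policy`. The ASG then adds or removes instances to keep average CPU near the target. This is a sketch assuming an existing ASG; the function name, the default target of 50%, the policy naming scheme, the ASG name `web-asg`, and the `AWS_CLI` dry-run override are hypothetical choices.

```shell
#!/usr/bin/env bash
# Attach a target tracking scaling policy that keeps the group's average CPU
# near TARGET percent (default 50). Set AWS_CLI=echo for a dry run.
add_cpu_target_tracking() {
  local asg_name="$1" target="${2:-50}"
  "${AWS_CLI:-aws}" autoscaling put-scaling-policy \
    --auto-scaling-group-name "$asg_name" \
    --policy-name "cpu-target-${target}" \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration \
    "{\"PredefinedMetricSpecification\":{\"PredefinedMetricType\":\"ASGAverageCPUUtilization\"},\"TargetValue\":${target}}"
}
```

Example: `add_cpu_target_tracking web-asg 50`. A lower target scales out earlier, trading cost for headroom.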

Finally, look at the application and its housekeeping. Rotate logs regularly with logrotate to prevent disk exhaustion, and use connection pooling for databases to avoid the overhead of opening a new connection for every request.
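
A minimal logrotate rule for an application's log files might look like the one written by this sketch. The app name `myapp`, the log path, the retention of seven daily rotations, and the function name are hypothetical. `copytruncate` lets logrotate work with applications that never reopen their log file, at the cost of possibly losing lines written between the copy and the truncate. `LOGROTATE_DIR` defaults to /etc/logrotate.d, which needs root; override it to write the file elsewhere.

```shell
#!/usr/bin/env bash
# Write a logrotate rule named NAME covering LOG_GLOB into LOGROTATE_DIR.
# The glob is written literally; heredocs do not expand wildcards.
write_logrotate_rule() {
  local name="$1" log_glob="$2" dir="${LOGROTATE_DIR:-/etc/logrotate.d}"
  cat > "${dir}/${name}" <<EOF
${log_glob} {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
}
```

From a root shell, `write_logrotate_rule myapp '/var/log/myapp/*.log'` creates the rule, and `logrotate -d /etc/logrotate.d/myapp` then reports what would be rotated without changing any logs or the state file.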
