How To Fix the Kubernetes OOMKilled Error [Solved]

Issue Details
Error Name: OOMKilled (Out of Memory)
Exit Code: 137
Common Fix: Increase memory limits or debug application memory leaks.


What is the Kubernetes OOMKilled Error?

The OOMKilled error means a container was terminated because it exceeded its allocated memory. The kill itself is performed by the Linux kernel’s “Out of Memory (OOM) Killer”, a mechanism designed to protect the operating system from crashing when memory is exhausted.

When a container reaches its memory limit, the kernel kills the process, and Kubernetes reports a status of OOMKilled with an Exit Code 137. It is one of the most common causes of service instability in production environments.

There are two variants: OOMKilled by limit (the container exceeded the limit set in its YAML) and OOMKilled at the node level (the entire host ran out of memory, and the kernel chose your container’s process to kill).
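Exit code 137 is not arbitrary: for a container killed by a signal, Kubernetes reports 128 plus the signal number, and the OOM killer uses SIGKILL (signal 9). A quick Python check of that arithmetic:

```python
import signal

# Containers killed by a signal exit with 128 + the signal number.
SIGNAL_EXIT_BASE = 128

def exit_code_for_signal(sig: signal.Signals) -> int:
    """Return the container exit code reported when a process dies from `sig`."""
    return SIGNAL_EXIT_BASE + int(sig)

# The OOM killer sends SIGKILL (9), so an OOM-killed container exits with 137.
print(exit_code_for_signal(signal.SIGKILL))  # 137
```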

Step-by-Step Solutions to Fix OOMKilled

Step 1: Confirm the Error Status

First, identify which pod is failing. Use the following command to check the status of your pods in the current namespace.

kubectl get pods

Step 2: Describe the Pod for Detailed Logs

To confirm that memory is the culprit, check the pod’s “Last State” and “Reason” fields. Look for the “Terminated” status with the “OOMKilled” reason.

kubectl describe pod [POD_NAME]
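If you prefer to script this check, kubectl get pod [POD_NAME] -o json exposes the same information machine-readably under status.containerStatuses[].lastState. A minimal sketch (the sample_pod dict below is illustrative, shaped like that JSON output, not real cluster data):

```python
def oom_killed_containers(pod: dict) -> list:
    """Return names of containers whose last termination reason was OOMKilled."""
    names = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        terminated = cs.get("lastState", {}).get("terminated", {})
        if terminated.get("reason") == "OOMKilled":
            names.append(cs["name"])
    return names

# Illustrative pod status, shaped like `kubectl get pod -o json` output.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "api",
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            }
        ]
    }
}

print(oom_killed_containers(sample_pod))  # ['api']
```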

Step 3: Monitor Real-Time Resource Usage

Use kubectl top (which requires the Metrics Server add-on to be installed in the cluster) to see how much memory your containers are currently consuming. This helps you determine whether the memory growth is sudden or a steady leak.

kubectl top pod [POD_NAME]

Step 4: Adjust Memory Limits and Requests

The most direct fix is to update your deployment configuration. Increase the memory limits and requests in your YAML file to accommodate the workload.

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"

Apply the changes using kubectl apply -f deployment.yaml. This triggers a rolling restart of the pods with the new resource constraints.
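Note that Kubernetes memory quantities use binary suffixes: 512Mi is 512 × 2^20 bytes, not 512 MB. A small sketch that converts the common suffixes to bytes (handling only Ki/Mi/Gi for brevity):

```python
# Binary suffixes used by Kubernetes resource quantities (subset for brevity).
_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def memory_to_bytes(quantity: str) -> int:
    """Convert a Kubernetes memory quantity like '512Mi' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes, e.g. '134217728'

print(memory_to_bytes("512Mi"))  # 536870912
```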

Step 5: Check for Memory Leaks

If the OOMKilled error recurs even after increasing limits, your application likely has a memory leak. Use language-specific profilers (like pprof for Go or VisualVM for Java) to analyze the heap.
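For Python services, the standard library’s tracemalloc module can show which allocation sites grew between two snapshots. A minimal sketch with a deliberately leaky handler (the leak list stands in for whatever state your application fails to release):

```python
import tracemalloc

leak = []  # stands in for state the application never releases

def handle_request():
    # Deliberate leak: each call retains ~1 MB that is never freed.
    leak.append(bytearray(1024 * 1024))

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(20):
    handle_request()
after = tracemalloc.take_snapshot()

# The top entries point at the allocation sites that grew the most.
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```

In a real service you would take the snapshots minutes apart under live traffic; steady growth at one line across snapshots is the signature of a leak.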

Step 6: Address Node-Level OOM Issues

If the error is caused by the node running out of memory, you may need to add more nodes to your cluster or upgrade your existing nodes to a higher-memory instance type.
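There is also a mitigation on the workload side: pods whose requests equal their limits for every resource are assigned the Guaranteed QoS class, which makes them the last candidates to be reclaimed under node memory pressure. A sketch of such a resources block (the specific values are illustrative):

```yaml
# requests == limits for every resource gives the pod the "Guaranteed"
# QoS class, which is reclaimed last under node memory pressure.
resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "250m"
```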