Kubernetes Pods Terminated - Exit Code 137

I was able to solve the problem.

The nodes initially had a 20 GB EBS volume on a c5.4xlarge instance type. I increased the EBS volume to 50 GB and then 100 GB, but that did not help, as I kept seeing the error below:

"Disk usage on image filesystem is at 95% which is over the high
threshold (85%). Trying to free 3022784921 bytes down to the low
threshold (80%). "

I then changed the instance type to c5d.4xlarge, which comes with 400 GB of local NVMe instance storage, and allocated 300 GB of EBS. This solved the error.

Some of the GitLab jobs were for Java applications that were consuming a lot of disk space for caches and writing a lot of logs.
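
If you want to confirm disk pressure before resizing anything, a quick check like the following can help. This is a minimal sketch using standard kubectl and df commands; the node name is a placeholder, and the image filesystem path depends on your container runtime:

# Is the kubelet reporting DiskPressure on the node?
kubectl describe node <node-name> | grep -A 10 "Conditions:"

# Recent warning events (evictions, image garbage collection) in the cluster
kubectl get events --all-namespaces --sort-by=.lastTimestamp | grep -iE "evict|disk"

# On the node itself: how full is the filesystem backing container images?
df -h /var/lib/docker    # or /var/lib/containerd, depending on the runtime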

Kubernetes 137 exit code when using SecurityContext readOnlyRootFilesystem

I managed to replicate your issue and achieve a read-only filesystem with an exception for one directory.

First, it is worth noting that you are using both solutions in your deployment: the AppArmor profile and the SecurityContext. As AppArmor is more complex and needs to be configured per node, I decided to use only the SecurityContext, which works fine on its own.

I got the error that you mentioned in the comments:

Failed to create CoreCLR, HRESULT: 0x80004005

This error doesn't say much on its own, but after some testing I found that it only occurs when the pod's filesystem is read-only: the application tries to write files but cannot do so.

The app creates some files in the /tmp directory, so the solution is to mount /tmp as a Kubernetes volume so that it is writable. In my example I used emptyDir, but you can use any other volume type as long as it supports writing. Here is the deployment configuration (note the volumeMounts and volumes added at the bottom):

apiVersion: v1
kind: Service
metadata:
  name: parser-service
spec:
  selector:
    app: parser
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
      name: http
    - port: 443
      targetPort: 443
      protocol: TCP
      name: https
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: parser-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: parser
  template:
    metadata:
      labels:
        app: parser
    spec:
      containers:
        - name: parser
          image: parser.azurecr.io/parser:latest
          ports:
            - containerPort: 80
            - containerPort: 443
          resources:
            limits:
              cpu: "1.20"
          securityContext:
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /tmp
              name: temp
      volumes:
        - name: temp
          emptyDir: {}

After exec-ing into the pod, I can see that the pod's root filesystem is mounted as read-only:

# ls   
app bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
# touch file
touch: cannot touch 'file': Read-only file system
# mount
overlay on / type overlay (ro,...)

By running kubectl describe pod {pod-name}, I can see that the /tmp directory is mounted read-write and is using the temp volume:

Mounts:
/tmp from temp (rw)

Keep in mind that if the application writes to other directories (for example, to save files), you need to mount them in the same way as /tmp; see the sketch below.
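
For example, if the app also wrote to a hypothetical /app/output directory, the same pattern would apply. This is just a sketch: the /app/output path and the output volume name are placeholders, not part of the original deployment:

          volumeMounts:
            - mountPath: /tmp
              name: temp
            - mountPath: /app/output    # hypothetical extra writable directory
              name: output
      volumes:
        - name: temp
          emptyDir: {}
        - name: output
          emptyDir: {}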

Kubernetes pod auto restart with exit 137 code

I am posting David's answer from the comments (community wiki) as it was confirmed by the OP:

If you're seeing that message it's the kernel OOM killer: your node is
out of memory. Increasing your pod's resource requests to be closer to
or equal to the resource limits can help a little bit (by keeping
other processes from getting scheduled on the same node), but if you
have a memory leak, you just need to fix that, and that's not
something that can really be diagnosed from the Kubernetes level.
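
As a concrete illustration of that advice, a container spec along these lines sets the memory request equal to the limit, so the scheduler reserves the full amount and the pod is less likely to land on an already overcommitted node. The image name and values are placeholders, not taken from the original question:

    spec:
      containers:
        - name: app
          image: registry.example.com/app:latest   # placeholder image
          resources:
            requests:
              memory: "512Mi"   # request equal to the limit reserves the memory up front
              cpu: "500m"
            limits:
              memory: "512Mi"   # exceeding this still gets the container OOM-killed (exit 137)
              cpu: "1"

This does not fix a memory leak; it only prevents other pods from being packed onto the same node and triggering node-level OOM kills.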


