Kubernetes can't start due to too many open files in system - linux

I am trying create a bunch of pods, services and deployment using Kubernetes, but keep hitting the following errors when I run the kubectl describe command.
for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container bbdb58770a848733bf7130b1b230d809fcec3062b2b16748c5e4a8b12cc0533a: [8] System error: too many open files in system\n"
I have already terminated all pods and try restarting the machine, but it doesn't solve the issue. I am not an Linux expert, so I am just wondering how shall find all the open files and close them?

You can confirm which process is hogging file descriptors by running:
lsof | awk '{print $2}' | sort | uniq -c | sort -n
That will give you a sorted list of open FD counts with the pid of the process. Then you can look up each process w/
ps -p <pid>
If the main hogs are docker/kubernetes, then I would recommend following along on the issue that caesarxuchao referenced.

Related

Is it possible to connect to running NestJS server stderr/stdout?

I'm not sure if it's possible, but I want to connect to existing application to see what logs it produce in realtime. Is it possible to do so from Linux terminal?
If you know the pid of the process (it should be in the start up logs, or retrieved via a ps aux | grep <port> call) you should be able to tail it via tail -f /proc/<pid>/fd/2. Command found from here

how to find owner of a process without ps

Running nginx alpine image. ps is not installed and do not have permission to install ps using apt-get. I have the pid of process. Is there any way I can find out who the owner of process is ?
In this case, I want to figure out who is running nginx master process.
Use ls to find the process owner in the proc directory
ls -ld /proc/816
If you have stat you can display just the owner with fancy formatting:
stat -c '%U' /proc/775
avahi
Bonus: print your user name without looking at $USER
stat -c '%U' /proc/$$
You can find all the information relative to a process in /proc/YOUR_PROCESS_ID/status where YOUR_PROCESS_IDis the PID of your process.
Therefore, you could get the owner of the process by simply running something like this:
cat /proc/YOUR_PROCESS_ID/status | grep "Uid" | cut -f 2 | id -nu
You can use docker top command to get details about all the processes running inside a docker container
Syntax
docker top <container ID or name>
How about checking from active processes list?
top
If looking for specific process name:
top | grep nginx

Too many open files in system in kubernetes cluster [duplicate]

I am trying create a bunch of pods, services and deployment using Kubernetes, but keep hitting the following errors when I run the kubectl describe command.
for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container bbdb58770a848733bf7130b1b230d809fcec3062b2b16748c5e4a8b12cc0533a: [8] System error: too many open files in system\n"
I have already terminated all pods and try restarting the machine, but it doesn't solve the issue. I am not an Linux expert, so I am just wondering how shall find all the open files and close them?
You can confirm which process is hogging file descriptors by running:
lsof | awk '{print $2}' | sort | uniq -c | sort -n
That will give you a sorted list of open FD counts with the pid of the process. Then you can look up each process w/
ps -p <pid>
If the main hogs are docker/kubernetes, then I would recommend following along on the issue that caesarxuchao referenced.

Shell script to avoid killing the process that started the script

I am very new to shell scripting, can anyone help to solve a simple problem: I have written a simple shell script that does:
1. Stops few servers.
2. Kills all the process by user1
3. Starts few servers .
This script runs on the remote host. so I need to ssh to the machine copy my script and then run it. Also Command I have used for killing all the process is:
ps -efww | grep "user1"| grep -v "sshd"| awk '{print $2}' | xargs kill
Problem1: since user1 is used for ssh and running the script.It kills the process that is running the script and never goes to start the server.can anyone help me to modify the above command.
Problem2: how can I automate the process of sshing into the machine and running the script.
I have tried expect script but do I need to have a separate script for sshing and performing these tasksor can I do it in one script itself.
any help is welcomed.
Basically the answer is already in your script.
Just exclude your script from found processes like this
grep -v <your script name>
Regarding running the script automatically after you ssh, have a look here, it can be done by a special ssh configuration
Just create a simple script like:
#!/bin/bash
ssh user1#remotehost '
someservers stop
# kill processes here
someservers start
'
In order to avoid killing itself while stopping all user's processes try to add | grep -v bash after grep -v "sshd"
This is a problem with some nuance, and not straightforward to solve in shell.
The best approach
My suggestion, for easier system administration, would be to redesign. Run the killing logic as root, for example, so you may safely TERMinate any luser process without worrying about sawing off the branch you are sitting on. If your concern is runaway processes, run them under a timeout. Etc.
A good enough approach
Your ssh login shell session will have its own pseudo-tty, and all of its descendants will likely share that. So, figure out that tty name and skip anything with that tty:
TTY=$(tty | sed 's!^/dev/!!') # TTY := pts/3 e.g.
ps -eo tty=,user=,pid=,cmd= | grep luser | grep -v -e ^$TTY -e sshd | awk ...
Almost good enough approaches
The problem with "almost good enough" solutions like simply excluding the current script and sshd via ps -eo user=,pid=,cmd= | grep -v -e sshd -e fancy_script | awk ...) is that they rely heavily on the accident of invocation. ps auxf probably reveals that you have a login shell in between your script and your sshd (probably -bash) — you could put in special logic to skip that, too, but that's hardly robust if your script's invocation changes in the future.
What about question no. 2? (How can I automate sshing...?)
Good question. Off-topic. Try superuser.com.

How to log the Ram and CPU usage for Linux Processes

How would I track the CPU and Ram usage for a process that may run, stop, and then re-run with a different PID?
I am looking to track this information for all processes on a Linux server but the problem is when the process stops and restarts, it will have a different PID and I am not sure how to identify it as the same process.
What you're looking for here is called "process accounting".
http://tldp.org/HOWTO/Process-Accounting/
If you know the command of the process, just pipe it to a grep like this:
ps ux | grep yourcommandgoeshere
You can setup a crontab to record output of commands like
top -b -n1 | grep
ps ux | grep
Alternatively, you can use sealion service. By simply installing agent and configuring it according to your needs in simple steps, you can see output of the executed commands online.
Hope it helps...

Resources