I have a machine with a minimal single-user OS based on 64-bit Fedora 24:
Vendor: Acer Veriton VN4640G
CPU: Intel(R) Core(TM) i5-6400T CPU @ 2.20GHz
RAM: 4GB DDR4 2133 MHz
Storage: 32GB 2.5" ADATA SP600
I wrote a simple script, /root/test.sh, which runs 10000 processes in the background:
ulimit -a > /tmp/ulimit
i=1
while [ $i -le 10000 ]; do
    echo $i
    sleep 60 & disown
    i=$(( $i + 1 ))
done
When I run this script directly from the console, it runs 10000 sleep processes and prints the numbers as expected.
# bash test.sh
1
2
...
9999
10000
# ps ax | grep -c [s]leep
10000
The ulimit output looks fine:
# cat /tmp/ulimit
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15339
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15339
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
BUT
if I run this script via cron (/etc/cron.d/custom), e.g.
0 8 * * * root bash /root/test.sh
I see messages like these in journalctl -e -o cat:
(root) CMDOUT (494)
(root) CMDOUT (495)
(root) CMDOUT (496)
(root) CMDOUT (/root/test.sh: fork: retry: Resource temporarily unavailable)
(root) CMDOUT (/root/test.sh: fork: retry: Resource temporarily unavailable)
(root) CMDOUT (/root/test.sh: fork: retry: Resource temporarily unavailable)
(root) CMDOUT (/root/test.sh: fork: retry: Resource temporarily unavailable)
(root) CMDOUT (/root/proc.sh: fork: Resource temporarily unavailable)
So it runs only about 500 processes and then cannot fork any more, even though there are still enough free resources and the user limits are the same as in the console case.
# free -h
total used free shared buff/cache available
Mem: 3,8G 472M 2,8G 62M 498M 3,0G
Swap: 0B 0B 0B
The count of running sleeps is always the same. Is there any resource limit for tasks run from cron?
P.S.: I also ran the test on a full Fedora 24 install, and the result is the same...
Well, I found a solution while writing this question.
The main pointer to the problem was a message I had once seen in journalctl:
kernel: cgroup: fork rejected by pids controller in /system.slice/crond.service
So I checked crond.service and found the TasksMax parameter.
# systemctl show crond.service
Type=simple
Restart=no
...
TasksMax=512
EnvironmentFile=/etc/sysconfig/crond (ignore_errors=no)
UMask=0022
LimitCPU=18446744073709551615
LimitCPUSoft=18446744073709551615
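To confirm that the limit really applies to the service's cgroup, the pids controller can be inspected directly. This is a hedged sketch; the path assumes the cgroup v1 pids hierarchy that Fedora 24 mounts, where pids.max is the enforced limit and pids.current is the number of tasks currently counted against it:
# systemctl show -p TasksMax crond.service
# cat /sys/fs/cgroup/pids/system.slice/crond.service/pids.max
# cat /sys/fs/cgroup/pids/system.slice/crond.service/pids.current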
Solution
Add the TasksMax parameter to the service configuration in /usr/lib/systemd/system/crond.service, e.g.:
Note: As Mark Plotnick wrote, a better way is to copy this unit file into /etc/systemd/system/ and modify it there, so that the unit in /usr/ is not overwritten during an upgrade (a drop-in sketch follows the reload step below).
# cat /usr/lib/systemd/system/crond.service
[Unit]
Description=Command Scheduler
After=auditd.service nss-user-lookup.target systemd-user-sessions.service time-sync.target ypbind.service
[Service]
EnvironmentFile=/etc/sysconfig/crond
ExecStart=/usr/sbin/crond -n $CRONDARGS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
TasksMax=100000
[Install]
WantedBy=multi-user.target
Then reload the systemd configuration:
# systemctl daemon-reload
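Alternatively, a sketch of the drop-in approach hinted at in the note above: systemctl edit creates an override file under /etc/systemd/system/crond.service.d/ and leaves the packaged unit untouched.
# systemctl edit crond.service
(in the editor that opens, add the following and save)
[Service]
TasksMax=100000
# systemctl restart crond.service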
General solution
If you want to avoid this problem for any systemd service, you can change the default value in /etc/systemd/system.conf, e.g.:
sed -i 's/#DefaultTasksMax=512/DefaultTasksMax=10000/' /etc/systemd/system.conf
And reload the systemd configuration to apply the change:
# systemctl daemon-reload
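To verify that the new default was picked up, the manager property can be queried (a sketch; DefaultTasksMax is exposed on systemd 228 and later):
# systemctl show | grep DefaultTasksMax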
But I don't know the exact consequences of this solution, so I cannot recommend it.
Related
I am facing a problem when running microservices on Kubernetes (K8s).
Sometimes this error is shown:
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
I checked some configuration on the worker node (all processes run as root): ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 385975
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1048576
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
and cat /proc/sys/kernel/threads-max
771951
Memory is good
free -m
total used free shared buff/cache available
Mem: 96517 35835 45218 10 15462 60126
Swap: 0 0
The limits on each process are unlimited.
cat /proc/440583/limits | grep processes
Max processes unlimited processes
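One limit that ulimit does not show is the pids cgroup controller, which (as in the cron case above) can reject new processes and threads on its own. A hedged sketch for checking it, assuming cgroup v1 on the worker node and reusing PID 440583 from above:
pid=440583                                                       # the JVM process inspected above
cgpath=$(awk -F: '$2 == "pids" {print $3}' /proc/$pid/cgroup)    # its pids cgroup path
cat /sys/fs/cgroup/pids${cgpath}/pids.current /sys/fs/cgroup/pids${cgpath}/pids.max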
I created a script to count the number of threads and write it to a file:
while true; do ps -eo nlwp | tail -n +2 | awk '{ num_threads += $1 } END { print num_threads }' >> test.txt; date >> test.txt; sleep 10; done &
This is the log around the time of the error:
Mon Jun 21 13:18:48 2021
34323
Mon Jun 21 13:18:52 2021
34325
Mon Jun 21 13:18:58 2021
34324
Mon Jun 21 13:19:02 2021
11945
Mon Jun 21 13:19:10 2021
11979
(the error occurred at 13:18)
All microservices have healthy heap usage.
It seems that ulimit or threads-max is not working as expected.
Could anyone help me?
Description
Recently I've run into a problem. I am not able to run yarn start in the element-web directory; I get these errors. Originally I thought it had something to do with element-web itself, so I created an issue. Some time after that I tried to run wintersmith preview in the bibviz directory and got the same errors. This was weird, so I tried to create an Angular project and run ng serve, and got the errors again. I headed back to the issue to close it, as it wasn't an element-web issue, and found that another issue had already been created with the same problem. It had been closed by turt2live with the comment that it looks like you've run out of memory on your system. Based on this I tried to turn off most of the programs running in the background, and now all the commands work.
I am sure that ng serve used to work in the past.
My PC has 16 GB of RAM and the commands already fail when I am on 7/16 GB. I can't see any memory spikes when running the commands. Running the commands with sudo also completely eliminates the problem. This doesn't make any sense to me.
Research led me to ulimits, but they seem to have no effect. I have also installed watchman, with no effect.
Can someone tell me what I am missing?
Thank you in advance!
Info
I am on Debian 11 Bullseye. This is the output of a few commands that could be useful.
As a regular user:
> uname -a
Linux Simon-s-PC 5.8.0-3-amd64 #1 SMP Debian 5.8.14-1 (2020-10-10) x86_64 GNU/Linux
> sudo sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 524288
> ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 46482
-n: file descriptors 8192
-l: locked-in-memory size (kbytes) unlimited
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 63664
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 95
-N 15: unlimited
> yarn --version
1.22.5
With sudo su:
> sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 524288
> ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 63664
-n: file descriptors 1024
-l: locked-in-memory size (kbytes) 2043392
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 63664
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited
I think I've found a solution:
Set limits in /etc/sysctl.conf by adding:
fs.inotify.max_user_watches=524288
fs.inotify.max_user_instances=512
Open a new terminal or reload the sysctl.conf variables with
sudo sysctl --system
Run yarn start
Everything should work fine now, hopefully. If it doesn't, try setting the limits higher.
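If you want to see how close you are to the inotify limits before (or instead of) raising them, the instances in use can be counted from /proc; this is a sketch, and as a regular user it only counts processes you are allowed to inspect:
find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
sudo sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches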
Using Linux
$ uname -r
4.4.0-1041-aws
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
With limits allowing up to 200k processes
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 563048
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 524288
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
$ cat /proc/sys/kernel/pid_max
200000
$ cat /proc/sys/kernel/threads-max
1126097
And enough free memory to give 1MB each to 127k processes
$ free
total used free shared buff/cache available
Mem: 144156492 5382168 130458252 575604 8316072 137302624
Swap: 0 0 0
And I have fewer than 1k existing processes/threads.
$ ps -elfT | wc -l
832
But I cannot start 50k processes
$ echo '
seq 50000 | while read _; do
sleep 20 &
done
' | bash
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
...
Why can't I create 50k processes?
It was caused by systemd.
In addition to kernel.pid_max and ulimit, I also needed to change a third limit.
/etc/systemd/logind.conf
[Login]
UserTasksMax=70000
And then restart.
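To confirm that the new value is actually in effect, the per-user slice can be queried (a sketch; UserTasksMax from logind.conf is applied as TasksMax on user-<uid>.slice):
$ systemctl show "user-$(id -u).slice" -p TasksMax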
Building on @Basile's answer, you probably ran out of PIDs.
cat /proc/sys/kernel/pid_max gives me 32768 on my machine (the kernel's default, 2^15), which is less than 50k.
EDIT: I missed that /proc/sys/kernel/pid_max is set to 200000. That probably isn't the issue in this case.
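For reference, if pid_max ever is the bottleneck, it can be raised at runtime (a sketch; 4194304 is the maximum on 64-bit kernels, and the change does not survive a reboot unless it is also added to sysctl.conf):
$ sudo sysctl -w kernel.pid_max=4194304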
Because each process requires some resources: some RAM (including some kernel memory), some CPU, etc.
Each process has its own virtual address space, including its own call stack (and some of it requires physical resources, including several pages of RAM; read more about resident set size; on my desktop the RSS of a bash process is about 6 MB). So a process is actually quite a heavyweight object.
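(You can check this on your own machine; this prints the resident set size of your current shell, in kilobytes:)
$ ps -o rss= -p $$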
BTW, this is not specific to Linux.
Read more about operating systems, e.g. Operating Systems : Three Easy Pieces
Try also cat /proc/$$/maps and cat /proc/$$/status and read more about proc(5). Read about the failure modes of fork(2) and execve(2). The "Resource temporarily unavailable" message corresponds to EAGAIN (see errno(3)), and several reasons can make fork fail with EAGAIN. And on my system, cat /proc/sys/kernel/pid_max gives 32768 (and reaching that limit makes fork fail with EAGAIN).
BTW, imagine if you could fork ten thousand processes. Then the context switch time would be dominant with respect to the actual running time.
Your Linux system looks like some AWS instance. Amazon won't let you create that many processes, because their hardware is not provisioned for that.
(On some costly supercomputer or server with, e.g., a terabyte of RAM and a hundred cores, perhaps you could run 50k processes; I guess that would need a particular kernel or kernel configuration. I recommend getting help from Amazon support.)
I have changed /etc/security/limits.conf and rebooted the machine remotely. However, after the boot, the nproc parameter still has the old value.
[ost#compute-0-1 ~]$ cat /etc/security/limits.conf
* - memlock -1
* - stack -1
* - nofile 4096
* - nproc 4096 <=====================================
[ost#compute-0-1 ~]$
Broadcast message from root@compute-0-1.local
(/dev/pts/0) at 19:27 ...
The system is going down for reboot NOW!
Connection to compute-0-1 closed by remote host.
Connection to compute-0-1 closed.
ost#cluster:~$ ssh compute-0-1
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Last login: Tue Sep 27 19:25:25 2016 from cluster.local
Rocks Compute Node
Rocks 6.1 (Emerald Boa)
Profile built 19:00 23-Aug-2016
Kickstarted 19:08 23-Aug-2016
[ost#compute-0-1 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 516294
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1024 <=========================
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Note that I set max user processes to 4096, but after the reboot the value is still 1024.
Please take a look at the file /etc/pam.d/sshd.
If it exists, open the file and insert the following line:
session required pam_limits.so
Then the new value will be effective even after rebooting.
PAM is the framework that handles authentication and session setup, and pam_limits is the module that applies the limits from /etc/security/limits.conf, so you need to enable it for SSH logins.
More details are in man pam_limits.
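To check whether pam_limits is already referenced somewhere in the PAM configuration before adding the line, a quick grep does the job (a sketch):
# grep -r pam_limits /etc/pam.d/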
Thanks!
How can I debug the following points, to find out exactly which resource is exceeding its limit?
How many processes are currently running
How many processes are running per user
Number of open files per process
Total number of open files for all processes
Process limit and open file limit
There are multiple ways to go about what you are trying to achieve; e.g., you could get all the information you need from the /proc filesystem. Below is a list of utilities you could use to debug the actual resource issue.
Good luck.
How many processes are currently running
ps -eaf | wc -l
How many processes are running per user
ps -fu [username] | wc -l
Number of open files per process
lsof -p <pid> | wc -l
Total number of open files for all processes
You could iterate over all the PIDs as shown above and use the lsof command, as sketched below. You might have to execute the command as root, otherwise lsof will report permission denied for other users' processes.
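For instance, something along these lines (a sketch; run it as root, and treat the total as an approximation, since lsof also lists memory-mapped files):
for p in $(ps -eo pid=); do lsof -p "$p" 2>/dev/null | tail -n +2; done | wc -l
The kernel's own system-wide counters are in /proc/sys/fs/file-nr (allocated handles, free handles, maximum).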
Process limit and open file limit
For a specific shell, you could run:
$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15973
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15973
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
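Besides these per-shell values, the system-wide ceilings can be read straight from /proc (a sketch):
cat /proc/sys/kernel/pid_max      # largest PID, effectively caps the number of processes
cat /proc/sys/kernel/threads-max  # system-wide thread limit
cat /proc/sys/fs/file-max         # system-wide open file limit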