Maximum number of threads per process in Linux? - linux

What is the maximum number of threads that can be created by a process under Linux?
How (if possible) can this value be modified?

Linux doesn't have a separate threads-per-process limit, just a limit on the total number of processes on the system (threads are essentially just processes with a shared address space on Linux), which you can view like this:
cat /proc/sys/kernel/threads-max
The default is the number of memory pages/4. You can increase this like:
echo 100000 > /proc/sys/kernel/threads-max
There is also a limit on the number of processes (and hence threads) that a single user may create, see ulimit/getrlimit for details regarding these limits.
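For the per-user limit mentioned above, here is a minimal Python sketch that queries RLIMIT_NPROC through the standard resource module (the same limit that ulimit -u reports). The helper name nproc_limits is made up for this example, and it assumes a Unix-like system where the resource module is available.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) RLIMIT_NPROC pair; None means unlimited."""
    def normalize(value):
        return None if value == resource.RLIM_INFINITY else value

    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return normalize(soft), normalize(hard)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print("max user processes: soft =", soft, "hard =", hard)
```

On Linux this limit counts threads as well as processes, so it can be hit long before threads-max is.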

It is wrong to say that Linux doesn't have a separate threads-per-process limit.
Linux implements a maximum number of threads per process indirectly:
number of threads = total virtual memory / (stack size*1024*1024)
Thus, the number of threads per process can be increased by increasing total virtual memory or by decreasing stack size. But decreasing the stack size too much can lead to code failure due to stack overflow, while the maximum virtual memory is equal to the swap memory.
Check your machine:
Total Virtual Memory: ulimit -v (default is unlimited, thus you need to increase swap memory to increase this)
Total Stack Size: ulimit -s (default is 8 MB)
Command to increase these values:
ulimit -s newvalue
ulimit -v newvalue
*Replace newvalue with the value you want to set as the limit.
References:
http://dustycodes.wordpress.com/2012/02/09/increasing-number-of-threads-per-process/
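The formula above is plain integer division, so it is easy to try with a few numbers. The sketch below is only a rough upper bound under the assumption that thread stacks are the only consumer of address space; the function name max_threads_by_stack is invented for this example.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def max_threads_by_stack(virtual_memory_bytes, stack_size_bytes):
    """Upper bound on threads if stacks were the only users of address space."""
    if stack_size_bytes <= 0:
        raise ValueError("stack size must be positive")
    return virtual_memory_bytes // stack_size_bytes


if __name__ == "__main__":
    # 3 GiB of user address space with the common 8 MiB default stack
    print(max_threads_by_stack(3 * GIB, 8 * MIB))
    # the same address space with 1 MiB stacks
    print(max_threads_by_stack(3 * GIB, 1 * MIB))
```

Real programs will see fewer threads than this, because code, heap, and shared libraries also take address space.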

In practical terms, the limit is usually determined by stack space. If each thread gets a 1 MB stack (I can't remember if that is the default on Linux), then a 32-bit system will run out of address space after about 3000 threads (assuming that the last GB is reserved for the kernel).
However, you'll most likely experience terrible performance if you use more than a few dozen threads. Sooner or later, you get too much context-switching overhead, too much overhead in the scheduler, and so on. (Creating a large number of idle threads does little more than eat a lot of memory. But a lot of threads with actual work to do will slow you down as they fight for the available CPU time.)
What are you doing where this limit is even relevant?

proper 100k threads on linux:
ulimit -s 256
ulimit -i 120000
echo 120000 > /proc/sys/kernel/threads-max
echo 600000 > /proc/sys/vm/max_map_count
echo 200000 > /proc/sys/kernel/pid_max
./100k-pthread-create-app
2018 update from @Thomas, on systemd systems:
/etc/systemd/logind.conf: UserTasksMax=100000
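Before changing any of the knobs above, it can help to see where they currently stand. This is a small Python sketch, not an official tool: it reads the three sysctl files touched in the recipe and lists which ones are below a target thread count. The names read_knobs and knobs_below are hypothetical, and the comparison is deliberately crude (max_map_count in particular needs headroom of several mappings per thread, which is why the recipe sets it much higher than the thread count).

```python
import os

# sysctl files written by the recipe, relative to the proc mount point
KNOBS = {
    "threads-max": "sys/kernel/threads-max",
    "pid_max": "sys/kernel/pid_max",
    "max_map_count": "sys/vm/max_map_count",
}


def read_knobs(proc_root="/proc"):
    """Return {name: int or None} for each knob; None if it cannot be read."""
    values = {}
    for name, relative in KNOBS.items():
        try:
            with open(os.path.join(proc_root, relative)) as handle:
                values[name] = int(handle.read().split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None
    return values


def knobs_below(target, values):
    """Names of readable knobs whose value is smaller than the target."""
    return sorted(n for n, v in values.items() if v is not None and v < target)


if __name__ == "__main__":
    current = read_knobs()
    print(current)
    print("below 100000:", knobs_below(100000, current))
```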

@dragosrsupercool
Linux doesn't use the virtual memory to calculate the maximum number of threads, but the physical RAM installed on the system:
max_threads = totalram_pages / (8 * 8192 / 4096);
http://kavassalis.com/2011/03/linux-and-the-maximum-number-of-processes-threads/
kernel/fork.c
/* The default maximum number of threads is set to a safe
* value: the thread structures can take up at most half
* of memory.
*/
max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
So the thread maximum differs from system to system, because the amount of installed RAM differs. As far as I know, Linux doesn't need the virtual memory to be increased, because on 32-bit we get 3 GB for user space and 1 GB for the kernel, and on 64-bit we get 128 TB of virtual memory. That need does arise on Solaris, where you have to add swap space if you want to increase the virtual memory.

To retrieve it:
cat /proc/sys/kernel/threads-max
To set it:
echo 123456789 | sudo tee -a /proc/sys/kernel/threads-max
123456789 = # of threads

Thread count limit:
$ cat /proc/sys/kernel/threads-max
How it is calculated:
max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
and:
x86_64 page size (PAGE_SIZE) is 4K;
Like all other architectures, x86_64 has a kernel stack for every active thread. These thread stacks are THREAD_SIZE (2*PAGE_SIZE) big;
for mempages :
cat /proc/zoneinfo | grep spanned | awk '{totalpages=totalpages+$2} END {print totalpages}';
So the number is actually not related to the limit on the thread stack size (ulimit -s).
P.S.: the thread stack size limit is 10 MB in my RHEL VM, so with 1.5 GB of memory, can this VM only afford 150 threads?
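To make the calculation above concrete, here is a small Python sketch of the quoted formula. The defaults (THREAD_SIZE of 8 KiB, PAGE_SIZE of 4 KiB) are the values given in this answer, not something read from the running kernel, and newer kernels may compute the default differently. The function name default_max_threads is invented for the example.

```python
import os


def default_max_threads(mempages, thread_size=8192, page_size=4096):
    """max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE), integer math."""
    return mempages // (8 * thread_size // page_size)


if __name__ == "__main__":
    # Physical page count as reported by sysconf (Linux-specific names).
    pages = os.sysconf("SC_PHYS_PAGES")
    page_size = os.sysconf("SC_PAGE_SIZE")
    print("physical pages:", pages)
    print("estimated default:", default_max_threads(pages, 2 * page_size, page_size))
```

With 1 GiB of RAM and 4 KiB pages there are 262144 pages, which gives 16384 by this formula, regardless of what ulimit -s says.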

For anyone looking at this now, on systemd systems (in my case, specifically Ubuntu 16.04) there is another limit enforced by the cgroup pids.max parameter.
This is set to 12,288 by default, and can be overridden in /etc/systemd/logind.conf
Other advice still applies, including pid_max, threads-max, max_map_count, ulimits, etc.

Check the stack size per thread with ulimit; in my case, on Red Hat Linux 2.6:
ulimit -a
...
stack size (kbytes, -s) 10240
Each of your threads will get this amount of memory (10 MB) assigned for its stack. With a 32-bit program and a maximum address space of 4 GB, that is a maximum of only 4096 MB / 10 MB = 409 threads! Subtracting program code and heap space will probably lead to an observed maximum of around 300 threads.
You should be able to raise this by compiling and running on 64-bit, or by setting ulimit -s 8192 or even ulimit -s 4096. Whether this is advisable is another discussion...

It probably shouldn't matter. You are going to get much better performance designing your algorithm to use a fixed number of threads (eg, 4 or 8 if you have 4 or 8 processors). You can do this with work queues, asynchronous IO, or something like libevent.

Use a non-blocking I/O (nbio) library or whatever, if you need more threads for doing I/O calls that block.

It depends on your system. Just write a sample program [ by creating threads in a loop ] and check using ps axo pid,ppid,rss,vsz,nlwp,cmd. When it can no longer create threads, check the nlwp count [ nlwp is the number of threads ]. Voilà, you have a foolproof answer instead of going through books.
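A sample program of that kind could look like the following Python sketch. It starts idle threads until creation fails or a cap is reached, then returns how many it managed. The name spawn_until_failure, the cap, and the 256 KiB stack size are choices made for this example; run it with a large cap only on a machine you don't mind stressing.

```python
import threading


def spawn_until_failure(cap, stack_size=256 * 1024):
    """Start idle threads until creation fails or cap is reached."""
    release = threading.Event()
    previous = threading.stack_size(stack_size)  # applies to new threads
    threads = []
    try:
        for _ in range(cap):
            worker = threading.Thread(target=release.wait, daemon=True)
            try:
                worker.start()
            except RuntimeError:
                # CPython raises RuntimeError when the OS refuses a new thread
                break
            threads.append(worker)
        return len(threads)
    finally:
        release.set()                    # let every idle thread finish
        for worker in threads:
            worker.join()
        threading.stack_size(previous)   # restore the earlier setting


if __name__ == "__main__":
    print("threads created:", spawn_until_failure(1000))
```

While it is running, the nlwp column of the ps command above should show the same count (plus the main thread).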

To set permanently,
vim /etc/sysctl.conf
and add
kernel.threads-max = "value"

I think we missed another restriction that can also block new thread creation: the kernel.pid_max limit.
root@myhost:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.7 LTS
Release: 16.04
Codename: xenial
root@myhost:~# uname -a
Linux myhost 4.4.0-190-generic #220-Ubuntu SMP Fri Aug 28 23:02:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
I find that, at least on my system, this kernel.pid_max threshold is 32768. When I launch any simple JVM process, it reports an error like the one below:
java/jstack/jstat ...
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# An error report file with more information is saved as:
# /root/hs_err_pid1390.log
Checking the memory shows it is sufficient:
root@lascorehadoop-15a32:~# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         11G         41G        1.2G         72G        111G
Swap:            0B          0B          0B
Check the number of threads on the system:
~# ps -eLf|wc -l
31506
Then check the system limits with ulimit:
root@myhost:~# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 515471
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 98000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 515471
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
From the ulimit output, we can see that the current thread count is far below the maximum user process limit.
In fact, the limit being reached is kernel.pid_max.
It is very easy to check and tune:
https://www.cyberciti.biz/tips/howto-linux-increase-pid-limits.html
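The ps -eLf | wc -l check can also be done by walking /proc directly. The Python sketch below counts every task (thread) on the system so the number can be compared with kernel.pid_max. The helper names count_tasks and read_pid_max are made up for this example, and the proc_root parameter exists only so the function can be pointed at a different directory.

```python
import os


def count_tasks(proc_root="/proc"):
    """Count threads system-wide by listing /proc/<pid>/task for every PID."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            total += len(os.listdir(os.path.join(proc_root, entry, "task")))
        except OSError:
            continue  # the process exited while we were scanning
    return total


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return kernel.pid_max as an int, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("tasks:", count_tasks(), "pid_max:", read_pid_max())
```

Every thread consumes an ID from the same pool as process IDs, so a task count close to pid_max means new threads can fail even when ulimit -u looks generous.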

We can see the maximum number of threads defined in the following file in linux
cat /proc/sys/kernel/threads-max
(OR)
sysctl -a | grep threads-max

You can see the current value by the following command-
cat /proc/sys/kernel/threads-max
You can also set the value like
echo 100500 > /proc/sys/kernel/threads-max
The value you set is checked against the available RAM pages. If the thread structures would occupy more than 1/8 of the available RAM pages, threads-max is reduced accordingly.

Yes, to increase the number of threads you need to increase the virtual memory or decrease the stack size. On the Raspberry Pi I didn’t find a way to increase the virtual memory. If I decrease the stack size from the default 8 MB to 1 MB, it is possible to get more than 1000 threads per process, but decreasing the stack size with the “ulimit -s” command applies to all threads. So my solution was to use “pthread_t” instead of the “thread class”, because pthread_t lets me set the stack size for each thread. In the end, I was able to achieve more than 1000 threads per process on the Raspberry Pi, each one with 1 MB of stack.
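The same idea of choosing the stack size in the program rather than through ulimit -s can be sketched in Python with threading.stack_size. Note that this is a stand-in for the pthread approach described above (in C the stack size goes into the thread attributes via pthread_attr_setstacksize), and that threading.stack_size is a process-wide setting for threads created afterwards, so the hypothetical helper below sets it just before starting the thread and restores it right after.

```python
import threading


def run_with_stack(target, stack_bytes):
    """Run target() in a thread created with the given stack size."""
    result = []
    previous = threading.stack_size(stack_bytes)  # used by threads started next
    try:
        worker = threading.Thread(target=lambda: result.append(target()))
        worker.start()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    worker.join()
    return result[0]


if __name__ == "__main__":
    one_mib = 1024 * 1024
    print(run_with_stack(lambda: sum(range(10)), one_mib))
```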

If you are using SUSE, you need to use one of these methods:
https://www.suse.com/support/kb/doc/?id=000015901
Globally:
/etc/systemd/system.conf
DefaultTasksMax=Value
For a specific service (here sshd):
/etc/systemd/system/sshd.service.d/override.conf
TasksMax=Value

Related

Why can I create more threads than "free memory" / "thread stack size"?

In Linux the maximum number of threads is defined as max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);, and can be retrieved by calling cat /proc/sys/kernel/threads-max. This returns around 14,000 for my Raspberry Pi 3. However, when I just create threads in a loop with pthread_create() (the threads are empty), I can create only 250 before I get ENOMEM (Cannot allocate memory).
Now I looked at the default stack that is allocated to a process or thread, and that is 8192k. So at around 250 threads I would be using 2 GB of memory. However, in my mind this also does not add up, because calling free -m shows I have a total of 1 GB of memory.
Since I have 1 GB of RAM, I expected to be able to create only 125 threads at maximum, not 250, and not 14,000.
Why can I create 250 threads?
By default, Linux performs memory overcommit. This means that you can allocate more anonymous, writable memory than there is physical memory.
You can turn off memory overcommit using:
# sysctl vm.overcommit_memory=2
This will cause some workloads to fail which work perfectly fine in vm.overcommit_memory=0 mode. Some details can be found in the overcommit accounting documentation.
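To put numbers on this, here is a short Python sketch that computes how much virtual memory the stacks in the question reserve and reads the current overcommit mode. The mode descriptions follow the kernel's documented meaning of 0, 1 and 2; the function names are invented for the example.

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit beyond the commit limit",
}


def reserved_mib(threads, stack_kib):
    """Virtual memory reserved for thread stacks, in MiB."""
    return threads * stack_kib / 1024


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the overcommit mode as an int, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def describe(mode):
    return OVERCOMMIT_MODES.get(mode, "unknown")


if __name__ == "__main__":
    print("250 stacks of 8192k reserve", reserved_mib(250, 8192), "MiB")
    print("overcommit mode:", describe(read_overcommit_mode()))
```

The stacks are only reserved address space until a thread actually touches the pages, which is why the reservation can exceed the 1 GB of physical RAM.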

How do I package a Go program so that it is self sufficient?

I have a Go Program and I want to run it on a switch. Since I cannot install Go on the switch itself, I just copy the executable and try to run. But I get the following error.
runtime: panic before malloc heap initialized
fatal error: runtime: cannot reserve arena virtual address space
runtime stack:
runtime.throw(0x8149b8b)
/usr/local/go/src/pkg/runtime/panic.c:520 +0x71
runtime.mallocinit()
/usr/local/go/src/pkg/runtime/malloc.goc:552 +0xf2
runtime.schedinit()
/usr/local/go/src/pkg/runtime/proc.c:150 +0x3a
_rt0_go()
/usr/local/go/src/pkg/runtime/asm_386.s:95 +0xf6
How do I package the Go executable with all its dependencies?
EDIT 1: Here is the ulimit -a dump.
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 40960
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) 395067
file locks (-x) unlimited
TL;DR
Your Go app is not able to allocate the virtual memory it needs to run. I've never developed for a switch before, but if it's running Linux or a Unix variant, check group/user permissions and ulimit values to see whether that user has any kind of restriction. Maybe this question might be of help
Longer version
So, your problem here is not that Go can't run without the Go development environment, because you really don't need it. Go is known for generating static binaries that are by definition self-contained and don't depend on other libraries to run.
If you take a better look at your error message, you'll notice that it says:
"cannot reserve arena virtual address space"
You might be asking yourself "what is this arena?"
A quick look at malloc's source code gives us a hint:
Set up the allocation arena, a contiguous area of memory where
allocated data will be found. The arena begins with a bitmap large
enough to hold 4 bits per allocated word.
If you go through that source code you'll find your error message around here.
The runtime·SysReserve C function is the one that actually tries to reserve the virtual address space for the arena. If it can't allocate that, it will throw that error.
You can find code for the Linux implementation of it here.
As Go normally tries to avoid big allocations, since they might fail right away, if your user can't allocate something as small as 64K, it means your user has tight restrictions. As I have no idea which OS your switch is running and have no experience developing for them, I can't go any further than this.
If you can provide more information, I can try to update this answer accordingly.

How to limit memory usage for a single Linux process without killing the process

How can I limit memory usage for a single Linux process without killing the process?
I know ulimit can limit memory usage, but if the process exceeds the limit, it will be killed.
Is there any other command or shell feature that can limit memory usage without killing the process?
Another way besides setrlimit, which can be set using the ulimit utility:
$ ulimit -Sv 500000 # Set ~500 MB limit
is to use Linux's control groups, because it limits a process's (or group of processes') allocation of physical memory distinctly from virtual memory. For example:
$ cgcreate -g memory:/myGroup
$ echo $(( 500 * 1024 * 1024 )) > /sys/fs/cgroup/memory/myGroup/memory.limit_in_bytes
$ echo $(( 5000 * 1024 * 1024 )) > /sys/fs/cgroup/memory/myGroup/memory.memsw.limit_in_bytes
will create a control group named "myGroup" and cap the set of processes run under myGroup at 500 MB of physical memory and at 5000 MB of physical memory plus swap combined. To run a process under the control group:
$ cgexec -g memory:myGroup COMMAND
Note: From what I understand, setrlimit limits the virtual memory, whereas with cgroups you can limit the physical memory.
I believe you are wrong in thinking that a limit set with setrlimit(2) will always kill the process.
Indeed, if the stack space is exceeded (RLIMIT_STACK), the process would be killed (with SIGSEGV).
But if it is heap memory (RLIMIT_DATA or RLIMIT_AS), mmap(2) would fail. If it has been called from malloc(3) or friends, that malloc would fail.
Some Linux systems are configured with memory overcommit. This is a sysadmin issue: echo 0 > /proc/sys/vm/overcommit_memory
The moral of the story is that you should always check result of malloc, at least like
struct mystruct_st *ptr = malloc(sizeof(struct mystruct_st));
if (!ptr) { perror("mystruct allocation"); exit(EXIT_FAILURE); }
Of course, sophisticated applications could handle "out-of-memory" conditions more wisely, but it is difficult to get right.
Some people incorrectly assume that malloc does not fail. That is their mistake: they then dereference a NULL pointer and get the SIGSEGV that they deserve.
You could consider catching SIGSEGV with some processor-specific code, see this answer. Unless you are a guru, don't do that.
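The point that an RLIMIT_AS limit makes allocations fail rather than killing the process can be tried out with a short Python sketch. It runs a child interpreter, lowers the child's address-space limit to 512 MiB (an arbitrary value picked for this example), and attempts an allocation twice that size. In Python a failed allocation surfaces as MemoryError, the counterpart of malloc returning NULL. The names CHILD and run_limited_child are hypothetical, and this assumes Linux, where RLIMIT_AS is enforced.

```python
import subprocess
import sys

# Code executed in a separate interpreter so the limit never affects the parent.
CHILD = r"""
import resource

limit = 512 * 1024 * 1024
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)          # the soft limit cannot exceed the hard one
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
try:
    buffer = bytearray(2 * limit)     # larger than the address-space limit
    print("allocated")
except MemoryError:
    print("MemoryError")              # the allocation failed; the process lives on
"""


def run_limited_child():
    """Run the child and return what it printed."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    print(run_limited_child())
```

The child keeps running after the failed allocation and exits normally, which is the behaviour the question was asking for.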

How can I increase OpenFabrics memory limit for Torque jobs?

When I run an MPI job over InfiniBand, I get the following warning. We use the Torque manager.
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.
This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.
See this Open MPI FAQ item for more information on these Linux kernel module
parameters:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
Local host: host1
Registerable memory: 65536 MiB
Total memory: 196598 MiB
Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
I've read the link in the warning message, and what I've done so far is:
Append options mlx4_core log_num_mtt=20 log_mtts_per_seg=4 on /etc/modprobe.d/mlx4_en.conf.
Make sure the following lines are written on /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited
Append session required pam_limits.so on /etc/pam.d/sshd
Make sure ulimit -c unlimited is uncommented on /etc/init.d/pbs_mom
Can anyone help me to find out what I'm missing?
Your mlx4_core parameters allow for the registration of 2^20 * 2^4 * 4 KiB = 64 GiB only. With 192 GiB of physical memory per node and given that it is recommended to have at least twice as much registerable memory, you should set log_num_mtt to 23, which would increase the limit to 512 GiB, the closest power of two greater than or equal to twice the amount of RAM. Be sure to reboot the node(s) or unload and then reload the kernel module.
You should also submit a simple Torque job script that executes ulimit -l in order to verify the limits on locked memory and make sure there is no such limit. Note that ulimit -c unlimited does not remove the limit on the amount of locked memory but rather the limit on the size of core dump files.
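The arithmetic behind those numbers can be checked with a few lines of Python. This sketch only restates the formula used above (2^log_num_mtt * 2^log_mtts_per_seg * page size) and assumes 4 KiB pages; the function names are made up for the example.

```python
GIB = 1024 ** 3


def registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * page size."""
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


def min_log_num_mtt(ram_bytes, log_mtts_per_seg, page_size=4096, factor=2):
    """Smallest log_num_mtt that allows registering factor * RAM."""
    log_num_mtt = 0
    while registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size) < factor * ram_bytes:
        log_num_mtt += 1
    return log_num_mtt


if __name__ == "__main__":
    print(registerable_bytes(20, 4) // GIB, "GiB with log_num_mtt=20")
    print("needed for 192 GiB of RAM:", min_log_num_mtt(192 * GIB, 4))
```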

mmap returns ENOMEM with shm_open file object

experimenting with shm_open in linux and running into problems. i'm frequently resizing a shared memory segment with ftruncate and using mmap to remap the resized segment. however, right around the 20 megabyte mark i get ENOMEM from mmap.
things i have attempted to do to resolve the issue:
first, i found out about these sysctl parameters. i reconfigured them:
kernel.shmmax = 268435456
kernel.shmall = 2097152
(shmall is specified in pages)
the issue still occurred after this. investigating the details of the resize that causes the issue revealed that the call made to ftruncate to resize the shared memory object succeeded (the corresponding file in /dev/shm had the requested new size).
documentation from here http://pubs.opengroup.org/onlinepubs/009695399/functions/mmap.html suggests three possible causes for an ENOMEM errno:
[ENOMEM]
MAP_FIXED was specified, and the range [addr,addr+len) exceeds that allowed for the address space of a process; or, if MAP_FIXED was not specified and there is insufficient room in the address space to effect the mapping.
[ENOMEM]
[ML] [Option Start] The mapping could not be locked in memory, if required by mlockall(), because it would require more space than the system is able to supply. [Option End]
[ENOMEM]
[TYM] [Option Start] Not enough unallocated memory resources remain in the typed memory object designated by fildes to allocate len bytes. [Option End]
i am not using MAP_FIXED or locking, and the size of the image in /dev/shm suggests that the third reason is not the problem. my mmap call looks like this:
mmap(mem, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)
where mem initially is 0 and thereafter refers to the last address mmap successfully mapped.
i found information suggesting that ulimit settings could be limiting the memory mappable into a single process, but i don't think the problem was here. just in case, ulimit -a looks like this on my machine:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
i hope this is an easy one :)
well, i found out what my problem was the other day. i misread the documentation for mmap, which says that mmap returns a mapping based on the first parameter (the previously mapped address in my case), and the result is defined by implementation. i took this as a suggestion that mmap might remap my previous mapping for me, but this was certainly not the case. this might only have been the case if i had used the MAP_FIXED flag, but i avoided this because the documentation recommended against it. in any case, it was necessary to use munmap to remove the previous mapping before creating a new one. i hope this post will help anyone making the same foolish misreading that i did
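the unmap-then-remap sequence described above can be sketched in python. this is only an illustration: it uses a temporary file as a stand-in for the shm_open object, and the helper name grow_mapping is made up. closing the python mmap object is what releases the old mapping (the munmap step), after which the file is resized and mapped again at the new length.

```python
import mmap
import os
import tempfile


def grow_mapping(fd, old_map, new_size):
    """Unmap the old view, resize the backing object, then map the new size."""
    if old_map is not None:
        old_map.close()            # releases the previous mapping (munmap)
    os.ftruncate(fd, new_size)     # resize the backing file
    return mmap.mmap(
        fd,
        new_size,
        flags=mmap.MAP_SHARED,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )


def demo():
    """Write into a 4 KiB mapping, grow it to 8 KiB, and read the data back."""
    with tempfile.TemporaryFile() as backing:
        fd = backing.fileno()
        view = grow_mapping(fd, None, 4096)
        view[:5] = b"hello"
        view = grow_mapping(fd, view, 8192)
        data, size = bytes(view[:5]), len(view)
        view.close()
        return data, size


if __name__ == "__main__":
    print(demo())
```

because each old view is released before the next one is created, repeated resizes do not pile up stale mappings in the address space.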

Resources