This question already has answers here:
Maximum number of threads per process in Linux?
(18 answers)
Changing the limit of maximum number of pthreads by an application
(4 answers)
Closed 8 years ago.
I have about 500 threads that I want to run simultaneously.
I read that the default glibc allows only about 300 threads to run simultaneously.
How did they get to this number? (I'm on a 32-bit system.)
The default stack size of a thread on Linux is 10 MB (or 8 MB on some systems). On 32-bit Linux, user-space applications have 3 GB of address space, some of it used for shared libraries, the heap, the code, and other housekeeping, so exhausting the address space at about 260 threads (2.6 GB of stacks) is reasonable.
You can probably get by with less space for the stack, so create the threads with a smaller stack size, e.g.
#include <pthread.h>

pthread_t tid;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 1024 * 1000 * 2);   /* ~2 MB per thread instead of the default */
pthread_create(&tid, &attr, threadfunc, NULL);
Related
In Linux the maximum number of threads is defined as max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE); and can be retrieved with cat /proc/sys/kernel/threads-max. This returns around 14,000 on my Raspberry Pi 3. However, when I just create empty threads in a loop with pthread_create(), I can create only about 250 before I get ENOMEM (Cannot allocate memory).
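The loop is roughly the following sketch (idle_thread stands in for my empty thread function; here it blocks so the threads stay around):
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void *idle_thread(void *arg)      /* stand-in for my empty thread function */
{
    (void)arg;
    pause();                             /* keep the thread alive */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    int count = 0, err;

    /* Create threads with default attributes until pthread_create() fails. */
    while ((err = pthread_create(&tid, NULL, idle_thread, NULL)) == 0)
        count++;

    printf("created %d threads, then: %s\n", count, strerror(err));
    return 0;
}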
Now I looked at the default stack size that is allocated to a process or thread, and that is 8192 kB. So at around 250 threads I would be using 2 GB of memory. However, this does not add up either, because free -m shows I have a total of 1 GB of memory.
Since I have 1 GB of RAM, I expected to be able to create only 125 threads at most, not 250, and certainly not 14,000.
Why can I create 250 threads?
By default, Linux performs memory overcommit. This means that you can allocate more anonymous, writable memory than there is physical memory.
You can turn off memory overcommit using:
# sysctl vm.overcommit_memory=2
This will cause some workloads that work perfectly fine in vm.overcommit_memory=0 mode to fail. Some details can be found in the overcommit accounting documentation.
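You can see the effect with a small sketch like this (the 1.5 GB figure is just chosen to exceed your 1 GB of physical RAM; with vm.overcommit_memory=2 the same mmap would typically fail):
#define _DEFAULT_SOURCE                  /* for MAP_ANONYMOUS */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1536UL * 1024 * 1024;   /* ~1.5 GB, more than the 1 GB of RAM */

    /* Reserve anonymous, writable memory.  With the default
       vm.overcommit_memory=0 this usually succeeds, because no physical
       pages are committed until the memory is actually touched. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("reserved %zu bytes without touching them\n", len);
    munmap(p, len);
    return 0;
}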
Let me start by clarifying two aspects:
(1) by concurrently-running, I mean executing on the hardware at any one point in time, rather than being in some other OS state such as ready or waiting; and
(2) assume that the hardware has a sufficiently large number of hardware threads (aka logical processors), so that that's not the limiting factor. E.g. 4096 hardware threads. (Obviously I don't have such a machine, yet.)
I've read that 32-bit Windows only supports 32 concurrently-running threads, and that a 64-bit process (on 64-bit Windows) can have 64 concurrently-running threads per processor group and up to 20 processor groups (when using multiple groups) on Windows 10.
But I've been unable to find anything relevant about WOW64. I've found lots of information on the maximum number of threads that can be created, but nothing on concurrently-running threads.
So, how many concurrently-running threads can a WOW64 process have (on Windows 10)?
Is it:
(a) 32, for compatibility with 32-bit Windows; or
(b) 64, because processor groups aren't accessible by 32-bit code, so all threads run in the default processor group; or
(c) a larger number, because a WOW64 process is partly 64-bit code, and that (Microsoft) code can use multiple processor groups. (I don't think that this is likely, but include it as another possibility.)
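For background, this is roughly how a native 64-bit process would enumerate the groups (just a sketch; it does not by itself tell me what WOW64 does):
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Enumerate processor groups and the logical processors in each.
       These APIs exist on Windows 7 and later. */
    WORD groups = GetActiveProcessorGroupCount();
    printf("active processor groups: %u\n", (unsigned)groups);
    for (WORD g = 0; g < groups; g++)
        printf("  group %u: %lu logical processors\n",
               (unsigned)g, GetActiveProcessorCount(g));
    return 0;
}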
Edit.
This is not a duplicate of any of the following questions on Stack Overflow, because their answers focus mainly on thread maximums due to address space limits.
What is the maximum number of threads a process can have in windows [closed]
What's the maximum number of threads in Windows Server 2003?
"What's the maximum number of threads possible for a threads in Windows 8.1?
The maximum number of thread [duplicate]
Similarly, the following two oft-cited articles also are concerned with address space limits.
Pushing the Limits of Windows: Processes and Threads
Does Windows have a limit of 2000 threads per process?
This question already has answers here:
Java process memory usage (jcmd vs pmap)
(3 answers)
Relation between memory host and memory arguments xms and xmx from Java
(1 answer)
Closed 5 years ago.
This is my ps -eo snapshot; one process occupies 2.1 GB of memory.
The max size of its heap is 768 MB and the max size of its metaspace is 256 MB.
So I guessed the process could not occupy more than 1024 MB (768 + 256), but it does.
What is included in "RSS" besides the heap and metaspace? And how can I monitor the inside of "RSS", the way a heap analyzer lets me inspect the heap?
The RSS is the size of all the memory used for any purpose, including the JVM itself, shared libraries, thread stacks, direct memory, memory-mapped files, native memory use, and native graphics components. The heap and metaspace are just two memory regions.
Note the virtual memory size is 15 GB.
To see what the memory is used for you can dump /proc/{pid}/smaps, which shows all the memory regions (and there will be hundreds) and how much of each one is resident. (IntelliJ running on my machine has 403 memory regions.)
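If you want a quick total rather than reading the dump by hand, a small sketch like this just sums the Rss: lines (pass a pid on the command line, or omit it to inspect the program itself; reading another process's smaps needs the right permissions):
#include <stdio.h>

int main(int argc, char **argv)
{
    char path[64], line[256];
    long kb, total = 0;

    /* Read /proc/<pid>/smaps (or our own if no pid is given) and add up
       the Rss: lines, i.e. how much of each mapping is resident. */
    snprintf(path, sizeof path, "/proc/%s/smaps", argc > 1 ? argv[1] : "self");
    FILE *f = fopen(path, "r");
    if (!f) {
        perror(path);
        return 1;
    }
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "Rss: %ld", &kb) == 1)
            total += kb;
    fclose(f);
    printf("total resident: %ld kB\n", total);
    return 0;
}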
The answer probably differs depending on the OS, but I'm curious how much stack space a thread normally preallocates. For example, if I use:
push rax
that will put a value on the stack and adjust rsp (push actually decrements it, since the stack grows downward). But what if I never use a push op? I imagine some space still gets allocated, but how much? Also, is this a fixed amount, or does it grow dynamically with the amount of stuff pushed?
POSIX does not define any standard regarding stack size; it is entirely implementation-dependent. Since you tagged this OSX, the default allocations there are:
Main thread (8MB)
Secondary Thread (512kB)
Naturally, these can be configured to suit your needs. The allocation is dynamic:
The minimum allowed stack size for secondary threads is 16 KB and the
stack size must be a multiple of 4 KB. The space for this memory is
set aside in your process space at thread creation time, but the
actual pages associated with that memory are not created until they
are needed.
There is too much detail to include here. I suggest you read :
Thread Management (Mac Developer Library)
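For example, a minimal sketch of creating a secondary thread with a custom stack size under those rules (64 KB is an arbitrary valid value; pthread_get_stacksize_np() is a non-portable macOS call used here just to report the result):
#include <pthread.h>
#include <stdio.h>

static void *report(void *arg)
{
    (void)arg;
    /* pthread_get_stacksize_np() reports the stack size of the given thread. */
    printf("stack size: %zu bytes\n", pthread_get_stacksize_np(pthread_self()));
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    /* Per the documentation quoted above: at least 16 KB and a multiple of 4 KB. */
    pthread_attr_setstacksize(&attr, 64 * 1024);
    pthread_create(&tid, &attr, report, NULL);
    pthread_join(tid, NULL);
    return 0;
}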
I use pthread_attr_getstacksize() to get the default stack size of a thread, which is 8 MB on my machine.
But when I create 8 threads and give each of them a very large stack, say hundreds of MB, the program crashes.
So, am I right to guess that
("number of threads" * "stack size per thread") < some constant value (e.g. the virtual address space size)
must hold?
The short answer is "Yes".
The longer answer is that all of your threads share one virtual address space, and the userspace-usable part of that space must therefore be large enough to contain all the thread stacks (as well as the code, static data, heap, libraries and any miscellaneous mappings).
Multi-hundred-megabyte stacks are a good indication that You're Doing It Wrong, as they say in the classics.
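As a rough sanity check of that constraint, a sketch like the one below estimates the ceiling from the resource limits (the 8 MB and 3 GB fallbacks are assumptions for a 32-bit Linux with glibc defaults, not measured values):
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit stack, as;
    getrlimit(RLIMIT_STACK, &stack);    /* glibc uses this as the default thread stack size */
    getrlimit(RLIMIT_AS, &as);          /* virtual address space limit */

    /* Fall back to assumed figures when a limit is "unlimited". */
    unsigned long long per_thread = (stack.rlim_cur == RLIM_INFINITY)
                                  ? 8ULL * 1024 * 1024             /* assumed 8 MB default */
                                  : (unsigned long long)stack.rlim_cur;
    unsigned long long space = (as.rlim_cur == RLIM_INFINITY)
                             ? 3ULL * 1024 * 1024 * 1024           /* assumed 32-bit user space */
                             : (unsigned long long)as.rlim_cur;

    printf("stack per thread: %llu bytes\n", per_thread);
    printf("rough ceiling:    %llu threads\n", space / per_thread);
    return 0;
}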