This question already has answers here:
Why doesn't this memory eater really eat memory?
(5 answers)
Closed 10 months ago.
Hi Anybody knows what is the upper limit for the heap allocation in linux process?
Consider below example,
int main() {
char *p;
unsigned long int cnt=0;
while(1) {
p = (char*)malloc(128*1024*1024); //128MB
cnt++;
cout <<cnt<<endl;
}
return 0;
}
This program will get killed only after around ~200000 iterations, that means it allocates 128MB*200000=~25TB, my system itself has 512gb of SSD + 6GB of RAM, how this program able to allocate 25TB of memory?
Thanks Nate Eldredge for pointing Why doesn't this memory eater really eat memory?
It actually consumes the memory only if we wrote some data into it, when I modify above program like this it actually exited after my PC's RAM was fully consumed (which was ~3GB in 4GB RAM system, I guess 1 GB RAM was reserved for kernel)
int main() {
char *p;
unsigned long int cnt=0;
size_t i=0, t = 128*1024*1024;
while(1) {
p = (char*)malloc(t);
#if 1
for (int i=0;i<t;i++)
*p++ = '0'+(i%65);
#endif
cnt++;
cout <<cnt<<endl;
}
return 0;
}
Related
Code as below, found that when vector push_back string on a Mac demo app, memory not be freed. I thought the stack variable will be freed when out of function scope, am I wrong? Thanks for any tips.
in model.h:
#pragma once
namespace NS {
const uint8_t kModel[8779041] = {4,0,188,250,....};
}
in ViewController.mm:
- (void)start {
std::vector<std::string> params = {};
std::string strModel(reinterpret_cast<const char *>(NS::kModel), sizeof(NS:kModel));
params.push_back(strModel);
}
The answer to your question depends on your understanding of the the "free" memory. The behaviour you are observing can be reproduced as simple as with a couple lines of code:
void myFunc() {
const auto *ptr = new uint8_t[8779041]{};
delete[] ptr;
}
Let's run this function and see how the memory consumption graph changes:
int main() {
myFunc(); // 1 MB
std::cout << "Check point" << std::endl; // 9.4 MB
return 0;
}
If you put one breakpoint right at the line with myFunc() invocation and another one at the line with "Check point" console output, you will witness how memory consumption for the process jumps by about 8 MB (for my system and machine configuration Xcode shows sudden jump from 1 MB to 9.4 MB). But wait, isn't it supposed to be 1 MB again after the function, as the allocated memory is freed at the end of the function? Well, not exactly.. The system doesn't regain this memory right away, because it's not that cheap operation to begin with, and if your process requests the same amount memory 1 CPU cycle later, it would be quite a redundant work. Thus, the system usually doesn't bother shrinking memory dedicated to a process either until it's needed for another process, and until it runs out of available resources (it also can be some kind of fixed timer, but overall I would say this is implementation-defined). Another common reason the memory is not freed, is because you often observe it through debug mode, where the memory remains dedicated to the process to track some tricky scenarios (like NSZombie objects, which address needs to remain accessible to the process in order to report the use-after-free occasions).
The most important here is that internally, the process can differentiate between "deleted" and "occupied" memory pages, thus it can re-occupy memory which is already deleted. As a result, no matter how many times you call the same function, the memory dedicated to the process remains the same:
int main() {
myFunc(); // 1 MB
std::cout << "Check point" << std::endl; // 9.4 MB
for (int i = 0; i < 10000; ++i) {
myFunc();
}
std::cout << "Another point" << std::endl; // 9.4 MB
return 0;
}
when I run this, it seems to have no problem with keep allocating memory with cnt going over thousands. I don't understand why -- aren't I supposed to get a NULL at some point? Thanks!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
int main(void)
{
long C = pow(10, 9);
int cnt = 0;
int conversion = 8 * 1024 * 1024;
int *p;
while (1)
{
p = (int *)malloc(C * sizeof(int));
if (p != NULL)
cnt++;
else break;
if (cnt % 10 == 0)
printf("number of successful malloc is %d with %ld Mb\n", cnt, cnt * C / conversion);
}
return 0;
}
Are you running this on Linux? Linux has a highly surprising feature known as overcommit. It doesn't actually allocate memory when you call malloc(), but rather when you actually use that memory. malloc() will happily let you allocate as much memory as your heart desires, never returning a NULL pointer.
It's only when you actually access the memory that Linux takes you seriously and goes out searching for free memory to give you. Of course there may not actually be enough memory to meet the promise it gave your program. You say, "Give me 8GB," and malloc() says, "Sure." Then you try to write to your pointer and Linux says, "Oops! I lied. How bout I just kill off processes (probably yours) until I I free up enough memory?"
You're allocating virtual memory. On a 64-bit OS, virtual memory is available in almost unlimited supply.
I have been working on trying to figure out why my program is consuming so much system RAM. I'm loading a file from disk into a vector of structs of several dynamically allocated arrays. A 16MB file ends up consuming 280MB of system RAM according to task manager. The types in the file are mostly chars with some shorts and a few longs. There are 331,000 records in the file containing on average about 5 fields. I converted the vector to a struct and that reduced the memory to about 255MB but that still seems very high. With the vector taking up so much memory the program is running out of memory so I need to find a way to get the memory usage more reasonable.
I wrote a simple program to just stuff a vector (or array) with 1,000,000 char pointers. I would expect it to allocate 4+1 bytes for each giving 5MB of memory required for storage, but in fact it is using 64MB (array version) or 67MB (vector version). When the program first starts up it only consumes 400K so why is there an additional 59MB for array or 62MB for vectors being allocated? This extra memory seems to be for each container, so if I create a size_check2 and copy everything and run it the program uses up 135MB for 10MB worth of pointers and data.
Thanks in advance,
size_check.h
#pragma once
#include <vector>
class size_check
{
public:
size_check(void);
~size_check(void);
typedef unsigned long size_type;
void stuff_me( unsigned int howMany );
private:
size_type** package;
// std::vector<size_type*> package;
size_type* me;
};
size_check.cpp
#include "size_check.h"
size_check::size_check(void)
{
}
size_check::~size_check(void)
{
}
void size_check::stuff_me( unsigned int howMany )
{
package = new size_type*[howMany];
for( unsigned int i = 0; i < howMany; ++i )
{
size_type *me = new size_type;
*me = 33;
package[i] = me;
// package.push_back( me );
}
}
main.cpp
#include "size_check.h"
int main( int argc, char * argv[ ] )
{
const unsigned int buckets = 20;
const unsigned int size = 50000;
size_check* me[buckets];
for( unsigned int i = 0; i < buckets; ++i )
{
me[i] = new size_check();
me[i]->stuff_me( size );
}
printf( "done.\n" );
}
In my test using VS2010, a debug build had a working set size of 52,500KB. But a release build had a working set
size of 20,944KB.
Debug builds will usually use more memory than optimized builds due to the debug heap manager doing things like creating memory fences.
In release builds, I suspect that the heap manager reserves more memory than you are actually using as a performance optimization.
Memory Leak
package = new size_type[howMany]; // instantiate 50,000 size_type's
for( unsigned int i = 0; i < howMany; ++i )
{
size_type *me = new size_type; // Leak: results in an extra 50k size_type's being instantiated
*me = 33;
package[i] = *me; // Set a non-pointer to what is at the address of pointer "me"
// Would package[i] = 33; not suffice?
}
Furthermore, make sure you've compiled in release mode
There might be a couple reasons why you're seeing such a large memory footprint from your test program. Inside your
void size_check::stuff_me( unsigned int howMany )
{
This method is always getting called with howMany = 50000.
package = new size_type[howMany];
Assuming this is on a 32-bit setup the above statement will allocate 50,000 * 4 bytes.
for( unsigned int i = 0; i < howMany; ++i )
{
size_type *me = new size_type;
The above will allocate new storage on each iteration of the loop. Since this loops 50,000 and the allocation never gets deleted that effectively takes up another 50,000 * 4 bytes upon loop completion.
*me = 33;
package[i] = *me;
}
}
Lastly, since stuff_me() gets called 20 times from main() your program would have allocated at least ~8Mbytes upon completion. If this is on a 64-bit system than the footprint will likely double since sizeof(long) == 8bytes.
The increase in memory consumption could have something to do with the way VS implements dynamic allocation. For performance reasons, it's possible that due to the multiple calls to new your program is reserving extra memory so as to avoid hitting up the OS everytime it needs more.
FYI, when I ran your test program on mingw-gcc 4.5.2, the memory consumption was ~20Mbytes -- much lower than what you were seeing but still a substantial amount. If I changed the stuff_me method to this:
void size_check::stuff_me( unsigned int howMany )
{
package = new size_type[howMany];
size_type *me = new size_type;
for( unsigned int i = 0; i < howMany; ++i )
{
*me = 33;
package[i] = *me;
}
delete me;
}
memory consumption goes down quite a bit down to ~4-5mbytes.
I think I found the answer by delving into the new statement. In debug builds there are two items that are created when you do a new. One is _CrtMemBlockHeader which is 32 bytes in length. The other is noMansLand (a memory fence) with a size of 4 bytes which gives us an overhead of 36 bytes for each new. In my case each individual new for a char was costing me 37 bytes. In release builds the memory usage is reduced to about 1/2 but I can't tell exactly how much is allocated for each new as I can't get to the new/malloc routine.
So my work around is to allocate a large block of memory to hold the file in memory. Then parse the memory image filling in a vector of pointers to the beginning of each of the records. Then on demand, I build a record from the memory image using the pointer to the beginning of the selected record. Doing this reduced the memory footprint to <25MB.
Thanks for all your help and suggestions.
I wrote a simple program to calculate the maximum number of threads that a process can have in linux (Centos 5). here is the code:
int main()
{
pthread_t thrd[400];
for(int i=0;i<400;i++)
{
int err=pthread_create(&thrd[i],NULL,thread,(void*)i);
if(err!=0)
cout << "thread creation failed: " << i <<" error code: " << err << endl;
}
return 0;
}
void * thread(void* i)
{
sleep(100);//make the thread still alive
return 0;
}
I figured out that max number for threads is only 300!? What if i need more than that?
I have to mention that pthread_create returns 12 as error code.
Thanks before
There is a thread limit for linux and it can be modified runtime by writing desired limit to /proc/sys/kernel/threads-max. The default value is computed from the available system memory. In addition to that limit, there's also another limit: /proc/sys/vm/max_map_count which limits the maximum mmapped segments and at least recent kernels will mmap memory per thread. It should be safe to increase that limit a lot if you hit it.
However, the limit you're hitting is lack of virtual memory in 32bit operating system. Install a 64 bit linux if your hardware supports it and you'll be fine. I can easily start 30000 threads with a stack size of 8MB. The system has a single Core 2 Duo + 8 GB of system memory (I'm using 5 GB for other stuff in the same time) and it's running 64 bit Ubuntu with kernel 2.6.32. Note that memory overcommit (/proc/sys/vm/overcommit_memory) must be allowed because otherwise system would need at least 240 GB of committable memory (sum of real memory and swap space).
If you need lots of threads and cannot use 64 bit system your only choice is to minimize the memory usage per thread to conserve virtual memory. Start with requesting as little stack as you can live with.
Your system limits may not be allowing you to map the stacks of all the threads you require. Look at /proc/sys/vm/max_map_count, and see this answer. I'm not 100% sure this is your problem, because most people run into problems at much larger thread counts.
I had also encountered the same problem when my number of threads crosses some threshold.
It was because of the user level limit (number of process a user can run at a time) set to 1024 in /etc/security/limits.conf .
so check your /etc/security/limits.conf and look for entry:-
username -/soft/hard -nproc 1024
change it to some larger values to something 100k(requires sudo privileges/root) and it should work for you.
To learn more about security policy ,see http://linux.die.net/man/5/limits.conf.
check the stack size per thread with ulimit, in my case Redhat Linux 2.6:
ulimit -a
...
stack size (kbytes, -s) 10240
Each of your threads will get this amount of memory (10MB) assigned for it's stack. With a 32bit program and a maximum address space of 4GB, that is a maximum of only 4096MB / 10MB = 409 threads !!! Minus program code, minus heap-space will probably lead to your observed max. of 300 threads.
You should be able to raise this by compiling a 64bit application or setting ulimit -s 8192 or even ulimit -s 4096. But if this is advisable is another discussion...
You will run out of memory too unless u shrink the default thread stack size. Its 10MB on our version of linux.
EDIT:
Error code 12 = out of memory, so I think the 1mb stack is still too big for you. Compiled for 32 bit, I can get a 100k stack to give me 30k threads. Beyond 30k threads I get Error code 11 which means no more threads allowed. A 1MB stack gives me about 4k threads before error code 12. 10MB gives me 427 threads. 100MB gives me 42 threads. 1 GB gives me 4... We have 64 bit OS with 64 GB ram. Is your OS 32 bit? When I compile for 64bit, I can use any stack size I want and get the limit of threads.
Also I noticed if i turn the profiling stuff (Tools|Profiling) on for netbeans and run from the ide...I only can get 400 threads. Weird. Netbeans also dies if you use up all the threads.
Here is a test app you can run:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <signal.h>
// this prevents the compiler from reordering code over this COMPILER_BARRIER
// this doesnt do anything
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")
sigset_t _fSigSet;
volatile int _cActive = 0;
pthread_t thrd[1000000];
void * thread(void *i)
{
int nSig, cActive;
cActive = __sync_fetch_and_add(&_cActive, 1);
COMPILER_BARRIER(); // make sure the active count is incremented before sigwait
// sigwait is a handy way to sleep a thread and wake it on command
sigwait(&_fSigSet, &nSig); //make the thread still alive
COMPILER_BARRIER(); // make sure the active count is decrimented after sigwait
cActive = __sync_fetch_and_add(&_cActive, -1);
//printf("%d(%d) ", i, cActive);
return 0;
}
int main(int argc, char** argv)
{
pthread_attr_t attr;
int cThreadRequest, cThreads, i, err, cActive, cbStack;
cbStack = (argc > 1) ? atoi(argv[1]) : 0x100000;
cThreadRequest = (argc > 2) ? atoi(argv[2]) : 30000;
sigemptyset(&_fSigSet);
sigaddset(&_fSigSet, SIGUSR1);
sigaddset(&_fSigSet, SIGSEGV);
printf("Start\n");
pthread_attr_init(&attr);
if ((err = pthread_attr_setstacksize(&attr, cbStack)) != 0)
printf("pthread_attr_setstacksize failed: err: %d %s\n", err, strerror(err));
for (i = 0; i < cThreadRequest; i++)
{
if ((err = pthread_create(&thrd[i], &attr, thread, (void*)i)) != 0)
{
printf("pthread_create failed on thread %d, error code: %d %s\n",
i, err, strerror(err));
break;
}
}
cThreads = i;
printf("\n");
// wait for threads to all be created, although we might not wait for
// all threads to make it through sigwait
while (1)
{
cActive = _cActive;
if (cActive == cThreads)
break;
printf("Waiting A %d/%d,", cActive, cThreads);
sched_yield();
}
// wake em all up so they exit
for (i = 0; i < cThreads; i++)
pthread_kill(thrd[i], SIGUSR1);
// wait for them all to exit, although we might be able to exit before
// the last thread returns
while (1)
{
cActive = _cActive;
if (!cActive)
break;
printf("Waiting B %d/%d,", cActive, cThreads);
sched_yield();
}
printf("\nDone. Threads requested: %d. Threads created: %d. StackSize=%lfmb\n",
cThreadRequest, cThreads, (double)cbStack/0x100000);
return 0;
}
What does it mean when Valgrind reports o bytes lost, like here:
==27752== 0 bytes in 1 blocks are definitely lost in loss record 2 of 1,532
I suspect it is just an artifact from creative use of malloc, but it is good to be sure (-;
EDIT: Of course the real question is whether it can be ignored or it is an effective leak that should be fixed by freeing those buffers.
Yes, this is a real leak, and it should be fixed.
When you malloc(0), malloc may either give you NULL, or an address that is guaranteed to be different from that of any other object.
Since you are likely on Linux, you get the second. There is no space wasted for the allocated buffer itself, but libc has to do some housekeeping, and that does waste space, so you can't go on doing malloc(0) indefinitely.
You can observe it with:
#include <stdio.h>
#include <stdlib.h>
int main() {
unsigned long i;
for (i = 0; i < (size_t)-1; ++i) {
void *p = malloc(0);
if (p == NULL) {
fprintf(stderr, "Ran out of memory on %ld iteration\n", i);
break;
}
}
return 0;
}
gcc t.c && bash -c 'ulimit -v 10240 && ./a.out'
Ran out of memory on 202751 iteration
It looks like you allocated a block with 0 size and then didn't subsequently free it.