What plausible reasons are there why the startup time for a large program would get SLOWER when I add more RAM into a Linux PC? - linux

I have a large C++/OpenGL program (running under Linux) that takes several minutes to start up (it's loading a LOT of files) - it ends up consuming around 70 GB of RAM. We've been running it on a 96 GB machine and recently upgraded to 128 GB. To our surprise, the startup time INCREASED by about 30 seconds.
We've checked that the RAM clock speed didn't change - the new RAM modules are the same type as the existing ones - and the problem is very repeatable. We've checked for RAM faults (there aren't any) and confirmed that none of the CPU cache properties have changed. The amount of RAM our program consumes hasn't changed before/after the memory increase.
We'd expect the startup time to be about the same - but this is not a small increase.

Related

Is there a way to prevent VRAM swapping from stalling the whole system?

I made a game (using Unity) which is powered by two open-source machine learning models, VQGAN-CLIP and GPT-NEO (converted to exe via pyinstaller), simultaneously running.
VQGAN-CLIP generates images in the background, while GPT-NEO needs to generate text periodically. When GPT-NEO gets called while VQGAN-CLIP is generating something, MSI Afterburner shows that VRAM usage exceeds (or was already at) the maximum, and the whole system freezes for about 1 second while the VRAM swaps. I confirmed this only happens when both are running simultaneously, not when just one of them is running. I am using an RTX 2060S with 8 GB of VRAM.
I also have 64 GB of RAM and usage is nowhere near that, so it's only swapping to RAM rather than to the SSD.
It is not feasible to block GPT-NEO until VQGAN-CLIP finishes generating the image, because it could take 30-60 seconds to generate 1 image which is too long to wait in between turns.
My question is: is this absolutely inevitable, or is there a workaround to swap VRAM without freezing the whole system - or better yet, without freezing the game itself?

inotifywait secretly consumes a lot of memory

On Ubuntu 14.04:
$ cat /proc/sys/fs/inotify/max_queued_events
16384
$ cat /proc/sys/fs/inotify/max_user_instances
128
$ cat /proc/sys/fs/inotify/max_user_watches
1048576
Right after a restart I had 1 GB of RAM consumed. After 20-30 minutes (with just one terminal open) I had 6 GB of RAM used and growing, yet none of the processes seemed to be using that much memory (according to htop and top). When I killed the inotifywait process, the memory was not freed but it stopped growing. Then I restarted the PC, killed inotifywait right away, and memory usage stayed at 1 GB.
I have 2 hard drives, one 1 TB and one 2 TB. Was inotifywait somehow causing those to be cached, or is this behavior normal in general?
This is the Linux disk cache at work, it is not a memory leak in inotifywait or anything else.
I've accepted the previous answer because it explains what's going on. However, I have some second thoughts on this topic. What the page is basically saying is: "caching doesn't occupy memory because this memory can be taken back at any point, so you should not worry, you should not panic, you should be grateful!" Well... I'm not. I believe there should be a decent but hard limit on caching.
The idea behind this mechanism is good: "let's not waste the user's time and cache everything while we have space". However, in my case it is actually wasting my time. Currently I'm working on Linux running in a virtual machine. Since I have a lot of Jira pages open, many terminals across several desktops, some tools open, and so on, I don't want to reopen all of that every day, so I save the virtual machine state instead of shutting it down at the end of the day. Now let's say my stuff occupies 4 GB of RAM. Then 4 GB should be written to my hard drive when I save state, and 4 GB loaded back into RAM when I start the virtual machine. Unfortunately, that's only theory. In practice, thanks to inotifywait happily filling my 32 GB of RAM, I have to wait 8 times longer to save and restore the virtual machine. Yes, my RAM is "fine" as the page says, and yes, I can open another app without hitting OOM, but at the same time caching is wasting my time.
If the limit were decent - say 1 GB for caching - it would not be so painful. I think if the VM were on an HDD instead of an SSD, saving state would take forever and I would probably not use it at all.

Estimate Core capacity required based on load?

I have a quad-core Ubuntu system. Say I see a 15-minute load average of 60 during peak time; the load average sometimes reaches 150 as well.
This load generally happens only during peak time. Basically, I want to know: is there any standard formula to derive the number of cores ideally required to handle a given load?
Objective: if the load is 60, does that mean 60 tasks were queued, on average, at any point in the last 15 minutes? Would adding CPUs help me serve requests faster, or save the system from hanging or crashing?
Linux load average (as printed by uptime or top) includes tasks in I/O wait, so it can have very little to do with CPU time that could potentially be used in parallel.
If all the tasks were purely CPU-bound, then yes, a sustained load average of 150 would mean that up to 150 cores could potentially be useful. (But if it's not sustained, it might just be a temporary long queue that wouldn't get that long if you had better CPU throughput.)
If you're getting crashes, that's a huge problem that isn't explainable by high loads. (Unless it's from the out-of-memory killer kicking in.)
It might help to use vmstat or dstat to see how much CPU time is spent in user/kernel space while your load average is building up, or whether it's mostly I/O wait.
Or of course you probably know what tasks are running on your machine, and whether a single task is I/O-bound or CPU-bound on an otherwise-idle machine. I/O throughput usually scales somewhat positively with queue depth, except on magnetic hard drives, where deep queues turn sequential reads/writes into seek-heavy workloads.

Mono 4.2.2 garbage collection really slow/leaking on Linux with multiple threads?

I have an app that processes 3+ GB of data into 300 MB of data. Running each independent dataset sequentially on the main thread, memory usage tops out at about 3.5 GB and it works fine.
If I run each dataset concurrently on 10 threads, I see the following:
Virtual memory usage climbs steadily until allocations fail and it crashes (I can see in the stack trace that the GC is trying to run).
CPU utilization is 1000% for periods, then drops to 100% for minutes, and then cycles back up. The app is easily 10x slower when run with multiple threads, even though the datasets are completely independent.
This is the Mono 4.2.2 build for Linux with large-heap support, running on a machine with 128 GB of RAM and 40 logical processors. I am running mono-sgen and have tried all the custom GC settings I could think of (concurrent mark-sweep, max heap size, etc.).
These problems do not happen on Windows. If I rewrite the code to do significant object pooling, I get farther into the dataset before running OOM, but the fate is the same. I have verified that I have no memory leaks using multiple tools and good old printf debugging.
My best theory is that lots of allocations across lots of threads are a weak case for the GC, and that most of that wall-clock time is spent with my worker threads suspended.
Does anyone have any experience with this? Is there a way I can help the GC get out of that 100% rut it gets stuck in, and to not run out of memory?
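For reference, sgen's knobs are set through the MONO_GC_PARAMS environment variable; the settings mentioned above (concurrent mark-sweep, capped heap) look roughly like this. The heap size and executable name are just examples:

```shell
# Concurrent mark-sweep major collector with a capped heap (values are examples)
MONO_GC_PARAMS=major=marksweep-conc,max-heap-size=64g mono-sgen MyApp.exe
```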

Limiting RAM usage during performance tests

I have to run some performance tests, to see how my programs work when the system runs out of RAM and the system starts thrashing. Ideally, I would be able to change the amount of RAM used by the system.
I have tried booting my system (running Ubuntu 10.10) in single-user mode with a limited amount of physical memory, but with the parameters I used (max_addr=300M, max_addr=314572800 or mem=300M) the system did not use my swap partition.
Is there a way to limit the amount of RAM used by the total system, while still using swap space?
The point is to measure the total running time of each program as a function of the input size. I am not trying to pinpoint performance problems, I am trying to compare algorithms, which means I need accuracy.
Write a simple C program which:
Allocates a large amount of memory.
Keeps accessing the allocated memory at random locations (in an infinite loop), to try to keep it resident in main memory.
Now run this program (one or a few instances), allocating enough memory to cause thrashing of the process you are testing.
