Is there a way to prevent VRAM swapping from stalling the whole system? - pytorch

I made a game (using Unity) that is powered by two open-source machine learning models, VQGAN-CLIP and GPT-NEO (each converted to an exe via PyInstaller), running simultaneously.
VQGAN-CLIP generates images in the background while GPT-NEO needs to generate text periodically. When GPT-NEO gets called while VQGAN-CLIP is generating something, MSI Afterburner shows that VRAM usage exceeds the maximum (or was already at maximum), and I get a whole-system freeze for about 1 second while the VRAM swaps. I confirmed that this only happens when both are running simultaneously, not when only one of them is running. I am using an RTX 2060 Super with 8 GB of VRAM.
I also have 64 GB of RAM, and usage is nowhere near that, so the VRAM is only swapping to RAM rather than to the SSD.
It is not feasible to block GPT-NEO until VQGAN-CLIP finishes generating the image, because it can take 30-60 seconds to generate one image, which is too long to wait between turns.
My question is: is this absolutely inevitable, or is there a workaround to swap the VRAM without freezing the whole system, or better yet without freezing the game itself?
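For illustration, the kind of workaround I have in mind looks roughly like this (just a sketch, and I'm assuming PyTorch's per-process memory fraction is applicable to how these models allocate): cap each process's share of the card so the two together stay under 8 GB, and let PyTorch raise an out-of-memory error instead of letting usage grow past physical VRAM.

import torch

# Rough sketch: cap this process's share of the 8 GB card. The fractions
# are guesses and would need tuning per model; with a cap in place PyTorch
# raises an out-of-memory error instead of growing past physical VRAM.
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    torch.cuda.set_per_process_memory_fraction(0.6, device)  # e.g. the VQGAN-CLIP process
    # the GPT-NEO process would use something like 0.35 instead
    # ... load the model and generate as before ...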

Related

What plausible reasons are there why the startup time for a large program would get SLOWER when I add more RAM into a Linux PC?

I have a large C++/OpenGL program (running under Linux) that takes several minutes to start up (it's loading a LOT of files) - it ends up consuming around 70 GB of RAM. We've been running it on a 96 GB machine - and recently upgraded to 128 GB. To our surprise, the startup time INCREASED by about 30 seconds.
We've checked that the RAM clock speed didn't change - the new RAM chips are the same type as the existing ones and the problem is very repeatable. We've checked for RAM faults (there aren't any) and that none of the CPU cache properties have changed. The amount of RAM our program consumes hasn't changed before/after the memory increase.
We'd expect the startup time to be about the same - but this is not a small increase.

Saving GPU memory by bypassing GUI

I have a MacBook Pro with a 2 GB Nvidia GPU. I am trying to utilize all my GPU memory for computations (Python code). How much could I save if I bypassed the GUI and only accessed my machine through the command line? I want to know whether such a thing would save me a good amount of GPU memory.
The difference probably won't be huge.
A GPU that is only hosting a console display will typically have only ~5-25 megabytes of memory reserved out of the total memory size. On the other hand, a GPU that is hosting a GUI display (using the NVIDIA GPU driver) might typically have ~50 megabytes or more reserved for display use (this may vary somewhat based on the size of the display attached).
So you can probably get a good "estimate" of the savings by running nvidia-smi and looking at the difference between total and available memory for your GPU with the GUI running. If that is, for example, 62 MB, then you can probably "recover" around 40-50 MB by shutting off the GUI, for example on Linux by switching to runlevel 3.
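For example (a small sketch, assuming nvidia-smi is on the PATH), you can read those two numbers programmatically instead of eyeballing the table:

import subprocess

# Ask nvidia-smi for total and used memory (in MiB) as plain CSV.
out = subprocess.check_output([
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used",
    "--format=csv,noheader,nounits",
]).decode()
total_mib, used_mib = (int(v) for v in out.strip().splitlines()[0].split(","))
print("reserved with the GUI running: about %d MiB of %d MiB" % (used_mib, total_mib))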
I just ran this experiment on a Linux laptop with a Quadro 3000M that happens to have 2 GB of memory. With the X display up and the NVIDIA GPU driver loaded, the "used" memory was 62 MB out of 2047 MB (reported by nvidia-smi).
When I switched to runlevel 3 (X is not started), the memory usage dropped to about 4 MB. That probably means that ~50 MB of additional memory is available to CUDA.
A side benefit of shutting off the GUI might be the elimination of the display watchdog.

Linux using swap instead of RAM with large image processing

I'm processing large images on a Linux server using the R programming language, so I expect much of the RAM to be used in the image processing and file writing process.
However, the server is using swap memory long before it appears to need to, which slows down processing significantly. See the following image:
This shows I am using roughly 50% of the RAM for the image processing, about 50% appears to be reserved for disk cache (yellow), and yet 10 GB of swap is being used!
I was watching the swap being eaten up, and it didn't happen when RAM usage was any higher than what is shown in this image. The swap appears to be eaten up while the processed data is being written to a GeoTiff file.
My working theory is that the disk-writing process is using much of the disk cache area (yellow region), and therefore the yellow isn't actually available to the server (as is often assumed of disk cache RAM)?
Does that sound reasonable? Is there another reason for swap being used when RAM is apparently available?
I believe you may be affected by the swappiness kernel parameter:
When an application needs memory and all the RAM is fully occupied, the kernel has two ways to free some memory at its disposal: it can either reduce the disk cache in the RAM by eliminating the oldest data or it may swap some less used portions (pages) of programs out to the swap partition on disk. It is not easy to predict which method would be more efficient. The kernel makes a choice by roughly guessing the effectiveness of the two methods at a given instant, based on the recent history of activity.
Swappiness takes a value between 0 and 100 to change the balance between swapping applications and freeing cache. At 100, the kernel will always prefer to find inactive pages and swap them out. A value of 0 gives something close to the old behavior where applications that wanted memory could shrink the cache to a tiny fraction of RAM.
If you want to force the kernel to avoid swapping whenever possible and give the RAM from device buffers and disk cache to your application, you can set swappiness to zero:
echo 0 > /proc/sys/vm/swappiness
Note that you may actually worsen performance with this setting, because your disk cache may shrink to a tiny fraction of what it is now, making disk access slower.
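If you prefer to check and change the value from a script, here is a minimal sketch (writing to /proc requires root, and the change does not survive a reboot; add vm.swappiness=0 to /etc/sysctl.conf to make it persistent):

# Minimal sketch: inspect and (as root) lower vm.swappiness from Python.
with open("/proc/sys/vm/swappiness") as f:
    print("current swappiness: %s" % f.read().strip())

with open("/proc/sys/vm/swappiness", "w") as f:  # requires root
    f.write("0")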

Linux CPU cache slowdown

We're getting overnight lockups on our embedded (Arm) linux product but are having trouble pinning it down. It usually takes 12-16 hours from power on for the problem to manifest itself. I've installed sysstat so I can run sar logging, and I've got a bunch of data, but I'm having trouble interpreting the results.
The targets only have 512 MB of RAM (we have other models which have 1 GB, but they see this issue much less often), and have no disk swap files, to avoid wearing the eMMCs.
Some kind of paging / virtual memory event is initiating the problem. In the sar logs, pgpgin/s, pgscand/s, pgsteal/s, and majflt/s all increase steadily before snowballing to crazy levels. This pushes the CPU up to correspondingly high levels (a load of 30-60 on dual-core Arm chips). At the same time, the frmpg/s values go very negative, whilst campg/s goes highly positive. The upshot is that the system is trying to allocate a large number of cache pages all at once. I don't understand why this would be.
The target then essentially locks up until it's rebooted, or until someone kills the main GUI process, or it crashes and is restarted (we have a monolithic GUI application that runs all the time and generally does all the serious work on the product). The network shuts down, telnet blocks forever, as do /proc filesystem queries and things that rely on it like top. The memory allocation profile of the main application in this test is dominated by reading data in from file and caching it as textures in video memory (shared with main RAM) in an LRU cache using OpenGL ES 2.0. Most of the time it'll be accessing a single file (they are about 50 MB in size), but I guess it could be triggered by having to suddenly use a new file and trying to cache all 50 MB of it in one go. I haven't done the test (putting more logging in) to correlate this event with these system effects yet.
The odd thing is that the actual free and cached RAM levels don't show an obvious lack of memory (I have seen oom-killer swoop in and kill the main application with >100 MB free and 40 MB of cache RAM). The main application's memory usage seems reasonably well-behaved, with a VmRSS value that seems pretty stable. Valgrind hasn't found any progressive leaks that would happen during operation.
The behaviour seems like that of a system frantically swapping out to disk and making everything run dog slow as a result, but I don't know if this is a known effect in a free<->cache RAM exchange system.
My problem is superficially similar to question: linux high kernel cpu usage on memory initialization but that issue seemed driven by disk swap file management. However, dirty page flushing does seem plausible for my issue.
I haven't tried playing with the various vm files under /proc/sys/vm yet. vfs_cache_pressure and possibly swappiness would seem good candidates for some tuning, but I'd like some insight into good values to try here. vfs_cache_pressure seems ill-defined as to what the difference between setting it to 200 as opposed to 10000 would be quantitatively.
The other interesting fact is that it is a progressive problem. It might take 12 hours for the effect to happen the first time. If the main app is killed and restarted, it seems to happen every 3 hours after that fact. A full cache purge might push this back out, though.
Here's a link to the log data, with two files: sar1.log, which is the complete output of sar -A, and overview.log, an extract of free/cache memory, CPU load, MainGuiApp memory stats, and the -B and -R sar outputs for the interesting period between midnight and 3:40am:
https://drive.google.com/folderview?id=0B615EGF3fosPZ2kwUDlURk1XNFE&usp=sharing
So, to sum up, what's my best plan here? Tune vm to tend to recycle pages more often to make it less bursty? Are my assumptions about what's happening even valid given the log data? Is there a cleverer way of dealing with this memory usage model?
Thanks for your help.
Update 5th June 2013:
I've tried the brute-force approach and put in a script which echoes 3 to /proc/sys/vm/drop_caches every hour. This seems to be maintaining the steady state of the system right now, and the sar -B stats stay on the flat portion, with very few major faults and 0.0 pgscand/s. However, I don't understand why keeping the cache RAM very low mitigates a problem where the kernel is trying to add the universe to cache RAM.
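For reference, what such an hourly purge boils down to is roughly the following (a sketch only; it has to run as root, and syncing first flushes dirty pages so that the page cache, dentries and inodes can actually be dropped):

import os
import time

# Rough sketch of an hourly cache purge (needs root; os.sync requires Python 3.3+).
while True:
    os.sync()                                          # flush dirty pages to disk first
    with open("/proc/sys/vm/drop_caches", "w") as f:   # 3 = page cache + dentries + inodes
        f.write("3\n")
    time.sleep(3600)                                   # once an hour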

How can I detect my RAM free and total space in Python?

So, the title describes almost everything necessary to answer me. Just one more thing: please only suggest libraries installed with Python by default, as the app I'm developing is part of the Ubuntu App Showdown.
Running Python 2.7, Ubuntu 12.04.
You are asking for a number that is nearly impossible to calculate and has very little value.
Any Linux system that has been running for a while will have hardly any 'free' RAM available. Just cat /proc/meminfo - the MemFree entry is usually on the order of just a few megabytes.
So, where did that memory go?
The kernel caches all disk access, for starters.
That's usually visible in the Cached entry. Disk cache will be pruned when you require more memory, so you could add that number to MemFree.
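If you just want those two numbers from a script, here is a minimal sketch using only the standard library (it parses /proc/meminfo, so it is Linux-only, which should be fine for an Ubuntu app):

def meminfo():
    # Parse /proc/meminfo into a dict of integer values (reported in kB).
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])
    return info

m = meminfo()
total_kb = m["MemTotal"]
# "available to applications" is roughly MemFree plus reclaimable buffers/cache
free_kb = m["MemFree"] + m.get("Buffers", 0) + m.get("Cached", 0)
print("total: %d MB, roughly available: %d MB" % (total_kb // 1024, free_kb // 1024))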
But, if an application allocates (malloc() in C) 2 gigabytes on a system with exactly 2 gigabytes of RAM, that usually will just be granted: you get a valid pointer back.
However, none of the RAM is actually reserved for your application - that only happens when your application starts touching memory pages - each touched page will be allocated.
The maximum size you can ask for is available as CommitLimit.
But the application code itself might not be in RAM either - the binary file and libraries are mmap()ed, so again only pages that are touched are loaded into RAM.
If you run a tool like top, you get all kinds of memory info per process, including VIRT, RES and SHR.
VIRT is for 'virtual' - all memory pages that the app would need if it were to claim all pages it has asked for.
RES is 'resident' - the amount of memory actually used.
SHR is 'shared' - the number of pages that are shared with other applications, e.g. libraries that are loaded by multiple applications.
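The per-process counterparts live in /proc/<pid>/status; here is a small standard-library-only sketch that reads them for the current process (VmSize roughly corresponds to VIRT, VmRSS to RES):

def process_mem(pid="self"):
    # Read VmSize and VmRSS (in kB) from /proc/<pid>/status.
    fields = {}
    with open("/proc/%s/status" % pid) as f:
        for line in f:
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value = line.split(":", 1)
                fields[key] = int(value.split()[0])
    return fields

print(process_mem())  # e.g. {'VmSize': 31644, 'VmRSS': 5120}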
So, what is the value of knowing how much memory is available?
You can start an application that could require significantly more RAM than your system has, and yet it runs...
You might even be able to run the application twice or thrice - code pages are shared anyway...
Note: the above answer cuts quite a few corners, the real mechanisms are significantly more complex. And I haven't even started bringing swap space into the story.
But this will do for you, I hope...
