Limit memory usage on Unix/Linux

I have access to a shared workstation running Linux and have to load a large .csv file into MATLAB. However, I am uncertain how much memory this will require of the system, as there will be some overhead, and I am not allowed to use more than a specific amount of memory.
So can I by any means limit the memory usage, either inside MATLAB or as I start the job itself? Everything needs to happen through the terminal.

If you are using MATLAB R2015 or later you can set up the array size limits in the Preferences:
http://de.mathworks.com/help/matlab/matlab_env/set-workspace-and-variable-preferences.html
In my opinion it would be a better solution to control the array sizes from within your script/function.

Related

Does RAM affect the time taken to sort an array?

I have an array of 500k to a million items to be sorted. Would a configuration with more RAM be beneficial or not, say going from 8 GB to 32 GB or above? I'm using a Node.js/MongoDB environment.
Adding RAM for an operation like that would only make a difference if you have filled up the available memory with everything that was running on your computer and the OS was swapping data out to disk to make room for your sort operation. Chances are, if that was happening, you would know because your computer would become pretty sluggish.
So, you just need enough memory for the working set of whatever applications you're running and then enough memory to hold the data you are sorting. Adding additional memory beyond that will not make any difference.
If you had an array of a million numbers to be sorted in Javascript, that array would likely take (1,000,000 * 8 bytes per number) + some overhead for a JS data structure = ~8MB. If your array values were larger than 8 bytes, then you'd have to account for that in the calculation, but hopefully you can see that this isn't a ton of memory in a modern computer.
If you have only an 8GB system and you have a lot of services and other things configured in it and are perhaps running a few other applications at the time, then it's possible that by the time you run nodejs, you don't have much free memory. You should be able to look at some system diagnostics to see how much free memory you have. As long as you have some free memory and are not causing the system to do disk swapping, adding more memory will not increase performance of the sort.
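If it helps, here is a minimal sketch of one way to read those diagnostics programmatically on Linux, using the sysinfo(2) call (free(1) or vmstat report the same numbers from the shell):

    /* meminfo.c - print total/free RAM and swap, one form of the "system
     * diagnostics" mentioned above.  Linux-specific sketch using sysinfo(2). */
    #include <stdio.h>
    #include <sys/sysinfo.h>

    int main(void)
    {
        struct sysinfo si;
        if (sysinfo(&si) != 0) {
            perror("sysinfo");
            return 1;
        }

        const double mib = 1024.0 * 1024.0;
        printf("total RAM : %.0f MiB\n", (double)si.totalram  * si.mem_unit / mib);
        printf("free RAM  : %.0f MiB\n", (double)si.freeram   * si.mem_unit / mib);
        printf("total swap: %.0f MiB\n", (double)si.totalswap * si.mem_unit / mib);
        printf("free swap : %.0f MiB\n", (double)si.freeswap  * si.mem_unit / mib);
        return 0;
    }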
Now, if the data is stored in a database and you're doing some major database operation (such as creating a new index), then it's possible that the database may adjust how much memory it can use based on how much memory is available and it might be able to go faster by using more RAM. But, for a Javascript array which is already all in memory and is using a fixed algorithm for the sort, this would not be the case.

How to run many instances of the same process in a resource-constrained environment without duplicating memory content

I observe that each ffmpeg instance doing audio decoding takes about 50 MB of memory. If I record 100 stations, that's 5 GB of RAM.
Now, they all more or less use the same amount of RAM; I suspect they contain the same information over and over again, because they are spawned as new processes rather than forked.
Is there a way to avoid this duplication?
I am using Ubuntu 20.04, x64
Now, they all more or less use the same amount of RAM; I suspect they contain the same information over and over again, because they are spawned as new processes rather than forked.
Have you considered that the processes may use about the same amount of RAM because they are performing roughly the same computation, with similar parameters?
Have you considered that whatever means you are using to compute memory usage may be insensitive to whether the memory used is attributed uniquely to the process vs. already being shared with other processes?
Is there a way to avoid this duplication?
Programs that rely on shared libraries already share those libraries' executable code among them, saving memory.
Of course, each program does need its own copy of any writable data belonging to the library, some of which may turn out to be unused by a particular client program, and programs typically have memory requirements separate from those of any libraries they use, too. Whatever amount of that 50 MB per process is in fact additive across processes is going to be from these sources. Possibly you could reduce the memory load from these by changing program parameters (or by changing programs), but there's no special way to run the same number of instances of the program you're running now, with the same options and inputs, to reduce the amount of memory they use.
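One way to check how much of that per-process 50 MB is genuinely private rather than shared is to compare each instance's Rss against its Pss, which divides shared pages among the processes sharing them. A minimal sketch, assuming /proc/<pid>/smaps_rollup is available (Linux 4.14+, so it is present on Ubuntu 20.04):

    /* rsspss.c - compare resident (Rss) vs proportional (Pss) memory for a PID.
     * Sketch only; relies on /proc/<pid>/smaps_rollup (Linux 4.14+). */
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }

        char path[64];
        snprintf(path, sizeof path, "/proc/%s/smaps_rollup", argv[1]);

        FILE *f = fopen(path, "r");
        if (!f) {
            perror(path);
            return 1;
        }

        /* Rss counts shared pages fully for every process; Pss splits them
         * between the processes sharing them.  A large Rss-Pss gap means the
         * instances are not really duplicating that memory. */
        char line[256];
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "Rss:", 4) == 0 ||
                strncmp(line, "Pss:", 4) == 0 ||
                strncmp(line, "Shared_Clean:", 13) == 0 ||
                strncmp(line, "Shared_Dirty:", 13) == 0)
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }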

stop file caching for a process and its children

I have a process that reads thousands of small files ONE TIME. The cached data is not needed after this. The process proceeds at full speed until most memory is consumed by the file cache and then it slows down. I don't understand the slowdown, since freeing cache memory and allocating space for the next file should be a matter of microseconds. Hard page faults also increase when this threshold is reached. The OS is vanilla Ubuntu 16.04.
I would like to limit the file caching for this process only.
This is a user process, so using a privileged shell command to purge the cache is not a solution. Using fadvise on a per-file level is not a solution either, since the files are being read by multiple library programs depending on the file type.
What I need is a process-level option: do not cache, or set a low size limit like 100 MB. I have searched for this and found nothing. Is this really the case? Seems like something big that is missing.
Any insight on the apparent memory management performance issue?
Here's the strict answer to your question. If you are mmap-ing your files, the way to do this is using madvise() and MADV_DONTNEED:
MADV_DONTNEED
    Do not expect access in the near future. (For the time being, the
    application is finished with the given range, so the kernel can free
    resources associated with it.) Subsequent accesses of pages in this
    range will succeed, but will result either in reloading of the memory
    contents from the underlying mapped file (see mmap(2)) or
    zero-fill-on-demand pages for mappings without an underlying file.
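A minimal sketch of that mmap()/madvise() pattern; the file name below is a placeholder and most error handling is trimmed:

    /* readonce.c - read a file through mmap() and tell the kernel we are done
     * with the pages afterwards.  Sketch only; the path is a placeholder. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data/file0001.bin", O_RDONLY);   /* placeholder path */
        struct stat st;
        if (fd < 0 || fstat(fd, &st) != 0) {
            perror("open/fstat");
            return 1;
        }

        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* ... process the st.st_size bytes at p exactly once ... */

        /* Tell the kernel these pages are no longer needed, then unmap. */
        madvise(p, st.st_size, MADV_DONTNEED);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }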
There is to my knowledge no way of doing it with files that are simply opened, read (using read() or similar) and closed.
However, it sounds to me like this is not in fact the issue. Are you sure it's buffer / cache that is growing here, and not something else? (e.g. perhaps you are reading them into RAM and not freeing that RAM, or not closing them, or similar)
You can tell by doing:
echo 3 > /proc/sys/vm/drop_caches
if you don't get all the memory back, then it's your program which is leaking something.
I am convinced there is no way to stop file caching on a per-process level. The program must have direct control over file I/O, with access to the file descriptors so that madvise() can be used. You cannot do this if library functions are doing all the file reading and you are not willing to modify them. This does look like a design gap that should be filled.
HOWEVER: My assertion of some performance issue with memory management was wrong. The reason for the process slow-down as the file cache grows and free memory shrinks was something else: disk seek distances were growing during the process. Other tests have verified that allocating memory does not significantly slow down as the file cache grows and free memory shrinks.

How to shrink the page table size of a process?

The MongoDB server maps all db files into RAM. As the database grows bigger, the server ends up with a huge page table of up to 3 GB.
Is there a way to shrink it when the server is running?
mongodb version is 2.0.4
Mongodb will memory-map all of the data files that it creates, plus the journal files (if you're using journaling). There is no way to prevent this from happening. This means that the virtual memory size of the MongoDB process will always be roughly twice the size of the data files.
Note that the OS memory management system will page out unused RAM pages, so that the physical memory size of the process will typically be much less than the virtual memory size.
The only way to reduce the virtual memory size of the 'mongod' process is to reduce the size of the MongoDB data files. The only way to reduce the size of the data files is to take the node offline and perform a 'repair'.
See here for more details:
- http://www.mongodb.org/display/DOCS/Excessive+Disk+Space#ExcessiveDiskSpace-RecoveringDeletedSpace
Basically you are asking to do something that the MongoDB manual recommends against in this specific scenario: http://docs.mongodb.org/manual/administration/ulimit/. Recommended, however, does not mean required; it is really just a guideline.
This is just the way MongoDB runs and something you have got to accept unless you wish to toy around and test out different scenarios and how they work.
You probably want to reduce the memory used by the process. You could use the ulimit bash builtin (before starting your server, perhaps in some /etc/rc.d/mongodb script), which calls the setrlimit(2) syscall.
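For illustration, a minimal sketch of what that amounts to underneath: set RLIMIT_AS with setrlimit(2), then exec the server. The 4 GiB cap and the command are placeholders, and as the link above indicates, MongoDB advises against capping mongod this way:

    /* caplaunch.c - what "ulimit -v ... && start the server" does under the
     * hood: set RLIMIT_AS via setrlimit(2), then exec the server.  Sketch
     * only; the 4 GiB cap and the command are placeholders. */
    #include <stdio.h>
    #include <sys/resource.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
            return 1;
        }

        struct rlimit rl = { .rlim_cur = (rlim_t)4 << 30,
                             .rlim_max = (rlim_t)4 << 30 };
        if (setrlimit(RLIMIT_AS, &rl) != 0) {   /* cap virtual address space */
            perror("setrlimit");
            return 1;
        }

        execvp(argv[1], &argv[1]);   /* e.g. ./caplaunch mongod --dbpath ... */
        perror("execvp");
        return 1;
    }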

Limiting RAM usage during performance tests

I have to run some performance tests, to see how my programs work when the system runs out of RAM and the system starts thrashing. Ideally, I would be able to change the amount of RAM used by the system.
I have tried to boot my system (running Ubuntu 10.10) in single-user mode with a limited amount of physical memory, but with the parameters I used (max_addr=300M, max_addr=314572800 or mem=300M) the system did not use my swap partition.
Is there a way to limit the amount of RAM used by the total system, while still using swap space?
The point is to measure the total running time of each program as a function of the input size. I am not trying to pinpoint performance problems, I am trying to compare algorithms, which means I need accuracy.
Write a simple C program which:
Allocates a large amount of memory.
Keeps on accessing the allocated memory at random, to try to keep it in main memory (in an infinite loop).
Now run this program (one or a few processes) so that you allocate enough memory to cause thrashing of the process you are testing.
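A minimal sketch of such a program; the 2 GiB allocation is a placeholder and should be sized, together with the other instances you start, to exceed the machine's physical RAM:

    /* hog.c - allocate a large block and touch random pages forever, keeping
     * the block resident and pushing other processes toward swap.  Sketch
     * only; adjust SIZE_BYTES to the machine being tested. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define SIZE_BYTES ((size_t)2 << 30)   /* 2 GiB, placeholder */
    #define PAGE       4096UL

    int main(void)
    {
        char *buf = malloc(SIZE_BYTES);
        if (!buf) {
            perror("malloc");
            return 1;
        }
        memset(buf, 1, SIZE_BYTES);   /* force the pages to really be allocated */

        size_t npages = SIZE_BYTES / PAGE;
        for (;;) {
            /* Touch a random page so the kernel keeps the block resident. */
            size_t page = (size_t)rand() % npages;
            buf[page * PAGE] += 1;
        }
    }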
