How do I ensure my job has enough memory to run under Slurm? Should I refer to MaxRSS or MaxVMSize?

I want to reserve enough memory for my job. Should I set the memory request to be larger than MaxRSS or MaxVMSize? I am confused. Thank you!

Related

Is there a way to increase memory allocation for running jobs through "srun, sbatch, or salloc"?

I use srun, salloc, or sbatch with Slurm when I want to execute my job.
srun -p PALL --job-name=my_job_1 --cpus-per-task=2 --mem=8G --pty --x11 ./my_job
I don't know how much memory I should allocate for the first job.
Sometimes the memory allocation turns out to be insufficient while the job is running, and I want to prevent it from being killed with an out-of-memory error.
Is there a way to increase memory allocation for jobs running through slurm?
In the example above, if you are getting a memory error, try increasing your --mem allocation to more than 8G.
If you are using sbatch (sbatch your_script.sh) to run your script, add the following line to it:
#SBATCH --mem-per-cpu=<value bigger than you've requested before>
If you are using srun (srun python3 your_script.py), add the parameter like this:
srun --mem-per-cpu=<value bigger than you've requested before> python3 your_script.py
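As a minimal sketch, assuming the job runs a Python script and that 8G per CPU is enough headroom (a guess, not a value from the question), a full sbatch script could look like this:
#!/bin/bash
#SBATCH --job-name=my_job_1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=8G    # 8G per CPU x 2 CPUs = 16G in total, double the original --mem=8G
srun python3 your_script.py
Note that --mem requests memory per node, while --mem-per-cpu is multiplied by the number of CPUs of the task; the two options are mutually exclusive, so use one or the other.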
No, generally speaking, you cannot increase the amount of resources allocated to a running job (except in some cases where nodes can be added from another job).
There is no easy way to know beforehand how much memory a specific experiment will require. It depends mostly on the data that are consumed and produced.
Some tips:
in Python, you can use sys.getsizeof(object) to get the size of an object in memory (e.g. a pandas DataFrame)
you can also use a memory profiler such as https://pypi.org/project/memory-profiler/ to get an overview of the overall memory consumption of the script
you can run the script in an interactive Slurm session (or on your laptop, or any other machine where you can test it) and watch the RSS column in the top command while it runs
you can use the sacct command to get the actual memory usage of the job afterwards, and possibly use that information to better estimate future, similar-looking jobs (see the example after this list)
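As a sketch of that last tip (the job ID is a placeholder), sacct can report the requested and the actually used memory side by side. MaxRSS, the peak resident memory, is usually the more useful guide when sizing --mem, because MaxVMSize also counts virtual address space that may never be backed by physical memory:
sacct -j <jobid> --format=JobID,JobName,ReqMem,MaxRSS,MaxVMSize,Elapsed,State
A common rule of thumb is to request MaxRSS plus some headroom (say 10-20%) for the next, similar run.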

Slurm uses more memory than allocated

As you can see in the picture below, I made an sbatch script so that a job array of 10 jobs (each with 1GB of memory allocated) is run. However, when I run it, as the second picture shows, the memory used is 3.7% of total memory, which equates to about 18.9GB per job... Could anyone explain why this is happening?
(I ran sbatch --nodelist node1 ver_5_FINAL_array_bash in the Linux terminal.)
Thank you!
For reference, the picture below shows that the amount of allocated memory is indeed 10GB, as specified in the sbatch script
Possibly pertinent information: our servers run jobs both through Slurm and through regular submissions started directly on the nodes (without any job scheduler such as Slurm).
By default, the --mem option gives the minimum memory requirement (see the documentation here: https://slurm.schedmd.com/sbatch.html#OPT_mem)
A hard limit can be set by the Slurm administrator, using cgroups. It is not something the user can do, as far as I know.
A cgroup is created for the job with hard resource limits (CPU, memory, disk, etc), and if the job exceeds any of these limits, the job is terminated.
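As a rough, site-dependent sketch of how to check this from the user side (the cgroup path assumes cgroup v1 and the task/cgroup plugin, which may not match your cluster; parameter names also vary between Slurm versions):
# show how this cluster accounts for and enforces memory
scontrol show config | grep -Ei 'taskplugin|selecttypeparameters|jobacctgather'
# inside a running job, the per-job memory limit may be visible here (cgroup v1 layout)
cat /sys/fs/cgroup/memory/slurm/uid_${UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes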

Can h2o allow to allocate more memory to standalone cluster?

I want to increase the h2o cluster memory up to 64GB. Can I do that? If not, does it have to be less than or equal to my system memory? If yes, how much can I allocate?
import h2o
h2o.init(nthreads=-1,max_mem_size='16g')
Thanks
The max_mem_size parameter goes straight to the -Xmx option for the Java heap allocated to the h2o backend process.
Because Java is a garbage-collected language, you never want to make the Java heap larger than about 90% of physical memory, or you run the risk of uncontrollable swapping.
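As a sizing sketch only (the 57g figure simply applies the ~90% rule to a hypothetical machine with 64GB of RAM), a standalone H2O node started from the command line would look like this:
free -g                     # check how much physical memory the machine actually has
java -Xmx57g -jar h2o.jar   # heap at roughly 90% of a 64GB machine
The same ceiling applies when the cluster is started from Python: max_mem_size in h2o.init() should stay below that ~90% threshold, so a 64GB heap is only realistic on a machine with noticeably more than 64GB of physical memory.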

Used and Cached Memory In Spark

I would like to know whether Spark uses the Linux cached memory or the Linux used memory when we use the cache/persist method.
I am asking because we have a cluster and we see that the machines sit at only 50% used memory and 50% cached memory, even during long jobs.
Thank you in advance,
Cached/buffered memory is memory that Linux uses for disk caching. When you read a file, it is always read into the page cache first, and you can consider cached memory as free memory. The JVM process of a Spark executor does not directly consume cached memory. If you see that only 50% of memory is used on your machine, it means the Spark executors definitely do not take more than 50% of memory. You can use the top or ps utilities to see how much memory a Spark executor actually takes (see the commands below); usually it is a little more than the current size of its heap.
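A quick way to see this on a worker node, assuming the executors run as plain java processes (column names vary slightly between tools and distributions):
free -h                      # "buff/cache" is the page cache; Linux reclaims it on demand
ps -C java -o pid,rss,cmd    # RSS (in KiB) is the memory the executor JVMs actually hold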

Does RSS Include Kernel Space Memory?

I am writing a simple memory profiler by reading the VmRSS value in /proc/[pid]/status. My question is: does a process's RSS include kernel-space memory? Thank you!
No. If you read the code in task_mmu.c, you'll see that it counts strictly the pages allocated to the process. Kernel-space memory usage doesn't really have a quantifiable value at process scope, and any memory increase in the kernel after a process starts execution should be negligible anyway.
AFAIK, RSS tells you how much user-space memory a given process occupies.
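For a quick check without writing any code (replace <pid> with the process you are profiling), both values can be read straight from /proc; VmRSS is the resident user-space memory, while VmSize is the full virtual address space:
grep -E 'VmRSS|VmSize' /proc/<pid>/status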
