Limit TensorFlow GPU memory usage via environment variable - Linux

I am using a C++ library that internally uses TensorFlow, so I do not have access to session parameters.
When a TensorFlow session is created, one can limit GPU memory usage by setting the per_process_gpu_memory_fraction value and the allow_growth flag (example in Python):
memory_config = tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.3))
memory_config.gpu_options.allow_growth = True
It is also possible to set a global value for allow_growth using an environment variable, which is used when the option is not specified in tf.ConfigProto: export TF_FORCE_GPU_ALLOW_GROWTH=true in a Linux shell.
Is there an environment variable for setting per_process_gpu_memory_fraction globally?
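For reference, when one does have access to session creation, these options are applied roughly like this (a minimal sketch using the TF 1.x API):
import tensorflow as tf
# Limit this process to ~30% of each GPU's memory and let the allocation grow on demand.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3, allow_growth=True)
config = tf.ConfigProto(gpu_options=gpu_options)
with tf.Session(config=config) as sess:
    pass  # build and run the graph here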

Related

Get available memory inside SLURM step

I'm trying to write a script that automatically adapts its requirements to whatever environment it is running in.
I already got the number of CPUs available by reading the SLURM_CPUS_PER_TASK environment variable. If it does not exist, I assume it is an interactive execution and default the value to 1.
Now I need to get the available memory, but this is not so straightforward. We have SLURM_MEM_PER_CPU and SLURM_MEM_PER_NODE. If I'm not mistaken, these numbers are not always present, and there's the special case of asking for zero memory. But I need the real number, as I'm trying to run a Java application and have to pass something specific in the -Xmx parameter.
Is there an easy way to get that info? Or do I have to test for the availability of each variable and query SLURM/the system for the total available memory in the zero-memory case?
If you request memory (--mem) in your submit script, these environment variables should be set.
Otherwise you can try scontrol show config,
or parse /etc/slurm/slurm.conf for the MaxMemPerNode of the partition (PartitionName) you are running in.
ref: https://slurm.schedmd.com/sbatch.html
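A sketch of the fallback logic in Python, assuming the SLURM variables report megabytes (the variable names are the standard SLURM ones; the final fallback reads /proc/meminfo):
import os

def available_mem_mb():
    # A per-node memory request takes precedence if present.
    if "SLURM_MEM_PER_NODE" in os.environ:
        return int(os.environ["SLURM_MEM_PER_NODE"])
    # Otherwise scale the per-CPU request by the CPUs allotted to the task.
    if "SLURM_MEM_PER_CPU" in os.environ:
        cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
        return int(os.environ["SLURM_MEM_PER_CPU"]) * cpus
    # Interactive run or zero/unlimited request: ask the OS instead.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) // 1024  # kB -> MB
    return 1024  # last-resort default

print("-Xmx{}m".format(available_mem_mb()))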

Slurm: by default assign a certain number of GPUs

If I do not specify a --gres=gpu:1 option, the process will use up all GPUs in the compute node.
We only use Slurm for GPU sharing, so we would like every process to be assigned one GPU automatically... Is it possible to make --gres=gpu:1 the default for srun?
You can set a default for --gres by setting the SBATCH_GRES environment variable for all users, for instance in /etc/profile.d on the login node. Simply create a file there with the following content:
export SBATCH_GRES=gpu:1
Note that the documentation says
Note that environment variables will override any options set in a batch script
so people who want to use more than one GPU, or no GPU at all, will need to override this default with the command-line option; they won't be able to override it with a #SBATCH --gres line in their submission script.
Another option would be to set CUDA_VISIBLE_DEVICES to an empty string for all users by default. Then, in jobs that request GPUs, the variable will be modified by Slurm according to the request, and jobs that do not request a GPU will not 'see' the GPUs.
If users are likely to game the system (the CUDA_VISIBLE_DEVICES variable can be overwritten by users), then you will have to set up cgroups.
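Inside a job, a quick way to check what either default produced is to print the variable from Python (a trivial sketch; an unset variable means all GPUs are visible to the process):
import os
# Slurm sets CUDA_VISIBLE_DEVICES for jobs that request GPUs;
# with the empty-string default above, non-GPU jobs see no devices.
print(os.environ.get("CUDA_VISIBLE_DEVICES", "<unset: all GPUs visible>"))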

TensorFlow: switch between CPU and GPU [Windows 10]

How can I quickly switch between running TensorFlow code on my CPU and on my GPU?
My setup:
OS = Windows 10
Python = 3.5
Tensorflow-gpu = 1.0.0
CUDA = 8.0.61
cuDNN = 5.1
I saw a post suggesting setting CUDA_VISIBLE_DEVICES=0, but I don't have this variable in my environment (not sure if that's because I'm running Windows or what), and if I do set it using something like os.environ, it doesn't affect how TensorFlow runs code.
If you set the environment variable CUDA_VISIBLE_DEVICES=-1, you will use the CPU only. If you don't set that environment variable, memory will be allocated on all GPUs but, by default, only GPU 0 will be used. You can also set it to the specific GPU you want to use: CUDA_VISIBLE_DEVICES=0 will use only GPU 0.
This environment variable is created by the user; it won't exist until you create it. You need to set the variable before TensorFlow is imported (usually before you start your script).
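For example, setting it from Python must happen before the tensorflow import for it to take effect (a minimal sketch):
import os
# "-1" hides all GPUs (CPU only); "0" exposes only GPU 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf  # picks up the variable at import/initialization time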

Secure Erase of a Bash Environment Variable

Suppose I have, in a Bash shell script, an environment variable that holds a sensitive value (e.g. a password). How can I securely overwrite the memory that holds this variable's value before exiting my script?
If possible, the technique used to do so should not depend on the particular implementation of Bash I'm using. I'd like to find a standards-respecting/canonical way to do this that works on all correct Bash implementations.
Please note that the following are not in the scope of the question:
1. How the sensitive value is placed into the environment variable
2. How the sensitive value stored in the environment variable is passed to the program that consumes it
Update (7/10/2017, 5:03 AM) to address a comment by rici
rici, thank you for your comment, copied here:
"Exiting the script is really the only way to reliably delete an
environment variable from the script's resident memory. Why do you
feel the string is less safe after the script terminates than while it
is running?"
My intent here is to follow good practice and actively scrub all cryptographically-sensitive values from memory as soon as I am through using them.
I do not know whether Bash actively scrubs the memory used by a script when that script exits. I suspect that it does not. If it does not, the sensitive cryptographic value will remain resident in memory and be subject to capture by an adversary.
In C/C++, one can easily scrub a value's memory location. I am trying to find out whether this is possible in Bash. It may be that Bash is simply not the right tool for security-sensitive applications.
First off, we need to distinguish between environment variables and shell variables. Environment variables exist for the lifetime of the process and cannot be overwritten. Not only that, but on many systems they are trivially visible to other processes. For example Linux provides the /proc filesystem which allows for lots of introspection of running processes, including observing their environment variables.
Here's an example of a Bash script that attempts to overwrite an environment variable. Notice that although the value within the script changes, the process' environment is not changed:
$ SECRET=mysecret bash -c \
'strings /proc/$$/environ | grep SECRET
SECRET=overwritten
echo "SECRET=$SECRET"
strings /proc/$$/environ | grep SECRET'
SECRET=mysecret
SECRET=overwritten
SECRET=mysecret
So it is never safe to store secrets in environment variables unless you control all access to the machine.
Holding a secret in a (non-environment) shell variable is much more secure, as an attacker would need to be able to access the memory of the process, which is generally something only the kernel can do. And while you're correct that minimizing the time you hold onto such secrets is a good practice, it's not generally worth jumping through lots of hoops for. It's far more important to secure your system and execution environment, because a motivated attacker who has sufficient access can observe a secret even if it only lives in memory for a brief time. Holding a secret in memory for longer than strictly necessary is only a marginal increase in risk, whereas running a privileged program on an insecure system already means all bets are off.

How does AppArmor do "Environment Scrubbing"?

The AppArmor documentation mentions giving applications the ability to execute other programs with or without environment scrubbing. Apparently a scrubbed environment is more secure, but the documentation doesn't seem to specify exactly how environment scrubbing happens.
What is environment scrubbing and what does AppArmor do to scrub the environment?
"Environment scrubbing" is the removal of various "dangerous" environment variables which may be used to affect the behaviour of a binary - for example, LD_PRELOAD can be used to make the dynamic linker pull in code which can make essentially arbitrary changes to the running of a program; some variables can be set to cause trace output to files with well-known names; etc.
This scrubbing is normally performed for setuid/setgid binaries as a security measure, but the kernel provides a hook to allow security modules to enable it for arbitrary other binaries as well.
The kernel's ELF loader code uses this hook to set the AT_SECURE entry in the "auxiliary vector" of information which is passed to the binary. (See here and here for the implementation of this hook in the AppArmor code.)
As execution starts in userspace, the dynamic linker picks up this value and uses it to set the __libc_enable_secure flag; you'll see that the same routine also contains the code which sets this flag for setuid/setgid binaries. (There is equivalent code elsewhere for binaries which are statically linked.)
__libc_enable_secure affects a number of places in the main body of the dynamic linker code, and causes a list of specific environment variables to be removed.
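One way to observe this flag from a running process is to query the auxiliary vector (a sketch using glibc's getauxval via ctypes; AT_SECURE is 23 on Linux):
import ctypes

AT_SECURE = 23  # from <elf.h>

libc = ctypes.CDLL(None)
libc.getauxval.restype = ctypes.c_ulong
libc.getauxval.argtypes = [ctypes.c_ulong]

# Non-zero means the loader treated this process as "secure"
# (setuid/setgid, or flagged by a security module such as AppArmor),
# so the dangerous environment variables were removed.
print("AT_SECURE =", libc.getauxval(AT_SECURE))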
