SLURM: loading modules vs. using libraries from a virtual environment - PyTorch

I'm relatively new to using clusters; at our university we have one that is operated with Slurm.
I'm trying to train a model that I can already run locally on my CPU with my virtual environment.
However, when I try using the following script:
#!/bin/csh
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --time=4:0:0
#SBATCH --mail-user=<mymail>
source <VENV_PATH>/bin/activate.csh
python3 --version
which python3
set RUNPATH="my_path"
cd $RUNPATH
python3 my_prog.py
I get the following error: "ModuleNotFoundError: No module named 'torchvision'"
I find this odd, because when I run the program locally with the same virtual environment it finds the module without a problem.
Granted, the cluster does have its own modules, and with module avail I can see which ones are available, but I'm not sure they provide the versions of CUDA / PyTorch that I need, which is why, if possible, I would rather use the packages from the virtual environment.
Is such a thing possible?
Thanks
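A common workaround (and essentially what the answers further down suggest) is to skip the activate step and call the virtual environment's interpreter by its full path, so the job uses the same Python and site-packages as the local runs. A minimal sketch in bash rather than csh, with <VENV_PATH> and the run directory left as placeholders:
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --time=4:0:0
# <VENV_PATH> is the same placeholder as in the question, not a real path
cd /path/to/my_project
<VENV_PATH>/bin/python3 my_prog.py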

Related

How to load modules in a Linux HPC when running it through RStudio

I am using a Linux HPC environment to run C++ models. The models require certain modules to be loaded, which I have specified in a .profile file, so the modules are always loaded when I log into the HPC through the terminal.
However, I would like to run the models when I access the HPC through a visual RStudio session via the OnDemand.HPC software, which runs R as a batch job. I have heard that in this case R operates in a non-interactive session, which means I need a .bashrc file to specify the modules that need loading. So I have created both files and left them in my home directory.
Here is my .profile:
module load nano
module load use.own
module load intel intelmpi dependencies cmake
and my .bashrc:
module load nano
module load use.own
module load intel intelmpi dependencies cmake
if [ -f ~/.profile ]; then
. ~/.profile
fi
Unfortunately, the modules are still not loaded when opening a batch-based visual RStudio session through OnDemand.HPC. I am a Linux rookie, so any help or tips would be much appreciated.
Thank you for considering this issue.
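One thing worth checking (an assumption on my part, since module setups are site-specific): in a non-interactive shell the module command itself may not be defined yet, in which case the module load lines in .bashrc fail before they can do anything. Sourcing the modules init script at the top of .bashrc is the usual fix; the path below is a typical Environment Modules location and may differ on your cluster:
# at the top of ~/.bashrc; the init path is site-specific, ask your HPC support
if [ -f /etc/profile.d/modules.sh ]; then
    . /etc/profile.d/modules.sh
fi
module load nano
module load use.own
module load intel intelmpi dependencies cmake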

Python code fails when sending it to cluster nodes via Slurm [duplicate]

This question already has answers here:
How to load anaconda virtual environment from slurm?
Using conda activate or specifying python path in bash script?
I'm trying to run some python scripts on a computing cluster.
On the head node of the computing cluster, I have my conda environment activated, all the software I need loaded, and every Python module installed that the code requires, and it runs fine.
But after this test on the head node, I am trying to use Slurm to distribute jobs to the nodes. Now my script immediately fails, and the error tells me that numpy is not installed, etc.
Obviously this is because the node isn't set up properly to deal with my code. How does this work normally? Do I need to include instructions in my bash script to initialise conda and activate a conda environment and install the python modules before running my script? This doesn't seem sensible.
I was hoping that the node would just be used for computing resources but my existing personal head node environment (whatever the correct term is) would be used to execute the job. Could someone explain how people do this please? I've used a different cluster before and never had this problem.
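For what it's worth, the usual pattern is to have the batch script itself initialise conda and activate the same environment, rather than reinstalling anything on the node. A minimal sketch, where the Miniconda path and environment name are placeholders:
#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --cpus-per-task=1
# make the "conda" command available in this non-interactive shell
source "$HOME/miniconda3/etc/profile.d/conda.sh"
# activate the same environment that works on the head node
conda activate my_env
python my_script.py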

Failing to import a module under Slurm

I am a beginner and I am starting to use a local cluster that runs Slurm.
I am able to execute some Python code with the usual modules (numpy, scipy, etc.), but as soon as I try to run a script that imports my own library, myownlib.py, the following message is displayed:
No module named myownlib
I searched a lot for the solution, probably looking in the wrong direction. Here is what I tried to fix it:
I created an environment file with conda;
I wrote the following test.sh
(which led to the error mentioned above)
#!/bin/bash
module purge
source myownlib-devel  # This is the name I gave in the environment file
/usr/bin/python ~/filexample.py
Any suggestions?
(Thank you in advance...)
One of the most probable causes is a difference in Python version between the login node, where you created the environment, and the compute nodes. If you loaded a specific Python module with module load when creating the virtual environment, you should load the same module in the submission script. The default Python version on the login node could be Python 3 while the default version on the compute nodes could be Python 2, depending on the Linux distribution and the list of modules loaded.
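A hedged sketch of what that submission script could look like; the module name and the environment path are assumptions, not something taken from the original post:
#!/bin/bash
module purge
# load the same Python module that was active when the environment was created
# ("python/3.9" is only an example; check "module avail" for the real name)
module load python/3.9
# run the environment's own interpreter instead of /usr/bin/python
~/miniconda3/envs/myownlib-devel/bin/python ~/filexample.py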

pyspark - can PYTHONPATH be used for the Python interpreter on worker nodes to find Python modules?

Please advise where I should look to understand the detailed mechanism of how PySpark finds Python modules on the worker nodes, especially the usage of PYTHONPATH.
PYTHONPATH variable
Environment Variables says environment variables defined in spark-env.sh could be used.
Certain Spark settings can be configured through environment variables, which are read from the conf/spark-env.sh script in the directory where Spark is installed
Then, if I define PYTHONPATH in spark-env.sh on all the worker nodes, will PySpark start the Python interpreter process with that PYTHONPATH passed to the UNIX process, and will Python modules be loaded from it?
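For concreteness, this is the kind of setting I have in mind (the library directory is a made-up example); whether the worker's Python process actually inherits it is exactly what I am trying to confirm:
# conf/spark-env.sh on each worker node
# /opt/shared/pylibs is a hypothetical directory holding extra Python modules
export PYTHONPATH=/opt/shared/pylibs:$PYTHONPATH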
Precedence of PYTHONPATH when --archives is used
In case PYTHONPATH in spark-env.sh can be used, what will happen when --archives specifies the virtual environment package?
Python Package Management says a conda environment can be packaged into a tar.gz and passed to the worker nodes.
There are multiple ways to manage Python dependencies in the cluster:
Using PySpark Native Features
Using Conda
Using Virtualenv
Using PEX
Using Conda
conda create -y -n pyspark_conda_env -c conda-forge pyarrow pandas conda-pack
conda activate pyspark_conda_env
conda pack -f -o pyspark_conda_env.tar.gz
After that, you can ship it together with scripts or in the code by using the --archives option or spark.archives configuration (spark.yarn.dist.archives in YARN). It automatically unpacks the archive on executors.
export PYSPARK_DRIVER_PYTHON=python # Do not set in cluster modes.
export PYSPARK_PYTHON=./environment/bin/python
spark-submit --archives pyspark_conda_env.tar.gz#environment app.py
However, when PYTHONPATH is defined in spark-env.sh, will it still be used, and which takes precedence: the packages inside the conda environment or those on PYTHONPATH?

Loading python modules through a computing cluster

I have an account on a computing cluster that uses Scientific Linux. Of course I only have user access. I'm working with Python and I need to run Python scripts, so I need to import some Python modules. Since I don't have root access, I installed a local Python copy in my $HOME with all the required modules. When I run the scripts on my account (hosting node), they run correctly. But in order to submit jobs to the computing queues (to process on much faster machines), I need to submit a bash script that has a line executing the scripts. The computing cluster uses Sun Grid Engine. However, when I submit the bash script, I get an error saying that the modules I installed can't be found! I can't figure out what is wrong. I hope you can help.
You could simply call your python program from the bash script with something like:
PYTHONPATH=$HOME/lib/python /path/to/my/python my_python_script
I don't know how Sun Grid Engine handles this, but if it runs the job as a different user than yours, you'll need global read access to your $HOME, or at least to the Python libraries.
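If it helps, on Sun Grid Engine that one-liner would normally go inside the script you hand to qsub; every path below is a placeholder for your own layout:
#!/bin/bash
#$ -cwd
#$ -N my_python_job
# point PYTHONPATH at the locally installed packages and use the local interpreter
export PYTHONPATH=$HOME/lib/python:$PYTHONPATH
$HOME/python/bin/python my_python_script.py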
First, whether or not this solution works for you depends heavily on how the cluster is set up. That said, the general solution to your problem is below. If the compute cluster has access to the same files as you do in your home directory, I see no reason why this would not work.
You need to be using a virtualenv. Install your software inside that virtualenv along with any additional python packages you need. Then in your batch bash script, provide the full path to the python interpreter within that virtualenv.
Note: to install python packages inside your virtualenv, you need to use the pip instance that is in your virtualenv, not the system pip.
Example:
$ virtualenv foo
$ cd foo
$ ./bin/pip install numpy
Then in your bash script:
/path/to/foo/bin/python /path/to/your/script.py
Have you tried adding this to your Python code:
import sys
sys.path.append("..")
from myOtherPackage import myPythonFile
This works very well for my code when I run it on a cluster and I want to call myPythonFile from another package, myOtherPackage.
