Loading Python modules on a computing cluster - Linux

I have an account on a computing cluster that runs Scientific Linux. Of course I only have user access. I'm working with Python and need to run my scripts, so I need to import some Python modules. Since I don't have root access, I installed a local copy of Python in my $HOME with all the required modules. When I run the scripts on my account (on the hosting node), they run correctly. But in order to submit jobs to the computing queues (to process on much faster machines), I need to submit a bash script that has a line executing my script. The cluster uses Sun Grid Engine. However, when I submit the bash script, I get an error that the modules I installed can't be found! I can't figure out what is wrong. I hope you can help.

You could simply call your Python program from the bash script with something like:
PYTHONPATH=$HOME/lib/python /path/to/my/python my_python_script
I don't know how Sun Grid Engine works, but if it runs jobs as a different user than yours, you'll need global read access to your $HOME, or at least to the Python libraries.

First, whether or not this solution works for you depends heavily on how the cluster is set up. That said, the general solution to your problem is below. If the compute cluster has access to the same files as you do in your home directory, I see no reason why this would not work.
You need to be using a virtualenv. Install your software inside that virtualenv along with any additional python packages you need. Then in your batch bash script, provide the full path to the python interpreter within that virtualenv.
Note: to install python packages inside your virtualenv, you need to use the pip instance that is in your virtualenv, not the system pip.
Example:
$ virtualenv foo
$ cd foo
$ ./bin/pip install numpy
Then in your bash script:
/path/to/foo/bin/python /path/to/your/script.py
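For example, a minimal SGE submission script built around that interpreter might look like the sketch below (the job name and paths are placeholders, not from the question):
#!/bin/bash
#$ -N my_python_job    # job name
#$ -cwd                # run the job from the submission directory
#$ -j y                # merge stdout and stderr into one log file

# Use the interpreter inside the virtualenv so the job sees the
# packages installed there rather than the system Python.
/path/to/foo/bin/python /path/to/your/script.py
You would then submit it with qsub job.sh.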

Have you tried adding this to your Python code:
import sys
sys.path.append("..")
from myOtherPackage import myPythonFile
This works very well for my code when I run it on a cluster, where I wanted to import myPythonFile from another package, myOtherPackage.
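Note that a relative path like ".." is resolved against the job's current working directory, which may differ on a compute node. A more robust variant (a sketch; the parent directory is a placeholder) appends an absolute path:
import os
import sys

# Append the absolute path of the directory that contains myOtherPackage,
# so the import works no matter where the job starts (placeholder path).
sys.path.append(os.path.expanduser("~/projects"))
from myOtherPackage import myPythonFile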

Related

Running scripts on startup on Google Cloud Platform (GCP) does not work for some scripts

I have a Linux machine on GCP and followed this answer to add a startup script. When my file is the following, it works perfectly and creates a log file when the machine starts:
#!/usr/bin/python3
with open('./my_log.txt', 'w') as f:
    f.write('Hello Saeed\n')  # the with block closes the file automatically
However, when I change it to anything else, like:
#!/usr/bin/python3
from package import *
import numpy as np
import pandas as pd
I do not see the script among my Python processes when the machine starts. I use ps -fA | grep python to list them.
Can you please help me to figure this out?
Edit:
As soon as I import some packages, it does not work.
I would recommend you check out the "official" way to install and run a startup script on a GCE VM: https://cloud.google.com/compute/docs/instances/startup-scripts. Using a cron job might work, but it's sort of a "low-level" solution given that GCP provides more managed and theoretically more straightforward ways to provide a script that should be executed every time your VM starts up.
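With that official mechanism, attaching your script is a single gcloud command (a sketch; the instance and file names are placeholders):
# Attach a local file as the VM's startup script; it runs as root on every boot.
gcloud compute instances add-metadata my-instance \
    --metadata-from-file startup-script=startup.sh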
As per your script, it might be a couple of things: either it finishes executing before you have a chance to run ps -fA | grep python, or it may be failing for some other reasons, probably missing dependencies. You can try the following test: run your script as root with sudo ./your_script.py. You should be able to see some errors and then you can start troubleshooting those to find a solution to your problem. The reason you need to run the script as root is that startup scripts are run as root, and thus such a test will represent what will happen at startup time more closely.
If you see missing-dependency problems, you should be able to solve them by installing the Python dependencies (like numpy) at the system level, using a command like sudo pip install numpy. If you don't want your script to run as root, you can also specify, in the system crontab (/etc/crontab), the username the script should run as; the user field goes between the schedule and the command:
@reboot username /path/to/script
Hope this gets you on the right path :)

How can I create a Python executable program that doesn't force the user to download Python and the required libraries in order to run it?

Let's say that I have already built a program that takes some images from a user's path and makes new ones, using the following libraries/modules:
os, itertools, pandas, PIL
What I need now is to turn that program into an executable file that can be run without having the Python environment and the libraries used in the code installed, because the user would not know how to code and would not care how the code works.
The program would run on Windows (PC) only, and would use cmd.exe as the interpreter and the medium for user input and displaying results.
You can use wheel and ship the whole environment. As @Vincent Caeles commented:
"@rh979 a wheel is not meant to have all dependencies. Subsequently you could do pip install path/to/wheel.whl --target /path/to/some/folder and zip the contents of the 'folder' to have all your dependencies in the zip archive and ship that to the environment where you want to run your code."
Another option is the pyinstaller library; according to its documentation, it should do what you need.
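A minimal pyinstaller invocation looks like this (a sketch; the script name is a placeholder):
pip install pyinstaller
# Bundle the script and everything it imports into a single self-contained
# executable; on Windows the result lands in dist\your_program.exe.
pyinstaller --onefile your_program.py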

Failing to import a module under Slurm

I am a beginner and I am starting to use a local cluster that works with Slurm.
I am able to execute some Python scripts with the usual modules (numpy, scipy, etc.), but when I try to run a script that imports my own library, myownlib.py, the following message is displayed:
No module named myownlib
I searched a lot for a solution, probably looking in the wrong direction. Here is what I tried to fix it:
I created an environment file with conda;
I wrote the following test.sh, which led to the error mentioned before:
#!/bin/bash
module purge
source myownlib-devel # this is the name I gave in the environment file
/usr/bin/python ~/filexample.py
Any suggestions?
(Thank you in advance...)
One of the most probable causes is a difference in Python version between the login node, where you created the environment, and the compute nodes. If you loaded a specific Python module with module load when creating the virtual environment, you should load the same module in the submission script. The default Python version on the login node could be Python 3 while the default on the compute nodes could be Python 2, depending on the Linux distribution and the list of modules loaded.
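A submission script along those lines might look like this sketch (the Python module version is a placeholder; the environment name is taken from the question):
#!/bin/bash
#SBATCH --job-name=myownlib-test
#SBATCH --output=myownlib-test.out

module purge
module load python/3.9            # the same Python module used to create the environment
source activate myownlib-devel    # activate the conda environment
python ~/filexample.py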

How can I run a Flask application

The Flask official website says that we can run a Flask application by
$ export FLASK_APP=hello.py
$ flask run
The second command doesn't work for me.
$ flask run
Command 'flask' not found, but can be installed with:
sudo apt install python3-flask
Instead, this works:
python3 -m flask run
How can I make the second command work? If I run sudo apt install python3-flask, will I get two installations of Flask?
Can the two commands be combined into one without using an environment variable?
Bear with me as I try to explain the different pieces and how they all interconnect. export FLASK_APP=hello.py sets an operating system environment variable called FLASK_APP that simply points to the entry file of your Flask application. This is no different from setting any other environment variable on your operating system. The Flask team provides a command, flask run, which starts up your Flask application using the value of the FLASK_APP environment variable to locate your server. The reason your python3 -m flask run command works is that you're telling your operating system's install of Python to run the flask module as a script, which is one of the ways this command is intended to be invoked.
For reference:
-m mod : run library module as a script (terminates option list)
Additionally, Python attempts to resolve modules using its sys.path list, and it looks in the following order of directories to resolve the requested module:
The current directory where the script has been invoked. This is why you can always import modules contained in the same directory as one another.
The value of your PYTHONPATH environment variable
The standard library directory on your path
Lastly, the site packages directory, i.e. your third party packages like flask
Now the reason your flask run command didn't initially work is that your shell couldn't find a flask executable on its PATH. However, once you used -m, Python knew to look for the flask module on sys.path (ultimately in your site-packages directory) and was able to find said module.
For reference you can see where python is looking to resolve modules by printing out the sys.path variable to the console:
import sys
print(sys.path)
OK, so that answers the first part of your first question. Now as for the second part:
"If I run sudo apt install python3-flask, will I get two installations of flask?"
Yes, this would install Flask globally on your system, and I would highly advise against this, as you can mess up your system pretty badly if you're not careful. So how do you avoid messing with your system-level Python configuration?
Virtualenv to the rescue. Virtual environments give you a sandboxed area to play around with libraries, with the worst-case scenario being that you blow them away and start fresh if you screw something up, without affecting your operating system's install of Python. You should have a one-to-one relationship between each Python project and its virtual environment. If you use virtualenv, I highly suggest looking into virtualenvwrapper, which wraps virtualenv with easier-to-remember commands. Although I think all the cool kids are using pipenv now, so you may want to look into that as well; I will leave that decision up to you. What's nice is that once you've activated your virtual environment, you can just use flask run, since the flask command installed there will be on your PATH.
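A typical workflow looks like this (a sketch using the standard venv module; names are placeholders):
python3 -m venv venv         # create a virtual environment in ./venv
source venv/bin/activate     # activate it for this shell session
pip install flask            # installs Flask only inside the venv
export FLASK_APP=hello.py
flask run                    # now resolves to venv/bin/flask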
As for your second question: "Can the two commands be combined into one command without using environment variable?"
No, you would still need to set the FLASK_APP environment variable to use flask run, since it looks for the value of that environment variable to start your Flask server. Perhaps you could try something like:
FLASK_APP=hello.py flask run
on the command line and see if that helps you, but you're still setting the FLASK_APP environment variable. Alternatively, you could just start the entry file of your Flask server directly, with:
python hello.py
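Note that python hello.py only starts the development server if hello.py calls app.run() itself, for example (a minimal sketch):
# hello.py
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello, World!"

if __name__ == "__main__":
    app.run()  # only reached when the file is executed directly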
I know that was a lot, but hopefully that helps clarify things for you!

Is there an installer for geodict python library?

I wanted to use the code:
import geodict_lib
locations = geodict_lib.find_locations_in_text(text)
But there seems to be no installer for geodict_lib. How do I install this in Anaconda 3.0 with Python 3?
I know this is a year on, but perhaps I can help others who stumble on this. You'll need to place the files in a directory that your installation of Python searches for modules.
First, download the .zip file from GitHub here.
Once you've done that, you can run the following at the command line or terminal:
conda list
This will show where packages live in your installation of Python: the first line of the output names the environment's location. Extract the geodict .zip file you downloaded to that location. You might want to run which python as well (see here), since you may have a few different installations to check.
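In shell terms, the steps look like this (a sketch; the archive name and site-packages path are placeholders you would read off the conda list output):
conda list    # the first output line names the environment's location
# Extract the downloaded archive into that environment's site-packages.
unzip geodict_lib.zip -d /path/to/anaconda3/lib/python3.x/site-packages/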
Now when you run import geodict_lib in Python, it should work without trouble!
