Nextflow does not find all my Python modules

I am trying to make a Nextflow script that utilizes a python script. My python script imports a number of modules but within Nextflow python3 does not find two (cv2 and matplotlib) of 7 modules and crashes. If I call the script directly from bash it works fine. I would like to avoid creating a docker image to run this script.
Error executing process > 'grab_images (1)'
Caused by:
Process `grab_images (1)` terminated with an error exit status (1)
Command executed:
python3 --version
echo 'processing image-1.npy'
python3 /home/hq/cv_proj/k_means2.py image-1.npy
Command exit status:
1
Command output:
Python 3.7.3
processing image-1.npy
Command error:
Traceback (most recent call last):
File "/home/hq/cv_proj/k_means2.py", line 5, in <module>
import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'
Work dir:
/home/hq/cv_proj/work/7f/b787c62ec420b2b5eb490603ef913f
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
I think there is a path issue, since modules like numpy, sys, re and time load successfully. How can I fix this?
Thanks in advance
UPDATE
To assist others who may have problems using Python in Nextflow scripts: make sure your shebang is correct. I was using
#!/usr/bin/python
instead of
#!/usr/bin/python3
Since all of my packages were installed with pip3 and I exclusively use python3, the shebang has to point at python3.
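A quick way to confirm which interpreter a task really runs, and whether that interpreter can see a given package, is a small check like the one below. This is only a sketch: the helper name `report_modules` is made up for illustration, and the module names are the ones mentioned in the question. Run it the same way as the real script (same shebang or same `python3` command) so the result is comparable.

```python
import importlib.util
import sys


def report_modules(names):
    """Return {module_name: bool} telling whether each top-level module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # sys.executable is the interpreter that was actually selected.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    # Module names taken from the question; adjust to your own imports.
    for name, found in report_modules(["numpy", "cv2", "matplotlib"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If the interpreter printed inside the Nextflow work dir differs from the one you get in your normal shell, the packages were most likely installed for a different Python.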

Best to avoid absolute paths to your script(s) in your process declarations. This section of the docs is worth taking some time to read: https://www.nextflow.io/docs/latest/sharing.html#manage-dependencies, particularly the subsection on how to manage third party scripts:
Any third party script that does not need to be compiled (Bash,
Python, Perl, etc) can be included in the pipeline project repository,
so that they are distributed with it.
Grant the execute permission to these files and copy them into a
folder named bin/ in the root directory of your project repository.
Nextflow will automatically add this folder to the PATH environment
variable, and the scripts will automatically be accessible in your
pipeline without the need to specify an absolute path to invoke them.
Then the problem is how to manage your Python dependencies. You mentioned Docker is not an option. Is Conda also not an option? The config for Conda might look something like:
name: myenv
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- conda-forge::matplotlib-base=3.4.3
- conda-forge::numpy=1.21.2
- conda-forge::opencv=4.5.2
Then if the above is in a file called environment.yml, create the environment with:
conda env create
See also the best practices for using Conda.

Related

Why are these import errors occurring when running python scripts from cmd or windows task scheduler, but not anaconda?

I am encountering import errors, but only when running my Python scripts from cmd or Windows Task Scheduler (effectively the same issue, I assume). I have researched answers already and attempted various solutions (detailed below), but nothing has worked yet. In any case, I need to understand the problem so that I can manage anything like it in the future.
Here is the issue:
Windows 10. Anaconda Python 3.9.7. Virtual environment.
I have a script that works fine if I open an anaconda prompt, activate the virtual environment and run it.
However, this is where the fun starts. If I try to run the script from the non-Anaconda cmd prompt with the command "C:\Users\user\anaconda3\envs\venv\python.exe" "C:\Users\user\scripts\script.py", I get the following error:
ImportError: DLL load failed while importing etree: The specified module could not be found.
Traceback includes:
"C:\Users\user\anaconda3\envs\venv\lib\site-packages\lxml\html\__init__.py", line 53, in <module>
from ..import etree
This is not as simple as one specific module not being installed, because running the script from within the Anaconda prompt and the virtual environment works. Something similar also happens when I run other scripts. Other errors I have seen include, for example:
ImportError: DLL load failed while importing _imaging: The specified module could not be found.
Traceback includes:
"C:\Users\user\anaconda3\envs\venv\lib\site-packages\PIL\Image.py", line 114, in <module>
from . import _imaging as core
Also, I think this may be somehow related. Importing numpy (1.22.3) from within the python interpreter in the virtual environment works fine, but when I try to run a test script that imports numpy it fails both from anaconda and the cmd with the following error:
ImportError: cannot import name SystemRandom
The overall issue was originally noticed when trying to run various scripts from Windows Task Scheduler, with the path to Python "C:\Users\user\anaconda3\envs\venv\python.exe" entered as the Program/script and the script "script.py" entered as an argument. The above errors were produced, then reproduced by running the scripts from a non-Anaconda cmd.
I am looking to understand what is happening here, and for a solution that gets the scripts running reliably from the virtual environment under Windows Task Scheduler.
Update:
I have uninstalled and reinstalled numpy (and pandas) using conda. This has left the venv with numpy==1.20.3 (and pandas=1.4.2). On attempting to re-run one of the scripts, it runs fine from within the venv in anaconda, but produces the following error when attempting to run from cmd or from within Windows Task Scheduler as above:
ImportError: Unable to import required dependencies:
numpy:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.9 from "C:\Users\user\anaconda3\envs\venv\python.exe"
* The NumPy version is "1.20.3"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: DLL load failed while importing _multiarray_umath: The specified module could not be found.
I have looked into the solutions suggested, but am still completely at a loss, especially as to why the script runs from the venv in one place, but NOT the other.

Can't execute python3 on terminal (PYTHONPATH and PATH complications)

I installed the library ase. Since then I can neither start python3 in the terminal nor run its Python files. Using the command which python3, I found that python3 is located at '/usr/local/bin/python3'. The libraries, including ase, are apparently located at '/usr/local/lib/python3.9/site-packages'.
See:
pip3 install --upgrade git+https://gitlab.com/ase/ase.git#master
WARNING: The scripts ase, ase-build, ase-db, ase-gui, ase-info and ase-run are installed in '/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
This is the error I got when I try to run python3 on terminal:
Fatal Python error: init_sys_streams: can't initialize sys standard streams
Python runtime state: core initialized
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/ase/io/__init__.py", line 1, in <module>
ModuleNotFoundError: No module named 'ase'
I think something must be wrong in my .bash_profile file; however, I can't spot the error. Here is my edited .bash_profile:
export PYTHONPATH=/usr/local/lib/python3.9/site-packages/ase:$PYTHONPATH
export PATH=/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/bin:$PATH
Hope you can help me, thank you!
Update:
I changed my PYTHONPATH from:
export PYTHONPATH=/usr/local/lib/python3.9/site-packages/ase:$PYTHONPATH
to:
export PYTHONPATH=/usr/local/lib/python3.9/site-packages:$PYTHONPATH
Now I can run python3 in the terminal again; however, when I try to execute ase directly in the terminal, the command is not found.
-bash: ase: command not found
I have already checked the requirements (matplotlib, scipy, numpy) and they are all installed. It seems that something is wrong with the PATH, but I don't know what it could be.
The binary files related to the ase package are no longer in the directory "/usr/local/Cellar/python@3.9/3.9.2_1/Frameworks/Python.framework/Versions/3.9/bin". I don't know where they are, since the installation completed successfully.
According to https://unix.stackexchange.com/questions/240037/why-did-pip-install-a-package-into-local-bin I redefined my paths to:
export PYTHONPATH=/usr/local/lib/python3.9/site-packages:$PYTHONPATH
export PATH=/usr/local/bin:$PATH
and it seems to be working: I already ran a program that uses that package. However, when I look in the folder '/usr/local/bin', I don't find those binary files.
If you can help me again I will be grateful.
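One way to see where the running interpreter expects packages and console scripts (such as ase) to be installed is to ask the standard-library `sysconfig` module. The sketch below is only a diagnostic under stated assumptions: the helper names `install_locations` and `scripts_dir_on_path` are invented for illustration, and pip may use a different scheme for `--user` installs or on distributions that patch the defaults.

```python
import os
import sysconfig


def install_locations():
    """Return the directories this interpreter uses for packages and scripts."""
    return {
        "purelib": sysconfig.get_path("purelib"),  # site-packages for pure-Python code
        "scripts": sysconfig.get_path("scripts"),  # where console scripts are written
    }


def scripts_dir_on_path(scripts_dir, path_value):
    """Check whether scripts_dir is one of the entries of a PATH-style string."""
    target = os.path.normcase(os.path.normpath(scripts_dir))
    entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return target in entries


if __name__ == "__main__":
    locations = install_locations()
    for key, value in locations.items():
        print(f"{key}: {value}")
    on_path = scripts_dir_on_path(locations["scripts"], os.environ.get("PATH", ""))
    print("scripts directory on PATH:", on_path)
```

If the scripts directory is reported as not on PATH, that is the directory to add to PATH. PYTHONPATH normally does not need to contain site-packages at all.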

Packages install but not found

I have packages installed under /usr/local/lib and I added that to my PATH as well, but when I try to import them in any of my Python scripts I get an error saying the module was not found.
-bash-4.2$ pip2 list | grep pytest
pytest-mock 2.0.0
My PATH:
echo $PATH
/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/bin
ERROR:
-bash-4.2$ python2
>>> import pytest
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named pytest
The package is only picked up if it is installed under my /users/user-name/.local/bin folder; otherwise it is not.
My use case is to use this machine as a slave for my Jenkins setup. I tried injecting this PATH directly into the job during the build process as well, but that didn't work for me.
I have been stuck on this for quite a while; any help is greatly appreciated.
First thing, it's generally a good idea to use virtualenv to create Python environments - installing Python packages system-wide is asking for trouble.
Second, your path may not work because you are setting PATH in a way that Jenkins ignores. The simplest solution is to provide the full path to the file: /usr/local/bin/pytest.
The safest way is to combine the two above: create a virtualenv, install pytest in it and provide the full path when using it (note: you don't need to activate a virtualenv to use it).
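The "full path, no activation" idea can be sketched in Python as follows. The environment location `/opt/venvs/ci` used in the usage note is purely hypothetical, as are the helper names. The only layout assumption is the usual one for virtualenv/venv: executables live in `bin` on POSIX systems and in `Scripts` on Windows. The helper itself needs Python 3, even if the virtualenv it points at was created for Python 2.

```python
import os
import subprocess


def venv_tool(env_dir, tool, windows=(os.name == "nt")):
    """Return the path of an executable inside a virtualenv, without activating it."""
    if windows:
        return os.path.join(env_dir, "Scripts", tool + ".exe")
    return os.path.join(env_dir, "bin", tool)


def run_pytest(env_dir, args=()):
    """Run the virtualenv's own pytest by absolute path; PATH is never consulted."""
    command = [venv_tool(env_dir, "pytest"), *args]
    return subprocess.run(command, check=False).returncode
```

A Jenkins job could then call something like `run_pytest("/opt/venvs/ci", ["tests/"])`, or simply invoke the path returned by `venv_tool` directly in a shell step.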

Running python scripts from command line using different python versions defined in anaconda envs

I have 2 python tools that I have to run via the windows cmd line. One is written in python2.7 while the other requires python3.6.
I have installed the newest Anaconda python3.7 version and created two new environments in 'C:\ProgramData\Anaconda3\envs' called 'python27' and 'python36'. For some reason I had to manually install numpy and scipy using conda install -n env_name numpy scipy for each of the new environments.
The reason I have to run both tools using the windows cmd line is that I have integrated them into a workflow environment (RCE by the DLR in case this is relevant), which executes integrated tools in this way. Which means I cannot simply use the Anaconda Prompt instead.
I cannot simply add the python installation to the PATH environment variable because of each tool requiring a different python version (and the file being called 'python.exe' in all versions), so I tried to create aliases for the cmd prompt as suggested by "roryhewitt" in this thread Aliases in Windows command prompt.
my 'python27.bat' file:
@echo off
echo.
C:\ProgramData\Anaconda3\envs\python27\python.exe %*
The problem with this approach is that python encounters an error when trying to import numpy:
>>> import numpy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\ProgramData\Anaconda3\envs\python27\lib\site-packages\numpy\__init__.py", line 142, in <module>
from . import core
File "C:\ProgramData\Anaconda3\envs\python27\lib\site-packages\numpy\core\__init__.py", line 71, in <module>
raise ImportError(msg)
ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
Here is how to proceed:
- If you're working with a numpy git repository, try `git clean -xdf`
(removes all files not under version control) and rebuild numpy.
- If you are simply trying to use the numpy version that you have installed:
your installation is broken - please reinstall numpy.
- If you have already reinstalled and that did not fix the problem, then:
1. Check that you are using the Python you expect (you're using C:\ProgramData\Anaconda3\envs\python27\python.exe),
and that you have no directories in your PATH or PYTHONPATH that can
interfere with the Python and numpy versions you're trying to use.
2. If (1) looks fine, you can open a new issue at
https://github.com/numpy/numpy/issues. Please include details on:
- how you installed Python
- how you installed numpy
- your operating system
- whether or not you have multiple versions of Python installed
- if you built from source, your compiler versions and ideally a build log
Note: this error has many possible causes, so please don't comment on
an existing issue about this - open a new one instead.
Original error was: DLL load failed: The specified module could not be found.
Does anyone know a better way to run python scripts with a specific python environment via the windows cmd line, or what is causing the import error when I use the alias?
TLDR: I have 2 python tools that require python2.7 and python3.6 respectively. I have to run these tools using the windows cmd line and using aliases to the 'python.exe' file in the specific anaconda environments results in an import error of numpy. Is there a better way to handle two python environments via the cmd line or an easy fix for the import error?
It seems that the ImportError when using python2.7 via the alias in the cmd line was due to the anaconda environment not being activated. This caused the module to be unable to load properly.
I have added the conda functionality to be used in the cmd line as instructed in the post by "Simba" here: Conda command is not recognized on Windows 10. This has fixed the numpy ImportError.
I can now execute the desired tools from the cmd line using the correct Python version by activating the correct Anaconda environment first, and then calling the alias.
Example:
C:\Users\user>conda activate python27
(python27) C:\Users\user>python27 main.py
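If activating the environment interactively is not convenient (for example, when a workflow tool launches the command), one possible alternative is `conda run`, which executes a command inside a named environment. The sketch below only builds and launches such a command. It assumes a conda version that provides `conda run` and that a `conda` executable can be resolved from PATH; on Windows you may need to pass the full path to it via the `conda_exe` parameter, which is a name invented here.

```python
import subprocess


def conda_run_command(env_name, script, script_args=(), conda_exe="conda"):
    """Build a `conda run` command that executes script with env_name's Python."""
    return [conda_exe, "run", "-n", env_name, "python", script, *script_args]


def run_in_env(env_name, script, script_args=(), conda_exe="conda"):
    """Run the script inside the named environment and return its exit code."""
    command = conda_run_command(env_name, script, script_args, conda_exe)
    return subprocess.run(command, check=False).returncode
```

For the example above, `run_in_env("python27", "main.py")` would correspond to `conda run -n python27 python main.py`.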

Command works at the file path, but won't work from root

I'm still a newbie at bash programming, and I'm trying to run a program with a small script. Reducing the problem to the error message, I have
cd /full/path/to/program
python3 -m krop
This command works when the current directory is /full/path/to/program,
but if I run the same thing from another directory, it doesn't work:
cd /another/path
python3 -m /full/path/to/program/krop
/usr/bin/python3: Error while finding module specification for '/full/path/to/program/krop'
(ModuleNotFoundError: No module named 'krop-0')
I tried a lot of variants, but always got the same error output. I have no idea why python3 appends "-0" to the name of the file.
What should I put to run the program from root?
python -m expects a module name, using the same syntax you would use to import it in a Python program. So if your module lies in ./directory and directory is a valid Python package, you can run python -m directory.krop
However, you can't address Python modules by a path from the file system root. You either have to make your bash script run it from the right directory, so that the import is local, or you have to package and install your module system-wide, so that it can be invoked with python -m krop from anywhere.
More information on packaging and installing modules: https://packaging.python.org/tutorials/packaging-projects/
Problem solved!
It was a matter of managing the Python import paths, as @hiroprotagonist replied. The list of directories that Python searches for modules is available in a variable named sys.path.
So, anyone who wants to run a program (a 'library module as a script', according to Python's help) with the python command from a directory other than the program's own should write on the command line:
export PYTHONPATH='/full/path/to/program/'
python3 -c "import sys; print(sys.path)"
python3 -m krop
The second line just prints sys.path for inspection; of the first two lines, only the export PYTHONPATH one is required.
Thank you for the keywords and help!
P.S. Maybe the question title should be edited to "problem with a python command to run a program from command line on linux" or something like that.
Reference: python --help :)
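The PYTHONPATH approach described above can be reproduced end to end with a throwaway module. This is a self-contained sketch: `krop_demo` is a stand-in module created in a temporary directory, not the real krop program, and the helper names are invented for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_module_from_elsewhere(module_dir, module_name, cwd):
    """Run `python -m module_name` from cwd, with module_dir added via PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Prepend module_dir, keeping any PYTHONPATH that was already set.
    env["PYTHONPATH"] = module_dir if not existing else module_dir + os.pathsep + existing
    result = subprocess.run(
        [sys.executable, "-m", module_name],
        cwd=cwd, env=env, capture_output=True, text=True,
    )
    return result.stdout.strip()


def demo():
    """Create a stand-in module and run it from a different directory."""
    with tempfile.TemporaryDirectory() as module_dir, \
            tempfile.TemporaryDirectory() as other_dir:
        with open(os.path.join(module_dir, "krop_demo.py"), "w") as handle:
            handle.write('print("running krop_demo")\n')
        return run_module_from_elsewhere(module_dir, "krop_demo", other_dir)


if __name__ == "__main__":
    print(demo())
```

Note that `-m` still receives a plain module name; only the search path changes, which is why passing a file system path to `-m` fails.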

Resources