When should I use pytest --import-mode importlib - python-3.x

This is my project structure:
git_dir/
    root/
        __init__.py
        tests/
            __init__.py
            test1.py
        foo_to_test/
            foo.py
I'm using pytest to test foo.py, and test1.py is as follows:
from foo_to_test.foo import func_to_test

def test1():
    assert something about func_to_test
From the terminal, I want to run:
pytest tests
Now for the problem.
When using --import-mode=append or --import-mode=prepend, pytest adds 'git_dir' to sys.path.
The problem is that 'git_dir' is not the project root.
I can add sys.path.append('git_dir/root'), but the name 'git_dir' is different for other programmers working on other computers.
I've seen in the pytest documentation that using --import-mode=importlib might solve my problem, but it doesn't seem to have any effect on sys.path and I can't understand what it is doing.
So my questions are:
What does --import-mode=importlib do?
How can I automatically add my root to the path when testing, so it is the same for every programmer who pulls this project?

This looks hard because it's not how it's supposed to work. It may be time to learn about packaging.
The Python module system is designed to look for modules on the sys.path directories, as you describe. This variable is to be populated by the user, somehow, and should preferably not be meddled with.
When a developer wants to use your package, they must make sure it is installed on the system they run the tests on.
If you want to make that easier for them, provide e.g. a pyproject.toml file; then they can build your package and install it with pip install /path/to/your/distribution-file.tgz.
More info about that here: https://setuptools.readthedocs.io/en/latest/userguide/quickstart.html#python-packaging-at-a-glance.

Your problem is that you have an __init__.py in the root directory. When pytest finds a test module, it walks up the parent folders until it reaches the first one that does not have an __init__.py. That directory is the one added to sys.path, see Choosing a test layout / import rules. So your root directory should NOT be a Python package.
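If you also want the project root on sys.path explicitly, independent of where the repository was cloned, one common approach is a small conftest.py placed inside root/ (next to tests/). This is only a minimal sketch, not part of the answer above; it derives the path from the file's own location, so it is identical for every programmer:

import sys
from pathlib import Path

# Directory containing this conftest.py (i.e. root/), resolved at run time so it
# does not depend on the clone location or the directory pytest is invoked from.
ROOT_DIR = Path(__file__).resolve().parent
if str(ROOT_DIR) not in sys.path:
    sys.path.insert(0, str(ROOT_DIR))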
Now about importlib and why you probably don't need it
--import-mode=importlib circumvents the standard Python way of using modules and sys.path. The exact reasons are unclear from the docs:
With --import-mode=importlib things are less convoluted because pytest doesn’t need to change sys.path or sys.modules, making things much less surprising.
And they also mention that this allows test modules to have non-unique names. But why would we want this?! It seems like with proper file structure everything works fine even if we mess with sys.path.
Anyway, overall the usage of this option doesn't sound convincing, especially given this in the official docs:
Initially we intended to make importlib the default in future releases, however it is clear now that it has its own set of drawbacks so the default will remain prepend for the foreseeable future.

Related

Freeling Python API working on sample, get Import error on other code

I'm trying out Freeling's API for Python. The installation and test were OK; they provide a sample.py file that works perfectly (I've played around with it a little and it works).
So I was trying to use it in some other Python code I have, in a different folder (I'm guessing this is a path issue), but whenever I import freeling (as shown in sample.py):
import freeling

FREELINGDIR = "/usr/local"
DATA = FREELINGDIR + "/share/freeling/"
LANG = "es"
freeling.util_init_locale("default")
I get this error:
ModuleNotFoundError: No module named 'freeling'.
The sample.py is located in the ~/Freeling-4.0/APIs/Python/ folder, while my other file is located in ~/project/; I don't know if that could be an issue.
Thank you!
A simple solution is to have a copy of freeling.py in the same directory as your code, since python will look there.
A better solution is to either paste it in one of the locations where it usually checks (like the lib folder in its install directory), or to tell it that the path where your file is should be scanned for a module.
You can check out this question to see how it can be done on Windows. You are basically just setting the PYTHONPATH environment variable, and there will only be minor differences in how to do so for other OSes. This page gives instructions that should work on Linux systems.
I like this answer since it adds the path at runtime in the script itself, doesn't make persistent changes, and is largely independent of the underlying OS (apart from the fact that you need to use the appropriate module path of course).
You need to set PYTHONPATH so python can find the modules if they are not in the same folder.
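A minimal sketch of that runtime approach, assuming freeling.py lives in the sample folder mentioned above (adjust the path to your installation):

import os
import sys

# Make the folder that contains freeling.py importable for this process only.
sys.path.append(os.path.expanduser("~/Freeling-4.0/APIs/Python"))

import freeling  # found now that its folder is on sys.path

FREELINGDIR = "/usr/local"
DATA = FREELINGDIR + "/share/freeling/"
LANG = "es"
freeling.util_init_locale("default")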

How do we import a module high in a project structure into a file that's at a low level? (Python 3.6 +)

Suppose we have the following project structure:
E:\demo_proj\
    utilities_package\
        __init__.py
        util_one.py
        util_two.py
    demo_package\
        __init__.py
        demo_sub_package\
            __init__.py
            demo_sub_sub_package\
                demo_file.py
What is a sensible way for demo_file.py to import from utilities_package?
The utilities_package is in a much higher level directory than demo_file.py.
Although it is not shown in the little diagram I gave above, suppose that utilities_package is used everywhere throughout the project. It is not sensible to make utilities_package a sub-directory of demo_sub_sub_package.
Some similar questions have been asked here and here.
The asker of the first question neglected to include a __init__.py in their packages. That is not an issue in my case.
The answers given to the second question linked above are problematic:
Most of the answers are quite old. For example, Loader.load_module was deprecated after Python 3.4
Most of the answers are specific to the question in that they only go up one directory, whereas, I would like to go up several levels.
Some of the answers are high-level descriptions without code. Other answers have the opposite problem: they contain ugly-looking code, but don't bother to explain it.
I am open to solutions which require restructuring the project. I'm also interested in solutions which leave the project structure intact, but which insert the relevant code into demo_file.py
There is already a very thorough answer to this kind of problem; please refer to Relative imports for the billionth time. As a note, Python 3 and Python 2 define relative and absolute imports quite differently. In Python 3, any module imported directly with import module_name is treated as an absolute import, which means the module must be reachable through sys.path. Imports written with a leading dot, such as from . import something or from .one_module import something, are relative imports; the dot refers to the current package. However, when a module is executed directly, it loses its package namespace: it is always treated as __main__, and the directory containing it is automatically added to sys.path. Hope this helps.
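A minimal sketch of the absolute-import approach for the layout above, assuming you start Python from E:\demo_proj with python -m demo_package.demo_sub_package.demo_sub_sub_package.demo_file, so that E:\demo_proj is on sys.path (the utility function name is hypothetical):

# demo_file.py
from utilities_package import util_one

# util_one.some_utility() is a hypothetical function, shown only to illustrate
# that the top-level package resolves once E:\demo_proj is on sys.path.
util_one.some_utility()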

Can other extensions (besides .py .pyc .pyo) be recognized as Python modules?

I have a few test files residing in test/ directory which has an __init__.py file. The test files are basically Python files but have the extension as .thpy (Python test files). Each of these tests internally use unittest.
For code coverage purposes, I'm using the trace module. coverage.py would have been ideal, but unfortunately, at this time, it doesn't give information on the number of hits per line.
unittest and trace are not really compatible. unittest.py doesn't play well with trace.py - why?
For example: I have a file named cluster_ha.thpy. From the solution given for the above question, I need to mention module='cluster_ha' inside cluster_ha.thpy, but because of the extension, Python doesn't treat it as a Python module.
Is there any way to get around this? Is there a hack that can make other extensions to be considered as a Python module? Or perhaps, there's another module out there which can help me get code coverage?
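One possible way around this, sketched here purely as an illustration (not taken from an accepted answer), is to load the file explicitly with importlib's SourceFileLoader, which does not care about the extension; the path and module name are taken from the example above:

import sys
import importlib.util
from importlib.machinery import SourceFileLoader

# Point the loader at the source file explicitly, regardless of its extension.
loader = SourceFileLoader("cluster_ha", "test/cluster_ha.thpy")
spec = importlib.util.spec_from_loader("cluster_ha", loader)
cluster_ha = importlib.util.module_from_spec(spec)
loader.exec_module(cluster_ha)
sys.modules["cluster_ha"] = cluster_ha  # optional: make it importable elsewhere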

Is there any way to import modules on bash the same way I do in python?

I've been working on a few scripts lately and found that it would be really useful to separate some common functionality into other files, like utils, and then import them into my main scripts. For this, I used source ./utils.sh. However, it seems that this approach depends on the current path from which I'm calling my main script.
Let's say I have this folder structure:
scripts/
|-tools/
| |-utils.sh
|-main.sh
On main.sh:
source ./tools/utils.sh
some_function_defined_on_utils_sh
...
If I run main.sh from scripts/ folder everything works fine. But if I run it from a different directory ./scripts/main.sh the script fails because it can't find ./tools/utils.sh, which makes sense.
I understand why this doesn't work. My question is whether there is any other mechanism to import 'modules' into my script and make everything current-dir agnostic (just like Python's from utils import some_function_defined_on_utils_sh).
Thanks!
Define an environment variable with a suitable default setting that is where you'll store your modules, and then read the module using the . (dot) command.
One serious option is simply to place the files in a directory already on your PATH — $HOME/bin is a plausible candidate for private material; /usr/local/bin for material to be shared. Then you won't even need to specify the path to the files; you can write . file-found-on-path to read it. Of course, the downside is that if someone switches their PATH setting, or puts a file with the same name in another directory earlier on their PATH, then you end up with a mess.
The files do not even need to be executable — they just need to be readable to be usable with the dot (.) command (or in Bash/C shells, the source command).
Or you could specify the location more exactly, such as:
. ${SHELL_UTILITY_DIR:-/opt/shell/bin}/tools/utils.sh
or something along those general lines. The choice of environment variable name and default location is up to you, and how much substructure you use on the directory is infinitely variable. Simplicity and consistency are paramount. (Also consider whether versioning will be a problem; how will you manage updates to your utilities?)

Python 3 relative imports or running modules from within packages... which to abandon?

I'll first begin by stating that I have been coming back to this problem over and over again for the last several months. No matter how much I research into it, I cannot seem to find a satisfactory answer. I'm hoping that the community here can help me out.
Basic problem - consider this structure of python packages and python modules.
|- src
   |- pkg_1
      |- foo_1.py
      |- foo_2.py
      |- __init__.py
   |- pkg_2
      |- bar_1.py
      |- bar_2.py
      |- __init__.py
   |- do_stuff.py
   |- __init__.py
Suppose that module bar_2 needs to import from module foo_1. My options are myriad, but a few present themselves quickly.
(My preferred way) Module bar_2 can do from ..pkg_1 import foo_1. This is great because it doesn't require what amounts to hard-coding a path into the module, thereby allowing flexibility, ease of maintenance, all that. In do_stuff.py if I then write from src.pkg_2 import bar_2 and then run, I am golden. Here is an example setup:
foo_1.py:
class Foo_1():
    def __init__(self):
        print('Hello from foo_1!')
bar_2.py:
from ..pkg_1 import foo_1

class Bar_2():
    def __init__(self):
        print('Hello from bar_2!')
        foo_1.Foo_1()  # Prints foo_1 message!
do_stuff.py:
from src.pkg_2 import bar_2

bar_2.Bar_2()
Console prints:
Hello from bar_2!
Hello from foo_1!
All is well. However, consider the next scenario.
Suppose now that I want to run bar_2 as __main__, as follows:
from ..pkg_1 import foo_1

class Bar_2():
    def __init__(self):
        print('Hello from bar_2!')
        foo_1.Foo_1()

if __name__ == '__main__':
    Bar_2()
A SystemError is raised: from ..pkg_1 import foo_1
SystemError: Parent module '' not loaded, cannot perform relative import
For far longer than I care to admit, I did not understand the reason for this. The explanation lies in the fact that when you run a module directly, its __name__ variable is set to __main__. Since relative imports establish the module's position in the package hierarchy via __name__ (and __package__), there is no package information left to resolve the leading dots against. This makes loads of sense, and I feel very dumb for not having realized it before.
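A tiny illustration of that point, as a sketch you could paste at the top of bar_2.py:

# When bar_2 is imported as part of the package, these print 'src.pkg_2.bar_2'
# and 'src.pkg_2'; when the file is run directly, they become '__main__' and
# '' (or None, depending on the Python version), so the leading '..' has no
# parent package to resolve against.
print(__name__, __package__)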
So, thus began my quest (yeah, just getting started). Eventually I learned of the __package__ variable. Reading about it in the PEP notes, it seemed as though it would solve all my problems! So I tried the following boilerplate code before the import statements in bar_2:
if __name__ == '__main__':
    __package__ = 'src.pkg_2'
This did not work. :(
I since have come to find out that Guido has addressed this very issue and that he regards the whole notion of running a module from within a package as anti-pattern.
See this link:
https://mail.python.org/pipermail/python-3000/2007-April/006793.html
This makes sense, as I will be the first to admit that I only do it for on the fly testing... which should never be done! (Right??) Therefore, as best as I understand, there are no elegant ways to run the module from within a package UNLESS you do absolute imports... which I would like to avoid.
So, after all of that, here is my question: Should I use one of the many hacky ways out there to get around this problem and do unholy things with the system path so that I can have my relative import cake and eat it (i.e. test on the fly by running as __main__) too?? I think I already know the answer (I just would like some wizened Yoda-like person to confirm).
Possible Answer:
Use relative imports, because hard-coding paths (and hard-coding in general, if avoidable) is an anti-pattern.
Do not bother with running modules nested in packages as __main__... instead, run them from your testing module (which you wrote first... right??).
Thank you for taking the time to read this question. I realize there are many other questions regarding this topic... but I wanted to 'share my journey' and see if my current course of action is correct.
Running the following from the directory that contains src/:
python -m src.pkg_2.bar_2
will have src/pkg_2/bar_2.py be your main script while still being inside the src package, meaning that relative imports will work.
I follow these rules which prevent any issues:
Always use full imports, no relative imports.
Always start a Python process from the root of the project or using the full absolute path.
It's rather rare that you benefit from relative imports when you have to move files around (and when you do, it's not that much work to rename a few import lines).
Full paths from the root of a project or using the full path on disk of a file removes any ambiguity and will run the desired file.
My answer (and that of many CPython core developers) is essentially the same as Simeon's. The only thing not hard-coded by relative imports is the name of the package ('src', in this case). But you hard-coded it in do_stuff.py anyway. Within-subpackage relative imports (unlike the cross-subpackage imports you exhibit) let you copy the subpackage to another package (with a different name), and possibly change the subpackage name as you do so. Do you really need that particular flexibility? Is it really worth more than the very real cost of relative imports?
Why the subpackages, instead of putting everything in the main package, src? Idlelib has about 60 run-time modules in idlelib/ itself. The only subpackage is idle_test, for test modules. All imports start with idlelib.
I am a big fan of being able to run the test for one module (rather than the package test suite) by running non-CLI modules as the main module. It encourages incremental TDD. So I have if __name__ == '__main__' clauses in both run-time and test modules in idlelib.
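A sketch of that pattern with hypothetical module names: a run-time module that, when executed as the main module, runs only its own test file rather than the whole suite:

# mypkg/widget.py
def double(x):
    return 2 * x

if __name__ == '__main__':
    from unittest import main
    # Run just this module's tests (mypkg/tests/test_widget.py), not the full suite.
    main('mypkg.tests.test_widget', verbosity=2, exit=False)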
