How does Python bring functions into scope? - python-3.x

I contribute to scikit-image and was using coverage. When I ran
coverage run benchmarks/benchmark_name.py
and then generated the report, there were a lot of files in the report that had no link to this benchmark script but were still executed by the command above. One interesting thing I noticed in those files: only the lines containing a function definition (def abc():) were marked as run. See the image below:
[screenshot: coverage report of a file with no link to my script; the file was nevertheless run, and only its function definition and import statements were marked as executed]
Is this how Python brings functions defined in the project into scope? If so, I would like to understand the flow by which this happens. Please help.
Thanks.

You're looking at transitive import dependencies. At import time, anything not protected by an if __name__ == '__main__': guard is executed, including the def statements you mentioned (executing a def statement binds the function name; the function body itself only runs when the function is called).
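A minimal sketch of this behaviour (the module and function names are hypothetical):

```python
# helpers.py (hypothetical) -- imported somewhere in the dependency chain
def abc():                       # this line executes at import time
    return 42                    # the body runs only if abc() is called

print('side effect at import')   # top-level code also executes at import time

if __name__ == '__main__':
    print('runs only when helpers.py is executed directly')
```

Any benchmark that directly or indirectly imports helpers pulls it into the traced run, so coverage marks the def line as executed even though abc() itself was never called.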
Use coverage run --omit=... and similar options to trim your reporting output.

Related

How to develop Python Library in DataBricks without packaging and installing after every single change?

For simplicity, say I have two Python scripts: one is main, one is lib. My question is: how can I test my lib in main without needing to build and install the lib after every single change?
A single file can be handled easily, as answered here (https://stackoverflow.com/a/67280018/18105234). What if I have a nested library?
The idea is to develop in Databricks the way one would in JupyterLab.
There are two approaches:
Use %run (doc) to include the "library" notebook in the "main" notebook. You need to re-execute that %run cell after each change. A full example of this approach can be found in this file.
Use the new Databricks Repos functionality called arbitrary files. In this case your library code should live in Python files, together with a corresponding __init__.py (right now you can't use notebooks), and you include it as a "normal" Python package using an import statement. To automatically reload changes from the package you need special magic commands, as shown in another example:
%load_ext autoreload
%autoreload 2
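For illustration, a hedged sketch of how the second approach looks in a notebook cell (my_lib, utils, and transform are made-up names):

```python
%load_ext autoreload
%autoreload 2

# my_lib/ lives as plain .py files in the repo, next to an __init__.py
from my_lib.utils import transform  # hypothetical module and function

result = transform([1, 2, 3])  # edits to my_lib/utils.py are picked up on re-run
```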
The second approach has more advantages, as it allows you to take the code and, for example, build a library from it, or apply code checks that aren't possible with notebooks out of the box.
P.S. My repository shows a full example of how to use Databricks Repos and test notebook code from a CI/CD pipeline.

Why is doctest skipping tests on imported methods?

I have a python module some_module with an __init__.py file that imports methods, like this:
from .some_python_file import some_method
some_method has a docstring that includes doctests:
def some_method():
    """
    >>> assert False
    """
But when I run doctest on the module, it passes even if the tests should fail.
import doctest
import some_module
# How do I get this to consistently fail, regardless of whether
# `some_module.some_method` was declared inline or imported?
assert doctest.testmod(some_module).failed == 0
If I instead define some_method within the __init__.py file, the doctest correctly fails.
Why are these two situations behaving differently? The method is present and has the same __doc__ attribute in both cases.
How do I get doctest to run the tests defined in the docstrings of methods that were imported into the module?
In Python, a module is defined by a single file; in your case some_python_file is one module and __init__.py is another. Doctest has a check to run tests only for examples reachable from the module under test, which can be found here.
The best way to see this behaviour in practice is to use PDB and pdb.set_trace() right before you call doctest.testmod(some_module) and step inside to follow the logic.
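For example, a minimal way to set that up (some_module is the package from the question):

```python
import doctest
import pdb

import some_module

pdb.set_trace()  # at the (Pdb) prompt, type "s" to step into testmod
doctest.testmod(some_module)
```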
Later edit:
Doctest ignores imported methods per this comment. If you want to be able to run your tests, you should probably define a main function in your module and run them with python some_module.py. You can follow this example.
To achieve your expected behaviour you need to manually create a __test__ dict in your __init__.py file:
from .some_python_file import some_method
__test__ = {"some_method": some_method}
See this link as well.
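Putting it together, a minimal sketch using the names from the question:

```python
# some_module/__init__.py
from .some_python_file import some_method

# doctest only searches objects defined in this module, so imported
# objects must be registered explicitly:
__test__ = {"some_method": some_method}
```

```python
# run_tests.py (hypothetical) -- the failing doctest is now collected
import doctest
import some_module

results = doctest.testmod(some_module, verbose=True)
assert results.failed == 0  # raises, because `>>> assert False` now runs
```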
Objects imported into the module are not searched.
See docs on which docstrings are examined.
You can inject the imported function into the module's __test__ attribute to have imported objects tested:
__test__ = {'some_method': some_method}
I stumbled upon this question because I was hacking a __doc__ attribute of an imported object.
from foo import bar
bar.__doc__ = """
>>> assert True
"""
and I was also wondering why the doctest of bar did not get executed by the doctest runner.
The previously given answer to add a `__test__` mapping solved it for good.
```python
__test__ = dict(bar=bar.__doc__)
```
I think the explanation for this behaviour is the following: if you are using a library, let's say NumPy, you do not want all of its doctests to be collected and run in your own code.
Simply, because it would be redundant.
You should trust the developers of the library to (continuously) test their code, so you do not have to do it.
If you have tests defined in your own code, you should have a test collector (e.g. pytest) descend into all files of your project structure and run these.
You would end up testing all the doctests in the libraries you use, which takes a lot of time. So the decision to ignore imported doctests is very sane.

When should I use pytest --import-mode importlib

This is my project structure:
git_dir/
    root/
        __init__.py
        tests/
            __init__.py
            test1.py
        foo_to_test/
            foo.py
I'm using pytest to test foo.py, and test1.py is as follows:
from foo_to_test.foo import func_to_test

def test1():
    assert ...  # something about func_to_test
From the terminal, I want to run
pytest tests
Now for the problem.
When using --import-mode append or --import-mode prepend it adds 'git_dir' to PYTHONPATH.
The problem is that 'git_dir' is not the project root.
I can add sys.path.append('git_dir/root') but the name 'git_dir' is different for other programmers working on other computers.
I've seen in the pytest documentation that using --import-mode importlib might solve my problem, but it doesn't seem to have any effect on my PYTHONPATH, and I can't understand what it is doing.
So my questions are:
What --import-mode importlib is doing?
How can I automatically add my root to the path when testing, so that it is the same for every programmer pulling this project?
This looks hard because it's not how it's supposed to work. It may be time to learn about packaging.
The Python module system is designed to look for modules on the sys.path directories, as you describe. This variable is to be populated by the user, somehow, and should preferably not be meddled with.
When a developer wants to use your package, they must make sure it is installed on the system they run the tests on.
If you want to make that easier for them, provide e.g. a pyproject.toml file; then they can build your package and install it with pip install /path/to/your/distribution-file.tgz.
More info about that here: https://setuptools.readthedocs.io/en/latest/userguide/quickstart.html#python-packaging-at-a-glance.
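For illustration, a minimal pyproject.toml sketch using setuptools (the name and version are placeholders):

```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "root"      # placeholder project name
version = "0.1.0"
```

With this in place, developers can also run pip install -e . from the project root to get an editable install, instead of meddling with sys.path by hand.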
Your problem is that you have __init__.py in the root directory. When pytest finds a test module, it walks up the parent folders until it finds one that does not contain __init__.py; that directory is the one added to sys.path, see Choosing a test layout / import rules. So your root directory should NOT be a Python package.
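Concretely, dropping the top-level __init__.py gives a layout like this, and pytest then inserts root/ itself into sys.path, so from foo_to_test.foo import func_to_test resolves on any machine:

git_dir/
    root/              (no __init__.py here)
        tests/
            __init__.py
            test1.py
        foo_to_test/
            foo.py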
Now about importlib and why you probably don't need it
--import-mode=importlib circumvents the standard Python way of using modules and sys.path. The exact reasons are unclear from the docs:
With --import-mode=importlib things are less convoluted because pytest doesn’t need to change sys.path or sys.modules, making things much less surprising.
And they also mention that this allows test modules to have non-unique names. But why would we want that? It seems that with a proper file structure everything works fine even when pytest modifies sys.path.
Anyway, overall the usage of this option doesn't sound convincing, especially given this in the official docs:
Initially we intended to make importlib the default in future releases, however it is clear now that it has its own set of drawbacks so the default will remain prepend for the foreseeable future.

Running doctests using a different function

I wrote a module with a few functions along with their doctests, and I would like to run these tests on functions with the same names but written by someone else.
The documentation provides the following snippet for retrieving all tests for somefunction in mymodule and then running them in the usual way (like running doctest.testmod()):
import doctest
import mymodule  # the module whose doctests we want to reuse

TESTS = doctest.DocTestFinder().find(mymodule.somefunction)
DTR = doctest.DocTestRunner(verbose=True)
for test in TESTS:
    print(test.name, '->', DTR.run(test))
But I don't know where to go from here to have those tests run on theirmodule.somefunction instead. I tried changing the filename field from mymodule to theirmodule in the Example objects for each test, but to no avail. Does anyone know how to achieve this?
This may not be the most elegant solution, but simply copying my docstrings to their functions in my script works:
theirmodule.somefunction.__doc__ = mymodule.somefunction.__doc__
And then I only need to run the snippet in my question on theirmodule.somefunction.
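Generalizing that idea, a small sketch (assuming both modules expose functions with the same names):

```python
import doctest

import mymodule      # has the doctests
import theirmodule   # same-named functions, different implementations

# copy each doctest-bearing docstring onto the same-named function
for name in ['somefunction']:  # extend with the other shared names
    getattr(theirmodule, name).__doc__ = getattr(mymodule, name).__doc__

# the docstrings now live on theirmodule's own functions, so testmod finds them
doctest.testmod(theirmodule, verbose=True)
```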

Python 3 relative imports or running modules from within packages... which to abandon?

I'll first begin by stating that I have been coming back to this problem over and over again for the last several months. No matter how much I research into it, I cannot seem to find a satisfactory answer. I'm hoping that the community here can help me out.
Basic problem - consider this structure of python packages and python modules.
|- src
   |- pkg_1
      |- foo_1.py
      |- foo_2.py
      |- __init__.py
   |- pkg_2
      |- bar_1.py
      |- bar_2.py
      |- __init__.py
   |- do_stuff.py
   |- __init__.py
Suppose that module bar_2 needs to import from module foo_1. My options are myriad, but a few present themselves quickly.
(My preferred way) Module bar_2 can do from ..pkg_1 import foo_1. This is great because it doesn't require what amounts to hard-coding a path into the module, thereby allowing flexibility, ease of maintenance, all that. In do_stuff.py if I then write from src.pkg_2 import bar_2 and then run, I am golden. Here is an example setup:
foo_1.py:
class Foo_1():
    def __init__(self):
        print('Hello from foo_1!')
bar_2.py:
from ..pkg_1 import foo_1

class Bar_2():
    def __init__(self):
        print('Hello from bar_2!')
        foo_1.Foo_1()  # Prints foo_1 message!
do_stuff.py:
from src.pkg_2 import bar_2
bar_2.Bar_2()
Console prints:
Hello from bar_2!
Hello from foo_1!
All is well. However, consider the next scenario.
Suppose now that I want to run bar_2 as __main__, as follows:
from ..pkg_1 import foo_1

class Bar_2():
    def __init__(self):
        print('Hello from bar_2!')
        foo_1.Foo_1()

if __name__ == '__main__':
    Bar_2()
A SystemError is raised at the line from ..pkg_1 import foo_1:
SystemError: Parent module '' not loaded, cannot perform relative import
For far longer than I care to admit, I did not understand the reason for this. The explanation lies in the fact that when you run a module directly, its __name__ variable is set to __main__. Since relative imports establish position in the package hierarchy from __name__, there is no package information left to resolve them against. This makes loads of sense, and I feel very dumb for not having realized it before.
So, thus began my quest (yeah, just getting started). Eventually I learned of the __package__ variable. Reading about it in the PEP notes, it seemed as though it would solve all my problems! So I tried the following boilerplate code before the import statements in bar_2:
if __name__ == '__main__':
    __package__ = 'src.pkg_2'
This did not work. :(
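The reason, per PEP 366, is that setting __package__ alone is not enough: the parent package must also be imported and findable on sys.path. A fuller boilerplate (a sketch only; the path depth is an assumption based on the layout above) would be:

```python
# top of bar_2.py, BEFORE the relative imports
if __name__ == '__main__' and not __package__:
    import os
    import sys
    # put the directory *containing* src/ on sys.path (three levels up)
    sys.path.insert(0, os.path.dirname(os.path.dirname(
        os.path.dirname(os.path.abspath(__file__)))))
    import src.pkg_2             # make the parent package importable
    __package__ = 'src.pkg_2'    # so that `..pkg_1` can be resolved

from ..pkg_1 import foo_1
```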
I have since come to find out that Guido has addressed this very issue, and that he regards the whole notion of running a module from within a package as an anti-pattern.
See this link:
https://mail.python.org/pipermail/python-3000/2007-April/006793.html
This makes sense, as I will be the first to admit that I only do it for on-the-fly testing... which should never be done! (Right??) Therefore, as best as I understand, there are no elegant ways to run a module from within a package UNLESS you use absolute imports... which I would like to avoid.
So, after all of that, here is my question: Should I use one of the many hacky ways out there to get around this problem and do unholy things with the system path so that I can have my relative import cake and eat it (i.e. test on the fly by running as __main__) too?? I think I already know the answer (I just would like some wizened Yoda-like person to confirm).
Possible Answer:
Use relative imports, because hard-coding paths (and hard-coding in general, if avoidable) is an anti-pattern.
Do not bother with running modules nested in packages as __main__... instead, run them from your testing module (which you wrote first... right??).
Thank you for taking the time to read this question. I realize there are many other questions regarding this topic... but I wanted to 'share my journey' and see if my current course of action is correct.
Running the following from the directory that contains src (so the src package is importable):
python -m src.pkg_2.bar_2
will have pkg_2/bar_2.py be your main script while still being inside the package, meaning that relative imports will work. (Note that python -m pkg_2.bar_2 run from inside src/ would make pkg_2 the top-level package, and from ..pkg_1 would then fail because the relative import reaches beyond the top-level package.)
I follow these rules which prevent any issues:
Always use full imports, no relative imports.
Always start a Python process from the root of the project or using the full absolute path.
It's rather rare that you benefit from relative imports when you have to move files around (and when you do, it's not that much work to rename a few import lines).
Full paths from the root of a project, or the full path on disk of a file, remove any ambiguity and will run the desired file.
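Applied to the layout from the question, a sketch of what these rules look like (assuming you always launch Python from the directory containing src):

```python
# bar_2.py -- absolute-import variant of the earlier example
from src.pkg_1 import foo_1

class Bar_2():
    def __init__(self):
        print('Hello from bar_2!')
        foo_1.Foo_1()

if __name__ == '__main__':
    Bar_2()  # run with: python -m src.pkg_2.bar_2
```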
My answer (and that of many CPython core developers) is essentially the same as Simeon's. The only thing not hard-coded by relative imports is the name of the package ('src', in this case). But you hard-coded it in do_stuff.py anyway. Within-subpackage relative imports (unlike the cross-subpackage imports you exhibit) let you copy the subpackage to another package (with a different name), and possibly change the subpackage name as you do so. Do you really need that particular flexibility? Is it really worth more than the very real cost of relative imports?
Why the subpackages, instead of putting everything in the main package, src? Idlelib has about 60 run-time modules in idlelib/ itself. The only subpackage is idle_test, for test modules. All imports start with idlelib.
I am a big fan of being able to run the test for one module (rather than the package test suite) by running non-CLI modules as the main module. It encourages incremental TDD. So I have if __name__ == '__main__': clauses in both run-time and test modules in idlelib.
