Retrieving the source code dependencies of a Python 3 function

Using the AST in Python 3, how do you build a dictionary or list of the code dependencies of a given function?
Consider the following code, where my_clever_function has the desired behaviour:
////// myfile2.py
import numpy as np

a = 1
a += 1

def my_other_function():
    def f():
        return a
    return np.random.randint(10) + f()
////// myfile1.py
import numpy as np
from .myfile2 import my_other_function

def external(a, b):
    return np.sqrt(a * b) + my_other_function()

class A:
    def afunc(self, a, b):
        v = external(a, b)
        return v
>>> my_clever_function(A.afunc)
[myfile1.A.afunc, myfile1.external, myfile2.my_other_function, myfile2.a]
with the following structure:
project/
    myfile1.py
    myfile2.py
I want to retrieve the dependencies of the method afunc as a list.
I'm assuming that there is no funny business about functions altering global variables.
- external is a dependency because it is not defined inside A.afunc
- np.sqrt is not a "dependency" (in this sense, anyway) because it is not defined in my project
- likewise for np.random.randint
- my_other_function is a dependency because it is not defined inside A.afunc
- f is not a dependency because it is defined inside my_other_function
- f needs the global variable a
My motivation is to see whether there have been any code changes between two versions of a project (in git, perhaps).
We could find the dependencies of a function as above and store their source.
In the future, we find the dependencies again and check whether the source code differs.
That way we only compare the parts that are required (barring any funny global variables messing about inside functions).
It is possible to walk the AST with Python's built-in ast module, so my_clever_function could look something like this:
import ast
from pprint import pprint

import dill

class Analyzer(ast.NodeVisitor):
    def __init__(self):
        self.stats = {...}
        ...
    def report(self):
        pprint(self.stats)

def my_clever_function(f):
    source = dill.source.getsource(f)
    tree = ast.parse(source)
    analyzer = Analyzer()
    analyzer.visit(tree)
    analyzer.report()
But how do you walk from a given function outwards to its dependencies?
I can see how you can list all the symbols (https://www.mattlayman.com/blog/2018/decipher-python-ast/), but how do you list only those that the starting function actually depends on?
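For a starting point, here is a minimal sketch (my own, not a full solution) that collects the names a function loads without binding them locally; it ignores nested scopes, attribute chains, comprehensions and imports, all of which a real my_clever_function would have to handle:

import ast

class DependencyCollector(ast.NodeVisitor):
    """Collect names a function reads but never binds locally."""

    def __init__(self):
        self.loaded = set()  # names read (ast.Load context)
        self.bound = set()   # names assigned to, or bound as arguments

    def visit_FunctionDef(self, node):
        for arg in node.args.args:
            self.bound.add(arg.arg)  # parameters are local bindings
        self.generic_visit(node)

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load):
            self.loaded.add(node.id)
        else:  # ast.Store / ast.Del
            self.bound.add(node.id)

    def external_names(self):
        return self.loaded - self.bound

source = """
def external(a, b):
    return np.sqrt(a * b) + my_other_function()
"""
collector = DependencyCollector()
collector.visit(ast.parse(source))
print(collector.external_names())  # {'np', 'my_other_function'}

Filtering that set against the names defined in your own project (rather than in site-packages) and recursing over each hit would give the closure the question asks for.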

Related

Python: Pseudo Enums - Classes as enums - how to avoid cyclic import

I want to create a pseudo enum in my Python project.
The values are actually classes.
# file my_enums.py
import MyClass1
import MyClass2
import MyClass3

class MyEnum:
    MY_CLASS_1 = MyClass1
    MY_CLASS_2 = MyClass2
    MY_CLASS_3 = MyClass3
# file my_class1.py
import MyEnum

class MyClass1:
    def foo(self, x):
        print(isinstance(x, MyEnum.MY_CLASS_2))
Doing this results in a cyclic import error.
I want to be able to use the MyEnum values with isinstance and to import the enum into modules that define some of those classes.
Is there a way to do so?
Solution:
# file my_enums.py
import MyClass1
import MyClass2
import MyClass3

class MyEnum:
    MY_CLASS_1 = None
    MY_CLASS_2 = None
    MY_CLASS_3 = None

    @classmethod
    def define(cls):
        cls.MY_CLASS_1 = MyClass1

MyEnum.define()
The thing to remember is that when a module is loaded, it is executed -- but only the top-level statements and the immediate interior of top-level classes; the bodies of functions and methods are not evaluated until they are actually called.
# example module
CONSTANT = 7                        # top-level, executed

def a_func(value=CONSTANT):         # top-level, executed
    return value + 9                # body, not executed

class a_class(metaclass=SomeMeta):  # top-level, executed (and an error, as SomeMeta
                                    # has not been defined nor imported)
    CLS_CONSTANT = 3                # top-level class body, executed
    def a_method(self):             # executed
        return self.CLS_CONSTANT + FUTURE_CONSTANT  # method body, not executed

FUTURE_CONSTANT = 11
So in your example you need to make sure not to use MyEnum anywhere in my_class1.py that will be executed during import, and to put the import of my_enums at the very end; then, when my_enums.py is executed during its import, it will be able to import my_class1, which will, at that point, have the classes defined.
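A minimal sketch of the arrangement described (module and class names assumed from the question):

# file my_class1.py
class MyClass1:
    def foo(self, x):
        # the method body only runs when foo() is called, long after
        # both modules have finished importing, so the lookup succeeds
        print(isinstance(x, my_enums.MyEnum.MY_CLASS_2))

# placed at the very end: MyClass1 already exists by the time
# my_enums imports this module back
import my_enums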

Sphinx autodoc does not display all types or circular import error

I am trying to auto-document types with sphinx autodoc, napoleon and autodoc_typehints, but I am having problems, as it does not work with most of my types. I am using the deap package for a genetic optimization algorithm, which means I have some very specific types that I guess Sphinx cannot handle.
My conf.py file looks like this:
import os
import sys
sys.path.insert(0, os.path.abspath('../python'))

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.viewcode',
    'sphinx.ext.napoleon',
    'sphinx_autodoc_typehints'
]
set_type_checking_flag = False
always_document_param_types = False
I have an Algo.rst file with:
.. automodule:: python.algo.algo
   :members: crossover_worker,
             test
and my python.algo.algo module looks like this (I've added a dummy test function to show it works whenever I have no special types specified):
# Type hinting imports
from config.config import Config
from typing import List, Set, Dict, NamedTuple, Union, Tuple
from types import ModuleType
from numpy import ndarray
from numpy import float64
from multiprocessing.pool import MapResult
from deap.tools.support import Logbook, ParetoFront
from deap.base import Toolbox
from pandas.core.frame import DataFrame
from deap import creator

import random  # needed by crossover_worker

...

def crossover_worker(sindices: List[creator.Individual, creator.Individual]) -> Tuple[creator.Individual, creator.Individual]:
    """
    Uniform crossover using fixed threshold

    Args:
        sindices: list of two individuals on which we want to perform crossover

    Returns:
        tuple of the two individuals with crossover applied
    """
    ind1, ind2 = sindices
    size = len(ind1)
    for i in range(size):
        if random.random() < 0.4:
            ind1[i], ind2[i] = ind2[i], ind1[i]
    return ind1, ind2

def test(a: DataFrame, b: List[int]) -> float:
    """
    test function

    Args:
        a: something
        b: something

    Returns:
        something
    """
    return b
With the settings in conf.py as above I get no error, and the types for my test function are correct, but the types for my crossover_worker function are missing.
However, when I set set_type_checking_flag = True to force using all types, I get a circular import error:
reading sources... [100%] index
WARNING: autodoc: failed to import module 'algo' from module 'python.algo'; the following exception was raised:
cannot import name 'ArrayLike' from partially initialized module 'pandas._typing' (most likely due to a circular import) (/usr/local/lib/python3.8/site-packages/pandas/_typing.py)
looking for now-outdated files... none found
I never import ArrayLike, so I don't understand where it comes from or how to solve it.
And how do I force Sphinx to also import the creator.Individual types that appear everywhere in my code?
My sphinx versions:
sphinx==3.0.1
sphinx-autodoc-typehints==1.10.3
After some searching, I found some flaws in my approach:
Firstly, a "list is a homogeneous structure containing values of one type. As such, List only takes a single type, and every element of that list has to have that type." (source). Consequently, I cannot write something like List[creator.Individual, creator.Individual]; it should be List[creator.Individual], or, if the list can hold multiple types, a union such as List[Union[int, float]].
Secondly, the type creator.Individual is not recognized by Sphinx as a valid type. Instead, I should define it using TypeVar, like this:

from typing import TypeVar, List

CreatorIndividual = TypeVar("CreatorIndividual", bound=List[int])
So by transforming my crossover_worker function to this, it all worked:
def crossover_worker(sindices: List[CreatorIndividual]) -> Tuple[CreatorIndividual, CreatorIndividual]:
Note: "By contrast, a tuple is an example of a product type, a type consisting of a fixed set of types, and whose values are a collection of values, one from each type in the product type. Tuple[int,int,int], Tuple[str,int] and Tuple[int,str] are all distinct types, distinguished both by the number of types in the product and the order in which they appear."(source)

Retaining a variable created during module import in python

I am trying to populate a dictionary with functions, keyed by each function's name, from another file, of the form:
{'fn_a': <function fn_a at 0x000002239BDCB510>, 'fn_b': <function fn_b at 0x000002239BDCB268>}
I'm currently attempting to do it with a decorator, so that when the file containing the functions (definitions.py) is imported, the dictionary is populated as follows. The problem is that the dictionary is cleared once the import is complete.
definitions.py:
from main import formatter

@formatter
def fn_a(arg):
    return arg

@formatter
def fn_b(arg):
    return arg
main.py:
available_functions = {}

def formatter(func):
    # work out function name and write to func_name
    func_name = str(func).split()[1]
    available_functions[func_name] = func
    return func

import definitions
How can I keep the dictionary populated with values after the module import is finished?
I was able to solve the problem by using types.FunctionType to collect the available functions from the imported module. It doesn't solve the problem within the conditions I specified above, but it does work.
from types import FunctionType

available_functions = {}

def formatter(func):
    # work out function name and write to func_name
    # global available_functions
    func_name = str(func).split()[1]
    available_functions[func_name] = func
    return func

import definitions

funcs = [getattr(definitions, a) for a in dir(definitions)
         if isinstance(getattr(definitions, a), FunctionType)]
for i in funcs:
    formatter(i)
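As an aside, str(func).split()[1] can be replaced by the function's __name__ attribute, which is the idiomatic way to get the name; a sketch of the same registry using it:

from types import FunctionType

available_functions = {}

def formatter(func):
    # __name__ holds the bare function name, e.g. 'fn_a'
    available_functions[func.__name__] = func
    return func

import definitions

# note: like the dir() version above, this also picks up functions
# imported into definitions, such as formatter itself
for obj in vars(definitions).values():
    if isinstance(obj, FunctionType):
        formatter(obj)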

mypy importlib module functions

I am using importlib to import modules at runtime. These modules are plugins for my application and must implement one or more module-level functions. I have started adding type annotations to my application and I get an error from mypy stating
Module has no attribute "generate_configuration"
where generate_configuration is one of the module functions.
In this example, the module is only required to have a generate_configuration function in it. The function takes a single dict argument.
def generate_configuration(data: Dict[str, DataFrame]) -> None: ...
I have been searching around for how to specify the interface of a module but all I can find are class interfaces. Can someone point me to some documentation showing how to do this? My google-fu is failing me on this one.
The code that loads this module is shown below. The error is generated by the last line.
plugin_directory = os.path.join(os.path.abspath(directory), 'Configuration-Generation-Plugins')
plugins = (
    module_file
    for module_file in Path(plugin_directory).glob('*.py')
)
sys.path.insert(0, plugin_directory)
for plugin in plugins:
    plugin_module = import_module(plugin.stem)
    plugin_module.generate_configuration(directory, points_list)
The type annotation for importlib.import_module says that it simply returns types.ModuleType.
From the typeshed source:
def import_module(name: str, package: Optional[str] = ...) -> types.ModuleType: ...
This means that the revealed type of plugin_module is Module -- which doesn't have your specific attributes.
Since mypy is a static analysis tool, it can't know that the return value of that import has a specific interface.
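You can see this with mypy's reveal_type pseudo-function (module name hypothetical; this is checked by mypy, not executed):

from importlib import import_module

plugin_module = import_module("some_plugin")
reveal_type(plugin_module)  # mypy: Revealed type is "types.ModuleType"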
Here's my suggestion:
Make a type interface for your module (it doesn't have to be instantiated; it'll just help mypy figure things out):

class ModuleInterface:
    @staticmethod
    def generate_configuration(data: Dict[str, DataFrame]) -> None: ...
Make a function which imports your module. You may need to sprinkle in # type: ignore, though if you use __import__ instead of import_module you may be able to avoid this limitation:

def import_module_with_interface(modname: str) -> ModuleInterface:
    return __import__(modname, fromlist=['_trash'])  # might need to ignore the type here
Enjoy the types :)
The sample code I used to verify this idea:
class ModuleInterface:
    @staticmethod
    def compute_foo(bar: str) -> str: ...

def import_module_with_interface(modname: str) -> ModuleInterface:
    return __import__(modname, fromlist=['_trash'])

def myf() -> None:
    mod = import_module_with_interface('test2')
    # mod.compute_foo()  # test.py:12: error: Too few arguments for "compute_foo" of "ModuleInterface"
    mod.compute_foo('hi')
I did some more research and eventually settled on a slightly different solution which uses typing.cast.
The solution still uses the static method definition from Anthony Sottile.
from types import ModuleType
from typing import Dict

from pandas import DataFrame

class ConfigurationGenerationPlugin(ModuleType):
    @staticmethod
    def generate_configuration(directory: str, points_list: Dict[str, DataFrame]) -> None: ...
The code that imports the module then uses typing.cast() to set the correct type.
plugin_directory = os.path.join(os.path.abspath(directory), 'Configuration-Generation-Plugins')
plugins = (
    module_file
    for module_file in Path(plugin_directory).glob('*.py')
    if not module_file.stem.startswith('lib')
)
sys.path.insert(0, plugin_directory)
for plugin in plugins:
    plugin_module = cast(ConfigurationGenerationPlugin, import_module(plugin.stem))
    plugin_module.generate_configuration(directory, points_list)
I am not sure how I feel about having to add the ConfigurationGenerationPlugin class or the cast() call just to make mypy happy. However, I am going to stick with it for now.
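It may help to remember that typing.cast is purely a static assertion; at runtime it returns its second argument unchanged, so the loop behaves identically with or without it. A tiny illustration:

from typing import cast

x = cast(str, 42)
print(x, type(x))  # 42 <class 'int'> -- no runtime conversion or check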

Which form of relative import to prefer inside a package

I'm writing a library, named Foo for the sake of example.
The __init__.py file:
from .foo_exceptions import *
from .foo_loop import FooLoop

main_loop = FooLoop()

from .foo_functions import *

__all__ = ['main_loop'] + foo_exceptions.__all__ + foo_functions.__all__
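(For completeness, a minimal foo_exceptions.py that this __init__.py assumes; foo_functions.py would follow the same pattern:)

# foo_exceptions.py (assumed sketch)
class FooError(Exception):
    """Base exception of the Foo library."""

__all__ = ['FooError']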
When installed, it can be used like this:
# example A
from Foo import foo_create, main_loop

foo_obj = foo_create()
main_loop.register(foo_obj)
or like this:
# example B
import Foo

foo_obj = Foo.foo_create()
Foo.main_loop.register(foo_obj)
I clearly prefer the example B approach: no name conflicts, and the source of each external object is explicitly stated.
So much for the introduction; now my question. Inside this library I need to import something from a different file. Again, I have several ways to do it, and the question is which style to prefer - C, D or E? Read below.
# example C
from . import foo_exceptions
raise foo_exceptions.FooError("fail")
or
# example D
from .foo_exceptions import FooError
raise FooError("fail")
or
# example E
from . import FooError
raise FooError("fail")
Approach C has the disadvantage that importing a whole module instead of just the few required objects increases the chance of a cyclic import problem. Also consider this line:
from . import foo_exceptions, main_loop
It looks like an import of two symbols from one source, but it isn't: the former (foo_exceptions) is a module (a .py file in the current directory) and the latter is an object defined in __init__.py.
That's why I'm not using style C, and the question in its final form is: D or E (and why)?
(Thank you for reading this long question. All code fragments are examples only and may contain typos)
After the answer from alexanderlukanin:
EDIT1: corrected errors in __init__.py
NOTE1: foo_ prefixes are only to emphasize the relationship between objects
EDIT2: When importing an object which is not part of the library interface, style E is not usable. I think we have a winner: It's the from .module import symbol form.
Don't use old-style relative imports:
# Import from foo/foo_loop.py
# This DOES NOT WORK in Python 3
# and MAY NOT WORK AS EXPECTED in Python 2
from foo_loop import FooLoop
# This is reliable and unambiguous
from .foo_loop import FooLoop
Don't use asterisk import unless you really have to.
# Namespace pollution! Name clashes!
from .submodule import *
Don't use prefixes - you've got namespaces exactly for that purpose.
# Unpythonic
from foo import foo_something_create
foo_something_create()
# Pythonic
import foo.something
foo.something.create()
Your package's API must be well-defined. Your implementation must not be too tangled. The rest is a matter of taste.
# [C] This is good.
# Import order: __init__.py, exceptions.py
from . import exceptions
raise exceptions.FooError
# [D] This is also fine.
# Import order is the same as above,
# only name binding inside the current module is different.
from .exceptions import FooError
raise FooError
# [E] This is not as good because it adds one unnecessary level of indirection
# submodule.py -> __init__.py -> exceptions.py
from . import FooError
raise FooError
See also: Circular (or cyclic) imports in Python
