A solution for this problem is available for Python 2, but it uses the imp module, which is deprecated in Python 3.
imp has been replaced by importlib, which works well for file-based imports. Specifically, importlib.import_module expects a module name, not a string of source code or a file handle.
I made a workaround by dumping the contents of the URL to a file and importing that:
import requests

def initlog():
    modulename = '_mylogging'
    try:
        import _mylogging
    except ImportError:
        r = requests.get('http://(...)/mylogging.py')
        with open(modulename + '.py', "w") as f:
            f.write(r.text)
    finally:
        import _mylogging
    return _mylogging.MYLogging().getlogger()
but I would like to avoid the intermediate file.
Putting the security, network performance and availability issues aside - is there a way to feed a string to importlib? (or from a file handler, in which case I would use io.StringIO)
You can adapt exactly the same answer to 3.x, using the replacement for imp.new_module:
from types import ModuleType
foo = ModuleType('foo')
and the replacement for the exec statement:
foo_code = """
class Foo:
    pass
"""
exec(foo_code, globals(), foo.__dict__)
After which everything works as expected:
>>> dir(foo)
['Foo', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
>>> foo.Foo()
<__main__.Foo object at 0x110546ba8>
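If you want the result to behave like a genuinely imported module (registered in sys.modules so later import statements find it), you can build it via importlib.util instead of ModuleType directly; a sketch, with the module name foo and the source string chosen for illustration:

```python
import importlib.util
import sys

source = """
class Foo:
    pass
"""

# Create an empty module from a spec with no loader, run the source string
# in its namespace, then register it so `import foo` works afterwards.
spec = importlib.util.spec_from_loader("foo", loader=None)
foo = importlib.util.module_from_spec(spec)
exec(source, foo.__dict__)
sys.modules["foo"] = foo
```

After this, `import foo` anywhere in the same process returns the same module object.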
Related
I'm working on personal documentation for the nested matplotlib (MPL) library, which differs from MPL's own documentation in the submodule packages I'm interested in. I'm writing a Python script that I hope will automate document generation for future MPL releases.
I selected the submodules/packages of interest and want to list their main classes, from which I'll generate a list and process it with pydoc.
The problem is that I can't find a way to instruct Python to load a submodule from a string. Here is an example of what I tried:
import matplotlib.text as text
x = dir(text)

i = __import__('matplotlib.text')
y = dir(i)

j = __import__('matplotlib')
z = dir(j)
And here is a three-way comparison of the above lists through pprint (screenshot omitted):
I don't understand what's loaded into the y object - it's the base matplotlib plus something else, but it lacks the information I wanted, namely the main classes from the matplotlib.text package. Those do show up in the x list.
Please don't suggest Sphinx as a different approach.
The __import__ function can be a bit hard to understand.
If you change
i = __import__('matplotlib.text')
to
i = __import__('matplotlib.text', fromlist=[''])
then i will refer to matplotlib.text.
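The difference is easy to see with a standard-library package; a quick sketch:

```python
import os
import os.path

# Without fromlist, __import__ returns the top-level package...
top = __import__("os.path")
print(top.__name__)  # os

# ...while a non-empty fromlist makes it return os.path itself.
sub = __import__("os.path", fromlist=[""])
print(sub is os.path)  # True
```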
In Python 3.1 or later, you can use importlib:
import importlib
i = importlib.import_module("matplotlib.text")
Some notes
If you're trying to import something from a sub-folder e.g. ./feature/email.py, the code will look like importlib.import_module("feature.email")
Before Python 3.3 you could not import anything if there was no __init__.py in the folder containing the file you were trying to import (see the caveats on namespace packages before deciding whether to keep the file for backward compatibility, e.g. with pytest).
importlib.import_module is what you are looking for. It returns the imported module.
import importlib
# equiv. of your `import matplotlib.text as text`
text = importlib.import_module('matplotlib.text')
You can thereafter access anything in the module as text.myclass, text.myfunction, etc.
I spent some time trying to import modules from a list, and this is the thread that got me most of the way there - but I didn't grasp the use of __import__.
So here's how to import a module from a string and get the same behavior as a plain import, with the error case handled by try/except too. :)
import sys

pipmodules = ['pycurl', 'ansible', 'bad_module_no_beer']

for module in pipmodules:
    try:
        # because we want to import using a variable, do it this way
        module_obj = __import__(module)
        # create a global object containing our module
        globals()[module] = module_obj
    except ImportError:
        sys.stderr.write("ERROR: missing python module: " + module + "\n")
        sys.exit(1)
And yes, for Python 2.7+ you have other options - but this works back to 2.6.
Apart from importlib, one can also use the exec built-in to import a module from a string variable.
Here I am showing an example of importing the combinations method from the itertools package using exec:
MODULES = [
    ['itertools', 'combinations'],
]

for ITEM in MODULES:
    import_str = "from {0} import {1}".format(ITEM[0], ', '.join(str(i) for i in ITEM[1:]))
    exec(import_str)

ar = list(combinations([1, 2, 3, 4], 2))
for elements in ar:
    print(elements)
Output:
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)
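If you'd rather not exec arbitrary strings, the same effect can be had with importlib plus getattr; a sketch of the equivalent lookup:

```python
from importlib import import_module

MODULES = [
    ["itertools", "combinations"],
]

for item in MODULES:
    module = import_module(item[0])
    # Bind each requested attribute in globals(), mirroring `from x import y`.
    for name in item[1:]:
        globals()[name] = getattr(module, name)

for elements in combinations([1, 2, 3, 4], 2):
    print(elements)
```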
Module auto-install & import from list
The script below works with both submodules and pseudo-submodules.
# PyPI imports
import pkg_resources, subprocess, sys

modules = {'lxml.etree', 'pandas', 'screeninfo'}
required = {m.split('.')[0] for m in modules}
installed = {pkg.key for pkg in pkg_resources.working_set}
missing = required - installed
if missing:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--upgrade', 'pip'])
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', *missing])
for module in set.union(required, modules):
    globals()[module] = __import__(module)
Tests:
print(pandas.__version__)
print(lxml.etree.LXML_VERSION)
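A variant of the final loop using importlib.import_module, which resolves dotted names to the submodule itself; the module names below are illustrative stand-ins from the standard library:

```python
import importlib

modules = ["json", "os.path"]
for name in modules:
    mod = importlib.import_module(name)       # returns the submodule itself
    globals()[name.rsplit(".", 1)[-1]] = mod  # bound as `json` and `path`

print(json.dumps({"ok": True}))  # {"ok": true}
print(path.basename("a/b"))      # b
```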
I developed these 3 useful functions:
def loadModule(moduleName):
    module = None
    try:
        import sys
        del sys.modules[moduleName]
    except BaseException:
        pass
    try:
        import importlib
        module = importlib.import_module(moduleName)
    except BaseException as err:
        serr = str(err)
        print("Error loading the module '" + moduleName + "': " + serr)
    return module

def reloadModule(moduleName):
    module = loadModule(moduleName)
    moduleName, modulePath = str(module).replace("' from '", "||").replace("<module '", '').replace("'>", '').split("||")
    if modulePath.endswith(".pyc"):
        import os
        os.remove(modulePath)
        module = loadModule(moduleName)
    return module

def getInstance(moduleName, param1, param2, param3):
    module = reloadModule(moduleName)
    instance = eval("module." + moduleName + "(param1, param2, param3)")
    return instance
And every time I want to reload and get a new instance, I just have to call getInstance() like this:
myInstance = getInstance("MyModule", myParam1, myParam2, myParam3)
Finally, I can call all the functions inside the new instance:
myInstance.aFunction()
The only thing to customize here is the parameter list (param1, param2, param3) of your instance.
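As a hedged alternative to the eval call, getattr can instantiate the class directly; a sketch that keeps the same assumption that the module and its class share a name, but accepts any number of constructor parameters:

```python
import importlib

def get_instance(module_name, *args):
    # Import (or re-import) the module, reload it to pick up code changes,
    # then look the class up by name instead of building an eval string.
    module = importlib.import_module(module_name)
    module = importlib.reload(module)
    cls = getattr(module, module_name)
    return cls(*args)
```

Calling `get_instance("MyModule", myParam1, myParam2, myParam3)` then behaves like the original getInstance, without eval or the .pyc juggling.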
You can also use the exec built-in function, which executes any string as Python code.
In [1]: module = 'pandas'
...: function = 'DataFrame'
...: alias = 'DF'
In [2]: exec(f"from {module} import {function} as {alias}")
In [3]: DF
Out[3]: pandas.core.frame.DataFrame
For me this was the most readable way to solve my problem.
Is there a better way to reload a hydra config from an experiment with enumerations? Right now I reload it like so:
initialize_config_dir(config_dir=os.path.join(exp_dir, ".hydra"), job_name=config_name)
cfg = compose(config_name, overrides=overrides)
print(cfg.enum)
>>> ENUM1
But ENUM1 is actually an enumeration that normally loads as
>>> <SomeEnumClass.ENUM1: 'enum1'>
I am able to fix this by adding a configstore default to the experiment hydra file:
defaults:
  - base_config_cs
Which now results in
initialize_config_dir(config_dir=os.path.join(exp_dir, ".hydra"), job_name=config_name)
cfg = compose(config_name, overrides=overrides)
print(cfg.enum)
>>> <SomeEnumClass.ENUM1: 'enum1'>
Is there a better way to do this without adding this? Or can I add the default in the python code?
This is a good question -- reliably reloading configs from previous Hydra runs is an area that could be improved.
As you've discovered, loading the saved file config.yaml directly results in an untyped DictConfig object.
The solution below involves a script called reload.py that creates a config node with a defaults list that loads both the schema base_config_cs and the saved file config.yaml.
At the end of this post I also give a simple solution that involves loading .hydra/overrides.yaml to re-run the config composition process.
Suppose you've run a Hydra job with the following setup:
# app.py
from dataclasses import dataclass
from enum import Enum

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import DictConfig

class SomeEnumClass(Enum):
    ENUM1 = 1
    ENUM2 = 2

@dataclass
class Schema:
    enum: SomeEnumClass
    x: int = 123
    y: str = "abc"

def store_schema() -> None:
    cs = ConfigStore.instance()
    cs.store(name="base_config_cs", node=Schema)

@hydra.main(config_path=".", config_name="foo")
def app(cfg: DictConfig) -> None:
    print(cfg)

if __name__ == "__main__":
    store_schema()
    app()
# foo.yaml
defaults:
  - base_config_cs
  - _self_

enum: ENUM1
x: 456
$ python app.py y=xyz
{'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
After running app.py, there exists a directory outputs/2022-02-05/06-42-42/.hydra containing the saved file config.yaml.
As you correctly pointed out in your question, to reload the saved config you must merge the schema base_config_cs with the contents of config.yaml. Here is a pattern for accomplishing that:
# reload.py
import os

from hydra import compose, initialize_config_dir
from hydra.core.config_store import ConfigStore

from app import store_schema

config_name = "config"
exp_dir = os.path.abspath("outputs/2022-02-05/07-19-56")
saved_cfg_dir = os.path.join(exp_dir, ".hydra")
assert os.path.exists(f"{saved_cfg_dir}/{config_name}.yaml")

store_schema()  # stores `base_config_cs`
cs = ConfigStore.instance()
cs.store(
    name="reload_conf",
    node={
        "defaults": [
            "base_config_cs",
            config_name,
        ]
    },
)

with initialize_config_dir(config_dir=saved_cfg_dir):
    cfg = compose("reload_conf")
print(cfg)
$ python reload.py
{'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
In the Python file reload.py above, we store a node called reload_conf in the ConfigStore. Storing reload_conf this way is equivalent to creating a file called reload_conf.yaml that is discoverable by Hydra on the config search path. This reload_conf node has a defaults list that loads both the schema base_config_cs and config. For this to work, the following two conditions must be met:
the schema base_config_cs must be stored in the ConfigStore. This is accomplished by calling the store_schema function that we have imported from app.py.
a config node with name specified by the variable config_name, i.e. config.yaml in this example, must be discoverable by Hydra (which is taken care of here by calling initialize_config_dir).
Note that in foo.yaml we have a defaults list ["base_config_cs", "_self_"] that loads the schema base_config_cs before loading the contents _self_ of foo. In order for reload_conf to reconstruct the app's config with the same merge order, base_config_cs should come before config_name in the defaults list belonging to reload_conf.
The above approach can be taken one step further by removing the defaults list from foo.yaml and using cs.store to ensure the same defaults list is used in both the app and the reloading script:
# app2.py
from dataclasses import dataclass
from enum import Enum
from typing import Any, List

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING, DictConfig

class SomeEnumClass(Enum):
    ENUM1 = 1
    ENUM2 = 2

@dataclass
class RootConfig:
    defaults: List[Any] = MISSING
    enum: SomeEnumClass = MISSING
    x: int = 123
    y: str = "abc"

def store_root_config(primary_config_name: str) -> None:
    cs = ConfigStore.instance()
    # defaults list defined here:
    cs.store(
        name="root_config", node=RootConfig(defaults=["_self_", primary_config_name])
    )

@hydra.main(config_path=".", config_name="root_config")
def app(cfg: DictConfig) -> None:
    print(cfg)

if __name__ == "__main__":
    store_root_config("foo2")
    app()
# foo2.yaml (note NO DEFAULTS LIST)
enum: ENUM1
x: 456
$ python app2.py hydra.job.chdir=false y=xyz
{'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
# reload2.py
import os

from hydra import compose, initialize_config_dir
from hydra.core.config_store import ConfigStore

from app2 import store_root_config

config_name = "config"
exp_dir = os.path.abspath("outputs/2022-02-05/07-45-43")
saved_cfg_dir = os.path.join(exp_dir, ".hydra")
assert os.path.exists(f"{saved_cfg_dir}/{config_name}.yaml")

store_root_config("config")
with initialize_config_dir(config_dir=saved_cfg_dir):
    cfg = compose("root_config")
print(cfg)
$ python reload2.py
{'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
A simpler alternative approach is to use .hydra/overrides.yaml to recompose the app's configuration based on the overrides that were originally passed to Hydra:
# reload3.py
import os

import yaml

from hydra import compose, initialize

from app import store_schema

config_name = "config"
exp_dir = os.path.abspath("outputs/2022-02-05/07-19-56")
saved_cfg_dir = os.path.join(exp_dir, ".hydra")
overrides_path = f"{saved_cfg_dir}/overrides.yaml"
assert os.path.exists(overrides_path)

overrides = yaml.unsafe_load(open(overrides_path, "r"))
print(f"{overrides=}")

store_schema()
with initialize(config_path="."):
    cfg = compose("foo", overrides=overrides)
print(cfg)
$ python reload3.py
overrides=['y=xyz']
{'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
This approach has its drawbacks: if your app's configuration involves some non-hermetic operation like querying a timestamp (e.g. via Hydra's now resolver) or looking up an environment variable (e.g. via the oc.env resolver), the configuration composed by reload3.py might be different from the original version loaded in app.py.
I'm refactoring some code and have moved around some files. But for backwards compatibility, I would like to make all of my modules keep their old import paths.
My file structure is as follows:
--| calcs/
----| __init__.py
----| new_dir
------| new_file1.py
------| new_file2.py
What do I need to do to ensure that I can use an import like
import calcs.new_file1.foo
# OR
from calcs.new_file1 import foo
I have tried a few methods of adding the imports to the top-level __init__.py file, as is recommended here.
But while this seems to allow an import such as import calcs.new_file1, an import such as import calcs.new_file1.foo raises ModuleNotFoundError: No module named calcs.new_file1
I expect that I need Python to recognize calcs.new_file1 as a module. At the moment it seems to just be importing it as a class or some other kind of object.
The only way I know how to do it is by creating a custom import hook.
Here is the PEP for more information.
If you need some help on how to implement one, I suggest you take a look at the six module,
here
and here
Basically your calcs/__init__.py will become like this:
''' calcs/__init__.py '''
from .new_dir import new_file1, new_file2
import sys
__path__ = []
__package__ = __name__
class CalcsImporter:
def __init__(self, exported_mods):
self.exported_mods = {
f'{__name__}.{key}': value for key, value in exported_mods.items()
}
def find_module(self, fullname, path=None):
if fullname in self.exported_mods:
return self
return None
def load_module(self, fullname):
try:
return sys.modules[fullname]
except KeyError:
pass
try:
mod = self.exported_mods[fullname]
except KeyError:
raise ImportError('Unable to load %r' % fullname)
mod.__loader__ = self
sys.modules[fullname] = mod
return mod
_importer = CalcsImporter({
'new_file1': new_file1,
'new_file2': new_file2,
})
sys.meta_path.append(_importer)
and you should be able to do from calcs.new_file1 import foo
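If you'd rather avoid a meta-path importer altogether, a lighter trick is to register the relocated modules in sys.modules under their old dotted names from calcs/__init__.py. The sketch below builds a throwaway copy of the question's layout in a temp directory just so the whole thing is runnable; in a real project only the __init__.py body matters:

```python
import os
import sys
import tempfile
import textwrap

# Build a throwaway package mirroring the layout from the question.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "calcs")
os.makedirs(os.path.join(pkg, "new_dir"))
open(os.path.join(pkg, "new_dir", "__init__.py"), "w").close()
with open(os.path.join(pkg, "new_dir", "new_file1.py"), "w") as f:
    f.write("def foo():\n    return 'hello'\n")
with open(os.path.join(pkg, "new_dir", "new_file2.py"), "w") as f:
    f.write("bar = 42\n")

# The whole trick lives in calcs/__init__.py: import the relocated modules
# and register them in sys.modules under their old dotted names.
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write(textwrap.dedent("""
        import sys
        from .new_dir import new_file1, new_file2
        sys.modules[__name__ + ".new_file1"] = new_file1
        sys.modules[__name__ + ".new_file2"] = new_file2
    """))

sys.path.insert(0, root)
from calcs.new_file1 import foo
print(foo())  # hello
```

This works because the import system consults sys.modules before searching for calcs.new_file1 on disk.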
I've written some Python code that needs to read a config file at /etc/myapp/config.conf. I want to write a unit test for what happens if that file isn't there, or contains bad values, the usual stuff. Let's say it looks like this...
""" myapp.py
"""
def readconf()
""" Returns string of values read from file
"""
s = ''
with open('/etc/myapp/config.conf', 'r') as f:
s = f.read()
return s
And then I have other code that parses s for its values.
Can I, through some magic Python functionality, make any calls that readconf makes to open redirect to custom locations that I set as part of my test environment?
Example would be:
main.py

def _open_file(path):
    with open(path, 'r') as f:
        return f.read()

def foo():
    try:
        return _open_file("/sys/conf")
    except FileNotFoundError:
        # Turn the missing file into the value the test expects
        return "Sorry, missing file"

test.py

from unittest.mock import patch

from main import foo

def test_when_file_not_found():
    with patch('main._open_file') as mopen_file:
        # Set up the mock to raise the error you want
        mopen_file.side_effect = FileNotFoundError()
        # Run the actual function
        result = foo()
        # Assert the result is as expected
        assert result == "Sorry, missing file"
Instead of hard-coding the config file path, you can externalize or parameterize it. There are two ways to do it:
Environment variables: Use a $CONFIG environment variable that contains the location of the config file. You can run the test with an environment variable that can be set using os.environ['CONFIG'].
CLI params: Initialize the module with commandline params. For tests, you can set sys.argv and let the config property be set by that.
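A minimal sketch of the environment-variable option (the CONFIG variable name and the fallback path mirror the question; readconf here is a stand-in for the real function):

```python
import os

def readconf():
    # Fall back to the original hard-coded location when $CONFIG is unset.
    path = os.environ.get("CONFIG", "/etc/myapp/config.conf")
    with open(path, "r") as f:
        return f.read()
```

A test can then point CONFIG at a fixture file (e.g. with monkeypatch.setenv under pytest) and exercise the missing-file and bad-value cases without touching /etc.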
In order to mock just calls to open in your function, while not replacing the call with a helper function, as in Nf4r's answer, you can use a custom patch context manager:
from contextlib import contextmanager
from types import CodeType

@contextmanager
def patch_call(func, call, replacement):
    fn_code = func.__code__
    try:
        func.__code__ = CodeType(
            fn_code.co_argcount,
            fn_code.co_kwonlyargcount,
            fn_code.co_nlocals,
            fn_code.co_stacksize,
            fn_code.co_flags,
            fn_code.co_code,
            fn_code.co_consts,
            tuple(
                replacement if call == name else name
                for name in fn_code.co_names
            ),
            fn_code.co_varnames,
            fn_code.co_filename,
            fn_code.co_name,
            fn_code.co_firstlineno,
            fn_code.co_lnotab,
            fn_code.co_freevars,
            fn_code.co_cellvars,
        )
        yield
    finally:
        func.__code__ = fn_code
Now you can patch your function:
def patched_open(*args):
    raise FileNotFoundError

with patch_call(readconf, "open", "patched_open"):
    ...
You can use mock to patch a module's instance of the 'open' built-in to redirect to a custom function.
""" myapp.py
"""
def readconf():
s = ''
with open('./config.conf', 'r') as f:
s = f.read()
return s
""" test_myapp.py
"""
import unittest
from unittest import mock
import myapp
def my_open(path, mode):
return open('asdf', mode)
class TestSystem(unittest.TestCase):
#mock.patch('myapp.open', my_open)
def test_config_not_found(self):
try:
result = myapp.readconf()
assert(False)
except FileNotFoundError as e:
assert(True)
if __name__ == '__main__':
unittest.main()
You could also do it with a lambda like this, if you wanted to avoid declaring another function:
@mock.patch('myapp.open', lambda path, mode: open('asdf', mode))
def test_config_not_found(self):
    ...
I am trying to write a unit test for a class __init__ that reads from a file using readlines:
class Foo:
    def __init__(self, filename):
        with open(filename, "r") as fp:
            self.data = fp.readlines()
with sanity checks etc. included.
Now I am trying to create a mock object that would allow me to test what happens here.
I try something like this:
TEST_DATA = "foo\nbar\nxyzzy\n"

with patch("my.data.class.open", mock_open(read_data=TEST_DATA), create=True):
    f = Foo("somefilename")
    self.assertEqual(.....)
The problem is, when I peek into f.data, there is only one element:
["foo\nbar\nxyzzy\n"]
Which means whatever happened, did not get split into lines but was treated as one line. How do I force linefeeds to happen in the mock data?
This will not work with a class name:
with patch("mymodule.class_name.open",
But this will work by mocking the builtin directly (builtins.open in Python 3):
@mock.patch("builtins.open", new_callable=mock.mock_open, read_data=TEST_DATA)
def test_open3(self, mock_open):
    ...
or, without a class, by mocking the module-level name:
def test_open(self):
    with patch("mymodule.open", mock.mock_open(read_data=TEST_DATA), create=True):
        ...
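For reference, newer Python versions extended mock_open so that read_data also backs readline()/readlines() (and, from 3.8, iterating over the handle), so on a modern interpreter the question's original attempt splits into lines as hoped; a quick check:

```python
from unittest.mock import mock_open, patch

TEST_DATA = "foo\nbar\nxyzzy\n"

with patch("builtins.open", mock_open(read_data=TEST_DATA)):
    with open("somefilename") as fp:
        data = fp.readlines()

print(data)  # ['foo\n', 'bar\n', 'xyzzy\n'] on a modern Python 3
```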
@Gang's answer pointed me in the right direction, but it's not a complete working solution. I have added a few details here to make it working code without any tinkering.
# file_read.py
def read_from_file():
    # Do other things here
    filename = "file_with_data"
    with open(filename, "r") as f:
        l = f.readline()
    return l
# test_file_read.py
from file_read import read_from_file
from unittest import mock
import builtins

@mock.patch.object(builtins, "open", new_callable=mock.mock_open, read_data="blah")
def test_file_read(mock_file_open):
    output = read_from_file()
    expected_output = "blah"
    assert output == expected_output