How to reload hydra config with enumerations - pytorch

Is there a better way to reload a hydra config from an experiment with enumerations? Right now I reload it like so:
initialize_config_dir(config_dir=os.path.join(exp_dir, ".hydra"), job_name=config_name)
cfg = compose(config_name, overrides=overrides)
print(cfg.enum)
>>> ENUM1
But ENUM1 is actually an enumeration that normally loads as
>>> <SomeEnumClass.ENUM1: 'enum1'>
I am able to fix this by adding a configstore default to the experiment hydra file:
defaults:
- base_config_cs
Which now results in
initialize_config_dir(config_dir=os.path.join(exp_dir, ".hydra"), job_name=config_name)
cfg = compose(config_name, overrides=overrides)
print(cfg.enum)
>>> <SomeEnumClass.ENUM1: 'enum1'>
Is there a better way to do this without adding this? Or can I add the default in the python code?

This is a good question -- reliably reloading configs from previous Hydra runs is an area that could be improved.
As you've discovered, loading the saved file config.yaml directly results in an untyped DictConfig object.
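To make the "untyped" point concrete, here is a plain-Python illustration that uses neither Hydra nor OmegaConf. It only mimics the idea: values read back from YAML are plain strings and ints, and a schema is what turns the string ENUM1 back into an enum member and supplies defaults for missing keys. The helper merge_with_schema is hypothetical and exists only for this sketch; the real conversion is done by the structured config machinery.

```python
from dataclasses import dataclass, fields
from enum import Enum


class SomeEnumClass(Enum):
    ENUM1 = 1
    ENUM2 = 2


@dataclass
class Schema:
    enum: SomeEnumClass
    x: int = 123
    y: str = "abc"


def merge_with_schema(saved: dict) -> Schema:
    """Hypothetical helper: coerce enum member names to members, then
    let the dataclass defaults fill in any keys the saved dict lacks."""
    values = dict(saved)
    for f in fields(Schema):
        is_enum = isinstance(f.type, type) and issubclass(f.type, Enum)
        if is_enum and isinstance(values.get(f.name), str):
            values[f.name] = f.type[values[f.name]]  # look up by member name
    return Schema(**values)


# What a saved config.yaml gives you when loaded without a schema:
saved = {"enum": "ENUM1", "x": 456, "y": "xyz"}
cfg = merge_with_schema(saved)
print(repr(saved["enum"]))  # 'ENUM1'
print(repr(cfg.enum))       # <SomeEnumClass.ENUM1: 1>
```

The same ordering applies as in a defaults list: the schema's defaults come first and the saved values override them.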
The solution below involves a script called reload.py that creates a config node with a defaults list that loads both the schema base_config_cs and the saved file config.yaml.
At the end of this post I also give a simple solution that involves loading .hydra/overrides.yaml to re-run the config composition process.
Suppose you've run a Hydra job with the following setup:
# app.py
from dataclasses import dataclass
from enum import Enum
import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import DictConfig
class SomeEnumClass(Enum):
    ENUM1 = 1
    ENUM2 = 2
@dataclass
class Schema:
    enum: SomeEnumClass
    x: int = 123
    y: str = "abc"
def store_schema() -> None:
    cs = ConfigStore.instance()
    cs.store(name="base_config_cs", node=Schema)
@hydra.main(config_path=".", config_name="foo")
def app(cfg: DictConfig) -> None:
    print(cfg)
if __name__ == "__main__":
    store_schema()
    app()
# foo.yaml
defaults:
- base_config_cs
- _self_
enum: ENUM1
x: 456
$ python app.py y=xyz
{'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
After running app.py, there exists a directory outputs/2022-02-05/06-42-42/.hydra containing the saved file config.yaml.
As you correctly pointed out in your question, to reload the saved config you must merge the schema base_config_cs with the contents of config.yaml. Here is a pattern for accomplishing that:
# reload.py
import os
from hydra import compose, initialize_config_dir
from hydra.core.config_store import ConfigStore
from app import store_schema
config_name = "config"
exp_dir = os.path.abspath("outputs/2022-02-05/07-19-56")
saved_cfg_dir = os.path.join(exp_dir, ".hydra")
assert os.path.exists(f"{saved_cfg_dir}/{config_name}.yaml")
store_schema() # stores `base_config_cs`
cs = ConfigStore.instance()
cs.store(
    name="reload_conf",
    node={
        "defaults": [
            "base_config_cs",
            config_name,
        ]
    },
)
with initialize_config_dir(config_dir=saved_cfg_dir):
    cfg = compose("reload_conf")
print(cfg)
$ python reload.py
{'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
In the Python file reload.py above, we store a node called reload_conf in the ConfigStore. Storing reload_conf this way is equivalent to creating a file called reload_conf.yaml that is discoverable by Hydra on the config search path. This reload_conf node has a defaults list that loads both the schema base_config_cs and config. For this to work, the following two conditions must be met:
The schema base_config_cs must be stored in the ConfigStore. This is accomplished by calling the store_schema function that we have imported from app.py.
A config node with the name specified by the variable config_name, i.e. config.yaml in this example, must be discoverable by Hydra (which is taken care of here by calling initialize_config_dir).
Note that in foo.yaml we have a defaults list ["base_config_cs", "_self_"] that loads the schema base_config_cs before loading the contents _self_ of foo. In order for reload_conf to reconstruct the app's config with the same merge order, base_config_cs should come before config_name in the defaults list belonging to reload_conf.
The above approach could be taken one step further by removing the defaults list from foo.yaml and using cs.store to ensure the same defaults list is used in both the app and the reloading script:
# app2.py
from dataclasses import dataclass
from enum import Enum
from typing import Any, List
import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING, DictConfig
class SomeEnumClass(Enum):
    ENUM1 = 1
    ENUM2 = 2
@dataclass
class RootConfig:
    defaults: List[Any] = MISSING
    enum: SomeEnumClass = MISSING
    x: int = 123
    y: str = "abc"
def store_root_config(primary_config_name: str) -> None:
    cs = ConfigStore.instance()
    # defaults list defined here:
    cs.store(
        name="root_config", node=RootConfig(defaults=["_self_", primary_config_name])
    )
@hydra.main(config_path=".", config_name="root_config")
def app(cfg: DictConfig) -> None:
    print(cfg)
if __name__ == "__main__":
    store_root_config("foo2")
    app()
# foo2.yaml (note NO DEFAULTS LIST)
enum: ENUM1
x: 456
$ python app2.py hydra.job.chdir=false y=xyz
{'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
# reload2.py
import os
from hydra import compose, initialize_config_dir
from hydra.core.config_store import ConfigStore
from app2 import store_root_config
config_name = "config"
exp_dir = os.path.abspath("outputs/2022-02-05/07-45-43")
saved_cfg_dir = os.path.join(exp_dir, ".hydra")
assert os.path.exists(f"{saved_cfg_dir}/{config_name}.yaml")
store_root_config("config")
with initialize_config_dir(config_dir=saved_cfg_dir):
    cfg = compose("root_config")
print(cfg)
$ python reload2.py
{'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
A simpler alternative approach is to use .hydra/overrides.yaml to recompose the app's configuration based on the overrides that were originally passed to Hydra:
# reload3.py
import os
import yaml
from hydra import compose, initialize
from app import store_schema
config_name = "config"
exp_dir = os.path.abspath("outputs/2022-02-05/07-19-56")
saved_cfg_dir = os.path.join(exp_dir, ".hydra")
overrides_path = f"{saved_cfg_dir}/overrides.yaml"
assert os.path.exists(overrides_path)
with open(overrides_path, "r") as f:
    # overrides.yaml is a plain list of strings, so safe_load is sufficient
    overrides = yaml.safe_load(f)
print(f"{overrides=}")
store_schema()
with initialize(config_path="."):
    cfg = compose("foo", overrides=overrides)
print(cfg)
$ python reload3.py
overrides=['y=xyz']
{'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
This approach has its drawbacks: if your app's configuration involves some non-hermetic operation like querying a timestamp (e.g. via Hydra's now resolver) or looking up an environment variable (e.g. via the oc.env resolver), the configuration composed by reload3.py might be different from the original version loaded in app.py.

Related

How do i import an input value? [duplicate]

I'm working on personal documentation for the matplotlib (MPL) library which, unlike MPL's own documentation, is organized around the submodules and packages I'm interested in. I'm writing a Python script that I hope will automate generating this documentation for future MPL releases.
I have selected the submodules/packages of interest and want to list their main classes, so that I can generate a list from them and process it with pydoc.
The problem is that I can't find a way to instruct Python to load a submodule from a string. Here is an example of what I tried:
import matplotlib.text as text
x = dir(text)
.
i = __import__('matplotlib.text')
y = dir(i)
.
j = __import__('matplotlib')
z = dir(j)
And here is a three-way comparison of the above lists through pprint (screenshot not included):
I don't understand what's loaded in the y object. It's the base matplotlib plus something else, but it lacks the information I wanted, namely the main classes from the matplotlib.text package. That is the top, blue-coloured part of the screenshot (the x list).
Please don't suggest Sphinx as different approach.
The __import__ function can be a bit hard to understand.
If you change
i = __import__('matplotlib.text')
to
i = __import__('matplotlib.text', fromlist=[''])
then i will refer to matplotlib.text.
In Python 3.1 or later, you can use importlib:
import importlib
i = importlib.import_module("matplotlib.text")
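The difference between the three calls can be checked without matplotlib installed. The sketch below uses the standard-library package xml.dom purely as a stand-in for matplotlib.text; the variable names are only for illustration.

```python
import importlib

# Without fromlist, __import__ returns the TOP-LEVEL package.
top = __import__("xml.dom")

# With a non-empty fromlist, it returns the submodule itself.
sub = __import__("xml.dom", fromlist=[""])

# importlib.import_module always returns the module that was named.
mod = importlib.import_module("xml.dom")

print(top.__name__)  # xml
print(sub.__name__)  # xml.dom
print(mod is sub)    # True
```

This is why dir() on the result of a bare __import__('matplotlib.text') shows the contents of matplotlib rather than of matplotlib.text.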
Some notes
If you're trying to import something from a sub-folder e.g. ./feature/email.py, the code will look like importlib.import_module("feature.email")
Before Python 3.3 you could not import anything if there was no __init__.py in the folder with file you were trying to import (see caveats before deciding if you want to keep the file for backward compatibility e.g. with pytest).
importlib.import_module is what you are looking for. It returns the imported module.
import importlib
# equiv. of your `import matplotlib.text as text`
text = importlib.import_module('matplotlib.text')
You can thereafter access anything in the module as text.myclass, text.myfunction, etc.
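Since the original goal was to list the main classes of a submodule given its name as a string, one possible sketch combines importlib.import_module with inspect.getmembers. The helper name list_classes is hypothetical, and json.decoder is used only so that the example runs without matplotlib; "matplotlib.text" could be passed instead if it is installed.

```python
import importlib
import inspect


def list_classes(module_name):
    """Return the names of classes defined in the named module itself,
    skipping classes that were merely imported into it."""
    module = importlib.import_module(module_name)
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if obj.__module__ == module.__name__
    )


print(list_classes("json.decoder"))
```

The filter on `__module__` is a design choice: without it, the list would also include every class the module imports from elsewhere.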
I spent some time trying to import modules from a list, and this is the thread that got me most of the way there, but I didn't grasp the use of __import__ at first.
So here's how to import a module from a string and get the same behavior as a plain import, with a try/except for the error case, too. :)
import sys
pipmodules = ['pycurl', 'ansible', 'bad_module_no_beer']
for module in pipmodules:
    try:
        # because we want to import using a variable, do it this way
        module_obj = __import__(module)
        # create a global object containing our module
        globals()[module] = module_obj
    except ImportError:
        sys.stderr.write("ERROR: missing python module: " + module + "\n")
        sys.exit(1)
And yes, for Python 2.7 and later you have other options, but for 2.6 and earlier this works.
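On Python 3, a variant of the same idea can report every missing module at once instead of exiting on the first failure. This is only a sketch: the helper name import_available is hypothetical, the module names are placeholders from the standard library, and it assumes top-level module names (for dotted names, importlib.util.find_spec imports the parent package first and can raise if the parent is missing).

```python
import importlib
import importlib.util


def import_available(names):
    """Import each module that can be found; collect the names of the rest."""
    loaded, missing = {}, []
    for name in names:
        if importlib.util.find_spec(name) is None:
            missing.append(name)
        else:
            loaded[name] = importlib.import_module(name)
    return loaded, missing


loaded, missing = import_available(["json", "csv", "bad_module_no_beer"])
print(sorted(loaded))  # ['csv', 'json']
print(missing)         # ['bad_module_no_beer']
```

Keeping the modules in a dictionary rather than writing into globals() avoids accidentally overwriting existing names.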
Apart from using the importlib one can also use exec method to import a module from a string variable.
Here I am showing an example of importing the combinations method from itertools package using the exec method:
MODULES = [
['itertools','combinations'],
]
for ITEM in MODULES:
    import_str = "from {0} import {1}".format(ITEM[0],', '.join(str(i) for i in ITEM[1:]))
    exec(import_str)
ar = list(combinations([1, 2, 3, 4], 2))
for elements in ar:
    print(elements)
Output:
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)
Module auto-install & import from list
Below script works fine with both submodules and pseudo submodules.
# PyPI imports
import pkg_resources, subprocess, sys
modules = {'lxml.etree', 'pandas', 'screeninfo'}
required = {m.split('.')[0] for m in modules}
installed = {pkg.key for pkg in pkg_resources.working_set}
missing = required - installed
if missing:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--upgrade', 'pip'])
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', *missing])
for module in set.union(required, modules):
    globals()[module] = __import__(module)
Tests:
print(pandas.__version__)
print(lxml.etree.LXML_VERSION)
I developed these 3 useful functions:
def loadModule(moduleName):
    module = None
    try:
        import sys
        del sys.modules[moduleName]
    except BaseException as err:
        pass
    try:
        import importlib
        module = importlib.import_module(moduleName)
    except BaseException as err:
        serr = str(err)
        print("Error to load the module '" + moduleName + "': " + serr)
    return module
def reloadModule(moduleName):
    module = loadModule(moduleName)
    moduleName, modulePath = str(module).replace("' from '", "||").replace("<module '", '').replace("'>", '').split("||")
    if (modulePath.endswith(".pyc")):
        import os
        os.remove(modulePath)
        module = loadModule(moduleName)
    return module
def getInstance(moduleName, param1, param2, param3):
    module = reloadModule(moduleName)
    instance = eval("module." + moduleName + "(param1, param2, param3)")
    return instance
And every time I want to load a new instance I just have to call getInstance() like this:
myInstance = getInstance("MyModule", myParam1, myParam2, myParam3)
Finally I can call all the functions inside the new Instance:
myInstance.aFunction()
The only specificity here is to customize the params list (param1, param2, param3) of your instance.
You can also use exec built-in function that execute any string as a Python code.
In [1]: module = 'pandas'
...: function = 'DataFrame'
...: alias = 'DF'
In [2]: exec(f"from {module} import {function} as {alias}")
In [3]: DF
Out[3]: pandas.core.frame.DataFrame
For me this was the most readable way to solve my problem.
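For comparison, the same "from module import name as alias" effect can be had without exec by combining importlib.import_module with getattr. This is a sketch of an alternative, not a claim that it is more readable; the helper name import_attr is hypothetical, and itertools is used so the example needs nothing beyond the standard library.

```python
import importlib


def import_attr(module_name, attr_name):
    """Rough equivalent of `from module_name import attr_name`,
    with both names supplied as strings."""
    return getattr(importlib.import_module(module_name), attr_name)


combinations = import_attr("itertools", "combinations")
print(list(combinations([1, 2, 3], 2)))  # [(1, 2), (1, 3), (2, 3)]
```

Because no code string is built and executed, this form also works when the names come from untrusted input, where exec would be risky.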

Is it possible to make a module available as an import from another module?

I'm refactoring some code and have moved around some files. But for backwards compatibility, I would like to make all of my modules keep their old import paths.
my file structure is as follows
--| calcs/
----| __init__.py
----| new_dir
------| new_file1.py
------| new_file2.py
What do I need to do to ensure that I can use an import like
import calcs.newfile1.foo
# OR
from calcs.newfile1 import foo
I have tried a few methods of adding the imports to the top-level __init__.py file, as recommended elsewhere.
But while this seems to allow an import such as import calcs.newfile1, an import such as import calcs.newfile1.foo raises ModuleNotFoundError: No module named calcs.newfile1
I expect that I need Python to recognize calcs.newfile1 as a module. At the moment it seems to just be importing it as a class or some other kind of object.
The only way I know how to do it is by creating a custom import hook.
The PEP on import hooks has more information.
If you need some help on how to implement one, I suggest taking a look at how the six module does it.
Basically your calcs/__init__.py will become like this:
''' calcs/__init__.py '''
from .new_dir import new_file1, new_file2
import sys
__path__ = []
__package__ = __name__
class CalcsImporter:
    def __init__(self, exported_mods):
        self.exported_mods = {
            f'{__name__}.{key}': value for key, value in exported_mods.items()
        }
    def find_module(self, fullname, path=None):
        if fullname in self.exported_mods:
            return self
        return None
    def load_module(self, fullname):
        try:
            return sys.modules[fullname]
        except KeyError:
            pass
        try:
            mod = self.exported_mods[fullname]
        except KeyError:
            raise ImportError('Unable to load %r' % fullname)
        mod.__loader__ = self
        sys.modules[fullname] = mod
        return mod
_importer = CalcsImporter({
    'new_file1': new_file1,
    'new_file2': new_file2,
})
sys.meta_path.append(_importer)
and you should be able to do from calcs.new_file1 import foo
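A lighter-weight alternative worth considering is to register the relocated modules in sys.modules under their old dotted names, since the import system consults sys.modules before running any finder. It may also be more robust on recent interpreters: support for find_module()-style hooks was removed from the import machinery in Python 3.12. The sketch below builds throwaway modules with types.ModuleType so that it runs on its own; calcs_demo and its contents are hypothetical stand-ins, and in a real project the aliasing line would go in calcs/__init__.py after importing the modules from new_dir.

```python
import importlib
import sys
from types import ModuleType

# Hypothetical stand-ins for the real package and the relocated module.
calcs_demo = ModuleType("calcs_demo")
calcs_demo.__path__ = []  # an (empty) __path__ marks the module as a package
new_file1 = ModuleType("calcs_demo.new_dir.new_file1")
new_file1.foo = lambda: "foo from new_file1"
sys.modules["calcs_demo"] = calcs_demo

# The alias itself: expose the module under the old import path.
sys.modules["calcs_demo.new_file1"] = new_file1
calcs_demo.new_file1 = new_file1  # also needed for attribute access

# Both import styles from the question now resolve to the same module.
from calcs_demo.new_file1 import foo
import calcs_demo.new_file1

print(foo())  # foo from new_file1
print(importlib.import_module("calcs_demo.new_file1") is new_file1)  # True
```

One trade-off is that the module's `__name__` still reports its new location, so anything that depends on the old name (for example pickled references) would need separate handling.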

How to import static from a class

I have previously used importlib as a dynamic importing from a class by doing this:
def load_store(store: str) -> importlib:
    """
    Imports the correct path for given store
    :param store:
    :return:
    """
    mod = importlib.import_module(f"lib.vendors.{store}")
    class_pointer = getattr(mod, store)()
    return class_pointer
However, the problem I have seen is that for some reason importlib gets called 602 times whenever this function is present, even in code that only calls the function once.
from lib.scraper import product_data
from lib.utils import load_store
# To test specific store and link
store: str = "footlockerse"
link: str = "https://www.footlocker.se/en/product/BarelyGreen-Black-White/316700362904"
# -------------------------------------------------------------------------
# Utils
# -------------------------------------------------------------------------
store_class = load_store(store=store) # <--- Calls it only once
def main():
    product_data(store_class=store_class,
                 store=store, link=link, params="product_page")
if __name__ == '__main__':
    main()
I later tested with a static import and the issue went away. However, my problem is that I have around 46 imports that I would need to write out, and I wonder if I could somehow import only the one that is needed, based on the "store" variable. For example, if I am given footlockerse, then only footlockerse is imported. Is that possible?
e.g.
test = "footlockerse"
from lib.vendors.test import test

python mocking function calls that are called during module import

I need to perform mocking for python code that is running during module import
For example I have code like this
import configparser
config = configparser.ConfigParser()
config.read('test.ini')
a = float(config['config']['a'])
b = float(config['config']['b'])
c = float(config['config']['c'])
print(a)
print(b)
print(c)
I need to mock "config" for testing
import pytest
import mock
import app
@mock.patch('app.configparser.ConfigParser')
def test_config_mock(config_mock):
    config_mock.return_value = {'config': { 'a' : 1 } }
However, this test function is called after the actual import, so my mocking has no effect.
What's the right way of doing this kind of thing?
What you can do in this case is to instead patch the config instance using mock.patch.dict:
# test_coolio.py
import mock
from app.fun import coolio
@mock.patch.dict('app.configparser.config', values={'config': {'a': 15}})
def test_config_mock():
    assert coolio() == '15'
# app/fun.py
from app.configparser import config
def coolio():
    return config['config']['a']
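Another option, if the import-time code itself has to see the mock, is to delay the import until the patch is active. The following is a minimal sketch under stated assumptions: app_demo is a hypothetical stand-in for the question's module, written to a temporary directory so the example is self-contained, and import_with_fake_config is a helper invented for this illustration. It uses unittest.mock from the standard library rather than the separate mock package.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path
from unittest import mock

# Hypothetical stand-in for the question's module: reads config at import time.
APP_SOURCE = textwrap.dedent(
    """
    import configparser
    config = configparser.ConfigParser()
    config.read("test.ini")
    a = float(config["config"]["a"])
    """
)


def import_with_fake_config(module_name, fake_values):
    """Import module_name while configparser.ConfigParser() returns a fake."""
    fake_parser = mock.MagicMock()
    fake_parser.__getitem__.side_effect = fake_values.__getitem__
    sys.modules.pop(module_name, None)  # make sure the module body runs again
    with mock.patch("configparser.ConfigParser", return_value=fake_parser):
        return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "app_demo.py").write_text(APP_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        app_demo = import_with_fake_config("app_demo", {"config": {"a": "1.5"}})
    finally:
        sys.path.remove(tmp)

print(app_demo.a)  # 1.5
```

The key point is ordering: the patch must already be in place when the module body executes, which a decorator on a test function cannot guarantee if the module was imported at the top of the test file.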

How to import a module in Python 3 from a string?

A solution for this problem is available for Python 2, but it uses the imp module which is deprecated in Python 3.
imp has been replaced by importlib which works well for file based imports. Specifically, importlib.import_module requires a file name - not a string or a file handler.
I made a workaround by dumping the contents of the URL to a file and importing it
def initlog():
    modulename = '_mylogging'
    try:
        import _mylogging
    except ImportError:
        r = requests.get('http://(...)/mylogging.py')
        with open(modulename+'.py', "w") as f:
            f.write(r.text)
    finally:
        import _mylogging
        return _mylogging.MYLogging().getlogger()
but I would like to avoid the intermediate file.
Putting the security, network performance and availability issues aside - is there a way to feed a string to importlib? (or from a file handler, in which case I would use io.StringIO)
You can adapt exactly the same answer to 3.x, using the replacement for imp.new_module:
from types import ModuleType
foo = ModuleType('foo')
and the replacement for the exec statement:
foo_code = """
class Foo:
    pass
"""
exec(foo_code, globals(), foo.__dict__)
After which everything works as expected:
>>> dir(foo)
['Foo', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
>>> foo.Foo()
<__main__.Foo object at 0x110546ba8>
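The steps above can be wrapped into a small helper. This is a sketch rather than a drop-in replacement: the name module_from_source is hypothetical, and it differs from the snippet above in two ways that are choices, not requirements. It executes the code with only the new module's namespace (so classes report the new module's name instead of __main__), and it registers the module in sys.modules so that a later ordinary import statement finds it.

```python
import sys
from types import ModuleType


def module_from_source(name, source):
    """Create a module object from a source string, with no file on disk."""
    module = ModuleType(name)
    code = compile(source, f"<{name}>", "exec")  # filename shown in tracebacks
    exec(code, module.__dict__)                  # run in the module's namespace
    sys.modules[name] = module                   # make `import name` work later
    return module


foo_demo = module_from_source("foo_demo", "class Foo:\n    pass\n")

import foo_demo as imported_again

print(imported_again is foo_demo)   # True
print(foo_demo.Foo.__module__)      # foo_demo
```

As the question already notes, executing code fetched from a URL carries obvious security risks; the helper does nothing to mitigate them.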

Resources