How to get a Hydra config without using @hydra.main() - python-3.x

Let's say we have the following setup (copied and shortened from the Hydra docs):
Configuration file: config.yaml
db:
  driver: mysql
  user: omry
  pass: secret
Python file: my_app.py
import hydra
@hydra.main(config_path="config.yaml")
def my_app(cfg):
    print(cfg.pretty())
if __name__ == "__main__":
    my_app()
This works well when we can use a decorator on the function my_app. Now I would like (for small scripts and testing purposes, but that is not important) to get this cfg object outside of any function, just in a plain Python script. From what I understand of how decorators work, it should be possible to call
import hydra
cfg = hydra.main(config_path="config.yaml")(lambda x: x)()
print(cfg.pretty())
but then cfg is just None and not the desired configuration object. So it seems that the decorator does not pass on the returned value. Is there another way to get that cfg?
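The None result is what one would expect if the wrapper built by the decorator calls the task function but never returns its value. The sketch below shows that pattern with a hypothetical stand-in decorator, fake_main; it is an illustration of the general mechanism, not Hydra's actual implementation.

```python
import functools


def fake_main(config):
    """Hypothetical stand-in for a decorator factory such as hydra.main."""
    def decorator(task_function):
        @functools.wraps(task_function)
        def wrapper():
            # The task function is called with the config, but its
            # return value is discarded, so wrapper() returns None.
            task_function(config)
        return wrapper
    return decorator


cfg = fake_main({"db": {"driver": "mysql"}})(lambda x: x)()
print(cfg)  # None
```

With a wrapper like this, the only way to get the config out is a side effect inside the task function, such as storing it in an outer variable.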

Use the Compose API:
from hydra import compose, initialize
from omegaconf import OmegaConf
initialize(config_path="conf", job_name="test_app")
cfg = compose(config_name="config", overrides=["db=mysql", "db.user=me"])
print(OmegaConf.to_yaml(cfg))
This will only compose the config and will not have side effects like changing the working directory or configuring the Python logging system.

None of the above solutions worked for me. They gave errors:
'builtin_function_or_method' object has no attribute '__code__'
and
GlobalHydra is already initialized, call GlobalHydra.instance().clear() if you want to re-initialize
I dug further into hydra and realised I could just use OmegaConf to load the file directly. You don't get overrides but I'm not fussed about this.
import omegaconf
cfg = omegaconf.OmegaConf.load(path)

I found a rather ugly answer, but it works. If anyone finds a more elegant solution, please let us know!
We can use a closure or some mutable object. In this example we define a list outside and append the config object to it.
For hydra >= 1.0.0 you have to use config_name instead of config_path (see the documentation):
import hydra
c = []
hydra.main(config_name="config.yaml")(lambda x:c.append(x))()
cfg = c[0]
print(cfg)
For older versions:
import hydra
c = []
hydra.main(config_path="config.yaml")(c.append)()
cfg = c[0]
print(cfg.pretty())

Another ugly answer, but the author said this may crash in the next version:
from omegaconf import DictConfig
from hydra.utils import instantiate
from hydra._internal.utils import _strict_mode_strategy, split_config_path, create_automatic_config_search_path
from hydra._internal.hydra import Hydra
from hydra.utils import get_class
class SomeThing:
    ...
    def load_from_yaml(self, config_path, strict=True):
        config_dir, config_file = split_config_path(config_path)
        strict = _strict_mode_strategy(strict, config_file)
        search_path = create_automatic_config_search_path(
            config_file, None, config_dir
        )
        hydra = Hydra.create_main_hydra2(
            task_name='sdfs', config_search_path=search_path, strict=strict
        )
        config = hydra.compose_config(config_file, [])
        config.pop('hydra')
        self.config = config
        print(self.config.pretty())

This is my solution:
from omegaconf import OmegaConf
class MakeObj(object):
    """Convert a dictionary to an object.
    Thanks to https://stackoverflow.com/questions/1305532/convert-nested-python-dict-to-object
    Args:
        d: mapping whose keys become attributes of the object.
    """
    def __init__(self, d):
        for a, b in d.items():
            if isinstance(b, (list, tuple)):
                setattr(self, a, [MakeObj(x) if isinstance(x, dict) else x for x in b])
            else:
                setattr(self, a, MakeObj(b) if isinstance(b, dict) else b)
def read_yaml(path):
    x_dict = OmegaConf.load(path)
    x_yamlstr = OmegaConf.to_yaml(x_dict)
    x_obj = MakeObj(x_dict)
    return x_yamlstr, x_dict, x_obj
x_yamlstr, x_dict, x_obj = read_yaml('config/train.yaml')
print(x_yamlstr)
print(x_dict)
print(x_obj)
print(dir(x_obj))
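For plain dictionaries, a similar attribute-style view can be sketched with the standard library's types.SimpleNamespace. This is an alternative to the MakeObj class, not part of that answer; to_namespace and the sample data are made up for illustration, and a config loaded with OmegaConf would first need converting to plain containers (for example with OmegaConf.to_container).

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert dicts (and dicts inside lists/tuples) to namespaces."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, (list, tuple)):
        return [to_namespace(v) for v in value]
    return value


# Made-up sample data standing in for a loaded config.
sample = {"db": {"driver": "mysql", "ports": [3306, {"backup": 3307}]}}
obj = to_namespace(sample)
print(obj.db.driver)           # mysql
print(obj.db.ports[1].backup)  # 3307
```

Unlike a hand-written class, SimpleNamespace also gives a readable repr, so print(obj) shows the nested values.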

Related

Python3 Class Invocation: Function vs __name__ guard

I'm looking for guidance calling a class in main vs methods.
A simple example would be a single method script.
# filename: one_funct.py
import argparse
import my_utils
from my_module_name import MyAwesomeClass
def my_one_func(class_inst, log):
    aw_work_work = class_inst.method()
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Run One Function')
    parser.add_argument('class_config', type=str, help='Class configuration filepath.')
    opts = parser.parse_args()
    _log = my_utils.setup_logging('run_one_funct')
    class_config_dict = my_utils.get_config_dict(opts.class_config)
    my_class_inst = MyAwesomeClass(class_config_dict)
    my_one_func(my_class_inst, _log)
I've only ever been shown that invoking the class in main will mean I have to do it in my unittests when testing my_one_func(). But, there has to be more to whether or not it's better/worse doing it in the script function.
I'm aware doing it in main allows me to pass the single class instance to multiple methods. For instance, assuming all methods required the same class, if I had a 2...N function script I could do...
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Run One Function')
    parser.add_argument('class_config', type=str, help='Class configuration filepath.')
    opts = parser.parse_args()
    _log = my_utils.setup_logging('run_one_funct')
    class_config_dict = my_utils.get_config_dict(opts.class_config)
    my_class_inst = MyAwesomeClass(class_config_dict)
    my_func_1(my_class_inst, _log)
    my_func_2(my_class_inst, _log)
    ...
    my_func_N(my_class_inst, _log)
What I don't know is whether or not I should. I've been looking for an answer for a while now. Apologies if it does exist here.

Importing modules from file path is very slow. Is there any solution for this?

I have a list of modules that should be imported automatically and dynamically.
Here is a snippet from my code:
import importlib.util
from os.path import dirname
for m in modules_to_import:
    module_name = dirname(__file__) + "/" + m
    spec = importlib.util.spec_from_file_location("package", module_name)
    imported_module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(imported_module)
I measured the time and it becomes slower and slower after each import. Is there some solution to this or why does it become slower? Thanks a lot!
I have not timed it, but why don't you simplify your code? Looking at your code, you want to import modules that are in the same directory as that file. By default, when you import a module, that is the first place Python looks.
First, let's create some files to import, all in the same directory:
First.py
def display_first():
    print("I'm first")
Second.py
def display_second():
    print("I'm second")
Third.py
def display_third():
    print("I'm third")
So one way to do it is to put your modules in a dict that you can use afterwards. Here I'm using a dict comprehension to build that dict:
Solution1.py
import importlib
modules_to_import = ["First", "Second", "Third"]
modules_imported = {x: importlib.import_module(x) for x in modules_to_import}
modules_imported["First"].display_first()
modules_imported["Second"].display_second()
modules_imported["Third"].display_third()
Or, if you really want to use dotted notation to access a module's content, you could use a named tuple to help:
Solution2.py
import importlib
import collections
modules_to_import = ["First", "Second", "Third"]
modules_imported = collections.namedtuple("imported_modules", modules_to_import)
for next_module in modules_to_import:
    setattr(modules_imported, next_module, importlib.import_module(next_module))
modules_imported.First.display_first()
modules_imported.Second.display_second()
modules_imported.Third.display_third()
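Solution2 sets attributes on the namedtuple class itself rather than on an instance. If dotted access is all that is needed, a plain types.SimpleNamespace may be a simpler container. The sketch below uses the standard-library modules json and math as stand-ins for First, Second and Third so that it runs anywhere; swap in your own module names.

```python
import importlib
from types import SimpleNamespace

# Stand-in names (assumption); replace with your own module names.
modules_to_import = ["json", "math"]

# Import each module by name and expose it as an attribute.
modules_imported = SimpleNamespace(
    **{name: importlib.import_module(name) for name in modules_to_import}
)

print(modules_imported.math.sqrt(16))       # 4.0
print(modules_imported.json.dumps([1, 2]))  # [1, 2]
```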

Adding type-hinting to functions that return boto3 objects?

How do I add type-hinting to my functions that return various boto3 resources? I'd like to get automatic completion/checking on my return values in IDEs like PyCharm. Boto3 does some factory creation magic so I can't figure out how to declare the types correctly
import boto3
ec2 = boto3.Session().resource('ec2')
a = ec2.Image('asdf')
a.__class__ # => boto3.resources.factory.ec2.Image
But boto3.resources.factory.ec2.Image doesn't seem to be a class that's recognized by Python. So I can't use it for a type hint.
The docs show that the return type is EC2.Image. But is there a way to import that type as regular Python type?
UPDATE 2021
As mentioned by @eega, I no longer maintain the package. I'd recommend checking out boto3-stubs. It's a much more mature version of boto3_type_annotations.
Original Answer
I made a package that can help with this, boto3_type_annotations. It's available with or without documentation as well. Example usage below. There's also a gif at my github showing it in action using PyCharm.
import boto3
from boto3_type_annotations.s3 import Client, ServiceResource
from boto3_type_annotations.s3.waiter import BucketExists
from boto3_type_annotations.s3.paginator import ListObjectsV2
# With type annotations
client: Client = boto3.client('s3')
client.create_bucket(Bucket='foo')  # Not only does your IDE know the name of this method,
                                    # it knows the type of the `Bucket` argument too!
                                    # It also knows that `Bucket` is required, but `ACL` isn't!
# Waiters and paginators are defined also...
waiter: BucketExists = client.get_waiter('bucket_exists')
waiter.wait(Bucket='foo')
paginator: ListObjectsV2 = client.get_paginator('list_objects_v2')
response = paginator.paginate(Bucket='foo')
# Along with service resources.
resource: ServiceResource = boto3.resource('s3')
bucket = resource.Bucket('bar')
bucket.create()
# With type comments
client = boto3.client('s3') # type: Client
response = client.get_object(Bucket='foo', Key='bar')
# In docstrings
class Foo:
    def __init__(self, client):
        """
        :param client: It's an S3 Client and the IDE is gonna know what it is!
        :type client: Client
        """
        self.client = client
    def bar(self):
        """
        :rtype: Client
        """
        self.client.delete_object(Bucket='foo', Key='bar')
        return self.client
The boto3_type_annotations mentioned by Allie Fitter is deprecated, but he links to an alternative: https://pypi.org/project/boto3-stubs/

Case-insensitive getattr

Given a module I want to be able to search for classes in that module with case insensitivity.
For example if I have the following module utils/helpers.py
class UtilityClass:
    def __init__(self):
        ...
In a different script, I want to be able to retrieve the class by its name in a case insensitive way
import utils.helpers as util_helpers
module = getattr(util_helpers, 'utilityclass')
What is the right and most Pythonic way to implement this?
You can override builtins.getattr with a case-insensitive version:
import builtins
import pprint
def igetattr(obj, attr):
    for a in dir(obj):
        if a.lower() == attr.lower():
            return orig_getattr(obj, a)
orig_getattr = builtins.getattr
builtins.getattr = igetattr
print(getattr(pprint, 'prettyprinter'))
This outputs: <class 'pprint.PrettyPrinter'>
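Replacing builtins.getattr changes behaviour for every caller in the process, and the version above returns None when nothing matches. A less invasive sketch keeps the helper as an ordinary function and raises AttributeError on a miss or when the lowercased name is ambiguous. The ambiguity check is a choice made here, not something from the answer.

```python
import pprint


def igetattr(obj, name):
    """Look up an attribute on obj, ignoring the case of name."""
    matches = [a for a in dir(obj) if a.lower() == name.lower()]
    if not matches:
        raise AttributeError(f"{obj!r} has no attribute matching {name!r}")
    if len(matches) > 1:
        # e.g. both "Foo" and "foo" exist; refuse to guess.
        raise AttributeError(f"{name!r} is ambiguous: {matches}")
    return getattr(obj, matches[0])


print(igetattr(pprint, "prettyprinter"))  # <class 'pprint.PrettyPrinter'>
```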

mypy importlib module functions

I am using importlib to import modules at runtime. These modules are plugins for my application and must implement 1 or more module-level functions. I have started adding type annotations to my applications and I get an error from mypy stating
Module has no attribute "generate_configuration"
where "generate_configuration" is one of the module functions.
In this example, the module is only required to have a generate_configuration function in it. The function takes a single dict argument.
def generate_configuration(data: Dict[str, DataFrame]) -> None: ...
I have been searching around for how to specify the interface of a module but all I can find are class interfaces. Can someone point me to some documentation showing how to do this? My google-fu is failing me on this one.
The code that loads this module is shown below. The error is generated by the last line.
plugin_directory = os.path.join(os.path.abspath(directory), 'Configuration-Generation-Plugins')
plugins = (
    module_file
    for module_file in Path(plugin_directory).glob('*.py')
)
sys.path.insert(0, plugin_directory)
for plugin in plugins:
    plugin_module = import_module(plugin.stem)
    plugin_module.generate_configuration(directory, points_list)
The type annotation for importlib.import_module simply returns types.ModuleType
From the typeshed source:
def import_module(name: str, package: Optional[str] = ...) -> types.ModuleType: ...
This means that the revealed type of plugin_module is Module -- which doesn't have your specific attributes.
Since mypy is a static analysis tool, it can't know that the return value of that import has a specific interface.
Here's my suggestion:
Make a type interface for your module (it doesn't have to be instantiated, it'll just help mypy figure things out)
class ModuleInterface:
    @staticmethod
    def generate_configuration(data: Dict[str, DataFrame]) -> None: ...
Make a function which imports your module, you may need to sprinkle # type: ignore, though if you use __import__ instead of import_module you may be able to avoid this limitation
def import_module_with_interface(modname: str) -> ModuleInterface:
    return __import__(modname, fromlist=['_trash'])  # might need to ignore the type here
Enjoy the types :)
The sample code I used to verify this idea:
class ModuleInterface:
    @staticmethod
    def compute_foo(bar: str) -> str: ...
def import_module_with_interface(modname: str) -> ModuleInterface:
    return __import__(modname, fromlist=['_trash'])
def myf() -> None:
    mod = import_module_with_interface('test2')
    # mod.compute_foo()  # test.py:12: error: Too few arguments for "compute_foo" of "ModuleInterface"
    mod.compute_foo('hi')
I did some more research and eventually settled on a slightly different solution which uses typing.cast.
The solution still uses the static method definition from Anthony Sottile.
from types import ModuleType
from typing import Dict
from pandas import DataFrame
class ConfigurationGenerationPlugin(ModuleType):
    @staticmethod
    def generate_configuration(directory: str, points_list: Dict[str, DataFrame]) -> None: ...
The code that imports the module then uses typing.cast() to set the correct type.
plugin_directory = os.path.join(os.path.abspath(directory), 'Configuration-Generation-Plugins')
plugins = (
    module_file
    for module_file in Path(plugin_directory).glob('*.py')
    if not module_file.stem.startswith('lib')
)
sys.path.insert(0, plugin_directory)
for plugin in plugins:
    plugin_module = cast(ConfigurationGenerationPlugin, import_module(plugin.stem))
    plugin_module.generate_configuration(directory, points_list)
I am not sure how I feel about having to add the ConfigurationGenerationPlugin class or the cast() call to the code just to make mypy happy. However, I am going to stick with it for now.
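Another way to describe the plugin interface is a typing.Protocol (Python 3.8+), which avoids subclassing ModuleType. The sketch below is a variant of the cast-based approach under several assumptions: PluginModule and load_plugin are hypothetical names, the DataFrame annotation is replaced by object to keep the example dependency-free, and a module built in memory stands in for a real plugin file. The cast is still needed because import_module is annotated as returning types.ModuleType.

```python
import sys
from importlib import import_module
from types import ModuleType
from typing import Dict, Protocol, cast


class PluginModule(Protocol):
    """Interface that every plugin module is expected to provide."""
    def generate_configuration(
        self, directory: str, points_list: Dict[str, object]
    ) -> None: ...


def load_plugin(name: str) -> PluginModule:
    module = import_module(name)
    # Runtime check: cast() only informs the type checker.
    if not callable(getattr(module, "generate_configuration", None)):
        raise TypeError(f"{name} does not define generate_configuration()")
    return cast(PluginModule, module)


# Stand-in plugin registered in sys.modules so import_module can find it.
calls = []
fake = ModuleType("fake_plugin")
setattr(fake, "generate_configuration",
        lambda directory, points_list: calls.append(directory))
sys.modules["fake_plugin"] = fake

plugin = load_plugin("fake_plugin")
plugin.generate_configuration("out", {})
print(calls)  # ['out']
```

The runtime check is optional, but it turns a missing function into an error at load time rather than at the first call.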
