Overriding namespace with yaml - python-3.x

Problem statement
I want the options supported in a python module to be overridable with an .yaml file, because in some cases there are too many options to be specified with non-default values.
I implemented the logic as follows.
parser = argparse.ArgumentParser()
# some parser.add statements that comes with default values
parser.add_argument("--config_path", default=None, type=str,
help="A yaml file for overriding parameters specification in this module.")
args = parser.parse_args()
# Override parameters
if args.config_path is not None:
with open(args.config_path, "r") as f:
yml_config = yaml.safe_load(f)
for k, v in yml_config.items():
if k in args.__dict__:
args.__dict__[k] = v
else:
sys.stderr.write("Ignored unknown parameter {} in yaml.\n".format(k))
The problem is, for some options I have specific functions/lambda expressions to convert the input strings, such as:
parser.add_argument("--tokens", type=lambda x: x.split(","))
In order to apply corresponding functions when parsing option specifications in YAML, adding so many if statements does not seem a good solution. Maintaining a dictionary that changes accordingly when new options are introduced in parser object seems redundant. Is there any solution to get the type for each argument in parser object?

If the elements that you add to the parser with add_argument start with -- then they
are actually optional and usually called options. You can find these walking over
the result of the _get_optonal_actions() method of the parser instance.
If you config.yaml looks like:
tokens: a,b,c
, then you can do:
import sys
import argparse
import ruamel.yaml
sys.argv[1:] = ['--config-path', 'config.yaml'] # simulate commandline
yaml = ruamel.yaml.YAML(typ='safe')
parser = argparse.ArgumentParser()
parser.add_argument("--config-path", default=None, type=str,
help="A yaml file for overriding parameters specification in this module.")
parser.add_argument("--tokens", type=lambda x: x.split(","))
args = parser.parse_args()
def find_option_type(key, parser):
for opt in parser._get_optional_actions():
if ('--' + key) in opt.option_strings:
return opt.type
raise ValueError
if args.config_path is not None:
with open(args.config_path, "r") as f:
yml_config = yaml.load(f)
for k, v in yml_config.items():
if k in args.__dict__:
typ = find_option_type(k, parser)
args.__dict__[k] = typ(v)
else:
sys.stderr.write("Ignored unknown parameter {} in yaml.\n".format(k))
print(args)
which gives:
Namespace(config_path='config.yaml', tokens=['a', 'b', 'c'])
Please note:
I am using the new API of ruamel.yaml. Loading this way is actually faster
than using from ruamel import yaml; yaml.safe_load() although your config files
are probably not big enough to notice.
I am using the file extension .yaml, as recommended in the official FAQ. You should do so as well, unless you cannot (e.g. if your
filesystem doesn't allow that).
I use a dash in the option string --config-path instead of an underscore, this is
somewhat more natural to type and automatically converted to an underscore to get valid
identifier name
You might want to consider a different approach where you parse sys.argv by hand for
--config-path, then set the defaults for all the options from the YAML config file and
then call .parse_args(). Doing things in that order allow you to override, on
the commandline, what has a value in the config file. If you do things your way, you always
have to edit a config file if it has all correct values except for one.

Related

ruamel.yaml.constructor.ConstructorError: could not determine a constructor for the tag '!vault' by using the C based SafeLoader

I am trying to read a YAML file using ruamel.yaml that has the tag !vault in it. I get the error : could not determine a constructor for the tag '!vault'
The reason why I'm using ruamel.yaml to prevent quotes in yaml structure and I want to use safe typ because of performance. I understood that I need to generate a constructor to resolve this issue, but I could not find any instruction how to do it.
import ruamel.yaml
yaml = ruamel.yaml.YAML(typ='safe', pure=False)
yaml.preserve_quotes = True
yaml.explicit_start = True
yaml.default_flow_style = False
yaml.indent(mapping=2, sequence=4, offset=2)
sfile="boot.yaml"
with open(sfile, 'r') as stream:
data = yaml.load(stream)
print(data)
The YAML file I am using:
---
level1:
bootstrap:
user: admin
admin_user: "yes"
admin_password: !vault |
$ANSIBLE_VAULT;1.1;AES256
23423423423423423423423423423443336133616235373030363166616533396264363132323038
31393636333735316430633062326638616665383865643453453453453453453453453453453465
34333265303537643831376238366437336265363134396632613931376265623338346464663964
3932653961633536360a653466383734653433313135393530323063663034373663363936306264
30613762613164396539653462343437234234234234234234234346547567556345645763534534
the error is:
data = constructor(self, node)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/ruamel/yaml/constructor.py", line 690, in construct_undefined
raise ConstructorError(
ruamel.yaml.constructor.ConstructorError: could not determine a constructor for the tag '!vault'
in "boot.yaml", line 6, column 21
The documentation describes how to dump (to_yaml method) and load (from_yaml method) So you can e.g. make a class Vault and provide a from_yaml classmethod that constructs a Vault instance.
yaml.preservequotes doesn't do anything when your not using a round-trip (default) YAML instance.
The more important question is though, how many megabytes is your YAML file, that you downgrade to the C loader, with still many of the bugs, incompatibilities and idiosyncrasies from the orginal libyaml.
In practise, if you have just one tag to deal with you can do (assuming your input is in the file input.yaml):
import sys
import ruamel.yaml
file_in = Path('input.yaml')
yaml = ruamel.yaml.YAML(typ='safe')
#yaml.register_class
class Vault:
yaml_tag = '!vault'
def __init__(self, s):
self._s = s
#classmethod
def from_yaml(cls, constructor, node):
return cls(node.value)
def __repr__(self):
return repr(self._s)
data = yaml.load(file_in)
print(data)
which gives:
{'level1': {'bootstrap': {'user': 'admin', 'admin_user': 'yes', 'admin_password': '$ANSIBLE_VAULT;1.1;AES256\n23423423423423423423423423423443336133616235373030363166616533396264363132323038\n31393636333735316430633062326638616665383865643453453453453453453453453453453465\n34333265303537643831376238366437336265363134396632613931376265623338346464663964\n3932653961633536360a653466383734653433313135393530323063663034373663363936306264\n30613762613164396539653462343437234234234234234234234346547567556345645763534534\n'}}}
However if you have many tags, this becomes tedious real fast. In that case you better discard
any tags found (the following only does tagged scalars, handling tagged sequences and mappings can
be done similarly by copying the relevant lines from composer.py and discarding the event.tag information):
import sys
import ruamel.yaml
file_in = Path('input.yaml')
yaml = ruamel.yaml.YAML(typ='safe')
class MyComposer(ruamel.yaml.composer.Composer):
def compose_scalar_node(self, anchor):
event = self.parser.get_event()
tag = self.resolver.resolve(ruamel.yaml.nodes.ScalarNode, event.value, event.implicit)
node = ruamel.yaml.nodes.ScalarNode(
tag,
event.value,
event.start_mark,
event.end_mark,
style=event.style,
comment=event.comment,
anchor=anchor,
)
if anchor is not None:
self.anchors[anchor] = node
return node
yaml.Composer = MyComposer
data = yaml.load(file_in)
print(data)
which also gives:
{'level1': {'bootstrap': {'user': 'admin', 'admin_user': 'yes', 'admin_password': '$ANSIBLE_VAULT;1.1;AES256\n23423423423423423423423423423443336133616235373030363166616533396264363132323038\n31393636333735316430633062326638616665383865643453453453453453453453453453453465\n34333265303537643831376238366437336265363134396632613931376265623338346464663964\n3932653961633536360a653466383734653433313135393530323063663034373663363936306264\n30613762613164396539653462343437234234234234234234234346547567556345645763534534\n'}}}

F Strings and Interpolation using a properties file

I have a simple python app and i'm trying to combine bunch of output messages to standardize output to the user. I've created a properties file for this, and it looks similar to the following:
[migration_prepare]
console=The migration prepare phase failed in {stage_name} with error {error}!
email=The migration prepare phase failed while in {stage_name}. Contact support!
slack=The **_prepare_** phase of the migration failed
I created a method to handle fetching messages from a Properties file... similar to:
def get_msg(category, message_key, prop_file_location="messages.properties"):
""" Get a string from a properties file that is utilized similar to a dictionary and be used in subsequent
messaging between console, slack and email communications"""
message = None
config = ConfigParser()
try:
dataset = config.read(prop_file_location)
if len(dataset) == 0:
raise ValueError("failed to find property file")
message = config.get(category, message_key).replace('\\n', '\n') # if contains newline characters i.e. \n
except NoOptionError as no:
print(
f"Bad option for value {message_key}")
print(f"{no}")
except NoSectionError as ns:
print(
f"There is no section in the properties file {prop_file_location} that contains category {category}!")
print(f"{ns}")
return f"{message}"
The method returns the F string fine, to the calling class. My question is, in the calling class if the string in my properties file contains text {some_value} that is intended to be interpolated by the compiler in the calling class using an F String with curly brackets, why does it return a string literal? The output is literal text, not the interpolated value I expect:
What I get The migration prepare phase failed while in {stage_name} stage. Contact support!
What I would like The migration prepare phase failed while in Reconciliation stage. Contact support!
I would like the output from the method to return the interpolated value. Has anyone done anything like this?
I am not sure where you define your stage_name but in order to interpolate in config file you need to use ${stage_name}
Interpolation in f-strings and configParser files are not the same.
Update: added 2 usage examples:
# ${} option using ExtendedInterpolation
from configparser import ConfigParser, ExtendedInterpolation
parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read_string('[example]\n'
'x=1\n'
'y=${x}')
print(parser['example']['y']) # y = '1'
# another option - %()s
from configparser import ConfigParser, ExtendedInterpolation
parser = ConfigParser()
parser.read_string('[example]\n'
'x=1\n'
'y=%(x)s')
print(parser['example']['y']) # y = '1'

__post_init__ of python 3.x dataclasses is not called when loaded from yaml

Please note that I have already referred to StackOverflow question here. I post this question to investigate if calling __post_init__ is safe or not. Please check the question till the end.
Check the below code. In step 3 where we load dataclass A from yaml string. Note that it does not call __post_init__ method.
import dataclasses
import yaml
#dataclasses.dataclass
class A:
a: int = 55
def __post_init__(self):
print("__post_init__ got called", self)
print("\n>>>>>>>>>>>> 1: create dataclass object")
a = A(33)
print(a) # print dataclass
print(dataclasses.fields(a))
print("\n>>>>>>>>>>>> 2: dump to yaml")
s = yaml.dump(a)
print(s) # print yaml repr
print("\n>>>>>>>>>>>> 3: create class from str")
a_ = yaml.load(s)
print(a_) # print dataclass loaded from yaml str
print(dataclasses.fields(a_))
The solution that I see for now is calling __-post_init__ on my own at the end like in below code snippet.
a_.__post_init__()
I am not sure if this is safe recreation of yaml serialized dataclass. Also, it will pose a problem when __post_init__ takes kwargs in case when dataclass fields are dataclasses.InitVar type.
This behavior is working as intended. You are dumping an existing object, so when you load it pyyaml intentionally avoids initializing the object again. The direct attributes of the dumped object will be saved even if they are created in __post_init__ because that function runs prior to being dumped. When you want the side effects that come from __post_init__, like the print statement in your example, you will need to ensure that initialization occurs.
There are few ways to accomplish this. You can use either the metaclass or adding constructor/representer approaches described in pyyaml's documentation. You could also manually alter the dumped string in your example to be ''!!python/object/new:' instead of ''!!python/object:'. If your eventual goal is to have the yaml file generated in a different manner, then this might be a solution.
See below for an update to your code that uses the metaclass approach and calls __post_init__ when loading from the dumped class object. The call to cls(**fields) in from_yaml ensures that the object is initialized. yaml.load uses cls.__new__ to create objects tagged with ''!!python/object:' and then loads the saved attributes into the object manually.
import dataclasses
import yaml
#dataclasses.dataclass
class A(yaml.YAMLObject):
a: int = 55
def __post_init__(self):
print("__post_init__ got called", self)
yaml_tag = '!A'
yaml_loader = yaml.SafeLoader
#classmethod
def from_yaml(cls, loader, node):
fields = loader.construct_mapping(node, deep=True)
return cls(**fields)
print("\n>>>>>>>>>>>> 1: create dataclass object")
a = A(33)
print(a) # print dataclass
print(dataclasses.fields(a))
print("\n>>>>>>>>>>>> 2: dump to yaml")
s = yaml.dump(a)
print(s) # print yaml repr
print("\n>>>>>>>>>>>> 3: create class from str")
a_ = yaml.load(s, Loader=A.yaml_loader)
print(a_) # print dataclass loaded from yaml str
print(dataclasses.fields(a_))

Python argparse with possibly empty string value

I would like to use argparse to pass some values throughout my main function. When calling the python file, I would always like to include the flag for the argument, while either including or excluding its string argument. This is because some external code, where the python file is being called, becomes a lot simpler if this would be possible.
When adding the arguments by calling parser.add_argument, I've tried setting the default value to default=None and also setting it to default=''. I can't make this work on my own it seems..
def main():
parser = argparse.ArgumentParser()
parser.add_argument('-p', '--projects_to_build', default='')
args = parser.parse_args()
This call works fine:
py .\python_file.py -p proj_1,proj_2,proj_3
This call does not:
py .\python_file.py -p
python_file.py: error: argument -p/--projects_to_build: expected one argument
You need to pass a nargs value of '?' with const=''
parser.add_argument('-p', '--projects_to_build', nargs='?', const='')
You should also consider adding required=True so you don't have to pass default='' as well.

Unit Testing a Method That Uses a Context Manager

I have a method I would like to unit test. The method expects a file path, which is then opened - using a context manager - to parse a value which is then returned, should it be present, simple enough.
#staticmethod
def read_in_target_language(file_path):
"""
.. note:: Language code attributes/values can occur
on either the first or the second line of bilingual.
"""
with codecs.open(file_path, 'r', encoding='utf-8') as source:
line_1, line_2 = next(source), next(source)
get_line_1 = re.search(
'(target-language=")(.+?)(")', line_1, re.IGNORECASE)
get_line_2 = re.search(
'(target-language=")(.+?)(")', line_2, re.IGNORECASE)
if get_line_1 is not None:
return get_line_1.group(2)
else:
return get_line_2.group(2)
I want to avoid testing against external files - for obvious reasons - and do not wish to create temp files. In addition, I cannot use StringIO in this case.
How can I mock the file_path object in my unit test case? Ultimately I would need to create a mock path that contains differing values. Any help is gratefully received.
(Disclaimer: I don't speak Python, so I'm likely to err in details)
I suggest that you instead mock codecs. Make the mock's open method return an object with test data to be returned from the read calls. That might involve creating another mock object for the return value; I don't know if there are some stock classes in Python that you could use for that purpose instead.
Then, in order to actually enable testing the logic, add a parameter to read_in_target_language that represents an object that can assume the role of the original codecs object, i.e. dependency injection by argument. For convenience I guess you could default it to codecs.
I'm not sure how far Python's duck typing goes with regards to static vs instance methods, but something like this should give you the general idea:
def read_in_target_language(file_path, opener=codecs):
...
with opener.open(file_path, 'r', encoding='utf-8') as source:
If the above isn't possible you could just add a layer of indirection:
class CodecsOpener:
...
def open(self, file_path, access, encoding):
return codecs.open(file_path, access, encoding)
class MockOpener:
...
def __init__(self, open_result):
self.open_result = open_result
def open(self, file_path, access, encoding):
return self.open_result
...
def read_in_target_language(file_path, opener=CodecsOpener()):
...
with opener.open(file_path, 'r', encoding='utf-8') as source:
...
...
def test():
readable_data = ...
opener = MockOpener(readable_data)
result = <class>.read_in_target_language('whatever', opener)
<check result>

Resources