Defining One Dict Value Based on Another Dict Value (Undefined Value) - python-3.x

My program is a pipeline that processes files. I have a dict (P) which stores directory Paths. All of these directory Paths are relative to a common ROOT Path from which they are generated. The dict works when I define ROOT outside of the dict as follows:
# WORKS
from pathlib import Path
ROOT = Path("/very/long/path/")
P = {
    "ROOT": ROOT,
    "FS_TO_IDX": ROOT / "docs/",
    "IDXD_FS": ROOT / "indexed_docs/",
}
This seems inelegant. Since ROOT is already an element of the dict, I would prefer to use the ROOT value in generating the remaining dict values. However, I get "Undefined variable:P" when I do the following.
# FAILS
from pathlib import Path
P = {
    "ROOT": Path("/very/long/path/"),
    "FS_TO_IDX": P["ROOT"] / "docs/",
    "IDXD_FS": P["ROOT"] / "indexed_docs/",
}
Is there a similar approach that would allow me to assign a dict value and then use that same key/value to define other values in the dict? For example, the walrus operator (:=) seems to provide similar behavior by allowing one to assign to variables within an expression and then use that variable.
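The failing version cannot work because the name P is only bound after the whole dict display has been evaluated, so P["ROOT"] does not exist yet while the other values are computed. As a minimal sketch (the name `root` and the second dict `Q` are introduced here purely for illustration), the walrus operator can bind the root path while the first value is evaluated, since dict values are evaluated in order. This assumes Python 3.8 or later, and note that `root` remains a variable in the enclosing scope. A two-step build is shown as an alternative.

```python
from pathlib import Path

# Option 1: bind `root` with the walrus operator while the first value is
# evaluated; later values in the same dict display can then reuse it.
P = {
    "ROOT": (root := Path("/very/long/path/")),
    "FS_TO_IDX": root / "docs/",
    "IDXD_FS": root / "indexed_docs/",
}

# Option 2: create the dict first, then derive the other entries from it.
# Q is already bound when the arguments to update() are evaluated.
Q = {"ROOT": Path("/very/long/path/")}
Q.update(
    FS_TO_IDX=Q["ROOT"] / "docs/",
    IDXD_FS=Q["ROOT"] / "indexed_docs/",
)

print(P == Q)  # True
```

Whether either option is more elegant than a separate ROOT variable is a matter of taste; the second one avoids the extra name at the cost of a second statement.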

Related

How may I dynamically create global variables within a function based on input in Python

I'm trying to create a function that returns a dynamically-named list of columns. Usually I can manually name the list, but I now have 100+ csv files to work with.
My goal:
Function creates a list, and names it based on dataframe name
Created list is callable outside of the function
I've done my research, and this answer from an earlier post came very close to helping me.
Here is what I've adapted
def test1(dataframe):
    # Using globals() to get dataframe name
    df_name = [x for x in globals() if globals()[x] is dataframe][0]
    # Creating local dictionary to use exec function
    local_dict = {}
    # Trying to generate a name for the list, based on input dataframe name
    name = 'col_list_' + df_name
    exec(name + "=[]", globals(), local_dict)
    # So I can call this list outside the function
    name = local_dict[name]
    for feature in dataframe.columns:
        # Append feature/column if >90% of values are missing
        if dataframe[feature].isnull().mean() >= 0.9:
            name.append(feature)
    return name
To ensure the list name changes based on the DataFrame supplied to the function, I named the list using:
name = 'col_list_' + df_name
The problem comes when I try to make this list accessible outside the function:
name = local_dict[name].
I cannot find a way to assign a dynamic list name to the local dictionary, so I am forced to always call name outside the function to return the list. I want the list to be named based on the dataframe input (e.g. col_list_df1, col_list_df2, col_list_df99).
This answer was very helpful, but it seems specific to variables.
global 'col_list_' + df_name returns a syntax error.
Any help would be greatly appreciated!
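One common alternative to creating global names dynamically is to keep the results in a dict keyed by the generated name. The sketch below is only an illustration of that pattern: to stay self-contained it uses plain dicts of lists as a stand-in for DataFrames (None marks a missing value), and the names `mostly_missing_columns`, `tables`, and `col_lists` are hypothetical. With pandas, the per-column test would be the `dataframe[feature].isnull().mean() >= 0.9` check from the question.

```python
def mostly_missing_columns(table, threshold=0.9):
    """Return the column names whose share of missing values is >= threshold."""
    result = []
    for column, values in table.items():
        if not values:
            continue  # skip empty columns to avoid dividing by zero
        missing_ratio = sum(v is None for v in values) / len(values)
        if missing_ratio >= threshold:
            result.append(column)
    return result

# Stand-in data: name -> table, where a table maps column name -> values.
tables = {
    "df1": {"a": [None] * 10, "b": list(range(10))},
    "df2": {"x": [1] * 10, "y": [None] * 9 + [1]},
}

# One dict entry per input, named after the input, instead of a global variable.
col_lists = {
    f"col_list_{name}": mostly_missing_columns(table)
    for name, table in tables.items()
}

print(col_lists["col_list_df1"])  # ['a']
print(col_lists["col_list_df2"])  # ['y']
```

The lists stay reachable outside the function through `col_lists`, and the names can be generated for any number of inputs without `exec` or `globals()`.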

converting a python string to a variable

I have read almost every similar question but none of them seems to solve my current problem.
In my python code, I am importing a string from my bashrc and in the following, I am defining the same name as a variable to index my dictionary. Here is the simple example
obs = os.environ['obs']
>> obs = 'id_0123'
id_0123 = numpy.where(position1 == 456)
>> position1[id_0123] = 456
>> position2[id_0123] = 789
But of course, when I do positions[obs], it throws an error, since obs is a string rather than an index (numpy.int64). So I have tried to look for a way to convert my string into a variable, but all the solutions suggest either converting to a dictionary or something else and assigning the string to an integer. I cannot do that, since my string is dynamic and will constantly change. In the end, I am going to have about 50 variables, and I need to check which variable the current obs corresponds to, so I can use it as an index to access the parameters.
Edit:
position1 and position2 are just NumPy arrays, so depending on the output of os.environ (which is 'id_0123' in this particular case), they will print an array element. So I cannot assign 'id_0123' to another string or number, since I am using that exact name as a variable.
The logic is that there are many different arrays, I want to use the output of os.environ as an input to access the element of these arrays.
If you wanted to use a dictionary instead, this would work.
obs = 'id_0123'
my_dict = {obs: 'Tester'}
print(my_dict[obs])
print(my_dict['id_0123'])
You could use the fact that (almost) everything is a dictionary in Python to create a storage container that you index with obs:
container = lambda:None
container.id_0123 = ...
...
print(positions[container.__dict__[obs]])
Alternatively, you can use globals() or locals() to achieve the desired behavior:
import numpy
import os
def environment_to_variable(environment_key='obs', variable_values=globals()):
    # Get the name of the variable we want to reference from the bash environment
    variable_name = os.environ[environment_key]
    # Grab the variable value that matches the named variable from the environment. Variable you want to reference must be visible in the global namespace by default.
    variable_value = variable_values[variable_name]
    return variable_value
positions = [2, 3, 5, 7, 11, 13]
id_0123 = 2 # could use numpy.where here
id_0456 = 4
index = environment_to_variable(variable_values=locals())
print(positions[index])
If you place this in a file called a.py, you can observe the following:
$ obs="id_0123" python ./a.py
5
$ obs="id_0456" python ./a.py
11
This allows you to index the array differently at runtime, which is what it seems like your intention is.
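The same runtime lookup can also be done with an explicit dict instead of globals() or locals(), which keeps the set of valid names in one place. This is only a sketch: the `indices` mapping, the `lookup_index` helper, and its `environ` parameter are assumptions made for illustration, and the index values are the ones from the example above.

```python
import os

positions = [2, 3, 5, 7, 11, 13]

# Explicit mapping from the names that may arrive via the environment
# to the indices they stand for (these could come from numpy.where).
indices = {"id_0123": 2, "id_0456": 4}

def lookup_index(indices, environ=os.environ, key="obs"):
    """Return the index registered under the name stored in environ[key]."""
    name = environ[key]
    return indices[name]

# A plain dict stands in for os.environ here so the example runs anywhere.
index = lookup_index(indices, environ={"obs": "id_0456"})
print(positions[index])  # 11
```

An unknown name raises a KeyError from the dict lookup rather than silently picking up an unrelated variable from the global namespace.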

Does the compiler move static definitions out of loops?

Let's say I have this loop with a definition in it:
for name in ["alice", "bob", "ceelo"]:
    full_name = name + {"alice": "cooper", "bob": "dylan", "ceelo":"green"}[name]
    print(full_name)
As you can see, my dict isn't assigned to anything. I could save runtime by refactoring it to:
names = {"alice": "cooper", "bob": "dylan", "ceelo":"green"}
for name in names:
    full_name = name + names[name]
... but I don't want to. For reasons. I promise.
My question: Does the standard Python compiler automatically perform this refactoring?
First, strictly speaking CPython does have a compiler: it compiles source code to bytecode, which the interpreter then executes. That compiler performs only a few simple optimizations, though, and hoisting a dict out of a loop is not one of them.
Even if you do not assign the dict to a variable name, e.g.
"names = {"alice": "cooper", "bob": "dylan", "ceelo":"green"}", the dictionary is still a dictionary object and it occupies space in memory, whether or not it has a name.
Therefore, you can still use a key to get your value, e.g. "{"alice": "cooper", "bob": "dylan", "ceelo":"green"}[name]". But the Python interpreter will not assign the dictionary to a variable automatically if you never do so yourself; the dict is rebuilt each time the expression is evaluated.
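A small experiment can make this visible. The sketch below assumes CPython; the function name `loop_with_literal` is made up for the example. It keeps every dict produced by the literal inside the loop and checks that they are distinct objects, then lists the dict-building opcodes found in the function's bytecode. The exact opcode name varies between CPython versions, so no output is shown for that part.

```python
import dis

def loop_with_literal():
    built = []
    for name in ["alice", "bob", "ceelo"]:
        # The dict display is evaluated again on every pass through the loop.
        lookup = {"alice": "cooper", "bob": "dylan", "ceelo": "green"}
        built.append(lookup)
    return built

# All dicts are kept alive in the list, so distinct ids mean distinct objects:
# each iteration built its own dict and nothing was hoisted out of the loop.
dicts = loop_with_literal()
all_distinct = len({id(d) for d in dicts}) == len(dicts)
print(all_distinct)  # True

# The opcode that builds the dict is part of the function's bytecode.
# Its exact name differs between CPython versions, so match loosely on "MAP".
map_ops = [ins.opname for ins in dis.get_instructions(loop_with_literal)
           if "MAP" in ins.opname]
print(map_ops)
```

If the rebuild cost matters, assigning the dict to a name before the loop, as in the question's second version, is the way to avoid it.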

How to change an anchored scalar in a sequence without destroying the anchor in ruamel.yaml?

When using ruamel.yaml version 0.15.92 with Python 3.6.6 on CentOS 7, I cannot seem to update the value of an anchored scalar in a sequence without destroying the anchor itself or creating invalid YAML from the next dump.
I have attempted to recreate the original node type with the new value (old PlainScalarString -> new PlainScalarString, old FoldedScalarString -> new FoldedScalarString, etc), copying the anchor to it. While this restores the anchor to the updated scalar value, it also creates invalid YAML because the first alias later in the YAML file duplicates the same anchor name and assigns to it the old value of the scalar I'm trying to update.
I then attempted to replace all of the affected aliases with actual alias text -- like *anchor_name -- but that causes the value to become quoted like '*anchor_name', rendering the alias useless.
I reverted that and then attempted to suppress the duplicate anchor name (by setting always_dump=False on every affected alias). While that does suppress the duplicate anchor name, it unfortunately just dumps the old value of the anchored scalar.
My entire test data is as follows; assume this is named test.yaml:
# Header comment
---
# Post-header comment
# Reusable aliases
aliases:
- &plain_value This is unencrypted
- &string_password ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]
- &block_password >
ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
DQYJKoZIhvcNAQEBBQAEggEAojErrxuNcdX6oR+VA/I3PyuV2CwXx166nIUp
asEHo1/CiCIoE3qCnjK2FJF8vg+l3AqRmdb7vYrqQ+30RFfHSlB9zApSw8NW
tnEpawX4hhKAxnTc/JKStLLu2k7iZkhkor/UA2HeVJcCzEeYAwuOQRPaolmQ
TGHjvm2w6lhFDKFkmETD/tq4gQNcOgLmJ+Pqhogr/5FmGOpJ7VGjpeUwLteM
er3oQozp4l2bUTJ8wk9xY6cN+eeOIcWXCPPdNetoKcVropiwrYH8QV4CZ2Ky
u0vpiybEuBCKhr1EpfqhrtuG5s817eOb7+Wf5ctR0rPuxlTUqdnDY31zZ3Kb
mcjqHDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBATq6BjaxU2bfcLL5S
bxzsgCDsWzggzxsCw4Dp0uYLwvMKjJEpMLeFXGrLHJzTF6U2Nw==]
top_key: unencrypted value
top_alias: *plain_value
top::hash:
ignore: more
# This pulls its string-form value from above
stringified_alias: *string_password
sub:
ignore: value
key: unencrypted subbed-value
# This pulls its block-form value from above
blocked_alias: *block_password
sub_more:
# This is a stringified EYAML value, NOT an alias
inline_string: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqkv6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6HtkolM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoKB4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]
# Also NOT an alias, in block form
block_string: >
ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
DQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12
hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5
TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqk
v6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6Htko
lM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4
osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoK
B4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064
EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]
# Signature line
There are two forms of this issue, so here are two code examples for reproducing the conditions:
First, "How can we most simply update the value of an anchored scalar in a sequence without destroying the anchor or its aliases?" This looks like:
with open('test.yaml', 'r') as f:
    yaml_data = yaml.load(f)
yaml_data['aliases'][1] = "New string password"
yaml.dump(yaml_data, sys.stdout)
Note that this destroys the anchor. I would very much prefer the solution look as similar to this first snippet as possible; perhaps something like yaml_data['aliases'][1].set_value("New string password") # Changes only the scalar value while preserving the original anchor, comments, position, et al..
Second, "If we must instead wrap the new value in some object to preserve the anchor (and other attributes of the entry being replaced), what is the simplest approach which also preserves all aliases that refer to it (such that they adopt the updated value) when dumped?" My attempt to solve this requires quite a lot more code including recursive functions. Since SO guidelines advise against dumping large code, I will offer the relevant bits. Please assume the unlisted code is working perfectly well.
### <snip def FindEYAMLPaths(...) returns lists of paths through the YAML to every value starting with 'ENC['>
### <snip def GetYAMLValue(...) returns the node -- as a PlainScalarString, FoldedScalarString, et al. -- identified by a path from FindEYAMLPaths>
### <snip def DisableAnchorDump(...) sets `anchor.always_dump=False` if the node has an anchor attribute>
def ReplaceYAMLValue(value, data, path=None):
    if path is None:
        return
    ref = data
    last_ref = path.pop()
    for p in path:
        ref = ref[p]
    # All I'm trying to do here is change the scalar value without disrupting its comments, anchor, positioning, or any of its aliases.
    # This succeeds in changing the scalar value and preserving its original anchor, but disrupts its aliases which insist on preserving the old value.
    if isinstance(ref[last_ref], PlainScalarString):
        ref[last_ref] = PlainScalarString(value, anchor=ref[last_ref].anchor.value)
    elif isinstance(ref[last_ref], FoldedScalarString):
        ref[last_ref] = FoldedScalarString(value, anchor=ref[last_ref].anchor.value)
    else:
        ref[last_ref] = value
with open('test.yaml', 'r') as f:
    yaml_data = yaml.load(f)
seen_anchors = []
for path in FindEYAMLPaths(yaml_data):
    if path is None:
        continue
    node = GetYAMLValue(yaml_data, deque(path))
    if hasattr(node, 'anchor'):
        test_anchor = node.anchor.value
        if test_anchor is not None:
            if test_anchor in seen_anchors:
                # This is expected to just be an alias, pointing at the newly updated anchor
                DisableAnchorDump(node)
                continue
            seen_anchors.append(test_anchor)
    ReplaceYAMLValue("New string password", yaml_data, path)
yaml.dump(yaml_data, sys.stdout)
Note that this produces valid YAML except that all of the affected aliases are gone, replaced instead by the old value of the anchored scalar.
I expect to be able to change the value of an aliased scalar in a sequence without disrupting any other part of the YAML content. Based on other posts I've seen about ruamel.yaml, I fully accept that I may need to dump the updated YAML to file and reload it for the in-memory aliases to update to the new value. I simply expect to change:
Input File
aliases:
- &some_anchor Old value
usage: *some_anchor
to:
Output File
aliases:
- &some_anchor NEW VALUE
usage: *some_anchor
Instead, here's the output from the above two examples:
First, notice that the original anchor was destroyed and the value for top::hash:stringified_alias: now carries the original anchor and old value instead of the alias to the newly updated scalar value at ['aliases'][1]:
---
# Post-header comment
# Reusable aliases
aliases:
- &plain_value This is unencrypted
- New string password
- &block_password >
ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
DQYJKoZIhvcNAQEBBQAEggEAojErrxuNcdX6oR+VA/I3PyuV2CwXx166nIUp
asEHo1/CiCIoE3qCnjK2FJF8vg+l3AqRmdb7vYrqQ+30RFfHSlB9zApSw8NW
tnEpawX4hhKAxnTc/JKStLLu2k7iZkhkor/UA2HeVJcCzEeYAwuOQRPaolmQ
TGHjvm2w6lhFDKFkmETD/tq4gQNcOgLmJ+Pqhogr/5FmGOpJ7VGjpeUwLteM
er3oQozp4l2bUTJ8wk9xY6cN+eeOIcWXCPPdNetoKcVropiwrYH8QV4CZ2Ky
u0vpiybEuBCKhr1EpfqhrtuG5s817eOb7+Wf5ctR0rPuxlTUqdnDY31zZ3Kb
mcjqHDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBATq6BjaxU2bfcLL5S
bxzsgCDsWzggzxsCw4Dp0uYLwvMKjJEpMLeFXGrLHJzTF6U2Nw==]
# ... snip ...
top::hash:
ignore: more
# This pulls its string-form value from above
stringified_alias: &string_password ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]
# ... snip ...
Second, notice that ['aliases'][1] now looks correct -- it is the new value with the original anchor -- but where I expect to see aliases to it, I instead see the old value. I expect to see *string_password instead of ENC[...].
---
# Post-header comment
# Reusable aliases
aliases:
- &plain_value This is unencrypted
- &string_password New string password
- &block_password >-
New string password
# ... snip ...
top::hash:
ignore: more
# This pulls its string-form value from above
stringified_alias: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]
# ... snip ...
If you read in an anchored scalar, like your This is unencrypted,
using ruamel.yaml, you get a PlainScalarString object (or one of the other ScalarString
subclasses), which is an extremely thin layer around the basic string
type. That layer has an attribute to store an anchor if applicable (other uses are primarily to
maintain quoting/literal/folding style information). And any aliases using that anchor refer to the same ScalarString instance.
When dumping, the anchor attribute is not used to create aliases; that
is done in the normal way, by having multiple references to the same
object. The attribute is only used to write the anchor id, and this also
happens if there is an attribute but no further references (i.e. an anchor without aliases).
So it is not surprising that if you replace such an object that has
multiple references (either at the anchor spot or at any of the alias
spots), the reference disappears. If you then also force the same
anchor name on some other object, you get duplicate anchors: contrary
to the normal anchor/alias generation, there is no check done on
"forced" anchors.
Since ScalarString is such a thin wrapper, its instances are essentially
immutable objects, just like the string itself. Aliased
dicts and lists are collection objects that can be emptied and
then refilled (instead of replaced by a new instance); you cannot do
that with a string.
The implementation of ScalarString can of course be changed so that you
can have your set_value() method, but that involves creating alternative
classes for all the objects (PlainScalarString,
FoldedScalarString). You would have to make sure
these get used for constructing and for representing, and then
preferably also behave like normal strings as far as you need them to, so
that at least you can print them.
That is relatively easy to do, but requires copying and slightly modifying several
tens of lines of code.
I think it is easier to leave the ScalarStrings in place as is (i.e.
immutable) and do what you need to do if you want to change all
occurrences (i.e. references): update all the references to the
original. If your data structure contained millions of nodes, that
might be prohibitively time-consuming, but it would still be a fraction of what
loading and dumping the YAML itself would take:
import sys
from pathlib import Path
import ruamel.yaml
in_file = Path('test.yaml')
def update_aliased_scalar(data, obj, val):
    def recurse(d, ref, nv):
        if isinstance(d, dict):
            for i, k in [(idx, key) for idx, key in enumerate(d.keys()) if key is ref]:
                d.insert(i, nv, d.pop(k))
            for k, v in d.non_merged_items():
                if v is ref:
                    d[k] = nv
                else:
                    recurse(v, ref, nv)
        elif isinstance(d, list):
            for idx, item in enumerate(d):
                if item is ref:
                    d[idx] = nv
                else:
                    recurse(item, ref, nv)
    if hasattr(obj, 'anchor'):
        recurse(data, obj, type(obj)(val, anchor=obj.anchor.value))
    else:
        recurse(data, obj, type(obj)(val))
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.preserve_quotes = True
data = yaml.load(in_file)
update_aliased_scalar(data, data['aliases'][1], "New string password")
update_aliased_scalar(data, data['top::hash']['sub']['blocked_alias'], "New block password\n")
yaml.dump(data, sys.stdout)
which gives:
# Post-header comment
# Reusable aliases
aliases:
- &plain_value This is unencrypted
- &string_password New string password
- &block_password >
New block password
top_key: unencrypted value
top_alias: *plain_value
top::hash:
ignore: more
# This pulls its string-form value from above
stringified_alias: *string_password
sub:
ignore: value
key: unencrypted subbed-value
# This pulls its block-form value from above
blocked_alias: *block_password
sub_more:
# This is a stringified EYAML value, NOT an alias
inline_string: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqkv6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6HtkolM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoKB4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]
# Also NOT an alias, in block form
block_string: >
ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
DQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12
hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5
TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqk
v6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6Htko
lM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4
osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoK
B4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064
EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]
# Signature line
As you can see the anchors are preserved and it doesn't matter for update_aliased_scalar if you
provide the anchored "place" or one of the aliased places as a reference.
The above recurse also handles keys that are aliased, as it is perfectly fine for a key in a YAML mapping to have an anchor or to be an alias. You can even have an anchored key with a value that is an alias to the corresponding key.
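The identity-based replacement that update_aliased_scalar relies on can be illustrated without ruamel.yaml at all. The sketch below is a plain-Python stand-in, not the library's API: `Tagged` is a hypothetical str subclass playing the role of a ScalarString, and `replace_references` is a simplified walk that handles only values, not aliased keys or merge keys.

```python
class Tagged(str):
    """Hypothetical stand-in for a ScalarString: an immutable str subclass."""

def replace_references(node, old, new):
    """Swap every value that *is* `old` (same object) for `new`, recursively."""
    if isinstance(node, dict):
        for key, value in list(node.items()):
            if value is old:
                node[key] = new
            else:
                replace_references(value, old, new)
    elif isinstance(node, list):
        for idx, item in enumerate(node):
            if item is old:
                node[idx] = new
            else:
                replace_references(item, old, new)

# One object referenced from three places, like an anchor with two aliases.
shared = Tagged("Old value")
data = {"aliases": [shared], "usage": shared, "nested": {"again": shared}}

replace_references(data, shared, Tagged("NEW VALUE"))

# All three spots now point at the same new object, so a dumper that
# detects aliases by object identity would still see them as shared.
print(data["usage"] is data["aliases"][0])  # True
print(data["nested"]["again"])              # NEW VALUE
```

Replacing only `data["aliases"][0]` would instead leave the other two spots pointing at the old object, which corresponds to the lost alias described in the question.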
It would be very nice to have support for in-place modification of existing anchored fields with types ScalarFloat/ScalarInt etc. YAML is often used for config files. One common use case I encountered is to create multiple config files from a very large template config file with only small changes made to the new files. I would load the template file into CommentedMap, modify a small set of keys in place and dump it back into a new yaml config file. This flow works very nicely if the keys to be changed are not anchored. When they are anchored, the anchors are duplicated in the new files as reported by OP and render them invalid. Manually addressing each anchored key in post-processing can be daunting when there are a large number of them.

Creating nested dictionaries from a list containing paths

I have a list containing paths. For example:
links=['main',
'main/path1',
'main/path1/path2',
'main/path1/path2/path3/path4',
'main/path1/path2/path3/path5',
'main/path1/path2/path3/path4/path6']
I want to create a nested dictionary to store these paths in order. Expected output:
Output = {'main': {'path1': {'path2': {'path3': {'path4': {'path6': {}}, 'path5': {}}}}}}
I am new to Python coding (v 3.+) and I am unable to solve it. It gets confusing after I reach path3, as there is path4 (with path6 nested) and path5 as well. Can someone please help?
Something like
tree = {}
for path in links:                # for each path
    node = tree                   # start from the very top
    for level in path.split('/'): # split the path into a list
        if level:                 # if a name is non-empty
            node = node.setdefault(level, dict())
                                  # move to the deeper level
                                  # (or create it if nonexistent)
With links defined as above, it results in
>>> tree
{'main': {'path1': {'path2': {'path3': {'path4': {'path6': {}}, 'path5': {}}}}}}
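For reuse, the same loop can be wrapped in a function. This is just a sketch of the approach above; the function name `build_tree` and the `sep` parameter are additions for illustration.

```python
def build_tree(paths, sep="/"):
    """Return a nested dict with one level per path component."""
    tree = {}
    for path in paths:
        node = tree  # start from the top for every path
        for level in path.split(sep):
            if level:  # skip empty components, e.g. from a leading separator
                # Descend into the child dict, creating it on first use.
                node = node.setdefault(level, {})
    return tree

links = ['main',
         'main/path1',
         'main/path1/path2',
         'main/path1/path2/path3/path4',
         'main/path1/path2/path3/path5',
         'main/path1/path2/path3/path4/path6']

tree = build_tree(links)
print(tree)
```

Because setdefault only creates a level when it is missing, the order of the paths does not matter and intermediate levels such as path3 are created even though no path ends there.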
