python yaml update preserving order and comments - python-3.x

I'm inserting a key into a YAML file using Python, but I would like to preserve the order and the comments in the YAML:
#This Key is used for identifying Parent tests
    ParentTest:
       test:
         JOb1: myjob
         name: testjob
         arrive: yes
Now I'm using the code below to insert a new key:
params['ParentTest']['test']['new_key'] = 'new value'
yaml_output = yaml.dump(params, default_flow_style=False)
How do I preserve the exact order and the comments? In the output below, arrive moved up, but I want to preserve the order and the comments as well.
The output is:
ParentTest:
  test:
    arrive: yes
    JOb1: myjob
    name: testjob

PyYAML cannot keep comments, but ruamel.yaml does.
Try this:
import ruamel.yaml

doc = ruamel.yaml.load(yaml_str, Loader=ruamel.yaml.RoundTripLoader)  # yaml_str holds the original document shown above
doc['ParentTest']['test']['new_key'] = 'new value'
print(ruamel.yaml.dump(doc, Dumper=ruamel.yaml.RoundTripDumper))
The order of keys will also be preserved.
Edit: Look at Anthon's answer from 2020: https://stackoverflow.com/a/59659659/93745

Although @tinita's answer works, it uses the old ruamel.yaml API and
that gives you less control over the loading/dumping. Even so, you
cannot preserve the inconsistent indentation of your mappings: the key
ParentTest is indented four positions, the key test a further
three and the key JOb1 only two positions. You can "only" set the
same indentation for all mappings (i.e. their keys) and, separate from
that, the indentation of all sequences (i.e. their elements), and if
there is enough space, you can offset the sequence indicator (-) within
the sequence element indent.
In the default, round-trip mode, ruamel.yaml preserves key order, and
additionally you can preserve quotes, folded and literal scalars.
With a slightly extended YAML input as an example:
import sys
import ruamel.yaml
yaml_str = """\
#This Key is used for identifying Parent tests
ParentTest:
    test:
        JOb1:
          - my
          - job
          # ^ four indent positions for the sequence elements
        # ^ two position offset for the sequence indicator '-'
        name: 'testjob' # quotes added to show working of .preserve_quotes = True
        arrive: yes
"""
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=4, offset=2)
yaml.preserve_quotes = True
params = yaml.load(yaml_str)
params['ParentTest']['test']['new_key'] = 'new value'
params['ParentTest']['test'].yaml_add_eol_comment('some comment', key='new_key', column=40) # column is optional
yaml.dump(params, sys.stdout)
which gives:
#This Key is used for identifying Parent tests
ParentTest:
    test:
        JOb1:
          - my
          - job
          # ^ four indent positions for the sequence elements
        # ^ two position offset for the sequence indicator '-'
        name: 'testjob' # quotes added to show working of .preserve_quotes = True
        arrive: yes
        new_key: new value              # some comment
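If you want the updated document written back to a file instead of sys.stdout, ruamel.yaml's dump also accepts an open stream or a pathlib.Path, so it is a one-liner on top of the example above (the file name here is just an example):
import pathlib
yaml.dump(params, pathlib.Path('updated.yaml'))  # writes the round-tripped YAML, comments included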

In addition, if you want to keep the quotes, you can try this:
import ruamel.yaml
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
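For completeness, a minimal round-trip sketch (the input string is made up here) showing what preserve_quotes does:
import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True

yaml_str = """\
name: 'testjob'    # single-quoted in the source
note: "something"  # double-quoted in the source
"""
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)  # both scalars keep their original quoting style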

Related

How can I distinguish end of line comments from full line comments in ruamel.yaml?

In an attempt to make complex Azure DevOps pipelines self-documenting, I am trying to read comments out of YAML files automatically. I have decided to use the ruamel.yaml Python library.
Reading comments works well, but I have not found out how to distinguish comments at the end of the line from full-line comments:
- book # This is an end-of-the-line comment
# This is a full line comment
Does anyone know how I could achieve this?
Code example, reading the stage-level comments of all stages, without comments of sub-entities of stages:
from ruamel.yaml import YAML
from ruamel.yaml.comments import CommentedMap, CommentedSeq

file_name = 'test.yaml'
yaml = YAML()
with open(file_name) as doc:
    data = yaml.load(doc)

i = 0
for item in data['stages']:
    i += 1
    print("*** stage", i, item['stage'])
    if isinstance(item, CommentedMap):
        comment_token = item.ca.items.get('stage')
        stage_help = {"stage_id": i}
        current_key = "Comment"
        if comment_token:
            for tab in comment_token:
                if tab:
                    vals = tab.value.split('\n')
                    for line in vals:
                        if line[1:1] == "#":
                            line = line[1:]
                        else:
                            line = line.strip()[1:].strip()
                        if len(line) == 0:
                            continue
                        if line[0:1] == "#":
                            current_key = line[1:line.index(':')]
                            content = line[line.index(':')+1:].strip()
                        else:
                            content = line
                        if current_key not in stage_help:
                            stage_help[current_key] = f"{content}"
                        else:
                            stage_help[current_key] = f"{stage_help[current_key]}\n{content}"
        print(stage_help)
YAML:
stages:
- stage: TestA
  # #Comment: I write what it does
  # #Link: https://documentation
- stage: TestB # My favorite stage!
  # #Comment: We can also write
  # Multiline docs
  # #Link: https://doc2
  displayName: Test B # The displayName is shown in the pipeline's GUI
Running this gives me:
*** stage 1 TestA
{'stage_id': 1, 'Comment': 'I write what it does', 'Link': 'https://documentation'}
*** stage 2 TestB
{'stage_id': 2, 'Comment': 'My favorite stage!\nWe can also write\nMultiline docs', 'Link': 'https://doc2'}
First of all, the ruamel.yaml package indicates that:
The 0.17 series will also see changes in how comments are attached during
roundtrip. This will result in backwards incompatibilities on the .ca data
and it might even be necessary for documented methods that handle comments.
so you should certainly test for the version of the installed ruamel.yaml in your code.
In ruamel.yaml<0.17, any comment-only lines (and empty lines) are attached to the preceding
end-of-line comment that is associated with a mapping key (or sequence element
index). If such a preceding end-of-line comment doesn't exist, it is constructed as just a newline. That
is what you should check for:
import sys
import ruamel.yaml
assert ruamel.yaml.version_info < (0, 17)
yaml_str = """\
stages:
- stage: TestA
  # #Comment: I write what it does
  # #Link: https://documentation
- stage: TestB # My favorite stage!
  # #Comment: We can also write
  # Multiline docs
  # #Link: https://doc2
  displayName: Test B # The displayName is shown in the pipeline's GUI
"""

def comment_splitter(comment_list):
    """expects the list that is a comment attached to key/index as argument
    returns a tuple containing:
    - the eol comment for the key/index (empty string if not available)
    - a list of, non-empty, full comment lines following the key/index
    """
    token = comment_list[2]
    # print(token)
    eol, rest = token.value.split('\n', 1)
    return eol.strip(), [y for y in [x.strip() for x in rest.split('\n')] if y]

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
print(comment_splitter(data['stages'][0].ca.items.get('stage')))
print(comment_splitter(data['stages'][1].ca.items.get('stage')))
which gives:
('', ['# #Comment: I write what it does', '# #Link: https://documentation'])
('# My favorite stage!', ['# #Comment: We can also write', '# Multiline docs', '# #Link: https://doc2'])
Because you immediately split the value of the CommentToken (if available) in your program and just skip
the line if it has zero length, you probably missed this. If you uncomment the # print(token) line this should become clear.
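To see the distinction directly, you can also print the raw token values (a small sketch reusing the data loaded above, for ruamel.yaml<0.17): a value that starts with a newline means there was no end-of-line comment, only full-line comments.
token_a = data['stages'][0].ca.items.get('stage')[2]
token_b = data['stages'][1].ca.items.get('stage')[2]
print(repr(token_a.value))  # starts with '\n' -> TestA has no end-of-line comment
print(repr(token_b.value))  # starts with '# My favorite stage!' -> TestB has an end-of-line comment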
I recommend against abusing YAML comments this way, and instead ask you to consider generating the file that Azure Pipelines
expects by extracting the information from a more complete YAML file with a small Python program. That is how I overcome
the deficiencies of the docker-compose.yaml file format.
Whether you check in the resulting (simpler) YAML, or always generate
it on the fly, depends on the capabilities of the pipeline and on how you invoke it.
I would probably start with something like the following, and process that:
stages:
- #Stage
  name: TestA
  comment: I write what it does
  link: https://documentation
- #Stage
  name: TestB # My favorite stage!
  Comment: |
    We can also write
    Multiline docs
  link: https://doc2
  displayName: Test B # The displayName is shown in the pipeline's GUI
That way you don't have to deal with any changes in the way comments will be attached in future versions of ruamel.yaml.
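As a rough sketch of that pre-processing idea (the input file name stages_full.yaml, the key names name/comment/link and the exact output layout are all assumptions, not an established format), something along these lines could read the richer YAML above and emit only the stage list the pipeline needs:
import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()
with open('stages_full.yaml') as fp:          # the richer, self-documenting YAML shown above
    full = yaml.load(fp)

out = {'stages': []}
for stage in full['stages']:
    entry = {'stage': stage['name']}          # keep only the fields the pipeline understands
    if 'displayName' in stage:
        entry['displayName'] = stage['displayName']
    out['stages'].append(entry)               # comment/link stay behind as documentation in the source file

yaml.dump(out, sys.stdout)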

How does DoubleQuotedScalarString apply to a CommentedMap

When I use the CommentedMap from the ruamel.yaml library to store ordered dictionaries, I need to put the contents of a CommentedMap into a dictionary value as a string, but when I wrap it with DoubleQuotedScalarString, the output contains unwanted text such as ordereddict.
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap  # CommentedMap is used to avoid fields like "!omap" when dumping ordereddict data
from ruamel.yaml.scalarstring import SingleQuotedScalarString, DoubleQuotedScalarString
from pathlib import Path

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.indent(mapping=4, sequence=6, offset=4)

file_yml = CommentedMap()
test = CommentedMap()
test['test1'] = "test1"
test['test2'] = "test2"
file_yml["test"] = DoubleQuotedScalarString(test)

path = Path("./test.yaml")
yaml.dump(file_yml, path)
The result is as follows:
test: "ordereddict([('test1', 'test1'), ('test2', 'test2')])"
The result I am hoping for is: test: "{'test1': 'test1', 'test2': 'test2'}"
I would appreciate it if you could tell me how to achieve this.
You shouldn't apply DoubleQuotedScalarString to a CommentedMap. The only thing the former is useful for is to make sure that individual strings, which may be part of a mapping or sequence, get double quotes. By applying it to the CommentedMap, you convert the whole thing into a string, and that CommentedMap is an ordereddict.
You should probably just do:
test = dict()
and then later on:
file_yml["test"] = str(test)
On modern versions of Python this will preserve the key insertion order, and the quotes should be added automatically, as a scalar cannot start with { without being quoted.
If test needs to be a CommentedMap before dumping it as a string, then cast it to a dict:
test = CommentedMap()
.....
file_yaml["test"] = str(dict(test))

How to serialize escaped strings in a list

I'm trying to write a .yml policy document for AWS. The problem is my list of strings is being surrounded by double quotes "" when I try to escape it myself, i.e.
- "'acm:AddTagsToCertificate'".
When I do nothing, it shows as
- acm:AddTagsToCertificate.
Problem is I need the final result in the .yml to look like
- 'acm:AddTagsToCertificate'
In terms of my own troubleshooting, I've tried using double and single quotation marks. I've also tried subclassing list to override how lists are serialized, until other SO answers said that was frowned upon.
Here's the reduced code which shows my issue:
import yaml

data = {'apigateway:CreateDeployment': 6}
actions = []
for key in data:
    key = "\'" + key + "\'"
    print(key)
    actions.append(key)

with open('test.yml', 'w') as output:
    yaml.dump(actions, output, default_flow_style=False)
Use default_style="'" in the dump:
import yaml

data = {'apigateway:CreateDeployment': 6}
actions = list(data.keys())
with open('test.yml', 'w') as output:
    yaml.dump(actions, output, default_flow_style=False, default_style="'")
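With default_style="'" every scalar in the document is emitted single-quoted, so test.yml should come out looking something like:
- 'apigateway:CreateDeployment'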

Error on load if YAML key has no value (instead of None)?

I have a YAML file with an unknown number of "layers", so when I load it into a Python dictionary it becomes a nested dictionary.
I don't want to allow keys without values in the YAML file. I'd like to either:
cause errors during yaml.load() if there are missing values, or
identify all None values within the resulting nested dictionary.
import yaml

with open(input_path, "r") as yaml_file:
    my_dict = yaml.load(yaml_file)
You can redefine the Parser's process_empty_scalar method to raise an error:
import yaml

yaml_str = """\
- 1
- - 2
  -
- 3
"""

def pes(*args, **kw):
    raise NotImplementedError

yaml.parser.Parser.process_empty_scalar = pes

data = yaml.safe_load(yaml_str)
print(data)
The above raises an error; if you comment out the assignment to .process_empty_scalar, it parses correctly.
Please note:
layers in YAML do not necessarily mean Python dicts will be formed
you are using .load, which is documented to be unsafe and is almost guaranteed to be inappropriate.
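If you prefer the second option from your question, here is a small sketch (the helper name find_none_paths is made up for illustration) that walks the loaded structure and reports where None values occur:
import yaml

def find_none_paths(node, path=()):
    """Recursively yield the path of every None value in nested dicts/lists."""
    if node is None:
        yield path
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_none_paths(value, path + (key,))
    elif isinstance(node, list):
        for idx, value in enumerate(node):
            yield from find_none_paths(value, path + (idx,))

data = yaml.safe_load("a:\n  b:\n  c: 3\n")  # key 'b' has no value
print(list(find_none_paths(data)))           # [('a', 'b')]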

Gitlab CI: Set dynamic variables

For a gitlab CI I'm defining some variables like this:
variables:
  PROD: project_package
  STAGE: project_package_stage
  PACKAGE_PATH: /opt/project/build/package
  BUILD_PATH: /opt/project/build/package/bundle
  CONTAINER_IMAGE: registry.example.com/project/package:e2e
I would like to set those variables a bit more dynamically, as there are really only two parts: project and package. Everything else depends on those values, which means I would only have to change two values to get all the other variables.
So I would expect something like
variables:
  PROJECT: project
  PACKAGE: package
  PROD: $PROJECT_$PACKAGE
  STAGE: $PROD_stage
  PACKAGE_PATH: /opt/$PROJECT/build/$PACKAGE
  BUILD_PATH: /opt/$PROJECT/build/$PACKAGE/bundle
  CONTAINER_IMAGE: registry.example.com/$PROJECT/$PACKAGE:e2e
But it looks like this way of doing it is wrong...
I don't know where your expectation comes from, but it is trivial to check that there is no special meaning for $, _, / nor : (if not followed by a space) in YAML. There might be one in GitLab, but I strongly doubt it works the way you expect.
To formalize your expectation, you assume that any key (from the same mapping) preceded by a $ and terminated by the end of the scalar, by _ or by / is going to be "expanded" to that key's value. The _ has to be such a terminator, otherwise $PROJECT_$PACKAGE would not expand correctly.
Now consider adding a key-value pair:
BREAKING_TEST: $PACKAGE_PATH
is this supposed to expand to:
BREAKING_TEST: /opt/project/build/package/bundle
or follow the rule you implied that _ is a terminator and just expand to:
BREAKING_TEST: project_PATH
To prevent this kind of ambiguity, programs like bash use quoting around the variable names to be expanded ("$PROJECT"_PATH vs. $PROJECT_PATH), but the more sane, and modern, solution is to use clamping begin and end characters (e.g. { and }, or $% and %), with some special rule to use the clamping characters as normal text.
So this is not going to work the way you indicated; you are indeed doing something wrong.
It is not too hard to pre-process a YAML file, and it can be done with e.g. Python (but watch out that { has special meaning in YAML), possibly with the help of jinja2: load the variables, and then expand the original text using the variables until no more replacements can be made.
But it all starts with choosing the delimiters intelligently. Also keep in mind that although your "variables" seem to be ordered in the YAML text, there is no such guarantee when they are constructed as a dict/hash/mapping in your program.
You could e.g. use << and >>:
variables:
  PROJECT: project
  PACKAGE: package
  PROD: <<PROJECT>>_<<PACKAGE>>
  STAGE: <<PROD>>_stage
  PACKAGE_PATH: /opt/<<PROJECT>>/build/<<PACKAGE>>
  BUILD_PATH: /opt/<<PROJECT>>/build/<<PACKAGE>>/bundle
  CONTAINER_IMAGE: registry.example.com/<<PROJECT>>/<<PACKAGE>>:e2e
which, with the following program (that doesn't deal with escaping << to keep its normal meaning), generates your original, expanded, YAML exactly.
import sys
from ruamel import yaml
def expand(s, d):
    max_recursion = 100
    while '<<' in s:
        res = ''
        max_recursion -= 1
        if max_recursion < 0:
            raise NotImplementedError('max recursion exceeded')
        for idx, chunk in enumerate(s.split('<<')):
            if idx == 0:
                res += chunk  # first chunk is before <<, just append
                continue
            try:
                var, rest = chunk.split('>>', 1)
            except ValueError:
                raise NotImplementedError('delimiters have to balance "{}"'.format(chunk))
            if var not in d:
                res += '<<' + chunk
            else:
                res += d[var] + rest
        s = res
    return s

with open('template.yaml') as fp:
    yaml_str = fp.read()
variables = yaml.safe_load(yaml_str)['variables']
data = yaml.round_trip_load(expand(yaml_str, variables))
yaml.round_trip_dump(data, sys.stdout, indent=2)
