dump ruamel yaml - preserving the original structure


I have this YAML file (file_in.yaml):
components: &base
type1: 3 # sample comment
type2: 0x353
type3: 1.2.3.4
type4: "bla"
schemas:
description: 'ex1' # this is comment
description: 'ex2'
test2:
<<: *base
type4: 4555 # under the sea :)
yellow: &yellow
bla: 1
collor: &yellow
bla: 2
paint:
color: *yellow
slot_si_value_t: &bla1
desc: hav slot 2
slot_number: 2 #SI_SLOT_2
inst_max: 4
slot_si_value_t: &bla2
desc: hav slot 4
slot_number: 4 #SI_SLOT_4
inst_max: 4
slot:
- slot_si_value: *bla1
- slot_si_value: *bla2
I load it with this code snippet and dump it to another file:
import ruamel.yaml
ryaml = ruamel.yaml.YAML()
ryaml.allow_duplicate_keys = True
ryaml.preserve_quotes = True
with open("file_in.yaml") as si_file:
    si_data = ryaml.load(si_file)
with open("file_out.yaml", "w") as fp:
    ryaml.dump(si_data, fp)
The resulting file_out.yaml looks like this:
components: &base
type1: 3 # sample comment
type2: 0x353
type3: 1.2.3.4
type4: "bla"
schemas:
description: 'ex1' # this is comment
test2:
<<: *base
type4: 4555 # under the sea :)
yellow:
bla: 1
collor: &yellow
bla: 2
paint:
color: *yellow
slot_si_value_t: &bla1
desc: hav slot 2
slot_number: 2 #SI_SLOT_2
inst_max: 4
slot:
- slot_si_value: *bla1
- slot_si_value:
desc: hav slot 4
slot_number: 4 #SI_SLOT_4
inst_max: 4
I can see that the comments, quotes, hex values and the key order are preserved; however, the structure of the YAML has changed. Is there any way to instruct ruamel.yaml to dump the exact original format?

You can preserve all anchors by changing the .yaml_set_anchor method of CommentedBase. For
historical reasons this is currently only done for scalar values that are anchored (i.e. the behaviour is inconsistent).
Having duplicate keys in your YAML mappings, however, makes the document invalid, because keys have to be unique
according to the YAML specification. ruamel.yaml lets you load such broken
documents by setting .allow_duplicate_keys, but it doesn't support writing them:
it disregards further occurrences of the same key in a mapping during loading, so you cannot access the values for those keys unless they are anchored and aliased somewhere else.
That is why you "lose" the description: 'ex2' under key schemas, including the following
empty line (which is part of that entry's "comment").
The second occurrence of slot_si_value_t in the root mapping causes more problems. Because
it is not preserved, the &bla2 anchored mapping exists only once in the loaded
data and gets dumped (with an anchor, because of the .yaml_set_anchor change) within
the sequence that is the value of slot.
import sys
import warnings
from pathlib import Path
import ruamel.yaml
def yaml_set_anchor(self, value, always_dump=True):
    self.anchor.value = value
    self.anchor.always_dump = always_dump
ruamel.yaml.comments.CommentedBase.yaml_set_anchor = yaml_set_anchor
in_file = Path('file_in.yaml')
yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True
yaml.preserve_quotes = True
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    data = yaml.load(in_file)
yaml.dump(data, sys.stdout)
which gives:
components: &base
type1: 3 # sample comment
type2: 0x353
type3: 1.2.3.4
type4: "bla"
schemas:
description: 'ex1' # this is comment
test2:
<<: *base
type4: 4555 # under the sea :)
yellow: &yellow
bla: 1
collor: &yellow
bla: 2
paint:
color: *yellow
slot_si_value_t: &bla1
desc: hav slot 2
slot_number: 2 #SI_SLOT_2
inst_max: 4
slot:
- slot_si_value: *bla1
- slot_si_value: &bla2
desc: hav slot 4
slot_number: 4 #SI_SLOT_4
inst_max: 4
You should update your input file to remove, or rename, the duplicate keys. Even
then it will not round-trip exactly in ruamel.yaml, since your indentation is
inconsistent (e.g. the root-level mapping indents two spaces for components and
four spaces for slot_si_value_t), and that is normalized on output.
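To find the duplicate keys before editing the input, a small scanner can help. The following is a minimal sketch using only the standard library, not part of ruamel.yaml: `find_duplicate_keys` and `LINE_RE` are hypothetical names, and the scanner assumes simple block-style mappings with unquoted keys (no flow style, no multi-line scalars).

```python
import re

# Optional leading spaces and "- " sequence indicators, then an unquoted key and a colon.
LINE_RE = re.compile(r'^(?P<lead> *(?:- +)*)(?P<key>[^\s#][^:#]*?) *:(?: |$)')


def find_duplicate_keys(text):
    """Return (line_number, key) pairs for keys repeated within one block mapping."""
    duplicates = []
    stack = []  # (column, keys seen so far) for each mapping that is currently open
    for number, line in enumerate(text.splitlines(), start=1):
        match = LINE_RE.match(line)
        if match is None:
            continue
        lead, key = match.group('lead'), match.group('key')
        column = len(lead)
        # A "- " starts a new sequence item, so every mapping to the right of
        # the dash is closed; otherwise close the mappings deeper than this key.
        limit = lead.index('-') if '-' in lead else column
        while stack and stack[-1][0] > limit:
            stack.pop()
        if not stack or stack[-1][0] < column:
            stack.append((column, set()))
        seen = stack[-1][1]
        if key in seen:
            duplicates.append((number, key))
        seen.add(key)
    return duplicates


SAMPLE = """\
schemas:
  description: 'ex1'
  description: 'ex2'
slot_si_value_t: &bla1
    desc: hav slot 2
slot_si_value_t: &bla2
    desc: hav slot 4
slot:
- slot_si_value: *bla1
- slot_si_value: *bla2
"""

print(find_duplicate_keys(SAMPLE))
```

On this shortened version of the input it reports the second description and the second slot_si_value_t, while the repeated slot_si_value keys are not flagged because they belong to different sequence items.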

Related

Updating a YAML file in Python

I am updating the following template.yaml file in Python3:
alpha:
alpha_1:
alpha_2:
beta:
beta_1:
beta_2:
- beta_2a:
beta_2b:
gamma:
Using ruamel.yaml I am able to fill in the blank values correctly.
import ruamel.yaml.util
file_name = 'template.yaml'
with open(file_name) as fp:
    config, ind, bsi = ruamel.yaml.util.load_yaml_guess_indent(fp)
By updating each element I arrive at:
alpha:
alpha_1: "val_alpha1"
alpha_2: "val_alpha2"
beta:
beta_1: "val_beta1"
beta_2:
- beta_2a: "val_beta2a"
beta_2b: "val_beta2b"
gamma: "val_gamma"
Here is the issue: I may need further child elements in the beta_2 node, like this:
alpha:
alpha_1: "val_alpha1"
alpha_2: "val_alpha2"
beta:
beta_1: "val_beta1"
beta_2:
- beta_2a: "val_beta2a"
beta_2b: "val_beta2b"
- beta_2c: "val_beta2c"
beta_2d: "val_beta2d"
gamma: "val_gamma"
I do not know in advance whether I will need more branches like the one above, and changing the template each time is not an option.
My attempts with update() or with appending a dict were unsuccessful. How can I get the desired result?
My attempt:
entry = config["beta"]["beta_2"]
entry[0]["beta_2a"] = "val_beta2a"
entry[0]["beta_2b"] = "val_beta2b"
entry[0].update = {"beta_2c": "val_beta2a", "beta_2d": "val_beta2d"}
In this case, the program does not display any changes in the results, meaning that the last line with update did not work at all.

Your indent is five for the list, with a two-space offset for the indicator (-),
so there is no real need to try to analyse the indent unless some other program
changes that.
The value for beta_2 is a list; to get what you want you need to append
a dictionary to that list:
import sys
from pathlib import Path
import ruamel.yaml
from ruamel.yaml.scalarstring import DoubleQuotedScalarString as DQ
file_name = Path('template.yaml')
yaml = ruamel.yaml.YAML()
yaml.indent(sequence=5, offset=2)
config = yaml.load(file_name)
config['alpha'].update(dict(alpha_1=DQ('val_alpha1'), alpha_2=DQ('val_alpha2')))
config['beta'].update(dict(beta_1=DQ('val_beta1')))
config['gamma'] = DQ('val_gamma')
entry = config["beta"]["beta_2"]
entry[0]["beta_2a"] = DQ("val_beta2a")
entry[0]["beta_2b"] = DQ("val_beta2b")
entry.append(dict(beta_2c=DQ("val_beta2a"), beta_2d=DQ("val_beta2d")))
yaml.dump(config, sys.stdout)
which gives:
alpha:
alpha_1: "val_alpha1"
alpha_2: "val_alpha2"
beta:
beta_1: "val_beta1"
beta_2:
- beta_2a: "val_beta2a"
beta_2b: "val_beta2b"
- beta_2c: "val_beta2a"
beta_2d: "val_beta2d"
gamma: "val_gamma"
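Why the attempt in the question changed nothing can be illustrated without ruamel.yaml. This is a sketch under an assumption: the loaded mapping is taken to behave like an ordinary dict subclass that accepts attribute assignment, and `Mapping` is a hypothetical stand-in for it.

```python
class Mapping(dict):
    """Hypothetical stand-in for a loaded YAML mapping (a dict subclass)."""


entry = [Mapping(beta_2a="val_beta2a", beta_2b="val_beta2b")]

# Assigning to .update stores an attribute on the instance: the method is
# shadowed, nothing is called, and no key is added to the mapping.
entry[0].update = {"beta_2c": "val_beta2c", "beta_2d": "val_beta2d"}
keys_after_assignment = sorted(entry[0])

# Remove the shadowing attribute so .update is the dict method again.
del entry[0].update

# Calling entry[0].update({...}) would add keys to the first list item;
# the desired output has a second list item, so append a new mapping.
entry.append({"beta_2c": "val_beta2c", "beta_2d": "val_beta2d"})

print(keys_after_assignment)
print(len(entry))
```

The first print shows only the two original keys, and the list has two items after the append.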

Appending a key to the top of an array

I have some hiera not unlike the following (I know this is invalid hiera with two keys... bear with me):
an::example::rule_files:
my_rules:
groups:
- name: my_rules
rules:
- alert: highCPU
expr: CPU > 90
for: 5m
annotations:
summary: "CPU is too high"
description: "CPU should be less than 90"
someone_elses_rules:
groups:
- name: someone_elses_rules
rules:
- alert: highCPU
expr: CPU > 70
for: 5m
annotations:
summary: "CPU is too high"
description: "CPU should be less than 70 on someone else's system"
I'm trying to turn this into a yaml file (the key is the filename). Now I know this is invalid hiera and I can remove the groups key to get this working (exactly what I've done), however when I try to reinsert it into the array, I can't get the formatting right. Here's the puppet code I'm using:
$alert_files = hiera('an::example::rule_files'),
$alert_files.each | String $alerts_file_name, Array $alert_config_pre | {
  $prefix = [ "groups:" ]
  $alert_config = $prefix + $alert_config_pre
  file { "/etc/prometheus/${alerts_file_name}.rules":
    ensure => file,
    content => $alert_config.to_yaml,
  }
}
Here's what I want:
cat /etc/prometheus/my_rules.rules
---
groups:
- name: my_rules
rules:
- alert: highCPU
expr: CPU > 90
for: 5m
annotations:
summary: CPU is too high
description: CPU should be less than 90
and here's what I get:
---
- 'groups:'
- name: my_rules
rules:
- alert: highCPU
expr: CPU > 90
for: 5m
annotations:
summary: CPU is too high
description: CPU should be less than 90
Any help would be massively appreciated. I feel like this should be simple but I've not really made any progress (I can't even remove the quotes from the word groups). If this is possible in either hiera or puppet (perhaps I've defined the hiera wrong) then great; any progress I can make in any way will be really appreciated.

This ...
$alert_files = hiera('an::example::rule_files'),
$alert_files.each | String $alerts_file_name, Array $alert_config_pre | {
... depends on the data associated with key an::example::rule_files being a Hash with String keys and Array values. In the YAML presented at the top of the question, that item is instead a Hash with String keys and Hash values. Inasmuch as the data seem to match the wanted file content, the problem seems to be not with the YAML (except for the inconsistent indentation), but rather with the Puppet code.
To work as you appear to want with the data you want, the Puppet code might look more like this:
$alert_files = lookup('an::example::rule_files'),
$alert_files.each |String $alerts_file_name, Hash $alert_config| {
  file { "/etc/prometheus/${alerts_file_name}.rules":
    ensure => 'file',
    content => $alert_config.to_yaml,
  }
}
Note that I have switched from the deprecated hiera() function to its replacement, lookup().
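The difference between the two data shapes can be sketched in Python, used here only as an illustration language, with JSON standing in for the YAML serialization. The variable names are hypothetical and the rule data is shortened from the question.

```python
import json

rules = [{"name": "my_rules", "rules": [{"alert": "highCPU", "expr": "CPU > 90"}]}]

# What the question's code builds: the string "groups:" becomes the first
# element of the array, so it is serialized as a quoted list item.
prefixed = ["groups:"] + rules

# What the wanted file needs: a mapping whose "groups" key holds the array.
wrapped = {"groups": rules}

print(json.dumps(prefixed, indent=2))
print(json.dumps(wrapped, indent=2))
```

Keeping the groups key in the data, as the answer suggests, yields the second shape without any manipulation in the manifest.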

How to format this code so that flake8 is happy?

This code was created by black:
def test_schema_org_script_from_list():
    assert (
        schema_org_script_from_list([1, 2])
        == '<script type="application/ld+json">1</script>\n<script type="application/ld+json">2</script>'
    )
But now flake8 complains:
tests/test_utils.py:59:9: W503 line break before binary operator
tests/test_utils.py:59:101: E501 line too long (105 > 100 characters)
How can I format the lines above to make flake8 happy?
I use this .pre-commit-config.yaml:
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: 'https://github.com/pre-commit/pre-commit-hooks'
    rev: v3.2.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
  - repo: 'https://gitlab.com/pycqa/flake8'
    rev: 3.8.4
    hooks:
      - id: flake8
  - repo: 'https://github.com/pre-commit/mirrors-isort'
    rev: v5.7.0
    hooks:
      - id: isort
tox.ini:
[flake8]
max-line-length = 100
exclude = .git,*/migrations/*,node_modules,migrate
# W504 line break after binary operator
ignore = W504
(I think it is a bit strange that flake8 reads config from a file which belongs to a different tool).

from your configuration, you've set ignore = W504
ignore isn't the option you want as it resets the default ignore (bringing in a bunch of things, including W503).
If you remove ignore=, both W504 and W503 are in the default ignore so they won't be caught
as for your E501 (line too long), you can either extend-ignore = E501 or you can set max-line-length appropriately
for black, this is the suggested configuration:
[flake8]
max-line-length = 88
extend-ignore = E203
note that there are cases where black cannot make a line short enough (as you're seeing) -- both from long strings and from long variable names
disclaimer: I'm the current flake8 maintainer
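For the long string itself, one option that needs no configuration change is to split the literal by hand using implicit string concatenation, which black leaves alone. A minimal sketch follows; the body of `schema_org_script_from_list` is a hypothetical stand-in, since the question does not show the real function.

```python
def schema_org_script_from_list(items):
    # Hypothetical stand-in for the function under test in the question.
    return "\n".join(
        f'<script type="application/ld+json">{item}</script>' for item in items
    )


def test_schema_org_script_from_list():
    # Adjacent string literals are joined at compile time, so every source
    # line stays short and the comparison fits on a single line.
    expected = (
        '<script type="application/ld+json">1</script>\n'
        '<script type="application/ld+json">2</script>'
    )
    assert schema_org_script_from_list([1, 2]) == expected


test_schema_org_script_from_list()
```

Because the comparison no longer spans several lines, there is also no line break next to a binary operator for W503 or W504 to report.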

How can I parse YAML into multiple compose.yaml based on the value of a key with python

I'm parsing a YAML file and splitting it into multiple different YAML files. I use a PyYAML constructor to achieve this, but the result is poor.
This is part of my project: I need to parse a YAML file I receive and split it into multiple YAML files based on the value of a key.
The YAML file I receive looks like this:
testname: testname
testall:
test1:
name: name1
location: 0
test2:
name: name2
location: 2
test3:
name: name3
location: 0
test4:
name: name4
location: 2
...
locations:
- 0
- 2
- ...
I want to parse it and split by device like the following:
# location0.yaml
testname:test
tests:
test1:
name:test1
location:0
test3:
name: test3
location: 0
# location2.yaml
testname:test
tests:
test2:
name:test2
location:0
test4:
name: test4
location: 0
How can I parse it into the form above?

Although you can do this with PyYAML, you would have to restrict
yourself to YAML 1.1. For this kind of read-modify-write you should
use ruamel.yaml (disclaimer: I am the author of that package). Not
only does that support YAML 1.2, it also preserves any comments, tags
and anchor names in case they occur in your source and can preserve
quotes around scalars, literal and folded style, etc. if you need that.
Also note that your output is invalid YAML: you cannot have a
multi-line plain (i.e. unquoted) scalar be the key of a (block style)
mapping. You would have to write:
"testname:test
tests":
but I assume you meant that to be two keys for the root level mapping:
testname: test
tests:
Assuming your input is in input.yaml:
testname: testname
testall:
    test1:
        name: name1 # this is just the first name
        location: 0
    test2:
        name: "name2" # quotes added for demo purposes
        location: 2
    test3:
        name: name3 # as this has the same location as name1
        location: 0 # these should be together
    test4:
        name: name4 # and this one goes with name2
        location: 2
locations:
    - 0
    - 2
you can do:
import sys
from pathlib import Path
import ruamel.yaml
in_file = Path('input.yaml')
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=6, offset=4) # this matches your input
yaml.preserve_quotes = True
data = yaml.load(in_file)
for loc in data['locations']:
    out_name = f'location{loc}.yaml'
    tests = {}
    d = ruamel.yaml.comments.CommentedMap(dict(testname="test", tests=tests))
    d.yaml_set_start_comment(out_name)
    testall = data['testall']
    for test in testall:
        if loc == testall[test]['location']:
            tests[test] = testall[test]
            tests[test]['location'] = 0
    # since you set location to zero and this affects data, make sure to remove
    # the items. This will prevent things from going wrong in case the
    # locations sequence does have zero, but not as its first number
    for key in tests:
        del testall[key]
    yaml.dump(d, Path(out_name))
which gives location0.yaml:
# location0.yaml
testname: test
tests:
    test1:
        name: name1 # this is just the first name
        location: 0
    test3:
        name: name3 # as this has the same location as name1
        location: 0 # these should be together
and location2.yaml:
# location2.yaml
testname: test
tests:
    test2:
        name: "name2" # quotes added for demo purposes
        location: 0
    test4:
        name: name4 # and this one goes with name2
        location: 0
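The grouping step on its own, separated from YAML loading and dumping, can be sketched on plain dictionaries. This is an illustration, not the answer's code: `split_by_location` is a hypothetical helper, and unlike the loop above it copies the entries and keeps each original location value instead of resetting it to 0 and deleting from the input.

```python
import copy


def split_by_location(data, testname="test"):
    """Return {location: document}, one document per entry in data['locations']."""
    documents = {}
    for loc in data["locations"]:
        # Copy the matching tests so the loaded input stays untouched.
        tests = {
            name: copy.deepcopy(test)
            for name, test in data["testall"].items()
            if test["location"] == loc
        }
        documents[loc] = {"testname": testname, "tests": tests}
    return documents


data = {
    "testname": "testname",
    "testall": {
        "test1": {"name": "name1", "location": 0},
        "test2": {"name": "name2", "location": 2},
        "test3": {"name": "name3", "location": 0},
        "test4": {"name": "name4", "location": 2},
    },
    "locations": [0, 2],
}

documents = split_by_location(data)
for loc, document in documents.items():
    print(f"location{loc}.yaml", list(document["tests"]))
```

Each resulting document could then be passed to a YAML dumper; working on copies avoids the need to delete entries from the input, at the cost of losing the comments that ruamel.yaml attaches to the loaded objects.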

How to capture words spread through multiple lines which have any whitespace (newline, space, tab)

import re
c = """
class_monitor std4:
Name: xyz
Roll number: 123
Age: 9
Badge: Blue
class_monitor std5:
Name: abc
Roll number: 456
Age: 10
Badge: Red
"""
I want to print Name, Roll number and age for std4 and Name, roll number and badge for std5.
pat = (class_monitor)(.*4:)(\n|\s|\t)*(Name:)(.*)(\s|\n|\t)*(Roll number:)(.*)(\s|\n|\t)*(Age:)(.*)(\s|\n|\t)*(Badge:)(.*)
It matches the respective std if I toggle the second group from (.*4:) to (.*5:) in pythex.
However, in a script it does not work. Am I missing something here?
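The question does not show the script, so the following is only a guess at what is missing: in a script the pattern has to be a string literal (ideally a raw string) handed to the re module, and re.match would fail here because the text starts with a newline. A minimal sketch that uses re.finditer and named groups (the group names are my own) to collect every class_monitor block in one pass:

```python
import re

c = """
class_monitor std4:
Name: xyz
Roll number: 123
Age: 9
Badge: Blue
class_monitor std5:
Name: abc
Roll number: 456
Age: 10
Badge: Red
"""

# \s* absorbs any run of newlines, spaces and tabs between the fields;
# .* stops at the end of the line because "." does not match a newline.
block = re.compile(
    r"class_monitor\s+(?P<std>\w+):\s*"
    r"Name:\s*(?P<name>.*)\s*"
    r"Roll number:\s*(?P<roll>.*)\s*"
    r"Age:\s*(?P<age>.*)\s*"
    r"Badge:\s*(?P<badge>.*)"
)

records = {m.group("std"): m.groupdict() for m in block.finditer(c)}

std4, std5 = records["std4"], records["std5"]
print(std4["name"], std4["roll"], std4["age"])
print(std5["name"], std5["roll"], std5["badge"])
```

Collecting all fields per block and choosing which ones to print afterwards avoids having to edit the pattern for each std.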
