How to serialize escaped strings in a list - python-3.x

I'm trying to write a .yml policy document for AWS. The problem is that when I try to escape the strings in my list myself, they end up surrounded by double quotes, i.e.
- "'acm:AddTagsToCertificate'".
When I do nothing, it shows as
- acm:AddTagsToCertificate.
Problem is I need the final result in the .yml to look like
- 'acm:AddTagsToCertificate'
In terms of my own troubleshooting, I've tried using both double and single quotation marks. I've also tried subclassing list to override how lists are serialized, until other SO answers said that was frowned upon.
Here's the reduced code that shows my issue:
import yaml
data = {'apigateway:CreateDeployment': 6}
actions = []
for key in data:
    key = "\'" + key + "\'"
    print(key)
    actions.append(key)
with open('test.yml', 'w') as output:
    yaml.dump(actions, output, default_flow_style=False)

Use default_style="'" in the dump:
import yaml
data = {'apigateway:CreateDeployment': 6}
actions = list(data.keys())
with open('test.yml', 'w') as output:
    yaml.dump(actions, output, default_flow_style=False, default_style="'")
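The quotes PyYAML writes with default_style="'" are YAML syntax, not part of the string, which is why adding them by hand (as in the question) makes them part of the value instead. A small sketch, assuming PyYAML is installed and dumping to a string rather than a file (the second action name is taken from the question's example):

```python
import yaml  # PyYAML

actions = ['apigateway:CreateDeployment', 'acm:AddTagsToCertificate']

# default_style="'" asks the emitter to use single-quoted style for every scalar
text = yaml.dump(actions, default_flow_style=False, default_style="'")
print(text)

# Loading the text back gives the plain strings: the quotes were only syntax
print(yaml.safe_load(text))
```

Keep in mind that default_style applies to every scalar in the document, so mapping keys and other values would be single-quoted as well.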

Related

Dictionary to Tab Delimited Text File for Particular Schema

I have a dictionary of the form:
data = {'a':'one','b':'two','c':'three'}
I want to convert this to a tab delimited text file such that the file reads as:
a b c one two three.
I tried:
import json
data = {'a':'one','b':'two','c':'three'}
with open('file.txt', 'w') as file:
    file.write(json.dumps(data))
However, the resulting file just reads as {"a": "one", "b": "two", "c": "three"}. I knew it wouldn't be as simple as that, and I'm sure it's not complex, but I just can't seem to figure this one out.
data = {'a':'one','b':'two','c':'three'}
s = ""
for x in data.keys():
    s += x
    s += "\t"
for x in data.values():
    s += x
    s += "\t"
print(s)
with open('file.txt', 'w') as file:
    file.write(s)
A dictionary is a structure designed for a one-to-one association between keys and values.
Therefore it makes sense to print each key:value pair together to preserve that association. Thus the default behaviour of print(data), when data is a dictionary, is to output {'a': 'one', 'b': 'two', 'c': 'three'}, and json.dumps(data) produces the equivalent JSON text {"a": "one", "b": "two", "c": "three"}. (Passing the dictionary itself to file.write(data) raises a TypeError, because write() only accepts strings.)
The key1, key2, ... value1, value2, ... output format you request is not typical for a structure like a dictionary, so a more "manual" approach like the one above, with two loops, is required. Note that the loop version also leaves a trailing tab at the end of the line.
As for json, it is not really relevant to the code you provided; maybe it is used in other parts of your code base where a JSON-specific output format is required. JSON is a data format independent of the Python programming language.
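It isn't entirely clear from the question whether everything should go on one line or whether the keys and values should form two rows, so here is a sketch of both; the helper names are hypothetical. str.join avoids the trailing tab the loop version leaves, and the csv module's writer with delimiter="\t" handles the two-row layout (when writing to a real file, open it with newline="" as the csv documentation recommends).

```python
import csv
import io

data = {'a': 'one', 'b': 'two', 'c': 'three'}

def to_tab_line(d):
    """All keys followed by all values on one tab-separated line, no trailing tab."""
    return "\t".join([*d.keys(), *d.values()])

def write_two_rows(d, f):
    """Keys as a header row and values as a second row, tab-delimited."""
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(d.keys())
    writer.writerow(d.values())

print(repr(to_tab_line(data)))

# An in-memory buffer stands in for open('file.txt', 'w', newline='')
buf = io.StringIO()
write_two_rows(data, buf)
print(repr(buf.getvalue()))
```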

An Elegant Solution to Python's Multiline String?

I was trying to log the completion of a scheduled event I set to run on Django. I was trying my very best to make my code look presentable, so instead of putting the string on a single line, I used a multiline string to output to the logger within a management command's handle method. Here is the example code:
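# the usual imports...
# ....
import logging
import textwrap
from django.core.management.base import BaseCommand
logger = logging.getLogger(__name__)
class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        # some codes here
        # ....
        # The trailing backslashes join the lines into one, but the
        # indentation of each continuation line stays inside the string
        final_statement = f'''\
            this is the final statements \
            with multiline string to have \
            a neater code.'''
        dedented_text = textwrap.dedent(final_statement)
        logger.info(dedented_text.replace('  ', ''))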
I have tried a few methods I found; however, most quick and easy methods still left a big chunk of spaces in the terminal output, as shown here:
this is the final statement             with multiline string to have             a neater code.
So I came up with a creative solution to my problem, using:
dedent.replace('  ','')
making sure to replace two spaces with no space, in order not to get rid of the normal single spaces between words. This finally produced:
this is the final statement with multiline string to have a neater code.
Is this an elegant solution, or did I miss something on the internet?
You could use regex to simply remove all white space after a newline. Additionally, wrapping it into a function leads to less repetitive code, so let's do that.
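import re
def single_line(string):
    # Replace each newline plus the indentation after it with one space,
    # then trim the whitespace left over at the start and end
    return re.sub(r"\n\s+", " ", string).strip()
final_statement = single_line('''
    this is the final statements
    with multiline string to have
    a neater code.''')
print(final_statement)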
Alternatively, if you wish to avoid this particular problem altogether (and don't mind the development overhead), you could store the messages in a file, such as a JSON file, so you can quickly edit them while keeping your code clean.
Thanks to Neil's suggestion, I have come up with a more elegant solution: a function that removes the double spaces.
def single_line(string):
    # Drop the indentation (pairs of spaces), then turn the remaining
    # line breaks into single spaces; assumes even-width indentation
    return string.replace('  ', '').replace('\n', ' ').strip()
final_statement = '''\
    this is a much neater
    final statement
    to present my code
'''
print(single_line(final_statement))
Building on Neil's solution, I have cut out the regex import. That's one line less of code!
Also, making it a function improves readability, as the whole print statement reads like English: "Print single line final statement".
Any better ideas?
The issue with both Neil’s and Wong Siwei’s answers is they don’t work if your multiline string contains lines more indented than others:
my_string = """\
  this is my
  string and
    it has various
  indentation
  levels"""
What you want in the case above is to remove the two-spaces indentation, not every space at the beginning of a line.
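The solution below should work for any space-indented string:
import re
def dedent(s):
    indent_level = None
    # re.MULTILINE makes ^ match at the start of every line; requiring a
    # non-space character skips blank lines and counts unindented lines as 0
    for m in re.finditer(r"^( *)\S", s, re.MULTILINE):
        line_indent_level = len(m.group(1))
        if indent_level is None or indent_level > line_indent_level:
            indent_level = line_indent_level
    if not indent_level:
        return s
    # Remove exactly indent_level spaces from the start of each line,
    # keeping the line breaks
    return re.sub(r"^ {%d}" % indent_level, "", s, flags=re.MULTILINE)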
It first scans the whole string to find the lowest indentation level then uses that information to dedent all lines of it.
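For comparison, the standard library's textwrap.dedent (already imported in the question) performs the same "remove the common leading whitespace" operation. A quick check, assuming the example string above with a two-space base indentation, written here with explicit \n escapes so the indentation is unambiguous:

```python
import textwrap

my_string = (
    "  this is my\n"
    "  string and\n"
    "    it has various\n"
    "  indentation\n"
    "  levels"
)

# textwrap.dedent strips only the whitespace prefix shared by all non-blank
# lines, so the deeper-indented line keeps its extra two spaces
print(textwrap.dedent(my_string))
```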
If you only care about making your code easier to read, you may instead use C-like implicit concatenation of adjacent string literals:
my_string = (
    "this is my string"
    " and I write it on"
    " multiple lines"
)
print(repr(my_string))
# => 'this is my string and I write it on multiple lines'
You may also want to make it explicit with +s:
my_string = "this is my string" + \
    " and I write it on" + \
    " multiple lines"
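If the goal is only the single log line from the question, another option (not from the thread) is to split on any whitespace and re-join the words with single spaces. This sidesteps indentation widths entirely, at the cost of also collapsing any intentional runs of spaces. single_line here is a hypothetical helper reusing the thread's name:

```python
import logging

def single_line(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no argument splits on any whitespace and drops empty pieces
    return " ".join(text.split())

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

final_statement = single_line("""
    this is the final statement
    with multiline string to have
    a neater code.
""")
logger.info(final_statement)
```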

Gitlab CI: Set dynamic variables

For a gitlab CI I'm defining some variables like this:
variables:
  PROD: project_package
  STAGE: project_package_stage
  PACKAGE_PATH: /opt/project/build/package
  BUILD_PATH: /opt/project/build/package/bundle
  CONTAINER_IMAGE: registry.example.com/project/package:e2e
I would like to set those variables a bit more dynamically, as there are mainly only two parts: project and package. Everything else depends on those values, which means I would only have to change two values to get all the other variables.
So I would expect something like
variables:
  PROJECT: project
  PACKAGE: package
  PROD: $PROJECT_$PACKAGE
  STAGE: $PROD_stage
  PACKAGE_PATH: /opt/$PROJECT/build/$PACKAGE
  BUILD_PATH: /opt/$PROJECT/build/$PACKAGE/bundle
  CONTAINER_IMAGE: registry.example.com/$PROJECT/$PACKAGE:e2e
But it looks like this is the wrong way to do it...
I don't know where your expectation comes from, but it is trivial to check that in YAML there is no special meaning for $, _ or /, nor for : when it is not followed by a space. There might be in GitLab, but I strongly doubt that there is in the way you expect.
To formalize your expectation: you assume that any key (from the same mapping) preceded by a $ and terminated by the end of the scalar, by _, or by / is going to be "expanded" to that key's value. The _ has to be such a terminator, otherwise $PROJECT_$PACKAGE would not expand correctly.
Now consider adding a key-value pair:
BREAKING_TEST: $PACKAGE_PATH
is this supposed to expand to:
BREAKING_TEST: /opt/project/build/package
or follow the rule you implied, that _ is a terminator, and just expand to:
BREAKING_TEST: package_PATH
To prevent this kind of ambiguity, programs like bash use quoting around variable names to be expanded ("$PROJECT"_PATH vs. $PROJECT_PATH), but the saner, more modern solution is to use clamping begin and end characters (e.g. { and }, or $% and %) with some special rule for using the clamping characters as normal text.
So this is not going to work the way you indicated; you are indeed doing something wrong.
It is not too hard to pre-process a YAML file, and it can be done with e.g. Python (but watch out: { has special meaning in YAML), possibly with the help of jinja2: load the variables, and then expand the original text using the variables until no more replacements can be made.
But it all starts with choosing the delimiters intelligently. Also keep in mind that although your "variables" appear ordered in the YAML text, there is no such guarantee once they are constructed as a dict/hash/mapping in your program.
You could e.g. use << and >>:
variables:
  PROJECT: project
  PACKAGE: package
  PROD: <<PROJECT>>_<<PACKAGE>>
  STAGE: <<PROD>>_stage
  PACKAGE_PATH: /opt/<<PROJECT>>/build/<<PACKAGE>>
  BUILD_PATH: /opt/<<PROJECT>>/build/<<PACKAGE>>/bundle
  CONTAINER_IMAGE: registry.example.com/<<PROJECT>>/<<PACKAGE>>:e2e
which, with the following program (which doesn't deal with escaping << to keep its normal meaning), generates your original, expanded YAML exactly.
import sys
from ruamel.yaml import YAML
def expand(s, d):
    max_recursion = 100
    while '<<' in s:
        res = ''
        max_recursion -= 1
        if max_recursion < 0:
            raise NotImplementedError('max recursion exceeded')
        for idx, chunk in enumerate(s.split('<<')):
            if idx == 0:
                res += chunk  # first chunk is before <<, just append
                continue
            try:
                var, rest = chunk.split('>>', 1)
            except ValueError:
                raise NotImplementedError('delimiters have to balance "{}"'.format(chunk))
            if var not in d:
                res += '<<' + chunk  # unknown name: keep the text as-is
            else:
                res += str(d[var]) + rest  # str() in case a value loads as a number
        s = res
    return s
with open('template.yaml') as fp:
    yaml_str = fp.read()
# The old module-level safe_load/round_trip_load/round_trip_dump functions are
# deprecated (and dropped in recent ruamel.yaml releases); use YAML() objects
variables = YAML(typ='safe').load(yaml_str)['variables']
yaml = YAML()  # round-trip mode by default, 2-space mapping indent
data = yaml.load(expand(yaml_str, variables))
yaml.dump(data, sys.stdout)
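As a stdlib-only variation (a swapped-in technique, not the answer's split-based loop), the same <<NAME>> expansion can be written with re.sub and a callback, repeating until the text stops changing so that variables may refer to other variables. expand_re is a hypothetical name, and the \w+ pattern assumes variable names made of letters, digits and underscores:

```python
import re

# Hypothetical placeholder syntax: <<NAME>>, NAME = letters, digits, underscores
PLACEHOLDER = re.compile(r"<<(\w+)>>")

def expand_re(text, variables, max_passes=100):
    """Replace <<NAME>> with variables[NAME], repeating until nothing changes."""
    for _ in range(max_passes):
        # Unknown names are left untouched: the callback returns the whole match
        new_text = PLACEHOLDER.sub(
            lambda m: str(variables.get(m.group(1), m.group(0))), text
        )
        if new_text == text:
            return new_text
        text = new_text
    raise RuntimeError("placeholders did not settle; is a variable defined via itself?")

variables = {
    "PROJECT": "project",
    "PACKAGE": "package",
    "PROD": "<<PROJECT>>_<<PACKAGE>>",
}
template = "STAGE: <<PROD>>_stage\nPACKAGE_PATH: /opt/<<PROJECT>>/build/<<PACKAGE>>\n"
print(expand_re(template, variables))
```

The pass limit plays the same role as max_recursion in the answer: a cycle such as A referring to B and B referring to A never settles, so it raises instead of looping forever.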

Python trouble debugging I/O, how do I get the correct format?

I am attempting to turn a dictionary into a formatted string and then write it to a file, but my formatting seems to be entirely incorrect. I'm not sure how to debug it, since all my test cases are given different files. I was able to use Python's interactive mode to find out what my function is actually writing to the file, and man, is it wrong! Can you help me format it correctly?
Given a dictionary, I turn it into a string sorted by key. I need the function to produce it like so:
Dictionary is : {'orange':[1,3],'apple':[2]}
"apple:\t2\norange:\t1,\t3\n"
format is: Every key-value pair of the dictionary
should be output as: a string that starts with key, followed by ":", a tab, then the integers from the
value list. Every integer should be followed by a "," and a tab except for the very last one, which should be followed by a newline
Here is my function that I thought would work:
def format_item(key,value):
    return key+ ":\t"+",\t".join(str(x) for x in value)
def format_dict(d):
    return sorted(format_item(key,value) for key, value in d.items())
def store(d,filename):
    with open(filename, 'w') as f:
        f.write("\n".join(format_dict(d)))
        f.close()
    return None
I now have too many tabs on the last line. How do I edit the last line only out of the for loop?
ex input:
d = {'orange':[1,3],'apple':[2]}
my function gives: ['apple:\t2', 'orange:\t1,\t3']
but should give: "apple:\t2\norange:\t1,\t3\n"
Adding the newline character to the end of the return statement in format_item seems to yield the correct lines:
return key+ ":\t"+",\t".join(str(x) for x in value) + '\n'
In [10]: format_dict(d)
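Out[10]: ['apple:\t2\n', 'orange:\t1,\t3\n']
Since every item now ends with \n, store should join them with "" instead of "\n" (f.write("".join(format_dict(d)))); otherwise the file gets a blank line between entries.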
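Putting the pieces together, a minimal sketch of the whole pipeline, assuming the lines should be sorted by key and that the file should contain exactly the string from the example:

```python
def format_item(key, values):
    # "key:\t" followed by the numbers joined with ",\t", ending in a newline
    return key + ":\t" + ",\t".join(str(x) for x in values) + "\n"

def format_dict(d):
    # sorted() on the items orders the lines by key; each line already ends in "\n"
    return "".join(format_item(key, values) for key, values in sorted(d.items()))

def store(d, filename):
    # the with-block closes the file, so no explicit f.close() is needed
    with open(filename, "w") as f:
        f.write(format_dict(d))

print(repr(format_dict({'orange': [1, 3], 'apple': [2]})))
```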

Unable to remove string from text I am extracting from html

I am trying to extract the main article from a web page. I can accomplish the main text extraction using Python's readability module. However, the text I get back often contains several &#13 strings (there is a ; at the end of this string, but this editor won't allow the full string to be entered (strange!)). I have tried using Python's replace function, the regular expression module's substitution function, and the unicode encode and decode functions. None of these approaches have worked. With the replace and regular expression approaches I just get back my original text with the &#13 strings still present, and with the unicode encode/decode approach I get back the error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 2099: ordinal not in range(128)
Here is the code I am using that takes the initial URL and uses readability to extract the main article. I have left in all my commented-out code corresponding to the different approaches I have tried to remove the &#13; string. It appears as though &#13 is interpreted to be u'\xa9'.
import urllib
from readability.readability import Document
def find_main_article_text_2():
    #url = 'http://finance.yahoo.com/news/questcor-pharmaceuticals-closes-transaction-acquire-130000695.html'
    url = "http://us.rd.yahoo.com/finance/industry/news/latestnews/*http://us.rd.yahoo.com/finance/external/cbsm/SIG=11iiumket/*http://www.marketwatch.com/News/Story/Story.aspx?guid=4D9D3170-CE63-4570-B95B-9B16ABD0391C&siteid=yhoof2"
    html = urllib.urlopen(url).read()
    readable_article = Document(html).summary()
    readable_title = Document(html).short_title()
    #readable_article.replace("u'\xa9'"," ")
    #print re.sub("&#13;",'',readable_article)
    #unicodedata.normalize('NFKD', readable_article).encode('ascii','ignore')
    print readable_article
    #print readable_article.decode('latin9').encode('utf8'),
    print "There are " ,readable_article.count("&#13;"),"&#13;'s"
    #print readable_article.encode( sys.stdout.encoding , '' )
    #sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    #sents = sent_tokenizer.tokenize(readable_article)
    #new_sents = []
    #for sent in sents:
    #    unicode_sent = sent.decode('utf-8')
    #    s1 = unicode_sent.encode('ascii', 'ignore')
    #    #s2 = s1.replace("\n","")
    #    new_sents.append(s1)
    #print new_sents
    # u'\xa9'
I have a URL that I have been testing the code with included inside the def. If anybody has any ideas on how to remove this &#13 I would appreciate the help. Thanks, George
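No answer survives in this thread, so the following is only a hedged sketch, written for Python 3 even though the question's code is Python 2. Two observations: &#13; is the HTML numeric character reference for U+000D (carriage return), while u'\xa9' is the copyright sign ©, so the UnicodeEncodeError most likely comes from encoding © to ASCII rather than from &#13; itself. Also, str.replace and re.sub return a new string instead of modifying the original, so their result has to be assigned; the first commented-out replace call in the question discards it. The helper name and pattern below are illustrative:

```python
import re

# Decimal (&#13;) and hexadecimal (&#xD;) references to U+000D, with optional leading zeros
CR_REFERENCE = re.compile(r"&#(?:0*13|[xX]0*[dD]);")

def strip_carriage_returns(text):
    # re.sub returns a new string; keep the result
    text = CR_REFERENCE.sub("", text)
    # If the markup was already decoded, the character appears as a literal '\r'
    return text.replace("\r", "")

article = "First paragraph.&#13;\nSecond paragraph.\r\n"
cleaned = strip_carriage_returns(article)
print(repr(cleaned))
```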
