I use the CommentedMap class from the ruamel.yaml library to store ordered dictionaries. I need to store the contents of one CommentedMap as a string value inside another mapping, but when I wrap it in DoubleQuotedScalarString, the output contains unwanted text such as ordereddict.
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap  # CommentedMap avoids the "!omap" tag that appears when dumping ordereddict data
from ruamel.yaml.scalarstring import SingleQuotedScalarString,DoubleQuotedScalarString
from pathlib import Path
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.indent(mapping=4, sequence=6, offset=4)
file_yml = CommentedMap()
test = CommentedMap()
test['test1'] = "test1"
test['test2'] = "test2"
file_yml["test"] = DoubleQuotedScalarString(test)
path = Path("./test.yaml")
yaml.dump(file_yml, path)
The result is as follows:
test: "ordereddict([('test1', 'test1'), ('test2', 'test2')])"
The result I am looking for is test: "{'test1': 'test1', 'test2': 'test2'}"
I would appreciate it if you could tell me how to achieve this.
You shouldn't apply DoubleQuotedScalarString to a CommentedMap. The only thing the former is useful for is making sure that individual strings, which may be part of a mapping or sequence, get double quotes. By applying it to the CommentedMap, you convert the whole thing into a string, and that CommentedMap is an ordereddict.
You should probably just do:
test = dict()
and then later on:
file_yml["test"] = str(test)
On modern versions of Python (3.7 and later) a dict preserves key insertion order, and the quotes should be added automatically because a plain scalar cannot start with {.
If test needs to be a CommentedMap before dumping it as a string, then cast it to a dict:
test = CommentedMap()
.....
file_yml["test"] = str(dict(test))
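As a rough illustration of why the cast matters, the sketch below uses collections.OrderedDict as a stand-in for CommentedMap (an assumption, made so the example runs without ruamel.yaml installed). Converting to a plain dict first gives the brace-style string the question asks for.

```python
from collections import OrderedDict

# OrderedDict stands in for ruamel.yaml's CommentedMap here (assumption:
# both are ordered mappings whose str() is not the plain-dict form).
test = OrderedDict()
test["test1"] = "test1"
test["test2"] = "test2"

# Casting to dict first yields the familiar {...} representation,
# with keys in insertion order.
as_text = str(dict(test))
print(as_text)  # {'test1': 'test1', 'test2': 'test2'}
```

Assigning a string like as_text to a key and dumping it should then produce a single quoted scalar rather than a nested mapping.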
My goal is to create several lists out of the contents of several files. In the past, I have used '{}'.format(x) inside loops to change the paths so that they match whichever item the loop is working on. Now I want to extend that to appending to lists defined outside the loop. Here is the code I am using currently.
import csv
import os
c3List = []
c4List = []
camList = []
plantList = ('c3', 'c4', 'cam')
for p in plantList:
    plantFolder = 'folder path'  # placeholder for the real folder path
    plantCsv = '{}List.csv'.format(p)
    plantPath = os.path.join(plantFolder, plantCsv)
    with open(plantPath) as plantParse:
        reader = csv.reader(plantParse)
        data = list(reader)
        '{}List'.format(p).append(data)
But this gives me AttributeError: 'str' object has no attribute 'append'
If I try to make a variable like this:
pList = '{}List'.format(p)
pList.append(data)
I get the same error. Any advice would be appreciated. I am using Python 3.
Because list objects are mutable, you can create a dict that references all of your lists.
For example with this:
myList = []
myDict = {"a": myList}
myDict["a"].append("appended_by_reference")
myList.append("appended_directly")
print(myList)
you will get ['appended_by_reference', 'appended_directly'] printed.
If you want to learn more about mutability and immutability in python see link.
So my own implementation to achieve your goal would be:
import csv
from pathlib import Path
c3List = []
c4List = []
camList = []
plantList = {'c3': c3List, 'c4': c4List, 'cam': camList}
plantFolder = 'folder path'  # placeholder for the real folder path
for p in plantList:
    plantCsv = f'{p}List.csv'
    plantPath = Path(plantFolder, plantCsv)
    with open(plantPath) as plantParse:
        reader = csv.reader(plantParse)
        data = list(reader)
        plantList[p].append(data)
Note: I used an f-string to format the string and pathlib to define file paths.
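A variation on the same idea is to let the dict own the lists, so that no separate c3List, c4List, or camList variables are needed. This is only a sketch: load_plant_lists is a hypothetical helper, and the temporary CSV files exist purely so the example can run on its own.

```python
import csv
import tempfile
from pathlib import Path


def load_plant_lists(folder, names):
    """Return {name: rows} for each '<name>List.csv' file in folder."""
    lists = {}
    for name in names:
        path = Path(folder) / f"{name}List.csv"
        with open(path, newline="") as handle:
            lists[name] = list(csv.reader(handle))
    return lists


# Demo with throwaway files standing in for the real plant CSVs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("c3", "c4", "cam"):
        Path(tmp, f"{name}List.csv").write_text(f"{name},1\n{name},2\n")
    plant_lists = load_plant_lists(tmp, ("c3", "c4", "cam"))

print(plant_lists["c4"])  # [['c4', '1'], ['c4', '2']]
```

Each list is then reached by key, for example plant_lists["cam"], instead of by a variable name built from a string.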
I am new to Python coding. I am able to create the output XML file. I want to use a variable which holds a string value and pass it to 'predicate' of 'find()'. Is this achievable? How to make this work?
I am using LXML package with Python 3.6. Below is my code. Area of problem is commented at the end of the code.
import lxml.etree as ET
# Create root element
root = ET.Element("Base", attrib={'Name': 'My Base Node'})
# Create first child element
FirstElement = ET.SubElement(root, "FirstNode", attrib={'Name': 'My First Node', 'Comment':'Hello'})
# Create second child element
SecondElement = ET.SubElement(FirstElement, "SecondNode", attrib={'Name': 'My Second Node', 'Comment': 'World'})
# Create XML file
XML_data_as_string = ET.tostring(root, encoding='utf8')
with open("TestFile.xml", "wb") as f:
    f.write(XML_data_as_string)
# Variable to substitute in second portion of predicate
NewValue = "My Second Node"
# #### AREA OF PROBLEM ###
# Question. How to pass variable 'NewValue' in the predicate?
# Gives "SyntaxError: invalid predicate"
x = root.find("./FirstNode/SecondNode[@Name={subs}]".format(subs=NewValue))
# I commented out the line above and re-ran the code with the line below
# enabled. It gave "ValueError: empty namespace prefix must be passed as None,
# not the empty string"
x = root.find("./FirstNode/SecondNode[@Name=%s]", NewValue)
As Daniel Haley said, you're missing single quotes around the value in @Name={subs}.
The following line works for me:
x = root.find("./FirstNode/SecondNode[@Name='{subs}']".format(subs=NewValue))
Since you use Python 3.6, you can utilize f-strings:
x = root.find(f"./FirstNode/SecondNode[@Name='{NewValue}']")
The "proper" way to solve this would be to use XPath variables, which are not supported by find() (and consequently, aren't supported by xml.etree from the standard library either) but are supported by xpath().
NewValue = "AJL's Second Node" # Uh oh, that apostrophe is going to break something!!
x_list = root.xpath("./FirstNode/SecondNode[@Name=$subs]", subs=NewValue)
x = x_list[0]
This avoids any sort of issue you might otherwise run into with quoting and escaping.
The main caveat of this method is namespace support, since xpath() doesn't use the {namespace}tag brace syntax that find() accepts:
x = root.find("./{foobar.xsd}FirstNode")
# Braces are doubled to avoid conflicting with `.format()`
x = root.find("./{{foobar.xsd}}FirstNode/SecondNode[@Name='{subs}']".format(subs=NewValue))
Instead, you must specify namespaces in a separate dict:
ns_list = {'hello': 'foobar.xsd'}
x_list = root.xpath("./hello:FirstNode/SecondNode[@Name=$subs]", namespaces=ns_list, subs=NewValue)
x = x_list[0]
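If only the standard library's xml.etree.ElementTree is available, one way to sidestep quoting entirely is to leave the value out of the path and compare attributes in Python. This is a sketch of that alternative, not the XPath-variable approach shown above; find_by_attribute is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Rebuild a small tree like the one in the question.
root = ET.Element("Base", attrib={"Name": "My Base Node"})
first = ET.SubElement(root, "FirstNode", attrib={"Name": "My First Node"})
ET.SubElement(first, "SecondNode", attrib={"Name": "AJL's Second Node"})


def find_by_attribute(parent, path, attr, value):
    """Return the first element under path whose attr equals value, else None."""
    for element in parent.iterfind(path):
        if element.get(attr) == value:
            return element
    return None


# The apostrophe is harmless because the value never enters the path string.
new_value = "AJL's Second Node"
x = find_by_attribute(root, "./FirstNode/SecondNode", "Name", new_value)
print(x.get("Name"))
```

The comparison happens on the decoded attribute value, so no escaping rules apply to it.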
I have extracted text from an HTML file, and have the whole thing in a string.
I am looking for a way to loop through the string, extract only the values that are within square brackets, and put them in a list.
I have looked into several questions, among them this one: Extract character before and after "/"
But I am having a hard time modifying it. Can someone help?
Solved!
Thank you for all your input, I will definitely look more into regex. I managed to do what I wanted in a pretty manual way (may not be beautiful):
# remove all html code and append to string
for i in html_file:
    html_string += str(html2text.html2text(i))
# set this boolean if current character is either [ or ]
add = False
# extract only values within [ or ], based on add = T/F
for i in html_string:
    if i == '[':
        add = True
    if i == ']':
        add = False
        clean_string += str(i)
    if add == True:
        clean_string += str(i)
# split string into list without square brackets
clean_string_list = clean_string.split('][')
The HTML file that I want as plain text (and later as a dataframe) is my personal Facebook data, which I have downloaded.
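The same character-by-character idea can be written so that each bracketed value goes straight into a list, which avoids the leftover brackets that splitting on '][' can leave on the first and last items. A minimal sketch, assuming brackets are not nested; extract_bracketed_manual is a hypothetical name.

```python
def extract_bracketed_manual(text):
    """Collect the text found between each '[' and the next ']'."""
    results = []
    current = None  # None means "not inside brackets"
    for ch in text:
        if ch == "[":
            current = []  # start (or restart) collecting
        elif ch == "]":
            if current is not None:
                results.append("".join(current))
                current = None
        elif current is not None:
            current.append(ch)
    return results


print(extract_bracketed_manual("x[a b]y[c]"))  # ['a b', 'c']
```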
Try out this regex. Given a string, it will place all text inside [ ] into a list.
import re
print(re.findall(r'\[(\w+)\]','spam[eggs][hello]'))
>>> ['eggs', 'hello']
Also this is a great reference for building your own regex.
https://regex101.com
EDIT: If you have nested square brackets here is a function that will handle that case.
import re
test ='spam[eg[nested]gs][hello]'
def square_bracket_text(test_text, found):
    """Find text enclosed in square brackets within a string"""
    matches = re.findall(r'\[(\w+)\]', test_text)
    if matches:
        found.extend(matches)
        for word in found:
            test_text = test_text.replace('[' + word + ']', '')
        square_bracket_text(test_text, found)
    return found
match = []
print(square_bracket_text(test,match))
>>>['nested', 'hello', 'eggs']
hope it helps!
You can also use re.finditer() for this; see the example below.
Let's suppose we have word characters inside the brackets, so the regular expression will be \[\w+\].
If you wish, check it at https://rextester.com/XEMOU85362.
import re
s = "<h1>Hello [Programmer], you are [Excellent]</h1>"
g = re.finditer(r"\[\w+\]", s)  # raw string so the backslashes reach the regex engine unchanged
l = list() # or, l = []
for m in g:
    text = m.group(0)
    l.append(text[1: -1])
print(l) # ['Programmer', 'Excellent']
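Note that \w+ only matches letters, digits, and underscores, so bracketed text containing spaces or punctuation is skipped by the patterns above. One possible adjustment, shown as a sketch, is a negated character class that accepts anything except brackets; extract_bracketed is a hypothetical name.

```python
import re

# Match '[', then one or more characters that are not brackets, then ']'.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def extract_bracketed(text):
    """Return the contents of every innermost [...] pair in text."""
    return BRACKETED.findall(text)


print(extract_bracketed("spam[eggs and ham][hello-world] plain [42]"))
# ['eggs and ham', 'hello-world', '42']
```

With nested brackets this pattern returns only the innermost contents, so the recursive approach is still needed if the outer levels matter.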
My application lets the user export its results. It exports text files named Exp_Text_1, Exp_Text_2, etc. If a file with the same name already exists on the Desktop, I want the numbering to continue upwards from that number. For example, if a file named Exp_Text_3 is already on the Desktop, then I want the new file to be named Exp_Text_4.
This is my code:
if len(str(self.Output_Box.get("1.0", "end"))) == 1:
    self.User_Line_Text.set("Nothing to export!")
else:
    import os.path
    self.txt_file_num = self.txt_file_num + 1
    file_name = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt" + "_" + str(self.txt_file_num) + ".txt")
    file = open(file_name, "a")
    file.write(self.Output_Box.get("1.0", "end"))
    file.close()
    self.User_Line_Text.set("A text file has been exported to Desktop!")
You likely want os.path.exists:
>>> import os
>>> help(os.path.exists)
Help on function exists in module genericpath:
exists(path)
Test whether a path exists. Returns False for broken symbolic links
A very basic example would be to create a file name with a formatting mark, so the number can be inserted for each check:
import os
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
# the "{}" is a formatting mark so we can do name_to_format.format(num)
num = 1
while os.path.exists(name_to_format.format(num)):
    num += 1
new_file_name = name_to_format.format(num)
this would check each filename starting with Exp_Txt_1.txt then Exp_Txt_2.txt etc. until it finds one that does not exist.
However the format mark may cause a problem if curly brackets {} are part of the rest of the path, so it may be preferable to do something like this:
import os
def get_file_name(num):
    return os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_" + str(num) + ".txt")
num = 1
while os.path.exists(get_file_name(num)):
    num += 1
new_file_name = get_file_name(num)
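The same search can be wrapped in a function that takes the directory as a parameter, which makes it easy to try out without touching the real Desktop. A minimal sketch; next_export_path and its parameters are hypothetical names, and the temporary directory is only there for the demonstration.

```python
import os
import tempfile


def next_export_path(directory, prefix="Exp_Txt_", suffix=".txt", start=1):
    """Return the first '<prefix><n><suffix>' path in directory that does not exist."""
    num = start
    while True:
        candidate = os.path.join(directory, f"{prefix}{num}{suffix}")
        if not os.path.exists(candidate):
            return candidate
        num += 1


# Demo: pretend files 1 to 3 were already exported.
with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2, 3):
        open(os.path.join(tmp, f"Exp_Txt_{n}.txt"), "w").close()
    chosen = os.path.basename(next_export_path(tmp))

print(chosen)  # Exp_Txt_4.txt
```

In the application, directory would be os.path.join(os.path.expanduser("~"), "Desktop").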
EDIT: Why don't we need the get_file_name function in the first example?
First off, if you are unfamiliar with str.format, you may want to look at the Python docs on common string operations and/or this simple example:
text = "Hello {}, my name is {}."
x = text.format("Kotropoulos","Tadhg")
print(x)
print(text)
The path string is figured out with this line:
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
But it has {} in place of the desired number (since we don't know what the number should be at this point). So if the path was, for example:
name_to_format = "/Users/Tadhg/Desktop/Exp_Txt_{}.txt"
then we can insert a number with:
print(name_to_format.format(1))
print(name_to_format.format(2))
This does not change name_to_format: str objects are immutable, so .format returns a new string rather than modifying name_to_format. However, we would run into a problem if our path was something like one of these:
name_to_format = "/Users/Bob{Cat}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Bobcat{}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Smiley{:/Desktop/Exp_Txt_{}.txt"
Since the formatting mark we want to use is no longer the only pair of curly brackets, we can get a variety of errors:
KeyError: 'Cat'
IndexError: tuple index out of range
ValueError: unmatched '{' in format spec
So you only want to rely on str.format when you know it is safe to use. Hope this helps, have fun coding!
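If a template is still preferred, one option is to double any braces in the directory part, because str.format treats {{ and }} as literal characters. This is a sketch only; make_template is a hypothetical helper and the path is made up for the demonstration.

```python
import os


def make_template(directory, pattern="Exp_Txt_{}.txt"):
    """Build a format template whose directory part is brace-safe."""
    # Doubling braces tells str.format to treat them as literal characters.
    safe_directory = directory.replace("{", "{{").replace("}", "}}")
    return os.path.join(safe_directory, pattern)


template = make_template(os.path.join("Users", "Bob{Cat}", "Desktop"))
first_name = template.format(1)
print(first_name)  # the directory keeps its literal {Cat}
```

Only the single {} left in the file name pattern is then seen by .format as a replacement field.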
I have a set of links that looks like the following:
links = ['http://www.website.com/category/subcategory/1',
'http://www.website.com/category/subcategory/2',
'http://www.website.com/category/subcategory/3',...]
I want to extract the 1, 2, 3, and so on from this list, and store the extracted data in subcategory_explicit. They're stored as str, and I'm having trouble getting at them with the following code:
subcategory_explicit = [cat.get('subcategory') for cat in links if cat.get('subcategory') is not None]
Do I have to change my data type from str to something else? What would be a better way to obtain and store the extracted values?
subcategory_explicit = [i[i.find('subcategory'):] for i in links if 'subcategory' in i]
This uses a substring via slicing, starting at the "s" in "subcategory" until the end of the string. By adding len('subcategory') to the value from find, you can exclude "subcategory" and get "/#" (where # is whatever number).
Try this (using re module):
import re
links = [
'http://www.website.com/category/subcategory/1',
'http://www.website.com/category/subcategory/2',
'http://www.website.com/category/subcategory/3']
d = "|".join(links)
# 'http://www.website.com/category/subcategory/1|http://www.website.com/category/subcategory/2|http://www.website.com/category/subcategory/3'
pattern = re.compile(r"/category/(?P<category_name>\w+)/\d+", re.I)  # raw string keeps \w and \d intact
subcategory_explicit = pattern.findall(d)
print(subcategory_explicit)
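To pull out the trailing numbers themselves (the 1, 2, 3 the question asks for), a capture group can be placed around the digits instead. A minimal sketch; SUBCATEGORY_ID and extract_subcategory_ids are hypothetical names.

```python
import re

# Capture the digits that follow '/subcategory/' at the end of the URL.
SUBCATEGORY_ID = re.compile(r"/subcategory/(\d+)/?$")


def extract_subcategory_ids(links):
    """Return the trailing subcategory number of each matching link, as strings."""
    ids = []
    for link in links:
        match = SUBCATEGORY_ID.search(link)
        if match:
            ids.append(match.group(1))
    return ids


links = [
    'http://www.website.com/category/subcategory/1',
    'http://www.website.com/category/subcategory/2',
    'http://www.website.com/category/subcategory/3',
]
subcategory_explicit = extract_subcategory_ids(links)
print(subcategory_explicit)  # ['1', '2', '3']
```

The links can stay as str; wrap match.group(1) in int() if numeric values are needed.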