Python3 migration xml write issue - python-3.x

Currently frustrated with code migration to Python3 (3.6.8)
out_fname is a .cproject file (xml format)
self.cproject_xml = ET.parse(self.CPROJ_NAME)
with open(out_fname, 'a') as cxml:
    cxml.write('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n')
    cxml.write('<?fileVersion 4.0.0?>')
    self.cproject_xml.write(cxml, encoding='utf-8')
leads to:
File "/home/build/workspace/bismuth_build_nightly_py3#2/venv/lib/python3.6/site-packages/tinlane/cprojecttools.py", line 209, in export_cproject
self.cproject_xml.write(fxml)
snips..
File "/usr/lib64/python3.6/xml/etree/ElementTree.py", line 946, in _serialize_xml
write(_escape_cdata(elem.tail))
TypeError: write() argument must be str, not bytes
I have tried many different ways to make it work (note that I need the "a" mode when opening my file). The code above is the original Python 2 code, not one of the alternates. Usually, adding a "b" to the r, a, or w mode solves this kind of problem. Here it doesn't work:
cxml.write('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n')
TypeError: a bytes-like object is required, not 'str'
It fails even when I convert the strings to bytes (which seems wrong to me anyway).
Minimal Example to reproduce:
create 2 identical files (file1, file2) with the following content:
<note>
<to>minimal</to>
<from>xml</from>
<heading>file</heading>
<body>content</body>
</note>
and run this codeblock:
import xml.etree.ElementTree as ET
cproject_xml = ET.parse('file1')
fname = 'file2'
with open(fname, 'a') as cxml:
    cxml.write('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n')
    cxml.write('<?fileVersion 4.0.0?>')
    cproject_xml.write(cxml, encoding='utf-8')
When run with python2, file2 becomes:
<note>
<to>minimal</to>
<from>xml</from>
<heading>file</heading>
<body>content</body>
</note>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?fileVersion 4.0.0?><note>
<to>minimal</to>
<from>xml</from>
<heading>file</heading>
<body>content</body>
</note>
Any ideas?
Thanks

I may be missing something, but it doesn't make sense to try to write the tree (cproject_xml) to the open file handle (cxml).
I think it would make more sense to serialize the tree to a string and write that string directly to the open file.
Try changing:
cproject_xml.write(cxml, encoding='utf-8')
to:
cxml.write(ET.tostring(cproject_xml.getroot()).decode())
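If the tree should still go through ElementTree.write, another option that may work is encoding='unicode', which the standard library documents as producing str output and which therefore suits a file opened in text mode. A minimal sketch, assuming a shortened copy of the sample note and a temporary path standing in for file2 (the helper name append_tree is made up for illustration):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's sample note (assumption).
SAMPLE = "<note><to>minimal</to><from>xml</from></note>"


def append_tree(tree, out_fname):
    """Append two header lines and the serialized tree to a text-mode file."""
    with open(out_fname, "a", encoding="utf-8") as cxml:
        cxml.write('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n')
        cxml.write('<?fileVersion 4.0.0?>')
        # encoding="unicode" makes ElementTree emit str instead of bytes,
        # which matches the text-mode handle opened above.
        tree.write(cxml, encoding="unicode")


# Temporary file standing in for file2 (assumption).
out_fname = os.path.join(tempfile.mkdtemp(), "file2")
tree = ET.ElementTree(ET.fromstring(SAMPLE))
append_tree(tree, out_fname)

with open(out_fname, encoding="utf-8") as fh:
    result = fh.read()
print(result)
```

The file is still opened with "a", so existing content is kept and the new document is appended after it, as in the Python 2 behaviour described in the question.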

Related

python xml write with using namespace

Are these two roots the same?
The order of the attributes in the root element changed in the output XML.
If they are different from each other, how can I fix it?
#Python 3.7
import xml.etree.ElementTree as ET
ET.register_namespace('xsi', "http://www.w3.org/2001/test")
ET.register_namespace('', "http://www.test.com/test/test/test")
tree = ET.parse('test.xml')
tree.write("test1.xml", encoding='utf-8', xml_declaration=True)
#input XML root
<root xmlns:xsi="http://www.w3.org/2001/test" schemaVersion="2.8" xmlns="http://www.test.com/test/test/test" labelVersion="1" xsi:schemaLocation="http://www.test.com/test/test/test ..\Schema\CLIFSchema.xsd" name="test.xml">
#output XML root
<root xmlns="http://www.test.com/test/test/test" xmlns:xsi="http://www.w3.org/2001/test" labelVersion="1" name="test.xml" schemaVersion="2.8" xsi:schemaLocation="http://www.test.com/test/test/test ..\Schema\CLIFSchema.xsd">
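One way to check this is to parse both root elements and compare them. The XML specification treats attribute order as not significant, and ElementTree stores attributes in a dict, so the comparison below ignores order. A minimal sketch, assuming the two root tags from the question are made self-closing so that each parses on its own:

```python
import xml.etree.ElementTree as ET

# Root tags copied from the question, made self-closing (assumption)
# so that each one is a complete document.
INPUT_ROOT = r'''<root xmlns:xsi="http://www.w3.org/2001/test" schemaVersion="2.8" xmlns="http://www.test.com/test/test/test" labelVersion="1" xsi:schemaLocation="http://www.test.com/test/test/test ..\Schema\CLIFSchema.xsd" name="test.xml"/>'''
OUTPUT_ROOT = r'''<root xmlns="http://www.test.com/test/test/test" xmlns:xsi="http://www.w3.org/2001/test" labelVersion="1" name="test.xml" schemaVersion="2.8" xsi:schemaLocation="http://www.test.com/test/test/test ..\Schema\CLIFSchema.xsd"/>'''

a = ET.fromstring(INPUT_ROOT)
b = ET.fromstring(OUTPUT_ROOT)

# Tags are namespace-qualified; attrib is a dict, so order does not matter.
same = (a.tag == b.tag) and (a.attrib == b.attrib)
print(same)
```

If the original attribute order has to survive in the file for readability, the ElementTree documentation notes that from Python 3.8 onward write() keeps the attribute order instead of sorting it.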

How to write dictionary into xml file with pretty print (python)?

d = {"a":"a1234","b":"b5678","c":"c4554545"}
Tried converting this dictionary d to xml, as below
<?xml version="1.0" encoding="UTF-8" ?>
<test>
<a>a1234</a>
<b>b5678</b>
<c>c4554545</c>
</test>
Code:
from dicttoxml import dicttoxml
xml = dicttoxml(d, custom_root='test', attr_type=False)
# Above 'xml' is of type bytes here
xml = xml.decode("utf-8") # Converting bytes to string
print(xml) # prints, <?xml version="1.0" encoding="UTF-8" ?><test><a>a1234</a><b>b5678</b><c>c4554545</c></test>
I tried pretty-printing the above xml output, but ended up with the result below, which drops the <?xml version ...?> declaration:
<test>
<a>a1234</a>
<b>b5678</b>
<c>c4554545</c>
</test>
How can I pretty-print it as below, with the declaration kept?
<?xml version="1.0" encoding="UTF-8" ?>
<test>
<a>a1234</a>
<b>b5678</b>
<c>c4554545</c>
</test>
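One possible approach is xml.dom.minidom from the standard library: toprettyxml writes the declaration itself when an encoding is passed. A minimal sketch, assuming the XML is built with ElementTree instead of dicttoxml so that the example has no third-party dependency (the helper name dict_to_pretty_xml is made up for illustration). Note that minidom writes the declaration without the space before `?>`.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

d = {"a": "a1234", "b": "b5678", "c": "c4554545"}


def dict_to_pretty_xml(data, root_tag="test"):
    """Build a flat XML document from a dict and return it pretty-printed."""
    root = ET.Element(root_tag)
    for key, value in data.items():
        ET.SubElement(root, key).text = str(value)
    raw = ET.tostring(root, encoding="utf-8")
    # Passing encoding makes toprettyxml return bytes that start with
    # an XML declaration naming that encoding.
    pretty_bytes = minidom.parseString(raw).toprettyxml(indent="  ", encoding="UTF-8")
    return pretty_bytes.decode("utf-8")


pretty = dict_to_pretty_xml(d)
print(pretty)
```

The bytes returned by dicttoxml could presumably be passed to minidom.parseString in the same way, in place of the raw value built here.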

'list' object has no attribute 'get' Python3.8 while getting info from XML

So I'm trying to extract info from an XML file but I keep getting this error:
AttributeError: 'list' object has no attribute 'get'
My Code:
from xml.etree import ElementTree as ET
file = ET.parse('db1.xml')
drug = file.findall('drugbank/drug/products')
f = []
for x in drug:
    f.append(x.text)
return f
My XML:
<?xml version="1.0" encoding="UTF-8"?>
<drugbank xmlns="http://www.drugbank.ca" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.drugbank.ca http://www.drugbank.ca/docs/drugbank.xsd" version="5.1" exported-on="2019-07-02">
<drug type="biotech" created="2005-06-13" updated="2019-06-04">
<products>
<product>
<name>Refludan</name>
<labeller>Bayer</labeller>
<ndc-id/>
<ndc-product-code/>
<dpd-id>02240996</dpd-id>
<ema-product-code/>
<ema-ma-number/>
<started-marketing-on>2000-01-31</started-marketing-on>
<ended-marketing-on>2013-07-26</ended-marketing-on>
<dosage-form>Powder, for solution</dosage-form>
<strength>50 mg</strength>
<route>Intravenous</route>
<fda-application-number/>
<generic>false</generic>
<over-the-counter>false</over-the-counter>
<approved>true</approved>
<country>Canada</country>
<source>DPD</source>
</product>
</products>
</drug>
</drugbank>
I also tried using drug = file.findall('drugbank/drug/products/name') instead of drug = file.findall('drugbank/drug/products') but it still gave the same error.
I found the issue. Use this code to get the names of your products:
import xml.etree.ElementTree as ET
xml_str = '''<?xml version="1.0" encoding="UTF-8"?>
<drugbank xmlns="http://www.drugbank.ca" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.drugbank.ca http://www.drugbank.ca/docs/drugbank.xsd" version="5.1" exported-on="2019-07-02">
<drug type="biotech" created="2005-06-13" updated="2019-06-04">
<products>
<product>
<name>Refludan</name>
<labeller>Bayer</labeller>
<ndc-id/>
<ndc-product-code/>
<dpd-id>02240996</dpd-id>
<ema-product-code/>
<ema-ma-number/>
<started-marketing-on>2000-01-31</started-marketing-on>
<ended-marketing-on>2013-07-26</ended-marketing-on>
<dosage-form>Powder, for solution</dosage-form>
<strength>50 mg</strength>
<route>Intravenous</route>
<fda-application-number/>
<generic>false</generic>
<over-the-counter>false</over-the-counter>
<approved>true</approved>
<country>Canada</country>
<source>DPD</source>
</product>
</products>
</drug>
</drugbank>
'''
root = ET.fromstring(xml_str)
# print(root.findall('{http://www.drugbank.ca}drug'))
ns = {'drug_bank': 'http://www.drugbank.ca'}
for drug in root.findall('drug_bank:drug', ns):
    for products in drug.findall('drug_bank:products', ns):
        for product in products.findall('drug_bank:product', ns):
            for nametag in product.findall('drug_bank:name', ns):
                print(nametag.text)
Output: Refludan
Explanation:
First I printed root and got this:
<Element '{http://www.drugbank.ca}drugbank' at 0x7f688ffc0770>
So I realised the document uses an XML namespace, and every tag in a search path has to be qualified with it.
Here is the link to help you understand the topic - https://docs.python.org/3/library/xml.etree.elementtree.html#parsing-xml-with-namespaces
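The nested loops can also be collapsed into a single path, with the prefix repeated on every step. A minimal sketch, assuming a trimmed copy of the sample XML and an arbitrary prefix name db:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (assumption: only two child fields kept).
XML = """<drugbank xmlns="http://www.drugbank.ca">
  <drug type="biotech">
    <products>
      <product>
        <name>Refludan</name>
        <labeller>Bayer</labeller>
      </product>
    </products>
  </drug>
</drugbank>"""

# The prefix is local to this mapping; only the URI has to match the document.
ns = {"db": "http://www.drugbank.ca"}

root = ET.fromstring(XML)
# The path starts below the root element, so 'drugbank' itself is not part of it.
names = [el.text for el in root.findall("db:drug/db:products/db:product/db:name", ns)]
print(names)
```

The path in the question also began with drugbank, which is the root element itself. Paths passed to findall are relative to the element they are called on, so that step would not match even without the namespace issue.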

XML-Parsing error AttributeError: 'NoneType' object has no attribute 'text'

There is probably a simple solution to my problem, but I am very new to python3 so please go easy on me;)
I have a simple script running, which already successfully parses information from an xml-file using this code
import xml.etree.ElementTree as ET
root = ET.fromstring(my_xml_file)
u = root.find(".//name").text.rstrip()
print("Name: %s\n" % u)
The xml I am parsing looks like this
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3.2/style/exchange.xsl"?>
<example:world-data xmlns="http://www.example.org" xmlns:ops="http://example.oorg" xmlns:xlink="http://www.w3.oorg/1999/xlink">
<exchange-documents>
<exchange-document system="acb.org" family-id="543672" country="US" doc-number="95962" kind="B2">
<bibliographic-data>
<name>SomeName</name>
...and so on... and ends like this
</exchange-document>
</exchange-documents>
</example:world-data>
(Links are edited due to stackoverflow policy)
Output as expected
SomeName
However, if I try to parse another xml from the same api using the same python commands, I get this error-code
AttributeError: 'NoneType' object has no attribute 'text'
The second xml-file looks like this
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3.2/style/pub-ftxt-claims.xsl"?>
<ops:world-data xmlns="http://www.example.org/exchange" xmlns:example="http://example.org" xmlns:xlink="http://www.example.org/1999/xlink">
<ftxt:fulltext-documents xmlns="http://www.examp.org/fulltext" xmlns:ftxt="ww.example/fulltext">
<ftxt:fulltext-document system="example.org" fulltext-format="text-only">
<bibliographic-data>
<publication-reference data-format="docdb">
<document-id>
<country>EP</country>
<doc-number>10000</doc-number>
<kind>A</kind>
</document-id>
</publication-reference>
</bibliographic-data>
<claims lang="EN">
<claim>
<claim-text>1. Some text.</claim-text>
<claim-text>2. Some text.</claim-text>
<claim-text>2. Some text.</claim-text>
</claim>
</claims>
</ftxt:fulltext-document>
</ftxt:fulltext-documents>
</ops:world-data>
I tried again
root = ET.fromstring(usr_str)
u = root.find(".//claim-text").text.rstrip()
print("Abstract: %s\n" % u)
Expected output
1. Some text.
But it only prints the above mentioned error message.
Why can I parse the first xml but not the second one using these commands?
Any help is highly appreciated.
edit: code by Jack Fleeting works in python console, but unfortunately not in my PyCharm
from lxml import etree
root = etree.XML(my_xml.encode('ascii'))
root2 = etree.XML(my_xml2.encode('ascii'))
root.xpath('//*[local-name()="name"]/text()')
root2.xpath('//*[local-name()="claim-text"]/text()')
Could this be a bug in my PyCharm? My first mentioned code snippet still prints a correct result for name...
edit: Turns out I had to force the output using
a = root3.xpath('//*[local-name()="claim-text"]/text()')
print(a, flush=True)
A couple of problems here before we get to a possible solution. One, the first xml snippet you provided is invalid (for instance, <bibliographic-data> isn't closed). I realize it's just a snippet, but since this is what we have to work with, I modified the snippet below to fix that. Two, the root elements of both snippets use unbound (undeclared) prefixes (example:world-data in the first, and ops:world-data in the second). I had to remove these prefixes, too, for the rest to work.
Given these modifications, using the lxml library should work for you.
First modified snippet:
my_xml = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3.2/style/exchange.xsl"?>
<world-data xmlns="http://www.example.org" xmlns:ops="http://example.oorg" xmlns:xlink="http://www.w3.oorg/1999/xlink">
<exchange-documents>
<exchange-document system="acb.org" family-id="543672" country="US" doc-number="95962" kind="B2">
<bibliographic-data>
<name>SomeName</name>
...and so on... and ends like this
</bibliographic-data>
</exchange-document>
</exchange-documents>
</world-data>"""
And:
my_xml2 = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3.2/style/pub-ftxt-claims.xsl"?>
<world-data xmlns="http://www.example.org/exchange" xmlns:example="http://example.org" xmlns:xlink="http://www.example.org/1999/xlink">
<ftxt:fulltext-documents xmlns="http://www.examp.org/fulltext" xmlns:ftxt="ww.example/fulltext">
<ftxt:fulltext-document system="example.org" fulltext-format="text-only">
<bibliographic-data>
<publication-reference data-format="docdb">
<document-id>
<country>EP</country>
<doc-number>10000</doc-number>
<kind>A</kind>
</document-id>
</publication-reference>
</bibliographic-data>
<claims lang="EN">
<claim>
<claim-text>1. Some text.</claim-text>
<claim-text>2. Some text.</claim-text>
<claim-text>3. Some text.</claim-text>
</claim>
</claims>
</ftxt:fulltext-document>
</ftxt:fulltext-documents>
</world-data>"""
And now to work:
from lxml import etree
root = etree.XML(my_xml.encode('ascii'))
root2 = etree.XML(my_xml2.encode('ascii'))
root.xpath('//*[local-name()="name"]/text()')
output:
['SomeName']
root2.xpath('//*[local-name()="claim-text"]/text()')
Output:
['1. Some text.', '2. Some text.', '3. Some text.']
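If lxml is not available, a similar local-name match can be approximated with the standard library by stripping the namespace part from each tag. A minimal sketch, assuming a trimmed version of the second snippet. The helper names local_name and texts_by_local_name are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the second snippet (assumption: only the claims kept).
XML = """<world-data xmlns="http://www.example.org/exchange">
  <ftxt:fulltext-documents xmlns="http://www.examp.org/fulltext" xmlns:ftxt="ww.example/fulltext">
    <claims lang="EN">
      <claim>
        <claim-text>1. Some text.</claim-text>
        <claim-text>2. Some text.</claim-text>
      </claim>
    </claims>
  </ftxt:fulltext-documents>
</world-data>"""


def local_name(tag):
    """Return the tag without its '{namespace}' part, if it has one."""
    return tag.rsplit("}", 1)[-1]


def texts_by_local_name(root, name):
    """Collect the text of every element whose local name equals name."""
    return [
        el.text
        for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == name
    ]


root = ET.fromstring(XML)
# The unqualified search finds nothing, because the default namespace
# makes the real tag '{http://www.examp.org/fulltext}claim-text'.
unqualified = root.find(".//claim-text")
claims = texts_by_local_name(root, "claim-text")
print(unqualified, claims)
```

The None returned by the unqualified find is what leads to the AttributeError in the question once .text is accessed on it.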

Parsing XML attribute with namespace python3

I have looked at the other question over Parsing XML with namespace in Python via 'ElementTree' and reviewed the xml.etree.ElementTree documentation. The issue I'm having is admittedly similar so feel free to tag this as duplicate, but I can't figure it out.
The line of code I'm having issues with is
instance_alink = root.find('{http://www.w3.org/2005/Atom}link')
My code is as follows:
import xml.etree.cElementTree as ET
tree = ET.parse('../../external_data/rss.xml')
root = tree.getroot()
instance_title = root.find('channel/title').text
instance_link = root.find('channel/link').text
instance_alink = root.find('{http://www.w3.org/2005/Atom}link')
instance_description = root.find('channel/description').text
instance_language = root.find('channel/language').text
instance_pubDate = root.find('channel/pubDate').text
instance_lastBuildDate = root.find('channel/lastBuildDate').text
The XML file:
<?xml version="1.0" encoding="windows-1252"?>
<rss version="2.0">
<channel>
<title>Filings containing financial statements tagged using the US GAAP or IFRS taxonomies.</title>
<link>http://www.example.com</link>
<atom:link href="http://www.example.com" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
<description>This is a list of up to 200 of the latest filings containing financial statements tagged using the US GAAP or IFRS taxonomies, updated every 10 minutes.</description>
<language>en-us</language>
<pubDate>Mon, 20 Nov 2017 20:20:45 EST</pubDate>
<lastBuildDate>Mon, 20 Nov 2017 20:20:45 EST</lastBuildDate>
....
The attributes I'm trying to retrieve are in line 6; so 'href', 'type', etc.
<atom:link href="http://www.example.com" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
Obviously, I've tried
instance_alink = root.find('{http://www.w3.org/2005/Atom}link').attrib
but that doesn't work because the result is None. My thought is that it's looking for children but there are none. I can grab the attributes in the other lines of the XML but not these, for some reason. I've also played with ElementTree and lxml (but lxml won't load properly on Windows for whatever reason).
Any help is greatly appreciated cause the documentation seems sparse.
I was able to solve it with
alink = root.find('channel/{http://www.w3.org/2005/Atom}link').attrib
The issue is that I was looking for the tag {http://www.w3.org/2005/Atom}link at the same level as <channel>, where, of course, it didn't exist.
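The same lookup can be written with a prefix mapping instead of the full URI in braces, which may read more easily. A minimal sketch, assuming a shortened inline copy of the feed (the title is a placeholder). It imports xml.etree.ElementTree, since the cElementTree alias was removed in Python 3.9:

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the feed (assumption: placeholder title).
RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://www.example.com</link>
    <atom:link href="http://www.example.com" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
  </channel>
</rss>"""

ns = {"atom": "http://www.w3.org/2005/Atom"}

root = ET.fromstring(RSS)
# The element lives under <channel>, so the path has to start there.
alink = root.find("channel/atom:link", ns)
# find() returns None when nothing matches, so guard before using .attrib.
attrs = dict(alink.attrib) if alink is not None else {}
print(attrs.get("href"), attrs.get("rel"), attrs.get("type"))
```

The plain <link> element is untouched by this search, because it has no namespace and therefore a different qualified tag.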
