Python 2.7 to 3.6 code porting issue: copying XML data to a list

Hi, I am currently porting a piece of code from Python 2.7 to Python 3.6. The code dumps data from an XML file into a list and looks like this:
import os
import xml.etree.ElementTree as ET
regs_list = []
tree = ET.parse("test.xml")
root = tree.getroot()
for reg in root:
    regs_list.append(reg)
for i in range(len(regs_list)):
    print (regs_list[i].attrib["name"])
    if not (regs_list[i].find("field") == None):
        for regs_list[i].field in regs_list[i]:
            print (regs_list[i].field.attrib["first_bit"])
The XML looks like this:
<register offset="0x4" width="4" defaultValue="0x100000" name="statuscommand" desc="STATUSCOMMAND- Status and Command ">
    <field first_bit="30" last_bit="31" WH="ROOO" flask="0xc0000000" name="reserved0" desc=""/>
    <field first_bit="29" last_bit="29" WH="1CWRH" flask="0x20000000" name="rma" desc=""/>
    <field first_bit="28" last_bit="28" WH="1CWRH" flask="0x10000000" name="rta" desc=""/>
</register>
<register offset="0x8" width="4" defaultValue="0x8050100" name="reve" desc="REVCLASSCODE - Revision ID and Class Code">
    <field first_bit="8" last_bit="31" WH="ROOO" flask="0xffffff00" name="class_codes" desc=""/>
    <field first_bit="0" last_bit="7" WH="ROOO" flask="0xff" name="rid" desc=""/>
</register>
This code works perfectly fine in Python 2.7: both the parent (register) and child (field) elements are dumped into the list regs_list. In 3.6, however, we get the following error.
3.6 output:
for regs_list[i].field in regs_list[i]:
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'field'
2.7 output:
Parent and child parsed and dumped into the list without error.
Is there a difference in the way xml.etree.ElementTree.Element and lists work in 3.6 and 2.7?
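A likely explanation, offered as a sketch rather than a confirmed diagnosis: the loop `for regs_list[i].field in regs_list[i]` uses an attribute of the Element as its loop target, so each iteration assigns to `regs_list[i].field`. In Python 2.7, xml.etree.ElementTree is pure Python and its Element instances accept arbitrary new attributes. Since Python 3.3 the module uses a C accelerator when available, and those Element objects reject new attributes with the AttributeError shown. Using an ordinary loop variable avoids the assignment. The inline XML below is a shortened stand-in for test.xml, and the wrapping `<registers>` root is an assumption, since the question does not show the root element.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for test.xml; the <registers> root is assumed.
XML = """<registers>
  <register offset="0x4" name="statuscommand">
    <field first_bit="30" last_bit="31" name="reserved0"/>
    <field first_bit="29" last_bit="29" name="rma"/>
  </register>
  <register offset="0x8" name="reve">
    <field first_bit="8" last_bit="31" name="class_codes"/>
  </register>
</registers>"""

root = ET.fromstring(XML)
regs_list = list(root)  # same content as appending each child in a loop

first_bits = {}
for reg in regs_list:
    # "field" is a plain local name here, not an attribute set on the Element
    first_bits[reg.attrib["name"]] = [
        field.attrib["first_bit"] for field in reg.findall("field")
    ]

for name, bits in first_bits.items():
    print(name, bits)
```

Iterating `reg.findall("field")` also makes the separate `find("field") == None` check unnecessary, because the inner loop simply runs zero times when a register has no fields.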

Related

python xml write with using namespace

Are these two roots equivalent?
The order of the attributes in the root element changed in the output XML.
If they are different, how can I fix it?
#Python 3.7
import xml.etree.ElementTree as ET
ET.register_namespace('xsi', "http://www.w3.org/2001/test")
ET.register_namespace('', "http://www.test.com/test/test/test")
tree = ET.parse('test.xml')
tree.write("test1.xml", encoding='utf-8', xml_declaration=True)
#input XML root
<root xmlns:xsi="http://www.w3.org/2001/test" schemaVersion="2.8" xmlns="http://www.test.com/test/test/test" labelVersion="1" xsi:schemaLocation="http://www.test.com/test/test/test ..\Schema\CLIFSchema.xsd" name="test.xml">
#output XML root
<root xmlns="http://www.test.com/test/test/test" xmlns:xsi="http://www.w3.org/2001/test" labelVersion="1" name="test.xml" schemaVersion="2.8" xsi:schemaLocation="http://www.test.com/test/test/test ..\Schema\CLIFSchema.xsd">
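Attribute order is not significant in XML, so the two roots should be equivalent for any conforming parser. ElementTree in Python 3.7 and earlier sorts attributes by name when writing, while Python 3.8 and later keep the order in which they were set. The sketch below checks the equivalence on shortened versions of the two roots; the xsi:schemaLocation attribute is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the input and output roots from the question.
NS = 'xmlns="http://www.test.com/test/test/test" xmlns:xsi="http://www.w3.org/2001/test"'
root_in = ET.fromstring(
    f'<root {NS} schemaVersion="2.8" labelVersion="1" name="test.xml"/>'
)
root_out = ET.fromstring(
    f'<root {NS} labelVersion="1" name="test.xml" schemaVersion="2.8"/>'
)

# Element.attrib is a dict, and dict equality ignores insertion order.
same = root_in.tag == root_out.tag and root_in.attrib == root_out.attrib
print(same)
```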

Parse nested XML file and get the value of previous tag in python

I have a huge nested .xml file with lots of entries. Given a child ID, I need to find the value stored in a preceding tag.
An extract of my XML file:
<?xml version="1.0"?>
<nodes>
  <node>
    <node_id>0x2D</node_id>
    <num_1>11</num_1>
    <num_2>905.908</num_2>
    <signs>
      <sign>
        <sign_id>30</sign_id>
        <name>INDEX_0</name>
        <size_b>842069</size_b>
        <content>
          <models>
            <model>1_x</model>
            <model>2_x</model>
            <model>3_x</model>
            <model>4_x</model>
          </models>
          <images>
            <image>
              <value>VALUE1</value>
              <folder_ids>
                <folder_id>012345678</folder_id>
              </folder_ids>
            </image>
            <image>
              <value>VALUE2</value>
              <folder_ids>
                <folder_id>1235365454</folder_id>
              </folder_ids>
            </image>
            <image>
              <value>VALUE3</value>
              <folder_ids>
                <folder_id>3562377456</folder_id>
                <folder_id>3566743626</folder_id>
                <folder_id>012345678</folder_id>
              </folder_ids>
            </image>
            <image>
              <value>VALUE4</value>
              <folder_ids>
                <folder_id>34627876</folder_id>
              </folder_ids>
            </image>
            <image>
.
.
.
For example, I need to find all values whose image contains the folder_id 012345678.
I tried using the lxml library.
Simple code:
from lxml import etree
tree = etree.parse('D:\\test_nested_xml.xml')
#root = etree.Element("root")
for element in tree.iter(tag="folder_id"):
    if element.text == '012345678':
        print("%s - %s" % (element.text, element.getparent))
But in the output I get the following entries:
012345678 - <bound method _Element.getparent of <Element folder_id at 0x2cf2648>>
012345678 - <bound method _Element.getparent of <Element folder_id at 0x2cf2620>>
This is not what I need.
The expected result is something like:
012345678 - VALUE1
012345678 - VALUE3
Could someone help me parse the XML file correctly and get what I need?
You're currently printing the method itself:
print("%s - %s" % (element.text, element.getparent))
If you want to see what the method returns, you need to call it:
print("%s - %s" % (element.text, element.getparent()))
Note that getparent() returns the enclosing folder_ids element, so you still have to go up one more level to image and read its value child to get VALUE1.
You can also use XPath to select the desired values in one step:
search_id = '012345678'
for value in tree.xpath(f"//image[folder_ids/folder_id='{search_id}']/value"):
    print(value.text)
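If lxml is not available, the same lookup can be sketched with the standard library. xml.etree.ElementTree has no getparent() and only a limited XPath subset, so this version walks down from each image instead of up from each folder_id. The inline XML is a trimmed copy of the extract above, and `values_for_folder` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the extract from the question.
XML = """<nodes><node><signs><sign><content><images>
  <image><value>VALUE1</value><folder_ids>
    <folder_id>012345678</folder_id>
  </folder_ids></image>
  <image><value>VALUE2</value><folder_ids>
    <folder_id>1235365454</folder_id>
  </folder_ids></image>
  <image><value>VALUE3</value><folder_ids>
    <folder_id>3562377456</folder_id>
    <folder_id>012345678</folder_id>
  </folder_ids></image>
</images></content></sign></signs></node></nodes>"""


def values_for_folder(root, search_id):
    """Return the <value> text of every <image> that lists search_id."""
    matches = []
    for image in root.iter("image"):
        ids = [f.text for f in image.iterfind("folder_ids/folder_id")]
        if search_id in ids:
            matches.append(image.findtext("value"))
    return matches


root = ET.fromstring(XML)
for value in values_for_folder(root, "012345678"):
    print("012345678 - %s" % value)
```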

parse xml whose attribute include double quote with lxml

I can't get an XPath search on the attribute value "Frais de Services" to work with lxml.
I have an XML file whose content is the following:
<column caption='Choix Découpage' name='[Aujourd&apos;Hui Parameter (copy 2)]'>
    <alias key='"Frais de Services"' value='Offline Fees' />
</column>
from lxml import etree
import sys
tree = etree.parse('test.xml')
root = tree.getroot()
print([node.attrib['key'] for node in root.xpath("//alias")]) # we get ['"Billetterie Ferroviaire"']
I tried many hacks, but none of them works (I can't understand why lxml internally changes the original "predefined entities"):
root.xpath('//alias[@key="\"Frais de Services\""]')
root.xpath('//alias[@key=""Frais de Services""]')
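One way to read the failure: XPath 1.0 string literals have no escape mechanism, and inside a single-quoted Python string `\"` is just `"`, so both attempts hand the XPath engine the same malformed expression. A literal that contains double quotes has to be delimited with single quotes at the XPath level. The sketch below demonstrates this with the standard library's xml.etree.ElementTree so that it runs without lxml. The same predicate should also work with lxml's root.xpath(), but that is an assumption not exercised here.

```python
import xml.etree.ElementTree as ET

XML = """<column caption='Choix Découpage' name='[Aujourd&apos;Hui Parameter (copy 2)]'>
  <alias key='"Frais de Services"' value='Offline Fees' />
</column>"""

root = ET.fromstring(XML)

# Python string in double quotes, XPath literal in single quotes,
# so the double quotes that belong to the attribute value survive.
expr = ".//alias[@key='\"Frais de Services\"']"
matches = root.findall(expr)
print([m.get("value") for m in matches])

# The parser decodes predefined entities such as &apos; on load,
# which is why the in-memory value differs from the source text.
print(root.get("name"))
```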

Parsing XML attribute with namespace python3

I have looked at the other question on Parsing XML with namespace in Python via 'ElementTree' and reviewed the xml.etree.ElementTree documentation. The issue I'm having is admittedly similar, so feel free to tag this as a duplicate, but I can't figure it out.
The line of code I'm having issues with is
instance_alink = root.find('{http://www.w3.org/2005/Atom}link')
My code is as follows:
# cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically
import xml.etree.ElementTree as ET
tree = ET.parse('../../external_data/rss.xml')
root = tree.getroot()
instance_title = root.find('channel/title').text
instance_link = root.find('channel/link').text
instance_alink = root.find('{http://www.w3.org/2005/Atom}link')
instance_description = root.find('channel/description').text
instance_language = root.find('channel/language').text
instance_pubDate = root.find('channel/pubDate').text
instance_lastBuildDate = root.find('channel/lastBuildDate').text
The XML file:
<?xml version="1.0" encoding="windows-1252"?>
<rss version="2.0">
  <channel>
    <title>Filings containing financial statements tagged using the US GAAP or IFRS taxonomies.</title>
    <link>http://www.example.com</link>
    <atom:link href="http://www.example.com" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
    <description>This is a list of up to 200 of the latest filings containing financial statements tagged using the US GAAP or IFRS taxonomies, updated every 10 minutes.</description>
    <language>en-us</language>
    <pubDate>Mon, 20 Nov 2017 20:20:45 EST</pubDate>
    <lastBuildDate>Mon, 20 Nov 2017 20:20:45 EST</lastBuildDate>
....
The attributes I'm trying to retrieve are in line 6, i.e. 'href', 'type', etc.:
<atom:link href="http://www.example.com" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
Obviously, I've tried
instance_alink = root.find('{http://www.w3.org/2005/Atom}link').attrib
but that doesn't work because find() returns None. My thought is that it's looking for children, but there are none. I can grab the attributes in the other lines of the XML, but not these for some reason. I've also played with ElementTree and lxml (but lxml won't load properly on Windows for whatever reason).
Any help is greatly appreciated because the documentation seems sparse.
I was able to solve it with
alink = root.find('channel/{http://www.w3.org/2005/Atom}link').attrib
The issue was that I was looking for the tag {http://www.w3.org/2005/Atom}link at the same level as <channel>, where, of course, it didn't exist.
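As a variation on the fix above, find() also accepts a prefix-to-URI mapping, which keeps the path readable. A minimal sketch on a shortened copy of the feed; the `ns` dictionary name and the shortened title are illustrative choices, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the RSS file from the question.
XML = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://www.example.com</link>
    <atom:link href="http://www.example.com" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
  </channel>
</rss>"""

root = ET.fromstring(XML)

# The prefix used here only has to match the key in ns, not the document.
ns = {"atom": "http://www.w3.org/2005/Atom"}
alink = root.find("channel/atom:link", ns)
print(alink.attrib)

# Searching directly under <rss> still finds nothing, as in the question.
print(root.find("atom:link", ns))
```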

XPath: How can I get all nodes with specific path suffix?

I have the following XML file (E://testHtmlFile.html):
<bookstore>
  <anyA>
    <anyB>
      <book>
        <title lang="end">Harry Potter</title>
        <price>29.99</price>
      </book>
    </anyB>
    <book>
      <title lang="era">Learning XML</title>
      <price>39.95</price>
    </book>
  </anyA>
</bookstore>
I need to get all elements inside anyB/book tags whose lang attribute starts with "e". There is one:
<title lang="end">Harry Potter</title>
For this purpose I use the following XPath string:
"//anyB/book/*[@lang[starts-with(., 'e')]]"
but it doesn't return any elements. What is wrong?
My Python program:
from lxml.html import document_fromstring
with open("E://testHtmlFile.html", "r") as myfile:
    html = myfile.read()
tree = document_fromstring(html)
books = tree.xpath("//anyB/book/*[@lang[starts-with(., 'e')]]")
outStr = ""
for curBook in books:
    outStr = outStr + curBook.text + "\n"
# write the collected titles and close the file afterwards
with open("E:\\out.txt", 'w+') as outputRes:
    print(outStr, file=outputRes)
If I replace the tag
<anyB>
with
<other>
then my program works fine.
It looks like a problem with letter case...
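The letter-case suspicion is probably right: lxml.html parses the file as HTML, and the HTML parser lowercases element names, so the tree contains anyb rather than anyB, while XPath matching stays case-sensitive. Lowercasing the path in the XPath, or parsing the file as XML, should both avoid the mismatch. The sketch below shows the XML route with the standard library so that it runs without lxml. ElementTree's XPath subset has no starts-with(), so that part of the filter is done in Python.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <anyA>
    <anyB>
      <book>
        <title lang="end">Harry Potter</title>
        <price>29.99</price>
      </book>
    </anyB>
    <book>
      <title lang="era">Learning XML</title>
      <price>39.95</price>
    </book>
  </anyA>
</bookstore>"""

root = ET.fromstring(XML)  # an XML parser keeps the tag name anyB as written

# Select children of anyB/book that carry a lang attribute,
# then apply the starts-with test in Python.
candidates = root.findall(".//anyB/book/*[@lang]")
titles = [el.text for el in candidates if el.get("lang").startswith("e")]
print(titles)

# Matching is case-sensitive: the lowercased path finds nothing here.
print(root.findall(".//anyb/book/*[@lang]"))
```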
