Placing child elements of XML in variables (Python 3.x)

Hi, I'm new to XML and have never used it before. I'm trying to place two child elements in a variable each. Here's the XML data I'm using:
<?xml version="1.0"?>
<data>
<countries>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
</countries>
<cities>
<city id="1036323110">
<city>Katherine</city>
<country>Australia</country>
<capital>Australia</capital>
<population>1488</population>
</city>
</cities>
</data>
I'm trying to get one variable per child branch, and this is what I've tried so far:
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
country = root.find(".//countries")
city = root.find(".//cities")
Is this the right approach? Thank you.
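The approach works, provided the document is well-formed (every opening tag needs a matching closing tag). Here is a minimal sketch, assuming an abbreviated copy of the data inlined as a string so it runs without data.xml. The names country_names and populations are illustrative only and not part of the question.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the question's data, inlined so the sketch runs on its own.
XML = """<data>
  <countries>
    <country name="Liechtenstein">
      <rank>1</rank>
      <year>2008</year>
    </country>
  </countries>
  <cities>
    <city id="1036323110">
      <city>Katherine</city>
      <population>1488</population>
    </city>
  </cities>
</data>"""

root = ET.fromstring(XML)

# find() returns the first matching Element, or None if nothing matches.
countries = root.find("countries")   # direct child of <data>
cities = root.find(".//cities")      # same idea, searching at any depth

# Each variable now holds a whole branch that can be queried further.
country_names = [c.get("name") for c in countries.findall("country")]
populations = [int(c.findtext("population")) for c in cities.findall("city")]
```

Since find() returns None when nothing matches, it is worth checking the result before using it on real data.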

Related

Stripping whitespaces from an xml element using ElementTree

I'm having difficulty removing leading and trailing whitespace, as well as excess whitespace between elements. For the sake of the example, this is the XML document I'm currently running test cases on:
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
<description>Liechtenstein has a lot of flowers. </description>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
<description>Singapore has a lot of street markets.</description>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
<description> Panama has a lot of great food.</description>
</country>
</data>
Notice three things: the description for the country named "Liechtenstein" has excess trailing whitespace, the second country element has excess whitespace between neighbor and description, and the description of the third country node has excess leading whitespace.
Every time I run my code:
# Remove whitespace for each element in the tree
for elem in root.iter():
elem.text = elem.text.strip()
elem.tail = elem.tail.strip()
I end up with the following error:
AttributeError: 'NoneType' object has no attribute 'strip'
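The error arises because ElementTree stores None, not an empty string, in text when an element has no text content (for example a self-closing neighbor element) and in tail when nothing follows the closing tag. The sketch below uses a shortened, hypothetical document to show which values are None. The helper strip_or_none is an illustrative name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical document: one newline follows the <neighbor/> element.
root = ET.fromstring(
    '<data><country name="Panama">'
    '<neighbor name="Colombia" direction="E"/>\n'
    '<description> Panama has a lot of great food.</description>'
    '</country></data>'
)

neighbor = root.find("country/neighbor")
description = root.find("country/description")

# A self-closing element has no text, so .text is None (not "").
neighbor_text = neighbor.text
# Whitespace after the closing tag is stored in .tail.
neighbor_tail = neighbor.tail
# Nothing follows </description>, so its .tail is None.
description_tail = description.tail


def strip_or_none(value):
    """Strip a string, but pass None through unchanged."""
    return value.strip() if value is not None else None
```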
import xml.etree.ElementTree as ET
file = 'source.xml'
root = ET.parse(file)
for elem in root.iter():
if elem.text is not None:
elem.text = elem.text.strip()
if elem.tail is not None:
elem.tail = elem.tail.strip()
# print XML with stripped out whitespace
ET.dump(root)
# pretty print XML with stripped out whitespace
ET.indent(root, space="\t", level=0)
ET.dump(root)
Output (stripped out whitespace):
<data><country name="Liechtenstein"><rank>1</rank><year>2008</year><gdppc>141100</gdppc><neighbor name="Austria" direction="E" /><neighbor name="Switzerland" direction="W" /><description>Liechtenstein has a lot of flowers.</description></country><country name="Singapore"><rank>4</rank><year>2011</year><gdppc>59900</gdppc><neighbor name="Malaysia" direction="N" /><description>Singapore has a lot of street markets.</description></country><country name="Panama"><rank>68</rank><year>2011</year><gdppc>13600</gdppc><neighbor name="Costa Rica" direction="W" /><neighbor name="Colombia" direction="E" /><description>Panama has a lot of great food.</description></country></data>
Output (pretty-printed with stripped out whitespace):
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E" />
<neighbor name="Switzerland" direction="W" />
<description>Liechtenstein has a lot of flowers.</description>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N" />
<description>Singapore has a lot of street markets.</description>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W" />
<neighbor name="Colombia" direction="E" />
<description>Panama has a lot of great food.</description>
</country>
</data>

Removing the same element across all the nodes of an XML tree

For the sake of example, this is the XML file that I'm working with:
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
<description>Liechtenstein has a lot of flowers.</description>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
<description>Singapore has a lot of street markets.</description>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
<description>Panama has a lot of great food.</description>
</country>
</data>
How would I write the code so that I can delete one child element (e.g. year or description) from each of the country nodes? For example, with the following code:
# To remove
# for country in root.findall('country'):
# year = int(country.find('year').text)
# if year > 2010:
# root.remove(country)
# tree.write('sample.xml')
I can remove any country node whose year element has a value greater than 2010. But that removes the entire node, not just the year element. I know that I can remove a single element of a node with the following:
# for country in root.findall('country'):
# description_node = country.find('description')
# if description_node.text == "Singapore has a lot of street markets.":
# country.remove(description_node)
# tree.write('sample.xml')
But now I want to delete the description element, the year element, or the neighbor element from all of the country nodes present.
One option is the following, which uses .findall and .remove:
import xml.etree.ElementTree as ET
file = 'source.xml'
data = ET.parse(file)
for country in data.findall('country'):
for neighbor in country.findall('neighbor'):
country.remove(neighbor)
for year in country.findall('year'):
country.remove(year)
for description in country.findall('description'):
country.remove(description)
ET.dump(data)
Output:
python yourscript.py
<data>
<country name="Liechtenstein">
<rank>1</rank>
<gdppc>141100</gdppc>
</country>
<country name="Singapore">
<rank>4</rank>
<gdppc>59900</gdppc>
</country>
<country name="Panama">
<rank>68</rank>
<gdppc>13600</gdppc>
</country>
</data>
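The same idea can be wrapped in a reusable function that takes the tags to delete as a parameter. This is only a sketch: remove_children is a hypothetical helper, and the inline document is a shortened version of the sample data.

```python
import xml.etree.ElementTree as ET


def remove_children(root, parent_tag, child_tags):
    """Remove every direct child of each <parent_tag> whose tag is in child_tags.

    Hypothetical helper for illustration; returns the number of elements removed.
    """
    removed = 0
    # list(...) snapshots the parents before the tree is modified.
    for parent in list(root.iter(parent_tag)):
        # list(parent) snapshots the children so removal does not skip any.
        for child in list(parent):
            if child.tag in child_tags:
                parent.remove(child)
                removed += 1
    return removed


root = ET.fromstring(
    "<data>"
    '<country name="Singapore"><rank>4</rank><year>2011</year>'
    '<neighbor name="Malaysia" direction="N"/>'
    "<description>Singapore has a lot of street markets.</description></country>"
    '<country name="Panama"><rank>68</rank><year>2011</year>'
    '<neighbor name="Costa Rica" direction="W"/>'
    '<neighbor name="Colombia" direction="E"/></country>'
    "</data>"
)

count = remove_children(root, "country", {"year", "neighbor"})
remaining = [[child.tag for child in country] for country in root.findall("country")]
```

Iterating over a copy of the child list matters here: removing elements from a parent while looping over that parent directly can skip siblings.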
In XSLT 3.0 you can do, for example:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="year[. > 2000]"/>
</xsl:transform>
The empty template rule causes elements that match the predicate to be removed; the xsl:mode instruction causes everything else to be retained.

WP All Import creates duplicate product attributes

I'm using WP All Import to import new products from an XML file, or to update existing ones. The import works perfectly, but I noticed that some attributes are duplicated. They were created again after the first import.
Maybe this happens for new items. I checked the XML file and it looks normal.
I want to import toners and I'm using an attribute for color (Χρώμα in Greek), and now I have two different attributes named Χρώμα with the same colors. Looking for the CYAN color, I found 5 products in the first attribute and 11 products in the second one.
Any idea what I'm doing wrong?
Here is the XML for one product:
<product>
<id>1077</id>
<sku>TON-CLP320BK</sku>
<name>
Συμβατό Toner TON-CLP320BK για Samsung, CLT-K4072S, Black, 1.5K
</name>
<barcode>5202705407213</barcode>
<manufacturer>PREMIUM</manufacturer>
<descr>
<p>Συμβατό Toner TON-CLP320BK για Samsung, CLT-K4072S, Black, 1.5K</p><p></p><p>Συμβατά μοντέλα : </p><p>CLP325<br/>CLP320<br/>CLP320N<br/>CLP325W<br/>CLX3185FN<br/>CLX3185FN<br/>CLX318<br/>CLX318FN<br/>CLX3185FW<br/>CLX3185W</p>
</descr>
<availability>1</availability>
<dim1>40.0</dim1>
<dim2>9.5</dim2>
<dim3>18.5</dim3>
<weight>0.791</weight>
<tax>0.000</tax>
<stock_indicator>20</stock_indicator>
<minimum_quantity_to_order>1</minimum_quantity_to_order>
<RRP>15.00</RRP>
<url>
https://www.data-media.gr/product_det.asp?catid=263&subid=329&prid=1077
</url>
<thumb>
https://www.data-media.gr/photos/TON-CLP320BK.jpg
</thumb>
<image>
https://www.data-media.gr/photos/max/TON-CLP320BK.jpg
</image>
<volume>7030.000</volume>
<courier_weight>1.786</courier_weight>
<in_offer>0</in_offer>
<guarantee>12 μήνες</guarantee>
<group>
<id>4</id>
<name>Εκτυπωτές & Toner-Ink</name>
<category>
<id>263</id>
<name>Toner - Ribbon Μελάνια</name>
<subcategory>
<id>329</id>
<name>Toner</name>
</subcategory>
</category>
</group>
<filters>
<filter>
<name_id>9</name_id>
<name>Για Brand</name>
<value_id>8</value_id>
<value>SAMSUNG</value>
</filter>
<filter></filter>
<filter></filter>
<filter></filter>
</filters>
<price>10.16</price>
<price_without_offer>10.16</price_without_offer>
<retail_percent>20</retail_percent>
<retail_price>15.12</retail_price>
<vat>24</vat>
<pp>0</pp>
</product>

Checking whether a friends child element is present in a specific parent element such as 'Liechtenstein'

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
ss= 'Liechtenstein'
tag_names = set(t.tag for t in root.findall(".//*[@name=ss]/friends"))
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
<friends>
<frined name="arun" />
</friends>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
I would like to check whether a friends child element is present in a specific parent element, such as 'Liechtenstein'.
Here, tag_names is an empty set, but I expected tag_names = {'friends'}.
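ElementTree's XPath subset does not substitute Python variables into a path, and an attribute predicate needs a quoted literal value, so the name has to be formatted into the path string. Below is a minimal sketch using a shortened version of the data. The function has_child is a hypothetical helper, and it assumes the country name contains no single quote.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's data, inlined so the sketch runs on its own.
root = ET.fromstring(
    "<data>"
    '<country name="Liechtenstein"><rank>1</rank>'
    '<friends><friend name="arun"/></friends></country>'
    '<country name="Singapore"><rank>4</rank></country>'
    "</data>"
)


def has_child(root, country_name, child_tag):
    """Return True if the named <country> has a direct child <child_tag>.

    Assumes country_name contains no single quote, because it is placed
    inside a single-quoted XPath literal.
    """
    path = f".//country[@name='{country_name}']/{child_tag}"
    return root.find(path) is not None


ss = "Liechtenstein"
# The f-string inserts the value of ss, wrapped in quotes, into the predicate.
tag_names = {t.tag for t in root.findall(f".//*[@name='{ss}']/friends")}
```

When only presence matters, has_child(root, ss, "friends") gives a boolean directly, which avoids building a set.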

Counting an XML attribute in Groovy

I'm writing a script in Groovy that parses a really large XML file, appends some elements, and slightly changes each element every time it appends. Each of these elements has an ID number associated with it, and every time an element is appended I want its ID number to be the highest ID number in the file plus 1. Here is a small sample that should help illustrate:
<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="xyxy" version="1.1">
<file original="zzz.js" source-language="en" target-language="en" datatype="javascript">
<body>
<trans-unit id="20" resname="foo">
<source>foofoo</source>
<target>foofoo</target>
</trans-unit>
<trans-unit id="21" resname="blah">
<source>blahblah</source>
<target>blahblah</target>
</trans-unit>
</body>
</file>
</xliff>
In this case, if I added an element (trans-unit) to the list, the ID would need to be 22. I have an algorithm that parses and appends, but I'm not sure how to increment the ID each time. Again, I'm using Groovy to do this. Does anyone have an idea? Thanks in advance!!
Assuming you have parsed that XML with XmlSlurper or XmlParser, you should be able to get the next id with the help of max:
def xml = '''<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="xyxy" version="1.1">
<file original="zzz.js" source-language="en" target-language="en" datatype="javascript">
<body>
<trans-unit id="20" resname="foo">
<source>foofoo</source>
<target>foofoo</target>
</trans-unit>
<trans-unit id="21" resname="blah">
<source>blahblah</source>
<target>blahblah</target>
</trans-unit>
</body>
</file>
</xliff>'''
def x = new XmlSlurper().parseText(xml)
def next = 1 + x.file.body.'trans-unit'*.@id*.text().collect { it as Integer }.max()
assert next == 22
To use XmlParser, you need to change the line to:
def next = 1 + x.file.body.'trans-unit'*.@id.collect { it as Integer }.max()
