Groovy XML: selecting a child node attribute based on a parent node attribute - groovy

I have an xml structure like this
<JJJ version="2.0" >
<Data >
<Object class="classX">
<k name="constname1">variable1</p>
<k name="constname2">variable2</p>
<k name="constname3">variable3</p>
</Object>
<Object class="classY">
<k name="constname1">variable11</p>
<k name="constname2">variable12</p>
<k name="constname3">variable13</p>
</Object>
I need to select the classX node and, within it, the value of the element whose name attribute is constname1 (i.e. variable1).
def parser = new XmlSlurper()
def mymo=records.'Data'.'Object';
def mytest = mymo.findAll{it.@class.text()=="ClassX"};
mytest.each{
    it.'p'.each{
        println it.@name.text() +'-'+ it.text() }
}
This works. But instead of comparing text inside the loop, I need to do something like this:
def testme= mytest.'k'.find{ it.@name.text()=="constname1"}
This does not give me the right result.
However, the line below works:
println mymo.'k'.find{it.@name.text()=="constname1"}
But I want to restrict the selection to the node for classX.

Correcting your xml so it's valid, gives:
def xml = '''<JJJ version="2.0" >
| <Data >
| <Object class="classX">
| <k name="constname1">variable1</k>
| <k name="constname2">variable2</k>
| <k name="constname3">variable3</k>
| </Object>
| <Object class="classY">
| <k name="constname1">variable14</k>
| <k name="constname2">variable15</k>
| <k name="constname3">variable16</k>
| </Object>
| <Object class="classX">
| <k name="constname1">variable7</k>
| <k name="constname2">variable8</k>
| <k name="constname3">variable9</k>
| </Object>
| </Data>
|</JJJ>'''.stripMargin()
List var = new XmlSlurper().parseText( xml ).'**'.grep {
    it.@name == 'constname1' && it.parent().@class == 'classX'
}
assert var == [ 'variable1', 'variable7' ]
Is that what you wanted?

I guess the correct way was:
mytest.'k'.find{ it['@name']=="constname1"}
I did something similar for another XML document (link: GPath Groovy).

Related

Parse nested XML file and get the value of previous tag in python

I have a huge nested .xml file with a lot of entries. What I need is to find the preceding value when I know the child ID.
Extraction of my xml file:
<?xml version="1.0"?>
<nodes>
<node>
<node_id>0x2D</node_id>
<num_1>11</num_1>
<num_2>905.908</num_2>
<signs>
<sign>
<sign_id>30</sign_id>
<name>INDEX_0</name>
<size_b>842069</size_b>
<content>
<models>
<model>1_x</model>
<model>2_x</model>
<model>3_x</model>
<model>4_x</model>
</models>
<images>
<image>
<value>VALUE1</value>
<folder_ids>
<folder_id>012345678</folder_id>
</folder_ids>
</image>
<image>
<value>VALUE2</value>
<folder_ids>
<folder_id>1235365454</folder_id>
</folder_ids>
</image>
<image>
<value>VALUE3</value>
<folder_ids>
<folder_id>3562377456</folder_id>
<folder_id>3566743626</folder_id>
<folder_id>012345678</folder_id>
</folder_ids>
</image>
<image>
<value>VALUE4</value>
<folder_ids>
<folder_id>34627876</folder_id>
</folder_ids>
</image>
<image>
.
.
.
For example, I need to find all values whose image contains the folder_id 012345678.
I tried to use the lxml library.
Simple code:
from lxml import etree
tree = etree.parse('D:\\test_nested_xml.xml')
#root = etree.Element("root")
for element in tree.iter(tag="folder_id"):
    if element.text == '012345678':
        print("%s - %s" % (element.text, element.getparent))
But in the output I get the following entries:
012345678 - <bound method _Element.getparent of <Element folder_id at 0x2cf2648>>
012345678 - <bound method _Element.getparent of <Element folder_id at 0x2cf2620>>
And it is not what I need.
Expected result for me is something like:
012345678 - VALUE1
012345678 - VALUE3
Could someone help me parse the XML file correctly and get what I need?
You're currently printing the method itself.
print("%s - %s" % (element.text, element.getparent))
If you want to see what the method returns, you need to call it.
print("%s - %s" % (element.text, element.getparent()))
You can also use XPath to select the desired values in one step:
search_id = '012345678'
for value in tree.xpath(f"//image[folder_ids/folder_id='{search_id}']/value"):
    print(value.text)
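If lxml is not available, the same lookup can be sketched with the standard library's xml.etree.ElementTree, whose limited XPath support does not cover the nested predicate used above. This is only a minimal sketch under assumptions: SAMPLE is a shortened copy of the XML from the question, and values_for_folder is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the question (an assumption).
SAMPLE = """<nodes><node><signs><sign><content><images>
  <image><value>VALUE1</value>
    <folder_ids><folder_id>012345678</folder_id></folder_ids></image>
  <image><value>VALUE2</value>
    <folder_ids><folder_id>1235365454</folder_id></folder_ids></image>
  <image><value>VALUE3</value>
    <folder_ids><folder_id>3562377456</folder_id>
      <folder_id>012345678</folder_id></folder_ids></image>
</images></content></sign></signs></node></nodes>"""


def values_for_folder(root, folder_id):
    """Return the <value> text of every <image> that lists folder_id."""
    return [
        image.findtext("value")
        for image in root.iter("image")
        if any(f.text == folder_id for f in image.iter("folder_id"))
    ]


root = ET.fromstring(SAMPLE)
for value in values_for_folder(root, "012345678"):
    print("012345678 - %s" % value)
```

Iterating over the image elements, rather than over folder_id, avoids having to walk back up the tree, which ElementTree does not support directly.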

How to use getElementsByTagName for specific node in Minidom

My XML looks like this
<TOPIC>
<LIST>
<Area>JKH</Area>
<USED>
<type id='123' />
<type id='345' />
</USED>
<DEMAND>
<type id='809' />
<type id='321' />
</DEMAND>
<CLOSED>
<type id='456' />
<type id='765' />
</CLOSED>
</LIST>
</TOPIC>
Here I want to print only the id values under <DEMAND>. I have tried the code below.
from xml.dom import minidom
root=minidom.parse('sample.xml')
tag=root.getElementsByTagName('type')
for i in tag:
    print(i.getAttribute("id"))
But this is printing all the id values like below.
123
345
809
321
456
765
How can I get only 809 and 321, which are under the <DEMAND> tag? I can give a path in ElementTree, but I am not sure how to do that with getElementsByTagName. Is it even possible in Minidom?
for demand in root.getElementsByTagName('DEMAND'):
    for tp in demand.getElementsByTagName('type'):
        print(tp.getAttribute("id"))
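The nested lookup works because getElementsByTagName is available on elements as well as on the document, so the second call only searches below each DEMAND element. A self-contained sketch of that idea follows; it assumes the sample XML from the question is held in a string, and ids_under is a hypothetical helper name introduced for illustration.

```python
from xml.dom import minidom

# Sample document from the question, inlined as a string (an assumption;
# the question reads it from sample.xml).
SAMPLE = """<TOPIC><LIST><Area>JKH</Area>
<USED><type id='123' /><type id='345' /></USED>
<DEMAND><type id='809' /><type id='321' /></DEMAND>
<CLOSED><type id='456' /><type id='765' /></CLOSED>
</LIST></TOPIC>"""


def ids_under(doc, parent_tag):
    """Collect the id of every <type> element below each parent_tag element."""
    return [
        tp.getAttribute("id")
        for parent in doc.getElementsByTagName(parent_tag)
        for tp in parent.getElementsByTagName("type")
    ]


doc = minidom.parseString(SAMPLE)
print(ids_under(doc, "DEMAND"))
```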

How to get parents and grand parents tags given specific attribute in XML in python?

I have an xml with a structure like this one:
<cat>
<foo>
<fooID>1</fooID>
<fooName>One</fooName>
<bar>
<barID>a</barID>
<barName>small_a</barName>
<barClass>
<baz>
<qux>
<corge>
<corgeName>...</corgeName>
<corgeType>
<corgeReport>
<corgeReportRes Reference="x" Channel="High">
<Pos>1</Pos>
</corgeReportRes>
</corgeReport>
</corgeType>
</corge>
</qux>
</baz>
</barClass>
</bar>
<bar>
<barID>b</barID>
<barName>small_b</barName>
<barClass>
<baz>
<qux>
<corge>
<corgeName>...</corgeName>
<corgeType>
<corgeReport>
<corgeReportRes Reference="y" Channel="High">
<Pos>1</Pos>
</corgeReportRes>
</corgeReport>
</corgeType>
</corge>
</qux>
</baz>
</barClass>
</bar>
</foo>
<foo>
<fooID>2</fooID>
<fooName>Two</fooName>
<bar>
<barID>c</barID>
<barName>small_c</barName>
<barClass>
<baz>
<qux>
<corge>
<corgeName>...</corgeName>
<corgeType>
<corgeReport>
<corgeReportRes Reference="z" Channel="High">
<Pos>1</Pos>
</corgeReportRes>
</corgeReport>
</corgeType>
</corge>
</qux>
</baz>
</barClass>
</bar>
</foo>
</cat>
And, I would like to obtain the values of specific parent/grand parent/grand grand parent tags that have a node with attribute Channel="High". I would like to obtain only fooID value, fooName value, barID value, barName value.
I have the following code in Python 3:
import xml.etree.ElementTree as xmlET
root = xmlET.parse('file.xml').getroot()
test = root.findall(".//*[@Channel='High']")
Which is actually giving me a list of elements that match, however, I still need the information of the specific parents/grand parents/grand grand parents.
How could I do that?
fooID | fooName | barID | barName
- - - - - - - - - - - - - - - - -
1 | One | a | small_a <-- This is the information I'm interested in
1 | One | b | small_b <-- Also this
2 | Two | c | small_c <-- And this
Edit: fooID and fooName nodes are siblings of the grand-grand-parent bar, the one that contains the Channel="High". It's almost the same case for barID and barName, they are siblings of the grand-parent barClass, the one that contains the Channel="High". Also, what I want to obtain is the values 1, One, a and small_a, not filtering by it, since there will be multiple foo blocks.
If I understand you correctly, you are probably looking for something like this (using python):
from lxml import etree
foos = """[your xml above]"""
doc = etree.fromstring(foos)  # parse the XML string into an element tree
items = []
for entry in doc.xpath('//foo[.//corgeReportRes[@Channel="High"]]'):
    items.append(entry.xpath('./fooID/text()')[0])
    items.append(entry.xpath('./fooName/text()')[0])
    items.append(entry.xpath('./bar/barID/text()')[0])
    items.append(entry.xpath('./bar/barName/text()')[0])
print('fooID | fooName | barID | barName')
print(' | '.join(items))
Output:
fooID | fooName | barID | barName
1 | One | a | small_a
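To get one row per matching bar, as in the table the question asks for, one option is to loop over each foo and its bar children and keep only the bars that contain a Channel="High" descendant. The following is a minimal sketch using the standard library's xml.etree.ElementTree rather than lxml. It rests on assumptions: SAMPLE is a heavily shortened version of the question's XML (the intermediate baz/qux/corge levels are dropped, and a fourth bar with Channel="Low" is added only to show the filter), and high_channel_rows is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's document (an assumption).
SAMPLE = """<cat>
  <foo><fooID>1</fooID><fooName>One</fooName>
    <bar><barID>a</barID><barName>small_a</barName>
      <barClass><corgeReportRes Reference="x" Channel="High"/></barClass></bar>
    <bar><barID>b</barID><barName>small_b</barName>
      <barClass><corgeReportRes Reference="y" Channel="High"/></barClass></bar>
  </foo>
  <foo><fooID>2</fooID><fooName>Two</fooName>
    <bar><barID>c</barID><barName>small_c</barName>
      <barClass><corgeReportRes Reference="z" Channel="High"/></barClass></bar>
    <bar><barID>d</barID><barName>small_d</barName>
      <barClass><corgeReportRes Reference="w" Channel="Low"/></barClass></bar>
  </foo>
</cat>"""


def high_channel_rows(root):
    """Return (fooID, fooName, barID, barName) for each bar with Channel='High'."""
    rows = []
    for foo in root.iter("foo"):
        for bar in foo.findall("bar"):
            # Search at any depth below this bar for a matching attribute.
            if bar.find(".//corgeReportRes[@Channel='High']") is not None:
                rows.append((foo.findtext("fooID"), foo.findtext("fooName"),
                             bar.findtext("barID"), bar.findtext("barName")))
    return rows


root = ET.fromstring(SAMPLE)
print("fooID | fooName | barID | barName")
for row in high_channel_rows(root):
    print(" | ".join(row))
```

Walking down from foo to bar keeps both ancestors in scope, so no parent lookup is needed.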

Python: Getting parent attribute from child attribute in xml

I have an XML area.xml
<area>
<controls>
<internal>yes</internal>
</controls>
<schools>
<school id="001"/>
<time>2020-05-18T14:21:00Z</time>
<venture index="5">
<venture>
<basicData type="class">
<wage numberOfDollars="13" Correction="4.61">
<tax>70</tax>
</wage>
</basicData>
</venture>
</venture>
<venture index="9">
<venture>
<basicData type="class">
<wage numberOfDollars="13" Correction="5.61">
<tax>70</tax>
</wage>
</basicData>
</venture>
</venture>
<school id="056"/>
<time>2020-05-18T14:21:00Z</time>
<venture index="5">
<venture>
<basicData type="class">
<wage numberOfDollars="13">
<tax>70</tax>
</wage>
</basicData>
</venture>
</venture>
<venture index="9">
<venture>
<basicData type="class">
<wage numberOfDollars="13">
<tax>70</tax>
</wage>
</basicData>
</venture>
</venture>
</schools>
What I am trying to achieve with Python: in a school node there are multiple wage nodes (leaves). If one or more wage nodes have an attribute called Correction, I want the id attribute value of the school node.
So the outcome of my script should be 001, because this school has the attribute Correction in a wage node.
First I tried it using ETree:
import xml.etree.ElementTree as ET
data_file = 'area.xml'
tree = ET.parse(data_file)
root = tree.getroot()
t1 = "school"
t2 = "wage"
for e1, e2 in zip(root.iter(t1), root.iter(t2)):
    if hasattr(e2,'Correction'):
        e2.Correction
        print (e1.attrib['id'])
But that didn't work. Now I am trying to reach my goal using minidom, but I find it quite hard.
This is my code so far:
from xml.dom import minidom
doc = minidom.parse("area.xml")
staffs = doc.getElementsByTagName("wage")
for wage in staffs:
    sid = wage.getAttribute("Correction")
    print("wage:%s" % (sid))
The output gives all values of the wage attribute Correction:
wage:4.61
wage:5.61
wage:
wage:
Which is obviously far from correct.
I could use some help getting pointed in the right direction.
I am using Python 3.
Thank you in advance.
in a school node there are multiple wage nodes
Not really. The school elements are empty. The venture siblings have the wage descendants. Since wage is not a descendant of school, this makes it a little tricky to select the corresponding school.
If you can use lxml you could use XPath to select the wage elements that have a Correction attribute and then select the first preceding school element and get its id attribute...
from lxml import etree
tree = etree.parse("area.xml")
schools_with_corrected_wages = set()
for corrected_wage in tree.xpath(".//wage[@Correction]"):
    schools_with_corrected_wages.add(corrected_wage.xpath("preceding::school[1]/@id")[0])
print(schools_with_corrected_wages)
This prints:
{'001'}
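The standard library's ElementTree has no preceding:: axis, but because each school element is followed by its venture siblings, the same association can be sketched by walking the children of schools in document order and remembering the most recent school. This is only a sketch under assumptions: AREA_XML is a shortened, well-formed version of area.xml, and schools_with_correction is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for area.xml (an assumption).
AREA_XML = """<area>
  <schools>
    <school id="001"/>
    <time>2020-05-18T14:21:00Z</time>
    <venture index="5"><venture><basicData type="class">
      <wage numberOfDollars="13" Correction="4.61"><tax>70</tax></wage>
    </basicData></venture></venture>
    <school id="056"/>
    <time>2020-05-18T14:21:00Z</time>
    <venture index="5"><venture><basicData type="class">
      <wage numberOfDollars="13"><tax>70</tax></wage>
    </basicData></venture></venture>
  </schools>
</area>"""


def schools_with_correction(root):
    """Return ids of schools followed by a sibling containing wage[@Correction]."""
    found = []
    for schools in root.iter("schools"):
        current_id = None  # id of the most recently seen <school>
        for child in schools:
            if child.tag == "school":
                current_id = child.get("id")
            elif current_id is not None and child.find(".//wage[@Correction]") is not None:
                if current_id not in found:
                    found.append(current_id)
    return found


root = ET.fromstring(AREA_XML)
print(schools_with_correction(root))
```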
You could also use lxml to process the XML with XSLT...
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:key name="corrected_wage_by_school" match="wage[@Correction]" use="preceding::school[1]/@id"/>
<xsl:template match="/">
<xsl:for-each select="//school[key('corrected_wage_by_school',@id)]">
<xsl:value-of select="concat(@id,'&#xA;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Python
from lxml import etree
tree = etree.parse("area.xml")
xslt = etree.parse("test.xsl")
result = tree.xslt(xslt)
print(result)
This prints...
001
Here's a less clever way.
from simplified_scrapy import SimplifiedDoc, req, utils
html = utils.getFileContent("area.xml")
doc = SimplifiedDoc(html)
schools = doc.selects('school') # Get all schools
n = len(schools)
i = 0
while i < n - 1:
    school = schools[i]
    school1 = schools[i + 1]
    h = doc.html[school._end:school1._start] # Get data between two schools
    staffs = doc.getElementsByReg(' Correction="', tag='wage', html=h)
    if staffs:
        print(school.id, staffs.Correction)
    i += 1
last = schools[n - 1]
h = doc.html[last._end:]
staffs = doc.getElementsByReg(' Correction="', tag='wage', html=h)
if staffs:
    print(last.id, staffs.Correction)
Result:
001 ['4.61', '5.61']

Groovy - Use XmlSlurper with a dynamic path

Is it possible to access a node of XML using an arbitrary path?
E.g., given the XML:
<records>
<bike name='Chopper' />
<car name='HSV Maloo' make='Holden' year='2006'>
<country>Australia</country>
<record type='speed'>Production Pickup Truck with speed of 271kph</record>
</car>
<car name='P50' make='Peel' year='1962'>
<country>Isle of Man</country>
<record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
</car>
</records>
How do I access the contents of the xml using an arbitrary path, provided as a string -- eg:
def xml = new XmlSlurper().parse(theXml)
assert xml['bike.@name'] == 'Chopper'
assert xml['car[0].country'] == 'Australia'
One method is to use the Eval.x static method to evaluate a String:
def xml = '''| <records>
| <bike name='Chopper' />
| <car name='HSV Maloo' make='Holden' year='2006'>
| <country>Australia</country>
| <record type='speed'>Production Pickup Truck with speed of 271kph</record>
| </car>
| <car name='P50' make='Peel' year='1962'>
| <country>Isle of Man</country>
| <record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
| </car>
| </records>'''.stripMargin()
// Make our GPathResult
def slurper = new XmlSlurper().parseText( xml )
// Define our tests
def tests = [
[ query:'bike.@name', expected:'Chopper' ],
[ query:'car[0].country', expected:'Australia' ]
]
// For each test
tests.each { test ->
    // assert that we get the expected result
    assert Eval.x( slurper, "x.$test.query" ) == test.expected
}
