Groovy xml selecting child node attribute based on parent node attribute


I have an XML structure like this:
<JJJ version="2.0" >
<Data >
<Object class="classX">
<k name="constname1">variable1</p>
<k name="constname2">variable2</p>
<k name="constname3">variable3</p>
</Object>
<Object class="classY">
<k name="constname1">variable11</p>
<k name="constname2">variable12</p>
<k name="constname3">variable13</p>
</Object>
I need to select the classX node and, within it, the value of the element whose name attribute is constname1 (i.e. variable1).
def records = new XmlSlurper().parseText(xml)  // xml holds the document above
def mymo = records.'Data'.'Object'
def mytest = mymo.findAll { it.@class.text() == "classX" }
mytest.each {
    it.'k'.each {
        println it.@name.text() + '-' + it.text()
    }
}
This works. But instead of comparing the text inside the loop, I need to do something like this:
def testme = mytest.'k'.find { it.@name.text() == "constname1" }
but I can't get this right.
However, the following works:
println mymo.'k'.find { it.@name.text() == "constname1" }
But I want to restrict the selection to the classX node.

Correcting your XML so that it's valid gives:
def xml = '''<JJJ version="2.0" >
| <Data >
| <Object class="classX">
| <k name="constname1">variable1</k>
| <k name="constname2">variable2</k>
| <k name="constname3">variable3</k>
| </Object>
| <Object class="classY">
| <k name="constname1">variable14</k>
| <k name="constname2">variable15</k>
| <k name="constname3">variable16</k>
| </Object>
| <Object class="classX">
| <k name="constname1">variable7</k>
| <k name="constname2">variable8</k>
| <k name="constname3">variable9</k>
| </Object>
| </Data>
|</JJJ>'''.stripMargin()
List var = new XmlSlurper().parseText( xml ).'**'.grep {
    it.@name == 'constname1' && it.parent().@class == 'classX'
}
assert var == [ 'variable1', 'variable7' ]
Is that what you wanted?
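For comparison, the same "filter on the parent's attribute, then on the child's attribute" idea can be written as a single path with Python's standard-library xml.etree.ElementTree, which supports [@attr='value'] predicates. This is a minimal sketch assuming the corrected XML from the answer above; the names XML and values are illustrative only.

```python
import xml.etree.ElementTree as ET

# The corrected XML from the answer (three Object elements).
XML = """<JJJ version="2.0">
  <Data>
    <Object class="classX">
      <k name="constname1">variable1</k>
      <k name="constname2">variable2</k>
      <k name="constname3">variable3</k>
    </Object>
    <Object class="classY">
      <k name="constname1">variable14</k>
      <k name="constname2">variable15</k>
      <k name="constname3">variable16</k>
    </Object>
    <Object class="classX">
      <k name="constname1">variable7</k>
      <k name="constname2">variable8</k>
      <k name="constname3">variable9</k>
    </Object>
  </Data>
</JJJ>"""

root = ET.fromstring(XML)

# First restrict to Object elements with class="classX",
# then pick their k children with name="constname1".
values = [k.text for k in root.findall("./Data/Object[@class='classX']/k[@name='constname1']")]
print(values)  # ['variable1', 'variable7']
```

Putting the parent predicate in the path means the child filter never sees the classY block, which is the restriction the question asks for.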

I guess the correct way was:
mytest.'k'.find { it['@name'] == "constname1" }
I did something similar for some other XML; see the Groovy GPath documentation.

Related

Parse nested XML file and get the value of previous tag in python

I have a huge nested .xml file with lots of entries. I need to find a previous value when I know the child ID.
An excerpt of my XML file:
<?xml version="1.0"?>
<nodes>
<node>
<node_id>0x2D</node_id>
<num_1>11</num_1>
<num_2>905.908</num_2>
<signs>
<sign>
<sign_id>30</sign_id>
<name>INDEX_0</name>
<size_b>842069</size_b>
<content>
<models>
<model>1_x</model>
<model>2_x</model>
<model>3_x</model>
<model>4_x</model>
</models>
<images>
<image>
<value>VALUE1</value>
<folder_ids>
<folder_id>012345678</folder_id>
</folder_ids>
</image>
<image>
<value>VALUE2</value>
<folder_ids>
<folder_id>1235365454</folder_id>
</folder_ids>
</image>
<image>
<value>VALUE3</value>
<folder_ids>
<folder_id>3562377456</folder_id>
<folder_id>3566743626</folder_id>
<folder_id>012345678</folder_id>
</folder_ids>
</image>
<image>
<value>VALUE4</value>
<folder_ids>
<folder_id>34627876</folder_id>
</folder_ids>
</image>
<image>
.
.
.
For example, I need to find the values of all images whose folder_ids contain folder_id 012345678.
I tried to use the lxml library.
Simple code:
from lxml import etree
tree = etree.parse('D:\\test_nested_xml.xml')
#root = etree.Element("root")
for element in tree.iter(tag="folder_id"):
    if element.text == '012345678':
        print("%s - %s" % (element.text, element.getparent))
But in output I get following entries:
012345678 - <bound method _Element.getparent of <Element folder_id at 0x2cf2648>>
012345678 - <bound method _Element.getparent of <Element folder_id at 0x2cf2620>>
That is not what I need.
The result I expect is something like:
012345678 - VALUE1
012345678 - VALUE3
Could someone help me parse the XML file correctly and get what I need?

You're currently printing the method itself.
print("%s - %s" % (element.text, element.getparent))
If you want to see what the method returns, you need to call it.
print("%s - %s" % (element.text, element.getparent()))
You can also use XPath to select the desired values in one step:
search_id = '012345678'
for value in tree.xpath(f"//image[folder_ids/folder_id='{search_id}']/value"):
    print(value.text)
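If lxml is not available, a similar result can be had with the standard-library xml.etree.ElementTree. Its elements have no getparent() and its path syntax cannot express folder_ids/folder_id inside a predicate, so this sketch starts from each image and looks down instead of climbing up. The trimmed XML and the helper name values_for_folder are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the question's XML (the "..." part omitted).
XML = """<nodes>
  <node>
    <signs>
      <sign>
        <content>
          <images>
            <image>
              <value>VALUE1</value>
              <folder_ids><folder_id>012345678</folder_id></folder_ids>
            </image>
            <image>
              <value>VALUE2</value>
              <folder_ids><folder_id>1235365454</folder_id></folder_ids>
            </image>
            <image>
              <value>VALUE3</value>
              <folder_ids>
                <folder_id>3562377456</folder_id>
                <folder_id>3566743626</folder_id>
                <folder_id>012345678</folder_id>
              </folder_ids>
            </image>
          </images>
        </content>
      </sign>
    </signs>
  </node>
</nodes>"""

def values_for_folder(root, search_id):
    """Return the <value> text of every <image> whose folder_ids list search_id."""
    matches = []
    for image in root.iter("image"):
        # Collect all folder_id texts below this image, then test membership.
        ids = [fid.text for fid in image.iter("folder_id")]
        if search_id in ids:
            matches.append(image.findtext("value"))
    return matches

root = ET.fromstring(XML)
for value in values_for_folder(root, "012345678"):
    print("012345678 -", value)
```

With the sample above this prints the VALUE1 and VALUE3 lines from the expected result.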

How to use getElementsByTagName for specific node in Minidom

My XML looks like this:
<TOPIC>
<LIST>
<Area>JKH</Area>
<USED>
<type id='123' />
<type id='345' />
</USED>
<DEMAND>
<type id='809' />
<type id='321' />
</DEMAND>
<CLOSED>
<type id='456' />
<type id='765' />
</CLOSED>
</LIST>
</TOPIC>
Here I want to print only the ids under <DEMAND>. I have tried the code below:
from xml.dom import minidom
root=minidom.parse('sample.xml')
tag = root.getElementsByTagName('type')
for i in tag:
    print(i.getAttribute("id"))
But this is printing all the id values like below.
123
345
809
321
456
765
How can I get only 809 and 321, which are under the <DEMAND> tag? I can give a path in ElementTree, but I'm not sure how to do that with getElementsByTagName. Is it even possible in minidom?

for demand in root.getElementsByTagName('DEMAND'):
    for tp in demand.getElementsByTagName('type'):
        print(tp.getAttribute("id"))
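Since the question mentions that ElementTree accepts a path, here is a minimal sketch of that route for comparison, using the standard-library xml.etree.ElementTree and the question's sample inlined as a string (the names XML and demand_ids are illustrative).

```python
import xml.etree.ElementTree as ET

XML = """<TOPIC>
  <LIST>
    <Area>JKH</Area>
    <USED><type id='123' /><type id='345' /></USED>
    <DEMAND><type id='809' /><type id='321' /></DEMAND>
    <CLOSED><type id='456' /><type id='765' /></CLOSED>
  </LIST>
</TOPIC>"""

root = ET.fromstring(XML)  # root is the <TOPIC> element
# The path limits the search to <type> children of <DEMAND>.
demand_ids = [t.get("id") for t in root.findall("./LIST/DEMAND/type")]
for type_id in demand_ids:
    print(type_id)
```

The minidom answer scopes the search by calling getElementsByTagName on the DEMAND element; the path version expresses the same scoping in one expression.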

How to get parents and grand parents tags given specific attribute in XML in python?

I have an XML file with a structure like this one:
<cat>
<foo>
<fooID>1</fooID>
<fooName>One</fooName>
<bar>
<barID>a</barID>
<barName>small_a</barName>
<barClass>
<baz>
<qux>
<corge>
<corgeName>...</corgeName>
<corgeType>
<corgeReport>
<corgeReportRes Reference="x" Channel="High">
<Pos>1</Pos>
</corgeReportRes>
</corgeReport>
</corgeType>
</corge>
</qux>
</baz>
</barClass>
</bar>
<bar>
<barID>b</barID>
<barName>small_b</barName>
<barClass>
<baz>
<qux>
<corge>
<corgeName>...</corgeName>
<corgeType>
<corgeReport>
<corgeReportRes Reference="y" Channel="High">
<Pos>1</Pos>
</corgeReportRes>
</corgeReport>
</corgeType>
</corge>
</qux>
</baz>
</barClass>
</bar>
</foo>
<foo>
<fooID>2</fooID>
<fooName>Two</fooName>
<bar>
<barID>c</barID>
<barName>small_c</barName>
<barClass>
<baz>
<qux>
<corge>
<corgeName>...</corgeName>
<corgeType>
<corgeReport>
<corgeReportRes Reference="z" Channel="High">
<Pos>1</Pos>
</corgeReportRes>
</corgeReport>
</corgeType>
</corge>
</qux>
</baz>
</barClass>
</bar>
</foo>
</cat>
I would like to obtain the values of specific parent/grandparent/great-grandparent tags of the node with attribute Channel="High". I only want the fooID, fooName, barID and barName values.
I have the following code in Python 3:
import xml.etree.ElementTree as xmlET
root = xmlET.parse('file.xml').getroot()
test = root.findall(".//*[@Channel='High']")
This gives me a list of the matching elements, but I still need the information from their specific parents/grandparents/great-grandparents.
How could I do that?
fooID | fooName | barID | barName
- - - - - - - - - - - - - - - - -
1 | One | a | small_a <-- This is the information I'm interested in
1 | One | b | small_b <-- Also this
2 | Two | c | small_c <-- And this
Edit: the fooID and fooName nodes are siblings of the bar ancestor that contains the Channel="High" node. It's almost the same for barID and barName: they are siblings of the barClass ancestor that contains the Channel="High" node. Also, I want to obtain the values 1, One, a and small_a, not filter by them, since there will be multiple foo blocks.

If I understand you correctly, you are probably looking for something like this (using Python with lxml):
from lxml import etree
foos = """[your xml above]"""
doc = etree.fromstring(foos)
print('fooID | fooName | barID | barName')
# Select each bar that contains a corgeReportRes with Channel="High",
# then read its own ID/name and those of its parent foo.
for bar in doc.xpath('//bar[.//corgeReportRes[@Channel="High"]]'):
    items = [
        bar.xpath('../fooID/text()')[0],
        bar.xpath('../fooName/text()')[0],
        bar.xpath('./barID/text()')[0],
        bar.xpath('./barName/text()')[0],
    ]
    print(' | '.join(items))
Output:
fooID | fooName | barID | barName
1 | One | a | small_a
1 | One | b | small_b
2 | Two | c | small_c
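Without lxml, the standard-library xml.etree.ElementTree has no parent pointers, but the same table can be built by walking downward from foo to bar and testing each bar for a Channel="High" descendant. A minimal sketch, assuming the question's XML (compacted below) and a hypothetical helper name high_channel_rows:

```python
import xml.etree.ElementTree as ET

# The question's XML, with the inner nesting compacted onto fewer lines.
XML = """<cat>
  <foo>
    <fooID>1</fooID><fooName>One</fooName>
    <bar>
      <barID>a</barID><barName>small_a</barName>
      <barClass><baz><qux><corge><corgeName>...</corgeName><corgeType><corgeReport>
        <corgeReportRes Reference="x" Channel="High"><Pos>1</Pos></corgeReportRes>
      </corgeReport></corgeType></corge></qux></baz></barClass>
    </bar>
    <bar>
      <barID>b</barID><barName>small_b</barName>
      <barClass><baz><qux><corge><corgeName>...</corgeName><corgeType><corgeReport>
        <corgeReportRes Reference="y" Channel="High"><Pos>1</Pos></corgeReportRes>
      </corgeReport></corgeType></corge></qux></baz></barClass>
    </bar>
  </foo>
  <foo>
    <fooID>2</fooID><fooName>Two</fooName>
    <bar>
      <barID>c</barID><barName>small_c</barName>
      <barClass><baz><qux><corge><corgeName>...</corgeName><corgeType><corgeReport>
        <corgeReportRes Reference="z" Channel="High"><Pos>1</Pos></corgeReportRes>
      </corgeReport></corgeType></corge></qux></baz></barClass>
    </bar>
  </foo>
</cat>"""

def high_channel_rows(root):
    """Return (fooID, fooName, barID, barName) for every bar with a Channel="High" descendant."""
    rows = []
    # Walking down from foo to bar means no parent lookup is needed.
    for foo in root.findall("foo"):
        for bar in foo.findall("bar"):
            if bar.find(".//corgeReportRes[@Channel='High']") is not None:
                rows.append((foo.findtext("fooID"), foo.findtext("fooName"),
                             bar.findtext("barID"), bar.findtext("barName")))
    return rows

root = ET.fromstring(XML)
print("fooID | fooName | barID | barName")
for row in high_channel_rows(root):
    print(" | ".join(row))
```

Iterating per bar (rather than taking the first bar of each foo) is what yields one row per matching bar, as in the desired table.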

Python: Getting parent attribute from child attribute in xml

I have an XML file, area.xml:
<area>
<controls>
<internal>yes</internal>
</controls>
<schools>
<school id="001"/>
<time>2020-05-18T14:21:00Z</time>
<venture index="5">
<venture>
<basicData type="class">
<wage numberOfDollars="13" Correction="4.61">
<tax>70</tax>
</wage>
</basicData>
</venture>
</venture>
<venture index="9">
<venture>
<basicData type="class">
<wage numberOfDollars="13" Correction="5.61">
<tax>70</tax>
</wage>
</basicData>
</venture>
</venture>
<school id="056"/>
<time>2020-05-18T14:21:00Z</time>
<venture index="5">
<venture>
<basicData type="class">
<wage numberOfDollars="13">
<tax>70</tax>
</wage>
</basicData>
</venture>
</venture>
<venture index="9">
<venture>
<basicData type="class">
<wage numberOfDollars="13">
<tax>70</tax>
</wage>
</basicData>
</venture>
</venture>
</schools>
</area>
What I am trying to achieve with Python: a school node has multiple wage nodes (leaves). If one or more of those wage nodes has an attribute called Correction, I want the id attribute of the school node.
So the outcome of my script should be 001, because this school has the attribute Correction in its wage nodes.
First i tried it using ETree
import xml.etree.ElementTree as ET
data_file = 'area.xml'
tree = ET.parse(data_file)
root = tree.getroot()
t1 = "school"
t2 = "wage"
for e1, e2 in zip(root.iter(t1), root.iter(t2)):
    if hasattr(e2, 'Correction'):
        e2.Correction
        print(e1.attrib['id'])
But that didn't work. Now I am trying to reach my goal using minidom,
but I find it quite hard.
This is my code so far:
from xml.dom import minidom
doc = minidom.parse("area.xml")
staffs = doc.getElementsByTagName("wage")
for wage in staffs:
    sid = wage.getAttribute("Correction")
    print("wage:%s" % (sid))
The output gives all values of the wage attribute Correction:
wage:4.61
wage:5.61
wage:
wage:
This is obviously far from correct.
I could use some help to point me in the right direction.
I am using Python 3.
Thank you in advance.

in a school node there are multiple wage nodes
Not really. The school elements are empty. The venture siblings have the wage descendants. Since wage is not a descendant of school, this makes it a little tricky to select the corresponding school.
If you can use lxml you could use XPath to select the wage elements that have a Correction attribute and then select the first preceding school element and get its id attribute...
from lxml import etree
tree = etree.parse("area.xml")
schools_with_corrected_wages = set()
for corrected_wage in tree.xpath(".//wage[@Correction]"):
    schools_with_corrected_wages.add(corrected_wage.xpath("preceding::school[1]/@id")[0])
print(schools_with_corrected_wages)
This prints:
{'001'}
You could also use lxml to process the XML with XSLT...
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:key name="corrected_wage_by_school" match="wage[@Correction]" use="preceding::school[1]/@id"/>
<xsl:template match="/">
<xsl:for-each select="//school[key('corrected_wage_by_school',@id)]">
<xsl:value-of select="concat(@id,'&#xA;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Python
from lxml import etree
tree = etree.parse("area.xml")
xslt = etree.parse("test.xsl")
result = tree.xslt(xslt)
print(result)
This prints...
001

Here's a less clever way.
from simplified_scrapy import SimplifiedDoc, req, utils
html = utils.getFileContent("area.xml")
doc = SimplifiedDoc(html)
schools = doc.selects('school') # Get all schools
n = len(schools)
i = 0
while i < n - 1:
    school = schools[i]
    school1 = schools[i + 1]
    h = doc.html[school._end:school1._start]  # Get data between two schools
    staffs = doc.getElementsByReg(' Correction="', tag='wage', html=h)
    if staffs:
        print(school.id, staffs.Correction)
    i += 1
last = schools[n - 1]
h = doc.html[last._end:]
staffs = doc.getElementsByReg(' Correction="', tag='wage', html=h)
if staffs:
    print(last.id, staffs.Correction)
Result:
001 ['4.61', '5.61']
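As the first answer points out, each wage belongs to whichever school element most recently precedes its venture sibling. With only the standard-library xml.etree.ElementTree (no preceding:: axis), that relationship can be tracked by walking the children of schools in document order. A minimal sketch, assuming a trimmed copy of area.xml and a hypothetical helper name schools_with_corrections:

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of area.xml from the question.
XML = """<area>
  <controls><internal>yes</internal></controls>
  <schools>
    <school id="001"/>
    <time>2020-05-18T14:21:00Z</time>
    <venture index="5"><venture><basicData type="class">
      <wage numberOfDollars="13" Correction="4.61"><tax>70</tax></wage>
    </basicData></venture></venture>
    <venture index="9"><venture><basicData type="class">
      <wage numberOfDollars="13" Correction="5.61"><tax>70</tax></wage>
    </basicData></venture></venture>
    <school id="056"/>
    <time>2020-05-18T14:21:00Z</time>
    <venture index="5"><venture><basicData type="class">
      <wage numberOfDollars="13"><tax>70</tax></wage>
    </basicData></venture></venture>
  </schools>
</area>"""

def schools_with_corrections(root):
    """Return ids of schools whose following ventures contain a wage with Correction."""
    found = []
    current_school = None
    # Children of <schools> are iterated in document order, so each
    # <venture> belongs to the last <school> sibling seen before it.
    for child in root.find("schools"):
        if child.tag == "school":
            current_school = child.get("id")
        elif child.tag == "venture" and current_school is not None:
            has_correction = child.find(".//wage[@Correction]") is not None
            if has_correction and current_school not in found:
                found.append(current_school)
    return found

root = ET.fromstring(XML)
print(schools_with_corrections(root))  # ['001']
```

Elements such as time are simply skipped, since only school and venture children affect the result.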

Groovy - Use XmlSlurper with a dynamic path

Is it possible to access a node of XML using an arbitrary path?
E.g., given the XML:
<records>
<bike name='Chopper' />
<car name='HSV Maloo' make='Holden' year='2006'>
<country>Australia</country>
<record type='speed'>Production Pickup Truck with speed of 271kph</record>
</car>
<car name='P50' make='Peel' year='1962'>
<country>Isle of Man</country>
<record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
</car>
</records>
How do I access the contents of the XML using an arbitrary path provided as a string, e.g.:
def xml = new XmlSlurper().parse(theXml)
assert xml['bike.@name'] == 'Chopper'
assert xml['car[0].country'] == 'Australia'

One method is to use the static Eval.x method to evaluate a String:
def xml = '''| <records>
| <bike name='Chopper' />
| <car name='HSV Maloo' make='Holden' year='2006'>
| <country>Australia</country>
| <record type='speed'>Production Pickup Truck with speed of 271kph</record>
| </car>
| <car name='P50' make='Peel' year='1962'>
| <country>Isle of Man</country>
| <record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
| </car>
| </records>'''.stripMargin()
// Make our GPathResult
def slurper = new XmlSlurper().parseText( xml )
// Define our tests
def tests = [
  [ query:'bike.@name', expected:'Chopper' ],
[ query:'car[0].country', expected:'Australia' ]
]
// For each test
tests.each { test ->
// assert that we get the expected result
assert Eval.x( slurper, "x.$test.query" ) == test.expected
}
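Eval.x compiles and runs the path string as Groovy code. For comparison, the other option is to interpret the path yourself so that no string is executed. Below is a minimal Python sketch of that idea using xml.etree.ElementTree; it assumes a deliberately tiny path syntax (dot-separated tag steps with an optional [index], plus an optional trailing @attribute) that covers only the two examples in the question, and the names STEP and resolve are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XML = """<records>
  <bike name='Chopper' />
  <car name='HSV Maloo' make='Holden' year='2006'>
    <country>Australia</country>
    <record type='speed'>Production Pickup Truck with speed of 271kph</record>
  </car>
  <car name='P50' make='Peel' year='1962'>
    <country>Isle of Man</country>
    <record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
  </car>
</records>"""

# One path step: a tag name with an optional [index], e.g. "car[0]".
STEP = re.compile(r"^(\w+)(?:\[(\d+)\])?$")

def resolve(root, path):
    """Resolve a dotted path such as 'car[0].country' or 'bike.@name'."""
    node = root
    for step in path.split("."):
        if step.startswith("@"):
            # An attribute step ends the path.
            return node.get(step[1:])
        m = STEP.match(step)
        if m is None:
            raise ValueError(f"unsupported path step: {step!r}")
        tag, index = m.group(1), int(m.group(2) or 0)
        node = node.findall(tag)[index]
    return node.text

root = ET.fromstring(XML)
print(resolve(root, "bike.@name"))      # Chopper
print(resolve(root, "car[0].country"))  # Australia
```

The trade-off: Eval.x accepts any GPath expression, while a hand-written resolver only understands the syntax you implement but never runs untrusted input as code.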
