Capture parent node id when unique values found in combination - python-3.x

I am looking to parse this API response:
<export_response xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://cakemarketing.com/api/4/">
<success>true</success>
<row_count>3</row_count>
<fruits>
<fruit>
<fruit_id>178</fruit_id>
<filters>
<filter>
<filter_id>231</filter_id>
<filter_type>
<filter_type_id>70</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string>CA|NY|AZ</param_string>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
<filter>
<filter_id>237</filter_id>
<filter_type>
<filter_type_id>90</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string>Poor < 5</param_string>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
</filters>
</fruit>
<fruit>
<fruit_id>178</fruit_id>
<filters>
<filter>
<filter_id>231</filter_id>
<filter_type>
<filter_type_id>70</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string>CA|NY|AZ</param_string>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
<filter>
<filter_id>237</filter_id>
<filter_type>
<filter_type_id>90</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string>Poor < 5</param_string>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
</filters>
</fruit>
<fruit>
<fruit_id>178</fruit_id>
<filters>
<filter>
<filter_id>231</filter_id>
<filter_type>
<filter_type_id>70</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string>CA|NY|AZ</param_string>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
<filter>
<filter_id>237</filter_id>
<filter_type>
<filter_type_id>90</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string xsi:nil="true"/>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
</filters>
</fruit>
</fruits>
</export_response>
The goal is to find the fruit_id when the <param_string> values inside a fruit include both 'Poor < 5' and 'CA|NY|AZ'. This should return fruit_id 176 and 178.
I've tried using findall() as well as .find(text='CA|NY|AZ Poor < 5'), but I am not able to locate the correct fruit_ids.
Any suggestions are welcome. Thank you in advance.

With bs4 4.7.1+ you can combine :has and :contains with an adjacent sibling combinator (+) to express the pattern. Chaining the two :has() conditions on the same fruit_id acts as a logical AND, so only ids where both values are present are selected. I assumed you wanted the unique ids, so I used a set.
from bs4 import BeautifulSoup as bs
xml = '''your xml goes here'''
soup = bs(xml, 'lxml')
unique_ids = {i.text for i in soup.select('fruit_id:has(+ filters param_string:contains("Poor < 5")):has(+ filters param_string:contains("CA|NY|AZ"))')}
print(unique_ids)

Your sample doesn't actually contain a 176. Nonetheless, here's a way to capture each id and its parameters in a list; you can then test the list elements and act on them as needed.
from bs4 import BeautifulSoup

# data = your xml from above
soup = BeautifulSoup(data, "lxml")
for f in soup.find_all('fruit'):
    # one list per fruit: the id first, then every param_string value
    test_list = []
    test_list.append(f.fruit_id.text)
    for ps in f.find_all('filter'):
        test_list.append(ps.find('param_string').text)
    print(test_list)
    # do your validations here.....after each iteration
produces:
['178', 'CA|NY|AZ', 'Poor < 5']
['178', 'CA|NY|AZ', 'Poor < 5']
['178', 'CA|NY|AZ', '']
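The "validations" step can be sketched as follows. This is only an illustration, assuming the rows have the shape printed above (id first, then the param_string values); the names rows, required and matching_ids are made up for the example and are not part of the original answer.

```python
# Rows in the shape produced by the loop above: [fruit_id, param_string, ...]
rows = [
    ['178', 'CA|NY|AZ', 'Poor < 5'],
    ['178', 'CA|NY|AZ', 'Poor < 5'],
    ['178', 'CA|NY|AZ', ''],
]

# Both values must be present for a fruit to count as a match.
required = {'CA|NY|AZ', 'Poor < 5'}


def matching_ids(rows, required):
    """Return the set of ids whose parameters include every required value."""
    return {row[0] for row in rows if required <= set(row[1:])}


print(matching_ids(rows, required))  # {'178'}
```

The third row is skipped because its second param_string is empty; the set removes the duplicate id from the first two rows.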

Related

XSLT Can't Read an Excel XML File?

I'm using XSLT / XPath to browse some of the XML files you get when you unzip an Excel file. I found a "relationships" file workbook.xml.rels that I don't seem to be able to read, using code similar to that which successfully read the workbook.xml file.
Here's some of the workbook.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
...
<sheets>
<sheet name="Sheet1"
sheetId="2"
r:id="rId1"/>
<sheet name="Test Sheet"
sheetId="1"
r:id="rId2"/>
</sheets>
...
</workbook>
Here's the workbook.xml.rels file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId3"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme"
Target="theme/theme1.xml"/>
<Relationship Id="rId2"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
Target="worksheets/sheet2.xml"/>
<Relationship Id="rId1"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
Target="worksheets/sheet1.xml"/>
<Relationship Id="rId5"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings"
Target="sharedStrings.xml"/>
<Relationship Id="rId4"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
Target="styles.xml"/>
</Relationships>
Here's some of the XSLT:
<?xml version="1.0"?>
<!-- greeting.xsl -->
<xsl:stylesheet
...
<xsl:output method="text"/>
<xsl:variable name="baseDir" select="replace(document-uri(.), '(.*[\\/]xl).*', '$1/')"/>
<xsl:variable name="workbook" select="concat($baseDir, 'workbook.xml')"/>
<xsl:variable name="theSheetId" select="doc($workbook)/workbook/sheets/sheet[matches(@name, 'Test Sheet')]/@r:id"/>
<xsl:variable name="workbook_rels" select="concat($baseDir, '_rels/workbook.xml.rels')"/>
<!-- code to read workbook.xml.rels -->
<xsl:variable name="theSheet" select="doc($workbook_rels)/Relationships/Relationship[matches(@Id, $theSheetId)]/@Target"/>
<xsl:template match="/">
<xsl:text>
baseDir = </xsl:text><xsl:value-of select="$baseDir"/>
<xsl:text>
workbook = </xsl:text><xsl:value-of select="$workbook"/>
<xsl:text>
workbook_rels = </xsl:text><xsl:value-of select="$workbook_rels"/>
<xsl:text>
theSheetId = </xsl:text><xsl:value-of select="$theSheetId"/>
<xsl:text>
theSheet = </xsl:text><xsl:value-of select="$theSheet"/>
<xsl:text>
end</xsl:text>
</xsl:template>
</xsl:stylesheet>
And the output:
baseDir = file:/C:/Training/sandbox/conv_/xl/
workbook = file:/C:/Training/sandbox/conv_/xl/workbook.xml
workbook_rels = file:/C:/Training/sandbox/conv_/xl/_rels/workbook.xml.rels
theSheetId = rId2
theSheet = **<I get nothing here>**
end
You can see that the 'theSheetId' variable is correctly set when reading workbook.xml. But when I use that variable to get the corresponding Target value from workbook.xml.rels into the 'theSheet' variable, I get nothing. I tried replacing the matches expression with just a number, but I still get nothing. Is there a problem with reading this type of file?
Suggestions? Thanks!
The use of matches and replace suggests you are using an XSLT 2 or 3 processor. In XSLT 2 or 3 you can certainly declare xpath-default-namespace; you just have to change it in the sections that deal with elements from a different namespace, e.g. <xsl:variable name="theSheet" select="doc($workbook_rels)/Relationships/Relationship[matches(@Id, $theSheetId)]/@Target" xpath-default-namespace="http://schemas.openxmlformats.org/package/2006/relationships"/>.
Given the samples, I would rather use a key, <xsl:key name="rel" match="Relationships/Relationship" use="@Id" xpath-default-namespace="http://schemas.openxmlformats.org/package/2006/relationships"/>, and then <xsl:variable name="theSheet" select="key('rel', $theSheetId, doc($workbook_rels))/@Target"/>. Either way, declaring the relevant namespace with xpath-default-namespace when selecting elements from a particular document is probably what is missing in your XSLT.
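The same namespace pitfall appears outside XSLT too. As a rough illustration in Python (not part of the original answer), the standard library's ElementTree also finds nothing when the Relationships namespace is ignored. The prefix "rel" below is an arbitrary choice for the example, and the XML is a shortened copy of the .rels file from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of workbook.xml.rels from the question.
RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId2"
                Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
                Target="worksheets/sheet2.xml"/>
  <Relationship Id="rId1"
                Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
                Target="worksheets/sheet1.xml"/>
</Relationships>"""

root = ET.fromstring(RELS)

# Without a namespace the path matches nothing, just like the XSLT variable.
missing = root.find("Relationship[@Id='rId2']")
print(missing)  # None

# Binding a prefix ("rel" is an arbitrary name) to the namespace makes it match.
ns = {"rel": "http://schemas.openxmlformats.org/package/2006/relationships"}
hit = root.find("rel:Relationship[@Id='rId2']", ns)
print(hit.get("Target"))  # worksheets/sheet2.xml
```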

merge xml elements in one element using xslt-3

<?xml version="1.0" encoding="UTF-8" ?>
<mr:collection
xmlns:mr="http://www.lc.gov/mr2/slim"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.lc.gov/mr2/slim http://www.lc.gov/standards/mrxml/schema/mr21slim.xsd">
<mr:rc>
<mr:dtf tg="2000" i1="1" i2=" ">
<mr:sbf cd="a">Christoph Kolumbus</mr:sbf>
<mr:sbf cd="d">John Diter</mr:sbf>
<mr:sbf cd="b">Julie Nat</mr:sbf>
<mr:sbf cd="f">Darius Milhaud</mr:sbf>
<mr:sbf cd="g">Erich kleiber</mr:sbf>
<mr:sbf cd="g">Franz Ludwig Horth</mr:sbf>
</mr:dtf>
<mr:dtf tg="3000" i1="1" i2=" ">
<mr:sbf cd="a">Christoph Kolumbus</mr:sbf>
<mr:sbf cd="d">Serg</mr:sbf>
<mr:sbf cd="b">Mak</mr:sbf>
<mr:sbf cd="f">DarMil</mr:sbf>
<mr:sbf cd="g">Erikl</mr:sbf>
<mr:sbf cd="g">LudHorth</mr:sbf>
</mr:dtf>
</mr:rc>
<mr:rc>
<mr:dtf tg="2000" i1="1" i2="0">
<mr:sbf cd="a">Chris Prante</mr:sbf>
<mr:sbf cd="e">"Chris Dietz"</mr:sbf>
</mr:dtf>
</mr:rc>
</mr:collection>
I need to create a new XML file by merging the subfields of every element that starts with <mr:dtf tg="2000" (regardless of the i1 and i2 values). The input files also contain elements with other values, e.g. <mr:dtf tg="3000". The value of the new single element should be built as follows, using each part only if the corresponding cd subfield exists: the value of cd a; then a space and the value of cd b; then a space, the : character, a space and the value of cd e; then a space, the / character, a space and the value of cd f; then a space, the ; character, a space and the value of cd g.
Desired output:
<O-PM xsi:schemaLocation="http://www.op.org/O/2.0/ http://www.op.org/O/2.0/O-PM.xsd">
<ListRcs>
<rc>
<mtdt>
<e:rc>
<d:title xml:lang="el">Christoph Kolumbus Julie Nat / Darius Milhaud ; Erich kleiber ; Franz Ludwig Horth</d:title>
</e:rc>
</mtdt>
</rc>
<rc>
<mtdt>
<e:rc>
<d:title xml:lang="el">Chris Prante : "Chris Dietz"</d:title>
</e:rc>
</mtdt>
</rc>
</ListRcs>
I have only tried xsl:value-of select, which does not give the result I need. Is there a cleverer, more efficient way to do this? Thank you.
It is hard to cater for every combination of missing and existing items. Perhaps the following helps, as it outputs a separator only if the item itself exists. I haven't reproduced the exact output, but hopefully you can adjust the code to your needs:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xpath-default-namespace="http://www.lc.gov/mr2/slim"
exclude-result-prefixes="#all"
expand-text="yes"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="collection">
<O-PM>
<ListRcs>
<xsl:apply-templates/>
</ListRcs>
</O-PM>
</xsl:template>
<xsl:template match="rc">
<rc>
<xsl:apply-templates/>
</rc>
</xsl:template>
<xsl:template match="dtf[@tg = 2000]">
<mdtd>
<rc>
<title xml:lang="el">{sbf[@cd = 'a']} {sbf[@cd = 'b']}{sbf[@cd = 'e']!(':', .)} {sbf[@cd = 'f']!('/', .)}{(sbf[@cd = 'g'] => string-join(' ; '))!('', .)}</title>
</rc>
</mdtd>
</xsl:template>
<xsl:template match="dtf[@tg != 2000]"/>
</xsl:stylesheet>
Namespaces for the output were not declared so I have put everything in no namespace.
Example at https://xsltfiddle.liberty-development.net/pNEj9dR.
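To make the separator rules from the question easier to check, here is a small Python sketch of the joining logic alone. It is not a translation of the stylesheet above. The function name build_title and the (code, value) input shape are assumptions for illustration, as is the choice to drop a separator when nothing precedes it and to repeat " ; " before every cd g value (which is what the desired output shows).

```python
def build_title(subfields):
    """Join subfield values using the separators described in the question.

    subfields: list of (code, value) pairs in document order.
    Codes other than a, b, e, f, g are ignored.
    """
    by_code = {}
    for code, value in subfields:
        by_code.setdefault(code, []).append(value)

    parts = []
    # (code, separator emitted before each value of that code)
    for code, sep in (("a", None), ("b", None), ("e", ":"), ("f", "/"), ("g", ";")):
        for value in by_code.get(code, []):
            if sep and parts:  # assumption: no leading separator at the start
                parts.append(sep)
            parts.append(value)
    return " ".join(parts)


first = build_title([
    ("a", "Christoph Kolumbus"), ("d", "John Diter"), ("b", "Julie Nat"),
    ("f", "Darius Milhaud"), ("g", "Erich kleiber"), ("g", "Franz Ludwig Horth"),
])
second = build_title([("a", "Chris Prante"), ("e", '"Chris Dietz"')])
print(first)
print(second)
```

With the two tg="2000" fields from the sample input, this yields the two title strings shown in the desired output.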

How to replace xml element from looping list on python?

I have an XML tree and I would like to replace the sub-elements in the XML structure.
This is the actual XML tree read from the file:
xml_data = ET.parse('file1.xml')
<?xml version='1.0' encoding='UTF-8'?>
<call method="xxxx" callerName="xxx">
<credentials login="" password=""/>
<filters>
<accounts>
<account code="" ass="" can=""/>
</accounts>
</filters>
</call>
I'm expecting this output after looping over the list:
a = [1,23453, 3543,4354,3455, 6345]
<?xml version='1.0' encoding='UTF-8'?>
<call method="xxxx" callerName="xxx">
<credentials login="" password=""/>
<filters>
<accounts>
<account code="1" ass="34" can="yes"/>
<account code="23453" ass="34" can="yes"/>
<account code="3543" ass="34" can="yes"/>
<account code="4354" ass="34" can="yes"/>
<account code="3455" ass="34" can="yes"/>
<account code="6345" ass="34" can="yes"/>
</accounts>
</filters>
</call>
New to xml-parsing. Any help would be appreciated. Thanks in advance
I'm new to Python, but this works. I found only one limitation: the length of a must match the number of account elements already in the file. I hope this helps. Good luck.
from xml.etree import ElementTree as ET

a = [1, 23453, 3543, 4354, 3455, 6345]
code = 0  # index of the next value to take from a
xmlfile = "./log/logs.xml"
tree = ET.parse(xmlfile)
root = tree.getroot()
for filters in root.findall("filters"):
    for accounts in filters.findall("accounts"):
        for account in accounts.findall("account"):
            # fill in the attributes of each existing <account> element
            attributes = account.attrib
            attributes["code"] = str(a[code])
            attributes["ass"] = "34"
            attributes["can"] = "yes"
            code += 1
tree.write(xmlfile)
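Since the file in the question contains a single placeholder account, another option is to create one account element per list entry instead of updating existing ones. The following is a minimal sketch of that idea, assuming the fixed values ass="34" and can="yes" from the expected output; it parses the XML from a string so that it runs on its own, whereas the real code would use ET.parse on the file.

```python
import xml.etree.ElementTree as ET

# Copy of the input from the question, inlined so the sketch is self-contained.
XML = """<call method="xxxx" callerName="xxx">
  <credentials login="" password=""/>
  <filters>
    <accounts>
      <account code="" ass="" can=""/>
    </accounts>
  </filters>
</call>"""

a = [1, 23453, 3543, 4354, 3455, 6345]

root = ET.fromstring(XML)
accounts = root.find("filters/accounts")

# Remove the placeholder <account/>, then add one element per list entry.
for old in list(accounts):
    accounts.remove(old)
for code in a:
    ET.SubElement(accounts, "account", code=str(code), ass="34", can="yes")

print(ET.tostring(root, encoding="unicode"))
```

This version does not depend on the list length matching the number of account elements in the file.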

move an element to another element or create a new one if it does not exist using xslt-3

Using XSLT 3, I need to take the values of all content elements and move them into the title elements (if a title element already exists in a record, the content should be appended to it with a separator like -). I have now posted my real data, since the solution below does not solve the problem when applied to something like:
Example input:
<data>
<RECORD ID="31365">
<no>25099</no>
<seq>0</seq>
<date>2/4/2012</date>
<ver>2/4/2012</ver>
<access>021999</access>
<col>GS</col>
<call>889</call>
<pr>0</pr>
<days>0</days>
<stat>0</stat>
<ch>0</ch>
<title>1 title</title>
<content>1 content</content>
<sj>1956</sj>
</RECORD>
<RECORD ID="31366">
<no>25100</no>
<seq>0</seq>
<date>2/4/2012</date>
<ver>2/4/2012</ver>
<access>022004</access>
<col>GS</col>
<call>8764</call>
<pr>0</pr>
<days>0</days>
<stat>0</stat>
<ch>0</ch>
<sj>1956</sj>
<content>1 title</content>
</RECORD>
</data>
expected output:
<data>
<RECORD ID="31365">
<no>25099</no>
<seq>0</seq>
<date>2/4/2012</date>
<ver>2/4/2012</ver>
<access>021999</access>
<col>GS</col>
<call>889</call>
<pr>0</pr>
<days>0</days>
<stat>0</stat>
<ch>0</ch>
<title>1 title - 1 content</title>
<sj>1956</sj>
</RECORD>
<RECORD ID="31366">
<no>25100</no>
<seq>0</seq>
<date>2/4/2012</date>
<ver>2/4/2012</ver>
<access>022004</access>
<col>ΓΣ</col>
<call>8764</call>
<pr>0</pr>
<days>0</days>
<stat>0</stat>
<ch>0</ch>
<sj>1956</sj>
<title>1 title</title>
</RECORD>
</data>
With my attempt I did not manage to move the elements; I just got an empty line where the content element had been, so please include the removal of blank lines in the suggested solution.
I believe the blank lines could be removed with:
<xsl:template match="text()"/>
One way to achieve this is the following stylesheet. It uses XSLT 3.0 text value templates.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0" expand-text="true">
<xsl:output method="xml" indent="yes" />
<xsl:mode on-no-match="shallow-copy" />
<xsl:strip-space elements="*" /> <!-- Remove space between elements -->
<xsl:template match="RECORD">
<xsl:copy>
<xsl:copy-of select="@*" />
<title>{title[1]}{if (title[1]) then ' - ' else ''}<xsl:value-of select="content" separator=" " /></title>
<xsl:apply-templates select="node() except (title,content)" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Its output is as desired.
If you want to separate the <content> elements with a -, too, you can simplify the core <title> expression to
<xsl:value-of select="title|content" separator=" - " />
EDIT:
All I changed was replacing chapter with RECORD, and it's working fine with Saxon-HE 9.9.1.4J. The only difference in the output is that the title element is always at the first position, but that shouldn't matter. I also added a directive to remove space between elements.
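For comparison, the same merge can be sketched in Python with the standard library. This is not the answer's method, only an illustration under the assumption that a missing title should take over the position of the first content element; the function name merge_content_into_title is made up for the example, and the records are trimmed copies of the sample input.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two sample records from the question.
XML = """<data>
<RECORD ID="31365"><ch>0</ch><title>1 title</title><content>1 content</content><sj>1956</sj></RECORD>
<RECORD ID="31366"><ch>0</ch><sj>1956</sj><content>1 title</content></RECORD>
</data>"""


def merge_content_into_title(record, sep=" - "):
    """Fold every <content> value of a record into its <title>."""
    contents = record.findall("content")
    if not contents:
        return
    title = record.find("title")
    if title is None:
        # No title yet: rename the first <content> in place so it keeps its position.
        title = contents.pop(0)
        title.tag = "title"
    # Append the remaining content values to the title text.
    title.text = sep.join([title.text or ""] + [c.text or "" for c in contents])
    for c in contents:
        record.remove(c)


root = ET.fromstring(XML)
for record in root.iter("RECORD"):
    merge_content_into_title(record)
print(ET.tostring(root, encoding="unicode"))
```

Unlike the stylesheet, this keeps the title where it was (or where the first content element was) instead of moving it to the first position.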

How to get the required values from the below mentioned xml file?

I want to read the XML file below and access its values. I have already tried many approaches but was not able to access them. For example, I want the 'NightRaidPerformanceCPUScore' value and the passIndex it belongs to.
<?xml version='1.0' encoding='utf8'?>
<benchmark>
<results>
<result>
<name />
<description />
<passIndex>-1</passIndex>
<sourceId>C:\Users\dgadhipx\Documents\3DMark\3dmark-autosave-20200401155825.3dmark-result</sourceId>
<NightRaidPerformance3DMarkScore>2066</NightRaidPerformance3DMarkScore>
<NightRaidPerformanceCPUScore>1454</NightRaidPerformanceCPUScore>
<NightRaidPerformanceGraphicsScore>2233</NightRaidPerformanceGraphicsScore>
<benchmarkRunId>8045dec5-e97c-452b-abeb-54af187fd50a</benchmarkRunId>
</result>
<result>
<name />
<description />
<passIndex>0</passIndex>
<sourceId>C:\Users\dgadhipx\Documents\3DMark\3dmark-autosave-20200401155825.3dmark-result</sourceId>
<NightRaidPerformanceCPUScoreForPass>1454</NightRaidPerformanceCPUScoreForPass>
<NightRaidPerformance3DMarkScoreForPass>2066</NightRaidPerformance3DMarkScoreForPass>
<NightRaidPerformanceGraphicsScoreForPass>2233</NightRaidPerformanceGraphicsScoreForPass>
<NightRaidPerformanceGraphicsTest1>9.57</NightRaidPerformanceGraphicsTest1>
<NightRaidPerformanceGraphicsTest2>12.18</NightRaidPerformanceGraphicsTest2>
<NightRaidCpuP>395.2</NightRaidCpuP>
<benchmarkRunId>8045dec5-e97c-452b-abeb-54af187fd50a</benchmarkRunId>
</result>
</results>
</benchmark>
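You can use BeautifulSoup as follows:
from bs4 import BeautifulSoup

file_path = "results.xml"  # placeholder path; point this at your own file
with open(file_path, "r") as f:
    content = f.read()
# the 'xml' parser requires lxml to be installed
xml = BeautifulSoup(content, 'xml')
elements = xml.find_all("NightRaidPerformanceCPUScore")
for i in elements:
    print(i.text)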
That prints the values of all "NightRaidPerformanceCPUScore" tags.
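The question also asks which passIndex a score belongs to. One possible way to get both, sketched here with the standard library's ElementTree rather than BeautifulSoup, is to walk the result elements and read the two children together. The helper name scores_by_pass is an assumption for the example, and the XML is a trimmed copy of the file from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the benchmark file from the question.
XML = """<benchmark>
<results>
<result>
<passIndex>-1</passIndex>
<NightRaidPerformanceCPUScore>1454</NightRaidPerformanceCPUScore>
</result>
<result>
<passIndex>0</passIndex>
<NightRaidPerformanceCPUScoreForPass>1454</NightRaidPerformanceCPUScoreForPass>
</result>
</results>
</benchmark>"""


def scores_by_pass(root, tag):
    """Return (passIndex, value) pairs for every <result> that contains <tag>."""
    pairs = []
    for result in root.iter("result"):
        value = result.findtext(tag)
        if value is not None:
            pairs.append((result.findtext("passIndex"), value))
    return pairs


root = ET.fromstring(XML)
print(scores_by_pass(root, "NightRaidPerformanceCPUScore"))  # [('-1', '1454')]
```

For a file on disk, ET.parse(path).getroot() would replace ET.fromstring(XML).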
