I have an XML tree and I would like to replace the sub-elements in the XML structure.
This is the actual XML tree read from the file:
xml_data = ET.parse('file1.xml')
<?xml version='1.0' encoding='UTF-8'?>
<call method="xxxx" callerName="xxx">
<credentials login="" password=""/>
<filters>
<accounts>
<account code="" ass="" can=""/>
</accounts>
</filters>
</call>
I expect the following output, produced by looping over this list:
a = [1,23453, 3543,4354,3455, 6345]
<?xml version='1.0' encoding='UTF-8'?>
<call method="xxxx" callerName="xxx">
<credentials login="" password=""/>
<filters>
<accounts>
<account code="1" ass="34" can="yes"/>
<account code="23453" ass="34" can="yes"/>
<account code="3543" ass="34" can="yes"/>
<account code="4354" ass="34" can="yes"/>
<account code="3455" ass="34" can="yes"/>
<account code="6345" ass="34" can="yes"/>
</accounts>
</filters>
</call>
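I'm new to XML parsing. Any help would be appreciated. Thanks in advance.
I'm new to Python, but this works. I found only one limitation: the length of a must equal the number of account elements already in the file, because the loop only updates existing elements. I hope this helps. Good luck.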
from xml.etree import ElementTree as ET
a = [1, 23453, 3543, 4354, 3455, 6345]
code = 0
xmlfile = "./log/logs.xml"
tree = ET.parse(xmlfile)
root = tree.getroot()
for filters in root.findall("filters"):
    for accounts in filters.findall("accounts"):
        for account in accounts.findall("account"):
            attributes = account.attrib
            attributes["code"] = str(a[code])
            attributes["ass"] = "34"
            attributes["can"] = "yes"
            code += 1
tree.write(xmlfile)
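If the file holds only a single placeholder account, as in the question, the loop above fills in just that one element. A minimal sketch of an alternative, assuming the goal is one account element per list value: remove the placeholders and create new elements with ET.SubElement. The helper name fill_accounts and the inlined sample XML are illustrative, not part of the original answer.

```python
from xml.etree import ElementTree as ET

# Sample input mirroring the question's file, inlined so the sketch is self-contained.
XML_TEXT = """<call method="xxxx" callerName="xxx">
<credentials login="" password=""/>
<filters>
<accounts>
<account code="" ass="" can=""/>
</accounts>
</filters>
</call>"""


def fill_accounts(root, codes, ass="34", can="yes"):
    """Replace the children of <accounts> with one <account> per code (hypothetical helper)."""
    accounts = root.find("./filters/accounts")
    # Remove the placeholder <account> elements that are already present.
    for old in list(accounts):
        accounts.remove(old)
    # Create a fresh <account> for every value in the list.
    for code in codes:
        ET.SubElement(accounts, "account", {"code": str(code), "ass": ass, "can": can})
    return root


a = [1, 23453, 3543, 4354, 3455, 6345]
root = fill_accounts(ET.fromstring(XML_TEXT), a)
print(ET.tostring(root, encoding="unicode"))
```

To work on a real file, the same helper could be applied to ET.parse(path).getroot() and the tree written back with tree.write(path).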
I'm using XSLT / XPath to browse some of the XML files you get when you unzip an Excel file. I found a "relationships" file workbook.xml.rels that I don't seem to be able to read, using code similar to that which successfully read the workbook.xml file.
Here's some of the workbook.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
...
<sheets>
<sheet name="Sheet1"
sheetId="2"
r:id="rId1"/>
<sheet name="Test Sheet"
sheetId="1"
r:id="rId2"/>
</sheets>
...
</workbook>
Here's the workbook.xml.rels file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId3"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme"
Target="theme/theme1.xml"/>
<Relationship Id="rId2"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
Target="worksheets/sheet2.xml"/>
<Relationship Id="rId1"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
Target="worksheets/sheet1.xml"/>
<Relationship Id="rId5"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings"
Target="sharedStrings.xml"/>
<Relationship Id="rId4"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
Target="styles.xml"/>
</Relationships>
Here's some of the XSLT:
<?xml version="1.0"?>
<!-- greeting.xsl -->
<xsl:stylesheet
...
<xsl:output method="text"/>
<xsl:variable name="baseDir" select="replace(document-uri(.), '(.*[\\/]xl).*', '$1/')"/>
<xsl:variable name="workbook" select="concat($baseDir, 'workbook.xml')"/>
<xsl:variable name="theSheetId" select="doc($workbook)/workbook/sheets/sheet[matches(@name, 'Test Sheet')]/@r:id"/>
<xsl:variable name="workbook_rels" select="concat($baseDir, '_rels/workbook.xml.rels')"/>
<!-- code to read workbook.xml.rels -->
<xsl:variable name="theSheet" select="doc($workbook_rels)/Relationships/Relationship[matches(@Id, $theSheetId)]/@Target"/>
<xsl:template match="/">
<xsl:text>
baseDir = </xsl:text><xsl:value-of select="$baseDir"/>
<xsl:text>
workbook = </xsl:text><xsl:value-of select="$workbook"/>
<xsl:text>
workbook_rels = </xsl:text><xsl:value-of select="$workbook_rels"/>
<xsl:text>
theSheetId = </xsl:text><xsl:value-of select="$theSheetId"/>
<xsl:text>
theSheet = </xsl:text><xsl:value-of select="$theSheet"/>
<xsl:text>
end</xsl:text>
</xsl:template>
</xsl:stylesheet>
And the output:
baseDir = file:/C:/Training/sandbox/conv_/xl/
workbook = file:/C:/Training/sandbox/conv_/xl/workbook.xml
workbook_rels = file:/C:/Training/sandbox/conv_/xl/_rels/workbook.xml.rels
theSheetId = rId2
theSheet = **<I get nothing here>**
end
You can see that the 'theSheetId' variable is correctly set when reading workbook.xml. But when I use that variable to get the corresponding Target value from workbook.xml.rels into the 'theSheet' variable, I get nothing. I tried replacing the matches expression with just a number but I still get nothing. Is there a problem with reading this type of file?
Suggestions? Thanks!
The use of matches and replace suggests you are using an XSLT 2 or 3 processor, where you can certainly declare xpath-default-namespace. You just have to understand that you must change it in the sections that deal with elements from a different namespace, e.g. <xsl:variable name="theSheet" select="doc($workbook_rels)/Relationships/Relationship[matches(@Id, $theSheetId)]/@Target" xpath-default-namespace="http://schemas.openxmlformats.org/package/2006/relationships"/>.
Given the samples I would rather use a key <xsl:key name="rel" match="Relationships/Relationship" use="@Id" xpath-default-namespace="http://schemas.openxmlformats.org/package/2006/relationships"/> and then use <xsl:variable name="theSheet" select="key('rel', $theSheetId, doc($workbook_rels))/@Target"/>, but the use of xpath-default-namespace to declare the relevant namespace when selecting elements from a particular document is probably what is missing in your XSLT.
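The same namespace effect can be reproduced outside XSLT. The following is only an illustrative sketch in Python's xml.etree.ElementTree, not part of the XSLT solution: an unqualified element name matches nothing in the relationships file, while a name bound to the relationships namespace does. The prefix "rel" and the shortened sample document are assumptions made for the example.

```python
from xml.etree import ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Shortened copy of workbook.xml.rels from the question.
RELS_XML = f"""<Relationships xmlns="{RELS_NS}">
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet2.xml"/>
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
</Relationships>"""

root = ET.fromstring(RELS_XML)

# Unqualified name: this looks for elements in no namespace, so nothing matches.
unqualified = root.findall("Relationship")

# Qualified name: the (hypothetical) prefix "rel" is bound to the relationships namespace.
ns = {"rel": RELS_NS}
qualified = root.findall("rel:Relationship[@Id='rId2']", ns)
target = qualified[0].get("Target")

print(len(unqualified), target)
```

The first lookup comes back empty for the same reason the theSheet variable is empty in the stylesheet: the path names elements without their namespace.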
I am looking to parse this API response:
<export_response xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://cakemarketing.com/api/4/">
<success>true</success>
<row_count>3</row_count>
<fruits>
<fruit>
<fruit_id>178</fruit_id>
<filters>
<filter>
<filter_id>231</filter_id>
<filter_type>
<filter_type_id>70</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string>CA|NY|AZ</param_string>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
<filter>
<filter_id>237</filter_id>
<filter_type>
<filter_type_id>90</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string>Poor &lt; 5</param_string>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
</filters>
</fruit>
<fruit>
<fruit_id>178</fruit_id>
<filters>
<filter>
<filter_id>231</filter_id>
<filter_type>
<filter_type_id>70</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string>CA|NY|AZ</param_string>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
<filter>
<filter_id>237</filter_id>
<filter_type>
<filter_type_id>90</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string>Poor &lt; 5</param_string>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
</filters>
</fruit>
<fruit>
<fruit_id>178</fruit_id>
<filters>
<filter>
<filter_id>231</filter_id>
<filter_type>
<filter_type_id>70</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string>CA|NY|AZ</param_string>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
<filter>
<filter_id>237</filter_id>
<filter_type>
<filter_type_id>90</filter_type_id>
</filter_type>
<param_number xsi:nil="true"/>
<param_string xsi:nil="true"/>
<param_date xsi:nil="true"/>
<param_bool xsi:nil="true"/>
</filter>
</filters>
</fruit>
</fruits>
</export_response>
The goal is to find the fruit_id when the <param_string> elements inside it equal 'Poor < 5' and 'CA|NY|AZ'. This should return fruit_id 176 and 178.
I've tried using findall() as well as .find(text='CA|NY|AZ Poor < 5') and I am not able to locate the correct fruit_ids.
Any suggestions are welcome. Thank you in advance.
With bs4 4.7.1+ you can use a combination of :has and :contains, along with an adjacent sibling combinator (+), to specify the pattern. Chaining the :has patterns for both search terms acts as a logical AND, so only fruit_id elements where both strings are present are selected. I assume you wanted the unique ids, so I used a set. (Newer Soup Sieve releases deprecate :contains in favor of :-soup-contains.)
from bs4 import BeautifulSoup as bs
xml = '''your xml goes here'''
soup = bs(xml, 'lxml')
unique_ids = {i.text for i in soup.select('fruit_id:has(+ filters param_string:contains("Poor < 5")):has(+ filters param_string:contains("CA|NY|AZ"))')}
print(unique_ids)
You don't actually have a 176, but nonetheless here's a way to capture the id and parameters in a list; you can then test the list elements and operate on them as needed.
from bs4 import BeautifulSoup

# data = your xml from above
soup = BeautifulSoup(data, "lxml")
for f in soup.find_all('fruit'):
    test_list = []
    test_list.append(f.fruit_id.text)
    for ps in f.find_all('filter'):
        test_list.append(ps.find('param_string').text)
    print(test_list)
    # do your validations here.....after each iteration
produces:
['178', 'CA|NY|AZ', 'Poor < 5']
['178', 'CA|NY|AZ', 'Poor < 5']
['178', 'CA|NY|AZ', '']
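The validation step left open above could look like the following. This is a minimal sketch using the standard library's xml.etree.ElementTree instead of BeautifulSoup, assuming the response is well-formed XML (so "<" appears as &lt;) and that a fruit qualifies only when both strings occur among its param_string values. The function name matching_fruit_ids, the prefix "c", the shortened sample, and the second fruit_id 179 are all made up for illustration.

```python
from xml.etree import ElementTree as ET

# The response uses a default namespace, so lookups need a prefix bound to it.
NS = {"c": "http://cakemarketing.com/api/4/"}
WANTED = {"CA|NY|AZ", "Poor < 5"}

# Shortened stand-in for the API response; fruit_id 179 is invented for the example.
XML_TEXT = """<export_response xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://cakemarketing.com/api/4/">
<fruits>
<fruit><fruit_id>178</fruit_id><filters>
<filter><param_string>CA|NY|AZ</param_string></filter>
<filter><param_string>Poor &lt; 5</param_string></filter>
</filters></fruit>
<fruit><fruit_id>179</fruit_id><filters>
<filter><param_string>CA|NY|AZ</param_string></filter>
<filter><param_string xsi:nil="true"/></filter>
</filters></fruit>
</fruits>
</export_response>"""


def matching_fruit_ids(xml_text, wanted=WANTED):
    """Return the set of fruit_id values whose filters contain every wanted string."""
    root = ET.fromstring(xml_text)
    ids = set()
    for fruit in root.iterfind(".//c:fruit", NS):
        # Collect the text of every param_string under this fruit (None for nil elements).
        strings = {ps.text for ps in fruit.iterfind(".//c:param_string", NS)}
        if wanted <= strings:
            ids.add(fruit.findtext("c:fruit_id", namespaces=NS))
    return ids


print(matching_fruit_ids(XML_TEXT))
```

Returning a set keeps the ids unique, which matters for the question's data where the same fruit_id appears in several fruit elements.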
Let's say this is my XML:
<sky class="new">
<list name="school">
<p>63</p>
<p>62</p>
<p>61</p>
</list>
</sky>
And these are my values in a list:
value = [51,56,87]
Now what I need is:
<sky class="new">
<list name="school">
<p>51</p>
<p>56</p>
<p>87</p>
</list>
</sky>
So far this is what I did:
for i in soup.find_all('sky', {'class':'new'}):
    k = i.find('list',{'name':'school'})
After this I don't know what to do next. Could you help?
EDIT1:
<sky class="new">
<list name="alpha">
<item>
<p unit="kg">63</p>
<p weight="wg">54</p>
</item>
<item>
<p unit="kg">57</p>
<p weight="wg">32</p>
</item>
</list>
</sky>
Another version:
from bs4 import BeautifulSoup
txt = '''<sky class="new">
<list name="school">
<p>63</p>
<p>62</p>
<p>61</p>
</list>
</sky>'''
soup = BeautifulSoup(txt, 'xml')
values = [51, 56, 87]
for p, new_value in zip(soup.select('sky.new > list[name="school"] > p'), values):
    p.string = str(new_value)
print(soup)
Prints:
<?xml version="1.0" encoding="utf-8"?>
<sky class="new">
<list name="school">
<p>51</p>
<p>56</p>
<p>87</p>
</list>
</sky>
Try something like this:
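targets = soup.select('p')
# enumerate gives the position directly; targets.index(target) compares tags
# by content and can return the wrong position when two <p> tags look alike
for i, target in enumerate(targets):
    repl = str(value[i])
    target.string.replace_with(repl)
soup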
Output:
<html><body><sky class="new">
<list name="school">
<p>51</p>
<p>56</p>
<p>87</p>
</list>
</sky></body></html>
1) I want to read the XML file below and access its values. I have already tried many approaches but was not able to access them. For example, I want the 'NightRaidPerformanceCPUScore' value and the passIndex it belongs to.
<?xml version='1.0' encoding='utf8'?>
<benchmark>
<results>
<result>
<name />
<description />
<passIndex>-1</passIndex>
<sourceId>C:\Users\dgadhipx\Documents\3DMark\3dmark-autosave-20200401155825.3dmark-result</sourceId>
<NightRaidPerformance3DMarkScore>2066</NightRaidPerformance3DMarkScore>
<NightRaidPerformanceCPUScore>1454</NightRaidPerformanceCPUScore>
<NightRaidPerformanceGraphicsScore>2233</NightRaidPerformanceGraphicsScore>
<benchmarkRunId>8045dec5-e97c-452b-abeb-54af187fd50a</benchmarkRunId>
</result>
<result>
<name />
<description />
<passIndex>0</passIndex>
<sourceId>C:\Users\dgadhipx\Documents\3DMark\3dmark-autosave-20200401155825.3dmark-result</sourceId>
<NightRaidPerformanceCPUScoreForPass>1454</NightRaidPerformanceCPUScoreForPass>
<NightRaidPerformance3DMarkScoreForPass>2066</NightRaidPerformance3DMarkScoreForPass>
<NightRaidPerformanceGraphicsScoreForPass>2233</NightRaidPerformanceGraphicsScoreForPass>
<NightRaidPerformanceGraphicsTest1>9.57</NightRaidPerformanceGraphicsTest1>
<NightRaidPerformanceGraphicsTest2>12.18</NightRaidPerformanceGraphicsTest2>
<NightRaidCpuP>395.2</NightRaidCpuP>
<benchmarkRunId>8045dec5-e97c-452b-abeb-54af187fd50a</benchmarkRunId>
</result>
</results>
</benchmark>
You can use BeautifulSoup as follows:
from bs4 import BeautifulSoup

# file_path is the path to your XML file
with open(file_path, "r") as f:
    content = f.read()
xml = BeautifulSoup(content, 'xml')
elements = xml.find_all("NightRaidPerformanceCPUScore")
for i in elements:
    print(i.text)
That will print the values of all "NightRaidPerformanceCPUScore" tags.
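The question also asks which passIndex the score belongs to. A minimal sketch of one way to get both, using xml.etree.ElementTree from the standard library and assuming each result element carries its own passIndex child: walk the result elements and pair the score with the passIndex found in the same element. The function name scores_by_pass and the trimmed sample are illustrative.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the benchmark file from the question.
XML_TEXT = """<benchmark>
<results>
<result>
<passIndex>-1</passIndex>
<NightRaidPerformanceCPUScore>1454</NightRaidPerformanceCPUScore>
</result>
<result>
<passIndex>0</passIndex>
<NightRaidPerformanceCPUScoreForPass>1454</NightRaidPerformanceCPUScoreForPass>
</result>
</results>
</benchmark>"""


def scores_by_pass(xml_text, tag="NightRaidPerformanceCPUScore"):
    """Return (passIndex, score) pairs for every <result> that contains the tag."""
    root = ET.fromstring(xml_text)
    found = []
    for result in root.iterfind("./results/result"):
        score = result.findtext(tag)  # None when this result has no such child
        if score is not None:
            found.append((int(result.findtext("passIndex")), int(score)))
    return found


print(scores_by_pass(XML_TEXT))
```

For a file on disk, ET.parse(file_path).getroot() could replace ET.fromstring in the helper.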
I want to parse a file which looks like this:
<item> <one-of> <item> deepa vats </item> <item> deepa <ruleref uri="#Dg-e_n_t41"/> </item> </one-of> <tag> out = "u-dvats"; </tag> </item>
<item> <one-of> <item> maitha al owais </item> <item> doctor maitha </item> <item> maitha <ruleref uri="#Dg-clinical_nutrition24"/> </item> </one-of> <tag> out = "u-mal_owais"; </tag> </item>
The result should be username : out, for example:
deepa vats : u-dvats and maitha al owais : u-mal_owais
To extract the username I tried:
print ([j for i,j in re.findall(r"(<item>)\s*(.*?)\s*(?!\1)(?:</item>)",line)])
if len(list1) != 0:
    print(list1[0].split("<item>")[-1])
You can parse the xml with objectify from lxml.
To parse an XML string you could use objectify.fromstring(). Then you can use dot notation or square bracket notation to navigate through the element and use the text property to get the text inside the element. Like so:
item = objectify.fromstring(item_str)
item_text = item.itemchild['anotherchild'].otherchild.text
From there you can manipulate the string and format it.
In this case I can see that you want the text inside item >> one-of >> item and the text inside item >> tag. In order to get it we could do something like this:
>>> from lxml import objectify
>>> item_str = '<item> <one-of> <item> maitha al owais </item> <item> doctor maitha </item> <item> maitha <ruleref uri="#Dg-clinical_nutrition24"/> </item> </one-of> <tag> out = "u-mal_owais"; </tag> </item>'
>>> item = objectify.fromstring(item_str)
>>> item_text = item['one-of'].item.text
>>> tag_text = item['tag'].text
>>> item_text
' maitha al owais '
>>> tag_text
' out = "u-mal_owais"; '
Since Python doesn't allow hyphens in identifiers, one-of cannot be reached with dot notation, and since tag is already a property of the objectify object, item.tag would not return the child element. In both cases you have to use bracket notation instead of dot notation.
I suggest using BeautifulSoup:
import bs4
soup = bs4.BeautifulSoup(your_text, "lxml")
' '.join(x.strip() for x in soup.strings if x.strip())
#'deepa vats deepa out = "u-dvats"; maitha al owais doctor maitha maitha out = "u-mal_owais";'
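To get from the raw strings to the requested username : out pairs, one possible sketch with the standard library follows. It assumes every line of the file is a single well-formed item element, that the first inner item holds the username, and that the tag text always has the form out = "..."; the helper name username_and_out is illustrative.

```python
import re
from xml.etree import ElementTree as ET

# The two sample lines from the question.
LINES = [
    '<item> <one-of> <item> deepa vats </item> <item> deepa <ruleref uri="#Dg-e_n_t41"/> </item> </one-of> <tag> out = "u-dvats"; </tag> </item>',
    '<item> <one-of> <item> maitha al owais </item> <item> doctor maitha </item> <item> maitha <ruleref uri="#Dg-clinical_nutrition24"/> </item> </one-of> <tag> out = "u-mal_owais"; </tag> </item>',
]

# Captures the quoted value in: out = "u-dvats";
OUT_RE = re.compile(r'out\s*=\s*"([^"]*)"')


def username_and_out(line):
    """Parse one line and return (username, out value)."""
    item = ET.fromstring(line)
    # findtext returns the text of the first matching element only.
    username = item.findtext("./one-of/item").strip()
    match = OUT_RE.search(item.findtext("./tag"))
    return username, match.group(1)


for line in LINES:
    print("{} : {}".format(*username_and_out(line)))
```

Parsing the structure first and applying a small regular expression only to the tag text avoids matching nested item elements with a single pattern, which is where the regex in the question runs into trouble.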