How to access xml field with lxml?

How to access xml field with lxml? - python-3.x

Python 3.6, Lxml, Windows 10
I am getting crazy. I want to access the item field. But I always get the error:
AttributeError: 'cython_function_or_method' object has no attribute'item'
Everything else (address fields etc...) I can access without problems. How can I access the item fields (sku, amount etc...)?
I've used this code:
import requests
from lxml import objectify
url = "URL_TO_XML_FILE"
xml_content = requests.get(url).text.encode('utf-8')
xml = objectify.fromstring(xml_content)
for sale in xml.response.sales.sale:
for item in sale.items.item:
print(item.sku)
Here is the beginning of the xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<getnewsalesresult xmlns="https://pmcdn.priceminister.com/res/schema/getnewsales">
<request>
<version>2017-08-07</version>
<user>SELLER</user>
</request>
<response>
<lastversion>2017-08-07</lastversion>
<sellerid>95029358</sellerid>
<sales>
<sale>
<purchaseid>297453287592813953</purchaseid>
<purchasedate>15/12/2018-19:10</purchasedate>
<deliveryinformation>
<shippingtype>Normal</shippingtype>
<isfullrsl>N</isfullrsl>
<purchasebuyerlogin><![CDATA[LOGIN]]></purchasebuyerlogin>
<purchasebuyeremail>EMAIL</purchasebuyeremail>
<deliveryaddress>
<civility>Mme</civility>
<lastname><![CDATA[Lastname]]></lastname>
<firstname><![CDATA[Firstname]]></firstname>
<address1><![CDATA[STREET]]></address1>
<address2><![CDATA[]]></address2>
<zipcode>13570</zipcode>
<city><![CDATA[Paris]]></city>
<country><![CDATA[France]]></country>
<countryalpha2>FX</countryalpha2>
<phonenumber1></phonenumber1>
<phonenumber2>PHONENUMBER</phonenumber2>
</deliveryaddress>
</deliveryinformation>
<items>
<item>
<sku><![CDATA[SKU1]]></sku>
<advertid>411812243030</advertid>
<advertpricelisted>
<amount>15.99</amount>
<currency>EUR</currency>
</advertpricelisted>
<itemid>551131040</itemid>
<headline><![CDATA[HEADLINE]]></headline>
<itemstatus><![CDATA[REQUESTED]]></itemstatus>
<ispreorder>N</ispreorder>
<isnego>N</isnego>
<negotiationcomment></negotiationcomment>
<price>
<amount>15.99</amount>
<currency>EUR</currency>
</price>
<isrsl>N</isrsl>
<isbn></isbn>
<ean>4363745894373857474; </ean>
<paymentstatus><![CDATA[INCOMING]]></paymentstatus>
<sellerscore></sellerscore>
</item>
</items>
</sale>
<sale>

The problem is that items is actually a method of ObjectifiedElement, so the expression sale.items actually returns the method, because it has precedence.
To get the 'items' object you want, you have to be more explicit about getting the attribute of sale and not looking for methods of the class first, which is the usual python order. This is what python does behind the scene when you access an attribute, and you can do it too:
sale.__getattr__('items')
This will also work (it's a dictionary-like interface to the attributes of an object):
sale.__dict__['items']
The revised code:
import requests
from lxml import objectify
url = "URL_TO_XML_FILE"
xml_content = requests.get(url).text.encode('utf-8')
xml = objectify.fromstring(xml_content)
for sale in xml.response.sales.sale:
for item in sale.__dict__['items'].item:
print(item.sku)

Another way to deal with this is to avoid using the flaky attribute interface:
for sale in xml['response']['sales']['sale']:
for item in sale['items']['item']:
print(item['sku'])
Using the dict-like indexing interface, you never have to worry about certain attributes names (which includes such common words as items, index, keys, remove, replace, tag, set, text, and values) returning surprising results.

Related

How to resolve: XML schema created in Excel contains denormalized data

Edit/Update: By removing the <GrpHdr> element completely, Excel was able to verify the XML Map as exportable. My original question still remains, how can I solve the "Denormalized Data" error, with the <GrpHdr> included.
I am new to XML, and have been trying to import a source file (XML below) into Excel, create a schema/XML Map (unsure of the difference there) which I can then drag and drop onto two different tables:
One table contains one row of data for the Group Header: <GrpHdr> (Occurs ONCE)
One table contains multiple rows of data for the various Payments: <PmtInf> (Occurs MULTIPLE times)
I am able to successfully load the below XML into Excel using the Source button, and also to create an XML map off of it (which then appears in a "XML Source" window, showing the parent and child elements).
The problem I am having is in Verifying the XML Map for export. Excel says that the map contains "Denormalized Data". I have looked at various Microsoft resources, as well as on Stack Overflow.
Such as:
https://support.microsoft.com/en-us/office/issue-verifying-an-xml-map-for-export-fbfcdb77-c2d6-4040-b256-e584a71151b0
excel: Cannot save or export xml data. The xml map in this workbook are not exportable
Export denormalized data from excel to xml
Based on my research, I tried the following:
I have tried setting the MinOccurs and MaxOccurs attributes to be "0" and "unbounded" respectively, as I believe the default is "1" for both, and Excel's error saying that the XML Map contains "Denormalized Data" is due to having an element with the MaxOccurs set to "1".
I have also tried adding multiple <PmtInf> elements, so Excel knows (when creating a schema from the below sample file), that <PmtInf> is to occur multiple times.
Thanks!
<?xml version="1.0" encoding="utf-8"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>UNIQUE MESSAGE ID 35 AN</MsgId>
<CreDtTm>2016-05-26T10:07:00</CreDtTm>
<NbOfTxs>1</NbOfTxs>
<CtrlSum>0.01</CtrlSum>
<InitgPty>
<Id>
<OrgId>
<Othr>
<Id>ABC12345678</Id>
</Othr>
</OrgId>
</Id>
</InitgPty>
</GrpHdr>
<PmtInf>
<PmtInfId>ORIGINATOR REFERENCE 35AN</PmtInfId>
<PmtMtd>TRF</PmtMtd>
<PmtTpInf>
<SvcLvl>
<Cd>SEPA</Cd>
</SvcLvl>
</PmtTpInf>
<ReqdExctnDt>2016-05-26</ReqdExctnDt>
<Dbtr>
<Nm>DEBTOR NAME 70AN</Nm>
<PstlAdr>
<StrtNm>Street Name</StrtNm>
<BldgNb>Building Number</BldgNb>
<PstCd>Post Code</PstCd>
<TwnNm>Town Name</TwnNm>
<CtrySubDvsn>County/State/Region</CtrySubDvsn>
<Ctry>LU</Ctry>
</PstlAdr>
</Dbtr>
<DbtrAcct>
<Id>
<IBAN>NL39HSBC0123456789</IBAN>
</Id>
</DbtrAcct>
<DbtrAgt>
<FinInstnId>
<BIC>HSBCNL2A</BIC>
<PstlAdr>
<Ctry>IE</Ctry>
</PstlAdr>
</FinInstnId>
</DbtrAgt>
<ChrgBr>SLEV</ChrgBr>
<CdtTrfTxInf>
<PmtId>
<InstrId>PAYMENT ID 35AN</InstrId>
<EndToEndId>UNIQUE BENEFICIARY REFERENCE 35AN</EndToEndId>
</PmtId>
<Amt>
<InstdAmt Ccy="EUR">0.01</InstdAmt>
</Amt>
<CdtrAgt>
<FinInstnId>
<BIC>MIDLGB22</BIC>
<PstlAdr>
<Ctry>GB</Ctry>
</PstlAdr>
</FinInstnId>
</CdtrAgt>
<Cdtr>
<Nm>CREDITOR NAME 70AN</Nm>
<PstlAdr>
<StrtNm>Street Name</StrtNm>
<BldgNb>Building Number</BldgNb>
<PstCd>Post Code</PstCd>
<TwnNm>Town Name</TwnNm>
<CtrySubDvsn>County/State/Region</CtrySubDvsn>
<Ctry>GB</Ctry>
</PstlAdr>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>GB94MIDL40123487654321</IBAN>
</Id>
</CdtrAcct>
<RmtInf>
<Ustrd>Remittance Info up to 140AN</Ustrd>
</RmtInf>
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>

Python lxml.etree: how to add 'xml:lang="en-US"' as a namespace

I am trying to create a xml whose first element is:
<speak version="1.0"
xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
</speak>
I am able to add the first attributes with...
from lxml.etree import Element, SubElement, QName, tostring
root = Element('speak', version="1.0",
xmlns="http://www.w3.org/2001/10/synthesis")
...but not the namespace xml:lang="en-US". Based on several tuto/question like this and this I tried many solutions but none worked.
For example, I tried this :
class XMLNamespaces:
xml = 'http://www.w3.org/2001/10/synthesis'
root.attrib[QName(XMLNamespaces.xml, 'lang')] = "en-US"
But the ouput is
<speak xmlns:ns0="http://www.w3.org/2001/10/synthesis" version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" ns0:lang="en-US">
How can I create the xml:lang="en-US" of my first xml element?

The special xml: prefix is associated with the http://www.w3.org/XML/1998/namespace URI.
The following code adds xml:lang="en-US" to the root element:
root.attrib[QName("http://www.w3.org/XML/1998/namespace", "lang")] = "en-US"

how to get child node value from a paren preferenced by another child value in Groovy readyAPI

I am trying to get values from a web service response in readyAPI, so i can pass it to another web service request, so i can create a automated test flow.
I have tried different code pieces most of them was a single line of code, which i prefer if it possible. I can take value from a node by typing the parent node by its attribute value. I also can get parent node by child nodes attribute value and use it to get another child value.
Here some examples:
First Format that I can use it to get childs value:
<webserviceResponse>
<documentslist>
<document #id="1">
<payment #currency="USD" >
<amount>1250.00</amount>
</payment>
</document>
<document #id="2">
<payment #currency="JPY" >
<amount>150.00</amount>
</payment>
</document>
<document #id="3">
<payment #currency="EUR" >
<amount>1170.00</amount>
</payment>
</document>
<!-- etc. -->
</documentslist>
-----> To get currency for a specific document
def webServiceResponse = "webservice#Response"
int index=2
def currency = context.expand('${'+webServiceResponse+'//*:document[#id="['+index+']"]//*:payment/#currency}')
-----> Result of this is "JPY"
<webserviceResponse>
<documentslist>
<document #id="1">
<payment #currency="USD" >
<amount>1250.00</amount>
</payment>
<refund>true</refund>
</document>
<document #id="2">
<payment #currency="JPY" >
<amount>150.00</amount>
</payment>
</document>
<document #id="3">
<payment #currency="EUR" >
<amount>1170.00</amount>
</payment>
<refund>false</refund>
</document>
<!-- etc. -->
</documentslist>
-------> To get a currency dependent on existence of a specific node
In this example we are looking the file from up to down and we are finding every refund nodes,
and taking currency value that is in the same block with the second time we see a refund node.
def webServiceResponse = "webservice#Response"
int index=2
def currrency= context.expand('${'+webServiceResponse+'(//*:refund)['+index+']//parent::*//*:payment/#currency}')
--------> Result for this is "EUR"
This one is that i cant take child value with the same way.
<webserviceResponse>
<documentslist>
<document>
<key>D_Computer</key>
<currency>USD</currency>
<amount>1250.00</amount>
<refund>true</refund>
</document>
<document>
<key>D_Keyboard</key>
<currency>JPY</currency>
<amount>150.00</amount>
</document>
<document>
<key>D_Monitor</key>
<currency>EUR</currency>
<amount>1170.00</amount>
<refund>false</refund>
</document>
<!-- etc. -->
</documentslist>
My problem with this one it doesn't have any attributes, has only values of the nodes. I know that it doesnt have an integer by the way but maybe i am doing wrong that i dont realize.
I want to get the amount value only dependent to the "key" nodes value which i am going to specify in the script.
result should show :150.00

Thank you for the very detailed and well written question.
You can use the below. Your problem is easy as there are no namespace in it.
Technique is same which you have dispalyed, its just that you need not to use # as its for attributes
def groovyUtils=new com.eviware.soapui.support.GroovyUtils(context)
def xml=groovyUtils.getXmlHolder("NameOfRequest#Response");
def currency=xml.getNodeValue("//*:documentslist/*:document[key='${key}']/*:amount");
log.info "Value of $key is " + currency
key="D_Monitor"
currency=xml.getNodeValue("//*:documentslist/*:document[key='${key}']/*:amount");
log.info "Value of $key is " + currency
Replace NameOfRequest with your Request's name
There is an alternative way too. I will post it as a separate answer so not to cause confusion. This one is still better than other one

There is an alternate way of doing things using Hashmap if the other answer is not working due to namespaces in your XML
Try this method
We are getting all values first by using getNodeValues and then since we have pair we are putting in hashmap.
Now you can retrieve anything.
def groovyUtils=new com.eviware.soapui.support.GroovyUtils(context)
def xml=groovyUtils.getXmlHolder("Request1#Response");
def keys=xml.getNodeValues("//*:documentslist/*:document/*:key")
def amounts=xml.getNodeValues("//*:documentslist/*:document/*:amount")
log.info keys.toString()
log.info amounts.toString()
HashMap h1=[:]
// Add the pair into hashmap and then retrieve
for(int i=0;i<keys.size();i++)
{
h1.put(keys[i],amounts[i])
}
def whichone="D_Computer"
log.info "Value for $whichone is " + h1.get(whichone)
Lets say you want to retrieve more than one value then you can use arrays.
i.e. take arrays as key,currency,amount,refund
so if you want to retrieve the refund for a key='Z' So using a for loop you can know that Z is present at 3 location in the array
then your refund should be refund[3]. Similarly currency[3] and amount[3]
Both the answers have their own relevance

How to find elements that do not include a certain class name with selenium and python

I want to find all the elements that contain a certain class name but skip the ones the also contain another class name beside the one that i am searching for
I have the element <div class="examplenameA"> and the element <div class="examplenameA examplenameB">
At the moment i am doing this to overcome my problem:
items = driver.find_elements_by_class_name('examplenameA')
for item in items:
cname = item.get_attribute('class')
if 'examplenameB' in cname:
pass
else:
rest of code
I only want the elements that have the class name examplenameA and i want to skip the ones that also contain examplenameB

To find all the elements with class attribute as examplenameA leaving out the ones with class attribute as examplenameB you can use the following solution:
css_selector:
items = driver.find_elements_by_css_selector("div.examplenameA:not(.examplenameB)")
xpath:
items = driver.find_element_by_xpath("//div[contains(#class, 'examplenameA') and not(#class='examplenameB')]")

You can use xpath in this case. So as per your example you need to use something like driver.find_elements_by_xpath('//div[#class='examplenameA'). This will give you only the elements whose class is examplenameA
So how xpath works is : Xpath=//tagname[#attribute='value']
Hence the class is considered as the attribute & xpath will try to match the exact given value, in this case examplenameA, so <div class="examplenameA examplenameB"> will be ignored
In case of find_elements_by_class_name method, it will try to match the element which has the class as examplenameA, so the <div class="examplenameA examplenameB"> will also be matched
Hope this helps

Simple XSLT transformation into ABAP Object

I'm again stuck with a transformation from XML into ABAP. This time, I want to put the XML data directly into an ABAP Object.
My XML looks like this:
<qualityStatus>
<address>0</address>
<bounceRisk>0</bounceRisk>
<checked>1</checked>
<domain>1</domain>
<domainScores>
<domainScore>
<domain>gmx.de</domain>
<score>0.8333333134651184</score>
</domainScore>
<domainScore>
<domain>ggs.de</domain>
<score>0.6666666269302368</score>
</domainScore>
<domainScore>
<domain>xyz.de</domain>
<score>0.6666666269302368</score>
</domainScore>
</domainScores>
<extSyntax>1</extSyntax>
<mailserver>1</mailserver>
<mailserverDiagnosis>1</mailserverDiagnosis>
<probability>1</probability>
<syntax>1</syntax>
</qualityStatus>
Edit: I changed back to a XSLT transformation, shortened to one attribute it looks like this:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sap="http://www.sap.com/sapxsl" version="1.0">
<xsl:output encoding="iso-8859-1" indent="yes" method="xml" version="1.0"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/qualityStatus">
<asx:abap version="1.0" xmlns:asx="http://www.sap.com/abapxml">
<asx:values>
<ROOT href="#o26"/>
</asx:values>
<asx:heap xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:abap="http://www.sap.com/abapxml/types/built-in"
xmlns:cls="http://www.sap.com/abapxml/classes/global"
xmlns:dic="http://www.sap.com/abapxml/types/dictionary">
<cls:ZCL_ADDRESS_QUALITY id="o26" >
<local.ZCL_ADDRESS_QUALITY>
<W_ADDRESS>
<xsl:value-of select="address"/>
</W_ADDRESS>
<!--More attributes here-->
</local.ZCL_ADDRESS_QUALITY>
</cls:ZCL_ADDRESS_QUALITY>
</asx:heap>
</asx:abap>
</xsl:template>
My object attributes are all public right now, because I thought this could be the problem. However, setter and getter do exist. Yes, my class does implement the interface if_serializable_object.
DATA:
w_address TYPE char1,
w_bouncerisk TYPE char1,
w_checked TYPE char1,
w_decoded TYPE stringval,
w_domain TYPE char1,
w_domainscores TYPE z_domainscore_t, "Table type for name + score
w_extsyntax TYPE char1,
w_mailserver TYPE char1,
w_mailserverdiagnosis TYPE char1,
w_probability TYPE char1,
w_syntax TYPE char1,
w_syntaxwarnings TYPE z_syntaxwarnings_t. "Table of syntaxwarnings
Finally, I call my transformation with an instance of my class:
CALL TRANSFORMATION zst_addressquality
SOURCE XML lw_xml
RESULT result = lo_addressquality.
Now, when debugging through the transformation code, it successfully notices all fields of the given lw_xml and appears to write them into the object lo_addressquality. But the object attributes stay empty afterwards.
When testing the serialization, I can access result which contains my object, but result-w_address (and all others) are empty.
While testing, I created a structure with completely identical names and types. With it, it worked as intended.
What am I missing? Is there anything else I have to watch out for when working with transformation into ABAP Objects?
_Edit: After changing to the XSLT, I can get until W_ADDRESS before my code throws an CX_XSLT_ABAP_CALL_ERROR. So, I'm still not able to access the object'S attributes properly. :|_

Objects can be serialized/deserialized only with an XSL transformation. It's not possible to do it with a simple transformation, dixit ABAP documentation:
ST programs are restricted to the transformation of elementary and structured ABAP data, along with internal tables. The transformation of reference variables and referenced objects is not currently supported.
The XSL transformation must convert the XML into ASXML, which in short corresponds to a structure like this:
<?xml ...?>
<asx:abap xmlns:asx="http://www.sap.com/abapxml" version="1.0">
<asx:values>
...
</asx:values>
<asx:heap>
...
</asx:heap>
</asx:abap>
The easiest way to understand what the ASXML should look like is to serialize your object reference using the identity transformation (it's an XSL transformation), and then adapt your transformation to produce the same kind of asXML:
CALL TRANSFORMATION id SOURCE anyRootName = yourObjectReference RESULT XML asXMLutf8xstring.
Example:
REPORT.
CLASS serialization_demo DEFINITION.
PUBLIC SECTION.
INTERFACES if_serializable_object.
DATA attribute TYPE i.
ENDCLASS.
START-OF-SELECTION.
DATA obj_ref TYPE REF TO serialization_demo.
DATA xstring TYPE xstring.
CREATE OBJECT obj_ref.
obj_ref->attribute = 5.
CALL TRANSFORMATION id " serialize
SOURCE root = obj_ref
RESULT XML xstring.
CLEAR obj_ref.
CALL TRANSFORMATION id " deserialize
SOURCE XML xstring
RESULT root = obj_ref.
ASXML (in the xstring variable):
<?xml version="1.0" encoding="utf-8"?>
<asx:abap version="1.0" xmlns:asx="http://www.sap.com/abapxml">
<asx:values>
<ROOT href="#o3"/>
</asx:values>
<asx:heap xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:abap="http://www.sap.com/abapxml/types/built-in" xmlns:cls="http://www.sap.com/abapxml/classes/global" xmlns:dic="http://www.sap.com/abapxml/types/dictionary">
<prg:SERIALIZATION_DEMO id="o3" xmlns:prg="http://www.sap.com/abapxml/classes/program/ZZSRO_TEST16I">
<local.SERIALIZATION_DEMO>
<ATTRIBUTE>5</ATTRIBUTE>
</local.SERIALIZATION_DEMO>
</prg:SERIALIZATION_DEMO>
</asx:heap>
</asx:abap>

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to access xml field with lxml? - python-3.x

Related

How to resolve: XML schema created in Excel contains denormalized data

Python lxml.etree: how to add 'xml:lang="en-US"' as a namespace

how to get child node value from a paren preferenced by another child value in Groovy readyAPI

How to find elements that do not include a certain class name with selenium and python

Simple XSLT transformation into ABAP Object

Categories

Resources