XML Element Tree - appending to existing elements and attributes with ET.SubElement()? - python-3.x

I have the following function which builds up a re-usable XML SOAP envelope:
import xml.etree.ElementTree as ET
def get_xml_soap_envelope():
    """
    Returns a generically re-usable SOAP envelope in the following format:
    <soapenv:Envelope>
        <soapenv:Header/>
        <soapenv:Body />
    </soapenv:Envelope>
    """
    soapenvEnvelope = ET.Element('soapenv:Envelope')
    soapenvHeader = ET.SubElement(soapenvEnvelope, 'soapenv:Header')
    soapenvBody = ET.SubElement(soapenvEnvelope, 'soapenv:Body')
    return soapenvEnvelope
Fairly simple stuff so far.
I was wondering now, would it be possible to append attributes (such as xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance") to the soapenv:Envelope element?
And if I also wanted to append the following XML:
<urn:{AAction} soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<AUserName>{AUserName}</AUserName>
<APassword>{APassword}</APassword>
</urn:{AAction}>
To the soapenv:Body such that I would have something like this:
if __name__ == "__main__":
    soapenvEnvelope = get_xml_soap_envelope()
    actions = {
        'AAction': 'UserLogin',
    }
    soapAAction = ET.Element('urn:{AAction}'.format(**actions))
    soapenvEnvelope.AppendElement(soapAAction, 'soapenv:Body')
So, could I specify a target node and the element to append to it?

Let's start with the bad news: your function that creates the SOAP envelope
(get_xml_soap_envelope) is wrong, as it fails to declare at least
xmlns:soapenv="...".
All other namespaces you intend to use should also be declared here.
A proper function creating the SOAP envelope should be something like this:
def get_xml_soap_env():
    """
    Returns a generically re-usable SOAP envelope in the following format:
    <soapenv:Envelope xmlns:soapenv="...", ...>
        <soapenv:Header/>
        <soapenv:Body />
    </soapenv:Envelope>
    """
    ns = {'xmlns:soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
          'xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
          'xmlns:urn': 'http://dummy.urn'}
    env = ET.Element('soapenv:Envelope', ns)
    ET.SubElement(env, 'soapenv:Header')
    ET.SubElement(env, 'soapenv:Body')
    return env
Note that the ns dictionary also contains other namespaces that will be
needed later, among others the xsi namespace.
A possible alternative is to define ns outside of this function and pass it as
a parameter (your choice).
When I ran:
env = get_xml_soap_env()
print(ET.tostring(env, encoding='unicode', short_empty_elements=True))
the printout (reformatted by me for readability) was:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:urn="http://dummy.urn"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Header />
<soapenv:Body />
</soapenv:Envelope>
Note that this time proper namespaces are included.
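The question also asked whether attributes can be appended to an element that already exists. A minimal sketch using only the standard library: Element.set() adds (or overwrites) a single attribute, and the element's attrib dictionary can be updated with several at once. The URN is the same placeholder used above.

```python
import xml.etree.ElementTree as ET

# Envelope built without namespace declarations, as in the question.
env = ET.Element("soapenv:Envelope")
ET.SubElement(env, "soapenv:Header")
ET.SubElement(env, "soapenv:Body")

# Element.set() adds (or overwrites) one attribute on an existing element.
# "xmlns:..." keys are stored as plain attribute names; ElementTree just
# serializes them as written.
env.set("xmlns:soapenv", "http://schemas.xmlsoap.org/soap/envelope/")
env.set("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance")

# Several attributes at once: update the attrib dictionary directly.
env.attrib.update({"xmlns:urn": "http://dummy.urn"})  # placeholder URN

print(ET.tostring(env, encoding="unicode"))
```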
Then, to add the Action element and its children, define the following function:
def addAction(env, action, subelems):
    body = env.find('soapenv:Body')
    actn = ET.SubElement(body, f'soapenv:{action}')
    for k, v in subelems.items():
        child = ET.SubElement(actn, k)
        child.text = v
When I ran:
subelems = {'AUserName': 'Mark', 'APassword': 'Secret!'}
addAction(env, 'UserLogin', subelems)
and printed the whole XML tree again, the result was:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:urn="http://dummy.urn" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Header />
<soapenv:Body>
<soapenv:UserLogin>
<AUserName>Mark</AUserName>
<APassword>Secret!</APassword>
</soapenv:UserLogin>
</soapenv:Body>
</soapenv:Envelope>
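ElementTree's Element has no AppendElement method, but you can look up the target node yourself and call append() on it. The sketch below uses a hypothetical helper, append_to(), that finds the first element whose tag literally equals the given prefixed name. It also builds the action with the urn: prefix and the soapenv:encodingStyle attribute from the question, rather than the soapenv: prefix used above. The URN and credentials are the placeholder values from this thread.

```python
import xml.etree.ElementTree as ET

def append_to(root, target_tag, element):
    """Hypothetical helper: append `element` to the first node (root included)
    whose tag is exactly `target_tag`, e.g. 'soapenv:Body'. Returns that node."""
    for node in root.iter():
        if node.tag == target_tag:
            node.append(element)
            return node  # stop before the tree is iterated any further
    raise ValueError(f"no element with tag {target_tag!r}")

env = ET.Element("soapenv:Envelope", {
    "xmlns:soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
    "xmlns:urn": "http://dummy.urn",  # placeholder URN, as in the answer
})
ET.SubElement(env, "soapenv:Header")
ET.SubElement(env, "soapenv:Body")

# <urn:UserLogin soapenv:encodingStyle="..."> with two text children
actions = {"AAction": "UserLogin"}
action = ET.Element("urn:{AAction}".format(**actions), {
    "soapenv:encodingStyle": "http://schemas.xmlsoap.org/soap/encoding/",
})
for name, value in {"AUserName": "Mark", "APassword": "Secret!"}.items():
    ET.SubElement(action, name).text = value

append_to(env, "soapenv:Body", action)
print(ET.tostring(env, encoding="unicode"))
```

Matching on the literal prefixed tag works here because this tree is built with prefixed names; a parsed document would carry {uri}name tags instead.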

Related

Python XML ElementTree cannot iter(), find() or findall()

I can open an XML file and loop through the root, printing its children, but root.iter('tag'), root.find('tag') and root.findall('tag') will not work.
Here is a sample of the XML:
<?xml version='1.0' encoding='UTF-8'?>
<cpe-list xmlns:config="http://scap.nist.gov/schema/configuration/0.1" xmlns="http://cpe.mitre.org/dictionary/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.3" xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3" xmlns:ns6="http://scap.nist.gov/schema/scap-core/0.1" xmlns:meta="http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2" xsi:schemaLocation="http://scap.nist.gov/schema/cpe-extension/2.3 https://scap.nist.gov/schema/cpe/2.3/cpe-dictionary-extension_2.3.xsd http://cpe.mitre.org/dictionary/2.0 https://scap.nist.gov/schema/cpe/2.3/cpe-dictionary_2.3.xsd http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2 https://scap.nist.gov/schema/cpe/2.1/cpe-dictionary-metadata_0.2.xsd http://scap.nist.gov/schema/scap-core/0.3 https://scap.nist.gov/schema/nvd/scap-core_0.3.xsd http://scap.nist.gov/schema/configuration/0.1 https://scap.nist.gov/schema/nvd/configuration_0.1.xsd http://scap.nist.gov/schema/scap-core/0.1 https://scap.nist.gov/schema/nvd/scap-core_0.1.xsd">
<generator>
<product_name>National Vulnerability Database (NVD)</product_name>
<product_version>4.4</product_version>
<schema_version>2.3</schema_version>
<timestamp>2021-05-21T03:50:31.204Z</timestamp>
</generator>
<cpe-item name="cpe:/a:%240.99_kindle_books_project:%240.99_kindle_books:6::~~~android~~">
<title xml:lang="en-US">$0.99 Kindle Books project $0.99 Kindle Books (aka com.kindle.books.for99) for android 6.0</title>
<references>
<reference href="https://play.google.com/store/apps/details?id=com.kindle.books.for99">Product information</reference>
<reference href="https://docs.google.com/spreadsheets/d/1t5GXwjw82SyunALVJb2w0zi3FoLRIkfGPc7AMjRF0r4/edit?pli=1#gid=1053404143">Government Advisory</reference>
</references>
<cpe-23:cpe23-item name="cpe:2.3:a:\$0.99_kindle_books_project:\$0.99_kindle_books:6:*:*:*:*:android:*:*"/>
</cpe-item>
<cpe-item name="cpe:/a:%40thi.ng%2fegf_project:%40thi.ng%2fegf:-::~~~node.js~~">
<title xml:lang="en-US">@thi.ng/egf Project @thi.ng/egf for Node.js</title>
<references>
<reference href="https://github.com/thi-ng/umbrella/security/advisories/GHSA-rj44-gpjc-29r7">Advisory</reference>
<reference href="https://www.npmjs.com/package/@thi.ng/egf">Version</reference>
</references>
<cpe-23:cpe23-item name="cpe:2.3:a:\@thi.ng\/egf_project:\@thi.ng\/egf:-:*:*:*:*:node.js:*:*"/>
</cpe-item>
</cpe-list>
The following Python (3.7) code works:
import xml.etree.ElementTree as ET
infile = open(filename, "r")
xml = infile.read()
infile.close()
parser = ET.XMLParser(encoding="utf-8")
root = ET.fromstring(xml, parser=parser)
print(root.tag)
for child in root:
    print(child.tag)
Output:
{http://cpe.mitre.org/dictionary/2.0}cpe-list
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
...
But when I try:
for item in root.iter('cpe-item') or for item in root.iter('cpe-list'), nothing loops. When I try for item in root.findall('cpe-item') or for item in root.findall('cpe-list'), nothing loops either. If I try item = root.find('cpe-list'), item is None.
I don't work with XML very often, but this seems very strange to me, since I have example code from other projects where this works perfectly fine. Many other examples online show this exact process as the correct one.
What am I doing wrong?
It also seems odd to me that when I print(root.tag) or print(child.tag), something is printed before the tag name. I don't know why that is happening.
You are getting entangled with namespaces. A lot has been written about it and starting here may be a good place.
As for your specific example, the tl;dr is to disregard them altogether. For example (the {*} wildcard requires Python 3.8 or later):
for item in root.findall('.//{*}cpe-item'):
    print(item.tag)
Another option, which also works on Python 3.7, is to bite the bullet and declare the namespaces:
ns = {"xx": "http://cpe.mitre.org/dictionary/2.0"}
for item in root.findall('.//xx:cpe-item', ns):
    print(item.tag)
The output is:
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
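To see where the {...} prefix in the printed tags comes from, here is a small sketch on a trimmed-down copy of the question's document (only the default namespace is kept). ElementTree stores namespaced names in "Clark notation", {uri}localname, so a bare 'cpe-item' never matches. Spelling out the full name, or using a prefix map, does match, and both work on Python 3.7. The prefix cpe in the mapping is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's file: xmlns="..." puts every unprefixed
# element into the CPE dictionary namespace.
xml = """<cpe-list xmlns="http://cpe.mitre.org/dictionary/2.0">
  <cpe-item name="a"><title>First</title></cpe-item>
  <cpe-item name="b"><title>Second</title></cpe-item>
</cpe-list>"""

CPE = "http://cpe.mitre.org/dictionary/2.0"
root = ET.fromstring(xml)

# Qualified names are stored as {uri}localname.
print(root.tag)                  # {http://cpe.mitre.org/dictionary/2.0}cpe-list

# A bare local name never matches a namespaced element...
print(root.findall("cpe-item"))  # []

# ...but the full {uri}localname does.
items = root.findall(f"{{{CPE}}}cpe-item")
print([item.get("name") for item in items])  # ['a', 'b']

# Equivalent lookup through a prefix map; the prefix name is arbitrary.
ns = {"cpe": CPE}
titles = [t.text for t in root.findall("cpe:cpe-item/cpe:title", ns)]
print(titles)                    # ['First', 'Second']
```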

How to access xml text of child?

I have the following XML file (taken from here):
<BioSampleSet>
<BioSample submission_date="2011-12-01T13:31:02.367" last_update="2014-11-08T01:40:24.717" publication_date="2012-02-16T10:49:52.970" access="public" id="761094" accession="SAMN00761094">
<Ids>
</Ids>
<Package display_name="Generic">Generic.1.0</Package>
<Attributes>
<Attribute attribute_name="Individual">PK314</Attribute>
<Attribute attribute_name="condition">healthy</Attribute>
<Attribute attribute_name="BioSampleModel">Generic</Attribute>
</Attributes>
<Status status="live" when="2014-11-08T00:27:24"/>
</BioSample>
</BioSampleSet>
I need to access the text of each Attribute element inside Attributes, together with its attribute_name attribute.
I managed to access the values of attribute_name:
from Bio import Entrez, SeqIO
Entrez.email = '#'
import xml.etree.ElementTree as ET
handle = Entrez.efetch(db="biosample", id="SAMN00761094", retmode="xml", rettype="full")
tree = ET.parse(handle)
root = tree.getroot()
for attr in root[0].iter('Attribute'):
    name = attr.get('attribute_name')
    print(name)
this prints:
Individual
condition
BioSampleModel
How do I create a dict of the values of attribute_name and the text next to it?
My desired output is
attributes = {'Individual': 'PK314', 'condition': 'healthy', 'BioSampleModel': 'Generic'}
Based strictly on the XML sample in the question, try something along these lines:
import xml.etree.ElementTree as ET
bio = """[your xml sample]"""
doc = ET.fromstring(bio)
attributes = {}
for item in doc.findall('.//Attributes//Attribute'):
    attributes[item.attrib['attribute_name']] = item.text
print(attributes)
Output:
{'Individual': 'PK314', 'condition': 'healthy', 'BioSampleModel': 'Generic'}
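The same loop can also be written as a dictionary comprehension. A minimal sketch on a trimmed copy of the question's sample: iter('Attribute') visits every Attribute element at any depth, and if two elements share an attribute_name, the later one wins. For the Entrez case, the same comprehension should work on ET.parse(handle).getroot().

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the BioSample XML from the question.
bio = """<BioSampleSet>
  <BioSample id="761094" accession="SAMN00761094">
    <Attributes>
      <Attribute attribute_name="Individual">PK314</Attribute>
      <Attribute attribute_name="condition">healthy</Attribute>
      <Attribute attribute_name="BioSampleModel">Generic</Attribute>
    </Attributes>
  </BioSample>
</BioSampleSet>"""

doc = ET.fromstring(bio)

# key = the attribute_name attribute, value = the element's text
attributes = {
    attr.get("attribute_name"): attr.text
    for attr in doc.iter("Attribute")
}
print(attributes)
# {'Individual': 'PK314', 'condition': 'healthy', 'BioSampleModel': 'Generic'}
```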

Modify Specific xml tags with iterparse

I'm working with open map data and need to be able to update specific tags based on their values. I have been able to read the tags and even print the specific tags that need to be updated to the console, but I have not been able to get them to update.
I am using ElementTree and lxml. What I'm looking for specifically is this: if the first word of the addr:street tag is a cardinal direction (i.e. North, South, East, West) and the last word of the addr:housenumber tag is NOT a cardinal direction, take the first word from the addr:street tag and move it to be the last word of the addr:housenumber tag.
Edited based on questions below.
Initially I was just calling the code with:
clean_data(OUTPUT_FILE)
I didn't realize that iterparse can't be used to print directly from within the method (which I believe is what you're saying). I had code from a different part of the project that I used earlier, so I adapted what you wrote to what I had before. Here's what I have:
Earlier in the file:
import xml.etree.cElementTree as ET
from collections import defaultdict
import pprint
import re
import codecs
import json
OSM_FILE = "Utah County Map.osm"
OUTPUT_FILE = "Utah County Extract.osm"
JSON_FILE = "JSON MAP DATA.json"
The code in this section of the project:
def clean_data(osm_file, tags = ('node', 'way')):
    context = iter(ET.iterparse(osm_file, events=('end',)))
    for event, elem in context:
        if elem.tag == 'node':
            streetTag, street = getVal(elem, 'addr:street')
            if street is None: # No "street"
                continue
            first_word = getWord(street, True)
            houseTag, houseNo = getVal(elem, 'addr:housenumber')
            if houseNo is None: # No "housenumber"
                continue
            last_word = getWord(houseNo, False)
            if first_word in direct_list and last_word not in direct_list:
                streetTag.attrib['v'] = street[len(first_word) + 1:]
                houseTag.attrib['v'] = houseNo + ' ' + first_word
for i, element in enumerate(clean_data(OUTPUT_FILE)):
    print(ET.tostring(context.root, encoding='unicode', pretty_print=True, with_tail=False))
When I run this right now, I'm getting an error:
TypeError: 'NoneType' object is not iterable
I tried adding in the output code I used earlier for another section of the project, but received the same error. Here's that code for reference as well. (Output file in this code refers to the output of the first stage of data cleaning where I removed multiple invalid nodes).
with open(CLEAN_DATA, 'w') as output:
    output.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    output.write('<osm>\n ')
    for i, element in enumerate(clean_data(OUTPUT_FILE)):
        output.write(ET.tostring(element, encoding='unicode'))
    output.write('</osm>')
The initial edit was in response to Valdi_bo's question below. Here is a sample from my XML file for reference. Yes, I am using both ElementTree and lxml, since lxml seems to be a subset of ElementTree. Some of the functions I've called earlier in the program have only worked with one or the other, so I'm using both.
<?xml version="1.0" encoding="UTF-8"?>
<osm>
<node changeset="24687880" id="356682074" lat="40.2799548" lon="-111.6457549" timestamp="2014-08-11T20:33:35Z" uid="2253787" user="1000hikes" version="2">
<tag k="addr:city" v="Provo" />
<tag k="addr:housenumber" v="3570" />
<tag k="addr:postcode" v="84604" />
<tag k="addr:street" v="Timpview Drive" />
<tag k="building" v="school" />
<tag k="ele" v="1463" />
<tag k="gnis:county_id" v="049" />
<tag k="gnis:created" v="02/25/1989" />
<tag k="gnis:feature_id" v="1449106" />
<tag k="gnis:state_id" v="49" />
<tag k="name" v="Timpview High School" />
<tag k="operator" v="Provo School District" />
</node>
<node changeset="58421729" id="356685655" lat="40.2414325" lon="-111.6678877" timestamp="2018-04-25T20:23:33Z" uid="360392" user="maxerickson" version="4">
<tag k="addr:city" v="Provo" />
<tag k="addr:housenumber" v="585" />
<tag k="addr:postcode" v="84601" />
<tag k="addr:street" v="North 500 West" />
<tag k="amenity" v="doctors" />
<tag k="gnis:feature_id" v="2432255" />
<tag k="healthcare" v="doctor" />
<tag k="healthcare:speciality" v="gynecology;obstetrics" />
<tag k="name" v="Valley Obstetrics &amp; Gynecology" />
<tag k="old_name" v="Healthsouth Provo Surgical Center" />
<tag k="phone" v="+1 801 374 1801" />
<tag k="website" v="http://valleyobgynutah.com/location/provo-office-2/" />
</node>
</osm>
In this example the first node would remain unchanged. In the second block the addr:housenumber tag should be changed from 585 to 585 North and the addr:street tag should be changed from North 500 West to 500 West.
Try the following code:
Functions / global variables:
import re
def getVal(nd, kVal):
    '''
    Get data from "tag" child node with required "k" attribute
    Parameters:
    nd - "starting" node,
    kVal - value of "k" attribute.
    Results:
    - the tag found,
    - its "v" attribute
    '''
    tg = nd.find(f'tag[@k="{kVal}"]')
    if tg is None:
        return (None, None)
    return (tg, tg.attrib.get('v'))
def getWord(txt, first):
    '''
    Get first / last word from "txt"
    '''
    pat = r'^\S+' if first else r'\S+$'
    mtch = re.search(pat, txt)
    return mtch.group() if mtch else ''
direct_list = ["N", "N.", "No", "North", "S", "S.",
               "So", "South", "E", "E.", "East", "W", "W.", "West"]
And the main code:
for nd in tree.iter('node'):
    streetTag, street = getVal(nd, 'addr:street')
    if street is None: # No "street"
        continue
    first_word = getWord(street, True)
    houseTag, houseNo = getVal(nd, 'addr:housenumber')
    if houseNo is None: # No "housenumber"
        continue
    last_word = getWord(houseNo, False)
    if first_word in direct_list and last_word not in direct_list:
        streetTag.attrib['v'] = street[len(first_word) + 1:]
        houseTag.attrib['v'] = houseNo + ' ' + first_word
I assume that tree variable holds the entire XML tree.
Edit following the comment as of 22:36:33Z
My code also works in a loop based on iterparse.
Prepare e.g. an input.xml file with some root tag and a couple of
node elements inside. Then try the following code (with the necessary imports,
functions and global variables presented above):
from lxml import etree
context = iter(etree.iterparse('input.xml', events=('end',)))
for event, elem in context:
    if elem.tag == 'node':
        streetTag, street = getVal(elem, 'addr:street')
        if street is None: # No "street"
            continue
        first_word = getWord(street, True)
        houseTag, houseNo = getVal(elem, 'addr:housenumber')
        if houseNo is None: # No "housenumber"
            continue
        last_word = getWord(houseNo, False)
        if first_word in direct_list and last_word not in direct_list:
            streetTag.attrib['v'] = street[len(first_word) + 1:]
            houseTag.attrib['v'] = houseNo + ' ' + first_word
As iterparse reports only end events here, you don't even need
and event == 'end' in the first if.
Nor do you need the initial _, root = next(context) from your code,
as context.root points to the whole XML tree once the loop has finished.
And now, having the constructed XML tree, you can print it, to see the result:
print(etree.tostring(context.root, encoding='unicode', pretty_print=True,
with_tail=False))
Notes:
The above code has been written without yielding anything,
but it generates a full XML tree, updated according to your needs.
As the task is to construct an XML tree, this code does not clear
anything. Calls to clear() are needed only when you retrieve some data
from the processed elements, save it elsewhere, and no longer need
these elements.
Now you can rework the above code into a "yielding" variant and use
it in your environment (you didn't provide any details on how your code sample
is called).
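Because clean_data in the question contains no yield, calling it returns None, and that is what produces TypeError: 'NoneType' object is not iterable in enumerate(clean_data(...)). Below is a minimal sketch of a yielding variant. It assumes the standard-library xml.etree.ElementTree, so neither lxml nor context.root is needed, and it fixes each node when its end event arrives and then yields it. The names clean_nodes, get_tag, first_word, last_word and DIRECTIONS are illustrative counterparts of clean_data, getVal, getWord and direct_list above. The input is a cut-down version of the question's sample.

```python
import io
import re
import xml.etree.ElementTree as ET

# Illustrative counterparts of direct_list / getVal / getWord above.
DIRECTIONS = {"N", "N.", "No", "North", "S", "S.", "So", "South",
              "E", "E.", "East", "W", "W.", "West"}

def get_tag(node, key):
    """Return (<tag k=key> element, its 'v' value), or (None, None)."""
    tag = node.find(f'tag[@k="{key}"]')
    return (tag, tag.get("v")) if tag is not None else (None, None)

def first_word(text):
    m = re.search(r"^\S+", text)
    return m.group() if m else ""

def last_word(text):
    m = re.search(r"\S+$", text)
    return m.group() if m else ""

def clean_nodes(source):
    """Generator: fix each <node> once its end tag is parsed, then yield it."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "node":
            continue
        street_tag, street = get_tag(elem, "addr:street")
        house_tag, house_no = get_tag(elem, "addr:housenumber")
        if street is not None and house_no is not None:
            word = first_word(street)
            if word in DIRECTIONS and last_word(house_no) not in DIRECTIONS:
                street_tag.set("v", street[len(word) + 1:])
                house_tag.set("v", f"{house_no} {word}")
        yield elem

sample = b"""<osm>
  <node id="356682074">
    <tag k="addr:housenumber" v="3570"/>
    <tag k="addr:street" v="Timpview Drive"/>
  </node>
  <node id="356685655">
    <tag k="addr:housenumber" v="585"/>
    <tag k="addr:street" v="North 500 West"/>
  </node>
</osm>"""

nodes = []
for i, node in enumerate(clean_nodes(io.BytesIO(sample))):
    nodes.append(node)
    print(i, get_tag(node, "addr:housenumber")[1], "|", get_tag(node, "addr:street")[1])
# 0 3570 | Timpview Drive
# 1 585 North | 500 West
```

Since this generator yields elements, the question's output loop, which writes ET.tostring(element, encoding='unicode') for each element of enumerate(...), could consume it in the same way. Nothing is cleared, so the whole tree still stays in memory.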

How to extract value from a string in XML response using karate

Unfortunately, the response I am getting from my backend is not in correct XML format; it comes back in a bad format like this:
<soapenv:Body>
<ns2:getInputResponse xmlns:ns2="http://docs.oasisopen.org/ns/bpel4people/ws-humantask/api/200803">
<ns2:taskData xmlns:s186="http://www.w3.org/2001/XMLSchema-instance" xmlns:s187="http://www.w3.org/2001/XMLSchema" s186:type="s187:string"><?xml version="1.0" encoding="UTF-8"?>
<SubscriptionApprovalData xmlns="http://workflow.subscription.apimgt.carbon.wso2.org" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<apiName>Auto_Approval</apiName>
<apiVersion>v1</apiVersion>
<apiContext>/test/lambda/v1</apiContext>
<apiProvider>admin</apiProvider>
<subscriber>regtest</subscriber>
<applicationName>newApp</applicationName>
<tierName>Gold</tierName>
<workflowExternalRef>23d30bd8-51e3-4afe-aae0-3fa159d85a6b</workflowExternalRef>
<callBackURL>https://apistore-dev-dev-a878-14-ams10-nonp.qcpaws.qantas.com.au/services/WorkflowCallbackService</callBackURL>
</SubscriptionApprovalData></ns2:taskData>
</ns2:getInputResponse>
</soapenv:Body>
Now because of this Karate is not able to read the response and fetch the value of "workflowExternalRef" which is my goal for this test.
Is there any way karate can read it?
This is really messed up XML so please check with someone in your team if this can be fixed.
Anyway, since you can use Java in Karate, here is one way to do this. This is not production-quality code, please adapt as appropriate:
* string response = response
# marker.length keeps the offset in sync with the text being searched for
* def marker = 'workflowExternalRef>'
* def start = response.indexOf(marker)
* def ref = response.substring(start + marker.length)
* def end = ref.indexOf('<')
* def ref = ref.substring(0, end)
* match ref == '23d30bd8-51e3-4afe-aae0-3fa159d85a6b'

soapUI groovy script groovy.lang.MissingMethodException

The following exception is thrown when I try to parse the response within a soapUI test step. I also tried the getXmlHolder method, still with no luck.
Am I missing an import or library?
groovy.lang.MissingMethodException: No signature of method:
java.lang.String.getNodeValue() is applicable for argument types:
(java.lang.String) values:
[//ConversionRateResponse/ConversionRateResult] error at line: 16
def groovyUtils = new com.eviware.soapui.support.GroovyUtils(context);
project = testRunner.getTestCase().getTestSuite().getProject().getWorkspace().getProjectByName("FirstProject")
testSuite = project.getTestSuiteByName("TestSuite 1");
testCase = testSuite.getTestCaseByName("TestCase 1");
testCase.setPropertyValue("fromCurrency","EUR")
testCase.setPropertyValue("toCurrency","TRL")
testStep=testCase.testSteps["SOAP Request1"]
def responseHolder=testStep.getPropertyValue("response");
def refNum = responseHolder.getNodeValue("//ConversionRateResponse/ConversionRateResult")
And the response is as follows
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<ConversionRateResponse xmlns="http://www.webserviceX.NET/">
<ConversionRateResult>-1</ConversionRateResult>
</ConversionRateResponse>
</soap:Body>
</soap:Envelope>
You can add a Script Assertion to the SOAP Request test step.
Here is the script:
//Check if the response is not empty
assert context.response, 'Response is empty or null'
def rate = new XmlSlurper().parseText(context.response).'**'.find{it.name() == 'ConversionRateResult'}?.text() as Integer
log.info "Conversion rate result is : $rate "
//Check if the result rate is -1, change if needed
assert -1 == rate
I can see you have called getNodeValue on a String, which is wrong.
Your error says: "No signature of method: java.lang.String.getNodeValue() is applicable for argument types: (java.lang.String) values"
In the code below, getNodeValue is called on the right object, the XmlHolder returned by getXmlHolder:
def groovyUtils = new com.eviware.soapui.support.GroovyUtils(context);
def response = groovyUtils.getXmlHolder('SOAP Request1#Response') // use your own test step name ("SOAP Request1" in the question)
def refNum=response.getNodeValue("//*:ConversionRateResponse//*:ConversionRateResult")
log.info refNum
getNodeValue is very useful for extracting values from XML. Similarly, there is getDomNode, which returns DOM nodes rather than their values.
