Groovy: replaceLast() is missing - string

I need a replaceLast() method in a Groovy script, to replace the last matching substring. It is available in Java, but not in Groovy AFAIK. It must work with regex in the same way as the following replaceFirst:
replaceFirst(CharSequence self, Pattern pattern, CharSequence replacement)
Replaces the first substring of a CharSequence that matches the given compiled regular expression with the given replacement.
EDIT: Sorry for not being specific enough. The original string is an XML file in which the same key (e.g. Name) appears many times. I want to replace the last one.
<Header>
<TransactionId>1</TransactionId>
<SessionId>1</SessionId>
<User>
<Name>Bob</Name>
...
</User>
<Sender>
<Name>Joe</Name>
...
</Sender>
</Header>
...
<Context>
<Name>Rose</Name>
...
</Context>

No idea what replaceLast in Java is...it's not in the JDK... If it was in the JDK, you could use it in Groovy...
Anyway, how about using an XML parser to change your XML instead of using a regular expression?
Given some xml:
def xml = '''<Header>
<TransactionId>1</TransactionId>
<SessionId>1</SessionId>
<User>
<Name>Bob</Name>
</User>
<Sender>
<Name>Joe</Name>
</Sender>
<Something>
<Name>Tim</Name>
</Something>
</Header>'''
You can parse it using Groovy's XmlParser:
import groovy.xml.*
def parsed = new XmlParser().parseText(xml)
Then, you can do a depth-first search for all nodes with the name Name, and take the last one (index -1):
def lastNameNode = parsed.'**'.findAll { it.name() == 'Name' }[-1]
Then, set the value to a new string:
lastNameNode.value = 'Yates'
And print the new XML:
println XmlUtil.serialize(parsed)
<?xml version="1.0" encoding="UTF-8"?><Header>
<TransactionId>1</TransactionId>
<SessionId>1</SessionId>
<User>
<Name>Bob</Name>
</User>
<Sender>
<Name>Joe</Name>
</Sender>
<Something>
<Name>Yates</Name>
</Something>
</Header>
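If a regex-based "replace last" is still wanted, the logic itself is small: collect all matches, keep the last one, and splice the replacement in at its position. Below is a minimal sketch of that idea in Python rather than Groovy, for illustration only; the function name replace_last and the sample pattern are assumptions, not part of any library.

```python
import re

def replace_last(text, pattern, replacement):
    """Replace only the last (non-overlapping) regex match of pattern in text."""
    matches = list(re.finditer(pattern, text))
    if not matches:
        return text  # nothing matched, return the input unchanged
    last = matches[-1]
    # expand() resolves group references such as \1 in the replacement
    return text[:last.start()] + last.expand(replacement) + text[last.end():]

xml = "<User><Name>Bob</Name></User><Sender><Name>Joe</Name></Sender>"
result = replace_last(xml, r"<Name>[^<]*</Name>", "<Name>Yates</Name>")
print(result)
```

The parser-based approach above remains the safer choice for XML, since a regex cannot tell a real element from similar-looking text inside a comment or CDATA section.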

Related

Reading CDATA with lxml, problem with end of line

Hello, I am parsing an XML document which contains a bunch of CDATA sections. It was working with no problems until now. I realised that when I read an element and get its text attribute, I get end-of-line characters at the beginning and at the end of the text.
The relevant piece of code is as follows:
for comments in self.xml.iter("Comments"):
    for comment in comments.iter("Comment"):
        description = comment.get('Description')
        if language == "Arab":
            tag = self.name + description
            text = comment.text
The problem is with the Comment element, which looks like this:
<Comment>
<![CDATA[Usually made it with not reason]]>
When I read the text attribute, I get this:
\nUsually made it with not reason\n
I know that I could do a strip and so on, but I would like to fix the problem at the root cause; maybe there is some option to set before parsing with ElementTree.
I parse the XML file like this:
tree = ET.parse(xml)
Minimal reproducible example
import xml.etree.ElementTree as ET
filename = "test.xml"  # path to your test XML file
tree = ET.parse(filename)
root = tree.getroot()
Description = root  # in the minimal file below, <Description> is the root element itself
text = Description.text
print(text)
Minimal xml file
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Description>
<![CDATA[Hello world]]>
</Description>
You're getting newline characters because there are newline characters:
<Comment>
<![CDATA[Usually made it with not reason]]>
</Comment>
Why else would <![CDATA and </Comment start on new lines?
If you don't want newline characters, remove them:
<Comment><![CDATA[Usually made it with not reason]]></Comment>
Everything inside an element counts towards its string value.
<![CDATA[...]]> is not an element, it's a parser flag. It changes how the XML parser is reading the enclosed characters. You can have multiple CDATA sections in the same element, switching between "regular mode" and "cdata mode" at will:
<Comment>normal text <![CDATA[
CDATA mode, this may contain <unescaped> Characters!
]]> now normal text again
<![CDATA[more special text]]> now normal text again
</Comment>
Any newlines before and after a CDATA section count towards the "normal text" section. When the parser reads this, it will create one long string consisting of the individual parts:
normal text
CDATA mode, this may contain <unescaped> Characters!
now normal text again
more special text now normal text again
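The concatenation described above can be checked directly with ElementTree. This is a small sketch using inline strings instead of a file; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Newlines around the CDATA section are ordinary text and end up in .text
comment = ET.fromstring(
    "<Comment>\n<![CDATA[Usually made it with not reason]]>\n</Comment>"
)
print(repr(comment.text))

# Normal text and CDATA text are merged into a single string
mixed = ET.fromstring("<Comment>normal <![CDATA[<unescaped>]]> text</Comment>")
print(repr(mixed.text))
```

After parsing there is no trace of where the CDATA section began or ended; only the combined string is kept.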
I thought that when CDATA appears in XML, it always comes with an end of line at the beginning and at the end, like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Description>
<![CDATA[Hello world]]>
</Description>
But you can also have it like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Description><![CDATA[Hello world]]></Description>
That is why we get end-of-line characters when parsing with the ElementTree library. It works correctly in both cases; you only have to strip or not strip, depending on how you want to process the data.
If you want to remove both '\n' characters, just add the following code:
text = Description.text
text = text.strip('\n')
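As a slightly more defensive variant, here is a sketch of a small helper (the name clean_text is hypothetical) that treats both layouts the same way and also copes with an empty element, whose text is None. It uses strip() without arguments, which removes all surrounding whitespace and not only '\n'; that is an assumption about what is wanted.

```python
import xml.etree.ElementTree as ET

def clean_text(element):
    """Return the element's text without surrounding whitespace ('' if empty)."""
    return (element.text or "").strip()

multi_line = ET.fromstring("<Description>\n<![CDATA[Hello world]]>\n</Description>")
single_line = ET.fromstring("<Description><![CDATA[Hello world]]></Description>")
empty = ET.fromstring("<Description/>")

print(clean_text(multi_line))
print(clean_text(single_line))
```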

JMeter: @ notation (for xml)

On groovy templates for jmeter page there is an example I wanted to follow:
String xml = “
<actions>
<action type=”error” info=”itsErrors”/>
<action type="warning" info=”warnWarn”/>
<action type=”info” info=”justLogInfo”/>
</actions>"
XmlParser parser = new XmlParser()
def actions= parser.parseText (xml)
actions.action.each { action ->
    println "${action.'@type'}: ${action.'@info'}";
}
At least in my JMeter 5.1 it did not work as posted, but when I fixed quotation marks it did:
String xml = """
<actions>
<action type="error" info="itsErrors"/>
<action type="warning" info="warnWarn"/>
<action type="info" info="justLogInfo"/>
</actions>"""
XmlParser parser = new XmlParser()
def actions= parser.parseText (xml)
actions.action.each { action ->
    println "${action.'@type'}: ${action.'@info'}";
}
My question is mainly about the usage of @, and also the dot and quotes (.'@type'). I tried a web search for Groovy @ and found nothing; for JMeter notations I found https://jmeter.apache.org/usermanual/functions.html, with only one instance of usage:
Example: ${__XPath(/path/to/build.xml, //target/@name)} This will
match all targets in build.xml and return the contents of the next
name attribute
And about variables, from the same link:
Referencing a variable in a test element is done by bracketing the
variable name with '${' and '}'.
Groovy docs page for xml gives other notations:
https://groovy-lang.org/processing-xml.html
def text = '''
<list>
<technology>
<name>Groovy</name>
</technology>
</list>
'''
def list = new XmlParser().parseText(text)
assert list instanceof groovy.util.Node
assert list.technology.name.text() == 'Groovy'
What does each notation in "${action.'@type'}: ${action.'@info'}" mean?
It isn't a JMeter variable even with ${}, is it?
I managed to keep it working only without the ' quotes; the other parts seem necessary: ", ., @, {}, $. I may have listed more than needed in that last sentence, and some I can explain myself, but I want to be sure I understand it right.
It's GPath syntax used in groovy
The most common way of querying XML in Groovy is using GPath
For XML, you can also specify attributes, e.g.:
a["@href"] → the href attribute of all the a elements
a.'@href' → an alternative way of expressing this
a.@href → an alternative way of expressing this when using XmlSlurper
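The @ prefix for attributes is also how XPath addresses them, which is where the notation is familiar from. As a rough cross-language comparison (Python's standard ElementTree, not Groovy or JMeter), the sketch below reads the same attributes with get() and uses @ inside an XPath predicate; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

xml = """<actions>
  <action type="error" info="itsErrors"/>
  <action type="warning" info="warnWarn"/>
  <action type="info" info="justLogInfo"/>
</actions>"""

actions = ET.fromstring(xml)

# Attribute values are read with get(); this mirrors action.'@type' in GPath
lines = [f"{a.get('type')}: {a.get('info')}" for a in actions.findall("action")]
print("\n".join(lines))

# In an XPath predicate, @type refers to the attribute named "type"
errors = actions.findall("action[@type='error']")
```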

How to extract CDATA without the GPath/node name

I'm trying to extract CDATA content from an XML document without using GPath or the node name. In short, I want to find and retrieve the inner text containing a CDATA section from an XML document.
My XML looks like:
def xml = '''<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<Test1>This node contains some innerText. Ignore This.</Test1>
<Test2><![CDATA[this is the CDATA section i want to retrieve]]></Test2>
</root>'''
From the above XML, I want to get the CDATA content alone without referring to its node name 'Test2', because the node name is not always the same in my scenario.
Also note that the XML can contain inner text in a few other nodes (Test1). I don't want to retrieve that; I just need the CDATA content out of the whole XML.
I want something like the below (the code below is incorrect though):
def parsedXML = new xmlSlurper().parseText(xml)
def cdataContent = parsedXML.depthFirst().findAll { it.text().startsWith('<![CDATA')}
My output should be :
this is the CDATA section i want to retrieve
As @daggett says, you can't do this with the Groovy slurper or parser, but it's not too bad to drop down and use the Java classes to get it.
Note that you have to set the property for CDATA to become visible, as by default it's just treated as characters.
Here's the code:
import javax.xml.stream.*
def xml = '''<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<Test1>This node contains some innerText. Ignore This.</Test1>
<Test2><![CDATA[this is the CDATA section i want to retrieve]]></Test2>
</root>'''
def factory = XMLInputFactory.newInstance()
factory.setProperty('http://java.sun.com/xml/stream/properties/report-cdata-event', true)
def reader = factory.createXMLStreamReader(new StringReader(xml))
while (reader.hasNext()) {
    if (reader.eventType in [XMLStreamConstants.CDATA]) {
        println reader.text
    }
    reader.next()
}
That will print this is the CDATA section i want to retrieve
Considering you have just one CDATA section in your XML, split can help here:
def xml = '''<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<Test1>This node contains some innerText. Ignore This.</Test1>
<Test2><![CDATA[this is the CDATA section i want to retrieve]]></Test2>
</root>'''
log.info xml.split("<!\\[CDATA\\[")[1].split("]]")[0]
So in the above logic we split the string on the CDATA start marker and pick the portion that comes after it:
xml.split("<!\\[CDATA\\[")[1]
Once we have that portion, we split again and take the part before the closing marker by using
.split("]]")[0]
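The split approach only returns the first CDATA section. If there may be several, the same string-level idea can be written as a non-greedy regex that collects all of them. This is a sketch in Python rather than Groovy, with a made-up two-section sample; like split, it works on raw text and does not parse the XML.

```python
import re

# Non-greedy match between the CDATA start and end markers; DOTALL lets
# a section span multiple lines
CDATA = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)

xml = """<root>
<Test1>This node contains some innerText. Ignore This.</Test1>
<Test2><![CDATA[this is the CDATA section i want to retrieve]]></Test2>
<Test3><![CDATA[a second
section]]></Test3>
</root>"""

sections = CDATA.findall(xml)
for section in sections:
    print(section)
```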

Delete all chars before the xml Tag in a string - in groovy. soapui

How do I replace all the characters with nothing (thus deleting them) up to a certain character? I have a log string which is an XML request:
I have a string like this:
Mon Dec 19 09:50:50 EST 2016:INFO:
string = "test-testing ID:idm-zx-sawe.3CE65834D32AD741:370 <?xml version="1.0" encoding="UTF-8"?>"
string.replaceAll("([^,]*'<')", "").replaceAll("(?m)^\\s*ID.*","");
I need to remove all the characters before <?xml
and return the following string: "test-testing ID:idm-zx-sawe.3CE65834D32AD741:370
I'm trying with this regular expression:
/.*<\?/ - need this translated to groovy string.replaceAll(".*<\?","")
I would do it like this:
def string = 'test-testing ID:idm-zx-sawe.3CE65834D32AD741:370 <?xml version="1.0" encoding="UTF-8"?>'
def start = string.indexOf('<?xml');
// indexOf returns -1 when '<?xml' is absent, and 0 is a valid position
if (start >= 0) {
    string = string.substring(start);
}
string is:
<?xml version="1.0" encoding="UTF-8"?>
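The same index-and-slice idea, sketched in Python for comparison (the helper name drop_prefix is hypothetical). A "not found" result is -1 and a match at the very start is 0, so the check compares against 0 instead of testing the index for truthiness.

```python
def drop_prefix(text, marker="<?xml"):
    """Return text starting at marker, or text unchanged if marker is absent."""
    start = text.find(marker)  # -1 when the marker does not occur
    return text[start:] if start >= 0 else text

log_line = ('test-testing ID:idm-zx-sawe.3CE65834D32AD741:370 '
            '<?xml version="1.0" encoding="UTF-8"?>')
print(drop_prefix(log_line))
```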

Append literal string (plain text) to XPath result

Is there a way to append a literal string to whatever an XPath expression gets you?
e.g. from following XML:
<root>
<select>I am</select>
</root>
I would like to produce:
I am a literal
purely with XPath. What do I add to /root/select to get what I want? Important: XPath 1.0 solution required! I'm not using XSLT.
Any reason you can't simply concat() 'a literal' to the end?
Something like: concat( string(/some/selector) , ' some literal' )?
