groovy parse local html file - groovy

I am working on a Groovy script that finds all the local HTML files and parses certain tags in them. I tried something like html clean, but it is not working. I also tried reading the files line by line, but that only works when the content I need is on a single line. The script is up on GitHub: https://github.com/jrock2004/johns-octopress-scripts/blob/master/convertCompiledPosts/convertPosts.groovy. Thanks for any input.
Edit: So I am getting closer. I have this code now
def parser = new org.cyberneko.html.parsers.SAXParser()
new XmlParser( parser ).parse( curFile + "/index.html" ).with { page ->
    page.'**'.DIV.grep { it.'@class'?.contains 'entry-content' }.each {
        println it
        println "--------------------------------"
    }
}
And what it prints is
DIV[attributes={class=entry-content}; value=[P[attributes={}; value=[As an automation developer, I have learned how to write code in Java. When I am having an issue, one of the nice things that you can do is debug your code, line by line. For the longest I had wished that something like this existed in PHP. I have come to find out that you can actually debug code, like I do in Java. This is such a helpful task because I do not have to waste time using var_dump and such on variables or results. In your apache/php server you need to install and or enable something called, A[attributes={href=http://xdebug.org/}; value=[Xdebug]], . I will work on a tutorial on how to use xdebug while writing code in Sublime Text 2. So keep an eye out on my blog and or, A[attributes={href=http://www.youtube.com/jrock20041}; value=[YouTube]], channel for this tutorial.]]]]
So basically what I want is all the text, including the HTML elements, inside the div with the class entry-content. If you want to see the page, it can be found here: http://jcwebconcepts.net/blog/2013/02/02/xdebug/
Thanks for your help

It does work... Save the HTML for this page to a file, then you can parse it.
The following code prints the name of the author of every comment on the page:
@Grab('net.sourceforge.nekohtml:nekohtml:1.9.16')
def parser = new org.cyberneko.html.parsers.SAXParser()
new XmlParser( parser ).parse( file ).with { page ->
    page.'**'.A.grep { it.'@class'?.contains 'comment-user' }.each {
        println it.text()
    }
}
When file is set to be a File pointing to the saved HTML (or a String containing the URL of this question), it prints:
tim_yates
jrock2004
tim_yates
Edit:
To print the contents of a given node, you could do (using the example from your edited question):
@Grab('net.sourceforge.nekohtml:nekohtml:1.9.16')
import groovy.xml.*
def parser = new org.cyberneko.html.parsers.SAXParser()
new XmlParser( parser ).parse( 'http://jcwebconcepts.net/blog/2013/02/02/xdebug/' ).with { page ->
    page.'**'.DIV.grep { it.'@class'?.contains 'entry-content' }.each { it ->
        // serialize the node, markup included, instead of printing its toString()
        println XmlUtil.serialize( it )
    }
}

Related

Can groovy heredocs be internationalized?

I have some multiline strings that are presented to the user and stored as heredocs. So rather than a 'normal' (Java) properties file, I used a Groovy-based one (see here) that is consumed by ConfigSlurper, and it works great. Sorry if this is a dumb question, but can that be easily internationalized? If so, can you outline how that is accomplished?
My solution: in your ConfigSlurper file, store keys to the internationalized strings. Inject messageSource and localeResolver into your controller/service, get the key from your ConfigSlurper config, and look up the localized string in your i18n messages.properties file. Example (not sure that the code is correct, but it shows the main idea):
def config = new ConfigSlurper().parse(new File('src/Config.groovy').toURL())
//localized string1 value
def msg = messageSource.getMessage(config.data1.string1, null, localeResolver.defaultLocale)
As far as I know the ConfigSlurper does not have special support for i18n.
You may achieve it by leveraging its support for environments, creating one environment closure per locale. For example:
environments {
english {
sample {
hello = "hello"
}
}
spanish {
sample {
hello = "hola"
}
}
}
When creating the ConfigSlurper you will need to pass the desired language:
def config = new ConfigSlurper("spanish")
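As a rough end-to-end sketch of the environment-per-locale idea above, the snippet below parses an inline config script instead of a file and stores a multiline string for each locale. The locale names, the `sample.hello` key, and the `messagesFor` helper are made up for illustration; only `ConfigSlurper(String)` and `parse(String)` come from Groovy itself.

```groovy
// Hypothetical config text; in practice this would live in its own .groovy file.
def script = '''
environments {
    english {
        sample {
            hello = """Hello,
and welcome."""
        }
    }
    spanish {
        sample {
            hello = """Hola,
y bienvenido."""
        }
    }
}
'''

// Illustrative helper: parse the same script once per requested locale.
def messagesFor = { String locale -> new ConfigSlurper(locale).parse(script) }

println messagesFor('english').sample.hello
println messagesFor('spanish').sample.hello
```

One trade-off of this approach is that every locale lives in the same file, so the file grows with each language you add.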

How can I use relative paths to external response files for soapUI MockService

What I've Done
I am using soapUI (3.6.1 Free version) mock services to serve up specific data to 2 client applications I am testing. With some simple Groovy script I've set up some mock operations to fetch responses from specific files based on the requests made by the client applications.
The static contents of the mock response is:
${responsefile}
The groovy in the operation dispatch scripting pane is:
def req = new XmlSlurper().parseText(mockRequest.requestContent)
if (req =~ "CategoryA")
{
context.responsefile = new File("C:/soapProject/Test_Files/ID_List_CategoryA.xml").text
}
else
{
context.responsefile = new File("C:/soapProject/Test_Files/ID_List_CategoryB.xml").text
}
In this example, when the client application issues a request to the mock service that contains the string CategoryA, the response returned by soapUI is the contents of file ID_List_CategoryA.xml
What I'm Trying To Achieve
This all works fine with the absolute paths in the groovy. Now I want to pull the whole collection of soapUI project file and external files into a package for easy re-deployment. From my reading about soapUI I hoped this would be as easy as setting the project Resource Root value to ${projectDir} and changing my paths to:
def req = new XmlSlurper().parseText(mockRequest.requestContent)
if (req =~ "CategoryA")
{
context.responsefile = new File("Test_Files/ID_List_CategoryA.xml").text
}
else
{
context.responsefile = new File("Test_Files/ID_List_CategoryB.xml").text
}
... keeping in mind that the soapUI project xml file resides in C:/soapProject/
What I've Tried So Far
So, that doesn't work. I've tried variations of relative paths:
./Test_Files/ID_List_CategoryA.xml
/Test_Files/ID_List_CategoryA.xml
Test_Files/ID_List_CategoryA.xml
One post indicated that soapUI might consider the project file's parent directory as the root for the purposes of the relative path, so I tried the following variations too:
./soapProject/Test_Files/ID_List_CategoryA.xml
/soapProject/Test_Files/ID_List_CategoryA.xml
soapProject/Test_Files/ID_List_CategoryA.xml
When none of that worked, I tried making use of the ${projectDir} property in the groovy script, but all such attempts failed with a "No such property: mockService for class: Script[n]" error. Admittedly, I was really fumbling around when trying to do that.
I tried using information from this post and others: How do I make soapUI attachment paths relative?
... without any luck. Replacing "test" with "mock," (among other changes), in the solution code from that post resulted in more property errors, e.g.
testFile = new File(mockRunner.project.getPath())
.. led to...
No such property: mockRunner for class: Script3
What I Think I Need
The posts I've found related to this issue all focus on soapUI TestSuites. I really need a solution that is MockService centric or at least sheds some light on how it can be handled differently for MockServices as opposed to TestSuites.
Any help is greatly appreciated. Thanks. Mark.
The Solution - Provided by GargantuChet
The following includes the changes suggested by GargantuChet to solve the problem of trying to access the ${projectDir} property and enable the use of relative paths by defining a new projectDir object within the scope of the groovy script:
def groovyUtils = new com.eviware.soapui.support.GroovyUtils(context)
def projectDir = groovyUtils.projectPath
def req = new XmlSlurper().parseText(mockRequest.requestContent)
if (req =~ "CategoryA")
{
context.responsefile = new File(projectDir, "Test_Files/ID_List_CategoryA.xml").text
}
else
{
context.responsefile = new File(projectDir, "Test_Files/ID_List_CategoryB.xml").text
}
I'm not familiar with Groovy, but I assume the File is a normal java.io.File instance.
Relative paths are interpreted as being relative to the application's current directory. Try something like the following to verify:
def defaultPathBase = new File( "." ).getCanonicalPath()
println "Current dir:" + defaultPathBase
If this is the case here, then you may want to use the new File(String parent, String child) constructor, passing your resource directory as the first argument and the relative path as the second.
For example:
// hardcoded for demonstration purposes
def pathbase = "/Users/chet"
def content = new File(pathbase, "Desktop/sample.txt").text
println content
Here's the result of executing the script:
Chets-MacBook-Pro:Desktop chet$ groovy sample.groovy
This is a sample text file.
It will be displayed by a Groovy script.
Chets-MacBook-Pro:Desktop chet$ groovy sample.groovy
This is a sample text file.
It will be displayed by a Groovy script.
Chets-MacBook-Pro:Desktop chet$
You could have also done the following to get the value of projectDir:
def projectDir = context.expand('${projectDir}');

HttpResponseException when trying to retrieve xml file using groovy script

It always catches the exception and outputs "Unable to read data for $dId:$alias" when I run read():
http = new HTTPBuilder('https://somewebsite.com')
def read(http, path, dId, alias, portalFile, outputFileName) {
try {
println "Reading : path:$path, file:$portalFile"
http.get(path: path,
contentType: TEXT,
query: [id:dId, instance:alias, format:'xml', file:portalFile]) {resp, reader ->
println "response status: ${resp.statusLine}"
println 'Headers: -----------'
resp.headers.each { h ->
println " ${h.name} : ${h.value}"
}
new File(outputFileName).withWriter{out -> out << reader}
}
} catch (HttpResponseException h) {
println "Unable to read data for $dId:$alias"
}
}
If I go to the website using my internet browser and click on the xml file that I need, it works. Is there any way I can output the URL that it connects to?
The best way to see where things are going wrong is to enable wire debugging for HTTPBuilder. This will show you the exact requests and responses that are happening. From this you can see what URL is being fetched, with what parameters and headers, and what came back from the server.
Enabling debugging is documented here - http://groovy.codehaus.org/modules/http-builder/doc/index.html#Logging_and_Debugging
E.g. add the following to your log4j.properties
log4j.logger.org.apache.http.headers=DEBUG
log4j.logger.org.apache.http.wire=DEBUG
Or, quicker and dirtier, put this right at the top of your groovy script (YMMV)
System.setProperty('org.apache.commons.logging.Log', 'org.apache.commons.logging.impl.SimpleLog')
System.setProperty('org.apache.commons.logging.simplelog.log.org.apache.http.wire', 'DEBUG')

XMLSlurper appendNode does not see changes

I am having trouble using XmlSlurper to update an XML document. Most things work, but in some situations a "find" doesn't find a Node I just appended (appendNode). The new Node is there at the end of processing, but it is not found while I am in the middle of adding children.
I found a post about XmlSlurper that says that finding the new Node requires calling parseText again and/or using StreamingMarkupBuilder (see below). Really?! That seems so kludgy that I thought I'd verify on SO.
Here is a code snippet. The "find" gets NoChildren even though the Node was just added.
codeNode.appendNode {
'lab:vendorData'() {}
}
vendorNode = codeNode.children().find { it.name() == "vendorData" }
"appendNode does not modify the slurped document directly. The edit is applied "on the fly" when the document is written out using StreamingMarkupBuilder."
http://markmail.org/message/5nmxbhwna7hr5zcq#query:related%3A5nmxbhwna7hr5zcq+page:1+mid:bkdesettsnfnieno+state:results
Why can't I find my new Node?!
This is what I got to work. It is not elegant, but it got me past the "update" problem:
...
codeNode.appendNode {
'lab:vendorData'() {}
}
//-- must re-slurp to see appended node
labDoc = new XmlSlurper().parseText(serializeXml(labDoc))
codeNode = getResultNodeFor( nextResult.getCode() );
vendorNode = codeNode.children().find { it.name() == "vendorData" }
...
// serialize the document passed in, applying any pending appendNode edits
String serializeXml(GPathResult xml) {
    XmlUtil.serialize(new StreamingMarkupBuilder().bind {
        mkp.declareNamespace("lab", "www.myco.com/LabDocument")
        mkp.yield xml
    })
}
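To see the lazy behaviour in isolation, here is a minimal sketch that leaves out the namespace handling. The sample document and the `vendorData` content are invented for the demonstration, and it assumes the appended closure is only applied when the document is written out, as the quoted post describes.

```groovy
import groovy.xml.*

// Hypothetical sample document; element names are made up for this sketch.
def doc = new XmlSlurper().parseText('<code><name>A1</name></code>')

// appendNode only records the edit; the slurped tree is not changed yet.
doc.appendNode { vendorData('x') }
def before = doc.children().find { it.name() == 'vendorData' }
assert before.size() == 0

// Writing the document out applies the pending edit...
def xml = new StreamingMarkupBuilder().bind { mkp.yield doc }.toString()

// ...so a fresh parse of that output can find the new node.
def reparsed = new XmlSlurper().parseText(xml)
def after = reparsed.children().find { it.name() == 'vendorData' }
assert after.text() == 'x'
```

If in-place updates that are visible immediately matter more than lazy evaluation, XmlParser may be a better fit, since it builds a mutable Node tree.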

External Content with Groovy BuilderSupport

I've built a custom builder in Groovy by extending BuilderSupport. It works well when configured like nearly every builder code sample out there:
def builder = new MyBuilder()
builder.foo {
"Some Entry" (property1:value1, property2: value2)
}
This, of course, works perfectly. The problem is that I don't want the information I'm building to be in the code. I want to have this information in a file somewhere that is read in and built into objects by the builder. I cannot figure out how to do this.
I can't even make this work by moving the simple entry around in the code.
This works:
def textClosure = { "Some Entry" (property1:value1, property2: value2) }
builder.foo(textClosure)
because textClosure is a closure.
If I do this:
def text = '"Some Entry" (property1:value1, property2: value2)'
def textClosure = { text }
builder.foo(textClosure)
the builder only gets called for the "foo" node. I've tried many variants of this, including passing the text block directly into the builder without wrapping it in a closure. They all yield the same result.
Is there some way I can take a piece of arbitrary text and pass it into my builder so that it will be able to correctly parse and build it?
Your problem is that a String is not Groovy code. The way ConfigSlurper handles this is to compile the text into an instance of Script using GroovyClassLoader#parseClass. e.g.,
// create a Binding subclass that delegates to the builder
class MyBinding extends Binding {
def builder
Object getVariable(String name) {
return { Object... args -> builder.invokeMethod(name,args) }
}
}
// parse the script and run it against the builder
new File("foo.groovy").withInputStream { input ->
Script s = new GroovyClassLoader().parseClass(input).newInstance()
s.binding = new MyBinding(builder:builder)
s.run()
}
The subclass of Binding simply returns a closure for all variables that delegates the call to the builder. So assuming foo.groovy contains:
foo {
"Some Entry" (property1:value1, property2: value2)
}
It would be equivalent to your code above.
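A self-contained variation on the same idea, for experimenting without a custom builder or an external file: it uses Groovy's built-in NodeBuilder as a stand-in for MyBuilder and evaluates the text with GroovyShell rather than GroovyClassLoader. The `BuilderBinding` class name and the quoted property values are assumptions made for this sketch.

```groovy
// Binding that turns every unknown name in the script into a call on the builder.
class BuilderBinding extends Binding {
    def builder

    @Override
    Object getVariable(String name) {
        def target = builder
        return { Object... args -> target.invokeMethod(name, args) }
    }
}

// The text could equally be read from a file; the values are quoted here so
// that they are plain strings rather than further variable lookups.
def text = '''
foo {
    "Some Entry"(property1: 'value1', property2: 'value2')
}
'''

def builder = new NodeBuilder()   // stand-in for the custom builder
def root = new GroovyShell(new BuilderBinding(builder: builder)).evaluate(text)

def entry = root.children()[0]
println "${root.name()} -> ${entry.name()} ${entry.attributes()}"
```

Note that with this binding any undefined name in the text is forwarded to the builder, so a typo in the file becomes an extra node rather than an error.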
I think the problem you described is better solved with a slurper or parser.
See:
http://groovy.codehaus.org/Reading+XML+using+Groovy%27s+XmlSlurper
http://groovy.codehaus.org/Reading+XML+using+Groovy%27s+XmlParser
for XML based examples.
In your case. Given the XML file:
<foo>
<entry name='Some Entry' property1="value1" property2="value2"/>
</foo>
You could slurp it with:
def text = new File("test.xml").text
def foo = new XmlSlurper().parseText(text)
def allEntries = foo.entry
allEntries.each {
    println it.@name
    println it.@property1
    println it.@property2
}
Originally, I wanted to be able to specify
"Some Entry" (property1:value1, property2: value2)
in an external file. I'm specifically trying to avoid XML and XML-like syntax to make these files easier for regular users to create and modify. My current solution uses ConfigSlurper and the file now looks like:
"Some Entry"
{
property1 = value1
property2 = value2
}
ConfigSlurper gives me a map like this:
["Some Entry":[property1:value1,property2:value2]]
It's pretty simple to use these values to create my objects, especially since I can just pass the property/value map into the constructor.
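A minimal sketch of that last step, assuming the values in the file are quoted strings and that the target class (here a hypothetical `Entry`) has matching properties. The config text is inlined instead of being read from a file.

```groovy
// Hypothetical target class; its properties mirror the keys in the file.
class Entry {
    String property1
    String property2
}

// Inline stand-in for the external file's contents.
def text = '''
"Some Entry" {
    property1 = "value1"
    property2 = "value2"
}
'''

def config = new ConfigSlurper().parse(text)

// Each top-level key maps to a nested map that feeds the map-based constructor.
def entries = config.collectEntries { name, props -> [(name): new Entry(props)] }
println entries['Some Entry'].property1
```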

Resources