Geb Test Framework -- get raw page content

Is there a way to get the raw page content using Geb?
For example, the following test should work (but PhantomJS seems to pad the JSON response with HTML code):
def "Get page content example -- health check"() {
    when:
    go ""
    then:
    assert driver.pageSource.startsWith('{"status":"(good)"')
}
Note: yes, I understand that I could skip Geb and simply make a URL call in Groovy, but for a number of reasons I want to use Geb explicitly (one of them is dealing with redirects).

What a web browser renders when it loads a URL depends on the browser itself, and there is nothing you can do about it. PhantomJS is built on WebKit, the engine that Chrome's Blink was forked from, so both of them render some HTML around the JSON. IE, Edge and Firefox do the same, by the way. HtmlUnit, for a change, returns the pure JSON. But why bother with exact matches like startsWith if you can just use a regular expression? It is much more flexible:
driver.pageSource =~ /"status":"good"/
This should work in all browser engines.
P.S.: You do not need assert in then: or expect: blocks, that is the beauty of Spock/Geb.


Capybara can not match xml page

I have a problem matching response text on an XML page with Capybara.
When I use page.should(have_content(arg1)), Capybara raises an error that there is no \html element (there shouldn't be one, as it's XML).
When I use page.should(have_xpath(arg1)), it raises Element at 40 no longer present in the DOM (Capybara::Webkit::NodeNotAttachedError).
What is the correct way to test XML?
When using capybara-webkit, the driver will try to use a browser's HTML DOM to look for elements. That doesn't work, because you don't have an HTML document.
One workaround is to fall back to Capybara's string implementation:
xml = Capybara.string(page.body)
expect(xml).to have_xpath(arg1)
expect(xml).to have_content(arg1)
Assuming your page returns a content type of text/xml, capybara-webkit won't mess with the response body at all, so you can pass it through to Capybara.string (or directly to Nokogiri if you like).

Reusing Yesod widgets in AJAX results

I'm writing a very simple Yesod message list that uses AJAX to add new list items without reloading the page (both when other users modify the database and when the client themselves adds an item). This means I have to encode the HTML structure of the message items in both the Hamlet template (when the page loads initially) and the Julius template (for when the dynamic addition happens). They look something like this:
In homepage.hamlet:
$if not $ null messages
  <ul id=#{listId}>
    $forall Entity mid message <- messages
      <li id=#{toPathPiece mid}>
        <p>#{showMarkdown $ messageText message}
        <abbr .timeago title=#{showUTCTime $ messagePosted message}>
And in homepage.julius:
function(message) {
  $('##{rawJS listId}').prepend(
    $('<li>')
      .append('<p>' + message.text + '</p>')
      .append($('<abbr class=timeago />')
        .attr('title', message.posted).timeago())
  );
}
I'd love to be able to unify these two representations somehow. Am I out of luck, or could I somehow abuse widgets into both generating an HTML response, and filling in code in a JavaScript file?
Note: Of course, I understand that the templates would have to work very differently, since the AJAX call is getting its values from a JS object, not from the server. It's a long shot, but I thought I'd see if anyone's thought about this before.
I think it's something of an AJAX best practice to pick one place to do your template rendering, either on the server or on the client. Yesod is (currently) oriented toward doing the rendering on the server.
This can still work with AJAX replacement of contents, though. Instead of getting a JSON response from the POST, request a text/html response: the server renders the template with the values it would otherwise have returned as JSON, and the client inserts that markup into the DOM node being updated (for example by replacing its innerHTML).
If you want to support both JSON and HTML responses (say, for third-party applications using an API), make the format of the response a function of the request: either append ".json" or ".html" to the URL, or send an HTTP header (such as Accept) that names the document type the client requires.
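To make "the format is a function of the request" concrete, here is a minimal, framework-independent sketch in JavaScript. The name pickFormat, the HTML default, and the assumption that the path carries no query string are all mine for illustration. This is not Yesod API, and the Accept handling deliberately ignores q-values.

```javascript
// Hypothetical helper: decide which representation a request is asking for.
// An explicit URL suffix wins; otherwise fall back to the Accept header.
function pickFormat(path, acceptHeader) {
  // Explicit suffix, e.g. /messages.json or /messages.html
  if (/\.json$/i.test(path)) return 'json';
  if (/\.html$/i.test(path)) return 'html';

  // Simplified Accept handling: walk the media types in the order listed,
  // ignoring q-values (a real server should honour them).
  const types = (acceptHeader || '')
    .split(',')
    .map(function (part) { return part.split(';')[0].trim().toLowerCase(); });

  for (const type of types) {
    if (type === 'application/json') return 'json';
    if (type === 'text/html') return 'html';
  }
  return 'html'; // assumed default when nothing matches
}
```

A handler would call this once per request and then either serialise the values as JSON or render the server-side template with them.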
It would be nice if Yesod provided a 'jwhamlet' template or something that would render the HTML via javascript in order to support client rendering, but I'm not aware of one. That's not to say there isn't one I'm not aware of, though, so keep an eye open for other answers.
If you wanted to make such a thing, you might try tweaking the hamlet quasi-quote code. Instead of expanding the quasi-quotes to an HTML-generating function, it would expand them to two things: a JSON-generating function and a pre-rendered, mustache-style template. The JSON returned by the function would then provide exactly the context the template needs to render the way you want.
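For a feel of what the client half of that idea could look like, here is a rough sketch of a mustache-style renderer in JavaScript. Everything in it is an assumption for illustration: the render and escapeHtml helpers, the template text, and the field names id, text and posted. It supports only flat {{name}} placeholders and is not a real mustache implementation.

```javascript
// Escape a value so it can be placed safely in HTML text or attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Replace each {{name}} with the escaped value of context[name].
function render(template, context) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, function (match, key) {
    // Leave unknown placeholders untouched so mistakes stay visible.
    return Object.prototype.hasOwnProperty.call(context, key)
      ? escapeHtml(context[key])
      : match;
  });
}

// Hypothetical template text, as the server might emit it once.
const itemTemplate =
  '<li id="{{id}}"><p>{{text}}</p><abbr class="timeago" title="{{posted}}"></abbr></li>';

// Hypothetical JSON payload for one message.
const html = render(itemTemplate, {
  id: 'm42',
  text: 'Hello <world>',
  posted: '2012-01-01T00:00:00Z'
});
```

Note that this sketch escapes every value, so a field that already contains rendered Markdown would need a separate raw-interpolation form.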

Easy way to get hyperlink info from rendered web page

I'd like to do this programmatically:
Given a page URL, I need to get all links on the page. What's important is that at least 3 pieces of link info must be obtained: anchor text, href attribute value, absolute position of the link on the page.
The Java CSSBox library is an option, but it's not fully implemented yet (the href attribute value cannot be obtained at the same time, so some extra mapping must be done with an additional library such as Jsoup). What's more, the CSSBox library renders a page really slowly.
It seems that JavaScript has all the functions available, but we have to inject the JavaScript code into the page and write a driver to take advantage of existing browsers. Scripting languages such as Python and Ruby have support for this as well. It is hard for me to work out which tool is the handiest.
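A minimal sketch of the injected-JavaScript route, using only standard DOM calls (querySelectorAll, getBoundingClientRect, and the window scroll offsets). The function name collectLinks and the shape of the returned objects are assumptions for illustration. The document and window are passed in as parameters so the function can also be tried against stand-in objects.

```javascript
// Collect anchor text, resolved href and absolute page position for each link.
// `doc` and `win` are normally the page's `document` and `window`.
function collectLinks(doc, win) {
  return Array.from(doc.querySelectorAll('a[href]')).map(function (a) {
    // getBoundingClientRect() is viewport-relative; adding the scroll
    // offsets converts it to document coordinates.
    const rect = a.getBoundingClientRect();
    return {
      text: (a.textContent || '').trim(),
      href: a.href, // the DOM property is already resolved to an absolute URL
      x: rect.left + win.scrollX,
      y: rect.top + win.scrollY,
      width: rect.width,
      height: rect.height
    };
  });
}
```

Inside a real page, whatever browser driver you pick would evaluate collectLinks(document, window) and hand the resulting array back to the calling program.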
Does PHP's DOM manipulation library help you?

Node.JS testing DOM with Mocha?

I'm trying to do some simple view testing in my Node.JS app using the Mocha testing framework...
For example, I want to test that the <ol> on the page has n <li> children, given the number of records I setup in my test.
I tried using Apricot to do this, and while I got it to work, when it fails the error messages are fantastically unhelpful... also, it doesn't always work.
What I'd love is a simple way to test the response body for HTML elements, so I can determine if they match the data they should be displaying.
Here's my test in its current state:
Anyone know how I can do this?
Posting the comment as answer as well.
For DOM manipulation or element finding, I suggest the great library cheerio, which loads the HTML as a string and then lets you use jQuery-like selectors. It also seems to be really lightweight. I replaced JSDOM with a request + cheerio combination.

Safari is more forgiving locally than remotely with malformed HTML. Why?

I ran into a curious issue today. We have a web page that hides the body via CSS and then there's a bit of JavaScript that sets the body to display: block to show it. (This is part of some iFrame busting logic we are required to add).
We were having issues on one page, but only in Safari. Taking a look at things, I found that the culprit was an include file that contained its own body tag, so we were ending up with malformed HTML: a body tag nested within the page's existing body tag.
Since the JS was looking for the first body tag, the content we actually wanted to show was never shown, because it was wrapped in the second body tag.
I assume Firefox was just forgiving of the HTML and ignored the second body tag. Safari didn't do this when we looked at the page on the server.
However, if I grab the file and run it locally, Safari does tell me:
Extra <body> encountered. Migrating attributes back to the original <body> element and ignoring the tag.
I'm curious as to why Safari might have adopted this 'policy' of ignoring bad HTML locally but not from a server. If it matters, it is an https site we're hitting. Perhaps Safari is being wise and trying to avoid any potential security issues with allowing bad HTML?
