casperjs empty POST data when large values are submitted - node.js

I have a form that is loaded, and I simply submit it. The form contains an input type=hidden field that stores a long string. This works fine in a regular browser but does not work with Casper. On analyzing this, Apache itself is receiving empty POST data from Casper. If I reduce the data in the hidden input it works fine. Is there a size limit or something defined in Casper?
Below is the code:
var casper = require('casper').create();

casper.start('http://localhost/loadForm', function() {
    // Wait for the page to be loaded
    this.waitForSelector('form[action="/saveConfig"]');
});

casper.then(function() {
    this.evaluate(function() {
        $('#form').submit();
    });
});

casper.run();

The bug report below is what helped me. I think this is a PhantomJS bug. One of the hidden fields was storing a base64 PNG image, and in my HTML page it was filled by canvas.toDataURL("image/png"). In CasperJS this produces a different base64 string than actual browsers do, which resulted in $_POST being empty in PHP. But when I tried file_get_contents("php://input"), the data was all present. I solved it by using canvas.toDataURL("image/png", 0); the second argument produces consistent output in both browsers and CasperJS.
https://github.com/ariya/phantomjs/issues/10455
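
For reference, here is a minimal sketch of that workaround; the element IDs are hypothetical:

// Fill the hidden field with a canvas snapshot. Passing the second
// argument to toDataURL makes the base64 output consistent between
// CasperJS/PhantomJS and regular browsers (see the bug report above).
var canvas = document.getElementById('snapshot-canvas');   // hypothetical canvas id
var data = canvas.toDataURL('image/png', 0);               // note the second argument
document.getElementById('png-field').value = data;         // hypothetical hidden input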

Related

Example to show how mobify works

I have been looking at the mobify.js website for a while now, but I fail to understand the benefits of using it. (I am stumped as to why one would replace all the images on a page with a Grumpy Cat image?)
Could you kindly point me to a clear and lucid example in which I can see my image size change depending on the browser resolution.
I have done the following tasks so far:
0. Included the mobify.js header information
1. Used the mountains.jpg and forest.jpg images on my hosted website (the page contains only these two images)
2. Requested the page from a desktop machine, from a tablet (Samsung Galaxy 10 inch), and from an Android mobile phone
3. In all three cases I see the same image being downloaded; the size of the image stays the same in each case.
I understand that the magic of size reduction can't happen on the fly, but how do I achieve this?
I realize that the Grumpy Cat example is a bit cheeky, but the same concept applies to solving your problem. Instead of replacing the images with Grumpy Cat images, you could write some logic to replace the images with lower-resolution versions (e.g. mountains-320.jpg and forest-320.jpg).
With Mobify.js, you need to write the adaptations in the JavaScript snippet that you added to your site. So, to load smaller images for mobile, you could define the path to the lower resolution image in your original HTML like this:
<img src="mountain.jpg" data-mobile-src="mountain-320.jpg" />
<img src="forest.jpg" data-mobile-src="forest-320.jpg" />
Then, in the JavaScript snippet, you could modify it to grab the image from the data-mobile-src attribute instead, like this:
if (capturing) {
    // Grab reference to a newly created document
    Mobify.Capture.init(function(capture) {
        // Grab reference to the captured document in progress
        var capturedDoc = capture.capturedDoc;
        // getElementsByTagName only accepts a tag name, so use
        // querySelectorAll for the attribute selector
        var imgs = capturedDoc.querySelectorAll("img[data-mobile-src]");
        for (var i = 0; i < imgs.length; i++) {
            var img = imgs[i];
            var ogImage = img.getAttribute("x-src");
            var mobileImage = img.getAttribute("data-mobile-src");
            img.setAttribute("x-src", mobileImage);
            img.setAttribute("old-src", ogImage);
        }
        // Render source DOM to document
        capture.renderCapturedDoc();
    });
}
Then, you'll see that the mobile site will download and render mountain-320.jpg or forest-320.jpg, but it will not download mountain.jpg or forest.jpg.
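
If you want the variant to depend on the actual device resolution rather than a single mobile version, the same loop inside the capture callback could derive a suffix from the screen width. This is only a sketch: the -320/-768 naming scheme is an assumption, not part of Mobify's API.

// Inside the capture callback: pick an image suffix from the screen width
var width = window.screen.width;
var suffix = width <= 320 ? '-320' : (width <= 800 ? '-768' : '');
var imgs = capturedDoc.querySelectorAll('img');
for (var i = 0; i < imgs.length; i++) {
    var original = imgs[i].getAttribute('x-src');   // e.g. "mountain.jpg"
    if (suffix && original) {
        // Insert the suffix before the extension: mountain.jpg -> mountain-320.jpg
        imgs[i].setAttribute('x-src', original.replace(/(\.[a-z]+)$/i, suffix + '$1'));
    }
}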
Just out of curiosity, what site are you wanting to use Mobify.js on?

Is it possible to use PhantomJS and Node to dynamically generate PDFs from templates?

Background / Need
I am working with a group on a web application using Node.JS and Express. We need to be able to generate reports which can be printed as hard copies and also hard copy forms. Preferably we'd like to dynamically generate PDFs on the server for both reports and hand written forms. We're currently using EJS templates on the server.
Options
I was thinking it would be convenient to use templates to build forms/reports and generate a PDF from the resulting HTML, but my options for doing this appear limited as far as I can find. I have looked at two possible solutions:
PhantomJS -- (npm node-phantom module)
PDFKit
EDIT: I found another Node.JS module which is able to generate PDFs from HTML, called node-wkhtml, which relies on wkhtmltopdf. I am now comparing node-phantom and node-wkhtml. I have been able to generate PDFs on a Node server with both of these, and they both appear capable of doing what I need.
I have seen some examples of using PhantomJS to render PDF documents from websites, but all of the examples I have seen use a URL and do not feed it a string of HTML. I am not sure if I could make this work with templates in order to dynamically generate PDF reports.
When a request for a report comes in, I was hoping to generate HTML from an EJS template and use that to generate a PDF. Is there any way for me to use Phantom to dynamically create a page entirely on the server, without making a request?
My other option is to use PDFkit which allows dynamic generation of PDFs, but it is a canvas-like API and doesn't really support any notion of templates as far as I can tell.
The Question
Does anyone know if I can use PhantomJS with Node to dynamically generate PDFs from HTML produced by a template? Or does anyone know of any other solutions I can use to generate and serve printable reports/forms from my Node/Express back-end?
EJS seems to run fine in PhantomJS (after installing the path module). To load a page in PhantomJS given a string of HTML, do page.content = '<html><head>...';.
npm install ejs and npm install path, then:
var ejs = require('ejs'),
    page = require('webpage').create();

var html = ejs.render('<h1><%= title %></h1>', {
    title: 'wow'
});

page.content = html;
page.render('test.pdf');
phantom.exit();
(Run this script with phantomjs, not node.)
I am going to post an answer for anyone trying to do something similar with node-phantom. Because node-phantom controls a local installation of PhantomJS, it must use asynchronous methods for everything, even when the corresponding PhantomJS operation is synchronous. When setting the content for a page rendered directly in PhantomJS, you can simply do:
page.content = '<h1>Some Markup</h1>';
page.render('page.pdf');
However, when using the node-phantom module within Node you must use the page.set method. The code I used is below.
'use strict';
var phantom = require('node-phantom');

var html = '<!DOCTYPE html><html><head><title>My Webpage</title></head>' +
    '<body><h1>My Webpage</h1><p>This is my webpage. I hope you like it' +
    '!</p></body></html>';

phantom.create(function (error, ph) {
    ph.createPage(function (error, page) {
        page.set('content', html, function (error) {
            if (error) {
                console.log('Error setting content: %s', error);
                ph.exit();
            } else {
                page.render('page.pdf', function (error) {
                    if (error) console.log('Error rendering PDF: %s', error);
                    // Exit only after rendering has finished, otherwise the
                    // PhantomJS process can be killed mid-render
                    ph.exit();
                });
            }
        });
    });
});
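
To tie this back to the original question, here is a hedged sketch of serving such a PDF from an Express route. The template path, data, and output file are hypothetical, and a real implementation would want a unique filename per request:

var express = require('express');
var ejs = require('ejs');
var fs = require('fs');
var phantom = require('node-phantom');

var app = express();

app.get('/report', function (req, res) {
    // Render the EJS template to an HTML string (template path is assumed)
    var template = fs.readFileSync('views/report.ejs', 'utf8');
    var html = ejs.render(template, { title: 'Monthly report' });
    phantom.create(function (error, ph) {
        ph.createPage(function (error, page) {
            page.set('content', html, function (error) {
                page.render('/tmp/report.pdf', function (error) {
                    ph.exit();
                    // Fixed path used for simplicity; use a unique name in practice
                    res.download('/tmp/report.pdf');
                });
            });
        });
    });
});

app.listen(3000);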
A really easy solution to this problem is the node-webshot module: you can pass raw HTML directly in as an argument and it produces the PDF directly.
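
A hedged sketch of that approach, using webshot's documented siteType option to pass raw HTML (the markup here is just a placeholder):

var webshot = require('webshot');

var html = '<h1>My Report</h1><p>Rendered straight from an HTML string.</p>';

// siteType: 'html' tells webshot the first argument is markup, not a URL;
// the .pdf extension selects PDF output via PhantomJS's renderer.
webshot(html, 'report.pdf', { siteType: 'html' }, function (err) {
    if (err) return console.error(err);
    console.log('PDF written to report.pdf');
});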

How can I replicate Chrome's ability to 'resolve' a DOM from bad html?

I'm using cheerio and node.js to parse a webpage and then use CSS selectors to find data on it. Cheerio doesn't perform so well on malformed HTML. jsdom is more forgiving, but the two behave differently, and I've seen each break in certain cases where the other works fine.
Chrome seems to do a fine job with the same malformed html in creating a DOM.
How can I replicate Chrome's ability to create a DOM from malformed HTML, then give the 'cleaned' html representation of this DOM to cheerio for processing?
That way I'll know the HTML it gets is well-formed. I tried phantomjs by setting page.content, but when I then read the value of page.content back, the HTML was still malformed.
You can use https://github.com/aredridel/html5/ which is a lot more forgiving and, in my experience, works where jsdom fails.
But the last time I tested it, a few months back, it was super slow. I hope it has gotten better.
There is also the possibility of spawning a phantomjs process and having it output the data you want as JSON on stdout, to feed back to your Node process.
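
A rough sketch of that spawn-based approach on the Node side; clean.js is a hypothetical PhantomJS script that reads HTML on stdin and prints a JSON result on stdout:

var spawn = require('child_process').spawn;

function cleanWithPhantom(html, callback) {
    var ph = spawn('phantomjs', ['clean.js']);
    var out = '';
    ph.stdout.on('data', function (chunk) { out += chunk; });
    ph.on('close', function () {
        // The PhantomJS script is assumed to print a single JSON document
        callback(JSON.parse(out));
    });
    // Feed the malformed HTML to the PhantomJS script via stdin
    ph.stdin.end(html);
}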
This seems to do the trick, using phantomjs-node and jquery:
function cleanHtmlWithPhantom(html, callback) {
    var phantom = require('phantom');
    phantom.create(function(ph) {
        ph.createPage(function(page) {
            page.injectJs(
                "/some_local_location/jquery_1.6.1.min.js",
                function() {
                    page.evaluate(
                        function() {
                            $('html').html(newHtml);
                            return $('html').html();
                        }.toString().replace(/newHtml/g, "'" + html + "'"),
                        function(result) {
                            callback("<html>" + result + "</html>");
                            ph.exit();
                        }
                    );
                }
            );
        });
    });
}

cleanHtmlWithPhantom(
    "<p>malformed",
    function(newHtml) {
        console.log(newHtml);
    }
);
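
To close the loop with the question, the cleaned markup can then be handed to cheerio (a small usage sketch):

var cheerio = require('cheerio');

cleanHtmlWithPhantom('<p>malformed', function (newHtml) {
    var $ = cheerio.load(newHtml);   // now parsing well-formed markup
    console.log($('p').text());
});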

Not able to get window.location.search value in HTML - jQueryMobile

My JavaScript code looks like this:
var query = window.location.search.replace('?', '');
fetchContent(apiUrl + 'service/' + query + '?callback=?', function(data) {
    $('.content').append(data).trigger('create');
});
I am using jQueryMobile, nodeJS for services and views. Also, using EJS templates to display data.
The problem is that I am not able to get a value for the query variable in the code above, so the following lines build the parameter 'apiUrl/service/?callback=?' for the fetchContent method, which is the wrong URL for my service. The correct parameter for fetchContent should be 'apiUrl/service/1234?callback=?'.
The interesting thing is that this code works fine when I open the link in a new tab. The URL of the above HTML page is some.html?1234, so the value in the JS code should be 1234, but it is empty.
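
No answer was posted, but one thing worth checking (an assumption, not confirmed in the thread) is whether jQuery Mobile pulled the page in via its Ajax navigation, in which case window.location can still reflect the page that started the session. A small diagnostic sketch:

// Log both values to see which URL the browser actually considers current
console.log('search:', window.location.search);
console.log('href:  ', window.location.href);

// Parsing the query out of the full href can work even when search is empty
var match = window.location.href.match(/\?([^#?]+)/);
var query = match ? match[1] : '';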

Scraping URLs from a node.js data stream on the fly

I am working on a node.js project (using Wikistream as a basis, so not totally my own code) which streams real-time Wikipedia edits. The code breaks each edit down into its component parts and stores it as an object (see the gist at https://gist.github.com/2770152). One of the parts is a URL. I am wondering if it is possible, when parsing each edit, to scrape the URL that shows the differences between the pre-edit and post-edit Wikipedia page, grab the difference (inside a span with class 'diffchange diffchange-inline', for example), and add that as another property of the object. Right now it could just be a string; it does not have to be fully structured.
I've tried using nodeio and have some code like this (I am specifically trying to scrape only edits that have been marked in the comments (m[6]) as possible vandalism):
if (m[6].match(/vandal/) && namespace === "article") {
    nodeio.scrape(function() {
        this.getHtml(m[3], function(err, $) {
            //console.log('getting HTML, boss.');
            console.log(err);
            var output = [];
            $('span.diffchange.diffchange-inline').each(function(scraped) {
                output.push(scraped.text);
            });
            vandalContent = output.toString();
        });
    });
} else {
    vandalContent = "no content";
}
When it hits the conditional statement it scrapes once, and then the program closes out. It does not store the desired content as a property of the object. If the condition is not met, it does store a vandalContent property set to "no content".
What I am wondering is: is it even possible to scrape like this on the fly? Is the scraping bogging the program down? Are there other suggested ways to get a similar result?
I haven't used nodeio yet, but the signature looks to be an async callback, so from a program-flow perspective it happens in the background and therefore does not block the next statement from occurring (the next statement being whatever is outside your if block).
It looks like you're trying to do it sequentially, which means you need to either rethink what you want your callback to do, or force it to be sequential by putting the whole thing in a while loop that exits only when you have vandalContent (which I wouldn't recommend).
For a test, try doing a console.log on your vandalContent in the callback and see what it spits out.
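
As a concrete version of the "rethink your callback" suggestion, the scrape result can be delivered through a callback instead of assigned to a shared variable. A sketch, assuming nodeio behaves as in the question's code:

// Deliver the result via a callback rather than assigning vandalContent
// synchronously, since getHtml completes in the background.
function getVandalContent(m, namespace, done) {
    if (m[6].match(/vandal/) && namespace === "article") {
        nodeio.scrape(function () {
            this.getHtml(m[3], function (err, $) {
                if (err) return done("no content");
                var output = [];
                $('span.diffchange.diffchange-inline').each(function (scraped) {
                    output.push(scraped.text);
                });
                done(output.toString());
            });
        });
    } else {
        done("no content");
    }
}

// Usage: attach the content to the edit object once it actually arrives
getVandalContent(m, namespace, function (vandalContent) {
    edit.vandalContent = vandalContent;   // 'edit' is the parsed edit object
    console.log(edit.vandalContent);
});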
