Get browser-rendered HTML + JavaScript on Linux

I need a command-line tool (or JavaScript/PHP, though I think the command line is the only way) that renders a URL and returns the rendered content. The important part is that it must execute the JavaScript, not just load the CSS/HTML/images.
For example, a command like "renderengine http://www.google.es outputfile.html" would save the content of the page (parsed HTML with the JavaScript executed) in outputfile.html.
I need this to capture the result of a fully JavaScript-driven website like Grooveshark: the site loads everything using JavaScript/AJAX, so crawlers find nothing but an empty basic HTML template (the content is loaded afterwards via AJAX/JavaScript).
Is there a browser engine for Linux with JavaScript support (for example V8) that can output the result so it can be saved to a file?

Selenium: very complete solution with bindings in many languages
puppeteer: headless Chrome API, usable in NodeJS or as a command-line tool
HTTrack: command-line tool
Apache Nutch & webmagic: open source Java web crawlers
pholcus: "distributed & high concurrency" web crawler written in Go
Xvfb: a display server implementing the X11 display server protocol without showing any screen output. I have used it successfully with Travis CI and Protractor, for example. Alternative: Xdummy
PhantomJS (first suggested by nvuono): can export the rendered page as non-HTML (pdf, png...). PhantomJS development is suspended until further notice.
Closely related: SlimerJS, CasperJS
And there are many Python web scraping libraries:
Scrapy
pyspider
ghost.py
splinter
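To make the headless Chrome route mentioned above concrete, here is a minimal Node sketch that shells out to Chrome's --headless --dump-dom mode and saves the serialized DOM to a file. The binary name google-chrome and the helper names buildArgs and renderToFile are assumptions for illustration, not part of any tool listed here; adjust the binary name for your system.

```javascript
const { execFile } = require('child_process');
const fs = require('fs');

// Assumed binary name; on some distributions it is "chromium" or "chromium-browser".
const CHROME_BIN = 'google-chrome';

// Build the argument list for a headless run that prints the rendered DOM to stdout.
function buildArgs(url) {
  return ['--headless', '--dump-dom', url];
}

// Render the URL in headless Chrome and write the resulting HTML to outFile.
function renderToFile(url, outFile, callback) {
  const options = { maxBuffer: 64 * 1024 * 1024 }; // allow large pages on stdout
  execFile(CHROME_BIN, buildArgs(url), options, (err, stdout) => {
    if (err) return callback(err);
    fs.writeFile(outFile, stdout, callback);
  });
}

// Example call (not run automatically):
// renderToFile('http://www.google.es', 'outputfile.html', (err) => {
//   if (err) console.error(err); else console.log('saved');
// });
```

Pages that keep fetching data long after the initial load may need a scripted tool (puppeteer, Selenium) that can wait for a specific element before dumping the DOM.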

Try PhantomJS from www.phantomjs.org; you can easily modify the included rasterize.js to export the rendered HTML. It is based on WebKit and fully evaluates your target site's JavaScript, and it lets you adjust timeouts or execute your own code first if you wish. I personally use it to save static HTML copies of fully rendered knockout.js templates.
Since it executes JavaScript, I just did something like this and saved the console output to a file:
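var page = require('webpage').create();
page.open('http://www.google.es', function (status) { // example URL from the question
  var markup = page.evaluate(function () { return document.documentElement.innerHTML; });
  console.log(markup);
  phantom.exit();
});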

Trying to write to a json file using Node fs.writeFile

I hope I'm saying this correctly. What I'm trying to do is write to a JSON file using fs.writeFile.
I can get it to work from the command line, but what I want is to call a function, maybe on a button click, to update the JSON file.
I figure I would need some type of call to the Node server, which runs locally on port 8080. While researching, I saw somebody mention using .post, but I still can't wrap my head around how to write the logic.
$(".button").on("click", function(event) {
  fs.writeFile("./updateme.json", "{test: 1}", function(err) {
    if(err) {
      return console.log(err);
    }
    console.log("The file was saved!");
  });
});
Using jQuery along with fs? Wow, that would be great! Unfortunately, it is not as simple as that!
Let me introduce you to server-side vs. client-side JavaScript. There are actually a lot of resources on the net about that: just google it, or check the answers to this other StackOverflow question. Basically, JavaScript can run either in a browser (Chrome, Firefox...) or as a standalone program (usually a server written in NodeJS), and while the language is (almost) the same, the two platforms don't offer the same features.
The script you're showing is meant to run in a browser, because it uses jQuery and interacts with buttons and the rest of the page (aka the DOM). Can you imagine what a mess it would be if that script could interact with the file system? Any page you visit would be able to crawl through your holiday pictures and other personal stuff you keep on your computer. Bad idea! That is why modules like fs are not available in the browser.
Similarly, libraries like jQuery are not available (or are simply useless) on the server, because there is no HTML and no user interaction, only headless programs running.
So, what can I do to write a JSON file after a user clicks on a button?
You can set up:
A NodeJS server that writes the JSON file
A jQuery call that sends this server the data to be written when the user clicks the button
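A minimal sketch of those two pieces, using only Node's built-in http and fs modules. The /update route, the helper names saveJson and createUpdateServer, and the same-origin assumption for the jQuery call are mine, chosen for illustration; the file name and port come from the question.

```javascript
const http = require('http');
const fs = require('fs');

// File name taken from the question; the /update route is an assumption.
const TARGET_FILE = './updateme.json';

// Parse the request body as JSON, then write it to disk.
function saveJson(body, file) {
  const data = JSON.parse(body); // throws on malformed JSON
  fs.writeFileSync(file, JSON.stringify(data, null, 2));
  return data;
}

// Create (but do not start) a server that accepts POST /update.
function createUpdateServer(file) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/update') {
      res.writeHead(404);
      return res.end();
    }
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      try {
        saveJson(body, file);
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end('{"ok":true}');
      } catch (err) {
        // Covers both malformed JSON and write failures.
        res.writeHead(400);
        res.end('Could not save JSON');
      }
    });
  });
}

// Start it with: createUpdateServer(TARGET_FILE).listen(8080);
//
// Browser side (jQuery), assuming the page is served from the same origin:
// $(".button").on("click", function () {
//   $.ajax({ url: "/update", type: "POST", contentType: "application/json",
//            data: JSON.stringify({ test: 1 }) });
// });
```

Note that the browser sends JSON.stringify({ test: 1 }), which is valid JSON, whereas the literal string "{test: 1}" from the question would be rejected by JSON.parse.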
If you want further guidelines on this, tell me in the comments! I'll be ready to edit my answer to include instructions on setting up such an environment.

How to inject script using node.js code?

You probably all know the open npm package: https://www.npmjs.com/package/open
Using this package, one can write the following code:
var open = require('./node_modules/open/lib/open.js')
open('http://www.cnn.com')
and running it with:
$ node app.js
will open a browser window of cnn.com.
I want my script to open this site and inject some code into the console. That is, the browser should behave as if I had pressed F12, gone to the 'console' tab, and typed the following code into the console:
alert('Hello World')
Do you know how to do it?
The open module is used to "Open a file or url in the user's preferred application."
It can open the preferred application (a browser in this case) but it cannot control it. In fact, it doesn't even know which browser that will be (or even whether it will be a browser).
What you are asking for can be achieved with tools like PhantomJS ("PhantomJS is a headless WebKit scriptable with a JavaScript API."), Nightmare.js ("A high-level browser automation library.") or CasperJS ("Navigation scripting & testing for PhantomJS and SlimerJS"), see:
http://phantomjs.org/
http://www.nightmarejs.org/
http://casperjs.org/

How to configure Cucumber and Capybara to use the Rack::Test driver?

I am a newbie to Capybara.
Here is my configuration within file env.rb
Capybara.configure do |config|
  config.run_server = false
  #config.default_driver = :selenium
  config.default_driver = :rack_test
  config.app_host = 'point to my localhost port 3000'
end
Everything runs just fine if I set default_driver to :selenium. But I need to set the driver to :rack_test, so that running the cucumber command does not open the web browser.
Many thanks,
P/S If you are an expert, please show me the learning path, I'm not expecting someone showing them selves.
I presume you want to test against a test server controlled by capybara (which is the normal way to do it), rather than testing against your dev instance (the one at localhost:3000) or a staging server or something.
First, configure capybara to run your Rails app. The usual way to do this is to add the cucumber-rails gem to your Gemfile and require 'cucumber/rails' in your env.rb. You can also set up capybara to run Rails (or any Rack app) manually.
Having done that, capybara will do what you want (use the Rack::Test driver) by default. Remove the configuration that you showed from your env.rb and Cucumber/capybara will work the way you want.
If you also want some scenarios to use JavaScript, tag those scenarios with @javascript and add
Capybara.javascript_driver = :selenium
to your env.rb. Capybara will continue to use its Rack::Test driver for scenarios without the tag, and will use its Selenium driver for scenarios with the tag.
Thank you, Dave, for helping me along the way. Briefly, to run "cucumber" without opening a web browser (that is, with rack-test), here is the configuration:
1> File env.rb.
require 'cucumber/rails'
Only 1 line above is enough.
2> File .feature
Feature: Post a new Product
Scenario: Open new product page
Given I open new product site
When I input new product
Then I should see the product created confirmed
By the way, we don't need "Capybara.javascript_driver = :selenium" in env.rb.
There are still so many tricky things I need to learn about Capybara and Cucumber.

Actionscript 3.0 Disable navigateToURL

A web game I play, which allows user-uploaded content, has been having a lot of issues with people using the navigateToURL function to send players to random websites. I was curious whether there is a way to disable this function using ActionScript 2 or 3. I have seen a way to do it using the HTML embed, but I do not have administrative access to the website.
After doing some more research, I have come up with a solid answer:
You should use a combination of PHP and an executable called swfdump on the server side to validate the user uploaded content.
swfdump is an exe file located in the bin folder of the Flex SDK. You can run it from PHP using exec.
It will read the bytecode of the swf and produce a report. From that you can easily locate which files contain navigateToURL() and reject them.
I tested a file of my own using swfdump -abc -out myfilereport.swfx myfile.swf
and in that output I found this:
findpropstrict flash.net:navigateToURL
findpropstrict flash.net:URLRequest
pushstring "http://www.plasticsturgeon.com"
constructprop flash.net:URLRequest (1)
callproperty flash.net:navigateToURL (1)
The URL I was using was "http://www.plasticsturgeon.com". But it would be far easier to just eliminate any swf that includes flash.net.navigateToURL. Once you identify that it is present, you can generate an error notice for your end user.
So using this method you can find and reject any swf that uses navigateToURL. You could even create a batch job to run over and invalidate any existing assets with this problem.
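The check described above can be sketched as follows. This version uses Node rather than the PHP exec call mentioned earlier (a swapped-in language, chosen only for illustration), and the helper names usesNavigateToURL and shouldReject are hypothetical. It assumes swfdump from the Flex SDK bin folder is on the PATH.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');

// Return true if an "swfdump -abc" report references flash.net:navigateToURL.
function usesNavigateToURL(reportText) {
  return /\bflash\.net:navigateToURL\b/.test(reportText);
}

// Run swfdump on an uploaded file, then scan the report it writes.
// Assumes the swfdump executable is reachable via the PATH.
function shouldReject(swfPath, reportPath) {
  execFileSync('swfdump', ['-abc', '-out', reportPath, swfPath]);
  return usesNavigateToURL(fs.readFileSync(reportPath, 'utf8'));
}

// Example call (not run automatically):
// if (shouldReject('myfile.swf', 'myfilereport.swfx')) {
//   console.log('Upload rejected: the file calls navigateToURL');
// }
```

This is a plain text match on the report, so treat it as a first filter rather than a complete validator.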
More information about using bytecode:
http://code.google.com/p/redtamarin/wiki/ABC
And about decompiling ASbytecode:
http://dougmccune.com/flex/FOTB_Decompiling_Doug_McCune.pdf

Chrome extension: "Uncaught Error: "getBackgroundPage" can only be used in extension processes...."

I published a chrome extension to testers only. The app seems to work very well. I don't see any errors when inspecting the console for the popup or the background page. However, I get the following error when inspecting the console for any web page: "Uncaught Error: "getBackgroundPage" can only be used in extension processes. See the content scripts documentation for more extensions/schema_generated_bindings.js:418"
This app contains several JavaScript files, but each one wraps its code in a self-executing function. The "getBackgroundPage" calls are in those JavaScript files.
Could you please help? Isn't the app I built an isolated module independent from any web page? How can I prevent this error from happening?
I had the same error when I was trying to communicate with the background page from my content script. The correct way to do it is via Message Passing, which is very well documented here: https://developer.chrome.com/extensions/messaging.html
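As a rough sketch of what message passing looks like, the content script sends a request with chrome.runtime.sendMessage and the background page answers through a chrome.runtime.onMessage listener, instead of the content script calling getBackgroundPage. The message type 'getSettings', the settings object, and the two function names are hypothetical; in a real extension the two halves live in separate files and each file calls its own function.

```javascript
// --- background page (runs in the extension process) ---
// Hypothetical state that the content script wants to read.
const settings = { enabled: true };

function setupBackground() {
  chrome.runtime.onMessage.addListener(function (message, sender, sendResponse) {
    if (message.type === 'getSettings') {
      sendResponse(settings); // reply to the content script
    }
  });
}

// --- content script (runs alongside the web page) ---
function requestSettings(callback) {
  // Ask the background page for data rather than accessing it directly.
  chrome.runtime.sendMessage({ type: 'getSettings' }, callback);
}

// In background.js:      setupBackground();
// In the content script: requestSettings(function (s) { console.log(s.enabled); });
```

Wrapping the calls in functions is only a convenience for the sketch; calling the chrome.runtime methods at the top level of each file works the same way.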
