I am new to node.js
I want to try to write node.js client for my web site testing
(stuff like login, filling forms, etc...)
Which module should i use for that?
Since I want to test user login following other user functionality
it should be able to keep session like browser
Also any site where it has example of using that module?
Thanks
As Amenadiel has said in the comments, you might want to use something like Phantom.js for testing websites.
But if you're new to node.js maybe try with something light, like Zombie.js.
An example from their home page:
var Browser = require("zombie");
var assert = require("assert");
// Load the page from localhost
browser = new Browser()
browser.visit("http://localhost:3000/", function () {
// Fill email, password and submit form
browser.
fill("email", "zombie#underworld.dead").
fill("password", "eat-the-living").
pressButton("Sign Me Up!", function() {
// Form submitted, new page loaded.
assert.ok(browser.success);
assert.equal(browser.text("title"), "Welcome To Brains Depot");
})
});
Later on, when you get the hang of it, maybe switch to Phantom (which has webkit beneath, so it's not emulating the Dom).
Related
I have a code which uses nodeJS for a web app using ExpressJS. A sample of the code which redirects the user to a webpage is:
app.post('/accept',(req,res)=>{
res.render("home");
});
The code works fine and redirects the user to the said page, but I want that to happen on a new tab, as in it will redirect the user to the "home" page on a new tab. How do I do that?
Your script is server side and makes a response to the client. You can't define there to open a new tab. So you have to handle that in your frontend via javascript.
I have a challenge I'm running into and cannot seem to find an answer for it anywhere on the web. I'm working on a personal project; it's a Node.js application that uses the request and cheerio packages to hit an end-point and scrape some data... However, the endpoint is a Facebook page... and the display of its content is dependent upon whether the user is logged in or not.
In short, the app seeks to scrape the user's saved links, you know, all that stuff you add to your "save for later" but never actually go back to (at least in my case). The end-point, then, is htpps://www.facebook.com/saved. If, in your browser, you are logged into Facebook, clicking that link will take you where the application needs to go. However, since the application isn't technically going through the browser that has your credentials and your session saved, I'm running into a bit of an issue...
Yes, using the request module I'm able to successfully reach "a" part of Facebook, but not the one I need... My question really is: how should I begin to handle this challenge?
This is all the code I have for the app so far:
var express = require('express');
var fs = require('fs');
var request = require('request');
var cheerio = require('cheerio');
var app = express();
app.get('/scrape', (req, res) => {
// Workspace
var url = 'https://www.facebook.com/saved';
request(url, (err, response, html) => {
if (err) console.log(err);
res.send(JSON.stringify(html));
})
})
app.listen('8081', () => {
console.log('App listening on port 8081');
})
Any input will be greatly appreciated... Currently, I'm on hold...! How could I possibly hit this end-point with credentials (safely) provided by the user so that the application could get legitimately get past authentication and reach the desired end-point?
I don't think you can accomplish that using request-cheerio module since you need to make a post request with your login information.
A headless browser is more appropriate for this kind of project if you want it to be a scraper. Try using casperJs or PhantomJs. It will give you more flexibility but it's not a node.js module so you need to make a step further if you want to incorporate it with express.
One nodeJs module I know that can let you post is Osmosis. If you can make .login(user, pw) to work then that'll be great but I don't think it can successfully login to facebook though.
API if possible would be a much nicer solution but I'm assuming you already looked it up and find nothing in there for what you are looking for.
My personal choice would be to use an RobotProcessAutomation. WinAutomation, for example, is a great tool for manipulating web and scraping. It's a whole new different approach but it can do the job well and can be implemented faster compared to programmatically coding it.
I want to perform following actions at the server side:
1) Scrape a webpage
2) Simulate a click on that page and then navigate to the new page.
3) Scrape the new page
4) Simulate some button clicks on the new page
5) Sending the data back to the client via json or something
I am thinking of using it with Node.js.
But am confused as to which module should i use
a) Zombie
b) Node.io
c) Phantomjs
d) JSDOM
e) Anything else
I have installed node,io but am not able to run it via command prompt.
PS: I am working in windows 2008 server
Zombie.js and Node.io run on JSDOM, hence your options are either going with JSDOM (or any equivalent wrapper), a headless browser (PhantomJS, SlimerJS) or Cheerio.
JSDOM is fairly slow because it has to recreate DOM and CSSOM in Node.js.
PhantomJS/SlimerJS are proper headless browsers, thus performances are ok and those are also very reliable.
Cheerio is a lightweight alternative to JSDOM. It doesn't recreate the entire page in Node.js (it just downloads and parses the DOM - no javascript is executed). Therefore you can't really click on buttons/links, but it's very fast to scrape webpages.
Given your requirements, I'd probably go with something like a headless browser. In particular, I'd choose CasperJS because it has a nice and expressive API, it's fast and reliable (it doesn't need to reinvent the wheel on how to parse and render the dom or css like JSDOM does) and it's very easy to interact with elements such as buttons and links.
Your workflow in CasperJS should look more or less like this:
casper.start();
casper
.then(function(){
console.log("Start:");
})
.thenOpen("https://www.domain.com/page1")
.then(function(){
// scrape something
this.echo(this.getHTML('h1#foobar'));
})
.thenClick("#button1")
.then(function(){
// scrape something else
this.echo(this.getHTML('h2#foobar'));
})
.thenClick("#button2")
thenOpen("http://myserver.com", {
method: "post",
data: {
my: 'data',
}
}, function() {
this.echo("data sent back to the server")
});
casper.run();
Short answer (in 2019): Use puppeteer
If you need a full (headless) browser, use puppeteer instead of PhantomJS as it offers an up-to-date Chromium browser with a rich API to automate any browser crawling and scraping tasks. If you only want to parse a HTML document (without executing JavaScript inside the page) you should check out jsdom and cheerio.
Explanation
Tools like jsdom (or cheerio) allow it to extract information from a HTML document by parsing it. This is fast and works well as long as the website does not contain JavaScript. It will be very hard or even impossible to extract information from a website built on JavaScript. jsdom, for example, is able to execute scripts, but runs them inside a sandbox in your Node.js environment, which can be very dangerous and possibly crash your application. To quote the docs:
However, this is also highly dangerous when dealing with untrusted content.
Therefore, to reliably crawl more complex websites, you need an actual browser. For years, the most popular solution for this task was PhantomJS. But in 2018, the development of PhantomJS was offically suspended. Thankfully, since April 2017 the Google Chrome team makes it possible to run the Chrome browser headlessly (announcement).
This makes it possible to crawl websites using an up-to-date browser with full JavaScript support.
To control the browser, the library puppeteer, which is also maintained by Google developers, offers a rich API for use within the Node.js environment.
Code sample
The lines below, show a simple example. It uses Promises and the async/await syntax to execute a number of tasks. First, the browser is started (puppeteer.launch) and a URL is opened page.goto.
After that, a functions like page.evaluate and page.click are used to extract information and execute actions on the page. Finally, the browser is closed (browser.close).
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
// example: get innerHTML of an element
const someContent = await page.$eval('#selector', el => el.innerHTML);
// Use Promise.all to wait for two actions (navigation and click)
await Promise.all([
page.waitForNavigation(), // wait for navigation to happen
page.click('a.some-link'), // click link to cause navigation
]);
// another example, this time using the evaluate function to return innerText of body
const moreContent = await page.evaluate(() => document.body.innerText);
// click another button
await page.click('#button');
// close brower when we are done
await browser.close();
})();
The modules you listed do the following:
Phantomjs/Zombie - simulate browser (headless - nothing is actually displayed). Can be used for scraping static or dynamic. Or testing of your html pages.
Node.io/jsdom - webscraping : extracting data from page (static).
Looking at your requirements, you could use phantom or zombie.
I am trying to create a simple javascript based extension for Google Chrome that takes data from one specific iframe and sends it as part of a POST request to a webpage.
The web page that sends the data submitted by POST request, to my email address.
I tried running the extension, it looks to be running fine, but I am not getting any email.
The servlet which receives form data is very simple, I dont think there is any error in it.
What I want is some way to check if the javascript based extension works or not.
The javascript code is given below-
var mypostrequest=new ajaxRequest()
mypostrequest.onreadystatechange=function(){
if (mypostrequest.readyState==4){
if (mypostrequest.status==200 || window.location.href.indexOf("http")==-1){
document.getElementById("result").innerHTML=mypostrequest.responseText
}
else{
alert("An error has occured making the request")
}
}
}
var namevalue=encodeURIComponent("Arvind")
var descvalue=encodeURIComponent(window.frames['test_iframe'].document.body.innerHTML)
var emailvalue=encodeURIComponent("arvindikchari#yahoo.com")
var parameters="name="+namevalue+"&description="+descvalue &email="+emailvalue
mypostrequest.open("POST", "http://taurusarticlesubmitter.appspot.com/sampleform", true)
mypostrequest.setRequestHeader("Content-type", "application/x-www-form-urlencoded")
mypostrequest.send(parameters)
UPDATE
I made changes so that the content in js file is invoked by background page, but even now the extension is not working.
I put the following code in background.html:
<script>
// Called when the user clicks on the browser action.
chrome.browserAction.onClicked.addListener(function(tab) {
chrome.tabs.executeScript( null, {file: "content.js"});
});
chrome.browserAction.setBadgeBackgroundColor({color:[0, 200, 0, 100]});
</script>
Looking at your code looks like you are trying to send cross domain ajax request from a content script. This is not allowed, you can do that only from background pages and after corresponding domains are declared in the manifest. More info here.
To check if your extension works, you can open dev tools and check if there any errors in the console. Open "Network" tab and see if request was sent to your URL. Place console.log in various places in your code for debugging, or use full featured built in javascript debugger for step-by-step debugging.
Is there a way to check iOS to see if another app has been installed and then launched? If memory serves me this was not possible in early versions but has this been changed?
Doable, but tricky.
Launching installed apps, like the FB or Twitter apps, is done using the Custom URL Scheme. These can be used both in other apps as well as on web sites.
Here's an article about how to do this with your own app.
Seeing if the URL is there, though, can be tricky. A good example of an app that detects installed apps is Boxcar. The thing here is that Boxcar has advanced knowledge of the custom URL's. I'm fairly (99%) certain that there is a canOpenURL:, so knowing the custom scheme of the app you want to target ahead of time makes this simple to implement.
Here's a partial list of some of the more popular URL's you can check against.
There is a way to find out the custom app URL : https://www.amerhukic.com/finding-the-custom-url-scheme-of-an-ios-app
But if you want to scan for apps and deduce their URL's, it can't be done on a non-JB device.
Here's a blog post talking about how the folks at Bump handled the problem.
There is a script like the following.
<script type="text/javascript">
function startMyApp()
{
document.location = 'yourAppScheme://';
setTimeout( function()
{
if( confirm( 'You do not seem to have Your App installed, do you want to go download it now?'))
{
document.location = 'http://itunes.apple.com/us/app/yourAppId';
}
}, 300);
}
</script>
Calling this script from the web (Try to start MyApp), you can determine if your app with scheme "yourAppScheme" is installed on the device or not.
The App will launch if it is installed on the device and "yourAppScheme" is registered in it.
If the app is not installed you can suggest the user to install this app from iTunes.
To check if an app is installed (e.g. Clear):
BOOL installed = [[UIApplication sharedApplication] canOpenURL:[NSURL URLWithString:#"clearapp://"]];
To open that app:
BOOL success = [[UIApplication sharedApplication] openURL:[NSURL URLWithString:#"clearapp://"]];
Hides the error message if the app is not installed
At Branch we use a form of the code below--note that the iframe works on more browsers. Simply substitute in your app's URI and your App Store link.
<!DOCTYPE html>
<html>
<body>
<script type="text/javascript">
window.onload = function() {
// Deep link to your app goes here
document.getElementById("l").src = "my_app://";
setTimeout(function() {
// Link to the App Store should go here -- only fires if deep link fails
window.location = "https://itunes.apple.com/us/app/my.app/id123456789?ls=1&mt=8";
}, 500);
};
</script>
<iframe id="l" width="1" height="1" style="visibility:hidden"></iframe>
</body>
</html>
There's a second possibility that relies on cookies first and the javascript redirect only as a fallback. Here's the logic:
When a user without the app first taps on a link to your app, he or she is redirected straight to the App Store. This is accomplished by a link to your app actually being a dynamically-generated page on your servers with the redirect. You create a cookie and log a "digital fingerprint" of IP address, OS, OS version, etc. on your backend.
When the user installs the app and opens it, you collect and send another "digital fingerprint" to your backend. Now your backend knows the link is installed On any subsequent visits to links associated with your app, your servers make sure that the dynamically-generated redirect page leads to the app, not the App Store, based on the cookie sent up with the request.
This avoids the ugly redirect but involves a ton more work.
To my understanding, because of privacy issues, you can't see if an app is installed on the device. The way around this is to try and launch the app and if it doesn't launch to have the user hit the fall back url. To prevent the mobile safari error from occurring I found that placing it in an iframe helps resolve the issue.
Here's a snippet of code that I used.
<form name="mobileForm" action="mobile_landing.php" method="post">
<input type="hidden" name="url" value="<?=$web_client_url?>">
<input type="hidden" name="mobile_app" value="<?=$mobile_app?>">
<input type="hidden" name="device_os" value="<?=$device_os?>">
</form>
<script type="text/javascript">
var device_os = '<? echo $device_os; ?>';
if (device_os == 'ios'){
var now = new Date().valueOf();
setTimeout(function () {
if (new Date().valueOf() - now > 100)
return;
document.forms[0].submit(); }, 5);
var redirect = function (location) {
var iframe = document.createElement('iframe');
iframe.setAttribute('src', location);
iframe.setAttribute('width', '1px');
iframe.setAttribute('height', '1px');
iframe.setAttribute('position', 'absolute');
iframe.setAttribute('top', '0');
iframe.setAttribute('left', '0');
document.documentElement.appendChild(iframe);
iframe.parentNode.removeChild(iframe);
iframe = null;
};
setTimeout(function(){
window.close()
}, 150 );
redirect("AppScheme");
I struggled with this recently, and here is the solution I came up with. Notice that there is still no surefire way to detect whether the app launched or not.
I serve a page from my server which redirects to an iPhone-specific variant upon detecting the User-Agent. Links to that page can only be shared via email / SMS or Facebook.
The page renders a minimal version of the referenced document, but then automatically tries to open the app as soon as it loads, using a hidden <iframe> (AJAX always fails in this situation -- you can't use jQuery or XMLHttpRequest for this).
If the URL scheme is registered, the app will open and the user will be able to do everything they need. Either way, the page displays a message like this at the bottom: "Did the app launch? If not, you probably haven't installed it yet .... " with a link to the store.