Make a click event in NodeJS

I'm trying to trigger a click from the server side.
I'm using Node.js, so I'm not able to use jQuery functions.
I would like to click on the .next class.
This is what I would like to do:
while (nbrPage > 0) {
  // my scraping code
  nbrPage--;
  $('.next').click();
}
Note that the HTML code to scrape looks like this:
<span class="next">
  <a id="nextPage-159c6fa8635" class="page" href="/blablabla"></a>
</span>
Does anyone know how to use jQuery methods in Node.js code, or how to trigger a click in Node.js?
EDIT: I'm scraping a website and I want to loop over each pagination page and scrape my data from each one. For this I need to go to the next page by clicking on the HTML element below. In other words, I would like to use jQuery functions like $('.next').click() in my Node.js code (using request and cheerio).
Note that I don't want to handle the click event; I want to trigger the click.
Thanks for your help

Cheerio is a pretty useful tool which allows you to use jQuery-style selectors within Node.js. You can find more information at https://github.com/cheeriojs/cheerio
Request is designed to be the simplest way possible to make HTTP calls. It supports HTTPS and follows redirects by default. Check out its documentation at https://github.com/request/request
For the server side, you need to create a function that finds the a element whose id starts with "nextPage-". Then, IF it is found, you get the value of its href attribute.
From there you pass that value back to your "request" script, which I assume you already have, and continue your scraping until a "nextPage-" link can no longer be located.
That repetitive sequence of a function calling itself is called "recursion".
Now for what that might look like in code -
// Load dependencies
const CHEERIO = require("cheerio");
const REQUEST = require("request");

/**
 * Scrapes HTML to find the next page URL
 *
 * @function getNextPageUrl
 *
 * @param {string} HTML
 *
 * @returns {string|boolean} Returns the URL, or false
 */
function getNextPageUrl(HTML) {
  // Load in the scraped HTML
  let $ = CHEERIO.load(HTML);
  // Find the first anchor whose ID starts with `nextPage-`
  let nextPage = $("a[id^='nextPage-']:first");
  // A length of 0 is falsy
  if (nextPage.length) {
    // Return the href attribute value
    return nextPage.attr("href");
  } else {
    // Nothing found, return false
    return false;
  }
}
/**
 * Scrapes the HTML from pages
 *
 * @function scraper
 *
 * @param {string} URL
 *
 * @returns {string|boolean} Returns the URL, or false
 */
function scraper(URL) {
  // Check if a URL was provided
  if (!URL) {
    return false;
  }
  // Send out a request to the URL
  REQUEST(URL, function(error, response, body) {
    // Check for errors
    if (!error && response.statusCode == 200) {
      console.log(body); // Show the HTML
      // Recursion
      let nextUrl = getNextPageUrl(body);
      scraper(nextUrl);
    } else {
      return false;
    }
  });
}

// Pass to the getNextPageUrl function to test
// console.log(getNextPageUrl("<span class='next'><a id='nextPage-159c6fa8635' class='page' href='/blablabla'></a></span>"));

// Start the initial scraping
scraper("http://google.com");

It's impossible to do this in Node.js itself: Node.js runs on the server, not in the browser, so there is no rendered page to click on.
As a solution, you can parse the href of the link and make a request to scrape the next page. This is how server-side scrapers usually work.
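A minimal sketch of that idea, using the markup from the question. This uses only a regular expression so it runs with no dependencies; a real parser such as cheerio is far more robust, so treat it as an illustration only:

```javascript
// Pull the next-page href out of raw HTML with a regex.
// Fragile by design: prefer cheerio's selectors for real scraping.
function getNextHref(html) {
  var match = html.match(/<a[^>]*id=["']nextPage-[^"']*["'][^>]*href=["']([^"']+)["']/);
  return match ? match[1] : false;
}

var html = '<span class="next">' +
           '<a id="nextPage-159c6fa8635" class="page" href="/blablabla"></a>' +
           '</span>';
var next = getNextHref(html); // '/blablabla'
```

Once you have the href, you request that URL and repeat until no next-page link is found.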


Web Scraper Pagination

I have created a web scraper where I am trying to fetch dynamic data which loads in a div after the page loads.
Here is my code; the source website URL is https://www.medizinerkarriere.de/kliniken-sortiert-nach-name.html
async function pageFunction(context) {
  // jQuery is handy for finding DOM elements and extracting data from them.
  // To use it, make sure to enable the "Inject jQuery" option.
  const $ = context.jQuery;
  var result = [];
  $('#klinikListBox ul').each(function() {
    var item = {
      Name: $(this).find('li.klName').text().trim(),
      Ort: $(this).find('li.klOrt').text().trim(),
      Land: $(this).find('li.klLand').text().trim(),
      Url: ""
    };
    result.push(item);
  });
  // To make this work, make sure the "Use request queue" option is enabled.
  await context.enqueueRequest({ url: 'https://www.medizinerkarriere.de/kliniken-sortiert-nach-name.html' });
  // Return an object with the data extracted from the page.
  // It will be stored to the resulting dataset.
  return result;
}
But the pagination works on click and I am not sure how to handle it.
I tried every method from this link, but it didn't work:
https://docs.apify.com/scraping/web-scraper#bonus-making-your-code-neater
Please help; quick help would be highly appreciated.
In this case the pagination loads dynamically on a single page, so enqueuing new pages doesn't make sense. You can get to the next page by simply clicking the page button; it is also good practice to wait a bit after the click.
$('#PGPAGES span').eq(1).click();
await context.waitFor(1000);
You can scrape all pages with a simple loop:
const numberOfPages = 8; // You can scrape this number too
for (let i = 1; i <= numberOfPages; i++) {
  // Your scraping code; push data to an array and return it at the end
  $('#PGPAGES span').eq(i).click();
  await context.waitFor(1000);
}

Sending an Excel file from backend to frontend and download it at the frontend

I have created an Excel file at the backend (Express JS) using the Exceljs npm module. I have it stored in a temp directory. Now I want to send the file from the back end to the front end and download it there when the user clicks a button. I am stuck on two things:
1. How to send the file from the backend to the frontend through an HTTP POST request
2. How to then download the file on the front end
Edited content:
I need the front end to be a button that the file is attached to so it can be downloaded. This is how my code looks; I am not getting the file properly from the backend to the front end.
front end file:
function(parm1,parm2,parm3){
  let url =${path}?parmA=${parm1}&parmB=${parm2}&parmC=${parm3};
  let serviceDetails = {};
  serviceDetails["method"] = "GET";
  serviceDetails["mode"] = "cors";
  serviceDetails["headers"] = {
    "Content-Type": "application/json"
  };
  fetch(url, serviceDetails)
  .then(res => {
    if (res.status != 200) {
      return false;
    }
    var file = new Blob([res], { type : 'application/octet-stream' });
    a = document.createElement('a'), file;
    a.href = window.URL.createObjectURL(file);
    a.target = "_blank";
    a.download = "excel.xlsx";
    document.body.appendChild(a);
    a.click();
    document.body.removeChild(a);
  }).catch(error => {
    return false;
  });
}
router.js
var abc = ... // this is an object for the controller.js file
router.get('/path', function(req, res) {
  abc.exportintoExcel(req, res);
});
controller.js
let xyz = ... // this is an object for the service.js file
exports.exportintoExcel = function(req, res) {
  xyz.exportintoExcel(reqParam, res);
}
service.js
exportintoExcel(req, response) {
  // I have an excel file in my server root directory
  const filepath = path.join(__dirname, '../../nav.txt');
  response.sendFile(filepath);
}
This is a complete re-write of an earlier answer, so sorry if anyone needed that one, but this version is superior. I'm using a project created with express-generator and working in three files:
routes/index.js
views/index.ejs
public/javascripts/main.js
index.ejs
Start with an anchor tag that has the download attribute, with whatever filename you wish, and an empty href attribute. We will fill in the href in the main.js file later, with an ObjectURL that represents the Excel file:
<body>
  <a id="downloadExcelLink" download="excelFile.xlsx" href="#">Download Excel File</a>
  <script type="text/javascript" src="/javascripts/main.js"></script>
</body>
public/javascripts/main.js
Select the anchor element, and then make a fetch() request to the route /downloadExcel. Convert the response to a Blob, then create an ObjectURL from this Blob. You can then set the href attribute of the anchor tag to this ObjectURL:
const downloadExcelLink = document.getElementById('downloadExcelLink');

(async () => {
  const downloadExcelResponse = await fetch('/downloadExcel');
  const downloadExcelBlob = await downloadExcelResponse.blob();
  const downloadExcelObjectURL = URL.createObjectURL(downloadExcelBlob);
  downloadExcelLink.href = downloadExcelObjectURL;
})();
routes/index.js
In the index router, you simply need to call the res.sendFile() function and pass it the path to the Excel file on your server.
router.get('/downloadExcel', (req, res, next) => {
  const excelFilePath = path.join(__dirname, '../tmp/excel.xlsx');
  res.sendFile(excelFilePath, (err) => {
    if (err) console.log(err);
  });
});
That's it! You can find a git repo here of the project. Clone into it and try it out for yourself if you can't get this code to work in your project as it is.
How It Works
When the page loads, 4 requests are fired off to our server, as we can see in the console output:
GET / 200 2.293 ms - 302
GET /stylesheets/style.css 200 1.123 ms - 111
GET /javascripts/main.js 200 1.024 ms - 345
GET /downloadExcel 200 2.395 ms - 4679
The first three requests are for index.ejs (/), the CSS stylesheet, and our main.js file. The fourth request is sent by our call to fetch('/downloadExcel') in the main.js file:
const downloadExcelResponse = await fetch('/downloadExcel');
I have a route-handler setup in routes/index.js at this route that uses res.sendFile() to send a file from our filesystem as the response:
router.get('/downloadExcel', (req, res, next) => {
  const excelFilePath = path.join(__dirname, '../tmp/excel.xlsx');
  res.sendFile(excelFilePath, (err) => {
    if (err) console.log(err);
  });
});
excelFilePath needs to be the path to the file on YOUR system. On my system, here is the layout of the router file and the Excel file:
/
/routes/index.js
/tmp/excel.xlsx
The response sent from our Express server is stored in downloadExcelResponse as the return value from the call to fetch() in the main.js file:
const downloadExcelResponse = await fetch('/downloadExcel');
downloadExcelResponse is a Response object, and for our purposes we want to turn it into a Blob object using the Response.blob() method:
const downloadExcelBlob = await downloadExcelResponse.blob();
Now that we have the Blob, we can call URL.createObjectURL() to turn it into something we can use as the href for our download link:
const downloadExcelObjectURL = URL.createObjectURL(downloadExcelBlob);
At this point, we have a URL that represents our Excel file in the browser, and we can point the link at it by assigning it to the href property of the anchor element we selected earlier.
As a reminder, this is the anchor element we selected when the page loaded:
<a id="downloadExcelLink" download="excelFile.xlsx" href="#">Download Excel File</a>
So we add the URL to the href here, in the function that makes the fetch request:
downloadExcelLink.href = downloadExcelObjectURL;
You can inspect the element in the browser and see that its href property has been changed by the time the page has loaded.
Notice that on my computer, the anchor tag is now:
<a id="downloadExcelLink" download="excelFile.xlsx" href="blob:http://localhost:3000/aa48374e-ebef-461a-96f5-d94dd6d2c383">Download Excel File</a>
Since the download attribute is present on the link, when the link is clicked, the browser will download whatever the href points to, which in our case is the URL to the Blob that represents the Excel document.
I pulled my information from these sources:
JavaScript.info - Blob as URL
Javascript.info - Fetch
Here's a gif of how the download process looks on my machine:
OK, now that I see your code, I can try and help out a little. I have refactored your example a little bit to make it easier for me to understand, but feel free to adjust to your needs.
index.html
I don't know what the page you're working with looks like, but it seems that in your example you are creating an anchor element with JavaScript during the fetch() call. I'm just creating one with HTML in the actual page; is there a reason you can't do this?
<body>
  <a id="downloadLink" download="excel.xlsx" href="#">Download Excel File</a>
  <script type="text/javascript" src="/javascripts/test.js"></script>
</body>
With that in hand, here is my version of your front end JS file:
test.js
const downloadLink = document.getElementById('downloadLink');

sendFetch('a', 'b', 'c');

function sendFetch(param1, param2, param3) {
  const path = 'http://localhost:3000/excelTest';
  const url = `${path}?parmA=${param1}&parmB=${param2}&parmC=${param3}`;
  const serviceDetails = {};
  serviceDetails.method = "GET";
  serviceDetails.mode = "cors";
  serviceDetails.headers = {
    "Content-Type": "application/json"
  };
  fetch(url, serviceDetails).then((res) => {
    if (res.status != 200) {
      return false;
    }
    res.blob().then((excelBlob) => {
      const excelBlobURL = URL.createObjectURL(excelBlob);
      downloadLink.href = excelBlobURL;
    });
  }).catch((error) => {
    return false;
  });
}
I had to fill in some details because I can't tell what is going on from your code. Here are the things I changed:
Selected the DOM element instead of creating it:
Your version:
a = document.createElement('a'), file;
My version:
index.html
<a id="downloadLink" download="excel.xlsx" href="#">Download Excel File</a>
test.js
const downloadLink = document.getElementById('downloadLink');
This saves us the trouble of creating the element. Unless you need to do that for some reason, I wouldn't. I'm also not sure what that file is doing in your original.
Name the function and change parm -> param for arguments list
Your version:
function(parm1,parm2,parm3){
My version:
function sendFetch(param1, param2, param3) {
I wasn't sure how you were actually calling your function, so I named it. Also, parm isn't clear. Param isn't great either; it should describe what it is, but I can't tell that from your code.
Create a path variable and enclose url assignment in backticks
Your version:
let url =${path}?parmA=${parm1}&parmB=${parm2}&parmC=${parm3};
My version:
const path = 'http://localhost:3000/excelTest';
const url = `${path}?parmA=${param1}&parmB=${param2}&parmC=${param3}`;
In your version, that url assignment should throw an error. It looks like you want to use string interpolation, but you need backticks for that, which I added. Also, I had to define a path variable, because I didn't see one in your code.
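To illustrate the difference (the path value here is just a stand-in):

```javascript
const path = 'http://localhost:3000/excelTest';

// Backticks create a template literal, so ${} is interpolated:
const url = `${path}?parmA=1`; // "http://localhost:3000/excelTest?parmA=1"

// With ordinary quotes, ${} is kept as literal text:
const wrong = '${path}?parmA=1'; // "${path}?parmA=1"
```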
Cleaned up some formatting
I used 'dot' notation for the serviceDetails, but that was just personal preference. I also changed the spacing of the fetch() call, but no need to reprint that here. It shouldn't affect anything.
Create a blob from the fetch response
Your version:
var file = new Blob([res], { type : 'application/octet-stream' });
My version:
res.blob().then((excelBlob) => {
I'm not sure why you are calling the Blob constructor, or what that [res] is supposed to be. The Response object returned from fetch() has a blob() method that returns a promise which resolves to a Blob with whatever MIME type the data was in. In an Excel document's case, this is application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.
Create an ObjectURL from the Blob and add this URL to the href of the anchor tag.
Your version:
a = document.createElement('a'), file;
a.href = window.URL.createObjectURL(file);
a.target = "_blank";
a.download = "excel.xlsx";
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
My version:
const excelBlobURL = URL.createObjectURL(excelBlob);
downloadLink.href = excelBlobURL;
You have to do a bunch of DOM manipulation, and I'm not sure why you need it. If you do have to create this element dynamically, then I'm not sure why you are 'clicking' it and then removing it if the user is supposed to be able to click it. Maybe clarify why you are doing this, or whether you really need to. Either way, in my version I create the ObjectURL and then assign it, but you could just as easily not store it in a variable.
Call the function that sends the fetch request.
As my function signature is:
function sendFetch(param1, param2, param3)
I needed to call it somewhere in order to fire off the request, so I did so like this:
sendFetch('a', 'b', 'c');
Right when the page loads, as you can see from the server logs:
GET / 304 0.448 ms - -
GET /javascripts/test.js 304 1.281 ms - -
GET /excelTest?parmA=a&parmB=b&parmC=c 304 0.783 ms - -
The first two requests are for the index.html page and the test.js file, then the fetch request is fired with the params I passed in. I'm not sure how you are doing this in your app, because that is not included in your code.
Everything I just covered is Front-End. I'm assuming your server-side code is actually sending an excel file with your call to response.sendFile() in service.js. If you are sure that the file is getting sent, then the code I've given you should work, when adjusted to your app.
So, in conclusion, what this code does is:
Load an HTML page with an anchor tag with no href attribute set.
Send off a fetch() request to the server.
Turn the fetch response into a Blob, then create an ObjectURL from this Blob.
Assign that ObjectURL to the anchor tag's href attribute.
When the user clicks the 'Download Excel File' link, the Excel sheet should be downloaded. If you didn't want them to see the link until after the fetch request, you could definitely create the anchor tag in JS instead; let me know if you want to see how to do that.
As before, here is a gif showing how it looks on my machine (this is with your version and my modifications):

Redirect response with pure node.js without frameworks or npm modules

I am writing a simple application in pure Node.js, without frameworks or npm modules. There is a problem: when the application redirects, it hangs, and after a few minutes an ERR_EMPTY_RESPONSE error appears in the browser window.
At the same time, if, during the hang, I quickly press Ctrl+C and then quickly restart the server, the request is fulfilled and I am successfully redirected to the target page.
In PHP, a similar procedure works successfully.
I have tried many different approaches, ranging from replacing the 302 status with 301 and writing the status and header directly:
response.statusCode = 302;
response.setHeader('Location', url);
response.end();
to adding the host, protocol and port explicitly:
url = `http://localhost:3000/${url}`;
response.writeHead(302, {'Location': url});
response.end();
// Redirect.js Redirect class
/**
 * Which page will the user be redirected to
 * @param path
 * @param data
 * @return {*}
 */
to(path, data = {}) {
  const session = require('../session').getInstance();
  if (Object.keys(data).length) {
    session.set('redirect', data);
  }
  let url = `/${path.replace(/^\/|\/$/g, '')}/`;
  this.response.writeHead(302, {'Location': url});
  this.response.end();
}
// php similar code for example
// helpers.php
/**
 * @return string
 * http(s)://example.com
 * returns the domain name of the application, including the protocol
 */
function domain()
{
    $protocol = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off' || $_SERVER['SERVER_PORT'] == 443) ? "https://" : "http://";
    $domainName = $_SERVER['HTTP_HOST'];
    return $protocol . $domainName;
}
// Redirect.php Redirect class
/**
 * @param $path
 * @param array $data
 *
 * Which page will the user be redirected to
 */
public function to($path, $data = [])
{
    if ($data) {
        Session::put('redirect', $data);
    }
    $url = domain() . '/' . trim(parse_url($path, PHP_URL_PATH), '/');
    header("Location: ${url}");
    exit();
}
I expect the user to be redirected to the desired route, but instead the application freezes, and a few minutes later an ERR_EMPTY_RESPONSE error appears in the browser window.
But if I quickly restart the server, the request is executed and the redirect succeeds.
Note: This is not an answer, but a demonstration for the OP that the Node.js code strategy is correct (I'm not able to post multiple lines of code in a comment); the root cause of the problem lies in the to() function or in the code that invokes it.
The Node.js code strategy shown in the question is correct. Here is a simple demo which redirects response with pure Node.js without frameworks or npm modules:
var http = require('http');

http.createServer(function (req, res) {
  if (req.url === '/path1') {
    res.statusCode = 302;
    res.setHeader('Location', '/path2');
    res.end();
  } else {
    res.write('Hello Sun!');
    res.end();
  }
}).listen(3000);
An HTTP GET request to /path1 is redirected to /path2, which displays Hello Sun! in the browser.

How to use Zombie.js in specific situations

I started using Zombie.js, but I have some beginner questions:
1.) How do I test AJAX calls?
For example, I have a PHP AJAX action (Zend):
public function ajaxSomeAction()
{
    $oRequest = $this->getRequest();
    if ($oRequest->isXmlHttpRequest() === false || $oRequest->isPost() === false) {
        throw new Zend_Controller_Action_Exception('Only AJAX & POST request accepted', 400);
    }
    // process check params...
}
My Zombie.js testing code throws HTTP 400.
2.) How do I call a jQuery plugin's public methods? For example, I have this code:
(function($) {
  $.manager.addInvitation = function()
  {
    // some code ....
  }
  $.manager = function(options)
  {
    // some code
  }
})(jQuery);
I tried:
Browser.visit(url, function(err, browser, status)
{
  // does not work
  browser.window.jQuery.manager.addInvitation();
  // also does not work
  browser.document.jQuery.manager.addInvitation();
  browser.window.$.manager.addInvitation();
  browser.evaluate('$.manager.addInvitation();');
})
3.) How do I modify headers with Zombie.js? For example, I want to add the header x-performace-bot: zombie1 to the request sent using the visit method:
Browser = require('zombie');
Browser.visit(url, {debug: true}, function(err, browser, status)
{
  // send request with the x-performace-bot header
});
After quick testing (on zombie 0.4.21):
ad 1.
As you're checking ($oRequest->isXmlHttpRequest()) whether the request is an XMLHttpRequest, you have to specify (in zombie) the X-Requested-With header with a value of XMLHttpRequest.
ad 2.
// works for me (logs the jQuery function, meaning it's there)
console.log( browser.window.jQuery );
// that works too...
browser.window.$
Your code must be undefined, or there are some other JavaScript errors on your page.
ad 3.
There's a header option, which you can pass just as you do with debug.
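A sketch of what that might look like. The exact option name can vary between zombie versions, so double-check it against your version's documentation; x-performace-bot is the custom header from the question, and X-Requested-With covers the Zend check from question 1:

```javascript
// Options object passed to Browser.visit, alongside `debug`
// as in the question's snippet.
var visitOptions = {
  debug: true,
  headers: {
    'x-performace-bot': 'zombie1',        // custom header from the question
    'X-Requested-With': 'XMLHttpRequest'  // satisfies Zend's isXmlHttpRequest()
  }
};

// Browser.visit(url, visitOptions, function (err, browser, status) {
//   // the request is sent with the extra headers
// });
```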

jsdom form submission?

I'm trying to use the Node.js packages request and jsdom to scrape web pages, and I want to know how I can submit forms and get their responses. I'm not sure if this is possible with jsdom or another module, but I do know that request supports cookies.
The following code demonstrates how I'm using jsdom (along with request and jQuery) to retrieve and parse a web page (in this case, the Wikipedia home page). (Note that this code is adapted from the jquery-request.js code from this tutorial: http://blog.nodejitsu.com/jsdom-jquery-in-5-lines-on-nodejs)
var request = require('request'),
    jsdom = require('jsdom'),
    url = 'http://www.wikipedia.org';

request({ uri: url }, function (error, response, body) {
  if (error && response.statusCode !== 200) {
    console.log('Error when contacting ' + url);
  }
  jsdom.env({
    html: body,
    scripts: [
      'http://code.jquery.com/jquery-1.5.min.js'
    ]
  }, function (err, window) {
    var $ = window.jQuery,
        // jQuery is now loaded on the jsdom window created from 'agent.body'
        $searchform = $('#searchform'); // search form jQuery object
    $('#searchInput').val('Wood');
    console.log('form HTML is ' + $searchform.html(),
                'search value is ' + $('#searchInput').val());
    // how I'd like to submit the search form
    $('#searchform .searchButton').click();
  });
});
The above code prints the HTML of Wikipedia's search form, then "Wood", the value I set the searchInput field to contain. Of course, here the click() method doesn't really do anything, because jQuery isn't operating in a browser; I don't even know whether jsdom supports any kind of event handling.
Is there any module that can help me to interact with web pages in this way, or in a similar non-jQuery way? Can this be done in jsdom?
Thanks in advance!
If you don't want to handle the POST request yourself as in the other answer, you can use an alternative to jsdom that supports more of what a browser does:
http://www.phantomjs.org/
I'm not familiar with a Node.js library that will give you a fully interactive client-side view of a web page, but you can get the results of a form submission without too much worry.
HTML forms are essentially just a way of sending HTTP requests to a specific URL (which can be found in the action attribute of the form tag). With access to the DOM, you can pull out these values and create your own request for the specified URL.
Something like this, as the callback from requesting the Wikipedia home page, will get you the result of searching for "keyboard cat" in English:
var $ = window.jQuery;

var search_term = "keyboard cat";
var search_term_safe = encodeURIComponent(search_term).replace("%20", "+");

var lang = "en";
var lang_safe = encodeURIComponent(lang).replace("%20", "+");

var search_submit_url = $("#searchform").attr("action");
var search_input_name = $("#searchInput").attr("name");
var search_language_name = $("#language").attr("name");

var search_string = search_input_name + "=" + search_term_safe + "&" + search_language_name + "=" + lang_safe;

// Note the wikipedia specific hack of prepending "http:".
var full_search_uri = "http:" + search_submit_url + "?" + search_string;

request({ uri: full_search_uri }, function(error, response) {
  if (error && response.statusCode != 200) {
    console.log("Got an error from the search page: " + error);
  } else {
    // Do some stuff with the response page here.
  }
});
Basically the important stuff is:
"Submitting a search" really just means sending either an HTTP GET or POST request to the URL specified in the action attribute of the form tag.
Create the string to use for form submission using the name attributes of each of the form's input tags, combined with the values that they are actually submitting, in this format: name1=value1&name2=value2
For GET requests, just append that string to the URL as a query string (URL?query-string).
For POST requests, post that string as the body of the request.
Note that the string used for form submission must be URL-escaped and have spaces represented as +.
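The encoding steps above can be sketched as a small helper (the field names below are made up for the example):

```javascript
// Build a name1=value1&name2=value2 form string with spaces as "+"
function buildFormBody(fields) {
  return Object.keys(fields)
    .map(function (name) {
      return encodeURIComponent(name) + '=' +
             encodeURIComponent(fields[name]).replace(/%20/g, '+');
    })
    .join('&');
}

var body = buildFormBody({ search: 'keyboard cat', language: 'en' });
// body === 'search=keyboard+cat&language=en'
```

For a GET request, append the string to the action URL after a ?; for a POST request, send it as the request body with a Content-Type of application/x-www-form-urlencoded.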
