In a Node.js v10.x.x environment, when trying to create a PDF page from some HTML code, I'm getting a closed-page error every time I try to do something with the page (setCacheEnabled, setRequestInterception, etc.):
async (page, data) => {
    try {
        const {options, urlOrHtml} = data;
        const finalOptions = { ...config.puppeteerOptions, ...options };

        // Set caching flag (if provided)
        const cache = finalOptions.cache;
        if (cache != undefined) {
            delete finalOptions.cache;
            await page.setCacheEnabled(cache); // THIS LINE IS CAUSING THE PAGE TO BE CLOSED
        }

        // Setup timeout option (if provided)
        let requestOptions = {};
        const timeout = finalOptions.timeout;
        if (timeout != undefined) {
            delete finalOptions.timeout;
            requestOptions.timeout = timeout;
        }
        requestOptions.waitUntil = 'networkidle0';

        if (urlOrHtml.match(/^http/i)) {
            await page.setRequestInterception(true); // THIS LINE IS CAUSING ERROR DUE TO THE PAGE BEING ALREADY CLOSED
            page.once('request', request => {
                if (finalOptions.method === "POST" && finalOptions.payload !== undefined) {
                    request.continue({method: 'POST', postData: JSON.stringify(finalOptions.payload)});
                }
            });
            // Request is for a URL, so request it
            await page.goto(urlOrHtml, requestOptions);
        }

        return await page.pdf(finalOptions);
    } catch (err) {
        logger.info(err);
    }
};
I read somewhere that this issue could be caused by a missing await, but that doesn't look like my case.
I'm not using puppeteer directly, but this library, which creates a cluster on top of it and handles the processes:
https://github.com/thomasdondorf/puppeteer-cluster
You already gave the solution, but as this is a common problem with the library (I'm the author 🙂) I would like to provide some more insights.
How the task function works
When a job is queued and ready to be executed, puppeteer-cluster will create a page and call the task function (given to cluster.task) with the created page object and the queued data. The cluster then waits until the Promise is finished (fulfilled or rejected) and will close the page and execute the next job in the queue.
As an async function implicitly creates a Promise, this means that as soon as the async function given to cluster.task is finished, the page is closed. There is no magic happening to determine whether the page might be used in the future.
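To make that lifecycle concrete, here is a minimal sketch using the library's documented launch/task/queue calls (the URL is just a placeholder):

const { Cluster } = require('puppeteer-cluster');

(async () => {
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2
    });

    await cluster.task(async ({ page, data: url }) => {
        // the cluster opened `page` just for this job...
        await page.goto(url);
        console.log(await page.title());
        // ...and closes it as soon as this async function's Promise settles
    });

    cluster.queue('https://example.com'); // triggers one call of the task function

    await cluster.idle();  // wait until the queue is empty
    await cluster.close(); // shut down the cluster
})();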
Waiting for asynchronous events
Below is a code sample with a common mistake. The user might want to wait for an external event before closing the page as in the (not working) example below:
Non-working (!) code sample:
await cluster.task(async ({ page, data }) => {
    await page.goto('...');
    setTimeout(async () => { // user is waiting for an asynchronous event
        await page.evaluate(/* ... */); // Will throw an error as the page is already closed
    }, 1000);
});
In this code, the page is already closed before the asynchronous function is executed. The correct way to do this would be to return a Promise instead.
Working code sample:
await cluster.task(async ({ page, data }) => {
    await page.goto('...');
    // will wait until the Promise resolves
    await new Promise(resolve => {
        setTimeout(async () => { // user is waiting for an asynchronous event
            try {
                await page.evaluate(/* ... */);
            } catch (err) {
                // handle error
            }
            resolve();
        }, 1000);
    });
});
In this code sample, the task function waits until the inner Promise is resolved before it resolves itself. This will keep the page open until the asynchronous function calls resolve. In addition, the code uses a try..catch block as the library is not able to catch errors thrown inside asynchronous code blocks.
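If the asynchronous event is just a delay, an equivalent and arguably simpler pattern (my own sketch, not from the original answer) is to await a promisified timeout inside the task body, so no manual resolve bookkeeping is needed:

await cluster.task(async ({ page, data }) => {
    await page.goto('...');
    // wait for the asynchronous event (here simply a delay)
    await new Promise(resolve => setTimeout(resolve, 1000));
    // still inside the task function, so the page is still open
    await page.evaluate(/* ... */);
});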
I got it.
I was indeed forgetting an await on the call to the function I posted.
That call was in another file that I use for the cluster instance creation:
async function createCluster() {
    // We will protect our app with a Cluster that handles all the processes running in our headless browser
    const cluster = await Cluster.launch({
        concurrency: Cluster[config.cluster.concurrencyModel],
        maxConcurrency: config.cluster.maxConcurrency
    });

    // Event handler to be called in case of problems
    cluster.on('taskerror', (err, data) => {
        console.log(`Error on cluster task... ${data}: ${err.message}`);
    });

    // Incoming task for the cluster to handle
    await cluster.task(async ({ page, data }) => {
        main.postController(page, data); // <-- I WAS MISSING A return await HERE
    });

    return cluster;
}
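For clarity, the fixed task registration would look like this (same postController call as above, just returning its Promise so the cluster waits for it before closing the page):

// Incoming task for the cluster to handle
await cluster.task(async ({ page, data }) => {
    return await main.postController(page, data); // page stays open until this Promise settles
});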
Related
I have a list of APIs. I want to send a GET request to all of them simultaneously and return as soon as one API finishes the request with a response code of 200.
I tried using a for-loop and break, but that doesn't seem to work. It always uses the first API:
import axios from 'axios';

const listOfApi = ['https://example.com/api/instanceOne', 'https://example.com/api/instanceTwo'];

let response;
for (const api of listOfApi) {
    try {
        response = await axios.get(api, {
            data: {
                url: 'https://example.com/',
            },
        });
        break;
    } catch (error) {
        console.error(`Error occurred: ${error.message}`);
    }
}
You can use Promise.any() to get the first of an array of promises that fulfills while running all the requests in flight at the same time:
import axios from 'axios';

const listOfApi = ['https://example.com/api/instanceOne', 'https://example.com/api/instanceTwo'];

Promise.any(listOfApi.map(api => {
    return axios.get(api, {data: {url: 'https://example.com/'}}).then(response => {
        // skip any responses without a status of 200
        if (response.status !== 200) {
            throw new Error(`Response status ${response.status}`, {cause: response});
        }
        return response;
    });
})).then(result => {
    // first result available here
    console.log(result);
}).catch(err => {
    console.log(err);
});
Note, this uses Promise.any() which finds the first promise that resolves successfully (skipping promises that reject). You can also use Promise.race() if you want the first promise that resolves or rejects.
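To make that difference concrete, here is a small self-contained sketch (my own illustration, not part of the code above):

const fast = new Promise((resolve, reject) => setTimeout(() => reject(new Error('fast failed')), 100));
const slow = new Promise(resolve => setTimeout(() => resolve('slow succeeded'), 200));

// Promise.race settles with whichever promise settles first, even if it rejects:
Promise.race([fast, slow]).catch(err => console.log('race:', err.message)); // race: fast failed

// Promise.any ignores rejections and yields the first fulfillment:
Promise.any([fast, slow]).then(value => console.log('any:', value)); // any: slow succeeded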
I think jfriend00's answer is good, but I want to expand on it a bit and show how it would look with async/await, because that's what you are already using.
As mentioned, you can use Promise.any (or Promise.race). Both take an array of promises as argument. Promise.any will yield the result of the first promise that resolves successfully, while Promise.race will simply wait for the first promise that finishes (regardless of whether it was fulfilled or rejected) and yield its result.
To keep your code in the style of async/await as it originally was, you can map the array using an async callback function, which will effectively return a promise. This way, you don't have to "branch off into .then territory" and can keep the code more readable and easier to expand with conditions, etc.
This way, the code can look as follows:
import axios from 'axios';

const listOfApi = ['https://example.com/api/instanceOne', 'https://example.com/api/instanceTwo'];

try {
    const firstResponse = await Promise.any(listOfApi.map(async api => {
        const response = await axios.get(api, {
            data: {
                url: 'https://example.com/',
            },
        });
        if (response.status !== 200) {
            throw new Error(`Response status ${response.status}`, {cause: response});
        }
        return response;
    }));

    // DO SOMETHING WITH firstResponse HERE
} catch (error) {
    console.error('Error occurred:', error);
}
Side note: I changed your console.error slightly. Logging only error.message is a common mistake that hinders effective debugging later on, because it prints just the message and drops the error name, the stack, and any additional properties the error may have. Using .stack instead of .message is already better, as it includes the name and stack, but it is best to supply the error as a separate argument to console.error so that inspect gets called on it and the whole error object is printed, with the stack and any additional properties you may be interested in. This is very valuable when you encounter an error in production that is not easy to reproduce.
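A quick illustration of the difference, using a made-up error (not from the original code):

const error = new Error('Request failed', { cause: { status: 503 } });

console.error('Error occurred:', error.message); // prints only "Error occurred: Request failed"
console.error('Error occurred:', error);         // prints the name, message, stack and the cause object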
I am trying to understand a case I am running into with my Node/Express application.
Let's say I have the following which is a function called when a specific route is hit:
async function routeThatDoesStuff(req, res, next) {
    doAsyncStuff()
    res.status(200).json({ message: 'Completed' })
}
In this case doAsyncStuff() does database operations on resources that are not critical to send back to the user, so theoretically we don't need to await it. However, it seems that this operation does not actually complete unless I put the await in front.
My guess is that this has to do with the event loop in Node: because the route handler completes before doAsyncStuff() completes, the function doesn't actually finish because Node terminated it prematurely.
My big question is: how does Node handle the execution of async child functions when the parent function has already completed?
An example to show that this should work:
const sleep = (milliseconds, cb) => {
    return new Promise(resolve => setTimeout(() => {
        resolve(cb())
    }, milliseconds))
}

async function routeThatDoesStuff(req, res, next) {
    sleep(5000, () => console.log('done'))
    res.status(200).json({ message: 'Completed' })
}
JSON response is shown immediately, callback is executed after 5 seconds even though the response has been sent.
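A small, hedged sketch of the same fire-and-forget idea with explicit error handling (my own addition; doAsyncStuff is the placeholder from the question). An un-awaited Promise still runs to completion, but if it rejects there is nothing to handle the error, so attaching a .catch is usually wise:

async function routeThatDoesStuff(req, res, next) {
    // Fire-and-forget: don't await, but attach a catch so a failure
    // doesn't surface later as an unhandled promise rejection.
    doAsyncStuff().catch(err => console.error('background task failed:', err));

    res.status(200).json({ message: 'Completed' });
}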
I am writing a GraphQL server that has a simple logout mutation. Everything works as expected when I run the server, and I can log out by destroying the session and clearing the cookie just fine.
Here is the resolver:
export default async (root, args, context) => {
    console.log("THIS WILL LOG")

    await new Promise((res, rej) =>
        context.req.session.destroy(err => {
            if (err) {
                return rej(false);
            }
            context.res.clearCookie("qid");
            return res(true);
        })
    );

    console.log("NEVER HERE BEFORE TIMEOUT");

    // 4. Return the message
    return {
        code: "OK",
        message: "You have been logged out.",
        success: true,
        item: null
    };
};
I am attempting to write a simple test just to verify that the req.session.destroy and res.clearCookie functions are actually called. At this point I AM NOT attempting to test whether a cookie is actually cleared; since I am not actually starting up the server, I am just testing that the graphql resolver ran correctly and called the right functions.
Here is a portion of my test:
describe("confirmLoginResolver", () => {
test("throws error if logged in", async () => {
const user = await createTestUser();
const context = makeTestContext(user.id);
context.req.session.destroy = jest
.fn()
.mockImplementation(() => Promise.resolve(true));
context.res.clearCookie = jest.fn();
// this function is just a helper to process my graphql request.
// it does not actually start up the express server
const res = await graphqlTestCall(
LOGOUT_MUTATION, // the graphql mutation stored in a var
null, // no variables needed for mutation
null // a way for me to pass in a userID to mock auth state,
context // Context override, will use above context
);
console.log(res);
expect(context.req.session.destroy).toHaveBeenCalled();
// expect(res.errors.length).toBe(1);
// expect(res.errors).toMatchSnapshot();
});
});
Again, everything works correctly when actually running the server. The problem is that when I attempt to run the above test, I always get a jest timeout:
Timeout - Async callback was not invoked within the 5000ms timeout specified by jest.setTimeout.
The reason is that the await in the resolver above hangs because its resolve() is never called. So my console will show "THIS WILL LOG", but it never gets to "NEVER HERE BEFORE TIMEOUT".
I suspect I need to write a better jest mock to more accurately simulate the callback inside of context.req.session.destroy, but I cannot figure it out.
Any ideas how I can write a better mock implementation here?
context.req.session.destroy = jest
    .fn()
    .mockImplementation(() => Promise.resolve(true));
Is not cutting it. Thoughts?
Try
context.req.session.destroy = jest
    .fn()
    .mockImplementation((fn) => fn(false));
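If you want to get closer to how express-session actually behaves (its destroy(callback) invokes the callback with an error-first signature), a hedged sketch of both paths could look like this:

// success path: invoke the callback with no error, so the resolver's res(true) branch runs
context.req.session.destroy = jest.fn().mockImplementation(cb => cb(null));

// failure path: invoke the callback with an error to exercise the rej(false) branch
context.req.session.destroy = jest.fn().mockImplementation(cb => cb(new Error('destroy failed')));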
I'm using React, Electron, Node.js, asyncJS, Redux and thunk.
I wrote the following code, which is supposed to download a list of files and write them to disk. In my code, when the user presses a button, I call this actionCreator:
export function downloadList(pack) {
    return (dispatch, getState) => {
        const { downloadManager } = getState();
        async.each(downloadManager.downloadQueue[pack].libs, async (url, callback) => {
            const filename = url.split('/').pop().split('#')[0].split('?')[0];
            await downloadFile(url, `dl/${filename}`);
            callback();
        }, (err) => {
            if (err) {
                console.log('A file failed to process');
            } else {
                dispatch({
                    type: DOWNLOAD_COMPLETED,
                    packName: pack
                });
            }
        });
    };
}

async function downloadFile(url, path) {
    const file = fs.createWriteStream(path);
    const request = https.get(url, (response) => {
        response.pipe(file);
        file.on('finish', () => {
            file.close();
        });
    }).on('error', (err) => { // Handle errors
        fs.unlink(path); // Delete the file async. (But we don't check the result)
    });
}
It does what it's supposed to do, but while it does that, it blocks the entire UI. I really can't understand why this is happening, since if I use a setTimeout with a 3000 ms delay inside the async.each, it doesn't block the UI.
Another strange behaviour is that if I use the eachLimit function of asyncJS, it only downloads up to the limit: if I want to download 100 files but set eachLimit to 10 in parallel, it just downloads the first 10 files and then stops. Can you enlighten me about this?
I wanted to use axios to download the files, since it doesn't need to know whether the URLs are http or https, but I can't find any resources on using axios with a stream responseType.
I can answer the first part. Pretty much every existing implementation of JavaScript runs on one thread. This means that the runtime is concurrent, but not parallel, i.e. the runtime does one and exactly one thing at a time. This means that if there is a function call that takes a while, it will block everything else. Therefore, something in the downloadList function is blocking the event loop. However, if you use setTimeout, the deferred callback is pushed onto the message queue, which unblocks the event loop and allows the UI to be rendered. For more information on the event loop check out this video
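A minimal sketch of the distinction (my own illustration, not taken from the question's code):

// Blocks the event loop: nothing else, including UI rendering, can run for ~3 seconds.
function blockingWork() {
    const end = Date.now() + 3000;
    while (Date.now() < end) { /* busy-wait */ }
}

// Defers the work: the callback is pushed onto the message queue and the
// event loop stays free in the meantime.
function deferredWork() {
    setTimeout(() => console.log('done later'), 3000);
}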
I'm looking for a solution for waiting for an event to happen before sending an HTTP response.
Use Case
The idea is I call a function in one of my routes: zwave.connect("/dev/ttyACM5"); This function returns immediately.
But there are two events that indicate whether it succeeded or failed to connect to the device:
zwave.on('driver ready', function(){...});
zwave.on('driver failed', function(){...});
In my route, I would like to know whether the device succeeded or failed to connect before sending the HTTP response.
My "solution"
When an event happen, I save the event in a database:
zwave.on('driver ready', function(){
    // In the database, save the fact the event happened, here it's event "CONNECTED"
});
In my route, I execute the connect function and wait for the event to appear in the database:
router.get('/', function(request, response, next) {
    zwave.connect("/dev/ttyACM5");
    waitForEvent("CONNECTED", 5, null, function(){
        response.redirect('/connected');
    });
});
// The function used to wait for the event
waitForEvent: function(eventType, nbCallMax, nbCall, callback){
    if (nbCall == null) nbCall = 1;
    if (nbCallMax == null) nbCallMax = 1;

    // Look for the event in the database (true if the event happened, false otherwise)
    const event = findEventInDataBase(eventType);
    if (event) {
        callback();
    } else if (nbCall < nbCallMax) {
        // Try again in 1.5 seconds, up to nbCallMax times
        setTimeout(() => waitForEvent(eventType, nbCallMax, nbCall + 1, callback), 1500);
    }
}
I don't think it is good practice because it repeatedly polls the database.
So what are your opinions/suggestions about it?
I've gone ahead and added the asynchronous and control-flow tags to your question because at the core of it, that is what you're asking about. (As an aside, if you're not using ES6 you should be able to translate the code below back to ES5.)
TL;DR
There are a lot of ways to handle async control flow in JavaScript (see also: What is the best control flow module for node.js?). You are looking for a structured way to handle it—likely Promises or the Reactive Extensions for JavaScript (a.k.a RxJS).
Example using a Promise
From MDN:
The Promise object is used for asynchronous computations. A Promise represents a value which may be available now, or in the future, or never.
The async computation in your case is the computation of a boolean value describing the success or failure to connect to the device. To do so, you can wrap the call to connect in a Promise object like so:
const p = new Promise((resolve) => {
    // This assumes that the events are mutually exclusive
    zwave.connect('/dev/ttyACM5');
    zwave.on('driver ready', () => resolve(true));
    zwave.on('driver failed', () => resolve(false));
});
Once you have a Promise representing the state of the connection, you can attach functions to its "future" value:
// Inside your route file
const p = /* ... */;

router.get('/', function(request, response, next) {
    p.then(successful => {
        if (successful) {
            response.redirect('/connected');
        } else {
            response.redirect('/failure');
        }
    });
});
You can learn more about Promises on MDN, or by reading one of many other resources on the topic (e.g. You're Missing the Point of Promises).
Have you tried this? From the look of it, your zwave object has probably already implemented an EventEmitter; you just need to attach a listener to it:
router.get('/', function(request, response, next) {
    zwave.connect("/dev/ttyACM5");
    zwave.once('driver ready', function(){
        response.redirect('/connected');
    });
});
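A sketch of how that could be extended to also cover the failure event and a timeout, so the request never hangs (my own addition, assuming the same event names from the question):

router.get('/', function(request, response, next) {
    zwave.connect("/dev/ttyACM5");

    // Whichever fires first wins; the other listeners are removed so they can't respond twice.
    const onReady = () => { cleanup(); response.redirect('/connected'); };
    const onFailed = () => { cleanup(); response.redirect('/failure'); };
    const timer = setTimeout(() => { cleanup(); response.status(504).send('Timed out'); }, 5000);

    function cleanup() {
        clearTimeout(timer);
        zwave.removeListener('driver ready', onReady);
        zwave.removeListener('driver failed', onFailed);
    }

    zwave.once('driver ready', onReady);
    zwave.once('driver failed', onFailed);
});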
There is also an npm sync module, which is used to synchronize the process of executing queries.
When you want to run parallel queries in a synchronous way, Node restricts you from doing that because it never waits for the response, and the sync module is well suited to that kind of solution.
Sample code
/* require sync module */
var Sync = require('sync');

app.get('/', function(req, res, next){
    story.find().exec(function(err, data){
        var sync_function_data = find_user.sync(null, {name: "sanjeev"});
        res.send({story: data, user: sync_function_data});
    });
});

/***** sync function defined here *******/
function find_user(req_json, callback) {
    process.nextTick(function () {
        users.find(req_json, function (err, data) {
            if (!err) {
                callback(null, data);
            } else {
                callback(null, err);
            }
        });
    });
}
reference link: https://www.npmjs.com/package/sync