Process blocked at console.log when combining readline and puppeteer - node.js

The following code blocks forever and never reaches console.log("this line.....");
const puppeteer = require('puppeteer');
const readline = require("readline");
const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});
async function main() {
  const browser = await puppeteer.launch();
  rl.close();
  await browser.close();
  console.log("this line will not be executed.");
}
main();
Moving rl.close() below console.log solves the problem, and so does removing browser = ..... and await browser.close().
Is this a bug in puppeteer, or is there some mechanism I don't understand?
Puppeteer version: 1.11.0
Node.js version: 10.14.2
OS: Windows 10 1803

It seems worth reporting this as an issue in the puppeteer GitHub repository. Something really weird happens to stdin and the event loop with this combination: Chrome does exit, but the Node.js process remains, and after aborting with Ctrl+C the prompt appears twice in the Windows shell, as if an ENTER had been buffered until the exit.
FWIW, this issue disappears if terminal option of readline.createInterface() is set to false.
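A minimal sketch of that workaround, with the terminal option set to false. The helper name createPlainInterface is made up for illustration, and whether this avoids the hang on other Puppeteer, Node.js, or Windows versions is not something I have verified.

```javascript
const readline = require('readline');

// Hypothetical helper: builds a readline interface with terminal handling
// turned off, so the input is treated as a plain stream of lines.
function createPlainInterface(input = process.stdin, output = process.stdout) {
  return readline.createInterface({ input, output, terminal: false });
}
```

In the question's script this would replace the readline.createInterface({...}) call, i.e. const rl = createPlainInterface();. Note that with terminal set to false, readline no longer treats the streams as a TTY, so line editing and history are unavailable.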

It seems like you do not completely understand how async/await works in JavaScript.
If you use await inside an async function, it pauses that function and waits for the Promise to resolve before moving on.
The code inside your async function runs in order, as if it were synchronous, but without blocking the main thread.
async function main() {
  const browser = await puppeteer.launch(); // executed first
  rl.close(); // executed second (waits until everything above has finished)
  await browser.close(); // executed third (waits until everything above has finished)
  console.log("this line will not be executed."); // executed fourth (waits until everything above has finished)
}
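To see that ordering without launching a browser, here is a small sketch where a timer stands in for the slow Puppeteer calls. The delay helper and the log array are illustration only, not part of the question's code.

```javascript
// Resolves after `ms` milliseconds; stands in for a slow call such as
// puppeteer.launch() or browser.close().
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function main(log) {
  await delay(20); // main() pauses here; the caller keeps running
  log.push('first');
  log.push('second'); // plain statement, runs right after 'first'
  await delay(5); // pauses again until the timer fires
  log.push('third');
  log.push('fourth');
  return log;
}

const log = [];
const done = main(log); // returns a pending promise immediately
log.push('caller continues'); // runs before any line after the first await
done.then((result) => console.log(result.join(' -> ')));
// prints "caller continues -> first -> second -> third -> fourth"
```

This only illustrates the general ordering rule. It does not explain why the process in the question hangs, since there every step is already awaited in the expected order.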

Related

Node.JS + Puppeteer: browser.close freezes process

I have a really simple function I'm running.
module.exports.test = async () => {
  const browser = await puppeteer.launch(puppetOptions);
  try {
    const page = await browser.newPage();
    await page.goto('https://google.com');
    // helper function that pauses for five seconds before moving on
    await pause(5000);
    await browser.close();
    console.log('browser closed');
  } catch (err) {
    console.log(err);
    await browser.close();
  }
}
I run it from my index.js file:
const server = app.listen(process.env.PORT || 5000, () => {
  test();
});
Now I run it in my terminal with node index.js. It opens a browser, opens a new page, and navigates to Google. It waits five seconds, then closes the browser. Everything appears to work.
Then I hit Ctrl+C in my terminal to stop the process, but nothing happens, although this normally works. If I remove the browser.close call, Ctrl+C works as expected again and ends the process.
This function is the result of breaking down a more complex function that appears to have a memory leak, so it really seems that browser.close is the culprit. For the life of me, though, I can't figure out why it would cause an issue when simplified this much. It happens in both headless and headful modes. Here are the puppeteer launch options:
puppetOptions = {
  defaultViewport: null,
  args: [
    "--incognito",
    "--no-sandbox",
    "--single-process",
    "--no-zygote"
  ],
}
puppetOptionsHeadfull = {
  headless: false,
  executablePath: 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe',
}
EDIT: I tried this in my bash terminal as well and have the same issue. When I try to manually close the terminal to abort it, an error pops up.
Processes are running in session:
| WPID PID COMMAND
| 10900 1122 winpty node.exe index.js
Close anyway?
EDIT 2: I narrowed this down to most likely being an issue with puppeteer-extra; however, it seems like a bug in the core package. A fairly recent open issue on their repo reflects this bug: https://github.com/berstend/puppeteer-extra/issues/421
I'll leave this question open so that anyone else who stumbles on the same issue doesn't pull their hair out debugging it.
I have been having this issue as well. For me, the two arguments '--no-sandbox' and '--disable-setuid-sandbox' seem to be the culprits, although it's not intuitive why that would be the case. Try removing the no-sandbox argument.
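One way to try that suggestion without editing the options by hand is to filter the flags out of a copy. This is only a sketch: withoutArgs is a made-up helper, and dropping '--no-zygote' as well is my own assumption, based on the Chromium flag description saying it should be used together with '--no-sandbox'.

```javascript
// Hypothetical helper: returns a copy of the launch options without the
// listed Chromium flags, leaving the original object untouched.
function withoutArgs(options, unwanted) {
  const args = (options.args || []).filter((arg) => !unwanted.includes(arg));
  return { ...options, args };
}

// Same options as in the question.
const puppetOptions = {
  defaultViewport: null,
  args: ['--incognito', '--no-sandbox', '--single-process', '--no-zygote'],
};

// Assumption: --no-zygote is removed too, because it is meant to be
// combined with --no-sandbox.
const sandboxedOptions = withoutArgs(puppetOptions, [
  '--no-sandbox',
  '--disable-setuid-sandbox',
  '--no-zygote',
]);

console.log(sandboxedOptions.args); // [ '--incognito', '--single-process' ]
```

The result can then be passed to puppeteer.launch(sandboxedOptions) to check whether Ctrl+C works again.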

how to keep sending requests one after the other in nodejs

I am using puppeteer to automate a website. A puppeteer session may take about 30-60 seconds, and I want to fire a request (open another puppeteer session) right after the previous one finishes. I realized that I can't use setInterval because the duration is not constant in my case. How can I achieve this?
Use a recursive function to keep calling the same function right after it's done.
async function bot(){
  const browser = await puppeteer.launch()
  // other code
  await browser.close()
}
async function fireMyRequest(){
  await bot()
  await fireMyRequest()
}
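The recursive version never returns, so every call stays pending behind the next one and a single failed session ends the whole chain. As an alternative sketch, a plain loop gives the same back-to-back behaviour. The names runBackToBack and shouldContinue are made up for this example, and the task is passed in so that bot from above can be used unchanged.

```javascript
// Runs `task` again as soon as the previous run has settled.
// `shouldContinue(completedRuns)` decides whether another run starts;
// by default the loop never stops.
async function runBackToBack(task, shouldContinue = () => true) {
  let completedRuns = 0;
  while (shouldContinue(completedRuns)) {
    try {
      await task(completedRuns);
    } catch (err) {
      // One failed session should not stop the following ones.
      console.error('run failed:', err.message);
    }
    completedRuns += 1;
  }
  return completedRuns;
}
```

Calling runBackToBack(bot) repeats the session indefinitely, while runBackToBack(bot, (n) => n < 10) stops after ten sessions.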

How to get puppeteer to simply load a web page?

I can't get puppeteer to do anything. I'm simply trying to get it to show google.com and I can't even get it to do that. Here's my code:
console.log('Loading puppeteer...');
const puppeteer = require('puppeteer');
async function test() {
  console.log('Launching browser...');
  const browser = await puppeteer.launch({headless: false});
  console.log('Creating new page...');
  const page = await browser.newPage();
  console.log('Requesting url...');
  await page.goto('https://www.google.com');
  console.log('Closing browser...');
  await browser.close();
}
test().catch(e=>{console.log(e)});
Chromium crashes every single time I try to do anything...
Then I get a timeout error:
Loading puppeteer...
Launching browser...
TimeoutError: waiting for target failed: timeout 30000ms exceeded
...
...
I've been searching for a solution for literally weeks. Does this thing just not work anymore?
After looking at this thread, which identifies this as a well-known issue with Puppeteer, here is some more information on Puppeteer timeout problems.
There are two places where a Puppeteer script commonly times out: one is goto timing out, and the other is waitfor timing out. Since I don't know what could be causing your specific issue, I'll give you potential solutions for both.
Possible issue #1: Goto is timing out.
I'll directly quote the person who posted this solution, rudiedirkx:
In my case the goto timeout happens because of a forever-loading blocking resource (js or css). That'll never trigger the page's load or domcontentloaded. A bug in Puppeteer IMO, but whatever.
My fix (FINALLY!) is to do what Lighthouse does in its Driver: a Promise.race() for a custom 'timeout'-ish. The shorter version I used:
const LOAD_FAIL = Math.random();
const sleep = options => new Promise(resolve => {
  options.timer = setTimeout(resolve, options.ms, options.result === undefined ? true : options.result);
});
const sleepOptions = {ms: TIMEOUT - 1000, result: LOAD_FAIL};
const response = await Promise.race([
  sleep(sleepOptions),
  page.goto(url, {timeout: TIMEOUT + 1000}),
]);
clearTimeout(sleepOptions.timer);
const success = response !== LOAD_FAIL;
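The quoted snippet relies on TIMEOUT, url, and page being defined elsewhere. As a self-contained sketch of the same Promise.race idea, the helper below wraps any promise. The names raceWithTimeout and TIMED_OUT are mine, and a Symbol replaces the Math.random() marker so it cannot collide with a real result.

```javascript
// Unique marker returned when the timer wins the race.
const TIMED_OUT = Symbol('timed out');

// Resolves with the promise's value, or with TIMED_OUT after `ms` milliseconds.
async function raceWithTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(resolve, ms, TIMED_OUT);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // do not keep the event loop alive after the race
  }
}
```

With Puppeteer this might be used as const response = await raceWithTimeout(page.goto(url), 30000); followed by a check for response === TIMED_OUT. The navigation itself is not cancelled when the timer wins; the caller only stops waiting for it.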
Possible issue #2: Waitfor is timing out.
Alternatively you can try the solution to a waitfor timeout given by dealeros, adding --enable-blink-features=HTMLImports in args:
browser = await puppeteer.launch({
  //headless: false,
  'args': [
    '--enable-blink-features=HTMLImports'
  ]
});
If neither of those worked
If neither of these solutions works, I recommend browsing that thread for more solutions people have suggested and seeing if you can narrow down the problem. Use this code to generate some console logs and see if you can find what's going wrong:
page
  .on('console', message =>
    console.log(`${message.type().substr(0, 3).toUpperCase()} ${message.text()}`))
  .on('pageerror', ({ message }) => console.log(message))
  .on('response', response =>
    console.log(`${response.status()} ${response.url()}`))
  .on('requestfailed', request =>
    console.log(`${request.failure().errorText} ${request.url()}`));
These two steps resolved the issue for me:
1. Kill all Chromium processes:
pkill -o chromium
2. Reinstall the node packages (if step 1 doesn't help):
rm -rf node_modules
npm install

Close Browser after Navigation Timeout

I have this code below made with nodejs + puppeteer, whose goal is to take a screenshot of the user's site:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://MY_WEBSITE/try/slowURL',{timeout: 30000, waitUntil: 'networkidle0' });//timeout 30 seconds
await page.setViewport({width: 1920, height: 1080});
await page.screenshot({path: pathUpload});
await browser.close();
Its operation is quite simple, but to test the timeout I created a page (http://MY_WEBSITE/try/slowURL) that takes 200 seconds to load.
According to the puppeteer timeout (timeout: 30000), there is a 100% chance of a Navigation Timeout Exceeded: 30000ms exceeded error happening, especially because I'm forcing it.
THE PROBLEM
Using the htop command (on Linux), I can see that the browser has not been closed, even after the script crashes and shows "TimeoutError".
If the browser is never closed, there is a good chance that the server will run out of memory as more scans are run, and I don't want that.
How can I solve this problem?
You want to wrap your code into a try..catch..finally statement to handle the error and close the browser.
Code Sample
const browser = await puppeteer.launch();
try {
  const page = await browser.newPage();
  await page.goto(/* ... */);
  // more code which might throw...
} catch (err) {
  console.error('error', err.message);
} finally {
  await browser.close();
}
Your main code is executed inside the try block. The catch block logs any error that might have happened. The finally block is always executed, whether or not an error was thrown. That way, your script calls browser.close in both cases.
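If the same pattern is needed in several places, it could be wrapped in a small helper. This is a sketch under assumptions: withBrowser is a made-up name, and launch is any function that returns a browser-like object with a close() method, for example () => puppeteer.launch(). Unlike the sample above, it does not swallow the error; the error reaches the caller after the browser has been closed.

```javascript
// Hypothetical helper: launches a browser, runs `work` with it, and closes
// the browser afterwards, whether `work` resolved or threw.
async function withBrowser(launch, work) {
  const browser = await launch();
  try {
    return await work(browser);
  } finally {
    await browser.close(); // runs on success and on error alike
  }
}
```

The screenshot code from the question would then go into the work callback, e.g. withBrowser(() => puppeteer.launch(), async (browser) => { /* newPage, goto, screenshot */ }), with a .catch at the call site to log the TimeoutError.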

Puppeteers waitFor functions fail BEFORE the page finished rendering

How come waitForFunction, waitForSelector, await page.evaluate etc. all give errors UNLESS I put a 10 seconds delay after reading the page?
I would think these were made to wait for something to happen on the page, but without my 10 seconds delay (just after page.goto) - all of them fail with errors.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://sunnythailand.com')
  console.log("Waiting 10 seconds")
  await new Promise((resolve)=>setTimeout(()=> resolve() ,10000));
  console.log("Here we go....")
  console.log("waitForFunction START")
  await page.waitForFunction('document.querySelector(".scrapebot_description").textContent.length > 0');
  console.log("waitForFunction FOUND scrapebot")
  console.log("Waiting for evaluate")
  const name = await page.evaluate(() => document.querySelector('.scrapebot_description').textContent)
  console.log("Evaluate: " + name)
  await browser.close()
})()
My theory is that our sunnythailand.com page sends an "end of page" or something BEFORE it finished rendering, and then all the waitFor functions go crazy and fail with all kinds of strange errors.
So I guess my question is... how do we get waitFor to actually WAIT for the event to happen or class to appear etc...?
Don't use a fixed timeout, because you don't know how long the full page will take to load. That depends on each user's internet bandwidth.
All you need is to rely on the promise that waits for your class:
await page.waitForSelector('.scrapebot_description');
This waits for that particular class to appear, and then the rest works fine.
Please remove this line:
//await new Promise((resolve)=>setTimeout(()=> resolve() ,5000));
Please let me know your test result after this. I am sure it will solve the problem.
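Put together, the wait-then-read step might look like the sketch below. readText is a made-up helper and page is assumed to be a Puppeteer Page. The null check is there because document.querySelector returns null until the element exists, which may be one reason the waitForFunction expression in the question throws when it runs too early.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
async function readText(page, selector, timeout = 30000) {
  // Resolves once the element is in the DOM, rejects after `timeout` ms.
  await page.waitForSelector(selector, { timeout });
  // Null-safe predicate: wait until the element actually contains text.
  await page.waitForFunction(
    (sel) => {
      const el = document.querySelector(sel);
      return el !== null && el.textContent.length > 0;
    },
    { timeout },
    selector
  );
  // Read the text inside the page and return it to Node.js.
  return page.$eval(selector, (el) => el.textContent);
}
```

In the question's script this would replace everything between page.goto and browser.close, for example const name = await readText(page, '.scrapebot_description');.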

Resources