Puppeteer: go back to a previous function or call when an error occurs - node.js

I want to write a Puppeteer script that, if the page errors out or hits a connection issue (for example "Aw, Snap!" in Chrome), reloads or re-navigates to the original target URL. So the .catch() handler has to call some refresh/re-navigate function. For now I can only print a message with console.log when an error is caught. Here's my code:
const puppeteer = require('puppeteer');
puppeteer.launch({headless:false}).then(async browser => {
  const page = await browser.newPage();
  await page.on('error',async err => {console.log('on page.on');});
  await page.goto('https://www.google.com').then(async ()=> {
    while(1){
      await page.waitForSelector("img",{timeout:7000})
        .then(async () => {
          await page.evaluate(() => {
            return document.querySelector('div.jsb input[name="btnI"]').value;
          }).then(abc => {
            console.log(abc);
          })
          .catch(err => console.log('input button not found!!'));
        })
        .catch(err => console.log('selector not found!!'));
    }
  })
  .catch(err => console.log(err));
});
So what I want is:
When 'input button not found!!' is logged, it means something happened to the Google page, either a connection issue or something else. I need to re-visit https://www.google.com whenever 'input button not found!!' is triggered. How can I do that? I don't want to write the navigation manually a second time by replacing the text 'input button not found!!' with await page.goto('https://www.google.com').
What I want is a dynamic solution, like calling some function(), etc.
Thank you.
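One way to avoid writing the navigation twice is to wrap the check in a small retry helper. The sketch below is only an illustration under assumptions: withRetry and its attempts parameter are hypothetical names, not part of Puppeteer, and the commented usage assumes the page object and selector from the script above.

```javascript
// Hypothetical helper: run `action`; if it throws, run `recover` and try again.
// Rethrows the last error once `attempts` tries have failed.
async function withRetry(action, recover, attempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // Only recover when another attempt is still coming.
      if (attempt < attempts) await recover(err);
    }
  }
  throw lastError;
}

// Assumed usage inside the script above (needs a live Puppeteer `page`):
// const value = await withRetry(
//   () => page.$eval('div.jsb input[name="btnI"]', el => el.value),
//   () => page.goto('https://www.google.com')
// );
```

The recovery step lives in one place, so the re-navigation URL is written only once no matter how many checks reuse the helper.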

Related

Amazon Scrape returning undefined nightmare package

I am using nightmare.js to get the price for all the products in the database. When I try running just
const data = await monitor.find({})
data.forEach(async (monitor1) => {
  const url = monitor1.Link
  console.log(url)
})
It works, and console.logs many links, but when I try
data.forEach(async (monitor1) => {
  const url = monitor1.Link
  console.log(url)
  try {
    const priceString = await nightmare.goto(monitor1.Link)
      .wait(".a-price-whole")
      .evaluate(() => document.getElementsByClassName("a-price-whole").innerText)
      .end()
    console.log(priceString)
  } catch (e) {
    throw e
  }
})
It returns undefined for all the prices. The class is called a-price-whole as shown here
Why is this happening?
There is not much information to go on, but I believe you are reusing the same nightmare instance.
Refer to this answer

Puppeteer Devtools Programmatically

I can open the DevTools in Puppeteer, but how can I write data to the console panel and export the log of that data to the cmd screen?
In Puppeteer, I want to print to console as below and get the output below.
Screenshot
You are asking for two things here:
Capture console.log messages to the command prompt
Run a JavaScript command inside Puppeteer
For the first point, you can set the option dumpio: true when launching the browser.
For the second point, you can jump into the page using evaluate and make a call to console.log:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    dumpio: true // pipe the browser process stdout/stderr into this process
  });
  const page = await browser.newPage();
  const url = "https://stackoverflow.com";
  await page.goto(url);
  // page.waitFor() is deprecated in newer Puppeteer releases; waitForSelector does the same job here
  await page.waitForSelector('h1');
  await page.evaluate(() => {
    console.log(document.getElementsByTagName("h1")[0].innerText);
  });
  console.log("Done.")
  await browser.close();
})();
Also, if you are getting too much output, you can omit dumpio and instead catch the log as an event, e.g.:
// msg.text() returns the logged text; msg.type() would give the level ('log', 'error', ...)
page.on('console', (msg) => console.log('PAGE LOG:', msg.text()));
await page.waitForSelector('h1');
await page.evaluate(() => {
  console.log(1 + 2);
  console.log(document.getElementsByTagName("h1")[0].innerText);
});
The second script prints:
PAGE LOG: 3
PAGE LOG: We <3 people who code
Done.

Puppeteer: return JSON response of AJAX response

While the page is loading, I am trying to wait for a certain AJAX request made by my page and then return its response's JSON body. My code does not stop iterating through every response even after the condition is met within the listener for the 'response' event.
Once I find the response I want to return, how can I capture the JSON from the response, stop the page from loading further, and return my JSON?
async function runScrape() {
  const browser = await browserPromise;
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.setDefaultTimeout(60000);
  let apiResponse;
  page.on('response', async response => {
    let url = await response.url();
    let status = await response.status();
    console.info(status + " NETWORK CALL: " + url);
    if ( url.match(requestPattern) ) {
      apiResponse = await response.text();
      await page.evaluate(() => window.stop());
    }
  });
  await page.goto(req.query.url);
  console.log("API RESPONSE:\n" + apiResponse);
  return apiResponse
}
=== UPDATE ===
This was the solution that ended up working. It seemed this approach was required due to the specific behavior of the page being scraped.
async function runScrape() {
  const browser = await browserPromise;
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.setDefaultTimeout(60000);
  await page.setRequestInterception(true);
  let JSONResponse;
  page.on('response', async response => {
    if ( !JSONResponse && response.url().match(requestPattern) ) {
      JSONResponse = await response.text();
    }
  });
  page.on('request', request => {
    if (request.resourceType() === 'image' || request.resourceType() === 'stylesheet') request.abort()
    else request.continue()
  });
  await page.goto(scrapeURL, {waitUntil: 'networkidle2'});
  await page.close();
  return JSONResponse
}
runScrape()
  .then( response => {
    res.setHeader("content-type", "application/json");
    res.status(200).send(response);
  })
  .catch(err => {
    let payload = {"errorType": err.name, "errorMessage": err.message+"\n"+err.stack};
    console.error(JSON.stringify(payload));
    res.status(500).json(payload);
  });
I would simplify it to a single page.on('response'... handler that looks for the desired request pattern with String.includes().
Once the response is identified, we can emulate the browser's "Stop loading this page" button with await page.evaluate(() => window.stop()). The window.stop() method does not close the browser; it just stops the network requests.
let resp
page.on('response', async response => {
  if (response.url().includes(requestPattern)) {
    resp = await response.json()
    await page.evaluate(() => window.stop())
  }
})
await page.goto(req.query.url, { waitUntil: 'networkidle0' } )
console.log(resp)
Edit:
To avoid an undefined response you should use the waitUntil: 'networkidle0' setting on page.goto(); see the docs about the options. You got undefined because by default Puppeteer considers the page loaded when the load event fires on the page (this is the default setting of waitUntil). So if the page is considered loaded but there are still network connections in the queue and your request pattern has not been matched yet, the script moves on from goto to console.log. Waiting until all network requests have finished makes sure the response is captured before that happens.
networkidle0: consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
Please note: with networkidle set, you won't be able to stop waiting once the request pattern condition is fulfilled, so your plan to stop the responses won't be possible.
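If the goal is to continue as soon as the matching response arrives rather than waiting for the network to go idle, one alternative (a different technique from the listener approach above) is page.waitForResponse, which resolves with the first response that satisfies a predicate. A minimal sketch, assuming a hypothetical helper name fetchJsonDuringNavigation and that the pattern is a plain substring of the request URL:

```javascript
// Hypothetical helper: navigate to `targetUrl` and return the JSON body of the
// first response whose URL contains `urlFragment`.
async function fetchJsonDuringNavigation(page, targetUrl, urlFragment) {
  const [response] = await Promise.all([
    // Start waiting before navigating so the response cannot be missed.
    page.waitForResponse((res) => res.url().includes(urlFragment)),
    page.goto(targetUrl),
  ]);
  return response.json();
}

// Assumed usage with the names from the question:
// const json = await fetchJsonDuringNavigation(page, req.query.url, requestPattern);
```

Note that page.waitForResponse has its own timeout, so the call rejects if no matching response shows up in time.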
I recommend aborting the resource types that are not needed; this may give you results similar to stopping the requests.
For example, place this right after the end of the page.on('response', async response => {... block:
await page.setRequestInterception(true)
page.on('request', request => {
  if (request.resourceType() === 'image' || request.resourceType() === 'stylesheet') request.abort()
  else request.continue()
})
You can use it with a request.url().includes(unwantedRequestPattern) condition as well if you know which connections you don't need.
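Combining both conditions could look roughly like the sketch below. The names shouldAbort and BLOCKED_TYPES and the example fragment 'tracker.example' are hypothetical; only request.resourceType(), request.url(), request.abort() and request.continue() are taken from Puppeteer.

```javascript
// Resource types that are never needed for reading the AJAX response.
const BLOCKED_TYPES = new Set(['image', 'stylesheet']);

// Hypothetical predicate: true when the request should be aborted, either
// because of its resource type or because its URL contains an unwanted fragment.
function shouldAbort(request, unwantedFragments = []) {
  if (BLOCKED_TYPES.has(request.resourceType())) return true;
  return unwantedFragments.some((fragment) => request.url().includes(fragment));
}

// Assumed usage (requires page.setRequestInterception(true) first):
// page.on('request', (request) =>
//   shouldAbort(request, ['tracker.example']) ? request.abort() : request.continue()
// );
```

Keeping the decision in a plain function makes it possible to check the filter against sample URLs without launching a browser.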

How to catch an exception inside an event listener?

I use the Puppeteer library to open a URL and process all requests' responses. Sometimes inside the event listener page.on('response') I need to throw an error, as in the example below. But I'm unable to catch these exceptions in any way; I always get the unhandled promise rejection error. How can I handle these exceptions? I don't want to use process.on('unhandledRejection') because it doesn't solve my problem at all.
const puppeteer = require('puppeteer');
(async () => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    page.on('response', (request) => {
      throw 'response error';
    });
    await page.goto('http://google.com/');
    browser.close();
  } catch (e) {}
})();
Although the function of your page.on handler is located inside a try..catch block, the function is executed asynchronously and therefore any error thrown inside this function is not caught by the outer try..catch block.
You have to create another try..catch block inside your function.
Example
const puppeteer = require('puppeteer');
function handleError(err) {
  // ...
}
(async () => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    page.on('response', (request) => {
      try { // inner try catch block to catch errors inside this function
        // ...
        throw 'response error';
      } catch (err) {
        // Handle the error here, or pass it to another function:
        handleError(err);
      }
    });
    await page.goto('http://google.com/');
    browser.close();
  } catch (e) {}
})();
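The inner try..catch can also be factored into a reusable wrapper. This is only a sketch: safeListener is a hypothetical helper, not a Puppeteer API, and processResponse in the usage comment is an assumed name for your own handler. It also forwards rejections from async listeners, which a plain try..catch around the call would miss.

```javascript
// Hypothetical wrapper: returns a listener that routes any error thrown by
// `handler` (synchronously or via a returned promise) to `onError`.
function safeListener(handler, onError) {
  return (...args) => {
    try {
      const result = handler(...args);
      // An async handler reports failure through its returned promise.
      if (result && typeof result.catch === 'function') {
        result.catch(onError);
      }
    } catch (err) {
      // A synchronous throw lands here.
      onError(err);
    }
  };
}

// Assumed usage:
// page.on('response', safeListener(processResponse, handleError));
```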
I would never put responses in event handlers, as you will most definitely run into the problem of Express trying to send multiple responses to a user, resulting in an error (unless you create an event handler that tracks whether it has been called before and suppresses the extra response, but that is not pretty either). I would use Promise.race instead.
This waits for the first of your promises to either reject or resolve. Even though the page.on promise can never resolve, that's OK because page.goto will either resolve or reject.
So this would look somewhat like this:
try {
  await Promise.race([
    new Promise((res,rej) => page.on('error', error => rej(error))),
    page.goto('http://google.com/')
  ]);
  // Do stuff if goto succeeds
  browser.close();
} catch (err) {
  // do stuff with your error
  browser.close();
}
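The race pattern can be tried without a browser by using Node's EventEmitter as a stand-in for page. In this sketch rejectOn, raceWithErrors and demo are hypothetical names, and the timings are arbitrary values chosen so that the error fires first.

```javascript
const { EventEmitter } = require('events');

// Promise that rejects the first time `emitter` emits `eventName`.
// It never resolves by itself, so it only matters inside a race.
function rejectOn(emitter, eventName) {
  return new Promise((_, reject) => emitter.once(eventName, reject));
}

// Settle with `task`, unless the emitter reports an error first.
function raceWithErrors(emitter, task) {
  return Promise.race([rejectOn(emitter, 'error'), task]);
}

// Demo: the 'error' event fires before the slow task finishes,
// so the catch block runs with the emitted error.
async function demo() {
  const fakePage = new EventEmitter();
  const slowTask = new Promise((resolve) => setTimeout(() => resolve('loaded'), 50));
  setTimeout(() => fakePage.emit('error', new Error('page crashed')), 5);
  try {
    return await raceWithErrors(fakePage, slowTask);
  } catch (err) {
    return 'caught: ' + err.message;
  }
}
```

Using once instead of on means the listener is removed after the first error, although it stays registered if the task wins the race.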
All you need to do to avoid that error is to catch the result of the async function, which is a Promise, and handle it somehow, even if just piping it through console.error.
const puppeteer = require('puppeteer');
(async () => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    page.on('response', (request) => {
      throw 'response error';
    });
    await page.goto('http://google.com/');
    browser.close();
  } catch (e) {}
})().catch(console.error);

async/await issues with Chrome remote interface

I'd like to test this piece of code and wait until it's done to assert the results. Not sure where the issue is, it should return the Promise.resolve() at the end, but logs end before the code is executed.
Should Page.loadEventFired also be preceded by await?
const CDP = require('chrome-remote-interface')
async function x () {
  const protocol = await CDP()
  const timeout = ms => new Promise(resolve => setTimeout(resolve, ms))
  // See API docs: https://chromedevtools.github.io/devtools-protocol/
  const { Page, Runtime, DOM } = protocol
  await Promise.all([Page.enable(), Runtime.enable(), DOM.enable()])
  Page.navigate({ url: 'http://example.com' })
  // wait until the page says it's loaded...
  return Page.loadEventFired(async () => {
    console.log('Page loaded! Now waiting a few seconds for all the JS to load...')
    await timeout(3000) // give the JS some time to load
    protocol.close()
    console.log('Processing page source...')
    console.log('Doing some fancy stuff here ...')
    console.log('All done.')
    return Promise.resolve()
  })
}
(async function () {
  console.log('start')
  await x()
  console.log('end')
})()
Yes, you should await Page.loadEventFired. Example:
async function x () {
  const protocol = await CDP()
  const timeout = ms => new Promise(resolve => setTimeout(resolve, ms))
  // See API docs: https://chromedevtools.github.io/devtools-protocol/
  const { Page, Runtime, DOM } = protocol
  await Promise.all([Page.enable(), Runtime.enable(), DOM.enable()])
  await Page.navigate({ url: 'http://example.com' })
  // wait until the page says it's loaded...
  await Page.loadEventFired()
  console.log('Page loaded! Now waiting a few seconds for all the JS to load...')
  await timeout(3000) // give the JS some time to load
  protocol.close()
  console.log('Processing page source...')
  console.log('Doing some fancy stuff here ...')
  console.log('All done.')
}
BTW you might also want to wrap your code with try-finally to always close protocol.
async function x () {
  let protocol
  try {
    protocol = await CDP()
    ...
  } finally {
    if(protocol) protocol.close()
  }
}
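The same try/finally shape can be wrapped in a helper so that every caller gets the cleanup for free. A sketch under assumptions: withClient and its connect parameter are hypothetical names; with chrome-remote-interface one would pass a function that calls CDP().

```javascript
// Hypothetical helper: open a client with `connect`, run `work` with it,
// and always close the client afterwards, whether `work` succeeds or throws.
async function withClient(connect, work) {
  let client;
  try {
    client = await connect();
    return await work(client);
  } finally {
    // Skip the close when connect() itself failed and no client exists.
    if (client) await client.close();
  }
}

// Assumed usage:
// await withClient(() => CDP(), async ({ Page }) => {
//   await Page.enable();
//   await Page.navigate({ url: 'http://example.com' });
//   await Page.loadEventFired();
// });
```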
