puppeteer cannot handle dialog - node.js

The code is a copy of this example with a small change.
It simulates the following:
Go to an add-on page in the Chrome Web Store.
On the add-on page, click the Add to Chrome button, which opens a popup. Click the Cancel button to close the popup.
But it doesn't work. The popup appears but never closes.
const puppeteer = require('puppeteer')
puppeteer.launch({headless: false}).then(async browser => {
  const page = await browser.newPage()
  await page.goto('https://chrome.google.com/webstore/detail/evernote-web-clipper/pioclpoplcdbaefihamjohnefbikjilc?utm_source=inline-install-disabled')
  page.on('dialog', async dialog => {
    console.log(dialog.message())
    await dialog.dismiss()
    await browser.close()
  })
  await page.waitForSelector('div[aria-label="Add to Chrome"]')
  await page.click('div[aria-label="Add to Chrome"]')
  await page.waitFor(20000)
})
Any ideas?
Thanks!
puppeteer: 1.9.0
node: v10.6.0

You should register event listeners, such as page.on('dialog'), before calling the goto method, so the handler is already attached while the page loads.
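
The ordering can be sketched as follows. This is a minimal sketch, assuming a Puppeteer page object; openWithDialogHandler is a hypothetical helper name, not part of Puppeteer. The dialog event covers JavaScript dialogs (alert, confirm, prompt, beforeunload), so whether the Chrome Web Store popup is reported through it is an assumption to verify.

```javascript
// Hypothetical helper: attaches the dialog listener first, then navigates.
// Assumes `page` is a Puppeteer Page (or any object with on() and goto()).
async function openWithDialogHandler(page, url) {
  const messages = [];

  // Register the listener BEFORE goto, so dialogs raised during load are caught.
  page.on('dialog', async dialog => {
    messages.push(dialog.message());
    await dialog.dismiss();
  });

  await page.goto(url);
  return messages; // messages of the dialogs seen so far
}

// Usage (inside an async function, with puppeteer installed):
//   const browser = await puppeteer.launch({ headless: false });
//   const page = await browser.newPage();
//   const seen = await openWithDialogHandler(page, 'https://example.com');
```

The listener stays attached after the helper returns, so dialogs opened by later clicks on the same page are dismissed as well.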

Related

How can I enable Chromium Extensions?

I am trying to open Chromium with extensions, but I cannot figure out how to do this. When Chromium opens, no extensions are installed.
I tried launching with '--enable-remote-extensions' and '--load-extension=', and I tried dragging and dropping the .crx into the Chromium extensions panel, but nothing worked.
I got the errors "An error has occurred: Installation is not enabled" and "Package is invalid: 'CRX_REQUIRED_PROOF_MISSING'".
Could you help me with a working example?
Thanks!
After lots of trial and error, I've solved this issue.
Below is the working code and I hope it will help someone else.
const puppeteer = require('puppeteer');
// Path to the unpacked extension folder; replace the placeholders with your own values
const extensionPath = "C:\\Users\\<YOUR_USERNAME>\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 1\\Extensions\\<LONG_STRING_EXTENTION_ID>\\<EXTENTION_VERSION>";
(async () => {
  const customArgs = [
    `--start-maximized`,
    `--load-extension=${extensionPath}`
  ];
  const browser = await puppeteer.launch({
    defaultViewport: null,
    headless: false,
    // drop the default flags that would otherwise disable extensions
    ignoreDefaultArgs: ["--disable-extensions", "--enable-automation"],
    args: customArgs,
  });
  const page = await browser.newPage();
  // page.goto already waits for the load event, so no extra waitForNavigation is needed
  await page.goto(`https://google.com/`);
  await page.close();
  await browser.close();
})();

Trying to crawl a website using puppeteer but getting a timeout error

I'm trying to search the Kwik Trip website for daily deals using Node.js, but I keep getting a timeout error when I try to crawl it. I'm not sure what is happening. Does anyone know what may be going on?
Below is my code. I'm trying to wait for .agendaItemWrap to load before returning all of the HTML, because the site is a SPA.
function getQuickStar(req, res){
  (async () => {
    try {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      const navigationPromise = page.waitForNavigation({waitUntil: "domcontentloaded"});
      await page.goto('https://www.kwiktrip.com/savings/daily-deals');
      await navigationPromise;
      await page.waitForSelector('.agendaItemWrap', { timeout: 30000 });
      const body = await page.evaluate(() => {
        return document.querySelector('body').innerHTML;
      });
      console.log(body);
      await browser.close();
    } catch (error) {
      console.log(error);
    }
  })();
}
Here's a link to the web page I'm trying to crawl: https://www.kwiktrip.com/savings/daily-deals
It appears your desired selector is located in an iframe, not in the page's main frame (page.mainFrame()).
You therefore need to wait for the iframe and then call waitForSelector on that frame.
Quick tip: you don't need page.waitForNavigation with page.goto, because you can set the waitUntil condition in the goto options. By default, it waits for the page's load event.
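
The iframe approach described above might look like the sketch below. It assumes a Puppeteer page object; waitInsideIframe is a hypothetical helper name, and the 'iframe' selector in the usage note is an assumption, since the original answer does not say which iframe holds the content.

```javascript
// Hypothetical helper: waits for an iframe, then for a selector inside it.
// iframeSelector and innerSelector are supplied by the caller.
async function waitInsideIframe(page, iframeSelector, innerSelector, timeout = 30000) {
  // Wait for the <iframe> element itself to appear in the main frame.
  const iframeHandle = await page.waitForSelector(iframeSelector, { timeout });

  // Resolve the element handle to the Frame it hosts.
  const frame = await iframeHandle.contentFrame();
  if (!frame) {
    throw new Error(`No frame found behind ${iframeSelector}`);
  }

  // Wait for the target element inside that frame, not the main frame.
  await frame.waitForSelector(innerSelector, { timeout });
  return frame;
}

// Usage (inside an async function; 'iframe' is an assumed selector):
//   await page.goto('https://www.kwiktrip.com/savings/daily-deals', { waitUntil: 'domcontentloaded' });
//   const frame = await waitInsideIframe(page, 'iframe', '.agendaItemWrap');
//   const html = await frame.evaluate(() => document.body.innerHTML);
```

Returning the frame lets the caller run evaluate in the same context where the selector was found.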

How to catch a download with playwright?

I'm trying to download a file from a website using Playwright. The button that triggers the download does some js and then the download starts.
Clicking the button using the .click function triggers a download but it shows an error: Failed - Download error.
I've tried using the devtools protocol Page.setDownloadBehavior, but this doesn't seem to do anything.
const playwright = require("playwright");
const { /*chromium,*/ devices } = require("playwright");
const iPhone = devices["iPad (gen 7) landscape"];
(async () => {
  const my_chromium = playwright["chromium"];
  const browser = await my_chromium.launch({ headless: false });
  const context = await browser.newContext({
    viewport: iPhone.viewport,
    userAgent: iPhone.userAgent
  });
  const page = await context.newPage();
  const client = await browser.pageTarget(page).createCDPSession();
  console.log(client);
  await client.send("Page.setDownloadBehavior", {
    behavior: "allow",
    downloadPath: "C:/in"
  });
  //...and so on
  await page.click("#download-button");
  browser.close();
})();
There is a proposal for a better download API in Playwright, but I can't find the current API.
There was a suggestion that the downloadWillBegin event would be useful, but I have no idea how to access it from Playwright.
I'm open to the suggestion that I should use Puppeteer instead, but I moved to Playwright because I couldn't work out how to download a file with Puppeteer either, and the issue related to it suggested that the whole team had moved to Playwright.
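Take a look at the page.on("download") event.
const playwright = require("playwright");
(async () => {
  const browser = await playwright.chromium.launch({});
  const context = await browser.newContext({ acceptDownloads: true });
  const page = await context.newPage();
  await page.goto("https://somedownloadpage.weburl");
  await page.type("#password", password); // password is a placeholder you must define
  // start waiting for the download before the click that triggers it
  const [download] = await Promise.all([
    page.waitForEvent("download"),
    page.click("text=Continue"),
  ]);
  console.log("file downloaded to", await download.path());
  await browser.close();
})();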
Embarrassingly, I was closing the browser before the download had started.
It turns out that the download error was caused by the client section. However, without it I have no control over where the file is saved.
The download works with headless: false but not with headless: true.
If anyone has a better answer, that'd be great!
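
One way to regain control over where the file ends up is the saveAs method on Playwright's Download object. The following is a minimal sketch, assuming a Playwright page whose context accepts downloads; clickAndSaveDownload is a hypothetical helper name, and the target path in the usage note is made up for illustration.

```javascript
// Hypothetical helper: clicks a download trigger and saves the file to targetPath.
// Assumes `page` is a Playwright Page from a context that accepts downloads.
async function clickAndSaveDownload(page, triggerSelector, targetPath) {
  // Start waiting before the click so the download event cannot be missed.
  const [download] = await Promise.all([
    page.waitForEvent('download'),
    page.click(triggerSelector),
  ]);

  // saveAs copies the finished download to a path of your choosing.
  await download.saveAs(targetPath);
  return download.suggestedFilename();
}

// Usage (inside an async function; the file name is a hypothetical example):
//   const name = await clickAndSaveDownload(page, '#download-button', 'C:/in/file.bin');
//   await browser.close(); // close only after the download has been saved
```

Because the helper awaits saveAs before returning, closing the browser afterwards no longer races with the download.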
You can use waitForTimeout.
I tried it with {headless: true} and await page.waitForTimeout(1000);
and it works fine.
To download a file (or its buffer), I highly recommend the got Node module. It's much easier, cleaner, and lighter.
// Assumes a got version that can be loaded with require(), such as v11
const got = require('got');
(async () => {
  const response = await got('https://sindresorhus.com')
    .on('downloadProgress', progress => {
      // Report download progress
    })
    .on('uploadProgress', progress => {
      // Report upload progress
    });
  console.log(response);
})();

Puppeteer close javascript alert box

I'm trying to click a button on this website, but when I enter the site an alert box shows up and I don't know how to close it.
I just started experimenting with Puppeteer, and this is the simple code I'm using right now:
const ptr = require('puppeteer');
ptr.launch().then(async browser => {
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 800 });
  await page.goto('https://portaleperiti.grupporealemutua.it/PPVET/VetrinaWebPortalePeriti/');
  //This is the alert button selector
  await page.click("#BoxAlertBtnOk");
  //This is the button on the page i want to click on
  await page.click("input[value='Perito RE / G.F.']");
  await page.screenshot({
    path: 'screenshot.png',
    fullPage: true
  });
  await browser.close();
});
This is the error I get: UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement
at ElementHandle._clickablePoint
Any help would be really appreciated, thanks!
There are a few things going on on that page:
The alert box only loads after the page has loaded (the body tag has an onload attribute), so you should wait until the network is idle.
Clicking those "Perito" buttons opens a new window/tab because of the window.open() call in the onclick handler.
The new tab redirects multiple times and shows a login page if the user is not already logged in.
Solution:
1. Make sure to load the page properly.
Just add { waitUntil: "networkidle0" } to .goto or .waitForNavigation.
await page.goto(
  "https://portaleperiti.grupporealemutua.it/PPVET/VetrinaWebPortalePeriti/",
  { waitUntil: "networkidle0" }
  // <-- Make sure the whole page is completely loaded
);
2. Wait for the element before clicking
As other answers already suggest, wait for the element using waitFor (newer Puppeteer versions replace it with waitForSelector).
// wait and click the alert button
await page.waitFor("#BoxAlertBtnOk");
await page.click("#BoxAlertBtnOk");
3. Optionally, wait a few seconds before taking a screenshot after clicking the button.
// optional, wait a few seconds before taking this screenshot
// just to make sure it works even on a slow machine
await page.waitFor(2000);
await page.screenshot({
  path: "screenshot_before.png",
  fullPage: true
});
4. Use page.evaluate and document.querySelector to get the element
page.click will not handle every kind of click. Sometimes different events are bound to an element, and you have to treat those separately.
// we can click using querySelector and the native click() method;
// page.click alone does not trigger the onclick handler on this page
await page.evaluate(() =>
  document.querySelector("input[value='Perito RE / G.F.']").click()
);
5. Treat the new tab separately
Together with browser.once('targetcreated'), new Promise, and browser.pages() you can catch the newly created tab and work on it.
Note: Read final code at end of the answer before using this.
// this is the final page after clicking the input on previous page
// https://italy.grupporealemutua.it/FIM/sps/IDPRMA/saml20/login
function newTabCatcher(browser) {
  // we resolve this promise after doing everything we need to do on this page
  // or in error
  return new Promise((resolve, reject) => {
    // set the listener before clicking the button to have proper interaction
    // we listen for only one new tab
    browser.once("targetcreated", async function() {
      console.log("New Tab Created");
      try {
        // get the newly created window
        const tabs = await browser.pages();
        const lastTab = tabs[tabs.length - 1];
        // Wait for navigation to finish as well as specific login form
        await Promise.all([
          lastTab.waitForNavigation({ waitUntil: "networkidle0" }),
          lastTab.waitFor("#div_login")
        ]);
        // browser will switch to this tab just when it takes the screenshot
        await lastTab.screenshot({
          path: "screenshot_newtab.png",
          fullPage: true
        });
        resolve(true);
      } catch (error) {
        reject(error);
      }
    });
  });
}
Final Code:
Just for clarity, here is how I used all code snippets specified above.
const ptr = require("puppeteer");
ptr.launch({ headless: false }).then(async browser => {
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 800 });
  await page.goto(
    "https://portaleperiti.grupporealemutua.it/PPVET/VetrinaWebPortalePeriti/",
    { waitUntil: "networkidle0" }
    // <-- Make sure the whole page is completely loaded
  );
  // wait and click the alert button
  await page.waitFor("#BoxAlertBtnOk");
  await page.click("#BoxAlertBtnOk");
  // optional, wait a few seconds before taking this screenshot
  // just to make sure it works even on a slow machine
  await page.waitFor(2000);
  await page.screenshot({
    path: "screenshot_before.png",
    fullPage: true
  });
  // we can click using querySelector and the native click() method;
  // page.click alone does not trigger the onclick handler on this page
  await page.evaluate(() =>
    document.querySelector("input[value='Perito RE / G.F.']").click()
  );
  // here we go and process the new tab
  // aka get screenshot, fill form etc
  await newTabCatcher(browser);
  // rest of your code
  // ...
  await browser.close();
});
Result:
It worked flawlessly!
Note:
Notice how I used new Promise and async/await together. This might not be best practice, but it gives you a lead on what to look for when creating a scraper for older websites.
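
As an alternative to wrapping the targetcreated listener in new Promise, the new tab can be awaited directly. This is a sketch of a different technique from the one in the answer, assuming a Puppeteer version that provides browser.waitForTarget and target.opener(); clickAndCatchNewTab is a hypothetical helper name.

```javascript
// Hypothetical helper: runs clickFn and resolves with the Page of the tab it opens.
// Assumes `browser` and `page` are Puppeteer Browser and Page objects.
async function clickAndCatchNewTab(browser, page, clickFn) {
  const opener = page.target();

  const [target] = await Promise.all([
    // Resolves with the first target that was opened by this page.
    browser.waitForTarget(t => t.opener() === opener),
    clickFn(),
  ]);

  return target.page();
}

// Usage (inside an async function):
//   const newTab = await clickAndCatchNewTab(browser, page, () =>
//     page.evaluate(() =>
//       document.querySelector("input[value='Perito RE / G.F.']").click()
//     )
//   );
//   await newTab.waitForSelector('#div_login');
```

Matching on the opener avoids relying on the new tab being the last entry returned by browser.pages().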
If it's relevant to anyone else facing dialog boxes, the following code solved it for me:
this.page.on('dialog', async dialog => {
  await dialog.dismiss();
});
Your button, #BoxAlertBtnOk, appears on the page only after a moment, so when you call await page.click("#BoxAlertBtnOk"); the button is not yet visible. Wait until it is visible before acting on it:
await page.waitForSelector("#BoxAlertBtnOk", { visible: true });
await page.click("#BoxAlertBtnOk");
await page.waitForSelector("input[value='Perito RE / G.F.']");
await page.click("input[value='Perito RE / G.F.']");

Run flash game in headless chrome using puppeteer

How can I run a Flash game in headless Chrome using Puppeteer? I'm trying to screenshot this Flash game, but the game doesn't run and is replaced by "Couldn't load plugin" text.
Here's the relevant code I used to generate the screenshot and its output, running in Ubuntu on Windows Subsystem for Linux:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  await page.setViewport({width: 1243, height: 882});
  await page.goto('http://www.bigfuntown.com/Game-59.html');
  await page.screenshot({path: 'game.png'});
  await browser.close();
})();
In modern Chrome versions, Flash is blocked by default and requires user interaction to enable.
Because this was hard to automate, I made a Puppeteer wrapper for this purpose: puppeteer.setExtra({allowFlash: true})
Note that headless: false is still required due to Puppeteer limitations.

Resources