I am working on creating PDF from web page.
The application on which I am working is single page application.
I tried many options and suggestion on https://github.com/GoogleChrome/puppeteer/issues/1412
But it is not working
const browser = await puppeteer.launch({
executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
ignoreHTTPSErrors: true,
headless: true,
devtools: false,
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const page = await browser.newPage();
await page.goto(fullUrl, {
waitUntil: 'networkidle2'
});
await page.type('#username', 'scott');
await page.type('#password', 'tiger');
await page.click('#Login_Button');
await page.waitFor(2000);
await page.pdf({
path: outputFileName,
displayHeaderFooter: true,
headerTemplate: '',
footerTemplate: '',
printBackground: true,
format: 'A4'
});
What I want is to generate PDF report as soon as Page is loaded completely.
I don't want to write any type of delays i.e. await page.waitFor(2000);
I can not do waitForSelector because the page has charts and graphs which are rendered after calculations.
Help will be appreciated.
You can use page.waitForNavigation() to wait for the new page to load completely before generating a PDF:
await page.goto(fullUrl, {
waitUntil: 'networkidle0',
});
await page.type('#username', 'scott');
await page.type('#password', 'tiger');
await page.click('#Login_Button');
await page.waitForNavigation({
waitUntil: 'networkidle0',
});
await page.pdf({
path: outputFileName,
displayHeaderFooter: true,
headerTemplate: '',
footerTemplate: '',
printBackground: true,
format: 'A4',
});
If there is a certain element that is generated dynamically that you would like included in your PDF, consider using page.waitForSelector() to ensure that the content is visible:
await page.waitForSelector('#example', {
visible: true,
});
Sometimes the networkidle events do not always give an indication that the page has completely loaded. There could still be a few JS scripts modifying the content on the page. So watching for the completion of HTML source code modifications by the browser seems to be yielding better results. Here's a function you could use -
const waitTillHTMLRendered = async (page, timeout = 30000) => {
const checkDurationMsecs = 1000;
const maxChecks = timeout / checkDurationMsecs;
let lastHTMLSize = 0;
let checkCounts = 1;
let countStableSizeIterations = 0;
const minStableSizeIterations = 3;
while(checkCounts++ <= maxChecks){
let html = await page.content();
let currentHTMLSize = html.length;
let bodyHTMLSize = await page.evaluate(() => document.body.innerHTML.length);
console.log('last: ', lastHTMLSize, ' <> curr: ', currentHTMLSize, " body html size: ", bodyHTMLSize);
if(lastHTMLSize != 0 && currentHTMLSize == lastHTMLSize)
countStableSizeIterations++;
else
countStableSizeIterations = 0; //reset the counter
if(countStableSizeIterations >= minStableSizeIterations) {
console.log("Page rendered fully..");
break;
}
lastHTMLSize = currentHTMLSize;
await page.waitForTimeout(checkDurationMsecs);
}
};
You could use this after the page load / click function call and before you process the page content. e.g.
await page.goto(url, {'timeout': 10000, 'waitUntil':'load'});
await waitTillHTMLRendered(page)
const data = await page.content()
In some cases, the best solution for me was:
await page.goto(url, { waitUntil: 'domcontentloaded' });
Some other options you could try are:
await page.goto(url, { waitUntil: 'load' });
await page.goto(url, { waitUntil: 'domcontentloaded' });
await page.goto(url, { waitUntil: 'networkidle0' });
await page.goto(url, { waitUntil: 'networkidle2' });
You can check this at puppeteer documentation:
https://pptr.dev/#?product=Puppeteer&version=v11.0.0&show=api-pagewaitfornavigationoptions
I always like to wait for selectors, as many of them are a great indicator that the page has fully loaded:
await page.waitForSelector('#blue-button');
In the latest Puppeteer version, networkidle2 worked for me:
await page.goto(url, { waitUntil: 'networkidle2' });
Wrap the page.click and page.waitForNavigation in a Promise.all
await Promise.all([
page.click('#submit_button'),
page.waitForNavigation({ waitUntil: 'networkidle0' })
]);
I encountered the same issue with networkidle when I was working on an offscreen renderer. I needed a WebGL-based engine to finish rendering and only then make a screenshot. What worked for me was a page.waitForFunction() method. In my case the usage was as follows:
await page.goto(url);
await page.waitForFunction("renderingCompleted === true")
const imageBuffer = await page.screenshot({});
In the rendering code, I was simply setting the renderingCompleted variable to true, when done. If you don't have access to the page code you can use some other existing identifier.
You can also use to ensure all elements have rendered
await page.waitFor('*')
Reference: https://github.com/puppeteer/puppeteer/issues/1875
As for December 2020, waitFor function is deprecated, as the warning inside the code tell:
waitFor is deprecated and will be removed in a future release. See
https://github.com/puppeteer/puppeteer/issues/6214 for details and how
to migrate your code.
You can use:
sleep(millisecondsCount) {
if (!millisecondsCount) {
return;
}
return new Promise(resolve => setTimeout(resolve, millisecondsCount)).catch();
}
And use it:
(async () => {
await sleep(1000);
})();
Keeping in mind the caveat that there's no silver bullet to handle all page loads, one strategy is to monitor the DOM until it's been stable (i.e. has not seen a mutation) for more than n milliseconds. This is similar to the network idle solution but geared towards the DOM rather than requests and therefore covers a different subset of loading behaviors.
Generally, this code would follow a page.waitForNavigation({waitUntil: "domcontentloaded"}) or page.goto(url, {waitUntil: "domcontentloaded"}), but you could also wait for it alongside, say, waitForNetworkIdle() using Promise.all() or Promise.race().
Here's a simple example:
const puppeteer = require("puppeteer"); // ^14.3.0
const waitForDOMStable = (
page,
options={timeout: 30000, idleTime: 2000}
) =>
page.evaluate(({timeout, idleTime}) =>
new Promise((resolve, reject) => {
setTimeout(() => {
observer.disconnect();
const msg = `timeout of ${timeout} ms ` +
"exceeded waiting for DOM to stabilize";
reject(Error(msg));
}, timeout);
const observer = new MutationObserver(() => {
clearTimeout(timeoutId);
timeoutId = setTimeout(finish, idleTime);
});
const config = {
attributes: true,
childList: true,
subtree: true
};
observer.observe(document.body, config);
const finish = () => {
observer.disconnect();
resolve();
};
let timeoutId = setTimeout(finish, idleTime);
}),
options
)
;
const html = `<!DOCTYPE html><html lang="en"><head>
<title>test</title></head><body><h1></h1><script>
(async () => {
for (let i = 0; i < 10; i++) {
document.querySelector("h1").textContent += i + " ";
await new Promise(r => setTimeout(r, 1000));
}
})();
</script></body></html>`;
let browser;
(async () => {
browser = await puppeteer.launch({headless: true});
const [page] = await browser.pages();
await page.setContent(html);
await waitForDOMStable(page);
console.log(await page.$eval("h1", el => el.textContent));
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
For pages that continually mutate the DOM more often than the idle value, the timeout will eventually trigger and reject the promise, following the typical Puppeteer fallback. You can set a more aggressive overall timeout to fit your needs or tailor the logic to ignore (or only monitor) a particular subtree.
Answers so far haven't mentioned a critical fact: it's impossible to write a one-size-fits-all waitUntilPageLoaded function that works on every page. If it were possble, Puppeteer would surely provide it.
Such a function can't rely on a timeout, because there's always some page that takes longer to load than that timeout. As you extend the timeout to reduce the failure rate, you introduce unnecessary delays when working with fast pages. Timeouts are generally a poor solution, opting out of Puppeteer's event-driven model.
Waiting for idle network requests might not always work if the responses involve long-running DOM updates that take longer than 500ms to trigger a render.
Waiting for the DOM to stop changing might miss slow network requests, long-delayed JS triggers, or ongoing DOM manipulation that might cause the listener never to settle, unless specially handled.
And, of course, there's user interaction: captchas, prompts and cookie/subscription modals that need to be clicked through and dismissed before the page is in a sensible state for a full-page screenshot (for example).
Since every page has different, arbitrary JS behavior, the typical approach is to write event-driven logic that works for a specific page. Making precise, directed assumptions is much better than cobbling together a boatload of hacks that tries to solve every edge case.
If your use case is to write a load event that works on every page, my suggestion is to use some combination of the tools described here that is most balanced to meet your needs (speed vs. accuracy, development time/code complexitiy vs accuracy, etc). Use fail-safes for everything rather than blindly assuming all pages will cooperate with your assumptions. Think hard about what extent you really need to try to handle every web page. Prepare to compromise and accept some degree of failures you can live with.
Here's a quick rundown of the strategies you can mix and match to wait for loads to fit your needs:
page.goto() and page.waitForNavigation() default to the load event, which "is fired when the whole page has loaded, including all dependent resources such as stylesheets and images" (MDN), but this is often too pessimistic; there's no need to wait for a ton of data you don't care about. Often the data is available without waiting for all external resources, so domcontentloaded should be faster. See my post Avoiding Puppeteer Antipatterns for further discussion.
On the other hand, if there are JS-triggered networks requests after load, you'll miss that data. Hence networkidle2 and networkidle0, which wait 500 ms after the number of active network requests are 2 or 0. The motivation for the 2 version is that some sites keep ongoing requests open, which would cause networkidle0 to time out.
If you're waitng for a specific network response that might have a payload (or, for the general case, implementing your own network idle monitor), use page.waitForResponse(). page.waitForRequest(), page.waitForNetworkIdle() and page.on("request", ...) are also useful here.
If you're waiting for a particular selector to be visible, use page.waitForSelector(). If you're waiting for a load on a specific page, identify a selector that indicates the state you want to wait for. Generally speaking, for scripts specific to one page, this is the main tool to wait for the state you want, whether you're extracting data or clicking something. Frames and shadow roots thwart this function.
page.waitForFunction() lets you wait for an arbitrary predicate, for example, checking that the page's HTML or a specific list is a certain length. It's also useful for quickly dipping into frames and shadow roots to wait for predicates that depend on nested state. This function is also handy for detecting DOM mutations.
The most general tool is page.evaluate(), which plugs code into the browser. You can put just about any conditions you want here; most other Puppeteer functions are convenience wrappers for common cases you could implement by hand with evaluate.
I can't leave comments, but I made a python version of Anand's answer for anyone who finds it useful (i.e. if they use pyppeteer).
async def waitTillHTMLRendered(page: Page, timeout: int = 30000):
check_duration_m_secs = 1000
max_checks = timeout / check_duration_m_secs
last_HTML_size = 0
check_counts = 1
count_stable_size_iterations = 0
min_stabe_size_iterations = 3
while check_counts <= max_checks:
check_counts += 1
html = await page.content()
currentHTMLSize = len(html);
if(last_HTML_size != 0 and currentHTMLSize == last_HTML_size):
count_stable_size_iterations += 1
else:
count_stable_size_iterations = 0 # reset the counter
if(count_stable_size_iterations >= min_stabe_size_iterations):
break
last_HTML_size = currentHTMLSize
await page.waitFor(check_duration_m_secs)
For me the { waitUntil: 'domcontentloaded' } is always my go to.
I found that networkidle doesnt work well...
Related
I have a working puppeteer script that I'd like to make into an API but I'm having problems with waitForSelector.
Background:
I wrote a puppeteer script that successfully searches for and scrapes the result of a query I specify in the code e.g. let address = xyz;. Now I'd like to make it into an API so that a user can query something. I managed to code everything necessary for the local API (working with express) and everything works as well. By that I mean: I coded all the server side stuff: I can make a request, the scraper function is called, puppeteer starts up, carries out my search (I need to type in an address, choose from a dropdown and press enter).
The status:
The result of my query is a form (basically 3 columns and some rows) in an iFrame and I want to scrape all the rows (I modify them into a specific json later on). The way it works is I use waitForSelector on the form's selector and then I use frame.evaluate.
Problem:
When I run my normal scraper everything works well, but when I run the (slightly modified but essentially same) code within the API framework, waitForSelector suddenly always times out. I have tried all the usual workarounds: waitForNavigation, taking a screenshot and inspecting etc but nothing helped. I've been reading quite a bit and could it be that I'm screwing something up in terms of async/await when I call my scraper from within the context of the API? I'm still quite new to this so please bear with me. This is the code of the working script - I indicated the important part
const puppeteer = require("puppeteer");
const chalk = require("chalk");
const fs = require('fs');
const error = chalk.bold.red;
const success = chalk.keyword("green");
address = 'Gumpendorfer Straße 12, 1060 Wien';
(async () => {
try {
// open the headless browser
var browser = await puppeteer.launch();
// open a new page
var page = await browser.newPage();
// enter url in page
await page.goto(`https://mein.wien.gv.at/Meine-Amtswege/richtwert?subpage=/lagezuschlag/`, {waitUntil: 'networkidle2'});
// continue without newsletter
await page.click('#dss-modal-firstvisit-form > button.btn.btn-block.btn-light');
// let everyhting load
await page.waitFor(1000)
console.log('waiting for iframe with form to be ready.');
//wait until selector is available
await page.waitForSelector('iframe');
console.log('iframe is ready. Loading iframe content');
//choose the relevant iframe
const elementHandle = await page.$(
'iframe[src="/richtwertfrontend/lagezuschlag/"]',
);
//go into frame in order to input info
const frame = await elementHandle.contentFrame();
//enter address
console.log('filling form in iframe');
await frame.type('#input_adresse', address, { delay: 100});
//choose first option from dropdown
console.log('Choosing from dropdown');
await frame.click('#react-autowhatever-1--item-0');
console.log('pressing button');
//press button to search
await frame.click('#next-button');
// scraping data
console.log('scraping')
await frame.waitForSelector('#summary > div > div > br ~ div');//This keeps failing in the API
const res = await frame.evaluate(() => {
const rows = [...document.querySelectorAll('#summary > div > div > br ~ div')];
const cells = rows.map(
row => [...row.querySelectorAll('div')]
.map(cell => cell.innerText)
);
return cells;
});
await browser.close();
console.log(success("Browser Closed"));
const mapFields = (arr1, arr2) => {
const mappedArray = arr2.map((el) => {
const mappedArrayEl = {};
el.forEach((value, i) => {
if (arr1.length < (i+1)) return;
mappedArrayEl[arr1[i]] = value;
});
return mappedArrayEl;
});
return mappedArray;
}
const Arr1 = res[0];
const Arr2 = res.slice(1,3);
let dataObj = {};
dataObj[address] = [];
// dataObj['lagezuschlag'] = mapFields(Arr1, Arr2);
// dataObj['adresse'] = address;
dataObj[address] = mapFields(Arr1, Arr2);
console.log(dataObj);
} catch (err) {
// Catch and display errors
console.log(error(err));
await browser.close();
console.log(error("Browser Closed"));
}
})();
I just can't understand why it would work in the one case and not in the other, even though I barely changed something. For the API I basically changed the name of the async function to const search = async (address) => { such that I can call it with the query in my server side script.
Thanks in advance - I'm not attaching the API code cause I don't want to clutter the question. I can update it if it's necessary
I solved this myself. Turns out the problem wasn't as complicated as I thought and it was annoyingly simple to solve. The problem wasn't with the selector that was timing out but with the previous selectors, specifically the typing and choosing from dropdown selectors. Essentially, things were going too fast. Before the search query was typed in, the dropdown was already pressed and nonsense came out. How I solved it: I included a waitFor(1000) call before the dropdown is selected and everything went perfectly. An interesting realisation was that even though that one selector timed out, it wasn't actually the source of the problem. But like I said, annoyingly simple and I feel dumb for asking this :) but maybe someone will see this and learn from my mistake
I have this code below made with nodejs + puppeteer, whose goal is to take a screenshot of the user's site:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://MY_WEBSITE/try/slowURL',{timeout: 30000, waitUntil: 'networkidle0' });//timeout 30 seconds
await page.setViewport({width: 1920, height: 1080});
await page.screenshot({path: pathUpload});
await browser.close();
Its operation is quite simple, but to test the timeout I created a page (http://MY_WEBSITE/try/slowURL) that takes 200 seconds to load.
According to the puppeteer timeout (timeout: 30000), there is a 100% chance of a Navigation Timeout Exceeded: 30000ms exceeded error happening, especially because I'm forcing it.
THE PROBLEM
Through the htop command (used in linux), even after the system crashes and shows "TimeoutError", I can see that the browser has not been closed.
And if the browser is not closed, as scans were done, there is a good chance that the server will run out of memory, and I don't want that.
How can I solve this problem?
You want to wrap your code into a try..catch..finally statement to handle the error and close the browser.
Code Sample
const browser = await puppeteer.launch();
try {
const page = await browser.newPage();
await page.goto(/* ... */);
// more code which might throw...
} catch (err) {
console.error('error', err.message);
} finally {
await browser.close();
}
Your main code is executed inside a try block. The catch block shows any kind of error that might happened. The finally part is the part of your script that is always executed, not only when an error is thrown. That way, independent of whether an error happened or not, your script will call the browser.close function.
I need help to undestand how timeout works, especially with node/puppeteer
I read all stack questions and github issues about this, but i can figure it out what is wrong
Probably my code...
When i run this file, i receive the error from image. You can see the ways i tryied to fix it, nothing works
Can someone explain why this happens and the best approach to avoid this? Is there a better way to get these Projects?
//vou até os seeds em x tempo
var https = require('https');
var Q = require('q');
var fs = require('fs');
var puppeteer = require('puppeteer');
var Projeto = require('./Projeto.js');
const url = 'https://www.99freelas.com.br/projects?categoria=web-e-desenvolvimento'
/*const idToScrape;
deverá receber qual a url e os parametros específicos de cada seed */
async function genScraper() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
//page.setDefaultNavigationTimeout(60000);
page.waitForNavigation( { timeout: 60000, waitUntil: 'domcontentloaded' });
await page.goto(url);
var projetos = await page.evaluate(() => {
let qtProjs = document.querySelectorAll('.result-list li').length;
let listaDeProjs = Array.from(document.querySelectorAll('.result-list li'));
let tempProjetos = [];
for( var i=0; i<=listaDeProjs.length; i++ ) {
let titulo = listaDeProjs[i].children[1].children[0].textContent;
let descricao = listaDeProjs[i].children[2].textContent;
let habilidades = listaDeProjs[i].children[3].textContent;
let publicado = listaDeProjs[i].children[1].children[1].children[0].textContent;
let tempoRestante = listaDeProjs[i].children[1].children[1].children[1].textContent;
//let infoCliente;
proj = new Projeto(titulo, descricao, habilidades, publicado, tempoRestante);
tempProjetos.push(proj);
}
return tempProjetos;
});
console.log(projetos);
browser.close();
}
genScraper();
I recommend you to avoid using the method waitForNavigation before the goTo call.
Basically, It would be better to use the method gotTo with the default value, that is 30000. In my opinion, if the website takes more than 30 seconds to work or respond, there should be something wrong.
Instead, I would do something like this:
await page.goto(url, {
waitUntil: 'networkidle0'
});
Depending on the version of puppeteer that you're using, you will have different behaviours. I am using version 1.4.0 and it is working good so far.
Inside the documentation states the following:
The page.goto will throw an error if:
there's an SSL error (e.g. in case of self-signed certificates).
target URL is invalid.
the timeout is exceeded during navigation.
the main resource failed to load.
So, check that none of the previous scenarios is happening.
Also, you can curl the URL from your terminal to see if the URL respond to outside calls, cross origin problems are common too.
Sincerely, there is no way to say what can be triggering your timeout, but that checklist should help. I had a problem with timeout recently and the problem was my server configuration, so I suggest you to see also if the machine in which you are running this code, has the necessary memory to execute.
In your for loop,
for( var i=0; i<=listaDeProjs; i++ ) {
...
}
listaDeProjs should be listaDeProjs.length
Your evaluation script will fail in several places, if anywhere along this path is undefined: (E.g., if children[1] is undefined or children[0] is undefined.)
listaDeProjs[i].children[1].children[0].textContent;
You can do the following with lodash:
_.get(listaDeProjs[i],"children[1].children[0].textContent","")
That will default to "" if there is no such value.
Additionally, the following works perfectly fine with your code in 1.7 via https://try-puppeteer.appspot.com/
await page.goto(url, {
waitUntil: 'networkidle2',
timeout: '5000'
});
How come waitForFunction, waitForSelector, await page.evaluate etc. all give errors UNLESS I put a 10 seconds delay after reading the page?
I would think these were made to wait for something to happen on the page, but without my 10 seconds delay (just after page.goto) - all of them fail with errors.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('https://sunnythailand.com')
console.log("Waiting 10 seconds")
await new Promise((resolve)=>setTimeout(()=> resolve() ,10000));
console.log("Here we go....")
console.log("waitForFunction START")
await page.waitForFunction('document.querySelector(".scrapebot_description").textContent.length > 0');
console.log("waitForFunction FOUND scrapebot")
console.log("Waiting for evaluate")
const name = await page.evaluate(() => document.querySelector('.scrapebot_description').textContent)
console.log("Evaluate: " + name)
await browser.close()
})()
My theory is that our sunnythailand.com page sends an "end of page" or something BEFORE it finished rendering, and then all the waitFor functions go crazy and fail with all kinds of strange errors.
So I guess my question is... how do we get waitFor to actually WAIT for the event to happen or class to appear etc...?
Don't use time out cause you don't know how much time for it will take to load full page. It depends person to person on his internet bandwidth.
All you need to rely on promise for your class
await page.waitForSelector('.scrapebot_description');
lets wait for your particular class then it will work fine
Please remove this
//await new Promise((resolve)=>setTimeout(()=> resolve() ,5000));
plese let me know your test result after this. I am sure it will solve.
So here's the code snippet:
for (let item of items)
{
await page.waitFor(10000)
await page.click("#item_"+item)
await page.click("#i"+item)
let pages = await browser.pages()
let tempPage = pages[pages.length-1]
await tempPage.waitFor("a.orange", {timeout: 60000, visible: true})
await tempPage.click("a.orange")
counter++
}
page and tempPage are two different pages.
What happens is that page waits for 10 seconds, then clicks some stuff, which opens a second page.
What's supposed to happen is that tempPage waits for an element, clicks it, then page should wait 10 seconds before doing it all over again.
However, what actually happens is that page waits for 10 seconds, clicks the stuff, then starts waiting for 10 seconds without waiting for tempPage to finish its tasks.
Is this a bug, or am I misunderstanding something? How should I fix this so that when the for loop loops again, it is only after tempPage has clicked.
Generally, you cannot rely on await tempPage.click("a.orange") to pause execution until tempPage has "finish[ed] its tasks". For super simple code that executes synchronously, it may work. But in general, you cannot rely on it.
If the click triggers an Ajax operation, or starts a CSS animation, or starts a computation that cannot be immediately computed, or opens a new page, etc., then the result you are waiting for is asynchronous, and the .click method will not wait for this asynchronous operation to complete.
What can you do? In some cases you may be able to hook into the code that is running on the page and wait for some event that matters to you. For instance, if you want to wait for an Ajax operation to be done and the code on the page uses jQuery, then you might use ajaxComplete to detect when the operation is complete. If you cannot hook into any event system to detect when the operation is done, then you may need to poll the page to wait for evidence that the operation is done.
Here is an example that shows the issue:
const puppeteer = require('puppeteer');
function getResults(page) {
return page.evaluate(() => ({
clicked: window.clicked,
asynchronousResponse: window.asynchronousResponse,
}));
}
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.goto("https://example.com");
// We add a button to the page that will click later.
await page.evaluate(() => {
const button = document.createElement("button");
button.id = "myButton";
button.textContent = "My Button";
document.body.appendChild(button);
window.clicked = 0;
window.asynchronousResponse = 0;
button.addEventListener("click", () => {
// Synchronous operation
window.clicked++;
// Asynchronous operation.
setTimeout(() => {
window.asynchronousResponse++;
}, 1000);
});
});
console.log("before clicks", await getResults(page));
const button = await page.$("#myButton");
await button.click();
await button.click();
console.log("after clicks", await getResults(page));
await page.waitForFunction(() => window.asynchronousResponse === 2);
console.log("after wait", await getResults(page));
await browser.close();
});
The setTimeout code simulates any kind of asynchronous operation started by the click.
When you run this code, you'll see on the console:
before click { clicked: 0, asynchronousResponse: 0 }
after click { clicked: 2, asynchronousResponse: 0 }
after wait { clicked: 2, asynchronousResponse: 2 }
You see that clicked is immediately incremented twice by the two clicks. However, it takes a while before asynchronousResponse is incremented. The statement await page.waitForFunction(() => window.asynchronousResponse === 2) polls the page until the condition we are waiting for is realized.
You mentioned in a comment that the button is closing the tab. Opening and closing tabs are asynchronous operations. Here's an example:
puppeteer.launch().then(async browser => {
let pages = await browser.pages();
console.log("number of pages", pages.length);
const page = pages[0];
await page.goto("https://example.com");
await page.evaluate(() => {
window.open("https://example.com");
});
do {
pages = await browser.pages();
// For whatever reason, I need to have this here otherwise
// browser.pages() always returns the same value. And the loop
// never terminates.
await page.evaluate(() => {});
console.log("number of pages after evaluating open", pages.length);
} while (pages.length === 1);
let tempPage = pages[pages.length - 1];
// Add a button that will close the page when we click it.
tempPage.evaluate(() => {
const button = document.createElement("button");
button.id = "myButton";
button.textContent = "My Button";
document.body.appendChild(button);
window.clicked = 0;
window.asynchronousResponse = 0;
button.addEventListener("click", () => {
window.close();
});
});
const button = await tempPage.$("#myButton");
await button.click();
do {
pages = await browser.pages();
// For whatever reason, I need to have this here otherwise
// browser.pages() always returns the same value. And the loop
// never terminates.
await page.evaluate(() => {});
console.log("number of pages after click", pages.length);
} while (pages.length > 1);
await browser.close();
});
When I run the above, I get:
number of pages 1
number of pages after evaluating open 1
number of pages after evaluating open 1
number of pages after evaluating open 2
number of pages after click 2
number of pages after click 1
You can see it takes a bit before window.open() and window.close() have detectable effects.
In your comment you also wrote:
I thought await was basically what turned an asynchronous function into a synchronous one
I would not say it turns asynchronous functions into synchronous ones. It makes the current code wait for an asynchronous operation's promise to be resolved or rejected. However, more importantly for the issue at hand here, the problem is that you have two virtual machines executing JavaScript code: there's Node which runs puppeteer and the script that controls the browser, and there's the browser itself which has its own JavaScript virtual machine. Any await that you use on the Node side affects only the Node code: it has no bearing on the code that runs in the browser.
It can get confusing when you see things like await page.evaluate(() => { some code; }). It looks like it is all of one piece, and all executing in the same virtual machine, but it is not. puppeteer takes the parameter passed to .evaluate, serializes it, and sends it over to the browser, where it executes. Try adding something like await page.evaluate(() => { button.click(); }); in the script above, after const button = .... Something like this:
const button = await tempPage.$("#myButton");
await button.click();
await page.evaluate(() => { button.click(); });
In the script, button is defined before page.evaluate, but you'll get a ReferenceError when page.evaluate runs because button is not defined on the browser side!