How to access page.getBy* functions inside Crawlee - Node.js

I'm using Crawlee with PlaywrightCrawler. I reach a new URL to crawl after clicking a few elements on the starting page. I'm clicking those elements with
page.getByRole().click(), which is what Playwright's codegen produced:
import { chromium } from "@playwright/test";

const browser = await chromium.launch({
  headless: true,
});
const context = await browser.newContext();
const page = await context.newPage();

// brandSection is an array of "name\ncount" strings built earlier
for (let i = 0; i < brandSection.length; i++) {
  let [brandName, brandCount] = brandSection[i].split("\n");
  await page
    .getByRole("button", { name: `${brandName} ${brandCount}` })
    .click();
}
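For clarity, the accessible name passed to getByRole() here is just the "name count" pair from each brandSection entry. Factored out as a plain helper (hypothetical, for illustration; the sample data is made up), it looks like this:

```javascript
// Build the accessible button names used in the getByRole() calls above
// from "name\ncount" entries.
function brandButtonNames(brandSection) {
  return brandSection.map((entry) => {
    const [brandName, brandCount] = entry.split("\n");
    return `${brandName} ${brandCount}`;
  });
}

// brandButtonNames(["Acme\n12", "Globex\n3"]) → ["Acme 12", "Globex 3"]
```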
So this works without Crawlee, but when I try to use it inside a PlaywrightCrawler, it fails, saying that the page instance doesn't have a method called .getByRole().
import { createPlaywrightRouter, enqueueLinks, Dataset } from "crawlee";
import { PlaywrightCrawler } from "crawlee";
....
....
router.addDefaultHandler(async ({ page, request, enqueueLinks }) => {
  const prodGridSel = ".catalog-grid a";
  // **here goes the code that uses .getByRole()**
  await enqueueLinks({
    ...
  });
});
...
...
const crawler = new PlaywrightCrawler({
  requestHandler: router,
});
I haven't used Playwright for testing, only for crawling with Crawlee, so I'm guessing the getBy*() functions are only available when importing from "@playwright/test". I haven't found any information except this, which is related to Cypress and probably a faulty import.
So, can I have a page instance inside crawlee that has these functions?

Why can't Puppeteer find this link element on page?

UPDATE
I'm willing to pay someone to walk me through this; issue posted on CodeMentor.io: https://www.codementor.io/u/dashboard/my-requests/9j42b83f0p
I've been looking to click on the element:
<a id="isc_LinkItem_1$20j" href="javascript:void" target="javascript" tabindex="2"
onclick="if(window.isc_LinkItem_1) return isc_LinkItem_1.$30i(event);"
$9a="$9d">Reporting</a>
In: https://stackblitz.com/edit/js-nzhhbk
(I haven't included the actual page because it's behind a username & password.)
seems easy enough
----------------------------------------------------------------------
Solution 1:
page.click('[id=isc_LinkItem_1$20j]') // not a valid selector
Solution 2:
const linkHandlers = await frame.$x("//a[contains(text(), 'Reporting')]");
if (linkHandlers.length > 0) {
  await linkHandlers[0].click();
} else {
  throw new Error('Link not found');
} // link not found
----------------------------------------------------------------------
I have looked every which way to select and click it, and Puppeteer says it isn't in the document even though it clearly is (verified by inspecting the HTML in Chrome DevTools and by calling page.evaluate(() => document.body.innerHTML)).
- tried to see if it was in an iframe
- tried to select by id
- tried to select by inner text
- tried to console.log the body in the browser (console logging not working, verified on the inspected element) // nothing happens
- tried to create an alert with the body text using page.evaluate(() => alert(document)) // nothing happens
- tried to create an alert to test whether JavaScript can be injected via page.evaluate(() => alert('works')) // nothing happens
- also tried this: How to select elements within an iframe element in Puppeteer // doesn't work
Here is the code I have built so far
const page = await browser.newPage();
const login1url =
  'https://np3.nextiva.com/NextOSPortal/ncp/landing/landing-platform';
await page.goto(login1url);
await page.waitFor(1000);
await page.type('[name=loginUserName]', 'itsaSecretLol');
await page.type('[name=loginPassword]', 'nopeHaha');
await page.click('[type=submit]');
await page.waitForNavigation();
const login3url = 'https://np3.nextiva.com/NextOSPortal/ncp/admin/dashboard';
await page.goto(login3url);
await page.click('[id=hdr_users]');
await page.goto('https://np3.nextiva.com/NextOSPortal/ncp/user/manageUsers');
await page.goto('https://np3.nextiva.com/NextOSPortal/ncp/user/garrettmrg');
await page.waitFor(2000);
await page.click('[id=loginAsUser]');
await page.waitFor(2000);
await page.click('[id=react-select-5--value]');
await page.waitFor(1000);
await page.click('[id=react-select-5--option-0]');
await page.waitFor(20000);
const elementHandle = await page.$('iframe[id=callcenter]');
const frame = await elementHandle.contentFrame();
const linkHandlers = await frame.$x("//a[contains(text(), 'Reporting')]");
if (linkHandlers.length > 0) {
  await linkHandlers[0].click();
} else {
  throw new Error('Link not found');
}
Since isc_LinkItem_1$20j is not a valid selector, maybe you can try finding elements STARTING WITH isc_LinkItem_1, like this:
await page.waitForSelector("[id^=isc_LinkItem_1]", { visible: true, timeout: 30000 });
await page.click("[id^=isc_LinkItem_1]");
On your solution 1, escape the $ in the id:
await page.click('a[id=isc_LinkItem_1\\$20j]');
Or try:
await page.click('#isc_LinkItem_1\\$20j');
I have the slight impression that you must provide what kind of element you're trying to select before the brackets, in this case an <a> element.
In the second variant, the # character means we're selecting an element by its id.
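The escaping rule above can be wrapped in a tiny helper. (In browsers, CSS.escape() does this generically; the function below is a minimal stand-in that only handles the $ character, added here for illustration.)

```javascript
// Escape $ characters so a raw id can be used as a CSS selector,
// e.g. page.click(escapeIdSelector('isc_LinkItem_1$20j')).
function escapeIdSelector(id) {
  return '#' + id.replace(/\$/g, '\\$&');
}

// escapeIdSelector('isc_LinkItem_1$20j')
// produces '#isc_LinkItem_1\$20j' (a single backslash before the $)
```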
It turns out that the previous click triggered a new tab. Puppeteer doesn't move to the new tab; all the previous code was being executed on the old tab. To fix it, all we had to do was find the new tab, select it, and execute the code there. Here is the function we wrote to select the tab:
async function getTab(regex, browser, targets) {
  let pages = await browser.pages();
  if (targets) pages = await browser.targets();
  let newPage;
  for (let i = 0; i < pages.length; i++) {
    const url = await pages[i].url();
    console.log(url);
    if (url.search(regex) !== -1) {
      newPage = pages[i];
      console.log('***');
      console.log(url);
      console.log('***');
      break;
    }
  }
  console.log('finished');
  return newPage;
}
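The core matching step of getTab (pick the first page whose URL matches a regex) can be checked without a browser against plain stand-in objects; the URLs below are illustrative:

```javascript
// Synchronous core of the tab-matching idea: the stand-in "pages" only
// need a url() method, which is all the matching logic reads.
function findTabByUrl(regex, pages) {
  for (const p of pages) {
    if (p.url().search(regex) !== -1) return p;
  }
  return undefined;
}

const fakePages = [
  { url: () => 'https://example.com/admin/dashboard' },
  { url: () => 'https://example.com/callcenter/Reporting' },
];

const reportingTab = findTabByUrl(/Reporting/, fakePages); // the second page
```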

Click event does nothing when triggered

When I trigger a .click() event in non-headless mode in Puppeteer, nothing happens, not even an error. (I use non-headless mode so I can visually monitor what is being clicked.)
const scraper = {
  test: async () => {
    let browser, page;
    try {
      browser = await puppeteer.launch({
        headless: false,
        args: ["--no-sandbox", "--disable-setuid-sandbox"]
      });
      page = await browser.newPage();
    } catch (err) {
      console.log(err);
    }
    try {
      await page.goto("https://www.betking.com/sports/s/eventOdds/1-840-841-0-0,1-1107-1108-0-0,1-835-3775-0-0,", {
        waitUntil: "domcontentloaded"
      });
      console.log("scraping, wait...");
    } catch (err) {
      console.log(err);
    }
    console.log("waiting....");
    try {
      await page.waitFor('.eventsWrapper');
    } catch (err) {
      console.log(err, err.response);
    }
    try {
      let oddsListData = await page.evaluate(async () => {
        let regionAreaContainer = document.querySelectorAll('.areaContainer.region .regionGroup > .regionAreas > div:first-child > .area:nth-child(5)');
        regionAreaContainer = Array.prototype.slice.call(regionAreaContainer);
        let t = []; // used to monitor the element being clicked
        regionAreaContainer.forEach(async (region) => {
          let dat = await region.querySelector('div');
          dat.innerHTML === "GG/NG" ? t.push(dat.innerHTML) : false; // confirm the right element is being clicked
          dat.innerHTML === "GG/NG" ? dat.click() : false;
        });
        return t;
      });
      console.log(oddsListData);
    } catch (err) {
      console.log(err);
    }
  }
};
I expect it to click the specified button and load in some dynamic data on the page.
In Chrome's console, I get the error
Transition Rejection($id: 1 type: 2, message: The transition has been superseded by a different transition, detail: Transition#3( 'sportsMultipleEvents'{"eventMarketIds":"1-840-841-0-0,1-1107-1108-0-0,1-835-3775-0-0,"} -> 'sportsMultipleEvents'{"eventMarketIds":"1-840-841-0-0,1-1107-1108-0-0,1-835-3775-535-14,"} ))
Problem
Behaving non-human-like by executing code like element.click() (inside the page context) or element.value = '..' (see this answer for a similar problem) seems to be problematic for Angular applications. You should try to behave more human-like by using Puppeteer functions like page.click(), as they simulate a "real" mouse click instead of just triggering the element's click event.
In addition, the page seems to rebuild parts of itself whenever one of the items is clicked. Therefore, you need to run the selector query again after each click.
Code sample
To behave more human-like and requery the elements after each click you can change the latter part of your code to something like this:
let list = await page.$x("//div[div/text() = 'GG/NG']");
for (let i = 0; i < list.length; i++) {
  await list[i].click();
  // give the page some time and then query the selectors again
  await page.waitFor(500);
  list = await page.$x("//div[div/text() = 'GG/NG']");
}
This code uses an XPath expression to query the div elements which contain another div element with the given text. After that, a click is simulated on the element, and then the page is queried again to pick up the changed DOM elements.
Here might be a less confusing way to click those:
for (const div of document.querySelectorAll('div')) {
  if (div.innerHTML === 'GG/NG') div.click();
}
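The text-matching rule in that loop can be sanity-checked against plain stand-in objects (no browser involved; the entries below are made up):

```javascript
// Stand-ins for DOM nodes carrying only the properties the loop reads.
const divs = [
  { innerHTML: '1X2', clicked: false },
  { innerHTML: 'GG/NG', clicked: false },
  { innerHTML: 'GG/NG', clicked: false },
];

// Same selection rule as the in-page loop above, recording clicks instead.
for (const div of divs) {
  if (div.innerHTML === 'GG/NG') div.clicked = true;
}

const clickedCount = divs.filter((d) => d.clicked).length; // 2
```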

Puppeteer in NodeJS reports 'Error: Node is either not visible or not an HTMLElement'

I'm using 'puppeteer' for NodeJS to test a specific website. It seems to work fine in most cases, but in some places it reports:
Error: Node is either not visible or not an HTMLElement
The following code picks a link that in both cases is off the screen.
The first link works fine, while the second link fails.
What is the difference?
Both links are off the screen.
Any help appreciated,
Cheers, :)
Example code
const puppeteer = require('puppeteer');
const initialPage = 'https://website.com/path';
const selectors = [
  'div[id$="-bVMpYP"] article a',
  'div[id$="-KcazEUq"] article a'
];
(async () => {
  let selector, handles, handle;
  const width = 1024, height = 1600;
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width, height }
  });
  const page = await browser.newPage();
  await page.setViewport({ width, height });
  page.setUserAgent('UA-TEST');
  // Load first page
  let stat = await page.goto(initialPage, { waitUntil: 'domcontentloaded' });
  // Click on selector 1 - works ok
  selector = selectors[0];
  await page.waitForSelector(selector);
  handles = await page.$$(selector);
  handle = handles[12]
  console.log('Clicking on: ', await page.evaluate(el => el.href, handle));
  await handle.click(); // OK
  // Click that selector 2 - fails
  selector = selectors[1];
  await page.waitForSelector(selector);
  handles = await page.$$(selector);
  handle = handles[12]
  console.log('Clicking on: ', await page.evaluate(el => el.href, handle));
  await handle.click(); // Error: Node is either not visible or not an HTMLElement
})();
I'm trying to emulate the behaviour of a real user clicking around the site, which is why I use .click() and not .goto(), since the a tags have onclick events.
Instead of
await button.click();
do this:
await button.evaluate(b => b.click());
The difference is that button.evaluate(b => b.click()) runs the JavaScript HTMLElement.click() method on the given element in the browser context, which fires a click event on that element even if it's hidden, off-screen, or covered by a different element, whereas button.click() clicks using Puppeteer's ElementHandle.click(), which:
1. scrolls the page until the element is in view
2. gets the bounding box of the element (this step is where the error happens) and finds the screen x and y pixel coordinates of the middle of that box
3. moves the virtual mouse to those coordinates and sets the mouse to "down" then back to "up", which triggers a click event on the element under the mouse
First and foremost, your defaultViewport object that you pass to puppeteer.launch() has no keys, only values.
You need to change this to:
'defaultViewport' : { 'width' : width, 'height' : height }
The same goes for the object you pass to page.setViewport().
You need to change this line of code to:
await page.setViewport( { 'width' : width, 'height' : height } );
Third, the function page.setUserAgent() returns a promise, so you need to await this function:
await page.setUserAgent( 'UA-TEST' );
Furthermore, you forgot to add a semicolon after handle = handles[12].
You should change this to:
handle = handles[12];
Additionally, you are not waiting for the navigation to finish (page.waitForNavigation()) after clicking the first link.
After clicking the first link, you should add:
await page.waitForNavigation();
I've noticed that the second page sometimes hangs on navigation, so you might find it useful to increase the default navigation timeout (page.setDefaultNavigationTimeout()):
page.setDefaultNavigationTimeout( 90000 );
Once again, you forgot to add a semicolon after handle = handles[12], so this needs to be changed to:
handle = handles[12];
It's important to note that you are using the wrong selector for your second link that you are clicking.
Your original selector was attempting to select elements that are only visible on xs (extra small) screens (mobile phones).
You need to gather an array of links that are visible to your viewport that you specified.
Therefore, you need to change the second selector to:
div[id$="-KcazEUq"] article .dfo-widget-sm a
You should wait for the navigation to finish after clicking your second link as well:
await page.waitForNavigation();
Finally, you might also want to close the browser (browser.close()) after you are done with your program:
await browser.close();
Note: You might also want to look into handling unhandledRejection errors.
Here is the final solution:
'use strict';

const puppeteer = require( 'puppeteer' );

const initialPage = 'https://statsregnskapet.dfo.no/departementer';

const selectors = [
  'div[id$="-bVMpYP"] article a',
  'div[id$="-KcazEUq"] article .dfo-widget-sm a'
];

( async () =>
{
  let selector;
  let handles;
  let handle;

  const width = 1024;
  const height = 1600;

  const browser = await puppeteer.launch(
  {
    'defaultViewport' : { 'width' : width, 'height' : height }
  });

  const page = await browser.newPage();

  page.setDefaultNavigationTimeout( 90000 );

  await page.setViewport( { 'width' : width, 'height' : height } );
  await page.setUserAgent( 'UA-TEST' );

  // Load first page
  let stat = await page.goto( initialPage, { 'waitUntil' : 'domcontentloaded' } );

  // Click on selector 1 - works ok
  selector = selectors[0];
  await page.waitForSelector( selector );
  handles = await page.$$( selector );
  handle = handles[12];
  console.log( 'Clicking on: ', await page.evaluate( el => el.href, handle ) );
  await handle.click(); // OK
  await page.waitForNavigation();

  // Click that selector 2 - fails
  selector = selectors[1];
  await page.waitForSelector( selector );
  handles = await page.$$( selector );
  handle = handles[12];
  console.log( 'Clicking on: ', await page.evaluate( el => el.href, handle ) );
  await handle.click();
  await page.waitForNavigation();

  await browser.close();
})();
For anyone still having trouble, this worked for me:
await page.evaluate(()=>document.querySelector('#sign-in-btn').click())
Basically just get the element in a different way, then click it.
The reason I had to do this was that I was trying to click a button in a notification window which sits outside the rest of the app (and Chrome seemed to think it was invisible even though it was not).
I know I'm late to the party, but I discovered an edge case that gave me a lot of grief and eventually led me to this thread, so I figured I'd post my findings.
The culprit:
CSS
scroll-behavior: smooth
If you have this you will have a bad time.
The solution:
await page.addStyleTag({ content: "* { scroll-behavior: auto !important; }" });
Hope this helps some of you.
My way
async function getVisibleHandle(selector, page) {
  const elements = await page.$$(selector);
  let hasVisibleElement = false,
    visibleElement = '';
  if (!elements.length) {
    return [hasVisibleElement, visibleElement];
  }
  let i = 0;
  for (let element of elements) {
    const isVisibleHandle = await page.evaluateHandle((e) => {
      const style = window.getComputedStyle(e);
      return (style && style.display !== 'none' &&
        style.visibility !== 'hidden' && style.opacity !== '0');
    }, element);
    var visible = await isVisibleHandle.jsonValue();
    const box = await element.boxModel();
    if (visible && box) {
      hasVisibleElement = true;
      visibleElement = elements[i];
      break;
    }
    i++;
  }
  return [hasVisibleElement, visibleElement];
}
Usage
let selector = "a[href='https://example.com/']";
let visibleHandle = await getVisibleHandle(selector, page);
if (visibleHandle[1]) {
  await Promise.all([
    visibleHandle[1].click(),
    page.waitForNavigation()
  ]);
}
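The predicate inside evaluateHandle is pure, so it can be factored out and checked against plain objects that mimic getComputedStyle() results (the style objects below are stand-ins, not real CSSStyleDeclaration instances):

```javascript
// The same visibility test used in getVisibleHandle(), standalone.
function isVisibleStyle(style) {
  return !!(style && style.display !== 'none' &&
    style.visibility !== 'hidden' && style.opacity !== '0');
}

const visibleStyle = { display: 'block', visibility: 'visible', opacity: '1' };
const hiddenStyle = { display: 'none', visibility: 'visible', opacity: '1' };
// isVisibleStyle(visibleStyle) → true, isVisibleStyle(hiddenStyle) → false
```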

How to avoid being detected as bot on Puppeteer and Phantomjs?

Puppeteer and PhantomJS are similar. The issue I'm having happens with both, and the code is also similar.
I'd like to fetch some information from a website that requires authentication to view it. I can't even access the home page: it's detected as "suspicious activity", as in this screenshot: https://i.imgur.com/p69OIjO.png
I discovered that the problem doesn't happen when I tested on Postman using a header named Cookie with the cookie value captured from a browser, but this cookie expires after some time. So I guess Puppeteer/PhantomJS are not handling cookies the way the site expects, and it is denying the headless browser access.
What can I do to bypass this?
// Simple PhantomJS example
var page = require('webpage').create();
var url = 'https://www.expertflyer.com';

page.open(url, function (status) {
  if (status === "success") {
    page.render("home.png");
    phantom.exit();
  }
});
In case anyone needs this in the future for the same problem:
Using puppeteer-extra
I have tested the code on a server. On the 2nd run there is a Google captcha. You can solve it yourself and restart the bot, or use a captcha-solving service.
I ran the code more than 10 times and there was no IP ban, and I did not get the captcha again on my continuous runs.
But you can get the captcha again!
// sudo npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth puppeteer-extra-plugin-adblocker readline
var headless_mode = process.argv[2];
const readline = require('readline');
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
puppeteer.use(AdblockerPlugin({ blockTrackers: true }));

async function run() {
  const browser = await puppeteer.launch({
    headless: (headless_mode !== 'true') ? false : true,
    ignoreHTTPSErrors: true,
    slowMo: 0,
    args: [
      '--window-size=1400,900',
      '--remote-debugging-port=9222',
      '--remote-debugging-address=0.0.0.0', // be sure you know what you're doing here
      '--disable-gpu',
      '--disable-features=IsolateOrigins,site-per-process',
      '--blink-settings=imagesEnabled=true'
    ]
  });
  const page = await browser.newPage();
  console.log(`Testing expertflyer.com`);

  // await page.goto('https://www.expertflyer.com')
  await goto_Page('https://www.expertflyer.com');
  await waitForNetworkIdle(page, 3000, 0);
  // await page.waitFor(7000)
  await checking_error(do_2nd_part);

  async function do_2nd_part() {
    try { await page.click('#yui-gen2 > a'); } catch {}
    await page.waitFor(5000);
    var seat = '#headerTitleContainer > h1';
    try { console.log(await page.$eval(seat, e => e.innerText)); } catch {}
    await page.screenshot({ path: 'expertflyer1.png' });
    await checking_error(do_3rd_part);
  }

  async function do_3rd_part() {
    try { await page.click('#yui-gen1 > a'); } catch {}
    await page.waitFor(5000);
    var pro = '#headerTitleContainer > h1';
    try { console.log(await page.$eval(pro, e => e.innerText)); } catch {}
    await page.screenshot({ path: 'expertflyer2.png' });
    console.log(`All done, check the screenshots?`);
  }

  async function checking_error(callback) {
    try {
      try {
        var error_found = await page.evaluate(() => document.querySelectorAll('a[class="text yuimenubaritemlabel"]').length);
      } catch (error) {
        console.log(`catch error ${error}`);
      }
      if (error_found === 0) {
        console.log(`Error found`);
        var captcha_msg = "Due to suspicious activity from your computer, we have blocked your access to ExpertFlyer. After completing the CAPTCHA below, you will immediately regain access unless further suspicious behavior is detected.";
        var ip_blocked = "Due to recent suspicious activity from your computer, we have blocked your access to ExpertFlyer. If you feel this block is in error, please contact us using the form below.";
        try { var error_msg = await page.$eval('h2', e => e.innerText); } catch {}
        try { var error_msg_details = await page.$eval('body > p:nth-child(2)', e => e.innerText); } catch {}
        if (error_msg_details == captcha_msg) {
          console.log(`Google captcha found, you have to solve the captcha manually here or use some automated recaptcha service`);
          // pass the callback through so the flow can resume after the captcha
          await verify_User_answer(callback);
        } else if (error_msg_details == ip_blocked) {
          console.log(`The current IP address is blocked. The only way out is to change the IP address.`);
        } else {
          console.log(`Error page still loading... waiting 10 sec before rechecking...`);
          await page.waitFor(10000);
          await checking_error(callback); // keep passing the callback on retry
        }
      } else {
        console.log(`Page loaded successfully! You can do things here.`);
        await callback();
      }
    } catch {}
  }

  async function goto_Page(page_URL) {
    try {
      await page.goto(page_URL, { waitUntil: 'networkidle2', timeout: 30000 });
    } catch {
      console.log(`Error in loading page, re-trying...`);
      await goto_Page(page_URL);
    }
  }

  async function verify_User_answer(call_back) {
    var user_Answer = await readLine();
    if (user_Answer == 'yes') {
      console.log(`user_Answer is ${user_Answer}, Processing...`);
      // Not working the way I want; have to restart the bot after solving
      await call_back();
    } else {
      console.log(`answer does not match, try again...`);
      user_Answer = await readLine();
      console.log(`user_Answer is ${user_Answer}`);
      await verify_User_answer(call_back);
    }
  }

  async function readLine() {
    const rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout
    });
    return new Promise(resolve => {
      rl.question('Solve the captcha and type yes to continue: ', (answer) => {
        rl.close();
        resolve(answer);
      });
    });
  }

  async function waitForNetworkIdle(page, timeout, maxInflightRequests = 0) {
    console.log('waitForNetworkIdle called');
    page.on('request', onRequestStarted);
    page.on('requestfinished', onRequestFinished);
    page.on('requestfailed', onRequestFinished);
    let inflight = 0;
    let fulfill;
    let promise = new Promise(x => fulfill = x);
    let timeoutId = setTimeout(onTimeoutDone, timeout);
    return promise;

    function onTimeoutDone() {
      page.removeListener('request', onRequestStarted);
      page.removeListener('requestfinished', onRequestFinished);
      page.removeListener('requestfailed', onRequestFinished);
      fulfill();
    }

    function onRequestStarted() {
      ++inflight;
      if (inflight > maxInflightRequests)
        clearTimeout(timeoutId);
    }

    function onRequestFinished() {
      if (inflight === 0)
        return;
      --inflight;
      if (inflight === maxInflightRequests)
        timeoutId = setTimeout(onTimeoutDone, timeout);
    }
  }

  await browser.close();
}
run();
Please note that the "Solve the captcha and type yes to continue:" prompt is not working as expected and needs some fixing.
Edit: Re-ran the bot after 10 minutes and got the captcha again. Solved the captcha via chrome://inspect/#devices, restarted the bot, and everything worked again. No IP ban.
Things that can help in general:
- Headers should be similar to common browsers, including:
  - User-Agent: use a recent one (see https://developers.whatismybrowser.com/useragents/explore/), or better, use a random recent one if you make multiple requests (see https://github.com/skratchdot/random-useragent)
  - Accept-Language: something like "en,en-US;q=0.5" (adapt for your language)
  - Accept: a standard one would be "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
- If you make multiple requests, put a random timeout between them
- If you open links found in a page, set the Referer header accordingly
- Images should be enabled
- JavaScript should be enabled
- Check that "navigator.plugins" and "navigator.language" are set in the client JavaScript page context
- Use proxies
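The header advice above can be sketched as plain helpers. The user-agent strings here are illustrative placeholders, not guaranteed-current values; with Puppeteer they would be applied via page.setUserAgent() and page.setExtraHTTPHeaders().

```javascript
// Pick a random "recent" user agent and build a browser-like header set.
// The UA strings below are examples only; refresh them from a real source.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

function pickUserAgent() {
  return USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
}

function buildHeaders(referer) {
  const headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en,en-US;q=0.5',
  };
  if (referer) headers['Referer'] = referer; // set when following a link
  return headers;
}

// With Puppeteer, roughly:
//   await page.setUserAgent(pickUserAgent());
//   await page.setExtraHTTPHeaders(buildHeaders(previousUrl));
```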
If you think from the website's perspective, you are indeed doing suspicious work. So whenever you want to bypass something like this, make sure to think about how they are thinking.
Set cookies properly
Puppeteer, PhantomJS, etc. use real browsers, and the cookies used there work better than those sent via Postman or similar. You just need to set the cookies properly.
You can use page.setCookie(...cookies) to set cookies. Cookies are serialized, so if cookies is an array of objects, you can simply do this:
const cookies = [{ name: 'test', value: 'foo' }, { name: 'test2', value: 'foo' }]; // just an example; use real cookies here
await page.setCookie(...cookies);
Try to tweak the behaviors
Turn off headless mode and watch how the website behaves:
await puppeteer.launch({ headless: false });
Try proxies
Some websites monitor by IP address; if multiple hits come from the same IP, they block the requests. It's best to use rotating proxies in that case.
The website you are trying to visit uses Distil Networks to prevent web scraping.
People have had success in the past bypassing Distil Networks by substituting the $cdc_ variable found in Chromium's call_function.js (which is used in Puppeteer).
For example:
function getPageCache(opt_doc, opt_w3c) {
  var doc = opt_doc || document;
  var w3c = opt_w3c || false;
  // var key = '$cdc_asdjflasutopfhvcZLmcfl_'; <-- This is the line that is changed.
  var key = '$something_different_';
  if (w3c) {
    if (!(key in doc))
      doc[key] = new CacheWithUUID();
    return doc[key];
  } else {
    if (!(key in doc))
      doc[key] = new Cache();
    return doc[key];
  }
}
Note: According to this comment, if you had been blacklisted before making this change, you face another set of challenges, so you must "implement fake canvas fingerprinting, disable flash, change IP, and change request header order (swap language and Accept headers)."

Click on every 'a' tag in page puppeteer

I am trying to get Puppeteer to go to all a tags on a page, load them, add them to an array, and return the array. My Puppeteer version is 1.5.0. Here is my code:
module.exports.scrapeLinks = async (page, linkXpath) => {
  page.waitForNavigation();
  const linksElement = await page.$x(linkXpath);
  var url_list_arr = [];
  console.log(linksElement.length);
  let i = 1;
  for (const linksElementItem in linksElement) {
    const linksData = await page.$x('(' + linkXpath + ')[' + (i + 1) + ']');
    if (linksData.length > 0) {
      linksData[0].click();
      console.log(page.url());
      url_list_arr.push(page.url());
    } else {
      throw new Error('Link not found');
    }
  }
  return url_list_arr;
};
However, with this code I get an
UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement
I also found out through the docs that it is not possible to use an XPath with the page.click function. Is there any way to achieve this?
It would also be fine if there were a function to get all the links from a page, but I couldn't find one in the docs.
To get a handle on all a tags in an array:
const aTags = await page.$$('a');
Loop through them with:
for (const aTag of aTags) { ... }
Inside the loop you can interact with each of these element handles separately.
Note that
await aTag.click()
will destroy (garbage collect) all element handles when the page context is navigated. In this case you need a workaround, like loading the initial page inside a loop so you always start with a fresh instance.
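Since navigation destroys the handles, a common alternative when you only need the URLs is to pull all hrefs out in a single call, e.g. const hrefs = await page.$$eval('a', as => as.map(a => a.href)), and filter the resulting strings in Node afterwards. A small helper for that filtering step (a sketch; the same-origin check and dedupe policy are assumptions about what you want) could be:

```javascript
// Keep only same-origin, unique, well-formed links. URL is a global in
// modern Node (and browsers), so no imports are needed.
function sameOriginUnique(hrefs, origin) {
  const seen = new Set();
  const out = [];
  for (const href of hrefs) {
    let u;
    try {
      u = new URL(href);
    } catch {
      continue; // skip malformed hrefs
    }
    if (u.origin !== origin) continue; // drops external and javascript: links
    if (seen.has(u.href)) continue;    // drops duplicates
    seen.add(u.href);
    out.push(u.href);
  }
  return out;
}

// sameOriginUnique(
//   ['https://example.com/a', 'https://example.com/a', 'javascript:void(0)'],
//   'https://example.com'
// ) → ['https://example.com/a']
```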
