Currently I am using
const element = await page.$('div.layout-board-section')
to get the elementHandle of the div. However, I then need to get the list of classes from that elementHandle. I've tried a couple of different solutions, but they all seem to return only the first class when I use element.className in an evaluate function.
Is there any way to get all of the classes of an element?
You can use a node's .classList property.
const classes = await page.$eval(
'div.layout-board-section',
el => [...el.classList]
);
or if you already have an elementHandle:
const classes = await someElement.evaluate(el => [...el.classList]);
Complete example:
const puppeteer = require("puppeteer");
let browser;
(async () => {
const html = `<div class="foo bar baz quux">blahhh</div>`;
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.setContent(html);
const classes = await page.$eval("div", el => [...el.classList]);
console.log(classes); // => [ 'foo', 'bar', 'baz', 'quux' ]
// or with an elementHandle:
const divEl = await page.$("div");
console.log(await divEl.evaluate(el => [...el.classList]));
})()
.catch(err => console.error(err))
.finally(async () => await browser.close())
;
Here is my code, where I get the element handles of some target divs:
const puppeteer = require("puppeteer");
(async () => {
const searchString = `https://www.google.com/maps/search/restaurants/@-6.4775265,112.057849,3.67z`;
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(searchString);
const xpath_expression = '//div[contains(@aria-label, "Results for")]/div/div[./a]';
await page.waitForXPath(xpath_expression);
const targetDivs = await page.$x(xpath_expression);
// const link_urls = await page.evaluate((...targetDivs) => {
// return targetDivs.map((e) => {
// return e.textContent;
// });
// }, ...targetDivs);
})();
I have two relative XPath expressions that select the related data inside these target divs:
'link' : './a/@href'
'title': './a/@aria-label'
I have a sample of similar Python code:
from parsel import Selector
response = Selector(page_content)
results = []
for el in response.xpath('//div[contains(@aria-label, "Results for")]/div/div[./a]'):
    results.append({
        'link': el.xpath('./a/@href').extract_first(''),
        'title': el.xpath('./a/@aria-label').extract_first('')
    })
How can I do this in Puppeteer?
I think you can get the href and ariaLabel property values with e.g.
const targetDivs = await page.$x(xpath_expression);
targetDivs.forEach(async (div, pos) => {
const links = await div.$x('a[@href]');
const href = await (await links[0].getProperty('href')).jsonValue();
const ariaLabel = await (await links[0].getProperty('ariaLabel')).jsonValue();
console.log(pos, href, ariaLabel);
});
These are the element properties, not the attribute values. For href, that can mean you get an absolute URL instead of a relative one, though I haven't checked whether it makes a difference on that particular page. I am not sure whether $x allows selecting attribute nodes or string values directly; the documentation only talks about element handles.
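If the raw attribute values are what you want (the equivalent of ./a/@href and ./a/@aria-label in the Python sample), one option is to read them in the browser context with getAttribute. The following is a minimal sketch under some assumptions: extractLinks is a hypothetical helper name (it is not part of Puppeteer), and divHandles is assumed to be the array returned by page.$x(xpath_expression).

```javascript
// Hypothetical helper (not part of Puppeteer): `divHandles` is assumed to be
// the array of ElementHandles returned by page.$x(xpath_expression).
async function extractLinks(divHandles) {
  const results = [];
  // for...of with await keeps results in order and lets errors propagate,
  // unlike forEach with an async callback, which is never awaited.
  for (const div of divHandles) {
    const item = await div.evaluate((el) => {
      // ':scope > a' mirrors the XPath './a' (a direct child anchor).
      const a = el.querySelector(':scope > a');
      if (!a) return null;
      return {
        link: a.getAttribute('href') || '', // raw attribute, not resolved
        title: a.getAttribute('aria-label') || '',
      };
    });
    if (item) results.push(item);
  }
  return results;
}
```

Usage would look like const results = await extractLinks(await page.$x(xpath_expression)); inside the async function from the question. Doing the lookup in a single evaluate call per div also avoids one round trip per attribute.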
I am getting this error when I try to run the script (which uses webpack):
Error: Evaluation failed: ReferenceError: _babel_runtime_helpers_toConsumableArray__WEBPACK_IMPORTED_MODULE_1___default is not defined at __puppeteer_evaluation_script__:2:27
But when I run the same code without webpack, I get the expected result.
Here is my function:
const getMeenaClickProducts = async (title) => {
const url = ` ${MEENACLICK}/${title}`;
console.log({ url });
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto(url);
await page.waitForSelector('.ant-pagination-total-text');
const products = await page.evaluate(() => {
const cards = [...document.querySelectorAll('.card-thumb')];
console.log({ cards });
return cards.map((card) => {
const productTitle = card.querySelector('.title').innerText;
const priceElement = card.querySelector('.reg-price');
const price = priceElement ? priceElement.innerText : '';
const image = card.querySelector('.img').src;
const link = card.querySelector('.main-link').href;
return {
title: productTitle,
price,
image,
link,
};
});
});
await browser.close();
const filteredProducts = products
.filter((product) =>
product.title.toLowerCase().includes(title.toLowerCase())
)
.filter((item) => item.price);
return filteredProducts;
};
What could be the reason?
The problem is with Babel, and with this part:
const products = await page.evaluate(() => {
const cards = [...document.querySelectorAll('.card-thumb')];
console.log({ cards });
return cards.map((card) => {
const productTitle = card.querySelector('.title').innerText;
const priceElement = card.querySelector('.reg-price');
const price = priceElement ? priceElement.innerText : '';
const image = card.querySelector('.img').src;
const link = card.querySelector('.main-link').href;
return {
title: productTitle,
price,
image,
link,
};
});
});
The function you pass to page.evaluate() is not the code that actually reaches the page instance, because Babel transforms it first.
The array spread operator you have in this part:
const cards = [...document.querySelectorAll('.card-thumb')];
is most likely being transformed in your build into a call to a helper function named _babel_runtime_helpers_toConsumableArray__WEBPACK_IMPORTED_MODULE_1___default. The transformed code is then passed to the Puppeteer page context and executed in that page, but the helper is not defined in that context, which is why you get a ReferenceError.
Some options to fix it:
Don't use the spread operator with your current Babel config, so the transformed build doesn't include a polyfill/replacement for it. Use a replacement with an equivalent effect, such as:
const cards = Array.from(document.querySelectorAll('.card-thumb'));
Or use a more traditional for loop or forEach() and build up the array yourself; that will also get the job done.
Update your babel config / target language level to support the spread operator natively.
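To see why a helper that exists in your bundle is missing in the page, it can help to reproduce the effect without Puppeteer or Babel. The sketch below is a simulation only: it uses Node's built-in vm module to run a function's source text in a fresh context, which is roughly what happens when a function is shipped to the page as text. The name toConsumableArray is a hypothetical stand-in for the generated Babel helper, and runAsSourceText is an illustrative name, not a Puppeteer API.

```javascript
const vm = require('node:vm');

// Hypothetical stand-in for the Babel helper; it exists only in this module.
const toConsumableArray = (iterable) => Array.from(iterable);

// Roughly what a transpiled callback looks like: it refers to the outer helper.
const transpiled = () => toConsumableArray(new Set([1, 2, 3]));

// A callback that only uses built-ins available in any JavaScript context.
const plain = () => Array.from(new Set([1, 2, 3]));

// Simulates sending a function to another context as source text.
function runAsSourceText(fn) {
  return vm.runInNewContext(`(${fn.toString()})()`);
}

let errorName = null;
try {
  runAsSourceText(transpiled);
} catch (err) {
  errorName = err.name; // the helper is not defined in the fresh context
}

const plainResult = runAsSourceText(plain);
console.log(errorName); // ReferenceError
console.log(Array.from(plainResult)); // [ 1, 2, 3 ]
```

The transpiled version fails for the same reason as in the question: only the text of the function crosses the boundary, not the variables it closes over.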
I am trying to download an invoice from a website using Puppeteer, which I have just started to learn. I am using Node to create and run the code. I have managed to log in and navigate to the invoice page, but the invoice opens in a new tab, so my code does not detect it because it is not the active tab. This is the code I used:
const puppeteer = require('puppeteer')
const SECRET_EMAIL = 'emailid'
const SECRET_PASSWORD = 'password'
const main = async () => {
const browser = await puppeteer.launch({
headless: false,
})
const page = await browser.newPage()
await page.goto('https://my.apify.com/sign-in', { waitUntil: 'networkidle2' })
await page.waitForSelector('div.sign_shared__SignForm-sc-1jf30gt-2.kFKpB')
await page.type('input#email', SECRET_EMAIL)
await page.type('input#password', SECRET_PASSWORD)
await page.click('input[type="submit"]')
await page.waitForSelector('#logged-user')
await page.goto('https://my.apify.com/billing#/invoices', { waitUntil: 'networkidle2' })
await page.waitForSelector('#reactive-table-1')
await page.click('#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a')
const newPagePromise = new Promise(x => browser.once('targetcreated', target => x(target.page())))
const page2 = await newPagePromise
await page2.bringToFront()
await page2.screenshot({ path: 'apify1.png' })
//await browser.close()
}
main()
In the above code I am just trying to take a screenshot. Can anyone help me?
Here is an example of a work-around for the Chromium issue mentioned in the comments above. Adapt it to fit your specific needs and use case. Basically, you need to capture the new page (target) and then do whatever is needed to download the file. If no other means work for you (including a direct request to the download location via fetch or, ideally, some request library on the back end), you can pass the file to Node as a buffer, as in the example below.
// ATT_page is the page that shows the invoice table (`page` in the question's code).
const [PDF_page] = await Promise.all([
browser
.waitForTarget(target => target.url().includes('my.apify.com/account/invoices/'))
.then(target => target.page()),
ATT_page.click('#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a'),
]);
const asyncRes = PDF_page.waitForResponse(response =>
response
.request()
.url()
.includes('my.apify.com/account/invoices'));
await PDF_page.reload();
const res = await asyncRes;
const url = res.url();
const headers = res.headers();
if (!headers['content-type'].includes('application/pdf')) {
await PDF_page.close();
return null;
}
const options = {
// target request options
};
const pdfAb = await PDF_page.evaluate(
async (url, options) => {
function bufferToBase64(buffer) {
return btoa(
new Uint8Array(buffer).reduce((data, byte) => {
return data + String.fromCharCode(byte);
}, ''),
);
}
return await fetch(url, options)
.then(response => response.arrayBuffer())
.then(arrayBuffer => bufferToBase64(arrayBuffer));
},
url,
options,
);
const pdf = Buffer.from(pdfAb, 'base64');
await PDF_page.close();
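Whichever way the file is ultimately downloaded, the wait for the new tab has to be set up before the click that opens it. In the question's code the targetcreated listener is registered after the click, so the event may already have fired by then. Here is a minimal sketch of that ordering, reusing the browser.once('targetcreated', ...) pattern from the question; clickAndCaptureNewPage is a hypothetical helper name, not a Puppeteer API.

```javascript
// Hypothetical helper: resolves with the page opened by clicking `selector`.
// `browser` and `page` are assumed to be a Puppeteer Browser and Page.
async function clickAndCaptureNewPage(browser, page, selector) {
  // Register the listener BEFORE clicking so the event cannot be missed.
  const newPagePromise = new Promise((resolve) =>
    browser.once('targetcreated', (target) => resolve(target.page()))
  );
  await page.click(selector);
  return newPagePromise;
}
```

It could be called as const page2 = await clickAndCaptureNewPage(browser, page, '#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a');. Note that this sketch resolves on the first target created after the listener is attached; filtering by URL with waitForTarget, as in the answer above, is more robust when other targets may appear.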
I'm trying to click on a cookiewall on a webpage, but Puppeteer refuses to recognize the short selector with just the type and class (button.button-action). Changing this to the full CSS selector fixes the problem, but that isn't a viable solution, since any change in the parent elements can break the selector. As far as I know this shouldn't be a problem, because on the page in question document.querySelector("button.button-action") also returns the element I'm trying to click.
The code that doesn't work:
const puppeteer = require('puppeteer');
const main = async () => {
const browser = await puppeteer.launch({headless: false,});
const page = await browser.newPage();
await page.goto("https://www.euclaim.nl/check-uw-vlucht#/problem", { waitUntil: 'networkidle2' });
const cookiewall = await page.waitForSelector("button.button-action", {visible: true});
await cookiewall.click();
};
main();
The code that does work:
const puppeteer = require('puppeteer');
const main = async () => {
const browser = await puppeteer.launch({headless: false,});
const page = await browser.newPage();
await page.goto("https://www.euclaim.nl/check-uw-vlucht#/problem", { waitUntil: 'networkidle2' });
const cookiewall = await page.waitForSelector("#InfoPopupContainer > div.ipBody > div > div > div.row.actionButtonContainer.mobileText > button", {visible: true});
await cookiewall.click();
};
main();
The problem is that there are three button.button-action elements on the page, and the first match is not visible.
One thing you could do is call waitForSelector without the visible option (because it only checks the first matching button),
and then iterate through all the matches to find the one that is clickable.
await page.waitForSelector("button.button-action");
const actions = await page.$$("button.button-action");
for(let action of actions) {
if(await action.boundingBox()){
await action.click();
break;
}
}
I know the common methods, such as evaluate, for capturing elements in Puppeteer, but I am curious why I cannot get the href attribute with a plain JavaScript-style approach such as:
const page = await browser.newPage();
await page.goto('https://www.example.com');
let links = await page.$$('a');
for (let i = 0; i < links.length; i++) {
console.log(links[i].getAttribute('href'));
console.log(links[i].href);
}
await page.$$('a') returns an array of ElementHandles. These are objects with their own Puppeteer-specific API; they do not have the usual DOM API of HTML elements or DOM nodes. So you need to either retrieve attributes/properties in the browser context via page.evaluate() or use the rather complicated ElementHandle API. This is an example with both ways:
'use strict';
const puppeteer = require('puppeteer');
(async function main() {
try {
const browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto('https://example.org/');
// way 1
const hrefs1 = await page.evaluate(
() => Array.from(
document.querySelectorAll('a[href]'),
a => a.getAttribute('href')
)
);
// way 2
const elementHandles = await page.$$('a');
const propertyJsHandles = await Promise.all(
elementHandles.map(handle => handle.getProperty('href'))
);
const hrefs2 = await Promise.all(
propertyJsHandles.map(handle => handle.jsonValue())
);
console.log(hrefs1, hrefs2);
await browser.close();
} catch (err) {
console.error(err);
}
})();
const yourHref = await page.$eval('selector', anchor => anchor.getAttribute('href'));
but if you are working with a handle, you can do:
const handle = await page.$('selector');
const yourHref = await page.evaluate(anchor => anchor.getAttribute('href'), handle);
I don't know why it's such a pain, but this is what I found when I ran into this a while ago:
async function getHrefs(page, selector) {
return await page.$$eval(selector, anchors => [].map.call(anchors, a => a.href));
}
A type-safe way for TypeScript users to return the hrefs of the links as an array of strings, by casting each element to HTMLAnchorElement (the interface for <a> elements; HTMLLinkElement is for <link> elements):
await page.$$eval('a', (anchors) => anchors.map((link) => (link as HTMLAnchorElement).href));
A simple way to get an href from an anchor element
Say you fetched an anchor element with the following
const anchorElement = await page.$('a') // or page.$<HTMLAnchorElement>('a') if using typescript
You can get the href property with the following
const href = await anchorElement.evaluate(element => element.href)