Can't select and click on a div with "button" role - node.js

I'm trying to click on a div that has a role="button" attribute.
Selecting it by its DOM path instead of by the role attribute is not working either.
What I've tried:
try {
let Button = await this.page.$("div.class1.class2 > div");
await Button.click();
console.log("Clicked");
} catch (e) {
console.log(e);
console.log("No button");
}
The error I get is:
TypeError: Cannot read property '$' of undefined
I tried to reach the div through the div that contains it, which has two classes I can rely on, but that doesn't seem to work.
Is there a way to get an array of all the divs with role="button" and click only on the one that has a span inside it with a specific text?

Remove the "this" keyword to fix the TypeError. The message means this.page is undefined in that scope, so use the page variable directly:
let Button = await page.$("div.class1.class2 > div");
To get an array of all the divs with role="button" and a specific text:
const buttons = await page.$x('//div[@role="button"][text()="Specific Text"]'); // resolves to an array of ElementHandles
However, I would recommend using the waitForXPath method so that the element is awaited before it is clicked.
Full example:
try {
const button = await page.waitForXPath('//div[@role="button"][text()="Specific Text"]');
await button.click();
console.log("Clicked");
} catch (e) {
console.log("No button", e);
}
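The question also asks about matching on a span inside the button rather than on the div's own text. A minimal sketch of how such an XPath could be built is below. The names xpathLiteral and roleButtonWithSpanText are hypothetical helpers, not part of Puppeteer, and the resulting string is meant to be passed to the same page.$x or page.waitForXPath calls shown above, assuming a Puppeteer version that still provides them.

```javascript
// Hypothetical helper: quotes a string so it is a valid XPath 1.0 literal.
function xpathLiteral(text) {
  if (!text.includes('"')) return `"${text}"`;
  if (!text.includes("'")) return `'${text}'`;
  // Both quote types are present: stitch the pieces together with concat().
  const separator = ", '\"', ";
  const parts = text.split('"').map((part) => `"${part}"`);
  return `concat(${parts.join(separator)})`;
}

// Hypothetical helper: matches a div[role=button] that contains a span with the given text.
function roleButtonWithSpanText(text) {
  return `//div[@role="button"][.//span[normalize-space(.)=${xpathLiteral(text)}]]`;
}

console.log(roleButtonWithSpanText('Specific Text'));
// prints: //div[@role="button"][.//span[normalize-space(.)="Specific Text"]]
```

Inside the try block above, this could be used as await page.waitForXPath(roleButtonWithSpanText('Specific Text')).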

Related

Cheerio how to remove DOM elements from selection

I am trying to write a bot to convert a bunch of HTML pages to markdown, in order to import them as Jekyll documents. For this, I use puppeteer to get the HTML document and cheerio to manipulate it.
The source HTML is pretty complex and polluted with Google Ads tags, external scripts, etc. What I need to do is get the HTML content of a predefined selector and then remove the elements that match a predefined set of selectors, so that I end up with plain HTML containing just the text, which I can convert to markdown.
Assume the source html is something like this:
<html>
<head />
<body>
<article class="post">
<h1>Title</h1>
<p>First paragraph.</p>
<script>That for some reason has been put here</script>
<p>Second paragraph.</p>
<ins>Google ADS</ins>
<p>Third paragraph.</p>
<div class="related">A block full of HTML and text</div>
<p>Forth paragraph.</p>
</article>
</body>
</html>
What I want to achieve is something like
<h1>Title</h1>
<p>First paragraph.</p>
<p>Second paragraph.</p>
<p>Third paragraph.</p>
<p>Forth paragraph.</p>
I defined an array of selectors that I want to strip from the source object:
stripFromText: ['.social-share', 'script', '.adv-in', '.postinfo', '.postauthor', '.widget', '.related', 'img', 'p:empty', 'div:empty', 'section:empty', 'ins'],
And wrote the following function:
const getHTMLContent = async ($, selector) => {
let value;
try {
let content = await $(selector);
for (const s of SELECTORS.stripFromText) {
// 1
content = await content.remove(s);
// 2
// await content.remove(s);
// 3
// content = await content.find(s).remove();
// 4
// await content.find(s).remove();
// 5
// const matches = await content.find(s);
// for (m of matches) {
// await m.remove();
// }
};
value = content.html();
} catch(e) {
console.log(`- [!] Unable to get ${selector}`);
}
console.log(value);
return value;
};
Where
$ is the cheerio object, created with const $ = await cheerio.load(html);
selector is the DOM selector for the container (in the example above it would be .post)
What I am unable to do, is to use cheerio to remove() the objects. I tried all the 5 versions I left commented in the code, but without success. Cheerio's documentation didn't help so far, and I just found this link but the proposed solution did not work for me.
I was wondering if someone more experienced with cheerio could point me in the right direction, or explain to me what I am missing here.
I found a classic newbie error in my code: I was missing an await before the .remove() call.
The working function now looks like this:
const getHTMLContent = async ($, selector) => {
let value;
try {
let content = await $(selector);
for (const s of SELECTORS.stripFromText) {
console.log(`--- Stripping ${s}`);
await content.find(s).remove();
};
value = await content.html();
} catch(e) {
console.log(`- [!] Unable to get ${selector}`);
}
return value;
};
You can remove the elements with remove:
$('script,ins,div').remove()
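For context on why the variants in the question behave differently: as far as I know, cheerio's API is synchronous, so the await calls are harmless but not required, and .remove(selector) filters the current selection instead of searching its descendants, which is why .find(s).remove() is the form that strips nested elements. A minimal sketch under those assumptions follows. The name getCleanHtml is hypothetical, and $ is expected to be the result of cheerio.load(html).

```javascript
// Hypothetical helper; `$` is assumed to be the object returned by cheerio.load(html).
function getCleanHtml($, containerSelector, stripSelectors) {
  const content = $(containerSelector);
  if (stripSelectors.length > 0) {
    // A comma-joined list matches any of the selectors in a single pass.
    content.find(stripSelectors.join(', ')).remove();
  }
  // Inner HTML of the container after the matching descendants are removed.
  return content.html();
}

// Example call:
// const html = getCleanHtml($, '.post', ['script', 'ins', '.related']);
```

Unlike $('script,ins,div').remove(), this only touches elements inside the chosen container and does not remove every div in the document.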

Node js Click with puppeteer an element that has no id or name

Hi everyone, I'm trying to use Puppeteer to click three elements that have no id, name, or class. These are the checkboxes and the button that I have to click (www.omegle.com):
I tried clicking by coordinates, but I can't center the click on the elements:
await page.mouse.click(50, 200);
await page.waitForNavigation();
})()
So is there a way to click on an element without knowing its id, class or name?
// open modal by clicking "Text" button
const btnText = await page.waitForSelector('#chattypetextcell img')
await btnText.click()
// click both checkbox labels when modal opens
const selectorCheckboxLabels = 'div div p label'
await page.waitForSelector(selectorCheckboxLabels)
const labels = await page.$$(selectorCheckboxLabels)
await labels[0].click()
await labels[1].click()

Using Puppeteer to extract text from span

I'm using Puppeteer to extract the text of a span by its class name, but nothing is returned. I don't know whether that's because the page isn't loading in time.
This is my current code:
async function Reload() {
Page.reload()
Price = await Page.evaluate(() => document.getElementsByClassName("text-robux-lg wait-for-i18n-format-render"))
console.log(Price)
}
Reload()
HTML
<div class="icon-text-wrapper clearfix icon-robux-price-container">
<span class="icon-robux-16x16 wait-for-i18n-format-render"></span>
<span class="text-robux-lg wait-for-i18n-format-render">689</span>
</div>
This happens because the function that you passed to Page.evaluate() returns a non-serializable value.
From the official Puppeteer documentation:
If the function passed to the page.evaluate returns a non-Serializable value, then page.evaluate resolves to undefined
So the function passed to Page.evaluate() has to return the text of the span element rather than the span's Element object,
as in the following code:
const puppeteer = require('puppeteer');
const htmlCode = `
<div class="icon-text-wrapper clearfix icon-robux-price-container">
<span class="icon-robux-16x16 wait-for-i18n-format-render"></span>
<span class="text-robux-lg wait-for-i18n-format-render">689</span>
</div>
`;
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setContent(htmlCode);
const price = await page.evaluate(() => {
const elements = document.getElementsByClassName('text-robux-lg wait-for-i18n-format-render');
return Array.from(elements).map(element => element.innerText); // the function now returns an array of texts instead of an array of elements
})
console.log(price); // logs the texts of all elements that have the class names above
console.log(price[0]); // logs the text of the first element that has the class names above
// other actions...
await browser.close();
})();
NOTE: to get the HTML from another site by its URL, use page.goto() instead of page.setContent().
NOTE: because document.getElementsByClassName() returns a collection of elements, the function passed to page.evaluate() in the code above returns an array of texts rather than a single text, as it would for the single element returned by document.getElementById().
NOTE: to learn the difference between serializable and non-serializable objects, read the answers to this question on Stack Overflow.
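The question also wonders whether the page has finished loading. A small sketch that combines waiting for the element with returning plain strings is below. The name getTexts is a hypothetical helper, page is assumed to be a Puppeteer Page, and the CSS selector in the example call is derived from the class names in the question.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page instance.
async function getTexts(page, selector) {
  // Wait until at least one matching element is present in the DOM.
  await page.waitForSelector(selector);
  // The callback runs in the browser context and returns only serializable strings.
  return page.$$eval(selector, (elements) =>
    elements.map((element) => element.textContent.trim())
  );
}

// Example call (inside an async function):
// const prices = await getTexts(page, 'span.text-robux-lg.wait-for-i18n-format-render');
```

Because only strings cross the boundary between the browser and Node.js, the result is not affected by the serialization issue described above.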

Puppeteer: how to foreach every button class and click if specific class name found

How can I loop over every button and click the ones that have a specific class name?
<button class="b-deliverytime--slot b-deliverytime--slot-unavailable" aria-label="Not Available Today" title="Not free today">Busy</button>
<button class="b-deliverytime--slot b-deliverytime--slot-available" aria-label="Available Today" title="Today Free">Free</button>
I need to find every button with "--slot-available" and click it
Don't use forEach for asynchronous execution as it throws away the promises instead of awaiting them. Use a simple for loop:
const buttons = await page.$$('button[class*="--slot-available"]')
for (const button of buttons)
await button.click();
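To illustrate the difference, here is a small sketch that uses stand-in objects instead of real element handles, so it runs without a browser. The names fakeButton, clickAllInOrder, and demo are hypothetical; with Puppeteer, the array would come from page.$$ as above.

```javascript
// Stand-in for an ElementHandle: click() resolves after a delay and records its label.
function fakeButton(label, delayMs, log) {
  return {
    click: () =>
      new Promise((resolve) =>
        setTimeout(() => {
          log.push(label);
          resolve();
        }, delayMs)
      ),
  };
}

// Awaits each click before starting the next one.
async function clickAllInOrder(buttons) {
  for (const button of buttons) {
    await button.click();
  }
  return buttons.length;
}

async function demo() {
  const log = [];
  // The first button is slower, yet it still finishes first because clicks are awaited in turn.
  const buttons = [fakeButton('first', 20, log), fakeButton('second', 5, log)];
  const count = await clickAllInOrder(buttons);
  return { count, log };
}

demo().then(({ count, log }) => console.log(count, log));
// prints: 2 [ 'first', 'second' ]
```

With forEach and an async callback, both clicks would start at once and the surrounding function would continue before either had finished.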
You can use a CSS selector to do the filtering:
const elements = await page.$$('button[class*="--slot-available"]');
elements.forEach(async element => {
await element.click();
});
The [attribute*=value] selector matches every element whose attribute value contains the specified value.

puppeteer howto get element tagName

I would like to get an element's tagName. It should be button in the following example.
const puppeteer = require('puppeteer')
async function run () {
const browser = await puppeteer.launch({headless: false})
const page = await browser.newPage()
const html = `
<div>
<button type="button">click me</button>
<span>Some words.</span>
</div>
`
await page.setContent(html)
const elements = await page.$$('button')
const tagName = await elements[0].$eval('*', node => node.tagName)
console.log(tagName) // expect to be 'button'
await browser.close()
}
run()
The error message is: Error: failed to find element matching selector "*"
I can tell that elements matched one element, because elements.length is 1.
What is wrong here?
========== Edit ==========
Let's say I already have elements beforehand. How do I get the tagName out of it?
Thanks!
Try using page.$eval to select the button, and then get the tagName from the button:
const tagName = await page.$eval('button', button => button.tagName);
If you already have an elementHandle like elements[0], you can get a property from that element by passing it through page.evaluate:
const tagName = await page.evaluate(
element => element.tagName,
elements[0]
);
It appears your elements is an array of ElementHandles.
In that case, there may be a slightly more straightforward syntax:
const tag_name = await (await elements[0].getProperty('tagName')).jsonValue()
This does not involve referring to the page object.
Thanks!
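As background for the original error: elementHandle.$eval searches the element's descendants, and the button in the example contains only text, so the selector "*" matches nothing. Another option, sketched here on the assumption that the installed Puppeteer version supports elementHandle.evaluate, is to run the function directly on the handle. The name getTagName is a hypothetical helper. In HTML documents tagName is upper case (BUTTON), so the sketch lower-cases it to match the expected 'button'.

```javascript
// Hypothetical helper; `handle` is assumed to be a Puppeteer ElementHandle.
async function getTagName(handle) {
  // The callback runs in the browser with the DOM node behind the handle.
  return handle.evaluate((node) => node.tagName.toLowerCase());
}

// Example call (inside an async function):
// const tagName = await getTagName(elements[0]);
```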
