Count years with Puppeteer - node.js

I'm trying to count the total number of "year" entries on the following page (see UPDATE below).
In my Node.js script, I have:
await page.click(SELECT_YEARS_TO_VIEW);
await Promise.all([
  page.waitForNavigation({ waitUntil: 'networkidle0' }),
  page.waitForSelector('#ItemsTable > tbody > tr > td.DataItemSelections')
]);
const numberOfYears = (await page.$$('#ItemsTable > tbody > tr > td.DataItemSelections')).length;
console.log(`Years length: ${numberOfYears}.`);
It returns: Years length: 16.
Instead, in the Chrome console, if I run:
document.querySelectorAll('#ItemsTable > tbody > tr > td.DataItemSelections').length;
The (correct) output is: 39
I have read Puppeteer - counting elements in the DOM, but the suggestions inside it didn't resolve my problem.
UPDATE: the starting point is: https://unctadstat.unctad.org/wds/TableViewer/tableView.aspx?ReportId=96740
Then you have to click the "Select items to view" icon and then "YEAR".
This is the page where I need to count the number of years.

The site changes its HTML as you scroll: open the dev tools and you will see the tr tags change while scrolling. You could scroll programmatically to collect the rows, but it is simpler to intercept the response that carries all of that data.
Another way of counting the years:
// Listening for responses does not need request interception; enabling it
// without a 'request' handler that calls request.continue() stalls every request.
page.on('response', async response => {
  if (response.url().indexOf('https://unctadstat.unctad.org/wds/TableViewer/getItems.aspx') > -1) {
    console.log(response.url());
    console.log(await response.text()); // parse xml to json and count it
  }
});
await Promise.all([
  page.waitForNavigation({ waitUntil: 'networkidle0' }),
  page.click(SELECT_YEARS_TO_VIEW)
]);
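The comment in the handler leaves the counting step open. Here is a minimal sketch of that step, assuming the response body is XML in which every selectable year appears as one element. The tag name `Item` and the sample payload are hypothetical, since the real getItems.aspx format is not shown here:

```javascript
// Count how many elements named `tagName` open in an XML string.
// NOTE: `Item` below is a hypothetical tag name; inspect the real
// getItems.aspx response and substitute the element that represents a year.
function countElements(xml, tagName) {
  // Match "<Item>", "<Item attr=...>" and "<Item/>", but not "<Items>".
  const openTag = new RegExp(`<${tagName}(?=[\\s/>])`, 'g');
  const matches = xml.match(openTag);
  return matches ? matches.length : 0;
}

// Made-up payload, for illustration only.
const sampleXml =
  '<Items><Item name="1980"/><Item name="1981"/><Item name="1982"></Item></Items>';
console.log(`Years length: ${countElements(sampleXml, 'Item')}.`);
```

Inside the response handler this would be called as `countElements(await response.text(), 'Item')`. A regular expression is enough for a simple count; a real XML parser is the safer choice if the payload is nested or contains CDATA sections.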

Related

puppeteer text box with no id

I want to use Puppeteer to enter a value in an input field. I've done it for most of a web page but am having a real problem with a specific field that doesn't have an id or a good label.
Here is the element from the inspector:
<div class="retype-security-code"><input type="text" class="form-text mask-cvv-4" aria-label="Security code" placeholder="CVV2" value=""><img src="https://c1.neweggimages.com/WebResource/Themes/Nest/images/cvv2_sm.gif" alt="cvv2"></div>
<input type="text" class="form-text mask-cvv-4" aria-label="Security code" placeholder="CVV2" value="">
Here is some code that I've been playing with:
while (true) {
  try {
    await page.waitForSelector('#cvv2Code' , {timeout: 500})
    await page.type('#cvv2Code', config.cv2)
    break
  }
  catch (err) {}
  try {
    await page.waitForSelector('#creditCardCVV2' , {timeout: 500})
    await page.type('#creditCardCVV2', config.cv2)
    break
  }
  catch (err) {}
  try {
    await page.waitForSelector('#app > div > section > div > div > form > div.row-inner > div.row-body > div > div:nth-child(3) > div > div.checkout-step-body > div.checkout-step-done > div.card > div.retype-security-code > input' , {timeout: 500})
    await page.focus('#app > div > section > div > div > form > div.row-inner > div.row-body > div > div:nth-child(3) > div > div.checkout-step-body > div.checkout-step-done > div.card > div.retype-security-code > input')
    await page.keyboard.type('###')
    break
  }
  catch (err) {}
}
Why are you using #cvv2Code and #creditCardCVV2 as selectors when they are not in your html code, nor in the picture?
The class pair form-text mask-cvv-4 seems like a reasonable option for the field:
await page.waitForSelector('.form-text.mask-cvv-4');
The selectors in the last try block are too long; they are hard to read and unmaintainable, so avoid writing selectors like that.
Also, please add all relevant error messages to your question; it's hard(er) to help you without them.
What I found you can do is find the element using querySelector and then use setAttribute to give the element an id.
So something like
await page.evaluate(() => {
  // Both classes sit on the same element, so they are chained without a space.
  const inputBox = document.querySelector(".form-text.mask-cvv-4");
  inputBox.setAttribute('id', 'inputBox1');
});
await page.type("#inputBox1", "yourText");
If the element has children, you just have to get the children from the element. If there are multiple elements with that class, you can use querySelectorAll and loop through the list.
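The "try one selector, then the next" idea from the question can also be written without a `while (true)` loop. This is a rough sketch, assuming `page` is a Puppeteer Page; the helper name `typeIntoFirstMatch` is made up for illustration:

```javascript
// Try each candidate selector in order and type into the first one that appears.
// `page` is expected to be a Puppeteer Page (or anything exposing waitForSelector).
async function typeIntoFirstMatch(page, selectors, text, timeout = 500) {
  for (const selector of selectors) {
    try {
      // waitForSelector resolves with an ElementHandle, or throws on timeout.
      const handle = await page.waitForSelector(selector, { timeout });
      await handle.type(text);
      return selector; // report which selector matched
    } catch (err) {
      // Not found in time (or typing failed): fall through to the next candidate.
    }
  }
  throw new Error(`None of the selectors matched: ${selectors.join(', ')}`);
}
```

With the markup from the question it could be called as `await typeIntoFirstMatch(page, ['#cvv2Code', '#creditCardCVV2', 'input[aria-label="Security code"]'], config.cv2)`. Unlike the original loop, it makes a single pass and then throws, so a page that has none of the fields does not hang the script.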

timeout error with navigation and waitForSelector() in puppeteer irrespective of timeout value

I want my program to do this:
open a web page
click on a button to go to a new page
take a screenshot of the new page.
Steps 1 and 2 are working fine but I'm running into a timeout error with step 3. Based on responses to similar questions on StackOverflow, I used waitForNavigation() with bigger timeout spans (up to 2 min) but I'm still getting the same error. Using waitForSelector() instead of waitForNavigation() also gives the same error. If I remove both, puppeteer takes a screenshot of the webpage from step 1. I have also tried different waitUntil options, such as "domcontentloaded", "load", "networkidle0" and "networkidle2", but nothing is working.
This is my first program in puppeteer and I've been stuck on this problem for a long time.
Here's my code:
await page.waitForSelector('#featured > c-wiz > div.OXo54d > div > div > div > span > span > span.veMtCf');
// await navigation;
await page.screenshot({path: 'learnmore.png'});
console.log('GOT THIS FAR:)');
//await page.close();
await browser.close();
return 0;
Here's the complete program:
const puppeteer = require('puppeteer');
(async () => {
  try{
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    // const navigationPromise = page.waitForNavigation({waitUntil: "load"});
    //google.com
    await page.goto('https://google.com');
    await page.type('input.gLFyf.gsfi',"hotels in london");
    await page.keyboard.press('Enter');
    //search results
    // await navigationPromise;
    await page.waitForSelector('#rso > div:nth-child(2) > div > div > div > g-more-link > a > div');
    await page.click('#rso > div:nth-child(2) > div > div > div > g-more-link > a > div');
    //list of hotels
    // await navigationPromise;
    await page.waitForSelector('#yDmH0d > c-wiz.zQTmif.SSPGKf > div > div.lteUWc > div > c-wiz > div > div.gpcwnc > div.cGQUT > main > div > div.Hkwcrd.Sy8xcb.XBQ4u > c-wiz > div.J6e2Vc > div > div > span > span');
    await page.click("#yDmH0d > c-wiz.zQTmif.SSPGKf > div > div.lteUWc > div > c-wiz > div > div.gpcwnc > div.cGQUT > main > div > div.Hkwcrd.Sy8xcb.XBQ4u > c-wiz > div.l5cSPd > c-wiz:nth-child(3) > div > div > div > div.kCsInf.ZJqrAd.qiy8jf.G9g6o > div > div.TPQEac.qg10C.RCpQOe > a > button > span");
    //"learn more"
    // await navigationPromise;
    //This is where timeout error occurs:
    await page.waitForSelector('#featured > c-wiz > div.OXo54d > div > div > div > span > span > span.veMtCf');
    // await navigation;
    await page.screenshot({path: 'learnmore.png'});
    console.log('GOT THIS FAR:)');
    //await page.close();
    await browser.close();
    return 0;
  }
  catch(err){
    console.error(err);
  }
})()
  .then(resolvedValue => {
    console.log(resolvedValue);
  })
  .catch(rejectedValue => {
    console.log(rejectedValue);
  })
Your timeout occurs because the selector you are waiting for does not exist on the page. (If you open the browser console at the point where the script gets stuck and run $(selector), it will return null.)
Google uses dynamic class and id values, precisely to prevent scripts from retrieving data (or to make it harder), so the selectors will have different values every time you visit the page.
If you really need to scrape its content, you can use XPath selectors, which are less fragile than selectors built on dynamically changing names:
E.g.:
await page.waitForXPath('//h3[contains(text(), "The Best Hotels in London")]')
const link = await page.$x('//h3[contains(text(), "The Best Hotels in London")]')
await link[0].click()
Docs references:
page.waitForXPath
page.$x
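When the text to match comes from a variable, the XPath expression has to be assembled safely. Below is a small sketch of that; the helper names `xpathLiteral` and `xpathContainsText` are hypothetical, not part of Puppeteer:

```javascript
// Quote an arbitrary string as an XPath 1.0 string literal.
// XPath 1.0 has no escape character, so text containing both quote
// types has to be assembled with concat().
function xpathLiteral(text) {
  if (!text.includes('"')) return `"${text}"`;
  if (!text.includes("'")) return `'${text}'`;
  const parts = text.split('"').map((part) => `"${part}"`);
  const quoteAsLiteral = ", '\"', "; // a double quote wrapped in single quotes
  return `concat(${parts.join(quoteAsLiteral)})`;
}

// Build an expression matching <tag> elements whose text contains `text`.
function xpathContainsText(tag, text) {
  return `//${tag}[contains(text(), ${xpathLiteral(text)})]`;
}

console.log(xpathContainsText('h3', 'The Best Hotels in London'));
```

The resulting string can be passed to `page.waitForXPath` and `page.$x` as in the answer above. As far as I know, recent Puppeteer releases dropped those two methods; there the same expression should work with `page.waitForSelector('xpath/' + expression)`, but check that against the version you have installed.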

How can I get the HTML attribute of an element from puppeteer

How do I get an href attribute from an element in puppeteer?
I am trying to get the href attribute from the anchorTag.
const anchorTag = await page.$('table#middleContent_grvTransactionList > tbody > tr:nth-child(4) > td:nth-child(7) > a');
You can do this in many ways:
Using evaluate:
const href = await page.evaluate(el => el.getAttribute('href'), anchorTag);
Using getProperty:
const propertyHandle = await anchorTag.getProperty('href');
const href = await propertyHandle.jsonValue();
Using $eval:
const href = await page.$eval('table#middleContent_grvTransactionList > tbody > tr:nth-child(4) > td:nth-child(7) > a', el => el.getAttribute('href'));
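One difference between these options is worth knowing. `el.getAttribute('href')` returns the attribute exactly as written in the markup, which may be a relative path, while the `href` property read through `getProperty` is the resolved absolute URL. If you use the attribute and need an absolute URL, you can resolve it yourself with Node's built-in `URL` class. This is a sketch; the helper name and example URLs are made up:

```javascript
// Resolve a raw href attribute (possibly relative) against the page URL.
// In a Puppeteer script, `pageUrl` would typically come from page.url().
function resolveHref(rawHref, pageUrl) {
  return new URL(rawHref, pageUrl).href;
}

console.log(resolveHref('details.aspx?id=7', 'https://example.com/reports/list.aspx'));
console.log(resolveHref('/help', 'https://example.com/reports/list.aspx'));
```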

Can't select and click on a div with "button" role

I'm trying to click on a div that has a role="button" attribute.
Even without relying on that attribute, selecting the div by its DOM path does not work either.
What I've tried:
try {
  let Button = await this.page.$("div.class1.class2 > div");
  await Button.click();
  console.log("Clicked");
} catch (e) {
  console.log(e);
  console.log("No button");
}
The error I get is:
TypeError: Cannot read property '$' of undefined
I tried to reach the div through its containing div, which does have 2 classes I can rely on, but it doesn't seem to work.
Is there a way to get an array of all the divs with role="button" and click only on the one that has a span inside it with a specific text?
Remove the "this" keyword to fix the TypeError.
let Button = await page.$("div.class1.class2 > div");
To get an array of all the divs with role="button" and the text "Specific Text":
const buttons = await page.$x('//div[@role="button"][text()="Specific Text"]'); // returns: <Promise<Array<ElementHandle>>>
But I would recommend using the waitForXPath method to wait for the element.
Full example:
try {
  const button = await page.waitForXPath('//div[@role="button"][text()="Specific Text"]');
  await button.click();
  console.log("Clicked");
} catch (e) {
  console.log("No button", e);
}
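The question asks for the button whose inner span carries the text, whereas `text()="Specific Text"` tests the div's own text nodes. One option is to move the test into the span, e.g. `//div[@role="button"][.//span[text()="Specific Text"]]`. Another is to skip XPath altogether. Here is a sketch of the second option, assuming `page` is a Puppeteer Page and each button keeps its label in a descendant span; the function name is hypothetical:

```javascript
// Click the first div[role="button"] whose first descendant <span> has the given text.
// Resolves to true if a button was clicked, false if none matched.
async function clickRoleButtonByText(page, text) {
  const buttons = await page.$$('div[role="button"]');
  for (const button of buttons) {
    // Runs in the browser: read the trimmed text of the first span, if any.
    const label = await button.evaluate((el) => {
      const span = el.querySelector('span');
      return span ? span.textContent.trim() : null;
    });
    if (label === text) {
      await button.click();
      return true;
    }
  }
  return false;
}
```

Usage would look like `const clicked = await clickRoleButtonByText(page, 'Specific Text');`. As with the XPath version, it is worth waiting for the buttons first, for example with `await page.waitForSelector('div[role="button"]')`.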

How to handle focus with puppeteer on Html tags that don't have id, problem with multiple and hidden classes

I am trying to automate a login and scrape some data. The HTML tags on the web page don't have IDs, only classes. For some reason I can focus on the first input and enter my email address, but similar code won't type the password, and I can't figure out why.
I have tried
await page.waitForSelector('.login__form-wrap > .form > div:nth-child(1) > .form-group > .form-control')
await page.click('.login__form-wrap > .form > div:nth-child(1) > .form-group > .form-control')
await page.keyboard.type('myemailaddress', {delay: 100});
and it works for the email. Similar code for the password does not work:
await page.waitForSelector('.login__form-wrap > .form > div:nth-child(2) > .form-group > .form-control');
await page.click('.login__form-wrap > .form > div:nth-child(2) > .form-group > .form-control');
await page.keyboard.type('mypassword', {delay: 100});
Also tried this way:
await page.waitForSelector('input[name=email]');
await page.focus('input[name=email]');
await page.keyboard.type('myemailaddress', {delay: 100});
again working, but doing the same for password:
await page.waitForSelector('input[name=password]');
await page.focus('input[name=password]');
await page.keyboard.type('myemail', {delay: 100});
doesn't work: the password is not typed into the input field.
Another solution I tried was:
await page.$eval('input[name=password]', el => el.value = "mypassword");
That works (the password was shown), but after clicking the login button the site reported a wrong password; when I type the same password manually, it is accepted and the page logs me in.
So if someone can point me in the right direction, or show me what I'm doing wrong, I would be very thankful.
P.S. If you want to play with it, the page I'm trying to log in to is here
I already passed the validation and the full code is here
