How can I get the HTML attribute of an element from puppeteer - node.js

How do I get an href attribute from an element in puppeteer?
I am trying to get the href attribute from the anchorTag.
const anchorTag = await page.$('table#middleContent_grvTransactionList > tbody > tr:nth-child(4) > td:nth-child(7) > a');

You can do this in many ways:
Using evaluate:
const href = await page.evaluate(el => el.getAttribute('href'), anchorTag);
Using getProperty:
const propertyHandle = await anchorTag.getProperty('href');
const href = await propertyHandle.jsonValue();
Using $eval:
const href = await page.$eval('table#middleContent_grvTransactionList > tbody > tr:nth-child(4) > td:nth-child(7) > a', el => el.getAttribute('href'));
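Note that the variants are not strictly equivalent: getAttribute('href') returns the raw attribute text as written in the markup, while the href property read through getProperty is the URL resolved against the page address. A minimal sketch of that resolution, using Node's built-in URL class (resolveHref and the example.com URLs are made up for illustration):

```javascript
// Hypothetical helper: resolve a raw href attribute the way the browser's
// `href` property does, using the WHATWG URL class built into Node.js.
function resolveHref(rawHref, pageUrl) {
  return new URL(rawHref, pageUrl).href;
}

// Assumed page address, for illustration only.
const pageUrl = 'https://example.com/accounts/list?page=2';

console.log(resolveHref('details.aspx?id=7', pageUrl)); // https://example.com/accounts/details.aspx?id=7
console.log(resolveHref('/help', pageUrl));             // https://example.com/help
```

If you need the link exactly as written in the HTML, use getAttribute; if you plan to pass it to page.goto, the resolved form is usually the more convenient one.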

Related

How to exclude a nested class that is wrapped in the class being retrieved with querySelector

I'm using Node and Puppeteer to scrape a website.
I want to return the innerText of an element selected by class, but it contains a nested span, so the text of both is returned.
<div class="inner_sm">
<h1 class="pdp_address ">79 Etwell Street
<span>East Victoria Park, WA 6101</span>
</h1>
<span class="pdp_price">$670,000
<span class="price_feature">Under Offer</span>
</span>
</div>
The target is .pdp_price. I want the result to be '$670,000' but I get '$670,000Under Offer'.
const data = await page.evaluate(() => {
const address = document.querySelector('#app > div > div > article > div.pdp_header > div > div > h1').innerText.replaceAll('\n',',')
const bed = document.querySelector('.bed')?.innerText || ""
const bath = document.querySelector('.bath')?.innerText || ""
const car = document.querySelector('.car')?.innerText || ""
const price = document.querySelector('.pdp_price').innerText
return `${address}, ${price}, ${bed}, ${bath}, ${car} \n`
})
I've tried a few things but haven't been able to make it work.
const price = document.querySelector('.pdp_price:not(.price_feature)').innerText
const price = document.querySelector('.pdp_price')?.innerText.replaceAll(',','*') || ""
const cleanPrice = price.remove('.price_feature').innerText
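One possible approach, sketched under the assumption that the price is always a direct text node of .pdp_price: read only the element's own text nodes and skip nested elements. The :not(.price_feature) attempt does not help, because :not() filters which elements the selector matches, not which descendants contribute to innerText. The helper name ownText and the stand-in object below are hypothetical, so the sketch can run outside a browser:

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the browser

// Hypothetical helper: text of an element's direct text nodes only,
// skipping nested elements such as <span class="price_feature">.
function ownText(el) {
  return Array.from(el.childNodes)
    .filter(node => node.nodeType === TEXT_NODE)
    .map(node => node.textContent)
    .join('')
    .trim();
}

// Stand-in for the .pdp_price element from the question's markup.
const fakePrice = {
  childNodes: [
    { nodeType: 3, textContent: '$670,000\n' },
    { nodeType: 1, textContent: 'Under Offer' },
    { nodeType: 3, textContent: '\n' },
  ],
};

console.log(ownText(fakePrice)); // $670,000
```

Inside page.evaluate the helper has to be defined within the callback (functions from the Node side are not available in the page), and document.querySelector('.pdp_price') takes the place of fakePrice.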

How to speed up Node js puppeteer typing?

Hello, can someone please help with Puppeteer? The code I have works, but typing long text takes forever. Is there a way to speed up typing?
The problem is that typing takes far too long when the text file contains a lot of words. I want a way to paste the contents of the .txt file instead of typing them.
const fsExtra = require('fs-extra')
const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())
// Start building links
const links = fs.readFileSync('robux4.txt').toString().split('\r\n');
const finalLinks = links.join().replace(/,/g, '\n');
let halfLinks1 = links.slice(0, 60000);
puppeteer.launch({ headless: false }).then(async browser => {
const page = await browser.newPage()
// Build
await page.goto('https://www.site.ec.illinois.edu/account/my-account', {
waitUntil: 'networkidle0',
timeout: 110000,
});
await page.waitForSelector('#input_comp-jzxudh31')
let random1 = Math.random().toString(36).substring(10);
let random2 = Math.random().toString(36).substring(10);
let random3 = Math.random(3).toString(36).substring(10);
await page.type('#input_comp-jzxudh31', random1 + random2 ,)
await page.type('#input_comp-jzxudh50', random1 + random2 ,)
await page.type('#input_comp-jzxudh50', '@gmail.com')
await page.type('#input_comp-jzxudh60', random1 + random2)
await page.click('._1fbEI')
await page.waitFor(5000)
await page.goto('https://www.site.ec.illinois.edu/account/my-account', {
waitUntil: 'networkidle0',
timeout: 110000,
});
await page.waitForSelector('.sAa6lj > #profileVisibility > .sfYIV2 > .s3Fb6T > .s1b4VH')
await page.click('.sAa6lj > #profileVisibility > .sfYIV2 > .s3Fb6T > .s1b4VH')
await page.waitForSelector('div > div > .s3xEru > .s1CVDb > .s3UsU7')
await page.click('div > div > .s3xEru > .s1CVDb > .s3UsU7')
await page.waitForSelector('#TPASection_k14nm80e > div > div.sHCADR > div > div > section > div > div > div.s3uVoc > div > div.s1Dlxl > button.s1CVDb.s3X1VX.s2uKgn.o3x-IY--upgrade.s2_1ar.oYbiBO--primary')
await page.click('#TPASection_k14nm80e > div > div.sHCADR > div > div > section > div > div > div.s3uVoc > div > div.s1Dlxl > button.s1CVDb.s3X1VX.s2uKgn.o3x-IY--upgrade.s2_1ar.oYbiBO--primary')
await page.waitForSelector('#display-name-id')
const input = await page.$('#display-name-id');
await input.click({ clickCount: 3 })
await page.waitFor(2000)
await input.type('FREE ROBUX CODES - HOW TO GET FREE ROBUX [' + random3 + ']');
await page.waitFor(2000)
await page.waitForSelector('#TPASection_k14nm80e > div > div.s1bpIa > form > div.s2faAG > div:nth-child(2) > div > div > button.s1CVDb.s3X1VX.s2uKgn.o3x-IY---priority-7-primary.o3x-IY--upgrade.stQWOt')
await page.click('#TPASection_k14nm80e > div > div.s1bpIa > form > div.s2faAG > div:nth-child(2) > div > div > button.s1CVDb.s3X1VX.s2uKgn.o3x-IY---priority-7-primary.o3x-IY--upgrade.stQWOt')
await page.waitForSelector('#comp-k14nm5yw > ul > li:nth-child(1) > div')
await page.click('#comp-k14nm5yw > ul > li:nth-child(1) > div')
await page.waitForNavigation
await page.waitForSelector('#TPASection_kkfzu8kj > div > div.s7lnRg > div.s3hIHm.o1DNob--childrenFullWidth > div > div > div.sGjTGl.o2GAvo--withHoverColor > div.sugWin > div.s2neI5.opo9uN---typography-11-runningText.opo9uN---priority-7-primary.s1uyGa')
await page.waitFor(2000)
await page.click('#TPASection_kkfzu8kj > div > div.s7lnRg > div.s3hIHm.o1DNob--childrenFullWidth > div > div > div.sGjTGl.o2GAvo--withHoverColor > div.sugWin > div.s2neI5.opo9uN---typography-11-runningText.opo9uN---priority-7-primary.s1uyGa')
let link1 = halfLinks1.join().replace(/,/g, '\n')
let content = link1;
await page.waitForSelector('#placeholder-editor')
await page.click('#placeholder-editor')
await page.type('#placeholder-editor', content)
await page.waitForTimeout(14000)
await page.click('#TPASection_kkfzu8kj > div > div.s1CcJ8 > div > div > div > section > div > div > div.s2DXU2.s1_fDS > div.s17Yb3 > button.s2cOuO.s1ZhWc.s2lWPV.opZNt_--upgrade.s22DiS.oyXUyw--primary')
await page.waitForTimeout(3000)
const link = await page.evaluate(() => location.href);
fs.appendFileSync('links2.txt', 'FREE ROBUX CODES' + '\n')
await browser.close();
})

timeout error with navigation and waitForSelector() in puppeteer irrespective of timeout value

I want my program to do this:
1. Open a web page.
2. Click on a button to go to a new page.
3. Take a screenshot of the new page.
Steps 1 and 2 work fine, but I run into a timeout error at step 3. Based on responses to similar questions on StackOverflow, I used waitForNavigation() with longer timeouts (up to 2 min), but I still get the same error. Using waitForSelector() instead of waitForNavigation() gives the same error. If I remove both, puppeteer takes a screenshot of the web page from step 1. I have also tried different waitUntil options, such as "domcontentloaded", "load", "networkidle0" and "networkidle2", but nothing works.
This is my first program in puppeteer and I've been stuck on this problem for a long time.
Here's my code:
await page.waitForSelector('#featured > c-wiz > div.OXo54d > div > div > div > span > span > span.veMtCf');
// await navigation;
await page.screenshot({path: 'learnmore.png'});
console.log('GOT THIS FAR:)');
//await page.close();
await browser.close();
return 0;
Here's the complete program:
const puppeteer = require('puppeteer');
(async () => {
try{
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
// const navigationPromise = page.waitForNavigation({waitUntil: "load"});
//google.com
await page.goto('https://google.com');
await page.type('input.gLFyf.gsfi',"hotels in london");
await page.keyboard.press('Enter');
//search results
// await navigationPromise;
await page.waitForSelector('#rso > div:nth-child(2) > div > div > div > g-more-link > a > div');
await page.click('#rso > div:nth-child(2) > div > div > div > g-more-link > a > div');
//list of hotels
// await navigationPromise;
await page.waitForSelector('#yDmH0d > c-wiz.zQTmif.SSPGKf > div > div.lteUWc > div > c-wiz > div > div.gpcwnc > div.cGQUT > main > div > div.Hkwcrd.Sy8xcb.XBQ4u > c-wiz > div.J6e2Vc > div > div > span > span');
await page.click("#yDmH0d > c-wiz.zQTmif.SSPGKf > div > div.lteUWc > div > c-wiz > div > div.gpcwnc > div.cGQUT > main > div > div.Hkwcrd.Sy8xcb.XBQ4u > c-wiz > div.l5cSPd > c-wiz:nth-child(3) > div > div > div > div.kCsInf.ZJqrAd.qiy8jf.G9g6o > div > div.TPQEac.qg10C.RCpQOe > a > button > span");
//"learn more"
// await navigationPromise;
//This is where timeout error occurs:
await page.waitForSelector('#featured > c-wiz > div.OXo54d > div > div > div > span > span > span.veMtCf');
// await navigation;
await page.screenshot({path: 'learnmore.png'});
console.log('GOT THIS FAR:)');
//await page.close();
await browser.close();
return 0;
}
catch(err){
console.error(err);
}
})()
.then(resolvedValue => {
console.log(resolvedValue);
})
.catch(rejectedValue => {
console.log(rejectedValue);
})
Your timeout occurs because the selector you are waiting for does not exist on the page. (If you open the browser console at the point where the script gets stuck and run $(selector), it returns null.)
Google uses dynamic class and id values precisely to make it harder for scripts to retrieve data, so the selectors can have different values every time you visit the page.
If you really need to scrape its content, you can use XPath selectors that match on visible text, which are less fragile than dynamically generated class and id names:
E.g.:
await page.waitForXPath('//h3[contains(text(), "The Best Hotels in London")]')
const link = await page.$x('//h3[contains(text(), "The Best Hotels in London")]')
await link[0].click()
Docs references:
page.waitForXPath
page.$x
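If the text you match on comes from a variable, quotes inside it can break the XPath expression. A small sketch of a builder for such text-matching expressions; xpathLiteral and xpathContainsText are hypothetical helpers, not part of Puppeteer, and the quoting follows XPath 1.0, which has no escape character inside string literals:

```javascript
// Hypothetical helper: quote a string as an XPath 1.0 string literal.
function xpathLiteral(text) {
  if (!text.includes('"')) return `"${text}"`;
  if (!text.includes("'")) return `'${text}'`;
  // Both quote types present: split on double quotes and rejoin with concat().
  const parts = text.split('"').map(part => `"${part}"`);
  return `concat(${parts.join(`, '"', `)})`;
}

// Hypothetical helper: XPath for elements of `tag` whose text contains `text`.
function xpathContainsText(tag, text) {
  return `//${tag}[contains(text(), ${xpathLiteral(text)})]`;
}

console.log(xpathContainsText('h3', 'The Best Hotels in London'));
// //h3[contains(text(), "The Best Hotels in London")]
```

The resulting string can be passed to the XPath calls shown in the answer in place of the hard-coded expression.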

Unable to click on element using xpath - TypeError: Cannot read property 'click' of undefined

I am trying to click a checkbox inside div for the below css selector:
#sheet1 > tbody > tr:nth-child(2) > td > div > div.GMPageOne > table > tbody > tr.GMDataRow.GMClassFocused > td.GMClassFocusedCell.GMWrap0.GMAlignCenter.GMBool0.GMCell.IBSheetFont0.GMEmpty.HideCol0C2
I tried different ways from google search but none worked.
const example = await page.$x('//*[@id="sheet1"]/tbody/tr[2]/td/div/div[1]/table/tbody/tr[2]/td[3]',{waitUntil: 'networkidle0',});
delay(1000);
await example[0].click();
This is giving an error:
TypeError: Cannot read property 'click' of undefined
at C:\Apps\headless\node_modules\project\esi.js:60:20
at process._tickCallback (internal/process/next_tick.js:68:7)
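page.$x() takes only the XPath expression; it does not accept options such as waitUntil, so it returns an empty array if the element is not in the DOM yet, and example[0] is then undefined.
Wait for the element first, then query it:
await page.waitForXPath('//*[@id="sheet1"]/tbody/tr[2]/td/div/div[1]/table/tbody/tr[2]/td[3]', { timeout: 5000 });
const [example] = await page.$x('//*[@id="sheet1"]/tbody/tr[2]/td/div/div[1]/table/tbody/tr[2]/td[3]');
if (example) await example.click();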
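The TypeError in the question comes from indexing an empty result array: element 0 of [] is undefined, and undefined has no click method. If you would rather fail with a readable message than skip the click silently, a guard like the following can help. This is only a sketch; firstMatch is a hypothetical helper, not a Puppeteer API:

```javascript
// Hypothetical helper: return the first match or fail with a readable message.
function firstMatch(handles, description) {
  const [first] = handles; // undefined when the array is empty
  if (first === undefined) {
    throw new Error(`No element matched ${description}`);
  }
  return first;
}

// Plain arrays stand in for the result of page.$x() here.
console.log(firstMatch(['handle-a', 'handle-b'], 'the checkbox cell')); // handle-a
```

With Puppeteer, the array returned by page.$x() would be passed in place of the plain array.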

puppeteer howto get element tagName

I would like to get an element's tagName. It should be button in the following example.
const puppeteer = require('puppeteer')
async function run () {
const browser = await puppeteer.launch({headless: false})
const page = await browser.newPage()
const html = `
<div>
<button type="button">click me</button>
<span>Some words.</span>
</div>
`
await page.setContent(html)
const elements = await page.$$('button')
const tagName = await elements[0].$eval('*', node => node.tagName)
console.log(tagName) // expect to be 'button'
await browser.close()
}
run()
The error message is: Error: failed to find element matching selector "*"
I can tell that elements matched one element, since elements.length is 1.
What is wrong?
========== Edit ==========
Let's say I already have elements beforehand; how do I get the tagName out of it?
Thanks!
elementHandle.$eval() runs querySelector inside the element, so '*' only matches descendants. The button has no child elements, hence the error. Try using page.$eval to select the button, and then get the tagName from the button:
const tagName = await page.$eval('button', button => button.tagName);
If you already have an elementHandle like elements[0], you can read a property of that element by passing it through page.evaluate:
const tagName = await page.evaluate(
element => element.tagName,
elements[0]
);
It appears your elements is an array of ElementHandles.
In that case, there may be a slightly more straightforward syntax:
const tag_name = await (await elements[0].getProperty('tagName')).jsonValue()
This does not involve referring to the page object.
Thanks!
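One detail to keep in mind: for elements in an HTML document, tagName is reported in upper case, so the snippets above yield 'BUTTON' rather than 'button'. A minimal sketch of a case-insensitive comparison (isTag is a hypothetical helper name):

```javascript
// Hypothetical helper: compare a DOM tagName with an expected name, ignoring case.
function isTag(tagName, expected) {
  return tagName.toLowerCase() === expected.toLowerCase();
}

console.log(isTag('BUTTON', 'button')); // true
console.log(isTag('SPAN', 'button'));   // false
```

Alternatively, call toLowerCase() on the value inside the page function before returning it.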
