I've this simple code chunk:
const BUSINESS = 'lalala';
await page.waitForSelector('#searchboxinput').then(
page.evaluate( (BUSINESS) => {
document.querySelector('#searchboxinput').value = BUSINESS
}, BUSINESS)
),
If I set wait for selector -> then, I would expect then to be executed when the selector exists, but I'm getting a Cannot set property value of 'null'.
So I uderstand that document.querySelector('#searchboxinput') == undefined while I suppose that it cannot be possible as it's executed when waitForSelector promise is finished...
is that a bug or I'm missing something?
Not sure if I understand correctly as your chunk is syntactically not complete and not reprducible. But what if you use the returned value of page.waitForSelector()? This seems working:
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({ headless: false, defaultViewport: null });
try {
const [page] = await browser.pages();
await page.goto('https://example.org/');
const BUSINESS = 'lalala';
await page.waitForSelector('a').then((element) => {
page.evaluate((element, BUSINESS) => {
element.textContent = BUSINESS;
}, element, BUSINESS);
});
} catch (err) { console.error(err); }
Related
I'm trying to automate away downloading multiple files on thingiverse. I choose an object at random. But I'm having a hard time locating the link I need, clicking and then downloading. Has someone run into this before can I get some help?
I've tried several other variations.
import puppeteer from 'puppeteer';
async function main() {
const browser = await puppeteer.launch({
headless: true,
});
const page = await browser.newPage();
const response = await page.goto('https://www.thingiverse.com/thing:2033856/files');
const buttons = await page.$x(`//a[contains(text(), 'Download')]`);
if(buttons.length > 0){
console.log(buttons.length);
} else {
console.log('no buttons');
}
await wait(5000);
await browser.close();
return 'Finish';
}
async function wait(time: number) {
return new Promise(function (resolve) {
setTimeout(resolve, time);
});
}
function start() {
main()
.then((test) => console.log('DONE'))
.catch((reason) => console.log('Error: ', reason));
}
start();
Download Page
Code
I was able to get it to work.
The selector is: a[class^="ThingFile__download"]
Puppeteer is: const puppeteer = require('puppeteer-extra');
Before the await page.goto() I always recommend setting the viewport:
await page.setViewport({width: 1920, height: 720});
After that is set, change the await page.goto() to have a waitUntil option:
const response = await page.goto('https://www.thingiverse.com/thing:2033856/files', { waitUntil: 'networkidle0' }); // wait until page load
Next, this is a very important part. You have to do what is called waitForSelector() or waitForFunction().
I added both of these lines of code after the const response:
await page.waitForSelector('a[class^="ThingFile__download"]', {visible: true})
await page.waitForFunction("document.querySelector('a[class^=\"ThingFile__download\"]') && document.querySelector('a[class^=\"ThingFile__download\"]').clientHeight != 0");
Next, get the buttons. For my testing I just grabbed the button href.
const buttons = await page.$eval('a[class^="ThingFile__download"]', anchor => anchor.getAttribute('href'));
Lastly, do not check the .length of this variable. In this case we are just returning the href value which is a string. You will get a Promise of an ElementHandle when you try getting just the button:
const button = await page.$('a[class^="ThingFile__download"]');
console.log(button)
if (button) { ... }
Now if you change that page.$ to be page.$$, you will be getting a Promise of an Array<ElementHandle>, and will be able to use .length there.
const buttonsAll = await page.$$('a[class^="ThingFile__download"]');
console.log(buttonsAll)
if (buttons.length > 0) { ... }
Hopefully this helps, and if you can't figure it out I can post my full source later if I have time to make it look better.
I've recently converted a script over from Puppeteer to Puppeteer Cluster and during testing I've observed some odd results when testing multiple pages concurrently.
Effectively I'm loading a single page and then iterating over the product options on the page and gathering the price for any product variants.
One particular product has around 9 product variants, sometimes I will accurately capture all 9 variants, whereas on the next testing cycle it may only return 2 or 3 variants.
Any help would be greatly appreciated!
const puppeteer = require('puppeteer');
const { Cluster } = require('puppeteer-cluster');
const Product = require('../utils/product')
const config = require('../config/config.json')
const selectors = config.productData;
(async () => {
const urls = [
{link: ...},
{link: ...},
{link: ...}
]
const cluster = await Cluster.launch({
concurrency: Cluster.CONCURRENCY_CONTEXT,
maxConcurrency: 5,
puppeteerOptions: {
headless: false
},
});
await cluster.task(async ({ page, data: url }) => {
//instantiate a new product object
const product = new Product();
await page.goto(url, { waitUntil: 'load' });
const skuprice = await page.$eval(selectors.price, element => element.innerText);
console.log('Sku Price:' + skuprice)
//deal with variants
const options = await page.$$eval(selectors.variant, elements => elements.map(element=>element.id))
if (options.length > 0) {
//set up a variants array
for (let index = 0; index < options.length; index++) {
const element = options[index];
await page.waitForSelector(`#${element}`);
await page.$eval(`#${element}`, radio => radio.click());
await page.waitForTimeout(500);
const variantprice = await page.$eval(selectors.price, element => element.innerText);
console.log('Variant Price:' + variantprice)
}
}
});
urls.forEach(url => {
cluster.queue(url.link);
})
// many more pages
await cluster.idle();
await cluster.close();
})();
Dynamic javascript page should be scraped when all of the element is visible.
You can do following tricks:
[1] wait until selector is visible, check withawait page.waitForSelector(selector, {visible: true, timeout: 0})
[2] wait for desired time, but this is more flaky and prone to resulting error.
You can simplify and rewrite your code, like this:
await page.waitForSelector(`#${element}`, {visible: true, timeout: 0})
await page.click(`#${element}`)
/* await page.waitForTimeout(500) <= prone to error, use line below */
await page.waitForSelector(selectors.price, {visible: true, timeout: 0})
const variantprice = await page.$eval(selectors.price, element => element.innerText)
For anyone else searching for an answer, it looked like some of my CSS selectors weren't working on page refresh.
Re-reading the project documentation its worth including the following:
// Event handler to be called in case of problems
cluster.on('taskerror', (err, data) => {
console.log(`Error crawling ${data}: ${err.message}`);
});
I'm trying to select a certain size on this website, I have tried multiple approaches that have worked for me so far in puppeteer but none of them seems to work on this instance. I can get the size tab open but cannot figure how to select a specific size.
my code:
await page.goto(data[ii][0]), { //the website link
waitUntil: 'load',
timeout: 0
};
//part 1
await page.click('span[class="default-text__21bVM"]'); //opens size menu
let size = data[ii][1]; //gets size from an array, for example 9
// const xp = `//div[contains(#class, "col-3") and text()="${size}"]`;
// await page.waitForXPath(xp);
// const [sizeButton] = await page.$x(xp);
// await sizeButton.evaluate(btn => {
// btn.parentNode.dispatchEvent(new Event("mousedown"));
// });
await delay(1500);
await page.evaluate((size) => {
document.querySelector(`div > div[class="col-3"][text="${size}"]`).parentElement.click()
});
await page.click('span[class="text__1S19c"]'); // click on submit button
Neither of my approaches worked. I get Error: Evaluation failed: TypeError: Cannot read property 'parentElement' of null meaning the div wasn't found for whatever reason
this is the html of the div I'm trying to click on:
I tried different variations of the querySelector but none of them worked so I'm posting the problem here to see if this is even possible, or if I just made a mistake along the way
This seems working:
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({ headless: false, defaultViewport: null });
try {
const [page] = await browser.pages();
await page.goto('https://releases.footshop.com/nike-air-force-1-07-lx-wmns-5iBRxXsBHBhvh4GFc9ge');
await page.click('span[class="default-text__21bVM"]');
const size = 9;
const xp = `//div[contains(#class, "col-3") and text()="${size}"]`;
await page.waitForXPath(xp);
const [sizeButton] = await page.$x(xp);
await sizeButton.click();
await page.click('span[class="text__1S19c"]');
} catch (err) { console.error(err); }
I am trying to click the "Create File" button on fakebook's download your information page. I am currently able to goto the page, and I wait for the login process to finish. However, when I try to detect the button using
page.$x("//div[contains(text(),'Create File')]")
nothing is found. The same thing occurs when I try to find it in the chrome dev tools console, both in a puppeteer window and in a regular window outside of the instance of chrome puppeteer is controlling:
This is the html info for the element:
I am able to find the element however after I have clicked on it using the chrome dev tools inspector tool:
(the second print statement is from after I have clicked on it with the element inspector tool)
How should I select this element? I am new to puppeteer and to xpath so I apologize if I just missed something obvious.
A small few links I currently remember looking at previously:
Puppeteer can't find selector
puppeteer cannot find element
puppeteer: how to wait until an element is visible?
My Code:
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
(async () => {
let browser;
try {
puppeteer.use(StealthPlugin());
browser = await puppeteer.launch({
headless: false,
// path: "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
args: ["--disable-notifications"],
});
const pages = await browser.pages();
const page = pages[0];
const url = "https://www.facebook.com/dyi?referrer=yfi_settings";
await page.goto(url);
//Handle the login process. Since the login page is different from the url we want, I am going to assume the user
//has logged in if they return to the desired page.
//Wait for the login page to process
await page.waitForFunction(
(args) => {
return window.location.href !== args[0];
},
{ polling: "mutation", timeout: 0 },
[url]
);
//Since multifactor auth can resend the user temporarly to the desired url, use a little debouncing to make sure the user is completely done signing in
// make sure there is no redirect for mfa
await page.waitForFunction(
async (args) => {
// function to make sure there is a debouncing delay between checking the url
// Taken from: https://stackoverflow.com/a/49813472/11072972
function delay(delayInms) {
return new Promise((resolve) => {
setTimeout(() => {
resolve(2);
}, delayInms);
});
}
if (window.location.href === args[0]) {
await delay(2000);
return window.location.href === args[0];
}
return false;
},
{ polling: "mutation", timeout: 0 },
[url]
);
// await page.waitForRequest(url, { timeout: 100000 });
const requestArchiveXpath = "//div[contains(text(),'Create File')]";
await page.waitForXPath(requestArchiveXpath);
const [requestArchiveSelector] = await page.$x(requestArchiveXpath);
await page.click(requestArchiveSelector);
page.waitForTimeout(3000);
} catch (e) {
console.log("End Error: ", e);
} finally {
if (browser) {
await browser.close();
}
}
})();
Resolved using the comment above by #vsemozhebuty and source. Only the last few lines inside the try must change:
const iframeXpath = "//iframe[not(#hidden)]";
const requestArchiveXpath = "//div[contains(text(),'Create File')]";
//Wait for and get iframe
await page.waitForXPath(iframeXpath);
const [iframeHandle] = await page.$x(iframeXpath);
//content frame for iframe => https://devdocs.io/puppeteer/index#elementhandlecontentframe
const frame = await iframeHandle.contentFrame();
//Wait for and get button
await frame.waitForXPath(requestArchiveXpath);
const [requestArchiveSelector] = await frame.$x(requestArchiveXpath);
//click button
await requestArchiveSelector.click();
await page.waitForTimeout(3000);
Currently I'm using Puppeteer to fetch cookies & headers from a page, however it's using a bot prevention system which is only bypassed when clicking on the page; I don't want to keep this sequential so it's "detectable"
How can I have my Puppeteer click anywhere on the page at random? regardless of wether it clicks a link, button etc..
I've currently got this code
const getCookies = async (state) => {
try {
state.browser = await launch_browser(state);
state.context = await state.browser.createIncognitoBrowserContext();
state.page = await state.context.newPage();
await state.page.authenticate({
username: proxies.username(),
password: proxies.password(),
});
await state.page.setViewport(functions.get_viewport());
state.page.on('response', response => handle_response(response, state));
await state.page.goto('https://www.website.com', {
waitUntil: 'networkidle0',
});
await state.page.waitFor('.unlockLink a', {
timeout: 5000
});
await state.page.click('.unlockLink a');
await state.page.waitFor('input[id="nondevice"]', {
timeout: 5000
});
state.publicIpv4Address = await state.page.evaluate(() => {
return sessionStorage.getItem("publicIpv4Address");
});
state.csrfToken = await state.page.evaluate(() => {
return sessionStorage.getItem("csrf-token");
});
//I NEED TO CLICK HERE! CAN BE WHITESPACE, LINK, IMAGE
state.browser_cookies = await state.page.cookies();
state.browser.close();
for (const cookie of state.browser_cookies) {
if(cookie.name === "dtPC") {
state.dtpc = cookie.value;
}
await state.jar.setCookie(
`${cookie.name}=${cookie.value}`,
'https://www.website.com'
)
}
return state;
} catch(error) {
if(state.browser) {
state.browser.close();
}
throw new Error(error);
}
};
The simplest way I can think of out of my head to choose a random element from DOM would be probably something like using querySelectorAll() which will return you an array of all <div>s in your document (or choose any other element, like <p> or anything else), then you can easily use click() on random one from the result, for example:
await page.evaluate(() => {
const allDivs = document.querySelectorAll('.left-sidebar-toggle');
const randomElement = allDivs[Math.floor(Math.random() * allDivs.length)];
randomElement.click();
});