Can't click link using puppeteer - Thingiverse - node.js

I'm trying to automate downloading multiple files from Thingiverse. I chose an object at random, but I'm having a hard time locating the link I need, clicking it, and then downloading the file. Has someone run into this before? Can I get some help?
I've tried several other variations.
import puppeteer from 'puppeteer';

async function main() {
  const browser = await puppeteer.launch({
    headless: true,
  });
  const page = await browser.newPage();
  const response = await page.goto('https://www.thingiverse.com/thing:2033856/files');
  const buttons = await page.$x(`//a[contains(text(), 'Download')]`);
  if (buttons.length > 0) {
    console.log(buttons.length);
  } else {
    console.log('no buttons');
  }
  await wait(5000);
  await browser.close();
  return 'Finish';
}

async function wait(time: number) {
  return new Promise(function (resolve) {
    setTimeout(resolve, time);
  });
}

function start() {
  main()
    .then((test) => console.log('DONE'))
    .catch((reason) => console.log('Error: ', reason));
}

start();

I was able to get it to work.
The selector is: a[class^="ThingFile__download"]
Puppeteer is: const puppeteer = require('puppeteer-extra');
Before the await page.goto() I always recommend setting the viewport:
await page.setViewport({width: 1920, height: 720});
After that is set, change the await page.goto() to have a waitUntil option:
const response = await page.goto('https://www.thingiverse.com/thing:2033856/files', { waitUntil: 'networkidle0' }); // wait until page load
Next, and this is a very important part: you have to use waitForSelector() or waitForFunction().
I added both of these lines of code after the const response:
await page.waitForSelector('a[class^="ThingFile__download"]', {visible: true})
await page.waitForFunction("document.querySelector('a[class^=\"ThingFile__download\"]') && document.querySelector('a[class^=\"ThingFile__download\"]').clientHeight != 0");
Next, get the buttons. For my testing I just grabbed the button href.
const buttons = await page.$eval('a[class^="ThingFile__download"]', anchor => anchor.getAttribute('href'));
Lastly, do not check the .length of this variable. In this case we are just returning the href value, which is a string. If you instead grab just the button with page.$, you get an ElementHandle (or null):
const button = await page.$('a[class^="ThingFile__download"]');
console.log(button)
if (button) { ... }
Now if you change that page.$ to page.$$, you get an array of ElementHandles and can use .length on it:
const buttonsAll = await page.$$('a[class^="ThingFile__download"]');
console.log(buttonsAll)
if (buttonsAll.length > 0) { ... }
Hopefully this helps, and if you can't figure it out I can post my full source later if I have time to make it look better.
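In the meantime, here is a rough sketch that strings the steps above together into one script (assuming the ThingFile__download class prefix is still what Thingiverse renders, and using puppeteer-extra as noted above):

const puppeteer = require('puppeteer-extra');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 720 });
  await page.goto('https://www.thingiverse.com/thing:2033856/files', { waitUntil: 'networkidle0' });
  // wait until the download link is actually rendered and visible
  await page.waitForSelector('a[class^="ThingFile__download"]', { visible: true });
  // grab the href of the first download link
  const href = await page.$eval('a[class^="ThingFile__download"]', anchor => anchor.getAttribute('href'));
  console.log(href);
  await browser.close();
})();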

Related

Puppeteer click parent node of an element with no id

I'm trying to select a certain size on this website. I have tried multiple approaches that have worked for me so far in Puppeteer, but none of them seems to work in this instance. I can get the size tab open but cannot figure out how to select a specific size.
my code:
await page.goto(data[ii][0], { // the website link
  waitUntil: 'load',
  timeout: 0
});
//part 1
await page.click('span[class="default-text__21bVM"]'); //opens size menu
let size = data[ii][1]; //gets size from an array, for example 9
// const xp = `//div[contains(@class, "col-3") and text()="${size}"]`;
// await page.waitForXPath(xp);
// const [sizeButton] = await page.$x(xp);
// await sizeButton.evaluate(btn => {
//   btn.parentNode.dispatchEvent(new Event("mousedown"));
// });
await delay(1500);
await page.evaluate((size) => {
  document.querySelector(`div > div[class="col-3"][text="${size}"]`).parentElement.click()
}, size);
await page.click('span[class="text__1S19c"]'); // click on submit button
Neither of my approaches worked. I get Error: Evaluation failed: TypeError: Cannot read property 'parentElement' of null, meaning the div wasn't found for whatever reason.
This is the HTML of the div I'm trying to click on:
I tried different variations of the querySelector, but none of them worked, so I'm posting the problem here to see if this is even possible, or if I just made a mistake along the way.
This seems to work:
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({ headless: false, defaultViewport: null });

try {
  const [page] = await browser.pages();
  await page.goto('https://releases.footshop.com/nike-air-force-1-07-lx-wmns-5iBRxXsBHBhvh4GFc9ge');
  await page.click('span[class="default-text__21bVM"]');

  const size = 9;
  const xp = `//div[contains(@class, "col-3") and text()="${size}"]`;
  await page.waitForXPath(xp);
  const [sizeButton] = await page.$x(xp);
  await sizeButton.click();

  await page.click('span[class="text__1S19c"]');
} catch (err) { console.error(err); }
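If you actually need to click the parent node (as in the commented-out attempt in the question), you can do that from the same handle; a small sketch under that assumption:

// same XPath as above; click the parent of the matched div inside the page context
const [sizeButton] = await page.$x(xp);
await sizeButton.evaluate(el => el.parentElement.click());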

Puppeteer unable to find element using xPath contains(text()) until after element has been selected in chrome dev tools

I am trying to click the "Create File" button on fakebook's Download Your Information page. I am currently able to go to the page, and I wait for the login process to finish. However, when I try to detect the button using
page.$x("//div[contains(text(),'Create File')]")
nothing is found. The same thing occurs when I try to find it in the chrome dev tools console, both in a puppeteer window and in a regular window outside of the instance of chrome puppeteer is controlling:
This is the html info for the element:
I am able to find the element, however, only after I have clicked on it using the Chrome DevTools inspector tool:
(the second print statement is from after I have clicked on it with the element inspector tool)
How should I select this element? I am new to puppeteer and to xpath so I apologize if I just missed something obvious.
A small few links I currently remember looking at previously:
Puppeteer can't find selector
puppeteer cannot find element
puppeteer: how to wait until an element is visible?
My Code:
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
(async () => {
  let browser;
  try {
    puppeteer.use(StealthPlugin());
    browser = await puppeteer.launch({
      headless: false,
      // path: "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
      args: ["--disable-notifications"],
    });
    const pages = await browser.pages();
    const page = pages[0];

    const url = "https://www.facebook.com/dyi?referrer=yfi_settings";
    await page.goto(url);

    // Handle the login process. Since the login page is different from the url we want,
    // I am going to assume the user has logged in if they return to the desired page.

    // Wait for the login page to process
    await page.waitForFunction(
      (args) => {
        return window.location.href !== args[0];
      },
      { polling: "mutation", timeout: 0 },
      [url]
    );

    // Since multifactor auth can resend the user temporarily to the desired url,
    // use a little debouncing to make sure the user is completely done signing in.
    // Make sure there is no redirect for mfa.
    await page.waitForFunction(
      async (args) => {
        // function to make sure there is a debouncing delay between checking the url
        // Taken from: https://stackoverflow.com/a/49813472/11072972
        function delay(delayInms) {
          return new Promise((resolve) => {
            setTimeout(() => {
              resolve(2);
            }, delayInms);
          });
        }
        if (window.location.href === args[0]) {
          await delay(2000);
          return window.location.href === args[0];
        }
        return false;
      },
      { polling: "mutation", timeout: 0 },
      [url]
    );

    // await page.waitForRequest(url, { timeout: 100000 });

    const requestArchiveXpath = "//div[contains(text(),'Create File')]";
    await page.waitForXPath(requestArchiveXpath);
    const [requestArchiveSelector] = await page.$x(requestArchiveXpath);
    await page.click(requestArchiveSelector);
    page.waitForTimeout(3000);
  } catch (e) {
    console.log("End Error: ", e);
  } finally {
    if (browser) {
      await browser.close();
    }
  }
})();
Resolved using the comment above by @vsemozhebuty and source. Only the last few lines inside the try must change:
const iframeXpath = "//iframe[not(@hidden)]";
const requestArchiveXpath = "//div[contains(text(),'Create File')]";
//Wait for and get iframe
await page.waitForXPath(iframeXpath);
const [iframeHandle] = await page.$x(iframeXpath);
//content frame for iframe => https://devdocs.io/puppeteer/index#elementhandlecontentframe
const frame = await iframeHandle.contentFrame();
//Wait for and get button
await frame.waitForXPath(requestArchiveXpath);
const [requestArchiveSelector] = await frame.$x(requestArchiveXpath);
//click button
await requestArchiveSelector.click();
await page.waitForTimeout(3000);

Click anywhere on page using Puppeteer

Currently I'm using Puppeteer to fetch cookies & headers from a page; however, it's using a bot prevention system which is only bypassed by clicking on the page, and I don't want to keep this click sequential and therefore "detectable".
How can I have Puppeteer click anywhere on the page at random, regardless of whether it clicks a link, button, etc.?
I've currently got this code
const getCookies = async (state) => {
  try {
    state.browser = await launch_browser(state);
    state.context = await state.browser.createIncognitoBrowserContext();
    state.page = await state.context.newPage();

    await state.page.authenticate({
      username: proxies.username(),
      password: proxies.password(),
    });

    await state.page.setViewport(functions.get_viewport());

    state.page.on('response', response => handle_response(response, state));

    await state.page.goto('https://www.website.com', {
      waitUntil: 'networkidle0',
    });

    await state.page.waitFor('.unlockLink a', {
      timeout: 5000
    });
    await state.page.click('.unlockLink a');

    await state.page.waitFor('input[id="nondevice"]', {
      timeout: 5000
    });

    state.publicIpv4Address = await state.page.evaluate(() => {
      return sessionStorage.getItem("publicIpv4Address");
    });

    state.csrfToken = await state.page.evaluate(() => {
      return sessionStorage.getItem("csrf-token");
    });

    //I NEED TO CLICK HERE! CAN BE WHITESPACE, LINK, IMAGE

    state.browser_cookies = await state.page.cookies();
    state.browser.close();

    for (const cookie of state.browser_cookies) {
      if (cookie.name === "dtPC") {
        state.dtpc = cookie.value;
      }
      await state.jar.setCookie(
        `${cookie.name}=${cookie.value}`,
        'https://www.website.com'
      );
    }

    return state;
  } catch (error) {
    if (state.browser) {
      state.browser.close();
    }
    throw new Error(error);
  }
};
The simplest way I can think of off the top of my head to choose a random element from the DOM would be something like using querySelectorAll(), which returns a NodeList of all matching elements in your document (all <div>s, for example, or any other element like <p>). Then you can easily call click() on a random one from the result, for example:
await page.evaluate(() => {
  const allDivs = document.querySelectorAll('.left-sidebar-toggle');
  const randomElement = allDivs[Math.floor(Math.random() * allDivs.length)];
  randomElement.click();
});
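If you want it to literally match the description above and pick from all <div>s on the page rather than one specific class, a minimal sketch (keep in mind a random div may do nothing useful when clicked):

await page.evaluate(() => {
  // collect every div on the page and click one at random
  const allDivs = document.querySelectorAll('div');
  const randomElement = allDivs[Math.floor(Math.random() * allDivs.length)];
  randomElement.click();
});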

reuse browser instance puppeteer

I would like to know if it is possible to have one .js file that opens a browser instance, creates a new page/tab, logs in to a website (with username/password), and just stays idle. And then, in a second .js file, use the first file's browser instance and its page.
1.js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox'],
    ignoreDefaultArgs: ["--hide-scrollbars"]
  });
  const page = await browser.newPage();
  const response = await page.goto('https://google.com');
  console.log('Browser open in the background (headless)!');
  //await browser.close();
})();
2.js
const puppeteer = require('puppeteer');

(async () => {
  // instructions on browser instance/page from 1.js ...
})();
The crawler object keeps the state of the browser instance, and wherever you call/pass that instance, it refers to the same Chromium in the "background". If this is overkill and you just want to connect to an already running Chromium using Puppeteer, you can do it with puppeteer.connect. Take a look at this:
How to "hook in" puppeteer into a running Chrome instance/tab – mbit
Yeah, I guess it's overkill for me :). The link you posted was what I wanted, but I have 2 questions.
This is a sample of what I have.
// 1.js
// open chromium, new tab, go to google.com, print browserWSEndpoint, disconnect
const puppeteer = require('puppeteer');

(async () => {
  var browser = await puppeteer.launch({ headless: false });
  var page = await browser.newPage();
  var response = await page.goto('https://google.com');
  var browserWSEndpoint = browser.wsEndpoint();
  console.log(browserWSEndpoint); // prints: ws://127.0.0.1:51945/devtools/browser/6462daeb-469b-4ae4-bfd1-c3bd2f26aa5e
  browser.disconnect();
})();
And
// 2.js
// connect to the open browser with the browserWSEndpoint manually put in, ..., disconnect.
const puppeteer = require('puppeteer');

(async () => {
  var browser = await puppeteer.connect({ browserWSEndpoint: 'ws://127.0.0.1:51945/devtools/browser/6462daeb-469b-4ae4-bfd1-c3bd2f26aa5e' });
  // somehow use the tab that is open from 1.js (google.com)
  await browser.disconnect();
})();
I get the browserWSEndpoint string from the console.log in 1.js.
It works great but I have two difficulties.
1 - How can I use the variable browserWSEndpoint from 1.js so I don't have to always copy-paste it into 2.js?
2 - If I open a new page/tab in 1.js, go to, for example, Google, and disconnect (browser.disconnect()), how can I use that page/tab in 2.js?
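One way to cover both points, not taken from the answers below but just the plain Puppeteer API: have 1.js write browser.wsEndpoint() to a small file (e.g. fs.writeFileSync('wsEndpoint.txt', browser.wsEndpoint())) instead of copy-pasting it, then let 2.js read it back and grab the already-open tab through browser.pages(). A sketch, assuming that file name:

// 2.js - a sketch, assuming 1.js wrote its wsEndpoint to wsEndpoint.txt
const fs = require('fs');
const puppeteer = require('puppeteer');

(async () => {
  const browserWSEndpoint = fs.readFileSync('wsEndpoint.txt', 'utf8').trim();
  const browser = await puppeteer.connect({ browserWSEndpoint });

  // browser.pages() lists the tabs already open in that Chromium,
  // including the google.com tab opened by 1.js
  const pages = await browser.pages();
  const page = pages.find(p => p.url().includes('google.com')) || pages[0];

  console.log(await page.title());
  await browser.disconnect();
})();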
Working tested code
getEmail.js is where the actual page will be exported. Ask for clarifications in the comments.
getBrowser.js
const puppeteer = require("puppeteer");

module.exports = {
  browser: {},
  pptr_instance_url: "",
  getBrow: async function () {
    try {
      console.log("line6", this.pptr_instance_url);
      this.browser = await puppeteer.connect({ browserWSEndpoint: this.pptr_instance_url }).catch(async e => {
        console.log("end point", this.pptr_instance_url);
        this.browser = await puppeteer.launch({ timeout: 0 });
        this.pptr_instance_url = this.browser.wsEndpoint();
        console.log("line 11", this.pptr_instance_url);
        return this.browser;
      });
      return this.browser;
    } catch (e) {
      console.log(e);
    }
  }
};
pageRenderer.js
const abc = require("../getBrowsernew");

const pageRenderer = async (request) => {
  const { reactProjectUrl } = constants, uuidStorageKey = uuidv4(),
    localStorageObject = { [uuidStorageKey]: request.body };
  const browser = await abc.getBrow();

  let url = "someurl.com";
  await setLocalStorage(browser, url, localStorageObject);

  const page = await browser.newPage();
  const response = await page.goto(
    url,
    { waitUntil: "networkidle0", timeout: 0 }
  );

  return page;
};
module.exports = pageRenderer;
getEmail.js
const pageRenderer = require("./pageRenderer");

const getEmail = async (request) => {
  const page = await pageRenderer(request);
  const emailbody = await page.content();
  await page.close();
  return emailbody;
};

module.exports = getEmail;
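For completeness, a hypothetical usage sketch of the exported module (the request shape here is an assumption; the code above only reads request.body):

// caller.js - hypothetical usage of the getEmail module above
const getEmail = require("./getEmail");

(async () => {
  // request.body can be whatever pageRenderer expects to stash in localStorage
  const emailBody = await getEmail({ body: { orderId: 12345 } });
  console.log(emailBody.length);
})();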
You can implement this in many ways, like having separate modules with functions or different classes; it depends on your particular needs.
You can have a class that launches the browser and creates pages plus some extra functionalities.
//1.js
const puppeteer = require('puppeteer');

class Crawler {
  constructor() {
    //init with whatever values you'll need in your class
    //or throw an error if the object wasn't created through build
  }

  static async build() {
    let crawler = new Crawler();
    await crawler._init();
    return crawler;
  }

  async _init() {
    //launch the browser and keep its state
    this._browser = await puppeteer.launch({ timeout: 0 });
    //create a page and keep its state
    this._page = await this._browser.newPage();
  }

  //getter
  get browser() {
    return this._browser;
  }

  //getter
  get page() {
    return this._page;
  }

  async login(url) {
    await this._page.goto(url);
    //do whatever is related to the login process
  }
}

module.exports = { Crawler };
Note that we can't have async functions in the constructor. Since launching the browser is async, we use something like a static build function to initialize the browser when creating the object. Then we create the crawler object like this:
//2.js
const { Crawler } = require('./1.js');

(async () => {
  let crawler = await Crawler.build();
  await crawler.login("https://example.com");
  //access crawler's page
  console.log(crawler.page.url());
})();
Keep in mind that this is only an example and by no means representative of best practices. First, understand what you want to achieve with such encapsulation, then adopt the method that suits you best.
Read more on JS classes here

Puppeteer doesn't recognize selector with just type and class but accepts full selector

I'm trying to click on a cookiewall on a webpage, but Puppeteer refuses to recognize the short selector with just the type and class (button.button-action). Changing this to the full CSS selector fixes the problem, but that isn't a viable solution since any change in the parent elements can break the selector. As far as I know this shouldn't be a problem, because on the page in question document.querySelector("button.button-action") also returns the element I'm trying to click.
The code that doesn't work:
const puppeteer = require('puppeteer');

const main = async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto("https://www.euclaim.nl/check-uw-vlucht#/problem", { waitUntil: 'networkidle2' });
  const cookiewall = await page.waitForSelector("button.button-action", { visible: true });
  await cookiewall.click();
};

main();
The code that does work:
const puppeteer = require('puppeteer');

const main = async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto("https://www.euclaim.nl/check-uw-vlucht#/problem", { waitUntil: 'networkidle2' });
  const cookiewall = await page.waitForSelector("#InfoPopupContainer > div.ipBody > div > div > div.row.actionButtonContainer.mobileText > button", { visible: true });
  await cookiewall.click();
};

main();
The problem is that there are three button.button-action elements on that page, and the first match is not visible.
One thing you could do is waitForSelector without the visible option (because with it, Puppeteer only checks the first button), and then iterate through all the matches, checking which one is actually clickable:
await page.waitForSelector("button.button-action");
const actions = await page.$$("button.button-action");
for (let action of actions) {
  if (await action.boundingBox()) {
    await action.click();
    break;
  }
}
